\startcomponent languagesoptions

\environment languagesenvironment

\startchapter[title=Options][color=darkblue]

\startsection[title=Introduction]

Hyphenation of words is controlled by so-called patterns. The hyphenator takes a
word and tries to match parts of it with patterns that describe where a hyphen
can be injected. Preferred and discouraged injection points accumulate to a score
that in the end determines where so-called discretionary nodes get injected in
the list of glyphs that make up a word. The patterns are language specific.

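The scoring can be illustrated with a few lines of plain \LUA. This is only a
sketch of the classic pattern scheme and not the code used by the engine: the
function names \type {parse} and \type {hyphenate} are made up, the eight
patterns are the ones often used to illustrate the word \quote {hyphenation},
and only single byte characters are handled. In this scheme the digits in a
pattern are scores, the highest score at a position wins, and an odd result
permits a hyphen.

\starttyping
-- Patterns often used to illustrate the word "hyphenation".
local patterns = { "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io" }

-- Split a pattern into its letters and the scores found between them;
-- scores[k] is the score after k letters (sparse, k starts at 0).
local function parse(pattern)
    local key, scores, seen = "", { }, 0
    for c in pattern:gmatch(".") do
        if c:match("%d") then
            scores[seen] = tonumber(c)
        else
            key  = key .. c
            seen = seen + 1
        end
    end
    return key, scores
end

-- Return the word with "-" at every permitted position, keeping at least
-- leftmin characters before and rightmin characters after a hyphen.
local function hyphenate(word, leftmin, rightmin)
    local text, levels = "." .. word .. ".", { }
    for gap = 0, #text do
        levels[gap] = 0
    end
    for _, pattern in ipairs(patterns) do
        local key, scores = parse(pattern)
        local first = text:find(key, 1, true)
        while first do
            for offset, score in pairs(scores) do
                local gap = first - 1 + offset -- gap k sits after character k
                if score > levels[gap] then
                    levels[gap] = score -- the highest score wins
                end
            end
            first = text:find(key, first + 1, true)
        end
    end
    local result = { }
    for i = 1, #word do
        result[#result + 1] = word:sub(i, i)
        -- word character i is text character i + 1 because of the leading dot
        if i >= leftmin and i <= #word - rightmin and levels[i + 1] % 2 == 1 then
            result[#result + 1] = "-"
        end
    end
    return table.concat(result)
end

print(hyphenate("hyphenation", 2, 3)) -- hy-phen-ation
\stoptyping

The even scores contributed by \type {he2n} and \type {n2at} are what keep the
word from being broken before the \type {n} and inside \type {at}.
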
This mechanism is agnostic when it comes to the characters involved: they are
just numbers. However, when in a next step font features like ligature building
and kerning are applied, we also have to deal with language specific properties
(and meanings). Often a ligature at the boundary of a compound word can make
reading confusing and has to be avoided. Some of that can be controlled by the
font when it implements language specific features, but because that approach is
not based on a dictionary it is more about playing safe and prevention than about
quality.

In the next sections a mechanism is discussed that also uses patterns. This time
it is about controlling fonts as well as how hyphenation patterns are applied.
This process kicks in before hyphenation is applied but it definitely has to be
seen as part of that same process. It is integrated in the hyphenation machinery
and acts as a preprocessor with the possibility to feed back and move forward.
The implementation is such that when it is not used there is no performance
penalty. \footnote {There are by now plenty of alternative approaches to these
problems but after some discussion about the pros and cons of each this new
mechanism was made. I admit that the fun factor played a role. It is also one of
the things we can do in \LUAMETATEX\ without worrying about a possible negative
impact on \LUATEX\ users other than \CONTEXT .}

There are several predefined operations that are characterized by keywords and
shortcuts and collected in an option list that is part of a language goodie file.
Examples can be found in the distribution in files with the suffix \type {llg}
(\LUA\ language goodie). The framework of such a file is:

\starttyping
return {
    name      = "whatever",
    version   = "1.00",
    comment   = "Goodies for experiments and demo.",
    author    = "Hans Hagen",
    copyright = "ConTeXt development team",
    options   = {
        { ... },
        ........
        { ... },
    }
}
\stoptyping

These options will eventually result in patterns that are bound to words,
think of:

\starttabulate[|T|T|T|T|]
\NC effe     \NC \type {ef|fe}     \NC \type {..|..}     \NC inhibit ligature \NC \NR
\NC foobar   \NC \type {foo=bar}   \NC \type {...=...}   \NC inhibit kerning  \NC \NR
\NC somemore \NC \type {some+more} \NC \type {....+....} \NC compound word    \NC \NR
\stoptabulate

The whole repertoire is:

\starttabulate[|T|T|]
\NC \type {a|b} \NC a:norightligature, b:noleftligature \NC \NR
\NC \type {a=b} \NC a:norightkern, b:noleftkern         \NC \NR
\NC \type {a<b} \NC b:noleftkern                        \NC \NR
\NC \type {a>b} \NC a:norightkern                       \NC \NR
\NC \type {a+b} \NC a:compound:b                        \NC \NR
\stoptabulate

Later we will see how some can be combined. An option can be defined using entries
in a subtable:

\starttabulate[|T|T|T|]
\NC patterns   \NC hash            \NC \type {[snippet] = "replacement pattern"}          \NC \NR
\NC words      \NC string          \NC string of words, separated by whitespace           \NC \NR
\NC prefixes   \NC string          \NC snippets that combine with words (at the start)    \NC \NR
\NC suffixes   \NC string          \NC snippets that combine with words (at the end)      \NC \NR
\NC matches    \NC array or number \NC a number or table indicating which match matters   \NC \NR
\NC actions    \NC hash            \NC \type {[character] = "action(s)"}                  \NC \NR
\NC characters \NC string          \NC permitted characters (additional hjcodes)          \NC \NR
\NC return     \NC integer         \NC what to do next                                    \NC \NR
\stoptabulate

The default return value is~2 but there are some more:

\starttabulate[|T|T|]
\NC 0 \NC go to the next (valid) word        \NC \NR
\NC 1 \NC restart                            \NC \NR
\NC 2 \NC exceptions and after that patterns \NC \NR
\NC 3 \NC patterns                           \NC \NR
\stoptabulate

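As a sketch of how these keys fit together, the following entry blocks the kern
in the first \type {th} of a word and of its suffixed forms, and spells out the
default return value. The word and the suffixes are made up for illustration and
do not come from a distributed goodie file. Because \type {return} is a reserved
word in \LUA, the key has to be written in its bracketed form here.

\starttyping
{
    patterns = {
        th = "t=h",      -- a=b: no kern to the right of t and left of h
    },
    matches  = { 1 },    -- only the first match in a word matters
    words    = [[
        thrive
    ]],
    suffixes = [[
        s d
    ]],
    ["return"] = 2,      -- the default: exceptions and after that patterns
}
\stoptyping
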
There are some safeguards built in that force a restart. For instance when a word
is replaced a restart is enforced unless we skip the word. A restart will not
permit a second replacement (after all we need to avoid endless loops).

In a multiline word list, lines that start with a comment trigger are ignored.
The trigger is either \LUA's double dash or the usual \TEX\ percent sign.

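A minimal sketch of such a list, again with a made up pattern and word; the two
lines that start with a comment trigger are expected to be left out, so that only
\type {thrive} ends up in the list:

\starttyping
{
    patterns = {
        th = "t=h",
    },
    words = [[
        -- a line that starts with the Lua comment trigger
        %  a line that starts with the TeX comment trigger
        thrive
    ]],
}
\stoptyping
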
\stopsection

\startsection[title=Inhibiting]

The next definition replaces \type {ff} by \type {f|f} in the words given and
eventually blocks a ligature.

\starttyping
{
    patterns = {
        ff = "f|f",
    },
    words = [[
        effe
    ]],
}
\stoptyping

Some fonts provide the \type {ij} ligature or do some special kerning between
these characters (something Dutch). Because it depends on the font logic whether
a dedicated replacement or kerning is used, this is an example where we block
both:

\starttyping
{
    patterns = {
        ij = "i|j",
    },
    actions = {
        ["|"] = "nokern noligature",
    },
    words = [[
        ijverig
        fijn -- to ligature fi or ij, that's the question
    ]],
}
\stoptyping

A more extensive definition is the following. Here we explicitly define that only
the first match in a word gets treated, and we not only block ligatures but also
kerns.

\starttyping
{
    patterns = {
        ff = "f|f",
    },
    matches = { 1 },
    actions = {
        ["|"] = "noligature nokern"
    },
    words = [[
        effe
        effeffe
    ]],
}
\stoptyping

You can also omit the pattern when you inject specifiers yourself:

\starttyping
{
    actions = {
        ["|"] = "noligature nokern"
    },
    words = [[
        ef|fe
        ef|fef|fe
    ]],
}
\stoptyping

You can also use different shortcuts:

\starttyping
{
    actions = {
        ["1"] = "noligature",
        ["2"] = "nokern"
    },
    words = [[
        ef1fe
        ef1fef2fe
    ]],
}
\stoptyping

Although I cannot come up with a nice example, there can be reasons for
inhibiting kerns. Here we inhibit kerns left of the upcoming character:

\starttyping
{
    patterns = {
        fo = "f<o",
        rm = "r<m",
    },
    words = [[
        information
    ]],
}
\stoptyping

And here we inhibit kerns right of the previous and left of the upcoming
character:

\starttyping
{
    patterns = {
        th = "t=h",
    },
    words = [[
        thrive
    ]],
}
\stoptyping

Just look in the files in the distribution for realistic examples, like

\starttyping
{
    patterns = {
        fi = "f|i",
    },
    words = [[
        deafish dwarfish elfish oafish selfish
    ]],
    suffixes = [[
        ness ly
    ]]
}
\stoptyping

where we block ligatures in 15 words: the five words themselves and each of them
combined with the two suffixes. There is also a \type {prefixes} key.

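The count of 15 can be checked with a few lines of plain \LUA. This is only a
sketch: the helper \type {expand} is made up and is not the way the goodie loader
combines words and suffixes, and it assumes that the bare words stay in the list.

\starttyping
-- Combine every word with every suffix and keep the bare word as well.
local function expand(words, suffixes)
    local list = { }
    for word in words:gmatch("%S+") do
        list[#list + 1] = word
        for suffix in suffixes:gmatch("%S+") do
            list[#list + 1] = word .. suffix
        end
    end
    return list
end

local list = expand("deafish dwarfish elfish oafish selfish", "ness ly")

print(#list)                      -- 15
print(table.concat(list, " ", 1, 3)) -- deafish deafishness deafishly
\stoptyping
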
\stopsection

\startsection[title=Replacements]

Replacements are probably not used that much but here is one for German. Not
only is the uppercase variant of ß seldom used, many fonts don't provide it,
so we can best replace it:

\starttyping
{
    characters = "ẞ", -- uppercase ß, not visible in all verbatim fonts
    patterns = {
        ["ẞ"] = "SS", -- key is uppercase ß
    },
}
\stoptyping

Here we define that character as valid. Normally that is done by the patterns,
but the patterns don't include this character. If we do not specify it here, the
hyphenator will skip this word. For the record: this can also be done with a font
feature that decomposes the character.

\stopsection

\startsection[title=Compound words]

You might want to suppress ligatures and maybe even kerning when compound words
are involved.

\starttyping
{
    patterns = {
        ff = "f+f",
    },
    words = [[
        aaaaffaaaa
        bbffbb
    ]],
}
\stoptyping

Again you can also say:

\starttyping
{
    words = [[
        aaaaf+faaaa
        bbf+fbb
    ]],
}
\stoptyping

But patterns make sense when you have a large list (that might come from some
other source than yourself).

The next specification will turn two times three \type {bla}'s into a compound
word but also make sure that we have at least 4 characters left and right of a
potential break.

\starttyping
{
    left  = 4,
    right = 4,
    words = [[
        blablabla+blablabla
    ]],
}
\stoptyping

\stopsection

\startsection[title=Performance]

Although these mechanisms introduce overhead, the performance hit in \LMTX\ is
not that large. This is because the number of words in a document is limited and
\LUA\ is fast enough.

\stopsection

\startsection[title=Plugins]

{\em This interface is preliminary but for the record I put an example here
anyway.}

\starttyping
local n = 0
function document.myhack(original)
    n = n + 1
    print(n,original)
    return original
end

languages.installhandler("de","document.myhack")
\stoptyping

One can manipulate a text as in:

\starttyping
function document.myhack(original)
    local t = utf.split(original)
    local t = table.reverse(t)
    local f = t[#t] -- first character of the original word
    local l = t[1]  -- last character of the original word
    if characters.upper(f) == f then
        -- keep the capital at the start of the reversed word
        t[1]  = characters.upper(l)
        t[#t] = characters.lower(f)
    end
    local original = table.concat(t)
    return original
end

languages.installhandler("en","document.myhack")
\stoptyping

The text will be fed again into the hyphenator and treated in the normal way.
There are some safeguards against the text being processed twice.

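As a further sketch under the same preliminary interface, a handler could do a
plain string replacement, here the uppercase ß replacement from the section on
replacements. The name \type {document.nosharps} is made up, and whether such a
word reaches a handler at all depends on the character being valid for the
language, so this only illustrates the calling convention shown above. The two
guards are there so that the fragment also runs outside \CONTEXT.

\starttyping
-- Outside ConTeXt the document table does not exist, so create it when needed.
document = document or { }

-- Replace every uppercase sharp s by two regular characters.
function document.nosharps(original)
    return (string.gsub(original, "ẞ", "SS"))
end

-- Only register the handler when the ConTeXt interface is available.
if languages and languages.installhandler then
    languages.installhandler("de","document.nosharps")
end
\stoptyping
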
\stopsection

\startsection[title=Embedded definitions]

You can also embed definitions in the source file:

\starttyping
\startlanguageoptions[de]
    Zapf|innovation
\stoplanguageoptions
\stoptyping

\stopsection

\startsection[title=Exceptions]

When you set exceptions in a goodie file, it will use the plugin mechanism to
check for them. This is a bit more efficient than using the internal checker,
which actually also goes via a \LUA\ hash.

\starttyping
{
    exceptions = [[
        avery{}{}{w}eird{1}{2}{3}(w)ord
    ]],
}
\stoptyping

Watch out: when you specify a discretionary replacement, three braced values are
passed: the pre, post and replace text. The replace text is used in the lookup,
unless you add a string between parentheses, which then will be used instead. A
digit between brackets will apply a penalty according to the following logic (in
the engine): a zero digit results in \type {\hyphenpenalty}, otherwise the
digits~1 up to~9 will be used as multiplier for \type {\exceptionpenalty} when
that value is larger than 100000, otherwise \type {\exceptionpenalty} is used.

\stopsection

\startsection[title=Tracing]

The following tracker can be used:

\starttyping
\enabletrackers[languages.goodies]
\stoptyping

In addition the style \type {languagesgoodies} implements some tracing options.
You can just run that one to see what it does.

The engine itself also has a tracing option: \type {\tracinghyphenation}. When
set to zero nothing is shown, when set to one redundant patterns will be
reported. A value of two reports what words get fed into the hyphenator and if
they got hyphenated. A value of three gives more detail: when a word gets
hyphenated the relevant (resulting) part of the node list is shown. You need to
set \type {\tracingonline} to a value larger than zero to get this reported to
the console. Expect lots of extra output to the console for large documents but
it can be revealing.

\stopsection

\stopchapter

\stopcomponent