evenmore-hyphenation.tex /size: 14 Kb last modification: 2025-02-21 11:03
1% language=us runpath=texruns:manuals/evenmore
2
3\environment evenmore-style
4
5\startcomponent evenmore-hyphenation
6
7\usebodyfont[pagella]
8
9\startchapter[title=Hyphenation]
10
11\startsection[title={Introduction}]
12
Hyphenation is driven by the character codes. In a traditional \TEX\ such a code
accesses a glyph in a font, which is why the font encoding mattered, but in
\LUATEX\ we use \UNICODE\ when hyphenation is applied. \footnote {In
\CONTEXT\ \MKII\ we also use \UTF\ patterns, which made it possible to ship
patterns that didn't depend on a font encoding. Mojca and Arthur made \UTF\ the
default when the (upgraded) hyphenation pattern project started.} Later, the
character codes are adapted by the font handler where they become glyphs. There
are moments when you don't want to hyphenate and a cheap trick is to switch to a
language that has no hyphenation patterns. But in a system like \CONTEXT\ that
doesn't work well because we have lots of language bound properties. Therefore in
\MKIV\ we set the left and right hyphen minima to extreme values, something that
blocks hyphenation quite well. But this is not a pretty solution at all. Even
worse is that discretionaries that come from \type {\discretionary}, from
automatic hyphens (\type {-}) or from explicit ones (\type {\-}) still kick in.
28
For that reason in \LMTX\ we have a mode variable that controls hyphenation. In
\LUATEX\ we have primitives like \type {\compoundhyphenmode}, \type
{\hyphenationbounds} and \type {\hyphenpenaltymode} that control how
hyphenation and discretionary injection are handled, but when the more generic
\type {\hyphenationmode} parameter was introduced in \LUAMETATEX\ these
precursors were all merged into it. One can argue that this is a form of
regression but there are good reasons, most notably the fact that we keep these
properties with glyph nodes. That gives us better control in grouped situations:
some operations happen when the paragraph as a whole gets treated, and at that
moment local overloads would otherwise be lost. \footnote {Of course it also is a
wink to those who complain that we add primitives to an otherwise leaner variant
of \LUATEX, but let us not elaborate on that misunderstanding.} It anyway means
that in \LMTX\ we have to set different parameters but that is no big deal
because users are supposed to use the more high level interfaces; instead of
setting parameters to values one flips bits in \type {\hyphenationmode}, which in
the end makes more sense and also permits extensions later without adding much
overhead.
45
46Currently this mode parameter controls the following options:
47
48\starttabulate[|Tr|||]
49\NC \uchexnumber{\normalhyphenationcode}           \NC \type{\normalhyphenationcode}           \NC honour the (normal) \type{\discretionary} primitive \NC \NR
50\NC \uchexnumber{\automatichyphenationcode}        \NC \type{\automatichyphenationcode}        \NC turn \type {-} into (automatic) discretionaries \NC \NR
51\NC \uchexnumber{\explicithyphenationcode}         \NC \type{\explicithyphenationcode}         \NC turn \type {\-} into (explicit) discretionaries \NC \NR
52\NC \uchexnumber{\syllablehyphenationcode}         \NC \type{\syllablehyphenationcode}         \NC hyphenate (syllable) according to language \NC \NR
53\NC \uchexnumber{\uppercasehyphenationcode}        \NC \type{\uppercasehyphenationcode}        \NC hyphenate uppercase characters too \NC \NR
54\NC \uchexnumber{\compoundhyphenationcode}         \NC \type{\compoundhyphenationcode}         \NC permit break at an explicit hyphen (border cases) \NC \NR
55\NC \uchexnumber{\strictstarthyphenationcode}      \NC \type{\strictstarthyphenationcode}      \NC traditional \TEX\ compatibility wrt the start of a word \NC \NR
56\NC \uchexnumber{\strictendhyphenationcode}        \NC \type{\strictendhyphenationcode}        \NC traditional \TEX\ compatibility wrt the end of a word \NC \NR
57\NC \uchexnumber{\automaticpenaltyhyphenationcode} \NC \type{\automaticpenaltyhyphenationcode} \NC use \type {\automatichyphenpenalty} \NC \NR
58\NC \uchexnumber{\explicitpenaltyhyphenationcode}  \NC \type{\explicitpenaltyhyphenationcode}  \NC use \type {\explicithyphenpenalty} \NC \NR
59\NC \uchexnumber{\permitgluehyphenationcode}       \NC \type{\permitgluehyphenationcode}       \NC turn glue in discretionaries into kerns \NC \NR
60\stoptabulate
61
62The default \CONTEXT\ setup is:
63
64\starttyping
65\hyphenationmode \numexpr
66    \normalhyphenationcode
67  + \automatichyphenationcode
68  + \explicithyphenationcode
69  + \syllablehyphenationcode
70  + \uppercasehyphenationcode
71  + \compoundhyphenationcode
72  % \strictstarthyphenationcode
73  % \strictendhyphenationcode
74  + \automaticpenaltyhyphenationcode
75  + \explicitpenaltyhyphenationcode
76  + \permitgluehyphenationcode
77\relax
78\stoptyping
79
80When a discretionary node is created (triggered by \type {\discretionary}) the
81current value is used. Injected glyph nodes on the other hand will store the
82current value and use that when it is needed for hyphenating the list.
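
The bit oriented interface can be sketched as follows. This is a minimal example
that only uses \type {\numexpr} arithmetic on the constants listed above. It
assumes that the syllable bit is currently set, as it is in the default setup,
because subtracting a bit that is not set would give a wrong value. The sample
word is arbitrary.

\starttyping
\begingroup
    % assumption: \syllablehyphenationcode is part of the current mode
    \hyphenationmode \numexpr
        \hyphenationmode
      - \syllablehyphenationcode
    \relax
    nederlands % glyphs created here carry the reduced mode
\endgroup
nederlands     % here the previous mode is active again
\stoptyping

Because the value is stored with the glyph nodes, the grouped change stays with
the first word even though hyphenation is applied later, when the list as a whole
is processed.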
83
84\stopsection
85
86\startsection[title={Controlling hyphenation}]
87
88We start with an example that has some Dutch words:
89
90\startbuffer[sample]
91NEDERLANDS\par Nederlands\par nederlands\par
92\CONTEXT  \par test\-test\par test-test \par
93\stopbuffer
94
95\typebuffer[sample]
96
97\startbuffer[result]
98\startlinecorrection
99\dontleavehmode \dorecurse{\boxlines\scratchboxone} {%
100   \setbox\scratchbox\boxline\scratchboxone#1%
101   \ruledhpack{\strut\unhbox\scratchbox}%
102   \kern.25\emwidth
103}
104\stoplinecorrection
105\stopbuffer
106
107When we typeset this with a \type {\hsize} of 2mm we get:
108
109\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
110
111\getbuffer[result]
112
But when we block hyphenation with \type {\nohyphens} we see:
114
115\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \nohyphens \getbuffer[sample]}
116
117\getbuffer[result]
118
The \MKIV\ behavior can be emulated by setting the mode as follows:
120
121\startbuffer[demo]
122\bitwiseflip \hyphenationmode \syllablehyphenationcode
123\stopbuffer
124
125\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[demo] \getbuffer[sample]}
126
127\getbuffer[result]
128
129This time the three non|-|syllable variants get hyphenated and that is not what
130we want. In this case there is a \type {\discretionary} in the definition of the
131macro that generates \CONTEXT\ and, apart from the fact that we might not even
132want to hyphenate logos, we have to block it when we apply \type {\nohyphens}.
133
The mode settings are applied directly to the three non|-|syllable variants but
their application is delayed for the syllable discretionaries: hyphenation
happens later, so there the state becomes a property of the glyph nodes. Doing
the same for the other discretionaries would demand an adaptation of various
pieces of the engine code, and plugged in user (\LUA) code would also have to
consider it, which makes no sense.
139
140\startbuffer[sample]
141\nohyphens nederlands {\dohyphens nederlands} nederlands\par
142\stopbuffer
143
144\typebuffer[sample]
145
146\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
147\getbuffer[result]
148
149Compare this with:
150
151\startbuffer[sample]
152nederlands {\nohyphens nederlands} nederlands\par
153\stopbuffer
154
155\typebuffer[sample]
156
157\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
158\getbuffer[result]
159
160\stopsection
161
162\startsection[title={Compound hyphenation}]
163
164Yet another discretionary related issue is with compound words, that is: cases
165where \type {\discretionary} commands sit between words. There are of course
166tricks to deal with it like adding a huge penalty combined with a zero skip. This
167is okay in a traditional \TEX\ engine but in an opened up one you might not want
168this. Just to mention one aspect: when processing \OPENTYPE\ fonts you actually
169need to look into discretionaries in order to deal with glyphs that interact. And
170you don't want to deal with penalties and skips unless they have an explicit
171meaning. We show the four possibilities:
172
173\startbuffer[sample]
174nederlands\discretionary           {!}{!}{!}nederlands\blank
175\stopbuffer
176
177\typebuffer[sample]
178
179\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
180\getbuffer[result]
181
182\startbuffer[sample]
183nederlands\discretionary options 1 {!}{!}{!}nederlands\blank
184\stopbuffer
185
186\typebuffer[sample]
187
188\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
189\getbuffer[result]
190
191\startbuffer[sample]
192nederlands\discretionary options 2 {!}{!}{!}nederlands\blank
193\stopbuffer
194
195\typebuffer[sample]
196
197\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
198\getbuffer[result]
199
200\startbuffer[sample]
201nederlands\discretionary options 3 {!}{!}{!}nederlands\blank
202\stopbuffer
203
204\typebuffer[sample]
205
206\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
207\getbuffer[result]
208
209Here is an example of such an interference. Of course in practice this happens
210seldom and certainly not with ligatures. Some fonts have kerning between certain
211glyphs and for instance dashes and there it could matter.
212
213\startbuffer
214ef%
215\penalty \plustenthousand
216\hskip   \zeropoint
217\discretionary{-}{f}{f}%
218\penalty \plustenthousand
219\hskip   \zeropoint
220e
221ef\discretionary options 3 {-}{f}{f}e
222\stopbuffer
223
224\typebuffer
225
As you can see, we only get the ligature when we set the options. While
processing \OPENTYPE\ features one can actually lose a discretionary, although
we try to prevent this when possible.
229
230\startlinecorrection
231\scale[height=2cm]{\setupbodyfont[pagella]\showglyphs\getbuffer}
232\stoplinecorrection
233
234But, as said, the fact that we don't need the penalties and glue helps at the
235\LUA\ end: the cleaner the node list, the better.
236
237\stopsection
238
239\startsection[title={Tracing}]
240
The already present tracker command has been extended to handle the options:
242
243\startbuffer[sample0]
244\enabletrackers[discretionaries]
245\stopbuffer
246\startbuffer[sample1]
247test\discretionary {]} {[} {[]}test
248\stopbuffer
249\startbuffer[sample2]
250testing\discretionary {]} {[} {[]}testing
251\stopbuffer
252\startbuffer[sample3]
253testing\discretionary options 3 {]} {[} {[]}testing
254\stopbuffer
255
256\typebuffer[sample0,sample1,sample2,sample3]
257
258\setbox\scratchboxone\vbox{\dontcomplain            \getbuffer[sample0,sample1]} \getbuffer[result]
259\setbox\scratchboxone\vbox{\dontcomplain \hsize 2mm \getbuffer[sample0,sample2]} \getbuffer[result]
260\setbox\scratchboxone\vbox{\dontcomplain \hsize 2mm \getbuffer[sample0,sample3]} \getbuffer[result]
261
262\stopsection
263
264\startsection[title={Glue in discretionaries}]
265
In case you cannot predict what goes into a discretionary you can run into an
error message with respect to unsupported glue in a disc node. The mode value
\number\permitgluehyphenationcode\space makes glue acceptable and turns it into
a kern, as demonstrated here:
270
271\startbuffer
272{\hsize 1mm \darkblue \discretionary{potential conspiracy}{prophets}{influencers}\par}
273\stopbuffer
274
275\typebuffer
276
277The line break occurs but the space in the pre part is of course frozen:
278
279{\getbuffer}
280
281As usual \TEX\ users will come up with applications.
282
283\stopsection
284
285\startsection[title={Penalties}]
286
By default the par builder will use the value of \type {\hyphenpenalty} that gets
stored in the discretionary node. However, when the \type {\discretionary} is
followed by a \type {penalty} keyword and a number, that value will be used
instead.
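
A minimal sketch of such an override, modelled after the earlier compound
examples. The value 5000 is an arbitrary choice for the sake of illustration:

\starttyping
nederlands\discretionary penalty 5000 {!}{!}{!}nederlands
\stoptyping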
290
291\stopsection
292
293\startsection[title=Exceptions]
294
295At some point a user on the \CONTEXT\ mailing list wondered how to deal with a case
296like this:
297
298\startbuffer[example]
299\switchtobodyfont[pagella]\mainlanguage[de]auffasse
300\stopbuffer
301
302\typebuffer[example]
303
304\startlinecorrection
305\scale[height=2cm]{\inlinebuffer[example]}
306\stoplinecorrection
307
308\startbuffer
309\startexceptions[de]
310au{f-}{-f}{ff}(f\zwnj f)asse
311\stopexceptions
312\stopbuffer
313
314In \LUAMETATEX\ you can block the unwanted ligature using this trick:
315
316\typebuffer \getbuffer
317
318\startlinecorrection
319\scale[height=2cm]{\inlinebuffer[example]}
320\stoplinecorrection
321
322The exception mechanism in \LUATEX\ and therefore \LUAMETATEX\ works as follows.
323When we have this exception:
324
325\starttyping
326au{f-}{-f}{ff}asse
327\stoptyping
328
the engine will register that exception under \type {auffasse}, that is: the
replacement part determines the word. When it runs into that word, it will create
a so-called discretionary node with a pre, post and replace part. It only uses
the \type {ff} for a lookup and keeps the original two glyphs: these become the
replacement text. In \LUAMETATEX\ you can however add an alternative
replacement:
335
336\startbuffer
337\startexceptions[de]
338au{f-}{-f}{ff}(st)asse
339\stopexceptions
340\stopbuffer
341
342\typebuffer \getbuffer
343
This time the replacement text becomes \type {st}. So we get \type {austasse} and
it is that sequence that is seen by the font handler when it applies its tricks.
On some fonts however
347
348\startbuffer[example]
349\switchtobodyfont[pagella]\mainlanguage[de]auffasse
350\stopbuffer
351
352\startlinecorrection
353\scale[height=2cm]{\showglyphs\showfontkerns\inlinebuffer[example]}
354\stoplinecorrection
355
356But in the Pagella font that we use here, a kern is added between the \type {s} and
357the \type {t}. If you don't want that you can say this:
358
359\startbuffer
360\startexceptions[de]
361au{f-}{-f}{ff}(s\zwnj t)asse
362\stopexceptions
363\stopbuffer
364
365\typebuffer \getbuffer
366
367\startlinecorrection
368\scale[height=2cm]{\showglyphs\showfontkerns\inlinebuffer[example]}
369\stoplinecorrection
370
A \type {zwj} will block a ligature (some fonts have an \type {st} ligature) and a
\type {zwnj} blocks ligatures as well as kerns.
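
As a recap of the notation: an exception entry carries a pre, post and replace
part plus an optional alternative replacement between parentheses. Conceptually
the first exception shown above behaves as if a discretionary had been typed by
hand at that spot. The following is only a sketch of that equivalence under this
assumption, not a description of what the engine does internally:

\starttyping
% exception entry: {pre}{post}{lookup}(alternative replacement)
% au{f-}{-f}{ff}(f\zwnj f)asse
%
% roughly the same result, typed by hand at one place in the text:
au\discretionary{f-}{-f}{f\zwnj f}asse
\stoptyping

The exception has the advantage that it applies to every occurrence of the word
in the given language, while the hand coded variant only affects one spot.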
373
374You can actually abuse this mechanism for trickery like this:
375
376\startbuffer
377\startexceptions[nl]
378wis-kun-d{e-}{o}{eo}(e-o)n-der-wijs
379\stopexceptions
380\stopbuffer
381
382\typebuffer \getbuffer
383
384The Dutch word \type {wiskundeonderwijs} is found as exception and comes out like
385this:
386
387\startbuffer[example]
388\switchtobodyfont[pagella]\mainlanguage[nl]wiskundeonderwijs
389\stopbuffer
390
391\startlinecorrection
392\scale[height=1cm]{\showglyphs\showfontkerns\inlinebuffer[example]}
393\stoplinecorrection
394
395Watch the hyphen that makes the compound word more visible! The other hyphens in
396the exception are proper hyphenation points and when a break happens there a
397hyphen is automatically added. The \type {\nokerning} and \type {\noligaturing}
398macros can be used grouped:
399
400\startbuffer[example]
401{every}\quad
402{\nokerning every}\quad
403{\noligaturing every}\quad
404{e{\nokerning v}ery}\quad
405{e{\glyphoptions\noleftkernglyphoptioncode  v}ery}\quad
406{e{\glyphoptions\norightkernglyphoptioncode v}ery}\quad
407\stopbuffer
408
409\typebuffer[example]
410
411There are several low level control options. In addition to those shown here we
412have a pair for ligatures: \typ {\noleftligatureglyphoptioncode} and \typ
413{\norightligatureglyphoptioncode}.
414
415\startlinecorrection[blank]
416\scale[width=\textwidth]{\showglyphs\showfontkerns\inlinebuffer[example]}
417\stoplinecorrection
418
There are alternative mechanisms, like a blocker that implements a font feature
and a replacement mechanism, but these are not discussed here.
421
422\stopsection
423
424\stopchapter
425
426\stopcomponent
427