% language=us runpath=texruns:manuals/languages
% languages-hyphenation.tex, size: 27 Kb, last modification: 2021-10-28 13:50
2
3\startcomponent languages-hyphenation
4
5\environment languages-environment
6
7\startchapter[title=Hyphenation][color=darkmagenta]
8
9\startsection[title=How it works]
10
11Proper hyphenation is one of the strong points of \TEX. Hyphenation in \TEX\ is
done using so-called hyphenation patterns. Making these patterns is an art
13and most users (including me) happily use whatever is available. Patterns can be
14created automatically using \type {patgen} but often manual tweaking is needed
15too. A pattern looks as follows:
16
17\starttyping
18pat1tern
19\stoptyping
20
21This means as much as: you can split the word \type {pattern} in two pieces, with
22a hyphen between the two \type {t}'s. Actually it will also split the word \type
23{patterns} because the hyphenation mechanism looks at substrings. When no number
24between characters in a pattern is given, a zero is assumed. This means as much
25as {\em undefined}. An even number inhibits hyphenation, an odd number permits
26it. The larger the number (weight), the more influence it has. A more restricted
27pattern is:
28
29\starttyping
30.pat1tern.
31\stoptyping
32
Here the periods set the word boundaries. The pattern dictionary for US
English has smaller patterns and the next trace shows how these are applied.
35
36\starthyphenation[traditional]
37\showhyphenationtrace[en][pattern]
38\stophyphenation
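
To make the weights concrete, here is a hand-worked sketch. The three patterns
are made up for this illustration and are not taken from a real pattern file. At
each position between two characters the highest weight wins:

\starttyping
word    : p a t t e r n
a1t     :    1
a2tt    :    2
t1t     :      1
maximum : p a2t1t e r n
result  : pat-tern
\stoptyping

Between \type {a} and \type {t} the even~2 beats the odd~1, so no break is
permitted there. Between the two \type {t}'s only the odd~1 applies, so that
position becomes a break point.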
39
40The effective hyphenation of a word is determined by several factors:
41
42\startitemize[packed]
43\startitem the current language, each language can have different patterns \stopitem
44\startitem the characters, as some characters might block hyphenation \stopitem
45\startitem the settings of \type {\lefthyphenmin} and \type {\righthyphenmin} \stopitem
46\stopitemize
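
These factors can be varied in a small test. The following sketch only uses
commands that appear elsewhere in this chapter; the test word and the minima are
arbitrary, and the narrow \type {\hsize} merely forces every permitted break:

\starttyping
\starthyphenation[traditional]
    \hsize 1mm \dontcomplain
    \language[en] \lefthyphenmin 2 \righthyphenmin 3 hyphenation \par
    \language[en] \lefthyphenmin 4 \righthyphenmin 4 hyphenation \par
    \language[nl] \lefthyphenmin 2 \righthyphenmin 2 hyphenation \par
\stophyphenation
\stoptyping

With larger minima fewer break points should remain near the word edges, and
switching the language applies a different pattern set to the same word.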
47
48A place where a word can be hyphenated is called a discretionary. When \TEX\
49analyzes a stream, it will inject discretionary nodes into that stream.
50
51\starttyping
52pat\discretionary{-}{}{}tern.
53\stoptyping
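
The three arguments of \type {\discretionary} are the text that ends the line
before the break, the text that starts the next line, and the text that is used
when no break happens. A classic case where all three differ is the old German
spelling rule for \type {ck}, written out by hand here:

\starttyping
Zu\discretionary{k-}{k}{ck}er
\stoptyping

Unbroken this gives \type {Zucker}, broken it gives \type {Zuk-} at the end of
the line and \type {ker} at the start of the next one.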
54
55In traditional \TEX\ hyphenation, ligature building and kerning are tightly
56interwoven which is quite effective. However, there was also a strong
57relationship between the current font and hyphenation. This is a side effect of
58traditional \TEX\ having at most 256 characters in a font and the fact that the
used character is in fact a reference to a slot in a font. There, a character in the
60input initially ends up as a character node and eventually becomes a glyph node.
61For instance two characters \type {fi} can become a ligature glyph representing
62this combination.
63
64In \LUATEX\ the hyphenation, ligature building and kerning stages are separated
65and can be overloaded. In \CONTEXT\ all three can be replaced by code written in
66\LUA. Because normally hyphenation happens before font logic is applied, there is
67no relationship with font encoding. I wrote the first \LUA\ version of the
hyphenator on a rainy weekend and the result was not that bad so it was presented
69at the 2014 \CONTEXT\ meeting. After some polishing I decided to add this routine
70to the standard \MKIV\ repertoire which then involved some proper interfacing.
71
72You can enable the \LUA\ variant with the following command:
73
74\starttyping
75\setuphyphenation[method=traditional]
76\stoptyping
77
78We call this method \type {traditional} because in principle we can have
many more methods and this one is (supposed to be) mostly compatible with the
80built-in method. This is a global setting. You can switch back with:
81
82\starttyping
83\setuphyphenation[method=default]
84\stoptyping
85
86In the next sections we will see how we can provide alternatives within the
traditional method. These alternatives can be set locally and therefore can operate
88over a limited range of characters.
89
90One complication in interfacing is that \TEX\ has grouping (which permits local
91settings) and we want to limit some of the above functionality using groups. At
92the same time hyphenation is a paragraph related action so we need to enable the
93hyphenation related code at a global level (or at least make sure that it gets
94exercised by forcing a \type {\par}). That means that the alternative
95hyphenator has to be quite compatible so that we could just enable it for a whole
96document. This can have an impact on performance but in practice that can be
97neglected. In \LUATEX\ the \LUA\ variant is 4~times slower than the built-in one,
98in \LUAJITTEX\ it's 3~times slower. But the good news is that the amount of time
99spent in the hyphenator is relatively small compared to other manipulations and
100macro expansion. The additional time needed for loading and preparing the
101patterns into a more \LUA\ specific format can be neglected.
102
103You can check how words get hyphenated using the patterns management script:
104
105\starttyping
106>mtxrun --script patterns --hyphenate language
107
108hyphenator      |
109hyphenator      | . l a n g u a g e .   . l a n g u a g e .
110hyphenator      |    0a2n0               0 0 2 0 0 0 0 0 0
111hyphenator      |    2a0n0g0             0 2 2 0 0 0 0 0 0
112hyphenator      |      0n1g0u0           0 2 2 1 0 0 0 0 0
113hyphenator      |        0g0u4a0         0 2 2 1 0 4 0 0 0
114hyphenator      |              2g0e0.0   0 2 2 1 0 4 2 0 0
115hyphenator      | .0l2a2n1g0u4a2g0e0.   . l a n-g u a g e .
116hyphenator      |
117mtx-patterns    | us 3 3 : language : lan-guage
118\stoptyping
119
120\stopsection
121
122\startsection[title=The last words]
123
124Mid 2014 we had to upgrade a style for a \PDF\ assembly service: chapters from
125(technical) school books are combined into arbitrary new books. There are some
126nasty aspects with this flow: for instance, all section numbers in a chapter are
127replaced by new numbers and this also involves figure and table prefixes.
128It boils down to splitting up books, analyzing the typeset content and
129preparing it for replacements. The structure is described in \XML\ files so that
130we can generate tables of contents. The reason for not generating from \XML\
sources is that the publisher doesn't have an \XML\ workflow and that books
132already were available. Also, books from several series are combined and even
133within a series structure (and rendering) differs.
134
135What has this to do with hyphenation? Writing a style for such a flow always
results in a more complex one than estimated and as usual it's in the details.
137The original style was written in \MKII\ and used some box juggling to achieve
138reasonable results but in \MKIV\ we can do better.
139
140Each chapter has a title and books get titles and subtitles as well. The titles
141are typeset each time a new book is composed. This happens within some layout
142constraints. Think of constraints like these:
143
144\startitemize[packed]
145\startitem the title goes on top of a shape that doesn't permit much overflow \stopitem
146\startitem there can be very long words (not uncommon in Dutch or German) \stopitem
147\startitem a short word or hyphenated part should not end up on the last line \stopitem
148\startitem the left and right hyphenation minima are at least four \stopitem
149\stopitemize
150
151The last requirement is a compromise because in most cases publishers seem to
152want ragged right not hyphenated rendering (at least in Dutch schoolbooks). The
153arguments for this are quite weak and probably originate in fear of bad rendering
given past experiences. It's this kind of situation that drives the development
155of the more obscure features that ship with \CONTEXT\ and a (partial) solution
156for this specific case will be given later.
157
158If you look at thousands of titles and turn these into (small) paragraphs \TEX\
159does a pretty good job. It's the few exceptions that we need to catch. The next
160examples demonstrate such an extreme case.
161
162\startbuffer[example]
163\dorecurse{5} { % dejavu
164    \startlinecorrection[blank]
165        \bTABLE
166            \bTR
167                \bTD[align=middle,width=2em,foregroundstyle=bold]
168                    #1
169                \eTD
170                \bTD[align={verytolerant,flushleft},width=15em,offset=1ex]
171                    \hsize \dimexpr11\emwidth-#1\dimexpr.5\emwidth\relax
172                    \dontcomplain
173                    \lefthyphenmin=4\righthyphenmin=4
174                    \blackrule[color=darkyellow,width=\hsize,height=-3pt,depth=5pt]\par
175                    \begstrut\getbuffer[long]\endstrut\par
176                \eTD
177                \bTD[align={verytolerant,flushleft},width=15em,offset=1ex]
178                    \sethyphenationfeatures[demo]
179                    \hsize \dimexpr11\emwidth-#1\dimexpr.5\emwidth\relax
180                    \dontcomplain
181                    \blackrule[color=darkyellow,width=\hsize,height=-3pt,depth=5pt]\par
182                    \begstrut\getbuffer[long]\endstrut\par
183                \eTD
184            \eTR
185        \eTABLE
186    \stoplinecorrection
187}
188\stopbuffer
189
190\definehyphenationfeatures
191  [demo]
192  [rightwords=1,
193   lefthyphenmin=4,
194   righthyphenmin=4]
195
196\startbuffer[long]
197a verylongword and then anevenlongerword
198\stopbuffer
199
200\starthyphenation[traditional]
201    \enabletrackers[hyphenator.visualize]
202    \getbuffer[example]\par
203    \disabletrackers[hyphenator.visualize]
204\stophyphenation
205
Of course in practice there needs to be some reasonable width and when we pose
207these limits the longest possible word should fit into the allocated space. In
208these examples the rule shows the width. In the right columns we see a red
209colored word and that one will not get hyphenated.
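
Outside the table construction, a minimal sketch of the same setup could look as
follows. The feature set name \type {titles} and the width are made up here, and
the reading of \type {rightwords=1} as \quotation {leave the last word alone} is
based on the example above:

\starttyping
\definehyphenationfeatures
  [titles]
  [rightwords=1,
   lefthyphenmin=4,
   righthyphenmin=4]

\starthyphenation[traditional]
    \sethyphenationfeatures[titles]
    \setupalign[verytolerant,flushleft]
    \hsize 8em
    a verylongword and then anevenlongerword \par
\stophyphenation
\stoptyping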
210
211\stopsection
212
213\startsection[title=Explicit hyphens]
214
Another special case that we needed to handle was (compound) words with explicit
hyphens. Because often data comes from \XML\ files we cannot really control the
217typesetting as in a \TEX\ document where the author sees what gets done. So here
218we need a way to turn these hyphens into proper hyphenation directives and at the
219same time permit the words to be hyphenated.
220
221\definehyphenationfeatures
222  [demo]
223  [hyphens=yes,
224   lefthyphenmin=4,
225   righthyphenmin=4]
226
227\startbuffer[long]
228a very-long-word and then an-even-longer-word
229\stopbuffer
230
231\starthyphenation[traditional]
232    \enabletrackers[hyphenator.visualize]
233    \getbuffer[example]\par
234    \disabletrackers[hyphenator.visualize]
235\stophyphenation
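
The feature can also be tried in isolation. In this sketch the name
\type {compounds} is made up, and the narrow \type {\hsize} only serves to force
every possible break:

\starttyping
\definehyphenationfeatures
  [compounds]
  [hyphens=yes]

\starthyphenation[traditional]
    \sethyphenationfeatures[compounds]
    \hsize 1mm \dontcomplain
    an-even-longer-word \par
\stophyphenation
\stoptyping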
236
237\stopsection
238
239\startsection[title=Extended patterns]
240
241As with more opened up mechanisms, in \MKIV\ we can extend functionality. As an
242example I have implemented the extensions discussed in the article by László
243Németh in the Proceedings of Euro\TEX\ 2006: {\em Hyphenation in OpenOffice.org}
(TUGboat, Volume 27, 2006). The syntax for these extensions is somewhat ugly and
245involves optional offsets and ranges. \footnote {I'm not sure if there were ever
246patterns released that used this syntax.}
247
248\startbuffer
249\registerhyphenationpattern[nl][e1ë/e=e]
250\registerhyphenationpattern[nl][a9atje./a=t,1,3]
251\registerhyphenationpattern[en][eigh1tee/t=t,5,1]
252\registerhyphenationpattern[de][c1k/k=k]
253\registerhyphenationpattern[de][schif1f/ff=f,5,2]
254\stopbuffer
255
256\typebuffer \getbuffer
257
258These patterns result in the following hyphenations:
259
260\starthyphenation[traditional]
261    \switchtobodyfont[big]
262    \starttabulate[|||]
263        \NC reëel      \NC \language[nl]\hyphenatedcoloredword{reëel}      \NC \NR
264        \NC omaatje    \NC \language[nl]\hyphenatedcoloredword{omaatje}    \NC \NR
265        \NC eighteen   \NC \language[en]\hyphenatedcoloredword{eighteen}   \NC \NR
266        \NC Zucker     \NC \language[de]\hyphenatedcoloredword{Zucker}     \NC \NR
267        \NC Schiffahrt \NC \language[de]\hyphenatedcoloredword{Schiffahrt} \NC \NR
268    \stoptabulate
269\stophyphenation
270
271In a specification, the \type {.} indicates a word boundary and numbers indicate
272the weight of a breakpoint. The optional extended specification comes after the
273\type {/}. The values separated by a \type {=} are the pre and post sequences:
274these end up at the end of the current line and beginning of the next one. The
275optional numbers are the start position and length. These default to~1 and~2, so
in the first example they identify \type {eë} (the weights don't count).
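
As a worked example, take the last pattern from the list above and strip the
weights to get the character positions:

\starttyping
pattern      : schif1f/ff=f,5,2
characters   : s c h i f f      (positions 1 to 6)
start,length : 5,2              (selects "ff")
pre          : ff               (ends the current line)
post         : f                (starts the next line)
result       : Schiff-fahrt
\stoptyping

In the same way \type {eigh1tee/t=t,5,1} selects the single \type {t} at
position~5 and yields \type {eight-teen}.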
277
278There is a pitfall here. When the language already has patterns that for
instance prohibit a hyphen between \type {e} and \type {ë}, like \type{e2ë}, we
280need to make sure that we give our new one a higher priority, which is why we
281used a \type{e9ë}.
282
283This feature is somewhat experimental and can be improved. Here is a more \LUA-ish
284way of setting such patterns:
285
286\starttyping
287local registerpattern =
288    languages.hyphenators.traditional.registerpattern
289
290registerpattern("nl","e1ë", {
291    start  = 1,
292    length = 2,
293    before = "e",
294    after  = "e",
295} )
296
297registerpattern("nl","a9atje./a=t,1,3")
298\stoptyping
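
Assuming the table form accepts the same fields for the other patterns, the
German \type {schif1f/ff=f,5,2} would read as follows. This is an untested
sketch that reuses the local \type {registerpattern} from the previous listing:

\starttyping
registerpattern("de","schif1f", {
    start  = 5,
    length = 2,
    before = "ff",
    after  = "f",
} )
\stoptyping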
299
300Just adding extra patterns to an existing set without much testing is not wise. For
instance we could add these to the Dutch dictionary:
302
303\starttyping
304\registerhyphenationpattern[nl][e3ë/e=e]
305\registerhyphenationpattern[nl][o3ë/o=e]
306\registerhyphenationpattern[nl][e3ï/e=i]
307\registerhyphenationpattern[nl][i3ë/i=e]
308\registerhyphenationpattern[nl][a5atje./a=t,1,3]
309\registerhyphenationpattern[nl][toma8at5je]
310\stoptyping
311
That would work well for words like
313
314\starttyping
315coëfficiënt
316geïntroduceerd
317copiëren
318omaatje
319tomaatje
320\stoptyping
321
322However, the last word only goes right because we explicitly added a pattern
323for it. One reason is that the existing patterns already contain rules to
324prevent weird hyphenations. The same is true for the accented characters. So,
325consider these examples and coordinate additional patterns with other users
326so that errors can be identified.
327
328\stopsection
329
330\startsection[title=Exceptions]
331
332We have a variant on the \TEX\ primitive \type {\hyphenation}, the official way
333to register a specific way to hyphenate a word.
334
335\startbuffer
336\registerhyphenationexception[aaaaa-bbbbb]
337aaaaabbbbb \par
338\stopbuffer
339
340\typebuffer
341
This code is self-explanatory and results in:
343
344\blank
345
346\starthyphenation[traditional]
347\setupindenting[no]\hsize 1mm \lefthyphenmin 1 \righthyphenmin 1 \getbuffer
348\stophyphenation
349
350There can be multiple hyphens and even multiple words in such a specification:
351
352\startbuffer
353\registerhyphenationexception[aaaaa-bbbbb cc-ccc-ddd-dd]
354aaaaabbbbb \par
355cccccddddd \par
356\stopbuffer
357
358\typebuffer
359
360We get:
361
362\blank
363
364\starthyphenation[traditional]
365\setupindenting[no]\hsize 1mm \lefthyphenmin 1 \righthyphenmin 1 \getbuffer
366\stophyphenation
367
368
369\stopsection
370
371\startsection[title=Boundaries]
372
373A box, rule, math or discretionary will end a word and prohibit hyphenation
374of that word. Take this example:
375
376\startbuffer[demo]
377whatever \par
378whatever\hbox{!} \par
379\vl whatever\vl \par
380whatever$x$ \par
381whatever-whatever \par
382\stopbuffer
383
384\typebuffer[demo]
385
These lines will hyphenate differently and in traditional \TEX\ you need to
insert penalties and|/|or glue to get around it unless you instruct \LUATEX\ to
be more tolerant. In the \LUA\ variant we can enable that limitation.
389
390\startbuffer
391\definehyphenationfeatures
392  [strict]
393  [rightedge=tex]
394\stopbuffer
395
396\typebuffer \getbuffer
397
398Here we show the three variants: traditional \TEX\ and \LUA\ with and without
399strict settings.
400
401\starttabulate[|p|p|p|]
402\HL
403\NC \ttbf \hbox to 11em{default\hss}
404\NC \ttbf \hbox to 11em{traditional\hss}
405\NC \ttbf \hbox to 11em{traditional strict\hss}
406\NC \NR
407\HL
408\NC \starthyphenation[default]     \hsize1mm \getbuffer[demo] \stophyphenation
409\NC \starthyphenation[traditional] \hsize1mm \getbuffer[demo] \stophyphenation
410\NC \starthyphenation[traditional] \sethyphenationfeatures[strict]
411                                   \hsize1mm \getbuffer[demo] \stophyphenation
412\NC \NR
413\HL
414\stoptabulate
415
416By default \CONTEXT\ is configured to hyphenate words that start with an
417uppercase character. This behaviour is controlled in \TEX\ by the \typ {\uchyph}
variable. A positive value will enable this and a zero or negative one disables it.
419
420\starttabulate[|p|p|p|p|]
421\HL
422\NC \ttbf \hbox to 8em{default     0\hss}
423\NC \ttbf \hbox to 8em{default     1\hss}
424\NC \ttbf \hbox to 8em{traditional 0\hss}
425\NC \ttbf \hbox to 8em{traditional 1\hss}
426\NC \NR
427\HL
428\NC \starthyphenation[default]     \hsize1mm \uchyph\zerocount TEXified \dontcomplain \stophyphenation
\NC \starthyphenation[default]     \hsize1mm \uchyph\plusone   TEXified \dontcomplain \stophyphenation
\NC \starthyphenation[traditional] \hsize1mm \uchyph\zerocount TEXified \dontcomplain \stophyphenation
431\NC \starthyphenation[traditional] \hsize1mm \uchyph\plusone   TEXified \dontcomplain \stophyphenation
432\NC \NR
433\HL
434\stoptabulate
435
The \LUA\ variant behaves the same as the built-in implementation (that of course
437remains the reference).
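
A quick check of this setting, using the same narrow \type {\hsize} trick as
elsewhere in this chapter (the test word is arbitrary):

\starttyping
\starthyphenation[traditional]
    \hsize 1mm \dontcomplain
    \uchyph\zerocount Hyphenation \par
    \uchyph\plusone   Hyphenation \par
\stophyphenation
\stoptyping

Only the second paragraph should show break points.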
438
439\stopsection
440
441\startsection[title=Plug-ins]
442
443The default hyphenator is similar to the built-in one, with a couple of
444extensions as mentioned. However, you can plug in your own code, given that it
445does return a proper hyphenation result. One reason for providing this plug is
that there are users who want to play with hyphenators based on a different
logic. In \CONTEXT\ we already have some methods to deal with languages that
(for instance) have no spaces but split on words or syllables. A tighter
integration with the hyphenator can have advantages so I will explore these
450options when there is demand.
451
452A result table indicates where we can break a word. If we have a four character
453word and can break after the second character, the result looks like this:
454
455\starttyping
456result = { false, true, false, false }
457\stoptyping
458
459Instead of \type {true} we can also have a table that has entries like the
460extensions discussed in a previous section. Let's give an example of a
461plug-in.
462
463\startbuffer
464\startluacode
465    local subset = {
466        a = true,
467        e = true,
468        i = true,
469        o = true,
470        u = true,
471        y = true,
472    }
473
474    languages.hyphenators.traditional.installmethod("test",
475        function(dictionary,word,n)
476            local t = { }
477            for i=1,#word do
478                local w = word[i]
479                if subset[w] then
480                    t[i] = {
481                        before = "<" .. w,
482                        after  = w .. ">",
483                        left   = false,
484                        right  = false,
485                    }
486                else
487                    t[i] = false
488                end
489            end
490            return t
491        end
492    )
493\stopluacode
494\stopbuffer
495
496\typebuffer \getbuffer
497
498Here we hyphenate on vowels and surround them by angle brackets when
499split over lines. This alternative is installed as follows:
500
501\startbuffer
502\definehyphenationfeatures
503  [demo]
504  [alternative=test]
505\stopbuffer
506
507\typebuffer \getbuffer
508
509We can now use it as follows:
510
511\starttyping
512\setuphyphenation[method=traditional]
513\sethyphenationfeatures[demo]
514\stoptyping
515
When applied to the tufte example we get:
517
518\startbuffer[demo]
519\starthyphenation[traditional]
520    \setuptolerance[tolerant]
521    \sethyphenationfeatures[demo]
522    \dontleavehmode
523    \input tufte\relax
524\stophyphenation
525\stopbuffer
526
527\blank \startnarrower \getbuffer[demo] \stopnarrower \blank
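
For comparison, a plug-in that returns plain booleans can be even simpler. The
following untested sketch follows the interface of the example above; the method
name \type {middle} is made up and the rule (one break near the middle of the
word) is only meant as an illustration:

\starttyping
\startluacode
    languages.hyphenators.traditional.installmethod("middle",
        function(dictionary,word,n)
            local t = { }
            local m = math.floor(#word/2)
            for i=1,#word do
                -- true marks a permitted break after character i
                t[i] = (i == m)
            end
            return t
        end
    )
\stopluacode

\definehyphenationfeatures
  [middle]
  [alternative=middle]
\stoptyping

For a four character word this returns the same table as the result shown
earlier: a single \type {true} in the second slot.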
528
529A more realistic (but not perfect) example is the following:
530
531\startbuffer
532\startluacode
533    local packslashes = false
534
535    local specials = {
536        ["!"]  = "before", ["?"]  = "before",
537        ['"']  = "before", ["'"]  = "before",
538        ["/"]  = "before", ["\\"] = "before",
539        ["#"]  = "before",
540        ["$"]  = "before",
541        ["%"]  = "before",
542        ["&"]  = "before",
543        ["*"]  = "before",
544        ["+"]  = "before", ["-"]  = "before",
545        [","]  = "before", ["."]  = "before",
546        [":"]  = "before", [";"]  = "before",
547        ["<"]  = "before", [">"]  = "before",
548        ["="]  = "before",
549        ["@"]  = "before",
550        ["("]  = "before",
551        ["["]  = "before",
552        ["{"]  = "before",
553        ["^"]  = "before", ["_"]  = "before",
554        ["`"]  = "before",
555        ["|"]  = "before",
556        ["~"]  = "before",
557        --
558        [")"]  = "after",
559        ["]"]  = "after",
560        ["}"]  = "after",
561    }
562
563    languages.hyphenators.traditional.installmethod("url",
564        function(dictionary,word,n)
565            local t = { }
566            local p = nil
567            for i=1,#word do
568                local w = word[i]
569                local s = specials[w]
570                if s == "after" then
571                    s = {
572                        start  = 1,
573                        length = 1,
574                        after  = w,
575                        left   = false,
576                        right  = false,
577                    }
578                    specials[w] = s
579                elseif s == "before" then
580                    s = {
581                        start  = 1,
582                        length = 1,
583                        before = w,
584                        left   = false,
585                        right  = false,
586                    }
587                    specials[w] = s
588                end
589                if not s then
590                    s = false
591                elseif w == p and w == "/" then
592                    t[i-1] = false
593                end
594                t[i] = s
595                if packslashes then
596                    p = w
597                end
598            end
599            return t
600        end
601    )
602\stopluacode
603\stopbuffer
604
605\typebuffer \getbuffer
606
607Again we define a plug:
608
609\startbuffer
610\definehyphenationfeatures
611  [url]
612  [characters=all,
613   alternative=url]
614\stopbuffer
615
616\typebuffer \getbuffer
617
618So, we only break a line after symbols.
619
620\startlinecorrection[blank]
621    \starthyphenation[traditional]
622        \tt
623        \sethyphenationfeatures[url]
624        \scale[width=\hsize]{\hyphenatedcoloredword{http://www.pragma-ade.nl}}
625    \stophyphenation
626\stoplinecorrection
627
628A quick test can look as follows:
629
630\startbuffer
631\starthyphenation[traditional]
632    \sethyphenationfeatures[url]
633    \tt
634    \dontcomplain
635    \hsize 1mm
636    http://www.pragma-ade.nl
637\stophyphenation
638\stopbuffer
639
640\typebuffer
641
642Or:
643
644\getbuffer
645
646\stopsection
647
648\startsection[title=Blocking ligatures]
649
650Yet another predefined feature is the ability to block a ligature. In
651traditional \TEX\ this can be done by putting a \type {{}} between
652the characters, although that effect can get lost when the text is
653manipulated. The natural way to do this in a \UNICODE\ environment
654is to use the special characters \type {zwj} and \type {zwnj}.
655
656We use the following example lines:
657
658\startbuffer[sample]
659supereffective \blank
660superef\zwnj fective
661\stopbuffer
662
663\typebuffer[sample]
664
665and define two featuresets:
666
667\startbuffer
668\definehyphenationfeatures
669  [demo-1]
670  [characters=\zwnj\zwj,
671   joiners=yes]
672
673\definehyphenationfeatures
674  [demo-2]
675  [joiners=no]
676\stopbuffer
677
678\typebuffer \getbuffer
679
680We limit the width to 1mm and get:
681
682\startlinecorrection[blank]
683\bTABLE[option=stretch,offset=.5ex]
684    \bTR
685        \bTD \tx
686            \type{method=default}
687        \eTD
688        \bTD \tx
689            \type{method=traditional}
690        \eTD
691        \bTD \tx
692            \type{method=traditional}\par
693            \type{featureset=demo-1}
694        \eTD
695        \bTD \tx
696            \type{method=traditional}\par
697            \type{featureset=demo-2}
698        \eTD
699    \eTR
700    \bTR
701        \bTD
702            \hsize 1mm \dontcomplain
703            \starthyphenation[default]
704                \getbuffer[sample]
705            \stophyphenation
706        \eTD
707        \bTD
708            \hsize 1mm \dontcomplain
709            \starthyphenation[traditional]
710                \getbuffer[sample]
711            \stophyphenation
712        \eTD
713        \bTD
714            \hsize 1mm \dontcomplain
715            \starthyphenation[traditional]
716                \sethyphenationfeatures[demo-1]
717                \getbuffer[sample]
718            \stophyphenation
719        \eTD
720        \bTD
721            \hsize 1mm \dontcomplain
722            \starthyphenation[traditional]
723                \sethyphenationfeatures[demo-2]
724                \getbuffer[sample]
725            \stophyphenation
726        \eTD
727    \eTR
728\eTABLE
729\stoplinecorrection
730
731\stopsection
732
733\startsection[title=Special characters]
734
The \type {characters} example can be used (to some extent) to do the
736same as the breakpoints mechanism (compounds).
737
738\startbuffer
739\definehyphenationfeatures
740  [demo-3]
741  [characters={()[]}]
742\stopbuffer
743
744\typebuffer \blank \getbuffer \blank
745
746\startbuffer[demo]
747\starthyphenation[traditional]
748    \sethyphenationfeatures[demo-3]
749    \dontcomplain
750    \hsize 1mm
751    we use (super)special(ized) patterns
752\stophyphenation
753\stopbuffer
754
755\typebuffer[demo] \blank \getbuffer[demo] \blank
756
757We can make this more clever by adding patterns:
758
759\startbuffer
760\registerhyphenationpattern[en][)9]
761\registerhyphenationpattern[en][9(]
762\stopbuffer
763
764\typebuffer \blank \getbuffer \blank
765
766This gives:
767
768\blank \getbuffer[demo] \blank
769
770A detailed trace shows that these patterns get applied:
771
772\starthyphenation[traditional]
773    \ttx
774    \showhyphenationtrace[en][(super)special(ized)]
775\stophyphenation
776
777\unregisterhyphenationpattern[en][)9]
778\unregisterhyphenationpattern[en][9(]
779
780The somewhat weird hyphens at the edges will in practice not show up because
781there is always one regular character there.
782
783\stopsection
784
785\startsection[title=Counting]
786
787There is not much you can do about patterns. It's a craft to make them and so
788they are shipped with the distribution. In order to hyphenate well, \TEX\ looks
789at some character properties. In \CONTEXT\ only the characters used in the
790patterns of a language get tagged as valid in a word.
791
792The following example illustrates that there can be corner cases. In fact, this
793example might render differently depending on the patterns available. First we
794define an extra language, based on French.
795
796\startbuffer
797\installlanguage[frf][default=fr,patterns=fr,factor=yes]
798\stopbuffer
799
800\typebuffer \getbuffer
801
802Here we set the \type {factor} parameter which tells the loader that it should
803look at the characters used in a special way: some count for none, and some count
804for more than one when determining the min values used to determine if and where
805hyphenation is to be applied.
806
807\startbuffer
808\startmixedcolumns[n=3,balance=yes]
809    \hsize 1mm \dontcomplain
810    \language[fr]  aesop oedipus æsop œdipus \column
811    \hsize 1mm \dontcomplain
812    \language[frf] aesop oedipus æsop œdipus \column
813    \startexceptions æ-sop \stopexceptions
814    \hsize 1mm \dontcomplain
815    \language[frf] aesop oedipus æsop œdipus
816\stopmixedcolumns
817\stopbuffer
818
819\typebuffer
820
821We get three (when writing this manual) different columns:
822
823\getbuffer
824
825The trick is in the \type {factor}: when set to \type {yes} an \type {æ} is
826counted as two characters. Combining marks count as zero but you will not
827find them being used as we already resolve them in an earlier stage.
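
As a sketch of the arithmetic, assume a left minimum of~2 (the actual value
depends on the language setup) and a candidate break after the first character:

\starttyping
word   factor  left part  counted  break after first character
aesop  any     a          1        blocked (1 < 2)
æsop   no      æ          1        blocked (1 < 2)
æsop   yes     æ          2        possible (2 >= 2) if the patterns permit it
\stoptyping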
828
829\startluacode
830context.startcolumns { n = 2 }
831context.starttabulate { "|Tc|c|c|l|" }
832for u, data in table.sortedhash(languages.hjcounts) do
833    if data.category ~= "combining" then
834        context.NC() context("%05U",u)
835        context.NC() context("%c",u)
836        context.NC() context(data.count)
837        context.NC() context(data.category)
838        context.NC() context.NR()
839    end
840end
841context.stoptabulate()
842context.stopcolumns()
843\stopluacode
844
845It is very unlikely to find an \type {} in the input and even an \type {ij} is
rare. The \type {æ} is marked as a character and the \type {œ} as a ligature in
\UNICODE. Maybe all the characters here are dubious but at least we provide a
848way to experiment with them.
849
850\stopsection
851
852\startsection[title=Tracing]
853
854Among the tracing options (low level trackers) there is one for pattern developers:
855
856\startbuffer
857\usemodule[s-languages-hyphenation]
858
859\startcomparepatterns[de,nl,en,fr]
860    \input zapf \quad (\showcomparepatternslegend)
861\stopcomparepatterns
862\stopbuffer
863
864\typebuffer
865
866The different hyphenation points are shown with colored bars. Some valid points
867might not be shown because the font engine can collapse successive
868discretionaries.
869
870\getbuffer
871
872\stopsection
873
874\startsection[title=Neat tricks]
875
876The following two examples are for users to test. The first one shows all hyphenation
877points in a paragraph:
878
879\starttyping
880\bgroup
881    \setupalign[flushright]
882    \hyphenpenalty-100000
883    \input tufte
884    \par % force hyphenation
885\egroup
886\stoptyping
887
888The second one shows the cases where a hyphenated word ends a page:
889
890\starttyping
891\bgroup
892    \page
893    \interlinepenalty10000
894    \brokenpenalty-10000
895    \input tufte
896    \page
897\egroup
898\stoptyping
899
900A less space consuming variant of that one is:
901
902\starttyping
903\bgroup
904    \setbox\scratchboxone\vbox \bgroup
905        \interlinepenalty10000
906        \brokenpenalty-10000
907        \input tufte
908    \egroup
909    \doloop {
910        \ifvoid\scratchboxone
911            \hrule
912            \exitloop
913        \else
914            \setbox\scratchboxtwo\vsplit\scratchboxone to 1pt
915            \hrule
916            \unvbox\scratchboxtwo
917        \fi
918    }
919\egroup
920\stoptyping
921
922\stopsection
923
924\stopchapter
925
926\stopcomponent
927