% language=us runpath=texruns:manuals/languages
% languages-hyphenation.tex (size: 28 Kb, last modification: 2025-02-21 11:03)

\startcomponent languages-hyphenation

\environment languages-environment

\startchapter[title=Hyphenation][color=darkmagenta]

\startsection[title=How it works]

Proper hyphenation is one of the strong points of \TEX. Hyphenation in \TEX\ is
done using so-called hyphenation patterns. Making these patterns is an art
and most users (including me) happily use whatever is available. Patterns can be
created automatically using \type {patgen} but often manual tweaking is needed
too. A pattern looks as follows:

\starttyping
pat1tern
\stoptyping

This means that you can split the word \type {pattern} in two pieces, with
a hyphen between the two \type {t}'s. Actually it will also split the word \type
{patterns} because the hyphenation mechanism looks at substrings. When no number
is given between two characters in a pattern, a zero is assumed, which effectively
means {\em undefined}. An even number inhibits hyphenation, an odd number permits
it. The larger the number (weight), the more influence it has: when several
patterns apply to the same spot, the largest value wins. A more restricted
pattern is:

\starttyping
.pat1tern.
\stoptyping

Here the periods mark the word boundaries, so this pattern only matches the word
\type {pattern} itself. The pattern dictionary for US English has smaller
patterns and the next trace shows how these are applied.

\starthyphenation[traditional]
\showhyphenationtrace[en][pattern]
\stophyphenation

The effective hyphenation of a word is determined by several factors:

\startitemize[packed]
\startitem the current language, as each language can have different patterns \stopitem
\startitem the characters, as some characters might block hyphenation \stopitem
\startitem the settings of \type {\lefthyphenmin} and \type {\righthyphenmin}, the minimum number of characters before and after a hyphen \stopitem
\stopitemize
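
The last factor can be illustrated with a small stand|-|alone \LUA\ sketch. It is
not taken from \CONTEXT: the function name \type {applyhyphenmins} is made up for
this illustration, and breaks are given as character positions, where a value
\type {k} means a break after character \type {k}. It simply drops candidate
breaks that would leave fewer than \type {lefthyphenmin} characters before, or
fewer than \type {righthyphenmin} characters after, the hyphen. It counts bytes,
so it assumes single byte characters.

\starttyping
-- Sketch only: keep the candidate breaks that respect the hyphenation minima.
-- A break k means "break after character k" (single byte characters assumed).
local function applyhyphenmins(word, breaks, lefthyphenmin, righthyphenmin)
  local n, kept = #word, {}
  for _, k in ipairs(breaks) do
    -- at least lefthyphenmin characters before and righthyphenmin after
    if k >= lefthyphenmin and n - k >= righthyphenmin then
      kept[#kept + 1] = k
    end
  end
  return kept
end

-- pat-tern survives minima of 2 and 3, but not minima of 4 and 3
print(#applyhyphenmins("pattern", { 3 }, 2, 3))
print(#applyhyphenmins("pattern", { 3 }, 4, 3))
\stoptyping

With both minima set to four, as in some of the examples later on, even
\type {lan-guage} loses its only break.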

A place where a word can be hyphenated is called a discretionary. When \TEX\
analyzes a stream, it will inject discretionary nodes into that stream.

\starttyping
pat\discretionary{-}{}{}tern.
\stoptyping
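
The three arguments of \type {\discretionary} are the pre|-|break text (here the
hyphen), the post|-|break text and the no|-|break text. The following stand|-|alone
\LUA\ sketch only models these roles with plain strings; the names
\type {discretionary} and \type {render} are made up for this illustration and
have nothing to do with how \LUATEX\ stores nodes. The second case uses the
classic (old orthography) German \type {ck}, where the spelling changes when the
word is broken.

\starttyping
-- A discretionary modelled as three strings (illustration only).
local function discretionary(pre, post, replace)
  return { pre = pre, post = post, replace = replace }
end

-- Unbroken: the replace text is used. Broken: pre ends the first line and
-- post starts the next one, so two line parts are returned.
local function render(before, d, after, broken)
  if broken then
    return before .. d.pre, d.post .. after
  end
  return before .. d.replace .. after
end

local hyphen = discretionary("-", "", "")
print(render("pat", hyphen, "tern", false)) -- "pattern"
print(render("pat", hyphen, "tern", true))  -- "pat-" and "tern"

local ck = discretionary("k-", "k", "ck")
print(render("Zu", ck, "er", false))        -- "Zucker"
print(render("Zu", ck, "er", true))         -- "Zuk-" and "ker"
\stoptyping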

In traditional \TEX\ hyphenation, ligature building and kerning are tightly
interwoven, which is quite effective. However, there is also a strong
relationship between the current font and hyphenation. This is a side effect of
traditional \TEX\ having at most 256 characters in a font and the fact that the
character used is in fact a reference to a slot in a font. There, a character in
the input initially ends up as a character node and eventually becomes a glyph
node. For instance, the two characters \type {fi} can become a ligature glyph
representing this combination.

In \LUATEX\ the hyphenation, ligature building and kerning stages are separated
and can be overloaded. In \CONTEXT\ all three can be replaced by code written in
\LUA. Because normally hyphenation happens before font logic is applied, there is
no relationship with font encoding. I wrote the first \LUA\ version of the
hyphenator on a rainy weekend and the result was not that bad, so it was presented
at the 2014 \CONTEXT\ meeting. After some polishing I decided to add this routine
to the standard \MKIV\ repertoire, which then involved some proper interfacing.

You can enable the \LUA\ variant with the following command:

\starttyping
\setuphyphenation[method=traditional]
\stoptyping

We call this method \type {traditional} because in principle we can have
many more methods and this one is (supposed to be) mostly compatible with the
built-in method. This is a global setting. You can switch back with:

\starttyping
\setuphyphenation[method=default]
\stoptyping

In the next sections we will see how we can provide alternatives within the
traditional method. These alternatives can be set locally and therefore can
operate over a limited range of characters.

One complication in interfacing is that \TEX\ has grouping (which permits local
settings) and we want to limit some of the above functionality using groups. At
the same time hyphenation is a paragraph-related action, so we need to enable the
hyphenation-related code at a global level (or at least make sure that it gets
exercised by forcing a \type {\par}). That means that the alternative
hyphenator has to be quite compatible so that we can just enable it for a whole
document. This can have an impact on performance, but in practice that can be
neglected. In \LUATEX\ the \LUA\ variant is 4~times slower than the built-in one,
in \LUAJITTEX\ it is 3~times slower. But the good news is that the amount of time
spent in the hyphenator is relatively small compared to other manipulations and
macro expansion. The additional time needed for loading the patterns and
converting them into a more \LUA-specific format can be neglected.

You can check how words get hyphenated using the patterns management script:

\starttyping
>mtxrun --script patterns --hyphenate language

hyphenator      |
hyphenator      | . l a n g u a g e .   . l a n g u a g e .
hyphenator      |    0a2n0               0 0 2 0 0 0 0 0 0
hyphenator      |    2a0n0g0             0 2 2 0 0 0 0 0 0
hyphenator      |      0n1g0u0           0 2 2 1 0 0 0 0 0
hyphenator      |        0g0u4a0         0 2 2 1 0 4 0 0 0
hyphenator      |              2g0e0.0   0 2 2 1 0 4 2 0 0
hyphenator      | .0l2a2n1g0u4a2g0e0.   . l a n-g u a g e .
hyphenator      |
mtx-patterns    | us 3 3 : language : lan-guage
\stoptyping
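
The way the weights in this trace are combined can be reproduced with a short
stand|-|alone \LUA\ sketch of Liang's method, on which \TEX\ patterns are based.
It is an illustration only, not the \CONTEXT\ code: the function names
\type {weights}, \type {trace} and \type {hyphenate} are made up, it works on
single byte characters and it ignores the hyphenation minima. The five patterns
in the usage lines are read off the trace above. Each matching pattern
contributes its digits to the positions it covers, the largest value wins, and
an odd final value permits a break.

\starttyping
-- Sketch of Liang-style pattern matching (illustration only, not the
-- ConTeXt code). Single byte characters, hyphenation minima ignored.
local function weights(word, patterns)
  local w = "." .. word:lower() .. "."
  local result = {}                -- result[i]: weight between w[i-1] and w[i]
  for i = 1, #w + 1 do
    result[i] = 0
  end
  for _, pattern in ipairs(patterns) do
    local letters = pattern:gsub("%d", "")
    local values, k = {}, 1        -- values[k]: digit in front of letter k
    for c in pattern:gmatch(".") do
      if c:find("%d") then
        values[k] = tonumber(c)
      else
        k = k + 1
      end
    end
    local s = w:find(letters, 1, true)   -- plain search, "." is literal
    while s do
      for j = 1, #letters + 1 do
        local v, p = values[j] or 0, s + j - 1
        if v > result[p] then
          result[p] = v              -- the largest weight wins
        end
      end
      s = w:find(letters, s + 1, true)
    end
  end
  return w, result
end

-- Interleave characters and weights like the last line of the trace.
local function trace(word, patterns)
  local w, result = weights(word, patterns)
  local out = { w:sub(1, 1) }
  for i = 2, #w do
    out[#out + 1] = tostring(result[i])
    out[#out + 1] = w:sub(i, i)
  end
  return table.concat(out)
end

-- An odd weight between two letters permits a break.
local function hyphenate(word, patterns)
  local w, result = weights(word, patterns)
  local out = {}
  for i = 2, #w - 1 do
    if i > 2 and result[i] % 2 == 1 then
      out[#out + 1] = "-"
    end
    out[#out + 1] = word:sub(i - 1, i - 1)
  end
  return table.concat(out)
end

local patterns = { "a2n", "2ang", "n1gu", "gu4a", "2ge." }
print(trace("language", patterns))      -- .0l2a2n1g0u4a2g0e0.
print(hyphenate("language", patterns))  -- lan-guage
\stoptyping

The same functions show the difference between the two patterns from the start
of this section: \type {pat1tern} also splits \type {patterns}, while
\type {.pat1tern.} leaves that word alone.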

\stopsection

\startsection[title=The last words]

Mid 2014 we had to upgrade a style for a \PDF\ assembly service: chapters from
(technical) school books are combined into arbitrary new books. There are some
nasty aspects with this flow: for instance, all section numbers in a chapter are
replaced by new numbers and this also involves figure and table prefixes.
It boils down to splitting up books, analyzing the typeset content and
preparing it for replacements. The structure is described in \XML\ files so that
we can generate tables of contents. The reason for not generating from \XML\
sources is that the publisher doesn't have an \XML\ workflow and that the books
were already available. Also, books from several series are combined and even
within a series the structure (and rendering) differs.

What has this to do with hyphenation? Writing a style for such a flow always
results in a more complex one than estimated and, as usual, the devil is in the
details. The original style was written in \MKII\ and used some box juggling to
achieve reasonable results, but in \MKIV\ we can do better.

Each chapter has a title and books get titles and subtitles as well. The titles
are typeset each time a new book is composed. This happens within some layout
constraints. Think of constraints like these:

\startitemize[packed]
\startitem the title goes on top of a shape that doesn't permit much overflow \stopitem
\startitem there can be very long words (not uncommon in Dutch or German) \stopitem
\startitem a short word or hyphenated part should not end up on the last line \stopitem
\startitem the left and right hyphenation minima are at least four \stopitem
\stopitemize

The last requirement is a compromise because in most cases publishers seem to
want ragged-right, non-hyphenated rendering (at least in Dutch schoolbooks). The
arguments for this are quite weak and probably originate in fear of bad rendering
given past experiences. It is situations like this that drive the development
of the more obscure features that ship with \CONTEXT, and a (partial) solution
for this specific case will be given later.

If you look at thousands of titles and turn these into (small) paragraphs, \TEX\
does a pretty good job. It's the few exceptions that we need to catch. The next
examples demonstrate such an extreme case.

\startbuffer[example]
\dorecurse{5} { % dejavu
    \startlinecorrection[blank]
        \bTABLE
            \bTR
                \bTD[align=middle,width=2em,foregroundstyle=bold]
                    #1
                \eTD
                \bTD[align={verytolerant,flushleft},width=15em,offset=1ex]
                    \hsize \dimexpr11\emwidth-#1\dimexpr.5\emwidth\relax
                    \dontcomplain
                    \lefthyphenmin=4\righthyphenmin=4
                    \blackrule[color=darkyellow,width=\hsize,height=-3pt,depth=5pt]\par
                    \begstrut\getbuffer[long]\endstrut\par
                \eTD
                \bTD[align={verytolerant,flushleft},width=15em,offset=1ex]
                    \sethyphenationfeatures[demo]
                    \hsize \dimexpr11\emwidth-#1\dimexpr.5\emwidth\relax
                    \dontcomplain
                    \blackrule[color=darkyellow,width=\hsize,height=-3pt,depth=5pt]\par
                    \begstrut\getbuffer[long]\endstrut\par
                \eTD
            \eTR
        \eTABLE
    \stoplinecorrection
}
\stopbuffer

\definehyphenationfeatures
  [demo]
  [rightwords=1,
   lefthyphenmin=4,
   righthyphenmin=4]

\startbuffer[long]
a verylongword and then anevenlongerword
\stopbuffer

\starthyphenation[traditional]
    \enabletrackers[hyphenator.visualize]
    \getbuffer[example]\par
    \disabletrackers[hyphenator.visualize]
\stophyphenation

Of course, in practice there needs to be a reasonable width, and when we pose
these limits the longest possible word should fit into the allocated space. In
these examples the rule shows the width. In the right column we see a red
colored word and that one will not get hyphenated.
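
The rule behind the right column can be modelled in a few lines of \LUA. This is
a hypothetical sketch of the idea, not the implementation of \type {rightwords}:
the function name \type {candidates} is made up, and it only lists the words that
remain candidates for hyphenation when the last \type {rightwords} words of a
paragraph are protected and a word needs at least \type {lefthyphenmin} plus
\type {righthyphenmin} characters to be split at all. It counts bytes, so it
assumes single byte characters.

\starttyping
-- Hypothetical model: which words of a paragraph may still be hyphenated?
local function candidates(text, rightwords, lefthyphenmin, righthyphenmin)
  local words = {}
  for word in text:gmatch("%S+") do
    words[#words + 1] = word
  end
  local result = {}
  for i, word in ipairs(words) do
    local protected = i > #words - rightwords   -- one of the last words
    local long = #word >= lefthyphenmin + righthyphenmin
    if long and not protected then
      result[#result + 1] = word
    end
  end
  return result
end

-- the sample text used above, with rightwords=1 and both minima set to 4
local text = "a verylongword and then anevenlongerword"
print(table.concat(candidates(text, 1, 4, 4), " "))  -- verylongword
\stoptyping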

\stopsection

\startsection[title=Explicit hyphens]

Another special case that we needed to handle was (compound) words with explicit
hyphens. Because the data often comes from \XML\ files, we cannot really control
the typesetting as in a \TEX\ document where the author sees what gets done. So
here we need a way to turn these hyphens into proper hyphenation directives and
at the same time permit the words to be hyphenated.

\definehyphenationfeatures
  [demo]
  [hyphens=yes,
   lefthyphenmin=4,
   righthyphenmin=4]

\startbuffer[long]
a very-long-word and then an-even-longer-word
\stopbuffer

\starthyphenation[traditional]
    \enabletrackers[hyphenator.visualize]
    \getbuffer[example]\par
    \disabletrackers[hyphenator.visualize]
\stophyphenation

\stopsection

\startsection[title=Extended patterns]

As with other opened-up mechanisms, in \MKIV\ we can extend functionality. As an
example I have implemented the extensions discussed in the article by László
Németh in the Proceedings of Euro\TEX\ 2006: {\em Hyphenation in OpenOffice.org}
(TUGboat, Volume 27, 2006). The syntax for these extensions is somewhat ugly and
involves optional offsets and ranges. \footnote {I'm not sure if there were ever
patterns released that used this syntax.}

\startbuffer
\registerhyphenationpattern[nl][e1ë/e=e]
\registerhyphenationpattern[nl][a9atje./a=t,1,3]
\registerhyphenationpattern[en][eigh1tee/t=t,5,1]
\registerhyphenationpattern[de][c1k/k=k]
\registerhyphenationpattern[de][schif1f/ff=f,5,2]
\stopbuffer

\typebuffer \getbuffer

These patterns result in the following hyphenations:

\starthyphenation[traditional]
    \switchtobodyfont[big]
    \starttabulate[|||]
        \NC reëel      \NC \language[nl]\hyphenatedcoloredword{reëel}      \NC \NR
        \NC omaatje    \NC \language[nl]\hyphenatedcoloredword{omaatje}    \NC \NR
        \NC eighteen   \NC \language[en]\hyphenatedcoloredword{eighteen}   \NC \NR
        \NC Zucker     \NC \language[de]\hyphenatedcoloredword{Zucker}     \NC \NR
        \NC Schiffahrt \NC \language[de]\hyphenatedcoloredword{Schiffahrt} \NC \NR
    \stoptabulate
\stophyphenation

In a specification, the \type {.} indicates a word boundary and numbers indicate
the weight of a breakpoint. The optional extended specification comes after the
\type {/}. The values separated by a \type {=} are the pre and post sequences:
these end up at the end of the current line and the beginning of the next one.
The optional numbers are the start position (counted in the letters of the
pattern) and the length of the part that gets replaced by these sequences. They
default to~1 and~2, so in the first example they identify \type {eë} (the
weights don't count).

There is a pitfall here. When the language already has patterns that for
instance prohibit a hyphen between \type {e} and \type {ë}, like \type {e2ë}, we
need to make sure that we give our new one a higher priority, for instance by
using \type {e9ë} instead.

This feature is somewhat experimental and can be improved. Here is a more \LUA-ish
way of setting such patterns:

\starttyping
local registerpattern =
    languages.hyphenators.traditional.registerpattern

-- the extended part given as a table
registerpattern("nl","e1ë", {
    start  = 1,
    length = 2,
    before = "e",
    after  = "e",
} )

-- the compact string syntax, as used with \registerhyphenationpattern
registerpattern("nl","a9atje./a=t,1,3")
\stoptyping
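
To see how the start position, length and pre and post sequences work together,
here is a stand|-|alone \LUA\ sketch that applies a single extended pattern to a
word and returns the two parts of the broken word. It is not the \CONTEXT\
implementation: the names \type {chars}, \type {parse} and \type {split} are made
up, it only handles specifications that have a \type {/} part, and it assumes
that the weight in the pattern itself wins (a real hyphenator first merges all
patterns). It needs \LUA~5.3 or later for the \type {utf8} library.

\starttyping
-- Split a UTF-8 string into a table of characters.
local function chars(s)
  local t = {}
  for _, c in utf8.codes(s) do
    t[#t + 1] = utf8.char(c)
  end
  return t
end

-- "a9atje./a=t,1,3" gives letters a a t j e . with pre "a", post "t",
-- start 1 and length 3
local function parse(spec)
  local core, ext = spec:match("^([^/]+)/(.*)$")
  local pre, post, start, length = ext:match("^([^=]*)=([^,]*),?(%d*),?(%d*)$")
  return {
    letters = chars((core:gsub("%d", ""))), -- the weights don't count
    pre     = pre,
    post    = post,
    start   = tonumber(start) or 1,         -- defaults as described above
    length  = tonumber(length) or 2,
  }
end

-- Find the pattern in the word and replace the identified characters by the
-- pre sequence (end of line) and the post sequence (start of next line).
local function split(word, spec)
  local p = parse(spec)
  local original = chars("." .. word .. ".")
  local lowered  = chars("." .. word:lower() .. ".")
  for i = 1, #lowered - #p.letters + 1 do
    local found = true
    for j = 1, #p.letters do
      if lowered[i + j - 1] ~= p.letters[j] then
        found = false
        break
      end
    end
    if found then
      local first = i + p.start - 1          -- first replaced character
      local last  = first + p.length - 1     -- last replaced character
      return table.concat(original, "", 2, first - 1) .. p.pre .. "-",
             p.post .. table.concat(original, "", last + 1, #original - 1)
    end
  end
  return nil                                 -- the pattern does not apply
end

print(split("omaatje", "a9atje./a=t,1,3"))    -- "oma-" and "tje"
print(split("Schiffahrt", "schif1f/ff=f,5,2")) -- "Schiff-" and "fahrt"
\stoptyping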

Just adding extra patterns to an existing set without much testing is not wise. For
instance, we could add these to the Dutch dictionary:

\starttyping
\registerhyphenationpattern[nl][e3ë/e=e]
\registerhyphenationpattern[nl][o3ë/o=e]
\registerhyphenationpattern[nl][e3ï/e=i]
\registerhyphenationpattern[nl][i3ë/i=e]
\registerhyphenationpattern[nl][a5atje./a=t,1,3]
\registerhyphenationpattern[nl][toma8at5je]
\stoptyping

That would work reasonably well for words like

\starttyping
coëfficiënt
geïntroduceerd
copiëren
omaatje
tomaatje
\stoptyping

However, the last word only goes right because we explicitly added a pattern
for it. One reason is that the existing patterns already contain rules to
prevent weird hyphenations. The same is true for the accented characters. So,
consider these examples and coordinate additional patterns with other users
so that errors can be identified.

\stopsection

\startsection[title=Exceptions]

We have a variant on the \TEX\ primitive \type {\hyphenation}, which is the
official way to specify how a particular word should be hyphenated.

\startbuffer
\registerhyphenationexception[aaaaa-bbbbb]
aaaaabbbbb \par
\stopbuffer

\typebuffer

This code is self-explanatory and results in:

\blank

\starthyphenation[traditional]
\setupindenting[no]\hsize 1mm \lefthyphenmin 1 \righthyphenmin 1 \getbuffer
\stophyphenation

There can be multiple hyphens and even multiple words in such a specification:

\startbuffer
\registerhyphenationexception[aaaaa-bbbbb cc-ccc-ddd-dd]
aaaaabbbbb \par
cccccddddd \par
\stopbuffer

\typebuffer

We get:

\blank

\starthyphenation[traditional]
\setupindenting[no]\hsize 1mm \lefthyphenmin 1 \righthyphenmin 1 \getbuffer
\stophyphenation
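
Conceptually, such a specification is a list of words in which the hyphens mark
the permitted breaks. A minimal stand|-|alone \LUA\ sketch of that idea turns the
specification into a lookup table and shows the breaks again. The names
\type {exceptions} and \type {show} are made up, and this is not how \CONTEXT\
stores exceptions. It needs \LUA~5.3 or later for the \type {utf8} library.

\starttyping
-- Turn "aaaaa-bbbbb cc-ccc-ddd-dd" into a table: word -> set of break
-- positions, where k means "break after character k".
local function exceptions(spec)
  local list = {}
  for entry in spec:gmatch("%S+") do
    local word = entry:gsub("%-", "")
    local breaks, n = {}, 0
    for _, c in utf8.codes(entry) do
      if c == 45 then              -- 45 is the code of "-"
        breaks[n] = true
      else
        n = n + 1
      end
    end
    list[word] = breaks
  end
  return list
end

-- Show a word with its registered breaks, or unchanged when it is unknown.
local function show(word, list)
  local breaks = list[word]
  if not breaks then
    return word
  end
  local out, n = {}, 0
  for _, c in utf8.codes(word) do
    n = n + 1
    out[#out + 1] = utf8.char(c)
    if breaks[n] and n < utf8.len(word) then
      out[#out + 1] = "-"
    end
  end
  return table.concat(out)
end

local list = exceptions("aaaaa-bbbbb cc-ccc-ddd-dd")
print(show("cccccddddd", list))   -- cc-ccc-ddd-dd
\stoptyping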

Some languages are a bit picky with respect to ligatures and hyphenation, so we
have ways to control this.

% \zwj  : no ligatures
% \zwnj : no kerns either

\startbuffer
\startexceptions[de]
begri{ff-}{l}{ffl}(f\zwj fl)ich
xegri{ff-}{l}{ffl}(ff\zwj l)ich
zegri{ff-}{l}{ffl}(ffl)ich
wegri{ff-}{l}{ffl}(f\zwj f\zwj l)ich
\stopexceptions
\stopbuffer

\typebuffer \getbuffer

Here \type {\zwj} prevents a ligature and \type {\zwnj} prevents a ligature as
well as a font kern (the effect is easier to see in, for instance, Latin Modern,
where the ligatures are a bit more distinctive).

\startlinecorrection
    \showglyphs \showfontkerns
    \startcombination[2*2]
        {\de\glyphscale\numexpr4*\glyphscale\relax begrifflich} {}
        {\de\glyphscale\numexpr4*\glyphscale\relax xegrifflich} {}
        {\de\glyphscale\numexpr4*\glyphscale\relax zegrifflich} {}
        {\de\glyphscale\numexpr4*\glyphscale\relax wegrifflich} {}
    \stopcombination
\stoplinecorrection

\stopsection

\startsection[title=Boundaries]

A box, rule, math or discretionary will end a word and prohibit hyphenation
of that word. Take this example:

\startbuffer[demo]
whatever \par
whatever\hbox{!} \par
\vl whatever\vl \par
whatever$x$ \par
whatever-whatever \par
\stopbuffer

\typebuffer[demo]

These lines will hyphenate differently. In traditional \TEX\ you need to
insert penalties and|/|or glue to get around this, while \LUATEX\ can be
instructed to be more permissive. In the \LUA\ variant we can enable the
traditional limitation.

\startbuffer
\definehyphenationfeatures
  [strict]
  [rightedge=tex]
\stopbuffer

\typebuffer \getbuffer

Here we show the three variants: the built-in hyphenator (\type {default}) and
the \LUA\ variant (\type {traditional}) without and with strict settings.

\starttabulate[|p|p|p|]
\HL
\NC \ttbf \hbox to 11em{default\hss}
\NC \ttbf \hbox to 11em{traditional\hss}
\NC \ttbf \hbox to 11em{traditional strict\hss}
\NC \NR
\HL
\NC \starthyphenation[default]     \hsize1mm \getbuffer[demo] \stophyphenation
\NC \starthyphenation[traditional] \hsize1mm \getbuffer[demo] \stophyphenation
\NC \starthyphenation[traditional] \sethyphenationfeatures[strict]
                                   \hsize1mm \getbuffer[demo] \stophyphenation
\NC \NR
\HL
\stoptabulate

By default \CONTEXT\ is configured to hyphenate words that start with an
uppercase character. This behaviour is controlled in \TEX\ by the \typ {\uchyph}
parameter: a positive value enables it, while zero or a negative value disables it.

\starttabulate[|p|p|p|p|]
\HL
\NC \ttbf \hbox to 8em{default     0\hss}
\NC \ttbf \hbox to 8em{traditional 0\hss}
\NC \ttbf \hbox to 8em{default     1\hss}
\NC \ttbf \hbox to 8em{traditional 1\hss}
\NC \NR
\HL
\NC \starthyphenation[default]     \hsize1mm \uchyph\zerocount TEXified \dontcomplain \stophyphenation
\NC \starthyphenation[traditional] \hsize1mm \uchyph\zerocount TEXified \dontcomplain \stophyphenation
\NC \starthyphenation[default]     \hsize1mm \uchyph\plusone   TEXified \dontcomplain \stophyphenation
\NC \starthyphenation[traditional] \hsize1mm \uchyph\plusone   TEXified \dontcomplain \stophyphenation
\NC \NR
\HL
\stoptabulate

The \LUA\ variant behaves the same as the built-in implementation (which of
course remains the reference).
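
The rule behind \type {\uchyph} is simple enough to express in a couple of
lines. This stand|-|alone \LUA\ sketch uses a made-up function name and only
looks at single byte letters; \TEX\ itself decides whether the first letter is
uppercase by comparing it with its \type {\lccode}.

\starttyping
-- Is a word eligible for hyphenation as far as \uchyph is concerned?
-- A word starting with an uppercase letter is only hyphenated when uchyph > 0.
local function mayhyphenate(word, uchyph)
  local first = word:sub(1, 1)
  local uppercase = first:lower() ~= first   -- TeX compares with the \lccode
  return uchyph > 0 or not uppercase
end

print(mayhyphenate("TEXified", 0))  -- false
print(mayhyphenate("TEXified", 1))  -- true
\stoptyping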

\stopsection

\startsection[title=Plug-ins]

The default hyphenator is similar to the built-in one, with a couple of
extensions as mentioned. However, you can plug in your own code, provided that
it returns a proper hyphenation result. One reason for providing this plug is
that there are users who want to play with hyphenators based on a different
logic. In \CONTEXT\ we already have some methods to deal with languages that
(for instance) have no spaces but split on words or syllables. A tighter
integration with the hyphenator can have advantages, so I will explore these
options when there is demand.

A result table indicates where we can break a word. If we have a four-character
word and can break after the second character, the result looks like this:

\starttyping
result = { false, true, false, false }
\stoptyping
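
How such a result table maps onto break points can be checked outside \TEX. The
following stand|-|alone \LUA\ sketch makes no \CONTEXT\ calls and uses made-up
names: \type {aftervowels} has the same arguments as the plug-in function shown
below (the word arrives as a table of characters) but returns only booleans, and
\type {render} marks every permitted break with a hyphen.

\starttyping
local vowels = { a = true, e = true, i = true, o = true, u = true, y = true }

-- Same shape as a plug-in: permit a break after every vowel except the last
-- character. Returns one boolean per character.
local function aftervowels(dictionary, word, n)
  local result = {}
  for i = 1, n do
    result[i] = vowels[word[i]] == true and i < n
  end
  return result
end

-- Show the permitted breaks of a result table with hyphens.
local function render(word, result)
  local out = {}
  for i, c in ipairs(word) do
    out[#out + 1] = c
    if result[i] then
      out[#out + 1] = "-"
    end
  end
  return table.concat(out)
end

print(render({ "w", "o", "r", "d" }, { false, true, false, false }))  -- wo-rd

local banana = { "b", "a", "n", "a", "n", "a" }
print(render(banana, aftervowels(nil, banana, #banana)))            -- ba-na-na
\stoptyping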

Instead of \type {true} we can also have a table that has entries like the
extensions discussed in a previous section. Let's give an example of a
plug-in.

\startbuffer
\startluacode
    -- the characters where we permit a break
    local subset = {
        a = true,
        e = true,
        i = true,
        o = true,
        u = true,
        y = true,
    }

    languages.hyphenators.traditional.installmethod("test",
        function(dictionary,word,n)
            -- word is a table of characters, we return one entry per character
            local t = { }
            for i=1,#word do
                local w = word[i]
                if subset[w] then
                    -- wrap the vowel in angle brackets when we break here
                    t[i] = {
                        before = "<" .. w,
                        after  = w .. ">",
                        left   = false,
                        right  = false,
                    }
                else
                    t[i] = false
                end
            end
            return t
        end
    )
\stopluacode
\stopbuffer

\typebuffer \getbuffer

Here we hyphenate on vowels and surround them by angle brackets when
split over lines. This alternative is installed as follows:

\startbuffer
\definehyphenationfeatures
  [demo]
  [alternative=test]
\stopbuffer

\typebuffer \getbuffer

We can now use it as follows:

\starttyping
\setuphyphenation[method=traditional]
\sethyphenationfeatures[demo]
\stoptyping

When applied to the tufte example we get:

\startbuffer[demo]
\starthyphenation[traditional]
    \setuptolerance[tolerant]
    \sethyphenationfeatures[demo]
    \dontleavehmode
    \input tufte\relax
\stophyphenation
\stopbuffer

\blank \startnarrower \getbuffer[demo] \stopnarrower \blank

A more realistic (but not perfect) example is the following:

\startbuffer
\startluacode
    -- when true, no break is permitted between two successive slashes
    local packslashes = false

    -- "before": the character stays at the end of the line (break after it)
    -- "after" : the character moves to the next line (break before it)
    local specials = {
        ["!"]  = "before", ["?"]  = "before",
        ['"']  = "before", ["'"]  = "before",
        ["/"]  = "before", ["\\"] = "before",
        ["#"]  = "before",
        ["$"]  = "before",
        ["%"]  = "before",
        ["&"]  = "before",
        ["*"]  = "before",
        ["+"]  = "before", ["-"]  = "before",
        [","]  = "before", ["."]  = "before",
        [":"]  = "before", [";"]  = "before",
        ["<"]  = "before", [">"]  = "before",
        ["="]  = "before",
        ["@"]  = "before",
        ["("]  = "before",
        ["["]  = "before",
        ["{"]  = "before",
        ["^"]  = "before", ["_"]  = "before",
        ["`"]  = "before",
        ["|"]  = "before",
        ["~"]  = "before",
        --
        [")"]  = "after",
        ["]"]  = "after",
        ["}"]  = "after",
    }

    languages.hyphenators.traditional.installmethod("url",
        function(dictionary,word,n)
            local t = { }
            local p = nil
            for i=1,#word do
                local w = word[i]
                local s = specials[w]
                -- the first time we see a special we replace its string
                -- by a specification table that is reused afterwards
                if s == "after" then
                    s = {
                        start  = 1,
                        length = 1,
                        after  = w,
                        left   = false,
                        right  = false,
                    }
                    specials[w] = s
                elseif s == "before" then
                    s = {
                        start  = 1,
                        length = 1,
                        before = w,
                        left   = false,
                        right  = false,
                    }
                    specials[w] = s
                end
                if not s then
                    s = false
                elseif w == p and w == "/" then
                    -- no break between two slashes
                    t[i-1] = false
                end
                t[i] = s
                if packslashes then
                    p = w
                end
            end
            return t
        end
    )
\stopluacode
\stopbuffer

\typebuffer \getbuffer

Again we define a feature set that uses the plug-in:

\startbuffer
\definehyphenationfeatures
  [url]
  [characters=all,
   alternative=url]
\stopbuffer

\typebuffer \getbuffer

So, we only break a line at symbols: after most of them, and before a closing
parenthesis, bracket or brace.

\startlinecorrection[blank]
    \starthyphenation[traditional]
        \tt
        \sethyphenationfeatures[url]
        \scale[width=\hsize]{\hyphenatedcoloredword{http://www.pragma-ade.nl}}
    \stophyphenation
\stoplinecorrection
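
The break rules of this plug-in can also be tried as plain string processing.
The following \LUA\ sketch is hypothetical (made-up names, single byte
characters, no node handling): it returns the pieces between permitted breaks,
breaking after the symbols listed as \type {before} and before the closing
brackets listed as \type {after}, and it mimics \type {packslashes} by keeping
two successive slashes together.

\starttyping
local closing = { [")"] = true, ["]"] = true, ["}"] = true }
local symbols = {}
for c in ("!?\"'/\\#$%&*+-,.:;<=>@([{^_`|~"):gmatch(".") do
  symbols[c] = true
end

-- Return the pieces of an url between the permitted breaks.
local function urlpieces(url, packslashes)
  local pieces, current = {}, ""
  for i = 1, #url do
    local c, nextc = url:sub(i, i), url:sub(i + 1, i + 1)
    if closing[c] and current ~= "" then       -- break before ) ] }
      pieces[#pieces + 1] = current
      current = ""
    end
    current = current .. c
    local together = packslashes and c == "/" and nextc == "/"
    if symbols[c] and i < #url and not together then -- break after a symbol
      pieces[#pieces + 1] = current
      current = ""
    end
  end
  if current ~= "" then
    pieces[#pieces + 1] = current
  end
  return pieces
end

print(table.concat(urlpieces("http://www.pragma-ade.nl", false), " | "))
-- http: | / | / | www. | pragma- | ade. | nl
print(table.concat(urlpieces("http://www.pragma-ade.nl", true), " | "))
-- http: | // | www. | pragma- | ade. | nl
\stoptyping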

A quick test can look as follows:

\startbuffer
\starthyphenation[traditional]
    \sethyphenationfeatures[url]
    \tt
    \dontcomplain
    \hsize 1mm
    http://www.pragma-ade.nl
\stophyphenation
\stopbuffer

\typebuffer

This gives:

\getbuffer

\stopsection

\startsection[title=Blocking ligatures]

Yet another predefined feature is the ability to block a ligature. In
traditional \TEX\ this can be done by putting a \type {{}} between
the characters, although that effect can get lost when the text is
manipulated. The natural way to do this in a \UNICODE\ environment
is to use the special characters \type {zwj} and \type {zwnj}.

We use the following example lines:

\startbuffer[sample]
supereffective \blank
superef\zwnj fective
\stopbuffer

\typebuffer[sample]

and define two feature sets:

\startbuffer
\definehyphenationfeatures
  [demo-1]
  [characters=\zwnj\zwj,
   joiners=yes]

\definehyphenationfeatures
  [demo-2]
  [joiners=no]
\stopbuffer

\typebuffer \getbuffer

We limit the width to 1mm and get:

\startlinecorrection[blank]
\bTABLE[option=stretch,offset=.5ex]
    \bTR
        \bTD \tx
            \type{method=default}
        \eTD
        \bTD \tx
            \type{method=traditional}
        \eTD
        \bTD \tx
            \type{method=traditional}\par
            \type{featureset=demo-1}
        \eTD
        \bTD \tx
            \type{method=traditional}\par
            \type{featureset=demo-2}
        \eTD
    \eTR
    \bTR
        \bTD
            \hsize 1mm \dontcomplain
            \starthyphenation[default]
                \getbuffer[sample]
            \stophyphenation
        \eTD
        \bTD
            \hsize 1mm \dontcomplain
            \starthyphenation[traditional]
                \getbuffer[sample]
            \stophyphenation
        \eTD
        \bTD
            \hsize 1mm \dontcomplain
            \starthyphenation[traditional]
                \sethyphenationfeatures[demo-1]
                \getbuffer[sample]
            \stophyphenation
        \eTD
        \bTD
            \hsize 1mm \dontcomplain
            \starthyphenation[traditional]
                \sethyphenationfeatures[demo-2]
                \getbuffer[sample]
            \stophyphenation
        \eTD
    \eTR
\eTABLE
\stoplinecorrection
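
Why a non-joiner in between blocks a ligature can be shown with a toy model in
\LUA: a ligature step that only combines directly adjacent characters. This is
only an illustration with a made-up function name, not how fonts or \CONTEXT\
handle ligatures, and for simplicity it drops the zero width non-joiner
afterwards. It needs \LUA~5.3 or later for the \type {\u} string escapes.

\starttyping
local ZWNJ = "\u{200C}"   -- zero width non-joiner
local FF   = "\u{FB00}"   -- latin small ligature ff

-- Replace adjacent "ff" by the ligature, then remove the non-joiners.
local function ligate(text)
  local result = text:gsub("ff", FF)
  result = result:gsub(ZWNJ, "")
  return result
end

print(ligate("supereffective"))               -- uses the ff ligature
print(ligate("superef" .. ZWNJ .. "fective")) -- two separate f's
\stoptyping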

\stopsection

\startsection[title=Special characters]

The \type {characters} feature can be used (to some extent) to do the
same as the breakpoints mechanism (compounds).

\startbuffer
\definehyphenationfeatures
  [demo-3]
  [characters={()[]}]
\stopbuffer

\typebuffer \blank \getbuffer \blank

\startbuffer[demo]
\starthyphenation[traditional]
    \sethyphenationfeatures[demo-3]
    \dontcomplain
    \hsize 1mm
    we use (super)special(ized) patterns
\stophyphenation
\stopbuffer

\typebuffer[demo] \blank \getbuffer[demo] \blank

We can make this more clever by adding patterns. The odd weight~9 permits a
break after a closing parenthesis and before an opening one, and it overrules
the lower weights of the regular patterns:

\startbuffer
\registerhyphenationpattern[en][)9]
\registerhyphenationpattern[en][9(]
\stopbuffer

\typebuffer \blank \getbuffer \blank

This gives:

\blank \getbuffer[demo] \blank

A detailed trace shows that these patterns get applied:

\starthyphenation[traditional]
    \ttx
    \showhyphenationtrace[en][(super)special(ized)]
\stophyphenation

\unregisterhyphenationpattern[en][)9]
\unregisterhyphenationpattern[en][9(]

The somewhat weird hyphens at the edges will in practice not show up because
there is always one regular character there.

\stopsection

\startsection[title=Counting]

There is not much you can do about patterns. It's a craft to make them and so
they are shipped with the distribution. In order to hyphenate well, \TEX\ looks
at some character properties. In \CONTEXT\ only the characters used in the
patterns of a language get tagged as valid in a word.

The following example illustrates that there can be corner cases. In fact, this
example might render differently depending on the patterns available. First we
define an extra language, based on French.

\startbuffer
\installlanguage[frf][default=fr,patterns=fr,factor=yes]
\stopbuffer

\typebuffer \getbuffer

Here we set the \type {factor} parameter, which tells the loader that it should
look at the characters used in a special way: some count for none, and some count
for more than one when determining the min values used to decide if and where
hyphenation is to be applied.

\startbuffer
\startmixedcolumns[n=3,balance=yes]
    \hsize 1mm \dontcomplain
    \language[fr]  aesop oedipus æsop œdipus \column
    \hsize 1mm \dontcomplain
    \language[frf] aesop oedipus æsop œdipus \column
    \startexceptions æ-sop \stopexceptions
    \hsize 1mm \dontcomplain
    \language[frf] aesop oedipus æsop œdipus
\stopmixedcolumns
\stopbuffer

\typebuffer

At the time of writing this manual we get three different columns:

\getbuffer

The trick is in the \type {factor}: when set to \type {yes} an \type {æ} is
counted as two characters. Combining marks count as zero, but you will not
find them being used as we already resolve them in an earlier stage.
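
The effect of \type {factor} on the hyphenation minima can be sketched as a
weighted character count. The counts in this stand|-|alone \LUA\ sketch are
assumptions for the illustration (the real values are the ones in
\type {languages.hjcounts}, listed in the table that follows), and the function
name \type {weightedlength} is made up. It needs \LUA~5.3 or later for the
\type {utf8} library.

\starttyping
-- Assumed counts: ligature-like letters count as two, combining marks as none,
-- everything else as one.
local counts = {
  [0x00E6] = 2,   -- æ
  [0x0153] = 2,   -- œ
  [0x0301] = 0,   -- combining acute accent
}

local function weightedlength(word)
  local n = 0
  for _, c in utf8.codes(word) do
    n = n + (counts[c] or 1)
  end
  return n
end

-- both spellings now count the same towards the minima
print(weightedlength("æsop"), weightedlength("aesop"))       -- 5 and 5
print(weightedlength("œdipus"), weightedlength("oedipus"))   -- 7 and 7
\stoptyping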

\startluacode
-- list the counts used with factor=yes (combining marks are left out)
context.startcolumns { n = 2 }
context.starttabulate { "|Tc|c|c|l|" }
for u, data in table.sortedhash(languages.hjcounts) do
    if data.category ~= "combining" then
        context.NC() context("%05U",u)
        context.NC() context("%c",u)
        context.NC() context(data.count)
        context.NC() context(data.category)
        context.NC() context.NR()
    end
end
context.stoptabulate()
context.stopcolumns()
\stopluacode

It is very unlikely to find an \type {ﬃ} in the input and even an \type {ĳ} is
rare. The \type {æ} is marked as a character and the \type {œ} as a ligature in
\UNICODE. Maybe all the characters here are dubious, but at least we provide a
way to experiment with them.

\stopsection

\startsection[title=Tracing]

Among the tracing options (low level trackers) there is one for pattern developers:

\startbuffer
\usemodule[s-languages-hyphenation]

\startcomparepatterns[de,nl,en,fr]
    \input zapf \quad (\showcomparepatternslegend)
\stopcomparepatterns
\stopbuffer

\typebuffer

The different hyphenation points are shown with colored bars. Some valid points
might not be shown because the font engine can collapse successive
discretionaries.

\getbuffer

\stopsection

\startsection[title=Neat tricks]

The following two examples are for users to test. The first one shows all
hyphenation points in a paragraph:

\starttyping
\bgroup
    \setupalign[flushright]
    \hyphenpenalty-100000 % every hyphenation point becomes a forced break
    \input tufte
    \par % force hyphenation
\egroup
\stoptyping

The second one shows the cases where a hyphenated word ends a page:

\starttyping
\bgroup
    \page
    \interlinepenalty10000 % no page breaks between lines ...
    \brokenpenalty-10000   % ... but after a hyphenated line both add up to zero
    \input tufte
    \page
\egroup
\stoptyping

A less space-consuming variant of that one is:

\starttyping
\bgroup
    \setbox\scratchboxone\vbox \bgroup
        \interlinepenalty10000
        \brokenpenalty-10000
        \input tufte
    \egroup
    \doloop {
        \ifvoid\scratchboxone
            \hrule
            \exitloop
        \else
            \setbox\scratchboxtwo\vsplit\scratchboxone to 1pt
            \hrule
            \unvbox\scratchboxtwo
        \fi
    }
\egroup
\stoptyping

\stopsection

\stopchapter

\stopcomponent