% language=us engine=luatex runpath=texruns:manuals/luatex
% luatex-languages.tex, last modification: 2023-12-21 09:43

\environment luatex-style

\startcomponent luatex-languages

\startchapter[reference=languages,title={Languages, characters, fonts and glyphs}]

\startsection[title={Introduction}]

\topicindex {languages}

\LUATEX's internal handling of the characters and glyphs that eventually become
typeset is quite different from the way \TEX82 handles those same objects. The
easiest way to explain the difference is to focus on unrestricted horizontal mode
(i.e.\ paragraphs) and hyphenation first. Later on, it will be easy to deal
with the differences that occur in horizontal and math modes.

In \TEX82, the characters you type are converted into \type {char} node records
when they are encountered by the main control loop. \TEX\ attaches and processes
the font information while creating those records, so that the resulting \quote
{horizontal list} contains the final forms of ligatures and implicit kerning.
This packaging is needed because we may want to know the effective width of, for
instance, a horizontal box.

When it becomes necessary to hyphenate words in a paragraph, \TEX\ converts (one
word at a time) the \type {char} node records into a string by replacing ligatures
with their components and ignoring the kerning. Then it runs the hyphenation
algorithm on this string, and converts the hyphenated result back into a \quote
{horizontal list} that is subsequently spliced back into the paragraph stream.
Keep in mind that the paragraph may contain unboxed horizontal material, which
then already contains ligatures and kerns; the words therein are also part of the
hyphenation process.

Those \type {char} node records are somewhat misnamed, as they are glyph
positions in specific fonts, and therefore not really \quote {characters} in the
linguistic sense. There is no language information inside the \type {char} node
records at all. Instead, language information is passed along using \type
{language whatsit} nodes inside the horizontal list.

In \LUATEX, the situation is quite different. The characters you type are always
converted into \nod {glyph} node records with a special subtype to identify them
as being intended as linguistic characters. \LUATEX\ stores the needed language
information in those records, but does not do any font|-|related processing at
the time of node creation. It only stores the index of the current font and a
reference to a character in that font.

When it becomes necessary to typeset a paragraph, \LUATEX\ first inserts all
hyphenation points right into the whole node list. Next, it processes all the
font information in the whole list (creating ligatures and adjusting kerning),
and finally it adjusts all the subtype identifiers so that the records are \quote
{glyph nodes} from now on.

\stopsection

\startsection[title={Characters, glyphs and discretionaries},reference=charsandglyphs]

\topicindex {characters}
\topicindex {glyphs}
\topicindex {hyphenation}

\TEX82 (including \PDFTEX) differentiates between \type {char} nodes and \type
{lig} nodes. The former were simple items that contained nothing but a \quote
{character} and a \quote {font} field, and they lived in the same memory as
tokens did. The latter also contained a list of components and a subtype
indicating whether the ligature was the result of a word boundary, and they were
stored in the same place as other nodes like boxes, kerns and glues.

In \LUATEX, these two types are merged into one, somewhat larger structure called
a \nod {glyph} node. Besides the old character, font and component fields, there
are a few more, like \quote {attr}, which we will see in \in {section}
[glyphnodes]. These nodes also contain a subtype that encodes four main types and
two additional ghost types. For ligatures, multiple bits can be set at the same
time (in case of a single|-|glyph word).

\startitemize
    \startitem
        \type {character}, for characters to be hyphenated: the lowest bit
        (bit 0) is set to 1.
    \stopitem
    \startitem
        \nod {glyph}, for specific font glyphs: the lowest bit (bit 0) is
        not set.
    \stopitem
    \startitem
        \type {ligature}, for constructed ligatures: bit 1 is set.
    \stopitem
    \startitem
        \type {ghost}, for so|-|called \quote {ghost objects}: bit 2 is set.
    \stopitem
    \startitem
        \type {left}, for ligatures created from a left word boundary and for
        ghosts created from \lpr {leftghost}: bit 3 is set.
    \stopitem
    \startitem
        \type {right}, for ligatures created from a right word boundary and
        for ghosts created from \lpr {rightghost}: bit 4 is set.
    \stopitem
\stopitemize
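
Because the subtype is a set of bits, its numeric value is the sum of the powers
of two of the bits that are set. As an illustrative calculation (our own example,
assuming that bit~0 is not set, as for glyphs that have already been processed),
a ligature that was created from both a left and a right word boundary would get
the subtype:

\startformula
2^1 + 2^3 + 2^4 = 2 + 8 + 16 = 26
\stopformula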

The \nod {glyph} nodes also contain language data, split into four items that
were current when the node was created: the \prm {setlanguage} (15~bits), \prm
{lefthyphenmin} (8~bits), \prm {righthyphenmin} (8~bits), and \prm {uchyph}
(1~bit).

Incidentally, \LUATEX\ allows 16383 separate languages, and words can be 256
characters long. The language is stored with each character. You can, for
instance, set \prm {firstvalidlanguage} to~1 and thereby make language~0 a
language that is ignored for hyphenation.

The new primitive \lpr {hyphenationmin} can be used to set the minimal length a
word must have before it is considered for hyphenation. This value is stored with
the (current) language.
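
A minimal usage sketch of these parameters follows. The language number~2 is an
arbitrary choice for illustration, and we assume that patterns for it are loaded
elsewhere. Because \lpr {hyphenationmin} is stored with the current language, it
is set after that language has been selected.

\starttyping
\firstvalidlanguage=1 % language 0 is now ignored for hyphenation
\language=2           % illustrative number, patterns assumed to be loaded
\lefthyphenmin=2      % keep at least two characters before a break
\righthyphenmin=3     % and at least three characters after it
\hyphenationmin=5     % stored with language 2: shorter words are not hyphenated
\stoptyping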

Because the \prm {uchyph} value is saved in the actual nodes, its handling is
subtly different from \TEX82: changes to \prm {uchyph} become effective
immediately, not at the end of the current partial paragraph.

Typeset boxes now always have their language information embedded in the nodes
themselves, so there is no longer a possible dependency on the surrounding
language settings. In \TEX82, a mid|-|paragraph statement like \type {\unhbox0}
would process the box using the current paragraph language unless there was a
\prm {setlanguage} issued inside the box. In \LUATEX, all language variables
are already frozen.

In traditional \TEX\ the process of hyphenation is driven by \type {lccode}s. In
\LUATEX\ we made this dependency less strong, and several strategies are
possible. When you do nothing, the currently active \type {lccode}s are used when
loading patterns, setting exceptions or hyphenating a list.

When you set \prm {savinghyphcodes} to a value greater than zero, the current set
of \type {lccode}s will be saved with the language. In that case changing a \type
{lccode} afterwards has no effect. However, you can adapt the set with:

\starttyping
\hjcode`a=`a
\stoptyping

This change is global, which makes sense if you keep in mind that hyphenation
(normally) happens when the paragraph or a horizontal box is constructed. When
\prm {savinghyphcodes} was zero at the moment the language got initialized, you
start out with an empty set; otherwise you already have one.
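
The following sketch puts these steps in order. The language number~3 is
hypothetical, the actual pattern loading is left as a comment, and we assume that
the language gets initialized when its patterns are loaded. The \lpr {hjcode}
line reuses the \type {x} to \type {o} mapping from the examples below.

\starttyping
\savinghyphcodes=1 % save the current lccodes with the language
\language=3        % hypothetical language number
% ... load the patterns for language 3 here ...
\hjcode`x=`o       % adapt the saved set afterwards; this change is global
\stoptyping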

When a \lpr {hjcode} is greater than 0 but less than 32, it indicates the length
to be used. In the following example we map a character (\type {x}) onto another
one in the patterns and tell the engine that \type {œ} counts as two characters.
Because traditionally zero itself is reserved for inhibiting hyphenation, a value
of 32 counts as zero.

Here are some examples (we assume that French patterns are used):

\starttabulate[||||]
\NC                                  \NC \type{foobar} \NC \type{foo-bar} \NC \NR
\NC \type{\hjcode`x=`o}              \NC \type{fxxbar} \NC \type{fxx-bar} \NC \NR
\NC \type{\lefthyphenmin3}           \NC \type{œdipus} \NC \type{œdi-pus} \NC \NR
\NC \type{\lefthyphenmin4}           \NC \type{œdipus} \NC \type{œdipus}  \NC \NR
\NC \type{\hjcode`œ=2}               \NC \type{œdipus} \NC \type{œdi-pus} \NC \NR
\NC \type{\hjcode`i=32 \hjcode`d=32} \NC \type{œdipus} \NC \type{œdipus}  \NC \NR
\NC
\stoptabulate

Carrying all this information with each glyph would give too much overhead and
also make the process of setting up these codes more complex. A solution with
\type {hjcode} sets was considered but rejected because in practice the current
approach is sufficient and it would not be compatible anyway.

Beware: the values are always saved in the format, independent of the setting
of \prm {savinghyphcodes} at the moment the format is dumped.

A boundary node normally marks the end of a word, which interferes with, for
instance, discretionary injection. For this you can use \prm {wordboundary} as a
trigger. Here are a few examples of usage:

\startbuffer
    discrete---discrete
\stopbuffer
\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower
\startbuffer
    discrete\discretionary{}{}{---}discrete
\stopbuffer
\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower
\startbuffer
    discrete\wordboundary\discretionary{}{}{---}discrete
\stopbuffer
\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower
\startbuffer
    discrete\wordboundary\discretionary{}{}{---}\wordboundary discrete
\stopbuffer
\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower
\startbuffer
    discrete\wordboundary\discretionary{---}{}{}\wordboundary discrete
\stopbuffer
\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower

We only accept an explicit hyphen when there is a preceding glyph, and we skip a
sequence of explicit hyphens, since such a sequence normally indicates a \type
{--} or \type {---} ligature. Otherwise, in the worst case, we could get bad node
lists later on due to messed|-|up ligature building, as these dashes are
ligatures in base fonts. This is a side effect of separating the hyphenation,
ligaturing and kerning steps.

The start and end of a sequence of characters is signalled by a \nod {glue}, \nod
{penalty}, \nod {kern} or \nod {boundary} node. But by default a \nod {hlist},
\nod {vlist}, \nod {rule}, \nod {dir}, \nod {whatsit}, \nod {ins} or \nod
{adjust} node also indicates a start or end. You can omit this last set from the
test by setting \lpr {hyphenationbounds} to a non|-|zero value:

\starttabulate[|c|l|]
\DB value    \BC behaviour \NC \NR
\TB
\NC \type{0} \NC not strict \NC \NR
\NC \type{1} \NC strict start \NC \NR
\NC \type{2} \NC strict end \NC \NR
\NC \type{3} \NC strict start and strict end \NC \NR
\LL
\stoptabulate

The word start is determined as follows:

\starttabulate[|l|l|]
\DB node      \BC behaviour \NC \NR
\TB
\BC boundary  \NC yes when wordboundary \NC \NR
\BC hlist     \NC when hyphenationbounds 1 or 3 \NC \NR
\BC vlist     \NC when hyphenationbounds 1 or 3 \NC \NR
\BC rule      \NC when hyphenationbounds 1 or 3 \NC \NR
\BC dir       \NC when hyphenationbounds 1 or 3 \NC \NR
\BC whatsit   \NC when hyphenationbounds 1 or 3 \NC \NR
\BC glue      \NC yes \NC \NR
\BC math      \NC skipped \NC \NR
\BC glyph     \NC exhyphenchar (one only): yes (so no -- ---) \NC \NR
\BC otherwise \NC yes \NC \NR
\LL
\stoptabulate

The word end is determined as follows:

\starttabulate[|l|l|]
\DB node      \BC behaviour \NC \NR
\TB
\BC boundary  \NC yes \NC \NR
\BC glyph     \NC yes when different language \NC \NR
\BC glue      \NC yes \NC \NR
\BC penalty   \NC yes \NC \NR
\BC kern      \NC yes when not italic (for some historic reason) \NC \NR
\BC hlist     \NC when hyphenationbounds 2 or 3 \NC \NR
\BC vlist     \NC when hyphenationbounds 2 or 3 \NC \NR
\BC rule      \NC when hyphenationbounds 2 or 3 \NC \NR
\BC dir       \NC when hyphenationbounds 2 or 3 \NC \NR
\BC whatsit   \NC when hyphenationbounds 2 or 3 \NC \NR
\BC ins       \NC when hyphenationbounds 2 or 3 \NC \NR
\BC adjust    \NC when hyphenationbounds 2 or 3 \NC \NR
\LL
\stoptabulate
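
Selecting one of these behaviours is just a matter of setting the parameter. As a
minimal sketch, the following line asks for both a strict start and a strict end,
so that the rows marked \quote {1 or 3} and \quote {2 or 3} in the tables above
apply. The value is only an example; the default remains~0.

\starttyping
\hyphenationbounds=3 % strict start and strict end
\stoptyping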

\in {Figures} [hb:1] up to \in [hb:5] show some examples. In all cases we set the
min values to~1 and make sure that the words hyphenate at each character.

\hyphenation{o-n-e t-w-o}

\def\SomeTest#1#2%
  {\lefthyphenmin  \plusone
   \righthyphenmin \plusone
   \parindent      \zeropoint
   \everypar       \emptytoks
   \dontcomplain
   \hbox to 2cm {%
     \vtop {%
       \hsize 1pt
       \hyphenationbounds#1
       #2
       \par}}}

\startplacefigure[reference=hb:1,title={\type{one}}]
    \startcombination[4*1]
        {\SomeTest{0}{one}} {\type{0}}
        {\SomeTest{1}{one}} {\type{1}}
        {\SomeTest{2}{one}} {\type{2}}
        {\SomeTest{3}{one}} {\type{3}}
    \stopcombination
\stopplacefigure

\startplacefigure[reference=hb:2,title={\type{one\null two}}]
    \startcombination[4*1]
        {\SomeTest{0}{one\null two}} {\type{0}}
        {\SomeTest{1}{one\null two}} {\type{1}}
        {\SomeTest{2}{one\null two}} {\type{2}}
        {\SomeTest{3}{one\null two}} {\type{3}}
    \stopcombination
\stopplacefigure

\startplacefigure[reference=hb:3,title={\type{\null one\null two}}]
    \startcombination[4*1]
        {\SomeTest{0}{\null one\null two}} {\type{0}}
        {\SomeTest{1}{\null one\null two}} {\type{1}}
        {\SomeTest{2}{\null one\null two}} {\type{2}}
        {\SomeTest{3}{\null one\null two}} {\type{3}}
    \stopcombination
\stopplacefigure

\startplacefigure[reference=hb:4,title={\type{one\null two\null}}]
    \startcombination[4*1]
        {\SomeTest{0}{one\null two\null}} {\type{0}}
        {\SomeTest{1}{one\null two\null}} {\type{1}}
        {\SomeTest{2}{one\null two\null}} {\type{2}}
        {\SomeTest{3}{one\null two\null}} {\type{3}}
    \stopcombination
\stopplacefigure

\startplacefigure[reference=hb:5,title={\type{\null one\null two\null}}]
    \startcombination[4*1]
        {\SomeTest{0}{\null one\null two\null}} {\type{0}}
        {\SomeTest{1}{\null one\null two\null}} {\type{1}}
        {\SomeTest{2}{\null one\null two\null}} {\type{2}}
        {\SomeTest{3}{\null one\null two\null}} {\type{3}}
    \stopcombination
\stopplacefigure

% (Future versions of \LUATEX\ might provide more granularity.)

In traditional \TEX\ ligature building and hyphenation are interwoven with the
line break mechanism. In \LUATEX\ these phases are isolated. As a consequence we
deal differently with (a sequence of) explicit hyphens. We have already added
some control over aspects of hyphenation, and yet another one concerns
automatic hyphens (e.g.\ \type {-} characters in the input).

When \lpr {automatichyphenmode} has a value of 0, a hyphen will be turned into
an automatic discretionary. The snippets before and after it will not be
hyphenated. A side effect is that a leading hyphen can lead to a split, but one
will seldom run into that situation. Setting a pre and post character makes this
more prominent. A value of 1 will prevent this side effect and a value of 2 will
not turn the hyphen into a discretionary. Experiments with other options, like
permitting hyphenation of the words on both sides, were discarded.

\startbuffer[a]
before-after \par
before--after \par
before---after \par
\stopbuffer

\startbuffer[b]
-before \par
after- \par
--before \par
after-- \par
---before \par
after--- \par
\stopbuffer

\startbuffer[c]
before-after \par
before--after \par
before---after \par
\stopbuffer

\startbuffer[demo]
\startcombination[nx=4,ny=3,location=top]
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize6em \getbuffer[a]}} {A~0~6em}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize2pt \getbuffer[a]}} {A~0~2pt}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plusone   \hsize2pt \getbuffer[a]}} {A~1~2pt}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plustwo   \hsize2pt \getbuffer[a]}} {A~2~2pt}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize6em \getbuffer[b]}} {B~0~6em}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize2pt \getbuffer[b]}} {B~0~2pt}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plusone   \hsize2pt \getbuffer[b]}} {B~1~2pt}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plustwo   \hsize2pt \getbuffer[b]}} {B~2~2pt}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize6em \getbuffer[c]}} {C~0~6em}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize2pt \getbuffer[c]}} {C~0~2pt}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plusone   \hsize2pt \getbuffer[c]}} {C~1~2pt}
    {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plustwo   \hsize2pt \getbuffer[c]}} {C~2~2pt}
\stopcombination
\stopbuffer

\startplacefigure[reference=automatichyphenmode:1,title={The automatic modes \type {0} (default), \type {1} and \type {2}, with a \prm {hsize}
of 6em and 2pt (which triggers a linebreak).}]
    \dontcomplain \tt \getbuffer[demo]
\stopplacefigure

\startplacefigure[reference=automatichyphenmode:2,title={The automatic modes \type {0} (default), \type {1} and \type {2}, with \lpr {preexhyphenchar} and \lpr {postexhyphenchar} set to characters \type {B} and \type {A} respectively.}]
    \postexhyphenchar`A\relax
    \preexhyphenchar `B\relax
    \dontcomplain \tt \getbuffer[demo]
\stopplacefigure

In \in {figure} [automatichyphenmode:1] \in {and} [automatichyphenmode:2] we show
what happens with three samples:

Input A: \typebuffer[a]
Input B: \typebuffer[b]
Input C: \typebuffer[c]

As with the primitive companions of other single|-|character commands, the \prm
{-} command has a more verbose primitive version in \lpr {explicitdiscretionary},
and the character \type {-} (or whatever is configured) that the hyphenator
normally intercepts is available as \lpr {automaticdiscretionary}.
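
The following lines sketch the two pairs side by side; the word \type
{handbook} is an arbitrary placeholder. Following the description above, each
verbose primitive stands in for the short form on the line before it. This is an
illustration of the correspondence only, not a trace of the resulting node lists.

\starttyping
hand\-book                       % short form of an explicit discretionary
hand\explicitdiscretionary book  % its verbose primitive companion
hand-book                        % hyphen intercepted by the hyphenator
hand\automaticdiscretionary book % its verbose primitive companion
\stoptyping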

\stopsection

\startsection[title={The main control loop}]

\topicindex {main loop}
\topicindex {hyphenation}

In \LUATEX's main loop, almost all input characters that are to be typeset are
converted into \nod {glyph} node records with subtype \quote {character}, but
there are a few exceptions.

\startitemize[n]

\startitem
    The \prm {accent} primitive creates nodes with subtype \quote {glyph}
    instead of \quote {character}: one for the actual accent and one for the
    accentee. The primary reason for this is that \prm {accent} in \TEX82 is
    explicitly dependent on the current font encoding, so it would not make much
    sense to attach a new meaning to the primitive's name, as that would
    invalidate many old documents and macro packages. A secondary reason is that
    in \TEX82, \prm {accent} prohibits hyphenation of the current word. Since
    in \LUATEX\ hyphenation only takes place on \quote {character} nodes, it is
    possible to achieve the same effect. Of course, modern \UNICODE|-|aware macro
    packages will not use the \prm {accent} primitive at all but try to map
    directly onto composed characters.

    This change of meaning did happen with \prm {char}, which now generates
    \quote {glyph} nodes with a character subtype. In traditional \TEX\ there was
    a strong relationship between the 8|-|bit input encoding, hyphenation and
    glyphs taken from a font. In \LUATEX\ we have \UTF\ input, and in most cases
    this maps directly to a character in a font, apart from glyph replacement in
    the font engine. If you want to access arbitrary glyphs in a font directly
    you can always use \LUA\ to do so, because fonts are available as \LUA\
    tables.
\stopitem

\startitem
    All the results of processing in math mode eventually become nodes with
    \quote {glyph} subtypes. In fact, the result of processing math is just
    a regular list of glyphs, kerns, glue, penalties, boxes etc.
\stopitem

\startitem
    The \ALEPH|-|derived commands \lpr {leftghost} and \lpr {rightghost}
    create nodes of a third subtype: \quote {ghost}. These nodes are ignored
    completely by all further processing until the stage where inter|-|glyph
    kerning is added.
\stopitem

\startitem
    Automatic discretionaries are handled differently. \TEX82 inserts an empty
    discretionary after sensing an input character that matches the \prm
    {hyphenchar} in the current font. This test is wrong in our opinion: whether
    or not hyphenation takes place should not depend on the current font, it is a
    language property. \footnote {When \TEX\ showed up we didn't have \UNICODE\
    yet, and being limited to eight bits meant that one sometimes had to
    compromise between supporting character input, glyph rendering and
    hyphenation.}

    In \LUATEX, it works like this: if \LUATEX\ senses a string of input
    characters that matches the value of the new integer parameter \prm
    {exhyphenchar}, it will insert an explicit discretionary after that series of
    nodes. Initially, \TEX\ sets \type {\exhyphenchar=`\-}. Incidentally, this
    is a global parameter instead of a language|-|specific one because it may be
    useful to change the value depending on the document structure instead of the
    text language.

    The insertion of discretionaries after a sequence of explicit hyphens happens
    at the same time as the other hyphenation processing, {\it not\/} inside the
    main control loop.

    The only use \LUATEX\ has for \prm {hyphenchar} is in the check whether a
    word should be considered for hyphenation at all. If the \prm {hyphenchar}
    of the font attached to the first character node in a word is negative, then
    hyphenation of that word is abandoned immediately. This behaviour is added
    for backward compatibility only, and \type {\hyphenchar=-1} should not be
    used as a means of preventing hyphenation in new \LUATEX\ documents.
\stopitem

\startitem
    The \prm {setlanguage} command no longer creates whatsits. The meaning of
    \prm {setlanguage} is changed so that it is now an integer parameter like all
    others. That integer parameter is used in \type {\glyph_node} creation to add
    language information to the glyph nodes. In conjunction, the \prm {language}
    primitive is extended so that it always also updates the value of \prm
    {setlanguage}.
\stopitem

\startitem
    The \prm {noboundary} command (which prohibits word boundary processing
    where that would normally take place) now does create nodes. These nodes are
    needed because the exact place of the \prm {noboundary} command in the
    input stream has to be retained until after the ligature and font processing
    stages.
\stopitem

\startitem
    There is no longer a \type {main_loop} label in the code. Remember that
    \TEX82 did quite a lot of processing while adding \type {char_nodes} to the
    horizontal list? For speed reasons, it handled that processing code outside
    of the \quote {main control} loop, and only the first character of any \quote
    {word} was handled by that \quote {main control} loop. In \LUATEX, there is
    no longer a need for that (all hard work is done later), and the (now very
    small) bits of character|-|handling code have been moved back inline. When
    \prm {tracingcommands} is on, this is visible because the full word is
    reported, instead of just the initial character.
\stopitem

\stopitemize

Because we tend to make hard|-|coded behaviour configurable, a few new primitives
have been added:

\starttyping
\hyphenpenaltymode
\automatichyphenpenalty
\explicithyphenpenalty
\stoptyping

The first parameter determines which penalty is used for automatic discretionaries
(the ones resulting from an \prm {exhyphenchar}) and for explicit ones (\prm
{-}):

\starttabulate[|c|l|l|]
\DB mode     \BC automatic disc \type {-}      \BC explicit disc \prm{-}         \NC \NR
\TB
\NC \type{0} \NC \prm {exhyphenpenalty}        \NC \prm {exhyphenpenalty}        \NC \NR
\NC \type{1} \NC \prm {hyphenpenalty}          \NC \prm {hyphenpenalty}          \NC \NR
\NC \type{2} \NC \prm {exhyphenpenalty}        \NC \prm {hyphenpenalty}          \NC \NR
\NC \type{3} \NC \prm {hyphenpenalty}          \NC \prm {exhyphenpenalty}        \NC \NR
\NC \type{4} \NC \lpr {automatichyphenpenalty} \NC \lpr {explicithyphenpenalty}  \NC \NR
\NC \type{5} \NC \prm {exhyphenpenalty}        \NC \lpr {explicithyphenpenalty}  \NC \NR
\NC \type{6} \NC \prm {hyphenpenalty}          \NC \lpr {explicithyphenpenalty}  \NC \NR
\NC \type{7} \NC \lpr {automatichyphenpenalty} \NC \prm {exhyphenpenalty}        \NC \NR
\NC \type{8} \NC \lpr {automatichyphenpenalty} \NC \prm {hyphenpenalty}          \NC \NR
\LL
\stoptabulate

Other values do what we always did in \LUATEX: insert \prm {exhyphenpenalty}.
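
For example, to give the two kinds of discretionaries their own, independent
penalties, one can select mode~4 from the table above. The penalty values below
are arbitrary illustrative numbers, not recommendations.

\starttyping
\hyphenpenaltymode=4       % use the two dedicated penalties below
\automatichyphenpenalty=50 % automatic discretionaries (from \exhyphenchar)
\explicithyphenpenalty=100 % explicit discretionaries (from \-)
\stoptyping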

\stopsection

\startsection[title={Loading patterns and exceptions},reference=patternsexceptions]

\topicindex {hyphenation}
\topicindex {hyphenation+patterns}
\topicindex {hyphenation+exceptions}
\topicindex {patterns}
\topicindex {exceptions}

Although we keep the traditional approach towards hyphenation (which is still
superior), the implementation of the hyphenation algorithm in \LUATEX\ is quite
different from the one in \TEX82.

After expansion, the argument for \prm {patterns} has to be proper \UTF8 with
individual patterns separated by spaces; no \prm {char} or \prm {chardef}d
commands are allowed. The current implementation is quite strict and will reject
all non|-|\UNICODE\ characters. Likewise, the expanded argument for \prm
{hyphenation} also has to be proper \UTF8, but here a bit of extra syntax is
provided:

\startitemize[n]
\startitem
    Three sets of arguments in curly braces (\type {{}{}{}}) indicate a desired
    complex discretionary, with arguments as for the \prm {discretionary}
    command in normal document input.
\stopitem
\startitem
    A \type {-} indicates a desired simple discretionary, cf.\ \type {\-} and
    \type {\discretionary{-}{}{}} in normal document input.
\stopitem
\startitem
    Internal command names are ignored. This rule is provided especially for \prm
    {discretionary}, but it also helps to deal with \prm {relax} commands that
    may sneak in.
\stopitem
\startitem
    An \type {=} indicates a (non|-|discretionary) hyphen in the document input.
\stopitem
\stopitemize

The expanded argument is first converted back to a space|-|separated string while
dropping the internal command names. This string is then converted into a
dictionary by a routine that creates key|-|value pairs by converting the other
listed items. It is important to note that the keys in an exception dictionary
can always be generated from the values. Here are a few examples:

\starttabulate[|l|l|l|]
\DB value                  \BC implied key (input) \BC effect \NC\NR
\TB
\NC \type {ta-ble}         \NC table               \NC \type {ta\-ble} ($=$ \type {ta\discretionary{-}{}{}ble}) \NC\NR
\NC \type {ba{k-}{}{c}ken} \NC backen              \NC \type {ba\discretionary{k-}{}{c}ken} \NC\NR
\LL
\stoptabulate

The resultant patterns and exception dictionary will be stored under the language
code that is the present value of \prm {language}.
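
Because the dictionary is tied to the value of \prm {language} at the moment of
loading, exceptions are typically given right after selecting a language. In the
following sketch the language number~4 is hypothetical, and the exceptions are
the two values from the table above.

\starttyping
\language=4                         % hypothetical language number
\hyphenation{ta-ble ba{k-}{}{c}ken} % both entries are stored under language 4
\stoptyping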

In the last line of the table, you see there is no \prm {discretionary} command
in the value: the command is optional in the \TEX-based input syntax. The
underlying reason for that is that it is conceivable that a whole dictionary of
words is stored as a plain text file and loaded into \LUATEX\ using one of the
functions in the \LUA\ \type {lang} library. This loading method is quite a bit
faster than going through the \TEX\ language primitives, but some (most?) of that
speed gain would be lost if it had to interpret command sequences while doing so.

It is possible to specify extra hyphenation points in compound words by using
\type {{-}{}{-}} for the explicit hyphen character (replace \type {-} by the
actual explicit hyphen character if needed). For example, this matches the word
\quote {multi|-|word|-|boundaries} and allows an extra break in between \quote
{boun} and \quote {daries}:

\starttyping
\hyphenation{multi{-}{}{-}word{-}{}{-}boun-daries}
\stoptyping

The motivation behind the \ETEX\ extension \prm {savinghyphcodes} was that
hyphenation heavily depended on font encodings. This is no longer true in
\LUATEX, so the primitive no longer serves its original \ETEX\ purpose. Because we
now have \lpr {hjcode}, the case|-|related codes can be used exclusively for \prm
{uppercase} and \prm {lowercase}.

The three|-|brace|-|pair pattern in an exception can be somewhat unexpected, so
we will try to explain it by example. The pattern \type {foo{}{}{x}bar} creates
a lookup key \type {fooxbar} and the pattern \type {foo{}{}{}bar} creates \type
{foobar}. Then, when a hit happens, there is a replacement text (\type {x}) or
none. Because we introduced penalties in discretionary nodes, the exception
syntax can now also take a penalty specification. The value between square
brackets is a multiplier for \lpr {exceptionpenalty}. Here we have set it to
10000, so effectively we get 30000 in the example.
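
As a minimal sketch of this syntax, the following hypothetical exception gives
the break in \type {foobar} a penalty of twice \lpr {exceptionpenalty}. The value
500 is arbitrary, and we assume the multiplier applies exactly as described above;
the key is built from the replacement text, which is empty here.

\starttyping
\exceptionpenalty=500
\hyphenation{foo{-}{}{}[2]bar} % key: foobar, break penalty: 2 x 500 = 1000
\stoptyping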

\def\ShowSample#1#2%
  {\startlinecorrection[blank]
   \hyphenation{#1}%
   \exceptionpenalty=10000
   \bTABLE[foregroundstyle=type]
     \bTR
       \bTD[align=middle,nx=4] \type{#1} \eTD
     \eTR
     \bTR
       \bTD[align=middle] \type{10em} \eTD
       \bTD[align=middle] \type {3em} \eTD
       \bTD[align=middle] \type {0em} \eTD
       \bTD[align=middle] \type {6em} \eTD
     \eTR
     \bTR
       \bTD[width=10em]\vtop{\hsize 10em 123 #2 123\par}\eTD
       \bTD[width=10em]\vtop{\hsize  3em 123 #2 123\par}\eTD
       \bTD[width=10em]\vtop{\hsize  0em 123 #2 123\par}\eTD
       \bTD[width=10em]\vtop{\setupalign[verytolerant,stretch]\rmtf\hsize 6em 123 #2 #2 #2 #2 123\par}\eTD
     \eTR
   \eTABLE
   \stoplinecorrection}

\ShowSample{x{a-}{-b}{}x{a-}{-b}{}x{a-}{-b}{}x{a-}{-b}{}xx}{xxxxxx}
\ShowSample{x{a-}{-b}{}x{a-}{-b}{}[3]x{a-}{-b}{}[1]x{a-}{-b}{}xx}{xxxxxx}

\ShowSample{z{a-}{-b}{z}{a-}{-b}{z}{a-}{-b}{z}{a-}{-b}{z}z}{zzzzzz}
\ShowSample{z{a-}{-b}{z}{a-}{-b}{z}[3]{a-}{-b}{z}[1]{a-}{-b}{z}z}{zzzzzz}

\stopsection
654
655\startsection[title={Applying hyphenation}]
656
657\topicindex {hyphenation+how it works}
658\topicindex {hyphenation+discretionaries}
659\topicindex {discretionaries}
660
The internal structures that \LUATEX\ uses for inserting discretionaries in
words are very different from those in \TEX82, and that means there are some
noticeable differences in handling as well.
664
665First and foremost, there is no \quote {compressed trie} involved in hyphenation.
666The algorithm still reads pattern files generated by \PATGEN, but \LUATEX\ uses a
667finite state hash to match the patterns against the word to be hyphenated. This
668algorithm is based on the \quote {libhnj} library used by \OPENOFFICE, which in
669turn is inspired by \TEX.
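The principle of matching patterns through a hash rather than a compressed trie
can be sketched in a few lines of Python. This is a deliberately naive version of
Liang's pattern algorithm, not the libhnj finite state machine: it simply looks up
every substring of the word (surrounded by boundary dots) in a dictionary, keeps
the maximum value at each position and allows a break at odd values. The function
names are hypothetical, lowercasing stands in for the hj code mapping, and the
\type {left_min} and \type {right_min} arguments are assumptions mirroring \prm
{lefthyphenmin} and \prm {righthyphenmin}.

\starttyping
import re
from typing import Dict, List


def parse_patterns(text: str) -> Dict[str, List[int]]:
    """Turn PATGEN-style patterns such as '.ach4 f1f' into a hash that maps
    the letters of each pattern to its inter-letter values."""
    table: Dict[str, List[int]] = {}
    for pattern in text.split():
        letters = re.sub(r"[0-9]", "", pattern)
        values = [0] * (len(letters) + 1)
        position = 0
        for ch in pattern:
            if ch in "0123456789":
                values[position] = int(ch)
            else:
                position += 1
        table[letters] = values
    return table


def hyphenate(word: str, table: Dict[str, List[int]],
              left_min: int = 2, right_min: int = 3) -> List[str]:
    """Look up every substring of '.word.', keep the maximum value at each
    position, and break where the resulting value is odd."""
    work = "." + word.lower() + "."
    points = [0] * (len(work) + 1)        # points[j]: value just before work[j]
    for start in range(len(work)):
        for end in range(start + 1, len(work) + 1):
            values = table.get(work[start:end])
            if values:
                for k, value in enumerate(values):
                    points[start + k] = max(points[start + k], value)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2:             # break between word[i-1] and word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print(hyphenate("office", parse_patterns("f1f f1i")))        # ['of', 'f', 'ice']
print(hyphenate("office", parse_patterns("f1f f1i 2ice")))   # ['of', 'fice']
\stoptyping

The two toy pattern sets reproduce the \type {of-f-ice} and \type {of-fice}
splits that are used in the discussion of ligatures in this chapter; the even
value in \type {2ice} suppresses the second break.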
670
671There are a few differences between \LUATEX\ and \TEX82 that are a direct result
672of the implementation:
673
674\startitemize
675\startitem
676    \LUATEX\ happily hyphenates the full \UNICODE\ character range.
677\stopitem
678\startitem
    Pattern and exception dictionary size is limited only by the available
    memory, because all allocations are done dynamically. The trie|-|related
    settings in \type {texmf.cnf} are ignored.
682\stopitem
683\startitem
684    Because there is no \quote {trie preparation} stage, language patterns never
685    become frozen. This means that the primitive \prm {patterns} (and its \LUA\
686    counterpart \type {lang.patterns}) can be used at any time, not only in
687    ini\TEX.
688\stopitem
689\startitem
690    Only the string representation of \prm {patterns} and \prm {hyphenation} is
691    stored in the format file. At format load time, they are simply
692    re|-|evaluated. It follows that there is no real reason to preload languages
693    in the format file. In fact, it is usually not a good idea to do so. It is
694    much smarter to load patterns no sooner than the first time they are actually
695    needed.
696\stopitem
697\startitem
698    \LUATEX\ uses the language-specific variables \lpr {prehyphenchar} and \lpr
699    {posthyphenchar} in the creation of implicit discretionaries, instead of
700    \TEX82's \prm {hyphenchar}, and the values of the language|-|specific
701    variables \lpr {preexhyphenchar} and \lpr {postexhyphenchar} for explicit
702    discretionaries (instead of \TEX82's empty discretionary).
703\stopitem
704\startitem
    The values of the two counters related to hyphenation, \prm {hyphenpenalty}
    and \prm {exhyphenpenalty}, are now stored in the discretionary nodes. This
    permits a local overload for explicit \prm {discretionary} commands. The
    values that are current when the hyphenation pass is applied are used. When
    no callbacks are used this is compatible with traditional \TEX. When you
    apply the \LUA\ \type {lang.hyphenate} function, the current values are used.
711\stopitem
712\startitem
    The hyphenation exception dictionary is maintained as a key|-|value hash, and
714    that is also dynamic, so the \type {hyph_size} setting is not used either.
715\stopitem
716\stopitemize
717
Because we store penalties in the disc node, the \prm {discretionary} command has
719been extended to accept an optional penalty specification, so you can do the
720following:
721
722\startbuffer
723\hsize1mm
7241:foo{\hyphenpenalty 10000\discretionary{}{}{}}bar\par
7252:foo\discretionary penalty 10000 {}{}{}bar\par
7263:foo\discretionary{}{}{}bar\par
727\stopbuffer
728
729\typebuffer
730
731This results in:
732
733\blank \start \getbuffer \stop \blank
734
735Inserted characters and ligatures inherit their attributes from the nearest glyph
736node item (usually the preceding one, but the following one for the items
737inserted at the left-hand side of a word).
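The inheritance rule can be modelled in Python as follows. This is a sketch
under simplifying assumptions: the \type {Node} class and \type
{inherit_attributes} function are hypothetical, attributes are reduced to a
single integer instead of a real attribute list, and a word is just a list of
nodes in which inserted items are flagged.

\starttyping
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    text: str
    attr: Optional[int] = None   # simplified: one value instead of an attribute list
    inserted: bool = False       # True for items added by hyphenation or ligaturing


def inherit_attributes(word: List[Node]) -> None:
    """Give every inserted item the attribute of the nearest original glyph:
    the preceding one, or the following one when nothing precedes it."""
    for i, node in enumerate(word):
        if not node.inserted:
            continue
        before = [n for n in word[:i] if not n.inserted]
        after = [n for n in word[i + 1:] if not n.inserted]
        source = before[-1] if before else (after[0] if after else None)
        if source is not None:
            node.attr = source.attr


word = [Node("<left>", inserted=True), Node("o", 1), Node("f", 2),
        Node("-", inserted=True), Node("f", 3)]
inherit_attributes(word)
print([n.attr for n in word])   # [1, 1, 2, 2, 3]
\stoptyping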
738
Word boundaries are no longer implied by font switches, but by language switches.
One word can have two separate fonts and still be hyphenated correctly (but it
cannot have two different languages: the \prm {setlanguage} command forces a
word boundary).
743
All languages start out with \type {\prehyphenchar=`\-}, \type {\posthyphenchar=0},
\type {\preexhyphenchar=0} and \type {\postexhyphenchar=0}. When you assign a
value to one of these four parameters, you are actually changing the setting
for the current \prm {language}; this behaviour is compatible with \prm {patterns}
and \prm {hyphenation}.
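A small Python model of these rules may help. Each language gets its own record,
initialised with the documented defaults, and an assignment only touches the
record of the language that is current. The class and function names are
hypothetical, and the \type {implicit_disc} helper assumes, as stated for \type
{lang.prehyphenchar} in the library section of this chapter, that a value of 0
means \quote {empty}.

\starttyping
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LanguageParameters:
    # documented initial values: a hyphen (45) before an implicit break,
    # 0 (meaning "empty") for the other three
    prehyphenchar: int = ord("-")
    posthyphenchar: int = 0
    preexhyphenchar: int = 0
    postexhyphenchar: int = 0


@dataclass
class LanguageState:
    current: int = 0
    languages: Dict[int, LanguageParameters] = field(default_factory=dict)

    def params(self) -> LanguageParameters:
        # a language seen for the first time starts out with the defaults
        return self.languages.setdefault(self.current, LanguageParameters())

    def assign(self, name: str, value: int) -> None:
        # an assignment changes the setting of the current language only
        setattr(self.params(), name, value)


def as_text(code: int) -> str:
    return chr(code) if code else ""


def implicit_disc(p: LanguageParameters) -> Tuple[str, str]:
    """Pre-break and post-break text of an implicit discretionary."""
    return as_text(p.prehyphenchar), as_text(p.posthyphenchar)


state = LanguageState()
state.current = 1
state.assign("posthyphenchar", ord("-"))
print(implicit_disc(state.params()))   # ('-', '-')
state.current = 2
print(implicit_disc(state.params()))   # ('-', '')
\stoptyping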
749
750\LUATEX\ also hyphenates the first word in a paragraph. Words can be up to 256
751characters long (up from 64 in \TEX82). Longer words are ignored right now, but
752eventually either the limitation will be removed or perhaps it will become
753possible to silently ignore the excess characters (this is what happens in
754\TEX82, but there the behaviour cannot be controlled).
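The word rules of the last few paragraphs (a language switch ends a word, a font
switch does not, and words longer than 256 characters are skipped) can be
sketched in Python as follows. Glyphs are modelled as hypothetical \type
{(character, font, language)} triples, and \type {isalpha} is a crude stand|-|in
for the hj codes that really decide what belongs to a word. In line with
\LUATEX, the first word is not excluded.

\starttyping
from typing import List, Optional, Tuple

MAX_WORD_LENGTH = 256      # longer words are currently ignored

Glyph = Tuple[str, str, int]   # (character, font, language), hypothetical model


def hyphenatable_words(glyphs: List[Glyph]) -> List[str]:
    """Split a run of glyphs into words: a change of language ends a word,
    a change of font does not; overly long words are left out."""
    words: List[str] = []
    current: List[str] = []
    language: Optional[int] = None
    for char, _font, lang in glyphs:
        if not char.isalpha() or lang != language:
            if current:
                words.append("".join(current))
            current = []
            language = lang
        if char.isalpha():
            current.append(char)
    if current:
        words.append("".join(current))
    return [w for w in words if len(w) <= MAX_WORD_LENGTH]


mixed_fonts = [(c, "it" if c == "f" else "rm", 1) for c in "office"]
print(hyphenatable_words(mixed_fonts))   # ['office']
two_languages = [(c, "rm", 1) for c in "foo"] + [(c, "rm", 2) for c in "bar"]
print(hyphenatable_words(two_languages))   # ['foo', 'bar']
\stoptyping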
755
756If you are using the \LUA\ function \type {lang.hyphenate}, you should be aware
757that this function expects to receive a list of \quote {character} nodes. It will
758not operate properly in the presence of \quote {glyph}, \quote {ligature}, or
759\quote {ghost} nodes, nor does it know how to deal with kerning.
760
761\stopsection
762
763\startsection[title={Applying ligatures and kerning}]
764
765\topicindex {ligatures}
766\topicindex {kerning}
767
768After all possible hyphenation points have been inserted in the list, \LUATEX\
769will process the list to convert the \quote {character} nodes into \quote {glyph}
770and \quote {ligature} nodes. This is actually done in two stages: first all
771ligatures are processed, then all kerning information is applied to the result
list. But those two stages are somewhat dependent on each other: if the font in
use makes it possible to do so, the ligaturing stage adds virtual \quote {character}
774nodes to the word boundaries in the list. While doing so, it removes and
775interprets \prm {noboundary} nodes. The kerning stage deletes those word
776boundary items after it is done with them, and it does the same for \quote
777{ghost} nodes. Finally, at the end of the kerning stage, all remaining \quote
778{character} nodes are converted to \quote {glyph} nodes.
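Leaving boundary characters, \prm {noboundary} and ghost nodes aside, the order of
the two stages can be sketched in Python with a made|-|up font table. Ligatures
are formed first, and kerns are looked up only afterwards, between the resulting
glyphs, so a kern such as the one between \type {o} and \type {<ffi>} can only be
found once the ligature exists. The tables and function names are hypothetical,
and the pairwise merge is a simplification of the lig/kern programs of real fonts.

\starttyping
from typing import Dict, List, Tuple, Union

# Toy font data; the glyph names and kern values are made up.
LIGATURES: Dict[Tuple[str, str], str] = {
    ("f", "f"): "<ff>", ("f", "i"): "<fi>", ("<ff>", "i"): "<ffi>"}
KERNS: Dict[Tuple[str, str], int] = {("o", "<ffi>"): -1, ("c", "e"): -2}

Item = Union[str, int]   # a glyph name or a kern amount


def ligature_stage(chars: List[str]) -> List[str]:
    """First stage: repeatedly merge the last two items while the font ligates them."""
    out: List[str] = []
    for ch in chars:
        out.append(ch)
        while len(out) >= 2 and (out[-2], out[-1]) in LIGATURES:
            right = out.pop()
            left = out.pop()
            out.append(LIGATURES[(left, right)])
    return out


def kerning_stage(glyphs: List[str]) -> List[Item]:
    """Second stage: insert kerns between the glyphs produced by stage one."""
    out: List[Item] = []
    for i, glyph in enumerate(glyphs):
        if i > 0 and (glyphs[i - 1], glyph) in KERNS:
            out.append(KERNS[(glyphs[i - 1], glyph)])
        out.append(glyph)
    return out


print(kerning_stage(ligature_stage(list("office"))))
# ['o', -1, '<ffi>', 'c', -2, 'e']
\stoptyping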
779
This division of work is worth mentioning because, if you overrule only one of
the two font|-|related callbacks from \LUA, you have to perform the tasks
normally done by \LUATEX\ itself in order to make sure that the other,
non|-|overruled, routine continues to function properly.
784
Although we could improve the situation, the reality is that in modern \OPENTYPE\
fonts ligatures can be constructed in many ways: by replacing a sequence of
characters by one glyph, by selectively replacing individual glyphs, by
kerning, or by any combination of these. Add contextual analysis to that and it
becomes clear that we have to let \LUA\ do that job instead. The generic font
handler that we provide (which is part of \CONTEXT) distinguishes between base
mode (which is essentially what we describe here and which delegates the task to
\TEX) and node mode (which deals with more complex fonts).
793
794Let's look at an example. Take the word \type {office}, hyphenated \type
795{of-fice}, using a \quote {normal} font with all the \type {f}-\type {f} and
796\type {f}-\type {i} type ligatures:
797
798\starttabulate[|l|l|]
799\NC initial              \NC \type {{o}{f}{f}{i}{c}{e}}             \NC\NR
800\NC after hyphenation    \NC \type {{o}{f}{{-},{},{}}{f}{i}{c}{e}}  \NC\NR
801\NC first ligature stage \NC \type {{o}{{f-},{f},{<ff>}}{i}{c}{e}}  \NC\NR
802\NC final result         \NC \type {{o}{{f-},{<fi>},{<ffi>}}{c}{e}} \NC\NR
803\stoptabulate
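To see that this single discretionary covers both outcomes, the following Python
sketch produces the lines that result from either taking or ignoring the break. A
discretionary is modelled as a \type {(pre, post, replace)} triple, strings like
\type {<fi>} stand for ligature glyphs, and the \type {render} helper is
hypothetical.

\starttyping
from typing import List, Tuple, Union

Disc = Tuple[str, str, str]     # (pre, post, replace)
Item = Union[str, Disc]         # a plain string is an ordinary glyph


def render(items: List[Item], break_at: int = -1) -> List[str]:
    """Lines produced when breaking at the discretionary with index break_at,
    or nowhere when break_at is -1."""
    lines = [""]
    for index, item in enumerate(items):
        if isinstance(item, str):
            lines[-1] += item
        elif index == break_at:
            pre, post, _replace = item
            lines[-1] += pre            # pre-break text ends the line
            lines.append(post)          # post-break text starts the next one
        else:
            lines[-1] += item[2]        # no break: use the replacement text
    return lines


office = ["o", ("f-", "<fi>", "<ffi>"), "c", "e"]
print(render(office))               # ['o<ffi>ce']
print(render(office, break_at=1))   # ['of-', '<fi>ce']
\stoptyping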
804
805That's bad enough, but let us assume that there is also a hyphenation point
806between the \type {f} and the \type {i}, to create \type {of-f-ice}. Then the
807final result should be:
808
809\starttyping
810{o}{{f-},
811    {{f-},
812     {i},
813     {<fi>}},
814    {{<ff>-},
815     {i},
816     {<ffi>}}}{c}{e}
817\stoptyping
818
819with discretionaries in the post-break text as well as in the replacement text of
820the top-level discretionary that resulted from the first hyphenation point.
821
822Here is that nested solution again, in a different representation:
823
824\testpage[4]
825
826\starttabulate[|l|c|c|c|c|c|c|]
827\DB         \BC pre           \BC     \BC post      \BC       \BC replace       \BC       \NC \NR
828\TB
829\NC topdisc \NC \type {f-}    \NC (1) \NC           \NC sub 1 \NC               \NC sub 2 \NC \NR
830\NC sub 1   \NC \type {f-}    \NC (2) \NC \type {i} \NC (3)   \NC \type {<fi>}  \NC (4)   \NC \NR
831\NC sub 2   \NC \type {<ff>-} \NC (5) \NC \type {i} \NC (6)   \NC \type {<ffi>} \NC (7)   \NC \NR
832\LL
833\stoptabulate
834
835When line breaking is choosing its breakpoints, the following fields will
836eventually be selected:
837
838\starttabulate[|l|c|c|]
839\NC \type {of-f-ice} \NC \type {f-}    \NC (1) \NC \NR
840\NC                  \NC \type {f-}    \NC (2) \NC \NR
841\NC                  \NC \type {i}     \NC (3) \NC \NR
842\NC \type {of-fice}  \NC \type {f-}    \NC (1) \NC \NR
843\NC                  \NC \type {<fi>}  \NC (4) \NC \NR
844\NC \type {off-ice}  \NC \type {<ff>-} \NC (5) \NC \NR
845\NC                  \NC \type {i}     \NC (6) \NC \NR
846\NC \type {office}   \NC \type {<ffi>} \NC (7) \NC \NR
847\stoptabulate
848
849The current solution in \LUATEX\ is not able to handle nested discretionaries,
850but it is in fact smart enough to handle this fictional \type {of-f-ice} example.
851It does so by combining two sequential discretionary nodes as if they were a
852single object (where the second discretionary node is treated as an extension of
853the first node).
854
855One can observe that the \type {of-f-ice} and \type {off-ice} cases both end with
856the same actual post replacement list (\type {i}), and that this would be the
case even if \type {i} were the first item of a potential following ligature like
\type {ic}. This allows \LUATEX\ to do away with one of the fields, and thus make
the whole construct fit into just two discretionary nodes.
860
861The mapping of the seven list fields to the six fields in this discretionary node
862pair is as follows:
863
864\starttabulate[|l|c|c|]
865\DB field                 \BC description   \NC       \NC \NR
866\TB
867\NC \type {disc1.pre}     \NC \type {f-}    \NC (1)   \NC \NR
868\NC \type {disc1.post}    \NC \type {<fi>}  \NC (4)   \NC \NR
869\NC \type {disc1.replace} \NC \type {<ffi>} \NC (7)   \NC \NR
870\NC \type {disc2.pre}     \NC \type {f-}    \NC (2)   \NC \NR
871\NC \type {disc2.post}    \NC \type {i}     \NC (3,6) \NC \NR
872\NC \type {disc2.replace} \NC \type {<ff>-} \NC (5)   \NC \NR
873\LL
874\stoptabulate
875
876What is actually generated after ligaturing has been applied is therefore:
877
878\starttyping
879{o}{{f-},
880    {<fi>},
881    {<ffi>}}
882   {{f-},
883    {i},
884    {<ff>-}}{c}{e}
885\stoptyping
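As a reading aid, the following Python sketch reads the four outcomes back from
the six fields of the joined pair, using the numbering of the tables above. The
hypothetical \type {outcome} function only spells out, case by case, which fields
end up in the lines; it does not reproduce how \LUATEX\ processes the pair
internally. Note how \type {disc2.post} serves both (3) and (6).

\starttyping
from typing import List, NamedTuple


class Disc(NamedTuple):
    pre: str
    post: str
    replace: str


# The joined pair for "office", filled in as in the mapping table.
disc1 = Disc(pre="f-", post="<fi>", replace="<ffi>")
disc2 = Disc(pre="f-", post="i", replace="<ff>-")


def outcome(break1: bool, break2: bool) -> List[str]:
    """Lines for 'o' + (disc1, disc2) + 'ce', given which discretionaries break."""
    if break1 and break2:                  # of-f-ice: fields (1), (2), (3)
        return ["o" + disc1.pre, disc2.pre, disc2.post + "ce"]
    if break1:                             # of-fice: fields (1), (4)
        return ["o" + disc1.pre, disc1.post + "ce"]
    if break2:                             # off-ice: fields (5), (6)
        return ["o" + disc2.replace, disc2.post + "ce"]
    return ["o" + disc1.replace + "ce"]    # office: field (7)


for b1, b2 in [(True, True), (True, False), (False, True), (False, False)]:
    print(outcome(b1, b2))
\stoptyping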
886
887The two discretionaries have different subtypes from a discretionary appearing on
888its own: the first has subtype 4, and the second has subtype 5. The need for
889these special subtypes stems from the fact that not all of the fields appear in
890their \quote {normal} location. The second discretionary especially looks odd,
891with things like the \type {<ff>-} appearing in \type {disc2.replace}. The fact
892that some of the fields have different meanings (and different processing code
893internally) is what makes it necessary to have different subtypes: this enables
894\LUATEX\ to distinguish this sequence of two joined discretionary nodes from the
895case of two standalone discretionaries appearing in a row.
896
Of course there is still that relationship with fonts: ligatures can be
implemented by mapping a sequence of glyphs onto one glyph, but also by selective
replacement and kerning. This means that the above examples represent only the
traditional approach.
901
902\stopsection
903
904\startsection[title={Breaking paragraphs into lines}]
905
906\topicindex {linebreaks}
907\topicindex {paragraphs}
908\topicindex {discretionaries}
909
This code is almost unchanged, but because of the above|-|mentioned changes
with respect to discretionaries and ligatures, line breaking will potentially be
different from traditional \TEX. The actual line breaking code is still based on
the \TEX82 algorithms, and it does not expect there to be discretionaries inside
of discretionaries.

But that situation is now fairly common in \LUATEX, due to the changes to the
ligaturing mechanism. In addition, the \LUATEX\ discretionary nodes are
implemented slightly differently from the \TEX82 nodes: the \type {no_break}
text is now embedded inside the disc node, whereas previously these nodes kept
their place in the horizontal list. In traditional \TEX\ the discretionary node
contains a counter indicating how many nodes to skip, but in \LUATEX\ we store
the pre, post and replace text in the discretionary node itself.

The combined effect of these two differences is that \LUATEX\ does not always use
all of the potential breakpoints in a paragraph, especially when fonts with many
ligatures are used. Of course kerning also complicates matters here. Moreover, as
patterns evolve and font handling can influence discretionaries, you need to be
aware that long|-|term consistency is not a matter of the engine alone.
929
930\stopsection
931
932\startsection[title={The \type {lang} library}][library=lang]
933
934\subsection {\type {new} and \type {id}}
935
936\topicindex {languages+library}
937
938\libindex {new}
939\libindex {id}
940
941This library provides the interface to \LUATEX's structure representing a
942language, and the associated functions.
943
944\startfunctioncall
945<language> l = lang.new()
946<language> l = lang.new(<number> id)
947\stopfunctioncall
948
949This function creates a new userdata object. An object of type \type {<language>}
950is the first argument to most of the other functions in the \type {lang} library.
951These functions can also be used as if they were object methods, using the colon
952syntax. Without an argument, the next available internal id number will be
assigned to this object. With an argument, an object will be created that links to
954the internal language with that id number.
955
956\startfunctioncall
957<number> n = lang.id(<language> l)
958\stopfunctioncall
959
960The number returned is the internal \prm {language} id number this object refers to.
961
962\subsection {\type {hyphenation}}
963
964\libindex {hyphenation}
965
The following function either returns the current hyphenation exceptions for
this language or adds new ones:

\startfunctioncall
<string> n = lang.hyphenation(<language> l)
lang.hyphenation(<language> l, <string> n)
\stopfunctioncall

The syntax of the string is explained in~\in {section} [patternsexceptions].

\subsection {\type {clear_hyphenation} and \type {clean}}

\libindex {clear_hyphenation}
\libindex {clean}
981
982\startfunctioncall
983lang.clear_hyphenation(<language> l)
984\stopfunctioncall
985
986This call clears the exception dictionary (string) for this language.
987
988\startfunctioncall
989<string> n = lang.clean(<language> l, <string> o)
990<string> n = lang.clean(<string> o)
991\stopfunctioncall
992
993This function creates a hyphenation key from the supplied hyphenation value. The
994syntax of the argument string is explained in \in {section} [patternsexceptions].
995This function is useful if you want to do something else based on the words in a
996dictionary file, like spell|-|checking.
997
998\subsection {\type {patterns} and \type {clear_patterns}}
999
1000\libindex {patterns}
1001\libindex {clear_patterns}
1002
1003\startfunctioncall
1004<string> n = lang.patterns(<language> l)
1005lang.patterns(<language> l, <string> n)
1006\stopfunctioncall
1007
1008This adds additional patterns for this language object, or returns the current
1009set. The syntax of this string is explained in \in {section}
1010[patternsexceptions].
1011
1012\startfunctioncall
1013lang.clear_patterns(<language> l)
1014\stopfunctioncall
1015
1016This can be used to clear the pattern dictionary for a language.
1017
1018\subsection {\type {hyphenationmin}}
1019
1020\libindex {hyphenationmin}
1021
1022This function sets (or gets) the value of the \TEX\ parameter
1023\type {\hyphenationmin}.
1024
1025\startfunctioncall
<number> n = lang.hyphenationmin(<language> l)
1027lang.hyphenationmin(<language> l, <number> n)
1028\stopfunctioncall
1029
1030\subsection {\type {[pre|post][ex|]hyphenchar}}
1031
1032\libindex {prehyphenchar}
1033\libindex {posthyphenchar}
1034\libindex {preexhyphenchar}
1035\libindex {postexhyphenchar}
1036
1037\startfunctioncall
1038<number> n = lang.prehyphenchar(<language> l)
1039lang.prehyphenchar(<language> l, <number> n)
1040
1041<number> n = lang.posthyphenchar(<language> l)
1042lang.posthyphenchar(<language> l, <number> n)
1043\stopfunctioncall
1044
1045These two are used to get or set the \quote {pre|-|break} and \quote
1046{post|-|break} hyphen characters for implicit hyphenation in this language. The
initial values are decimal 45 (hyphen) and decimal~0 (indicating emptiness).
1048
1049\startfunctioncall
1050<number> n = lang.preexhyphenchar(<language> l)
1051lang.preexhyphenchar(<language> l, <number> n)
1052
1053<number> n = lang.postexhyphenchar(<language> l)
1054lang.postexhyphenchar(<language> l, <number> n)
1055\stopfunctioncall
1056
These get or set the \quote {pre|-|break} and \quote {post|-|break} hyphen
1058characters for explicit hyphenation in this language. Both are initially
1059decimal~0 (indicating emptiness).
1060
1061\subsection {\type {hyphenate}}
1062
1063\libindex {hyphenate}
1064
1065The next call inserts hyphenation points (discretionary nodes) in a node list. If
1066\type {tail} is given as argument, processing stops on that node. Currently,
1067\type {success} is always true if \type {head} (and \type {tail}, if specified)
1068are proper nodes, regardless of possible other errors.
1069
1070\startfunctioncall
1071<boolean> success = lang.hyphenate(<node> head)
1072<boolean> success = lang.hyphenate(<node> head, <node> tail)
1073\stopfunctioncall
1074
1075Hyphenation works only on \quote {characters}, a special subtype of all the glyph
nodes with the node subtype having the value \type {1}. Glyph nodes with
1077different subtypes are not processed. See \in {section} [charsandglyphs] for
1078more details.
1079
1080\subsection {\type {[set|get]hjcode}}
1081
1082\libindex {sethjcode}
1083\libindex {gethjcode}
1084
1085The following two commands can be used to set or query hj codes:
1086
1087\startfunctioncall
1088lang.sethjcode(<language> l, <number> char, <number> usedchar)
1089<number> usedchar = lang.gethjcode(<language> l, <number> char)
1090\stopfunctioncall
1091
When you set an hjcode, the current set gets initialized, unless it was already
initialized because \prm {savinghyphcodes} is larger than zero.
1094
1095\stopsection
1096
1097\stopchapter
1098
1099\stopcomponent
1100
1101% \parindent0pt \hsize=1.1cm
1102% 12-34-56 \par
1103% 12-34-\hbox{56} \par
1104% 12-34-\vrule width 1em height 1.5ex \par
1105% 12-\hbox{34}-56 \par
1106% 12-\vrule width 1em height 1.5ex-56 \par
1107% \hjcode`\1=`\1 \hjcode`\2=`\2 \hjcode`\3=`\3 \hjcode`\4=`\4 \vskip.5cm
1108% 12-34-56 \par
1109% 12-34-\hbox{56} \par
1110% 12-34-\vrule width 1em height 1.5ex \par
1111% 12-\hbox{34}-56 \par
1112% 12-\vrule width 1em height 1.5ex-56 \par
1113
1114