% language=us runpath=texruns:manuals/evenmore \environment evenmore-style \startcomponent evenmore-hyphenation \usebodyfont[pagella] \startchapter[title=Hyphenation] \startsection[title={Introduction}] Hyphenation is driven by the character codes. In a traditional \TEX\ such a code accesses a glyph in a font, which is why the font encoding mattered, but in \LUATEX\ we use \UNICODE\ and when hyphenation is applied. \footnote {In \CONTEXT\ \MKII\ we also use \UTF\ patterns, which made it possible to ship patterns that didn't depend on a font encoding. Mojca and Arthur made \UTF\ the default when the (upgraded) hyphenation pattern project started.} Later, the character codes are adapted by the font handler where they become glyphs. There are moments when you don't want to hyphenate and a cheap trick is to switch to a language that has no hyphenation patterns. But, in a system like \CONTEXT\ that doesn't work well because we have lots of language bound properties. Therefore in \MKIV\ we set the left- and right hyphen minima to extreme values, something that blocks hyphenation quite well. But this is not a pretty solution at all. Even worse is that when we have situations where discretionaries (\type {\discretionary}), automatic (\type{-}) or explicit (\type {\-}) are used these still kick in. For that reason in \LMTX\ we have a mode variable that controls hyphenation. In \LUATEX\ we have primitives like \type {\compoundhyphenmode}, \type {\hyphenationbounds} and \type {\hyphenpenaltymode} that controlled how hyphenation and discretionary injection is handled but when in \LUAMETATEX\ the more generic \type {\hyphenationmode} parameter was introduced the precursors were all merged into this one. One can argue that this is a form of regression but there are good reasons, most noticeably the fact that we keep these properties with glyph nodes so that we have better control over them in grouped situations where as some operations happen when the paragraph as whole get treated local overloads are lost. \footnote {Of course it also is a wink to those who complain that we add primitives to an otherwise leaner variant of \LUATEX, but let us not elaborate on that misunderstanding.} It anyway means that in \LMTX\ we have to set different parameters but that is no big deal because users are supposed to use the more high level interfaces; instead of setting parameters to values one flips bits in \type {\hyphenationmode}, which in the end makes more sense and also permits extensions later without adding much overhead. Currently this mode parameter controls the following options: \starttabulate[|Tr|||] \NC \uchexnumber{\normalhyphenationcode} \NC \type{\normalhyphenationcode} \NC honour the (normal) \type{\discretionary} primitive \NC \NR \NC \uchexnumber{\automatichyphenationcode} \NC \type{\automatichyphenationcode} \NC turn \type {-} into (automatic) discretionaries \NC \NR \NC \uchexnumber{\explicithyphenationcode} \NC \type{\explicithyphenationcode} \NC turn \type {\-} into (explicit) discretionaries \NC \NR \NC \uchexnumber{\syllablehyphenationcode} \NC \type{\syllablehyphenationcode} \NC hyphenate (syllable) according to language \NC \NR \NC \uchexnumber{\uppercasehyphenationcode} \NC \type{\uppercasehyphenationcode} \NC hyphenate uppercase characters too \NC \NR \NC \uchexnumber{\compoundhyphenationcode} \NC \type{\compoundhyphenationcode} \NC permit break at an explicit hyphen (border cases) \NC \NR \NC \uchexnumber{\strictstarthyphenationcode} \NC \type{\strictstarthyphenationcode} \NC traditional \TEX\ compatibility wrt the start of a word \NC \NR \NC \uchexnumber{\strictendhyphenationcode} \NC \type{\strictendhyphenationcode} \NC traditional \TEX\ compatibility wrt the end of a word \NC \NR \NC \uchexnumber{\automaticpenaltyhyphenationcode} \NC \type{\automaticpenaltyhyphenationcode} \NC use \type {\automatichyphenpenalty} \NC \NR \NC \uchexnumber{\explicitpenaltyhyphenationcode} \NC \type{\explicitpenaltyhyphenationcode} \NC use \type {\explicithyphenpenalty} \NC \NR \NC \uchexnumber{\permitgluehyphenationcode} \NC \type{\permitgluehyphenationcode} \NC turn glue in discretionaries into kerns \NC \NR \stoptabulate The default \CONTEXT\ setup is: \starttyping \hyphenationmode \numexpr \normalhyphenationcode + \automatichyphenationcode + \explicithyphenationcode + \syllablehyphenationcode + \uppercasehyphenationcode + \compoundhyphenationcode % \strictstarthyphenationcode % \strictendhyphenationcode + \automaticpenaltyhyphenationcode + \explicitpenaltyhyphenationcode + \permitgluehyphenationcode \relax \stoptyping When a discretionary node is created (triggered by \type {\discretionary}) the current value is used. Injected glyph nodes on the other hand will store the current value and use that when it is needed for hyphenating the list. \stopsection \startsection[title={Controlling hyphenation}] We start with an example that has some Dutch words: \startbuffer[sample] NEDERLANDS\par Nederlands\par nederlands\par \CONTEXT \par test\-test\par test-test \par \stopbuffer \typebuffer[sample] \startbuffer[result] \startlinecorrection \dontleavehmode \dorecurse{\boxlines\scratchboxone} {% \setbox\scratchbox\boxline\scratchboxone#1% \ruledhpack{\strut\unhbox\scratchbox}% \kern.25\emwidth } \stoplinecorrection \stopbuffer When we typeset this with a \type {\hsize} of 2mm we get: \setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]} \getbuffer[result] But when we block hyphenation with \type {\nohyhens} we see: \setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \nohyphens \getbuffer[sample]} \getbuffer[result] The \MKIV\ behavior can be emulated by setting the mode as follows \startbuffer[demo] \bitwiseflip \hyphenationmode \syllablehyphenationcode \stopbuffer \setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[demo] \getbuffer[sample]} \getbuffer[result] This time the three non|-|syllable variants get hyphenated and that is not what we want. In this case there is a \type {\discretionary} in the definition of the macro that generates \CONTEXT\ and, apart from the fact that we might not even want to hyphenate logos, we have to block it when we apply \type {\nohyphens}. This mode setting are directly applied to the three non|-|syllable variants but delayed in the syllable discretionaries because hyphenation happens later so the state becomes a property of glyph nodes. Doing the same for the other discretionaries would demand an adaption of various pieces of the engine code and plugged in user (\LUA) code also has to consider it which makes no sense. \startbuffer[sample] \nohyphens nederlands {\dohyphens nederlands} nederlands\par \stopbuffer \typebuffer[sample] \setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]} \getbuffer[result] Compare this with: \startbuffer[sample] nederlands {\nohyphens nederlands} nederlands\par \stopbuffer \typebuffer[sample] \setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]} \getbuffer[result] \stopsection \startsection[title={Compound hyphenation}] Yet another discretionary related issue is with compound words, that is: cases where \type {\discretionary} commands sit between words. There are of course tricks to deal with it like adding a huge penalty combined with a zero skip. This is okay in a traditional \TEX\ engine but in an opened up one you might not want this. Just to mention one aspect: when processing \OPENTYPE\ fonts you actually need to look into discretionaries in order to deal with glyphs that interact. And you don't want to deal with penalties and skips unless they have an explicit meaning. We show the four possibilities: \startbuffer[sample] nederlands\discretionary {!}{!}{!}nederlands\blank \stopbuffer \typebuffer[sample] \setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]} \getbuffer[result] \startbuffer[sample] nederlands\discretionary options 1 {!}{!}{!}nederlands\blank \stopbuffer \typebuffer[sample] \setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]} \getbuffer[result] \startbuffer[sample] nederlands\discretionary options 2 {!}{!}{!}nederlands\blank \stopbuffer \typebuffer[sample] \setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]} \getbuffer[result] \startbuffer[sample] nederlands\discretionary options 3 {!}{!}{!}nederlands\blank \stopbuffer \typebuffer[sample] \setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]} \getbuffer[result] Here is an example of such an interference. Of course in practice this happens seldom and certainly not with ligatures. Some fonts have kerning between certain glyphs and for instance dashes and there it could matter. \startbuffer ef% \penalty \plustenthousand \hskip \zeropoint \discretionary{-}{f}{f}% \penalty \plustenthousand \hskip \zeropoint e ef\discretionary options 3 {-}{f}{f}e \stopbuffer \typebuffer As you can see, we only get the ligature when we set the options. In the process of processing \OPENTYPE\ features it can be that one actually looses a discretionary, although we try to prevent this when possible. \startlinecorrection \scale[height=2cm]{\setupbodyfont[pagella]\showglyphs\getbuffer} \stoplinecorrection But, as said, the fact that we don't need the penalties and glue helps at the \LUA\ end: the cleaner the node list, the better. \stopsection \startsection[title={Tracing}] The already present tracker command has been extended so handle the options: \startbuffer[sample0] \enabletrackers[discretionaries] \stopbuffer \startbuffer[sample1] test\discretionary {]} {[} {[]}test \stopbuffer \startbuffer[sample2] testing\discretionary {]} {[} {[]}testing \stopbuffer \startbuffer[sample3] testing\discretionary options 3 {]} {[} {[]}testing \stopbuffer \typebuffer[sample0,sample1,sample2,sample3] \setbox\scratchboxone\vbox{\dontcomplain \getbuffer[sample0,sample1]} \getbuffer[result] \setbox\scratchboxone\vbox{\dontcomplain \hsize 2mm \getbuffer[sample0,sample2]} \getbuffer[result] \setbox\scratchboxone\vbox{\dontcomplain \hsize 2mm \getbuffer[sample0,sample3]} \getbuffer[result] \stopsection \startsection[title={Glue in discretionaries}] In the case you cannot predict what goes into a discretionary you can get run into an error message with respect to unsupported glue in a disc node. The mode value \number\permitgluehyphenationcode\space makes glue acceptable and turn into kern, as demonstrated here; \startbuffer {\hsize 1mm \darkblue \discretionary{potential conspiracy}{prophets}{influencers}\par} \stopbuffer \typebuffer The line break occurs but the space in the pre part is of course frozen: {\getbuffer} As usual \TEX\ users will come up with applications. \stopsection \startsection[title={Penalties}] By default the par builder will use the value of \type {\hyphenpenalty} that gets stored in the discretionary node. However, when the \type {\discretionary} is followed by a \type {penalty} keyword and a number, that one will. \stopsection \startsection[title=Exceptions] At some point a user on the \CONTEXT\ mailing list wondered how to deal with a case like this: \startbuffer[example] \switchtobodyfont[pagella]\mainlanguage[de]auffasse \stopbuffer \typebuffer[example] \startlinecorrection \scale[height=2cm]{\inlinebuffer[example]} \stoplinecorrection \startbuffer \startexceptions[de] au{f-}{-f}{ff}(f\zwnj f)asse \stopexceptions \stopbuffer In \LUAMETATEX\ you can block the unwanted ligature using this trick: \typebuffer \getbuffer \startlinecorrection \scale[height=2cm]{\inlinebuffer[example]} \stoplinecorrection The exception mechanism in \LUATEX\ and therefore \LUAMETATEX\ works as follows. When we have this exception: \starttyping au{f-}{-f}{ff}asse \stoptyping the engine will register that exception under \type {auffasse}, that is: the replacement part determines the word. When it runs into that word, it will create a so called discretionary node with a pre, post and replace part. However, it only uses the \type {ff} for a lookup and keeps the original two glyphs: these become the replacement text. However, in \LUAMETATEX\ you can add an alternative replacement: \startbuffer \startexceptions[de] au{f-}{-f}{ff}(st)asse \stopexceptions \stopbuffer \typebuffer \getbuffer This time the replacement text becomes \type {xx}. So we get \type {austasse} and it is that sequence that is seen by the font handler when it applies its tricks. On some fonts however \startbuffer[example] \switchtobodyfont[pagella]\mainlanguage[de]auffasse \stopbuffer \startlinecorrection \scale[height=2cm]{\showglyphs\showfontkerns\inlinebuffer[example]} \stoplinecorrection But in the Pagella font that we use here, a kern is added between the \type {s} and the \type {t}. If you don't want that you can say this: \startbuffer \startexceptions[de] au{f-}{-f}{ff}(s\zwnj t)asse \stopexceptions \stopbuffer \typebuffer \getbuffer \startlinecorrection \scale[height=2cm]{\showglyphs\showfontkerns\inlinebuffer[example]} \stoplinecorrection A \type {zwj} will block a ligature (some fonts have an \type {st} ligature) and a \type {zwnj} blocks a ligatures as well as kerns. You can actually abuse this mechanism for trickery like this: \startbuffer \startexceptions[nl] wis-kun-d{e-}{o}{eo}(e-o)n-der-wijs \stopexceptions \stopbuffer \typebuffer \getbuffer The Dutch word \type {wiskundeonderwijs} is found as exception and comes out like this: \startbuffer[example] \switchtobodyfont[pagella]\mainlanguage[nl]wiskundeonderwijs \stopbuffer \startlinecorrection \scale[height=1cm]{\showglyphs\showfontkerns\inlinebuffer[example]} \stoplinecorrection Watch the hyphen that makes the compound word more visible! The other hyphens in the exception are proper hyphenation points and when a break happens there a hyphen is automatically added. The \type {\nokerning} and \type {\noligaturing} macros can be used grouped: \startbuffer[example] {every}\quad {\nokerning every}\quad {\noligaturing every}\quad {e{\nokerning v}ery}\quad {e{\glyphoptions\noleftkernglyphoptioncode v}ery}\quad {e{\glyphoptions\norightkernglyphoptioncode v}ery}\quad \stopbuffer \typebuffer[example] There are several low level control options. In addition to those shown here we have a pair for ligatures: \typ {\noleftligatureglyphoptioncode} and \typ {\norightligatureglyphoptioncode}. \startlinecorrection[blank] \scale[width=\textwidth]{\showglyphs\showfontkerns\inlinebuffer[example]} \stoplinecorrection There are alternative mechanism, like a blocker that implements a font feature and a replacement mechanism, but these are not discussed here. \stopsection \stopchapter \stopcomponent