1
2
\startcomponent languages-basics
4
\environment languages-environment
6
7\startchapter[title=Some basics][color=darkyellow]
8
9\startsection[title={Introduction}]
10
11In this chapter we will see how we can toggle between languages. A first
12introduction to patterns will be given. Some details of how to control the
13hyphenation with specific patterns will be given in a later chapter.
14
15\stopsection
16
17\startsection[title={Available languages}]
18
19When you use the English version of \CONTEXT\ you will default to US English as
20main language. This means that hyphenation will be US specific, which by the way
21is different from the rules in GB. All labels that are generated by the system
22are also in English. Languages can often be accessed by names like \type
23{english} or \type {dutch} although it is quite common to use the short tags like
24\type {en} and \type {nl}. Because we want to be as compatible as possible with
\MKII, there are quite a few synonyms. The following table lists the languages
for which support is built in.\footnote {More languages can be defined. It is
up to users to provide the information.}
28
29\startbuffer
\usemodule[languages-system]
31
32\loadinstalledlanguages
33\showinstalledlanguages
34\stopbuffer
35
36\getbuffer
37
38You can call up such a table with the following commands:
39
40\typebuffer
41
Alternatively you can run \type {context --global languages-system.mkiv}. Most
languages have a two-character tag but also a more verbose one. Not all
two-character names are available as control sequences! Use the more verbose
ones when possible.
46
47As you can see, many languages have hyphenation patterns but for Japanese,
48Korean, Chinese as well as Arabic languages they make no sense. The patterns are
49loaded on demand. The number is the internal number that is used in the engine; a
50user never has to use that number. Numbers $<1$ are used to disable hyphenation.
The file tag is used to locate and load a specification. Such files have names
like \type {lang-nl.lua}.
53
Some languages share the same hyphenation patterns but can have demands that
differ, like labels or quotes. The characters shown in the table are those found
in the pattern files. The number of patterns differs a lot between languages.
This relates to the approach behind them: some languages use word stems, others
base their hyphenation on syllables. Some languages have inflections, which adds
to the complexity, while others can combine words in ways that demand special
care for word boundaries. Of course an unusually low or high number can also
signal low quality, but most pattern collections have been assembled over many
years and are updated when, for instance, spelling rules change. I think that we
can safely say that most patterns are quite stable and of good quality.
64
65\stopsection
66
67\startsection[title=Switching]
68
69The document language is set with
70
71\starttyping
72\mainlanguage[en]
73\stoptyping
74
75but when you want to apply the proper hyphenation rules to an embedded language
76you can use:
77
78\starttyping
79\language[en]
80\stoptyping
81
82or just:
83
84\starttyping
85\en
86\stoptyping
87
88The main language determines what labels show up, how numbering happens, in what
89way dates get formatted, etc. Normally the \typ {\mainlanguage} command comes
90before the \typ {\starttext} command.
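
As a minimal sketch of how these commands combine (the sentences are made up for
illustration), the main language is set before \typ {\starttext} and an embedded
fragment is switched inside a group, so that the switch ends with the group:

\starttyping
\mainlanguage[en]

\starttext

This sentence is hyphenated with the English patterns,
{\language[nl] deze zin gebruikt de Nederlandse patronen,}
and after the group we are back in English.

\stoptext
\stoptyping

Labels and date formats still follow the main language here; only the
hyphenation of the grouped fragment changes.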
91
92\stopsection
93
94\startsection[title=Hyphenation]
95
In \LUATEX\ each character that gets typeset not only carries a font id and character
code, but also a language number. You can switch language whenever you want and
the change will be carried with the characters. Switching within a word doesn't make
sense but it is permitted:
100
101\starttabulate[T]
102\NC 1 \NC \type{\de incrediblykompliziert} \NC \hyphenatedword{\de incrediblykompliziert} \NC \NR
103\NC 2 \NC \type{\en incrediblykompliziert} \NC \hyphenatedword{\en incrediblykompliziert} \NC \NR
104\NC 3 \NC \type{\en incredibly\de kompliziert} \NC \hyphenatedword{\en incredibly\de kompliziert} \NC \NR
105\NC 4 \NC \type{\en incredibly\de\-kompliziert} \NC \hyphenatedword{\en incredibly\de\-kompliziert} \NC \NR
\NC 5 \NC \type{\en incredibly\de-kompliziert} \NC \hyphenatedword{\en incredibly\de-kompliziert} \NC \NR
\stoptabulate

In line 4 we have a \type {\-} between the two words, and in the last
line just a \type {-}. If you look closely you will notice that the snippets
can be quite small. If we typeset a word with a 1mm text width we get this:
112
113\blank \start \en \hsize 1mm incredibly \par \stop \blank
114
115If you are familiar with the details of hyphenation, you know that the number of
116characters at the end and beginning of a word is controlled by the two variables
117\typ {\lefthyphenmin} and \typ {\righthyphenmin}. However, these only influence
118the hyphenation process. What bits and pieces eventually end up on a line is
determined by the par builder and there the \type {\hsize} matters. In practice
you will not run into these situations, unless you have extremely long words and
a narrow column.
122
Hyphenation is normally limited to the regular characters that make up the
alphabet of a language. It is insensitive to capitalization, as the following
text shows:
125
126\blank
127
128\startnarrower
\hyphenatedword {This time the musical distraction while developing code came
from watching youtube performances of Cory Henry (also known from Snarky Puppy,
a conglomerate of excellent players). Just search the web for his name with \quote
{Stevie Wonder and Michael Jackson Tribute}. There is no keyboard he can't play.
Another interesting keyboard player is Sun Rai (a short name for Rai
Thistlethwayte), just google for \quote {The Beatles, Come Together, Live Piano
Acoustic with Loop Pedal}, or do a combined search with \quote {Matt
Chamberlain}. Okay, and talking of keyboards, let's not forget Vika Yermolyeva
(vkgoeswild) as she's one of a kind too on the web. And then there is Jacob
Collier, in one word: incredible (or hyphenated the Dutch way {\nl incredible},
let me repeat that in French {\fr incredible}).} \footnote {Don't get me wrong,
there are of course many more fantastic musicians.}
141\stopnarrower
142
143\blank
144
Of course, names are often short and don't need to be hyphenated
(or the left and right settings prohibit it). Another complication with names is
that they can come from another language so we either need to switch language
temporarily or we need to add an exception (more about that later).
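
Both options can be sketched as follows. This is only an illustration: the
choice of Dutch for the name is arbitrary, the break points in the exception
are hypothetical and not taken from any dictionary, and it assumes that the
\typ {\startexceptions} environment accepts a language tag.

\starttyping
% option 1: switch language just for the name
a performance by {\nl Thistlethwayte} on piano

% option 2: register an exception for English (break points are made up)
\startexceptions[en]
this-tle-thwayte
\stopexceptions
\stoptyping

The first variant keeps the decision local to one occurrence, while an
exception applies wherever the word shows up in that language.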
149
150\stopsection
151
152\startsection[title=Primitives]
153
154In traditional \TEX\ the language is not a property of a character but is
155triggered by a signal in the (so called) list. Think of:
156
157\starttyping
158<language 1>this is <language 2>nederlands<language 1> mixed with english
159\stoptyping
160
161This number is set by the primitive \typ {\language}. Language triggers are
162injected into the list depending on the value of this number. There is also a \typ
163{\setlanguage} primitive that can inject triggers without setting the \typ
{\language} number. Because in \LUATEX\ the state is kept with the character
you don't need to worry about the subtle differences here.
166
167In \CONTEXT\ the \typ {\language} and \typ {\setlanguage} commands are overloaded
168by a more advanced switch macro. You cannot assume that they work as explained in
general manuals about \TEX. Currently you can still assign a number but that
might change. Just consider the language to be an abstraction and don't mess with
this number. Both commands not only change the current language but also do
172specific initializations when needed.
173
What characters get involved in hyphenation is historically determined by the
so-called \type {\lccode} values. Each character can have such a value, which
maps an uppercase to a lowercase character. This concept has been extended in
\ETEX\ where it binds to a pattern set (language). However, in \CONTEXT\ the
user never has to worry about such details.

In the traditional hyphenator a word will not be hyphenated if the sum of \typ
{\lefthyphenmin} and \typ {\righthyphenmin} exceeds 62. This limitation is not
present in the \LUA\ variant of this routine that will be presented later, as
there is no good reason for it other than implementation constraints.
187
188\stopsection
189
190\startsection[title=Control]
191
192We already mentioned \typ {\lefthyphenmin} and \typ {\righthyphenmin}. These
193two variables control the area in a word that is subjected to hyphenation.
Setting these values is a matter of taste but making them too small can result in
bad hyphenation when the patterns are made with the assumption that certain
minima are used. Using a \typ {\lefthyphenmin} of 2 while the patterns are made
with a value of 3 in mind is a bad idea.
198
199\startlinecorrection[blank]
200\startluacode
context.bTABLE { option = "stretch", align = "middle" }
    -- header rows: the parameter names and the \righthyphenmin values
    context.bTR()
        context.bTD { ny = 2, align = "middle,lohi", style = "monobold" }
            context.verbatim("\\lefthyphenmin")
        context.eTD()
        context.bTD { nx = 5, style = "monobold" }
            context.verbatim("\\righthyphenmin")
        context.eTD()
    context.eTR()
    context.bTR()
        for right=1,5 do
            context.bTD()
                context.mono(right)
            context.eTD()
        end
    context.eTR()
    -- body: one row per \lefthyphenmin value, one cell per combination
    for left=1,5 do
        context.bTR()
            context.bTD()
                context.mono(left)
            context.eTD()
            for right=1,5 do
                context.bTD()
                    context("\\lefthyphenmin %s \\righthyphenmin %s \\hyphenatedword{interesting}",left,right)
                context.eTD()
            end
        context.eTR()
    end
context.eTABLE()
230\stopluacode
231\stoplinecorrection
232
When \TEX\ breaks a paragraph into lines it will try to do so without hyphenation.
When that fails (read: when the badness becomes too high) a next effort will take
hyphenation into account. \footnote {Because in \LUATEX\ we always hyphenate
there is no real gain in trying not to hyphenate. Because in traditional \TEX\
hyphenation happens on the fly a pass without hyphenating makes more sense.} When
the badness is still too high, an optional emergency pass can be made but only
when the tolerances are set to permit this. In \CONTEXT\ you can try these
settings when you get too many overfull or underfull boxes reported on the
console.
241
242\starttyping
243\setupalign[tolerant]
244\setupalign[verytolerant]
245\setupalign[verytolerant,stretch]
246\stoptyping
247
Personally I tend to use the last setting, especially in automated flows. After
all, \TEX\ will not apply stretch unless it's really needed.
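
When only one narrow piece of text causes trouble, the setting can be kept local
by grouping it. The following is a sketch under assumptions: the 4cm width is
arbitrary and \type {tufte} is used as a sample text that is assumed to be
available in the distribution.

\starttyping
\start
    \hsize 4cm
    \setupalign[verytolerant,stretch]
    \input tufte \par
\stop
\stoptyping

The \type {\par} before \type {\stop} matters because the paragraph is broken
into lines with the settings that are active when it ends.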
250
251The two \typ {\*hyphenmin} parameters can be set any time and the current value
252is stored with each character. They can also be set with the language which we
253will see later.
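
As a rough sketch of both approaches (the values are picked for illustration
only and are not recommended settings), the minima can be assigned directly
inside a group or bound to a language with \typ {\setuplanguage}:

\starttyping
% direct assignment, local to the group
\start
    \lefthyphenmin 3 \righthyphenmin 3
    \hyphenatedword{interesting}
\stop

% bound to a language, applied whenever that language is active
\setuplanguage[en][lefthyphenmin=3,righthyphenmin=3]

\start \en \hyphenatedword{interesting} \stop
\stoptyping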
254
When \TEX\ hyphenates words it has to decide where a word starts and ends. In
traditional \TEX\ a word normally starts at a character that falls within the
scope of the hyphenator. It ends when a box (hlist or vlist) is seen, but also
at a rule, discretionary, accent (forget about this in \CONTEXT) or math. An
example will be given in the chapter that discusses the \LUA\ alternative.
260
261\stopsection
262
263\startsection[title=Installing]
264
265 todo
266
267\stopsection
268
269\startsection[title=Modes]
270
271Languages are one of the mechanisms where you can access the current state. There are
272for instance two (official) macros that contain the current (main) language:
273
274\startbuffer
275\starttabulate[Tc]
276\HL
277\NC \bf macro \NC \bf value \NC \NR
278\HL
279\NC \type {\currentmainlanguage} \NC \currentmainlanguage \NC \NR
280\NC \type {\currentlanguage} \NC \currentlanguage \NC \NR
281\HL
282\stoptabulate
283\stopbuffer
284
285\getbuffer
286
287When we have set \type {\language[nl]} we get this:
288
289\start \nl \getbuffer \stop
290
291If you write a style that needs to adapt to a language you can use modes. There
292are several ways to do this:
293
294\startbuffer
295\language[nl]
296
297\startmode[**en]
298 \color[darkred]{main english}
299\stopmode
300
301\startmode[*en]
302 \color[darkred]{local english}
303\stopmode
304
305\startmode[**nl]
306 \color[darkblue]{main dutch}
307\stopmode
308
309\startmode[*nl]
310 \color[darkblue]{local dutch}
311\stopmode
312
313\startmodeset
314 [*en] {\color[darkgreen]{english set}}
315 [*nl] {\color[darkgreen]{dutch set}}
316\stopmodeset
317\stopbuffer
318
319\typebuffer
320
321This typesets:
322
323\blank \startpacked \setupindenting[no] \getbuffer \stoppacked \blank
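
For a quick test inside running text the environment form can be a bit heavy. A
sketch with \typ {\doifmodeelse}, assuming the same \type {*nl} system mode as
in the example above, could look like this:

\starttyping
\language[nl]

\doifmodeelse {*nl} {
    \color[darkblue]{local dutch}
} {
    \color[darkred]{some other language}
}
\stoptyping

With \type {\language[nl]} in effect the first branch is the one that is
expected to be taken.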
324
325When you use setups you can use the following trick:
326
327\startbuffer
328\language[nl]
329
330\startsetups language:en
331 \color[darkorange]{something english}
332\stopsetups
333
334\startsetups language:nl
335 \color[darkorange]{something dutch}
336\stopsetups
337
338\setups[language:\currentlanguage]
339\stopbuffer
340
341\typebuffer
342
343As expected we get:
344
345\blank \start \setupindenting[no] \getbuffer \stop \blank
346
347\stopsection
348
349\stopchapter
350
351\stopcomponent