% language=us runpath=texruns:manuals/languages
% languages-basics.tex, last modification: 2021-10-28 13:50

\startcomponent languages-basics

\environment languages-environment

\startchapter[title=Some basics][color=darkyellow]

\startsection[title={Introduction}]

In this chapter we show how to switch between languages and give a first
introduction to hyphenation patterns. How to control hyphenation with specific
patterns is discussed in a later chapter.

\stopsection

\startsection[title={Available languages}]

When you use the English version of \CONTEXT\ the main language defaults to US
English. This means that hyphenation follows US rules, which, by the way, differ
from the British ones. All labels that are generated by the system are also in
English. Languages can often be accessed by names like \type {english} or \type
{dutch}, although it is quite common to use short tags like \type {en} and \type
{nl}. Because we want to be as compatible as possible with \MKII, there are quite
a few synonyms. The following table lists the languages for which support is
built|-|in.\footnote {More languages can be defined. It is up to users to provide
the information.}

\startbuffer
\usemodule[languages-system]

\loadinstalledlanguages
\showinstalledlanguages
\stopbuffer

\getbuffer

You can call up such a table with the following commands:

\typebuffer

Alternatively, you can run \type {context --global languages-system.mkiv}.

As you can see, many languages have hyphenation patterns, but for Japanese,
Korean and Chinese, as well as for Arabic languages, they make no sense. The
patterns are loaded on demand. The number is the internal number that is used in
the engine; a user never has to use that number. Numbers $<1$ are used to disable
hyphenation. The file tag is used to locate and load a specification. Such files
have names like \type {lang-nl.lua}.

Some languages share the same hyphenation patterns but can have different
demands, for instance with respect to labels or quotes. The characters shown in
the table are those found in the pattern files. The number of patterns differs a
lot between languages. This relates to the system behind them: some languages use
word stems, others base their hyphenation on syllables. Some languages have
inflections, which add to the complexity, while others can combine words in ways
that demand special care at word boundaries. Of course a low or high number can
also signal low quality, but most pattern collections have been assembled over
many years and are updated when, for instance, spelling rules change. I think
that we can safely say that most patterns are quite stable and of good quality.

\stopsection

\startsection[title=Switching]

The document language is set with

\starttyping
\mainlanguage[en]
\stoptyping

but when you want to apply the proper hyphenation rules to an embedded language
you can use:

\starttyping
\language[en]
\stoptyping

or just:

\starttyping
\en
\stoptyping

The main language determines which labels show up, how numbering is done, how
dates are formatted, etc. Normally the \typ {\mainlanguage} command comes before
the \typ {\starttext} command.
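
Combining these commands, a minimal document could look like the following
sketch. The choice of Dutch as main language and the sample sentences are
arbitrary.

\starttyping
% the main language sets labels, numbering and date formatting
\mainlanguage[nl]

\starttext

% hyphenated with the Dutch patterns
Dit is een gewone Nederlandse zin.

% the braces limit the switch to English to this group
{\language[en] This phrase is hyphenated with the English patterns.}

% the short form does the same
{\en And so is this one.}

\stoptext
\stoptyping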

\stopsection

\startsection[title=Hyphenation]

In \LUATEX\ each character that gets typeset carries not only a font id and a
character code but also a language number. You can switch language whenever you
want and the change will be carried with the characters. Switching within a word
doesn't make much sense, but it is permitted:

\starttabulate[|||T|]
\NC 1 \NC \type{\de incrediblykompliziert}      \NC \hyphenatedword{\de incrediblykompliziert}     \NC \NR
\NC 2 \NC \type{\en incrediblykompliziert}      \NC \hyphenatedword{\en incrediblykompliziert}     \NC \NR
\NC 3 \NC \type{\en incredibly\de kompliziert}  \NC \hyphenatedword{\en incredibly\de kompliziert} \NC \NR
\NC 4 \NC \type{\en incredibly\de\-kompliziert} \NC \hyphenatedword{\en incredibly\de\-kompliziert} \NC \NR
\NC 5 \NC \type{\en incredibly\de-kompliziert}  \NC \hyphenatedword{\en incredibly\de-kompliziert} \NC \NR
\stoptabulate

In line 4 we have a \type {\-} between the two words, and in the last line just a
\type {-}. If you look closely you will notice that the hyphenated fragments can
be quite small. If we typeset a word with a text width of 1mm we get this:

\blank \start \en \hsize 1mm incredibly \par \stop \blank

If you are familiar with the details of hyphenation, you know that the minimum
number of characters at the beginning and at the end of a word is controlled by
the two variables \typ {\lefthyphenmin} and \typ {\righthyphenmin}. However,
these only influence the hyphenation process. What bits and pieces eventually end
up on a line is determined by the par builder, and there the \type {\hsize}
matters. In practice you will not run into these situations unless you have
extremely long words and a narrow column.

Hyphenation is normally limited to the regular characters that make up the
alphabet of a language. It is insensitive to capitalization, as the following text
shows:

\blank

\startnarrower
\hyphenatedword {This time the musical distraction while developing code came
from watching YouTube performances of Cory Henry (also known from Snarky Puppy,
a conglomerate of excellent players). Just search the web for his name with \quote
{Stevie Wonder and Michael Jackson Tribute}. There is no keyboard he can't play.
Another interesting keyboard player is Sun Rai (a short name for Rai
Thistlethwayte); just google for \quote {The Beatles, Come Together, Live Piano
Acoustic with Loop Pedal}, or do a combined search with \quote {Matt
Chamberlain}. Okay, and talking of keyboards, let's not forget Vika Yermolyeva
(vkgoeswild) as she's one of a kind too on the web. And then there is Jacob
Collier, in one word: incredible (or hyphenated the Dutch way {\nl incredible},
let me repeat that in French {\fr incredible}).} \footnote {Don't get me wrong,
there are of course many more fantastic musicians.}
\stopnarrower

\blank

Of course, names are often short and don't need to be hyphenated (or the left and
right settings prohibit it). Another complication with names is that they can come
from another language, so we either need to switch language temporarily or add an
exception (more about that later).
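
A sketch of the first option: the name is put in a group together with a
language switch, just as was done with the Dutch and French \quote {incredible}
above. The German name and the sentence are only examples.

\starttyping
% the surname gets the German patterns, the rest of the line the English ones
\en The music of Paul {\de Hindemith} was on the program.
\stoptyping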

\stopsection

\startsection[title=Primitives]

In traditional \TEX\ the language is not a property of a character but is
triggered by a signal in the (so-called) node list. Think of:

\starttyping
<language 1>this is <language 2>nederlands<language 1> mixed with english
\stoptyping

This number is set by the primitive \typ {\language}. Language triggers are
injected into the list depending on the value of this number. There is also a \typ
{\setlanguage} primitive that can inject triggers without setting the \typ
{\language} number. Because in \LUATEX\ the state is kept with the character, you
don't need to worry about the subtle differences here.

In \CONTEXT\ the \typ {\language} and \typ {\setlanguage} commands are overloaded
by a more advanced switch macro. You cannot assume that they work as explained in
general manuals about \TEX. Currently you can still assign a number, but that
might change. Just consider the language to be an abstraction and don't mess with
this number. Both commands not only change the current language but also do
specific initializations when needed.

Which characters take part in hyphenation is historically determined by the
so-called \type {\lccode} values. Each character can have such a value, which maps
an uppercase to a lowercase character. This concept has been extended in \ETEX,
where these codes are bound to a pattern set (language). However, in \CONTEXT\ the
user never has to worry about such details.

% The \type {\patterns} primitive is
% The \type {\hyphenation} primitive is

In traditional \TEX\ a word is not hyphenated at all when the sum of \typ
{\lefthyphenmin} and \typ {\righthyphenmin} exceeds 62. This limitation is not
present in the \LUA\ variant of this routine that we present later, as there is
no good reason for it other than implementation constraints.

\stopsection

\startsection[title=Control]

We already mentioned \typ {\lefthyphenmin} and \typ {\righthyphenmin}. These two
variables control the area in a word that is subjected to hyphenation. Setting
these values is a matter of taste, but making them too small can result in bad
hyphenation when the patterns were made with certain minima in mind. Using a \typ
{\lefthyphenmin} of 2 while the patterns are made with a value of 3 in mind is a
bad idea.

\startlinecorrection[blank]
\startluacode
-- the word "interesting" hyphenated with all combinations of
-- lefthyphenmin and righthyphenmin from 1 to 5
context.bTABLE { option = "stretch", align = "middle" }
    -- first header row: the row label spans two rows, the column label five columns
    context.bTR()
        context.bTD { ny = 2, align = "middle,lohi", style = "monobold" }
            context.verbatim("\\lefthyphenmin")
        context.eTD()
        context.bTD { nx = 5, style = "monobold" }
            context.verbatim("\\righthyphenmin")
        context.eTD()
    context.eTR()
    -- second header row: the righthyphenmin values
    context.bTR()
        for right=1,5 do
            context.bTD()
                context.mono(right)
            context.eTD()
        end
    context.eTR()
    -- one row per lefthyphenmin value, one cell per righthyphenmin value
    for left=1,5 do
        context.bTR()
            context.bTD()
                context.mono(left)
            context.eTD()
            for right=1,5 do
                context.bTD()
                    context("\\lefthyphenmin %s \\righthyphenmin %s \\hyphenatedword{interesting}",left,right)
                context.eTD()
            end
        context.eTR()
    end
context.eTABLE()
\stopluacode
\stoplinecorrection
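
Each cell of the table above sets both parameters before typesetting the word.
Two such settings, as a minimal sketch with braces to keep them local, look like
this:

\starttyping
% no break within the first two or the last two characters
{\lefthyphenmin 2 \righthyphenmin 2 \hyphenatedword{interesting}}

% no break within the first three or the last three characters
{\lefthyphenmin 3 \righthyphenmin 3 \hyphenatedword{interesting}}
\stoptyping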

When \TEX\ breaks a paragraph into lines, it first tries to do so without
hyphenation. When that fails (read: when the badness becomes too high), a second
pass takes hyphenation into account.\footnote {Because in \LUATEX\ hyphenation
always takes place, there is no real gain in trying not to hyphenate. In
traditional \TEX\ hyphenation happens on the fly, so there a pass without
hyphenation makes more sense.} When the badness is still too high, an optional
emergency pass can be made, but only when the tolerances are set to permit this.
In \CONTEXT\ you can try the following settings when you get too many over- or
underfull boxes reported on the console:

\starttyping
\setupalign[tolerant]
\setupalign[verytolerant]
\setupalign[verytolerant,stretch]
\stoptyping

Personally I tend to use the last setting, especially in automated flows. After
all, \TEX\ will not apply stretch unless it's really needed.

The two \typ {\*hyphenmin} parameters can be set at any time, and the current
value is stored with each character. They can also be set as part of the language
settings, as we will see later.
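
For the latter, a possible sketch, assuming that \type {\setuplanguage} accepts
\type {lefthyphenmin} and \type {righthyphenmin} keys (the values are only an
example), is:

\starttyping
% assumed keys: store the minima with the English language settings
\setuplanguage[en][lefthyphenmin=2,righthyphenmin=3]
\stoptyping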

When \TEX\ hyphenates words it has to decide where a word starts and ends. In
traditional \TEX\ a word normally starts at a character that falls within the
scope of the hyphenator. It ends when a box (hlist or vlist) is seen, but also at
a rule, discretionary, accent (forget about this in \CONTEXT) or math. An example
will be given in the chapter that discusses the \LUA\ alternative.

\stopsection

\startsection[title=Installing]

    todo

\stopsection

\startsection[title=Modes]

Languages are one of the mechanisms where you can access the current state. For
instance, there are two (official) macros that contain the current main language
and the current language:

\startbuffer
\starttabulate[||Tc|]
\HL
\NC \bf macro                    \NC \bf value            \NC \NR
\HL
\NC \type {\currentmainlanguage} \NC \currentmainlanguage \NC \NR
\NC \type {\currentlanguage}     \NC \currentlanguage     \NC \NR
\HL
\stoptabulate
\stopbuffer

\getbuffer

When we have set \type {\language[nl]} we get this:

\start \nl \getbuffer \stop

If you write a style that needs to adapt to a language, you can use modes. There
are several ways to do this:

\startbuffer
\language[nl]

\startmode[**en]
    \color[darkred]{main english}
\stopmode

\startmode[*en]
    \color[darkred]{local english}
\stopmode

\startmode[**nl]
    \color[darkblue]{main dutch}
\stopmode

\startmode[*nl]
    \color[darkblue]{local dutch}
\stopmode

\startmodeset
    [*en] {\color[darkgreen]{english set}}
    [*nl] {\color[darkgreen]{dutch set}}
\stopmodeset
\stopbuffer

\typebuffer

This typesets:

\blank \startpacked \setupindenting[no] \getbuffer \stoppacked \blank

When you use setups you can use the following trick:

\startbuffer
\language[nl]

\startsetups language:en
    \color[darkorange]{something english}
\stopsetups

\startsetups language:nl
    \color[darkorange]{something dutch}
\stopsetups

\setups[language:\currentlanguage]
\stopbuffer

\typebuffer

As expected we get:

\blank \start \setupindenting[no] \getbuffer \stop \blank

\stopsection

\stopchapter

\stopcomponent