% language=us runpath=texruns:manuals/languages

\startcomponent languages-basics

\environment languages-environment

\startchapter[title=Some basics][color=darkyellow]

\startsection[title={Introduction}]

In this chapter we will see how we can toggle between languages. A first
introduction to patterns will be given. Some details of how to control the
hyphenation with specific patterns will be given in a later chapter.

\stopsection

\startsection[title={Available languages}]

When you use the English version of \CONTEXT\ you will default to US English as
main language. This means that hyphenation will be US specific, which by the way
is different from the rules in GB. All labels that are generated by the system
are also in English.

Languages can often be accessed by names like \type {english} or \type {dutch}
although it is quite common to use the short tags like \type {en} and \type {nl}.
Because we want to be as compatible as possible with \MKII, there are quite a few
synonyms. The following table lists the languages for which support is
built|-|in.\footnote {More languages can be defined. It is up to users to provide
the information.}

\startbuffer
\usemodule[languages-system]

\loadinstalledlanguages
\showinstalledlanguages
\stopbuffer

\getbuffer

You can call up such a table with the following commands:

\typebuffer

Alternatively you can run \type {context --global languages-system.mkiv}.

As you can see, many languages have hyphenation patterns but for Japanese,
Korean, Chinese as well as Arabic languages they make no sense.

The patterns are loaded on demand. The number is the internal number that is used
in the engine; a user never has to use that number. Numbers $<1$ are used to
disable hyphenation. The file tag is used to locate and load a specification.
Such files have names like \type {lang-nl.lua}. Some languages share the same
hyphenation patterns but can have demands that differ, like labels or quotes.
The characters shown in the table are those found in the pattern files. The
number of patterns differs a lot between languages. This relates to the system
behind them. Some languages use word stems, others base their hyphenation on
syllables. Some languages have inflections, which adds to the complexity, while
others can combine words in ways that demand special care for word boundaries. Of
course a low or high number can signal a low quality as well, but most pattern
collections are assembled over many years and updated when for instance spelling
rules change. I think that we can safely say that most patterns are quite stable
and of good quality.

\stopsection

\startsection[title=Switching]

The document language is set with

\starttyping
\mainlanguage[en]
\stoptyping

but when you want to apply the proper hyphenation rules to an embedded language
you can use:

\starttyping
\language[en]
\stoptyping

or just:

\starttyping
\en
\stoptyping

The main language determines what labels show up, how numbering happens, in what
way dates get formatted, etc. Normally the \typ {\mainlanguage} command comes
before the \typ {\starttext} command.

\stopsection

\startsection[title=Hyphenation]

In \LUATEX\ each character that gets typeset not only carries a font id and
character code, but also a language number. You can switch language whenever you
want and the change will be carried with the characters.
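
As a minimal sketch of such switches (the sample sentences are made up for
illustration and are not taken from an actual document), a local change can be
limited to a group, after which the surrounding language applies again:

\starttyping
\mainlanguage[en]

\starttext

% the group limits the Dutch hyphenation rules to the embedded snippet
Some English text with {\nl een ingewikkelde Nederlandse zin} inside it.

% a larger stretch of text can be wrapped in \start ... \stop
\start \language[de]
    Ein ziemlich komplizierter deutscher Absatz.
\stop

\stoptext
\stoptyping
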
Switching within a word doesn't make sense but it is permitted:

\starttabulate[|||T|]
\NC 1 \NC \type{\en incrediblykompliziert}      \NC \hyphenatedword{\en incrediblykompliziert}      \NC \NR
\NC 2 \NC \type{\de incrediblykompliziert}      \NC \hyphenatedword{\de incrediblykompliziert}      \NC \NR
\NC 3 \NC \type{\en incredibly\de kompliziert}  \NC \hyphenatedword{\en incredibly\de kompliziert}  \NC \NR
\NC 4 \NC \type{\en incredibly\de\-kompliziert} \NC \hyphenatedword{\en incredibly\de\-kompliziert} \NC \NR
\NC 5 \NC \type{\en incredibly\de-kompliziert}  \NC \hyphenatedword{\en incredibly\de-kompliziert}  \NC \NR
\stoptabulate

In line 4 we have a \type {\-} between the two words, and in the last line just a
\type {-}. If you look closely you will notice that the snippets can be quite
small. If we typeset a word with a 1mm text width we get this:

\blank \start \en \hsize 1mm incredibly \par \stop \blank

If you are familiar with the details of hyphenation, you know that the number of
characters at the end and beginning of a word is controlled by the two variables
\typ {\lefthyphenmin} and \typ {\righthyphenmin}. However, these only influence
the hyphenation process. What bits and pieces eventually end up on a line is
determined by the par builder and there the \type {\hsize} matters. In practice
you will not run into these situations, unless you have extremely long words and
a narrow column.

Hyphenation normally is limited to regular characters that make up the alphabet
of a language. It is insensitive to capitalization as the following text shows:

\blank \startnarrower
\hyphenatedword {This time the musical distraction while developing code came
from watching youtube performances of Cory Henry (also known from Snarky Puppy, a
conglomerate of excellent players). Just search the web for his name with
\quote {Stevie Wonder and Michael Jackson Tribute}. There is no keyboard he can't
play.
Another interesting keyboard player is Sun Rai (a short name for Rai
Thistlethwayte); just google for \quote {The Beatles, Come Together, Live Piano
Acoustic with Loop Pedal}, or do a combined search with \quote {Matt
Chamberlain}. Okay, and talking of keyboards, let's not forget Vika Yermolyeva
(vkgoeswild) as she's one of a kind too on the web. And then there is Jacob
Collier, in one word: incredible (or hyphenated the Dutch way {\nl incredible},
let me repeat that in French {\fr incredible}).}
\footnote {Don't get me wrong, there are of course many more fantastic musicians.}
\stopnarrower \blank

Of course, names are often short and don't need to be hyphenated (or the left and
right settings prohibit it). Another complication with names is that they can
come from another language so we either need to switch language temporarily or we
need to add an exception (more about that later).

\stopsection

\startsection[title=Primitives]

In traditional \TEX\ the language is not a property of a character but is
triggered by a signal in the (so called) list. Think of:

\starttyping
this is nederlands mixed with english
\stoptyping

The language number is set by the primitive \typ {\language}. Language triggers
are injected into the list depending on the value of this number. There is also a
\typ {\setlanguage} primitive that can inject triggers without setting the
\typ {\language} number. Because in \LUATEX\ the state is kept with the character
you don't need to worry about the subtle differences here.

In \CONTEXT\ the \typ {\language} and \typ {\setlanguage} commands are overloaded
by a more advanced switch macro. You cannot assume that they work as explained in
general manuals about \TEX. Currently you can still assign a number but that
might change. Just consider the language to be an abstraction and don't mess with
this number. Both commands not only change the current language but also do
specific initializations when needed.
What characters get involved in hyphenation is historically determined by the so
called \type {\lccode} values. Each character can have such a value which maps an
uppercase to a lowercase character. This concept has been extended in \ETEX\
where it binds to a pattern set (language). However, in \CONTEXT\ the user never
has to worry about such details.

% The \type {\patterns} primitive is

% The \type {\hyphenation} primitive is

In traditional hyphenation a word will not be hyphenated if the sum of
\typ {\lefthyphenmin} and \typ {\righthyphenmin} exceeds 62. This limitation is
not present in the \LUA\ variant of this routine that will be presented later, as
there is no good reason for this limitation other than implementation
constraints.

\stopsection

\startsection[title=Control]

We already mentioned \typ {\lefthyphenmin} and \typ {\righthyphenmin}. These two
variables control the area in a word that is subjected to hyphenation. Setting
these values is a matter of taste but making them too small can result in bad
hyphenation when the patterns are made with the assumption that certain minima
are used. Using a \typ {\lefthyphenmin} of 2 while the patterns are made with a
value of 3 in mind is a bad idea.
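
In source form a local change of these two values could look as follows. This is
only a sketch: the word and the values 3 and 3 are chosen for illustration and
are not a recommendation for any particular pattern set.

\starttyping
\starttext

\start
    % plain assignments; the group keeps them local
    \lefthyphenmin  3
    \righthyphenmin 3
    \hyphenatedword{interesting}
\stop

\stoptext
\stoptyping
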
\startlinecorrection[blank]
\startluacode
    -- hyphenation of one word for all combinations of both minima (1 to 5)
    context.bTABLE { option = "stretch", align = "middle" }
        -- header rows
        context.bTR()
            context.bTD { ny = 2, align = "middle,lohi", style = "monobold" }
                context.verbatim("\\lefthyphenmin")
            context.eTD()
            context.bTD { nx = 5, style = "monobold" }
                context.verbatim("\\righthyphenmin")
            context.eTD()
        context.eTR()
        context.bTR()
            for right=1,5 do
                context.bTD()
                    context.mono(right)
                context.eTD()
            end
        context.eTR()
        -- one row per left minimum, one cell per right minimum
        for left=1,5 do
            context.bTR()
                context.bTD()
                    context.mono(left)
                context.eTD()
                for right=1,5 do
                    context.bTD()
                        context("\\lefthyphenmin %s \\righthyphenmin %s \\hyphenatedword{interesting}",left,right)
                    context.eTD()
                end
            context.eTR()
        end
    context.eTABLE()
\stopluacode
\stoplinecorrection

When \TEX\ breaks a paragraph into lines it will try to do so without
hyphenation. When that fails (read: when the badness becomes too high) a next
effort will take hyphenation into account. \footnote {Because in \LUATEX\ we
always hyphenate there is no real gain in trying not to hyphenate. Because in
traditional \TEX\ hyphenation happens on the fly a pass without hyphenating makes
more sense.} When the badness is still too high, an optional emergency pass can
be made but only when the tolerances are set to permit this. In \CONTEXT\ you can
try these settings when you get too many over- or underfull boxes reported on the
console.

\starttyping
\setupalign[tolerant]
\setupalign[verytolerant]
\setupalign[verytolerant,stretch]
\stoptyping

Personally I tend to use the last setting, especially in automated flows. After
all, \TEX\ will not apply stretch unless it's really needed.

The two \typ {\*hyphenmin} parameters can be set any time and the current value
is stored with each character. They can also be set with the language which we
will see later.

When \TEX\ hyphenates words it has to decide where a word starts and ends. In
traditional \TEX\ a word normally starts at a character that falls within the
scope of the hyphenator.
It ends when a box (hlist or vlist) is seen, but also at a rule, discretionary,
accent (forget about this in \CONTEXT) or math. An example will be given in the
chapter that discusses the \LUA\ alternative.

\stopsection

\startsection[title=Installing]

todo

\stopsection

\startsection[title=Modes]

Languages are one of the mechanisms where you can access the current state. There
are for instance two (official) macros that contain the current (main) language:

\startbuffer
\starttabulate[||Tc|]
\HL
\NC \bf macro                    \NC \bf value            \NC \NR
\HL
\NC \type {\currentmainlanguage} \NC \currentmainlanguage \NC \NR
\NC \type {\currentlanguage}     \NC \currentlanguage     \NC \NR
\HL
\stoptabulate
\stopbuffer

\getbuffer

When we have set \type {\language[nl]} we get this:

\start \nl \getbuffer \stop

If you write a style that needs to adapt to a language you can use modes. There
are several ways to do this:

\startbuffer
\language[nl]

\startmode[**en]
    \color[darkred]{main english}
\stopmode

\startmode[*en]
    \color[darkred]{local english}
\stopmode

\startmode[**nl]
    \color[darkblue]{main dutch}
\stopmode

\startmode[*nl]
    \color[darkblue]{local dutch}
\stopmode

\startmodeset
    [*en] {\color[darkgreen]{english set}}
    [*nl] {\color[darkgreen]{dutch set}}
\stopmodeset
\stopbuffer

\typebuffer

This typesets:

\blank \startpacked \setupindenting[no] \getbuffer \stoppacked \blank

When you use setups you can use the following trick:

\startbuffer
\language[nl]

\startsetups language:en
    \color[darkorange]{something english}
\stopsetups

\startsetups language:nl
    \color[darkorange]{something dutch}
\stopsetups

\setups[language:\currentlanguage]
\stopbuffer

\typebuffer

As expected we get:

\blank \start \setupindenting[no] \getbuffer \stop \blank

\stopsection

\stopchapter

\stopcomponent