% language=us

\environment mk-environment

\startcomponent mk-order

\chapter{The order of things}

Normally the text that makes up a paragraph comes directly from
the input stream or from macro expansions (think of labels). When
\TEX\ has collected enough content to make a paragraph, for
instance because a \type {\par} token signals it, \TEX\ will try
to create one. The raw material available for making such a
paragraph is linked in a list of nodes: references to glyphs in a
font, kerns (fixed spacing), glue (flexible spacing), penalties
(consider them to be directives) and whatsits (which can be
anything, e.g.\ \PDF\ literals or hyperlinks). The result is a
list of horizontal boxes (wrappers around lists that represent
\quote {lines}) and this is either wrapped in a vertical box or
added to the main vertical list that keeps the page stream.

The treatment consists of four activities:

\startitemize[packed]
\item construction of ligatures (an f plus an i can become fi)
\item hyphenation of words that cross a line boundary
\item kerning of characters based on information in the font
\item breaking the list into lines in the most optimal way
\stopitemize

The process of breaking into lines is also influenced by
protrusion (like hanging punctuation) and expansion
(hz-optimization) but here we will not take these processes
into account. There are numerous variables that control
the process and the quality.
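
Many of these variables are the classic \TEX\ line break
parameters, and in \LUATEX\ they can also be inspected and set
from the \LUA\ end via the \type {tex} table. The next lines are
just a sketch of a few of them; the values are arbitrary examples,
not recommendations.

\starttyping
-- a few classic line break parameters, reachable from lua
-- (the values below are arbitrary examples)

tex.pretolerance         = 100    -- first pass, no hyphenation
tex.tolerance            = 3000   -- second pass, with hyphenation
tex.hyphenpenalty        = 50     -- penalty per hyphenated line
tex.doublehyphendemerits = 10000  -- two hyphenated lines in a row
tex.emergencystretch     = tex.sp("12pt") -- last resort stretch
\stoptyping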

These activities are rather interwoven and optimized. For
instance, in order to hyphenate, ligatures have to be decomposed
and|/|or constructed. Hyphenation happens when needed. Decisions
about optimal breakpoints in lines can be influenced by penalties
(like: not too many hyphenated words in a row) and by permitting
extra stretch between words. Because a paragraph can be boxed and
unboxed, decomposed and fed into the machinery again, information
is kept around. Just imagine the following: you want to measure
the width of a word and therefore you box it. In order to get the
right dimensions, \TEX\ has to construct the ligatures and add
kerns. However, when we unbox that word and feed it into the
paragraph builder, potential hyphenation points have to be
consulted, and such a point might lie between the characters
that resulted in the ligature. You can imagine that adding (and
removing) inter|-|character kerns complicates the process even
more.

At the cost of some extra runtime and memory usage, in \LUATEX\
these steps are more isolated. There is a function that builds
ligatures, one that kerns characters, and another one that
hyphenates all words in a list, not just the ones that are
candidates for breaking. The potential breakpoints (called
discretionaries) can contain ligature information as well. The
linebreak process is also a separate function.

The order in which this happens now is:

\startitemize[packed,intro]
\item hyphenation of words
\item building of ligatures from sequences of glyphs
\item kerning of glyphs
\item breaking all this into lines
\stopitemize

One can discuss endlessly about the terminology here: are we
dealing with characters or with glyphs? When a glyph node is made,
it contains a reference to a slot in a font. Because in
traditional \TEX\ the number of slots is limited to 256, the
relationship between a character in the input and the shape in the
font, called a glyph, is kind of indirect (the input encoding
versus font encoding issue), while in \LUATEX\ we can keep the
font in \UNICODE\ encoding if we want. In traditional \TEX,
hyphenation is based on the font encoding and therefore on glyphs,
and although in \LUATEX\ this is still the case, there we can more
safely talk of characters till we start mapping them to shapes
that have no \UNICODE\ point. This is of course macro package
dependent but in \CONTEXT\ \MKIV\ we normalize all input to
\UNICODE\ exclusively.

The last step is now really isolated and for that reason we can
best talk in terms of preparation of the to-be paragraph when
we refer to the first three activities. In \LUATEX\ these three
are available as functions that operate on a node list. They each
have their own callback, so we can disable them by replacing the
default functions by dummies. Then we can hook a new function into
the two places that matter, \type {hpack_filter} and \type
{pre_linebreak_filter}, and move the preparation there.

A simple overload is shown below. Because the first node is always
a whatsit that holds directional information (and at some point in
the future maybe even more paragraph related state info), we can
safely assume that \type {head} does not change. Of course this
situation might change when you start adding your own
functionality.

\starttyping
local function my_preparation(head)
    local tail = node.slide(head) -- also adds prev pointers
    lang.hyphenate(head,tail)               -- returns a success flag
    head, tail = node.ligaturing(head,tail) -- returns head, tail, flag
    head, tail = node.kerning(head,tail)    -- returns head, tail, flag
    return head
end

callback.register("pre_linebreak_filter", my_preparation)
callback.register("hpack_filter",         my_preparation)

local dummy = function(head,tail) return tail end

callback.register("hyphenate",  dummy)
callback.register("ligaturing", dummy)
callback.register("kerning",    dummy)
\stoptyping

It might be clear that the order of actions matters. It might also
be clear that you are responsible for that order yourself. There
is no pre||cooked mechanism for guarding your actions and there are
several reasons for this:

\startitemize

\item Each macro package does things its own way so any hard-coded
mechanism would be replaced and overloaded anyway. Compare this to
the usage of catcodes, font systems, auxiliary files, user
interfaces, handling of inserts, etc. The combination of callbacks,
the three mentioned functions and the availability of \LUA\ makes
it possible to implement any system you like.

\item Macro packages might want to provide hooks for specialized
node list processing, and since there are many places where code
can be hooked in, some kind of oversight is needed (real people
who keep track of interference between user supplied features; no
program can do that).

\item User functions can mess up the node list and successive
actions might then make the wrong assumptions. In order to guard
against this, macro packages might add tracing options, and again
there are too many ways to communicate with users. Debugging and
tracing have to be embedded in the bigger system in a natural way.

\stopitemize

In \CONTEXT\ \MKIV\ there are already a few places where users can
hook code into the task list, but so far we haven't really
encouraged that. The interfaces are simply not stable enough yet.
On the other hand, there are already quite some node list
manipulators at work. The most prominent one is the \OPENTYPE\
feature handler. That one replaces the ligature and kerning
functions (at least for some fonts). It also means that we need to
keep an eye on possible interference between \CONTEXT\ \MKIV\
mechanisms and those provided by \LUATEX.

For fonts, that is actually quite simple: the \LUATEX\ functions
use ligature and kerning information stored in the \TFM\ table,
and for \OPENTYPE\ fonts we simply don't provide that information
when we define a font, so in that case \LUATEX\ will not ligature
and kern. Users can influence this process to some extent by
setting the \type {mode} for a specific instance of a font to
\type {base} or \type {node}. Because \TYPEONE\ fonts lack
\OPENTYPE\ features, such fonts are (at least currently) always
processed in base mode.
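
The following fragment sketches, in a much simplified way, what
\quote {not providing that information} boils down to; the loader
and the \type {mode} field are hypothetical and this is not the
actual \CONTEXT\ code.

\starttyping
-- simplified sketch: 'load_the_font' and the 'mode' field are
-- hypothetical, this is not the real ConTeXt MkIV code

callback.register("define_font", function(name,size)
    local f = load_the_font(name,size)
    if f.mode == "node" then
        -- strip the base mode information so that the built-in
        -- ligaturing and kerning functions find nothing to do
        for _, c in pairs(f.characters) do
            c.ligatures = nil
            c.kerns     = nil
        end
    end
    return f
end)
\stoptyping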

Deep down in \CONTEXT\ we call a sequence of actions a \quote
{task}. One such task is \quote {processors} and the actions
discussed so far are in this category. Within this category we
have subcategories:

\starttabulate[|l|p|]
\NC \bf subcategory \NC \bf intended usage \NC \NR
\HL
\NC before      \NC experimental (or module) plugins \NC \NR
\NC normalizers \NC cleanup and preparation handlers \NC \NR
\NC characters  \NC operations on individual characters \NC \NR
\NC words       \NC operations on words \NC \NR
\NC fonts       \NC font related manipulations \NC \NR
\NC lists       \NC manipulations on the list as a whole \NC \NR
\NC after       \NC experimental (or module) plugins \NC \NR
\stoptabulate

Here \quote {plugins} are experimental handlers or specialized
ones provided in modules that are not part of the kernel. The
categories are not that distinctive and only provide a convenient
way to group actions.

Examples of normalizers are: checking for missing characters and
replacing character references by fallbacks. Character processors
are for instance directional analysers (for right to left
typesetting), case swapping, and specialized character triggered
hyphenation (like compound words). Word processors deal with
hyphenation (here we use the default function provided by \LUATEX)
and spell checking. The font processors deal with \OPENTYPE\ as
well as the ligature building and kerning of other font types.
Finally, the list processors are responsible for tasks like special
spacing (French punctuation) and kerning (additional
inter||character kerning). Of course, this all is rather \CONTEXT\
specific and we expect to add quite a few more, less trivial,
handlers in the upcoming years.
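
As an illustration, a normalizer that checks for missing
characters could look like the following sketch (simplified, and
not the code that is actually used in \CONTEXT):

\starttyping
-- simplified sketch of a 'missing character' normalizer

local glyph = node.id("glyph")

local function check_missing(head,tail)
    for g in node.traverse_id(glyph,head) do
        local f = font.getfont(g.font)
        if f and f.characters and not f.characters[g.char] then
            texio.write_nl(string.format(
                "missing character 0x%05X in font '%s'",
                g.char, f.name or "unknown"))
        end
    end
    return head, tail
end
\stoptyping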

Many of these handlers are triggered by attributes. Nodes can have
many attributes and each of them can have many values.
Traditionally \TEX\ had only a few attributes: language and font,
where the first is not even a real attribute and the second is
only bound to glyph nodes. In \LUATEX\ the language is also a
glyph property. The nice thing about attributes is that they can
be set at the \TEX\ end and obey grouping. This makes them for
instance perfect for implementing color mechanisms. Because
attributes are part of the nodes, and not nodes themselves, they
don't influence or interfere with processing unless one explicitly
tests for them and acts accordingly.
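
A small sketch can make this concrete. Assume that we (arbitrarily)
pick attribute 127 for some mechanism; at the \TEX\ end it can be
set with \type {\attribute127=2} inside a group, and a handler only
acts on nodes that actually carry a value:

\starttyping
-- sketch: attribute number 127 is an arbitrary choice here

local my_attribute = 127
local glyph        = node.id("glyph")

local function my_handler(head,tail)
    for g in node.traverse_id(glyph,head) do
        local a = node.has_attribute(g,my_attribute)
        if a then
            -- act according to the value of a, for instance by
            -- injecting color directives around this glyph
        end
    end
    return head, tail
end
\stoptyping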

In addition to the mentioned task \quote {processors} we also have
a task \quote {shipouts} and there will be more tasks in future
versions of \CONTEXT. Again we have subcategories, currently:

\starttabulate[|l|p|]
\NC \bf subcategory \NC \bf intended usage \NC \NR
\HL
\NC before      \NC experimental (or module) plugins \NC \NR
\NC normalizers \NC cleanup and preparation handlers \NC \NR
\NC finishers   \NC manipulations on the list as a whole \NC \NR
\NC after       \NC experimental (or module) plugins \NC \NR
\stoptabulate

An example of a normalizer is the cleanup of the \quote {to be
shipped out} list. Finishers deal with color, transparency,
overprint, negated content (sometimes used in page imposition),
special effects (like outline fonts) and viewer layers (something
\PDF\ specific). Quite possibly hyperlink support will also be
handled there, but not before the backend code is rewritten.
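
To give an idea of what such a finisher looks like, here is a
rough sketch of color support by means of \PDF\ literals; the
attribute number and the literal strings are arbitrary examples
and real color support involves quite a bit more.

\starttyping
-- rough sketch of a color finisher; attribute number and literals
-- are arbitrary examples, real color support involves more

local color_attribute = 127
local glyph           = node.id("glyph")

local function color_finisher(head,tail)
    for g in node.traverse_id(glyph,head) do
        if node.has_attribute(g,color_attribute) then
            local push = node.new("whatsit","pdf_literal")
            push.data  = "1 0 0 rg" -- switch the fill color to red
            local pop  = node.new("whatsit","pdf_literal")
            pop.data   = "0 g"      -- and back to black afterwards
            head = node.insert_before(head,g,push)
            head = node.insert_after(head,g,pop)
        end
    end
    return head, node.slide(head)
end
\stoptyping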

The previous description is far from complete. For instance, not
all handlers use the same interface: some work from \type {head}
onwards, some need a \type {tail} pointer too, and some report
back success or failure. So the task handler needs to normalize
their usage. Also, some effort goes into optimizing the task in
such a way that processing the document is still reasonably fast.
Keep in mind that each construction of a box invokes a callback,
and there are many boxes used for constructing a page. Even a
nilled callback is one, so for a simple one word paragraph four
callbacks are triggered: the (nilled) hyphenate, ligature and kern
callbacks as well as the one called \type {pre_linebreak_filter}.
The task handler that we plug into the filter callbacks calls many
functions and each of them does one or more passes over the node
list, and in turn might do many calls to functions. You can
imagine that we're quite happy that \TEX\ as well as \LUA\ is so
efficient.

As I already mentioned, implementing a task handler as well as
deciding what actions within tasks to perform in what order is
specific to the way a macro package is set up. The following code
can serve as a starting point.

\starttyping
filters = { } -- global namespace

local list = { }

function filters.add(fnc,n)
    if not n or n > #list + 1 then
        table.insert(list,fnc)   -- append at the end
    elseif n < 1 then
        table.insert(list,1,fnc) -- prepend
    else
        table.insert(list,n,fnc) -- insert at position n
    end
end

function filters.remove(fnc,n)
    if n and n > 0 and n <= #list then
        table.remove(list,n)
    end
end

local function run_filters(head,...)
    local tail = node.slide(head)
    for _, fnc in ipairs(list) do
        head, tail = fnc(head,tail,...)
    end
    return head
end

local function hyphenation(head,tail)
    return head, tail, lang.hyphenate(head,tail) -- returns done
end
local function ligaturing(head,tail)
    return node.ligaturing(head,tail) -- returns head,tail,done
end
local function kerning(head,tail)
    return node.kerning(head,tail) -- returns head,tail,done
end

filters.add(hyphenation)
filters.add(ligaturing)
filters.add(kerning)

callback.register("pre_linebreak_filter", run_filters)
callback.register("hpack_filter",         run_filters)
\stoptyping
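
A user or module can then extend the chain with its own function,
as long as it plays by the rules and returns the head and tail,
for instance a little tracer:

\starttyping
-- a hypothetical tracing filter that just counts glyph nodes

local glyph = node.id("glyph")

local function my_tracer(head,tail)
    local n = 0
    for _ in node.traverse_id(glyph,head) do
        n = n + 1
    end
    texio.write_nl(string.format("list with %s glyphs",n))
    return head, tail
end

filters.add(my_tracer)   -- run after the kerning pass
filters.add(my_tracer,1) -- and once more before hyphenation
\stoptyping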

Although one can inject extra filters by using the \type {add}
function, it may be clear that this can be dangerous due to
interference. Therefore a slightly more secure variant is the
following, where \type {main} is reserved for macro package
actions and the others can be used by add||ons.

\starttyping
filters = { } -- global namespace

local list = {
    pre = { }, main = { }, post = { },
}

local order = {
    "pre", "main", "post"
}

local function somewhere(where)
    if not where then
        texio.write_nl("error: no filter category given")
    elseif not list[where] then
        texio.write_nl(string.format("error: invalid filter category '%s'",where))
    else
        return list[where]
    end
    return false
end

function filters.add(where,fnc,n)
    local list = somewhere(where)
    if not list then
        -- error already reported
    elseif not n or n > #list + 1 then
        table.insert(list,fnc)   -- append at the end
    elseif n < 1 then
        table.insert(list,1,fnc) -- prepend
    else
        table.insert(list,n,fnc) -- insert at position n
    end
end

function filters.remove(where,fnc,n)
    local list = somewhere(where)
    if list and n and n > 0 and n <= #list then
        table.remove(list,n)
    end
end

local function run_filters(head,...)
    local tail = node.slide(head)
    for _, category in ipairs(order) do
        for _, fnc in ipairs(list[category]) do
            head, tail = fnc(head,tail,...)
        end
    end
    return head
end

filters.add("main",hyphenation)
filters.add("main",ligaturing)
filters.add("main",kerning)

callback.register("pre_linebreak_filter", run_filters)
callback.register("hpack_filter",         run_filters)
\stoptyping
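
A module that wants to stay away from the main actions can then
hook into the outer categories, for instance:

\starttyping
-- hypothetical module code using the categories defined above

filters.add("pre", function(head,tail)
    texio.write_nl("some action before the main actions")
    return head, tail
end)

filters.add("post", function(head,tail)
    texio.write_nl("some action after the main actions")
    return head, tail
end)
\stoptyping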

Of course, \CONTEXT\ users who try to use this code will
be punished by losing much of the functionality already
present, simply because we use yet another variant of the
above code.

\stopcomponent