about-nodes.tex /size: 17 Kb    last modification: 2023-12-21 09:43
% language=us

\usemodule[nodechart]

\startcomponent about-nodes

\environment about-environment

\startchapter[title={Juggling nodes}]

\startsection[title=Introduction]

When you use \TEX, join the community, follow mailing lists, read manuals,
and|/|or attend meetings, there will come a moment when you run into the word
\quote {node}. But, as a regular user, even if you write macros, you can happily
ignore nodes because in practice you will never really see them. They are hidden
deep down in \TEX.

Some expert \TEX ies love to talk about \TEX's mouth, stomach, gut and other
presumed bodily elements. Maybe it is seen as proof of a deeper understanding
of this program, as Don Knuth uses these analogies in his books about \TEX\ when
he discusses how \TEX\ reads the input, translates it and digests it into
something that can be printed or viewed. No matter how your input gets digested,
at some point we get nodes. However, as users have no real access to the
internals, nodes never show themselves to the user. They have no bodily analogy
either.

A character that is read from the input can become a character node. Multiple
characters can become a linked list of nodes. Such a list can contain other kinds
of nodes as well, for instance spaces become glue. There can also be penalties
that steer the machinery. And kerns too: fixed displacements. Such a list can be
wrapped in a box. In the process hyphenation is applied, characters become glyphs
and intermediate math nodes become a combination of regular glyphs, kerns and
glue, wrapped into boxes. So, an hbox that contains the three glyphs \type {tex}
can be represented as follows:

\startlinecorrection
    \setupFLOWchart
      [dx=2em,
       dy=1em,
       width=4em,
       height=2em]
    \setupFLOWshapes
      [framecolor=maincolor]
    \startFLOWchart[nodes]
      \startFLOWcell
        \name       {box}
        \location   {1,1}
        \shape      {action}
        \text       {hbox}
        \connection [rl] {t}
      \stopFLOWcell
      \startFLOWcell
        \name       {t}
        \location   {2,1}
        \shape      {action}
        \text       {t}
        \connection [+t-t] {e}
      \stopFLOWcell
      \startFLOWcell
        \name       {e}
        \location   {3,1}
        \shape      {action}
        \text       {e}
        \connection [+t-t] {x}
        \connection [-b+b] {t}
      \stopFLOWcell
      \startFLOWcell
        \name       {x}
        \location   {4,1}
        \shape      {action}
        \text       {x}
        \connection [-b+b] {e}
      \stopFLOWcell
    \stopFLOWchart
    \FLOWchart[nodes]
\stoplinecorrection

Eventually a long sequence of nodes can become a paragraph of lines and each line
is a box. The lines together make a page which is also a box. There are many kinds
of nodes but some are rather special and don't translate directly to some visible
result. When dealing with \TEX\ as a user we can forget about nodes: we never
really see them.

In this example we see an hlist (hbox) node. Such a node has properties like
width, height, depth, shift etc. The characters become glyph nodes that have
(among other properties) a reference to a font, character, language.

Because \TEX\ is also about math, and because math is somewhat special, we have
noads, some intermediate kind of node that makes up a math list, that eventually
gets transformed into a list of nodes. And, as proof of extensibility, Knuth came
up with a special node that is more or less ignored by the machinery but travels
with the list and can be dealt with in special backend code. The name indicates
what it's about: such nodes are called whatsits (which sounds better than
whatevers). In \LUATEX\ some whatsits are used in the frontend, for instance
directional information is stored in whatsits.

The \LUATEX\ engine not only opens up the \UNICODE\ and \OPENTYPE\ universes, but
also the traditional \TEX\ engine. It gives us access to nodes. And this permits
us to go beyond what was possible before and therefore on mailing lists like the
\CONTEXT\ list, the word node will pop up more frequently. If you look into the
\LUA\ files that ship with \CONTEXT\ you cannot avoid seeing them. And, when you
use the \CLD\ interface you might even want to manipulate them. A nice side
effect is that you can sound like an expert without having to refer to bodily
aspects of \TEX: you just see them as some kind of \LUA\ userdata variable. And
you access them like tables: they are abstract units with properties.

\stopsection

\startsection[title=Basics]

Nodes are kind of special in the sense that you need to keep an eye on creation
and destruction. In \TEX\ itself this is mostly hidden:

\startbuffer
\setbox0\hbox{some text}
\stopbuffer

\typebuffer

If we look {\em into} this box we get a list of glyphs (see \in {figure}
[fig:dummy:1]).

\startplacefigure[reference=fig:dummy:1]
    \getbuffer
    \boxtoFLOWchart[dummy]{0}
    \small
    \FLOWchart[dummy][width=14em,height=3em,dx=1em,dy=.75em] % ,hcompact=yes]
\stopplacefigure

In \TEX\ you can flush such a box using \type {\box0} or copy it using \type
{\copy0}. You can also flush the contents i.e.\ omit the wrapper using \type
{\unhbox0} and \type {\unhcopy0}. The possibilities for disassembling the
content of a box (or any list for that matter) are limited. In practice you
can consider disassembling to be absent.

This is different at the \LUA\ end: there we can really start at the beginning of
a list, loop over it and see what's in there as well as change, add and remove
nodes. The magic starts with:

\starttyping
local box = tex.box[0]
\stoptyping

Now we have a variable that holds a so-called \type {hlist} node. This node has
not only properties like \type {width}, \type {height}, \type {depth} and \type
{shift}, but also a pointer to the content: \type {list}.

\starttyping
local list = box.list
\stoptyping
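
As a minimal sketch (not part of the original example), the fields of the box
and the kinds of nodes in its list can be reported on the console. It assumes
the stock \LUATEX\ helpers \type {node.traverse}, \type {node.type} and \type
{texio.write_nl}; dimensions are reported in scaled points:

\starttyping
\setbox0\hbox{some text}
\startluacode
local box = tex.box[0]

-- the wrapper: dimensions in scaled points (65536sp = 1pt)
texio.write_nl("hlist width : " .. box.width)
texio.write_nl("hlist height: " .. box.height)
texio.write_nl("hlist depth : " .. box.depth)

-- the content: walk the linked list and report the type of each node
for n in node.traverse(box.list) do
    texio.write_nl(node.type(n.id))
end
\stopluacode
\stoptyping

Nothing is flushed or freed here: the box is only inspected, so \TEX\ keeps
managing its memory.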

Now, when we start messing with this list, we need to take into account that the
nodes are in fact userdata objects, that is: they are efficient \TEX\ data
structures that have a \LUA\ interface. At the \TEX\ end the repertoire of
commands that we can use to flush boxes is rather limited and as we cannot mess
with the content we have no memory management issues. However, at the \LUA\ end
this is different. Nodes can have pointers to other nodes and they can even have
special properties that relate to other resources in the program.

Take this example:

\starttyping
\setbox0\hbox{some text}
\directlua{node.write(tex.box[0])}
\stoptyping

At the \TEX\ end we wrap something in a box. Then, at the \LUA\ end, we can
access that box and print it back into the input. However, as \TEX\ is no longer
in control it cannot know that we already flushed the list. Keep in mind that
this is a simple example, but imagine more complex content, that contains
hyperlinks or so. Now take this:

\starttyping
\setbox0\hbox{some text 1}
\setbox0\hbox{some text 2}
\stoptyping

Here \TEX\ knows that the box has content and it will free the memory beforehand
and forget the first text. Or this:

\starttyping
\setbox0\hbox{some text}
\box0 \box0
\stoptyping

The box will be used and after that it's empty so the second flush is basically a
harmless null operation: nothing gets inserted. But this:

\starttyping
\setbox0\hbox{some text}
\directlua{node.write(tex.box[0])}
\directlua{node.write(tex.box[0])}
\stoptyping

will definitely fail. The first call flushes the box and the second one sees
no box content and will bark. The best solution is to use a copy:

\starttyping
\setbox0\hbox{some text}
\directlua{node.write(node.copy_list(tex.box[0]))}
\stoptyping

That way \TEX\ doesn't see a change in the box and will free it when needed: when
it gets flushed, reassigned, at the end of a group, wherever.
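
A hedged sketch of what this buys us: since every call writes a fresh copy and
the box itself is never touched, the failing double flush from above becomes
valid. This assumes nothing beyond the \type {node.copy_list} and \type
{node.write} calls already shown:

\starttyping
\setbox0\hbox{some text}
\directlua{node.write(node.copy_list(tex.box[0]))}
\directlua{node.write(node.copy_list(tex.box[0]))}
\box0 % the original is still there and is flushed (and freed) by TeX
\stoptyping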

In \CONTEXT\ a somewhat shorter way of printing back to \TEX\ is the following
and we will use that:

\starttyping
\setbox0\hbox{some text}
\ctxlua{context(node.copy_list(tex.box[0]))}
\stoptyping

or, as a shortcut in \CONTEXT:

\starttyping
\setbox0\hbox{some text}
\cldcontext{node.copy_list(tex.box[0])}
\stoptyping

As we've now arrived at the \LUA\ end, we have more possibilities with nodes. In
the next sections we will explore some of these.

\stopsection

\startsection[title=Management]

The most important thing to keep in mind is that each node is unique in the sense
that it can be used only once. If you don't need it and don't flush it, you
should free it. If you need it more than once, you need to make a copy. But let's
first start with creating a node.

\starttyping
local g = node.new("glyph")
\stoptyping

This node has some properties that need to be set. The most important are the font
and the character. You can find more in the \LUATEX\ manual.

\starttyping
g.font = font.current()
g.char = utf.byte("a")
\stoptyping

After this we can write it to the \TEX\ input:

\starttyping
context(g)
\stoptyping

This node is automatically freed afterwards. As we're talking \LUA\ you can use
all kinds of commands that are defined in \CONTEXT. Take fonts:

\startbuffer
\startluacode
local g1 = node.new("glyph")
local g2 = node.new("glyph")

g1.font = fonts.definers.internal {
    name = "dejavuserif",
    size = "60pt",
}

g2.font = fonts.definers.internal {
    name = "dejavusansmono",
    size = "60pt",
}

g1.char = utf.byte("a")
g2.char = utf.byte("a")

context(g1)
context(g2)
\stopluacode
\stopbuffer

\typebuffer

We get: \getbuffer, but there is one pitfall: the nodes have to be flushed in
horizontal mode, so either put \type {\dontleavehmode} in front or add \type
{context.dontleavehmode()}. If you get error messages like \typ {this can't
happen} you probably forgot to enter horizontal mode.
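
A small sketch of the second remedy, here combined with the single glyph from
the start of this section; it only uses calls that were mentioned above:

\starttyping
\startluacode
local g = node.new("glyph")

g.font = font.current()
g.char = utf.byte("a")

context.dontleavehmode() -- make sure we are in horizontal mode
context(g)               -- now the glyph can be flushed safely
\stopluacode
\stoptyping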

In \CONTEXT\ you have some helpers, for instance:

\starttyping
\startluacode
local id = fonts.definers.internal { name = "dejavuserif" }

context(nodes.pool.glyph(id,utf.byte("a")))
context(nodes.pool.glyph(id,utf.byte("b")))
context(nodes.pool.glyph(id,utf.byte("c")))
\stopluacode
\stoptyping

or, when we need these functions a lot and want to save some typing:

\startbuffer
\startluacode
local getfont  = fonts.definers.internal
local newglyph = nodes.pool.glyph
local utfbyte  = utf.byte

local id = getfont { name = "dejavuserif" }

context(newglyph(id,utfbyte("a")))
context(newglyph(id,utfbyte("b")))
context(newglyph(id,utfbyte("c")))
\stopluacode
\stopbuffer

\typebuffer

This renders as: \getbuffer. We can make copies of nodes too:

\startbuffer
\startluacode
local id = fonts.definers.internal { name = "dejavuserif" }
local a  = nodes.pool.glyph(id,utf.byte("a"))

for i=1,10 do
    context(node.copy(a))
end

node.free(a)
\stopluacode
\stopbuffer

\typebuffer

This gives: \getbuffer. Watch how afterwards we free the node. If we have not one
node but a list (for instance because we use box content) you need to use the
alternatives \type {node.copy_list} and \type {node.free_list} instead.
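
For a list the pattern might look as follows. This is a sketch that assumes the
box keeps owning its content, so only the copies are flushed and nothing needs
to be freed by hand:

\starttyping
\setbox0\hbox{abc}
\startluacode
local list = tex.box[0].list -- the content without the hlist wrapper

context.dontleavehmode()
for i=1,3 do
    context(node.copy_list(list)) -- each copy is consumed by the flush
end
\stopluacode
\stoptyping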

In \CONTEXT\ there is a convenient helper to create a list of text nodes:

\startbuffer
\startluacode
context(nodes.typesetters.tonodes("this works okay"))
\stopluacode
\stopbuffer

\typebuffer

And indeed, \getbuffer, even when we use spaces. Of course it makes
more sense (and it is also more efficient) to do this:

\startbuffer
\startluacode
context("this works okay")
\stopluacode
\stopbuffer

\typebuffer

In this case the list is constructed at the \TEX\ end. We have now learned enough
to start using some convenient operations, so these are introduced next. Instead
of the longer \type {tonodes} call we will use the shorter one:

\starttyping
local head, tail = string.tonodes("this also works")
\stoptyping

As you see, this constructor returns the head as well as the tail of the
constructed list.
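
A short sketch of how both return values can be used. It relies only on \type
{node.traverse} and \type {texio.write_nl} from stock \LUATEX; the string is
arbitrary:

\starttyping
\startluacode
local head, tail = string.tonodes("this also works")

-- count the nodes by walking from the head
local count = 0
for n in node.traverse(head) do
    count = count + 1
end
texio.write_nl("nodes in list: " .. count)

-- the tail is the last node, so nothing follows it
if tail.next == nil then
    texio.write_nl("tail closes the list")
end

context.dontleavehmode()
context(head) -- flushing the head flushes the whole list
\stopluacode
\stoptyping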

\stopsection

\startsection[title=Operations]

If you are familiar with \LUA\ you will recognize this kind of code:

\starttyping
local str = "time: " .. os.time()
\stoptyping

Here a string \type {str} is created that is built out of two concatenated
snippets. And, \LUA\ is clever enough to see that it has to convert the number to
a string.

In \CONTEXT\ we can do the same with nodes:

\startbuffer
\startluacode
local foo = string.tonodes("foo")
local bar = string.tonodes("bar")
local amp = string.tonodes(" & ")

context(foo .. amp .. bar)
\stopluacode
\stopbuffer

\typebuffer

This will append the node lists: \getbuffer.

\startbuffer
\startluacode
local l = string.tonodes("l")
local m = string.tonodes(" ")
local r = string.tonodes("r")

context(5 * l .. m .. r * 5)
\stopluacode
\stopbuffer

\typebuffer

You can have the multiplier on either side of the node: \getbuffer.
Addition and subtraction are also supported but they come in flavors:

\startbuffer
\startluacode
local l1 = string.tonodes("aaaaaa")
local r1 = string.tonodes("bbbbbb")
local l2 = string.tonodes("cccccc")
local r2 = string.tonodes("dddddd")
local m  = string.tonodes(" + ")

context((l1 - r1) .. m .. (l2 + r2))
\stopluacode
\stopbuffer

\typebuffer

In this case, as we have two node (lists) involved in the addition and
subtraction, we get one of them injected into the other: after the first, or
before the last node. This might sound weird but it happens.

\dontleavehmode \start \maincolor \getbuffer \stop
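
What an addition like \type {l2 + r2} does can be sketched with plain pointer
juggling. The function below is hypothetical (it is not the \CONTEXT\
implementation) and only assumes the \LUATEX\ helper \type {node.slide}, which
returns the last node of a list:

\starttyping
-- inject list n2 after the first node of list n1 (illustration only)
local function inject_after_first(n1,n2)
    local rest = n1.next        -- everything after the first node
    local last = node.slide(n2) -- tail of the list that gets injected
    n1.next   = n2
    n2.prev   = n1
    last.next = rest
    if rest then
        rest.prev = last
    end
    return n1
end
\stoptyping

No node is copied or freed: the nodes of both lists end up in one list, so \type
{n2} should not be flushed or freed on its own afterwards.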

We can use these operators to take a slice of the given node list.

\startbuffer
\startluacode
local l = string.tonodes("123456")
local r = string.tonodes("123456")
local m = string.tonodes("+ & +")

context((l - 3) .. (1 + m - 1).. (3 + r))
\stopluacode
\stopbuffer

\typebuffer

So we get snippets that get appended: \getbuffer. The unary minus operator
reverses the list:

\startbuffer
\startluacode
local l = string.tonodes("123456")
local r = string.tonodes("123456")
local m = string.tonodes(" & ")

context(l .. m .. - r)
\stopluacode
\stopbuffer

\typebuffer

This is probably not that useful, but it works as expected: \getbuffer.

We saw that \type {*} makes copies but sometimes that is not enough. Consider the
following:

\startbuffer
\startluacode
local n = string.tonodes("123456")

context((n - 2) .. (2 + n))
\stopluacode
\stopbuffer

\typebuffer

Because the slicer frees the unused nodes, the value of \type {n} in the second
case is undefined. It still points to a node but that one already has been freed.
So you get an error message. But of course (as already demonstrated) this is
valid:

\startbuffer
\startluacode
local n = string.tonodes("123456")

context(2 + n - 2)
\stopluacode
\stopbuffer

\typebuffer

We get the two middle characters: \getbuffer. So, how can we use a
node (list) several times in an expression? Here is an example:

\startbuffer
\startluacode
local l = string.tonodes("123")
local m = string.tonodes(" & ")
local r = string.tonodes("456")

context((l^1 .. r^1)^2 .. m^1 .. r .. m .. l)
\stopluacode
\stopbuffer

\typebuffer

Using \type {^} we create copies, so we can still use the original later on. It
is best to make sure that each list is used once without being copied, because
otherwise the original is never flushed and we get a memory leak. When you write
the above without copying, \LUATEX\ will most likely end up in a loop. The result
of the above is:

\blank \start \dontleavehmode \maincolor \getbuffer \stop \blank

Let's repeat it one more time: keep in mind that we need to do the memory
management ourselves. In practice we will seldom need more than the
concatenation, but if you make complex expressions be prepared to lose some
memory when you copy and don't free them. As \TEX\ runs are normally limited in
time this is hardly an issue.

So what about the division? We needed some kind of escape and as with \type
{lpeg} we use the \type {/} to apply additional operations.

\startbuffer
\startluacode
local l = string.tonodes("123")
local m = string.tonodes(" & ")
local r = string.tonodes("456")

local function action(n)
    for g in node.traverse_id(node.id("glyph"),n) do
        g.char = string.byte("!")
    end
    return n
end

context(l .. m / action .. r)
\stopluacode
\stopbuffer

\typebuffer

And indeed the middle glyph gets replaced: \getbuffer.
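
Any function that takes a list and returns one can be plugged in this way. As a
further sketch (the \type {raise} helper is made up for this example), the
glyphs of the middle snippet are shifted up by setting the \type {yoffset}
field that glyph nodes have in \LUATEX:

\starttyping
\startluacode
local l = string.tonodes("123")
local m = string.tonodes(" & ")
local r = string.tonodes("456")

local function raise(n)
    for g in node.traverse_id(node.id("glyph"),n) do
        g.yoffset = 2 * 65536 -- 2pt, expressed in scaled points
    end
    return n
end

-- in Lua the / binds tighter than .. so only m is passed to raise
context(l .. m / raise .. r)
\stopluacode
\stoptyping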

\startbuffer
\startluacode
local l = string.tonodes("123")
local r = string.tonodes("456")

context(l .. nil .. r)
\stopluacode
\stopbuffer

\typebuffer

When you construct lists programmatically it can happen that one of the
components is nil and to some extent this is supported: so the above
gives: \getbuffer.
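
This is convenient when a component is optional. A sketch, where the \type
{optional} helper is hypothetical and simply returns \type {nil} for an empty
string:

\starttyping
\startluacode
local function optional(str)
    if str ~= "" then
        return string.tonodes(str)
    end
    -- falls through: the call then yields nil in the expression below
end

context(string.tonodes("123") .. optional("") .. string.tonodes("456"))
\stopluacode
\stoptyping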

Here is a summary of the operators that are currently supported. Keep in mind that
these are not built into \LUATEX\ but are extensions in \MKIV. After all, there
are many ways to map operators on actions and this is just one.

\starttabulate[|l|l|]
\NC \type{n1 .. n2} \NC append nodes (lists) \type {n1} and \type {n2}, no copies \NC \NR
\NC \type{n * 5}    \NC append 4 copies of node (list) \type {n} to \type {n} \NC \NR
\NC \type{5 + n}    \NC discard the first 5 nodes from list \type {n} \NC \NR
\NC \type{n - 5}    \NC discard the last 5 nodes from list \type {n} \NC \NR
\NC \type{n1 + n2}  \NC inject (list) \type {n2} after first of list \type {n1} \NC \NR
\NC \type{n1 - n2}  \NC inject (list) \type {n2} before last of list \type {n1} \NC \NR
\NC \type{n^2}      \NC make two copies of node (list) \type {n} and keep the original \NC \NR
\NC \type{- n}      \NC reverse node (list) \type {n} \NC \NR
\NC \type{n / f}    \NC apply function \type {f} to node (list) \type {n} \NC \NR
\stoptabulate
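
To make the difference between linking and copying more concrete, here is a
rough plain \LUATEX\ equivalent of the \type {n * 5} row. The function name is
hypothetical and this is not how \MKIV\ implements the operator; it only assumes
\type {node.copy_list} and \type {node.slide}:

\starttyping
-- rough equivalent of n * count (illustration only)
local function repeated(n,count)
    local copies = { }
    for i=1,count-1 do
        copies[i] = node.copy_list(n) -- copy before anything gets linked
    end
    local last = node.slide(n)        -- tail of the original list
    for i=1,#copies do
        local c = copies[i]
        last.next = c
        c.prev    = last
        last      = node.slide(c)     -- move on to the new tail
    end
    return n
end
\stoptyping

The copies are made up front because copying after linking would duplicate the
already extended list.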

As mentioned, you can only use a node or list once, so when you need it more
times, you need to make copies. For example:

\startbuffer
\startluacode
local l = string.tonodes(     -- maybe: nodes.maketext
    " 1 2 3 "
)
local r = nodes.tracers.rule( -- not really a user helper (spec might change)
    string.todimen("1%"),     -- or maybe: nodes.makerule("1%",...)
    string.todimen("2ex"),
    string.todimen(".5ex"),
    "maincolor"
)

context(30 * (r^1 .. l) .. r)
\stopluacode
\stopbuffer

\typebuffer

This gives a mix of glyphs, glue and rules: \getbuffer. Of course you can wonder
how often this kind of juggling happens in use cases but at least in some core
code the concatenation (\type {..}) gives a bit more readable code and the
overhead is quite acceptable.

\stopsection

\stopchapter

\stopcomponent