1% language=us runpath=texruns:manuals/luametatex
2\environment luametatex-style
3
4\startdocument[title=Internals]
5
6\defineframed
7  [mymemory]
8  [frame=off,
9   align=normal,
10   strut=no,
11 % rulethickness=.2uu,
12 % framecolor=maincolor,
13   rulethickness=0uu,
14   offset=1uu,
15   background=color,
16 % backgroundcolor=maincolor,
17   backgroundcolor=darkgray,
18   foregroundstyle=bold,
19   foregroundcolor=white]
20
21\defineframed
22  [myslot]
23  [framecolor=maincolor]
24
25\startsection[title={Introduction}]
26
27If you look at \TEX\ as a programming language and are familiar with other
languages, a natural question to ask is what data types there are and how it is all
29managed. Here I will give a general overview of some concepts. The explanation
30below is not entirely accurate because it tries to avoid the sometimes messy
31details. More can be found in the other low level manuals. I assume that one
32knows at least how to process a simple document with a few commands.
33
34It is not natural to start an explanation with how memory is laid out but by
doing this it is easier to introduce the concepts. I will focus on what is called
the hash table, the stack, node memory and token memory. We leave fonts, languages,
37character properties, math, etc.\ out of the picture. There are details that we
38skip because it's the general picture that matters here.
39
40{\em I might add some more to this manual, depending on questions by users at
41meetings or on the mailing list. Some details might change over time but the
42principles remain the same.}
43
44\stopsection
45
46\startsection[title={A few basics}]
47
48This is a reference manual and not a tutorial. This means that we discuss changes
49relative to traditional \TEX\ and also present new (or extended) functionality.
50As a consequence we will refer to concepts that we assume to be known or that
51might be explained later. Because the \LUATEX\ and \LUAMETATEX\ engines open up
52\TEX\ there's suddenly quite some more to explain, especially about the way a (to
53be) typeset stream moves through the machinery. However, discussing all that in
54detail makes not much sense, because deep knowledge is only relevant for those
55who write code not possible with regular \TEX\ and who are already familiar with
56these internals (or willing to spend time on figuring it out).
57
58So, the average user doesn't need to know much about what is in this manual. For
59instance fonts and languages are normally dealt with in the macro package that
60you use. Messing around with node lists is also often not really needed at the
61user level. If you do mess around, you'd better know what you're dealing with.
Reading \quotation {The \TEX\ Book} by Donald Knuth is then a good investment of time,
also because it's good to know where it all started. A more summarizing
64overview is given by \quotation {\TEX\ by Topic} by Victor Eijkhout. You might
65want to peek in \quotation {The \ETEX\ manual} too.
66
67But \unknown\ if you're here because of \LUA, then all you need to know is that
68you can call it from within a run. If you want to learn the language, just read
69the well written \LUA\ book. The macro package that you use probably will provide
70a few wrapper mechanisms but the basic \type {\directlua} command that does the job
71is:
72
73\starttyping
74\directlua{tex.print("Hi there")}
75\stoptyping
76
77You can put code between curly braces but if it's a lot you can also put it in a
78file and load that file with the usual \LUA\ commands. If you don't know what
79this means, you definitely need to have a look at the \LUA\ book first.
80
81If you still decide to read on, then it's good to know what nodes are, so we do a
82quick introduction here. If you input this text:
83
84\starttyping
85Hi There ...
86\stoptyping
87
eventually we will get a linked list of nodes, which in \ASCII\ art looks like:
89
90\starttyping
91H <=> i <=> [glue] <=> T <=> h <=> e <=> r <=> e ...
92\stoptyping
93
94When we have a paragraph, we actually get something like this, where a \type
95{par} node stores some metadata and is followed by a \type {hlist} flagged as
96indent box:
97
98\starttyping
99[par] <=> [hlist] <=> H <=> i <=> [glue] <=> T <=> h <=> e <=> r <=> e ...
100\stoptyping
101
102Each character becomes a so called glyph node, a record with properties like the
103current font, the character code and the current language. Spaces become glue
104nodes. There are many node types and nodes can have many properties but that will
105be discussed later. Each node points back to a previous node or next node, given
106that these exist. Sometimes multiple characters are represented by one glyph
107(shape), so one can also get:
108
109\starttyping
110[par] <=> [hlist] <=> H <=> i <=> [glue] <=> Th <=> e <=> r <=> e ...
111\stoptyping
112
113And maybe some characters get positioned relative to each other, so we might
114see:
115
116\starttyping
117[par] <=> [hlist] <=> H <=> [kern] <=> i <=> [glue] <=> Th <=> e <=> r <=> e ...
118\stoptyping
119
120Actually, the above representation is one view, because in \LUAMETATEX\ we can
121choose for this:
122
123\starttyping
124[par] <=> [glue] <=> H <=> [kern] <=> i <=> [glue] <=> Th <=> e <=> r <=> e ...
125\stoptyping
126
127where glue (currently fixed) is used instead of an empty hlist (think of a \type
{\hbox}). Options like this are available because we want a certain view on these
lists from the \LUA\ end and the result being predictable is part of that.
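
As an illustration only, such a double linked list can be sketched in a few lines
of C. The type \typ {demo_item} and the two helpers are made up for this text and
are not how the engine stores nodes (real nodes live in an array of memory
words), but they show what inserting a kern between two glyphs boils down to.

\starttyping
#include <assert.h>
#include <stddef.h>

/* hypothetical node kinds, just enough for the example */
typedef enum { demo_glyph, demo_glue, demo_kern } demo_kind;

typedef struct demo_item {
    demo_kind         kind;
    int               value;  /* character code, or an amount for glue and kern */
    struct demo_item *prev;
    struct demo_item *next;
} demo_item;

/* insert item n after item at, keeping both directions consistent */
static void demo_insert_after(demo_item *at, demo_item *n)
{
    assert(at != NULL && n != NULL);
    n->prev = at;
    n->next = at->next;
    if (at->next) {
        at->next->prev = n;
    }
    at->next = n;
}

/* count the items of a given kind, starting at head */
static int demo_count(const demo_item *head, demo_kind kind)
{
    int n = 0;
    for (; head; head = head->next) {
        if (head->kind == kind) {
            ++n;
        }
    }
    return n;
}
\stoptyping

With \type {H} and \type {i} linked, a call like \typ {demo_insert_after(&h, &k)}
puts a kern in between and both the forward and backward links stay valid.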
130
131It's also good to know beforehand that \TEX\ is basically centered around
132creating paragraphs and pages. The par builder takes a list and breaks it into
133lines. At some point horizontal blobs are wrapped into vertical ones. Lines are
134so called boxes and can be separated by glue, penalties and more. The page
135builder accumulates lines and when feasible triggers an output routine that will
136take the list so far. Constructing the actual page is not part of \TEX\ but done
using primitives that permit manipulation of boxes. The result is handed back to
\TEX\ and flushed to an (often \PDF) file.
139
140\starttyping
141\setbox\scratchbox\vbox\bgroup
142    line 1\par line 2
143\egroup
144
145\showbox\scratchbox
146\stoptyping
147
The above code produces the next log lines that reveal how the engine sees a
149paragraph (wrapped in a \type {\vbox}):
150
151\starttyping[style=small]
1521:4: > \box257=
1531:4: \vbox[normal][16=1,17=1,47=1], width 483.69687, height 27.58083, depth 0.1416, direction l2r
1541:4: .\list
1551:4: ..\hbox[line][16=1,17=1,47=1], width 483.69687, height 7.59766, depth 0.1416, glue 455.40097fil, direction l2r
1561:4: ...\list
1571:4: ....\glue[left hang][16=1,17=1,47=1] 0.0pt
1581:4: ....\glue[left][16=1,17=1,47=1] 0.0pt
1591:4: ....\glue[parfillleft][16=1,17=1,47=1] 0.0pt
1601:4: ....\par[newgraf][16=1,17=1,47=1], hangafter 1, hsize 483.69687, pretolerance 100, tolerance 3000, adjdemerits 10000, linepenalty 10, doublehyphendemerits 10000, finalhyphendemerits 5000, clubpenalty 2000, widowpenalty 2000, brokenpenalty 100, emergencystretch 12.0, parfillskip 0.0pt plus 1.0fil, hyphenationmode 499519
1611:4: ....\glue[indent][16=1,17=1,47=1] 0.0pt
1621:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006C l
1631:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000069 i
1641:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006E n
1651:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000065 e
1661:4: ....\glue[space][16=1,17=1,47=1] 3.17871pt plus 1.58936pt minus 1.05957pt, font 30
1671:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000031 1
1681:4: ....\penalty[line][16=1,17=1,47=1] 10000
1691:4: ....\glue[parfill][16=1,17=1,47=1] 0.0pt plus 1.0fil
1701:4: ....\glue[right][16=1,17=1,47=1] 0.0pt
1711:4: ....\glue[right hang][16=1,17=1,47=1] 0.0pt
1721:4: ..\glue[par][16=1,17=1,47=1] 5.44995pt plus 1.81665pt minus 1.81665pt
1731:4: ..\glue[baseline][16=1,17=1,47=1] 6.79396pt
1741:4: ..\hbox[line][16=1,17=1,47=1], width 483.69687, height 7.59766, depth 0.1416, glue 455.40097fil, direction l2r
1751:4: ...\list
1761:4: ....\glue[left hang][16=1,17=1,47=1] 0.0pt
1771:4: ....\glue[left][16=1,17=1,47=1] 0.0pt
1781:4: ....\glue[parfillleft][16=1,17=1,47=1] 0.0pt
1791:4: ....\par[newgraf][16=1,17=1,47=1], hangafter 1, hsize 483.69687, pretolerance 100, tolerance 3000, adjdemerits 10000, linepenalty 10, doublehyphendemerits 10000, finalhyphendemerits 5000, clubpenalty 2000, widowpenalty 2000, brokenpenalty 100, emergencystretch 12.0, parfillskip 0.0pt plus 1.0fil, hyphenationmode 499519
1801:4: ....\glue[indent][16=1,17=1,47=1] 0.0pt
1811:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006C l
1821:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000069 i
1831:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006E n
1841:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000065 e
1851:4: ....\glue[space][16=1,17=1,47=1] 3.17871pt plus 1.58936pt minus 1.05957pt, font 30
1861:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000032 2
1871:4: ....\penalty[line][16=1,17=1,47=1] 10000
1881:4: ....\glue[parfill][16=1,17=1,47=1] 0.0pt plus 1.0fil
1891:4: ....\glue[right][16=1,17=1,47=1] 0.0pt
1901:4: ....\glue[right hang][16=1,17=1,47=1] 0.0pt
191\stoptyping
192
193The \LUAMETATEX\ engine provides hooks for \LUA\ code at nearly every reasonable
194point in the process: collecting content, hyphenating, applying font features,
195breaking into lines, etc. This means that you can overload \TEX's natural
196behavior, which still is the benchmark. When we refer to \quote {callbacks} we
mean these hooks. The \TEX\ engine itself is pretty well optimized but when you
kick in much \LUA\ code, you will notice that performance drops. Don't blame and
199bother the authors with performance issues. In \CONTEXT\ over 50\% of the time
200can be spent in \LUA, but so far we didn't get many complaints about efficiency.
201Adding more callbacks makes no sense, also because at some point the performance
202hit gets too large. There are plenty of ways to achieve goals. For that reason:
203take remarks about \LUAMETATEX, features, potential, performance etc.\ with a natural
204grain of salt.
205
Where plain \TEX\ is basically a framework for writing a specific style,
207macro packages like \CONTEXT\ and \LATEX\ provide the user a whole lot of
208additional tools to make documents look good. They hide the dirty details of font
209management, language support, turning structure into typeset results, wrapping
210pages, including images, and so on. You should be aware of the fact that when you
211hook in your own code to manipulate lists, this can interfere with the macro
212package that you use. Each successive step expects a certain result and if you
mess around too much, the engine eventually might bark and quit. It can even
214crash, because testing everywhere for what users can do wrong is no real option.
215
216When you read about nodes in the following chapters it's good to keep in mind
217what commands relate to them. Here are a few:
218
219\starttabulate[|l|l|p|]
220\FL
221\BC command                \BC node           \BC explanation \NC \NR
222\TL
223\NC \type {\hbox}          \NC \type {hlist} \NC horizontal box \NC \NR
224\NC \type {\vbox}          \NC \type {vlist} \NC vertical box with the baseline at the bottom \NC \NR
225\NC \type {\vtop}          \NC \type {vlist} \NC vertical box with the baseline at the top \NC \NR
226\NC \type {\hskip}         \NC \type {glue}  \NC horizontal skip with optional stretch and shrink \NC \NR
227\NC \type {\vskip}         \NC \type {glue}  \NC vertical skip with optional stretch and shrink \NC \NR
228\NC \type {\kern}          \NC \type {kern}  \NC horizontal or vertical fixed skip \NC \NR
229\NC \type {\discretionary} \NC \type {disc}  \NC hyphenation point (pre, post, replace) \NC \NR
230\NC \type {\char}          \NC \type {glyph} \NC a character \NC \NR
231\NC \type {\hrule}         \NC \type {rule}  \NC a horizontal rule \NC \NR
232\NC \type {\vrule}         \NC \type {rule}  \NC a vertical rule \NC \NR
233\NC \type {\textdirection} \NC \type {dir}   \NC a change in text direction \NC \NR
234\LL
235\stoptabulate
236
237Whatever we feed into \TEX\ at some point becomes a token which is either
238interpreted directly or stored in a linked list. A token is just a number that
239encodes a specific command (operator) and some value (operand) that further
240specifies what that command is supposed to do. In addition to an interface to
241nodes, there is an interface to tokens, as later chapters will demonstrate.
242
243Text (interspersed with macros) comes from an input medium. This can be a file,
token list, macro body or arguments, some internal quantity (like a number),
245\LUA, etc. Macros get expanded. In the process \TEX\ can enter a group. Inside
246the group, changes to registers get saved on a stack, and restored after leaving
247the group. When conditionals are encountered, another kind of nesting happens,
248and again there is a stack involved. Tokens, expansion, stacks, input levels are
all terms used in the next chapters. Don't worry, they lose their magic once you
250use \TEX\ a lot. You have access to most of the internals and when not, at least
251it is possible to query some state we're in or level we're at.
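
The save and restore mechanism can be sketched as follows. This is a simplified
model with made up names (\typ {demo_set}, \typ {demo_begin_group}, \typ
{demo_end_group}); the engine is smarter, for instance about saving a value only
once per level, but the principle of undoing local assignments when a group ends
is the same.

\starttyping
#include <assert.h>

#define DEMO_REGISTERS 16
#define DEMO_STACK     64

typedef struct { int index; int old_value; int level; } demo_save;

static int       demo_register[DEMO_REGISTERS];
static demo_save demo_save_stack[DEMO_STACK];
static int       demo_save_top    = 0;  /* number of saved entries */
static int       demo_group_level = 0;  /* current grouping level  */

static void demo_begin_group(void)
{
    ++demo_group_level;
}

/* a local assignment: remember the old value when we are inside a group */
static void demo_set(int index, int value)
{
    assert(index >= 0 && index < DEMO_REGISTERS);
    if (demo_group_level > 0) {
        assert(demo_save_top < DEMO_STACK);
        demo_save_stack[demo_save_top++] =
            (demo_save) { index, demo_register[index], demo_group_level };
    }
    demo_register[index] = value;
}

/* leaving the group: undo the assignments of this level, newest first */
static void demo_end_group(void)
{
    assert(demo_group_level > 0);
    while (demo_save_top > 0
        && demo_save_stack[demo_save_top - 1].level == demo_group_level) {
        --demo_save_top;
        demo_register[demo_save_stack[demo_save_top].index] =
            demo_save_stack[demo_save_top].old_value;
    }
    --demo_group_level;
}
\stoptyping

Nested groups then behave as expected: each \typ {demo_end_group} brings back the
values that were current when the matching \typ {demo_begin_group} was called.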
252
253When we talk about pack(ag)ing it can mean two things. When \TEX\ has consumed
254some tokens that represent text they are added to the current list. When the text
255is put into a so called \type {\hbox} (for instance a line in a paragraph) it
(normally) first gets hyphenated, next ligatures are built, and finally kerns are
257added. Each of these stages can be overloaded using \LUA\ code. When these three
258stages are finished, the dimension of the content is calculated and the box gets
259its width, height and depth. What happens with the box depends on what macros do
260with it.
261
262The other thing that can happen is that the text starts a new paragraph. In that
263case some information is stored in a leading \type {par} node. Then indentation
264is appended and the paragraph ends with some glue. Again the three stages are
265applied but this time afterwards, the long line is broken into lines and the
266result is either added to the content of a box or to the main vertical list (the
267running text so to say). This is called par building. At some point \TEX\ decides
268that enough is enough and it will trigger the page builder. So, building is
269another concept we will encounter. Another example of a builder is the one that
270turns an intermediate math list into something typeset.
271
272Wrapping something in a box is called packing. Adding something to a list is
273described in terms of contributing. The more complicated processes are wrapped
274into builders. For now this should be enough to enable you to understand the next
chapters. The text is not as enlightening and entertaining as Don Knuth's books,
276sorry.
277
278\stopsection
279
280\startsection[title={Memory words}]
281
Before we continue, it helps to know that \TEX\ manages most of its memory itself. It
allocates arrays of (pairs of) 32 bit integers because that is what \TEX\ uses all over
the place: integers. They store integer numbers of various ranges, fixed point
floats, pointers (indices in arrays), states, commands, and often groups of them
travel around the system.
287
288\starttabulate
289\NC integer           \EQ mostly 8, 16, 24, 32 but we have odd packing too \NC \NR
290\NC fixed point float \EQ 16.16 used to represent dimensions \NC \NR
291\NC boolean           \EQ simple state variables \NC \NR
292\NC enumerations      \EQ a choice from a set, like operators and operands \NC \NR
293\NC strings           \EQ an index in a string pool (character array) \NC \NR
294\stoptabulate
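
The 16.16 fixed point representation means that one point is stored as 65536
units (scaled points). A minimal sketch of the conversion, with helper names
that are made up for this example, could look like this:

\starttyping
#include <assert.h>
#include <stdint.h>

typedef int32_t demo_scaled;  /* 16.16 fixed point: 65536 units per point */

#define DEMO_UNITY 65536

/* convert points to scaled units, rounding to the nearest unit */
static demo_scaled demo_from_points(double pt)
{
    double sp = pt * DEMO_UNITY;
    assert(sp > -2147483648.0 && sp < 2147483647.0);
    return (demo_scaled) (sp < 0 ? sp - 0.5 : sp + 0.5);
}

/* convert scaled units back to points */
static double demo_to_points(demo_scaled s)
{
    return (double) s / DEMO_UNITY;
}
\stoptyping

So a dimension of \type {10pt} travels through the system as the integer 655360,
and all arithmetic on dimensions is integer arithmetic.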
295
The main memory areas in \TEX\ are therefore arrays of integers or pairs of integers
297as we want to handle linked lists where in an element one integer has some data
298and the other points to another element. Keep in mind that when \TEX\ showed up
299efficient memory management was best done by the application, especially when it
300had to be portable. This might seem odd now but is actually not that bad
301performance wise. One just has to get accustomed to the way \TEX\ handles data.
302
303\startsetups memory:halfword
304    \startframed[mymemory]
305        \offinterlineskip
306        \dontleavehmode
307        \myslot[width=4uu]{1}\hkern 1uu
308        \myslot[width=4uu]{1}\hkern 1uu
309        \myslot[width=4uu]{1}\hkern 1uu
310        \myslot[width=4uu]{1}\vkern 1uu
311        \dontleavehmode
312        \myslot[width=9uu]{2}\hkern 1uu
313        \myslot[width=9uu]{2}\vkern 1uu
314        \dontleavehmode
315        \myslot[width=19uu]{4}
316    \stopframed
317\stopsetups
318
319\startsetups memory:token
320    \startframed[mymemory]
321        \offinterlineskip
322        \dontleavehmode
323        \myslot[width=4uu]{1}\hkern 1uu
324        \myslot[width=14uu]{3}\vkern 1uu
325        \dontleavehmode
326        \myslot[width=19uu]{4}
327    \stopframed
328\stopsetups
329
330\startlinecorrection
331    \forgetall
332    \uunit=.5em
333    \setups[memory:halfword]
334\stoplinecorrection
335
336Depending on usage we use four, two or one byte. Often a pair is used:
337
338\startlinecorrection
339    \forgetall
340    \uunit=.5em
341    \offinterlineskip
342    \dontleavehmode
343    \setups[memory:halfword]\hkern 1uu
344    \setups[memory:halfword]
345\stoplinecorrection
346
347Such a pair is called a (memory) word and each component is a halfword that
348itself can have two quarterwords and four singlewords. In \LUAMETATEX\ we also
349can combine them:
350
351\startlinecorrection
352    \forgetall
353    \uunit=.5em
354    \offinterlineskip
355    \dontleavehmode
356    \setups[memory:halfword]\hkern 1uu
357    \setups[memory:halfword]\vkern 1uu
358    \dontleavehmode
359    \startframed[mymemory]
360        \myslot[width=41uu]{8}
361    \stopframed
362\stoplinecorrection
363
364The eight byte field is used for pointers (to more dynamic structures) and double
365floats but that can only happen when multiple words are used as a combined data
366structure (as in a so called node, explained below). Quite often the second field
367is used as pointer to another pair. We could have changed that model in \LUATEX\
368and \LUAMETATEX\ but there is little gain in that and we want to stay close to
369the well documented original as much as possible. It also has the side effect of
simplifying the code and retaining performance. \footnote {In the source this is
371reflected in the names used: \type {vinfo} and \type {vlink} in these pairs but
372in \LUAMETATEX\ we often use more symbolic names.}
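
In C such a layout can be expressed with unions. The following is a sketch with
made up names (\typ {demo_mw_half}, \typ {demo_mw_word}), not a copy of the
definitions in the source. Which byte or quarterword ends up where depends on
the \CPU, so the example only relies on the two halfwords.

\starttyping
#include <assert.h>
#include <stdint.h>

/* one halfword seen as one 32 bit value, two 16 bit values or four bytes */
typedef union {
    int32_t  whole;
    uint16_t quarter[2];
    uint8_t  single[4];
} demo_mw_half;

/* a memory word: a pair of halfwords, or one eight byte field */
typedef union {
    struct {
        demo_mw_half info;
        demo_mw_half link;
    } pair;
    double  number;
    int64_t wide;
} demo_mw_word;

/* fill a word with an info and a link halfword */
static demo_mw_word demo_mw_pair(int32_t info, int32_t link)
{
    demo_mw_word w;
    assert(sizeof(demo_mw_word) == 2 * sizeof(int32_t));
    w.pair.info.whole = info;
    w.pair.link.whole = link;
    return w;
}
\stoptyping

All views share the same eight bytes, so it is up to the code that uses a word
to know which view applies.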
373
374\stopsection
375
376\startsection[title={Tokens}]
377
378A token is a halfword, so a 32 bit integer as mentioned before. Here we use a one
379plus three model, not mentioned in the previous section. Sometimes we just look
380at the whole number, but quite often we look at the two smaller ones. The single
381byte is the so called command identifier (cmd), the second one traditionally is
382called character (chr), but what we're really talking about is an operator and
383operand kind of model. In a \TEX\ engine source you can find variable names like
\typ {cur_cmd}, \typ {cur_chr} and \typ {cur_tok} where the third one combines the
385first two.
386
387\startlinecorrection
388    \forgetall
389    \uunit=.5em
390    \offinterlineskip
391    \dontleavehmode
392    \setups[memory:token]
393\stoplinecorrection
394
395Tokens travel through the system as integers and when some action is required the
396command part is consulted which then triggers some action further defined by the
397character part. The combination can either directly trigger some action but often
398that action has to look ahead in order to get some more details.
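
Packing and unpacking such a token is a matter of shifting and masking. The
sketch below assumes the one plus three byte split of the figure; the names and
the command number used in the usage are made up and the bit layout of the real
engine can differ.

\starttyping
#include <assert.h>
#include <stdint.h>

#define DEMO_CHR_BITS 24
#define DEMO_CHR_MASK 0xFFFFFFu

/* combine an operator (cmd) and an operand (chr) into one 32 bit token */
static uint32_t demo_token(uint32_t cmd, uint32_t chr)
{
    assert(cmd <= 0xFFu);
    assert(chr <= DEMO_CHR_MASK);
    return (cmd << DEMO_CHR_BITS) | chr;
}

/* the operator lives in the upper byte */
static uint32_t demo_token_cmd(uint32_t tok)
{
    return tok >> DEMO_CHR_BITS;
}

/* the operand lives in the lower three bytes */
static uint32_t demo_token_chr(uint32_t tok)
{
    return tok & DEMO_CHR_MASK;
}
\stoptyping

With 11 as a stand-in for the letter command, \typ {demo_token(11, 'H')} gives
one integer from which both parts can be recovered. Three bytes are enough for
the operand to hold any \UNICODE\ slot.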
399
400Consider the following input:
401
402\starttyping[option=TEX]
403\starttext
404Hi there!
405
406This is a \hbox{box}.
407\stoptext
408\stoptyping
409
410Every character falls in a category, and there are 16 of them. The \type {H} is a
\quote {letter}, the empty line a \quote {newline}. The backslash is an \quote {escape}
that tells the parser to scan for a command whose name is made from letters. That
413command is then looked up and a token is created: in this case a \quote {call}
414command with as operand the memory address (an index in the to be discussed hash)
415where the start of a list of stored tokens can be found.
416
417The characters in the text also become tokens and here we get two \quote {letter}
418commands (with the \UNICODE\ slots as operand), one \quote {space} command, five
419more letter commands and an \quote {other} command, and so on.
420
421Here every token is fed into the interpreter. The \type {\starttext} and \type
422{\stoptext} are macros (control sequences) so they get expanded and the stored
423tokens get interpreted. The letters become (to be discussed) nodes in a linked
424list of content. In this case the tokens are not stored and discarded as we read
425on.
426
427The \type {\hbox} is also a control sequence but a built in primitive. The
428operator is \typ {make_box} and the operand is \typ {hbox}. It will trigger
429making a box of the given kind by reading an optional specification, the left
430curly brace (begin group) collects content, and when the right curly brace (end
group) is seen wraps up by packaging the result. All that is hard coded, contrary
432to a macro, but one can of course define \type {\hbox} as macro, which normally
433is a bad idea.
434
As a side note: quite often \TEX\ reads a token, and then puts it back into the
436input. For instance, when it expects a number or keyword it keeps reading till it
437is satisfied and when it ends up in the unexpected it has to wrap up and go one
438step back. However, when we read from file we can't go back, which is why \TEX\
439has a model of \quote {input levels}. Pushing back boils down to creating a token
list with this one token and then reading from that list. It is beyond
441this explanation to go into details but all you need to know is that \TEX\ has
442various input sources, for instance files, token lists, arguments to commands
443(also token lists) and \LUA\ output, but in the end all provide tokens. \footnote
444{We could use a double linked list in which case we would have a three integer
445element which is odd for \TEX\ and has no real benefits as it would change the
446model completely.}
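
The idea of pushing back a token by opening a new input level can be sketched
like this. The stack, its size and the function names are assumptions made for
this example; tokens are plain integers here and \type {-1} signals that there
is no more input.

\starttyping
#include <assert.h>
#include <stddef.h>

#define DEMO_LEVELS 32

typedef struct {
    const int *tokens;  /* the tokens of this input level        */
    size_t     count;
    size_t     next;    /* position of the next token to deliver */
} demo_input_level;

static demo_input_level demo_input[DEMO_LEVELS];
static size_t           demo_input_depth = 0;
static int              demo_pushed[DEMO_LEVELS]; /* storage for pushed back tokens */

/* start reading from a new source, for instance a file or a token list */
static void demo_push_level(const int *tokens, size_t count)
{
    assert(demo_input_depth < DEMO_LEVELS);
    demo_input[demo_input_depth++] = (demo_input_level) { tokens, count, 0 };
}

/* push back: a new input level that holds just this one token */
static void demo_back_input(int token)
{
    assert(demo_input_depth < DEMO_LEVELS);
    demo_pushed[demo_input_depth] = token;
    demo_push_level(&demo_pushed[demo_input_depth], 1);
}

/* deliver the next token, dropping levels that are exhausted */
static int demo_get_next(void)
{
    while (demo_input_depth > 0) {
        demo_input_level *top = &demo_input[demo_input_depth - 1];
        if (top->next < top->count) {
            return top->tokens[top->next++];
        }
        --demo_input_depth;
    }
    return -1;
}
\stoptyping

The reader never moves backwards in the original source: the token that was
read too early simply shows up again because it sits on top of the input stack.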
447
448\startlinecorrection
449    \forgetall
450    \uunit=.5em
451    \offinterlineskip
452    \startframed[mymemory]
453        \dontleavehmode
454        \myslot[width=2uu,frame=off]{1}\hkern1uu
455        \myslot[width=19uu]{info}\hkern1uu
456        \myslot[width=19uu]{link}\vkern1uu
457        \dontleavehmode
458        \myslot[width=2uu,frame=off]{}\hkern1uu
459        \myslot[width=19uu,frame=dash]{}\hkern1uu
460        \myslot[width=19uu,frame=dash]{}\vkern1uu
461        \dontleavehmode
462        \myslot[width=2uu,frame=off]{n}\hkern1uu
463        \myslot[width=19uu]{info}\hkern1uu
464        \myslot[width=19uu]{link}
465    \stopframed
466\stoplinecorrection
467
468So to wrap up tokens, we have either singular ones (just 32 bit integers encoding
469a command and value aka operator and operand) or a pair where the second one is a
470link. A token list starts at some index and the link is zero (end of list) or
another index. Token memory is a huge array of memory words like these. When token
472lists are constructed we take from this pool so there is an index indicating the
473first available token. When a list is discarded it gets appended to a list of
474free tokens. So in practice we first try to get a free token from this pool. In
\LUAMETATEX\ the token array will grow on demand with a configurable chunk size.
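
A much simplified model of such a token pool is shown below. All names and the
tiny chunk size are made up, index zero is reserved to mean \quote {nothing},
and a discarded list is put in front of the free list instead of being
appended, which is easier and gives the same effect for this sketch.

\starttyping
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int32_t info; int32_t link; } demo_tok_word;

static demo_tok_word *demo_tok_words = NULL;
static int32_t        demo_tok_size  = 0;  /* allocated words               */
static int32_t        demo_tok_first = 1;  /* first word never used so far  */
static int32_t        demo_tok_free  = 0;  /* head of the free list, 0: nil */
static const int32_t  demo_tok_chunk = 4;  /* tiny, to show the growth      */

/* get a word: reuse a free one, else take a fresh one, growing when needed */
static int32_t demo_get_token(int32_t info)
{
    int32_t p;
    if (demo_tok_free) {
        p = demo_tok_free;
        demo_tok_free = demo_tok_words[p].link;
    } else {
        if (demo_tok_first >= demo_tok_size) {
            size_t wanted = (size_t) (demo_tok_size + demo_tok_chunk);
            demo_tok_word *t = realloc(demo_tok_words, wanted * sizeof(demo_tok_word));
            assert(t != NULL);
            demo_tok_words = t;
            demo_tok_size += demo_tok_chunk;
        }
        p = demo_tok_first++;
    }
    demo_tok_words[p].info = info;
    demo_tok_words[p].link = 0;
    return p;
}

/* discard a whole list: it goes in front of the list of free words */
static void demo_flush_list(int32_t head)
{
    int32_t tail = head;
    if (! head) {
        return;
    }
    while (demo_tok_words[tail].link) {
        tail = demo_tok_words[tail].link;
    }
    demo_tok_words[tail].link = demo_tok_free;
    demo_tok_free = head;
}
\stoptyping

After a list has been flushed, the next requests are served from the free list
and only when that one is empty does the pool grow.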
476
477\stopsection
478
479\startsection[title=Nodes]
480
481We already mentioned nodes. These are slices from an array that hold some values
482that belong together. So again we have a large array of memory words but where a
token is one pair a node is multiple pairs. Nodes have different sizes. The first node
484starts at index~1 and when it needs four memory words the second node starts at
485index~5.
486
487A character in the input that is typeset will become a glyph node of \cldcontext
488{node.size ("glyph")} bytes and a paragraph starts with a par node of \cldcontext
489{node.size ("par")} bytes. A space becomes a glue node of \cldcontext {node.size
490("glue")} bytes and every box that you (or \TEX) make is \cldcontext {node.size
491("hlist")} bytes. Most nodes are way larger in \LUAMETATEX\ than in traditional
492\TEX\ but we don't have the memory constraints of those times.
493
Here it is worth noticing that \TEX\ has a dedicated subsystem for glue
which makes sharing space related glue efficient: the so called glue
specifications are reference counted. In \LUATEX\ we made these normal nodes
which is slightly less efficient but fits better in the opened up (\LUA)
interface and also has some other advantages (we leave it to the reader to guess
what).
500
501For instance, a kern node at the time of this writing needs three memory words
502(as with other nodes we might add some more fields, like \type {options}).
503
504\startlinecorrection
505    \forgetall
506    \uunit=.5em
507    \offinterlineskip
508    \startframed[mymemory]
509        \offinterlineskip
510        \dontleavehmode
511        \myslot[width=6uu,frame=off]{3128}\hkern1uu
512        \myslot[width=10uu]{type}\hkern 1uu
513        \myslot[width=10uu]{subtype}\hkern 1uu
514        \myslot[width=21uu]{next}\vkern 1uu
515        \dontleavehmode
516        \myslot[width=6uu,frame=off]{3129}\hkern1uu
517        \myslot[width=21uu]{previous}\hkern 1uu
518        \myslot[width=21uu]{attribute}\vkern 1uu
519        \dontleavehmode
520        \myslot[width=6uu,frame=off]{3130}\hkern1uu
521        \myslot[width=21uu]{amount}\hkern 1uu
522        \myslot[width=21uu]{expansion}
523    \stopframed
524\stoplinecorrection
525
526So here we take a slice of three memory words from the node array starting at
527index 3128. We mention this detail because sometimes (when tracing) you see these
528numbers. This doesn't mean that at that point we had 3128 nodes, because the next
529node taken from this pool will have number 3131. The numbers are indices!
530
In the source code we access these numbers like this:
532
533\starttyping[option=LUA]
534# define kern_amount(a)    vlink(a,2)
535# define kern_expansion(a) vinfo(a,2)
536\stoptyping
537
538So when $a = 3128$ the amount is found in the link field $a = 3128 + 2 = 3130$.
539The name link is somewhat weird here but that's the way these fields are called:
540\type {vlink} and \type {vinfo}. It could as well be \type {first} and \type
541{second} but by using macros we get away by abstraction. So now you can figure
542out what these references do: \footnote {In what order these two fields end up in
543memory depends on the \CPU\ being little or big endian.}
544
545\starttyping[option=LUA]
546# define node_type(a)    vinfo0(a,0)
547# define node_subtype(a) vinfo1(a,0)
548
549# define node_next(a)    vlink(a,0)
550# define node_prev(a)    vlink(a,1)
551# define node_attr(a)    vinfo(a,1)
552\stoptyping
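
To see these accessors at work, here is a self contained sketch. The array, the
\type {demo_} prefixed macros and the type number used in the usage are made up
for this example; only the idea of addressing fields relative to the index of
the node is taken from the source.

\starttyping
#include <assert.h>
#include <stdint.h>

typedef struct {
    union {
        int32_t  whole;
        uint16_t quarter[2];
    } info;
    int32_t link;
} demo_node_word;

#define DEMO_NODE_WORDS 4096

static demo_node_word demo_node_memory[DEMO_NODE_WORDS];

/* stand-ins for the low level accessors: word a+n, field info or link */
# define demo_vinfo(a,n)  demo_node_memory[(a)+(n)].info.whole
# define demo_vlink(a,n)  demo_node_memory[(a)+(n)].link
# define demo_vinfo0(a,n) demo_node_memory[(a)+(n)].info.quarter[0]
# define demo_vinfo1(a,n) demo_node_memory[(a)+(n)].info.quarter[1]

# define demo_node_type(a)      demo_vinfo0(a,0)
# define demo_node_subtype(a)   demo_vinfo1(a,0)
# define demo_node_next(a)      demo_vlink(a,0)
# define demo_node_prev(a)      demo_vlink(a,1)
# define demo_node_attr(a)      demo_vinfo(a,1)
# define demo_kern_amount(a)    demo_vlink(a,2)
# define demo_kern_expansion(a) demo_vinfo(a,2)

/* fill the three words of a kern node that lives at index a */
static void demo_set_kern(int32_t a, uint16_t type, uint16_t subtype, int32_t amount)
{
    assert(a > 0 && a + 2 < DEMO_NODE_WORDS);
    demo_node_type(a)      = type;
    demo_node_subtype(a)   = subtype;
    demo_node_next(a)      = 0;
    demo_node_prev(a)      = 0;
    demo_node_attr(a)      = 0;
    demo_kern_amount(a)    = amount;
    demo_kern_expansion(a) = 0;
}
\stoptyping

After \typ {demo_set_kern(3128, 13, 0, 65536)} the amount sits in the link field
of word 3130, which is exactly what the macro hides from the code that uses it.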
553
554Not all nodes end up in a list that results in output, like paragraphs and pages.
555For instance \typ {\parshape} and \typ {\widowpenalties} also use nodes as
556storage container. Their common node is a specification node of \cldcontext
{node.size ("specification")} bytes but with a pointer to a dynamically allocated memory array.
558
559\protected\def\SnapNow % we need to fetch both at the same time
560  {\edef\SnapUsage{\cldcontext{nodes.pool.usage().glyph}}\relax
561   \edef\SnapStock{\cldcontext{nodes.pool.stock().glyph}}\relax}
562
563Because the sizes differ one cannot simply have a list of free nodes (as with
564tokens) without some lookup mechanism that combines nodes when needed (they need
565to be next to each other) or split larger ones when we run out of nodes. In
566\LUATEX\ and \LUAMETATEX\ we keep a list of free nodes per size which in practice
567is more efficient and one seldom runs out of nodes because on the average a page
568has a similar distribution and when a page is flushed (or any box for that
569matter) nodes get freed. For instance right at \SnapNow this moment, we have
\SnapUsage\ glyph nodes in use and \SnapStock\ glyphs in stock. \footnote {And a while
later (that is: \SnapNow here) these numbers are \SnapUsage\ and \SnapStock.
These numbers can hardly be called dramatic as a page can only have so many glyph
nodes: \SnapNow \SnapUsage\ and \SnapStock\ were the numbers after the colon.}
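
Keeping a list of free nodes per size can be sketched as follows. The pool is
reduced to one integer per word and all names are made up; the first word of a
freed node is used to link it into the free list of its size.

\starttyping
#include <assert.h>
#include <stdint.h>

#define DEMO_POOL_WORDS 1024
#define DEMO_MAX_SIZE   16

static int32_t demo_pool[DEMO_POOL_WORDS];
static int32_t demo_pool_unused = 1;               /* first index never used so far   */
static int32_t demo_pool_free[DEMO_MAX_SIZE + 1];  /* per size: first free node, or 0 */

/* take a slice of size words: prefer a freed node of exactly that size */
static int32_t demo_new_node(int size)
{
    int32_t p;
    assert(size > 0 && size <= DEMO_MAX_SIZE);
    p = demo_pool_free[size];
    if (p) {
        demo_pool_free[size] = demo_pool[p];
        return p;
    }
    assert(demo_pool_unused + size <= DEMO_POOL_WORDS);
    p = demo_pool_unused;
    demo_pool_unused += size;
    return p;
}

/* give a node back: it becomes the head of the free list of its size */
static void demo_free_node(int32_t p, int size)
{
    assert(p > 0 && size > 0 && size <= DEMO_MAX_SIZE);
    demo_pool[p] = demo_pool_free[size];
    demo_pool_free[size] = p;
}
\stoptyping

A freed node of four words is only handed out again when a node of four words is
asked for, so there is no need to split or combine slices.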
574
575\stopsection
576
577\startsection[title=The hash table]
578
579The engine has a lot of built-in commands and users can define additional ones.
580An example is macros, like the mentioned \typ {\starttext} and \type {\stoptext}
581that refer to a token list that starts the typesetting process. When reading the
582input from file these commands and macros are looked up in a hash table. There
583are also built-in commands that generate a hash entry. For instance when you
584define a counter or a font, the given name becomes a hash entry that points to a
585memory location (again an index).
586
587Here it gets more complex. A hash table is used to lookup primitive commands like
588\type {\hbox} and \type {\font} as well as \type {\starttext} and \type
589{\stoptext}. The string is converted into an integer within a specific range.
590That integer is then an index into a table like we saw before, with two halfwords
591per slot.
592
593\startlinecorrection
594    \forgetall
595    \uunit=.5em
596    \offinterlineskip
597    \startframed[mymemory]
598        \dontleavehmode
599        \myslot[width=2uu,frame=off]{1}\hkern1uu
600        \myslot[width=19uu]{next}\hkern1uu
601        \myslot[width=19uu]{string}\vkern1uu
602        \dontleavehmode
603        \myslot[width=2uu,frame=off]{}\hkern1uu
604        \myslot[width=19uu,frame=dash]{}\hkern1uu
605        \myslot[width=19uu,frame=dash]{}\vkern1uu
606        \dontleavehmode
607        \myslot[width=2uu,frame=off]{n}\hkern1uu
608        \myslot[width=19uu]{next}\hkern1uu
609        \myslot[width=19uu]{string}
610    \stopframed
611\stoplinecorrection
612
The hash value (integer calculated from string) points to a slot and the string
is compared with the stored string. When the string is different, the next field
points to a different slot (outside the hash range in the same table) and again
the string is checked. This continues until the strings match or no next value
is set (zero); the index of the matching slot is then used to determine what to do.
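
The lookup can be sketched like this. The hash function, the table sizes and the
names are assumptions made for this example and have nothing to do with the
values used in the engine; only the chaining into slots beyond the hash range
follows the description above.

\starttyping
#include <assert.h>
#include <string.h>

#define DEMO_HASH_RANGE 7   /* hash values are 1 .. DEMO_HASH_RANGE       */
#define DEMO_HASH_SIZE  32  /* slots beyond the range hold the collisions */

typedef struct {
    int         next;    /* 0 means: end of chain */
    const char *string;  /* NULL means: unused    */
} demo_hash_slot;

static demo_hash_slot demo_hash[DEMO_HASH_SIZE];
static int            demo_hash_extra = DEMO_HASH_RANGE + 1;

/* a simple string hash that ends up in the range 1 .. DEMO_HASH_RANGE */
static int demo_hash_value(const char *s)
{
    unsigned h = 0;
    while (*s) {
        h = (h * 31u + (unsigned char) *s++) % DEMO_HASH_RANGE;
    }
    return (int) h + 1;
}

/* return the slot of name, adding it when insert is nonzero; 0: not found */
static int demo_lookup(const char *name, int insert)
{
    int p = demo_hash_value(name);
    for (;;) {
        if (demo_hash[p].string == NULL) {
            /* only a slot inside the hash range can still be empty */
            if (insert) {
                demo_hash[p].string = name;
                return p;
            }
            return 0;
        }
        if (strcmp(demo_hash[p].string, name) == 0) {
            return p;
        }
        if (demo_hash[p].next == 0) {
            if (! insert) {
                return 0;
            }
            assert(demo_hash_extra < DEMO_HASH_SIZE);
            demo_hash[p].next = demo_hash_extra;
            p = demo_hash_extra++;
            demo_hash[p].string = name;
            return p;
        }
        p = demo_hash[p].next;
    }
}
\stoptyping

The returned slot number is what matters: it is the index that is used to find
out what the name stands for.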
618
619\startlinecorrection
620    \forgetall
621    \uunit=.5em
622    \offinterlineskip
623    \startframed[mymemory]
624        \dontleavehmode
625        \myslot[width=2uu,frame=off]{1}\hkern1uu
626        \myslot[width=8uu]{type}\hkern1uu
627        \myslot[width=8uu]{flags}\hkern1uu
628        \myslot[width=16uu]{level}\hkern1uu
629        \myslot[width=32uu]{value}\vkern1uu
630        \dontleavehmode
631        \myslot[width=2uu,frame=off]{}\hkern1uu
632        \myslot[width=8uu,frame=dash]{}\hkern1uu
633        \myslot[width=8uu,frame=dash]{}\hkern1uu
634        \myslot[width=16uu,frame=dash]{}\hkern1uu
635        \myslot[width=32uu,frame=dash]{}\vkern1uu
636        \dontleavehmode
637        \myslot[width=2uu,frame=off]{n}\hkern1uu
638        \myslot[width=8uu]{type}\hkern1uu
639        \myslot[width=8uu]{flags}\hkern1uu
640        \myslot[width=16uu]{level}\hkern1uu
641        \myslot[width=32uu]{value}\vkern1uu
642    \stopframed
643\stoplinecorrection
644
This table is called the table of equivalents. In \LUAMETATEX\ this is
implemented a bit differently than in the other engines because we combine tables.
The fields that you see here keep track of the type (so that we can optimize some
bits and pieces), flags (so that we can implement overload protection), a level
(so that we can restore values after the group ends) and of course a value.
650
That value can be a pointer to (index of) a token list, or a pointer to (index
652of) a node. It can also be just some value, like a dimension, character reference
653or register entry.
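
As an illustration, the four fields can be pictured as a \CCODE\ structure. The
field widths are an assumption read from the picture (8, 8, 16 and 32 bits) and
the names \type {equivalent} and \type {equivalent_set} are hypothetical; the
engine itself uses its own memory word definitions.

\starttyping[option=CPP]
#include <stdint.h>

/* A hypothetical 64 bit entry that mirrors the fields in the picture. */
typedef struct equivalent {
    uint8_t  type;   /* what kind of quantity this is                    */
    uint8_t  flags;  /* for instance overload protection                 */
    uint16_t level;  /* the grouping level at which the value was set    */
    int32_t  value;  /* a number or an index into token or node memory   */
} equivalent;

/* The bookkeeping fields sit next to a full 32 bit value in one 64 bit word. */
_Static_assert(sizeof(equivalent) == 8, "an entry is expected to take 64 bits");

static equivalent equivalent_set(uint8_t type, uint8_t flags, uint16_t level, int32_t value)
{
    equivalent e = { .type = type, .flags = flags, .level = level, .value = value };
    return e;
}
\stoptyping

In such a layout an integer can use all 32 bits of the value while the type,
flags and level still travel with it.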
654
655Although there are similarities, the memory mapping in \LUAMETATEX\ differs from
656\LUATEX\ and that one differs from \PDFTEX\ which again differs from original \TEX.
657
In original \TEX\ the table of equivalents is organized in six regions.
659
660\startitemize[n,packed,broad,columns,two]
661\startitem active characters \stopitem
662\startitem hash table\\ font identifiers \stopitem
663\startitem glue \\ muglue \stopitem
664\startitem token lists\\ boxes\\ font names\\ math codes\\ category codes\\ lowercase codes\\ uppercase codes\\ space factors \stopitem
665\startitem integers\\ delimiter codes \stopitem
666\startitem dimensions \stopitem
667\stopitemize
668
669The internal dimension, integer, skip, muskip, token and box registers are part
670of this and for users there are 256 registers of each category. There are 256
671active characters, and the mentioned codes and factors also have 256 entries.
672
In \LUAMETATEX\ (like in \LUATEX) we use \UNICODE, so there it makes no sense to
store character related values in the table of equivalents. We use dedicated
hashes instead. So there we have different regions. In \LUATEX\ we roughly have this:
676
677\startitemize[n,packed,broad,columns,two]
678\startitem hash table \stopitem
679\startitem frozen control sequences \stopitem
680\startitem font identifiers \stopitem
681\startitem glue \stopitem
682\startitem muglue \stopitem
683\startitem tokens \stopitem
684\startitem boxes \stopitem
685\startitem integers \stopitem
686\startitem attributes \stopitem
687\startitem dimensions \stopitem
688\stopitemize
689
690As we moved forward, \LUAMETATEX\ has some more:
691
692\startitemize[n,packed,broad,columns,two]
693\startitem hash table \stopitem
694\startitem frozen control sequences \stopitem
695\startitem glue \stopitem
696\startitem muglue \stopitem
697\startitem tokens \stopitem
698\startitem boxes \stopitem
699\startitem integers \stopitem
700\startitem attributes \stopitem
701\startitem dimensions \stopitem
702\startitem posits \stopitem
703\startitem units \stopitem
704\startitem specifications \stopitem
705\stopitemize
706
707In case one wonders, on top of built-in units users can define their own.
708Specifications are for instance shape and penalty arrays. Fonts are not
709in here because we manage them in \LUA.
710
711In traditional \TEX\ a delimiter code needs two integers so there it uses
712both fields in a memory word and saves the state in a parallel array with
713quarterwords. We don't need this in \LUAMETATEX\ because we store delimiters
714in a separate hash table (and actually don't need them at all, because we use
715\OPENTYPE\ fonts).
716
We need to keep some save|/|restore related state in the table but for integers
and delimiter codes we need all four bytes of the value. Therefore original \TEX\
has a separate parallel table for this, which as a side effect wastes some memory. In
\LUATEX\ we have way more registers so there the waste is larger.
721
722In \LUAMETATEX\ we got rid of this. We could also use less space for the type and
723store some extra data. A side effect is that we keep the type information which
724is handy for tracing, sparse dumping, and optimizing save and restore. This is
why with more functionality we don't need more memory than one would expect.
726
The hash table in original \TEX\ is a bit too small for larger macro packages
which is why in practice engines took more than the default couple of thousand
729slots. But going too large makes no sense because one ends up with many misses
730and unused hash and equivalent space. That is why soon after \TEX\ showed up
731support for extra hash space was introduced. That space is allocated at the end of
732normal hash space and can be configured when the format file is made. This means that
733the hash table also grows to the size of the equivalents table:
734
735\startlinecorrection
736    \forgetall
737    \uunit=.5em
738    \offinterlineskip
739    \dontleavehmode
740    \startframed[mymemory]
741        \dontleavehmode
742        \myslot[width=16uu,frame=off,foregroundcolor=maincolor]{hash table}\vkern1uu
743        \dontleavehmode
744        \myslot[width=16uu]{hash entries}\vkern1uu
745        \dontleavehmode
746        \myslot[width=16uu,frame=off]{}\vkern1uu
747        \dontleavehmode
748        \myslot[width=16uu,frame=off]{}
749    \stopframed
750    \hskip1uu
751    \startframed[mymemory]
752        \dontleavehmode
753        \myslot[width=16uu,frame=off,foregroundcolor=maincolor]{equivalents}\vkern1uu
754        \dontleavehmode
755        \myslot[width=16uu]{hash data}\vkern1uu
756        \dontleavehmode
757        \myslot[width=16uu]{other data}\vkern1uu
758        \dontleavehmode
759        \myslot[width=16uu]{}
760    \stopframed
761    \hskip4uu
762    \startframed[mymemory]
763        \dontleavehmode
764        \myslot[width=16uu,frame=off,foregroundcolor=maincolor]{hash table}\vkern1uu
765        \dontleavehmode
766        \myslot[width=16uu]{hash entries}\vkern1uu
767        \dontleavehmode
768        \myslot[width=16uu,frame=off]{}\vkern1uu
769        \dontleavehmode
770        \myslot[width=16uu]{extra entries}
771    \stopframed
772    \hskip1uu
773    \startframed[mymemory]
774        \dontleavehmode
775        \myslot[width=16uu,frame=off,foregroundcolor=maincolor]{equivalents}\vkern1uu
776        \dontleavehmode
777        \myslot[width=16uu]{hash data}\vkern1uu
778        \dontleavehmode
779        \myslot[width=16uu]{other data}\vkern1uu
780        \dontleavehmode
781        \myslot[width=16uu]{extra data}
782    \stopframed
783\stoplinecorrection
784
785Too much extra hash space also means too much equivalent space as these arrays
786run in parallel. In \LUAMETATEX\ we can let hash memory grow on demand so there
787the penalty is less.
788
789{\em It makes sense to move the \quote {other data} to the beginning so that we
can use a smaller hash. That could potentially save 4MB memory, but when we
791decide to limit the maximum number of registers to 8K (instead of 64K) we are at
792512KB so that might be easier as it avoids using offsets. And who knows how we
793can use the yet unused space later. Compared to \LUATEX\ we already save much
794memory elsewhere.}
795
796\stopsection
797
798\startsection[title=Save stack]
799
I only mention this here because it relates to the table of equivalents. Whenever
a quantity (register, parameter, macro, you name it) changes, the engine registers
the old value on the save stack when the assignment is local. The equivalent is
replaced and, when found in the save stack, restored afterwards. In order not to
let the save stack grow too much we try to only save a state when there is a real
change. We can do that because we have a bit more information available and
otherwise do a bit more testing. This is specific for \LUAMETATEX.
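
The idea can be sketched as follows. This is one plausible reading of the
description and not the engine's code: the names \type {entry}, \type
{assign_local}, \type {group_begin} and \type {group_end} are hypothetical, a
boundary is marked with a negative index, and overflow checks are left out.

\starttyping[option=CPP]
#include <stdint.h>

typedef struct entry { uint16_t level; int32_t value; } entry;
typedef struct saved { int index; entry old; } saved;

static entry eqtb[8];          /* a tiny table of equivalents              */
static saved save_stack[64];   /* the sketch does not check for overflow   */
static int   save_top  = 0;
static int   cur_level = 0;

static void group_begin(void)
{
    save_stack[save_top++] = (saved) { .index = -1 };  /* boundary marker */
    cur_level++;
}

static void assign_local(int index, int32_t value)
{
    if (eqtb[index].value == value) {
        return;  /* no real change so there is nothing to save */
    }
    if (eqtb[index].level != cur_level) {
        /* first change at this level: remember the outer value */
        save_stack[save_top++] = (saved) { .index = index, .old = eqtb[index] };
    }
    eqtb[index].level = (uint16_t) cur_level;
    eqtb[index].value = value;
}

static void group_end(void)
{
    while (save_top > 0) {
        saved s = save_stack[--save_top];
        if (s.index < 0) {
            break;              /* we reached the boundary of this group */
        }
        eqtb[s.index] = s.old;  /* restore the value and its level */
    }
    cur_level--;
}
\stoptyping

Assigning the value that is already there leaves the stack untouched, and a
second change inside the same group doesn't add a second entry.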
807
808\stopsection
809
810\startsection[title=Data types]
811
The long-winded explanation in the previous section shows that we
813have a curious mix of data to manage. We already saw tokens and nodes but here we
814also saw registers. However, integers, dimensions and attributes are all
815basically just 32 bit numbers. Even a posit (float) fits into that space. So if
816you enter \type {10pt} internally it becomes a so called scaled (dimension). The
817skip registers point to a glue node and the token and box registers to a node
818list and those pointers are also numbers. So, what the user sees as a data type
819internally is just a number and its type (the command field in a token) tells
820what to do with it.
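
To make the \type {10pt} example concrete: a scaled value counts units of
$1/65536$ point, so ten points end up as the integer 655360. The helper below is
only a sketch; its name and its simple rounding of the fraction are assumptions
and the engine's scanner treats decimal fractions in its own way.

\starttyping[option=CPP]
#include <stdint.h>

typedef int32_t scaled;

# define SCALED_UNITY 65536  /* 2^16 scaled points make one point */

/* Convert points to scaled, with the fraction given in thousandths (0 .. 999). */
static scaled points_to_scaled(int32_t whole, int32_t thousandths)
{
    return whole * SCALED_UNITY + (thousandths * SCALED_UNITY + 500) / 1000;
}
\stoptyping

So \type {points_to_scaled(10,0)} gives 655360, a number that fits in the same
32 bits as an integer register or a pointer.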
821
822When tracing is turned on there can be mentioning of save stack, input levels,
823fonts, languages, hyphenation, various character related properties and so on.
824Here we have specialized data structures that have their own memory layout and
825management. Where terms like token, node, integer (count), dimension and glue
826indicate something that the user should grasp, the entries in a save stack are
never presented other than in a message.
828
829Manipulating data types is explained in various low level manuals, some relate to
830programming, and some to typesetting. It makes no sense to repeat that here. Take
for instance macros: they come in variants (think of \typ {\protected} and|/|or
\type {\tolerant} ones) and can take arguments (which effectively are token lists),
and the flags in the mentioned table of equivalents take care of that.
834
835One aspect of token lists is worth mentioning: they start with a so called head
836token. So a list of length one actually has two tokens. The head keeps track of
837the fact that a list is a copy. Because a macro is also a token list, in \LUAMETATEX\
838the head also has some information that permits a more efficient code path. Because
839token lists are used all over the place in the engine, sharing makes sense.
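
A minimal sketch of such a shared list is given here. It assumes that the head
stores a reference count, as in traditional \TEX; the index based pool and all
names (\type {token_pool}, \type {token_list_new}, \type {token_list_share} and
friends) are hypothetical and recycling of released slots is left out.

\starttyping[option=CPP]
# define POOL_SIZE 64

typedef struct token {
    int info;  /* in the head: a reference count, elsewhere: the token itself */
    int link;  /* index of the next token, 0 at the end of the list           */
} token;

static token token_pool[POOL_SIZE];  /* slot 0 stays unused so that 0 can mean "none" */
static int   token_used = 1;

static int token_new(int info)
{
    if (token_used >= POOL_SIZE) {
        return 0;
    }
    token_pool[token_used].info = info;
    token_pool[token_used].link = 0;
    return token_used++;
}

/* Build a list: the head comes on top of the n tokens and starts with one reference. */
static int token_list_new(const int *values, int n)
{
    int head = token_new(1);
    int tail = head;
    for (int i = 0; head && i < n; i++) {
        int t = token_new(values[i]);
        if (! t) {
            break;
        }
        token_pool[tail].link = t;
        tail = t;
    }
    return head;
}

/* Sharing a list is just bumping the count in the head. */
static int token_list_share(int head)
{
    token_pool[head].info++;
    return head;
}

/* Returns 1 when the last reference is gone and the slots could be recycled. */
static int token_list_release(int head)
{
    return --token_pool[head].info == 0;
}

/* Counts the slots that a list occupies, head included. */
static int token_list_slots(int head)
{
    int n = 0;
    for (int p = head; p; p = token_pool[p].link) {
        n++;
    }
    return n;
}
\stoptyping

A list with one token indeed occupies two slots here, and a copy costs no more
than an increment of the counter in the head.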
840
841Attributes attached to a node are node lists themselves and these are also shared
which not only saves memory but also performs better. There are many places
where \LUAMETATEX\ differs from its predecessors: there are more primitives and
there is more data moved around, but that got compensated by optimized mechanisms.
845But as much as possible we stayed within the same paradigms.
846
847\stopsection
848
849\startsection[title=Time flies]
850
851For those curious about how different the engines are when it comes to memory usage,
852here is a quote from \TEX\ the program:
853
854\startnarrower
855
856Since we are assuming 32-bit integers, a halfword must contain at least 16 bits,
857and a quarterword must contain at least 8 bits. But it doesn't hurt to have more
858bits; for example, with enough 36-bit words you might be able to have \type {mem_max}
859as large as 262142, which is eight times as much memory as anybody had during the
860first four years of \TEX's existence.
861
862N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled by
863a \PASCAL\ that packs all of the \typ {memory_word} variants into the space of a
864single integer. This means, for example, that \typ {glue_ratio} words should be
865\typ {short_real} instead of \type {real} on some computers. Some \PASCAL\
866compilers will pack an integer whose subrange is \typ {0 .. 255} into an
867eight-bit field, but others insist on allocating space for an additional sign
868bit; on such systems you can get 256 values into a quarterword only if the
subrange is \typ {-128 .. 127}.
870
871The present implementation tries to accommodate as many variations as possible,
872so it makes few assumptions. If integers having the subrange \typ
873{min_quarterword .. max_quarterword} can be packed into a quarterword, and if
874integers having the subrange \typ {min_halfword .. max_halfword} can be packed
875into a halfword, everything should work satisfactorily.
876
877It is usually most efficient to have \typ {min_quarterword = min_halfword = 0},
878so one should try to achieve this unless it causes a severe problem. The values
879defined here are recommended for most 32-bit computers.
880
881\stopnarrower
882
This still applies to \PDFTEX\ although there a memory word is two 32 bit
integers, so each halfword in there spans 32 bits, and a quarterword 16 bits. So
885what does that mean for nodes? Here is what the original code says about char
886nodes.
887
888\startnarrower
889
890A \typ {char_node}, which represents a single character, is the most important
891kind of node because it accounts for the vast majority of all boxes. Special
892precautions are therefore taken to ensure that a \typ {char_node} does not take
893up much memory space. Every such node is one word long, and in fact it is
894identifiable by this property, since other kinds of nodes have at least two
895words, and they appear in \typ {mem} locations less than \typ {hi_mem_min}. This
896makes it possible to omit the \typ {type} field in a \typ {char_node}, leaving us
897room for two bytes that identify a \typ {font} and a \typ {character} within that
898font.
899
900Note that the format of a \typ {char_node} allows for up to 256 different fonts
901and up to 256 characters per font; but most implementations will probably limit
902the total number of fonts to fewer than 75 per job, and most fonts will stick to
903characters whose codes are less than 128 (since higher codes are more difficult
904to access on most keyboards).
905
906\stopnarrower
907
908So, in order to save space these single size nodes use little memory. Even more
909interesting is the follow up on that explanation:
910
911\startnarrower
912
913Extensions of \TEX\ intended for oriental languages will need even more than $256
914\times 256$ possible characters, when we consider different sizes and styles of
915type. It is suggested that Chinese and Japanese fonts be handled by representing
916such characters in two consecutive \typ {char_node} entries: The first of these
917has \typ {font = font_base}, and its \typ {link} points to the second; the second
918identifies the font and the character dimensions. The saving feature about
919oriental characters is that most of them have the same box dimensions. The \typ
920{character} field of the first \typ {char_node} is a \typ {charext} that
921distinguishes between graphic symbols whose dimensions are identical for
922typesetting purposes. (See the \METAFONT\ manual.) Such an extension of \TEX\ would not
923be difficult; further details are left to the reader.
924
925In order to make sure that the \typ {character} code fits in a quarterword, \TEX\
926adds the quantity \typ {min_quarterword} to the actual code.
927
928\stopnarrower
929
930What if that had been implemented right from the start? What if \UTF8\ had been
around at that time? Of course when 32 bit integers are used we can use these
extra bits for a larger code range anyway.
933
934When we flash forward to \LUATEX\ we don't see that optimization and there are
935reasons for it. First of all content related nodes have an attribute list pointer
936as well as a \type {prev} field; lists are double linked. That means we don't reuse
937the \type {type} and \type {subtype} fields. The macros that define a glyph are:
938
939\starttyping[option=CPP]
# define glyph_node_size       7
# define character(a)          vinfo((a)+2)
# define font(a)               vlink((a)+2)
# define lang_data(a)          vinfo((a)+3)
# define lig_ptr(a)            vlink((a)+3)
# define x_displace(a)         vinfo((a)+4)
# define y_displace(a)         vlink((a)+4)
# define ex_glyph(a)           vinfo((a)+5)  /* expansion factor (hz) */
# define glyph_node_data(a)    vlink((a)+5)
# define synctex_tag_glyph(a)  vinfo((a)+6)
# define synctex_line_glyph(a) vlink((a)+6)
951\stoptyping
952
Instead of one memory word we use seven, and given the number of characters
on a page that adds quite a bit compared to the original. Of course it is
irrelevant on today's machines. So how about \LUAMETATEX\ as of late 2024?
956
957\starttyping[option=CPP]
# define glyph_node_size     14
# define glyph_character(a)  vinfo(a,2)
# define glyph_font(a)       vlink(a,2)   /*tex can be quarterword */
# define glyph_data(a)       vinfo(a,3)   /*tex handy in context */
# define glyph_state(a)      vlink(a,3)   /*tex handy in context */
# define glyph_language(a)   vinfo0(a,4)
# define glyph_script(a)     vinfo1(a,4)
# define glyph_control(a)    vlink0(a,4)  /*tex we store 0xXXXX in the |\cccode| */
# define glyph_reserved(a)   vlink1(a,4)
# define glyph_options(a)    vinfo(a,5)
# define glyph_hyphenate(a)  vlink(a,5)
# define glyph_protected(a)  vinfo00(a,6)
# define glyph_lhmin(a)      vinfo01(a,6)
# define glyph_rhmin(a)      vinfo02(a,6)
# define glyph_discpart(a)   vinfo03(a,6)
# define glyph_expansion(a)  vlink(a,6)
# define glyph_x_scale(a)    vinfo(a,7)
# define glyph_y_scale(a)    vlink(a,7)
# define glyph_scale(a)      vinfo(a,8)
# define glyph_raise(a)      vlink(a,8)
# define glyph_left(a)       vinfo(a,9)
# define glyph_right(a)      vlink(a,9)
# define glyph_x_offset(a)   vinfo(a,10)
# define glyph_y_offset(a)   vlink(a,10)
# define glyph_weight(a)     vinfo(a,11)
# define glyph_slant(a)      vlink(a,11)
# define glyph_properties(a) vinfo0(a,12)  /*tex for math */
# define glyph_group(a)      vinfo1(a,12)  /*tex for math */
# define glyph_index(a)      vlink(a,12)   /*tex for math */
# define glyph_input_file(a) vinfo(a,13)
# define glyph_input_line(a) vlink(a,13)
989\stoptyping
990
We carry scales, offsets, status information and various data around and consume
992twice what \LUATEX\ needs. In both cases there are the common fields:
993
994\starttyping[option=CPP]
# define node_type(a)    vinfo0(a,0)
# define node_subtype(a) vinfo1(a,0)
# define node_next(a)    vlink(a,0)
# define node_prev(a)    vlink(a,1)
# define node_attr(a)    vinfo(a,1)
1000\stoptyping
1001
1002As you see, we still use the original \TEX\ \type {vinfo} and \type {vlink}
1003identifications but in \LUAMETATEX\ we have node specific verbose accessors
1004because we no longer use the same slots for (for instance) width, height and
1005depth. This of course has impact on the code base because now \typ {width(n)}
1006becomes a different accessor per node it applies to. We get less compact code
but gain readability and we often need to distinguish anyway. Where in \LUATEX\
and its predecessors we see:
1009
1010\starttyping[option=CPP]
w += width(n)
1012\stoptyping
1013
1014that covers boxes, glue and kerns. For glyphs we need to get the width from the
1015font using the \type {font} and \type {char} fields. Actually, in \TEX82\ that
1016can be done directly because we know that these values are okay. In \LUATEX\
1017however these values can be set in \LUA\ and therefore we do need to check if
1018they reference a loaded font and valid character slot. So in \LUATEX\ we do need
1019a dedicated function to get the glyph width.
1020
1021In \LUAMETATEX\ we have to be more granular and deal with each node type that has
1022width independently:
1023
1024\starttyping[option=CPP]
switch (node_type(n)) {
    case glyph_node:
        w += tex_glyph_width(n);
        break;
    case hlist_node:
    case vlist_node:
        w += box_width(n);
        break;
    case rule_node:
        w += rule_width(n);
        break;
    case glue_node:
        w += glue_amount(n);
        break;
    case kern_node:
        w += kern_amount(n);
        break;
    case math_node:
        /* a math node carries either a kern (surround) or glue */
        if (tex_math_glue_is_zero(n)) {
            w += math_surround(n);
        } else {
            w += math_amount(n);
        }
        break;
}
1051\stoptyping
1052
Because a glyph can have scales set and similar features exist for glue, we need to
distinguish anyway. Watch the math node: we have to deal with either kern or
glue.
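
The \type {vinfo} and \type {vlink} accessors used in the definitions above can
be pictured as views on an array of 64 bit memory words. The following is a
sketch under that assumption and not the actual definition in the engine: the
union \type {memory_word}, the array \type {node_memory} and the helper \type
{node_link} are made up, only the node and glyph accessors are taken from the
listings.

\starttyping[option=CPP]
#include <stdint.h>

typedef union memory_word {
    int32_t  half[2];   /* two halfwords: info and link            */
    uint16_t quart[4];  /* two quarterwords in each halfword       */
    uint8_t  byte[8];   /* four single bytes in each halfword      */
} memory_word;

static memory_word node_memory[64];

/* A node is an index a into the array and b is the offset of a word in that node. */
# define vinfo(a,b)    node_memory[(a)+(b)].half[0]
# define vlink(a,b)    node_memory[(a)+(b)].half[1]
# define vinfo0(a,b)   node_memory[(a)+(b)].quart[0]
# define vinfo1(a,b)   node_memory[(a)+(b)].quart[1]
# define vlink0(a,b)   node_memory[(a)+(b)].quart[2]
# define vlink1(a,b)   node_memory[(a)+(b)].quart[3]
# define vinfo00(a,b)  node_memory[(a)+(b)].byte[0]
# define vinfo01(a,b)  node_memory[(a)+(b)].byte[1]
# define vinfo02(a,b)  node_memory[(a)+(b)].byte[2]
# define vinfo03(a,b)  node_memory[(a)+(b)].byte[3]

/* The common fields and a few glyph fields, as in the listings above. */
# define node_type(a)       vinfo0(a,0)
# define node_subtype(a)    vinfo1(a,0)
# define node_next(a)       vlink(a,0)
# define node_prev(a)       vlink(a,1)
# define node_attr(a)       vinfo(a,1)
# define glyph_character(a) vinfo(a,2)
# define glyph_font(a)      vlink(a,2)

/* Lists are double linked, so coupling two nodes sets two fields. */
static void node_link(int a, int b)
{
    node_next(a) = b;
    node_prev(b) = a;
}
\stoptyping

Because type and subtype share one halfword while the next pointer has the other
one, setting any of them leaves the others untouched.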
1056
1057\stopsection
1058
1059\startsection[title=Keywords]
1060
1061The \ETEX\ extension added primitives, \PDFTEX\ did the same, as did \OMEGA\ and
1062therefore also \LUATEX, which took from its ancestors and added more. The
1063\LUAMETATEX\ engine again extends the repertoire. However, in order to control
1064some primitive (functional) behavior instead of using extra primitive parameters,
1065we use keywords. For instance \type {\hbox} accepts multiple \type {attr}, \typ
{direction} (\LUATEX) but also \type {xoffset}, \type {yoffset}, \typ
{orientation} and more. This has no impact on compatibility because scanning
keywords stops at the left brace (or its equivalent). The \type {\hrule} like
primitives also accept more keywords but here scanning stops at an unknown
keyword, which can give interesting side effects when it's last in a macro that is
followed by text that itself starts with a valid keyword (say \type {height}) but
not by a dimension.
1073
1074\startlinenumbering
1075\starttyping
\def\foo{\hrule width 10pt} \foo height or depth, what about it.
\def\foo{\hrule width 10pt\relax} \foo height or depth, what about it.
\def\foo{\hrule width 10pt} \foo what about it.
\hbox to 20pt{x}
\hbox attr 999 1 to 20pt{x}
\hbox to 20pt attr 999 1 {x}
1082\stoptyping
1083\stoplinenumbering
1084
The first line gives an error, the second uses \type {\relax} to end the
scanning. The last line is wrong in \LUATEX\ where order matters while it's okay
in \LUAMETATEX. The third line is okay in \LUATEX\ where the \type {what} is
pushed back but wrong in \LUAMETATEX\ where it expects \type {w} to start a valid
keyword. That one is actually an incompatibility but one should keep in mind that
using \type {\relax} is the way to go here anyway. The same is true for scanning
glue specifications.
1092
The fact that \type {what} gets pushed back (in \LUATEX) into the input adds extra
1094overhead. But in this case it's little. However, think of this in \LUATEX:
1095
1096\starttyping
if (scan_keyword("width")) {
    scan_normal_dimen();
    width(q) = cur_val;
    goto RESWITCH;
}
if (scan_keyword("height")) {
    scan_normal_dimen();
    height(q) = cur_val;
    goto RESWITCH;
}
if (scan_keyword("depth")) {
    scan_normal_dimen();
    depth(q) = cur_val;
    goto RESWITCH;
}
1112\stoptyping
1113
1114Here we push back two times when we only specify the \type {depth}. This is still
1115not that bad but imagine many more keywords. This is why in \LUAMETATEX\ we
1116cascade: we check for the first character and act on that and if needed do the
same with later characters (box specifications take \type {adapt}, \type {attr},
\type {anchor} and \type {axis} so here a second character differentiates). In par
1119passes we have \typ {adjustspacingstep}, \typ {adjustspacingshrink}, \typ
1120{adjustspacingstretch} so there is no need to push back the \typ {adjustspacings}
1121and if you look carefully \type {tep} and \type {tretch} also cascade. Of course
1122the code looks a bit more messy but we do gain here due to less push back and
1123therefore input level bumping. In some cases we also need less further tracing
1124because we already know what is coming. Of course given \TEX's already good
1125scanning performance it all depends on usage what we gain in practice.
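
The cascade for the four box keywords mentioned above could look like the
following sketch. It works on a plain string instead of on tokens, ignores that
keywords can be given in uppercase, and uses hypothetical names (\type
{box_keyword}, \type {scan_tail} and \type {scan_box_keyword}), so it only
shows the principle.

\starttyping[option=CPP]
#include <string.h>

typedef enum box_keyword {
    no_keyword, adapt_keyword, attr_keyword, anchor_keyword, axis_keyword
} box_keyword;

/* Check if the input continues with the given tail and move forward when it does. */
static int scan_tail(const char **input, const char *tail)
{
    size_t n = strlen(tail);
    if (strncmp(*input, tail, n) == 0) {
        *input += n;
        return 1;
    }
    return 0;
}

/* The first character selects the group, the second one the candidate. The */
/* input position only moves when a complete keyword has been seen.         */
static box_keyword scan_box_keyword(const char **input)
{
    const char *p = *input;
    box_keyword k = no_keyword;
    if (*p++ == 'a') {
        switch (*p++) {
            case 'd': if (scan_tail(&p, "apt"))  { k = adapt_keyword;  } break;
            case 't': if (scan_tail(&p, "tr"))   { k = attr_keyword;   } break;
            case 'n': if (scan_tail(&p, "chor")) { k = anchor_keyword; } break;
            case 'x': if (scan_tail(&p, "is"))   { k = axis_keyword;   } break;
        }
    }
    if (k != no_keyword) {
        *input = p;
    }
    return k;
}
\stoptyping

Each candidate is tried at most once, whereas a sequence of \type {scan_keyword}
calls would start from scratch (and push back) for every keyword that doesn't
match.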
1126
1127\stopsection
1128
1129\startsection[title=Sparse arrays]
1130
1131Because original \TEX\ supports 256 characters it can use data structures and
1132ranges in the main equivalent repertoire without too much overhead but with
1133\LUATEX\ we went \UNICODE\ so dedicated sparse arrays were used instead for \prm
1134{catcode}, \prm {lccode}, \prm {uccode} and \prm {sfcode}. The new \prm {hjcode},
1135math characters, delimiters and font character arrays also use this mechanism and
in \LUAMETATEX\ we use them even more. Although in principle we can use the
regular save stack for pushing and popping values, each sparse array comes with
1138its own stack.
1139
1140In \LUAMETATEX\ this mechanism has been optimized. Depending on the kind of data
1141we use nibbles, bytes, shorts, integers or integer pairs. There is also more
1142aggressive optimization of storing the set values in the format file. Stack
1143management is more efficient too, which mostly has benefits for math where we use
1144sparse arrays for math parameters of which we have plenty.
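
To give an idea of what a sparse array is, here is a minimal two level sketch
with pages that are allocated on demand. It is not how the engine does it: the
names (\type {sparse_array}, \type {sparse_get}, \type {sparse_set}), the page
size and the use of plain integers are assumptions, and the per array stack as
well as the compact nibble, byte and short variants are left out.

\starttyping[option=CPP]
#include <stdlib.h>

# define SPARSE_MAX  0x110000                    /* one past the last code point */
# define PAGE_BITS   8
# define PAGE_SIZE   (1 << PAGE_BITS)
# define PAGE_COUNT  (SPARSE_MAX >> PAGE_BITS)

typedef struct sparse_array {
    int  fallback;            /* the value of slots that were never set */
    int *pages[PAGE_COUNT];   /* allocated on demand                    */
} sparse_array;

static int sparse_get(const sparse_array *a, int index)
{
    if (index >= 0 && index < SPARSE_MAX) {
        const int *page = a->pages[index >> PAGE_BITS];
        if (page) {
            return page[index & (PAGE_SIZE - 1)];
        }
    }
    return a->fallback;
}

/* Returns 1 on success and 0 when the index is out of range or memory ran out. */
static int sparse_set(sparse_array *a, int index, int value)
{
    int **page;
    if (index < 0 || index >= SPARSE_MAX) {
        return 0;
    }
    page = &a->pages[index >> PAGE_BITS];
    if (! *page) {
        *page = malloc(PAGE_SIZE * sizeof(int));
        if (! *page) {
            return 0;
        }
        for (int i = 0; i < PAGE_SIZE; i++) {
            (*page)[i] = a->fallback;
        }
    }
    (*page)[index & (PAGE_SIZE - 1)] = value;
    return 1;
}
\stoptyping

Only pages that contain a value that was actually set take memory, which is why
a full \UNICODE\ range of for instance category codes stays affordable.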
1145
1146The sparse array mechanism is also interfaced to \LUA, and we might actually use
1147that feature in \CONTEXT\ some day.
1148
1149\stopsection
1150
1151\stopdocument
1152