% language=us runpath=texruns:manuals/luametatex

\environment luametatex-style

\startcomponent luametatex-internals

\startchapter[reference=internals,title={The internals}]

\topicindex{nodes}
\topicindex{boxes}
\topicindex{\LUA}

This is a reference manual and not a tutorial. This means that we discuss changes
relative to traditional \TEX\ and also present new (or extended) functionality.
As a consequence we will refer to concepts that we assume to be known or that
might be explained later. Because the \LUATEX\ and \LUAMETATEX\ engines open up
\TEX, there is suddenly quite a bit more to explain, especially about the way a
(to be) typeset stream moves through the machinery. However, discussing all that
in detail does not make much sense, because deep knowledge is only relevant for
those who write code that is not possible with regular \TEX\ and who are already
familiar with these internals (or willing to spend time on figuring them out).

So, the average user doesn't need to know much about what is in this manual. For
instance, fonts and languages are normally dealt with in the macro package that
you use. Messing around with node lists is also often not really needed at the
user level. If you do mess around, you'd better know what you're dealing with.
In that case reading \quotation {The \TEX\ Book} by Donald Knuth is a good
investment of time, also because it's good to know where it all started. A more
condensed overview is given by \quotation {\TEX\ by Topic} by Victor Eijkhout.
You might want to peek into \quotation {The \ETEX\ manual} too.

But \unknown\ if you're here because of \LUA, then all you need to know is that
you can call it from within a run. If you want to learn the language, just read
the well-written \LUA\ book. The macro package that you use will probably provide
a few wrapper mechanisms, but the basic \prm {directlua} command that does the job
is:

\starttyping
\directlua{tex.print("Hi there")}
\stoptyping

You can put code between curly braces, but if it's a lot you can also put it in a
file and load that file with the usual \LUA\ commands. If you don't know what
this means, you definitely need to have a look at the \LUA\ book first.
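 
As an illustration of the second option, the next lines are a minimal sketch. The
file name \type {mycode.lua} is hypothetical and we assume that it sits in the
current directory; \type {dofile} is the standard \LUA\ function that loads and
runs a file.

\starttyping
% mycode.lua is a hypothetical file in the current directory that
% contains for instance: tex.print("Hi there")
\directlua{dofile("mycode.lua")}
\stoptyping

A macro package might offer its own (more robust) way to locate and load such
files, so check its documentation before you rely on this.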
 
If you still decide to read on, then it's good to know what nodes are, so we do a
quick introduction here. If you input this text:

\starttyping
Hi There ...
\stoptyping

eventually we will get a linked list of nodes, which in \ASCII\ art looks like:

\starttyping
H <=> i <=> [glue] <=> T <=> h <=> e <=> r <=> e ...
\stoptyping

When we have a paragraph, we actually get something like this, where a \type
{par} node stores some metadata and is followed by a \type {hlist} flagged as
indent box:

\starttyping
[par] <=> [hlist] <=> H <=> i <=> [glue] <=> T <=> h <=> e <=> r <=> e ...
\stoptyping

Each character becomes a so called glyph node, a record with properties like the
current font, the character code and the current language. Spaces become glue
nodes. There are many node types and nodes can have many properties but that will
be discussed later. Each node points to the previous and the next node, provided
that these exist. Sometimes multiple characters are represented by one glyph
(shape), so one can also get:

\starttyping
[par] <=> [hlist] <=> H <=> i <=> [glue] <=> Th <=> e <=> r <=> e ...
\stoptyping

And maybe some characters get positioned relative to each other, so we might
see:

\starttyping
[par] <=> [hlist] <=> H <=> [kern] <=> i <=> [glue] <=> Th <=> e <=> r <=> e ...
\stoptyping

Actually, the above representation is one view, because in \LUAMETATEX\ we can
also opt for this:

\starttyping
[par] <=> [glue] <=> H <=> [kern] <=> i <=> [glue] <=> Th <=> e <=> r <=> e ...
\stoptyping

where glue (currently fixed) is used instead of an empty hlist (think of a \type
{\hbox}). Options like this are available because we want a certain view on these
lists from the \LUA\ end, and a predictable result is part of that.
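 
To get a feeling for such a list you can walk over it from the \LUA\ end. The
following is a minimal sketch, not a recipe: it assumes a \CONTEXT\ run (for
\type {\scratchbox}) and uses \type {tex.getbox}, \type {node.traverse} and \type
{node.type}, which are discussed in later chapters. Because \prm {directlua}
expands its argument, \type {\number\scratchbox} turns into the box number before
\LUA\ sees the code, and the percent lines are \TEX\ comments that never reach
\LUA.

\starttyping
\setbox\scratchbox\hbox{Hi There}

\directlua {
    % fetch the hlist node that represents the box
    local box = tex.getbox(\number\scratchbox)
    % walk over the nodes in the list and typeset the name of each type
    for n in node.traverse(box.list) do
        tex.print(node.type(n.id))
    end
}
\stoptyping

One would expect names like \type {glyph} and \type {glue} to show up, but what
exactly ends up in the list depends on the font and on what the macro package
does to the box.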
 
It's also good to know beforehand that \TEX\ is basically centered around
creating paragraphs and pages. The par builder takes a list and breaks it into
lines. At some point horizontal blobs are wrapped into vertical ones. Lines are
so called boxes and can be separated by glue, penalties and more. The page
builder accumulates lines and when feasible triggers an output routine that will
take the list so far. Constructing the actual page is not part of \TEX\ but done
using primitives that permit manipulation of boxes. The result is handed back to
\TEX\ and flushed to a file (often \PDF).

\starttyping
\setbox\scratchbox\vbox\bgroup
    line 1\par line 2
\egroup

\showbox\scratchbox
\stoptyping

The above code produces the next log lines that reveal how the engine sees a
paragraph (wrapped in a \type {\vbox}):

\starttyping[style=small]
1:4: > \box257=
1:4: \vbox[normal][16=1,17=1,47=1], width 483.69687, height 27.58083, depth 0.1416, direction l2r
1:4: .\list
1:4: ..\hbox[line][16=1,17=1,47=1], width 483.69687, height 7.59766, depth 0.1416, glue 455.40097fil, direction l2r
1:4: ...\list
1:4: ....\glue[left hang][16=1,17=1,47=1] 0.0pt
1:4: ....\glue[left][16=1,17=1,47=1] 0.0pt
1:4: ....\glue[parfillleft][16=1,17=1,47=1] 0.0pt
1:4: ....\par[newgraf][16=1,17=1,47=1], hangafter 1, hsize 483.69687, pretolerance 100, tolerance 3000, adjdemerits 10000, linepenalty 10, doublehyphendemerits 10000, finalhyphendemerits 5000, clubpenalty 2000, widowpenalty 2000, brokenpenalty 100, emergencystretch 12.0, parfillskip 0.0pt plus 1.0fil, hyphenationmode 499519
1:4: ....\glue[indent][16=1,17=1,47=1] 0.0pt
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006C l
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000069 i
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006E n
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000065 e
1:4: ....\glue[space][16=1,17=1,47=1] 3.17871pt plus 1.58936pt minus 1.05957pt, font 30
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000031 1
1:4: ....\penalty[line][16=1,17=1,47=1] 10000
1:4: ....\glue[parfill][16=1,17=1,47=1] 0.0pt plus 1.0fil
1:4: ....\glue[right][16=1,17=1,47=1] 0.0pt
1:4: ....\glue[right hang][16=1,17=1,47=1] 0.0pt
1:4: ..\glue[par][16=1,17=1,47=1] 5.44995pt plus 1.81665pt minus 1.81665pt
1:4: ..\glue[baseline][16=1,17=1,47=1] 6.79396pt
1:4: ..\hbox[line][16=1,17=1,47=1], width 483.69687, height 7.59766, depth 0.1416, glue 455.40097fil, direction l2r
1:4: ...\list
1:4: ....\glue[left hang][16=1,17=1,47=1] 0.0pt
1:4: ....\glue[left][16=1,17=1,47=1] 0.0pt
1:4: ....\glue[parfillleft][16=1,17=1,47=1] 0.0pt
1:4: ....\par[newgraf][16=1,17=1,47=1], hangafter 1, hsize 483.69687, pretolerance 100, tolerance 3000, adjdemerits 10000, linepenalty 10, doublehyphendemerits 10000, finalhyphendemerits 5000, clubpenalty 2000, widowpenalty 2000, brokenpenalty 100, emergencystretch 12.0, parfillskip 0.0pt plus 1.0fil, hyphenationmode 499519
1:4: ....\glue[indent][16=1,17=1,47=1] 0.0pt
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006C l
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000069 i
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006E n
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000065 e
1:4: ....\glue[space][16=1,17=1,47=1] 3.17871pt plus 1.58936pt minus 1.05957pt, font 30
1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000032 2
1:4: ....\penalty[line][16=1,17=1,47=1] 10000
1:4: ....\glue[parfill][16=1,17=1,47=1] 0.0pt plus 1.0fil
1:4: ....\glue[right][16=1,17=1,47=1] 0.0pt
1:4: ....\glue[right hang][16=1,17=1,47=1] 0.0pt
\stoptyping

The \LUATEX\ engine provides hooks for \LUA\ code at nearly every reasonable
point in the process: collecting content, hyphenating, applying font features,
breaking into lines, etc. This means that you can overload \TEX's natural
behaviour, which still is the benchmark. When we refer to \quote {callbacks} we
mean these hooks. The \TEX\ engine itself is pretty well optimized but when you
kick in a lot of \LUA\ code, you will notice that performance drops. Don't blame
and bother the authors with performance issues. In \CONTEXT\ over 50\% of the
time can be spent in \LUA, but so far we didn't get many complaints about
efficiency. Adding more callbacks makes no sense, also because at some point the
performance hit gets too large. There are plenty of ways to achieve goals. For
that reason: take remarks about \LUATEX, features, potential, performance etc.\
with a natural grain of salt.
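 
To make the idea of a hook concrete, here is a minimal sketch that counts the
nodes in each list just before it is broken into lines. It assumes a bare engine
run where \type {callback.register} is available to the user and where the
callback is named \type {pre_linebreak_filter}; both are assumptions on our side.
Macro packages like \CONTEXT\ manage callbacks themselves, so there you should
use the mechanisms that the package provides. The percent lines are \TEX\
comments that are stripped before \LUA\ sees the code.

\starttyping
\directlua {
    % register a function that gets the head of the list that will be
    % broken into lines; returning the head keeps the list as it is
    callback.register("pre_linebreak_filter", function(head)
        local count = 0
        for n in node.traverse(head) do
            count = count + 1
        end
        % report on the console, nothing gets typeset
        print("nodes before linebreak: " .. count)
        return head
    end)
}
\stoptyping

Even a trivial function like this runs for every paragraph, which illustrates
why a lot of \LUA\ code in callbacks shows up in the runtime.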
 
Where plain \TEX\ is basically a framework for writing a specific style, macro
packages like \CONTEXT\ and \LATEX\ provide the user with a whole lot of
additional tools to make documents look good. They hide the dirty details of font
management, language support, turning structure into typeset results, wrapping
pages, including images, and so on. You should be aware of the fact that when you
hook in your own code to manipulate lists, this can interfere with the macro
package that you use. Each successive step expects a certain result and if you
mess around too much, the engine eventually might bark and quit. It can even
crash, because testing everywhere for what users can do wrong is no real option.

When you read about nodes in the following chapters it's good to keep in mind
what commands relate to them. Here are a few:

\starttabulate[|l|l|p|]
\DB command              \BC node         \BC explanation \NC \NR
\TB
\NC \prm {hbox}          \NC \nod {hlist} \NC horizontal box \NC \NR
\NC \prm {vbox}          \NC \nod {vlist} \NC vertical box with the baseline at the bottom \NC \NR
\NC \prm {vtop}          \NC \nod {vlist} \NC vertical box with the baseline at the top \NC \NR
\NC \prm {hskip}         \NC \nod {glue}  \NC horizontal skip with optional stretch and shrink \NC \NR
\NC \prm {vskip}         \NC \nod {glue}  \NC vertical skip with optional stretch and shrink \NC \NR
\NC \prm {kern}          \NC \nod {kern}  \NC horizontal or vertical fixed skip \NC \NR
\NC \prm {discretionary} \NC \nod {disc}  \NC hyphenation point (pre, post, replace) \NC \NR
\NC \prm {char}          \NC \nod {glyph} \NC a character \NC \NR
\NC \prm {hrule}         \NC \nod {rule}  \NC a horizontal rule \NC \NR
\NC \prm {vrule}         \NC \nod {rule}  \NC a vertical rule \NC \NR
\NC \prm {textdirection} \NC \nod {dir}   \NC a change in text direction \NC \NR
\LL
\stoptabulate
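 
A quick way to see this mapping at work is to pack a few of these commands in a
box and ask the engine to show the result. The next lines are a small sketch that
assumes a \CONTEXT\ run (for \type {\scratchbox}); the dimensions are arbitrary.

\starttyping
% two characters, a kern, a skip with stretch, and a rule in one hbox
\setbox\scratchbox\hbox{a\kern1pt b\hskip2pt plus 1pt\vrule width 1pt}

\showbox\scratchbox
\stoptyping

In the log one would expect an \type {hlist} that contains glyph, kern, glue and
rule nodes, reported in the same notation as the box shown earlier.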
 
Whatever we feed into \TEX\ at some point becomes a token which is either
interpreted directly or stored in a linked list. A token is just a number that
encodes a specific command (operator) and some value (operand) that further
specifies what that command is supposed to do. In addition to an interface to
nodes, there is an interface to tokens, as later chapters will demonstrate.

Text (interspersed with macros) comes from an input medium. This can be a file,
token list, macro body or arguments, some internal quantity (like a number),
\LUA, etc. Macros get expanded. In the process \TEX\ can enter a group. Inside
the group, the old values of changed registers get saved on a stack, and they
are restored after leaving the group. When conditionals are encountered, another
kind of nesting happens, and again there is a stack involved. Tokens, expansion,
stacks, input levels are all terms used in the next chapters. Don't worry, they
lose their magic once you use \TEX\ a lot. You have access to most of the
internals and when not, at least it is possible to query some state we're in or
level we're at.
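 
The grouping behaviour can be seen in a few lines. This is a minimal sketch that
assumes a \CONTEXT\ run, where \type {\scratchcounter} is one of the predefined
scratch registers; \type {\currentgrouplevel} is the primitive (inherited from
\ETEX) that reports how deep we are nested.

\starttyping
\scratchcounter=1

\begingroup
    \scratchcounter=2 % a local change, the old value is saved
    \the\scratchcounter\space at level \the\currentgrouplevel
\endgroup

\the\scratchcounter % the old value has been restored
\stoptyping

Inside the group the counter is 2 and after the group it is 1 again; the reported
level depends on how many groups are already open at that point.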
 
When we talk about pack(ag)ing it can mean two things. When \TEX\ has consumed
some tokens that represent text they are added to the current list. When the text
is put into a so called \type {\hbox} (for instance a line in a paragraph) it
(normally) first gets hyphenated, next ligatures are built, and finally kerns are
added. Each of these stages can be overloaded using \LUA\ code. When these three
stages are finished, the dimensions of the content are calculated and the box
gets its width, height and depth. What happens with the box depends on what
macros do with it.
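 
The outcome of packing can be inspected right away. As a small sketch, again
assuming a \CONTEXT\ run for \type {\scratchbox}, the next lines pack some text
and typeset the three dimensions that the box got; the actual values depend on
the font in use.

\starttyping
\setbox\scratchbox\hbox{Hi There}

width  \the\wd\scratchbox,
height \the\ht\scratchbox,
depth  \the\dp\scratchbox
\stoptyping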
 
The other thing that can happen is that the text starts a new paragraph. In that
case some information is stored in a leading \type {par} node. Then indentation
is appended and the paragraph ends with some glue. Again the three stages are
applied, but this time the long line is afterwards broken into lines and the
result is either added to the content of a box or to the main vertical list (the
running text so to say). This is called par building. At some point \TEX\ decides
that enough is enough and it will trigger the page builder. So, building is
another concept we will encounter. Another example of a builder is the one that
turns an intermediate math list into something typeset.

Wrapping something in a box is called packing. Adding something to a list is
described in terms of contributing. The more complicated processes are wrapped
into builders. For now this should be enough to enable you to understand the next
chapters. The text is not as enlightening and entertaining as Don Knuth's books,
sorry.

\stopchapter

\stopcomponent