% language=us runpath=texruns:manuals/ontarget

\startcomponent ontarget-constants

\environment ontarget-style

\usemodule[system-tokens]

\startchapter[title={Constants}]

Strings don't really fit into the concept of \TEX: everything we input and store
ends up as tokens and nodes, so when you define a macro like

\startbuffer
\def\foo{foo}
\stopbuffer

\typebuffer \getbuffer

you don't store a string but a token list with three tokens:

\luatokentable\foo

We have three single byte characters but end up with 32 bytes of memory used,
because a token list is a linked list that starts with an initial housekeeping
token, and each token has a value (operator & operand) as well as a pointer to
the next token. This is quite ok, because whenever we need that macro its body
has to be interpreted, and the fact that it is already tokenized is what makes
\TEX\ fly.
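
Here is the arithmetic, assuming the reported 32 bytes are spread evenly over
the housekeeping token and the three character tokens. Under that assumption,
each token takes eight bytes:

\startformula
(1 + 3) \times 8 = 32
\stopformula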

There are occasions where the expansion of a list that itself contains
references to macros produces a new list, in which case copies are made. Take
this:

\startbuffer
\def\oof{foo}
\def\foo{foo \oof}
\stopbuffer

\typebuffer \getbuffer

\luatokentable\foo

When \type {\foo} is expanded, its body is pushed onto the input stack and
traversed, and when \type {\oof} is seen, that one gets pushed and processed as
well. No copy is needed. Now take this:

\startbuffer
\def\oof{foo}
\edef\foo{foo \oof}
\stopbuffer

\typebuffer \getbuffer

\luatokentable\foo

Here \type {\foo} gets the expanded result, but again \type {\oof} got pushed
onto the stack, this time while the \type {\edef} was being processed. This
doesn't involve copying either, but there is still the overhead of pushing and
popping input. So when does copying occur? Here is an example:

\startbuffer
\def\oof{oof}
\def\ofo{ofo}
\def\foo{\begincsname \oof:\ofo\endcsname}
\stopbuffer

\typebuffer \getbuffer

\luatokentable\foo

When a csname is checked, the engine needs to construct a string in order to
access the hash table. Here is what happens:

\startitemize[packed]
\startitem
    everything up to the \type {\endcsname} is collected
\stopitem
\startitem
    in the process macros are expanded (with pushing and popping input) and the
    expanded tokens are appended to the result
\stopitem
\startitem
    when all went well, that list gets converted to a string
\stopitem
\startitem
    that string is used as key for a lookup in the hash table
\stopitem
\stopitemize

Normally all goes well, but when there is some unexpected unexpandable token (an
assignment, node generator, protected macro, etc.) the collection stops and the
list collected so far is recycled. This process is quite efficient, as is
everything in \TEX, but given that going from token list to string involves
some \UTF8 juggling too, there definitely is some overhead.
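
As an illustration, here is a lookup to which the steps listed above apply. The
macro names and their meanings are made up for this sketch. The comments only
paraphrase the listed steps and do not describe the actual engine code:

\starttyping
\def\namespace{demo:} % hypothetical namespace macro
\def\key{color}       % hypothetical key macro
% \namespace and \key are expanded (pushed and popped), their tokens are
% appended to the list, the collected tokens are turned into the string
% demo:color and that string is then looked up in the hash table
\begincsname\namespace\key\endcsname
\stoptyping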

In \CONTEXT\ we check for and use csnames quite a lot. The first line below is
the traditional way. It has the disadvantage that it creates a hash entry with
meaning \type {\relax} when there is no such name. That is why \ETEX\ came up
with the test in the second line, but there the name has to be constructed
twice. In \LUATEX\ we introduced \type {\lastnamedcs}, used in the third line, so
that we don't have to construct the token list again, which saves time. The
fourth line is similar to the first one but doesn't create a new command.

\starttyping
\csname      \namespace\key\endcsname                                  ...
\ifcsname    \namespace\key\endcsname \csname \namespace\key\endcsname ... \fi
\ifcsname    \namespace\key\endcsname \lastnamedcs                     ... \fi
\begincsname \namespace\key\endcsname
\stoptyping
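
A typical use of the third variant, with a fallback for when the name is
undefined, could look like the following. Again, the macro names and the
fallback value are hypothetical:

\starttyping
\def\namespace{demo:} % hypothetical namespace macro
\def\key{color}       % hypothetical key macro
\ifcsname\namespace\key\endcsname
  \lastnamedcs        % the command just found, no second lookup
\else
  black               % hypothetical fallback
\fi
\stoptyping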

One of the things all versions of \CONTEXT\ have had in common right from the
start is that we use this namespace model consistently. In \MKIV\ we changed the
subsystem that deals with this: it is more flexible and uses less memory, but it
also has way more overhead. Still, on average performance is about the same, so
users didn't notice.

There is however a trick to speed this up a bit. In the 360 page \LUAMETATEX\
manual we expand macros like \type {\namespace} and \type {\key} 4.3 million
times (as of the beginning of June 2023). Because Mikael Sundqvist and I are in
the middle of some math magic, we also checked his 300 page math book, which does
it 4.2 million times (there the gain was about 0.5 seconds). The upcoming math
manual has some 1.2 million. How come we have so many expansions? First of all,
we use abstraction when possible, which means that there is plenty of checking
of options, and some constructs fall back on parent classes (sometimes more than
two levels up the parent chain). Also, we often have three macros to expand:

\starttyping
\ifcsname\namespace\currentinstance\key\endcsname
\stoptyping
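
To make that concrete, such macros typically hold nothing more than a few
characters. The values below are hypothetical and only serve as an example:

\starttyping
\def\namespace{demo>}       % hypothetical
\def\currentinstance{first} % hypothetical
\def\key{color}             % hypothetical
% the name that gets checked is then: demo>firstcolor
\ifcsname\namespace\currentinstance\key\endcsname
\stoptyping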

But these macros have an important property: their body is basically a string.
Nothing in there needs expansion, and if something does, it is an indication of
rubbish that doesn't contribute to a valid csname anyway. Once we know that, we
can improve performance:

\startbuffer
\cdef\oof{oof}
\cdef\ofo{ofo}
\def\foo{\begincsname \oof:\ofo\endcsname}
\stopbuffer

\typebuffer \getbuffer

So, \type {\cdef} (or \type {\constant \edef}) flags the macro as being a
constant that doesn't require expansion. For the record, when you define such a
macro with arguments, it just becomes an \type {\edef}.

\luatokentable\foo
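
The following block sums up the variants mentioned above, using the macro from the example. The
first two lines below should have the same effect. The third one takes an
argument, so it ends up as an ordinary \type {\edef}. The macro \type {\foofoo}
is hypothetical:

\starttyping
\cdef\oof{oof}          % flagged as constant
\constant\edef\oof{oof} % the same, with the prefix spelled out
\cdef\foofoo#1{foo #1}  % takes an argument, so just an \edef
\stoptyping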

Here we define the two macros as constants, which in practice means that they
are still just macros, but it also indicates that in some scenarios we can
directly use their body. In this csname construction we now do the following
instead:

\startitemize[packed]
\startitem
    everything up to the \type {\endcsname} is collected
\stopitem
\startitem
    in the process macros are expanded (with pushing and popping input), but when
    we have a constant with more than one body token we add a reference token;
    otherwise the expanded tokens are appended to the result
\stopitem
\startitem
    when all went well, that list gets converted to a string, and at that stage
    we just convert the referenced body of the constant
\stopitem
\startitem
    that string is used as key for a lookup in the hash table
\stopitem
\stopitemize
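
Following that description, a constant with a multi-token body and one with a
single-token body take different routes in the same lookup. This is a sketch
with hypothetical macro names. The comments restate the rules above and are not
taken from the engine source:

\starttyping
\cdef\oof{oof}   % three body tokens: a reference to the body is added
\cdef\single{o}  % one body token: that token is appended as usual
% the reference is only resolved when the list becomes the string oof:o
\begincsname\oof:\single\endcsname
\stoptyping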

So, instead of immediately injecting the expanded body of a macro that needs no
expansion, we inject a reference and use that later on for the conversion into
characters. For the 4242938 times in \LUAMETATEX\ (at the time of writing) this
trick gives the following results, in seconds:

\starttyping
\edef\foo{xxxx} \begincsname\foo\endcsname    0.37
\cdef\foo{xxxx} \begincsname\foo\endcsname    0.28
\edef\foo{xxxx} \ifcsname\foo\endcsname\fi    0.53
\cdef\foo{xxxx} \ifcsname\foo\endcsname\fi    0.35
\stoptyping

And here are the results for an existing command (\type {\relax}):

\starttyping
\edef\foo{relax} \begincsname\foo\endcsname   0.55
\cdef\foo{relax} \begincsname\foo\endcsname   0.36
\edef\foo{relax} \ifcsname\foo\endcsname\fi   0.62
\cdef\foo{relax} \ifcsname\foo\endcsname\fi   0.36
\stoptyping
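
As relative gains, the smallest and largest differences in the two tables work
out as follows. The constant variant therefore saves between roughly a quarter
and two fifths of the time:

\startformula
\frac{0.37 - 0.28}{0.37} \approx 0.24
\qquad
\frac{0.62 - 0.36}{0.62} \approx 0.42
\stopformula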

When I used that trick in, for instance, some font switching macros, it also
gave some gain: 200000 times \type {\it} went from 0.60 down to 0.54 seconds, but
it is unlikely that a document has that many font switches.\footnote {There are
a few more places where constants can gain a little, but those don't add up to
much.}

In practice other operations play a role too, and in these tests we might also
benefit from the data being in the \CPU\ cache, but on the manual I gained a
decent 0.2 seconds. One can question whether this is worth the trouble on an 8.5
second run. However, in this particular manual we spend 3.5 seconds on font
processing and some 1.5 seconds on the backend, and there is a unique
\METAPOST\ graphic on every page: we spend more time in \LUA\ than in \TEX!
Measured against the whole run these 0.2 seconds are a little over 2 percent,
but measured against the roughly 4 seconds spent in \TEX\ they are some 5
percent, and it might actually be even more.

In case one wonders why I spend time on this: one reason is that over the last
decade I was not that impressed by single core performance gains, and \TEX\ is a
single core process. I also can't afford the latest and greatest laptops and
definitely don't want to contribute more e-waste. Also, with \TEX\ and friends
running on virtual machines and competing for resources (memory, \CPU\ and disk
or network drives), any gain is a good gain. Of course it is also fun to improve
\LUAMETATEX, and this string-like property has always bothered me.\footnote {I
did some experiments with a native string register, but that made no sense
because then tokenization in other places takes a toll. With the mentioned
constants we don't pay that price.}

\stopchapter

\stopcomponent

% timestamp: Peter Gabriel Live 2023 Amsterdam
