% language=us runpath=texruns:manuals/ontarget

\startcomponent ontarget-registers

\environment ontarget-style

\startchapter[title={Gaining performance}]

In the meantime (2022) the \LUAMETATEX\ engine has touched many aspects of the
original \TEX\ implementation. This has resulted in less memory consumption than
for instance \LUATEX\ when we talk tokens, more efficient macro handling,
additional storage options and numerous new features and optimizations. Of course
one can disagree about all of this, but what matters to us is that it facilitates
\CONTEXT\ well. That macro package went from \MKII\ to \MKIV\ to \MKXL\ (aka
\LMTX).

Although the macros evolved over the years, the basic ideas haven't changed: it is
a keyword driven macro package that is set up in a way that makes it possible to
move forward. In spite of what one might think, the fundamentals didn't change
much. It looks like we made the right decisions at the start, which means that we
can change low level implementations to match the engine without users noticing
much. Of course in the area of fonts, input encoding and languages things have
changed simply because the environment in which we operate changes.

A fundamental difference between \PDFTEX\ and \LUAMETATEX\ is that the latter is
in many aspects 32 and even 64 bit all over the place. That comes with a huge
performance hit but also with possibilities (that I won't discuss here now)! On a
simple document nothing can beat \PDFTEX, even with the optimizations that we can
apply when using the modern engines. However, on more complex documents reality
is that \LUAMETATEX\ can outperform \PDFTEX, and documents (read: user demands)
have become more complex indeed.

So, how does that work in practice? One can add some features to an engine but
then the macro package has to be adapted. Due to the way \CONTEXT\ is organized
it was not that hard to keep it in sync with new features, although not all are
applied yet to their full extent. Some new features improved performance, others
made the machinery (or its usage) a bit slower. The first versions of \LUAMETATEX\
were some 25\percent\ slower than \LUATEX, simply because the backend is written
in \LUA. But, at the end of 2022 we can safely say that \LUAMETATEX\ can be
50\percent\ faster than its ancestor. This is due to a mix of the already mentioned
optimizations and new features, for instance a more powerful macro parser. The
backend has become more complex too, but also benefits from a few more helpers.

Because we spend a lot of time in \LUA\ the interfaces to \TEX\ have been
extended and improved too. Of course we depend on the \LUA\ interpreter being
kept in optimum state by its authors. It must be said that quite a few of the
interfaces might look obscure but these are not really meant for the average user
anyway. Also, as soon as one messes with tokens and nodes at that level one
definitely needs to know what one's doing!

The more stable the engine becomes, the less there is to improve. Occasionally it
was possible to squeeze out a few more milliseconds per run but it depends a lot
on what one does. And \TEX\ is already quite fast anyway. Of course 0.005 seconds
on a 5 second run is not much but a hundred times such an improvement is
noticeable, especially when there are multiple runs or when one processes a batch
of 10,000 documents (each needing two runs).
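To make that arithmetic concrete (a rough estimate that assumes the individual
gains simply add up and that every run benefits equally), a hundred such
improvements save, in seconds per run,

\startformula
100 \times 0.005 = 0.5
\stopformula

which is 10\percent\ of a 5 second run. For the mentioned batch, with two runs
per document, the total saving in seconds becomes

\startformula
10000 \times 2 \times 0.5 = 10000
\stopformula

which is a bit under three hours.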

One interesting aspect of \TEX\ is that it can surprise you every now and then.
At the end of 2022 I decided to play a bit more with a feature that has been
around for a while:

\starttyping
\integerdef  \fooA 123
\dimensiondef\fooB 123pt
\stoptyping

These primitives create a counter and a dimen where the value is stored in the hash
table. The original reason was that I didn't want to spoil registers. But although
these are basically constants there is more to it now.

\starttyping
\countdef\fooC 27
\dimendef\fooD 56
\stoptyping

These primitives create a command that stores the register number (here 27 and
56) with the name. In this case a \quote {variable} is accessed in two steps: the
\type {\fooC} macro expands to a register accessor with value 27. Next that
accessor will kick in and fetch (or set) the value in slot 27 of the memory range
bound to (in total 65K) counters. All these registers sit at the lower end of
\TEX's memory which is definitely not next to the meaning of \type {\fooC}. So we
have two memory accesses to get to the number. In contrast, once we are at
\type {\fooA} we are also at the value. Although memory access can be fast when
the relevant slots are cached, in practice it can cause delays, especially in a
program like \TEX\ where most data is spread all over the place. And imagine other
processes competing for access too.
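The difference can be sketched as follows. This is only an illustration that
reuses the names from the snippets above; the comments paraphrase the lookups as
described in the text and are not taken from the engine source:

\starttyping
\countdef  \fooC 27   % \fooC carries the register number 27
\fooC 123             % the value ends up in slot 27 of the counter range
\integerdef\fooA 123  % the value is stored with \fooA itself

\the\fooC  % first \fooC (number 27), then the register: two accesses
\the\fooA  % once we are at \fooA we are also at the value
\stoptyping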

It is for that reason that I decided to replace the more or less \quote
{constant} property of \type {\fooA} by one that also supports assignments as
well as the arithmetic commands like \type {\advance}. This was not that hard due
to the way the \LUAMETATEX\ source is organized. After that using these pseudo
constants proved to be more efficient than registers, but of course I then had to
adapt the source. Interestingly that should have been easy because one only needs
to change the definitions of for instance \type {\newcount} but in practice that
doesn't work because it will|/|can break for instance generic packages like TikZ.
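As a minimal sketch of what this makes possible (assuming a recent \LUAMETATEX\
where such quantities accept assignments; the name \type {\fooE} is made up for
this example), the pseudo constant can be used like a counter:

\starttyping
\integerdef\fooE 10  % define and initialize
\fooE 20             % a plain assignment, as with a count register
\advance\fooE by 5   % arithmetic works too
\the\fooE            % the value is now 25
\stoptyping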

So, in the end a new allocator was added and just over 1000 lines in some 120
files (with some overlap) had to be adapted to this. In addition some precautions
had to be taken for access from \LUA\ because the quantities were no longer
registers. But it was rewarding in the sense that the test suite now ran some
5\percent\ faster and processing the \LUAMETATEX\ manual went from 8.7 seconds on
my laptop down to around 8.5, which is not bad.
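In \CONTEXT\ code such an allocator sits next to the traditional one. The sketch
below assumes that the new allocator is called \type {\newinteger} (the text
above does not name it, so treat the name as an assumption) and uses made|-|up
variable names:

\starttyping
\newcount  \MyRegisterCounter  % traditional: allocates a count register
\newinteger\MyIntegerCounter   % assumed name: value stored with the macro

\MyRegisterCounter 10 \advance\MyRegisterCounter by 1
\MyIntegerCounter  10 \advance\MyIntegerCounter  by 1
\stoptyping

Because \type {\newcount} keeps its meaning in such a setup, generic packages
that expect real registers are not affected.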

Now why do we bother so much about performance? If I really want a faster run,
using a decent desktop is of more help. But even then there can be reasons. When
Mikael and I were discussing math engine developments at some point we noticed
that a run took twice as much time as a result of (supposedly idle) background
tasks. Now keep in mind that \TEX\ uses a single core so with plenty of cores it
should not be that bad. However, when the video chat program takes half of the
CPU power, or when a mathematical manipulation program idles in the background
taking 80 percent of a modern machine, or when a popular editor keeps all kinds of
plugins busy for no reason, or when a supposedly closed browser consumes
gigabytes of memory and keeps dozens of supposedly idle threads busy, it becomes
clear that we should not let \TEX\ put a large burden on memory access (and
cache).

It can get even worse when one runs on virtual machines where the host suggests
that you get 16 cores so that you can run a dozen \TEX\ jobs in parallel but
simple measurements show that these shared cores report a much higher ideal
performance than the one you measure. So, the less demanding a \CONTEXT\ run
becomes, the better: we're not so much after the .2 seconds on an 8 second run,
but more after 3 seconds for that same run when using shared resources where it
became 15 seconds. And this is what observations with respect to the performance
of the test suite seem to indicate.

In the end it's mostly about comfort: when you process a document of 300 pages,
10 seconds is quite okay for a few changes, because one can relate time to
output, but 20 seconds \unknown\ And when processing a document of a few pages the
waiting time of a second is often less than what one needs to move the mouse
around to the viewer. Also, when a user starts \TEX\ on the console and
afterwards opens a browser from there that second is even less noticeable.

Now let's go back to improvements. A related addition was \type {\advanceby} that
doesn't check for the \type {by} keyword. When there is no such keyword we can
avoid pushing back the non|-|matching next token which is also noticeable. Here
about 680 changes were needed. Changes like these only make a difference in
performance for some very demanding mechanisms in \CONTEXT. Again one cannot
overload an existing primitive because generic packages can fail (as the test
suite proved). There were also a few places where a dirty trick had to be changed
because we cannot alias these constants.
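A small sketch of the difference, assuming the counter \type {\fooA} that was
defined before; the comments paraphrase the scanning behaviour described above
and are not taken from the engine source:

\starttyping
\advance  \fooA by 10  % scans for the optional keyword and finds it
\advance  \fooA 10     % no keyword: the scanned token has to be pushed back
\advanceby\fooA 10     % never looks for the keyword, nothing is pushed back
\stoptyping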

We can give similar stories about other improvements but this one sort of stands
out because it is so noticeable. Also, other changes involve more drastic low
level adaptations of \CONTEXT\ so these happen over a longer period of time. Of
course all has to happen in ways that don't impact users. An example of a
performance primitive is \typ {\advancebyplusone} which is actually implemented
but still disabled because the gain is in the hundredths of a second range and I
need to (again) adapt the source in order to benefit.

The mentioned register variants are implemented for count (integer), dimen
(dimension), skip (gluespec) and muskip (mugluespec). Token registers are more
complex as they have reference counters as well as more manipulator primitives.
The same is true for boxes (although it is tempting to come up with some faster
access mechanism) and attributes, that also have more diverse accessors. Also,
token lists and boxes involve way more than a simple assignment or access so any
gain will drown in other actions. That said, it really makes sense now to drop
the maximum of 64K registers to some more reasonable 8K (or even less for mu
skips). That will save a couple of megabytes which sounds like little but still
puts less burden on the system.

\stopchapter

\stopcomponent
165