% language=us runpath=texruns:manuals/followingup

\environment followingup-style

\startcomponent followingup-format

\startchapter[title={The format file}]

It is interesting when someone compares macro packages and uses parameters
like the size of a format file, the output of \type {\tracingall}, or startup
time to make some point. The point I want to make here is that unless you know
exactly what goes on in a run that involves a real document, which can itself
involve multiple runs, such a comparison is rather pointless. For sure I do
benchmark, but I can only draw conclusions about what I (can) know (about).
Yes, benchmarking your own work makes sense, but doing that in comparison to
what you consider comparable variants assumes knowledge of more than your own
work and objectives.

For instance, when you load only a few fonts, typeset one page and don't do
anything that demands any processing or multiple runs, you basically don't
measure anything. More interesting are the differences between 10 or 500
pages, a few font calls or tens of thousands, no color or extensive usage of
color and other properties, interfacing, including inheritance of document
constructs, etc. And even then, when comparing macro packages, it is kind of
tricky to deduce much from what you observe. You really need to know what is
going on inside and also how that relates to for instance adaptive font
scaling. You can have a fast startup, but if a user needs one tikz picture,
loading that package alone will make you forget the initial startup time. You
always pay a price for advanced features and integration! And we didn't even
talk about the operating system caching files, running on a network share,
sharing processors among virtual machines, etc.

The same pointlessness holds for looking at the log file when enabling \type
{\tracingall}. When a macro package loads stuff at startup you can be sure
that the log file is larger. When a font or language is loaded for the first
time, or maybe when math is set up, plenty of lines can get dumped. Advanced
analysis of conditions and trial runs come at a price too. And eventually,
when a box is shown, the configured depth and breadth really matter, and it
might also be that the engine provides much more (verbose) detail. So, a
comparison is again pointless. It can also backfire. Over the decades of
developing \CONTEXT\ I have heard people working on systems make claims like
\quotation {We prefer not to \unknown} or \quotation {It is better to do it
this way \unknown} or (often about operating systems) \quotation {It is bad
that \unknown} just to see years later the same being done in the presumed
better alternative. I can have a good laugh about that: all this \quote {do
this} and \quote {don't do that} backfiring.

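For instance, how much of a shown box ends up in the log is controlled by a
few parameters. As a reminder, these are the plain \TEX\ level knobs (the
values here are just examples; \CONTEXT\ sets its own defaults):

\starttyping
\showboxbreadth 1000 % how many items per nesting level get shown
\showboxdepth     10 % how deep the box nesting gets traversed
\tracingonline     1 % also echo the tracing on the console
\stoptyping
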
That brings us to the format file. When you make a \CONTEXT\ format with the
English user interface, with interfacing being a feature that itself
introduces overhead, the \LUATEX\ engine will show this at the end:

\starttyping
Beginning to dump on file cont-en.fmt
 (format=cont-en 2021.6.9)
48605 strings using 784307 bytes
1050637 memory locations dumped; current usage is 414&523763
44974 multiletter control sequences
\font\nullfont=nullfont
0 preloaded fonts
\stoptyping

The file itself is quite large: 11,129,903 bytes. However, it is actually much
larger, because the format file is compressed! The real size is 19,399,216
bytes. Not taking that into account when comparing the sizes of format files
is kind of bad, because compression directly relates to what resources a
format uses and how usage is distributed over the available memory blobs. The
\LUATEX\ engine does some optimizations and saves the data sparsely, but the
more holes you create, the worse it gets. For instance, the large character
vectors are compartmentalized in order to handle \UNICODE\ efficiently, so the
memory used relates to what you define: do you set up all catcodes or just a
subset? Maybe you delay some initialization to after the format is loaded, in
which case a smaller format file gets compensated by more memory usage and
initialization time afterwards. Maybe your temporary macros create holes in
the token array. The memory that is configured in the configuration files also
matters. Some memory blobs are saved at their configured size, others dismiss
the top part that is not used when saving the format but allocate the lot when
the format is loaded. That means that memory usage in for instance \LUATEX\
can be much larger than the format file suggests. Keep in mind that a format
file is basically a memory dump.

Now, how does \LUAMETATEX\ compare to \LUATEX? Again we will look at the size
of the format file, but you need to keep in mind that for various reasons the
\LMTX\ macros are somewhat more efficient than the \MKIV\ ones. In the
meantime some new mechanisms were added, which adds more \TEX\ and \LUA\ code,
but I still expect (at least for now) a smaller format file. However, when we
create the format we see this (reformatted):

\starttyping
Dumping format 'cont-en.fmt 2021.6.9' in file 'cont-en.fmt':
tokenlist compacted from 489733 to 488204 entries,
1437 potentially aliased lua call/value entries,
max string length 69, 16 fingerprint
+ 16 engine + 28 preamble
+ 836326 stringpool
+ 10655 nodes + 3905660 tokens
+ 705300 equivalents
+ 23072 math codes + 493024 text codes
+ 38132 primitives + 497352 hashtable
+ 4 fonts + 10272 math + 1008 language + 180 insert
+ 10305643 bytecodes
+ 12 housekeeping = 16826700 total.
\stoptyping

This looks quite different from the \LUATEX\ output. Here we report more
detail: for each blob we mention the number of bytes used. The final result is
a file that takes 16,826,700 bytes. That number should be compared with the
19,399,216 for \LUATEX. So, we need less indeed. But, when we compress the
\LUAMETATEX\ format we get 5,913,932 bytes, which is much less than the
11,129,903 byte compressed file that the \LUATEX\ engine makes of it. One
reason for using level 3 zip compression in \LUATEX\ is that (definitely when
we started) it loads faster. It adds to creating the dump but doesn't really
influence loading, although that depends a bit on the compiler used. It is not
easy to see from these numbers what goes on, but when you consider the fact
that we mostly store 32 bit numbers, it will also be clear that many can be
zero or have two or three zero bytes. There's a lot of repetition involved!

So let's look at some of these numbers. The mention of token list compaction
relates to getting rid of holes in memory. Each token takes 8 bytes: 4 for the
token identifier, internally called a cmd and chr, and 4 for a value, like an
integer or dimension value, a glue pointer, or a pointer to the next token. In
our case compaction doesn't save that much.

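As a quick sanity check on these units: the body of a macro like \type
{\def\MyFooC{ABC}} consists of three character tokens, so it takes $3 \times 8
= 24$ bytes of token memory, plus some fixed overhead (such as the reference
count) and the hashtable and equivalents entries for the name. The teaser
table later in this chapter shows such differences in practice.
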
The mention of potentially aliased \LUA\ call|/|value entries is more of a
warning. Because the \LUA\ engine starts fresh each run, you cannot store its
\quote {pointers}, and because hashes are randomized this means that you need
to delay initialization to startup time, definitely for function tokens.

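A minimal sketch of what that means at the primitive level, assuming \type
{lua.get_functions_table} and \type {\luafunction} as in \LUATEX\ (\CONTEXT\
manages these slots itself, so the slot number here is just an arbitrary
example):

\starttyping
\startluacode
-- the function table starts out empty each run, so slots have to be
-- (re)populated at startup time; the function itself cannot be dumped
local functions = lua.get_functions_table()
functions[1001] = function()
    tex.print("hello from slot 1001")
end
\stopluacode

\def\MyHello{\luafunction1001 }
\stoptyping
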
Strings in \TEX\ can be pretty long, but in practice they aren't. In \CONTEXT\
the maximum string length is 69. This makes it possible to use one byte for
registering the string length instead of four, which saves quite a bit. Of
course one large string would spoil this game.

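To get a feel for the numbers: the \LUATEX\ dump shown earlier reports 48605
strings, and at that order of magnitude saving three bytes per length field
already amounts to some 145,000 bytes.
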
The fingerprint, engine, preamble and later housekeeping bytes can be
neglected, but the string pool cannot. These are the bytes that make up the
strings. The bytes are stored in the format but become dynamically allocated
when loaded. The \LUATEX\ engine and its successor don't really have a pool.

Now comes a confusing number. There are not tens of thousands of nodes
allocated. A node is just a pointer into a large array, so node references are
actually just indices. Their size varies from 2 slots to 25; the largest are
par nodes, while shape nodes are allocated dynamically. So what gets reported
is the number of bytes that nodes take. Each node slot takes 8 bytes, so a
glyph node of 12 slots takes 96 bytes, while a glue spec node (think skip
registers) takes 5 slots or 40 bytes. These are amounts of memory that were
not realistic when \TEX\ was written. For the record: in \LUATEX\ glue spec
nodes are not shared, so we have many more.

The majority of the \TEX\ related dump data is for tokens, and here we need
3905660 bytes, which means 488K tokens (each reported value also includes some
overhead). The memory used for the table of equivalents makes for some 88K of
them. This table relates to macros (their names and content). Keep in mind
that (math) character references are also macros.

The next sections that get loaded are the math and text codes. These are the
compartmentalized character properties mentioned before. The number of math
codes is not that large (because we delay much of math), but the text codes
are plenty: think of lc, uc, sf and hj codes, catcodes, etc. Compared to
\LUATEX\ we have more categories but use less space, because we have a more
granular storage model. Optimizing that bit really paid off, also because we
have more vectors.

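As an illustration, these are the kind of per|-|character assignments that end
up in those vectors; in practice \CONTEXT\ manages them via its own
interfaces, and \type {\hjcode} is specific to \LUATEX\ and \LUAMETATEX:

\starttyping
\catcode `\~ = 13    % make the tilde an active character
\lccode  `\A = `\a   % lowercase mapping, also used by hyphenation
\sfcode  `\) = 0     % space factor: keep that of the previous character
\hjcode  `\a = `\a   % the character as seen by the hyphenator
\stoptyping
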
The way primitives and macro names get resolved is pretty much the same in all
engines, but by using the fact that we operate in 32 bit I could actually get
rid of some parallel tables that handle saving and restoring. Some
optimizations relate to the fact that the register ranges are part of the
game, so basically we have some holes in there when they are not used. I guess
this is why \ETEX\ uses a sparse model for the registers above 255. What also
saved a lot is that we don't need to store font names, because these are
available in another way; even in \LUATEX\ that takes a large, basically
useless, chunk. The memory that a macro without parameters consumes is 8 bytes
smaller, and in \CONTEXT\ we have lots of these.

We don't really store fonts, so that section is small, but we do store the
math parameters, and there is not much we can save there. We also have more
such parameters in \LUAMETATEX, so there we might actually use more storage.
The information related to languages is also minimal, because patterns and
exceptions are loaded at runtime. A new category (compared to \LUATEX) is
inserts, because in \LUAMETATEX\ we can use an alternative (not register
based) variant. As you can see from the 180 bytes used, \CONTEXT\ is indeed
using that variant.

That leaves a large block of more than 10 million bytes that relates to \LUA\
bytecode. A large part of that is the huge \LUA\ character table that
\CONTEXT\ uses. The implementation of font handling also takes quite a bit,
and we're not even talking about all the auxiliary \LUA\ modules, \XML\
processing, etc. If \CONTEXT\ were to load that code on demand, the format
file would be much smaller, but since nearly all of it is needed anyway, one
would pay for it later. Loading the (some 600) \LUA\ bytecode chunks of course
takes some time, as does initialization, but not much.

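The underlying mechanism is the bytecode register array that the engine dumps
along with the format, as in \LUATEX. A minimal sketch (\CONTEXT\ has its own
loader on top of this, and the register number is just an example):

\starttyping
\startluacode
-- a function assigned to a bytecode register gets precompiled and
-- saved in the format file; after loading the format it can be run
lua.bytecode[250] = function()
    print("initialized after the format is loaded")
end
lua.bytecode[250]() -- assigning doesn't run it; calling does
\stopluacode
\stoptyping
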
All that said, the reason why we have a large format file can be understood
well if one considers what goes in there. The \CONTEXT\ format files for
\PDFTEX\ and \XETEX\ are 3.3 and 4.7 MB respectively, which is smaller, but
not by that much when you consider the fact that there is no \LUA\ code
stored, that there are fewer character tables, and that an \ETEX\ register
model is used. But a format file is not the whole story. Runtime memory usage
also comes at a price.

The current memory settings of \CONTEXT\ are as follows; these values get
reported when a format has been generated and can be queried at runtime at any
moment:

\starttabulate[|l|r|r|r|r|]
\BC           \BC       max \BC      min \BC      set \BC     stp \BC \NR
\HL
\BC string    \NC   2097152 \NC   150000 \NC   500000 \NC  100000 \NC \NR
\BC pool      \NC 100000000 \NC 10000000 \NC 20000000 \NC 1000000 \NC \NR
\BC hash      \NC   2097152 \NC   150000 \NC   250000 \NC  100000 \NC \NR
\BC lookup    \NC   2097152 \NC   150000 \NC   250000 \NC  100000 \NC \NR
\BC node      \NC  50000000 \NC  1000000 \NC  5000000 \NC  500000 \NC \NR
\BC token     \NC  10000000 \NC  1000000 \NC 10000000 \NC  250000 \NC \NR
\BC buffer    \NC 100000000 \NC  1000000 \NC 10000000 \NC 1000000 \NC \NR
\BC input     \NC    100000 \NC    10000 \NC   100000 \NC   10000 \NC \NR
\BC file      \NC      2000 \NC      500 \NC     2000 \NC     250 \NC \NR
\BC nest      \NC     10000 \NC     1000 \NC    10000 \NC    1000 \NC \NR
\BC parameter \NC    100000 \NC    20000 \NC   100000 \NC   10000 \NC \NR
\BC save      \NC    500000 \NC   100000 \NC   500000 \NC   10000 \NC \NR
\BC font      \NC    100000 \NC      250 \NC      250 \NC     250 \NC \NR
\BC language  \NC     10000 \NC      250 \NC      250 \NC     250 \NC \NR
\BC mark      \NC     10000 \NC       50 \NC       50 \NC      50 \NC \NR
\BC insert    \NC       500 \NC       10 \NC       10 \NC      10 \NC \NR
\stoptabulate

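A minimal sketch of querying such values at runtime, assuming the \type
{status} library exposes its counters via \type {status.list()} the way it
does in \LUATEX:

\starttyping
\startluacode
-- dump all reported status values to the console;
-- the names and details are engine dependent
for key, value in table.sortedhash(status.list()) do
    if type(value) ~= "table" then
        print(key, value)
    end
end
\stopluacode
\stoptyping
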
The maxima are what can be used at most. Apart from the magic number 2097152,
all these maxima can be bumped at compile time, but if you need more, you
might wonder if your approach to rendering makes sense. The minima are what
always gets allocated, and again these are hard coded defaults. The size can
be configured and is normally the same as the minimum, but we use larger
values in \CONTEXT. The step is how much an initial memory blob will grow when
more is needed than is currently available. The last four entries show that we
don't start out with many fonts (especially when we use the \CONTEXT\ compact
font model not that many are needed), and because \CONTEXT\ implements marks
in a different way we actually don't need them. We do use the new insert
properties storage model, and for now the set sizes are enough for what we
need.

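For example, node memory starts out at the configured 5000000 bytes and, when
that runs out, grows in steps of 500000 bytes until it hits the 50000000
maximum.
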
In practice a \LUAMETATEX\ run uses less memory than a \LUATEX\ one, not only
because memory allocation is more dynamic, but also because of other
optimizations. When the compact font model is used (something \CONTEXT\ does)
even less memory is needed. Even this claim should be made with care. Whenever
I discuss the use of resources, one needs to limit the conclusions to
\CONTEXT. I can't speak for other macro packages, simply because I don't know
their internals, the design decisions made, and their impact on the
statistics. As a teaser I show the impact of some definitions:

\starttyping
\chardef     \MyFooA1234
\Umathchardef\MyFooB"1 "0 "1234
\Umathcode   1 2 3 4
\def         \MyFooC{ABC}
\def         \MyFooD#1{A#1C}
\def         \MyFooE{\directlua{print("some lua")}}
\stoptyping

The string pool grows because we store the names (here they are of equal
length). Only symbolic definitions bump the hashtable and equivalents. And
with definitions that have text inside, the number of bytes taken by tokens
grows fast, because every character in that linked list takes 8 bytes: 4 for
the character with its catcode state and 4 for the link to the next token.

\starttabulate[|l|r|r|r|r|r|]
\BC                       \BC stringpool \BC tokens  \BC equivalents \BC hashtable \BC total    \BC \NR
\HL
\NC                       \NC 836408     \NC 3906124 \NC 705316      \NC 497396    \NC 16828987 \NC \NR
\NC \type {\chardef}      \NC 836415     \NC 3906116 \NC 705324      \NC 497408    \NC 16829006 \NC \NR
\NC \type {\Umathchardef} \NC 836422     \NC 3906116 \NC 705324      \NC 497420    \NC 16829025 \NC \NR
\NC \type {\Umathcode}    \NC 836422     \NC 3906124 \NC 705324      \NC 497420    \NC 16829033 \NC \NR
\NC \type {\def} (no arg) \NC 836429     \NC 3906148 \NC 705332      \NC 497428    \NC 16829080 \NC \NR
\NC \type {\def} (arg)    \NC 836436     \NC 3906196 \NC 705340      \NC 497440    \NC 16829155 \NC \NR
\NC \type {\def} (text)   \NC 836443     \NC 3906372 \NC 705348      \NC 497452    \NC 16829358 \NC \NR
\stoptabulate

So, every time a user wants some feature (some extra checking, a warning,
color or font support for some element) that results in a trivial extension to
the core, it can bump the size of the format file more than you think. Of
course, when it leads to some overhaul, sharing code can actually make the
format shrink too. I hope it is clear now that there really is not much to
deduce from the bare numbers. Just try to imagine what:

\starttyping
\definefilesynonym
  [type-imp-newcomputermodern-book.mkiv]
  [type-imp-newcomputermodern.mkiv]
\stoptyping

adds to the format. Convenience has a price.

\stopchapter

\stopcomponent

% Some bonus content:

When processing a thousand paragraphs of \type {tufte.tex}, staying below 4
seconds (just over 60 pages per second) all|-|in looks okay. But it doesn't
say that much. Outputting 1000 pages in 2 seconds tells a bit about the
overhead per page, but again, in practice things work out differently. So what
do we need to consider?

\startitemize

\startitem
    Check what macros and resources are preloaded and what always gets loaded
    at runtime.
\stopitem

\startitem
    After a first run it's likely that the operating system has resources in
    its cache, so start measuring after a few runs.
\stopitem

\startitem
    It is best to run a test many times and take the average runtime, as in
    the sketch after this list.
\stopitem

\startitem
    Simple macro performance tests can be faster than real usage because the
    related bytes are in \CPU\ cache memory. So one can only use such tests to
    check a specific improvement (or hit due to added functionality).
\stopitem

\startitem
    The size of the used \TEX\ tree can matter. The file databases need to be
    loaded and consulted.
\stopitem

\startitem
    The binary matters: is it optimized, does it load libraries, is it 64 bit
    or not?
\stopitem

\startitem
    Local and|/|or global font definitions can hit performance, and when a
    style does many redundant switches it might suffer too. Of course that is
    only the case when font switching is adaptive.
\stopitem

\startitem
    The granularity of subsystems impacts performance: advanced color support,
    inheritance used in mechanisms, abstraction combined with extensive
    support for features, it all matters.
\stopitem

\startitem
    The more features one enables, the more it will impact performance, as
    does preprocessing the input (normalizing, bidi checking, etc).
\stopitem

\startitem
    It matters how the page (and layout) dimensions are defined. And although
    language doesn't really play a role (apart from possible hyphenation),
    specific scripts might.
\stopitem

\stopitemize

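As a minimal sketch of the averaging mentioned above, in \LUA: the file name
is just an example, a \type {context} runner is assumed to be on the path, and
\type {os.gettimeofday} is the wall clock timer that \LUATEX\ and
\LUAMETATEX\ provide:

\starttyping
-- average the wall clock time of a number of identical runs
local runs, total = 5, 0
for i = 1, runs do
    local t = os.gettimeofday()
    os.execute("context --batchmode speed-test.tex")
    total = total + (os.gettimeofday() - t)
end
print(string.format("average runtime: %0.3f seconds", total / runs))
\stoptyping
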
These are just a few points, but it might be clear that I don't take
comparisons too seriously, simply because it's real runs that matter. As long
as we're in the runtime comfort zone we're okay. You can run tests within the
domain of a macro package, but comparing macro packages makes not that much
sense. It can even backfire, especially when claims were made about what
should or should not be in a kernel (while later violating that), or when old
stories (or rumors) about a variant macro package being slow are relied upon.
(The same is true when comparing one's favorite operating system.) Yes, the
\CONTEXT\ format file is huge and performance is less than for instance plain
\TEX. If that is a problem and not a virtue, then make sure your own
alternative will never end up like that. And just don't come to conclusions
about a system that you don't really know.