evenmore-formats.tex /size: 18 Kb    last modification: 2021-10-28 13:50
% language=us runpath=texruns:manuals/evenmore

% This one accidentally ended up in the older history document followingup,
% but it's now moved here.

\environment evenmore-style

\startcomponent evenmore-format

\startchapter[title={The format file}]

It is interesting when someone compares macro packages using parameters like
the size of a format file, the output of \type {\tracingall}, or startup time to
make some point. The point I want to make here is that unless you know exactly
what goes on in a run that involves a real document, which can itself involve
multiple runs, such a comparison is rather pointless. For sure I do benchmark,
but I can only draw conclusions from what I (can) know (about). Yes,
benchmarking your own work makes sense, but doing that in comparison to what you
consider comparable variants assumes knowledge of more than your own work and
objectives.

For instance, when you load a few fonts, typeset one page and don't do anything
that demands any processing or multiple runs, you basically don't measure
anything. More interesting are the differences between 10 or 500 pages, a few
font calls or tens of thousands, no color or extensive usage of color and other
properties, interfacing, including inheritance of document constructs, etc. And
even then, when comparing macro packages, it is kind of tricky to deduce much
from what you observe. You really need to know what is going on inside and also
how that relates to for instance adaptive font scaling. You can have a fast
startup, but if a user needs one tikz picture, loading that package alone will
make you forget the initial startup time. You always pay a price for advanced
features and integration! And we didn't even talk about the operating system
caching files, running on a network share, sharing processors among virtual
machines, etc.

Pointless comparing is also true for looking at the log file when enabling \type
{\tracingall}. When a macro package loads stuff at startup you can be sure that
the log file is larger. When a font or language is loaded the first time, or
maybe when math is set up, there can be plenty of lines dumped. Advanced
analysis of conditions and trial runs come at a price too. And eventually, when
a box is shown, the configured depth and breadth really matter, and it might
also be that the engine provides much more (verbose) detail. So, a comparison
is again pointless. It can also backfire. Over the decades of developing
\CONTEXT\ I have heard people working on systems make claims like \quotation
{We prefer not to \unknown} or \quotation {It is better to do it this way
\unknown} or (often about operating systems) \quotation {It is bad that
\unknown}, just to see the same being done years later in the presumed better
alternative. I can have a good laugh about that: the \quotation {do this} and
\quotation {don't do that} backfiring.

That brings us to the format file. When you make a \CONTEXT\ format with the
English user interface, with interfacing being a feature that itself introduces
overhead, the \LUATEX\ engine will show this at the end:

\starttyping
Beginning to dump on file cont-en.fmt
 (format=cont-en 2021.6.9)
48605 strings using 784307 bytes
1050637 memory locations dumped; current usage is 414&523763
44974 multiletter control sequences
\font\nullfont=nullfont
0 preloaded fonts
\stoptyping

The file itself is quite large: 11,129,903 bytes. However, it is actually much
larger because the format file is compressed! The real size is 19,399,216
bytes. Not taking that into account when comparing the sizes of format files is
kind of bad, because compression directly relates to what resources a format
uses and how usage is distributed over the available memory blobs. The \LUATEX\
engine does some optimizations and saves the data sparsely, but the more holes
you create, the worse it gets. For instance, the large character vectors are
compartmentalized in order to handle \UNICODE\ efficiently, so the used memory
relates to what you define: do you set up all catcodes or just a subset? Maybe
you delay some initialization to after the format is loaded, in which case a
smaller format file gets compensated by more memory usage and initialization
time afterwards. Maybe your temporary macros create holes in the token array.
The memory that is configured in the configuration files also matters. Some
memory blobs are saved at their configured size; others dismiss the top part
that is not used when saving the format but allocate the lot when the format is
loaded. That means that memory usage in for instance \LUATEX\ can be much
larger than a format file suggests. Keep in mind that a format file is
basically a memory dump.

Now, how does \LUAMETATEX\ compare to \LUATEX? Again we will look at the size
of the format file, but you need to keep in mind that for various reasons the
\LMTX\ macros are somewhat more efficient than the \MKIV\ ones. In the meantime
some new mechanisms were added, which adds more \TEX\ and \LUA\ code, but I
still expect (at least for now) a smaller format file. However, when we create
the format we see this (reformatted):

\starttyping
Dumping format 'cont-en.fmt 2021.6.9' in file 'cont-en.fmt':
tokenlist compacted from 489733 to 488204 entries,
1437 potentially aliased lua call/value entries,
max string length 69, 16 fingerprint
+ 16 engine + 28 preamble
+ 836326 stringpool
+ 10655 nodes + 3905660 tokens
+ 705300 equivalents
+ 23072 math codes + 493024 text codes
+ 38132 primitives + 497352 hashtable
+ 4 fonts + 10272 math + 1008 language + 180 insert
+ 10305643 bytecodes
+ 12 housekeeping = 16826700 total.
\stoptyping

This looks quite different from the \LUATEX\ output. Here we report more
detail: for each blob we mention the number of bytes used. The final result is
a file that takes 16,826,700 bytes. That number should be compared with the
19,399,216 for \LUATEX. So, we need less indeed. But, when we compress the
\LUAMETATEX\ format we get this: 5,913,932 bytes, which is much less than the
11,129,903 byte compressed size that the \LUATEX\ engine makes of it. One
reason for using level 3 zip compression in \LUATEX\ is that (definitely when
we started) it loads faster. It adds to creating the dump but doesn't really
influence loading, although that depends a bit on the compiler used. It is not
easy to see from these numbers what goes on, but when you consider the fact
that we mostly store 32 bit numbers, it will also be clear that many can be
zero or have two or three zero bytes. There's a lot of repetition involved!

So let's look at some of these numbers. The mentioning of token list compaction
relates to getting rid of holes in memory. Each token takes 8 bytes, 4 for the
token identifier, internally called a cmd and chr, and 4 for a value like an
integer or dimension value, or a glue pointer, or a pointer to a next token,
etc. In our case compaction doesn't save that much.

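The 8 bytes per token make for simple arithmetic. As an illustration (the exact
bookkeeping, like the reference count token, differs per engine, so take the
details as an approximation):

\starttyping
% a token : 4 bytes (cmd,chr) + 4 bytes (value or link) = 8 bytes

\def\MyFoo{ABC} % the body is a linked list of 3 character tokens:
                % 3 * 8 = 24 bytes, plus a reference count token
\stoptyping
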
The mentioning of potentially aliased \LUA\ call|/|value entries is more of a
warning. Because the \LUA\ engine starts fresh each run, you cannot store its
\quote {pointers}, and because hashes are randomized, this means that you need
to delay initialization to startup time, definitely for function tokens.

Strings in \TEX\ can be pretty long but in practice they aren't. In \CONTEXT\
the maximum string length is 69. This makes it possible to use one byte for
registering the string length instead of four which saves quite a bit. Of
course one large string will spoil this game.

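A back-of-the-envelope calculation gives a feel for what that one byte buys us.
It uses the string count that \LUATEX\ reported above, so consider it an
approximation rather than an exact figure:

\starttyping
% 48605 strings, each saving 3 of the 4 length bytes:
%
% 48605 * 3 = 145815 bytes less in the dump
\stoptyping
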
The fingerprint, engine, preamble and later housekeeping bytes can be
neglected, but the string pool cannot. These are the bytes that make up the
strings. The bytes are stored in the format but become dynamically allocated
when loaded. The \LUATEX\ engine and its successor don't really have a pool.

Now comes a confusing number. There are not tens of thousands of nodes
allocated. A node is just a pointer into a large array, so node references are
actually just indices. Their size varies from 2 slots to 25; the largest are
par nodes, while shape nodes are allocated dynamically. So what gets reported
is the number of bytes that nodes take. Each node slot takes 8 bytes, so a
glyph node of 12 slots takes 96 bytes, while a glue spec node (think skip
registers) takes 5 slots or 40 bytes. These are amounts of memory that were
not realistic when \TEX\ was written. For the record: in \LUATEX\ glue spec
nodes are not shared, so we have many more.

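The same slot arithmetic gives a feel for runtime node memory. The slot counts
are the ones mentioned above, so this is an illustration, not a specification:

\starttyping
% one slot = 8 bytes
%
% glyph node     : 12 slots = 96 bytes
% glue spec node :  5 slots = 40 bytes
%
% so 100 glyphs alone already occupy 100 * 96 = 9600 bytes
\stoptyping
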
The majority of \TEX\ related dump data is for tokens, and here we need
3905660 bytes, which means 488K tokens (each reported value also includes some
overhead). The memory used for the table of equivalents makes for some 88K of
them. This table relates to macros (their names and content). Keep in mind
that (math) character references are also macros.

The next sections that get loaded are math and text codes. These are the
mentioned compartmentalized character properties. The number of math codes is
not that large (because we delay much of math) but the text codes are plenty;
think of lc, uc, sf, hj, catcodes, etc. Compared to \LUATEX\ we have more
categories but use less space because we have a more granular storage model.
Optimizing that bit really paid off, also because we have more vectors.

The way primitives and macro names get resolved is pretty much the same in all
engines, but by using the fact that we operate in 32 bit I could actually get
rid of some parallel tables that handle saving and restoring. Some
optimizations relate to the fact that the register ranges are part of the
game, so basically we have some holes in there when they are not used. I guess
this is why \ETEX\ uses a sparse model for the registers above 255. What also
saved a lot is that we don't need to store font names, because these are
available in another way; even in \LUATEX\ that takes a large, basically
useless, chunk. The memory that a macro without parameters consumes is 8 bytes
smaller, and in \CONTEXT\ we have lots of these. We don't really store fonts,
so that section is small, but we do store the math parameters, and there is
not much we can save there. We also have more such parameters in \LUAMETATEX,
so there we might actually use more storage. The information related to
languages is also minimal because patterns and exceptions are loaded at
runtime. A new category (compared to \LUATEX) is inserts, because in
\LUAMETATEX\ we can use an alternative (not register based) variant. As you
can see from the 180 bytes used, \CONTEXT\ is indeed using that variant.

That leaves a large block of more than 10 million bytes that relates to \LUA\
byte code. A large part of that is the huge \LUA\ character table that
\CONTEXT\ uses. The implementation of font handling also takes quite a bit,
and we're not even talking of all the auxiliary \LUA\ modules, \XML\
processing, etc. If \CONTEXT\ were to load that on demand, which is nearly
always possible, the format file would be much smaller, but one would pay for
it later. Loading the (some 600) \LUA\ byte code chunks of course takes some
time, as does initialization, but not much.

All that said, the reason why we have a large format file can be understood
well if one considers what goes in there. The \CONTEXT\ format files for
\PDFTEX\ and \XETEX\ are 3.3 and 4.7 MB respectively, which is smaller, but
not that much when you consider the fact that there is no \LUA\ code stored,
that there are fewer character tables, and that an \ETEX\ register model is
used. But a format file is not the whole story. Runtime memory usage also
comes at a price.

The current memory settings of \CONTEXT\ are as follows; these values get
reported when a format has been generated and can be queried at runtime at any
moment:

\starttabulate[|l|r|r|r|r|]
\BC           \BC       max \BC      min \BC      set \BC     stp \BC \NR
\HL
\BC string    \NC   2097152 \NC   150000 \NC   500000 \NC  100000 \NC \NR
\BC pool      \NC 100000000 \NC 10000000 \NC 20000000 \NC 1000000 \NC \NR
\BC hash      \NC   2097152 \NC   150000 \NC   250000 \NC  100000 \NC \NR
\BC lookup    \NC   2097152 \NC   150000 \NC   250000 \NC  100000 \NC \NR
\BC node      \NC  50000000 \NC  1000000 \NC  5000000 \NC  500000 \NC \NR
\BC token     \NC  10000000 \NC  1000000 \NC 10000000 \NC  250000 \NC \NR
\BC buffer    \NC 100000000 \NC  1000000 \NC 10000000 \NC 1000000 \NC \NR
\BC input     \NC    100000 \NC    10000 \NC   100000 \NC   10000 \NC \NR
\BC file      \NC      2000 \NC      500 \NC     2000 \NC     250 \NC \NR
\BC nest      \NC     10000 \NC     1000 \NC    10000 \NC    1000 \NC \NR
\BC parameter \NC    100000 \NC    20000 \NC   100000 \NC   10000 \NC \NR
\BC save      \NC    500000 \NC   100000 \NC   500000 \NC   10000 \NC \NR
\BC font      \NC    100000 \NC      250 \NC      250 \NC     250 \NC \NR
\BC language  \NC     10000 \NC      250 \NC      250 \NC     250 \NC \NR
\BC mark      \NC     10000 \NC       50 \NC       50 \NC      50 \NC \NR
\BC insert    \NC       500 \NC       10 \NC       10 \NC      10 \NC \NR
\stoptabulate

The maxima are what can be used at most. Apart from the magic number 2097152,
all these maxima can be bumped at compile time, but if you need more, you
might wonder whether your approach to rendering makes sense. The minima are
what always gets allocated, and again these are hard coded defaults. The size
can be configured and is normally the same as the minima, but we use larger
values in \CONTEXT. The step is how much an initial memory blob will grow when
more is needed than is currently available. The last four entries show that we
don't start out with many fonts (especially when we use the \CONTEXT\ compact
font model not that many are needed) and because \CONTEXT\ implements marks in
a different way we actually don't need them. We do use the new insert
properties storage model and for now the set sizes are enough for what we
need.

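Querying these values at runtime can be done from the \LUA\ end. A sketch: the
\type {status} table exists in both \LUATEX\ and \LUAMETATEX, but the exact
field names differ per engine, so take the details here as assumptions:

\starttyping
\startluacode
    -- list the current engine state, including memory usage
    for k, v in table.sortedhash(status.list()) do
        context.type(tostring(k) .. " : " .. tostring(v))
        context.par()
    end
\stopluacode
\stoptyping
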
In practice a \LUAMETATEX\ run uses less memory than a \LUATEX\ one, not only
because memory allocation is more dynamic, but also because of other
optimizations. When the compact font model is used (something \CONTEXT\ does)
even less memory is needed. Even this claim should be made with care. Whenever
I discuss the use of resources, one needs to limit the conclusions to
\CONTEXT. I can't speak for other macro packages simply because I don't know
their internals, the design decisions made, and their impact on the
statistics. As a teaser I show the impact of some definitions:

\starttyping
\chardef     \MyFooA 1234
\Umathchardef\MyFooB"1 "0 "1234
\Umathcode   1 2 3 4
\def         \MyFooC{ABC}
\def         \MyFooD#1{A#1C}
\def         \MyFooE{\directlua{print("some lua")}}
\stoptyping

The stringpool grows because we store the names (here they are of equal
length). Only symbolic definitions bump the hashtable and equivalents. And
with definitions that have text inside the number of bytes taken by tokens
grows fast because every character in that linked list takes 8 bytes, 4 for
the character with its catcode state and 4 for the link to the next token.

\starttabulate[|l||||||]
\BC                       \BC stringpool \BC tokens  \BC equivalents \BC hashtable \BC total    \NC \NR
\HL
\NC                       \NC 836408     \NC 3906124 \NC 705316      \NC 497396    \NC 16828987 \NC \NR
\NC \type {\chardef}      \NC 836415     \NC 3906116 \NC 705324      \NC 497408    \NC 16829006 \NC \NR
\NC \type {\Umathchardef} \NC 836422     \NC 3906116 \NC 705324      \NC 497420    \NC 16829025 \NC \NR
\NC \type {\Umathcode}    \NC 836422     \NC 3906124 \NC 705324      \NC 497420    \NC 16829033 \NC \NR
\NC \type {\def} (no arg) \NC 836429     \NC 3906148 \NC 705332      \NC 497428    \NC 16829080 \NC \NR
\NC \type {\def} (arg)    \NC 836436     \NC 3906196 \NC 705340      \NC 497440    \NC 16829155 \NC \NR
\NC \type {\def} (text)   \NC 836443     \NC 3906372 \NC 705348      \NC 497452    \NC 16829358 \NC \NR
\stoptabulate

So, every time a user wants some feature (some extra checking, a warning,
color or font support for some element) that results in a trivial extension to
the core, it can bump the size of the format file more than you think. Of
course, when it leads to some overhaul, sharing code can actually make the
format shrink too. I hope it is clear now that there really is not much to
deduce from the bare numbers. Just try to imagine what:

\starttyping
\definefilesynonym
  [type-imp-newcomputermodern-book.mkiv]
  [type-imp-newcomputermodern.mkiv]
\stoptyping

adds to the format. Convenience has a price.

\stopchapter

\stopcomponent

% Some bonus content:

When processing a thousand \type {tufte.tex} paragraphs, staying below 4
seconds (just over 60 pages per second) all|-|in looks ok. But it doesn't say
that much. Outputting 1000 pages in 2 seconds tells a bit about the overhead
per page, but again, in practice things work out differently. So what do we
need to consider?

\startitemize

\startitem
    Check what macros and resources are preloaded and what always gets loaded
    at runtime.
\stopitem

\startitem
    After a first run it's likely that the operating system has resources in
    its cache, so start measuring after a few runs.
\stopitem

\startitem
    Best run a test many times and take the average runtime.
\stopitem

\startitem
    Simple macro performance tests can be faster than in real usage because the
    related bytes are in \CPU\ cache memory. So one can only use that to test a
    specific improvement (or hit due to added functionality).
\stopitem

\startitem
    The size of the used \TEX\ tree can matter. The file databases need to be
    loaded and consulted.
\stopitem

\startitem
    The binary matters: is it optimized, does it load libraries, is it 64 bit
    or not?
\stopitem

\startitem
    Local and|/|or global font definitions can hit performance, and a style
    that does many redundant switches can too. Of course that is only the case
    when font switching is adaptive.
\stopitem

\startitem
    The granularity of subsystems impacts performance: advanced color support,
    inheritance used in mechanisms, abstraction combined with extensive
    support for features, it all matters.
\stopitem

\startitem
    The more features one enables, the more it will impact performance, as
    does preprocessing the input (normalizing, bidi checking, etc).
\stopitem

\startitem
    It matters how the page (and layout) dimensions are defined. Although
    language doesn't really play a role (apart from possible hyphenation),
    specific scripts might.
\stopitem

\stopitemize

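The advice about repeated runs can be as simple as this shell sketch (assuming
a file \type {test.tex}; the \type {--once} flag skips the multipass
machinery):

\starttyping
context test.tex            # warm up the cache first
for i in 1 2 3 4 5 ; do     # then time a few single runs
    time context --once test.tex
done
\stoptyping
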
These are just a few points, but it might be clear that I don't take
comparisons too seriously, simply because it's real runs that matter. As long
as we're in the runtime comfort zone we're okay. You can run tests within the
domain of a macro package, but comparing macro packages doesn't make that much
sense. It can even backfire, especially when claims were made about what
should or should not be in a kernel (while later violating that), or when
relying on old stories (or rumors) about a variant macro package being slow.
(The same is true when comparing one's favorite operating system.) Yes, the
\CONTEXT\ format file is huge and performance is less than that of, for
instance, plain \TEX. If that is a problem and not a virtue, then make sure
your own alternative never ends up like that. And just don't come to
conclusions about a system that you don't really know.
