% language=us

\startcomponent mk-arabic

\environment mk-environment

\chapter{Where do we stand}

In the previous chapter we discussed the state of \LUATEX\ in the
beginning of 2009, the prelude to version 0.50. We consider the
release of the 0.50 version to be really important, both for
\LUATEX\ and for \MKIV, so here I will reflect on the state
around this release. I will do this from the perspective of
processing documents because usability is an important measure.

There are several reasons why \LUATEX\ 0.50 is an important release,
both for \LUATEX\ and for \MKIV. Let's start with \LUATEX.

\startitemize

\startitem Apart from a couple of bug fixes, the current version
is pretty usable and stable. Details of what we've reached so far
have been presented previously. \stopitem

\startitem The code base has been converted from \PASCAL\ to
\CCODE, and as a result the source tree has become simpler (being
\CWEB\ compliant happens around 0.60). This transition also opens
up the possibility to start looking into some of the more tricky
internals, like page building. \stopitem

\startitem Most of the front end has been opened up and the new
backend code is getting into shape. As the backend was partly already done in
\CCODE\ the moment has come to do a real cleanup. Keep in mind that
we started with \PDFTEX\ and that much of its extra functionality is
rather interwoven with traditional \TEX\ code. \stopitem

\stopitemize

If we look at \CONTEXT, we've also reached a crucial point in the
upgrade.

\startitemize

\startitem The code base is now divided into \MKII\ and \MKIV. This
permits us not only to reimplement bits and pieces (something that
was already in progress) but also to clean up the code (only
\MKIV). \stopitem

\startitem If you kept up with the development you already know
the kind of tasks we can (and do) delegate to \LUA. Just to
mention a few: file handling, font loading and \OPENTYPE\
processing, casing and some spacing issues, everything related to
graphics and \METAPOST, language support, color and other
attributes, input regimes, \XML, multi|-|pass data, etc. \stopitem

\startitem Recently all backend related code was moved to
\LUA\ and the code dealing with hyperlinks, widgets and the like is
now mostly moved away from \TEX. The related cleanup was possible
because we no longer have to deal with a mix of \DVI\ drivers too.
\stopitem

\startitem Everything related to structure (which includes
numbering and multi|-|pass data like tables of contents and
registers) is now delegated to \LUA. We move around way more
information and will extend these mechanisms in the near future.
\stopitem

\stopitemize

Tracing on Taco's machine has shown that when processing the
\LUATEX\ reference manual the engine spends about 10\%
of the time on getting tokens, 15\% on macro expansion, and some
50\% on \LUA\ (callback interfacing included). Especially the time
spent by \LUA\ differs per document and garbage collection seems
to be a bottleneck here. So, let's wrap up how \LUATEX\ performs
around the time of 0.50.

We use three documents for testing (intermediate) \LUATEX\
binaries: the reference manual, the history document \quote{mk},
and the revised Metafun manual. The reference manual has a
\METAPOST\ graphic on each page which is positioned using the
\CONTEXT\ background layering mechanism. This mechanism is active
only when backgrounds are defined and has some performance
consequences for the page builder. However, most time is spent on
constructing the tables (tabulate) and because these can contain
paragraphs that can run over multiple pages, constructing a table
takes a few analysis passes per table plus some so-called
vsplitting. We load some fonts (including narrow variants) but for
the rest this document is not that complex. Of course colors are
used as well as hyperlinks.

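As a rough illustration of the kind of setup described above (a
\METAPOST\ graphic placed on every page through the background
mechanism), here is a minimal sketch. It is not taken from the
style of the reference manual: the name \type {demopage} and the
drawing itself are made up for this example.

\starttyping
% a graphic that is processed again each time it is used;
% OverlayWidth and OverlayHeight are provided by the overlay mechanism
\startuseMPgraphic{demopage}
    draw unitsquare xyscaled (OverlayWidth,OverlayHeight)
        withpen pencircle scaled 2pt
        withcolor .8white ;
\stopuseMPgraphic

% wrap the graphic in an overlay and hook it into the page background
\defineoverlay[demopage][\useMPgraphic{demopage}]
\setupbackgrounds[page][background=demopage]

\starttext
    \dorecurse{3}{test\page}
\stoptext
\stoptyping
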
The report at the end of the runs looks as follows:

\start \switchtobodyfont[small]
\starttyping
input load time           - 0.109 seconds
stored bytecode data      - 184 modules, 45 tables, 229 chunks
node list callback tasks  - 4 unique tasks, 4 created, 20980 calls
cleaned up reserved nodes - 29 nodes, 10 lists of 1427
node memory usage         - 19 glue_spec, 2 dir
h-node processing time    - 0.312 seconds including kernel
attribute processing time - 1.154 seconds
used backend              - pdf (backend for directly generating pdf output)
loaded patterns           - en:us:pat:exc:2
jobdata time              - 0.078 seconds saving, 0.047 seconds loading
callbacks                 - direct: 86692, indirect: 13364, total: 100056
interactive elements      - 178 references, 356 destinations
v-node processing time    - 0.062 seconds
loaded fonts              - 43 files: ....
fonts load time           - 1.030 seconds
metapost processing time  - 0.281 seconds, loading: 0.016 seconds,
                            execution: 0.156 seconds, n: 161
result saved in file      - luatexref-t.pdf
luatex banner             - this is luatex, version beta-0.42.0
control sequences         - 31880 of 147189
current memory usage      - 106 MB (ctx: 108 MB)
runtime                   - 12.433 seconds, 164 processed pages,
                            164 shipped pages, 13.191 pages/second
\stoptyping
\stop

The runtime includes a fixed amount of startup and font loading
time. The more pages your document has, the less this overhead
matters.

More demanding is the \quote {mk} document (figure~\ref{fig.mk}). Here
we have many fonts, including some really huge \CJK\ and Arabic ones (and these are
loaded at several sizes and with different features). The reported
font load time is large but this is partly due to the fact that on
my machine for some reason passing the tables to \TEX\ involved a
lot of page faults (we think that the cpu cache is the culprit).
Older versions of \LUATEX\ didn't have that performance penalty,
so probably half of the reported font loading time is kind of
wasted.

The hnode processing time refers mostly to \OPENTYPE\ font
processing and attribute processing time has to do with backend
issues (like injecting color directives). The more features you
enable, the larger these numbers get. The \METAPOST\ font loading
refers to the punk font instances.

\start \switchtobodyfont[small]
\starttyping
input load time           - 0.125 seconds
stored bytecode data      - 184 modules, 45 tables, 229 chunks
node list callback tasks  - 4 unique tasks, 4 created, 24295 calls
cleaned up reserved nodes - 116 nodes, 29 lists of 1411
node memory usage         - 21 attribute, 23 glue_spec, 7 attribute_list,
                            7 local_par, 2 dir
h-node processing time    - 1.763 seconds including kernel
attribute processing time - 2.231 seconds
used backend              - pdf (backend for directly generating pdf output)
loaded patterns           - en:us:pat:exc:2 en-gb:gb:pat:exc:3 nl:nl:pat:exc:4
language load time        - 0.094 seconds, n=4
jobdata time              - 0.062 seconds saving, 0.031 seconds loading
callbacks                 - direct: 98199, indirect: 20257, total: 118456
xml load time             - 0.000 seconds, lpath calls: 46, cached calls: 31
v-node processing time    - 0.234 seconds
loaded fonts              - 69 files: ....
fonts load time           - 28.205 seconds
metapost processing time  - 0.421 seconds, loading: 0.016 seconds,
                            execution: 0.203 seconds, n: 65
graphics processing time  - 0.125 seconds including tex, n=7
result saved in file      - mk.pdf
metapost font generation  - 0 glyphs, 0.000 seconds runtime
metapost font loading     - 0.187 seconds, 40 instances,
                            213.904 instances/second
luatex banner             - this is luatex, version beta-0.42.0
control sequences         - 34449 of 147189
current memory usage      - 454 MB (ctx: 465 MB)
runtime                   - 50.326 seconds, 316 processed pages,
                            316 shipped pages, 6.279 pages/second
\stoptyping
\stop

Looking at the Metafun manual one might expect that one needs
even more time per page but this is not true. We use \OPENTYPE\
fonts in base mode as we don't use fancy font features (base mode
uses traditional \TEX\ methods). Most interesting here is the time
involved in processing \METAPOST\ graphics. There are a lot of
them (1772) and in addition we have 7 calls to independent
\CONTEXT\ runs that take one third of the total runtime. About
half of the runtime involves graphics.

\start \switchtobodyfont[small]
\starttyping
input load time           - 0.109 seconds
stored bytecode data      - 184 modules, 45 tables, 229 chunks
node list callback tasks  - 4 unique tasks, 4 created, 33510 calls
cleaned up reserved nodes - 39 nodes, 93 lists of 1432
node memory usage         - 249 attribute, 19 glue_spec, 82 attribute_list,
                            85 local_par, 2 dir
h-node processing time    - 0.562 seconds including kernel
attribute processing time - 2.512 seconds
used backend              - pdf (backend for directly generating pdf output)
loaded patterns           - en:us:pat:exc:2
jobdata time              - 0.094 seconds saving, 0.031 seconds loading
callbacks                 - direct: 143950, indirect: 28492, total: 172442
interactive elements      - 214 references, 371 destinations
v-node processing time    - 0.250 seconds
loaded fonts              - 45 files: l.....
fonts load time           - 1.794 seconds
metapost processing time  - 5.585 seconds, loading: 0.047 seconds,
                            execution: 2.371 seconds, n: 1772,
                            external: 15.475 seconds (7 calls)
mps conversion time       - 0.000 seconds, 1 conversions
graphics processing time  - 0.499 seconds including tex, n=74
result saved in file      - metafun.pdf
luatex banner             - this is luatex, version beta-0.42.0
control sequences         - 32587 of 147189
current memory usage      - 113 MB (ctx: 115 MB)
runtime                   - 43.368 seconds, 362 processed pages,
                            362 shipped pages, 8.347 pages/second
\stoptyping
\stop

By now it will be clear that processing a document takes a bit of
time. However, keep in mind that these documents are a bit
atypical. Although \unknown\ the average \CONTEXT\ document
probably uses color (including color spaces that involve resource
management), and has multiple layers, which involves some testing of
the roughly 30 areas that make up the page. And there is the
user interface that comes with a price.

It might be good to say a bit more about fonts. In \CONTEXT\ we
use symbolic names and often a chain of them, so the abstract
\type {SerifBold} resolves to \type {MyNiceFontSerif-Bold} which
in turn resolves to \type {mnfs-bold.otf}. As \XETEX\ introduced
lookup by internal (or system) fontname instead of filename,
\MKII\ also provides that method but \MKIV\ adds some heuristics
to it. Users can specify font sizes in traditional \TEX\ units but
also relative to the body font. All this involves a bit of
expansion (resolving the chain) and parsing (of the
specification). At each of the levels of name abstraction we can
have associated parameters, like features, fallbacks and more.
Although these mechanisms are quite optimized this still comes at a
performance price.

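The name chain and the two kinds of size specification mentioned
above can be sketched as follows. The font names are the made|-|up
ones from the text, so this fragment only runs when such a font
exists, and \type {MyFixedFont} and \type {MyRelativeFont} are
hypothetical names introduced only for this illustration.

\starttyping
% abstract name -> font name -> file, with features attached at the last level
\definefontsynonym [SerifBold]            [MyNiceFontSerif-Bold]
\definefontsynonym [MyNiceFontSerif-Bold] [file:mnfs-bold.otf] [features=default]

% a size in traditional units
\definefont [MyFixedFont]    [SerifBold at 14pt]

% a size relative to the body font (1.2 times the current body font size)
\definefont [MyRelativeFont] [SerifBold sa 1.2]
\stoptyping
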
Also, in the default \MKIV\ font setup we use a couple more
font variants (as they are available in Latin Modern). We've kept
definitions sort of dynamic so you can change them and combine
them in many ways. Definitions are collected in typescripts which
are filtered. We support multiple mixed font sets, which take a bit
of time to define, but switching is generally fast. Compared to \MKII\
the model lacks the (font) encoding and case handling code (here
we gain speed) but it now offers fallback fonts (replaced ranges
within fonts) and dynamic \OPENTYPE\ font feature switching. When
used we might lose a bit of processing speed although fewer
definitions are needed which gets us some back. The font subsystem
is anyway a factor in the performance, if only because more
complex scripts or font features demand extensive node list
parsing.

Processing The \TEX book on Taco's machine takes some
3.5 seconds in \PDFTEX\ and 5.5 seconds in \LUATEX. This is
because \LUATEX\ internally is \UNICODE\ and has a larger memory
space. The two seconds of extra runtime are consistent with this. One
of the reasons that The \TEX book processes fast is that the font
system is not that complex and has hardly any overhead, and an
efficient output routine is used. The format file is small and the
macro set is optimal for the task. The coding is rather low level
so to say (no layers of interfacing). Anyway, 100 pages per second
is not bad at all and we don't come close with \CONTEXT\ and the
kind of documents that we produce there.

This made me curious as to how fast really dumb documents could be
processed. It does not make sense to compare plain \TEX\ and
\CONTEXT\ because they do different things. Instead I decided to
look at differences in engines and compare runs with different
numbers of pages. That way we get an idea of how startup time
influences overall performance. We look at \PDFTEX, which is
basically an 8-bit system, \XETEX, which uses external libraries and is
\UNICODE, and \LUATEX, which is also \UNICODE\ and stays closer to
traditional \TEX\ but has to check for callbacks.

In our measurement we use a really simple test document as we only
want to see how the baseline performs. As not much content is
processed, we focus on loading (startup), the output routine and
page building, and some basic \PDF\ generation. After all, it's
often a quick and dirty test that gives users their first
impression. When looking at the times you need to keep in mind
that \XETEX\ pipes to \DVIPDFMX\ and can benefit from multiple
cpu cores. All systems have different memory management and garbage
collection might influence performance (as demonstrated in an
earlier chapter of the \quote{mk} document we can trace in detail
how the runtime is distributed). As terminal output is a significant
slowdown for \TEX\ we run in batchmode. The test is as follows:

\starttyping
\starttext
    \dorecurse{2000}{test\page}
\stoptext
\stoptyping

On my laptop (Dell M90 with 2.3GHz T7600 Core 2 and 4GB memory
running Vista) I get the following results. The test script ran
each test set 5~times and we show the fastest run so we kind of
avoid interference with other processes that take time. In
practice runtime differs quite a bit for similar runs, depending
on the system load. The time is in seconds and between parentheses
the number of pages per second is mentioned.

% \starttabulate[||||||]
% \NC \bf engine \NC 30 \NC 300 \NC 2000 \NC 10000 \NC \NR
% \HL
% \NC \bf xetex  \NC 1.84 (16) 1.81 (16) \NC 2.51 (119) 2.45 (122) \NC 7.38 (270) 6.97 (286) \NC 38.53 (259) 29.20 (342) \NC \NR
% \NC \bf pdftex \NC 1.32 (22) 1.28 (23) \NC 2.16 (138) 2.07 (144) \NC 7.34 (272) 6.96 (287) \NC 43.73 (228) 30.94 (323) \NC \NR
% \NC \bf luatex \NC 1.53 (19) 1.48 (20) \NC 2.41 (124) 2.36 (127) \NC 8.16 (245) 7.85 (254) \NC 44.67 (223) 34.34 (291) \NC \NR
% \stoptabulate

\starttabulate[||||||]
\NC \bf engine \NC 30 \NC 300 \NC 2000 \NC 10000 \NC \NR
\HL
\NC \bf xetex  \NC 1.81 (16) \NC 2.45 (122) \NC 6.97 (286) \NC 29.20 (342) \NC \NR
\NC \bf pdftex \NC 1.28 (23) \NC 2.07 (144) \NC 6.96 (287) \NC 30.94 (323) \NC \NR
\NC \bf luatex \NC 1.48 (20) \NC 2.36 (127) \NC 7.85 (254) \NC 34.34 (291) \NC \NR
\stoptabulate

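The influence of startup time can be made explicit with a simple
linear model. This model is an assumption made for illustration,
not something that was measured separately. Suppose a run of $n$
pages takes

\startformula
t(n) \approx t_0 + c \cdot n
\stopformula

seconds, with $t_0$ the startup time and $c$ the time per page.
The 30 and 10000 page \LUATEX\ runs in the table above then give

\startformula
c \approx \frac{34.34 - 1.48}{10000 - 30} \approx 0.0033
\qquad
t_0 \approx 1.48 - 30 \times 0.0033 \approx 1.38
\stopformula

so roughly 3.3 milliseconds per page on top of about 1.4 seconds
of startup. With these values the model predicts 2.37 seconds for
300 pages (2.36 measured) and 7.97 seconds for 2000 pages (7.85
measured), which suggests that for short runs the startup time
dominates.
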
The next table shows the same test but this time on a 2.5GHz E5420
quad core server with 16GB memory running Linux, but with 6
virtual machines idling in the background. All binaries are 64 bit.

% \starttabulate[||||||]
% \NC \bf engine \NC 30 \NC 300 \NC 2000 \NC 10000 \NC \NR
% \HL
% \NC \bf xetex  \NC 0.94 (31) 0.92 (32) \NC 2.00 (150) 1.89 (158) \NC 9.02 (221) 8.74 (228) \NC 42.41 (235) 42.19 (237) \NC \NR
% \NC \bf pdftex \NC 0.51 (58) 0.49 (61) \NC 1.19 (251) 1.14 (262) \NC 5.34 (374) 5.23 (382) \NC 25.16 (397) 24.66 (405) \NC \NR
% \NC \bf luatex \NC 1.09 (27) 1.07 (27) \NC 2.06 (145) 1.99 (150) \NC 8.72 (229) 8.32 (240) \NC 40.10 (249) 38.22 (261) \NC \NR
% \stoptabulate

\starttabulate[||||||]
\NC \bf engine \NC 30 \NC 300 \NC 2000 \NC 10000 \NC \NR
\HL
\NC \bf xetex  \NC 0.92 (32) \NC 1.89 (158) \NC 8.74 (228) \NC 42.19 (237) \NC \NR
\NC \bf pdftex \NC 0.49 (61) \NC 1.14 (262) \NC 5.23 (382) \NC 24.66 (405) \NC \NR
\NC \bf luatex \NC 1.07 (27) \NC 1.99 (150) \NC 8.32 (240) \NC 38.22 (261) \NC \NR
\stoptabulate

A test demonstrated that for \LUATEX\ the 30 and 300 page runs
take 70\% more runtime with 32 bit binaries (recent binaries for
these engines are available on the \CONTEXT\ wiki \type
{contextgarden.net}).

When you compare both tables it will be clear that it is
non|-|trivial to come to conclusions about performance. But one thing
is clear: \LUATEX\ with \CONTEXT\ \MKIV\ is not performing that
badly compared to its cousins. The \UNICODE\ engines perform about
the same and \PDFTEX\ beats them significantly. Okay, I have to
admit that in the meantime some cleanup of code in \MKIV\ has
happened and the \LUATEX\ runs benefit from this, but on the other
hand, the other engines are not hindered by callbacks. As I expect
to use \MKII\ less frequently, optimizing the older code makes no
sense.

There is not much chance of \LUATEX\ itself becoming faster,
although a few days before writing this Taco managed to speed up
font inclusion in the backend code significantly (we're talking
about half a second to a second for the three documents used
here). On the contrary, when we open up more mechanisms and have
upgraded backend code it might actually be a bit slower. On the
other hand, I expect to be able to clean up some more \CONTEXT\
code, although we already got rid of some subsystems (like the
rather flexible (mixed) font encoding, where each language could
have multiple hyphenation patterns, etc.). Also, although initial
loading of math fonts might take a bit more time (as long as we
use virtual Latin Modern math), font switching is more efficient
now due to fewer families. But speedups in the \CONTEXT\ code might
be compensated for by more advanced mechanisms that call out to \LUA.
You will be surprised by how much speed can be improved by proper
document encoding and proper styles. I can try to gain a couple
more pages per second by more efficient code, but a user's style
that does an inefficient massive font switch for some 10 words per
page easily undoes that gain.

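To make the remark about font switches a bit more concrete, here
is a hedged sketch. The assumption, not measured here, is that a
complete body font switch costs considerably more than selecting a
font instance that was defined once. The name \type {KeywordFont}
is hypothetical.

\starttyping
% costly when repeated for a few words on every page:
% each use sets up a complete body font environment
{\switchtobodyfont[12pt]keyword}

% cheaper: define one instance in the style
\definefont [KeywordFont] [SerifBold sa 1]

% and only select that instance where it is needed
{\KeywordFont keyword}
\stoptyping
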
When processing this 10 page chapter in an editor (Scite) it takes
some 2.7 seconds between hitting the processing key and the result
showing up in Acrobat. I can live with that, especially when I
keep in mind that my next computer will be faster.

This is where we stand now. The three reports shown before give
you an impression of the impact of \LUATEX\ on \CONTEXT. To what
extent is this reflected in the code base? We end this chapter
by showing four tables. The first table shows the number of
files that make up the core of \CONTEXT\ (modules are excluded).
The second table shows the accumulated size of these files
(comments and spacing stripped). The third and fourth tables show
the same information in a different way, just to give you a better
impression of the relative number of files and sizes. The four
character tags represent the file groups, so the files have
names like \type {node-ini.mkiv}, \type {font-otf.lua} and
\type {supp-box.tex}.

Eventually most \MKII\ files (with the \type {mkii} suffix) and
\MKIV\ files (with suffix \type {mkiv}) will differ and the number
of files with the \type {tex} suffix will be fewer. Because they
are and will be mostly downward compatible, styles and modules
will be shared as much as possible.

\placefigure[none,90,page]{}{\externalfigure[mk-last-state.pdf][page=1,width=\the\textheight]}
\placefigure[none,90,page]{}{\externalfigure[mk-last-state.pdf][page=2,width=\the\textheight]}
\placefigure[none,90,page]{}{\externalfigure[mk-last-state.pdf][page=3,width=\the\textheight]}
\placefigure[none,90,page]{}{\externalfigure[mk-last-state.pdf][page=4,width=\the\textheight]}

\stopcomponent