\startcomponent mkarabic

\environment mkenvironment

\chapter{Where do we stand}

In the previous chapter we discussed the state of \LUATEX\ at the
beginning of 2009, the prelude to version 0.50. We consider the
release of the 0.50 version to be really important, both for
\LUATEX\ and for \MKIV, so here I will reflect on the state
around this release. I will do this from the perspective of
processing documents because usability is an important measure.

There are several reasons why \LUATEX\ 0.50 is an important release,
both for \LUATEX\ and for \MKIV. Let's start with \LUATEX.

\startitemize
\startitem Apart from a couple of bug fixes, the current version
is pretty usable and stable. Details of what we've reached so far
have been presented previously. \stopitem

\startitem The code base has been converted from \PASCAL\ to
\CCODE, and as a result the source tree has become simpler (being
\CWEB\ compliant happens around 0.60). This transition also opens
up the possibility to start looking into some of the more tricky
internals, like page building. \stopitem

\startitem Most of the front end has been opened up and the new
backend code is getting into shape. As the backend was partly already
done in \CCODE, the moment has come to do a real cleanup. Keep in mind
that we started with \PDFTEX\ and that much of its extra functionality
is rather interwoven with traditional \TEX\ code. \stopitem

\stopitemize

If we look at \CONTEXT, we've also reached a crucial point in the
upgrade.

\startitemize
\startitem The code base is now divided into \MKII\ and \MKIV. This
permits us not only to reimplement bits and pieces (something that
was already in progress) but also to clean up the code (only
\MKIV). \stopitem

\startitem If you kept up with the development, you already know
the kind of tasks we can (and do) delegate to \LUA. Just to
mention a few: file handling, font loading and \OPENTYPE\
processing, casing and some spacing issues, everything related to
graphics and \METAPOST, language support, color and other
attributes, input regimes, \XML, multipass data, etc. \stopitem

\startitem Recently all backend related code was moved to
\LUA\ and the code dealing with hyperlinks, widgets and the like is
now mostly moved away from \TEX. The related cleanup was possible
because we no longer have to deal with a mix of \DVI\ drivers too.
\stopitem

\startitem Everything related to structure (which includes
numbering and multipass data like tables of contents and
registers) is now delegated to \LUA. We move around way more
information and will extend these mechanisms in the near future.
\stopitem

\stopitemize

Tracing on Taco's machine has shown that when processing the
\LUATEX\ reference manual the engine spends about 10\%
of the time on getting tokens, 15\% on macro expansion, and some
50\% on \LUA\ (callback interfacing included). Especially the time
spent by \LUA\ differs per document and garbage collection seems
to be a bottleneck here. So, let's wrap up how \LUATEX\ performs
around the time of 0.50.

We use three documents for testing (intermediate) \LUATEX\
binaries: the reference manual, the history document \quote{mk},
and the revised Metafun manual. The reference manual has a
\METAPOST\ graphic on each page which is positioned using the
\CONTEXT\ background layering mechanism. This mechanism is active
only when backgrounds are defined and has some performance
consequences for the page builder. However, most time is spent on
constructing the tables (tabulate) and because these can contain
paragraphs that can run over multiple pages, constructing a table
takes a few analysis passes per table plus some so-called
vsplitting. We load some fonts (including narrow variants) but for
the rest this document is not that complex. Of course colors are
used as well as hyperlinks.

The report at the end of the run looks as follows:

\start \switchtobodyfont[small]
\starttyping
input load time            0.109 seconds
stored bytecode data       184 modules, 45 tables, 229 chunks
node list callback tasks   4 unique tasks, 4 created, 20980 calls
cleaned up reserved nodes  29 nodes, 10 lists of 1427
node memory usage          19 gluespec, 2 dir
hnode processing time      0.312 seconds including kernel
attribute processing time  1.154 seconds
used backend               pdf (backend for directly generating pdf output)
loaded patterns            en:us:pat:exc:2
jobdata time               0.078 seconds saving, 0.047 seconds loading
callbacks                  direct: 86692, indirect: 13364, total: 100056
interactive elements       178 references, 356 destinations
vnode processing time      0.062 seconds
loaded fonts               43 files: ....
fonts load time            1.030 seconds
metapost processing time   0.281 seconds, loading: 0.016 seconds,
                           execution: 0.156 seconds, n: 161
result saved in file       luatexreft.pdf
luatex banner              this is luatex, version beta0.42.0
control sequences          31880 of 147189
current memory usage       106 MB (ctx: 108 MB)
runtime                    12.433 seconds, 164 processed pages,
                           164 shipped pages, 13.191 pages/second
\stoptyping
\stop
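
The throughput on the last line of the report is just the number of
shipped pages divided by the total runtime, so it includes startup and
font loading. As a quick check for this run:

\startformula
\frac{164 \hbox{ pages}}{12.433 \hbox{ s}} \approx 13.19 \hbox{ pages/s}
\stopformula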

The runtime is influenced by the fact that some startup time and
font loading takes place. The more pages your document has, the
less the runtime is influenced by this.

More demanding is the \quote {mk} document (figure~\ref{fig.mk}). Here
we have many fonts, including some really huge \CJK\ and Arabic ones
(and these are loaded at several sizes and with different features).
The reported font load time is large but this is partly due to the
fact that on my machine for some reason passing the tables to \TEX\
involved a lot of page faults (we think that the cpu cache is the
culprit). Older versions of \LUATEX\ didn't have that performance
penalty, so probably half of the reported font loading time is kind of
wasted.

The hnode processing time refers mostly to \OPENTYPE\ font
processing and attribute processing time has to do with backend
issues (like injecting color directives). The more features you
enable, the larger these numbers get. The \METAPOST\ font loading
refers to the punk font instances.

\start \switchtobodyfont[small]
\starttyping
input load time            0.125 seconds
stored bytecode data       184 modules, 45 tables, 229 chunks
node list callback tasks   4 unique tasks, 4 created, 24295 calls
cleaned up reserved nodes  116 nodes, 29 lists of 1411
node memory usage          21 attribute, 23 gluespec, 7 attributelist,
                           7 localpar, 2 dir
hnode processing time      1.763 seconds including kernel
attribute processing time  2.231 seconds
used backend               pdf (backend for directly generating pdf output)
loaded patterns            en:us:pat:exc:2 engb:gb:pat:exc:3 nl:nl:pat:exc:4
language load time         0.094 seconds, n=4
jobdata time               0.062 seconds saving, 0.031 seconds loading
callbacks                  direct: 98199, indirect: 20257, total: 118456
xml load time              0.000 seconds, lpath calls: 46, cached calls: 31
vnode processing time      0.234 seconds
loaded fonts               69 files: ....
fonts load time            28.205 seconds
metapost processing time   0.421 seconds, loading: 0.016 seconds,
                           execution: 0.203 seconds, n: 65
graphics processing time   0.125 seconds including tex, n=7
result saved in file       mk.pdf
metapost font generation   0 glyphs, 0.000 seconds runtime
metapost font loading      0.187 seconds, 40 instances,
                           213.904 instances/second
luatex banner              this is luatex, version beta0.42.0
control sequences          34449 of 147189
current memory usage       454 MB (ctx: 465 MB)
runtime                    50.326 seconds, 316 processed pages,
                           316 shipped pages, 6.279 pages/second
\stoptyping
\stop

Looking at the Metafun manual one might expect that one needs
even more time per page but this is not true. We use \OPENTYPE\
fonts in base mode as we don't use fancy font features (base mode
uses traditional \TEX\ methods). Most interesting here is the time
involved in processing \METAPOST\ graphics. There are a lot of
them (1772) and in addition we have 7 calls to independent
\CONTEXT\ runs that take one third of the total runtime. About
half of the runtime involves graphics.

\start \switchtobodyfont[small]
\starttyping
input load time            0.109 seconds
stored bytecode data       184 modules, 45 tables, 229 chunks
node list callback tasks   4 unique tasks, 4 created, 33510 calls
cleaned up reserved nodes  39 nodes, 93 lists of 1432
node memory usage          249 attribute, 19 gluespec, 82 attributelist,
                           85 localpar, 2 dir
hnode processing time      0.562 seconds including kernel
attribute processing time  2.512 seconds
used backend               pdf (backend for directly generating pdf output)
loaded patterns            en:us:pat:exc:2
jobdata time               0.094 seconds saving, 0.031 seconds loading
callbacks                  direct: 143950, indirect: 28492, total: 172442
interactive elements       214 references, 371 destinations
vnode processing time      0.250 seconds
loaded fonts               45 files: l.....
fonts load time            1.794 seconds
metapost processing time   5.585 seconds, loading: 0.047 seconds,
                           execution: 2.371 seconds, n: 1772,
                           external: 15.475 seconds (7 calls)
mps conversion time        0.000 seconds, 1 conversions
graphics processing time   0.499 seconds including tex, n=74
result saved in file       metafun.pdf
luatex banner              this is luatex, version beta0.42.0
control sequences          32587 of 147189
current memory usage       113 MB (ctx: 115 MB)
runtime                    43.368 seconds, 362 processed pages,
                           362 shipped pages, 8.347 pages/second
\stoptyping
\stop

By now it will be clear that processing a document takes a bit of
time. However, keep in mind that these documents are a bit
atypical. Although \unknown\ the average \CONTEXT\ document
probably uses color (including color spaces that involve resource
management) and has multiple layers, which involves some testing of
the roughly 30 areas that make up the page. And there is the
user interface, which comes at a price.

It might be good to say a bit more about fonts. In \CONTEXT\ we
use symbolic names and often a chain of them, so the abstract
\type {SerifBold} resolves to \type {MyNiceFontSerifBold} which
in turn resolves to \type {mnfsbold.otf}. As \XETEX\ introduced
lookup by internal (or system) fontname instead of filename,
\MKII\ also provides that method but \MKIV\ adds some heuristics
to it. Users can specify font sizes in traditional \TEX\ units but
also relative to the body font. All this involves a bit of
expansion (resolving the chain) and parsing (of the
specification). At each of the levels of name abstraction we can
have associated parameters, like features, fallbacks and more.
Although these mechanisms are quite optimized, this still comes at a
performance price.
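
A minimal sketch of such a chain, using the names from the paragraph
above, could look as follows. The \type {file:} lookup, the \type
{features=default} parameter and the two \type {\definefont} names
(\type {FixedTitle} and \type {RelativeTitle}) are assumptions made for
illustration and are not taken from an actual typescript.

\starttyping
% symbolic name -> intermediate name -> file (illustrative names only)
\definefontsynonym [SerifBold]           [MyNiceFontSerifBold]
\definefontsynonym [MyNiceFontSerifBold] [file:mnfsbold.otf] [features=default]

% a size in traditional TeX units, and one scaled relative to the body font
\definefont [FixedTitle]    [SerifBold at 14pt]
\definefont [RelativeTitle] [SerifBold sa 1.4]
\stoptyping

Each use of \type {SerifBold} then goes through both synonyms before
the file is found, which is the expansion and parsing overhead
mentioned above.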

Also, in the default \MKIV\ font setup we use a couple more
font variants (as they are available in Latin Modern). We've kept
definitions sort of dynamic so you can change them and combine
them in many ways. Definitions are collected in typescripts which
are filtered. We support multiple mixed font sets which takes a bit
of time to define but switching is generally fast. Compared to \MKII\
the model lacks the (font) encoding and case handling code (here
we gain speed) but it now offers fallback fonts (replaced ranges
within fonts) and dynamic \OPENTYPE\ font feature switching. When
used we might lose a bit of processing speed although fewer
definitions are needed which gets us some back. The font subsystem
is anyway a factor in the performance, if only because more
complex scripts or font features demand extensive node list
parsing.

Processing The \TEX book on Taco's machine takes some
3.5 seconds with \PDFTEX\ and 5.5 seconds with \LUATEX. This is
because \LUATEX\ internally is \UNICODE\ and has a larger memory
space. The few seconds more runtime are consistent with this. One
of the reasons that The \TEX book processes fast is that the font
system is not that complex and has hardly any overhead, and an
efficient output routine is used. The format file is small and the
macro set is optimal for the task. The coding is rather low level
so to say (no layers of interfacing). Anyway, 100 pages per second
is not bad at all and we don't come close with \CONTEXT\ and the
kind of documents that we produce there.

This made me curious as to how fast really dumb documents could be
processed. It does not make sense to compare plain \TEX\ and
\CONTEXT\ because they do different things. Instead I decided to
look at differences in engines and compare runs with different
numbers of pages. That way we get an idea of how startup time
influences overall performance. We look at \PDFTEX, which is
basically an 8-bit system, \XETEX, which uses external libraries and
is \UNICODE, and \LUATEX, which is also \UNICODE\ and stays closer to
traditional \TEX\ but has to check for callbacks.

In our measurement we use a really simple test document as we only
want to see how the baseline performs. As not much content is
processed, we focus on loading (startup), the output routine and
page building, and some basic \PDF\ generation. After all, it's
often a quick and dirty test that gives users their first
impression. When looking at the times you need to keep in mind
that \XETEX\ pipes to \DVIPDFMX\ and can benefit from multiple
cpu cores. All systems have different memory management and garbage
collection might influence performance (as demonstrated in an
earlier chapter of the \quote{mk} document we can trace in detail
how the runtime is distributed). As terminal output is a significant
slowdown for \TEX\ we run in batchmode. The test is as follows:

\starttyping
\starttext
  \dorecurse{2000}{test\page}
\stoptext
\stoptyping

On my laptop (Dell M90 with 2.3GHz T7600 Core 2 and 4GB memory
running Vista) I get the following results. The test script ran
each test set 5~times and we show the fastest run so we kind of
avoid interference with other processes that take time. In
practice runtime differs quite a bit for similar runs, depending
on the system load. The time is in seconds and the number of pages
per second is given between parentheses.

\starttabulate[|l|c|c|c|c|]
\NC \bf engine \NC 30 \NC 300 \NC 2000 \NC 10000 \NC \NR
\HL
\NC \bf xetex \NC 1.81 (16) \NC 2.45 (122) \NC 6.97 (286) \NC 29.20 (342) \NC \NR
\NC \bf pdftex \NC 1.28 (23) \NC 2.07 (144) \NC 6.96 (287) \NC 30.94 (323) \NC \NR
\NC \bf luatex \NC 1.48 (20) \NC 2.36 (127) \NC 7.85 (254) \NC 34.34 (291) \NC \NR
\stoptabulate
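
One rough way to separate startup time from the cost per page is to
assume a linear model $t(n) \approx t_0 + c\,n$ for a run of $n$ pages.
This model is a simplification introduced here for illustration and not
something the measurements guarantee. Taking the 30 and 10000 page
\LUATEX\ runs from the laptop table gives:

\startformula
c \approx \frac{34.34 - 1.48}{10000 - 30} \approx 0.0033 \hbox{ s per page},
\qquad
t_0 \approx 1.48 - 30 \times 0.0033 \approx 1.38 \hbox{ s}
\stopformula

Under that assumption the 2000 page run would take about $1.38 + 2000
\times 0.0033 \approx 8.0$ seconds, which is close to the measured
7.85. So in the 30 page run nearly all of the runtime is startup, while
in the 10000 page run startup hardly matters.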

The next table shows the same test but this time on a 2.5GHz E5420
quad core server with 16GB memory running Linux, but with 6
virtual machines idling in the background. All binaries are 64-bit.

\starttabulate[|l|c|c|c|c|]
\NC \bf engine \NC 30 \NC 300 \NC 2000 \NC 10000 \NC \NR
\HL
\NC \bf xetex \NC 0.92 (32) \NC 1.89 (158) \NC 8.74 (228) \NC 42.19 (237) \NC \NR
\NC \bf pdftex \NC 0.49 (61) \NC 1.14 (262) \NC 5.23 (382) \NC 24.66 (405) \NC \NR
\NC \bf luatex \NC 1.07 (27) \NC 1.99 (150) \NC 8.32 (240) \NC 38.22 (261) \NC \NR
\stoptabulate

A test demonstrated that for \LUATEX\ the 30 and 300 page runs
take 70\% more runtime with 32-bit binaries (recent binaries for
these engines are available on the \CONTEXT\ wiki \type
{contextgarden.net}).

When you compare both tables it will be clear that it is
nontrivial to come to conclusions about performance. But one thing
is clear: \LUATEX\ with \CONTEXT\ \MKIV\ is not performing that
badly compared to its cousins. The \UNICODE\ engines perform about
the same and \PDFTEX\ beats them significantly. Okay, I have to
admit that in the meantime some cleanup of code in \MKIV\ has
happened and the \LUATEX\ runs benefit from this, but on the other
hand, the other engines are not hindered by callbacks. As I expect
to use \MKII\ less frequently, optimizing the older code makes no
sense.

There is not much chance of \LUATEX\ itself becoming faster,
although a few days before writing this Taco managed to speed up
font inclusion in the backend code significantly (we're talking
about half a second to a second for the three documents used
here). On the contrary, when we open up more mechanisms and have
upgraded backend code it might actually be a bit slower. On the
other hand, I expect to be able to clean up some more \CONTEXT\
code, although we already got rid of some subsystems (like the
rather flexible (mixed) font encoding, where each language could
have multiple hyphenation patterns, etc.). Also, although initial
loading of math fonts might take a bit more time (as long as we
use virtual Latin Modern math), font switching is more efficient
now due to fewer families. But speedups in the \CONTEXT\ code might
be compensated for by more advanced mechanisms that call out to \LUA.
You will be surprised by how much speed can be improved by proper
document encoding and proper styles. I can try to gain a couple
more pages per second by more efficient code, but a user's style
that does an inefficient massive font switch for some 10 words per
page easily compensates for that.

When processing this 10 page chapter in an editor (Scite) it takes
some 2.7 seconds between hitting the processing key and the result
showing up in Acrobat. I can live with that, especially when I
keep in mind that my next computer will be faster.

This is where we stand now. The three reports shown before give
you an impression of the impact of \LUATEX\ on \CONTEXT. To what
extent is this reflected in the code base? We end this chapter
by showing four tables. The first table shows the number of
files that make up the core of \CONTEXT\ (modules are excluded).
The second table shows the accumulated size of these files
(comments and spacing stripped). The third and fourth table show
the same information in a different way, just to give you a better
impression of the relative number of files and sizes. The
four-character tags represent the file groups, so the files have
names like \type {node-ini.mkiv}, \type {font-otf.lua} and
\type {supp-box.tex}.

Eventually most \MKII\ files (with the \type {mkii} suffix) and
\MKIV\ files (with suffix \type {mkiv}) will differ and the number
of files with the \type {tex} suffix will be fewer. Because they
are and will be mostly downward compatible, styles and modules
will be shared as much as possible.

\placefigure[none,90,page]{}{\externalfigure[mklaststate.pdf][page=1,width=\the\textheight]}
\placefigure[none,90,page]{}{\externalfigure[mklaststate.pdf][page=2,width=\the\textheight]}
\placefigure[none,90,page]{}{\externalfigure[mklaststate.pdf][page=3,width=\the\textheight]}
\placefigure[none,90,page]{}{\externalfigure[mklaststate.pdf][page=4,width=\the\textheight]}

\stopcomponent