% onandon-performance.tex (35 Kb), last modification: 2023-12-21 09:43
% language=us

% no zero timing compensation, just simple tests
% m4all book

\startcomponent onandon-performance

\environment onandon-environment

\startchapter[title=Performance]

\startsection[title=Introduction]

This chapter is about performance. Although it concerns \LUATEX, this text is
only meant for \CONTEXT\ users. This is not because they ever complain about
performance; on the contrary, I have never received a complaint from them. No,
it's because it gives them some ammunition against the occasional nagging
about the speed of \LUATEX\ (somewhere on the web or at some meeting). My
experience is that in most such cases those complaining have no clue what they're
talking about, so effectively we could just ignore them, but let's, for the sake
of our users, waste some words on the issue.

\stopsection

\startsection[title=What performance]

So what exactly does performance refer to? If you use \CONTEXT\ there are
probably only two things that matter:

\startitemize[packed]
\startitem How long does one run take? \stopitem
\startitem How many runs do I need? \stopitem
\stopitemize

Processing speed is reported at the end of a run in terms of seconds spent on the
run, but also in pages per second. The runtime is made up of three
components:

\startitemize[packed]
\startitem start-up time \stopitem
\startitem processing pages \stopitem
\startitem finishing the document \stopitem
\stopitemize

The start-up time is rather constant. Let's take my 2013 Dell Precision with an
i7-3840QM as a reference. A simple

\starttyping
\starttext
\stoptext
\stoptyping

document reports 0.4 seconds but, as we wrap the run in an \type {mtxrun}
management run, we have an additional 0.3 seconds of overhead (auxiliary file
handling, \PDF\ viewer management, etc.). This includes loading the Latin Modern
font. With \LUAJITTEX, these times are below 0.3 and 0.2 seconds. It might look
like a lot of overhead, but in edit|-|preview runs it feels snappy. One can try this:

\starttyping
\stoptext
\stoptyping

which brings down the time to about 0.2 seconds for both engines, but it doesn't
do anything useful in practice.

Finishing a document is not that demanding, because most of it gets flushed as we
go. The more (large) fonts we use, the longer it takes to finish a document, but,
on average, that time is not worth noticing. The main runtime contribution comes
from processing the pages.

Okay, this is not always true. For instance, if we process a 400 page book from
2500 small \XML\ files with multiple graphics per page, there is a little
overhead in loading the files and in constructing the \XML\ tree as well as in
inserting the graphics, but in such cases one expects a few seconds of extra
runtime. The \METAFUN\ manual has some 450 pages with over 2500 runtime|-|generated
\METAPOST\ graphics. It has color, uses quite some fonts, has lots of font
switches (verbatim, too), but, still, one run takes only 18 seconds in stock
\LUATEX\ and less than 15 seconds with \LUAJITTEX. Keep these numbers in
mind if a non|-|\CONTEXT\ user barks up the performance tree because his mediocre
document of a few pages takes 10 seconds to compile: the content, styling, quality
of macros and whatever one can come up with all play a role. Personally I find
any rate between 10 and 30 pages per second acceptable, and, if I get the lower
rate, then I normally know pretty well that the job is demanding in all kinds of
aspects.
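
To put the \METAFUN\ manual figures into the pages per second measure used above
(taking the approximate 450 pages at face value), the two runtimes work out as:

\startformula
\frac{450\ \hbox{pages}}{18\ \hbox{s}} = 25\ \hbox{pages/s}
\quad\hbox{and}\quad
\frac{450\ \hbox{pages}}{15\ \hbox{s}} = 30\ \hbox{pages/s}
\stopformula

so stock \LUATEX\ lands at about 25 pages per second and \LUAJITTEX\ at over 30,
both at the upper end of the range I consider acceptable.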

Over time, the \CONTEXT||\LUATEX\ combination, in spite of the fact that more
functionality has been added, has not become slower. In fact, some subsystems
have been sped up. For instance, font handling is very sensitive to adding
functionality. However, each version so far performed a bit better. Whenever some
neat new trickery was added, at the same time improvements were made thanks to
more insight into the matter. In practice, we're not talking about changes in speed
by large factors but by small percentages. I'm pretty sure that most \CONTEXT\
users never noticed. Recently, a 15\endash30\% speed up in font handling was
realized for more complex fonts, but only when you use such complex fonts and
pages full of text will you see a positive impact on the whole run.

There is one important factor I haven't mentioned yet: the efficiency of the
console. You can best check that by making a format (\typ {context --make en}).
When that is done by piping the messages to a file, it takes 3.2 seconds on my
laptop and about the same when done from the editor (\SCITE), maybe because the
\LUATEX\ run and the log pane run on a different thread. When I use the standard
console, it takes 3.8 seconds in the Windows 10 Creators Update (in older versions it
took 4.3 and slightly less when using a console wrapper). PowerShell takes
3.2 seconds, which is the same as piping to a file. Interestingly, in Bash
on Windows, it takes 2.8 seconds, and 2.6 seconds when piped to a file. Normal
runs are somewhat slower, but it looks like the 64 bit Linux binary is somewhat
faster than the 64 bit mingw version. \footnote {Long ago, we found that \LUATEX\
is very sensitive to, for instance, the \CPU\ cache, so maybe there are some
differences due to optimization flags and|/|or the fact that bash runs in one
thread, and all file \IO\ takes place in the main Windows instance. Who knows.}
Anyway, it demonstrates that when someone yells a number, you need to ask what the
conditions were.

At a \CONTEXT\ meeting, there was a presentation about a possible speed|-|up
of a run by using, for instance, a separate syntax checker to prevent a
useless run. However, the use case concerned a document that took a minute on the
machine used, while the same document took a few seconds on mine. At the same
meeting, we also compared the speed of a \LATEX\ run using \PDFTEX\ with
the same document migrated to \CONTEXT\ \MKIV\ using \LUATEX\ (Harald K\"onig's
\XML\ torture and compatibility test). Contrary to what one might expect, the
\CONTEXT\ run was significantly faster; the resulting document was a few
gigabytes in size.

\stopsection

\startsection[title=Bottlenecks]

I will discuss a few potential bottlenecks next. A complex integrated system like
\CONTEXT\ has lots of components and some can be quite demanding. However, when
something is not used, it has no (or hardly any) impact on performance. Even when
we spend a lot of time in \LUA, that is not the reason for a slow|-|down.
Sometimes using \LUA\ results in a speedup, sometimes it doesn't matter. Complex
mechanisms like natural tables, for instance, will not suddenly become less
complex. So, let's focus on the \quotation {aspects} that come up in those
complaints: fonts and \LUA. Because I only use \CONTEXT\ and occasionally test
with the plain \TEX\ version that we provide, I will not explore the potential
impact of using truckloads of packages, styles, and such, which I'm sure plays
a role, but one that I leave out of this discussion.

\startsubsubject[title=Fonts]

According to the principles of \LUATEX, we process (\OPENTYPE) fonts using \LUA.
That way, we have complete control over any aspect of font handling, and can, as
is to be expected in \TEX\ systems, provide users what they need, now and in the
future. In fact, if we hadn't had that freedom in \CONTEXT, I'd probably have
quit using \TEX\ a decade ago and found myself some other (programming) niche.

After a font has been loaded, part of the data gets passed to the \TEX\ engine,
so that it can do its work. For instance, in order to be able to typeset a
paragraph, \TEX\ needs to know the dimensions of glyphs. Once a font has been
loaded (that is, the binary blob) it's fetched from a cache the next time.
Initial loading (and preparation) takes some time, depending on the complexity
and the size of the font. Loading from cache is close to instantaneous. After
loading, the dimensions are passed to \TEX\ but all data remains accessible for
any desired usage. The \OPENTYPE\ feature processor, for instance, uses that data
and \CONTEXT, for sure, needs that data (quickly accessible) for different
purposes, too.

When a font is used in so|-|called base mode, we let \TEX\ do the ligaturing and
kerning. This is possible with simple fonts and features. If you have a
performance|-|critical workflow, you might enable base mode, which can be done per
font instance. Processing in node mode takes some time, but how much depends on
the font and script. Normally, there is no difference between \CONTEXT\ and
generic usage. In \CONTEXT, we also have dynamic features, and their impact on
performance depends on usage. In addition to base and node, we also have plug
mode, but that is only used for testing and therefore not advertised.
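
As a minimal sketch of enabling base mode per font instance, one can define a
feature set with \type {mode=base} and bind it to a single font, while the rest of
the document stays in node mode. The feature set names \type {mybase} and
\type {mynode} and the font names \type {MyBaseSerif} and \type {MyNodeSerif} are
made up for this example, and whether base mode is adequate depends on the font
and the features you need:

\starttyping
% hypothetical feature sets: same features, different processing mode
\definefontfeature[mybase][mode=base,script=latn,language=dflt,kern=yes,liga=yes]
\definefontfeature[mynode][mode=node,script=latn,language=dflt,kern=yes,liga=yes]

% bind each feature set to its own font instance (names are made up)
\definefont[MyBaseSerif][Serif*mybase sa 1]
\definefont[MyNodeSerif][Serif*mynode sa 1]

\starttext
    {\MyBaseSerif \input sapolsky \par} % ligatures and kerns done by TeX
    {\MyNodeSerif \input sapolsky \par} % features handled by the Lua font handler
\stoptext
\stoptyping

Because the setting lives in the feature set, switching a critical font between
both modes is a matter of changing one definition.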

Every \type {\hbox} and every paragraph goes through the font handler. Because
we support mixed modes, some analysis takes place, and because we do more in
\CONTEXT, the generic analyzer is more lightweight than ours, which again can mean
that a generic run is not slower than a similar \CONTEXT\ one.

Interestingly, the functionality added for variable and|/|or color fonts had no
impact on performance. Runtime|-|added user features can have some impact, but,
when defined well, that impact can be neglected. I bet that when you add
additional node list handling yourself, its impact on performance will be larger.
But in the end what counts is that the job gets done, and the more you demand,
the higher the price you pay.

\stopsubsubject

\startsubsubject[title=\LUA]

The second possible bottleneck when using \LUATEX\ is the use of \LUA\ code.
However, blaming that for slow runs is laughable. For instance,
\CONTEXT\ \MKIV\ can easily spend half its time in \LUA, and that is not making
it any slower than \MKII\ using \PDFTEX\ doing equally complex things. For
instance, the embedded \METAPOST\ library makes \MKIV\ way faster than \MKII, and
the built|-|in \XML\ processing capabilities in \MKIV\ can easily beat \MKII\
\XML\ handling, apart from the fact that it can do more, like filtering by path
and expression. In fact, files that take, say, half a minute in \MKIV\ could
well have taken 15 minutes or more in \MKII\ (and imagine multiple runs then).

So, for \CONTEXT, using \LUA\ to achieve its objectives is mandatory. The
combination of \TEX, \METAPOST\ and \LUA\ is pretty powerful! Each of these
components is really fast. If \TEX\ is your bottleneck, review your macros! When
\LUA\ seems to be the culprit, go over your code and make it better. Much of the
\LUA\ code I see flying around doesn't look that efficient, which is okay, because
the interpreter is really fast, but don't blame \LUA\ beforehand; blame your
coding (style) first. When \METAPOST\ is the bottleneck, well, sometimes not much
can be done about it, but when you know that language well enough, you can often
make it perform better.
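
A common example of the kind of inefficiency meant here is building a string
piece by piece with repeated concatenation. The following sketch, which can be
put in a \CONTEXT\ document, contrasts that with collecting the pieces in a table
and joining them once with \type {table.concat}. The loop size is arbitrary, and
the timings, measured with \type {os.clock} and written to the terminal and log
with \type {texio.write_nl}, will of course differ per machine:

\starttyping
\startluacode
local clock  = os.clock
local concat = table.concat

local n = 10000 -- arbitrary size for this test

-- repeated concatenation: every step creates a new, longer string
local t0 = clock()
local s  = ""
for i=1,n do
    s = s .. "x"
end

-- collecting the pieces in a table and joining them once
local t1    = clock()
local parts = { }
for i=1,n do
    parts[i] = "x"
end
local r  = concat(parts)
local t2 = clock()

assert(s == r) -- both variants build the same string

texio.write_nl("term and log", string.format(
    "concatenation: %0.4f s, table.concat: %0.4f s", t1 - t0, t2 - t1))
\stopluacode
\stoptyping

The second variant avoids creating thousands of intermediate strings, which is
the kind of change that matters more than the choice of engine.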

For the record: every additional mechanism that kicks in, like character spacing
(the ugly one), case treatments, special word and line trickery, marginal stuff,
graphics, line numbering, underlining, referencing, and a few dozen more, will add
a bit to the processing time. In that case, in \CONTEXT, the font|-|related runtime
gets pretty well obscured by other things happening, just so you know.

\stopsubsubject

\stopsection

\startsection[title=Some timing]

Next, I will show some timings related to fonts. For this, I use stock \LUATEX\
(second column) as well as \LUAJITTEX\ (last column), which, of course, performs
much better. The timings are rounded to three decimal places, but, as the system
load is usually only consistent within a set of test runs, the last two decimals only
matter in relative comparison. So, when comparing runs over time, round to the
first decimal. Let's start with loading a bodyfont. This happens once per
document, and one usually only has one bodyfont active. Loading involves
definitions as well as setting up math, so a couple of fonts are actually loaded
even if they're not used later on. A setup normally involves a serif, sans, mono
and math setup (in \CONTEXT). \footnote {The timing for Latin Modern is so low
because that font is already loaded.}

\environment onandon-speed-000

\ShowSample{onandon-speed-000} % bodyfont

There is a bit of a difference between the font sets, but a safe average is 150
milliseconds, and this is rather constant over runs.

An actual font switch can result in loading a font, but this is a one|-|time overhead.
Loading four variants (regular, bold, italic and bold italic) roughly takes the
following time:

\ShowSample{onandon-speed-001} % four variants

Using them again later on takes no time:

\ShowSample{onandon-speed-002} % four variants

Before we start timing the font handler, a few baseline benchmarks are shown.
When no font is applied and nothing else is done with the node list, we get:

\ShowSample{onandon-speed-009}

A simple monospaced, no|-|features|-|applied, run takes a bit more:

\ShowSample{onandon-speed-010}

Now, we show a one|-|font typesetting run. As with the two benchmarks before, we
just typeset a text in a \type {\hbox}, so no par builder interference happens.
We use the \type {sapolsky} sample text and typeset it 100 times 4 (that is, 400
times), first without font switches.

\ShowSample{onandon-speed-003}
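
Readers who want to make this kind of measurement in their own setup can use
something along the following lines. It assumes the \type {\testfeatureonce}
helper of \MKIV, which, as far as I know, repeats its second argument the given
number of times and reports the elapsed time on the console. This is a sketch, not
necessarily how the samples in this chapter were produced, so only compare runs
made with the same file on the same machine:

\starttyping
\starttext

% typeset the sample 100 times in a box, so that the
% par builder and page builder are not involved
\testfeatureonce {100} {
    \setbox\scratchbox\hbox{\input sapolsky \relax}
}

% the same, now with four font switches in each box
\testfeatureonce {100} {
    \setbox\scratchbox\hbox{%
        \tf \input sapolsky \relax
        \bf \input sapolsky \relax
        \it \input sapolsky \relax
        \bs \input sapolsky \relax
    }%
}

\stoptext
\stoptyping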

Much more runtime is needed when we typeset with four font switches. Ebgaramond
is the most demanding. Actually, we're not doing 4 fonts there because ebgaramond
has no bold, so the numbers are a bit lower than expected for this example. One
reason for it being demanding is that it has lots of (contextual) lookups.
Combining lookups saves space and time, so the complexity of a font is not always a
good predictor of performance hits.

% \ShowSample{onandon-speed-004}

If we typeset paragraphs, we get the following:

\ShowSample{onandon-speed-005}

We're talking about some 275 pages here.

\ShowSample{onandon-speed-006}

There is, of course, overhead in handling paragraphs and pages:

\ShowSample{onandon-speed-011}

Before I discuss these numbers in more detail, two more benchmarks are
shown. The next table concerns a paragraph with only a few (bold) words.

\ShowSample{onandon-speed-007}

The following table has paragraphs with a few monospaced words
typeset using \type{\type}.

\ShowSample{onandon-speed-008}

When a node list (hbox or paragraph) is processed, each glyph is looked at. One
important property of \LUATEX\ (compared to \PDFTEX) is that it hyphenates the
whole text, not only the most feasible spots. For the \type {sapolsky} snippet,
this results in 200 potential breakpoints registered in an equal number of
discretionary nodes. The snippet has 688 characters grouped into 125 words and,
because it's an English quote, we're not hampered by composed characters or
complex script handling. And, when we mention 100 runs, we actually mean
400 when font switching and bodyfonts are compared.

\startnarrower
    \showglyphs \showfontkerns
    \input sapolsky \wordright{Robert M. Sapolsky}
\stopnarrower

In order to get substitutions and positioning right, we need not only to consult
streams of glyphs but also their combinations with the preceding pre or replace
texts, or the trailing post and replace texts, of discretionaries. When a font has
somewhat more complex substitutions, as ebgaramond has, multiple (sometimes
hundreds of) passes over the list are made. This is why the more complex a font
is, the more runtime is involved.

Another factor, one you could easily deduce from the benchmarks, is intermediate
font switches. Even a few such switches (in the last benchmarks) already result
in a runtime penalty. The four|-|switch benchmarks show an impressive increase of
runtime, but it's good to know that such a situation seldom happens. It's also
important not to confuse, for instance, a verbatim snippet with a bold one. The
bold one does indeed lead to a pass over the list, but verbatim is normally
skipped, because it uses a font that needs no processing. That verbatim and bold
have the same penalty is mainly due to the fact that verbatim itself is costly:
the text is picked up using a different catcode regime and travels through \TEX\
and \LUA\ before it finally gets typeset. This relates to special treatments of
spacing, syntax highlighting, and such.

Also, keep in mind that the page examples are quite unrealistic. We use a layout
with no margins, just text from edge to edge.
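
A layout of that kind can be approximated with something like the following. This
is only a sketch of the idea, using standard \type {\setuplayout} keys, and not
necessarily the setup used for the samples shown here:

\starttyping
% text area spans the whole page: no back/top space, no header or footer
\setuplayout
  [backspace=0pt,
   topspace=0pt,
   header=0pt,
   headerdistance=0pt,
   footer=0pt,
   footerdistance=0pt,
   width=middle,
   height=middle]
\stoptyping

More text per page means more work per page, so page rates measured this way are
lower than those of a normal layout.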

\placefigure
  {\SampleTitle{onandon-speed-005}}
  {\externalfigure[onandon-speed-005][frame=on,orientation=90,width=.45\textheight]}

\placefigure
  {\SampleTitle{onandon-speed-006}}
  {\externalfigure[onandon-speed-006][frame=on,orientation=90,maxwidth=.45\textheight,maxheight=\textwidth]}

\placefigure
  {\SampleTitle{onandon-speed-007}}
  {\externalfigure[onandon-speed-007][frame=on,orientation=90,width=.45\textheight]}

\placefigure
  {\SampleTitle{onandon-speed-008}}
  {\externalfigure[onandon-speed-008][frame=on,orientation=90,width=.45\textheight]}

\placefigure
  {\SampleTitle{onandon-speed-011}}
  {\externalfigure[onandon-speed-011][frame=on,orientation=90,width=.45\textheight]}

So, what is a realistic example? That is hard to say. Unfortunately, no one has
ever asked us to typeset novels. They are rather brain|-|dead products for a
machine, so they process fast. On the mentioned laptop, 350|-|word pages in
Dejavu fonts can be processed at a rate of 75 pages per second with \LUATEX\ and
over 100 pages per second with \LUAJITTEX. On a more modern laptop or a
professional server, the performance is of course better. And, for automated
flows, batch mode is your friend. The rate is not much worse for a document in a
language with a bit more complex character handling, such as accents or ligatures.
Of course, \PDFTEX\ is faster on such a dumb document, but kick in some more
functionality, and the advantage quickly disappears. So, if someone complains
that \LUATEX\ needs 10 or more seconds for a simple document of a few pages \unknown\
you can bet that, when the fonts are seen as the reason, the setup is pretty bad.
Personally, I would not waste time on such a complaint.

\stopsection

\startsection[title=Valid questions]

Here are some reasonable questions that you can ask when someone complains to you
about the slowness of \LUATEX:

\startsubsubject[title={What engines do you compare?}]

If you come from \PDFTEX, you come from an 8-bit world: input and font handling
are based on bytes, and hyphenation is integrated into the par builder. If you use
\UTF-8\ in \PDFTEX, the input is decoded by \TEX\ macros, which carries a speed
penalty. Because in the wide engines macro names can also be \UTF\ sequences,
construction of macro names is less efficient too.

When you try to use wide fonts, there is, again, a penalty. Now, if you use
\XETEX\ or \LUATEX, your input is \UTF-8, which becomes something 32-bit
internally. Fonts are wide, so more resources are needed, apart from these fonts
being larger and in need of more processing due to feature handling. Where
\XETEX\ uses a library, \LUATEX\ uses its own handler. Does that have a
consequence for performance? Yes and no. First of all, it depends on how much
time is spent on fonts at all, but even then, the difference is not that large.
Sometimes \XETEX\ wins, sometimes it's \LUATEX. One thing is clear: \LUATEX\ is
more flexible as we can roll out our own solutions and therefore do more advanced
font magic. For \CONTEXT, it doesn't matter as we use \LUATEX\ exclusively, and
we rely on the flexible font handler, also for future extensions. If really
needed, you can kick in a library-based handler, but it's (currently) not
distributed as we would lose other functionality, which would, in turn, result in
complaints about that fact (apart from conflicting with our striving for
independence).

There is no doubt that \PDFTEX\ is faster, but, for \CONTEXT, it's an obsolete
engine. The hard-coded-solutions engine \XETEX\ is not feasible for \CONTEXT\
either. So, in practice, \CONTEXT\ users have no choice: \LUATEX\ is used, but
users of other macro packages can use the alternatives if they are not satisfied
with performance. The fact that \CONTEXT\ users don't complain about speed is a
clear signal that this is a no|-|issue. And, if you want more speed, you can always
use \LUAJITTEX. \footnote {In plug mode, we can actually test a library and
experiments have shown that performance on average is much worse, but it can
be a bit better for complex scripts, although a gain goes unnoticed in normal
documents. So, one can decide to use a library but at the cost of much other
functionality that \CONTEXT\ offers, so we don't support it.} In the last section,
the different engines will be compared in more detail.

Just so you know, when we do the four|-|switches example in plain \TEX\ on my
laptop, I get a rate of 40 pages per second, and, for one font, 180 pages per
second. There is, of course, a bit more going on in \CONTEXT\ in page building
and so on, but the difference between plain and \CONTEXT\ is not that large.

\stopsubsubject

\startsubsubject[title={What macro package is used?}]

When plain \TEX\ is used, a follow|-|up question is: what variant? The \CONTEXT\
distribution ships with \type {luatex-plain}, and that is our benchmark. If there
really is a bottleneck, it is worth exploring, but keep in mind that, in order to
be plain, not that much can be done. The \LUATEX\ part is just an example of an
implementation. We already discussed \CONTEXT, and for \LATEX, I don't want to
speculate where performance hits might come from. When we're talking fonts,
\CONTEXT\ can actually be a bit slower than the generic (or \LATEX) variant, because
we can kick in more functionality. Also, when you compare macro packages, keep in
mind that, when node list processing code is added in that package, the impact
depends on the interaction with other functionality and on the efficiency of
the code. You can't compare mechanisms or draw general conclusions when you don't
know what else is done!

\stopsubsubject

\startsubsubject[title={What do you load?}]

Most \CONTEXT\ modules are small and load fast. Of course, there can be exceptions
when we rely on third party code; for instance, loading tikz takes a bit of
time. It makes no sense to look for ways to speed that system up, because it is
maintained elsewhere. Probably a bit could be gained, but, again, no user
has complained so far.

If \CONTEXT\ is not used, one probably also uses a large \TEX\ installation.
File lookup in \CONTEXT\ is done differently, and can be faster. Even loading
can be more efficient in \CONTEXT, but it's hard to generalize that conclusion.
If one complains about loading fonts being an issue, just try to measure how much
time is spent on loading other code.

\stopsubsubject

\startsubsubject[title={Did you patch macros?}]

Not everyone is a \TEX pert. So, coming up with macros that are expanded many
times and|/|or have inefficient user interfacing can have some impact. If someone
complains about one subsystem being slow, then honesty demands complaining about
other subsystems as well. You get what you ask for.

\stopsubsubject

\startsubsubject[title={How efficient is the code that you use?}]

Writing super|-|efficient code only makes sense when it's used frequently. In
\CONTEXT, most code is reasonably efficient. It can be that in one document fonts
are responsible for most runtime, but in another document, table construction can
be more demanding while yet another document puts some stress on interactive
features. When hz or protrusion is enabled, then you run substantially slower
anyway, so when you are willing to sacrifice 10 \% or more of runtime, don't
complain about other components. The same is true for enabling \SYNCTEX: if you
are willing to add more than 10 \% of runtime for that, don't worry about the
same amount for font handling. \footnote {In \CONTEXT, we use a \SYNCTEX\
alternative that is somewhat faster, but it remains a fact that enabling more and
more functionality will make the penalty of, for instance, font processing
relatively small.}
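
To see what such options cost in a given document, one can run it with and without
them and compare the reported runtimes. A sketch, assuming the \MKIV\ interfaces
\type {\setupalign} (with the \type {hz} and \type {hanging} keywords for expansion
and protrusion) and \type {\setupsynctex}; the test text is just an example:

\starttyping
% comment out these two lines for the baseline run
\setupalign[hz,hanging]     % font expansion and protrusion
\setupsynctex[state=start]  % assumed interface for SyncTeX-style output

\starttext
    \dorecurse {100} {
        \input sapolsky \par
    }
\stoptext
\stoptyping

The difference between both runs is what these features cost, which can then be
weighed against the runtime spent on font processing.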

\stopsubsubject

\startsubsubject[title={How efficient is the styling that you use?}]

Probably the most easily overlooked optimization is in switching fonts and colors.
Although in \CONTEXT, font switching is fast, I have no clue about it in other
macro packages. But in a style, you can decide to use inefficient (massive) font
switches. The effects can easily be tested by commenting out bits and pieces. For
instance, sometimes you need to do a full bodyfont switch when changing a style,
like assigning \type {\small\bf} to the \type {style} key in \type {\setuphead},
but often using e.g.\ \type {\tfd} is much more efficient and works quite as
well. Just try it.
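
In terms of input, the difference looks as follows. The two settings do not give
the same visual result; they only contrast the kind of switch involved, and
\type {section} is just an example of a head:

\starttyping
% heavier: a full bodyfont switch (\small) plus a style switch (\bf)
\setuphead[section][style={\small\bf}]

% lighter: a single font switch
\setuphead[section][style=\tfd]
\stoptyping

Timing the document with each variant shows whether it matters for a given style.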

\stopsubsubject

\startsubsubject[title={Are fonts really the bottleneck?}]

We already mentioned that one can look in the wrong direction. Maybe, once someone
is convinced that fonts are the culprit, it gets hard to look at the real issue.
If a similar job in different macro packages has a significantly different runtime,
one can wonder what is really going on.

It is good to keep in mind that the amount of text is often not as large as you
think. It's easy to do a test with hundreds of paragraphs of text, but, in practice,
we have whitespace, section titles, half empty pages, floats, itemize and similar
constructs, etc. Often, we don't mix many fonts in the running text either. So,
in the end, a real document is your best test.

\stopsubsubject

\startsubsubject[title={If you use \LUA, is that code any good?}]

You can gain from the faster virtual machine of \LUAJITTEX. Don't expect wonders
from the jitting as that only pays off in long runs with the same code used over
and over again. If the gain is high, you can even wonder how well-written your
\LUA\ code is anyway.

\stopsubsubject

\startsubsubject[title={What if they don't believe you?}]

So, say that someone finds \LUATEX\ slow, what can be done about it? Just advise
them to stick to their previously|-|used tool. Then, if arguments come that one
also wants to use \UTF-8, \OPENTYPE\ fonts, a bit of \METAPOST, and is looking
forward to using \LUA\ at runtime, the only answer is: take it or leave it. You pay
a price for progress, but, if you do your job well, the price is not that high.
Tell them to spend time on learning and maybe adapting, and to bark at their own
tree before barking at those who took that step a decade ago. Most \CONTEXT\
users took that step, and someone still using \LUATEX\ after a decade can't be
that stupid. It's always best to first wonder what one actually asks from \LUATEX,
and whether having \LUA\ on board is an advantage. If not, one can
just use another engine.

Also think of this: when a job is slow, for me it's no problem to identify where
the problem is. The question then is: can something be done about it? Well, I
happily keep the answer to myself. After all, some people always need room to
complain, maybe if only to hide their ignorance or incompetence. Who knows.

\stopsubsubject

\stopsection

\startsection[title={Comparing engines}]

The next comparison is to be taken with a grain of salt and concerns the state of
affairs mid-2017. First of all, you cannot really compare \MKII\ with \MKIV: the
latter has more functionality (or a more advanced implementation of
functionality). And, as mentioned, you cannot really compare \PDFTEX\ and the
wide engines either. Anyway, here are some (useless) tests. First, a bunch of
loads. Keep in mind that different engines also deal differently with reading
files. For instance, \MKIV\ uses \LUATEX\ callbacks to normalize the input and has
its own readers. There is a bit more overhead in starting up a \LUATEX\ run, and
some functionality is enabled that is not present in \MKII. The format is also
larger, if only because we preload a lot of useful font, character and script
related data.

\starttyping
\starttext
    \dorecurse {#1} {
        \input knuth
        \par
    }
\stoptext
\stoptyping

When looking at the numbers, one should realize that the times include startup and
job management by the runner scripts. We also run in batchmode to keep logging
from influencing the runtime. The \type {#1} in the test files stands for the
number of repetitions given in the table columns (50, 500 and 2500), and the
average is calculated from 5 runs.

% sample 1, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.43 \NC 0.77 \NC 2.33 \NC \NR
\BC xetex     \NC 0.85 \NC 2.66 \NC 10.79 \NC \NR
\BC luatex    \NC 0.94 \NC 2.50 \NC 9.44 \NC \NR
\BC luajittex \NC 0.68 \NC 1.69 \NC 6.34 \NC \NR
\HL
\stoptabulate

The second example does a few switches in a paragraph:

\starttyping
\starttext
    \dorecurse {#1} {
        \tf \input knuth
        \bf \input knuth
        \it \input knuth
        \bs \input knuth
        \par
    }
\stoptext
\stoptyping

% sample 2, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.58 \NC 2.10 \NC 8.97 \NC \NR
\BC xetex     \NC 1.47 \NC 8.66 \NC 42.50 \NC \NR
\BC luatex    \NC 1.59 \NC 8.26 \NC 38.11 \NC \NR
\BC luajittex \NC 1.12 \NC 5.57 \NC 25.48 \NC \NR
\HL
\stoptabulate

The third example does more, resulting in multiple subranges per style:

\starttyping
\starttext
    \dorecurse {#1} {
        \tf \input knuth \it knuth
        \bf \input knuth \bs knuth
        \it \input knuth \tf knuth
        \bs \input knuth \bf knuth
        \par
    }
\stoptext
\stoptyping

% sample 3, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.59 \NC 2.20 \NC 9.52 \NC \NR
\BC xetex     \NC 1.49 \NC 8.88 \NC 43.85 \NC \NR
\BC luatex    \NC 1.64 \NC 8.91 \NC 41.26 \NC \NR
\BC luajittex \NC 1.15 \NC 5.91 \NC 27.15 \NC \NR
\HL
\stoptabulate

The last example adds some color. Enabling more functionality can have an impact
on performance. In fact, as \MKIV\ uses a lot of \LUA\ and is also more advanced
than \MKII, one can expect a performance hit, but, in practice, the opposite
happens, which can also be due to some fundamental differences deep down at the
macro level.

\starttyping
\setupcolors[state=start] % default in MkIV

\starttext
    \dorecurse {#1} {
        {\red \tf \input knuth \green \it knuth}
        {\red \bf \input knuth \green \bs knuth}
        {\red \it \input knuth \green \tf knuth}
        {\red \bs \input knuth \green \bf knuth}
        \par
    }
\stoptext
\stoptyping

% sample 4, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.61 \NC 2.36 \NC 10.33 \NC \NR
\BC xetex     \NC 1.53 \NC 9.25 \NC 45.59 \NC \NR
\BC luatex    \NC 1.65 \NC 8.91 \NC 41.32 \NC \NR
\BC luajittex \NC 1.15 \NC 5.93 \NC 27.34 \NC \NR
\HL
\stoptabulate

These measurements are only accurate to a few decimals, but a pattern is visible.
As expected, \PDFTEX\ wins on simple documents but starts losing when things get
more complex. For these tests, I used 64-bit binaries. A 32-bit \XETEX\ with
\MKII\ performs the same as \LUAJITTEX\ with \MKIV, but a 64-bit \XETEX\ is
actually quite a bit slower. In the 64-bit case, the mingw cross|-|compiled
\LUATEX\ does pretty well. A 64-bit \PDFTEX\ also seems to be slower than a
32-bit version. So, in the end, more factors than just the engine play a role.
Choosing between \LUATEX\ and \LUAJITTEX\ depends on how well the memory|-|limited
\LUAJITTEX\ variant can handle your documents and fonts.
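
Because every run includes a fixed start|-|up cost, one rough way to look at the
typesetting work alone is to take the difference between two columns of a table.
This is only a back|-|of|-|the|-|envelope sketch: it assumes that the runtime
grows roughly linearly with the number of iterations, and $t_n$ (runtime for $n$
iterations) and $c$ (cost per iteration) are names introduced here for
illustration only.

\startformula
c = \frac{t_{2500} - t_{500}}{2500 - 500}
\stopformula

For the second example, this gives about $(8.97 - 2.10) / 2000 \approx 0.0034$
seconds per iteration for \PDFTEX, $(25.48 - 5.57) / 2000 \approx 0.0100$ for
\LUAJITTEX, $(38.11 - 8.26) / 2000 \approx 0.0149$ for \LUATEX\ and
$(42.50 - 8.66) / 2000 \approx 0.0169$ for \XETEX. The order is the same as in
the 2500 column, so the differences between the engines are not just a matter of
start|-|up time.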

Because most of our recent styles use \OPENTYPE\ fonts and (structural) features
as well as recent \METAFUN\ extensions only present in \MKIV, we cannot compare
engines using such documents. The performance of \LUATEX\ (or \LUAJITTEX) with
\MKIV\ on the \METAFUN\ manual, mentioned earlier, illustrates that, in most
cases, this combination is a clear winner.

The fifth example only ships out empty pages:

\starttyping
\starttext
    \dorecurse {#1} {
        \null \page
    }
\stoptext
\stoptyping

This gives:

% sample 5, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.46 \NC 1.05 \NC 3.72 \NC \NR
\BC xetex     \NC 0.73 \NC 1.80 \NC 6.56 \NC \NR
\BC luatex    \NC 0.84 \NC 1.44 \NC 4.07 \NC \NR
\BC luajittex \NC 0.61 \NC 1.10 \NC 3.33 \NC \NR
\HL
\stoptabulate

That leaves the zero run:

\starttyping
\starttext
    \dorecurse {#1} {
        % nothing
    }
\stoptext
\stoptyping

This gives the following numbers. In longer runs, the difference in overhead is
negligible.

% sample 6, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.36 \NC 0.36 \NC 0.36 \NC \NR
\BC xetex     \NC 0.57 \NC 0.57 \NC 0.59 \NC \NR
\BC luatex    \NC 0.74 \NC 0.74 \NC 0.74 \NC \NR
\BC luajittex \NC 0.53 \NC 0.53 \NC 0.54 \NC \NR
\HL
\stoptabulate
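
That the overhead becomes negligible in longer runs can be made concrete by
dividing the zero|-|run time by the time of a real run. As a rough sketch,
treating the zero run as a fixed overhead (the runs themselves are not
compensated for it) and using the \LUATEX\ numbers of the second example at 50
and 2500 iterations:

\startformula
\frac{0.74}{1.59} \approx 0.47 \qquad \frac{0.74}{38.11} \approx 0.019
\stopformula

So, in the short run, almost half of the time goes into overhead, while in the
long run it is about 2\%.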

It will be clear that when we use different fonts, the numbers will also be
different. And, if you use a lot of runtime \METAPOST\ graphics (for instance for
backgrounds), the \MKIV\ runs come out on top. And, when we process \XML, it
will be clear that going back to \MKII\ is no longer a realistic option. It must
be noted that I occasionally manage to improve performance, but we've now reached
a state where there is not that much to gain. Some functionality is hard to
compare. For instance, in \CONTEXT, we don't use many of the \PDF\ backend
features because we implement them all in \LUA. In fact, even in \MKII\ this was
already done in \TEX, so, in the end, the speed difference there is not large and
often in favour of \MKIV.

For the record: shipping out the roughly 1250 pages also has some overhead, about
2 seconds. Here, \LUAJITTEX\ is 20\% more efficient, which is an indication of
quite some \LUA\ involvement. Loading the input files has an overhead of about
half a second. Starting up \LUATEX\ takes more time than starting up \PDFTEX\ and
\XETEX, but that disadvantage disappears with more pages. So, in the end, quite a
few factors blur the measurements. In practice, what matters is convenience: does
the runtime feel reasonable? In most cases, it does.
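
A quick division puts these overheads in perspective. As a rough sketch, treating
both as fixed costs (a simplification), we spread the roughly 2 seconds of
shipout over 1250 pages, and the difference between the \LUATEX\ and \PDFTEX\
zero runs over a run of 2500 iterations:

\startformula
\frac{2}{1250} = 0.0016 \qquad \frac{0.74 - 0.36}{2500} \approx 0.00015
\stopformula

That is about 1.6 milliseconds of shipout per page and about 0.15 milliseconds of
extra start|-|up time per iteration.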

If I were to replace my laptop with a reasonably comparable alternative, the new
one would be some 35\% faster (single|-|threaded performance of processors doesn't
improve much per year). I guess that this is about the same increase in
performance as \CONTEXT\ \MKIV\ got in that period. I don't expect such a gain in
the upcoming years so, at some point, we're stuck with what we have.
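
To get a feeling for what such a hardware gain means, here is a rough
calculation. It assumes that \quotation {35\% faster} means 1.35 times the
throughput and that the runtime scales inversely with it, which is a
simplification. The 41.32 seconds that \LUATEX\ needs for 2500 iterations of the
color example would then become about

\startformula
\frac{41.32}{1.35} \approx 30.6
\stopformula

seconds on the newer machine.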

\stopsection

\startsection[title=Summary]

So, how \quotation {slow} is \LUATEX\ really compared to the other engines? If we
go back in time to when the first wide engines showed up, \OMEGA\ was considered
to be slow, although I never tested that myself. Then, when \XETEX\ showed up,
there was not much talk about speed, just about the fact that we could use
\OPENTYPE\ fonts and native \UTF\ input. Looking at the numbers, it was certainly
much slower than \PDFTEX. So how come some people complain about \LUATEX\ being
so slow, especially when we take into account that it's not that much slower than
\XETEX, and that \LUAJITTEX\ is often faster than \XETEX? Also, computers have
become faster. With the wide engines, you get more functionality and that comes
at a price. This was accepted for \XETEX\ and is also acceptable for \LUATEX. But
the price is not that high if you take into account that hardware performs
better: just compare today's \LUATEX\ (and \XETEX) runtimes with \PDFTEX\
runtimes of 15 years ago.

As a comparison, look at games and video. Resolutions became much higher, as did
color depth. Higher frame rates were in demand. Therefore, the hardware had to
become faster, and it did; as a result, the user experience kept up. No user will
say that a modern game is slower than an old one because the old one does 500
frames per second compared to some 50 for the new game on modern hardware. In a
similar fashion, the demands for typesetting became higher: \UNICODE, \OPENTYPE,
graphics, \XML, advanced \PDF, more complex (niche) typesetting, etc. This
happened more or less in parallel with computers becoming more powerful. So, as
with games, the user experience didn't degrade with the demands. Comparing
\PDFTEX\ with \LUATEX\ is like comparing a low|-|res, low|-|framerate,
low|-|color game with a modern one. You need up|-|to|-|date hardware and, even
then, the writers of such programs need to make sure that they run efficiently,
simply because hardware no longer scales like it did decades ago. You need to
look at the bigger picture.

\stopsection

\stopchapter

\stopcomponent
