% language=us
% hybrid-jit.tex (size: 30 Kb, last modification: 2023-12-21 09:43)

\startcomponent hybrid-backends

\environment hybrid-environment

\logo[SWIGLIB]  {SwigLib}
\logo[LUAJIT]   {LuaJIT}
\logo[LUAJITTEX]{Luajit\TeX}
\logo[JIT]      {jit}

\startchapter[title={Just in time}]

\startsection [title={Introduction}]

Reading occasional announcements about \LUAJIT,\footnote {\LUAJIT\ is written by
Mike Pall and more information about it and the technology it uses is at \type
{http://luajit.org}, a site also worth visiting for its clean design.} one starts
wondering whether just||in||time compilation can speed up \LUATEX. As a side
track of the \SWIGLIB\ project and after some discussion, Luigi Scarso decided to
compile a version of \LUATEX\ that used the \JIT\ compiler as its \LUA\ engine.
That's when our journey into \JIT\ began.

We started with 32-bit \LINUX\ as this is what Luigi used at that time. Some
quick first tests indicated that the \LUAJIT\ compiler made \CONTEXT\ \MKIV\ run
faster, but not by much. Because \LUAJIT\ claims to be much faster than stock
\LUA, Luigi then played a bit with \type {ffi}, i.e.\ mixing \CCODE\ and \LUA,
especially data structures. There is indeed quite some speed to be gained there;
unfortunately, we would have to mess up the \CONTEXT\ code base so much that one
might wonder why \LUA\ was used in the first place. I could confirm these
observations in a Xubuntu virtual machine in \VMWARE\ running under 32-bit
Windows 8. So, we decided to conduct some more experiments.

The next step was to create a 64-bit binary because the servers at \PRAGMA\ are
\KVM\ virtual machines running 64-bit OpenSuse 12.1 and 12.2. It took a bit of
effort to get a \JIT\ version compiled because Luigi didn't want to mess up the
regular code base too much. This time we observed a speedup of about 40\% on some
runs, so we decided to move on to \WINDOWS\ to see if we could observe a similar
effect there. And indeed, after adapting Akira Kakuto's \WINDOWS\ setup a bit, we
could compile a version for \WINDOWS\ using the native \MICROSOFT\ compiler. On
my laptop a similar speedup was observed, although by then we saw that in
practice a 25\% speedup was about what we could expect. A bonus is that making
formats and identifying fonts is also faster.

So, at that stage, we could safely conclude that \LUATEX\ combined with \LUAJIT\
made sense if you wanted a somewhat faster version. But where does the speedup
come from? The easiest way to see if jitting has an effect is to turn it on and
off:

\starttyping
jit.on()  -- enable the just-in-time compiler (LuaJIT only)
jit.off() -- disable it again, bytecode is then only interpreted
\stoptyping
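
These calls only exist in \LUAJITTEX: in stock \LUATEX\ there is no \type {jit}
table. Below is a minimal sketch of a guarded toggle. It assumes only the \type
{jit.on}, \type {jit.off} and \type {jit.status} functions that \LUAJIT\
provides. The helper name \type {setjit} is hypothetical and not part of
\CONTEXT.

\starttyping
-- Hypothetical helper: switch the LuaJIT compiler on or off when it exists.
-- Stock Lua has no global 'jit' table, so there the helper does nothing.
local function setjit(enable)
  local j = rawget(_G, "jit")   -- a table in LuaJIT, nil in stock Lua
  if type(j) ~= "table" then
    return false
  end
  if enable then
    j.on()
  else
    j.off()
  end
  return (j.status())           -- keep only the first result: is the JIT on?
end
\stoptyping

Calling \type {setjit(false)} before a timing run is then safe on both engines.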

To our surprise, \CONTEXT\ runs are not much influenced by turning the jitter on
or off.\footnote {We also tweaked some of the fine|-|tuning parameters of
\LUAJIT\ but didn't notice any differences. In due time more tests will be
done.} This means that the improvement comes from other places:

\startitemize[packed,n]
\startitem The virtual machine is a different one, and targets the platforms that
it runs on. This means that regular bytecode also runs faster. \stopitem
\startitem The garbage collector is the one from \LUA\ 5.2, so that can make a
difference. It looks like memory consumption is somewhat lower. \stopitem
\startitem Some standard library functions are recognized and supported in a more
efficient way. Think of \type {math.sin}. \stopitem
\startitem Some built|-|in functions like \type {type} are probably dealt with in
a more efficient way. \stopitem
\stopitemize

The third item is an important one, because it explains why we don't gain that
much: we don't use many standard functions. For instance, if we need to go from
characters to bytes and vice versa, we have to do that for \UTF, so we use some
dedicated functions or \LPEG. When we parse strings in \CONTEXT, we often use
\LPEG\ instead of string functions anyway. And if we still do use string
functions, for instance when dealing with simple strings, it only happens a few
times.
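
Here is a minimal sketch of such a dedicated function. It converts a \UNICODE\
code point into its \UTF|-|8 byte sequence. It only uses arithmetic, so it should
run unchanged in \LUA\ 5.1, \LUA\ 5.2 and \LUAJIT. The name \type {utfchar} is
our own choice for this example. The function does not validate its input:
surrogates and values beyond \type {0x10FFFF} are not rejected.

\starttyping
local char, floor = string.char, math.floor

-- Encode a code point as UTF-8, 6 payload bits per continuation byte.
local function utfchar(cp)
  if cp < 0x80 then                      -- 1 byte:  0xxxxxxx
    return char(cp)
  elseif cp < 0x800 then                 -- 2 bytes: 110xxxxx 10xxxxxx
    return char(0xC0 + floor(cp / 0x40),
                0x80 + cp % 0x40)
  elseif cp < 0x10000 then               -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return char(0xE0 + floor(cp / 0x1000),
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  else                                   -- 4 bytes: 11110xxx + 3 continuation bytes
    return char(0xF0 + floor(cp / 0x40000),
                0x80 + floor(cp / 0x1000) % 0x40,
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  end
end
\stoptyping

For instance, \type {utfchar(0x20AC)} gives the three bytes of the euro sign.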

The more demanding \CONTEXT\ code deals with node lists, which means frequent
calls to core \LUATEX\ functions. Alas, jitting doesn't help much there unless we
start messing with \type {ffi}, which is not on the agenda.\footnote {If we want
to improve these mechanisms it makes much more sense to provide more helpers.
However, profiling has shown us that the most demanding code is already quite
optimized.}

\stopsection

\startsection[title=Benchmarks]

Let's look at some of the benchmarks. The first one uses \METAPOST\ and, because
we want to see if calculations are faster, we draw a path with a special pen so
that some transformations have to be done in the code that generates the \PDF\
output. We only show the \MSWINDOWS\ and 64-bit \LINUX\ tests here. The 32-bit
tests are consistent with those on \MSWINDOWS, so we didn't add those timings
here (also because in the meantime Luigi's machine broke down and he moved on
to 64 bits).

\typefile{benchmark-1.tex}

The following times are in seconds and are averages of 5~runs. The
\quote{traditional} column is stock \LUATEX. The other two columns are
\LUAJITTEX\ with \JIT\ enabled and disabled. There is a significant speedup, but
jitting itself doesn't do much.

% mingw crosscompiled 5.2 / new mp : 25.5

\starttabulate[|l|r|r|r|]
\HL
\NC               \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR
\HL
\NC \bf Windows 8 \NC 26.0        \NC 20.6     \NC 20.8      \NC \NR
\NC \bf Linux 64  \NC 34.2        \NC 14.9     \NC 14.1      \NC \NR
\HL
\stoptabulate

Our second example uses multiple fonts in a paragraph and adds color as well.
Although well optimized, font||related code involves node list parsing and a
bit of calculation. Color again deals with node lists, and the backend code
involves calculations, but not that many. The traditional run on \LINUX\ in the
first benchmark is somewhat odd. It might have to do with the \METAPOST\ library
suffering from the move to 64 bits. It is at least an indication that
optimizations make less sense if there is a different dominant weak spot. We
have to look into this some time.

\typefile{benchmark-2.tex}

Again jitting has no real benefits here, but the overall gain in speed is quite
nice. It could be that the garbage collector plays a role here.

% mingw crosscompiled 5.2 / new mp : 64.3

\starttabulate[|l|r|r|r|]
\HL
\NC               \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR
\HL
\NC \bf Windows 8 \NC 54.6        \NC 36.0     \NC 35.9      \NC \NR
\NC \bf Linux 64  \NC 46.5        \NC 32.0     \NC 31.7      \NC \NR
\HL
\stoptabulate

This benchmark writes quite a lot of data to the console, which can have an
impact on performance as \TEX\ flushes on a per||character basis. When one runs
\TEX\ as a service this has less impact, because in that case the output goes
into the void. There is a lot of file reading going on here, but normally the
operating system will cache data, so after a first run this effect
disappears.\footnote {On \MSWINDOWS\ it makes sense to use \type {console2}
because, due to some clever buffering tricks, it performs much better than the
default console.}

The third benchmark is one that we often use for detecting speed regressions in
the \CONTEXT\ core code. It measures the overhead in the page builder without
special tricks like backgrounds being used. The document has some 1000 pages.

\typefile{benchmark-3.tex}

These numbers are already quite okay for the normal version, but the speedup of
the \LUAJIT\ version is consistent with the expectations we have by now.

% mingw crosscompiled 5.2 / new mp : 6.8

\starttabulate[|l|r|r|r|]
\HL
\NC               \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR
\HL
\NC \bf Windows 8 \NC 4.5         \NC 3.6      \NC 3.6       \NC \NR
\NC \bf Linux 64  \NC 4.8         \NC 3.9      \NC 4.0       \NC \NR
\HL
\stoptabulate

The fourth benchmark uses some structuring, which involves \LUA\ tables and
housekeeping. It also uses an itemize, which involves numbering and conversions,
and a table mechanism that uses more \LUA\ than \TEX.

\typefile{benchmark-4.tex}

Here it looks like \JIT\ slows down the process on \MSWINDOWS, but of course we
shouldn't take the last digit too seriously.

% mingw crosscompiled 5.2 / new mp : 27.4

\starttabulate[|l|r|r|r|]
\HL
\NC               \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR
\HL
\NC \bf Windows 8 \NC 20.9        \NC 16.8     \NC 16.5      \NC \NR
\NC \bf Linux 64  \NC 20.4        \NC 16.0     \NC 16.1      \NC \NR
\HL
\stoptabulate

Again, this example does a bit of logging. It does not do that much reading from
file, as buffers are kept in memory.

So when does \JIT\ actually kick in? That is what the fifth benchmark explores.

\typefile{benchmark-5.tex}

Here we see \JIT\ having an effect! First of all, the \LUAJIT\ versions are now
about 4~times faster. The numbers after the slash are for runs where \type {sin}
is made a \type {local} function. With \JIT\ enabled this makes little
difference, because the math functions are optimized anyway. It does help the
stock engine, and \LUAJITTEX\ with \JIT\ disabled. Note that we're still much
faster than stock \LUATEX\ when \JIT\ is disabled:

% mingw crosscompiled 5.2 / new mp : 2.5/2.1

\starttabulate[|l|r|r|r|]
\HL
\NC               \NC traditional \NC \JIT\ on    \NC \JIT\ off   \NC \NR
\HL
\NC \bf Windows 8 \NC 1.97 / 1.54 \NC 0.46 / 0.45 \NC 0.73 / 0.61 \NC \NR
\NC \bf Linux 64  \NC 1.62 / 1.27 \NC 0.41 / 0.42 \NC 0.67 / 0.52 \NC \NR
\HL
\stoptabulate

Unfortunately this kind of calculation, in these amounts, doesn't happen that
often. Still, some users might benefit.
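
The source of \type {benchmark-5.tex} is not shown here. The effect of making
\type {sin} local can be sketched as a plain \LUA\ micro|-|benchmark. This is a
hypothetical reconstruction, not the actual test file. The loop count is
arbitrary, and \type {os.clock} measures \CPU\ time, so the absolute numbers will
not match the table.

\starttyping
local N = 1000000

-- Each call looks up the global 'math' table and then its 'sin' field.
local function viaglobal(n)
  local s = 0
  for i = 1, n do
    s = s + math.sin(i)
  end
  return s
end

-- The function is resolved once and reached as an upvalue inside the loop.
local sin = math.sin
local function vialocal(n)
  local s = 0
  for i = 1, n do
    s = s + sin(i)
  end
  return s
end

-- Return the CPU time spent in f(n) together with its result.
local function timed(f, n)
  local start = os.clock()
  local result = f(n)
  return os.clock() - start, result
end

local tglobal, rglobal = timed(viaglobal, N)
local tlocal,  rlocal  = timed(vialocal,  N)
print(string.format("global: %.2f s, local: %.2f s", tglobal, tlocal))
\stoptyping

With \JIT\ enabled one would expect both variants to end up close together, as
in the middle column of the table.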

\stopsection

\startsection[title=Conclusions]

So, does it make sense to complicate the \LUATEX\ build with \LUAJIT? It does
when speed matters, for instance when \CONTEXT\ is run as a service. Some 25\%
gain in speed means less waiting time, better use of \CPU\ cycles, less energy
consumption, etc. On the other hand, computers are still becoming faster, and
compared to those speed|-|ups the 25\% is not that much. Also, as \TEX\ deals
with files, the advance of \SSD\ disks and larger and faster memory helps too, as
do faster and larger \CPU\ caches. However, multiple cores don't help that much
on a system that only runs \TEX. Interestingly, multi|-|core architectures tend
to run at lower clock speeds than single cores, which can dissipate more heat. In
that respect, servers that mostly run \TEX\ are better off with fewer cores that
can run at higher frequencies. But anyhow, 25\% is still better than nothing,
and it makes my old laptop feel faster. It prolongs the lifetime of machines!

Now, say that we cannot speed up \TEX\ itself that much, but that there is still
something to gain at the \LUA\ end \emdash\ what can we reasonably expect? First
of all, we need to take into account that only part of the runtime is due to
\LUA. Say that this is 25\% for a document of average complexity:

\startnarrower
runtime\low{tex} + runtime\low{lua} = 100
\stopnarrower

We can consider the time needed by \TEX\ to be constant. So if that is 75\% of
the total time (say 100 seconds) to begin with, we have:

\startnarrower
75 + runtime\low{lua} = 100
\stopnarrower

If we bring down the total runtime to 80\% (80 seconds) of the original, we end
up with:

\startnarrower
75 + runtime\low{lua} = 80
\stopnarrower

So the 25 seconds spent in \LUA\ went down to 5, meaning that \LUA\ processing
got 5 times faster! It is also clear that getting much more out of \LUA\
becomes hard: even if \LUA\ took no time at all, we would still need 75 seconds.
Of course we can squeeze more out of it, but \TEX\ still needs its time. It is
hard to measure how much time is actually spent in \LUA. We do keep track of
some times, but not very accurately. These experiments and the gain in speed
indicate that we probably spend more time in \LUA\ than we first guessed. If you
look at the \CONTEXT\ source, it's not that hard to imagine that we might well
spend 50\% or more of our time in \LUA\ and|/|or in transferring control between
\TEX\ and \LUA. So, in the end there still might be something to gain.
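
The reasoning above fits in a small helper. Given the total runtime before and
after a change, and the \TEX\ time (assumed to stay constant), it returns the
factor by which the \LUA\ part must have become faster. The name \type
{luaspeedup} is made up for this sketch. The numbers are the ones assumed in the
text.

\starttyping
-- before, after : total runtime in seconds
-- textime       : time spent in TeX, assumed to be the same in both runs
local function luaspeedup(before, after, textime)
  local luabefore = before - textime
  local luaafter  = after  - textime
  assert(luaafter > 0, "the Lua part must still take some time")
  return luabefore / luaafter
end

print(luaspeedup(100, 80, 75)) -- 25 seconds of Lua became 5: a factor of 5
\stoptyping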

Let's take benchmark 4 as an example. At some point we measured 27.0 seconds
for a regular \LUATEX\ 0.74 run and 23.3 seconds for a \LUAJITTEX\ run. Assume
that the \LUAJIT\ virtual machine is twice as fast as the normal one. Then the
difference of 3.7 seconds is half of the \LUA\ time, so \LUA\ takes some 7.4
seconds and \TEX\ some 19.6 seconds of the regular run.

An interesting border case is \type {\directlua}, because we sometimes pass
quite a lot of data. That data is handled in four steps:

\startitemize[packed,n]
\startitem It is tokenized (a \TEX\ activity). \stopitem
\startitem The resulting token list is converted into a string (also a \TEX\
activity). \stopitem
\startitem The string is compiled into bytecode (a \LUA\ task). \stopitem
\startitem If that succeeds, the bytecode is executed by \LUA. \stopitem
\stopitemize

The time involved in the conversion to bytecode is probably the same for stock
\LUA\ and \LUAJIT.

In the \LUATEX\ case, some 30\% of the runtime for benchmark 4 is on \LUA's tab,
and in \LUAJITTEX\ it's some 15\%. We can try to bring down the \LUA\ part even
more, but it makes more sense to gain something at the \TEX\ end. There, macro
expansion can be improved (read: \CONTEXT\ core code), but that is already
rather optimized.
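
The juggling with numbers for benchmark~4 amounts to solving two equations with
two unknowns. Here is a sketch. It assumes, as the text does, that the \LUAJIT\
virtual machine is exactly twice as fast, so the \LUA\ part of the \LUAJITTEX\
run takes half the time. The function name \type {split} is hypothetical.

\starttyping
-- tex + lua     = stock    (regular LuaTeX run)
-- tex + lua / f = jitted   (LuaJITTeX run, Lua part f times faster)
-- Subtracting both gives lua * (1 - 1 / f) = stock - jitted.
local function split(stock, jitted, f)
  local lua = (stock - jitted) / (1 - 1 / f)
  local tex = stock - lua
  return tex, lua, lua / f
end

local tex, lua, luajit = split(27.0, 23.3, 2)
print(string.format("tex %.1f, lua %.1f, luajit %.1f", tex, lua, luajit))
print(string.format("lua share %.0f%%, luajit share %.0f%%",
  100 * lua / 27.0, 100 * luajit / 23.3))
\stoptyping

This gives 19.6, 7.4 and 3.7 seconds, and shares of about 27\% and 16\%. The
text rounds these to 30\% and 15\%.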

Just for the sake of completeness, Luigi compiled a stock \LUATEX\ binary for
64-bit \LINUX\ with the \type {-O3} optimization option. That option enables
more inlining of functions as well as a different switch mechanism. We did a few
tests and this is the result:

\starttabulate[|lTB|r|r|]
\HL
\NC              \NC \LUATEX\ 0.74 -O2 \NC \LUATEX\ 0.74 -O3 \NC \NR
\HL
\NC benchmark-1  \NC 15.5              \NC 15.0              \NC \NR
\NC benchmark-2  \NC 35.8              \NC 34.0              \NC \NR
\NC benchmark-3  \NC  4.0              \NC  3.9              \NC \NR
\NC benchmark-4  \NC 16.0              \NC 15.8              \NC \NR
\HL
\stoptabulate

This time we used \type {--batch} and \type {--silent} to eliminate terminal
output. So, if you really want to squeeze out the maximum performance, you need
to compile with \type {-O3} and use \LUAJITTEX\ (with the faster virtual
machine), but disable \JIT\ (disabled by default anyway).

% tex + jit = 23.3
% tex + lua = 27.0
% lua = 2*jit       % cf roberto
%
% so:
%
% 2*tex + 2*jit = 46.6
%   tex + 2*jit = 27.0
% -------------------- -
%   tex         = 19.6
%
% ratios:
%
% tex : lua = 70 : 30
% tex : jit = 85 : 15

We have no reason to abandon stock \LUA. Also, because we were still using
\LUA\ 5.1 during these experiments, we started wondering what the move to 5.2
would bring. Such a move forward also means that \CONTEXT\ \MKIV\ will not depend
on specific \LUAJIT\ features, although it is aware of them (this is needed
because we store bytecode). But we will definitely explore the possibilities and
see where we can benefit. In that respect there will be a way to enable and
disable jitting. So, users have the choice to use either stock \LUATEX\ or the
\JIT||aware version, but we default to the regular binary.

As we use stock \LUA\ as our reference, we will use the \type {bit32} library,
while \LUAJIT\ has its own \type {bit} library. Some functions can be aliased, so
that is no big deal. In \CONTEXT\ we use wrappers anyway. More problematic is
that we want to move on to \LUA\ 5.2, and not all 5.2 features are supported
(yet) in \LUAJIT. So, if \LUAJIT\ is mandatory in a workflow, then users had
better make sure that their \LUA\ code is compatible. We don't expect too many
problems in \CONTEXT\ \MKIV.
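
Here is a minimal sketch of such a wrapper. It assumes only the standard \type
{bit32} library of \LUA\ 5.2 and the \type {bit} module of \LUAJIT. One subtle
difference is that \type {bit32} returns unsigned results, while \type {bit}
returns signed 32|-|bit numbers, so the wrapper maps everything to the unsigned
range. The table name \type {bits} and the slow pure \LUA\ fallback (used when
neither library is present) are illustrations, not the actual \CONTEXT\
wrappers.

\starttyping
-- Pick whichever bit library the engine provides: bit32 in Lua 5.2,
-- bit in LuaJIT. Both may be absent, for instance in Lua 5.4.
local native = rawget(_G, "bit32") or rawget(_G, "bit")

-- Map a result to the unsigned range 0 .. 2^32-1, because LuaJIT's
-- bit library returns signed numbers (e.g. -1 instead of 0xFFFFFFFF).
local function u32(x)
  return x % 4294967296
end

-- Slow but portable fallback: combine two numbers bit by bit.
local function bitwise(a, b, keep)
  a, b = u32(a), u32(b)
  local result, weight = 0, 1
  for _ = 1, 32 do
    local x, y = a % 2, b % 2
    if keep(x, y) then
      result = result + weight
    end
    a, b, weight = (a - x) / 2, (b - y) / 2, weight * 2
  end
  return result
end

local bits = {}
if native then
  bits.band = function(a, b) return u32(native.band(a, b)) end
  bits.bor  = function(a, b) return u32(native.bor(a, b)) end
  bits.bxor = function(a, b) return u32(native.bxor(a, b)) end
else
  bits.band = function(a, b) return bitwise(a, b, function(x, y) return x == 1 and y == 1 end) end
  bits.bor  = function(a, b) return bitwise(a, b, function(x, y) return x == 1 or y == 1 end) end
  bits.bxor = function(a, b) return bitwise(a, b, function(x, y) return x ~= y end) end
end
\stoptyping

With such a table in place, code calls \type {bits.band} and friends and never
needs to know which library is underneath.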

\stopsection

\startsection[title=About speed]

It is worth mentioning that the \LUA\ version in \LUATEX\ has a patch for
converting floats into strings. Instead of some \type {INF#} result we just
return zero, simply because \TEX\ is integer||based and intercepting incredibly
small numbers is too cumbersome. We had to apply the same patch in the \JIT\
version.
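
The patch itself lives in the \CCODE\ of the \LUA\ library, but the idea can be
sketched in plain \LUA. Non|-|finite values become zero. Very small values also
end up as zero, because only a fixed number of decimals is printed. This is a
hypothetical illustration, not the actual patch. The function name \type
{texnumber} and the default of 10~decimals are our own choices.

\starttyping
-- Turn a number into plain decimal notation that TeX can digest:
-- no exponents, no INF or NAN.
local function texnumber(x, decimals)
  decimals = decimals or 10
  if x ~= x or x == math.huge or x == -math.huge then
    return "0"                            -- NaN or infinite: just return zero
  end
  local s = string.format("%." .. decimals .. "f", x)
  if s:find(".", 1, true) then
    s = s:gsub("0+$", ""):gsub("%.$", "") -- drop trailing zeros and the dot
  end
  if s == "-0" then
    s = "0"                               -- tiny negative values
  end
  return s
end
\stoptyping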

The benchmarks only indicate a trend. In a real document much more happens than
in the above tests. So what are such measurements worth? Say that we compile the
\TEX book. This grandparent of all documents coded in \TEX\ is rather plainly
coded (using plain \TEX, of course) and compiles pretty fast. Processing does
not suffer from complex expansions. There is no color, there is hardly any text
manipulation, and it's all 8 bit. The page builder is straightforward, as is all
spacing. Although on my old machine I can get \CONTEXT\ to run at over 200 pages
per second, this quickly drops to 10\% of that speed when we add some color,
backgrounds, headers and footers, font switches, etc.

So, running documents like the \TEX book to compare the speed of, say,
\PDFTEX, \XETEX, \LUATEX\ and now \LUAJITTEX\ makes no sense. The first one is
still eight bit, the rest are \UNICODE. Also, the \TEX book uses traditional
fonts with traditional features so effectively that it doesn't rely on anything
that the new engines provide, not even \ETEX\ extensions.

A recent document, on the other hand, typically:

\startitemize[packed]
\startitem uses advanced fonts \stopitem
\startitem has properties like color and|/|or transparencies \stopitem
\startitem has hyperlinks and backgrounds \stopitem
\startitem has complex cover pages or chapter openings \stopitem
\startitem embeds graphics, etc. \stopitem
\stopitemize

Such a document might not even process in \PDFTEX\ or \XETEX. Even if it does,
we're still comparing different technologies: eight bit input and fast fonts in
\PDFTEX, frozen \UNICODE\ and wide font support in \XETEX, versus additional
trickery and control, written in \LUA, in \LUATEX. So, when we investigate
speed, we need to take into account what (font and input) technologies are used,
as well as what complicating layout and rendering features play a role. In
practice speed only matters in an edit|-|view cycle and in services where users
wait for some result.

It's rather hard to find a recent document that can be used to compare these
engines. The best we could come up with was the rendering of the user interface
documentation:

\starttabulate[|T|T|T|T||]
\NC texexec \NC --engine=pdftex    \NC --global \NC x-set-12.mkii \NC 5.9 seconds \NC \NR
\NC texexec \NC --engine=xetex     \NC --global \NC x-set-12.mkii \NC 6.2 seconds \NC \NR
\NC context \NC --engine=luatex    \NC --global \NC x-set-12.mkiv \NC 6.2 seconds \NC \NR
\NC context \NC --engine=luajittex \NC --global \NC x-set-12.mkiv \NC 4.6 seconds \NC \NR
\stoptabulate

Keep in mind that \type {texexec} is a \RUBY\ script and uses \type {kpsewhich},
while \type {context} uses \LUA\ and its own (\TDS||compatible) file manager.
Still, it is interesting to see that there is not that much difference if we
keep \JIT\ out of the picture. This might partly be due to the somewhat more
clever \XML\ processing in \MKIV. However, earlier measurements have shown that,
in this case, not that much of the speedup can be attributed to that.

And so recent versions of \MKIV\ already keep up rather well with the older
eight bit world. We do way more in \MKIV, and the interfacing macros are nicer
but potentially somewhat slower. Some mechanisms might be more efficient because
they use \LUA, but some actually have more overhead because we keep track of
more data. Font feature processing is done in \LUA, yet it keeps up with the
libraries used in \XETEX, or at least the difference is not that significant.
I can think of more demanding tasks, though. Of course, in \LUATEX\ we can go
beyond what libraries provide.

No matter what one takes into account, performance is not that much worse in
\LUATEX. If we switch to \LUAJITTEX\ and so remove some of the traditional \LUA\
virtual machine overhead, we're even better off. Of course we need to add a
disclaimer here: don't force us to prove that the relative speed ratios are the
same for all cases. In fact, because it is so hard to measure and compare,
performance can more or less be taken for granted. There is not that much we can
do to get nicer numbers, apart from maybe parallelizing, which brings other
complexities into the picture. On our servers, other virtual machines can run
\TEX\ services that kick in at the same time. A few of those, using \CPU\
cycles, network bandwidth (as all data lives someplace else) and disk access,
have much more impact than the 25\% we gain. Of course, if all processes run
faster, we've still gained something.

For what it's worth: processing this text takes some 2.3 seconds on my laptop
with regular \LUATEX\ and 1.8 seconds with \LUAJITTEX, including the extra
overhead of restarting. As this is a rather average example, it fits earlier
measurements.

Processing a font manual (work in progress) takes \LUAJITTEX\ 15 seconds for 112
pages, compared to 18.4 seconds for \LUATEX. The not yet finished manual loads 20
different fonts (each with multiple instances), uses colors, has some \METAPOST\
graphics and does some font juggling. The gain in speed sounds familiar.

\stopsection

\startsection[title=The future]

At the 2012 \LUA\ conference Roberto Ierusalimschy mentioned that the virtual
machine of \LUAJIT\ is about twice as fast, because it is partly written in
assembler. The regular machinery, by contrast, is written in standard \CCODE\
with portability in mind.

He also presented some plans for future versions of \LUA. There will be some
lightweight helpers for \UTF. Our experience so far is that only a handful of
functions is actually needed: byte to character conversions and vice versa,
iterators over \UTF\ characters and \UTF\ values, and maybe a simple substring
function. Currently \LUATEX\ has some extra string iterators, and it will provide
the converters as well.
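
To give an idea of how little is needed, here is a sketch of an iterator that
walks over a \UTF|-|8 string. It returns each character together with its code
point. It assumes well|-|formed input and only uses \LUA\ 5.1 functionality. The
name \type {utfpairs} is made up for this example and is not one of the
\LUATEX\ iterators.

\starttyping
-- Iterate over a well-formed UTF-8 string, returning for each character
-- its byte string and its code point. Malformed input is not checked.
local byte, sub = string.byte, string.sub

local function utfpairs(s)
  local pos = 1
  return function()
    local b = byte(s, pos)
    if not b then
      return nil                                  -- end of string
    end
    local len, value
    if b < 0x80 then
      len, value = 1, b                           -- plain ASCII
    elseif b < 0xE0 then
      len, value = 2, b - 0xC0                    -- lead byte 110xxxxx
    elseif b < 0xF0 then
      len, value = 3, b - 0xE0                    -- lead byte 1110xxxx
    else
      len, value = 4, b - 0xF0                    -- lead byte 11110xxx
    end
    for i = pos + 1, pos + len - 1 do
      value = value * 0x40 + (byte(s, i) - 0x80)  -- append 6 payload bits
    end
    local char = sub(s, pos, pos + len - 1)
    pos = pos + len
    return char, value
  end
end
\stoptyping

A loop like \type {for c, v in utfpairs(str) do ... end} then visits every
character once.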

There is a good chance that \LPEG\ will become a standard library (which it
already is in \LUATEX), which is also nice. It's interesting that, especially on
longer sequences, \LPEG\ can beat the string matchers and replacers. The regular
\type {gsub} does win when a substitution finds no match and therefore makes no
replacements. We're talking small numbers here; in daily usage \LPEG\ is about as
efficient as you can wish. In \CONTEXT\ we have \type {lpeg.UR} and \type
{lpeg.US}. It would be nice to have these as native \UTF\ related methods, but I
must admit that I seldom need them.

This and other extensions coming to the language also have some impact on a
\JIT\ version. The current \LUAJIT\ is already not entirely compatible with
\LUA\ 5.2, so you need to take that into account if you want to use this version
of \LUATEX. So, unless \LUAJIT\ follows mainstream development, as a \CONTEXT\
\MKIV\ user you should not depend on it. But at the moment it's nice to have this
choice.

The still experimental code will end up in the main \LUATEX\ repository before
the \TEX\ Live 2013 code freeze. To make it easier to run both versions side by
side, we have added the \LUA\ 5.2 built|-|in library \type {bit32} to
\LUAJITTEX. We found out that it's too much trouble to add that library to
\LUA~5.1, but \LUATEX\ has moved on to 5.2 anyway.

\stopsection

\startsection[title=Running]

So, as we will definitely stick to stock \LUA, one might wonder if it makes sense
to officially support jitting in \CONTEXT. First of all, \LUATEX\ is not
influenced that much by the low level changes in the \API\ between 5.1 and 5.2.
Also, \LUAJIT\ does support the most important new 5.2 features, so at the moment
we're mostly okay. We expect that eventually \LUAJIT\ will catch up. If not, we
are not in big trouble: the performance of stock \LUA\ is quite okay and, above
all, it's portable!\footnote {Stability and portability are important properties
of \TEX\ engines, which is yet another reason for using \LUA. For those doing
number crunching in a document, \JIT\ can come in handy.} For the moment you can
consider \LUAJITTEX\ to be an experiment and research tool, but we will do our
best to keep it production ready.

So how do we choose between the two engines? After some experimenting with
alternative startup scenarios and dedicated caches, we settled on the following
solution:

\starttyping
context --engine=luajittex ...
\stoptyping

The usual preamble line also works:

\starttyping
% engine=luajittex
\stoptyping

As the main infrastructure uses the \type {luatex} and related binaries, this
will result in a relaunch: the \type {context} script will be restarted using
\type {luajittex}. This is a simple solution and the overhead is minimal,
especially compared to the somewhat faster run. Alternatively, you can copy
\type {luajittex} over \type {luatex}, but that is more drastic. Keep in mind
that \type {luatex} is the reference for the development of \CONTEXT, so the
\JIT\ aware version might fall behind sometimes.

Yet another approach is to adapt the configuration file or, better, to provide
(or adapt) your own \type {texmfcnf.lua}, for instance in the \type
{texmf-local/web2c} path:

\starttyping
return {
  type    = "configuration",
  version = "1.2.3",
  date    = "2012-12-12",
  time    = "12:12:12",
  comment = "Local overloads",
  author  = "Hans Hagen, PRAGMA-ADE, Hasselt NL",
  content = {
    directives = {
      -- use luajittex instead of luatex as the default engine
      ["system.engine"] = "luajittex",
    },
  },
}
\stoptyping

This has the same effect as always providing \type {--engine=luajittex}. It only
makes sense in well controlled situations, though, as you might easily forget
that it's the default. Of course one could keep that file and just comment out
the directive unless in test mode.

Because the bytecode of \LUAJIT\ differs from the one used by \LUA\ itself, we
have a dedicated format as well as dedicated bytecode compiled resources (for
instance \type {tmb} instead of \type {tmc}). Most users don't need to bother
about this, as it happens automatically.
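
You can check from within \LUA\ itself that the two engines need separate
caches. \type {string.dump} returns bytecode whose header identifies the virtual
machine. Stock \LUA\ starts its dumps with the escape character followed by
\type {Lua}, and \LUAJIT\ with the escape character followed by \type {LJ}. The
helper below, and its mapping onto the \type {tmc} and \type {tmb} suffixes, are
illustrative only. They are not how \CONTEXT\ makes that decision.

\starttyping
-- Dump a tiny function and look at the signature of the bytecode header.
local function bytecodeflavor()
  local code = string.dump(function() return 1 end)
  if code:sub(1, 3) == "\27LJ" then
    return "luajit"
  elseif code:sub(1, 4) == "\27Lua" then
    return "lua"
  end
  return "unknown"
end

-- Hypothetical mapping onto the cache suffixes mentioned in the text.
local suffix = ({ luajit = "tmb", lua = "tmc" })[bytecodeflavor()]
\stoptyping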

Based on experiments, we have disabled \JIT\ by default, so we only benefit from
the faster virtual machine. Future versions of \CONTEXT\ might provide some
control over that, but first we want to conduct more experiments.

\stopsection

\startsection[title=Addendum]

These developments and experiments took place in November and December 2012. At
the time of this writing we also made the move to \LUA\ 5.2 in stock \LUATEX; the
first version to provide this was 0.74. Here are some measurements (in seconds)
on Taco Hoekwater's 64-bit \LINUX\ machine:

\starttabulate[|lTB|r|r|l|]
\HL
\NC              \NC \LUATEX\ 0.70 \NC \LUATEX\ 0.74 \NC        \NC \NR
\HL
\NC benchmark-1  \NC 23.67         \NC 19.57         \NC faster \NC \NR
\NC benchmark-2  \NC 65.41         \NC 62.88         \NC faster \NC \NR
\NC benchmark-3  \NC  4.88         \NC  4.67         \NC faster \NC \NR
\NC benchmark-4  \NC 23.09         \NC 22.71         \NC faster \NC \NR
\NC benchmark-5  \NC  2.56/2.06    \NC  2.66/2.29    \NC slower \NC \NR
\HL
\stoptabulate

There is a good chance that the speedup is due to improvements of the garbage
collector, virtual machine and string handling. It also looks like memory
consumption is a bit lower. Some speed optimizations in reading files have been
removed (at least for now). Also, some patches to the \type {format} function
(in the \type {string} namespace) have not been ported; these dealt with number
conversions that are unfortunate for \TEX. The code base is somewhat cleaner,
and we expect to be able to split up the binary into a core program plus some
libraries that are loaded on demand.\footnote {Of course this poses some
constraints on stability as components get decoupled, but this is one of the
issues that we hope to deal with properly in the library project.} In general,
we don't expect too many issues in the transition to \LUA\ 5.2. \CONTEXT\ is
already adapted to support \LUATEX\ with 5.2 as well as \LUAJITTEX\ with an older
version.

Running the same tests on a 32-bit \MSWINDOWS\ machine gives this:

\starttabulate[|lTB|r|r|r|]
\HL
\NC              \NC \LUATEX\ 0.70 \NC \LUATEX\ 0.74 \NC        \NC \NR
\HL
\NC benchmark-1  \NC 26.4          \NC 25.5          \NC faster \NC \NR
\NC benchmark-2  \NC 64.2          \NC 63.6          \NC faster \NC \NR
\NC benchmark-3  \NC  7.1          \NC  6.9          \NC faster \NC \NR
\NC benchmark-4  \NC 28.3          \NC 27.0          \NC faster \NC \NR
\NC benchmark-5  \NC  1.95/1.50    \NC  1.84/1.48    \NC faster \NC \NR
\HL
\stoptabulate

The gain is less impressive, but the machine is rather old and benefits less
from modern \CPU\ properties (cache, memory bandwidth, etc.). I tend to conclude
that there is no significant improvement here, but it also doesn't get worse.
However, we need to keep in mind that file \IO\ is less optimal in 0.74, so this
might play a role. As usual, runtime is negatively influenced by the relatively
slow speed of displaying messages on the console (even when we use \type
{console2}).

A few days before the end of 2012, Akira Kakuto compiled native \MSWINDOWS\
binaries for both engines. This time I decided to run a comparison inside the
\SCITE\ editor, which has very fast console output.\footnote {Most of my personal
\TEX\ runs are from within \SCITE, while most runs on the servers are in batch
mode, so normally the overhead of the console is acceptable or even negligible.}

\starttabulate[|lTB|r|r|r|]
\HL
\NC              \NC \LUATEX\ 0.74 (5.2) \NC \LUAJITTEX\ 0.72 (5.1) \NC         \NC \NR
\HL
\NC benchmark-1  \NC 25.4                \NC 25.4                   \NC similar \NC \NR
\NC benchmark-2  \NC 54.7                \NC 36.3                   \NC faster  \NC \NR
\NC benchmark-3  \NC  4.3                \NC  3.6                   \NC faster  \NC \NR
\NC benchmark-4  \NC 20.0                \NC 16.3                   \NC faster  \NC \NR
\NC benchmark-5  \NC  1.93/1.48          \NC  0.74/0.61             \NC faster  \NC \NR
\HL
\stoptabulate

Only the \METAPOST\ library and conversion benchmark (benchmark~1) didn't show a
speedup. The regular \TEX\ tests (benchmarks 2||4) gain some 15||35\%. Enabling
\JIT\ (off by default) slowed down processing. For the sake of completeness I
also timed \LUAJITTEX\ on the console, so here you see the improvement of both
engines.

\starttabulate[|lTB|r|r|r|]
\HL
\NC              \NC \LUATEX\ 0.70 \NC \LUATEX\ 0.74 \NC \LUAJITTEX\ 0.72 \NC \NR
\HL
\NC benchmark-1  \NC 26.4          \NC 25.5          \NC  25.9      \NC \NR
\NC benchmark-2  \NC 64.2          \NC 63.6          \NC  45.5      \NC \NR
\NC benchmark-3  \NC 7.1           \NC  6.9          \NC   6.0      \NC \NR
\NC benchmark-4  \NC 28.3          \NC 27.0          \NC  23.3      \NC \NR
\NC benchmark-5  \NC 1.95/1.50     \NC 1.84/1.48     \NC  0.73/0.60 \NC \NR
\HL
\stoptabulate

In this text, the term \JIT\ has come up a lot, but you might rightfully wonder
if the observations here relate to \JIT\ at all. For the moment I tend to
conclude that the implementation of the virtual machine and garbage collection
have more impact than the actual just||in||time compilation. More exploration of
\JIT\ is needed to see if we can really benefit from that. Of course, the fact
that we use a bit less memory is also nice. In case you wonder why I bother
about speed at all: we happen to run \LUATEX\ mostly as a (remote) service, and
generating a bunch of (related) documents takes a bit of time. Bringing the
waiting time down from 15 to 10 seconds might not sound impressive, but it makes
a difference when it is someone's job to generate these sets.

In summary: just before we entered 2013, two rather fundamental updates of
\LUATEX\ showed up. One is an improved traditional one with \LUA\ 5.2, the other
is the somewhat faster \LUAJITTEX\ with a mixture of 5.1 and 5.2. And in 2013 we
will of course try to make them both even more attractive.

\stopsection

\stopchapter
654