% language=us

\startcomponent about-calls

\environment about-environment

\startchapter[title={Going nuts}]

\startsection[title=Introduction]

This is not the first story about speed and it will probably not be the last one
either. This time we discuss a substantial speedup: up to 50\% with \LUAJITTEX.
So, if you don't want to read further, at least know that this speedup came at
the cost of lots of testing and adapting code. Of course you could be one of
those users who don't care about that, and it may also be that your documents
don't qualify at all.

Often when I see a kid playing a modern computer game, I wonder how it gets done:
all that high speed rendering, complex environments, shading, lighting,
inter||player communication, many frames per second, adapted story lines,
\unknown. Apart from clever programming, quite a bit of the work gets done by
multiple cores working together, but above all the graphics and physics
processors take much of the workload. The market has driven the development of
this hardware, with success. In this perspective it's not that much of a
surprise that complex \TEX\ jobs still take some time to finish: all the hard
work has to be done by interpreted languages using rather traditional hardware.
Of course all kinds of clever tricks make processors perform better than years
ago, but still: we don't get much help from specialized hardware. \footnote
{Apart from proper rendering on screen and printing on paper.} We're sort of
stuck: when I replaced my six year old laptop (when I buy one, I always buy the
fastest one possible) with a new one (so again a fast one), the gain in
processing speed for a document was less than a factor of two. The many times
faster graphic capabilities are not of much help there, nor is twice the number
of cores.

So, if we ever want to go much faster, we need to improve the software. The
reason for trying to speed up \MKIV\ has been mentioned before, but let's
summarize it here:

\startitemize

\startitem
    There was a time when users complained about the speed of \CONTEXT,
    especially compared to other macro packages. I'm not so sure if this is
    still a valid complaint, but I do my best to avoid bottlenecks and much
    time goes into testing efficiency.
\stopitem

\startitem
    Computers don't get that much faster; at least we don't see an impressive
    boost each year any more. We might even see a slowdown when battery life
    dominates: more cores at a lower speed seems to be a trend and that doesn't
    suit current \TEX\ engines well. Of course we assume that \TEX\ will be
    around for some time.
\stopitem

\startitem
    Especially in automated workflows, where multiple products are produced,
    each demanding a couple of runs, speed pays back in terms of resources and
    response time. Of course the time invested in the speedup is never regained
    by ourselves, but we hope that users appreciate it.
\stopitem

\startitem
    The more we do in \LUA\ (read: the more demanding users get and the more
    functionality is enabled), the more we need to squeeze out of the
    processor. And we want to do more in \LUA\ in order to get better typeset
    results.
\stopitem

\startitem
    Although \LUA\ is pretty fast, future versions might be slower. So, the
    more efficient we are, the less we will probably suffer from changes.
\stopitem

\startitem
    Using more complex scripts and fonts is so demanding that the number of
    pages per second drops dramatically. Personally I consider a rate of 15 pps
    with \LUATEX\ or 20 pps with \LUAJITTEX\ reasonable minima on my laptop.
    \footnote {A Dell 6700 laptop with Core i7 3840QM, 16 GB memory and SSD,
    running 64 bit Windows 8.}
\stopitem

\startitem
    Among the reasons why \LUAJIT\ jitting does not help us much is that (at
    least in \CONTEXT) we don't use that many core functions that qualify for
    jitting. Also, as runs are limited in time and much code kicks in only a
    few times, the analysis and compilation don't pay back in runtime. So we
    cannot simply sit down and wait till matters improve.
\stopitem

\stopitemize

Luigi Scarso and I have been exploring several options, with \LUATEX\ as well
as \LUAJITTEX. We observed that the virtual machine in \LUAJITTEX\ is much
faster, so that engine already gives a boost. The advertised jit feature can
best be disabled, as it slows down a run noticeably. We played with \type {ffi}
as well, but there is additional overhead involved (\type {cdata}) as well as
limited support for userdata, so we can forget about that too. \footnote {As
we've now introduced getters, we can construct a metatable at the \LUA\ end, as
that is what \type {ffi} likes most. But even then, we don't expect much from
it: the four times slowdown that experiments showed will not magically become a
large gain.} Nevertheless, the twice as fast virtual machine of \LUAJIT\ is a
real blessing, especially if you take into account that \CONTEXT\ spends quite
some time in \LUA. We're also looking forward to the announced improved garbage
collector of \LUAJIT.

In the end we started looking at \LUATEX\ itself. What can be gained there,
within the constraint of not having to completely redesign existing (\CONTEXT)
\LUA\ code? \footnote {In the end a substantial change was needed, but only in
accessing node properties. The nice thing about C is that there macros often
provide a level of abstraction, which means that a similar adaptation of the
\TEX\ source code would be more convenient.}

\stopsection

\startsection[title={Two access models}]

Because the \CONTEXT\ code is reasonably well optimized already, the only
option was to look into \LUATEX\ itself. We had played with the \TEX||\LUA\
interface already and came to the conclusion that some runtime could be gained
there. In the long run it adds up, but it's not too impressive; these
extensions are awaiting integration. Tracing and benchmarking, as well as some
quick and dirty patches, demonstrated that there were two bottlenecks in
accessing fields in nodes: checking (comparing the metatables) and constructing
results (userdata with a metatable).

In case you're unfamiliar with the concept, this is how nodes work. There is an
abstract object called node that is in \LUA\ qualified as userdata. This object
contains a pointer to \TEX's node memory. \footnote {The traditional \TEX\ node
memory manager is used, but at some point we might change to regular C
(de)allocation. This might be slower but has some advantages too.} As it is
real userdata (not so called light userdata) it also carries a metatable. In
the metatable methods are defined, and one of them is the indexer. So when you
say this:

\starttyping
local nn = n.next
\stoptyping

given that \type {n} is a node (userdata), the \type {next} key is resolved
using the \type {__index} metatable value, in our case a function. So, in fact,
there is no \type {next} field: it's kind of virtual. The index function that
gets the relevant data from node memory is a fast operation: after determining
the kind of node, the requested field is located. The return value can be a
number, for instance when we ask for \type {width}, which is also fast to
return. But it can also be a node, as is the case with \type {next}, and then
we need to allocate a new userdata object (memory management overhead) and a
metatable has to be associated. And that comes at a cost.

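The mechanism can be mimicked in pure \LUA. The following sketch is ours, not
the engine's implementation: a plain table plays the role of the userdata
object and a hidden store plays the role of node memory.

\starttyping
-- a simplified model of virtual fields; the real indexer is a
-- C function that looks into TeX's node memory

local store = { } -- simulates node memory

local meta = {
    __index = function(n,key)
        return store[n][key] -- fetch the requested field
    end,
}

local function newnode()
    local n = setmetatable({ },meta) -- plays the role of userdata
    store[n] = { }
    return n
end

local a, b = newnode(), newnode()
store[a].next = b
print(a.next == b) -- true: next is resolved via __index
\stoptyping
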
In a previous update we had already optimized the main \type {__index}
function, but felt that more was possible. For instance, we can avoid the
lookup of the metatable for the returned node(s). And, if we don't use indexed
access but instead a function for frequently accessed fields, we can sometimes
gain a bit too.

A logical next step was to avoid some checking, which is okay given that one
pays a bit of attention to coding. So, we provided a special table with some
accessors for frequently used fields. We actually implemented this as a so
called \quote {fast} access model, and adapted part of the \CONTEXT\ code to
it, as we wanted to see if it made sense. We were able to gain 5 to 10\%, which
is nice but still not impressive. In fact, we concluded that for the average
run using fast was indeed faster, but not enough to justify rewriting code to
the (often) less nice looking faster access. A nice side effect of the recoding
was that I could add more advanced profiling.

But, in the process we ran into another possibility: use accessors exclusively
and avoid userdata by passing around references to \TEX\ node memory directly.
As internally nodes can be represented by numbers, we ended up with numbers,
but future versions might use light userdata instead to carry pointers around.
Light userdata is a cheap basic object with no garbage collection involved. We
tagged this method \quote {direct} and one can best treat the values that get
passed around as abstract entities (in \MKIV\ we call this special view on
nodes \quote {nuts}).

So let's summarize this in code. Say that we want to know the next node of
\type {n}:

\starttyping
local nn = n.next
\stoptyping

Here \type {__index} will be resolved and the associated function called. We
can avoid that lookup by applying the \type {__index} method directly (after
all, that one assumes a userdata node):

\starttyping
local getfield = getmetatable(n).__index

local nn = getfield(n,"next") -- userdata
\stoptyping

But this is not a recommended interface for regular users. A normal helper that
does checking is about as fast as the indexed method:

\starttyping
local getfield = node.getfield

local nn = getfield(n,"next") -- userdata
\stoptyping

So, we can use indexes as well as getters mixed, and both perform more or less
equally. A dedicated getter is somewhat more efficient:

\starttyping
local getnext = node.getnext

local nn = getnext(n) -- userdata
\stoptyping

If we forget about checking, we can go faster; in fact the nicely interfaced
\type {__index} is the fast one:

\starttyping
local getfield = node.fast.getfield

local nn = getfield(n,"next") -- userdata
\stoptyping

Even more efficient is the following, as that one already knows what to fetch:

\starttyping
local getnext = node.fast.getnext

local nn = getnext(n) -- userdata
\stoptyping

The next step, away from userdata, was:

\starttyping
local getfield = node.direct.getfield

local nn = getfield(n,"next") -- abstraction
\stoptyping

and:

\starttyping
local getnext = node.direct.getnext

local nn = getnext(n) -- abstraction
\stoptyping

Because we considered three variants a bit too much, and because \type {fast}
was only 5 to 10\% faster in extreme cases, we decided to drop that
experimental code and stick to providing accessors in the node namespace as
well as direct variants for critical cases.

Before you start thinking: \quote {should I rewrite all my code?} think twice!
First of all, \type {n.next} is quite fast, and switching between the normal
and direct model also has some cost. So, unless you also adapt all your
personal helper code or provide two variants of each helper, it only makes
sense to use direct mode in critical situations. Userdata mode is much more
convenient when developing code, and only when you have millions of accesses
can you gain by direct mode. And even then, if the time spent in \LUA\ is small
compared to the time spent in \TEX, it might not even be noticeable. The main
reason we made direct variants is that it does pay off in \OPENTYPE\ font
processing, where complex scripts can result in many millions of calls indeed.
And that code will be set up in such a way that it will use userdata by default
and only in well controlled cases (like \MKIV) will we use direct mode.
\footnote {When we are confident that \type {direct} node code is stable we can
consider going direct in generic code as well, although we need to make sure
that third party code keeps working.}

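As an illustration, providing two variants of a (hypothetical) helper could
look like this; only the direct variant avoids allocating userdata objects in
the loop:

\starttyping
-- a hypothetical helper in two variants

local function count_glyphs(head) -- userdata variant
    local n = 0
    for g in node.traverse_id(node.id("glyph"),head) do
        n = n + 1
    end
    return n
end

local todirect    = node.direct.todirect
local traverse_id = node.direct.traverse_id
local glyph_code  = node.id("glyph")

local function count_glyphs_direct(head) -- direct variant
    local n = 0
    for g in traverse_id(glyph_code,todirect(head)) do
        n = n + 1
    end
    return n
end
\stoptyping
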
Another thing to keep in mind is that when you provide hooks for users, you
should assume that they use the regular mode, so you need to cast the plugins
onto direct mode then. Because the idea is that one should be able to swap
normal functions for direct ones (which of course is only possible when no
indexes are used), all relevant functions in the \type {node} namespace are
available in \type {direct} as well. This means that the following code is
rather neutral:

\starttyping
local x = node -- or: x = node.direct

for n in x.traverse(head) do
  if x.getid(n) == node.id("glyph") and x.getchar(n) == 0x123 then
    x.setfield(n,"char",0x456)
  end
end
\stoptyping

Of course one needs to make sure that \type {head} fits the model. For this you
can use the cast functions:

\starttyping
node.direct.todirect(node or direct)
node.direct.tonode(direct or node)
\stoptyping

These helpers are flexible enough to deal with either model. Aliasing the
functions to locals is of course more efficient when a large number of calls
happen (when you use \LUAJITTEX\ it will do some of that for you
automatically). Of course, normally we use a more natural variant, using an id
traverser:

\starttyping
for n in node.traverse_id(node.id("glyph"),head) do
  if n.char == 0x123 then
    n.char = 0x456
  end
end
\stoptyping

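Aliasing to locals, as mentioned, boils down to hoisting the lookups out of the
loop:

\starttyping
local traverse_id = node.traverse_id
local glyph_code  = node.id("glyph")

for n in traverse_id(glyph_code,head) do
  if n.char == 0x123 then
    n.char = 0x456
  end
end
\stoptyping
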
This is not that much slower, especially when it's run only once. Just count
the number of characters on a page (or in your document) and you will see that
it's hard to come up with that many calls. Of course, when processing many
pages of Arabic using a mature font with many features enabled and contextual
lookups, you do run into such quantities. Tens of features times tens of
contextual lookup passes can add up considerably. In Latin scripts you never
reach such numbers, unless you use fonts like Zapfino.

\stopsection

\startsection[title={The transition}]

After weeks of testing, rewriting, skyping, compiling and making decisions, we
reached a more or less stable situation. At that point we were faced with a
speedup that gave us a good feeling, but the transition to the faster variant
has a few consequences.

\startitemize

\startitem We need to use an adapted code base: indexes are to be replaced by
function calls. This is a tedious job that can endanger stability, so it has to
be done with care. \footnote {The reverse is easier, as converting getters and
setters to indexed access is a rather simple conversion, while for instance
changing \type {.next} into a \type {getnext} needs more checking, because that
key is not unique to nodes.} \stopitem

\startitem When using an old engine with the new \MKIV\ code, this approach
will result in a somewhat slower run. Most users will probably accept a
temporary slowdown of 10\%, so we might take this intermediate step. \stopitem

\startitem When the regular getters and setters become available we get back to
normal. Keep in mind that these accessors do some checking on arguments, which
slows them down to the level of using indexes. On the other hand, the dedicated
ones (like \type {getnext}) are more efficient, so there we gain. \stopitem

\startitem As soon as direct becomes available we suddenly see a boost in
speed. In documents of average complexity this is 10||20\% and when we use more
complex scripts and fonts it can go up to 40\%. Here we assume that the macro
package spends at least 50\% of its time in \LUA. \stopitem

\stopitemize

If we take the extremes, traditional indexed access on the one hand versus
optimized direct access in \LUAJITTEX\ on the other, a 50\% gain compared to
the old methods is feasible. Because we also retrofitted some fast code into
the regular accessors, indexed mode should also be somewhat faster compared to
the older engine.

In addition to the already provided helpers in the \type {node} namespace, we
added the following:

\starttabulate[|Tl|p|]
\HL
\NC getnext    \NC this one is used a lot when analyzing and processing node lists \NC \NR
\NC getprev    \NC this one is used less often but fits in well (companion to \type {getnext}) \NC \NR
\NC getfield   \NC this is the general accessor, in userdata mode as fast as indexed access \NC \NR
\HL
\NC getid      \NC one of the most frequently called getters when parsing node lists \NC \NR
\NC getsubtype \NC especially in font handling this getter gets used \NC \NR
\HL
\NC getfont    \NC especially in complex font handling this is a favourite \NC \NR
\NC getchar    \NC as is this one \NC \NR
\HL
\NC getlist    \NC we often want to recurse into hlists and vlists and this helps \NC \NR
\NC getleader  \NC and we also often need to check if glue has a leader specification (like a list) \NC \NR
\HL
\NC setfield   \NC we have just one setter as setting is less critical \NC \NR
\HL
\stoptabulate

As \type {getfield} and \type {setfield} are just variants on indexed access,
you can also use them to access attributes: just pass a number as key. In the
\type {direct} namespace, helpers like \type {insert_before} also deal with
direct nodes.

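For instance (here \type {n} is some node and \type {123} an arbitrary
attribute number):

\starttyping
local getfield = node.getfield
local setfield = node.setfield

setfield(n,123,1)         -- set attribute 123 to 1
local v = getfield(n,123) -- query attribute 123
\stoptyping
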
We currently only provide \type {setfield} because setting happens less often
than getting. Of course you can construct node lists at the \LUA\ end, but that
doesn't add up that fast and indexed access is then probably as efficient. One
reason why setters are less of an issue is that they don't return nodes, so no
userdata overhead is involved. We could (and might) provide \type {setnext} and
\type {setprev}, although, when you construct lists at the \LUA\ end, you will
probably use the \type {insert_after} helper anyway.

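Constructing a small list that way could look as follows (a sketch; the
character fields are set just for illustration):

\starttyping
local head = node.new("glyph") -- a one||glyph list
head.char  = 0x41

local g = node.new("glyph")
g.char  = 0x42

head = node.insert_after(head,head,g) -- link g after the head
\stoptyping
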
\stopsection

\startsection[title={Observations}]

So how do these variants perform? As we no longer have \type {fast} in the
engine that I use for this text, we can only check \type {getfield}, where we
can simulate fast mode by calling the \type {__index} metamethod directly. In
practice the \type {getnext} helper will be somewhat faster because no key has
to be checked, although the \type {getfield} functions have already been
optimized according to the frequencies of the accessed keys.

\starttabulate
\NC node[*]              \NC 0.516 \NC \NR
\NC node.fast.getfield   \NC 0.616 \NC \NR
\NC node.getfield        \NC 0.494 \NC \NR
\NC node.direct.getfield \NC 0.172 \NC \NR
\stoptabulate

Here we simulate a dumb node count, 20 times over, of 200 paragraphs of \type
{tufte.tex}, with a little bit of overhead for wrapping in functions. \footnote
{When typesetting Arabic or using complex fonts we quickly get a tenfold.} We
encounter over three million nodes this way. We average over a couple of runs.

\starttyping
local function check(current)
  local n = 0
  while current do
    n = n + 1
    current = getfield(current,"next") -- current = current.next
  end
  return n
end
\stoptyping

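The timing itself can be done with a simple wrapper along these lines (a sketch
using \type {os.clock}; \type {getfield} is one of the variants from the table
above and \type {head} some node list):

\starttyping
local starttime = os.clock()
for i=1,20 do
  check(head) -- count the nodes in the list, 20 times over
end
print(os.clock() - starttime) -- elapsed time in seconds
\stoptyping
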
What we see here is that indexed access is quite okay given the number of
nodes, but that direct is much faster. Of course we will never see that gain in
practice, because much more happens than counting and because we also spend
time in \TEX. The 300\% speedup will eventually go down to one tenth of that.

Because \CONTEXT\ avoids node list processing when possible, the baseline
performance is not influenced much.

\starttyping
\starttext \dorecurse{1000}{test\page} \stoptext
\stoptyping

With \LUATEX\ we get some 575 pages per second and with \LUAJITTEX\ more than
610 pages per second.

\starttyping
\setupbodyfont[pagella]

\edef\zapf{\cldcontext
  {context(io.loaddata(resolvers.findfile("zapf.tex")))}}

\starttext \dorecurse{1000}{\zapf\par} \stoptext
\stoptyping

For this test \LUATEX\ needs 3.9 seconds and runs at 54 pages per second, while
\LUAJITTEX\ needs only 2.3 seconds and gives us 93 pages per second.

Just for the record, if we run this:

\starttyping
\starttext
\stoptext
\stoptyping

a \LUATEX\ run takes 0.229 seconds and a \LUAJITTEX\ run 0.178 seconds. This
includes initializing fonts. If we run just this:

\starttyping
\stoptext
\stoptyping

\LUATEX\ needs 0.199 seconds and \LUAJITTEX\ only 0.082 seconds. So, in the
meantime, we hardly spend any time on startup. Launching the binary and
managing the job with \type {mtxrun} calling \type {mtx-context} adds 0.160
seconds of overhead. Of course this is only true when you have already run
\CONTEXT\ once, as the operating system normally caches files (in our case
format files and fonts). This means that by now an edit|-|preview cycle is
quite convenient. \footnote {I use \SCITE\ with dedicated lexers as editor and
currently \type {sumatrapdf} as previewer.}

As a more practical test we used the current version of \type {fonts-mkiv} (166
pages, using all kinds of font tricks and tracing), \type {about} (60 pages,
quite some traced math) and a torture test of Arabic text (61 pages of dense
text). The following measurements are from 2013-07-05, after adapting some 50
files to the new model. Keep in mind that the old binary can fake a fast
getfield and setfield, but that the other getters are wrapped functions. The
more we have, the slower it gets. We used the mingw versions.

\starttabulate[|l|r|r|r|]
\HL
\NC version                                 \NC fonts \NC about \NC arabic \NC \NR
\HL
\NC old mingw, indexed plus some functions  \NC  8.9  \NC  3.2  \NC  20.3  \NC \NR
\NC old mingw, fake functions               \NC  9.9  \NC  3.5  \NC  27.4  \NC \NR
\HL
\NC new mingw, node functions               \NC  9.0  \NC  3.1  \NC  20.8  \NC \NR
\NC new mingw, indexed plus some functions  \NC  8.6  \NC  3.1  \NC  19.6  \NC \NR
\NC new mingw, direct functions             \NC  7.5  \NC  2.6  \NC  14.4  \NC \NR
\HL
\stoptabulate

The second row shows what happens when we use the adapted \CONTEXT\ code with
an older binary: we're slower. The last row is what we will have eventually.
All documents show a nice gain in speed, and future extensions to \CONTEXT\
will no longer have the same impact as before. This is because what we see here
also includes \TEX\ activity. The 300\% increase in speed of node access makes
node processing less influential. On average we gain 25\% here, and as on these
documents \LUAJITTEX\ gives us some 40\% gain with indexed access, it gives
more than 50\% with the direct function based variant.

In the fonts manual some 25 million getter accesses happen, while the setters
don't exceed one million. I lost the tracing files, but at some point the
Arabic test showed more than 100 million accesses. So it's safe to conclude
that setters are sort of negligible. In the fonts manual the number of accesses
to the previous node was less than 5000, while the id and next fields were the
clear winners, and the list and leader fields also scored high. Of course it
all depends on the kind of document and the features used, but we think that
the current set of helpers is quite adequate. And because we decided to provide
them for normal nodes as well, there is no need to go direct for simpler cases.

Maybe in the future further tracing might show that adding getters for width,
height, depth and other properties of glyph, glue, kern, penalty, rule, hlist
and vlist nodes can be of help, but quite probably only in direct mode combined
with extensive list manipulation. We will definitely explore other getters, but
only after the current set has proven to be useful.

\stopsection

\startsection[title={Nuts}]

So why go nuts, and what are nuts? In Dutch, \quote {node} sounds a bit like
\quote {noot}, which translates back to \quote {nut}. And as in \CONTEXT\ I
needed a word for these direct nodes, they became \quote {nuts}. It also suits
this project: at some point we were going nuts because we could not squeeze
more out of \LUAJITTEX, so we started looking at other options. And we're sure
some folks consider us to be nuts anyway, because we spend time on speeding up.
And adapting the \LUATEX\ and \CONTEXT\ \MKIV\ code mid||summer is also kind of
nuts.

At the \CONTEXT\ 2013 conference we will present this new magic, and by that
time we will have done enough tests to see if it works out well. The \LUATEX\
engine will provide the new helpers, but they will stay experimental for a
while, as one never knows where we messed up.

I end with another measurement set. Every now and then I play with a \LUA\
variant of the \TEX\ par builder. At some point it will show up in \MKIV, but
first I want to abstract it a bit more and provide some hooks. In order to test
the performance I use the following tests:

% \testfeatureonce{1000}{\tufte \par}

\starttyping
\testfeatureonce{1000}{\setbox0\hbox{\tufte}}

\testfeatureonce{1000}{\setbox0\vbox{\tufte}}

\startparbuilder[basic]
  \testfeatureonce{1000}{\setbox0\vbox{\tufte}}
\stopparbuilder
\stoptyping

We use a \type {\hbox} to determine the baseline performance. Then we break
lines using the built|-|in parbuilder. Next we do the same, but now with the
\LUA\ variant. \footnote {If we also enable protrusion and hz, the \LUA\
variant suffers less because it implements this more efficiently.}

\starttabulate[|l|l|l|l|l|]
\HL
\NC                \NC \bf \rlap{luatex} \NC \NC \bf \rlap{luajittex} \NC \NC \NR
\HL
\NC                \NC \bf total \NC \bf linebreak \NC \bf total \NC \bf linebreak \NC \NR
\HL
\NC 223 pp nodes   \NC 5.67      \NC 2.25 flushing \NC 3.64      \NC 1.58 flushing \NC \NR
\HL
\NC hbox nodes     \NC 3.42      \NC               \NC 2.06      \NC               \NC \NR
\NC vbox nodes     \NC 3.63      \NC 0.21 baseline \NC 2.27      \NC 0.21 baseline \NC \NR
\NC vbox lua nodes \NC 7.38      \NC 3.96          \NC 3.95      \NC 1.89          \NC \NR
\HL
\NC 223 pp nuts    \NC 4.07      \NC 1.62 flushing \NC 2.36      \NC 1.11 flushing \NC \NR
\HL
\NC hbox nuts      \NC 2.45      \NC               \NC 1.25      \NC               \NC \NR
\NC vbox nuts      \NC 2.53      \NC 0.08 baseline \NC 1.30      \NC 0.05 baseline \NC \NR
\NC vbox lua nodes \NC 6.16      \NC 3.71          \NC 3.03      \NC 1.78          \NC \NR
\NC vbox lua nuts  \NC 5.45      \NC 3.00          \NC 2.47      \NC 1.22          \NC \NR
\HL
\stoptabulate

We see that in this test nuts have an advantage over nodes. In this case we
mostly measure simple font processing and there is no markup involved. Even a
223 page document with only simple paragraphs needs to be broken across pages,
wrapped in page ornaments and shipped out. The overhead tagged as \quote
{flushing} indicates how much extra time would have been involved in that.
These numbers demonstrate that with nuts the \LUA\ parbuilder performs 10\%
better, so we gain some. In a regular document only part of the processing
involves paragraph building, so switching to a \LUA\ variant has no big impact
anyway, unless we have simple documents (like novels). When we bring hz into
the picture performance will drop (and users occasionally report this), but
here we already found out that this is mostly an implementation issue: the
\LUA\ variant suffers less, so we will backport some of the improvements.
\footnote {There are still some aspects that can be improved. For instance,
these tests still check lists for \type {prev} fields, something that is not
needed in future versions.}

\stopsection

\startsection[title={\LUA\ 5.3}]

When we were working on this, the first working version of \LUA\ 5.3 was
announced. Apart from some minor changes that won't affect us, the most
important change is the introduction of integers deep down. On the one hand we
can benefit from this, given that we adapt the \TEX|-|\LUA\ interfaces a bit:
the distinction between \type {to_number} and \type {to_integer} for instance.
And numbers are always somewhat special in \TEX, as it relates to reproduction
on different architectures, also over time. There are some changes in
conversion to string (which need attention) and maybe at some point also in the
automated casting from strings to numbers (the last is no big deal for us).

On the one hand the integers might have a positive influence on performance,
especially as scaled points are integers and because fonts use them too (maybe
there is some advantage in memory usage). But we also need a proper, efficient
round function (or operator) then. I'm wondering if mixed integer and float
usage will be efficient, but on the other hand we don't do that many
calculations, so the benefits might outweigh the drawbacks.

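In \LUA\ 5.3 the subtype of a number can be inspected and conversions are
explicit, as this little sketch shows:

\starttyping
print(math.type(65536))        -- integer
print(math.type(65536.0))      -- float
print(math.tointeger(65536.0)) -- 65536, now an integer
print(7 // 2)                  -- 3, integer (floor) division
print(math.floor(1.5 + 0.5))   -- 2, one way to round
\stoptyping
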
We noticed that 5.2 was somewhat faster, but that the experimental generational
garbage collector makes runs slower. Let's hope that the garbage collector
performance doesn't degrade. But the relative gain of node versus direct will
probably stay.

Because we already have an experimental setup, we will probably experiment a
bit with this in the future. Of course the question then is how \LUAJITTEX\
will work out: because it is already not 5.2 compatible, it remains to be seen
if it will support the next level. At least in \CONTEXT\ \MKIV\ we can prepare
ourselves, as we did with \LUA\ 5.2, so that we're ready when we follow up.

\stopsection

\stopchapter