% language=us engine=luajittex
% about-jitting.tex (15 Kb), last modification: 2023-12-21 09:43

\startluacode

    -- number of times each test function is called per measurement
    local nofjitruns = 5000

    -- only measure when this file itself is processed and jit is available
    local runnow     = string.find(environment.jobname,"about%-jitting") and jit

    -- timings saved by a previous run, keyed by the code of each test
    local runtimes   = table.load("about-jitting-jit.lua") or {
        nofjitruns = nofjitruns,
        timestamp  = os.currenttime(),
    }

    document.NOfJitRuns  = runtimes.nofjitruns or nofjitruns
    document.JitRunTimes = runtimes

    function document.JitRun(specification)

        local code = buffers.getcontent(specification.name)

        if runnow then

            -- time the function returned by the buffer with jit switched off or on
            local function testrun(how)
                local test = load(code)()
                collectgarbage("collect")
                jit[how]()
                local t = os.clock()
                for i=1,document.NOfJitRuns do
                    test()
                end
                t = os.clock() - t
                jit.off()
                return string.format("%0.3f",t)
            end

            local rundata = {
                off = testrun("off"),
                on  = testrun("on"),
            }

            runtimes[code]     = rundata
            document.JitTiming = rundata

        else

            -- reuse the saved timings, or show zeros when there are none
            local rundata      = runtimes[code] or { }

            document.JitTiming = {
                off = rundata.off or "0",
                on  = rundata.on  or "0",
            }

        end

    end

\stopluacode

\starttexdefinition LuaJitTest #1%

    \ctxlua{document.JitRun { name = "#1" } }

    \starttabulate[|lT|lT|]
        \NC off \NC \cldcontext{document.JitTiming.off} \NC \NR
        \NC on  \NC \cldcontext{document.JitTiming.on } \NC \NR
    \stoptabulate

\stoptexdefinition

\starttexdefinition NOfLuaJitRuns
    \cldcontext{document.NOfJitRuns}
\stoptexdefinition

% end of code

\startcomponent about-jitting

\environment about-environment

\definehead[jittestsection][subsubsection][color=,style=bold]

\startchapter[title=Luigi's nightmare]

\startsection[title=Introduction]

If you have a bit of a background in programming and watch kids playing video
games, whether on a dedicated desktop machine, a console or even a mobile
device, there is a good chance that you realize how much processing power is
involved. All those pixels get calculated many times per second, based on a
dynamic model that not only involves characters, environment, physics and a story
line but also immediately reacts to user input.

If, on the other hand, you hit the magic key combination in your text editor that
renders a document source into, for instance, a \PDF\ file, you might wonder why
that takes so many seconds. Of course it matters that some resources are
loaded, that maybe images are included, and that lots of fuzzy logic makes things
happen, but the most important factor is without doubt that \TEX\ macros are not
compiled into machine code but into an intermediate representation. Those macros
then get expanded, often over and over again, and that is a relatively slow
process. As (local) macros can be redefined at any time, the engine needs to take
that into account and there is not much caching going on, unless you explicitly
define macros that do so. Take this:

\starttyping
\def\bar{test}
\def\foo{test \bar\space test}
\stoptyping

Even if the definition of \type {\foo} stays the same, that of \type {\bar} can
change:

\starttyping
\foo \def\bar{foo} \foo
\stoptyping

There is no mechanism to freeze the meaning of \type {\bar} in \type {\foo},
something that is possible in the other language used in \CONTEXT:

\starttyping
local function bar() context("test") end
function foo() context("test ") bar() context(" test") end
\stoptyping

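Here we can use local functions to limit their scope. Because \type {foo} refers
to the \type {bar} that was visible when \type {foo} was defined, the new local
\type {bar} in the following line does not change what the second call to
\type {foo} typesets:
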
\starttyping
foo() local function bar() context("foo") end foo()
\stoptyping
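
Outside \CONTEXT\ this behaviour is easy to verify with a stand|-|alone
interpreter. The following is a small sketch for stock \LUA\ or \LUAJIT. As there
is no \type {context} function there, a made|-|up replacement that collects its
argument in a table is used instead:

\starttyping
-- made-up stand-in for context(): collect the output in a table
local out = { }
local function context(s) out[#out+1] = s end

local function bar() context("test") end
function foo() context("test ") bar() context(" test") end

foo()
local first = table.concat(out)
out = { }

-- a new local bar: foo still calls the bar it captured when it was defined
local function bar() context("foo") end
foo()
local second = table.concat(out)

print(first)  -- test test test
print(second) -- test test test
\stoptyping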

In a way you can say that \TEX\ is a bit more dynamic than \LUA, and optimizing
(as well as hardening) it is much more difficult. In \CONTEXT\ we have already
stretched that to the limits, although occasionally I find ways to speed things
up a bit. Given that we spend a considerable amount of runtime in \LUA\ it makes
sense to see what we can gain there. There is less room for interference and
often a more predictable outcome, as \type {bar}s won't suddenly become
\type {foo}s.

Nevertheless, the dynamic nature of both \TEX\ and \LUA\ has some impact on
performance, especially when they do most of the work. While games can rely on
dedicated chips for specific tasks, \TEX\ has no such help. So, we're sort of
stuck when it comes to speeding up the process to a level similar to that of
advanced games. In the next sections I will discuss a few aspects of possible
speedups and the reasons why they don't work out as expected.

\stopsection

\startsection[title=Jitting]

Let's go back once more to Luigi's nightmare of disappointing jitting.\footnote
{Luigi Scarso is the author of \LUAJITTEX\ and we have reported on experiments
with this variant of \LUATEX\ on several occasions.} We already know that the
virtual machine of \LUAJIT\ is about twice as fast as the standard machine. We
have also experienced that enabling jit can degrade performance. Although we did
observe a really drastic drop in performance when testing functions like \type
{math.random} using the \type {mingw} compiler, we also saw a performance boost
with simple pure \LUA\ functions. In that respect \LUAJIT\ is an impressive
effort. So, it makes sense to use \LUAJITTEX\ even if in theory it could be
faster.

Next some tests will be shown. The timings are snapshots, so different versions
of \LUAJITTEX\ can have different outcomes. The tests are mostly used for
discussions between Luigi and me and for further experiments, and believe me:
we've really done all kinds of tests to see if we can get some speed out of
jitting. After all, it's hard to believe that we can't gain something from it,
so maybe we are doing something wrong.

Each test is run \NOfLuaJitRuns\ times. These are of course non|-|typical
examples but they illustrate the principle. Each time we show two measurements:
one with jit turned off and one with jit on, but in both cases the faster
virtual machine is used. The times shown of course depend on the architecture
and operating system, but as we are only interested in relative times it's
enough to know that we run 32 bit mingw binaries under 64 bit Windows 8 on a
modern quad core Ivy Bridge \CPU. We did most tests with \LUAJIT\ 2.0.1 but as
far as we can see 2.0.2 performs similarly.
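
The measurements are done by the \type {document.JitRun} helper in the \LUA\
code at the top of this file. A stand|-|alone variant could look like the sketch
below, assuming plain \LUAJIT, or stock \LUA\ where there is no \type {jit}
library and only the first timing is taken. The names \type {timeit} and
\type {comparejit} are made up for this example; \type {os.clock} measures
processor time:

\starttyping
-- made-up helper: processor time for `runs` calls of f
local function timeit(f, runs)
    collectgarbage("collect")           -- start from a clean heap
    local t = os.clock()
    for i=1,runs do
        f()
    end
    return os.clock() - t
end

-- made-up helper: time f with jit off and on (the latter only in LuaJIT)
local function comparejit(f, runs)
    local result = { }
    if type(jit) == "table" then
        jit.off()
        result.off = timeit(f, runs)
        jit.on()
        result.on  = timeit(f, runs)
        jit.off()                       -- like the helper in this file, end with jit off
    else
        result.off = timeit(f, runs)    -- stock Lua: nothing to switch
    end
    return result
end

local function test()
    local a = 0
    for i=1,10000 do
        a = a + i
    end
    return a
end

local r = comparejit(test, 500)
print(string.format("off %0.3f", r.off))
if r.on then
    print(string.format("on  %0.3f", r.on))
end
\stoptyping

Only the ratio between the two numbers matters, which is why the same function is
timed twice in the same process.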

\startjittestsection[title={simple loops, no function calls}]

\startbuffer[jittest]
return function()
    local a = 0
    for i=1,10000 do
        a = a + i
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with simple function}]

\startbuffer[jittest]
local function whatever(i)
    return i
end

return function()
    local a = 0
    for i=1,10000 do
        a = a + whatever(i)
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with built-in basic functions}]

\startbuffer[jittest]
return function()
    local a = 0
    for i=1,10000 do
        a = a + math.sin(1/i)
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with built-in simple functions}]

\startbuffer[jittest]
return function()
    local a = 0
    for i=1,1000 do
        local a = a + tonumber(tostring(i))
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with localized built-in simple functions}]

\startbuffer[jittest]
local tostring, tonumber = tostring, tonumber
return function()
    local a = 0
    for i=1,1000 do
        local a = a + tonumber(tostring(i))
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with built-in complex functions}]

\startbuffer[jittest]
return function()
    local a = 0
    local p = (1-lpeg.P("5"))^0 * lpeg.P("5") + lpeg.Cc(0)
    for i=1,100 do
        local a = a + lpeg.match(p,tostring(i))
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with foreign function}]

\startbuffer[jittest]
return function()
    local a = 0
    for i=1,10000 do
        a = a + font.current()
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with wrapped foreign functions}]

\startbuffer[jittest]
local fc = font.current

function font.xcurrent()
    return fc()
end

return function()
    local a = 0
    for i=1,10000 do
        a = a + font.xcurrent()
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

What we do observe here is that turning on jit doesn't always help. By design the
current just|-|in|-|time compiler aborts optimization when it encounters a
function that it does not know. This means that in \LUAJITTEX\ most code will not
get jitted, because we use built|-|in library calls a lot. Also, in version 2.0
we notice that a bit of extra wrapping makes performance worse too. This might be
why for us jitting doesn't work out the way it is advertised. Performance tests
are often done with simple functions that use built|-|in functions that do get
jitted, and the more of those are supported, the better it gets. That said, when
you profile a \CONTEXT\ run, you will notice that we don't call that many
standard library functions, at least not so often that jitting would get noticed.
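
With a stand|-|alone \LUAJIT\ one can watch this happen: the \type {-jv} command
line option, or the \type {jit.v} module that comes with \LUAJIT, reports the
traces that get compiled and the ones that get aborted, including the reason. A
minimal sketch, assuming \type {jit.v} can be found on the module path, which need
not be the case in \LUAJITTEX. The exact messages depend on the \LUAJIT\ version:

\starttyping
-- load LuaJIT's verbose trace reporter when it is available
local ok, verbose = pcall(require, "jit.v")
if ok then
    verbose.start()   -- report to stderr, or to the file named in LUAJIT_VERBOSEFILE
end

local function work()
    local a = 0
    for i=1,10000 do
        a = a + tonumber(tostring(i))
    end
    return a
end

local total = work()

if ok then
    verbose.off()
end
print(total)
\stoptyping

In stock \LUA\ the \type {require} simply fails inside \type {pcall} and the loop
runs without any report.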

A safe conclusion is that you can benefit a lot from the fast virtual machine but
should check carefully whether jit has a negative impact. As it is turned on by
default in \LUAJIT\ (but off in \LUAJITTEX) it might well go unnoticed,
especially because there is always a performance gain due to the faster virtual
machine, and that gain might outweigh the cost of trying to jit unjittable code.
The overall gain might just be smaller than possible because of the artifacts
mentioned here, but who knows what future versions of \LUAJIT\ will bring.
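
Because the default differs between \LUAJIT\ and \LUAJITTEX, it is worth checking
the state of the compiler before drawing conclusions from timings. \LUAJIT\ can
also switch off jitting for a single function, so that code which suffers from it
can be excluded while the rest keeps the default. A minimal sketch: the helper
names are made up, while \type {jit.status}, \type {jit.version} and
\type {jit.off} belong to the \type {jit} library of \LUAJIT:

\starttyping
-- made-up helper: describe the interpreter and the jit state
local function jitstate()
    if type(jit) ~= "table" or type(jit.status) ~= "function" then
        return _VERSION .. ", no jit"
    end
    local enabled = jit.status()        -- the first result tells if jit is on
    return jit.version .. (enabled and ", jit on" or ", jit off")
end

-- made-up helper: exclude f and its nested functions from jitting
local function nojit(f)
    if type(jit) == "table" then
        jit.off(f, true)
    end
    return f
end

local inc = nojit(function(x) return x + 1 end)

print(jitstate())
print(inc(1)) -- 2
\stoptyping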

Maybe at some point we can benefit from \type {ffi}, but it makes no sense to
mess up the \CONTEXT\ code with related calls: it looks ugly and also makes the
code unusable in stock \LUA, so it is a sort of no|-|go. There are some
suggestions in \LUAJIT\ related posts about adapting the code to suit the jitter,
but again, that makes no sense. If we need to keep a specific interpreter in
mind, we might as well start writing everything in C. So, our hopes are on future
versions of stock \LUA\ and \LUAJIT. Luigi uncovered the following comment in the
source code:

\starttyping
/* C functions can have arbitrary side-effects and are not
recorded (yet). */
\stoptyping

Although the \type {(yet)} indicates that at some point this restriction might be
lifted, we don't expect this to happen soon. And patching the jit machinery
ourselves to suit \LUATEX\ is not an option.

There is an important difference between a \LUATEX\ run and other programs: a
run is short|-|lived. A lot of code gets executed only once or a few times (like
loading fonts), or gets executed in such different ways that (branch) prediction
is hard. If you run a web server using \LUA\ it runs for weeks in a row, so
optimizing a function pays off, provided that it gets optimized. When you have a
\LUA\ enhanced interactive program, again, the session is long enough to benefit
from jitting (if applied). And, when you crunch numbers, it might pay off too. In
practice, a \TEX\ run has no such characteristics.

\stopsection

\startsection[title=Implementation]

In \LUA\ 5.2 there are some changes in the implementation compared to 5.1 and
before. It is hard to measure their impact, but it's probably a matter of winning
some here and losing some there. A good example is the way \LUA\ deals with
strings. Before 5.2 all strings were hashed, but now only short strings are
(at most 32 bytes are looked at). Now, consider this:

\startitemize
    \startitem
        In \CONTEXT\ we do all font handling in \LUA\ and that involves lots of
        tables with lots of (nicely hashed) short keys. So, comparing them is
        pretty fast.
    \stopitem
    \startitem
        We also read a lot from files, and each line passes filters and such
        before it gets passed to \TEX. There hashing is not really needed,
        although it might save some time when the lines are processed by
        filters.
    \stopitem
    \startitem
        When we go from \TEX\ to \LUA\ and back, lots of strings are involved
        and many of them are unique and used once. There hashing might bring a
        penalty.
    \stopitem
    \startitem
        When we loop over a string with \type {gmatch} or some \type {lpeg}
        subprogram lots of (small) strings can get created and each gets hashed,
        even if they have a short lifespan.
    \stopitem
\stopitemize
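
The last item is easy to illustrate: every match returned by \type {gmatch} is a
new string, while \type {string.find} only returns positions. A small sketch that
counts words both ways, assuming nothing but stock \LUA; the function names are
made up. Whether avoiding the strings pays off depends on the interpreter and
should be measured rather than assumed:

\starttyping
-- count words; every match becomes a new string (hashed, as these are short)
local function countwords_gmatch(s)
    local n = 0
    for word in string.gmatch(s, "%S+") do
        n = n + 1
    end
    return n
end

-- count words using positions only; no substrings are created
local function countwords_find(s)
    local n, position = 0, 1
    while true do
        local first, last = string.find(s, "%S+", position)
        if not first then
            return n
        end
        n = n + 1
        position = last + 1
    end
end

local line = "we also read a lot from files"
print(countwords_gmatch(line), countwords_find(line)) -- both count 7 words
\stoptyping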

The above items indicate that we can benefit from hashing but that sometimes it
might have a performance hit. My impression is that on average we're better off
with hashing and that it's one of the reasons why \LUA\ is so fast (and usable).

In \TEX\ all numbers are integers and in \LUA\ all numbers are floats. On modern
computers dealing with floating point is fast and we're not crunching numbers
anyway. We definitely would have an issue if numbers were only integers, and an
upcoming mixed integer|/|float model might not work to our advantage. We'll see.
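
For \TEX\ dimensions the float model is harmless: a double represents every
integer up to $2^{53}$ exactly, far beyond the largest dimension \TEX\ accepts,
which is $2^{30}-1$ scaled points. The sketch below, assuming only stock \LUA,
also shows how to detect the integer subtype that \LUA\ 5.3 introduced, using
\type {math.type}; the helper name is made up:

\starttyping
-- made-up helper: "integer" or "float"; before Lua 5.3 every number is a float
local function numbertype(n)
    if math.type then
        return math.type(n)
    end
    return "float"
end

local maxdimen = 1073741823          -- TeX's \maxdimen in scaled points, 2^30 - 1
local twice    = maxdimen + maxdimen -- still exact when numbers are doubles

print(numbertype(1), numbertype(1.5), twice)
\stoptyping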

I had expected to benefit from bitwise operations but so far I could never find a
real application in \CONTEXT, at least not one that had a positive impact. But
maybe it's just a way of thinking that hasn't evolved yet. Also, the fact that
functions are used instead of a real language extension makes a speedup less
likely.
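
In \LUA\ 5.2 the bitwise operations come as functions in the \type {bit32}
library, and \LUAJIT\ offers the similar \type {bit} library, so every operation
is a function call; only \LUA\ 5.3 added operators like \type {&}. A minimal
sketch that picks whatever is available and otherwise falls back on plain
arithmetic; the helper names are made up:

\starttyping
-- made-up fallback for non-negative integers: AND, one bit at a time
local function band_fallback(a, b)
    local result, bitvalue = 0, 1
    while a > 0 and b > 0 do
        if a % 2 == 1 and b % 2 == 1 then
            result = result + bitvalue
        end
        a = (a - a % 2) / 2
        b = (b - b % 2) / 2
        bitvalue = bitvalue * 2
    end
    return result
end

-- prefer a library function: bit32 (Lua 5.2) or bit (LuaJIT)
local band = (bit32 and bit32.band)
          or (bit and bit.band)
          or band_fallback

print(band(12, 10)) -- 8
\stoptyping

Either way each operation remains a call, which is the point made above.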

\stopsection

\startsection[title=Garbage collection]

In the beginning I played with tuning the \LUA\ garbage collector in order to
improve performance. For some documents changing the step and multiplier worked
out well, but for others it didn't, so I decided that it is best to leave the
values as they are. As expected, turning the garbage collector off gives a
relatively small speedup, and for the average run the extra memory used can be
neglected. Just keep in mind that a \TEX\ run is never persistent, so memory
can't keep filling up. I did some tests with the experimental generational mode
of the garbage collector, which in theory is faster, but it made runs
significantly slower. For instance, processing \type {fonts-mkiv.pdf} went from
9 to 9.5 seconds.
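
Such experiments only need the standard \type {collectgarbage} function; in
\LUA\ 5.1 and 5.2 the tuning parameters are set with its \type {setpause} and
\type {setstepmul} options. A minimal sketch of the other experiment, running some
work with the collector stopped and reporting how much memory was added in the
meantime; the helper name is made up:

\starttyping
-- made-up helper: run work with the collector stopped, return the growth in KB
local function withoutcollector(work)
    collectgarbage("collect")            -- start from a clean heap
    collectgarbage("stop")
    local before = collectgarbage("count")
    local ok, err = pcall(work)
    local after = collectgarbage("count")
    collectgarbage("restart")            -- always switch the collector back on
    if not ok then
        error(err, 0)
    end
    return after - before
end

local grown = withoutcollector(function()
    local t = { }
    for i=1,10000 do
        t[i] = { i }
    end
end)

print(string.format("%0.1f KB added", grown))
\stoptyping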

\stopsection

\startsection[title=Conclusion]

So, given the unpredictable performance hits of advertised optimizations, what is
the best approach? It all starts with the \LUA\ (and \TEX) code: sloppy coding
can have a price. Some of that can be disguised by clever interpreters but some
can't. If the code is already fast, there is not much to gain. When going from
\MKII\ to \MKIV\ more and more \LUA\ got introduced and lots of approaches were
benchmarked, so I'm already rather confident that there is not that much to
gain. It will never have the impressive performance of interactive games and
that's something we have to live with. As long as \LUA\ stays lean and mean,
things can only get better over time.

\stopsection

\startluacode
    table.save("about-jitting-jit.lua",document.JitRunTimes)
\stopluacode

\stopchapter

\stopcomponent
440