% language=us engine=luajittex \startluacode local nofjitruns = 5000 local runnow = string.find(environment.jobname,"about%-jitting") and jit local runtimes = table.load("about-jitting-jit.lua") or { nofjitruns = nofjitruns, timestamp = os.currenttime(), } document.NOfJitRuns = runtimes.nofjitruns or nofjitruns document.JitRunTimes = runtimes function document.JitRun(specification) local code = buffers.getcontent(specification.name) if runnow then local function testrun(how) local test = load(code)() collectgarbage("collect") jit[how]() local t = os.clock() for i=1,document.NOfJitRuns do test() end t = os.clock() - t jit.off() return string.format("%0.3f",t) end local rundata = { off = testrun("off"), on = testrun("on"), } runtimes[code] = rundata document.JitTiming = rundata else local rundata = runtimes[code] or { } document.JitTiming = { off = rundata.off or "0", on = rundata.on or "0", } end end \stopluacode \starttexdefinition LuaJitTest #1% \ctxlua{document.JitRun { name = "#1" } } \starttabulate[|lT|lT|] \NC off \NC \cldcontext{document.JitTiming.off} \NC \NR \NC on \NC \cldcontext{document.JitTiming.on } \NC \NR \stoptabulate \stoptexdefinition \starttexdefinition NOfLuaJitRuns \cldcontext{document.NOfJitRuns} \stoptexdefinition % end of code \startcomponent about-jitting \environment about-environment \definehead[jittestsection][subsubsection][color=,style=bold] \startchapter[title=Luigi's nightmare] \startsection[title=Introduction] If you have a bit of a background in programming and watch kids playing video games, either or not on a dedicates desktop machine, a console or even a mobile device, there is a good change that you realize how much processing power is involved. All those pixels get calculated many times per second, based on a dynamic model that not only involves characters, environment, physics and a story line but also immediately reacts on user input. If on the other hand in your text editor hit the magic key combination that renders a document source into for instance a \PDF\ file, you might wonder why that takes so many seconds. Of course it does matter that some resources are loaded, that maybe images are included, and lots of fuzzy logic makes things happen, but the most important factor is without doubt that \TEX\ macros are not compiled into machine code but into an intermediate representation. Those macros then get expanded, often over and over again, and that a relative slow process. As (local) macros can be redefined any time, the engine needs to take that into account and there is not much caching going on, unless you explicitly define macros that do so. Take this: \starttyping \def\bar{test} \def\foo{test \bar\space test} \stoptyping Even if the definition of \type {\test} stays the same, that if \type {\bar} can change: \starttyping \foo \def\bar{foo} \foo \stoptyping There is no mechanism to freeze the meaning of \type {\bar} in \type {\foo}, something that is possible in the other language used in \CONTEXT: \starttyping local function bar() context("test") end function foo() context("test ") bar() context(" test") end \stoptyping Here we can use local functions to limit their scope. \starttyping foo() local function bar() context("foo") end foo() \stoptyping In a way you can say that \TEX\ is a bit more dynamic that \LUA, and optimizing (as well as hardening) it is much more difficult. In \CONTEXT\ we already stretched that to the limits, although occasionally I find ways to speed up a bit. Given that we spend a considerable amount of runtime in \LUA\ it makes sense to see what we can gain there. We have less possible interference and often a more predictable outcome as \type {bar}s won't suddenly become \type {foo}s. Nevertheless, the dynamic nature of both \TEX\ and \LUA\ has some impact on performance, especially when they do most of the work. While in games there are dedicated chips to do tasks, for \TEX\ there aren't. So, we're sort of stuck when it comes to speeding up the process to the level that is similar to advanced games. In the next sections I will discuss a few aspects of possible speedups and the reason why it doesn't work out as expected. \stopsection \startsection[title=Jitting] Let's go back once more to Luigi's nightmare of disappointing jit \footnote {Luigi Scarso is the author of \LUAJITTEX\ and we have reported on experiments with this variant of \LUATEX\ on several occasions.} We already know that the virtual machine of \LUAJIT\ is about twice as fast as the standard machine. We also experienced that enabling jit can degrade performance. Although we did observe some real drastic drop in performance when testing functions like \type {math.random} using the \type {mingw} compiler, we also saw a performance boost with simple pure \LUA\ functions. In that respect \LUAJIT\ is an impressive effort. So, it makes sense to use \LUAJITTEX\ even if in theory it could be faster. Next some tests will be shown. The timings are snapshots so different versions of \LUAJITTEX\ can have different outcomes. The tests are mostly used for discussions between Luigi and me and further experiments and believe me: we've really done all kind of tests to see if we can get some speed out of jitting. After all it's hard to believe that we can't gain something from it, so we might as do something wrong. Each test is run \NOfLuaJitRuns\ times. These are of course non|-|typical examples but they illustrate the principle. Each time we show two measurements: one with jit turned on, and one with jit off, but in both cases the faster virtual machine is enabled. The times shown are of course dependent on the architecture and operating system, but as we are only interested in relative times it's enough to know that we run 32 bit mingw binaries under 64 bit Windows 8 on a modern quad core Ivy bridge \CPU. We did most tests with \LUAJIT\ 2.0.1 but as far as we can see 2.0.2 has a similar performance. \startjittestsection[title={simple loops, no function calls}] \startbuffer[jittest] return function() local a = 0 for i=1,10000 do a = a + i end end \stopbuffer \typebuffer[jittest] \LuaJitTest{jittest} \stopjittestsection \startjittestsection[title={simple loops, with simple function}] \startbuffer[jittest] local function whatever(i) return i end return function() local a = 0 for i=1,10000 do a = a + whatever(i) end end \stopbuffer \typebuffer[jittest] \LuaJitTest{jittest} \stopjittestsection \startjittestsection[title={simple loops, with built-in basic functions}] \startbuffer[jittest] return function() local a = 0 for i=1,10000 do a = a + math.sin(1/i) end end \stopbuffer \typebuffer[jittest] \LuaJitTest{jittest} \stopjittestsection \startjittestsection[title={simple loops, with built-in simple functions}] \startbuffer[jittest] return function() local a = 0 for i=1,1000 do local a = a + tonumber(tostring(i)) end end \stopbuffer \typebuffer[jittest] \LuaJitTest{jittest} \stopjittestsection \startjittestsection[title={simple loops, with built-in simple functions}] \startbuffer[jittest] local tostring, tonumber = tostring, tonumber return function() local a = 0 for i=1,1000 do local a = a + tonumber(tostring(i)) end end \stopbuffer \typebuffer[jittest] \LuaJitTest{jittest} \stopjittestsection \startjittestsection[title={simple loops, with built-in complex functions}] \startbuffer[jittest] return function() local a = 0 local p = (1-lpeg.P("5"))^0 * lpeg.P("5") + lpeg.Cc(0) for i=1,100 do local a = a + lpeg.match(p,tostring(i)) end end \stopbuffer \typebuffer[jittest] \LuaJitTest{jittest} \stopjittestsection \startjittestsection[title={simple loops, with foreign function}] \startbuffer[jittest] return function() local a = 0 for i=1,10000 do a = a + font.current() end end \stopbuffer \typebuffer[jittest] \LuaJitTest{jittest} \stopjittestsection \startjittestsection[title={simple loops, with wrapped foreign functions}] \startbuffer[jittest] local fc = font.current function font.xcurrent() return fc() end return function() local a = 0 for i=1,10000 do a = a + font.xcurrent() end end \stopbuffer \typebuffer[jittest] \LuaJitTest{jittest} \stopjittestsection What we do observe here is that turning on jit doesn't always help. By design the current just|-|in|-|time compiler aborts optimization when it sees a function that is not known. This means that in \LUAJITTEX\ most code will not get jit, because we use built|-|in library calls a lot. Also, in version 2.0 we notice that a bit of extra wrapping will make performance worse too. This might be why for us jitting doesn't work out the way it is advertised. Often performance tests are done with simple functions that use built in functions that do get jit. And the more of those are supported, the better it gets. Although, when you profile a \CONTEXT\ run, you will notice that we don't call that many standard library functions, at least not so often that jitting would get noticed. A safe conclusion is that you can benefit a lot from the fast virtual machine but should check carefully if jit is not having a negative impact. As it is turned on by default in \LUAJIT\ (but off in \LUAJITTEX) it might as well get unnoticed, especially because there is always a performance gain due to the faster virtual machine and that might show more overall gain than the drawback of jitting unjittable code. It might just be a bit less drastic then possible because of artifacts mentioned here, but who knows what future versions of \LUAJIT\ will bring. Maybe sometime we can benefit from \type {ffi} but it makes no sense to mess up the \CONTEXT\ code with related calls: it looks ugly and also makes the code unusable in stock \LUA, so it is a a sort of no|-|go. There are some suggestions in \LUAJIT\ related posts about adapting the code to suit the jitter, but again, that makes no sense. If we need to keep a specific interpreter in mind, we could as well start writing everything in C. So, our hopes are on future versions of stock \LUA\ and \LUAJIT. Luigi uncovered the following comment in the source code: \starttyping /* C functions can have arbitrary side-effects and are not recorded (yet). */ \stoptyping Although the \type {(yet)} indicates that at some point this restriction can be lifted, we don't expect this to happen soon. And patching the jit machinery ourselves to suite \LUATEX\ is no option. There is an important difference between a \LUATEX\ run and other programs: they are runs and these live short. A lot of code gets executed only once of a few times (like loading fonts), or gets executed in such different ways that (branch) prediction is hard. If you run a web server using \LUA\ it runs for weeks in a row so optimizing a function pays off, given that it gets optimized. When you have a \LUA\ enhanced interactive program, again, the session is long enough to benefit from jitting (if applied). And, when you crunch numbers, it might pay off too. In practice, a \TEX\ run has no such characteristics. \stopsection \startsection[title=Implementation] In \LUA\ 5.2 there are some changes in the implementation compared to 5.1 and before. It is hard to measure the impact of that but it's probably a win some here and loose some there situation. A good example is the way \LUA\ deals with strings. Before 5.2 all strings were hashed, but now only short strings are (at most 32 bytes are looked at). Now, consider this: \startitemize \startitem In \CONTEXT\ we do all font handling in \LUA\ and that involves lots of tables with lots of (nicely hashed) short keys. So, comparing them is pretty fast. \stopitem \startitem We also read a lot from files, and each line passes filters and such before it gets passed to \TEX. There hashing is not really needed, although when it gets processed by filters it might as well save some time. \stopitem \startitem When we go from \TEX\ to \LUA\ and reverse, lots of strings are involved and many of them are unique and used once. There hashing might bring a penalty. \stopitem \startitem When we loop over a string with \type {gmatch} or some \type {lpeg} subprogram lots of (small) strings can get created and each gets hashed, even if they have a short livespan. \stopitem \stopitemize The above items indicate that we can benefit from hashing but that sometimes it might have a performance hit. My impression is that on the average we're better off by hashing and it's one of the reasons why \LUA\ is so fast (and useable). In \TEX\ all numbers are integers and in \LUA\ all numbers are floats. On modern computers dealing with floating point is fast and we're not crunching numbers anyway. We definitely would have an issue when numbers were just integers and an upcoming mixed integer|/|float model might not be in our advantage. We'll see. I had expected to benefit from bitwise operations but so far never could find a real application in \CONTEXT, at least not one that had a positive impact. But maybe it's just a way of thinking that hasn't evolved yet. Also, the fact that functions are used instead of a real language extension makes it less possible that there is a speedup involved. \stopsection \startsection[title=Garbage collection] In the beginning I played with tuning the \LUA\ garbage collector in order to improve performance. For some documents changing the step and multiplier worked out well, but for others it didn't, so I decided that one can best leave the values as they are. Turning the garbage collector off as expected gives a relative small speedup, and for the average run the extra memory used can be neglected. Just keep in mind that a \TEX\ run are never persistent so memory can't keep filling. I did some tests with the in theory faster (experimental) generational mode of the garbage collector but it made runs significantly slower. For instance processing the \type {fonts-mkiv.pdf} went from 9 to 9.5 seconds. \stopsection \startsection[title=Conclusion] So what is, given unpredictable performance hits of advertised optimizations, the best approach. It all starts by the \LUA\ (and \TEX) code: sloppy coding can have a price. Some of that can be disguised by clever interpreters but some can't. If the code is already fast, there is not much to gain. When going from \MKII\ to \MKIV\ more and more \LUA\ got introduced and lots of approaches were benchmarked, so, I'm already rather confident that there is not that much to gain. It will never have the impressive performance of interactive games and that's something we have to live with. As long as \LUA\ stays lean and mean, things can only get better over time. \stopsection \startluacode table.save("about-jitting-jit.lua",document.JitRunTimes) \stopluacode \stopchapter \stopcomponent