% language=us runpath=texruns:manuals/followingup

\startcomponent followingup-compilation

\environment followingup-style

\logo[WLS]       {WLS}
\logo[INTEL]     {Intel}
\logo[APPLE]     {Apple}
\logo[UBUNTU]    {Ubuntu}
\logo[RASPBERRY] {RaspberryPi}

\startchapter[title={Compilation}]

Compiling \LUATEX\ is possible because, after all, it's what I do on my machine. The \LUATEX\ source tree is part of a larger infrastructure: \TEX\ Live. Managing that one is work for specialists, and the current build system is the work of experts over a rather long period of time. When you only compile \LUATEX\ it goes unnoticed that there are many dependencies, some of which are actually unrelated to \LUATEX\ itself but are a side effect of the complexity of the build structure.

When going from \LUATEX\ to \LUAMETATEX\ many dependencies were removed and I eventually ended up with a simpler setup. The source tree went down to less than 30 MB, and zipped to around 4 MB. That makes it possible to consider adding the code to the regular \CONTEXT\ distribution. One reason for doing that is that one keeps the current version of the engine packaged with the current version of \CONTEXT. But a more important one is that it fulfils a demand. Some time ago we were asked by some teachers participating in a (basically free) math method for technical education what guarantees there are that the tools used will be available forever. Now, even with \LUAMETATEX\ one has to set up a compiler, but that is much easier than installing the whole \TEX\ Live infrastructure for it. A third reason is that it gives me the comfortable feeling that I can compile it anywhere myself, as can \CONTEXT\ users who want to do that.

The source tree traditionally has the libraries in a separate directory (lua, luajit, zlib and zziplib). However, it is more practical to have them alongside our normal source. These are relatively small collections of files that never change, so there is no reason not to do it. \footnote {If I ever decide to add more libraries, only the minimal interfaces needed will be provided, but at this moment there are no such plans.} Another assumption we make is that we use 64 bit binaries. There is no need to support obsolete platforms either. As a start we make sure it compiles on the platforms used by \CONTEXT\ users. Basically we produce a kind of utility. For now I can compile the \WINDOWS\ 32 bit binaries that my colleague needs in half a minute anyway, but in the long run we will settle for 64 bits.

I spent about a week figuring out why the compilation is so complex (by selectively removing components). At some point compilation on \OSX\ stopped working. When the minimum was reached I decided to abandon the automake tool chain and see if \type {cmake} could be used (after all, Mojca had challenged me to do that). In retrospect I should have done that sooner, because within a day I had all relevant platforms working. Flattening the source tree was the next step, and there is no way back now.

What baffled me (and Alan, who at some point joined in testing \OSX) is the speed of compilation. My pretty old laptop needs about half a minute to get the job done, and even a \RASPBERRY\ with only a flash card needs just a few minutes. At that point, as we could remove more make related files, the compressed 11 MB archive (\type {tar.xz}) shrunk to just over 2~MB. It is interesting that compiling \MPLIB\ takes the most time: when one compiles in parallel (on more cores) that one finishes last.
For the record: I do all this on a laptop running \MSWINDOWS\ 10, using the \LINUX\ subsystem (\WLS). When that came around, Luigi made me a working setup for cross compilation, but in the meantime, with \GCC\ 8.2, all works out of the box. I edit the files on the \MSWINDOWS\ end (using \SCITE), compile on the \LINUX\ end, and test everything on \MSWINDOWS. It is a pretty convenient setup.

When compilation got faster, it also became more convenient to do some more code reshuffling. This time I decided to pack the global variables into structures, more or less organized the way the header files were organized; a rough sketch of what that boils down to is given below. It gives a bit more verbosity, but it also has the side effect that (at least in principle) the \CPU\ cache can perform better, because neighboring variables are often cached as part of the deal. Now, it might be imagination, but in the process I did notice that mid March processing the manual went down to below 11.7 seconds, while before it stayed around 12.1 seconds. Of course this is not that relevant currently, but it might make a difference on less capable processors (as in a low power setup). In any case it didn't hurt. In the meantime some of the constants used in the program got prefixes or suffixes to make them more unique, and for instance the use of \type {normal} as an equivalent for zero was made a bit more distinctive, as we now have more subtypes. That is: all the subtypes were collected in enumerations instead of \CCODE\ defines.

Back to the basics. At the end of 2020 I noticed that the binary had grown a bit compared to the mid 2020 versions. This surprised me, because some improvements had actually made it smaller, something you notice when you compile a couple of times while working on these things. I also noticed that the platforms on the compile farm showed quite a bit of variation. In most cases we're still below my 3MB threshold, but when for instance cross compiled binaries become a few hundred KB larger, one can get puzzled. In the \LUAMETAFUN\ manual I have this comment at the top:

\starttyping[style=\ttx]
------------------------  ------------------------  ------------------------
2019-12-17  32bit  64bit  2020-01-10  32bit  64bit  2020-11-30  32bit  64bit
------------------------  ------------------------  ------------------------
freebsd     2270k  2662k  freebsd     2186k  2558k  freebsd     2108k  2436k
openbsd6.6  2569k  2824k  openbsd6.6  2472k  2722k  openbsd6.8  2411k  2782k
linux-armhf 2134k         linux-armhf 2063k         linux-armhf 2138k  2860k
linux       2927k  2728k  linux       2804k  2613k  linux (?)   3314k  2762k
                                                    linux-musl  2532k  2686k
osx                2821k  osx                2732k  osx                2711k
ms mingw    2562k  2555k  ms mingw    2481k  2471k  ms mingw    2754k  2760k
                                                    ms intel           2448k
                                                    ms arm             3894k
                                                    ms clang           2159k
------------------------  ------------------------  ------------------------
\stoptyping

So why the differences? One possible answer is that the cross compiler now uses \GCC\ 9 instead of \GCC\ 8. It is quite likely that inlining is done more aggressively (at least one can find remarks of that kind on the Internet). An interesting exception in this overview is the \LINUX\ 32 bit version. The native \WINDOWS\ binary is smaller than the \MINGW\ binary, but the \CLANG\ variant is smaller still. For the native compilation we always enable link time optimization, which makes compiling a bit slower but still comparable to a regular compilation in \WLS; when we turn on link time optimization for the other compilers, however, the linker takes quite some time. I just turn it off when testing code, because it's no fun to wait those additional minutes with \GCC.
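As an aside, what link time optimization buys is, mechanically speaking, that the compiler can inline and specialize across translation units, something that otherwise only happens within a single file. A trivial, made up illustration:

\starttyping
/* util.c: a tiny helper in its own translation unit */

int double_it(int n)
{
    return 2 * n;
}

/* main.c: when the units are compiled separately the call
   below remains a real function call; when compiled with
   for instance

       gcc -O2 -flto main.c util.c

   the compiler sees both units again at link time, so it
   can inline the helper and fold the result. */

extern int double_it(int n);

int main(void)
{
    return double_it(21);
}
\stoptyping

In a code base of this size the effect is of course a matter of kilobytes and a few percent, as the measurements that follow indicate.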
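Coming back to the variable packing and the enumerations mentioned above, the following is a rough sketch in \CCODE\ of what such a reshuffle looks like. The names are made up for the occasion and are not the actual \LUAMETATEX\ identifiers.

\starttyping
/* Before: loose globals spread over files, and subtypes as
   defines, with a plain zero doubling as 'normal':

       int cur_cmd;
       int cur_chr;
       int cur_cs;

       #define normal 0

   After: related state packed into one structure, so that
   fields used together tend to share cache lines, and the
   subtypes collected in enumerations with more distinctive
   names: */

typedef enum kern_subtypes {
    font_kern_subtype = 0, /* previously just 'normal' */
    explicit_kern_subtype,
    accent_kern_subtype,
} kern_subtypes;

typedef struct scanner_state_info {
    int cur_cmd; /* current command code      */
    int cur_chr; /* current character operand */
    int cur_cs;  /* current control sequence  */
} scanner_state_info;

/* One global instance replaces the loose variables: */

static scanner_state_info scanner_state = { 0, 0, 0 };
\stoptyping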
Given that the native \WINDOWS\ binary by now runs nearly as fast as the cross compiled ones, it is an indication that the native \WINDOWS\ compiler is quite okay. The numbers also show (for \WINDOWS) that using \CLANG\ is not yet an option: the binaries are smaller, but also much slower, and compilation (without link time optimization) takes much longer too. But we'll see how that evolves: the compile farm generates them all.

So, what effect does link time optimization have? The (current) cross compiled binary is some 60KB smaller and performs a little better. Some tests show a gain of some 3~percent, but I'm pretty sure users won't notice that on a normal run. So, if we forget to enable it when we release new binaries, it's no big deal.

Another end of 2020 adventure was generating \ARM\ binaries for \OSX\ and \WINDOWS. This seems to work out well. The \OSX\ binaries were tested, but we don't have the proper hardware in the compile farm, so for now users have to use the \INTEL\ binaries on that hardware. Compiling the \LUAMETATEX\ manual on a 2020 M1 is a little more than twice as fast as on my 2013 i7 laptop running \WINDOWS. A native \ARM\ binary is about three times faster, which is what one expects from a more modern (also a bit performance hyped) chipset. On a \RASPBERRY\ with 4 GB of memory and an external \SSD\ on \USB3, running \UBUNTU\ 20, the manual compiles three times slower than on my laptop. So, when we limit conclusions to \LUAMETATEX, it looks like \ARM\ is catching up: these modern chipsets (from \APPLE\ and \MICROSOFT, although the latter has not yet been tested) with plenty of cache, lots of fast memory, fast graphics and speedy disks are six times faster than a cheap media oriented \ARM\ chipset. Being a single core consumer, \LUAMETATEX\ benefits more from faster cores than from more cores. But, unless I get these machines on my desk, these rough estimates will have to do.

\stopchapter

\stopcomponent