% language=us runpath=texruns:manuals/followingup

\startcomponent followingup-compilation

\environment followingup-style

\logo[WLS]       {WSL}
\logo[INTEL]     {Intel}
\logo[APPLE]     {Apple}
\logo[UBUNTU]    {Ubuntu}
\logo[RASPBERRY] {RaspberryPi}

\startchapter[title={Compilation}]

Compiling \LUATEX\ is possible because after all it's what I do on my machine.
The \LUATEX\ source tree is part of a larger infrastructure: \TEX Live. Managing
that one is work for specialists and the current build system is the work of
experts over a quite long period of time. When you only compile \LUATEX\ it goes
unnoticed that there are many dependencies, some of which are actually unrelated
to \LUATEX\ itself but are a side effect of the complexity of the build
structure.

When going from \LUATEX\ to \LUAMETATEX\ many dependencies were removed and I
eventually ended up with a simpler setup. The source tree went down to less than
30 MB and zipped to around 4 MB. That makes it possible to consider adding the
code to the regular \CONTEXT\ distribution.

One reason for doing that is that one keeps the current version of the engine
packaged with the current version of \CONTEXT. But a more important one is that
it fulfils a demand. Some time ago we were asked by some teachers participating
in a (basically free) math method for technical education what guarantees there
are that the tools used are available forever. Now, even with \LUAMETATEX\ one
has to set up a compiler but it is much easier than installing the whole \TEX
Live infrastructure for that. A third reason is that it gives me a comfortable
feeling that I myself can compile it anywhere as can \CONTEXT\ users who want to
do that.

The source tree traditionally has libs in a separate directory (lua, luajit, zlib
and zziplib). However, it is more practical to have them alongside our normal
source. These are relatively small collections of files that never change so there
is no reason not to do it. \footnote {If I ever decide to add more libraries,
only the minimal interfaces needed will be provided, but at this moment there are
no such plans.}

Another assumption we're going to make is that we use 64 bit binaries. There is
no need to support obsolete platforms either. As a start we make sure it compiles
on the platforms used by \CONTEXT\ users. Basically we make a kind of utility.
For now I can compile the \WINDOWS\ 32 bit binaries that my colleague needs in
half a minute anyway, but in the long run we will settle for 64 bits.

I spent about a week figuring out why the compilation is so complex (by
selectively removing components). At some point compilation on \OSX\ stopped
working. When the minimum was reached I decided to abandon the automake tool
chain and see if \type {cmake} could be used (after all, Mojca challenged that).
In retrospect I should have done that sooner because in a day I could get all
relevant platforms working. Flattening the source tree was the next step and so
there is no way back now. What baffled me (and Alan, who at some point joined in
testing \OSX) is the speed of compilation. My pretty old laptop needed about half
a minute to get the job done and even on a \RASPBERRY\ with only a flash card
just a few minutes were needed. At that point, as we could remove more make
related files, the compressed 11 MB archive (\type {tar.xz}) shrank to just over
2~MB. Interestingly, compiling \MPLIB\ takes the most time, and when one compiles
in parallel (on more cores) that one finishes last.

For the record: I do all this on a laptop running \MSWINDOWS\ 10 using the Linux
subsystem. When that came around, Luigi made me a working setup for cross
compilation but in the meantime with GCC 8.2 all works out of the box. I edit the
files at the \MSWINDOWS\ end (using \SCITE), compile at the \LINUX\ end, and test
everything on \MSWINDOWS. It is a pretty convenient setup.

When compilation got faster it also became more convenient to do some more code
reshuffling. This time I decided to pack the global variables into structures,
more or less organized the way the header files were organized. It gives a bit
more verbosity but also has the side effect that (at least in principle) the
\CPU\ cache can perform better because neighboring variables are often cached as
part of the deal. Now it might be imagination, but in the process I did notice
that mid March processing the manual went down to below 11.7 seconds while before
it stayed around 12.1 seconds. Of course this is not that relevant currently, but
it might make a difference on less capable processors (as in a low power setup).
Anyway, it didn't hurt.

In the meantime some of the constants used in the program got prefixes or
suffixes to make them more unique and for instance the use of \type {normal} as
equivalent for zero was made a bit more distinctive as we now have more subtypes.
That is: all the subtypes were collected in enumerations instead of \CCODE\
defines. Back to the basics.

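The difference between the two approaches can be sketched as follows. The
subtype names and prefixes are assumptions chosen for this example only and do
not come from the actual source. Each enumeration starts at zero, so every node
type still has a \quote {normal} subtype, but the name now tells which one is
meant, and a compiler can warn when a \type {switch} forgets one of the values.

\starttyping
#include <assert.h>

/* Hypothetical names for illustration. Instead of defines like
       #define normal     0
       #define stretching 1
   the subtypes of one node type are collected in an enumeration
   and get a distinguishing suffix. */
typedef enum example_glue_subtypes {
    example_normal_glue,      /* 0, formerly a bare "normal" */
    example_stretching_glue,  /* 1 */
    example_shrinking_glue    /* 2 */
} example_glue_subtypes;

typedef enum example_rule_subtypes {
    example_normal_rule,      /* also 0, but a distinct name */
    example_box_rule,         /* 1 */
    example_image_rule        /* 2 */
} example_rule_subtypes;

/* The first enumerator is zero, just like the old define. */
static_assert(example_normal_glue == 0, "normal subtype is zero");

/* Map a glue subtype to a readable name. */
const char *example_glue_name(example_glue_subtypes subtype)
{
    switch (subtype) {
        case example_normal_glue:     return "normal";
        case example_stretching_glue: return "stretching";
        case example_shrinking_glue:  return "shrinking";
    }
    return "unknown";
}
\stoptyping
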
At the end of 2020 I noticed that the binary had grown a bit relative to the mid
2020 versions. This surprised me because some improvements actually made it smaller,
something you notice when you compile a couple of times when doing these things.
I also noticed that the platforms on the compile farm had quite a bit of
variation. In most cases we're still below my 3MB threshold, but when for
instance cross compiled binaries become a few hundred KB larger one can get
puzzled. In the \LUAMETAFUN\ manual I have this comment at the top:

\starttyping[style=\ttx]
------------------------   ------------------------   ------------------------
2019-12-17  32bit  64bit   2020-01-10  32bit  64bit   2020-11-30  32bit  64bit
------------------------   ------------------------   ------------------------
freebsd     2270k  2662k   freebsd     2186k  2558k   freebsd     2108k  2436k
openbsd6.6  2569k  2824k   openbsd6.6  2472k  2722k   openbsd6.8  2411k  2782k
linux-armhf 2134k          linux-armhf 2063k          linux-armhf 2138k  2860k
linux       2927k  2728k   linux       2804k  2613k   linux   (?) 3314k  2762k
                                                      linux-musl  2532k  2686k
osx                2821k   osx                2732k   osx                2711k
ms mingw    2562k  2555k   ms mingw    2481k  2471k   ms mingw    2754k  2760k
                                                      ms intel           2448k
                                                      ms arm             3894k
                                                      ms clang           2159k
------------------------   ------------------------   ------------------------
\stoptyping

So why the differences? One possible answer is that the cross compiler now uses
\GCC9 instead of \GCC8. It is quite likely that inlining code is done more
aggressively (at least one can find remarks of that kind on the Internet). An
interesting exception in this overview is the \LINUX\ 32 bit version. The native
\WINDOWS\ binary is smaller than the \MINGW\ binary but the \CLANG\ variant is
smaller still. For the native compilation we always enable link time
optimization, which makes compiling a bit slower, although it stays similar to
regular compilation in \WLS. When we turn on link time optimization for the
other compilers, the linker takes quite some time. I just turn it off when testing
code because it's no fun to wait these additional minutes with \GCC. Given that
the native \WINDOWS\ binary by now runs nearly as fast as the cross compiled ones,
it is an indication that the native \WINDOWS\ compiler is quite okay. The numbers
also show (for \WINDOWS) that using \CLANG\ is not yet an option: the binaries
are smaller but also much slower and compilation (without link time optimization)
also takes much longer. But we'll see how that evolves: the compile farm
generates them all.

So, what effects does link time optimization have? The (current) cross compiled
binary is some 60KB smaller and performs a little better. Some tests show
some 3~percent gain but I'm pretty sure users won't notice that on a normal run.
So, when we forget to enable it when we release new binaries, it's no big deal.

Another adventure at the end of 2020 was generating \ARM\ binaries for \OSX\ and
\WINDOWS. This seems to work out well. The \OSX\ binaries were tested, but we don't
have the proper hardware in the compile farm, so for now users have to use \INTEL\
binaries on that hardware. Compiling the \LUAMETATEX\ manual on a 2020 M1 is a
little more than twice as fast as on my 2013 i7 laptop running \WINDOWS. A
native \ARM\ binary is about three times faster, which is what one expects from a
more modern (also a bit performance hyped) chipset. On a \RASPBERRY\ with 4GB
ram, an external \SSD\ on \USB3, running \UBUNTU\ 20, the manual compiles three
times slower than on my laptop. So, when we limit conclusions to \LUAMETATEX\ it
looks like \ARM\ is catching up: these modern chipsets (from \APPLE\ and
\MICROSOFT, although the latter was not yet tested) with plenty of cache, lots of
fast memory, fast graphics and speedy disks are six times faster than a cheap
media oriented \ARM\ chipset. Being a single core consumer, \LUAMETATEX\ benefits
more from faster cores than from more cores. But, as long as I don't have these
machines on my desk, these rough estimates have to do.

\stopchapter

\stopcomponent