% language=us runpath=texruns:manuals/followingup

\startcomponent followingup-memory

\environment followingup-style

\startchapter[title={Memory}]

\startsection[title={Introduction}]

\stopsection

\startsection[title={\LUA}]

When you initialize \LUA, a proper memory allocator has to be provided. The
allocator gets passed an old size and a new size. When the new size is zero the
allocator has to \type {free} the blob, when the pointer passed is null an
initial \type {malloc} happens, and otherwise the blob gets \type {realloc}'d.
When used with \CONTEXT, \LUAMETATEX\ issues lots of calls to the allocator,
and often an initial allocation is immediately followed by a reallocation, for
instance because tables start out small but grow shortly afterwards.

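The calling convention just described is \LUA's \type {lua_Alloc} protocol. A
minimal sketch of such an allocator in C, here simply backed by the standard
\type {realloc} and \type {free} (so an illustration of the protocol, not the
engine's actual code):

```c
#include <stdlib.h>

/* A lua_Alloc style allocator: nsize == 0 means free, a null
   pointer means an initial malloc, anything else is a realloc.
   The ud and osize arguments are unused in this sketch. */
static void *l_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    (void) ud;
    (void) osize;
    if (nsize == 0) {
        free(ptr);                  /* also fine when ptr is null */
        return NULL;
    } else {
        /* realloc behaves like malloc when ptr is null */
        return realloc(ptr, nsize);
    }
}
```

Such a function is what gets handed to \type {lua_newstate}; plugging in an
alternative like \type {mimalloc} then amounts to calling its replacements
instead of \type {realloc} and \type {free} here.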
It is for this reason that, early in 2021, I decided to look into alternative
allocators. I could of course write one myself but, whereas a \LUATEX\ run is a
one-time event, often with growing memory usage due to all kinds of
accumulating resources, using the engine as a stand-alone interpreter needs a
more sophisticated approach than just keeping a bunch of bucket pools alive:
when the script engine runs for months or even years, memory should
occasionally be returned to the operating system. We don't want the same side
effects that \HTML\ browsers have: during the day you need to restart them
occasionally because they use up quite a bit of your computer's memory (often
for no real reason, so it probably has to do with keeping memory in store
instead of returning it and|/|or it can be a side effect of a scattered pool
\unknown\ who knows).

Instead of reinventing that wheel I ended up testing Daan Leijen's \type
{mimalloc} implementation: a not bloated, not too low level, reasonably sized
library. Some simple experiments showed that it does make a difference in
performance. The experiment was done with the native \MICROSOFT\ compiler
(msvc). One reason for that is that until then I preferred the cross compiled
\MINGW\ versions (for cross compiling I use the \LINUX\ subsystem that comes
with \MSWINDOWS). Although native binaries compile faster and are smaller, the
cross compiled ones perform somewhat better (often some 5\%). Interestingly,
making the format file is always much faster with a native binary, probably
because the console output is supported better. When the alternative memory
allocator is plugged into \LUA, suddenly the native version outperforms the
cross compiled one (also by some 5\%). The overall gain on a native binary for
compiling the \LUAMETATEX\ manual is between~5 and~10\%, which was reason
enough to continue this experiment. As a first step the natively compiled
version will default to it; later other platforms might follow.

\stopsection

\startsection[title={\TEX}]

Memory allocation in \TEX\ has always been done by the engine itself. At
startup a couple of big chunks are allocated and from those smaller blobs are
taken. The largest chunks are for nodes, tokens and the table of equivalents
(including the hash where control sequences are mapped onto registers and
macros, that is: lists of tokens). Smaller chunks are used for nesting states,
after group restoration stacks, in- and output levels, etc. In modern engines
the sizes of the chunks can be configured, some only at format generation time.
In \LUAMETATEX\ we are more dynamic: after an initial (minimal) chunk
allocation, more memory is allocated on demand, in steps, until a configured
size is reached. That size has an upper limit (which, if needed, can be
enlarged at compilation time). A side effect is that we (need to) do some more
checking.

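The stepwise growth can be sketched as follows; the names and sizes here are
made up for the illustration and are not the engine's actual identifiers:

```c
#include <stdlib.h>

/* Hypothetical sketch of grow-on-demand chunk allocation: start
   with a minimal chunk and extend it in fixed steps until a
   configured maximum is reached. */
#define INITIAL_SIZE 1000
#define STEP_SIZE    1000
#define MAX_SIZE     8000   /* the configured upper limit */

typedef struct {
    int *data;
    int  size;              /* currently allocated slots */
    int  used;              /* slots handed out so far   */
} chunk;

static int chunk_init(chunk *c) {
    c->data = malloc(INITIAL_SIZE * sizeof(int));
    c->size = INITIAL_SIZE;
    c->used = 0;
    return c->data != NULL;
}

/* reserve n slots, growing the chunk in steps when needed;
   returns the offset of the reserved slots or -1 on overflow */
static int chunk_reserve(chunk *c, int n) {
    while (c->used + n > c->size) {
        if (c->size + STEP_SIZE > MAX_SIZE)
            return -1;      /* the configured limit is reached */
        int *grown = realloc(c->data, (c->size + STEP_SIZE) * sizeof(int));
        if (grown == NULL)
            return -1;
        c->data = grown;
        c->size += STEP_SIZE;
    }
    int offset = c->used;
    c->used += n;
    return offset;
}
```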
Node memory is special in the sense that nodes are basically offsets into a
large array where each node has a number of slots after that offset. This is
rather efficient in terms of performance and memory. New nodes (of any size)
are taken from the node chunk and never returned. When freed they are appended
to a list per size, and that list serves as a pool from which nodes are taken
before new ones get carved from the chunk. Variable size chunks are done
differently, if only because we use them a lot in \CONTEXT\ and they can
otherwise lead to (excessive and) fragmented memory usage.

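Recycling via per-size free lists, where the list link of a freed node is
stored in the node's own first slot, can be sketched like this (a hypothetical
miniature, not the actual \LUAMETATEX\ code):

```c
/* Fixed-size node recycling: freed nodes are pushed onto the list
   for their size and reused before new slots are carved from the
   big chunk. A node is just an offset into the mem array. */
#define MAX_NODE_SIZE 16

typedef int node;

static int  mem[65536];               /* the big chunk; 0 means "null" */
static node avail = 1;                /* first never-used slot         */
static node freelist[MAX_NODE_SIZE + 1]; /* one recycling list per size */

static node new_node(int size) {
    node n = freelist[size];
    if (n) {                          /* reuse a freed node of this size */
        freelist[size] = mem[n];
        return n;
    }
    n = avail;                        /* otherwise carve from the chunk  */
    avail += size;
    return n;
}

static void free_node(node n, int size) {
    mem[n] = freelist[size];          /* link into the per-size list */
    freelist[size] = n;
}
```

Tokens fit the same scheme with just one size, hence one free list.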
Tokens all have the same size, so here there is only one list of free tokens.
Because tokens and (most) nodes end up in linked lists, those lists of free
nodes and tokens are rather natural. And it's also fast. It all means that
\TEX\ itself does hardly any real memory allocation: only a few dozen large
chunks. An exception is the string pool where, contrary to traditional \TEX\
engines, the \LUATEX\ (and \LUAMETATEX) engines allocate strings using \type
{malloc}. Those strings (used for control sequences) are never freed. In other
cases where strings are used, as for instance in \type {\csname} construction,
temporary strings are used. The same is true for some file related operations.
None of these are really demanding in terms of excessive allocation and
freeing. Also, in places that matter \LUAMETATEX\ is already quite optimized,
so using a different allocator gives no gain there.

Technically we could allocate nodes using \type {malloc}, but there are a few
places in the engine that make this hard. It can be done, but then we need to
make some conceptual changes (with regard to the way inserts are dealt with)
and the question is if we gain much by breaking away from tradition. I guess
that it will actually hurt performance if we change this. Another variant is
one where we allocate nodes of the same size from different pools, but this
doesn't bring us any gain either. A stronger argument is that changing the
current (and historic) memory management of nodes would complicate the code.

A bit of an exception is the flow of information between \LUA\ and \TEX. There
we do quite some allocation, but how much depends on what a macro package
demands of it.

\stopsection

\startsection[title={\METAPOST}]

When the \METAPOST\ library was written, Taco changed the memory allocation to
be more dynamic. One reason for this is that the number models (scaled, double,
decimal, binary) have their own demands. For some objects (like numbers) the
implementation uses a pool, so it sits between the way \TEX\ works and the way
\LUA\ works when the standard allocator is used. This means that although quite
some allocation is demanded, often the pool can serve the requests. (We might
use a few more pools in the future.)

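Such an in-between pool can be sketched as follows (again a hypothetical
miniature: the names are made up and the real number objects are more
involved):

```c
#include <stdlib.h>

/* A pool for equal-sized objects: freed objects are kept on a list
   and the pool serves requests before the allocator is asked again. */
typedef struct value {
    double data;
    struct value *next;     /* links pooled (free) objects */
} value;

static value *pool = NULL;

static value *new_value(void) {
    if (pool) {             /* serve the request from the pool */
        value *v = pool;
        pool = v->next;
        return v;
    }
    /* only fall back on the allocator when the pool is empty */
    return malloc(sizeof(value));
}

static void free_value(value *v) {
    v->next = pool;         /* recycle instead of calling free() */
    pool = v;
}
```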
In \LUAMETATEX\ the memory related code has been reorganized a little so that
(again as an experiment) the \type {mimalloc} manager can be used. The
performance gain is not as impressive as with \LUA, but we'll see how that
evolves when more demand poses more stress.

\stopsection

\startsection[title={The verdict}]

In \LUAMETATEX\ version 2.09.4 and later, the native \MSWINDOWS\ binaries use
the alternative \type {mimalloc} allocator. The gain is most noticeable for
\LUA\ and a little for \TEX\ and \METAPOST. The test suite with 2550 files runs
in 1200 seconds, which is quite an improvement over the 1350 seconds that the
\MINGW\ cross compiled binary needs. We do occasionally test a binary compiled
with \CLANG, but that one is much slower than both others (compilation also
takes much more time), although that might improve over time. Because of these
results, it is likely that I'll also check out the other platforms once the
\MSWINDOWS\ binaries have proven to be stable (those are the ones I use
anyway).

\stopsection

\stopchapter

\stopcomponent