% language=us runpath=texruns:manuals/followingup \startcomponent followingup-memory \environment followingup-style \startchapter[title={Memory}] \startsection[title={Introduction}] \stopsection \startsection[title={\LUA}] When you initialize \LUA\ a proper memory allocator has to be provided. The allocator gets an old size and new size passed. When both are zero the allocator can \type {free} the blob, when the new size exceeds the old size the blob has to be \type {realloc}'s, and otherwise an initial \type {malloc} happens. When used with \CONTEXT, \LUAMETATEX\ will do lots of calls to the allocator and often an initial allocation is followed by a reallocation, for instance because tables start out small but immediately grows a while after. It is for this reason that early 2021 I decided to look into alternative allocators. I can of course code one myself, but where a \LUATEX\ run is a one time event, often with growing memory usage due to all kind of accumulating resources, using the engine as stand alone interpreter needs a more sophisticated approach than just keeping a bunch of bucket pools alive: when the script engine runs for months or even years memory should be returned to the operating system occasionally. We don't want the same side effects that \HTML\ browsers have: during the day you need to restart them occasionally because they use up quite a bit of your computers memory (often for no real reason, so it probably has to do with keeping memory in store instead of returning it and|/|or it can be a side effect of a scattered pool \unknown\ who knows). Instead of reinventing that wheel I ended up with testing Daan Leijen's \type {mimalloc} implementation: a not bloated, not too low level, reasonable sized library. Some simple experiments learned that it does make a difference in performance. The experiment was done with the native \MICROSOFT\ compiler (msvc). One reason for that is that till that moment I preferred the cross compiled \MINGW\ versions (for cross compiling I use the \LINUX\ subsystem that comes with \MSWINDOWS). Although native binaries compile faster and are smaller, the cross compiled ones perform somewhat better (often some 5\%). Interesting is that making the format file is always much faster with a native binary, probably because the console output is supported better. When the alternative memory allocator is plugged into \LUA\ suddenly the native version outperforms the cross compiled one (also by some 5\%). The overall gain on a native binary for compiling the \LUAMETATEX\ manual is between~5 and~10\% which was reason enough to continue this experiment. As a first step the native compiled version will default to it, later other platforms might follow. \stopsection \startsection[title={\TEX}] Memory allocation in \TEX\ has always been done by the engine itself. At startup a couple of big chunks are allocated and from that smaller blobs are taken. The largest chunks are for nodes, tokens and the table of equivalents (including the hash where control sequences are mapped onto registers and macros (lists of tokens). Smaller chunks are used for nesting states, after group restoration stacks, in- and output levels, etc. In modern engines the sizes of the chunks can be configured, some only at format generation time. In \LUAMETATEX\ we are more dynamic and after an initial (minimal) chunk allocation, when needed more memory will be allocated on demand, in steps, until a configured size is reached. That size has an upper limit (which if needed can be enlarged at compilation time). A side effect is that we (need to) do some more checking. Node memory is special in the sense that nodes are basically offsets in a large array where each node has a number of slots after that offset. This is rather efficient in terms of performance and memory. New nodes (of any size) are taken from the node chunk and never returned. When freed they are appended to a list per size and that list serves as pool before new nodes get taken from the chunk. Variable size chunks are done differently, if only because we use them plenty in \CONTEXT\ and they can lead to (excessive and) fragmented memory usage otherwise. Tokens all have the same size so here there is only one list of free tokens. Because tokens and (most) nodes make it into linked lists those lists of free nodes and tokens are rather natural. And it's also fast. It all means that \TEX\ itself does hardly any real memory allocation: only a few dozen large chunks. An exception is the string pool, where contrary to traditional \TEX\ engines, the \LUATEX\ (and \LUAMETATEX) engines allocate strings using \type {malloc}. Those strings (used for control sequences) are never freed. In other cases where strings are used, like in for instance \type {\csname} construction, temporary strings are used. The same is true for some file related operations. None of these are real demanding in terms of excessive allocation and freeing. Also, in places that matter \LUAMETATEX\ is already quite optimized so using a different allocator gives no gain here. Technically we could allocate nodes by using \type {malloc} but there are a few places in the engine that makes this hard. It can be done but then we need to make some conceptual changes (with regards to the way inserts are dealt with) and the question is if we gain much by breaking away from tradition. I guess there it will actually hurt performance if we change this. Another variant is where we allocate nodes of the same size from different pools but this doesn't bring us any gain either. A stringer argument is that changing the current (and historic) memory management of nodes will complicate the code. A bit of an exception is the flow of information between \LUA\ and \TEX. There we do quite some allocation but it depends on how much a macro package demands of that. \stopsection \startsection[title={\METAPOST}] When the \METAPOST\ library was written, Taco changed the memory allocation to be more dynamic. One reason for this is that the number models (scaled, double, decimal, binary) have their own demands. For some objects (like numbers) the implementation uses a pool so it sits between the way \TEX\ works and \LUA\ when the standard allocator is used. This means that although quite some allocation is demanded, often the pool can serve the requests. (We might use a few more pools in the future.) In \LUAMETATEX\ the memory related code has been reorganized a little so that (again as experiment) the \type {mimalloc} manager can be used. The performance gain is not as impressive as with \LUA, but we'll see how that evolves when more demand poses more stress. \stopsection \startsection[title={The verdict}] In \LUAMETATEX\ version 2.09.4 and later the native \MSWINDOWS\ binaries now use the alternative \type {mimalloc} allocator. The gain is most noticeable for \LUA\ and a little for \TEX\ and \METAPOST. The test suite with 2550 files runs in 1200 seconds which is quite an improvement over the \MINGW\ cross compiled binary that needs 1350 seconds. We do occasionally test a binary compiled with \CLANG\ but that one is much slower than both others (compilation also takes much more time) but that might improve over time. Because of these results, it is likely that I'll also check out the other platforms, once the \MSWINDOWS\ binaries have proven to be stable (those are the once I use anyway). \stopsection \stopchapter \stopcomponent