% followingup-cleanup.tex, last modification: 2021-10-28 13:50
% language=us runpath=texruns:manuals/followingup

% Youtube: TheLucs play with Jacob Collier // Don't stop til you get enough

\startcomponent followingup-cleanup

\environment followingup-style

\logo [ALGOL]   {Algol}
\logo [FORTRAN] {FORTRAN}
\logo [SPSS]    {SPSS}
\logo [DEC]     {DEC}
\logo [VAX]     {VAX}
\logo [AMIGA]   {Amiga}

\startchapter[title={Cleanup}]

\startsection[title={Introduction}]

Original \TEX\ is a literate program, which means that code and documentation are
mixed. This mix, called a \WEB, is split into a source file and a \TEX\ file and
both parts are processed independently into a program (binary) and a typeset
document. The evolution of \TEX\ went through stages but in the end a \PASCAL\
\WEB\ file was the result. This fact has led to the more or less standard \WEBC\
compilation infrastructure which is the basis for \TEXLIVE.

% My programming experience started with programming a micro processor kit (using
% an 1802 processor), but at the university I went from \ALGOL\ to \PASCAL\ (okay,
% I also remember lots of \SPSS\ kind|-|of|-|\FORTRAN\ programming). The \PASCAL\
% was the one provided on \DEC\ and \VAX\ machines and it was a bit beyond standard
% \PASCAL. Later I did quite some programming in \MODULA 2 (for a while on an
% \AMIGA, but mostly on personal computers). The reason that I mention this is that
% it still determines the way I look at programs. For instance that code goes
% through a couple of stepwise improvements (and that it can always be done
% better). That you need to keep an eye on memory consumption (can be a nice
% challenge). That a properly formatted source code is important (at least for me).
%
% When into \PASCAL, I ran into the \TEX\ series and as it looked familiar it ended
% up on my bookshelf. However, I could not really get an idea what it was about,
% simply because I had no access to the \TEX\ program. But the magic stayed with
% me. The fact that \LUA\ resembles \PASCAL, made it a good candidate for extending
% \TEX\ (there were other reasons as well). When decades later, after using \TEX\
% in practice, I ended up looking at the source, it was the \LUATEX\ source.

So, \TEX\ is a woven program and this is also true for the starting point of
\LUATEX: \PDFTEX. But, because we wanted to open up the internals, and because
\LUA\ is written in \CCODE, already in an early stage Taco decided to start from
the \CCODE\ translated from \PASCAL. A permanent conversion was achieved using
additional scripts and the original documentation stayed in the source. The one
large file was split into more logical smaller parts and combined with snippets
from \ALEPH .

After we released version 1.0 I went through the documentation parts of the code
and normalized that a bit. The \WEB\ files, which at that moment were still fairly
simple, became regular \CCODE\ files, and the idea was (and is) that at some point
it should be possible to process the documentation (using \CONTEXT).

Over time the \CCODE\ code evolved and functions ended up in places that made
most sense at that moment. After the previously described stripping process,
I decided to go through the files and see if a bit of reshuffling made sense,
mostly because that would make documenting easier. (I'm not literate enough to
turn it into a proper literate program.) It was also a good moment to get rid of
unused code (not that much) and unused macros (more than expected). It also
made sense to change a few names (for instance to avoid potential future clashes
with \type {lua_} core functions). However, all this takes quite some careful
checking and many compilation runs, so I expect that after this first cleanup
stepwise improvements can happen for quite some time (especially in adding comments).
\footnote {This is and will be an ongoing effort. It probably doesn't show, but
getting the code base in the state it is in now, took quite some time. It
probably won't take away complaints and nagging but I've decided no longer to pay
attention to those on the sideline.} \footnote {In the end not much \PDFTEX\ and
\ALEPH\ code is present in \LUAMETATEX , but these were useful intermediate
steps. No matter how lean \LUAMETATEX\ becomes, I have a weak spot for \PDFTEX\
as it always served us well and without it \TEX\ would be less present today.}

One of the things that I keep in mind when doing this, is that we use \LUA. This
component compiles on most relevant platforms and as such we can assume that
\LUAMETATEX\ also can (and should) be made a bit less dependent on old mechanisms
that are used in stock \LUATEX. For instance, we don't come from \PASCAL\ any
longer but there are traces of that transition still present. We also don't use
specific operating system features, and those that we use are also used in \LUA.
And, as we try to share code we can also delegate some (more) to \LUA. For
instance file related code is not dependent on other components in the \TEX\
infrastructure, but maybe at some point the runtime loadable \KPSE\ library can
kick in. So, basically the idea is to go bare bones first and later see
how, with the help of \LUA, we can bring some back. For the record: this is
not needed for \CONTEXT\ as it already has this interface to \TDS. \footnote
{This has been removed from my agenda.}

\stopsection

\startsection[title={Motivation}]

The \LUATEX\ project started as an experiment of adding \LUA\ to \PDFTEX, which
was done by Hartmut, and in order to avoid confusion we named it \LUATEX. When we
figured out that this had possibilities we decided to go further and Taco
took the challenge to rework the code base. Part of that work was sponsored by
Idris' Oriental \TEX\ project. I have fond memories of the intensive and rapid
development cycles: online discussions, binaries going in my direction,
experimental \CONTEXT\ code going the other way. When at some point we had reached
a sort of stable state (read: usage in \CONTEXT\ had become crucial) a
steady further development started, where Taco redid \METAPOST\ into \MPLIB,
funded by user groups. At some point Luigi took over from Taco the task of
integrating components (also into \TEXLIVE), introduced \LUAJIT\ into the
binary, conducted the (again partially funded) swiglib project, followed by
support for \FFI. A while later I myself started messing around in the code base
directly and continued extending the engine and \LUA\ interfaces.

I could work on this because I have quite some freedom at the place where I work.
We use (part of) \CONTEXT\ for some projects and especially in dealing with \XML\
we could benefit from \LUATEX. It must be said that (long running) projects like
these never pay off (on the contrary, they cost a lot in terms of money and
energy) so it's quite safe to conclude that \LUATEX\ development is to a large
extent a (many man years) labor of love for the subject. I guess that no sane
company will do (permit) such a thing. It is also for that reason that I keep
spending time on it, and as a simplification of the code base was always one of
my dreams, this is what I spend my time on now. After all, \LUATEX\ is just
juggling bytes and as it is written in \CCODE, and has no graphical user
interface or complex dependencies, it should be possible to have a relatively
simple setup in terms of code files and compilation. Of course this is also made
possible by the fact that I can use \LUA. It's also why I decided to
\quotation {Just do it}, and then \quotation {Let's see where I end up}. No
matter how it turns out, it makes a good vehicle for further development and
years of fun.

\stopsection

\startsection[title={Files}]

After a decade of adding and moving around code it's about time to reorganize the
code a bit, but we do so without deviating too much from the original setup. For
instance we started out with a small number of \LUA\ interface macros and these
were collected in a few files, and defined in one \type {h} file, but it made
sense to have header files alongside the libraries that implement helpers. This
is a rather tedious job but with music videos or video casts on a second screen
it is bearable.

When I reached a state where we only needed the \LUATEX\ files plus the minimal
set of libraries I tried to get rid of directories in the source tree that were
just placeholders but still had \type {automake} files, like those for \PDFTEX\ and
\XETEX. After a couple of attempts I gave up on that because the build setup is
rather hard coded for checking them. Also, there were some (puzzling)
dependencies in the configuration on \OMEGA\ files as well as some \DVI\ related
tools. So, that bit is for later to sort out. \footnote {Of course later the
decision was made to forget about using \type {autotools} and go for an as simple
as possible \type {cmake} solution.}

\stopsection

\startsection[title={Command line arguments}]

As we need to set up a backend and deal with font loading in \LUA, we can just as
well delegate some of the command line handling to \LUA\ too. Therefore, only a
limited set of options is dealt with: those that determine the startup and \LUA\
behavior. In principle we can even get rid of all of them and always use a startup
script but for now it makes sense to not deviate too much from a regular \TEX\ run.

At the time of this writing some code is still in place that is a candidate for
removal. For instance, using the \type {&} to define a format file has long been
replaced by \type {--fmt}. There are sentimental reasons for keeping it but at
the same time we need to realize that shells use these special characters too. A
feature that was unknown to me (or forgotten), prefixing a jobname with a \type {*},
will be removed as it makes no sense. There is some \MSWINDOWS\ specific last
resort code that probably will go too, unless I can figure out why it is needed
in the first place. \footnote {Intercepting these symbols has been dropped in
favor of the command line flags.}

Being left with a very simple set of command line options it also makes sense to
use a simple option analyzer, so that was a next step as it rids us of a
dependency and results in less code.

So, the option parser has now been replaced by a simple variant that is more in
tune with what will happen when you deal with options in \LUA: no magic. One
problem is that \TEX's first input file is moved from the command line to the
input buffer and an interactive session is emulated. As mentioned before, there
is some extra \type {&}, \type {*} and \type {\\} parsing involved. One can
wonder if this still makes sense in a situation where one has to specify a format
and \LUA\ file (using \type {--fmt} and \type {--ini}) so that might as well be
redone a bit some day. \footnote {In the end only these explicit command line
options were supported.}

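To give an idea of what an option analyzer without magic could look like, here
is a minimal sketch in \CCODE. It is not the actual \LUAMETATEX\ code: the names
\type {engine_options} and \type {parse_options} are made up for this example,
and the \type {--fmt=name} notation is an assumption. Only the options that
matter for the startup are looked at; everything else is just counted and left
for \LUA\ to deal with.

\starttyping
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for the few options the engine itself looks at. */
typedef struct {
    const char *format;   /* value given as --fmt=<name>, or NULL     */
    int         luaonly;  /* 1 when --luaonly was seen                */
    const char *jobfile;  /* first argument that is not an option     */
    int         passed;   /* number of options left for Lua to handle */
} engine_options;

/* One linear pass, no magic: special prefixes are not interpreted. */
static void parse_options(int argc, char **argv, engine_options *o)
{
    assert(o != NULL);
    o->format  = NULL;
    o->luaonly = 0;
    o->jobfile = NULL;
    o->passed  = 0;
    for (int i = 1; i < argc; i++) {
        const char *a = argv[i];
        if (strncmp(a, "--", 2) != 0) {
            /* a plain argument: only the first one is the job file */
            if (o->jobfile == NULL) {
                o->jobfile = a;
            }
        } else if (strncmp(a, "--fmt=", 6) == 0) {
            o->format = a + 6;
        } else if (strcmp(a, "--luaonly") == 0) {
            o->luaonly = 1;
        } else {
            /* unknown here, so it is up to the Lua end */
            o->passed++;
        }
    }
}
\stoptyping

Calling this with the arguments \type {--fmt=cont-en --foo test.tex} would set
the format to \type {cont-en}, the job file to \type {test.tex}, and count one
option that is passed on.
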
\stopsection

\startsection[title={Platforms}]

When going through the code I noticed conditional sections for long obsolete
platforms: \type {amiga}, \type {dos} and \type {djgpp}, \type {os/2}, \type
{aix}, \type {solaris}, etc. Also, with 64 bit becoming the standard, it makes
sense to assume that users will use a modern 64 bit platform (Intel or ARM combined
with \MSWINDOWS\ or some popular \UNIX\ variant). We don't need large and complex
code management for obscure platforms and architectures simply because we want to
prove that \LUAMETATEX\ runs everywhere. With respect to \MSWINDOWS\ we use a
cross compiler (\type {mingw}) as reference but native compilation should be no
big deal eventually. We can cross that bridge when we have a simplified
compilation setup. Right now it doesn't make sense to waste time on a native
\MICROSOFT\ compilation as it would also pollute the code with conditional
sections. We'll see what happens when I'm bored. \footnote {In the meantime no
effort is made to let the source compile otherwise than with the cross compiler.
Best is to keep the code as clean as possible with respect to conditional code
sections. So don't bother me with patches.}

\stopsection

\startsection[title={Stubs}]

A \CONTEXT\ run is managed by \MTXRUN\ in combination with a specific script:

\starttyping
mtxrun --script context
\stoptyping

On \MSWINDOWS\ we use a stub because using a \type {cmd} file creates an indirection
that is not seen as an executable and therefore needs to be called in a special way
in other command files to guarantee continuation. So, there we have a small
binary:

\starttyping
mtxrun.exe ...
\stoptyping

that will call:

\starttyping
luatex --luaonly mtxrun.lua ...
\stoptyping

And when the stub has a different name than \type {mtxrun}, say:

\starttyping
context.exe ...
\stoptyping

it effectively becomes:

\starttyping
luatex --luaonly mtxrun.lua --script context ...
\stoptyping

Because the stripped down version assumes some kind of initialization anyway, a
small extension made it possible to use \LUAMETATEX\ as stub too. So, when we
rename \type {luametatex.exe} to \type {mtxrun.exe} (on \UNIX\ we don't use a
suffix) it will start up as \LUA\ interpreter when it finds a script with the
name \type {mtxrun.lua} in the same path. When we rename it to \type
{context.exe} it will search for \type {context.lua} and all that that script has
to do is this:

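\starttyping
-- pretend that we were started as mtxrun
arg[0] = "mtxrun"

-- prepend "--script mtx-context" to the arguments
table.insert(arg,1,"mtx-context")
table.insert(arg,1,"--script")

-- hand over to the runner that sits in the same path as the binary
dofile(os.selfpath .. "/" .. "mtxrun.lua")
\stoptyping
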
So, it basically becomes a call to \type {mtxrun}, but we stay in \LUAMETATEX.
Because we want an isolated run this will launch \LUAMETATEX\ again with the
right command line arguments. This sounds inefficient but because we have a small
binary this is no real issue, and as that run is isolated, it cannot influence
the caller. The overhead is really small: on my somewhat older laptop it's 0.2
seconds, but we have had that management overhead for decades already, so no one
is bothered by it. On all platforms using symbolic links works okay too.

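The name based lookup described above can be sketched as follows. This is only
an illustration and not the code used in the engine: the function \type
{stub_script_name} is hypothetical, and it only derives the name of the script
from the name that the binary was started with. Locating that script in the path
of the binary is left out, and the \type {.exe} suffix is compared case
sensitively, which is a simplification.

\starttyping
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn "C:\bin\context.exe" or "/usr/bin/context"
   into "context.lua". Returns 0 on success and -1 when the result does
   not fit into the given buffer or when there is no name at all. */
static int stub_script_name(const char *argv0, char *out, size_t size)
{
    assert(argv0 != NULL && out != NULL);
    /* skip the directory part, accepting both kinds of separators */
    const char *base = argv0;
    for (const char *p = argv0; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\') {
            base = p + 1;
        }
    }
    /* drop a trailing .exe suffix, if present */
    size_t len = strlen(base);
    if (len > 4 && strcmp(base + len - 4, ".exe") == 0) {
        len -= 4;
    }
    /* we need room for the name, ".lua" and the terminating zero */
    if (len == 0 || len + 5 > size) {
        return -1;
    }
    memcpy(out, base, len);
    memcpy(out + len, ".lua", 5);
    return 0;
}
\stoptyping

With such a helper a binary that is started as \type {mtxrun.exe} ends up with
\type {mtxrun.lua} and one that is started as \type {context} with \type
{context.lua}, which is the naming scheme that the text assumes.
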
\stopsection

\startsection[title={Global variables}]

There are quite a few global variables and functions in the code base, but in the
process of opening up I got rid of some. The cleanup turned some more into
locals which saved executable bytes (keep in mind that we also use the engine as
\LUA\ interpreter so, the smaller, the more friendly). \footnote {Later the
global variables were collected in so called \CCODE\ structs.} This is work
in progress.

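As an illustration of what collecting globals in a struct means, here is a small
sketch. The names \type {job_state_info}, \type {job_state}, \type
{job_register_read} and \type {job_register_error} are made up for this example
and are not taken from the source. The related variables live in one record that
is local to the file (\type {static}), so that only the helper functions need to
know about it.

\starttyping
#include <stddef.h>

/* Before: separate globals, each one a symbol of its own.

   int    job_error_count;
   size_t job_bytes_read;
*/

/* After: one record that groups what belongs together. */
typedef struct {
    int    error_count;
    size_t bytes_read;
} job_state_info;

/* static: local to this file, so not visible to the rest of the program */
static job_state_info job_state = { 0, 0 };

/* add the number of bytes of one read operation to the total */
static void job_register_read(size_t n)
{
    job_state.bytes_read += n;
}

/* count one more error and return the new total */
static int job_register_error(void)
{
    return ++job_state.error_count;
}
\stoptyping
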
\stopsection

\startsection[title={Memory usage}]

By going over all the code a couple of times, I was able to decrease the amount
of used memory a bit as well as avoid some memory allocations. This has no
consequences for performance but is nicer when multiple runs at the same time
(e.g.\ on virtual machines) have to compete for resources. \footnote {I will
probably have to spend some more time on this in order to reach a state that I'm
satisfied with.}

\stopsection

\startsection[title={\METAPOST}]

The current code base doesn't have that many files. We can imagine that, when
\LUA\ can be compiled on a platform, compiling \LUAMETATEX\ is also not that
complicated. However, the rather complex build infrastructure demonstrates the
opposite. One of the complications is that \MPLIB\ is coded in \CWEB\ and that
needs some juggling to get \CCODE. The process has quite some dependencies. There
are some upstream patches needed, but for now occasionally checking with the
upstream sources used for compiling \MPLIB\ in \LUATEX\ works okay. \footnote
{Later I decided to clean up the \MPLIB\ code: unused font related code was
removed, the \POSTSCRIPT\ backend was untangled, the translation from \CWEB\ to
\CCODE\ got done by a \LUA\ script, aspects like error reporting and \IO\ were
redone, and in the end some new extensions were added. Some of that might trickle
back to the original, as long as it doesn't harm compatibility; after all
\METAPOST\ (the program) is standardized and considered functionally stable.}

As \LUAMETATEX\ is also used for experiments we use a copy of the \LUA\ library
interface. That way we don't interfere with the stable \LUATEX\ situation. When
we play with extensions, we can always decide to backport them, once they are
found useful and in good working order. But, as that interface was just \CCODE,
this was trivial.

\stopsection

\startsection[title={Files}]

At a relatively late stage I decided to clean up some of the filename handling.
First I got rid of the \type {area}, \type {name} and \type {ext} decomposition
and optional recomposition. In the original engine that goes through the string
pool and although there is some recovery in the end, with many files and fonts
being used, the pool can get exhausted. For instance when you have hundreds of
thousands of \typ {\font \foo = bar} kind of definitions, each definition wipes
out the previous entry in the hash, but its font name is kept in the string pool.
I got rid of that side effect by reusing strings but in the end decided to avoid
the pool altogether. It was then a small step to also do that for other
filenames. In the process I also decided that it made no sense to keep the code
around that reads a filename from the console: we now just quit. Restarting the
program with a proper filename is no big deal today. I might do some more cleanup
there. In the end we can best use a callback for handling input from the console.

\stopsection

\stopchapter

\stopcomponent
