% language=us runpath=texruns:manuals/followingup

% followingup-stripping.tex, size: 15 Kb, last modification: 2021-10-28 13:50

%  2,777,600 / 11,561,471 cont-en.fmt

% Hooverphonic - Live at the Ancienne Belgique (Geike Arnaert)

\startcomponent followingup-stripping

\environment followingup-style

\startchapter[title={Stripping}]

\startsection[title={Introduction}]

Normally I need a couple of iterations to reach the implementation that I like
(an average of three rewrites is rather normal). So, I sat down and started
stripping the engine and did so a few times in order to get an idea of how to
proceed. One drawback of going public too soon (and we ran into that with
\LUATEX) is that as soon as there are more users, one gets stuck in the
situation that a different approach is not really possible. This is why from now
on experimental is really experimental, even if that means: it works ok in
\CONTEXT\ (even for production) but we can change interfaces to be better, e.g.\
more consistent (although we're also stuck with existing \TEX\ terminology).
Anyway, let's proceed.

\stopsection

\startsection[title={The binary}]

In 2014 the \LUATEX\ binary was some 10.9 MB large. The version 1.09 binary of
October 2018 was about 6.8MB, and the reduction was due to removing the bitmap
generation from \MPLIB\ as well as replacing poppler by pplib. As an exercise I
decided to see how easy it was to make a small version suitable for \CONTEXT\
\LMTX, and as expected the binary shrank to below 3MB (plus a \LUA\ and \KPSE\
dll). This is a reasonable size given what is still present.

There is hardly any file related code left because in practice the backend used
most of the different file types. That also meant that we could remove \KPSE\
related code and keep all that in the library part. In principle one can load
that library and hook it into the few callbacks that relate to loading files.
Once we're stable I'll probably write some code for that. \footnote {In the
meantime I think it does not make much sense to do that.} Launching the binary
with a startup script can deal with all matters needed, because the command line
arguments are available.

We could actually go even smaller by removing the built|-|in \TFM\ and \VF\
readers. For instance it did not make much sense to read and store information
that is never used anyway, like virtual font data: as long as the backend has
access to what it needs it's fine. By removing unused code and stripping no
longer used fields in the internal font tables (which is also good for memory
consumption), and cleaning up a bit here and there the experimental binary ended
up at a bit above 2.5MB (plus a \LUA\ dll). \footnote {Mid January we were just
below 2.7 MB with a static, all inclusive, binary. In March the static ended up
at 2.9 MB on \MSWINDOWS\ and 2.6 MB on \UNIX.}

\stopsection

\startsection[title={Functionality}]

There is no real reason to change much in the functionality of the frontend but
as we have no backend now, some primitives are gone. These have to be implemented
as part of creating a backend.

\starttyping
\dviextension \dvivariable \dvifeedback
\pdfextension \pdfvariable \pdffeedback
\stoptyping

The already obsolete related dimensions are also removed:

\starttyping
\pageleftoffset \pagerightoffset
\pagetopoffset  \pagebottomoffset
\stoptyping

And we no longer need the page dimensions because they are just registers that
are normally used in the backend. So, we got rid of:

\starttyping
\pageheight
\pagewidth
\stoptyping
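
As a rough sketch of what re|-|providing them could look like (an assumption for
illustration, not necessarily what \CONTEXT\ does), the two names can simply be
allocated as ordinary dimension registers at the macro level, provided that the
names are still free:

\starttyping
% sketch: plain registers standing in for the removed primitives
\newdimen\pagewidth
\newdimen\pageheight

\pagewidth  210mm % A4, just example values
\pageheight 297mm
\stoptyping

A backend written in \LUA\ can then consult these registers when it flushes a
page.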

Some font related inheritances from \PDFTEX\ have also been dropped:

\starttyping
\letterspacefont
\copyfont
\expandglyphsinfont
\ignoreligaturesinfont
\tagcode
\stoptyping

Internally all backend whatsits are gone, but generic \type {literal}, \type
{save}, \type {restore} and \type {setmatrix} nodes can still be created. Under
consideration is to let them be so called user nodes but for testing it made
sense to keep them around for a while. \footnote {Don't take this as a reference:
later we will see that more was changed.}

The resource related primitives are backend dependent so they have been
removed. As with other backend related primitives, their arguments depend on the
implementation. So, no more:

\starttyping
\saveboxresource
\useboxresource
\lastsavedboxresourceindex
\stoptyping

and:

\starttyping
\saveimageresource
\useimageresource
\lastsavedimageresourceindex
\lastsavedimageresourcepages
\stoptyping

Of course the rule node subtypes are still there, so the typesetting machinery
will handle them fine. It is no big deal to define a pseudo|-|primitive that
provides the functionality at the \TEX\ level.

The position related primitives are also backend dependent so again they were
removed. \footnote {There was some sentimental element in this. Long ago, even
before \PDFTEX\ showed up, \CONTEXT\ already had a positional mechanism. It
worked by using specials in combination with a program that calculated the
positions from the \DVI\ file. At some point that functionality was integrated
into \PDFTEX. For me it always was a nice example of demonstrating that
complaints like \quotation {\TEX\ is limited because we don't know the position
of an element in the text.} make no sense: \TEX\ can do more than one thinks,
given that one thinks the right way.}

\starttyping
\savepos
\lastxpos
\lastypos
\stoptyping

We could have kept \type {\savepos} but it is better to be consistent. We no
longer need these:

\starttyping
\outputmode
\draftmode
\synctex
\stoptyping

These could go because we no longer have a backend and if one needs them it's
easy to define a meaningful variable and listen to that.
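
One way to do that, sketched here with a hypothetical name that is not part of
any engine or of \CONTEXT, is to allocate a counter and test it wherever the old
primitive used to be consulted:

\starttyping
% sketch: a macro level replacement for a removed mode switch
\newcount\MyDraftMode % hypothetical name

\MyDraftMode 1 % 0: normal run, 1: draft run

\ifnum\MyDraftMode>0
    % draft run: skip expensive work, e.g. including images
\else
    % normal run: produce all output
\fi
\stoptyping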

The \type {\shipout} primitive does not ship out but just flushes the content of
the box, if that hasn't happened already.

Because we have \LUA\ on board, and because we can now use the token scanners to
implement features, we no longer need the hard coded randomizer extensions. In
fact, \METAPOST\ should now also use the \LUA\ randomizer, so that we are
consistent. Anyway, removed are:

\starttyping
\randomseed
\setrandomseed
\normaldeviate
\uniformdeviate
\stoptyping

plus the helpers in the \type {tex} library.
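
As an illustration of how little is needed at the macro level, the following
sketch wraps the \LUA\ randomizer. The macro names are hypothetical, and unlike
the removed primitives these macros take a braced argument instead of scanning a
number:

\starttyping
% sketch: randomizer helpers built on the standard Lua math library
\def\MySetRandomSeed#1{\directlua{math.randomseed(\number#1)}}
\def\MyUniformDeviate#1{\directlua{tex.sprint(math.random(0,\number#1-1))}}

\MySetRandomSeed{1234}
A number in the range 0 to 9: \MyUniformDeviate{10}
\stoptyping

Because \type {\directlua} is expandable, the second macro can also be used in
places where \TEX\ expects a number.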

\stopsection

\startsection[title={Fonts}]

Fonts are sort of special. We need the data at the \LUA\ end in order to process
\OPENTYPE\ fonts and the backend code needs the virtual commands. The par builder
also needs to access font properties, as does the math renderer, but there is no
real reason to carry virtual font information around (which involves packing and
unpacking virtual packets). So, in the end it made much sense to delegate the
\TFM\ and \VF\ loading to \LUA\ as well. And, as a consequence dumping and
undumping font information could go away too, which is okay, as we didn't preload
fonts in \CONTEXT\ anyway. The saving in binary bytes is not impressive but
neither is keeping unused code around. In principle we can get rid of the
internal representation if we fetch relevant data from the \LUA\ tables but that
might be unwise from the perspective of performance. By removing the no longer
needed fields the memory footprint became somewhat smaller and font loading
(passing from \LUA\ to \TEX) more efficient.

\stopsection

\startsection[title={File IO}]

What came next? A program like \LUATEX\ interacts with its environment and one of
the nice things about \TEX\ is that it has a standard ecosystem, organized as the
\quotation {\TEX\ Directory Structure}. There is a library that interfaces with
this structure: \KPSE, but in \CONTEXT\ \MKIV\ we implement its functionality in
\LUA. The primary reason for this was performance. When we started with \LUATEX\
the startup on my machine (\MSWINDOWS) and a few servers (\LINUX) of a \TEX\
engine took seconds and most of that was due to loading the rather large file
databases, because a \TEX\ Live installation was a gigabyte adventure. With the
\LUA\ variant I could bring that down to milliseconds, because I could pre|-|hash
the database and limit it to files relevant for \CONTEXT\ (still a lot, as fonts
made up most). Nowadays we have \SSD\ disks and plenty of memory for caching, so
these things are less urgent, but on network shares it still matters.

So, as we don't use \KPSE, we can remove that library. By doing that we simplify
compilation a lot as then all dependencies are in the engine's source tree, and
we're no longer dependent on updates. One can argue that we then sacrifice too
much, but we haven't used it for a decade already and the \LUA\ variant does the
job well within the \TDS\ ecosystem. Also, in our by now stripped down engine,
there is not that much lookup going on anyway: we're already in \LUA\ when we do
fonts. But on the other hand, some generic usage could benefit from the library
being present, so we face a choice. The choice is made even more difficult by the
fact that we can remove all kinds of tweaks once we delegate for instance control
over command execution to \LUA\ completely. But, we might provide \KPSE\ as a
loadable \LUA\ module so that when needed one can use a stub to start the program
with a \LUA\ script that as first action loads this library that then can take
care of further file management. As command line arguments are available in
\LUA, one can also implement the relevant extra switches (and even more if
needed).

Now, the interesting thing is that because we have a \LUA\ interface to \KPSE\ we
can actually drop some hard coded solutions. This means that we can have a binary
without \KPSE, in which case one has to cook up callbacks that do what this
library does. But in a version with \KPSE\ embedded one also has to define some
file related callbacks although they can be rather simple. By keeping a handful
of file related callbacks the code base could be simplified a lot. In the process
the recorder option went away (not that we ever used it). It is relatively easy
to support this in the \quote {find} related callbacks and one has to deal with
other files (like images and fonts) also, so keeping this feature was a cheat
anyway.

At this point it is important to notice that while we're dropping some command
line options, they can still be passed and intercepted at the \LUA\ end. So,
providing a compatible (or alternative) solution is no big deal. For instance,
execution of (shell) programs is a \LUA\ activity and can be managed from there.

\stopsection

\startsection[title={Callbacks}]

Callbacks can be organized in groups. First there are those related to
\IO. We only have to deal with a few types: all kinds of \TEX\ files (data
files), format files and \LUA\ modules (but these too are on the list of
potentially dropped files as this can be programmed in \LUA).

\starttyping
find_write_file
find_data_file open_data_file read_data_file
find_format_file find_lua_file find_clua_file
\stoptyping
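
How such a callback could look is sketched below for a setup outside \CONTEXT\
(which manages callbacks itself). The assumption, not taken from this text, is
that \type {find_data_file} receives the requested name and returns the name to
use, or nothing when the file cannot be found. The lookup itself is deliberately
naive and the helper \type {exists} is made up for this example.

\starttyping
% sketch: a minimal file finder without any library support; the
% comments are TeX comments because \directlua collapses all lines
\directlua {
    local function exists(name)
        local f = io.open(name,"r")
        if f then
            f:close()
            return true
        end
        return false
    end
    callback.register("find_data_file", function(name)
        % try the name as given and then with a tex suffix
        if exists(name) then
            return name
        elseif exists(name .. ".tex") then
            return name .. ".tex"
        end
    end)
}
\stoptyping

A real implementation would consult a file database instead of probing the
current directory.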

The callbacks related to errors stay: \footnote {Some more error handling was
added later, as was intercepting user input related to it.}

\starttyping
show_error_hook show_lua_error_hook
show_error_message show_warning_message
\stoptyping

% We kept the buffer handlers but dropped the output handler later anyway, so we
% have left:
%
% \starttyping
% process_input_buffer
% \stoptyping

The management hooks were kept (but the edit one might go): \footnote {And
indeed, that one went away.}

\starttyping
process_jobname
call_edit
start_run stop_run wrapup_run
pre_dump
start_file stop_file
\stoptyping

Of course the typesetting callbacks remain too as they are the backbone of the
opening up:

\starttyping
buildpage_filter hpack_filter vpack_filter
hyphenate ligaturing kerning
pre_output_filter contribute_filter build_page_insert
pre_linebreak_filter linebreak_filter post_linebreak_filter
insert_local_par append_to_vlist_filter new_graf
hpack_quality vpack_quality
mlist_to_hlist make_extensible
\stoptyping

Finally we mention one of the important callbacks:

\starttyping
define_font
\stoptyping

Without that one defined not much will happen with respect to typesetting. I
could actually remove the \type {\font} primitive but that would be a bit weird
as other font related commands stay. Also, it's one of the fundamental frontend
primitives, so removal was never really considered.

\stopsection

\startsection[title={Bits and pieces}]

In the process some helpers and status queries were removed. From the summary
above you can deduce that this concerns images, backend, and file management.
Unused variables (some inherited from the past and predecessors) were also
removed. These and other changes are the reason why there is a separate manual
for \LUAMETATEX. \footnote {Relatively late in the project I decided to be more
selective in what got initialized in \LUA\ only mode.}

One of my objectives was to see how lean and mean the code base could be. But
even if we don't use that many files, the rather complex build system means that
we need to have (make and configure) files in the tree that are not really used
but even then omitting them aborts a build. I played a bit with that but the
problem is that it needs to be dealt with upstream in order to prevent repetitive
work. So, this is something to sort out later. Eventually it would be nice to be
able to compile with a minimal set of source files, also because other programs
(all kinds of \TEX\ variants) that are checked for but not compiled depend on
libraries that we don't need (and therefore don't want) to have in the stripped
down source tree. \footnote {In the end, the source tree was redesigned
completely.}

For now we also brought down the number of catcode tables (to 256) \footnote {As
with math families, and if more tables are needed one should wonder about the
\TEX\ code used.}, and the number of languages (to 8192) \footnote {This is
already a lot and because languages are loaded at run time, we can go much lower
than this.} as that saves some initially allocated memory.

\stopsection

\startsection[title={What's next}]

Basically the experiment ends here. A next step is to create a stable code base,
make compilation easy and consider the way the code is packaged. Then some
cleanup can take place. Also, as it's a window to the outside world, \type {ffi}
support will move to the code base and be integral to \LUAMETATEX. And of course
the decision about \LUAJIT\ support has to be made some day soon. The same is
true for \LUA\ 5.4: in \LUATEX\ for now we stick to 5.3 but experimenting with
5.4 in \LUAMETATEX\ can't harm us. \footnote {The choice has been made:
\LUAMETATEX\ will not have a \LUAJIT\ based companion.}

To what extent the \CONTEXT\ code base will have special files for \LMTX\ is
yet to be decided, but we have some ideas about new features that might make that
desirable from the perspective of maintenance. The main question is: do I want to
have hybrid files or clean files for each variant (stock \MKIV\ and \LMTX)?

For the record: at the time of wrapping this up, processing the \LUATEX\ manual
of 294 pages took 13.5 seconds using stock \LUATEX\ while using the stripped down
binary, where \LUA\ takes over some tasks, took 13.9 seconds. \footnote {In the
meantime we're down to around 11.6 seconds. These are all rough numbers and
mostly indicate relative speeds at some point.} The \LUAJITTEX\ variant needed
10.9 and 10.8 seconds. So, there is no real reason to not explore this route,
although \unknown\ the \PDF\ file size shrinks from 1.48MB to 1.18MB (and
optionally we can squeeze out more) but one can wonder if I didn't make big
mistakes. It is good to realize that there is not much performance to gain in the
engine simply because most code is already pretty well optimized. The same is
true for the \CONTEXT\ code: there might be a few places where we can squeeze out
a few milliseconds but probably it will go unnoticed.

On the todo list went removal of \type {\primitive} which we never use (need) and
the possible introduction of a way to protect primitives and macros against
redefinition, but on the other hand, it might impact performance and be not worth
the trouble. In the end it is a macro package issue anyway and we never really
ran into users redefining primitives. \footnote {Indeed this primitive has been
removed.}

\stopsection

\stopchapter

\stopcomponent