% language=us runpath=texruns:manuals/followingup
% followingup-evolution.tex, size: 20 Kb, last modification: 2021-10-28 13:50

\startcomponent followingup-evolution

\environment followingup-style

% Yes, music is still evolving in qualitative ways ...
%
% Home Is - Jacob Collier with VOCES8
%
% and as long as there's interesting new music to run into I keep
% doing these kinds of things.

\startchapter[title={Evolution}]

\startsection[title={Introduction}]

The original idea behind \TEX\ is that of a relatively small kernel with (system
dependent or not) extensions. One such extension is the \DVI\ backend, and
later \PDFTEX\ added a \PDF\ backend. Other extensions are \quote {writing to
files} and \quote {writing to the output medium} using so called specials. This
extension mechanism permits \TEX\ to support, for instance, color and image
inclusion.
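
As an illustration of the specials mechanism, here is a minimal sketch in plain
\TEX\ that colors a few words. It assumes that the \DVI\ file is converted by
\type {dvips}, whose color specials are used here; other drivers expect other
strings, and the file name \type {demo.tex} is only a placeholder.

\starttyping
% demo.tex: run "tex demo" and then "dvips demo"
Black text, \special{color push rgb 1 0 0}red text\special{color pop},
and black again.
\bye
\stoptyping

\TEX\ itself does nothing with such a string: it ends up in the \DVI\ file and
is only interpreted by the external program.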

The \LUATEX\ project started from \PDFTEX, including its extensions like font
expansion, and combined that with (bi|)|directional typesetting from the, at that
moment, stable \OMEGA\ variant \ALEPH. During more than a decade of development
we integrated expansion in a more efficient way and limited directions to the
four that made sense. The assumption that \UNICODE\ is the future led to \UTF8
being used all over the place.

The \LUATEX\ variant opens up the internals using the \LUA\ extension language.
The idea was (and still is) that instead of adding more and more hard coded
solutions, one can use \LUA\ to do it on demand. So, for instance \OPENTYPE\
fonts are supported by providing a font file reader but the implementation of
features is up to \LUA. From \PDFTEX\ the graphic inclusions were inherited but
an image and \PDF\ reading library provided a few more possibilities, for
instance for querying properties. An important integral part of \LUATEX\ is the
\METAPOST\ library, but apart from that one, the number of libraries is kept to a
minimum. That way we're free of dependencies and compilation hassles.

With version 1.0 the functionality became official and with version 1.1 the
functionality became more or less frozen. The main reason for this is that
further extensions would violate the principle of using \LUA\ instead of hard
coding solutions. Another reason is that at some point you have to provide a
stable machinery for macro packages so that backward as well as forward
compatibility over a longer period is possible. Also, because one can use \TEX\
in (unattended) workflows sudden changes become undesirable.

\stopsection

\startsection[title={What next?}]

Does it stop here? We have reached a reasonably stable state with \CONTEXT\
\MKIV\ and can basically do what we want to do. However, during more than a
decade of development of this \MKII\ follow up, the idea surfaced that we can go
more minimal in the engine. Basically we can go back to where \TEX\ started: a
core plus extension mechanism. What does that mean? First of all, there is the
very efficient frontend: scanning macros, expanding them and constructing node
lists, all within a powerful grouping mechanism. There is no reason to reconsider
that. The core of the interface is also well documented, for instance in the
\TEX\ book. We added some primitives to \LUATEX, but most of them are of no real
importance to users; they make more sense to macro package writers.

Original \TEX\ has a \DVI\ backend which is a simple representation of a page:
characters and rules positioned on some grid. A separate program has to convert
that into something for a printer. There is a basic extension mechanism that
permits injection of so called specials that get passed to the external program
so that for instance an image can be included. Given that \LUATEX\ is mostly used
to generate \PDF, using so called wide fonts in a \UNICODE\ universe, a \DVI\
backend is not that useful. In fact, one is then better off using the faster
\PDFTEX\ program or just \ETEX\ or \TEX: use the best tool available for the job.

The backend however can be left out and can be implemented in \LUA\ instead. In
fact, most of the backend related code in \CONTEXT\ doesn't really use the
\LUATEX\ backend features at all. The backend is only used to convert the page
stream to a \PDF\ content stream, include images, include fonts and manage low
level objects. Everything specific to \PDF\ is already done in \LUA. Of course
this has a performance penalty but given the overhead already present in
\CONTEXT\ it is bearable.
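
To give an impression of what a backend written in \LUA\ starts from, the
following sketch for plain \LUATEX\ walks over the nodes of a box and counts the
glyphs. It is not the \CONTEXT\ backend, just an illustration that only assumes
the standard \type {node}, \type {tex} and \type {texio} libraries of \LUATEX;
a real backend would emit \PDF\ operators at this point instead of counting, and
the file name is a placeholder.

\starttyping
% plain LuaTeX: luatex demo.tex
\setbox0\hbox{Hello world}
\directlua{
    local glyph = node.id("glyph")
    local count = 0
    % visit the top level nodes of the box: glyphs, glue, kerns, ...
    for n in node.traverse(tex.box[0].head) do
        if n.id == glyph then
            count = count + 1
        end
    end
    texio.write_nl("glyphs in box 0: " .. count)
}
\bye
\stoptyping

The comments inside \type {\directlua} are \TEX\ comments on purpose: the lines
are joined before \LUA\ sees them, so a \LUA\ comment would hide the rest of the
code.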

Alongside the frontend the \METAPOST\ library plays an important role in
\CONTEXT: integration between \TEX, \METAPOST\ and \LUA\ is pretty tight and a
unique property of \CONTEXT. But, for instance the font reader library is no
longer used. Also the interfacing to the \TEX\ Directory Structure was done in
\LUA, originally for performance reasons as it reduced startup time by more than
a second. For some of the frontend code (like hyphenation and par building) we
can kick in \LUA\ variants too but there is not much to gain there. (I know that
some users use them with success.)

So, traditional \TEX\ can be summarized as:

\starttyping
tex core + dvi backend + tex extensions
\stoptyping

where the extension interface provides a few goodies. If we had to summarize
\LUATEX\ we could say:

\starttyping
tex core + dvi & pdf backend + tex extensions + lua callbacks
\stoptyping

The core interprets the input and does the typesetting. In order to be able to
typeset, \TEX\ only needs the dimensions of characters and information about
spacing (which in principle are sort of independent). In math mode a few more
properties are needed, like snippets that make large symbols. In text mode
ligature and kerning information can be used too. However, in \LUATEX, where
normally \OPENTYPE\ fonts are used, that information is provided from \LUA. This
means that one can also think of:

\starttyping
tex core + basic font data + tex extensions + lua callbacks
\stoptyping

Compared to regular \TEX\ this is not that different, and it's what \CONTEXT\ can
make do with. So, it will be no surprise that when I wondered what \LUATEX\ 2.0
could be, a more minimalistic approach was considered: back to the basics.
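
How font data can come from \LUA\ instead of from the engine can be sketched
with the \type {define_font} callback of \LUATEX. The example below is meant for
plain \LUATEX\ (in \CONTEXT\ callbacks are managed by the macro package) and, as
an assumption made to keep it short, it simply uses the built in \type {tfm}
reader as a stand in for a real \OPENTYPE\ loader written in \LUA. The file and
font names are placeholders.

\starttyping
% plain LuaTeX: luatex demo.tex
\directlua{
    callback.register("define_font", function(name, size)
        % the table could just as well be built from an OpenType file
        local data = font.read_tfm(name, size)
        % hand the table over to the engine, which returns a font id
        return font.define(data)
    end)
}
\font\demo=cmr10 at 12pt
\demo Text set in a font whose metrics were passed in from Lua.
\bye
\stoptyping

The engine only takes the dimensions and a few related properties from that
table; where they come from is up to the \LUA\ code.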

\stopsection

\startsection[title={Roadmap}]

Before I continue it is good to mention the following. One of the burdens that
\CONTEXT\ users (and developers) carry is that the outside world likes putting
labels on \CONTEXT, like \quotation {A macro package depending on \PDFTEX} at a
time when we supported \DVI\ at the same level using a more or less generic
driver model. The same is true for \MKIV, e.g.\ \quotation {\CONTEXT\ uses a lot
of \LUA\ and moves away from \TEX} while in fact we provide a hybrid tool: you
can use \TEX\ input (which most users do) but also \LUA\ (which can be handy) or
\XML\ (which some publishers demand and definitely seems to be used by some
\CONTEXT\ power users). A special one is \quotation {\CONTEXT\ is kind of plain
\TEX, so you have to program all yourself.} Reality is that \CONTEXT\ is an
integrated system, where \TEX\ and \METAPOST\ work together to provide a lot of
integrated functionality. Because of \LUATEX\ development and the relation
between an updated engine and the beta version of \CONTEXT, the impression can be
that we have an unstable system. This strategy of parallel adaptation is the only
way to really test if things work as expected. Because we have a rather fast
update cycle normally users don't suffer that much from it.

The core of whatever we follow up with is and remains \TEX, just because I like
it. So, when I talk about a small core, I actually still talk about \TEX. The
main reason is that it's way easier (and more readable) to code some solutions in
this hybrid fashion. A pure \LUA\ solution is no fun, maybe even a pain, and I
have no use for it, but a pure \TEX\ solution can be cumbersome too. And \TEX\
input is just very convenient and for that one needs a \TEX\ interpreter. I would
already have dropped out if \TEX\ were not part of the game: an intriguing,
puzzling and powerful toy. And \METAPOST\ and \LUA\ add even more fun. So, I
settle for a mix between three interesting languages. And, because I seldom run
into professional demand for \LUATEX\ related support (or high end, high
performance rendering), the fun factor has always been the driving force.

All that said, for practical reasons, when we explore a follow up in the
perspective of \CONTEXT, we will use the working title \LUAMETATEX\ instead.
\LUAMETATEX\ has the current \LUATEX\ frontend, some \LUA\ libraries, but no
backend. Gone are the font reader, image inclusion, \DVI\ and \PDF\ backend
(including font inclusion) and the interface to the \TDS. Can that work? As
mentioned, the font reader was already not used in \CONTEXT\ for quite a while. An
alternative page stream builder was also in good working condition in \CONTEXT\
when \LUATEX\ 1.08 was released and around \LUATEX\ 1.09 image inclusion was
replaced (\PDF\ inclusion was already accompanied for a while by a \LUA\
variant). Currently (fall 2018) \CONTEXT\ is able to completely construct the
\PDF\ file, which also means font inclusion. However, it didn't make much sense to
release that code yet because after all, there was minimal gain when using it
with a full blown \LUATEX. Also, switching to this variant involved some runtime
adaptation of code which might confuse users. But above all, it needed more
testing, and releasing something before an upcoming \TEX Live code freeze is a
bad idea.

During \LUATEX\ development we got suggestions for additional features a few
times, but merely looking at them already made clear that what works for
someone in a particular case, can introduce side effects that make (for instance)
\CONTEXT\ fail. And, how many folks keep \CONTEXT\ in mind? So, when \LUATEX\
goes into maintenance mode, specific distributions could accept patches outside
our control, which has the danger that a binary (claiming to be \LUATEX)
doesn't work with \CONTEXT. Of course we cannot change something ourselves either
without looking around. And I'm not even bringing possible negative side effects
on performance into the discussion here.

When developing \LUATEX\ some ideas were dropped or delayed and these can now be
explored without the danger of messing up the stable version. It has always been
relatively easy to adapt \CONTEXT\ to changes so an (at least for now)
experimental follow up can be dealt with too, but this time the concept of \quote
{experimental} is really bound to \CONTEXT. When something is found useful (or
can be improved) it can always (after testing it for a while) be fed back into
\LUATEX, as long as it doesn't break something. I'll decide on that later.

In the documentation of \TEX, when discussing the extension mechanism, Donald
Knuth says:

\startquotation
The goal of a \TEX\ extender should be to minimize alterations to the standard
parts of the program, and to avoid them completely if possible. He or she should
also be quite sure that there's no easy way to accomplish the desired goals with
the standard features that \TEX\ already has. \quotation {Think thrice before
extending}, because that may save a lot of work, and it will also keep
incompatible extensions of \TEX\ from proliferating.
\stopquotation

With the reduction of backend and some frontend code discussed in the next
chapters, combined with hooks that can trigger callbacks, we try to come close to
this objective. Now, the last sentence of this quote relates to stability and
this is also a reason why we enter this new thread: the smaller the core is, the
less subject we are to change. Think of this: I haven't used \CONTEXT\ \MKII\
in over a decade. A \PDFTEX\ format still gets generated but I have no clue if
the engine has been changed in ways that make some code behave differently (it
could also be the ecosystem related to that engine), but I assume it's still
behaving the same. The same has to become true for stock \LUATEX\ and \MKIV\ and
for \CONTEXT\ it can even become more true with \LUAMETATEX. We'll see.

\stopsection

\startsection[title={Experiments}]

This (still sort of) prototype of what \LUAMETATEX\ could be boils down to a much
smaller binary, and not that much more \LUA\ code on top of what we already have.
There are no longer dependencies on third party code, apart from \LUA\ (\type
{pplib} is tuned for \LUATEX\ and a permanent part of the code base). Performance
wise the backend of the experimental version makes a run up to 5\% slower than
when using a native backend (on processing the \LUATEX\ manual) but history has
taught us that we can gain some of that back in due time. Performance also depends
a bit on the properties of the document. Interesting is that better control over
the output showed that \PDF\ output of the mentioned manual was a bit smaller
(but that might change). \footnote {In the meantime the experimental version can
process the \LUATEX\ manual 5\endash10\% faster and the result is still smaller.}

The experiments actually started already years ago with no longer using the font
loader. It sort of went this way:

\startitemize
\startitem
    Stepwise \CONTEXT\ functionality started using a combination of \TEX\ and
    \LUA\ code and we got an idea of what was needed. The most demanding part
    was support for fonts.
\stopitem
\startitem
    Font handling was done in \LUA\ because it's flexible which is what \TEX ies
    are accustomed to. The \OPENTYPE\ and \PDF\ standards would not be called
    standards if some implementation was impossible and so far we're ok. (Some
    more script support will be provided in future versions.)
\stopitem
\startitem
    We stopped using the fontforge font loader but use one written in \LUA\
    instead. One reason for this was that when variable fonts showed up we wanted
    to support them in \CONTEXT\ right from the start (not that there has been
    much demand). The same is true for fonts using color (like emoji). Also,
    fighting the built|-|in \FONTFORGE\ heuristics was hard.
\stopitem
\startitem
    The (large and dependent on \CPLUSPLUS) poppler library used for \PDF\
    embedding has been replaced by a small lightweight library in pure \CCODE.
    This was triggered by a chat during a bacho\TEX\ meeting.
\stopitem
\startitem
    The hard coded \PDF\ inclusion can be swapped with a \LUA\ based one so that
    we can for instance filter the page stream. We already had a hybrid solution
    in \CONTEXT\ anyway for other reasons (merging annotations, layers,
    bookmarks, etc.).
\stopitem
\startitem
    The page stream constructor (shipout and xforms) was replaced by a \LUA\
    variant, but I decided not to make that an independent option in stock
    \LUATEX\ with \CONTEXT\ \MKIV, although for a while I had the option \type
    {--lmtx} for activating that experimental code.
\stopitem
\startitem
    Then of course bitmap image inclusion had to be done by \LUA\ code, in order
    to see if we can get rid of another external dependency as some of these
    libraries get frequent updates while in practice we only use a very small
    subset of functionality. Indeed this was possible. \footnote {I have a pure
    \LUA\ parser for \PDF\ too, so at some point that might get included in the
    \CONTEXT\ code base.}
\stopitem
\startitem
    With some effort (deciphering specs and such) the font inclusion could also
    be done by \LUA. This was made possible by the fact that we already had
    support for variable fonts. More tricks are possible and will be explored.
\stopitem
\startitem
    Finally the \PDF\ file construction and \PDF\ object management had to be
    implemented. This was actually the easiest part.
\stopitem
\stopitemize

Performance wise the \LUA\ font loader is faster than the built in one. The same
is true for \PDF\ inclusion but in practice that is unnoticeable. Bitmap
inclusion is currently slower for interlaced images (seldom used in print) and
just as efficient for other types. The page stream constructor is definitely
slower but this is compensated by the faster font inclusion and \PDF\ file
construction. Of course it all depends on the kind of content, but these are the
observations as of fall 2018. Anyway, they were enough reason to continue this
experiment.

One thing to keep in mind is that the smaller the binary and the fewer code paths
we have, the better future performance might be. Computers are not becoming much
faster for single thread processes like \TEX, so the less we jump around code
space (memory) the better it probably is for \CPU\ caching (as caches are not
growing much either).

\stopsection

\startsection[title={Conclusion}]

Normally when writing this kind of code I make sure that I can enable such new
mechanisms on top of others but at some point one has to decide how to really
integrate them. For instance, we can do font inclusion independent of \PDF\
generation or page stream construction independent of \PDF\ generation and|/|or
font inclusion but in the end that doesn't make sense and makes the code base a
bit of a mess. So, this is how it will go.

Stock \LUATEX\ with \MKIV\ will use the normal backend but there might
be an option to overload the built|-|in image inclusion so that one can avoid
aborting a run in case of problematic images. Complete \PDF\ file
construction, which then also includes page stream construction, font embedding
and object management might be available as option for \MKIV\ with \LUATEX\ 1.10
(for a while) but will be default when using \LUAMETATEX. When we move on, \LMTX\
support might evolve into more sophisticated trickery. \footnote {A few months
later I decided that this made no sense, and that it was cleaner to just leave
that approach for \LMTX\ only. So, now both engines use different code
exclusively.}

Once tested a bit in real documents, experimental code will end up in the
distribution. That code can then be turned into production code (read: cleaned up
and reshuffled a bit). We can streamline the engine code base: strip the
components that are not needed any more, remove some obsolete features, optimize
the code, strip some functions from \LUA\ libraries, rename some helpers, and
finally add some documentation. There are some plans to extend \METAPOST\ so
things can also get added. Concerning the \LUA\ interface it means that \type
{slunicode} is removed, the embedded socket related \LUA\ code goes external (but
the library stays), the font loader gets removed, the \type {img} library goes
away, \PNG\ libraries are no longer embedded, synctex is stripped out (but the
fields in nodes stay or get extended). \footnote {Much later I also decided to
remove the zip file reader library.} The resulting binary will be much smaller
and the code base more independent and smaller too. In the process \LUAJIT\
support might be dropped as well, simply because it no longer is in sync with
stock \LUA, but that also depends on how complex long term maintenance becomes.
\footnote {As we will see in following chapters, indeed support for \LUAJIT\ has
been dropped while \LUA\ got upgraded to 5.4.}

Because such a stripped down binary is no longer what got presented as \LUATEX\
version~1, it will basically become \LUATEX\ version 2, but then we have the
problem that its binary name clashes with the original. This is why it will be
run as \typ {luametatex}. For \CONTEXT\ it's not that relevant as it will run on
both \LUATEX\ 1.10 and its lean and mean successor. I might also provide a plain
\TEX\ (read: generic) version but that is to be decided because it probably
doesn't make much sense to spend time on it. As usual we will test this within
the \CONTEXT\ beta program. The good thing is that it doesn't interact with
\LUATEX, so that other macro packages are not affected. Another side effect can
be that we uncover issues with \LUATEX\ 1.10 and that we can experiment with some
improvements that we feed back into the parent.

At the \CONTEXT\ end of this there are some plans to extend the export, maybe
improve already present \PDF\ tagging (if found useful), add some more input
(xml) manipulations, and maybe extend (virtual) font handling a bit, now that we
no longer are bound to the currently used packet model. Contrary to what one
might expect this is not really dependent on the engine.

How do we proceed? As with the transition from \MKII\ to \MKIV, it will all
happen stepwise. This means that for a while the code base will be a bit hybrid
but at some point it might be partially split to make things cleaner, not that I
expect many fundamental differences (certainly not in the front|-|end). This
dualistic approach means more work but also ensures that we keep a working
\CONTEXT. We also need to keep an eye on for instance generic commands as used in
tikz: we can't drop them so we emulate them (so far with success). At the time of
this writing, early November 2018, the \CONTEXT\ test suite can be processed in
\LMTX\ mode without problems so I'm confident that it will work out ok. The next
chapter describes the results of how we did the above in more detail.

\stopsection

\stopchapter

\stopcomponent