luametatex-modifications.tex /size: 47 Kb    last modification: 2024-01-16 09:02
% language=us runpath=texruns:manuals/luametatex

\environment luametatex-style

\startcomponent luametatex-modifications

\startchapter[reference=modifications,title={The original engines}]

\startsection[title=The merged engines]

\startsubsection[title=The rationale]

\topicindex {engines}
\topicindex {history}

The first version of \LUATEX, made by Hartmut after we discussed the possibility
of an extension language, only had a few extra primitives and was largely the
same as \PDFTEX. It was presented to the public in 2005. As part of the Oriental
\TEX\ project, Taco merged some parts of \ALEPH\ into the code and some more
primitives were added. Then we started more fundamental experiments. After many
years, when the engine had become more stable, the decision was made to clean up
the rather hybrid nature of the program. This means that some primitives were
promoted to core primitives, often with a different name, and that others were
removed. This also made it possible to start cleaning up the code base, which
showed decades of stepwise additions to original \TEX. In \in {chapter}
[enhancements] we discuss some new primitives; here we cover most of the
adapted ones.

Over more than a decade new functionality was added step by step, and after 10
years the more or less stable version 1.0 was presented. We continued, and after
some 15 years the \LUAMETATEX\ follow|-|up entered its first testing stage.
Before the details of the engine are discussed in the following chapters, we
first summarize where we started from. Keep in mind that \LUAMETATEX\ offers a
bit less than \LUATEX, so this section differs from the one in the \LUATEX\
manual.

Besides the expected changes caused by new functionality, there are a number of
not|-|so|-|expected changes. These are sometimes a side|-|effect of a new
(conflicting) feature, or, more often than not, a change necessary to clean up
the internal interfaces. These will also be mentioned.

Again we stress that {\em this is not a \TEX\ manual, nor a tutorial}. If you are
unfamiliar with \TEX, first play a little with a macro package, take a look at
the \TEX\ book, and make yourself familiar with the concepts and the macro
language. That will likely take days, not hours. Also, many of the new concepts
introduced in \LUATEX\ and \LUAMETATEX\ are explained in documents that come with
the \CONTEXT\ distribution, as well as in articles and presentations. It doesn't
pay off to repeat that here, especially not at a time when users often search
instead of reading from cover to cover.

Occasionally there are extensions to \PDFTEX\ and \LUATEX, but these are
unlikely to end up in \LUAMETATEX. If needed, one can add functionality using
\LUA. Another reason is that, given the way we handle files and generate output,
being compatible would only harm the engine. We have some fundamental extensions
that overcome limitations anyway. One area where there are significant changes
is logging: at some point it no longer made sense to be compatible (with
\LUATEX) because we carry around more information.

\stopsubsection

\startsubsection[title={Original \TEX}]

\topicindex {\TEX}

Of course it all starts with traditional \TEX. Even if we started with the
\PDFTEX\ code base, most of it still comes from original Knuthian \TEX. But we
deviate a bit.

\startitemize

\startitem
    The current code base is written in \CCODE, not \PASCAL. The original \WEB\
    documentation is kept when possible and not wrapped in tagged comments. As a
    consequence of this conversion, instead of one large file plus change files,
    we now have multiple files organized in categories like \type {tex}, \type
    {lua}, \type {languages}, \type {fonts}, \type {libraries}, etc. There are
    some artifacts of the conversion to \CCODE, but these got (and get) removed
    stepwise. The documentation, which actually comes from the mix of engines
    (via so|-|called change files), is a mix of what the authors of the engines
    wove into the source, and most of it is of course from Don Knuth's original.
    In \LUAMETATEX\ we try to stay as close as possible to the original so that
    the documentation of the fundamentals behind \TEX\ by Don Knuth still
    applies. However, because we use \CCODE, some documentation is a bit off.
    Also, most global variables are now collected in structures, but the
    original names and level of abstraction were mostly kept. On the other hand,
    opening up had its impact on the code, which makes some documentation a bit
    off too. Adapting all that will take time.
\stopitem

\startitem
    See \in {chapter} [languages] for quite some changes related to paragraph
    building, language handling and hyphenation. Because we have independent runs
    over the node list for hyphenation, kerning and ligature building, plus
    callbacks that can also tweak the list, adding a brace group in the middle of
    a word (like in \type {of{}fice}) does not prevent ligature creation. In
    fact, preventing kerns and ligatures can now be done with glyph options so
    that we don't depend on side effects of the engine. Hyphenation, ligature
    building and kerning have been split so that we can hook in alternative or
    extra code wherever we like. There are various options to control
    discretionary injection, and related penalties are now integrated in these
    nodes. Language information is now bound to glyphs. The number of languages
    in \LUAMETATEX\ is smaller than in \LUATEX. Control over discretionaries is
    more granular and now managed by fewer variables. Although \LUAMETATEX\
    behaves pretty much like you expect from \TEX, due to the many possibilities
    it is unlikely that you get identical output.
\stopitem

\startitem
    There is no pool file; all strings are embedded during compilation. This also
    removed some memory constraints. We kept token and node memory management
    because it is convenient and efficient, but parts were reimplemented in order
    to remove some constraints. Token and node memory management is a bit more
    efficient, which was needed because we carry around more information. All the
    other large memory structures, like those related to nesting, the save stack,
    input levels, the hash table and the table of equivalents, now all start out
    small and are enlarged when needed, where maxima are controlled in the usual
    way. In principle the initial memory footprint is smaller while at the same
    time we can grow really large. Because we have wide memory words, some data
    (arrays) used for housekeeping could be reorganized a bit.
\stopitem

\startitem
    The macro (definition and expansion) parsers are extended and we can have more
    detailed argument parsing. This has been done in a way that keeps compatibility.
\stopitem

\startitem
    The specifier \type {plus 1 fillll} does not generate an error. The extra
    \quote {l} is simply typeset.
\stopitem

\startitem
    The upper limit of \prm {endlinechar} and \prm {newlinechar} is 127.
\stopitem

\startitem
    Because the backend is not built|-|in, the magnification (\tex {mag})
    primitive is gone. A \tex {shipout} command just discards the content of the
    given box. The write related primitives have to be implemented in the macro
    package using \LUA. None of the \PDFTEX\ derived primitives is present.
\stopitem

\startitem
    Because there is no font loader, a \LUA\ variant is free to support the
    \OMEGA\ \type {ofm} file format or not. As there are hardly any such fonts,
    supporting it probably makes no sense. There is plenty of control over the
    way glyphs get treated, and scaling of fonts and glyphs is also more dynamic.
\stopitem

\startitem
    There is more control over some (formerly hard|-|coded) math properties. In
    fact, there is a whole extra bit of math related code because we need to deal
    with \OPENTYPE\ fonts. The math processing has been adapted to the new
    (dynamic) font and glyph scaling features. Because there is more granular
    control, for instance because there are more classes, the engine has to be
    set up differently. This is also true for features that control how, for
    instance, math fonts are processed. An intermediate, improved, variant of the
    \LUATEX\ dual code path approach has been sacrificed in the process.
\stopitem

\startitem
    Math atoms and constructs like fractions, fences, radicals and accents have
    all been extended. The new variants accept all kinds of keywords that control
    the rendering. As a direct consequence, noads (and nodes in general) are much
    bigger in terms of memory usage. For now we keep the old commands available,
    but that might change once eight|-|bit fonts are no longer used.
\stopitem

\startitem
    The \prm {outer} and \prm {long} prefixes are silently ignored, but other
    prefixes have been added. It is permitted to use \prm {par} in math and
    there are more such convenience options.
\stopitem

\startitem
    The lack of a backend means that some primitives related to it are not
    implemented. This is no big deal because it is possible to use the scanner
    library to implement them as needed, which depends on the macro package and
    backend.
\stopitem

\startitem
    The math style related primitives can use numbers as well as symbolic names.
    There is some more (control over) math anyway, which is a side effect of
    supporting \OPENTYPE\ math.
\stopitem

\stopitemize
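
As a minimal sketch of two items above, the following input should run without
errors in \LUAMETATEX. The macro name \type {\MyMacro} is hypothetical, and the
comments only restate the behaviour described in this section:

\starttyping
% \long and \outer are accepted but silently ignored, so they have
% no effect on this definition
\long\outer\def\MyMacro#1{[#1]}

% the glue gets the highest order (filll); the fourth "l" is not
% part of the specifier and is typeset as a character after the glue
\hbox to 10cm{\hskip 0pt plus 1fillll}
\stoptyping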

There is much more to say here, but at least this gives an idea of what you end
up with if you move from traditional \TEX\ to \LUAMETATEX: a more complex but
also more flexible system.

\stopsubsection

\startsubsection[title={Goodies from \ETEX}]

\topicindex {\ETEX}

As \ETEX\ is the de|-|facto standard extension, we of course provide its
features, but only those that make sense. We used version 2.2, which is
basically the only version that was ever released.

\startitemize

\startitem
    The \ETEX\ functionality is always present and enabled, so the prepended
    asterisk or \type {-etex} switch for \INITEX\ is not needed.
\stopitem

\startitem
    The \TEXXET\ extension is not present, so the primitives \type
    {\TeXXeTstate}, \type {\beginR}, \type {\beginL}, \type {\endR} and \type
    {\endL} are missing. Instead we used the \OMEGA|/|\ALEPH\ approach to
    directionality as a starting point, albeit changed quite a bit, so that
    we're probably not that far from \TEXXET. In the end, right|-|to|-|left
    typesetting mostly boils down to marking regions in the node list and
    flushing these in reverse order in the backend. The main addition that
    \OMEGA\ brought was the initial paragraph node that stores the direction.
\stopitem

\startitem
    Some of the tracing information that is output by \ETEX's \prm
    {tracingassigns} and \prm {tracingrestores} is not there. Where \ETEX\ added
    some tracing, \LUAMETATEX\ adds much more and also lets you control the
    details. Tracing is not compatible, if only because we have more complex
    nodes and do more in all kinds of mechanisms.
\stopitem

\startitem
    Register management in \LUAMETATEX\ uses the \OMEGA|/|\ALEPH\ model, so the
    highest register number is 65535 and the implementation uses a flat array
    instead of the mixed flat and sparse model from \ETEX.
\stopitem

\startitem
    Because we have more nodes, conditionals, etc.\ the \ETEX\ status related
    variables are adapted to \LUAMETATEX: we use different \quote {constants},
    but that should be no problem because any sane macro package uses
    abstraction. All these properties can be queried via \LUA.
\stopitem

\startitem
    The \prm {scantokens} primitive now uses the same mechanism as the \LUA\
    print|-|to|-|\TEX\ interface, which simplifies the code. There is a little
    performance hit, but it will not be noticed in \CONTEXT, because we never
    use this primitive.
\stopitem

\startitem
    The \ETEX\ engine provides \prm {protected} and although we have that too,
    the implementation is different. Users should not notice that.
\stopitem

\startitem
    Because we don't use change files on top of original \TEX, the integration of
    \ETEX\ functionality is a bit more natural, code|-|wise.
\stopitem

\startitem
    The \tex {readline} primitive has to be implemented in \LUA. This is a side
    effect of delegating all file \IO.
\stopitem

\startitem
    Most of the code is rewritten but the original primitives are still tagged as
    coming from \ETEX.
\stopitem

\stopitemize
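
From the user's point of view, \prm {protected} behaves as in \ETEX. A minimal
sketch with hypothetical macro names:

\starttyping
\protected\def\MyCommand{something}
\def\MyOther{else}

% the protected macro is kept as is inside the \edef, while
% \MyOther is replaced by its body, so \MyResult holds the
% tokens \MyCommand, a space and "else"
\edef\MyResult{\MyCommand\space\MyOther}
\stoptyping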

\stopsubsection

\startsubsection[title={Bits of \PDFTEX}]

\topicindex {\PDFTEX}

Because we want to produce \PDF, the most natural starting point was the popular
\PDFTEX\ program, so we took version 1.40. We inherited the stable features,
dropped most of the experimental code and promoted some functionality to core
\LUATEX\ functionality, which in turn triggered renaming primitives. However, as
the backend was dropped, not that much from \PDFTEX\ is present any more.
Basically all we now inherit from \PDFTEX\ is expansion and protrusion, and even
that has been adapted. So don't expect \LUAMETATEX\ to be compatible.

\startitemize

\startitem
    The experimental primitives \prm {ifabsnum} and \prm {ifabsdim} have been
    promoted to core primitives and became part of the much larger repertoire
    of \LUAMETATEX\ conditionals. The primitive \prm {ifincsname} is also
    inherited but has a different implementation.
\stopitem

\startitem
    Of course \prm {quitvmode} has become a core primitive too.
\stopitem

\startitem
    As the hz (expansion) and protrusion mechanisms are part of the core, the
    related primitives \prm {lpcode}, \prm {rpcode}, \prm {efcode}, \prm
    {leftmarginkern} and \prm {rightmarginkern} are promoted to core primitives.
    The two commands \prm {protrudechars} and \prm {adjustspacing} control these
    processes. The protrusion and kern related primitives are now dimensions,
    while expansion is still one of these 1000|-|based scales.
\stopitem

\startitem
    In \LUAMETATEX\ three extra primitives can be used to override the font
    specific settings: \prm {adjustspacingstep} (max: 100), \prm
    {adjustspacingstretch} (max: 1000) and \prm {adjustspacingshrink} (max: 500).
\stopitem

\startitem
    The hz optimization code has been redone so that we no longer need to create
    extra font instances. The front- and backend have been decoupled and the
    glyph and kern nodes carry the used values. In \LUATEX\ that made a more
    efficient generation of \PDF\ code possible. It also resulted in much cleaner
    code. The backend code is gone, but of course the information is still
    carried around. Performance in \LUAMETATEX\ should be a bit better than in
    \PDFTEX, although of course its 32|-|bit machinery is in general slower than
    that of the eight|-|bit \PDFTEX.
\stopitem

\startitem
    When \prm {adjustspacing} has value~2, hz optimization will be applied to
    glyphs and kerns. When the value is~3, only glyphs will be treated. A value
    smaller than~2 disables this feature.
\stopitem

\startitem
    When \prm {protrudechars} has a value larger than zero, characters at the
    edge of a line can be made to hang out. A value of~2 will take the protrusion
    into account when breaking a paragraph into lines. A value of~3 will try to
    deal with right|-|to|-|left rendering; this is still an experimental feature.
\stopitem

\startitem
    The pixel multiplier dimension \prm {pxdimen} has been inherited as a core
    primitive.
\stopitem

\startitem
    The primitive \prm {tracingfonts} is now a core primitive but doesn't relate
    to the backend.
\stopitem

\startitem
    The image inclusion code was already different in \LUATEX\ and is gone in
    \LUAMETATEX, which has no backend. One can implement the same abstraction
    layer (aka resources) using \LUA.
\stopitem

\stopitemize
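
Here is a minimal sketch of the inherited conditional and of the spacing
parameters described above. The step, stretch and shrink values are arbitrary
examples within the stated maxima. Whether expansion or protrusion becomes
visible also depends on the font setup:

\starttyping
% compares absolute values: |-5| > |3|, so "yes" is typeset
\ifabsnum -5 > 3 yes\else no\fi

\adjustspacing        2  % expansion of glyphs and kerns
\adjustspacingstep    5  % overrides the font specific step
\adjustspacingstretch 20 % overrides the font specific stretch
\adjustspacingshrink  20 % overrides the font specific shrink
\protrudechars        2  % protrusion, also used when breaking lines
\stoptyping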

Even if not that much of \PDFTEX\ is present in \LUAMETATEX, we still see it as
its ancestor. After all, without \PDFTEX\ the \TEX\ community would not be where
it is now. We still use it as a reference when we check something (that we
changed).

\stopsubsection

\startsubsection[title=Directionality from \ALEPH]

\topicindex {\ALEPH}

In \LUATEX\ we took the 32|-|bit aspects of \ALEPH\ RC4, the stable version of
\OMEGA\ that also integrated \ETEX. In \LUATEX\ we also took much of the
directional mechanisms and merged them into the \PDFTEX\ code base as a starting
point for further development. Then we simplified directionality, fixed it and
opened it up. In \LUAMETATEX\ not that much of this is left. We only have two
horizontal directions. Instead of vertical directions we introduced an
orientation model bound to boxes. We kept the initial par node, local boxes
(that also use par nodes) and directional nodes.

The already reduced|-|to|-|four set of directions now only has two members:
left|-|to|-|right and right|-|to|-|left. They don't do much as it is the backend
that has to deal with them. When paragraphs are constructed, a change in
horizontal direction is irrelevant for calculating the dimensions. So basically,
all we do is register the state and pass it on until the backend can do
something with it.

Here is a summary of inherited functionality:

\startitemize

\startitem
    The \type {^^} notation has been extended: after \type {^^^^} four
    hexadecimal characters are expected and after \type {^^^^^^} six hexadecimal
    characters have to be given. The original \TEX\ interpretation is still valid
    for the \type {^^} case, but the four and six variants do no backtracking,
    i.e.\ when they are not followed by the right number of hexadecimal digits
    they issue an error message. Because \type {^^^} is a normal \TEX\ case, we
    don't support the odd|-|numbered \type {^^^^^} either. This kind of parsing
    can be disabled in \LUAMETATEX.
\stopitem

\startitem
    Glue {\it immediately after} direction change commands is not a legal
    breakpoint. There is a bit more sanity testing for the direction state. This
    can be configured.
\stopitem

\startitem
    The placement of math formula numbers is direction aware and adapts
    accordingly. Boxes carry directional information but rules don't.
\stopitem

\startitem
    There are no direction related primitives for page and body directions. The
    paragraph, text and math directions are specified using primitives that
    take a number. The three letter codes are dropped.
\stopitem

\startitem
    The local box mechanism has been extended and redone, which permits a more
    generalized and robust usage.
\stopitem

\stopitemize
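
Here is a short sketch of the extended notation and of the numeric direction
primitives. It assumes that \type {^} has its usual superscript catcode and that
0 and 1 stand for left|-|to|-|right and right|-|to|-|left:

\starttyping
\message{^^41}         % two hex digits (original TeX): A
\message{^^^^0041}     % four hex digits: U+0041, also A
\message{^^^^^^01f600} % six hex digits: U+1F600

\pardirection  1 % the paragraph runs right-to-left
\textdirection 0 % a left-to-right stretch of text inside it
\stoptyping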

Most of the directional work is actually up to the backend. As \OMEGA\ never had
a \PDF\ backend, the \LUATEX\ backend took care of the many directions. We now
only have two directions, so the backend code that has to be provided can be
relatively simple. The biggest complication is in handling fonts and
synchronizing the glyph streams. Much is also macro package (and usage)
dependent.

\stopsubsection

\startsubsection[title={No longer \WEBC}]

\topicindex {\WEBC}

The \LUAMETATEX\ codebase is not dependent on the \WEBC\ framework. The
interaction with the file system and \TDS\ is up to \LUA. There still might be
traces, but eventually the code base should be lean and mean. The \METAPOST\
library is coded in \CWEB\ and, in order to be independent from related tools,
conversion to \CCODE\ is done with a \LUA\ script run by, surprise, \LUAMETATEX.

The biggest consequence of this is that there are no dependencies, not even on
ever changing libraries that we occasionally see break the compilation of
\LUATEX. Even on older machines (say 2013\endash2020) compilation should take
less than a minute. The amount of platform specific code is minimal.

\stopsubsection

\startsubsection[title={The follow up on \LUATEX}]

\topicindex {\LUATEX}

This engine is a follow|-|up on \LUATEX, which became more or less frozen after
version 1.10, so that is the version we started from. Apart from reorganizing the
code base, simplifying the build, limiting dependencies etc.\ this project also
adds new functionality and removes some as well. The main differences are
discussed in a separate section. The basic ideas remain the same but the engine
is not downward compatible. This is why we have \CONTEXT\ \MKIV\ for \LUATEX\ and
\CONTEXT\ \LMTX\ for \LUAMETATEX.

There is no \LUAJIT\ version of \LUAMETATEX, simply because there is not that
much gain in the average run (at least not in \CONTEXT). Depending on the kind of
documents, complexity of macro code and usage of \LUA, the \LUAMETATEX\ engine
can be up to 30\percent\ faster than \LUATEX\ anyway.

\stopsubsection

\stopsection

\startsection[title=Implementation notes]

\startsubsection[title=Memory allocation]

\topicindex {memory}

The single internal memory heap that traditional \TEX\ used for tokens and nodes
is split into two separate arrays. Each of these will grow dynamically when
needed. Internally a token or node is an index into these arrays. This permits
an efficient implementation and is also responsible for the performance of the
core. All other data structures are mostly the same but managed dynamically
too. Because we operate in a 64|-|bit world, the parallel table of equivalents
needed for managing levels is gone. Still, the original documentation in \TEX:
The Program mostly applies.

\stopsubsection

\startsubsection[title=Sparse arrays]

The \prm {mathcode}, \prm {delcode}, \prm {catcode}, \prm {sfcode}, \prm {lccode}
and \prm {uccode} (and the new \prm {hjcode}) tables are now sparse arrays that
are implemented in~\CCODE. They are no longer part of the \TEX\ \quote
{equivalence table}, and because each would have 1.1 million entries of a few
memory words each, this makes a major difference in memory usage. Performance is
not really hurt by this.

The \prm {catcode}, \prm {sfcode}, \prm {lccode}, \prm {uccode} and \prm {hjcode}
assignments don't show up when using the \ETEX\ tracing routines \prm
{tracingassigns} and \prm {tracingrestores}, but we don't see that as a real
limitation. It also saves a lot of clutter.
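
To illustrate, here is a minimal sketch using plain register and code
assignments. The comments state what the text above leads us to expect, namely
that only the second assignment shows up in the log:

\starttyping
\tracingassigns 1
\tracingonline  1

\catcode`\!=12 % sparse array: not reported
\count255=5    % table of equivalents: reported as usual
\stoptyping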

The glyph ids within a font are also managed by means of a sparse array, as glyph
ids can go up to index $2^{21}-1$, but these are never accessed directly, so again
users will not notice this.

\stopsubsection

\startsubsection[title=Simple single|-|character csnames]

\topicindex {csnames}

Single|-|character commands are no longer treated specially in the internals;
they are stored in the hash just like the multi|-|letter control sequences. This
is a side effect of going \UNICODE\ and \UTF. Where reserving 256 slots in an
array adds no real burden, doing so for the whole \UNICODE\ range is a waste of
space. Therefore, active characters are also implemented internally as a special
type of multi|-|letter control sequence, one that uses a prefix that is otherwise
impossible to obtain.

The code that displays control sequences explicitly checks if the length is one
when it has to decide whether or not to add a trailing space.

\stopsubsection

\startsubsection[title=Binary file reading]

\topicindex {files+binary}

All input now goes via \LUA: files loaded with \type {\input} as well as files
that are opened with \type {\openin}. Actually the latter has to be implemented
in terms of macros and \LUA\ calls. This also means that, compared to \LUATEX,
the internal handling of input has been changed, but users won't notice that.

Setting a callback is now expected. Although reading input natively using \type
{getc} calls is more efficient, we now fetch lines from \LUA, put them in a
buffer and then pick successive bytes (keep in mind that we read \UTF) from that.
The performance is quite ok, also because \LUA\ is fast, today's operating
systems cache files, and storage media have become very fast. Also, \TEX\ spends
more time messing around with the input it has read than actually reading it.

\stopsubsection

\startsubsection[title=Tabs and spaces]

\topicindex {space}
\topicindex {newline}

We conform to the way other \TEX\ engines handle trailing tabs and spaces. For
decades trailing tabs and spaces (before a newline) were removed from the input,
but in September 2017 this behaviour was changed to only handle spaces. We are
aware that this can introduce compatibility issues in existing workflows, but
because we don't want too many differences with upstream \TEXLIVE\ we just follow
up on that patch (which is a functional one and not really a fix). It is up to
macro package maintainers to deal with possible compatibility issues, and in
\LUAMETATEX\ they can do so via the callbacks that deal with reading from files.

The previous behaviour was a known side effect and (as that kind of input
normally comes from generated sources) it was normally dealt with by adding a
comment token to the line in case the spaces and|/|or tabs were intentional and
to be kept. We are aware of the fact that this contradicts some of our other
choices, but here we favor consistency with other engines. We still stick to our
view that at the log level we can be (and might become) more incompatible. We
already expose some more details anyway.

\stopsubsection

\startsubsection[title=Logging]

When detailed logging is enabled, more detail is output with respect to what
nodes are involved. This is a side effect of the core nodes having more detailed
subtype information. The benefit of more detail outweighs any wish to be
byte|-|compatible in the logging. One can always write additional logging in
\LUA.

The information that goes into the log file can be different from \LUATEX, and
might even differ a bit more in the future. The main reason is that inside the
engine we have more granularity, which for instance means that we output subtype
and attribute related information when nodes are printed. Of course we could
have offered a compatibility mode, but it would serve no purpose. Over time there
have been many subtle changes to control logs in the \TEX\ ecosystems, so another
one is bearable.

In a similar fashion, the behaviour is a bit different when \TEX\ expects input,
which in turn is a side effect of removing the interception of \type {*} and
\type {&}, which made for cleaner code (quite a bit had accumulated as a side
effect of continuous adaptations in the \TEX\ ecosystems). There was already code
that was never executed, simply as a side effect of the way \LUATEX\ initializes
itself (one needs to enable classes of primitives, for instance). Keep in mind
that over time system dependencies have been handled with \TEX\ change files, the
\WEBC\ infrastructure, \KPSE\ features, compilation variables and flags, etc. In
\LUAMETATEX\ we try to minimize all that.

When it became unavoidable that we output more detail, it also became clear that
it made no sense to stay log and trace compatible. Some of it is controlled by
parameters in order to stay close to the original, but \CONTEXT\ is configured
such that we benefit from the new possibilities. Examples are that in addition to
\prm {meaning} we have \prm {meaningfull}, which also exposes macro properties,
and \prm {meaningless}, which only exposes the body; their companions \prm
{meaningful} and \prm {meaningles} show no body but do show the preamble when
present. The \prm {untraced} prefix suppresses some output in the log, and we set
\prm {tracinglevels} to 3 in order to get details about the input and grouping
level. When less is shown than expected, keep in mind that \LUAMETATEX\ has a
somewhat optimized saving and restoring of meanings, so less can happen, which
is reflected in tracing. When node lists are serialized (as with \prm {showbox})
some nodes, like discretionaries, report more detail. The compact serializer,
used for instance to signal overfull boxes, also shows a bit more detail with
respect to non|-|content nodes. In math more is shown, if only because we have
more control and additional mechanisms.
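
Here is a sketch of the inspection primitives mentioned above, applied to a
hypothetical macro. The comments only repeat what this section says about each
variant, so no actual log output is claimed:

\starttyping
\protected\def\MyMacro#1{[#1]}

\message{\meaning    \MyMacro} % the traditional view
\message{\meaningfull\MyMacro} % also exposes properties like protected
\message{\meaningless\MyMacro} % only exposes the body
\message{\meaningful \MyMacro} % no body, preamble when present
\message{\meaningles \MyMacro} % no body, preamble when present

\tracinglevels 3 % details about input and grouping levels
\stoptyping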

\stopsubsection

\startsubsection[title=Parsing]

Token parsers have been upgraded for the sake of \LUA, \prm {csname} handling
has been extended, macro definitions can be more flexible so their code was
adapted, and more conditionals also brought some changes. But we build upon the
(reorganized) \TEX\ foundation, so the basics can definitely be recognized.

Because of the \LUA\ interface, the internal token and node organization has
been normalized (read: we cannot cheat because all is kind of visible). On
the one hand this can come with a performance penalty, but that is more than
compensated by extensions, optimized parsers and such. Still, the fact that we
are \UTF\ based (32~bit) makes the machinery slower than the 8~bit original.
Even so, the reworked \LUAMETATEX\ engine is substantially faster than its
\LUATEX\ predecessor.

The handling of conditionals has been adapted so that we can have flatter
branches (\prm {orelse} cum suis). This again has some consequences for
parsing. Because parsing alignments is rather interwoven with general parsing
and expansion, the handling of related primitives has been slightly adapted
(also for the sake of \LUA\ interfacing) and dealing with \prm {noalign}
situations is a bit more convenient.
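
The flatter branches can be sketched as follows, using a hypothetical
\type {\MyValue}. Nesting an \prm {ifnum} in each \prm {else} branch needs one
\prm {fi} per level. With \prm {orelse}, a single \prm {fi} closes the whole
chain:

\starttyping
\def\MyValue{2}

\ifnum\MyValue=1
    one
\orelse\ifnum\MyValue=2
    two   % this branch is taken
\orelse\ifnum\MyValue=3
    three
\else
    other
\fi
\stoptyping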

These are just a few of the adaptations, and most of this happened stepwise with
testing in the \CONTEXT\ code base. It will be clear that \LUAMETATEX\ is quite a
different extension to the original. You're warned.

\stopsubsection

\startsubsection[title=Changes in keyword scanning]

\topicindex {keywords}

Some primitives accept (optional) keywords and in \LUAMETATEX\ there are more
keywords than in \LUATEX. Scanning can trigger error messages and lookahead side
effects and in \LUAMETATEX\ these can be different. This is no big deal because
errors are still errors.

\stopsubsection

\stopsection
645
\startsection[reference=differences,title={Differences with \LUATEX}]

\startsubsection[title=Dropped primitives]

\LUAMETATEX\ is a leaner and meaner \LUATEX, which means that substantial parts and
dependencies are gone: quite some font code, all backend code with related frontend
code and, of course, image and font inclusion. There is also new functionality, which
makes it less lean, but in the end we still have less, also in terms of dependencies.
Here we discuss what is gone, starting with the primitives that were dropped.

\starttabulate[|l|pl|]
\BC fonts       \NC \type {\letterspacefont}
                    \type {\copyfont}
                    \type {\expandglyphsinfont}
                    \type {\ignoreligaturesinfont}
                    \type {\tagcode}
                    \type {\leftghost}
                    \type {\rightghost}
                \NC \NR
\BC backend     \NC \type {\dviextension}
                    \type {\dvivariable}
                    \type {\dvifeedback}
                    \type {\pdfextension}
                    \type {\pdfvariable}
                    \type {\pdffeedback}
                    \type {\draftmode}
                    \type {\outputmode}
                \NC \NR
\BC dimensions  \NC \type {\pageleftoffset}
                    \type {\pagerightoffset}
                    \type {\pagetopoffset}
                    \type {\pagebottomoffset}
                    \type {\pageheight}
                    \type {\pagewidth}
                \NC \NR
\BC resources   \NC \type {\saveboxresource}
                    \type {\useboxresource}
                    \type {\lastsavedboxresourceindex}
                    \type {\saveimageresource}
                    \type {\useimageresource}
                    \type {\lastsavedimageresourceindex}
                    \type {\lastsavedimageresourcepages}
                \NC \NR
\BC positioning \NC \type {\savepos}
                    \type {\lastxpos}
                    \type {\lastypos}
                \NC \NR
\BC directions  \NC \type {\textdir}
                    \type {\linedir}
                    \type {\mathdir}
                    \type {\pardir}
                    \type {\pagedir}
                    \type {\bodydir}
                    \type {\pagedirection}
                    \type {\bodydirection}
                \NC \NR
\BC randomizer  \NC \type {\randomseed}
                    \type {\setrandomseed}
                    \type {\normaldeviate}
                    \type {\uniformdeviate}
                \NC \NR
\BC utilities   \NC \type {\synctex}
                \NC \NR
\BC extensions  \NC \type {\latelua}
                    \type {\lateluafunction}
                    \type {\openout}
                    \type {\write}
                    \type {\closeout}
                    \type {\openin}
                    \type {\read}
                    \type {\readline}
                    \type {\closein}
                    \type {\ifeof}
                \NC \NR
\BC control     \NC \type {\suppressfontnotfounderror}
                    \type {\suppresslongerror}
                    \type {\suppressprimitiveerror}
                    \type {\suppressmathparerror}
                    \type {\suppressifcsnameerror}
                    \type {\suppressoutererror}
                    \type {\mathoption}
                \NC \NR
\BC system      \NC \type {\primitive}
                    \type {\ifprimitive}
                    \type {\formatname}
                \NC \NR
\BC ignored     \NC \type {\long}
                    \type {\outer}
                    \type {\mag}
                \NC \NR
\stoptabulate

The math machinery has been overhauled stepwise. In the process, detailed control
was added, but later some of it was removed or replaced. The engine now assumes
that \OPENTYPE\ fonts are used, but you do need to set up the engine properly,
something that has to be done with respect to fonts anyway. By enabling
and|/|disabling certain features you can emulate the traditional engine. Font
parameters are no longer taken from the traditional parameters when they are not
set: we just assume properly passed so|-|called math constants, and quite a few
new ones have been added.

The resources and positioning primitives are actually useful, but they can be
defined as macros that (via \LUA) inject nodes into the input that suit the macro
package and backend. The three|-|letter direction primitives are gone and the
numeric variants are now leading. There is no need for page and body related
directions, and they don't work well in \LUATEX\ anyway. We only have two
directions left. Because we can hook in \LUA\ functions that get information about
what is expected (consumer or provider), there are plenty of possibilities for
adding functionality using this scripting language.

The primitive|-|related extensions were neither that useful nor reliable, so they
have been removed. There are some new variants that will be discussed later. The
\prm {outer} and \prm {long} prefixes are gone, as they don't make much sense
nowadays, and their becoming dummies opened the way to something new: control
sequence properties that permit protection against, as well as controlled
overloading of, definitions. I don't think that (\CONTEXT) users will notice that
these prefixes are gone. The definition and parsing related \type {\suppress..}
features are now the default and can't be changed, so the related primitives are
gone.

The \prm {shipout} primitive does not ship out anything but just erases the
content of the box, unless of course that has already happened in another way. A
macro package should implement its own backend and related shipout. Talking of
backends, the extension primitives that relate to backends can be implemented as
part of a backend design using generic whatsits. There is only one type of whatsit
now. In fact, with respect to the extensions we're now closer to original \TEX.

The \type {img} library has been removed, as it's rather bound to the backend. The
\type {slunicode} library is also gone. There are some helpers in the string
library that can be used instead, and one can write additional \LUA\ code if
needed. There is no longer a \type {pdf} backend library, but we have an
up|-|to|-|date \PDF\ parsing library on board.

In the \type {node}, \type {tex} and \type {status} libraries we no longer have
helpers and variables that relate to the backend. The \LUAMETATEX\ engine is in
principle \DVI\ and \PDF\ unaware. There are, as mentioned, only generic whatsit
nodes that can be used for some management related tasks. For instance, you can
use them to implement user nodes. More extensive status information is provided
in the overhauled status library. All libraries have additional functionality, and
the names of functions have been normalized (as far as possible).

The margin kern nodes are gone and we now use regular kern nodes for them. As a
consequence, there are two extra kern subtypes indicating the injected left or
right kern. The glyph field served no real purpose, so there was no reason for a
special kind of node.

The \KPSE\ library is no longer built|-|in, but one can use an external \KPSE\
library, assuming that it is present on the system, because the engine has a
so|-|called optional library interface to it. Because there is no backend, quite
a few file|-|related callbacks could go away. The following file|-|related
callbacks remain (for now):

\starttyping
find_write_file find_format_file open_data_file
\stoptyping

The callbacks related to errors have changed:

\starttyping
intercept_tex_error intercept_lua_error
show_error_message show_warning_message
\stoptyping

There is a hook that gets called when one of the fundamental memory structures
gets reallocated:

\starttyping
trace_memory
\stoptyping

When you use the overload protection mechanisms, a callback can be plugged in to
handle exceptions:

\starttyping
handle_overload
\stoptyping

The (job) management hooks are kept:

\starttyping
process_jobname
start_run stop_run wrapup_run
pre_dump
start_file stop_file
\stoptyping

Because we use a more generic whatsit model, there is a new callback:

\starttyping
show_whatsit
\stoptyping

Because tracing boxes now reports a lot more information, we have a plug|-|in for
adding detail:

\starttyping
get_attribute
\stoptyping

The typesetting callbacks, being the core of extensibility, of course stayed.
This is what we ended up with:

\startalign[flushleft,nothyphenated]
\tt \cldcontext{table.concat(table.sortedkeys(callbacks.list), ", ")}
\stopalign

As in \LUATEX, font loading happens via the following callback. This time it
really needs to be set, because there is no built|-|in font loader.

\starttyping
define_font
\stoptyping

There are all kinds of subtle differences in the implementation. For instance, we
no longer intercept \type {*} and \type {&}, as these were replaced by command
line options in \TEX\ engines long ago. Talking of options, only a few are left.
All input goes via \LUA, even the console. One can program a terminal if needed.

We took our time to reach a stable state in \LUATEX. One of the reasons is that
most things were experimented with in \CONTEXT, which we can adapt to the engine
as we go. It took many years to decide what to keep and how to do things. Of
course there are places where things can be improved, but that will most likely
only happen in \LUAMETATEX. Contrary to what is sometimes suggested, the
\LUATEX|-|\CONTEXT\ \MKIV\ combination (assuming matched versions) has been quite
stable. It would have made no sense otherwise. Most \CONTEXT\ functionality didn't
change much at the user level. Of course there have been issues, as is natural
with everything new and beta, but we have a fast update cycle.

The same is true for \LUAMETATEX\ and \CONTEXT\ \LMTX: it can be used for
production as usual, and in practice \CONTEXT\ users tend to use the beta
releases, which proves this. Of course, if you use low|-|level features that are
experimental, you're on your own. Also, as with \LUATEX, it might take many years
before a long|-|term stable version is defined. The good news is that, now that
the source code has become part of the \CONTEXT\ distribution, there is always a
properly working, more or less long|-|term stable snapshot.

The error reporting subsystem has been redone quite a bit but is still
fundamentally the same. We don't really assume interactive usage, but anyone who
does use it will notice that it is not possible to backtrack or inject something.
Of course, it is no big deal to implement all that in \LUA\ if needed. Dropping
these features removes a system dependency and makes for somewhat cleaner code.
In \CONTEXT\ we quit on an error, simply because one has to fix the source anyway
and runs are fast enough. Logging provides more detail, and new primitives can be
used to prevent clutter in tracing (the more complex a macro package becomes, the
more extreme tracing becomes).

\stopsubsection

\startsubsection[title=New primitives]

There are new primitives as well as some extensions to existing primitive
functionality. These are described in the following chapters, but there might be
hidden treasures in the binary. If you locate them, don't automatically assume
that they will stay: some might be part of experiments! For instance, there are a
few csname|-|related definers, we have integer and dimension constants, the macro
argument parser can be put in tolerant mode, the repertoire of conditionals has
been extended, some internals can be controlled (think of normalization of lines,
hyphenation, etc.), and macros can be protected against user overload. Not all of
this is discussed in detail in this manual, but there are introductions in the
\CONTEXT\ distribution that explain them. Of course, the \TEX\ kernel is still
omnipresent.

\startbuffer[luatex]

The following primitives are available in \LUATEX\ but not in \LUAMETATEX. Some
of these are emulated in \CONTEXT. Some of the primitives that deal with math and
start with a \type {U} have been renamed to names without this prefix.

\stopbuffer

\startbuffer[luametatex]

The following primitives are available in \LUAMETATEX\ only. By now, the
\LUAMETATEX\ code base is so different from that of \LUATEX\ that porting back is
no longer reasonable. The primitives can roughly be divided into those that relate
to programming and those that deal with typesetting. In this manual we don't go
into detail about most of these. More information (and examples) about the first
category can be found in the \quote {primitives} manual that ships with \CONTEXT,
while the second category is spread over (for instance) the \quote {lowlevel}
manuals. After all, it is easier to present usage in a known environment. Because
development of \LUATEX\ and \LUAMETATEX\ is related to \CONTEXT\ development, you
can also expect to find more examples of usage there.

\stopbuffer

\startluacode

-- primitives known to the running engine (LuaMetaTeX)
local luametatex = tex.primitives()
-- primitives known to LuaTeX, cached in a file by an earlier run
local luatex     = table.load("luatex-primitives.lua")

if not luatex then
    -- no cached list yet: let a LuaTeX run save its primitives to a file
    local source = "\\starttext \\ctxlua {table.save(tex.jobname .. '.lua',tex.primitives())} \\stoptext"

    io.savedata("luatex-primitives.tex", source)

    os.execute("context --luatex --once luatex-primitives")

    luatex = table.load("luatex-primitives.lua")
end

if luatex and luametatex then

    local match = string.match

    -- primitives that have an entry in one of the indexes of this manual
    local found = { }

    local function collect(index)
        if index then
            local data = index.entries
            for i=1,#data do
                found[match(data[i].list[1][1],"\\tex%s*{(.-)}") or ""] = true
            end
         -- inspect(found)
        end
    end

    collect(structures.registers.collected and structures.registers.collected.texindex)
    collect(structures.registers.collected and structures.registers.collected.luatexindex)

    luatex     = table.tohash(luatex)
    luametatex = table.tohash(luametatex)

    -- primitives that are only available in LuaTeX
 -- context.page()
    context.getbuffer { "luatex"}
    context.blank()
    context.startcolumns { n = 2 }
        for k, v in table.sortedhash(luatex) do
            if not luametatex[k] then
                if not found[k] then
                    context.dontleavehmode()
                end
                context.type(k)
                context.crlf()
            end
        end
    context.stopcolumns()
    context.blank()

    -- primitives that are only available in LuaMetaTeX, the ones
    -- without an index entry get a [todo] marker
 -- context.page()
    context.getbuffer { "luametatex"}
    context.blank()
    context.startcolumns { n = 2 }
        for k, v in table.sortedhash(luametatex) do
            if not luatex[k] then
                if not found[k] then
                    context.dontleavehmode()
                    context.llap("\\infofont[todo] ")
                end
                context.type(k)
                context.crlf()
            end
        end
    context.stopcolumns()
    context.blank()

end

\stopluacode

When a primitive in the preceding list is preceded by \type {[todo]}, it is more
or less experimental and will be discussed later on, once it has become stable.

\stopsubsection

\startsubsection[title=Changed function names]

In an effort to be more consistent, some function names have also changed: the
\type {_} was removed from names that had one (as those were the minority). It's
easy to provide a back mapping if needed: just alias the functions.

{\em Todo: only mention the \LUATEX\ ones.}

\starttabulate[|l|l|l|l|]
\DB library  \BC old name          \BC new name         \BC comment \NC \NR
\TB
\NC language \NC clear_patterns    \NC clearpatterns    \NC \NR
\NC          \NC clear_hyphenation \NC clearhyphenation \NC \NR
\NC mplib    \NC italcor           \NC italic           \NC \NR
\NC          \NC pen_info          \NC peninfo          \NC \NR
\NC          \NC solve_path        \NC solvepath        \NC \NR
\NC texio    \NC write_nl          \NC writenl          \NC old name stays \NC \NR
\NC node     \NC protect_glyph     \NC protectglyph     \NC \NR
\NC          \NC protect_glyphs    \NC protectglyphs    \NC \NR
\NC          \NC unprotect_glyph   \NC unprotectglyph   \NC \NR
\NC          \NC unprotect_glyphs  \NC unprotectglyphs  \NC \NR
\NC          \NC end_of_math       \NC endofmath        \NC \NR
\NC          \NC mlist_to_hlist    \NC mlisttohlist     \NC \NR
\NC          \NC effective_glue    \NC effectiveglue    \NC \NR
\NC          \NC has_glyph         \NC hasglyph         \NC \NR
\NC          \NC first_glyph       \NC firstglyph       \NC \NR
\NC          \NC has_field         \NC hasfield         \NC \NR
\NC          \NC copy_list         \NC copylist         \NC \NR
\NC          \NC flush_node        \NC flushnode        \NC \NR
\NC          \NC flush_list        \NC flushlist        \NC \NR
\NC          \NC insert_before     \NC insertbefore     \NC \NR
\NC          \NC insert_after      \NC insertafter      \NC \NR
\NC          \NC last_node         \NC lastnode         \NC \NR
\NC          \NC is_zero_glue      \NC iszeroglue       \NC \NR
\NC          \NC make_extensible   \NC makeextensible   \NC \NR
\NC          \NC uses_font         \NC usesfont         \NC \NR
\NC          \NC is_char           \NC ischar           \NC \NR
\NC          \NC is_direct         \NC isdirect         \NC \NR
\NC          \NC is_glyph          \NC isglyph          \NC \NR
\NC          \NC is_node           \NC isnode           \NC \NR
\NC token    \NC scan_keyword      \NC scankeyword      \NC \NR
\NC          \NC scan_keywordcs    \NC scankeywordcs    \NC \NR
\NC          \NC scan_int          \NC scanint          \NC \NR
\NC          \NC scan_real         \NC scanreal         \NC \NR
\NC          \NC scan_float        \NC scanfloat        \NC \NR
\NC          \NC scan_dimen        \NC scandimen        \NC \NR
\NC          \NC scan_glue         \NC scanglue         \NC \NR
\NC          \NC scan_toks         \NC scantoks         \NC \NR
\NC          \NC scan_code         \NC scancode         \NC \NR
\NC          \NC scan_string       \NC scanstring       \NC \NR
\NC          \NC scan_argument     \NC scanargument     \NC \NR
\NC          \NC scan_word         \NC scanword         \NC \NR
\NC          \NC scan_csname       \NC scancsname       \NC \NR
\NC          \NC scan_list         \NC scanlist         \NC \NR
\NC          \NC scan_box          \NC scanbox          \NC \NR
\LL
\stoptabulate

It's all part of trying to make the code base consistent, but it is sometimes a
bit annoying. However, that's why we develop this engine independently of the
\LUATEX\ code base. In any case, it is a change that had been on my todo list for
quite a while, because those inconsistencies annoyed me. It might take some years
to get it all done.
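
As a minimal sketch of such a back mapping, the following \type {\startluacode}
block adds some of the old names from the table as aliases of the new ones. The
helper \type {alias} is hypothetical (it is provided neither by the engine nor by
\CONTEXT). Only a few entries are shown, and the sketch assumes that the
\type {node}, \type {token} and \type {language} libraries are accessible as
global tables at that point:

\starttyping
\startluacode
    -- hypothetical helper: let the old (LuaTeX) name point to the new
    -- (LuaMetaTeX) function, but only when the old name is not yet taken
    local function alias(library, mapping)
        if type(library) ~= "table" then
            return -- library not available: nothing to alias
        end
        for old, new in pairs(mapping) do
            if library[old] == nil and library[new] ~= nil then
                library[old] = library[new]
            end
        end
    end

    alias(node, {
        copy_list     = "copylist",
        flush_list    = "flushlist",
        insert_before = "insertbefore",
        insert_after  = "insertafter",
        is_glyph      = "isglyph",
    })

    alias(token, {
        scan_keyword = "scankeyword",
        scan_int     = "scanint",
        scan_dimen   = "scandimen",
        scan_toks    = "scantoks",
    })

    alias(language, {
        clear_patterns    = "clearpatterns",
        clear_hyphenation = "clearhyphenation",
    })
\stopluacode
\stoptyping

Because existing entries are never overwritten, running this twice, or in an
environment that already provides the old names, does no harm.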

\stopsubsection

\stopsection

\stopchapter

\stopcomponent
