% language=us

\usemodule[art-01,abr-02]

\setupbodyfont[11pt]

\starttext

\startchapter[title=\LUATEX\ going stable]

\startsection[title=Introduction]

We're closing in on version 1.0 of \LUATEX. At the time of writing (mid April 2016) we're at version 0.95. Over the last decade we have reported regularly on our progress in user group journals, \CONTEXT\ related documents and the \LUATEX\ manual, and it makes no sense to repeat ourselves. So where do we stand now? I will not go into details about what is available in \LUATEX; for that you can consult the manual. Instead I will stick to the larger picture.

\stopsection

\startsection[title=What is it]

First of all, as the name suggests, \LUATEX\ has the \LUA\ scripting engine on board. We're still at version 5.2. The main reason for not moving to 5.3 is that it has a different implementation of numbers, and we cannot foresee the side effects. We will test this when we move on to \LUATEX\ version 2.0.

The second part of the name indicates that we have some kind of \TEX, and we think we managed to remain largely compatible with the traditional engine. We took most of \ETEX, much of \PDFTEX\ and some of \ALEPH\ (\OMEGA). On top of that we added a few new primitives and extended others.

If you look at the building blocks of \TEX, you can roughly recognize these:

\startitemize
\startitem
    an input parser (tokenizer) that includes macro expansion; how it works is of course well described in the \TEX\ book, and more than three decades of availability have made \TEX's behaviour rather well documented
\stopitem
\startitem
    a list builder that links basic elements like characters (tagged with font information), rules, boxes, glue and kerns together in a doubly linked list of so|-|called nodes (and noads in intermediate math lists)
\stopitem
\startitem
    a language subsystem that is responsible for hyphenating words using so|-|called patterns and exceptions
\stopitem
\startitem
    a font subsystem that provides information about glyph properties and also makes it possible to construct math symbols from snippets; it also makes sure that the backend knows what to embed
\stopitem
\startitem
    a paragraph builder that breaks a long list into lines, and a page builder that splits off chunks that can be wrapped into pages; this is all done within given constraints using a model of rewards and penalties
\stopitem
\startitem
    a first class math renderer that set the standard and inspired modern math font technology
\stopitem
\startitem
    mechanisms for dealing with floating data, marking page related info, wrapping stuff in boxes, adding glue, penalties and special information
\stopitem
\startitem
    a backend that is responsible for wrapping everything typeset in a format that can be printed and viewed
\stopitem
\stopitemize

So far we're still talking about a rather generic variant of \TEX\ with \LUA\ as extension language. Next we zoom in on some details.

\stopsection

\startsection[title=Where it differs]

Discussions about extensions to the engine never seem to reach real agreement about what makes sense and what doesn't. For that reason the decision was made not to extend the engine any more than really needed, and instead to provide hooks for doing so in \LUA.
Time has proven this to be a feasible approach. On the one hand we stay as faithful as possible to the original, and at the same time we can deal with today's and near|-|future demands.

Tokenization still happens as before, but we can also write input parsers ourselves. You can intercept the raw input when it gets read from file, and you can also create scanners that can more or less be plugged into the parser. Both are a compromise between convenience and speed, but they are powerful enough.

At the input end we can now group catcode changes into tables, so that switching between regimes is fast. (Catcodes are properties of characters that control how they are interpreted.) Because the \IO\ subsystem is opened up, you can influence in great detail how data gets read from files. In fact, you have the full power of \LUA\ available when doing so. At the same time you can print back from \LUA\ into the input stream.

All input that makes it into \TEX\ has to be in \UTF8, whether or not it was intercepted and manipulated beforehand. What comes out to the terminal and log is also \UTF8, and internally all code paths work with wide characters. Some memory constraints have been lifted, and character related commands accept large numbers. This comes at a price: the \LUATEX\ engine itself can be several times slower than the 8|-|bit \PDFTEX. In practice, however, performance is mostly determined by the efficiency of the macro package. As a result \LUATEX\ might actually be faster in situations that would stress its ancestors.

Node lists travel through \TEX\ and can be intercepted at many points, so that you can add extra manipulations. You can for instance rely on \TEX\ for hyphenation, ligature building and kerning, but you can also plug in alternatives. For this purpose these stages are clearly separated and less deeply integrated than in traditional \TEX. There are helpers for accessing lists of nodes and individual nodes, and you can box those lists too (this is called packing).
You can adapt, create and destroy node lists at will, as long as you make sure that what you feed back into \TEX\ makes sense.

In order to control (or communicate with) nodes from the \TEX\ end, an attribute mechanism was added that makes it possible to bind properties to nodes when they get added to lists. At the \TEX\ end you set an attribute, which is then assigned to the nodes injected while it is in effect. At the \LUA\ end you can query a node for these attributes and their values.

The language subsystem has been re|-|implemented and behaves mostly the same as in the original \TEX\ program. It has a few extensions and permits runtime loading of patterns. In addition to language support we also have basic script support: directional information is now part of the stream. Where \ALEPH\ wraps this information into extension whatsits, \LUATEX\ has directional nodes as core nodes.

The font subsystem is opened up in such a way that you can pass your own fonts to the core. You can even construct virtual fonts. This open approach makes it possible to support \OPENTYPE\ fonts and whatever format shows up in the future. Of course the backend needs to embed the right data in the result file, but by then the hard work has already been done. This approach fits the ever|-|present wish of users (and package writers) to be able to implement whatever crazy idea they come up with.

The paragraph builder is a somewhat cleaned up variant of the \PDFTEX\ one, combined with directional and boundary support from \ALEPH. The protrusion and expansion mechanisms have been redone so that the front- and backend code is better separated and somewhat more efficient. Because one can intercept the paragraph builder, additional functionality can be injected before it, after it, or at some stages in the process.
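
To make the attribute mechanism and the interception of the paragraph builder a bit more concrete, here is a minimal sketch for plain \LUATEX. It assumes that attribute register 100 is free, which is an arbitrary choice. It hooks into the \type{pre_linebreak_filter} callback, which receives the node list of a paragraph before it is broken into lines. The function counts the glyphs that carry the attribute and writes that count to the log. The counting is purely illustrative. A macro package like \CONTEXT\ manages callbacks itself, so there one would normally use its own hooks instead of registering a callback directly.

\starttyping
% Plain LuaTeX sketch: attribute register 100 is an arbitrary choice.
\attributedef\markattr=100

\directlua{
  local glyph_id = node.id("glyph")
  % called for each paragraph, before it is broken into lines
  callback.register("pre_linebreak_filter", function(head, groupcode)
    local marked = 0
    % visit only the glyph nodes in the list
    for n in node.traverse_id(glyph_id, head) do
      if node.has_attribute(n, 100) == 1 then
        marked = marked + 1
      end
    end
    texio.write_nl("marked glyphs: " .. marked)
    % true: hand the list back to TeX unchanged
    return true
  end)
}

Some text {\markattr=1 with a marked part} and more text.
\bye
\stoptyping

The \TEX\ comments inside \type{\directlua} are removed by \TEX\ before \LUA\ sees the code. \LUA\ comments would not work there, because the line ends become spaces. Returning \type{true} tells \LUATEX\ to keep the list as it is. A filter that changes the list would return the (possibly new) head instead.
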

Of course we have kept the math engine. Because we now need to support \OPENTYPE\ math, alternative code paths have been added to deal with the kind of information that such fonts provide. We also took the opportunity to open up the math machinery a bit. One can now control the rendering of some of the more complex elements and set the spacing between elements. But \TEX\ users are quite traditional and legacy code has to be dealt with, so we had to stop somewhere.

Most of the auxiliary mechanisms mentioned before can be accessed via the node lists; for instance, you can locate inserts and marks in them. The backend related whatsit nodes can be recognized as well. At any time one can query and set \TEX\ registers and intercept boxed material. Of course some knowledge of the inner workings of \TEX\ helps here.

The backend code is separated from the frontend code as much as possible (although there is still some work to do there). As in \PDFTEX, you can of course inject arbitrary \PDF\ code and make feature|-|rich documents. This flexibility keeps \TEX\ current.

\stopsection

\startsection[title=Extras]

Is that all? No. Apart from some minor extensions that might make programming \TEX\ somewhat easier, there are a few more fundamental additions.

Images and reusable content (boxes) are now part of the core instead of being wrapped into backend specific whatsits, although of course the backend has to provide support for them. This is more natural in the frontend (and user interface) and also more consistent in the engine itself. All backend functionality is now collected in three primitives that take arguments: \type{\pdfextension}, \type{\pdfvariable} and \type{\pdffeedback}. This permits a cleaner separation between front- and backend.

Then there is the \METAPOST\ library, a feature that has already been present for many years now. It provides \TEX\ with some graphic capabilities that, given their origin, fit nicely into the whole. The \LUATEX\ and \MPLIB\ projects started at about the same time, and right from the start it was our plan to combine both.
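
The reusable boxes and the backend primitives mentioned above can be sketched as follows. Again this assumes plain \LUATEX\ in \PDF\ output mode. The \PDF\ object content and the boxed text are made up for illustration. The sketch uses \type{\pdfvariable} to set a backend variable, \type{\pdfextension} to write a raw \PDF\ object, and \type{\pdffeedback} to report that object's number. It uses \type{\saveboxresource} and \type{\useboxresource} to store a box once and place it twice.

\starttyping
% Plain LuaTeX sketch, PDF output; all content is illustrative.
\pdfvariable compresslevel=0                       % uncompressed streams
\immediate\pdfextension obj {<< /Note (hello) >>}  % write a raw PDF object
\message{object number: \pdffeedback lastobj}      % ask the backend

\setbox0\hbox{reusable content}
\saveboxresource 0                                 % store box 0 once
\leavevmode
\useboxresource\lastsavedboxresourceindex\quad     % first placement
\useboxresource\lastsavedboxresourceindex          % second placement
\bye
\stoptyping

Because the box is stored once as a resource, the content is embedded only once, no matter how often it is placed.
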

One of the extras is of course \LUA\ itself. It not only lets us interface with the internals of \TEX, but also provides the user with a way to manipulate data. Even if you never use \LUA\ to access internals, you might still find it useful for occasionally doing things that are hard to accomplish using the macro language.

In addition to stock \LUA\ we include several libraries:

\startitemize
\startitem the \LPEG\ library \stopitem
\startitem an image reading library (related to the backend), including read access to \PDF\ files via the poppler library \stopitem
\startitem parsing of \PDF\ content streams \stopitem
\startitem zip compression \stopitem
\startitem access to the file system \stopitem
\startitem the ability to run commands \stopitem
\startitem socket support \stopitem
\stopitemize

Some of these might become external libraries at some point, as we want to keep the core functionality lean and mean. A nice extra is that we provide \LUAJITTEX, a compatible variant that has a faster \LUA\ virtual machine on board.

\stopsection

\startsection[title=Follow up]

To a large extent, the interfaces we have now have evolved into what we had in mind. We started with simple experiments: just \LUA\ plus a bit of access to registers. Then the Oriental \TEX\ project (with Idris Samawi Hamid) made it possible to speed up development, and the conversion to \CCODE\ and the opening up took off. After that we gradually moved forward. That doesn't mean that we're done yet.

The \LUATEX\ 1.0 engine will not change much. We might add a few things, and we will certainly keep working on the code base. Several things took quite some time:

\startitemize
\startitem the move from \PASCAL\ to \CCODE\ \WEB, an impressive job by itself \stopitem
\startitem merging the functionality of engines, quite a challenge when you want to remain compatible \stopitem
\startitem opening up via \LUA, whose possibilities even surprised us \stopitem
\startitem experimenting, for which \CONTEXT\ users paid the price \stopitem
\stopitemize

It also took time because we played with proofs of concept. It helped that we used the engine exclusively for real typesetting related work ourselves. We will continue to clean up and document the source and to improve the manual step by step.

If you followed the development of \CONTEXT, you will have noticed that \MKIV\ relies heavily on the \LUA\ interface, so stability is important. That said, we can adapt relatively easily to future developments, as we did in the past. Moreover, other packages support \LUATEX\ too, which means that we also need to keep the 1.0 engine stable. Our challenge is to provide stability on the one hand, without limiting ourselves too much on the other. We'll keep you posted on what comes next.

\blank

Hans, Hartmut, Luigi, Taco

\stopsection

\stopchapter

\stoptext