% language=us

\startcomponent mk-reflection

\environment mk-environment

\chapter {The luafication of \TEX\ and \CONTEXT}

% (Previously published in \TUGBOAT, ask Karl for reference.)

\subject {introduction}

Here I will present the current state of \LUATEX\ around beta
stage 2, and discuss the impact so far on \CONTEXT\ \MKIV\
that we use as our testbed. I'm writing this at the end of February
2008 as part of the series of regular updates on \LUATEX. As such,
this report is part of our more or less standard test document
(\type{mk.tex}). More technical details can be found in the reference
manual that comes with \LUATEX. More information on \MKIV\ is
available in the \CONTEXT\ mailing lists, \WIKI, and
\type{mk.pdf}.

For those who have never heard of \LUATEX: this is a new variant of
\TEX\ where several long pending wishes are fulfilled:

\startitemize[packed]
\item combine the best of all \TEX\ engines
\item add scripting capabilities
\item open up the internals to the scripting engine
\item enhance font support to \OPENTYPE
\item move on to \UNICODE
\item integrate \METAPOST
\stopitemize

There are a few more wishes, like converting the code base to
\CCODE, but these are long term goals.

The project started a few years ago and is conducted by Taco
Hoekwater (\PASCAL\ and \CCODE\ coding, code base management,
reference manual), Hartmut Henkel (\PDF\ backend, experimental
features) and Hans Hagen (general overview, \LUA\ and \TEX\
coding, website). The code development got a boost from a grant of
the Oriental \TEX\ project (project lead: Idris Samawi Hamid) and
funding via the \TUG. The related \MPLIB\ project by the same team
is also sponsored by several user groups. The very much needed
\OPENTYPE\ fonts are also a user group funded effort: the Latin
Modern and \TEX\ Gyre projects (project leads: Jerzy Ludwichowski,
Volker RW\ Schaa and Hans Hagen), with development (the real
work) by: Bogus\l{}aw Jackowski and Janusz Nowacki.

One of our leading principles is that we focus on opening up. This
means that we don't implement solutions (which also saves us many
unpleasant and everlasting discussions). Implementing solutions is
up to the user, or more precisely: the macro package writer, and
since there are many solutions possible, each can do it his or her
way. In that sense we follow in the footsteps of Don Knuth: we make
an extensible tool, you are free to like it or not, you can take
it and extend it where needed, and there is no need to bother us
(unless of course you find bugs or weird side effects). So far
this has worked out quite well and we're confident that we can keep
our schedule.

We do our tests on a variant of \CONTEXT\ tagged \MKIV, especially
meant for \LUATEX, but \LUATEX\ itself is in no way limited to or
tuned for \CONTEXT. Large chunks of the code written for \MKIV\
are rather generic and may eventually be packaged as a base system
(especially font handling) so that one can use \LUATEX\ in rather
plain mode. To a large extent \MKIV\ will be functionally compatible
with \MKII, the version meant for traditional \TEX, although it
knows how to profit from \XETEX. Of course the expectation is that
certain things can be done better in \MKIV\ than in \MKII.

\subject{status}

By the end of 2007 the second major beta release of \LUATEX\ was
published. In the first quarter of 2008 Taco would concentrate on
\MPLIB, Hartmut would come up with the first version of the image
library while I could continue working on \MKIV\ and start using
\LUATEX\ in real projects. Of course there is some risk involved
in that, but since we have a rather close loop for critical bug
fixes, and because I know how to avoid some dark corners, the
risk was worth taking.

What did we accomplish so far? I can best describe this in relation
to how \CONTEXT\ \MKIV\ evolved and will evolve. Before we do this,
it makes sense to spend some words on why we started working on \MKIV\
in the first place.

When the \LUATEX\ project started, \CONTEXT\ was about 10 years in
the field. I can safely say that we were still surprised by the
fact that what at first sight seems unsolvable in \TEX\ somehow
could always be dealt with. However, some of the solutions were
rather tricky. The code evolved towards a more or less stable
state, but sometimes depended on controlled processing. Take for
instance backgrounds that can span pages and columns, can be
nested and can have arbitrary shapes. This feature has been
present in \CONTEXT\ for quite a while, but it involves an
interplay between \TEX\ and \METAPOST. It depends on information
collected in a previous run as well as (at runtime or not)
processing of graphics.

This means that by now \CONTEXT\ is not just a bunch of \TEX\ macros,
but also closely related to \METAPOST. It also means that
processing itself is by now rather controlled by a wrapper, in the
case of \MKII\ called \TEXEXEC. It may sound complicated, but the
fact that we have implemented workflows that run unattended for
many years and involve pretty complex layouts and graphic
manipulations demonstrates that in practice it's not as bad as it
may sound.

With the arrival of \LUATEX\ we not only have a rigorously
updated \TEX\ engine, but also get \METAPOST\ integrated. Even
better, the scripting language \LUA\ is not only used for opening
up \TEX, but is also used for all kinds of management tasks. As
a result, the development of \MKIV\ not only concerns rewriting
whole chunks of \CONTEXT, but also results in a set of new
utilities and a rewrite of existing ones. Since dealing with
\MKIV\ will demand some changes in the way users deal with
\CONTEXT\ I will discuss some of them first. It also demonstrates
that \LUATEX\ is more than just \TEX.

\subject{utilities}

There are two main scripts: \LUATOOLS\ and \MTXRUN. The first one
started as a replacement for \KPSEWHICH\ but evolved into a base
tool for generating (\TDS) file databases and generating formats.
In \MKIV\ we replace the regular file searching, and therefore we
use a different database model. That's the easy part. More
tricky is that we need to bootstrap \MKIV\ into this alternative
mode and when doing so we don't want to use the \type {kpse} library
because that would trigger loading of its databases. To discuss
the gory details here might cause users to refrain from using \LUATEX\ so
we stick to a general description.

\startitemize
\item When generating a format, we also generate a bootstrap \LUA\
      file. This file is compiled to bytecode and is put alongside
      the format file. The libraries of this bootstrap file are
      also embedded in the format.
\item When we process a document, we instruct \LUATEX\ to load
      this bootstrap file before loading the format. After the
      format is loaded, we re-initialize the embedded libraries.
      This is needed because at that point more information may be
      available than at loading time. For instance, some
      functionality is available only after the format is loaded
      and \LUATEX\ enters the \TEX\ state.
\item File databases, formats, bootstrap files, and
      runtime|-|generated cached data are kept in a \TDS\ tree specific cache
      directory. For instance, \OPENTYPE\ font tables are stored
      on disk so that next time loading them is faster.
\stopitemize

Starting \LUATEX\ and \MKIV\ is done by \LUATOOLS. This tool
is generic enough to handle other formats as well, like \MPTOPDF\
or \PLAIN. When you run this script without argument, you will
see:

\starttyping
version 1.1.1 - 2006+ - PRAGMA ADE / CONTEXT

--generate        generate file database
--variables       show configuration variables
--expansions      show expanded variables
--configurations  show configuration order
--expand-braces   expand complex variable
--expand-path     expand variable (resolve paths)
--expand-var      expand variable (resolve references)
--show-path       show path expansion of ...
--var-value       report value of variable
--find-file       report file location
--find-path       report path of file
--make or --ini   make luatex format
--run or --fmt=   run luatex format
--luafile=str     lua inifile (default is <progname>.lua)
--lualibs=list    libraries to assemble (optional)
--compile         assemble and compile lua inifile
--verbose         give a bit more info
--minimize        optimize lists for format
--all             show all found files
--sort            sort cached data
--engine=str      target engine
--progname=str    format or backend
--pattern=str     filter variables
--lsr             use lsr and cnf directly
\stoptyping

For the \LUA\ based file searching, \LUATOOLS\ can be seen as a
replacement for \MKTEXLSR\ and \KPSEWHICH\ and as such it also
recognizes some of the \KPSEWHICH\ flags. The script is self
contained in the sense that all needed libraries are embedded. As
a result no library paths need to be set and packaged. Of course
the script has to be run using \LUATEX\ itself. The following
commands generate the file databases, generate a \CONTEXT\ \MKIV\
format, and process a file:

\starttyping
luatools --generate
luatools --make --compile cont-en
luatools --fmt=cont-en somefile.tex
\stoptyping

There is no need to install \LUA\ in order to run this script. This
is because \LUATEX\ can act as such with the advantage that the
built-in libraries are available too, for instance the \LUA\ file
system \type {lfs}, the \ZIP\ file manager \type {zip}, the
\UNICODE\ library \type {unicode}, \type {md5}, and of course some of
our own.

\starttabulate
\NC luatex  \NC a \LUA||enhanced \TEX\ engine \NC \NR
\NC texlua  \NC a \LUA\ engine enhanced with some libraries \NC \NR
\NC texluac \NC a \LUA\ bytecode compiler enhanced with some libraries \NC \NR
\stoptabulate

In principle \type {luatex} can perform all tasks but because we
need to be downward compatible with respect to the command line
and because we want \LUA\ compatible variants, you can copy or
symlink the two extra variants to the main binary.
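
As a minimal sketch of what these built-in libraries offer, the following
fragment lists the regular files in the current directory with their sizes.
It uses only \type {lfs.dir} and \type {lfs.attributes} from the \type {lfs}
library plus \type {texio.write_nl}; the \type {example >} prefix is made up
for this illustration, and the \type {\startluacode} wrapper assumes a \MKIV\
run.

\starttyping
\startluacode
    -- lfs is built into the binary, so no separate Lua installation is needed
    for name in lfs.dir(".") do
        if lfs.attributes(name,"mode") == "file" then
            local size = lfs.attributes(name,"size")
            -- report on the terminal and in the log, not in the typeset result
            texio.write_nl("example > " .. name .. " (" .. size .. " bytes)")
        end
    end
\stopluacode
\stoptyping

The loop itself is plain \LUA, so it can presumably also be stored in a
file and run with \type {texlua}, using \type {print} instead of \type
{texio.write_nl}.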

The second script, \MTXRUN, can be seen as a replacement for the
\RUBY\ script \TEXMFSTART, a utility whose main task is to launch
scripts (or documents or whatever) in a \TDS\ tree. The \MTXRUN\
script makes it possible to get away from installing \RUBY\ and as
a result a regular \TEX\ installation can be made independent of
scripting tools.

\starttyping
version 1.0.2 - 2007+ - PRAGMA ADE / CONTEXT

--script              run an mtx script
--execute             run a script or program
--resolve             resolve prefixed arguments
--ctxlua              run internally (using preloaded libs)
--locate              locate given filename

--autotree            use texmf tree cf. environment settings
--tree=pathtotree     use given texmf tree (def: 'setuptex.tmf')
--environment=name    use given (tmf) environment file
--path=runpath        go to given path before execution
--ifchanged=filename  only execute when given file has changed
--iftouched=old,new   only execute when given file has changed

--make                create stubs for (context related) scripts
--remove              remove stubs (context related) scripts
--stubpath=binpath    paths where stubs wil be written
--windows             create windows (mswin) stubs
--unix                create unix (linux) stubs

--verbose             give a bit more info
--engine=str          target engine
--progname=str        format or backend

--edit                launch editor with found file
--launch (--all)      launch files (assume os support)

--intern              run script using built-in libraries
\stoptyping

This help information gives an impression of what the script does:
running other scripts, either within a certain \TDS\ tree or not,
and either conditionally or not. Users of \CONTEXT\ will probably
recognize most of the flags. As with \TEXMFSTART, arguments with
prefixes like \type{file:} will be resolved before being
passed to the child process.

The first option, \type {--script} is the most important one and
is used like:

\starttyping
mtxrun --script fonts --reload
mtxrun --script fonts --pattern=lm
\stoptyping

In \MKIV\ you can access fonts by filename or by font name, and
because we provide several names per font you can use this command
to see what is possible. Patterns can be \LUA\ expressions, as
demonstrated here:

\starttyping
mtxrun --script fonts --list --pattern=lmtype.*regular

lmtypewriter10-capsregular   LMTypewriter10-CapsRegular   lmtypewriter10-capsregular.otf
lmtypewriter10-regular       LMTypewriter10-Regular       lmtypewriter10-regular.otf
lmtypewriter12-regular       LMTypewriter12-Regular       lmtypewriter12-regular.otf
lmtypewriter8-regular        LMTypewriter8-Regular        lmtypewriter8-regular.otf
lmtypewriter9-regular        LMTypewriter9-Regular        lmtypewriter9-regular.otf
lmtypewritervarwd10-regular  LMTypewriterVarWd10-Regular  lmtypewritervarwd10-regular.otf
\stoptyping

A simple

\starttyping
mtxrun --script fonts
\stoptyping

gives:

\starttyping
version 1.0.2 - 2007+ - PRAGMA ADE / CONTEXT | font tools

--reload              generate new font database
--list                list installed fonts
--save                save open type font in raw table

--pattern=str         filter files
--all                 provide alternatives
\stoptyping

In \MKIV\ font names can be prefixed by \type {file:} or \type
{name:} and when they are resolved, several attempts are made, for
instance non|-|characters are ignored. The \type {--all} flag shows
more variants.

Another example is:

\starttyping
mtxrun --script context --ctx=somesetup somefile.tex
\stoptyping

Again, users of \TEXEXEC\ may recognize part of this and indeed this is
its replacement. Instead of \TEXEXEC\ we use a script named \type
{mtx-context.lua}. Currently we have the following scripts and
more will follow:

The \type {babel} script is made in cooperation with Thomas
Schmitz and can be used to convert babelized Greek files into
proper \UTF. More of such conversions may follow. With \type
{cache} you can inspect the content of the \MKIV\ cache and do
some cleanup. The \type {chars} script is used to construct some
tables that we need in the process of development. As its name
says, \type {check} is a script that does some checks, and in
particular it tries to figure out if \TEX\ files are correct. The
already mentioned \type {context} script is the \MKIV\ replacement
of \TEXEXEC, and takes care of multiple runs, preloading project
specific files, etc. The \type {convert} script will replace the
\RUBY\ script \type {pstopdf}.

A rather important script is the already mentioned \type {fonts}.
Use this one for generating font name databases (which then
permits a more liberal access to fonts) or identifying installed
fonts. The \type {unzip} script indeed unzips archives. The \type
{update} script is still somewhat experimental and is one of the
building blocks of the \CONTEXT\ minimal installer system by
Mojca Miklavec and Arthur Reutenauer. This update script
synchronizes a local tree with a repository and keeps an
installation as small as possible, which for instance means: no
\OPENTYPE\ fonts for \PDFTEX, and no redundant \TYPEONE\ fonts for
\LUATEX\ and \XETEX.

The (for the moment) last two scripts are \type {watch} and \type
{web}. We use them in (either automated or not) remote publishing
workflows. They evolved out of the \EXAMPLE\ framework which is
currently being reimplemented in \LUA.

As you can see, the \LUATEX\ project and its \CONTEXT\ companion
\MKIV\ project not only deal with \TEX\ itself but also
facilitate managing the workflows. And the next list is
just a start.

\starttabulate
\NC context \NC controls processing of files by \MKIV \NC \NR
\NC babel   \NC conversion tools for \LATEX\ files \NC \NR
\NC cache   \NC utilities for managing the cache \NC \NR
\NC chars   \NC utilities used for \MKIV\ development \NC \NR
\NC check   \NC \TEX\ syntax checker \NC \NR
\NC convert \NC helper for some basic graphic conversion \NC \NR
\NC fonts   \NC utilities for managing font databases \NC \NR
\NC update  \NC tool for installing minimal \CONTEXT\ trees \NC \NR
\NC watch   \NC hot folder processing tool \NC \NR
\NC web     \NC utilities related to automated workflows \NC \NR
\stoptabulate

There will be more scripts. These scripts are normally rather small
because they hook into \MTXRUN\ which provides the libraries. Of course
existing tools remain part of the toolkit. Take for instance \CTXTOOLS,
a \RUBY\ script that converts font encoded pattern files to generic
\UTF\ encoded files.

Those who have followed the development of \CONTEXT\ will notice that we moved
from utilities written in \MODULA\ to tools written in \PERL. These were later
replaced by \RUBY\ scripts and eventually most of them will be rewritten in
\LUA.

\subject{macros}

I will not repeat what is said already in the \MKIV\ related
documents, but stick to a summary of what the impact on \CONTEXT\
is and will be. From this you can deduce what the possible influence
on other macro packages can be.

Opening up \TEX\ started with rewriting all \IO\ related activities.
Because we wanted to be able to read from \ZIP\ files, the web and
more, we moved away from the traditional \KPSE\ based file
handling. Instead \MKIV\ uses an extensible variant written in
\LUA. Because we need to be downward compatible, the code is
somewhat messy, but it does the job, and pretty quickly and efficiently
too. Some alternative input media are implemented and many more
can be added. In the beginning I permitted several ways to specify
a resource but recently a more restrictive \URL\ syntax was
imposed. Of course the file locating mechanisms provide the same
control as provided by the file readers in \MKII.

An example of reading from a \ZIP\ file is:

\starttyping
\input zip:///archive.zip?name=blabla.tex
\input zip:///archive.zip?name=/somepath/blabla.tex
\stoptyping

In addition one can register files, like:

\starttyping
\usezipfile[archive.zip]
\usezipfile[tex.zip][texmf-local]
\usezipfile[tex.zip?tree=texmf-local]
\stoptyping

The last two variants register a zip file in the \TDS\ structure
where more specific lookup rules apply. The files in a
registered file are known to the file searching mechanism so one
can give specifications like the following:

\starttyping
\input */blabla.tex
\input */somepath/blabla.tex
\stoptyping

In a similar fashion one can use the \type {http}, \type {ftp} and
other protocols. For this we use independent fetchers that cache
data in the \MKIV\ cache. Of course, in more structured projects,
one will seldom use the \type {\input} command but use a project
structure instead.

Handling of files rather quickly reached a stable state, and we seldom need
to visit the code for fixes. Already after a few years of developing the first
code for \LUATEX\ we reached a state of \quote {Hm, when did I write
this?}. When we have reached a stable state I foresee that much of the
older code will need a cleanup.

Related to reading files is the sometimes messy area of input
regimes (file encoding) and font encoding, which itself relates to
dealing with languages. Since \LUATEX\ is \UTF-8 based, we need to
deal with file encoding issues in the frontend, and this is what
\LUA\ based file handling does. In practice users of \LUATEX\ will
swiftly switch to \UTF\ anyway but we provide regime control for
historic reasons. This time the recoding tables are \LUA\ based
and as a result \MKIV\ has no regime files. In a similar fashion
font encoding is gone: there is still some old code that deals
with default fallback characters, but most of the files are gone.
The same will be true for math encoding. All information is now
stored in a character table which is the central point in many
subsystems now.
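
What a \LUA\ based recoding table can look like is sketched below. This is
not the actual \MKIV\ regime code: the names \type {vector} and \type
{recode} are made up, only a few entries of the cp1252 encoding are given,
and the \type {unicode} library that comes with \LUATEX\ is assumed to be
available.

\starttyping
\startluacode
    local utfchar = unicode.utf8.char -- converts a code point to a UTF-8 string

    local vector = {
        [0x80] = 0x20AC, -- euro sign
        [0x85] = 0x2026, -- horizontal ellipsis
        [0x96] = 0x2013, -- en dash
    }

    local function recode(str)
        -- ASCII bytes are kept, other bytes are looked up in the vector and
        -- otherwise treated as the Latin-1 code point with the same number
        return (str:gsub(".", function(c)
            local b = c:byte()
            if b < 128 then
                return c
            else
                return utfchar(vector[b] or b)
            end
        end))
    end

    -- byte 0xE9 is e-acute in both Latin-1 and cp1252
    texio.write_nl(recode("caf" .. string.char(0xE9)))
\stopluacode
\stoptyping

Because the table is ordinary \LUA\ data, supporting another regime in this
sketch would only mean providing another vector.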

It is interesting to notice that until now users have never asked
for support with regard to input encoding. We can safely assume
that they just switched to \UTF\ and recoded older documents. It
is good to know that \LUATEX\ is mostly \PDFTEX\ but also
incorporates some features of \OMEGA. The main reason for this is
that the Oriental \TEX\ project needed bidirectional typesetting
and there was a preference for this implementation over the one provided by
\ETEX. As a side effect input translation is also present, but
since no one seems to use it, that may as well go away. In \MKIV\
we refrain from input processing as much as possible and focus on
processing the node lists. That way there is no interference
between user data, macro expansion and whatever may lead to the
final data that ends up in the to|-|be|-|typeset stream. As said, users
seem to be happy to use \UTF\ as input, and so there is hardly any need
for manipulations.

Related to processing input is verbatim: a feature that is always
somewhat complicated by the fact that one wants to typeset a
manual about \TEX\ in \TEX\ and therefore needs flexible escapes
from illustrative as well as real \TEX\ code. In \MKIV\ verbatim
as well as all buffering of data is dealt with in \LUA. It took a
while to figure out how \LUATEX\ should deal with the concept of a
line ending, but we got there. Right from the start we made sure
that \LUATEX\ could deal with collections of catcode settings
(those magic states that characters can have). This means that one
has complete control at both the \TEX\ and \LUA\ end over the way
characters are dealt with.
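
A minimal sketch of such control, using the raw \LUATEX\ primitives in a
plain run rather than the \MKIV\ interface: a set of catcodes is stored
under a number and later selected from the \LUA\ end. The table number 200
and the string are arbitrary choices for this illustration, and the \type
{\directlua} syntax shown is an assumption about the \LUATEX\ version in
use.

\starttyping
\begingroup
    \catcode`\_=12        % the underscore becomes an ordinary character
    \savecatcodetable 200 % store the current catcodes (the table outlives the group)
\endgroup

% the string is tokenized under table 200, whatever the current catcodes are
\directlua{tex.sprint(200,"some_file_name")}
\stoptyping

This way \LUA\ code does not have to know, or temporarily change, the
catcodes that happen to be active at the moment it prints something back
to \TEX.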

In \MKIV\ we also have some pretty printing features, but many
languages are still missing. Cleaning up the premature verbatim code
and extending pretty printing is on the agenda for the end of 2008.

Languages also are handled differently. A major change is that
pattern files are no longer preloaded but read in at runtime.
There is still some relation between fonts and languages, no
longer in the encoding but in dealing with \OPENTYPE\ features.
Later we will do a more drastic overhaul (with multiple name
schemes and such). There are a few experimental features, like
spell checking.

Because we have been using \UTF\ encoded hyphenation patterns for
quite some time now, and because \CONTEXT\ ships with its own files,
this transition probably went unnoticed, apart maybe from a faster
format generation and less startup time.

Most of these features started out as an experiment and provided a
convenient way to test the \LUATEX\ extensions. In \MKIV\ we go
quite far in replacing \TEX\ code by \LUA, and how far one goes is
a matter of taste and ambition. An example of a recent replacement
is graphic inclusion. This is one of the oldest mechanisms in
\CONTEXT\ and it has been extended many times, for instance by
plugins that deal with figure databases (selective filtering from
\PDF\ files made for this purpose), efficient runtime conversion,
color conversion, downsampling and product dependent alternatives.

One can question if a properly working mechanism should be
replaced. Not only is there hardly any speed to gain (after all,
not that many graphics are included in documents), a \LUA--\TEX\
mix may even look more complex. However, when an opened-up \TEX\
keeps evolving at the current pace, this last argument becomes
invalid because we can no longer give traditional solutions the
benefit of their age. Also,
because most of the graphic inclusion code deals with locating
files and figuring out the best quality variant, we can benefit
much from \LUA: file handling is more robust, the code looks
cleaner, complex searches are faster, and eventually we can
provide way more clever lookup schemes. So, after all, switching
to \LUA\ here makes sense. A nice side effect is that some of the
mentioned plugins now take a few lines of extra code instead of
many lines of \TEX. At the time of writing this, the beta version
of \MKIV\ has \LUA\ based graphic inclusion.

A disputable area for Luafication is multipass data. Most of that has
525A disputable area for Luafication is multipass data. Most of that has
526already been moved to \LUA\ files instead of \TEX\ files, and the
527rest will follow: only tables of contents still use a \TEX\
528auxiliary file. Because at some point we will reimplement the
529whole section numbering and cross referencing, we postponed that
530till later. The move is disputable because in the end, most data
531ends up in \TEX\ again, which involves some conversion. However, in
532\LUA\ we can store and manipulate information much more easily and so
533we decided to follow that route. As a start, index information is
534now kept in \LUA\ tables, sorted on demand, depending on language
535needs and such. Positional information used to take up much hash
536space which could deplete the memory pool, but now we can have
537millions of tracking points at hardly any cost.
538
539Because it is a quite independent task, we could rewrite the
540\METAPOST\ conversion code in \LUA\ quite early in the
541development. We got smaller and cleaner code, more flexibility, and
542also gained some speed. The code involved in this may change as
543soon as we start experimenting with \MPLIB. Our expectations
544are high because in a bit more modern designs a graphic engine
545cannot be missed. For instance, in educational material,
546backgrounds and special shapes are all over the place, and we're
547talking about many \METAPOST\ runs then. We expect to bring down the
548processing time of such documents considerably, if only because
549the \METAPOST\ runtime will be close to zero (as experiments have
550shown us).
551
552While writing the code involved in the \METAPOST\ conversion a new
feature showed up in \LUA: \type {lpeg}, a parsing library. From
that moment on \type {lpeg} was being used all over the place,
most notably in the code that deals with processing \XML. Right
from the start I had the feeling that \LUA\ could provide a more
convenient way to deal with this input format. Some experiments
with rewriting the \MKII\ mechanisms did not show the expected
speedup and were abandoned quickly.

Challenged by \type {lpeg} I then wrote a parser and started
playing with a mixture of a tree based and stream approach to
\XML\ (\MKII\ is mostly stream based). Not only is loading \XML\
code extremely fast (we used 40~megabyte files for testing),
dealing with the tree is also convenient. The additional \MKIV\
methods are currently being tested in real projects and so far
they result in an acceptable and pleasant mix of \TEX\ and \XML. For
instance, we can now selectively process parts of the tree using
path expressions, hook in code, manipulate data, etc.
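
To give an impression of what working with \type {lpeg} looks like, here
is a minimal sketch that parses one kerning line in the style of an \AFM\
file. It is not the \MKIV\ parser: the variable names and the sample line
are made up for this illustration, and the \type {\startluacode} wrapper
assumes a \MKIV\ run.

\starttyping
\startluacode
    local P, R, C, Ct = lpeg.P, lpeg.R, lpeg.C, lpeg.Ct

    local space  = P(" ")^1                             -- one or more spaces
    local name   = C((1 - P(" "))^1)                    -- glyph name, up to the next space
    local number = C(P("-")^-1 * R("09")^1) / tonumber  -- optionally signed integer
    local kern   = Ct(P("KPX") * space * name * space * name * space * number)

    -- the result is a table with the three captures, or nil when nothing matches
    local t = lpeg.match(kern,"KPX A V -80")
    if t then
        texio.write_nl("kern " .. t[1] .. " " .. t[2] .. " = " .. t[3])
    end
\stopluacode
\stoptyping

Patterns are ordinary \LUA\ values, so small ones like \type {name} and
\type {number} can be reused as building blocks for larger grammars.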

The biggest impact of \LUATEX\ on the \CONTEXT\ code base is not
the previously mentioned mechanisms but one not yet mentioned:
fonts. Contrary to \XETEX, which uses third party libraries,
\LUATEX\ does not implement dealing with font specific issues at
all. It can load several font formats and accepts font data in a
well|-|defined table format. It only processes character nodes into
glyph nodes and it's up to the user to provide more by
manipulating the node lists. Of course there is still basic
ligature building and kerning available but one can bypass that with
other code.

In \MKIV, when we deal with \TYPEONE\ fonts, we try to get away
from traditional \TFM\ files and use \AFM\ files instead (indeed,
we parse them using \type {lpeg}). The fonts are mapped onto
\UNICODE. Awaiting extensions of math we only use \TFM\ files for
math fonts. Of course \OPENTYPE\ fonts are dealt with and this is
where we find most \LUA\ code in \MKIV: implementing features.
Much of that is a grey area but as part of the Oriental \TEX\
project we're forced to deal with complex feature support, so that
provides a good test bed as well as some pressure for getting it
done. Of course there is always the question to what extent we
should follow the (maybe faulty) other programs that deal with
font features. We're lucky that the Latin Modern and \TEX\ Gyre
projects provide real fonts as well as room for discussion and
exploring these grey areas.

In parallel to writing this, I made a tracing feature for Oriental
\TEX er Idris so that he could trace what happened with the Arabic
fonts that he is making. This was relatively easy because already
in an early stage of \MKIV\ some debugging mechanisms were built.
One of its nice features is that on an error, or when one
traces something, the results will be shown in a web browser.
Unfortunately I do not have enough time to explore such aspects in
more detail, but at least it demonstrates that we can change some
aspects of the traditional interaction with \TEX\ in more radical
ways.

Many users may be aware of the existence of so|-|called virtual
fonts, if only because they can be a cause of problems (related to
map files and such). Virtual fonts have a lot of potential but
because they were related to \TEX's own font data format they never got
very popular. In \LUATEX\ we can make virtual fonts at runtime. In
\MKIV\ for instance we have a feature (we provide features beyond
what \OPENTYPE\ does) that completes a font by composing missing
glyphs on the fly. More of this trickery can be expected as soon
as we have time and reason to implement it.

In \PDFTEX\ we have a couple of font related goodies, like
character expansion (inspired by Hermann Zapf) and character
protruding. There are a few more but these had limitations and
were suboptimal and therefore have been removed from \LUATEX.
After all, they can be implemented more robustly in \LUA. The two
mentioned extensions have been (of course) kept and have been partially
reimplemented so that they are now uniquely bound to fonts
(instead of being common to fonts that traditional \TEX\ shares in
memory). The character related tables can be filled with \LUA\ and
this is what \MKIV\ now does. As a result much \TEX\ code could go
away. We still use shape related vectors to set up the values, but
we also use information stored in our main character database.

A likely area of change is math and not only as a result of the
\TEX\ Gyre math project which will result in a bunch of \UNICODE\
compliant math fonts. Currently in \MKIV\ the initialization
already partly takes place using the character database, and so
again we will end up with less \TEX\ code. A side effect of
removing encoding constraints (i.e.\ moving to \UNICODE) is that
things get faster. Later this year math will be opened up.

One of the biggest impacts of opening up is the arrival of
640attributes. In traditional \TEX\ only glyph nodes have an
641attribute, namely the font id. Now all nodes can have attributes,
642many of them. We use them to implement a variety of features that
643already were present in \MKII, but used marks instead: color (of
644course including color spaces and transparency), inter|-|character
645spacing, character case manipulation, language dependent pre and
646post character spacing (for instance after colons in French),
647special font rendering such as outlines, and much more. An
648experimental application is a more advanced glue|/|penalty model
649with look|-|back and look|-|ahead as well as relative weights. This
650is inspired by the one good thing that \XML\ formatting objects
651provide: a spacing and pagebreak model.
652
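The following sketch shows the principle of an attribute driven
feature; it is not how \MKIV\ implements color. The attribute number
100 and the function name \type {showcolor} are arbitrary choices for
this example, and the library calls are those of the \LUATEX\ reference
manual, which may differ in detail from the beta discussed here.

\starttyping
local glyph_id        = node.id("glyph")
local color_attribute = 100 -- hypothetical attribute number

-- Walk over the glyphs of a paragraph before it is broken into lines
-- and look at the attribute value that each of them carries.
local function showcolor(head)
    for n in node.traverse_id(glyph_id, head) do
        local value = node.has_attribute(n, color_attribute)
        if value then
            -- a real handler would inject color directives here;
            -- this sketch only reports what it finds
            texio.write_nl(string.format(
                "glyph U+%04X carries color %d", n.char, value))
        end
    end
    return head
end

callback.register("pre_linebreak_filter", showcolor)
\stoptyping

At the \TEX\ end something like \type {\attribute100=3} inside a group
makes the glyphs typeset in that group carry the value 3. This sketch
only looks at the top level of the list; glyphs inside nested boxes
need a recursive traversal.
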
It does not take much imagination to see that features demanding
processing of node lists come with a price: many of the
callbacks that \LUATEX\ provides are indeed used and as a result
quite some time is spent in \LUA. You can add to that the time
needed for handling font features, which also boils down to
processing node lists. In the second half of 2007 Taco and I spent
much time on benchmarking and by now the interface between \TEX\
and \LUA\ (passing information and manipulating nodes) has been
optimized quite well. Of course there's always a price for
flexibility and \LUATEX\ will never be as fast as \PDFTEX, but
then, \PDFTEX\ does not deal with \OPENTYPE\ and such.

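To get an impression of the time spent in a callback one can wrap it
in a timer. This is only a sketch and not the way we benchmarked; the
names \type {timed}, \type {process} and \type {reportcallbacktime} are
made up for this example.

\starttyping
local total = 0 -- cpu seconds spent in the wrapped callbacks

-- Wrap a node list handler so that the time it takes is accumulated.
local function timed(action)
    return function(head, ...)
        local start  = os.clock()
        local result = action(head, ...)
        total = total + (os.clock() - start)
        return result
    end
end

local function process(head)
    -- node list manipulation would go here
    return head
end

callback.register("pre_linebreak_filter", timed(process))

-- Call this at the end of the run to write the total to the log.
function reportcallbacktime()
    texio.write_nl(string.format(
        "time spent in wrapped callbacks: %0.3f seconds", total))
end
\stoptyping
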
We can safely conclude that the impact of \LUATEX\ on \CONTEXT\ is
huge and that fundamental changes take place in all key
components: files, fonts, languages, graphics, \METAPOST, \XML,
verbatim and color to start with, but more will follow. Of course
there are also less prominent areas where we use \LUA\ based
approaches: handling \URL's, conversions and alternative math
input, to mention a few. Sometime in 2009 we expect to start
working on more fundamental typesetting related issues.

\subject{roadmap}

On the \LUATEX\ website \type {www.luatex.org} you can find a
roadmap. This roadmap is just an indication of what has happened and
what will happen and it will be updated when we feel the need. Here
is a summary.

\startitemize

\head merging engines

Merge some of the \ALEPH\ codebase into \PDFTEX\ (which already has
\ETEX) so that \LUATEX\ in \DVI\ mode behaves like \ALEPH, and in
\PDF\ mode like \PDFTEX. There will be \LUA\ callbacks for file
searching. This stage is mostly finished.

\head \OPENTYPE\ fonts

Provide \PDF\ output for \ALEPH\ bidirectional functionality and add
support for \OPENTYPE\ fonts. Allow \LUA\ scripts to control all
aspects of font loading, font definition and manipulation. Most of
this is finished.

\head tokenizing and node lists

Use \LUA\ callbacks for various internals, provide complete access to
the tokenizer and provide access to node lists at moments that make
sense. This stage is completed.

\head paragraph building

Provide control over various aspects of paragraph building
(hyphenation, kerning, ligature building) and dynamic loading
of hyphenation patterns. Apart from some small details these
objectives are met.

\head \METAPOST\ (\MPLIB)

Incorporate a \METAPOST\ library and investigate options for runtime
font generation and manipulation. This activity is on schedule and
integration will take place before summer 2008.

\head image handling

Image identification and loading in \LUA\ including scaling and
object management. This is nicely on schedule: the first version of the
image library showed up in the 0.22 beta and some more features
are planned.

\head special features

Cleaning up of \HZ\ optimization and protruding and getting rid of
remaining global font properties. This includes some cleanup of
the backend. Most of this stage is finished.

\head page building

Control over page building and access to internals that matter.
Access to inserts. This is on the agenda for late 2008.

\head \TEX\ primitives

Access to and control over most \TEX\ primitives (and related
mechanisms) as well as all registers. Especially box handling
has to be reinvented. This is an ongoing effort.

\head \PDF\ backend

Open up most backend related features, like annotations and
object management. The first code will show up at the end of 2008.

\head math

Open up the math engine parallel to the development of
the \TEX\ Gyre math fonts. Work on this will start during 2008 and
we hope that it will be finished by early 2009.

\head \CWEB

Convert the \TEX\ Pascal source into \CWEB\ and start using \LUA\
as a glue language for components. This will be tested on \MPLIB\
first. This is on the long term agenda, so maybe around 2010 you
will see the first signs.

\stopitemize

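The \LUA\ callbacks for file searching mentioned under the first item
can be illustrated with a small sketch. The callback name
\type {find_read_file} and the function \type {kpse.find_file} are taken
from the \LUATEX\ reference manual; the local directory
\type {./texinputs} and the function name \type {findinput} are
hypothetical and only serve this example.

\starttyping
-- Look in a project specific directory first and fall back on the
-- regular TeX tree when the file is not found there.
local function findinput(id_number, asked_name)
    local candidate = "./texinputs/" .. asked_name -- hypothetical path
    local f = io.open(candidate, "rb")
    if f then
        f:close()
        return candidate
    end
    return kpse.find_file(asked_name)
end

callback.register("find_read_file", findinput)
\stoptyping
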
In addition to the mentioned functionality we have a couple of
ideas that we will implement along the way. The first formal beta
was released at \TUG\ 2007 in San Diego (\USA). The first
formal release will be at \TUG\ 2008 in Cork (Ireland). The
production version will be released at Euro\TEX\ in the
Netherlands (2009).

Eventually \LUATEX\ will be the successor to \PDFTEX\ (informally
we talk of \PDFTEX\ version~2). It can already be used as a
drop|-|in for \ALEPH\ (the stable variant of \OMEGA). It provides a
scripting engine without the need to install a specific scripting
environment. These factors are among the reasons why distributors
have added the binaries to their collections. Norbert Preining
maintains the \LINUX\ packages, Akira Kakuto provides \WINDOWS\
binaries as part of his distribution, Arthur Reutenauer takes care
of \MACOSX\ and Christian Schenk recently added \LUATEX\ to \MIKTEX.
The \LUATEX\ and \MPLIB\ projects are hosted at Supelec by Fabrice
Popineau (one of our technical consultants). And with Karl Berry
being one of our motivating supporters, you can be sure that the
binaries will end up someplace in \TEXLIVE\ this year.

\stopcomponent