mk-mix.tex /size: 35 Kb    last modification: 2023-12-21 09:43
1% language=us
2
3\startcomponent mk-mix
4
5\environment mk-environment
6
7\chapter{The \luaTeX\ Mix}
8
9\subject{introduction}
10
11The idea of embedding \LUA\ into \TEX\ originates in some
12experiments with \LUA\ embedded in the \SCITE\ editor. You can add
13functionality to this editor by loading \LUA\ scripts. This is
14accomplished by a library that gives access to the internals of
15the editing component.
16
17The first integration of \LUA\ in \PDFTEX\ was relatively simple:
18from \TEX\ one could call out to \LUA\ and from \LUA\ one could
19print to \TEX. My first application was converting math encoded in a
20calculator syntax to \TEX. Subsequent experiments dealt with
21\METAPOST. At this point integration meant little more than having a
22scripting language as an addition to the macro language. But even at
23this early stage further possibilities were explored, for instance
24in manipulating the final output (i.e.\ the \PDF\ code). The first
25versions of what by then was already called \LUATEX\ provided
26access to some internals, like counter and dimension registers and
27the dimensions of boxes.
28
29Boosted by the oriental \TeX\ project, the team started exploring
30more fundamental possibilities: hooks in the input|/|output,
31tokenization, fonts and nodelists. This was followed by opening up
32hyphenation, breaking lines into paragraphs and building
33ligatures. At that point we not only had access to some internals
34but also could influence the way \TEX\ operates.
35
36After that, an excursion was made to \MPLIB, which fulfilled a
37long standing wish for a more natural integration of \METAPOST\
38into \TEX. At that point we ended up with mixtures of \TEX, \LUA\
39and \METAPOST\ code.
40
41As of mid 2008 we still need to open up more of \TEX, like page
42building, math, alignments and the backend. Eventually \LUATEX\
43will be nicely split up in components, rewritten in \CCODE, and we may
44even end up with \LUA\ glueing together the components that make
45up the \TEX\ engine. At that point the interoperation between
46\TEX\ and \LUA\ may be richer than it is now.
47
48In the next sections I will discuss some of the ideas behind
49\LUATEX\ and the relationship between \LUA\ and \TEX\ and how it
50presents itself to users. I will not discuss the interface itself,
51which consists of quite some functions (organized in pseudo
52libraries) and the mechanisms used to access and replace internals
53(we call them callbacks).
54
55\subject {tex vs. lua}
56
57\TEX\ is a macro language. Everything boils down to either allowing
58stepwise expansion or explicitly preventing it. There are no real
59control features, like loops; tail recursion is a key concept.
60There are a few accessible data|-|structures like numbers, dimensions,
61glue, token lists and boxes. What happens inside \TEX\ is
62controlled by variables, mostly hidden from view, and optimized
63within the constraints of 30 years ago.
64
65The original idea behind \TEX\ was that an author would write a
66specific collection of macros for each publication, but increasing
67popularity among non-programmers quickly resulted in distributed
68collections of macros, called macro packages. They started small
69but grew and grew and by now have become pretty large. In these
70packages there are macros dealing with fonts, structure, page
71layout, graphic inclusion, etc. There is also code dealing with
72user interfaces, process control, conversion and much of that code
73looks out of place: the lack of control features and string
74manipulation is solved by mimicking other languages, the
75unavailability of a float datatype is compensated by misusing
76dimension registers, and you can find provisions to force or
77inhibit expansion all over the place.
78
79\TEX\ is a powerful typographical programming language but
80lacks some of the handy features of scripting languages. Handy in the
81sense that you will need them when you want to go beyond the
82original purpose of the system. \LUA\ is a powerful scripting
83language, but knows nothing of typesetting. To some extent it
84resembles the language that \TEX\ was written in: \PASCAL. And,
85since \LUA\ is meant for embedding and extending existing systems,
86it makes sense to bring \LUA\ into \TEX. How do they compare?
87Let's give some examples.
88
89About the simplest example of using \LUA\ in \TEX\ is the following:
90
91\starttyping
92\directlua { tex.print(math.sqrt(10)) }
93\stoptyping
94
95This kind of application is probably what most users will want and
96use, if they use \LUA\ at all. However, we can go further than that.
97
98In \TEX\ a loop can be implemented as in the plain format
99(copied with comment):
100
101\starttyping
102\def\loop#1\repeat{\def\body{#1}\iterate}
103\def\iterate{\body\let\next\iterate\else\let\next\relax\fi\next}
104\let\repeat=\fi % this makes \loop...\if...\repeat skippable
105\stoptyping
106
107This is then used as:
108
109\starttyping
110\newcount \mycounter \mycounter=1
111\loop
112    ...
113    \advance\mycounter 1
114    \ifnum\mycounter < 11
115\repeat
116\stoptyping
117
118The definition shows a bit how \TEX\ programming works. Of course
119such definitions can be wrapped in macros, like:
120
121\starttyping
122\forloop{1}{10}{1}{some action}
123\stoptyping
124
125and this is what often happens in more complex macro packages. In
126order to use such control loops without side effects, the macro
127writer needs to take measures that permit for instance nested
128usage and avoid clashes between local variables (counters or
129macros) and user defined ones. Here we use a counter in the
130condition, but in practice expressions will be more complex
131and this is not that trivial to implement.
132
133The original definition of the iterator can be written a bit
134more efficiently:
135
136\starttyping
137\def\iterate{\body \expandafter\iterate \fi}
138\stoptyping
139
140And indeed, in macro packages you will find many such expansion
141control primitives being used, which does not make reading macros
142easier.
143
144Now, don't get me wrong: this does not make \TEX\ less powerful, it's
145just that the language is focused on typesetting and not on
146general purpose programming, and in principle users can do
147without: documents can be preprocessed using another language, and
148document specific styles can be used.
149
150We have to keep in mind that \TEX\ was written in a time when
151resources in terms of memory and \CPU\ cycles were less abundant
152than they are now. The 255 registers per class and the about 3000
153hash slots in original \TEX\ were more than enough for typesetting
154a book, but in huge collections of macros they are not all that much. For
155that reason many macro packages use obscure names to hide their
156private registers from users and instead of allocating new ones
157with meaningful names, existing ones are shared. It is therefore
158not completely fair to compare \TEX\ code with \LUA\ code: in \LUA\
159we have plenty of memory and the only limitations are those
160imposed by modern computers.
161
162In \LUA, a loop looks like this:
163
164\starttyping
165for i=1,10 do
166    ...
167end
168\stoptyping
169
170But while in the \TEX\ example, the content directly ends up in
171the input stream, in \LUA\ we need to do that explicitly, so in
172fact we will have:
173
174\starttyping
175for i=1,10 do
176    tex.print("...")
177end
178\stoptyping
179
180And, in order to execute this code snippet, in \LUATEX\ we will do:
181
182\starttyping
183\directlua 0 {
184    for i=1,10 do
185        tex.print("...")
186    end
187}
188\stoptyping
189
190So, eventually we will end up with more code than just \LUA\ code,
191but still the loop itself looks quite readable and more complex loops
192are possible:
193
194\starttyping
195\directlua 0 {
196    local t, n = { }, 0
197    while true do
198        local r = math.random(1,10)
199        if not t[r] then
200            t[r], n = true, n+1
201            tex.print(r)
202            if n == 10 then break end
203        end
204    end
205}
206\stoptyping
207
208This will typeset the numbers 1 to 10 in randomized order.
209Implementing a random number generator in pure \TEX\ takes quite a bit of
210code, and keeping track of the numbers already seen can be
211done with macros, but neither is very efficient.
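
The rejection loop above keeps drawing until all ten numbers have shown up.
As an alternative, here is a minimal sketch of a Fisher|-|Yates shuffle in
plain \LUA, which needs only one random draw per swap. The names
\type {shuffled} and \type {emit} are made up for this illustration; inside
\type {\directlua} one would call \type {tex.print} instead of the
stand|-|in collector.

\starttyping
-- Return the numbers 1..n in random order (Fisher-Yates shuffle).
local function shuffled(n)
    local t = { }
    for i = 1, n do
        t[i] = i
    end
    for i = n, 2, -1 do
        local j = math.random(1, i) -- pick a partner from the unshuffled part
        t[i], t[j] = t[j], t[i]
    end
    return t
end

-- Stand-in for tex.print, so that the sketch also runs in a stand-alone Lua.
local collected = { }
local function emit(value)
    collected[#collected + 1] = tostring(value)
end

for _, value in ipairs(shuffled(10)) do
    emit(value)
end
\stoptyping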
212
213I already stressed that \TEX\ is a typographical programming
214language and as such some things in \TEX\ are easier than in \LUA,
215given some access to internals:
216
217\starttyping
218\setbox0=\hbox{x} \the\wd0
219\stoptyping
220
221In \LUA\ we can do this as follows:
222
223\starttyping
224\directlua 0 {
225    local n = node.new('glyph')
226    n.font = font.current()
227    n.char = string.byte('x')
228    tex.box[0] = node.hpack(n)
229    tex.print(tex.box[0].width/65536 .. "pt")
230}
231\stoptyping
232
233One pitfall here is that \TEX\ rounds the number differently than
234\LUA. Both implementations can be wrapped in a macro or function, respectively:
235
236\starttyping
237\def\measured#1{\setbox0=\hbox{#1}\the\wd0\relax}
238\stoptyping
239
240Now we get:
241
242\starttyping
243\measured{x}
244\stoptyping
245
246The same macro using \LUA\ looks as follows:
247
248\starttyping
249\directlua 0 {
250    function measure(chr)
251        local n = node.new('glyph')
252        n.font = font.current()
253        n.char = string.byte(chr)
254        tex.box[0] = node.hpack(n)
255        tex.print(tex.box[0].width/65536 .. "pt")
256    end
257}
258\def\measured#1{\directlua0{measure("#1")}}
259\stoptyping
260
261In both cases, special tricks are needed if you want to pass for
262instance a \type {#} to \TEX's variant, or a \type {"} to \LUA. In
263both cases we can use shortcuts like \type {\#} and in the second
264case we can pass strings as long strings using double square
265brackets to \LUA.
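
The rounding pitfall mentioned earlier comes from the fact that \TEX\ prints
a dimension with no more decimals than are needed to recover the scaled
value, while \LUA\ simply formats the result of a floating point division.
The following sketch mimics \TEX's way of printing in plain \LUA. It is
modelled on the printing routine in the \TEX\ program; the name \type
{scaledtopt} is made up and the code has not been checked against every
corner case.

\starttyping
local unity = 65536 -- scaled points per point

-- Format a dimension in scaled points the way \the would show it.
local function scaledtopt(s)
    local result = { }
    if s < 0 then
        result[#result + 1] = "-"
        s = -s
    end
    result[#result + 1] = string.format("%d", math.floor(s / unity))
    result[#result + 1] = "."
    s = 10 * (s % unity) + 5
    local delta = 10
    repeat
        if delta > unity then
            s = s + 32768 - 50000 -- round the last digit
        end
        result[#result + 1] = string.format("%d", math.floor(s / unity))
        s = 10 * (s % unity)
        delta = delta * 10
    until s <= delta
    result[#result + 1] = "pt"
    return table.concat(result)
end
\stoptyping

With this helper \type {scaledtopt(218453)} gives \type {3.33333pt}, while
the plain division \type {218453/65536} in \LUA\ shows more digits.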
266
267This example is somewhat misleading. Imagine that we want to
268pass more than one character. The \TEX\ variant is already suited
269for that, but the function will now look like:
270
271\starttyping
272\directlua 0 {
273    function measure(str)
274        if str == "" then
275            tex.print("0pt")
276        else
277            local head, tail = nil, nil
278            for chr in str:gmatch(".") do
279                local n = node.new('glyph')
280                n.font = font.current()
281                n.char = string.byte(chr)
282                if not head then
283                    head = n
284                else
285                    tail.next = n
286                end
287                tail = n
288            end
289            tex.box[0] = node.hpack(head)
290            tex.print(tex.box[0].width/65536 .. "pt")
291        end
292    end
293}
294\stoptyping
295
296And still it's not okay, since \TEX\ inserts kerns between
297characters (depending on the font) and glue between words, and
298doing that all in \LUA\ takes more code. So, it will be clear that
299although we will use \LUA\ to implement advanced features, \TEX\
300itself still has quite some work to do.
301
302In the following example we show code, but this is not of
303production quality. It just demonstrates a new way of dealing
304with text in \TEX.
305
306Occasionally a design demands that at some place the first
307character of each word should be uppercase, or that the first word
308of a paragraph should be in small caps, or that each first line of a
309paragraph has to be in dark blue. When using traditional \TEX\ the user
310then has to fall back on parsing the data stream, and preferably
311you should then start such a sentence with a command that can pick
312up the text. For accentless languages like English this is quite
313doable but as soon as commands (for instance dealing with accents)
314enter the stream this process becomes quite hairy.
315
316The next code shows how \CONTEXT\ \MKII\ defines the \type {\Word}
317and \type {\Words} macros that capitalize the first characters of
318word(s). The spaces are really important here because they signal
319the end of a word.
320
321\starttyping
322\def\doWord#1%
323  {\bgroup\the\everyuppercase\uppercase{#1}\egroup}
324
325\def\Word#1%
326  {\doWord#1}
327
328\def\doprocesswords#1 #2\od
329  {\doifsomething{#1}{\processword{#1} \doprocesswords#2 \od}}
330
331\def\processwords#1%
332  {\doprocesswords#1 \od\unskip}
333
334\let\processword\relax
335
336\def\Words
337  {\let\processword\Word \processwords}
338\stoptyping
339
340Actually, the code is not that complex. We split off words and feed
341them to a macro that picks up the first token (hopefully a character)
342which is then fed into the \type {\uppercase} primitive. This assumes that
343for each character a corresponding uppercase variant is defined using the
344\type {\uccode} primitive. Exceptions can be dealt with by assigning relevant
345code to the token register \type {\everyuppercase}.
346However, such macros are far from robust. What happens if the text
347is generated and not input as-is? What happens with commands in
348the stream that do something with the following tokens?
349
350A \LUA\ based solution can look as follows:
351
352\starttyping
353\def\Words#1{\directlua 0 {
354    for s in unicode.utf8.gmatch("#1", "([^ ]+)") do
355        tex.sprint(string.upper(s:sub(1,1)) .. s:sub(2) .. " ")
356    end
357}\unskip}
358\stoptyping
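
The string handling in such a macro can be tried out in a stand|-|alone
\LUA\ interpreter first. The sketch below is one way to do it. The function
name is made up, it only deals with unaccented characters (\type
{string.upper} works on bytes) and it leaves the spacing between words
untouched.

\starttyping
local function capitalizewords(str)
    -- %S matches a non-space character: the first one is uppercased,
    -- the remainder of the word is passed through unchanged
    local result = str:gsub("(%S)(%S*)", function(first, rest)
        return first:upper() .. rest
    end)
    return result
end

local sample = capitalizewords("hello big world")
\stoptyping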
359
360But there is no real advantage here, apart from the fact that less code
361is needed. We still operate on the input and therefore we need to look
362to a different kind of solution: operating on the node list.
363
364\starttyping
365function CapitalizeWords(head)
366    local done = false
367    local glyph, kern = node.id("glyph"), node.id("kern")
368    for start in node.traverse_id(glyph,head) do
369        local prev, next = start.prev, start.next
370        if prev and prev.id == kern and prev.subtype == 0 then
371            prev = prev.prev
372        end
373        if next and next.id == kern and next.subtype == 0 then
374            next = next.next
375        end
376        if (not prev or prev.id ~= glyph) and
377                    next and next.id == glyph then
378            done = upper(start)
379        end
380    end
381    return head, done
382end
383\stoptyping
384
385A node list is a forward|-|linked list. With a helper
386function in the \type {node} library we can loop over such lists. Instead
387of traversing we can use a regular while loop, but it is probably less
388efficient in this case. But how to apply this function to the relevant
389part of the input? In \LUATEX\ there are several callbacks that operate
390on the horizontal lists and we can use one of them to plug in this
391function. However, in that case the function is applied to probably
392more text than we want.
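
To get a feeling for the while loop variant without running \LUATEX, the
following sketch mimics a node list with plain \LUA\ tables. Everything in
it is a stand|-|in: the ids, the \type {char} field holding a string and the
helpers \type {tolist}, \type {capitalize} and \type {tostr} are made up and
not part of the \type {node} library. The test is also simpler than the one
above: kerns are ignored and one|-|letter words are uppercased too.

\starttyping
-- Mock node ids; in LuaTeX these would come from node.id.
local GLYPH, GLUE = 1, 2

-- Build a linked list of plain tables from a string (spaces become glue).
local function tolist(str)
    local head, tail
    for chr in str:gmatch(".") do
        local n = { id = (chr == " ") and GLUE or GLYPH, char = chr, prev = tail }
        if tail then
            tail.next = n
        else
            head = n
        end
        tail = n
    end
    return head
end

-- Uppercase every glyph that is not preceded by another glyph.
local function capitalize(head)
    local current = head
    while current do
        if current.id == GLYPH then
            local prev = current.prev
            if not prev or prev.id ~= GLYPH then
                current.char = current.char:upper()
            end
        end
        current = current.next
    end
    return head
end

-- Collect the characters again, just for inspection.
local function tostr(head)
    local t, current = { }, head
    while current do
        t[#t + 1] = current.char
        current = current.next
    end
    return table.concat(t)
end

local result = tostr(capitalize(tolist("hello big world")))
\stoptyping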
393
394The solution for this is to assign attributes to the range of text
395that such a function has to take care of. These attributes (there
396can be many) travel with the nodes. This is also a reason why such
397code normally is not written by end users, but by macro package
398writers: they need to provide the frameworks where you can plug in
399code. In \CONTEXT\ we have several such mechanisms and therefore
400in \MKIV\ this function looks (slightly stripped) as follows:
401
402\starttyping
403function cases.process(namespace,attribute,head)
404    local done, actions = false, cases.actions
405    for start in node.traverse_id(glyph,head) do
406        local attr = has_attribute(start,attribute)
407        if attr and attr > 0 then
408            unset_attribute(start,attribute)
409            local action = actions[attr]
410            if action then
411                local _, ok = action(start)
412                done = done or ok
413            end
414        end
415    end
416    return head, done
417end
418\stoptyping
419
420Here we check attributes (these are set at the \TEX\ end) and we have
421all kinds of actions that can be applied, depending on the value of the
422attribute. Here the function that does the actual uppercasing
423is defined somewhere else. The \type {cases} table provides us a
424namespace; such namespaces need to be coordinated by macro package
425writers.
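
The dispatching idea can be tried in isolation. The next sketch is a
stand|-|alone imitation, not \CONTEXT\ code: nodes are plain tables in an
array instead of a linked list, attributes live in a made|-|up \type
{attributes} field, and the attribute number \type {CASE} as well as the two
actions are invented for the occasion.

\starttyping
-- One action per attribute value; each returns the node and a success flag.
local actions = {
    [1] = function(n)
        n.char = n.char:upper()
        return n, true
    end,
    [2] = function(n)
        n.char = n.char:lower()
        return n, true
    end,
}

local function process(list, attribute)
    local done = false
    for _, n in ipairs(list) do
        local attr = n.attributes[attribute]
        if attr and attr > 0 then
            n.attributes[attribute] = nil -- unset, so the node is handled once
            local action = actions[attr]
            if action then
                local _, ok = action(n)
                done = done or ok
            end
        end
    end
    return list, done
end

local CASE = 1 -- hypothetical attribute number

local list = {
    { char = "a", attributes = { [CASE] = 1 } },
    { char = "B", attributes = { [CASE] = 2 } },
    { char = "c", attributes = { } },
}

local _, done = process(list, CASE)
\stoptyping

Only the first two entries carry the attribute, so only those are changed,
and the returned flag reports that something happened.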
426
427This approach means that the macro code looks completely different; in
428pseudo code we get:
429
430\starttyping
431\def\Words#1{{<setattribute><cases><somevalue>#1}}
432\stoptyping
433
434Or alternatively:
435
436\starttyping
437\def\StartWords{\begingroup<setattribute><cases><somevalue>}
438\def\StopWords {\endgroup}
439\stoptyping
440
441Because starting a paragraph with a group can have unwanted side
442effects (like \type {\everypar} being expanded inside a group) a
443variant is:
444
445\starttyping
446\def\StartWords{<setattribute><cases><somevalue>}
447\def\StopWords {<resetattribute><cases>}
448\stoptyping
449
450So, what happens here is that the user sets an attribute using some high
451level command, and at some point during the transformation of the input into
452node lists, some action takes place. At that point commands, expansion and
453whatever no longer can interfere.
454
455In addition to some infrastructure, macro packages need to carry some
456knowledge, just as with the \type {\uccode} used in \type {\uppercase}.
457The \type {upper} function in the first example looks as follows:
458
459\starttyping
460local function upper(start)
461    local data, char = characters.data, start.char
462    if data[char] then
463        local uc = data[char].uccode
464        if uc and fonts.ids[start.font].characters[uc] then
465            start.char = uc
466            return true
467        end
468    end
469    return false
470end
471\stoptyping
472
473Such code is really macro package dependent: \LUATEX\ only
474provides the means, not the solutions. In \CONTEXT\ we have
475collected information about characters in a \type {data} table
476in the \type {characters} namespace. There we have stored the
477uppercase codes (\type {uccode}). The, again \CONTEXT\ specific,
478\type {fonts} table keeps track of all defined fonts and before
479we change the case, we make sure that this character is present
480in the font. Here \type {id} is the number by which
481\LUATEX\ keeps track of the used fonts. Each glyph node carries
482such a reference.
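
The effect of the font check is easy to see with miniature versions of
these tables. The data below is made up for the illustration (two characters
and one font that lacks the uppercase e acute); only the shape of the tables
follows the description above.

\starttyping
-- Made-up miniature versions of the tables described above.
local characters = {
    data = {
        [0x61] = { uccode = 0x41 }, -- a       -> A
        [0xE9] = { uccode = 0xC9 }, -- e acute -> E acute
    }
}

local fonts = {
    ids = {
        -- font 1 has no E acute (0xC9)
        [1] = { characters = { [0x41] = true, [0x61] = true, [0xE9] = true } },
    }
}

local function upper(start)
    local data = characters.data[start.char]
    local uc = data and data.uccode
    if uc and fonts.ids[start.font].characters[uc] then
        start.char = uc
        return true
    end
    return false
end

local a      = { char = 0x61, font = 1 }
local eacute = { char = 0xE9, font = 1 }

local changed_a      = upper(a)      -- true, a.char is now 0x41
local changed_eacute = upper(eacute) -- false, the glyph is left alone
\stoptyping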
483
484In this example, eventually we end up with more code than in \TEX,
485but the solution is much more robust. Just imagine what would happen
486when in the \TEX\ solution we would have:
487
488\starttyping
489\Words{\framed[offset=3pt]{hello world}}
490\stoptyping
491
492It simply does not work. On the other hand, the \LUA\ code never
493sees \TEX\ commands, it only sees the two words represented by
494glyph nodes and separated by glue.
495
496Of course, there is a danger when we start opening \TEX's core
497features. Currently macro packages know what to expect, they know
498what \TEX\ can and cannot do. Of course macro writers have
499exploited every corner of \TEX, even the dark ones. Where dirty
500tricks in the \TEX book had an educational purpose, those of users
501sometimes have obscene traits. If we just stick to the trickery
502introduced for parsing input, converting this into that, doing
503some calculations, and the like, it will be clear that \LUA\ is more
504than welcome. It may hurt to throw away thousands of lines of
505impressive code and replace it by a few lines of \LUA\ but that's
506the price the user pays for abusing \TEX. Eventually \CONTEXT\ \MKIV\
507will be a decent mix of \LUA\ and \TEX\ code, and hopefully the
508solutions programmed in those languages are as clean as possible.
509
510Of course we can discuss until eternity whether \LUA\ is the best
511choice. Taco, Hartmut and I are pretty confident that it is, and
512in the couple of years that we have been working on \LUATEX\ nothing has proved
513us wrong yet. We can fantasize about concepts, only to find out that
514they are impossible to implement or hard to agree on; we just go
515ahead using trial and error. We can talk over and over how opening up
516should be done, which is what the team does in a nicely
517closed and efficient loop, but at some points decisions have to be
518made. Nothing is perfect, neither is \LUATEX, but most users won't
519notice it as long as it extends \TEX's life and makes usage more
520convenient.
521
522Users of \TEX\ and \METAPOST\ will have noticed that both
523languages have their own grouping (scope) model. In \TEX\ grouping
524is focused on content: by grouping the macro writer (or author)
525can limit the scope to a specific part of the text or let certain
526macros live within their own world.
527
528\starttyping
529.1. \bgroup .2. \egroup .1.
530\stoptyping
531
532Everything done at 2 is local unless explicitly told otherwise.
533This means that users can write (and share) macros with a small
534chance of clashes. In \METAPOST\ grouping is available too, but
535variables explicitly need to be saved.
536
537\starttyping
538.1. begingroup ; save p ; path p ; .2. endgroup .1.
539\stoptyping
540
541After using \METAPOST\ for a while this feels quite natural
542because an enforced local scope demands multiple return values
543which is not part of the macro language. Actually, this is another
544fundamental difference between the languages: \METAPOST\ has (a
545kind of) functions, which \TEX\ lacks. In \METAPOST\ you can write
546
547\starttyping
548draw origin for i=1 upto 10 : .. (i,sin(i)) endfor ;
549\stoptyping
550
551but also:
552
553\starttyping
554draw some(0) for i=1 upto 10 : .. some(i) endfor ;
555\stoptyping
556
557with
558
559\starttyping
560vardef some (expr i) =
561    if i > 4 : i = i - 4 fi ;
562    (i,sin(i))
563enddef ;
564\stoptyping
565
566The condition and assignment in no way interfere with the loop where
567this function is called, as long as some value is returned (a pair in
568this case).
569
570In \TEX\ things work differently. Take this:
571
572\starttyping
573\count0=1
574\message{\advance\count0 by 1 \the\count0}
575\the\count0
576\stoptyping
577
578The terminal will show:
579
580\starttyping
581\advance \count 0 by 1 1
582\stoptyping
583
584At the end the counter still has the value~1. There are quite some
585situations like this, for instance when data like a table of
586contents has to be written to a file. You cannot write macros where
587such calculations are done and hidden and only the result is seen.
588
589The nice thing about the way \LUA\ is presented to the user is that it
590permits the following:
591
592\starttyping
593\count0=1
594\message{\directlua0{tex.count[0] = tex.count[0] + 1}\the\count0}
595\the\count0
596\stoptyping
597
598This will report~2 to the terminal and typeset a 2 in the
599document. Of course this does not solve everything, but it is a
600step forward. Also, compared to \TEX\ and \METAPOST, grouping is
601done differently: there is a \type {local} prefix that makes
602variables (and functions are variables too) local in modules,
603functions, conditions, loops etc. The \LUA\ code in this story
604contains such locals.
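
A small stand|-|alone sketch shows what this means in practice. The names
are made up; the point is that a \type {local} only exists inside the block,
loop or function where it is declared, and that a function can hide its
scratch variables while still returning more than one value.

\starttyping
local total = 1

do
    local total = total + 1 -- a new variable that shadows the outer one
    assert(total == 2)
end

-- the outer variable is untouched once the block ends
assert(total == 1)

-- a function keeps its scratch variables private and can still
-- hand back several results at once
local function minmax(list)
    local low, high = list[1], list[1]
    for i = 2, #list do
        local value = list[i] -- only visible inside the loop body
        if value < low then
            low = value
        end
        if value > high then
            high = value
        end
    end
    return low, high
end

local low, high = minmax { 3, 9, 1, 7 }
\stoptyping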
605
606In practice most users will use a macro package and so, if a user
607sees \TEX, he or she sees a user interface, not the code behind
608it. As such, they will also not encounter the code written in
609\LUA\ that deals with for instance fonts or node list
610manipulations. If a user sees \LUA, it will most probably be in
611processing actual data. Therefore, in the next section I will give an
612example of two ways to deal with \XML: one more suitable for
613traditional \TEX, and one inspired by \LUA. It demonstrates how
614the availability of \LUA\ can result in different solutions for
615the same problem.
616
617\subject {an example: xml}
618
619In \CONTEXT\ \MKII, the version that deals with \PDFTEX\ and \XETEX,
620we use a stream based \XML\ parser, written in \TEX. Each \type {<}
621and \type {&} triggers a macro that then parses the tag and/or entity.
622This method is quite efficient in terms of memory but the associated
623code is not simple because it has to deal with attributes, namespaces
624and nesting.
625
626The user interface is not that complex, but involves quite some
627commands. Take for instance the following \XML\ snippet:
628
629\starttyping
630<document>
631    <section>
632        <title>Whatever</title>
633        <p>some text</p>
634        <p>some more</p>
635    </section>
636</document>
637\stoptyping
638
639When using \CONTEXT\ commands, we can imagine the following definitions:
640
641\starttyping
642\defineXMLenvironment[document]{\starttext} {\stoptext}
643\defineXMLargument   [title]   {\section}
644\defineXMLenvironment[p]       {\ignorespaces}{\par}
645\stoptyping
646
647When attributes have to be dealt with, for instance a reference to
648this section, things quickly start looking more complex. Also,
649users need to know what definitions to use in situations like this:
650
651\starttyping
652<table>
653    <tr><td>first</td><td>...</td> <td>last</td></tr>
654    <tr><td>left</td><td>...</td> <td>right</td></tr>
655</table>
656\stoptyping
657
658Here we cannot be sure if a cell does not contain a nested table,
659which is why we need to define the mapping as follows:
660
661\starttyping
662\defineXMLnested[table]{\bTABLE} {\eTABLE}
663\defineXMLnested[tr]   {\bTR}    {\eTR}
664\defineXMLnested[td]   {\bTD}    {\eTD}
665\stoptyping
666
667The \type {\defineXMLnested} macro is rather messy because it has
668to collect snippets and keep track of the nesting level, but users
669don't see that code, they just need to know when to use what
670macro. Once it works, it keeps working.
671
672Unfortunately mappings from source to style are never that simple
673in real life. We usually need to collect, filter and relocate
674data. Of course this can be done before feeding the source to
675\TEX, but \MKII\ provides a few mechanisms for that too. If for
676instance you want to reverse the order you can do this:
677
678\starttyping
679<article>
680    <title>Whatever</title>
681    <author>Someone</author>
682    <p>some text</p>
683</article>
684\stoptyping
685
686\starttyping
687\defineXMLenvironment[article]
688    {\defineXMLsave[author]}
689    {\blank author: \XMLflush{author}}
690\stoptyping
691
692This will save the content of the \type {author} element and flush
693it when the end tag \type {article} is seen. So, given previous
694definitions, we will get the title, some text and then the author.
695You may argue that instead we should use for instance \XSLT\ but
696even then a mapping is needed from the \XML\ to \TEX, and it's a
697matter of taste where the burden is put.
698
699Because \CONTEXT\ also wants to support standards like
700\MATHML, there are some more mechanisms but these are hidden from
701the user. And although these do a good job in most cases, the code
702associated with the solutions has never been satisfying.
703
704Supporting \XML\ this way is doable, and \CONTEXT\ has used this method
705for many years in fairly complex situations. However, now that we
706have \LUA\ available, it is possible to see if some things can be done
707simpler (or differently).
708
709After some experimenting I decided to write a full blown \XML\
710parser in \LUA, but contrary to the stream based approach, this
711time the whole tree is loaded in memory. Although this uses more
712memory than a streaming solution, in practice the difference is
713not significant because often in \MKII\ we also needed to store
714whole chunks.
715
716Loading \XML\ files in memory is really fast and once it is done we
717can have access to the elements in a way similar to \XPATH. We can
718selectively pipe data to \TEX\ and manipulate content using \TEX\
719or \LUA. In most cases this is faster than the stream|-|based
720method. Interestingly, we can do this without linking to
721existing \XML\ libraries, and as a result we are pretty
722independent.
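
To give an impression of what loading a tree in memory amounts to, here is
a toy parser in plain \LUA. It is in no way the \MKIV\ parser: it only
understands elements and text (no attributes, entities, comments or
processing instructions), it assumes well|-|formed input, and the names
\type {parse} and \type {text} are made up.

\starttyping
-- Turn a string into nested tables: { tag = ..., children = { ... } }.
local function parse(str)
    local root  = { tag = "#root", children = { } }
    local stack = { root }
    local pos   = 1
    while true do
        local first, last, closing, tag, empty =
            str:find("<(/?)([%w:_%-]+)%s*(/?)>", pos)
        if not first then
            break
        end
        local top   = stack[#stack]
        local chunk = str:sub(pos, first - 1)
        if chunk:find("%S") then
            top.children[#top.children + 1] = chunk -- keep non-blank text
        end
        if closing == "/" then
            stack[#stack] = nil -- end tag: back to the parent
        else
            local element = { tag = tag, children = { } }
            top.children[#top.children + 1] = element
            if empty ~= "/" then
                stack[#stack + 1] = element -- start tag: descend
            end
        end
        pos = last + 1
    end
    return root
end

-- Text of the first child with the given tag (text-only content assumed).
local function text(element, tag)
    for _, child in ipairs(element.children) do
        if type(child) == "table" and child.tag == tag then
            return table.concat(child.children)
        end
    end
end

local tree = parse [[
<article>
    <title>Whatever</title>
    <author>Someone</author>
    <p>some text</p>
</article>
]]

local article = tree.children[1]
local title   = text(article, "title")
\stoptyping

Once the tree is there, picking up the title or the author is a matter of
walking a few tables, which is why selective access is cheap compared to
rescanning a stream.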
723
724So how does this look from the perspective of the user? Say that
725we have the simple article definition stored in \type {demo.xml}.
726
727\starttyping
728<?xml version ='1.0'?>
729<article>
730    <title>Whatever</title>
731    <author>Someone</author>
732    <p>some text</p>
733</article>
734\stoptyping
735
736This time we associate so called setups with the elements. Each
737element can have its own setup, and we can use expressions to
738assign them. Here we have just one such setup:
739
740\starttyping
741\startxmlsetups xml:document
742   \xmlsetsetup{main}{article}{xml:article}
743\stopxmlsetups
744\stoptyping
745
746When loading the document it will automatically be associated with the tag \type
747{main}. The previous rule associates setup \type {xml:article}
748with the \type {article} element in tree \type {main}. We need to
749register this setup so that it will be applied to the document
750after loading:
751
752\starttyping
753\xmlregistersetup{xml:document}
754\stoptyping
755
756and the document itself is processed with:
757
758\starttyping
759\xmlprocessfile{main}{demo.xml}{} % optional setup
760\stoptyping
761
762The setup \type {xml:article} can look as follows:
763
764\starttyping
765\startxmlsetups xml:article
766    \section{\xmltext{#1}{/title}}
767    \xmlall{#1}{!(title|author)}
768    \blank author: \xmltext{#1}{/author}
769\stopxmlsetups
770\stoptyping
771
772Here \type {#1} refers to the current node in the \XML\ tree, in
773this case the root element, \type {article}. The second argument
774of \type {\xmltext} and \type {\xmlall} is a path expression,
775comparable with \XPATH: \type {/title} means: the \type {title}
776element anchored to the current root (\type{#1}), and \type
777{!(title|author)} is the negation of (complement to) \type{title}
778or \type {author}. Such expressions can be more complex that the
779one above, like:
780
781\starttyping
782\xmlfirst{#1}{/one/(alpha|beta)/two/text()}
783\stoptyping
784
785which returns the content of the first element that satisfies one of
786the paths (nested tree):
787
788\starttyping
789/one/alpha/two
790/one/beta/two
791\stoptyping
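
How such a path with alternatives can be resolved is sketched below in
plain \LUA. This is an illustration under simplifying assumptions, not the
way \MKIV\ does it: the tree is built by hand from plain tables, the helper
names are made up, only element names and \type {(a|b)} alternatives are
supported, and the \type {text()} step is left out.

\starttyping
-- Helper to write down a tree: node("tag", child, child, ...).
local function node(tag, ...)
    return { tag = tag, children = { ... } }
end

-- Split "/one/(alpha|beta)/two" into a list of sets of accepted names.
local function steps(path)
    local list = { }
    for step in path:gmatch("[^/]+") do
        local set = { }
        for name in step:gmatch("[^%(%)|]+") do
            set[name] = true
        end
        list[#list + 1] = set
    end
    return list
end

-- Depth first search for the first element that satisfies all steps.
local function find(element, list, depth)
    for _, child in ipairs(element.children) do
        if type(child) == "table" and list[depth][child.tag] then
            if depth == #list then
                return child
            end
            local found = find(child, list, depth + 1)
            if found then
                return found
            end
        end
    end
end

local function first(root, path)
    return find(root, steps(path), 1)
end

local root = node("root",
    node("one",
        node("alpha", node("three", "no")),
        node("beta",  node("two",   "yes"))))

local hit = first(root, "/one/(alpha|beta)/two")
\stoptyping

Here the \type {alpha} branch is tried first, fails at the last step, and
the search then continues with \type {beta}.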
792
793There is a whole bunch of commands like \type {\xmltext} that
794filter content and pipe it into \TEX. These are calling \LUA\
795functions. This is no manual, so we will not discuss them here.
796However, it is important to realize that we have to associate
797setups (consider them free formatted macros) to at least one
798element in order to get started. Also, \XML\ inclusions have to be
799dealt with before assigning the setups. These are simple
800one|-|line commands. You can also assign defaults to elements,
801which saves some work.
802
803Because we can use \LUA\ to access the tree and manipulate
804content, we can now implement parts of \XML\ handling in \LUA. An
805example of this is dealing with so|-|called Cals tables. This is
806done in approximately 150 lines of \LUA\ code, loaded at runtime in a
807module. This time the association uses functions instead of setups and those
808functions will pipe data back to \TEX. In the module you will find:
809
810\starttyping
811\startxmlsetups xml:cals:process
812    \xmlsetfunction {\xmldocument} {cals:table} {lxml.cals.table}
813\stopxmlsetups
814
815\xmlregistersetup{xml:cals:process}
816
817\xmlregisterns{cals}{cals}
818\stoptyping
819
820These commands tell \MKIV\ that elements with a namespace
821specification that contains \type {cals} will be remapped to the
822internal namespace \type {cals} and the setup associates a
823function with this internal namespace.
824
825By now it will be clear that from the perspective of the user
826hardly any \LUA\ is visible. Sure, he or she can deduce that deep
827down some magic takes place, especially when you run into more
828complex expressions like this (the \type {@} denotes an
829attribute):

\starttyping
\xmlsetsetup
    {main} {item[@type='mpctext' or @type='mrtext']}
    {questions:multiple:text}
\stoptyping

Such expressions resemble \XPATH, but can go much further than
that, just by adding more functions to the library.

\starttyping
b[position() > 2 and position() < 5 and text() == 'ok']
b[position() > 2 and position() < 5 and text() == upper('ok')]
b[@n=='03' or @n=='08']
b[number(@n)>2 and number(@n)<6]
b[find(text(),'ALSO')]
\stoptyping

Just to give you an idea \unknown\ in the module that implements
the parser you will find definitions that match the function calls
in the above expressions.

\starttyping
xml.functions.find   = string.find
xml.functions.upper  = string.upper
xml.functions.number = tonumber
\stoptyping
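
As a rough sketch of how such a table of functions can serve an
expression, the snippet below translates two of the expressions above by
hand into \LUA\ functions. The local table \type {functions} and the
element layout (a \type {text} field and an \type {at} table with
attributes) are assumptions for this illustration; the real parser
compiles the expressions itself.

\starttyping
-- Hypothetical registry, modelled on the assignments shown above.
local functions = {
    find   = string.find,
    upper  = string.upper,
    number = tonumber,
}

-- Hypothetical element: text content plus a table of attributes.
local element = { text = "this is ALSO ok", at = { n = "03" } }

-- hand translation of: b[find(text(),'ALSO')]
local function has_also(e)
    return functions.find(e.text, "ALSO") ~= nil
end

-- hand translation of: b[number(@n)>2 and number(@n)<6]
local function n_in_range(e)
    local n = functions.number(e.at.n)
    return n ~= nil and n > 2 and n < 6
end

print(has_also(element), n_in_range(element))
\stoptyping

Adding a new function to the expression language then comes down to
adding one more entry to the table.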

So much for the different approaches. It's up to the user which
method to use: stream based \MKII, tree based \MKIV, or a mixture.

The main reason for taking \XML\ as an example of mixing \TEX\ and
\LUA\ is that it can be a bit mind boggling once you start
thinking about what happens behind the scenes. Say that we have

\starttyping
<?xml version ='1.0'?>
<article>
    <title>Whatever</title>
    <author>Someone</author>
    <p>some <b>bold</b> text</p>
</article>
\stoptyping

and that we use the setup shown before with \type {article}.

At some point, we are done with defining setups and load the
document. The first thing that happens is that the list of
manipulations is applied: file inclusions are processed first,
setups and functions are assigned next, maybe some elements are
deleted or added, etc. When that is done we serialize the tree to
\TEX, starting with the root element. When piping data to \TEX\ we
use the current catcode regime; linebreaks and spaces are honored
as usual.

Each element can have a function (command) associated with it, and
when this is the case, control is given to that function. In our case
the root element has such a command, one that will trigger a
setup. And so, instead of piping content to \TEX, a function is
called that lets \TEX\ expand the macro that deals with this
setup.

However, that setup itself calls \LUA\ code that filters the title
and feeds it into the \type {\section} command. Next it flushes
everything except the title and author, which again involves
calling \LUA. Finally it flushes the author. The nested sequence
of events is as follows:

\startitemize[2*broad]

    \sym{lua:} Load the document and apply setups and the like.

    \sym{lua:} Serialize the \type {article} element, but since
    there is an associated setup, tell \TEX\ to expand that one
    instead.

    \startitemize[2*broad]

        \sym{tex:} Execute the setup, first expand the \type {\section}
        macro, but its argument is a call to \LUA.

        \startitemize[2*broad]

            \sym{lua:} Filter \type {title} from the subtree under
            \type {article}, print the content to \TEX\ and return
            control to \TEX.

        \stopitemize

        \sym{tex:} Tell \LUA\ to filter the paragraphs, i.e.\ skip \type
        {title} and \type {author}; since the \type {b} element has
        no associated setup (or whatever) it is just serialized.

        \startitemize[2*broad]

            \sym{lua:} Filter the requested elements and return control
            to \TEX.

        \stopitemize

        \sym{tex:} Ask \LUA\ to filter \type {author}.

        \startitemize[2*broad]
            \sym{lua:} Pipe \type {author}'s content to \TEX.
        \stopitemize

        \sym{tex:} We're done.

    \stopitemize

    \sym{lua:} We're done.

\stopitemize
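
The dispatch described above can be mimicked in a few lines of plain
\LUA. This is only a sketch under simplifying assumptions: everything
happens in \LUA, so the handler for \type {article} plays the role of the
setup that \TEX\ would expand, and piping to \TEX\ is replaced by
collecting strings in a table. The tree layout and all names are made up
for this illustration.

\starttyping
-- Sketch only: none of these names are the MkIV API.
local out = { }       -- stands in for the stream that is piped to TeX
local function pipe(str) out[#out+1] = str end

local handlers = { }  -- element name -> associated function

-- strings are piped, elements with a handler hand over control,
-- other elements just have their content serialized
local function serialize(node)
    if type(node) == "string" then
        pipe(node)
    elseif handlers[node.tag] then
        handlers[node.tag](node)
    else
        for i = 1, #node do serialize(node[i]) end
    end
end

-- serialize the content of the first child element with the given name
local function flush(parent, tag)
    for i = 1, #parent do
        local child = parent[i]
        if type(child) == "table" and child.tag == tag then
            for j = 1, #child do serialize(child[j]) end
            return
        end
    end
end

-- plays the role of the setup: title first, then the rest, then author
handlers.article = function(node)
    pipe("\\section{")
    flush(node, "title")
    pipe("}")
    for i = 1, #node do
        local child = node[i]
        if type(child) == "string"
        or (child.tag ~= "title" and child.tag ~= "author") then
            serialize(child)
        end
    end
    flush(node, "author")
end

local article = { tag = "article",
    { tag = "title",  "Whatever" },
    { tag = "author", "Someone"  },
    { tag = "p", "some ", { tag = "b", "bold" }, " text" },
}

serialize(article)
local result = table.concat(out)
print(result)
\stoptyping

Because \type {b} has no handler it is simply serialized, which matches
the third \type {tex:} step in the list.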

This is a really simple case. In my daily work I am dealing
with rather extensive and complex educational documents where in
one source there is text, math, graphics, all kinds of fancy stuff,
questions and answers in several categories and of different kinds,
which may or may not be reshuffled, omitted or combined. So there
we are talking about many more levels of \TEX\ calling \LUA\ and \LUA\
piping to \TEX\ etc. To stay in \TEX\ speak: we're dealing with
one big ongoing nested expansion (because \LUA\ calls expand), and
you can imagine that this somewhat stresses \TEX's input stack, but
so far I have not encountered any problems.

\subject{some remarks}

Here I discussed several possible applications of \LUA\ in \TEX. I
have not yet mentioned that because \LUATEX\ contains a scripting engine
plus some extra libraries, it can also be used purely for that.
This means that support programs can now be written in \LUA\ and
that there are no longer dependencies on other scripting engines
being present on the system. Consider this a bonus.

Usage in \TEX\ can be organized in four categories:

\startitemize[n]
\item  Users can use \LUA\ for generating data, doing all kinds of
       data manipulation, maybe reading data from file, etc. The
       only link with \TEX\ is the print function.
\item  Users can use information provided by \TEX\ and use this
       when making decisions. An example is collecting data in
       boxes and using \LUA\ to do calculations with the dimensions.
       Another example is a converter from \METAPOST\ output to
       \PDF\ literals. No real knowledge of \TEX's internals is
       needed. The \MKIV\ \XML\ functionality discussed before
       demonstrates this: it's mostly data processing and piping
       to \TEX. Other examples are dealing with buffers, defining
       character mappings, and handling error messages, verbatim
       \unknown\ the list is long.
\item  Users can extend \TEX's core functionality. An example is
       support for \OPENTYPE\ fonts: \LUATEX\ itself does not
       support this format directly, but provides ways to feed
       \TEX\ with the relevant information. Support for \OPENTYPE\
       features demands manipulating node lists. Knowledge of
       internals is a requirement. Advanced spacing and language
       specific features are made possible by node list
       manipulations and attributes. The alternative \type {\Words}
       macro is an example of this.
\item  Users can replace existing \TEX\ functionality. In \MKIV\
       there are numerous examples of this, for instance all file
       \IO\ is written in \LUA, including reading from \ZIP\ files
       and remote locations. Loading and defining fonts is also
       under \LUA\ control. At some point \MKIV\ will provide
       dedicated splitters for multicolumn typesetting and
       probably also better display spacing and display
       math splitting.
\stopitemize
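
The first category, for instance, can be as small as the following
sketch: data is generated in \LUA\ and the only contact with \TEX\ is one
print call. Inside \LUATEX\ the \type {tex.print} function feeds the
string to \TEX; the fallback to the plain \type {print} is only there so
that the snippet also runs in a standalone \LUA\ interpreter. The helper
\type {squares} is made up for this illustration.

\starttyping
-- Use tex.print when running inside LuaTeX, plain print otherwise.
local emit = (tex and tex.print) or print

-- Hypothetical helper: the first n squares as a comma separated string.
local function squares(n)
    local t = { }
    for i = 1, n do
        t[i] = string.format("%d", i * i)
    end
    return table.concat(t, ", ")
end

emit(squares(5)) -- 1, 4, 9, 16, 25
\stoptyping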

The boundaries between these categories are not frozen. For
instance, support for image inclusion and \MPLIB\ in \CONTEXT\
\MKIV\ sits between category 3 and~4. Categories 3 and~4, and
probably also~2, are normally the domain of macro package writers
and more advanced users who contribute to macro packages. Because
a macro package has to provide some stability it is not a good idea
to let users mess around with all those internals, because of
potential interference. On the other hand, normally users operate
on top of a kernel using some kind of \API\ and history has
proved that macro packages are stable enough for this.

Sometime around 2010 the team expects \LUATEX\ to be feature
complete and stable. By that time I can probably provide a more
detailed categorization.

\stopcomponent