% lowlevel-expansion.tex, size: 35 Kb, last modification: 2021-10-28 13:50
1% language=us runpath=texruns:manuals/lowlevel
2
3% This is work in progress and after an initial draft got extended because of the
4% 2021 meeting. It will hopefully improve over time.
5
6\usemodule[system-tokens]
7
8\environment lowlevel-style
9
10\startdocument
11  [title=expansion,
12   color=middleyellow]
13
14\startsectionlevel[title=Preamble]
15
16This short manual demonstrates a couple of properties of the macro language. It
17is not an in|-|depth philosophical expose about macro languages, tokens,
18expansion and such that some \TEX ies like. I prefer to stick to the practical
19aspects. Occasionally it will be technical but you can just skip those paragraphs
20(or later return to them) when you can't follow the explanation. It's often not
21that relevant. I won't talk in terms of mouth, stomach and gut the way the \TEX
22book does and although there is no way to avoid the word \quote {token} I will do
23my best to not complicate matters by too much token speak. Examples show best
24what we mean.
25
26\stopsectionlevel
27
28\startsectionlevel[title={\TEX\ primitives}]
29
The \TEX\ language provides quite a few commands and the built-in ones are called
primitives. User defined commands are called macros. A macro is a shortcut to a
list of primitives and|/|or macro calls. All can be mixed with characters that
are to be typeset somehow.
34
35\starttyping[option=TEX]
36\def\MyMacro{b}
37
38a\MyMacro c
39\stoptyping
40
When \TEX\ reads this input the \type {a} gets turned into a glyph node with a
reference to the currently set font and the character \type {a}. Then the parser
sees a macro call, and it will enter another input level where it expands this
macro. In this case it sees just a \type {b} and it will give this the same
treatment as the \type {a}. The macro ends, the input level decrements and the
\type {c} gets its treatment.
47
Before we move on to more examples and differences between engines, it is good to
stress that \type {\MyMacro} is not a primitive command: we made our own command
here. The \type {b} actually can be seen as a sort of primitive because in this
macro it gets stored as a so-called token with a primitive property. That primitive
property can later on be used to determine what to do. More explicit examples of
primitives are \type {\hbox}, \type {\advance} and \type {\relax}. It will be
clear that \CONTEXT\ extends the repertoire of primitive commands with a lot of
macro commands. When we typeset a source using module \type {m-scite} the
primitives come out dark blue.
57
The number of primitives differs per engine. It all starts with \TEX\ as written
by Don Knuth. Later \ETEX\ added some more primitives and these became official
extensions adopted by other variants of \TEX. The \PDFTEX\ engine added quite
a few and as a follow up on that \LUATEX\ added more but didn't take over all of
\PDFTEX. A few new primitives came from \OMEGA\ (\ALEPH). The \LUAMETATEX\ engine
drops a set of primitives that comes with \LUATEX\ and adds plenty of new ones. The
nature of this engine (no backend and less frontend) means that we need to implement
some primitives as macros. But the basic set is what good old \TEX\ comes with.
66
67Internally these so called primitives are grouped in categories that relate to
68their nature. They can be directly expanded (a way of saying that they get
69immediately interpreted) or delayed (maybe stored for later usage). They can
70involve definitions, calculations, setting properties and values or they can
71result in some typesetting. This is what makes \TEX\ confusing to new users: it
72is a macro programming language, an interpreter but at the same time an executor
73of typesetting instructions.
74
A group of primitives is internally identified as a command (they have a \type
{cmd} code) and the sub commands are flagged by their \type {chr} code. This
sounds confusing but just think of the fact that most of what we input are
characters and therefore they make up most sub commands. For instance the \quote
{letter \type {cmd}} is used for characters that are seen as letters that can be
used in the name of user commands, can be typeset, are valid for hyphenation
etc.\ The letter related \type {cmd} can have many \type {chr} codes (all of
\UNICODE). I'd like to remark that the grouping is to a large extent functional,
so sometimes primitives that you expect to be similar in nature are in different
groups. This has to do with the fact that \TEX\ needs to be able to determine
efficiently if a primitive is operating (or forbidden) in horizontal, vertical
and|/|or math mode.
87
There are more than 150 internal \type {cmd} groups. If we forget about the
mentioned character related ones, some have only a few sub commands (\type
{chr}) and others many more (just consider all the \OPENTYPE\ math spacing
related parameters). A handful of these commands deal with what we call macros:
user defined combinations of primitives and other macros, consider them little
programs. The \type {\MyMacro} above is an example. There are differences
between engines. In standard \TEX\ there are the \type {\outer} and \type {\long}
prefixes, and most engines have these. However, in \LUAMETATEX\ the later to be
discussed \type {\protected} macros have their own specific \quote {call \type
{cmd}}. Users don't need to bother about this.
98
So, when from now on we talk about primitives, we mean the built in, hard coded
commands, and when we talk about macros we mean user commands. Although
internally there are fewer \type {cmd} categories than primitives, from the
perspective of the user they are all unique. Users won't consult the source
anyway but when they do they are warned. Also, when in \LUAMETATEX\ you use the
low level interfacing to \TEX\ you have to figure out these subtle aspects
because there this grouping does matter.
106
Before we continue I want to make clear that expansion (as discussed in this
document) can refer to a macro being expanded (read: its meaning gets injected
into the input, so the engine kind of sidetracks from what it was doing) but also
to direct consequences of running into a primitive. However, users only need to
consider expansion in the perspective of macros. If a user has \type {\advance}
in the input it immediately gets done. But when it's part of a macro definition
it only is executed when the macro expands. A good check in (traditional) \TEX\
is to compare what happens in \type {\def} and \type {\edef} which is why we will
use these two in the upcoming examples. You put something in a macro and then
check what \type {\meaning} or \type {\show} reports.
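
A minimal sketch of such a check, where the macro names are just made up for this
illustration: we define the same body with \type {\def} and \type {\edef}, change
the embedded macro afterwards, and then look at the meanings.

\starttyping[option=TEX]
\def \MyValue{old}
\def \MyDef  {\MyValue} % stores a reference to \MyValue
\edef\MyEdef {\MyValue} % stores the expansion: old
\def \MyValue{new}

\meaning\MyDef  % still reports the reference to \MyValue
\meaning\MyEdef % reports the frozen text: old
\stoptyping

When typeset, \type {\MyDef} gives \quote {new} while \type {\MyEdef} gives
\quote {old}.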
117
118Now back to user defined macros. A macro can contain references to macros so in
119practice the input can go several levels up and some applications push back a lot
120so this is why your \TEX\ input stack can be configured to be huge.
121
122\starttyping[option=TEX]
123\def\MyMacroA{ and }
124\def\MyMacroB{1\MyMacroA 2}
125
126a\MyMacroA b
127\stoptyping
128
When \type {\MyMacroB} is defined, its body gets three so called tokens: the
character token \type {1} with property \quote {other}, a token that is a
reference to the macro \type {\MyMacroA}, and a character token \type {2}, also
with property \quote {other}. The meaning of \type {\MyMacroA} is five tokens:
a space token, then three character tokens with property \quote {letter}, and
finally a space token.
135
136\starttyping[option=TEX]
137\def \MyMacroA{ and }
138\edef\MyMacroB{1\MyMacroA 2}
139
140a\MyMacroA b
141\stoptyping
142
143In the second definition an \type {\edef} is used, where the \type {e} indicates
144expansion. This time the meaning gets expanded immediately. So we get effectively the same
145as in:
146
147\starttyping[option=TEX]
148\def\MyMacroB{1 and 2}
149\stoptyping
150
151Characters are easy: they just expand to themselves or trigger adding a glyph
152node, but not all primitives expand to their meaning or effect.
153
154\startbuffer
155\def\MyMacroA{\scratchcounter = 1 }
156\def\MyMacroB{\advance\scratchcounter by 1}
157\def\MyMacroC{\the\scratchcounter}
158
159\MyMacroA a
160\MyMacroB b
161\MyMacroB c
162\MyMacroB d
163\MyMacroC
164\stopbuffer
165
166\typebuffer[option=TEX]
167
168\scratchcounter0 \getbuffer
169
170\startlines \tt
171\meaning\MyMacroA
172\meaning\MyMacroB
173\meaning\MyMacroC
174\stoplines
175
Let's assume that \type {\scratchcounter} is zero to start with and use \type
{\edef} instead:
178
179\startbuffer
180\edef\MyMacroA{\scratchcounter = 1 }
181\edef\MyMacroB{\advance\scratchcounter by 1}
182\edef\MyMacroC{\the\scratchcounter}
183
184\MyMacroA a
185\MyMacroB b
186\MyMacroB c
187\MyMacroB d
188\MyMacroC
189\stopbuffer
190
191\typebuffer[option=TEX]
192
193\scratchcounter0 \getbuffer
194
195\startlines \tt
196\meaning\MyMacroA
197\meaning\MyMacroB
198\meaning\MyMacroC
199\stoplines
200
201So, this time the third macro has its meaning frozen, but we can
202prevent this by applying a \type {\noexpand} when we do this:
203
204\startbuffer
205\edef\MyMacroA{\scratchcounter = 1 }
206\edef\MyMacroB{\advance\scratchcounter by 1}
207\edef\MyMacroC{\noexpand\the\scratchcounter}
208
209\MyMacroA a
210\MyMacroB b
211\MyMacroB c
212\MyMacroB d
213\MyMacroC
214\stopbuffer
215
216\typebuffer[option=TEX]
217
218\scratchcounter0 \getbuffer
219
220\startlines \tt
221\meaning\MyMacroA
222\meaning\MyMacroB
223\meaning\MyMacroC
224\stoplines
225
226Of course this is a rather useless example but it serves its purpose: you'd better
227be aware what gets expanded immediately in an \type {\edef}. In most cases you
228only need to worry about \type {\the} and embedded macros (and then of course
229their meanings).
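
As a small sketch (the names are hypothetical), this is how \type {\noexpand}
keeps an embedded macro as a reference while the rest of the body is expanded:

\starttyping[option=TEX]
\def \MyWho{world}
\edef\MyTextA{Hello \MyWho}          % body becomes: Hello world
\edef\MyTextB{Hello \noexpand\MyWho} % body becomes: Hello \MyWho

\def \MyWho{moon}
\MyTextA % typesets: Hello world
\MyTextB % typesets: Hello moon
\stoptyping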
230
231\def\MyShow{\quotation {\strut \inlinebuffer \expandafter \typ \expandafter
232{\the\scratchtoks}\strut}}
233
234You can also store tokens in a so-called token register. Here we use a predefined
235scratch register:
236
237\startbuffer
238\def\MyMacroA{ and }
239\def\MyMacroB{1\MyMacroA 2}
240\scratchtoks {\MyMacroA}
241\stopbuffer
242
243\typebuffer[option=TEX]
244
245The content of \type {\scratchtoks} is: \MyShow, so no expansion has happened
246here.
247
248\startbuffer
249\def\MyMacroA{ and }
250\def\MyMacroB{1\MyMacroA 2}
251\scratchtoks \expandafter {\MyMacroA}
252\stopbuffer
253
254\typebuffer[option=TEX]
255
256Now the content of \type {\scratchtoks} is: \MyShow, so this time expansion has
257happened.
258
259\startbuffer
260\def\MyMacroA{ and }
261\def\MyMacroB{1\MyMacroA 2}
262\scratchtoks \expandafter {\MyMacroB}
263\stopbuffer
264
265\typebuffer[option=TEX]
266
267Indeed the macro gets expanded but only one level: \MyShow. Compare this with:
268
269\startbuffer
270\def\MyMacroA{ and }
271\edef\MyMacroB{1\MyMacroA 2}
272\scratchtoks \expandafter {\MyMacroB}
273\stopbuffer
274
275\typebuffer[option=TEX]
276
277The trick is to expand in two steps with an intermediate \type {\edef}: \MyShow. Later we will see that other
278engines provide some more expansion tricks. The only way to get some grip on
279expansion is to just play with it.
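
The two step approach can also be written down with a separate intermediate
macro. This is only a sketch and \type {\MyTemp} is a name invented for the
occasion:

\starttyping[option=TEX]
\def \MyMacroA{ and }
\def \MyMacroB{1\MyMacroA 2}
\edef\MyTemp  {\MyMacroB}           % step one: expand all the way to: 1 and 2
\scratchtoks \expandafter {\MyTemp} % step two: unpack that result into the register
\stoptyping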
280
The \type {\expandafter} primitive expands the token (which can be a macro) standing
after the next one and then injects its meaning into the stream. So:
283
284\starttyping[option=TEX]
285\expandafter \MyMacroA \MyMacroB
286\stoptyping
287
first expands \type {\MyMacroB} and only then looks at \type {\MyMacroA}. In a
normal document you will never need this kind of hackery: it only happens in a bit
more complex macros. Here is an example:
290
291\startbuffer[a]
292\scratchcounter 1
293\bgroup
294\advance\scratchcounter 1
295\egroup
296\the\scratchcounter
297\stopbuffer
298
299\typebuffer[a][option=TEX]
300
301\startbuffer[b]
302\scratchcounter 1
303\bgroup
304\advance\scratchcounter 1
305\expandafter
306\egroup
307\the\scratchcounter
308\stopbuffer
309
310\typebuffer[b][option=TEX]
311
312The first one gives \inlinebuffer[a], while the second gives \inlinebuffer[b].
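
A pattern that you will run into in macro packages, given here as an
illustration with a made up name, is defining a command whose name is
constructed with \type {\csname}:

\starttyping[option=TEX]
% \expandafter jumps over \def and first lets \csname ... \endcsname
% construct the control sequence, after which \def defines it
\expandafter\def\csname My:Name\endcsname{some value}

\csname My:Name\endcsname % typesets: some value
\stoptyping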
313
314% \let
315% \futurelet
316% \afterassignment
317% \aftergroup
318
319\stopsectionlevel
320
321\startsectionlevel[title={\ETEX\ primitives}]
322
In this engine a couple of extensions were added and later on \PDFTEX\ added some
more. We only discuss a few that relate to expansion. There is however a pitfall
here. Before \ETEX\ showed up, \CONTEXT\ already had a few mechanisms that also
related to expansion and it used some names for macros that clash with those in
\ETEX. This is why we will use the \type {\normal} prefix here to indicate the
primitive. \footnote {In the meantime we no longer have a low level \type
{\protected} macro so one can use the primitive.}
330
331\startbuffer
332\def\MyMacroA{a}
333\def\MyMacroB{b}
334\normalprotected\def\MyMacroC{c}
335\edef\MyMacroABC{\MyMacroA\MyMacroB\MyMacroC}
336\stopbuffer
337
338\typebuffer[option=TEX] \getbuffer
339
340These macros have the following meanings:
341
342\startlines \tt
343\meaning\MyMacroA
344\meaning\MyMacroB
345\meaning\MyMacroC
346\meaning\MyMacroABC
347\stoplines
348
349In \CONTEXT\ you will use the \type {\unexpanded} prefix instead, because that one
350did something similar in older versions of \CONTEXT. As we were early adopters of
351\ETEX, this later became a synonym to the \ETEX\ primitive.
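
To illustrate why protection matters, consider this sketch (the names are
hypothetical) where a macro that switches fonts ends up in an \type {\edef}:

\starttyping[option=TEX]
\normalprotected\def\MyBold{\bf}

\edef\MyTitle{Some \MyBold title} % \MyBold is kept as a reference
\stoptyping

Without the prefix the \type {\edef} would replace \type {\MyBold} by its body
and try to expand that too, which is normally not what you want at definition
time.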
352
353\startbuffer
354\def\MyMacroA{a}
355\def\MyMacroB{b}
356\normalprotected\def\MyMacroC{c}
357\normalexpanded{\scratchtoks{\MyMacroA\MyMacroB\MyMacroC}}
358\stopbuffer
359
360\typebuffer[option=TEX] \getbuffer
361
Here the wrapper around the token register assignment will expand the three
macros, unless they are protected, so its content becomes \MyShow. This saves
either a lot of more complex \type {\expandafter} usage or the need to use an
intermediate \type {\edef}. In \CONTEXT\ the \type {\expanded} macro does something
similar but it doesn't expand the first token as this is meant as a wrapper around
a command, like:
368
369\starttyping[option=TEX]
370\expanded{\chapter{....}} % a ConTeXt command
371\stoptyping
372
where we do want to expand the title but not the \type {\chapter} command (not
that this would actually happen, because \type {\chapter} is a protected command).
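
A sketch of the same idea with the primitive. Here \type {\MyCommand} stands for
some hypothetical user command that should not be expanded itself, so we have to
say that explicitly:

\starttyping[option=TEX]
\def\MyTitle{Expansion}

\normalexpanded{\noexpand\MyCommand{\MyTitle}}
% after expansion the input effectively is: \MyCommand{Expansion}
\stoptyping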
375
376The counterpart of \type {\normalexpanded} is \type {\normalunexpanded}, as in:
377
378\startbuffer
379\def\MyMacroA{a}
380\def\MyMacroB{b}
381\normalprotected\def\MyMacroC{c}
382\normalexpanded {\scratchtoks
383    {\MyMacroA\normalunexpanded {\MyMacroB}\MyMacroC}}
384\stopbuffer
385
386\typebuffer[option=TEX] \getbuffer
387
388The register now holds \MyShow: three tokens, one character token and two
389macro references.
390
391Tokens can represent characters, primitives, macros or be special entities like
392starting math mode, beginning a group, assigning a dimension to a register, etc.
393Although you can never really get back to the original input, you can come pretty
394close, with:
395
396\startbuffer
397\detokenize{this can $ be anything \bgroup}
398\stopbuffer
399
400\typebuffer[option=TEX]
401
402This (when typeset monospaced) is: {\tt \inlinebuffer}. The detokenizer is like
403\type {\string} applied to each token in its argument. Compare this to:
404
405\startbuffer
406\normalexpanded {
407    \normaldetokenize{10pt}
408}
409\stopbuffer
410
411\typebuffer[option=TEX]
412
413We get four tokens: {\tt\inlinebuffer}.
414
415\startbuffer
416\normalexpanded {
417    \string 1\string 0\string p\string t
418}
419\stopbuffer
420
421\typebuffer[option=TEX]
422
423So that was the same operation: {\tt\inlinebuffer}, but in both cases there is a
424subtle thing going on: characters have a catcode which distinguishes them. The
425parser needs to know what makes up a command name and normally that's only
426letters. The next snippet shows these catcodes:
427
428\startbuffer
429\normalexpanded {
430    \noexpand\the\catcode`\string 1 \noexpand\enspace
431    \noexpand\the\catcode`\string 0 \noexpand\enspace
432    \noexpand\the\catcode`\string p \noexpand\enspace
433    \noexpand\the\catcode`\string t \noexpand
434}
435\stopbuffer
436
437\typebuffer[option=TEX]
438
439The result is \quotation {\tt\inlinebuffer}: two characters are marked as \quote
440{letter} and two fall in the \quote {other} category.
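
Coming back to \type {\detokenize}: one reason why the result only comes close to
the original input is that a space gets added after each control word. A small
illustration:

\starttyping[option=TEX]
\detokenize{\hbox{x}\relax} % serializes as: \hbox {x}\relax (and a trailing space)
\stoptyping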
441
442\stopsectionlevel
443
444\startsectionlevel[title={\LUATEX\ primitives}]
445
This engine adds a little to the expansion repertoire. First of all it offers a
way to extend token list registers:
448
449\startbuffer
450\def\MyMacroA{a}
451\def\MyMacroB{b}
452\normalprotected\def\MyMacroC{b}
453\scratchtoks{\MyMacroA\MyMacroB}
454\stopbuffer
455
456\typebuffer[option=TEX] \getbuffer
457
458The result is: \MyShow.
459
460\startbuffer
461\toksapp\scratchtoks{\MyMacroA\MyMacroB}
462\stopbuffer
463
464\typebuffer[option=TEX] \getbuffer
465
466We're now at: \MyShow.
467
468\startbuffer
469\etoksapp\scratchtoks{\MyMacroA\space\MyMacroB\space\MyMacroC}
470\stopbuffer
471
472\typebuffer[option=TEX] \getbuffer
473
The register has this content: \MyShow, so the additional content got expanded in
the process, except of course the protected macro \type {\MyMacroC}.
476
There is a bunch of these combiners: \type {\toksapp} and \type {\tokspre} for
local appending and prepending, with global companions: \type {\gtoksapp} and
\type {\gtokspre}, as well as expanding variants: \type {\etoksapp}, \type
{\etokspre}, \type {\xtoksapp} and \type {\xtokspre}.
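
As a sketch of what the simple variants do, we start with a one character
register and extend it at both ends. The last line shows the traditional way
to append, which needs \type {\expandafter}:

\starttyping[option=TEX]
\scratchtoks{b}
\tokspre\scratchtoks{a} % the register now holds: ab
\toksapp\scratchtoks{c} % the register now holds: abc

\scratchtoks\expandafter{\the\scratchtoks d} % traditional append: abcd
\stoptyping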
481
These are not by definition more efficient than using intermediate expanded macros
or token lists, simply because in the process \TEX\ has to create token lists
too, but sometimes they're just more convenient to use. In \CONTEXT\ we actually
do benefit from these.
486
487\stopsectionlevel
488
489\startsectionlevel[title={\LUAMETATEX\ primitives}]
490
We already saw that macros can be defined as protected, which means that
492
493\startbuffer
494           \def\TestA{A}
495\protected \def\TestB{B}
496          \edef\TestC{\TestA\TestB}
497\stopbuffer
498
499\typebuffer[option=TEX] \getbuffer
500
501gives this:
502
503\startlines
504\type{\TestC} : {\tttf \meaningless\TestC}
505\stoplines
506
One way to get \type {\TestB} expanded is to prefix it with \type {\expand}:
508
509\startbuffer
510           \def\TestA{A}
511\protected \def\TestB{B}
512          \edef\TestC{\TestA\TestB}
513          \edef\TestD{\TestA\expand\TestB}
514\stopbuffer
515
516\typebuffer[option=TEX] \getbuffer
517
518We now get:
519
520\startlines
521\type{\TestC} : {\tttf \meaningless\TestC}
522\type{\TestD} : {\tttf \meaningless\TestD}
523\stoplines
524
There are however cases where one wishes this to happen automatically, but that
would make all protected macros expand, including for instance those that switch
fonts, which will create havoc.
527
528\startbuffer
529               \def\TestA{A}
530\protected     \def\TestB{B}
531\semiprotected \def\TestC{C}
532              \edef\TestD{\TestA\TestB\TestC}
533              \edef\TestE{\normalexpanded{\TestA\TestB\TestC}}
534              \edef\TestF{\semiexpanded  {\TestA\TestB\TestC}}
535\stopbuffer
536
537\typebuffer[option=TEX] \getbuffer
538
This time \type {\TestC} loses its protection:
540
541\startlines
542\type{\TestA} : {\tttf \meaningless\TestA}
543\type{\TestB} : {\tttf \meaningless\TestB}
544\type{\TestC} : {\tttf \meaningless\TestC}
545\type{\TestD} : {\tttf \meaningless\TestD}
546\type{\TestE} : {\tttf \meaningless\TestE}
547\type{\TestF} : {\tttf \meaningless\TestF}
548\stoplines
549
Actually adding \type {\fullyexpanded} would be trivial but it does not make much
sense to add the overhead (at least not now). This feature is experimental
anyway so it might go away when I see no real advantage from it.
553
554When you store something in a macro or token register you always need to keep an
555eye on category codes. A dollar in the input is normally treated as math shift, a
556hash indicates a macro parameter or preamble entry. Characters like \quote {A}
557are letters but \quote {[} and \quote {]} are tagged as \quote {other}. The \TEX\
558scanner acts according to these codes. If you ever find yourself in a situation
559that changing catcodes is no option or cumbersome, you can do this:
560
561\starttyping[option=TEX]
562\edef\TestOA{\expandtoken\othercatcode `A}
563\edef\TestLA{\expandtoken\lettercatcode`A}
564\stoptyping
565
566In both cases the meaning is \type {A} but in the first case it's not a letter
567but a character flagged as \quote {other}.
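
One way to check this, given as a sketch that assumes the two definitions
above, is to compare each result with a regular letter using \type {\ifcat}:

\starttyping[option=TEX]
\expandafter\ifcat\TestOA A{letter}\else{other}\fi % takes the other branch
\expandafter\ifcat\TestLA A{letter}\else{other}\fi % takes the letter branch
\stoptyping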
568
569A whole new category of commands has to do with so called local control. When
570\TEX\ scans and interprets the input, a process takes place that is called
571tokenizing: (sequences of) characters get a symbolic representation and travel
572through the system as tokens. Often they immediately get interpreted and are then
573discarded. But when for instance you define a macro they end up as a linked list
574of tokens in the macro body. We already saw that expansion plays a role. In most
575cases, unless \TEX\ is collecting tokens, the main action is dealt with in the so-called
576main loop. Something gets picked up from the input but can also be pushed
577back, for instance because of some lookahead that didn't result in an action.
578Quite some time is spent in pushing and popping from the so-called input stack.
579
When we are in \LUA, we can pipe back into the engine but all is collected till
we're back in \TEX\ where the collected result is pushed into the input. Because
\TEX\ is a mix of programming and action there basically is only that main loop.
There is no real way to start a sub run in \LUA\ and do all kinds of things
independent of the current one. This makes sense when you consider the mix: it
would get too confusing.
586
However, in \LUATEX\ and even better in \LUAMETATEX, we can enter a sort of local
state and this is called \quote {local control}. When we are in local control a
new main loop is entered and the current state is temporarily forgotten: we can for
instance expand where one level up expansion was not done. It sounds complicated
and indeed it is complicated, so examples have to clarify it.
592
593\starttyping[option=TEX]
5941 \setbox0\hbox to 10pt{2} \count0=3 \the\count0 \multiply\count0 by 4
595\stoptyping
596
597This snippet of code is not that useful but illustrates what we're dealing with:
598
599\startitemize
600
601\startitem
602    The \type {1} gets typeset. So, characters like that are seen as text.
603\stopitem
604
605\startitem
606    The \type {\setbox} primitive triggers picking up a register number, then
607    goes on scanning for a box specification and that itself will typeset a
608    sequence of whatever until the group ends.
609\stopitem
610
611\startitem
    The \type {\count} primitive triggers scanning for a register number (or
    reference) and then scans for a number; the equal sign is optional.
614\stopitem
615
616\startitem
    The \type {\the} primitive injects some value into the current input stream
    and it does so by entering a new input level.
619\stopitem
620
621\startitem
    The \type {\multiply} primitive picks up a register specification and
    multiplies that by the next scanned number. The \type {by} is optional.
624\stopitem
625
626\stopitemize
627
628We now look at this snippet again but with an expansion context:
629
630\startbuffer[def]
631\def \TestA{1 \setbox0\hbox{2} \count0=3 \the\count0}
632\stopbuffer
633
634\startbuffer[edef]
635\edef\TestB{1 \setbox0\hbox{2} \count0=3 \the\count0}
636\stopbuffer
637
638\typebuffer[def] [option=TEX]
639\typebuffer[edef][option=TEX]
640
641\getbuffer[def]
642\getbuffer[edef]
643
644These two macros have a slightly different body. Make sure you see the
645difference before reading on.
646
647\luatokentable\TestA
648
649\luatokentable\TestB
650
651We now introduce a new primitive \type {\localcontrolled}:
652
653\startbuffer[edef]
654\edef\TestB{1 \setbox0\hbox{2} \count0=3 \the\count0}
655\stopbuffer
656
657\startbuffer[ldef]
658\edef\TestC{1 \setbox0\hbox{2} \localcontrolled{\count0=3} \the\count0}
659\stopbuffer
660
661\typebuffer[edef][option=TEX]
662\typebuffer[ldef][option=TEX]
663
664\getbuffer[edef]
665\getbuffer[ldef]
666
667Again, watch the subtle differences:
668
669\luatokentable\TestB
670
671\luatokentable\TestC
672
673Another example:
674
675\startbuffer[edef]
676\edef\TestB{1 \setbox0\hbox{2} \count0=3 \the\count0}
677\stopbuffer
678
679\startbuffer[ldef]
680\edef\TestD{\localcontrolled{1 \setbox0\hbox{2} \count0=3 \the\count0}}
681\stopbuffer
682
683\typebuffer[edef][option=TEX]
684\typebuffer[ldef][option=TEX]
685
686\getbuffer[edef]\getbuffer[ldef]\quad{\darkgray\leftarrow\space Watch how the results end up here!}
687
688\luatokentable\TestB
689
690\luatokentable\TestD
691
692We can use this mechanism to define so called fully expandable macros:
693
694\startbuffer[def]
695\def\WidthOf#1%
696  {\beginlocalcontrol
697   \setbox0\hbox{#1}%
698   \endlocalcontrol
699   \wd0 }
700\stopbuffer
701
702\startbuffer[use]
703\scratchdimen\WidthOf{The Rite Of Spring}
704
705\the\scratchdimen
706\stopbuffer
707
708\typebuffer[def][option=TEX]
709\typebuffer[use][option=TEX]
710
711\getbuffer[def]\getbuffer[use]
712
713When you want to add some grouping, it quickly can become less pretty:
714
715\startbuffer[def]
716\def\WidthOf#1%
717  {\dimexpr
718      \beginlocalcontrol
719        \begingroup
720          \setbox0\hbox{#1}%
721          \expandafter
722        \endgroup
723      \expandafter
724      \endlocalcontrol
725      \the\wd0
726   \relax}
727\stopbuffer
728
729\startbuffer[use]
730\scratchdimen\WidthOf{The Rite Of Spring}
731
732\the\scratchdimen
733\stopbuffer
734
735\typebuffer[def][option=TEX]
736\typebuffer[use][option=TEX]
737
738\getbuffer[def]\getbuffer[use]
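
The same trick should work for other assignments. In the following sketch,
where \type {\NextNumber} is a name made up for this illustration, the intention
is that each call steps the counter inside the local loop before its value is
inserted, so that the macro can be used inside an \type {\edef}:

\starttyping[option=TEX]
\def\NextNumber
  {\beginlocalcontrol
   \advance\scratchcounter 1
   \endlocalcontrol
   \the\scratchcounter}

\scratchcounter 0
\edef\TestN{\NextNumber,\NextNumber} % each call bumps the counter first
\stoptyping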
739
740A single token alternative is available too and its usage is like this:
741
742\startbuffer
743 \def\TestA{\scratchcounter=100 }
744\edef\TestB{\localcontrol\TestA \the\scratchcounter}
745\edef\TestC{\localcontrolled{\TestA} \the\scratchcounter}
746\stopbuffer
747
748\typebuffer[option=TEX] \getbuffer
749
750The content of \type {\TestB} is \quote {\tttf\meaningless\TestB} and of course
751the \type {\TestC} macro gives \quote {\tttf\meaningless\TestC}.
752
753We now move to the \LUA\ end. Right from the start the way to get something into
754\TEX\ from \LUA\ has been the print functions. But we can also go local
755(immediate). There are several methods:
756
757\startitemize[packed]
758\startitem via a set token register \stopitem
759\startitem via a defined macro \stopitem
760\startitem via a string \stopitem
761\stopitemize
762
Among the things to keep in mind are catcodes, scope and expansion (especially
when the result itself ends up in macros). We start with an example where we go via
a token register:
766
767\startbuffer[set]
768\toks0={\setbox0\hbox{The Rite Of Spring}}
769\toks2={\setbox0\hbox{The Rite Of Spring!}}
770\stopbuffer
771
772\typebuffer[set][option=TEX]
773
774\startbuffer[run]
775\startluacode
776tex.runlocal(0) context("[1: %p]",tex.box[0].width)
777tex.runlocal(2) context("[2: %p]",tex.box[0].width)
778\stopluacode
779\stopbuffer
780
781\typebuffer[run][option=TEX]
782
783\start \getbuffer[set,run] \stop
784
785We can also use a macro:
786
787\startbuffer[set]
788\def\TestA{\setbox0\hbox{The Rite Of Spring}}
789\def\TestB{\setbox0\hbox{The Rite Of Spring!}}
790\stopbuffer
791
792\typebuffer[set][option=TEX]
793
794\startbuffer[run]
795\startluacode
796tex.runlocal("TestA") context("[3: %p]",tex.box[0].width)
797tex.runlocal("TestB") context("[4: %p]",tex.box[0].width)
798\stopluacode
799\stopbuffer
800
801\typebuffer[run][option=TEX]
802
803\start \getbuffer[set,run] \stop
804
805A third variant is more direct and uses a (\LUA) string:
806
807\startbuffer[run]
808\startluacode
809tex.runstring([[\setbox0\hbox{The Rite Of Spring}]])
810
811context("[5: %p]",tex.box[0].width)
812
813tex.runstring([[\setbox0\hbox{The Rite Of Spring!}]])
814
815context("[6: %p]",tex.box[0].width)
816\stopluacode
817\stopbuffer
818
819\typebuffer[run][option=TEX]
820
821\start \getbuffer[run] \stop
822
823A bit more high level:
824
825\starttyping[option=LUA]
826context.runstring([[\setbox0\hbox{(Here \bf 1.2345)}]])
827context.runstring([[\setbox0\hbox{(Here \bf   %.3f)}]],1.2345)
828\stoptyping
829
830Before we had \type {runstring} this was the way to do it when staying in \LUA\
831was needed:
832
833\startbuffer[run]
834\startluacode
835token.setmacro("TestX",[[\setbox0\hbox{The Rite Of Spring}]])
836tex.runlocal("TestX")
837context("[7: %p]",tex.box[0].width)
838\stopluacode
839\stopbuffer
840
841\typebuffer[run][option=TEX]
842
843\start \getbuffer[run] \stop
844
845\startbuffer[run]
846\startluacode
847tex.scantoks(0,tex.ctxcatcodes,[[\setbox0\hbox{The Rite Of Spring!}]])
848tex.runlocal(0)
849context("[8: %p]",tex.box[0].width)
850\stopluacode
851\stopbuffer
852
853\typebuffer[run][option=TEX]
854
855\start \getbuffer[run] \stop
856
857The order of flushing matters because as soon as something is not stored in a
858token list or macro body, \TEX\ will typeset it. And as said, a lot of this relates
859to pushing stuff into the input which is stacked. Compare:
860
861\startbuffer[run]
862\startluacode
863context("[HERE 1]")
864context("[HERE 2]")
865\stopluacode
866\stopbuffer
867
868\typebuffer[run][option=TEX]
869
870\start \getbuffer[run] \stop
871
872with this:
873
874\startbuffer[run]
875\startluacode
876tex.pushlocal() context("[HERE 1]") tex.poplocal()
877tex.pushlocal() context("[HERE 2]") tex.poplocal()
878\stopluacode
879\stopbuffer
880
881\typebuffer[run][option=TEX]
882
883\start \getbuffer[run] \stop
884
885You can expand a macro at the \LUA\ end with \type {token.expandmacro} which has
886a peculiar interface. The first argument has to be a string (the name of a macro)
887or a userdata (a valid macro token). This macro can be fed with parameters by
888passing more arguments:
889
890\starttabulate[|||]
891\NC string \NC serialized to tokens \NC \NR
892\NC true   \NC wrap the next string in curly braces \NC \NR
893\NC table  \NC each entry will become an argument wrapped in braces \NC \NR
894\NC token  \NC inject the token directly \NC \NR
895\NC number \NC change control to the given catcode table \NC \NR
896\stoptabulate
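
Going by this table, a call could look like the following sketch. The macro
\type {\TestM} is made up for this illustration and the table variant is
assumed to behave as described above, so that both entries are passed as braced
arguments:

\starttyping[option=TEX]
\def\TestM#1#2{[#1:#2]}

\startluacode
-- assumption: each table entry becomes one braced argument of \TestM
token.expandmacro("TestM", { "first", "second" })
\stopluacode
\stoptyping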
897
898There are more scanner related primitives, like the \ETEX\ primitive
899\type {\detokenize}:
900
901\startbuffer[run]
902[\detokenize {test \relax}]
903\stopbuffer
904
905\typebuffer[run][option=TEX]
906
907This gives: {\tttf \getbuffer[run]}. In \LUAMETATEX\ we also have complementary
908primitive(s):
909
910\startbuffer[run]
911[\tokenized   catcodetable \vrbcatcodes {test {\bf test} test}]
912[\tokenized                             {test {\bf test} test}]
913[\retokenized              \vrbcatcodes {test {\bf test} test}]
914\stopbuffer
915
916\typebuffer[run][option=TEX]
917
918The \type {\tokenized} takes an optional keyword and the examples above give: {\tttf
919\getbuffer[run]}. The \LUATEX\ primitive \type {\scantextokens} which is a
920variant of \ETEX's \type {\scantokens} operates under the current catcode regime
921(the last one honors \type {\everyeof}). The difference with \type {\tokenized}
922is that this one first serializes the given token list (just like \type
923{\detokenize}). \footnote {The \type {\scan*tokens} primitives now share the same
924helpers as \LUA, but they should behave the same as in \LUATEX.}
925
With \type {\retokenized} the catcode table index is mandatory (it saves a bit of
scanning and is easier on intermixed \type {\expandafter} usage). There
often are several ways to accomplish the same:
929
930\startbuffer[run]
931\def\MyTitle{test {\bf test} test}
932\detokenize               \expandafter{\MyTitle}: 0.46\crlf
933\meaningless                           \MyTitle : 0.47\crlf
934\retokenized              \notcatcodes{\MyTitle}: 0.87\crlf
935\tokenized   catcodetable \notcatcodes{\MyTitle}: 0.93\crlf
936\stopbuffer
937
938\typebuffer[run][option=TEX]
939
940\getbuffer[run]

Here the numbers show the relative performance of these methods. The \type
{\detokenize} and \type {\meaningless} win because they already know that a
verbose serialization is needed. The last two first serialize and then
reinterpret the resulting token list using the given catcode regime. The last one
is slowest because it has to scan the keyword.
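
How these numbers were obtained is not shown here. A minimal sketch of such a
measurement, assuming the \CONTEXT\ helpers \type {\testfeatureonce} and \type
{\elapsedtime} and an arbitrarily chosen loop count, could look like this. The
absolute timings depend on the machine, so only the ratio between the runs is
of interest.

\starttyping[option=TEX]
\def\MyTitle{test {\bf test} test}

% the loop count and the use of a scratch box are assumptions of this sketch
\testfeatureonce{100000}
  {\setbox\scratchbox\hbox{\detokenize\expandafter{\MyTitle}}}
\elapsedtime\crlf

\testfeatureonce{100000}
  {\setbox\scratchbox\hbox{\retokenized\notcatcodes{\MyTitle}}}
\elapsedtime\crlf
\stoptyping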

There is however a pitfall here:

\startbuffer[run]
\def\MyText {test}
\def\MyTitle{test \MyText\space test}
\detokenize               \expandafter{\MyTitle}\crlf
\meaningless                           \MyTitle \crlf
\retokenized              \notcatcodes{\MyTitle}\crlf
\tokenized   catcodetable \notcatcodes{\MyTitle}\crlf
\stopbuffer

\typebuffer[run][option=TEX]

The outcome is different now because we have an expandable embedded macro call.
The fact that we expand in the last two primitives is also the reason why they are
\quote {slower}.

\getbuffer[run]
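
If the fully expanded text is wanted with \type {\detokenize} as well, the
expansion can be forced beforehand. The following sketch, which assumes the same
two definitions as above, lets \type {\expandafter} trigger \type
{\normalexpanded} so that \type {\detokenize} only gets to see the result:

\starttyping[option=TEX]
\def\MyText {test}
\def\MyTitle{test \MyText\space test}

% one level: \MyText and \space are serialized as control sequences
\detokenize\expandafter{\MyTitle}\crlf
% full expansion first, then serialization
\detokenize\expandafter{\normalexpanded{\MyTitle}}\crlf
\stoptyping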

To complete this picture, we show a variant that combines much of what has been
introduced in this section:

\startbuffer[run]
\semiprotected\def\MyTextA {test}
\def\MyTextB {test}
\def\MyTitle{test \MyTextA\space \MyTextB\space test}
\detokenize               \expandafter{\MyTitle}\crlf
\meaningless                           \MyTitle \crlf
\retokenized              \notcatcodes{\MyTitle}\crlf
\retokenized              \notcatcodes{\semiexpanded{\MyTitle}}\crlf
\tokenized   catcodetable \notcatcodes{\MyTitle}\crlf
\tokenized   catcodetable \notcatcodes{\semiexpanded{\MyTitle}}
\stopbuffer

\typebuffer[run][option=TEX]

This time compare the last four lines:

\getbuffer[run]

Of course the question remains to what extent we need this and will eventually
apply it in \CONTEXT. The \type {\detokenize} is used already. History shows that
eventually there is a use for everything and given the way \LUAMETATEX\ is
structured it was not that hard to provide the alternatives without sacrificing
performance or bloating the source.

% tex.quitlocal
%
% tex.expandmacro   : string|userdata + [string|true|table|userdata|number]*
% tex.expandasvalue : kind + string|userdata + [string|true|table|userdata|number]*
% tex.runstring     : [catcode] + string + expand + grouped
% tex.runlocal      : function|number(register)|string(macro)|userdata(token) + expand + grouped
% mplib.expandtex   : mpx + kind + string|userdata + [string|true|table|userdata|number]*

\stopsectionlevel

\startsectionlevel[title=Dirty tricks]

When I was updating this manual Hans vd Meer and I had some discussions about
expansion and tokenization related issues when combining \XML\ processing with
\TEX\ macros, where he did some manipulations in \LUA. In these mixed cases you
can run into catcode related problems because in \XML\ you want for instance a
\type {#} to be a hash mark (other character) and not a parameter identifier.
Normally this is handled well in \CONTEXT\ but of course there are complex cases
where you need to adapt.

Say that you want to compare two strings (officially we should say token lists)
with mixed catcodes. Let's also assume that you want to use the normal \type
{\if} construct (which was part of the discussion). We start with defining
a test set. The reason that we present this example here is that we use
commands discussed in previous sections:

\startbuffer[run]
               \def\abc{abc}
\semiprotected \def\xyz{xyz}
              \edef\pqr{\expandtoken\notcatcodes`p%
                        \expandtoken\notcatcodes`q%
                        \expandtoken\notcatcodes`r}

1: \ifcondition\similartokens{abc} {def}YES\else NOP\fi (NOP) \quad
2: \ifcondition\similartokens{abc}{\abc}YES\else NOP\fi (YES)

3: \ifcondition\similartokens{xyz} {pqr}YES\else NOP\fi (NOP) \quad
4: \ifcondition\similartokens{xyz}{\xyz}YES\else NOP\fi (YES)

5: \ifcondition\similartokens{pqr} {pqr}YES\else NOP\fi (YES) \quad
6: \ifcondition\similartokens{pqr}{\pqr}YES\else NOP\fi (YES)
\stopbuffer

\typebuffer[run][option=TEX]

So, we have a mix of expandable and semi expandable macros, and also a mix of
catcodes. A naive approach would be:

\startbuffer[def]
\permanent\protected\def\similartokens#1#2%
  {\iftok{#1}{#2}}
\stopbuffer

\typebuffer[def][option=TEX]

but that will fail in some cases:

\pushoverloadmode \startpacked \tttf \getbuffer[def,run]\stoppacked \popoverloadmode

So how about:

\startbuffer[def]
\permanent\protected\def\similartokens#1#2%
  {\iftok{\detokenize{#1}}{\detokenize{#2}}}
\stopbuffer

\typebuffer[def][option=TEX]

That one is even worse:

\pushoverloadmode \startpacked \tttf \getbuffer[def,run]\stoppacked \popoverloadmode

We need to expand so we end up with this:

\startbuffer[def]
\permanent\protected\def\similartokens#1#2%
  {\normalexpanded{\noexpand\iftok
     {\noexpand\detokenize{#1}}
     {\noexpand\detokenize{#2}}}}
\stopbuffer

\typebuffer[def][option=TEX]

Better:

\pushoverloadmode \startpacked \tttf \getbuffer[def,run]\stoppacked \popoverloadmode

But that will still not deal with the mildly protected macro so in the end we
have:

\startbuffer[def]
\permanent\protected\def\similartokens#1#2%
  {\semiexpanded{\noexpand\iftok
     {\noexpand\detokenize{#1}}
     {\noexpand\detokenize{#2}}}}
\stopbuffer

\typebuffer[def][option=TEX]

Now we're good:

\pushoverloadmode \startpacked \tttf \getbuffer[def,run]\stoppacked \popoverloadmode
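
To see what the last two variants do, one can trace two of the tests by hand.
The lines below are only a sketch of what \type {\iftok} gets to see after the
wrapping expansion has done its work. They are written as comments and are not
actual engine output:

\starttyping[option=TEX]
% test 2: \abc is a regular macro, so both variants expand it
%
%   \similartokens{abc}{\abc}
%   => \iftok{\detokenize{abc}}{\detokenize{abc}}     % same
%
% test 4: \xyz is \semiprotected
%
%   with \normalexpanded the macro \xyz is left alone:
%   => \iftok{\detokenize{xyz}}{\detokenize{\xyz}}    % different
%
%   with \semiexpanded the macro \xyz is expanded too:
%   => \iftok{\detokenize{xyz}}{\detokenize{xyz}}     % same
\stoptyping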

Finally we wrap this one in the usual \type {\doifelse...} macro:

\startbuffer[def]
\permanent\protected\def\doifelsesimilartokens#1#2%
  {\ifcondition\similartokens{#1}{#2}%
     \expandafter\firstoftwoarguments
   \else
     \expandafter\secondoftwoarguments
   \fi}
\stopbuffer

\typebuffer[def][option=TEX]

so that we can do:

\starttyping[option=TEX]
\doifelsesimilartokens{pqr}{\pqr}{YES}{NOP}
\stoptyping

A companion macro of this is \type {\wipetoken} but for that one you need to look
into the source.

\stopsectionlevel

\stopdocument

% \aftergroups
% \aftergrouped
%
%     \starttyping
%           \def\foo{foo}
% \protected\def\oof{oof}
%
% \csname foo\endcsname
% \csname oof\endcsname
% \csname \foo\endcsname
% \begincsname \oof\endcsname % error in luametatex, but in texexpand l 477 we can block an error
%
% \ifcsname  foo\endcsname yes\else nop\fi
% \ifcsname  oof\endcsname yes\else nop\fi
% \ifcsname \foo\endcsname yes\else nop\fi
% \ifcsname \oof\endcsname yes\else nop\fi % nop in luametatex
% \stoptyping
1140