% lowlevel-macros.tex, size: 56 KB, last modification: 2023-12-21 09:43
1% language=us runpath=texruns:manuals/lowlevel
2
% Extending the macro argument parser happened stepwise and at each step a bit of
% \CONTEXT\ code was adapted for testing. At the beginning of October the 20201010
% version of \LUAMETATEX\ was more or less complete, and I decided to adapt some
% more, and more intrusively too. Of course that resulted in some more files than I
% had intended so mid October about 100 files were adapted. When this works out
% well, I'll do some more. In the process many macros got the frozen property so
% that was also a test and we'll see how that works out (as it can backfire). As
% usual, here is a musical timestamp: working on this happened when Pineapple Thief
% released \quotation {Versions of the Truth} which again has magnificent drumming by
% Gavin Harrison.
13
14% \permanent\tolerant\protected\def\xx[#1]#*#;[#2]#:#3% loops .. todo
15
16\usemodule[system-tokens]
17\usemodule[system-syntax]
18
19\environment lowlevel-style
20
21\startdocument
22  [title=macros,
23   color=middleorange]
24
25\startsectionlevel[title=Preamble]
26
27This chapter overlaps with other chapters but brings together some extensions to
28the macro definition and expansion parts. As these mechanisms were stepwise
29extended, the other chapters describe intermediate steps in the development.
30
Now, in spite of the extensions discussed here the main idea is still that we
have \TEX\ act like before. We keep the charm of the macro language, and although
these additions make for easier definitions, (at least initially) none of them
does something that could not be done before using more code.
35
36\stopsectionlevel
37
38\startsectionlevel[title=Definitions]
39
A macro definition normally looks like this: \footnote {The \type
{\dontleavehmode} command makes the examples stay on one line.}
42
43\startbuffer[definition]
44\def\macro#1#2%
45  {\dontleavehmode\hbox to 6em{\vl\type{#1}\vl\type{#2}\vl\hss}}
46\stopbuffer
47
48\typebuffer[definition][option=TEX] \getbuffer[definition]
49
50Such a macro can be used as:
51
52\startbuffer[example]
53\macro {1}{2}
54\macro {1} {2}  middle space gobbled
55\macro 1 {2}    middle space gobbled
56\macro {1} 2    middle space gobbled
57\macro 1 2      middle space gobbled
58\stopbuffer
59
60\typebuffer[example][option=TEX]
61
62We show the result with some comments about how spaces are handled:
63
64\startlines \getbuffer[example] \stoplines
65
66A definition with delimited parameters looks like this:
67
68\startbuffer[definition]
69\def\macro[#1]%
70  {\dontleavehmode\hbox to 6em{\vl\type{#1}\vl\hss}}
71\stopbuffer
72
73\typebuffer[definition][option=TEX] \getbuffer[definition]
74
75When we use this we get:
76
77\startbuffer[example]
78\macro [1]
79\macro [ 1]    leading space kept
80\macro [1 ]    trailing space kept
81\macro [ 1 ]   both spaces kept
82\stopbuffer
83
84\typebuffer[example][option=TEX]
85
86Again, watch the handling of spaces:
87
88\startlines \getbuffer[example] \stoplines
89
90Just for the record we show a combination:
91
92\startbuffer[definition]
93\def\macro[#1]#2%
94  {\dontleavehmode\hbox to 6em{\vl\type{#1}\vl\type{#2}\vl\hss}}
95\stopbuffer
96
97\typebuffer[definition][option=TEX] \getbuffer[definition]
98
99With this:
100
101\startbuffer[example]
102\macro [1]{2}
103\macro [1] {2}
104\macro [1] 2
105\stopbuffer
106
107\typebuffer[example][option=TEX]
108
109we can again see the spaces go away:
110
111\startlines \getbuffer[example] \stoplines
112
A definition with two parameters inside a single pair of brackets is given next:
114
115\startbuffer[definition]
116\def\macro[#1#2]%
117  {\dontleavehmode\hbox to 6em{\vl\type{#1}\vl\type{#2}\vl\hss}}
118\stopbuffer
119
120\typebuffer[definition][option=TEX] \getbuffer[definition]
121
122When used:
123
124\startbuffer[example]
125\macro [12]
126\macro [ 12]     leading space gobbled
127\macro [12 ]     trailing space kept
128\macro [ 12 ]    leading space gobbled, trailing space kept
129\macro [1 2]     middle space kept
130\macro [ 1 2 ]   leading space gobbled, middle and trailing space kept
131\stopbuffer
132
133\typebuffer[example][option=TEX]
134
This gives us:
136
137\startlines \getbuffer[example] \stoplines
138
139These examples demonstrate that the engine does some magic with spaces before
140(and therefore also between multiple) parameters.
141
142We will now go a bit beyond what traditional \TEX\ engines do and enter the
143domain of \LUAMETATEX\ specific parameter specifiers. We start with one that
144deals with this hard coded space behavior:
145
146\startbuffer[definition]
147\def\macro[#^#^]%
148  {\dontleavehmode\hbox to 6em{\vl\type{#1}\vl\type{#2}\vl\hss}}
149\stopbuffer
150
151\typebuffer[definition][option=TEX] \getbuffer[definition]
152
153The \type {#^} specifier will count the parameter, so here we expect again two
154arguments but the space is kept when parsing for them.
155
156\startbuffer[example]
157\macro [12]
158\macro [ 12]
159\macro [12 ]
160\macro [ 12 ]
161\macro [1 2]
162\macro [ 1 2 ]
163\stopbuffer
164
165\typebuffer[example][option=TEX]
166
Now keep in mind that we could deal well with all kinds of parameter handling in
\CONTEXT\ for decades, so this is not really something we missed, but it
complements the other specifiers that are discussed below and it makes sense to
have that level of control. Also, availability triggers usage. Nevertheless, some
day the \type {#^} specifier will come in handy.
172
173\startlines \getbuffer[example] \stoplines
174
175We now come back to an earlier example:
176
177\startbuffer[definition]
178\def\macro[#1]%
179  {\dontleavehmode\hbox spread 1em{\vl\type{#1}\vl\hss}}
180\stopbuffer
181
182\typebuffer[definition][option=TEX] \getbuffer[definition]
183
184When we use this we see that the braces in the second call are removed:
185
186\startbuffer[example]
187\macro [1]
188\macro [{1}]
189\stopbuffer
190
191\typebuffer[example][option=TEX] \getbuffer[example]
192
193This can be prohibited by the \type {#+} specifier, as in:
194
195\startbuffer[definition]
196\def\macro[#+]%
197  {\dontleavehmode\hbox spread 1em{\vl\type{#1}\vl\hss}}
198\stopbuffer
199
200\typebuffer[definition][option=TEX] \getbuffer[definition]
201
202As we see, the braces are kept:
203
204\startbuffer[example]
205\macro [1]
206\macro [{1}]
207\stopbuffer
208
209\typebuffer[example][option=TEX]
210
211Again, we could easily get around that (for sure intended) side effect but it just makes nicer
212code when we have a feature like this.
213
214\getbuffer[example]
215
216Sometimes you want to grab an argument but are not interested in the results. For this we have
217two specifiers: one that just ignores the argument, and another one that keeps counting but
218discards it, i.e.\ the related parameter is empty.
219
220\startbuffer[definition]
221\def\macro[#1][#0][#3][#-][#4]%
222  {\dontleavehmode\hbox spread 1em
223     {\vl\type{#1}\vl\type{#2}\vl\type{#3}\vl\type{#4}\vl\hss}}
224\stopbuffer
225
226\typebuffer[definition][option=TEX] \getbuffer[definition]
227
228The second argument is empty and the fourth argument is simply ignored which is why we need
229\type {#4} for the fifth entry.
230
231\startbuffer[example]
232\macro [1][2][3][4][5]
233\stopbuffer
234
235\typebuffer[example][option=TEX]
236
237Here is proof that it works:
238
239\getbuffer[example]
240
241The reasoning behind dropping arguments is that for some cases we get around the
242nine argument limitation, but more important is that we don't construct token
243lists that are not used, which is more memory (and maybe even \CPU\ cache)
244friendly.
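
As a small illustration of dropping arguments, here is a sketch with two made up
helpers, \type {\SecondOnly} and \type {\GobbleTwo}. It assumes the numbering
rule shown above: because \type {#-} is not counted, the first parameter that is
actually stored is \type {#1}.

\starttyping[option=TEX]
% only the middle bracketed argument is stored, the other two are
% scanned and thrown away without building a token list
\def\SecondOnly[#-][#1][#-]{#1}

% nothing is stored at all
\def\GobbleTwo #-#-{}

\SecondOnly[one][two][three] % should give: two
\GobbleTwo{one}{two}three    % should give: three
\stoptyping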
245
246Spaces are always kind of special in \TEX, so it will be no surprise that we have
247another specifier that relates to spaces.
248
249\startbuffer[definition]
250\def\macro[#1]#*[#2]%
251  {\dontleavehmode\hbox spread 1em{\vl\type{#1}\vl\type{#2}\vl\hss}}
252\stopbuffer
253
254\typebuffer[definition][option=TEX] \getbuffer[definition]
255
256This permits usage like the following:
257
258\startbuffer[example]
259\macro [1][2]
260\macro [1] [2]
261\stopbuffer
262
263\typebuffer[example][option=TEX] \getbuffer[example]
264
Without the optional \quote {grab spaces} specifier the second line would
possibly throw an error. This is because \TEX\ then tries to match \type{][} so
the \type {] [} in the input is simply added to the first argument and the next
occurrence of \type {][} will be used. That one can be someplace further in your
source and if not \TEX\ complains about a premature end of file. But, with the
\type {#*} option it works out okay (unless of course you don't have that second
argument \type {[2]}).
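
To make the matching rule concrete, here is a small sketch of the case without
\type {#*}. The name \type {\Pair} is made up for this illustration.

\starttyping[option=TEX]
\def\Pair[#1][#2]{(#1)(#2)}

% the first parameter is delimited by the two tokens ][ so the
% scanner runs on until it sees them next to each other:

\Pair[1] [2] and later [3][4]

% #1 = 1] [2] and later [3
% #2 = 4
\stoptyping

So nothing is lost, but the arguments are not what the user had in mind.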
272
273Now, you might wonder if there is a way to deal with that second delimited
274argument being optional and of course that can be programmed quite well in
275traditional macro code. In fact, \CONTEXT\ does that a lot because it is set up
276as a parameter driven system with optional arguments. That subsystem has been
277optimized to the max over years and it works quite well and performance wise
278there is very little to gain. However, as soon as you enable tracing you end up
279in an avalanche of expansions and that is no fun.
280
281This time the solution is not in some special specifier but in the way a macro
282gets defined.
283
284\startbuffer[definition]
285\tolerant\def\macro[#1]#*[#2]%
286  {\dontleavehmode\hbox spread 1em{\vl\type{#1}\vl\type{#2}\vl\hss}}
287\stopbuffer
288
289\typebuffer[definition][option=TEX] \getbuffer[definition]
290
The magic \type {\tolerant} prefix works with delimited arguments and just quits
when there is no match. So, this is acceptable:
293
294\startbuffer[example]
295\macro [1][2]
296\macro [1] [2]
297\macro [1]
298\macro
299\stopbuffer
300
301\typebuffer[example][option=TEX] \getbuffer[example]
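
For comparison, here is a minimal sketch of how one optional bracketed argument
is typically picked up with traditional means, next to a tolerant definition. All
names (\type {\OldMacro}, its helpers, \type {\MyNextToken} and \type
{\NewMacro}) are hypothetical and this is not the way \CONTEXT\ itself implements
optional arguments.

\starttyping[option=TEX]
% traditional: look ahead, then branch on the next token
\def\OldMacro     {\futurelet\MyNextToken\OldMacroCheck}
\def\OldMacroCheck{\ifx\MyNextToken[%
                     \expandafter\OldMacroYes
                   \else
                     \expandafter\OldMacroNop
                   \fi}
\def\OldMacroYes[#1]{(#1)}
\def\OldMacroNop    {\OldMacroYes[]}

% tolerant: one definition, #1 is empty when there is no [...]
\tolerant\def\NewMacro[#1]{(#1)}

\OldMacro[1] \OldMacro \NewMacro[1] \NewMacro
\stoptyping

The first variant needs four macros and shows up as four expansion steps when
tracing, the second one is a single macro call.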
302
303We can check how many arguments have been processed with a dedicated conditional:
304
305\startbuffer[definition]
306\tolerant\def\macro[#1]#*[#2]%
307  {\ifarguments 0\or 1\or 2\or ?\fi: \vl\type{#1}\vl\type{#2}\vl}
308\stopbuffer
309
310\typebuffer[definition][option=TEX] \getbuffer[definition]
311
312We use this test:
313
314\startbuffer[example]
315\macro [1][2] \macro [1] [2] \macro [1] \macro
316\stopbuffer
317
318\typebuffer[example][option=TEX]
319
The result is: \inlinebuffer[example]\ which is what we expect because we flush
inline and there is no change of mode. When the following definition is used in
display mode, the leading \type {n=} can for instance start a new paragraph, and
when there is code in \type {\everypar} you can lose the right number when macros
get expanded before the \type {n} gets injected.
325
326\starttyping[option=TEX]
327\tolerant\def\macro[#1]#*[#2]%
328  {n=\ifarguments 0\or 1\or 2\or ?\fi: \vl\type{#1}\vl\type{#2}\vl}
329\stoptyping
330
In addition to the \type {\ifarguments} test primitive there is also a related
internal counter \type {\lastarguments} that gets set and that you can consult,
so the \type {\ifarguments} is actually just a shortcut for \typ {\ifcase
\lastarguments}.
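
A sketch of the equivalence, using a made up macro \type {\HowMany}. It is
assumed here that the counter can be used like any other internal integer, so
when in doubt grab its value right at the start of the body, before other macros
get a chance to change it.

\starttyping[option=TEX]
\tolerant\def\HowMany[#1]#*[#2]#*[#3]%
  {\ifcase\lastarguments none\or one\or two\or three\else ?\fi}

\HowMany \HowMany[a] \HowMany[a][b] \HowMany[a][b][c]
\stoptyping

Following the pattern of the previous example this should report none, one, two
and three.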
334
335We now continue with the argument specifiers and the next two relate to this optional
336grabbing. Consider the next definition:
337
338\startbuffer[definition]
339\tolerant\def\macro#1#*#2%
340  {\dontleavehmode\hbox spread 1em{\vl\type{#1}\vl\type{#2}\vl\hss}}
341\stopbuffer
342
343\typebuffer[definition][option=TEX] \getbuffer[definition]
344
345With this test:
346
347\startbuffer[example]
348\macro {1} {2}
349\macro {1}
350\macro
351\stopbuffer
352
353\typebuffer[example][option=TEX]
354
355We get:
356
357\getbuffer[example]
358
This is okay because the last \type {\macro} is a valid (single token) argument. But, we
can make the braces mandatory:
361
362\startbuffer[definition]
363\tolerant\def\macro#=#*#=%
364  {\dontleavehmode\hbox spread 1em{\vl\type{#1}\vl\type{#2}\vl\hss}}
365\stopbuffer
366
367\typebuffer[definition][option=TEX] \getbuffer[definition]
368
369Here the \type {#=} forces a check for braces, so:
370
371\startbuffer[example]
372\macro {1} {2}
373\macro {1}
374\macro
375\stopbuffer
376
377\typebuffer[example][option=TEX]
378
379gives this:
380
381\getbuffer[example]
382
However, we do lose these braces and sometimes you don't want that. Of course when you pass the
results downstream to another macro you can always add them, but it was cheap to add a related
specifier:
386
387\startbuffer[definition]
388\tolerant\def\macro#_#*#_%
389  {\dontleavehmode\hbox spread 1em{\vl\type{#1}\vl\type{#2}\vl\hss}}
390\stopbuffer
391
392\typebuffer[definition][option=TEX] \getbuffer[definition]
393
Again, the magic \type {\tolerant} prefix will quit scanning when there is
no match. So:
396
397\startbuffer[example]
398\macro {1} {2}
399\macro {1}
400\macro
401\stopbuffer
402
403\typebuffer[example][option=TEX]
404
405leads to:
406
407\getbuffer[example]
408
409When you're tolerant it can be that you still want to pick up some argument
410later on. This is why we have a continuation option.
411
412\startbuffer[definition]
413\tolerant\def\foo      [#1]#*[#2]#:#3{!#1!#2!#3!}
414\tolerant\def\oof[#1]#*[#2]#:(#3)#:#4{!#1!#2!#3!#4!}
415\tolerant\def\ofo      [#1]#:(#2)#:#3{!#1!#2!#3!}
416\stopbuffer
417
418\typebuffer[definition][option=TEX] \getbuffer[definition]
419
420Hopefully the next example demonstrates how it works:
421
422\startbuffer[example]
423\foo{3} \foo[1]{3} \foo[1][2]{3}
424\oof{4} \oof[1]{4} \oof[1][2]{4}
425\oof[1][2](3){4} \oof[1](3){4} \oof(3){4}
426\ofo{3} \ofo[1]{3}
427\ofo[1](2){3} \ofo(2){3}
428\stopbuffer
429
430\typebuffer[example][option=TEX]
431
432As you can see we can have multiple continuations using the \type {#:} directive:
433
434\startlines \getbuffer[example] \stoplines
435
The last specifier doesn't work well with the \type {\ifarguments} state because
we no longer know which arguments were skipped. This is why we have another test
for arguments, \type {\ifparameter}. A zero value means that the next token is
not a parameter reference, a value of one means that a parameter has been set and
a value of two signals an empty parameter. So, it reports the state of the given
parameter as a kind of \type {\ifcase}.
442
443\startbuffer[definition]
444\def\foo#1#2{ [\ifparameter#1\or(ONE)\fi\ifparameter#2\or(TWO)\fi] }
445\stopbuffer
446
447\typebuffer[definition][option=TEX] \getbuffer[definition]
448
449\startbuffer[example]
450\foo{1}{2} \foo{1}{} \foo{}{2} \foo{}{}
451\stopbuffer
452
453Of course the test has to be followed by a valid parameter specifier:
454
455\typebuffer[example][option=TEX]
456
457The previous code gives this:
458
459\getbuffer[example]
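
The example above only uses the \quote {set} branch. A sketch that also uses the
\quote {empty} branch could look as follows. The macro \type {\State} is made up
and the branches simply follow the values described above.

\starttyping[option=TEX]
\def\State#1%
  {\ifparameter#1\or set\or empty\fi}

\State{x} % value one: the parameter has been set
\State{}  % value two: the parameter is empty
\stoptyping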
460
461A combination check \type {\ifparameters}, again a case, matches the first
462parameter that has a value set.
463
We could add plenty of specifiers but we need to keep in mind that we're not
talking of an expression scanner. We need to keep performance in mind, so nesting
and backtracking are no option. We also have a limited set of usable single
characters, but here's one that uses a symbol that we had left:
468
469\startbuffer[definition]
470\def\startfoo[#/]#/\stopfoo{ [#1](#2) }
471\stopbuffer
472
473\typebuffer[definition][option=TEX] \getbuffer[definition]
474
475\startbuffer[example]
476\startfoo [x ] x \stopfoo
477\startfoo [ x ] x \stopfoo
478\startfoo [ x] x \stopfoo
479\startfoo [ x] \par x \par \par \stopfoo
480\stopbuffer
481
482The slash directive removes leading and trailing so called spacers as well as tokens
483that represent a paragraph end:
484
485\typebuffer[example][option=TEX]
486
487So we get this:
488
489\getbuffer[example]
490
491The next directive, the quitter \type {#;}, is demonstrated with an example. When
492no match has occurred, scanning picks up after this signal, otherwise we just
493quit.
494
495\startbuffer[example]
496\tolerant\def\foo[#1]#;(#2){/#1/#2/}
497
498\foo[1]\quad\foo[2]\quad\foo[3]\par
499\foo(1)\quad\foo(2)\quad\foo(3)\par
500
501\tolerant\def\foo[#1]#;#={/#1/#2/}
502
503\foo[1]\quad\foo[2]\quad\foo[3]\par
504\foo{1}\quad\foo{2}\quad\foo{3}\par
505
506\tolerant\def\foo[#1]#;#2{/#1/#2/}
507
508\foo[1]\quad\foo[2]\quad\foo[3]\par
509\foo{1}\quad\foo{2}\quad\foo{3}\par
510
511\tolerant\def\foo[#1]#;(#2)#;#={/#1/#2/#3/}
512
513\foo[1]\quad\foo[2]\quad\foo[3]\par
514\foo(1)\quad\foo(2)\quad\foo(3)\par
515\foo{1}\quad\foo{2}\quad\foo{3}\par
516\stopbuffer
517
518\typebuffer[example][option=TEX] \startpacked \getbuffer[example] \stoppacked
519
520I have to admit that I don't really need it but it made some macros that I was
521redefining behave better, so there is some self|-|interest here. Anyway, I
522considered some other features, like picking up a detokenized argument but I
523don't expect that to be of much use. In the meantime we ran out of reasonable
524characters, but some day \type {#?} and \type {#!} might show up, or maybe I find
525a use for \type {#<} and \type {#>}. A summary of all this is given here:
526
527\starttabulate[|T|i2l|]
528\FL
529\NC +   \NC keep the braces \NC \NR
530\NC -   \NC discard and don't count the argument \NC \NR
\NC /   \NC remove leading and trailing spaces and pars \NC \NR
\NC =   \NC braces are mandatory \NC \NR
\NC _   \NC braces are mandatory and kept \NC \NR
534\NC ^   \NC keep leading spaces \NC \NR
535\ML
536\NC 1-9 \NC an argument \NC \NR
537\NC 0   \NC discard but count the argument \NC \NR
538\ML
539\NC *   \NC ignore spaces \NC \NR
540\NC :   \NC pick up scanning here  \NC \NR
541\NC ;   \NC quit scanning \NC \NR
542\ML
543\NC .   \NC ignore pars and spaces \NC \NR
544\NC ,   \NC push back space when quit \NC \NR
545\LL
546\stoptabulate
547
548The last two have not been discussed and were added later. The period
549directive gobbles space and par tokens and discards them in the
550process. The comma directive is like \type {*} but it pushes back a space
551when the matching quits.
552
\startbuffer[example]
554\tolerant\def\FooA[#1]#*[#2]{(#1/#2)} % remove spaces
555\tolerant\def\FooB[#1]#,[#2]{(#1/#2)} % push back space
556
557/\FooA/ /\FooA / /\FooA[1]/ /\FooA[!] / /\FooA[1] [2]/ /\FooA[1] [2] /\par
558/\FooB/ /\FooB / /\FooB[1]/ /\FooB[!] / /\FooB[1] [2]/ /\FooB[1] [2] /\par
559\stopbuffer
560
561\typebuffer[example][option=TEX] \startpacked \getbuffer[example] \stoppacked
562
563Gobbling spaces versus pushing back is an interface design decision because it
564has to do with consistency.
565
566\stopsectionlevel
567
568\startsectionlevel[title=Runaway arguments]
569
There is a particularly troublesome case left: a runaway argument. The solution is
not pretty but it's the only way: we need to tell the parser that it can quit.
572
573\startbuffer[definition]
574\tolerant\def\foo[#1=#2]%
575  {\ifarguments 0\or 1\or 2\or 3\or 4\fi:\vl\type{#1}\vl\type{#2}\vl}
576\stopbuffer
577
578\typebuffer[definition][option=TEX] \getbuffer[definition]
579
580\startbuffer[example]
581\dontleavehmode \foo[a=1]
582\dontleavehmode \foo[b=]
583\dontleavehmode \foo[=]
584\dontleavehmode \foo[x]\ignorearguments
585\stopbuffer
586
The outcome demonstrates that one still has to do some additional checking for sane
results and that there are alternative ways to (ab)use this mechanism. It all boils
down to a clever combination of delimiters and \type {\ignorearguments}.
590
591\typebuffer[example][option=TEX]
592
593All calls are accepted:
594
595\startlines \getbuffer[example] \stoplines
596
Just in case you wonder about performance: don't expect miracles here. On the one
hand there is some extra overhead in the engine (when defining macros as well as
when collecting arguments during a macro call), on the other hand using these new
features can sort of compensate for that. As mentioned: the gain is mostly in
cleaner macro code and less clutter in tracing. And I just want the \CONTEXT\
code to look nice: that way users can look in the source to see what happens and
not drown in all these show|-|off tricks and special characters like underscores,
at signs, question marks and exclamation marks.
605
For the record: I normally run tests to see if there are performance side effects
and as long as processing the test suite, which has thousands of files of all
kinds, doesn't take more time it's okay. Actually, there is a little gain in
\CONTEXT, which is to be expected, but I bet users won't notice it because it's
easily offset by some inefficient styling. Of course another gain of losing some
indirectness is that error messages point to the macro that the user called and
not to some follow up.
613
614\stopsectionlevel
615
616\startsectionlevel[title=Introspection]
617
618A macro has a meaning. You can serialize that meaning as follows:
619
620\startbuffer[definition]
621\tolerant\protected\def\foo#1[#2]#*[#3]%
  {(1=#1) (2=#2) (3=#3)}
623
624\meaning\foo
625\stopbuffer
626
627\typebuffer[definition][option=TEX]
628
629The meaning of \type {\foo} comes out as:
630
631\startnarrower \getbuffer[definition] \stopnarrower
632
633When you load the module \type {system-tokens} you can also say:
634
635\startbuffer[example]
636\luatokentable\foo
637\stopbuffer
638
639\typebuffer[example][option=TEX]
640
This produces a table of token specifications:
642
643{\getbuffer[definition]\getbuffer[example]}
644
A token list is a linked list of tokens. The magic numbers in the first column
are the token memory pointers, and because macros (and token lists) get recycled
at some point the available tokens get scattered, which is reflected in the order
of these numbers. Normally macros defined in the macro package are more sequential
because they stay around from the start. The second and third column show the so
called command code and the specifier. The command code groups primitives in
categories, the specifier is an indicator of what specific action will follow, a
register number, a reference, etc. Users don't need to know these details. This
macro is a special version of the online variant:
654
655\starttyping[option=TEX]
656\showluatokens\foo
657\stoptyping
658
659That one is always available and shows a similar list on the console. Again, users
660normally don't want to know such details.
661
662\stopsectionlevel
663
\startsectionlevel[title=Nesting]
665
666You can nest macros, as in:
667
668\startbuffer
669\def\foo#1#2{\def\oof##1{<#1>##1<#2>}}
670\stopbuffer
671
672\typebuffer[option=TEX] \getbuffer
673
674At first sight the duplication of \type {#} looks strange but this is what
675happens. When \TEX\ scans the definition of \type {\foo} it sees two arguments.
676Their specification ends up in the preamble that defines the matching. When the
677body is scanned, the \type {#1} and \type {#2} are turned into a parameter
678reference. In order to make nested macros with arguments possible a \type {#}
679followed by another \type {#} becomes just one \type {#}. Keep in mind that the
680definition of \type {\oof} is delayed till the macro \type {\foo} gets expanded.
That definition is just stored and the only things that get replaced are the two
references to a macro parameter.
683
684\luatokentable\foo
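
Here is what happens at usage time, followed by a sketch with one more level of
nesting. The names \type {\One}, \type {\Two} and \type {\Three} are made up for
this illustration.

\starttyping[option=TEX]
\def\foo#1#2{\def\oof##1{<#1>##1<#2>}}

\foo{a}{b} % \oof is now defined as if we had said: \def\oof#1{<a>#1<b>}
\oof{x}    % gives: <a>x<b>

% each extra level doubles the number of hashes
\def\One#1{\def\Two##1{\def\Three####1{#1##1####1}}}

\One{a}    % defines \Two   as: \def\Two#1{\def\Three##1{a#1##1}}
\Two{b}    % defines \Three as: \def\Three#1{ab#1}
\Three{c}  % gives: abc
\stoptyping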
685
686Now, when we look at these details, it might become clear why for instance we
687have \quote {variable} names like \type {#4} and not \type {#whatever} (with or
688without hash). Macros are essentially token lists and token lists can be seen as
689a sequence of numbers. This is not that different from other programming
690environments. When you run into buzzwords like \quote {bytecode} and \quote
691{virtual machines} there is actually nothing special about it: some high level
692programming (using whatever concept, and in the case of \TEX\ it's macros)
693eventually ends up as a sequence of instructions, say bytecodes. Then you need
694some machinery to run over that and act upon those numbers. It's something you
arrive at naturally when you play with interpreting languages. \footnote {I
actually did when I wrote an interpreter for some computer assisted learning
system, think of a kind of interpreted \PASCAL, but later realized that it was a
bytecode plus virtual machine thing. I'd just applied what I learned when playing
with eight bit processors that took bytes, and interpreted opcodes and such.
There's nothing spectacular about all this and I only realized decades later that
the buzzwords describe old natural concepts.}
702
So, internally a \type {#4} is just one token, an operator|-|operand combination
704where the operator is \quotation {grab a parameter} and the operand tells
705\quotation {where to store} it. Using names is of course an option but then one
706has to do more parsing and turn the name into a number \footnote {This is kind of
707what \METAPOST\ does with parameters to macros. The side effect is that in
708reporting you get \type {text0}, \type {expr2} and such reported which doesn't
709make things more clear.}, add additional checking in the macro body, figure out
710some way to retain the name for the purpose of reporting (which then uses more
token memory or strings). It is simply not worth the trouble, let alone the fact
that we lose performance, and when \TEX\ showed up those things really mattered.
713
714It is also important to realize that a \type {#} becomes either a preamble token
715(grab an argument) or a reference token (inject the passed tokens into a new
input level). Therefore the duplication of hash tokens \type {##} that you see in
nested macro bodies also makes sense: it makes it possible for the parser to
distinguish between levels. Take:
719
720\starttyping[option=TEX]
721\def\foo#1{\def\oof##1{#1##1#1}}
722\stoptyping
723
724Of course one can think of this:
725
726\starttyping[option=TEX]
727\def\foo#fence{\def\oof#text{#fence#text#fence}}
728\stoptyping
729
730But such names really have to be unique then! Actually \CONTEXT\ does have an
731input method that supports such names, but discussing it here is a bit out of
732scope. Now, imagine that in the above case we use this:
733
734\starttyping[option=TEX]
735\def\foo[#1][#2]{\def\oof##1{#1##1#2}}
736\stoptyping
737
If you're a bit familiar with the fact that \TEX\ has a model of category codes
you can imagine that a predictable \quotation {hash followed by a number} is way
more robust than forcing the user to ensure that catcodes of \quote {names} are
in the right category (read: is a bracket part of the name or not). So, say that
we go for completely arbitrary names, we then suddenly need some escaping, like:
743
744\starttyping[option=TEX]
745\def\foo[#{left}][#{right}]{\def\oof#{text}{#{left}#{text}#{right}}}
746\stoptyping
747
And, if you ever looked into macro packages, you will notice that they differ in
the way they assign category codes. Asking users to take that into account when
defining macros does not make that much sense.
751
So, before one complains about \TEX\ being obscure (the hash thing), think twice.
Your demand for simplicity in your own coding will make coding more cumbersome
for the complex cases that macro packages have to deal with. It's comparable to
using \TEX\ for input or using (say) Markdown. For simple documents the latter is
fine, but when things become complex, you end up with similar complexity (or even
worse because you lost the enforced detailed structure). So, just accept the
unavoidable: any language has its peculiar properties (and for sure I do know why
I dislike some languages for it). The \TEX\ system is not the only one where
dollars, percent signs, ampersands and hashes have special meaning.
762
763\stopsectionlevel
764
765\startsectionlevel[title=Prefixes]
766
Traditional \TEX\ has three prefixes that can be used with macros: \type {\global},
\type {\outer} and \type {\long}. The last two are no|-|ops in \LUAMETATEX\ and
if you want to know what they do (did) you can look it up in the \TEX book. The
\ETEX\ extension gave us \type {\protected}.
771
772In \LUAMETATEX\ we have \type {\global}, \type {\protected}, \type {\tolerant}
773and overload related prefixes like \type {\frozen}. A protected macro is one that
774doesn't expand in an expandable context, so for instance inside an \type {\edef}.
You can force expansion by using the \type {\expand} primitive in front, which is
also something specific to \LUAMETATEX.
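
A minimal sketch of the difference, assuming that \type {\expand} behaves as
described above. The names \type {\Kept} and \type {\Forced} are made up for this
illustration.

\starttyping[option=TEX]
\protected\def\foo{foo}

\edef\Kept  {oof\foo}        % \foo is protected, so it stays as it is
\edef\Forced{oof\expand\foo} % here the expansion of \foo is enforced

\meaning\Kept   % the body still contains the token \foo
\meaning\Forced % the body should now contain the characters: ooffoo
\stoptyping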
777
778% A protected macro can be made expandable by \typ {\unletprotected} and can be
779% protected with \typ {\letprotected}.
780%
781% \startbuffer[example]
782%                \def\foo{foo} \edef\oof{oof\foo} 1: \meaning\oof
783%      \protected\def\foo{foo} \edef\oof{oof\foo} 2: \meaning\oof
784% \unletprotected    \foo      \edef\oof{oof\foo} 3: \meaning\oof
785% \stopbuffer
786%
787% \typebuffer[example][option=TEX]
788%
789% \startlines \getbuffer[example] \stoplines
790
791Frozen macros cannot be redefined without some effort. This feature can to some
792extent be used to prevent a user from overloading, but it also makes it harder
793for the macro package itself to redefine on the fly. You can remove the lock with
794\typ {\unletfrozen} and add a lock with \typ {\letfrozen} so in the end users
795still have all the freedoms that \TEX\ normally provides.
796
797\startbuffer[example]
798                 \def\foo{foo} 1: \meaning\foo
799          \frozen\def\foo{foo} 2: \meaning\foo
800     \unletfrozen    \foo      3: \meaning\foo
801\protected\frozen\def\foo{foo} 4: \meaning\foo
802     \unletfrozen    \foo      5: \meaning\foo
803\stopbuffer
804
805\typebuffer[example][option=TEX]
806
807\startlines \overloadmode0 \getbuffer[example] \stoplines
808
809This actually only works when you have set \type {\overloadmode} to a value that
810permits redefining a frozen macro, so for the purpose of this example we set it
811to zero.
812
813A \type {\tolerant} macro is one that will quit scanning arguments when a
814delimiter cannot be matched. We saw examples of that in a previous section.
815
816These prefixes can be chained (in arbitrary order):
817
818\starttyping[option=TEX]
819\frozen\tolerant\protected\global\def\foo[#1]#*[#2]{...}
820\stoptyping
821
There is actually an additional prefix, \type {\immediate}, but that one is there
as a signal for a macro that is defined in and handled by \LUA. This prefix can
then perform the same function as the one in traditional \TEX, where it is used
for backend related tasks like \type {\write}.
826
827Now, the question is of course, to what extent will \CONTEXT\ use these new
828features. One important argument in favor of using \type {\tolerant} is that it
829gives (hopefully) better error messages. It also needs less code due to lack of
830indirectness. Using \type {\frozen} adds some safeguards although in some places
831where \CONTEXT\ itself overloads commands, we need to defrost. Adapting the code
832is a tedious process and it can introduce errors due to mistypings, although
833these can easily be fixed. So, it will be used but it will take a while to adapt
834the code base.
835
One problem with frozen macros is that they don't play nice with for instance
\typ {\futurelet}. Also, there are places in \CONTEXT\ where we actually do
redefine some core macro that we also want to protect from redefinition by a
user. One can of course \typ {\unletfrozen} such a command first but as a bonus
we have the \typ {\overloaded} prefix that can be used instead. So, one can easily
redefine a frozen macro but it takes a little effort. After all, this feature is
mainly meant to protect a user from side effects of definitions, and not as a
final blocker. \footnote {As usual adding features like this takes some
experimenting and we're now at the third variant of the implementation, so we're
getting there. The fact that we can apply such features in a large macro package
like \CONTEXT\ helps figuring out the needs and best approaches.}
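
The two routes can be sketched as follows. The macro \type {\MyCommand} is made
up, and whether a redefinition is silently accepted, reported or refused depends
on the value of \type {\overloadmode}, so take this as an illustration of the
idea and not as a description of the exact behavior.

\starttyping[option=TEX]
\frozen\def\MyCommand{original}

% route one: remove the lock first and then redefine
\unletfrozen\MyCommand
\def\MyCommand{first replacement}

% route two: signal in the definition itself that we overload
\frozen\def\MyCommand{original}
\overloaded\def\MyCommand{second replacement}
\stoptyping

The second route keeps the intention visible in the source: whoever reads the
definition sees that an existing macro is redefined on purpose.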
847
848A frozen macro can still be overloaded, so what if we want to prevent that? For
849this we have the \typ {\permanent} prefix. Internally we also create primitives
850but we don't have a prefix for that. But we do have one for a very special case
851which we demonstrate with an example:
852
853\startbuffer[example]
854\def\FOO % trickery needed to pick up an optional argument
855  {\noalign{\vskip10pt}}
856
857\noaligned\protected\tolerant\def\OOF[#1]%
858  {\noalign{\vskip\iftok{#1}\emptytoks10pt\else#1\fi}}
859
860\starttabulate[|l|l|]
861    \NC test \NC test \NC \NR
862    \NC test \NC test \NC \NR
863    \FOO
864    \NC test \NC test \NC \NR
865    \OOF[30pt]
866    \NC test \NC test \NC \NR
867    \OOF
868    \NC test \NC test \NC \NR
869\stoptabulate
870\stopbuffer
871
872\typebuffer[example][option=TEX]
873
874When \TEX\ scans input (from a file or token list) and starts an alignment, it
875will pick up rows. When a row is finished it will look ahead for a \type
876{\noalign} and it expands the next token. However, when that token is protected,
877the scanner will not see a \type {\noalign} in that macro so it will likely start
878complaining when that next macro does get expanded and produces a \type
879{\noalign} when a cell is built. The \type {\noaligned} prefix flags a macro as
880being one that will do some \type {\noalign} as part of its expansion. This trick
881permits clean macros that pick up arguments. Of course it can be done with
882traditional means but this whole exercise is about making the code look nice.
883
884The table comes out as:
885
886\getbuffer[example]
887
888One can check the flags with \type {\ifflags} which takes a control sequence and
889a number, where valid numbers are:
890
891\starttabulate[|r|lw(8em)|r|lw(8em)|r|lw(8em)|r|lw(8em)|]
892\NC \the\frozenflagcode    \NC frozen
893\NC \the\permanentflagcode \NC permanent
894\NC \the\immutableflagcode \NC immutable
895\NC \the\primitiveflagcode \NC primitive  \NC \NR
896\NC \the\mutableflagcode   \NC mutable
897\NC \the\noalignedflagcode \NC noaligned
898\NC \the\instanceflagcode  \NC instance
899\NC                        \NC            \NC \NR
900\stoptabulate
901
902The level of checking is controlled with the \type {\overloadmode} but I'm still
903not sure about how many levels we need there. A zero value disables checking,
904the values 1 and 3 give warnings and the values 2 and 4 trigger an error.
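
Coming back to \type {\ifflags}, a check could look as follows. This is only a
sketch: the macro names are made up and we assume that the test is true when the
given flag is set.

\starttyping[option=TEX]
\frozen\def\MyFrozen{...}
       \def\MyPlain {...}

\ifflags\MyFrozen\frozenflagcode frozen\else not frozen\fi
\ifflags\MyPlain \frozenflagcode frozen\else not frozen\fi
\stoptyping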
905
906\stopsectionlevel
907
908\startsectionlevel[title=Arguments]
909
910The number of arguments that a macro takes is traditionally limited to nine (or
911ten if one takes the trailing \type {#} into account). That this is enough for
912most cases is demonstrated by the fact that \CONTEXT\ has only a handful of
913macros that use \type {#9}. The reason for this limitation is in part a side
914effect of the way the macro preamble and arguments are parsed. However, because
915in \LUAMETATEX\ we use a different implementation, it was not that hard to permit
a few more arguments, which is why we support up to 15 arguments, as in:
917
918\starttyping[option=TEX]
919\def\foo#1#2#3#4#5#6#7#8#9#A#B#C#D#E#F{...}
920\stoptyping
921
922We can support the whole alphabet without much trouble but somehow sticking to
923the hexadecimal numbers makes sense. It is unlikely that the core of \CONTEXT\
924will use this option but sometimes at the user level it can be handy. The penalty
925in terms of performance can be neglected.
926
927\starttyping[option=TEX]
928\tolerant\def\foo#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=%
929  {(#1)(#2)(#3)(#4)(#5)(#6)(#7)(#8)(#9)(#A)(#B)(#C)(#D)(#E)(#F)}
930
931\foo{1}{2}
932\stoptyping
933
In the previous example we have 15 optional arguments where braces are mandatory
(otherwise the scanner happily scoops up what follows, which for sure gives some
error).
937
938\stopsectionlevel
939
940\startsectionlevel[title=Constants]
941
The \LUAMETATEX\ engine has lots of efficiency tricks in the macro parsing and
expansion code that make it not only fast but also let it use less memory.
944However, every time that the body of a macro is to be injected the expansion
945machinery kicks in. This often means that a copy is made (pushed in the input and
946used afterwards). There are however cases where the body is just a list of
947character tokens (with category letter or other) and no expansion run over the
948list is needed.
949
It is tempting to introduce a string data type that just stores strings and
although that might happen at some point it has the disadvantage that one needs to
tokenize that string in order to be able to use it, which then defeats the gain.
953An alternative has been found in constant macros, that is: a macro without
954parameters and a body that is considered to be expanded and never freed by
955redefinition. There are two variants:
956
957\starttyping[option=TEX]
958\cdef      \foo          {whatever}
959\cdefcsname foo\endcsname{whatever}
960\stoptyping
961
962These are actually just equivalents to
963
964\starttyping[option=TEX]
965\edef      \foo          {whatever}
966\edefcsname foo\endcsname{whatever}
967\stoptyping
968
just to make sure that the body gets expanded at definition time but they are
also marked as being constant which in some cases might give some gain, for
instance when used in csname construction. The gain is less than one expects
although there are a few cases in \CONTEXT\ where extreme usage of parameters
benefits from it. Users are unlikely to use these two primitives.
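
To give an idea of the csname case, here is a sketch where a constant macro
serves as namespace prefix. The names are hypothetical and no claim is made about
how much is gained in this particular case:

\starttyping[option=TEX]
\cdef\MyNamespace{my:namespace:}

\defcsname\MyNamespace one\endcsname{first}
\defcsname\MyNamespace two\endcsname{second}

\csname\MyNamespace one\endcsname % expands to: first
\stoptyping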
974
975Another example of a constant usage is this:
976
977\starttyping[option=TEX]
978\lettonothing\foo
979\stoptyping
980
which gives \type {\foo} an empty body. That one is used in the core, if only because
it gives a bit smaller code. Performance is not that different from
983
984\starttyping[option=TEX]
985\let\foo\empty
986\stoptyping
987
988but it saves one token (8 bytes) when used in a macro. The assignment itself is
989not that different because \type {\foo} is made an alias to \type {\empty} which
990in turn only needs incrementing a reference counter.
991
992\stopsectionlevel
993
994\startsectionlevel[title=Passing parameters]
995
996When you define a macro, the \type {#1} and more parameters are embedded as a
997reference to a parameter that is passed. When we have four parameters, the
998parameter stack has four entries and when an entry is eventually accessed a new
999input level is pushed and tokens are fetched from that list. This has some side
1000effects when we check a parameter. This can happen multiple times, depending on
1001how often we access a parameter. Take the following:
1002
1003\startbuffer
1004\def\oof#1{#1}
1005
1006\tolerant\def\foo[#1]#*[#2]%
1007  {1:\ifparameter#1\or Y\else N\fi\quad
1008   2:\ifparameter#2\or Y\else N\fi\quad
1009   \oof{3:\ifparameter #1\or Y\else N\fi\quad
1010        4:\ifparameter #2\or Y\else N\fi\quad}%
1011   \par}
1012
1013\foo \foo[] \foo[][] \foo[A] \foo[A][B]
1014\stopbuffer
1015
1016\typebuffer
1017
1018This gives:
1019
1020\startpacked \tttf
1021\inlinebuffer
1022\stoppacked
1023
1024as you probably expect. However the first two checks are different from the
1025embedded checks because they can check against the parameter reference. When we
1026expand \type {\oof} its argument gets passed to the macro as a list and when the
1027scanner collects the next token it will then push the parameter content on the
1028input stack. So, then, instead of a reference we get the referenced parameter
1029list. Internally that means that in 3 and 4 we check for a token and not for the
1030length of the list (as in case 1 & 2). This means that
1031
1032\starttyping
1033\iftok{#1}\emptytoks Y\else N\fi
1034\ifparameter#1\or    Y\else N\fi
1035\stoptyping
1036
1037are different. In the first case we have a proper token list and nested
1038conditionals in that list are okay. In the second case we just look ahead to see
1039if there is an \type {\or}, \type {\else} or other condition related command and
if so we decide that there is no parameter. So, whether \type {\ifparameter} is a
suitable check for empty depends on the need for expansion.
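
A typical use of the cheap check is to fall back on a default when an optional
argument is absent. The following is a sketch with a made up macro name; it
follows the \type {\ifparameter} pattern used above, so the second call is
expected to take the \type {\else} branch:

\starttyping[option=TEX]
\tolerant\def\MyLabel[#1]%
  {\ifparameter#1\or(#1)\else(default)\fi}

\MyLabel[given] \MyLabel
\stoptyping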
1042
When you define macros that themselves call macros that should operate on the
arguments of their parent you can easily pass these:
1045
1046\startbuffer[test-1]
1047\def\foo#1#2%
1048  {\oof{#1}{#2}{P}%
1049   \oof{#1}{#2}{Q}%
1050   \oof{#1}{#2}{R}}
1051
1052\def\oof#1#2#3%
1053  {[#1][#1]%
1054   #3%
1055   [#2][#2]}
1056\stopbuffer
1057
1058\typebuffer[test-1]
1059
Here the nested call to \type {\oof} involves three passed parameters. You can
avoid that as follows:
1062
1063\startbuffer[test-2]
1064\def\foo#1#2%
1065  {\def\MyIndexOne{#1}%
1066   \def\MyIndexTwo{#2}%
1067   \oof{P}\oof{Q}\oof{R}}
1068
1069\def\oof#1%
1070  {(\MyIndexOne)(\MyIndexOne)%
1071   #1%
1072   (\MyIndexTwo)(\MyIndexTwo)}
1073\stopbuffer
1074
1075\typebuffer[test-2]
1076
1077You can also do this:
1078
1079\startbuffer[test-3]
1080\def\foo#1#2%
1081  {\def\oof##1%
1082     {/#1/#2/%
1083     ##1%
1084     /#1//#2/}%
1085   \oof{P}\oof{Q}\oof{R}}
1086\stopbuffer
1087
1088\typebuffer[test-3]
1089
1090These parameters indicated by \type {#} in the macro body are in fact references.
1091When we call for instance \type {\foo {1}{2}} the two parameters get pushed on a
1092parameter stack and the embodied references point to these stack entries. By the
1093time that body gets expanded \TEX\ bumps the input level and pushes the parameter
1094list onto the input stack. It then continues expansion. The parameter is not
1095copied, because it can't be changed anyway. The only penalty in terms of
1096performance and memory usage is the pushing and popping of the input. So how does
1097that work out for these three cases?
1098
1099When in the first case the \type {\oof{#1}{#2}{P}} is seen, \TEX\ starts expanding
1100the \type {\oof} macro. That one expects three arguments. The \type {#1} reference is
1101seen and in this case a copy of that parameter is passed. The same is true for the
1102other two. Then, inside \type {\oof} expansion happens on the parameters on the stack
1103and no copies have to be made there.
1104
The second case defines two macros so again two copies are made that make the bodies
of these macros. This comes at the cost of some runtime and memory. However, this
time with \type {\oof{P}} only one argument gets passed and instead expansion of the
macros happens in there.
1109
1110Normally macro arguments are not that large but there can be situations where we
1111really want to avoid useless copying. This not only saves memory but also can give a
1112bit better performance. In the examples above the second variant is some 10\percent
1113faster than the first one. We can gain another 10\percent with the following trick:
1114
1115\startbuffer[test-4]
1116\def\foo#1#2%
1117  {\parameterdef\MyIndexOne\plusone % 1
1118   \parameterdef\MyIndexTwo\plustwo % 2
1119   \oof{P}\oof{Q}\oof{R}\norelax}
1120
1121\def\oof#1%
1122  {<\MyIndexOne><\MyIndexOne>%
1123   #1%
1124   <\MyIndexTwo><\MyIndexTwo>}
1125\stopbuffer
1126
1127\typebuffer[test-4]
1128
1129Here we define an explicit parameter reference that we access later on. There is
1130the overhead of a definition but it can be neglected. We use that reference
1131(abstraction) in \type {\oof}. Actually you can use that reference in any call
1132down the chain.
1133
1134When applied to \type {\foo{1}{2}} the four variants above give us:
1135
1136\startpacked
1137\startlines \tt
1138\getbuffer[test-1]\foo{1}{2}
1139\getbuffer[test-2]\foo{1}{2}
1140\getbuffer[test-3]\foo{1}{2}
1141\getbuffer[test-4]\foo{1}{2}
1142\stoplines
1143\stoppacked
1144
Before we had \type {\parameterdef} we had this:
1146
1147\startbuffer[test-5]
1148\def\foo#1#2%
1149  {\integerdef\MyIndexOne\parameterindex\plusone % 1
1150   \integerdef\MyIndexTwo\parameterindex\plustwo % 2
1151   \oof{P}\oof{Q}\oof{R}\norelax}
1152
1153\def\oof#1%
1154  {<\expandparameter\MyIndexOne><\expandparameter\MyIndexOne>%
1155   #1%
1156   <\expandparameter\MyIndexTwo><\expandparameter\MyIndexTwo>}
1157\stopbuffer
1158
1159\typebuffer[test-5]
1160
It involves more tokens, is a bit less abstract, but as it is a cheap extension
we kept it. It actually demonstrates that one can access parameters in the stack
by index, but one then needs to keep track of where access takes place. In
principle one can debug the call chain this way.
1165
1166To come back to performance and memory usage, when the arguments become larger
1167the fourth variant with the \type {\parameterdef} quickly gains over the others.
1168But it only shows in exceptional usage. This mechanism is more about abstraction:
1169it permits us to efficiently turn arguments into local variables without the
1170overhead involved in creating macros.
1171
1172\stopsectionlevel
1173
1174\startsectionlevel[title=Nesting]
1175
We also have a few preamble features that relate to nesting. Although we can do
without (as shown for years in \LMTX) they do have some benefits. They are
discussed as a group here and because they are only useful for low level
programming we stick to simple examples. The \type {#L} and \type {#R} use the
following token as delimiters. Here we use \type {[} and \type {]} but they can
be a \type {\cs} as well. Nested delimiters are handled well.
1182
The \type {#S} grabs the argument till the next final square bracket \type {]}
but in the process will grab nested pairs when it sees a \type {[}. The \type {#P}
does the same for parentheses and \type {#X} for angle brackets. In the next
examples the \type {#*} just gobbles optional spaces but we've seen that one
already.
1187
1188The \type {#G} argument just registers the next token as delimiter but it will
1189grab multiple of them. The \type {#M} gobbles more: in addition to the delimiter
1190spaces are gobbled.
1191
1192\startbuffer
1193\tolerant\def\fooA               [#1]{(#1)}
1194\tolerant\def\fooB          [#L[#R]#1{(#1)}
1195\tolerant\def\fooC               #S#1{(#1)}
1196\tolerant\def\fooE              #S#1,{(#1)}
1197\tolerant\def\fooF         #S#1#*#S#2{(#1/#2)}
1198\tolerant\def\fooG [#1]#S[#2]#*#S[#3]{(#1/#2/#3)}
1199\tolerant\def\fooH [#1][#S#2]#*[#S#3]{(#1/#2/#3)}
1200\tolerant\def\fooI           #1=#2#G,{(#1=#2)}
1201\tolerant\def\fooJ           #1=#2#M,{(#1=#2)}
1202\stopbuffer
1203
1204\typebuffer
1205
1206\getbuffer
1207
1208\starttabulate[|T|T|T||]
1209\NC \type{\fooA [x]}            \NC \fooA [x]             \NC (x)           \NC \NR
1210\NC \type{\fooB [x]}            \NC \fooB [x]             \NC (x)           \NC \NR
1211\NC \type{\fooC [1[2]3[4]5]}    \NC \fooC [1[2]3[4]5]     \NC (1[2]3[4]5)   \NC \NR
1212\NC \type{\fooE X[,]X,}         \NC \fooE X[,]X,          \NC (X[,]X)       \NC \NR
1213\NC \type{\fooF [A] [B]}        \NC \fooF [A] [B]         \NC (A/B)         \NC \NR
1214\NC \type{\fooF [] []}          \NC \fooF [] []           \NC (/)           \NC \NR
1215\NC \type{\fooG [a][b][c]}      \NC \fooG [a][b][c]       \NC (a/b/c)       \NC \NR
1216\NC \type{\fooG [a][b]}         \NC \fooG [a][b]          \NC (a/b/)        \NC \NR
1217\NC \type{\fooG [a]}            \NC \fooG [a]             \NC (a//)         \NC \NR
1218\NC \type{\fooG [a][x[x]x][c]}  \NC \fooG [a][x[x]x][c]   \NC (a/x[x]x/c)   \NC \NR
1219\NC \type{\fooH [a][x[x]x][c]}  \NC \fooH [a][x[x]x][c]   \NC (a/x[x]x/c)   \NC \NR
1220\NC \type{\fooI X=X,,,}         \NC \fooI X=X,,,          \NC (X=X)         \NC \NR
1221\NC \type{\fooJ X=X, , ,}       \NC \fooJ X=X, , ,        \NC (X=X)         \NC \NR
1222\stoptabulate
1223
These features make it possible to support nested setups more efficiently and
also make it possible to accept values that contain balanced brackets in setup
commands without additional overhead. Although it has never been an issue to let
users specify:
1228
1229\starttyping
1230\defineoverlay[whatever][{some \command[withparameters] here}]
1231
1232\setupfoo[before={\blank[big]}]
1233\stoptyping
1234
1235it might be less confusing to permit:
1236
1237\starttyping
1238\defineoverlay[whatever][some \command[withparameters] here]
1239
1240\setupfoo[before=\blank[big]]
1241\stoptyping
1242
1243as well, if only because occasionally users get hit by this.
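
A command that accepts such input could be sketched as follows. The name \type
{\MyOverlay} is hypothetical, the preamble follows the \type {\fooH} pattern
shown above, and \type {\detokenize} is only there to show what got picked up as
second argument:

\starttyping[option=TEX]
\tolerant\def\MyOverlay[#1]#*[#S#2]%
  {(#1: \detokenize{#2})}

\MyOverlay[whatever][some \command[withparameters] here]
\stoptyping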
1244
1245\stopsectionlevel
1246
1247\startsectionlevel[title=Duplicate hashes]
1248
1249In \TEX\ every character has a so called category code. Most characters are
1250classified as \quote {letter} (they make up words) or as \quote {other}. In
1251\UNICODE\ we distinguish symbols, punctuation, and more, but in \TEX\ these are
1252all of category \quote {other}. In math however we can classify them differently
1253but in this perspective we ignore that. The backslash has category \quote
1254{escape} and it starts a control sequence. The curly braces are (internally) of
1255category \quote {left brace} and \quote {right brace} aka \quote {begin group}
1256and \quote {end group} but, no matter what they are called, they begin and end
1257something: a group, argument, token list, box, etc. Any character can have those
categories. Although it would look strange to a \TEX\ user, this can be made
valid:
1260
1261\startbuffer
1262!protected !gdef !weird¶1
1263B
1264    something: ¶1
1265E
1266!weird BhereE
1267\stopbuffer
1268
1269\typebuffer
1270
1271In such a setup spaces can be of category \quote {invisible}. The paragraph
1272symbol takes the place of the hash as parameter identifier. The next code shows
1273how this is done. Here we wrap all in a macro so that we don't get catcode
1274interference in the document source.
1275
1276\startbuffer[demo]
1277\def\NotSoTeX
1278  {\begingroup
1279   \catcode `B \begingroupcatcode
1280   \catcode `E \endgroupcatcode
1281   \catcode `¶ \parametercatcode
1282   \catcode `! \escapecatcode
1283   \catcode 32 \ignorecatcode
1284   \catcode 13 \ignorecatcode
1285   % this buffer has a definition:
1286   \getbuffer
1287   % which is now known globally
1288   \endgroup}
1289\NotSoTeX
1290\weird{there}
1291\stopbuffer
1292
1293\typebuffer[demo]
1294
1295This results in:
1296
1297\startlines
1298\getbuffer [demo]
1299\stoplines
1300
1301In the first line the \type {!}, \type {B} and \type {E} are used as escape and
1302argument delimiters, in the second one we use the normal characters. When we show
1303the \type {\meaningasis} we get:
1304
1305\startlines \tt
1306\meaningasis\weird
1307\stoplines
1308
1309or in more detail:
1310
1311\start \tt
1312\luatokentable\weird
1313\stop
1314
1315So, no matter how we set up the system, in the end we get some generic
1316representation. When we see \type {#1} in \quote {print} it can be either two
1317tokens, \type {#} (catcode parameter) followed by \type {1} with catcode other,
1318or one token referring to parameter \type {1} where the character \type {1} is
1319the opcode of an internal \quote {reference command}. In order to distinguish a
1320reference from the two token case, parameter hash tokens get shown as doubles.
1321
1322\start
1323
1324\catcode `¶=\parametercatcode
1325\catcode `§=\parametercatcode
1326
1327\startbuffer
1328\def\test #1{x#1x##1x####1x}
1329\def\tset ¶1{x¶1x¶¶1x¶¶¶¶1x}
1330\stopbuffer
1331
1332\typebuffer \getbuffer
1333
1334And with \type {\meaning} we get, consistent with the input:
1335
1336\startlines \tt
1337\meaning\test
1338\meaning\tset
1339\stoplines
1340
1341These are equivalent, apart from the parameter character in the body of the
1342definition:
1343
1344\startlines \tt
1345\luatokentable\test
1346\luatokentable\tset
1347\stoplines
1348
1349\stop
1350
Watch how every \quote {parameter} is just a character with the \UNICODE\ index
of the used input character as property. Let us summarize the process. When a
single parameter character is seen in the input, the next character determines how
it will be interpreted. If there is a digit then it becomes a reference to a
parameter in the preamble, and when followed by another parameter character it
will be appended to the body of the macro and that second one is dropped. So, two
parameter characters become one, and four become two. One parameter character
becomes a reference and from that you can guess what three in a row become.
However, when \TEX\ is showing the macro definition (using \type {\meaning}) the
hashes get duplicated in order to distinguish parameter references from parameter
characters that were kept (e.g.\ for nested definitions). One can make an
argument for \type {\parameterchar} as we also have \type {\escapechar} but by
now this convention is settled and it doesn't look that bad anyway.
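
The halving at each level is what makes nested definitions work. In the next
sketch (with made up names) each level of definition strips one layer of
doubling, so that every macro ends up with a reference to its own argument:

\starttyping[option=TEX]
\def\MyOuter#1%
  {\def\MyInner##1%
     {\def\MyInnermost####1%
        {#1/##1/####1}}}

\MyOuter{a}\MyInner{b}\MyInnermost{c} % a/b/c
\stoptyping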
1364
1365We now come to the more tricky part with respect to the doubling of hashes. When
1366\TEX\ was written its application landscape looked a bit different. For instance,
1367fonts were limited and therefore it was natural to access special characters by
1368name. Using \type {\#} to get a hash in the text was not that problematic, if one
1369needed that character at all. The same can be said for the braces, backslash and
1370even the dollar (after all \TEX\ is free software).
1371
But what if we have more visualization and|/|or serialization than meanings and
tracing? When we opened up the internals in \LUATEX\ and even more in
1374\LUAMETATEX\ the duplicating of hashes became a bit of a problem. There we don't
1375need to distinguish between a parameter reference and a parameter character
1376because by that time these references are resolved. All hashes that we encounter
1377are just that: hashes. And this is why in \LUAMETATEX\ we disable the duplication
1378for those cases where it serves no purpose.
1379
When the engine scans a macro definition it starts with picking up the name of
the macro. Then it starts scanning the preamble up to the left brace. In the
preamble of a macro the scanner converts hashes followed by another token into
a single match token. Then when the macro body is scanned single hashes followed by
a number become a reference, while double hashes become one hash and get
interpreted at expansion time (possibly triggering an error when not followed by
a valid specifier like a number). In traditional \TEX\ we basically had this:
1387
1388\starttyping
1389\def\test#1{#1}
1390\def\test#1{##}
1391\def\test#1{#X}
1392\def\test#1{##1}
1393\stoptyping
1394
There can be a trailing \type {#} in the preamble for special purposes but we
forget about that now. The first definition is valid, the second definition is
invalid when the macro is expanded and the third definition triggers an error at
definition time. The last definition will again trigger an error at expansion
time.
1400
1401However, in \LUAMETATEX\ we have an extended preamble where the following
1402preamble parameters are handled (some only in tolerant mode):
1403
1404\starttabulate[|c|||]
\NC \type{#n} \NC parameter                                   \NC index \type{1} up to \type{F} \NC \NR
1406\TB
1407\NC \type{#0} \NC throw away parameter                        \NC increment index              \NC \NR
1408\NC \type{#-} \NC ignore parameter                            \NC keep index                   \NC \NR
1409\TB
1410\NC \type{#*} \NC gobble white space                          \NC                              \NC \NR
1411\NC \type{#+} \NC keep (honor) the braces                     \NC                              \NC \NR
1412\NC \type{#.} \NC ignore pars and spaces                      \NC                              \NC \NR
1413\NC \type{#,} \NC push back space when no match               \NC                              \NC \NR
1414\NC \type{#/} \NC remove leading and trailing spaces and pars \NC                              \NC \NR
\NC \type{#=} \NC braces are mandatory                        \NC                              \NC \NR
\NC \type{#^} \NC keep leading spaces                         \NC                              \NC \NR
\NC \type{#_} \NC braces are mandatory and kept (obey)        \NC                              \NC \NR
1418\TB
1419\NC \type{#@} \NC par delimiter                               \NC only for internal usage      \NC \NR
1420\TB
1421\NC \type{#:} \NC pick up scanning here                       \NC                              \NC \NR
1422\NC \type{#;} \NC quit scanning                               \NC                              \NC \NR
1423\TB
1424\NC \type{#L} \NC left delimiter token                        \NC followed by token            \NC \NR
1425\NC \type{#R} \NC right delimiter token                       \NC followed by token            \NC \NR
1426\TB
1427\NC \type{#G} \NC gobble token                                \NC followed by token            \NC \NR
1428\NC \type{#M} \NC gobble token and spaces                     \NC followed by token            \NC \NR
1429\TB
1430\NC \type{#S} \NC nest square brackets                        \NC only inner pairs             \NC \NR
1431\NC \type{#X} \NC nest angle brackets                         \NC only inner pairs             \NC \NR
1432\NC \type{#P} \NC nest parentheses                            \NC only inner pairs             \NC \NR
1433\stoptabulate
1434
1435As mentioned these will become so called match tokens and only when we show the
1436meaning the hash will show up again.
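
The difference between \type {#0} and \type {#-} is easiest seen side by side.
This is a sketch with made up names that takes the table literally: in both
cases the second argument is dropped, but only in the first case it counts, so
the next one is \type {#3} there and \type {#2} in the second case.

\starttyping[option=TEX]
\def\MyFirstAndThird #1#0#3{(#1)(#3)}
\def\MyFirstAndSecond#1#-#2{(#1)(#2)}

\MyFirstAndThird {a}{b}{c}
\MyFirstAndSecond{a}{b}{c}
\stoptyping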
1437
1438\startbuffer
\def\test[#1]#*[#S#2]{.#1.#2.}
1440\stopbuffer
1441
1442\typebuffer \getbuffer
1443
1444\startlines \tt
1445\luatokentable\test
1446\stoplines
1447
This means that in the body of a macro you will not see \type {#*} show up. It is
just a directive that tells the macro parser that spaces are to be skipped. The
\type {#S} directive makes the parser for the second parameter handle nested
square brackets. The only hash that we can see end up in the body is the one that
we entered as double hash (then turned single) followed by (in traditional terms)
a number that, when all gets parsed, will then become a reference: the sequence
\type {##1} internally is \type {#1} and becomes \quote {reference to parameter
1} assuming that we define a macro in that body. If no number is there, an error
is issued. This opens up the possibility to add more variants because it will
only break compatibility with respect to what is seen as error. As with the
preamble extensions, old documents that have them would have crashed before they
became available.
1460
1461So, this means that in the body, and actually anywhere in the document apart from
1462preambles, we now support the following general parameter specifiers. Keep in
1463mind that they expand in an expansion context which can be tricky when they
1464overlap with preamble entries, like for instance \type {#R} in such an expansion.
1465Future extensions can add more so {\em any} hashed shortcut is sensitive for
1466that.
1467
1468\starttabulate[|l|||]
1469\NC \type{#I} \NC current iterator     \NC \type {\currentloopiterator}    \NC \NR
1470\NC \type{#P} \NC parent iterator      \NC \type {\previousloopiterator 1} \NC \NR
1471\NC \type{#G} \NC grandparent iterator \NC \type {\previousloopiterator 2} \NC \NR
1472\TB
1473\NC \type{#H} \NC hash escape          \NC \type {#}  \NC \NR
1474\NC \type{#S} \NC space escape         \NC \ruledhbox to  \interwordspace{\novrule height .8\strutht} \NC \NR
1475\NC \type{#T} \NC tab escape           \NC \type {\t} \NC \NR
1476\NC \type{#L} \NC newline escape       \NC \type {\n} \NC \NR
1477\NC \type{#R} \NC return escape        \NC \type {\r} \NC \NR
1478\NC \type{#X} \NC backslash escape     \NC \tex  {}   \NC \NR
1479\TB
1480\NC \type{#N} \NC nbsp \NC \type {U+00A0} (under consideration) \NC \NR
1481\NC \type{#Z} \NC zws  \NC \type {U+200B} (under consideration) \NC \NR
1482%NC \type{#-} \NC zwnj \NC \type {U+200C} (under consideration) \NC \NR
1483%NC \type{#+} \NC zwj  \NC \type {U+200D} (under consideration) \NC \NR
1484%NC \type{#>} \NC l2r  \NC \type {U+200E} (under consideration) \NC \NR
1485%NC \type{#<} \NC r2l  \NC \type {U+200F} (under consideration) \NC \NR
1486\stoptabulate
1487
Some will now argue that we already have \type {^^} escapes in \TEX\ and \type
{^^^^} and \type {^^^^^^} in \LUATEX\ and that is true. However, these can be
disabled, and in \CONTEXT\ they are, where we instead enable the prescript,
postscript, and index features in math mode and there \type {^} and \type {_} are
used. Even more: in \CONTEXT\ we just let \type {^}, \type {_} and \type {&} be
what they are. Occasionally I consider \type {$} to be just that but as I don't
have dollars I will happily leave that for inline math. When users are not
defining macros or are using the alternative definitions we can consider making
the \type {#} a hash. An excellent discussion of how \TEX\ reads its input and
changes state accordingly can be found in Victor Eijkhout's \quotation {\TEX\ By
Topic}, section 2.6: when \type {^^} is followed by a character with $v < 128$
the interpreter will inject a character with code $v - 64$ (or $v + 64$ when
$v < 64$). When followed by two (!) lowercase hexadecimal characters, the
corresponding character will be injected. Anyway, it not only looks kind of ugly,
it also is somewhat weird because what follows is interpreted in a mixed way. The
substitution happens early on (which is okay). But, how about the output?
Traditional \TEX\ serializes special characters with a similar syntax but that
has become optional when eight bit mode was added to the engines, it is
configurable in \LUATEX\ and has been dropped in \LUAMETATEX: we operate in a
\UTF\ universe.
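
For those who never had to deal with this notation, here are a few worked cases
of the traditional rule. They are only meant as illustration of the arithmetic;
as said, in \CONTEXT\ this input method is disabled.

\starttyping
^^M   : M is 77, so we get 77 - 64 =  13 (carriage return)
^^@   : @ is 64, so we get 64 - 64 =   0 (null)
^^?   : ? is 63, so we get 63 + 64 = 127 (delete)
^^41  : two lowercase hexadecimal digits, 0x41 = 65 (A)
^^5e  : two lowercase hexadecimal digits, 0x5E = 94 (the circumflex itself)
\stoptyping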
1507
1508\stopsectionlevel
1509
1510\stopdocument
1511
1512% freezing pitfalls:
1513%
1514% - \futurelet  : \overloaded needed
1515% - \let        : \overloaded sometimes needed
1516%
1517% primitive protection:
1518%
1519% \newif\iffoo \footrue \foofalse : problem when we make iftrue and iffalse
1520% permanent ... they inherit, so we can't let them, we need a not permanent
1521% alias which is again tricky ... something native?
1522%
1523% immutable : still \count000 but we can consider blocking that, for instance
1524% by \def\count{some error}
1525%
1526% \defcsname
1527% \edefcsname
1528% \letcsname
1529
1530% {
1531%     \scratchdimenone 10pt \the\currentstacksize\par
1532%     \scratchdimentwo 10pt \the\currentstacksize\par
1533%     \scratchdimenone 20pt \the\currentstacksize\par
1534%     \scratchdimentwo 20pt \the\currentstacksize\par
1535%     \scratchdimenone 10pt \the\currentstacksize\par
1536%     {
1537%         \scratchdimenone 10pt \the\currentstacksize\par
1538%         \scratchdimentwo 20pt \the\currentstacksize\par
1539%     }
1540% }
1541