% still-tokens.tex / 28 Kb / last modification: 2023-12-21 09:43
% language=us

\environment still-environment

\starttext

\startchapter[title=Scanning input]

\startsection[title=Introduction]

Tokens are the building blocks of the input for \TEX\ and they drive the process
of expansion which in turn results in typesetting. If you want to manipulate the
input, intercepting tokens is one approach. Other solutions are preprocessing or
writing macros that do something with their picked|-|up arguments. In \CONTEXT\
\MKIV\ we often forget about manipulating the input but manipulate the
intermediate typesetting results instead. The advantage is that only at that
moment do you know what you're truly dealing with, but a disadvantage is that
parsing the so-called node lists is not always efficient and it can even be
rather complex, for instance in math. It remains a fact that until \LUATEX\
version 0.80 \CONTEXT\ hardly used the token interface.

In version 0.80 a new scanner interface was introduced, demonstrated by Taco
Hoekwater at the \CONTEXT\ conference 2014. Luigi Scarso and I integrated that
code and I added a few more functions. Eventually the team will kick out the old
token library and overhaul the input|-|related code in \LUATEX, because no
callback is needed any more (and also because the current code still has traces
of multiple \LUA\ instances). This will happen stepwise to give users who use the
old mechanism an opportunity to adapt.

Here I will show a bit of the new token scanners and explain how they can be used
in \CONTEXT. Some of the additional scanners written on top of the built|-|in ones
will probably end up in the generic \LUATEX\ code that ships with \CONTEXT.

\stopsection

\startsection[title=The \TEX\ scanner]

The new token scanner library of \LUATEX\ provides a way to hook \LUA\ into \TEX\
in a rather natural way. I have to admit that I never had any real demand for
such a feature but now that we have it, it is worth exploring.

The \TEX\ scanner roughly provides the following sub-scanners that are used to
implement primitives: keyword, token, token list, dimension, glue and integer.
Deep down there are specific variants for scanning, for instance, font dimensions
and special numbers.

A token is a unit of input, and one or more characters are turned into a token.
How a character is interpreted is determined by its current catcode. For instance
a backslash is normally tagged as `escape character' which means that it starts a
control sequence: a macro name or primitive. This means that once it is scanned a
macro name travels as one token through the system. Take this:

\starttyping
\def\foo#1{\scratchcounter=123#1\relax}
\stoptyping

Here \TEX\ scans \type {\def} and turns it into a token. This particular token
triggers a specific branch in the scanner. First a name is scanned with
optionally an argument specification. Then the body is scanned and the macro is
stored in memory. Because \type {\scratchcounter}, \type {\relax} and \type {#1}
are turned into tokens, this body has 7~tokens.

When the macro \type {\foo} is referenced the body gets expanded which here means
that the scanner will scan for an argument first and uses that in the
replacement. So, the scanner switches between different states. Sometimes tokens
are just collected and stored, in other cases they get expanded immediately into
some action.

\stopsection

\startsection[title=Scanning from \LUA]

The basic building blocks of the scanner are available at the \LUA\ end, for
instance:

\starttyping
\directlua{print(token.scan_int())} 123
\stoptyping

This will print \type {123} to the console. Or, you can store the number and
use it later:

\starttyping
\directlua{SavedNumber = token.scan_int()} 123

We saved: \directlua{tex.print(SavedNumber)}
\stoptyping

The number of scanner functions is (on purpose) limited but you can use them to
write additional ones as you can just grab tokens, interpret them and act
accordingly.
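
For instance, a boolean scanner (the table below lists \type {scanners.boolean}
as such a \CONTEXT\ helper) can be sketched on top of \type {scan_keyword}; this
is a simplified version, not the actual \CONTEXT\ implementation:

\starttyping
function scan_boolean()
  -- look ahead for the words "true" or "false" and gobble
  -- whichever one matches
  if token.scan_keyword("true") then
    return true
  elseif token.scan_keyword("false") then
    return false
  end
  -- nothing matched: return nil and leave the input alone
end
\stoptyping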

The \type {scan_int} function picks up a number. This can also be a counter, a
named (math) character or a numeric expression. In \TEX, numbers are integers;
floating|-|point is not supported naturally. With \type {scan_dimen} a dimension
is grabbed, where a dimen is either a number (float) followed by a unit, a dimen
register or a dimen expression (internally, all become integers). Of course
internal quantities are also okay. There are two optional arguments, the first
indicating that we accept a filler as unit, while the second indicates that math
units are expected. When an integer or dimension is scanned, tokens are expanded
till the input is a valid number or dimension. The \type {scan_glue} function
takes one optional argument: a boolean indicating if the units are math.
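
Because dimensions become integers internally, \type {scan_dimen} hands back the
value in scaled points. A small test, analogous to the integer example above
(1pt being 65536sp, we expect \type {655360} on the console):

\starttyping
\directlua{print(token.scan_dimen())} 10pt
\stoptyping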

The \type {scan_toks} function picks up a (normally) brace|-|delimited sequence of
tokens and (\LUATEX\ 0.80) returns them as a table of tokens. The function \type
{get_token} returns one (unexpanded) token while \type {scan_token} returns
an expanded one.

Because strings are natural to \LUA\ we also have \type {scan_string}. This one
converts a following brace|-|delimited sequence of tokens into a proper string.
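
So, analogous to the integer example earlier:

\starttyping
\directlua{print(token.scan_string())} {hello world}
\stoptyping

This prints \type {hello world} to the console.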

The function \type {scan_keyword} looks for the given keyword and when found skips
over it and returns \type {true}. Here is an example of usage: \footnote {In
\LUATEX\ 0.80 you should use \type {newtoken} instead of \type {token}.}

\starttyping
function ScanPair()
  local one = 0
  local two = ""
  while true do
    if token.scan_keyword("one") then
      one = token.scan_int()
    elseif token.scan_keyword("two") then
      two = token.scan_string()
    else
      break
    end
  end
  tex.print("one: ",one,"\\par")
  tex.print("two: ",two,"\\par")
end
\stoptyping

This can be used as:

\starttyping
\directlua{ScanPair()}
\stoptyping
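
followed by the keyword|/|value pairs, which can come in any order or be left
out, for instance:

\starttyping
\directlua{ScanPair()} one 123 two {some text}
\directlua{ScanPair()} two {some text} one 123
\directlua{ScanPair()}
\stoptyping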

You can scan for an explicit character (class) with \type {scan_code}. This
function takes a positive number as argument and returns a character or \type
{nil}.

\starttabulate[|r|r|l|]
\NC \cldcontext{tokens.bits.escape     } \NC  0 \NC \type{escape}      \NC \NR
\NC \cldcontext{tokens.bits.begingroup } \NC  1 \NC \type{begingroup}  \NC \NR
\NC \cldcontext{tokens.bits.endgroup   } \NC  2 \NC \type{endgroup}    \NC \NR
\NC \cldcontext{tokens.bits.mathshift  } \NC  3 \NC \type{mathshift}   \NC \NR
\NC \cldcontext{tokens.bits.alignment  } \NC  4 \NC \type{alignment}   \NC \NR
\NC \cldcontext{tokens.bits.endofline  } \NC  5 \NC \type{endofline}   \NC \NR
\NC \cldcontext{tokens.bits.parameter  } \NC  6 \NC \type{parameter}   \NC \NR
\NC \cldcontext{tokens.bits.superscript} \NC  7 \NC \type{superscript} \NC \NR
\NC \cldcontext{tokens.bits.subscript  } \NC  8 \NC \type{subscript}   \NC \NR
\NC \cldcontext{tokens.bits.ignore     } \NC  9 \NC \type{ignore}      \NC \NR
\NC \cldcontext{tokens.bits.space      } \NC 10 \NC \type{space}       \NC \NR
\NC \cldcontext{tokens.bits.letter     } \NC 11 \NC \type{letter}      \NC \NR
\NC \cldcontext{tokens.bits.other      } \NC 12 \NC \type{other}       \NC \NR
\NC \cldcontext{tokens.bits.active     } \NC 13 \NC \type{active}      \NC \NR
\NC \cldcontext{tokens.bits.comment    } \NC 14 \NC \type{comment}     \NC \NR
\NC \cldcontext{tokens.bits.invalid    } \NC 15 \NC \type{invalid}     \NC \NR
\stoptabulate

So, if you want to grab the character you can say:

\starttyping
local c = token.scan_code(2^10 + 2^11 + 2^12)
\stoptyping

In \CONTEXT\ you can say:

\starttyping
local c = tokens.scanners.code(
  tokens.bits.space +
  tokens.bits.letter +
  tokens.bits.other
)
\stoptyping

When no argument is given, the next character with catcode letter or other is
returned (if found).

In \CONTEXT\ we use the \type {tokens} namespace which has additional scanners
available. That way we can remain compatible. I can add more scanners when
needed, although it is not expected that users will use this mechanism directly.

\starttabulate[||||]
\NC \type {(new)token}   \NC \type {tokens}             \NC arguments \NC \NR
\HL
\NC                      \NC \type {scanners.boolean}   \NC \NC \NR
\NC \type {scan_code}    \NC \type {scanners.code}      \NC \type {(bits)}      \NC \NR
\NC \type {scan_dimen}   \NC \type {scanners.dimension} \NC \type {(fill,math)} \NC \NR
\NC \type {scan_glue}    \NC \type {scanners.glue}      \NC \type {(math)}      \NC \NR
\NC \type {scan_int}     \NC \type {scanners.integer}   \NC \NC \NR
\NC \type {scan_keyword} \NC \type {scanners.keyword}   \NC \NC \NR
\NC                      \NC \type {scanners.number}    \NC \NC \NR
\NC \type {scan_token}   \NC \type {scanners.token}     \NC \NC \NR
\NC \type {scan_tokens}  \NC \type {scanners.tokens}    \NC \NC \NR
\NC \type {scan_string}  \NC \type {scanners.string}    \NC \NC \NR
\NC \type {scan_word}    \NC \type {scanners.word}      \NC \NC \NR
\NC \type {get_token}    \NC \type {getters.token}      \NC \NC \NR
\NC \type {set_macro}    \NC \type {setters.macro}      \NC \type {(catcodes,cs,str,global)} \NC \NR
\stoptabulate

All except \type {get_token} (or its alias \type {getters.token}) expand tokens
in order to satisfy the demands.

Here are some examples of how we can use the scanners. When we would call
\type {Foo} with regular arguments we do this:

\starttyping
\def\foo#1{%
  \directlua {
    Foo("whatever","#1",{n = 1})
  }
}
\stoptyping

but when \type {Foo} uses the scanners it becomes:

\starttyping
\def\foo#1{%
  \directlua{Foo()} {whatever} {#1} n {1}\relax
}
\stoptyping

In the first case we have a function \type {Foo} like this:

\starttyping
function Foo(what,str,n)
  --
  -- do something with these three parameters
  --
end
\stoptyping

and in the second variant we have (using the \type {tokens} namespace):

\starttyping
function Foo()
  local what = tokens.scanners.string()
  local str  = tokens.scanners.string()
  local n    = tokens.scanners.keyword("n") and
               tokens.scanners.integer() or 0
  --
  -- do something with these three parameters
  --
end
\stoptyping
The string scanned is kind of special as the result depends on what is seen.
Given the following definitions:

\startbuffer
           \def\bar  {bar}
\unexpanded\def\ubar {ubar} % \protected in plain etc
           \def\foo  {foo-\bar-\ubar}
           \def\wrap {{foo-\bar}}
           \def\uwrap{{foo-\ubar}}
\stopbuffer

\typebuffer

\getbuffer

We get:

\def\TokTest{\ctxlua{
    local s = tokens.scanners.string()
    context("\\bgroup\\red\\tt")
    context.verbatim(s)
    context("\\egroup")
}}

\starttabulate[|l|Tl|]
\NC \type{{foo}}       \NC \TokTest {foo}       \NC \NR
\NC \type{{foo-\bar}}  \NC \TokTest {foo-\bar}  \NC \NR
\NC \type{{foo-\ubar}} \NC \TokTest {foo-\ubar} \NC \NR
\NC \type{foo-\bar}    \NC \TokTest foo-\bar    \NC \NR
\NC \type{foo-\ubar}   \NC \TokTest foo-\ubar   \NC \NR
\NC \type{foo$bar$}    \NC \TokTest foo$bar$    \NC \NR
\NC \type{\foo}        \NC \TokTest \foo        \NC \NR
\NC \type{\wrap}       \NC \TokTest \wrap       \NC \NR
\NC \type{\uwrap}      \NC \TokTest \uwrap      \NC \NR
\stoptabulate

Because scanners look ahead the following happens: when an open brace is seen (or
any character marked as left brace) the scanner picks up tokens and expands them
unless they are protected; so, effectively, it scans as if the body of an \type
{\edef} is scanned. However, when the next token is a control sequence it will be
expanded first to see if there is a left brace, so there we get the full
expansion. In practice this is convenient behaviour because the braced variant
permits us to pick up meanings honouring protection. Of course this is all a side
effect of how \TEX\ scans.\footnote {This lookahead expansion can sometimes give
unexpected side effects because often \TEX\ pushes back a token when a condition
is not met. For instance when it scans a number, scanning stops when no digits
are seen but the scanner has to look at the next (expanded) token in order to
come to that conclusion. In the process it will, for instance, expand
conditionals. This means that intermediate catcode changes will not be effective
(or applied) to already-seen tokens that were pushed back into the input. This
also happens with, for instance, \cs {futurelet}.}

With the braced variant one can of course use primitives like \type {\detokenize}
and \type {\unexpanded} (in \CONTEXT: \type {\normalunexpanded}, as we already
had this mechanism before it was added to the engine).

\stopsection

\startsection[title=Considerations]

Performance|-|wise there is not much difference between these methods. With some
effort you can make the second approach faster than the first but in practice you
will not notice much gain. So, the main motivation for using the scanner is that
it provides a more \TEX|-|ified interface. When playing with the initial version
of the scanners I did some tests with performance|-|sensitive \CONTEXT\ calls and
the difference was measurable (positive) but deciding if and when to use the
scanner approach was not easy. Sometimes embedded \LUA\ code looks better, and
sometimes \TEX\ code. Eventually we will end up with a mix. Here are some
considerations:

\startitemize
\startitem
    In both cases there is the overhead of a \LUA\ call.
\stopitem
\startitem
    In the pure \LUA\ case the whole argument is tokenized by \TEX\ and then
    converted to a string that gets compiled by \LUA\ and executed.
\stopitem
\startitem
    When the scan happens in \LUA\ there are extra calls to functions but
    scanning still happens in \TEX; some token to string conversion is avoided
    and compilation can be more efficient.
\stopitem
\startitem
    When data comes from external files, parsing with \LUA\ is in most cases more
    efficient than parsing by \TEX.
\stopitem
\startitem
    A macro package like \CONTEXT\ wraps functionality in macros and is
    controlled by key|/|value specifications. There is often no benefit in terms
    of performance when delegating to the mentioned scanners.
\stopitem
\stopitemize

Another consideration is that when using macros, parameters are often passed
between \type {{}}:

\starttyping
\def\foo#1#2#3%
  {...}
\foo {a}{123}{b}
\stoptyping

and suddenly changing that to

\starttyping
\def\foo{\directlua{Foo()}}
\stoptyping

and using that as:

\starttyping
\foo {a} {b} n 123
\stoptyping
means that \type {{123}} will fail. So, eventually you will end up with something
like this:

\starttyping
\def\myfakeprimitive{\directlua{Foo()}}
\def\foo#1#2#3{\myfakeprimitive {#1} {#2} n #3 }
\stoptyping

and:

\starttyping
\foo {a} {b} {123}
\stoptyping

So in the end you don't gain much here apart from the fact that the fake
primitive can be made more clever and accept optional arguments. But such new
features are often hidden for the user who uses more high|-|level wrappers.

When you code in pure \TEX\ and want to grab a number directly you need to test
for the braced case; when you use the \LUA\ scanner method you still need to test
for braces. The scanners are consistent with the way \TEX\ works. Of course you
can write helpers that do some checking for braces in \LUA, so there are no real
limitations, but it adds some overhead (and maybe also confusion).
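
As a sketch of such a helper (the name \type {scan_braced_int} is invented here),
one can use \type {scan_code} with the begingroup and endgroup bits from the
table shown before:

\starttyping
function scan_braced_int()
  if token.scan_code(2^1) then   -- gobble a left brace when present
    local n = token.scan_int()   -- the number between the braces
    token.scan_code(2^2)         -- gobble the matching right brace
    return n
  else
    return token.scan_int()      -- no braces: business as usual
  end
end
\stoptyping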

One way to speed up the call is to use the \type {\luafunction} primitive in
combination with predefined functions and although both mechanisms can benefit
from this, the scanner approach gets more out of that as this method cannot be
used with regular function calls that get arguments. In (rather low level) \LUA\
it looks like this:

\starttyping
luafunctions[1] = function()
  local a = token.scan_string()
  local n = token.scan_int()
  local b = token.scan_string()
  -- whatever --
end
\stoptyping

And in \TEX:

\starttyping
\luafunction1 {a} 123 {b}
\stoptyping

This can of course be wrapped as:

\starttyping
\def\myprimitive{\luafunction1 }
\stoptyping

\stopsection

\startsection[title=Applications]

The question now pops up: where can this be used? Can you really make new
primitives? The answer is yes. You can write code that exclusively stays on the
\LUA\ side but you can also do some magic and then print back something to \TEX.
Here we use the basic token interface, not \CONTEXT:

\startbuffer
\directlua {
local token = newtoken or token
function ColoredRule()
  local w, h, d, c, t
  while true do
    if token.scan_keyword("width") then
      w = token.scan_dimen()
    elseif token.scan_keyword("height") then
      h = token.scan_dimen()
    elseif token.scan_keyword("depth") then
      d = token.scan_dimen()
    elseif token.scan_keyword("color") then
      c = token.scan_string()
    elseif token.scan_keyword("type") then
      t = token.scan_string()
    else
      break
    end
  end
  if c then
    tex.sprint("\\color[",c,"]{")
  end
  if t == "vertical" then
    tex.sprint("\\vrule")
  else
    tex.sprint("\\hrule")
  end
  if w then
    tex.sprint("width ",w,"sp")
  end
  if h then
    tex.sprint("height ",h,"sp")
  end
  if d then
    tex.sprint("depth ",d,"sp")
  end
  if c then
    tex.sprint("\\relax}")
  end
end
}
\stopbuffer

\typebuffer \getbuffer

This can be given a \TEX\ interface like:

\startbuffer
\def\myhrule{\directlua{ColoredRule()} type {horizontal} }
\def\myvrule{\directlua{ColoredRule()} type {vertical} }
\stopbuffer

\typebuffer \getbuffer

And used as:

\startbuffer
\myhrule width \hsize height 1cm color {darkred}
\stopbuffer

\typebuffer

giving:

% when no newtokens:
%
% \startbuffer
% \blackrule[width=\hsize,height=1cm,color=darkred]
% \stopbuffer

\startlinecorrection \getbuffer \stoplinecorrection

Of course \CONTEXT\ users can use the following command to color an
otherwise black rule (likewise):

\startbuffer
\blackrule[width=\hsize,height=1cm,color=darkgreen]
\stopbuffer

\typebuffer \startlinecorrection \getbuffer \stoplinecorrection

The official \CONTEXT\ way to define such a new command is the following. The
conversion back to verbose dimensions is needed because we pass back to \TEX.

\startbuffer
\startluacode
local myrule = tokens.compile {
  {
    { "width",  "dimension", "todimen" },
    { "height", "dimension", "todimen" },
    { "depth",  "dimension", "todimen" },
    { "color",  "string" },
    { "type",   "string" },
  }
}

interfaces.scanners.ColoredRule = function()
  local t = myrule()
  context.blackrule {
    color  = t.color,
    width  = t.width,
    height = t.height,
    depth  = t.depth,
  }
end
\stopluacode
\stopbuffer

\typebuffer \getbuffer

With:

\startbuffer
\unprotect \let\myrule\clf_ColoredRule \protect
\stopbuffer

\typebuffer \getbuffer

and

\startbuffer
\myrule width \textwidth height 1cm color {maincolor} \relax
\stopbuffer

\typebuffer

we get:

% when no newtokens:
%
% \startbuffer
% \blackrule[width=\hsize,height=1cm,color=maincolor]
% \stopbuffer

\startlinecorrection \getbuffer \stoplinecorrection

There are many ways to use the scanners and each has its charm. We will look at
some alternatives from the perspective of performance. The timings are more meant
as relative measures than absolute ones. After all it depends on the hardware. We
assume the following shortcuts:

\starttyping
local scannumber  = tokens.scanners.number
local scankeyword = tokens.scanners.keyword
local scanword    = tokens.scanners.word
\stoptyping

We will scan for four different keys and values. The number is scanned using a
helper \type {scannumber} that scans for a number that is acceptable for \LUA.
Thus, \type {1.23} is valid, as are \type {0x1234} and \type {12.12E4}.
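
Such a \type {scannumber} is not built into the engine; a minimal sketch (the
real \CONTEXT\ helper is more careful) can grab a space|-|delimited word and let
\LUA\ do the conversion:

\starttyping
local function scannumber()
  -- tonumber accepts 1.23, 0x1234 and 12.12E4 alike
  return tonumber(scanword())
end
\stoptyping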

% interfaces.scanners.test_scaling_a

\starttyping
function getmatrix()
  local sx, sy = 1, 1
  local rx, ry = 0, 0
  while true do
    if scankeyword("sx") then
      sx = scannumber()
    elseif scankeyword("sy") then
      sy = scannumber()
    elseif scankeyword("rx") then
      rx = scannumber()
    elseif scankeyword("ry") then
      ry = scannumber()
    else
      break
    end
  end
  -- action --
end
\stoptyping

Scanning the following specification 100000 times takes 1.00 seconds:

\starttyping
sx 1.23 sy 4.5 rx 1.23 ry 4.5
\stoptyping

The \quote {tight} case takes 0.94 seconds:

\starttyping
sx1.23 sy4.5 rx1.23 ry4.5
\stoptyping
% interfaces.scanners.test_scaling_b

We can compare this to scanning without keywords. In that case there have to be
exactly four arguments. These have to be given in the right order which is no big
deal as often such helpers are encapsulated in a user|-|friendly macro.

\starttyping
function getmatrix()
  local sx, sy = scannumber(), scannumber()
  local rx, ry = scannumber(), scannumber()
  -- action --
end
\stoptyping

As expected, this is more efficient than the previous examples. It takes 0.80
seconds to scan this 100000 times:

\starttyping
1.23 4.5 1.23 4.5
\stoptyping

A third alternative is the following:

\starttyping
function getmatrix()
  local sx, sy = 1, 1
  local rx, ry = 0, 0
  while true do
    local kw = scanword()
    if kw == "sx" then
      sx = scannumber()
    elseif kw == "sy" then
      sy = scannumber()
    elseif kw == "rx" then
      rx = scannumber()
    elseif kw == "ry" then
      ry = scannumber()
    else
      break
    end
  end
  -- action --
end
\stoptyping

Here we scan for a keyword and assign a number to the right variable. This one
call happens to be less efficient than calling \type {scan_keyword} 10 times
($4+3+2+1$) for the explicit scan. This run takes 1.11 seconds for the next line.
The spaces are really needed as words can be anything that has no space.
\footnote {Hard|-|coding the word scan in a \CCODE\ helper makes little sense, as
different macro packages can have different assumptions about what a word is. And
we don't extend \LUATEX\ for specific macro packages.}

\starttyping
sx 1.23 sy 4.5 rx 1.23 ry 4.5
\stoptyping

Of course these numbers need to be compared to a baseline of no scanning (i.e.\
the overhead of a \LUA\ call), which here amounts to 0.10 seconds. This brings
us to the following table.

\starttabulate[|l|l|]
\NC keyword checks \NC 0.9 sec \NC \NR
\NC no keywords    \NC 0.7 sec \NC \NR
\NC word checks    \NC 1.0 sec \NC \NR
\stoptabulate

The differences are not that impressive given the number of calls. Even in a
complex document the overhead of scanning can be negligible compared to the
actions involved in typesetting the document. In fact, there will always be some
kind of scanning for such macros so we're talking about even less impact. So you
can just use the method you like most. In practice, the extra overhead of using
keywords in combination with explicit checks (the first case) is rather
convenient.

If you don't want to have many tests you can do something like this:

\starttyping
local keys = {
  sx = scannumber, sy = scannumber,
  rx = scannumber, ry = scannumber,
}

function getmatrix()
  local values = { }
  while true do
    local found = false
    for key, scan in next, keys do
      if scankeyword(key) then
        values[key] = scan()
        found = true
        break
      end
    end
    if not found then
      break
    end
  end
  -- action --
end
\stoptyping

This is still quite fast although one now has to access the values in a table.
Working with specifications like this is clean anyway so in \CONTEXT\ we have a
way to abstract the previous definition.

\starttyping
local specification = tokens.compile {
  {
    { "sx", "number" }, { "sy", "number" },
    { "rx", "number" }, { "ry", "number" },
  },
}

function getmatrix()
  local values = specification()
  -- action using values.sx etc --
end
\stoptyping

Although one can make complex definitions this way, the question remains if it
is a better approach than passing \LUA\ tables. The standard \CONTEXT\ way for
controlling features is:

\starttyping
\getmatrix[sx=1.2,sy=3.4]
\stoptyping

So it doesn't matter much if deep down we see:

\starttyping
\def\getmatrix[#1]%
  {\getparameters[@@matrix][sx=1,sy=1,rx=1,ry=1,#1]%
   \domatrix
     \@@matrixsx
     \@@matrixsy
     \@@matrixrx
     \@@matrixry
   \relax}
\stoptyping

or:

\starttyping
\def\getmatrix[#1]%
  {\getparameters[@@matrix][sx=1,sy=1,rx=1,ry=1,#1]%
   \domatrix
     sx \@@matrixsx
     sy \@@matrixsy
     rx \@@matrixrx
     ry \@@matrixry
   \relax}
\stoptyping
In the second variant, \type {\domatrix} (with keywords) can be a scanner like
the one we defined before:

\starttyping
\def\domatrix#1#2#3#4%
  {\directlua{getmatrix()}}
\stoptyping

but also:

\starttyping
\def\domatrix#1#2#3#4%
  {\directlua{getmatrix(#1,#2,#3,#4)}}
\stoptyping

given:

\starttyping
function getmatrix(sx,sy,rx,ry)
    -- action using sx etc --
end
\stoptyping

or maybe nicer:

\starttyping
\def\domatrix#1#2#3#4%
  {\directlua{getmatrix{
     sx = #1,
     sy = #2,
     rx = #3,
     ry = #4
   }}}
\stoptyping

assuming:

\starttyping
function getmatrix(values)
    -- action using values.sx etc --
end
\stoptyping

If you go for speed the scanner variant without keywords is the most efficient
one. For readability the scanner variant with keywords or the last shown example
where a table is passed is better. For flexibility the table variant is best as
it makes no assumptions about the scanner \emdash\ the token scanner can quit on
unknown keys, unless that is intercepted of course. But as mentioned before, even
the advantage of the fast one should not be overestimated. When you trace usage
it can be that the (in this case matrix) macro is called only a few thousand
times and that doesn't really add up. Of course many different sped-up calls can
make a difference but then one really needs to optimize consistently the whole
code base and that can conflict with readability. The token library presents us
with a nice chicken||egg problem but nevertheless is fun to play with.

\stopsection

\startsection[title=Assigning meanings]

The token library also provides a way to create tokens and access properties but
that interface can change with upcoming versions when the old library is replaced
by the new one and the input handling is cleaned up. One experimental function is
worth mentioning:

\starttyping
token.set_macro("foo","the meaning of bar")
\stoptyping

This will turn the given string into tokens that get assigned to \type {\foo}.
Here are some alternative calls:

\starttabulate
\NC \type {set_macro("foo")}                    \NC \type { \def \foo {}}        \NC \NR
\NC \type {set_macro("foo","meaning")}          \NC \type { \def \foo {meaning}} \NC \NR
\NC \type {set_macro("foo","meaning","global")} \NC \type {\gdef \foo {meaning}} \NC \NR
\stoptabulate
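
A quick check at the \TEX\ end shows that the assignment really took place:

\starttyping
\directlua{token.set_macro("foo","Hello!")}
\meaning\foo % macro:->Hello!
\stoptyping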

The conversion to tokens happens under the current catcode regime. You can
enforce a different regime by passing a number of an allocated catcode table as
the first argument, as with \type {tex.print}. As we mentioned performance
before: setting at the \LUA\ end like this:

\starttyping
token.set_macro("foo","meaning")
\stoptyping

is about two times as fast as:

\starttyping
tex.sprint("\\def\\foo{meaning}")
\stoptyping

or (with slightly more overhead) in \CONTEXT\ terms:

\starttyping
context("\\def\\foo{meaning}")
\stoptyping

The next variant is actually slower (even when we alias \type {setvalue}):

\starttyping
context.setvalue("foo","meaning")
\stoptyping

but although 0.4 versus 0.8 seconds looks like a lot, in a \TEX\ run I need a
million calls to see such a difference, and a million macro definitions during a
run is a lot. The different assignments involved in, for instance, 3000 entries
in a bibliography (with an average of 5 assignments per entry) can hardly be
measured as we're talking about milliseconds. So again, it's mostly a matter of
convenience when using this function, not a necessity.

\stopsection

\startsection[title=Conclusion]

For sure we will see usage of the new scanner code in \CONTEXT, but to what
extent remains to be seen. The performance gain is not impressive enough to
justify many changes to the code but as the low|-|level interfacing can sometimes
become a bit cleaner it will be used in specific places, even if we sacrifice
some speed (which then probably will be compensated for by a little gain
elsewhere).

The scanners will probably never be used by users directly simply because there
are no such low level interfaces in \CONTEXT\ and because manipulating input is
easier in \LUA. Even deep down in the internals of \CONTEXT\ we will use wrappers
and additional helpers around the scanner code. Of course there is the fun-factor
and playing with these scanners is fun indeed. The macro setters have as their
main benefit that using them can be nicer in the \LUA\ source, and of course
setting a macro this way is also conceptually cleaner (just like we can set
registers).

Of course there are some challenges left, like determining if we are scanning
input or already converted tokens (for instance in a macro body or token\-list
expansion). Once we can properly feed back tokens we can also look ahead like
\type {\futurelet} does. But for that to happen we will first clean up the
\LUATEX\ input scanner code and error handler.

\stopsection

\stopchapter

\stoptext