\environment stillenvironment

\starttext

\startchapter[title=Scanning input]

\startsection[title=Introduction]

Tokens are the building blocks of the input for \TEX\ and they drive the process
of expansion which in turn results in typesetting. If you want to manipulate the
input, intercepting tokens is one approach. Other solutions are preprocessing or
writing macros that do something with their picked-up arguments. In \CONTEXT\
\MKIV\ we often forget about manipulating the input but manipulate the
intermediate typesetting results instead. The advantage is that only at that
moment do you know what you're truly dealing with, but a disadvantage is that
parsing the so-called node lists is not always efficient and it can even be
rather complex, for instance in math. It remains a fact that until \LUATEX\
version 0.80 \CONTEXT\ hardly used the token interface.

In version 0.80 a new scanner interface was introduced, demonstrated by Taco
Hoekwater at the \CONTEXT\ conference 2014. Luigi Scarso and I integrated that
code and I added a few more functions. Eventually the team will kick out the old
token library and overhaul the input-related code in \LUATEX, because no
callback is needed any more (and also because the current code still has traces
of multiple \LUA\ instances). This will happen stepwise to give users who use the
old mechanism an opportunity to adapt.

Here I will show a bit of the new token scanners and explain how they can be used
in \CONTEXT. Some of the additional scanners written on top of the built-in ones
will probably end up in the generic \LUATEX\ code that ships with \CONTEXT.

\stopsection

\startsection[title=The \TEX\ scanner]

The new token scanner library of \LUATEX\ provides a way to hook \LUA\ into \TEX\
in a rather natural way. I have to admit that I never had any real demand for
such a feature but now that we have it, it is worth exploring.

The \TEX\ scanner roughly provides the following subscanners that are used to
implement primitives: keyword, token, token list, dimension, glue and integer.
Deep down there are specific variants for scanning, for instance, font dimensions
and special numbers.

A token is a unit of input, and one or more characters are turned into a token.
How a character is interpreted is determined by its current catcode. For instance
a backslash is normally tagged as escape character which means that it starts a
control sequence: a macro name or primitive. This means that once it is scanned a
macro name travels as one token through the system. Take this:

\starttyping
\def\foo#1{\scratchcounter=123#1\relax}
\stoptyping

Here \TEX\ scans \type {\def} and turns it into a token. This particular token
triggers a specific branch in the scanner. First a name is scanned with
optionally an argument specification. Then the body is scanned and the macro is
stored in memory. Because \type {\scratchcounter}, \type {\relax} and \type {#1}
are turned into tokens, this body has 7 tokens.

When the macro \type {\foo} is referenced the body gets expanded which here means
that the scanner will scan for an argument first and uses that in the
replacement. So, the scanner switches between different states. Sometimes tokens
are just collected and stored, in other cases they get expanded immediately into
some action.

\stopsection

\startsection[title=Scanning from \LUA]

The basic building blocks of the scanner are available at the \LUA\ end, for
instance:

\starttyping
\directlua{print(token.scan_int())} 123
\stoptyping

This will print \type {123} to the console. Or, you can store the number and
use it later:

\starttyping
\directlua{SavedNumber = token.scan_int()} 123

We saved: \directlua{tex.print(SavedNumber)}
\stoptyping

The number of scanner functions is (on purpose) limited but you can use them to
write additional ones, as you can just grab tokens, interpret them and act
accordingly.
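
For instance, a boolean scanner can be built on top of the keyword scanner. The
following is just a sketch of the idea (\CONTEXT\ ships a comparable helper as
\type {tokens.scanners.boolean}):

\starttyping
local function scan_boolean()
    if token.scan_keyword("true") then
        return true
    elseif token.scan_keyword("false") then
        return false
    end
    -- no (valid) keyword seen, so we return nil
end
\stoptyping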

The \type {scan_int} function picks up a number. This can also be a counter, a
named (math) character or a numeric expression. In \TEX, numbers are integers;
floating point is not supported natively. With \type {scan_dimen} a dimension
is grabbed, where a dimen is either a number (float) followed by a unit, a dimen
register or a dimen expression (internally, all become integers). Of course
internal quantities are also okay. There are two optional arguments, the first
indicating that we accept a filler as unit, while the second indicates that math
units are expected. When an integer or dimension is scanned, tokens are expanded
till the input is a valid number or dimension. The \type {scan_glue} function
takes one optional argument: a boolean indicating if the units are math.
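
Because dimensions become integers internally, scanning one yields the number of
scaled points. For instance:

\starttyping
\directlua{print(token.scan_dimen())} 10pt
\stoptyping

prints \type {655360} to the console, as \type {10pt} equals $10 \times 65536$
scaled points.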

The \type {scan_toks} function picks up a (normally) brace-delimited sequence of
tokens and (\LUATEX\ 0.80) returns them as a table of tokens. The function \type
{get_token} returns one (unexpanded) token while \type {scan_token} returns
an expanded one.

Because strings are natural to \LUA\ we also have \type {scan_string}. This one
converts a following brace-delimited sequence of tokens into a proper string.
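
A quick illustration (the exact result of course depends on the catcode regime
and the expansion of what is grabbed):

\starttyping
\directlua{print(token.scan_string())} {hello world}
\stoptyping

prints \type {hello world} to the console.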

The function \type {scan_keyword} looks for the given keyword and when found skips
over it and returns \type {true}. Here is an example of usage: \footnote {In
\LUATEX\ 0.80 you should use \type {newtoken} instead of \type {token}.}

\starttyping
function ScanPair()
    local one = 0
    local two = ""
    while true do
        if token.scan_keyword("one") then
            one = token.scan_int()
        elseif token.scan_keyword("two") then
            two = token.scan_string()
        else
            break
        end
    end
    tex.print("one: ",one,"\\par")
    tex.print("two: ",two,"\\par")
end
\stoptyping

This can be used as:

\starttyping
\directlua{ScanPair()}
\stoptyping
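
with the keyword|-|value pairs following in the input stream, for instance (the
trailing \type {\relax} stops the keyword scanning):

\starttyping
\directlua{ScanPair()} one 123 two {some text} \relax
\stoptyping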
140
141You can scan for an explicit character (class) with \type {scancode}. This
142function takes a positive number as argument and returns a character or \type
143{nil}.
144
145\starttabulate[rrl]
146\NC \cldcontext{tokens.bits.escape } \NC 0 \NC \type{escape} \NC \NR
147\NC \cldcontext{tokens.bits.begingroup } \NC 1 \NC \type{begingroup} \NC \NR
148\NC \cldcontext{tokens.bits.endgroup } \NC 2 \NC \type{endgroup} \NC \NR
149\NC \cldcontext{tokens.bits.mathshift } \NC 3 \NC \type{mathshift} \NC \NR
150\NC \cldcontext{tokens.bits.alignment } \NC 4 \NC \type{alignment} \NC \NR
151\NC \cldcontext{tokens.bits.endofline } \NC 5 \NC \type{endofline} \NC \NR
152\NC \cldcontext{tokens.bits.parameter } \NC 6 \NC \type{parameter} \NC \NR
153\NC \cldcontext{tokens.bits.superscript} \NC 7 \NC \type{superscript} \NC \NR
154\NC \cldcontext{tokens.bits.subscript } \NC 8 \NC \type{subscript} \NC \NR
155\NC \cldcontext{tokens.bits.ignore } \NC 9 \NC \type{ignore} \NC \NR
156\NC \cldcontext{tokens.bits.space } \NC 10 \NC \type{space} \NC \NR
157\NC \cldcontext{tokens.bits.letter } \NC 11 \NC \type{letter} \NC \NR
158\NC \cldcontext{tokens.bits.other } \NC 12 \NC \type{other} \NC \NR
159\NC \cldcontext{tokens.bits.active } \NC 13 \NC \type{active} \NC \NR
160\NC \cldcontext{tokens.bits.comment } \NC 14 \NC \type{comment} \NC \NR
161\NC \cldcontext{tokens.bits.invalid } \NC 15 \NC \type{invalid} \NC \NR
162\stoptabulate

So, if you want to grab the character you can say:

\starttyping
local c = token.scan_code(2^10 + 2^11 + 2^12)
\stoptyping

In \CONTEXT\ you can say:

\starttyping
local c = tokens.scanners.code(
    tokens.bits.space  +
    tokens.bits.letter +
    tokens.bits.other
)
\stoptyping

When no argument is given, the next character with catcode letter or other is
returned (if found).

In \CONTEXT\ we use the \type {tokens} namespace which has additional scanners
available. That way we can remain compatible. I can add more scanners when
needed, although it is not expected that users will use this mechanism directly.

\starttabulate[]
\NC \type {(new)token}    \NC \type {tokens}             \NC arguments \NC \NR
\HL
\NC                       \NC \type {scanners.boolean}   \NC \NC \NR
\NC \type {scan_code}     \NC \type {scanners.code}      \NC \type {(bits)} \NC \NR
\NC \type {scan_dimen}    \NC \type {scanners.dimension} \NC \type {(fill,math)} \NC \NR
\NC \type {scan_glue}     \NC \type {scanners.glue}      \NC \type {(math)} \NC \NR
\NC \type {scan_int}      \NC \type {scanners.integer}   \NC \NC \NR
\NC \type {scan_keyword}  \NC \type {scanners.keyword}   \NC \NC \NR
\NC                       \NC \type {scanners.number}    \NC \NC \NR
\NC \type {scan_token}    \NC \type {scanners.token}     \NC \NC \NR
\NC \type {scan_toks}     \NC \type {scanners.tokens}    \NC \NC \NR
\NC \type {scan_string}   \NC \type {scanners.string}    \NC \NC \NR
\NC \type {scan_word}     \NC \type {scanners.word}      \NC \NC \NR
\NC \type {get_token}     \NC \type {getters.token}      \NC \NC \NR
\NC \type {set_macro}     \NC \type {setters.macro}      \NC \type {(catcodes,cs,str,global)} \NC \NR
\stoptabulate

All except \type {get_token} (or its alias \type {getters.token}) expand tokens
in order to satisfy the demands.

Here are some examples of how we can use the scanners. If we were to call
\type {Foo} with regular arguments we would do this:

\starttyping
\def\foo#1{
    \directlua {
        Foo("whatever","#1",{n = 1})
    }
}
\stoptyping

but when \type {Foo} uses the scanners it becomes:

\starttyping
\def\foo#1{
    \directlua{Foo()} {whatever} {#1} n {1}\relax
}
\stoptyping

In the first case we have a function \type {Foo} like this:

\starttyping
function Foo(what,str,n)

    -- do something with these three parameters

end
\stoptyping

and in the second variant we have (using the \type {tokens} namespace):

\starttyping
function Foo()
    local what = tokens.scanners.string()
    local str  = tokens.scanners.string()
    local n    = tokens.scanners.keyword("n") and
                 tokens.scanners.integer() or 0

    -- do something with these three parameters

end
\stoptyping

The string scanned is kind of special as the result depends on what is seen.
Given the following definitions:

\startbuffer
           \def\bar  {bar}
\unexpanded\def\ubar {ubar}
           \def\foo  {foo\bar\ubar}
           \def\wrap {{foo\bar}}
           \def\uwrap{{foo\ubar}}
\stopbuffer

\typebuffer

\getbuffer

We get:

\def\TokTest{\ctxlua{
    local s = tokens.scanners.string()
    context("\\bgroup\\red\\tt")
    context.verbatim(s)
    context("\\egroup")
}}

\starttabulate[lTl]
\NC \type{{foo}}      \NC \TokTest {foo}      \NC \NR
\NC \type{{foo\bar}}  \NC \TokTest {foo\bar}  \NC \NR
\NC \type{{foo\ubar}} \NC \TokTest {foo\ubar} \NC \NR
\NC \type{foo\bar}    \NC \TokTest foo\bar    \NC \NR
\NC \type{foo\ubar}   \NC \TokTest foo\ubar   \NC \NR
\NC \type{foo$bar$}   \NC \TokTest foo$bar$   \NC \NR
\NC \type{\foo}       \NC \TokTest \foo       \NC \NR
\NC \type{\wrap}      \NC \TokTest \wrap      \NC \NR
\NC \type{\uwrap}     \NC \TokTest \uwrap     \NC \NR
\stoptabulate

Because scanners look ahead, the following happens: when an open brace is seen (or
any character marked as left brace) the scanner picks up tokens and expands them
unless they are protected; so, effectively, it scans as if the body of an \type
{\edef} is scanned. However, when the next token is a control sequence it will be
expanded first to see if there is a left brace, so there we get the full
expansion. In practice this is convenient behaviour because the braced variant
permits us to pick up meanings honouring protection. Of course this is all a side
effect of how \TEX\ scans. \footnote {This look-ahead expansion can sometimes give
unexpected side effects because often \TEX\ pushes back a token when a condition
is not met. For instance when it scans a number, scanning stops when no digits
are seen, but the scanner has to look at the next (expanded) token in order to
come to that conclusion. In the process it will, for instance, expand
conditionals. This means that intermediate catcode changes will not be effective
(or applied) to already-seen tokens that were pushed back into the input. This
also happens with, for instance, \type {\futurelet}.}

With the braced variant one can of course use primitives like \type {\detokenize}
and \type {\unexpanded} (in \CONTEXT: \type {\normalunexpanded}, as we already
had this mechanism before it was added to the engine).

\stopsection

\startsection[title=Considerations]

Performance-wise there is not much difference between these methods. With some
effort you can make the second approach faster than the first but in practice you
will not notice much gain. So, the main motivation for using the scanner is that
it provides a more \TEX-ified interface. When playing with the initial version
of the scanners I did some tests with performance-sensitive \CONTEXT\ calls and
the difference was measurable (positive) but deciding if and when to use the
scanner approach was not easy. Sometimes embedded \LUA\ code looks better, and
sometimes \TEX\ code. Eventually we will end up with a mix. Here are some
considerations:

\startitemize
\startitem
    In both cases there is the overhead of a \LUA\ call.
\stopitem
\startitem
    In the pure \LUA\ case the whole argument is tokenized by \TEX\ and then
    converted to a string that gets compiled by \LUA\ and executed.
\stopitem
\startitem
    When the scan happens in \LUA\ there are extra calls to functions but
    scanning still happens in \TEX; some token to string conversion is avoided
    and compilation can be more efficient.
\stopitem
\startitem
    When data comes from external files, parsing with \LUA\ is in most cases more
    efficient than parsing by \TEX.
\stopitem
\startitem
    A macro package like \CONTEXT\ wraps functionality in macros and is
    controlled by key-value specifications. There is often no benefit in terms
    of performance when delegating to the mentioned scanners.
\stopitem
\stopitemize

Another consideration is that when using macros, parameters are often passed
between \type {{}}:

\starttyping
\def\foo#1#2#3
    {...}
\foo {a}{123}{b}
\stoptyping

and suddenly changing that to

\starttyping
\def\foo{\directlua{Foo()}}
\stoptyping

and using that as:

\starttyping
\foo {a} {b} n 123
\stoptyping

means that \type {{123}} will fail. So, eventually you will end up with something
like:

\starttyping
\def\myfakeprimitive{\directlua{Foo()}}
\def\foo#1#2#3{\myfakeprimitive {#1} {#2} n #3 }
\stoptyping

and:

\starttyping
\foo {a} {b} {123}
\stoptyping

So in the end you don't gain much here apart from the fact that the fake
primitive can be made more clever and accept optional arguments. But such new
features are often hidden from the user, who uses more high-level wrappers.

When you code in pure \TEX\ and want to grab a number directly you need to test
for the braced case; when you use the \LUA\ scanner method you still need to test
for braces. The scanners are consistent with the way \TEX\ works. Of course you
can write helpers that do some checking for braces in \LUA, so there are no real
limitations, but it adds some overhead (and maybe also confusion).

One way to speed up the call is to use the \type {\luafunction} primitive in
combination with predefined functions and although both mechanisms can benefit
from this, the scanner approach gets more out of that as this method cannot be
used with regular function calls that get arguments. In (rather low-level) \LUA\
it looks like this:

\starttyping
luafunctions[1] = function()
    local a = token.scan_string()
    local n = token.scan_int()
    local b = token.scan_string()
    -- whatever
end
\stoptyping
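
Here \type {luafunctions} stands for the engine's table of predefined functions;
in stock \LUATEX\ that table is fetched explicitly (a sketch, the local name is
ours):

\starttyping
local luafunctions = lua.get_functions_table()

luafunctions[1] = function()
    local a = token.scan_string()
    local n = token.scan_int()
    local b = token.scan_string()
    -- do something with a, n and b
end
\stoptyping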

And in \TEX:

\starttyping
\luafunction1 {a} 123 {b}
\stoptyping

This can of course be wrapped as:

\starttyping
\def\myprimitive{\luafunction1 }
\stoptyping

\stopsection

\startsection[title=Applications]

The question now pops up: where can this be used? Can you really make new
primitives? The answer is yes. You can write code that exclusively stays on the
\LUA\ side but you can also do some magic and then print back something to \TEX.
Here we use the basic token interface, not \CONTEXT:

\startbuffer
\directlua {
local token = newtoken or token
function ColoredRule()
    local w, h, d, c, t
    while true do
        if token.scan_keyword("width") then
            w = token.scan_dimen()
        elseif token.scan_keyword("height") then
            h = token.scan_dimen()
        elseif token.scan_keyword("depth") then
            d = token.scan_dimen()
        elseif token.scan_keyword("color") then
            c = token.scan_string()
        elseif token.scan_keyword("type") then
            t = token.scan_string()
        else
            break
        end
    end
    if c then
        tex.sprint("\\color[",c,"]{")
    end
    if t == "vertical" then
        tex.sprint("\\vrule")
    else
        tex.sprint("\\hrule")
    end
    if w then
        tex.sprint("width ",w,"sp")
    end
    if h then
        tex.sprint("height ",h,"sp")
    end
    if d then
        tex.sprint("depth ",d,"sp")
    end
    if c then
        tex.sprint("\\relax}")
    end
end
}
\stopbuffer

\typebuffer \getbuffer

This can be given a \TEX\ interface like:

\startbuffer
\def\myhrule{\directlua{ColoredRule()} type {horizontal} }
\def\myvrule{\directlua{ColoredRule()} type {vertical} }
\stopbuffer

\typebuffer \getbuffer

And used as:

\startbuffer
\myhrule width \hsize height 1cm color {darkred}
\stopbuffer

\typebuffer

giving:

\startlinecorrection \getbuffer \stoplinecorrection

Of course \CONTEXT\ users can color an otherwise black rule in a similar way
with the following command:

\startbuffer
\blackrule[width=\hsize,height=1cm,color=darkgreen]
\stopbuffer

\typebuffer \startlinecorrection \getbuffer \stoplinecorrection

The official \CONTEXT\ way to define such a new command is the following. The
conversion back to verbose dimensions is needed because we pass back to \TEX.

\startbuffer
\startluacode
local myrule = tokens.compile {
    {
        { "width",  "dimension", "todimen" },
        { "height", "dimension", "todimen" },
        { "depth",  "dimension", "todimen" },
        { "color",  "string" },
        { "type",   "string" },
    }
}

interfaces.scanners.ColoredRule = function()
    local t = myrule()
    context.blackrule {
        color  = t.color,
        width  = t.width,
        height = t.height,
        depth  = t.depth,
    }
end
\stopluacode
\stopbuffer

\typebuffer \getbuffer

With:

\startbuffer
\unprotect \let\myrule\clf_ColoredRule \protect
\stopbuffer

\typebuffer \getbuffer

and

\startbuffer
\myrule width \textwidth height 1cm color {maincolor} \relax
\stopbuffer

\typebuffer

we get:

\startlinecorrection \getbuffer \stoplinecorrection

There are many ways to use the scanners and each has its charm. We will look at
some alternatives from the perspective of performance. The timings are more meant
as relative measures than absolute ones. After all it depends on the hardware. We
assume the following shortcuts:

\starttyping
local scannumber  = tokens.scanners.number
local scankeyword = tokens.scanners.keyword
local scanword    = tokens.scanners.word
\stoptyping

We will scan for four different keys and values. The number is scanned using a
helper \type {scannumber} that scans for a number that is acceptable for \LUA.
Thus, \type {1.23} is valid, as are \type {0x1234} and \type {12.12E4}.

\starttyping
function getmatrix()
    local sx, sy = 1, 1
    local rx, ry = 0, 0
    while true do
        if scankeyword("sx") then
            sx = scannumber()
        elseif scankeyword("sy") then
            sy = scannumber()
        elseif scankeyword("rx") then
            rx = scannumber()
        elseif scankeyword("ry") then
            ry = scannumber()
        else
            break
        end
    end
    -- action
end
\stoptyping

Scanning the following specification 100000 times takes 1.00 seconds:

\starttyping
sx 1.23 sy 4.5 rx 1.23 ry 4.5
\stoptyping

The \quote {tight} case takes 0.94 seconds:

\starttyping
sx1.23 sy4.5 rx1.23 ry4.5
\stoptyping

We can compare this to scanning without keywords. In that case there have to be
exactly four arguments. These have to be given in the right order, which is no big
deal as often such helpers are encapsulated in a user-friendly macro.

\starttyping
function getmatrix()
    local sx, sy = scannumber(), scannumber()
    local rx, ry = scannumber(), scannumber()
    -- action
end
\stoptyping

As expected, this is more efficient than the previous examples. It takes 0.80
seconds to scan this 100000 times:

\starttyping
1.23 4.5 1.23 4.5
\stoptyping

A third alternative is the following:

\starttyping
function getmatrix()
    local sx, sy = 1, 1
    local rx, ry = 0, 0
    while true do
        local kw = scanword()
        if kw == "sx" then
            sx = scannumber()
        elseif kw == "sy" then
            sy = scannumber()
        elseif kw == "rx" then
            rx = scannumber()
        elseif kw == "ry" then
            ry = scannumber()
        else
            break
        end
    end
    -- action
end
\stoptyping

Here we scan for a keyword and assign a number to the right variable. This one
call happens to be less efficient than calling \type {scankeyword} 10 times
($4+3+2+1$) for the explicit scan. This run takes 1.11 seconds for the next line.
The spaces are really needed as words can be anything that has no space.
\footnote {Hardcoding the word scan in a \CCODE\ helper makes little sense, as
different macro packages can have different assumptions about what a word is. And
we don't extend \LUATEX\ for specific macro packages.}

\starttyping
sx 1.23 sy 4.5 rx 1.23 ry 4.5
\stoptyping

Of course these numbers need to be compared to a baseline of no scanning (i.e.\
the overhead of a \LUA\ call), which here amounts to 0.10 seconds. This brings
us to the following table.

\starttabulate[ll]
\NC keyword checks \NC 0.9 sec \NC \NR
\NC no keywords    \NC 0.7 sec \NC \NR
\NC word checks    \NC 1.0 sec \NC \NR
\stoptabulate

The differences are not that impressive given the number of calls. Even in a
complex document the overhead of scanning can be negligible compared to the
actions involved in typesetting the document. In fact, there will always be some
kind of scanning for such macros so we're talking about even less impact. So you
can just use the method you like most. In practice, the extra overhead of using
keywords in combination with explicit checks (the first case) is rather
convenient.

If you don't want to have many tests you can do something like this (note the
\type {done} flag, which is needed to quit the outer loop once no more known
keys are seen):

\starttyping
local keys = {
    sx = scannumber, sy = scannumber,
    rx = scannumber, ry = scannumber,
}

function getmatrix()
    local values = { }
    while true do
        local done = false
        for key, scan in next, keys do
            if scankeyword(key) then
                values[key] = scan()
                done = true
                break
            end
        end
        if not done then
            break
        end
    end
    -- action
end
\stoptyping

This is still quite fast although one now has to access the values in a table.
Working with specifications like this is clean anyway, so in \CONTEXT\ we have a
way to abstract the previous definition:

\starttyping
local specification = tokens.compile {
    {
        { "sx", "number" }, { "sy", "number" },
        { "rx", "number" }, { "ry", "number" },
    },
}

function getmatrix()
    local values = specification()
    -- action using values.sx etc
end
\stoptyping

Although one can make complex definitions this way, the question remains if it
is a better approach than passing \LUA\ tables. The standard \CONTEXT\ way for
controlling features is:

\starttyping
\getmatrix[sx=1.2,sy=3.4]
\stoptyping

So it doesn't matter much if deep down we see:

\starttyping
\def\getmatrix[#1]
  {\getparameters[@@matrix][sx=1,sy=1,rx=1,ry=1,#1]
   \domatrix
     \@@matrixsx
     \@@matrixsy
     \@@matrixrx
     \@@matrixry
   \relax}
\stoptyping

or:

\starttyping
\def\getmatrix[#1]
  {\getparameters[@@matrix][sx=1,sy=1,rx=1,ry=1,#1]
   \domatrix
     sx \@@matrixsx
     sy \@@matrixsy
     rx \@@matrixrx
     ry \@@matrixry
   \relax}
\stoptyping

In the second variant (the one with keywords) \type {\domatrix} can be a scanner
like the one we defined before:

\starttyping
\def\domatrix#1#2#3#4
  {\directlua{getmatrix()}}
\stoptyping

but also:

\starttyping
\def\domatrix#1#2#3#4
  {\directlua{getmatrix(#1,#2,#3,#4)}}
\stoptyping

given:

\starttyping
function getmatrix(sx,sy,rx,ry)
    -- action using sx etc
end
\stoptyping

or maybe nicer:

\starttyping
\def\domatrix#1#2#3#4
  {\directlua{getmatrix{
      sx = #1,
      sy = #2,
      rx = #3,
      ry = #4
  }}}
\stoptyping

assuming:

\starttyping
function getmatrix(values)
    -- action using values.sx etc
end
\stoptyping

If you go for speed, the scanner variant without keywords is the most efficient
one. For readability the scanner variant with keywords or the last shown example
where a table is passed is better. For flexibility the table variant is best as
it makes no assumptions about the scanner \emdash\ the token scanner can quit on
unknown keys, unless that is intercepted of course. But as mentioned before, even
the advantage of the fast one should not be overestimated. When you trace usage
it can be that the (in this case matrix) macro is called only a few thousand
times and that doesn't really add up. Of course many different sped-up calls can
make a difference, but then one really needs to optimize the whole code base
consistently and that can conflict with readability. The token library presents
us with a nice chicken-and-egg problem but nevertheless is fun to play with.

\stopsection

\startsection[title=Assigning meanings]

The token library also provides a way to create tokens and access properties but
that interface can change with upcoming versions when the old library is replaced
by the new one and the input handling is cleaned up. One experimental function is
worth mentioning:

\starttyping
token.set_macro("foo","the meaning of bar")
\stoptyping

This will turn the given string into tokens that get assigned to \type {\foo}.
Here are some alternative calls:

\starttabulate
\NC \type {set_macro("foo")}                    \NC \type { \def \foo {}}        \NC \NR
\NC \type {set_macro("foo","meaning")}          \NC \type { \def \foo {meaning}} \NC \NR
\NC \type {set_macro("foo","meaning","global")} \NC \type {\gdef \foo {meaning}} \NC \NR
\stoptabulate
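
A quick check from the \TEX\ end; the \type {\meaning} primitive shows the
resulting definition:

\starttyping
\directlua{token.set_macro("foo","Hello world!")}
\meaning\foo
\stoptyping

which should report \type {macro:->Hello world!}.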

The conversion to tokens happens under the current catcode regime. You can
enforce a different regime by passing the number of an allocated catcode table as
the first argument, as with \type {tex.print}. As we mentioned performance
before: setting at the \LUA\ end like this:

\starttyping
token.set_macro("foo","meaning")
\stoptyping

is about two times as fast as:

\starttyping
tex.sprint("\\def\\foo{meaning}")
\stoptyping

or (with slightly more overhead) in \CONTEXT\ terms:

\starttyping
context("\\def\\foo{meaning}")
\stoptyping

The next variant is actually slower (even when we alias \type {setvalue}):

\starttyping
context.setvalue("foo","meaning")
\stoptyping

but although 0.4 versus 0.8 seconds looks like a lot, I need a million calls in a
\TEX\ run to see such a difference, and a million macro definitions during a
run is a lot. The different assignments involved in, for instance, 3000 entries
in a bibliography (with an average of 5 assignments per entry) can hardly be
measured as we're talking about milliseconds. So again, it's mostly a matter of
convenience when using this function, not a necessity.

\stopsection

\startsection[title=Conclusion]

For sure we will see usage of the new scanner code in \CONTEXT, but to what
extent remains to be seen. The performance gain is not impressive enough to
justify many changes to the code but as the low-level interfacing can sometimes
become a bit cleaner it will be used in specific places, even if we sacrifice
some speed (which then probably will be compensated for by a little gain
elsewhere).

The scanners will probably never be used by users directly, simply because there
are no such low-level interfaces in \CONTEXT\ and because manipulating input is
easier in \LUA. Even deep down in the internals of \CONTEXT\ we will use wrappers
and additional helpers around the scanner code. Of course there is the fun factor
and playing with these scanners is fun indeed. The macro setters have as their
main benefit that using them can be nicer in the \LUA\ source, and of course
setting a macro this way is also conceptually cleaner (just like we can set
registers).

Of course there are some challenges left, like determining if we are scanning
input or already converted tokens (for instance in a macro body or token\-list
expansion). Once we can properly feed back tokens we can also look ahead like
\type {\futurelet} does. But for that to happen we will first clean up the
\LUATEX\ input scanner code and error handler.

\stopsection

\stopchapter

\stoptext