1
2
3\environment luametatexstyle
4
5\startdocument[title=Tokens]
6
7\startsection[title={Introduction}]
8
If a \TEX\ programmer talks about tokens (and nodes) the average user can safely
ignore it. Often it is enough to know that your input is tokenized, which means
that one or more characters in the input get converted into some efficient
internal representation that then travels through the system and triggers
actions. When you see an error message with \TEX\ code, the reverse happened:
tokens were converted back into commands that resemble the (often expanded) input.
15
There are not that many examples here because the functions discussed here are
often not used directly but instead integrated into somewhat more convenient
interfaces. However, in due time more examples might show up here.
19
20\stopsection
21
22\startsection[title={\LUA\ token representation}]
23
A token is a 32 bit integer that encodes a command and a value, index, reference
or whatever goes with a command. The input is converted into tokens and the
bodies of macros are stored as linked lists of tokens. In the latter case we
combine a token and a next pointer in what is called a memory word. If we see
tokens in \LUA\ we don't get the integer but a userdata object that comes with
accessors.
29
Unless you're into very low level programming the likelihood of encountering
tokens is low. But scanning is related to tokens, so that is what we cover here
in more detail.
33
34\stopsection
35
36\startsection[title={Helpers}]
37
38\startsubsection[title={Basics}]
39
40References to macros are stored in a table along with some extra properties but
41in the end they travel around as tokens. The same is true for characters, they
42are also encoded in a token. We have three ways to create a token:
43
44\starttyping[option=LUA]
45function token.create ( <t:integer> value )
46 return <t:token> userdata
47end
48
49function token.create ( <t:integer> value, <t:integer> command)
50 return <t:token> userdata
51end
52
53function token.create ( <t:string> csname )
54 return <t:token> userdata
55end
56\stoptyping
57
58An example of the first variant is \type {token.create(65)}. When we
59print (inspect) this in \CONTEXT\ we get:
60
61\starttyping[option=LUA]
62<lua token : 476151 == letter 65>={
63 ["category"]="letter",
64 ["character"]="A",
65 ["id"]=476151,
66}
67\stoptyping
68
69If we say \type {token.create(65,12)} instead we get:
70
71\starttyping[option=LUA]
72<lua token : 476151 == otherchar 65>={
73 ["category"]="other",
74 ["character"]="A",
75 ["id"]=476151,
76}
77\stoptyping
78
An example of the third call is \type {token.create("relax")}. This time we get:
80
81\starttyping[option=LUA]
82<lua token : 580111 == relax : relax 0>={
83 ["active"]=false,
84 ["cmdname"]="relax",
85 ["command"]=16,
86 ["csname"]="relax",
87 ["expandable"]=false,
88 ["frozen"]=false,
89 ["id"]=580111,
90 ["immutable"]=false,
91 ["index"]=0,
92 ["instance"]=false,
93 ["mutable"]=false,
94 ["noaligned"]=false,
95 ["permanent"]=false,
96 ["primitive"]=true,
97 ["protected"]=false,
98 ["tolerant"]=false,
99}
100\stoptyping
101
102Another example is \type {token.create("dimen")}:
103
104\starttyping[option=LUA]
105<lua token : 467905 == dimen : register 3>={
106 ["active"]=false,
107 ["cmdname"]="register",
108 ["command"]=121,
109 ["csname"]="dimen",
110 ["expandable"]=false,
111 ["frozen"]=false,
112 ["id"]=467905,
113 ["immutable"]=false,
114 ["index"]=3,
115 ["instance"]=false,
116 ["mutable"]=false,
117 ["noaligned"]=false,
118 ["permanent"]=false,
119 ["primitive"]=true,
120 ["protected"]=false,
121 ["tolerant"]=false,
122}
123\stoptyping
124
The most important properties are \type {command} and \type {index} because the
combination determines what the token does. The macros (here primitives) have a
lot of extra properties. These are discussed in the low level manuals.
128
129You can check if something is a token with the next function; when a token is
130passed the return value is the string literal \type {token}.
131
132\starttyping[option=LUA]
133function token.type ( <t:whatever> )
    return <t:string> "token" | <t:nil>
135end
136\stoptyping
137
138A maybe more natural test is:
139
140\starttyping[option=LUA]
141function token.istoken ( <t:whatever> )
142 return <t:boolean> success
143end
144\stoptyping
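
As a minimal sketch of both checks, assuming the code runs inside the engine (for
instance in a \type {\startluacode} block in \CONTEXT) where the \type {token}
library is available:

\starttyping[option=LUA]
local t = token.create("relax")

-- token.type returns the string "token" for a token and nil otherwise
if token.type(t) == "token" then
    print("we have a token")
end

-- token.istoken gives a plain boolean, which reads better in conditions
if not token.istoken("relax") then
    print("a string is not a token")
end
\stoptyping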
145
Internally we can see variables like \type {cmd}, \type {chr}, \type {tok} and
such, where the latter is a combination of the first two. The \type {create}
variant that takes two integers relates to this. Of course you need to know what
the magic numbers are. Passing weird numbers can have side effects so don't
expect too much help with that. You need to know what you're doing. The best way
to explore the way these internals work is to just look at how primitives or
macros or \type {\chardef}'d commands are tokenized. Just create a known one and
inspect its fields. A variant that ignores the current catcode table is:
154
155\startbuffer
\protected\def\MyMacro#1{\dimen 0 = \numexpr #1 + 10 \relax}
157\stopbuffer
158
159\typebuffer
160
161A macro like this is actually a little program:
162
163\starttyping
467922  19  49  match                argument 1
580083  20   0  end match

467931 121   3  register             dimen
580013  12  48  other char           0 (U00030)
582314  10  32  spacer
582312  12  61  other char           = (U0003D)
580193  10  32  spacer
582783  81  75  some item            numexpr
582310  21   1  parameter reference
190952  10  32  spacer
582785  12  43  other char           + (U0002B)
476151  10  32  spacer
580190  12  49  other char           1 (U00031)
582265  12  48  other char           0 (U00030)
467939  10  32  spacer
580045  16   0  relax                relax
181\stoptyping
182
183The first column shows indices in token memory where we have a token combined
184with a next pointer. So, in slot \type {467931} we have both a token and a
185pointer to slot \type {580013}.
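
The linked list idea can be mimicked in plain \LUA. The next snippet is only a
model, not engine code: the \type {memory} table and the \type {walk} helper are
made up for this illustration and the slots are the first three of the macro body
shown above.

\starttyping[option=LUA]
-- each memory word combines a token (here split in cmd and chr) with a next pointer
local memory = {
    [467931] = { cmd = 121, chr =  3, next = 580013 }, -- register dimen
    [580013] = { cmd =  12, chr = 48, next = 582314 }, -- other char 0
    [582314] = { cmd =  10, chr = 32, next = nil    }, -- spacer, the model stops here
}

-- follow the next pointers and collect "cmd:chr" pairs
local function walk(memory, slot)
    local result = { }
    while slot do
        local word = memory[slot]
        result[#result+1] = word.cmd .. ":" .. word.chr
        slot = word.next
    end
    return result
end

print(table.concat(walk(memory, 467931), " ")) -- 121:3 12:48 10:32
\stoptyping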
186
187There is another way to create a token.
188
189\starttyping[option=LUA]
190function token.new ( <t:string> command, <t:integer> value )
191 return <t:token>
192end
193
194function token.new ( <t:integer> value, <t:integer> command )
195 return <t:token>
196end
197\stoptyping
198
Watch the order of arguments. We now have four ways to create a token
200
201\starttyping[option=LUA]
202<lua token : 580087 == letter 65>={
203 ["category"]="letter",
204 ["character"]="A",
205 ["id"]=580087,
206}
207\stoptyping
208
209namely:
210
211\starttyping[option=LUA]
212token.new("letter",65)
213token.new(65,11)
214token.create(65,11)
215token.create(65)
216\stoptyping
217
218You can test if a control sequence is defined with:
219
220\starttyping[option=LUA]
221function token.isdefined ( <t:string> t )
222 return <t:boolean> success
223end
224\stoptyping
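
A small sketch of how this test can guard a lookup, again assuming that we run
inside the engine. The helper name \type {createifdefined} is made up for this
example.

\starttyping[option=LUA]
-- only create a token for a control sequence that is known to the engine
local function createifdefined(csname)
    if token.isdefined(csname) then
        return token.create(csname)
    end
    return nil
end

local t = createifdefined("relax") -- a primitive, so this one is defined
\stoptyping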
225
The engine was never meant to be this open, which means that in various places the
assumption is that tokens are valid. However, it is possible to create tokens that
make little sense in some context and can even make the system crash. When
possible we catch this but checking everywhere would bloat the code and harm
performance. Compare this to changing a few bytes in a binary that at some point
can create havoc.
232
233\stopsubsection
234
235\startsubsection[title={Getters}]
236
The userdata objects have a virtual interface that permits access by field name.
Alternatively you can use one of the getters.
239
240
241
242\starttyping[option=LUA]
243function token.getcommand ( <t:token> t ) return <t:integer> end
244function token.getindex ( <t:token> t ) return <t:integer> end
245function token.getcmdname ( <t:token> t ) return <t:string> end
246function token.getcsname ( <t:token> t ) return <t:string> end
247function token.getid ( <t:token> t ) return <t:integer> end
248function token.getactive ( <t:token> t ) return <t:boolean> end
249\stoptyping
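
Both routes should report the same properties. A sketch, using field names that
show up when a token is inspected and assuming that we run inside the engine:

\starttyping[option=LUA]
local t = token.create("dimen")

-- access by field name (the virtual interface)
print(t.csname, t.cmdname, t.command, t.index)

-- the same properties fetched with the getters
print(
    token.getcsname(t),
    token.getcmdname(t),
    token.getcommand(t),
    token.getindex(t)
)
\stoptyping

The numbers that come out depend on the engine version, so it is better not to
hard code them.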
250
251If you want to know what the possible values are, you can use:
252
253\starttyping[option=LUA]
function token.getrange (
    <t:token> | <t:integer>
)
    return
        <t:integer>, -- first
        <t:integer>  -- last
end
261\stoptyping
262
263We can also ask for the macro properties but instead you can just fetch the bit
264set that describes them.
265
266\starttyping[option=LUA]
267function token.getexpandable ( <t:token> t ) return <t:boolean> end
268function token.getprotected ( <t:token> t ) return <t:boolean> end
269function token.getfrozen ( <t:token> t ) return <t:boolean> end
270function token.gettolerant ( <t:token> t ) return <t:boolean> end
271function token.getnoaligned ( <t:token> t ) return <t:boolean> end
272function token.getprimitive ( <t:token> t ) return <t:boolean> end
273function token.getpermanent ( <t:token> t ) return <t:boolean> end
274function token.getimmutable ( <t:token> t ) return <t:boolean> end
275function token.getinstance ( <t:token> t ) return <t:boolean> end
276function token.getconstant ( <t:token> t ) return <t:boolean> end
277\stoptyping
278
279The bit set can be fetched with:
280
281\starttyping[option=LUA]
282function token.getflags ( <t:token> t )
    return <t:integer> -- bit set
284end
285\stoptyping
286
287The possible flags are:
288
289\startthreerows
290\getbuffer[engine:syntax:flagcodes]
291\stopthreerows
292
293The number of parameters of a macro can be queried with:
294
295\starttyping[option=LUA]
296function token.getparameters ( <t:token> t )
297 return <t:integer>
298end
299\stoptyping
300
301The three properties that are used to identify a token can be fetched with:
302
303\starttyping[option=LUA]
function token.getcmdchrcs ( <t:token> t )
    return
        <t:integer>, -- command (cmd)
        <t:integer>, -- value (chr)
        <t:integer>  -- index (cs)
end
310\stoptyping
311
312A simpler call is:
313
314\starttyping[option=LUA]
315function token.getcstoken ( <t:string> csname )
    return <t:integer> -- token number
317end
318\stoptyping
319
320A table with relevant properties of a token (or control sequence) can be fetched
321with:
322
323\starttyping[option=LUA]
324function token.getfields ( <t:token> token )
325 return <t:table> fields
326end
327
328function token.getfields ( <t:string> csname )
329 return <t:table> fields
330end
331\stoptyping
332
333\stopsubsection
334
335\startsubsection[title={Setters}]
336
The \type {setmacro} function can be called with a varying number of arguments,
where the prefix list comes last. Examples of prefixes are \type {global} and \type
{protected}.
340
341\starttyping[option=LUA]
function token.setmacro (
    <t:string>  csname
)
    -- no return values
end

function token.setmacro (
    <t:integer> catcodetable,
    <t:string>  csname
)
    -- no return values
end

function token.setmacro (
    <t:string>  csname,
    <t:string>  content
)
    -- no return values
end

function token.setmacro (
    <t:integer> catcodetable,
    <t:string>  csname,
    <t:string>  content
)
    -- no return values
end

function token.setmacro (
    <t:string>  csname,
    <t:string>  content,
    <t:string>  prefix
    -- there can be more prefixes
)
    -- no return values
end

function token.setmacro (
    <t:integer> catcodetable,
    <t:string>  csname,
    <t:string>  content,
    <t:string>  prefix
    -- there can be more prefixes
)
    -- no return values
end
386\stoptyping
387
388A macro can also be queried:
389
390\starttyping[option=LUA]
391function token.getmacro (
392 <t:string> csname,
393 <t:boolean> preamble,
394 <t:boolean> onlypreamble
395)
396 return <t:string>
397end
398\stoptyping
399
400The various arguments determine what you get:
401
402\startbuffer
403\def\foo#1{foo: #1}
404
405\ctxlua{context.type(token.getmacro("foo"))}
406\ctxlua{context.type(token.getmacro("foo",true))}
407\ctxlua{context.type(token.getmacro("foo",false,true))}
408\stopbuffer
409
410\typebuffer
411
412We get:
413
414\startlines
415\getbuffer
416\stoplines
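
The setter and the getter can be combined into a round trip. This is a sketch
that assumes that we run inside the engine; the macro name \type {MyGreeting} is
made up for this example.

\starttyping[option=LUA]
-- define \MyGreeting globally, the prefix comes last
token.setmacro("MyGreeting", "hello world", "global")

-- fetch the body again as a string
local body = token.getmacro("MyGreeting")

print(body)
\stoptyping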
417
418The meaning can be fetched as string or table:
419
420\starttyping[option=LUA]
function token.getmeaning (
    <t:string>  csname
)
    return <t:string>
end

function token.getmeaning (
    <t:string>  csname,
    <t:true>    astable,
    <t:boolean> subtables,
    <t:boolean> originalindices -- special usage
)
    return <t:table>
end
435\stoptyping
436
437The name says it:
438
439\starttyping[option=LUA]
440function token.undefinemacro ( <t:string> csname)
    -- no return values
442end
443\stoptyping
444
445Expanding a macro happens in a \quote {local control} context which makes it
446immediate, that is, while running \LUA\ code.
447
448\starttyping[option=LUA]
449function token.expandmacro ( <t:string> csname)
    -- no return values
451end
452\stoptyping
453
454This means that:
455
456\startbuffer
457\def\foo{\scratchdimen100pt \edef\oof{\the\scratchdimen}}
458
459\startluacode
460token.expandmacro("foo")
461context(token.getmacro("oof"))
462\stopluacode
463\stopbuffer
464
465\typebuffer
466
467gives:\inlinebuffer, because when \typ {getmacro} is called the expansion has
468been performed. You can consider this a sort of subrun (local to the main control
469loop).
470
The next helper creates a token that refers to a \LUA\ function with an entry in
the table that you can access with \typ {lua.getfunctionstable}. It is the
companion to \type {\luadef}. When the first (and only) argument of that function
is true the size will be preset to the value of \typ {texconfig.functionsize}.
475
476\starttyping[option=LUA]
477function token.setlua (
478 <t:string> csname,
479 <t:integer> id,
    <t:string>  prefix
    -- there can be more prefixes
482)
483 return <t:token>
484end
485\stoptyping
486
487
488
489
490
491
492\stopsubsection
493
494\startsubsection[title={Writers}]
495
In the \type {tex} library we have various ways to print something back to the
input and these print helpers in most cases also accept tokens. The \type
{token.putnext} function is rather tolerant with respect to its arguments and
there can be more than one. As with most prints, a new input level is created.
500
501\starttyping[option=LUA]
function token.putnext ( <t:string> | <t:number> | <t:token> | <t:table> )
    -- no return values
504end
505\stoptyping
506
Here are some examples. We save some scanned tokens and flush them:
508
509\starttyping
local t1 = token.scannext()
local t2 = token.scannext()
local t3 = token.scannext()
local t4 = token.scannext()
-- watch out, we flush in sequence
token.putnext { t1, t2 }
-- but this one gets pushed in front
token.putnext ( t3, t4 )
518\stoptyping
519
When we scan \type {wxyz!} we get \type {yzwx!} back. The argument is either a
table with tokens or a list of tokens. The \type {token.expand} function will
trigger expansion but what happens really depends on what you're doing where.
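
The order can be reasoned about with a little model in plain \LUA. This is not
how the engine implements it: the \type {input} table and the \type {putnext}
function below are stand-ins that only mimic the \quote {push in front}
behaviour.

\starttyping[option=LUA]
-- what is left in the input after w, x, y and z have been scanned
local input = { "!" }

-- mimic token.putnext: the given list is put in front of the current input
local function putnext(list)
    for i = #list, 1, -1 do
        table.insert(input, 1, list[i])
    end
end

putnext { "w", "x" } -- input is now: w x !
putnext { "y", "z" } -- input is now: y z w x !

print(table.concat(input)) -- yzwx!
\stoptyping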
523
524This putter is actually a bit more flexible because the following input also
525works out okay:
526
527\startbuffer
528\def\foo#1{[#1]}
529
530\directlua {
531 local list = { 101, 102, 103, token.create("foo"), "{abracadabra}" }
532 token.putnext("(the)")
533 token.putnext(list)
534 token.putnext("(order)")
535 token.putnext(unpack(list))
536 token.putnext("(is reversed)")
537}
538\stopbuffer
539
540\typebuffer
541
We get this: \blank {\tt \inlinebuffer} \blank So, strings get converted to
individual tokens according to the current catcode regime and numbers become
characters, also according to this regime. A more low level, single token push
back is the next one. It does the same as when \TEX\ itself puts a token back into
the input, something that for instance happens when an integer is scanned and the
last scanned token is not a digit.
548
549\starttyping[option=LUA]
550function token.putback ( <t:token> )
    -- no return values
552end
553\stoptyping
554
You can force an \quote {expand step} with the following function. What happens
depends on the input and scanner state that \TEX\ is in.
557
558\starttyping[option=LUA]
559function token.expand ( )
    -- no return values
561end
562\stoptyping
563
564\stopsubsection
565
566\startsubsection[title={Scanning}]
567
The token library provides means to intercept the input and deal with it at the
\LUA\ level. The library provides a basic scanner infrastructure that can be used
to write macros that accept a wide range of arguments. This interface is on
purpose kept general and, as performance is quite okay, one can build additional
parsers without too much overhead. It's up to macro package writers to see how
they can benefit from this as the main principle behind \LUAMETATEX\ is to
provide a minimal set of tools and no solutions. The scanner functions are
probably the most intriguing.
576
577We start with token scanners. The first one just reads the next token from the
578current input (file, token list, \LUA\ output) while the second variant expands
579the next token, which can push back results and make us enter a new input level,
580and then reads a token from what is then the input.
581
582\starttyping[option=LUA]
583function token.scannext ( )
584 return <t:token>
585end
586
587function token.scannextexpanded ( )
588 return <t:token>
589end
590\stoptyping
591
592This is a simple scanner that picks up a character:
593
594\starttyping[option=LUA]
595function token.scannextchar ( )
596 return <t:string>
597end
598\stoptyping
599
600We can look ahead, that is: pick up a token and push a copy back into the input.
601The second helper first expands the upcoming token and the third one is the peek
602variant of \type {scannextchar}.
603
604\starttyping[option=LUA]
605function token.peeknext ( )
606 return <t:token>
607end
608
609function token.peeknextexpanded ( )
610 return <t:token>
611end
612
613function token.peeknextchar ( )
614 return <t:token>
615end
616\stoptyping
617
We can skip tokens with the following two helpers, where the second one first
expands the upcoming token:
620
621\starttyping[option=LUA]
function token.skipnext ( )
    -- no return values
end

function token.skipnextexpanded ( )
    -- no return values
end
629\stoptyping
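
A sketch of how peeking and skipping can work together, assuming that we run
inside the engine: we look at the upcoming token and only remove it from the
input when it is \type {\relax}.

\starttyping[option=LUA]
local t = token.peeknext() -- a copy is pushed back, so the token stays in the input

if token.getcsname(t) == "relax" then
    token.skipnext()       -- now it is removed from the input
end
\stoptyping

For a token that is not a control sequence the comparison is expected to fail so
nothing is skipped then.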
630
631The next token can be converted into a combination of command and value. The
632second variant shown below first expands the upcoming token.
633
634\starttyping[option=LUA]
function token.scancmdchr ( )
    return
        <t:integer>, -- command a.k.a. cmd
        <t:integer>  -- value a.k.a. chr
end

function token.scancmdchrexpanded ( )
    return
        <t:integer>, -- command a.k.a. cmd
        <t:integer>  -- value a.k.a. chr
end
646\stoptyping
647
We have two keyword scanners. The first scans the way \TEX\ does it: it accepts a
mixture of lowercase and uppercase. The second is case sensitive.
650
651\starttyping[option=LUA]
652function token.scankeyword ( <t:string> keyword )
653 return <t:boolean> success
654end
655
656function token.scankeywordcs ( <t:string> keyword )
657 return <t:boolean> success
658end
659\stoptyping
660
The integer, dimension and glue scanners take an extra optional argument that
signals that an optional equal sign is permitted. The next function errors when
the integer exceeds the maximum that \TEX\ likes: \number \maxcount .
664
665\starttyping[option=LUA]
666function token.scaninteger ( <t:boolean> optionalequal )
667 return <t:integer>
668end
669\stoptyping
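
As with the other scanners, the integer is picked up from the input that follows
the \LUA\ call. The next lines are a sketch that assumes the \CONTEXT\ \type
{context} helper and a call like \typ {\ctxlua{...} = 41} or \typ {\ctxlua{...}
41}.

\starttyping[option=LUA]
-- true: an optional equal sign may precede the number
local n = token.scaninteger(true)

-- feed the incremented value back to TeX
context(tostring(n + 1))
\stoptyping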
670
671Cardinals are unsigned integers:
672
673\starttyping[option=LUA]
674function token.scancardinal ( <t:boolean> optionalequal )
675 return <t:cardinal>
676end
677\stoptyping
678
679When an integer or dimension is wrapped in curly braces, like \type {{123}} and
680\type {{4.5pt}}, you can use one of the next two. Of course unwrapped integers
681and dimensions are also read.
682
683\starttyping[option=LUA]
684function token.scanintegerargument ( <t:boolean> optionalequal )
685 return <t:integer>
686end
687
688function token.scandimensionargument (
689 <t:boolean> infinity,
690 <t:boolean> mu,
691 <t:boolean> optionalequal
692)
693 return <t:integer>
694end
695\stoptyping
696
697When we scan for a float, we also accept an exponent, so \type {123.45} and
698\type {1.23e45} are valid:
699
700
701
702
703\starttyping[option=LUA]
704function token.scanfloat ( )
705 return <t:number>
706end
707\stoptyping
708
Contrary to the previous scanner, here we don't handle the exponent:
710
711\starttyping[option=LUA]
712function token.scanreal ( )
713 return <t:number>
714end
715\stoptyping
716
717In \LUA\ a very precise representation of a float is the hexadecimal notation. In
718addition to regular floating point, optionally with an exponent, you can also
719have \type {0x1.23p45}.
720
721
722
723\starttyping[option=LUA]
724function token.scanluanumber ( )
725 return <t:number>
726end
727\stoptyping
728
729Integers can be signed:
730
731\starttyping[option=LUA]
732function token.scanluainteger ( )
733 return <t:integer>
734end
735\stoptyping
736
737while cardinals (\MODULA2 speak) are unsigned:
739
740\starttyping[option=LUA]
741function token.scanluacardinal ( )
742 return <t:cardinal>
743end
744\stoptyping
745
746\cldcontext{token.scanscale()} 122.345
747
748\starttyping[option=LUA]
749function token.scanscale ( )
750 return <t:integer>
751end
752\stoptyping
753
A posit is (in \LUAMETATEX) a float packed into an integer, but contrary to a
scaled value it can have exponents. Here \type {12.34} gives {\tttf
\cldcontext{token.scanposit()} 12.34} and \type {12.34e5} gives {\tttf
\cldcontext{token.scanposit()}12.34e5}. Because posits are integers we can store
them in \LUAMETATEX\ float registers. Optionally you can return a float instead
of the integer that encodes the posit.
760
761\starttyping[option=LUA]
function token.scanposit (
    <t:boolean> optionalequal,
    <t:boolean> float
)
    return <t:integer> | <t:float>
end
768\stoptyping
769
In (traditional) \TEX\ we don't really have floats. If we enter for instance a
dimension in point units, we actually scan for two 16 bit integers that will be
packed into a 32 bit integer. The next scanner expects a number plus a unit, like
\type {pt}, \type {cm} and \type {em}, but also handles user defined units, like
\type {tw} in \CONTEXT.
775
776\starttyping[option=LUA]
777function token.scandimension (
778 <t:boolean> infinity,
779 <t:boolean> mu,
780 <t:boolean> optionalequal
781)
782 return <t:integer>
783end
784\stoptyping
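
The keyword and dimension scanners combine well. The following loop is a sketch
that assumes that we run inside the engine and that the input looks like \typ
{width 10pt height 5pt}; the keywords \type {width} and \type {height} are just
examples.

\starttyping[option=LUA]
local width, height = 0, 0

-- keep scanning as long as we see one of our keywords
while true do
    if token.scankeyword("width") then
        width = token.scandimension()
    elseif token.scankeyword("height") then
        height = token.scandimension()
    else
        break
    end
end

-- dimensions are integers in scaled points, 65536 of them go in a point
print(width / 65536, height / 65536)
\stoptyping

Because every keyword is optional and the loop stops at the first token that is
not a keyword, the order in which the keys are given does not matter.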
785
A glue (spec) is a dimension with optional stretch and/or shrink, like \typ {12pt plus
4pt minus 2pt} or \typ {10pt plus 1 fill}. The glue scanner returns five values:
788
789\starttyping[option=LUA]
function token.scanglue (
    <t:boolean> mu,
    <t:boolean> optionalequal
)
    return
        <t:integer>, -- amount
        <t:integer>, -- stretch
        <t:integer>, -- shrink
        <t:integer>, -- stretchorder
        <t:integer>  -- shrinkorder
end

function token.scanglue (
    <t:boolean> mu,
    <t:boolean> optionalequal,
    <t:true>
)
    return {
        <t:integer>, -- amount
        <t:integer>, -- stretch
        <t:integer>, -- shrink
        <t:integer>, -- stretchorder
        <t:integer>  -- shrinkorder
    }
end
815\stoptyping
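
A sketch of the first call, assuming that we run inside the engine and that the
input continues with something like \typ {12pt plus 4pt minus 2pt}:

\starttyping[option=LUA]
-- five integers: the three amounts are in scaled points
local amount, stretch, shrink, stretchorder, shrinkorder = token.scanglue()

print(amount, stretch, shrink, stretchorder, shrinkorder)

-- the alternative call collects the same five values in one table:
-- local glue = token.scanglue(false, false, true)
\stoptyping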
816
817The skip scanner does the same but returns a \type {gluespec} node:
818
819\starttyping[option=LUA]
820function token.scanskip (
821 <t:boolean> mu,
822 <t:boolean> optionalequal
823)
824 return <t:node> gluespec
825end
826\stoptyping
827
828There are several token scanners, for instance one that returns a table:
829
830\starttyping[option=LUA]
831function token.scantoks (
832 <t:boolean> macro,
833 <t:boolean> expand
834)
835 return <t:table> tokens
836end
837\stoptyping
838
839Here \type {token.scantoks()} will return \type {{123}} as
840
841\starttyping[option=LUA]
842{
843 "<lua token : 589866 == otherchar 49>",
844 "<lua token : 589867 == otherchar 50>",
845 "<lua token : 589870 == otherchar 51>",
846}
847\stoptyping
848
849The next variant returns a token list:
850
851\starttyping[option=LUA]
852function token.scantokenlist (
853 <t:boolean> macro,
854 <t:boolean> expand
855)
856 return <t:token> tokenlist
857end
858\stoptyping
859
860Here we get the head of a token list:
861
862\starttyping[option=LUA]
863<lua token : 590083 => 169324 : refcount>={
864 ["active"]=false,
865 ["cmdname"]="escape",
866 ["command"]=0,
867 ["expandable"]=false,
868 ["frozen"]=false,
869 ["id"]=590083,
870 ["immutable"]=false,
871 ["index"]=0,
872}
873\stoptyping
874
875This scans a single character token with specified catcode (bit) sets:
876
877\starttyping[option=LUA]
878function token.scancode ( <t:integer> catcodes )
879 return <t:string> character
880end
881\stoptyping
882
883This scans a single character token with catcode letter or other:
884
885\starttyping[option=LUA]
886function token.scantokencode ( )
887 return <t:token>
888end
889\stoptyping
890
The difference between \typ {scanstring} and \typ {scanargument} is that the
first returns a string given between \type {{}}, as \type {\macro} or as a sequence
of characters with catcode 11 or 12 while the second also accepts a \type {\cs}
which then gets expanded one level unless we force further expansion.
895
896\starttyping[option=LUA]
897function token.scanstring ( <t:boolean> expand )
898 return <t:string>
899end
900
901function token.scanargument ( <t:boolean> expand )
902 return <t:string>
903end
904\stoptyping
905
So the \type {scanargument} function expands the given argument. When a braced
argument is scanned, expansion can be prohibited by passing \type {false}
(default is \type {true}). In case of a control sequence passing \type {false}
will result in a one-level expansion (the meaning of the macro).
910
911The string scanner scans for something between curly braces and expands on the
912way, or when it sees a control sequence it will return its meaning. Otherwise it
913will scan characters with catcode \type {letter} or \type {other}. So, given the
914following definition:
915
916\startbuffer
917\def\oof{oof}
918\def\foo{foo\oof}
919\stopbuffer
920
921\typebuffer \getbuffer
922
923we get:
924
925\starttabulate[lTll]
926\FL
927\BC name \BC result \NC \NR
928\TL
929\NC \type {\directlua{token.scanstring()}{foo}} \NC \directlua{context("{\\red\\type {"..token.scanstring().."}}")} {foo} \NC full expansion \NC \NR
930\NC \type {\directlua{token.scanstring()}foo} \NC \directlua{context("{\\red\\type {"..token.scanstring().."}}")} foo \NC letters and others \NC \NR
931\NC \type {\directlua{token.scanstring()}\foo} \NC \directlua{context("{\\red\\type {"..token.scanstring().."}}")}\foo \NC meaning \NC \NR
932\LL
933\stoptabulate
934
935The \type {\foo} case only gives the meaning, but one can pass an already
936expanded definition (\type {\edef}d). In the case of the braced variant one can
937of course use the \type {\detokenize} and \prm {unexpanded} primitives since
938there we do expand.
939
A variant is the following, which gives a bit more control over what doesn't get
expanded:
942
943\starttyping[option=LUA]
944function token.scantokenstring (
945 <t:boolean> noexpand,
946 <t:boolean> noexpandconstant,
947 <t:boolean> noexpandparameters
948)
949 return <t:string>
950end
951\stoptyping
952
Here's one that can scan a delimited argument:
954
955\starttyping[option=LUA]
956function token.scandelimited (
957 <t:integer> leftdelimiter,
958 <t:integer> rightdelimiter,
959 <t:boolean> expand
960)
961 return <t:string>
962end
963\stoptyping
964
A word is a sequence of what \TEX\ calls letters and other characters. The
optional \type {keep} argument ensures that trailing space and \type {\relax}
tokens are pushed back into the input.
968
969\starttyping[option=LUA]
970function token.scanword ( <t:boolean> keep )
971 return <t:string>
972end
973\stoptyping
974
975Here we do the same but only accept letters:
976
977\starttyping[option=LUA]
978function token.scanletters ( <t:boolean> keep )
979 return <t:string>
980end
981\stoptyping
982
983\starttyping[option=LUA]
984function token.scankey ( )
985 return <t:string>
986end
987\stoptyping
988
989We can pick up a string that stops at a specific character with the next
990function, which accepts two such sentinels (think of a comma and closing
991bracket).
992
993\starttyping[option=LUA]
994function token.scanvalue ( <t:integer> one, <t:integer> two )
995 return <t:string>
996end
997\stoptyping
998
This returns a single (\UTF) character. Special input like backslashes, hashes,
etc.\ is interpreted as characters.
1001
1002\starttyping[option=LUA]
1003function token.scanchar ( )
1004 return <t:string>
1005end
1006\stoptyping
1007
1008This scanner looks for a control sequence and if found returns the name.
1009Optionally leading spaces can be skipped.
1010
1011\starttyping[option=LUA]
1012function token.scancsname ( <t:boolean> skipspaces )
    return <t:string> | <t:nil>
1014end
1015\stoptyping
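
Because the result can be \type {nil} it makes sense to check it before use. A
sketch, assuming that we run inside the engine and that a control sequence might
follow in the input:

\starttyping[option=LUA]
-- true: skip leading spaces before looking for the control sequence
local name = token.scancsname(true)

if not name then
    print("no control sequence found")
elseif token.isdefined(name) then
    print(name .. " is defined")
else
    print(name .. " is undefined")
end
\stoptyping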
1016
1017The next one returns an integer instead:
1018
1019\starttyping[option=LUA]
1020function token.scancstoken ( <t:boolean> skipspaces )
    return <t:integer> | <t:nil>
1022end
1023\stoptyping
1024
This is a straightforward simple scanner that expands the next token if needed:
1026
1027\starttyping[option=LUA]
1028function token.scantoken ( )
1029 return <t:token>
1030end
1031\stoptyping
1032
The next scanner picks up a box specification and returns a \type {[hv]list}
node. There are two possible calls. The first variant expects a \type {\hbox}, \type
{\vbox} etc. The second variant scans for an explicitly passed box type: \type
{hbox}, \type {vbox}, \type {vtop} or \type {dbox}.
1037
1038\starttyping[option=LUA]
1039function token.scanbox ( )
1040 return <t:node> box
1041end
1042
1043function token.scanbox ( <t:string> boxtype )
1044 return <t:node> box
1045end
1046\stoptyping
1047
1048This scans and returns a so called \quote {detokenized} string:
1049
1050\starttyping[option=LUA]
1051function token.scandetokened ( <t:boolean> expand )
1052 return <t:string>
1053end
1054\stoptyping
1055
1056In the next function we check if a specific character with catcode
1057letter or other is picked up.
1058
1059\starttyping[option=LUA]
1060function token.isnextchar ( <t:integer> charactercode )
1061 return <t:boolean>
1062end
1063\stoptyping
1064
1065\stopsubsection
1066
1067\startsubsection[title={Gobbling}]
1068
1069You can gobble up an integer or dimension with the following helpers. An error is silently
1070ignored.
1071
1072\starttyping[option=LUA]
function token.gobbleinteger ( <t:boolean> optionalequal )
    -- no return values
end

function token.gobbledimension ( <t:boolean> optionalequal )
    -- no return values
end
1080\stoptyping
1081
1082This is a nested gobbler:
1083
1084\starttyping[option=LUA]
1085function token.gobble ( <t:token> left, <t:token> right )
    -- no return values
1087end
1088\stoptyping
1089
1090and this a nested grabber that returns a string:
1091
1092\starttyping[option=LUA]
1093function token.grab ( <t:token> left, <t:token> right )
1094 return <t:string>
1095end
1096\stoptyping
1097
1098\stopsubsection
1099
1100\startsubsection[title={Macros}]
1101
This is a nasty one. It picks up two tokens. Then it checks if the next character
matches the argument and if so, it pushes the first token back into the input,
otherwise the second.
1105
1106\starttyping[option=LUA]
1107function token.futureexpand ( <t:integer> charactercode )
    -- no return values
1109end
1110\stoptyping
1111
The \type {pushmacro} and \type {popmacro} functions are still experimental and
can be used to get and set an existing macro. The push call returns a userdata
object and the pop takes such a userdata object. These objects have no accessors
and are to be seen as abstractions.
1116
1117\starttyping[option=LUA]
1118function token.pushmacro ( <t:string> csname )
1119 return <t:userdata>
1120end
1121
1122function token.pushmacro ( <t:integer> token )
1123 return <t:userdata> entry
1124end
1125\stoptyping
1126
1127\starttyping[option=LUA]
1128function token.popmacro ( <t:userdata> entry )
1129 return todo
1130end
1131\stoptyping
1132
This saves a \LUA\ function index on the save stack. When a group is closed the
function will be called.
1135
1136\starttyping[option=LUA]
1137function token.savelua ( <t:integer> functionindex, <t:boolean> backtrack )
1138 no return values
1139end
1140\stoptyping

The next function serializes a token list:

\starttyping[option=LUA]
function token.serialize ( )
    return <t:string>
end
\stoptyping

The function is somewhat picky, so we give an example in \CONTEXT\ speak:

\startbuffer
\startluacode
    local t = token.scantokenlist()
    local s = token.serialize(t)
    context.type(tostring(t)) context.par()
    context.type(s) context.par()
    context(s) context.par()
\stopluacode {before\hskip10pt after}
\stopbuffer

\typebuffer

The serializer expects a token list as scanned by \typ {scantokenlist}, which
starts with a token that points to the list and maintains a reference count. In
this context that count is irrelevant, but the engine uses it to prevent
duplicates; for instance the \type {\let} primitive just points to the original
and bumps the count.

\startlines
\getbuffer
\stoplines

You can interpret a string as \TEX\ input with embedded macros expanded, unless
they are unexpandable.

\starttyping[option=LUA]
function token.getexpansion ( <t:string> code )
    return <t:string> result
end
\stoptyping

Here is an example:

\startbuffer
          \def\foo{foo}
\protected\def\oof{oof}

\startluacode
context.type(token.getexpansion("test \relax"))
context.par()
context.type(token.getexpansion("test \\relax{!} \\foo\\oof"))
\stopluacode
\stopbuffer

\typebuffer

Watch how the single backslash in the first call actually is a \LUA\ escape
(\type {\r}, a carriage return) that results in a new line:

\startlines
\getbuffer
\stoplines

You can also specify a catcode table identifier:

\starttyping[option=LUA]
function token.getexpansion (
    <t:integer> catcodetable,
    <t:string>  code
)
    return <t:string> result
end
\stoptyping
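A sketch of this second variant, under the assumption that \typ
{catcodes.numbers.ctxcatcodes} holds the number of the regular \CONTEXT\ catcode
table. The macro \type {\foo} is only defined for the sake of the example:

\starttyping
\def\foo{foo}

\startluacode
    -- assumption: this is how ConTeXt exposes its catcode table numbers
    local regime = catcodes.numbers.ctxcatcodes
    context.type(token.getexpansion(regime,"\\foo"))
\stopluacode
\stoptyping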

\stopsubsection

\startsubsection[title={Information}]

In some cases you signal to \LUA\ what data type is involved. The list of known
types is available with:

\starttyping[option=LUA]
function token.getfunctionvalues ( )
    return <t:table>
end
\stoptyping

\startthreerows
\getbuffer[engine:syntax:functioncodes]
\stopthreerows
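The returned table can be traversed like any other \LUA\ table. The next sketch
just prints every pair and makes no assumption about which side is the code and
which the name; \type {table.sortedhash} is a \CONTEXT\ helper and not part of
the engine:

\starttyping
\startluacode
    local values = token.getfunctionvalues()
    -- sortedhash iterates over the pairs in sorted key order
    for key, value in table.sortedhash(values) do
        context.type(string.format("%s: %s",key,value))
        context.par()
    end
\stopluacode
\stoptyping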

The names of the commands are made available with:

\starttyping[option=LUA]
function token.getcommandvalues ( )
    return <t:table>
end
\stoptyping

\starttworows
\getbuffer[engine:syntax:commandcodes]
\stoptworows

The complete list of primitives can be fetched with the next one:

\starttyping[option=LUA]
function token.getprimitives ( )
    return {
        { <t:integer>, <t:integer>, <t:string> }, -- command, value, name
        ...
    }
end
\stoptyping

The numbers shown below can change if we add or reorganize primitives, although
this seldom happens. The list gives an impression of how primitives are grouped.

\showengineprimitives[2]
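Because each entry carries its command code, the grouping can also be
summarized from \LUA. The following sketch counts the primitives per command. It
assumes that \type {getcommandvalues} maps command codes to names (unknown codes
are reported as such) and it uses the \CONTEXT\ helper \type {table.sortedhash}
to get a stable order:

\starttyping
\startluacode
    local primitives = token.getprimitives()
    local commands   = token.getcommandvalues()
    local counts     = { }
    for i=1,#primitives do
        local command = primitives[i][1] -- entries are { command, value, name }
        counts[command] = (counts[command] or 0) + 1
    end
    for command, count in table.sortedhash(counts) do
        local name = commands[command] or "unknown" -- assumed code to name mapping
        context.type(string.format("%s (%s): %s",name,command,count))
        context.par()
    end
\stopluacode
\stoptyping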

This is a curious one: it returns the number of steps that a hash lookup took:

\starttyping[option=LUA]
function token.locatemacro ( <t:string> name )
    return <t:integer> steps
end
\stoptyping

We used this helper when deciding on a reasonable hash size. Of the many
primitives there are a few that need more than one lookup step:

\startluacode
local p = token.getprimitives()
local d = { { }, { }, { }, { } }
local n = { 0 , 0 , 0 , 0 }
table.sort(p,function(a,b) return a[3] < b[3] end)
for i=1,#p do
    local m = p[i][3]
    local s = token.locatemacro(m)
    if n[s] then
        if s > 1 then
            table.insert(d[s],m)
        end
        n[s] = n[s] + 1
    else
        print(">>>>>>>>>>>>>>>>>>>>>>>>>> check",s)
    end
end
context.starttabulate { "|c|r|lpT|" }
context.FL()
context.BC() context("steps")
context.BC() context("total")
context.BC() context("macros")
context.NC() context.NR()
context.TL()
for i=1,4 do
    local di = d[i]
    local ni = n[i]
    if ni > 0 then
        context.NC() context(i)
        context.NC() context(ni)
        context.NC() if ni > 20 then context.unknown() else context("% t",di) end
        context.NC() context.NR()
    end
end
context.LL()
context.stoptabulate()
\stopluacode

\stopsubsection

\stopsection

\stopdocument