% language=us
% still-expanding.tex, last modification: 2023-12-21 09:43

\environment still-environment

\starttext

\startchapter[title=Possibly useful extensions]

\startsection[title=Introduction]

While working on \LUATEX, it is tempting to introduce all kinds of new fancy
programming features. Arguments for doing this can be characterized by
descriptions like \quote {handy}, \quote {speedup}, \quote {less code}, and
\quote {necessity}. It must be stated that traditional \TEX\ is rather complete,
and one can do quite a lot of macro magic to achieve many goals. So let us look a
bit more at the validity of these arguments.

The \quote {handy} argument is in fact a valid one. Of course, one can always
wrap clumsy code in a macro to hide the dirty tricks, but, still, it would be
nicer to avoid needing to employ extremely dirty tricks. I found myself looking
at old code wondering why something has to be done in such a complex way, only to
realize, after a while, that it comes with the concept; one can get accustomed to
it. After all, every programming language has its stronger and weaker aspects.

The \quote {speedup} argument is theoretically a good one too, but, in practice,
it's hard to prove that a speedup really occurs. Say we save 5\% on a job. This
is nice for multipass runs on a server where many jobs run at the same time or
after each other, but a little bit of clever macro coding will easily gain much
more. Or, as we often see: sloppy macro or style writing will easily negate those
gains. Another pitfall is that you can measure that (say) half a million calls to
a macro can indeed be brought down to a fraction of their runtime thanks to some
helper, but, in practice, you will not see that gain because saving 0.1 seconds
on a 10 second run can be neglected. Furthermore, adding a single page to the
document will already make such a gain invisible to the user as that will itself
increase the runtime. Of course, many small speedups can eventually accumulate to
yield a significant overall gain, but, if the macro package is already quite
optimized, it might not be easy to squeeze out much more. At least in \CONTEXT, I
find it hard to locate bottlenecks that could benefit from extensions, unless one
adds very specific features, which is not what we want.

Of course one can create \quote {less code} by using more wrappers. But this can
definitely have a speed penalty, so this argument should be used with care. An
appropriate extra helper can make wrappers fast and the fewer helpers the better.
The danger is in choosing which helpers to add. A good criterion is that the task
should be hard to accomplish otherwise in \TEX. Adding more primitives (and
overhead) merely because some macro package would like it would be bad practice.
I'm confident that helpers for \CONTEXT\ would not be that useful for plain \TEX,
\LATEX, etc., and vice versa.

The \quote {necessity} argument is a strong one. Many already present extensions
from \ETEX\ fall into this category: fully expandable expressions (although the
implementation is somewhat restricted), better macro protection, expansion
control, and the ability to test for a so|-|called csname (control sequence name)
are examples.

In the end, the only valid argument is \quote {it can't be done otherwise}, which
is a combination of all these arguments with \quote {necessity} being dominant.
This is why in \LUATEX\ there are not that many extensions to the language (nor
will there be). I must admit that even after years of working with \TEX, the
number of wishes for more facilities is not that large.

The extensions in \LUATEX, compared to traditional \TEX, can be summarized as
follows:

\startitemize
    \startitem
        Of course we have the \ETEX\ extensions, and these already have
        a long tradition of proven usage. We did remove the limited directional
        support.
    \stopitem
    \startitem
        From \ALEPH\ (the follow-up to \OMEGA), part of the directional support
        and some font support was inherited.
    \stopitem
    \startitem
        From \PDFTEX, we took most of the backend code, but it has been improved
        in the meantime. We also took the protrusion and expansion code, but
        especially the latter has been implemented a bit differently (in the
        frontend as well as in the backend).
    \stopitem
    \startitem
        Some handy extensions from \PDFTEX\ have been generalized; other
        obscure or specialized ones have been removed. So we now have
        frontend support for position tracking, resources (images) and reusable
        content in the core. The backend code has been separated a bit better and
        only a few backend|-|related primitives remain.
    \stopitem
    \startitem
        The input encoding is now \UTF-8, exclusively, but one can easily hook in
        code to preprocess data that enters \TEX's parser using \LUA. The
        characteristic catcode settings for \TEX\ can be grouped and switched
        efficiently.
    \stopitem
    \startitem
        The font machinery has been opened wide so that we can use the embedded
        \LUA\ interpreter to implement any technology that we might want, with
        the usual control that \TEX ies like. Some further limitations have been
        lifted. One interesting point is that one can now construct virtual fonts
        at runtime.
    \stopitem
    \startitem
        Ligature construction, kerning and paragraph building have been separated
        as a side effect of \LUA\ control. There are some extensions in that
        area. For instance, we store the language and min|/|max values in the
        glyph nodes, and we also store penalties with discretionaries. Patterns
        can be loaded at runtime, and character codes that influence
        hyphenation can be manipulated.
    \stopitem
    \startitem
        The math renderer has been upgraded to support \OPENTYPE\ math. This has
        resulted in many new primitives and extensions, not only to define
        characters and spacing, but also to control placement of superscripts and
        subscripts and generally to influence the way things are constructed. A
        couple of mechanisms have gained control options.
    \stopitem
    \startitem
        Several \LUA\ interfaces are available, making it possible to manipulate
        the (intermediate) results. One can pipe text to \TEX, write parsers, mess
        with node lists, inspect attributes assigned at the \TEX\ end, etc.
    \stopitem
\stopitemize

Some of the features mentioned above are rather \LUATEX\ specific, such as
catcode tables and attributes. They are present as they permit more advanced
\LUA\ interfacing. Other features, such as \UTF-8\ and \OPENTYPE\ math, are a
side effect of more modern techniques. Bidirectional support is there because it
was one of the original reasons for going forward with \LUATEX. The removal of
backend primitives and thereby separating the code in a better way (see companion
article) comes from the desire to get closer to the traditional core, so that
most documentation by Don Knuth still applies. It's also the reason why we still
speak of \quote {tokens}, \quote {nodes} and \quote {noads}.

In the following sections I will discuss a few new low|-|level primitives. This
is not a complete description (after all, we have reported on much already), and
one can consult the \LUATEX\ manual to get the complete picture. The extensions
described below are also relatively new and date from around version 0.85, the
prelude to the stable version~1 release.

\stopsection

\startsection[title=Rules]

For insiders, it is no secret that \TEX\ has no graphic capabilities, apart from
the ability to draw rules. But with rules you can do quite a lot already. Add to
that the possibility to insert arbitrary graphics or even backend drawing
directives, and the average user won't notice that it's not true core
functionality.

When we started with \LUATEX, we used code from \PDFTEX\ and \OMEGA\ (\ALEPH),
and, as a consequence, we ended up with many whatsits. Normal running text has
characters, kerns, some glue, maybe boxes, all represented by a limited set of
so|-|called nodes. A whatsit is a kind of escape as it can be anything an
extension to \TEX\ needs to wrap up and put in the current list. Examples are (in
traditional \TEX\ already) whatsits that write to file (using \type {\write}) and
whatsits that inject code into the backend (using \type {\special}). The
directional mechanism of \OMEGA\ uses whatsits to indicate direction changes.

For a long time images were also included using whatsits, and basically one had
to reserve the right amount of space and inject a whatsit with a directive for
the backend to inject something there with given dimensions or scale. Of course,
one then needs methods to figure out the image properties, but, in the end, all
of this could be done rather easily.

In \PDFTEX, two new whatsits were introduced: images and reusable so|-|called
forms, and, contrary to other whatsits, these do have dimensions. As a result,
suddenly the \TEX\ code base could no longer just ignore whatsits, but it had to
check for these two when dimensions were important, for instance in the paragraph
builder, packager, and backend.

So what has this to do with rules? Well, in \LUATEX\ all the whatsits are now
back to where they belong, in the backend extension code. Directions are now
first|-|class nodes, and we have native resources and reusable boxes. These
resources and boxes are an abstraction of the \PDFTEX\ images and forms, and,
internally, they are a special kind of rule (i.e.\ a blob with dimensions).
Because checking for rules is part of the (traditional) \TEX\ kernel, we could
simply remove the special whatsit code and let existing rule|-|related code do
the job. This simplified the code a lot.

Because we suddenly had two more types of rules, we took the opportunity to add a
few more.

\starttyping
\nohrule width 10cm height 2cm depth 0cm
\novrule width 10cm height 2cm depth 0cm
\stoptyping

This is a way to reserve space, and it's nearly equivalent to the following
(respectively):

\starttyping
{\setbox0\hbox{}\wd0=10cm\ht0=2cm\dp0=0cm\box0\relax}
{\setbox0\vbox{}\wd0=10cm\ht0=2cm\dp0=0cm\box0\relax}
\stoptyping

There is no real gain in efficiency because keywords also take time to parse, but
the advantage is that no \LUA\ callbacks are triggered. \footnote {I am still
considering adding variants of \type {\hbox} and \type {\vbox} where no callback
would be triggered.} Of course, this variant would not have been introduced had
we still had just rules and no further subtypes; it was just a rather trivial
extension that fit in the repertoire. \footnote {This is one of the things I
wanted to have for a long time, but it seems less useful today.}
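
As a small illustration (a sketch with arbitrary dimensions, not taken from any
macro package), a \type {\novrule} can be used inline to reserve a gap between
two pieces of text without building an empty box first:

\starttyping
\hbox{left\novrule width 2cm height 1ex depth 0pt right}
\stoptyping

Nothing is drawn here; the rule only contributes its dimensions to the
surrounding box.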

So, while we were at it, yet another rule type was introduced, but this one has
been made available only in \LUA. As this text is about \LUATEX, a bit of \LUA\
code does fit into the discussion, so here we go. The code shown here is rather
generic and looks somewhat different in \CONTEXT, but it does the job.

First, let's create a straightforward rectangle drawing routine. We initialize
some variables first, then scan properties using the token scanner, and, finally,
we construct the rectangle using four rules. The packaged (so|-|called) hlist is
written to \TEX.

\startbuffer
\startluacode
function FramedRule()
    local width     = 0
    local height    = 0
    local depth     = 0
    local linewidth = 0
    -- scan the optional keywords, in any order
    while true do
        if token.scan_keyword("width") then
            width = token.scan_dimen()
        elseif token.scan_keyword("height") then
            height = token.scan_dimen()
        elseif token.scan_keyword("depth") then
            depth = token.scan_dimen()
        elseif token.scan_keyword("line") then
            linewidth = token.scan_dimen()
        else
            break
        end
    end
    local doublelinewidth = 2*linewidth
    -- four rules make up the frame, the kern moves back for the top rule
    local left    = node.new("rule")
    local bottom  = node.new("rule")
    local right   = node.new("rule")
    local top     = node.new("rule")
    local back    = node.new("kern")
    local list    = node.new("hlist")
    -- left and right only get a width, bottom and top also a height and depth
    left.width    = linewidth
    bottom.width  = width - doublelinewidth
    bottom.height = -depth + linewidth
    bottom.depth  = depth
    right.width   = linewidth
    top.width     = width - doublelinewidth
    top.height    = height
    top.depth     = -height + linewidth
    back.kern     = -width + linewidth
    list.list     = left
    list.width    = width
    list.height   = height
    list.depth    = depth
    list.dir      = "TLT"
    -- link the nodes in the order: left, bottom, right, back, top
    node.insert_after(left,left,bottom)
    node.insert_after(left,bottom,right)
    node.insert_after(left,right,back)
    node.insert_after(left,back,top)
    --
    node.write(list)
end
\stopluacode
\stopbuffer

\typebuffer \getbuffer

This function can be wrapped in a macro:

\startbuffer
\def\FrameRule{\directlua{FramedRule()}}
\stopbuffer

\typebuffer \getbuffer

and the macro can be used as follows:

\startbuffer
\FrameRule width 3cm height 1cm depth 1cm line 2pt
\stopbuffer

\typebuffer

The result is: \inlinebuffer

A different approach follows. Again, we define a rule, but this time we only set
dimensions and assign some attributes to it. Normally, one would reserve some
attribute numbers for this purpose, but, for our example here, high numbers are
safe enough. Now there is no need to wrap the rule in a box.

\startbuffer
\startluacode
function FramedRule()
    local width     = 0
    local height    = 0
    local depth     = 0
    local linewidth = 0
    local radius    = 0
    local type      = 0
    -- scan the optional keywords, in any order
    while true do
        if token.scan_keyword("width") then
            width = token.scan_dimen()
        elseif token.scan_keyword("height") then
            height = token.scan_dimen()
        elseif token.scan_keyword("depth") then
            depth = token.scan_dimen()
        elseif token.scan_keyword("line") then
            linewidth = token.scan_dimen()
        elseif token.scan_keyword("type") then
            type = token.scan_int()
        elseif token.scan_keyword("radius") then
            radius = token.scan_dimen()
        else
            break
        end
    end
    -- a single rule carries the dimensions, attributes carry the rest
    local r   = node.new("rule")
    r.width   = width
    r.height  = height
    r.depth   = depth
    r.subtype = 4 -- user rule
    r[20000]  = type
    r[20001]  = linewidth
    r[20002]  = radius or 0
    node.write(r)
end
\stopluacode
\stopbuffer

\typebuffer \getbuffer

Rule nodes with subtype~4 (user) are intercepted and passed to a callback
function, when one is set. Here we show a possible implementation:

\startbuffer
\startluacode
local bpfactor = (7200/7227)/65536 -- from scaled points to big points

local f_rectangle = "%f w 0 0 %f %f re %s"

local f_radtangle = [[
    %f w %f 0 m
    %f 0 l %f %f %f %f y
    %f %f l %f %f %f %f y
    %f %f l %f %f %f %f y
    %f %f l %f %f %f %f y
    h %s
]]

callback.register("process_rule",function(n,h,v)
    local t = n[20000] == 0 and "f" or "s" -- fill or stroke
    local l = n[20001] * bpfactor -- linewidth
    local r = n[20002] * bpfactor -- radius
    local w = h * bpfactor
    local h = v * bpfactor
    local p -- keep the path string local instead of global
    if r > 0 then
        p = string.format(f_radtangle,
            l, r, w-r, w,0,w,r, w,h-r, w,h,w-r,h,
            r,h, 0,h,0,h-r, 0,r, 0,0,r,0, t)
    else
        p = string.format(f_rectangle, l, w, h, t)
    end
    pdf.print("direct",p)
end)
\stopluacode
\stopbuffer

\typebuffer \getbuffer

We can now also specify a radius and type, where \type {0} is a filled and \type
{1} a stroked shape.

\startbuffer
\FrameRule
    type   1
    width  3cm
    height 1cm
    depth  5mm
    line   0.2mm
    radius 2.5mm
\stopbuffer

\typebuffer

Since we specified a radius we get round corners: \inlinebuffer
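
For comparison, and assuming the callback shown above is the one that is
registered, a filled shape is requested with type \type {0}. This is also the
value that the scanner starts out with, so the keyword could be left out. No
\type {line} is given because a filled shape has no use for a line width:

\starttyping
\FrameRule
    type   0
    width  3cm
    height 1cm
    depth  5mm
    radius 2.5mm
\stoptyping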

The nice thing about these extensions to rules is that the internals of \TEX\ are
not affected much. Rules are just blobs with dimensions and the par builder, for
instance, doesn't care what they are. There is no need for further inspection.
Maybe future versions of \LUATEX\ will provide more useful subtypes.

\stopsection

\startsection[title=Spaces]

Multiple successive spaces in \TEX\ are normally collapsed into one. But, what if
you don't want any spaces at all? It turns out this is rather hard to achieve.
You can, of course, change the catcodes, but that won't work well if you pass
text around as macro arguments. Also, you would not want spaces that separate
macros and text to be ignored, but only those in the typeset text. For such use,
\LUATEX\ introduces \type {\nospaces}.

This new primitive can be used to overrule the usual \type {\spaceskip}|-|related
heuristics when a space character is seen in a text flow. The value~\type{1}
specifies no injection, a value of \type{2} results in injection of a zero skip,
and the default \type{0} gets the standard behavior. Below we see the results for
four characters separated by spaces.

\startlinecorrection \dontcomplain
\startcombination[nx=3,ny=2,distance=1cm]
    {\ruledhbox to 4cm{\vtop{\hsize 10mm\nospaces=0\relax x x x x \par}\hss}} {\type {0 / hsize 10mm}}
    {\ruledhbox to 4cm{\vtop{\hsize 10mm\nospaces=1\relax x x x x \par}\hss}} {\type {1 / hsize 10mm}}
    {\ruledhbox to 4cm{\vtop{\hsize 10mm\nospaces=2\relax x x x x \par}\hss}} {\type {2 / hsize 10mm}}
    {\ruledhbox to 4cm{\vtop{\hsize  1mm\nospaces=0\relax x x x x \par}\hss}} {\type {0 / hsize 1mm}}
    {\ruledhbox to 4cm{\vtop{\hsize  1mm\nospaces=1\relax x x x x \par}\hss}} {\type {1 / hsize 1mm}}
    {\ruledhbox to 4cm{\vtop{\hsize  1mm\nospaces=2\relax x x x x \par}\hss}} {\type {2 / hsize 1mm}}
\stopcombination
\stoplinecorrection

In case you wonder why setting the space|-|related skips to zero is not enough:
even when they are set to zero you will always get something. What gets inserted
depends on \type {\spaceskip}, \type {\xspaceskip}, \type {\spacefactor} and font
dimensions. I must admit that I always have to look up the details, as, normally,
it's wrapped up in a spacing system that you implement once then forget about. In
any case, with \type {\nospaces}, you can completely get rid of even an inserted
zero space.
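
In its simplest form, the three settings described above can be tried in a
restricted horizontal box. This is only a sketch; the comments restate the
behavior as described in the text:

\starttyping
\hbox{\nospaces=0 x x x x} % default: the normal interword glue
\hbox{\nospaces=1 x x x x} % nothing is injected for a space
\hbox{\nospaces=2 x x x x} % a zero skip is injected for each space
\stoptyping

As the setting is an ordinary integer parameter, it obeys grouping, so the boxes
above do not influence each other.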

\stopsection

\startsection[title=Token lists]

The following four primitives are provided because they are more efficient than
macro|-|based variants: \type {\toksapp}, \type {\tokspre}, and \type {\e...}
(expanding) versions of both. They can be used to append or prepend tokens to a
token register.

However, don't overestimate the gain that can be brought in simple situations
with not that many tokens involved (read: there is no need to instantly change
all code that does it the traditional way). The new method avoids saving tokens
in a temporary register. Then, when you combine registers (which is also
possible), the source gets appended to the target and, afterwards, the source is
emptied: we don't copy but combine!
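
To make the comparison concrete, here is a sketch of one common traditional
idiom next to the new primitives. The register \type {\ToksA} is just a scratch
register allocated with \type {\newtoks}, and the \type {\expandafter} chains
are only one of several ways to do this in macro code:

\starttyping
% appending, the traditional way and with the new primitive
\ToksA\expandafter{\the\ToksA !!}
\toksapp\ToksA{!!}

% prepending, the traditional way and with the new primitive
\ToksA\expandafter{\expandafter!\expandafter!\the\ToksA}
\tokspre\ToksA{!!}
\stoptyping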

Their use can best be demonstrated by examples. We employ a scratch register
\type {\ToksA}. The examples here show the effects of grouping; in fact, they
were written for testing this effect. Because we don't use the normal assignment
code, we need to initialize a local copy in order to get the original content
outside the group.

\newtoks\ToksA
\newtoks\ToksB

\startbuffer
\ToksA{}
\bgroup
    \ToksA{}
    \bgroup \toksapp\ToksA{!!} [\the\ToksA=!!] \egroup
    [\the\ToksA=]
\egroup
[\the\ToksA=]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

\startbuffer
\ToksA{}
\bgroup
    \ToksA{A}
    \bgroup \toksapp\ToksA{!!} [\the\ToksA=A!!] \egroup
    [\the\ToksA=A]
\egroup
[\the\ToksA=]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

\startbuffer
\ToksA{}
\bgroup
    \ToksA{}
    \bgroup
        \ToksA{A} \toksapp\ToksA{!!} [\the\ToksA=A!!]
    \egroup
    [\the\ToksA=]
\egroup
[\the\ToksA=]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

\startbuffer
\ToksA{}
\bgroup
    \ToksA{A}
    \bgroup
        \ToksA{} \toksapp\ToksA{!!} [\the\ToksA=!!]
    \egroup
    [\the\ToksA=A]
\egroup
[\the\ToksA=]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

\startbuffer
\ToksA{}
\bgroup
    \ToksA{}
    \bgroup
        \tokspre\ToksA{!!} [\the\ToksA=!!]
    \egroup
    [\the\ToksA=]
\egroup
[\the\ToksA=]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

\startbuffer
\ToksA{}
\bgroup
    \ToksA{A}
    \bgroup
        \tokspre\ToksA{!!} [\the\ToksA=!!A]
    \egroup
    [\the\ToksA=A]
\egroup
[\the\ToksA=]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

\startbuffer
\ToksA{}
\bgroup
    \ToksA{}
    \bgroup
        \ToksA{A} \tokspre\ToksA{!!} [\the\ToksA=!!A]
    \egroup
    [\the\ToksA=]
\egroup
[\the\ToksA=]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

\startbuffer
\ToksA{}
\bgroup
    \ToksA{A}
    \bgroup
        \ToksA{} \tokspre\ToksA{!!} [\the\ToksA=!!]
    \egroup
    [\the\ToksA=A]
\egroup
[\the\ToksA=]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

Here we used \type {\toksapp} and \type {\tokspre}, but there are two more
primitives, \type {\etoksapp} and \type {\etokspre}; these expand the given
content while it gets added.
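
The difference between the plain and the expanding variants can be seen in the
following sketch. It assumes that the expansion is a full one, as in an \type
{\edef}; the macro \type {\Foo} is only there for the illustration:

\starttyping
\def\Foo{X}
\ToksA{}
\toksapp \ToksA{\Foo} % the token \Foo itself is stored
\etoksapp\ToksA{\Foo} % the expansion of \Foo is stored
\def\Foo{Y}
[\the\ToksA]
\stoptyping

Under that assumption the register holds \type {\Foo} followed by an \type {X},
so the later redefinition only affects the first of the two.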

The next example demonstrates that you can also append another token list. In
this case the original content of the source register is gone after an append or
prepend.

\startbuffer
\ToksA{A}
\ToksB{B}
\toksapp\ToksA\ToksB
\toksapp\ToksA\ToksB
[\the\ToksA=AB]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

This is intended behavior! The original content of the source is not copied but
really appended or prepended. Of course, grouping works well.

\startbuffer
\ToksA{A}
\ToksB{B}
\bgroup
    \toksapp\ToksA\ToksB
    \toksapp\ToksA\ToksB
    [\the\ToksA=AB]
\egroup
[\the\ToksA=AB]
\stopbuffer

\typebuffer result: {\nospacing\start\tttf\inlinebuffer\stop}

\stopsection

\startsection[title=Active characters]

We now enter an area of very dirty tricks. If you have read the \TEX\ book or
listened to talks by \TEX\ experts, you will, for sure, have run into the term
\quote {active} characters. In short, it boils down to this: each character has a
catcode and there are 16 possible values. For instance, the backslash normally
has catcode zero, braces have values one and two, and normal characters can be 11
or 12. Very special are characters with code 13 as they are \quote {active} and
behave like macros. In Plain \TEX, the tilde is one such active character, and
it's defined to be a \quote {non|-|breakable space}. In \CONTEXT, the vertical
bar is active and used to indicate compound and fence constructs.

Below is an example of a definition:

\starttyping
\catcode`A=13
\def A{B}
\stoptyping

This will make the \type {A} into an active character that will typeset a \type
{B}. Of course, such an example is asking for problems since any \type {A} is
seen that way, so a macro name that uses one will not work. Speaking of macros:

\starttyping
\def\whatever
  {\catcode`A=13
   \def A{B}}
\stoptyping

This won't work out well. When the macro is read, it gets tokenized and stored,
and at that time the catcode change has not yet happened. So, when this macro is
called, the \type {A} is frozen with catcode letter (11) and the \type {\def}
will not work as expected (it gives an error). The solution is this:

\starttyping
\bgroup
\catcode`A=13
\gdef\whatever
  {\catcode`A=13
   \def A{B}}
\egroup
\stoptyping

Here we make the \type {A} active before the definition and we use grouping
because we don't want that to be permanent. But still we have a hard|-|coded
solution, while we might want a more general one that can be used like this:

\starttyping
\whatever{A}{B}
\whatever{=}{{\bf =}}
\stoptyping

Here is the definition of \type {\whatever}:

\starttyping
\bgroup
\catcode`~=13
\gdef\whatever#1#2%
  {\uccode`~=`#1\relax
   \catcode`#1=13
   \uppercase{\def\tempwhatever{~}}%
   \expandafter\gdef\tempwhatever{#2}}
\egroup
\stoptyping

If you read backwards, you can imagine that \type {\tempwhatever} expands into an
active \type {A} (the first argument). So how did it become one? The trick is in
the \type {\uppercase} (a \type {\lowercase} variant will also work). When casing
an active character, \TEX\ applies the (here) uppercase and makes the result
active too.

We can argue about the beauty of this trick or its weirdness, but it is a fact
that for a novice user this indeed looks more than a little strange. And so, a
new primitive \type {\letcharcode} has been introduced, not so much out of
necessity but simply driven by the fact that, in my opinion, it looks more
natural. Normally the meaning of the active character can be put in its own
macro, say:

\starttyping
\def\MyActiveA{B}
\stoptyping

We can now directly assign this meaning to the active character:

\starttyping
\letcharcode`A=\MyActiveA
\stoptyping

Now, when \type {A} is made active this meaning kicks in.

\starttyping
\def\whatever#1#2%
  {\def\tempwhatever{#2}%
   \letcharcode`#1=\tempwhatever
   \catcode`#1=13\relax}
\stoptyping

We end up with less code but, more important, it is easier to explain to a user
and, in my eyes, it looks less obscure, too. Of course, the educational gain here
wins over any practical gain because a macro package hides such details and only
implements such an active character installer once.
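
A usage sketch, assuming that the assignments done by this \type {\whatever} are
local (as is the case for \type {\def} and \type {\catcode}, and presumably also
for \type {\letcharcode}), so that a group limits the effect:

\starttyping
\bgroup
    \whatever{=}{{\bf =}}
    a = b
\egroup
a = b
\stoptyping

The equal sign inside the second argument was tokenized before the catcode
change, so it is not active and does not call itself. Only the first \type {a =
b} gets the bold variant.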

\stopsection

\startsection[title=\type {\csname} and friends]

You can check for a macro being defined as follows:

\starttyping
\ifdefined\foo
    do something
\else
    do nothing
\fi
\stoptyping

which, of course, can be obscured to:

\starttyping
do \ifdefined\foo some\else no\fi thing
\stoptyping

A bit more work is needed when a macro is defined using \type {\csname}, in which
case arbitrary characters (like spaces) can be used:

\starttyping
\ifcsname something or nothing\endcsname
    do something
\else
    do nothing
\fi
\stoptyping

Before \ETEX, this was done as follows:

\starttyping
\expandafter\ifx\csname something or nothing\endcsname\relax
    do nothing
\else
    do something
\fi
\stoptyping

The \type {\csname} primitive will do a lookup and create an entry in the hash
for an undefined name that then defaults to \type {\relax}. This can result in
many unwanted entries when checking potential macro names. Thus, \ETEX's \type
{\ifcsname} test primitive can be qualified as a \quote {necessity}.
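
The side effect of the traditional test can be made visible with a small
experiment. The name used here is arbitrary and assumed to be undefined
beforehand:

\starttyping
\ifcsname a fresh name\endcsname yes\else no\fi % no
\expandafter\ifx\csname a fresh name\endcsname\relax \fi
\ifcsname a fresh name\endcsname yes\else no\fi % yes
\stoptyping

After the second line the name exists with the meaning \type {\relax}, so from
then on \type {\ifcsname} considers it defined (within the current group).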

Now take the following example:

\starttyping
\ifcsname do this\endcsname
    \csname do this\endcsname
\else\ifcsname do that\endcsname
    \csname do that\endcsname
\else
    \csname do nothing\endcsname
\fi\fi
\stoptyping

If \type {do this} is defined, we have two lookups. If it is undefined and \type
{do that} is defined, we have three lookups. So there is always one redundant
lookup. Also, when no match is found, \TEX\ has to skip to the \type {\else} or
\type {\fi}. One can save a bit by uglifying this to:

\starttyping
\csname do %
    \ifcsname do this\endcsname this\else
    \ifcsname do that\endcsname that\else
                            nothing\fi\fi
\endcsname
\stoptyping

This, of course, assumes that there is always a final branch. So let's get back
to:

\starttyping
\ifcsname do this\endcsname
    \csname do this\endcsname
\else\ifcsname do that\endcsname
    \csname do that\endcsname
\fi\fi
\stoptyping

As said, when there is some match, there is always one test too many. In case you
think this might be slowing down \TEX, be warned: it's hard to measure. But as
there can be (m)any character(s) involved, including multi|-|byte \UTF-8\
characters or embedded macros, there is a bit of penalty in terms of parsing
token lists and converting to \UTF\ strings used for the lookup. And, because
\TEX\ has to give an error message in case of troubles, the already|-|seen tokens
are stored too.

So, in order to avoid this somewhat redundant operation of parsing, memory
allocation (for the lookup string) and storing tokens, the new primitive \type
{\lastnamedcs} is now provided:

\starttyping
\ifcsname do this\endcsname
    \lastnamedcs
\else\ifcsname do that\endcsname
    \lastnamedcs
\fi\fi
\stoptyping

In addition to the (in practice, often negligible) speed gain, there are other
advantages: \TEX\ has less to skip, and although skipping is fast, it still isn't
a nice side effect (also useful when tracing). Another benefit is that we don't
have to type the to|-|be|-|looked|-|up text twice. This reduces the chance of
errors. In our example we also save 16 tokens (taking 64 bytes) in the format
file. So, there are enough benefits to gain from this primitive, which is not a
specific feature, but just an extension to an existing mechanism.
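
A typical use is a small dispatcher. The following is only a sketch: the names
\type {\DoCommand} and the \type {do ...} handlers are made up for this example,
and the handlers are assumed not to take arguments (otherwise they would pick up
the \type {\else}):

\starttyping
\def\DoCommand#1%
  {\ifcsname do #1\endcsname
     \lastnamedcs
   \else
     \csname do nothing\endcsname
   \fi}

\expandafter\def\csname do this\endcsname   {[this]}
\expandafter\def\csname do nothing\endcsname{[nothing]}

\DoCommand{this} \DoCommand{something else}
\stoptyping

The name is built and looked up only once per call, and the fallback is reached
only when the lookup fails.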

It also works in this basic case:

\starttyping
\csname do this\endcsname
\lastnamedcs
\stoptyping

And even this works:

\starttyping
\csname do this\endcsname
\expandafter\let\expandafter\dothis\lastnamedcs
\stoptyping

And after:

\starttyping
\bgroup
\expandafter\def\csname do this\endcsname{or that}
\global\expandafter\let\expandafter\dothis\lastnamedcs
\expandafter\def\csname do that\endcsname{or this}
\global\expandafter\let\expandafter\dothat\lastnamedcs
\egroup
\stoptyping

We can use \type {\dothis} that gives \type {or that} and \type {\dothat} that
gives \type {or this}, so we have the usual freedom to be able to use something
meant to make code clean for the creation of obscure code. % Amen!

A variation on this is the following:

\starttyping
\begincsname do this\endcsname
\stoptyping

This call will check if \type {\do this} is defined, and, if so, will expand it.
However, when \type {\do this} is not found, it does not create a hash entry. It
is equivalent to:

\starttyping
\ifcsname do this\endcsname\lastnamedcs\fi
\stoptyping

but it avoids the \type {\ifcsname}, which is sometimes handy as these tests can
interfere.
857
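A typical use is an optional hook: something that is expanded when it has been
defined and silently skipped otherwise. The sketch below assumes a made|-|up
\type {hook:} namespace and a hypothetical helper \type {\runhook}. With a plain
\type {\csname} the second call would instead leave behind a name that is
equivalent to \type {\relax}.

\starttyping
% hypothetical helper: run a hook when it exists, do nothing otherwise
\def\runhook#1{\begincsname hook:#1\endcsname}

\expandafter\def\csname hook:before\endcsname{(before)}

\runhook{before} % expands the hook defined above
\runhook{after}  % undefined: expands to nothing, no hash entry is made
\stoptyping
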
I played with variations like \type {\ifbegincsname}, but we then quickly end up
with dirty code due to the fact that we first expand something and then need to
deal with the following \type {\else} and \type {\fi}. The two above|-|mentioned
primitives are non|-|intrusive in the sense that they were relatively easy to add
without obscuring the code base.

As a bonus, \LUATEX\ also provides a variant of \type {\string} that doesn't add
the escape character: \type {\csstring}. There is not much to explain about this:

\starttyping
\string\whatever<>\csstring\whatever
\stoptyping

This gives: \expanded{\type{\string\whatever<>\csstring\whatever}}.
872
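This is convenient when a control sequence has to become part of another name,
something that otherwise involves a local change of \type {\escapechar}. A
minimal sketch, in which \type {\setnote}, \type {\getnote} and the \type {note:}
prefix are hypothetical:

\starttyping
% store and fetch text under a name derived from a control sequence
\def\setnote#1#2{\expandafter\def\csname note:\csstring#1\endcsname{#2}}
\def\getnote  #1{\begincsname note:\csstring#1\endcsname}

\setnote\first{some remark} % defines the macro named "note:first"
\getnote\first              % expands to: some remark
\stoptyping
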
The main advantage of these new primitives is that a bit less code is needed,
which (at least for \CONTEXT) leads to a bit less tracing output. When you enable
\type {\tracingall} for a larger document or example, which is sometimes needed
to figure out a problem, it's not much fun to work with the resulting megabyte
(or sometimes even gigabyte) of output, so the more we can get rid of, the
better. Such large traces are just an unfortunate side effect of the \CONTEXT\
user interface with its many parameters. As said, there is no real gain in speed.

\stopsection

\startsection[title=Packing]

Deep down in \TEX, horizontal and vertical lists eventually get packed. Packing
of an \type {\hbox} involves:

\startitemize[n,packed]
\startitem ligature building (for traditional \TEX\ fonts), \stopitem
\startitem kerning (for traditional \TEX\ fonts), \stopitem
\startitem calling out to \LUA\ (when enabled) and \stopitem
\startitem wrapping the list in a box and calculating the width. \stopitem
\stopitemize

When a \LUA\ function is called, in most cases, the location where it happens
(group code) is also passed. But say that you try the following:

\starttyping
\hbox{\hbox{\hbox{\hbox{foo}}}}
\stoptyping

Here we do all four steps for each box, while for the three outer boxes, only the
last step makes any sense. And it's not trivial to avoid the application of the
\LUA\ function here. Of course, one can assign an attribute to the boxes and use
that to intercept, but it's kind of clumsy. This is why we now can say:

\starttyping
\hpack{\hpack{\hpack{\hbox{foo}}}}
\stoptyping

There are also \type {\vpack} for a \type {\vbox} and \type {\tpack} for a \type
{\vtop}. There can be a small gain in speed when many complex manipulations are
done, although in, for instance, \CONTEXT, we already have provisions for that.
It's just that the new primitives are a cleaner way out of a conceptually nasty
problem. Similar functions are available on the \LUA\ side.
916
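As an illustration, consider content that has already been typeset and only needs
to be wrapped again. This is a sketch with arbitrarily chosen box registers; it
assumes that the new primitives accept the same \type {to} and \type {spread}
specifications as their traditional counterparts.

\starttyping
\setbox0\hbox{efficient}          % ligatures, kerning and callbacks apply
\setbox2\hpack to 4cm{\box0\hss}  % only wraps the content and sets the width
\setbox4\vpack{\box2}             % the vertical counterpart
\box4
\stoptyping
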
\stopsection

\startsection[title=Errors]

We end with a few options that can be convenient to use if you don't care about
exact compatibility.

\starttyping
\suppresslongerror
\suppressmathparerror
\suppressoutererror
\suppressifcsnameerror
\stoptyping

When entering your document on a paper teletype terminal, starting \TEX, and then
going home in order to have a look at the result the next day, it does make sense
to catch runaway cases, like premature ending of a paragraph (using \type {\par}
or equivalent empty lines), or potentially missing \type {$$}s. Nowadays, it's
less important to catch such coding issues (and we can be more tolerant) because
editing takes place on screen and running (and restarting) \TEX\ is very fast.

The first two flags given above deal with this. If you set the first to any value
greater than zero, macros not defined as \type {\long} (not accepting paragraph
endings) will not complain about \cs{par} tokens in arguments. The second setting
permits and ignores empty lines (also pars) in math without resorting to dirty
tricks. Both are handy when your content comes from places that are outside of
your control. The job will not be aborted (or hang) because of an empty line.
944
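The effect of the first two settings can be sketched as follows. The macro \type
{\bracketed} is hypothetical and deliberately not defined as \type {\long}; the
values are set explicitly here because a macro package might already have done
so.

\starttyping
\suppresslongerror   =1
\suppressmathparerror=1

\def\bracketed#1{[#1]}

\bracketed{first

second}       % the empty line ends up in the argument without a complaint

$ a +

  b $         % the empty line is ignored, we stay in math mode
\stoptyping
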
The third setting suppresses the \type {\outer} directive so that macros that
could originally only be used at the outer level can now be used anywhere. It's
hard to explain the concept of outer (and the related error message) to a user
anyway.
949
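A small sketch of what this permits; both macro names are made up for the
occasion:

\starttyping
\suppressoutererror=1

\outer\def\toplevelonly{some content}

% without the setting, TeX rejects an outer macro inside a definition
\def\wrapper{\toplevelonly}
\stoptyping
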
The last one is a bit special. Normally, when you use \type {\ifcsname} you will
get an error when \TEX\ sees something unexpandable or that can't be part of a
name. But sometimes you might find this quite acceptable and just want to
consider the condition false. When the fourth variable is set to non|-|zero,
\TEX\ will ignore this issue and try to finish the check properly, so basically
you then have an \type {\iffalse}.
956
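The following sketch shows the kind of input this is about. Going by the
description above, the test should end up in the \type {\else} branch; the exact
handling of the offending tokens is an assumption here and may differ between
versions.

\starttyping
\suppressifcsnameerror=1

% \relax is unexpandable and cannot be part of a name
\ifcsname do \relax this\endcsname
    found
\else
    not found
\fi
\stoptyping
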
\stopsection

\startsection[title=Final remarks]

I mentioned performance a number of times, and it's good to note that most
changes discussed here will potentially be faster than the alternatives, but this
is not always noticeable in practice. There are several reasons.

For one thing, \TEX\ is already highly optimized. It has speedy memory management
of tokens and nodes, and unnecessary code paths are avoided. However, due to
extensions to the original code, a bit more happens in the engine than in decades
past. For instance, \UNICODE\ fonts demand sparse arrays instead of fixed|-|size,
256|-|slot data structures. Handling \UTF\ involves more testing and construction
of more complex strings. Directional typesetting leads to more testing and
housekeeping in the frontend as well as the backend. More keywords to handle, for
instance with \type {\hbox}, result in more parsing and pushing back unmatched
tokens. Some of the penalty has been compensated for through the changing of
whatsits into regular nodes. In recent versions of \LUATEX, scanning of \type
{\hbox} arguments is somewhat more efficient, too.

In any case, any speedup we manage to achieve, as said before, can easily become
noise through inefficient macro coding or users writing bad styles. And we're
pretty sure that not much more speed can be squeezed out. To achieve higher
performance, it's time to buy a machine with a faster \CPU\ (and a huge cache),
faster memory (lanes), and an \SSD, and to regularly check your coding.

\stopsection

\stopchapter

\stoptext