onandon-runtoks.tex /size: 21 Kb    last modification: 2023-12-21 09:43
1% language=us
2
3\startcomponent onandon-amputating
4
5\environment onandon-environment
6
7\startchapter[title={Amputating code}]
8
9\startsection[title={Introduction}]
10
11Because \CONTEXT\ is already rather old in terms of software life and because it
12evolves over time, code can get replaced by better code. Reasons for this can be:
13
14\startitemize[packed]
15\startitem a better understanding of the way \TEX\ and \METAPOST\ work \stopitem
16\startitem demand for more advanced options \stopitem
17\startitem a brainwave resulting in a better solution \stopitem
18\startitem new functionality provided by the \TEX\ engine used \stopitem
19\startitem the necessity to speed up a core process \stopitem
20\stopitemize
21
22Replacing code that in itself does a good job but is no longer the best to be
23used comes with sentiments. It can be rather satisfying to cook up a
24(conceptually as well as codewise) good solution and therefore removing code from
25a file can result in a somewhat bad feeling and even a feeling of losing
26something. Hence the title of this chapter.
27
28Here I will discuss one of the more complex subsystems: the one dealing with
29typeset text in \METAPOST\ graphics. I will stick to the principles and not
30present (much) code as that can be found in archives. This is not a tutorial,
31but more a sort of wrap|-|up for myself. It does however show the thinking behind
32this mechanism. I'll also introduce a new \LUATEX\ feature here: subruns.
33
34\stopsection
35
36\startsection[title={The problem}]
37
38\METAPOST\ is meant for drawing graphics and adding text to them is not really
39part of the concept. It's a bit like how \TEX\ sees images: the dimensions matter,
40the content doesn't. This means that in \METAPOST\ a blob of text is an
41abstraction. The native way to create a typeset text picture is:
42
43\starttyping
44picture p ; p := btex some text etex ;
45\stoptyping
46
47In traditional \METAPOST\ this will create a temporary \TEX\ file with the words
48\type {some text} wrapped in a box that when typeset is just shipped out. The
49result is a \DVI\ file that with an auxiliary program will be transformed into a
50\METAPOST\ picture. That picture itself is made from multiple pictures, because
51each sequence of characters becomes a picture and kerns become shifts.
52
53There is also a primitive \type {infont} that takes a text and just converts it
54into a low level text object but no typesetting is done there: so no ligatures
55and no kerns are found there. In \CONTEXT\ this operator is redefined to do the
56right thing.
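
As a minimal sketch of the primitive in plain \METAPOST\ (so without the
\CONTEXT\ redefinition), assuming the \type {defaultfont} string that the plain
format provides:

\starttyping
picture p ; p := "some text" infont defaultfont ;
draw p scaled 2 ;
\stoptyping

The string is mapped character by character onto the font, which is why no
ligatures or kerns show up.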
57
58In both cases, what ends up in the \POSTSCRIPT\ file is references to fonts and
59characters and the original idea is that \DVIPS\ understands what
60fonts to embed. Details are communicated via specials (comments) that \DVIPS\ is
61supposed to intercept and understand. This all happens in an 8~bit (font) universe.
62
63When we moved on to \PDF, a converter from \METAPOST's rather predictable and
64simple \POSTSCRIPT\ code to \PDF\ was written in \TEX. The graphic operators
65became \PDF\ operators and the text was retypeset using the font information and
66snippets of strings and injected at the right spot. The only complication was
67that a non circular pen actually produced two paths of which one has to be
68transformed.
69
70At that moment it already had become clear that a tighter integration in
71\CONTEXT\ would happen and not only would that demand a more sophisticated
72handling of text, but it would also require more features not present in
73\METAPOST, like dealing with \CMYK\ colors, special color spaces, transparency,
74images, shading, and more. All this was implemented. In the next sections we will
75only discuss texts.
76
77\stopsection
78
79\startsection[title={Using the traditional method}]
80
81The \type {btex} approach was not that flexible because what happens is that
82\type {btex} triggers the parser to just grab everything up to the \type
83{etex} and pass that to an external program. It's a special scanner mode and
84because of that using macros for typesetting texts is a pain. So, instead
85of using this method in \CONTEXT\ we used \type {textext}. Before a run the
86\METAPOST\ file was scanned and for each \type {textext} the argument was copied
87to a file. The \type {btex} calls were scanned too and replaced by \type {textext}
88calls.
89
90For each processed snippet the dimensions were stored in order to be loaded at
91the start of the \METAPOST\ run. In fact, each text was just a rectangle with
92certain dimensions. The \PDF\ converter would use the real snippet (by
93typesetting it).
94
95Of course there had to be some housekeeping in order to make sure that the right
96snippets were used, because the order of definition (as picture) can be different
97from them being used. This mechanism evolved into reasonably robust text handling
98but of course was limited by the fact that the file was scanned for snippets. So,
99the string had to be a literal string and not an assembled one. This disadvantage was
100compensated by the fact that we could communicate relevant bits of the
101environment and apply all the usual context trickery in texts in a way that was
102consistent with the rest of the document.
103
104A later implementation could communicate the text via specials which is more
105flexible. Although we talk of this method in the past tense it is still used in
106\MKII.
107
108\stopsection
109
110\startsection[title={Using the library}]
111
112When the \MPLIB\ library showed up in \LUATEX, the same approach was used but
113soon we moved on to a different approach. We already used specials to communicate
114extensions to the backend, using special colors and fake objects as signals. But
115at that time paths got pre- and postscript fields and those could be used to
116really carry information with objects because unlike specials, they were bound to
117that object. So, all extensions using specials as well as texts were rewritten to
118use these scripts.
119
120The \type {textext} macro changed its behaviour a bit too. Remember that a
121text effectively was just a rectangle with some transformation applied. However
122this time the postscript field carried the text and the prescript field some
123specifics, like the fact that we are dealing with text. Using the script made
124it possible to carry some more information around, like special color demands.
125
126\starttyping
127draw textext("foo") ;
128\stoptyping
129
130Among the prescripts are \typ {tx_index=trial} and \typ {tx_state=trial}
131(multiple prescripts are prepended) and the postscript is \type {foo}. In a
132second run the prescript is \type {tx_index=trial} and \typ {tx_state=final}.
133After the first run we analyze all objects, collect the texts (those with \type
134{tx_} variables set) and typeset them. As part of the second run we pass the
135dimensions of each indexed text snippet. Internally before the first run we
136\quote {reset} states, then after the first run we \quote {analyze}, and after
137the second run we \quote {process} as part of the conversion of output to \PDF.
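
To give an idea of what travels with the object, here is a rough sketch of the
kind of stub that is involved. It uses the \METAPOST\ primitives \type
{withprescript} and \type {withpostscript}; the dimensions and the index value
are made up for illustration and the real \type {textext} macro does more than
this.

\starttyping
% illustration only: a rectangle standing in for the text "foo"
draw unitsquare xyscaled (20pt,8pt)
    withprescript "tx_index=1"
    withprescript "tx_state=trial"
    withpostscript "foo" ;
\stoptyping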
138
139\stopsection
140
141\startsection[title={Using \type {runscript}}]
142
143When the \type {runscript} feature was introduced in the library we no longer
144needed to pass the dimensions via subscripted variables. Instead we could just
145run a \LUA\ snippet and ask for the dimensions of a text with some index. This
146is conceptually not much different but it saves us creating \METAPOST\ code that
147stored the dimensions, at the cost of potentially a bit more runtime due to the
148\type {runscript} calls. But the code definitely looks a bit cleaner this way. Of
149course we had to keep the dimensions at the \LUA\ end but we already did that
150because we stored the preprocessed snippets for final usage.
151
152\stopsection
153
154\startsection[title={Using a sub \TEX\ run}]
155
156We now come to the current (post \LUATEX\ 1.08) solution. For reasons I will
157mention later a two pass approach is not optimal, but we can live with that,
158especially because \CONTEXT\ with \METAFUN\ (which is what we're talking about
159here) is quite efficient. More important is that it's kind of ugly to do all the
160not that special work twice. In addition to text we also have outlines, graphics
161and more mechanisms that needed two passes and all these became one pass
162features.
163
164A \TEX\ run is special in many ways. At some point after starting up \TEX\
165enters the main loop and begins reading text and expanding macros. Normally you
166start with a file but soon a macro is seen, and a next level of input is entered,
167because as part of the expansion more text can be met, files can be opened,
168other macros be expanded. When a macro expands a token register, another level is
169entered and the same happens when a \LUA\ call is triggered. Such a call can
170print back something to \TEX\ and that has to be scanned as if it came from a
171file.
172
173When token lists (and macros) get expanded, some commands result in direct
174actions, others result in expansion only and processing later as one or more
175tokens can end up in the input stack. The internals of the engine operate in
176miraculous ways. All commands trigger a function call, but some have their own
177while others share one with a switch statement (in \CCODE\ speak) because they
178belong to a category of similar actions. Some are expanded directly, some get
179delayed.
180
181Does it sound complicated? Well, it is. It's even more so when you consider that
182\TEX\ uses nesting, which means pushing and popping local assignments, knows
183modes, like horizontal, vertical and math mode, keeps track of interrupts and at
184the same time triggers typesetting, par building, page construction and flushing
185to the output file.
186
187It is for this reason plus the fact that users can and will do a lot to influence
188that behaviour that there is just one main loop and in many aspects global state.
189There are some exceptions, for instance when the output routine is called, which
190creates a sort of closure: it interrupts the process and for that reason gets
191grouping enforced so that it doesn't influence the main run. But even then the
192main loop does the job.
193
194Starting with version 1.10 \LUATEX\ provides a way to do a local run. There are
195two ways provided: expanding a token register and calling a \LUA\ function. It
196took a bit of experimenting to reach an implementation that works out reasonably well
197and many variants were tried. In the appendix we give an example of usage.
198
199The current variant is reasonably robust and does the job but care is needed.
200First of all, as soon as you start piping something to \TEX\ that gets typeset
201you'd better be in a valid mode. If not, then for instance glyphs can end up in a
202vertical list and \LUATEX\ will abort. In case you wonder why we don't intercept
203this: we can't because we don't know the user's intentions. We cannot enforce a
204mode for instance as this can have side effects, think of expanding \type
205{\everypar} or injecting an indentation box. Also, as soon as you start juggling
206nodes there is no way that \TEX\ can foresee what needs to be copied or
207discarded. Normally it works out okay but because in \LUATEX\ you can cheat in
208numerous ways with \LUA, you can get into trouble.
209
210So, what has this to do with \METAPOST ? Well, first of all we could now use a
211one pass approach. The \type {textext} macro calls \LUA, which then lets \TEX\ do
212some typesetting, and then gives back the dimensions to \METAPOST. The \quote
213{analyze} phase is now integrated in the run. For a regular text this works quite
214well because we just box some text and that's it. However, in the next section we
215will see where things get complicated.
216
217Let's summarize the one pass approach: the \type {textext} macro creates a
218rectangle with the right dimensions and for doing so passes the string to \LUA\
219using \type {runscript}. We store the argument of \type {textext} in a variable,
220then call \type {runtoks}, which expands the given token list, where we typeset a
221box with the stored text (that we fetch with a \LUA\ call), and the \type
222{runscript} passes back the three dimensions as fake \RGB\ color to \METAPOST\
223which applies a \type {scantokens} to the result. So, in principle there is no
224real conceptual difference except that we now analyze in|-|place instead of
225between runs. I will not show the code here because in \CONTEXT\ we use a wrapper
226around \type {runscript} so low level examples won't run well.
227
228\stopsection
229
230\startsection[title={Some aspects}]
231
232An important aspect of the text handling is that the whole text can be
233transformed. Normally this is only some scaling but rotation is also quite valid.
234In the first approach, the original \METAPOST\ one, we have pictures constructed
235of snippets and pictures transform well as long as the backend is not too
236confused, something that can happen when for instance very small or large font
237scales are used. There were some limitations with respect to the number of fonts
238and efficient inclusion when for instance randomization was used (I remember
239cases with thousands of font instances). The \PDF\ backend could handle most
240cases well, by just using one size and scaling at the \PDF\ level. All the \type
241{textext} approaches use rectangles as stubs which is very efficient and permits
242all transforms.
243
244How about color? Think of this situation:
245
246\starttyping
247\startMPcode
248    draw textext("some \color[red]{text}")
249        withcolor green ;
250\stopMPcode
251\stoptyping
252
253And what about the document color? We suffice by saying that this is all well
254supported. Of course using transparency, spot colors etc.\ also needs extensions.
255These are however not directly related to texts although we need to take it into
256account when dealing with the inclusion.
257
258\starttyping
259\startMPcode
260    draw textext("some \color[red]{text}")
261      withcolor "blue"
262      withtransparency (1,0.5) ;
263\stopMPcode
264\stoptyping
265
266What if you have a graphic with many small snippets of which many have the same
267content? These are by default shared, but if needed you can disable it. This makes
268sense if you have a case like this:
269
270\starttyping
271\useMPlibrary[dum]
272
273\startMPcode
274    draw textext("\externalfigure[unknown]") notcached ;
275    draw textext("\externalfigure[unknown]") notcached ;
276\stopMPcode
277\stoptyping
278
279Normally each unknown image gets a nice placeholder with some random properties.
280So, do we want these two to have the same or not? At least you can control it.
281
282When I said that things can get complicated with the one pass approach the
283previous code snippet is a good example. The dummy figure is generated by
284\METAPOST. So, as we have one pass, and jump temporarily back to \TEX,
285we have two problems: we reenter the \MPLIB\ instance again in the middle of
286a run, and we might pipe back something to and|/|or from \TEX\ nested.
287
288The first problem could be solved by starting a new \MPLIB\ session. This
289normally is not a problem as both runs are independent of each other. In
290\CONTEXT\ we can have \METAPOST\ runs in many places and some produce some more
291or less stand alone graphic in the text while other calls produce \PDF\ code in
292the backend that is used in a different way (for instance in a font). In the
293first case the result gets nicely wrapped in a box, while in the second case it
294might directly end up in the page stream. And, as \TEX\ has no knowledge of what
295is needed, it's here that we can get the complications that can lead to aborting
296a run when you are careless. But in any case, if you abort, then you can be sure
297you're doing the wrong thing. So, the second problem can only be solved by
298careful programming.
299
300When I ran the test suite on the new code, some older modules had to be fixed.
301They were doing the right thing from the perspective of intermediate runs and
302therefore independent box handling, putting a text in a box and collecting
303dimensions, but interwoven they demanded a bit more defensive programming. For
304instance, the multi|-|pass approach always made copies snippets while the one
305pass approach does that only when needed. And that confused some old code in a
306module, which incidentally is never used today because we have better
307functionality built|-|in (the \METAFUN\ \type {followtext} mechanism).
308
309The two pass approach has special code for cases where a text is not used.
310Imagine this:
311
312\starttyping
313picture p ; p := textext("foo") ;
314
315draw boundingbox p;
316\stoptyping
317
318Here the \quote {analyze} stage will never see the text because we don't flush p.
319However because \type {textext} is called it can also make sure we still know the
320dimensions. In the next case we do use the text but in two different ways. These
321subtle aspects are dealt with properly and could be made a bit simpler in the
322single pass approach.
323
324\starttyping
325picture p ; p := textext("foo") ;
326
327draw p rotated 90 withcolor red ;
328draw p withcolor green ;
329\stoptyping
330
331\stopsection
332
333\startsection[title=One or two runs]
334
335So are we better off now? One problem with two passes is that if you use the
336equation solver you need to make sure that you don't run into the redundant
337equation issue. So, you need to manage your variables well. In fact you need to
338do that anyway because you can call out to \METAPOST\ many times in a run so old
339variables can interfere anyway. So yes, we're better off here.
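
As an illustration of the redundant equation issue, here is a small sketch that
is not taken from the \CONTEXT\ sources. The variable name \type {a} is
arbitrary. When the same equation is seen twice for one variable \METAPOST\
complains, so code that can be run more than once in the same instance had
better start with a fresh variable.

\starttyping
numeric a ;
a = 2cm ;
a = 2cm ; % second time: redundant equation

% safer: start with a fresh variable (or assign with :=)
begingroup
    save a ; numeric a ;
    a = 2cm ;
    draw fullcircle scaled a ;
endgroup ;
\stoptyping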
340
341Are we worse off now? Doing two runs with the text processing in between is very
342robust. There is no interference of nested runs and no interference of nested
343local \TEX\ calls. So, maybe we're also a bit worse off. You need to keep
344this in mind anyhow when you write your own low level \TEX|-|\METAPOST\ interaction
345trickery, but fortunately not many users do that. And if you did write your own
346plugins, you now need to make them single pass.
347
348The new code is conceptually cleaner but also still not trivial due to
349the mentioned complications. It's definitely less code but somehow amputating the
350old code does hurt a bit. Maybe I should keep it around as reference of how text
351handling evolved over a few decades.
352
353\stopsection
354
355\startsection[title=Appendix]
356
357Because the single pass approach made me finally look into a (although somewhat
358limited) local \TEX\ run, I will show a simple example. For the sake of
359generality I will use \type {\directlua}. Say that you need the dimensions of a
360box while in \LUA:
361
362\startbuffer
363\directlua {
364    tex.sprint("result 1: <")
365
366    tex.sprint("\\setbox0\\hbox{one}")
367    tex.sprint("\\number\\wd0")
368
369    tex.sprint("\\setbox0\\hbox{\\directlua{tex.print{'first'}}}")
370    tex.sprint(",")
371    tex.sprint("\\number\\wd0")
372
373    tex.sprint(">")
374}
375\stopbuffer
376
377\typebuffer \getbuffer
378
379This looks ok, but only because all printed text is collected and pushed into a
380new input level once the \LUA\ call is done. So take this then:
381
382\startbuffer
383\directlua {
384    tex.sprint("result 2: <")
385
386    tex.sprint("\\setbox0\\hbox{one}")
387    tex.sprint(tex.getbox(0).width)
388
389    tex.sprint("\\setbox0\\hbox{\\directlua{tex.print{'first'}}}")
390    tex.sprint(",")
391    tex.sprint(tex.getbox(0).width)
392
393    tex.sprint(">")
394}
395\stopbuffer
396
397\typebuffer \getbuffer
398
399This time we get the widths of the box known at the moment that we are in \LUA,
400but we haven't typeset the content yet, so we get the wrong dimensions. This
401however will work okay:
402
403\startbuffer
404\toks0{\setbox0\hbox{one}}
405\toks2{\setbox0\hbox{first}}
406\directlua {
407    tex.forcehmode(true)
408
409    tex.sprint("<")
410
411    tex.runtoks(0)
412    tex.sprint(tex.getbox(0).width)
413
414    tex.runtoks(2)
415    tex.sprint(",")
416    tex.sprint(tex.getbox(0).width)
417
418    tex.sprint(">")
419}
420\stopbuffer
421
422\typebuffer \getbuffer
423
424as does this:
425
426\startbuffer
427\toks0{\setbox0\hbox{\directlua{tex.sprint(MyGlobalText)}}}
428\directlua {
429    tex.forcehmode(true)
430
431    tex.sprint("result 3: <")
432
433    MyGlobalText = "one"
434    tex.runtoks(0)
435    tex.sprint(tex.getbox(0).width)
436
437    MyGlobalText = "first"
438    tex.runtoks(0)
439    tex.sprint(",")
440    tex.sprint(tex.getbox(0).width)
441
442    tex.sprint(">")
443}
444\stopbuffer
445
446\typebuffer \getbuffer
447
448Here is a variant that uses functions:
449
450\startbuffer
451\directlua {
452    tex.forcehmode(true)
453
454    tex.sprint("result 4: <")
455
456    tex.runtoks(function()
457        tex.sprint("\\setbox0\\hbox{one}")
458    end)
459    tex.sprint(tex.getbox(0).width)
460
461    tex.runtoks(function()
462        tex.sprint("\\setbox0\\hbox{\\directlua{tex.print{'first'}}}")
463    end)
464    tex.sprint(",")
465    tex.sprint(tex.getbox(0).width)
466
467    tex.sprint(">")
468}
469\stopbuffer
470
471\typebuffer \getbuffer
472
473The \type {forcehmode} is needed when you do this in vertical mode. Otherwise the
474run aborts. Of course you can also force horizontal mode before the call. I'm
475sure that users will be surprised by side effects when they really use this
476feature but that is to be expected: you really need to be aware of the subtle
477interference of input levels and mix of input media (files, token lists, macros
478or \LUA) as well as the fact that \TEX\ often looks one token ahead, and often,
479when forced to typeset something, also can trigger builders. You're warned.
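
Forcing horizontal mode before the call could look as follows. This is only a
sketch: it assumes that \type {\leavevmode} is an acceptable way to enter
horizontal mode at that spot, which depends on your document.

\starttyping
\toks0{\setbox0\hbox{one}}
\leavevmode
\directlua {
    tex.runtoks(0)
    tex.sprint(tex.getbox(0).width)
}
\stoptyping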
480
481\stopsection
482
483\stopchapter
484
485\stopcomponent
486
487% \starttext
488
489% \toks0{\hbox{test}}  [\ctxlua{tex.runtoks(0)}]\par
490
491% \toks0{\relax\relax\hbox{test}\relax\relax}[\ctxlua{tex.runtoks(0)}]\par
492
493% \toks0{xxxxxxx}  [\ctxlua{tex.runtoks(0)}]\par
494
495% \toks0{\hbox{(\ctxlua{context("test")})}}  [\ctxlua{tex.runtoks(0)}]\par
496
497% \toks0{\global\setbox1\hbox{(\ctxlua{context("test")})}}  [\ctxlua{tex.runtoks(0)}\box1]\par
498
499% \startluacode
500% local s = "[\\ctxlua{tex.runtoks(0)}\\box1]"
501% context("<")
502% context( function() context(s) end)
503% context( function() context(s) end)
504% context(">")
505% \stopluacode\par
506
507% \toks10000{\hbox{\red test1}}
508% \toks10002{\green\hbox{test2}}
509% \toks10004{\hbox{\global\setbox1\hbox to 1000sp{\directlua{context("!4!")}}}}
510% \toks10006{\hbox{\global\setbox3\hbox to 2000sp{\directlua{context("?6?")}}}}
511% \hbox{x\startluacode
512%     local s0 = "(\\hbox{\\ctxlua{tex.runtoks(10000)}})"
513%     local s2 = "[\\hbox{\\ctxlua{tex.runtoks(10002)}}]"
514%     context("<!")
515% --    context( function() context(s0) end)
516% --    context( function() context(s0) end)
517% --    context( function() context(s2) end)
518%     context(s0)
519%     context(s0)
520%     context(s2)
521%     context("<")
522%     tex.runtoks(10004)
523%     context("X")
524%     tex.runtoks(10006)
525%     context(tex.box[1].width)
526%     context("/")
527%     context(tex.box[3].width)
528%     context("!>")
529% \stopluacode x}\par
530
531
532