% language=us

\startcomponent onandon-amputating

\environment onandon-environment

\startchapter[title={Amputating code}]

\startsection[title={Introduction}]

Because \CONTEXT\ is already rather old in terms of software life and because
it evolves over time, code can get replaced by better code. Reasons for this
can be:

\startitemize[packed]
\startitem a better understanding of the way \TEX\ and \METAPOST\ work \stopitem
\startitem demand for more advanced options \stopitem
\startitem a brainwave resulting in a better solution \stopitem
\startitem new functionality provided in the \TEX\ engine used \stopitem
\startitem the necessity to speed up a core process \stopitem
\stopitemize

Replacing code that in itself does a good job but is no longer the best option
comes with sentiments. It can be rather satisfying to cook up a (conceptually
as well as codewise) good solution, so removing code from a file can result in
a somewhat bad feeling, even a feeling of losing something. Hence the title of
this chapter.

Here I will discuss one of the more complex subsystems: the one dealing with
typeset text in \METAPOST\ graphics. I will stick to the principles and not
present (much) code as that can be found in archives. This is not a tutorial,
but more a sort of wrap|-|up for myself. It anyhow shows the thinking behind
this mechanism. I'll also introduce a new \LUATEX\ feature here: subruns.

\stopsection

\startsection[title={The problem}]

\METAPOST\ is meant for drawing graphics and adding text to them is not really
part of the concept. It's a bit like how \TEX\ sees images: the dimensions
matter, the content doesn't. This means that in \METAPOST\ a blob of text is
an abstraction. The native way to create a typeset text picture is:

\starttyping
picture p ; p := btex some text etex ;
\stoptyping

In traditional \METAPOST\ this will create a temporary \TEX\ file with the
words \type {some text} wrapped in a box that when typeset is just shipped
out. The result is a \DVI\ file that with an auxiliary program will be
transformed into a \METAPOST\ picture. That picture itself is made from
multiple pictures, because each sequence of characters becomes a picture and
kerns become shifts. There is also a primitive \type {infont} that takes a
string and just converts it into a low level text object, but no typesetting
is done there: no ligatures and no kerns show up. In \CONTEXT\ this operator
is redefined to do the right thing; a small sketch at the end of this section
illustrates the difference.

In both cases, what ends up in the \POSTSCRIPT\ file are references to fonts
and characters and the original idea is that \DVIPS\ understands what fonts to
embed. Details are communicated via specials (comments) that \DVIPS\ is
supposed to intercept and understand. This all happens in an 8~bit (font)
universe.

When we moved on to \PDF, a converter from \METAPOST's rather predictable and
simple \POSTSCRIPT\ code to \PDF\ was written in \TEX. The graphic operators
became \PDF\ operators and the text was retypeset using the font information
and snippets of strings and injected at the right spot. The only complication
was that a non circular pen actually produced two paths of which one had to be
transformed.

At that moment it had already become clear that a tighter integration in
\CONTEXT\ would happen and not only would that demand a more sophisticated
handling of text, but it would also require more features not present in
\METAPOST, like dealing with \CMYK\ colors, special color spaces,
transparency, images, shading, and more. All this was implemented.
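As promised, a small sketch of the difference between \type {btex} and \type
{infont} in plain \METAPOST\ (the font name \type {cmr10} is just an example):

\starttyping
picture p, q ;
p := btex efficient text etex ;        % typeset by TeX: ligatures and kerns
q := "efficient text" infont "cmr10" ; % raw characters: no ligatures or kerns
draw p ;
draw q shifted (0,-20) ;
\stoptyping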
In the next sections we will only discuss texts.

\stopsection

\startsection[title={Using the traditional method}]

The \type {btex} approach was not that flexible because \type {btex} triggers
the parser to just grab everything up to the \type {etex} and pass that to an
external program. It's a special scanner mode and because of that using macros
for typesetting texts is a pain. So, instead of using this method, in
\CONTEXT\ we used \type {textext}. Before a run the \METAPOST\ file was
scanned and for each \type {textext} the argument was copied to a file. The
\type {btex} calls were scanned too and replaced by \type {textext} calls. For
each processed snippet the dimensions were stored in order to be loaded at the
start of the \METAPOST\ run. In fact, each text was just a rectangle with
certain dimensions. The \PDF\ converter would use the real snippet (by
typesetting it). Of course there had to be some housekeeping in order to make
sure that the right snippets were used, because the order of definition (as
picture) can be different from the order of use.

This mechanism evolved into reasonably robust text handling but of course was
limited by the fact that the file was scanned for snippets. So, the string had
to be a literal string and not an assembled one. This disadvantage was
compensated by the fact that we could communicate relevant bits of the
environment and apply all the usual \CONTEXT\ trickery in texts in a way that
was consistent with the rest of the document. A later implementation could
communicate the text via specials, which is more flexible. Although we talk of
this method in the past tense, it is still used in \MKII.

\stopsection

\startsection[title={Using the library}]

When the \MPLIB\ library showed up in \LUATEX, the same approach was used, but
soon we moved on to a different one. We already used specials to communicate
extensions to the backend, using special colors and fake objects as signals.
But at that time paths got prescript and postscript fields and those could be
used to really carry information with objects because, unlike specials, they
were bound to that object. So, all extensions using specials as well as texts
were rewritten to use these scripts. The \type {textext} macro changed its
behaviour a bit too.

Remember that a text effectively was just a rectangle with some transformation
applied. However, this time the postscript field carried the text and the
prescript field some specifics, like the fact that we are dealing with text.
Using the scripts made it possible to carry some more information around, like
special color demands.

\starttyping
draw textext("foo") ;
\stoptyping

Among the prescripts are a \typ {tx_index} (a number identifying the text) and
\typ {tx_state=trial} (multiple prescripts are prepended) and the postscript
is \type {foo}. In the second run the \typ {tx_index} stays the same but the
state becomes \typ {tx_state=final}. After the first run we analyze all
objects, collect the texts (those with \type {tx_} variables set) and typeset
them. As part of the second run we pass the dimensions of each indexed text
snippet. Internally, before the first run we \quote {reset} states, then after
the first run we \quote {analyze}, and after the second run we \quote
{process} as part of the conversion of output to \PDF.

\stopsection

\startsection[title={Using \type {runscript}}]

When the \type {runscript} feature was introduced in the library we no longer
needed to pass the dimensions via subscripted variables. Instead we could just
run a \LUA\ snippet and ask for the dimensions of a text with some index, as
the sketch below illustrates. This is conceptually not much different but it
saves us creating \METAPOST\ code that stores the dimensions, at the cost of
potentially a bit more runtime due to the \type {runscript} calls. But the
code definitely looks a bit cleaner this way. Of course we had to keep the
dimensions at the \LUA\ end, but we already did that because we stored the
preprocessed snippets for final usage.
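The idea, roughly, is this (a simplified sketch, not the actual \METAFUN\
code: the \type {textdims} tag, the \type {cache} table and the variable \type
{n} are made up). At the \METAPOST\ end the dimensions come back as a fake
\RGB\ color:

\starttyping
numeric wd, ht, dp ; color dims ;
dims := runscript("textdims " & decimal n) ;
wd := redpart   dims ;
ht := greenpart dims ;
dp := bluepart  dims ;
\stoptyping

while at the \LUA\ end the \type {run_script} callback of the \type {mplib}
instance returns a string that \METAPOST\ feeds into \type {scantokens}:

\starttyping
local cache = { } -- filled elsewhere with typeset boxes, indexed by number

local instance = mplib.new {
    run_script = function(code)
        local index = tonumber(string.match(code,"^textdims (%d+)$"))
        local box   = cache[index]
        -- sp to points; good enough for a sketch
        return string.format("(%s,%s,%s)",
            box.width/65536, box.height/65536, box.depth/65536)
    end,
}
\stoptyping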
\stopsection

\startsection[title={Using a sub \TEX\ run}]

We now come to the current (post \LUATEX\ 1.08) solution. For reasons I will
mention later a two pass approach is not optimal, but we can live with that,
especially because \CONTEXT\ with \METAFUN\ (which is what we're talking about
here) is quite efficient. More important is that it's kind of ugly to do all
the not that special work twice. In addition to text we also have outlines,
graphics and more mechanisms that needed two passes, and all these became one
pass features.

A \TEX\ run is special in many ways. At some point after starting up, \TEX\
enters the main loop and begins reading text and expanding macros. Normally
you start with a file but soon a macro is seen, and a next level of input is
entered, because as part of the expansion more text can be met, files can be
opened, other macros expanded. When a macro expands a token register, another
level is entered and the same happens when a \LUA\ call is triggered. Such a
call can print back something to \TEX\ and that has to be scanned as if it
came from a file. When token lists (and macros) get expanded, some commands
result in direct actions, others result in expansion only and processing
later, as one or more tokens can end up in the input stack.

The internals of the engine operate in miraculous ways. All commands trigger a
function call, but some have their own while others share one with a switch
statement (in \CCODE\ speak) because they belong to a category of similar
actions. Some are expanded directly, some get delayed. Does it sound
complicated? Well, it is. It's even more so when you consider that \TEX\ uses
nesting, which means pushing and popping local assignments, knows modes, like
horizontal, vertical and math mode, keeps track of interrupts and at the same
time triggers typesetting, par building, page construction and flushing to the
output file. It is for this reason, plus the fact that users can and will do a
lot to influence that behaviour, that there is just one main loop and in many
aspects global state. There are some exceptions, for instance when the output
routine is called, which creates a sort of closure: it interrupts the process
and for that reason gets grouping enforced so that it doesn't influence the
main run. But even then the main loop does the job.

Starting with version 1.10, \LUATEX\ provides a way to do a local run. Two
ways are provided: expanding a token register and calling a \LUA\ function.
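In compact form the two entry points can be sketched like this, with a token
register for the first and an anonymous function for the second (a condensed
variant of the examples worked out in the appendix):

\starttyping
\toks0{\setbox0\hbox{a token register}}

\directlua {
    tex.forcehmode(true)
    tex.runtoks(0)
    tex.sprint(tex.getbox(0).width)
    tex.runtoks(function()
        tex.sprint("\\setbox0\\hbox{a function}")
    end)
    tex.sprint(",")
    tex.sprint(tex.getbox(0).width)
}
\stoptyping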
It took a bit of experimenting to reach an implementation that works out
reasonably well and many variants were tried. In the appendix we give an
example of usage. The current variant is reasonably robust and does the job,
but care is needed. First of all, as soon as you start piping something to
\TEX\ that gets typeset you'd better be in a valid mode. If not, then for
instance glyphs can end up in a vertical list and \LUATEX\ will abort. In case
you wonder why we don't intercept this: we can't, because we don't know the
user's intentions. We cannot enforce a mode for instance, as this can have
side effects, think of expanding \type {\everypar} or injecting an indentation
box. Also, as soon as you start juggling nodes there is no way that \TEX\ can
foresee what needs to be copied or discarded. Normally it works out okay but
because in \LUATEX\ you can cheat in numerous ways with \LUA, you can get into
trouble.

So, what has this to do with \METAPOST? Well, first of all we could now use a
one pass approach. The \type {textext} macro calls \LUA, which then lets \TEX\
do some typesetting, and then gives back the dimensions to \METAPOST. The
\quote {analyze} phase is now integrated in the run. For a regular text this
works quite well because we just box some text and that's it. However, in the
next section we will see where things get complicated.

Let's summarize the one pass approach: the \type {textext} macro creates a
rectangle with the right dimensions and for doing so passes the string to
\LUA\ using \type {runscript}. We store the argument of \type {textext} in a
variable, then call \type {runtoks}, which expands the given token list, where
we typeset a box with the stored text (that we fetch with a \LUA\ call), and
the \type {runscript} passes back the three dimensions as a fake \RGB\ color
to \METAPOST, which applies a \type {scantokens} to the result. So, in
principle there is no real conceptual difference, except that we now analyze
in|-|place instead of between runs. I will not show the code here because in
\CONTEXT\ we use a wrapper around \type {runscript}, so low level examples
won't run well.

\stopsection

\startsection[title={Some aspects}]

An important aspect of the text handling is that the whole text can be
transformed. Normally this is only some scaling but rotation is also quite
valid. In the first approach, the original \METAPOST\ one, we have pictures
constructed of snippets and pictures transform well as long as the backend is
not too confused, something that can happen when for instance very small or
large font scales are used. There were some limitations with respect to the
number of fonts and efficient inclusion when for instance randomization was
used (I remember cases with thousands of font instances). The \PDF\ backend
could handle most cases well, by just using one size and scaling at the \PDF\
level. All the \type {textext} approaches use rectangles as stubs, which is
very efficient and permits all transforms.
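For instance, all of the following are fine (\METAFUN; the texts and values
are arbitrary):

\starttyping
\startMPcode
    draw textext("shifted") shifted (3cm,0) ;
    draw textext("scaled")  scaled 1.5 ;
    draw textext("rotated") rotated 45 ;
    draw textext("slanted") slanted 0.5 ;
\stopMPcode
\stoptyping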
How about color? Think of this situation:

\starttyping
\startMPcode
    draw textext("some \color[red]{text}") withcolor green ;
\stopMPcode
\stoptyping

And what about the document color? Suffice it to say that this is all well
supported. Of course using transparency, spot colors, etc.\ also needs
extensions. These are however not directly related to texts, although we need
to take them into account when dealing with the inclusion.

\starttyping
\startMPcode
    draw textext("some \color[red]{text}")
        withcolor "blue" withtransparency (1,0.5) ;
\stopMPcode
\stoptyping

What if you have a graphic with many small snippets of which many have the
same content? These are by default shared, but if needed you can disable that.
This makes sense if you have a case like this:

\starttyping
\useMPlibrary[dum]

\startMPcode
    draw textext("\externalfigure[unknown]") notcached ;
    draw textext("\externalfigure[unknown]") notcached ;
\stopMPcode
\stoptyping

Normally each unknown image gets a nice placeholder with some random
properties. So, do we want these two to have the same properties or not? At
least you can control it.

When I said that things can get complicated with the one pass approach, the
previous code snippet is a good example. The dummy figure is generated by
\METAPOST. So, as we have one pass, and jump temporarily back to \TEX, we have
two problems: we reenter the \MPLIB\ instance again in the middle of a run,
and we might pipe something back to and|/|or from \TEX\ in a nested fashion.

The first problem could be solved by starting a new \MPLIB\ session. This
normally is not a problem as both runs are independent of each other. In
\CONTEXT\ we can have \METAPOST\ runs in many places and some produce a more
or less stand|-|alone graphic in the text while other calls produce \PDF\ code
in the backend that is used in a different way (for instance in a font). In
the first case the result gets nicely wrapped in a box, while in the second
case it might directly end up in the page stream. And, as \TEX\ has no
knowledge of what is needed, it's here that we can get the complications that
can lead to aborting a run when you are careless. But in any case, if the run
aborts, you can be sure you're doing the wrong thing.

The second problem can only be solved by careful programming. When I ran the
test suite on the new code, some older modules had to be fixed. They were
doing the right thing from the perspective of intermediate runs and therefore
independent box handling, putting a text in a box and collecting dimensions,
but interwoven they demanded a bit more defensive programming. For instance,
the multi|-|pass approach always made copies of snippets while the one pass
approach does that only when needed. And that confused some old code in a
module, which incidentally is never used today because we have better
functionality built|-|in (the \METAFUN\ \type {followtext} mechanism).

The two pass approach has special code for cases where a text is not used.
Imagine this:

\starttyping
picture p ; p := textext("foo") ;
draw boundingbox p ;
\stoptyping

Here the \quote {analyze} stage will never see the text because we don't flush
\type {p}. However, because \type {textext} is called, it can still make sure
we know the dimensions. In the next case we do use the text, but in two
different ways:

\starttyping
picture p ; p := textext("foo") ;
draw p rotated 90 withcolor red ;
draw p withcolor green ;
\stoptyping

These subtle aspects are dealt with properly and could be made a bit simpler
in the single pass approach.

\stopsection

\startsection[title=One or two runs]

So are we better off now? One problem with two passes is that if you use the
equation solver you need to make sure that you don't run into the redundant
equation issue. So, you need to manage your variables well. In fact you need
to do that anyway because you can call out to \METAPOST\ many times in a run,
so old variables can interfere.
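To illustrate that equation issue: when the same code is processed twice, a
plain equation involving \type {textext} can trigger a redundant (or, when
trial and final dimensions differ, an inconsistent) equation error the second
time around, so assignments are the safer choice. A rough sketch:

\starttyping
numeric w ;
w = xpart urcorner textext("foo") ;  % equation: can clash in a second pass
w := xpart urcorner textext("foo") ; % assignment: harmless when repeated
\stoptyping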
So yes, we're better off here.

Are we worse off now? The two|-|run approach, with the text processing in
between, is very robust. There is no interference of nested runs and no
interference of nested local \TEX\ calls. So, maybe we're also a bit worse
off. You anyhow need to keep this in mind when you write your own low level
\TEX|-|\METAPOST\ interaction trickery, but fortunately not many users do
that. And if you did write your own plugins, you now need to make them single
pass. The new code is conceptually cleaner but also still not trivial, due to
the mentioned complications. It's definitely less code, but somehow amputating
the old code does hurt a bit. Maybe I should keep it around as a reference of
how text handling evolved over a few decades.

\stopsection

\startsection[title=Appendix]

Because the single pass approach made me finally look into a (although
somewhat limited) local \TEX\ run, I will show a simple example. For the sake
of generality I will use \type {\directlua}. Say that you need the dimensions
of a box while in \LUA:

\startbuffer
\directlua {
    tex.sprint("result 1: <")
    tex.sprint("\\setbox0\\hbox{one}")
    tex.sprint("\\number\\wd0")
    tex.sprint("\\setbox0\\hbox{\\directlua{tex.print{'first'}}}")
    tex.sprint(",")
    tex.sprint("\\number\\wd0")
    tex.sprint(">")
}
\stopbuffer

\typebuffer \getbuffer

This looks ok, but only because all printed text is collected and pushed into
a new input level once the \LUA\ call is done. So take this then:

\startbuffer
\directlua {
    tex.sprint("result 2: <")
    tex.sprint("\\setbox0\\hbox{one}")
    tex.sprint(tex.getbox(0).width)
    tex.sprint("\\setbox0\\hbox{\\directlua{tex.print{'first'}}}")
    tex.sprint(",")
    tex.sprint(tex.getbox(0).width)
    tex.sprint(">")
}
\stopbuffer

\typebuffer \getbuffer

This time we get the widths of the box known at the moment that we are in
\LUA, but we haven't typeset the content yet, so we get the wrong dimensions.
This however will work okay:

\startbuffer
\toks0{\setbox0\hbox{one}}
\toks2{\setbox0\hbox{first}}

\directlua {
    tex.forcehmode(true)
    tex.sprint("<")
    tex.runtoks(0)
    tex.sprint(tex.getbox(0).width)
    tex.runtoks(2)
    tex.sprint(",")
    tex.sprint(tex.getbox(0).width)
    tex.sprint(">")
}
\stopbuffer

\typebuffer \getbuffer

as does this:

\startbuffer
\toks0{\setbox0\hbox{\directlua{tex.sprint(MyGlobalText)}}}

\directlua {
    tex.forcehmode(true)
    tex.sprint("result 3: <")
    MyGlobalText = "one"
    tex.runtoks(0)
    tex.sprint(tex.getbox(0).width)
    MyGlobalText = "first"
    tex.runtoks(0)
    tex.sprint(",")
    tex.sprint(tex.getbox(0).width)
    tex.sprint(">")
}
\stopbuffer

\typebuffer \getbuffer

Here is a variant that uses functions:

\startbuffer
\directlua {
    tex.forcehmode(true)
    tex.sprint("result 4: <")
    tex.runtoks(function()
        tex.sprint("\\setbox0\\hbox{one}")
    end)
    tex.sprint(tex.getbox(0).width)
    tex.runtoks(function()
        tex.sprint("\\setbox0\\hbox{\\directlua{tex.print{'first'}}}")
    end)
    tex.sprint(",")
    tex.sprint(tex.getbox(0).width)
    tex.sprint(">")
}
\stopbuffer

\typebuffer \getbuffer

The \type {forcehmode} is needed when you do this in vertical mode. Otherwise
the run aborts. Of course you can also force horizontal mode before the call.
I'm sure that users will be surprised by side effects when they really use
this feature, but that is to be expected: you really need to be aware of the
subtle interference of input levels and the mix of input media (files, token
lists, macros or \LUA), as well as the fact that \TEX\ often looks one token
ahead and, when forced to typeset something, can also trigger builders. You're
warned.
\stopsection

\stopchapter

\stopcomponent