% language=us

\startcomponent hybrid-lexing

\environment hybrid-environment

\startchapter[title={Updating the code base}]

\startsection [title={Introduction}]

After much experimenting with new code in \MKIV\ a new stage in \CONTEXT\ development
was entered in the last quarter of 2011. This was triggered by several more or
less independent developments. I will discuss some of them here since they are a
nice illustration of how \CONTEXT\ evolves. This chapter was published in
TUGboat 103; thanks to Karl Berry and Barbara Beeton for making it better.

\stopsection

\startsection [title={Interfacing}]

Wolfgang Schuster, Aditya Mahajan and I were experimenting with an abstraction
layer for module writers. In fact this layer itself was a variant of some new
mechanisms used in the \MKIV\ structure related code. That code was among the
first to be adapted as it is accompanied by much \LUA\ code and has been
performing rather well for some years now.

In \CONTEXT\ most of the user interface is rather similar and module writers are
supposed to follow the same route as the core of \CONTEXT. For those who have
looked in the source the following code might look familiar:

\starttyping
\unexpanded\def\mysetupcommand
  {\dosingleempty\domysetupcommand}

\def\domysetupcommand[#1]%
  {..........
   \getparameters[\??my][#1]%
   ..........
   ..........}
\stoptyping

This implements the command \type {\mysetupcommand} that is used as follows:

\starttyping
\mysetupcommand[color=red,style=bold,...]
\stoptyping

The above definition uses three rather low|-|level interfacing commands. The
\type {\unexpanded} makes sure that the command does not expand in unexpected
ways in cases where expansion is less desirable. (Aside: The
\CONTEXT\ \type {\unexpanded} prefix has a long history and originally resulted
in the indirect definition of a macro. That way the macro could be part of
testing (expanded) equivalence.
When \ETEX\ functionality showed up we could use \type {\protected} but we stuck
to the name \type {\unexpanded}. So, currently \CONTEXT's \type {\unexpanded} is
equivalent to \ETEX's \type {\protected}. Furthermore, in
\CONTEXT\ \type {\expanded} is not the same as the \ETEX\ primitive. In order to
use the primitives you need to use their \type {\normal...} synonyms.)

The \type {\dosingleempty} makes sure that one argument gets seen by injecting a
dummy when needed. At some point the \type {\getparameters} command will store
the values of keys in a namespace that is determined by \type {\??my}. The
namespace used here is actually one of the internal namespaces which can be
deduced from the double question marks. Module namespaces have four question
marks.

There is some magic involved in storing the values. For instance, keys are
translated from the interface language into the internal language which happens
to be English. This translation is needed because a new command is generated:

\starttyping
\def\@@mycolor{red}
\def\@@mystyle{bold}
\stoptyping

and such a command can be used internally because in so|-|called unprotected
mode \type {@?!} are valid in names. The Dutch equivalent is:

\starttyping
\mijnsetupcommando[kleur=rood,letter=vet]
\stoptyping

and here the \type {kleur} has to be converted into \type {color} before the
macro is constructed. Of course values themselves can stay as they are as long
as checking them uses the internal symbolic names that have the language
specific meaning.

\starttyping
\c!style{color}
\k!style{kleur}
\v!bold {vet}
\stoptyping

Internally assignments are done with the \type {\c!} variant, translation of the
key is done using the \type {\k!} alternative and values are prefixed by
\type {\v!}. It will be clear that for the English user interface no translation
is needed and as a result that interface is somewhat faster.
There we only need

\starttyping
\c!style{color}
\v!bold {bold}
\stoptyping

Users never see these prefixed versions, unless they want to define an
internationalized style, in which case the form

\starttyping
\mysetupcommand[\c!style=\v!bold]
\stoptyping

has to be used, as it will adapt itself to the user interface. This leaves the
\type {\??my} that in fact expands to \type {\@@my}. This is the namespace
prefix.

Is this the whole story? Of course it isn't, as in \CONTEXT\ we often have a
generic instance from which we can clone specific alternatives; in practice, the
\type {\@@mycolor} variant is used in a few cases only. In that case a setup
command can look like:

\starttyping
\mysetupcommand[myinstance][style=bold]
\stoptyping

And access to the parameters is done with:

\starttyping
\getvalue{\??my myinstance\c!color}
\stoptyping

So far the description holds for \MKII\ as well as \MKIV, but in \MKIV\ we are
moving to a variant of this. At the cost of a bit more runtime and helper
macros, we can get cleaner low|-|level code. The magic word here is
\type {commandhandler}. At some point the new \MKIV\ code started using an extra
abstraction layer, but the code needed looked rather repetitive despite subtle
differences. Then Wolfgang suggested that we should wrap part of that
functionality in a definition macro that could be used to define module setup
and definition code in one go, thereby providing a level of abstraction that
hides some nasty details. The main reason why code could look cleaner is that
the experimental core code provided a nicer inheritance model for derived
instances and Wolfgang's letter module uses that extensively. After doing some
performance tests with the code we decided that indeed such an initializer made
sense. Of course, after that we played with it, some more tricks were added, and
eventually I decided to replace the similar code in the core as well, that is:
use the installer instead of defining helpers locally.
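
For comparison, the kind of hand|-|written setup code that such an installer can
replace might be sketched as follows. This is only an illustration in the
\MKII\ style described above, not code taken from the core. It assumes that the
\type {\??my} namespace has already been defined, and it gives the hypothetical
\type {\mysetupcommand} an optional instance argument by means of the existing
helpers \type {\dodoubleempty} and \type {\ifsecondargument}.

\starttyping
\unprotect

% \mysetupcommand[key=value]           : generic values
% \mysetupcommand[instance][key=value] : values for one instance

\unexpanded\def\mysetupcommand
  {\dodoubleempty\domysetupcommand}

\def\domysetupcommand[#1][#2]%
  {\ifsecondargument
     \getparameters[\??my#1][#2]% stored under \@@my<instance><key>
   \else
     \getparameters[\??my][#1]%   stored under \@@my<key>
   \fi}

\protect
\stoptyping

Writing a variation of such a wrapper for each mechanism is the sort of
repetition that an installer can hide.
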
So, how does one install a new setup mechanism? We stick to the core code and
leave modules aside for the moment.

\starttyping
\definesystemvariable{my}

\installcommandhandler \??my {whatever} \??my
\stoptyping

After this command we have available some new helper commands of which only a
few are mentioned here (after all, this mechanism is still somewhat
experimental):

\starttyping
\setupwhatever[key=value]
\setupwhatever[instance][key=value]
\stoptyping

Now a value is fetched using a helper:

\starttyping
\namedwhateverparameter{instance}{key}
\stoptyping

However, more interesting is this one:

\starttyping
\whateverparameter{key}
\stoptyping

For this to work, we need to set the instance:

\starttyping
\def\currentwhatever{instance}
\stoptyping

Such a current state macro was already used in many places, so it fits into the
existing code quite well. In addition to \type {\setupwhatever} and friends,
another command becomes available:

\starttyping
\definewhatever[instance]
\definewhatever[instance][key=value]
\stoptyping

Again, this is not so much a revolution as we can define such a command easily
with helpers, but it pairs nicely with the setup command. One of the goodies is
that it provides the following feature for free:

\starttyping
\definewhatever[instance][otherinstance]
\definewhatever[instance][otherinstance][key=value]
\stoptyping

In some cases this creates more overhead than needed because not all commands
have instances. On the other hand, some commands that didn't have instances yet,
now suddenly have them. For cases where this is not needed, we provide simple
variants of commandhandlers.

Additional commands can be hooked into a setup or definition so that for
instance the current situation can be updated or extra commands can be defined
for this instance, such as \type {\start...} and \type {\stop...} commands.

It should be stressed that the installer itself is not that special in the sense
that we could do without it, but it saves some coding.
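
To make this a bit more concrete, here is a minimal sketch that combines the
helpers mentioned so far. The namespace definition follows the snippet above.
The instance names \type {first} and \type {second} and the key values are made
up for this example, and the comments describe the lookups one would expect from
the inheritance chain, not verified output.

\starttyping
\unprotect

\definesystemvariable{my}

\installcommandhandler \??my {whatever} \??my

\setupwhatever[color=red]                  % values at the root
\definewhatever[first][style=bold]         % an instance
\definewhatever[second][first][color=blue] % an instance with a parent

\def\currentwhatever{second}

% expected lookups (assumption, see above):
%
% \whateverparameter{color}             : blue (set for second)
% \whateverparameter{style}             : bold (inherited from first)
% \namedwhateverparameter{first}{color} : red  (inherited from the root)

\protect
\stoptyping
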
More important is that we no longer have the \type {@@} prefixed containers but use \type {\whateverparameter} commands instead. This is definitely slower than the direct macro, but as we often deal with instances, it's not that much slower than \type {\getvalue} and critical components are rather well speed|-|optimized anyway. There is, however, a slowdown due to the way inheritance is implemented. That is how this started out: using a different (but mostly compatible) inheritance model. In the \MKII\ approach (which is okay in itself) inheritance happens by letting values point to the parent value. In the new model we have a more dynamic chain. It saves us macros but can expand quite wildly depending on the depth of inheritance. For instance, in sectioning there can easily be five or more levels of inheritance. So, there we get slower processing. The same is true for \type {\framed} which is a rather critical command, but there it is nicely compensated by less copying. My personal impression is that due to the way \CONTEXT\ is set up, the new mechanism is actually more efficient on an average job. Also, because many constructs also depend on the \type {\framed} command, that one can easily be part of the chain, which again speeds up a bit. In any case, the new mechanisms use much less hash space. Some mechanisms still look too complex, especially when they hook into others. Multiple inheritance is not trivial to deal with, not only because the meaning of keys can clash, but also because supporting it would demand quite complex fully expandable resolvers. So for the moment we stay away from it. In case you wonder why we cannot delegate more to \LUA: it's close to impossible to deal with \TEX's grouping in efficient ways at the \LUA\ end, and without grouping available \TEX\ becomes less useful. Back to the namespace. 
We already had a special one for modules but after many years of
\CONTEXT\ development, we started to run out of two character combinations and
many of them had no relation to what name they spaced. As the code base is being
overhauled anyway, it makes sense to also provide a new core namespace
mechanism. Again, this is nothing revolutionary but it reads much more nicely.

\starttyping
\installcorenamespace {whatever}

\installcommandhandler \??whatever {whatever} \??whatever
\stoptyping

This time deep down no \type {@@} is used, but rather something more obscure. In
any case, no one will use the meaning of the namespace variables, as all access
to parameters happens indirectly. And of course there is no speed penalty
involved; in fact, we are more efficient. One reason is that we often used the
prefix as follows:

\starttyping
\setvalue{\??my:option:bla}{foo}
\stoptyping

and now we just say:

\starttyping
\installcorenamespace {whateveroption}

\setvalue{\??whateveroption bla}{foo}
\stoptyping

The commandhandler does such assignments slightly differently as it has to
prevent clashes between instances and keywords. A nice example of such a clash
is this:

\starttyping
\setvalue{\??whateveroption sectionnumber}{yes}
\stoptyping

In sectioning we have instances named \type {section}, but we also have keys
named \type {number} and \type {sectionnumber}. So, we end up with something
like this:

\starttyping
\setvalue{\??whateveroption section:sectionnumber}{yes}
\setvalue{\??whateveroption section:number}{yes}
\setvalue{\??whateveroption :number}{yes}
\stoptyping

When I decided to replace code similar to that generated by the installer a new
rewrite stage was entered. Therefore one reason for explaining this here is that
in the process of adapting the core code instabilities are introduced and as
most users use the beta version of \MKIV, some tolerance and flexibility is
needed and it might help to know why something suddenly fails.
In itself using the commandhandler is not that problematic, but wherever I decide to use it, I also clean up the related code and that is where the typos creep in. Fortunately Wolfgang keeps an eye on the changes so problems that users report on the mailing lists are nailed down relatively fast. Anyway, the rewrite itself is triggered by another event but that one is discussed in the next section. We don't backport (low|-|level) improvements and speedups to \MKII, because for what we need \TEX\ for, we consider \PDFTEX\ and \XETEX\ rather obsolete. Recent tests show that at the moment of this writing a \LUATEX\ \MKIV\ run is often faster than a comparable \PDFTEX\ \MKII\ run (using \UTF-8 and complex font setups). When compared to a \XETEX\ \MKII\ run, a \LUATEX\ \MKIV\ run is often faster, but it's hard to compare, as we have advanced functionality in \MKIV\ that is not (or differently) available in \MKII. \stopsection \startsection [title={Lexing}] The editor that I use, called \SCITE, has recently been extended with an extra external lexer module that makes more advanced syntax highlighting possible, using the \LUA\ \LPEG\ library. It is no secret that the user interface of \CONTEXT\ is also determined by the way structure, definitions and setups can be highlighted in an editor. \footnote {It all started with \type {wdt}, \type {texedit} and \type {texwork}, editors and environments written by myself in \MODULA2 and later in \PERL\ Tk, but that was in a previous century.} When I changed to \SCITE\ I made sure that we had proper highlighting there. At \PRAGMA\ one of the leading principles has always been: if the document source looks bad, mistakes are more easily made and the rendering will also be affected. Or phrased differently: if we cannot make the source look nice, the content is probably not structured that well either. The same is true for \TEX\ source, although to a large extent there one must deal with the specific properties of the language. 
So, syntax highlighting, or more impressively: lexing, has always been part of
the development of \CONTEXT\ and for instance the pretty printers of verbatim
provide similar features. For a long time we assumed line|-|based lexing, mostly
for reasons of speed. And surprisingly, that works out quite well with \TEX. We
used a simple color scheme suitable for everyday usage, with not too intrusive
coloring. Of course we made sure that we had runtime spell checking integrated,
and that the different user interfaces were served well.

But then came the \LPEG\ lexer. Suddenly we could do much more advanced
highlighting. Once I started playing with it, a new color scheme was set up and
more sophisticated lexing was applied. Just to mention a few properties:

\startitemize[packed]
\startitem
    We distinguish between several classes of macro names: primitives, helpers,
    interfacing, and user macros.
\stopitem
\startitem
    In addition we highlight constant values and special registers differently.
\stopitem
\startitem
    Conditional constructs can be recognized and are treated as in any regular
    language (keep in mind that users can define their own).
\stopitem
\startitem
    Embedded \METAPOST\ code is lexed independently using a lexer that knows the
    language's primitives, helpers, user macros, constants and of course
    specific syntax and drawing operators. Related commands at the \TEX\ end
    (for defining and processing graphics) are also dealt with.
\stopitem
\startitem
    Embedded \LUA\ is lexed independently using a lexer that not only deals with
    the language but also knows a bit about how it is used in \CONTEXT. Of
    course the macros that trigger \LUA\ code are handled.
\stopitem
\startitem
    Metastructure and metadata related macros are colored in a fashion similar
    to constants (after all, in a document one will not see any constants, so
    there is no color clash).
\stopitem
\startitem
    Some special and often invisible characters get a special background color
    so that we can see when there are for instance non|-|breakable spaces
    sitting there.
\stopitem
\startitem
    Real|-|time spell checking is part of the deal and can optionally be turned
    on. There we distinguish between unknown words, known but potentially
    misspelled words, and known words.
\stopitem
\stopitemize

Of course we also made lexers for \METAPOST, \LUA, \XML, \PDF\ and text
documents so that we have a consistent look and feel.

When writing the new lexer code, and testing it on sources, I automatically
started adapting the source to the new lexing where possible. Actually, as
cleaning up code is somewhat boring, the new lexer is adding some fun to it. I'm
not so sure if I would have started a similar overhaul so easily otherwise,
especially because the rewrite now also includes speedup and cleanup. At least
it helps to recognize less desirable left|-|overs of \MKII\ code.

\stopsection

\startsection [title={Hiding}]

It is interesting to notice that users seldom define commands that clash with
low|-|level commands. This is of course a side effect of the fact that one
seldom needs to define a command, but nevertheless. Low|-|level commands were
protected by prefixing them with one or more (combinations of) \type {do},
\type {re} and \type {no}'s. This habit is a direct effect of the early days of
writing macros. For \TEX\ it does not matter how long a name is, as internally
it becomes a pointer anyway, but memory consumption of editors, loading time of
a format, string space and similar factors determined the way one codes in
\TEX\ for quite a while. Nowadays there are hardly any limits and the stress
that \CONTEXT\ puts on the \TEX\ engine is even less than in \MKII\ as we
delegate many tasks to \LUA.
Memory comes cheap, editors can deal with large amounts of data (keep in mind
that the larger the file gets, the more lexing power can be needed), and screens
are wide enough not to lose part of long names in the edges. Another development
has been that in \LUATEX\ we have lots of registers so that we no longer have to
share temporary variables and such. The rewrite is a good moment to get rid of
that restriction.

This all means that at some point it was decided to start using longer command
names internally and permit \type {_} in names. As I was never a fan of using
\type {@} for this, underscore made sense. We have been discussing the use of
colons, which is also nice, but has the disadvantage that colons are also used
in the source, for instance to create a sub|-|namespace. When we have replaced
all old namespaces, colons might show up in command names, so another renaming
roundup can happen.

One reason for mentioning this is that users get to see these names as part of
error messages. An example of a name is:

\starttyping
\page_layouts_this_or_that
\stoptyping

The first part of the name is the category of macros and in most cases is the
same as the first part of the filename. The second part is a namespace. The rest
of the name can differ but we're approaching some consistency in this.
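
As an illustration of this naming convention, a definition of the macro named
above might look as follows. This is a sketch only: the body is left open, and
it assumes that the definition lives in the unprotected part of a source file,
where the underscore can be used in names.

\starttyping
\unprotect

% page         : category, normally also the first part of the filename
% layouts      : namespace within that category
% this_or_that : the rest of the name

\unexpanded\def\page_layouts_this_or_that
  {...}

\protect
\stoptyping
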
In addition we have prefixed names, where prefixes are used as consistently as
possible:

\starttabulate[|l|l|]
\NC \type {t_} \NC token register                                \NC \NR
\NC \type {d_} \NC dimension register                            \NC \NR
\NC \type {s_} \NC skip register                                 \NC \NR
\NC \type {u_} \NC muskip register                               \NC \NR
\NC \type {c_} \NC counter register, constant or conditional     \NC \NR
\NC \type {m_} \NC (temporary) macro                             \NC \NR
\NC \type {p_} \NC (temporary) parameter expansion (value of key)\NC \NR
\NC \type {f_} \NC fractions                                     \NC \NR
\stoptabulate

This is not that different from other prefixing in \CONTEXT\ apart from the fact
that from now on those variables (registers) are no longer accessible in a
regular run. We might decide on another scheme but renaming can easily be
scripted. In the process some of the old prefixes are being removed. The main
reason for changing to this naming scheme is that it is more convenient to grep
for them.

In the process most traditional \type {\if}s get replaced by
\quote {conditionals}. The same is true for \type {\chardef}s that store states;
these become \quote {constants}.

\stopsection

\startsection[title=Status]

We always try to keep the user interface constant, so most functionality and
control stays stable. However, now that most users use \MKIV, commands that no
longer make sense are removed. An interesting observation is that some users
report that low|-|level macros or registers are no longer accessible.
Fortunately that is no big deal as we point them to the official ways to deal
with matters. It is also a good opportunity for users to clean up accumulated
hackery.

The systematic (file by file) cleanup started in the second half of 2011 and as
of January 2012 one third of the core (\TEX) modules have to be cleaned up and
the planning is to get most of that done as soon as possible. However, some
modules will be rewritten (or replaced) and that takes more time.
In any case we hope that rather soon most of the code is stable enough that we can start working on new mechanisms and features. Before that a cleanup of the \LUA\ code is planned. Although in many cases there are no fundamental changes in the user interface and functionality, I will wrap up some issues that are currently being dealt with. This is just a snapshot of what is happening currently and as a consequence it describes what users can run into due to newly introduced bugs. The core modules of \CONTEXT\ are loosely organized in groups. Over time there has been some reorganization and in \MKIV\ some code has been moved into new categories. The alphabetical order does not reflect the loading order or dependency tree as categories are loaded intermixed. Therefore the order below is somewhat arbitrary and does not express importance. Each category has multiple files. \startsubsubject[title={anch: anchoring and positioning}] More than a decade ago we started experimenting with position tracking. The ability to store positional information and use that in a second pass permits for instance adding backgrounds. As this code interacts nicely with (runtime) \METAPOST\ it has always been quite powerful and flexible on the one hand, but at the same time it was demanding in terms of runtime and resources. However, were it not for this feature, we would probably not be using \TEX\ at all, as backgrounds and special relative positioning are needed in nearly all our projects. In \MKIV\ this mechanism had already been ported to a hybrid form, but recently much of the code has been overhauled and its \MKII\ artifacts stripped. As a consequence the overhead in terms of memory probably has increased but the impact on runtime has been considerably reduced. It will probably take some time to become stable if only because the glue to \METAPOST\ has changed. 
There are some new goodies, like backgrounds behind parshapes, something that probably no one uses and is always somewhat tricky but it was not too hard to support. Also, local background support has been improved which means that it's easier to get them in more column-based layouts, several table mechanisms, floats and such. This was always possible but is now more automatic and hopefully more intuitive. \stopsubsubject \startsubsubject[title={attr: attributes}] We use attributes (properties of nodes) a lot. The framework for this had been laid early in \MKIV\ development, so not much has changed here. Of course the code gets cleaner and hopefully better as it is putting quite a load on the processing. Each new feature depending on attributes adds some extra overhead even if we make sure that mechanisms only kick in when they are used. This is due to the fact that attributes are linked lists and although unique lists are shared, they travel with each node. On the other hand, the cleanup (and de|-|\MKII|-|ing) of code leads to better performance so on the average no user will notice this. \stopsubsubject \startsubsubject[title={back: backend code generation}] This category wraps backend issues in an abstract way that is similar to the special drivers in \MKII. So far we have only three backends: \PDF, \XML, and \XHTML. Such code is always in a state of maintenance, if only because backends evolve. \stopsubsubject \startsubsubject[title={bibl: bibliographies}] For a while now, bibliographies have not been an add|-|on but part of the core. There are two variants: traditional \BIBTEX\ support derived from a module by Taco Hoekwater but using \MKIV\ features (the module hooks into core code), and a variant that delegates most work to \LUA\ by creating an in-memory \XML\ tree that gets manipulated. At some point I will extend the second variant. Going the \XML\ route also connects better with developments such as Jean|-|Michel Hufflen's Ml\BIBTEX. 
\stopsubsubject \startsubsubject[title={blob: typesetting in \LUA}] Currently we only ship a few helpers but eventually this will become a framework for typesetting raw text in \LUA. This might be handy for some projects that we have where the only input is \XML, but I'm not that sure if it will produce nice results and if the code will look better. On the other hand, there are some cases where in a regular \TEX\ run some basic typesetting in \LUA\ might make sense. Of course I also need an occasional pet project so this might qualify as one. \stopsubsubject \startsubsubject[title={buff: buffers and verbatim}] Traditionally buffers and verbatim have always been relatives as they share code. The code was among the first to be adapted to \LUATEX. There is not that much to gain in adapting it further. Maybe I will provide more lexers for pretty|-|printing some day. \stopsubsubject \startsubsubject[title={catc: catcodes}] Catcodes are a rather \TEX|-|specific feature and we have organized them in catcode regimes. The most important recent change has been that some of the characters with a special meaning in \TEX\ (like ampersand, underscore, superscript, etc.) are no longer special except in cases that matter. This somewhat incompatible change surprisingly didn't lead to many problems. Some code that is specific for the \MKII\ \XML\ processor has been removed as we no longer assume it is in \MKIV. \stopsubsubject \startsubsubject[title={char: characters}] This important category deals with characters and their properties. Already from the beginning of \MKIV\ character properties have been (re)organized in \LUA\ tables and therefore much code deals with it. The code is rather stable but occasionally the tables are updated as they depend on developments in \UNICODE. In order to share as much data as possible and prevent duplicates there are several inheritance mechanisms in place but their overhead is negligible. 
\stopsubsubject \startsubsubject[title={chem: chemistry}] The external module that deals with typesetting chemistry was transformed into a \MKIV\ core module some time ago. Not much has changed in this department but some enhancements are pending. \stopsubsubject \startsubsubject[title={cldf: \CONTEXT\ \LUA\ documents}] These modules are mostly \LUA\ code and are the interface into \CONTEXT\ as well as providing ways to code complete documents in \LUA. This is one of those categories that is visited every now and then to be adapted to improvements in other core code or in \LUATEX. This is one of my favourite categories as it exposes most of \CONTEXT\ at the \LUA\ end which permits writing solutions in \LUA\ while still using the full power of \CONTEXT. A dedicated manual is on its way. \stopsubsubject \startsubsubject[title={colo: colors and transparencies}] This is rather old code, and apart from some cleanup not much has been changed here. Some macros that were seldom used have been removed. One issue that is still pending is a better interface to \METAPOST\ as it has different color models and we have adapted code at that end. This has a rather low priority because in practice it is no real problem. \stopsubsubject \startsubsubject[title={cont: runtime code}] These modules contain code that is loaded at runtime, such as filename remapping, patches, etc. It does not make much sense to improve these. \stopsubsubject \startsubsubject[title={core: all kinds of core code}] Housekeeping is the main target of these modules. There are still some typesetting|-|related components here but these will move to other categories. This code is cleaned up when there is a need for it. Think of managing files, document project structure, module loading, environments, multipass data, etc. \stopsubsubject \startsubsubject[title={data: file and data management}] This category hosts only \LUA\ code and hasn't been touched for a while. 
Here we deal with locating files, caching, accessing remote data, resources, environments, and the like. \stopsubsubject \startsubsubject[title={enco: encodings}] Because (font) encodings are gone, there is only one file in this category and that one deals with weird (composed or otherwise special) symbols. It also provides a few traditional \TEX\ macros that users expect to be present, for instance to put accents over characters. \stopsubsubject \startsubsubject[title={file: files}] There is some overlap between this category and core modules. Loading files is always somewhat special in \TEX\ as there is the \TEX\ directory structure to deal with. Sometimes you want to use files in the so|-|called tree, but other times you don't. This category provides some management code for (selective) loading of document files, modules and resources. Most of the code works with accompanying \LUA\ code and has not been touched for years, apart from some weeding and low|-|level renaming. The project structure code has mostly been moved to \LUA\ and this mechanism is now more restrictive in the sense that one cannot misuse products and components in unpredictable ways. This change permits better automatic loading of cross references in related documents. \stopsubsubject \startsubsubject[title={font: fonts}] Without proper font support a macro package is rather useless. Of course we do support the popular font formats but nowadays that's mostly delegated to \LUA\ code. What remains at the \TEX\ end is code that loads and triggers a combination of fonts efficiently. Of course in the process text and math each need to get the proper amount of attention. There is no longer shared code between \MKII\ and \MKIV. Both already had rather different low|-|level solutions, but recently with \MKIV\ we went a step further. 
Of course it made sense to kick out commands that were only used for \PDFTEX\ \TYPEONE\ and \XETEX\ \OPENTYPE\ support but more important was the decision to change the way design sizes are supported. In \CONTEXT\ we have basic font definition and loading code and that hasn't conceptually changed much over the years. In addition to that we have so-called bodyfont environments and these have been made a bit more powerful in recent \MKIV. Then there are typefaces, which are abstract combinations of fonts and defining them happens in typescripts. This layered approach is rather flexible, and was greatly needed when we had all those font encodings (to be used in all kinds of combinations within one document). In \MKIV, however, we already had fewer typescripts as font encodings are gone (also for \TYPEONE\ fonts). However, there remained a rather large blob of definition code dealing with Latin Modern; large because it comes in design sizes. As we always fall back on Latin Modern, and because we don't preload fonts, there is some overhead involved in resolving design size related issues and definitions. But, it happens that this is the only font that ships with many files related to different design sizes. In practice no user will change the defaults. So, although the regular font mechanism still provides flexible ways to define font file combinations per bodyfont size, resolving to the right best matching size now happens automatically via a so|-|called \LUA\ font goodie file which brings down the number of definitions considerably. The consequence is that \CONTEXT\ starts up faster, not only in the case of Latin Modern being used, but also when other designs are in play. The main reason for this is that we don't have to parse those large typescripts anymore, as the presets were always part of the core set of typescripts. At the same time loading a specific predefined set has been automated and optimized. 
Of course on a run of 30 seconds this is not that noticeable, but it is on a 5 second run or when testing something in the editor that takes less than a second. It also makes a difference in automated workflows; for instance at \PRAGMA\ we run unattended typesetting flows that need to run as fast as possible. Also, in virtual machines using network shares, the fewer files consulted the better. Because math support was already based on \OPENTYPE, where \CONTEXT\ turns \TYPEONE\ fonts into \OPENTYPE\ at runtime, nothing fundamental has changed here, apart from some speedups (at the cost of some extra memory). Where the overhead of math font switching in \MKII\ is definitely a factor, in \MKIV\ it is close to negligible, even if we mix regular, bold, and bidirectional math, which we have done for a while. The low|-|level code has been simplified a bit further by making a better distinction between the larger sizes (\type {a} up to \type {d}) and smaller sizes (\type {x} and \type {xx}). These now operate independently of each other (i.e.\ one can now have a smaller relative \type {x} size of a larger one). This goes at the cost of more resources but it is worth the effort. By splitting up the large basic font module into smaller ones, I hope that it can be maintained more easily although someone familiar with the older code will only recognize bits and pieces. This is partly due to the fact that font code is highly optimized. \stopsubsubject \startsubsubject[title={grph: graphic (and widget) inclusion}] Graphics inclusion is always work in progress as new formats have to be dealt with or users want additional conversions to be done. This code will be cleaned up later this year. The plug|-|in mechanisms will be extended (examples of existing plug|-|ins are automatic converters and barcode generation). 
\stopsubsubject \startsubsubject[title={hand: special font handling}] As we treat protrusion and hz as features of a font, there is not much left in this category apart from some fine|-|tuning. So, not much has happened here and eventually the left|-|overs in this category might be merged with the font modules. \stopsubsubject \startsubsubject[title={java: \JAVASCRIPT\ in \PDF}] This code was already cleaned up a while ago, when moving to \MKIV, but we occasionally need to check and patch due to issues with \JAVASCRIPT\ engines in viewers. \stopsubsubject \startsubsubject[title={lang: languages and labels}] Not much has changed in this department, apart from additional labels. The way inheritance works in languages differs too much from other inheritance code so we keep what we have here. Label definitions have been moved to \LUA\ tables from which labels at the \TEX\ end are defined that can then be overloaded locally. Of course the basic interface has not changed as this is typically code that users will use in styles. \stopsubsubject \startsubsubject[title={luat: housekeeping}] This is mostly \LUA\ code needed to get the basic components and libraries in place. While the \type {data} category implements the connection to the outside world, this category runs on top of that and feeds the \TEX\ machinery. For instance conversion of \MKVI\ files happens here. These files are seldom touched but might need an update some time (read: prune obsolete code). \stopsubsubject \startsubsubject[title={lpdf: \PDF\ backend}] Here we implement all kinds of \PDF\ backend features. Most are abstracted via the backend interface. So, for instance, colors are done with a high level command that goes via the backend interface to the \type {lpdf} code. In fact, there is more such code than in (for instance) the \MKII\ special drivers, but readability comes at a price. This category is always work in progress as insights evolve and users demand more.
\stopsubsubject \startsubsubject[title={lxml: \XML\ and lpath}] As this category is used by some power users we cannot change too much here, apart from speedups and extensions. It's also the bit of code we use frequently at \PRAGMA, and as we often have to deal with rather crappy \XML\ I expect to move some more helpers into the code. The latest greatest trickery related to proper typesetting can be seen in the documents made by Thomas Schmitz. I wonder if I'd still have fun doing our projects if I hadn't, in an early stage of \MKIV, written the \XML\ parser and expression parser used for filtering. \stopsubsubject \startsubsubject[title={math: mathematics}] Math deserves its own category but compared to \MKII\ there is much less code, thanks to \UNICODE. Since we support \TYPEONE\ as virtual \OPENTYPE\ nothing special is needed there (and eventually there will be proper fonts anyway). When rewriting code I try to stay away from hacks, which is sometimes possible by using \LUA\ but it comes with a slight speed penalty. Much of the \UNICODE\ math|-|related font code is already rather old but occasionally we add new features. For instance, because \OPENTYPE\ has no italic correction we provide an alternative (mostly automated) solution. On the agenda is more structural math encoding (maybe like openmath) but tagging is already part of the code so we get a reasonable export. Not that someone is waiting for it, but it's there for those who want it. Most math|-|related character properties are part of the character database which gets extended on demand. Of course we keep \MATHML\ up|-|to|-|date because we need it in a few projects. We're not in a hurry here but this is something where Aditya and I have to redo some of the code that provides \AMS|-|like math commands (but as we have them configurable some work is needed to keep compatibility). In the process it's interesting to run into probably never|-|used code, so we just remove those artifacts. 
\stopsubsubject \startsubsubject[title={meta: metapost interfacing}] This and the next category deal with \METAPOST. This first category is quite old but already adapted to the new situation. Sometimes we add extra functionality but the last few years the situation has become rather stable with the exception of backgrounds, because these have been overhauled completely. \stopsubsubject \startsubsubject[title={mlib: metapost library}] Apart from some obscure macros that provide the interface between front- and backend this is mostly \LUA\ code that controls the embedded \METAPOST\ library. So, here we deal with extensions (color, shading, images, text, etc.) as well as runtime management because sometimes two runs are needed to get a graphic right. Some time ago, the \MKII|-|like extension interface was dropped in favor of one more natural to the library and \METAPOST~2. As this code is used on a daily basis it is quite well debugged and the performance is pretty good too. \stopsubsubject \startsubsubject[title={mult: multi|-|lingual user interface}] Even if most users use the English user interface, we keep the other ones around as they're part of the trademark. Commands, keys, constants, messages and the like are now managed with \LUA\ tables. Also, some of the tricky remapping code has been stripped because the setup definitions files are dealt with. These are \XML\ files that describe the user interface that get typeset and shipped with \CONTEXT. These files are being adapted. First of all the commandhandler code is defined here. As we use a new namespace model now, most of these namespaces are defined in the files where they are used. This is possible because they are more verbose so conflicts are less likely (also, some checking is done to prevent reuse). Originally the namespace prefixes were defined in this category but eventually all that code will be gone. 
This is a typical example where 15|-|year|-|old constraints are no longer an issue and better code can be used. \stopsubsubject \startsubsubject[title={node: nodes}] This is a somewhat strange category as all typeset material in \TEX\ becomes nodes so this deals with everything. One reason for this category is that new functionality often starts here and is sometimes shared between several mechanisms. So, for the moment we keep this category. Think of special kerning, insert management, low|-|level referencing (layer between user code and backend code) and all kinds of rule and displacement features. Some of this functionality is described in previously published documents. \stopsubsubject \startsubsubject[title={norm: normalize primitives}] We used to initialize the primitives here (because \LUATEX\ starts out blank). But after moving that code this category only has one definition left and that one will go too. In \MKII\ these files are still used (and actually generated by \MKIV). \stopsubsubject \startsubsubject[title={pack: wrapping content in packages}] This is quite an important category as in \CONTEXT\ lots of things get packed. The best example is \type {\framed} and this macro has been maximally optimized, which is not that trivial since much can be configured. The code has been adapted to work well with the new commandhandler code and in future versions it might use the commandhandler directly. This is however not that trivial because hooking a setup of a command into \type {\framed} can conflict with the two commands using keys for different matters. Layers are also in this category and they probably will be further optimized. Reimplementing reusable objects is on the horizon, but for that we need a more abstract \LUA\ interface, so that will come first. This has a low priority because it all works well. This category also hosts some helpers for the page builder but the builder itself has a separate category. 
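
To give an impression of how much can be configured, here is a minimal sketch of a \type {\framed} call. The keys used are regular \type {\framed} keys, but the values are arbitrary and only serve as an illustration:

\starttyping
\starttext
    \framed
      [width=6cm,
       align=middle,
       offset=1ex,
       rulethickness=1pt,
       framecolor=darkred]
      {Some text in a frame}
\stoptext
\stoptyping

Each of these keys has to be checked and acted upon for every frame, which is why optimizing this macro pays off.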
\stopsubsubject \startsubsubject[title={page: pages and output routines}] Here we have an old category: output routines (trying to make a page), page building, page imposition and shipout, single and multi column handling, very special page construction, line numbering, and of course setting up pages and layouts. All this code is being redone stepwise and stripped of old hacks. This is a cumbersome process as these are core components where side effects are sometimes hard to trace because mechanisms (and user demands) can interfere. Expect some changes for the good here. \stopsubsubject \startsubsubject[title={phys: physics}] As we have a category for chemistry it made sense to have one for physics and here is where the unit module's code ended up. So, from now on units are integrated into the core. We took the opportunity to rewrite most of it from scratch, providing a bit more control. \stopsubsubject \startsubsubject[title={prop: properties}] The best|-|known property in \TEX\ is a font and color is a close second. Both have their own category of files. In \MKII\ additional properties like backend layers and special rendering of text were supported in this category but in \MKIV\ properties as a generic feature are gone and replaced by more specific implementations in the \type {attr} namespace. We do issue a warning when any of the old methods are used. \stopsubsubject \startsubsubject[title={regi: input encodings}] We still support input encoding regimes but hardly any \TEX\ code is involved now. Only when users demand more functionality does this code get extended. For instance, recently a user wanted a conversion function for going from \UTF-8 to an encoding that another program wanted to see. \stopsubsubject \startsubsubject[title={scrn: interactivity and widgets}] All modules in this category have been overhauled.
On the one hand we lifted some constraints, for instance the delayed initialization of fields no longer makes sense as we have a more dynamic variable resolver now (which is somewhat slower but still acceptable). On the other hand some nice but hard to maintain features have been simplified (not that anyone will notice as they were rather special). The reason for this is that vaguely documented \PDF\ features tend to change over time which does not help portability. Of course there have also been some extensions, and it is actually less hassle (but still no fun) to deal with such messy backend related code in \LUA. \stopsubsubject \startsubsubject[title={scrp: script|-|specific tweaks}] These are script|-|specific \LUA\ files that help with getting better results for scripts like \CJK. Occasionally I look at them but how they evolve depends on usage. I have some very experimental files that are not in the distribution. \stopsubsubject \startsubsubject[title={sort: sorting}] As sorting is delegated to \LUA\ there is not much \TEX\ code here. The \LUA\ code occasionally gets improved if only because users have demands. For instance, sorting Korean was an interesting exercise, as was dealing with multiple languages in one index. Because sorting can happen on a combination of \UNICODE, case, shape, components, etc.\ the sorting mechanism is one of the more complex subsystems. \stopsubsubject \startsubsubject[title={spac: spacing}] This important set of modules is responsible for vertical spacing, strut management, justification, grid snapping, and all else that relates to spacing and alignments. Already in an early stage vertical spacing was mostly delegated to \LUA\ so there we're only talking of cleaning up now. Although \unknown\ I'm still not satisfied with the vertical spacing solution because it is somewhat demanding and an awkward mix of \TEX\ and \LUA\ which is mostly due to the fact that we cannot evaluate \TEX\ code in \LUA. 
Horizontal spacing can be quite demanding when it comes down to configuration: think of a table with 1000 cells where each cell has to be set up (justification, tolerance, spacing, protrusion, etc.). Recently a more drastic optimization has been done which permits even more options but at the same time is much more efficient, although not in terms of memory. Other code, for instance spread|-|related status information, special spacing characters, interline spacing and linewise typesetting all falls into this category and there is probably room for improvement there. It's good to mention that in the process of the current cleanup hardly any \LUA\ code gets touched, so that's another effort. \stopsubsubject \startsubsubject[title={strc: structure}] Big things happened here but mostly at the \TEX\ end as the support code in \LUA\ was already in place. In this category we collect all code that gets or can get numbered, moves around and provides visual structure. So, here we find itemize, descriptions, notes, sectioning, marks, block moves, etc. This means that the code here interacts with nearly all other mechanisms. Itemization now uses the new inheritance code instead of its own specific mechanism but that is not a fundamental change. More important is that code has been moved around, stripped, and slightly extended. For instance, we had introduced proper \type {\startitem} and \type {\stopitem} commands which are somewhat conflicting with \type {\item} where a next instance ends a previous one. The code is still not nice, partly due to the number of options. The code is a bit more efficient now but functionally the same. The sectioning code is under reconstruction as is the code that builds lists. The intention is to have a better pluggable model and so far it looks promising. As similar models will be used elsewhere we need to converge to an acceptable compromise. 
One thing is clear: users no longer need to deal with arguments but with variables and no longer with macros but with setups. Of course providing backward compatibility is a bit of a pain here. The code that deals with descriptions, enumerations and notes was already done in a \MKIV\ way, which means that they run on top of lists as storage and use the generic numbering mechanism. However, they had their own inheritance support code and moving to the generic code was a good reason to look at them again. So, now we have a new hierarchy: constructs, descriptions, enumerations and notations where notations are hooked into the (foot)note mechanisms. These mechanisms share the rendering code but operate independently (which was the main challenge). I did explore the possibility of combining the code with lists as there are some similarities but the usual rendering is too different, as is the interface (think of enumerations with optional local titles, multiple notes that get broken over pages, etc.). However, as they are also stored in lists, users can treat them as such and reuse the information when needed (which for instance is just an alternative way to deal with end notes). At some point math formula numbering (which runs on top of enumerations) might get its own construct base. Math will be revised when we consider the time to be ripe for it anyway. The reference mechanism is largely untouched as it was already doing well, but better support has been added for automatic cross|-|document referencing. For instance it is now easier to process components that make up a product and still get the right numbering and cross referencing in such an instance. Float numbering, placement and delaying can all differ per output routine (single column, multi|-|column, columnset, etc.). Some of the management has moved to \LUA\ but most is just a job for \TEX. The better some support mechanisms become, the less code we need here.
Registers will get the same treatment as lists: even more user control than is already possible. Being a simple module this is a relatively easy task, something for a hot summer day. General numbering is already fine as are block moves so they come last. The \XML\ export and \PDF\ tagging is also controlled from this category. \stopsubsubject \startsubsubject[title={supp: support code}] Support modules are similar to system ones (discussed later) but on a slightly more abstract level. There are not that many left now so these might as well become system modules at some time. The most important one is the one dealing with boxes. The biggest change there is that we use more private registers. I'm still not sure what to do with the visual debugger code. The math|-|related code might move to the math category. \stopsubsubject \startsubsubject[title={symb: symbols}] The symbol mechanism organizes special characters in groups. With \UNICODE|-|related fonts becoming more complete we hardly need this mechanism. However, it is still the abstraction used in converters (for instance footnote symbols and interactive elements). The code has been cleaned up a bit but generally stays as is. \stopsubsubject \startsubsubject[title={syst: tex system level code}] Here you find all kinds of low|-|level helpers. Most date from early times but have been improved stepwise. We tend to remove obscure helpers (unless someone complains loudly) and add new ones every now and then. Even if we would strip down \CONTEXT\ to a minimum size, these modules would still be there. Of course the bootstrap code is also in this category: think of allocators, predefined constants and such. \stopsubsubject \startsubsubject[title={tabl: tables}] The oldest table mechanism was a quite seriously patched version of \TABLE\ and finally the decision has been made to strip, replace and clean up that bit. So, we have less code, but more features, such as colored columns and more.
The (in|-|stream) tabulate code is mostly unchanged but has been optimized (again) as it is often used. The multipass approach stayed but is somewhat more efficient now. The natural table code was originally meant for \XML\ processing but is quite popular among users. The functionality and code is frozen but benefits from optimizations in other areas. The reason for the freeze is that it is pretty complex multipass code and we don't want to break anything. As an experiment, a variant of natural tables was made. Natural tables have a powerful inheritance model where rows and cells (first, last, \unknown) can be set up as a group but that is rather costly in terms of runtime. The new table variant treats each column, row and cell as an instance of \type {\framed} where cells can be grouped arbitrarily. And, because that is somewhat extreme, these tables are called x|-|tables. As much of the logic has been implemented in \LUA\ and as these tables use buffers (for storing the main body) one could imagine that there is some penalty involved in going between \TEX\ and \LUA\ several times, as we have a two, three or four pass mechanism. However, this mechanism is surprisingly fast compared to natural tables. The reason for writing it was not only speed, but also the fact that in a project we had tables of 50 pages with lots of spans and such that simply didn't fit into \TEX's memory any more, took ages to process, and could also confuse the float splitter. Line tables \unknown\ well, I will look into them when needed. They are nice in a special way, as they can split vertically and horizontally, but they are seldom used. (This table mechanism was written for a project where large quantities of statistical data had to be presented.) \stopsubsubject \startsubsubject[title={task: lua tasks}] Currently this is mostly a place where we collect all kinds of tasks that are delegated to \LUA, often hooked into callbacks. No user sees this code. 
\stopsubsubject \startsubsubject[title={toks: token lists}] This category has some helpers that are handy for tracing or manuals but no sane user will ever use them, I expect. However, at some point I will clean up this old \MKIV\ mess. This code might end up in a module outside the core. \stopsubsubject \startsubsubject[title={trac: tracing}] A lot of tracing is possible in the \LUA\ code, which can be controlled from the \TEX\ end using generic enable and disable commands. At the macro level we do have some tracing but this will be replaced by a similar mechanism. This means that many \type {\tracewhatevertrue} directives will go away and be replaced. This is of course introducing some incompatibility but normally users don't use this in styles. \stopsubsubject \startsubsubject[title={type: typescripts}] We already mentioned that typescripts relate to fonts. Traditionally this is a layer on top of font definitions and we keep it this way. In this category there are also the definitions of typefaces: combinations of fonts. As we split the larger into smaller ones, there are many more files now. This has the added benefit that we use less memory as typescripts are loaded only once and stored permanently. \stopsubsubject \startsubsubject[title={typo: typesetting and typography}] This category is rather large in \MKIV\ as we move all code into here that somehow deals with special typesetting. Here we find all kinds of interesting new code that uses \LUA\ solutions (slower but more robust). Much has been discussed in articles as they are nice examples and often these are rather stable. The most important new kid on the block is margin data, which has been moved into this category. The new mechanism is somewhat more powerful but the code is also quite complex and still experimental. 
The functionality is roughly the same as in \MKII\ and older \MKIV, but there is now more advanced inheritance, a clear separation between placement and rendering, slightly more robust stacking, and local anchoring (new). It was a nice challenge but took a bit more time than other reimplementations due to all kinds of possible interference. Also, it's not always easy to simulate \TEX\ grouping in a script language. Even if much more code is involved, it looks like the new implementation is somewhat faster. I expect to clean up this code a couple of times. On the agenda is not only further cleanup of all modules in this category, but also more advanced control over paragraph building. There has been a parbuilder written in \LUA\ on my machine for years already which we use for experiments and in the process a more \LUATEX-ish (and efficient) way of dealing with protrusion has been explored. But for this to become effective, some of the \LUATEX\ backend code has to be reorganized and Hartmut wants to do that first. In fact, we can then backport the new approach to the built|-|in builder, which is not only faster but also more efficient in terms of memory usage. \stopsubsubject \startsubsubject[title={unic: \UNICODE\ vectors and helpers}] As \UNICODE\ support is now native all the \MKII\ code (mostly vectors and converters) is gone. Only a few helpers remain and even these might go away. Consider this category obsolete and replaced by the \type {char} category. \stopsubsubject \startsubsubject[title={util: utility functions}] These are \LUA\ files that are rather stable. Think of parsers, format generation, debugging, dimension helpers, etc. Like the data category, this one is loaded quite early. \stopsubsubject \startsubsubject[title={Other \TEX\ files}] Currently there are the above categories which can be recognized by filename and prefix in macro names. But there are more files involved.
For instance, user extensions can go into these categories as well but they need names starting with something like \type {xxxx-imp-} with \type {xxxx} being the category. Then there are modules that can be recognized by their prefix: \type {m-} (basic module), \type {t-} (third party module), \type {x-} (\XML|-|specific module), \type {u-} (user module), \type {p-} (private module). Some modules that Wolfgang and Aditya are working on might end up in the core distribution. In a similar fashion some seldom used core code might get moved to (auto|-|loaded) modules. There are currently many modules that provide tracing for mechanisms (like font and math) and these need to be normalized into a consistent interface. Often such modules show up when we work on an aspect of \CONTEXT\ or \LUATEX\ and at that moment integration is not high on the agenda. \stopsubsubject \startsubsubject[title={\METAPOST\ files}] A rather fundamental change in \METAPOST\ is that it no longer has a format (mem file). Maybe at some point it will read \type {.gz} files, but all code is loaded at runtime. For this reason I decided to split the files for \MKII\ and \MKIV\ as having version specific code in a common set no longer makes much sense. This means that already for a while we have \type {.mpii} and \type {.mpiv} files with the latter category being more efficient because we delegate some backend|-|related issues to \CONTEXT\ directly. I might split up the files for \MKIV\ a bit more so that selective loading is easier. This gives a slight performance boost when working over a network connection. \stopsubsubject \startsubsubject[title={\LUA\ files}] There are some generic helper modules, with names starting with \type {l-}. Then there are the \type {mtx-*} scripts for all kinds of management tasks with the most important one being \type {mtx-context} for managing a \TEX\ run. 
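
As a sketch, assuming a regular installation with the scripts on the path, a \TEX\ run is started either via the generic script runner or via the \type {context} stub that wraps it. The filename \type {myfile.tex} is of course just a placeholder:

\starttyping
mtxrun --script context myfile.tex
context myfile.tex
\stoptyping

Both calls end up in \type {mtx-context}, which takes care of the number of runs that is needed.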
\stopsubsubject \startsubsubject[title={Generic files}] This leaves the bunch of generic files that provides \OPENTYPE\ support to packages other than \CONTEXT. Much time went into moving \CONTEXT|-|specific code out of the way and providing a better abstract interface. This means that new \CONTEXT\ code (we provide more font magic) will be less likely to interfere and integration is easier. Of course there is a penalty for \CONTEXT\ but it is bearable. And yes, providing generic code takes quite a lot of time so I sometimes wonder why I did it in the first place, but currently the maintenance burden is rather low. Khaled Hosny is responsible for bridging this code to \LATEX. \stopsubsubject \stopsection \startsection[title={What next}] Here ends this summary of the current state of \CONTEXT. I expect to spend the rest of the year on further cleaning up. I'm close to halfway now. What I really like is that many users upgrade as soon as there is a new beta, and as in a rewrite typos creep in, I therefore often get a fast response. Of course it helps a lot that Wolfgang Schuster, Aditya Mahajan, and Luigi Scarso know the code so well that patches show up on the list shortly after a problem gets reported. Also, for instance Thomas Schmitz uses the latest betas in academic book production, presentations, lecture notes and more, and so provides invaluable fast feedback. And of course Mojca Miklavec keeps all of it (and us) in sync. Such a drastic cleanup could not be done without their help. So let's end this status report with \unknown\ a big thank you to all those (unnamed) patient users and contributors. \stopsection \stopchapter \stopcomponent