% language=us runpath=texruns:manuals/beyond

\startcomponent beyond-namespaces

\environment beyond-style

\startchapter[title={Namespaces},author=Hans Hagen]

Occasionally performance comes up on \TEX\ related mailing lists, at meetings, in
articles or on forums. It makes no sense for me to go into the specific (assumed)
bottlenecks mentioned, but as in \CONTEXT\ we do keep an eye on performance, every
now and then I also spend some words on it, so here are some.

Due to the nature of the (multilingual) user interface of \CONTEXT\ there is extensive
use of the \type {\csname} and related primitives. For instance, if we have the
namespace \type {999>} and a keyword \typ {testkeyword}, we can have a specific
property set with:

\starttyping
\expandafter\def\csname 999>testkeyword\endcsname{}
\stoptyping

We can then test if a macro with the inaccessible name \quote {999>testkeyword}
exists and has been set with a test command available in all engines that carry
\ETEX\ extensions:

\starttyping
\ifcsname 999>testkeyword\endcsname
    % whatever
\fi
\stoptyping

In order to test this, the list of tokens starting at \type {9} and ending at
\type {d} has to be converted into a (\CCODE) string that is used for a hash
lookup. One can expect this to be a costly operation. In a 300 page book with
many thousands of formulas the number of such tests easily runs into the
millions. Running one million such tests, five times, gives:

\starttyping
0.303 0.293 0.283 0.301 0.298
\stoptyping

for \LUAMETATEX\ and

\starttyping
0.276 0.287 0.287 0.274 0.274
\stoptyping

for \LUATEX. I deliberately show five numbers because one has to take some system
load into account. When I'm interested in performance I only care about trends
because no run ever gets the whole machine for its job. That said, where does the
noticeable difference between these engines come from? It can partly be explained
by \LUAMETATEX\ having more primitives and therefore a bit more overhead (more
scattered code in memory and \CPU\ cache). But as the basic code that kicks in
here is not that much different I figured that it might be the hash lookup and,
because indeed we had a follow up lookup in the hash (two steps), by using a
larger hash table we could limit that to a direct hit.

\starttyping
0.288 0.281 0.280 0.288 0.277
\stoptyping

So we ended up with similar measurements for these engines. Before we carry on,
let's ask ourselves if these numbers worry us. Say that this book takes 12
seconds to process, does it matter much if we halve this overhead? Probably not,
but in the following, we need to keep in mind that much can interfere. A simple
million times test is likely very \CPU\ cache friendly. There are however other
factors in play: convenience coding, abstraction, less cluttered tracing, more
detailed feedback from the engine, less code and memory usage, the size of the
format file. Trying to get lower numbers is also kind of fun.
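
For those who want to see such trends on their own machine, here is a minimal
sketch of a million times test. It uses the \CONTEXT\ helpers \type
{\testfeatureonce} and \type {\elapsedtime}; that the numbers shown above were
obtained in exactly this way is an assumption, and absolute values will differ
per machine and engine.

\starttyping
\starttext

% the macro with the inaccessible name that we test for
\expandafter\def\csname 999>testkeyword\endcsname{}

% five runs, so that we see a trend and not an incident
\dorecurse {5} {%
    \testfeatureonce {1000000} {%
        \ifcsname 999>testkeyword\endcsname
            % whatever
        \fi
    }%
    \elapsedtime\space
}

\stoptext
\stoptyping

The comment signs at the line ends prevent spaces from being typeset a million
times, which would otherwise end up in the measurement.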

Back to the user interface, we now introduce some abstraction (the \type {test}
in the names avoids clashes with existing definitions):

\starttyping
\def\??testfoo    {999>}
\def\c!testkeyword{keyword}

\ifcsname\??testfoo\c!testkeyword\endcsname
    % whatever
\fi
\stoptyping

Again \LUAMETATEX\ is a little slower but it is kind of noise:

\starttyping
0.243 0.243 0.247 0.241 0.249 luatex
0.251 0.250 0.250 0.249 0.249 luametatex
\stoptyping

But how about the following timings for \LUAMETATEX:

\starttyping
0.136 0.143 0.139 0.139 0.140
0.132 0.132 0.133 0.129 0.130
\stoptyping

In the first case we defined the namespace and keyword as follows:

\starttyping
\cdef\??testfoo    {999>}
\cdef\c!testkeyword{keyword}
\stoptyping

A \type {\cdef} macro is basically an \type {\edef}. Its definition is scanned as
a token list and therefore we know the macro has no arguments. It operates as any
macro but in a \type {\csname} related command it is just passed as|-|is and
only expanded when we need to do a lookup. When that happens we don't need to go
through a (copy of a) token list but can go directly to the string characters.
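
A small sketch might make this more concrete. The names \type {\testone} and
\type {\testtwo} are made up for this example and it assumes \LUAMETATEX,
because other engines have no \type {\cdef}:

\starttyping
\def \testone{999}
\cdef\testtwo{\testone>} % \testone is expanded right now, as with \edef

% the space after \testtwo is skipped so we define 999>testkeyword
\expandafter\def\csname\testtwo testkeyword\endcsname{}

\ifcsname\testtwo testkeyword\endcsname
    % we get here because 999>testkeyword is defined
\fi
\stoptyping

Outside a \type {\csname} related command \type {\testtwo} just expands to \type
{999>}, like any macro without arguments.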

The second measurement shows a little improvement and is the outcome of an
experiment with built in namespaces. Think of this:

\starttyping
\namespaceifcsnamedef\iftestfoocsname 999

\iftestfoocsname\c!testkeyword\endcsname
    % whatever
\fi
\stoptyping

That variant is faster but we're talking 0.05 seconds on 2.5 million calls in the
book because we already use \type {\cdef}. Even more important is to notice that
most documents have only tens of thousands of such calls. And 0.15 seconds of
csname \quotation {test and call} on the whole run is not that bad. So, if we go
beyond \type {\cdef} usage the efficiency argument no longer counts and we have
to rely on the other ones. After a few days of playing with this I rejected this
solution. First of all the
source didn't become more readable. We also had many more commands because there
were for instance:

% \namespacestring 999

\starttyping
\namespacecsnamedef     \csnamefoo      999
\namespacedefcsnamedef  \defcsnamefoo   999
\namespaceifcsnamedef   \ifcsnamefoo    999
\namespacebegincsnamedef\begincsnamefoo 999
\stoptyping

We also had a callback for reporting associated names when tracing. Of course
there can be use cases where we have tens of millions of \type {\csname} calls
but I still need to find them. But don't expect miracles now that we're in these
low numbers. Integrating all this is also not that trivial because \TEX\ has two
separate code paths for expandable commands and ones more related to
housekeeping and typesetting (the main loop). This means that one has to
intercept expansion of encoded namespaces and that gives a bit of a mess,
especially because we also need to handle nested csnames.

As an aside I also played a bit with \quote {compiling} regular csname commands
followed by a namespace into one token but that was even messier. \footnote
{Occasionally I consider some compilation of token lists into more efficient
ones but so far I could resist.} So in the end I removed all that experimental
namespace code and happily accept the fact that there's nothing to gain, but it
was a fun experiment.

As a side effect of this experiment I decided to enable a primitive that had been
commented out. When it was tested years ago there was no real gain but I realized
that it could be implemented a bit more efficiently in specific scenarios. Think
of this:

\starttyping
\csname\ifcsname999>foobar:width\endcsname999>foobar:width\fi\endcsname
\stoptyping

when abstracted becomes:

\starttyping
\csname\ifcsname\??testme foobar:\c!width\endcsname\??testme foobar:\c!width\fi\endcsname
\stoptyping

In both cases the same list of tokens (\typ {\??testme foobar:\c!width}) has to
be converted into a byte string, which we can avoid by:

\starttyping
\csname\ifcsname\??testme foobar:\c!width\endcsname\csnamestring\fi\endcsname
\stoptyping

when we have a hit. After all, the found macro has a known name that has been
registered as a string. This variant runs over 10 percent faster, which of course
can be neglected, especially if we don't call it millions of times; the book has
400,000 calls to \typ {\csnamestring}. But as with many optimizations: gaining 20
times 0.1 seconds on different subsystems eventually adds up to 20 \percent\ on a
10 second run for that 300 page, math intensive, book.
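
In practice such a test often comes with a fallback. The following is only a
sketch: the \type {default:} entry is a hypothetical name that is assumed to be
defined elsewhere, and the \type {!} and \type {?} in the macro names are
assumed to be letters, as they are after \type {\unprotect}.

\starttyping
\csname
    \ifcsname\??testme foobar:\c!width\endcsname
        \csnamestring            % a hit: reuse the name that was just found
    \else
        \??testme default:\c!width % no hit: build the fallback name
    \fi
\endcsname
\stoptyping

Only the fallback branch has to convert a token list into a string again. The
line breaks and indentation add nothing to the name because spaces after control
words and at the start of a line are skipped.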

When looking at timings one always needs to keep in mind that a simple test (in a
loop) is very easy on the \CPU\ cache while in a real document there can be more
cache misses simply because the cache is limited in size. That is why in practice
we often see a bit more positive impact than shown here. In the case of the \typ
{\csnamestring} we not only gain a bit on parameter handling but also on some
font related operations, but again the gain depends on how many (more complex)
font switches happen, which is more likely in for instance manuals.

\stopchapter

\stopcomponent