% beyond-namespaces.tex, last modification: 2025-02-21 11:03
% language=us runpath=texruns:manuals/beyond

\startcomponent beyond-namespaces

\environment beyond-style

\startchapter[title={Namespaces},author=Hans Hagen]

Occasionally performance comes up on \TEX\ related mailing lists, at meetings,
and in articles or forums. It makes no sense for me to go into the specific
(assumed) bottlenecks mentioned, but because in \CONTEXT\ we do keep an eye on
performance, every now and then I also spend some words on it, so here are some.

Due to the nature of the (multilingual) user interface of \CONTEXT\ there is
extensive use of the \type {\csname} and related primitives. For instance, if we
have the namespace \type {999>} and a keyword \typ {testkeyword}, we can have a
specific property set with:

\starttyping
\expandafter\def\csname 999>testkeyword\endcsname{}
\stoptyping

We can then test if a macro with the inaccessible name \quote {999>testkeyword}
exists and has been set, using a test command that is available in all engines
that carry \ETEX\ extensions:

\starttyping
\ifcsname 999>testkeyword\endcsname
    % whatever
\fi
\stoptyping

In order to test this, the list of tokens starting at \type {9} and ending at
\type {d} has to be converted into a (\CCODE) string that is used for a hash
lookup. One can expect this to be a costly operation. In a 300 page book with
many thousands of formulas the number of such tests easily runs into the
millions. Running one million such tests five times gives (in seconds):

\starttyping
0.303 0.293 0.283 0.301 0.298
\stoptyping

for \LUAMETATEX\ and

\starttyping
0.276 0.287 0.287 0.274 0.274
\stoptyping

for \LUATEX. I deliberately show five numbers because one has to take some system
load into account. When I'm interested in performance I only care about trends
because no run ever gets the whole machine for its job. That said, where does the
noticeable difference between these engines come from? It can partly be explained
by \LUAMETATEX\ having more primitives and therefore a bit more overhead (more
scattered code in memory and \CPU\ cache). But as the basic code that kicks in
here is not that much different I figured that it might be the hash lookup.
Indeed we had a follow up lookup in the hash (two steps), and by using a larger
hash table we could limit that to a direct hit:

\starttyping
0.288 0.281 0.280 0.288 0.277
\stoptyping

So we ended up with similar measurements for these engines. Before we carry on,
let's ask ourselves if these numbers worry us. Say that this book takes 12
seconds to process, does it matter much if we halve this overhead? Probably not,
but in the following we need to keep in mind that much can interfere. A simple
test that runs a million times is likely very \CPU\ cache friendly. There are
however other factors in play: convenience coding, abstraction, less cluttered
tracing, more detailed feedback from the engine, less code and memory usage, the
size of the format file. Trying to get lower numbers is also kind of fun.
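
A test like the ones timed above can be set up in a few lines. The following is
only a minimal sketch: it assumes the \type {\testfeatureonce} and \type
{\elapsedtime} helpers that \CONTEXT\ provides for such loops, and it is not
necessarily how the numbers shown here were obtained.

\starttyping
% make sure that the tested name exists
\expandafter\def\csname 999>testkeyword\endcsname{}

% run the test one million times and typeset the elapsed time (seconds)
\testfeatureonce {1000000} {\ifcsname 999>testkeyword\endcsname \fi}
\elapsedtime
\stoptyping

Repeating the last two lines a few times gives a series of numbers from which a
trend can be read.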

Back to the user interface, we now introduce some abstraction (the \type {test}
in the names avoids clashes with existing definitions):

\starttyping
\def\??testfoo    {999>}
\def\c!testkeyword{keyword}

\ifcsname\??testfoo\c!testkeyword\endcsname
    % whatever
\fi
\stoptyping

Again \LUAMETATEX\ is a little slower but it is kind of noise:

\starttyping
0.243 0.243 0.247 0.241 0.249 luatex
0.251 0.250 0.250 0.249 0.249 luametatex
\stoptyping

But how about the following timings for \LUAMETATEX:

\starttyping
0.136 0.143 0.139 0.139 0.140
0.132 0.132 0.133 0.129 0.130
\stoptyping

In the first case we defined the namespace and keyword as follows:

\starttyping
\cdef\??testfoo    {999>}
\cdef\c!testkeyword{keyword}
\stoptyping

A macro defined with \type {\cdef} behaves basically like one defined with \type
{\edef}. Its definition is scanned as a token list and therefore we know that the
macro has no arguments. It operates as any macro, but in a \type {\csname}
related command it is just passed as|-|is and only expanded when we need to do a
lookup. When that happens we don't need to go through a token list (copy) but can
go directly to the string characters.
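
A minimal sketch of what this means in practice follows. It assumes an engine
that provides \type {\cdef} and, because of the special characters in the names,
the \type {\unprotect} regime. The name \type {\testfullname} is made up for this
example.

\starttyping
\unprotect

\cdef\??testfoo    {999>}
\cdef\c!testkeyword{keyword}

% in a csname the constant strings are picked up directly
\expandafter\def\csname\??testfoo\c!testkeyword\endcsname{some value}

% elsewhere they expand like any parameterless macro
\edef\testfullname{\??testfoo\c!testkeyword} % 999>keyword

\protect
\stoptyping

So nothing changes at the usage end: only the definitions use a different
command.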

The second measurement shows a little improvement and is the outcome of an
experiment with built|-|in namespaces. Think of this:

\starttyping
\namespaceifcsnamedef\iftestfoocsname 999

\iftestfoocsname\c!testkeyword\endcsname
    % whatever
\fi
\stoptyping

That variant is faster but we're talking about 0.05 seconds on 2.5 million calls
in the book, because we already use \type {\cdef}. Even more important is to
notice that most documents have only tens of thousands of such calls. And 0.15
seconds of csname \quotation {test and call} on the whole run is not that bad.
So, if we go beyond \type {\cdef} usage, it is not the efficiency argument that
has to justify it but the other ones. After a few days of playing with this I
rejected this solution. First of all the source didn't become more readable. We
also had many more commands because there were for instance:

% \namespacestring 999

\starttyping
\namespacecsnamedef     \csnamefoo      999
\namespacedefcsnamedef  \defcsnamefoo   999
\namespaceifcsnamedef   \ifcsnamefoo    999
\namespacebegincsnamedef\begincsnamefoo 999
\stoptyping

We also had a callback for reporting associated names when tracing. Of course
there can be use cases where we have tens of millions of \type {\csname} calls
but I still need to find them. And don't expect miracles now that we're in these
low numbers. Integrating all this is also not that trivial because \TEX\ has two
separate code paths: one for expandable commands and one more related to
housekeeping and typesetting (the main loop). This means that one has to
intercept expansion of encoded namespaces and that gives a bit of a mess,
especially because we also need to handle nested csnames.

As an aside I also played a bit with \quote {compiling} regular csname commands
followed by a namespace into one token but that was even messier. \footnote
{Occasionally I consider some compilation of token lists into more efficient
ones but so far I could resist.} So in the end I removed all that experimental
namespace code and happily accept the fact that there's nothing to gain, but it
was a fun experiment.

As a side effect of this experiment I decided to enable a primitive that had been
commented out. When it was tested years ago there was no real gain but I realized
that it could be implemented a bit more efficiently in specific scenarios. Think
of this:

\starttyping
\csname\ifcsname999>foobar:width\endcsname999>foobar:width\fi\endcsname
\stoptyping

which, when abstracted, becomes:

\starttyping
\csname\ifcsname\??testme foobar:\c!width\endcsname\??testme foobar:\c!width\fi\endcsname
\stoptyping

In both cases the same list of tokens (\typ {\??testme foobar:\c!width}) has to
be converted into a byte string twice, which we can avoid with:

\starttyping
\csname\ifcsname\??testme foobar:\c!width\endcsname\csnamestring\fi\endcsname
\stoptyping

when we have a hit. After all, the found macro has a known name that has been
registered as a string. This variant runs over 10 \percent\ faster, which of
course can be neglected, especially if we don't call it millions of times; the
book has 400,000 calls to \typ {\csnamestring}. But as with many optimizations:
gaining 20 times 0.1 seconds on different subsystems eventually adds up to 20
\percent\ on a 10 second run for that 300 page, math intensive, book.
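
As an illustration, the pattern can be wrapped in a small helper. This is just a
sketch: the names \type {\??testme} and \type {\testmewidth} and the instance
\type {foobar} are made up, \type {\c!width} is assumed to be the regular
interface constant, and the code assumes an engine that has \type
{\csnamestring}.

\starttyping
\unprotect

\cdef\??testme{999>}

\expandafter\def\csname\??testme foobar:\c!width\endcsname{10cm}

% fetch the width of instance #1; on a hit the name is only converted once
\def\testmewidth#1%
  {\csname\ifcsname\??testme#1:\c!width\endcsname\csnamestring\fi\endcsname}

\protect

\testmewidth{foobar} % the stored width
\testmewidth{foo}    % not set: an empty \csname, so nothing should show up
\stoptyping

When there is no hit the test fails and nothing ends up between \type {\csname}
and \type {\endcsname}, so a real helper would probably add an \type {\else}
branch with a fallback.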

When looking at timings one always needs to keep in mind that a simple test (in a
loop) is very easy on the \CPU\ cache while in a real document there can be more
cache misses simply because the cache is limited in size. That is why in practice
we often see a bit more positive impact than shown here. In the case of \typ
{\csnamestring} we not only gain a bit on parameter handling but also on some
font related operations, but again the gain depends on how many (more complex)
font switches happen, which is more likely in, for instance, manuals.

\stopchapter

\stopcomponent