\startcomponent evenmoreexpressions

\environment evenmorestyle

\startchapter[title={Expressions}]

\startsection[title={Introduction}]

Do we need bitwise expressions? Actually the answer is \quotation {no, or at
least not until recently}. In \CONTEXT\ \MKII\ and \MKIV\ we just use integer
addition because we only need to enable things, but in \LMTX\ we want to control
the detailed modes that some mechanisms in the engine provide, and in order not
to have tons of parameters these use bit sets. We manipulate these with the
bitwise macros that actually are efficient \LUA\ function calls. But, as with
some other extensions in \LUAMETATEX, one way to prevent tracing clutter is to
have a few handy primitives. So let's see what we've got.

{\em I haven't checked all operators and combinations yet!}

\stopsection

\startsection[title={Exploration}]

Already early in the \LUAMETATEX\ development (2019) the expression parser was
extended with an integer division operator \type {:} that we actually use in
\LMTX, and soon after that I added basic bitwise operators. These were never
activated but kept as comments because I didn't want to impact the scanner (even
if we can afford to lose some performance because the scanner has been
optimized). But in the process of cleaning up \quote {todo} comments in the
source code I eventually arrived at expressions again.

The colon already makes the scanner incompatible because \type {\numexpr 12:}
expects a number (which means that we cannot port back) and more operators only
make a backport less likely. In \CONTEXT\ I nearly always use \type {\relax} as
terminator unless we're sure that lookahead is no issue. \footnote {In the \ETEX\
expression parser, the normal \type {/} rounds the result. Both the \type {*} and
\type {/} operator have a dedicated code path that assures no loss of accuracy.
The \type {:} operator just divides like \LUA's \type {//} which is an integer
division operator. There are subtle differences between the division variants
which can be noticeable when you go round trip. That is actually the main reason
why this was one of the first things added to \LUAMETATEX\ as I wanted to get rid
of a few scaled point rounding issues. The \ETEX\ expression parser is
somewhat complicated because it can deal with a mix of integers, dimensions and
even glue, but always brings the result back to its main operating model. Because
we adopted some of these \ETEX\ features rather early in \CONTEXT, lookahead
pitfalls are taken care of already.}
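
To make the difference between the two division operators concrete, here is a
minimal sketch. It assumes a \LUAMETATEX\ based \CONTEXT\ run, because the colon
operator is not available in other engines. The values in the comments follow
from the rounding and integer division rules mentioned in the footnote.

\starttyping
\the\numexpr 7/2\relax % rounded division: 3.5 becomes 4
\the\numexpr 7:2\relax % integer division: 3
\stoptyping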

When going over the code in 2021, mostly because I wanted to get rid of some
commented experiments, I decided that the extension should not go into the
normal scanner but that a dedicated, simple and integer-only scanner made more
sense, so during a rainy summer weekend I started playing with that. It eventually
became a bit more than initially intended, although the amount of code is rather
minimal. The performance was about twice that of the already available bitwise
macros but operator precedence was not provided (apart from the multiplication
and division operators). The final implementation was different: not that much
faster on simple bitwise operations, but it could do more complex things in one
go. Performance was not a real reason to provide this anyway because we're
talking microseconds; it's more about less code and better readability.

The initial primitive command was \type {\bitexpr} and it supported nesting with
parentheses as the other expressions do. Because there are many operators, also
verbose ones, the non-optional \type {\relax} token finishes parsing. But
soon we moved on to two dedicated primitives.

\stopsection

\startsection[title={Operators}]

The set of operators that we have to support is the following. Most have
alternatives so that we can get around catcode issues.

\starttabulate[||cT|cT|]
\BC add       \NC +                       \NC        \NC \NR
\BC subtract  \NC -                       \NC        \NC \NR
\BC multiply  \NC *                       \NC        \NC \NR
\BC divide    \NC / :                     \NC        \NC \NR
\BC mod       \NC \letterpercent          \NC mod    \NC \NR
\BC band      \NC &                       \NC band   \NC \NR
\BC bxor      \NC ^                       \NC bxor   \NC \NR
\BC bor       \NC \letterbar \space v     \NC bor    \NC \NR
\BC and       \NC &&                      \NC and    \NC \NR
\BC or        \NC \letterbar\letterbar    \NC or     \NC \NR
\BC setbit    \NC <undecided>             \NC bset   \NC \NR
\BC resetbit  \NC <undecided>             \NC breset \NC \NR
\BC left      \NC <<                      \NC        \NC \NR
\BC right     \NC >>                      \NC        \NC \NR
\BC less      \NC <                       \NC        \NC \NR
\BC lessequal \NC <=                      \NC        \NC \NR
\BC equal     \NC = ==                    \NC        \NC \NR
\BC moreequal \NC >=                      \NC        \NC \NR
\BC more      \NC >                       \NC        \NC \NR
\BC unequal   \NC <> != \lettertilde =    \NC        \NC \NR
\BC not       \NC ! \lettertilde          \NC not    \NC \NR
\stoptabulate

I considered using \type {++} and \type {--} as the \type {bset} and \type
{bunset} shortcuts but that leads to issues because in \TEX\ a number can be
preceded by any sequence of signs, so \type {--10} is a valid number, and one
never knows what sequence (without spaces) gets fed into an expression.

Originally I'd added some \UNICODE\ characters but for some reason support of
logical operators is suboptimal so I removed that feature. Because these special
characters are multi-byte \UTF\ sequences they are not that much better than
verbose words anyway.
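
As an illustration of the alternatives in the table, the following sketch
combines two hexadecimal values with the verbose \type {band} operator and
shifts a bit with \type {<<}. It assumes the \type {\numexpression} primitive
that is introduced in the next section, and the input values are arbitrary.

\starttyping
\the\numexpression "F0 band "3C \relax % 0xF0 and 0x3C share 0x30, so 48
\the\numexpression 1 << 4 \relax       % 1 shifted left by 4 bits, so 16
\stoptyping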

\stopsection

\startsection[title={Integers and dimensions}]

When I was playing a bit with this feature, I wondered if we could mix in some
dimensions. It was actually not that hard to add this: only explicit (verbose)
dimensions had to be intercepted because dimen registers and such are seen as
integers by the integer scanner. Once we were able to handle that, a next step
was to make sure that \typ {2 * 10pt} was permitted, something that the \ETEX\
\type {\dimexpr} primitive can't handle. So, a variant of the dimen parser had
to be used that makes the unit optional: \type {\dimexpression} and \type
{\numexpression} were born.
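
A small sketch of what this permits, assuming that \type {\dimexpression}
behaves as described here. The register \type {\scratchdimen} is only used for
illustration.

\starttyping
\scratchdimen = \dimexpression 2 * 10pt \relax % factor first, unit optional
\the\scratchdimen                              % expected to give 20.0pt
\stoptyping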

The resulting parsers worked quite well but were about twice as slow as the
normal expression scanners, which is no surprise because they do more. For
instance we are case insensitive and need to handle letter and other (and in a
few cases alignment and superscript) catcodes too. However, with a slightly tuned
integer parser, also possible because the sentinel \type {\relax} makes parsing
more predictable, and a dedicated unit scanner, in the end both the integer and
dimension parser were performing well. It's not like we run them millions of
times in a document.

\startbuffer
\scratchcounter = \numexpression
    "00000 bor "00001 bor "00020 bor "00400 bor "08000 bor "F0000
\relax
\stopbuffer

Here is an example that results in {0x\inlinebuffer\uchexnumber\scratchcounter}:

\typebuffer

\startbuffer
\scratchcounter = \numexpression
    "FFFFF bxor "10101
\relax
\stopbuffer

And this gives {0x\inlinebuffer\uchexnumber\scratchcounter}:

\typebuffer

We can give numerous examples but you get the picture. In the above table you can
see that some operators have equivalents. The reason for this is that a macro
package can change catcodes and some characters have special meanings. So, the
scanner is rather tolerant.

\startbuffer
\scratchcounterone = 10
\scratchcountertwo = 20
\ifcase \numexpression
    (\scratchcounterone > 5) && (\scratchcountertwo > 5)
\relax yes\else nop\fi

\space

\scratchcounterone = 2
\scratchcountertwo = 4
\ifcase \numexpression
    (\scratchcounterone > 5) and (\scratchcountertwo > 5)
\relax nop\else yes\fi
\stopbuffer

And this gives \quote {\tttf \inlinebuffer}:

\typebuffer

The normal expansion rules apply, so one can use macros and other symbolic
numbers. The only difference in handling dimensions is that we don't support
\type {true} units but these are obsolete in \LUAMETATEX\ anyway.
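
For instance, flags can be given names and then combined, which is the kind of
bit set usage that motivated these primitives. This is a minimal sketch: the
names \type {\MyFlagA} and \type {\MyFlagB} are made up for this example, and it
assumes that the scanner accepts \type {\chardef} constants just like the normal
integer scanner does.

\starttyping
\chardef\MyFlagA = 1
\chardef\MyFlagB = 4

\scratchcounter = \numexpression \MyFlagA bor \MyFlagB \relax
\the\scratchcounter % both bits set, so 5
\stoptyping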

In the end I decided to also add an extra conditional so that we can say:

\starttyping
\ifexpression (\scratchcounterone > 5) and (\scratchcountertwo > 5)\relax
    nop
\else
    yes
\fi
\stoptyping

which looks more natural. Actually, nowadays this is an alias because we have
two variants:

\starttyping
\ifnumexpression ... \relax ... \else ... \fi
\ifdimexpression ... \relax ... \else ... \fi
\stoptyping

where the latter is equivalent to

\starttyping
\ifboolean\dimexpression ... \relax ... \else ... \fi
\stoptyping

\stopsection

\startsection[title={Tracing}]

When \type {\tracingexpressions} is set to one or higher the intermediate \quote
{reverse polish notation} stack that is used for the calculation is shown, for
instance:

\starttyping
4:8: {numexpression rpn: 2 5 > 4 5 > and}
\stoptyping

When you want the output on your console, you need to say:

\starttyping
\tracingexpressions 1
\tracingonline 1
\stoptyping

The fact that we process the expression in two phases makes it possible to
provide this kind of tracing.

\stopsection

\startsection[title={Performance}]

The following table shows the results of 100,000 evaluations (per line), so
you'll notice that there is a difference. But keep in mind that the new variant
can do more, so it might pay off when we have cases that otherwise demand
multiple traditional expressions.

\starttabulate[|l|c|]
\NC \type {\dimexpr 4pt*2 + 6pt\relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen \dimexpr 4pt*2 + 6pt\relax} \elapsedtime\fi \NC \NR
\NC \type {\dimexpression 4pt*2 + 6pt\relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen \dimexpression 4pt*2 + 6pt\relax} \elapsedtime\fi \NC \NR
\NC \type {\dimexpression 2*4pt + 6pt\relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen \dimexpression 2*4pt + 6pt\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr 4 * 2 + 6\relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr 4 * 2 + 6\relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression 2 * 4 + 6\relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression 2 * 4 + 6\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr 4*2+6\relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr 4*2+6\relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression 2*4+6\relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression 2*4+6\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr (1+2)*(3+4)\relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr (1+2)*(3+4)\relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression (1+2)*(3+4)\relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression (1+2)*(3+4)\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr (1 + 2) * (3 + 4) \relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr (1 + 2) * (3 + 4) \relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression (1 + 2) * (3 + 4) \relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression (1 + 2) * (3 + 4) \relax} \elapsedtime\fi \NC \NR
\stoptabulate

As usual I'll probably find some way to improve performance a bit but that might
then also concern the traditional one. When we compare them, the new numeric
scanner suffers from more options while the new dimension parser gains on the
units. Also, keep in mind that the \LUAMETATEX\ normal parsers are already
somewhat faster than the ones in \LUATEX. The numbers above are calculated when
this document is rendered, so they may change over time and per run. The two
engines compare as follows (mid 2021):

\starttabulate[|l|c|c|]
\NC                                           \BC \LUATEX \BC \LUAMETATEX \NC \NR
\NC \type {\dimexpr 4pt*2 + 6pt\relax}        \NC 0.073   \NC 0.045       \NC \NR
\NC \type {\numexpr 4 * 2 + 6\relax}          \NC 0.034   \NC 0.028       \NC \NR
\NC \type {\numexpr 4*2+6\relax}              \NC 0.035   \NC 0.032       \NC \NR
\NC \type {\numexpr (1+2)*(3+4)\relax}        \NC 0.050   \NC 0.047       \NC \NR
\NC \type {\numexpr (1 + 2) * (3 + 4) \relax} \NC 0.052   \NC 0.048       \NC \NR
\stoptabulate

Of course tests like these are dubious because often \CPU\ cache will keep the
current code accessible, but who knows.

It will probably take a while before I will use this in the source code because
first I need to make sure that all works as expected and while doing that I might
adapt some of this. But the basic framework is there.

\stopsection

\stopchapter

\stopcomponent