% language=us runpath=texruns:manuals/followingup
% followingup-expressions.tex, last modification: 2021-10-28 13:50

\startcomponent followingup-expressions

\environment followingup-style

\startchapter[title={Expressions}]

\startsection[title={Introduction}]

Do we need bitwise expressions? Actually the answer is \quotation {no, although
not until recently}. In \CONTEXT\ \MKII\ and \MKIV\ we just use integer addition
because we only need to enable things, but in \LMTX\ we want to control the
detailed modes that some mechanisms in the engine provide, and in order to not
have tons of parameters these use bit sets. We manipulate these with the bitwise
macros that actually are efficient \LUA\ function calls. But, as with some other
extensions in \LUAMETATEX, one way to prevent tracing clutter is to have a few
handy primitives. So let's see what we got.
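
As an illustration of why addition is fine for enabling something once but not
for maintaining a bit set, here is a minimal sketch. The flag value is made up
and doesn't refer to an actual engine mode; it only assumes the \type
{\numexpression} primitive and the \type {bor} operator that are introduced
later in this chapter.

\starttyping
% "02 is a hypothetical flag, not a real engine mode
\scratchcounter = "02

% adding the same flag again carries into the next bit: "04
\scratchcounterone = \numexpr \scratchcounter + "02 \relax

% a bitwise or leaves a bit that is already set alone: "02
\scratchcountertwo = \numexpression \scratchcounter bor "02 \relax
\stoptyping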

{\em I haven't checked all operators and combinations yet!}

\stopsection

\startsection[title={Exploration}]

Already early in the \LUAMETATEX\ development (2019) the expression parser was
extended with an integer division operator \type {:} that we actually use in
\LMTX, and soon after that I added basic bitwise operators. These were never
activated but kept as comments because I didn't want to impact the scanner (even
if we can afford to lose some performance because the scanner has been
optimized). But in the process of cleaning up \quote {todo} comments in the
source code I eventually arrived at expressions again.

The colon already makes the scanner incompatible because \type {\numexpr 1+2:}
expects a number (which means that we cannot port back) and more operators only
make porting back less likely. In \CONTEXT\ I nearly always use \type {\relax} as
terminator unless we're sure that lookahead is no issue. \footnote {In the \ETEX\
expression parser, the normal \type {/} rounds the result. Both the \type {*} and
\type {/} operators have a dedicated code path that assures no loss of accuracy.
The \type {:} operator just divides like \LUA's \type {//} which is an integer
division operator. There are subtle differences between the division variants
which can be noticeable when you go round trip. That is actually the main reason
why this was one of the first things added to \LUAMETATEX\ as I wanted to get rid
of a few scaled point rounding issues. The \ETEX\ expression parser is
somewhat complicated because it can deal with a mix of integers, dimensions and
even glue, but always brings the result back to its main operating model. Because
we adopted some of these \ETEX\ features rather early in \CONTEXT, lookahead
pitfalls are taken care of already.}
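
A small sketch of the difference, assuming \LUAMETATEX\ because the \type {:}
operator is not available in other engines:

\starttyping
\the\numexpr 7 / 2 \relax       % rounded division: 4
\the\numexpr 7 : 2 \relax       % integer division: 3

\the\numexpr (7 / 2) * 2 \relax % round trip gives 8
\the\numexpr (7 : 2) * 2 \relax % round trip gives 6
\stoptyping

Neither round trip brings back the original 7, but the two variants miss it in
opposite directions.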

When going over the code in 2021, mostly because I wanted to get rid of some
commented experiments, I decided that the extension should not go into the
normal scanner but that a dedicated, simple and integer only scanner made more
sense, so during a rainy summer weekend I started playing with that. It eventually
became a bit more than initially intended, although the amount of code is rather
minimal. The performance was about twice that of the already available bitwise
macros but operator precedence was not provided (apart from the multiplication
and division operators). The final implementation was different: not that much
faster on simple bitwise operations, but it could do more complex things in one
go. Performance was not a real reason to provide this anyway because we're
talking microseconds; it's more about less code and better readability.

The initial primitive command was \type {\bitexpr} and it supported nesting with
parentheses as the other expressions do. Because there are many operators, also
verbose ones, the non|-|optional \type {\relax} token finishes parsing. But
soon we moved on to two dedicated primitives.

\stopsection

\startsection[title={Operators}]

The set of operators that we have to support is the following. Most have
alternatives so that we can get around catcode issues.

\starttabulate[||cT|cT|]
\BC add       \NC +                    \NC        \NC \NR
\BC subtract  \NC -                    \NC        \NC \NR
\BC multiply  \NC *                    \NC        \NC \NR
\BC divide    \NC / :                  \NC        \NC \NR
\BC mod       \NC \letterpercent       \NC mod    \NC \NR
\BC band      \NC &                    \NC band   \NC \NR
\BC bxor      \NC ^                    \NC bxor   \NC \NR
\BC bor       \NC \letterbar \space v  \NC bor    \NC \NR
\BC and       \NC &&                   \NC and    \NC \NR
\BC or        \NC \letterbar\letterbar \NC or     \NC \NR
\BC setbit    \NC <undecided>          \NC bset   \NC \NR
\BC resetbit  \NC <undecided>          \NC breset \NC \NR
\BC left      \NC <<                   \NC        \NC \NR
\BC right     \NC >>                   \NC        \NC \NR
\BC less      \NC <                    \NC        \NC \NR
\BC lessequal \NC <=                   \NC        \NC \NR
\BC equal     \NC = ==                 \NC        \NC \NR
\BC moreequal \NC >=                   \NC        \NC \NR
\BC more      \NC >                    \NC        \NC \NR
\BC unequal   \NC <> != \lettertilde = \NC        \NC \NR
\BC not       \NC ! \lettertilde       \NC not    \NC \NR
\stoptabulate
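
To give an idea of how the verbose variants read, here are a few sketches. They
assume the \type {\numexpression} primitive that is introduced in the next
section; the values in the comments follow from plain integer arithmetic.

\starttyping
\scratchcounter = \numexpression "F0 band "3C \relax % "30
\scratchcounter = \numexpression "F0 bor  "3C \relax % "FC
\scratchcounter = \numexpression "F0 bxor "3C \relax % "CC
\scratchcounter = \numexpression 1 << 4       \relax % 16
\scratchcounter = \numexpression 17 mod 5     \relax % 2
\stoptyping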

I considered using \type {++} and \type {--} as the \type {bset} and \type
{breset} shortcuts but that leads to issues because in \TEX\ \type {-+-++--10} is
a valid number and one never knows what sequence (without spaces) gets fed into
an expression.
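
For instance, the following assignment is perfectly valid in any \TEX\ engine;
the four minus signs cancel each other, so the counter ends up as 10:

\starttyping
\scratchcounter = -+-++--10 \relax
\stoptyping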

Originally I'd added some \UNICODE\ characters but for some reason support of
logical operators is suboptimal so I removed that feature. Because these special
characters are multi|-|byte \UTF\ sequences they are not that much better than
verbose words anyway.

% 0x00AC  !    ¬              lua: not
% 0x00D7  *    ×
% 0x00F7  /    ÷
% 0x2227  &&   ∧ c: and       lua: and
% 0x2228  ||   ∨ c: or        lua: or
% 0x2229  &    ∩ c: bitand    lua: band
% 0x222A  |    ∪ c: bitor     lua: bor
%         ^      c: bitxor    lua: bxor
% 0x2260  !=   ≠
% 0x2261  ==   ≡
% 0x2264  <=   ≤
% 0x2265  >=   ≥
% 0x22BB  xor  ⊻
% 0x22BC  nand ⊼
% 0x22BD  nor  ⊽
% 0x22C0  and  ⋀ n-ary logical and
% 0x22C1  or   ⋁ n-ary logical or
% 0x2AA1  <<   ⪡
% 0x2AA2  >>   ⪢

\stopsection

\startsection[title={Integers and dimensions}]

When I was playing a bit with this feature, I wondered if we could mix in some
dimensions. It was actually not that hard to add this: only explicit (verbose)
dimensions had to be intercepted because dimen registers and such are seen as
integers by the integer scanner. Once we were able to handle that, a next step was
to make sure that \typ {2 * 10pt} was permitted, something that the \ETEX\ \type
{\dimexpr} primitive can't handle. So, a variant of the dimen parser had to be
used that makes the unit optional: \type {\dimexpression} and \type
{\numexpression} were born.
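
A minimal sketch of the difference in operand order follows. The scratch
registers are the usual \CONTEXT\ ones and all three assignments should result
in 20pt.

\starttyping
% \dimexpr only accepts the factor after the dimension
\scratchdimen    = \dimexpr       10pt * 2 \relax

% \dimexpression accepts both orders
\scratchdimenone = \dimexpression 10pt * 2 \relax
\scratchdimentwo = \dimexpression 2 * 10pt \relax
\stoptyping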

The resulting parsers worked quite well but were about twice as slow as the
normal expression scanners, which is no surprise because they do more. For
instance we are case insensitive and need to handle letter and other (and in a
few cases alignment and superscript) catcodes too. However, with a slightly tuned
integer parser, also possible because the sentinel \type {\relax} makes parsing
more predictable, and a dedicated unit scanner, in the end both the integer and
dimension parser were performing well. It's not like we run them millions of
times in a document.

\startbuffer
\scratchcounter = \numexpression
    "00000 bor "00001 bor "00020 bor "00400 bor "08000 bor "F0000
\relax
\stopbuffer

Here is an example that results in {0x\inlinebuffer\uchexnumber\scratchcounter}:

\typebuffer

\startbuffer
\scratchcounter = \numexpression
    "FFFFF bxor "10101
\relax
\stopbuffer

And this gives {0x\inlinebuffer\uchexnumber\scratchcounter}:

\typebuffer

We can give numerous examples but you get the picture. In the above table you can
see that some operators have equivalents. The reason for this is that a macro
package can change catcodes and some characters have special meanings. So, the
scanner is rather tolerant.

\startbuffer
\scratchcounterone = 10
\scratchcountertwo = 20
\ifcase \numexpression
    (\scratchcounterone > 5) && (\scratchcountertwo > 5)
\relax nop\else yes\fi
%
\space
%
\scratchcounterone = 2
\scratchcountertwo = 4
\ifcase \numexpression
    (\scratchcounterone > 5) and (\scratchcountertwo > 5)
\relax nop\else yes\fi
\stopbuffer

Keep in mind that \type {\ifcase} takes its first branch when the expression is
zero, so the false case comes first. This gives \quote {\tttf \inlinebuffer}:

\typebuffer

The normal expansion rules apply, so one can use macros and other symbolic
numbers. The only difference in handling dimensions is that we don't support
\type {true} units but these are obsolete in \LUAMETATEX\ anyway.

In the end I decided to also add an extra conditional so that we can say:

\starttyping
\ifexpression (\scratchcounterone > 5) and (\scratchcountertwo > 5)\relax
    yes
\else
    nop
\fi
\stoptyping

which looks more natural. Actually, this is nowadays an alias because we have two
variants:

\starttyping
\ifnumexpression ... \relax ... \else ... \fi
\ifdimexpression ... \relax ... \else ... \fi
\stoptyping

where the latter is equivalent to

\starttyping
\ifboolean\dimexpression ... \relax ... \else ... \fi
\stoptyping
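
As a sketch of what this buys us, compare the traditional nested test with a
single expression. The \type {\ifdimexpression} lines assume that comparisons
work on dimensions the same way as on integers, and \type {\scratchdimen} is
just an example register.

\starttyping
% traditional: two nested tests that cannot simply share one \else
\ifnum\scratchcounterone>5 \ifnum\scratchcountertwo>5 yes\fi\fi

% one expression with one \else
\ifnumexpression (\scratchcounterone > 5) and (\scratchcountertwo > 5)\relax
    yes
\else
    nop
\fi

% assumed to work likewise for dimensions
\ifdimexpression \scratchdimen > 10pt \relax
    wide
\else
    narrow
\fi
\stoptyping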

\stopsection

\startsection[title={Tracing}]

When \type {\tracingexpressions} is set to one or higher the intermediate \quote
{reverse polish notation} stack that is used for the calculation is shown, for
instance:

\starttyping
4:8: {numexpression rpn: 2 5 > 4 5 > and}
\stoptyping

When you want the output on your console, you need to say:

\starttyping
\tracingexpressions 1
\tracingonline      1
\stoptyping

The fact that we process the expression in two phases makes it possible to provide this
kind of tracing.
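
A sketch of how one could trace a single expression follows; the grouping is
only there to keep the tracing (and the assignment) local. With the counter
values used here the reported stack should be the one shown above: the operands
come first and each operator follows the values that it applies to.

\starttyping
\scratchcounterone = 2
\scratchcountertwo = 4

\begingroup
    \tracingexpressions 1
    \tracingonline      1
    \scratchcounter = \numexpression
        (\scratchcounterone > 5) and (\scratchcountertwo > 5)
    \relax
\endgroup
\stoptyping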

\stopsection

\startsection[title={Performance}]

The following table shows the results of 100,000 evaluations (per line) so you'll
notice that there is a difference. But keep in mind that the new variant can do
more, so it might pay off when we have cases that otherwise demand multiple
traditional expressions.

\starttabulate[|l|c|]
\NC \type {\dimexpr       4pt*2 + 6pt\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen  \dimexpr       4pt*2 + 6pt\relax} \elapsedtime\fi \NC \NR
\NC \type {\dimexpression 4pt*2 + 6pt\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen  \dimexpression 4pt*2 + 6pt\relax} \elapsedtime\fi \NC \NR
\NC \type {\dimexpression 2*4pt + 6pt\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen  \dimexpression 2*4pt + 6pt\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       4 * 2 + 6\relax}          \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       4 * 2 + 6\relax}   \elapsedtime\fi \NC \NR
\NC \type {\numexpression 2 * 4 + 6\relax}          \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression 2 * 4 + 6\relax}   \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       4*2+6\relax}              \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       4*2+6\relax}       \elapsedtime\fi \NC \NR
\NC \type {\numexpression 2*4+6\relax}              \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression 2*4+6\relax}       \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       (1+2)*(3+4)\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       (1+2)*(3+4)\relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression (1+2)*(3+4)\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression (1+2)*(3+4)\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       (1 + 2) * (3 + 4) \relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       (1 + 2) * (3 + 4) \relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression (1 + 2) * (3 + 4) \relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression (1 + 2) * (3 + 4) \relax} \elapsedtime\fi \NC \NR
\stoptabulate
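
The timings come from the \CONTEXT\ helper \type {\testfeatureonce}, which runs
its second argument the given number of times, after which \type {\elapsedtime}
reports the result. The \type {\iftrialtypesetting} test presumably prevents
the loop from being run during the trial passes of the table. A single
measurement outside a table can be sketched as:

\starttyping
\testfeatureonce {100000} {\scratchcounter \numexpression 2 * 4 + 6 \relax}
\elapsedtime
\stoptyping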

As usual I'll probably find some way to improve performance a bit but that might
then also concern the traditional one. When we compare them, the new numeric
scanner suffers from having more options while the new dimension parser gains on
the units. Also, keep in mind that the \LUAMETATEX\ normal parsers are already
somewhat faster than the ones in \LUATEX. The numbers above are calculated when
this document is rendered, so they may change over time and per run. The two
engines compare as follows (mid 2021):

\starttabulate[|l|c|c|]
\NC                                           \BC \LUATEX \BC \LUAMETATEX \NC \NR
\NC \type {\dimexpr 4pt*2 + 6pt\relax}        \NC 0.073   \NC 0.045 \NC \NR
\NC \type {\numexpr 4 * 2 + 6\relax}          \NC 0.034   \NC 0.028 \NC \NR
\NC \type {\numexpr 4*2+6\relax}              \NC 0.035   \NC 0.032 \NC \NR
\NC \type {\numexpr (1+2)*(3+4)\relax}        \NC 0.050   \NC 0.047 \NC \NR
\NC \type {\numexpr (1 + 2) * (3 + 4) \relax} \NC 0.052   \NC 0.048 \NC \NR
\stoptabulate

Of course tests like these are dubious because often \CPU\ cache will keep the
current code accessible, but who knows.

It will probably take a while before I will use this in the source code because
first I need to make sure that all works as expected and while doing that I might
adapt some of this. But the basic framework is there.

\stopsection

% \start
% \nologbuffering
% \scratchdimen    100pt
% \scratchdimenone 65.536pt
% \scratchdimentwo 65.536bp

% \tracingonline1
% \tracingexpressions1
% \scratchcounter\bitexpr \scratchdimen / 2   \relax\the\scratchcounter\par

% \scratchcounter\numexpression \scratchdimen / 2sp \relax \the\scratchcounter\par
% \scratchcounter\numexpression \scratchdimen / 1pt \relax \the\scratchcounter\par
% \scratchcounter\numexpression \scratchdimenone / 65.536pt \relax \the\scratchcounter\par
% \scratchcounter\numexpression \scratchdimentwo / 2 \relax \the\scratchcounter\par

% \scratchcounter\numexpression \scratchcounterone / 4 \relax \the\scratchcounter\par
% \scratchdimen  \dimexpression \scratchcounterone / 4 \relax \the\scratchdimen\par

% \scratchdimen  \dimexpression 2 * 4pt \relax \the\scratchdimen\par

% \tracingexpressions0
% \tracingonline0

% \startTEXpage
% \tracingonline1
% \tracingexpressions1
% \the\dimexpr -10pt\relax\quad
% \the\dimexpr  10pt\relax\quad
% \the\dimexpr  10.12 pt\relax\quad
% \the\dimexpression -10pt\relax\quad
% \the\dimexpression  10pt\relax\quad
% \stopTEXpage

\stopchapter

\stopcomponent