% language=us runpath=texruns:manuals/evenmore
% evenmore-expressions.tex, last modification: 2021-10-28 13:50

% This one accidentally ended up in the older history document followingup,
% but it's now moved here.

\startcomponent evenmore-expressions

\environment evenmore-style

\startchapter[title={Expressions}]

\startsection[title={Introduction}]

Do we need bitwise expressions? Actually the answer is \quotation {no, at least
not until recently}. In \CONTEXT\ \MKII\ and \MKIV\ we just use integer addition
because we only need to enable things, but in \LMTX\ we want to control the
detailed modes that some mechanisms in the engine provide, and in order to avoid
tons of parameters these use bit sets. We manipulate these with the bitwise
macros that actually are efficient \LUA\ function calls. But, as with some other
extensions in \LUAMETATEX, one way to prevent tracing clutter is to have a few
handy primitives. So let's see what we've got.

{\em I haven't checked all operators and combinations yet!}

\stopsection

\startsection[title={Exploration}]

Already early in the \LUAMETATEX\ development (2019) the expression parser was
extended with an integer division operator \type {:} that we actually use in
\LMTX, and soon after that I added basic bitwise operators. These were never
activated but kept as comments, because I didn't want to impact the scanner (even
if we can afford to lose some performance, because the scanner has been
optimized). But in the process of cleaning up \quote {todo} comments in the
source code I eventually arrived at expressions again.

The colon already makes the scanner incompatible, because \type {\numexpr 1+2:}
expects a number (which means that we cannot port back), and more operators only
make porting back less likely. In \CONTEXT\ I nearly always use \type {\relax} as
terminator unless we're sure that lookahead is no issue. \footnote {In the \ETEX\
expression parser, the normal \type {/} rounds the result. Both the \type {*} and
\type {/} operators have a dedicated code path that assures no loss of accuracy.
The \type {:} operator just divides like \LUA's \type {//}, which is an integer
division operator. There are subtle differences between the division variants
which can be noticeable when you go round trip. That is actually the main reason
why this was one of the first things added to \LUAMETATEX, as I wanted to get rid
of a few scaled point rounding issues. The \ETEX\ expression parser is somewhat
complicated because it can deal with a mix of integers, dimensions and even glue,
but it always brings the result back to its main operating model. Because we
adopted some of these \ETEX\ extensions rather early in \CONTEXT, lookahead
pitfalls are taken care of already.}
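
To make the footnote a bit more concrete, here is a small sketch of how the two
division operators differ. It assumes positive operands only (for negative
values rounding toward zero and flooring can differ, and that case is not
checked here) and assumes that \type {:} has the same precedence as \type {/}.
The results in the comments are worked out by hand from the rules above, not
copied from a run:

\starttyping
% e-TeX division rounds: 7/2 = 3.5 becomes 4
\scratchcounter = \numexpr 7/2\relax   % 4
% the colon divides like Lua's // and drops the fraction
\scratchcounter = \numexpr 7:2\relax   % 3
% a round trip: (7/2)*2 versus (7:2)*2
\scratchcounter = \numexpr 7/2*2\relax % 8
\scratchcounter = \numexpr 7:2*2\relax % 6
\stoptyping

Neither round trip gives back the original 7, and in this case the errors even
go in opposite directions, which is the kind of drift that matters when scaled
points are converted back and forth.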

When going over the code in 2021, mostly because I wanted to get rid of some
commented experiments, I decided that the extension should not go into the
normal scanner but that a dedicated, simple and integer|-|only scanner made more
sense, so during a rainy summer weekend I started playing with that. It
eventually became a bit more than initially intended, although the amount of code
is rather minimal. The performance was about twice that of the already available
bitwise macros, but operator precedence was not provided (apart from the
multiplication and division operators). The final implementation was different:
not that much faster on simple bitwise operations, but able to do more complex
things in one go. Performance was not a real reason to provide this anyway
because we're talking microseconds; it's more about less code and better
readability.

The initial primitive command was \type {\bitexpr} and it supported nesting with
parentheses as the other expressions do. Because there are many operators, also
verbose ones, a non|-|optional \type {\relax} token finishes parsing. But soon we
moved on to two dedicated primitives.

\stopsection

\startsection[title={Operators}]

The set of operators that we have to support is the following. Most have
alternatives so that we can get around catcode issues.

\starttabulate[||cT|cT|]
\BC operator  \BC symbols              \BC verbose \NC \NR
\BC add       \NC +                    \NC        \NC \NR
\BC subtract  \NC -                    \NC        \NC \NR
\BC multiply  \NC *                    \NC        \NC \NR
\BC divide    \NC / :                  \NC        \NC \NR
\BC mod       \NC \letterpercent       \NC mod    \NC \NR
\BC band      \NC &                    \NC band   \NC \NR
\BC bxor      \NC ^                    \NC bxor   \NC \NR
\BC bor       \NC \letterbar \space v  \NC bor    \NC \NR
\BC and       \NC &&                   \NC and    \NC \NR
\BC or        \NC \letterbar\letterbar \NC or     \NC \NR
\BC setbit    \NC <undecided>          \NC bset   \NC \NR
\BC resetbit  \NC <undecided>          \NC breset \NC \NR
\BC left      \NC <<                   \NC        \NC \NR
\BC right     \NC >>                   \NC        \NC \NR
\BC less      \NC <                    \NC        \NC \NR
\BC lessequal \NC <=                   \NC        \NC \NR
\BC equal     \NC = ==                 \NC        \NC \NR
\BC moreequal \NC >=                   \NC        \NC \NR
\BC more      \NC >                    \NC        \NC \NR
\BC unequal   \NC <> != \lettertilde = \NC        \NC \NR
\BC not       \NC ! \lettertilde       \NC not    \NC \NR
\stoptabulate

I considered using \type {++} and \type {--} as the \type {bset} and \type
{breset} shortcuts, but that leads to issues because in \TEX\ \type {-+-++--10}
is a valid number and one never knows what sequence (without spaces) gets fed
into an expression.
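
A tiny illustration of that sign problem. The outcomes are reasoned from the
usual \TEX\ rule that any mix of plus and minus signs may precede a number, with
an even number of minus signs giving a positive value:

\starttyping
% four minus signs cancel out, so this assigns 10
\scratchcounter = -+-++--10\relax
% already valid: 5 plus (+3), so this assigns 8
\scratchcounter = \numexpr 5++3\relax
% if ++ meant bset, the same characters would suddenly
% mean something else, hence no such shortcut
\stoptyping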

Originally I had added some \UNICODE\ characters, but for some reason support of
logical operators is suboptimal, so I removed that feature. Because these special
characters are multi|-|byte \UTF\ sequences, they are not that much better than
verbose words anyway.

% 0x00AC  !    ¬              lua: not
% 0x00D7  *    ×
% 0x00F7  /    ÷
% 0x2227  &&   ∧ c: and       lua: and
% 0x2228  ||   ∨ c: or        lua: or
% 0x2229  &    ∩ c: bitand    lua: band
% 0x222A  |    ∪ c: bitor     lua: bor
%         ^      c: bitxor    lua: bxor
% 0x2260  !=   ≠
% 0x2261  ==   ≡
% 0x2264  <=   ≤
% 0x2265  >=   ≥
% 0x22BB  xor  ⊻
% 0x22BC  nand ⊼
% 0x22BD  nor  ⊽
% 0x22C0  and  ⋀ n-ary logical and
% 0x22C1  or   ⋁ n-ary logical or
% 0x2AA1  <<   ⪡
% 0x2AA2  >>   ⪢

\stopsection

\startsection[title={Integers and dimensions}]

When I was playing a bit with this feature, I wondered if we could mix in some
dimensions. It was actually not that hard to add this: only explicit (verbose)
dimensions had to be intercepted because dimen registers and such are seen as
integers by the integer scanner. Once we were able to handle that, a next step was
to make sure that \typ {2 * 10pt} was permitted, something that the \ETEX\ \type
{\dimexpr} primitive can't handle. So, a variant of the dimen parser had to be
used that makes the unit optional: \type {\dimexpression} and \type
{\numexpression} were born.
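
The difference in what the two dimension parsers accept can be sketched as
follows. The commented line is the one we expect \type {\dimexpr} to complain
about, because its leading \type {2} is scanned as a dimension that lacks a
unit; the other results are simple arithmetic:

\starttyping
\scratchdimen = \dimexpr       10pt * 2\relax % 20pt
% \scratchdimen = \dimexpr     2 * 10pt\relax % error: missing unit
\scratchdimen = \dimexpression 10pt * 2\relax % 20pt
\scratchdimen = \dimexpression 2 * 10pt\relax % 20pt, unit optional
\stoptyping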

The resulting parsers worked quite well but were about twice as slow as the
normal expression scanners, which is no surprise because they do more. For
instance, they are case insensitive and need to handle letter and other (and in a
few cases alignment and superscript) catcodes too. However, with a slightly tuned
integer parser (also possible because the sentinel \type {\relax} makes parsing
more predictable) and a dedicated unit scanner, in the end both the integer and
dimension parser were performing well. It's not like we run them millions of
times in a document.

\startbuffer
\scratchcounter = \numexpression
    "00000 bor "00001 bor "00020 bor "00400 bor "08000 bor "F0000
\relax
\stopbuffer

Here is an example that results in {0x\inlinebuffer\uchexnumber\scratchcounter}:

\typebuffer

\startbuffer
\scratchcounter = \numexpression
    "FFFFF bxor "10101
\relax
\stopbuffer

And this gives {0x\inlinebuffer\uchexnumber\scratchcounter}:

\typebuffer

We could give numerous examples, but you get the picture. In the table above you
can see that some operators have equivalents. The reason for this is that a macro
package can change catcodes and some characters have special meanings. So, the
scanner is rather tolerant.
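
As a sketch of that tolerance, the lines below use a symbol and its verbose
equivalent side by side, together with a few other operators from the table. The
values in the comments are plain arithmetic worked out by hand (\type {"F0} and
\type {"3C} only share the bits \type {"30}), and we assume here that \type {&}
arrives with a catcode that the scanner accepts. For modulo the word \type {mod}
is the practical spelling in a source file, because a percent sign starts a
comment in \TEX:

\starttyping
\scratchcounter = \numexpression "F0 &    "3C \relax % "30 = 48
\scratchcounter = \numexpression "F0 band "3C \relax % "30 = 48
\scratchcounter = \numexpression 1 << 4       \relax % 16
\scratchcounter = \numexpression "FF >> 4     \relax % "0F = 15
\scratchcounter = \numexpression 17 mod 5     \relax % 2
\stoptyping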

\startbuffer
\scratchcounterone = 10
\scratchcountertwo = 20
\ifcase \numexpression
    (\scratchcounterone > 5) && (\scratchcountertwo > 5)
\relax yes\else nop\fi
%
\space
%
\scratchcounterone = 2
\scratchcountertwo = 4
\ifcase \numexpression
    (\scratchcounterone > 5) and (\scratchcountertwo > 5)
\relax nop\else yes\fi
\stopbuffer

And this gives \quote {\tttf \inlinebuffer}:

\typebuffer

The normal expansion rules apply, so one can use macros and other symbolic
numbers. The only difference in handling dimensions is that we don't support
\type {true} units but these are obsolete in \LUAMETATEX\ anyway.
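
Because ordinary expansion applies, macros can stand in for numbers and
dimensions. In this sketch \type {\MyThreshold} and \type {\MyWidth} are
hypothetical macros made up for the occasion, and the results follow from the
usual precedence of multiplication over addition:

\starttyping
\def\MyThreshold{5} % hypothetical macro holding a number
\def\MyWidth{4pt}   % hypothetical macro holding a dimension
\scratchcounter = \numexpression \MyThreshold * 2 + 1\relax % 11
\scratchdimen   = \dimexpression 3 * \MyWidth\relax         % 12pt
\stoptyping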

In the end I decided to also add an extra conditional so that we can say:

\starttyping
\ifexpression (\scratchcounterone > 5) and (\scratchcountertwo > 5)\relax
    nop
\else
    yes
\fi
\stoptyping

which looks more natural. Nowadays this is actually an alias, because we have two
variants:

\starttyping
\ifnumexpression ... \relax ... \else ... \fi
\ifdimexpression ... \relax ... \else ... \fi
\stoptyping

where the latter is equivalent to

\starttyping
\ifboolean\dimexpression ... \relax ... \else ... \fi
\stoptyping
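
A small sketch of both conditionals, with register values chosen so that the
first branch is taken in each case. The parentheses around the product are there
only to avoid relying on the relative precedence of comparison and
multiplication, which is not verified here:

\starttyping
\scratchcounter = 6
\scratchdimen   = 12pt
\ifnumexpression \scratchcounter > 5 \relax
    large  % taken: 6 > 5
\else
    small
\fi
\ifdimexpression \scratchdimen >= (2 * 6pt) \relax
    wide   % taken: 12pt >= 12pt
\else
    narrow
\fi
\stoptyping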

\stopsection

\startsection[title={Tracing}]

When \type {\tracingexpressions} is set to one or higher, the intermediate \quote
{reverse polish notation} stack that is used for the calculation is shown, for
instance:

\starttyping
4:8: {numexpression rpn: 2 5 > 4 5 > and}
\stoptyping

When you want the output on your console, you need to say:

\starttyping
\tracingexpressions 1
\tracingonline      1
\stoptyping

The fact that we process the expression in two phases, first converting it into
this postfix form and then evaluating it, makes it possible to provide this kind
of tracing.
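
When only one expression is of interest, the tracing can be limited to a group
so that the log doesn't fill up. This sketch assumes that \type
{\tracingexpressions}, like \type {\tracingonline}, is an ordinary parameter that
is restored at the end of the group. The postfix order in the comment is derived
by hand by analogy with the trace above, not copied from a log, and the actual
line may be formatted differently:

\starttyping
\begingroup
    \tracingexpressions 1
    \tracingonline      1
    \message{\the\numexpression (1+2)*(3+4)\relax} % shows 21
    % expected rpn order: 1 2 + 3 4 + *
\endgroup
\stoptyping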

\stopsection

\startsection[title={Performance}]

The following table shows the results of 100.000 evaluations (per line), so you'll
notice that there is a difference. But keep in mind that the new variant can do
more, so it might pay off when we have cases that otherwise demand multiple
traditional expressions.

\starttabulate[|l|c|]
\NC \type {\dimexpr       4pt*2 + 6pt\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen  \dimexpr       4pt*2 + 6pt\relax} \elapsedtime\fi \NC \NR
\NC \type {\dimexpression 4pt*2 + 6pt\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen  \dimexpression 4pt*2 + 6pt\relax} \elapsedtime\fi \NC \NR
\NC \type {\dimexpression 2*4pt + 6pt\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen  \dimexpression 2*4pt + 6pt\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       4 * 2 + 6\relax}          \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       4 * 2 + 6\relax}   \elapsedtime\fi \NC \NR
\NC \type {\numexpression 2 * 4 + 6\relax}          \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression 2 * 4 + 6\relax}   \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       4*2+6\relax}              \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       4*2+6\relax}       \elapsedtime\fi \NC \NR
\NC \type {\numexpression 2*4+6\relax}              \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression 2*4+6\relax}       \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       (1+2)*(3+4)\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       (1+2)*(3+4)\relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression (1+2)*(3+4)\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression (1+2)*(3+4)\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       (1 + 2) * (3 + 4) \relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       (1 + 2) * (3 + 4) \relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression (1 + 2) * (3 + 4) \relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression (1 + 2) * (3 + 4) \relax} \elapsedtime\fi \NC \NR
\stoptabulate

As usual I'll probably find some way to improve performance a bit, but that might
then also concern the traditional one. When we compare them, the new numeric
scanner suffers from having more options, while the new dimension parser gains on
the units. Also, keep in mind that the normal \LUAMETATEX\ parsers are already
somewhat faster than the ones in \LUATEX. The numbers above are calculated when
this document is rendered, so they may change over time and per run. The two
engines compare as follows (mid 2021):

\starttabulate[|l|c|c|]
\NC                                           \BC \LUATEX \BC \LUAMETATEX \NC \NR
\NC \type {\dimexpr 4pt*2 + 6pt\relax}        \NC 0.073   \NC 0.045 \NC \NR
\NC \type {\numexpr 4 * 2 + 6\relax}          \NC 0.034   \NC 0.028 \NC \NR
\NC \type {\numexpr 4*2+6\relax}              \NC 0.035   \NC 0.032 \NC \NR
\NC \type {\numexpr (1+2)*(3+4)\relax}        \NC 0.050   \NC 0.047 \NC \NR
\NC \type {\numexpr (1 + 2) * (3 + 4) \relax} \NC 0.052   \NC 0.048 \NC \NR
\stoptabulate

Of course tests like these are dubious because often the \CPU\ cache will keep the
current code accessible, but who knows.

It will probably take a while before I use this in the source code, because first
I need to make sure that all works as expected, and while doing that I might adapt
some of this. But the basic framework is there.

\stopsection

% \start
% \nologbuffering
% \scratchdimen    100pt
% \scratchdimenone 65.536pt
% \scratchdimentwo 65.536bp

% \tracingonline1
% \tracingexpressions1
% \scratchcounter\bitexpr \scratchdimen / 2   \relax\the\scratchcounter\par

% \scratchcounter\numexpression \scratchdimen / 2sp \relax \the\scratchcounter\par
% \scratchcounter\numexpression \scratchdimen / 1pt \relax \the\scratchcounter\par
% \scratchcounter\numexpression \scratchdimenone / 65.536pt \relax \the\scratchcounter\par
% \scratchcounter\numexpression \scratchdimentwo / 2 \relax \the\scratchcounter\par

% \scratchcounter\numexpression \scratchcounterone / 4 \relax \the\scratchcounter\par
% \scratchdimen  \dimexpression \scratchcounterone / 4 \relax \the\scratchdimen\par

% \scratchdimen  \dimexpression 2 * 4pt \relax \the\scratchdimen\par

% \tracingexpressions0
% \tracingonline0

% \startTEXpage
% \tracingonline1
% \tracingexpressions1
% \the\dimexpr -10pt\relax\quad
% \the\dimexpr  10pt\relax\quad
% \the\dimexpr  10.12 pt\relax\quad
% \the\dimexpression -10pt\relax\quad
% \the\dimexpression  10pt\relax\quad
% \stopTEXpage

\stopchapter

\stopcomponent