musings-unicode.tex /size: 85 Kb    last modification: 2024-01-16 10:21
% language=us runpath=texruns:manuals/math

\def\unichar#1#2{#1 (U+#2: \char"#2)}

\def\APL{\ss apl}

% \useMPlibrary[dum]

\startcomponent musings-unicode

\environment musings-style

% \usemodule[mathfun]

\startchapter[title=Unicode]

\startsection[title=Introduction]
18
When working on a \TEX\ macro package for decades one can hardly avoid dealing
with math; after all \TEX\ is pretty much about math. When this wonderful
typesetting infrastructure was written it was all about quality and how to make
your documents look nice. And for sure, Don Knuth's documents look nice, also
because he pays a lot of attention to the \quotation {fine points of math
typesetting}.
25
The constraints of those times (like hardware, compilers, fonts, and for sure also
time) made \TEX\ into what it is: eight bit character sets, eight bit fonts,
eight bit hyphenation patterns, efficient memory usage and therefore carrying
around as little as possible. It all makes sense. But one needs to pay attention.
\footnote {And that is what Mikael Sundqvist and I have been doing a lot since we
started upgrading math in \CONTEXT\ in combination with enhancing the math engine
in \LUAMETATEX. The story here is a byproduct of our explorations and very much a
combined effort.}
34
35Math typesetting is actually a sort of separated process in the engine:
36unprocessed lists go in and after some juggling a list of assembled boxes,
37glyphs, glues and penalties come out. I will not go into detail about that and
38only mention that in \LUAMETATEX\ we extended all this to be a bit more flexible
39and controllable, something that has been driven by the fact that we need to
40support \UNICODE\ fonts. This is all part of a related effort to move from eight
41bit \quote {everything} to \UNICODE\ \quote {everywhere}.
42
Now, one can say a lot about \UNICODE\ but the main advantage is that it tries to
cover \quote {all} characters ever encountered, including scripts (used in
languages) that are long gone, as well as these little pictures that people like
to see on the web: emojis. One can safely say that \UNICODE\ simplifies mixing
languages and scripts, and thereby makes \TEX\ macro packages less complex. On
the other hand, \UNICODE\ (or more precisely, the related wide) fonts make all
kinds of features possible and thereby add a complication.
50
So, how about math? When Don Knuth gave us \TEX\ he also gave us fonts and there
are plenty of symbols in these fonts. But, as mathematicians seem to love variations
on symbols, soon more fonts arrived, most notably those from the \AMS\ that
also added some more alphabets: mathematicians also love to render the shapes of
letters differently. In order to access these glyphs names were invented that
also sometimes suggested that there was some order in the matter. And, for some
reason these names got aliases and soon we had a huge list of often obscure and
inconsistent macro names. It didn't take long for a little mess and confusion to
creep in.
60
61It has been said that the verbose \TEX\ math \ASCII\ input format is also a way
62for mathematicians to communicate, just because many use the same tool to render
63the formulas. Of course that gets obscured when one starts to add additional
64macros. It gets even more tricky once we start talking \quote {standard} as in
65\quotation {\LATEX\ is the standard}. That has for instance resulted in browsers
66interpreting \TEX\ like input without using \TEX\ (so how about expansion?). It
67has also sort of put \TEX\ into the range of possible word processing systems,
68which in turn leads to these \MSWORD\ versus Google docs versus \LATEX\ debates
69that can get rather nasty and unrealistic when it comes to discussing usage and
quality. Interestingly, \MSWORD\ now has reasonable math, to some extent
modelled after \TEX. It has some verbose \TEX\ like (but constrained) input and
probably serves those well who only occasionally have to inject some
math. There were also attempts by the people at \MICROSOFT\ to normalize the
input but we leave that aside now.
75
76However, because we now do have all these symbols and because source code editors
77make them accessible and show them there is a good chance that users will inject
78them, if only by cut and paste, so we do have to deal with that. This
79automatically puts us in the position that we need to deal with different
80meanings for the same symbol, which in turn might demand different spacing,
81penalties and such. In the end it is users that drive all this, not publishers;
82they don't really care and out|-|source typesetting anyway. We're not aware of
83any research and development being done and I suppose we would have noticed
84because after all we're involved in developing \LUATEX. It is one of the engines
85that does \OPENTYPE\ and \UNICODE\ math and no publisher or supplier ever took
86serious interest in it. From our perspective what users do is visible, everything
87else is hidden behind corporate curtains. And this is why nowadays we only need
88to care about users (mainly authors).
89
90Back to typesetting. For a long time all went well: one could typeset documents
91that looked good. Okay, not all looked good because not everyone paid attention to
92details, and the more the web evolved the more patching cut'n'paste of bad
93examples made its way into documents, but let's not start talking quality here.
94But then came \UNICODE\ and a while later people started talking about
95accessibility, cutting and pasting and more. In the meantime there had been
96developments like \MATHML\ and \OPENMATH\ that tried to structure and organize
97formulas in a more symbolic way. \footnote {It probably went unnoticed that
98\CONTEXT\ always supported rendering \MATHML, and as such had to deal with all
99the weird aspects (read: way it was used). Although one is not supposed to
100directly edit \MATHML\ we work with authors who are quite happy to do that simply
101because they code the documents in \XML\ because there is a need for high quality
102\PDF\ as well as \HTML\ output and a \CONTEXT\ based workflow can handle the
103\XML\ well. We're talking of large volumes here (mostly for basically free
104school math).}
105
In the meantime the \TEX\ community had lost the edge on fonts, and \OPENTYPE\
math was invented by \MICROSOFT\ and implemented in \MSWORD\ before a substantial
number of \TEX\ users understood what was happening. They had it coming. To a
large extent one can say the same about math in \UNICODE. Where a Greek capital
\quote {A} is seen as different from a Latin capital \quote {A}, even when they
often have the same shape, a math italic variable \quote {h} was made a synonym of
\quote {Planck constant}, as if the letters used in math had no meaning at all.
We'll see that a wide hat is an extensible variant of the zero width combining hat
accent, which makes for curious handling of the initial character. There is more
granularity in some symbols, especially popular symbols like slashes and bars,
than in letters. It is as if the math community didn't care much about how the
letters (variables) were communicated and perceived but was picky about the
slope of slashes. It seems more of a visual world, which might actually be the
reason structured input never really took off. Maybe \TEX ies just love the mix of
characters, commands, spacing directives. Maybe they just love to reposition and
space these glyphs to suit all kinds of curious non|-|standard math rendering.
122
All this makes it pretty hard to communicate meaning, and it is just one of the
examples where the \TEX\ community, as far as it was involved, failed to make a
strong case. Our personal opinion is that no one really cared because in the \TEX\
community it is all about rendering. The fact that we use math to communicate
only gained attention when accessibility became hot and by then it was too late.
Efforts like \OPENMATH\ started ambitiously and in the end basically failed. Coding
in \XML\ using \MATHML\ isn't much better and one always had to adapt to the
latest fashion. Also, once plenty of code shows up bugs become features. Browser
131support came and went and came back. Simplified input using for instance
132\ASCIIMATH\ started indeed simple but quickly became a (somewhat inconsistent)
133mess. What we see here is the same as everything web (and computer languages): we
134can do better, we start some project, then move on, and we end up with half|-|way
135abandoned results. The development cycles are short, results have to be achieved
136fast, there is no time (or interest) for iterating and refactoring. The word
137\quote {standard} and mantra \quote {everyone should use this} are quite popular.
138
So where does that leave us with \TEX ? Well, with a mess. Decades of various
efforts have not brought us a coherent system of organizing symbols and
properties, made us end up with inconsistencies, made users revert to hacks,
didn't make math easily transferable and complicated rendering. Personally we
143find it sort of strange that we spend time on for instance tagging and
144accessibility before we get these math alphabets and shared math specific symbols
145sorted out. If we cannot make good arguments for that (math being a script on its
146own with semantics and such) we waste energy and are pulling a dead horse. What
147puzzles us most is that one would expect mathematicians to be able to come up
148with strong arguments for a structured approach. But maybe it was simply the fact
149that \TEX\ math typesetting was pretty much driven by large commercial publishers
150and those providing services for them: the first category doesn't invest in these
151matters and even less today, and the second category makes money from sorting out
the mess, so why get rid of it. Who knows. For us, it means that any complaint
153about these matters deserves the same answer: the \TEX\ community created this
154mess, so it has to live with it. And the bad thing is: bugs and work|-|arounds
155eventually become features and then one is supposed to conform, even if deep down
156one knows better. It doesn't help that the community is proud of what it can
157render and has built itself a reputation that all is good.
158
159So why this criticism? Why not just abandon \TEX ? The answer is simple: \TEX\ is
160quite okay and cannot be blamed for where we are now. We need to think of
161solutions and in that respect the \CONTEXT\ users are lucky! They have always
162been told not to use this macro package for math because there are other
163standards and because publishers want \LATEX\ (even if they just let the
164manuscripts be recoded). That means that we don't really need to care much about
165the past. Those who use \CONTEXT\ can benefit from the compatibility we have
166anyway but also move forward to more structured and consistent math. It is in
167this perspective that we will discuss some more details next so that eventually
168we can draw some conclusions. The end goal is to have an additional layer of
169grouping math symbols that permits consistent high quality rendering in a mixed
170input environment.
171
172\stopsection
173
174\startsection[title=Molecules]
175
Before we go into details about some characters, we spend some words on the
177rendering. The building blocks of a formula are atoms and internally the term
178nucleus is used for what we have without scripts. The simple sequence \type {1 +
179x} will result in a linked list of three atoms with three nuclei. In \type {x^2}
180the \type {x} is the nucleus. Atoms can have scripts: prescripts, postscripts and
181a prime. The majority of \UNICODE\ math characters become such atoms (nuclei and
182scripts) and they get a class property that determines their spacing, but that is
183not part of the \UNICODE\ specification. From the upcoming sections it will be
184clear that when we classify we don't get that much help from \MATHML\ or even the
185\TEX\ community either.
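
To make the terminology concrete, here is a minimal sketch of a few inputs and
the atoms one can expect from them. The annotations follow the traditional \TEX\
model as described above; they are our reading and not a dump of what the engine
produces, and in \LUAMETATEX\ the assigned classes can differ in detail.

\starttyping
% three atoms, each with just a nucleus: ordinary, binary, ordinary
$ 1 + x $

% one atom: the nucleus x with a superscript
$ x^2 $

% one atom: the nucleus x with a subscript and a superscript
$ x_i^2 $

% one atom: the nucleus f with a prime
$ f' $
\stoptyping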
186
187In addition to these atoms the \LUAMETATEX\ engine (which builds upon \TEX) has
188what we can call molecules. There are several types: fractions, accents, fences,
189radicals. This distinction is to some extent present in \UNICODE: plenty of
190fraction related slashes, all kind of accents, vertical delimiters that can be
191made from snippets and act as fences, and a radical symbol. In \MATHML\ we see
192similar constructs but there in practice quite often operators need to be
193interpreted in a way that can distinguish between atoms and molecules. That is
194partly a side effect of applications that generate \MATHML. And as usual with
195standards pushed upon the world without years of exploration the confusion became
196part of the norm and will stay.
197
In the \TEX\ engine over and under delimiters are implemented on top of radicals
(using the same noad, the wrapper node for yet unprocessed math) but they have
different code paths. Basically we have vertically fenced material and just like
fractions have left and right fences as part of the concept (for binomials) the
radical has a sort of left fence too. You can also wonder why we need accent
noads while we support other delimiters with radicals. This organization mostly
relates to subtypes and classes (and likely some limitations of the past) that
have related spacing properties, but we can think of a generic structure noad and
meaningful subtypes. However, that is not what we get so let's be more precise:
207
{\bf Fractions:} these stack two atoms (or molecules) and separate them by a
visible or phantom rule, or in \LUAMETATEX\ by a delimiter. They can have a left
and right fence, which originates in them also being suitable for binomials. You
may wonder why we don't use regular fences here. One reason we can think of is that
when you fence something, you have an open and close class at the edges while
with a fenced fraction the whole is still a fraction. In \LUAMETATEX\ we can tweak
classes at the edges but in regular \TEX\ there are fewer classes, so there these
constructs become ordinary or inner.
216
217{\bf Accents:} these put something on top of or below an atom (or molecule) and
218are driven by characters. The accent related commands take an integer
219(traditional) or three integers (extended) and it is this expected input that
220drives it. However, they are treated like delimiters. In traditional \TEX\ a
221delimiter is defined by two characters: the direct unscaled one, and when not
222found a second one drives the lookup from wider variants and eventually an
223extensible character. Accents just have the second one, which probably relates to
224the fact that the text ones that would be the starting point make no sense. It is
this \quote {looking} for a single code point that keeps accents from being
merged with the more general radical command space. Another reason is that
accents deal a bit differently with spacing and italic correction so even if we
could merge, it would be more confusing in the end.
229
230{\bf Fences:} these come in pairs with optional middle ones. The reason for
231pairing is that they need to get the same size. That means that before we
232construct them the atom or molecule that they fence has to be analyzed. It also
233makes the result a construct of its own, although in \LUAMETATEX\ we can unpack
234that result so that it can be broken across lines. In practice that was never an
235issue because in a running text unscaled fences are used (just atoms with open
and close classes assigned) but as soon as one goes to multi|-|line display
formulas things become more hairy. The related commands expect delimiters (the
238two part character definitions) but in the meantime are also happy with a single
239one because in the end \OPENTYPE\ math has all in one font.
240
241{\bf Radicals:} originally this only concerned roots but because they are
242basically wrappers we also use them for content that gets a delimiter above,
243below or both. In that sense the term radical can also be interpreted as \quote
244{extreme}, more than a carrot looking symbol. The related commands take one or
more delimiters (or characters) because we support left as well as right
delimiters connected by a rule, so in the end radicals evolved into a construct
with delimiters of all kinds. So, the unique property of radicals is that the
fences assume a cooperation between one or more glyphs and a rule. In \CONTEXT\
we support actuarial hooks as radicals that are used for annuity expressions,
otherwise the \UNICODE\ symbol is useless and the \MATHML\ construct complex.
251
So, where accents take numbers as delimiter specification, fences, fractions and
radicals take specific math quantities or just letters. This is why we will
not merge these into one scanner and handler even if they all use the same
(large) noad to store and carry around their properties. Also, it has some charm
to keep the original \TEX\ distinctions. After all, it's not like \UNICODE,
\MATHML\ or \OPENTYPE\ math fonts have brought some new insights: in the end they
all draw from \TEX\ and the way it's done there.
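
As an illustration, each of the four molecule types can be triggered with
commands that most \TEX\ users know. This is only a sketch: these are the common
user level commands and not the low level primitives discussed above, and the
comments summarize the descriptions given in this section.

\starttyping
% fraction: two stacked parts separated by a rule
$ \frac{a}{b} $

% accent: something put on top of a nucleus
$ \hat{x} $

% fences: a matching pair plus an optional middle one, sized to the content
$ \left( \frac{a}{b} \middle| c \right) $

% radical: a delimiter that cooperates with a rule
$ \sqrt{x + 1} $
\stoptyping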
259
260\stopsection
261
262\startsection[title=Symbols]
263
264There are plenty of symbols in \UNICODE. When we try to get an idea how we ended
265up with that set we're surprised that not much seems to be known about it. There
266are references to \ISO\ standards, usage by specific organizations (like those
267dealing with patents), there are references to lists of publishers. In personal
communications with people involved it becomes clear that the criterion that a
symbol really has to be used somewhere doesn't apply to these math symbols.
270There are bizarre specimens that we cannot locate anywhere. They are often
271assigned the \quote {relation} property which for \TEX\ is a safe bet because
272binary and relations get similar spacing, but binary makes an exception when it
273sits at the front. The fact that relation spacing is used can even obscure the
274fact that some characters have zero width properties; the results just look
275somewhat bad and one can always blame the font or renderer and adding some thin
276spacing is accepted behavior. So one can make the argument that because \TEX\ was
277the main renderer of math, a safe bet was better than a confusing and
278unproven|-|by|-|usage assignment to some category.
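
The exception for a binary at the front can be seen in a small test. The sketch
below assumes the traditional \TEX\ class rules; a macro package, and certainly
\LUAMETATEX, can be configured to behave differently.

\starttyping
% between two ordinary atoms both get space around them:
% medium space for the binary, thick space for the relation
$ a + b $ \quad $ a = b $

% at the start of a formula traditional TeX treats the binary as ordinary,
% so no space follows it, while the relation keeps its thick space
$ + b $ \quad $ = b $
\stoptyping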
279
280In \TEX\ some symbols have multiple names, even when they have the same class.
281This indicates the wish for meaning at one end but shape at the other, and once a
282name has been assigned it sticks. It would be interesting to know how
283mathematicians see formulas: if one puts \type {\bar}s around a variable does one
284see \quotation {bar x bar} or \quotation {the modulus of x}, and how is translation
285to audio to be performed?
286
287One important aspect of using any symbol in \TEX, or basically any typesetting
288system that deals with math, is that the spacing depends on the meaning. Now, in
289the perspective of \UNICODE\ meaning is somewhat diffuse. A Latin capital \quote
290{A} related to \quote {a} is not the same as a Greek capital \quote {A} that
291relates to \quote {\alpha}. So, from the shape one cannot beforehand deduce what is
292meant, but when copying it the \UNICODE\ will expose the meaning. This is not the
293case in math: although many symbols have one meaning only, there are also plenty
294that can mean different things and the (\TEX) math community has not been able to
295make a strong case for providing different slots. Maybe the reason was that there
296already was a tradition of using commands that then relate a shape to a class
297that then results in appropriate spacing. Maybe it is also assumed that an
298article or book starts by explaining what a specific symbol means in that
299particular context. But that doesn't help much for copying. It also doesn't help
300with direct \UNICODE\ input. The way out for this last problem is that in
301\CONTEXT\ we will add additional properties to characters that then can
302communicate the class and thereby control the spacing. Although we initially did
303that at the \LUA\ end we now use the lightweight dictionary feature of the
304engine: a property, group, slot model. The main reason is that we foresee that at
305some point we might have to add property based rendering to the engine, and this
306opens up that possibility. Ever since we started with \LUATEX\ and \MKIV\ we have
307used the character database (in \LUA\ format) to store most properties so that we
308have all in one place.
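
A small sketch can show that it is the class and not the shape that drives the
spacing. Here we force the class with the traditional \type {\mathord}, \type
{\mathbin} and \type {\mathrel} wrappers. This is only meant as an illustration
of the effect; it is not how \CONTEXT\ assigns classes internally.

\starttyping
% the same vertical bar in three roles, each with its own spacing
$ \mathord{|} x \mathord{|} $   % bars around a variable: no extra space
$ p \mathbin{|} q $             % a binary operator (Sheffer stroke): medium space
$ P(A \mathrel{|} B) $          % a relation (conditional probability): thick space
\stoptyping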
309
310For figuring out the properties we can look at how traditionally symbols got
311multiple commands associated, how \MATHML\ looks at it, what \UNICODE\ reveals and
what we find in fonts. It is a bit of a jungle out there so for sure we have to
313make decisions ourselves. We next turn to that exploration.
314
315\stopsection
316
317\startsection[title=Slashes]
318
319The definition on the \WIKIPEDIA\ page [1] of slashes is as follows:
320
321\startquotation
322    The slash is an oblique slanting line punctuation mark /. Once used to mark
323    periods and commas, the slash is now used to represent exclusive or inclusive
324    or, division and fractions, and as a date separator. It is called a solidus
325    in \UNICODE, is also known as an oblique stroke, and has several other
326    historical or technical names including oblique and virgule.
327\stopquotation
328
329The page then has a very detailed description on how slashes are used in text,
330mathematics, computing, currency, dates, numbering, linguistic transcriptions,
331line breaks, abbreviations, proofreading, fiction, libraries, addresses, poetry,
332music, sports, and text messages. It is a pretty good and detailed page which also
333gives a nice summary of usage in math.
334
335In mathematics, we use the slash (a forward leaning bar) for fractions, division,
and quotients of sets. Examples of fractions are $\vfrac {1} {2}$ but also
337$\percent$ sits in this category.
338
\starttabulate[|T|l|l|]
\NC U+0002F \NC \switchtobodyfont[stixtwo]$\utfchar{"0002F}$ \NC this is the official solidus    \NC \NR % /
\NC U+02044 \NC \switchtobodyfont[stixtwo]$\utfchar{"02044}$ \NC the mathematical fraction slash \NC \NR % ⁄
\NC U+02215 \NC \switchtobodyfont[stixtwo]$\utfchar{"02215}$ \NC the mathematical division slash \NC \NR % ∕
\NC U+02571 \NC \switchtobodyfont[stixtwo]$\utfchar{"02571}$ \NC a diagonal box drawing line     \NC \NR % ╱
\NC U+029F8 \NC \switchtobodyfont[stixtwo]$\utfchar{"029F8}$ \NC the mathematical big solidus    \NC \NR % ⧸
\NC U+0FF0F \NC \switchtobodyfont[stixtwo]$\utfchar{"0FF0F}$ \NC a full width solidus            \NC \NR % /
\NC U+1F67C \NC \switchtobodyfont[stixtwo]$\utfchar{"1F67C}$ \NC the very heavy solidus          \NC \NR % 🙼
\stoptabulate
348
The \STIX\ fonts have the first five, the rest is not there, so we can safely
assume that they are not used in math. That brings us to the question: suppose
that the other ones are used, how does the user access them? In the editor they
often look pretty much the same. For \TEX ies the answer is easy: you use a
command. But as we already mentioned, there we enter a real fuzzy area: these
commands either describe a shape or they communicate a meaning, at least, in an
ideal world. Sometimes wrapping in a macro helps, like \typ {$\vfrac {1} {2}$}.
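
The difference between typing a shape and using a macro can be sketched as
follows. The comments reflect our interpretation of what the input communicates;
how each variant is rendered depends on the macro package and the font.

\starttyping
$ 1/2 $            % three characters in a row: the slash carries no structure
$ \vfrac{1}{2} $   % the macro gets numerator and denominator as separate arguments
$ \frac{1}{2} $    % the same two arguments, here for a stacked fraction
\stoptyping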
356
In the document that explains \UNICODE\ math there is a section \quotation
{Fraction Slash and Other Diagonals}. Even if we limit ourselves to the forward
leaning slashes it looks like we need to include exotic symbols, such as the empty
set symbol with a left arrow on top, \type {U+29B4}, which doesn't show up in most
math fonts but \STIX\ has it: {\switchtobodyfont[stixtwo]$\utfchar{"029B4}$}. We quote:
363
364\startquotation
365    \type {U+2044 } \typ {FRACTION SLASH} is typically used to build up simple
366    skewed fractions in running text. It applies to immediately adjacent
367    sequences of decimal digits, that is, to spans of characters with the General
    Category property value \type {Nd}. For example, \type {1⁄2} should be
    displayed as \type {½}. In ordinary plain text, any character other than a
    digit delimits the numerator or denominator. So \type {5 1⁄2} should be
371    displayed as \type {5½} since a space follows the \type {5}. In general
372    mathematical use, a more versatile method for layout of fractions is needed
373    (see, for example, Section 2.1 of [UnicodeMath]), however parsers of
374    mathematical texts should be prepared to handle \typ {FRACTION SLASH} when it
375    is received from other sources. \type {U+27CB}
376    \typ {MATHEMATICAL RISING DIAGONAL} and \type {U+27CD}
377    \typ {MATHEMATICAL FALLING DIAGONAL} are
378    mathematical symbols for specific uses, to be distinguished from the more
379    widely used solidi and reverse solidi operators as well as from
380    nonmathematical diagonals.
381\stopquotation
382
In \TEX\ there is no parsing going on: we just get sequences of atoms and the
inter atom spacing applies. Curly braced arguments are used to communicate units
that need to be treated as a whole. As a side note: where for some scripts there
are special characters that tell where something (state) starts and ends this is
not available for math, which makes it impossible to mark a sequence of characters
as being something math. The whole repertoire of pre|-|composed fractions and super-
and subscripted \UNICODE\ symbols is not to be used in math.
390
391Most documents that somehow relate to or (partially) originate in \TEX\ can
392be rather fuzzy, so we can read here:
393
394\startquotation
395    \type {U+27CB} corresponds to the \LATEX\ entity \type {\diagup} and \type
396    {U+27CD} to \type {\diagdown}. Their glyphs are invariably drawn with 45° and
397    135° slopes, respectively, instead of the more upright slants typical for the
398    solidi operators. The diagonals are also to be distinguished from the two box
399    drawing characters \type {U+2571} and \type {U+2572}. While in some fonts
400    those characters may be drawn with 45° and 135° slopes, respectively, they
401    are not intended to be used as mathematical symbols. One usage recorded for
402    \type {U+27CB} and \type {U+27CD} is in the notation for spaces of double
403    cosets.
404\stopquotation
405
406So, it is the angles that math users should translate into meaning which I guess
407is natural for them. From the above we cannot deduce if we should take them into
408account in a macro package.
409
410The \MATHML\ specification [3] keeps it abstract and talks about division without
411mentioning the rendering. In content \MATHML\ we have:
412
\starttyping
divide = element divide { CommonAtt, DefEncAtt, empty}
\stoptyping
416
417and the suggested rendering (from an example) is a slash.
418
In the chapter \quotation {Characters, Entities and Fonts} there is mention of:
420
421\startquotation
422    There is one more case where combining characters turn up naturally in
423    mathematical markup. Some relations have associated negations, such as \type
424    {U+226F} [\typ {NOT GREATER-THAN}] for the negation of U+003E [\typ
425    {GREATER-THAN SIGN}]. The glyph for U+226F [NOT GREATER-THAN] is usually just
426    that for U+003E [\typ {GREATER-THAN SIGN}] with a slash through it. Thus it
427    could also be expressed by \type {U+003E}|-|\type {U+0338} making use of the
428    combining slash \type {U+0338} [COMBINING LONG SOLIDUS OVERLAY]. That is true
429    of 25 other characters in common enough mathematical use to merit their own
430    \UNICODE\ code points. In the other direction there are 31 character entity
431    names listed in [\typ {Entities}] which are to be expressed using \type
432    {U+0338} [\typ {COMBINING LONG SOLIDUS OVERLAY}].
433\stopquotation
434
435A curious note is this:
436
437\startquotation
438    For special purposes, one may need a symbol which does not have a \UNICODE\
439    representation. In these cases one may use the \type {mglyph} element for
440    direct access to a glyph as an image, or (in some systems) from a font that
441    uses a non|-|\UNICODE\ encoding. All \MATHML\ token elements accept
442    characters in their content and also accept an \type {mglyph} there. Beware,
443    however, that use of \type {mglyph} to access a font is deprecated and the
444    mechanism may not work in all systems. The \type {mglyph} element should
445    always supply a useful alternative representation in its alt attribute.
446\stopquotation
447
At some point we experimented with very precisely positioned \HTML\ from \TEX\
449(read: \CONTEXT) and that worked very well: the rendering was exactly the same as
450\PDF\ but then suddenly it was no longer possible to access glyphs from fonts. The
451assumption had become that one should feed text into the font rendering machinery
452and use \OPENTYPE\ features to access specific shapes, which of course is a
453fragile approach (the libraries and logic keep evolving, and the most robust
454access is simply by index, or by glyph name if present, assuming that one uses
455the font that was meant to be used). So, how the \MATHML\ glyph element is
456supposed to work out well is not clear. Anyway, as we want nicely typeset math we
457don't care that much if features present in \LUAMETATEX\ and \CONTEXT\ are unique
458and cannot be reproduced otherwise.
459
460In \type {mathclass.txt} [4] which is \quotation {{\em not} formally part of the
461\UNICODE\ Character Database at this time} we see a classification:
462
\starttabulate[|T|l|]
\NC U+0002F \NC binary \NC \NR
\NC U+02044 \NC binary \NC \NR
\NC U+02215 \NC binary \NC \NR
\NC U+02571 \NC not mentioned \NC \NR
\NC U+029F8 \NC n-ary or large operator, often takes limits \NC \NR
\NC U+0FF0F \NC not mentioned \NC \NR
\NC U+1F67C \NC not mentioned \NC \NR
\stoptabulate
472
473So, in the end we can focus on the four that are mentioned, and we will do that
474with the above in mind as well as what is common in the \TEX\ world. We will look
475at usage, classification (groups) and classes.
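
What \CONTEXT\ itself has stored for these four slots can be inspected from
within a document. The following sketch dumps the entries from the character
database; it assumes that the slots are present in \type {characters.data}, and
the fields that show up depend on the \CONTEXT\ version that is used.

\starttyping
\ctxlua{table.tocontext(characters.data[0x002F],"[0x002F]")}
\ctxlua{table.tocontext(characters.data[0x2044],"[0x2044]")}
\ctxlua{table.tocontext(characters.data[0x2215],"[0x2215]")}
\ctxlua{table.tocontext(characters.data[0x29F8],"[0x29F8]")}
\stoptyping

Comparing such a dump with the classification above is a quick way to see where
the macro package follows \type {mathclass.txt} and where it makes its own
decisions.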
476
477% modern   % ok, both the same
478% cambria  % different, no extensible /
479% bonum    % ok, both the same
480% pagella  % ok, both the same
481% stixtwo  % only / extensible, 2044 useless
482% lucida   % both extensible, 2044 looks bad and more slope
483
484Unfortunately this sort of mess also results in a mess in fonts. For instance
485when we checked out the difference between \type {U+002F} and \type {U+2044} we
486found that in the fonts produced by the \TEX Gyre project both have proper
487dimensions (and look the same), so they can be used stand alone, but also as
488delimiters. In Cambria the dimensions are okay but only \type {U+2044} has
489extensible characters. In \CONTEXT\ we have defined \type {\slash} to use that slot but
490when you test Lucida and \STIX2 the results are disappointing: In Lucida the
491width of \type {U+2044} makes it unusable (it looks bad anyway), and in \STIX2 it
492is a bit wider so in the end it even becomes fuzzy what to recommend as fix:
493quarter width, half width or full width. Defining \type {\slash} as any of them
494gives at some point an issue so in the end we just patch the font in the goodie
495file: we make them the same and make sure they have extensible characters. After all,
496chances are slim that this will ever be fixed. In that respect a newer engine
497doesn't change the problem: we need to handle it in the macro package, but at
least that can be done a bit more naturally. \footnote {In principle, we can support
499the goodies in the generic font handler, but we think it makes no sense because it
500also relates to the way math is handled in general and supporting a wide range of
different applications can only cripple the code, let alone that agreeing on
502matters can be hard.}
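
A quick way to see such differences is to put both slots side by side in a few
fonts. The following is only a sketch: it assumes that the mentioned typescripts
are installed and it uses the same \type {\utfchar} approach as the tables in
this chapter.

\starttyping
\starttabulate[|T|c|c|]
\NC pagella \NC \switchtobodyfont[pagella]$a \utfchar{"002F} b$ \NC \switchtobodyfont[pagella]$a \utfchar{"2044} b$ \NC \NR
\NC stixtwo \NC \switchtobodyfont[stixtwo]$a \utfchar{"002F} b$ \NC \switchtobodyfont[stixtwo]$a \utfchar{"2044} b$ \NC \NR
\stoptabulate
\stoptyping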
503
504% \ctxlua{table.tocontext(characters.data[0x002F],"[0x002F]")}
505% \ctxlua{table.tocontext(characters.data[0x2044],"[0x2044]")}
506% \ctxlua{table.tocontext(characters.data[0x2215],"[0x2215]")}
507% \ctxlua{table.tocontext(characters.data[0x2571],"[0x2571]")}
508% \ctxlua{table.tocontext(characters.data[0x29F8],"[0x29F8]")}
509
510\stopsection
511
512\startsection[title=Bars]
513
514Again we start with the \WIKIPEDIA\ page, this time the one dedicated to bars
515[5]. The page starts with mathematics so that suggests that the (initial) author
516is familiar with usage in that field: if we cut and paste the itemized list we
517even get \TEX\ math (sort of). Examples of usage are: absolute value,
518cardinality, conditional probability, determinant, distance, divisibility,
519function evaluation, length, norm, order, restriction, set|-|builder notation,
520the Sheffer stroke in logic, subtraction, but also \quotation {A vertical bar can
521be used to separate variables from fixed parameters in a function, or in the
522notation for elliptic integrals}.
523
524Among the objectives of our exploration are grouping symbols in sets that
525represent related meanings and usage. Within these groups we can fine tune with
526classes but that is more geared at rendering. Although currently users enter
527specific usage of symbols with the same shape (or even \UNICODE) with commands we
528can imagine them entering the \quote {real} characters and in that case we need
529some automatic class assignment based on a group (or set of groups). The
530\WIKIPEDIA\ page mentions that in physics \quotation {The vertical bar is used in
531bra||ket notation in quantum physics}. It then goes on about usage in computing,
532phonetics and literature. This ordering is different from the slashes, but okay.
533
534The page then makes a distinction between solid and broken bars and there is some
535interesting history behind that, which relates to typewriters, terminals and
536printers in the perspective of distinction and indeed we noticed that on our
537keyboard the broken bar is still used, even if the rendering is solid. The
538page ends with the \UNICODE\ bars and entities. We mention most:
539
540\starttabulate[|T|l|l|]
541\NC U+007C \NC \switchtobodyfont[stixtwo]$\utfchar{"007C}$ \NC a single vertical line         \NC \NR % |
542\NC U+00A6 \NC \switchtobodyfont[stixtwo]$\utfchar{"00A6}$ \NC a single broken line          \NC \NR % ¦
543\NC U+2016 \NC \switchtobodyfont[stixtwo]$\utfchar{"2016}$ \NC a double vertical line (norms) \NC \NR % ‖
544\NC U+2223 \NC \switchtobodyfont[stixtwo]$\utfchar{"2223}$ \NC divides                        \NC \NR % ∣
545\NC U+2225 \NC \switchtobodyfont[stixtwo]$\utfchar{"2225}$ \NC parallel lines                 \NC \NR % ∥
546\NC U+2502 \NC \switchtobodyfont[stixtwo]$\utfchar{"2502}$ \NC a vertical box drawing line    \NC \NR % │
547\NC U+FF5C \NC \switchtobodyfont[stixtwo]$\utfchar{"FF5C}$ \NC a fullwidth vertical line      \NC \NR % |
548\stoptabulate
549
Given the mentioned wide range of usage it will be clear that bars can be confusing
and are pretty overloaded. We're not aware of broken bars being used in math, so
we ignore these.
553
554The \UNICODE\ math draft talks of \quote {vertical lines} and distinguishes two
555series, delimiters:
556
557\starttabulate[|T|l|l|]
558\NC U+007C \NC \switchtobodyfont[stixtwo]$\utfchar{"007C}$ \NC single vertical lines \NC \NR
559\NC U+2016 \NC \switchtobodyfont[stixtwo]$\utfchar{"2016}$ \NC double vertical lines \NC \NR
560\NC U+2980 \NC \switchtobodyfont[stixtwo]$\utfchar{"2980}$ \NC triple vertical lines \NC \NR
561\stoptabulate
562
563and operators:
564
565\starttabulate[|T|l|l|]
566\NC U+2223 \NC \switchtobodyfont[stixtwo]$\utfchar{"2223}$ \NC divides (single line)           \NC \NR
567\NC U+2225 \NC \switchtobodyfont[stixtwo]$\utfchar{"2225}$ \NC parallel (double lines)         \NC \NR
\NC U+2AF4 \NC \switchtobodyfont[stixtwo]$\utfchar{"2AF4}$ \NC binary relation (triple lines)  \NC \NR
\NC U+2AFC \NC \switchtobodyfont[stixtwo]$\utfchar{"2AFC}$ \NC a large triple operator         \NC \NR
570\stoptabulate
571
572Watch the triples: these are not (yet) in the \WIKIPEDIA\ summary. Rightfully
573there is a remark that the official \UNICODE\ descriptions use \typ {BAR} and
574\typ {LINE} but \TEX ies can't complain about that, can they? After all, they
575also use these terms mixed.
576
577The delimiters sit at the edges but sometimes also in the middle. The operators
578are between other elements and the document states that they also should grow.
It is also mentioned that spacing depends on usage. The large triple is an n-ary
580operator but as usual with math symbols the user (reader) has to guess what that
581actually means.
582
583It is actually unfortunate that the fences have no left, middle and right
584variant. Even if these render the same it would make life easier and consistency
585with other fences is also worth something. One wonders how it would have looked
586if accessibility demands had kicked in earlier. The \UNICODE\ \type
587{mathclass.txt} [4] provides:
588
589\starttabulate[|T|l|]
590\NC U+007C \NC fence (unpaired delimiter) \NC \NR
591\NC U+2016 \NC fence (unpaired delimiter) \NC \NR
592\NC U+2980 \NC fence (unpaired delimiter) \NC \NR
593\stoptabulate
594
595We assume that the unpaired qualification is actually an indication that usage as
596what in \TEX\ is called \quote {middle} is okay. The operators are classified as:
597
598\starttabulate[|T|l|]
599\NC U+2223 \NC relation    \NC \NR
600\NC U+2225 \NC relation    \NC \NR
601\NC U+2AF4 \NC binary      \NC \NR
602\NC U+2AFC \NC large n-ary \NC \NR
603\stoptabulate
604
605% \ctxlua{table.tocontext(characters.data[0x007C],"[0x007C]")}
606% \ctxlua{table.tocontext(characters.data[0x00A6],"[0x00A6]")}
607% \ctxlua{table.tocontext(characters.data[0x2016],"[0x2016]")}
608% \ctxlua{table.tocontext(characters.data[0x2980],"[0x2980]")}
609% \ctxlua{table.tocontext(characters.data[0x2223],"[0x2223]")}
610% \ctxlua{table.tocontext(characters.data[0x2225],"[0x2225]")}
611% \ctxlua{table.tocontext(characters.data[0x2AF4],"[0x2AF4]")}
612% \ctxlua{table.tocontext(characters.data[0x2AFC],"[0x2AFC]")}
613
614The main problem with bars in \TEX\ is that there is no distinction between a
left and right bar which makes it impossible to use them directly as fences. One
can consider this to be an omission in \UNICODE\ math because shape rules over
617meaning. So anyway, this is something that a macro package has to deal with. If
618needed these can get a class on their own in which case we can define atom
619spacing rules that deal with them ending up left or right. In \UNICODE\ there are
620signals that deal with bidirectional text, so we see no reason why there shouldn't
621be similar provisions for math.
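
To make the distinction concrete, here is a minimal sketch of how the same bar
shape ends up in different roles in \TEX\ input. It only uses the traditional
\type {\left}, \type {\middle} and \type {\right} constructs and the plain \TEX\
names \type {\mid} and \type {\parallel}; the formulas themselves are made up for
illustration.

\starttyping
$ \left| x \right| $                          % fence: absolute value
$ \left\Vert v \right\Vert $                  % fence: norm
$ \left\{ x \in A \middle| x > 0 \right\} $   % middle: set builder notation
$ a \mid b \quad l \parallel m $              % relations: divides and parallel
\stoptyping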
622
623\stopsection
624
625\startsection[title=Hyphens and Dashes]
626
This section applies to text and math as both are riddled with horizontal lines:
symbols that are easy to scratch in wood, chisel in stone or draw on paper. We
limit ourselves to the straight ones, but similar observations can be made for
curved ones.
631
632\WIKIPEDIA\ distinguishes hyphens, minus, and dashes so there are multiple pages
633dedicated to this. The page about minus mentions that there are three usages
634(somewhat rephrased):
635
636\startitemize[packed]
637    \startitem
638        It is used as subtraction operator and therefore a binary operator
639        that indicates the operation of subtraction.
640    \stopitem
641    \startitem
        It can be a function whose value for any real or complex argument is the
643        additive inverse of that argument.
644    \stopitem
645    \startitem
646        It can serve as a prefix of a numeric constant. When it is placed
647        immediately before an unsigned numeral, the combination names a negative
648        number, the additive inverse of the positive number that the numeral
649        would otherwise name.
650    \stopitem
651\stopitemize
652
653The functional variant is how content \MATHML\ sees it: you apply a minus
operator to something, singular or multiple. We were surprised to see that there
is a distinctive rendering suggested, something we have argued for on several
occasions (mostly \TEX\ meetings):
657
658\startquotation
659    In many contexts, it does not matter whether the second or the third of these
    usages is intended: \type {−5} is the same number. When it is important to
661    distinguish them, a raised minus sign \type {¯} is sometimes used for negative
662    constants, as in elementary education, the programming language \APL, and some
663    early graphing calculators.
664\stopquotation
665
666Unfortunately that distinction was not recognized by the \TEX\ community at large
667which (we guess) is why we don't see it in \UNICODE, which on the other hand has
plenty of dashes as we will see soon.
669
670The page mentions usage in indicating blood types and music, which is a nice
671detail. It also mentions usage in computing, including regular expressions and in
672physics and chemistry indicating charge. It lists these codes for minus symbols:
673
674\starttabulate[|Tl|l|]
675\NC U+002D \NC hyphen minus            \NC \NR
676\NC U+2212 \NC minus                   \NC \NR
677\NC U+FE63 \NC small hyphen minus      \NC \NR
678\NC U+FF0D \NC full width hyphen minus \NC \NR
679\stoptabulate
680
The page also mentions the commercial minus \type {⁒} (see also [7]) and division
682sign \type {÷} (see also [8]) and we think these should be supported in math mode
683simply because they can be part of (even simple text style) formulas.
684
The fact that we use the hyphen as minus and expect it to render as a wider dash
like shape is something that relates to math mode in \TEX\ speak. In text mode we
expect it to be seen as a hyphenation related indicator. We won't go into details
688about automated hyphenation and explicit hyphens in text mode but here are the
689hyphens as mentioned on the hyphen specific \WIKIPEDIA\ page:
690
691\starttabulate[|Tl|l|]
692\NC U+002D \NC hyphen minus \NC \NR
693\NC U+00AD \NC soft hyphen \NC \NR
694\NC U+2010 \NC hyphen \NC \NR
695\NC U+2011 \NC non breaking hyphen \NC \NR
696\stoptabulate
697
698You might wonder why we mention text variants here and one reason is that we
699actually might need to provide a catch for the last two: maybe when a user copies
700these from a document (when rendered at all) we need to treat them as the simple
701hyphen minus and just remap them to the math minus when in math mode. Below, we
702will discuss dashes, and although these are also meant for text, a reason for
703exploring these can be found in the fact that \TEX\ users like to decorate the
704content in unexpected ways and lines (or rules) fit into that. The \WIKIPEDIA\
705pages go into some details about the hyphens being used in compounds and there
706can be some confusion about whether to use endashes or hyphens for that. We're
707pretty sure that typesetting wars have been fought over that. Usage as pre- and
708suffixes definitely is worth noting (and we use them as such in this sentence).
709
710We leave out all the other usages and see what there is to tell about related
711symbols. The \WIKIPEDIA\ page about dashes is an extensive one. It starts out with
712the distinction between \unichar {figure dash} {2012}, \unichar {endash} {2013},
713\unichar {emdash} {2014} and \unichar {horizontal bar} {2015}. Of these a \TEX ie
714will for sure recognize the endash and emdash. The hyphen is not a dash but if
you look at \TEX\ input you will see that double or triple hyphens get ligatured into en- and
716emdashes! The only certainty one has is that the endash is often half the width
717of an emdash. Also, the width of the emdash is often the same as the font size.
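
As a reminder of what this looks like at the input end, here is a small sketch.
It assumes the usual \TEX\ ligatures are enabled in the text font; in math mode
the same key renders as a minus.

\starttyping
well-known       % one hyphen: a hyphen
pages 3--7       % two hyphens: an endash
wait --- what?   % three hyphens: an emdash
$a - b$          % math mode: rendered as minus
\stoptyping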
718
719One reason why a language subsystem of a \TEX\ macro package is complex is that
720it has to deal with cultural aspects and the usage as well as spacing around all
721these dashes can differ. When trying to support that a macro writer soon finds
722out that one user of language~X can tell you the rules are done this way, and a
723while later you get a mail from another user who claims that in language~X the
724rules are done that way. Word processing and dominance of English probably adds
725to the confusion. The same is true for quotes, but math doesn't need these, so we
726skip them. Now wait, you will say: does math use these dashes? Users probably
727will mix them in but more important is that the width of these dashes also has
728associated skips: \type {\enspace} and \type {\emspace} or \type {\quad} and
these one definitely sees users mix into math.
730
The figure dash has the same width as digits which makes it useful in tables. In
the fonts that come with \TEX\ it is the reverse: the digits have the same width
and that width matches the endash. There is no habit of using the figure dash, but
we might need to change that. After all, we now have the fonts! We do need to
deal with the figure dash because users might mix math and text in tables, and
although you can find plenty of tables badly typeset by \TEX, this is no excuse
for using a mix of minus and figure dash in inconsistent ways.
738
739The \WIKIPEDIA\ page mentions the usage of the endash: as connector, as compound
740hyphen, and as sentence interrupter. Now the one that needs some attention is the
741second one. In Dutch, we can combine words in many ways and for educational
742purposes adding a compound dash makes sense. However, because the weight of the
743hyphen and endash in \TEX\ fonts is rather incompatible, in \CONTEXT\ we use(d)
744fakes: two overlapping hyphens. Another complication is that one has to wrap that
745in a discretionary node in order to make the hyphenator happy, but that is now
746delegated to the engine that can be configured to see certain characters as valid
747hyphenation points. Although we support discretionaries in math this doesn't
748relate to dashes but to pluses and minuses and such. The engine supports explicit
749discretionaries but can also automatically repeat symbols that are set up as
750repeatable across lines. We're not sure if users actually use en- and emdashes in
751math mode, but one can occasionally run into examples (on the web) where special
752effects are achieved in curious ways. \footnote {The math stream doesn't go
753through the font handler although embedded \type {\hbox}es get that treatment.
This means that two hyphens in a row are just two atoms and do not get collapsed to
755an endash.}
756
757It is worth pointing out that \WIKIPEDIA\ discusses \quotation {Ranges of values}
and this is something we need to investigate in the perspective of math! Strictly
speaking that is a text thing, but \unknown\ Among the many observed and suggested
patterns we note that among \TEX ies using the endash as itemize symbol is
also popular.
762
Usage of the emdash is related to the use of parentheses or colons, so it is more
764a kind of punctuation. It can also be used as an interrupt and again it is a
765candidate for an itemize symbol. There is of course a \TEX\ thing there: lack of
766text symbols made for a rather mixed usage of math and text symbols in
767itemizations. For instance a dotted one uses the well visible math dot instead of
768the often hardly visible text dot that simply was not present in \TEX\ fonts, so
769our eyes got accustomed to the bolder ones. It is one of the reasons why a \TEX\
macro package loads a math font even when no math is used. Over the years in \TEX\
math and text symbols have been mixed in various ways, also a side effect of the
772limited amount of characters in text fonts and the abundance of them in math
773mode, even if most are only accessible by name. We need to deal with that
774historic mix.
775
776The page rightfully mentions that \TEX\ has no horizontal bar, also known as
777\quote {quotation dash}, used for dialogues in some languages. We should make a
778note then that it might be good to see if we have to reconfigure the
779sub|-|sentence presets to match that expectation. The proposed hack {\red MPS:
780where?} for a missing symbol is somewhat curious:
781
782\starttyping
783x \hbox{---}\kern-.5em--- x
784\stoptyping
785
786\startbuffer[dash-example]
787\uleaders \hbox to 1.5em {---\hskip 0pt minus .5em---} \hskip.125em minus .125em \relax
788\stopbuffer
789
790Why not \type {\hbox {---\kern-.5em---}} or just \type {---\kern-.5em---} to get
791the same effect? This also assumes that the font collapses these three hyphens
792into a dash, then it backtracks the symbol width and does a second one.
793\footnote {Here is some food for thought: for this kind of usage one can argue
794that such a dash should have some stretch. In \LUAMETATEX\ and therefore
795\CONTEXT\ we can do this: \typeinlinebuffer [dash-example] and get: \dorecurse
796{30} {x \getbuffer [dash-example] x}. Boxed material can be stretched and be
797taken into account when creating paragraphs. It is no big deal to wrap that in a
798macro, say \type {\figuredashed}.} Anyway, where figure dashes are related to
799minuses we can probably ignore this super minus resembling horizontal bar.
800\footnote {We can actually issue a warning when it is used in math mode.}
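
Wrapping the stretchable variant from the footnote in a macro could look as
follows. This is just a sketch: the name \type {\figuredashed} is the one
suggested above and not an existing command, and \type {\uleaders} is specific
to \LUAMETATEX.

\starttyping
\protected\def\figuredashed
  {\uleaders
   \hbox to 1.5em {---\hskip 0pt minus .5em---}
   \hskip.125em minus .125em
   \relax}

x \figuredashed x
\stoptyping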
801
The \WIKIPEDIA\ page ends with a summary of all kinds of dashes, including
underscores, script specific symbols, accents (like macron), modifiers and curly
ones. Here we only mention the ones that can end up in some source when one cuts
and pastes. Doing that can result in missing characters (because not all fonts
provide them) or a change in meaning (for as far as the symbol relates to an
intention). We show some that fit into this discussion and also mention the
808\UNICODE\ description:
809
810\starttabulate[|T|lb{\ttx}|p|]
811\NC U+002D \NC HYPHEN-MINUS                  \NC the usual hyphen but also used as minus \NC \NR
812\NC U+005F \NC LOW LINE                      \NC aka underscore \NC \NR
813\NC U+00AD \NC SOFT HYPHEN                   \NC valid hyphenation point (invisible) \NC \NR
814\NC U+2010 \NC HYPHEN                        \NC the real hyphen but more work on a keyboard \NC \NR
815\NC U+2011 \NC NON-BREAKING HYPHEN           \NC a hard hyphen, disables following hyphenation \NC \NR
816\NC U+2012 \NC FIGURE DASH                   \NC see discussion above \NC \NR
817\NC U+2013 \NC EN DASH                       \NC see discussion above \NC \NR
818\NC U+2014 \NC EM DASH                       \NC see discussion above \NC \NR
819\NC U+2015 \NC HORIZONTAL BAR                \NC see discussion above \NC \NR
820\NC U+2043 \NC HYPHEN BULLET                 \NC used in itemized lists \NC \NR
821\NC U+207B \NC SUPERSCRIPT MINUS             \NC combined with pre-superscripted characters \NC \NR
822\NC U+208B \NC SUBSCRIPT MINUS               \NC combined with pre-subscripted characters \NC \NR
823\NC U+2212 \NC MINUS SIGN                    \NC the math minus (rendering of hyphen) \NC \NR
824\NC U+23AF \NC HORIZONTAL LINE EXTENSION     \NC build long connected horizontal lines \NC \NR
825\NC U+23E4 \NC STRAIGHTNESS                  \NC represents line straightness in technical context \NC \NR
826\NC U+2500 \NC BOX DRAWINGS LIGHT HORIZONTAL \NC part of the box-drawing repertoire \NC \NR
827\NC U+2796 \NC HEAVY MINUS SIGN              \NC a visual variant with no meaning \NC \NR
828\NC U+2E3A \NC TWO-EM DASH                   \NC a visual variant with no meaning \NC \NR
829\NC U+2E3B \NC THREE-EM DASH                 \NC a visual variant with no meaning \NC \NR
830\NC U+FE58 \NC SMALL EM DASH                 \NC a visual variant with no meaning \NC \NR
831\NC U+FE63 \NC SMALL HYPHEN-MINUS            \NC a visual variant with no meaning \NC \NR
832\NC U+FF0D \NC FULLWIDTH HYPHEN-MINUS        \NC a visual variant with no meaning \NC \NR
833\stoptabulate
834
835The \UNICODE\ math draft only mentions the hyphen: \footnote {When I copy this
836snippet into the document source there are \typ {START OF TEXT} symbols at the
837places where a hyphenation occurs, which is probably a side effect of a bad \type
838{TOUNICODE} entry in the \PDF\ file, but it is kind of interesting in this
839perspective as definitely a hyphen is rendered.}
840
841\startquotation
    Minus sign. \type {U+2212} [or] \type {−} [known as] \typ {MINUS SIGN} is the
843    preferred representation of the unary and binary minus sign rather than the
844    \ASCII|-|derived \type {U+002D} [or] \type {-} [known as] \typ
845    {HYPHEN-MINUS}, because minus sign is unambiguous and because it is rendered
846    with a more desirable length, usually longer than a hyphen.
847\stopquotation
848
849and elsewhere we can read:
850
851\startquotation
852    The \ASCII\ hyphen minus \type {U+002D} [or] \type {-} is a weakly
853    mathematical character that may be used for the subtraction operator, but
    \type {U+2212} [or] \type {−} [known as] \typ {MINUS SIGN} is preferred for
855    this purpose and looks better.
856\stopquotation
857
858We are not aware of the concept of weak mathematical characters, so we will not
take that property too seriously when we try to improve the rendering.
860
This is basically it. There is no mentioning of classes (after all, traditional
\TEX\ has no unary class) so it is assumed that the renderer does the right
thing: interpreting the sequence of characters and applying spacing accordingly.
There are users who like to see a unary minus being rendered differently, just as
the minus that a student is supposed to key into a calculator and while the
\WIKIPEDIA\ page mentions this explicitly, it is ignored here. Yes, having two
distinctive slots for this would have been great. Maybe it is not seen as
relevant enough by the community that would benefit most, but who knows what would
have happened if the \WIKIPEDIA\ page had been there before!
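
In traditional \TEX\ terms this interpretation is the well known rule that a
binary atom at the start of a formula, or one that follows for instance a
relation, an open or another binary, is treated as ordinary. A small sketch,
where the braces in the last line are the classic way to force an ordinary atom;
how the additional classes in \LUAMETATEX\ refine this is not covered here.

\starttyping
$ a - 5 $      % binary: spaced on both sides
$ -5 $         % starts the formula: treated as ordinary, so unary
$ a = -5 $     % follows a relation: also unary
$ a + {-}5 $   % braces make an explicit ordinary atom
\stoptyping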
870
871The minus is mentioned in the somewhat curious section about how shapes should be
872positioned relative to the baseline, where the position of the minus relates to
873what in \TEX\ speak is the math axis. There is also some mentioning of non-mathematical use, like:
874
875\startquotation
876    The concept of mathematical use is deliberately kept broad; therefore the
877    Math property is also given to characters that are used as operators, but are
878    not part of standard mathematical notation, such as \type {U+2052} \typ
879    {COMMERCIAL MINUS}.
880\stopquotation
881
882There should be no confusion with the \typ {SET MINUS} which renders as a
883backslash, a \typ {(NEG\-ATED) MINUS TILDE} or \typ {(NEG\-ATED) SIMILAR MINUS
884SIMILAR} that look more like relations. {\red MPS: overfull hbox, and do you
885intend to hyphenate?}
886
887The \MATHML\ document recognizes the minus as being unary or binary. In content
888\MATHML\ it is easy: when applied to a single atom it is a unary. In presentation
889\MATHML\ minus is an operator that sits at the front of a row (unary) or in the
890middle (binary). Keep in mind that we are limited to \type {mn} for numbers,
891\type {mi} for alphabetic symbols and \type {mo} for operators, not to be
892confused with \TEX's math operators, because in \MATHML\ relations are also
893operators. One can wonder about a minus in \type {mn} elements.
894
895So to summarize: we definitely need to make sure that (whatever renders as)
896hyphens is dealt with in math as minus. We can wonder what to do with
897(especially) en- and emdashes and the other horizontal lines that actually might
898show up as (what we call) middle delimiters in mathematical constructs: if it's
899there, \TEX ies will use it! The lack of specific symbols for unary minus has to
900be compensated at the macro package level.
901
902% \ctxlua{table.tocontext(characters.data[0x002D],"[0x002D]")}
903% \ctxlua{table.tocontext(characters.data[0x2010],"[0x2010]")}
904% \ctxlua{table.tocontext(characters.data[0x2011],"[0x2011]")}
905% \ctxlua{table.tocontext(characters.data[0x2212],"[0x2212]")}
906% \ctxlua{table.tocontext(characters.data[0x2212],"[0x2213]")}
907% \ctxlua{table.tocontext(characters.data[0x2212],"[0x2214]")}
908% \ctxlua{table.tocontext(characters.data[0x2212],"[0x2215]")}
909% \ctxlua{table.tocontext(characters.data[0xFE63],"[0xFE63]")}
910% \ctxlua{table.tocontext(characters.data[0xFF0D],"[0xFF0D]")}
911
912% U+2043 HYPHEN BULLET
913% U+207B SUPERSCRIPT MINUS
914% U+208B SUBSCRIPT MINUS
915
916\stopsection
917
918\startsection[title=Pieces]
919
In \UNICODE\ one can find all kinds of constructors, for instance characters that
921find their origin in those character sets that had lines and corners for drawing
922on a terminal. It is therefore no surprise that there are also some constructors
923that relate to math. An example demonstrates this:
924
925\startbuffer[definition]
926\def\makeweird#1#2#3#4%
927  {\vcenter\bgroup
928     \offinterlineskip
929     \hbox{$\scriptscriptstyle\char"#1$}\par
930     \hbox{$\scriptscriptstyle\char"#2$}\par
931     \hbox{$\scriptscriptstyle\char"#3$}\par
932     \hbox{$\scriptscriptstyle\char"#4$}%
933   \egroup}
934
935\def\lwA{\mathopen {\makeweird{23A7}{23A8}{23A8}{23A9}}}
936\def\rwA{\mathclose{\makeweird{23AB}{23AC}{23AC}{23AD}}}
937\def\lwB{\mathopen {\makeweird{23A7}{23AC}{23AC}{23A9}}}
938\def\rwB{\mathclose{\makeweird{23AB}{23A8}{23A8}{23AD}}}
939\def\lwC{\mathopen {\makeweird{23A7}{23AC}{23A8}{23A9}}}
940\def\rwC{\mathclose{\makeweird{23AB}{23A8}{23AC}{23AD}}}
941\stopbuffer
942
943\startbuffer[demo]
944$\lwA x + 4 + \lwB x^2 + 4^2 + \lwC x^3 + 4^3 \rwC \rwB \rwA$
945\stopbuffer
946
947\typebuffer[definition,demo]
948
949This renders as:
950
951\startlinecorrection
952\getbuffer[definition]
953\scale[width=\textwidth]{\getbuffer[demo]}
954\stoplinecorrection
955
956So, we have official \UNICODE\ characters for constructing large fences. In the
957\UNICODE\ math documents there is some mentioning of this and interesting is that
958there are suggested compositions expressed in 2, 3, 5 etc. stacked \quote {lines}
959which makes one wonder how math is perceived (or supposed to be rendered). But
960what is really weird is that there are plenty of arrows but no snippets defined that
961can be used to create extended ones. Why vertical snippets and no horizontal
962ones? This is clearly an omission and the \TEX\ community did take care of this
963need. So, for horizontal arrows and alike one expects the font to handle it and
964for fences not?
965
966It is not only fences that have snippets, we also find them for integrals. But
967for vertical arrows they are lacking: that is completely up to the font. Now, for
968us that is fine, but again, for consistency they could have been there. It would
969make it possible to filter bits and pieces from fonts using official slots
970instead of private ones. So, to some extent we can best assume there is nothing
971like that and ignore whatever pieces are in \UNICODE\ anyway (like the braces in
972the example). One can even argue that because of this inconsistency a font
designer can as well only use private slots and not provide snippets at all.
974
So, how do we get out of this situation? Because no one cared about getting it into
976\UNICODE, we can do as we like. Of course, we can define arrow fillers as has
977always been done in \TEX, but because in \LUAMETATEX\ we have a bit more in our
978toolkit, and because we want to support stretch fractions (where the rule is
979replaced by a horizontal delimiter) it was decided to define a tweak that deals
980with this: when the basic arrows have no horizontal parts defined, we just
981assemble them. For those arrows that have a hook or so at the other end, we use
982the space as extender. \footnote {Actually we no longer do that because the
983engine will center the arrow anyway when it's too short.} If we ever end up with
proper snippets in \UNICODE\ then we also need adapted fonts, and then we can get
rid of these hacks. That said: because all decent math fonts do have the three
pairs of fences (brace, parenthesis, bracket) the vertical snippets are rather
987useless, unless one wants to construct assembled weird ones. This would be
988different for horizontal assemblies, because there is more variety in them.
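
For reference, this is the traditional filler approach in plain \TEX\ terms: a
filler is assembled from a minus and an arrow head with leaders in between, and
it stretches to the width of the box it is put in. A minimal usage sketch,
assuming the macro package provides the plain names \type {\leftarrowfill} and
\type {\rightarrowfill}:

\starttyping
\hbox to 4em {\rightarrowfill}
\hbox to 4em {\leftarrowfill}
\stoptyping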
989
The official name for everything related to characters that can stretch is \quote
{delimiter}. In traditional \TEX\ one can define a command that becomes a
character. In that case a family, class and slot are assigned. You can also
directly access a character in which case one will assign these properties
otherwise (no command is defined). The same is true for these delimiters.
However, in traditional \TEX\ the larger character usually comes from a so called
extension font (and uses family~3). In \OPENTYPE\ fonts we have all in one font so
there the large family, class, and slot are not used.
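
For readers who have never looked at them, this is what such assignments look
like for the bars and arrows discussed in this chapter. The numbers are quoted
from memory of \type {plain.tex}, so treat them as an illustration: a math
character code packs class, family and slot, while a delimiter code also carries
the family and slot of the large variant in the extension font.

\starttyping
\mathchardef\mid="326A           % class 3 (relation), family 2, slot "6A
\mathchardef\parallel="326B      % class 3 (relation), family 2, slot "6B
\def\vert{\delimiter"26A30C }    % small: family 2, slot "6A; large: family 3, slot "0C
\def\Vert{\delimiter"26B30D }    % small: family 2, slot "6B; large: family 3, slot "0D
\def\uparrow{\delimiter"3222378 } % class 3, small: family 2, slot "22; large: family 3, slot "78
\delcode`\|="26A30C              % the same pair for a directly entered bar
\stoptyping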
998
999An interesting side effect of the updated math machinery in \LUAMETATEX\ is that
1000we no longer really need delimiter specifications when we use \OPENTYPE\ fonts.
1001This is because in practice the only two classes that really matter are the open
1002and close ones. There are basically two kinds of delimiters: fences and
1003singulars. Fences need open and close and only bars have a dual character. So,
1004when we don't define it as delimiter, the engine can still use that character and
1005take its assigned class when used stand|-|alone, while in the case of fences
1006these themselves are of class open and close. And, for instance a left brace can
1007get class open because when used stand alone it is an unscaled left fence. In the
rare case that one really needs a different class we are using commands: some
1009characters can be binary, ordinary or whatever so then commands relate a name to
1010a class|-|character combination. Actually, in \CONTEXT\ we will switch to using
1011dictionaries and field specific rendering instead, but that is a different story.
1012We can illustrate the arrows with an example:
1013
1014\startbuffer
1015$ x +
1016    \left\downarrow a \uparrow \frac{1}{b} \downarrow c \right\uparrow
1017= y $
1018\stopbuffer
1019
1020\typebuffer
1021
The stand alone arrows are defined with class relation but when used as fences
1023their spacing is driven by the fences themselves.
1024
1025\startlinecorrection
1026\scale[width=\textwidth]{\showmakeup[mathglue]\mathspacingmode1\showglyphs\getbuffer}
1027\stoplinecorrection
1028
1029This means that in \CONTEXT\ \LMTX\ we no longer have delimiter code definitions.
1030Of course the engine has to be able to use math characters of any kind (by
1031commands, direct or as \UTF) as delimiters, but that was not that hard to
1032provide. It also simplifies the code we use for fencing as it can be less
1033selective.
1034
1035Another interesting side effect of once again looking into these stretched
1036characters is that the fraction mechanism that already was extended with skewed
1037fractions, now supports any stretchable character as alternative for a fraction
1038rule.
1039
1040\startbuffer
1041$
1042    p \leftarrowtext {a + b + c + d}{x + y} q
1043    \quad
1044    p \frac {a + b + c + d}{x + y} q
1045$
1046\stopbuffer
1047
1048\typebuffer
1049
1050Watch the difference in spacing: here the class of the used delimiter determines the
1051spacing around the (pseudo) fraction:
1052
1053\startlinecorrection
1054\scale[width=\textwidth]{\showmakeup[mathglue]\mathspacingmode1\showglyphs\getbuffer}
1055\stoplinecorrection
1056
1057Again this simplifies some code because normally one ends up with stacking stuff
1058using leaders in between.
1059
1060\stopsection
1061
1062\startsection[title=Accents]
1063
1064When we talk about accents, we refer to tiny symbols that anchor themselves onto
1065base characters. We limit ourselves to the ones common in Latin scripts because
1066they are the ones used in math. Accents in \UNICODE\ are somewhat special. In
1067the past, when encoding vectors were limited, accents were entered as part of an
1068input sequence and then anchored by the renderer. Nowadays often pre|-|composed
1069characters are used. A very cheap way of anchoring is to have accents that just
1070overlay, and in practice centering an accent over a base character works sort of
1071okay. As an example of an accent we will use the hat:
1072
1073\starttabulate[|T|c|l|c|]
1074\NC U+005E \NC x\char"005E x m\char"005E m\NC \tex {Hat}     \NC \im{x \char"005E x + m\char"005E m} \NC \NR %  94
1075\NC U+02C6 \NC x\char"02C6 x m\char"02C6 m\NC \tex {hat}     \NC \im{x \char"02C6 x + m\char"02C6 m} \NC \NR % 710
1076\NC U+0302 \NC x\char"0302 x m\char"0302 m\NC \tex {widehat} \NC \im{x \char"0302 x + m\char"0302 m} \NC \NR % 770
1077\stoptabulate
1078
1079Normally the font handler will take care of anchoring \type {U+0302}, but it can
1080only be done properly when there are anchors defined for what are called \quote
1081{marks}: the official feature description is mark|-|to|-|base (or simply \type
1082{mark}).  The last column in the above table shows math and as we input a raw
1083character we don't get proper anchoring: the zero width makes it overlap.
1084
1085% till here
1086
1087Now wait, you will say, but why does it actually overlap? The reason is that zero
1088width is not actually zero width here! The glyph has a bounding box that goes
1089into the negative horizontal direction and therefore, when such a shape gets
1090injected into the output, the rendering in the viewer will move the left edge to
1091the left. But because the \TEX\ engine only handles positive widths and because
1092the width is explicitly part of a character specification anyway\footnote {The
1093height and depth are not: these we derive from the bounding box.} we don't
1094progress (advance) which is why the overlapping sort of works for the $x$ but
1095less so for the $m$: in math mode we need to use these \type {\hat} and \type
1096{\widehat} commands.
1097
1098The hat and widehat assignments were those of August 2022. In plain \TEX\ we see
1099these definitions:
1100
\starttyping
\def\hat    {\mathaccent"705E }
\def\widehat{\mathaccent"0362 }
\stoptyping
1105
1106The \type {\mathaccent} primitive takes an integer that encodes the class, family,
1107and slot in the 8 bit font encoding. Here we see that the hat comes from family
11080, the upright math font. The widehat comes from extensible family 3. These two
1109are independently defined. When you want a hat that spans the nucleus, you need to
1110use the widehat. In the math engine spanning actually means that we have a
1111delimiter and normally that means: start with a basic shape, when that is too
1112narrow, go to the extensible font and follow the chain with increasing sizes and
1113when you run out of those apply an extensible recipe. The sequence and extensible
1114are both optional and the important part is that we first look at what is called
1115the small character and then to the large one(s).
1116
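As a small sanity check of how such an integer is composed (class times \type
{"1000} plus family times \type {"100} plus slot), the following sketch, which
assumes an engine that provides \type {\numexpr}, reports both decompositions in
the log:

\starttyping
% "705E: class 7 (variable family), family 0, slot "5E
\ifnum\numexpr 7*"1000 + 0*"100 + "5E\relax="705E
    \message{hat: class 7, family 0, slot "5E}
\fi
% "0362: class 0 (ordinary), family 3, slot "62
\ifnum\numexpr 0*"1000 + 3*"100 + "62\relax="0362
    \message{widehat: class 0, family 3, slot "62}
\fi
\stoptyping
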
However, the \type {\mathaccent} primitive doesn't take a delimiter! It directly
starts following a chain if the given character has it (and then the character
itself is of course the first in that chain). And this is where the problems
start when we move to \OPENTYPE\ and \UNICODE\ math.
1121
1122\starttabulate[|T|l|l|]
1123\NC U+005E \NC Hat     \NC some useless, often ugly large glyph \NC \NR %  94
1124\NC U+02C6 \NC hat     \NC it has width but no extensibles      \NC \NR % 710
1125\NC U+0302 \NC widehat \NC it has zero width and extensibles    \NC \NR % 770
1126\stoptabulate
1127
Now, if we define \type {\hat} as \type {U+02C6} we don't get the extensibles,
and it basically is what was always done in \TEX\ macro packages following the
plain suggestions. If we define \type {\widehat} as \type {U+0302} we start out
with a glyph that has likely zero width.\footnote {Over the many years that
\LUATEX\ evolved this was not guaranteed, for instance when wide (\UNICODE) fonts
were constructed from traditional eight bit (\TEX\ encoded) fonts.} And, because
\OPENTYPE\ starts with the base glyph and {\em then} uses a set of variants or
eventually a recipe of parts, we suddenly have a different situation with \type
{\mathaccent} than we normally have, where these are decoupled. Therefore, the
definition of \type {\hat} and \type {\widehat} determines what an \OPENTYPE\
math engine will do, just as in regular \TEX, but we might need them to be
defined differently.
1139
A solution would be to let \type {\mathaccent} (or \type {\Umathaccent}) directly
go to the variants, but that is sort of weird. Because a zero width glyph doesn't
match the criteria to span a nucleus it is likely to be skipped anyway, although
there can be a case where the next in size overruns the width of the nucleus in
which case the zero width one is used, which itself is not that nice. We could
actually derive the width from the bounding box, but that would be a bit abnormal,
and it makes no sense to burden the font machinery with that exception. Another
approach we can follow is to just copy the extensibles from \type {U+0302} to
\type {U+02C6} and use that one for \type {\hat} as well as \type {\widehat} and
then make \type {\widehat} an alias to \type {\hat}. After all, the main reason
why we have two commands comes from the fact that \type {\mathaccent} doesn't
take a delimiter but a single character reference (encoded in an integer).
1152
1153Here is the whole list of accents:
1154
1155\starttabulate[||T||T|]
1156\NC \tex{grave} \NC U+0060 \NC \tex{widegrave} \NC U+0300 \NC \NR
1157\NC \tex{ddot}  \NC U+00A8 \NC \tex{wideddot}  \NC U+0308 \NC \NR
1158\NC \tex{bar}   \NC U+00AF \NC \tex{widebar}   \NC U+0304 \NC \NR
1159\NC \tex{acute} \NC U+00B4 \NC \tex{wideacute} \NC U+0301 \NC \NR
1160\NC \tex{hat}   \NC U+02C6 \NC \tex{widehat}   \NC U+0302 \NC \NR
1161\NC \tex{check} \NC U+02C7 \NC \tex{widecheck} \NC U+030C \NC \NR
1162\NC \tex{breve} \NC U+02D8 \NC \tex{widebreve} \NC U+0306 \NC \NR
1163\NC \tex{dot}   \NC U+02D9 \NC \tex{widedot}   \NC U+0307 \NC \NR
1164\NC \tex{ring}  \NC U+02DA \NC \tex{widering}  \NC U+030A \NC \NR
1165\NC \tex{tilde} \NC U+02DC \NC \tex{widetilde} \NC U+0303 \NC \NR
1166\NC \tex{dddot} \NC U+20DB \NC \tex{widedddot} \NC U+20DB \NC \NR
1167\stoptabulate
1168
The only accent that is an exception is the last one but is it really used? It
anyway makes no real sense to assume that users will ever directly input the
\UTF\ characters from the last column, so we can just go for the first one
and use the extensibles from the second and see where we end up. Neither \MATHML\
nor \TEX\ related specifications seem to cover this well, so we can just do what
suits us best.
1175
\startbuffer
\showglyphs
\im {\widehat{a} + \widehat         {aa}} =
\im {\hat    {a} + \hat             {aa}} =
\im {\hat    {a} + \hat[stretch=yes]{aa}} =
\setupmathaccent[top][stretch=yes]
\im {\hat    {a} + \hat             {aa}}
\stopbuffer
1184
1185Because all has to fit into the \CONTEXT\ user interface and because we also want
1186to be backward compatible (command wise), we end up with something:
1187
1188\typebuffer
1189
1190that gives us:
1191
1192\startpacked \glyphscale = \numexpr2*\glyphscale\relax \getbuffer \stoppacked
1193
Now, one problem is of course that users can enter these modifiers as a \UTF\
sequence in the input, just like they do with delimiters. Therefore we do support
the following feature (which is under class control so disabled by default):
1197
\startbuffer
\Umathcode    "02C6   \mathaccentcode 0 "02C6
\edef         \HiHatA {\Uchar"02C6}
\Umathchardef \HiHatB \mathaccentcode 0 "02C6

$ \Uchar"02C6{x} + \HiHatA{xx} + \HiHatB{xx} = \widehat {xxxx} $
\stopbuffer
1205
1206\typebuffer
1207
1208You get this:
1209
1210\start
1211    \pushoverloadmode \getbuffer \popoverloadmode
1212\stop
1213
1214The only cheat here is that normally accents come after the accentee, but we can
1215live with that. After all, it's all about convenience.
1216
There is another aspect of accents that we need to mention here. The hat, tilde,
and check are often used over not only single letters but also small expressions.
So how come that fonts have only very few variants defined? We can imagine that
in eight bit fonts the number of available slots plays a role but in \OPENTYPE\
fonts that is not the case. It therefore can be considered an
oversight that usage of these wide accents has not been communicated well to the
font designers.
1224
1225\def\CrappyHack#1{\im{
1226    #1{a}       + #1{a+b}       + #1{a+b+c} +
1227    #1{a+b+c+d} + #1{a+b+c+d+e} + #1{a+b+c+d+e+f}
1228}\par}
1229
1230\startpacked
1231\CrappyHack\widehat
1232\CrappyHack\widetilde
1233\CrappyHack\widecheck
1234\stoppacked
1235
The previous lines demonstrate that we can actually cheat a little for these
three top accents: we can just scale the last variant horizontally. It was a patch
of a few lines to \LUAMETATEX\ to make this automatic and triggered by setting the
\type {extensible} field in a character table to \type {true} instead of a
recipe. The ingredients to get this working were already there, and it works out
quite well. The only complication was that the \type {flac} feature (that
provides flat accents for cases where the nucleus is rather high) could interfere,
but that was trivial to deal with in the code that does the goodies. \footnote
{When we were testing fonts this took us by surprise when we tested Cambria that
has these flat overloads for the tilde and check. Because \CONTEXT\ supports this
automatically (hidden from the user) one doesn't look in that direction when
testing something.}
1248
When it comes to these delimiters that have no real solution in the font, we can
consider delegating coming up with a glyph to the macro package at the time it is
needed, and we can actually do that. However, this is mostly interesting for
educational usage, where the number of delimiters is predictable and limited.
About a decade ago some mechanism was added to the \MKIV\ math machinery that
supports plugins so that we could use \METAFUN\ to generate (most notably)
square root symbols the way we liked. \footnote {This was a fun project of Alan
and Hans.} The main drawback is that mixing this in means matching to a font, and
that is not always trivial. But it is this kind of trickery that makes working
with \TEX\ fun. That said: what we are discussing here is more fundamental in the
sense that we try to come up with generic engine solutions that just rely on the
fonts. That way complex math with all reasonable symbols is also served.
\footnote {These \METAFUN\ plugins are still possible, but we need to adapt some
to \LMTX\ which will happen as we go.}
1263
Interestingly there are some arrows that act like accents. There are over- and
under ones as well as combining (often zero width) accents. Fonts are not always
consistent in how these extend (the wide ones). Often the combining accents are
smaller and closer to the running text. Traditionally in \TEX\ fonts there are no
extensible arrows: they are constructed from arrow heads, minus and equal signs
with some negative spacing in between. One can therefore wonder if the smaller
combining ones are appreciated by those who want stable math. It definitely means
that we have to make choices. Even more interesting is that while \UNICODE\ has
some means to construct braces from predictable \UNICODE\ slots, there is no way
to do the same with arrows and (indeed) there are fonts out there with shaped
arrows that demand different middle and end pieces. In fact, the same is true for
rules that are not simple rectangles and radical extensions that are not flat
rules either. In all these cases the usage patterns of accents and similar
constructs have not really been fed back into the way \UNICODE\ and \OPENTYPE\
fonts support math. \footnote {One can argue that this is not what \UNICODE\ is
for but if so, then some other bits and pieces also make little sense.}
1280
1281\stopsection
1282
1283\startsection[title=Bullets]
1284
In \TEX\ usage bullets are a bit special. Because fonts had a limited number of slots
available, bullets in for instance itemized lists traditionally were taken from
a math font. The bullet in Computer Modern has a comfortable size and is quite
useful for that. Bullets in text fonts often were (are) relatively small so even when
they were available they were not really used. The official \UNICODE\ slot for
bullet is \type {U+2022} and in this font it shows up as \quote {•}. The \WIKIPEDIA\ page
on bullets (typography) mentions:
1292
\startquotation
    A variant, the bullet operator (\type {U+2219} ∙ \typ {BULLET OPERATOR}) is
    used as a math symbol, akin to the dot operator. Specifically, in logic, $x ∙
    y$ means logical conjunction. It is the same as saying \quotation {x and y}.
\stopquotation
1298
The page also mentions that \quotation {glyphs such as {\switchtobodyfont
[stixtwo]$•$} and {\switchtobodyfont [stixtwo]$◦$}} have \quotation {reversed
variants {\switchtobodyfont [stixtwo]$◘$} and {\switchtobodyfont [stixtwo]$◙$}}
although we haven't seen the reversed ones in \TEX\ documents (yet), like these (we
use \STIX2\ to show them):
1304
\starttabulate[|Tl|l|l|]
\NC U+2022 \NC \switchtobodyfont[stixtwo]$•$ \NC BULLET \NC \NR
\NC U+2023 \NC \switchtobodyfont[stixtwo]$‣$ \NC TRIANGULAR BULLET \NC \NR
\NC U+2043 \NC \switchtobodyfont[stixtwo]$⁃$ \NC HYPHEN BULLET \NC \NR
\NC U+204C \NC \switchtobodyfont[stixtwo]$⁌$ \NC BLACK LEFTWARDS BULLET \NC \NR
\NC U+204D \NC \switchtobodyfont[stixtwo]$⁍$ \NC BLACK RIGHTWARDS BULLET \NC \NR
\NC U+2219 \NC \switchtobodyfont[stixtwo]$∙$ \NC BULLET OPERATOR (math) \NC \NR
\NC U+25CB \NC \switchtobodyfont[stixtwo]$○$ \NC WHITE CIRCLE \NC \NR
\NC U+25CF \NC \switchtobodyfont[stixtwo]$●$ \NC BLACK CIRCLE \NC \NR
\NC U+25D8 \NC \switchtobodyfont[stixtwo]$◘$ \NC INVERSE BULLET \NC \NR
\NC U+25E6 \NC \switchtobodyfont[stixtwo]$◦$ \NC WHITE BULLET \NC \NR
\NC U+29BE \NC \switchtobodyfont[stixtwo]$⦾$ \NC CIRCLED WHITE BULLET \NC \NR
\NC U+29BF \NC \switchtobodyfont[stixtwo]$⦿$ \NC CIRCLED BULLET \NC \NR
\stoptabulate
1319
1320The reverse ones are not really reverse in \STIX2\ as they have bigger circles.
1321There are a few more bullets mentioned but probably only because they have the
1322word bullet in their description and they don't really look like bullets. Given
1323the already discussed lack of granularity in some math symbols with multiple
1324usage it is somewhat surprising that we have a math bullet. The weird looking
1325left- and rightward bullets are kind of hard to distinguish. Let's hope that
1326mathematicians don't discover these!
1327
This brings us to the more general way of looking at these bullets because among
the popular math symbols used in text are also the triangles that (\TEX) math
fonts came with. Where we have only a few commands for circular shapes like \typ
{$\bullet \bigcirc \circ$} giving $\bullet \bigcirc \circ$ we have plenty of
(black) triangles.
1333
For instance, we have \type {\triangledown} and \type {\bigtriangledown} and these
have corresponding \UNICODE\ slots \type {U+25BD} and \type {U+25BF} but when
you try these in for instance \STIX2, Pagella and Cambria you get:
▽ + ▿, ▽ + ? and ? + ?, where the question mark indicates a missing character.
1338
It is for that reason that \type {\triangledown} and \type {\bigtriangledown} are
both defined as using the large one. This test also showed us that we
didn't have to waste time looking up what \MATHML\ has to say about it. A
typeset version of that specification was never a visual highlight and missing
glyphs only make that worse. And, when fonts lack shapes no one uses them
anyway.
1345
1346However, it makes sense to think a bit about how to deal with this properly, and
1347we will likely add some checking to the goodie files for it, so that when we do
1348have them, we use them. \footnote {Most practical is to add this information to
1349the character database which is a bit of work}. But even then, most troublesome
1350is that the size (and even positioning) of these symbols is rather inconsistent
1351across math fonts, but because they are seldom used it doesn't make much sense to
1352compensate for that (read: we just wait till users ask for it).
1353
1354% {\switchtobodyfont[stixtwo]$\char"25BD+\char"25BF$}% +\triangledown+\bigtriangledown$
1355% {\switchtobodyfont[pagella]$\char"25BD+\char"25BF$}% +\triangledown+\bigtriangledown$
1356% {\switchtobodyfont[cambria]$\char"25BD+\char"25BF$}% +\triangledown+\bigtriangledown$
1357
1358\stopsection
1359
1360\startsection[title=Punctuation]
1361
1362There are quite some punctuation symbols in \UNICODE\ but not for math where the
1363main troublemakers are the period, comma, colon and semicolon. The first two can
1364be used as separator in numbers, in which case we don't want any spacing, or they
1365can be part of a (pseudo) sentence in a formula, or they can separate entries in
1366a list (take coordinates).
1367
\starttyping
1.1 + 1.2
(1.1, 1.2)
x + 1.1, x + 1.2
\stoptyping
1373
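A traditional way to tell these cases apart by hand, known from plain \TEX\ and
\LATEX\ usage and shown here only as a sketch (it is not the mechanism that
\CONTEXT\ uses), is to brace a separator so that it becomes an ordinary atom. We
use the comma as decimal separator here because that is where the default
punctuation spacing gets in the way:

\starttyping
$ 1,2 $            % comma is punctuation: a thin space follows
$ 1{,}2 $          % braced comma is ordinary: no space, reads as a number
$ (1{,}1, 1{,}2) $ % decimal commas braced, the list separator is left alone
\stoptyping
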
When used as separator in a sentence, which is more likely in display math than
in inline math, the spacing after it can be either regular (as in text) or wide.
And the symbol can come from the math font or text (and these can actually look
different). In \CONTEXT\ (also pre \LMTX) we have some special trickery at work
for spacing commas and periods but we leave that aside now. What should be noted
is that out|-|of|-|the|-|box spaces are ignored when math is scanned so we cannot
take that surrounding into account when dealing with spacing in the engine.
1381
Although the \UNICODE\ specification provides a classification of characters that
includes punctuation, in practice we need to deal with it ourselves. For instance,
by default a period is not considered punctuation but a comma and semicolon
are, while a colon is a relation!
1386
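In traditional terms this classification shows up in the math codes. The
following lines are how we remember them from \type {plain.tex}, where the first
hexadecimal digit is the class (0 is ordinary, 3 is relation, 6 is punctuation);
take them as illustration, not as \CONTEXT\ code:

\starttyping
\mathcode`\.="013A % ordinary,    family 1, slot "3A
\mathcode`\,="613B % punctuation, family 1, slot "3B
\mathcode`\:="303A % relation,    family 0, slot "3A
\mathcode`\;="603B % punctuation, family 0, slot "3B
\stoptyping
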
Take for instance $f.$ (math italic f followed by a period). Italic correction
and math glyphs have this special relationship and it also shows up in
punctuation. Imagine that we have a sequence of characters, say $fx$. These are
actually two ordinary atoms but in $f,$ we have an ordinary atom followed by a
punctuation atom so here spacing is determined by how these classes are set up.
But, given the shape of the $f$ we actually don't want italic correction here.
1393
\startbuffer
$fx + f. +f, + f: + f; + a. +a, + a: + a; + x. +x, + x: + x;$%
\stopbuffer
1397
1398\startlinecorrection
1399\scale[width=\textwidth]{%
1400    \getbuffer
1401}
1402\blank[halfline]
1403\scale[width=\textwidth]{%
1404    \showmakeup[mathglue]%
1405    \mathspacingmode\plusone
1406    \showfontitalics
1407    \showfontkerns
1408    \showglyphs
1409    \getbuffer
1410}
1411\stoplinecorrection
1412
1413When you zoom in you can see the subtle spacing differences. We can compensate
1414for the semi colon being a bit higher than the period by applying some kern,
1415something that we can set up in the goodie file.
1416
Actually, if we assume that periods only occur in numbers we can make it
punctuation and set it up for digit spacing but then commas etc also get done
that way. A variant is to have two punctuation classes (or cheat and put the
period in the digit class). No matter what we do, no help can be expected from
the documents mentioned: it's mostly a visual thing anyway.
1422
Let's end with the visual aspect: in most fonts the two colons \type {0x003A} and
\type {0x2236} are different: one has more distance between the dots. Which
one? Well, that depends on the font! Latin Modern has a cramped \type {0x2236}
while \STIX2\ has a cramped \type {0x003A}. Cambria has square dots for the
\type {0x003A} and round ones slightly more cramped for \type {0x2236}. Lucida goes
extreme: it has smaller dots far apart for \type {0x2236}. If the idea is that a
reader should get from the shape what it's about one can wonder if texts get read
the way the author intended. Or maybe shapes don't matter. Of course a macro
package can obscure these inconsistencies by setting the math character code of
\type {0x003A} to \type {0x2236} but that only obscures the fact that little
attention has been paid: what one can consider bugs became features.
1434
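Such a remapping could look as follows. This is only a sketch that assumes a
\LUATEX\ like engine where \type {\Umathcode} takes a character followed by
class, family and slot (the same form as used for the hat earlier); in \LMTX\ it
probably has to be wrapped in \type {\pushoverloadmode} and \type
{\popoverloadmode}:

\starttyping
% class 3 (relation), family 0: render U+003A with the glyph of U+2236
\Umathcode "003A 3 0 "2236

$ a : b $
\stoptyping
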
1435\stopsection
1436
1437\startsection[title=Special ones]
1438
There are quite some characters that really depend on a math renderer. Examples
are wide accents, fences, and arrows. Some constructs, like fractions, use rules
and these don't come from \UNICODE\ nor fonts. A mixed case is radicals: there
is a \UNICODE\ point and fonts can provide larger variants. Normally one steps up
a slightly slanted version but when things get large the radical becomes an
extensible and therefore gets an upright shape. The engine is supposed to add a
horizontal rule at the right location. Interesting is that there is no provision
for a right end cap. The reason probably is that \TEX, being the major renderer,
has no combined horizontal and vertical extenders and \OPENTYPE\ doesn't have
that either. Some properties are driven by the fonts' math parameters which sort
of makes the radical rendering a very restricted adventure: it is supposed to be
used for roots only, with or without a degree anchored in the right top area.
It looks like that degree is not really meant to extend much beyond the left edge of
the symbol.
1453
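To make the difference with the traditional setup concrete: below is the plain
\TEX\ definition as we remember it, where the integer carries a small and a large
variant, followed by a sketch of the \UNICODE\ way. The second line assumes the
\LUATEX\ syntax of \type {\Uradical} (a family followed by a slot) and is not
necessarily how \CONTEXT\ defines its roots:

\starttyping
% plain TeX: small variant family 2 slot "70, large variant family 3 slot "70
\def\sqrt{\radical"270370 }

% OpenType: one code point, variants and extensible come from the font
$ \Uradical 0 "221A {x + 1} $
\stoptyping
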
In \UNICODE\ there is an actuarian character \type {U+20E7} and support in fonts
is not that good. We do support it because we ran into it in \MATHML. However, it is
a hack. The symbol as provided by fonts is rather useless.
1457
\startbuffer
$ \sqrt {x + 1} + \annuity{x + 1} $
\stopbuffer
1461
1462\typebuffer
1463
1464Let's see how it renders:
1465
1466\startlinecorrection
1467\scale[width=.5\textwidth]{\getbuffer}
1468\stoplinecorrection
1469
1470We take the dimensions of a radical as template and when we look at the bare
1471glyphs we see this:
1472
1473\startlinecorrection
1474\scale[height=2\lineheight]{$\char"221A \enspace \char"20E7$}
1475\stoplinecorrection
1476
Basically we have a right actuarian character like we have a left radical. But in
this case the rule will go left instead of right. This is implemented on top of
radicals and driven by \type {\Udelimited} that takes two delimiters and
doesn't scan for a degree. For two-sided roots (with degree) we have \type
{\Urooted}. And like normal radicals the delimited one adapts itself to the
content:
1483
\startbuffer
$ \sqrt {x + \frac{1}{x}} + \annuity {x + \frac{1}{x}} $
\stopbuffer
1487
1488\typebuffer
1489
1490So we get:
1491
1492\startlinecorrection
1493\scale[width=.5\textwidth]{\showstruts \getbuffer}
1494\stoplinecorrection
1495
1496For the record: in \CONTEXT\ spacing is also driven by the struts and because we
1497use the radicals renderer the gap and distance parameters also apply. It might
1498look spacy, but keep in mind that we want radicals to look similar when we have
1499more of them in line, and we can configure all. We have also enabled the feature
1500that radicals at the same level are normalized in height and depth. Here are some
1501variants:
1502
\startbuffer
$ \lannuity  {x + \frac{1}{x}} +
  \rannuity  {x + \frac{1}{x}} +
  \lrannuity {x + \frac{1}{x}} $
\stopbuffer
1508
1509\typebuffer
1510
1511This gives:
1512
1513\startlinecorrection
1514\scale[width=.75\textwidth]{\getbuffer}
1515\stoplinecorrection
1516
1517So we can have a mix of left, right and both end radical like symbols that
1518encompass the nucleus. We're not aware of more such characters in \UNICODE\ but
1519when they show up we are prepared. Only real usage can result in some parameters
1520being fine|-|tuned.
1521
1522\stopsection
1523
1524% \startsection[title=Summary]
1525%
1526% Here we give a summary of some of the things that added on top of \UNICODE\ and
1527% \OPENTYPE\ math in order to be able to properly render these more complex atoms
1528% and molecules.
1529%
1530% \stopsection
1531
1532\startsection[title=Final words]
1533
This text was written in 2022 when we were working on math, extending the goodie
files with new tweaks, checking support in fonts and updating manuals. But, as we
moved forward, for instance with adapting \TYPEONE\ support of Antykwa and Iwona
to the new possibilities, again we had to go back in time and figure out why
things were actually done in certain ways. And I have to admit that we had some
good laughs and quite some fun on seeing how strangely and inconsistently the
supposedly structured and logical \TEX\ ecosystem deals with math. A wrapup like
this is never complete and we can keep adding to it so just consider it to be a
momentary impression.
1543
Personally I have to admit that I've always overestimated what happened outside
the \CONTEXT\ bubble, especially given the claims made. Consistency in \UNICODE\
math is probably not as good as it could have been and the same is true for
\OPENTYPE\ math support, but maybe I'm naive in expecting consistency and logic
in math related work. The mere fact that Donald Knuth pays a lot of attention to
the math in his writing doesn't automatically translate into all \TEX ies doing the
same. I don't claim that \CONTEXT\ is doing better but I do hope that its users
keep going for the best outcome.
1552
One final note. In \CONTEXT\ we always tried to keep up with developments and
\UNICODE\ input as well as using \OPENTYPE\ math fonts are part of that. However,
because we're not part of the \quote {gremia of \TEX\ math and related coding} it
hardly matters what our opinions are with respect to these issues. The best we
can do is adapt to whatever shows up, be it bad or good. It is however kind of
funny to see (by now rusty) problems that have been noticed already long ago
being presented as kind of new. Hopefully staying ahead and|/|or adapting with
specific solutions doesn't backfire too hard on the \CONTEXT\ users. If so, we're
sorry for that. As long as they can render their documents well, it doesn't
matter that much anyway. After all, we can always just blame \quote {the others
involved}.
1564
1565\stopsection
1566
1567\startsection[title=Resources]
1568
\starttyping
[1] en.wikipedia.org/wiki/Slash_(punctuation)
[2] www.unicode.org/reports/tr25
[3] www.w3.org/TR/MathML3
[4] www.unicode.org/Public/math/revision-15/MathClass-15.txt
[5] en.wikipedia.org/wiki/Vertical_bar
[6] en.wikipedia.org/wiki/Dash
[7] en.wikipedia.org/wiki/Commercial_minus_sign
[8] en.wikipedia.org/wiki/Division_sign
[9] en.wikipedia.org/wiki/Bullet_(typography)
\stoptyping
1580
1581\stopsection
1582
1583% After reading the \UNICODE\ report about math I don't feel too guilty when people
1584% complain about the \CONTEXT\ manuals. It is a curious mix of discussing
1585% organization of symbols, rendering, usage, structure, exchange, parsing,
1586% confusion, etc. and it is clearly a mix of experiences with the web, word
1587% processing and \TEX\ and as such not that useable because it is just not how
1588% \TEX\ works with input and fonts and how users perceive matters. But it
1589% definitely helps to get an idea why we ended up with the current situation: the
1590% unification of math was more a combination of what was there and not a fresh
1591% start. Maybe that is not really possible anyway. If we flash forward a couple of
1592% pages it will all look the same to us as stone age chiseling in stone.
1593
1594\stopchapter
1595
1596\stopcomponent
1597