% language=us

\startcomponent hybrid-math

\environment hybrid-environment

\startchapter[title={Handling math: A retrospective}]

{This is \TUGBOAT\ article .. reference needed.}

% In this article I will reflect on how the plain \TEX\ approach to math
% fonts influenced the way math has been dealt with in \CONTEXT\ \MKII\
% and why (and how) we divert from it in its follow up \MKIV, now that
% \LUATEX\ and \OPENTYPE\ math have come around.

When you start using \TEX, you cannot help but notice that math plays an
important role in this system. As soon as you dive into the code you will see
that there is a concept of families that is closely related to math typesetting.
A family is a set of three sizes: text, script and scriptscript.

\startformula
a^{b^{c}} = \frac{d}{e}
\stopformula

The smaller sizes are used in superscripts and subscripts and in more complex
formulas where items are stacked on top of each other.

It is no secret that the latest math font technology is not driven by the \TEX\
community but by Microsoft. They have taken a good look at \TEX\ and extended the
\OPENTYPE\ font model with the information that is needed to do things similar to
\TEX\ and beyond. It is a firm proof of \TEX's abilities that after some 30 years
it is still seen as the benchmark for math typesetting. One can only speculate
what Don Knuth would have come up with if today's desktop hardware and printing
technology had been available in those days.

As a reference implementation of a font Microsoft provides Cambria Math. In the
specification the three sizes are there too: a font can provide specifically
designed script and scriptscript variants for text glyphs where that is relevant.
Control is exercised with the \type {ssty} feature.

Another inheritance from \TEX\ and its fonts is the fact that larger symbols can
be made out of snippets and these snippets are available as glyphs in the font,
so no special additional (extension) fonts are needed to get for instance really
large parentheses. The information of when to move up one step in size (given
that there is a larger shape available) or when and how to construct larger
symbols out of snippets is there as well. Placement of accents is made easy by
information in the font and there are a whole lot of parameters that control the
typesetting process. Of course you still need machinery comparable to \TEX's math
subsystem but Microsoft Word has such capabilities.

I'm not going to discuss the nasty details of providing math support in \TEX, but
rather pay some attention to an (at least for me) interesting side effect of
\TEX's math machinery. There are excellent articles by Bogus\l{}aw Jackowski and
Ulrik Vieth about how \TEX\ constructs math and of course Knuth's publications
are the ultimate source of information as well.

Even if you only glance at the implementation of traditional \TEX\ font support,
the previously mentioned families are quite evident. You can have 16 of them but
4 already have a special role: the upright roman font, math italic, math symbol
and math extension. These give us access to some 1000 glyphs in theory, but when
\TEX\ showed up it was mostly a 7-bit engine and input of text was often also
7-bit based, so in practice far fewer shapes are available, and subtracting the
snippets that make up the large symbols brings down the number again.
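
A rough count, assuming one font per family and ignoring the script sizes, shows
where these numbers come from: four families of 8-bit fonts offer 256 slots each,
while four families of 7-bit fonts offer only 128 slots each.

\startformula
4 \times 2^8 = 1024 \qquad 4 \times 2^7 = 512
\stopformula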

Now, say that in a formula you want to have a bold character. This character is
definitely not in the 4 mentioned families. Instead you enable another one, one
that is linked to a bold font. And, of course there is also a family for bold
italic, slanted, bold slanted, monospaced, maybe smallcaps, sans serif, etc. To
complicate things even more, there are quite a few symbols that are not covered
in the foursome so we need another 2 or 3 families just for those. And yes, bold
math symbols will demand even more families.

\startformula
a + \bf b + \bi c = \tt d + \ss e + \cal f
\stopformula

Try to imagine what this means for implementing a font system. When (in for
instance \CONTEXT) you choose a specific body font at a certain size, you not
only switch the regular text fonts, you also initialize math. When dealing with
text and a font switch there, it is no big deal to delay font loading and
initialization till you really need the font. But for math it is different. In
order to set up the math subsystem, the families need to be known and set up and
as each one can have three members you can imagine that you easily initialize
some 30 to 40 fonts. And, when you use several math setups in a document,
switching between them involves at least some re-initialization of those
families.

When Taco Hoekwater and I were discussing \LUATEX\ and especially what was needed
for math, it was sort of natural to extend the number of families to 256. After
all, years of traditional usage had demonstrated that it was pretty hard to come
up with math font support where you could freely mix a whole regular and a whole
bold set of characters simply because you ran out of families. This is a side
effect of math processing happening in several passes: you can change a family
definition within a formula, but as \TEX\ remembers only the family number, a
later definition overloads a previous one. The previous example in a traditional
\TEX\ approach can result in:

\starttyping
a + \fam7 b + \fam8 c = \fam9 d + \fam10 e + \fam11 f
\stoptyping

Here the \type{a} comes from the family that reflects math italic (most likely
family~1) and \type {+} and \type {=} can come from whatever family is told to
provide them (this is driven by their math code properties). As family numbers
are stored in the identification pass and resolve to real fonts only in the
typesetting pass, you can imagine that overloading a family in the middle of a
definition is not an option: it's the number that gets stored and not what it is
bound to. As it is unlikely that we actually use more than 16 families we could
have come up with a pool approach where families are initialized on demand but
that does not work too well with grouping (or at least it complicates matters).
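
A minimal plain \TEX\ sketch of this effect, assuming the standard plain
definitions of \type {\bffam} and \type {\tentt}. The intent behind it (a bold
\type {a} followed by a monospaced \type {b}, both through one family) is a
made-up example:

\starttyping
% Both letters are tagged with the family number of \bffam. The font that
% belongs to that number is looked up only when the formula is finished,
% so the assignment in the middle also affects the a: both letters end
% up in \tentt.
$\fam\bffam a \textfont\bffam=\tentt b$
\bye
\stoptyping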

So, when I started thinking of rewriting the math font support for \CONTEXT\
\MKIV, I still had this nicely increased upper limit in mind, if only because I
was still thinking of support for the traditional \TEX\ fonts. However, I soon
realized that it made no sense at all to stick to that approach: \OPENTYPE\ math
was on its way and in the meantime we had started the math font project. But
given that this would easily take some five years to finish, an intermediate
solution was needed. As we can make virtual fonts in \LUATEX, I decided to go
that route and for several years already it has worked quite well. For the moment
the traditional \TEX\ math fonts (Computer Modern, px, tx, Lucida, etc.) are
virtualized into a pseudo|-|\OPENTYPE\ font that follows the \UNICODE\ math
standard. So instead of needing more families, in \CONTEXT\ we could do with
fewer. In fact, we can do with only two: one for regular and one for bold,
although, thinking of it, there is nothing that prevents us from mixing different
font designs (or preferences) in one formula but even then a mere four families
would still be fine.

To summarize this, in \CONTEXT\ \MKIV\ the previous example now becomes:

\starttyping
U+1D44E + U+1D41B + U+1D484 = U+1D68D + U+1D5BE + U+1D4BB
\stoptyping

For a long time I have been puzzled by the fact that one needs so many fonts for
a traditional setup. It was only after implementing the \CONTEXT\ \MKIV\ math
subsystem that I realized that all of this was only needed in order to support
alphabets, i.e.\ just a small subset of a font. In \UNICODE\ we have quite a few
math alphabets and in \CONTEXT\ we have ways to map a regular keyed-in (say)
\quote{a} onto a bold or monospaced one. When writing that code I hadn't even
linked the \UNICODE\ math alphabets to the family approach for traditional \TEX.
Not being a mathematician myself I had no real concept of systematic usage of
alternative alphabets (apart from the occasional different shape for an
occasional physics entity).

Just to give an idea of what \UNICODE\ defines: there are alphabets in regular
(upright), bold, italic, bold italic, script, bold script, fraktur, bold fraktur,
double|-|struck, sans|-|serif, sans|-|serif bold, sans|-|serif italic,
sans|-|serif bold italic and monospace. These are regular alphabets with upper-
and lowercase characters complemented by digits and occasionally Greek.

It was a few years later (somewhere near the end of 2010) that I realized that a
lot of the complications in (and load on) a traditional font system were simply
due to the fact that in order to get one bold character, a whole font had to be
loaded in order for families to express themselves. And that in order to have
several fonts being rendered, one needed lots of initialization for just a few
cases. Instead of wasting one font and family for an alphabet, one could as well
have combined 9 (upper and lowercase) alphabets into one font and use an offset
to access them (in practice we have to handle the digits too). Of course that
would have meant extending the \TEX\ math machinery with some offset mechanism
or, as an alternative, some extensive mathcode juggling, but that also has some
overhead.
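
The offset idea can be sketched with the \UNICODE\ math alphabets themselves. The
macro \type {\mathalphabetslot} below is a hypothetical helper (not an existing
\CONTEXT\ or plain \TEX\ command) and it assumes an engine that provides
\type {\numexpr}. It also ignores the holes that some alphabets have:

\starttyping
% #1: code point of the lowercase a of an alphabet
% #2: a lowercase letter from a to z
\def\mathalphabetslot#1#2{\numexpr#1+`#2-`a\relax}

% The bold a sits at U+1D41A, so the bold b resolves to "1D41B,
% which is reported here as the decimal number 119835.
\message{\the\mathalphabetslot{"1D41A}{b}}
\stoptyping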

If you look at the plain \TEX\ definitions for the family related matters, you
can learn a few things. First of all, there are the regular four families
defined:

\starttyping
\textfont0=\tenrm \scriptfont0=\sevenrm \scriptscriptfont0=\fiverm
\textfont1=\teni  \scriptfont1=\seveni  \scriptscriptfont1=\fivei
\textfont2=\tensy \scriptfont2=\sevensy \scriptscriptfont2=\fivesy
\textfont3=\tenex \scriptfont3=\tenex   \scriptscriptfont3=\tenex
\stoptyping

Each family has three members. There are some related definitions
as well:

\starttyping
\def\rm      {\fam0\tenrm}
\def\mit     {\fam1}
\def\oldstyle{\fam1\teni}
\def\cal     {\fam2}
\stoptyping

So, with \type {\rm} you not only switch to a family (in math mode) but you also
enable a font. The same is true for \type {\oldstyle} and this actually brings us
to another interesting side effect. The fact that oldstyle numerals come from a
math font has implications for the way this rendering is supported in macro
packages. As naturally all development started when \TEX\ came around, package
design decisions were driven by the basic fact that there was only one math font
available. And, as a consequence most users used the Computer Modern fonts and
therefore there was never a real problem in getting those oldstyle characters in
your document.
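
In plain \TEX\ the double role of \type {\oldstyle} looks as follows. This is a
small sketch that assumes the plain definitions shown above and the Computer
Modern math italic font, where the digit slots hold oldstyle figures:

\starttyping
% Text mode: the \teni part of \oldstyle selects the math italic font.
Printed in {\oldstyle 1984}.

% Math mode: the \fam1 part of \oldstyle redirects the digits to family 1.
$\oldstyle 1984$
\bye
\stoptyping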

However, oldstyle figures are a property of a font design (like table digits) and
as such not specially related to math. And, why should one tag each number then?
Of course it's good practice to tag extensively (and tagging makes switching
fonts easy) but to tag each number is somewhat over the top. When more fonts
(usable in \TEX) became available it became more natural to use a proper oldstyle
font for text and \type {\oldstyle} more definitely ended up as a math
command. This was not always easy to understand for users who primarily used
\TEX\ for anything but math.

Another interesting aspect is that with \OPENTYPE\ fonts oldstyle figures are
again an optional feature, but now at a different level. There are a few more
such traditional issues: bullets often come from a math font as well (which works
out ok as they have nice, not so tiny bullets). But the same is true for
triangles, squares, small circles and other symbols. And, to make things worse,
some come from the regular \TEX\ math fonts, and others from additional ones,
like the \AMS\ symbols. Again, \OPENTYPE\ and \UNICODE\ will change this as now
these symbols are quite likely to be found in fonts as they have a larger
repertoire of shapes.

From the perspective of going from \MKII\ to \MKIV\ it boils down to changing old
mechanisms that need to handle all this (dependent on the availability of fonts)
to cleaner setups. Of course, as fonts are never completely consistent, or
complete for that matter, and features can be implemented incorrectly or
incompletely we still end up with issues, but (at least in \CONTEXT) dealing with
that has been moved to runtime manipulation of the fonts themselves (as part of
the so-called font goodies).

Back to the plain definitions, we now arrive at some new families:

\starttyping
\newfam\itfam \def\it{\fam\itfam\tenit}
\newfam\slfam \def\sl{\fam\slfam\tensl}
\newfam\bffam \def\bf{\fam\bffam\tenbf}
\newfam\ttfam \def\tt{\fam\ttfam\tentt}
\stoptyping

The plain \TEX\ format was never meant as a generic solution but instead was an
example of a macro set and serves as a basis for styles used by Don Knuth for his
books. Nevertheless, in spite of the fact that \TEX\ was made to be extended,
pretty soon it became frozen and the macros and font definitions that came with
it became the benchmark. This might be the reason why \UNICODE\ now has a
monospaced alphabet. Once you've added monospaced you might as well add more
alphabets as for sure in some countries they have their own preferences.
\footnote {At the Dante 2011 meeting we had interesting discussions during dinner
about the advantages of using Sütterlinschrift for vector algebra and the
possibilities for providing it in the upcoming \TeX\ Gyre math fonts.}

As with \type {\rm}, the related commands are meant to be used in text as well.
More interesting is to see what follows now:

\starttyping
\textfont        \itfam=\tenit
\textfont        \slfam=\tensl

\textfont        \bffam=\tenbf
\scriptfont      \bffam=\sevenbf
\scriptscriptfont\bffam=\fivebf

\textfont        \ttfam=\tentt
\stoptyping

Only the bold definition has all members. This means that (regular) italic,
slanted, and monospaced are not actually that much math at all. You will probably
only see them in text inside a math formula. From this you can deduce that
contrary to what I said before, these variants were not really meant for
alphabets, but for text in which case we need complete fonts. So why do I still
conclude that we don't need all these families? In practice text inside math is
not always done this way but with a special set of text commands. This is a
consequence of the fact that when we add text, we want to be able to do so in
each language with even language|-|specific properties supported. And, although a
family switch like the above might do well for English, as soon as you want
Polish (extended Latin), Cyrillic or Greek you definitely need more than a family
switch, if only because encodings come into play. In that respect it is
interesting that we do have a family for monospaced, but that \type {\Im} and
\type {\Re} have symbolic names, although a more extensive setup can have a
blackboard family switch.
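
The missing members have a visible consequence in plain \TEX. The sketch below
adds a script member to the italic family. The name \type {\sevenit} is made up
for this example and it is assumed that the Computer Modern font
\type {cmti7} is installed:

\starttyping
% Without the next two lines TeX reports that the \scriptfont of this
% family is undefined and drops the letters of the subscript.
\font\sevenit=cmti7
\scriptfont\itfam=\sevenit

% The scriptscript member is still not set, so this only helps one level.
$x_{\it max}$
\bye
\stoptyping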

By the way, the fact that \TEX\ came with italic alongside slanted also has some
implications. Normally a font design has either italic or something slanted (then
called oblique). But, Computer Modern came with both, which is no surprise as
there is a metadesign behind it. And therefore macro packages provide ways to
deal with those variants alongside each other. I wonder what would have happened
if this had not been the case. Nowadays there is always this regular, italic (or
oblique), bold and bold italic set to deal with, and the whole set can become
lighter or bolder.

In \CONTEXT\ \MKII, however, the set is larger as we also have slanted and bold
slanted and even smallcaps, so most definition sets have 7~definitions instead
of~4. By the way, smallcaps is also special. If Computer Modern had had smallcaps
for all variants, support for them in \CONTEXT\ undoubtedly would have been kept
out of the mentioned~7 and would always have been a new typeface definition
(i.e.\ another fontclass for insiders). So, when something would have to be
smallcaps, one would simply switch the whole lot to smallcaps (bold smallcaps,
etc.). Of course this is what normally happens, at least in my setups, but
nevertheless one can still find traces of this original Computer
Modern|-|driven approach. And while we are at it: the whole font system still has
the ability to use design sizes and combine different ones in sets, if only
because in Computer Modern you don't have all sizes. The above definitions use
ten, seven and five, but for instance for an eleven point setup you need to
creatively choose the proper originals and scale them to the right family size.
Nowadays only a few fonts ship with multiple design sizes, and although some can
be compensated with clever hinting it is a pity that we can apply this mechanism
only to the traditional \TEX\ fonts.
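
As an illustration, an eleven point roman family could be assembled as follows in
plain \TEX. The font names and the choice of eight and six point for the smaller
members are assumptions made for this sketch, not the values that \CONTEXT\ uses:

\starttyping
% Computer Modern has no cmr11, so the text member is a scaled cmr10.
\font\elevenrm=cmr10 at 11pt
% For the smaller members real design sizes are available.
\font\eightrm=cmr8
\font\sixrm=cmr6

\textfont0=\elevenrm \scriptfont0=\eightrm \scriptscriptfont0=\sixrm
\stoptyping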

Concerning the slanting we can remark that \TEX ies are so fond of this that they
even extended the \TEX\ engines to support slanting in the core machinery (or
more precisely in the backend while the frontend then uses adapted metrics). So,
slanting is available for all fonts.

This brings me to another complication in writing a math font subsystem: bold.
During the development of \CONTEXT\ \MKII\ I was puzzled by the fact that user
demands with respect to bold were so inconsistent. This is again related to the
way a somewhat simple setup looks: explicitly switching to bold characters or
symbols using a \type {\bf} (alike) switch. This works quite well in most cases,
but what if you use math in a section title? Then the whole lot should be in bold
and an embedded bold symbol should be heavy (i.e.\ more bold than bold). As a
consequence (and due to limited availability of complete bold math fonts) in
\MKII\ there are several bold strategies implemented.

However, in a \UNICODE\ universe things become surprisingly easy as \UNICODE\
defines those symbols that have bold companions (whatever you want to call them,
mostly math alphanumerics) so a proper math font has them already. This limited
subset is often available in a font collection and font designers can stick to
that subset. So, eventually we get one regular font (with some bold glyphs
according to the \UNICODE\ specification) and a bold companion that has heavy
variants for those regular bold shapes.

The simple fact that \UNICODE\ distinguishes regular and bold simplifies an
implementation as it's easier to take that as a starting point than users who for
all their goodwill see only their small domain of boldness.

It might sound like \UNICODE\ solves all our problems but this is not entirely
true. For instance, the \UNICODE\ principle that no character should be there
more than once has resulted in holes in the \UNICODE\ alphabets, especially
Greek, blackboard, fraktur and script. As exceptions were made for non|-|math I
see no reason why the few math characters that now put holes in an alphabet could
not have been there. As with many standards, following some principles too
strictly eventually results in all applications that follow the standard having
to implement the same ugly exceptions explicitly. As some standards aim for
longevity I wonder how many programming hours will be wasted this way.

This brings me to the conclusion that in practice 16 families are more than
enough in a \UNICODE|-|aware \TEX\ engine especially when you consider that for a
specific document one can define a nice set of families, just as in plain \TEX.
It's simply the fact that we want to make a macro package that does it all and
therefore has to fit all possible math demands into one mechanism that
complicates life. And the fact that \UNICODE\ clearly demonstrates that we're
only talking about alphabets has brought (at least) \CONTEXT\ back to its basics:
a relatively simple, few|-|family approach combined with a dedicated alphabet
selection system. Of course eventually users may come up with new demands and we
might again end up with a mess. After all, it's the fact that \TEX\ gives us
control that makes it so much fun.

\stopchapter

\stopcomponent