% language=us runpath=texruns:manuals/musings

\startcomponent musings-treasures

\environment musings-style

\startchapter[title={Hidden treasures}]

\startlines \setupalign[flushright]
Hans Hagen
Hasselt 2020
February 2020
\stoplines

At \CONTEXT\ meetings we always find our moments to reflect on the interesting
things that relate to \TEX\ that we have run into. Among those we discussed were
some of the historic treasures one can run into when one looks at source files. I
will show examples from several domains in the ecosystem and we hereby invite the
reader to come up with other interesting observations, not so much in order to
criticize the fantastic open source efforts related to \TEX, but just to indicate
how decades of development and usage are reflected in the code base and usage, if
only to make it part of the history of computing.

I start with the plain \TEX\ format. At the top of that file we run into this:

\starttyping[style=\ttx]
% The following changes define internal codes as recommended
% in Appendix C of The TeXbook:

\mathcode`\^^@="2201 % \cdot
\mathcode`\^^A="3223 % \downarrow
\mathcode`\^^B="010B % \alpha
\mathcode`\^^C="010C % \beta
\mathcode`\^^D="225E % \land
\mathcode`\^^E="023A % \lnot
\mathcode`\^^F="3232 % \in
...
\mathcode`\^^Y="3221 % \rightarrow
\mathcode`\^^Z="8000 % \ne
\mathcode`\^^[="2205 % \diamond
\mathcode`\^^\="3214 % \le
\mathcode`\^^]="3215 % \ge
\mathcode`\^^^="3211 % \equiv
\mathcode`\^^_="225F % \lor
\stoptyping

This means that when you manage to key in one of these recommended character
codes that in \ASCII\ sits below the space slot, you will get some math symbol,
given that you are in math mode. Now, if you also consider that the plain \TEX\
format is pretty compact and that no bytes are wasted,\footnote {Such definitions
don't take additional space in the format file.} you might wonder what these
lines do there. The answer is simple: there were keyboards out there that had
these symbols. But, by the time \TEX\ became popular, the dominance of the \IBM\
keyboard let those memories fade away. This is just Don's personal touch I guess.
Of course the question remains whether the sources of TAOCP contain these
characters.
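
A minimal sketch of the effect, assuming plain \TEX\ and using the \type {^^Y}
notation to stand in for a key that most keyboards no longer have. It is only an
illustration and not something taken from the plain \TEX\ sources; both formulas
should come out the same:

\starttyping[style=\ttx]
% plain TeX; ^^Y denotes character 25, as if it had been typed on a
% keyboard that has an arrow in that slot
$x ^^Y y$         % picks up the math code "3221 assigned above
$x \rightarrow y$ % the usual way to ask for the same symbol
\bye
\stoptyping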

There is another interesting hack in the plain \TEX\ file, one that actually,
when I first looked at the file, didn't immediately make sense to me.

\starttyping[style=\ttx]
\font\preloaded=cmti9
\font\preloaded=cmti8
\font\preloaded=cmti7

\let\preloaded=\undefined
\stoptyping

What happens here is that a bunch of fonts get defined and they all use the same
name. Eventually that name is made undefined. The reason that these definitions
are there is that when \TEX\ dumps a format file, the information that comes from
those fonts is embedded too (dimensions, ligatures, kerns, parameters and math
related data). It is an indication that in those days it was more efficient to
have them preloaded (that is why they use that name) than loading them at
runtime. The fonts are loaded but you can only access them when you define them
again! Of course nowadays that makes less sense, especially because storage is
fast and operating systems do a nice job at caching files in memory so that
successive runs have font files available already.
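
As a small sketch in plain \TEX\ (the name \type {\nineit} is made up for this
example): defining one of these fonts again under a name of your own is what
makes it accessible. Because the metrics already sit in the format, this should
not trigger reading the \type {tfm} file again:

\starttyping[style=\ttx]
% plain TeX; \nineit is a hypothetical name, cmti9 is one of the
% fonts preloaded above
\font\nineit=cmti9
{\nineit This line is set in nine point text italic.}
\bye
\stoptyping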

Talking of fonts, one of the things a new \TEX\ user will notice and also one of
the things users love to brag about is ligatures. If you run the \type {tftopl}
program on a file like \type {cmr10.tfm} you will get a verbose representation of
the font. Here are some lines:

\starttyping[style=\ttx]
(LABEL C f)  (LIG C i O 14) (LIG C f O 13) (LIG C l O 15)
(LABEL O 13) (LIG C i O 16) (LIG C l O 17)
(LABEL C `)  (LIG C ` C \)
(LABEL C ')  (LIG C ' C ")
(LABEL C -)  (LIG C - C {)
(LABEL C {)  (LIG C - C |)
(LABEL C !)  (LIG C ` C <)
(LABEL C ?)  (LIG C ` C >)
\stoptyping

A \type {C} is followed by a character in its \ASCII\ representation and an
\type {O} by an octal number; in both cases this indicates a position in the
font. So, consider the first two lines to be a puzzle: they define the fi, ff, fl
ligatures as well as the ffi and ffl ones. Do you see how ligatures are chained?

But anyway, what do these other lines do there? It looks like \type {``} becomes
the character in the backslash slot and \type {''} the one in the double quote
slot. Keep in mind that \TEX\ treats the backslash specially and when you want
it, it will be taken from elsewhere. But still, these two ligatures look
familiar: they point to slots that have the left and right double
quotes.\footnote {\CONTEXT\ never assumed this and encourages users to use the
quotation macros. Those \type {``quotes''} look horrible in a source anyway.}
They are not really ligatures but abuse the ligature mechanism to achieve a
similar effect. The last four lines are the most interesting: these are
ligatures that (probably) no \TEX\ user ever uses or encounters. They are again
something from the past. Also, chances are low that you mistakenly enter these
sequences and the follow-up Latin Modern fonts don't have them anyway.
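
For those who want to see what these ligature programs amount to, here is a
small plain \TEX\ test, meant as an illustration only. The name \type {\demo} is
made up, the slot numbers in the comments are the octal positions of the
characters named in the listing above, and what sits in those slots assumes the
usual Computer Modern text layout:

\starttyping[style=\ttx]
% plain TeX; \demo is a hypothetical name, the comments give the
% slot in which each ligature program ends
\font\demo=cmr10 \demo
fi ff fl ffi ffl  % slots '14 '13 '15 '16 '17; ffi and ffl go via ff
``quoted''        % slots '134 and '42: the double quotes
1--2 and a---b    % slots '173 and '174: en dash and em dash
!` and ?`         % slots '74 and '76: inverted punctuation
\bye
\stoptyping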

Actually, if you look at the \METAFONT\ and \METAPOST\ sources you can find
lines like these (here taken from \type {mp.w} in the \LUATEX\ repository):

\starttyping[style=\ttx]
@ @<Put each...@>=
mp_primitive (mp, "=:", mp_lig_kern_token, 0);
@:=:_}{\.{=:} primitive@>;
mp_primitive (mp, "=:|", mp_lig_kern_token, 1);
@:=:/_}{\.{=:\char'174} primitive@>;
mp_primitive (mp, "=:|>", mp_lig_kern_token, 5);
@:=:/>_}{\.{=:\char'174>} primitive@>;
mp_primitive (mp, "|=:", mp_lig_kern_token, 2);
@:=:/_}{\.{\char'174=:} primitive@>;
mp_primitive (mp, "|=:>", mp_lig_kern_token, 6);
@:=:/>_}{\.{\char'174=:>} primitive@>;
mp_primitive (mp, "|=:|", mp_lig_kern_token, 3);
@:=:/_}{\.{\char'174=:\char'174} primitive@>;
mp_primitive (mp, "|=:|>", mp_lig_kern_token, 7);
@:=:/>_}{\.{\char'174=:\char'174>} primitive@>;
mp_primitive (mp, "|=:|>>", mp_lig_kern_token, 11);
@:=:/>_}{\.{\char'174=:\char'174>>} primitive@>;
\stoptyping

I won't explain what happens there (as I would have to reread the relevant
sections of \TEX: The Program) but the magic is in the special sequences: \typ
{=: =:| =:|> |=: |=:> |=:| |=:|> |=:|>>}. Similar sequences are used in some font
related files. I bet that most \METAPOST\ users never entered these as they
relate to defining ligatures for fonts. Most users know that combining a \type
{f} and \type {i} gives a \type {fi} but there are other ways to combine too. One
can praise today's capabilities of \OPENTYPE\ ligature building but \TEX\ was not
stupid either! But these options were never really used and this treasure will
stay hidden. Actually, to come back to a previous remark about abusing the
ligature mechanism: \OPENTYPE\ fonts are just as sloppy as \TEX\ with the quotes:
there a ligature is just a name for a multiple|-|to|-|one mapping which is not
always the same as a ligature.

But there are even more surprises with fonts. When Alan Braslau and I redid the
bibliography subsystem of \CONTEXT\ with help from \LUA, I wrote a converter in
that language. I actually did that the way I normally do: look at a file (in this
case a \BIBTEX\ file) and write a parser from scratch. However, at some point we
wondered how exactly strings got concatenated so I decided to locate the source
and look at it there. When I scrolled down I noticed a peculiar section:

\starttyping[style=\ttx]
@^character set dependencies@>
@^system dependencies@>
Now we initialize the system-dependent |char_width| array, for which
|space| is the only |white_space| character given a nonzero printing
width.  The widths here are taken from Stanford's June~'87
$cmr10$~font and represent hundredths of a point (rounded), but since
they're used only for relative comparisons, the units have no meaning.

@d ss_width = 500        {character |@'31|'s width in the $cmr10$ font}
@d ae_width = 722        {character |@'32|'s width in the $cmr10$ font}
@d oe_width = 778        {character |@'33|'s width in the $cmr10$ font}
@d upper_ae_width = 903  {character |@'35|'s width in the $cmr10$ font}
@d upper_oe_width = 1014 {character |@'36|'s width in the $cmr10$ font}

@<Set initial values of key variables@>=
for i:=0 to @'177 do char_width[i] := 0;
@#
char_width[@'40] := 278;
char_width[@'41] := 278;
char_width[@'42] := 500;
char_width[@'43] := 833;
char_width[@'44] := 500;
char_width[@'45] := 833;
\stoptyping

Do you see what happens here? There are hard coded font metrics in there! As far
as I can tell, these are used in order to guess the width of the margin for
references. Of course that won't work well in general, simply because fonts
differ. But given that the majority of documents that need references are using
Computer Modern fonts, it actually might work well, especially with plain \TEX\
because that is also hardwired for 10pt fonts. Personally I'd go for a multipass
analysis (or maybe have \BIBTEX\ produce a list of those labels for the purpose
of analysis), but for sure at that time any extra pass was costly in terms of
performance. That code stays around of course. It makes for some nice deduction
by historians in the future.

I bet that one can also find weird or unexpected code in \CONTEXT, and definitely
on the machines of \TEX\ users all around the world. For instance, now that most
people use \UTF8\ all those encoding related hacks have become history. On the
other hand, as history tends to cycle, bitmap symbolic fonts suddenly can look
modern in a time when emoji are often bitmaps. We should guard our treasures.

\stopchapter

\stopcomponent