% language=us runpath=texruns:manuals/fonts

\startcomponent fonts-formats

\environment fonts-environment

\startchapter[title=Font formats][color=darkred]

\startsection[title=Introduction]

In this chapter the font formats as we know them will be introduced. The
descriptions will be rather general but more details can be found in the
appendix. Although in \MKIV\ we support all these types, the focus will
eventually be on \OPENTYPE\ fonts, but it does not hurt to see where we are
coming from.

\stopsection

\startsection[title=Glyphs]

A typeset text is mostly a sequence of characters turned into glyphs. We talk of
characters when you input the text, but the visualization involves glyphs. When
you copy a part of the screen in an open \PDF\ document or \HTML\ page back to
your editor you end up with characters again. In case you wonder why we make this
distinction between these two states we give an example.

\startplacefigure [location=here,reference=fig:character-glyph,title=From characters to glyphs.]
    \startcombination
        {\color[maincolor]{\definedfont[Serif*default       at 30pt]affiliation}} {upright}
        {\color[maincolor]{\definedfont[SerifItalic*default at 30pt]affiliation}} {italic}
    \stopcombination
\stopplacefigure

We see here that the shape of the \type {a} is different for an upright serif and
an italic. We also see that in \type {ffi} there is no dot on the \type {i}. The
first case is just a stylistic one but the second one, called a ligature, is
actually one shape. The 11 characters are converted into 9 glyphs. Hopefully the
final document format carries some extra information about this transformation so
that cut and paste will work out well. In \PDF\ files this is normally the
case. In this document we will not be too picky about the distinction as in most
cases the glyph is rather related to the character as one knows it.

So, a font contains glyphs and it also carries some information about
replacements. In addition to that there needs to be at least some information
about their dimensions. Actually, a typesetting engine does not have to
know anything about the actual shape at all.

\startplacefigure [location=here,reference=fig:glyph-dimension-normal,title=The bounding box of some normal glyphs.]
    \startcombination[9*1]
        {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]a}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]b}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]g}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]l}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]q}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt].}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt];}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]?}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]ffi}}} {}
    \stopcombination
\stopplacefigure

\startplacefigure [location=here,reference=fig:glyph-dimension-italic,title=The bounding box of some italic glyphs.]
    \startcombination[9*1]
        {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]a}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]b}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]g}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]l}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]q}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt].}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt];}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]?}}}   {}
        {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]ffi}}} {}
    \stopcombination
\stopplacefigure

The rectangles around the shapes in \in {figure} [fig:glyph-dimension-normal] and
\in {figure} [fig:glyph-dimension-italic] are called bounding boxes. The dashed
line reflects the baseline on which the shapes eventually are aligned next to
each other. The amount above the baseline is called height, and the amount below
is called depth. The piece of the shape above the baseline is the ascender and
the bit below it the descender. The width of the bounding box is not by
definition the width of the glyph. In \TYPEONE\ and \OPENTYPE\ fonts each shape
has a so called advance width and that is the one that will be used.

\usemodule[fonts-kerns]

\startplacefigure [location=here,reference=fig:glyph-kerns,title={Kerning in Latin Roman, Cambria, Pagella and Dejavu.}]
    \scale[width=\textwidth]{\startcombination[1*4]
        {\color[maincolor]{\definedfont[name:lmroman10-regular*default     sa   1]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
        {\color[maincolor]{\definedfont[name:cambria*default               sa   1]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
        {\color[maincolor]{\definedfont[name:texgyrepagellaregular*default sa   1]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
        {\color[maincolor]{\definedfont[name:dejavuserif*default           sa 0.9]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
    \stopcombination}
\stopplacefigure

Another traditional property of a font is kerning. In \in {figure}
[fig:glyph-kerns] you see this in action. These examples
demonstrate that not all fonts need (or provide) the same kerns
(in points).

So, as a start, we have now met a couple of properties of a font.
They can be summarized as follows:

\starttabulate[|l|p|]
\NC mapping to glyphs   \EQ characters are represented by shapes that have recognizable
                            properties so that readers know what they mean \NC \NR
\NC ligature building   \EQ a sequence of characters gets mapped onto one glyph \NC \NR
\NC dimensions          \EQ each glyph has a width, height and depth \NC \NR
\NC inter-glyph kerning \EQ optionally a bit of positive or negative space has to be inserted between glyphs \NC \NR
%NC italic correction   \EQ a correction is applied between an oblique shape and what follows \NC \NR
\stoptabulate

Regular font kerning is hardly noticeable and improves the overall look of the
page. Typesetting applications sometimes are capable of inserting additional
spaces between shapes. This more excessive kerning is not that much related to
the font and is used for special purposes, like making a snippet of text stand
out. In \CONTEXT\ this kind of kerning is available but it is a font independent
feature. Keep in mind that when applying that kind of rather visible kerning
you'd better not have ligatures and fancy replacements enabled, even though
\CONTEXT\ already tries to deal with that as well as possible.
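
As a sketch of how such font independent kerning might be requested, the
following assumes the \type {\definecharacterkerning} and \type
{\setcharacterkerning} commands of \MKIV; the name \type {loose} and the factor
are made up for this illustration.

\starttyping
% hypothetical kerning set: the name and the factor are only an example
\definecharacterkerning [loose] [factor=.125]

\starttext
    Normal text, {\setcharacterkerning[loose]visibly kerned text} and normal again.
\stoptext
\stoptyping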

\stopsection

\startsection[title=The basic process]

In \TEX\ a font is an abstraction: the engine only needs to know about the
mapping from characters to glyphs, what the width, height and depth are, what
sequences need to be translated into ligatures and when kerning has to be
applied. If for the moment we forget about math, these are all the properties
that matter and this is what the \TEX\ font metric files that we see in the next
section provide.

Because one of the principles behind \LUATEX\ is that the core engine (the
binary) stays small and that new functionality is provided in \LUA\ code, the
font subsystem largely looks like it always has. As users will normally use
a macro package most of the loading will be hidden from the user. It is however
good to give a quick overview of how for instance \PDFTEX\ deals with fonts using
traditional metric files.

\startFLOWchart[pdftex]
    \startFLOWcell
        \name {source}
        \location {1,1}
        \shape {action}
        \text {input}
        \connection [rl] {parser}
    \stopFLOWcell
    \startFLOWcell
        \name {parser}
        \location {2,1}
        \shape {action}
        \text {characters}
        \connection [rl] {builder}
    \stopFLOWcell
    \startFLOWcell
        \name {builder}
        \location {3,1}
        \shape {action}
        \text {glyphs}
        \connection [rl] {backend}
    \stopFLOWcell
    \startFLOWcell
        \name {backend}
        \location {4,1}
        \shape {action}
        \text {subset}
    \stopFLOWcell
\stopFLOWchart

\startplacefigure [location=here,reference=fig:tfm-pdftex,title={Several translation steps in a traditional \TEX\ flow.}]
    \FLOWchart[pdftex]
\stopplacefigure

The input (bytes) gets translated into characters by the input parser. Normally
this is a one|-|to|-|one translation but there are examples of some translation
taking place. You can for instance make characters active and give them a
meaning. So, the eight bit representation of \type {ë} in an editor's code page
can become something else internally, for instance a regular \type {e} with an
\type {¨} overlaid. It can also become another character, which in the code page
would be shown as \type {á} but the user will not know this as by then this byte
is already tokenized. Another example is multibyte translation, for instance
\UTF\ sequences can get remapped to something that is known internally as being a
character of some kind. The \LUATEX\ engine expects \UTF\ so a macro package has
to make sure that translation to this encoding happens beforehand, for instance
using a callback that intercepts the input from file. \footnote {In \CONTEXT\ we
talk of input regimes and these can be mixed, although in practice most users
will stick to \UTF\ and never use regimes.}
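
As an illustration of such a regime, a file saved in an eight bit code page
could be read as follows. This is a minimal sketch that assumes the \type
{\enableregime} command and the \type {cp1252} regime name; check which regimes
your installation provides.

\starttyping
% assumption: this source file is stored in the cp1252 code page, not in UTF-8
\enableregime[cp1252]

\starttext
    The bytes of this file are converted to UTF-8 before TeX sees them.
\stoptext
\stoptyping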

So, the input character (sequence) becomes tokens representing a character. From
these tokens \TEX\ will start building a (linked) node list where each character
becomes a node. In this node there is a reference to the current font. If you
know \TEX\ you will understand that a list can have more than just characters:
there can be skips, kerns, rules, references to images, boxes, etc.

At some point \TEX\ will hand this list over to a routine that will turn it
into something that resembles a paragraph or some other snippet of text. In that
stage hyphenation kicks in, ligatures get built and kerning is added. Character
references become glyph indices. This list can finally be broken into lines.

It is no secret that \TEX\ can box and unbox material and that after unboxing
some new formatting has to happen. The traditional engine has some optimizations
that demand a partial reconstruction of the original list but in \LUATEX\ we
removed this kind of optimization so there the process is somewhat simpler. We
will see more of that later.

When \TEX\ ships out a page, the backend will load the real font data and merge
that into the final output. It will now take the glyph index and build the right
data structures and references to the real font. As a font gets subset only the
used glyphs end up in the final output.

There is one tricky aspect involved here: re|-|encoding. In so called map files
one can map a specific metric filename onto a real font name. One can also
specify an encoding vector that tells what a given index really refers to. This
makes it possible to use fonts that have more than 256 glyphs and refer to any of
them. This is also the trick that makes it possible to use \TRUETYPE\ fonts in
\PDFTEX: the backend code filters the right glyphs from the font, and the
remapping of \TEX's glyph indices onto real entries in the font happens via the
encoding vector. In \in {figure} [fig:tfm-bytes] we show a possible route for
input byte 68.

\startFLOWchart[bytes]
    \startFLOWcell
        \name {source}
        \location {1,1}
        \shape {action}
        \text {bytes (68)}
        \connection [rl] {parser}
    \stopFLOWcell
    \startFLOWcell
        \name {parser}
        \location {2,1}
        \shape {action}
        \text {bytes (31)}
        \connection [rl] {builder}
    \stopFLOWcell
    \startFLOWcell
        \name {builder}
        \location {3,1}
        \shape {action}
        \text {index (31)}
        \connection [rl] {backend}
    \stopFLOWcell
    \startFLOWcell
        \name {backend}
        \location {4,1}
        \shape {action}
        \text {index (88)}
    \stopFLOWcell
\stopFLOWchart

\startplacefigure [location=here,reference=fig:tfm-bytes,title={From bytes to indices.}]
    \FLOWchart[bytes]
\stopplacefigure
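
To make the role of the map file more concrete, a line in a \PDFTEX\ map file
roughly has the shape shown below. All names in this sketch are hypothetical:
the metric file \type {texnansi-demo}, the font \type {DemoSerif-Regular} and
the two files at the end only serve as placeholders.

\starttyping
% tfmname fontname "encodingname ReEncodeFont" <encodingfile <fontfile
texnansi-demo DemoSerif-Regular "TeXnANSIEncoding ReEncodeFont" <texnansi-demo.enc <demoserif.pfb
\stoptyping

The first field is the metric file that \TEX\ loaded, the quoted part names the
encoding vector, and the files prefixed with \type {<} are the ones that the
backend consults or embeds.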

As \LUATEX\ carries much of the baggage of older engines, you can still do it this
way but in \CONTEXT\ \MKIV\ we have made our life much simpler: we use \UNICODE\ as
much as possible. This means that we effectively have removed two steps (see \in
{figure} [fig:tfm-luatex]).

\startFLOWchart[luatex]
    \startFLOWcell
        \name {source}
        \location {1,1}
        \shape {action}
        \text {input}
        \connection [rl] {builder}
    \stopFLOWcell
    \startFLOWcell
        \name {builder}
        \location {2,1}
        \shape {action}
        \text {glyphs}
    \stopFLOWcell
\stopFLOWchart

\startplacefigure [location=here,reference=fig:tfm-luatex,title={Simplified mapping in \LUATEX.}]
    \FLOWchart[luatex]
\stopplacefigure

There is of course still some work to do for the backend, like subsetting, but
the nasty dependency on the input encoding, font encoding (that itself relates to
hyphenation) and backend re|-|encoding is gone. But keep in mind that the
internal data structure of the font is still quite traditional.

Before we move on to font formats I would like to point out that there is no
space in \TEX. Spaces in the input are converted into glue, with or without some
stretch and|/|or shrink. This also means that accessing character 32 in
traditional \TEX\ will not end up as a space in the output.
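
The glue that replaces a space is derived from the font parameters. As a small
sketch, the following lines show the three relevant values of the current font
using the \TEX\ primitive \type {\fontdimen}; the labels and layout of the
example are arbitrary.

\starttyping
\starttext
    natural width : \the\fontdimen2\font \par
    stretch       : \the\fontdimen3\font \par
    shrink        : \the\fontdimen4\font \par
\stoptext
\stoptyping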

\stopsection

\startsection[title=\TEX\ metrics]

\appendixdata{\in[fontdata:tfm]}
\appendixdata{\in[fontdata:vf]}

Traditional font metrics are packaged in a binary format. Due to the limitations
of that time a font has at most 256 characters. In books dedicated to \TEX\ you
will often find tables that show what glyphs are in a font, so we will not repeat
that here as after all we got rid of that limitation in \LUATEX.

Because 256 is not that much, especially when you mix many scripts and need lots
of symbols from the same font, there are quite a few encodings used in traditional
\TEX, like \type {texnansi}, \type {ec} and \type {qx}. When you use \LUATEX\
exclusively you can do with far fewer font files. This is easier for users,
especially because most of those files were never used anyway. It's interesting
to notice that some of the encodings contain symbols that are never used or used
only once in a document, like the copyright or registered symbols. They are often
accessed by symbolic names and therefore could easily have been omitted and
collected in a dedicated symbol font, thereby freeing slots for more useful
characters. The lack of coverage is probably one of the reasons why new
encodings kept popping up. In the next table you see how many files are involved
in Latin Modern which comes in a couple of design sizes. \footnote {The original
Computer Modern fonts have \METAFONT\ source files and (runtime) generated bitmap
files in whatever resolutions are needed for previewing and printing. The
\TYPEONE\ follow|-|up came in several sets, organized by language support. The
Latin Modern fonts have a few more weights and variants than Computer Modern.}

\starttabulate[|l|c|r|r|r|]
\HL
\NC \bf font format \NC \bf type \NC \bf \# files \NC \bf size in bytes \NC \bf \CONTEXT \NC \NR
\HL
\NC type 1   \NC tfm \NC 380 \NC  3.841.708 \NC \NC \NR
\NC          \NC afm \NC  25 \NC  2.697.583 \NC \NC \NR
\NC          \NC pfb \NC  92 \NC  9.193.082 \NC \NC \NR
\NC          \NC enc \NC  15 \NC     37.605 \NC \NC \NR
\NC          \NC map \NC   9 \NC     42.040 \NC \NC \NR
\HL[darkgray]
\NC          \NC     \NC 521 \NC 15.812.018 \NC mkii \NC \NR
\HL
\NC opentype \NC otf \NC  73 \NC  8.224.100 \NC mkiv \NC \NR
\HL
\stoptabulate

A \TFM\ file can contain so called italic corrections. This is an additional kern
that can be added after a character in order to get better spacing between an
italic shape and an upright one. As this is manual work, it is not a very advanced
mechanism, but in addition to width, height, depth, kerns and ligatures it is
nevertheless a useful piece of information. But, it's a rather \TEX\ specific
quantity.
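
The italic correction can be requested explicitly with the \type {\/} command.
The following sketch is only meant to show where the correction goes; whether
the difference is visible depends on the font and on the values it provides.

\starttyping
\starttext
    ({\it half})   \par % no correction, the f can come close to the parenthesis
    ({\it half\/}) \par % the correction is added after the last italic glyph
\stoptext
\stoptyping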

Since \TEX\ showed up many fonts have been added. In addition support for
commercial fonts was provided. In fact, for that to happen, one only needs
accompanying metric files for \TEX\ itself and map files and encoding vectors
for the backend. Because a metric file also has some general information, like
spacing (including stretch and shrink), the ex|-|height and em|-|width, this
means that sometimes guesses must be made when the original font does not come
with such parameters.

At some point virtual fonts were introduced. In a virtual font a \TFM\ file has
an accompanying \VF\ file. In that file each glyph has a specification that tells
where to find the real glyph. It is even possible to construct glyphs from other
glyphs. In traditional \TEX\ this only concerns the backend, which in \PDFTEX\ is
built in. In \LUATEX\ this mechanism is integrated into the frontend which means
that users can construct such virtual fonts themselves. We will see more of that
later, but for now it's enough to know that when we talk about the representation
of a font (the \TFM\ table) in \LUATEX, this includes virtual functionality.

An important limitation of \TFM\ files, and thus of traditional \TEX, is that the
number of depths and heights is limited to 16 each. Although this results in
somewhat inaccurate dimensions, in practice this goes unnoticed, if only because
many designs have some consistency in this. On the other hand, it is a limitation
when we start thinking of accents or even multiple accents which lead to many
more distinctive heights and depths.

Concerning ligatures we can remark that there are quite a few substitutions
possible although in practice only the multiple to one replacement has been used.

Some fonts that are used in \TEX\ started out as bitmaps but rather soon
\TYPEONE\ outline fonts became the fashion. These are supported using the map
files that we will discuss later. First we look into \TYPEONE\ fonts.

\stopsection

\startsection[title=\TYPEONE]

\appendixdata{\in[fontdata:afm]}
\appendixdata{\in[fontdata:enc]}
\appendixdata{\in[fontdata:map]}

For a long time \TYPEONE\ fonts have dominated the scene. These are \POSTSCRIPT\
fonts that can have more than 256 glyphs in the file that defines the shapes, but
only 256 of them can be used at one time. Of course there can be multiple subsets
active in one document.

In traditional \TEX\ a \TYPEONE\ font is used by making a \TFM\ file from a so
called Adobe metric file (\AFM) that comes with such a font. There are several
tool chains for doing this and \CONTEXT\ \MKII\ ships with one that can be of
help when you need to support commercial fonts. Projects like the Latin Modern
Fonts and \TEX\ Gyre have normalized a whole lot of fonts that came in several
more or less complete encodings into a consistent package of \TYPEONE\ fonts.
This already simplified life a lot but still users had to choose a suitable input
and font encoding for their language and|/|or script. As \TEX\ only cares about
metrics and not about the rendering, it doesn't consider \TYPEONE\ fonts as
something special. Also, as \TEX\ and \POSTSCRIPT\ were developed about the same
time, support for \TYPEONE\ fonts is widely present in \TEX\ distributions.

You can still follow this route but for \CONTEXT\ \MKIV\ this is no longer the
recommended way because there we have changed the whole subsystem to use
\UNICODE. As a result we no longer use \TFM\ files derived from \AFM\ files, but
directly interpret the \AFM\ data. This not only removes the 256 limitation, but
also brings more resolution in height and depth as we no longer have at most 16
alternatives. There can also be more kerns. Of course we need some heuristics to
determine for instance the spacing but that is not different from former times.

Because most \TEX\ users don't use commercial fonts, they will not notice that
\CONTEXT\ \MKIV\ treats \TYPEONE\ fonts this way. One reason is that the free
fonts also come as wide fonts in \OPENTYPE\ format and whenever possible
\CONTEXT\ prefers \OPENTYPE\ over \TYPEONE\ over \TFM.

In the beginning \LUATEX\ could only load a \TFM\ file, which is why loading
\AFM\ files is implemented in \LUA. Later, when the \OPENTYPE\ loader was added,
loading \PFB\ and \AFM\ files also became possible but it's slower and we see no
reason to rewrite the current code in \CONTEXT. We also do a couple of extra
things when loading such a file. As more \TYPEONE\ fonts move on to \OPENTYPE\ we
don't expect that much usage anyway.

\stopsection

\startsection[title=\OPENTYPE]

\appendixdata{\in[fontdata:otf]}

When an engine can deal with \UNICODE\ directly it also means that internally it
uses pretty large numbers for storing characters and glyph indices. The first
\TEX\ descendant that went wide was \OMEGA, later replaced by \ALEPH. However, this
engine never took off and still used its own extended \TFM\ format: \OFM. In fact,
as \LUATEX\ uses some of the \ALEPH\ code, it can also use these extended metric
files but I don't think that there are any useful fonts around so we can forget
about this.

We use the term \OPENTYPE\ for a couple of font formats that share the same
principles: \OPENTYPE\ (\OTF), \TRUETYPE\ (\TTF) and \TRUETYPE\ containers
(\TTC). The \LUATEX\ font reader presents them in a similar format. In the case
of a \TRUETYPE\ container, one does not load the whole font but selects an
instance from it. Internally an \OPENTYPE\ font can have the glyphs organized in
subfonts.

The first \TEX\ descendant to really go wide from front to back is \XETEX. This
engine can use \OPENTYPE\ fonts directly and for a whole category of users this
opened up a new world. However, it is still mostly a traditional engine. The
transition from characters to glyphs is accomplished by external libraries, while
in \LUATEX\ we code in \LUA. This has the disadvantage that it is slower
(although that depends on the job) but the advantage is that we have much more
control and can extend the font handler as we like.

An \OPENTYPE\ font is much more complex than a \TYPEONE\ one. Unless it is a
quick and dirty converted existing font, it will have more glyphs to start with.
Quite likely it will have kerns and ligatures too and of course there are
dimensions. However, there is no concept of a depth and height. These need to be
deduced from the bounding box instead. There is an advance width. This means that
we can start right away using such fonts if we map those properties onto the
\TFM\ table that \LUATEX\ expects.

But there is more. Take ligatures. In a traditional font the sequence \type {ffi}
always becomes a ligature, given that the font has such a glyph. In traditional
\TEX\ it is pretty hard to disable that; one option is to insert kerns manually.
In \LUATEX\ there is a way to disable this mechanism, which is sometimes handy
when dealing with mono|-|spaced fonts in verbatim. In an \OPENTYPE\ font ligatures
are collected in a so called feature. There can be many such features and even
kerning is a feature. Other examples are old style numerals, fractions,
superiors, inferiors, historic ligatures and stylistic alternates.

\starttabulate[|lT|l|l|l|l|]
\NC \type{onum} \NC \ruledhbox{\maincolor\DemoOnumLM\char45 1}
                \NC \ruledhbox{\maincolor\DemoOnumLM1234567890}
                \NC \ruledhbox{\maincolor\DemoOnumLM\char"A2}
                \NC \ruledhbox{\maincolor\DemoOnumLM\char"24} \NC \NR
%NC \type{lnum} \NC \ruledhbox{\maincolor\DemoLnumLM\char45 1}
%               \NC \ruledhbox{\maincolor\DemoLnumLM1234567890}
%               \NC \ruledhbox{\maincolor\DemoLnumLM\char"A2}
%               \NC \ruledhbox{\maincolor\DemoLnumLM\char"24} \NC \NR
\NC \type{tnum} \NC \ruledhbox{\maincolor\DemoTnumLM\char45 1}
                \NC \ruledhbox{\maincolor\DemoTnumLM1234567890}
                \NC \ruledhbox{\maincolor\DemoTnumLM\char"A2}
                \NC \ruledhbox{\maincolor\DemoTnumLM\char"24} \NC \NR
\NC \type{pnum} \NC \ruledhbox{\maincolor\DemoPnumLM\char45 1}
                \NC \ruledhbox{\maincolor\DemoPnumLM1234567890}
                \NC \ruledhbox{\maincolor\DemoPnumLM\char"A2}
                \NC \ruledhbox{\maincolor\DemoPnumLM\char"24} \NC \NR
\stoptabulate
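
In \CONTEXT\ features are normally collected in a named feature set that is then
attached to a font. The sketch below shows how sets like the ones in the table
could be defined; the names \type {oldstyle} and \type {tabular} are made up,
and the result depends on the font actually providing these features.

\starttyping
% hypothetical feature sets that extend the default set
\definefontfeature [oldstyle] [default] [onum=yes]
\definefontfeature [tabular]  [default] [tnum=yes]

\starttext
    {\definedfont[Serif*oldstyle at 12pt]1234567890} \par
    {\definedfont[Serif*tabular  at 12pt]1234567890} \par
\stoptext
\stoptyping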

To this all you need to add that features operate in two dimensions: languages
and scripts. This means that when ligatures are enabled for Dutch the \type {ij}
sequence becomes a single glyph but for German it gets mapped onto two glyphs.
And, to make it even more complex, a substitution can depend on circumstances,
which means that for Dutch \type {fijn} becomes \type {f ij n} but \type {fiets}
becomes \type {fi ets}. It will be no surprise that not all \OPENTYPE\ fonts come
with a complete and rich repertoire of rules. To make things worse, there can be
rules that turn \type {1/2} into one glyph, or transfer the numbers into superior
and inferior alternatives, but leave us with an unacceptably rendered \type
{1/a}, given that the \type {frac} feature is enabled. It looks like features
like this are to be applied to a manually selected range of characters.
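
Both points can be sketched in \CONTEXT\ as follows. The feature set names are
hypothetical, and the example assumes that the font has rules for the given
language and provides the \type {frac} feature. The \type {\feature} command is
assumed here as the way to enable a feature for a limited range only.

\starttyping
% hypothetical feature sets
\definefontfeature [dutch]      [default] [language=nld,script=latn]
\definefontfeature [f:fraction] [frac=yes]

\starttext
    {\definedfont[Serif*dutch at 12pt]fijn fiets} \par
    % enable frac only for the characters that form the fraction
    {\feature[+][f:fraction]1/2} and a plain 1/a
\stoptext
\stoptyping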

The fact that an \OPENTYPE\ font can contain many features and rules to apply
them makes it possible to typeset scripts like Arabic. And this is where it gets
vague. A generic \OPENTYPE\ sub|-|engine can do clever things using these rules,
but if you read the specification, for some scripts additional intelligence has to
be provided by the typesetting engine.

While users no longer have to care about encodings, map files and back|-|end
issues, they do have to carry knowledge about the possibilities and limitations
of features. Even worse, they need to be aware that fonts can have bugs.
Also, as font vendors have no tradition of providing updates this is something
that we might need to take care of ourselves by tweaking the engine.

One of the problems with the transition from \TYPEONE\ to \OPENTYPE\ is that font
designers can take an existing design and start from that basic repertoire of
shapes. If such a design had oldstyle figures only, there is a good chance that
this will be the case in the \OPENTYPE\ variant too. However, such a default
interferes with the fact that the \type {onum} feature is one that we explicitly
have to enable. This means that writing a generic style where a font is later
plugged in becomes somewhat messy if it assumes that features need to be turned
on.

\TEX\ users expect more control, which means that in practice just an \OPENTYPE\
engine is not enough, but for the average font the \TEX\ model using the
traditional approach still is quite acceptable. After all, not all users use
complex scripts or need advanced features. And, in practice most readers don't
notice the difference anyway.

\stopsection

\startsection[title=\LUA]

\appendixdata{\in[fontdata:lua]}

As mentioned support for virtual fonts is built into \LUATEX\ and loading the so
called \VF\ files happens when needed. However, that concerns traditional fonts
that we already covered. In \CONTEXT\ we do use the virtual font mechanism for
creating missing glyphs out of existing ones or add fallbacks when this is not
possible. But this is not related to some kind of font format.

In 2010 and 2011 the first public \OPENTYPE\ math fonts showed up that replace
their \TYPEONE\ originals. In \CONTEXT\ we already went forward and created
virtual \UNICODE\ fonts out of traditional fonts. Of course eventually the
defaults will change to the \OPENTYPE\ alternatives. The specification for such a
virtual font is given in \LUA\ tables and therefore you can consider \LUA\ to be
a font format as well. In \CONTEXT\ such fonts can be defined in so called
goodies files. As we use these files for much more tuning, we come back to that
in a later chapter. In a virtual font you can mix real \TYPEONE\ fonts and real
\OPENTYPE\ fonts using whatever metrics suit best.

An extreme example is the virtual \UNICODE\ Punk font. This font is defined in
the \METAPOST\ language (derived from Don Knuth's \METAFONT\ sources) where each
glyph is one graphic. Normally we get \POSTSCRIPT, but in \LUATEX\ we can also
get output in a comparable \LUA\ table. That output is converted to \PDF\
literals that become part of the virtual font definitions and these eventually
end up in the \PDF\ page stream. So, at the \TEX\ end we have regular (virtual)
characters and all \TEX\ needs is their dimensions, but in the \PDF\ each glyph
is shown using drawing operations. Of course the now available \OPENTYPE\ variant
is more efficient, but it demonstrates the possibilities.

\stopsection

\startsection[title=Files]

We summarize these formats in the following table where we explain what the file
suffixes stand for:
559
560\starttabulate[|Tl|p|]
561\HL
562\NC tfm \NC This is the traditional \TEX\ font metric file format and it reflects
563            the internal quantities that \TEX\ uses. The internal data structures
564            (in \LUATEX) are an extension of the \TFM\ format. \NC \NR
565\NC vf  \NC This file contains information about how to construct and where to
566            find virtual glyphs and is meant for the backend. With \LUATEX\ this
567            format gets more known. \NC \NR
568\NC pk  \NC This is the bitmap format used for the first generation of \TEX\
569            fonts but the typesetter never deals with them. Bitmap files are more
570            or less obselete. \NC \NR
571\HL
572\NC ofm \NC This is the \OMEGA\ variant of the \type {tfm} files that caters for
573            larger fonts. \NC \NR
574\NC ovf \NC This is the \OMEGA\ variant of the \type {vf}. \NC \NR
575\HL
576\NC pfb \NC In this file we find the glyph data (outlines) and some basic
577            information about the font, like name|-|to|-|index mappings. A
578            differently byte|-|encoded variant of this format is \type {pfa}.\NC
579            \NR
\NC afm \NC This file accompanies the \type {pfb} file and provides additional
            metrics, kerns and information about ligatures. For \MSWINDOWS\
            there is a binary variant that has the \type {pfm} suffix. \NC \NR
584\NC map \NC The backend will consult this file for mapping metric file names onto
585            real font names. \NC \NR
586\NC enc \NC The backend will include (and use) this encoding vector to map
587            internal indices to font indices using glyph names, if needed. \NC
588            \NR
589\HL
590\NC otf \NC This binary format describes not only the font in terms of metrics,
591            features and properties but also contains the shapes. \NC \NR
592\NC ttf \NC This is the \MICROSOFT\ variant of \OPENTYPE. \NC \NR
593\NC ttc \NC This is the \MICROSOFT\ container format that combines multiple fonts
594            in one. \NC \NR
595\HL
596\NC fea \NC A (\FONTFORGE) feature definition file. Such a file can be loaded and
597            applied to a font. This is no longer supported in \CONTEXT\ as we have
598            other means to achieve the same goals. \NC \NR
599\NC cid \NC A glyph index (name) to \UNICODE\ mapping file that is referenced
600            from an \OPENTYPE\ font and is shared between fonts. \NC \NR
601\HL
602\NC lfg \NC These are \CONTEXT\ specific \LUA\ font goodie files providing
603            additional information. \NC \NR
604\HL
605\stoptabulate
606
607If you look at how files are organized in a \TEX\ distribution, you will notice
608that these files all get their own place. Therefore adding a \TYPEONE\ font to
609the distribution is not that trivial if you want to avoid clashes. Also, files
610are simply not found when they are not in the right spot. Just to mention a few
611paths:
612
\starttyping
<root>/fonts/tfm/vendor/typeface
<root>/fonts/vf/vendor/typeface
<root>/fonts/type1/vendor/typeface
<root>/fonts/truetype/vendor/typeface
<root>/fonts/opentype/vendor/typeface
<root>/fonts/fea
<root>/fonts/cid
<root>/fonts/dvips/enc
<root>/fonts/dvips/map
\stoptyping
624
625There can be multiple roots and the right locations are specified in a
626configuration file. Currently all engines can use the \DVIPS\ encoding and map
627files, so luckily we don't need to duplicate this. For some reason \TRUETYPE\ and
628\OPENTYPE\ fonts have different locations and you need to be aware of the fact
629that some fonts come in both formats (just to confuse users) so you might end up
630with conflicts.
631
In \CONTEXT\ we try to make life somewhat easier by also supporting a simple path
structure:
634
\starttyping
<root>/fonts/data/vendor/typeface
\stoptyping
638
This way files are kept together and installing commercial fonts is less complex
and less error prone. Also, in practice we only have one set of files now: one or
the other \OPENTYPE\ format.
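
As a minimal sketch of this layout, assume a typeface that comes as two
\OPENTYPE\ files. The vendor, typeface and file names below are hypothetical
and only serve as an illustration:

\starttyping
<root>/fonts/data/myvendor/mytypeface/mytypeface-regular.otf
<root>/fonts/data/myvendor/mytypeface/mytypeface-bold.otf
\stoptyping

After adding files the file database normally has to be regenerated, otherwise
the new files are not found:

\starttyping
mtxrun --generate
\stoptyping

A quick test is then to load one of the files by its filename:

\starttyping
% the filename is the hypothetical one used above
\starttext
    \definedfont[file:mytypeface-regular.otf*default at 12pt]
    A font loaded by filename.
\stoptext
\stoptyping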
642
If you want to see the difference between a traditional (\PDFTEX\ or \XETEX\ plus
\CONTEXT\ \MKII) setup or a modern one (\LUATEX\ with \CONTEXT\ \MKIV) you can
install the \CONTEXT\ suite (formerly known as minimals). If you explicitly
choose a \LUATEX\ only setup, you will notice that far fewer files get
installed.
648
649\stopsection
650
651\startsection[title=Text]
652
653This is not an in|-|depth explanation of how to define and load fonts in
654\CONTEXT. First of all this is covered in other manuals, but more important is
655that we assume that the reader is already familiar with the way \CONTEXT\ deals
656with fonts. Therefore we limit ourselves to some remarks and expand on this a bit
657in later chapters.
658
The font subsystem has evolved over the years and when you look at the low level
code you will probably find it complex. This is true, although in some aspects it
is not as complex as in \MKII\ where we also had to deal with encodings due to
the eight bit limitations. In fact, setting up fonts is easier due to the fact
that we have fewer files to deal with.
664
665The main properties of a (modern) font subsystem for typesetting text are the
666following:
667
668\startitemize[n]
669    \startitem
670        We need to be able to switch the look and feel efficiently and
671        consistently, for instance going from regular to bold or italic. So,
672        when we load a font family we not only load one file, but often
673        at least four: regular, bold, italic (oblique) and bolditalic
674        (boldoblique).
675    \stopitem
676    \startitem
677        When we change the size we also need to make sure that these related
678        sets are changed accordingly. You really want the bold shapes to scale
679        along with the regular ones.
680    \stopitem
681    \startitem
        Shapes are organized in serif, sans serif, mono spaced and math and for
        proper working of a typesetter that has math all over you always
        need the math. Again, when you change size, all these shapes need to
        scale in sync.
686    \stopitem
687    \startitem
688        In one document several families can be combined so the subsystem should
689        make it possible to switch from one to the other without too much
690        overhead.
691    \stopitem
692    \startitem
693       Because section heads and other structural elements have their own sizes
694       there has to be a consistent way to deal with that. It should also be
695       possible to specify exceptions for them.
696   \stopitem
697\stopitemize
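
From the user's perspective these demands look as follows. This is only a
sketch: it assumes that the \type {dejavu} and \type {pagella} typescripts that
ship with \CONTEXT\ are available on your system, and the sizes are arbitrary.

\starttyping
\setupbodyfont[dejavu,10pt]

% structural elements get their own style and size
\setuphead[section][style=\bfb]

\starttext
    \section{A section head}

    regular, {\bf bold}, {\it italic} and {\bi bold italic}

    {\ss sans serif}, {\tt mono spaced} and math: $a^2 + b^2 = c^2$

    % a size switch scales all related styles (and math) along
    {\switchtobodyfont[14pt] larger: regular, {\bf bold} and $a^2 + b^2$\par}

    % another family in the same document
    {\switchtobodyfont[pagella] another family, {\bf bold} too\par}
\stoptext
\stoptyping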
698
In the next chapters we will cover some details, for instance font features. You
can actually control these when setting up a body font, simply by redefining
the \type {default} feature set, but not all features are dealt with this way.
So let's continue with the demands put on a font subsystem.
703
704\startitemize[continue]
705    \startitem
        Sometimes inter|-|character kerning is needed. In \CONTEXT\ this is not a
        property of a font because glyphs can be mixed with basically anything.
        These kinds of features are applied independently of a font.
709    \stopitem
710    \startitem
711        The same is true for casing (like uppercasing and such) which is not
712        related to a font but applied to a selected (or marked) piece of the
713        input stream.
714    \stopitem
715    \startitem
716        Using so called \quotation {small caps} or \quotation {old style}
717        numerals or \unknown\ can be dealt with by setting the default features
718        but often these are applied selectively. As these are applied using the
719        information in a font they do belong to the font subsystem but in
720        practice they can be seen as independent (assuming that the font supports
721        them at all).
722    \stopitem
723    \startitem
        Protrusion (into margins) and expansion (to improve whitespace) are
        applied to the font at load time because the engine needs to know about
        them. But they too can selectively be turned on and off. They are more
        related to line break handling than font defining.
728    \stopitem
729    \startitem
730        Slanting (to fake oblique) and expanding (to fake bold) are regular
731        features but are applied to the font because the engine needs to know
732        about them. They permanently influence the shape.
733    \stopitem
734\stopitemize
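
The following sketch shows how some of the items above could look in a
document. It assumes the \type {pagella} typescript; the names \type {oldstyle},
\type {slanted}, \type {loose} and \type {FakeOblique} are made up for this
example, and the values are arbitrary.

\starttyping
% extend the default feature set before the body font gets loaded
\definefontfeature
  [default]
  [default]
  [protrusion=quality,expansion=quality]

% feature sets that are applied selectively (hypothetical names)
\definefontfeature[oldstyle][onum=yes]
\definefontfeature[slanted][default][slant=.2]

\setupbodyfont[pagella]

% the font knows about protrusion and expansion, here we turn them on
\setupalign[hanging,hz]

% inter-character kerning is independent of the font
\definecharacterkerning[loose][factor=.125]

% slanting is part of the font definition
\definefont[FakeOblique][Serif*slanted]

\starttext
    {\setcharacterkerning[loose]some spaced out text}\par
    \WORD{uppercased input}\par
    {\feature[+][oldstyle]0123456789} versus 0123456789\par
    {\FakeOblique a slanted serif}\par
\stoptext
\stoptyping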
735
We will discuss these in this manual too. What we will not discuss in depth is
spacing, even when it depends on the (main body) font size. Spacing does use
properties of fonts (like the ex|-|height or em|-|width and maybe the width of
the space), but normally it is controlled by the spacing subsystem. We will
however mention some rather specific possibilities:
741
742\startitemize[continue]
743    \startitem
744        The \CONTEXT\ font subsystem provides ways to combine multiple fonts
745        into one.
746    \stopitem
747    \startitem
748        You can construct artificial fonts, using existing fonts or \METAPOST\
749        graphics.
750    \stopitem
751    \startitem
        Fonts can be fixed (dimensions) and completed (for instance accented
        characters) when loading.
754    \stopitem
755    \startitem
756        There are extensive tracing options, not only for applied features but
757        also for loading, checking etc. There is a set of styles that can be
758        used to study fonts.
759    \stopitem
760\stopitemize
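
As an illustration of the tracing options, a sketch like the following reports
in the log which files get loaded and how font instances are defined. The
tracker names used here are the ones we assume to be available in a current
\MKIV\ installation, so check your own setup when they are not recognized.

\starttyping
% tracker names are an assumption, see the remark above
\enabletrackers[fonts.defining,otf.loading]

\setupbodyfont[pagella]

\starttext
    \input tufte
\stoptext
\stoptyping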
761
Sometimes users ask for very special trickery and it is no surprise then that
some of that is now widely known (or even discussed in detail). When we become
aware of that we can mention it in this manual.
765
So how does this all relate to font formats? We mentioned that when loading we
basically load some four files per family (and more if we use specific fonts for
titling). These files just provide the data: metric information, shapes and ways
to remap characters (or sequences) into glyphs, either or not positioned relative
to each other. In traditional \TEX\ only dimensions, kerns and ligatures
mattered, but nowadays we also deal with specific \OPENTYPE\ features. But
still, as you can deduce from the above, this is only part of the story. You need
a complete and properly integrated system. It is no big deal to set up some
environment that uses font files to achieve some typesetting goal, but to provide
users with some consistent and extensible system is a bit more work.
776
There are basically three font formats: good old bitmaps, \TYPEONE\ and
\OPENTYPE. All need to be supported and expectations are that we also support
their features. But it should be noted that whatever font you use, the quality
of the outcome depends on what information the font can provide. We can improve
processing but are often stuck with the font. There are many thousands of
fonts out there and we need to be able to use them all.
783
784\stopsection
785
786\startsection[title=Math]
787
In the previous section we already mentioned math fonts. The fonts are just one
aspect of typesetting math and math fonts are special in the sense that they have
to provide the relevant information. For instance a parenthesis comes in several
sizes and at some point turns into a symbol made out of pieces (like a top curve,
middle lines and bottom curve) that overlap. The user never sees such details. In
fact, there are not that many math fonts and these are already set up so there is
not much to mess up here. Nevertheless we mention:
795
796\startitemize [n]
797    \startitem
        Math fonts are loaded in three sizes: text, script and scriptscript. The
        optimal relative sizes are defined in the font.
800    \stopitem
801    \startitem
802        There are direction aware math fonts and we support this in \CONTEXT.
803    \stopitem
804    \startitem
805        Bold math is in fact a bolder version of a regular math font (that can
806        have bold symbols too). Again this is supported.
807    \stopitem
808\stopitemize
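
A small sketch makes the three sizes and the growing parenthesis visible. It
assumes the \type {modern} typescript; any setup with a math font will do.

\starttyping
\setupbodyfont[modern]

\starttext
    % x: text size, a: script size, b: scriptscript size; the
    % parentheses grow with the nested fraction that they wrap
    \startformula
        x^{a^{b}} \quad
        \left( \frac{1}{1 + \frac{1}{x}} \right)
    \stopformula
\stoptext
\stoptyping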
809
810The way math is dealt with in \CONTEXT\ is different from the way it is done
811traditionally. Already when we started with \MKIV\ we moved to \UNICODE\ and
812the setup at the font level is kept simple by delegating some of the work to
813the \LUA\ end. We will see some of the mentioned aspects in more detail later.
814
Because of its complexity and because in a math text math fonts (and related
settings) can be activated many times, quite some effort has been put into
making it efficient. But you need to keep in mind that when we discuss math
related topics later on, this is hardly of concern. Math fonts are loaded only
once so manipulating them a bit has no penalty. And using them later on is hardly
related to the font subsystem.
821
Concerning formats we can notice that traditional \TEX\ comes with math fonts
that have properties that the engine can use. Because there were not many math
fonts, this was no problem. The \OPENTYPE\ math fonts however are also used in
other applications and therefore are a bit more generic. \footnote {Their
internals are now defined in the \OPENTYPE\ specification.} For this we not only
had to adapt the math engine in \LUATEX\ (although we kept that to the minimum)
but we also had to think differently about loading them. In later chapters we
will see that in the transition to \UNICODE\ math fonts we implemented a mechanism
for combining \TYPEONE\ fonts into virtual \UNICODE\ fonts. We did that because it
made no sense to keep an old and new loader alongside each other.
832
833There will not be thousands of math fonts flying around. A few dozen is already a
834lot and the developers of macro packages can set them up for the users. So, in
835practice there is not much that a user needs to know about math font formats.
836
837\stopsection
838
839\startsection[title=Caching]
840
841Because fonts can be large and because we use \LUA\ tables to describe them
842a bit of effort has been put into managing them efficiently. Once converted
843to the representation that we need they get cached. You can peek into the cache
844which is someplace on your system (depending on the setup):
845
846\starttabulate[|l|p|]
847\NC \type{fonts/data}    \NC font name databases \NC \NR
848\NC \type{fonts/mp}      \NC fonts created using \METAPOST \NC \NR
849\NC \type{fonts/one}     \NC type one fonts, converted from \type {afm} and \type
850                             {pfb} files \NC \NR
851\NC \type{fonts/otl}     \NC open type fonts, converted from \type {ttf}, \type {otf},
852                             \type {ttc} and \type {ttx} files loaded using the
853                             \CONTEXT\ \LUA\ loader \NC \NR
854\NC \type{fonts/pdf}     \NC font shapes for color fonts \NC \NR
855\NC \type{fonts/shapes}  \NC outlines of fonts (for instance for use in \METAFUN) \NC \NR
856\NC \type{fonts/streams} \NC font programs for variable font instances \NC \NR
857\stoptabulate
858
859There can be three types of files there. The \type{tma} files are just \LUA\
860tables and they can be large. These files can be compiled to bytecode where \type
861{tmc} is for stock \LUATEX\ and \type {tmb} for \LUAJITTEX. The \type {tma} files
862are optimized for space and memory (aka: packed) but you can expand them with
863\type {mtxrun --script font}.
864
865Fonts in the cache are automatically updated when you install new versions of a
866font or when the \CONTEXT\ font loader has been updated.
867
868\stopsection
869
870\startsection[title=Paths]
871
The search for fonts happens on paths defined in \type {texmf.cnf}. The information
in there is used to generate a file database for fast access with priorities based
on file type. The \TDS\ is the starting point. The environment variable driven paths
\type {OSFONTDIR} (set automatically) and \type {EXTRAFONTDIR} are taken into account.
876
877In addition you can set \type {RUNTIMEFONTS} which is, when set, consulted at
878runtime. You can also add a path in your style:
879
\starttyping
\usefontpath[c:/data/projects/myproject/fonts]
\stoptyping
883
although in general we recommend putting fonts in
885
\starttyping
<texroot>/tex/texmf-fonts/fonts/data
\stoptyping
889
890which is more efficient.
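
When fonts live elsewhere on the system, a sketch of the environment variable
route could look like this. We assume a \UNIX\ shell here, and the directory
as well as the search pattern are hypothetical:

\starttyping
# point the font scanner to an additional directory (hypothetical path)
export OSFONTDIR=/usr/share/fonts

# rebuild the font name database and check what was found
mtxrun --script fonts --reload
mtxrun --script fonts --list --all --pattern=*mytypeface*
\stoptyping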
891
892\stopsection
893
894\stopchapter
895
896\stopcomponent
897