% mk-fonts.tex (size: 36 Kb, last modification: 2023-12-21 09:43)
% language=us

\usemodule[virtual]

\startcomponent mk-fonts

\environment mk-environment

\chapter{A fresh look at fonts}

\subject{readers}

Now that we have the file system, \LUA\ script integration, input
encoding and basic logging in place, we have arrived at fonts.
Although today \OPENTYPE\ fonts are the fashion, we still need to
deal with \TEX's native font machinery. Although Latin Modern and
the \TEX\ Gyre collection will bring us many free \OPENTYPE\
fonts, we can be sure that for a long time \TYPEONE\ variants will
be used as well, and when one has lots of bought fonts, replacing
them with \OPENTYPE\ updates is not always an option. And so,
reimplementing the readers for \TEX\ Font Metrics (\type {tfm}
files) and Virtual Fonts (\type {vf} files) was the first step.

Because \ALEPH\ font handling was integrated already, Taco decided
to combine the \TFM\ and \OFM\ readers into a new one. The
combined loader is written in C and produces tables that are
accessible from within \LUA. A problem is that once a font is
used, one cannot simply change its metrics. So, we have to make
sure that we apply changes before a font is actually used:

\starttyping
\font\test=texnansi-lmr at 31.415 pt
\test Yet another nice Kate Bush song: Pi
\stoptyping

In this example, any change to the font metrics has to be done before
\type {\test} is invoked. For this purpose the \type {define_font}
callback is provided. Below you see an experimental overload:

\starttyping
callback.register("define_font", function (name,area,size)
    return fonts.patches.process(font.read_tfm(name,size))
end )
\stoptyping

The \type {fonts.patches.process} function (currently in \CONTEXT\
\MKIV) implements a mechanism for tweaking the font parameters in
between. In order to get an idea of further features we played a
bit with ligature replacement, character spacing, kern tweaking
etc. Think of such a function (or a chain of functions) doing
things similar to:

\starttyping
callback.register("define_font", function (name,area,size)
    local tfmblob = font.read_tfm(name,size) -- built-in loader
    tfmblob.characters[string.byte("f")].ligatures = nil
    return tfmblob -- data structure that TeX will use internally
end )
\stoptyping

Of course the above definition is not complete, if only because we
need to handle chained ligatures as well (fl followed by i).

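
To give an idea of what handling chained ligatures could look like, here
is a minimal sketch in plain \LUA. It assumes the table layout that the
loader produces: a character may have a \type {ligatures} table, indexed
by the next character, in which each entry has a \type {char} field that
points to the resulting glyph. The function name \type {strip_ligatures}
and the slot numbers of the ligature glyphs are made up for this example;
this is not the \MKIV\ code.

\starttyping
-- Remove all ligatures that start at "first" and also those that start
-- at a glyph that such a ligature produces (f + f -> ff, ff + i -> ffi).
local function strip_ligatures(characters, first, seen)
    seen = seen or { }
    local chr = characters[first]
    if seen[first] or not chr or not chr.ligatures then
        return
    end
    seen[first] = true
    local ligatures = chr.ligatures
    chr.ligatures = nil
    for _, ligature in pairs(ligatures) do
        strip_ligatures(characters, ligature.char, seen)
    end
end

-- mock data: the slots of the ligature glyphs are arbitrary
local f, i, ff, fi, ffi = 102, 105, 11, 12, 14
local characters = {
    [f]   = { ligatures = { [f] = { char = ff, type = 0 },
                            [i] = { char = fi, type = 0 } } },
    [ff]  = { ligatures = { [i] = { char = ffi, type = 0 } } },
    [fi]  = { },
    [ffi] = { },
    [i]   = { },
}

strip_ligatures(characters, f)
\stoptyping
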
In practice we prefer a more abstract interface (at the macro
level) but the idea stays the same. Having access to the internals
this way already makes our \TEX\ Live more interesting. (We cannot
demonstrate this trickery here because when this document is
processed you cannot be sure if the experimental interface is still
in place.)

When playing with this we ran into problems with file searching.
When performing the backend role, \LUATEX\ will look in the \TEX\
tree to see whether there is a corresponding virtual file. It took
a while and a bit of tracing (which is not that hard in the \LUA\
based reader) to figure out that the omega related path definitions
in \type {texmf.cnf} files were not correct, something that went
unnoticed because omega never had a backend integrated and the
\DVI\ processors did multiple searches to get around this.

Currently, if you want to enable extensive tracing of file
searching and loading, you can set an environment variable:

\starttyping
MTX.INPUT.TRACE=3
\stoptyping

This will produce a lot of information about which file is asked
for, which types (tex, font, etc.) determine the search, which
paths are searched, which readers and locators are used (file,
zip, protocol), etc.

\subject{AFM}

While Taco implemented the virtual font reader |<|eventually its
data will be merged with the \TFM\ table|>| I started playing with
constructing \TFM\ tables directly. Because \CONTEXT\ has a rather
systematic naming scheme, we can rather easily see which encoding
we are dealing with. This means that in principle we can throw all
encoded \TFM\ files out of our tree and construct the tables using
the \AFM\ file and an encoding vector.

It took us a good day to figure out the details, but in the end we
were able to trick \LUATEX\ into using \AFM\ files. With a bit of
internal caching it was even reasonably fast. When the basic
conversion mechanism was written we tried to compare the results
with existing \TFM\ metrics as generated by \type {afm2tfm} and
\type {afm2pl}. Doing so was less trivial than we first thought.
To mention a few aspects:

\startitemize[packed]
\item heights and depths have a limited number of values in \TEX
\item we need to convert to \TEX's scaled points
\item rounding errors of one scaled point occur
\item \type {afm2tfm} can only add kerns when virtual fonts are used
\item \type {afm2tfm} adds some extra ligatures and also does some
      kern magic
\item \type {afm2pl} adds even more kerns
\item the tools remove kern pairs between digits
\stopitemize

In this perspective we need not be too picky on what exactly a
ligature is. An example of a ligature is \type {fi} and such a
character can be in the font. In the \TFM\ file, the definition of
\type {f} contains information about what to do when it's followed
by an \type {i}: it has to insert a reference (character number)
pointing to the fi glyph.

However, because \TEX\ was written in the \ASCII\ era, there
was a problem of how to get access to for instance the Spanish
quotation and exclamation marks. Here the ligature mechanism
available in the \TFM\ format was misused in the sense that a
combination of \type {exclam} and \type {quoteleft} becomes \type
{exclamdown}. In a similar fashion two single quotes become a
double quote. And every \TEX ie knows that multiple hyphens
combine into -- (endash) and --- (emdash), where the latter one is
achieved by defining a ligature between an endash and a hyphen.

Of course we have to deal with conversions from \AFM\ units (1000
per em) to \TEX's scaled points. Such conversions may be sensitive
to rounding errors. Because we noticed differences of one scaled
point, I tried several strategies to get the results consistent
but so far I didn't manage to find out where these differences
come from. Rounding errors seem to be rather random and I have no
clue what strategy the regular converters follow. Another fuzzy
area is the font parameters (visible as font dimensions for
users): I wonder how many users really know what values are used
and why.

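
As an illustration of how a difference of one scaled point can show up,
the following sketch converts an \AFM\ value using two rounding
strategies. It only relies on the fact that a point has 65536 scaled
points; the function names are made up and we do not claim that any of
the regular tools works this way. For a width of 333 units at 10pt the
exact value is 218234.88sp, so truncating gives 218234 while rounding
gives 218235.

\starttyping
local sp_per_pt = 65536 -- TeX: 1pt = 65536sp

-- units  : a value from the AFM file (1000 units per em)
-- size_pt: the size of the font in points
local function to_sp_truncated(units, size_pt)
    return math.floor(units * size_pt * sp_per_pt / 1000)
end

local function to_sp_rounded(units, size_pt)
    return math.floor(units * size_pt * sp_per_pt / 1000 + 0.5)
end

print(to_sp_truncated(333, 10), to_sp_rounded(333, 10))
\stoptyping
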
You may wonder to what extent this rounding problem will influence
consistent typesetting. We have no reason to assume that the
rounding error is operating system dependent. This leaves the
different methods used and personally I have no problems with the
direct reader being not 100\% compatible with the regular tools.
First of all it's an illusion to think that \TEX\ distributions
are stable over the years. Fonts and conversion tools are being
updated every now and then, and metrics change over time (apart
from Computer Modern which is stable by definition). Also, pattern
files are updated, so paragraphs may be broken into lines differently
anyway. If you really want stability, then you need to store the
fonts and patterns with your document.

As we already mentioned, the regular converter programs add kerns
as well. Treating common glyph shapes similarly is not uncommon in
\CONTEXT\ so I decided to provide methods for adding \quote
{missing} kerns. For example, with regard to kerning, we can
treat \type {eacute} the same way as an~\type {e}. Some ligatures,
like \type {ae} or \type {fi}, need to be seen from two sides:
when looked at from the left side they resemble an \type {a} and
\type {f}, but when kerned at their right, they are to be treated
as \type {e} and \type {i}.

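
The two sided treatment can be sketched as follows. This is a simplified
example and not the \MKIV\ implementation: it assumes a kern table that
is indexed by glyph names, as one gets from the kern pair lines in an
\AFM\ file, and the function name \type {inherit_kerns} as well as the
kern values are invented.

\starttyping
-- kerns[left][right] = amount (in AFM units)
--
-- glyph      : the glyph that lacks kerns
-- left_shape : the glyph that it resembles when seen from the left
-- right_shape: the glyph that it resembles when kerned at its right
local function inherit_kerns(kerns, glyph, left_shape, right_shape)
    -- pairs (x, glyph): the left side of the glyph is what matters
    for _, rights in pairs(kerns) do
        if rights[left_shape] and rights[glyph] == nil then
            rights[glyph] = rights[left_shape]
        end
    end
    -- pairs (glyph, x): the right side of the glyph is what matters
    local target = kerns[glyph] or { }
    for right, amount in pairs(kerns[right_shape] or { }) do
        if target[right] == nil then
            target[right] = amount
        end
    end
    kerns[glyph] = target
end

local kerns = {
    T = { a = -70, e = -80 },
    a = { v = -5 },
    e = { v = -10 },
}

inherit_kerns(kerns, "eacute", "e", "e") -- same shape at both sides
inherit_kerns(kerns, "ae",     "a", "e") -- an a at the left, an e at the right
\stoptyping
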
So, when all this is taken care of, we will have a reasonably
robust and compatible way to deal with \AFM\ files and when this
variant is enabled, we can prune our \TEX\ trees pretty well.
Also, now that we have font related tables, we can start moving
tables built out of \TEX\ macros (think of protruding and hz) to
\LUA, which will not only save us many hash entries but also
permits faster implementations.

The question may arise why there is no hard coded \AFM\ reader.
Although some speed up can be achieved by reading the table with
\AFM\ data directly, there would still be the issue of making that
table accessible for manipulations as described (costs time too).
The \AFM\ format is human readable, contrary to the \TFM\ format,
and such files can therefore conveniently be processed by \LUA. Also,
the possible manipulations may differ per macro package, user, and
even document. The chances of users and developers reaching an
agreement about such issues are near zero. By writing the reader in
\LUA, a macro package writer can also implement caching mechanisms
that suit the package. Also, keep in mind that we often only need
to load about four \AFM\ files or a few more when we mix fonts.

In my main tree (regular distributions) there are some 350 files
in \type {texnansi} encoding that take over 2~MByte. My personal
font tree has over a thousand such entries which means that we can
prune the tree considerably when we use the \AFM\ loader. Why
bother about \TFM\ when \AFM\ can do the job?

In order to reduce the overhead in reading the \AFM\ file, we now
use external caching, which (in \CONTEXT\ \MKIV) boils down to
serializing the internal \AFM\ tables and compiling them to
bytecode. As a result, the runtime becomes comparable to a run
using regular \TFM\ files. On this document using the \AFM\ reader
(cached) takes some 0.3 seconds more on 8 seconds total (28 pages
in Optima Nova with a couple of graphics).

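
The principle of such a cache can be sketched in a few lines of plain
\LUA. This is only an illustration under simplifying assumptions and not
the serializer that \MKIV\ uses: the \type {serialize} function is made
up, it only handles numbers, strings, booleans and nested tables, and
keys have to be numbers or strings. Writing the bytecode to a file and
reading it back is left out.

\starttyping
-- Turn a table into Lua source code that rebuilds it.
local function serialize(value, indent)
    indent = indent or ""
    if type(value) == "table" then
        local lines, inner = { "{" }, indent .. "  "
        for k, v in pairs(value) do
            local key = type(k) == "number" and ("[" .. k .. "]")
                or string.format("[%q]", k)
            lines[#lines+1] = inner .. key .. " = " .. serialize(v, inner) .. ","
        end
        lines[#lines+1] = indent .. "}"
        return table.concat(lines, "\n")
    elseif type(value) == "string" then
        return string.format("%q", value)
    else
        return tostring(value)
    end
end

local loader = loadstring or load -- Lua 5.1 versus later versions

-- mock data that stands in for the tables built from an AFM file
local afm = {
    fullname   = "Demo-Regular",
    characters = { a = { width = 500 }, b = { width = 556 } },
}

local chunk    = loader("return " .. serialize(afm))
local bytecode = string.dump(chunk) -- this string would go into the cache
local cached   = loader(bytecode)() -- a later run gets the table back
\stoptyping
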
While we were playing with this, Hermann Zapf surprised me by
sending me a \CD\ with his marvelous new Palatino Sans. So,
instead of generating \TFM\ metrics, I decided to use \type
{ttf2afm} to generate an \AFM\ file from the \TRUETYPE\ files
and use these metrics. It worked right out of the box which means
that one can copy a set of font files directly from the source to
the tree. In a demo document the Palatino Sans came out quite well
and so we will use this font to explore the upcoming \OPENTYPE\
features.

Because we now have fewer font resources (only two files per font)
we decided to get away from the spread||all||over||the||tree
paradigm. For this we introduced

\starttyping
../fonts/data/vendor/collection
\stoptyping

like:

\starttyping
../fonts/data/tex/latin-modern
../fonts/data/tex-gyre/bonum
../fonts/data/linotype/optima-nova
../fonts/data/linotype/palatino-nova
../fonts/data/linotype/palatino-sans
\stoptyping

Of course one needs to adapt the related font paths in the
configuration files but getting that done in \TEX\ distributions is
another story.

\subject{map files}

Reading an \AFM\ file is only part of the game. Because we bypass
the regular \TFM\ reader we may internally end up with different
names of fonts (and|/|or files). This also means that the map
files that map an internal name onto a font (outline) file may be
of no use. The map file also specifies the encoding file which
maps character numbers onto names used in font files.

The map file maps a font name to a (preferably outline) font
resource file. This can be a file with suffix \type {pfb}, \type
{ttf}, \type {otf} or the like. When we convert an \AFM\ file into a
more suitable format, we also store the associated (outline)
filename, which we use later when we assemble the map line data (we
use \type {\pdfmapline} to tell \LUATEX\ how to prepare and embed
a file).

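
Assembling such a map line is mostly string concatenation. The sketch
below follows the common layout of a \PDFTEX\ map entry (metric name,
font name, an optional encoding file and the font file, both prefixed
with \type {<}), with a leading \type {+} that asks for the entry to be
added to the ones already known. The function \type {mapline}, the field
names and all file names are placeholders and not the ones that \MKIV\
uses.

\starttyping
local function mapline(entry)
    local parts = { entry.tfmname, entry.psname }
    if entry.encoding then
        parts[#parts+1] = "<" .. entry.encoding
    end
    parts[#parts+1] = "<" .. entry.fontfile
    return "+" .. table.concat(parts, " ")
end

local line = mapline {
    tfmname  = "texnansi-lmr10",
    psname   = "LMRoman10-Regular",
    encoding = "texnansi.enc",
    fontfile = "lmr10.pfb",
}

-- Inside LuaTeX the result could be fed back to TeX with something like:
--   tex.sprint("\\pdfmapline{" .. line .. "}")
print(line)
\stoptyping
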
Eventually \LUATEX\ will take care of all these issues itself
thereby rendering map files and encoding files kind of useless.
When loading an \AFM\ file we already have to read encoding files,
so we have all the information available that normally goes into
the map file. While conducting experiments with reading \AFM\
files, we therefore could use the \type {\pdfmapline} primitive to
push the right entries into the font inclusion machinery. Because
\CONTEXT\ already handles map data itself we could easily hook
this into the normal handlers for that. (There are some nasty
synchronization issues involved in handling map entries in general
but we will not bother you with that now.)

Although eventually we may get rid of map files, we also used the
general map file handling in \CONTEXT\ as a playground for the
\XML\ handler that we wrote in \LUA. Playing with many map files
(a few KBytes) coded in \XML\ format, or with one big map file
(easily 800 MBytes) makes a good test case for loading and dumping.

But why bother too much about map files in \LUATEX\ \unknown\ they
will go away anyway.

\subject{OTF \& TTF}

One of the reasons for starting the \LUATEX\ development was that we wanted to
be able to use \OPENTYPE\ (and \TRUETYPE) fonts in \PDFTEX. As a prelude (and kind of
transition) we first dealt with \TYPEONE\ using either \TFM\ or \AFM. For \TEX\ it does
not really matter what font is used, it only deals with dimensions and generic
characteristics. Of course, when fonts offer more advanced possibilities, we may
need more features in the \TEX\ kernel, but think of \HZ\ or protruding as provided
by \PDFTEX: it's not part of the font (specification) but of the engine. The same
is actually true for kerning and ligature building, although here the font (data) may
provide the information needed to deal with it properly.

\OPENTYPE\ fonts come with features. Examples of features are using oldstyle figures or
tabular digits instead of the default ones. Dealing with such issues boils down to
replacing one character representation by another or treating combinations of characters
in the input differently depending on the circumstances. There can be relationships
between languages and scripts, but, as \TEX ies know, other relationships exist as well,
for instance between content and visualization.

Therefore, it will be no surprise that \LUATEX\ does not simply implement the \OPENTYPE\
specification as such. On the one hand it implements a way to load information stored
in the font, on the other hand it implements mechanisms to fulfill the demands of such
fonts and more. The glue between both is done with \LUA. In the simple case of ligatures
and kerns this goes as follows. A user (or macro package) specifies a font, and this
call can be intercepted using a callback. This callback can use a built in function that
loads an \OTF\ or \TTF\ font into a table. From this table, a font table is constructed
that is passed on to \TEX. The construction may involve building ligature and kerning
tables using the information present in the font file, but it may as well mean more. So,
given a bare \LUATEX\ system, \OPENTYPE\ font support does not automatically give you
handling of features, or more precisely, there is no hard coded support for features.

This may sound like a disadvantage
but as soon as you start looking at how \TEX\ users use their system (in most cases
by using a macro package) you may understand that flexibility is larger this way. Instead
of adding more and more control and exceptions, and thereby making the kernel more
unstable and complex, we delegate control to the macro package. The advantage is that
there are no (everlasting) discussions on how to deal with things and in the end the
user will use a high level interface anyway. Of course the macro package needs proper
access to the font's internals, but this is provided: the code used for reading in the
data comes from FontForge (an advanced font editor) and is presented via \LUA\ tables
in a well organized way.

Given that users expect \OPENTYPE\ features to be supported, how do we provide an
interface? In \CONTEXT\ the user interface has always been an important aspect and
consistency is a priority. On the other hand, there has been the tradition of specifying
the size explicitly and a new custom introduced by \XETEX\ to enhance the font name
with directives. Traditional \TEX\ provides:

\starttyping
\font \name filename [optional size]
\stoptyping

\XETEX\ accepts

\starttyping
\font \name "fontname[:optional features]" [optional size]
\font \name  fontname[:optional features]  [optional size]
\stoptyping

Instead of a fontname one can pass a filename between square brackets. \LUATEX\
handles:

\starttyping
\font \name  anything  [optional size]
\font \name {anything} [optional size]
\stoptyping

where anything as well as the size are passed on to the callback.

This permits us to implement a traditional specification, support \XETEX\ like
definitions, and easily pass information from a macro package down to the
callback as well. Interpreting anything is done in \LUA.

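
A much simplified interpreter for such a specification could look as
follows. It is a hypothetical sketch and not the \MKIV\ parser: it
splits off an optional feature list after a colon and an optional
constructor after an \type {@}, and it ignores quotes and filenames
between square brackets. The size is not part of the string, because it
is passed to the callback separately.

\starttyping
local function parse_specification(spec)
    local result = { features = { } }
    -- everything after the first colon is a list of features
    local rest, features = spec:match("^(.-):(.*)$")
    if not rest then
        rest = spec
    end
    -- an @ separates the name from the constructor (method)
    local name, method = rest:match("^(.-)@(.*)$")
    result.name   = name or rest
    result.method = method
    for feature in (features or ""):gmatch("[^;,%s]+") do
        result.features[feature] = true
    end
    return result
end

local xetexlike = parse_specification("lmroman10-regular:dlig;liga")
local virtual   = parse_specification("myfont@weird")
local plain     = parse_specification("texnansi-lmr10")
\stoptyping
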
While implementing the \LUA\ side of the loader we took an approach similar
to that of the \AFM\ reader: we cache intermediate tables and keep track
of font names (in addition to filenames). In order to be able to quickly
determine the (internal) font name of an \OPENTYPE\ font, special loader
functions are provided.

The size is kind of special, because we can have specifications like

\starttyping
at 10pt
at 3ex
at \dimexpr\bodyfontsize+1pt\relax
\stoptyping

This means that we need to handle that on the \TEX\ side and pass the
calculated value to the callback.

Virtual fonts have a rather special nature. They permit you to define variations
of fonts using other fonts and special (\DVI\ related) operators. However, from the
perspective of \TEX\ itself they don't exist at all. When you create a virtual font
you also end up with a \TFM\ file and \TEX\ only needs this file, which defines
characters in terms of a width, height, depth and italic correction as well as
associates characters with kerning pairs and ligatures. \TEX\ leaves it to the
backend to deal with the actual glyphs and therefore the backend will be confronted
with the internals of a virtual font. Because \PDFTEX\ and therefore \LUATEX\ have the
backend built in, they are capable of handling virtual font information.

In \LUATEX\ you can build your own virtual font and this will suit us well. It
permits us for instance to complete fonts that lack certain characters (glyphs) and
thereby lets us get rid of ugly macro based fallback trickery. Although in \CONTEXT\
we will provide a high level interface, we will give you a taste of \LUA\ here.

\starttyping
callback.register("define_font", function(name,size)
    if name == "demo" then
        local f = font.read_tfm('texnansi-lmr10',size)
        if f then
            local capscale, digscale = 0.85, 0.75
            f.name, f.type = name, 'virtual'
            f.fonts = {
                { name="texnansi-lmr10" , size=size },
                { name="texnansi-lmss10", size=size*capscale },
                { name="texnansi-lmtt10", size=size*digscale }
            }
            for k,v in pairs(f.characters) do
                local chr = utf.char(k)
                if chr:find("[A-Z]") then
                    -- capitals: second font, in red
                    v.width = capscale*v.width
                    v.commands = {
                        {"special","pdf: 1 0 0 rg"},
                        {"font",2}, {"char",k},
                        {"special","pdf: 0 g"}
                    }
                elseif chr:find("[0-9]") then
                    -- digits: third font, in blue
                    v.width = digscale*v.width
                    v.commands = {
                        {"special","pdf: 0 0 1 rg"},
                        {"font",3}, {"char",k},
                        {"special","pdf: 0 g"}
                    }
                else
                    -- anything else: first font
                    v.commands = {
                        {"font",1}, {"char",k}
                    }
                end
            end
            return f
        end
    end
    return font.read_tfm(name,size)
end)
\stoptyping

Here we define a virtual font that uses three real fonts and
which font is used depends on the kind of character we're
dealing with (in real world situations we can best use the \MKIV\ function
that tells what class a character belongs to). The \type {commands}
table determines which glyphs come out in what way. We use a bit of
literal pdf code to color the special characters but generally color is
not handled at the font level.

This example can be used like:

\starttyping
\font\test=demo \test
Hi there, this is the first (number 1) example of playing with
Virtual Fonts, some neat feature of \TeX, once you have access
to it. For instance, we can misuse it to fill in gaps in fonts.
\stoptyping

During development of this mechanism, we decided to save some redundant
loading by permitting id's in the fonts array:

\starttyping
callback.register("define_font", function(name,size)
    if name == "demo" then
        local f = font.read_tfm('texnansi-lmr10',size)
        if f then
            local id = font.define(f) -- define once, refer to it by id
            local capscale, digscale = 0.85, 0.75
            f.name, f.type = name, 'virtual'
            f.fonts = {
                { id=id },
                { name="texnansi-lmss10", size=size*capscale },
                { name="texnansi-lmtt10", size=size*digscale }
            }
            for k,v in pairs(f.characters) do
                local chr = utf.char(k)
                if chr:find("[A-Z]") then
                    v.width = capscale*v.width
                    v.commands = {
                        {"special","pdf: 1 0 0 rg"},
                        {"slot",2,k},
                        {"special","pdf: 0 g"}
                    }
                elseif chr:find("[0-9]") then
                    v.width = digscale*v.width
                    v.commands = {
                        {"special","pdf: 0 0 1 rg"},
                        {"slot",3,k},
                        {"special","pdf: 0 g"}
                    }
                else
                    v.commands = {
                        {"slot",1,k}
                    }
                end
            end
            return f
        end
    end
    return font.read_tfm(name,size)
end)
\stoptyping

Hardwiring fontnames in callbacks this way does not deserve a prize and
when possible we will provide better extension interfaces. Anyhow,
in the experimental \CONTEXT\ code we used calls like this, where
\type {demo} is an installed feature.

\startbuffer
\font\myfont = special@demo-1 at 12pt \myfont
Hi there, this is the first (number 1) example of playing with Virtual Fonts,
some neat feature of \TeX, once you have access to it. For instance, we can
misuse it to fill in gaps in fonts.
\stopbuffer

\typebuffer \start \getbuffer \par \stop

Keep in mind that this is just an example. In practice we will not do such things
at the font level but by manipulating \TEX's internals.

While developing this functionality and especially when Taco was
programming the backend functionality, we used more sane \MKIV\ code. Think
of (still \LUA) definitions like:

\startbuffer
\ctxlua {
    fonts.definers.methods.install("weird", {
        { "copy-range",     "lmroman10-regular"                      } ,
        { "copy-char",      "lmroman10-regular",          65,     66 } ,
        { "copy-range",     "lmsans10-regular",       0x0100, 0x01FF } ,
        { "copy-range",     "lmtypewriter10-regular", 0x0200, 0xFF00 } ,
        { "fallback-range", "lmtypewriter10-regular", 0x0000, 0x0200 }
    })
}
\stopbuffer

\typebuffer \getbuffer

Again, this is not the final user interface, but it shows the
direction we're heading. The result looks like:

\startbuffer
\font\test={myfont@weird} at 12pt \test
\eacute \rcaron \adoublegrave \char65
\stopbuffer

\typebuffer

This shows up as:

\start \getbuffer \stop

Here the \type {@} tells the (new) \CONTEXT\ font handler what constructor
should be used.

Because some testers already have \XETEX\ font support files, we
also support a \XETEX\ like definition syntax.

\startbuffer
\font\test={lmroman10-regular:dlig;liga}\test
f i fi ffi \crlf
f i f\kern0pti f\kern0ptf\kern0pti \crlf
\char64259 \space\char64256 \char105 \space \char102\char102\char105
\stopbuffer

\typebuffer

This gives:

\start \getbuffer \stop

We are quite tolerant with regard to this specification and will provide less
dense methods as well. Of course we need to implement a whole bunch of
features but we will do this in such a way that we give users full control.

\subject{encodings}

By now we've reached a stage where we can get rid of font encodings. We now
have the full unicode range available and no longer depend on the font
encoding when we hyphenate. In a previous chapter we discussed the difference
in size between formats.

\starttabulate[|c|c|c|]
\NC \bf date   \NC \bf luatex \NC \bf pdftex \NC \NR
\NC 2006-10-23 \NC 3 135 568  \NC 7 095 775  \NC \NR
\NC 2007-02-18 \NC 3 373 206  \NC 7 426 451  \NC \NR
\NC 2007-02-19 \NC 3 060 103  \NC 7 426 451  \NC \NR
\stoptabulate

The size of the formats has grown a bit due to a few more
patterns and an extra preloaded encoding. But the \LUATEX\
format shrinks some 10\% now that we can get rid of encoding
support. Some support for encodings is still present, so that
one can keep using the metric files that are installed (for
instance in project related trees that have special fonts)
although \AFM/\TYPEONE\ files or \OPENTYPE\ fonts will be used when
available.

A couple of years from now, we may throw away some \LUA\ code
related to encodings.

\subject{files}

\TEX\ distributions tend to be rather large, both in terms of
files and bytes. Fonts take most of the space. The merged
\TEX Live 2007 trees contain some 60.000 files that take
1.123 MBytes. Of this, 25.000 files concern fonts, totaling
431 MBytes. A recent \CONTEXT\ distribution spans 1200 files and
20 MBytes and a bit more when third party modules are taken into
account. The fonts in \TEX Live are distributed as follows:

\starttabulate[|l|r|r|r|r|]
\HL
\NC \bf format \NC \bf files \NC \bf bytes \NC \bf local files \NC \bf local bytes \NC \NR
\HL
\NC AFM      \NC  1.769 \NC 123.068.970 \NC    443 \NC 22.290.132 \NC \NR
\NC TFM      \NC 10.613 \NC  44.915.448 \NC  2.346 \NC  8.028.920 \NC \NR
\NC VF       \NC  3.798 \NC   6.322.343 \NC    861 \NC  1.391.684 \NC \NR
\NC TYPE1    \NC  2.904 \NC 180.567.337 \NC    456 \NC 18.375.045 \NC \NR
\NC TRUETYPE \NC     22 \NC   1.494.943 \NC        \NC            \NC \NR
\NC OPENTYPE \NC    144 \NC  17.571.732 \NC        \NC            \NC \NR
\NC ENC      \NC    268 \NC     782.680 \NC        \NC            \NC \NR
\NC MAP      \NC    406 \NC   6.098.982 \NC    110 \NC    129.135 \NC \NR
\NC OFM      \NC     39 \NC  10.309.792 \NC        \NC            \NC \NR
\NC OVF      \NC     39 \NC     413.352 \NC        \NC            \NC \NR
\NC OVP      \NC     22 \NC   2.698.027 \NC        \NC            \NC \NR
\NC SOURCE   \NC  4.736 \NC  25.932.413 \NC        \NC            \NC \NR
\HL
\stoptabulate

We omitted the more obscure file types. The last two columns show the
numbers for one of my local font trees.

In due time we will see a shift from \TYPEONE\ to \OPENTYPE\ and \TRUETYPE\
files and because these fonts are more complete, they may take some more space.
More important is that the \TEX\ specific font metric files will phase out and
the fewer \TYPEONE\ fonts we have, the fewer \AFM\ companions we need (\AFM\
files are not compressed and therefore relatively large). Mapping and encoding
files can also go away.

In \LUATEX\ we can do with fewer files, but the number of bytes may grow a bit
depending on how much is cached (especially fonts). Anyhow, we can safely
assume that a \LUATEX\ based distribution will carry fewer files and fewer
bytes around.

\subject{fallbacks}

Do we need virtual fonts? Currently in \CONTEXT, when a font encoding is chosen, a
fallback mechanism steps in as soon as a character is not in the encoding. So far,
so good. But occasionally we run into a font that does not (completely) fit an
encoding and we end up with defining a non standard one. In traditional \TEX\
a side effect of font encodings is that they relate to hyphenation. \CONTEXT\ can
deal with that comfortably and multiple instances of the same set of hyphenation
patterns can be loaded, but for custom encodings this is kind of cumbersome.

In \LUATEX\ we have just one font encoding: \UNICODE. When \OPENTYPE\ fonts are used,
we don't expect many problems related to missing glyphs, but you can bet on it that
they will occur. This is where in \CONTEXT\ \MKIV\ fallbacks will be used and this
will be implemented using virtual fonts. The advantage of using virtual fonts is that
we still deal with proper characters and hyphenation will take place as expected. And
since virtual fonts can be defined on the fly, we can be flexible in our implementation.
We can think of generic fallbacks, not much different than macro based representations,
or font specific ones, where we even may rely on \METAPOST\ for generating the glyph
data.

How do we define a fallback character? When building this mechanism I used the
\quote {\textcent} as an example. A cent symbol is roughly defined as follows:

\starttyping
local t = table.fastcopy(g.characters[0x0063])       -- mkiv function
local s = fonts.constructors.scaled(g.fonts[1].size) -- mkiv function
t.commands = {
    {"push"},
    {"slot", 1, 0x0063},          -- the "c" from the main font
    {"pop"},
    {"right", .5*t.width},
    {"down",  .2*t.height},
    {"rule", 1.4*t.height, .02*s} -- the vertical rule through the "c"
}
t.depth  = 0.2*t.height -- set before the height gets enlarged
t.height = 1.2*t.height
\stoptyping

Here, \type {g} is a loaded font (table) which has type \type {virtual}. The
first font in the \type {fonts} array is the main font. What happens here
is the following: we assign the characteristics of \quote {c} to the cent
symbol (this includes kerning and dimensions) and then define a command
sequence that draws the \quote {c} and a vertical rule through it.

The real code is slightly more complicated because we need to take care of
italic properties when applicable and because we have added some tracing too.
While playing with these kinds of things, it becomes clear what features are
handy, and the reason that we now have a virtual command \type {comment} is
that it permits us to implement tracing (using for instance color specials).

\def\TestLine#1%
  {\start
   \font\test=#1\relax
   \test
   c\quad
   \textcent\quad
   \ruledhbox{c}\quad
   \ruledhbox{\textcent}\quad
   \scaron\quad
   \eacute\quad
   \adiaeresis\quad
   \udiaeresis\quad
   \char 465\quad
   \char 463\quad
   \char7685\quad
   \stop
   \blank}

\TestLine {lmroman10-regular@demo-2 at 24pt}
\TestLine {lmroman10-italic@demo-2  at 24pt}

699The previous lines are typeset using a similar specification as mentioned
700before:
701
702\starttyping
703\font\test=lmroman10-regular@demo-2
704\stoptyping
705
706Without the fallbacks we get:
707
708\TestLine {lmroman10-regular at 24pt}
709\TestLine {lmroman10-italic  at 24pt}
710
711And with normal (non forced fallbacks) it looks as follows. As it happens,
712this font has a cent symbol so no fallback is needed.
713
714\TestLine {lmroman10-regular@demo-3 at 24pt}
715\TestLine {lmroman10-italic@demo-3  at 24pt}
716
717The font definition callback intercepts the \type {demo-2} and a couple of
718chained lua functions make sure that characters missing in the font are
719replaced by fallbacks. In the case of missing composed characters, they are
720constructed from their components. In this particular example we have told
721the handler to assume that all composed characters are missing.
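
How such a chain might look is sketched below. Everything in it is hypothetical:
the names \type {components}, \type {compose}, \type {fallbacks} and \type
{apply_fallbacks}, the \type {force} flag and the mock font. The accent placement
is deliberately naive (centered, not raised) and the real \MKIV\ code differs.

\starttyping
-- hypothetical decomposition data: composed slot -> { base, accent }
local components = {
    [0x00E9] = { 0x0065, 0x00B4 }, -- eacute = e + acute
}

-- build a composed character from its parts (naive accent placement)
local function compose(g, slot)
    local parts = components[slot]
    if not parts then
        return nil
    end
    local base   = g.characters[parts[1]]
    local accent = g.characters[parts[2]]
    if not (base and accent) then
        return nil
    end
    return {
        width    = base.width,
        height   = math.max(base.height, accent.height),
        depth    = base.depth,
        commands = {
            { "push" },
            { "right", (base.width - accent.width)/2 },
            { "slot", 1, parts[2] },
            { "pop" },
            { "slot", 1, parts[1] },
        },
    }
end

-- the chain: each handler returns a character table or nil
local fallbacks = { compose }

local function apply_fallbacks(g, wanted, force)
    for _, slot in ipairs(wanted) do
        if force or not g.characters[slot] then
            for _, handler in ipairs(fallbacks) do
                local t = handler(g, slot)
                if t then
                    g.characters[slot] = t
                    break
                end
            end
        end
    end
end

-- mock virtual font, dimensions in scaled points (invented values)
local g = {
    type       = "virtual",
    fonts      = { { name = "mock", size = 10*65536 } },
    characters = {
        [0x0065] = { width = 290000, height = 280000, depth = 0 },
        [0x00B4] = { width = 320000, height = 450000, depth = 0 },
    },
}

-- true: assume that composed characters are missing
apply_fallbacks(g, { 0x00E9 }, true)
\stoptyping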

\subject{memory}

Traditional \TEX\ has been designed for speed and a small memory footprint. Today's
implementations are considerably more generous with the amount of memory that
you can use (hash, fonts, main memory, patterns, backend, etc). Depending
on how complicated a document layout is, memory may run into tens of megabytes.

Because \LUATEX\ is not only suitable for wide fonts, but also does away with some of
the optimizations in the \TEX\ code that complicate extensions, it has a larger
footprint than \PDFTEX. When implementing the \OPENTYPE\ font basics, we did quite
some tests with respect to memory usage. Getting the numbers right is non trivial
because the \LUA\ garbage collector interferes. For instance, on my machine a
test file with the regular \CONTEXT\ setup of Latin Modern fonts made \LUA\
allocate 130 MB, while the same run on Taco's machine took 100 MB.

When a font data table is constructed, it is handed over to \TEX, and turned into
the internal font data structures. During the construction of that table at the
\LUA\ end, \CONTEXT\ \MKIV\ disables the garbage collector. By doing this, the time
needed to construct and scale a font can be halved. Curious about the amount of memory
involved in passing such a table, I added the following piece of code:

\starttyping
if type(fontdata) == "table" then
    local s = statistics.luastate_bytes    -- memory in use before the copy
    local t = table.copy(fontdata)         -- duplicate the font table
    local d = statistics.luastate_bytes-s  -- growth caused by the copy
    texio.write_nl(string.format("table memory footprint: %s",d))
end
\stoptyping

It turned out that a regular Latin Modern font (\OPENTYPE) takes around
800 KB. However, more interesting was that adding this snippet of test code,
which duplicated the table in order to measure its size, made the total memory
footprint drop to 100 MB (about the amount used on Taco's machine). This demonstrates
that one should be very careful with drawing conclusions.
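
Outside \LUATEX\ a comparable measurement can be made with the standard \type
{collectgarbage("count")} call, which reports the memory in use in kilobytes.
The sketch below is a stand-in for the snippet above, not the \MKIV\ code: \type
{deepcopy}, \type {footprint} and the mock \type {fontdata} are invented names,
and the collector is stopped during the copy so that a collection cycle does not
distort the difference.

\starttyping
local function deepcopy(t)
    local r = { }
    for k, v in pairs(t) do
        r[k] = type(v) == "table" and deepcopy(v) or v
    end
    return r
end

-- return the approximate size in bytes of a copy of t, plus the copy
local function footprint(t)
    collectgarbage("collect")       -- start from a clean state
    collectgarbage("stop")          -- no collection while we measure
    local before = collectgarbage("count")
    local copy   = deepcopy(t)
    local after  = collectgarbage("count")
    collectgarbage("restart")
    return (after - before) * 1024, copy
end

-- mock font data: 1000 characters with a few dimensions each
local fontdata = { characters = { } }
for i = 1, 1000 do
    fontdata.characters[i] = { width = i, height = 2*i, depth = 0 }
end

local bytes = footprint(fontdata)
print(string.format("table memory footprint: %.0f bytes", bytes))
\stoptyping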

Because fonts are rather important in \TEX\ and because many of them can be
in use, it makes sense to keep an eye on memory as well as performance.
Because many manipulations now take place in \LUA, it no longer makes sense
to let \TEX\ buffer fonts. In plain \TEX\ one finds these magic

\starttyping
\font\preloaded=cmr10
\font\preloaded=cmr12
\stoptyping

lines. The second definition obscures the first, but the \type {cmr10} stays
loaded.

\starttyping
\font\one=cmr10 at 10pt
\font\two=cmr10 at 10pt
\stoptyping

These two definitions make \TEX\ load the font only once. However, since
we can now delegate loading to \LUA, \TEX\ no longer helps us there. For instance,
\TEX\ has no knowledge to what extent this \type {cmr10} font has been manipulated
and therefore both instances may actually differ.

When you use a callback to define the font, \TEX\ passes a font id number. You can
use this number as a reference to a loaded font (that is, passed to \TEX). If
instead of a table, you return a number, \TEX\ will reuse the already loaded font.
This feature can save you a lot of time, especially when a macro package (like
\CONTEXT) defines fonts dynamically, which means that when grouping is used, fonts
get (re)defined a lot. Of course additional caching can take place at the \LUA\ end,
but there one needs to take into account more than just the scaled instance. Think of
\OPENTYPE\ features or virtual font properties. The following are quite certainly
different setups, in spite of the common size.

\starttyping
\font\one=lmr10@demo-1 at 10pt
\font\two=lmr10@demo-2 at 10pt
\stoptyping
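
A cache at the \LUA\ end therefore needs a key that covers the complete
specification and not just the size. The sketch below assumes a definition
function that receives the name, the size and the id; that signature, the
hypothetical \type {load_font} loader and the \type {define} function are
assumptions made for illustration, not the \MKIV\ implementation.

\starttyping
local cache = { } -- full specification -> font id

-- hypothetical loader: real code would build the scaled font table here
local function load_font(name, size)
    return { name = name, size = size, characters = { } }
end

-- return a table for a new font, or the id of one that was defined before
local function define(name, size, id)
    local key = name .. " @ " .. size -- the name includes the @demo-n part
    local known = cache[key]
    if known then
        return known
    end
    cache[key] = id
    return load_font(name, size)
end

-- inside LuaTeX a function like this could be registered with
-- callback.register("define_font", define), given a real loader

local a = define("lmr10@demo-1", 655360, 1) -- a table: new font
local b = define("lmr10@demo-2", 655360, 2) -- a table: same size, other setup
local c = define("lmr10@demo-1", 655360, 3) -- a number: reuse the first font
\stoptyping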

When scaling a font, one not only needs to handle the regular glyph dimensions, but also the
kerning tables. We found out that dealing with such issues takes some 25\% of the time
spent on loading Latin Modern fonts that have rather extensive kerning tables.
When creating a virtual font, copying glyph tables may happen a lot. Deep copying
tables takes a bit of time. This is one of the reasons why we discussed (and are considering)
some dedicated support functions so that copying and recalculating tables happens faster
(less costly hash lookups and such). On the other hand, the time spent on calculations
(including rounding to scaled points) can be neglected.
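
To make clear what has to be touched when scaling, here is a minimal sketch that
scales dimensions and kerns and rounds them to scaled points. The function names,
the layout of the character table and the numbers are assumptions for
illustration; a real loader handles many more fields.

\starttyping
-- round to the nearest integer number of scaled points
local function round(x)
    return math.floor(x + 0.5)
end

-- return a scaled copy of an (assumed) unscaled character table
local function scale_characters(characters, factor)
    local scaled = { }
    for slot, chr in pairs(characters) do
        local t = {
            width  = round(factor * (chr.width  or 0)),
            height = round(factor * (chr.height or 0)),
            depth  = round(factor * (chr.depth  or 0)),
        }
        if chr.kerns then
            -- every kern pair has to be visited too
            local kerns = { }
            for other, kern in pairs(chr.kerns) do
                kerns[other] = round(factor * kern)
            end
            t.kerns = kerns
        end
        scaled[slot] = t
    end
    return scaled
end

-- mock data in font units (1000 per em), scaled to 10pt in scaled points
local characters = {
    [0x0041] = { width = 750, height = 683, depth = 0,
                 kerns = { [0x0056] = -111 } },
    [0x0056] = { width = 750, height = 683, depth = 22 },
}
local factor = 10 * 65536 / 1000
local scaled = scale_characters(characters, factor)
\stoptyping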

The following table shows what happens when we enforce a different
garbage collecting scheme. This test was triggered by another experiment
where at regular times, for instance after a page is shipped out, we say

\starttyping
collectgarbage("collect")
\stoptyping

However, such a complete sweep has drastic consequences for the runtime.
But, since the memory footprint becomes 10--15\% less by doing so, we
played a bit with

\starttyping
collectgarbage("setstepmul", somenumber)
\stoptyping

When processing a not so large file but one that loads a bunch of \OPENTYPE\
fonts, we get the following values. The left set is on Linux (Taco's machine)
and the right set is on mine.

\starttabulate[|r|r|r|r|r|]
\NC \bf stepmul \NC \bf run (s) \NC \bf mem (MB) \NC \bf run (s) \NC \bf mem (MB) \NC \NR
\HL
\NC     200 \NC 1.58    \NC 69.14    \NC 5.6     \NC 84.17     \NC \NR
\NC    1000 \NC 1.63    \NC 69.14    \NC 6.5     \NC 72.32     \NC \NR
\NC    2000 \NC 1.64    \NC 60.66    \NC 6.8     \NC 73.53     \NC \NR
\NC   10000 \NC 1.71    \NC 59.94    \NC 7.0     \NC 72.30     \NC \NR
\stoptabulate

Since I use an old laptop running Windows with a probably
different \TEX\ configuration (fonts), and under some load, the two sets
don't compare well, but the general idea is the same. For practical usage
a value of 1000 is probably best, especially because memory intensive font
and script loading only happens at the first couple of pages.
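
An experiment of this kind can be repeated in plain \LUA. The following sketch
is not the test that produced the table: the \type {experiment} and \type {job}
functions and the workload are invented for illustration, and the numbers that
it prints depend on the machine and the \LUA\ version.

\starttyping
-- run a job with a given step multiplier and report the
-- runtime (seconds) and the peak memory (megabytes)
local function experiment(stepmul, job)
    collectgarbage("collect")
    local old   = collectgarbage("setstepmul", stepmul)
    local start = os.clock()
    local peak  = job()
    local time  = os.clock() - start
    collectgarbage("setstepmul", old) -- restore the previous setting
    return time, peak / 1024
end

-- hypothetical job: create many short lived tables and track the peak
local function job()
    local peak = 0
    for i = 1, 200 do
        local t = { }
        for j = 1, 1000 do
            t[j] = { width = j, height = i }
        end
        local used = collectgarbage("count") -- kilobytes
        if used > peak then
            peak = used
        end
    end
    return peak
end

for _, stepmul in ipairs { 200, 1000, 2000, 10000 } do
    local seconds, megabytes = experiment(stepmul, job)
    print(string.format("%5d  %.2f s  %.2f MB", stepmul, seconds, megabytes))
end
\stoptyping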

\stopcomponent