% language=us

\startcomponent onandon-ffi

\environment onandon-environment

\startchapter[title={Plug mode, an application of ffi}]

A while ago, at an NTG meeting, Kai Eigner and Ivo Geradts demonstrated how to
use the Harfbuzz (hb) library for processing \OPENTYPE\ fonts. Their main
motivation for playing with it was that it provides a way to compare the \LUA\
based font machinery with other methods. They also assumed that it would give
better performance for complex fonts and|/|or scripts.

One of the guiding principles of \LUATEX\ development is that we don't provide
hard coded solutions. For that reason we opened up the internals so that one can
provide solutions written in pure \LUA, but, of course, one can cooperate with
libraries via \LUA\ code as well. Hard coding solutions makes no sense as there
are often several solutions possible, depending on one's needs. Although
development is closely related to \CONTEXT, the development of the \LUATEX\
engine is generic. We try to be macro package agnostic. Already at an early
stage we made sure that the \CONTEXT\ font handler could be used in other
packages as well, but one can easily dream up lightweight variants for specific
purposes. The standard \TEX\ font handling was kept and is called \type {base}
mode in \CONTEXT. The \LUA\ variant is tagged \type {node} mode because it
operates on the node list. Later we will refer to these modes.

With the output of \XETEX\ for comparison, the first motive mentioned for looking
into support for such a library is not that strong. And when we want to test
against the standard, we can use MS-Word. In a minimal \CONTEXT\ \MKIV\
installation one only has the \LUATEX\ engine. Maintaining several renderers
simultaneously might give rise to unwanted dependencies.

The second motive could be more valid for users because, for complex fonts, there
is|=|or at least was|=|a performance hit with the \LUA\ variant. Some fonts use
many lookup steps or are inefficient even in using their own features. It must be
said that till now I haven't heard \CONTEXT\ users complain about speed. In fact,
the font handling became many times faster over the last few years, and probably
no one even noticed. Also, when using alternatives to the built|-|in methods, in
the end you will lose functionality and|/|or interactions with other mechanisms
that are built into the current font system. Any possible gain in speed is lost,
or even becomes negative, when a user wants to use additional functionality that
requires additional processing. \footnote {In general we try to stay away from
libraries. For instance, graphics can be manipulated with external programs, and
caching the result is much more efficient than recreating it. Apart from \SQL\
support, where integration makes sense, I never felt the need for libraries. And
even \SQL\ can efficiently be dealt with via intermediate files.}

Just kicking in some alternative machinery is not the whole story. We still need
to deal with the way \TEX\ sees text, and that, in practice, is as a sequence of
glyph nodes|=|mixed with discretionaries for languages that hyphenate, glue,
kern, boxes, math, and more. It's the discretionary part that makes it a bit
complex. In contextual analysis as well as positioning one needs to process up to
three additional cases: the pre, post and replace texts|=|whether or not linked
backward and forward. And as applied features accumulate one ends up winding and
unwinding these snippets. In the process one also needs to keep an eye on spaces
as they can be involved in lookups. Also, when injecting or removing glyphs one
needs to deal with attributes associated with nodes. Of course something hard
coded in the engine might help a little, but then one ends up with the situation
where macro packages have different demands (and possible interactions) and no
solution is the right one. Using \LUA\ as glue is a way to avoid that problem. In
fact, once we go along that route, it starts making sense to come up with a
stripped down \LUATEX\ that might suit \CONTEXT\ better, but it's not a route we
are eager to follow right now.
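
To give an impression of what a handler has to deal with, here is a minimal
sketch (not the actual \CONTEXT\ code) that uses the standard \LUATEX\ node
library to collect the characters of a run, descending into discretionaries:

\starttyping
local glyph_id = node.id("glyph")
local disc_id  = node.id("disc")

local function collect(head,result)
    result = result or { }
    for n in node.traverse(head) do
        if n.id == glyph_id then
            result[#result+1] = n.char
        elseif n.id == disc_id then
            -- each discretionary carries three sublists that may all need
            -- analysis and positioning: the pre, post and replace texts
            collect(n.pre,result)
            collect(n.post,result)
            collect(n.replace,result)
        end
    end
    return result
end
\stoptyping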

Kai and Ivo are plain \TEX\ users so they use a font definition and switching
environment that is quite different from \CONTEXT. In an average \CONTEXT\ run
the time spent on font processing is measurable but not the main bottleneck
because other time consuming things happen. Sometimes the load on the font
subsystem can be higher because we provide additional features normally not found
in \OPENTYPE. Add to that a more dynamic font model and it will be clear that
comparing performance between situations that use different macro packages is not
that trivial (or relevant).

More reasons why we follow a \LUA\ route are that we: support (run time
generated) virtual fonts, are able to kick in additional features, can let the
font mechanism cooperate with other functionality, and so on. In the upcoming
years more trickery will be provided in the current mechanisms. Because we had to
figure out a lot of these \OPENTYPE\ things a decade ago, when standards were
fuzzy, quite some tracing and visualization is available. Below we will see some
timings. It's important to keep in mind that in \CONTEXT\ the \OPENTYPE\ font
handler can do a bit more if requested to do so, which comes with a bit of
overhead|=|something we can live with.

Some time after Kai's presentation he produced an article, and that was the
moment I looked into the code and tried to replicate his experiments. Because
we're talking libraries, one can understand that this is not entirely trivial,
especially because I'm on another platform than he is|=|Windows instead of OSX.
The first thing that I did was rewrite the code that glues the library to \TEX\
in a way that is more suitable for \CONTEXT. Mixing with existing modes (\type
{base} or \type {node} mode) makes no sense and is asking for unwanted
interference, so instead a new \type {plug} mode was introduced. A sort of
general text filtering mechanism was derived from the original code so that we
can plug in whatever we want. After all, stability is not the strongest point of
today's software development, so when we depend on a library, we need to be
prepared for other (library based) solutions|=|for instance, if I understood
correctly, \XETEX\ switched libraries a few times.

After redoing the code the next step was to get the library running and I decided
that the \type {ffi} route made most sense. \footnote {One can think of an
intermediate layer but I'm pretty sure that I have different demands than others;
\type {ffi} sort of frees us from endless discussions.} Due to some expected
functions not being supported, my efforts in using the library failed. At that
time I thought it was a matter of interfacing, but I could get around it by
piping into the command line tools that come with the library, and that was good
enough for testing. Of course it was dead slow, but the main objective was
comparison of rendering, so it didn't matter that much. After that I just quit
and moved on to something else.
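
For those not familiar with \type {ffi}: one declares (part of) the C interface
and loads the shared library directly from \LUA. The following sketch, which is
not the actual plugin code and assumes that a system wide Harfbuzz library can be
found, gives the idea:

\starttyping
local ffi = require("ffi")

ffi.cdef [[
    typedef struct hb_buffer_t hb_buffer_t ;
    hb_buffer_t * hb_buffer_create ( void ) ;
    void hb_buffer_add_utf8 (
        hb_buffer_t * buffer, const char * text,
        int text_length, unsigned int item_offset, int item_length
    ) ;
]]

-- how the library file is located differs per platform
local harfbuzz = ffi.load("harfbuzz")

-- fill a buffer with a snippet of text; actually shaping it takes more
-- setup (a face, a font and a call to hb_shape)
local buffer = harfbuzz.hb_buffer_create()
harfbuzz.hb_buffer_add_utf8(buffer,"example",7,0,7)
\stoptyping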

At some point Kai's article came close to publication, and I tried the old code
again, and, surprise, after some messing around, the library worked. On my system
the one shipped with Inkscape is used, which is okay as it frees me from bothering
about installations. As already mentioned, we have no real reason in \CONTEXT\
for using font libraries, but the interesting part was that it permitted me to
play with this so-called \type {ffi}. At that moment it was only available in
\LUAJITTEX. Because that creates a nasty dependency, after a while, Luigi
Scarso and I managed to get a similar library working in stock \LUATEX, which is
of course the reference. So, I decided to give it a second try, and in the process
I rewrote the interfacing code. After all, there is no reason not to be nice to
libraries and optimize the interface where possible.

Now, after a decade of writing \LUA\ code, I dare to claim that I know a bit
about how to write relatively fast code. I was surprised to see that, where Kai
claimed that the library was faster than the \LUA\ code, it really depends on
the font. Sometimes the library approach is actually slower, which is not what
one expects. But remember that one argument for using a library is for complex
fonts and scripts. So what is meant by complex?

Most Latin fonts are not complex|=|ligatures and kerns and maybe a little bit of
contextual analysis. Here the \LUA\ variant is the clear winner. It runs up to
ten times faster. For more complex Latin fonts, like EBgaramond, which resolves
ligatures in a different way, the library catches up, but still the \LUA\ handler
is faster. Keep in mind that we need to juggle discretionary nodes in any case.
One difference between both methods is that the \LUA\ handler runs over all the
lists (although it has to jump over fonts not being processed), while the
library gets snippets. However, tests show that the overhead involved in that is
close to zero and can be neglected. Already long ago, when we compared \MKIV\
\LUATEX\ and \MKII\ \XETEX, we saw that the \LUA\ based font handler is not that
slow at all. This makes sense because the problem doesn't change, and maybe more
importantly because \LUA\ is a pretty fast language. If one or the other approach
is less than two times faster, the gain will probably go unnoticed in real runs.
In my experience a few bad choices in macro or style writing are more harmful
than a somewhat slower font machinery. Kick in some additional node processing
and it might make comparison of a run even harder. By the way, one reason why
font handling has been sped up over the years is that our workflows sometimes
have a high load; for instance, processing a set of 5 documents remotely has to
be fast. Also, in an edit workflow you want the runtime to be a bit comfortable.

Contrary to Latin, a pure Arabic text (normally) has no discretionary nodes, and
the library profits most from this. Some day I have to pick up the thread with
Idris about the potential use of discretionary nodes in Arabic typesetting.
Contrary to Arabic, Latin text does not involve many replacements or much
positioning, and, therefore, the \LUA\ variant gets the advantage there. Some of
the additional features that the \LUA\ variant provides can, of course, be
provided for the library variant by adding some pre- and postprocessing of the
list, but then you quickly lose any gain a library provides. So, Arabic has less
complex node lists with no branches into discretionaries, but it definitely has
more replacements, positioning and contextual lookups, which results in many
calls to helpers in the \LUA\ code. Here the library should win because it can
(I assume) use more optimized data structures.

In Kai's prototype there are some cheats for right|-|to|-|left rendering and
special scripts like Devanagari. As these tweaks mostly involve discretionary
nodes, there is no real need for them. When we don't hyphenate, no time is wasted
anyway. I didn't test Devanagari, but there is some preprocessing needed in the
\LUA\ variant (provided by Kai and Ivo) that I might rewrite from scratch once I
understand what happens there. Still, I expect the library to perform somewhat
better there. Eventually I might add support for some more scripts that demand
special treatment, but so far there has not been any request for it.

So what is the processing speed of non|-|Latin scripts? An experiment with Arabic
using the frequently used Arabtype font showed that the library performs faster,
but when we use a mixed Latin and Arabic document the differences become less
significant. On pure Latin documents the \LUA\ variant will probably win. On pure
Arabic the library might be on top. On average there is little difference in
processing speed between the \LUA\ and library variants when processing mixed
documents. The main question is: does one want to lose functionality provided by
the \LUA\ variant? Of course one can also depend on functionality provided by the
library but not by the \LUA\ variant. In the end the user decides.

How did we measure? The baseline measurement is the so-called \type {none} mode:
nothing is done there. It's fast but still takes a bit of time as it is triggered
by a general mode identifying pass. That pass determines what font processing
modes are needed for a list. \type {Base} mode only makes sense for Latin and has
some limitations. It's fast and, basically, its run time can be neglected. That's
why, for instance, \PDFTEX\ is faster than the other engines, but it doesn't do
\UNICODE\ well. \type {Node} mode is the fancy name for the \LUA\ font handler.
So, in order of increasing run time we have: \type {none}, \type {base} and \type
{node}. If we compare \type {node} mode with \type {plug} mode (in our case using
the hb library), we can subtract \type {none} mode. This gives a cleaner (more
distinctive) comparison, but not a really honest one because the identifying pass
always happens.

We also tested with and without hyphenation, but in practice that makes no sense.
Only verbatim is typeset that way, and normally we typeset that in \type {none}
mode anyway. On the other hand, mixing fonts does happen. All the tests start with
forced garbage collection in order to get rid of that variance. We also pack into
horizontal boxes so that the par builder (with all kinds of associated callbacks)
doesn't kick in, although the \type {node} mode should compensate for that.
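
In \LUA\ terms the measurement boils down to something like the following sketch,
where \type {typesetrun} stands for a hypothetical helper that typesets one of
the test buffers in a horizontal box with the given feature set (the real test
module is a bit more involved):

\starttyping
local function timed(feature)
    collectgarbage("collect")     -- get rid of garbage collection variance
    local t = os.clock()
    typesetrun(feature)           -- hypothetical: typeset a test buffer in a box
    return os.clock() - t
end

local nonetime = timed("test-none")
local nodetime = timed("test-node")   - nonetime
local plugtime = timed("test-native") - nonetime
\stoptyping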

Keep in mind that the tests are somewhat dumb. There is no overhead in handling
structure, building pages, adding color or whatever. I never process raw text. As
a reference: it's no problem to let \CONTEXT\ process hundreds of pages per
second. In practice a moderately complex document like the \METAFUN\ manual does
some 20 pages per second. In other words, only a fraction of the time is spent on
fonts. The timings for \LUATEX\ are as follows:

\usemodule[m-fonts-plugins]

\startluacode
    local process = moduledata.plugins.processlist
    local data    = table.load("m-fonts-plugins-timings-luatex.lua")
                 or table.load("t:/sources/m-fonts-plugins-timings-luatex.lua")

    context.testpage { 6 }
    context.subsubject("luatex latin")
    process(data.timings.latin)
    context.testpage { 6 }
    context.subsubject("luatex arabic")
    process(data.timings.arabic)
    context.testpage { 6 }
    context.subsubject("luatex mixed")
    process(data.timings.mixed)
\stopluacode

The timings for \LUAJITTEX\ are, of course, overall better. This is because the
virtual machine is faster, but at the cost of some limitations. We seldom run
into these limitations, but fonts with large tables can't be cached unless we
rewrite some code and sacrifice clean solutions. Instead, we perform a runtime
conversion which is not that noticeable when it's just a few fonts. The numbers
below are not influenced by this as the test stays away from these rare cases.

\startluacode
    local process = moduledata.plugins.processlist
    local data    = table.load("m-fonts-plugins-timings-luajittex.lua")
                 or table.load("t:/sources/m-fonts-plugins-timings-luajittex.lua")

    context.testpage { 6 }
    context.subsubject("luajittex latin")
    process(data.timings.latin)
    context.testpage { 6 }
    context.subsubject("luajittex arabic")
    process(data.timings.arabic)
    context.testpage { 6 }
    context.subsubject("luajittex mixed")
    process(data.timings.mixed)
\stopluacode

A few side notes. Since a library is an abstraction, one has to live with what
one gets. In my case that was a crash in \UTF-32 mode. I could get around it, but
one advantage of using \LUA\ is that it's hard to crash|=|if only because as a
scripting language it manages its memory well without user interference. My
policy with libraries is just to wait till things get fixed and not bother with
the why and how of the internals.

Although \CONTEXT\ will officially support the \type {plug} model, it will not be
actively used by me, or in documentation, so, for support, users are on their
own. I didn't test the \type {plug} mode in real documents. Most documents that I
process are Latin (or a mix), and redefining feature sets or adapting styles for
testing makes no sense. So, can one just switch engines without looking at the
way a font is defined? The answer is|=|not really, because (even without the user
knowing about it) virtual fonts might be used, additional features might be
kicked in, and other mechanisms can make assumptions about how fonts are dealt
with too.

The usability of \type {plug} mode probably depends on the workflow one has. We
use \CONTEXT\ in a few very specific workflows where, interestingly, we only use
a small subset of its functionality, most of which is driven by users; tweaking
fonts is popular and has resulted in all kinds of mechanisms. So, for us it's
unlikely that we will use it. If you process (in bursts) many documents in
succession, each demanding a few runs, you don't want to sacrifice speed.

Of course timing can (and likely will) be different for plain \TEX\ and \LATEX\
usage. It depends on how mechanisms are hooked into the callbacks and on what
extra work is done or not done compared to \CONTEXT. This means that my timings
for \CONTEXT\ will for sure differ from those of other packages. Timings are a
snapshot anyway. And as said, font processing is just one of the many things that
go on. If you are not using \CONTEXT\ you will probably use Kai's version because
it is adapted to his use case and well tested.

A fundamental difference between the two approaches is that, whereas the \LUA\
variant operates on node lists only, the \type {plug} variant generates strings
that get passed to a library; in the \CONTEXT\ variant of hb support we use
\UTF-32 strings. Interestingly, a couple of years ago I considered using a
similar method for \LUA\ but eventually decided against it, first of all for
performance reasons, but mostly because one still has to use some linked list
model. I might pick up that idea as a variant, but because all this \TEX\ related
development doesn't really pay off and costs a lot of free time it will probably
never happen.
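
To give an idea of that string passing: a run of characters ends up as an array
of code points that the library can digest. A sketch (again not the actual plugin
code) of turning a \LUA\ table of code points into a zero based \UTF-32 array
with \type {ffi} looks like this:

\starttyping
local ffi = require("ffi")

local function toutf32(chars)
    local n = #chars
    local s = ffi.new("uint32_t[?]",n)  -- a C array of n code points
    for i=1,n do
        s[i-1] = chars[i]               -- ffi arrays are zero based
    end
    return s, n
end

-- such an array can then be fed to hb_buffer_add_utf32
\stoptyping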

I finish with a few words on how to use the plug model. Because the library
initializes a default set of features,\footnote {Somehow passing features to the
library fails for Arabic. So when you don't get the desired result, just try with
the defaults.} all you need to do is load the plugin mechanism:

\starttyping
\usemodule[fonts-plugins]
\stoptyping

Next you define features that use this extension:

\starttyping
\definefontfeature
  [hb-native]
  [mode=plug,
   features=harfbuzz,
   shaper=native]
\stoptyping

After this you can use this feature set when you define fonts. Here is a complete
example:

\starttyping
\usemodule[fonts-plugins]

\starttext

    \definefontfeature
      [hb-library]
      [mode=plug,
       features=harfbuzz,
       shaper=native]

    \definedfont[Serif*hb-library]

    \input ward \par

    \definefontfeature
      [hb-binary]
      [mode=plug,
       features=harfbuzz,
       method=binary,
       shaper=uniscribe]

    \definedfont[Serif*hb-binary]

    \input ward \par

\stoptext
\stoptyping

The second variant uses the \type {hb-shape} binary, which is, of course, pretty
slow, but it does the job and is okay for testing.
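
The binary method just pipes snippets through the command line program and parses
what comes back. Roughly (a sketch, not the actual plugin code, and without any
error handling or parsing of the result) such a call amounts to:

\starttyping
-- fontfile and text are assumed to be known at this point
local command = string.format('hb-shape "%s" "%s"',fontfile,text)
local result  = io.popen(command):read("*all")
\stoptyping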

There are a few trackers available too:

\starttyping
\enabletrackers[fonts.plugins.hb.colors]
\enabletrackers[fonts.plugins.hb.details]
\stoptyping

The first one colors replaced glyphs, while the second gives a lot of information
about what is going on. If you want to know what gets passed to the library you
can use the \type {text} plugin:

\starttyping
\definefontfeature[test][mode=plug,features=text]
\start
    \definedfont[Serif*test]
    \input ward \par
\stop
\stoptyping

This produces something like this:

\starttyping[style=\ttx]
otf plugin > text > start run 3
otf plugin > text > 001 : [-] The [+]-> U+00054 U+00068 U+00065
otf plugin > text > 002 : [+] Earth, [+]-> U+00045 U+00061 U+00072 ...
otf plugin > text > 003 : [+] as [+]-> U+00061 U+00073
otf plugin > text > 004 : [+] a [+]-> U+00061
otf plugin > text > 005 : [+] habi- [-]-> U+00068 U+00061 U+00062 ...
otf plugin > text > 006 : [-] tat [+]-> U+00074 U+00061 U+00074
otf plugin > text > 007 : [+] habitat [+]-> U+00068 U+00061 U+00062 ...
otf plugin > text > 008 : [+] for [+]-> U+00066 U+0006F U+00072
otf plugin > text > 009 : [+] an- [-]-> U+00061 U+0006E U+0002D
\stoptyping

You can see how hyphenation of \type {habi-tat} results in two snippets and a
whole word. The font engine can decide to turn this word into a disc node with a
pre, post and replace text. Of course the machinery will try to retain as many
hyphenation points as possible. Among the tricky parts of this are lookups across
and inside discretionary nodes resulting in (optional) replacements and kerning.
You can imagine that there is some trade|-|off between performance and quality
here. The results are normally acceptable, especially because \TEX\ is so clever
in breaking paragraphs into lines.

Using this mechanism (there might be variants in the future) permits the user to
cook up special solutions. After all, that is what \LUATEX\ is about|=|the
traditional core engine with the ability to plug in your own code using \LUA.
This is just an example of it.

I'm not sure yet when the plugin mechanism will be in the \CONTEXT\ distribution,
but it might happen once the \type {ffi} library is supported in \LUATEX. At the
end of this document the basics of the test setup are shown, just in case you
wonder what the numbers apply to.

Just to put things in perspective: the current (February 2017) \METAFUN\ manual
has 424 pages. It takes \LUATEX\ 18.3 seconds and \LUAJITTEX\ 14.4 seconds on my
Dell 7600 laptop with a 3840QM mobile i7 processor. Of this, 6.1 (4.5) seconds is
used for processing 2170 \METAPOST\ graphics. Loading the 15 fonts used takes
0.25 (0.3) seconds, which also includes loading the outlines of some. Font
handling is part of the so-called hlist processing and takes around 1 (0.5)
second, and attribute backend processing takes 0.7 (0.3) seconds. One problem in
these timings is that font processing often goes too fast for timing, especially
when we have lots of small snippets. For example, short runs like titles and such
take no time at all, and verbatim needs no font processing. The difference in
runtime between \LUATEX\ and \LUAJITTEX\ is significant, so we can safely assume
that we spend some more time on fonts than reported. Even if we add a few
seconds, in this rather complete document, the time spent on fonts is still not
that impressive. A fivefold increase in processing (we use mostly Pagella and
Dejavu) would be a significant addition to the total run time, especially if you
need a few runs to get cross referencing etc.\ right.

The test files are the familiar ones present in the distribution. The \type
{tufte} example is a good torture test for discretionary processing. We preload
the files so that we don't have the overhead of \type {\input}.

\starttyping
\edef\tufte{\cldloadfile{tufte.tex}}
\edef\khatt{\cldloadfile{khatt-ar.tex}}
\stoptyping

We use six buffers for the tests. The Latin test uses three fonts and also
has a paragraph with mixed font usage. Loading the fonts happens once before
the test, and the local (re)definition takes no time. Also, we compensate
for general overhead by subtracting the \type {none} timings.

\starttyping
\startbuffer[latin-definitions]
\definefont[TestA][Serif*test]
\definefont[TestB][SerifItalic*test]
\definefont[TestC][SerifBold*test]
\stopbuffer

\startbuffer[latin-text]
\TestA \tufte \par
\TestB \tufte \par
\TestC \tufte \par
\dorecurse {10} {%
    \TestA Fluffy Test Font A
    \TestB Fluffy Test Font B
    \TestC Fluffy Test Font C
}\par
\stopbuffer
\stoptyping

The Arabic tests are a bit simpler. Of course we do need to make sure that we go
from right to left.

\starttyping
\startbuffer[arabic-definitions]
\definedfont[Arabic*test at 14pt]
\setupinterlinespace[line=18pt]
\setupalign[r2l]
\stopbuffer

\startbuffer[arabic-text]
\dorecurse {10} {
    \khatt\space
    \khatt\space
    \khatt\blank
}
\stopbuffer
\stoptyping

The mixed case uses a Latin and an Arabic font and also processes a mixed script
paragraph.

\starttyping
\startbuffer[mixed-definitions]
\definefont[TestL][Serif*test]
\definefont[TestA][Arabic*test at 14pt]
\setupinterlinespace[line=18pt]
\setupalign[r2l]
\stopbuffer

\startbuffer[mixed-text]
\dorecurse {2} {
    {\TestA\khatt\space\khatt\space\khatt}
    {\TestL\lefttoright\tufte}
    \blank
    \dorecurse{10}{%
        {\TestA وَ قَرْمِطْ بَيْنَ الْحُرُوفِ؛ فَإِنَّ}
        {\TestL\lefttoright A snippet text that makes no sense.}
    }
}
\stopbuffer
\stoptyping

The related font features are defined as follows:

\starttyping
\definefontfeature
  [test-none]
  [mode=none]

\definefontfeature
  [test-base]
  [mode=base,
   liga=yes,
   kern=yes]

\definefontfeature
  [test-node]
  [mode=node,
   script=auto,
   autoscript=position,
   autolanguage=position,
   ccmp=yes,liga=yes,clig=yes,
   kern=yes,mark=yes,mkmk=yes,
   curs=yes]

\definefontfeature
  [test-text]
  [mode=plug,
   features=text]

\definefontfeature
  [test-native]
  [mode=plug,
   features=harfbuzz,
   shaper=native]

\definefontfeature
  [arabic-node]
  [arabic]

\definefontfeature
  [arabic-native]
  [mode=plug,
   features=harfbuzz,
   script=arab,language=dflt,
   shaper=native]
\stoptyping

The timings are collected in \LUA\ tables and typeset afterwards, so there is no
interference there either.

{\em The timings are, as usual, a snapshot and just an indication. The relative
times can differ over time, depending on how binaries are compiled, libraries are
improved and \LUA\ code evolves. In node mode we can have experimental trickery
that is not yet optimized. Also, especially with complex fonts like Husayni, not
all shapers give the same result, although node mode and Uniscribe should be the
same in most cases. A future (public) version of Husayni will play it safer and
use less complex sequences of features.}

% And for the record: when I finished it, this 12 page document processes in
% roughly 1~second with \LUATEX\ and 0.8 second with \LUAJITTEX, which is okay for
% an edit|-|preview cycle.

\stopchapter

\stopcomponent