% language=us

\startcomponent onandon-ffi

\environment onandon-environment

\startchapter[title={Plug mode, an application of ffi}]

A while ago, at an NTG meeting, Kai Eigner and Ivo Geradts demonstrated how to
use the Harfbuzz (hb) library for processing \OPENTYPE\ fonts. Their main
motivation for playing with it was that it provides a way to compare the \LUA\
based font machinery with other methods. They also assumed that it would give
better performance for complex fonts and|/|or scripts.

One of the guiding principles of \LUATEX\ development is that we don't provide
hard coded solutions. For that reason we opened up the internals so that one can
provide solutions written in pure \LUA, but, of course, one can cooperate with
libraries via \LUA\ code as well. Hard coding solutions makes no sense as there
are often several solutions possible, depending on one's needs. Although
development is closely related to \CONTEXT, the development of the \LUATEX\
engine is generic. We try to be macro package agnostic. Already at an early
stage we made sure that the \CONTEXT\ font handler could be used in other
packages as well, but one can easily dream up lightweight variants for specific
purposes. The standard \TEX\ font handling was kept and is called \type {base}
mode in \CONTEXT. The \LUA\ variant is tagged \type {node} mode because it
operates on the node list. Later we will refer to these modes.

With the output of \XETEX\ available for comparison, the first motive mentioned
for looking into support for such a library is not that strong. And when we want
to test against the standard, we can use MS-Word. In a minimal \CONTEXT\ \MKIV\
installation one only has the \LUATEX\ engine. Maintaining several renderers
simultaneously might give rise to unwanted dependencies.

The second motive could be more valid for users because, for complex fonts,
there is|=|or at least was|=|a performance hit with the \LUA\ variant. Some
fonts use many lookup steps or are inefficient even in using their own features.
It must be said that till now I haven't heard \CONTEXT\ users complain about
speed. In fact, the font handling has become many times faster over the last few
years, and probably no one even noticed. Also, when using alternatives to the
built-in methods, in the end, you will lose functionality and|/|or interactions
with other mechanisms that are built into the current font system. Any possible
gain in speed is lost, or even becomes negative, when a user wants to use
additional functionality that requires additional processing. \footnote {In
general we try to stay away from libraries. For instance, graphics can be
manipulated with external programs, and caching the result is much more
efficient than recreating it. Apart from \SQL\ support, where integration makes
sense, I never felt the need for libraries. And even \SQL\ can efficiently be
dealt with via intermediate files.}

Just kicking in some alternative machinery is not the whole story. We still need
to deal with the way \TEX\ sees text, and that, in practice, is as a sequence of
glyph nodes, mixed with discretionaries for languages that hyphenate, glue,
kerns, boxes, math, and more. It's the discretionary part that makes it a bit
complex. In contextual analysis as well as positioning one needs to process up
to three additional cases: the pre, post and replace texts|=|whether or not
linked backward and forward. And as applied features accumulate, one ends up
winding and unwinding these snippets.
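The following minimal sketch (much simplified, and not the actual \CONTEXT\
code) shows what that winding amounts to in the node model: every discretionary
carries up to three sublists that each need the same treatment as the main list.

\starttyping
-- a minimal sketch, not the actual ConTeXt code: each discretionary
-- carries three sublists that need the same treatment as the main list

local disc_t  = node.id("disc")
local glyph_t = node.id("glyph")

local function process(head,fnt)
    for n in node.traverse(head) do
        if n.id == glyph_t and n.font == fnt then
            -- contextual analysis and positioning happen here
        elseif n.id == disc_t then
            if n.pre     then process(n.pre,    fnt) end
            if n.post    then process(n.post,   fnt) end
            if n.replace then process(n.replace,fnt) end
        end
    end
    return head
end
\stoptyping

The real handler also has to look across the boundaries of these sublists, which
is where most of the complexity (and run time) hides.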
While winding and unwinding, one also needs to keep an eye on spaces, as they
can be involved in lookups. Also, when injecting or removing glyphs, one needs
to deal with the attributes associated with nodes. Of course something hard
coded in the engine might help a little, but then one ends up with the situation
where macro packages have different demands (and possible interactions) and no
solution is the right one. Using \LUA\ as glue is a way to avoid that problem.
In fact, once we go along that route, it starts making sense to come up with a
stripped down \LUATEX\ that might suit \CONTEXT\ better, but it's not a route we
are eager to follow right now.

Kai and Ivo are plain \TEX\ users, so they use a font definition and switching
environment that is quite different from the one in \CONTEXT. In an average
\CONTEXT\ run the time spent on font processing is measurable but not the main
bottleneck, because other time consuming things happen too. Sometimes the load
on the font subsystem can be higher because we provide additional features
normally not found in \OPENTYPE. Add to that a more dynamic font model and it
will be clear that comparing performance between situations that use different
macro packages is not that trivial (or relevant). More reasons why we follow a
\LUA\ route are that we support (run time generated) virtual fonts, are able to
kick in additional features, can let the font mechanism cooperate with other
functionality, and so on. In the upcoming years more trickery will be provided
in the current mechanisms. Because we had to figure out a lot of these
\OPENTYPE\ things a decade ago, when standards were still fuzzy, quite some
tracing and visualization is available. Below we will see some timings. It's
important to keep in mind that in \CONTEXT\ the \OPENTYPE\ font handler can do a
bit more if requested to do so, which comes with a bit of overhead when the
handler is used in \CONTEXT|=|something we can live with.

Some time after Kai's presentation he produced an article, and that was the
moment I looked into the code and tried to replicate his experiments. Because
we're talking libraries, one can understand that this is not entirely trivial,
especially because I'm on another platform than he is|=|Windows instead of OSX.
The first thing I did was rewrite the code that glues the library to \TEX\ in a
way that is more suitable for \CONTEXT. Mixing with the existing modes (\type
{base} or \type {node} mode) makes no sense and is asking for unwanted
interferences, so instead a new \type {plug} mode was introduced. A sort of
general text filtering mechanism was derived from the original code so that we
can plug in whatever we want. After all, stability is not the strongest point of
today's software development, so when we depend on a library, we need to be
prepared for other (library based) solutions|=|for instance, if I understood
correctly, \XETEX\ switched a few times.

After redoing the code the next step was to get the library running, and I
decided that the \type {ffi} route made most sense. \footnote {One can think of
an intermediate layer, but I'm pretty sure that I have different demands than
others, and \type {ffi} sort of frees us from endless discussions.} Due to some
expected functions not being supported, my efforts in using the library failed.
At that time I thought it was a matter of interfacing, but I could get around it
by piping into the command line tools that come with the library, and that was
good enough for testing.
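For those who never saw \type {ffi} code, the next snippet gives an impression
of what the route involves. It is a minimal sketch, assuming that a Harfbuzz
library can be found on the system; the function declared here exists in the
library, but the real plugin declares a much longer list:

\starttyping
-- a minimal sketch, assuming a harfbuzz library is installed; the
-- actual plugin declares many more functions than this one

local ffi = require("ffi")

ffi.cdef [[
    const char * hb_version_string (void) ;
]]

-- the filename differs per platform (libharfbuzz.so, harfbuzz.dll, ...)

local harfbuzz = ffi.load("harfbuzz")

print(ffi.string(harfbuzz.hb_version_string()))
\stoptyping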
Piping through the command line tools was of course dead slow, but the main
objective was a comparison of rendering, so it didn't matter that much. After
that I just quit and moved on to something else. At some point Kai's article
came close to being published, so I tried the old code again, and, surprise:
after some messing around, the library worked. On my system the one shipped with
Inkscape is used, which is okay as it frees me from bothering about
installations. As already mentioned, we have no real reason in \CONTEXT\ for
using font libraries, but the interesting part was that it permitted me to play
with this so called \type {ffi}. At that moment it was only available in
\LUAJITTEX. Because that creates a nasty dependency, after a while Luigi Scarso
and I managed to get a similar library working in stock \LUATEX, which is of
course the reference. So, I decided to give it a second try, and in the process
I rewrote the interfacing code. After all, there is no reason not to be nice to
libraries and optimize the interface where possible.

Now, after a decade of writing \LUA\ code, I dare to claim that I know a bit
about how to write relatively fast code. Where Kai claimed that the library was
faster than the \LUA\ code, I was surprised to see that it really depends on the
font. Sometimes the library approach is actually slower, which is not what one
expects. But remember that one argument for using a library is for complex fonts
and scripts. So what is meant by complex? Most Latin fonts are not complex:
ligatures and kerns and maybe a little bit of contextual analysis. Here the
\LUA\ variant is the clear winner. It runs up to ten times faster. For more
complex Latin fonts, like EBgaramond, which resolves ligatures in a different
way, the library catches up, but still the \LUA\ handler is faster. Keep in mind
that we need to juggle discretionary nodes in any case.

One difference between the two methods is that the \LUA\ handler runs over the
whole list (although it has to jump over fonts not being processed), while the
library gets snippets. However, tests show that the overhead involved in that is
close to zero and can be neglected. Already long ago, when we compared \MKIV\
\LUATEX\ and \MKII\ \XETEX, we saw that the \LUA\ based font handler is not that
slow at all. This makes sense because the problem doesn't change, and, maybe
more importantly, because \LUA\ is a pretty fast language. If one or the other
approach is less than two times faster, the gain will probably go unnoticed in
real runs. In my experience a few bad choices in macro or style writing are more
harmful than a somewhat slower font machinery. Kick in some additional node
processing and it might make the comparison of runs even harder. By the way, one
reason why font handling has been sped up over the years is that our workflows
sometimes have a high load, and, for instance, processing a set of 5 documents
remotely has to be fast. Also, in an edit workflow you want the runtime to be a
bit comfortable.

Contrary to Latin, a pure Arabic text (normally) has no discretionary nodes, and
the library profits most from this. Some day I have to pick up the thread with
Idris about the potential use of discretionary nodes in Arabic typesetting.
Contrary to Arabic, Latin text does not have that many replacements and much
positioning, and, therefore, the \LUA\ variant gets the advantage there.
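I mentioned that the library gets snippets. The following simplified sketch
(with a hypothetical \type {shape} callback, so not the actual plugin code)
shows what that model boils down to: successive characters in the same font are
collected and passed on as one run.

\starttyping
-- a simplified sketch with a hypothetical 'shape' callback: successive
-- characters in the same font are collected and passed on as one run

local glyph_t = node.id("glyph")

local function collect(head,shape)
    local snippet, fnt = { }, nil
    local function flush()
        if #snippet > 0 then
            shape(fnt,snippet)
            snippet, fnt = { }, nil
        end
    end
    for n in node.traverse(head) do
        if n.id == glyph_t then
            if n.font ~= fnt then
                flush()
                fnt = n.font
            end
            snippet[#snippet+1] = n.char
        else
            flush() -- glue, kerns, discs etc. end the current run
        end
    end
    flush()
end
\stoptyping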
Some of the additional features that the \LUA\ variant provides can, of course,
be provided for the library variant by adding some pre- and postprocessing of
the list, but then you quickly lose any gain a library provides. So, Arabic has
less complex node lists with no branches into discretionaries, but it definitely
has more replacements, positioning and contextual lookups, which in the \LUA\
code lead to many calls to helpers. Here the library should win because it can
(I assume) use more optimized data structures.

In Kai's prototype there are some cheats for right|-|to|-|left rendering and
special scripts like Devanagari. As these tweaks mostly involve discretionary
nodes, there is no real need for them: when we don't hyphenate, no time is
wasted anyway. I didn't test Devanagari, but there is some preprocessing needed
in the \LUA\ variant (provided by Kai and Ivo) that I might rewrite from scratch
once I understand what happens there. I expect the library to perform somewhat
better there, but again I didn't test it. Eventually I might add support for
some more scripts that demand special treatment, but so far there has not been
any request for it.

So what is the processing speed of non|-|Latin scripts? An experiment with
Arabic using the frequently used Arabtype font showed that the library performs
faster, but when we use a mixed Latin and Arabic document the differences become
less significant. On pure Latin documents the \LUA\ variant will probably win.
On pure Arabic the library might be on top. On average there is little
difference in processing speed between the \LUA\ and library engines when
processing mixed documents. The main question is: does one want to lose the
functionality provided by the \LUA\ variant? Of course one can also depend on
functionality provided by the library but not by the \LUA\ variant. In the end
the user decides.

How did we measure? The baseline measurement is the so called \type {none} mode:
nothing is done there. It's fast, but it still takes a bit of time as it is
triggered by a general mode identifying pass. That pass determines what font
processing modes are needed for a list. \type {Base} mode only makes sense for
Latin and has some limitations. It's fast and, basically, its run time can be
neglected. That's why, for instance, \PDFTEX\ is faster than the other engines,
but it doesn't do \UNICODE\ well. \type {Node} mode is the fancy name for the
\LUA\ font handler. So, in order of increasing run time we have: \type {none},
\type {base} and \type {node}. If we compare \type {node} mode with \type {plug}
mode (in our case using the hb library), we can subtract \type {none} mode. This
gives a cleaner (more distinctive) comparison, but not a really honest one,
because the identifying pass always happens.

We also tested with and without hyphenation, but in practice disabling it makes
no sense: only verbatim is typeset that way, and normally we typeset that in
\type {none} mode anyway. On the other hand, mixing fonts does happen. All the
tests start with forced garbage collection in order to get rid of that variance.
We also pack into horizontal boxes so that the par builder (with all kinds of
associated callbacks) doesn't kick in, although the \type {node} mode should
compensate for that. Keep in mind that the tests are somewhat dumb. There is no
overhead in handling structure, building pages, adding color or whatever. I
never process raw text. As a reference: it's no problem to let \CONTEXT\ process
hundreds of pages per second.
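In \LUA\ terms the bookkeeping boils down to the following sketch, where \type
{measure} is a hypothetical helper that typesets a test buffer in the given mode
and returns the run time:

\starttyping
-- a minimal sketch with a hypothetical 'measure' helper; the 'none'
-- baseline removes the cost of the mode identifying pass

collectgarbage("collect")

local nonetime = measure("none")
local basetime = measure("base") - nonetime
local nodetime = measure("node") - nonetime
local plugtime = measure("plug") - nonetime
\stoptyping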
In practice a moderately complex document like the \METAFUN\ manual does some 20
pages per second. In other words, only a fraction of the time is spent on fonts.
The timings for \LUATEX\ are as follows:

\usemodule[m-fonts-plugins]

\startluacode
    local process = moduledata.plugins.processlist
    local data    = table.load("m-fonts-plugins-timings-luatex.lua")
                 or table.load("t:/sources/m-fonts-plugins-timings-luatex.lua")

    context.testpage { 6 }
    context.subsubject("luatex latin")
    process(data.timings.latin)

    context.testpage { 6 }
    context.subsubject("luatex arabic")
    process(data.timings.arabic)

    context.testpage { 6 }
    context.subsubject("luatex mixed")
    process(data.timings.mixed)
\stopluacode

The timings for \LUAJITTEX\ are, of course, overall better. This is because the
virtual machine is faster, but that comes at the cost of some limitations. We
seldom run into these limitations, but fonts with large tables can't be cached
unless we rewrite some code and sacrifice clean solutions. Instead, we perform a
runtime conversion, which is not that noticeable when it concerns just a few
fonts. The numbers below are not influenced by this, as the test stays away from
these rare cases.

\startluacode
    local process = moduledata.plugins.processlist
    local data    = table.load("m-fonts-plugins-timings-luajittex.lua")
                 or table.load("t:/sources/m-fonts-plugins-timings-luajittex.lua")

    context.testpage { 6 }
    context.subsubject("luajittex latin")
    process(data.timings.latin)

    context.testpage { 6 }
    context.subsubject("luajittex arabic")
    process(data.timings.arabic)

    context.testpage { 6 }
    context.subsubject("luajittex mixed")
    process(data.timings.mixed)
\stopluacode

A few side notes. Since a library is an abstraction, one has to live with what
one gets. In my case that was a crash in \UTF-32 mode. I could get around it,
but one advantage of using \LUA\ is that it's hard to crash|=|if only because,
as a scripting language, it manages its memory well without user interference.
My policy with libraries is just to wait till things get fixed and not to bother
with the why and how of the internals. Although \CONTEXT\ will officially
support the \type {plug} model, it will not be actively used by me, or in
documentation, so, for support, users are on their own. I didn't test the \type
{plug} mode in real documents. Most documents that I process are Latin (or a
mix), and redefining feature sets or adapting styles for testing makes no sense.

So, can one just switch engines without looking at the way a font is defined?
The answer is|=|not really, because (even without the user knowing about it)
virtual fonts might be used, additional features might be kicked in, and other
mechanisms can make assumptions about how fonts are dealt with too.

The usability of \type {plug} mode probably depends on one's workflow. We use
\CONTEXT\ in a few very specific workflows where, interestingly, we only use a
small subset of its functionality. Much of what \CONTEXT\ provides is driven by
users, and tweaking fonts is popular and has resulted in all kinds of
mechanisms. So, for us it's unlikely that we will use it. If you process (in
bursts) many documents in succession, each demanding a few runs, you don't want
to sacrifice speed. Of course timings can (and likely will) be different for
plain \TEX\ and \LATEX\ usage. It depends on how mechanisms are hooked into the
callbacks and on what extra work is done or not done compared to \CONTEXT. This
means that my timings for \CONTEXT\ will for sure differ from those of other
packages. Timings are a snapshot anyway.
And, as said, font processing is just one of the many things that go on. If you
are not using \CONTEXT, you will probably use Kai's version, because it is
adapted to his use case and well tested.

A fundamental difference between the two approaches is that, whereas the \LUA\
variant operates on node lists only, the \type {plug} variant generates strings
that get passed to a library; in the \CONTEXT\ variant of hb support we use
\UTF-32 strings. Interestingly, a couple of years ago I considered using a
similar method for \LUA\ but eventually decided against it, first of all for
performance reasons, but mostly because one still has to use some linked list
model. I might pick up that idea as a variant, but because all this \TEX\
related development doesn't really pay off and costs a lot of free time, it will
probably never happen.

I finish with a few words on how to use the plug model. Because the library
initializes a default set of features, \footnote {Somehow passing features to
the library fails for Arabic. So when you don't get the desired result, just try
with the defaults.} all you need to do is load the plugin mechanism:

\starttyping
\usemodule[fonts-plugins]
\stoptyping

Next you define features that use this extension:

\starttyping
\definefontfeature
  [hb-native]
  [mode=plug,
   features=harfbuzz,
   shaper=native]
\stoptyping

After this you can use this feature set when you define fonts. Here is a
complete example:

\starttyping
\usemodule[fonts-plugins]

\starttext

\definefontfeature
  [hb-library]
  [mode=plug,
   features=harfbuzz,
   shaper=native]

\definedfont[Serif*hb-library]

\input ward \par

\definefontfeature
  [hb-binary]
  [mode=plug,
   features=harfbuzz,
   method=binary,
   shaper=uniscribe]

\definedfont[Serif*hb-binary]

\input ward \par

\stoptext
\stoptyping

The second variant uses the \type {hb-shape} binary, which is, of course, pretty
slow, but it does the job and is okay for testing.

There are a few trackers available too:

\starttyping
\enabletrackers[fonts.plugins.hb.colors]
\enabletrackers[fonts.plugins.hb.details]
\stoptyping

The first one colors replaced glyphs, while the second one gives a lot of
information about what is going on. If you want to know what gets passed to the
library, you can use the \type {text} plugin:

\starttyping
\definefontfeature[test][mode=plug,features=text]

\start
    \definedfont[Serif*test]
    \input ward \par
\stop
\stoptyping

This produces something like:

\starttyping[style=\ttx]
otf plugin > text > start run 3
otf plugin > text > 001 : [-] The     [+]-> U+00054 U+00068 U+00065
otf plugin > text > 002 : [+] Earth,  [+]-> U+00045 U+00061 U+00072 ...
otf plugin > text > 003 : [+] as      [+]-> U+00061 U+00073
otf plugin > text > 004 : [+] a       [+]-> U+00061
otf plugin > text > 005 : [+] habi-   [-]-> U+00068 U+00061 U+00062 ...
otf plugin > text > 006 : [-] tat     [+]-> U+00074 U+00061 U+00074
otf plugin > text > 007 : [+] habitat [+]-> U+00068 U+00061 U+00062 ...
otf plugin > text > 008 : [+] for     [+]-> U+00066 U+0006F U+00072
otf plugin > text > 009 : [+] an-     [-]-> U+00061 U+0006E U+0002D
\stoptyping

You can see how hyphenation of \type {habi-tat} results in two snippets and a
whole word. The font engine can decide to turn this word into a disc node with a
pre, post and replace text. Of course the machinery will try to retain as many
hyphenation points as possible. Among the tricky parts of this are lookups
across and inside discretionary nodes, resulting in (optional) replacements and
kerning. You can imagine that there is some trade-off between performance and
quality here.
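In terms of the node model, the reconstruction just mentioned boils down to
something like this minimal sketch (again not the actual plugin code):

\starttyping
-- a minimal sketch, not the actual plugin code: three shaped snippets
-- are combined into one discretionary node

local function todisc(pre,post,replace) -- lists of glyph nodes
    local d   = node.new("disc")
    d.pre     = pre     -- what ends the line at a break:    'habi-'
    d.post    = post    -- what starts the next line:        'tat'
    d.replace = replace -- what is used when we don't break: 'habitat'
    return d
end
\stoptyping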
In practice the results are normally acceptable, especially because \TEX\ is so
clever in breaking paragraphs into lines. Using this mechanism (there might be
variants in the future) permits the user to cook up special solutions. After
all, that is what \LUATEX\ is about|=|the traditional core engine with the
ability to plug in your own code using \LUA. This is just an example of it. I'm
not sure yet when the plugin mechanism will be in the \CONTEXT\ distribution,
but it might happen once the \type {ffi} library is supported in \LUATEX.

At the end of this document the basics of the test setup are shown, just in case
you wonder what the numbers apply to. Just to put things in perspective: the
current (February 2017) \METAFUN\ manual has 424 pages. It takes \LUATEX\ 18.3
seconds and \LUAJITTEX\ 14.4 seconds on my Dell 7600 laptop with a 3840QM mobile
i7 processor. Of this, 6.1 (4.5) seconds are used for processing 2170
\METAPOST\ graphics. Loading the 15 fonts used takes 0.25 (0.3) seconds, which
also includes loading the outlines of some of them. Font handling is part of the
so called hlist processing and takes around 1 (0.5) second, and attribute
backend processing takes 0.7 (0.3) seconds. One problem with these timings is
that font processing often goes too fast to measure, especially when we have
lots of small snippets. For example, short runs like titles and such take no
time at all, and verbatim needs no font processing. The difference in runtime
between \LUATEX\ and \LUAJITTEX\ is significant, so we can safely assume that we
spend some more time on fonts than reported. Even if we add a few seconds, in
this rather complete document the time spent on fonts is still not that
impressive. A fivefold increase in processing time (we mostly use Pagella and
Dejavu) would be a significant addition to the total run time, especially if you
need a few runs to get cross referencing etc.\ right.

The test files are the familiar ones present in the distribution. The \type
{tufte} example is a good torture test for discretionary processing. We preload
the files so that we don't have the overhead of \type {\input}.

\starttyping
\edef\tufte{\cldloadfile{tufte.tex}}
\edef\khatt{\cldloadfile{khatt-ar.tex}}
\stoptyping

We use six buffers for the tests. The Latin test uses three fonts and also has a
paragraph with mixed font usage. Loading the fonts happens once, before the
test, and the local (re)definitions take no time. Also, we compensate for
general overhead by subtracting the \type {none} timings.

\starttyping
\startbuffer[latin-definitions]
\definefont[TestA][Serif*test]
\definefont[TestB][SerifItalic*test]
\definefont[TestC][SerifBold*test]
\stopbuffer

\startbuffer[latin-text]
\TestA \tufte \par
\TestB \tufte \par
\TestC \tufte \par
\dorecurse {10} {%
    \TestA Fluffy Test Font A
    \TestB Fluffy Test Font B
    \TestC Fluffy Test Font C
}\par
\stopbuffer
\stoptyping

The Arabic tests are a bit simpler. Of course we do need to make sure that we go
from right to left.

\starttyping
\startbuffer[arabic-definitions]
\definedfont[Arabic*test at 14pt]
\setupinterlinespace[line=18pt]
\setupalign[r2l]
\stopbuffer

\startbuffer[arabic-text]
\dorecurse {10} {
    \khatt\space
    \khatt\space
    \khatt\blank
}
\stopbuffer
\stoptyping

The mixed case uses a Latin and an Arabic font and also processes a mixed script
paragraph.
\starttyping
\startbuffer[mixed-definitions]
\definefont[TestL][Serif*test]
\definefont[TestA][Arabic*test at 14pt]
\setupinterlinespace[line=18pt]
\setupalign[r2l]
\stopbuffer

\startbuffer[mixed-text]
\dorecurse {2} {
    {\TestA\khatt\space\khatt\space\khatt}
    {\TestL\lefttoright\tufte}
    \blank
    \dorecurse{10}{%
        {\TestA وَ قَرْمِطْ بَيْنَ الْحُرُوفِ؛ فَإِنَّ}
        {\TestL\lefttoright A snippet text that makes no sense.}
    }
}
\stopbuffer
\stoptyping

The related font features are defined as follows:

\starttyping
\definefontfeature
  [test-none]
  [mode=none]

\definefontfeature
  [test-base]
  [mode=base,
   liga=yes,
   kern=yes]

\definefontfeature
  [test-node]
  [mode=node,
   script=auto,
   autoscript=position,
   autolanguage=position,
   ccmp=yes,liga=yes,clig=yes,
   kern=yes,mark=yes,mkmk=yes,
   curs=yes]

\definefontfeature
  [test-text]
  [mode=plug,
   features=text]

\definefontfeature
  [test-native]
  [mode=plug,
   features=harfbuzz,
   shaper=native]

\definefontfeature
  [arabic-node]
  [arabic]

\definefontfeature
  [arabic-native]
  [mode=plug,
   features=harfbuzz,
   script=arab,language=dflt,
   shaper=native]
\stoptyping

The timings are collected in \LUA\ tables and typeset afterwards, so there is no
interference there either.

{\em The timings are, as usual, a snapshot and just an indication. The relative
times can differ over time, depending on how binaries are compiled, libraries
are improved and \LUA\ code evolves. In node mode we can have experimental
trickery that is not yet optimized. Also, especially with complex fonts like
Husayni, not all shapers give the same result, although node mode and Uniscribe
should be the same in most cases. A future (public) version of Husayni will play
more safe and use less complex sequences of features.}

% And for the record: when I finished it, this 12 page document processes in
% roughly 1~second with \LUATEX\ and 0.8 second with \LUAJITTEX, which is okay
% for an edit|-|preview cycle.

\stopchapter

\stopcomponent