-- font-phb.lua, last modification: 2021-10-28 13:50
if not modules then modules = { } end modules ['font-phb'] = {
    version   = 1.000, -- 2016.10.10,
    comment   = "companion to font-txt.mkiv",
    original  = "derived from a prototype by Kai Eigner",
    author    = "Hans Hagen", -- so don't blame KE
    copyright = "TAT Zetwerk / PRAGMA ADE / ConTeXt Development Team",
    license   = "see context related readme files",
}

-- The next code is a rewrite of Kai's prototype. Here we forget about components
-- and assume some sane data structures. Clusters are handled on the fly. This is
-- probably one of the places where generic and context code is (to be) different
-- anyway. All errors in the logic below are mine (Hans). The optimizations probably
-- make less sense in luajittex because there the interpreter does some optimization
-- but we may end up with a non-jit version some day.
--
-- For testing I used the commandline tool as this code is not that critical and not
-- used in context for production (maybe for testing). I noticed some issues with
-- r2l shaping of latin, which the uniscribe shaper seems to handle better, but as
-- it's a library we're supposed to treat it as a magic black box and not look into
-- it. In the end all will be sorted out I guess so we don't need to worry about it.
-- Also, I can always improve the code below if really needed.
--
-- We create intermediate tables which might look inefficient. For instance we could
-- just return two tables or an iterator but in the end this is not the bottleneck.
-- In fact, speed is hard to measure anyway, as it depends on the font, complexity
-- of the text, etc. Sometimes the library is faster, sometimes the context Lua one
-- (which is interesting as it does a bit more, i.e. supports additional features,
-- which also makes it even harder to check). When we compare context mkiv runs with
-- mkii runs using pdftex or xetex (which uses harfbuzz) the performance of luatex
-- on (simple) font demos normally is significantly less compared with pdftex (8 bit
-- and no unicode) but a bit better than xetex. It looks like the interface that gets
-- implemented here suits that pattern (keep in mind that especially discretionary
-- handling is quite complex and similar to the context mkiv variant).
--
-- The main motivations for supporting this are (1) the fact that Kai spent time on
-- it, and (2) that we can compare the Lua variant with uniscribe, which is kind of
-- a reference. We started a decade ago (2006) with the Lua implementation and had
-- to rely on MSWord for comparison. On the other hand, the command line version is
-- also usable for that. Don't blame the library or its (maybe wrong) use (here)
-- for side effects.
--
-- Currently there are two methods: (1) binary, which is slow and uses the command
-- line shaper and (2) the ffi binding. In the meantime I redid the
-- feed-back-into-the-node-list method. This way tracing is easier, performance
-- better, and there is no need to mess so much with spacing. I have no clue if I
-- lost functionality and as this is not production code issues probably will go
-- unnoticed for a while. We'll see.
--
-- Usage: see m-fonts-plugins.mkiv as that is the interface.
--
-- Remark: It looks like the library sets up some features by default. Passing them
-- somehow doesn't work (yet) so I must be missing something here. There is something
-- fishy with enabling features like init, medi, fina etc because when we turn them
-- on they aren't applied. Also some features are not processed.
--
-- Remark: Because utf32 is fragile I append a couple of zero slots which seems to
-- work out ok. In fact, after some experiments I figured out that utf32 needs a list
-- of 4 byte cardinals. From the fact that Kai used the utf8 method I assumed that
-- there was a utf32 too and indeed that worked but I have no time to look into it
-- more deeply. It seems to work ok though.
--
-- The plugin itself has plugins and we do it the same as with (my)sql support, i.e.
-- we provide methods. The specific methods are implemented in the imp files. We
-- follow that model with other libraries too.
--
-- Somehow the command line version does uniscribe (usp10.dll) but not the library
-- so when I can get motivated I might write a binding for uniscribe. (Problem: I
-- don't look forward to deciphering complex (c++) library apis so in the end it
-- might never happen. A quick glance at the usp10 api gives me the impression that
-- the apis don't differ that much, but still.)
--
-- Warning: This is rather old code, cooked up in the second half of 2016. I'm not
-- sure if it will keep working because it's not used in production and therefore
-- doesn't get tested. It was written as part of some comparison tests for Idris,
-- who wanted to compare the ConTeXt handler, uniscribe and hb, for which there are
-- also some special modules (that show results alongside). It has never been tested
-- in regular documents. As it runs independent of the normal font processors there
-- is probably not that much risk of interference but of course one loses all the
-- goodies that have been around for a while (or will show up in the future). The
-- code can probably be optimized a bit.

-- There are three implementation specific files:
--
-- 1  font-phb-imp-binary.lua   : calls the command line version of hb
-- 2  font-phb-imp-library.lua  : uses ffi to interface to hb
-- 3  font-phb-imp-internal.lua : uses a small library to interface to hb
--
-- Variants 1 and 2 should work with mkiv and were used when playing with these
-- things, when writing the articles, and when running some tests for Idris font
-- development. Variant 3 is meant for lmtx (maybe variant 1 works there too) and
-- has not been used (read: tested) so far. The 1 and 2 variants are kind of old,
-- but 3 is an adaptation of 2 so not hip and modern either.

if not context then
    return
end

local next, tonumber, pcall, rawget = next, tonumber, pcall, rawget

local concat        = table.concat
local sortedhash    = table.sortedhash
local formatters    = string.formatters

local fonts         = fonts
local otf           = fonts.handlers.otf
local texthandler   = otf.texthandler

local fontdata      = fonts.hashes.identifiers

local nuts          = nodes.nuts
local tonode        = nuts.tonode
local tonut         = nuts.tonut

local remove_node   = nuts.remove

local getboth       = nuts.getboth
local getnext       = nuts.getnext
local setnext       = nuts.setnext
local getprev       = nuts.getprev
local setprev       = nuts.setprev
local getid         = nuts.getid
local getchar       = nuts.getchar
local setchar       = nuts.setchar
local setlink       = nuts.setlink
local setoffsets    = nuts.setoffsets
----- getcomponents = nuts.getcomponents
----- setcomponents = nuts.setcomponents
local getwidth      = nuts.getwidth
local setwidth      = nuts.setwidth

local copy_node     = nuts.copy
local find_tail     = nuts.tail

local nodepool      = nuts.pool
local new_kern      = nodepool.fontkern
local new_glyph     = nodepool.glyph

local nodecodes     = nodes.nodecodes
local glyph_code    = nodecodes.glyph
local glue_code     = nodecodes.glue

local skipped = {
    -- we assume that only valid features are set but maybe we need a list
    -- of valid hb features as there can be many context specific ones
    mode     = true,
    features = true,
    language = true,
    script   = true,
}

local seenspaces = {
    [0x0020] = true,
    [0x00A0] = true,
    [0x0009] = true, -- indeed
    [0x000A] = true, -- indeed
    [0x000D] = true, -- indeed
}

-- helpers

local helpers     = { }
local methods     = { }
local initialized = { } -- we don't pollute the shared table

local method      = "library"
local shaper      = "native"   -- "uniscribe"
local report      = logs.reporter("font plugin","hb")

utilities.hb = {
    methods = methods,
    helpers = helpers,
    report  = report,
}

do

    local toutf8 = utf.char
    local space  = toutf8(0x20)

    -- we can move this to the internal lib .. just pass a table .. but it is not faster

    function helpers.packtoutf8(text,leading,trailing)
        local size = #text
        for i=1,size do
            text[i] = toutf8(text[i])
        end
        if leading then
            text[0] = space
        end
        if trailing then
            text[size+1] = space
        end
        return concat(text,"",leading and 0 or 1,trailing and (size + 1) or size)
    end

    local toutf32 = utf.toutf32string
    local space   = toutf32(0x20)

    function helpers.packtoutf32(text,leading,trailing)
        local size = #text
        for i=1,size do
            text[i] = toutf32(text[i])
        end
        if leading then
            text[0] = space
        end
        if trailing then
            text[size+1] = space
        end
        return concat(text,"",leading and 0 or 1,trailing and (size + 1) or size)
    end

end
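
-- A minimal sketch (not part of the plugin) of the packing trick used above: the
-- optional leading and trailing spaces are parked in slots 0 and size + 1, and the
-- range passed to table.concat decides whether they end up in the string. The name
-- example_pack is hypothetical. Plain Lua is assumed here, so utf8.char (or
-- string.char for ascii input when utf8 is missing) stands in for utf.char, and
-- unlike the helpers above the input table is copied instead of overwritten.

local function example_pack(codes,leading,trailing)
    local tochar = (utf8 and utf8.char) or string.char -- assumption: see remark above
    local size   = #codes
    local text   = { }
    for i=1,size do
        text[i] = tochar(codes[i])
    end
    if leading then
        text[0] = " "
    end
    if trailing then
        text[size+1] = " "
    end
    return table.concat(text,"",leading and 0 or 1,trailing and (size + 1) or size)
end

-- usage: example_pack({ 0x61, 0x62 },true,false) gives " ab"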

local function initialize(font)

    local tfmdata      = fontdata[font]
    local resources    = tfmdata.resources
    local shared       = tfmdata.shared
    local filename     = resources.filename
    local features     = shared.features
    local descriptions = shared.rawdata.descriptions
    local characters   = tfmdata.characters
    local featureset   = { }
    local copytochar   = shared.copytochar -- indextounicode
    local spacewidth   = nil -- unscaled
    local factor       = tfmdata.parameters.factor
    local marks        = resources.marks or { }

    -- could be shared but why care about a few extra tables

    if not copytochar then
        copytochar = { }
        -- let's make sure that we have an indexed table and not a hash
        local max = 0
        for k, v in next, descriptions do
            if v.index > max then
                max = v.index
            end
        end
        for i=0,max do
            copytochar[i] = i
        end
        -- the normal mapper
        for k, v in next, descriptions do
            copytochar[v.index] = k
        end
        shared.copytochar = copytochar
    end

    -- independent from loop as we have unordered hashes

    if descriptions[0x0020] then
        spacewidth = descriptions[0x0020].width
    elseif descriptions[0x00A0] then
        spacewidth = descriptions[0x00A0].width
    end

    for k, v in sortedhash(features) do
        if #k > 4 then
            -- unknown ones are ignored anyway but we can assume that the current
            -- (and future) extra context features use more verbose names
        elseif skipped[k] then
            -- we don't want to pass language and such so we block a few features
            -- explicitly
        elseif v == "yes" or v == true then
            featureset[#featureset+1] = k .. "=1"     -- cf command line (false)
        elseif v == "no" or v == false then
            featureset[#featureset+1] = k .. "=0"     -- cf command line (true)
        elseif type(v) == "number" then
            featureset[#featureset+1] = k .. "=" .. v -- cf command line (alternate)
        else
            -- unset
        end
    end

    local data = {
        language   = features.language, -- do we need to uppercase and pad to 4 ?
        script     = features.script,   -- do we need to uppercase and pad to 4 ?
        features   = #featureset > 0 and concat(featureset,",") or "", -- hash
        featureset = #featureset > 0 and featureset or nil,
        copytochar = copytochar,
        spacewidth = spacewidth,
        filename   = filename,
        marks      = marks,
        factor     = factor,
        characters = characters, -- the loaded font (we use its metrics which is more accurate)
        method     = features.method or method,
        shaper     = features.shaper or shaper,
    }
    initialized[font] = data
    return data
end
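
-- A minimal sketch (not part of the plugin) of how the loop in initialize turns a
-- feature hash into the comma separated list that is handed to the shaper. The
-- names example_skipped and example_featurestring are hypothetical and the sample
-- input below is made up. Plain Lua is assumed, so the keys are sorted by hand
-- instead of with table.sortedhash.

local example_skipped = { mode = true, features = true, language = true, script = true }

local function example_featurestring(features)
    local keys = { }
    for k in pairs(features) do
        keys[#keys+1] = k
    end
    table.sort(keys) -- a stable order, so equal feature sets give equal strings
    local set = { }
    for i=1,#keys do
        local k = keys[i]
        local v = features[k]
        if #k > 4 or example_skipped[k] then
            -- verbose (context specific) and blocked names are not passed on
        elseif v == "yes" or v == true then
            set[#set+1] = k .. "=1"
        elseif v == "no" or v == false then
            set[#set+1] = k .. "=0"
        elseif type(v) == "number" then
            set[#set+1] = k .. "=" .. v
        end
    end
    return table.concat(set,",")
end

-- usage: example_featurestring { kern = "yes", liga = false, salt = 2, mode = "plug" }
-- gives "kern=1,liga=0,salt=2"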

-- In many cases this gives compatible output but especially with respect to spacing and user
-- discretionaries that mix fonts there can be different outcomes. We also have no possibility
-- to tweak and cheat. Of course one can always run a normal node mode pass with specific
-- features first but then one can as well do all in node mode. So .. after a bit of playing
-- around I redid this one from scratch and also added tracing.

local trace_colors  = false  trackers.register("fonts.plugins.hb.colors", function(v) trace_colors  = v end)
local trace_details = false  trackers.register("fonts.plugins.hb.details",function(v) trace_details = v end)
local check_id      = false
----- components    = false -- we have no need for them

local setcolor      = function() end
local resetcolor    = function() end

if context then
    setcolor   = nodes.tracers.colors.set
    resetcolor = nodes.tracers.colors.reset
end

table.setmetatableindex(methods,function(t,k)
    local l = "font-phb-imp-" .. k .. ".lua"
    report("start loading method %a from %a",k,l)
    dofile(resolvers.findfile(l))
    local v = rawget(t,k)
    if v then
        report("loading method %a succeeded",k)
    else
        report("loading method %a failed",k)
        v = function() return { } end
    end
    t[k] = v
    return v
end)
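
-- A minimal sketch (not part of the plugin) of the lazy loading pattern used above,
-- written with a plain setmetatable instead of table.setmetatableindex. The names
-- example_lazy and loader are hypothetical: in the real code the role of the loader
-- is played by the dofile call, after which the imp file is expected to have
-- registered itself in the methods table.

local function example_lazy(loader)
    return setmetatable({ }, { __index = function(t,k)
        -- only called for keys that are not yet present
        local v = loader(k) or function() return { } end -- fallback: a dummy that shapes nothing
        t[k] = v -- cache the result, so the loader runs at most once per key
        return v
    end })
end

-- usage: local m = example_lazy(function(k) return nil end) ; m.unknown() gives { }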

local inandout  do

    local utfbyte = utf.byte
    local utfchar = utf.char
    local utf3208 = utf.utf32_to_utf8_le

    inandout = function(text,result,first,last,copytochar)
        local s = { }
        local t = { }
        local r = { }
        local f = formatters["%05U"]
        for i=1,#text do
            local c = text[i]
         -- t[#t+1] = f(utfbyte(utf3208(c)))
            s[#s+1] = utfchar(c)
            t[#t+1] = f(c)
        end
        for i=first,last do
            r[#r+1] = f(copytochar[result[i][1]])
        end
        return s, t, r
    end

end

local function harfbuzz(head,font,attr,rlmode,start,stop,text,leading,trailing)
    local data = initialized[font]

    if not data then
        data = initialize(font)
    end

    if check_id then
        if getid(start) ~= glyph_code then
            report("error: start is not a glyph")
            return head
        elseif getid(stop) ~= glyph_code then
            report("error: stop is not a glyph")
            return head
        end
    end
    local size   = #text -- original text, without spaces
    local result = methods[data.method](font,data,rlmode,text,leading,trailing)
    local length = result and #result or 0

    if length == 0 then
     -- report("warning: no result")
        return head
    end

    local factor     = data.factor
    local marks      = data.marks
    local spacewidth = data.spacewidth
    local copytochar = data.copytochar
    local characters = data.characters

    -- the text analyzer is only partially clever so we must assume that we get
    -- inconsistent lists

    -- we could check if something has been done (replacement or kern or so) but
    -- then we pass around more information and need to check a lot and spaces
    -- are kind of spoiling that game (we need a different table then) .. more
    -- pain than gain

    -- we could play with 0xFFFE as boundary

    local current  = start
    local prev     = nil
    local glyph    = nil

    local first    = 1
    local last     = length
    local next     = nil -- todo: keep track of them
    local prev     = nil -- todo: keep track of them

    if leading then
        first = first + 1
    end
    if trailing then
        last = last - 1
    end

    local position = first
    local cluster  = 0
    local glyph    = nil
    local index    = 0
    local count    = 1
 -- local runner   = nil
    local saved    = nil

    if trace_details then
        report("start run, original size: %i, result index: %i upto %i",size,first,last)
        local s, t, r = inandout(text,result,first,last,copytochar)
        report("method : %s",data.method)
        report("shaper : %s",data.shaper)
        report("string : %t",s)
        report("text   : % t",t)
        report("result : % t",r)
    end

    -- okay, after some experiments, it became clear that more complex code aimed at
    -- optimization doesn't pay off as complexity also demands more testing

    for i=first,last do
        local r = result[i]
        local unicode = copytochar[r[1]] -- can be private of course
        --
        cluster = r[2] + 1 -- starts at zero
        --
        if position == cluster then
            if i == first then
                index = 1
                if trace_details then
                    report("[%i] position: %i, cluster: %i, index: %i, starting",i,position,cluster,index)
                end
            else
                index = index + 1
                if trace_details then
                    report("[%i] position: %i, cluster: %i, index: %i, next step",i,position,cluster,index)
                end
            end
        elseif position < cluster then
            -- a new cluster
            current  = getnext(current)
            position = position + 1
            size     = size - 1
         -- if runner then
         --     local h, t
         --     if saved then
         --         h = copy_node(runner)
         --         if trace_colors then
         --             resetcolor(h)
         --         end
         --         setchar(h,saved)
         --         t = h
         --         if trace_details then
         --             report("[%i] position: %i, cluster: %i, index: -, initializing components",i,position,cluster)
         --         end
         --     else
         --         h = getcomponents(runner)
         --         t = find_tail(h)
         --     end
         --     for p=position,cluster-1 do
         --         local n
         --         head, current, n = remove_node(head,current)
         --         setlink(t,n)
         --         t = n
         --         if trace_details then
         --             report("[%i] position: %i, cluster: %i, index: -, moving node to components",i,p,cluster)
         --         end
         --         size = size - 1
         --     end
         --     if saved then
         --         setcomponents(runner,h)
         --         saved = false
         --     end
         -- else
                for p=position,cluster-1 do
                    head, current = remove_node(head,current,true)
                    if trace_details then
                        report("[%i] position: %i, cluster: %i, index: -, removing node",i,p,cluster)
                    end
                    size = size - 1
                end
         -- end
            position = cluster
            index    = 1
            glyph    = nil
            if trace_details then
                report("[%i] position: %i, cluster: %i, index: %i, arriving",i,position,cluster,index)
            end
        else -- maybe a space got properties
            if trace_details then
                report("position: %i, cluster: %i, index: %i, quitting due to fatal inconsistency",position,cluster,index)
            end
            return head
        end
        local copied = false
        if glyph then
            if trace_details then
                report("[%i] position: %i, cluster: %i, index: %i, copying glyph, unicode %U",i,position,cluster,index,unicode)
            end
            local g = copy_node(glyph)
            if trace_colors then
                resetcolor(g)
            end
            setlink(current,g,getnext(current))
            current = g
            copied  = true
        else
            if trace_details then
                report("[%i] position: %i, cluster: %i, index: %i, using glyph, unicode %U",i,position,cluster,index,unicode)
            end
            glyph = current
        end
        --
        if not current then
            if trace_details then
                report("quitting due to unexpected end of node list")
            end
            return head
        end
        --
        local id = getid(current)
        if id ~= glyph_code then
            if trace_details then
                report("glyph expected in node list")
            end
            return head
        end
        --
        -- really, we can get a tab (9), lf (10), or cr(13) back in cambria .. don't ask me why
        --
        local prev, next = getboth(current)
        --
        -- assign glyph: first in run
        --
     -- if components and index == 1 then
     --     runner = current
     --     saved  = getchar(current)
     --     if saved ~= unicode then
     --         setchar(current,unicode) -- small optimization
     --         if trace_colors then
     --             count = (count == 8) and 1 or count + 1
     --             setcolor(current,"trace:"..count)
     --         end
     --     end
     -- else
            setchar(current,unicode)
            if trace_colors then
                count = (count == 8) and 1 or count + 1
                setcolor(current,"trace:"..count)
            end
     -- end
        --
        local x_offset  = r[3] -- r.dx
        local y_offset  = r[4] -- r.dy
        local x_advance = r[5] -- r.ax
        ----- y_advance = r[6] -- r.ay
        local left  = 0
        local right = 0
        local dx    = 0
        local dy    = 0
        if trace_details then
            if x_offset ~= 0 or y_offset ~= 0 or x_advance ~= 0 then -- or y_advance ~= 0
                report("[%i] position: %i, cluster: %i, index: %i, old, xoffset: %p, yoffset: %p, xadvance: %p, width: %p",
                    i,position,cluster,index,x_offset*factor,y_offset*factor,x_advance*factor,characters[unicode].width)
            end
        end
        if y_offset ~= 0 then
            dy = y_offset * factor
        end
        if rlmode >= 0 then
            -- l2r marks and rest
            if x_offset ~= 0 then
                dx = x_offset * factor
            end
            local width = characters[unicode].width
            local delta = x_advance * factor
            if delta ~= width then
             -- right = -(delta - width)
                right = delta - width
            end
        elseif marks[unicode] then -- why not just the next loop
            -- r2l marks
            if x_offset ~= 0 then
                dx = -x_offset * factor
            end
        else
            -- r2l rest
            local width = characters[unicode].width
            local delta = (x_advance - x_offset) * factor
            if delta ~= width then
                left = delta - width
            end
            if x_offset ~= 0 then
                right = x_offset * factor
            end
        end
        if copied or dx ~= 0 or dy ~= 0 then
            setoffsets(current,dx,dy)
        end
        if left ~= 0 then
            setlink(prev,new_kern(left),current) -- insertbefore
            if current == head then
                head = prev
            end
        end
        if right ~= 0 then
            local kern = new_kern(right)
            setlink(current,kern,next)
            current = kern
        end
        if trace_details then
            if dy ~= 0 or dx ~= 0 or left ~= 0 or right ~= 0 then
                report("[%i] position: %i, cluster: %i, index: %i, new, xoffset: %p, yoffset: %p, left: %p, right: %p",i,position,cluster,index,dx,dy,left,right)
            end
        end
    end
    --
    if trace_details then
        report("[-] position: %i, cluster: %i, index: -, at end",position,cluster)
    end
    if size > 1 then
        current = getnext(current)
     -- if runner then
     --     local h, t
     --     if saved then
     --         h = copy_node(runner)
     --         if trace_colors then
     --             resetcolor(h)
     --         end
     --         setchar(h,saved)
     --         t = h
     --         if trace_details then
     --             report("[-] position: %i, cluster: -, index: -, initializing components",position)
     --         end
     --     else
     --         h = getcomponents(runner)
     --         t = find_tail(h)
     --     end
     --     for i=1,size-1 do
     --         if trace_details then
     --             report("[-] position: %i + %i, cluster: -, index: -, moving node to components",position,i)
     --         end
     --         local n
     --         head, current, n = remove_node(head,current,true)
     --         setlink(t,n)
     --         t = n
     --     end
     --     if saved then
     --         setcomponents(runner,h)
     --         saved = false
     --     end
     -- else
            for i=1,size-1 do
                if trace_details then
                    report("[-] position: %i + %i, cluster: -, index: -, removing node",position,i)
                end
                head, current = remove_node(head,current,true)
            end
     -- end
    end
    --
    -- We see all kinds of interesting spaces come back (like tabs in cambria) so we do a bit of
    -- extra testing here. Without a known space width there is nothing to compensate, and a
    -- missing neighbour node is reported instead of being passed to getid.
    --
    if leading then
        local r = result[1]
        local unicode = copytochar[r[1]]
        if spacewidth and seenspaces[unicode] then
            local x_advance = r[5]
            local delta     = x_advance - spacewidth
            if delta ~= 0 then
                -- nothing to do but jump one slot ahead
                local prev = getprev(start)
                if prev and getid(prev) == glue_code then
                    local dx = delta * factor
                    setwidth(prev,getwidth(prev) + dx)
                    if trace_details then
                        report("compensating leading glue by %p due to codepoint %U",dx,unicode)
                    end
                else
                    report("no valid leading glue node")
                end
            end
        end
    end
    --
    if trailing then
        local r = result[length]
        local unicode = copytochar[r[1]]
        if spacewidth and seenspaces[unicode] then
            local x_advance = r[5]
            local delta     = x_advance - spacewidth
            if delta ~= 0 then
                local next = getnext(stop)
                if next and getid(next) == glue_code then
                    local dx = delta * factor
                    setwidth(next,getwidth(next) + dx)
                    if trace_details then
                        report("compensating trailing glue by %p due to codepoint %U",dx,unicode)
                    end
                else
                    report("no valid trailing glue node")
                end
            end
        end
    end
    --
    if trace_details then
        report("run done")
    end
    return head
end
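
-- A minimal sketch (not part of the plugin) of the positioning arithmetic in the
-- loop above, isolated as a pure function so that it can be tried without a node
-- list. The name example_placement and its argument list are hypothetical. The
-- offsets and the advance are in font units, factor scales them, and width is the
-- already scaled width of the loaded glyph. The returned dx and dy are the glyph
-- offsets, left and right the kerns that go before and after the glyph.

local function example_placement(rlmode,ismark,x_offset,y_offset,x_advance,width,factor)
    local dx, dy, left, right = 0, 0, 0, 0
    if y_offset ~= 0 then
        dy = y_offset * factor
    end
    if rlmode >= 0 then
        -- l2r: the offset goes into the glyph, the advance difference into a kern after it
        if x_offset ~= 0 then
            dx = x_offset * factor
        end
        right = x_advance * factor - width
    elseif ismark then
        -- r2l marks: only a negated offset
        if x_offset ~= 0 then
            dx = -x_offset * factor
        end
    else
        -- r2l rest: the advance (minus offset) difference before, the offset after
        left = (x_advance - x_offset) * factor - width
        if x_offset ~= 0 then
            right = x_offset * factor
        end
    end
    return dx, dy, left, right
end

-- usage: example_placement(1,false,0,0,600,1000,2) gives 0, 0, 0, 200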

otf.registerplugin("harfbuzz",function(head,font,attr,direction)
    return texthandler(head,font,attr,direction,harfbuzz)
end)
