mk-optimization.tex /size: 10 Kb    last modification: 2023-12-21 09:43
% language=us

\startcomponent mk-arabic

\environment mk-environment

\chapter{Optimization}

\subject{quality of code}

How good is the \MKIV\ code? Well, as good as I can make it. When you browse
the code you will probably notice differences in coding style, which is
related to the learning curve. For instance, the \type {luat-inp} module needs
some cleanup, such as hiding local functions from users.

Since benchmarking has been done right from the start there is probably not
that much to gain, but who knows. When coding in \LUA\ you should be careful
with defining global variables, since they may override something. In \MKIV\
we don't guarantee that the name you use for a variable will not be used at
some point. Therefore, it is best to operate in a dedicated \LUA\ instance, or
to operate in userspace.

\starttyping
do
    -- your code
end
\stoptyping
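
As a minimal sketch of what such a block buys you (the names \type {counter}
and \type {leaked} are made up for this illustration): a variable declared
\type {local} inside the block is gone afterwards, while one assigned without
\type {local} still ends up in the global table.

\starttyping
do
    local counter = 0        -- local: only visible inside this block
    for i = 1, 3 do
        counter = counter + i
    end
    leaked = counter         -- no 'local': this one becomes a global
end

print(counter) -- nil
print(leaked)  -- 6
\stoptyping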

If you want to use your data later on, think of working this way (the example
is somewhat silly):

\starttyping
userdata['your.name'] = userdata['your.name'] or { }

do
    local mydata = userdata['your.name']

    mydata.data = {}

    local function foo() return 'bar' end

    function mydata.dothis()
        -- store the result under a key that can be reached from outside
        mydata.data.foo = foo()
    end

end
\stoptyping

In this case you can always access your user data while temporary
variables are hidden. The \type {userdata} table is predefined, as is
\type {thirddata} for modules that you may write. Of course this
assumes that you create a namespace within these global tables.
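
For a module the same idea could look as follows. This is only a sketch: the
namespace \type {demo} and the functions \type {register} and \type {lookup}
are invented for the occasion, and \type {thirddata} is created by hand so
that the code also runs in a plain \LUA\ interpreter.

\starttyping
-- predefined in MkIV; created here only so that the sketch runs in plain Lua
thirddata = thirddata or { }

-- 'demo' is a made-up namespace for a made-up module
thirddata.demo = thirddata.demo or { }

do
    local demo  = thirddata.demo
    local cache = { }                 -- private storage, hidden from users

    local function normalize(name)    -- private helper
        return string.lower(name)
    end

    function demo.register(name,value)
        cache[normalize(name)] = value
    end

    function demo.lookup(name)
        return cache[normalize(name)]
    end
end

thirddata.demo.register("Foo",42)

print(thirddata.demo.lookup("FOO")) -- 42
print(cache,normalize)              -- both are nil here
\stoptyping

Only the two functions in the namespace are reachable afterwards; the cache
and the helper stay private to the block.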

A nice test for checking global cluttering is the following:

\starttyping
for k, v in pairs(_G) do
    print(k, v)
end
\stoptyping

When you accidentally define global variables like \type {n} or \type {str}
they will show up here.
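
Because the full list is long, it can help to report only what was added. The
following is a minimal sketch of that idea, assuming that you can take a
snapshot before your own code runs; the names \type {known} and
\type {leaked} are made up.

\starttyping
-- remember which global names exist before our own code runs
local known = { }
for k in pairs(_G) do
    known[k] = true
end

-- code under test: both assignments lack 'local' on purpose
n   = 10
str = "oops"

-- report only the names that were added since the snapshot
local leaked = { }
for k in pairs(_G) do
    if not known[k] then
        leaked[#leaked+1] = tostring(k)
    end
end
table.sort(leaked)
print(table.concat(leaked, " ")) -- n str
\stoptyping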

\subject{clean or dirty}

Processing the first 120 pages of this document (16 chapters) takes some 23.5
seconds on a Dell M90 (2.3 GHz, 4 GB memory, Windows Vista Ultimate). A rough estimate
of where \LUA\ spends its time is:

\starttabulate[|l|c|]
\NC \bf activity              \NC \bf sec \NC \NR
\NC input load time           \NC 0.114   \NC \NR
\NC fonts load time           \NC 6.692   \NC \NR
\NC mps conversion time       \NC 0.004   \NC \NR
\NC node processing time      \NC 0.832   \NC \NR
\NC attribute processing time \NC 3.376   \NC \NR
\stoptabulate

Font loading takes some time, which is no surprise because we load huge Zapfino, Arabic
and \CJK\ fonts and define many instances of them. Some tracing shows that there
are some 14.254.041 function calls, of which 13.339.226 concern functions that are
called more than 5.000 times. A total of 62.434 functions is counted, which is
a result of locally defined ones.

A rough indication of this overhead is given by the following test code:

\starttyping
local a,b,c,d,e,f = 1,2,3,4,5,6

function one  (a)           local n = 1 end
function three(a,b,c)       local n = 1 end
function six  (a,b,c,d,e,f) local n = 1 end

for i=1,14254041 do one  (a)           end
for i=1,14254041 do three(a,b,c)       end
for i=1,14254041 do six  (a,b,c,d,e,f) end
\stoptyping

The runtime for these tests (excluding startup) is:

\starttabulate[|l|l|]
\NC one argument    \NC 1.8 seconds \NC \NR
\NC three arguments \NC 2.0 seconds \NC \NR
\NC six arguments   \NC 2.3 seconds \NC \NR
\stoptabulate
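
If you want to repeat such a measurement yourself, something like the
following might do. It is a sketch and not the script that produced the
numbers above: the helper \type {timed} is made up, the functions are local
(the test above uses globals), and the loop count is much smaller. An empty
loop is timed as well so that the cost of the loop itself can be subtracted.

\starttyping
local function one(a)           local n = 1 end
local function six(a,b,c,d,e,f) local n = 1 end

-- run an action once and report the processor time it took
local function timed(label,action)
    local start = os.clock()
    action()
    local elapsed = os.clock() - start
    print(string.format("%-14s %.3f seconds",label,elapsed))
    return elapsed
end

local count = 1000000 -- far fewer than the 14254041 iterations used above

local t_loop = timed("empty loop",    function() for i=1,count do end end)
local t_one  = timed("one argument",  function() for i=1,count do one(1) end end)
local t_six  = timed("six arguments", function() for i=1,count do six(1,2,3,4,5,6) end end)

print(string.format("call overhead: %.3f and %.3f seconds",t_one-t_loop,t_six-t_loop))
\stoptyping

The outcome depends on the machine and on the \LUA\ version, so expect
different numbers than those in the table.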

So, of the total runtime for this document we easily spend a couple
of seconds on function calls, especially in node processing and attribute
resolving. Does this mean that we need to change the code and follow a more
inline approach? Eventually we may optimize some code, but for the moment
we keep things as readable as possible, and even then much code is still
quite complex. Font loading is often constant for a document anyway, and
independent of the number of pages. Time spent on node processing depends on
the script. Processing-intensive scripts are often typeset in a larger font, and
since they are less verbose than Latin, this does not really influence
the average time spent on typesetting a page. Attribute handling is probably
the most time consuming activity, and for large documents the time spent on this
is large compared to font loading and node processing. But then, after a few
\MKIV\ development cycles the picture may be different.

When we turn on tracing of function calls, it becomes clear where the time
is currently spent in a document like this, which demands complex Zapfino
contextual analysis as well as Arabic analysis and feature application (both
fonts demand node insertion and deletion). Of course using color also has a
price. Handling weighted and conditional spacing (new in \MKIV) involves
just over 10.000 calls to the main handler for 120 pages of this document.
Glyph related processing of node lists needs 42.000 calls, and contextual
analysis of \OPENTYPE\ fonts is good for 11.000 calls. Timing \LUA\ related
tasks involves 2 times 37.000 calls to the stopwatch. The number of calls for
collapsing \UTF\ in the input lines equals the number of lines: 7700.

However, at the top of the charts we find calls to attribute related
functions: 97.000 calls for handling special effects, overprint, transparency
and the like, and another 24.000 calls for combined color and colorspace handling.
These calls result in over 6.000 insertions of \PDF\ literals (this number is
large because we show Arabic samples with color based tracing enabled). In
case you wonder if the attribute handler can be made more efficient (we're
talking seconds here), the answer is \quotation {possibly not}. This action
is needed for each shipped out object and each shipped out page. If we divide
the 24.000 (calls) by 120 (pages) we get 200 calls per page for color processing,
which is okay if you keep in mind that we need to recurse into the nested horizontal
and vertical lists of the completely made up page.
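
Counting calls can be approximated in plain \LUA\ with a call hook. The
following is only a sketch and not the tracer that produced the numbers in
this chapter: \type {handler} is a made-up stand-in for a real handler, and
the loop bounds are chosen so that the totals mirror the 24.000 calls and 120
pages mentioned above.

\starttyping
local calls = { } -- number of calls per function, keyed by source and line
local total = 0

-- call hook: level 2 is the function that is being called
local function hook()
    local info = debug.getinfo(2,"S")
    if info and info.what == "Lua" then
        local key = info.short_src .. ":" .. info.linedefined
        calls[key] = (calls[key] or 0) + 1
        total = total + 1
    end
end

local function handler(n) return n + 1 end -- stand-in for real work

debug.sethook(hook,"c")
for page=1,120 do
    for i=1,200 do
        handler(i)
    end
end
debug.sethook() -- switch the hook off again

print(string.format("%d calls, %.0f per page",total,total/120))
-- 24000 calls, 200 per page
\stoptyping

Keep in mind that the hook itself takes time, so a traced run is a lot slower
than a normal one.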

\subject{serialization}

When serializing tables, we can end up with very large tables, especially
when dealing with big fonts like \quote {arabtype} or \quote {zapfino}. When
serializing tables one has to find a compromise between speed of writing,
efficiency of loading and readability. First we had (sub)tables like:

\starttyping
boundingbox = {
    [1] = 0,
    [2] = 0,
    [3] = 100,
    [4] = 200
}
\stoptyping

I mistakenly assumed that this would generate an indexed table, but at \TUG\ 2007
Roberto Ierusalimschy explained to me that this was not that efficient, since this
variant boils down to the following byte code:

\starttyping
1       [1]     NEWTABLE        0 0 4
2       [2]     SETTABLE        0 -2 -3 ; 1 0
3       [3]     SETTABLE        0 -4 -3 ; 2 0
4       [4]     SETTABLE        0 -5 -6 ; 3 100
5       [5]     SETTABLE        0 -7 -8 ; 4 200
6       [6]     SETGLOBAL       0 -1    ; boundingbox
7       [6]     RETURN          0 1
\stoptyping

This creates a hashed table: the operands of \type {NEWTABLE} ask for no array
slots and four hash slots. The following variant is better:

\starttyping
boundingbox = { 0, 0, 100, 200 }
\stoptyping

This results in a table with four array slots and no hash slots:

\starttyping
1       [1]     NEWTABLE        0 4 0
2       [2]     LOADK           1 -2    ; 0
3       [3]     LOADK           2 -2    ; 0
4       [4]     LOADK           3 -3    ; 100
5       [6]     LOADK           4 -4    ; 200
6       [6]     SETLIST         0 4 1   ; 1
7       [6]     SETGLOBAL       0 -1    ; boundingbox
8       [6]     RETURN          0 1
\stoptyping

The resulting tables are not only smaller in terms of bytes, but are also
less memory hungry when loaded. For readability we write tables with
only numbers, strings or boolean values in an inline||format:

\starttyping
boundingbox = { 0, 0, 100, 200 }
\stoptyping
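
The choice between the inline and the verbose format can be sketched as
follows. This is not the serializer used in \MKIV; the helper names
\type {is_simple_list} and \type {serialize} are invented here, and the
sketch only handles numbers, strings, booleans and nested tables.

\starttyping
-- true when t only has the keys 1..n and only holds simple values
local function is_simple_list(t)
    local count = 0
    for _, v in pairs(t) do
        local kind = type(v)
        if kind ~= "number" and kind ~= "string" and kind ~= "boolean" then
            return false
        end
        count = count + 1
    end
    for i=1,count do
        if t[i] == nil then return false end
    end
    return count > 0
end

local function serialize(value,indent)
    indent = indent or ""
    if type(value) ~= "table" then
        if type(value) == "string" then
            return string.format("%q",value)
        end
        return tostring(value)
    end
    if is_simple_list(value) then -- inline format
        local parts = { }
        for i=1,#value do parts[i] = serialize(value[i]) end
        return "{ " .. table.concat(parts,", ") .. " }"
    end
    local keys = { } -- verbose format, with sorted keys
    for k in pairs(value) do keys[#keys+1] = k end
    table.sort(keys,function(a,b) return tostring(a) < tostring(b) end)
    local lines = { "{" }
    for _, k in ipairs(keys) do
        lines[#lines+1] = string.format("%s    [%s] = %s,",indent,
            serialize(k),serialize(value[k],indent .. "    "))
    end
    lines[#lines+1] = indent .. "}"
    return table.concat(lines,"\n")
end

print("boundingbox = " .. serialize { 0, 0, 100, 200 })
-- boundingbox = { 0, 0, 100, 200 }
\stoptyping

A table that mixes indexed and named entries falls through to the verbose
branch and gets one entry per line.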

The serialized tables are somewhat smaller, depending on how
many subtables are indexed (bounding boxes, lookup sequences, etc.).

\starttabulate[|r|r|l|]
\NC \bf normal \NC \bf compact \NC \bf filename \NC \NR
\NC 34.055.092 \NC 32.403.326 \NC arabtype.tma                \NC \NR
\NC  1.620.614 \NC  1.513.863 \NC lmroman10-italic.tma        \NC \NR
\NC  1.325.585 \NC  1.233.044 \NC lmroman10-regular.tma       \NC \NR
\NC  1.248.157 \NC  1.158.903 \NC lmsans10-regular.tma        \NC \NR
\NC    194.646 \NC    153.120 \NC lmtypewriter10-regular.tma  \NC \NR
\NC  1.771.678 \NC  1.658.461 \NC palatinosanscom-bold.tma    \NC \NR
\NC  1.695.251 \NC  1.584.491 \NC palatinosanscom-regular.tma \NC \NR
\NC 13.736.534 \NC 13.409.446 \NC zapfinoextraltpro.tma       \NC \NR
\stoptabulate

Since we compile the tables to bytecode, the effects are more
spectacular there.

\starttabulate[|r|r|l|]
\NC \bf normal \NC \bf compact \NC \bf filename \NC \NR
\NC 13.679.038 \NC 11.774.106 \NC arabtype.tmc                \NC \NR
\NC    886.248 \NC    754.944 \NC lmroman10-italic.tmc        \NC \NR
\NC    729.828 \NC    466.864 \NC lmroman10-regular.tmc       \NC \NR
\NC    688.482 \NC    441.962 \NC lmsans10-regular.tmc        \NC \NR
\NC    128.685 \NC     95.853 \NC lmtypewriter10-regular.tmc  \NC \NR
\NC    715.929 \NC    582.985 \NC palatinosanscom-bold.tmc    \NC \NR
\NC    669.942 \NC    540.126 \NC palatinosanscom-regular.tmc \NC \NR
\NC  1.560.588 \NC  1.317.000 \NC zapfinoextraltpro.tmc       \NC \NR
\stoptabulate
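
To get a feeling for the effect on bytecode you can compile both variants and
compare the size of the dumped chunks. The sketch below uses only the
standard \type {string.dump}; the helper \type {bytecode_size} is made up, and
because the actual sizes depend on the \LUA\ version no numbers are claimed
here.

\starttyping
local compile = loadstring or load -- loadstring in Lua 5.1, load in later versions

local verbose = "return { [1] = 0, [2] = 0, [3] = 100, [4] = 200 }"
local compact = "return { 0, 0, 100, 200 }"

-- compile a chunk and return the length of its binary representation
local function bytecode_size(source)
    local chunk = assert(compile(source))
    return #string.dump(chunk)
end

print("verbose",bytecode_size(verbose))
print("compact",bytecode_size(compact))
\stoptyping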

Especially when a table is partially indexed and hashed, readability is a bit
less than normal but in practice one will seldom consult such tables in their verbose
form.

After going beta, users reported problems with scaling of the Latin Modern and
\TeX-Gyre fonts. The troubles originate in the fact that the \OPENTYPE\ versions of
these fonts lack a design size specification and it happens that the Latin Modern
fonts do have design sizes other than 10 points. Here the power of a flexible
\TEX\ engine shows \unknown\ we can repair this when we load the font. In \MKIV\
we can now define patches:

\starttyping
do
    local function patch(data,filename)
        if data.design_size == 0 then
            local ds = (file.basename(filename)):match("(%d+)")
            if ds then
                logs.report("load otf",string.format("patching design size (%s)",ds))
                data.design_size = tonumber(ds) * 10
            end
        end
    end

    fonts.otf.enhance.patches["^lmroman"] = patch
    fonts.otf.enhance.patches["^lmsans"]  = patch
    fonts.otf.enhance.patches["^lmmono"]  = patch
end
\stoptyping

Eventually such code will move to typescripts instead of living in the kernel code.
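
How such a registry of patches might be consulted when a font is loaded is
sketched below. The table \type {patches}, the helper \type {basename} and the
function \type {apply_patches} are assumptions made for this illustration, as
is the font file name; the real loader may work differently.

\starttyping
-- hypothetical registry, modelled on fonts.otf.enhance.patches
local patches = { }

-- plain Lua stand-in for file.basename: strip everything up to the last slash
local function basename(filename)
    return (filename:gsub("^.*[/\\]",""))
end

local function patch(data,filename)
    if data.design_size == 0 then
        local ds = basename(filename):match("(%d+)")
        if ds then
            data.design_size = tonumber(ds) * 10
        end
    end
end

patches["^lmroman"] = patch
patches["^lmsans"]  = patch

-- apply every patch whose pattern matches the base name of the font file
local function apply_patches(data,filename)
    local name = basename(filename)
    for pattern, action in pairs(patches) do
        if name:find(pattern) then
            action(data,filename)
        end
    end
    return data
end

local font = apply_patches({ design_size = 0 },"/fonts/lmroman12-regular.otf")

print(font.design_size) -- 120
\stoptyping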

\stopcomponent