% language=us runpath=texruns:manuals/cld
% cld-backendcode.tex, last modification: 2021-10-28 13:50

\startcomponent cld-backendcode

\environment cld-environment

% derived from hybrid

\startchapter[title={Backend code}]

\startsection [title={Introduction}]

In \CONTEXT\ we've always separated the backend code in so-called driver files.
This means that in the code related to typesetting only calls to the \API\ take
place, and no backend specific code is to be used. Currently a \PDF\ backend is
supported as well as an \XML\ export. \footnote {This chapter is derived from an
article on these matters. You can find more information in \type {hybrid.pdf}.}

Some \CONTEXT\ users like to add their own \PDF\ specific code to their styles or
modules. However, such extensions can interfere with existing code, especially
when resources are involved. Therefore the construction of \PDF\ data structures
and resources is rather controlled and has to be done via the official helper
macros.

\stopsection

\startsection [title={Structure}]

A \PDF\ file is a tree of indirect objects. Each object has a number and the file
contains a table (or multiple tables) that relates these numbers to positions in
the file (or positions in a compressed object stream). That way a file can be
viewed without reading all data: a viewer only loads what is needed.

\starttyping
1 0 obj <<
    /Name (test) /Address 2 0 R
>>
2 0 obj [
   (Main Street) (24) (postal code) (MyPlace)
]
\stoptyping
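
The lookup by object number can be illustrated with a few lines of plain \LUA.
This is only a sketch under simplifying assumptions: the two objects live in one
string, each one is closed by \type {endobj}, positions are one based as usual in
\LUA, and the helpers \type {buildxref} and \type {getobject} are hypothetical
names that are not part of \CONTEXT.

\starttyping
-- A hypothetical miniature file body with two indirect objects.
local body = table.concat {
    "1 0 obj << /Name (test) /Address 2 0 R >> endobj\n",
    "2 0 obj [ (Main Street) (24) (postal code) (MyPlace) ] endobj\n",
}

-- Build a table that maps object numbers to positions in the data.
local function buildxref(data)
    local xref = { }
    for position, number in string.gmatch(data,"()(%d+) %d+ obj") do
        xref[tonumber(number)] = position
    end
    return xref
end

-- Fetch one object by number without scanning the data again.
local function getobject(data,xref,number)
    local position = xref[number]
    if not position then
        return nil
    end
    local stop = string.find(data,"endobj",position,true)
    return string.sub(data,position,stop + 5)
end

local xref = buildxref(body)

print(getobject(body,xref,2))
\stoptyping

Once the table is known, resolving \type {2 0 R} is a matter of one table access
and one substring operation, which is the reason why a viewer can jump to any
object directly.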

For the sake of the discussion we consider strings like \type {(test)} also to be
objects. The table in the next section lists what we can encounter in a \PDF\
file. Objects can be indirect, in which case a reference is used (\type{2 0 R}),
or direct.

It all starts in the document's root object. From there we access the page tree
and resources. Each page carries its own resource information which makes random
access easier. A page has a page stream and there we find the content to be
rendered as a mixture of (\UNICODE) strings and special drawing and rendering
operators. Here we will not discuss them as they are mostly generated by the
engine itself or dedicated subsystems like the \METAPOST\ converter. There we use
literal or \type {\latelua} whatsits to inject code into the current stream.

\stopsection

\startsection [title={Data types}]

There are several datatypes in \PDF\ and we support all of them one way or
another.

\starttabulate[|l|l|p|]
\FL
\NC \bf type \NC \bf form \NC \bf meaning \NC \NR
\TL
\NC constant   \NC \type{/...} \NC A symbol (prescribed string). \NC \NR
\NC string     \NC \type{(...)} \NC A sequence of characters in pdfdoc
                   encoding. \NC \NR
\NC unicode    \NC \type{<...>} \NC A sequence of characters in utf16
                   encoding. \NC \NR
\NC number     \NC \type{3.1415} \NC A number constant. \NC \NR
\NC boolean    \NC \type{true/false} \NC A boolean constant. \NC \NR
\NC reference  \NC \type{N 0 R} \NC A reference to an object. \NC \NR
\NC dictionary \NC \type{<< ... >>} \NC A collection of key value pairs
                   where the value itself is an (indirect) object.
                   \NC \NR
\NC array      \NC \type{[ ... ]} \NC A list of objects or references to
                   objects. \NC \NR
\NC stream     \NC \NC A sequence of bytes, optionally packaged with
                   a dictionary that contains descriptive data. \NC \NR
\NC xform      \NC \NC A special kind of object containing a reusable
                   blob of data, for example an image. \NC \NR
\LL
\stoptabulate
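
To get a feeling for the forms in the table above, here is a small serializer in
plain \LUA. It is a sketch and not the \type {lpdf} implementation: the helpers
\type {constant}, \type {reference} and \type {serialize} are hypothetical,
strings are not escaped, and the unicode, stream and xform types are left out.

\starttyping
-- Hypothetical helpers that only mimic the surface syntax of the table.
local function constant(name) return { kind = "constant",  value = name } end
local function reference(n)   return { kind = "reference", value = n    } end

local serialize

local function serializearray(t)
    local r = { }
    for i=1,#t do
        r[i] = serialize(t[i])
    end
    return "[ " .. table.concat(r," ") .. " ]"
end

local function serializedictionary(t)
    local keys = { }
    for k in pairs(t) do
        keys[#keys+1] = k
    end
    table.sort(keys) -- a fixed order makes the result predictable
    local r = { }
    for i=1,#keys do
        local k = keys[i]
        r[i] = "/" .. k .. " " .. serialize(t[k])
    end
    return "<< " .. table.concat(r," ") .. " >>"
end

serialize = function(v)
    local t = type(v)
    if t == "string" then
        return "(" .. v .. ")" -- no escaping of ( ) \ in this sketch
    elseif t == "number" or t == "boolean" then
        return tostring(v)
    elseif t == "table" then
        if v.kind == "constant" then
            return "/" .. v.value
        elseif v.kind == "reference" then
            return v.value .. " 0 R"
        elseif #v > 0 then
            return serializearray(v)
        else
            return serializedictionary(v)
        end
    end
    error("unsupported value")
end

print(serialize { Name = "test", Address = reference(2) })
print(serialize { "Main Street", "24", "postal code", "MyPlace" })
\stoptyping

The sketch sorts the keys of a dictionary only to get a stable result; the order
of keys in a \PDF\ dictionary carries no meaning.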

While writing additional backend code, we mostly create dictionaries.

\starttyping
<< /Name (test) /Address 2 0 R >>
\stoptyping

In this case the indirect object can look like:

\starttyping
[ (Main Street) (24) (postal code) (MyPlace) ]
\stoptyping

The \LUATEX\ manual mentions primitives like \type {\pdfobj}, \type {\pdfannot},
\type {\pdfcatalog}, etc. However, in \MKIV\ no such primitives are used. You can
still use many of them but those that push data into document or page related
resources are overloaded to do nothing at all.

In the \LUA\ backend code you will find function calls like:

\starttyping
local d = lpdf.dictionary {
    Name    = lpdf.string("test"),
    Address = lpdf.array {
        "Main Street", "24", "postal code", "MyPlace",
    }
}
\stoptyping

Equally valid is:

\starttyping
local d = lpdf.dictionary()
d.Name = "test"
\stoptyping

Eventually the object will end up in the file using calls like:

\starttyping
local r = lpdf.immediateobject(tostring(d))
\stoptyping

or using the wrapper (which permits tracing):

\starttyping
local r = lpdf.flushobject(d)
\stoptyping

The object content will be serialized according to the formal specification so
the proper \type {<< >>} etc.\ are added. If you want the content instead you can
use a function call:

\starttyping
local dict = d()
\stoptyping

An example of using references is:

\starttyping
local a = lpdf.array {
    "Main Street", "24", "postal code", "MyPlace",
}
local d = lpdf.dictionary {
    Name    = lpdf.string("test"),
    -- flush the array first: a reference needs the object number
    Address = lpdf.reference(lpdf.flushobject(a)),
}
local r = lpdf.flushobject(d)
\stoptyping

\stopsection

We have the following creators. Their arguments are optional.

\starttabulate[|l|p|]
\FL
\NC \bf function \NC \bf optional parameter \NC \NR
\TL
\NC \type{lpdf.null}        \NC \NC \NR
\NC \type{lpdf.number}      \NC number \NC \NR
\NC \type{lpdf.constant}    \NC string \NC \NR
\NC \type{lpdf.string}      \NC string \NC \NR
\NC \type{lpdf.unicode}     \NC string \NC \NR
\NC \type{lpdf.boolean}     \NC boolean \NC \NR
\NC \type{lpdf.array}       \NC indexed table of objects \NC \NR
\NC \type{lpdf.dictionary}  \NC hash with key/values \NC \NR
%NC \type{lpdf.stream}      \NC indexed table of operators \NC \NR
\NC \type{lpdf.reference}   \NC string \NC \NR
\NC \type{lpdf.verbose}     \NC indexed table of strings \NC \NR
\LL
\stoptabulate

\ShowLuaExampleString{tostring(lpdf.null())}
\ShowLuaExampleString{tostring(lpdf.number(123))}
\ShowLuaExampleString{tostring(lpdf.constant("whatever"))}
\ShowLuaExampleString{tostring(lpdf.string("just a string"))}
\ShowLuaExampleString{tostring(lpdf.unicode("just a string"))}
\ShowLuaExampleString{tostring(lpdf.boolean(true))}
\ShowLuaExampleString{tostring(lpdf.array { 1, lpdf.constant("c"), true, "str" })}
\ShowLuaExampleString{tostring(lpdf.dictionary { a=1, b=lpdf.constant("c"), d=true, e="str" })}
%ShowLuaExampleString{tostring(lpdf.stream("whatever"))}
\ShowLuaExampleString{tostring(lpdf.reference(123))}
\ShowLuaExampleString{tostring(lpdf.verbose("whatever"))}

\stopsection

\startsection[title={Managing objects}]

Flushing objects is done with:

\starttyping
lpdf.flushobject(obj)
\stoptyping

Reserving an object is of course possible and done with:

\starttyping
local r = lpdf.reserveobject()
\stoptyping

Such an object is flushed with:

\starttyping
lpdf.flushobject(r,obj)
\stoptyping

We also support named objects:

\starttyping
lpdf.reserveobject("myobject")

lpdf.flushobject("myobject",obj)
\stoptyping
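
Reserving is useful when an object has to be referred to before its content is
known. The bookkeeping behind the three call forms shown above can be sketched in
plain \LUA. This is an illustration only: the table \type {store} with its
functions \type {reserve} and \type {flush} is hypothetical and does not show how
\type {lpdf} is implemented.

\starttyping
-- A hypothetical object store that illustrates reserving and flushing.
local store = {
    nofobjects = 0,   -- last object number handed out
    names      = { }, -- name -> object number
    objects    = { }, -- object number -> serialized data
}

-- Claim an object number now, optionally under a name, and fill it later.
function store.reserve(name)
    store.nofobjects = store.nofobjects + 1
    local n = store.nofobjects
    if name then
        store.names[name] = n
    end
    return n
end

-- Three forms: flush(data), flush(number,data) and flush(name,data).
function store.flush(target,data)
    local n
    if data == nil then
        data, n = target, store.reserve()
    elseif type(target) == "string" then
        n = store.names[target] or store.reserve(target)
    else
        n = target
    end
    store.objects[n] = tostring(data)
    return n
end

local r = store.reserve()          -- number 1, content follows later
store.reserve("myobject")          -- number 2, addressed by name
local n = store.flush("[ 1 2 3 ]") -- number 3, flushed right away

store.flush("myobject","<< /Name (test) >>")
store.flush(r,"<< /Address " .. n .. " 0 R >>")

for i=1,store.nofobjects do
    print(i .. " 0 obj " .. store.objects[i])
end
\stoptyping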

A delayed object is created with:

\starttyping
local ref = lpdf.delayedobject(data)
\stoptyping

The data will be flushed later using the object number that is returned (\type
{ref}). When you expect that many objects with the same content are used, you can
use:

\starttyping
local obj = lpdf.shareobject(data)
local ref = lpdf.shareobjectreference(data)
\stoptyping

The first one flushes the object and returns the object number, the second one
returns a reference instead. Already defined objects are reused. In addition to
this code driven optimization, some other optimization and reuse takes place but
all that happens without user intervention. Only use this when it's really needed
as it might consume more memory and needs more processing time.
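
The idea of sharing can be sketched as a cache that is keyed by the serialized
content. The following plain \LUA\ code is a minimal illustration and makes no
claim about the actual \type {lpdf} code; the names \type {flushobject}, \type
{shareobject}, \type {cache} and \type {flushed} are local to this sketch.

\starttyping
-- Hypothetical sketch of content based sharing.
local nofobjects = 0
local flushed    = { } -- object number -> serialized content
local cache      = { } -- serialized content -> object number

local function flushobject(content)
    nofobjects = nofobjects + 1
    flushed[nofobjects] = content
    return nofobjects
end

-- Flush only when this content has not been seen before.
local function shareobject(data)
    local content = tostring(data)
    local n = cache[content]
    if not n then
        n = flushobject(content)
        cache[content] = n
    end
    return n
end

local first  = shareobject("<< /Type /ExtGState /CA 0.5 >>")
local second = shareobject("<< /Type /ExtGState /CA 0.5 >>")
local third  = shareobject("<< /Type /ExtGState /CA 1 >>")

print(first,second,third,nofobjects)
\stoptyping

In this sketch every serialized string stays in the cache and each call has to
serialize its data before the lookup, which shows where extra memory and
processing time can go.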

\startsection [title={Resources}]

While \LUATEX\ itself will embed all resources related to regular typesetting,
\MKIV\ has to take care of embedding those related to special tricks, like
annotations, spot colors, layers, shades, transparencies, metadata, etc. Because
third party modules (like tikz) also can add resources we provide some macros
that make sure that no interference takes place:

\starttyping
\pdfbackendsetcatalog       {key}{string}
\pdfbackendsetinfo          {key}{string}
\pdfbackendsetname          {key}{string}

\pdfbackendsetpageattribute {key}{string}
\pdfbackendsetpagesattribute{key}{string}
\pdfbackendsetpageresource  {key}{string}

\pdfbackendsetextgstate     {key}{pdfdata}
\pdfbackendsetcolorspace    {key}{pdfdata}
\pdfbackendsetpattern       {key}{pdfdata}
\pdfbackendsetshade         {key}{pdfdata}
\stoptyping

One is free to use the \LUA\ interface instead, as there one has more
possibilities, but when code is shared with other macro packages the macro
interface makes more sense. The names of the \LUA\ functions are similar, like:

\starttyping
lpdf.addtoinfo(key,anything_valid_pdf)
\stoptyping

Currently we expose a bit more of the backend code than we like and
future versions will have more restricted access. The following
functions will stay public:

\starttyping
lpdf.addtopageresources   (key,value)
lpdf.addtopageattributes  (key,value)
lpdf.addtopagesattributes (key,value)

lpdf.adddocumentextgstate (key,value)
lpdf.adddocumentcolorspace(key,value)
lpdf.adddocumentpattern   (key,value)
lpdf.adddocumentshade     (key,value)

lpdf.addtocatalog         (key,value)
lpdf.addtoinfo            (key,value)
lpdf.addtonames           (key,value)
\stoptyping
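
Why a central interface helps against interference can be illustrated with a
small registry in plain \LUA: several parties add entries and in the end each
category is written as one dictionary. This is a hypothetical sketch. The names
\type {registry}, \type {addto} and \type {collect}, and the choice to refuse a
key that is already taken, are assumptions made for the example and do not
describe what the functions above do internally.

\starttyping
local registry = { } -- category -> { key -> value }

-- Add a key to a category; a key that is already taken is left untouched.
local function addto(category,key,value)
    local entries = registry[category]
    if not entries then
        entries = { }
        registry[category] = entries
    end
    if entries[key] ~= nil then
        return false, "duplicate key " .. key .. " in " .. category
    end
    entries[key] = value
    return true
end

-- Serialize one category as a single dictionary, keys in sorted order.
local function collect(category)
    local entries = registry[category] or { }
    local keys = { }
    for key in pairs(entries) do
        keys[#keys+1] = key
    end
    table.sort(keys)
    local result = { }
    for i=1,#keys do
        result[i] = "/" .. keys[i] .. " " .. entries[keys[i]]
    end
    return "<< " .. table.concat(result," ") .. " >>"
end

addto("extgstate","Tr1","<< /CA 0.5 /ca 0.5 >>") -- added by a style
addto("extgstate","Tr2","12 0 R")                -- added by a module

print(addto("extgstate","Tr1","<< /CA 1 >>"))    -- the clash is reported
print(collect("extgstate"))
\stoptyping

Because all additions pass through one place, a clash is noticed when the key is
added instead of ending up as two competing dictionaries in the file.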

\stopsection

\startsection [title={Annotations}]

You can use the \LUA\ functions that relate to annotations etc.\ but normally you
will use the regular \CONTEXT\ user interface. You can look into some of the
\type {lpdf-*} modules to see how special annotations can be dealt with.

\stopsection

\startsection [title={Tracing}]

There are several tracing options built in and some more will be added in due
time:

\starttyping
\enabletrackers
  [backend.finalizers,
   backend.resources,
   backend.objects,
   backend.detail]
\stoptyping

As with all trackers you can also pass them on the command line, for example:

\starttyping
context --trackers=backend.* yourfile
\stoptyping

The reference related backend mechanisms have their own trackers. When you write
code that generates \PDF, it also helps to look in the \PDF\ file to see if
things are done right. In that case you need to disable compression:

\starttyping
\nopdfcompression
\stoptyping

\stopsection

\startsection[title={Analyzing}]

The \type {epdf} library that comes with \LUATEX\ offers a userdata interface to
\PDF\ files. On top of that \CONTEXT\ provides a more \LUA-ish access, using
tables. You can open a \PDF\ file with:

\starttyping
local mypdf = lpdf.epdf.load(filename)
\stoptyping

When opening is successful, you have access to a couple of tables:

\starttabulate[|l|l|]
\NC \type{pages}         \NC indexed \NC \NR
\NC \type{destinations}  \NC hashed  \NC \NR
\NC \type{javascripts}   \NC hashed  \NC \NR
\NC \type{widgets}       \NC hashed  \NC \NR
\NC \type{embeddedfiles} \NC hashed  \NC \NR
\NC \type{layers}        \NC indexed \NC \NR
\stoptabulate

These provide efficient access to some data that otherwise would take a bit of
code to deal with. Another top level table is the \type {Catalog}, which is
characteristic for \PDF. Watch the capitalization: as with other native \PDF\
data structures, keys are case sensitive and match the standard.

Here is an example of usage:

\starttyping
local MyDocument = lpdf.epdf.load("somefile.pdf")

context.starttext()

  local pages    = MyDocument.pages
  local nofpages = pages.n

  context.starttabulate { "|c|c|c|" }

    context.NC() context("page")
    context.NC() context("width")
    context.NC() context("height") context.NR()

    for i=1, nofpages do
      local page = pages[i]
      local bbox = page.CropBox or page.MediaBox
      context.NC() context(i)
      context.NC() context(bbox[3]-bbox[1]) -- width:  urx - llx
      context.NC() context(bbox[4]-bbox[2]) -- height: ury - lly
      context.NR()
    end

  context.stoptabulate()

context.stoptext()
\stoptyping
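
The box arithmetic used in this example can be tried in plain \LUA\ without
loading a file. A \PDF\ box is an array with the lower left and upper right
corner, expressed in units of 1/72 inch. In the following sketch the helpers
\type {dimensions} and \type {tomillimeters} as well as the \type {page} table
with its A4 sized \type {MediaBox} are made up for the illustration.

\starttyping
-- A box is { llx, lly, urx, ury }, measured in units of 1/72 inch.
local function dimensions(bbox)
    local width  = bbox[3] - bbox[1]
    local height = bbox[4] - bbox[2]
    return width, height
end

-- Convert units of 1/72 inch to millimeters.
local function tomillimeters(points)
    return points * 25.4 / 72
end

-- A hypothetical page entry, shaped like the tables used above.
local page = { MediaBox = { 0, 0, 595.276, 841.89 } }
local bbox = page.CropBox or page.MediaBox

local width, height = dimensions(bbox)

print(string.format("%.0f x %.0f mm",
    tomillimeters(width),tomillimeters(height)))
\stoptyping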

\stopsection

\stopchapter

\stopcomponent