% hybrid-backend.tex (size: 14 Kb, last modification: 2023-12-21 09:43)
% language=us

\startcomponent hybrid-backends

\environment hybrid-environment

\startchapter[title={Backend code}]

\startsection [title={Introduction}]

In \CONTEXT\ we've always separated the backend code in so-called driver files.
This means that in the code related to typesetting only calls to the \API\ take
place, and no backend specific code is to be used. That way we can support
backends like dvipsone (and dviwindo), dvips, acrobat, pdftex and dvipdfmx with
one interface. A similar model is used in \MKIV\ although at the moment we only
have one backend: \PDF. \footnote {At this moment we only support the native
\PDF\ backend but future versions might support \XML\ (\HTML) output as well.}

Some \CONTEXT\ users like to add their own \PDF\ specific code to their styles or
modules. However, such extensions can interfere with existing code, especially
when resources are involved. Therefore this has to be done via the official
helper macros.

In the next sections an overview will be given of the current approach. There are
still quite a few rough edges but these will be polished as soon as the backend
code is more isolated in \LUATEX\ itself.

\stopsection

\startsection [title={Structure}]

A \PDF\ file is a tree of indirect objects. Each object has a number and the file
contains a table (or multiple tables) that relates these numbers to positions in
the file (or to positions in a compressed object stream). That way a file can be
viewed without reading all data: a viewer only loads what is needed.

\starttyping
1 0 obj <<
    /Name (test) /Address 2 0 R
>>
endobj
2 0 obj [
   (Main Street) (24) (postal code) (MyPlace)
]
endobj
\stoptyping

For the sake of the discussion we consider strings like \type {(test)} also to be
objects. In the next table we list what we can encounter in a \PDF\ file. Objects
can be indirect, in which case a reference is used (\type {2 0 R}), or direct.

\starttabulate[|l|l|p|]
\FL
\NC \bf type \NC \bf form \NC \bf meaning \NC \NR
\TL
\NC constant   \NC \type{/...} \NC A symbol (prescribed string). \NC \NR
\NC string     \NC \type{(...)} \NC A sequence of characters in pdfdoc encoding. \NC \NR
\NC unicode    \NC \type{<...>} \NC A sequence of characters in utf16 encoding. \NC \NR
\NC number     \NC \type{3.1415} \NC A number constant. \NC \NR
\NC boolean    \NC \type{true/false} \NC A boolean constant. \NC \NR
\NC reference  \NC \type{N 0 R} \NC A reference to an object. \NC \NR
\NC dictionary \NC \type{<< ... >>} \NC A collection of key value pairs where the
                   value itself is an (indirect) object. \NC \NR
\NC array      \NC \type{[ ... ]} \NC A list of objects or references to objects. \NC \NR
\NC stream     \NC \NC A sequence of bytes, with or without an accompanying dictionary
                   that contains descriptive data. \NC \NR
\NC xform      \NC \NC A special kind of object containing a reusable blob of data,
                   for example an image. \NC \NR
\LL
\stoptabulate

While writing additional backend code, we mostly create dictionaries.

\starttyping
<< /Name (test) /Address 2 0 R >>
\stoptyping

In this case the indirect object can look like:

\starttyping
[ (Main Street) (24) (postal code) (MyPlace) ]
\stoptyping

It all starts in the document's root object. From there we access the page tree
and resources. Each page carries its own resource information which makes random
access easier. A page has a page stream and there we find the content to be
rendered as a mixture of (\UNICODE) strings and special drawing and rendering
operators. Here we will not discuss them as they are mostly generated by the
engine itself or dedicated subsystems like the \METAPOST\ converter. There we use
literal or \type {\latelua} whatsits to inject code into the current stream.

In the \CONTEXT\ \MKII\ backend driver code you will see objects in their
verbose form. The content is passed on using special primitives, like \type
{\pdfobj}, \type{\pdfannot}, \type {\pdfcatalog}, etc. In \MKIV\ no such
primitives are used. In fact, some of them are overloaded to do nothing at all.
In the \LUA\ backend code you will find function calls like:

\starttyping
local d = lpdf.dictionary {
    Name    = lpdf.string("test"),
    Address = lpdf.array {
        "Main Street", "24", "postal code", "MyPlace",
    }
}
\stoptyping

Equally valid is:

\starttyping
local d = lpdf.dictionary()
d.Name = "test"
\stoptyping

Eventually the object will end up in the file using calls like:

\starttyping
local r = pdf.immediateobj(tostring(d))
\stoptyping

or using the wrapper (which permits tracing):

\starttyping
local r = lpdf.flushobject(d)
\stoptyping

The object content will be serialized according to the formal specification so
the proper \type {<< >>} etc.\ are added. If you want only the content (without
the delimiters) you can call the object as a function:

\starttyping
local dict = d()
\stoptyping
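
How one object can offer both the complete serialization and the bare content can
be sketched in plain \LUA\ with metatables. This is not the actual \type {lpdf}
code: the names \type {sketch_string} and \type {sketch_dictionary} are
hypothetical, keys are sorted only to make the output predictable, and string
escaping is left out.

\starttyping
-- Hypothetical stand-ins, not the real lpdf implementation.

local function content(t)
    local keys = { }
    for k in pairs(t) do
        keys[#keys+1] = k
    end
    table.sort(keys) -- deterministic order for this example
    local r = { }
    for i=1,#keys do
        local k = keys[i]
        r[#r+1] = "/" .. k .. " " .. tostring(t[k])
    end
    return table.concat(r," ")
end

local mt_string = {
    __tostring = function(t) return "(" .. t[1] .. ")" end,
}

local mt_dictionary = {
    __tostring = function(t) return "<< " .. content(t) .. " >>" end, -- complete object
    __call     = function(t) return content(t) end,                   -- content only
}

local function sketch_string(s)
    return setmetatable({ s }, mt_string)
end

local function sketch_dictionary(t)
    return setmetatable(t or { }, mt_dictionary)
end

local d = sketch_dictionary {
    Name = sketch_string("test"),
    Size = 24,
}

print(tostring(d)) -- << /Name (test) /Size 24 >>
print(d())         -- /Name (test) /Size 24
\stoptyping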

An example of using references is:

\starttyping
local a = lpdf.array {
    "Main Street", "24", "postal code", "MyPlace",
}
local d = lpdf.dictionary {
    Name    = lpdf.string("test"),
    -- flush the array first, then refer to its object number
    Address = lpdf.reference(lpdf.flushobject(a)),
}
local r = lpdf.flushobject(d)
\stoptyping

\stopsection

We have the following creators. Their arguments are optional.

\starttabulate[|l|p|]
\FL
\NC \bf function \NC \bf optional parameter \NC \NR
\TL
%NC \type{lpdf.stream}      \NC indexed table of operators \NC \NR
\NC \type{lpdf.dictionary}  \NC hash with key/values \NC \NR
\NC \type{lpdf.array}       \NC indexed table of objects \NC \NR
\NC \type{lpdf.unicode}     \NC string \NC \NR
\NC \type{lpdf.string}      \NC string \NC \NR
\NC \type{lpdf.number}      \NC number \NC \NR
\NC \type{lpdf.constant}    \NC string \NC \NR
\NC \type{lpdf.null}        \NC \NC \NR
\NC \type{lpdf.boolean}     \NC boolean \NC \NR
%NC \type{lpdf.true}        \NC \NC \NR
%NC \type{lpdf.false}       \NC \NC \NR
\NC \type{lpdf.reference}   \NC string \NC \NR
\NC \type{lpdf.verbose}     \NC indexed table of strings \NC \NR
\LL
\stoptabulate

Flushing objects is done with:

\starttyping
lpdf.flushobject(obj)
\stoptyping

Reserving an object is of course possible and done with:

\starttyping
local r = lpdf.reserveobject()
\stoptyping

Such an object is flushed with:

\starttyping
lpdf.flushobject(r,obj)
\stoptyping

We also support named objects:

\starttyping
lpdf.reserveobject("myobject")

lpdf.flushobject("myobject",obj)
\stoptyping
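
The reserve and flush calls follow a simple pattern: hand out a number first and
fill in the content later, which is what makes forward references possible. The
following plain \LUA\ sketch illustrates that pattern, including named objects.
The \type {registry} table and its functions are hypothetical; nothing is written
to a \PDF\ file and the real backend may be organized differently.

\starttyping
-- Hypothetical registry: illustrates reserve-then-flush, not the real backend.

local registry = { objects = { }, names = { }, last = 0 }

-- Reserve the next object number, optionally under a name.
function registry.reserve(name)
    registry.last = registry.last + 1
    local n = registry.last
    registry.objects[n] = false -- reserved but not yet flushed
    if name then
        registry.names[name] = n
    end
    return n
end

-- Accepts flush(content), flush(number,content) or flush(name,content).
function registry.flush(a,b)
    local n, content
    if b == nil then
        n, content = registry.reserve(), a
    elseif type(a) == "string" then
        n, content = registry.names[a], b
        if not n then
            error("unknown named object: " .. a)
        end
    else
        n, content = a, b
    end
    registry.objects[n] = tostring(content)
    return n
end

local page   = registry.reserve() -- number is known before the content exists
local parent = registry.flush("<< /Kids [ " .. page .. " 0 R ] >>")
registry.flush(page,"<< /Parent " .. parent .. " 0 R >>")

registry.reserve("myobject")
local named = registry.flush("myobject","(some text)")

print(page,parent,named) -- the object numbers 1, 2 and 3
\stoptyping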

\startsection [title={Resources}]

While \LUATEX\ itself will embed all resources related to regular typesetting,
\MKIV\ has to take care of embedding those related to special tricks, like
annotations, spot colors, layers, shades, transparencies, metadata, etc. If you
ever took a look in the \MKII\ \type {spec-*} files you might have gotten the
impression that it quickly becomes messy. The code there is actually rather old
and evolved in sync with the \PDF\ format as well as \PDFTEX\ and \DVIPDFMX\
maturing to their current state. As a result we have a dedicated object
referencing model that sometimes results in multiple passes due to forward
references. We could have gotten away from that with the latest versions of
\PDFTEX\ as it provides means to reserve object numbers but it does not make
much sense to do that now that \MKII\ is frozen.

Because third party modules (like tikz) can also add resources, we provide, as in
\MKII, an \API\ that makes sure that no interference takes place. Think of macros
like:

\starttyping
\pdfbackendsetcatalog       {key}{string}
\pdfbackendsetinfo          {key}{string}
\pdfbackendsetname          {key}{string}

\pdfbackendsetpageattribute {key}{string}
\pdfbackendsetpagesattribute{key}{string}
\pdfbackendsetpageresource  {key}{string}

\pdfbackendsetextgstate     {key}{pdfdata}
\pdfbackendsetcolorspace    {key}{pdfdata}
\pdfbackendsetpattern       {key}{pdfdata}
\pdfbackendsetshade         {key}{pdfdata}
\stoptyping

One is free to use the \LUA\ interface instead, as there one has more
possibilities. The names are similar, like:

\starttyping
lpdf.addtoinfo(key,anything_valid_pdf)
\stoptyping

At the time of this writing (\LUATEX\ 0.50) there are still places where \TEX\ and
\LUA\ code is interwoven in a non-optimal way, but that will change in the future
as the backend is completely separated and we can do more \TEX\ trickery at the
\LUA\ end.

Also, currently we expose more of the backend code than we would like and future
versions will have more restricted access. The following functions will stay
public:

\starttyping
lpdf.addtopageresources   (key,value)
lpdf.addtopageattributes  (key,value)
lpdf.addtopagesattributes (key,value)

lpdf.adddocumentextgstate (key,value)
lpdf.adddocumentcolorspace(key,value)
lpdf.adddocumentpattern   (key,value)
lpdf.adddocumentshade     (key,value)

lpdf.addtocatalog         (key,value)
lpdf.addtoinfo            (key,value)
lpdf.addtonames           (key,value)
\stoptyping
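
As an illustration, the following sketch combines two of these functions with
the creators listed earlier. It only runs inside \CONTEXT\ \MKIV, where the \type
{lpdf} table exists (for instance between \type {\startluacode} and \type
{\stopluacode}). The keys and values are merely examples, and it is an assumption
that they do not clash with entries that \CONTEXT\ sets itself.

\starttyping
-- Illustration only: needs ConTeXt MkIV, where the lpdf table is defined.
-- The keys and values are examples, not recommended settings.

lpdf.addtoinfo   ("MyModuleVersion", lpdf.string("1.0"))
lpdf.addtoinfo   ("MyModuleNote",    lpdf.unicode("some note"))
lpdf.addtocatalog("PageLayout",      lpdf.constant("TwoColumnRight"))
\stoptyping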

There are several tracing options built in and some more will be added in due
time:

\starttyping
\enabletrackers
  [backend.finalizers,
   backend.resources,
   backend.objects,
   backend.detail]
\stoptyping

As with all trackers you can also pass them on the command line, for example:

\starttyping
context --trackers=backend.* yourfile
\stoptyping

The reference related backend mechanisms have their own trackers.

\stopsection

\startsection [title={Transformations}]

There is at the time of this writing still some backend related code at the \TEX\
end that needs a cleanup. Most noticeable is the code that deals with
transformations (like scaling). At some point a primitive was introduced in
\PDFTEX\ but it did not completely cover the transform matrix so we never used
it. In \LUATEX\ we will come up with a better mechanism. Until then we stick to
the \MKII\ method.

\stopsection

\startsection [title={Annotations}]

The \LUA\ based backend of \MKIV\ does not involve much less code, but it is
definitely cleaner. There is still quite some code because in \CONTEXT\ we also
handle annotations and destinations in \LUA. In other words: \TEX\ is not
bothered by the backend any more. We could make that split without too much
impact as we never depended on \PDFTEX\ hyperlink related features and used
generic annotations instead. It's for that reason that \CONTEXT\ has always been
able to nest hyperlinks and have annotations with a chain of actions.

Another reason for doing it all at the \LUA\ end is that, as in \MKII, we have to
deal with the rather hybrid cross reference mechanism, which uses a sort of
language, and parsing this is also easier at the \LUA\ end. Think of:

\starttyping
\definereference[somesound][StartSound(attention)]

\at {just some page} [someplace,somesound,StartMovie(somemovie)]
\stoptyping

We parse the specification, expanding shortcuts when needed, create an action
chain, make sure that the movie related resources are taken care of (normally the
movie itself will be a figure), and turn the three words into hyperlinks. As this
all happens in \LUA\ we have less \TEX\ code. Contrary to what you might expect,
the \LUA\ code is not that much faster as the \MKII\ \TEX\ code is rather
optimized.
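
The parsing step can be illustrated with a small plain \LUA\ sketch. It is not
the \CONTEXT\ parser: the \type {shortcuts} table, the \type {parse} function and
the shape of the resulting action table are assumptions made for this example,
and arguments that themselves contain commas are not handled.

\starttyping
-- Hypothetical sketch: split a reference specification into an action chain.

local shortcuts = {
    somesound = "StartSound(attention)", -- as set by \definereference
}

local function parse(specification)
    local chain = { }
    for item in string.gmatch(specification,"[^,]+") do
        item = string.match(item,"^%s*(.-)%s*$") -- strip surrounding spaces
        item = shortcuts[item] or item           -- expand shortcuts
        local name, argument = string.match(item,"^(%a+)%((.-)%)$")
        if name then
            chain[#chain+1] = { kind = "action", name = name, argument = argument }
        else
            chain[#chain+1] = { kind = "destination", name = item }
        end
    end
    return chain
end

local chain = parse("someplace,somesound,StartMovie(somemovie)")

for i=1,#chain do
    local c = chain[i]
    print(i, c.kind, c.name, c.argument or "")
end
\stoptyping

The result is a destination (\type {someplace}) followed by two actions, which is
the kind of chain that a backend could then turn into one annotation.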

Special features like \JAVASCRIPT\ as well as widgets (and forms) are also
reimplemented. Support for \JAVASCRIPT\ is not that complex at all, but as in
\CONTEXT\ we can organize scripts in collections and have automatic inclusion of
used functions, some code is still needed. As we now do this in \LUA\ we use less
\TEX\ memory. Reimplementing widgets took a bit more work as I used the
opportunity to remove hacks for older viewers. As support for widgets is somewhat
unstable in viewers quite some testing was needed, especially because we keep
supporting cloned and copied fields (resulting in widget trees).

An interesting complication with widgets is that each instance can have a lot of
properties and as we want to be able to use thousands of them in one document,
each with different properties, we have efficient storage in \MKII\ and want to
do the same in \LUA. Most code at the \TEX\ end is related to passing all those
options.

You could use the \LUA\ functions that relate to annotations etc.\ but normally
you will use the regular \CONTEXT\ user interface. For practical reasons, the
backend code is grouped in several tables.

The \type{backends} table has a subtable for each backend and currently there is
only one: \type {pdf}. Each backend provides tables itself. In the
\type{codeinjections} namespace we collect functions that don't interfere with
the typesetting or typeset result, like inserting all kinds of resources (movies,
attachments, etc.), widget related functionality, and in fact everything that
does not fit into the other categories. In \type {nodeinjections} we organize
functions that inject literal \PDF\ code in the node list which then ends up in
the \PDF\ stream: color, layers, etc. The \type {registrations} table is reserved
for functions related to resources that result from node injections: spot colors,
transparencies, etc. Once the backend code is finished we might come up with
another organization. No matter what we end up with, the way the \type {backends}
table is supposed to be organized determines the \API\ and those who have seen
the \MKII\ backend code will recognize some of it.

\stopsection

\startsection [title={Metadata}]

We always had the opportunity to set the information fields in a \PDF\ file but
standardization forces us to add these large verbose metadata blobs. As this blob
is coded in \XML\ we use the built in \XML\ parser to fill a template. Thanks to
extensive testing and research by Peter Rolf we now have rather complete support
for \PDF/X related demands. This will definitely evolve with the advance of the
\PDF\ specification. You can replace the information with your own but we
suggest that you stay away from this metadata mess as far as possible.

\stopsection

\startsection [title={Helpers}]

If you look into the \type {lpdf-*.lua} files you will find more
functions. Some are public helpers, like:

\starttabulate
\NC \type {lpdf.toeight(str)}   \NC returns \type {(string)} \NC \NR
%NC \type {lpdf.cleaned(str)}   \NC returns \type {escaped string} \NC \NR
\NC \type {lpdf.tosixteen(str)} \NC returns \type {<utf16 sequence>} \NC \NR
\stoptabulate
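
What such helpers have to do can be sketched in plain \LUA\ (5.3 or later,
because of the \type {utf8} library and the bitwise operators). The functions
\type {toeight_sketch} and \type {tosixteen_sketch} are hypothetical; in
particular the escaping rules, the leading byte order mark and the uppercase
hexadecimal digits are assumptions and not a description of the actual \type
{lpdf} output.

\starttyping
-- Hypothetical re-implementations, for illustration only.

-- Wrap a string in parentheses and escape backslashes and parentheses.
local function toeight_sketch(str)
    return "(" .. string.gsub(str,"[\\()]","\\%0") .. ")"
end

-- Encode a UTF-8 string as a hexadecimal UTF-16BE string with byte order mark.
local function tosixteen_sketch(str)
    local r = { "<FEFF" }
    for _, c in utf8.codes(str) do
        if c < 0x10000 then
            r[#r+1] = string.format("%04X",c)
        else
            c = c - 0x10000 -- split into a surrogate pair
            r[#r+1] = string.format("%04X%04X",0xD800 + (c >> 10),0xDC00 + (c & 0x3FF))
        end
    end
    r[#r+1] = ">"
    return table.concat(r)
end

print(toeight_sketch("a (b)"))  -- (a \(b\))
print(tosixteen_sketch("test")) -- <FEFF0074006500730074>
\stoptyping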

An example of another public function is:

\starttyping
lpdf.sharedobj(content)
\stoptyping

This one flushes the object and returns the object number. Already defined
objects are reused. In addition to this code driven optimization, some other
optimization and reuse takes place but all that happens without user
intervention.
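
The reuse of already defined objects can be pictured as a cache that is keyed by
the serialized content. The following plain \LUA\ sketch shows the idea; \type
{shared_sketch} and the \type {flush} stub are hypothetical and the real \type
{lpdf.sharedobj} may well work differently.

\starttyping
-- Hypothetical sketch of content based object sharing.

local lastobject = 0
local flushed    = { } -- object number -> content
local cache      = { } -- content -> object number

-- Stub that stands in for writing an object to the file.
local function flush(content)
    lastobject = lastobject + 1
    flushed[lastobject] = content
    return lastobject
end

local function shared_sketch(content)
    local key = tostring(content)
    local n = cache[key]
    if not n then
        n = flush(key) -- only unseen content results in a new object
        cache[key] = n
    end
    return n
end

local a = shared_sketch("<< /CA 0.5 /ca 0.5 >>")
local b = shared_sketch("<< /Type /Whatever >>")
local c = shared_sketch("<< /CA 0.5 /ca 0.5 >>") -- same content, same object

print(a,b,c,lastobject) -- a and c are equal, only two objects exist
\stoptyping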

\stopsection

\stopchapter

\stopcomponent

390