% language=us
% hybrid-merge.tex (size: 8934 b, last modification: 2023-12-21 09:43)

\startcomponent hybrid-merge

\environment hybrid-environment

\startchapter[title={Including pages}]

\startsection [title={Introduction}]

It is tempting to add more and more features to the backend code
of the engine, but that is not really needed. Of course there are
features that are best supported natively, like including
images. In order to include \PDF\ images in \LUATEX\ the backend
uses a library (xpdf or poppler) that can load a page from a file
and embed that page into the final \PDF, including all relevant
(indirect) objects needed for rendering. In \LUATEX\ an
experimental interface to this library is included, tagged as
\type {epdf}. In this chapter I will spend a few words on my first
attempt to use this new library.

\stopsection

\startsection [title={The library}]

The interface is rather low level. I got the following example
from Hartmut (who is responsible for the \LUATEX\ backend code and
this library).

\starttyping
local doc = epdf.open("luatexref-t.pdf")
local cat = doc:getCatalog()
local pag = cat:getPage(3)
local box = pag:getMediaBox()

local w = pag:getMediaWidth()
local h = pag:getMediaHeight()
local n = cat:getNumPages()
local m = cat:readMetadata()

print("nofpages: ", n)
print("metadata: ", m)
print("pagesize: ", w .. " * " .. h)
print("mediabox: ", box.x1, box.x2, box.y1, box.y2)
\stoptyping

As you see, there are accessors for each interesting property
of the file. Of course such an interface needs to be extended
when the \PDF\ standard evolves. However, once we have access to
the so-called catalog, we can use regular accessors to the
dictionaries, arrays and other data structures. So, in fact, we
don't need a full interface and can draw the line somewhere.

There are a couple of things that you normally do not want to
deal with. A \PDF\ file is in fact just a collection of objects
that form a tree, and each object can be reached by an index using
a table (the cross-reference table) that links the index to a
position in the file. You don't want to be bothered with that kind
of housekeeping. Some data in the file, like page objects and
annotations, is organized in a tree form that one does not want to
access in that form, so again we have something that benefits from
an interface. But the majority of the objects are simple
dictionaries and arrays. Streams (these hold the document content,
image data, etc.) are normally not of much interest, but the
library provides an interface as you can bet on needing it
someday. The library also provides ways to extend the loaded \PDF\
file. I will not discuss that here.
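
To make this housekeeping concrete, here is a toy model in plain \LUA. It is not the \type {epdf} interface. The tables \type {xref} and \type {body} and the helpers \type {fetch} and \type {wrap} are made up for illustration. The cross-reference table maps object numbers to file offsets, and a proxy follows indirect references only when a field is actually accessed.

\starttyping
-- Toy model of PDF object housekeeping (hypothetical, not the epdf API).
-- 'xref' maps object numbers to byte offsets and 'body' stands in for
-- the objects a parser would find at those offsets.

local function ref(num)
  return { ref = num } -- an indirect reference such as "2 0 R"
end

local xref = { [1] = 15, [2] = 70, [3] = 120 }
local body = {
  [15]  = { Type = "Catalog", Pages = ref(2) },
  [70]  = { Type = "Pages", Kids = { ref(3) }, Count = 1 },
  [120] = { Type = "Page", MediaBox = { 0, 0, 595, 842 } },
}

-- Find an object by number: first its offset, then the object there.
local function fetch(num)
  local offset = xref[num]
  return offset and body[offset]
end

-- Wrap a dictionary or array so that indirect references are followed
-- only when a field is accessed; other values are returned unchanged.
local function wrap(obj)
  if type(obj) ~= "table" then
    return obj
  end
  return setmetatable({}, {
    __index = function(_, key)
      local value = obj[key]
      if type(value) == "table" and value.ref then
        value = fetch(value.ref)
      end
      return wrap(value)
    end,
  })
end

local catalog = wrap(fetch(1))
print(catalog.Pages.Kids[1].MediaBox[3]) --> 595
\stoptyping

A real parser reads the object at the given offset instead of looking it up in a table. For the user the effect is the same, because object numbers never show up.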

Because in \CONTEXT\ we already have the \type {lpdf} library for
creating \PDF\ structures, it makes sense to define a similar
interface for accessing \PDF. For that I wrote a wrapper that will
be extended in due time (read: depending on needs). The previous
code now looks as follows:

\starttyping
local doc = epdf.open("luatexref-t.pdf")
local cat = doc.Catalog
local pag = cat.Pages[3]
local box = pag.MediaBox

local llx, lly, urx, ury = box[1], box[2], box[3], box[4]

local w = urx - llx -- or: box.width
local h = ury - lly -- or: box.height
local n = cat.Pages.size
local m = cat.Metadata.stream

print("nofpages: ", n)
print("metadata: ", m)
print("pagesize: ", w .. " * " .. h)
print("mediabox: ", llx, lly, urx, ury)
\stoptyping

If we write code this way we are less dependent on the exact \API,
especially because the \type {epdf} library uses methods to access
the data and we cannot easily overload method names in there. When
you look at the \type {box}, you will see that the natural way to
access its entries is by numeric index. As a bonus we also provide
the \type {width} and \type {height} entries.
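
Such a wrapper can be built with \LUA\ metatables. The sketch below only illustrates the idea and is not the actual \CONTEXT\ wrapper. The mock objects imitate the method style of the first example (\type {getPage}, \type {getMediaBox}, \type {getNumPages}), and all wrapper names are hypothetical. Field access is translated into method calls on demand and the results are cached. The box keeps its numeric indexing and gets derived \type {width} and \type {height} entries.

\starttyping
-- Sketch of field access on top of method access (hypothetical names).
-- The raw objects mock the method style of the epdf example.

local rawpage = {
  getMediaBox = function(self) return { x1 = 0, y1 = 0, x2 = 595, y2 = 842 } end,
}
local rawcatalog = {
  getNumPages = function(self) return 3 end,
  getPage     = function(self, n) return rawpage end,
}

-- A box: numeric entries plus derived width and height.
local function wrapbox(r)
  local box = { r.x1, r.y1, r.x2, r.y2 }
  return setmetatable(box, {
    __index = function(t, key)
      if key == "width" then
        return t[3] - t[1]
      elseif key == "height" then
        return t[4] - t[2]
      end
    end,
  })
end

local function wrappage(p)
  return setmetatable({}, {
    __index = function(t, key)
      if key == "MediaBox" then
        local box = wrapbox(p:getMediaBox())
        rawset(t, key, box) -- cache, so the method is called only once
        return box
      end
    end,
  })
end

local function wrapcatalog(c)
  -- Pages behaves like an array of pages with an extra 'size' entry.
  local pages = setmetatable({}, {
    __index = function(t, key)
      if key == "size" then
        return c:getNumPages()
      elseif type(key) == "number" and key >= 1 and key <= c:getNumPages() then
        local page = wrappage(c:getPage(key))
        rawset(t, key, page) -- cache the wrapped page
        return page
      end
    end,
  })
  return { Pages = pages }
end

local cat = wrapcatalog(rawcatalog)
local box = cat.Pages[3].MediaBox
print(cat.Pages.size, box[3], box.width, box.height) -- prints 3, 595, 595 and 842 (tab separated)
\stoptyping

Because the mapping lives in the metatables, the calling code never sees the method names. Only the wrapper has to change when the underlying library changes.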

\stopsection

\startsection [title={Merging links}]

It has always been on my agenda to add the possibility to carry
the (link) annotations along with an included page from a
document. This is seldom needed in a regular document, but it can
be handy when you use \CONTEXT\ to assemble documents. In any case,
such a merge has to happen in such a way that it does not interfere
with other links in the parent document. Supporting this in the
engine is not an option, as each macro package follows its own
approach to referencing and interactivity. Also, demands might
differ and one would end up with a lot of (error-prone)
configurability. Of course we want scaled pages to behave well too.

Implementing the merge took about a day and most of that time was
spent on experimenting with the \type {epdf} library and making
the first version of the wrapper. I had definitely expected to
waste more time on it. So, this is yet another example of
extensions that are quite doable in the \LUA|-|\TEX\ mix. Of
course it helps that the \CONTEXT\ graphic inclusion code provides
enough information to integrate such a feature. The merge is
controlled by the \type {interaction} key, as shown here:

\starttyping
\externalfigure[somefile.pdf][page=1,scale=700,interaction=yes]
\externalfigure[somefile.pdf][page=2,scale=600,interaction=yes]
\externalfigure[somefile.pdf][page=3,scale=500,interaction=yes]
\stoptyping

You can fine-tune the merge by providing a list of options to the
interaction key, but that's still somewhat experimental. For a
start, the following links are supported:

\startitemize[packed]
\startitem internal references by name (often structure related) \stopitem
\startitem internal references by page (e.g.\ table of contents) \stopitem
\startitem external references by file (optionally by name and page) \stopitem
\startitem references to URIs (normally used for web pages) \stopitem
\stopitemize
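
Which of these kinds a link annotation belongs to follows from its destination or its action. The following sketch uses plain \LUA\ tables in place of \PDF\ dictionaries. The keys (\type {Subtype}, \type {Dest}, \type {A}, \type {S}, \type {D}, \type {F}, \type {URI}) follow the \PDF\ specification, while the function \type {classify} and its return values are hypothetical. To keep things simple, an explicit page destination carries a page number here instead of a reference to a page object.

\starttyping
-- Sketch: sort a link annotation into one of the supported kinds.
-- Tables stand in for PDF dictionaries; the labels are hypothetical.

local function classify(annot)
  if annot.Subtype ~= "Link" then
    return nil -- only links are handled
  end
  local action = annot.A
  local dest = annot.Dest or (action and action.S == "GoTo" and action.D)
  if dest then
    if type(dest) == "table" then
      return "page", dest[1]            -- explicit destination: { page, "XYZ", ... }
    else
      return "name", dest               -- named destination
    end
  elseif action and action.S == "GoToR" then
    return "file", action.F, action.D   -- other file, optional destination
  elseif action and action.S == "URI" then
    return "uri", action.URI
  end
  return nil -- for instance JavaScript actions: not supported
end
\stoptyping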

When users like this functionality (or when I really need it
myself), more types of annotations can be added, although support
for \JAVASCRIPT\ and widgets doesn't make much sense. On the other
hand, support for destinations is currently somewhat simplified,
but at some point we will support the relevant zoom options.

The implementation is not that complex:

\startitemize[packed]
\startitem check if the included page has annotations \stopitem
\startitem loop over the list of annotations and determine if
           an annotation is supported (currently links) \stopitem
\startitem analyze the annotation and overlay a button using the
           destination that belongs to the annotation \stopitem
\stopitemize
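
The three steps can be sketched as follows. As before, plain tables stand in for the page dictionary and its annotations, and the function \type {collectlinks} and its result fields are hypothetical. The \type {scale} argument is a factor, so \type {scale=700} in the \type {\externalfigure} examples corresponds to 0.7. Rectangles are normalized first, because the \PDF\ specification allows any two opposite corners.

\starttyping
-- Sketch of the three steps (hypothetical names, tables as dictionaries).

local function collectlinks(page, scale)
  local result = {}
  local annots = page.Annots
  if not annots then
    return result -- step 1: no annotations, nothing to do
  end
  local llx, lly = page.MediaBox[1], page.MediaBox[2]
  for _, annot in ipairs(annots) do -- step 2: loop and filter
    if annot.Subtype == "Link" then
      local r = annot.Rect -- two opposite corners, in any order
      local x1, x2 = math.min(r[1], r[3]), math.max(r[1], r[3])
      local y1, y2 = math.min(r[2], r[4]), math.max(r[2], r[4])
      -- step 3: position relative to the page origin, scaled like the image
      result[#result + 1] = {
        x      = (x1 - llx) * scale,
        y      = (y1 - lly) * scale,
        width  = (x2 - x1) * scale,
        height = (y2 - y1) * scale,
        annot  = annot,
      }
    end
  end
  return result
end

local page = {
  MediaBox = { 0, 0, 600, 800 },
  Annots = {
    { Subtype = "Link", Rect = { 200, 720, 100, 700 }, Dest = "intro" },
    { Subtype = "Text", Rect = { 0, 0, 10, 10 } },
  },
}
local links = collectlinks(page, 0.5) -- one link: x=50, y=350, 50 by 10
\stoptyping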

Now, the reason why we can keep the implementation so simple is that
we just map onto existing \CONTEXT\ functionality. And, as we have
rather well integrated support for interactive actions, only a few
basic commands are involved. Although we could do all of that in
\LUA, we delegate this to \TEX. We create a layer which we put on top
of the image. Links are put onto this layer using the equivalent of:

\starttyping
\setlayer
  [epdflinks]
  [x=...,y=...,preset=leftbottom]
  {\button
     [width=...,height=...,offset=overlay,frame=off]
     {}% no content
     [...]}
\stoptyping
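
Filling in the dots is mostly string formatting. The helper \type {makebutton} below is hypothetical. It assumes that the position and size have already been computed, in big points relative to the lower left corner of the image, and that the destination is one of the strings \type {\button} accepts. In \LUATEX\ such a string could then be handed over to \TEX, for instance with \type {tex.print}.

\starttyping
-- Sketch: build the \setlayer call for one link (hypothetical helper).
-- Dimensions are assumed to be in big points (bp).

local function makebutton(link, destination)
  return string.format(
    "\\setlayer[epdflinks][x=%.3fbp,y=%.3fbp,preset=leftbottom]"
      .. "{\\button[width=%.3fbp,height=%.3fbp,offset=overlay,frame=off]{}[%s]}",
    link.x, link.y, link.width, link.height, destination)
end

print(makebutton({ x = 50, y = 350, width = 50, height = 10 }, "somelocation"))
\stoptyping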

The \type {\button} command is one of those interaction-related
commands that accept any action-related directive. In this first
implementation we see the following destinations show up:

\starttyping
somelocation
url(http://www.pragma-ade.com)
file(somefile)
somefile::somelocation
somefile::page(10)
\stoptyping
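
These destination strings follow from the kind of link. The following sketch shows one possible mapping. The function \type {destination} and its arguments are hypothetical, but the string forms are the ones listed above. Internal page references are left out here because they are turned into named destinations.

\starttyping
-- Sketch: map a kind plus target onto a destination string (hypothetical).

local function destination(kind, target, location)
  if kind == "name" then
    return target                                    -- somelocation
  elseif kind == "uri" then
    return string.format("url(%s)", target)          -- url(http://...)
  elseif kind == "file" then
    if location == nil then
      return string.format("file(%s)", target)       -- file(somefile)
    elseif type(location) == "number" then
      return string.format("%s::page(%d)", target, location)
    else
      return string.format("%s::%s", target, location)
    end
  end
  return nil -- unsupported kind
end
\stoptyping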

References to pages become named destinations and are later
resolved to page destinations again, depending on the
configuration of the main document. The links within an included
file get their own namespace so (hopefully) they will not clash
with other links.
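
A namespace can be as simple as a prefix per included file. The scheme below (\type {epdf-1:}, \type {epdf-2:}, and so on) and the helper names are hypothetical and are not the prefixes \CONTEXT\ actually uses. The point is that the same name in two files ends up as two different references, and that a page reference becomes a name within its file's namespace.

\starttyping
-- Sketch: one namespace prefix per included file (hypothetical scheme).

local namespaces, count = {}, 0

local function namespace(filename)
  local ns = namespaces[filename]
  if not ns then
    count = count + 1
    ns = "epdf-" .. count .. ":"
    namespaces[filename] = ns
  end
  return ns
end

-- Named destinations keep their name; page references become names too.
local function internalname(filename, kind, target)
  if kind == "page" then
    target = string.format("page-%d", target)
  end
  return namespace(filename) .. target
end

print(internalname("a.pdf", "name", "intro")) --> epdf-1:intro
print(internalname("b.pdf", "name", "intro")) --> epdf-2:intro
print(internalname("a.pdf", "page", 10))      --> epdf-1:page-10
\stoptyping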

We could use lower-level code, which is faster, but we're not
talking about time-critical code here. At some point I might
optimize the code a bit, but for the moment this variant gives us
some tracing options for free. Now, the nice thing about using this
approach is that the already existing cross-referencing mechanisms
deal with the details. Each included page gets a unique reference,
so references to pages that are not included are ignored simply
because they cannot be resolved. We can even consider overloading
certain types of links or ignoring named destinations that match a
specific pattern. Nothing is hard coded in the engine, so we have
complete freedom to do that.
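
Both effects can be mimicked in a few lines. In this hypothetical sketch, \type {known} stands in for the references the main document can resolve, and \type {ignored} holds \LUA\ patterns for names to skip. A link survives only when it is not ignored and can be resolved.

\starttyping
-- Sketch: drop links that match an ignore pattern or cannot be resolved.
-- 'known' and 'ignored' are hypothetical stand-ins.

local function keeplink(name, known, ignored)
  for _, pattern in ipairs(ignored) do
    if name:find(pattern) then
      return false -- explicitly ignored by pattern
    end
  end
  return known[name] == true -- unresolvable references are dropped
end

local known   = { ["epdf-1:intro"] = true, ["epdf-1:page-3"] = true }
local ignored = { "^epdf%-%d+:__" } -- for instance, skip internal-looking names

print(keeplink("epdf-1:intro", known, ignored))   --> true
print(keeplink("epdf-1:page-99", known, ignored)) --> false
print(keeplink("epdf-1:__tmp", known, ignored))   --> false
\stoptyping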

\stopsection

\startsection [title={Merging layers}]

When including graphics from other applications, it might be that
they have their content organized in layers (that can then be
turned on or off). So it will come as no surprise that merging
layer information is on the agenda: first a straightforward
inclusion of optional content dictionaries, but it might make sense
to parse the content stream and replace references to layers by
those that are relevant in the main document. Especially when
graphics come from different sources and layer names are
inconsistent, some manipulation might be needed, so maybe we need
more detailed control. Implementing this is no big deal and mostly
a matter of figuring out a clean and simple user interface.
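
Replacing layer references in a content stream could start out like the following sketch. In \PDF, optional content is marked with \type {/OC /name BDC ... EMC}, where the name is a key in the page's \type {/Properties} resources that points to a layer dictionary. The function \type {renamelayers}, the alias table and the target names are hypothetical. A real implementation would also have to add the new names to the resources, and it would need a proper tokenizer instead of a pattern, because strings in the stream can contain the same characters.

\starttyping
-- Sketch: rewrite "/OC /name BDC" references (hypothetical helper).
-- 'properties' stands in for the page's /Properties resources.

local function renamelayers(stream, properties, aliases, mainlayers)
  local result = stream:gsub("/OC%s*/([^%s/]+)%s+BDC", function(resource)
    local ocg = properties[resource]
    if not ocg then
      return nil -- unknown resource: keep the original text
    end
    local name = aliases[ocg.Name] or ocg.Name -- normalize inconsistent names
    local target = mainlayers[name]
    if target then
      return "/OC /" .. target .. " BDC"
    end
    -- no matching layer in the main document: keep the original text
  end)
  return result
end

local properties = {
  MC0 = { Type = "OCG", Name = "Dimensions" },
  MC1 = { Type = "OCG", Name = "text (en)" },
}
local aliases    = { ["text (en)"] = "english" }
local mainlayers = { Dimensions = "L7", english = "L2" }

local stream = "/OC /MC0 BDC 0 0 m 10 10 l S EMC /OC /MC1 BDC BT (Hi) Tj ET EMC"
print(renamelayers(stream, properties, aliases, mainlayers))
--> /OC /L7 BDC 0 0 m 10 10 l S EMC /OC /L2 BDC BT (Hi) Tj ET EMC
\stoptyping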

\stopsection

\stopchapter

\stopcomponent
