luametatex-pdf.tex /size: 11 Kb last modification: 2025-02-21 11:03
% language=us runpath=texruns:manuals/luametatex

\environment luametatex-style

\startdocument[title=PDF]

\startsection[title={Introduction}]

There is no backend, not even a \DVI\ one. In \CONTEXT\ the main backend is a
\PDF\ backend and it is written in \LUA. The \PDF\ format makes it possible to
embed \JPEG\ and \PNG\ encoded images as well as \PDF\ images. All these have to
be dealt with in \LUA. Although we can parse \PDF\ files with \LUA, the engine
has a dedicated \PDF\ library on board written by Paweł Jackowski.

A \PDF\ file is basically a tree of objects and one descends into the tree via
dictionaries (key|/|value) and arrays (index|/|value). There are a few topmost
dictionaries that start at the document root and those are accessed more
directly.

Although everything in \PDF\ is basically an object we have to wrap a few in so
called userdata \LUA\ objects.

\starttabulate[|l|l|]
\FL
\BC PDF        \BC \LUA         \NC \NR
\TL
\NC null       \NC <t:nil>      \NC \NR
\NC boolean    \NC <t:boolean>  \NC \NR
\NC integer    \NC <t:integer>  \NC \NR
\NC float      \NC <t:number>   \NC \NR
\NC name       \NC <t:string>   \NC \NR
\NC string     \NC <t:string>   \NC \NR
\NC array      \NC <t:userdata> \NC \NR
\NC dictionary \NC <t:userdata> \NC \NR
\NC stream     \NC <t:userdata> \NC \NR
\NC reference  \NC <t:userdata> \NC \NR
\LL
\stoptabulate

The interface is limited to creating an instance and getting objects and values.
Aspects like compression and encryption are mostly dealt with automatically. In
\CONTEXT\ users use an interface layer around these, if they use this kind of
low level code at all, as it assumes familiarity with how \PDF\ is constructed.

\stopsection

\startsection[title={\LUA\ interfaces}]

\startsubsection[title={Opening and closing}]

There are two ways to open a \PDF\ file:

\starttyping[option=LUA]
function pdfe.open ( <t:string> filename )
    return <t:pdf> -- pdffile
end

function pdfe.openfile( <t:file> filehandle )
    return <t:pdf> -- pdffile
end
\stoptyping

Instead of reading from a file, we can read from a string:

\starttyping[option=LUA]
function pdfe.new ( <t:string> somestring, <t:integer> somelength )
    return <t:pdf> -- pdffile
end
\stoptyping

Closing the instance is done with:

\starttyping[option=LUA]
function pdfe.close ( <t:pdf> pdffile )
    -- no return values
end
\stoptyping

When you use \type {pdfe.open} the library manages the file and closes it when
done. You can check if a document opened as expected by calling:

\starttyping[option=LUA]
function pdfe.getstatus ( <t:pdf> pdffile )
    return <t:integer> -- status
end
\stoptyping

A table of possible return codes can be queried with:

\starttyping[option=LUA]
function pdfe.getstatusvalues ( )
    return <t:table> -- values
end
\stoptyping

Currently we have these:

\ctxlua{moduledata.pdfe.codes("getstatusvalues")}
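
As a minimal sketch of the calls above, and not a definitive recipe, the next
snippet opens a file, reports its status and closes it again. The filename \type
{foo.pdf} is a placeholder, and we assume that a failed \type {pdfe.open} gives
back a false value. The code has to run inside the engine, where the \type
{pdfe} library is available.

\starttyping[option=LUA]
local pdffile = pdfe.open("foo.pdf") -- placeholder filename

if pdffile then
    -- an integer that can be compared with the codes listed above
    local status = pdfe.getstatus(pdffile)
    print("status", status)
    -- show all known codes without assuming how the table is keyed
    for key, value in pairs(pdfe.getstatusvalues()) do
        print(key, value)
    end
    pdfe.close(pdffile)
else
    -- assumption: nothing useful is returned when opening fails
    print("the file could not be opened")
end
\stoptyping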

An encrypted document can be decrypted with the next command, where you can pass
\type {nil} instead of either password and hope for the best:

\starttyping[option=LUA]
function pdfe.unencrypt (
    <t:pdf>    pdffile,
    <t:string> userpassword,
    <t:string> ownerpassword
)
    return <t:integer> -- status
end
\stoptyping

\stopsubsection

\startsubsection[title={Getting basic information}]

A successfully opened document can provide some information:

\starttyping[option=LUA]
function pdfe.getsize( <t:pdf> pdffile )
    return <t:integer> -- nofbytes
end

function pdfe.getversion( <t:pdf> pdffile )
    return
        <t:integer>, -- major
        <t:integer>  -- minor
end

function pdfe.getnofobjects ( <t:pdf> pdffile )
    return <t:integer> -- nofobjects
end

function pdfe.getnofpages ( <t:pdf> pdffile )
    return <t:integer> -- nofpages
end

function pdfe.memoryusage ( <t:pdf> pdffile )
    return
        <t:integer>, -- bytes
        <t:integer>  -- waste
end
\stoptyping
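
The following sketch prints this information for one document. It only uses the
functions listed above; the filename is a placeholder and the check on the
result of \type {pdfe.open} assumes that a failed open returns a false value.

\starttyping[option=LUA]
local pdffile = pdfe.open("foo.pdf") -- placeholder filename

if pdffile then
    local major, minor = pdfe.getversion(pdffile)
    local bytes, waste = pdfe.memoryusage(pdffile)
    print("version", major, minor)
    print("size   ", pdfe.getsize(pdffile))
    print("objects", pdfe.getnofobjects(pdffile))
    print("pages  ", pdfe.getnofpages(pdffile))
    print("memory ", bytes, waste)
    pdfe.close(pdffile)
end
\stoptyping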

\stopsubsection

\startsubsection[title={The main structure}]

For accessing the document structure you start with the so called catalog, a
dictionary:

\starttyping[option=LUA]
function pdfe.getcatalog( <t:pdf> pdffile )
    return <t:userdata> -- dictionary
end
\stoptyping

The other two root dictionaries are accessed with:

\starttyping[option=LUA]
function pdfe.gettrailer ( <t:pdf> pdffile )
    return <t:userdata> -- dictionary
end

function pdfe.getinfo ( <t:pdf> pdffile )
    return <t:userdata> -- dictionary
end
\stoptyping
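
A small sketch of how the three root dictionaries can be fetched is given here.
It borrows the \type {#} operator and \type {pdfe.getstring} from the getters
described in this chapter. The \type {Title} key is a standard but optional
entry of the \PDF\ info dictionary, so the result can be \type {nil}; the
filename is a placeholder.

\starttyping[option=LUA]
local pdffile = pdfe.open("foo.pdf") -- placeholder filename

if pdffile then
    local catalog = pdfe.getcatalog(pdffile)
    local trailer = pdfe.gettrailer(pdffile)
    local info    = pdfe.getinfo(pdffile)
    -- the size operator reports the number of entries
    if catalog then print("catalog entries", #catalog) end
    if trailer then print("trailer entries", #trailer) end
    -- a document does not need to have an info dictionary or a title
    if info then print("title", pdfe.getstring(info, "Title")) end
    pdfe.close(pdffile)
end
\stoptyping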

\stopsubsection

\startsubsection[title={Getting content}]

A specific page can conveniently be reached with the next command, which returns
a dictionary.

\starttyping[option=LUA]
function pdfe.getpage ( <t:pdf> pdffile, <t:integer> pagenumber )
    return <t:userdata> -- dictionary
end
\stoptyping

Another convenience command gives you the (bounding) box of an object (normally
a page), which can be inherited from the document itself. An example of a valid
box name is \type {MediaBox}.

\starttyping[option=LUA]
function pdfe.getbox ( <t:pdf> pdffile, <t:string> boxname )
    return <t:table> -- boundingbox
end
\stoptyping
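
As an illustration, the next sketch visits all pages of a document. We assume
that page numbers start at one and that each page dictionary carries the
standard \type {Type} entry; \type {pdfe.getname} and the \type {#} operator are
described with the getters. The filename is a placeholder.

\starttyping[option=LUA]
local pdffile = pdfe.open("foo.pdf") -- placeholder filename

if pdffile then
    -- assumption: pages are numbered from 1 upto the number of pages
    for pagenumber = 1, pdfe.getnofpages(pdffile) do
        local page = pdfe.getpage(pdffile, pagenumber)
        if page then
            -- the type name and the number of entries in the page dictionary
            print(pagenumber, pdfe.getname(page, "Type"), #page)
        end
    end
    pdfe.close(pdffile)
end
\stoptyping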

\stopsubsection

\startsubsection[title={Getters}]

Common values in dictionaries and arrays are strings, integers, floats, booleans
and names (which are also strings) and these are also normal \LUA\ objects. In
some cases a value is a userdata object and you can use this helper to get some
more information:

\starttyping[option=LUA]
function pdfe.type ( <t:whatever> value )
    return <t:string> -- type
end
\stoptyping

Strings are special because internally they are delimited by parentheses (often
\typ {pdfdoc} encoding) or angle brackets (hexadecimal or 16 bit \UNICODE).

\starttyping[option=LUA]
function pdfe.getstring (
    <t:userdata> object,
    <t:string>   key | <t:integer> index
)
    return
        <t:string> -- decoded value
end
\stoptyping

When you ask for more you get more:

\starttyping[option=LUA]
function pdfe.getstring (
    <t:userdata> object,
    <t:string>   key | <t:integer> index,
    <t:boolean>  more
)
    return
        <t:string>,  -- original
        <t:boolean>  -- hexencoded
end
\stoptyping

Basic types are fetched with:

\starttyping[option=LUA]
function pdfe.getinteger ( <t:userdata>, <t:string> key | <t:integer> index )
    return <t:integer> -- value
end

function pdfe.getnumber ( <t:userdata>, <t:string> key | <t:integer> index )
    return <t:number> -- value
end

function pdfe.getboolean ( <t:userdata>, <t:string> key | <t:integer> index )
    return <t:boolean> -- value
end
\stoptyping

A name is (in the \PDF\ file) a string prefixed by a slash, as in
\typ [option=PDF] {<< /Type /Foo >>}. Names serve as keys in a dictionary, as
keywords in an array, or as constant values.

\starttyping[option=LUA]
function pdfe.getname ( <t:userdata>, <t:string> key | <t:integer> index )
    return <t:string> -- value
end
\stoptyping

Normally you will use an index in an array and a key in a dictionary, but
dictionaries also accept an index. The size of an array or dictionary is
available with the usual \type {#} operator.

\starttyping[option=LUA]
function pdfe.getdictionary ( <t:userdata>, <t:string> key | <t:integer> index )
    return <t:userdata> -- dictionary
end

function pdfe.getarray ( <t:userdata>, <t:string> key | <t:integer> index )
    return <t:userdata> -- array
end

function pdfe.getstream ( <t:userdata>, <t:string> key | <t:integer> index )
    return
        <t:userdata>, -- stream
        <t:userdata>  -- dictionary
end
\stoptyping

These commands return dictionaries, arrays and streams; a stream is a dictionary
with a blob of data attached.
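
The getters can be combined to descend into the tree. The following sketch walks
from the catalog to the page tree. The keys \type {Type}, \type {Pages}, \type
{Count} and \type {Kids} are standard \PDF\ entries; that the getters follow the
indirect reference stored under \type {Pages} is an assumption here, and the
filename is a placeholder.

\starttyping[option=LUA]
local pdffile = pdfe.open("foo.pdf") -- placeholder filename

if pdffile then
    local catalog = pdfe.getcatalog(pdffile)
    -- a name comes back as a string without the leading slash
    print("type", pdfe.getname(catalog, "Type"))
    -- assumption: the reference under Pages is followed for us
    local pages = pdfe.getdictionary(catalog, "Pages")
    if pages then
        print("count", pdfe.getinteger(pages, "Count"))
        local kids = pdfe.getarray(pages, "Kids")
        if kids then
            print("kids", #kids) -- size of the array
        end
    end
    pdfe.close(pdffile)
end
\stoptyping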

Before we come to an alternative access mode, we mention that the objects provide
access in a different way too. For instance, this is valid:

\starttyping[option=LUA]
print(pdfe.open("foo.pdf").Catalog.Type)
\stoptyping

At the topmost level there are \type {Catalog}, \type {Info}, \type {Trailer}
and \type {Pages}, so this is also okay:

\starttyping[option=LUA]
print(pdfe.open("foo.pdf").Pages[1])
\stoptyping

\stopsubsection

\startsubsection[title={Streams}]

Streams are sort of special. When your index or key hits a stream you get back a
stream object and dictionary object. The dictionary you can access in the usual
way and for the stream there are the following methods:

\starttyping[option=LUA]
function pdfe.openstream ( <t:userdata> stream, <t:boolean> decode)
    return <t:boolean> okay
end

function pdfe.closestream ( <t:userdata> stream )
    -- no return values
end

function pdfe.readfromstream ( <t:userdata> stream )
    return
        <t:string>  str,
        <t:integer> size
end

function pdfe.readwholestream ( <t:userdata> stream, <t:boolean> decode)
    return
        <t:string>  str,
        <t:integer> size
end
\stoptyping

You either read in chunks, or you ask for the whole. When reading in chunks, you
need to open and close the stream yourself. The \type {decode} parameter controls
if the stream data gets uncompressed.

As with dictionaries, you can access fields in a stream dictionary in the usual
\LUA\ way too. You get the content when you \quote {call} the stream. You can
pass a boolean that indicates if the stream has to be decompressed.
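
Both ways of reading are sketched below for the content stream of the first
page. Several things are assumptions: the \type {Contents} entry can also be an
array of streams, in which case nothing is read here, and we assume that \type
{pdfe.readfromstream} returns nothing (or a zero size) once the stream is
exhausted. The filename is a placeholder.

\starttyping[option=LUA]
local pdffile = pdfe.open("foo.pdf") -- placeholder filename

if pdffile then
    local page   = pdfe.getpage(pdffile, 1)
    local stream = page and pdfe.getstream(page, "Contents")
    if stream then
        -- all at once, uncompressed
        local data, size = pdfe.readwholestream(stream, true)
        print("whole", size)
        -- in chunks: we open and close the stream ourselves
        if pdfe.openstream(stream, true) then
            local total = 0
            while true do
                local chunk, n = pdfe.readfromstream(stream)
                -- assumption: no chunk or a zero size signals the end
                if not chunk or not n or n == 0 then
                    break
                end
                total = total + n
            end
            pdfe.closestream(stream)
            print("chunks", total)
        end
    end
    pdfe.close(pdffile)
end
\stoptyping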

\stopsubsection

\startsubsection[title={Low level getters}]

In addition to the getters described before, there is also a slightly lower
level interface available.

\starttyping[option=LUA]
function pdfe.getfromdictionary ( <t:userdata>, <t:integer> index )
    return
        <t:string>   key,
        <t:string>   type,
        <t:whatever> value,
        <t:whatever> detail
end

function pdfe.getfromarray ( <t:userdata>, <t:integer> index )
    return
        <t:integer>  type,
        <t:whatever> value,
        <t:integer>  detail
end
\stoptyping

The \type {type} is one of the following:

\startfourrows
\ctxlua{moduledata.pdfe.codes("getfieldtypes")}
\stopfourrows

This list was acquired with:

\starttyping[option=LUA]
function pdfe.getfieldtypes ( )
    return <t:table> -- types
end
\stoptyping

Here \type {detail} is a bitset with possible bits:

\startfourrows
\ctxlua{moduledata.pdfe.codes("getencodingvalues")}
\stopfourrows

This time we used:

\starttyping[option=LUA]
function pdfe.getencodingvalues ( )
    return <t:table> -- values
end
\stoptyping
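
A possible use of the low level interface is to list all entries of a
dictionary by index, as in the following sketch. We assume that indices start at
one and that the table returned by \type {pdfe.getfieldtypes} can be indexed by
the reported type; when that lookup fails the raw type is printed instead. The
filename is a placeholder.

\starttyping[option=LUA]
local pdffile = pdfe.open("foo.pdf") -- placeholder filename

if pdffile then
    local types   = pdfe.getfieldtypes()
    local catalog = pdfe.getcatalog(pdffile)
    -- assumption: entries are numbered from 1 upto the size
    for index = 1, #catalog do
        local key, kind, value, detail =
            pdfe.getfromdictionary(catalog, index)
        -- fall back to the raw type when it is not a key of the table
        print(index, key, types[kind] or kind, value, detail)
    end
    pdfe.close(pdffile)
end
\stoptyping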

\stopsubsection

\startsubsection[title={Getting tables}]

All entries in a dictionary or array can be fetched with the following commands,
which return a hashed or an indexed table respectively.

\starttyping[option=LUA]
function pdfe.dictionarytotable ( <t:userdata> )
    return <t:table> -- hash
end

function pdfe.arraytotable ( <t:userdata> )
    return <t:table> -- array
end
\stoptyping

You can get a list of pages with:

\starttyping[option=LUA]
function pdfe.pagestotable(<t:pdf> pdffile)
    return {
        {
            <t:userdata>, -- dictionary
            <t:integer>,  -- size
            <t:integer>,  -- objectnumber
        },
        ...
    }
end
\stoptyping
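
The next sketch shows how these tables could be traversed. It only prints the
keys of the info dictionary, because the shape of the values is not described
here, and it assumes that the page list is a regular indexed table. The filename
is a placeholder.

\starttyping[option=LUA]
local pdffile = pdfe.open("foo.pdf") -- placeholder filename

if pdffile then
    local info = pdfe.getinfo(pdffile)
    if info then
        -- only the keys are shown, the values are left alone
        for key in pairs(pdfe.dictionarytotable(info)) do
            print("info key", key)
        end
    end
    for index, entry in ipairs(pdfe.pagestotable(pdffile)) do
        -- entry holds: dictionary, size, objectnumber
        print(index, entry[2], entry[3])
    end
    pdfe.close(pdffile)
end
\stoptyping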

\stopsubsection

\startsubsection[title={References}]

In order to access a \PDF\ file efficiently there is lazy evaluation of
references so when you run into a reference as value or array entry you have to
resolve it explicitly. An unresolved reference object can be resolved with:

\starttyping[option=LUA]
function pdfe.getfromreference( <t:integer> reference ) -- NEEDS CHECKING
    return
        <t:integer>,  -- type
        <t:whatever>, -- value
        <t:whatever>  -- detail
end
\stoptyping

So, as second value you can get back a new \type {pdfe} userdata object that you
can query.

\stopsubsection

\stopsection

\stopdocument