\usemodule[abbreviations-logos]

\setupbodyfont
  [bonum,10pt]

\setuplayout
  [topspace=.05ph,
   bottomspace=.05ph,
   backspace=.05ph,
   header=.05ph,
   footer=0pt,
   width=middle,
   height=middle]

\setupwhitespace
  [big]

\setuphead
  [chapter]
  [style=\bfc,
   headerstate=high,
   interaction=all]

\setuphead
  [section]
  [style=\bfb]

\setuphead
  [subsection]
  [style=\bfa]

\setuphead
  [subsubsection]
  [style=\bf,
   after=]

\setuplist
  [interaction=all]

\setuptyping
  [color=darkyellow]

\enabletrackers[graphics.fonts]
\enabletrackers[graphics.fixes]

\startdocument
  [title=PDFmerge,
   author=Hans Hagen]

\startMPpage
    fill Page withcolor "darkyellow" ;

    picture p[] ;

    p1 := image ( draw textext.ulft("PDF")
        ysized 4cm
        shifted lrcorner Page
        withcolor "white"
    ; );
    p2 := image ( draw textext.ulft("\strut merging, embedding, fixing")
        xsized bbwidth p[1]
        shifted lrcorner Page
        withcolor "lightgray"
    ; );
    p3 := image ( draw textext.ulft("\strut and messing a bit around")
        xsized bbwidth p[1]
        shifted lrcorner Page
        withcolor "lightgray"
    ; );
    p4 := image ( draw textext.ulft("\strut in context lmtx")
        xsized bbwidth p[1]
        shifted lrcorner Page
        withcolor "lightgray"
    ; );
    draw p[4] shifted (-1cm,1cm) ;
    draw p[3] shifted (-1cm,1cm+bbheight(p4)+0cm) ;
    draw p[2] shifted (-1cm,1cm+bbheight(p3)+bbheight(p4)+0cm) ;
    p[1] := p[1] shifted (-1cm,1cm+bbheight(p2)+bbheight(p3)+bbheight(p4)+15mm);
    save dy ; dy := 0 ;
    for i=0 upto 7 :
        p[1] := p[1] yscaled (1-i/80) ;
        draw p[1]
            shifted (0,dy)
            withtransparency (1,.8-i/10) ;
        dy := dy + .6bbheight(p[1]) ;
    endfor ;
\stopMPpage

\startluacode

table.save(
    "compactors-pdfmerge.lua",
    {
        compactors = {
            ["mine"] = {
                identify = {
                    content   = true,
                    resources = true,
                    page      = true,
                },
                merge = {
                    type0    = true,
                    truetype = true,
                    type1    = true,
                    LMTX     = true,
                },
                strip = {
                    marked = true,
                },
                recolor = {
                    viagray = { 1, 0, 0 },
                },
            }
        }
    }
)
\stopluacode

\starttext

\startsubject[title={Introduction}]

The three graphic formats that make most sense for inclusion in \PDF\ are \PNG,
\JPG, and \PDF. The easiest of these is \JPG\ because basically the binary blob
gets transferred to the result file. A \PNG\ graphic might need more work because
what actually is supported is basic \PNG\ inclusion. It means that often the
image data has to be unpacked and split into \PDF\ counterparts that get
embedded. The \PDF\ format is quite convenient because basically we only need to
copy the used objects to the result, so when those objects are for instance \PNG\
encoded images, we gain runtime, but when we're talking pages of documents it
might take some more. Nevertheless, in practice it is still quite efficient.

This manual describes how to manipulate \PDF\ files that are not behaving well
or from which pages are to be embedded in another \PDF\ file within constraints.
We discuss how to clean up and/or embed fonts, fix colors, get rid of interfering
resources and fix the page stream.

Many thanks to Massimiliano Farinella for teaming up to make this all work better
and conducting extensive tests on complex documents.

\startlines
Hans Hagen
Hasselt NL
January 2024\high{}
\stoplines

\stopsubject

\startsubject[title={Embedding \PDF\ files}]

Here we focus on \PDF\ inclusion where we have several scenarios to deal with:

\startitemize[packed]
    \startitem
        A straightforward inclusion of a single page \PDF\ file.
    \stopitem
    \startitem
        Inclusion of a specific page from a \PDF\ file.
    \stopitem
    \startitem
        Inclusion of several pages from a \PDF\ file.
    \stopitem
    \startitem
        Inclusion of one or more pages from several \PDF\ files.
    \stopitem
\stopitemize

To this we can add:

\startitemize
    \startitem
        Inclusion of one or more pages from \PDF\ files that are generated
        independently (subruns) for instance in the process of writing a manual
        about something \CONTEXT. Think of externally processed buffers.
    \stopitem
\stopitemize

The most natural way to include pages is to use the \typ {\externalfigure}
command but later we will see that there are more ways to manipulate \PDF\ files.

\starttyping
\externalfigure[myfile.pdf][page=4]
\stoptyping

If you have problems with the inclusion that originate in the compact features
discussed here you can disable them by passing an empty \type {compact} value:

\starttyping
\externalfigure[myfile.pdf][page=4,compact=]
\stoptyping

but also make sure to tell us what goes wrong so that we can fix it. We can't
predict what \PDF\ is fed into the machinery.
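
The scenario of several pages from one file needs nothing more than repeated
calls with a different \type {page} value. Here is a minimal sketch, where \type
{myfile.pdf} is a placeholder name and the three column layout is just one of
many possibilities:

\starttyping
\startcombination[3*1]
    {\externalfigure[myfile.pdf][page=1,width=4cm]} {page 1}
    {\externalfigure[myfile.pdf][page=2,width=4cm]} {page 2}
    {\externalfigure[myfile.pdf][page=3,width=4cm]} {page 3}
\stopcombination
\stoptyping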

\stopsubject

\startsubject[title={Embedding multiple \PDF\ files and sharing common content}]

When we include more than one page from a file, we only need to embed shared
objects once. Of course it demands some object management but that has to be done
anyway. We could share objects across files but that demands more memory and
runtime and the savings are likely to be small, with one exception: fonts. It
would be nice if we could embed missing fonts and also merge fonts that are the
same. This can make the result much smaller, especially when we're talking of
including examples of typesetting in a manual that uses the same fonts.

Another aspect of inclusion is the quality of the page that is to be embedded.
Here you can think of errors in the page stream, color spaces that don't match,
missing properties, invalid metadata, etc. Often there's not much we can do about
it, but sometimes we can. However, it has to happen under user control and the
outcome has to be checked, although often a visual check is good enough.

\stopsubject

\startsubject[title={The \type {compact} parameter and font merging}]

The \type {compact} parameter of \type {\externalfigure} controls the embedding
of \PDF\ content. When set to \quote {yes} it will merge fonts but only when the
file is produced by \CONTEXT\ \LMTX. The reason for not checking all fonts by
default comes from the fact that references from the page stream to glyphs in the
font depend on the application that made the \PDF. In some cases the mapping is
using the original glyph index, but one can never be sure. Using the \type
{tounicode} map to go from page stream index to glyph index is also not reliable
because multiple glyphs can have the same \UNICODE\ slot and when font features
are applied (say small caps) you actually don't know that.

The mentioned \type {yes} option is a preset that has been defined like:

\starttyping
\startluacode
graphics.registerpdfcompactor ( "yes", {
    merge = {
        lmtx = true,
    },
} )
\stopluacode
\stoptyping

Another preset is \type {merge}:

\starttyping
\startluacode
graphics.registerpdfcompactor ( "merge", {
    merge = {
        type0    = true,
        truetype = true,
        type1    = true,
        lmtx     = true,
    },
} )
\stopluacode
\stoptyping

Currently we don't support \TYPETHREE\ optimization. It is doable but probably
not worth the effort.

We can also force embedding of fonts that are not included in the document that
we get the page from. Missing fonts are unlikely unless you have old documents.

\starttyping
embed = {
    type0    = true,
    truetype = true,
    type1    = true,
}
\stoptyping
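
These tables can be combined in one compactor. The following sketch registers a
preset that embeds missing fonts and merges what can be merged. The name \type
{preset:embed-demo} and the file name are made up for this example, so adapt
them to your situation and check the result:

\starttyping
\startluacode
graphics.registerpdfcompactor ( "preset:embed-demo", {
    embed = {
        type0    = true,
        truetype = true,
        type1    = true,
    },
    merge = {
        type0    = true,
        truetype = true,
        type1    = true,
        lmtx     = true,
    },
} )
\stopluacode

\externalfigure[myfile.pdf][page=4,compact=preset:embed-demo]
\stoptyping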

References to glyphs in the page stream use either an eight bit string encoding
or hexadecimal byte pairs. Depending on the font type we have up to 256
references (using one character or two hex digits) or at most 65536 references
(using two characters or four hex digits). We normalize everything to hex
encoding. That way we get rid of the ugly escapes and exceptions in page stream
glyph strings.
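
To get an idea of what such a normalization amounts to, here is a small sketch
in plain \LUA. It is only an illustration and not the code that \CONTEXT\ uses:
the function name is made up and the input is assumed to be the already
unescaped bytes of a string.

\starttyping
-- Illustration only: turn the bytes of a string into a PDF hex string.
-- The literal string (A\)B) and the hex string <412942> denote the same
-- three bytes, but the hex form needs no escapes.

local function tohexstring(bytes)
    local hex = bytes:gsub(".", function(c)
        return string.format("%02X", c:byte())
    end)
    return "<" .. hex .. ">"
end

print(tohexstring("A)B")) -- <412942>
\stoptyping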

There are two trackers:

\starttyping
\enabletrackers[graphics.fonts]
\enabletrackers[graphics.fixes]
\stoptyping

The first one reports what is done with fonts. When embedding or merging is not
possible you can try to remap the found font onto one on your system. Here are
some examples:

\starttyping
graphics.registerpdffont {
    source = "arial",
    target = "file:arial.ttf",
}
graphics.registerpdffont {
    source = "arialbold",
    target = "file:arialbd.ttf",
}
graphics.registerpdffont {
    source = "arial,bold",
    target = "file:arialbd.ttf",
}
graphics.registerpdffont {
    source  = "helvetica",
    target  = "file:arial.ttf",
    unicode = true,
}
graphics.registerpdffont {
    source  = "helveticabold",
    target  = "file:arialbd.ttf",
    unicode = true,
}
graphics.registerpdffont {
    source = "courier",
    target = "file:cour.ttf",
}
graphics.registerpdffont {
    source  = "mspgothic",
    unicode = true, -- via unicode (false for composite)
}
\stoptyping

The \type {unicode} key is needed when you get rubbish due to the indices in the
page stream being different from glyph indices in the used font. In that case we
go via the \type {tounicode} vector which works ok for the average simple
document not using special font features. There is some trial and error involved
but that is probably worth the effort when you have to manipulate many documents.

\stopsubject

\startsubject[title={Manipulating properties other than fonts}]

There are two activities when we compact: fonts and content. When content is
handled, additional parsing of the page stream has to happen. What gets processed
is determined by the \type {identify} table:

\starttyping
identify = {
    content   = true,
    resources = true, -- needs checking
    page      = true, -- needs checking
}
\stoptyping

although this is equivalent:

\starttyping
identify = "all"
\stoptyping

As a proof of concept we can recolor an included file. Of course this assumes
a rather simple use of color. Here is an example:

\startbuffer[demo1]
\startluacode
    graphics.registerpdfcompactor ( "preset:demo-1", {
        identify = {
            content   = true,
            resources = true,
            page      = true,
        },
        merge = {
            type0    = true,
            truetype = true,
            type1    = true,
            lmtx     = true,
        },
        recolor = {
            viagray = { 1, 0, 0 },
        }
    } )
\stopluacode
\setupexternalfigures[compact=preset:demo-1]
\startTEXpage
    \startcombination[3*4]
        {\externalfigure[test000.pdf][frame=on]}        {\LUAMETATEX\ 0}
        {\externalfigure[test001.pdf][frame=on]}        {\LUATEX\ 1}
        {\externalfigure[test002.pdf][frame=on]}        {\LUATEX\ 2}
        {\externalfigure[test003.pdf][frame=on,page=1]} {\LUATEX\ 3.1}
        {\externalfigure[test003.pdf][frame=on,page=2]} {\LUATEX\ 3.2}
        {\externalfigure[test003.pdf][frame=on,page=3]} {\LUATEX\ 3.3}
        {\externalfigure[test004.pdf][frame=on,page=1]} {\PDFTEX\ 4.1}
        {\externalfigure[test004.pdf][frame=on,page=2]} {\PDFTEX\ 4.2}
        {\externalfigure[test004.pdf][frame=on,page=3]} {\PDFTEX\ 4.3}
        {\externalfigure[test005.pdf][frame=on,page=1]} {\PDFTEX\ 5.1}
        {\externalfigure[test005.pdf][frame=on,page=2]} {\PDFTEX\ 5.2}
        {\externalfigure[test005.pdf][frame=on,page=3]} {\PDFTEX\ 5.3}
    \stopcombination
\stopTEXpage
\stopbuffer

\typebuffer[demo1]

In \in {figure} [fig:compact1] we make a single page document that embeds 12
pages from six files made by several engines. The six files have a total of about
114K but the single page combination is only 19K. The test files are:

\typefile{test000}

So this one is an \LMTX\ produced file. The next two files:

\typefile{test001}

and

\typefile{test002}

are done by \LUATEX\ with \MKIV\ and

\typefile{test003}

as well as

\typefile{test004}

and

\typefile{test005}

are typeset with \PDFTEX\ and \MKII\ so they have the \TYPEONE\ instead of the
\OPENTYPE\ Latin Modern file embedded (in fact, the \MKII\ and \MKIV\ files use
the twelve point variant and \LMTX\ the upscaled ten point), so if those were the
same we would have an even smaller final file.

\startplacefigure[title={An example of content manipulation.},reference=fig:compact1]
    \typesetbuffer[demo1][compact=yes]
\stopplacefigure

A useful manipulation is removing tags. The fact that the content is tagged
doesn't mean that tagging has any use, certainly not if it relates to editing
specific for some application. Maybe at some point I'll add a retagging option
but for now we just strip:

\starttyping
strip = {
    marked    = true,
    group     = true,
    extgstate = true,
}
\stoptyping

The \type {group} and \type {extgstate} options are sort of special and might be
needed too, especially when for instance the states are just there because the
producer wasn't clever enough to leave them out when not applicable.
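
As with the other tables, \type {strip} goes into a compactor definition. Here
is a minimal sketch; the preset name \type {preset:strip-demo} and the file name
\type {tagged.pdf} are made up for this example. The \type {identify} entry is
included on the assumption that marked content has to be found in the page
stream, so check if you really need it.

\starttyping
\startluacode
graphics.registerpdfcompactor ( "preset:strip-demo", {
    identify = "all",
    strip    = {
        marked = true,
    },
} )
\stopluacode

\externalfigure[tagged.pdf][compact=preset:strip-demo]
\stoptyping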

It happens that producers use color while actually gray scales are meant. In that
case one can use these:

\starttyping
reduce = {
    color = true, -- both rgb and cmyk
    rgb   = true,
    cmyk  = true,
}
\stoptyping

\type {reduce} converts to gray scale all the \RGB\ colors that have the same
values for \type {r}, \type {g} and \type {b} (when \typ {rgb = true} or \typ
{color = true}).

The same goes for every \CMYK\ color where \type {c}, \type {m} and \type {y} are
the same and when \typ {cmyk = true} or \typ {color = true}. In this case the
common component is added to the \type {k} component. For example, \typ
{.2 .2 .2 .5 K} becomes \typ {.2 + .5 = .7 G}, while \typ {.5 .5 .5 .7 K} becomes
\typ {1 G}, because the sum is limited to 1.

Using a gray scale is more efficient and in the case of \CMYK\ a sloppy \typ {.5
.5 .5 0 K} quite likely is meant to be \typ {0 0 0 0.5 K} or just \typ {.5 G}.
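
The arithmetic of the \CMYK\ rule can be spelled out in a few lines of plain
\LUA. This sketch only mirrors the description given here; the function name is
made up and it is not the code that does the actual work.

\starttyping
-- Illustration only: when c, m and y are equal, add the common component
-- to k and limit the sum to 1. Other colors are left alone (nil).

local function reducecmyk(c, m, y, k)
    if c == m and m == y then
        return math.min(c + k, 1)
    end
end

print(reducecmyk(.2, .2, .2, .5)) -- 0.7
print(reducecmyk(.5, .5, .5, .7)) -- 1
print(reducecmyk(.1, .2, .3, .4)) -- nil
\stoptyping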

Remapping \RGB\ to \CMYK\ (or gray if applicable) is done with:

\starttyping
convert = {
    rgb  = true,
    cmyk = true,
}
\stoptyping

and of course one can also remap \CMYK\ to \RGB.

I want to stress that manipulating the content stream has some limitations. For
instance, because objects are shared, including a page a second time will reuse
the already converted page. However, you can try the next trick:

\startbuffer[demo2]
\startluacode
    graphics.registerpdfcompactor ( "preset:demo-2", {
        identify = "all",
        merge    = { lmtx = true },
        recolor  = { viagray = { 0, 1, 0 } },
    } )
    graphics.registerpdfcompactor ( "preset:demo-3", {
        identify = "all",
        merge    = { lmtx = true },
        recolor  = { viagray = { 0, 0, 1 } },
    } )
\stopluacode
\setupexternalfigures[compact=preset:demo-1]
\startTEXpage
    \startcombination[2*1]
        {\externalfigure
            [test000.pdf]
            [frame=on,compact=preset:demo-2,width=6cm,object=no,arguments=1]}
        {demo2}
        {\externalfigure
            [test000.pdf]
            [frame=on,compact=preset:demo-3,width=6cm,object=no,arguments=2]}
        {demo3}
    \stopcombination
\stopTEXpage
\stopbuffer

\typebuffer[demo2]

In \in {figure} [fig:compact2] we see that indeed a different compactor is used.
We need to disable sharing by setting \type {object} to \type {no}. However, that
alone still shares some objects, so we abuse the \type {arguments} key to create
a different sharing hash (normally that key is used to pass arguments to
converters).

\startplacefigure[title={An example of manipulating content twice.},reference=fig:compact2]
    \typesetbuffer[demo2][compact=yes]
\stopplacefigure

In cases where color conversion is problematic (or critical) you can remap
specific colors. Especially \CMYK\ is sensitive to conversion because there we
have four color components while in \RGB\ we have only three. Also watching on a
display (\RGB) is different from looking at a print (\CMYK) and who knows what
transfer function gets applied in the former. Here is how remapping works:

\starttyping
local cmykmap = {
    { 100, 100, 55, 0, 57, 0, 22, 40.8 }
}
graphics.registerpdfcompactor ( "preset:demo5", {
    identify = "all",
    merge    = { lmtx = true },
    convert  = { cmyk = cmykmap },
} )
\stoptyping

Here the entries in a \CMYK\ map are:

\starttyping
    { factor, c, m, y, k, r, g, b }
\stoptyping

Here the factor is 100, so values are multiplied by 100, which makes sure that we
catch rounding errors in the \PDF\ definitions. Keep in mind that colors in many
applications have at most 256 values per component. Also, even quality \LCD\
displays can use fewer than eight bits per component.
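
To see why a factor helps, consider the following sketch in plain \LUA. It is
an assumption about how such a comparison could be done and is not taken from
the \CONTEXT\ sources; the names \type {scaled} and \type {matches} are made up.
Scaling and rounding first means that slightly different values in the page
stream still hit the same map entry.

\starttyping
-- Illustration only: match a cmyk color against the first five fields
-- of a map entry after scaling by the factor and rounding.

local entry = { 100, 100, 55, 0, 57, 0, 22, 40.8 } -- factor, c, m, y, k, r, g, b

local function scaled(value, factor)
    return math.floor(value * factor + 0.5) -- round to the nearest integer
end

local function matches(entry, c, m, y, k)
    local factor = entry[1]
    return scaled(c, factor) == entry[2]
       and scaled(m, factor) == entry[3]
       and scaled(y, factor) == entry[4]
       and scaled(k, factor) == entry[5]
end

print(matches(entry, 1, .55,   0, .57  )) -- true
print(matches(entry, 1, .5501, 0, .5699)) -- true
print(matches(entry, 1, .50,   0, .57  )) -- false
\stoptyping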

\startbuffer[demo3]
\startluacode
    local cmykmap = {
        { 100, 100, 55, 0, 57, 0, 22, 40.8 }
    }
    graphics.registerpdfcompactor ( "preset:demo-4", {
        identify = "all",
        merge    = { lmtx = true },
        convert  = { cmyk = true },
    } )
    graphics.registerpdfcompactor ( "preset:demo-5", {
        identify = "all",
        merge    = { lmtx = true },
        convert  = { cmyk = cmykmap },
    } )
\stopluacode
\setupexternalfigures[compact=preset:demo-1]
\startTEXpage
    \startcombination[2*1]
        {\externalfigure
            [test006.pdf]
            [frame=on,compact=preset:demo-4,width=6cm,object=no,arguments=3]}
        {demo4}
        {\externalfigure
            [test006.pdf]
            [frame=on,compact=preset:demo-5,width=6cm,object=no,arguments=4]}
        {demo5}
    \stopcombination
\stopTEXpage
\stopbuffer

In \in {figure} [fig:compact3] we show an example. The file used looks like:

\typefile{test006.tex}

\startplacefigure[title={An example of remapping \CMYK\ colors.},reference=fig:compact3]
    \typesetbuffer[demo3][compact=yes]
\stopplacefigure

In case we want to map single \RGB\ values to \CMYK, we define an analogous map:

\starttyping
local rgbmap = {
    { 100, 0, 22, 40.8, 100, 55, 0, 57 } -- factor, r, g, b, c, m, y, k
}
graphics.registerpdfcompactor ( "preset:demo8", {
    identify = "all",
    merge    = { lmtx = true },
    convert  = { rgb = rgbmap },
} )
\stoptyping

Here the entries in an \RGB\ map are:

\starttyping
    { factor, r, g, b, c, m, y, k }
\stoptyping

\startbuffer[demo6]
\startluacode
    local rgbmap = {
        { 100, 0, 22, 40.8, 100, 55, 0, 57 }
    }
    graphics.registerpdfcompactor ( "preset:demo-6", {
        identify = "all",
        merge    = { lmtx = true },
        convert  = { rgb = true },
    } )
    graphics.registerpdfcompactor ( "preset:demo-7", {
        identify = "all",
        merge    = { lmtx = true },
        convert  = { rgb = rgbmap },
    } )
\stopluacode

\setupexternalfigures[compact=preset:demo-1]

\startTEXpage
    \startcombination[nx=2,ny=1]
        {\externalfigure
            [test006.pdf]
            [frame=on,compact=preset:demo-6,width=6cm,object=no,arguments=5]}
        {demo6}
        {\externalfigure
            [test006.pdf]
            [frame=on,compact=preset:demo-7,width=6cm,object=no,arguments=6]}
        {demo7}
    \stopcombination
\stopTEXpage
\stopbuffer

In \in {figure} [fig:compact6] we show an example.

\startplacefigure[title={An example of remapping \RGB\ colors.},reference=fig:compact6]
    \typesetbuffer[demo6][compact=yes]
\stopplacefigure

\stopsubject

\startsubject[title={The \type {fixpdf} script}]

I want to stress that you need to check the outcome. Often a visual check is
enough. Extending the compactor beyond what \MKIV\ provided was to a large extent
facilitated by a cooperation with Tan, Syabil M. and Ser, Zheng Y. of \quote
{Team Ramkumar} who did extensive testing and gave enjoyable feedback. In the
process a test script was made that can help with experiments. We assume that
\type {qpdf}, \type {mutool}, \type {graphicmagic} and \type {verapdf} are
installed. Massimiliano Farinella applied these mechanisms to large complex
files from InDesign and InkScape that needed fixing and in the process the code
got extended and improved. \footnote {Feel free to send us files that give
problems so that we can look into it.}

\starttyping
mtxrun --script fixpdf --uncompress foo
mtxrun --script fixpdf --convert --compactor=preset:test foo
mtxrun --script fixpdf --validate foo
mtxrun --script fixpdf --check foo
mtxrun --script fixpdf --compare --resolution=300 foo
\stoptyping

Here we produce an uncompressed version (so that we can see what we deal with),
convert the original into a new one, validate (and check the outcome) and create
a version for visual comparison. It's just an example of usage and here the focus
was on fixing existing documents (six digit numbers so the workflow needs to be
carefully checked) and not so much on single page inclusion.

However, this script and setup are somewhat complex so we also provide an
alternative in the \quote {extras} namespace:

\starttyping
context --extra=fixpdf --compactor=mine:test --extrastyle=foo somefile.pdf
\stoptyping

Additional options are \type {--notracing} and \type {--nocompression}. A
compactor can be defined in a file with the name \typ {compactors-mine.lua} that
looks like the one below. Check out \typ {compactors-preset.lua} for examples.

\starttyping
local fonts = {
    { source = "arial",                   target = "file:arial.ttf" },
    { source = "arialbold",               target = "file:arialbd.ttf" },
    { source = "arial,bold",              target = "file:arialbd.ttf" },
    { source = "helvetica",               target = "file:arial.ttf" },
    { source = "helveticabold",           target = "file:arialbd.ttf" },
    { source = "courier",                 target = "file:cour.ttf" },
    { source = "wingdings",               target = "wingding" },
    { source = "timesroman",              target = "file:times.ttf" },
    { source = "timesnewromanpsmt",       target = "file:times.ttf" },
    { source = "timesnewromanpsitalicmt", target = "file:timesi.ttf" },
    { source = "timesnewromanpsitalicmt", target = "file:timesi.ttf" },
    { source = "timesnewroman,italic",    target = "file:timesi.ttf" },
    { source = "timesitalic",             target = "file:timesi.ttf" },
    { source = "timesitalic",             target = "file:timesi.ttf" },
    { source = "timesnewromanpsboldmt",   target = "file:timesbd.ttf" },
    { source = "timesnewromanpsboldmt",   target = "file:timesbd.ttf" },
    { source = "timesnewroman,bold",      target = "file:timesbd.ttf" },
    { source = "timesbold",               target = "file:timesbd.ttf" },
    { source = "timesbold",               target = "file:timesbd.ttf" },
}

return {
    name       = "compactorspreset",
    version    = "1.00",
    comment    = "Definitions that complement pdf embedding.",
    author     = "Hans Hagen",
    copyright  = "ConTeXt development team",
    compactors = {
        ["test"] = {
            fonts = fonts,
            embed = {
                type0    = true,
                truetype = true,
                type1    = true,
            },
            merge = {
                type0    = true,
                truetype = true,
                type1    = true,
                LMTX     = true,
            },
            strip = {
                marked = true,
            },
            cleanup = {
                pieceinfo = true,
                procset   = true,
                cidset    = true,
            },
        }
    },
}
\stoptyping

A file like this is easier than registering in a \LUA\ snippet. It's also more
future proof. The somewhat weird font list is normally built up as we test and is
often rather specific to a particular set of files.

\stopsubject

\stopdocument