% language=us runpath=texruns:manuals/ontarget
% source: ontarget-pdf-2.tex, size: 5472 b, last modification: 2025-02-21 11:03

\startcomponent ontarget-pdf-2

\environment ontarget-style

\logo [TIKZ] {TikZ}
\logo [SMIL] {SMIL}

\startchapter[title={PDF 2.0}]

% \startsection[title=Introduction]

The \PDF\ file format has evolved over decades and support in \TEX\ macro
packages has evolved with it. The \PDFTEX\ engine defaults to version 1.4 but
\CONTEXT\ \MKII\ bumps that to 1.5 by default. The \LUATEX\ engine initializes to
1.0 but in \CONTEXT\ \MKIV\ we use 1.7 by default, although one can choose some
standard that uses a different value. A quick check shows that \XETEX, which uses
\DVIPDFMX, goes for 1.5 by default. In \LUAMETATEX\ we default to nothing because
it has no backend built in, but in \MKXL\ we also use 1.7 as default. So, where does
the latest and greatest \PDF\ 2.0 fit in?

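As an illustration of choosing a standard that implies a different version, the
following is a minimal sketch of a user document. It assumes the \type {format}
key of \type {\setupbackend} and the \type {pdf/a-1b:2005} identifier as they are
commonly used in \MKIV. That standard is based on \PDF\ 1.4, so selecting it is
expected to lower the version. Which identifiers are accepted, including any that
are based on 2.0, depends on the installed version, so this is not a definitive
recipe.

\starttyping
% a sketch under the assumptions stated above: select an archiving
% standard and let the backend pick the version that the standard asks for
\setupbackend
  [format={pdf/a-1b:2005}]

\starttext
    Some text.
\stoptext
\stoptyping
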
It's good to notice that the difference between version 1.7 and 2.0 is not that
large, especially when we look at it from the perspective of a \TEX\ engine. When
we talk about text we only have to provide page streams with text rendering
operators and embed the fonts that make this possible. We can support color by
pushing the right operators and, if needed, graphic state objects into the file.
Because \TEX\ only has rules, we're fine with simple drawing operations. We can
use other drawing operators when we convert from \METAPOST\ or use packages like
\TIKZ. Of course adding hyperlinks and the like is possible and here we can try
to play safe, but we can also introduce issues because we rely on the viewer. If
we insert graphics we have to make sure that the right objects are injected, and
here we depend on the quality of the producer of those graphics. In most respects
an upgrade to 2.0 is no big deal if we can already provide 1.7 output.

Because \PDF\ is used for printing, interchange and archiving, some standards have
evolved that put restrictions on what can go in the \PDF\ file. If we decide to
go for 2.0 output, we can forget about these standards because \PDF\ 2.0 supports
the lot. It is somewhat more restricted, and features that are deprecated are
sometimes in fact obsolete and not supposed to be used. This is puzzling
because there is no real need to drop something that is already supported. The
main problem here is that validators can be picky. There are also amendments made
to 2.0, some of which smell like accommodating applications that have issues with
them. It's not like we have tons of old viewers in large scale use that
are not up to 2.0 quality \PDF. Maybe we should just forget about the pre 2.0
standard profiles.

Tagging has been part of \PDF\ for a while but in the beginning it was mostly
related to applications marking content in a way useful for those applications.
When embedding a page from a file that information is rather useless. Later
accessibility became an issue, so that kind of tagging was added and then
adapted a bit in \PDF\ 2.0, but in 2024 it is still pretty much unclear how
to deal with it, if only because in 15 years no real development has taken
place in viewers when it comes to using these features. The best we can do
is to make sure that validators are happy with what \CONTEXT\ produces.

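For reference, here is a minimal sketch of how a user might enable tagging. It
assumes the \type {\setuptagging} command with \type {state=start} as found in
\MKIV. What ends up in the structure tree, and whether a given validator accepts
it, depends on the version used, so take this as an illustration only.

\starttyping
% assumption: tagging is not enabled by default and has to be asked for
\setuptagging
  [state=start]

\starttext
    \startchapter[title={A tagged chapter}]
        Structure commands like this one provide the tags.
    \stopchapter
\stoptext
\stoptyping
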
It must be noted that, when we talk about accessibility, embedding audio
and video went from being easy to being complicated, with an intermediate flirt
with Flash (and \SMIL, which in itself is okay). In a similar way interactive
features are a bit unstable, and for instance simple (trivial) viewer control
would have made life much easier. In order to really be accessible one should
just ship dedicated versions of documents, their source or, if one considers that
more usable, some kind of \HTML\ output (of course that's often a short term
perspective).

So, what is needed to support 2.0 in \CONTEXT? The output that we render is
basically the same. We can leave out some details like so called cidsets and
procsets and we have to make sure that some demands are met, like mandatory
resources and using registered tags for private entries in dictionaries. All that
is easy compared to inclusion of third party content: here we often need to do
some cleanup in order to pass the tests. For that we need to not only check some
specific dictionaries but also parse the page (and form) content streams and if
needed clean them up. This comes at a (runtime) price but it's bearable. More tricky
is dealing with missing (or bad) fonts but again it can be done, maybe with a
dedicated configuration to control the process. See the \type {pdfmerge} manual
for more information about this.

When it comes to tagging the best we can do is play safe and rely on future
content analysis helped by generic tags. After all that is what these language
models promise us. It doesn't hurt to be somewhat sceptical with regard to
standards and the future. Just look at how typesetting, printing and publishing
evolved. Technology is not that stable in the long term and with big tech in charge
commerce drives most of it. It's already a miracle if some application or
technology survives a few years. So, we really need to restrict ourselves to what
makes sense.

All that said: we're ready to deal with 2.0 and can always adapt if needed. For
that purpose we also embed additional \CONTEXT\ specific information so that in
the future we can, if needed, upgrade a \PDF\ file, although with a \TEX\ tool
chain one can always regenerate a document. The main question is: when do we
default to it?

% \stopsection

\stopchapter

\stopcomponent