% language=us runpath=texruns:manuals/ontarget

\startcomponent ontarget-pdf-2

\environment ontarget-style

\logo [TIKZ] {TikZ}
\logo [SMIL] {SMIL}

\startchapter[title={PDF 2.0}]

% \startsection[title=Introduction]

The \PDF\ file format has evolved over decades and support in \TEX\ macro
packages has evolved with it. The \PDFTEX\ engine defaults to version 1.4 but
\CONTEXT\ \MKII\ bumps that to 1.5 by default. The \LUATEX\ engine initializes to
1.0 but in \CONTEXT\ \MKIV\ we use 1.7 by default, although one can choose some
standard that uses a different value. A quick check shows that \XETEX, which uses
\DVIPDFMX, goes for 1.5 by default. In \LUAMETATEX\ we default to nothing because
it has no backend built in, but in \MKXL\ we also use 1.7 as default. So, where does
the latest and greatest \PDF\ 2.0 fit in?
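
As a small illustration of how choosing a standard can change that value: the
sketch below assumes the \type {format} key of \type {\setupbackend} and a
\PDF/A-1 identifier as known from \MKIV. Because \PDF/A-1 is based on \PDF\ 1.4
one would expect the version to drop accordingly. The exact identifiers and the
resulting version depend on the \CONTEXT\ release, so treat this as an assumption
to check and not as a definitive recipe.

\starttyping
% assumption: identifier as known from MkIV; check your installation
\setupbackend
  [format={pdf/a-1b:2005}]

\starttext
    Some text.
\stoptext
\stoptyping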

It's good to note that the difference between version 1.7 and 2.0 is not that
large, especially when we look at a \TEX\ engine. When we talk about text we only
have to provide page streams with text rendering operators and embed fonts that
make this possible. We can support color by pushing the right operators and, if
needed, graphic state objects in the file. Because \TEX\ only has rules, we're fine
with simple drawing operations. We can use other drawing operators when we
convert from \METAPOST\ or use packages like \TIKZ. Of course adding hyperlinks
and the like is possible and here we can try to play safe, but we can also introduce
issues because we rely on the viewer. If we insert graphics we have to make sure
that the right objects are injected, and here we are dependent on the quality of
the producer of those graphics. In most aspects an upgrade to 2.0 is no big deal
if we already can provide 1.7 output.

Because \PDF\ is used for printing, interchange and archiving, some standards have
evolved that put restrictions on what can go in the \PDF\ file. If we decide to
go for 2.0 output, we can forget about these standards because \PDF\ 2.0 supports
the lot. It is somewhat more restricted, and features that are deprecated
are sometimes in fact obsolete and not supposed to be used. This is puzzling
because there is no real need to drop something that is already supported. The
main problem here is that validators can be picky. There are also amendments made
to 2.0, some of which smell like accommodating applications that have issues with
them. It's not like we have tons of old viewers that are used on a large scale and
that are not up to 2.0 quality \PDF. Maybe we should just forget about the pre-2.0
standard profiles.

Tagging has been part of \PDF\ for a while but in the beginning it was mostly
related to applications marking content in a way useful for those applications.
When embedding a page from a file that information is rather useless. Later
accessibility became an issue and tagging for that purpose was added, and then
adapted a bit in \PDF\ 2.0, but in 2024 it is still pretty much unclear how
to deal with it, if only because in 15 years no real development has taken
place in viewers when it comes to using these features. The best we can do
is to make sure that validators are happy with what \CONTEXT\ produces.

It must be noted that, when we talk accessibility, embedding audio and video
went from being easy to being complicated, with an intermediate flirt with
Flash (\SMIL\ in itself being okay). In a similar way interactive features are a
bit unstable, and for instance simple (trivial) viewer control would have made
life much easier. In order to really be accessible one should just ship dedicated
versions of documents, their source or, if one considers that more usable, some kind
of \HTML\ output (of course that's often a short term perspective).

So, what is needed to support 2.0 in \CONTEXT ? The output that we render is
basically the same. We can leave out some details like so-called cidsets and
procsets and we have to make sure that some demands are met, like mandatory
resources and using registered tags for private entries in dictionaries. All that
is easy compared to inclusion of third party content: here we often need to do
some cleanup in order to pass the tests. For that we need to not only check some
specific dictionaries but also parse the page (and form) content streams and if
needed clean them up. This comes at a (runtime) price but it's bearable. More tricky
is dealing with missing (or bad) fonts but again it can be done, maybe with a
dedicated configuration to control the process. See the \type {pdfmerge} manual
for more information about this.

When it comes to tagging the best we can do is play safe and rely on future
content analysis helped by generic tags. After all, that is what these language
models promise us. It doesn't hurt to be somewhat sceptical with regards to
standards and the future. Just look at how typesetting, printing and publishing
evolved. Technology is not that long term stable and with big tech in charge
commerce drives most of it. It's already a miracle if some application or
technology survives a few years. So, we really need to restrict ourselves to what
makes sense.

All that said: we're ready to deal with 2.0 and can always adapt if needed. For
that purpose we also embed additional \CONTEXT\ specific information so that in
the future we can, if needed, upgrade a \PDF\ file, although with a \TEX\ tool
chain one can always regenerate a document. The main question is: when do we
default to it?

% \stopsection

\stopchapter

\stopcomponent