% language=us

\startcomponent hybrid-backends

\environment hybrid-environment

\startchapter[title={Backend code}]

\startsection [title={Introduction}]

In \CONTEXT\ we've always separated the backend code in so-called driver files.
This means that in the code related to typesetting only calls to the \API\ take
place, and no backend specific code is to be used. That way we can support
backends like dvipsone (and dviwindo), dvips, acrobat, pdftex and dvipdfmx with
one interface. A similar model is used in \MKIV\ although at the moment we only
have one backend: \PDF. \footnote {At this moment we only support the native
\PDF\ backend but future versions might support \XML\ (\HTML) output as well.}

Some \CONTEXT\ users like to add their own \PDF\ specific code to their styles or
modules. However, such extensions can interfere with existing code, especially
when resources are involved. Such additions therefore have to be made via the
official helper macros.

In the next sections an overview will be given of the current approach. There are
still quite a few rough edges but these will be polished as soon as the backend
code is more isolated in \LUATEX\ itself.

\stopsection

\startsection [title={Structure}]

A \PDF\ file is a tree of indirect objects. Each object has a number and the file
contains a table (or multiple tables) that relates these numbers to positions in
the file (or to positions in a compressed object stream). That way a file can be
viewed without reading all data: a viewer only loads what is needed.

\starttyping
1 0 obj <<
    /Name (test) /Address 2 0 R
>>
2 0 obj [
    (Main Street) (24) (postal code) (MyPlace)
]
\stoptyping

For the sake of the discussion we consider strings like \type {(test)} also to be
objects. In the next table we list what we can encounter in a \PDF\ file. There
can be indirect objects, in which case a reference is used (\type{2 0 R}), and
direct ones.
\starttabulate[|l|l|p|]
\FL
\NC \bf type   \NC \bf form          \NC \bf meaning \NC \NR
\TL
\NC constant   \NC \type{/...}       \NC A symbol (prescribed string). \NC \NR
\NC string     \NC \type{(...)}      \NC A sequence of characters in pdfdoc encoding. \NC \NR
\NC unicode    \NC \type{<...>}      \NC A sequence of characters in utf16 encoding. \NC \NR
\NC number     \NC \type{3.1415}     \NC A number constant. \NC \NR
\NC boolean    \NC \type{true/false} \NC A boolean constant. \NC \NR
\NC reference  \NC \type{N 0 R}      \NC A reference to an object. \NC \NR
\NC dictionary \NC \type{<< ... >>}  \NC A collection of key value pairs where the value itself is an (indirect) object. \NC \NR
\NC array      \NC \type{[ ... ]}    \NC A list of objects or references to objects. \NC \NR
\NC stream     \NC                   \NC A sequence of bytes, optionally packaged with a dictionary that contains descriptive data. \NC \NR
\NC xform      \NC                   \NC A special kind of object containing a reusable blob of data, for example an image. \NC \NR
\LL
\stoptabulate

While writing additional backend code, we mostly create dictionaries.

\starttyping
<< /Name (test) /Address 2 0 R >>
\stoptyping

In this case the indirect object can look like:

\starttyping
[ (Main Street) (24) (postal code) (MyPlace) ]
\stoptyping

It all starts in the document's root object. From there we access the page tree
and resources. Each page carries its own resource information which makes random
access easier. A page has a page stream and there we find the content to be
rendered as a mixture of (\UNICODE) strings and special drawing and rendering
operators. Here we will not discuss them as they are mostly generated by the
engine itself or dedicated subsystems like the \METAPOST\ converter. There we use
literal or \type {\latelua} whatsits to inject code into the current stream.

In the \CONTEXT\ \MKII\ backend driver code you will see objects in their verbose
form. The content is passed on using special primitives, like \type {\pdfobj},
\type{\pdfannot}, \type {\pdfcatalog}, etc.
In \MKIV\ no such primitives are used. In fact, some of them are overloaded to do
nothing at all. In the \LUA\ backend code you will find function calls like:

\starttyping
local d = lpdf.dictionary {
    Name    = lpdf.string("test"),
    Address = lpdf.array {
        "Main Street", "24", "postal code", "MyPlace",
    }
}
\stoptyping

Equally valid is:

\starttyping
local d = lpdf.dictionary()
d.Name = "test"
\stoptyping

Eventually the object will end up in the file using calls like:

\starttyping
local r = pdf.immediateobj(tostring(d))
\stoptyping

or using the wrapper (which permits tracing):

\starttyping
local r = lpdf.flushobject(d)
\stoptyping

The object content will be serialized according to the formal specification so
the proper \type {<< >>} etc.\ are added. If you want the content instead you can
use a function call:

\starttyping
local dict = d()
\stoptyping

An example of using references is:

\starttyping
local a = lpdf.array {
    "Main Street", "24", "postal code", "MyPlace",
}
local d = lpdf.dictionary {
    Name    = lpdf.string("test"),
    -- a reference needs an object number, so the array is flushed first
    Address = lpdf.reference(lpdf.flushobject(a)),
}
local r = lpdf.flushobject(d)
\stoptyping

\stopsection

We have the following creators. Their arguments are optional.
\starttabulate[|l|p|]
\FL
\NC \bf function            \NC \bf optional parameter \NC \NR
\TL
%NC \type{lpdf.stream}      \NC indexed table of operators \NC \NR
\NC \type{lpdf.dictionary}  \NC hash with key/values \NC \NR
\NC \type{lpdf.array}       \NC indexed table of objects \NC \NR
\NC \type{lpdf.unicode}     \NC string \NC \NR
\NC \type{lpdf.string}      \NC string \NC \NR
\NC \type{lpdf.number}      \NC number \NC \NR
\NC \type{lpdf.constant}    \NC string \NC \NR
\NC \type{lpdf.null}        \NC \NC \NR
\NC \type{lpdf.boolean}     \NC boolean \NC \NR
%NC \type{lpdf.true}        \NC \NC \NR
%NC \type{lpdf.false}       \NC \NC \NR
\NC \type{lpdf.reference}   \NC string \NC \NR
\NC \type{lpdf.verbose}     \NC indexed table of strings \NC \NR
\LL
\stoptabulate

Flushing objects is done with:

\starttyping
lpdf.flushobject(obj)
\stoptyping

Reserving an object is of course possible and done with:

\starttyping
local r = lpdf.reserveobject()
\stoptyping

Such an object is flushed with:

\starttyping
lpdf.flushobject(r,obj)
\stoptyping

We also support named objects:

\starttyping
lpdf.reserveobject("myobject")

lpdf.flushobject("myobject",obj)
\stoptyping

\startsection [title={Resources}]

While \LUATEX\ itself will embed all resources related to regular typesetting,
\MKIV\ has to take care of embedding those related to special tricks, like
annotations, spot colors, layers, shades, transparencies, metadata, etc. If you
ever took a look in the \MKII\ \type {spec-*} files you might have gotten the
impression that it quickly becomes messy. The code there is actually rather old
and evolved in sync with the \PDF\ format as well as \PDFTEX\ and \DVIPDFMX\
maturing to their current state. As a result we have a dedicated object
referencing model that sometimes results in multiple passes due to forward
references. We could have gotten away from that with the latest versions of
\PDFTEX\ as it provides means to reserve object numbers but it does not make much
sense to do that now that \MKII\ is frozen.

Third party modules (like tikz) can also add resources. As in \MKII\ this is done
using an \API\ that makes sure that no interference takes place. Think of macros
like:

\starttyping
\pdfbackendsetcatalog       {key}{string}
\pdfbackendsetinfo          {key}{string}
\pdfbackendsetname          {key}{string}

\pdfbackendsetpageattribute {key}{string}
\pdfbackendsetpagesattribute{key}{string}
\pdfbackendsetpageresource  {key}{string}

\pdfbackendsetextgstate     {key}{pdfdata}
\pdfbackendsetcolorspace    {key}{pdfdata}
\pdfbackendsetpattern       {key}{pdfdata}
\pdfbackendsetshade         {key}{pdfdata}
\stoptyping

One is free to use the \LUA\ interface instead, as there one has more
possibilities. The names are similar, like:

\starttyping
lpdf.addtoinfo(key,anything_valid_pdf)
\stoptyping

At the time of this writing (\LUATEX\ .50) there are still places where \TEX\ and
\LUA\ code is interwoven in a non optimal way, but that will change in the future
as the backend is completely separated and we can do more \TEX\ trickery at the
\LUA\ end.

Also, currently we expose more of the backend code than we like and future
versions will have more restricted access. The following functions will stay
public:

\starttyping
lpdf.addtopageresources  (key,value)
lpdf.addtopageattributes (key,value)
lpdf.addtopagesattributes(key,value)

lpdf.adddocumentextgstate (key,value)
lpdf.adddocumentcolorspace(key,value)
lpdf.adddocumentpattern   (key,value)
lpdf.adddocumentshade     (key,value)

lpdf.addtocatalog        (key,value)
lpdf.addtoinfo           (key,value)
lpdf.addtonames          (key,value)
\stoptyping

There are several tracing options built in and some more will be added in due
time:

\starttyping
\enabletrackers
  [backend.finalizers,
   backend.resources,
   backend.objects,
   backend.detail]
\stoptyping

As with all trackers you can also pass them on the command line, for example:

\starttyping
context --trackers=backend.* yourfile
\stoptyping

The reference related backend mechanisms have their own trackers.
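
As an illustration of the \LUA\ interface, the following lines sketch how a style
could add an entry to the document catalog and one to the info dictionary. This
is only a sketch: it assumes that the public functions and creators behave as
listed earlier in this chapter, and the key \type {Reviewer} and its value are
made up for the example.

\starttyping
-- sketch only: the keys and values are examples, not prescribed ones

-- a standard catalog entry: /PageMode /UseOutlines
lpdf.addtocatalog("PageMode", lpdf.constant("UseOutlines"))

-- a custom info entry; the key Reviewer is hypothetical
lpdf.addtoinfo("Reviewer", lpdf.string("J. Doe"))
\stoptyping

Because the values are created with \type {lpdf.constant} and \type
{lpdf.string}, they should end up in the file as \type {/UseOutlines} and \type
{(J. Doe)}. Going through these functions instead of writing to the catalog
directly is what keeps such additions from interfering with the entries that
\CONTEXT\ itself sets.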
\stopsection

\startsection [title={Transformations}]

At the time of this writing there is still some backend related code at the \TEX\
end that needs a cleanup. Most noticeable is the code that deals with
transformations (like scaling). At some point a primitive was introduced in
\PDFTEX\ but it did not completely cover the transform matrix so we never used
it. In \LUATEX\ we will come up with a better mechanism. Till that moment we
stick to the \MKII\ method.

\stopsection

\startsection [title={Annotations}]

The \LUA\ based backend of \MKIV\ is not much less code, but it is definitely
cleaner. The reason why there is quite some code is that in \CONTEXT\ we also
handle annotations and destinations in \LUA. In other words: \TEX\ is not
bothered by the backend any more. We could make that split without too much
impact as we never depended on \PDFTEX\ hyperlink related features and used
generic annotations instead. It's for that reason that \CONTEXT\ has always been
able to nest hyperlinks and have annotations with a chain of actions.

Another reason for doing it all at the \LUA\ end is that, as in \MKII, we have to
deal with the rather hybrid cross reference mechanism, which uses a sort of
language, and parsing this is also easier at the \LUA\ end. Think of:

\starttyping
\definereference[somesound][StartSound(attention)]

\at {just some page} [someplace,somesound,StartMovie(somemovie)]
\stoptyping

We parse the specification, expanding shortcuts when needed, create an action
chain, make sure that the movie related resources are taken care of (normally the
movie itself will be a figure), and turn the three words into hyperlinks. As this
all happens in \LUA\ we have less \TEX\ code. Contrary to what you might expect,
the \LUA\ code is not that much faster as the \MKII\ \TEX\ code is rather
optimized.

Special features like \JAVASCRIPT\ as well as widgets (and forms) are also
reimplemented.
Support for \JAVASCRIPT\ is not that complex at all, but as in \CONTEXT\ we can
organize scripts in collections and have automatic inclusion of used functions,
some code is still needed. As we now do this in \LUA\ we use less \TEX\ memory.
Reimplementing widgets took a bit more work as I used the opportunity to remove
hacks for older viewers. As support for widgets is somewhat unstable in viewers
quite some testing was needed, especially because we keep supporting cloned and
copied fields (resulting in widget trees).

An interesting complication with widgets is that each instance can have a lot of
properties and as we want to be able to use thousands of them in one document,
each with different properties, we have efficient storage in \MKII\ and want to
do the same in \LUA. Most code at the \TEX\ end is related to passing all those
options.

You could use the \LUA\ functions that relate to annotations etc.\ but normally
you will use the regular \CONTEXT\ user interface. For practical reasons, the
backend code is grouped in several tables.

The \type{backends} table has subtables for each backend and currently there is
only one: \type {pdf}. Each backend provides tables itself. In the
\type{codeinjections} namespace we collect functions that don't interfere with
the typesetting or typeset result, like inserting all kinds of resources (movies,
attachments, etc.), widget related functionality, and in fact everything that
does not fit into the other categories. In \type {nodeinjections} we organize
functions that inject literal \PDF\ code in the nodelist which then ends up in
the \PDF\ stream: color, layers, etc. The \type {registrations} table is reserved
for functions related to resources that result from node injections: spot colors,
transparencies, etc. Once the backend code is finished we might come up with
another organization.
No matter what we end up with, the way the \type {backends} table is supposed to
be organized determines the \API\ and those who have seen the \MKII\ backend code
will recognize some of it.

\stopsection

\startsection [title={Metadata}]

We always had the opportunity to set the information fields in a \PDF\ but
standardization forces us to add these large verbose metadata blobs. As this blob
is coded in \XML\ we use the built in \XML\ parser to fill a template. Thanks to
extensive testing and research by Peter Rolf we now have rather complete support
for \PDF/X related demands. This will definitely evolve with the advance of the
\PDF\ specification. You can replace the information with your own but we suggest
that you stay away from this metadata mess as far as possible.

\stopsection

\startsection [title={Helpers}]

If you look into the \type {lpdf-*.lua} files you will find more functions. Some
are public helpers, like:

\starttabulate
\NC \type {lpdf.toeight(str)}   \NC returns \type {(string)} \NC \NR
%NC \type {lpdf.cleaned(str)}   \NC returns \type {escaped string} \NC \NR
\NC \type {lpdf.tosixteen(str)} \NC returns \type {<utf16 sequence>} \NC \NR
\stoptabulate

An example of another public function is:

\starttyping
lpdf.sharedobj(content)
\stoptyping

This one flushes the object and returns the object number. Already defined
objects are reused. In addition to this code driven optimization, some other
optimization and reuse takes place but all that happens without user
intervention.

\stopsection

\stopchapter

\stopcomponent