% language=us

% author    : Hans Hagen
% copyright : ConTeXt Development Team
% license   : Creative Commons Attribution ShareAlike 4.0 International
% reference : pragma-ade.nl | contextgarden.net | texlive (related) distributions
% origin    : the ConTeXt distribution
%
% comment   : Because this manual is distributed with TeX distributions it comes with a rather
%             liberal license. We try to adapt these documents to upgrades in the (sub)systems
%             that they describe. Using parts of the content otherwise can therefore conflict
%             with existing functionality and we cannot be held responsible for that. Many of
%             the manuals contain characteristic graphics and personal notes or examples that
%             make no sense when used out-of-context.
%
% comment   : Some chapters might have been published in TugBoat, the NTG Maps, the ConTeXt
%             Group journal or otherwise. Thanks to the editors for corrections. Also thanks
%             to users for testing, feedback and corrections.

% todo:
%
% metadata
% properties
% \dontleavehmode before hbox
% cover page
%
% http://www.cnet.com/news/google-subtracts-mathml-from-chrome-and-anger-multiplies/

% \usemodule[luacalls]

\usemodule[art-01,abr-02]

\definehighlight[notabene][style=bold]

\definecolor[darkorange] [.70(green,red)]
\definecolor[lightorange][.45(orange,white)]
\definecolor[lesswhite]  [.90(white)]

\setuptyping[color=darkorange]
\setuptype  [color=darkorange]

\starttext

\startMPpage

    numeric w  ; w  := 21cm ;
    numeric h  ; h  := 29.7cm ;
    numeric ww ; ww := 9w/10 ;
    numeric oo ; oo := (w-ww) / 2 ;
    numeric hh ; hh := h/5.5 ;
    path    p  ; p  := unitsquare xysized(w,h) ;

    color orange ; orange := \MPcolor{darkorange} ; % .7[green,red] ;

    fill p enlarged 2mm withcolor orange ;

    draw image (
        draw anchored.top(
            textext("\ttbf\setupinterlinespace[line=1.7ex]\framed[frame=off,align=middle,offset=0mm]{\smash{
}\\\smash{
}\\\smash{
}}") xsized w, center topboundary p shifted (0,-12mm)) withcolor \MPcolor{lightorange} ; % 0.45[white,orange] ;
        draw anchored.bot(
            textext("\ssbf\setupinterlinespace[line=2.2ex]\framed[frame=off,align=middle]{exporting\\xml and epub\\from context}") xsized w,
            center bottomboundary p shifted (0,4mm)) withcolor \MPcolor {lesswhite} ; % 0.90white ;
    ) ;

    setbounds currentpicture to p ;

\stopMPpage

\startsection[title=Introduction]

There is a pretty long tradition of typesetting math with \TEX\ and it looks like this program will dominate for many more years. Even if we move to the web, the simple fact that support for \MATHML\ in some browsers is suboptimal will drive those who want a quality document to use \PDF\ instead.

I'm writing this in 2014, at a time when \XML\ is widespread. The idea of \XML\ is that you code your data in a very structured way, so that it can be manipulated and (if needed) validated. Text has always been a target for \XML\ which is a follow|-|up to \SGML\ that was in use by publishers. Because \HTML\ is less structured (and also quite tolerant with respect to end tags) we prefer to use \XHTML\ but unfortunately support for that is less widespread.

Interestingly, documents are probably among the more complex targets of the \XML\ format. The reason is that unless the author restricts him|/|herself or gets restricted by the publisher, tag abuse can happen. At \PRAGMA\ we mostly deal with education|-|related \XML\ and it's not always easy to come up with something that suits the specific needs of the educational concept behind a school method. Even if we start out nice and clean, eventually we end up with a polluted source, often with additional structure needed to satisfy the tools used for conversion.

We have been supporting \XML\ from the day it showed up and most of our projects involve \XML\ in one way or the other. That doesn't mean that we don't use \TEX\ for coding documents. This manual is for instance a regular \TEX\ document.
In many ways a structured \TEX\ document is much more convenient to edit, especially if one wants to add a personal touch and do some local page make|-|up. On the other hand, diverting from standard structure commands makes the document less suitable for output other than \PDF. There is simply no final solution for coding a document, it's mostly a matter of taste.

So we have a dilemma: if we want to have multiple outputs, frozen \PDF\ as well as less|-|controlled \HTML\ output, we can best code in \XML, but when we want to code comfortably we'd like to use \TEX. There are other ways, like Markdown, that can be converted to intermediate formats like \TEX, but that is only suitable for simple documents: the more advanced documents get, the more one has to escape from the boundaries of (any) document encoding, and then often \TEX\ is not a bad choice. There is a good reason why \TEX\ survived for so long.

It is for this reason that in \CONTEXT\ \MKIV\ we can export the content in a reasonably structured way to \XML. Of course we assume a structured document. It started out as an experiment because it was relatively easy to implement, and it is now an integral component.

\stopsection

\startsection[title=The output]

The regular output is an \XML\ file but as we have some more related data it gets organized in a tree. We also export a few variants. An example is given below:

\starttyping
./test-export
./test-export/images
./test-export/images/...
./test-export/styles
./test-export/styles/test-defaults.css
./test-export/styles/test-images.css
./test-export/styles/test-styles.css
./test-export/styles/test-templates.css
./test-export/test-raw.xml
./test-export/test-raw.lua
./test-export/test-tag.xhtml
./test-export/test-div.xhtml
\stoptyping

Say that we have this input:

\starttyping
\setupbackend
  [export=yes]

\starttext

\startsection[title=First]

\startitemize
    \startitem one \stopitem
    \startitem two \stopitem
\stopitemize

\stopsection

\stoptext
\stoptyping

The main export ends up in the \type {test-raw.xml} export file and looks like the following (we leave out the preamble and style references):

\starttyping
1 First one two
\stoptyping

This file refers to the stylesheets and therefore renders quite well in a browser like Firefox that can handle \XHTML\ with arbitrary tags.

The \type {detail} attribute tells us what instance of the element is used. Normally the \type {chain} attribute is the same but it can have more values. For instance, if we have:

\starttyping
\definefloat[graphic][graphics][figure]

.....

\startplacefigure[title=First]
    \externalfigure[cow.pdf]
\stopplacefigure

.....

\startplacegraphic[title=Second]
    \externalfigure[cow.pdf]
\stopplacegraphic
\stoptyping

we get this:

\starttyping
... ... ... ...
\stoptyping

This makes it possible to style specific categories of floats by using a (combination of) \type {detail} and|/|or \type {chain} as filters.

The body of the \type {test-tag.xhtml} file looks similar but it is slightly more tuned for viewing. For instance, hyperlinks are converted to a form that \CSS\ and browsers handle better. Keep in mind that the raw file can be the base for conversion to other formats, so that one stays closest to the original structure.

The \type {test-div.xhtml} file is even more tuned for viewing in browsers as it completely does away with specific tags. We explicitly don't map onto native \HTML\ elements because that would make everything look messy and horrible, if only because there seldom is a relation between those elements and the original. One can always transform one of the export formats to pure \HTML\ tags if needed.

\starttyping
1
First
one
two
...
...>
...
...>
\stoptyping

The default \CSS\ file can deal with tags as well as classes. The file of additional styles contains definitions of so|-|called highlights. In the \CONTEXT\ source one is better off using explicit named highlights instead of local font and color switches because these properties are then exported to the \CSS. The images style defines all images used. The templates file lists all the elements used and can be used as a starting point for additional \CSS\ styling.

Keep in mind that the export is \notabene{not} meant as a one|-|to|-|one visual representation. It represents structure so that it can be converted to whatever you like.

In order to get an export you must start your document with:

\starttyping
\setupbackend
  [export=yes]
\stoptyping

So, we trigger a specific (extra) backend. In addition you can set up the export:

\starttyping
\setupexport
  [svgstyle=test-basic-style.tex,
   cssfile=test-extras.css,
   hyphen=yes,
   width=60em]
\stoptyping

The \type {hyphen} option will also export hyphenation information so that the text can be nicely justified. The \type {svgstyle} option can be used to specify a file where math is set up; normally this would only contain a \type{bodyfont} setup, and this option is only needed if you want to create an \EPUB\ file afterwards which has math represented as \SVG.

The value of \type {cssfile} ends up as a style reference in the exported files. You can also pass a comma|-|separated list of names (between curly braces). These entries come after those of the automatically generated \CSS\ files so you need to be aware of default properties.

\stopsection

\startsection[title=Images]

Inclusion of images is done in an indirect way. Each image gets an entry in a special image|-|related stylesheet and then gets referred to by \type {id}. Some extra information is written to a status file so that the script that creates \EPUB\ files can deal with the right conversion, for instance from \PDF\ to \SVG.
Because we can refer to specific pages in a \PDF\ file, this subsystem deals with that too. Images are expected to be in an \type {images} subdirectory and because in \CSS\ the references are relative to the path where the stylesheet resides, we use \type {../images} instead. If you do some postprocessing on the files or relocate them you need to keep in mind that you might have to change these paths in the image|-|related \CSS\ file.

\stopsection

\startsection[title=Epub files]

At the end of a run with exporting enabled you will get a message on the console that tells you how to generate an \EPUB\ file. For instance:

\starttyping
mtxrun --script epub --make --purge test
\stoptyping

This will create a tree with the following organization:

\starttyping
./test-epub
./test-epub/META-INF
./test-epub/META-INF/container.xml
./test-epub/OEBPS
./test-epub/OEBPS/content.opf
./test-epub/OEBPS/toc.ncx
./test-epub/OEBPS/nav.xhtml
./test-epub/OEBPS/cover.xhtml
./test-epub/OEBPS/test-div.xhtml
./test-epub/OEBPS/images
./test-epub/OEBPS/images/...
./test-epub/styles
./test-epub/styles/test-defaults.css
./test-epub/styles/test-images.css
./test-epub/styles/test-styles.css
./test-epub/mimetype
\stoptyping

Images will be moved to this tree as well and if needed they will be converted, for instance into \SVG. Converted \PDF\ files can have a \typ {page-} in their name when a specific page has been used.

You can pass the option \type {--svgmath} in which case math will be converted to \SVG. The main reason for this feature is that we found out that \MATHML\ support in browsers is not currently as widespread as might be expected. The best bet is Firefox which natively supports it. The Chrome browser had it for a while but it got dropped and math is now delegated to \JAVASCRIPT\ and friends. In Internet Explorer \MATHML\ should work (but I need to test that again).
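
If you want to try the \SVG\ route, a minimal setup could look as follows. This is only a sketch: the file name \type {test.tex}, the chosen bodyfont and the formula are assumptions made for illustration, and the style file simply follows the description of \type {svgstyle} given earlier.

\starttyping
% test-basic-style.tex (illustrative): only a bodyfont setup for the math snippets
\setupbodyfont[pagella,10pt]
\stoptyping

\starttyping
% test.tex (illustrative)
\setupbackend
  [export=yes]

\setupexport
  [svgstyle=test-basic-style.tex]

\starttext
    \startsection[title=Math]
        Pythagoras: \m{a^2 + b^2 = c^2}
    \stopsection
\stoptext
\stoptyping

After the \CONTEXT\ run, adding \type {--svgmath} to the command reported on the console should then give an \EPUB\ tree with the math as \SVG. We assume here that the option can simply be combined with the other ones:

\starttyping
context test
mtxrun --script epub --make --purge --svgmath test
\stoptyping
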
This conversion mechanism is kind of interesting: one enters \TEX\ math, then gets \MATHML\ in the export, and that gets rendered by \TEX\ again, but now as a standalone snippet that then gets converted to \SVG\ and embedded in the result.

\stopsection

\startsection[title=Styles]

One can argue that we should use native \HTML\ elements but since we don't have a nice guaranteed|-|consistent mapping onto that, it makes no sense to do so. Instead, we rely on either explicit tags with details and chains or divisions with classes that combine the tag, detail and chain. The tagged variant has some more attributes and those that use a fixed set of values become classes in the division variant.

Also, once we start going the (for instance) \type {H1}, \type {H2}, etc.\ route we're lost when we have more levels than that or use a different structure. If an \type {H3} can reflect several levels it makes no sense to use it. The same is true for other tags: if a list is not really a list then tagging it with \type {LI} is counterproductive. We're often dealing with very complex documents so basic \HTML\ tagging becomes rather meaningless.

If you look at the division variant (this is used for \EPUB\ too) you will notice that there are no empty elements but \type {div} blocks with a comment as content. This is needed because otherwise they get ignored, which for instance makes table cells invisible.

The relation between \type {detail} and \type {chain} (reflected in \type {class}) can best be seen from the next example.

\starttyping
\definefloat[myfloata]
\definefloat[myfloatb][myfloatbs][figure]
\definefloat[myfloatc][myfloatcs][myfloatb]
\stoptyping

This creates three new float instances. The first inherits from the main float settings, but can have its own properties. The second inherits from the \type {figure} so in fact it is part of a chain. The third one has a longer chain.

\starttyping
... ... ...
\stoptyping

In a \CSS\ style you can now configure tags, details, and chains as well as classes (we show only a few possibilities). Each pair below shows the selector for the division variant on the first line and the equivalent selector for the tagged variant on the second line.

\starttyping
div.float.myfloata { }
float[detail='myfloata'] { }

div.float.myfloatb { }
float[detail='myfloatb'] { }

div.float.figure { }
float[detail='figure'] { }

div.float.figure.myfloatb { }
float[chain~='figure'][detail='myfloatb'] { }

div.myfloata { }
*[detail='myfloata'] { }

div.myfloatb { }
*[detail='myfloatb'] { }

div.figure { }
*[chain~='figure'] { }

div.figure.myfloatb { }
*[chain~='figure'][detail='myfloatb'] { }
\stoptyping

The default styles cover some basics but if you're serious about the export or want to use \EPUB\ then it makes sense to overload some of it and|/|or provide additional styling. You can find plenty about \CSS\ and its options on the Internet.

\stopsection

\startsection[title=Coding]

The default output reflects the structure present in the document. If that is not enough you can add your own structure, as in:

\starttyping
\startelement[question]
Is this right?
\stopelement
\stoptyping

You can also pass attributes:

\starttyping
\startelement[question][level=difficult]
Is this right?
\stopelement
\stoptyping

But these will be exported only when you also say:

\starttyping
\setupexport
  [properties=yes]
\stoptyping

You can create a namespace. The following will generate attributes like \type {my-level}.

\starttyping
\setupexport
  [properties=my-]
\stoptyping

In most cases it makes more sense to use highlights:

\starttyping
\definehighlight
  [important]
  [style=bold]
\stoptyping

This has the advantage that the style and color are exported to a special \CSS\ file.

Headers, footers, and other content that is part of the page builder are not exported. If your document has cover pages you might want to hide them too.
The same is true when you create special chapter title rendering with a side effect that content ends up in the page stream. If something shows up that you don't want, you can wrap it in an \type {ignore} element:

\starttyping
\startelement[ignore]
Don't export this.
\stopelement
\stoptyping

\stopsection

\stoptext