% language=us
% epub-mkiv.tex, last modification: 2023-12-21 09:43

% author    : Hans Hagen
% copyright : ConTeXt Development Team
% license   : Creative Commons Attribution ShareAlike 4.0 International
% reference : pragma-ade.nl | contextgarden.net | texlive (related) distributions
% origin    : the ConTeXt distribution
%
% comment   : Because this manual is distributed with TeX distributions it comes with a rather
%             liberal license. We try to adapt these documents to upgrades in the (sub)systems
%             that they describe. Using parts of the content otherwise can therefore conflict
%             with existing functionality and we cannot be held responsible for that. Many of
%             the manuals contain characteristic graphics and personal notes or examples that
%             make no sense when used out-of-context.
%
% comment   : Some chapters might have been published in TugBoat, the NTG Maps, the ConTeXt
%             Group journal or otherwise. Thanks to the editors for corrections. Also thanks
%             to users for testing, feedback and corrections.

% todo:
%
% metadata
% properties
% \dontleavehmode before hbox
% cover page
%
% http://www.cnet.com/news/google-subtracts-mathml-from-chrome-and-anger-multiplies/

% \usemodule[luacalls]

\usemodule[art-01,abr-02]

\definehighlight[notabene][style=bold]

\definecolor[darkorange] [.70(green,red)]
\definecolor[lightorange][.45(orange,white)]
\definecolor[lesswhite]  [.90(white)]

\setuptyping[color=darkorange]
\setuptype  [color=darkorange]

\starttext

\startMPpage

numeric w  ; w  := 21cm ;
numeric h  ; h  := 29.7cm ;
numeric ww ; ww := 9w/10 ;
numeric oo ; oo := (w-ww) / 2 ;
numeric hh ; hh := h/5.5 ;
path    p ; p := unitsquare xysized(w,h)  ;

color orange ; orange := \MPcolor{darkorange} ; % .7[green,red] ;

fill p enlarged 2mm withcolor orange ;

draw image (
    draw anchored.top(
        textext("\ttbf\setupinterlinespace[line=1.7ex]\framed[frame=off,align=middle,offset=0mm]{\smash{<div/>}\\\smash{<div >}\\\smash{</div>}}")
            xsized w,
        center topboundary p shifted (0,-12mm)) withcolor \MPcolor{lightorange} ; % 0.45[white,orange] ;
    draw anchored.bot(
        textext("\ssbf\setupinterlinespace[line=2.2ex]\framed[frame=off,align=middle]{exporting\\xml and epub\\from context}")
            xsized w,
        center bottomboundary p shifted (0,4mm)) withcolor \MPcolor {lesswhite} ; % 0.90white ;
) ;

setbounds currentpicture to p ;

\stopMPpage

\startsection[title=Introduction]

There is a pretty long tradition of typesetting math with \TEX\ and it looks like
this program will dominate for many more years. Even if we move to the web, the
simple fact that support for \MATHML\ in some browsers is suboptimal will drive
those who want a quality document to use \PDF\ instead.

I'm writing this in 2014, at a time when \XML\ is widespread. The idea of \XML\ is
that you code your data in a very structured way, so that it can be manipulated and
(if needed) validated. Text has always been a target for \XML\ which is a follow|-|up
to \SGML\ that was in use by publishers. Because \HTML\ is less structured (and also
quite tolerant with respect to end tags) we prefer to use \XHTML\ but unfortunately
support for that is less widespread.

Interestingly, documents are probably among the more complex targets of the
\XML\ format. The reason is that unless the author restricts him|/|herself or
gets restricted by the publisher, tag abuse can happen. At \PRAGMA\ we mostly
deal with education|-|related \XML\ and it's not always easy to come up with
something that suits the specific needs of the educational concept behind a
school method. Even if we start out nice and clean, eventually we end up with a
polluted source, often with additional structure needed to satisfy the tools used
for conversion.

We have been supporting \XML\ from the day it showed up and most of our projects
involve \XML\ in one way or the other. That doesn't mean that we don't use \TEX\
for coding documents. This manual is for instance a regular \TEX\ document. In
many ways a structured \TEX\ document is much more convenient to edit, especially
if one wants to add a personal touch and do some local page make|-|up. On the other hand,
diverting from standard structure commands makes the document less suitable for
output other than \PDF. There is simply no final solution for coding a document,
it's mostly a matter of taste.

So we have a dilemma: if we want to have multiple output, frozen \PDF\ as well as
less|-|controlled \HTML\ output, we can best code in \XML, but when we want to code
comfortably we'd like to use \TEX. There are other ways, like Markdown, that can
be converted to intermediate formats like \TEX, but that is only suitable for
simple documents: the more advanced documents get, the more one has to escape
from the boundaries of (any) document encoding, and then often \TEX\ is not a bad
choice. There is a good reason why \TEX\ survived for so long.

It is for this reason that in \CONTEXT\ \MKIV\ we can export the content in a
reasonably structured way to \XML. Of course we assume a structured document. It
started out as an experiment because it was relatively easy to implement, and it
is now an integral component.

\stopsection

\startsection[title=The output]

The regular output is an \XML\ file but as we have some more related data it gets
organized in a tree. We also export a few variants. An example is given below:

\starttyping
./test-export
./test-export/images
./test-export/images/...
./test-export/styles
./test-export/styles/test-defaults.css
./test-export/styles/test-images.css
./test-export/styles/test-styles.css
./test-export/styles/test-templates.css
./test-export/test-raw.xml
./test-export/test-raw.lua
./test-export/test-tag.xhtml
./test-export/test-div.xhtml
\stoptyping

Say that we have this input:

\starttyping
\setupbackend
  [export=yes]

\starttext
  \startsection[title=First]
    \startitemize
      \startitem one \stopitem
      \startitem two \stopitem
    \stopitemize
  \stopsection
\stoptext
\stoptyping

The main export ends up in the \type {test-raw.xml} export file and looks like
the following (we leave out the preamble and style references):

\starttyping
<document> <!-- with some attributes -->
  <section detail="section" chain="section" level="3">
    <sectionnumber>1</sectionnumber>
    <sectiontitle>First</sectiontitle>
    <sectioncontent>
      <itemgroup detail="itemize" chain="itemize" symbol="1" level="1">
        <item>
          <itemtag><m:math ..><m:mo>•</m:mo></m:math></itemtag>
          <itemcontent>one</itemcontent>
        </item>
        <item>
          <itemtag><m:math ..><m:mo>•</m:mo></m:math></itemtag>
          <itemcontent>two</itemcontent>
        </item>
      </itemgroup>
    </sectioncontent>
  </section>
</document>
\stoptyping

This file refers to the stylesheets and therefore renders quite well in a browser
like Firefox that can handle \XHTML\ with arbitrary tags.

The \type {detail} attribute tells us what instance of the element is used.
Normally the \type {chain} attribute is the same but it can have more values.
For instance, if we have:

\starttyping
\definefloat[graphic][graphics][figure]

.....

\startplacefigure[title=First]
    \externalfigure[cow.pdf]
\stopplacefigure

.....

\startplacegraphic[title=Second]
    \externalfigure[cow.pdf]
\stopplacegraphic
\stoptyping

we get this:

\starttyping
<float detail="figure" chain="figure">
  <floatcontent>...</floatcontent>
  <floatcaption>...</floatcaption>
</float>
<float detail="graphic" chain="figure graphic">
  <floatcontent>...</floatcontent>
  <floatcaption>...</floatcaption>
</float>
\stoptyping

This makes it possible to style specific categories of floats by using a
(combination of) \type {detail} and|/|or \type {chain} as filters.

The body of the \type {test-tag.xhtml} file looks similar but it is slightly more
tuned for viewing. For instance, hyperlinks are converted to a form that \CSS\ and
browsers handle better. Keep in mind that the raw file can be the base for conversion
to other formats, so that one stays closest to the original structure.

The \type {test-div.xhtml} file is even more tuned for viewing in browsers as it
completely does away with specific tags. We explicitly don't map onto native
\HTML\ elements because that would make everything look messy and horrible, if only
because there seldom is a relation between those elements and the original. One
can always transform one of the export formats to pure \HTML\ tags if needed.

\starttyping
<body>
  <div class="document">
    <div class="section" id="aut-1">
      <div class="sectionnumber">1</div>
      <div class="sectiontitle">First</div>
      <div class="sectioncontent">
        <div class="itemgroup itemize symbol-1">
          <div class="item">
            <div class="itemtag"><m:math ...><m:mo>•</m:mo></m:math></div>
            <div class="itemcontent">one</div>
          </div>
          <div class="item">
            <div class="itemtag"><m:math ...><m:mo>•</m:mo></m:math></div>
            <div class="itemcontent">two</div>
          </div>
        </div>
        <div class="float figure">
          <div class="floatcontent">...</div>
          <div class="floatcaption">...</div>
        </div>
        <div class="float figure graphic">
          <div class="floatcontent">...</div>
          <div class="floatcaption">...</div>
        </div>
      </div>
    </div>
  </div>
</body>
\stoptyping

The default \CSS\ file can deal with tags as well as classes. The file
of additional styles contains definitions of so|-|called highlights. In the \CONTEXT\ source
one is better off using explicit named highlights instead of local font and color
switches because these properties are then exported to the \CSS. The images style
defines all images used. The templates file lists all the elements used and can
be used as a starting point for additional \CSS\ styling.

Keep in mind that the export is \notabene{not} meant as a one|-|to|-|one visual
representation. It represents structure so that it can be converted to whatever
you like.

In order to get an export you must start your document with:

\starttyping
\setupbackend
  [export=yes]
\stoptyping

So, we trigger a specific (extra) backend. In addition you can set up the export:

\starttyping
\setupexport
  [svgstyle=test-basic-style.tex,
   cssfile=test-extras.css,
   hyphen=yes,
   width=60em]
\stoptyping

The \type {hyphen} option will also export hyphenation information so that the
text can be nicely justified. The \type {svgstyle} option can be used to specify
a file where math is set up; normally this would only contain a \type{bodyfont} setup,
and this option is only needed if you want to create an \EPUB\ file afterwards which
has math represented as \SVG.

The value of \type {cssfile} ends up as a style reference in the exported files.
You can also pass a comma|-|separated list of names (between curly braces). These
entries come after those of the automatically generated \CSS\ files so you need
to be aware of default properties.
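
As an illustration of the list form, a setup along the following lines could be
used. This is only a sketch: the two stylesheet names are made up for the
example and the files are assumed to be provided by you.

\starttyping
% both file names are hypothetical
\setupexport
  [cssfile={test-extras.css,test-print.css}]
\stoptyping

Because these references come last, a rule in one of these files wins over an
equally specific rule in the generated stylesheets. In the same spirit, the file
that is passed as \type {svgstyle} could be as small as the following line, where
the font and size are just an assumption:

\starttyping
% possible content of test-basic-style.tex (font choice is an assumption)
\setupbodyfont[pagella,10pt]
\stoptyping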

\stopsection

\startsection[title=Images]

Inclusion of images is done in an indirect way. Each image gets an entry in a
special image|-|related stylesheet and then gets referred to by \type {id}. Some
extra information is written to a status file so that the script that creates
\EPUB\ files can deal with the right conversion, for instance from \PDF\ to \SVG.
Because we can refer to specific pages in a \PDF\ file, this subsystem deals with
that too. Images are expected to be in an \type {images} subdirectory and because in \CSS\
the references are relative to the path where the stylesheet resides, we use
\type {../images} instead. If you do some postprocessing on the files or relocate
them you need to keep in mind that you might have to change these paths in the
image|-|related \CSS\ file.

\stopsection

\startsection[title=Epub files]

At the end of a run with exporting enabled you will get a message on the console that
tells you how to generate an \EPUB\ file. For instance:

\starttyping
mtxrun --script epub --make --purge test
\stoptyping

This will create a tree with the following organization:

\starttyping
./test-epub
./test-epub/META-INF
./test-epub/META-INF/container.xml
./test-epub/OEBPS
./test-epub/OEBPS/content.opf
./test-epub/OEBPS/toc.ncx
./test-epub/OEBPS/nav.xhtml
./test-epub/OEBPS/cover.xhtml
./test-epub/OEBPS/test-div.xhtml
./test-epub/OEBPS/images
./test-epub/OEBPS/images/...
./test-epub/styles
./test-epub/styles/test-defaults.css
./test-epub/styles/test-images.css
./test-epub/styles/test-styles.css
./test-epub/mimetype
\stoptyping

Images will be moved to this tree as well and if needed they will be converted,
for instance into \SVG. Converted \PDF\ files can have a \typ {page-<number>} in
their name when a specific page has been used.

You can pass the option \type {--svgmath} in which case math will be converted to
\SVG. The main reason for this feature is that we found out that \MATHML\ support
in browsers is not currently as widespread as might be expected. The best bet is Firefox which
natively supports it. The Chrome browser had it for a while but it got dropped
and math is now delegated to \JAVASCRIPT\ and friends. In Internet Explorer
\MATHML\ should work (but I need to test that again).

This conversion mechanism is
kind of interesting: one enters \TEX\ math, then gets \MATHML\ in the export, and
that gets rendered by \TEX\ again, but now as a standalone snippet that then gets
converted to \SVG\ and embedded in the result.

\stopsection

\startsection[title=Styles]

One can argue that we should use native \HTML\ elements but since we don't have a nice
guaranteed|-|consistent mapping onto that, it makes no sense to do so. Instead, we
rely on either explicit tags with details and chains or divisions with classes
that combine the tag, detail and chain. The tagged variant has some more
attributes and those that use a fixed set of values become classes in the
division variant. Also, once we start going the (for instance) \type {H1}, \type
{H2}, etc.\ route we're lost when we have more levels than that or use a
different structure. If an \type {H3} can reflect several levels it makes no
sense to use it. The same is true for other tags: if a list is not really a list
then tagging it with \type {LI} is counterproductive. We're often dealing with
very complex documents so basic \HTML\ tagging becomes rather meaningless.

If you look at the division variant (this is used for \EPUB\ too) you will notice
that there are no empty elements but \type {div} blocks with a comment as content.
This is needed because otherwise they get ignored, which for instance makes table
cells invisible.

The relation between \type {detail} and \type {chain} (reflected in \type {class})
can best be seen from the next example.

\starttyping
\definefloat[myfloata]
\definefloat[myfloatb][myfloatbs][figure]
\definefloat[myfloatc][myfloatcs][myfloatb]
\stoptyping

This creates three new float instances. The first inherits from the main float
settings, but can have its own properties. The second inherits from
the \type {figure} so in fact it is part of a chain. The third one has a longer
chain.

\starttyping
<float detail="myfloata">...</float>
<float detail="myfloatb" chain="figure">...</float>
<float detail="myfloatc" chain="figure myfloatb">...</float>
\stoptyping
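
To see these attributes in an export, the instances have to be used in the
document. The following sketch assumes that \type {\definefloat} provides the
usual \type {\startplace...} environments for each new instance, as it does for
\type {graphic} earlier on. The titles and the image are placeholders.

\starttyping
% titles and image are placeholders
\startplacemyfloata[title=A plain float]
    \externalfigure[cow.pdf]
\stopplacemyfloata

\startplacemyfloatc[title=A float at the end of the chain]
    \externalfigure[cow.pdf]
\stopplacemyfloatc
\stoptyping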

In a \CSS\ style you can now configure tags, details, and chains as well as
classes (we show only a few possibilities). On each line below, the selector on
the left addresses the division variant and the one on the right is its
counterpart for the tagged variant.

\starttyping
div.float.myfloata { }           float[detail='myfloata'] { }
div.float.myfloatb { }           float[detail='myfloatb'] { }
div.float.figure { }             float[detail='figure']   { }
div.float.figure.myfloatb { }    float[chain~='figure'][detail='myfloatb'] { }
div.myfloata { }                 *[detail='myfloata'] { }
div.myfloatb { }                 *[detail='myfloatb'] { }
div.figure { }                   *[chain~='figure'] { }
div.figure.myfloatb { }          *[chain~='figure'][detail='myfloatb'] { }
\stoptyping

The default styles cover some basics but if you're serious about the export
or want to use \EPUB\ then it makes sense to overload some of it and|/|or
provide additional styling. You can find plenty about \CSS\ and its options
on the Internet.

\stopsection

\startsection[title=Coding]

The default output reflects the structure present in the document. If that is not
enough you can add your own structure, as in:

\starttyping
\startelement[question]
Is this right?
\stopelement
\stoptyping

You can also pass attributes:

\starttyping
\startelement[question][level=difficult]
Is this right?
\stopelement
\stoptyping

But these will be exported only when you also say:

\starttyping
\setupexport
  [properties=yes]
\stoptyping

You can create a namespace. The following will generate attributes
like \type {my-level}.

\starttyping
\setupexport
  [properties=my-]
\stoptyping
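
Put together, a minimal test file could look as follows. This is a sketch that
only combines the commands shown above. Going by the description, the \type
{level} key is expected to show up as a \type {my-level} attribute on the
\type {question} element.

\starttyping
% enable the export backend
\setupbackend
  [export=yes]

% export user attributes, prefixed with my-
\setupexport
  [properties=my-]

\starttext
  \startelement[question][level=difficult]
    Is this right?
  \stopelement
\stoptext
\stoptyping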

In most cases it makes more sense to use highlights:

\starttyping
\definehighlight
  [important]
  [style=bold]
\stoptyping
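
Once defined, a highlight is used as a command that takes the text as its
argument, in the same way that the \type {notabene} highlight is used in the
source of this manual. A small sketch, with made|-|up text:

\starttyping
% the sentence is only an example
This step is \important {not} optional.
\stoptyping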

This has the advantage that the style and color are exported to a special
\CSS\ file.

Headers, footers, and other content that is part of the page builder are not
exported. If your document has cover pages you might want to hide them too. The
same is true when you create special chapter title rendering with a side
effect that content ends up in the page stream. If something shows up that you
don't want, you can wrap it in an \type {ignore} element:

\starttyping
\startelement[ignore]
Don't export this.
\stopelement
\stoptyping
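
A cover page could be hidden in the same way. The following is a sketch that
assumes the cover is typeset with the standard makeup environment. The title
is a placeholder, and whether the wrapper catches everything depends on how
the page is built.

\starttyping
% assumption: the cover is made with a standard makeup
\startelement[ignore]
  \startstandardmakeup
    \midaligned{Exporting from \CONTEXT}
  \stopstandardmakeup
\stopelement
\stoptyping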

\stopsection

\stoptext
