% language=us runpath=texruns:manuals/xml

\environment xml-mkiv-style

\startcomponent xml-mkiv-tricks

\startchapter[title={Tips and tricks}]

\startsection[title={tracing}]

It can be hard to debug code because much happens behind the scenes.
Therefore we have a couple of tracing options. Of course you can typeset some
status information, using for instance:

\startxmlcmd {\cmdbasicsetup{xmlshow}}
    typeset the tree given by \cmdinternal {cd:node}
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlinfo}}
    typeset the name in the element given by \cmdinternal {cd:node}
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlpath}}
    returns the complete path (including namespace prefix and index) of the
    given \cmdinternal {cd:node}
\stopxmlcmd

\startbuffer[demo]
<?xml version="1.0"?>
<document>
    <section>
        <content>
            <p>first</p>
            <p><b>second</b></p>
        </content>
    </section>
    <section>
        <content>
            <p><b>third</b></p>
            <p>fourth</p>
        </content>
    </section>
</document>
\stopbuffer

Say that we have the following \XML:

\typebuffer[demo]

and the next definitions:

\startbuffer
\startxmlsetups xml:demo:base
    \xmlsetsetup{#1}{p|b}{xml:demo:*}
\stopxmlsetups

\startxmlsetups xml:demo:p
    \xmlflush{#1}
    \par
\stopxmlsetups

\startxmlsetups xml:demo:b
    \par
    \xmlpath{#1} : \xmlflush{#1}
    \par
\stopxmlsetups

\xmlregisterdocumentsetup{example-10}{xml:demo:base}

\xmlprocessbuffer{example-10}{demo}{}
\stopbuffer

\typebuffer

This will give us:

\blank \startpacked \getbuffer \stoppacked \blank

If you use \type {\xmlshow} you will get a complete subtree which can
be handy for tracing but can also lead to large documents.

We also have a bunch of trackers that can be enabled, like:

\starttyping
\enabletrackers[xml.show,xml.parse]
\stoptyping

The full list (currently) is:

\starttabulate[|lT|p|]
\NC xml.entities  \NC show what entities are seen and replaced \NC \NR
\NC xml.path      \NC show the result of parsing an lpath expression \NC \NR
\NC xml.parse     \NC show stepwise resolving of expressions \NC \NR
\NC xml.profile   \NC report all parsed lpath expressions (in the log) \NC \NR
\NC xml.remap     \NC show what namespaces are remapped \NC \NR
\NC lxml.access   \NC report errors with respect to resolving (symbolic) nodes \NC \NR
\NC lxml.comments \NC show the comments that are encountered (if at all) \NC \NR
\NC lxml.loading  \NC show what files are loaded and converted \NC \NR
\NC lxml.setups   \NC show what setups are being associated to elements \NC \NR
\stoptabulate
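
Trackers are best kept local to the part that misbehaves, otherwise the log fills
up quickly. A minimal sketch, reusing the document and buffer names of the
example above and assuming the companion command \type {\disabletrackers}:

\starttyping
\enabletrackers[lxml.setups,lxml.loading]

\xmlprocessbuffer{example-10}{demo}{}

\disabletrackers[lxml.setups,lxml.loading]
\stoptyping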

In one of our workflows we produce books from \XML\ where the (educational)
content is organized in many small files. Each book has about 5~chapters and each
chapter is made of sections that contain text, exercises, resources, etc.\ and so
the document is assembled from thousands of files (don't worry, runtime inclusion
is pretty fast). In order to see where in the sources content resides we can
trace the filename.

\startxmlcmd {\cmdbasicsetup{xmlinclusion}}
    returns the file where the node comes from
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlinclusions}}
    returns the list of files where the node comes from
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlbadinclusions}}
    returns a list of files that were not included due to some problem
\stopxmlcmd

Of course you have to make sure that these names end up somewhere visible, for
instance in the margin.
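
As a sketch of that last remark: assuming a (hypothetical) \type {section}
element that has the setup \type {xml:section} assigned to it, the filename could
be put in the margin like this:

\starttyping
\startxmlsetups xml:section
    % hypothetical element; show where this node comes from
    \inmargin{\tt\tfx \xmlinclusion{#1}}
    \xmlflush{#1}
\stopxmlsetups
\stoptyping

The font switches are only there to keep the tracing information unobtrusive.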

\stopsection

\startsection[title={expansion}]

For novice users the concept of expansion might sound frightening and to some
extent it is. However, it is important enough to spend some words on it here.

It is good to realize that most setups are sort of immediate. When one setup is
issued, it can call another one and so on. Normally you won't notice that but
there are cases where that can be a problem. In \TEX\ you can define a macro,
take for instance:

\starttyping
\startxmlsetups xml:foo
  \def\foobar{\xmlfirst{#1}{/bar}}
\stopxmlsetups
\stoptyping

Here you store a reference to the node \type {bar} in \type {\foobar}, maybe for
later use. In this case the content is not yet fetched; that will be done when
\type {\foobar} is called.

\starttyping
\startxmlsetups xml:foo
  \edef\foobar{\xmlfirst{#1}{/bar}}
\stopxmlsetups
\stoptyping

Here the content of \type {bar} becomes the body of the macro. But what if
\type {bar} itself contains elements that also contain elements? When there
is a setup for \type {bar} it will be triggered and so on.

When that setup looks like:

\starttyping
\startxmlsetups xml:bar
  \def\barfoo{\xmlflush{#1}}
\stopxmlsetups
\stoptyping

we get something like:

\starttyping
\foobar => {\def\barfoo{...}}
\stoptyping

When \type {\barfoo} is not defined we get an error and when it is known and expands
to something weird we might also get an error.
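
The difference between \type {\def} and \type {\edef} can be seen without any
\XML\ involved. In the following sketch (the macro names are made up for this
illustration) \type {\MyLazy} stores a call while \type {\MyEager} stores the
result of that call at definition time:

\starttyping
\def \MyValue{old}
\def \MyLazy {\MyValue} % the body is the call \MyValue
\edef\MyEager{\MyValue} % the body is the text: old
\def \MyValue{new}

\MyLazy  % typesets: new
\MyEager % typesets: old
\stoptyping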

Especially when you don't know what content can show up, this can result in errors
when an expansion fails, for example because some macro being used is not defined.
To prevent this we can define a macro:

\starttyping
\starttexdefinition unexpanded xml:bar:macro #1
  \def\barfoo{\xmlflush{#1}}
\stoptexdefinition

\startxmlsetups xml:bar
  \texdefinition{xml:bar:macro}{#1}
\stopxmlsetups
\stoptyping

The setup \type {xml:bar} will still expand but the replacement text now is just the
call to the macro, think of:

\starttyping
\foobar => {\texdefinition{xml:bar:macro}{#1}}
\stoptyping

This is often not needed because most \CONTEXT\ commands can handle the
expansions quite well, but it's good to know that there is a way out. So, now to
some examples. Imagine that we have an \XML\ file that looks as follows:

\starttyping
<?xml version='1.0' ?>
<demo>
    <chapter>
        <title>Some <em>short</em> title</title>
        <content>
            zeta
            <index>
                <key>zeta</key>
                <content>zeta again</content>
            </index>
            alpha
            <index>
                <key>alpha</key>
                <content>alpha <em>again</em></content>
            </index>
            gamma
            <index>
                <key>gamma</key>
                <content>gamma</content>
            </index>
            beta
            <index>
                <key>beta</key>
                <content>beta</content>
            </index>
            delta
            <index>
                <key>delta</key>
                <content>delta</content>
            </index>
            done!
        </content>
    </chapter>
</demo>
\stoptyping

There are a few structure related elements here: a chapter (with its list entry)
and some index entries. Both are multipass related and therefore travel around.
This means that when we let data end up in the auxiliary file, we need to make
sure that we end up with either expanded data (i.e.\ no references to the \XML\
tree) or with robust forward and backward references to elements in the tree.

Here we discuss three approaches (and more may show up later): pushing \XML\ into
the auxiliary file, and using references to elements with or without an
associated setup. We control the variants with a switch.

\starttyping
\newcount\TestMode

\TestMode=0 % expansion=xml
\TestMode=1 % expansion=yes, index, setup
\TestMode=2 % expansion=yes
\stoptyping

We apply a couple of setups:

\starttyping
\startxmlsetups xml:mysetups
    \xmlsetsetup{\xmldocument}{demo|index|content|chapter|title|em}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:mysetups}
\stoptyping

The main document is processed with:

\starttyping
\startxmlsetups xml:demo
    \xmlflush{#1}
    \subject{contents}
    \placelist[chapter][criterium=all]
    \subject{index}
    \placeregister[index][criterium=all]
    \page % else buffer is forgotten when placing header
\stopxmlsetups
\stoptyping

First we show three alternative ways to deal with the chapter. The first case
expands the \XML\ reference so that we have an \XML\ stream in the auxiliary
file. This stream is processed as a small independent subfile when needed. The
second case registers a reference to the current element (\type {#1}). This means
that we have access to all data of this element, like attributes, title and
content. What happens depends on the given setup. The third variant does the same
but here the setup is part of the reference.

\starttyping
\startxmlsetups xml:chapter
    \ifcase \TestMode
        % xml code travels around
        \setuphead[chapter][expansion=xml]
        \startchapter[title=eh: \xmltext{#1}{title}]
            \xmlfirst{#1}{content}
        \stopchapter
    \or
        % index is used for access via setup
        \setuphead[chapter][expansion=yes,xmlsetup=xml:title:flush]
        \startchapter[title=\xmlgetindex{#1}]
            \xmlfirst{#1}{content}
        \stopchapter
    \or
        % tex call to xml using index is used
        \setuphead[chapter][expansion=yes]
        \startchapter[title=hm: \xmlreference{#1}{xml:title:flush}]
            \xmlfirst{#1}{content}
        \stopchapter
    \fi
\stopxmlsetups

\startxmlsetups xml:title:flush
    \xmltext{#1}{title}
\stopxmlsetups
\stoptyping

We need to deal with emphasis and the content of the chapter.

\starttyping
\startxmlsetups xml:em
    \begingroup\em\xmlflush{#1}\endgroup
\stopxmlsetups

\startxmlsetups xml:content
    \xmlflush{#1}
\stopxmlsetups
\stoptyping

A similar approach is followed with the index entries. Watch how we use the
numbered entries variant (in this case we could also have used just \type
{entries} and \type {keys}).

\starttyping
\startxmlsetups xml:index
    \ifcase \TestMode
        \setupregister[index][expansion=xml,xmlsetup=]
        \setstructurepageregister
          [index]
          [entries:1=\xmlfirst{#1}{content},
           keys:1=\xmltext{#1}{key}]
    \or
        \setupregister[index][expansion=yes,xmlsetup=xml:index:flush]
        \setstructurepageregister
          [index]
          [entries:1=\xmlgetindex{#1},
           keys:1=\xmltext{#1}{key}]
    \or
        \setupregister[index][expansion=yes,xmlsetup=]
        \setstructurepageregister
          [index]
          [entries:1=\xmlreference{#1}{xml:index:flush},
           keys:1=\xmltext{#1}{key}]
    \fi
\stopxmlsetups

\startxmlsetups xml:index:flush
    \xmlfirst{#1}{content}
\stopxmlsetups
\stoptyping

Instead of this flush, you can use the predefined setup \type {xml:flush}
unless it is overloaded by you.

The file is processed by:

\starttyping
\starttext
    \xmlprocessfile{main}{test.xml}{}
\stoptext
\stoptyping

We don't show the result here. If you're curious what the output is, you can test
it yourself. In that case it also makes sense to peek into the \type {test.tuc}
file to see how the information travels around. The \type {metadata} fields carry
information about how to process the data.

The first case, the \XML\ expansion one, is somewhat special in the sense that
internally we use small pseudo files. You can control the rendering by tweaking
the following setups:

\starttyping
\startxmlsetups xml:ctx:sectionentry
    \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:ctx:registerentry
    \xmlflush{#1}
\stopxmlsetups
\stoptyping

{\em When these methods work out okay the other structural elements will be
dealt with in a similar way.}

\stopsection

\startsection[title={special cases}]

Normally the content will be flushed under a special (so called) catcode regime.
This means that characters that have a special meaning in \TEX\ will have no such
meaning in an \XML\ file. If you want content to be treated as \TEX\ code, you can
use one of the following:

\startxmlcmd {\cmdbasicsetup{xmlflushcontext}}
    flush the given \cmdinternal {cd:node} using the \TEX\ character
    interpretation scheme
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlcontext}}
    flush the match of \cmdinternal {cd:lpath} for the given \cmdinternal
    {cd:node} using the \TEX\ character interpretation scheme
\stopxmlcmd

We use this in cases like:

\starttyping
....
  \xmlsetsetup {#1} {
      tm|texformula|
  } {xml:*}
....

\startxmlsetups xml:tm
  \mathematics{\xmlflushcontext{#1}}
\stopxmlsetups

\startxmlsetups xml:texformula
  \placeformula\startformula\xmlflushcontext{#1}\stopformula
\stopxmlsetups
\stoptyping
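
To make this concrete, here is a small self contained sketch. The buffer name
\type {tm-demo}, the document name \type {tmdemo} and the element names \type
{doc} and \type {p} are made up for the occasion; only \type {tm} comes from the
setups above.

\starttyping
\startbuffer[tm-demo]
<doc>
    <p>Pythagoras: <tm>a^2 + b^2 = c^2</tm></p>
</doc>
\stopbuffer

\startxmlsetups xml:tmdemo:base
    \xmlsetsetup{#1}{doc|p|tm}{xml:tmdemo:*}
\stopxmlsetups

\xmlregisterdocumentsetup{tmdemo}{xml:tmdemo:base}

\startxmlsetups xml:tmdemo:doc
    \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:tmdemo:p
    \xmlflush{#1}\par
\stopxmlsetups

\startxmlsetups xml:tmdemo:tm
    % the content of <tm> is interpreted as TeX math input
    \mathematics{\xmlflushcontext{#1}}
\stopxmlsetups

\xmlprocessbuffer{tmdemo}{tm-demo}{}
\stoptyping

With a plain \type {\xmlflush} the \type {^} would have no special meaning; here
it is interpreted as the superscript trigger.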

\stopsection

\startsection[title={collecting}]

Say that your document has

\starttyping
<table>
    <tr>
        <td>foo</td>
        <td>bar</td>
    </tr>
</table>
\stoptyping

And that you need to convert that to \TEX\ speak like:

\starttyping
\bTABLE
    \bTR
        \bTD foo \eTD
        \bTD bar \eTD
    \eTR
\eTABLE
\stoptyping

A simple mapping is:

\starttyping
\startxmlsetups xml:table
    \bTABLE \xmlflush{#1} \eTABLE
\stopxmlsetups
\startxmlsetups xml:tr
    \bTR \xmlflush{#1} \eTR
\stopxmlsetups
\startxmlsetups xml:td
    \bTD \xmlflush{#1} \eTD
\stopxmlsetups
\stoptyping

The \type {\bTD} command is a so-called delimited command, which means that it
picks up its argument by looking for an \type {\eTD}. For the simple case here
this works quite well because the flush is inside the pair. This is not the case
in the following variant:

\starttyping
\startxmlsetups xml:td:start
    \bTD
\stopxmlsetups
\startxmlsetups xml:td:stop
    \eTD
\stopxmlsetups
\startxmlsetups xml:td
    \xmlsetup{#1}{xml:td:start}
    \xmlflush{#1}
    \xmlsetup{#1}{xml:td:stop}
\stopxmlsetups
\stoptyping

When for some reason \TEX\ gets confused you can revert to a mechanism that
collects content.

\starttyping
\startxmlsetups xml:td:start
    \startcollect
        \bTD
    \stopcollect
\stopxmlsetups
\startxmlsetups xml:td:stop
    \startcollect
        \eTD
    \stopcollect
\stopxmlsetups
\startxmlsetups xml:td
    \startcollecting
        \xmlsetup{#1}{xml:td:start}
        \xmlflush{#1}
        \xmlsetup{#1}{xml:td:stop}
    \stopcollecting
\stopxmlsetups
\stoptyping

You can even implement solutions that effectively do this:

\starttyping
\startcollecting
    \startcollect \bTABLE \stopcollect
        \startcollect \bTR \stopcollect
            \startcollect \bTD \stopcollect
            \startcollect   foo\stopcollect
            \startcollect \eTD \stopcollect
            \startcollect \bTD \stopcollect
            \startcollect   bar\stopcollect
            \startcollect \eTD \stopcollect
        \startcollect \eTR \stopcollect
    \startcollect \eTABLE \stopcollect
\stopcollecting
\stoptyping

Of course you only need to go that complex when the situation demands it. Here is
another weird one:

\starttyping
\startcollecting
    \startcollect \setupsomething[\stopcollect
        \startcollect foo=\stopcollect
        \startcollect FOO,\stopcollect
        \startcollect bar=\stopcollect
        \startcollect BAR,\stopcollect
    \startcollect ]\stopcollect
\stopcollecting
\stoptyping

\stopsection

\startsection[title={selectors and injectors}]

This section describes a somewhat special feature, one that we needed for a project
where we could not touch the original content but could add specific sections for
our own purpose. Hopefully the example demonstrates its usability.

\enabletrackers[lxml.selectors]

\startbuffer[foo]
<?xml version="1.0" encoding="UTF-8"?>

<?context-directive message info 1: this is a demo file ?>
<?context-message-directive info 2: this is a demo file ?>

<one>
    <two>
        <?context-select begin t1 t2 t3 ?>
            <three>
                t1 t2 t3
                <?context-directive injector crlf t1 ?>
                t1 t2 t3
            </three>
        <?context-select end ?>
        <?context-select begin t4 ?>
            <four>
                t4
            </four>
        <?context-select end ?>
        <?context-select begin t8 ?>
            <four>
                t8.0
                t8.0
            </four>
        <?context-select end ?>
        <?context-include begin t4 ?>
            <!--
                <three>
                    t4.t3
                    <?context-directive injector crlf t1 ?>
                    t4.t3
                </three>
            -->
            <three>
                t3
                <?context-directive injector crlf t1 ?>
                t3
            </three>
        <?context-include end ?>
        <?context-select begin t8 ?>
            <four>
                t8.1
                t8.1
            </four>
        <?context-select end ?>
        <?context-select begin t8 ?>
            <four>
                t8.2
                t8.2
            </four>
        <?context-select end ?>
        <?context-select begin t4 ?>
            <four>
                t4
                t4
            </four>
        <?context-select end ?>
        <?context-directive injector page t7 t8 ?>
        foo
        <?context-directive injector blank t1 ?>
        bar
        <?context-directive injector page t7 t8 ?>
        bar
    </two>
</one>
\stopbuffer

\typebuffer[foo]

First we show how to plug in a directive. Processing instructions like the
following are normally ignored by an \XML\ processor, unless they make sense
to it.

\starttyping
<?context-directive message info 1: this is a demo file ?>
<?context-message-directive info 2: this is a demo file ?>
\stoptyping

We can define a message handler as follows:

\startbuffer
\def\MyMessage#1#2#3{\writestatus{#1}{#2 #3}}

\xmlinstalldirective{message}{MyMessage}
\stopbuffer

\typebuffer \getbuffer

When this file is processed you will see this on the console:

\starttyping
info > 1: this is a demo file
info > 2: this is a demo file
\stoptyping

The file has some sections that can be used or ignored. The recipe for
obeying \type {t1} and \type {t4} is the following:

\startbuffer
\xmlsetinjectors[t1]
\xmlsetinjectors[t4]

\startxmlsetups xml:initialize
    \xmlapplyselectors{#1}
    \xmlsetsetup {#1} {
        one|two|three|four
    } {xml:*}
\stopxmlsetups

\xmlregistersetup{xml:initialize}

\startxmlsetups xml:one
    [ONE \xmlflush{#1} ONE]
\stopxmlsetups

\startxmlsetups xml:two
    [TWO \xmlflush{#1} TWO]
\stopxmlsetups

\startxmlsetups xml:three
    [THREE \xmlflush{#1} THREE]
\stopxmlsetups

\startxmlsetups xml:four
    [FOUR \xmlflush{#1} FOUR]
\stopxmlsetups
\stopbuffer

\typebuffer \getbuffer

This typesets:

\startnarrower
\xmlprocessbuffer{main}{foo}{}
\stopnarrower

The include coding is kind of special: it permits adding content (in a comment)
and ignoring the rest so that we indeed can add something without interfering
with the original. Of course in a normal workflow such messy solutions are
not needed, but alas, often workflows are not that clean, especially when one
has no real control over the source.

\startxmlcmd {\cmdbasicsetup{xmlsetinjectors}}
    enables a list of injectors that will be used
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlresetinjectors}}
    resets the list of injectors
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlinjector}}
    expands an injection (command); normally this one is only used
    in some setup or for testing
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlapplyselectors}}
    analyze the tree \cmdinternal {cd:node} for marked sections that
    will be injected
\stopxmlcmd

We have some injections predefined:

\starttyping
\startsetups xml:directive:injector:page
    \page
\stopsetups

\startsetups xml:directive:injector:column
    \column
\stopsetups

\startsetups xml:directive:injector:blank
    \blank
\stopsetups
\stoptyping

In the example we see:

\starttyping
<?context-directive injector page t7 t8 ?>
\stoptyping

When we set \type {\xmlsetinjectors[t7]} a pagebreak will be injected in that spot.
Tags like \type {t7}, \type {t8} etc.\ can represent versions.
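
Assuming that injectors are resolved by name, following the pattern of the
predefined ones shown above, a custom injector could be sketched as follows. The
name \type {extraspace} is made up for this example.

\starttyping
% hypothetical injector, named after the pattern of the predefined ones
\startsetups xml:directive:injector:extraspace
    \blank[2*big]
\stopsetups

\xmlsetinjectors[t1]
\stoptyping

A matching processing instruction in the source would then be:

\starttyping
<?context-directive injector extraspace t1 ?>
\stoptyping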

\stopsection

\startsection[title={preprocessing}]

% local match    = lpeg.match
% local replacer = lpeg.replacer("BAD TITLE:","<bold>BAD TITLE:</bold>")
%
% function lxml.preprocessor(data,settings)
%     return match(replacer,data)
% end

\startbuffer[pre-code]
\startluacode
    function lxml.preprocessor(data,settings)
        return string.find(data,"BAD TITLE:")
           and string.gsub(data,"BAD TITLE:","<bold>BAD TITLE:</bold>")
            or data
    end
\stopluacode
\stopbuffer

\startbuffer[pre-xml]
\startxmlsetups pre:demo:initialize
    \xmlsetsetup{#1}{*}{pre:demo:*}
\stopxmlsetups

\xmlregisterdocumentsetup{pre:demo}{pre:demo:initialize}

\startxmlsetups pre:demo:root
    \xmlflush{#1}
\stopxmlsetups

\startxmlsetups pre:demo:bold
    \begingroup\bf\xmlflush{#1}\endgroup
\stopxmlsetups

\starttext
    \xmlprocessbuffer{pre:demo}{demo}{}
\stoptext
\stopbuffer

Say that you have the following \XML\ setup:

\typebuffer[pre-xml]

and that (such things happen) the input looks like this:

\startbuffer[demo]
<root>
BAD TITLE: crap crap crap ...

BAD TITLE: crap crap crap ...
</root>
\stopbuffer

\typebuffer[demo]

You can then clean up these \type {BAD TITLE}'s as follows:

\typebuffer[pre-code]

and get as result:

\start \getbuffer[pre-code,pre-xml] \stop

The preprocessor function gets as second argument the current settings, and
the field \type {currentresource} can be used to limit the actions to
specific resources, in our case it's \type {buffer: demo}. Afterwards you can
reset the preprocessor with:

\startluacode
lxml.preprocessor = nil
\stopluacode

Future versions might give some more control over preprocessors. For now consider
it to be a quick hack.
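
The following variant is a sketch of such a limitation. It assumes that the
\type {currentresource} field holds exactly the string mentioned above and it
leaves the data of all other resources untouched.

\starttyping
\startluacode
    function lxml.preprocessor(data,settings)
        -- only touch the demo buffer (assumed resource name)
        if settings and settings.currentresource == "buffer: demo" then
            -- the parentheses drop the match count that gsub also returns
            return (string.gsub(data,"BAD TITLE:","<bold>BAD TITLE:</bold>"))
        end
        return data
    end
\stopluacode
\stoptyping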

\stopsection

\stopchapter

\stopcomponent
817