% language=us runpath=texruns:manuals/xml

\environment xml-mkiv-style

\startcomponent xml-mkiv-tricks

\startchapter[title={Tips and tricks}]

\startsection[title={tracing}]

It can be hard to debug code as much happens behind the scenes. Therefore we
have a couple of tracing options. Of course you can typeset some status
information, using for instance:

\startxmlcmd {\cmdbasicsetup{xmlshow}}
    typeset the tree given by \cmdinternal {cd:node}
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlinfo}}
    typeset the name of the element given by \cmdinternal {cd:node}
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlpath}}
    returns the complete path (including namespace prefix and index) of the
    given \cmdinternal {cd:node}
\stopxmlcmd

\startbuffer[demo]
<?xml version="1.0"?>
<document>
    <section>
        <content>
            <p>first</p>
            <p><b>second</b></p>
        </content>
    </section>
    <section>
        <content>
            <p><b>third</b></p>
            <p>fourth</p>
        </content>
    </section>
</document>
\stopbuffer

Say that we have the following \XML:

\typebuffer[demo]

and the next definitions:

\startbuffer
\startxmlsetups xml:demo:base
    \xmlsetsetup{#1}{p|b}{xml:demo:*}
\stopxmlsetups

\startxmlsetups xml:demo:p
    \xmlflush{#1}
    \par
\stopxmlsetups

\startxmlsetups xml:demo:b
    \par
    \xmlpath{#1} : \xmlflush{#1}
    \par
\stopxmlsetups

\xmlregisterdocumentsetup{example-10}{xml:demo:base}

\xmlprocessbuffer{example-10}{demo}{}
\stopbuffer

\typebuffer

This will give us:

\blank \startpacked \getbuffer \stoppacked \blank

If you use \type {\xmlshow} you will get a complete subtree which can
be handy for tracing but can also lead to large documents.

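As a minimal sketch, the \type {xml:demo:b} setup from the example could be
changed temporarily so that it shows the subtree instead of flushing it. This
assumes the \type {demo} buffer and the other setups are defined as above:

\starttyping
\startxmlsetups xml:demo:b
    \par
    \xmlshow{#1} % typesets the subtree of the current <b> element
    \par
\stopxmlsetups
\stoptyping
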
We also have a bunch of trackers that can be enabled, like:

\starttyping
\enabletrackers[xml.show,xml.parse]
\stoptyping

The full list (currently) is:

\starttabulate[|lT|p|]
\NC xml.entities  \NC show what entities are seen and replaced \NC \NR
\NC xml.path      \NC show the result of parsing an lpath expression \NC \NR
\NC xml.parse     \NC show stepwise resolving of expressions \NC \NR
\NC xml.profile   \NC report all parsed lpath expressions (in the log) \NC \NR
\NC xml.remap     \NC show what namespaces are remapped \NC \NR
\NC lxml.access   \NC report errors with respect to resolving (symbolic) nodes \NC \NR
\NC lxml.comments \NC show the comments that are encountered (if at all) \NC \NR
\NC lxml.loading  \NC show what files are loaded and converted \NC \NR
\NC lxml.setups   \NC show what setups are being associated to elements \NC \NR
\stoptabulate

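Trackers can also be switched on for a single run from the command line, which
avoids editing the document, and they can be disabled again halfway through a
document. A small sketch, where \type {myfile.tex} is just a placeholder name:

\starttyping
% on the command line (one run only):
%
%   context --trackers=lxml.setups,lxml.loading myfile.tex
%
% or in the document, around the part that you want to inspect:

\enabletrackers [lxml.setups,lxml.loading]
    ...
\disabletrackers[lxml.setups,lxml.loading]
\stoptyping
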
In one of our workflows we produce books from \XML\ where the (educational)
content is organized in many small files. Each book has about 5~chapters and each
chapter is made of sections that contain text, exercises, resources, etc.\ and so
the document is assembled from thousands of files (don't worry, runtime inclusion
is pretty fast). In order to see where in the sources content resides we can
trace the filename.

\startxmlcmd {\cmdbasicsetup{xmlinclusion}}
    returns the file where the node comes from
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlinclusions}}
    returns the list of files where the node comes from
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlbadinclusions}}
    returns a list of files that were not included due to some problem
\stopxmlcmd

Of course you have to make sure that these names end up somewhere visible, for
instance in the margin.

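A possible way to do that is sketched below. The element name \type {section}
is only an assumption about the source, and filenames with characters that are
special to \TEX\ may need extra care:

\starttyping
\startxmlsetups xml:section
    % put the name of the file that this node comes from in the margin
    \inmargin{\tt \xmlinclusion{#1}}
    \xmlflush{#1}
\stopxmlsetups
\stoptyping
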
\stopsection

\startsection[title={expansion}]

For novice users the concept of expansion might sound frightening and to some
extent it is. However, it is important enough to spend some words on it here.

It is good to realize that most setups are sort of immediate. When one setup is
issued, it can call another one and so on. Normally you won't notice that but
there are cases where that can be a problem. In \TEX\ you can define a macro,
take for instance:

\starttyping
\startxmlsetups xml:foo
  \def\foobar{\xmlfirst{#1}{/bar}}
\stopxmlsetups
\stoptyping

you store a reference to the node \type {bar} in \type {\foobar}, maybe for later
use. In this case the content is not yet fetched, it will be done when \type
{\foobar} is called.

\starttyping
\startxmlsetups xml:foo
  \edef\foobar{\xmlfirst{#1}{/bar}}
\stopxmlsetups
\stoptyping

Here the content of \type {bar} becomes the body of the macro. But what if
\type {bar} itself contains elements that also contain elements? When there
is a setup for \type {bar} it will be triggered and so on.

Suppose that setup looks like:

\starttyping
\startxmlsetups xml:bar
  \def\barfoo{\xmlflush{#1}}
\stopxmlsetups
\stoptyping

In that case we get something like:

\starttyping
\foobar => {\def\barfoo{...}}
\stoptyping

When \type {\barfoo} is not defined we get an error, and when it is known and
expands to something weird we might also get an error.

Especially when you don't know what content can show up, this can result in errors
when an expansion fails, for example because some macro being used is not defined.
To prevent this we can define a macro:

\starttyping
\starttexdefinition unexpanded xml:bar:macro #1
  \def\barfoo{\xmlflush{#1}}
\stoptexdefinition

\startxmlsetups xml:bar
  \texdefinition{xml:bar:macro}{#1}
\stopxmlsetups
\stoptyping

The setup \type {xml:bar} will still expand but the replacement text now is just the
call to the macro, think of:

\starttyping
\foobar => {\texdefinition{xml:bar:macro}{#1}}
\stoptyping

But this is often not needed: most \CONTEXT\ commands can handle the expansions
quite well, but it's good to know that there is a way out. So, now to some
examples. Imagine that we have an \XML\ file that looks as follows:

\starttyping
<?xml version='1.0' ?>
<demo>
    <chapter>
        <title>Some <em>short</em> title</title>
        <content>
            zeta
            <index>
                <key>zeta</key>
                <content>zeta again</content>
            </index>
            alpha
            <index>
                <key>alpha</key>
                <content>alpha <em>again</em></content>
            </index>
            gamma
            <index>
                <key>gamma</key>
                <content>gamma</content>
            </index>
            beta
            <index>
                <key>beta</key>
                <content>beta</content>
            </index>
            delta
            <index>
                <key>delta</key>
                <content>delta</content>
            </index>
            done!
        </content>
    </chapter>
</demo>
\stoptyping

There are a few structure related elements here: a chapter (with its list entry)
and some index entries. Both are multipass related and therefore travel around.
This means that when we let data end up in the auxiliary file, we need to make
sure that we end up with either expanded data (i.e.\ no references to the \XML\
tree) or with robust forward and backward references to elements in the tree.

Here we discuss three approaches (and more may show up later): pushing \XML\ into
the auxiliary file, and using references to elements, either with or without an
associated setup. We control the variants with a switch.

\starttyping
\newcount\TestMode

\TestMode=0 % expansion=xml
\TestMode=1 % expansion=yes, index, setup
\TestMode=2 % expansion=yes
\stoptyping

We apply a couple of setups:

\starttyping
\startxmlsetups xml:mysetups
    \xmlsetsetup{\xmldocument}{demo|index|content|chapter|title|em}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:mysetups}
\stoptyping

The main document is processed with:

\starttyping
\startxmlsetups xml:demo
    \xmlflush{#1}
    \subject{contents}
    \placelist[chapter][criterium=all]
    \subject{index}
    \placeregister[index][criterium=all]
    \page % else buffer is forgotten when placing header
\stopxmlsetups
\stoptyping

First we show three alternative ways to deal with the chapter. The first case
expands the \XML\ reference so that we have an \XML\ stream in the auxiliary
file. This stream is processed as a small independent subfile when needed. The
second case registers a reference to the current element (\type {#1}). This means
that we have access to all data of this element, like attributes, title and
content. What happens depends on the given setup. The third variant does the same
but here the setup is part of the reference.

\starttyping
\startxmlsetups xml:chapter
    \ifcase \TestMode
        % xml code travels around
        \setuphead[chapter][expansion=xml]
        \startchapter[title=eh: \xmltext{#1}{title}]
            \xmlfirst{#1}{content}
        \stopchapter
    \or
        % index is used for access via setup
        \setuphead[chapter][expansion=yes,xmlsetup=xml:title:flush]
        \startchapter[title=\xmlgetindex{#1}]
            \xmlfirst{#1}{content}
        \stopchapter
    \or
        % tex call to xml using index is used
        \setuphead[chapter][expansion=yes]
        \startchapter[title=hm: \xmlreference{#1}{xml:title:flush}]
            \xmlfirst{#1}{content}
        \stopchapter
    \fi
\stopxmlsetups

\startxmlsetups xml:title:flush
    \xmltext{#1}{title}
\stopxmlsetups
\stoptyping

We need to deal with emphasis and the content of the chapter.

\starttyping
\startxmlsetups xml:em
    \begingroup\em\xmlflush{#1}\endgroup
\stopxmlsetups

\startxmlsetups xml:content
    \xmlflush{#1}
\stopxmlsetups
\stoptyping

A similar approach is followed with the index entries. Watch how we use the
numbered entries variant (in this case we could also have used just \type
{entries} and \type {keys}).

\starttyping
\startxmlsetups xml:index
    \ifcase \TestMode
        \setupregister[index][expansion=xml,xmlsetup=]
        \setstructurepageregister
          [index]
          [entries:1=\xmlfirst{#1}{content},
           keys:1=\xmltext{#1}{key}]
    \or
        \setupregister[index][expansion=yes,xmlsetup=xml:index:flush]
        \setstructurepageregister
          [index]
          [entries:1=\xmlgetindex{#1},
           keys:1=\xmltext{#1}{key}]
    \or
        \setupregister[index][expansion=yes,xmlsetup=]
        \setstructurepageregister
          [index]
          [entries:1=\xmlreference{#1}{xml:index:flush},
           keys:1=\xmltext{#1}{key}]
    \fi
\stopxmlsetups

\startxmlsetups xml:index:flush
    \xmlfirst{#1}{content}
\stopxmlsetups
\stoptyping

Instead of this flush, you can use the predefined setup \type {xml:flush}
unless it is overloaded by you.

The file is processed by:

\starttyping
\starttext
    \xmlprocessfile{main}{test.xml}{}
\stoptext
\stoptyping

We don't show the result here. If you're curious what the output is, you can test
it yourself. In that case it also makes sense to peek into the \type {test.tuc}
file to see how the information travels around. The \type {metadata} fields carry
information about how to process the data.

The first case, the \XML\ expansion one, is somewhat special in the sense that
internally we use small pseudo files. You can control the rendering by tweaking
the following setups:

\starttyping
\startxmlsetups xml:ctx:sectionentry
    \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:ctx:registerentry
    \xmlflush{#1}
\stopxmlsetups
\stoptyping

{\em When these methods work out okay the other structural elements will be
dealt with in a similar way.}

\stopsection

\startsection[title={special cases}]

Normally the content will be flushed under a special (so called) catcode regime.
This means that characters that have a special meaning in \TEX\ will have no such
meaning in an \XML\ file. If you want content to be treated as \TEX\ code, you can
use one of the following:

\startxmlcmd {\cmdbasicsetup{xmlflushcontext}}
    flush the given \cmdinternal {cd:node} using the \TEX\ character
    interpretation scheme
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlcontext}}
    flush the match of \cmdinternal {cd:lpath} for the given \cmdinternal
    {cd:node} using the \TEX\ character interpretation scheme
\stopxmlcmd

We use this in cases like:

\starttyping
....
  \xmlsetsetup {#1} {
      tm|texformula|
  } {xml:*}
....

\startxmlsetups xml:tm
  \mathematics{\xmlflushcontext{#1}}
\stopxmlsetups

\startxmlsetups xml:texformula
  \placeformula\startformula\xmlflushcontext{#1}\stopformula
\stopxmlsetups
\stoptyping

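To make this a bit more concrete, here is a sketch of input that such setups
could handle, followed by a variant that uses \type {\xmlcontext} with a path.
The element names \type {formula} and \type {tex} are made up for this example.
Keep in mind that the content still has to be valid \XML, so a \type {<} or
\type {&} in the \TEX\ code must be entered as an entity.

\starttyping
<p>Pythagoras tells us that <tm>a^2 + b^2 = c^2</tm> holds.</p>

<formula>
    <tex>\int_0^1 x^2 \, dx = \frac{1}{3}</tex>
</formula>
\stoptyping

\starttyping
\startxmlsetups xml:formula
    % flush only the (hypothetical) tex child, interpreted as TeX input
    \startformula
        \xmlcontext{#1}{/tex}
    \stopformula
\stopxmlsetups
\stoptyping
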
\stopsection

\startsection[title={collecting}]

Say that your document has

\starttyping
<table>
    <tr>
        <td>foo</td>
        <td>bar</td>
    </tr>
</table>
\stoptyping

And that you need to convert that to \TEX\ speak like:

\starttyping
\bTABLE
    \bTR
        \bTD foo \eTD
        \bTD bar \eTD
    \eTR
\eTABLE
\stoptyping

A simple mapping is:

\starttyping
\startxmlsetups xml:table
    \bTABLE \xmlflush{#1} \eTABLE
\stopxmlsetups
\startxmlsetups xml:tr
    \bTR \xmlflush{#1} \eTR
\stopxmlsetups
\startxmlsetups xml:td
    \bTD \xmlflush{#1} \eTD
\stopxmlsetups
\stoptyping

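For this mapping to take effect the setups have to be associated with the
elements. A minimal sketch, following the pattern used elsewhere in this
chapter (the name \type {xml:table:setups} is made up for this example):

\starttyping
\startxmlsetups xml:table:setups
    % map <table>, <tr> and <td> onto xml:table, xml:tr and xml:td
    \xmlsetsetup{#1}{table|tr|td}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:table:setups}
\stoptyping
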
The \type {\bTD} command is a so called delimited command which means that it
picks up its argument by looking for an \type {\eTD}. For the simple case here
this works quite well because the flush is inside the pair. This is not the case
in the following variant:

\starttyping
\startxmlsetups xml:td:start
    \bTD
\stopxmlsetups
\startxmlsetups xml:td:stop
    \eTD
\stopxmlsetups
\startxmlsetups xml:td
    \xmlsetup{#1}{xml:td:start}
    \xmlflush{#1}
    \xmlsetup{#1}{xml:td:stop}
\stopxmlsetups
\stoptyping

When for some reason \TEX\ gets confused you can revert to a mechanism that
collects content.

\starttyping
\startxmlsetups xml:td:start
    \startcollect
        \bTD
    \stopcollect
\stopxmlsetups
\startxmlsetups xml:td:stop
    \startcollect
        \eTD
    \stopcollect
\stopxmlsetups
\startxmlsetups xml:td
    \startcollecting
        \xmlsetup{#1}{xml:td:start}
        \xmlflush{#1}
        \xmlsetup{#1}{xml:td:stop}
    \stopcollecting
\stopxmlsetups
\stoptyping

You can even implement solutions that effectively do this:

\starttyping
\startcollecting
    \startcollect \bTABLE \stopcollect
        \startcollect \bTR \stopcollect
            \startcollect \bTD \stopcollect
            \startcollect   foo\stopcollect
            \startcollect \eTD \stopcollect
            \startcollect \bTD \stopcollect
            \startcollect   bar\stopcollect
            \startcollect \eTD \stopcollect
        \startcollect \eTR \stopcollect
    \startcollect \eTABLE \stopcollect
\stopcollecting
\stoptyping

Of course you only need to go that complex when the situation demands it. Here is
another weird one:

\starttyping
\startcollecting
    \startcollect \setupsomething[\stopcollect
        \startcollect foo=\stopcollect
        \startcollect FOO,\stopcollect
        \startcollect bar=\stopcollect
        \startcollect BAR,\stopcollect
    \startcollect ]\stopcollect
\stopcollecting
\stoptyping

\stopsection

\startsection[title={low level injection}]

You can inject raw \TEX\ commands into the processed result:

\starttyping
<?xml version='1.0'?>
<whatever>
    <p>test 1</p>
    <?context-tex-directive start ?>
    <?context-tex-directive red   ?>
    <p>test 2</p>
    <?context-tex-directive stop ?>
    <p>test 3</p>
</whatever>
\stoptyping

There are however more structured ways available; these are discussed in the
following section.

\stopsection

\startsection[title={selectors and injectors}]

This section describes a somewhat special feature, one that we needed for a
project where we could not touch the original content but could add specific
sections for our own purpose. Hopefully the example demonstrates its usability.

\enabletrackers[lxml.selectors]

\startbuffer[foo]
<?xml version="1.0" encoding="UTF-8"?>

<?context-directive message info 1: this is a demo file ?>
<?context-message-directive info 2: this is a demo file ?>

<one>
    <two>
        <?context-select begin t1 t2 t3 ?>
            <three>
                t1 t2 t3
                <?context-directive injector crlf t1 ?>
                t1 t2 t3
            </three>
        <?context-select end ?>
        <?context-select begin t4 ?>
            <four>
                t4
            </four>
        <?context-select end ?>
        <?context-select begin t8 ?>
            <four>
                t8.0
                t8.0
            </four>
        <?context-select end ?>
        <?context-include begin t4 ?>
            <!--
                <three>
                    t4.t3
                    <?context-directive injector crlf t1 ?>
                    t4.t3
                </three>
            -->
            <three>
                t3
                <?context-directive injector crlf t1 ?>
                t3
            </three>
        <?context-include end ?>
        <?context-select begin t8 ?>
            <four>
                t8.1
                t8.1
            </four>
        <?context-select end ?>
        <?context-select begin t8 ?>
            <four>
                t8.2
                t8.2
            </four>
        <?context-select end ?>
        <?context-select begin t4 ?>
            <four>
                t4
                t4
            </four>
        <?context-select end ?>
        <?context-directive injector page t7 t8 ?>
        foo
        <?context-directive injector blank t1 ?>
        bar
        <?context-directive injector page t7 t8 ?>
        bar
    </two>
</one>
\stopbuffer

\typebuffer[foo]

First we show how to plug in a directive. Processing instructions like the
following are normally ignored by an \XML\ processor, unless they make sense
to it.

\starttyping
<?context-directive message info 1: this is a demo file ?>
<?context-message-directive info 2: this is a demo file ?>
\stoptyping

We can define a message handler as follows:

\startbuffer
\def\MyMessage#1#2#3{\writestatus{#1}{#2 #3}}

\xmlinstalldirective{message}{MyMessage}
\stopbuffer

\typebuffer \getbuffer

When this file is processed you will see this on the console:

\starttyping
info > 1: this is a demo file
info > 2: this is a demo file
\stoptyping

The file has some sections that can be used or ignored. The recipe for
obeying \type {t1} and \type {t4} is the following:

\startbuffer
\xmlsetinjectors[t1]
\xmlsetinjectors[t4]

\startxmlsetups xml:initialize
    \xmlapplyselectors{#1}
    \xmlsetsetup {#1} {
        one|two|three|four
    } {xml:*}
\stopxmlsetups

\xmlregistersetup{xml:initialize}

\startxmlsetups xml:one
    [ONE \xmlflush{#1} ONE]
\stopxmlsetups

\startxmlsetups xml:two
    [TWO \xmlflush{#1} TWO]
\stopxmlsetups

\startxmlsetups xml:three
    [THREE \xmlflush{#1} THREE]
\stopxmlsetups

\startxmlsetups xml:four
    [FOUR \xmlflush{#1} FOUR]
\stopxmlsetups
\stopbuffer

\typebuffer \getbuffer

This typesets:

\startnarrower
\xmlprocessbuffer{main}{foo}{}
\stopnarrower

The include coding is kind of special: it permits adding content (in a comment)
and ignoring the rest so that we indeed can add something without interfering
with the original. Of course in a normal workflow such messy solutions are
not needed, but alas, often workflows are not that clean, especially when one
has no real control over the source.

\startxmlcmd {\cmdbasicsetup{xmlsetinjectors}}
    enables a list of injectors that will be used
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlresetinjectors}}
    resets the list of injectors
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlinjector}}
    expands an injection (command); normally this one is only used
    in some setup or for testing
\stopxmlcmd

\startxmlcmd {\cmdbasicsetup{xmlapplyselectors}}
    analyze the tree \cmdinternal {cd:node} for marked sections that
    will be injected
\stopxmlcmd

We have some injections predefined:

\starttyping
\startsetups xml:directive:injector:page
    \page
\stopsetups

\startsetups xml:directive:injector:column
    \column
\stopsetups

\startsetups xml:directive:injector:blank
    \blank
\stopsetups
\stoptyping

In the example we see:

\starttyping
<?context-directive injector page t7 t8 ?>
\stoptyping

When we set \type {\xmlsetinjectors[t7]} a pagebreak will be injected in that
spot. Tags like \type {t7}, \type {t8} etc.\ can represent versions.

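The demo file also uses an injector called \type {crlf}, which is not among
the predefined ones shown here. Judging from the naming pattern of the
predefined injections, one would expect that such an injector can be provided
as follows. Treat this as a sketch under that assumption, not as a description
of the implementation:

\starttyping
% assumption: the name after "injector" selects xml:directive:injector:<name>
\startsetups xml:directive:injector:crlf
    \crlf
\stopsetups

% enable the tags for which injections should happen
\xmlsetinjectors[t1,t7]
\stoptyping
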
\stopsection

\startsection[title={preprocessing}]

% local match    = lpeg.match
% local replacer = lpeg.replacer("BAD TITLE:","<bold>BAD TITLE:</bold>")
%
% function lxml.preprocessor(data,settings)
%     return match(replacer,data)
% end

\startbuffer[pre-code]
\startluacode
    function lxml.preprocessor(data,settings)
        return string.find(data,"BAD TITLE:")
           and string.gsub(data,"BAD TITLE:","<bold>BAD TITLE:</bold>")
            or data
    end
\stopluacode
\stopbuffer

\startbuffer[pre-xml]
\startxmlsetups pre:demo:initialize
    \xmlsetsetup{#1}{*}{pre:demo:*}
\stopxmlsetups

\xmlregisterdocumentsetup{pre:demo}{pre:demo:initialize}

\startxmlsetups pre:demo:root
    \xmlflush{#1}
\stopxmlsetups

\startxmlsetups pre:demo:bold
    \begingroup\bf\xmlflush{#1}\endgroup
\stopxmlsetups

\starttext
    \xmlprocessbuffer{pre:demo}{demo}{}
\stoptext
\stopbuffer

Say that you have the following \XML\ setup:

\typebuffer[pre-xml]

and that (such things happen) the input looks like this:

\startbuffer[demo]
<root>
BAD TITLE: crap crap crap ...

BAD TITLE: crap crap crap ...
</root>
\stopbuffer

\typebuffer[demo]

You can then clean up these \type {BAD TITLE}'s as follows:

\typebuffer[pre-code]

and get as result:

\start \getbuffer[pre-code,pre-xml] \stop

The preprocessor function gets the current settings as its second argument, and
the field \type {currentresource} can be used to limit the actions to
specific resources; in our case it's \type {buffer: demo}. Afterwards you can
reset the preprocessor with:

\startluacode
lxml.preprocessor = nil
\stopluacode

Future versions might give some more control over preprocessors. For now consider
it to be a quick hack.

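A variant that only touches one resource could look like the sketch below. It
assumes that \type {settings.currentresource} is a string like the \type
{buffer: demo} mentioned above; the plain substring test on \type {demo} is
just one possible way to check it.

\starttyping
\startluacode
    local find, gsub = string.find, string.gsub

    function lxml.preprocessor(data,settings)
        local resource = settings and settings.currentresource
        -- only act on the resource that we know to be messy
        if resource and find(resource,"demo",1,true) then
            -- the parentheses drop the second return value of gsub
            return (gsub(data,"BAD TITLE:","<bold>BAD TITLE:</bold>"))
        end
        return data
    end
\stopluacode
\stoptyping
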
\stopsection

\stopchapter

\stopcomponent
836