% language=us runpath=texruns:manuals/musings

\useMPlibrary[dum]

% Extending Darwin's Revolution – David Sloan Wilson & Robert Sapolsky

\startcomponent musings-toocomplex

\environment musings-style

\startchapter[title={False promises}]

\startsection[title={Introduction}]

\startlines \setupalign[flushright]
Hans Hagen
Hasselt NL
July 2019 (public 2023)
\stoplines

The \TEX\ typesetting system is pretty powerful, and even more so when you
combine it with \METAPOST\ and \LUA. Add an \XML\ parser, a whole lot of handy
macros, provide support for fonts and advanced \PDF\ output and you have a tool
that is hard to beat. We're talking \CONTEXT.

Such a system is very well suited for fully automated typesetting. There are
\TEX\ lovers who claim that \TEX\ can do anything better than the competition but
that's not true. Automated typesetting is quite doable when you accept the
constraints. When the input is unpredictable you need to play it safe!

Some things are easy. Turning complex \XML\ into \PDF\ with adaptive graphics,
fast data processing, colorful layouts, conditional processing, extensive cross
referencing: you can safely say that it can be done. But in practice there is
some design involved and designs are often specified by people who manipulate a
layout on the fly and tweak and cheat in an interactive \WYSIWYG\ program. That is
however not an option in automated typesetting. Traditional thinking with manual
intervention has to make way for systematic and consistent solutions.
Limitations can be compensated for by clever designs and by getting the maximum
out of the system used.

Unfortunately in practice some habits are hard to get rid of. Inconsistent use of
colors, fonts, sectioning and image placement are just a few aspects that come to
mind. When you typeset educational documents you also have to deal with strong
opinions about how something should be presented and what students can't~(!)
handle, like for instance cross references. One of the most dominant demands in
typesetting such documents is for so called side floats. In (for instance)
scientific publishing references to content typeset elsewhere (formulas,
graphics) are acceptable but in educational documents this is often not an option
(don't ask me why).

In the next sections I will mention a few aspects of side floats. I will not
discuss the options because these are covered in manuals. Here we stick to the
challenges and the main question that you have to ask yourself is: \quotation
{How would I solve that if it can be solved at all?}. It might make you a bit
more tolerant of suboptimal outcomes.

\stopsection

\startsection[title={The basics}]

We start with a simple example. The result is shown in \in {figure} [demo-1a]. We
have figures, put at the left, with enough text alongside so that we don't have a
problem running into the next figure.

\startbuffer[demo-1a]
\dorecurse {8} {
    \useMPlibrary[dum]
    \setuplayout[middle]
    \setupbodyfont[plex]
    \startplacefigure[location=left]
        \externalfigure[dummy][width=3cm]
    \stopplacefigure
    \samplefile{sapolsky}
    \par
}
\stopbuffer

\typebuffer[demo-1a]

\startplacefigure[reference=demo-1a,title={A simple example with enough text in a single paragraph.}]
    \startcombination
        {\typesetbuffer[demo-1a][width=5cm,frame=on,page=1]} {}
        {\typesetbuffer[demo-1a][width=5cm,frame=on,page=2]} {}
    \stopcombination
\stopplacefigure

Challenge: Anchor some boxed material to the running text and make sure that the
text runs around that material. When there is not enough room available on the
page, enforce a page break and move the lot to the next page.

But more often than not, the following paragraph is not long enough to go around
the insert. The worst case is of course when we end up with one word below the
insert, for which the solution is to adapt the text or make the insert wider or
narrower. Forgetting about this for now, we move to the case where a single
paragraph does not provide enough text: \in {figure} [demo-1b].

\startbuffer[demo-1b]
\dorecurse {8} {
    \useMPlibrary[dum]
    \setuplayout[middle]
    \setupbodyfont[plex]
    \startplacefigure[location=left]
        \externalfigure[dummy][width=3cm]
    \stopplacefigure
    \samplefile{ward} \par \samplefile{ward}
    \par
}
\stopbuffer

\typebuffer[demo-1b]

\startplacefigure[reference=demo-1b,title={A simple example with enough text but multiple paragraphs.}]
    \startcombination
        {\typesetbuffer[demo-1b][width=5cm,frame=on,page=1]} {}
        {\typesetbuffer[demo-1b][width=5cm,frame=on,page=2]} {}
    \stopcombination
\stopplacefigure

Challenge: At every new paragraph, check whether we're still busy with the blob
we're typesetting around and if so carry on until we are past the insert.

\startbuffer[demo-1c]
\dorecurse {8} {
    \useMPlibrary[dum]
    \setuplayout[middle]
    \setupbodyfont[plex]
    \startplacefigure[location=left]
        \externalfigure[dummy][width=3cm]
    \stopplacefigure
    \samplefile{ward}
    \par
}
\stopbuffer

The next example, shown in \in {figure} [demo-1c], has less text. However, the
running text is still alongside the figure, so this means that white space needs
to be added until we're beyond it.

\typebuffer[demo-1c]

\startplacefigure[reference=demo-1c,title={A simple example with less text.}]
    \startcombination
        {\typesetbuffer[demo-1c][width=5cm,frame=on,page=1]} {}
        {\typesetbuffer[demo-1c][width=5cm,frame=on,page=2]} {}
    \stopcombination
\stopplacefigure

Challenge: When there is not enough content, and the next insert is coming, we
add enough whitespace to go around the insert and then start the new one. This is
typically something that can also be enforced by an option.

Before we move on to the next challenge, let's explain how we run around the
insert. When \TEX\ typesets a paragraph, it uses dimensions like \typ {\leftskip}
and \typ {\rightskip} (margins) and shape directives like \typ {\hangindent} and
\typ {\hangafter}. There is also the possibility to define a \typ {\parshape} but
we will leave that for now. The width of the image is reflected in the indent and
the height gets divided by the line height and becomes the \typ {\hangafter}.
Whenever a new paragraph is started, these parameters have to be set again.
\footnote {I still consider playing with a third parameter representing hang
height and adding that to the line break routine, but I have to admit that
tweaking that is tricky. Do I really understand what is going on there?} In
\CONTEXT\ hanging is also available as a basic feature.
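
At the primitive level the idea can be sketched as follows. This is only an
illustration and not how \CONTEXT\ implements side floats: the insert of 3cm
by 2cm and the line height of 14pt are assumed values, and the number of lines
is rounded up by hand (2cm is about 57pt, so four lines of 14pt are not enough).

\starttyping
% Assumed insert: 3cm wide, 2cm high; assumed line height: 14pt.
\hangindent 3cm   % positive value: indent at the left edge
\hangafter  -5    % negative value: indent the first five lines
\noindent \samplefile{ward}
\par              % TeX resets \hangindent and \hangafter here
\stoptyping

Because \TEX\ resets both parameters at the end of every paragraph, a mechanism
that wraps several paragraphs has to keep track of the remaining height itself.
The \CONTEXT\ hanging interface is shown next.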

\startbuffer
\starthanging[location=left]
    {\blackrule[color=maincolor,width=3cm,height=1cm]}
    \samplefile{carrol}
\stophanging
\stopbuffer

\typebuffer {\setupalign[tolerant,stretch]\getbuffer}

\startbuffer
\starthanging[location=right]
    {\blackrule[color=maincolor,width=10cm,height=1cm]}
    \samplefile{jojomayer}
\stophanging
\stopbuffer

\typebuffer {\setupalign[tolerant,stretch]\getbuffer}

The hanging floats are not implemented this way but are hooked into the
paragraph start routines. The original approach was a variant of
the macros by Daniel Comenetz as published in TUGboat Volume 14 (1993),
No.~1: Anchored Figures at Either Margin. In the meantime they are far
from that, so \CONTEXT\ users can safely blame me for any issues.

\stopsection

\startsection[title={Unpredictable dimensions}]

In an ideal world images will be sort of consistent but in practice the
dimensions will differ, even fonts used in graphics can be different, and they
can have white space around them. When testing a layout it helps to use mockups
with a clear border. If these look okay, one can argue that worse looking
assemblies (more visual whitespace above or below) are a matter of making better
images. In \in {figure} [demo-2a] we demonstrate how different dimensions
influence the space below the placement.

\startbuffer[demo-2a]
\dostepwiserecurse {2} {8} {1} {
    \useMPlibrary[dum]
    \setuplayout[middle]
    \setupbodyfont[plex]
    \setupalign[tolerant,stretch]
    \startplacefigure[location=left]
        \externalfigure[dummy][width=#1cm]
    \stopplacefigure
    \samplefile{sapolsky}
    \par
}
\stopbuffer

\typebuffer[demo-2a]

\startplacefigure[reference=demo-2a,title={Spacing relates to dimensions.}]
    \startcombination[3*1]
        {\typesetbuffer[demo-2a][width=5cm,frame=on,page=1]} {}
        {\typesetbuffer[demo-2a][width=5cm,frame=on,page=2]} {}
        {\typesetbuffer[demo-2a][width=5cm,frame=on,page=3]} {}
    \stopcombination
\stopplacefigure

In \CONTEXT\ there are plenty of options to add more space above or below the
image. You can anchor the image to the first line in different ways and you can
move it some lines down, with or without text flowing around it. But here we
stick to simple cases: we only discuss the challenges.

Challenge: Adapt the wrapping to the right dimensions and make sure that the
(optional) caption doesn't overlap with the text below.

\stopsection

\startsection[title={Moving forward}]

When the insert doesn't fit it has to move, which is why it's called a float. One
solution is to take it out of the page stream and turn it into a regular
placement, normally centered horizontally somewhere on the page, and in this case
probably at the top of one of the next pages. Because we can cross reference this
is quite an okay solution. But, in educational documents, where authors refer to
the graphic (picture) on the left or right, that doesn't work out well. The
following content is bound to the image.
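
As a sketch of that fallback, the same graphic can be given as a regular float
with a reference. The reference name \type {fig:sketch} and the title are made
up for this illustration; the other keys are the ones used in the examples
above, with \type {location=top} instead of \type {location=left}.

\starttyping
\startplacefigure[location=top,reference=fig:sketch,title={A regular float.}]
    \externalfigure[dummy][width=3cm]
\stopplacefigure

The text can point at it wherever it ends up: see \in {figure} [fig:sketch].
\stoptyping

This only helps when the text refers to the figure by number and not by its
position next to the paragraph.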

Calculating the amount of available space is a bit tricky due to the way \TEX\
works. But let's assume that this can be done (in \CONTEXT\ we have seen several
strategies for this). We then end up at the top of the next page and there
different spacing rules apply, like: no spacing at the top at all. In our
examples no whitespace between paragraphs is present. The final solutions are
complicated by the fact that we need to take this into account.

Challenge: Make sure that we never run off the page but also that we
don't end up with weird situations at the top of the next page.

Another possibility is that images fit a whole number of lines so tightly that
the next one can come too close to the previous one. Again, this demands some
analysis. Here we use examples with captions but when there are no captions,
there is also less visual space (no depth in lines).

Challenge: Make sure that a following insert never runs too close to a previous
insert.

Solutions can be made better when we use multi|-|pass information. Because in a
typical \TEX\ run there is only looking back, storing information can actually
make us look forward. But, as in science fiction: when you act upon the future,
the past becomes different and therefore also the future (after that particular
future). This means that you can only go forward. Say you have 10 cases: when
case 5 changes because of some feedback, then cases 6 up to 10 can also change.
So, you might need more than 10 runs to get things right. In a workflow where
users are waiting for a result, and a few hundred side floats are used, this
doesn't sell well: processing 400 pages at a rate of 20 pages per second takes 20
seconds per run. Normally one needs a few runs to get the references right.
Assuming a worst case of 60 seconds, 10 extra runs will bring you close to 15
minutes. No deal.
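
The numbers can be checked with a bit of arithmetic. The time per run follows
from the figures given above. The total assumes that every extra run costs a
complete 60 second cycle, which is an interpretation and not something the text
states.

\startformula
\frac{400\ \text{pages}}{20\ \text{pages per second}} = 20\ \text{seconds per run}
\stopformula

\startformula
60\ \text{seconds} + 10 \times 60\ \text{seconds} = 660\ \text{seconds} = 11\ \text{minutes}
\stopformula

Under that assumption the total is 11 minutes, which is of the same order as
the quarter of an hour mentioned above.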

Of course one can argue for a load|-|in|-|memory approach that optimizes in one
go, but although \TEX\ can do that well for paragraphs, it won't work for complex
documents. Sure, it's a nice academic exercise to explore limited cases but
those are not what we encounter.

\stopsection

\startsection[title={Cooperation}]

When discussing (on YouTube) \quotation {Extending Darwin's Revolution} David
Sloan Wilson and Robert Sapolsky touch on the fact that in some disciplines (like
economics) evolutionary principles are applied. One can apply for instance the
concept of a \quote {selfish gene}. However, they argue that when doing that, one
actually lags behind the now accepted group selection (which goes beyond the
individual benefits). An example is given where aggressive behavior in the short
term can turn one into a winner (who takes it all) but can be self destructive
in the long run: cooperating seems to work better than terminal competition.

In \TEX\ we have glues and penalties. The machinery likes to break at a glue but
a severe penalty can prohibit that. The fact that we have penalties and no
rewards is interesting: a break can be stimulated by a negative penalty. I've
forgotten most of what I learned about cognitive psychology but I do remember
that penalty vs reward discussions could get somewhat out of hand.
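
In primitive terms this looks as follows. The values are arbitrary and only
meant as an illustration; each line stands on its own and the lines do not form
a sequence.

\starttyping
\vskip 12pt plus 4pt minus 2pt % glue: a legal breakpoint when it follows a box
\penalty 10000                 % 10000 or more: never break here
\penalty -500                  % negative: a break here is welcome
\penalty -10000                % -10000 or less: forces a break
\stoptyping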

So, when we have in the node list a mix of glue (you can break here), penalties
(better not break here) and rewards (consider breaking here) you can imagine that
these nodes compete. The optimal solution is not really a group process but
basically a rather selfish game. Building a system around that kind of
cooperation is not easy. In \CONTEXT\ a lot of attention always went into
consistent vertical spacing. In \MKII\ there were some \quote {look back} and
\quote {control forward} mechanisms in place, and in \MKIV\ we use a model of
weighted glue: a combination of penalties and skips. Again we look back and again
we also try to control the future. This works reasonably well but what if we end
up in a real competition?
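
From the user's perspective this model is reached through \type {\blank}. The
following is a minimal sketch; the comments describe the usual behaviour and
are not a specification of the mechanism.

\starttyping
Some text.
\blank[medium]
\blank[big]       % adjacent blanks collapse, normally the larger one wins
More text.
\blank[samepage]  % a penalty: discourage a page break at this point
Even more text.
\stoptyping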

A section head should not end up at the bottom of a page. Because it is unknown
what follows when the head gets typeset, the mechanism does some checking and
then tries to make sure that there is no page break following. Of course there
needs to be a provision for the cases that there are many (sub)heads and of
course when there are only heads on a page (in a draft for instance) you don't
want to run off the page.

Similar situations arise with for instance itemized lists and the tabulate
mechanism. There we have some heuristics that keep content together in a way that
makes sense given the construct: no single table line at the bottom of a page
etc. But then comes the side float. The available space is checked. When doing
that the whitespace following the section head has to collapse with the space
around the image, but of course at the top of a page spacing is different. So,
calculations are done, but even a small difference between what is possible and
what is needed can eventually still trigger an unwanted page break. This is
because you cannot really ask how much has been accumulated so far: the space
used is influenced by what comes next (like whitespace, maybe interline space,
the previous depth correction, etc). That in turn means that you have to (sort
of) trigger these future space related items to be applied already.

Challenge: Let the side float mechanism nicely cooperate with other mechanisms
that have their own preferences for crossing pages, adding whitespace and being
bound to following content.

\stopsection

\startsection[title={Easy bits}]

Of course, once there is such a mechanism in place, user demands will trigger
more features. Most of these are actually not that hard to deal with: renumbering
due to moved content, automatic anchoring to the inner or outer margin,
horizontal placement and shifting into margins, etc. Everything that doesn't
relate to vertical placement is rather trivial to deal with, especially when the
whole infrastructure for that is already present (as in \CONTEXT). The problem
with such extensions is that one can easily forget what is possible because most
are rarely used.

Challenge: Make sure that all fits into an understandable model and is easy to
control.

\stopsection

\startsection[title={Conclusion}]

The side float mechanism in \CONTEXT\ is complex, has many low level options, and
its code doesn't look pretty. It is probably the mechanism that has been
overhauled and touched most in the code base. It is also the mechanism that
(still) can behave in ways you don't expect when combined with other mechanisms.
The way we deal with this (if needed) is to add directives to (in our case) \XML\
files that tell the engine what to do. Because that is a last resort it is only
needed when making the final product. So in the end, we still have the
benefits of automated typesetting.

Of course we can come up with a different model (basically re|-|implement the
page builder) but apart from much else falling apart, it will just introduce
other constraints and side effects. Thinking in terms of selfish nodes, glues and
penalties works ok for a specific document where one can also impose usage
rules. If you know that a section head is always followed by regular text, things
become easier. But in a system like \CONTEXT\ you need to update your thinking to
group selection: mechanisms have to work together and that can be pretty
complicated. Some mechanisms can do that better than others. One outcome can be
that for instance side floats are not really group players, so eventually they
might become less popular and fade away. Of course, as often, years later they
get rediscovered and the cycle starts again. Maybe a strong argument can be made
that in fully automated typesetting concepts like side floats should not be used
anyway.

If I have to summarize this wrap up, the conclusion is that we should be
realistic: we're not dealing with an expert system, but with a bunch of
heuristics. You need an intelligent system to help you out of deadlocks and
oscillating solutions. Given the different preferences you need a multiple
personality system. You might actually need a system that wraps your expectations
and solutions and that adapts to changes in those over time. But if there is such
a system (some day) it probably doesn't need you. In fact, maybe even typesetting
is not needed any more by then.

\stopsection

\stopchapter

\stopcomponent