% language=us runpath=texruns:manuals/musings

\useMPlibrary[dum]

% Extending Darwin's Revolution – David Sloan Wilson & Robert Sapolsky

\startcomponent musings-toocomplex

\environment musings-style

\startchapter[title={False promises}]

\startsection[title={Introduction}]

\startlines \setupalign[flushright]
Hans Hagen
Hasselt NL
July 2019 (public 2023)
\stoplines

The \TEX\ typesetting system is pretty powerful, and even more so when you
combine it with \METAPOST\ and \LUA. Add an \XML\ parser and a whole lot of handy
macros, provide support for fonts and advanced \PDF\ output, and you have a
hard|-|to|-|beat tool. We're talking \CONTEXT.

Such a system is very well suited for fully automated typesetting. There are
\TEX\ lovers who claim that \TEX\ can do anything better than the competition but
that's not true. Automated typesetting is quite doable when you accept the
constraints. When the input is unpredictable you need to play safe!

Some things are easy: turning complex \XML\ into \PDF\ with adaptive graphics,
fast data processing, colorful layouts, conditional processing, extensive cross
referencing; for all of these you can safely say that it can be done. But in
practice there is design involved, and designs are often specified by people who
manipulate a layout on the fly and tweak and cheat in an interactive \WYSIWYG\
program. That, however, is not an option in automated typesetting. Traditional
thinking with manual intervention has to make way for systematic and consistent
solutions. Limitations can be compensated for by clever designs and by getting
the maximum out of the system used.

Unfortunately, in practice some habits are hard to get rid of. Inconsistent
colors, fonts, sectioning and image placement are just a few aspects that come to
mind. When you typeset educational documents you also have to deal with strong
opinions about how something should be presented and what students can't~(!)
handle, like for instance cross references. One of the most dominant demands in
typesetting such documents is the use of so|-|called side floats. In (for
instance) scientific publishing, references to content typeset elsewhere
(formulas, graphics) are acceptable, but in educational documents this is often
not an option (don't ask me why).

In the next sections I will mention a few aspects of side floats. I will not
discuss the options because these are covered in the manuals. Here we stick to
the challenges, and the main question that you have to ask yourself is:
\quotation {How would I solve that, if it can be solved at all?} It might make
you a bit more tolerant of suboptimal outcomes.

\stopsection

\startsection[title={The basics}]

We start with a simple example. The result is shown in \in {figure} [demo-1a]. We
have figures placed at the left, with enough text alongside that we don't run
into the next figure.

\startbuffer[demo-1a]
\useMPlibrary[dum]
\setuplayout[middle]
\setupbodyfont[plex]
\dorecurse {8} {
    \startplacefigure[location=left]
        \externalfigure[dummy][width=3cm]
    \stopplacefigure
    \samplefile{sapolsky}
    \par
}
\stopbuffer

\typebuffer[demo-1a]

\startplacefigure[reference=demo-1a,title={A simple example with enough text in a single paragraph.}]
    \startcombination
        {\typesetbuffer[demo-1a][width=5cm,frame=on,page=1]} {}
        {\typesetbuffer[demo-1a][width=5cm,frame=on,page=2]} {}
    \stopcombination
\stopplacefigure

Challenge: Anchor some boxed material to the running text and make sure that the
text runs around that material. When there is not enough room available on the
page, enforce a page break and move the lot to the next page.

But more often than not, the paragraph following the insert is not long enough to
go around it. The worst case is of course when we end up with one word below the
insert, for which the solution is to adapt the text or make the insert wider or
narrower. Forgetting about this for now, we move to the case where one paragraph
doesn't provide enough text: \in {figure} [demo-1b].

\startbuffer[demo-1b]
\useMPlibrary[dum]
\setuplayout[middle]
\setupbodyfont[plex]
\dorecurse {8} {
    \startplacefigure[location=left]
        \externalfigure[dummy][width=3cm]
    \stopplacefigure
    \samplefile{ward} \par \samplefile{ward}
    \par
}
\stopbuffer

\typebuffer[demo-1b]

\startplacefigure[reference=demo-1b,title={A simple example with enough text but multiple paragraphs.}]
    \startcombination
        {\typesetbuffer[demo-1b][width=5cm,frame=on,page=1]} {}
        {\typesetbuffer[demo-1b][width=5cm,frame=on,page=2]} {}
    \stopcombination
\stopplacefigure

Challenge: At every new paragraph, check whether we're done with the blob we're
typesetting around, and if not, carry on until we are past the insert.
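To make this concrete, here is a minimal sketch in plain \TEX\ primitives,
assuming that we already know how many lines the insert covers. After a
paragraph, \typ {\prevgraf} holds its number of lines, so we subtract that from
what is left and apply the hanging again at the next paragraph. The names
\typ {\HangLinesLeft}, \typ {\HangWidth}, \typ {\StartHangParagraph} and
\typ {\StopHangParagraph} are hypothetical and not part of \CONTEXT, and vertical
space between paragraphs is ignored.

\startbuffer[sketch-hang-continue]
% hypothetical names, not part of ConTeXt
\newcount\HangLinesLeft % lines still alongside the insert
\newdimen\HangWidth     % width of the insert plus a gap

\def\StartHangParagraph
  {\ifnum\HangLinesLeft>0
     \hangindent\HangWidth     % positive: indent at the left
     \hangafter-\HangLinesLeft % negative: only the first n lines
   \fi}

\def\StopHangParagraph
  {\advance\HangLinesLeft-\prevgraf % lines used by this paragraph
   \ifnum\HangLinesLeft<0 \HangLinesLeft0 \fi}

\HangLinesLeft=6 \HangWidth=3.5cm
\StartHangParagraph \samplefile{ward}\par \StopHangParagraph
\StartHangParagraph \samplefile{ward}\par \StopHangParagraph
\stopbuffer

\typebuffer[sketch-hang-continue]

Note the check for a positive count: a \typ {\hangafter} of zero would indent all
lines of the paragraph instead of none.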

\startbuffer[demo-1c]
\useMPlibrary[dum]
\setuplayout[middle]
\setupbodyfont[plex]
\dorecurse {8} {
    \startplacefigure[location=left]
        \externalfigure[dummy][width=3cm]
    \stopplacefigure
    \samplefile{ward}
    \par
}
\stopbuffer

The next example, shown in \in {figure} [demo-1c], has less text. However, the
running text ends while still alongside the figure, so white space needs to be
added until we're beyond it.

\typebuffer[demo-1c]

\startplacefigure[reference=demo-1c,title={A simple example with less text.}]
    \startcombination
        {\typesetbuffer[demo-1c][width=5cm,frame=on,page=1]} {}
        {\typesetbuffer[demo-1c][width=5cm,frame=on,page=2]} {}
    \stopcombination
\stopplacefigure

Challenge: When there is not enough content and the next insert is coming, add
enough whitespace to get past the current insert and then start the new one. This
is typically something that can also be enforced by an option.
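A minimal sketch of such a filler, again with primitives. It assumes a
hypothetical counter \typ {\HangLinesLeft} that holds the number of lines still
alongside the current insert (for instance updated with \typ {\prevgraf} after
every paragraph); \typ {\FinishHanging} is an equally hypothetical name. A real
implementation also has to deal with the depth of the last line and with a page
break inside the added space.

\startbuffer[sketch-hang-fill]
% hypothetical names, not part of ConTeXt
\newcount\HangLinesLeft % lines still alongside the current insert

\def\FinishHanging
  {\par % make sure the running paragraph is finished
   \ifnum\HangLinesLeft>0
     \vskip\HangLinesLeft\baselineskip % one blank line per line left
     \HangLinesLeft0
   \fi}
\stopbuffer

\typebuffer[sketch-hang-fill]

Called right before a new side float starts, such a filler also makes sure that
the new insert never starts alongside the previous one.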

Before we move on to the next challenge, let's explain how we run around the
insert. When \TEX\ typesets a paragraph, it uses dimensions like \typ {\leftskip}
and \typ {\rightskip} (margins) and shape directives like \typ {\hangindent} and
\typ {\hangafter}. There is also the possibility to define a \typ {\parshape} but
we leave that aside for now. The width of the image is reflected in the indent and
the height gets divided by the line height and becomes the \typ {\hangafter}.
Because \TEX\ resets these parameters after every paragraph, they have to be set
again whenever a new paragraph is started.
\footnote {I still consider playing with a third parameter representing hang
height and adding that to the line break routine, but I have to admit that
tweaking that is tricky. Do I really understand what is going on there?} In
\CONTEXT\ hanging is also available as a basic feature.
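The recipe of the previous paragraph can be written down with primitives. This is
a minimal sketch and not how \CONTEXT\ does it: the hypothetical box
\typ {\HangBox} holds the insert, the gap of 1em is an assumption, and placing the
box itself is left out. Because \typ {\divide} truncates, we add
\typ {\baselineskip} minus 1sp before dividing, so that the number of lines is
rounded up.

\startbuffer[sketch-hang-setup]
% hypothetical names, not part of ConTeXt
\newbox\HangBox
\newcount\HangLines

\setbox\HangBox\hbox{\externalfigure[dummy][width=3cm]}

% number of lines: ceiling of (height + depth) / baselineskip,
% computed in scaled points; \divide truncates, so we add
% baselineskip minus 1sp first to round up
\HangLines\dimexpr\ht\HangBox+\dp\HangBox+\baselineskip-1sp\relax
\divide\HangLines\baselineskip

\hangindent\dimexpr\wd\HangBox+1em\relax % width plus a gap
\hangafter-\HangLines                    % only the first n lines hang
\samplefile{ward}\par                    % both values are reset here
\stopbuffer

\typebuffer[sketch-hang-setup]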

\startbuffer
\starthanging[location=left]
    {\blackrule[color=maincolor,width=3cm,height=1cm]}
    \samplefile{carrol}
\stophanging
\stopbuffer

\typebuffer {\setupalign[tolerant,stretch]\getbuffer}

\startbuffer
\starthanging[location=right]
    {\blackrule[color=maincolor,width=10cm,height=1cm]}
    \samplefile{jojomayer}
\stophanging
\stopbuffer

\typebuffer {\setupalign[tolerant,stretch]\getbuffer}

Hanging floats are not implemented this way; instead they are hooked into the
paragraph start routines. The original approach was a variant of the macros by
Daniel Comenetz as published in TUGboat Volume 14 (1993), No.~1: \quotation
{Anchored Figures at Either Margin}. By now the implementation has moved far away
from that, so \CONTEXT\ users can safely blame me for any issues.

\stopsection

\startsection[title={Unpredictable dimensions}]

In an ideal world images would be more or less consistent, but in practice their
dimensions differ, even the fonts used in graphics can be different, and images
can have white space around them. When testing a layout it helps to use mockups
with a clear border. If these look okay, one can argue that worse|-|looking
assemblies (more visual whitespace above or below) are a matter of making better
images. In \in {figure} [demo-2a] we demonstrate how different dimensions
influence the space below the placement.

\startbuffer[demo-2a]
\useMPlibrary[dum]
\setuplayout[middle]
\setupbodyfont[plex]
\setupalign[tolerant,stretch]
\dostepwiserecurse {2} {8} {1} {
    \startplacefigure[location=left]
        \externalfigure[dummy][width=#1cm]
    \stopplacefigure
    \samplefile{sapolsky}
    \par
}
\stopbuffer

\typebuffer[demo-2a]

\startplacefigure[reference=demo-2a,title={Spacing relates to dimensions.}]
    \startcombination[3*1]
        {\typesetbuffer[demo-2a][width=5cm,frame=on,page=1]} {}
        {\typesetbuffer[demo-2a][width=5cm,frame=on,page=2]} {}
        {\typesetbuffer[demo-2a][width=5cm,frame=on,page=3]} {}
    \stopcombination
\stopplacefigure

In \CONTEXT\ there are plenty of options to add more space above or below the
image. You can anchor the image to the first line in different ways and you can
move it some lines down, with or without text flowing around it. But here we
stick to simple cases and only discuss the challenges.

Challenge: Adapt the wrapping to the actual dimensions and make sure that the
(optional) caption doesn't overlap with the text below.

\stopsection

\startsection[title={Moving forward}]

When the insert doesn't fit it has to move, which is why it's called a float. One
solution is to take it out of the page stream and turn it into a regular
placement, normally centered horizontally somewhere on the page, and in this case
probably at the top of one of the next pages. Because we can cross reference it,
this is quite an okay solution. But in educational documents, where authors refer
to the graphic (picture) on the left or right, that doesn't work out well: the
following content is bound to the image.

Calculating the amount of available space is a bit tricky due to the way \TEX\
works. But let's assume that this can be done (in \CONTEXT\ we have seen several
strategies for this). We then end up at the top of the next page, where different
spacing rules apply, like: no spacing at the top at all. In our examples no
whitespace between paragraphs is present; the final solutions are complicated by
the fact that we need to take such spacing into account.
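A naive check with the primitives \typ {\pagegoal} and \typ {\pagetotal} shows
where the trickiness comes from. In this minimal sketch the needed height
\typ {\HangNeeded} is a hypothetical name and value. The catch is that
\typ {\pagetotal} only covers material that has already been moved to the
current page: the glue and penalties that the next paragraph will add are not
part of the sum yet. Also, \typ {\pagegoal} equals \typ {\maxdimen} as long as the
page is empty.

\startbuffer[sketch-hang-room]
% hypothetical name, not part of ConTeXt
\newdimen\HangNeeded
\HangNeeded=4cm % height of the insert, including its caption

\par % finish the paragraph so the page builder has seen it
\ifdim\pagegoal=\maxdimen
  % the page is empty: there is nothing to check
\else\ifdim\dimexpr\pagegoal-\pagetotal\relax<\HangNeeded
  \page % not enough room: move on to the next page
\fi\fi
\stopbuffer

\typebuffer[sketch-hang-room]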

Challenge: Make sure that we never run off the page but also that we
don't end up with weird situations at the top of the next page.

Another possibility is that an image fits a whole number of lines so tightly that
the next one can come too close to it. Again, this demands some analysis. Here we
use examples with captions, but when there are no captions there is also less
visual space (no depth in lines).

Challenge: Make sure that a following insert never runs too close to a previous
insert.

Solutions can be made better when we use multi|-|pass information. Because a
typical \TEX\ run can only look back, storing information for the next run can
actually make us look forward. But, as in science fiction: when you act upon the
future, the past becomes different and therefore also the future (after that
particular future). This means that you can only go forward. Say you have 10
cases: when case 5 changes because of some feedback, then cases 6 up to 10 can
also change. So, you might need more than 10 runs to get things right. In a
workflow where users are waiting for a result, and a few hundred side floats are
used, this doesn't sell well: processing 400 pages at a rate of 20 pages per
second takes 20 seconds per run. Normally one needs a few runs to get the
references right. Assuming a worst case of 60 seconds, 10 extra runs will bring
you close to 15 minutes. No deal.

Of course one can argue for loading everything into memory and optimizing in one
go, but although \TEX\ can do that well for paragraphs, it won't work for complex
documents. Sure, it's a nice academic exercise to explore limited cases, but those
are not what we encounter.

\stopsection

\startsection[title={Cooperation}]

When discussing (on YouTube) \quotation {Extending Darwin's Revolution}, David
Sloan Wilson and Robert Sapolsky touch on the fact that in some disciplines (like
economics) evolutionary principles are applied. One can apply for instance the
concept of a \quote {selfish gene}. However, they argue that when doing that, one
actually lags behind the now accepted group selection (which goes beyond the
individual benefits). An example is given where aggressive behavior in the short
term can turn one into a winner (who takes it all) but can be self|-|destructive
in the long run: cooperating seems to work better than terminal competition.

In \TEX\ we have glues and penalties. The machinery likes to break at a glue but
a severe penalty can prohibit that. The fact that we have penalties and no
rewards is interesting: a break can be stimulated by a negative penalty. I've
forgotten most of what I learned about cognitive psychology but I do remember
that penalty vs reward discussions could get somewhat out of hand.
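For those who don't deal with this daily, here is a small sketch of these items in
a vertical list, using \TEX\ primitives directly (in a \CONTEXT\ document one
normally uses higher level commands like \typ {\blank} instead). A penalty of
10000 or more forbids a break, one of $-10000$ or less forces it, and values in
between discourage it or, when negative, encourage it. Glue only counts as a
breakpoint when it directly follows a non|-|discardable item like a box, which is
why glue after a penalty is not one.

\startbuffer[sketch-penalties]
First paragraph.\par
\penalty 10000 % infinite penalty: never break here
\vskip 1ex     % glue: not a breakpoint, as it follows a penalty
Second paragraph.\par
\penalty 150   % positive penalty: better not break here
Third paragraph.\par
\penalty -150  % negative penalty (a reward): consider breaking here
Fourth paragraph.\par
\stopbuffer

\typebuffer[sketch-penalties]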

So, when we have in the node list a mix of glue (you can break here), penalties
(better not break here) and rewards (consider breaking here) you can imagine that
these nodes compete. The optimal solution is not really a group process but
basically a rather selfish game. Building a system around that kind of
cooperation is not easy. In \CONTEXT\ a lot of attention always went into
consistent vertical spacing. In \MKII\ there were some \quote {look back} and
\quote {control forward} mechanisms in place, and in \MKIV\ we use a model of
weighted glue: a combination of penalties and skips. Again we look back and again
we also try to control the future. This works reasonably well, but what if we end
up in a real competition?

A section head should not end up at the bottom of a page. Because it is unknown
what follows when a head gets typeset, the mechanism does some checking and then
tries to make sure that no page break follows. Of course there needs to be a
provision for the case where there are many (sub)heads, and of course when there
are only heads on a page (in a concept for instance) you don't want to run off
the page.
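A naive head macro shows how easily such preferences compete. The sketch below is
hypothetical (\typ {\NaiveHead} is not how \CONTEXT\ implements its heads): it
rewards a break before the head and forbids one after it. When two heads follow
each other, the reward above the second head sits at a different place in the
list than the prohibition below the first, so nothing stops \TEX\ from breaking
there and leaving the first head at the bottom of the page.

\startbuffer[sketch-head]
% hypothetical macro, not how ConTeXt implements heads
\def\NaiveHead#1%
  {\par
   \penalty -250         % a reward: breaking before the head is fine
   \vskip 2\baselineskip % space above the head
   \noindent{\bf #1}\par
   \penalty 10000        % never break right after the head
   \vskip .5\baselineskip\relax} % space below the head

\NaiveHead{First head}
\NaiveHead{Second head} % the reward above this head competes with
                        % the penalty below the first one
\samplefile{ward}\par
\stopbuffer

\typebuffer[sketch-head]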

Similar situations arise with for instance itemized lists and the tabulate
mechanism. There we have some heuristics that keep content together in a way that
makes sense given the construct: no single table line at the bottom of a page
etc. But then comes the side float. The available space is checked. When doing
that the whitespace following the section head has to collapse with the space
around the image, but of course at the top of a page spacing is different. So,
calculations are done, but even a small difference between what is possible and
what is needed can eventually still trigger an unwanted page break. This is
because you cannot really ask how much has been accumulated so far: the space
used is influenced by what comes next (like whitespace, maybe interline space,
the previous depth correction, etc.). That in turn means that you have to (sort
of) trigger these future space related items to be applied already.

Challenge: Let the side float mechanism nicely cooperate with other mechanisms
that have their own preferences for crossing pages, adding whitespace and being
bound to following content.

\stopsection

\startsection[title={Easy bits}]

Of course, once there is such a mechanism in place, user demands will trigger
more features. Most of these are actually not that hard to deal with: renumbering
due to moved content, automatic anchoring to the inner or outer margin,
horizontal placement and shifting into margins, etc. Everything that doesn't
relate to vertical placement is rather trivial to deal with, especially when the
whole infrastructure for that is already present (as in \CONTEXT). The problem
with such extensions is that one can easily forget what is possible because most
are rarely used.

Challenge: Make sure that all fits into an understandable model and is easy to
control.

\stopsection

\startsection[title={Conclusion}]

The side float mechanism in \CONTEXT\ is complex, has many low level options, and
its code doesn't look pretty. It is probably the mechanism that has been
overhauled and touched most in the code base. It is also the mechanism that
(still) can behave in ways you don't expect when combined with other mechanisms.
The way we deal with this (if needed) is to add directives to (in our case) \XML\
files that tell the engine what to do. Because that is a last resort, it is only
needed when making the final product. So in the end, we still have the
benefits of automated typesetting.

Of course we can come up with a different model (basically re|-|implement the
page builder) but apart from much else falling apart, it will just introduce
other constraints and side effects. Thinking in terms of selfish nodes, glues and
penalties works okay for a specific document where one can also impose usage
rules. If you know that a section head is always followed by regular text, things
become easier. But in a system like \CONTEXT\ you need to update your thinking to
group selection: mechanisms have to work together and that can be pretty
complicated. Some mechanisms can do that better than others. One outcome can be
that for instance side floats are not really group players, so eventually they
might become less popular and fade away. Of course, as often, years later they
get rediscovered and the cycle starts again. Maybe a strong argument can be made
that in fully automated typesetting concepts like side floats should not be used
anyway.

If I have to summarize this wrap|-|up, the conclusion is that we should be
realistic: we're not dealing with an expert system, but with a bunch of
heuristics. You need an intelligent system to help you out of deadlock and
oscillating solutions. Given the different preferences you need a multiple
personality system. You might actually need a system that wraps your expectations
and solutions and that adapts to changes in those over time. But if there is such
a system (some day) it probably doesn't need you. In fact, maybe even typesetting
is not needed any more by then.

\stopsection

\stopchapter

\stopcomponent