% language=us runpath=texruns:manuals/workflows
% workflows-resources.tex, last modification: 2021-10-28 13:50

\environment workflows-style

\startcomponent workflows-resources

\startchapter[title=Accessing resources]

One of the benefits of \TEX\ is that you can use it in automated workflows where
large quantities of data are involved. A document can consist of several files
and normally also includes images. Of course there are styles involved too. At
\PRAGMA\ we normally put styles and fonts in:

\starttyping
/data/site/context/tex/texmf-project/tex/context/user/<project>/...
/data/site/context/tex/texmf-fonts/data/<foundry>/<collection>/...
\stoptyping

alongside

\starttyping
/data/framework/...
\stoptyping

where the job management services are put, while we put resources in:

\starttyping
/data/resources/...
\stoptyping

The processing happens in:

\starttyping
/data/work/<uuid user space>/
\stoptyping

Putting styles (and resources like logos and common images) and fonts (if the
project has specific ones not present in the distribution) in the \TEX\ tree
makes sense because that is where such files are normally searched for. Of
course you need to keep the distribution's file database up|-|to|-|date after
adding files there.

Processing has to happen in isolation from other runs, so there we use unique
locations. The services responsible for running also deal with regular cleanup
of these temporary files.

Resources are somewhat special. They can be stable, i.e.\ change seldom, but more
often they are updated or extended periodically (or even daily). We're not
talking about a few files here but about thousands. In one project we have 20
thousand resources that can be combined into arbitrary books, and in another one
each chapter alone consists of about 400 \XML\ and image files. That means we
can have 5000 files per book, and as we have at least 20 books, we end up with
100K files. In the first case accessing the resources is easy because there is a
well defined structure (under our control), so we know exactly where each file
sits in the resource tree. In the 100K case there is a deeper structure which is
predictable in itself, but because many authors are involved the references to
these files are somewhat unstable (and undefined). It is surprising to notice
that publishers don't care about filenames (read: cannot control all the parties
involved), which means that we get inconsistent use of mixed case in filenames,
and spaces, underscores and dashes creeping in. Because typesetting for paper is
always at the end of the pipeline (which nowadays is mostly driven by web
products and their limitations), we need a robust and flexible lookup mechanism.
It's a side effect of the point|-|and|-|click culture: if objects are associated
(the filename in a source file with a file on the system), anything you key in
will work, and consistency completely depends on the user. Bad things then
happen when files are copied, renamed, etc. At that stage it is better to be
tolerant than to try to get it fixed. \footnote {From what we normally receive
we often conclude that copy|-|editing and image production companies don't
impose any discipline, or more likely simply lack the tools and methods to
control this. Some of our workflows had checkers and fixers, so that when we got
5000 new resources while only a few needed to be replaced, we could filter out
the right ones. It was not uncommon to find duplicates of thousands of pictures:
similar or older variants.}

For instance, references to images can look like:

\starttyping
foo.jpg
bar/foo.jpg
images/bar/foo.jpg
images/foo.jpg
\stoptyping

The \XML\ files have names like:

\starttyping
b-c.xml
a/b-c.jpg
a/b/b-c.jpg
a/b/c/b-c.jpg
\stoptyping

So it's sort of a mess, especially if you add arbitrary casing to this. Of
course one can argue that a wrong (relative) location is asking for problems, but
it is less of an issue here because each image has a unique name. We could
flatten the resource tree, but having tens of thousands of files in one
directory is asking for problems when you want to manage them.

The typesetting (and related services) run on virtual machines. The three
directories:

\starttyping
/data/site
/data/resources
/data/work
\stoptyping

are all mounted as NFS shares from network storage. For the styles (and
binaries) this is no big deal as these files are normally cached, but the
resources are another story. Scanning the complete (mounted) resource tree on
each run is not an option, so there we use a special mechanism in \CONTEXT\ for
locating files.

One of the locating mechanisms that has been around since early in the
development of \MKIV\ is the following:

\starttyping
tree:////data/resources/foo/**/drawing.jpg
tree:////data/resources/foo/**/Drawing.jpg
\stoptyping
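
As a hedged sketch of where such a specification might be used: the text above
only lists the lookup forms, so it is an assumption here that \type
{\externalfigure} hands its name on to the same resolver. The file name \type
{drawing.jpg} is only illustrative.

\starttyping
\starttext
    % assumption: the figure mechanism accepts a tree: specification
    % the tree below /data/resources/foo is scanned once per run
    \externalfigure[tree:////data/resources/foo/**/drawing.jpg][width=4cm]
\stoptext
\stoptyping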

Here the tree is scanned once per run, which is normally quite okay when there
are not that many files and when the files reside on the machine itself. For
better performance on network shares we have a different mechanism. This time
it looks like this:

\starttyping
dirlist:/data/resources/**/drawing.jpg
dirlist:/data/resources/**/Drawing.jpg
dirlist:/data/resources/**/just/some/place/drawing.jpg
dirlist:/data/resources/**/images/drawing.jpg
dirlist:/data/resources/**/images/drawing.jpg?option=fileonly
dirfile:/data/resources/**/images/drawing.jpg
\stoptyping

The first two lookups are wildcards: if there is a file with that name, it will
be found, and if there are several, the first hit is used. The next two are more
selective: here the part after the \type {**} has to match too, so we can deal
with multiple files named \type {drawing.jpg}. The last two examples are
equivalent and more tolerant: if no explicit match is found, a lookup happens
without being selective. The case of a name is ignored, but when a file is
found, the name with the right case is used.
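
To make the difference concrete, here is a hedged sketch that requests the same
figure with a wildcard, a selective and a tolerant specification. It assumes
that \type {\externalfigure} accepts these specifications directly, which the
text above does not state; \type {drawing.jpg} is only an illustrative name.

\starttyping
\starttext
    % assumption: \externalfigure passes dirlist:/dirfile: names to the resolver
    % wildcard: the first drawing.jpg anywhere below /data/resources
    \externalfigure[dirlist:/data/resources/**/drawing.jpg]
    % selective: only a drawing.jpg that sits in an images directory
    \externalfigure[dirlist:/data/resources/**/images/drawing.jpg]
    % tolerant: as the previous one, but falls back to any drawing.jpg
    \externalfigure[dirfile:/data/resources/**/images/drawing.jpg]
\stoptext
\stoptyping

Because the tolerant variant only falls back when the selective match fails, it
seems a sensible choice when the references in the sources cannot be trusted.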

You can hook a path into the resolver for source files, for example:

\starttyping
\usepath                       [dirfile://./resources/**]
\setupexternalfigures[directory=dirfile://./resources/**]
\stoptyping

You need to make sure that file(name)s in that location don't override ones in
the regular \TEX\ tree. These extra paths are only used for source file lookups,
so font lookups, for instance, are not affected.
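
Once such a path is registered, plain names should be enough in the document
itself. The following minimal sketch assumes a \type {resources} directory next
to the main file; the names \type {chapter-one} and \type {drawing.jpg} are
hypothetical.

\starttyping
% hook the resource tree into source and figure lookups
\usepath                       [dirfile://./resources/**]
\setupexternalfigures[directory=dirfile://./resources/**]

\starttext
    % hypothetical source file somewhere below ./resources
    \input chapter-one
    % hypothetical image somewhere below ./resources
    \externalfigure[drawing.jpg]
\stoptext
\stoptyping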

When you add, remove or move files in the tree, you need to remove the \type
{dirlist.*} files in the root of that tree, because these are used for locating
files. A new file will be generated automatically. Don't forget this!

When the content doesn't change, an alternative discussed in a later chapter can
be considered: hashed databases of files.

\stopchapter

\stopcomponent
