% language=us runpath=texruns:manuals/workflows \environment workflows-style \startcomponent workflows-resources \startchapter[title=Accessing resources] One of the benefits of \TEX\ is that you can use it in automated workflows where large quantities of data is involved. A document can consist of several files and normally also includes images. Of course there are styles involved too. At \PRAGMA\ normally put styles and fonts in: \starttyping /data/site/context/tex/texmf-project/tex/context/user//... /data/site/context/tex/texmf-fonts/data///... \stoptyping alongside \starttyping /data/framework/... \stoptyping where the job management services are put, while we put resources in: \starttyping /data/resources/... \stoptyping The processing happens in: \starttyping /data/work// \stoptyping Putting styles (and resources like logos and common images) and fonts (if the project has specific ones not present in the distribution) in the \TEX\ tree makes sense because that is where such files are normally searched. Of course you need to keep the distributions file database upto|-|date after adding files there. Processing has to happen isolated from other runs so there we use unique locations. The services responsible for running also deal with regular cleanup of these temporary files. Resources are somewhat special. They can be stable, i.e.\ change seldom, but more often they are updated or extended periodically (or even daily). We're not talking of a few files here but of thousands. In one project we have 20 thousand resources, that can be combined into arbitrary books, and in another one, each chapter alone is about 400 \XML\ and image files. That means we can have 5000 files per book and as we have at least 20 books, we end up with 100K files. In the first case accessing the resources is easy because there is a well defined structure (under our control) so we know exactly where each file sits in the resource tree. In the 100K case there is a deeper structure which is in itself predictable but because many authors are involved the references to these files are somewhat instable (and undefined). It is surprising to notice that publishers don't care about filenames (read: cannot control all the parties involved) which means that we have inconsistent use of mixed case in filenames, and spaces, underscores and dashes creeping in. Because typesetting for paper is always at the end of the pipeline (which nowadays is mostly driven by (limitations) of web products) we need to have a robust and flexible lookup mechanism. It's a side effect of the click and point culture: if objects are associated (filename in source file with file on the system) anything you key in will work, and consistency completely depends on the user. And bad things then happen when files are copied, renamed, etc. In that stadium we can better be tolerant than try to get it fixed. \footnote {From what we normally receive we often conclude that copy|-|editing and image production companies don't impose any discipline or probably simply lack the tools and methods to control this. Some of our workflows had checkers and fixers, so that when we got 5000 new resources while only a few needed to be replaced we could filter the right ones. It was not uncommon to find duplicates for thousands of pictures: similar or older variants.} \starttyping foo.jpg bar/foo.jpg images/bar/foo.jpg images/foo.jpg \stoptyping The xml files have names like: \starttyping b-c.xml a/b-c.jpg a/b/b-c.jpg a/b/c/b-c.jpg \stoptyping So it's sort of a mess, especially if you add arbitrary casing to this. Of course one can argue that a wrong (relative) location is asking for problems, it's less an issue here because each image has a unique name. We could flatten the resource tree but having tens of thousands of files on one directory is asking for problems when you want to manage them. The typesetting (and related services) run on virtual machines. The three directories: \starttyping /data/site /data/resources /data/work \stoptyping are all mounted as nfs shares on a network storage. For the styles (and binaries) this is no big deal as normally these files are cached, but the resources are another story. Scanning the complete (mounted) resource tree each run is no option so there we use a special mechanism in \CONTEXT\ for locating files. Already early in the development of \MKIV\ one of the locating mechanisms was the following: \starttyping tree:////data/resources/foo/**/drawing.jpg tree:////data/resources/foo/**/Drawing.jpg \stoptyping Here the tree is scanned once per run, which is normally quite okay when there are not that many files and when the files reside on the machine itself. For a more high performance approach using network shares we have a different mechanism. This time it looks like this: \starttyping dirlist:/data/resources/**/drawing.jpg dirlist:/data/resources/**/Drawing.jpg dirlist:/data/resources/**/just/some/place/drawing.jpg dirlist:/data/resources/**/images/drawing.jpg dirlist:/data/resources/**/images/drawing.jpg?option=fileonly dirfile:/data/resources/**/images/drawing.jpg \stoptyping The first two lookups are wildcard. If there is a file with that name, it will be found. If there are more, the first hit is used. The second and third examples are more selective. Here the part after the \type {**} has to match too. So here we can deal with multiple files named \type {drawing.jpg}. The last two equivalent examples are more tolerant. If no explicit match is found, a lookup happens without being selective. The case of a name is ignored but when found, a name with the right case is used. You can hook a path into the resolver for source files, for example: \starttyping \usepath [dirfile://./resources/**] \setupexternalfigures[directory=dirfile://./resources/**] \stoptyping You need to make sure that file(name)s in that location don't override ones in the regular \TEX\ tree. These extra paths are only used for source file lookups so for instance font lookups are not affected. When you add, remove or move files the tree, you need to remove the \type {dirlist.*} files in the root because these are used for locating files. A new file will be generated automatically. Don't forget this! When content doesn't change an alternative discussed in in a later chapter can be considered: hashed databases of files. \stopchapter \stopcomponent