% publications-datasets.tex, size: 12 Kb, last modification: 2020-07-01 14:35
\environment publications-style

\startcomponent publications-datasets

\startchapter[title=Datasets]

Normally in a document you will use only one bibliographic database, whether or
not its source is distributed over multiple files. Nevertheless, we support
multiple database formats as well, which is why we talk of datasets instead. The
use of multiple datasets allows the isolation of different bibliographies (a
single bibliography can nevertheless be rendered by structure element: section,
chapter, part, etc. as we shall see later). A good example of the use of multiple
datasets would be for a proper bibliography itself in addition to a reference
catalog (of equipment, suppliers, software, patents, legal jurisprudence, music,
\unknown). Indeed, datasets can be used to hold both bibliographic and
non|-|bibliographic information.

A dataset is initiated with the \Cindex {definebtxdataset} command.

\cindex {definebtxdataset}

\startTEX
\definebtxdataset[default]
\stopTEX

\startaside
A default database, \TEXcode {default}, is predefined, yet we recommend defining
it explicitly because in the future we may provide more options.
\stopaside

Like other commands in \CONTEXT, the dataset options can be set up using the
command \Cindex {setupbtxdataset}.

\cindex {definebtxdataset}
\showsetup[definebtxdataset]

\cindex {setupbtxdataset}
\showsetup[setupbtxdataset]

A dataset is loaded from some source through the use of the
\Cindex {usebtxdataset} command.

Here are some examples:

\cindex {usebtxdataset}
\tindex {.bib}
\tindex {.xml}
\tindex {.lua}
\tindex {.bbl}

\startTEX
\usebtxdataset[tugboat][tugboat.bib]
\usebtxdataset[default][mtx-bibtex-output.xml]
\usebtxdataset[default][test-001-btx-standard.lua]
\usebtxdataset[default][mkii-publications.bbl]
\usebtxdataset[default][named.buffer]
\stopTEX

\cindex {usebtxdataset}
\showsetup[usebtxdataset]

The four suffixes illustrated in the example above are understood by the loader.
Here every dataset other than the first has the name \TEXcode {default}, so the
sources loaded under that name are merged. The last example shows that a
\TEXcode {named} \Index {buffer} can also be employed to add dataset entries (in
\BIBTEX\ format). This may be useful for small additions or examples, but it is
generally a better idea (for ease of data management) to place them in files
separate from the document source code.

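As a minimal sketch of the buffer mechanism, the following defines a small named
buffer holding a single entry in \BIBTEX\ format and loads it into the \TEXcode
{default} dataset. The buffer name \TEXcode {extra} and the entry itself (its
tag, author, title and year) are hypothetical placeholders, chosen for
illustration only.

\startTEX
\definebtxdataset[default]

% hypothetical entry, for illustration only
\startbuffer[extra]
@Book{placeholder,
    author = "A. N. Author",
    title  = "A Placeholder Title",
    year   = "2020",
}
\stopbuffer

% the suffix .buffer tells the loader to read from the named buffer
\usebtxdataset[default][extra.buffer]
\stopTEX
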
Definitions in the document source (coded in \TEX\ speak) are also added, and
they are saved for successive runs. This means that if you load and define
entries, they will be known at a next run beforehand, so that references to them
are independent of where in the document source loading and definitions take
place. This makes it convenient to eventually break up the dataset loading calls
across the relevant sections of the document structure.

In this document we use some example databases, so let's load one of them now:
\startfootnote This code snippet demonstrates that \TEXcode {\usebtxdataset} will
implicitly declare an undefined dataset name, although this practice is to be
discouraged. Similarly, omitting to specify the dataset name \TEXcode {[default]}
in the examples given earlier would fall back correctly, but this, too, is to
be discouraged as being potentially error|-|prone. \stopfootnote

\startbuffer
\usebtxdataset[example][mkiv-publications.bib]
\stopbuffer

\cindex {definebtxdataset}
\cindex {usebtxdataset}

\typeTEXbuffer

\getbuffer

The beginning of the file \type {mkiv-publications.bib} is shown below in \in
{table} [tab:mkiv-publications.bib]. This bibliography database test file
contains one entry of each standard type or category, with the \Index {tag} set
to the entry type name. The entry shown here illustrates many features that will
be explained elsewhere in the text.

\startsection[title=Dataset coverage]

You can load much more data than you actually need. Usually only those entries
that are referred to explicitly will be shown in lists, and commands used to
select these dataset entries will be described in \in {chapter} [ch:cite].

A single bibliography list can span groups of datasets; also multiple datasets
can be loaded from the same source, for example, one per chapter, in order to
achieve a complete \Index {isolation} of bibliographies with respect to numbering
and references.

As this concept is not obvious but can be quite useful, we will repeat this last
point: multiple datasets can be loaded using the same source file, i.e.\
containing the same data, to be used in parallel, independently. There is little
penalty in keeping even very large datasets as multiple copies in memory.

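To illustrate, and assuming a document with two chapters, the same source file
could be loaded into two independent datasets as sketched below. The dataset
names \TEXcode {chapterone} and \TEXcode {chaptertwo} are hypothetical.

\startTEX
\definebtxdataset[chapterone]
\definebtxdataset[chaptertwo]

% the same source, loaded twice into independent datasets
\usebtxdataset[chapterone][mkiv-publications.bib]
\usebtxdataset[chaptertwo][mkiv-publications.bib]
\stopTEX
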
The current active dataset to be used by default can be set with

\startbuffer
\setupbtx[dataset=example]
\stopbuffer

\cindex {setupbtx}

\typeTEXbuffer

\getbuffer

However, most publication|-|related commands accept optional arguments that
denote the dataset, and references to entries can always be prefixed with a
dataset identifier. More about that later.

\showsetup[setupbtx]

\stopsection

\startsection [title=Specification]

The content of a dataset can really be anything: entries of types (or categories)
of all sorts, each containing arbitrary fields. The use to be made of this data
can vary greatly since the system is not limited to the production of
bibliography lists, in particular. The intended use is reflected through a set of
specifications, specific to each bibliography (or non|-|bibliography) style.
These specifications affect the interpretation of dataset categories and fields
as well as their rendering. They will also affect the rendering of citations or
the reference or invocation of individual data entries.

The \TEXcode {default} bibliography specification is very simple: only the
categories \TEXcode {book} and \TEXcode {article} are explicitly defined. These
were shown along with their default rendering in the quick|-|start example on \at
{page} [ch:quick]. We purposely limited this \TEXcode {default} specification as
a minimal example for a bibliography.

The notion of categories and the fields that they might contain and their
interpretation depend on a particular specification, although the dataset
\emphasis {content} is independent of all eventual rendering specifications that
may be applied.

An alternative set of specifications can be selected using, for example

\startbuffer
\usebtxdefinitions[apa]
\stopbuffer

\cindex {usebtxdefinitions}
\index {style+APA}
\seeindex {specification}{style}

\typeTEXbuffer

\getbuffer

Alternatively, the set of specifications can be loaded and (later) activated using

\cindex {loadbtxdefinitionfile}
\cindex {setupbtx}
\index {style+APA}

\startTEX
\loadbtxdefinitionfile[apa]
...
\setupbtx[specification=apa]
\stopTEX

but it is safer to use the \TEXcode {\use} rather than \TEXcode {\load} form, in
particular with specifications that may themselves have several variants. Also,
it is all too easy to later forget to set the \TEXcode {specification} parameter
and then wonder why the loaded specification was not applied.

\startaside
We wish to clarify that each specification defines the categories of entries and
the interpretation or use of the fields that they contain, but does not alter the
data itself, only how this data is used. It also defines \emphasis {setups} that
control the rendering of lists as well as citations (to be described below).
Additionally, it creates a namespace with settings for particular \emphasis
{parameters} controlling the formatting of names, for example, punctuation as
well as other stylistic features. The user can tune or overload these settings as
needed.
\stopaside

A specification need not be activated before loading a dataset; indeed the
contents of a dataset are stored independently of the specification, and multiple
specifications can be applied to the same dataset (although this will not usually
be the case). Furthermore, multiple specification files can be loaded
simultaneously as they reside in separate namespaces, but only one specification
can be selected at a time. We introduce these commands here in the context of
datasets as the labeling of categories and of field use can change depending on
the specification. Indeed, some specifications might ignore certain fields
present in the dataset that may be used with other specifications. The details of
how this is programmed will be explained in \in {Chapter} [ch:custom].

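A sketch of loading several specifications and switching between them might read
as follows. It assumes that a second specification named \TEXcode {aps} is
available alongside \TEXcode {apa}; substitute whichever specifications are
actually installed.

\startTEX
% both files are loaded; they reside in separate namespaces
\loadbtxdefinitionfile[apa]
\loadbtxdefinitionfile[aps]

% only one specification is selected at a time
\setupbtx[specification=apa]
...
% selecting another specification leaves the dataset contents unchanged
\setupbtx[specification=aps]
\stopTEX
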
So a specification is both a definition of how a dataset is to be interpreted
and a stylistic tuning of how it is to be rendered.

\cindex   {loadbtxdefinitionfile}
\showsetup[loadbtxdefinitionfile]

\cindex   {usebtxdefinitions}
\showsetup[usebtxdefinitions]

\stopsection

\startsection [title=Dataset diagnostics]

You can ask for an overview of entries present in a dataset with:

\startbuffer
\showbtxdatasetfields[example]
\stopbuffer

\cindex {showbtxdatasetfields}

\typeTEXbuffer

The listing that this produces is shown in \in {Appendix} [ch:datasetfields].

\cindex   {showbtxdatasetfields}
\showsetup[showbtxdatasetfields]
\showsetup[showbtxdatasetfields:argument]

Sometimes you might want to check a database, listing all of its entries in
detail. This can be particularly useful when in doubt concerning the correctness
or the completeness of the data source, remembering that invalid entries and some
syntax errors are simply skipped over. One way of examining the loaded dataset in
detail is the following:

\startbuffer
\showbtxdatasetcompleteness[example]
\stopbuffer

\cindex {showbtxdatasetcompleteness}

\typeTEXbuffer

The diagnostic listing (which can be rather long) is shown in \in {Appendix}
[ch:datasetcompleteness].

\cindex   {showbtxdatasetcompleteness}
\showsetup[showbtxdatasetcompleteness]
\showsetup[showbtxdatasetcompleteness:argument]

The dataset contains many entries and each entry is assigned to a \Index
{category}. It must be stressed, so we repeat ourselves here, that these \quote
{categories} can be of any sort whatsoever, the meaning of which resides in the
rendering style that is chosen. The entries contain fields, and these too can be
of any sort; their use also depends on the rendering style and the \Index
{category} in which they belong. \BIBTEX\ has conventionally defined a number of
standard categories, each making use of a number of fields considered either
\index {field+required}required, \index {field+optional}optional or \index
{field+ignored}ignored. However, different traditional \BIBTEX\ rendering styles
can make inconsistent use of these standard categories and fields. To make
matters worse, different \Tindex {.bib} database handling programs might use (and
impose) differing \quote {standards} as well, as mentioned above. \startfootnote
For example, \Tindex {jabref}, in addition to discarding all comments contained
in the database file, will convert all unrecognized, preciously named categories
to \tindex {@other}\BTXcode {@Other}! Of course, \Tindex {jabref} is flexible
enough to be configured with new categories and additional fields, so users of
\Tindex {jabref} with \CONTEXT\ will probably want to use an extended, custom
configuration. \stopfootnote This situation arises from the complexity of
handling bibliographic data of all sorts.

You can see all (currently known) \index {category}categories and \index
{field}fields with:

\cindex {showbtxfields}

\startTEX
\showbtxfields[rotation=...]
\stopTEX

The result is shown in \in {table} [tab:fields], below.

\cindex   {showbtxfields}
\showsetup[showbtxfields]
\showsetup[showbtxfields:argument]

Note that other, possibly non|-|bibliographic use of the present dataset system
might define entirely different categories and field types, possibly having
nothing at all to do with the names shown here. An example of such use is given
in \in {chapter} [ch:duane].

Just as a database can be much larger than needed for a document, the same is
true for the fields that make up an entry; not all entry fields will
necessarily be used. This idea will be developed in the next section describing
the rendering of bibliography lists.

\stopsection

\startplacetable
  [reference=tab:mkiv-publications.bib,
   title={mkiv-publications.bib\\
          This test file was constructed to illustrate various features of the
          \BIBTEX\ format and contains some fields that might at first glance
          appear somewhat curious.}]
  \typeBTXfile
    [range={@Comment{Start example},@Comment{Stop example}}]
    {mkiv-publications.bib}
\stopplacetable

\startplacetable
  [reference=tab:fields,
   list={\TEXcode {\showbtxfields[rotation=90]}},
   title={\cindex {showbtxfields}\TEXcode {\showbtxfields[rotation=90]} The entry
          \Index {category} and \Index {field} names (and how they are used) are
          defined both by the rendering style and by the contents of the
          dataset. \index {field+required}\quote {Required} fields are indicated
          in green. All unmarked fields are normally \index
          {field+ignored}ignored in the rendering.}]
    \small
    \showbtxfields[rotation=90]
\stopplacetable

\placefloats

\stopchapter

\stopcomponent

338