\environment publications-style

\startcomponent publications-datasets

\startchapter[title=Datasets]

Normally in a document you will use only one bibliographic database, whether or
not its source is distributed over multiple files. Nevertheless, we support
multiple database formats as well which is why we talk of datasets instead. The
use of multiple datasets allows the isolation of different bibliographies (a
single bibliography can nevertheless be rendered by structure element: section,
chapter, part, etc. as we shall see later). A good example of the use of multiple
datasets would be for a proper bibliography itself in addition to a reference
catalog (of equipment, suppliers, software, patents, legal jurisprudence, music,
\unknown). Indeed, datasets can be used to hold both bibliographic and
non|-|bibliographic information.

A dataset is initiated with the \Cindex {definebtxdataset} command.

\cindex {definebtxdataset}

\startTEX
\definebtxdataset[default]
\stopTEX

\startaside
A default database, \TEXcode {default}, is predefined, yet we recommend defining
it explicitly because in the future we may provide more options.
\stopaside

Like other commands in \CONTEXT, the dataset options can be setup using the
command \Cindex {setupbtxdataset}.

\cindex {definebtxdataset}
\showsetup[definebtxdataset]

\cindex {setupbtxdataset}
\showsetup[setupbtxdataset]

A dataset is loaded from some source through the use of the
\Cindex {usebtxdataset} command.

Here are some examples:

\cindex {usebtxdataset}
\tindex {.bib}
\tindex {.xml}
\tindex {.lua}
\tindex {.bbl}

\startTEX
\usebtxdataset[tugboat][tugboat.bib]
\usebtxdataset[default][mtx-bibtex-output.xml]
\usebtxdataset[default][test-001-btx-standard.lua]
\usebtxdataset[default][mkii-publications.bbl]
\usebtxdataset[default][named.buffer]
\stopTEX

\cindex {usebtxdataset}
\showsetup[usebtxdataset]

The four suffixes illustrated in the example above are understood by the loader.
Here the dataset (other than the first) has the name \TEXcode {default} and the
four database files are merged. The last example shows that a \TEXcode {named}
\Index {buffer} can also be employed to add dataset entries (in \BIBTEX\ format).
This may be useful for small additions or examples, but it is generally a better
idea (for convenience of management of data) to place them in files separate from
the document source code.

Definitions in the document source (coded in \TEX\ speak) are also added, and
they are saved for successive runs. This means that if you load and define
entries, they will be known at a next run beforehand, so that references to them
are independent of where in the document source loading and definitions take
place. This is convenient to eventually break|-|up the dataset loading calls to
relevant sections of the document structure.

In this document we use some example databases, so let's load one of them now:
\startfootnote This code snippet demonstrates that \TEXcode {\usebtxdataset} will
implicitly declare an undefined dataset name, although this practice is to be
discouraged. Similarly, omitting to specify the dataset name \TEXcode {[default]}
in the examples given earlier would fall|-|back correctly, but this, too, is to
be discouraged as being potentially error|-|prone. \stopfootnote

\startbuffer
\usebtxdataset[example][mkiv-publications.bib]
\stopbuffer

\cindex {definebtxdataset}
\cindex {usebtxdataset}

\typeTEXbuffer

\getbuffer

The beginning of the file \type {mkiv-publications.bib} is shown below in \in
{table} [tab:mkiv-publications.bib]. This bibliography database test file
contains one entry of each standard type or category, with the \Index {tag} set
to the entry type name. This entry shown here illustrates many features that will
be explained elsewhere in the text.

\startsection[title=Dataset coverage]

You can load much more data than you actually need. Usually only those entries
that are referred to explicitly will be shown in lists, and commands used to
select these dataset entries will described in \in {chapter} [ch:cite].

A single bibliography list can span groups of datasets; also multiple datasets
can loaded from the same source, for example, one per chapter, in order to
achieve a complete \Index {isolation} of bibliographies with respect to numbering
and references.

As this concept is not obvious but can be quite useful, we will repeat this last
point: multiple datasets can be loaded using the same source file, i.e.\
containing the same data, to be used in parallel, independently. There is little
penalty in keeping even very large datasets as multiple copies in memory.

The current active dataset to be used by default can be set with

\startbuffer
\setupbtx[dataset=example]
\stopbuffer

\cindex {setupbtx}

\typeTEXbuffer

\getbuffer

However, most publication|-|related commands accept optional arguments that
denote the dataset and references to entries can always be prefixed with a
dataset identifier. More about that later.

\showsetup[setupbtx]

\stopsection

\startsection [title=Specification]

The content of a dataset can really be anything: entries of type (or categories)
of all sorts, each containing arbitrary fields. The use to be made of this data
can vary greatly since the system is not limited to the production of
bibliography lists, in particular. The intended use is reflected through a set of
specifications, specific to each bibliography (or non|-|bibliography) style.
These specifications affect the interpretation of dataset categories and fields
as well as their rendering. They will also affect the rendering of citations or
the reference or invocation of individual data entries.

The \TEXcode {default} bibliography specification is very simple: only the
categories \TEXcode {book} and \TEXcode {article} are explicitly defined. These
were shown along with their default rendering in the quick|-|start example on \at
{page} [ch:quick]. We purposely limited this \TEXcode {default} specification as
a minimal example for a bibliography.

The notion of categories and the fields that they might contain and their
interpretation depend on a particular specification, although the dataset
\emphasis {content} is independent of all eventual rendering specifications that
may be applied.

An alternative set of specifications can be selected using, for example

\startbuffer
\usebtxdefinitions[apa]
\stopbuffer

\cindex {usebtxdefinitions}
\index {style+APA}
\seeindex {specification}{style}

\typeTEXbuffer

\getbuffer

Alternately, the set of specifications can be loaded and (later) activated using

\cindex {loadbtxdefinitionfile}
\cindex {setupbtx}
\index {style+APA}

\startTEX
\loadbtxdefinitionfile[apa]
...
\setupbtx[specification=apa]
\stopTEX

but it is safer to use the \TEXcode {\use} rather than \TEXcode {\load} form, in
particular with specifications that may themselves have several variants. Also,
it is way too easy to later forget to set the \TEXcode {specification} parameter
and then wonder why the loaded specification was not applied.

\startaside
We wish to clarify that each specification defines the categories of entries and
the interpretation or use of the fields that they contain, but does not alter the
data itself, only how this data is used. It also defines \emphasis {setups} that
control the rendering of lists as well as citations (to be described below).
Additionally, it creates a namespace with settings for particular \emphasis
{parameters} controlling the formatting of names, for example, punctuation as
well as other stylistic features. The user can tune or overload these settings as
needed.
\stopaside

A specification need not be activated before loading a dataset; indeed the
contents of a dataset are stored independent of the specification, and multiple
specifications can be applied to the same dataset (although this will not usually
be the case). Furthermore, multiple specification files can be loaded
simultaneously as they reside in separate namespaces, but only one specification
can be selected at a time. We introduce these commands here in the context of
datasets as the labeling of categories and of field use can change depending on
the specification. Indeed, some specifications might ignore certain fields
present in the dataset that may be used with other specifications. The details of
how this is programmed will be explained in \in {Chapter} [ch:custom].

So a specification is both a definition of how a dataset is to be interpreted as
well as stylistic tuning of how it is to be rendered.

\cindex   {loadbtxdefinitionfile}
\showsetup[loadbtxdefinitionfile]

\cindex   {usebtxdefinitions}
\showsetup[usebtxdefinitions]

\stopsection

\startsection [title=Dataset diagnostics]

You can ask for an overview of entries present in a dataset with:

\startbuffer
\showbtxdatasetfields[example]
\stopbuffer

\cindex {showbtxdatasetfields}

\typeTEXbuffer

The listing that this produces is shown in \in {Appendix} [ch:datasetfields].

\cindex   {showbtxdatasetfields}
\showsetup[showbtxdatasetfields]
\showsetup[showbtxdatasetfields:argument]

Sometimes you might want to check a database, listing all of its entries in
detail. This can be particularly useful when in doubt concerning the correctness
or the completeness of the data source, remembering that invalid entries and some
syntax errors are simply skipped over. One way of examining the loaded dataset in
detail is the following:

\startbuffer
\showbtxdatasetcompleteness[example]
\stopbuffer

\cindex {showbtxdatasetcompleteness}

\typeTEXbuffer

The diagnostic listing (which can be rather long) is shown in \in {Appendix}
[ch:datasetcompleteness].

\cindex   {showbtxdatasetcompleteness}
\showsetup[showbtxdatasetcompleteness]
\showsetup[showbtxdatasetcompleteness:argument]

The dataset contains many entries and each entry is assigned to a \Index
{category}. It must be stressed, so we repeat ourselves here, that these \quote
{categories} can be of any sort whatsoever, the meaning of which resides in the
rendering style that is chosen. The entries contain fields, and these too can be
of any sort; their use also depends on the rendering style and the \Index
{category} in which they belong. \BibTeX\ has conventionally defined a number of
standard categories, each making use of a number of fields considered either
\index {field+required}required, \index {field+optional}optional or \index
{field+ignored}ignored. However, different traditional \BIBTEX\ rendering styles
can make inconsistant use of these standard categories and fields. To make
matters worse, different \Tindex {.bib} database handling programs might use (and
impose) differing \quote {standards} as well, as mentioned above. \startfootnote
For example, \Tindex {jabref}, in addition to discarding all comments contained
in the database file, will convert all unrecognized, preciously named categories
to \tindex {@other}\BTXcode {@Other}! Of course, \Tindex {jabref} is flexible
enough to be configured with new categories and additional fields, so users of
\Tindex {jabref} with \CONTEXT\ will probably want to use an extended, custom
configuration. \stopfootnote This situation arises from the complexity of
handling bibliographic data of all sorts.

You can see all (currently known) \index {category}categories and \index
{field}fields with:

\cindex {showbtxfields}

\startTEX
\showbtxfields[rotation=...]
\stopTEX

The result is shown \in {table} [tab:fields], below.

\cindex   {showbtxfields}
\showsetup[showbtxfields]
\showsetup[showbtxfields:argument]

Note that other, possibly non|-|bibliographic use of the present dataset system
might define entirely different categories and field types, possibly having
nothing at all to do with the names shown here. An example of such use is given
in \in {chapter} [ch:duane].

Just as a database can be much larger than needed for a document, the same is
true for the fields that make up an entry; not all entry fields will be
necessarily used. This idea will be developed in the next section describing the
rendering of bibliography lists.

\stopsection

\startplacetable
  [reference=tab:mkiv-publications.bib,
   title={mkiv-publications.bib\\
          This test file was constructed to illustrate various features of the
          \BIBTEX\ format and contains some fields that might at first glance
          appear somewhat curious.}].
  \typeBTXfile
    [range={@Comment{Start example},@Comment{Stop example}}]
    {mkiv-publications.bib}
\stopplacetable

\startplacetable
  [reference=tab:fields,
   list={\TEXcode {\showbtxfields[rotation=90]}},
   title={\cindex {showbtxfields}\TEXcode {\showbtxfields[rotation=90]} The entry
          \Index {category} and \Index {field} names (and how they are used) are
          defined by both the rendering style as well as by the contents of the
          dataset. \index {field+required}\quote {Required} fields are indicated
          in green. All unmarked fields are normally \index
          {field+ignored}ignored in the rendering.}]
    \small
    \showbtxfields[rotation=90]
\stopplacetable

\placefloats

\stopchapter

\stopcomponent