As already noted, the inferential core of DATR is extremely simple to implement. We know of the existence of more than a dozen different implementations of the language but there may well be others that we do not know of. The best known, and most widely available are our own (Brighton/Sussex), which is written in Prolog and runs on most Unix platforms; Gibbon's (Bielefeld) DDATR (Scheme) and NODE (Sicstus Prolog); Schillo's (Bielefeld) ZDATR (C); and Kilbury's (Duesseldorf) QDATR Prolog implementation which runs (in compiled form) on PCs and on Sicstus Prolog under Unix. All of these are freely available on request, as is an extensive archive of over one hundred example fragments some of which illustrate formal techniques and others of which are applications of DATR to the lexical phonology, morphology, syntax or semantics of a wide variety of different languages. Other interesting implementations that we are familiar with include the experimental reverse query implementation by Langer (Osnabrueck), Duda & Gebhardi's (Berlin) implementation that is dynamically linked to PATR , and Barg's (Duesseldorf) implementation of a system that induces DATR descriptions from extensional data sets.
Lexicons can either be developed by hand or, in principle at least, they can be induced from relevant data. Once created, lexicons get used for language understanding, language generation, or both. Lexicons that are in use also have to be maintained. At present, implementations of lexical representation systems are typically specialised to one or two of these tasks. A language for lexical knowledge representation is merely one component of a lexical representation system, of course, but its design may well have implications for the tasks noted above. A language that coded everything into bit strings might be fully adequate for the induction and generation tasks, say. But it would probably not facilitate manual lexicon maintenance.
From a more formal point of view, Barg (1994) provides the useful tabular conceptualisation of the inferential tasks that may be associated with a lexical representation language like DATR shown in Table 2, below.
Consider the English verbal morphology facts that provided our running
example in Section 2, above. The conventional
inference task presupposes that we have a description (such as that given in
that section) and a query (such as Love:
>): the task is to infer the appropriate value for
this query, namely love ed. This task is crucial to lexicon
development and maintenance since it provides the means by which the
developer can check the empirical adequacy of their analysis. It is
also a task that is likely to figure in the on-line use of the lexicon
in a language processing system, once the relevant lexical entry (i.e.,
the relevant DATR node) has been determined, to recover information
associated with the entry. And it is the task that does the compilation
in systems that use a partially or fully compiled-out off-line lexicon
(as in Andry et al. 1992).
The reverse query task again presupposes that we have a description available
to us. But instead of starting with a known query, we start instead
with a known value (love ed, say, and the task is to infer what
queries would lead to that value (Love:
<mor past tense sing
>, etc.), The ability to perform this kind of inference may
also be useful in lexicon development and maintenance. However, its
most obvious application is to ``bootstrap'' lexical access in language
processing systems that make direct use of an on-line lexicon: given a
surface form (in analysis) or a semantic form (in generation), we need
to identify a lexical entry associated with that form by reverse query,
and then access other lexical information associated with the entry by
conventional inference. Langer (1994) gives an inference algorithm,
based on the familiar chart data structure, for reverse querying DATR\
lexicons; and Gibbon (1993) describes EDQL (Extended DATR\
Query Language) which permits quantification into components of
multisentence DATR queries.
The final task is that of theory induction. Here one starts with a
set of known query-value pairs (Love:
> = love ed., Love:
<mor pres tense sing three
> = love
s., etc.) and the task is to induce a description that has those pairs as
theorems under the application of conventional inference.
In a world in which all the relevant data was already clearly set out
in descriptive linguistic work, an algorithm that efficiently
achieved this kind of induction would be the philosopher's stone to
the construction of computational lexicons. In the real world, such
an algorithm would still be useful for domains like morphology (where
the quality and clarity of extant descriptive linguistic work is very
high), for bootstrapping lexical descriptions for subsequent manual
development by humans, for updating lexicons in the light of newly
encountered lexical information, and for converting one kind of
lexicon into a completely different kind of lexicon by inducing the
latter from the output of the former. The automatic induction of
(symbolic) lexicons from data is a very new research area in
computational linguistics: Kilbury (1993), Kilbury et al. (1994),
Light (1994) and Light et al. (1993) have proposed a variety of
incremental algorithms that take a partial lexical hierarchy and
elaborate it as necessary in the light of successively presented data
sets, whilst Barg (1994) has presented a nonincremental algorithm
that induces full DATR hierarchies from suitable data sets.
Since DATR is no more than a language, it does not itself dictate how a DATR lexicon is to be used. As it turns out, different researchers have used it very differently. Andry et al. (1992), in the context of a speech recognition task involving the parsing of ``extremely large lattices of lexical hypotheses'' (p248), opted for off-line compilation of their 2000 word DATR lexicons into pairs of on-line lexicons, one of which was encoded with bit-vectors for speed and compactness. At the other extreme, Duda & Gebhardi (1994) present an interface between a PATR -based parser and a DATR lexicon where the former is dynamically linked to the latter and able to query it freely, in both conventional and reverse modes, without restriction. And Gibbon (1993) presents an implementation of a very flexible query language, EDQL, which allows quantification over any constituents of (possibly complex) DATR\ queries.