Dico
GNU Dictionary Server
Sergey Poznyakoff
GNU Dico Manual
GNU Dico comes with a set of loadable modules for handling several database formats and for extending its functionality. Modules are binary loadable files, installed in ‘$prefix/lib/dico’. Modules are configurable on a per-module (see section command) and per-database (see section handler) basis.
GNU Dico version 2.1 is shipped with the following modules: ‘Outline’, ‘Dictorg’, ‘Guile’, and ‘Python’. The ‘Outline’ module handles databases written in Emacs Outline format. It is useful for small databases. The ‘Dictorg’ module handles databases in the format designed by the DICT development group. Most existing free databases are written in this format. Finally, the ‘Guile’ and ‘Python’ modules allow you to use arbitrary database modules written in the Scheme and Python programming languages, respectively.
In this chapter we will describe these modules in detail.
outline module. The outline module supports databases written in
Emacs outline mode. It is not designed for storing large
amounts of data; rather, its purpose is to handle small databases that
can be composed easily and quickly using the Emacs editor.
The outline mode is described in the section `Outline Mode' of The Emacs Editor manual. In short, an outline file is an ordinary plain text file containing header lines and body lines. Header lines start with one or more stars, the number of stars indicating the depth of the heading in the document structure: one star for chapters, two stars for sections, etc. Body lines are all lines that are not header lines.
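The distinction between header and body lines can be illustrated with a short Scheme sketch. The helper name outline-depth is hypothetical and not part of GNU Dico; the sketch only assumes that a header line is any line that begins with one or more stars.

```scheme
;; Hypothetical helper, not part of GNU Dico: return the number of
;; leading stars in LINE, i.e. its heading depth.  Body lines yield 0.
(define (outline-depth line)
  (let loop ((i 0))
    (if (and (< i (string-length line))
             (char=? (string-ref line i) #\*))
        (loop (+ i 1))
        i)))

(outline-depth "* Dictionary")   ; chapter header, depth 1
(outline-depth "** abasement")   ; section header, depth 2
(outline-depth "plain text")     ; body line, depth 0
```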
The outline dictionary must have at least a chapter named
‘Dictionary’, which contains the dictionary corpus. Within it, each
subsection is treated as a dictionary article, its header line giving
the headword, and its body lines supplying the article itself. Apart
from this, two more chapters have special meaning. The
‘Description’ chapter gives a short description to be displayed
in response to the SHOW DB command, and the ‘Info’ chapter supplies a full
database description for SHOW INFO output. Both chapters are
optional.
All three reserved chapter names are case-insensitive.
To summarize, the structure of an outline database is:
* Description
line
* Info
text
* Dictionary
** line
text
[any number of entries follows]
|
As an example of outline format, the GNU Dico package includes Ambrose Bierce's Devil's Dictionary in this format, see ‘tests/devils.out’.
Outline module initialization does not require any command line
parameters; specifying command "outline"; is enough. To
declare a database, supply its full file name to the handler
directive, as shown in the example below:
load-module outline {
command "outline";
}
database {
name "devdict";
handler "outline /var/db/devils.out";
}
|
dictorg module. The dictorg module supports dictionaries in the format
designed by the DICT development group
(http://dict.org). Many free dictionaries in this format
are available from the FreeDict project.
A dictionary in this format consists of two files: a dictionary database file, named ‘name.dict’ or ‘name.dict.dz’ (a compressed form), and an index file, which lists article headwords and corresponding offsets in the database. The index file is named ‘name.index’. The common part of these two file names, name, is called the base name for that dictionary.
An instance of the dictorg module is created using the
following statement:
load-module inst-name {
command "dictorg [options]";
}
|
where square brackets denote an optional part. Valid options are the following:
dbdir=dir
    Look for databases in directory dir.
show-dictorg-entries
    Dictorg entries are special database entries that keep some
    service information, such as the database description, etc. Such entries
    are marked with headwords that begin with ‘00-database-’. By
    default they are exempt from database look-ups and cannot be retrieved
    using the MATCH or DEFINE commands.
    Using ‘show-dictorg-entries’ removes this limitation and makes these entries behave like other database entries.
sort
    Sort the database index after loading. This option is designed for use with some databases that have malformed indexes. At the time of this writing the ‘eng-swa’ database from FreeDict requires this option.
    Using sort may considerably slow down initial database loading.
trim-ws
    Remove trailing whitespace from dictionary headwords at startup. This might be necessary for some databases.
The values set via these options become defaults for all databases using this module instance, unless overridden in their declarations.
A database that uses this module must be declared as follows:
database {
handler "inst-name database=file [options]";
...
}
|
where inst-name is the instance name used in the load-module
declaration above.
The database argument specifies the base name of the
database. Unless file begins with a slash, the value of the
dbdir initialization option is prepended to it. If
dbdir is not given and file does not begin with a slash,
an error is signalled.
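The base name resolution rule can be sketched in Scheme as follows. This is an illustration only, not Dico's own code: the helper name resolve-base-name is hypothetical, and joining dbdir and file with a single slash is an assumption.

```scheme
;; Hypothetical helper illustrating the rule above; not Dico's own code.
;; FILE is the value of the database= argument, DBDIR is the value of
;; the dbdir option, or #f if it was not given.
(define (resolve-base-name file dbdir)
  (cond
   ((and (> (string-length file) 0)
         (char=? (string-ref file 0) #\/))
    file)                                   ; absolute name: used as is
   (dbdir
    (string-append dbdir "/" file))         ; relative name: dbdir is prepended
   (else
    (error "relative database name, but dbdir is not set:" file))))

(resolve-base-name "/var/db/eng-swa" "/usr/share/dict")
(resolve-base-name "eng-swa" "/usr/share/dict")
```

The first call returns the absolute name unchanged; the second one yields a name below the dbdir directory.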
The options are the same as those described for the
initialization procedure: show-dictorg-entries, sort,
and trim-ws. If used, they override the initialization settings for
that particular database. Forms prefixed with ‘no’ may be used
to disable the corresponding option for this database. For example,
notrim-ws cancels the effect of trim-ws used when
initializing the module instance.
guile module. Guile is an acronym for GNU's Ubiquitous Intelligent Language for Extensions. It provides a Scheme interpreter conforming to the R5RS language specification and a number of convenience functions. For information about the language, refer to the Revised(5) Report on the Algorithmic Language Scheme. For a detailed description of Guile and its features, see The Guile Reference Manual.
The guile module provides an interface to Guile that allows writing
GNU Dico modules in Scheme. The module is loaded
using the following configuration file statement:
load-module mod-name {
command "guile [options]"
" init-script=script"
" init-args=args"
" init-fun=function";
}
|
The init-script parameter specifies the name of a Scheme
source file that must be loaded in order to initialize the module.
The init-args parameter supplies additional arguments to the
module. They will be accessible to the ‘script’ via the
command-line function. This parameter is optional.
The init-fun parameter specifies the name of a function that
will be invoked to perform the initialization of the module and of
particular databases. See section Guile Initialization, for a description
of the initialization sequence. Optional arguments, options, are:
debug
    Enable Guile debugging and stack traces.
nodebug
    Disable Guile debugging and stack traces (default).
load-path=path
    Append directories from path to the list of directories which should be searched for Scheme modules and libraries. The path must be a list of directory names, separated by colons.
    This option modifies the value of Guile's %load-path variable.
    See the section Configuration and Installation in the Guile Reference Manual.
Guile databases are declared using the following syntax:
database {
name "dbname";
handler "mod-name [options] cmdline";
}
|
where:
dbname
    gives the name for this database,
mod-name
    is the name given to the Guile module in the load-module statement (see
    above),
options
    are options that override global settings given in the
    load-module statement. The following options are understood:
    init-script, init-args, and init-fun. Their
    meaning is the same as for the load-module statement (see above),
    except that they affect only this particular database,
cmdline
    is the command line that will be passed to the Guile
    open-db callback function (see open-db).
A database handled by the guile module is associated with a
virtual function table. This table is an association list that
supplies to the module the Scheme callback functions implemented to
perform particular tasks on that database. In this association list,
the car of each element contains the name of a function, and
its cdr gives the corresponding function. The defined function
names and their semantics are described in the following table:
open
    Open the database.
close
    Close the database.
descr
    Return a short description of the database.
info
    Return full information about the database.
define
    Define a word.
match
    Look up a word in the database.
output
    Output a search result.
result-count
    Return the number of entries in the result.
For example, the following is a valid virtual function table:
(list (cons "open" open-module)
(cons "close" close-module)
(cons "descr" descr)
(cons "info" info)
(cons "define" define-word)
(cons "match" match-word)
(cons "output" output)
(cons "result-count" result-count))
|
Apart from a per-database virtual table, there is also a global virtual function table, which is used to supply the entries missing in the former. Both tables are created during the module initialization, as described in the next subsection.
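The way the global table supplies entries missing from a per-database table can be pictured with the following sketch. It is an illustration only: the real lookup is performed inside dicod and may differ in detail, and the names vtab-lookup, db-vtab and global-vtab are hypothetical.

```scheme
;; Hypothetical sketch of the fallback rule; dicod's real lookup is
;; implemented in C and may differ in detail.
(define (vtab-lookup name db-vtab global-vtab)
  (let ((entry (or (assoc name db-vtab)
                   (assoc name global-vtab))))
    (and entry (cdr entry))))

(define global-vtab
  (list (cons "descr" (lambda (dbh) "generic description"))
        (cons "info"  (lambda (dbh) "generic info"))))

(define db-vtab
  (list (cons "descr" (lambda (dbh) "specific description"))))

((vtab-lookup "descr" db-vtab global-vtab) #f)  ; per-database entry is used
((vtab-lookup "info" db-vtab global-vtab) #f)   ; falls back to the global table
(vtab-lookup "close" db-vtab global-vtab)       ; defined in neither table
```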
Particular virtual functions are described in Guile API.
The following configuration statement causes loading and
initialization of the guile module:
load-module mod-name {
command "guile init-script=script"
" init-fun=function";
}
|
During the module initialization stage, the module attempts to load the
file named ‘script’. The file is loaded using the
primitive-load call (see the section `Loading' in The Guile Reference Manual), i.e. the load paths are not
searched, so script must be an absolute path name. The
init-fun parameter supplies the name of an initialization
function. This Scheme function is used to construct virtual
function tables for the module itself and for each database that uses
this module. It must be declared as follows:
(define (function arg) ...) |
This function is called several times. First of all, it is called after
script is loaded. This time it is given #f as its
argument, and its return value is saved as a global function table.
Then, it is called for each database statement that has
mod-name (used in load-module above) in its
handler keyword, e.g.:
database {
name db-name;
handler "mod-name...";
}
|
This time, it is given db-name as its argument, and its return value is stored as the virtual function table for this particular database.
The following example function returns a complete virtual function table:
(define-public (my-dico-init arg)
(list (cons "open" open-module)
(cons "close" close-module)
(cons "descr" descr)
(cons "info" info)
(cons "lang" lang)
(cons "define" define-word)
(cons "match" match-word)
(cons "output" output)
(cons "result-count" result-count)))
|
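Because the initialization function receives #f on its first call and a database name on the subsequent ones, it can return different tables depending on its argument. The following is a minimal sketch, assuming that the database name is passed as a string; the callbacks and the database name "numerals" are hypothetical stand-ins that keep the example self-contained.

```scheme
;; Hypothetical stand-ins for real callbacks.
(define (open-module name . args) (cons name args))
(define (descr dbh) "generic description")
(define (numerals-descr dbh) "numerals description")

(define global-table
  (list (cons "open" open-module)
        (cons "descr" descr)))

(define (my-dico-init arg)
  (cond
   ((not arg) global-table)                 ; first call: global table
   ((string=? arg "numerals")               ; call for database "numerals"
    (list (cons "descr" numerals-descr)))   ; only the overriding entry
   (else global-table)))                    ; any other database
```

The table returned for "numerals" contains a single entry; the remaining callbacks are expected to come from the global table.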
This subsection describes callback functions that a Guile database module must provide. The description of each function begins with the function prototype and its entry in the virtual function table.
Callback functions can be subdivided into two groups: database functions and search functions.
Database callback functions are responsible for opening and closing databases and for returning information about them.
Guile Callback: open-db name . args
Virtual Table: (cons "open" open-db)
Open the database. The argument name contains the database name as
given in the name statement of the database block
(see section Databases). The optional argument args is a list of
command line parameters obtained from cmdline in the handler
statement (see guile-cmdline). For example, if the configuration
file contained:
database {
name "foo";
handler "guile db=file 1 no";
}
|
then the open-db callback will be called as:
(open-db "foo" '("db=file" "1" "no"))
|
The open-db callback returns a database handle, i.e. an
opaque structure that is used to identify this database and that
keeps its internal state. This value, hereinafter named dbh,
will be passed to the other callback functions that need to access the
database.
The return value #f or '() indicates an error.
Guile Callback: close-db dbh
Virtual Table: (cons "close" close-db)
Close the database. This function is called during the cleanup
procedure, before termination of dicod. The argument
dbh is a database handle returned by open-db.
The return value from close-db is ignored. To communicate
errors to the daemon, throw an exception.
Guile Callback: descr dbh
Virtual Table: (cons "descr" descr)
Return a short textual description of the database, for use in
SHOW DB output. If there is no description, return #f
or '().
The argument dbh is a database handle returned by
open-db.
This callback is optional. If it is not defined, or if it returns
#f ('()), the text from the description statement
is used (see section description). If no
description statement is present either, an empty string is used.
Guile Callback: info dbh
Virtual Table: (cons "info" info)
Return a verbose, possibly multi-line, textual description of the
database, for use in SHOW INFO output. If there is no
description, return #f or '().
The argument dbh is a database handle returned by open-db.
This callback is optional. If it is not defined, or if it returns
#f ('()), the text from the info statement
is used (see section info). If there is no info statement,
the string ‘No information available’ is used.
Database searches are a two-phase process. First, an appropriate
callback is called to do the search: define-word is called for
DEFINE searches and match-word is called for matches.
This callback returns an opaque entity, called a result handle,
which is then passed to the output callback, which is responsible
for outputting it.
Virtual Table: (cons "lang" lang)
Guile Callback: define-word dbh word
Virtual Table: (cons "define" define-word)
Find definitions of the word word in the database dbh. Return
a result handle. If nothing is found, return #f or '().
The argument dbh is a database handle returned by open-db.
Guile Callback: match-word dbh strat key
Virtual Table: (cons "match" match-word)
Find all matches of key in the database dbh, using the
matching strategy strat. Return a result handle. If nothing is
found, return #f or '().
The key is a Dico Key object, which contains information
about the word being looked for. To obtain the actual word, use
the dico-key->word function (see dico-key->word).
The argument dbh is a database handle returned by
open-db. The matching strategy strat is a special Scheme
object that can be accessed using a set of functions described below
(see section Dico Scheme Primitives).
Guile Callback: output resh n
Virtual Table: (cons "output" output)
Output the nth result from the result set resh. The argument
resh is a result handle returned by the define-word or
match-word callback.
The data must be output to the current output port, e.g. using the
display or format primitives. If resh represents
a match result, the output must not be quoted or terminated by newlines.
Guile Callback: result-count resh
Virtual Table: (cons "result-count" result-count)
Return the number of elements in the result set resh.
GNU Dico provides the following Scheme primitives that access various
fields of the strat and key arguments to the match callback:
dico-key? obj
    Return ‘#t’ if obj is a Dico key object.
dico-key->word key
    Extract the lookup word from the key object key.
dico-make-key strat word
    Create a new key object from strategy strat and word word.
dico-strat-selector? strat
    Return true if strat has a selector.
dico-strat-select? strat word key
    Return true if key matches word as per strategy selector strat. The key is a ‘Dico Key’ object.
dico-strat-name strat
    Return the name of strategy strat.
dico-strat-description strat
    Return a textual description of the strategy strat.
dico-strat-default? strat
    Return true if strat is a default
    strategy. See section default strategy.
dico-register-strat strat descr [fun]
    Register a new strategy. If fun is given, it will be used as a callback for that strategy. Notice that you can use strategies implemented in Guile in your C code as well (see section strategy).
The selector function must be declared as follows:
(define (fun key word) ...) |
It must return #t if key matches word, and
#f otherwise.
In this subsection we will show how to build a simple dicod module
written in Scheme. The source code of this module, called
‘example.scm’, and a short database for it, ‘example.db’, are
shipped with the distribution in the directory ‘tests’.
The database is stored in a disk file in the form of a list. The first
two elements of this list contain the database description and full
information strings. The remaining elements are conses, whose car
contains the headword, and whose cdr contains the corresponding
dictionary article. Following is an example of such a database:
("Short English-Norwegian numerals dictionary"
"Short English-Norwegian dictionary of numerals (1 - 7)"
("one" . "en")
("two" . "to")
("three" . "tre")
("four" . "fire")
("five" . "fem")
("six" . "seks")
("seven" . "sju"))
|
We wish to declare such databases in ‘dicod.conf’ the following way:
database {
name "numerals";
handler "guile example.db";
}
|
Thus, the rest argument to the ‘open-db’ callback will be
‘("guile" "example.db")’ (see open-db). Given this, we may
write the callback as follows:
(define (open-db name . rest)
  ;; rest is ("guile" "example.db"); its second element is the file name.
  (let ((db (with-input-from-file
                (cadr rest)
              (lambda () (read)))))
    (cond
     ((list? db) (cons name db))
     (else
      (format (current-error-port) "open-db: ~A: invalid format\n"
              (cadr rest))
      #f))))
|
The list returned by this callback will then be passed as a database handle to the other callback functions. To facilitate access to particular elements of this list, it is convenient to define the following syntax:
(define-syntax db:get
(syntax-rules (info descr name corpus)
((db:get dbh name) ;; Return the name of the database.
(list-ref dbh 0))
((db:get dbh descr) ;; Return the description.
(list-ref dbh 1))
((db:get dbh info) ;; Return the info string.
(list-ref dbh 2))
((db:get dbh corpus) ;; Return the word list.
(list-tail dbh 3))))
|
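The layout of the database handle can be checked with plain list operations. In the sketch below the sample database is shortened to two entries, and the variable names db and dbh are used for illustration only.

```scheme
;; The sample database, shortened to two entries, as read from the file.
(define db
  '("Short English-Norwegian numerals dictionary"
    "Short English-Norwegian dictionary of numerals (1 - 7)"
    ("one" . "en")
    ("two" . "to")))

;; What the open-db callback returns: the database name consed onto that list.
(define dbh (cons "numerals" db))

(list-ref dbh 0)    ; the database name
(list-ref dbh 1)    ; the description string
(list-ref dbh 2)    ; the info string
(list-tail dbh 3)   ; the word list (corpus)
```

Since the name is consed onto the front of the list, the description and info strings move to positions 1 and 2, and the corpus starts at position 3.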
Now, we can write ‘descr’ and ‘info’ callbacks:
(define (descr dbh) (db:get dbh descr))
(define (info dbh) (db:get dbh info))
|
The two callbacks ‘define-word’ and ‘match-word’ provide
the core module functionality. Their results will be passed to the
‘output’ and ‘result-count’ callbacks as a “result handle”
argument. In the spirit of Scheme, we make the result a list. Its
car is a boolean value: #t if the result
comes from the ‘define-word’ callback, and #f if it comes from
‘match-word’. The cdr of this list contains the list of
matches. For ‘define-word’, it is a list of conses copied from
the database word list, whereas for ‘match-word’, it is a list of
headwords.
The ‘define-word’ callback returns all list entries whose
cars contain the lookup word. It uses the mapcan
function, which is supposed to be defined elsewhere:
(define (define-word dbh word)
  (let ((res (mapcan (lambda (elt)
                       (and (string-ci=? word (car elt))
                            elt))
                     (db:get dbh corpus))))
    ;; An empty list counts as true in Scheme, so test for a non-empty result.
    (and (pair? res) (cons #t res))))
|
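The mapcan function is not a standard Guile procedure. A minimal sketch, assuming it is meant to apply a function to every list element and to collect the results that are not #f (as its use in ‘define-word’ suggests), can be built on filter-map from SRFI-1. This is an assumption about the intended behavior, not necessarily the definition shipped in ‘example.scm’.

```scheme
(use-modules (srfi srfi-1))   ; provides filter-map

;; Assumed semantics: apply FUN to every element of LST and return
;; the list of results that are not #f, preserving their order.
(define (mapcan fun lst)
  (filter-map fun lst))

;; Keep the squares of the even numbers only.
(mapcan (lambda (x) (and (even? x) (* x x))) '(1 2 3 4))
```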
The ‘match-word’ callback (see match-word) takes three arguments: a database handle dbh, a strategy descriptor strat, and a key object key describing the word to look for. The result handle it returns contains a list of headwords from the database that match the word in the sense of strat. Thus, the behavior of ‘match-word’ depends on strat. To implement this, let's define a list of directly supported strategies (see below for definitions of particular ‘match-’ functions):
(define strategy-list
(list (cons "exact" match-exact)
(cons "prefix" match-prefix)
(cons "suffix" match-suffix)))
|
The ‘match-word’ callback will then select an entry from
that list and call its cdr, e.g.:
(define (match-word dbh strat key)
(let ((sp (assoc (dico-strat-name strat) strategy-list)))
(let ((res (cond
(sp
((cdr sp) dbh strat (dico-key->word key)))
|
If the requested strategy is not in that list, the function will use the selector function if it is available, and the default matching function otherwise:
((dico-strat-selector? strat)
(match-selector dbh strat key))
(else
(match-default dbh strat (dico-key->word key))))))
|
Notice the use of dico-key->word function to extract the actual
lookup word from the key object.
To summarize, the ‘match-word’ callback is:
(define (match-word dbh strat key)
(let ((sp (assoc (dico-strat-name strat) strategy-list)))
(let ((res (cond
(sp
((cdr sp) dbh strat (dico-key->word key)))
((dico-strat-selector? strat)
(match-selector dbh strat key))
(else
(match-default dbh strat (dico-key->word key))))))
(if (pair? res)
(cons #f res)
#f))))
|
Now, let's create the ‘match-’ functions used in it. The ‘exact’ strategy is the easiest to implement:
(define (match-exact dbh strat word)
(mapcan (lambda (elt)
(and (string-ci=? word (car elt))
(car elt)))
(db:get dbh corpus)))
|
The ‘prefix’ and ‘suffix’ strategies are implemented using the
SRFI-13 (see the section `SRFI-13' in The Guile Reference Manual)
functions string-prefix-ci? and string-suffix-ci?, e.g.:
(define (match-prefix dbh strat word)
(mapcan (lambda (elt)
(and (string-prefix-ci? word (car elt))
(car elt)))
(db:get dbh corpus)))
|
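The suffix matcher follows the same pattern. The sketch below is self-contained so that it can be tried outside dicod: the hypothetical helper suffix-headwords takes the word list directly instead of a database handle, and uses filter-map from SRFI-1 in place of mapcan.

```scheme
(use-modules (srfi srfi-1)    ; filter-map
             (srfi srfi-13))  ; string-suffix-ci?

;; Hypothetical helper: return the headwords in CORPUS (a list of
;; (headword . article) pairs) that end with WORD, ignoring case.
(define (suffix-headwords corpus word)
  (filter-map (lambda (elt)
                (and (string-suffix-ci? word (car elt))
                     (car elt)))
              corpus))

;; Headwords ending in "e": "one" and "three", but not "seven".
(suffix-headwords '(("one" . "en") ("three" . "tre") ("seven" . "sju")) "E")
```

A ‘match-suffix’ function with the same signature as ‘match-prefix’ would pass the corpus of dbh and the lookup word to such a helper.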
Notice that whereas the ‘prefix’ strategy is defined by the server itself, the ‘suffix’ strategy is an extension, and should therefore be registered:
(dico-register-strat "suffix" "Match word suffixes") |
The match-selector function is pretty similar to its
siblings, except that it uses dico-strat-select?
(see section dico-strat-select?) to select the
matching elements. This also leads to this function expecting
a key as its third argument, in contrast to the previous
matchers, which expect the actual lookup word there:
(define (match-selector dbh strat key)
(mapcan (lambda (elt)
(and (dico-strat-select? strat (car elt) key)
(car elt)))
(db:get dbh corpus)))
|
Finally, the match-default may be a variable that refers to
the default matching strategy for this module, e.g.:
(define match-default match-prefix) |
The two callbacks left to define are ‘result-count’ and
‘output’. The first of them simply returns the number of
elements in the cdr of the result:
(define (result-count rh) (length (cdr rh))) |
The behavior of ‘output’ depends on whether the result is produced by ‘define-word’ or by ‘match-word’.
(define (output rh n)
(if (car rh)
;; Result comes from DEFINE command.
(let ((res (list-ref (cdr rh) n)))
(display (car res))
(newline)
(display (cdr res)))
;; Result comes from MATCH command.
(display (list-ref (cdr rh) n))))
|
Finally, the callbacks are made known to dicod by the
module initialization function:
(define-public (example-init arg)
(list (cons "open" open-db)
(cons "descr" descr)
(cons "info" info)
(cons "define" define-word)
(cons "match" match-word)
(cons "output" output)
(cons "result-count" result-count)))
|
Notice that this implementation does not need a ‘close-db’ callback.
python module (This message will disappear once this node is revised.)
stratall module The stratall module provides a new strategy, called ‘all’.
This strategy always returns a full list of headwords from the
database, no matter what the actual search word is.
To load this strategy, use the following configuration statement:
load-module stratall {
command "stratall";
}
|
Using this strategy on a full set of databases (‘MATCH * all ""’) produces an enormous amount of output, which may put a considerable strain on the server. It is therefore advisable to block such usage, as suggested in Strategies and Default Searches:
strategy all {
deny-all yes;
}
|
substr module The substr module provides a ‘substr’ search
strategy. This strategy matches a substring anywhere in the
keyword. For example:
C: MATCH eng-deu substr orma
S: 152 207 matches found: list follows
S: eng-deu "abnormal"
S: eng-deu "conformable"
S: eng-deu "doorman"
S: eng-deu "format"
…
|
The loading procedure expects no arguments:
load-module substr {
command "substr";
}
|
ldap module The ldap module loads the support for LDAP user
databases. It is available if Dico has been configured with
LDAP.
The module needs no additional configuration parameters:
load-module ldap {
command "ldap";
}
|
See section LDAP Databases, for a description of its use.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.