Dico
GNU Dictionary Server
Sergey Poznyakoff
5 Modules
GNU Dico comes with a set of loadable modules for handling various database formats and extending the server functionality. Modules are binary loadable files, installed in $prefix/lib/dico. They are configurable on a per-module (see command) and per-database (see handler) basis.
In this chapter we will describe the modules included in the distribution of GNU Dico version 2.10.
5.1 Outline
The outline module supports databases written in Emacs outline mode. It is not designed for storing large amounts of data; rather, its purpose is to handle small databases that can be composed easily and quickly using the Emacs editor.
The outline mode is described in Outline Mode in The Emacs Editor. In short, an outline file is an ordinary plain text file containing header lines and body lines. Header lines start with one or more stars, the number of stars indicating the nesting level of the heading in the document structure: one star for chapters, two stars for sections, etc. Body lines are all lines that are not header lines.
The outline dictionary must have at least a chapter named ‘Dictionary’, which contains the dictionary corpus. Within it, each section is treated as a dictionary article, its header line giving the headword, and its body lines supplying the article itself. Apart from this, two more chapters have special meaning. The ‘Description’ chapter gives a short description to be displayed in response to the SHOW DB command, and the ‘Info’ chapter supplies a full database description for SHOW INFO output. Both chapters are optional.
All three reserved chapter names are case-insensitive.
To summarize, the structure of an outline database is:
* Description
line
* Info
text
* Dictionary
** line
text
[any number of entries follows]
As an example of outline format, the GNU Dico package includes Ambrose Bierce’s Devil’s Dictionary in this format, see examples/devdict.out.
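The structure described above can be illustrated with a small parser. This is only a sketch in Python, not the code the outline module uses: the function name parse_outline and the shape of the returned dictionary are assumptions made for the example, and headings nested deeper than two stars are simply skipped.

```python
def parse_outline(text):
    """Split outline-mode text into description, info and dictionary entries."""
    result = {"description": None, "info": None, "entries": {}}
    chapter = None   # lower-cased title of the current one-star chapter
    target = None    # where collected body lines go; None means "discard"
    body = []

    def flush():
        # Store the body lines collected so far, then start a new unit.
        content = "\n".join(body).strip()
        if target in ("description", "info"):
            result[target] = content
        elif target is not None:          # ("entry", headword)
            result["entries"][target[1]] = content
        body.clear()

    for line in text.splitlines():
        if line.startswith("*"):
            flush()
            stars = len(line) - len(line.lstrip("*"))
            title = line[stars:].strip()
            if stars == 1:
                # Reserved chapter names are case-insensitive.
                chapter = title.lower()
                target = chapter if chapter in ("description", "info") else None
            elif stars == 2 and chapter == "dictionary":
                target = ("entry", title)
            else:
                target = None             # not handled by this sketch
        else:
            body.append(line)
    flush()
    return result
```

Calling parse_outline on the text of a small outline file yields the short description, the info text, and a mapping from headwords to articles.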
The initialization of the outline module does not require any command line parameters. To declare a database, supply its full file name to the database handler directive, as shown in the example below:
load-module outline;
database {
        name "devdict";
        handler "outline /var/db/devdict.out";
}
5.2 Dictorg
The dictorg module supports dictionaries in the format designed by the DICT development group (http://dict.org). Lots of free dictionaries in this format are available from the FreeDict project.
A dictionary in this format consists of two files: a dictionary database file, named name.dict or name.dict.dz (a compressed form), and an index file, which lists article headwords with the corresponding offsets in the database. The index file is named name.index. The common part of these two file names, name, is called the base name for that dictionary.
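To make the role of the index file more concrete, here is a hedged Python sketch that decodes one index line. The text above does not spell out the line layout; the sketch assumes the layout commonly used by dictd-style indexes, namely tab-separated headword, offset and length, with both numbers written in base-64 digits. The function names are invented for the example.

```python
# Assumed digit alphabet: 'A' is 0, '/' is 63.
B64_DIGITS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_number(digits):
    """Decode a number written in base-64 digits, most significant first."""
    value = 0
    for ch in digits:
        value = value * 64 + B64_DIGITS.index(ch)
    return value

def parse_index_line(line):
    """Split one index line into (headword, offset, length)."""
    headword, offset, length = line.rstrip("\n").split("\t")[:3]
    return headword, b64_number(offset), b64_number(length)
```

With such a triple, an article could be read by seeking to the offset in the uncompressed name.dict file and reading the given number of bytes.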
An instance of the dictorg module is created using the following statement:
load-module inst-name {
        command "dictorg [options]";
}
where square brackets denote an optional part. Valid options are the following:
- dbdir=dir
Look for databases in directory dir.
- show-dictorg-entries
Dictorg entries are special database entries that keep some service information, such as database description, etc. Such entries are marked with headwords that begin with ‘00-database-’. By default they are exempt from database look-ups and cannot be retrieved using the MATCH or DEFINE command.
Using show-dictorg-entries removes this limitation.
- sort
Sort the database index after loading. This option is designed for use with some databases that have malformed indexes. At the time of this writing the ‘eng-swa’ database from FreeDict requires this option.
Using sort may considerably slow down initial database loading.
- trim-ws
Remove trailing whitespace from dictionary headwords at start up. This might be necessary for some databases.
The values set via these options become defaults for all databases using this module instance, unless overridden in their declarations.
A database that uses this module must be declared as follows:
database {
        handler "inst-name database=file [options]";
        ...
}
where inst-name is the instance name used in the load-module declaration above.
The database argument specifies the base name of the database. Unless file begins with a slash, the value of the dbdir initialization option is prepended to it. If dbdir is not given and file does not begin with a slash, an error is signalled.
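The base name resolution rule can be sketched in a few lines of Python. The function name resolve_base_name is hypothetical, and the use of ValueError merely stands in for whatever error the module actually signals.

```python
def resolve_base_name(file, dbdir=None):
    """Return the full base name of a dictorg database."""
    if file.startswith("/"):
        # Absolute names are used as given.
        return file
    if dbdir is None:
        # Relative name and no dbdir: the rule above calls for an error.
        raise ValueError("relative base name %r given without dbdir" % file)
    return dbdir.rstrip("/") + "/" + file
```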
The options are the same as those described for module initialization: show-dictorg-entries, sort, and trim-ws. If used, they override initialization settings for that particular database. Forms prefixed with ‘no’ can be used to disable the corresponding option for this database. For example, notrim-ws cancels the effect of trim-ws used when initializing the module instance.
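The way database options override instance defaults can be pictured with the following Python sketch. It models only the three boolean options named above; the function name effective_flags and the list-of-strings representation of options are assumptions made for the example.

```python
FLAGS = ("show-dictorg-entries", "sort", "trim-ws")

def effective_flags(instance_opts, database_opts):
    """Combine instance defaults with per-database overrides."""
    flags = {name: False for name in FLAGS}
    # Instance options are applied first, so database options win.
    for opt in list(instance_opts) + list(database_opts):
        if opt in FLAGS:
            flags[opt] = True
        elif opt.startswith("no") and opt[2:] in FLAGS:
            flags[opt[2:]] = False
    return flags
```

For instance, an instance loaded with trim-ws and sort, combined with a database declared with notrim-ws, ends up with sort enabled and trim-ws disabled.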
5.3 Gcide
The gcide module provides support for the GNU Collaborative International Dictionary of English. This dictionary can be downloaded from ftp://ftp.gnu.org/gnu/gcide. It consists of a set of files named from CIDE.A through CIDE.Z, written using a special markup. See http://gcide.gnu.org.ua for detailed information about the dictionary.
The gcide module is started via the following statement:
load-module gcide;
The database is initialized as follows:
database {
        handler "gcide dbdir=directory [options]";
        ...
}
The ‘dbdir’ parameter supplies the name of the directory where database files are located. Upon startup, the module scans the dictionary files and creates an index file, named GCIDE.IDX, if it does not already exist. The file is created using an ancillary program idxgcide, described below. Unless specified otherwise, this file is created in the same directory where the database files are located, therefore the directory must be writable by the user that dicod runs as.
Other options are:
- gcide parameter: idxdir directory
Specifies the directory where the GCIDE.IDX index file resides or should reside.
- gcide parameter: index-cache-size size
Sets the maximum number of index pages the module keeps in memory simultaneously. The default value is 16. The pages are cached using the least recently used algorithm. Raising this value will make dictionary accesses faster at the expense of using more memory.
- gcide parameter: index-program progname
Specifies the full name of the index program. Usually this option is not needed, because the module is configured to start the idxgcide utility from its default location. It is mostly useful for the module developers.
- gcide parameter: suppress-pr
This parameter suppresses the output of ‘pr’ (pronunciation) tags. According to GCIDE docs, very few of the pronunciation fields have been filled in, so it might be reasonable to avoid displaying them at all.
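The trade-off behind index-cache-size can be illustrated with a small least-recently-used page cache. This Python sketch is not the module's implementation: the class name PageCache and the load_page callable (which stands in for reading a page from the index file) are assumptions made for the example.

```python
from collections import OrderedDict

class PageCache:
    """Keep at most `capacity` index pages, evicting the least recently used."""

    def __init__(self, load_page, capacity=16):
        self._load_page = load_page      # hypothetical page reader
        self._capacity = capacity
        self._pages = OrderedDict()      # oldest entry first

    def get(self, page_no):
        if page_no in self._pages:
            # Cache hit: mark the page as most recently used.
            self._pages.move_to_end(page_no)
            return self._pages[page_no]
        page = self._load_page(page_no)  # cache miss: read the page
        self._pages[page_no] = page
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)  # drop the least recently used
        return page
```

A larger capacity means fewer calls to load_page for repeated look-ups, at the cost of holding more pages in memory.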
Starting from version 0.51, GCIDE contains the file INFO, which provides basic information about the dictionary. The gcide module returns the contents of this file in response to the ‘SHOW INFO’ request. The first line of this file (with the trailing newline and final period removed) is returned as the short database description.
Here’s a full example of a ‘gcide’ database configuration, as used on ‘dico.gnu.org.ua’:
load-module gcide;
database {
        name "gcide";
        handler "gcide dbdir=/var/dictdb/gcide-0.51 suppress-pr";
        languages-from "en";
        languages-to "en";
}
5.3.1 idxgcide
The idxgcide utility is used by the gcide module to index the GCIDE dictionary. You can start it manually to reindex the database. This may be needed, for example, if you install a modified version of the dictionary. The program is installed in libexecdir. The usage is:
idxgcide [options] dbdir [idxdir]
The only mandatory argument dbdir specifies the name of the directory where the GCIDE dictionary is installed. The optional idxdir argument specifies the directory for the index file, if it differs from dbdir. Available options are:
- --debug
- -d
Debug lexical analyzer.
- --dry-run
- -n
Do nothing, but print everything. This implies --verbose.
- --verbose
- -v
Increase output verbosity. This option can be specified multiple times, each occurrence increasing the verbosity level by one. By default the utility outputs only errors and warnings. At level one, it prints additionally the names of source files that are being indexed at the moment. At level two (the maximum level implemented at the moment) it outputs each headword being indexed along with its location. This is useful only for debugging.
- --page-size=number
- -p number
Defines the size of an index file page. The number specifies the size in bytes. The following case-insensitive suffixes can be used: ‘k’ (‘kb’), ‘m’ (‘mb’) or ‘g’ (‘gb’), specifying kilobytes, megabytes and gigabytes (ouch!) respectively.
The default page size is 10240 bytes.
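The suffix rules for --page-size can be sketched as follows. This Python function is only an illustration, not idxgcide's parser: the name parse_page_size is invented, and the sketch assumes that the units are powers of 1024, which is at least consistent with the default of 10240 bytes being 10k.

```python
import re

# Assumed multipliers: binary units (1k = 1024 bytes).
_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_page_size(text):
    """Convert a size such as '10240', '10k' or '10KB' to a byte count."""
    match = re.fullmatch(r"(\d+)(?:([kmg])b?)?", text.strip().lower())
    if match is None:
        raise ValueError("invalid page size: %r" % text)
    number, unit = match.groups()
    return int(number) * _MULTIPLIERS[unit or ""]
```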
5.4 Wordnet
WordNet is a lexical database for the English language, created and maintained at the Cognitive Science Laboratory of Princeton University. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets.
Dico provides a wordnet module for reading WordNet lexical database files. The module relies on libWN, the support library distributed with the WordNet database.
There is a point worth noting if you plan to use the WordNet library. Normally, libWN is compiled as a static library with position-dependent code, which makes it difficult (or impossible, on 64-bit architectures) to use from dynamically loaded libraries, such as dicod modules. So, first of all you will need to rebuild WordNet so that it contains position-independent code. To do so, change to the WordNet source directory and reconfigure it as follows:
./configure CFLAGS=-fPIC [other_options]
where other_options stands for any other options you might wish to pass to configure.
If you are going to run this command in a source directory that has been previously configured, it is advisable to run ‘make distclean’ beforehand.
Debian-based systems provide a package ‘wordnet-dev’, which contains a properly built shared library. However, this library is named ‘libwordnet.so’, instead of the expected ‘libWN.so’. On such systems you will have to use the --with-libWN option to configure, in order to inform it about the change:
./configure --with-libWN=wordnet
The argument to this option is the new base name for the libWN library, without the file suffix. The ‘lib’ prefix may optionally be included.
The wordnet module is compiled automatically if the configure script was able to find the library and its header file wn.h. If it was not, use the --with-wordnet configure option to specify the location where these files can be found. For example, if WordNet was installed using the default procedure, then the following option will do the job:
./configure --with-wordnet=/usr/local/WordNet-3.0
This command tells Dico to look for WordNet library files in /usr/local/WordNet-3.0/lib and for include files in /usr/local/WordNet-3.0/include.
A compiled module is loaded using the following statement:
load-module wordnet {
        command "wordnet [parameters]";
}
Optional parameters are:
- wordnet module parameter: wnhome dir
Base directory for WordNet files. This is the directory where WordNet was installed. For the wordnet module to work, it must contain the dict subdirectory with WordNet dictionary files.
If you installed WordNet to /usr/local/WordNet-3.0, so that running ls on that directory shows you:
$ ls /usr/local/WordNet-3.0/
bin/ dict/ doc/ include/ lib/ man/
then you would use
load-module wordnet {
        command "wordnet wnhome=/usr/local/WordNet-3.0";
}
- wordnet module parameter: wnsearchdir dir
Directory in which the WordNet database has been installed.
Normally, these values are set at compile time and you won’t need to override them. The use of these parameters may, however, be necessary if the database was moved or installed in a non-standard location.
One or more WordNet database instances can be defined. They all will be sharing the same database. The reason for having several database instances is that they may have different output options. For example, you may configure one database to return word definitions and another one to act as a thesaurus.
Dico version 2.10 defines the following database parameters:
- wordnet database parameter: pos value
Select part of speech to be displayed by this database. By default, all parts of speech are displayed. Valid values are:
- all
Display all parts of speech. This is the default.
- noun
Display only nouns.
- verb
Display only verbs.
- adj
- adjective
Display only adjectives.
- adv
- adverb
Display only adverbs.
- satellite
- adjsat
Display only satellites.
- wordnet database parameter: merge-defs
When specified, this parameter instructs the WordNet database to merge all definitions with the same part of speech into a single definition, which will be returned in the usual dictionary fashion, e.g.:
sail
n. 1. a large piece of fabric (usually canvas fabric) by means of which wind is used to propel a sailing vessel
   Synonyms: {canvas}, {canvass}, {sheet}
   2. an ocean trip taken for pleasure
   Synonyms: {cruise}
   3. any structure that resembles a sail
v. 1. traverse or travel on (a body of water); "We sailed the Atlantic"; "He sailed the Pacific all alone"
   2. move with sweeping, effortless, gliding motions
By default, each definition is returned as a separate entry.
As an example, the following is the database definition the author uses on his server:
database {
        name "WordNet";
        handler "wordnet merge-defs";
        languages-from "en";
        languages-to "en";
        description "WordNet dictionary, version 3.0";
}
5.5 Guile
Guile is an acronym for GNU’s Ubiquitous Intelligent Language for Extensions. It provides a Scheme interpreter conforming to the R5RS language specification and a number of convenience functions. For information about the language, refer to Revised(5) Report on the Algorithmic Language Scheme. For a detailed description of Guile and its features, see Overview in The Guile Reference Manual.
The guile module provides an interface to Guile that allows for writing GNU Dico modules in Scheme. The module is loaded using the following configuration file statement:
load-module mod-name {
        command "guile [options]"
                " init-script=script"
                " init-args=args"
                " init-fun=function";
}
The init-script parameter specifies the name of a Scheme source file to be loaded in order to initialize the module.
The init-args parameter supplies additional arguments to the module. They will be accessible to the script via the command-line function. This parameter is optional.
The init-fun parameter specifies the name of a function that will be invoked to perform initialization of the module and of particular databases. See Guile Initialization, for a description of the initialization sequence. Optional arguments, options, are:
- debug
Enable Guile debugging and stack traces.
- nodebug
Disable Guile debugging and stack traces (default).
- load-path=path
Append directories from path to the list of directories which should be searched for Scheme modules and libraries. The path must be a list of directory names, separated by colons.
This option modifies the value of Guile’s %load-path variable. See the section Configuration and Installation in the Guile Reference Manual.
Guile databases are declared using the following syntax:
database {
        name "dbname";
        handler "mod-name [options] cmdline";
}
where:
- dbname
gives the name for this database,
- mod-name
the name given to the Guile module in the load-module statement (see above),
- options
options that override global settings given in the load-module statement. The following options are understood: init-script, init-args, and init-fun. Their meaning is the same as for the load-module statement (see above), except that they affect only this particular database.
- cmdline
the command line that will be passed to the Guile open-db callback function (see open-db).
5.5.1 Virtual Functions
A database handled by the guile module is assigned a virtual function table. This table is an association list which keeps Scheme call-back functions implemented to perform particular tasks on that database. In this list, the car of each element contains the name of a function, and its cdr gives the corresponding function. The defined function names and their semantics are:
- open
Open the database.
- close
Close the database.
- descr
Return a short description of the database.
- info
Return full information about the database.
- define
Define a word.
- match
Look up a word in the database.
- output
Output a search result.
- result-count
Return number of entries in the result.
For example, the following is a valid virtual function table:
(list (cons "open" open-module)
      (cons "close" close-module)
      (cons "descr" descr)
      (cons "info" info)
      (cons "define" define-word)
      (cons "match" match-word)
      (cons "output" output)
      (cons "result-count" result-count))
Apart from a per-database virtual table, there is also a global virtual function table, which supplies entries missing in the former. Both tables are created during the module initialization, as described in the next subsection.
The purposes of particular virtual functions are described in Guile API.
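The relation between the two tables can be modelled in a few lines of Python, with dictionaries standing in for Scheme association lists. This is only an analogy: the function name effective_table is invented, and the real lookup happens inside the guile module.

```python
from collections import ChainMap

def effective_table(db_table, global_table):
    """Per-database entries take precedence; the global table fills the gaps."""
    return ChainMap(db_table, global_table)
```

A callback missing from both tables is simply absent from the combined view, which corresponds to an optional callback that was never defined.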
5.5.2 Guile Initialization
The following configuration statement causes loading and initialization of the guile module:
load-module mod-name {
        command "guile init-script=script"
                " init-fun=function";
}
At the module initialization stage, the module attempts to load the file named script. The file is loaded using the primitive-load call (see Loading in The Guile Reference Manual), i.e. the load paths are not searched, so script must be an absolute path name. The init-fun parameter supplies the name of the initialization function. This Scheme function constructs virtual function tables for the module itself and for each database that uses this module. It must be declared as follows:
(define (function arg) ...)
This function is called several times. First of all, it is called after the script is loaded. This time it is given #f as its argument, and its return value is saved as a global function table. Then, it is called for each database statement that has mod-name (used in load-module above) in its handler keyword, e.g.:
database {
        name db-name;
        handler "mod-name …";
}
This time, it is given db-name as its argument and the value it returns is stored as the virtual function table for this particular database.
The following example function returns a complete virtual function table:
(define-public (my-dico-init arg)
  (list (cons "open" open-module)
        (cons "close" close-module)
        (cons "descr" descr)
        (cons "info" info)
        (cons "lang" lang)
        (cons "define" define-word)
        (cons "match" match-word)
        (cons "output" output)
        (cons "result-count" result-count)))
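The calling sequence of the initialization function described in this subsection can be mimicked in Python. The sketch below is a model, not server code: build_tables is a hypothetical name, and None plays the role of Scheme's #f.

```python
def build_tables(init_fun, database_names):
    """Call init_fun once for the module, then once per database."""
    # First call: no database name, the result is the global table.
    global_table = init_fun(None)
    # Later calls: one per database that names this module in its handler.
    db_tables = {name: init_fun(name) for name in database_names}
    return global_table, db_tables
```

An initialization function may therefore return different tables depending on whether it receives a database name.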
5.5.3 Guile API
This subsection describes callback functions that a Guile database module must provide. Each description begins with the function prototype and its entry in the virtual function table.
Callback functions can be subdivided into two groups: database functions and search functions.
Database callback functions are responsible for opening and closing databases and for returning information about them.
- Guile Callback: open-db name . args
Virtual table: (cons "open" open-db)
Open the database. The argument name contains the database name as given in the name statement of the corresponding database block (see Databases). Optional argument args is a list of command line parameters obtained from cmdline in the handler statement (see guile-cmdline). For example, if the configuration file contained:
database {
        name "foo";
        handler "guile db=file 1 no";
}
then the open-db callback will be called as:
(open-db "foo" '("db=file" "1" "no"))
The open-db callback returns a database handle, i.e. an opaque object that will subsequently be used to identify this database. This value, hereinafter named dbh, will be passed to other callback functions that need to access the database.
The return value #f or '() indicates an error.
- Guile Callback: close-db dbh
Virtual table: (cons "close" close-db)
Close the database. This function is called during the cleanup procedure, before termination of dicod. The argument dbh is a database handle returned by open-db.
The return value from close-db is ignored. To communicate errors to the daemon, throw an exception.
- Guile Callback: descr dbh
Virtual table: (cons "descr" descr)
Return a short textual description of the database, for use in SHOW DB output. If there is no description, returns #f or '().
The argument dbh is a database handle returned by open-db.
This callback is optional. If it is not defined, or if it returns #f ('()), the text from the description statement is used (see description). Otherwise, if no description statement is present, an empty string will be returned.
- Guile Callback: info dbh
Virtual table: (cons "info" info)
Return a verbose, possibly multi-line, textual description of the database, for use in SHOW INFO output. If there is no description, returns #f or '().
The argument dbh is a database handle returned by open-db.
This callback is optional. If it is not defined, or if it returns #f ('()), the text from the info statement is used (see info). If there is no info statement, the string ‘No information available’ is used.
- Guile Callback: lang dbh
Virtual table: (cons "lang" lang)
Return a cons of languages supported by this database: its car is a list of source languages, and its cdr is a list of destination languages. For example, the following return value indicates that the database contains translations from English to French and Spanish:
(cons (list "en") (list "fr" "es"))
A database is searched in a two-phase process. First, an appropriate callback is called to do the search: define-word is called for DEFINE searches and match-word is called for matches. This callback returns an opaque entity, called result handle, which is then passed to the output callback, which is responsible for outputting it.
- Guile Callback: define-word dbh word
Virtual table: (cons "define" define-word)
Find definitions of word word in the database dbh. Return a result handle. If nothing is found, return #f or '().
The argument dbh is the database handle returned by open-db.
- Guile Callback: match-word dbh strat key
Virtual table: (cons "match" match-word)
Find in the database dbh all headwords that match key, using strategy strat. Return a result handle. If nothing is found, return #f or '().
The key is a Dico Key object, which contains information about the word being looked for. To obtain the actual word, use the dico-key->word function (see dico-key->word).
The argument dbh is a database handle returned by open-db. The matching strategy strat is a special Scheme object that can be accessed using a set of functions described below (see Dico Scheme Primitives).
- Guile Callback: result-count resh
Virtual Table:
(cons "result-count" result-count)
Return the number of elements in the result set resh.
- Guile Callback: output resh n
Virtual table: (cons "output" output)
Output the nth result from the result set resh. The argument resh is a result handle returned by the define-word or match-word callback.
The data must be output to the current output port, e.g. using the display or format primitives. If resh represents a match result, the output must not be quoted or terminated by newlines.
It is guaranteed that the output callback will be called as many times as there are elements in resh (as determined by the result-count callback) and that for each subsequent call the value of n equals its value from the previous call incremented by one.
At the first call n equals 0.
5.5.4 Dico Scheme Primitives
GNU Dico provides the following Scheme primitives for accessing various fields of the strat and key arguments to the match callback:
- Function: dico-key? obj
Return ‘#t’ if obj is a Dico key object.
- Function: dico-key->word key
Extract the lookup word from the key object key.
- Function: dico-make-key strat word
Create new key object from strategy strat and word word.
- Function: dico-strat-selector? strat
Return true if strat has a selector (see Selector).
- Function: dico-strat-select? strat word key
Return true if key matches word as per strategy selector strat. The key is a ‘Dico Key’ object.
- Function: dico-strat-name strat
Return the name of strategy strat.
- Function: dico-strat-description strat
Return a textual description of the strategy strat.
- Function: dico-strat-default? strat
Return true if strat is a default strategy. See default strategy.
- Function: dico-register-strat strat descr [fun]
Register a new strategy. If fun is given, it will be used as a callback for that strategy. Notice that you can use strategies implemented in Guile in your C code as well (see strategy).
The selector function must be declared as follows:
(define (fun key word) ...)
It must return #t if key matches word, and #f otherwise.
5.5.5 Example Module
In this subsection we will show how to build a simple dicod module written in Scheme. The source code of this module, called listdict.scm, and a short database for it, numerals-pl.db, are shipped with the distribution in the directory examples.
The database is stored in a disk file in the form of a list. The first two elements of this list contain the database description and full information strings. The remaining elements are conses, whose car contains the headword, and cdr contains the corresponding dictionary article. Following is an example of such a database:
("Short English-Norwegian numerals dictionary"
 "Short English-Norwegian dictionary of numerals (1 - 7)"
 ("one" . "en")
 ("two" . "to")
 ("three" . "tre")
 ("four" . "fire")
 ("five" . "fem")
 ("six" . "seks")
 ("seven" . "sju"))
We wish to declare such databases in dicod.conf the following way:
database {
        name "numerals";
        handler "guile example.db";
}
Thus, the rest argument to the ‘open-db’ callback will be ‘("guile" "example.db")’ (see open-db). Given this, we may write the callback as follows:
(define (open-db name . rest)
  (let ((db (with-input-from-file
                (cadr rest)
              (lambda () (read)))))
    (cond
     ((list? db) (cons name db))
     (else
      (format (current-error-port)
              "open-db: ~A: invalid format\n"
              (cadr rest))
      #f))))
The list returned by this callback will then be passed as a database handle to other callback functions. To facilitate access to particular elements of this list, it is convenient to define the following syntax:
(define-syntax db:get
  (syntax-rules (info descr name corpus)
    ((db:get dbh name)   ;; Return the name of the database.
     (list-ref dbh 0))
    ((db:get dbh descr)  ;; Return the description.
     (list-ref dbh 1))
    ((db:get dbh info)   ;; Return the info string.
     (list-ref dbh 2))
    ((db:get dbh corpus) ;; Return the word list.
     (list-tail dbh 3))))
Now, we can write ‘descr’ and ‘info’ callbacks:
(define (descr dbh)
  (db:get dbh descr))

(define (info dbh)
  (db:get dbh info))
The two callbacks ‘define-word’ and ‘match-word’ provide the core module functionality. Their results will be passed to the ‘output’ and ‘result-count’ callbacks as a “result handle” argument. In the spirit of Scheme, we make the result a list. Its car is a boolean value: #t, if the result comes from the ‘define-word’ callback, and #f if it comes from ‘match-word’. The cdr of this list contains a list of matches. For ‘define-word’, it is a list of conses copied from the database word list, whereas for ‘match-word’, it is a list of headwords.
The ‘define-word’ callback returns all list entries whose cars contain the look up word. It uses the mapcan function, which is supposed to be defined elsewhere:
(define (define-word dbh word)
  (let ((res (mapcan (lambda (elt)
                       (and (string-ci=? word (car elt))
                            elt))
                     (db:get dbh corpus))))
    (and res (cons #t res))))
The ‘match-word’ callback (see match-word) takes three arguments: a database handle dbh, a strategy descriptor strat, and a key object key describing the word to look for. The result handle it returns contains a list of headwords from the database that match the key in the sense of strat. Thus, the behavior of ‘match-word’ depends on strat. To implement this, let’s define a list of directly supported strategies (see below for definitions of particular ‘match-’ functions):
(define strategy-list
  (list (cons "exact" match-exact)
        (cons "prefix" match-prefix)
        (cons "suffix" match-suffix)))
The ‘match-word’ callback will then select an entry from that list and call its cdr, e.g.:
(define (match-word dbh strat key)
  (let ((sp (assoc (dico-strat-name strat) strategy-list)))
    (let ((res (cond
                (sp
                 ((cdr sp) dbh strat (dico-key->word key)))
If the requested strategy is not in that list, the function will use the selector function if it is available, and the default matching function otherwise:
                ((dico-strat-selector? strat)
                 (match-selector dbh strat key))
                (else
                 (match-default dbh strat (dico-key->word key))))))
Notice the use of the dico-key->word function to extract the actual lookup word from the key object.
To summarize, the ‘match-word’ callback is:
(define (match-word dbh strat key)
  (let ((sp (assoc (dico-strat-name strat) strategy-list)))
    (let ((res (cond
                (sp
                 ((cdr sp) dbh strat (dico-key->word key)))
                ((dico-strat-selector? strat)
                 (match-selector dbh strat key))
                (else
                 (match-default dbh strat (dico-key->word key))))))
      (if res
          (cons #f res)
          #f))))
Now, let’s create the ‘match-’ functions it uses. The ‘exact’ strategy is easy to implement:
(define (match-exact dbh strat word)
  (mapcan (lambda (elt)
            (and (string-ci=? word (car elt))
                 (car elt)))
          (db:get dbh corpus)))
The ‘prefix’ and ‘suffix’ strategies are implemented using the SRFI-13 (see SRFI-13 in The Guile Reference Manual) functions string-prefix-ci? and string-suffix-ci?, e.g.:
(define (match-prefix dbh strat word)
  (mapcan (lambda (elt)
            (and (string-prefix-ci? word (car elt))
                 (car elt)))
          (db:get dbh corpus)))
Notice that whereas the ‘prefix’ strategy is defined by the server itself, the ‘suffix’ strategy is an extension, and should therefore be registered:
(dico-register-strat "suffix" "Match word suffixes")
The match-selector function is pretty similar to its siblings, except that it uses dico-strat-select? (see dico-strat-select?) to select the matching elements. This also leads to this function expecting a key as its third argument, in contrast to the previous matchers, which expect the actual lookup word there:
(define (match-selector dbh strat key)
  (mapcan (lambda (elt)
            (and (dico-strat-select? strat (car elt) key)
                 (car elt)))
          (db:get dbh corpus)))
Finally, match-default is a variable that refers to the default matching strategy for this module, e.g.:
(define match-default match-prefix)
The two callbacks left to define are ‘result-count’ and ‘output’. The first of them simply returns the number of elements in the cdr of the result:
(define (result-count rh) (length (cdr rh)))
The behavior of ‘output’ depends on whether the result is produced by ‘define-word’ or by ‘match-word’.
(define (output rh n)
  (if (car rh)
      ;; Result comes from DEFINE command.
      (let ((res (list-ref (cdr rh) n)))
        (display (car res))
        (newline)
        (display (cdr res)))
      ;; Result comes from MATCH command.
      (display (list-ref (cdr rh) n))))
Finally, at the end of the module the callbacks are made known to dicod by the module initialization function:
(define-public (example-init arg)
  (list (cons "open" open-db)
        (cons "descr" descr)
        (cons "info" info)
        (cons "define" define-word)
        (cons "match" match-word)
        (cons "output" output)
        (cons "result-count" result-count)))
Notice that in this implementation the ‘close-db’ callback was not needed.
5.6 Python
The python module provides an interface which allows programmers to write loadable modules in Python. The syntax for loading the module is:
load-module name {
        command "python"
                " init-script=name"
                " load-path=path"
                " root-class=name";
}
All parameters are optional:
- python module: load-path=path
Augments the default search path for Python modules. The format of path is the usual UNIX path specification: a colon-separated list of directory names.
- python module: init-script=name
Specifies the name of the initial Python source file. This file will be loaded and interpreted immediately after loading the module.
- python module: root-class=name
Sets the name of the Python root class, which is responsible for the dictionary operations.
A particular instance of the python module is loaded using the handler statement within a database block. This statement takes the same parameters as described above, plus any number of command line arguments, which will be passed to the root class constructor.
5.6.1 Python Dictionary Class
The dictionary class must define the following methods:
- Method on DictionaryClass: __init__ self *argv
Class constructor. The argv array supplies positional arguments from the handler statement in the configuration file.
- Method on DictionaryClass: open self dbname
Opens the database named dbname. Returns ‘True’ on success and ‘False’ on failure.
- Method on DictionaryClass: close self
Closes the database.
- Method on DictionaryClass: descr self
Returns a short description of the database.
- Method on DictionaryClass: info self
Returns a text describing the database.
- Method on DictionaryClass: lang self
Optional. Returns supported languages as ‘(src, dst)’.
- Method on DictionaryClass: define_word self word
Defines word. Returns a result (an opaque Python object) if the definition was found or ‘False’ otherwise.
- Method on DictionaryClass: match_word self strat word
Searches for word in the database using strategy strat. Returns a result (an opaque Python object) if some matches were found or ‘False’ otherwise.
- Method on DictionaryClass: output self result n
Outputs the nth result from the result set result.
- Method on DictionaryClass: result_count self result
Returns the number of elements in the result set.
- Method on DictionaryClass: compare_count self result
Optional. Returns the number of comparisons performed when constructing the result set.
- Method on DictionaryClass: result_headers self result hdr
Optional. Returns a dictionary of MIME headers.
- Method on DictionaryClass: free_result self result
Reclaims any resources used by the result set.
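As an illustration of the method list above, here is a minimal sketch of a class providing every mandatory method. The class name ‘SkeletonDictionary’, its hard-coded two-entry data and the use of plain lists as result objects are assumptions made for this sketch. Only the ‘exact’ strategy is handled.

```python
import sys


class SkeletonDictionary:
    """Smallest useful shape of a dictionary class (illustrative only)."""

    def __init__(self, *argv):
        # Positional arguments from the handler statement; unused here.
        self.argv = argv
        self.entries = {}

    def open(self, dbname):
        # A real module would read its data source here.
        self.entries = {"one": "en", "two": "to"}
        return True

    def close(self):
        return True

    def descr(self):
        return "Skeleton database"

    def info(self):
        return "A hard-coded two-entry database used for illustration."

    def define_word(self, word):
        # The result is an opaque object; a plain list is used here.
        article = self.entries.get(str(word).lower())
        return [article] if article is not None else False

    def match_word(self, strat, word):
        # Only the 'exact' strategy is sketched.
        if strat.name != "exact":
            return False
        headword = str(word).lower()
        return [headword] if headword in self.entries else False

    def output(self, result, n):
        sys.stdout.write(result[n])
        return True

    def result_count(self, result):
        return len(result)

    def free_result(self, result):
        # Plain lists hold no external resources.
        pass
```

The optional methods (‘lang’, ‘compare_count’, ‘result_headers’) are left out on purpose.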
5.6.2 Dico Python Primitives
- Python primitive: register_strat name descr [proc]
Registers a new match strategy. The arguments are:
- name
Strategy name for use in the MATCH command.
- descr
The description, which will appear in the output of the SHOW STRAT command.
- proc
Optional selector procedure.
If the proc argument is present, it must be the name of a Python function declared as:
def select (opcode, key, headword):
Its arguments are:
- opcode
Integer operation code.
- key
A DicoSelectionKey object identifying the search term (see DicoSelectionKey).
- headword
The headword being examined.
At the beginning of the search, the function is called with ‘DICO_SELECT_BEGIN’ as its opcode argument. It must perform the necessary initialization and return.
At the end of the search loop, the function is called with opcode ‘DICO_SELECT_END’. It must perform the necessary deinitialization procedures and return.
In both cases, the key and headword arguments are not defined.
Within the search loop, the function will be called for each headword from the database. The opcode parameter will be ‘DICO_SELECT_RUN’. In this case the function must return ‘True’ if the headword matches the key and ‘False’ otherwise.
- Python primitive: register_markup name
Registers a markup name.
- Python primitive: current_markup
Returns the name of the current markup.
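The selector protocol described for register_strat can be sketched as follows. The function ‘suffix_select’ and the helper ‘run_search’ are hypothetical, and the numeric opcode values are stand-ins so that the sketch runs outside dicod. Returning ‘True’ for the begin and end opcodes is also an assumption.

```python
# Stand-in opcode values for experiments outside dicod. Inside the
# server the DICO_SELECT_* constants are supplied by the Dico
# primitives; the numbers below are assumptions.
DICO_SELECT_BEGIN, DICO_SELECT_RUN, DICO_SELECT_END = range(3)


def suffix_select(opcode, key, headword):
    """Hypothetical selector: match headwords that end with the key."""
    if opcode == DICO_SELECT_BEGIN:
        return True   # nothing to initialize
    if opcode == DICO_SELECT_END:
        return True   # nothing to clean up
    # DICO_SELECT_RUN: key and headword are defined only in this case.
    return headword.lower().endswith(str(key).lower())


def run_search(select, key, headwords):
    """Imitate the search loop: begin, one call per headword, end."""
    select(DICO_SELECT_BEGIN, None, None)
    matches = [hw for hw in headwords if select(DICO_SELECT_RUN, key, hw)]
    select(DICO_SELECT_END, None, None)
    return matches
```

Inside dicod, such a function would be supplied as the proc argument of register_strat, and the server would drive the loop itself.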
5.6.2.1 The DicoSelectionKey class
The DicoSelectionKey class represents a search key and is used when looking for matches. Calling str on an object of that class returns the search term itself, as does the word method:
- Method on DicoSelectionKey: word
Returns the search term. It is equivalent to the __str__ method.
5.6.2.2 The DicoStrategy class
A match strategy is represented by an object of the DicoStrategy class.
- Variable of DicoStrategy: name
The name of that strategy.
- Variable of DicoStrategy: descr
Textual description of the strategy.
- Variable of DicoStrategy: has_selector
‘True’ if this strategy has a selector (see Python Selector).
- Variable of DicoStrategy: is_default
‘True’ if this is the default strategy.
- Method on DicoStrategy: select headword key
Returns ‘True’ if key matches headword as per this strategy.
5.6.3 Python Example
In this subsection we will show a simple database module written in Python. This module handles simple textual databases in the following format:
- Empty lines and lines beginning with double dash are ignored.
- A line beginning with ‘descr:’ introduces a short dictionary description for SHOW DB. The ‘descr:’ prefix and the white space immediately following it are removed. E.g.:
descr: Short English-Norwegian numerals dictionary
- Lines beginning with ‘info:’ provide a verbose description of the database. These lines are concatenated after removing the ‘info:’ prefix and white space immediately following it. E.g.:
info: A short English-Norwegian (Bokmål) dictionary
info: of numerals.
info:
info: This dictionary is public domain.
- A line beginning with ‘lang:’ defines source and destination languages for this dictionary. E.g.:
lang: en : nb
- Any line consisting of exactly two words defines a dictionary entry. E.g.:
one en
two to
three tre
four fire
Now, let’s create a module for handling this format. First, we need to import Dico primitives (see Dico Python Primitives) and the ‘sys’ module. The latter is needed for output functions:
import dico
import sys
Then, a result class will be needed for match_word and define_word methods. It will contain the actual data in the variable ‘result’:
class DicoResult:
    # actual data.
    result = {}
    # number of comparisons.
    compcount = 0

    def __init__ (self, *argv):
        self.result = argv[0]
        if len (argv) == 2:
            self.compcount = argv[1]

    def count (self):
        return len (self.result)

    def output (self, n):
        pass

    def append (self, elt):
        self.result.append (elt)
The following two classes extend ‘DicoResult’ for use with ‘DEFINE’ and ‘MATCH’ operations. The define_word method will return an instance of the ‘DicoDefineResult’ class:
class DicoDefineResult (DicoResult):
    def output (self, n):
        print "%d. %s" % (n + 1, self.result[n])
        print "---------",
The match_word method will return an instance of the ‘DicoMatchResult’ class:
class DicoMatchResult (DicoResult):
    def output (self, n):
        sys.stdout.softspace = 0
        print self.result[n],
Now, let’s define the dictionary class:
class DicoModule:
    # The dictionary converted to associative array.
    adict = {}
    # The database name.
    dbname = ''
    # The name of the corresponding disk file.
    filename = ''
    # A short description of the database.
    mod_descr = ''
    # A verbose description of the database is kept
    # as an array of strings.
    mod_info = []
    # A list of source and destination languages:
    langlist = ()
The class constructor takes a single argument, defining the name of the database file:
    def __init__ (self, *argv):
        self.filename = argv[0]
        pass
The ‘open’ method opens the database and reads its data:
    def open (self, dbname):
        self.dbname = dbname
        file = open (self.filename, "r")
        for line in file:
            if line.startswith ('--'):
                continue
            if line.startswith ('descr: '):
                self.mod_descr = line[7:].strip (' \n')
                continue
            if line.startswith ('info: '):
                self.mod_info.append (line[6:].strip (' \n'))
                continue
            if line.startswith ('lang: '):
                s = line[6:].strip (' \n').split(':', 2)
                if (len(s) == 1):
                    self.langlist = (s[0].split (), \
                                     s[0].split ())
                else:
                    self.langlist = (s[0].split (), \
                                     s[1].split ())
                continue
            f = line.strip (' \n').split (' ', 1)
            if len (f) == 2:
                self.adict[f[0].lower()] = f[1].strip (' ')
        file.close()
        return True
The database is kept entirely in memory, so there is no need for ‘close’ method. However, it must be declared anyway:
    def close (self):
        return True
The methods returning database information are trivial:
    def descr (self):
        return self.mod_descr

    def info (self):
        return '\n'.join (self.mod_info)

    def lang (self):
        return self.langlist
The ‘define_word’ method checks if the search term is present in the dictionary, and, if so, wraps its definition in a DicoDefineResult object:
    def define_word (self, word):
        if self.adict.has_key (word):
            return DicoDefineResult ([self.adict[word]])
        return False
The ‘match_word’ method supports the ‘exact’ strategy natively via the has_key method of adict:
    def match_word (self, strat, key):
        if strat.name == "exact":
            if self.adict.has_key (key.word.lower ()):
                return DicoMatchResult \
                    ([self.adict[key.word.lower()]])
Other strategies are supported as long as they have selectors:
        elif strat.has_selector:
            res = DicoMatchResult ([], len (self.adict))
            for k in self.adict:
                if strat.select (k, key):
                    res.append (k)
            # count is a method, so it must be called.
            if res.count () > 0:
                return res
        return False
The rest of methods rely on the result object to do the right thing:
    def output (self, rh, n):
        rh.output (n)
        return True

    def result_count (self, rh):
        return rh.count ()

    def compare_count (self, rh):
        return rh.compcount
5.7 Stratall
The stratall
module provides a new strategy, called ‘all’.
This strategy always returns a full list of headwords from the
database, no matter what the actual search word is.
To load this strategy, use the following configuration statement:
load-module stratall;
Using this strategy on a full set of databases (‘MATCH * all ""’) produces an enormous amount of output, which may put a considerable strain on the server. It is therefore advisable to block such usage, as suggested in Strategies and Default Searches:
strategy all { deny-all yes; }
5.8 Substr
The substr module provides a ‘substr’ search strategy. This strategy matches the search term as a substring anywhere in the headword. For example:
C: MATCH eng-deu substr orma
S: 152 207 matches found: list follows
S: eng-deu "abnormal"
S: eng-deu "conformable"
S: eng-deu "doorman"
S: eng-deu "format"
…
The loading procedure expects no arguments:
load-module substr;
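The matching rule of the ‘substr’ strategy can be pictured with a short Python sketch. It is not the module's code: the function name ‘substr_match’ is hypothetical, and the case-insensitive comparison is an assumption.

```python
def substr_match(key, headwords):
    """Return the headwords that contain key anywhere.

    Case-insensitive comparison is an assumption of this sketch.
    """
    needle = key.lower()
    return [hw for hw in headwords if needle in hw.lower()]
```

For instance, calling ‘substr_match ("orma", ...)’ on a list holding ‘abnormal’, ‘door’ and ‘format’ keeps ‘abnormal’ and ‘format’ and drops ‘door’.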
5.9 Word
The word
module provides the following strategies:
- word
Match separate words within headwords.
- first
Match the first word within headwords.
- last
Match the last word within headwords.
The initialization procedure loads all three if given no arguments, as in
load-module word;
If arguments are given, the initialization procedure loads only those strategies that are listed in its command line. For example, the statement below loads only ‘first’ and ‘last’ strategies:
load-module word { command "word first last"; }
The following is an example of using one of those strategies in a dico session:
C: MATCH devdict word government
S: 152 1 matches found: list follows
S: devdict "MONARCHICAL GOVERNMENT"
S: .
S: 250 Command complete
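The difference between the three strategies can be illustrated with the following sketch. It is not the module's implementation. It assumes that words are separated by white space, that comparison ignores case, and that a word must match in full. The function names are hypothetical.

```python
def _words(headword):
    # Assumption: words are separated by white space.
    return headword.lower().split()


def word_match(key, headword):
    """'word': key equals any word of the headword."""
    return key.lower() in _words(headword)


def first_match(key, headword):
    """'first': key equals the first word of the headword."""
    words = _words(headword)
    return bool(words) and words[0] == key.lower()


def last_match(key, headword):
    """'last': key equals the last word of the headword."""
    words = _words(headword)
    return bool(words) and words[-1] == key.lower()
```

Under these assumptions the headword ‘MONARCHICAL GOVERNMENT’ matches the key ‘government’ with ‘word’ and ‘last’, but not with ‘first’.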
5.10 Nprefix
The nprefix module provides a strategy similar to ‘prefix’, but which returns only the specified range of matches. For example, the statement
MATCH dict nprefix skip#count#string
where skip and count are positive integer numbers, returns at most count headwords whose prefix matches string, omitting the first skip unique matches.
The entire ‘skip#count#’ construct is optional. If not supplied, the ‘nprefix’ strategy behaves exactly as ‘prefix’.
The module is loaded using this simple statement:
load-module nprefix;
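The way the ‘skip#count#string’ argument selects a window of matches can be sketched in Python as shown below. This only illustrates the description above. The function names are hypothetical, and the case-insensitive comparison and the fallback to plain prefix matching on a malformed argument are assumptions.

```python
def parse_nprefix(arg):
    """Split 'skip#count#string' into (skip, count, string).

    Without a well-formed 'skip#count#' part, return (0, None, arg),
    which stands for plain prefix matching.
    """
    parts = arg.split("#", 2)
    if len(parts) == 3 and parts[0].isdecimal() and parts[1].isdecimal():
        return int(parts[0]), int(parts[1]), parts[2]
    return 0, None, arg


def nprefix_match(arg, headwords):
    """Return the requested window of unique prefix matches."""
    skip, count, prefix = parse_nprefix(arg)
    unique = []
    for hw in headwords:
        if hw.lower().startswith(prefix.lower()) and hw not in unique:
            unique.append(hw)
    end = None if count is None else skip + count
    return unique[skip:end]
```

With the headwords ‘a’, ‘ab’, ‘abc’, ‘abd’, the argument ‘1#2#a’ skips ‘a’ and yields ‘ab’ and ‘abc’.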
5.11 metaphone2
The metaphone2 module provides a strategy based on the Double Metaphone phonetic encoding algorithm, published by Lawrence Philips.
The module is normally loaded as follows:
load-module metaphone2;
The only available initialization parameter is
- metaphone2 parameter: size number
Defines the size of computed Double Metaphone codes, in characters. The default is 4. For example:
load-module metaphone2 { command "metaphone2 size=16"; }
5.12 Pcre
The pcre
module provides a matching strategy using
Perl-compatible regular expressions. The module is loaded
using a simple statement:
load-module pcre;
The strategy has the same name as the module and is reflected in the server’s HELP output as shown below:
pcre "Match using Perl-compatible regular expressions"
The headword argument to the pcre
MATCH statement should be
a valid Perl regular expression. It can optionally be enclosed in
a pair of slashes, in which case one or more of the following flags
can appear after the closing slash:
- a
The regexp is anchored, that is, it is constrained to match only at the first matching point in the string that is being searched.
- e
Ignore whitespace and ‘#’ comments in the expression.
- i
Ignore case when matching.
- G
Inverts the greediness of the quantifiers so that they are not greedy by default, but become greedy if followed by ‘?’. The same can also be achieved by setting the ‘(?U)’ option within the pattern.
Any of these flags can also be given in the opposite case, which inverts its meaning. For example, ‘I’ means case-sensitive matching.
Here is an example of using this strategy in a dico session:
MATCH ! pcre "/\\stext/i"
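To make the flag syntax concrete, the sketch below splits an argument of the form ‘/regex/flags’ into the expression and a table of options, applying the case-inversion rule described above. It does not reproduce the module's parser. The function name, the option names and the rejection of unknown letters are assumptions.

```python
def parse_pcre_argument(arg):
    """Split '/regex/flags' into (regex, options)."""
    options = {"anchored": False, "extended": False,
               "ignore_case": False, "ungreedy": False}
    # Letter that switches each option on; the opposite case switches
    # it off again.
    letters = {"a": "anchored", "e": "extended",
               "i": "ignore_case", "G": "ungreedy"}
    last = arg.rfind("/")
    if not arg.startswith("/") or last == 0:
        # Not enclosed in slashes: the whole argument is the regexp.
        return arg, options
    regex, flags = arg[1:last], arg[last + 1:]
    for ch in flags:
        if ch in letters:
            options[letters[ch]] = True
        elif ch.swapcase() in letters:
            options[letters[ch.swapcase()]] = False
        else:
            # Assumption: unknown letters are rejected.
            raise ValueError("unknown flag: %r" % ch)
    return regex, options
```

Applied to ‘/\stext/i’, the sketch returns the expression ‘\stext’ with only the case-insensitivity option set.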
5.13 Ldap
The ldap module adds support for LDAP user databases. It is available only if Dico has been configured with LDAP support.
The module needs no additional configuration parameters:
load-module ldap;
See ldap userdb, for a description of its use.
5.14 pam
The pam
module implements user authentication via PAM.
It can be used only with ‘LOGIN’ and ‘PLAIN’ GSASL
authentication methods.
The module is loaded as follows:
load-module pam { command "pam [service=sname]"; }
where sname is the name of the PAM service to use. If not supplied, the ‘dicod’ service is used.
The user database is normally initialized as:
user-db "pam://localhost";
If a password-resource statement is given, its value is used as the service name instead of the one specified in the load-module statement, e.g.:
user-db "pam://localhost" { password-resource "local"; }
The group-resource
statement is not used, because there is no
mechanism to return textual data from PAM.
Footnotes
(3)
See http://wordnet.princeton.edu/wordnet/, for detailed information, including links to download.
This document was generated on September 4, 2020 using makeinfo.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.