GNU Dico Manual (split by chapter):   Section:   Chapter:FastBack: Modules   Up: Top   FastForward: dico client   Contents: Table of ContentsIndex: Concept Index

6 Dico Module Interface

This chapter describes the API for Dico loadable modules.

6.1 dico_database_module

Each module must export exactly one symbol of type struct dico_database_module. This symbol must be declared as

DICO_EXPORT(name, module)

where name is the name of the module file without its suffix. For example, the source of a module word.so would contain the following declaration:

struct dico_database_module DICO_EXPORT(word, module) = {
…
};

The dico_database_module has the following members:

dico_database_module: unsigned dico_version

The interface version being used. It is recommended to set it using the macro DICO_MODULE_VERSION, which expands to the version number of the current interface.

dico_database_module: unsigned dico_capabilities

Module capabilities. As of version 2.7, this member can be one of the following:

DICO_CAPA_DEFAULT

This module defines a handler for a specific database format.

DICO_CAPA_NODB

This module does not handle any databases. When this capability is specified, dicod will call only the dico_init member of the structure.

This capability is used by modules defining new matching strategies or authentication methods.

Dico Callback: int dico_init (int argc, char **argv)

This callback is called right after loading the module. It is responsible for module initialization. The arguments are:

argc

Number of elements in argv.

argv

The command line given by the command configuration statement (see command), split into words. The element argv[0] is the name of the module, and argv[argc] is ‘NULL’. Word splitting follows rules similar to those of the shell. In particular, a string quoted with either single or double quotes is treated as a single word.

If dico_capabilities is DICO_CAPA_DEFAULT, this method is optional. If dico_capabilities is set to DICO_CAPA_NODB, dico_init is mandatory and must be the only method defined.
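As an illustration, here is a minimal sketch of a dico_init callback for a hypothetical module that accepts the words debug and timeout=N on its command line. The option names, the example_init function and its globals are invented for this sketch, and it assumes that 0 means success and non-zero means failure, as for the other callbacks. Only the argc/argv layout (argv[0] is the module name, argv[argc] is NULL) comes from the description above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical module-wide settings, filled in by example_init(). */
static int example_debug;
static long example_timeout = 30;

/* Sketch of a dico_init callback: argv[0] is the module name,
   so options start at argv[1]. */
int
example_init(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "debug") == 0)
            example_debug = 1;
        else if (strncmp(argv[i], "timeout=", 8) == 0) {
            char *end;
            long val = strtol(argv[i] + 8, &end, 10);

            if (*end != '\0' || val <= 0) {
                fprintf(stderr, "%s: invalid timeout: %s\n",
                        argv[0], argv[i]);
                return 1;
            }
            example_timeout = val;
        } else {
            fprintf(stderr, "%s: unknown option: %s\n", argv[0], argv[i]);
            return 1;   /* assumed: non-zero signals failure */
        }
    }
    return 0;
}
```

Rejecting unknown words, rather than ignoring them, makes configuration typos visible at startup.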

Dico Callback: dico_handle_t dico_init_db (const char *db, int argc, char **argv)

Initialize the database. This method is called as part of the database initialization routine at dicod startup, after the dictionary configuration statement has been processed (see Databases). Its arguments are:

db

The name of the database, as given by the name statement.

argc

Number of elements in argv.

argv

The command line given by the handler configuration statement (see handler). The array is ‘NULL’-terminated.

This method returns a database handle, an opaque structure identifying the database. This handle will be passed as the first argument to other methods. On error, dico_init_db shall return NULL.

Note that this function is not required to actually open the database, if the notion of opening is supported by the underlying mechanism. The dico_open method is responsible for that.

Dico Callback: int dico_free_db (dico_handle_t dh)

Reclaim any resources associated with the database handle dh. This method is called as part of the exit cleanup routine, before the main dicod process terminates.

It shall return ‘0’ on success, or any non-‘0’ value on failure.
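The following sketch pairs dico_init_db with dico_free_db. It assumes a hypothetical module whose handler line carries a file name in argv[1], and it uses a local struct example_db in place of the opaque dico_handle_t from dico.h. In line with the text above, it only records the configuration; opening the file is left to dico_open.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-database state; stands in for the structure
   behind dico_handle_t (assumption). */
struct example_db {
    char *name;   /* database name from the name statement */
    char *file;   /* hypothetical: path taken from the handler line */
    FILE *fp;     /* opened later, by the dico_open callback */
};
typedef struct example_db *example_handle_t;

static char *
xstrdup(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p)
        strcpy(p, s);
    return p;
}

/* Sketch of dico_init_db: record the configuration, open nothing.
   Returns NULL on error. */
example_handle_t
example_init_db(const char *db, int argc, char **argv)
{
    struct example_db *dh;

    if (argc < 2) {
        fprintf(stderr, "%s: database file name is required\n", db);
        return NULL;
    }
    dh = calloc(1, sizeof(*dh));
    if (!dh)
        return NULL;
    dh->name = xstrdup(db);
    dh->file = xstrdup(argv[1]);
    if (!dh->name || !dh->file) {
        free(dh->name);
        free(dh->file);
        free(dh);
        return NULL;
    }
    return dh;
}

/* Sketch of dico_free_db: release everything example_init_db
   allocated.  Returns 0 on success. */
int
example_free_db(example_handle_t dh)
{
    if (!dh)
        return 1;
    free(dh->name);
    free(dh->file);
    free(dh);
    return 0;
}
```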

Dico Callback: int dico_open (dico_handle_t dh)

Open the database identified by the handle dh. This method is called as part of the child process initialization routine.

It shall return ‘0’ on success, or any non-‘0’ value on failure.

The dico_open method is optional.

Dico Callback: int dico_close (dico_handle_t dh)

Close the database identified by the handle dh. This method is called as part of the child process termination routine.

It shall return ‘0’ on success, or any non-‘0’ value on failure.

The dico_close method is optional, but if dico_open is defined, dico_close must be defined as well.
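A matching sketch of dico_open and dico_close, under the same kind of assumptions: a hypothetical struct example_db stands in for the handle, and the file name was recorded earlier by dico_init_db. Since dico_open runs during child process initialization, each child opens its own FILE pointer.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-database state; stands in for the structure
   behind dico_handle_t. */
struct example_db {
    const char *file;   /* path recorded earlier by dico_init_db */
    FILE *fp;           /* non-NULL while the database is open */
};

/* Sketch of dico_open: 0 on success, non-zero on failure. */
int
example_open(struct example_db *dh)
{
    dh->fp = fopen(dh->file, "r");
    if (!dh->fp) {
        fprintf(stderr, "cannot open %s: %s\n", dh->file, strerror(errno));
        return 1;
    }
    return 0;
}

/* Sketch of dico_close: undoes example_open; safe if nothing is open. */
int
example_close(struct example_db *dh)
{
    int rc = 0;

    if (dh->fp) {
        rc = fclose(dh->fp) != 0;
        dh->fp = NULL;
    }
    return rc;
}
```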

Dico Callback: char * dico_db_info (dico_handle_t dh)

Return a database information string for the database identified by dh. This function is called on each SHOW INFO command, unless an informational text for this database is supplied in the configuration file (see info). This value must be allocated using malloc(3). The caller is responsible for freeing it when no longer needed.

This method is optional.

Dico Callback: char * dico_db_descr (dico_handle_t dh)

Return a short database description string for the database identified by dh. This function is called on each SHOW DB command, unless a description for this database is supplied in the configuration file (see descr). This value must be allocated using malloc(3). The caller is responsible for freeing it when no longer needed.

This method is optional.
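Both dico_db_info and dico_db_descr must return storage obtained from malloc(3), because the caller frees it. A minimal sketch of dico_db_descr, assuming a hypothetical struct example_db that knows its name and entry count, sizes the buffer with snprintf before allocating it. It returns NULL on allocation failure; how the server treats a NULL return is not specified in this section.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical database state: only what the description needs. */
struct example_db {
    const char *name;
    size_t nentries;
};

/* Sketch of dico_db_descr: the string comes from malloc(3), so the
   server can free it after use. */
char *
example_db_descr(struct example_db *dh)
{
    const char *fmt = "%s (%zu entries)";
    int len = snprintf(NULL, 0, fmt, dh->name, dh->nentries);
    char *buf;

    if (len < 0)
        return NULL;
    buf = malloc((size_t) len + 1);
    if (buf)
        snprintf(buf, (size_t) len + 1, fmt, dh->name, dh->nentries);
    return buf;
}
```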

Dico Callback: dico_result_t dico_match (dico_handle_t dh, const dico_strategy_t strat, const char *word)

Use the strategy strat to search in the database dh, and return all headwords matching word.

This method returns a result handle, an opaque pointer that can then be used to display the obtained results. It returns NULL if no matches were found.

Dico Callback: dico_result_t dico_define (dico_handle_t dh, const char *word)

Find definitions of headword word in the database identified by dh.

This method returns a result handle, an opaque pointer that can then be used to display the obtained results. It returns NULL if no definitions were found.

Dico Callback: int dico_output_result (dico_result_t rp, size_t n, dico_stream_t str)

The dico_output_result method outputs to stream str the nth result from result set rp. The latter is a result handle, obtained from a previous call to dico_match or dico_define.

Returns ‘0’ on success, or any non-‘0’ value on failure.

It is guaranteed that the dico_output_result callback is called as many times as there are elements in rp (as determined by the dico_result_count callback, described below) and that for each subsequent call the value of n equals its value from the previous call incremented by one.

At the first call n equals 0.

Dico Callback: size_t dico_result_count (dico_result_t rp)

Return the number of distinct elements in the result set identified by rp. The latter is a result handle, obtained from a previous call to dico_match or dico_define.

Dico Callback: size_t dico_compare_count (dico_result_t rp)

Return the number of comparisons performed when constructing the result set identified by rp.

This method is optional.

Dico Callback: void dico_free_result (dico_result_t rp)

Free any resources used by the result set rp, which is a result handle, obtained from a previous call to dico_match or dico_define.
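The result-set callbacks work together: the module allocates its own result structure in dico_match or dico_define, and the server then calls dico_result_count, dico_output_result for each n from 0 up to count - 1, and finally dico_free_result. The sketch below models that lifecycle with a hypothetical struct example_result and, to stay self-contained, writes to a stdio FILE instead of a dico_stream_t. The example_show function plays the role of the server loop.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical module result structure; the server only sees it
   as the opaque dico_result_t. */
struct example_result {
    size_t count;        /* number of parts (matches or definitions) */
    const char **items;  /* one entry per part */
};

/* Sketch of dico_result_count. */
size_t
example_result_count(struct example_result *rp)
{
    return rp->count;
}

/* Sketch of dico_output_result, writing to a FILE (assumption). */
int
example_output_result(struct example_result *rp, size_t n, FILE *str)
{
    if (n >= rp->count)
        return 1;
    return fputs(rp->items[n], str) == EOF;
}

/* Sketch of dico_free_result. */
void
example_free_result(struct example_result *rp)
{
    free(rp->items);
    free(rp);
}

/* Mimics the server side: one call per part, n = 0, 1, 2, ... */
int
example_show(struct example_result *rp, FILE *str)
{
    size_t n, count = example_result_count(rp);

    for (n = 0; n < count; n++)
        if (example_output_result(rp, n, str))
            return 1;
    return 0;
}
```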

Dico Callback: int dico_result_headers (dico_result_t rp, dico_assoc_list_t hdr)

Populate the associative list hdr with the headers describing the result set rp. This callback is optional. If defined, it is called before the result set rp is output, provided that OPTION MIME is in effect (see OPTION MIME).

Dico Callback: int dico_run_test (int argc, char **argv)

Run unit tests for the module. The argument vector contains all command line arguments that follow the --runtest option, up to the ‘--’ marker or the end of the command line, whichever comes first.

6.2 Strategies

A search strategy is described by the following structure:

struct dico_strategy {
    char *name;          /* Strategy name */
    char *descr;         /* Strategy description */
    dico_select_t sel;   /* Selector function */
    void *closure;       /* Additional data for SEL */ 
    int is_default;      /* True, if this is a default strategy */
    dico_list_t stratcl; /* Strategy access control list */  
};

The first two members are mandatory and must be defined for each strategy:

member of struct dico_strategy: char * name

Short name of the strategy. It is used as the second argument to the MATCH command (see MATCH) and is displayed in the first column of the SHOW STRAT output (see SHOW STRAT).

member of struct dico_strategy: char * descr

Strategy description. It is the string shown in the second column of SHOW STRAT output (see SHOW STRAT).

member of struct dico_strategy: dico_select_t sel

A selector function, which is used in iterative matches to select matching headwords. The sel function is called for each headword in the database with an operation code, the search key and the headword as its arguments, and returns 1 if the headword matches the key and 0 otherwise. The dico_select_t type is defined as:

typedef int (*dico_select_t) (int, dico_key_t,
                              const char *);

See Selector, for a detailed description.

member of struct dico_strategy: void * closure

An opaque data pointer intended for use by the selector function.

member of struct dico_strategy: int is_default

This member is set to 1 by the server if this strategy is selected as the default one (see default strategy).

member of struct dico_strategy: dico_list_t stratcl

An access control list associated with this strategy. See Strategies and Default Searches.

6.2.1 Search Key Structure

The dico_key_t is defined as a pointer to the structure dico_key:

struct dico_key {
    char *word;
    void *call_data;
    dico_strategy_t strat;
    int flags;
};

The structure represents a search key for matching algorithms. Its members are:

member of struct dico_key: char * word

The search word or expression.

member of struct dico_key: void * call_data

A pointer to selector-specific data. If necessary, it can be initialized by the selector when called with the ‘DICO_SELECT_BEGIN’ opcode and deallocated when called with the ‘DICO_SELECT_END’ opcode.

member of struct dico_key: dico_strategy_t strat

A pointer to the strategy structure.

member of struct dico_key: int flags

Key-specific flags. These are used by the server.

The following functions are defined to operate on search keys:

function: int dico_key_init (struct dico_key *key, dico_strategy_t strat, const char *word)

Initialize the key structure key with the given strategy strat and search word word. If strat has a selector function, it will be called with the ‘DICO_SELECT_BEGIN’ opcode (see DICO_SELECT_BEGIN) to carry out the necessary initializations.

The key itself may point to any kind of memory storage.

function: void dico_key_deinit (struct dico_key *key)

Deinitialize the dico_key structure initialized by a prior call to dico_key_init. If the key strategy has a selector, it will be called with the ‘DICO_SELECT_END’ opcode.

Note that this function makes no assumptions about the storage type of key. If key points to dynamically allocated memory, it is the caller’s responsibility to free it.

function: int dico_key_match (struct dico_key *key, const char *word)

Match headword and key. Return 1 if they match, 0 if they don’t match and -1 in case of error. This function calls the strategy selector with the ‘DICO_SELECT_RUN’ opcode (see DICO_SELECT_RUN). It is an error if the strategy selector is not defined.
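To make the relationship between the key functions and the selector concrete, the sketch below re-creates their documented behaviour with local stand-in types (ex_key, ex_strategy, and EX_SELECT_* opcodes with arbitrary values). It illustrates the calling protocol described above and is not the libdico source. In particular, it assumes that dico_key_init returns the result of the BEGIN call, and it does not copy the search word.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for dico.h declarations; opcode values are arbitrary. */
enum { EX_SELECT_BEGIN, EX_SELECT_RUN, EX_SELECT_END };

struct ex_key;
typedef struct ex_key *ex_key_t;
typedef int (*ex_select_t)(int, ex_key_t, const char *);

struct ex_strategy {
    const char *name;
    ex_select_t sel;            /* may be NULL */
};

struct ex_key {
    char *word;
    void *call_data;
    struct ex_strategy *strat;
    int flags;
};

/* Like dico_key_init: fill in the key, then run the BEGIN phase. */
int
ex_key_init(struct ex_key *key, struct ex_strategy *strat, const char *word)
{
    key->word = (char *) word;  /* sketch: the word is not copied */
    key->call_data = NULL;
    key->strat = strat;
    key->flags = 0;
    return strat->sel ? strat->sel(EX_SELECT_BEGIN, key, NULL) : 0;
}

/* Like dico_key_match: 1 = match, 0 = no match, -1 = error. */
int
ex_key_match(struct ex_key *key, const char *word)
{
    if (!key->strat->sel)
        return -1;              /* no selector defined */
    return key->strat->sel(EX_SELECT_RUN, key, word);
}

/* Like dico_key_deinit: run the END phase; freeing the key storage
   itself is left to the caller. */
void
ex_key_deinit(struct ex_key *key)
{
    if (key->strat->sel)
        key->strat->sel(EX_SELECT_END, key, NULL);
}

/* Demo selector: exact match, counting BEGIN and END calls. */
static int ex_begin_calls, ex_end_calls;

static int
ex_exact_sel(int op, ex_key_t key, const char *headword)
{
    switch (op) {
    case EX_SELECT_BEGIN:
        ex_begin_calls++;
        return 0;
    case EX_SELECT_RUN:
        return strcmp(key->word, headword) == 0;
    case EX_SELECT_END:
        ex_end_calls++;
        return 0;
    }
    return -1;
}
```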

6.2.2 Strategy Selectors

Wherever possible, modules should implement strategies using efficient lookup algorithms. For example, the ‘exact’ and ‘prefix’ strategies should normally be implemented using binary search in the database index. The ‘suffix’ strategy can also be implemented using binary search if a special reverse index is built for the database (this is the approach taken by the outline and dictorg modules).
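Binary search works for the ‘prefix’ strategy because, in a sorted index, all headwords sharing a prefix form one contiguous run that begins at the first entry not less than the prefix. A minimal sketch, assuming an in-memory array of headwords sorted with strcmp (real modules use their own index formats and may fold case, which this ignores):

```c
#include <stddef.h>
#include <string.h>

/* Index of the first entry in sorted idx[0..n-1] that is not less
   than key (lower bound). */
static size_t
ex_lower_bound(const char **idx, size_t n, const char *key)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(idx[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Prefix lookup: stores the start of the matching run in *first and
   returns the number of headwords beginning with prefix. */
size_t
example_prefix_range(const char **idx, size_t n, const char *prefix,
                     size_t *first)
{
    size_t len = strlen(prefix);
    size_t i = ex_lower_bound(idx, n, prefix);

    *first = i;
    while (i < n && strncmp(idx[i], prefix, len) == 0)
        i++;
    return i - *first;
}
```

The search costs O(log n) comparisons to find the run, plus one comparison per match.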

However, some strategies can only be implemented using a relatively expensive iteration over all keys in the database index. For example, ‘soundex’ and ‘levenshtein’ strategies cannot be implemented otherwise.

A strategy that can be used in iterative lookups must define a selector. A strategy selector is a function which is called for each database headword to determine whether it matches the search key.

It is defined as follows:

selector: int select (int opcode, dico_key_t key, const char *headword)

A strategy selector. Its arguments are:

opcode

The operation code. Its possible values are ‘DICO_SELECT_BEGIN’, ‘DICO_SELECT_RUN’ and ‘DICO_SELECT_END’, as described below.

key

The search key.

headword

The database headword.

The selector function is called with ‘DICO_SELECT_BEGIN’ as its first argument before the iteration loop is entered. If necessary, it can perform any additional initialization of the strategy, such as allocation of auxiliary data structures. The call_data member of the key structure (see call_data) should be used to keep the pointer to the auxiliary data. The function should return 0 if it successfully finished its initialization and non-zero otherwise.

Once the iteration loop is finished, the selector is called with ‘DICO_SELECT_END’ as its first argument. This invocation is intended to deallocate any auxiliary memory and release any additional resources allocated at the initialization stage.

In these two additional invocations, the headword parameter will be ‘NULL’.

Once the iteration loop is entered, the selector function will be called for each headword. Its opcode parameter will be ‘DICO_SELECT_RUN’ and the headword parameter will point to the headword. The function should return 1 if the headword matches the key, 0 if it does not and a negative value in case of failure.
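The three-phase protocol can be sketched as a brute-force loop over an index. The selector below implements a simple ‘suffix’ match and uses call_data to cache the key length between the BEGIN and END calls. The types are local stand-ins for those in dico.h and the opcode values are arbitrary. As noted above, a real module would normally serve ‘suffix’ from a reverse index rather than by iterating.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in types (assumptions; the real ones come from dico.h). */
enum { EX_SELECT_BEGIN, EX_SELECT_RUN, EX_SELECT_END };

struct ex_key {
    char *word;
    void *call_data;
};
typedef struct ex_key *ex_key_t;
typedef int (*ex_select_t)(int, ex_key_t, const char *);

/* Suffix selector: caches strlen(key->word) in call_data at BEGIN. */
static int
ex_suffix_sel(int op, ex_key_t key, const char *hw)
{
    size_t *klen = key->call_data;
    size_t hlen;

    switch (op) {
    case EX_SELECT_BEGIN:
        klen = malloc(sizeof *klen);
        if (!klen)
            return 1;               /* initialization failed */
        *klen = strlen(key->word);
        key->call_data = klen;
        return 0;
    case EX_SELECT_RUN:
        hlen = strlen(hw);
        return hlen >= *klen && strcmp(hw + hlen - *klen, key->word) == 0;
    case EX_SELECT_END:
        free(key->call_data);
        key->call_data = NULL;
        return 0;
    }
    return -1;
}

/* Iterative lookup: BEGIN once, RUN per headword, END once.
   Returns the number of matches, or -1 on error. */
int
ex_iterate(const char **idx, size_t n, ex_select_t sel, const char *word)
{
    struct ex_key key = { (char *) word, NULL };
    int count = 0, rc = 0;
    size_t i;

    if (sel(EX_SELECT_BEGIN, &key, NULL))
        return -1;
    for (i = 0; i < n; i++) {
        rc = sel(EX_SELECT_RUN, &key, idx[i]);
        if (rc < 0)
            break;
        count += rc;
    }
    sel(EX_SELECT_END, &key, NULL);
    return rc < 0 ? -1 : count;
}
```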

To illustrate the concept of a strategy selector, let’s consider the implementation of the ‘soundex’ strategy in dicod. This strategy computes a four-character soundex code for both the search key and the headword and returns 1 (match) if the two codes coincide. To speed up the process, the code for the search key is computed only once, at the initialization stage, and stored in temporary memory assigned to key->call_data. This memory is reclaimed at the terminating call:

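#include <stdlib.h>
#include <string.h>
#include <dico.h>

int
soundex_sel(int cmd, dico_key_t key, const char *dict_word)
{
    char dcode[DICO_SOUNDEX_SIZE];

    switch (cmd) {
    case DICO_SELECT_BEGIN:
        /* Compute the code of the search key once and cache it. */
        key->call_data = malloc(DICO_SOUNDEX_SIZE);
        if (!key->call_data)
            return 1;           /* non-zero: initialization failed */
        dico_soundex(key->word, key->call_data);
        break;

    case DICO_SELECT_RUN:
        /* Match if the headword has the same soundex code. */
        dico_soundex(dict_word, dcode);
        return strcmp(dcode, key->call_data) == 0;

    case DICO_SELECT_END:
        /* Release the cached key code. */
        free(key->call_data);
        break;
    }
    return 0;
}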
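The selector above delegates the actual encoding to dico_soundex. For readers unfamiliar with the algorithm, here is a self-contained sketch of the classic American Soundex code. It assumes a buffer size of 5 (four code characters plus the terminating null), which is what DICO_SOUNDEX_SIZE is assumed to be here. The example_soundex function is a textbook version and is not claimed to reproduce dico_soundex exactly; for instance, the handling of words without letters is a choice made for this sketch.

```c
#include <ctype.h>
#include <string.h>

#define EXAMPLE_SOUNDEX_SIZE 5   /* assumed: 4 code characters + '\0' */

/* Soundex digit for a letter: '0' for vowels (they separate equal
   digits), '-' for H and W (they do not). */
static char
sx_digit(int c)
{
    switch (toupper(c)) {
    case 'B': case 'F': case 'P': case 'V':
        return '1';
    case 'C': case 'G': case 'J': case 'K':
    case 'Q': case 'S': case 'X': case 'Z':
        return '2';
    case 'D': case 'T':
        return '3';
    case 'L':
        return '4';
    case 'M': case 'N':
        return '5';
    case 'R':
        return '6';
    case 'H': case 'W':
        return '-';
    default:
        return '0';
    }
}

/* First letter, then up to three digits, padded with '0'.  Non-letters
   are skipped; a word with no letters yields an empty code. */
void
example_soundex(const char *word, char code[EXAMPLE_SOUNDEX_SIZE])
{
    int i = 0;
    char last;

    while (*word && !isalpha((unsigned char) *word))
        word++;
    if (!*word) {
        code[0] = '\0';
        return;
    }
    code[i++] = (char) toupper((unsigned char) *word);
    last = sx_digit((unsigned char) *word);
    for (word++; *word && i < 4; word++) {
        char d;

        if (!isalpha((unsigned char) *word))
            continue;
        d = sx_digit((unsigned char) *word);
        if (d == '-')
            continue;            /* H, W: keep the previous digit */
        if (d != '0' && d != last)
            code[i++] = d;
        last = d;
    }
    while (i < 4)
        code[i++] = '0';
    code[4] = '\0';
}
```

With this encoding, "Robert" and "Rupert" both map to R163, so a soundex search for one finds the other.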

6.3 Output

The dico_output_result method is called when the server needs to output the result of a ‘define’ or ‘match’ command. It must be defined as follows:

int output_result (dico_result_t rp, size_t n,
                   dico_stream_t str);

The rp argument points to the result in question. From the server’s point of view it is an opaque pointer. The module defines its own result structure, so the first thing the dico_output_result method normally does is cast rp to a pointer to that structure in order to access its members.

A result can conceptually contain several parts. For example, the result of a ‘DEFINE’ command can contain several definitions of the term. Similarly, the result of ‘MATCH’ contains one or more matches. The server obtains the exact number of parts in a result by calling the dico_result_count method (see dico_result_count).

When outputting a result, the server calls dico_output_result in a loop, once for each result part. It passes the ordinal (zero-based) number of the part to be output in the n parameter. It is guaranteed that n increases by one for each subsequent call of dico_output_result with the same rp parameter.

The str parameter identifies the output stream. The dico_output_result function must format the requested part from the result and output it to that stream. To do so it should use one of the following functions:

Function: int dico_stream_write (dico_stream_t str, const void *buf, size_t count)

Writes count bytes from the buffer pointed to by buf to the output stream str. Returns 0 on success, and non-zero on error.

Function: int dico_stream_writeln (dico_stream_t str, const char *buf, size_t size)

Same as dico_stream_write, but ends the output with a newline character (ASCII 10).
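A sketch of an output method for a ‘DEFINE’ result, showing the cast from the opaque result pointer and the use of the write and writeln calls. To keep it runnable without libdico, ex_stream_t is a hypothetical in-memory buffer, and ex_stream_write and ex_stream_writeln mimic the documented contracts of dico_stream_write and dico_stream_writeln. The result layout (one headword and body per part) is likewise invented.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for dico_stream_t: a growable memory buffer. */
struct ex_stream {
    char *buf;
    size_t len, cap;
};
typedef struct ex_stream *ex_stream_t;

/* Mimics dico_stream_write: 0 on success, non-zero on error. */
int
ex_stream_write(ex_stream_t str, const void *buf, size_t count)
{
    if (str->len + count + 1 > str->cap) {
        size_t ncap = 2 * (str->len + count + 1);
        char *p = realloc(str->buf, ncap);
        if (!p)
            return 1;
        str->buf = p;
        str->cap = ncap;
    }
    memcpy(str->buf + str->len, buf, count);
    str->len += count;
    str->buf[str->len] = '\0';  /* keeps the buffer a C string */
    return 0;
}

/* Mimics dico_stream_writeln: the data followed by a newline. */
int
ex_stream_writeln(ex_stream_t str, const char *buf, size_t size)
{
    if (ex_stream_write(str, buf, size))
        return 1;
    return ex_stream_write(str, "\n", 1);
}

/* Hypothetical module result: one headword/body pair per part. */
struct ex_def {
    const char *headword;
    const char *body;
};
struct ex_result {
    size_t count;
    const struct ex_def *defs;
};
typedef void *ex_result_t;      /* opaque, as seen by the server */

/* Sketch of dico_output_result for a DEFINE result. */
int
ex_output_result(ex_result_t rp, size_t n, ex_stream_t str)
{
    struct ex_result *res = (struct ex_result *) rp;  /* first step: cast */
    const struct ex_def *d;

    if (n >= res->count)
        return 1;
    d = &res->defs[n];
    if (ex_stream_writeln(str, d->headword, strlen(d->headword)))
        return 1;
    return ex_stream_writeln(str, d->body, strlen(d->body));
}
```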

6.4 Module Unit Testing

The dico_run_test member of struct dico_database_module (see dico_run_test) points to the function that serves as the entry point for unit tests of that module. If it is NULL, the module does not support unit testing. Otherwise, unit tests can be run using the following command line syntax:

$ dicod --runtest module [test_args] [-- init_args]

As usual, square brackets denote optional parts. The module argument specifies the name of the module to test. The arguments that follow the --runtest (-r) option are collected into two arrays. Arguments up to the ‘--’ marker form the vector that is passed to the module’s dico_run_test function. The ‘--’ marker is optional. If it is present, the arguments that follow it are collected into a separate argument vector starting from slot 1; slot 0 is set to point to the module name, and the resulting vector is passed to the module’s dico_init method.
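The argument handling described above can be sketched as follows. The helper example_split_runtest is hypothetical: it receives the arguments that follow the module name and builds two NULL-terminated vectors, one for dico_run_test and one for dico_init with the module name in slot 0. Whether dicod includes the module name in the dico_run_test vector is not stated here; this sketch assumes it does not.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two vectors. */
struct example_runtest_args {
    int test_argc;
    char **test_argv;   /* arguments before "--", NULL-terminated */
    int init_argc;
    char **init_argv;   /* module name, then arguments after "--" */
};

/* argv holds the arguments that follow the module name (assumption).
   Returns 0 on success, -1 on allocation failure. */
int
example_split_runtest(const char *module, int argc, char **argv,
                      struct example_runtest_args *out)
{
    int i, j, n;

    /* Everything up to "--" (or the end) goes to dico_run_test. */
    for (i = 0; i < argc && strcmp(argv[i], "--") != 0; i++)
        ;
    out->test_argv = malloc((i + 1) * sizeof(char *));
    if (!out->test_argv)
        return -1;
    memcpy(out->test_argv, argv, i * sizeof(char *));
    out->test_argv[i] = NULL;
    out->test_argc = i;

    /* Arguments after "--" go to dico_init, starting at slot 1. */
    n = i < argc ? argc - i - 1 : 0;
    out->init_argv = malloc((n + 2) * sizeof(char *));
    if (!out->init_argv) {
        free(out->test_argv);
        return -1;
    }
    out->init_argv[0] = (char *) module;   /* slot 0: module name */
    for (j = 0; j < n; j++)
        out->init_argv[j + 1] = argv[i + 1 + j];
    out->init_argv[n + 1] = NULL;
    out->init_argc = n + 1;
    return 0;
}
```

For the command shown below (--runtest metaphone2 build A B C), the test vector would be build A B C, and dico_init would receive a one-element vector holding metaphone2.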

When running unit tests, the configuration file is ignored. Diagnostic messages are printed to the standard error output.

Use the --load-dir (-L) command line option, if the module being tested cannot be found in the default load path (see load path), e.g.:

$ dicod -L ../lib --runtest metaphone2 build A B C
