Mailfromd Manual

5.27 Interfaces to Third-Party Programs

A set of functions is defined for interfacing with other filters via TCP. Currently implemented are interfaces to the SpamAssassin spamd daemon and to the ClamAV anti-virus daemon.

Both interfaces work much the same way: the function connects to the remote filter and passes the message to it. If the remote filter confirms that the message matches its requirements, the function returns true. Notice that in practice this means that such a message should be rejected or deferred.

The address of the remote filter is supplied as the second argument in the form of a standard URL:

proto://path[:port]

The proto part specifies the connection protocol. It should be ‘tcp’ for a TCP connection and ‘file’ or ‘socket’ for a connection via a UNIX socket. In the latter case the proto part can be omitted. For a TCP connection, the path part gives the remote host name or IP address, and the optional port part specifies the port number or service name to use. For example:

# Connect to ‘remote.filter.net’ on port 3314:
tcp://remote.filter.net:3314

# The same, using a symbolic service name (must be defined in
# /etc/services):
tcp://remote.filter.net:spamd

# Connect via a local UNIX socket (equivalent forms):
/var/run/filter.sock
file:///var/run/filter.sock
socket:///var/run/filter.sock
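
The URL forms above can be decomposed mechanically. The following Python sketch is not part of mailfromd: the function name parse_filter_url and the layout of the returned tuple are illustrative assumptions. It only mirrors the rules stated in the text (an optional proto part, ‘file’ and ‘socket’ as synonyms, a numeric port or a symbolic service name) and does not handle IPv6 literals.

```python
def parse_filter_url(url):
    """Split a filter URL into (proto, path, port).

    Hypothetical helper: it follows the rules described in the text,
    not mailfromd's actual parser.
    """
    if "://" in url:
        proto, rest = url.split("://", 1)
    else:
        # The proto part may be omitted for UNIX sockets.
        proto, rest = "socket", url

    if proto in ("file", "socket"):
        # Both spellings denote a UNIX socket; normalize to one.
        return ("socket", rest, None)
    if proto != "tcp":
        raise ValueError("unsupported protocol: " + proto)

    host, sep, port = rest.partition(":")
    if not host:
        raise ValueError("missing host name")
    if not sep:
        return ("tcp", host, None)
    if not port:
        raise ValueError("empty port")
    if port.isascii() and port.isdigit():
        number = int(port)
        if not 1 <= number <= 65535:
            raise ValueError("port out of range 1-65535")
        return ("tcp", host, number)
    # Anything else is taken as a symbolic service name,
    # to be resolved elsewhere (e.g. via /etc/services).
    return ("tcp", host, port)
```

With this sketch, the three UNIX socket forms shown above all yield the same tuple, and a numeric port outside 1-65535 is rejected, which corresponds to the condition the interface functions report as e_range.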

The description of the interface functions follows.

5.27.1 SpamAssassin

Built-in Function: boolean spamc (number msg, string url, number prec, number command)

Send the message msg to the SpamAssassin daemon (spamd) listening on the given url. The command argument identifies what kind of processing is needed for the message. Allowed values are:

SA_SYMBOLS

Process the message and return 1 or 0 depending on whether it is diagnosed as spam or not. Store SpamAssassin keywords in the global variable sa_keywords (see below).

SA_REPORT

Process the message and return 1 or 0 depending on whether it is diagnosed as spam or not. Store the entire SpamAssassin report in the global variable sa_keywords.

SA_LEARN_SPAM

Learn the supplied message as spam.

SA_LEARN_HAM

Learn the supplied message as ham.

SA_FORGET

Forget any prior classification of the message.

The third argument, prec, gives the precision, in decimal digits, to be used when converting SpamAssassin diagnostic data and storing them in mailfromd variables.

The floating point SpamAssassin data are converted to the integer mailfromd variables using the following relation:

var = int(sa-var * 10**prec)

where sa-var stands for the SpamAssassin value and var stands for the corresponding mailfromd one. int() means taking the integer part and ‘**’ denotes the exponentiation operator.
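
The relation can be checked with a few lines of Python. This is a sketch for illustration only: the helper names to_fixed and from_fixed are hypothetical, and from_fixed is merely the arithmetic inverse of the relation, not a description of what any mailfromd function prints. The SpamAssassin value is taken as a decimal string so that the result does not depend on binary floating-point rounding.

```python
from decimal import Decimal


def to_fixed(sa_value, prec):
    """Apply var = int(sa-var * 10**prec) to a decimal string."""
    # int() drops the fractional part, i.e. truncates toward zero.
    return int(Decimal(sa_value) * (10 ** prec))


def from_fixed(var, prec):
    """Shift the decimal point back prec digits (hypothetical inverse)."""
    sign = "-" if var < 0 else ""
    whole, frac = divmod(abs(var), 10 ** prec)
    if prec == 0:
        return f"{sign}{whole}"
    return f"{sign}{whole}.{frac:0{prec}d}"


print(to_fixed("5.3", 3))     # 5300
print(to_fixed("4.999", 2))   # 499 (digits beyond prec are dropped)
print(from_fixed(5300, 3))    # 5.300
```

A larger prec keeps more of the original digits; with prec set to 0 only the integer part of the score survives.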

The function returns additional information via the following variables:

sa_score

The spam score, converted to integer as described above. To convert it to a floating-point representation, use the sa_format_score function (see sa_format_score). See also the example below.

sa_threshold

The threshold, converted to integer form.

sa_keywords

If command is ‘SA_SYMBOLS’, this variable contains a string of comma-separated SpamAssassin keywords identifying this message, e.g.:

ADVANCE_FEE_1,AWL,BAYES_99

If command is ‘SA_REPORT’, the value of this variable is a spam report message. It is a multi-line textual message containing a detailed description of spam scores in tabular form. It consists of the following parts:

  1. A preamble.
  2. Content preview.

    The words ‘Content preview’, followed by a colon and an excerpt of the message body.

  3. Content analysis details.

    It has the following form:

    Content analysis details: (score points, max required)
    

    where score and max are spam score and threshold in floating point.

  4. Score table.

    The score table is formatted in three columns:

    pts

    The score, as a floating point number with one decimal digit.

    rule name

    SpamAssassin rule name that contributed this score.

    description

    Textual description of the rule.

    The score table can be extracted from sa_keywords using the sa_format_report_header function (see sa_format_report_header), as illustrated in the example below.

The value of this variable is undefined if command is ‘SA_LEARN_SPAM’, ‘SA_LEARN_HAM’ or ‘SA_FORGET’.

The spamc function can signal the following exceptions: e_failure if the connection fails, e_url if the supplied URL is invalid and e_range if the supplied port number is out of the range 1–65535.

An example of using this function:

prog eom
do
  if spamc(current_message(), "tcp://192.168.10.1:3333", 3,
           SA_SYMBOLS)
    reject 550 5.7.0
         "Spam detected, score %sa_score with threshold %sa_threshold"
  fi
done

Here is a more advanced example:

prog eom
do
  set prec 3
  if spamc(current_message(),
           "tcp://192.168.10.1:3333", prec, SA_REPORT)
    add "X-Spamd-Status" "SPAM"
  else
    add "X-Spamd-Status" "OK"
  fi
  add "X-Spamd-Score" sa_format_score(sa_score, prec)
  add "X-Spamd-Threshold" sa_format_score(sa_threshold, prec)
  add "X-Spamd-Keywords" sa_format_report_header(sa_keywords)
done

Library Function: boolean sa (string url, number prec; number command)

Additional interface to the spamc function, provided for backward compatibility. It is equivalent to

spamc(current_message(), url, prec, command)

If command is not supplied, ‘SA_SYMBOLS’ is used.

5.27.2 DSPAM

DSPAM is a statistical spam filter distributed under the terms of the GNU General Public License. It is available from http://dspam.sourceforge.net.

MFL provides an interface to DSPAM functionality if the libdspam library is installed and mailfromd is linked with it. In that case, the m4 macro ‘WITH_DSPAM’ is defined.

The DSPAM functions and definitions become available after requiring the ‘dspam’ module:

require 'dspam'

Built-in Function: number dspam (number msg, number mode_flags; number class_source)

Analyze a message using DSPAM. The message is identified by its descriptor, passed in the msg argument.

The mode_flags argument controls the function behavior. Its value is a bitwise OR of operation mode, flag, tokenizer and training mode. Operation mode defines what dspam is supposed to do with the message. Its value is either ‘DSM_PROCESS’ if full processing of the message is intended (the default), or ‘DSM_CLASSIFY’, if the message must only be classified.

Optional flag bits turn on additional functionality. The ‘DSF_SIGNATURE’ bit instructs dspam to create a signature for the message – a unique string which can subsequently be used to identify that particular message. Upon return from the function, the signature is stored in the dspam_signature variable.

The ‘DSF_NOISE’ bit enables Bayesian noise reduction, and ‘DSF_WHITELIST’ enables automatic whitelisting.

Additional flags are available for selecting the algorithm used to split the message into tokens (the tokenizer) and the training mode. See flags-dspam, for a complete list of these. All of these are optional; any missing values are read from the DSPAM configuration file.

The configuration file must always be present. Its full file name must be stored in the global variable dspam_config. There is no default value, so make sure this variable is initialized. If a specific profile section should be read, store the name of that profile in the variable dspam_profile.

When called to process or classify the message, dspam returns an integer code of the class of the message. The value ‘DSR_ISSPAM’ means that this message was classified as spam. The value ‘DSR_ISINNOCENT’ means it is a clean (“ham”) message.

The probability and confidence values are returned in the global variables dspam_probability and dspam_confidence. Since MFL lacks a floating-point data type, both variables hold integers, obtained from the corresponding floating-point values by shifting the decimal point dspam_prec digits to the right and rounding the resulting value to the nearest integer. The same method is used in the spamc function (see sa-floating-point-conversion). The default value of the dspam_prec variable is 3. You can use the sa_format_score function to convert these values to strings representing floating-point numbers, e.g.:

require 'dspam'
require 'sa'

prog eom
do
  if dspam(current_message(), DSM_PROCESS | DSF_SIGNATURE)
       == DSR_ISSPAM
    header_add("X-DSPAM-Result", "Spam")
  else
    header_add("X-DSPAM-Result", "Innocent")
  fi
  header_add("X-DSPAM-Probability",
             sa_format_score(dspam_probability, dspam_prec))
  header_add("X-DSPAM-Confidence",
             sa_format_score(dspam_confidence, dspam_prec))
  header_add("X-DSPAM-Signature", dspam_signature)
done

The optional class_source argument is used when training the DSPAM classifier. It is a bitwise OR of the message class and message source values. Message class specifies the class this message belongs to. Possible values are ‘DSR_ISSPAM’, for spam messages, and ‘DSR_ISINNOCENT’, for clean messages. Message source informs DSPAM where this message comes from. The value ‘DSS_ERROR’ means the message was previously misclassified by DSPAM. The value ‘DSS_CORPUS’ indicates the message comes from a corpus feed. Finally, the value ‘DSS_INOCULATION’ means that the message is in pristine form and should be trained as an inoculation. Inoculation is a more intense mode of training, usually used on honeypots.

The following example calls dspam to train the classifier on the current message if it was sent to a honeypot address, and uses dspam to analyze the message class otherwise. The honeypot variable is supposed to be set elsewhere in the code (e.g. in the ‘envrcpt’ handler):

prog eom
do
  number res
  if honeypot
    set res dspam(current_message(), DSM_PROCESS,
                  DSR_ISSPAM | DSS_INOCULATION)
    discard
  else
    if dspam(current_message(), DSM_PROCESS | DSF_SIGNATURE)
             == DSR_ISSPAM
      header_add("X-DSPAM-Result", "Spam")
    else
      header_add("X-DSPAM-Result", "Innocent")
    fi
    header_add("X-DSPAM-Probability",
               sa_format_score(dspam_probability, dspam_prec))
    header_add("X-DSPAM-Confidence",
               sa_format_score(dspam_confidence, dspam_prec))
    header_add("X-DSPAM-Signature", dspam_signature)
  fi
done

5.27.2.1 DSPAM Operation Modes and Flags

The tables below summarize flags which can be used in the mode_flags argument to dspam function. The argument is a bitwise OR of operation mode, flags, tokenizer and training mode bits. Only one operation mode bit can be used. Flags, tokenizer and training mode are optional. Any number of flags, but no more than one tokenizer type and one training mode bit are allowed. Missing values will be supplied from the configuration file.

Mode          Meaning
DSM_PROCESS   Process message
DSM_CLASSIFY  Classify message only (do not write changes)

Table 5.2: DSPAM Operation modes

Flag           Meaning
DSF_SIGNATURE  Create a signature
DSF_NOISE      Use Bayesian Noise Reduction
DSF_WHITELIST  Use Automatic Whitelisting

Table 5.3: DSPAM flags

Constant   Meaning
DSZ_WORD   Use WORD tokenizer
DSZ_CHAIN  Use CHAIN tokenizer
DSZ_SBPH   Use SBPH tokenizer
DSZ_OSB    Use OSB tokenizer

Table 5.4: DSPAM Tokenizer bits

Mode      Meaning
DST_TEFT  Train Everything
DST_TOE   Train-on-Error
DST_TUM   Train-until-Mature

Table 5.5: DSPAM Training Modes
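
The combination rules for mode_flags (one operation mode, any number of flags, at most one tokenizer and at most one training mode) can be expressed as a simple bit-group check. The Python sketch below is an illustration only. The numeric bit values are invented for the example and are NOT the values used by mailfromd or libdspam; the names count_bits and check_mode_flags are likewise hypothetical. A value with no operation mode bit at all is not flagged here, since the text describes ‘DSM_PROCESS’ as the default.

```python
# Hypothetical bit values, chosen only so that the groups do not overlap.
# The real constants are defined by mailfromd's 'dspam' module.
MODES = {"DSM_PROCESS": 0x01, "DSM_CLASSIFY": 0x02}
FLAGS = {"DSF_SIGNATURE": 0x10, "DSF_NOISE": 0x20, "DSF_WHITELIST": 0x40}
TOKENIZERS = {"DSZ_WORD": 0x100, "DSZ_CHAIN": 0x200,
              "DSZ_SBPH": 0x400, "DSZ_OSB": 0x800}
TRAINING = {"DST_TEFT": 0x1000, "DST_TOE": 0x2000, "DST_TUM": 0x4000}


def count_bits(value, group):
    """Count how many constants of the given group are set in value."""
    return sum(1 for bit in group.values() if value & bit)


def check_mode_flags(value):
    """Return a list of rule violations (empty if value is acceptable)."""
    problems = []
    if count_bits(value, MODES) > 1:
        problems.append("more than one operation mode")
    if count_bits(value, TOKENIZERS) > 1:
        problems.append("more than one tokenizer")
    if count_bits(value, TRAINING) > 1:
        problems.append("more than one training mode")
    # Any number of FLAGS bits is allowed, so they are not checked.
    return problems


ok = MODES["DSM_PROCESS"] | FLAGS["DSF_SIGNATURE"] | FLAGS["DSF_NOISE"]
bad = MODES["DSM_CLASSIFY"] | TOKENIZERS["DSZ_WORD"] | TOKENIZERS["DSZ_OSB"]
print(check_mode_flags(ok))   # []
print(check_mode_flags(bad))  # ['more than one tokenizer']
```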

5.27.2.2 DSPAM Class and Source Bits

The tables below summarize flags which can be used in the class_source argument to dspam function. The argument is a bitwise OR of classification and source bits. At most one classification and one source bit can be given. If not supplied, ‘DSR_NONE|DSS_NONE’ is used.

The classification flags are also used as the return code, as shown in the following table.

Mode            As return value      As argument
DSR_NONE        N/A                  Classify message
DSR_ISSPAM      Message is spam      Learn as spam
DSR_ISINNOCENT  Message is innocent  Learn as innocent

Table 5.6: DSPAM Classification

Source           Meaning
DSS_NONE         No classification source (use only with DSR_NONE)
DSS_ERROR        Misclassification by libdspam
DSS_CORPUS       Message came from a corpus feed
DSS_INOCULATION  Message inoculation

Table 5.7: DSPAM Source

5.27.2.3 DSPAM Global Variables

The following global variables affect the behavior of the dspam function:

Built-in variable: string dspam_config

Name of the DSPAM configuration file. You must set this variable prior to calling dspam. There is no default value.

Built-in variable: string dspam_profile

Name of the configuration profile to be used. If empty (the default), use global configuration settings.

Built-in variable: string dspam_user

Name of the user on behalf of which dspam is called. Default is empty (no user).

Built-in variable: string dspam_group

Name of the user group on behalf of which dspam is called. Default is empty (no group).

Built-in variable: number dspam_prec

Number of decimal digits to retain in the dspam_probability and dspam_confidence values. See dspam probability and confidence, for more information and examples.

Before returning, dspam stores additional information in the following variables:

Built-in variable: string dspam_signature

Signature of the classified message. This variable is initialized if the ‘DSF_SIGNATURE’ bit is set in the mode_flags argument (see dspam classify example).

Built-in variable: number dspam_probability

Spam probability value converted to integer by shifting decimal point dspam_prec positions to the right and rounding the resulting number. See dspam probability and confidence, for more information and examples.

Built-in variable: number dspam_confidence

Spam confidence converted to integer using the same algorithm as for dspam_probability. See dspam probability and confidence, for more information and examples.

5.27.3 ClamAV

Built-in Function: boolean clamav (number msg, string url)

Pass the message msg to the ClamAV daemon at url. Return true if the daemon detects a virus in it. The virus name is returned in the global variable clamav_virus_name.

The clamav function can signal the following exceptions: e_failure if the connection to the server fails, e_url if the supplied URL is invalid and e_range if the supplied port number is out of the range 1–65535.

An example usage:

prog eom
do
  if clamav(current_message(), "tcp://192.168.10.1:6300")
    reject 550 5.7.0 "Infected with %clamav_virus_name"
  fi
done
