Mailfromd Manual (split by node):   Section:   Chapter:FastBack: Library   Up: Interfaces to Third-Party Programs   FastForward: Using MFL Mode   Contents: Table of ContentsIndex: Concept Index

5.29.2 DSPAM

DSPAM is a statistical spam filter distributed under the terms of the GNU General Public License. It is available from http://dspam.sourceforge.net.

MFL provides an interface to DSPAM functionality if the libdspam library is installed and mailfromd is linked with it. The m4 macro ‘WITH_DSPAM’ is defined if it is so.

The DSPAM functions and definitions become available after requiring the ‘dspam’ module:

require 'dspam'
Built-in Function: number dspam (number msg, number mode_flags; number class_source)

Analyze a message using DSPAM. The message is identified by its descriptor, passed in the msg argument.

The mode_flags argument controls the function behavior. Its value is a bitwise OR of operation mode, flag, tokenizer and training mode. Operation mode defines what dspam is supposed to do with the message. Its value is either ‘DSM_PROCESS’ if full processing of the message is intended (the default), or ‘DSM_CLASSIFY’, if the message must only be classified.

Optional flag bits turn on additional functionality. The ‘DSF_SIGNATURE’ bit instructs dspam to create a signature for the message – a unique string which can subsequently be used to identify that particular message. Upon return from the function, the signature is stored in the dspam_signature variable.

The ‘DSF_NOISE’ bit enables Bayesian noise reduction, and ‘DSF_WHITELIST’ enables automatic whitelisting.

Additional flags are available for defining the algorithm to split the message into tokens (tokenizer) and training mode. See flags-dspam, for a complete list of these. All these are optional, any missing values will be read from the DSPAM configuration file.

The configuration file must always be present. Its full file name must be stored in the global variable dspam_config. There is no default value, so make sure this variable is initialized. If a specific profile section should be read, store the name of that profile in the variable dspam_profile.

When called to process or classify the message, dspam returns an integer code of the class of the message. The value ‘DSR_ISSPAM’ means that this message was classified as spam. The value ‘DSR_ISINNOCENT’ means it is a clean (“ham”) message.

The probability and confidence values are returned in global variables dspam_probability and dspam_confidence. Since MFL lacks floating-point data type, both variables keep integers, obtained from the corresponding floating point values by shifting the decimal point dspam_prec digits to the right and rounding the resulting value to the nearest integer. The same method is used in spamc function (see sa-floating-point-conversion). The default value for dspam_prec variable is 3. You can use the sa_format_score function to convert these values to strings representing floating point numbers, e.g.:

require 'dspam'
require 'sa'

prog eom
do
  if dspam(current_message(), DSM_PROCESS | DSM_SIGNATURE)
       == DSR_ISSPAM
    header_add("X-DSPAM-Result", "Spam")
  else
    header_add("X-DSPAM-Result", "Innocent")
  fi
  header_add("X-DSPAM-Probability",
             sa_format_score(dspam_probability, dspam_prec))
  header_add("X-DSPAM-Confidence",
             sa_format_score(dspam_confidence, dspam_prec))
  header_add("X-DSPAM-Signature", dspam_signature)
done

Optional class_source argument is used when training the DSPAM classifier. It is a bitwise OR of the message class and message source values. Message class specifies the class this message belongs to. Possible values are ‘DSR_ISSPAM’, for spam messages, and ‘DSR_ISINNOCENT’, for clean messages. Message source informs DSPAM where this message comes from. The value ‘DSS_ERROR’ means the message was previously misclassified by DSPAM. The value ‘DSS_CORPUS’ indicates the message comes from a corpus feed. Finally, the value ‘DSS_INOCULATION’ means that the message is in pristine form, and should be trained as an inoculation. Inoculation is a more intense mode of training, usually used on honeypots.

The following example calls dspam to train the classifier on the current message if it was sent to a honeypot address, and uses dspam to analyze the message class otherwise. The honeypot variable is supposed to be set elsewhere in the code (e.g. in the ‘envrcpt’ handler):

prog eom
do
  number res
  if honeypot
    set res dspam(current_message(), DSM_PROCESS,
                  DSR_ISSPAM | DSS_INOCULATION)
    discard
  else
    if dspam(current_message(), DSM_PROCESS | DSM_SIGNATURE)
             == DSR_ISSPAM
      header_add("X-DSPAM-Result", "Spam")
    else
      header_add("X-DSPAM-Result", "Innocent")
    fi
    header_add("X-DSPAM-Probability",
               sa_format_score(dspam_probability, dspam_prec))
    header_add("X-DSPAM-Confidence"
               sa_format_score(dspam_confidence, dspam_prec))
    header_add("X-DSPAM-Signature", dspam_signature)
  fi
done

Mailfromd Manual (split by node):   Section:   Chapter:FastBack: Library   Up: Interfaces to Third-Party Programs   FastForward: Using MFL Mode   Contents: Table of ContentsIndex: Concept Index