General-Purpose Mail Filter
DSPAM is a statistical spam filter distributed under the terms of the GNU General Public License. It is available from http://dspam.sourceforge.net.
MFL provides an interface to DSPAM functionality if the libdspam library is installed and mailfromd is linked with it. The m4 macro ‘WITH_DSPAM’ is defined when this is the case.
The DSPAM functions and definitions become available after requiring the ‘dspam’ module:
    require 'dspam'
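The dspam function analyzes a message using DSPAM. It takes the arguments msg and mode_flags, plus an optional class_source argument. The message is identified by its descriptor, passed in the msg argument.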
The mode_flags argument controls the function behavior. Its value is a bitwise OR of operation mode, flag, tokenizer and training mode. Operation mode defines what dspam is supposed to do with the message. Its value is either ‘DSM_PROCESS’, if full processing of the message is intended (the default), or ‘DSM_CLASSIFY’, if the message must only be classified.
Optional flag bits turn on additional functionality. The ‘DSF_SIGNATURE’ bit instructs dspam to create a signature for the message, that is, a unique string which can subsequently be used to identify that particular message. Upon return from the function, the signature is stored in the dspam_signature variable.
The ‘DSF_NOISE’ bit enables Bayesian noise reduction, and ‘DSF_WHITELIST’ enables automatic whitelisting.
Additional flags are available for defining the algorithm used to split the message into tokens (tokenizer) and the training mode. See flags-dspam for a complete list of these. All of them are optional; any missing values are read from the DSPAM configuration file.
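As an illustration of how these values combine, the following sketch only classifies the current message, with noise reduction and automatic whitelisting enabled. It is a minimal example rather than a recommended policy: it assumes the DSPAM configuration has been set up elsewhere, and the header name X-DSPAM-Class is made up for this example.

```
require 'dspam'

prog eom
do
  /* DSM_CLASSIFY is used in place of DSM_PROCESS, so the message
     is only classified.  DSF_NOISE and DSF_WHITELIST are optional
     flag bits.  X-DSPAM-Class is a hypothetical header name. */
  if dspam(current_message(), DSM_CLASSIFY | DSF_NOISE | DSF_WHITELIST) == DSR_ISSPAM
    header_add("X-DSPAM-Class", "Spam")
  else
    header_add("X-DSPAM-Class", "Innocent")
  fi
done
```

Any tokenizer or training mode value left out of the bitwise OR, as here, is taken from the DSPAM configuration file.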
The configuration file must always be present. Its full file name must be stored in a global variable. There is no default value, so make sure this variable is initialized before calling dspam. If a specific profile section should be read, store the name of that profile in the corresponding profile variable.
When called to process or classify the message, dspam returns an integer code of the class of the message. The value ‘DSR_ISSPAM’ means that the message was classified as spam. The value ‘DSR_ISINNOCENT’ means it is a clean (“ham”) message.
The probability and confidence values are returned in the global variables dspam_probability and dspam_confidence. Since MFL lacks a floating-point data type, both variables hold integers, obtained from the corresponding floating-point values by shifting the decimal point dspam_prec digits to the right and rounding the resulting value to the nearest integer. For example, with dspam_prec set to 3, a probability of 0.9876 is stored as 988. The same method is used by the ‘sa’ module (see sa-floating-point-conversion). The default value of the dspam_prec variable is 3. You can use the sa_format_score function to convert these values to strings representing floating-point numbers, e.g.:
    require 'dspam'
    require 'sa'

    prog eom
    do
      if dspam(current_message(), DSM_PROCESS | DSF_SIGNATURE) == DSR_ISSPAM
        header_add("X-DSPAM-Result", "Spam")
      else
        header_add("X-DSPAM-Result", "Innocent")
      fi
      header_add("X-DSPAM-Probability",
                 sa_format_score(dspam_probability, dspam_prec))
      header_add("X-DSPAM-Confidence",
                 sa_format_score(dspam_confidence, dspam_prec))
      header_add("X-DSPAM-Signature", dspam_signature)
    done
The optional class_source argument is used when training the DSPAM classifier. It is a bitwise OR of the message class and message source values. Message class specifies the class this message belongs to. Possible values are ‘DSR_ISSPAM’, for spam messages, and ‘DSR_ISINNOCENT’, for clean messages. Message source informs DSPAM where this message comes from. The value ‘DSS_ERROR’ means the message was previously misclassified by DSPAM. The value ‘DSS_CORPUS’ indicates the message comes from a corpus feed. Finally, the value ‘DSS_INOCULATION’ means that the message is in pristine form and should be trained as an inoculation. Inoculation is a more intense mode of training, usually used on honeypots.
The following example calls dspam to train the classifier on the current message if it was sent to a honeypot address, and uses dspam to analyze the message class otherwise. The honeypot variable is assumed to be set elsewhere in the code (e.g. in the ‘envrcpt’ handler):
    prog eom
    do
      number res
      if honeypot
        set res dspam(current_message(), DSM_PROCESS,
                      DSR_ISSPAM | DSS_INOCULATION)
        discard
      else
        if dspam(current_message(), DSM_PROCESS | DSF_SIGNATURE) == DSR_ISSPAM
          header_add("X-DSPAM-Result", "Spam")
        else
          header_add("X-DSPAM-Result", "Innocent")
        fi
        header_add("X-DSPAM-Probability",
                   sa_format_score(dspam_probability, dspam_prec))
        header_add("X-DSPAM-Confidence",
                   sa_format_score(dspam_confidence, dspam_prec))
        header_add("X-DSPAM-Signature", dspam_signature)
      fi
    done
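The other message source values are passed in the same way. The sketch below trains the classifier with ‘DSS_CORPUS’ instead of ‘DSS_INOCULATION’. It assumes a hypothetical variable corpus_spam, set elsewhere for messages that arrive from a known-spam corpus feed. Both the variable name and the decision to discard the message after training are illustrative assumptions, not part of the DSPAM interface.

```
require 'dspam'

prog eom
do
  number res
  /* corpus_spam is a hypothetical variable, assumed to be set
     elsewhere (e.g. in the envrcpt handler). */
  if corpus_spam
    /* Train on the message as spam that comes from a corpus feed. */
    set res dspam(current_message(), DSM_PROCESS,
                  DSR_ISSPAM | DSS_CORPUS)
    discard
  fi
done
```

A feed of known clean messages could be handled the same way, with ‘DSR_ISINNOCENT’ in place of ‘DSR_ISSPAM’.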
This document was generated on January 3, 2019 using makeinfo.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.