Mailfromd: General-Purpose Mail Filter
Sergey Poznyakoff
DSPAM is a statistical spam filter distributed under the terms of the GNU General Public License. It is available from http://dspam.sourceforge.net.
MFL provides an interface to DSPAM functionality if the libdspam library is installed and mailfromd is linked with it. In that case, the m4 macro ‘WITH_DSPAM’ is defined.
The DSPAM functions and definitions become available after requiring the ‘dspam’ module:
    require 'dspam'
The dspam function analyzes a message using DSPAM. The message is identified by its descriptor, passed in the msg argument.
The mode_flags argument controls the function behavior. Its value is a bitwise OR of operation mode, flag, tokenizer and training mode values. The operation mode defines what dspam is supposed to do with the message. It is either ‘DSM_PROCESS’, if full processing of the message is intended (the default), or ‘DSM_CLASSIFY’, if the message must only be classified.
Optional flag bits turn on additional functionality. The ‘DSF_SIGNATURE’ bit instructs dspam to create a signature for the message, that is, a unique string which can subsequently be used to identify that particular message. Upon return from the function, the signature is stored in the dspam_signature variable.
The ‘DSF_NOISE’ bit enables Bayesian noise reduction, and ‘DSF_WHITELIST’ enables automatic whitelisting.
Additional flags select the algorithm used to split the message into tokens (the tokenizer) and the training mode. See flags-dspam for a complete list of these. All of them are optional; any missing values are read from the DSPAM configuration file.
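As a minimal sketch of how these values combine, the following handler asks DSPAM to classify the message without full processing, with Bayesian noise reduction enabled. It uses only the constants named above and assumes that the DSPAM configuration variables are already initialized. The header name X-DSPAM-Class is a hypothetical choice made for illustration.

```mfl
require 'dspam'

prog eom
do
  /* DSM_CLASSIFY: classify only. DSF_NOISE: Bayesian noise reduction.
     Tokenizer and training mode bits are omitted here, so they are
     taken from the DSPAM configuration file. */
  if dspam(current_message(), DSM_CLASSIFY | DSF_NOISE) == DSR_ISSPAM
    header_add("X-DSPAM-Class", "Spam")      /* hypothetical header name */
  else
    header_add("X-DSPAM-Class", "Innocent")
  fi
done
```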
The configuration file must always be present. Its full file name must be stored in the global variable dspam_config. There is no default value, so make sure this variable is initialized. If a specific profile section should be read, store the name of that profile in the variable dspam_profile.
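A minimal sketch of this initialization follows. The configuration file path and the profile name are placeholders, not defaults; substitute the values used on your system. The variables are set inside the handler, right before the call, purely to keep the example short.

```mfl
require 'dspam'

prog eom
do
  /* Placeholder path: point this at your actual DSPAM configuration file. */
  set dspam_config "/etc/dspam/dspam.conf"
  /* Optional. "mailfromd" is a hypothetical profile section name. */
  set dspam_profile "mailfromd"

  if dspam(current_message(), DSM_CLASSIFY) == DSR_ISSPAM
    header_add("X-DSPAM-Result", "Spam")
  fi
done
```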
When called to process or classify the message, dspam returns an integer code of the class of the message. The value ‘DSR_ISSPAM’ means that this message was classified as spam. The value ‘DSR_ISINNOCENT’ means it is a clean (“ham”) message.
The probability and confidence values are returned in the global variables dspam_probability and dspam_confidence. Since MFL lacks a floating-point data type, both variables hold integers, obtained from the corresponding floating-point values by shifting the decimal point dspam_prec digits to the right and rounding the result to the nearest integer. For example, with dspam_prec set to 3, a probability of 0.982 is stored as 982. The same method is used by the spamc function (see sa-floating-point-conversion). The default value of the dspam_prec variable is 3. You can use the sa_format_score function to convert these values to strings representing floating-point numbers, e.g.:
    require 'dspam'
    require 'sa'

    prog eom
    do
      /* Process the message and request a signature for it. */
      if dspam(current_message(), DSM_PROCESS | DSF_SIGNATURE) == DSR_ISSPAM
        header_add("X-DSPAM-Result", "Spam")
      else
        header_add("X-DSPAM-Result", "Innocent")
      fi
      /* Convert the scaled integers back to decimal strings. */
      header_add("X-DSPAM-Probability",
                 sa_format_score(dspam_probability, dspam_prec))
      header_add("X-DSPAM-Confidence",
                 sa_format_score(dspam_confidence, dspam_prec))
      header_add("X-DSPAM-Signature", dspam_signature)
    done
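Because the values are scaled integers, they can also be compared directly, without converting them to strings. The sketch below assumes that dspam_prec keeps its default value of 3, so that a probability of 0.950 is stored as 950. The threshold and the reject text are arbitrary choices made for illustration.

```mfl
require 'dspam'

prog eom
do
  if dspam(current_message(), DSM_PROCESS) == DSR_ISSPAM
    /* With dspam_prec = 3, the integer 950 stands for 0.950. */
    if dspam_probability >= 950
      reject 550 5.7.1 "Message classified as spam"
    fi
    /* Below the threshold: accept the message, but mark it. */
    header_add("X-DSPAM-Result", "Spam")
  fi
done
```

If dspam_prec is changed, the threshold must be rescaled accordingly.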
The optional class_source argument is used when training the DSPAM classifier. It is a bitwise OR of the message class and message source values. The message class specifies the class this message belongs to. Possible values are ‘DSR_ISSPAM’, for spam messages, and ‘DSR_ISINNOCENT’, for clean messages. The message source informs DSPAM where this message comes from. The value ‘DSS_ERROR’ means the message was previously misclassified by DSPAM. The value ‘DSS_CORPUS’ indicates that the message comes from a corpus feed. Finally, the value ‘DSS_INOCULATION’ means that the message is in pristine form and should be trained as an inoculation. Inoculation is a more intense mode of training, usually used on honeypots.
The following example calls dspam to train the classifier on the current message if it was sent to a honeypot address, and uses dspam to analyze the message class otherwise. The honeypot variable is supposed to be set elsewhere in the code (e.g. in the ‘envrcpt’ handler):
    require 'dspam'
    require 'sa'

    prog eom
    do
      number res
      if honeypot
        /* Train on honeypot mail as spam, then silently drop it. */
        set res dspam(current_message(), DSM_PROCESS,
                      DSR_ISSPAM | DSS_INOCULATION)
        discard
      else
        /* Regular mail: classify it and record the result in headers. */
        if dspam(current_message(), DSM_PROCESS | DSF_SIGNATURE) == DSR_ISSPAM
          header_add("X-DSPAM-Result", "Spam")
        else
          header_add("X-DSPAM-Result", "Innocent")
        fi
        header_add("X-DSPAM-Probability",
                   sa_format_score(dspam_probability, dspam_prec))
        header_add("X-DSPAM-Confidence",
                   sa_format_score(dspam_confidence, dspam_prec))
        header_add("X-DSPAM-Signature", dspam_signature)
      fi
    done
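The same pattern applies to the other message sources. As a sketch, the handler below feeds known clean mail to the classifier as corpus data. The corpus_ham variable is hypothetical and, like honeypot in the example above, is assumed to be declared and set elsewhere in the code.

```mfl
require 'dspam'

prog eom
do
  number res
  /* corpus_ham is a hypothetical flag, assumed to be set elsewhere. */
  if corpus_ham
    /* Train on the message as clean mail coming from a corpus feed. */
    set res dspam(current_message(), DSM_PROCESS,
                  DSR_ISINNOCENT | DSS_CORPUS)
  fi
done
```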
This document was generated on August 13, 2022 using makeinfo.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.