A set of functions is defined for interfacing with other filters via TCP. Currently implemented are interfaces with the SpamAssassin spamd daemon and with the ClamAV anti-virus.
Both interfaces work much the same way: a connection is made to the remote filter and the message is passed to it. If the remote filter confirms that the message matches its requirements, the function returns true. Note that in practice this means that such a message should be rejected or deferred.
The address of the remote filter is supplied as the second argument in the form of a standard URL:
proto://path[:port]
The proto part specifies the connection protocol. It should be ‘tcp’ for the TCP connection and ‘file’ or ‘socket’ for the connection via UNIX socket. In the latter case the proto part can be omitted. When using TCP connection, the path part gives the remote host name or IP address and the optional port specifies the port number or service name to use. For example:
    # connect to ‘remote.filter.net’ on port 3314:
    tcp://remote.filter.net:3314
    # the same, using symbolic service name (must be defined in
    # /etc/services):
    tcp://remote.filter.net:spamd
    # Connect via a local UNIX socket (equivalent forms):
    /var/run/filter.sock
    file:///var/run/filter.sock
    socket:///var/run/filter.sock
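The address forms above can be illustrated with a small parser. The following Python sketch is not part of mailfromd and does not reproduce its actual URL parser. The function name parse_filter_url and the use of ValueError are assumptions made for this illustration only. It shows how the proto, path and port parts relate, and applies the port range 1–65535 that the interface functions enforce.

```python
import re


def parse_filter_url(url):
    """Split a filter address into (kind, path_or_host, port).

    Hypothetical helper for illustration; not mailfromd's own parser.
    """
    m = re.match(r'^([a-z]+)://(.*)$', url)
    if m:
        proto, rest = m.group(1), m.group(2)
    else:
        # The proto part may be omitted for UNIX sockets.
        proto, rest = 'file', url

    if proto in ('file', 'socket'):
        return ('unix', rest, None)
    if proto != 'tcp':
        raise ValueError('unsupported protocol: ' + proto)

    host, sep, port = rest.partition(':')
    if not host:
        raise ValueError('missing host name')
    if not sep:
        return ('tcp', host, None)      # port is optional
    if not port:
        raise ValueError('empty port')
    if port.isdigit():
        number = int(port)
        if not 1 <= number <= 65535:
            raise ValueError('port out of range 1-65535')
        return ('tcp', host, number)
    # Anything else is taken as a symbolic service name.
    return ('tcp', host, port)
```

With this sketch, the three UNIX socket forms from the example all yield the same result, and a TCP address yields the host together with either a numeric port or a service name.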
The description of the interface functions follows.
Send the message msg to the SpamAssassin daemon (spamd) listening on the given url. The command argument identifies what kind of processing is needed for the message. Allowed values are:
SA_SYMBOLS
Process the message and return 1 or 0 depending on whether it is diagnosed as spam or not. Store SpamAssassin keywords in the global variable sa_keywords (see below).
SA_REPORT
Process the message and return 1 or 0 depending on whether it is diagnosed as spam or not. Store the entire SpamAssassin report in the global variable sa_keywords.
SA_LEARN_SPAM
Learn the supplied message as spam.
SA_LEARN_HAM
Learn the supplied message as ham.
SA_FORGET
Forget any prior classification of the message.
The prec argument gives the precision, in decimal digits, to be used when converting SpamAssassin diagnostic data and storing them into mailfromd variables.
The floating point SpamAssassin data are converted to the integer mailfromd variables using the following relation:
    var = int(sa-var * 10**prec)
where sa-var stands for the SpamAssassin value and var stands for the corresponding mailfromd one. int() means taking the integer part and ‘**’ denotes the exponentiation operator.
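The relation above can be checked with a few lines of arithmetic. The Python sketch below is an illustration only: to_fixed applies the documented relation literally, while format_fixed is a hypothetical helper that turns the integer back into a decimal string. It is not the implementation of sa_format_score, whose exact output format is not specified here.

```python
def to_fixed(value, prec):
    """Apply var = int(sa-var * 10**prec): scale, then keep the integer part."""
    return int(value * 10 ** prec)


def format_fixed(var, prec):
    """Render a scaled integer as a decimal string with prec digits.

    Hypothetical helper; uses integer arithmetic only.
    """
    sign = '-' if var < 0 else ''
    whole, frac = divmod(abs(var), 10 ** prec)
    if prec == 0:
        return f'{sign}{whole}'
    return f'{sign}{whole}.{frac:0{prec}d}'


# A score of 7.5 stored with three digits of precision becomes 7500.
score = to_fixed(7.5, 3)
# Digits beyond the requested precision are dropped, not rounded:
# 2.25 with one digit of precision becomes 22.
truncated = to_fixed(2.25, 1)
```

Note that a larger prec keeps more of the fractional part, at the cost of larger integers in the resulting variables.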
The function returns additional information via the following variables:
sa_score
The spam score, converted to integer as described above. To convert it to a floating-point representation, use the sa_format_score function. See also the example below.
sa_threshold
The threshold, converted to integer form.
sa_keywords
If command is ‘SA_SYMBOLS’, this variable contains a string of comma-separated SpamAssassin keywords identifying this message, e.g.:
ADVANCE_FEE_1,AWL,BAYES_99
If command is ‘SA_REPORT’, the value of this variable is a spam report message. It is a multi-line textual message containing a detailed description of spam scores in tabular form. It consists of the following parts:
- A content preview: the words ‘Content preview’, followed by a colon and an excerpt of the message body.
- A content analysis header, which has the following form:
      Content analysis details: (score points, max required)
  where score and max are the spam score and threshold in floating point.
- A score table, formatted in three columns:
  - The score, as a floating point number with one decimal digit.
  - The name of the SpamAssassin rule that contributed this score.
  - A textual description of the rule.
The score table can be extracted from sa_keywords using the sa_format_report_header function, as illustrated in the example below.
The value of this variable is undefined if command is ‘SA_LEARN_SPAM’, ‘SA_LEARN_HAM’ or ‘SA_FORGET’.
The spamc function can signal the following exceptions: e_failure if the connection fails, e_url if the supplied URL is invalid and e_range if the supplied port number is out of the range 1–65535.
An example of using this function:
    prog eom
    do
      if spamc(current_message(), "tcp://192.168.10.1:3333", 3, SA_SYMBOLS)
        reject 550 5.7.0 "Spam detected, score %sa_score with threshold %sa_threshold"
      fi
    done
Here is a more advanced example:
    prog eom
    do
      set prec 3
      if spamc(current_message(), "tcp://192.168.10.1:3333", prec, SA_REPORT)
        add "X-Spamd-Status" "SPAM"
      else
        add "X-Spamd-Status" "OK"
      fi
      add "X-Spamd-Score" sa_format_score(sa_score, prec)
      add "X-Spamd-Threshold" sa_format_score(sa_threshold, prec)
      add "X-Spamd-Keywords" sa_format_report_header(sa_keywords)
    done
Additional interface to the spamc function, provided for backward compatibility. It is equivalent to
spamc(current_message(), url, prec, command)
If command is not supplied, ‘SA_SYMBOLS’ is used.
DSPAM is a statistical spam filter distributed under the terms of the GNU General Public License. It is available from http://dspam.sourceforge.net.
MFL provides an interface to DSPAM functionality if the libdspam library is installed and mailfromd is linked with it. The m4 macro ‘WITH_DSPAM’ is defined if this is the case.
The DSPAM functions and definitions become available after requiring the ‘dspam’ module:
require 'dspam'
Analyze a message using DSPAM. The message is identified by its descriptor, passed in the msg argument.
The mode_flags argument controls the function behavior. Its value is a bitwise OR of operation mode, flag, tokenizer and training mode. Operation mode defines what dspam is supposed to do with the message. Its value is either ‘DSM_PROCESS’ if full processing of the message is intended (the default), or ‘DSM_CLASSIFY’, if the message must only be classified.
Optional flag bits turn on additional functionality. The ‘DSF_SIGNATURE’ bit instructs dspam to create a signature for the message, that is, a unique string which can subsequently be used to identify that particular message. Upon return from the function, the signature is stored in the dspam_signature variable.
The ‘DSF_NOISE’ bit enables Bayesian noise reduction, and ‘DSF_WHITELIST’ enables automatic whitelisting.
Additional flags are available for defining the algorithm used to split the message into tokens (tokenizer) and the training mode. See the tables of mode_flags values below for a complete list. All of these are optional; any missing values will be read from the DSPAM configuration file.
The configuration file must always be present. Its full file name must be stored in the global variable dspam_config. There is no default value, so make sure this variable is initialized. If a specific profile section should be read, store the name of that profile in the variable dspam_profile.
When called to process or classify the message, dspam returns an integer code of the class of the message. The value ‘DSR_ISSPAM’ means that this message was classified as spam. The value ‘DSR_ISINNOCENT’ means it is a clean (“ham”) message.
The probability and confidence values are returned in the global variables dspam_probability and dspam_confidence. Since MFL lacks a floating-point data type, both variables keep integers, obtained from the corresponding floating point values by shifting the decimal point dspam_prec digits to the right and rounding the resulting value to the nearest integer. The same method is used in the spamc function (see the description of its prec argument above). The default value for the dspam_prec variable is 3. You can use the sa_format_score function to convert these values to strings representing floating point numbers, e.g.:
    require 'dspam'
    require 'sa'

    prog eom
    do
      if dspam(current_message(), DSM_PROCESS | DSF_SIGNATURE) == DSR_ISSPAM
        header_add("X-DSPAM-Result", "Spam")
      else
        header_add("X-DSPAM-Result", "Innocent")
      fi
      header_add("X-DSPAM-Probability",
                 sa_format_score(dspam_probability, dspam_prec))
      header_add("X-DSPAM-Confidence",
                 sa_format_score(dspam_confidence, dspam_prec))
      header_add("X-DSPAM-Signature", dspam_signature)
    done
Optional class_source argument is used when training the DSPAM classifier. It is a bitwise OR of the message class and message source values. Message class specifies the class this message belongs to. Possible values are ‘DSR_ISSPAM’, for spam messages, and ‘DSR_ISINNOCENT’, for clean messages. Message source informs DSPAM where this message comes from. The value ‘DSS_ERROR’ means the message was previously misclassified by DSPAM. The value ‘DSS_CORPUS’ indicates the message comes from a corpus feed. Finally, the value ‘DSS_INOCULATION’ means that the message is in pristine form, and should be trained as an inoculation. Inoculation is a more intense mode of training, usually used on honeypots.
The following example calls dspam to train the classifier on the current message if it was sent to a honeypot address, and uses dspam to analyze the message class otherwise. The honeypot variable is supposed to be set elsewhere in the code (e.g. in the ‘envrcpt’ handler):
    prog eom
    do
      number res
      if honeypot
        set res dspam(current_message(), DSM_PROCESS,
                      DSR_ISSPAM | DSS_INOCULATION)
        discard
      else
        if dspam(current_message(), DSM_PROCESS | DSF_SIGNATURE) == DSR_ISSPAM
          header_add("X-DSPAM-Result", "Spam")
        else
          header_add("X-DSPAM-Result", "Innocent")
        fi
        header_add("X-DSPAM-Probability",
                   sa_format_score(dspam_probability, dspam_prec))
        header_add("X-DSPAM-Confidence",
                   sa_format_score(dspam_confidence, dspam_prec))
        header_add("X-DSPAM-Signature", dspam_signature)
      fi
    done
The tables below summarize flags which can be used in the mode_flags argument to the dspam function. The argument is a bitwise OR of operation mode, flags, tokenizer and training mode bits. Only one operation mode bit can be used. Flags, tokenizer and training mode are optional. Any number of flags, but no more than one tokenizer type and one training mode bit are allowed. Missing values will be supplied from the configuration file.
Mode | Meaning |
---|---|
DSM_PROCESS | Process message |
DSM_CLASSIFY | Classify message only (do not write changes) |
Table 5.2: DSPAM Operation modes
Flag | Meaning |
---|---|
DSF_SIGNATURE | Create a signature |
DSF_NOISE | Use Bayesian Noise Reduction |
DSF_WHITELIST | Use Automatic Whitelisting |
Table 5.3: DSPAM flags
Constant | Meaning |
---|---|
DSZ_WORD | Use WORD tokenizer |
DSZ_CHAIN | Use CHAIN tokenizer |
DSZ_SBPH | Use SBPH tokenizer |
DSZ_OSB | Use OSB tokenizer |
Table 5.4: DSPAM Tokenizer bits
Mode | Meaning |
---|---|
DST_TEFT | Train Everything |
DST_TOE | Train-on-Error |
DST_TUM | Train-until-Mature |
Table 5.5: DSPAM Training Modes
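The combination rules for mode_flags (one operation mode, any number of flags, at most one tokenizer and at most one training mode) can be expressed as a simple bit-mask check. The Python sketch below is an illustration under stated assumptions: the numeric bit values are invented for this example and are not the values defined by the ‘dspam’ module, and check_mode_flags is a hypothetical helper, not part of MFL.

```python
# Hypothetical bit values, chosen only so that each constant
# occupies its own bit. The real values come from the 'dspam' module.
DSM_PROCESS, DSM_CLASSIFY = 0x0001, 0x0002
DSF_SIGNATURE, DSF_NOISE, DSF_WHITELIST = 0x0010, 0x0020, 0x0040
DSZ_WORD, DSZ_CHAIN, DSZ_SBPH, DSZ_OSB = 0x0100, 0x0200, 0x0400, 0x0800
DST_TEFT, DST_TOE, DST_TUM = 0x1000, 0x2000, 0x4000

MODE_MASK = DSM_PROCESS | DSM_CLASSIFY
TOKENIZER_MASK = DSZ_WORD | DSZ_CHAIN | DSZ_SBPH | DSZ_OSB
TRAINING_MASK = DST_TEFT | DST_TOE | DST_TUM


def bits_set(value):
    """Count the bits that are set in value."""
    return bin(value).count('1')


def check_mode_flags(flags):
    """Return a list of rule violations (empty if flags is acceptable).

    Flag bits (DSF_*) are unrestricted, so they are not checked.
    A missing tokenizer or training mode is fine: such values are
    taken from the configuration file.
    """
    problems = []
    if bits_set(flags & MODE_MASK) > 1:
        problems.append('more than one operation mode')
    if bits_set(flags & TOKENIZER_MASK) > 1:
        problems.append('more than one tokenizer')
    if bits_set(flags & TRAINING_MASK) > 1:
        problems.append('more than one training mode')
    return problems
```

For instance, under these assumed values check_mode_flags(DSM_PROCESS | DSF_SIGNATURE | DSZ_OSB) reports no problems, whereas combining DSZ_WORD with DSZ_CHAIN is flagged.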
The tables below summarize flags which can be used in the class_source argument to the dspam function. The argument is a bitwise OR of classification and source bits. At most one classification and one source bit can be given. If not supplied, ‘DSR_NONE|DSS_NONE’ is used.
The classification flags are also used as the return code, as shown in the following table.
Mode | As return value | As argument |
---|---|---|
DSR_NONE | N/A | Classify message |
DSR_ISSPAM | Message is spam | Learn as spam |
DSR_ISINNOCENT | Message is innocent | Learn as innocent |
Table 5.6: DSPAM Classification
Source | Meaning |
---|---|
DSS_NONE | No classification source (use only with DSR_NONE) |
DSS_ERROR | Misclassification by libdspam |
DSS_CORPUS | Message came from a corpus feed |
DSS_INOCULATION | Message inoculation |
Table 5.7: DSPAM Source
The following global variables affect the behavior of the dspam function:
dspam_config
Name of the DSPAM configuration file. You must set this variable prior to calling dspam. There is no default value.
dspam_profile
Name of the configuration profile to be used. If empty (the default), use global configuration settings.
dspam_user
Name of the user on behalf of which dspam is called. Default is empty (no user).
dspam_group
Name of the user group on behalf of which dspam is called. Default is empty (no group).
dspam_prec
Number of decimal digits to retain in the dspam_probability and dspam_confidence values. See the discussion of probability and confidence values above for more information and examples.
Before returning, dspam stores additional information in the following variables:
dspam_signature
Signature of the classified message. This variable is initialized if the ‘DSF_SIGNATURE’ bit is set in the mode_flags argument (see the classification example above).
dspam_probability
Spam probability value converted to integer by shifting the decimal point dspam_prec positions to the right and rounding the resulting number. See the discussion of probability and confidence values above for more information and examples.
dspam_confidence
Spam confidence converted to integer using the same algorithm as for dspam_probability. See the discussion of probability and confidence values above for more information and examples.
Pass the message msg to the ClamAV daemon at url. Return true if it detects a virus in it. Return the virus name in the clamav_virus_name global variable.
The clamav function can signal the following exceptions: e_failure if it failed to connect to the server, e_url if the supplied URL is invalid and e_range if the supplied port number is out of the range 1–65535.
An example usage:
    prog eom
    do
      if clamav(current_message(), "tcp://192.168.10.1:6300")
        reject 550 5.7.0 "Infected with %clamav_virus_name"
      fi
    done
This document was generated on August 13, 2022 using makeinfo.
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.