
4 Mail Filtering Language

The mail filtering language, or MFL, is a special language designed for writing filter scripts. It has a simple syntax, similar to that of the Bourne shell. In contrast to most existing programming languages, MFL does not have any special terminating or separating characters (such as newlines and semicolons in shell). All syntactical entities are separated by any amount of white-space characters (i.e. spaces, tabulations or newlines).

The following sections describe MFL syntax in detail.

4.1 Comments

Two types of comments are allowed: C-style, enclosed between ‘/*’ and ‘*/’, and shell-style, starting with the ‘#’ character and extending to the end of the line:

/* This is
   a comment. */
# And this too.

There are, however, several special cases, where the characters following ‘#’ are not ignored.

If the first line begins with ‘#!/’ or ‘#! /’, this is treated as a start of a multi-line comment, which is closed by the characters ‘!#’ on a line by themselves. This feature allows for writing sophisticated scripts. See top-block, for a detailed description.

If ‘#’ is followed by word ‘include’ (with optional whitespace between them), this statement requires inclusion of the specified file, as in C. There are two forms of the ‘#include’ statement:

  1. #include <file>
  2. #include "file"

The quotes around file in the second form are optional.

Both forms are equivalent if file is an absolute file name. Otherwise, the first form will look for file in the include search path. The second one will look for it in the current working directory first, and, if not found there, in the include search path.

The default include search path is:

  1. prefix/share/mailfromd/8.7/include
  2. prefix/share/mailfromd/include
  3. /usr/share/mailfromd/include
  4. /usr/local/share/mailfromd/include

    Where prefix is the installation prefix.

New directories can be prepended to it using the -I (--include) command line option or the include-path configuration statement (see include-path).

For example, invoking

$ mailfromd -I/var/mailfromd -I/com/mailfromd

creates the following include search path

  1. /var/mailfromd
  2. /com/mailfromd
  3. prefix/share/mailfromd/8.7/include
  4. prefix/share/mailfromd/include
  5. /usr/share/mailfromd/include
  6. /usr/local/share/mailfromd/include
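The lookup order described in this section can be sketched in Python. This is only an illustration of the rules stated above, not mailfromd's actual implementation; the function name resolve_include, its parameters, and the injectable exists predicate are hypothetical.

```python
import os

def resolve_include(name, quoted, search_path, cwd=".", exists=os.path.isfile):
    """Return the first existing path for an #include argument, or None.

    quoted=True models the second form (#include "file"), which tries the
    current working directory before the include search path.
    """
    if os.path.isabs(name):
        # Both forms are equivalent for absolute file names.
        return name if exists(name) else None
    directories = ([cwd] if quoted else []) + list(search_path)
    for directory in directories:
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

Directories given with -I would simply be placed at the front of search_path, in the order they appear on the command line.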

Along with #include, there is also a special form #include_once, that has the same syntax:

#include_once <file>
#include_once "file"

This form works exactly as #include, except that, if the file has already been included, it will not be included again. As the name suggests, it will be included only once.

This form should be used to prevent re-inclusion of code, which can cause problems due to function redefinitions, variable reassignments, etc.

A line in the form

#line number "identifier"

causes the MFL compiler to believe, for purposes of error diagnostics, that the line number of the next source line is given by number and the current input file is named by identifier. If the identifier is absent, the remembered file name does not change.

4.2 Pragmatic comments

If ‘#’ is immediately followed by word ‘pragma’ (with optional whitespace between them), such a construct introduces a pragmatic comment, i.e. an instruction that controls some configuration setting.

The available pragma types are described in the following subsections.

4.2.1 Pragma prereq

The #pragma prereq statement ensures that the correct mailfromd version is used to compile the source file it appears in. It takes a version number as its argument and produces a compilation error if the actual mailfromd version number is earlier than that. For example, the following statement:

#pragma prereq 7.0.94

results in error if compiled with mailfromd version 7.0.93 or prior.
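The version check can be pictured as a component-wise comparison of dotted version numbers. The Python sketch below is an assumption about how such a comparison might work, not a description of the compiler's code; the names version_tuple and prereq_satisfied are hypothetical.

```python
def version_tuple(text):
    """Split a dotted version string such as '7.0.94' into integers."""
    return tuple(int(part) for part in text.split("."))

def prereq_satisfied(required, actual):
    """Return True if the actual version is not earlier than the required one."""
    # Tuples compare component by component, left to right.
    return version_tuple(actual) >= version_tuple(required)
```

With these definitions, prereq_satisfied("7.0.94", "7.0.93") is false, which corresponds to the compilation error described above.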

4.2.2 Pragma stacksize

The stacksize pragma sets the initial size of the run-time stack and may also define its growth policy for when it becomes full. The default stack size is 4096 words. You may need to increase this number if your configuration program uses recursive functions or performs a large amount of string manipulation.

pragma: stacksize size [incr [max]]

Sets stack size to size units. Optional incr and max define stack growth policy (see below). The default units are words. The following example sets the stack size to 7168 words:

#pragma stacksize 7168

The size may end with a unit size suffix:

Suffix  Meaning
k       Kilowords, i.e. 1024 words
m       Megawords, i.e. 1048576 words
g       Gigawords
t       Terawords (ouch!)

Table 4.1: Unit Size Suffix

Size suffixes are case-insensitive, so the following two pragmas are equivalent and set the stack size to 7*1048576 = 7340032 words:

#pragma stacksize 7m
#pragma stacksize 7M
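The suffix arithmetic can be sketched in Python as follows. The function name parse_size is hypothetical, and the values used for ‘g’ and ‘t’ (successive powers of 1024) are an assumption, since the table above gives explicit word counts only for ‘k’ and ‘m’.

```python
# Words per unit; the 'g' and 't' values are assumed to continue the
# powers-of-1024 progression shown for 'k' and 'm'.
UNIT_WORDS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}

def parse_size(text):
    """Convert a size such as '7168', '2K' or '7m' to a number of words."""
    suffix = text[-1].lower()  # suffixes are case-insensitive
    if suffix in UNIT_WORDS:
        return int(text[:-1]) * UNIT_WORDS[suffix]
    return int(text)
```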

When the MFL engine notices that there is no more stack space available, it attempts to expand the stack. If this attempt succeeds, the operation continues. Otherwise, a runtime error is reported and the execution of the filter stops.

The optional incr argument to #pragma stacksize defines the growth policy for the stack. Two growth policies are implemented: the fixed increment policy, which expands the stack in expansion chunks of a fixed size, and the exponential growth policy, which doubles the stack size until it is able to accommodate the needed number of words. The fixed increment policy is the default. The default chunk size is 4096 words.

If incr is the word ‘twice’, the exponential growth policy is selected. Otherwise incr must be a positive number optionally followed by a size suffix (see above). This indicates the expansion chunk size for the fixed increment policy.

The following example sets initial stack size to 10240, and expansion chunk size to 2048 words:

#pragma stacksize 10K 2K

The pragma below enables exponential stack growth policy:

#pragma stacksize 10240 twice

In this case, when the run-time evaluator hits the stack size limit, it expands the stack to twice the size it had before. So, in the example above, the stack will be sequentially expanded to the following sizes: 20480, 40960, 81920, 163840, etc.

The optional max argument defines the maximum size of the stack. If the stack grows beyond this limit, the execution of the script will be aborted.
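The two growth policies and the max limit can be modelled with a short Python sketch. It only mirrors the behavior described in this section; the function name grow_stack and the use of OverflowError to stand for the run-time error are assumptions made for illustration.

```python
def grow_stack(size, needed, incr=4096, max_size=None):
    """Return a stack size (in words) able to hold `needed` words.

    incr is either a chunk size in words (fixed increment policy) or the
    string 'twice' (exponential growth policy). OverflowError stands in
    for the run-time error raised when max_size would be exceeded.
    """
    while size < needed:
        size = size * 2 if incr == "twice" else size + incr
        if max_size is not None and size > max_size:
            raise OverflowError("stack limit exceeded")
    return size
```

For instance, grow_stack(10240, 100000, "twice") passes through 20480, 40960 and 81920 before settling on 163840, matching the sequence given above.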

If you are concerned about the execution time of your script, you may wish to avoid stack reallocations. To help you find out the optimal stack size, each time the stack is expanded, mailfromd issues a warning in its log file, which looks like this:

warning: stack segment expanded, new size=8192

You can use these messages to adjust your stack size configuration settings.

4.2.3 Pragma regex

The ‘#pragma regex’ directive controls compilation of regular expressions. You can use any number of such pragma directives in your mailfromd.mf. The scope of ‘#pragma regex’ extends to the next occurrence of this directive or to the end of the script file, whichever occurs first.

pragma: regex [push|pop] flags

The optional push|pop parameter is one of the words ‘push’ or ‘pop’ and is discussed in detail below. The flags parameter is a whitespace-separated list of regex flags. Each regex-flag is a word specifying some regex feature. It can be preceded by ‘+’ to enable this feature (this is the default), by ‘-’ to disable it or by ‘=’ to reset regex flags to its value. Valid regex-flags are:

extended

Use POSIX Extended Regular Expression syntax when interpreting regex. If not set, POSIX Basic Regular Expression syntax is used.

icase

Do not differentiate case. Subsequent regex searches will be case insensitive.

newline

Match-any-character operators don’t match a newline.

A non-matching list (‘[^...]’) not containing a newline does not match a newline.

Match-beginning-of-line operator (‘^’) matches the empty string immediately after a newline.

Match-end-of-line operator (‘$’) matches the empty string immediately before a newline.

For example, the following pragma enables POSIX extended, case insensitive matching (a good thing to start your mailfromd.mf with):

#pragma regex +extended +icase

Optional modifiers ‘push’ and ‘pop’ can be used to maintain a stack of regex flags. The statement

#pragma regex push [flags]

saves current regex flags on stack and then optionally modifies them as requested by flags.

The statement

#pragma regex pop [flags]

does the opposite: it restores the current regex flags from the top of the stack and applies flags to them.

This statement is useful in module and include files to avoid disturbing user regex settings. E.g.:

#pragma regex push +extended +icase
 .
 .
 .
#pragma regex pop
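The push and pop behavior can be modelled as a stack of flag sets. The Python sketch below is an illustration only: the class name RegexFlags is hypothetical, the initial flag set is assumed to be empty, and ‘=’ is interpreted as clearing all flags and then setting the named one.

```python
class RegexFlags:
    """Toy model of the flag state maintained by '#pragma regex'."""

    def __init__(self):
        self.current = set()   # assumed initial state: no flags set
        self.saved = []        # stack of saved flag sets

    def apply(self, flags):
        for flag in flags.split():
            if flag.startswith("-"):
                self.current.discard(flag[1:])
            elif flag.startswith("="):
                self.current = {flag[1:]}   # assumed meaning of '='
            else:
                self.current.add(flag.lstrip("+"))  # '+' is the default

    def push(self, flags=""):
        self.saved.append(set(self.current))
        self.apply(flags)

    def pop(self, flags=""):
        self.current = self.saved.pop()
        self.apply(flags)
```

A module that pushes its own flags and pops them at the end leaves the caller's settings exactly as it found them.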

4.2.4 Pragma dbprop

pragma: dbprop pattern prop …

This pragma configures properties for a DBM database. See Database functions, for its detailed description.

4.2.5 Pragma greylist

pragma: greylist type

Selects the greylisting implementation to use. Allowed values for type are:

traditional
gray

Use the traditional greylisting implementation. This is the default.

con-tassios
ct

Use Con Tassios greylisting implementation.

See greylisting types, for a detailed description of these greylisting implementations.

Notice that this pragma can be used only once. A second use would constitute an error, because you cannot use both greylisting implementations in the same program.

4.2.6 Pragma miltermacros

pragma: miltermacros handler macro …

Declare that the Milter stage handler uses the MTA macros listed as the remaining arguments. The handler must be a valid handler name (see Handlers).

The mailfromd parser collects the names of the macros referred to by a ‘$name’ construct within a handler (see Sendmail Macros) and declares them automatically for the corresponding handlers. It is, however, unable to track macros used in functions called from a handler, as well as those referred to via the getmacro and macro_defined functions. Such macros should be declared using ‘#pragma miltermacros’.

During initial negotiation with the MTA, mailfromd will ask it to export the macro names declared automatically or by using the ‘#pragma miltermacros’. The MTA is free to honor or to ignore this request. In particular, Sendmail versions prior to 8.14.0 and Postfix versions prior to 2.5 do not support this feature. If you use one of these, you will need to export the needed macros explicitly in the MTA configuration. For more details, refer to the section in MTA Configuration corresponding to your MTA type.

4.2.7 Pragma provide-callout

The #pragma provide-callout statement is used in the callout module to inform mailfromd that the module has been loaded.

Do not use this pragma.

4.3 Data Types

The mailfromd filter script language operates on entities of two types: numeric and string.

The numeric type is represented internally as a signed long integer. Depending on the machine architecture, its size can vary. For example, on machines with Intel-based CPUs it is 32 bits long.

A string is a string of characters of arbitrary length. Strings can contain any characters except ASCII NUL.

There is also a generic pointer, which is designed to facilitate certain operations. It appears only in body handler. See body handler, for more information about it.

4.4 Numbers

A decimal number is any sequence of decimal digits, not beginning with ‘0’.

An octal number is ‘0’ followed by any number of octal digits (‘0’ through ‘7’), for example: 0340.

A hex number is ‘0x’ or ‘0X’ followed by any number of hex digits (‘0’ through ‘9’ and ‘a’ through ‘f’ or ‘A’ through ‘F’), for example: 0x3ef1.
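The three notations can be told apart by their prefixes. The following Python sketch shows one way to classify and convert such tokens; the function name parse_number is hypothetical and the sketch is not taken from the MFL lexer.

```python
import re

def parse_number(token):
    """Convert a decimal, octal or hex literal to an integer."""
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", token):
        return int(token[2:], 16)          # hex: 0x or 0X prefix
    if re.fullmatch(r"0[0-7]*", token):
        return int(token, 8)               # octal: leading 0
    if re.fullmatch(r"[1-9][0-9]*", token):
        return int(token, 10)              # decimal: no leading 0
    raise ValueError("not a number: %r" % token)
```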

4.5 Literals

A literal is any sequence of characters enclosed in single or double quotes.

After tempfail and reject actions two special kinds of literals are recognized: three-digit numeric values represent RFC 2821 reply codes, and literals consisting of three digit groups separated by dots represent an extended reply code as per RFC 1893/2034. For example:

510   # A reply code
5.7.1 # An extended reply code

Double-quoted strings

String literals enclosed in double quotation marks (double-quoted strings) are subject to backslash interpretation, macro expansion, variable interpretation and back reference interpretation.

Backslash interpretation is performed at compilation time. It consists in replacing the following escape sequences with the corresponding single characters:

Sequence  Replaced with
\a        Audible bell character (ASCII 7)
\b        Backspace character (ASCII 8)
\f        Form-feed character (ASCII 12)
\n        Newline character (ASCII 10)
\r        Carriage return character (ASCII 13)
\t        Horizontal tabulation character (ASCII 9)
\v        Vertical tabulation character (ASCII 11)

Table 4.2: Backslash escapes

In addition, the sequence ‘\newline’ has the same effect as ‘\n’, for example:

"a string with\
 embedded newline"
"a string with\n embedded newline"

Any escape sequence of the form ‘\xhh’, where h denotes any hex digit, is replaced with the character whose ASCII value is hh. For example:

"\x61nother" ⇒ "another"

Similarly, an escape sequence of the form ‘\0ooo’, where o is an octal digit, is replaced with the character whose ASCII value is ooo.
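Backslash interpretation as described above can be sketched in Python. The function name unescape is hypothetical. How MFL treats sequences not listed in this section is not stated here, so the sketch leaves them unchanged; that choice is an assumption.

```python
import re

# Single-character escapes from Table 4.2, plus backslash itself and
# backslash-newline (which has the same effect as \n).
SIMPLE = {"a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r",
          "t": "\t", "v": "\v", "\\": "\\", "\n": "\n"}

_ESCAPE = re.compile(r"\\(x[0-9a-fA-F]{2}|0[0-7]{3}|.)", re.DOTALL)

def unescape(text):
    """Replace backslash escapes in the body of a double-quoted string."""
    def replace(match):
        body = match.group(1)
        if len(body) == 3:                 # \xhh
            return chr(int(body[1:], 16))
        if len(body) == 4:                 # \0ooo
            return chr(int(body[1:], 8))
        # Unlisted sequences are kept as written (assumption).
        return SIMPLE.get(body, match.group(0))
    return _ESCAPE.sub(replace, text)
```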

Macro expansion and variable interpretation occur at run-time. During these phases all Sendmail macros (see Sendmail Macros), mailfromd variables (see Variables), and constants (see Constants) referenced in the string are replaced by their actual values. For example, if the Sendmail macro f has the value ‘postmaster@gnu.org.ua’ and the variable last_ip has the value ‘127.0.0.1’, then the string

"$f last connected from %last_ip;"

will be expanded to

"postmaster@gnu.org.ua last connected from 127.0.0.1;"

A back reference is a sequence ‘\d’, where d is a decimal number. It refers to the dth parenthesized subexpression in the last matches statement. Any back reference occurring within a double-quoted string is replaced by the value of the corresponding subexpression. See Special comparisons, for a detailed description of this process. Back reference interpretation is performed at run time.
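Macro expansion and variable interpretation in double-quoted strings, described earlier in this section, amount to a textual substitution. The Python sketch below imitates them for the ‘$name’, ‘${name}’ and ‘%name’ notations; the function name expand and the use of dictionaries to hold macro and variable values are assumptions made for illustration.

```python
import re

_REFERENCE = re.compile(r"\$\{(\w+)\}|\$(\w+)|%(\w+)")

def expand(text, macros, variables):
    """Replace $name, ${name} and %name references with their values."""
    def replace(match):
        braced, plain, variable = match.groups()
        if variable is not None:
            return str(variables[variable])
        return macros[braced or plain]
    # A single pass, so substituted values are not expanded again.
    return _REFERENCE.sub(replace, text)
```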

Single-quoted strings

Any characters enclosed in single quotation marks are read unmodified.

The following examples contain pairs of equivalent strings:

"a string"
'a string'

"\\(.*\\):"
'\(.*\):' 

Notice the last example. Single quotes are particularly useful in writing regular expressions (see Special comparisons).

4.6 Here Documents

A here-document is a special form of string literal that allows multiline strings to be specified without using backslash escapes. The format of here-documents is:

<<[flags]word
…
word

The <<word construct instructs the parser to read all the following lines up to the line containing only word, with possible trailing blanks. The lines thus read are concatenated together into a single string. For example:

set str <<EOT
A multiline
string
EOT

The body of a here-document is interpreted the same way as double-quoted strings (see Double-quoted strings). For example, if Sendmail macro f has the value jsmith@some.com and the variable count is set to 10, then the following string:

set s <<EOT
<$f> has tried to send %count mails.
Please see docs for more info.
EOT

will be expanded to:

<jsmith@some.com> has tried to send 10 mails.
Please see docs for more info.

If the word is quoted, either by enclosing it in single quote characters or by prepending it with a backslash, all interpretations and expansions within the document body are suppressed. For example:

set s <<'EOT'
The following line is read verbatim:
<$f> has tried to send %count mails.
Please see docs for more info.
EOT

Optional flags in the here-document construct control the way leading white space is handled. If flags is - (a dash), then all leading tab characters are stripped from input lines and the line containing word. Furthermore, if - is followed by a single space, all leading whitespace is stripped from them. This allows here-documents within configuration scripts to be indented in a natural fashion. Examples:

<<- TEXT
    <$f> has tried to send %count mails.
    Please see docs for more info.
TEXT
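The effect of the flags on leading white space can be sketched in Python. The function name strip_heredoc is hypothetical, and the sketch covers only the stripping step, not delimiter detection or expansion.

```python
def strip_heredoc(lines, flags):
    """Strip leading white space from here-document lines.

    flags is '' (no stripping), '-' (strip leading tabs only) or
    '- ' (dash followed by a space: strip all leading white space).
    """
    if flags == "-":
        return [line.lstrip("\t") for line in lines]
    if flags == "- ":
        return [line.lstrip(" \t") for line in lines]
    return list(lines)
```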

Here-documents are particularly useful with reject actions (see reject).

4.7 Sendmail Macros

Sendmail macros are referenced exactly the same way they are in sendmail.cf configuration file, i.e. ‘$name’, where name represents the macro name. Notice, that the notation is the same for both single-character and multi-character macro names. For consistency with the Sendmail configuration the ‘${name}’ notation is also accepted.

Another way to reference Sendmail macros is by using function getmacro (see Macro access).

Sendmail macros evaluate to string values.

Notice that to reference a macro, you must properly export it in your MTA configuration. An attempt to reference a macro that is not exported raises an e_macroundef exception at run time (see uncaught exceptions).

4.8 Constants

A constant is a symbolic name for an MFL value. Constants are defined using the const statement:

[qualifier] const name expr

where name is an identifier, and expr is any valid MFL expression evaluating immediately to a constant literal or numeric value. Optional qualifier defines the scope of visibility for that constant (see scope of visibility): either public or static.

After defining, any appearance of name in the program text is replaced by its value. For example:

const x 10/5
const text "X is "

defines the numeric constant ‘x’ with the value ‘2’, and the literal constant ‘text’ with the value ‘X is ’.

Constants can also be used in literals. To expand a constant within a literal string, prepend a percent sign to its name, e.g.:

echo "New %text %x" ⇒ "New X is 2"

This way of expanding constants creates an ambiguity if there happens to be a variable of the same name as the constant. See variable--constant clashes, for more information about this case and ways to handle it.

4.8.1 Built-in constants

Several constants are built into the MFL compiler. To discern them from user-defined ones, their names start and end with two underscores (‘__’).

The following constants are defined in mailfromd version 8.7:

Built-in constant: string __file__

Expands to the name of the current source file.

Built-in constant: string __function__

Expands to the name of the current lexical context, i.e. the function or handler name.

Built-in constant: string __git__

This built-in constant is defined for alpha versions only. Its value is the Git tag of the recent commit corresponding to that version of the package. If the release contains some uncommitted changes, the value of the ‘__git__’ constant ends with the suffix ‘-dirty’.

Built-in constant: number __line__

Expands to the current line number in the input source file.

Built-in constant: number __major__

Expands to the major version number.

The following example uses __major__ constant to determine if some version-dependent feature can be used:

if __major__ > 2
  # Use some version-specific feature
fi

Built-in constant: number __minor__

Expands to the minor version number.

Built-in constant: string __module__

Expands to the name of the current module (see Modules).

Built-in constant: string __package__

Expands to the package name (‘mailfromd’)

Built-in constant: number __patch__

For alpha versions and maintenance releases expands to the version patch level. For stable versions, expands to ‘0’.

Built-in constant: string __defpreproc__

Expands to the default external preprocessor command line, if the preprocessor is used, or to an empty string if it is not, e.g.:

__defpreproc__ ⇒ "/usr/bin/m4 -s"

See Preprocessor, for information on preprocessor and its features.

Built-in constant: string __preproc__

Expands to the current external preprocessor command line, if the preprocessor is used, or to an empty string if it is not. Notice, that it equals __defpreproc__, unless the preprocessor was redefined using --preprocessor command line option (see –preprocessor).

Built-in constant: string __version__

Expands to the textual representation of the program version (e.g. ‘3.0.90’)

Built-in constant: string __defstatedir__

Expands to the default state directory (see statedir).

Built-in constant: string __statedir__

Expands to the current value of the program state directory (see statedir). Notice, that it is the same as __defstatedir__ unless the state directory was redefined at run time.

Built-in constants can be used like variables, which allows them to be expanded within strings or here-documents. The following example illustrates a common practice used for debugging configuration scripts:

func foo(number x)
do
  echo "%__file__:%__line__: foo called with arg %x"
  …
done

If the function foo is called on line 28 of the script file /etc/mailfromd.mf, like this: foo(10), you will see the following string in your logs:

/etc/mailfromd.mf:28: foo called with arg 10

4.9 Variables

Variables represent regions of memory used to hold variable data. These memory regions are identified by variable names. A variable name must begin with a letter or underscore and must consist of letters, digits and underscores.

Each variable is associated with its scope of visibility, which defines the part of source code where it can be used (see scope of visibility). Depending on the scope, we discern three main classes of variables: public, static and automatic (or local).

Public variables have indefinite lexical scope, so they may be referred to anywhere in the program. Static variables are visible only within their module (see Modules). Automatic or local variables are visible only within the given function or handler.

Public and static variables are sometimes collectively called global.

These variable classes occupy separate namespaces, so that an automatic variable can have the same name as an existing public or static one. In this case this variable is said to shadow its global counterpart. All references to such a name will refer to the automatic variable until the end of its scope is reached, where the global one becomes visible again.

Likewise, a static variable may have the same name as a static variable defined in another module. However, it may not have the same name as a public variable.

A variable is declared using the following syntax:

[qualifiers] type name

where name is the variable name, type is the type of the data it is supposed to hold. It is ‘string’ for string variables and ‘number’ for numeric ones.

For example, this is a declaration of a string variable ‘var’:

string var

Optional qualifiers are allowed only in global declarations, i.e. in the variable declarations that appear outside of functions. They specify the scope of the variable. The public qualifier declares the variable as public and the static qualifier declares it as static. The default scope is ‘public’, unless specified otherwise in the module declaration (see module structure).

Additionally, qualifiers may contain the word precious, which instructs the compiler to mark this variable as precious (see precious variables). The value of a precious variable is not affected by the SMTP ‘RSET’ command. If both a scope qualifier and precious are used, they may appear in any order, e.g.:

static precious string rcpt_list

or

precious static string rcpt_list

The declaration can be followed by any valid MFL expression, which supplies the initial value for the variable, for example:

string var "test"

If a variable declaration occurs within a function (see User-defined) or handler (see Handlers), it declares an automatic variable, local to this function or handler. Otherwise, it declares a global variable.

A variable is assigned a value using set statement:

set name expr

where name is the variable name and expr is a mailfromd expression (see Expressions). The effect of this statement is that the expr is evaluated and the value it yields is assigned to the variable name.

If the set statement is located outside a function or handler definition, the expr must be a constant expression, i.e. the compiler should be able to evaluate it immediately. See optimizer.

It is not an error to assign a value to a variable that is not declared. In this case the assignment first declares a global or automatic variable having the type of expr and then assigns a value to it. An automatic variable is created if the assignment occurs within a function or handler; a global variable is declared if it occurs at the topmost lexical level. This is called implicit variable declaration.

Variables are referenced using the notation ‘%name’. The variable being referenced must have been declared earlier (either explicitly or implicitly).

4.9.1 Predefined Variables

Several variables are predefined. In mailfromd version 8.7 these are:

Predefined Variable: number cache_used

This variable is set by stdpoll and strictpoll built-ins (and, consequently, by the on poll statement). Its value is ‘1’ if the function used the cached data instead of directly polling the host, and ‘0’ if the polling took place. See SMTP Callout functions.

You can use this variable to make your reject message more informative for the remote party. The common paradigm is to define a function, returning empty string if the result was obtained from polling, or some notice if cached data were used, and to use the function in the reject text, for example:

func cachestr() returns string
do
  if cache_used
    return "[CACHED] "
  else
    return ""
  fi
done

Then, in prog envfrom one can use:

on poll $f
do
when not_found or failure:
  reject 550 5.1.0 cachestr() . "Sender validity not confirmed"
done

Predefined Variable: string clamav_virus_name

Name of virus identified by ClamAV. Set by clamav function (see ClamAV).

Predefined Variable: number greylist_seconds_left

Number of seconds left to the end of greylisting period. Set by greylist and is_greylisted functions (see Special test functions).

Predefined Variable: string ehlo_domain

Name of the domain used by polling functions in SMTP EHLO or HELO command. Default value is the fully qualified domain name of the host where mailfromd is run. See Polling.

Predefined Variable: string last_poll_greeting

Callout functions (see SMTP Callout functions) set this variable before returning. It contains the initial SMTP reply from the last polled host.

Predefined Variable: string last_poll_helo

Callout functions (see SMTP Callout functions) set this variable before returning. It contains the reply to the HELO (EHLO) command, received from the last polled host.

Predefined Variable: string last_poll_host

Callout functions (see SMTP Callout functions) set this variable before returning. It contains the host name or IP address of the last polled host.

Predefined Variable: string last_poll_recv

Callout functions (see SMTP Callout functions) set this variable before returning. It contains the last SMTP reply received from the remote host. In case of multi-line replies, only the first line is stored. If nothing was received the variable contains the string ‘nothing’.

Predefined Variable: string last_poll_sent

Callout functions (see SMTP Callout functions) set this variable before returning. It contains the last SMTP command sent to the polled host. If nothing was sent, last_poll_sent contains the string ‘nothing’.

Predefined Variable: string mailfrom_address

Email address used by polling functions in the SMTP MAIL FROM command (see Polling). Default is ‘<>’. Here is an example of how to change it:

set mailfrom_address "postmaster@my.domain.com"

You can set this value to a comma-separated list of email addresses, in which case the probing will try each address until either the remote party accepts it or the list of addresses is exhausted, whichever happens first.

It is not necessary to enclose emails in angle brackets, as they will be added automatically where appropriate. The only exception is null return address, when used in a list of addresses. In this case, it should always be written as ‘<>’. For example:

set mailfrom_address "postmaster@my.domain.com, <>"

Predefined Variable: number sa_code

Spam score for the message, set by sa function (see sa).

Predefined Variable: number rcpt_count

The variable rcpt_count keeps the number of recipients given so far by RCPT TO commands. It is defined only in ‘envrcpt’ handlers.

Predefined Variable: number sa_threshold

Spam threshold, set by sa function (see sa).

Predefined Variable: string sa_keywords

Spam keywords for the message, set by sa function (see sa).

Predefined Variable: number safedb_verbose

This variable controls the verbosity of the exception-safe database functions. See safedb_verbose.

4.10 Back references

A back reference is a sequence ‘\d’, where d is a decimal number. It refers to the dth parenthesized subexpression in the last matches statement. Any back reference occurring within a double-quoted string is replaced with the value of the corresponding subexpression. For example:

if $f matches '.*@\(.*\)\.gnu\.org\.ua'
  set host \1
fi

If the value of f macro is ‘smith@unza.gnu.org.ua’, the above code will assign the string ‘unza’ to the variable host.

Notice that each occurrence of matches resets the table of back references, so use them as early as possible. The following example illustrates a common error, in which a back reference is used after the reference table has been overwritten by another match:

# Wrong!
if $f matches '.*@\(.*\)\.gnu\.org\.ua'
  if $f matches 'some.*'
    set host \1
  fi
fi

This will produce the following run time error:

mailfromd: RUNTIME ERROR near file.mf:3: Invalid back-reference number

because the inner match (‘some.*’) does not have any parenthesized subexpressions.
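The way each match overwrites the back reference table can be imitated in Python. In the sketch below the class name Matcher is hypothetical, Python's re syntax writes groups as ‘(...)’ rather than ‘\(...\)’, and the assumption is made that a failed match clears the table as well.

```python
import re

class Matcher:
    """Toy model of the back reference table kept by 'matches'."""

    def __init__(self):
        self.groups = ()

    def matches(self, value, pattern):
        found = re.search(pattern, value)
        # Every call replaces the table, whatever it held before.
        self.groups = found.groups() if found else ()
        return found is not None

    def backref(self, number):
        if not 1 <= number <= len(self.groups):
            raise IndexError("Invalid back-reference number")
        return self.groups[number - 1]
```

After a match against a pattern with one group, backref(1) returns that group; after a second match against a pattern without groups, the same call fails, which is the situation shown in the erroneous example above.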

See Special comparisons, for more information about matches operator.

4.11 Handlers

A milter stage handler (or handler, for short) is a subroutine responsible for processing a particular milter state. There are eight handlers available. Their order of invocation and arguments are described in Figure 3.1.

A handler is defined using the following construct:

prog handler-name
do
  handler-body
done

where handler-name is the name of the handler (see handler names), handler-body is the list of filter statements composing the handler body. Some handlers take arguments, which can be accessed within the handler-body using the notation $n, where n is the ordinal number of the argument. Here we describe the available handlers and their arguments:

Handler: connect (string $1, number $2, number $3, string $4)
Invocation:

This handler is called once at the beginning of each SMTP connection.

Arguments:
  1. string; The host name of the message sender, as reported by MTA. Usually it is determined by a reverse lookup on the host address. If the reverse lookup fails, ‘$1’ will contain the message sender’s IP address enclosed in square brackets (e.g. ‘[127.0.0.1]’).
  2. number; Socket address family. You need to require the ‘status’ module to get symbolic definitions for the address families. Supported families are:
    Constant      Value  Meaning
    FAMILY_STDIO  0      Standard input/output (the MTA is run with -bs option)
    FAMILY_UNIX   1      UNIX socket
    FAMILY_INET   2      IPv4 protocol
    FAMILY_INET6  3      IPv6 protocol

    Table 4.3: Supported socket families

  3. number; Port number if ‘$2’ is ‘FAMILY_INET’.
  4. string; Remote IP address if ‘$2’ is ‘FAMILY_INET’ or full file name of the socket if ‘$2’ is ‘FAMILY_UNIX’. If ‘$2’ is ‘FAMILY_STDIO’, ‘$4’ is an empty string.

The actions (see Actions) appearing in this handler are handled by Sendmail in a special way. First of all, any textual message is ignored. Secondly, the only action that immediately closes the connection is tempfail 421. Any other reply code results in Sendmail switching to nullserver mode, where it accepts any command but answers each of them with a failure, except for QUIT, HELO and NOOP, which are processed as usual.

The following table summarizes the Sendmail behavior depending on the action used:

tempfail 421 excode message

The caller is returned the following error message:

421 4.7.0 hostname closing connection

Both excode and message are ignored.

tempfail 4xx excode message

(where xx represents any digits, except ‘21’) Both excode and message are ignored. Sendmail switches to nullserver mode. Any subsequent command, excepting the ones listed above, is answered with

454 4.3.0 Please try again later

reject 5xx excode message

(where xx represents any digits). All arguments are ignored. Sendmail switches to nullserver mode. Any subsequent command, excepting the ones listed above, is answered with

550 5.0.0 Command rejected

Regarding reply codes, this behavior complies with RFC 2821 (section 3.9), which states:

An SMTP server must not intentionally close the connection except:
[…]
- After detecting the need to shut down the SMTP service and returning a 421 response code. This response code can be issued after the server receives any command or, if necessary, asynchronously from command receipt (on the assumption that the client will receive it after the next command is issued).

However, the RFC says nothing about textual messages and extended error codes, therefore Sendmail’s ignoring of these is, in my opinion, absurd. My practice shows that it is often reasonable, and even necessary, to return a meaningful textual message if the initial connection is declined. The opinion of mailfromd users seems to support this view. Bearing this in mind, mailfromd is shipped with a patch for Sendmail, which makes it honor both the extended return code and the textual message given with the action. Two versions are provided: etc/sendmail-8.13.7.connect.diff, for Sendmail versions 8.13.x, and etc/sendmail-8.14.3.connect.diff, for Sendmail version 8.14.3.

Handler: helo (string $1)
Invocation:

This handler is called whenever the SMTP client sends HELO or EHLO command. Depending on the actual MTA configuration, it can be called several times or even not at all.

Arguments:
  1. string; Argument to HELO (EHLO) commands.
Notes:

According to RFC 2821, $1 must be the domain name of the sending host or, if this is not available, its IP address enclosed in square brackets. Be careful when making decisions based on this value, because in practice many hosts send arbitrary strings. We recommend using the heloarg_test function (see heloarg_test) if you wish to analyze this value.
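
To illustrate how loosely this value is often formed, the sketch below merely logs HELO arguments that consist of a bare IP address without the required square brackets. It is an assumption-laden example, not a recommended policy: the regular expression and the log text are made up for illustration, and extended regular expressions are enabled explicitly with #pragma regex.

```
#pragma regex push +extended

prog helo
do
  # A dotted quad that is not enclosed in square brackets.
  if $1 matches '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'
    echo "HELO argument is a bare IP address: " . $1
  fi
done

#pragma regex pop
```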

Handler: envfrom (string $1, string $2)
Invocation:

Called when the SMTP client sends MAIL FROM command, i.e. once at the beginning of each message.

Arguments:
  1. string; First argument to the MAIL FROM command, i.e. the email address of the sender.
  2. string; Rest of arguments to MAIL FROM separated by space character. This argument can be ‘""’.
Notes:
  1. $1 is not the same as the Sendmail variable $f, because the latter contains the sender email after address rewriting and normalization, while $1 contains exactly the value given by the sending party.
  2. When the array type is implemented, $2 will contain an array of arguments.

Handler: envrcpt (string $1, string $2)
Invocation:

Called once for each RCPT TO command, i.e. once for each recipient, immediately after envfrom.

Arguments:
  1. string; First argument to the RCPT TO command, i.e. the email address of the recipient.
  2. string; Rest of arguments to RCPT TO separated by space character. This argument can be ‘""’.
Notes:

When the array type is implemented, $2 will contain an array of arguments.
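
Because envrcpt is called once per recipient, it can be used to count the recipients of a message. The following is a minimal sketch of that idea; the variable name rcpt_count, the limit of 10 and the reply text are arbitrary choices made for this example.

```
number rcpt_count

# A new message starts: reset the counter.
prog envfrom
do
  set rcpt_count 0
done

# Called once for each RCPT TO command.
prog envrcpt
do
  set rcpt_count rcpt_count + 1
  if rcpt_count > 10
    tempfail 452 4.5.3 "Too many recipients"
  fi
done
```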

Handler: data ()
Invocation:

Called after the MTA receives the SMTP ‘DATA’ command. Notice that this handler is not supported by Sendmail versions prior to 8.14.0 and Postfix versions prior to 2.5.

Arguments:

None

Handler: header (string $1, string $2)
Invocation:

Called once for each header line received after SMTP DATA command.

Arguments:
  1. string; Header field name.
  2. string; Header field value. The content of the header may include folded white space, i.e., multiple lines with following white space where lines are separated by LF (ASCII 10). The trailing line terminator (CR/LF) is removed.

Handler: eoh
Invocation:

This handler is called once per message, after all headers have been sent and processed.

Arguments:

None.

Handler: body (pointer $1, number $2)
Invocation:

This handler is called zero or more times, for each piece of the message body obtained from the remote host.

Arguments:
  1. pointer; Piece of body text. See ‘Notes’ below.
  2. number; Length of data pointed to by $1, in bytes.
Notes:

The first argument points to the body chunk. Its size may be quite considerable and passing it as a string may be costly both in terms of memory and execution time. For this reason it is not passed as a string, but rather as a generic pointer, i.e. an object having the same size as number, which can be used to retrieve the actual contents of the body chunk if the need arises.

A special function body_string is provided to convert this object to a regular MFL string (see Mail body functions). Using it you can collect the entire body text into a single global variable, as illustrated by the following example:

string text

prog body
do
  set text text . body_string($1,$2)
done

The text collected this way can then be used in the eom handler (see below) to parse and analyze it.

If you wish to analyze both the headers and mail body, the following code fragment will do that for you:

string text

# Collect all headers.
prog header
do
  set text text . $1 . ": " . $2 . "\n"
done

# Append terminating newline to the headers.
prog eoh
do
  set text "%text\n"
done

# Collect message body.
prog body
do
  set text text . body_string($1, $2)
done

Handler: eom
Invocation:

This handler is called once per message, when the terminating dot after DATA command has been received.

Arguments:

None

Notes:

This handler is useful for calling message capturing functions, such as sa or clamav. For more information about these, refer to Interfaces to Third-Party Programs.

For your reference, the following table shows each handler with its arguments:

Handler   $1                       $2                               $3    $4
connect   Hostname                 Socket family                    Port  Remote address
helo      HELO domain              N/A                              N/A   N/A
envfrom   Sender email address     Rest of arguments                N/A   N/A
envrcpt   Recipient email address  Rest of arguments                N/A   N/A
data      N/A                      N/A                              N/A   N/A
header    Header name              Header value                     N/A   N/A
eoh       N/A                      N/A                              N/A   N/A
body      Body segment (pointer)   Length of the segment (numeric)  N/A   N/A
eom       N/A                      N/A                              N/A   N/A

Table 4.4: State Handler Arguments

4.12 The ‘begin’ and ‘end’ special handlers

Apart from the milter handlers described in the previous section, MFL defines two special handlers, called ‘begin’ and ‘end’, which supply startup and cleanup instructions for the filter program.

The ‘begin’ special handler is executed once for each SMTP session, after the connection has been established but before the first milter handler has been called. Similarly, the ‘end’ handler is executed exactly once, after the connection has been closed. Neither of them takes any arguments.

The two handlers are defined using the following syntax:

# Begin handler
begin
do
  …
done    

# End handler
end
do
  …
done

where ‘…’ represents any MFL statements.

An MFL program may have multiple ‘begin’ and ‘end’ definitions. They can be intermixed with other definitions. The compiler combines all ‘begin’ statements into a single one, in the order they appear in the sources. Similarly, all ‘end’ blocks are concatenated together. The resulting ‘begin’ is called once, at the beginning of each SMTP session, and ‘end’ is called once at its termination.

Multiple ‘begin’ and ‘end’ handlers are a useful feature for writing modules (see Modules), because each module can thus have its own initialization and cleanup blocks. Notice, however, that in this case the order in which subsequent ‘begin’ and ‘end’ blocks are executed is not defined. It is only guaranteed that all ‘begin’ blocks are executed at startup and all ‘end’ blocks are executed at shutdown. It is also guaranteed that all ‘begin’ and ‘end’ blocks defined within a compilation unit (i.e. a single abstract source file, with all #include and #include_once statements expanded in place) are executed in order of their appearance in the unit.
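
As a small illustration of this rule, consider a single source file with two ‘begin’ blocks and one ‘end’ block. The messages are placeholders chosen for this example.

```
begin
do
  echo "first begin block"
done

begin
do
  echo "second begin block"
done

end
do
  echo "session closed"
done
```

Since both ‘begin’ blocks belong to the same compilation unit, they are combined in the order of their appearance, so the first message is expected to be printed before the second one.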

Due to their special nature, the startup and cleanup blocks impose certain restrictions on the statements that can be used within them:

  1. return cannot be used in ‘begin’ and ‘end’ handlers.
  2. The following Sendmail actions cannot be used in them: accept, continue, discard, reject, tempfail. They can, however, be used in catch statements, declared in ‘begin’ blocks (see example below).
  3. Header manipulation actions (see header manipulation) cannot be used in ‘end’ handler.

The ‘begin’ handlers are the usual place for global initialization code. For example, if you do not want to use DNS caching, you can disable it this way:

begin
do
  db_set_active("dns", 0)
done  

Additionally, you can set up global exception handling routines there. For example, the following ‘begin’ statement disables DNS cache and, for all exceptions not handled otherwise, installs a handler that logs the exception along with the stack trace and continues processing the message:

begin
do
  db_set_active("dns", 0)
  catch *
  do
    echo "Caught exception $1: $2"
    stack_trace()
    continue
  done
done  

4.13 Functions

A function is a named mailfromd subroutine, which takes zero or more parameters and optionally returns a certain value. Depending on the return value, functions can be subdivided into string functions and number functions. A function may have mandatory and optional parameters. When invoked, the function must be supplied at least as many actual arguments as the number of its mandatory parameters.

Functions are invoked using the following syntax:

  name (args)

where name is the function name and args is a comma-separated list of expressions. For example, the following are valid function calls:

  foo(10)
  interval("1 hour")
  greylist("/var/my.db", 180)

The number of parameters a function takes and their data types compose the function signature. When actual arguments are passed to the function, they are converted to types of the corresponding formal parameters.

There are two major groups of functions: built-in functions, that are implemented in the mailfromd binary, and user-defined functions, that are written in MFL. The invocation syntax is the same for both groups.

Mailfromd is shipped with a rich set of library functions. These are described in Library. In addition to these you can define your own functions.

Function definitions can appear anywhere between the handler declarations in a filter program, the only requirement being that the function definition occur before the place where the function is invoked.

The syntax of a function definition is:

[qualifier] func name (param-decl) returns data-type
do
  function-body
done

where name is the name of the function to define, param-decl is a comma-separated list of parameter declarations. The syntax of the latter is the same as that of variable declarations (see Variable declarations), i.e.:

type name

declares the parameter name having the type type. The type is string or number.

The optional qualifier declares the scope of visibility for that function (see scope of visibility). It is similar to that of variables, except that functions cannot be local (i.e. you cannot declare a function within another function).

The public qualifier declares a function that may be referred to from any module, whereas the static qualifier declares a function that may be called only from the current module (see Modules). The default scope is ‘public’, unless specified otherwise in the module declaration (see module structure).

For example, the following declares a function ‘sum’, that takes two numeric arguments and returns a numeric value:

func sum(number x, number y) returns number

Similarly, the following is a declaration of a static function:

static func sum(number x, number y) returns number

Parameters are referenced in the function-body by their name, the same way as other variables. Similarly, the value of a parameter can be altered using set statement.

A function can be declared to take a certain number of optional arguments. In a function declaration, optional abstract arguments must be placed after the mandatory ones, and must be separated from them with a semicolon. The following example is a definition of function foo, which takes two mandatory and two optional arguments:

func foo(string msg, string email; number x, string pfx)

Mandatory parameters are: msg and email. Optional parameters are: x and pfx. The actual number of arguments supplied to the function is returned by a special construct $#. In addition, the special construct @arg evaluates to the ordinal number of variable arg in the list of formal parameters (the first argument has number ‘0’). These two constructs can be used to verify whether an argument is supplied to the function.

When an actual argument for parameter n is supplied, the number of actual arguments ($#) is greater than the ordinal number of that parameter in the declaration list (@n). Thus, the following construct can be used to check if an optional argument arg is actually supplied:

func foo(string msg, string email; number x, string arg)
do
  if $# > @arg
    …
  fi
done

The default mailfromd installation provides a special macro for this purpose: see defined. Using it, the example above could be rewritten as:

func foo(string msg, string email; number x, string arg)
do
  if defined(arg)
    …
  fi
done

Within a function body, optional arguments are referenced exactly the same way as the mandatory ones. An attempt to dereference an optional argument for which no actual parameter was supplied results in an undefined value, so be sure to check whether a parameter is passed before dereferencing it.
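
A common use of this check is to supply a default for an omitted argument. The function below is a hypothetical example written for this purpose (the name greet and its texts are not part of the library); it relies only on the defined macro described above.

```
func greet(string name; string pfx) returns string
do
  # Use the caller's prefix only if it was actually supplied.
  if defined(pfx)
    return pfx . " " . name
  fi
  return "Hello " . name
done
```

With this definition, greet("world") would return ‘Hello world’, while greet("world", "Hi") would return ‘Hi world’.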

A function can also take a variable number of arguments (such functions are called variadic). This is indicated by the use of ellipsis as the last abstract parameter. The statement below defines a function foo taking one mandatory, one optional and any number of additional arguments:

func foo (string a ; string b, ...)

All actual arguments passed in a list of variable arguments are coerced to string data type. To refer to these arguments in the function body, the following construct is used:

$(expr)

where expr is any valid MFL expression, evaluating to a number n. This construct refers to the value of nth actual parameter from the variable argument list. Parameters are numbered from ‘1’, so the first variable parameter is $(1), and the last one is $($# - Nm - No), where Nm and No are numbers of mandatory and optional parameters to the function.

For example, the function below prints all its arguments:

func pargs (string text, ...)
do
  echo "text=%text"
  loop for number i 1,
       while i <= $# - 1,
       set i i + 1
  do
    echo "arg %i=" . $(i)
  done
done

Note the loop limits. The last variable argument has number $# - 1, because the function takes one mandatory argument.

The function-body is any list of valid mailfromd statements. In addition to the statements discussed below (see Statements) it can also contain the return statement, which is used to return a value from the function. The syntax of the return statement is

  return value

As an example of this, consider the following code snippet that defines the function ‘sum’ to return a sum of its two arguments:

func sum(number x, number y) returns number
do
        return x + y
done

The returns part in the function declaration is optional. A declaration lacking it defines a procedure, or void function, i.e. a function that is not supposed to return any value. Such functions cannot be used in expressions, instead they are used as statements (see Statements). The following example shows a function that emits a customized temporary failure notice:

func stdtf()
do
  tempfail 451 4.3.5 "Try again later"
done

A function may have several names. An alternative name (or alias) can be assigned to a function by using alias keyword, placed after param-decl part, for example:

func foo()
alias bar
returns string
do
  …
done

After this declaration, both foo() and bar() will refer to the same function.

The number of function aliases is unlimited. The following fragment declares a function having three names:

func foo()
alias bar
alias baz
returns string
do
  …
done

This feature is rarely needed, but it is occasionally useful.

A variable declared within a function becomes a local variable to this function. Its lexical scope ends with the terminating done statement.

Parameters, local variables and global variables use separate namespaces, so a parameter name can coincide with the name of a global, in which case the parameter is said to shadow the global. All references to its name will refer to the parameter, until the end of its scope is reached, where the global one becomes visible again. Consider the following example:

string x

func foo(string x)
do
  echo "foo: %x"
done

prog envfrom
do
  set x "Global"      
  foo("Local")
  echo x
done

Running mailfromd --test with this configuration will display:

foo: Local
Global

4.13.1 Some Useful Functions

To illustrate the concept of user-defined functions, this subsection shows the definitions of some of the library functions shipped with mailfromd13. These functions are contained in modules installed along with the mailfromd binary. To use any of them in your code, require the appropriate module as described in import, e.g. to use the revip function, do require 'revip'.

Functions and their definitions:

  1. revip

    The function revip (see revip) is implemented as follows:

    func revip(string ip) returns string
    do
      return inet_ntoa(ntohl(inet_aton(ip)))
    done    
    

    Previously it was implemented using regular expressions. Below we include this variant as well, as an illustration for the use of regular expressions:

    #pragma regex push +extended
    func revip(string ip) returns string
    do
      if ip matches '([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)'
        return "\4.\3.\2.\1"
      fi
      return ip
    done
    #pragma regex pop
    
  2. strip_domain_part

    This function returns at most n last components of the domain name domain (see strip_domain_part).

    #pragma regex push +extended
    
    func strip_domain_part(string domain, number n) returns string
    do
      if n > 0 and
        domain matches '.*((\.[^.]+){' . $2 . '})'
        return substring(\1, 1, -1)
      else
        return domain
      fi
    done
    #pragma regex pop
    
  3. valid_domain

    See valid_domain, for a description of this function. Its definition follows:

    require dns
    
    func valid_domain(string domain) returns number
    do
      return not (resolve(domain) = "0" and not hasmx(domain))
    done
    
  4. match_dnsbl

    The function match_dnsbl (see match_dnsbl) is defined as follows:

    require dns
    require match_cidr
    #pragma regex push +extended
    
    func match_dnsbl(string address, string zone, string range)
        returns number
    do
      string rbl_ip
      if range = 'ANY'
        set rbl_ip '127.0.0.0/8'
      else
        set rbl_ip range
        if not range matches '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
          return 0
        fi
      fi
    
      if not (address matches '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
              and address != range)
        return 0
      fi
    
      if address matches
            '^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
        if match_cidr (resolve ("\4.\3.\2.\1", zone), rbl_ip)
          return 1
        else
          return 0
        fi
      fi
      # never reached
    done
    

4.14 Expressions

Expressions are language constructs that evaluate to a value, which can subsequently be echoed, tested in a conditional statement, assigned to a variable or passed to a function.

4.14.1 Constant Expressions

Literals and numbers are constant expressions. They evaluate to string and numeric types, respectively.

4.14.2 Function Calls

A function call is an expression. Its type is the return type of the function.

4.14.3 Concatenation

The concatenation operator is ‘.’ (a dot). For example, if $f is ‘smith’, and $client_addr is ‘10.10.1.1’, then:

$f . "-" . $client_addr ⇒ "smith-10.10.1.1"

Any two adjacent literal strings are concatenated, producing a new string, e.g.

"GNU's" " not " "UNIX" ⇒ "GNU's not UNIX"

4.14.4 Arithmetic Operations

The filter script language offers the common arithmetic operators: ‘+’, ‘-’, ‘*’ and ‘/’. In addition, the ‘%’ is a modulo operator, i.e. it computes the remainder of division of its operands.

All of them follow usual precedence rules and work as you would expect them to.
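
A few worked examples, written in the same notation as the other examples in this chapter. The values are chosen arbitrarily for illustration.

```
2 + 4 * 8 ⇒ 34
(2 + 4) * 8 ⇒ 48
17 % 5 ⇒ 2
```

In the first line the multiplication is performed before the addition; the parentheses in the second line reverse that order.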

4.14.5 Bitwise shifts

The ‘<<’ represents a bitwise shift left operation, which shifts the binary representation of the operand on its left by the number of bits given by the operand on its right.

Similarly, the ‘>>’ represents a bitwise shift right.
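
For instance, shifting left by n bits multiplies a value by 2 to the power n, and shifting right divides it accordingly. The operands below are arbitrary illustrative values.

```
1 << 4 ⇒ 16
256 >> 2 ⇒ 64
```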

4.14.6 Relational Expressions

Relational expressions are:

Expression   Result
x < y        True if x is less than y.
x <= y       True if x is less than or equal to y.
x > y        True if x is greater than y.
x >= y       True if x is greater than or equal to y.
x = y        True if x is equal to y.
x != y       True if x is not equal to y.

Table 4.5: Relational Expressions

The relational expressions apply to strings as well as to numbers. When a relational operation applies to strings, case-sensitive comparison is used, e.g.:

"String" = "string" ⇒ False
"String" < "string" ⇒ True

4.14.7 Special Comparisons

In addition to the traditional relational operators, described above, mailfromd provides two operators for regular expression matching:

Expression      Result
x matches y     True if the string x matches the regexp denoted by y.
x fnmatches y   True if the string x matches the globbing pattern denoted by y.

Table 4.6: Regular Expression Matching

The type of the regular expression used by matches operator is controlled by #pragma regex (see pragma regex). For example:

$f ⇒ "gray@gnu.org.ua"
$f matches '.*@gnu\.org\.ua' ⇒ true
$f matches '.*@GNU\.ORG\.UA' ⇒ false
#pragma regex +icase
$f matches '.*@GNU\.ORG\.UA' ⇒ true

The fnmatches operator compares its left-hand operand with a globbing pattern (see glob(7)) given as its right-hand side operand. For example:

$f ⇒ "gray@gnu.org.ua"
$f fnmatches "*ua" ⇒ true
$f fnmatches "*org" ⇒ false
$f fnmatches "*org*" ⇒ true

Both operators have a special form, for ‘MX’ pattern matching. The expression:

  x mx matches y

is evaluated as follows: first, the expression x is analyzed and, if it is an email address, its domain part is selected. If it is not, its value is used verbatim. Then the list of ‘MX’s for this domain is looked up. Each of the ‘MX’ names is then compared with the regular expression y. If any of the names matches, the expression returns true. Otherwise, its result is false.

Similarly, the expression:

  x mx fnmatches y

returns true only if any of the ‘MX’s for (domain or email) x match the globbing pattern y.

Both mx matches and mx fnmatches can signal the following exceptions: e_temp_failure, e_failure.
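
The following sketch shows how such a test might be combined with handling of these exceptions. It rests on several assumptions that are not confirmed by this section: that a catch statement may list exception names joined with ‘or’, that ‘example.net’ stands for some real domain, and that the reply code and text are suitable for your site.

```
prog envfrom
do
  # If the MX lookup fails, let the message through.
  catch e_temp_failure or e_failure
  do
    continue
  done

  if $f mx fnmatches "*.example.net"
    reject 550 5.7.1 "Mail relayed through example.net is not accepted"
  fi
done
```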

The value of any parenthesized subexpression occurring within the right-hand side argument to matches or mx matches can be referenced using the notation ‘\d’, where d is the ordinal number of the subexpression (subexpressions are numbered from left to right, starting at 1). This notation is allowed in the program text as well as within double-quoted strings and here-documents, for example:

if $f matches '.*@\(.*\)\.gnu\.org\.ua'
  set message "Your host name is \1;"
fi

Remember that the grouping symbols are ‘\(’ and ‘\)’ for basic regular expressions, and ‘(’ and ‘)’ for extended regular expressions. Also make sure you properly escape all special characters (backslashes in particular) in double-quoted strings, or use single-quoted strings to avoid having to do so (see singe-vs-double, for a comparison of the two forms).
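
For comparison, here is a sketch of the same test written with extended regular expressions, where the grouping symbols are plain parentheses. It assumes that extended syntax is enabled with #pragma regex in the same way as in the function definitions shown earlier in this chapter.

```
#pragma regex push +extended

prog envfrom
do
  if $f matches '.*@(.*)\.gnu\.org\.ua'
    echo "Your host name is \1;"
  fi
done

#pragma regex pop
```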

4.14.8 Boolean Expressions

A boolean expression is a combination of relational or matching expressions using the boolean operators and, or and not, and, optionally, parentheses to control nesting:

Expression   Result
x and y      True only if both x and y are true.
x or y       True if any of x or y is true.
not x        True if x is false.

Table 4.1: Boolean Operators

Binary boolean expressions are computed using shortcut evaluation:

x and y

If x ⇒ false, the result is false and y is not evaluated.

x or y

If x ⇒ true, the result is true and y is not evaluated.
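
Shortcut evaluation makes it safe to guard an operation with a preceding test. The function below is a hypothetical example (its name and parameters are invented for this illustration): the division is never attempted when count is zero, because the left-hand operand of and is then false.

```
func ratio_exceeds(number total, number count, number limit)
    returns number
do
  # The right-hand side is evaluated only if count is not zero.
  if count != 0 and total / count > limit
    return 1
  fi
  return 0
done
```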

4.14.9 Operator Precedence

Operator precedence is an abstract value associated with each language operator, that determines the order in which operators are executed when they appear together within a single expression. Operators with higher precedence are executed first. For example, ‘*’ has a higher precedence than ‘+’, therefore the expression a + b * c is evaluated in the following order: first b is multiplied by c, then a is added to the product.

When operators of equal precedence are used together they are evaluated from left to right (i.e., they are left-associative), except for comparison operators, which are non-associative (these are explicitly marked as such in the table below). This means that you cannot write:

if 5 <= x <= 10

Instead, you should write:

if 5 <= x and x <= 10

The precedences of the mailfromd operators were selected so as to match those used in most programming languages.14

The following table lists all operators in order of decreasing precedence:

(...)

Grouping

$ %

Sendmail macros and mailfromd variables

* /

Multiplication, division

+ -

Addition, subtraction

<< >>

Bitwise shift left and right

< <= >= >

Relational operators (non-associative)

= != matches fnmatches

Equality and special comparison (non-associative)

&

Logical (bitwise) AND

^

Logical (bitwise) XOR

|

Logical (bitwise) OR

not

Boolean negation

and

Logical ‘and’.

or

Logical ‘or’.

.

String concatenation

4.14.10 Type Casting

When the two operands of a binary expression have different types, the mailfromd evaluator coerces them to a common type. This is known as implicit type casting. The rules for implicit type casting are:

  1. Both arguments to an arithmetical operation are cast to numeric type.
  2. Both arguments to the concatenation operation are cast to string.
  3. Both arguments to the ‘matches’ or ‘fnmatches’ operator are cast to string.
  4. The argument of the unary negation (arithmetical or boolean) is cast to numeric.
  5. Otherwise the right-hand side argument is cast to the type of the left-hand side argument.

The construct for explicit type cast is:

type(expr)

where type is the name of the type to coerce expr to. For example:

string(2 + 4*8) ⇒ "34"
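
The following worked examples are derived from the rules above and are meant only as an illustration. The first two show implicit casts required by the operator, the third applies the rule that the right-hand side is cast to the type of the left-hand side, and the last one uses an explicit cast.

```
"2" + 3 ⇒ 5
"n=" . 5 ⇒ "n=5"
"10" = 10 ⇒ True
number("42") + 1 ⇒ 43
```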

4.15 Variable and Constant Shadowing

When any two named entities happen to have the same name we say that a name clash occurs. The handling of name clashes depends on types of the entities involved in it.

function – any

The name of a constant or variable can coincide with that of a function. This does not produce any warnings or errors, because functions, variables and constants use different namespaces. For example, the following code is correct:

const a 4

func a()
do
  echo a
done

When executed, it prints ‘4’.

function – function, handler – function, and function – handler

Redefinition of a function or using a predefined handler name (see Handlers) as a function name results in a fatal error. For example, compiling this code:

func a()
do
  echo "1"
done

func a()
do
  echo "2"
done

causes the following error message:

mailfromd: sample.mf:9: syntax error, unexpected
FUNCTION_PROC, expecting IDENTIFIER

handler – variable

A variable name can coincide with a handler name. For example, the following code is perfectly OK:

string envfrom "M"
prog envfrom
do
        echo envfrom
done

handler – handler

If two handlers with the same name are defined, the definition that appears later in the source text replaces the previous one. A warning message is issued, indicating locations of both definitions, e.g.:

mailfromd: sample.mf:116: Warning: Redefinition of handler
`envfrom'
mailfromd: sample.mf:34: Warning: This is the location of the
previous definition

variable – variable

Defining a variable having the same name as an already defined one results in a warning message being displayed. The compilation succeeds. The second variable shadows the first, that is any subsequent references to the variable name will refer to the second variable. For example:

string x "Text"
number x 1

prog envfrom
do
  echo x
done

Compiling this code results in the following diagnostics:

mailfromd: sample.mf:4: Redeclaring `x' as different data type
mailfromd: sample.mf:2: This is the location of the previous
definition

Executing it prints ‘1’, i.e. the value of the last definition of x.

The scope of the shadowing depends on storage classes of the two variables. If both of them have external storage class (i.e. are global ones), the shadowing remains in effect until the end of input. In other words, the previous definition of the variable is effectively forgotten.

If the previous definition is a global, and the shadowing definition is an automatic variable or a function parameter, the scope of this shadowing ends with the scope of the second variable, after which the previous definition (global) becomes visible again. Consider the following code:

set x "initial"

func foo(string x) returns string
do
  return x
done

prog envfrom
do
  echo foo("param")
  echo x
done

Its compilation produces the following warning:

mailfromd: sample.mf:3: Warning: Parameter `x' is shadowing a global

When executed, it produces the following output:

param
initial
State envfrom: continue

variable – constant

If a constant is defined which has the same name as a previously defined variable (the constant shadows the variable), the compiler prints the following diagnostic message:

file:line: Warning: Constant name `name' clashes with a variable name
file:line: Warning: This is the location of the previous definition

A similar diagnostic is issued if a variable is defined whose name coincides with a previously defined constant (the variable shadows the constant).

In any case, any subsequent notation %name refers to the last defined symbol, be it variable or constant.

Notice that shadowing occurs only when using the %name notation. Referring to the constant by its name without ‘%’ avoids shadowing effects.

If a variable shadows a constant, the scope of the shadowing depends on the storage class of the variable. For automatic variables and function parameters, it ends with the final done closing the function. For global variables, it lasts up to the end of input.

For example, consider the following code:

const a 4

func foo(string a)
do
  echo a
done

prog envfrom
do
  foo(10)
  echo a
done

When run, it produces the following output:

$ mailfromd --test sample.mf                        
mailfromd: sample.mf:3: Warning: Variable name `a' clashes with a
constant name
mailfromd: sample.mf:1: Warning: This is the location of the previous
definition
10
4
State envfrom: continue

constant – constant

Redefining a constant produces a warning message. The latter definition shadows the former. Shadowing remains in effect until the end of input.

4.16 Statements

Statements are language constructs that, unlike expressions, do not return any value. Statements execute some actions, such as assigning a value to a variable, or serve to control the execution flow in the program.

4.16.1 Action Statements

An action statement instructs mailfromd to perform a certain action over the message being processed. There are two kinds of actions: reply actions and header manipulation actions.

Reply Actions

Reply actions tell Sendmail to return the given response code to the remote party. There are five such actions:

accept

Return an accept reply. The remote party will continue transmitting its message.

reject code excode message-expr
reject (code-expr, excode-expr, message-expr)

Return a reject reply. The remote party will have to cancel transmitting its message. The three arguments are optional; their usage is described below.

tempfail code excode message
tempfail (code-expr, excode-expr, message-expr)

Return a ‘temporary failure’ reply. The remote party can retry sending its message later. The three arguments are optional; their usage is described below.

discard

Instructs Sendmail to accept the message and silently discard it without delivering it to any recipient.

continue

Stops the current handler and instructs Sendmail to continue processing of the message.

Two actions, reject and tempfail, can take up to three optional parameters. There are two ways of supplying these parameters.

In the first form, called literal or traditional notation, the arguments are supplied as additional words after the action name, and are separated by whitespace. The first argument is a three-digit RFC 2821 reply code. It must begin with ‘5’ for reject and with ‘4’ for tempfail. If two arguments are supplied, the second argument must be either an extended reply code (RFC 1893/2034) or a textual string to be returned along with the SMTP reply. Finally, if all three arguments are supplied, then the second one must be an extended reply code and the third one must give the textual string. The following examples illustrate the possible ways of using the reject statement:

reject
reject 503
reject 503 5.0.0
reject 503 "Need HELO command"
reject 503 5.0.0 "Need HELO command"

The term textual string, used above, means either a literal string or an MFL expression that evaluates to a string. However, both the code and the extended code must always be literal.

The second form of supplying arguments is called functional notation, because it resembles the function syntax. When used in this form, the action word is followed by a parenthesized group of exactly three arguments, separated by commas. Each argument is a MFL expression. The meaning and ordering of the arguments is the same as in literal form. Any or all of these three arguments may be absent, in which case it will be replaced by the default value. To illustrate this, here are the statements from the previous example, written in functional notation:

reject(,,)
reject(503,,)
reject(503, 5.0.0)
reject(503, , "Need HELO command")
reject(503, 5.0.0, "Need HELO command")

Notice that there is an important difference between the two notations. The functional notation allows both reply codes to be computed at run time, e.g.:

  reject(500 + dig2*10 + dig3, "5.%edig2.%edig3")
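A slightly fuller sketch of the functional notation follows. The variable names code and xcode, the domain ‘example.org’ and the message text are assumptions made for illustration only. All three arguments are ordinary MFL expressions evaluated at run time:

```
prog envfrom
do
  number code 550
  string xcode "5.7.1"

  /* Both codes come from variables; the message is built
     by concatenation. */
  if domainpart($f) = "example.org"
    reject(code, xcode, "Mail from " . $f . " is not accepted here")
  fi
done
```

The same statement could not be written in literal notation, because there both the code and the extended code must be literal.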

Header Actions

Header manipulation actions provide basic means to add, delete or modify the message RFC 2822 headers.

add name string

Add the header name with the value string. E.g.:

add "X-Seen-By" "Mailfromd 8.7"

(notice argument quoting)

replace name string

The same as add, but if the header name already exists, it will be removed first, for example:

replace "X-Last-Processor" "Mailfromd 8.7"

delete name

Delete the header named name:

delete "X-Envelope-Date"

These actions impose some restrictions. First of all, their first argument must be a literal string (not a variable or expression). Secondly, there is no way to select a particular header instance to delete or replace, which may be necessary to properly handle multiple headers (e.g. ‘Received’). For more elaborate ways of header modifications, see Header modification functions.

4.16.2 Variable Assignments

An assignment is a special statement that assigns a value to a variable. It has the following syntax:

set name value

where name is the variable name and value is the value to be assigned to it.

Assignment statements can appear in any part of a filter program. If an assignment occurs outside of function or handler definition, the value must be a literal value (see Literals). If it occurs within a function or handler definition, value can be any valid mailfromd expression (see Expressions). In this case, the expression will be evaluated and its value will be assigned to the variable. For example:

set delay 150

prog envfrom
do
  set delay delay * 2
  …
done

4.16.3 The pass statement

The pass statement has no effect. It is used in places where no statement is needed, but the language syntax requires one:

on poll $f do
when success:
  pass
when not_found or failure:
  reject 550
done

4.16.4 The echo statement

The echo statement concatenates all its arguments into a single string and sends it to the syslog using the priority ‘info’. It is useful for debugging your script, in conjunction with built-in constants (see Built-in constants), for example:

func foo(number x)
do
  echo "%__file__:%__line__: foo called with arg %x"
  …
done

4.17 Conditional Statements

Conditional statements, or conditionals for short, test some conditions and alter the control flow depending on the result. There are two kinds of conditional statements: if-else branches and switch statements.

The syntax of an if-else branching construct is:

  if condition then-body [else else-body] fi

Here, condition is an expression that governs control flow within the statement. Both then-body and else-body are lists of mailfromd statements. If condition is true, then-body is executed, if it is false, else-body is executed. The ‘else’ part of the statement is optional. The condition is considered false if it evaluates to zero, otherwise it is considered true. For example:

if $f = ""
  accept
else
  reject
fi

This will accept the message if the value of the Sendmail macro $f is an empty string, and reject it otherwise. Both then-body and else-body can be compound statements including other if statements. Nesting level of conditional statements is not limited.

To facilitate writing complex conditional statements, the elif keyword can be used to introduce alternative conditions, for example:

if $f = ""
  accept
elif $f = "root"
  echo "Mail from root!"
else
  reject
fi

Another type of branching instruction is switch statement:

switch condition
do
case x1 [or x2 …]:  
  stmt1
case y1 [or y2 …]:
  stmt2
  .
  .
  .
[default:
  stmt]
done

Here, x1, x2, y1, y2 are literal expressions; stmt1, stmt2 and stmt are arbitrary mailfromd statements (possibly compound); condition is the controlling expression. The vertical row of dots stands for any further ‘case’ branches.

This statement is executed as follows: the condition expression is evaluated and if its value equals x1 or x2 (or any other x from the first case), then stmt1 is executed. Otherwise, if condition evaluates to y1 or y2 (or any other y from the second case), then stmt2 is executed. Other case branches are tried in turn. If none of them matches, stmt (called the default branch) is executed.

There can be as many case branches as you wish. The default branch is optional. There can be at most one default branch.

An example of switch statement follows:

switch x
do
case 1 or 3:
  add "X-Branch" "1"
  accept
case 2 or 4 or 6:
  add "X-Branch" "2"
default:
  reject
done  

If the value of mailfromd variable x is 1 or 3, it will accept the message immediately, and add an ‘X-Branch: 1’ header to it. If x equals 2 or 4 or 6, this code will add an ‘X-Branch: 2’ header to the message and will continue processing it. Otherwise, it will reject the message.

The controlling condition of a switch statement may evaluate to numeric or string type. The type of the condition governs the type of comparisons used in case branches: for numeric types, numeric equality will be used, whereas for string types, string equality is used.
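To illustrate the string case, here is a minimal sketch of a switch whose controlling expression is a string. The domain names and the reply text are placeholders chosen for the example. Because domainpart returns a string, each case label is compared using string equality:

```
prog envfrom
do
  switch domainpart($f)
  do
  case "example.org" or "example.net":
    accept
  case "spam.example":
    reject 550 5.7.1 "Mail from this domain is not accepted"
  default:
    continue
  done
done
```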

4.18 Loop Statements

The loop statement allows for repeated execution of a block of code, controlled by some conditional expression. It has the following form:

loop [label]
     [for stmt1] [,while expr1] [,stmt2]
do
  stmt3
done [while expr2]

where stmt1, stmt2, and stmt3 are statement lists, expr1 and expr2 are expressions.

The control flow is as follows:

  1. If stmt1 is specified, execute it.
  2. Evaluate expr1. If it is zero, go to 6. Otherwise, continue.
  3. Execute stmt3.
  4. If stmt2 is supplied, execute it.
  5. If expr2 is given, evaluate it. If it is zero, go to 6. Otherwise, go to 2.
  6. End.

Thus, stmt3 is executed repeatedly until either expr1 or expr2 yields a zero value.

The loop body (stmt3) can contain special statements:

break [label]

Terminates the loop immediately. Control passes to ‘6’ (End) in the formal definition above. If label is supplied, the statement terminates the loop statement marked with that label. This makes it possible to break out of nested loops.

It is similar to break statement in C or shell.

next [label]

Initiates the next iteration of the loop. Control passes to ‘4’ in the formal definition above. If label is supplied, the statement starts the next iteration of the loop statement marked with that label. This makes it possible to request the next iteration of an upper-level loop from a nested loop statement.
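The use of a label can be sketched as follows. The function name has_factor_pair and the label scan are hypothetical, introduced only for this illustration. The function searches for two numbers below 10 whose product equals target and leaves both loops as soon as a pair is found:

```
func has_factor_pair(number target) returns number
do
  number found 0
  loop scan for number i 1, while i < 10, set i i + 1
  do
    loop for number j 1, while j < 10, set j j + 1
    do
      if i * j = target
        set found 1
        break scan    /* terminates the outer loop, not just the inner one */
      fi
    done
  done
  return found
done
```

A plain break at the same place would terminate only the inner loop, and the outer loop would go on with the next value of i.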

The loop statement can be used to create iterative statements of arbitrary complexity. Let’s illustrate it in comparison with C.

The statement:

loop
do
  stmt-list
done

creates an infinite loop. The only way to exit from such a loop is to call break (or return, if used within a function), somewhere in stmt-list.

The following statement is equivalent to while (expr) stmt-list in C:

loop while expr
do
  stmt-list
done

The C construct for (expr1; expr2; expr3) is written in MFL as follows (stmt1 and stmt2 play the roles of expr1 and expr3, and stmt3 is the loop body):

loop for stmt1, while expr2, stmt2
do
  stmt3
done        

For example, to repeat stmt3 10 times:

loop for set i 0, while i < 10, set i i + 1
do
  stmt3
done

Finally, the C ‘do’ loop is implemented as follows:

loop 
do
  stmt-list
done while expr

As a real-life example of a loop statement, let’s consider the implementation of function ptr_validate, which takes a single argument ipstr, and checks its validity using the following algorithm:

Perform a DNS reverse-mapping for ipstr, looking up the corresponding PTR record in ‘in-addr.arpa’. For each record returned, look up its IP addresses (A records). If ipstr is among the returned IP addresses, return 1 (true), otherwise return 0 (false).

The implementation of this function in MFL is:

#pragma regex push +extended

func ptr_validate(string ipstr) returns number
do
  loop for string names dns_getname(ipstr) . " "
           number i index(names, " "),
       while i != -1,
       set names substr(names, i + 1)
       set i index(names, " ")
  do
    loop for string addrs dns_getaddr(substr(names, 0, i)) . " "
             number j index(addrs, " "),
         while j != -1,
         set addrs substr(addrs, j + 1)
         set j index(addrs, " ")
    do
      if ipstr == substr(addrs, 0, j)
        return 1
      fi
    done
  done
  return 0
done
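A possible way of calling this function from a handler is sketched below. It assumes that the MTA exports the Sendmail macro ‘{client_addr}’ to the filter and that the reply text suits local policy; both are illustrative assumptions. Note also that the DNS functions used by ptr_validate can signal exceptions, which this sketch does not handle:

```
prog envfrom
do
  /* ptr_validate returns 0 if no PTR name maps back to the address */
  if ptr_validate(${client_addr}) = 0
    reject 550 5.7.1 "Reverse DNS check failed"
  fi
done
```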

4.19 Exceptional Conditions

When the running program encounters a condition it is not able to handle, it signals an exception. To illustrate the concept, let’s consider the execution of the following code fragment:

  if primitive_hasmx(domainpart($f))
    accept
  fi

The function primitive_hasmx (see primitive_hasmx) tests whether the domain name given as its argument has any ‘MX’ records. It should return a boolean value. However, when querying the Domain Name System, it may fail to get a definite result. For example, the DNS server can be down or temporarily unavailable. In other words, primitive_hasmx can be in a situation when, instead of returning ‘yes’ or ‘no’, it has to return ‘don't know’. It has no way of doing so, therefore it signals an exception.

Each exception is identified by exception type, an integer number associated with it.

4.19.1 Built-in Exceptions

The lowest 19 exception numbers are reserved for built-in exceptions. These are declared in module status.mf. The following table summarizes all built-in exception types implemented by mailfromd version 8.7:

e_dbfailure

General database failure. For example, the database cannot be opened. This exception can be signaled by any function that queries any DBM database.

e_divzero

Division by zero.

e_exists

This exception is emitted by dbinsert built-in if the requested key is already present in the database (see dbinsert).

e_eof

Function reached end of file while reading. See I/O functions, for a description of functions that can signal this exception.

e_failure
failure

A general failure has occurred. In particular, this exception is signaled by DNS lookup functions when any permanent failure occurs. This exception can be signaled by any DNS-related function (hasmx, poll, etc.) or operation (mx matches).

e_format

Invalid input format. This exception is signaled if input data to a function are improperly formatted. In version 8.7 it is signaled by message_burst function if its input message is not formatted according to RFC 934. See Message digest functions.

e_invcidr

Invalid CIDR notation. This is signaled by match_cidr function when its second argument is not a valid CIDR.

e_invip

Invalid IP address. This is signaled by match_cidr function when its first argument is not a valid IP address.

e_invtime

Invalid time interval specification. It is signaled by interval function if its argument is not a valid time interval (see time interval specification).

e_io

An error occurred during the input-output operation. See I/O functions, for a description of functions that can signal this exception.

e_macroundef

A Sendmail macro is undefined.

e_noresolve

The argument of a DNS-related function cannot be resolved to host name or IP address. Currently only ismx (see ismx) raises this exception.

e_range

The supplied argument is outside the allowed range. This is signalled, for example, by substring function (see substring).

e_regcomp

Regular expression cannot be compiled. This can happen when a regular expression (a right-hand argument of a matches operator) is built at run time and the produced string is an invalid regex.

e_ston_conv

String-to-number conversion failed. This can be signaled when a string is used in numeric context which cannot be converted to the numeric data type. For example:

 set x "10a"
 if x / 2
   …

The if condition will signal e_ston_conv, since ‘10a’ cannot be converted to a number.

e_temp_failure
temp_failure

A temporary failure has occurred. This can be signaled by DNS-related functions or operations.

e_url

The supplied URL is invalid. See Interfaces to Third-Party Programs.

In addition to these, two symbols are defined that are not exception types in the strict sense of the word, but are provided to make writing filter scripts more convenient. These are success, meaning successful return from a function, and not_found, meaning that the required entity (e.g. domain name or email address) was not found. See Figure 4.1, for an illustration on how these can be used. For consistency with other exception codes, these can be spelled as e_success and e_not_found.
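As a small illustration of a built-in exception type, the sketch below intercepts e_divzero using the ‘try--catch’ statement described in Exception Handling. The function name safe_div and the choice of 0 as the fallback value are assumptions made for the example:

```
require status

func safe_div(number a, number b) returns number
do
  try
  do
    return a / b
  done
  catch e_divzero
  do
    /* division by zero: fall back to 0 instead of aborting */
    return 0
  done
done
```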

4.19.2 User-defined Exceptions

You can define your own exception types using the dclex statement:

dclex type

In this statement, type must be a valid MFL identifier, not used for another constant (see Constants). The dclex statement defines a new exception identified by the constant type and allocates a new exception number for it.

The type can subsequently be used in throw and catch statements, for example:

dclex myrange

func fact(number val)
  returns number
do
  if val < 0
    throw myrange "fact argument is out of range"
  fi
  …
done

4.19.3 Exception Handling

Normally, when an exception is signalled, program execution is terminated and a tempfail status is returned to the MTA. Additional information regarding the exception is then output to the logging channel (see Logging and Debugging). However, the user can intercept any exception by installing their own exception-handling routines.

An exception-handling routine is introduced by a try–catch statement, which has the following syntax:

try
do
  stmtlist
done
catch exception-list
do
  handler-body
done

where stmtlist and handler-body are sequences of MFL statements and exception-list is the list of exception types, separated by the word or. A special exception-list ‘*’ is allowed and means all exceptions.

This construct works as follows. First, the statements from stmtlist are executed. If the execution finishes successfully, control is passed to the first statement after the ‘catch’ block. Otherwise, if an exception is signalled and this exception is listed in exception-list, the execution is passed to the handler-body. If the exception is not listed in exception-list, it is handled as usual.

The following example shows a ‘try--catch’ construct used for handling possible exceptions signalled by primitive_hasmx.

try
do
  if primitive_hasmx(domainpart($f))
    accept
  else
    reject
  fi
done
catch e_failure or e_temp_failure
do
  echo "primitive_hasmx failed"
  continue
done

The ‘try--catch’ statement can appear anywhere inside a function or a handler, but it cannot appear outside of them. It can also be nested within another ‘try--catch’, in either of its parts. Upon exit from a function or milter handler, all exceptions are restored to the state they had when it was entered.

A catch block can also be used alone, without preceding try part. Such a construct is called a standalone catch. It is mostly useful for setting global exception handlers in a begin statement (see begin/end). When used within a usual function or handler, the exception handlers set by a standalone catch remain in force until either another standalone catch appears further in the same function or handler, or an end of the function is encountered, whichever occurs first.

A standalone catch defined within a function must return from it by executing return statement. If it does not do that explicitly, the default value of 1 is returned. A standalone catch defined within a milter handler must end execution with any of the following actions: accept, continue, discard, reject, tempfail. By default, continue is used.
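For instance, a standalone catch in a milter handler might look like the sketch below. The reply codes and texts are illustrative choices. The catch body ends with an explicit tempfail action, so the default continue is not used:

```
prog envfrom
do
  catch e_failure or e_temp_failure
  do
    echo "MX lookup failed: $2"
    tempfail 451 4.3.0 "Temporary lookup failure, try again later"
  done

  /* Any DNS exception raised below is handled by the catch above */
  if primitive_hasmx(domainpart($f))
    accept
  fi
done
```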

It is not recommended to mix ‘try--catch’ constructs and standalone catches. If a standalone catch appears within a ‘try--catch’ statement, its scope of visibility is undefined.

Upon entry to a handler-body, two implicit positional arguments are defined, which can be referenced in handler-body as $1 and $2. The first argument gives the numeric code of the exception that has occurred. The second argument is a textual string containing a human-readable description of the exception.

The following is an improved version of the previous example, which uses these parameters to supply more information about the failure:

try
do
  if primitive_hasmx(domainpart($f))
    accept
  else
    reject
  fi
done
catch e_failure or e_temp_failure
do
  echo "Caught exception $1: $2"
  continue
done

The following example defines the function hasmx that returns true if the domain part of its argument has any ‘MX’ records, and false if it does not or if an exception occurs 15.

func hasmx (string s)
  returns number
do
  try
  do
    return primitive_hasmx(domainpart(s))
  done
  catch *
  do
    return 0
  done
done

The same function can be written using a standalone catch:

func hasmx (string s)
  returns number
do
  catch *
  do
    return 0
  done
  return primitive_hasmx(domainpart(s))
done

All variables remain visible within catch body, with the exception of positional arguments of the enclosing handler. To access positional arguments of a handler from the catch body, assign them to local variables prior to the ‘try--catch’ construct, e.g.:

prog header
do
  string hname $1
  string hvalue $2
  try
  do
    …
  done  
  catch *
  do
    echo "Exception $1 while processing header %hname: %hvalue"
    echo $2
    tempfail
  done
done

You can also generate (or raise) exceptions explicitly in the code, using throw statement:

throw excode descr

The arguments correspond exactly to the positional parameters of the catch statement: excode gives the numeric code of the exception, descr gives its textual description. This statement can be used in complex scripts to create non-local exits from deeply nested statements.

Notice that the excode argument must be an immediate value: an exception identifier (either a built-in one or one declared previously using a dclex statement).
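The following sketch puts dclex, throw and catch together. The exception name myrange follows the earlier dclex example, while the function checked_fact and the return value -1 are hypothetical, chosen only to show how a caller might intercept a user-defined exception:

```
dclex myrange

func fact(number val) returns number
do
  number result 1
  if val < 0
    throw myrange "fact argument is out of range"
  fi
  loop for number i 2, while i <= val, set i i + 1
  do
    set result result * i
  done
  return result
done

func checked_fact(number val) returns number
do
  try
  do
    return fact(val)
  done
  catch myrange
  do
    /* $1 is the exception code, $2 the text passed to throw */
    echo "Caught exception $1: $2"
    return -1
  done
done
```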

4.20 Sender Verification Tests

The filter script language provides a wide variety of functions for sender address verification, or polling for short. These functions, which were described in SMTP Callout functions, can be used to implement any sender verification method. The additional data that may be needed is normally supplied by two global variables: ehlo_domain, keeping the default domain for the EHLO command, and mailfrom_address, which stores the sender address for probe messages (see Predefined variables).

For example, the simplest way to implement standard polling would be:

prog envfrom
do
  if stdpoll($1, ehlo_domain, mailfrom_address) == 0
    accept
  else
    reject 550 5.1.0 "Sender validity not confirmed"
  fi
done

However, this does not take into account exceptions that stdpoll can signal. To handle them, one has to use catch, for example:

require status

prog envfrom
do
  try
  do
    if stdpoll($1, ehlo_domain, mailfrom_address) == 0
      accept
    else
      reject 550 5.1.0 "Sender validity not confirmed"
    fi
  done
  catch e_failure or e_temp_failure
  do
    switch $1
    do
    case failure:
      reject 550 5.1.0 "Sender validity not confirmed"
    case temp_failure:
      tempfail 450 4.1.0 "Try again later"
    done
  done
done

If polls are used often, one can define a wrapper function, and use it instead. The following example illustrates this approach:

func poll_wrapper(string email) returns number
do
  catch e_failure or e_temp_failure
  do
    return $1    /* the code of the exception that was caught */
  done
  return stdpoll(email, ehlo_domain, mailfrom_address)
done

prog envfrom
do
  switch poll_wrapper($f)
  do
  case success:
    accept
  case not_found or failure:
    reject 550 5.1.0 "Sender validity not confirmed"
  case temp_failure:
    tempfail 450 4.1.0 "Try again later"
  done
done

Figure 4.1: Building Poll Wrappers

Notice the way envfrom handles success and not_found, which are not exceptions in the strict sense of the word.

The above paradigm is so common that mailfromd provides a special language construct to simplify it: the on statement. Instead of manually writing the wrapper function and using it as a switch condition, you can rewrite the above example as:

prog envfrom
do
  on stdpoll($1, ehlo_domain, mailfrom_address)
  do
  when success:
    accept
  when not_found or failure:
    reject 550 5.1.0 "Sender validity not confirmed"
  when temp_failure:
    tempfail 450 4.1.0 "Try again later"
  done
done

Figure 4.2: Standard poll example

As you can see, the statement is quite similar to switch. The major syntactic difference is the use of the keyword when to introduce conditional branches.

General syntax of the on statement is:

on condition
do
  when x1 [or x2 …]:  
    stmt1
  when y1 [or y2 …]:
    stmt2
    .
    .
    .
done

The condition is either a function call or a special poll statement (see below). The values used in when branches are normally symbolic exception names (see exception names).

When the compiler processes the on statement it does the following:

  1. Builds a unique wrapper function, similar to that described in Figure 4.1. The name of the function is constructed from the condition function name and an unsigned number, called the exception mask, that is unique for each combination of exceptions used in when branches. To avoid name clashes with user-defined functions, the wrapper name begins and ends with ‘$’, which normally is not allowed in identifiers.
  2. Translates the on body to the corresponding switch statement.

A special form of the condition is the poll keyword, whose syntax is:

poll [for] email
     [host host]
     [from domain]
     [as email]

The order of particular keywords in the poll statement is arbitrary, for example as email can appear before email as well as after it.

The simplest form, poll email, performs the standard sender verification of email address email. It is translated to the following function call:

  stdpoll(email, ehlo_domain, mailfrom_address)

The construct poll email host host runs the strict sender verification of address email on the given host. It is translated to the following call:

  strictpoll(host, email, ehlo_domain, mailfrom_address)

Other keywords of the poll statement modify these two basic forms. The as keyword introduces the email address to be used in the SMTP MAIL FROM command, instead of mailfrom_address. The from keyword sets the domain name to be used in EHLO command. So, for example the following construct:

  poll email host host from domain as addr

is translated to

  strictpoll(host, email, domain, addr)
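A complete handler using all of these keywords might be sketched as follows. The host name, EHLO domain and probe address are placeholders invented for the example. According to the translation rule above, this poll corresponds to a strictpoll call with the four values given:

```
prog envfrom
do
  on poll $f host "mx.example.org"
             from "filter.example.net"
             as "postmaster@example.net"
  do
  when success:
    accept
  when not_found or failure:
    reject 550 5.1.0 "Sender validity not confirmed"
  when temp_failure:
    tempfail 450 4.1.0 "Try again later"
  done
done
```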

To summarize the above, the code described in Figure 4.2 can be written as:

prog envfrom
do
  on poll $f do
  when success:
    accept
  when not_found or failure:
    reject 550 5.1.0 "Sender validity not confirmed"
  when temp_failure:
    tempfail 450 4.1.0 "Try again later"
  done
done

4.21 Modules

A module is a logically isolated part of code that implements a separate concern or feature and contains a collection of conceptually united functions and/or data. Each module occupies a separate compilation unit (i.e. file). The functionality provided by a module is incorporated into another module or the main program by requiring this module or by importing the desired components from it.

4.21.1 Declaring Modules

A module file must begin with a module declaration:

module modname [interface-type].

Note the final dot.

The modname parameter declares the name of the module. It is recommended that it be the same as the file name without the ‘.mf’ extension. The module name must be a valid MFL literal. It also must not coincide with any defined MFL symbol, therefore we recommend always quoting it (see example below).

The optional parameter interface-type defines the default scope of visibility for the symbols declared in this module. If it is ‘public’, then all symbols declared in this module are made public (importable) by default, unless explicitly declared otherwise (see scope of visibility). If it is ‘static’, then all symbols, not explicitly marked as public, become static. If the interface-type is not given, ‘public’ is assumed.

The actual MFL code follows the ‘module’ line.

The module definition is terminated by the logical end of its compilation unit, i.e. either by the end of file, or by the keyword bye, whichever occurs first.

Special keyword bye may be used to prematurely end the current compilation unit before the physical end of the containing file. Any material between bye and the end of file is ignored by the compiler.

Let’s illustrate these concepts by writing a module ‘revip’:

module 'revip' public.

func revip(string ip)
  returns string
do
  return inet_ntoa(ntohl(inet_aton(ip)))
done

bye

This text is ignored.  You may put any additional
documentation here.

4.21.2 Scope of Visibility

Scope of Visibility of a symbol defines from where this symbol may be referred to. Symbols in MFL may have either of the following two scopes:

Public

Public symbols are visible from the current module, as well as from any external modules, including the main script file, provided that they are properly imported (see import).

Static

Static symbols are visible only from the current module. There is no way to refer to them from outside.

The default scope of visibility for all symbols declared within a module is defined in the module declaration (see module structure). It may be overridden for any individual symbol by prefixing its declaration with an appropriate qualifier: either public or static.
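The sketch below shows how the qualifiers interact with the module default. The module name ‘greet’ and both function names are hypothetical. The module is declared static, so the helper prefix stays private, while greet is exported explicitly:

```
module 'greet' static.

/* No qualifier: takes the module default, i.e. static */
func prefix() returns string
do
  return "Hello, "
done

/* Explicitly public: importable from other modules */
public func greet(string name) returns string
do
  return prefix() . name
done
```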

4.21.3 Require and Import

Functions or variables declared in another module must be imported prior to their actual use. MFL provides two ways of doing so: by requiring the entire module or by importing selected symbols from it.

Module Import: require modname

The require statement instructs the compiler to locate the module modname and to load all public interfaces from it.

The compiler looks for the file modname.mf in the current search path (see include search path). If no such file is found, a compilation error is reported.

For example, the following statement:

require revip

imports all interfaces from the module revip.mf.
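Once the module is required, its public functions can be called like any others. The following sketch assumes that the module ‘revip’ shown earlier is installed in the search path and that the MTA exports the Sendmail macro ‘{client_addr}’; both are assumptions made for illustration:

```
require revip

prog envfrom
do
  echo "reversed client address: " . revip(${client_addr})
done
```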

Another, more sophisticated way to import from a module is to use the ‘from ... import’ construct:

from module import symbols.

Note the final dot. The ‘from’ and ‘module’ statements are the only two constructs in MFL that require the delimiter.

The module has the same semantics as in the require construct. The symbols is a comma-separated list of symbol names to import from module. A symbol name may be given in several forms:

  1. Literal

    Literals specify exact symbol names to import. For example, the following statement imports from module A.mf symbols ‘foo’ and ‘bar’:

    from A import foo,bar.
    
  2. Regular expression

    Regular expressions must be surrounded by slashes. A regular expression instructs the compiler to import all symbols whose names match that expression. For example, the following statement imports from A.mf all symbols whose names begin with ‘foo’ and contain at least one digit after it:

    from A import '/^foo.*[0-9]/'.
    

    The type of regular expressions used in the ‘from’ statement is controlled by #pragma regex (see regex).

  3. Regular expression with transformation

    A regular expression may be followed by an s-expression, i.e. a sed-like expression of the form:

    s/regexp/replace/[flags]
    

    where regexp is a regular expression, replace is a replacement for each part of the input that matches regexp. S-expressions and their parts are discussed in detail in s-expression.

    The effect of such construct is to import all symbols that match the regular expression and apply the s-expression to their names.

    For example:

    from A import '/^foo.*[0-9]/s/.*/my_&/'.
    

    This statement imports all symbols whose names begin with ‘foo’ and contain at least one digit after it, and renames them, by prefixing their names with the string ‘my_’. Thus, if A.mf declared a function ‘foo_1’, it becomes visible under the name of ‘my_foo_1’.

4.22 MFL Preprocessor

Before compiling the script file, mailfromd preprocesses it. The built-in preprocessor handles only file inclusion (see include), while the rest of traditional facilities, such as macro expansion, are supported via m4, which is used as an external preprocessor.

The detailed description of m4 facilities lies far beyond the scope of this document. You will find a complete user manual in the GNU M4 macro processor documentation. For the rest of this section we assume the reader is sufficiently acquainted with the m4 macro processor.

The external preprocessor is invoked with the -s flag, instructing it to include line synchronization information in its output, which is subsequently used by the MFL compiler for error reporting. The initial set of macro definitions is supplied in the file pp-setup, located in the library search path16, which is fed to the preprocessor input before the script file itself. The default pp-setup file renames all m4 built-in macro names so that they all start with the prefix ‘m4_’17. It changes the comment characters to the ‘/*’, ‘*/’ pair, and leaves the default quoting characters, the grave (‘`’) and acute (‘'’) accents, unchanged. Finally, pp-setup defines the following macros:

M4 Macro: boolean defined (identifier)

The identifier must be the name of an optional abstract argument to the function. This macro must be used only within a function definition. It expands to the MFL expression that yields true if the actual parameter is supplied for identifier. For example:

func rcut(string text; number num)
  returns string
do
  if (defined(num))
    return substr(text, length(text) - num)
  else
    return text
  fi
done

This function returns the last num characters of text if num is supplied, and the entire text otherwise, e.g.:

rcut("text string") ⇒ "text string"
rcut("text string", 3) ⇒ "ing"

Invoking the defined macro with the name of a mandatory argument always yields true.
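For readers more familiar with general-purpose languages, the same idea can be sketched in Python, where a None default plays the role of an omitted optional argument. This is an analogy only: MFL's defined macro expands to an MFL expression, whereas the check below is an ordinary Python test.

```python
def rcut(text, num=None):
    """Return the last num characters of text, or all of text if num is omitted."""
    if num is not None:               # plays the role of defined(num)
        return text[len(text) - num:]
    return text

print(rcut("text string"))
print(rcut("text string", 3))
```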

M4 Macro: printf (format, …)

Provides a printf statement that formats its optional parameters in accordance with format and sends the resulting string to the current log output (see Logging and Debugging). See String formatting, for a description of format.

Example usage:

printf('Function %s returned %d', funcname, retcode)

M4 Macro: string _ (msgid)

A convenience macro. Expands to a call to gettext (see NLS Functions).

M4 Macro: string_list_iterate (list, delim, var, code)

This macro is intended to compensate for the lack of an array data type in MFL. It splits the string list into segments delimited by the string delim. For each segment, the MFL code code is executed. The code can use the variable var to refer to the segment string.

For example, the following fragment prints names of all existing directories listed in the PATH environment variable:

string path getenv("PATH")
string seg

string_list_iterate(path, ":", seg, `
     if access(seg, F_OK)
       echo "%seg exists"
     fi')

Care should be taken to properly quote its arguments. In the code below the string str is treated as a comma-separated list of values. To avoid interpreting the comma as argument delimiter the second argument must be quoted:

string_list_iterate(str, `","', seg, `
     echo "next segment: " . seg')
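The iteration performed by string_list_iterate can be pictured with the following Python sketch. It is a model of the behavior described above, not the macro's actual expansion; in particular, how the macro treats empty segments is not covered by this text, so the sketch simply follows Python's str.split.

```python
def string_list_iterate(items, delim, code):
    """Call code(segment) for every delim-separated segment of items."""
    for seg in items.split(delim):
        code(seg)

# Collect the segments of a hypothetical PATH-like value
seen = []
string_list_iterate("/usr/bin:/bin:/usr/local/bin", ":", seen.append)
print(seen)
```

In Python the delimiter is an ordinary string argument, so no extra quoting is needed; in m4 the comma has to be protected from the macro argument parser, which is why the second example above quotes it.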

M4 Macro: N_ (msgid)

A convenience macro that expands to msgid verbatim. It is intended to mark literal strings that should appear in the .po file in places where an actual call to gettext (see NLS Functions) cannot be used. For example:

/* Mark the variable for translation: cannot use gettext here */
string message N_("Mail accepted")

prog envfrom
do
  …
  /* Translate and log the message */ 
  echo gettext(message)
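The same "mark now, translate later" idiom exists in other gettext-based environments. The Python sketch below uses the standard gettext module to show the division of labor; the names MESSAGE and log_message are hypothetical and no message catalog is assumed to be installed.

```python
import gettext

def N_(msgid):
    """Marker only: return msgid unchanged so extraction tools can find it."""
    return msgid

# Marked for translation at definition time, but not translated here
MESSAGE = N_("Mail accepted")

def log_message():
    # The lookup in the message catalog happens at the point of use
    return gettext.gettext(MESSAGE)

print(log_message())
```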

You can obtain the preprocessed output, without starting actual compilation, using the -E command line option:

$ mailfromd -E file.mf

The output is preprocessed source code, which is sent to the standard output. This can be useful, among other things, for debugging your own macro definitions.

Macro definitions and deletions can be made on the command line, by using the -D and -U options. They have the following format:

-D name[=value]
--define=name[=value]

Define the symbol name to have the value value. If value is not supplied, the value is taken to be the empty string. The value can be any string, and the macro can be defined to take arguments, just as if it were defined from within the input using the m4_define statement.

For example, the following invocation defines the symbol COMPAT with the value 43:

$ mailfromd -DCOMPAT=43

-U name
--undefine=name

A counterpart of the -D option is the option -U (--undefine). It undefines a preprocessor symbol whose name is given as its argument. The following example undefines the symbol COMPAT:

$ mailfromd -UCOMPAT
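The name[=value] convention can be illustrated with a small Python model. This is not how mailfromd parses its command line; the function apply_macro_options is hypothetical, and processing the options strictly from left to right is an assumption made for the sketch.

```python
def apply_macro_options(options):
    """Apply ('D', 'name[=value]') and ('U', 'name') pairs in order."""
    defs = {}
    for kind, arg in options:
        if kind == "D":
            # A missing '=value' part leaves the value as the empty string
            name, _, value = arg.partition("=")
            defs[name] = value
        elif kind == "U":
            # In this sketch, undefining an unknown name is a no-op
            defs.pop(arg, None)
    return defs

print(apply_macro_options([("D", "COMPAT=43"), ("D", "DEBUG")]))
```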

The following two options are supplied mainly for debugging purposes:

--no-preprocessor

Disables the external preprocessor.

--preprocessor=command

Use command as the external preprocessor. Be especially careful with this option, because mailfromd cannot verify whether command is actually a preprocessor.

4.23 Example of a Filter Script File

In this section we discuss a working example of a filter script file. For ease of illustration, it is divided into several parts, each of which is introduced by a short explanation of its function.

This filter assumes that the mailfromd.conf file contains the following:

relayed-domain-file (/etc/mail/sendmail.cw,
                     /etc/mail/relay-domains);
io-timeout 33;
database cache {
  negative-expire-interval 1 day;
  positive-expire-interval 2 weeks;
};

Of course, the exact parameter settings may vary; what is important is that they be declared. See Mailfromd Configuration, for a description of mailfromd configuration file syntax.

Now, let’s return to the script. Its first part defines the configuration settings for this host:

#pragma regex +extended +icase

set mailfrom_address "<>"
set ehlo_domain "gnu.org.ua"

The second part loads the necessary source modules:

require 'status'
require 'dns'
require 'rateok'

Next we define the envfrom handler. Its first two rules accept all mail coming from the null address and from the machines for which we relay. The third rule rejects clients whose IP address does not resolve to a host name:

prog envfrom
do
  if $f = "" 
    accept
  elif relayed hostname($client_addr)
    accept
  elif hostname($client_addr) = $client_addr
    reject 550 5.7.7 "IP address does not resolve"

The next rule rejects all messages coming from hosts with dynamic IP addresses. The regular expression used to catch such hosts is not 100% fail-proof, but it tries to cover most existing host naming patterns:

   elif hostname($client_addr) matches
         ".*(adsl|sdsl|hdsl|ldsl|xdsl|dialin|dialup|\
ppp|dhcp|dynamic|[-.]cpe[-.]).*"
     reject 550 5.7.1 "Use your SMTP relay"
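To get a feeling for what this expression catches, it can be tried outside of mailfromd. The Python sketch below uses the same pattern with case-insensitive matching, which corresponds to the ‘+icase’ flag set by the #pragma regex line; the host names are made up for the illustration, and Python's re engine is only assumed to agree with POSIX extended syntax for this particular pattern.

```python
import re

DYNAMIC_HOST = re.compile(
    r".*(adsl|sdsl|hdsl|ldsl|xdsl|dialin|dialup|"
    r"ppp|dhcp|dynamic|[-.]cpe[-.]).*",
    re.IGNORECASE,   # corresponds to '+icase' in the #pragma regex line
)

def looks_dynamic(hostname):
    """True if the host name follows one of the dynamic-address naming patterns."""
    return DYNAMIC_HOST.search(hostname) is not None

for host in ("adsl-12-34.example.net", "host.CPE.example.com", "mail.example.org"):
    print(host, looks_dynamic(host))
```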

Messages coming from machines whose host names contain something similar to an IP address are subject to strict checking:

   elif hostname($client_addr) matches
   ".*[0-9]{1,3}[-.][0-9]{1,3}[-.][0-9]{1,3}[-.][0-9]{1,3}.*"
     on poll host $client_addr for $f do
     when success:
       pass
     when not_found or failure:
       reject 550 5.1.0 "Sender validity not confirmed"
     when temp_failure:
       tempfail
     done

If the sender domain is relayed by any of the ‘yahoo.com’ or ‘namaeserver.com’ ‘MX’s, no checks are performed. We will greylist this message in the envrcpt handler:

   elif $f mx fnmatches "*.yahoo.com"
        or $f mx fnmatches "*.namaeserver.com"
     pass
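The condition above holds if any MX of the sender domain matches the globbing pattern. The Python sketch below models just that "any MX matches" test; the DNS lookup is replaced by hard-coded, hypothetical MX lists, and the helper any_mx_matches is not a mailfromd function.

```python
from fnmatch import fnmatchcase

def any_mx_matches(mx_hosts, pattern):
    """True if at least one MX host name matches the glob pattern."""
    return any(fnmatchcase(mx.lower(), pattern) for mx in mx_hosts)

# Hypothetical results of MX lookups for two sender domains
print(any_mx_matches(["mx1.mail.yahoo.com", "mx2.mail.yahoo.com"], "*.yahoo.com"))
print(any_mx_matches(["mail.example.org"], "*.yahoo.com"))
```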

Finally, if the message does not meet any of the above conditions, it is verified by the standard procedure:

   else
     on poll $f do
     when success:
       pass
     when not_found or failure:
       reject 550 5.1.0 "Sender validity not confirmed"
     when temp_failure:
       tempfail
     done
   fi

At the end of the handler we check whether the sender-client pair exceeds the allowed mail sending rate:

   if not rateok("$f-$client_addr", interval("1 hour 30 minutes"), 100)
     tempfail 450 4.7.0 "Mail sending rate exceeded.  Try again later"
   fi
done
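The check above allows at most 100 messages per 5400 seconds (1 hour 30 minutes) for each sender-client pair. The Python sketch below shows one simple way to enforce such a limit, a sliding-window counter. It is a stand-in technique chosen for illustration and should not be read as the algorithm used by the rateok function; the class name and the explicit now parameter are assumptions of the sketch.

```python
from collections import defaultdict, deque

class SlidingWindowRate:
    """Count events per key within the last `interval` seconds."""

    def __init__(self):
        self._events = defaultdict(deque)

    def rateok(self, key, interval, threshold, now):
        q = self._events[key]
        # Forget events that have fallen out of the window
        while q and q[0] <= now - interval:
            q.popleft()
        q.append(now)
        return len(q) <= threshold

limiter = SlidingWindowRate()
# One message per second from the same sender-client pair
results = [limiter.rateok("user@example.org-192.0.2.1", 5400, 100, now=t)
           for t in range(101)]
print(results.count(True), results.count(False))
```

The first 100 messages pass and the 101st exceeds the limit, which is the point where the handler would return a temporary failure.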

The next part defines the envrcpt handler. Its primary purpose is to greylist messages from some domains that could not be checked otherwise:

prog envrcpt
do
  set gltime 300
  if $f mx fnmatches "*.yahoo.com"
     or $f mx fnmatches "*.namaeserver.com"
     and not dbmap("/var/run/whitelist.db", $client_addr)
    if greylist("$client_addr-$f-$rcpt_addr", gltime)
      if greylist_seconds_left = gltime
        tempfail 450 4.7.0
               "You are greylisted for %gltime seconds"
      else
        tempfail 450 4.7.0
               "Still greylisted for " .
               %greylist_seconds_left . " seconds"
      fi
    fi
  fi
done
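The handler relies on two properties of greylisting: the first attempt for a given key starts a waiting period of gltime seconds, and later attempts within that period report how many seconds remain. The Python sketch below models only this logic with an in-memory dictionary. It is an assumption-laden illustration, not the implementation of the greylist function; the class, the seconds_left attribute and the explicit now parameter are hypothetical.

```python
class Greylist:
    """Minimal in-memory model of a greylisting table."""

    def __init__(self):
        self._expires = {}
        self.seconds_left = 0   # stands in for greylist_seconds_left

    def greylist(self, key, interval, now):
        expiry = self._expires.get(key)
        if expiry is None:
            # First time this key is seen: start the waiting period
            self._expires[key] = now + interval
            self.seconds_left = interval
            return True
        if now < expiry:
            # Still inside the waiting period
            self.seconds_left = expiry - now
            return True
        self.seconds_left = 0
        return False

g = Greylist()
key = "192.0.2.1-sender@example.org-rcpt@example.net"
print(g.greylist(key, 300, now=0), g.seconds_left)
print(g.greylist(key, 300, now=100), g.seconds_left)
print(g.greylist(key, 300, now=300), g.seconds_left)
```

On the first call the remaining time equals the full interval, which is the case the handler detects with the comparison of greylist_seconds_left and gltime.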

4.24 Reserved Words

For your reference, here is an alphabetical list of all reserved words:

Several keywords are context-dependent: mx is a keyword if it appears before matches or fnmatches. The following strings are keywords in the context of an on statement:

The following keywords are preprocessor macros:

Any keyword beginning with a ‘m4_’ prefix is a reserved preprocessor symbol.

Footnotes

(9)

There are two noteworthy exceptions: require and from ... import statements, which must be terminated with a period. See import.

(10)

Implementation note: actually, the references are not interpreted within the string, instead, each such string is split at compilation time into a series of concatenated atoms. Thus, our sample string will actually be compiled as:

$f . " last connected from " . last_ip . ";"

See Concatenation, for a description of this construct. You can easily see how various strings are interpreted by using --dump-tree option (see --dump-tree). In this case, it will produce:

  CONCAT:
    CONCAT:
      CONCAT:
        SYMBOL: f
        CONSTANT: " last connected from "
      VARIABLE last_ip (13)
    CONSTANT: ";"

(11)

The subexpressions are numbered by the positions of their opening parentheses, left to right.

(12)

The subexpressions are numbered by the positions of their opening parentheses, left to right.

(13)

Notice that these are intended for educational purposes and do not necessarily coincide with the actual definitions of these functions in Mailfromd version 8.7.

(14)

The only exception is ‘not’, whose precedence in MFL is much lower than usual (in most programming languages it has the same precedence as unary ‘-’). This makes it possible to write conditional expressions in a more understandable manner. Consider the following condition:

if not x < 2 and y = 3

It is understood as “if x is not less than 2 and y equals 3”, whereas with the usual precedence for ‘not’ it would have meant “if negated x is less than 2 and y equals 3”.

(15)

This function is part of the mailfromd library. See hasmx.

(16)

It is usually located in /usr/local/share/mailfromd/8.7/include/pp-setup.

(17)

This is similar to the GNU m4 --prefix-builtins option. This approach was chosen to allow for using non-GNU m4 implementations as well.
