
Filtering email with Sieve

Mail Sorter

Sieve is an easy-to-use, server-based solution that can help admins intelligently filter a flood of email.

By Florian Effenberger

The sheer volume of email traffic today causes problems for administrators and users alike. It's not just private messages that end up in your mailbox, but also business correspondence, and even notifications from authorities. Mailing lists and newsletters also do their best to fill your inbox with all kinds of mail. To avoid drowning in this flood of mail, you need an intelligent filtering system that automatically sorts your mail into the right folders.

Users will normally set up these filters directly in the email client because both desktop applications (for example, Mozilla Thunderbird and KMail) and web mailers (Roundcube and SquirrelMail, to name two) offer fairly comprehensive filtering options. This approach is also the simplest – as long as you only retrieve your email on a single system.

But if you use multiple devices, things can get difficult. Although IMAP keeps your mailboxes identical on every client, the protocol doesn't support filtering rules. You could theoretically copy the settings back and forth between your clients, but that is time-consuming. And, as soon as you add smartphones and tablet PCs, things really start to get tricky.

More Convenience with Sieve

If you manage an IMAP server and want to do your users a favor, consider offering them Sieve. This tool lets them define filter rules for incoming messages, much as Procmail does. The biggest advantage is that filtering occurs directly on the server and thus happens automatically for all clients, independently of the operating system. The rules are applied in exactly the same way whether you read mail in Thunderbird on the desktop, on your Android phone, or on your tablet PC. Sieve is the quintessential "IMAP-style" approach to filtering mail.

From the administrator's point of view, it is a good thing that Sieve integrates directly as a plugin with the Mail Delivery Agent (MDA) and thus offers excellent performance. In contrast, Procmail is typically executed as a separate program and thus causes more load. Also, Sieve's syntax is much simpler than the legacy, and slightly cryptic, Procmail language.

Sieve is available for a number of IMAP servers and, besides being able to move messages into subfolders, can also:

Forward messages to third parties
Delete messages identified as spam or infected by viruses
Send bounces (although you need to be careful with this)
Set labels and IMAP tags
Send vacation or out of office messages

Both actions and matching criteria can be combined arbitrarily. For example, you can tag, forward, and automatically respond to a specific message, depending on the sender and the subject line, with just a couple of lines of code. All of the elements map to an easy-to-understand syntax. For security reasons, Sieve doesn't support the execution of external programs, in contrast to Procmail, but it does support extensions that add functionality.
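As a minimal sketch of such a combination, the following script flags, forwards, and answers messages from one sender with a particular subject. The addresses, the subject keyword, and the reply text are placeholders invented for illustration; the syntax itself is explained step by step later in this article.

```sieve
require ["imap4flags", "vacation"];

# Hypothetical sender and keyword; replace them with your own values.
if allof (address :is "From" "boss@example.com",
          header :contains "Subject" "report") {
    addflag "\\Flagged";              # tag the message
    keep;                             # store the flagged copy in the inbox
    redirect "deputy@example.com";    # forward a copy to a third party
    vacation :days 1
             :subject "Report received"
             "Thank you, your report has arrived and was forwarded to my deputy.";
}
```

The explicit keep is there because redirect on its own would cancel the default delivery to the inbox. Whether the auto-reply is actually sent also depends on the server recognizing the recipient address as yours.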

The Right Location in Postfix

Our lab system runs the current LTS version of Ubuntu, 10.04 (Lucid Lynx), which comes with Postfix 2.7 and Dovecot 1.2. For the following example, both components should be installed and functional, and the individual user accounts should come from the /etc/passwd file – that is, you should not be using LDAP or MySQL. To install Postfix, run apt-get install postfix; the apt-get install dovecot-imapd command puts Dovecot on your system.

The question as to where Sieve is deployed often causes confusion. Because it filters incoming messages, you might assume that it is a component of the SMTP server – Postfix, in this case – but that's not true. Although Sieve is called by Postfix, the implementation is actually a part of the IMAP server.

The underlying principle is simple. Postfix, as the SMTP server, receives incoming messages and works through its configuration: address resolution, virus scanner, spam filters, graylisting, and many other things. After Postfix has accepted the message, has run it through all of its own filters, and is ready to put it into the user's mailbox, a component known as the Mail Delivery Agent (MDA) comes into play. It is the MDA that actually delivers the message, and on this system the task is handled by a Sieve-capable Dovecot component [1].

This approach has two decisive advantages. First, the filter sees only messages that haven't already been discarded; messages that are rejected as spam directly when they arrive never reach Sieve, which saves resources. Second, Sieve can access the headers that SpamAssassin, ClamAV, or the policy daemon added while Postfix was processing the message.

The MDA is configured using the mailbox_command option in the /etc/postfix/main.cf file. On our lab system, the correct connection between Sieve and Postfix looks like Listing 1.

Listing 1: Postfix Configuration

mailbox_delivery_lock = dotlock, fcntl
virtual_mailbox_lock = dotlock, fcntl
home_mailbox = Maildir/
mailbox_command = /usr/lib/dovecot/deliver

After making the changes, you need to tell Postfix to reload the configuration by issuing the postfix reload command.

Dovecot Configuration

Using Dovecot's deliver as your MDA doesn't automatically enable Sieve because you first need to configure the protocol. Again, I will assume that the basic Dovecot installation is functional and that messages are correctly distributed to the IMAP mailboxes.

On Ubuntu, the server configuration is located in /etc/dovecot/dovecot.conf. You need to enable two important options in the lda section. The postmaster_address option names a technical contact for the mail server and is displayed in bounces and system messages; it makes sense to enter a dedicated alias name here. The mail_plugins option loads the Sieve plugin, which is necessary because Dovecot only loads the extension if the administrator expressly asks for it.

A complete lda section would look like this:

protocol lda {
  postmaster_address = postmaster@company.tld
  mail_plugins = sieve
}

The /etc/init.d/dovecot restart command tells the server to reload the configuration, and this concludes the installation of Dovecot.

And, Now, for the Users…

The administrator's work has actually been concluded once you've successfully implemented all the required steps and made Sieve available for each user on the IMAP server. In the following section, I will look at steps that users need to take to manage their own filter rules and will explore the Sieve syntax.

As a general rule, every IMAP server includes its own Sieve implementation, and the details can vary, for example, in terms of the file names for the filter commands or the plugin names. Having said this, the fundamentals of the language are specified, and you should be able to port the major part of your rules if you change your mail server. The following section uses the Sieve implementation in Dovecot 1.2.1, which Ubuntu 10.04 provides as a package.

By default, all of the Sieve control files reside in the user's home directory. For an initial test, you might want to use a dedicated user to avoid impacting your regular mail operations by accidentally deleting or moving mail. Sieve is quite user friendly; an incorrect filter rule will not prevent email from being delivered. Instead, the filter rules are not loaded in this case, and the messages are left unfiltered. Of course, even this action could have an undesirable side effect if you receive a large volume of mail.

You'll need to watch out for several files in the Dovecot implementation. The central file is the .dovecot.sieve control file (note the dot at the start), in which all the filter rules are stored. When this file changes, Dovecot automatically compiles it into .dovecot.svbin to accelerate processing when the next messages arrive. Another important file is .dovecot.sieve.log, which the filter system uses by default to store error messages. If vacation messages are in use, .dovecot.lda-dupes will also exist; it tracks all the auto-replies that have been sent.

Listing 2 shows an initial overview of the file structure.

Listing 2: Sieve Example

# -- load plug-ins -

# -- Move messages into another folder -
require "fileinto";

# -- Filter rules -

if header :contains "X-Spam-Flag" "YES" {
        fileinto "Junk";
}

The script starts by loading all of the required extensions, in this case fileinto, which is responsible for moving messages into other folders. The individual rules follow in the second part; Sieve will process them one after another, ignoring comments that start with a #. In the example, messages with the word YES in their X-Spam-Flag headers are moved into the Junk folder – this would be useful for automatically moving any unsolicited mail identified by SpamAssassin. To test the setup, all you need is a message with the GTUBE pattern [2]. Successful filtering is immediately shown in /var/log/mail.log:

Aug  2 10:15:51 mail dovecot: deliver(sieve): sieve: msgid=<4E37B272.2060705@meinefirma.tld>: stored mail into mailbox 'Junk'

In the Sieve language, you start by stating the condition, which then leads to the action to be executed. Both conditions and actions can be combined, as you can see from the example in Listing 3.

Listing 3: Conditions and Actions

require ["fileinto", "imap4flags"];
if anyof (header :contains "X-Spam-Flag" "YES",
          header :contains "Subject" "<ADV>",
          header :contains "Subject" "[SPAM]") {
        setflag "\\Seen";
        fileinto "Junk";
        stop;
}

In the example, the conditions are listed inside the parentheses; the anyof test means that any one of them is sufficient. In other words, if only one of the three headers contains the content shown here, the rule still applies. The related actions are listed below this in the curly brackets. In the example, any messages identified as spam are moved to the Junk folder and additionally flagged as "Seen." The setflag command required to do this is provided by the imap4flags extension loaded at the top of the file.

The final stop command is also useful; it says that no further action needs to be taken for this message, assuming that the corresponding rule has been applied. This is useful, for example, if messages from mailing lists or newsletters are tagged and you don't want to send a vacation notice (described later on) in response.
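The effect of stop is easiest to see with two rules that could both match the same message. The following is only a sketch; the list name and folder names are invented for illustration.

```sieve
require "fileinto";

# Rule 1: newsletter mail goes to its own folder. The stop command
# ends script processing for this message right here.
if header :contains "List-Id" "newsletter.example.com" {
    fileinto "INBOX.Newsletter";
    stop;
}

# Rule 2: only reached by messages that did not match rule 1.
if header :contains "Subject" "invoice" {
    fileinto "INBOX.Invoices";
}
```

Without the stop, a newsletter that happens to have "invoice" in its subject line would match both rules and end up in both folders.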

Sieve really shines when you need to sort mailing lists. Although many email clients, and web mail systems in particular, filter by the subject line or recipient, Sieve supports clean detection of mailing lists via the List-Id or List-Post headers (Listing 4), which Mailman, for example, sets by default. This approach gives you a reliable method for detecting whether a message was sent through a mailing list or whether the recipient received it directly.

Listing 4: Filtering List Mails

require "fileinto";
if header :matches "List-Id" "<announce.de.libreoffice.org>" {
   fileinto "INBOX.announce@de-libo";
   stop;
}
if allof (header :matches "From" "*+owner@*",
          header :matches "Subject" "Moderation for*used") {
   fileinto "INBOX.Moderation";
   stop;
}

The first rule moves messages from the German-language LibreOffice announcement list to a folder below the inbox. The second rule files moderation requests, which arrive from a list's owner address, in a folder of their own. Folders and subfolders are separated by a dot in typical IMAP style.

The difference between the :contains and :matches match types also becomes clear here. Whereas the former simply expects the text to occur somewhere in the header, the latter compares the whole header value with the pattern: the text must occur exactly as stated, with the asterisk acting as a placeholder for any sequence of characters. Another new thing in this example is the allof test, which requires all of the conditions to apply, rather than just one of them. In other words, both the sender and the subject line must match for the rule to be applied.
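To set the match types side by side, here is a short sketch. The subjects and folder names are made up, and the third rule uses :is, a match type from the base Sieve specification that the listings in this article do not use.

```sieve
require "fileinto";

# :contains is a substring test: "Nightly backup finished" and
# "Re: backup plan" both match.
if header :contains "Subject" "backup" {
    fileinto "INBOX.Backup";
    stop;
}

# :matches compares the entire header value with a pattern.
# "*" stands for any run of characters (including none),
# "?" for exactly one character.
if header :matches "Subject" "Ticket ???? closed*" {
    fileinto "INBOX.Tickets";
    stop;
}

# :is demands an exact, complete match; it is also the default
# when no match type is given.
if header :is "Precedence" "bulk" {
    fileinto "INBOX.Bulk";
    stop;
}
```

All three comparisons ignore case by default, so "BACKUP" in a subject line is caught by the first rule as well.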

Quotation marks inside a search term are escaped in Sieve with a backslash, as shown in Listing 5.

Listing 5: Search

require ["fileinto", "imap4flags"];
if allof (header :matches "From" "\"Florian Effenberger\"<floeff@documentfoundation.org>",
          header :matches "X-Mailer" "EPOC Email Version 2.10") {
   setflag "\\Seen";
   fileinto "Sent";
   stop;
}

Incidentally, this filter handles the task of moving messages sent by legacy Symbian devices into the sent folder. Many of these devices are only capable of sending a copy to themselves; thus, the filter first queries the sender and the mail client, and then moves the messages.

On Vacation

Most people will be back from summer vacation by the time this article appears, but when you go on vacation again, Sieve can help. Although there has been much debate about the sense or nonsense of vacation messages, some people don't want to be without them. The biggest challenge here is to respond only to personal mail – auto-responders that respond to newsletters or mailing lists are more than a nuisance.

Automatic responses can really become problematic if their recipient also uses an auto-reply to answer them: incorrectly configured mail systems will then continually send messages back and forth, which can bring the mail system to its knees in a very short period of time. Clean implementations like Sieve, however, check a variety of criteria, such as the envelope sender, list headers, and much more, before they dare to send a message. At the same time, these tools keep records of any messages that have been sent, thereby ensuring that the same correspondent receives only one reply within a certain period of time.

It makes sense to configure the filter file first to move any mailing list messages, newsletters, and automated messages to another directory, and then call the stop command to terminate processing. An auto-reply is sent for messages that do not match the filters. Implementing a vacation auto-responder just takes a couple of lines (see Listing 6).

Listing 6: On Vacation

require "vacation";
vacation
:days 14
# for multiple addresses, use the list format ["address1", "address2", "address3"]
:addresses "floeff@mycompany.tld"
:subject "Out of office"
"This is an automatically generated message.

I am out of the office until September 5th, 2011. Your
e-mail will not be forwarded, but replied to after my
return. In case of an emergency, please call our
headquarters. This message will only be sent out once.";

This 'On Vacation' example starts by loading the required extension, vacation. The :days option sets the minimum interval between replies to the same sender, 14 days in the example. The :addresses option lists your own addresses; Sieve sends the reply only if one of the addresses it knows to be yours appears in the To or Cc field (Figure 1), so aliases that the server cannot work out by itself must be listed here. The subject line is optional.

Figure 1: Sieve will also send vacation notifications if desired.
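Putting the pieces together, the recommended order (sort out list and automated mail first, then reply) might look like the following sketch. The folder names and the use of the Precedence header to spot bulk mail are assumptions made for illustration, not part of the lab setup.

```sieve
require ["fileinto", "vacation"];

# 1. Mailing lists: any message carrying a List-Id header is filed and
#    processing stops, so the vacation action below is never reached.
if exists "List-Id" {
    fileinto "INBOX.Lists";
    stop;
}

# 2. Newsletters and automated mail, recognized here by the Precedence header.
if header :is "Precedence" ["bulk", "list", "junk"] {
    fileinto "INBOX.Bulk";
    stop;
}

# 3. Everything left over is treated as personal mail and gets the auto-reply.
vacation
    :days 14
    :addresses "floeff@mycompany.tld"
    :subject "Out of office"
    "I am out of the office and will reply after my return.";
```

Because each filter rule ends with stop, the order of the rules decides who gets an auto-reply; moving the vacation command to the top of the file would defeat the purpose.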

Sieve Can Do More

Besides the functions that I have explained here, Sieve has many other tricks up its expertly tailored sleeve. However, these go well beyond the scope of this introductory article. If you understand the basics of the language, you can quickly define comprehensive filtering rules with some help from the numerous sample scripts [3] and tutorials [4] available on the web. To remove the need to test all of your scripts locally, you might also consider using a web validator [5]. It is also useful to read the documentation for your choice of IMAP server for product-specific tips.

One question remains unanswered: How do my scripts get to the server? Although smaller installations might allow users to access their home directories by FTP (insecure!), WebDAV, or even SSH, the ManageSieve service [6] is recommended for larger environments. It gives mail clients and web mail systems a separate protocol for installing filters, without direct access to the system.