Spam protection using SpamAssassin
Well Filtered
Spam in your Inbox at home is a nuisance you can hardly avoid, but what is merely irritating at home is a genuine problem in the business world. The proportion of email advertising messages can be greater than 50 percent, forcing employees to check every piece of email and manually dump at least every second message in the Trash. Spam is a dynamic, not a static, problem, and spammers usually respond very quickly and cleverly to countermeasures of any kind.
You can get the flood of advertising email under control to a certain degree with the use of spam filters. Much like antivirus programs, spam protection needs to be updated continually if it is to provide protection. Ideally, the filter should be located in the enterprise at the central node through which incoming email traffic runs and where the most efficient filtering is possible.
Combination of Techniques
Intelligent spam filters like SpamAssassin [1] employ various solutions (Figure 1). Black and white lists explicitly exclude or include email addresses. A content filter checks the header and body content. Statistical tests and URL block lists are also used for spam detection and subsequent processing.
Sophisticated solutions like SpamAssassin use Bayesian filters, which are self-learning text filters that cull junk email on the basis of content, in theory at least. In practice, however, filters suffer from significant error rates, particularly false positives, which relegate legitimate email to the Junk mailbox.
In principle, email filtering can take place on the client side or the server side, and often the two approaches are combined. The best option to protect as many users as possible from unwanted email is to implement server-side spam filtering. The Mail Transfer Agent (MTA) receives the incoming email and passes it to the spam filter, which returns it to the MTA. Depending on the result of the test, the email is either sorted into the user's mailbox or moved to a special Spam folder. The client then retrieves the filtered mail but can also access the rejected mail if required.
SpamAssassin filters email in two phases: Phase 1 detects spam and phase 2 processes the email classified as spam. The SpamAssassin spam detector expands the headers with a corresponding note, and the MTA then implements the processing of this information.
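How the delivery step can act on that header note is sketched below in a few lines of shell. The sketch assumes SpamAssassin's default headers, where a message classified as spam carries X-Spam-Flag: YES. The function name is_spam is made up for illustration, and a real setup would usually leave this decision to Procmail or the MTA's own filter rules.

```shell
# Hypothetical helper: reads one message on stdin and succeeds if the
# header block contains "X-Spam-Flag: YES" (the flag SpamAssassin adds
# to mail it classifies as spam, assuming default header settings).
is_spam() {
    # sed prints everything up to the first blank line (the end of the
    # headers) and then stops, so body text cannot trigger a match.
    sed '/^$/q' | grep -qi '^X-Spam-Flag:[[:space:]]*YES'
}

# Usage sketch:
#   if is_spam < message.eml; then echo "file into Spam folder"; fi
```

Restricting the check to the header block matters because a spammer could otherwise place the same string in the message body.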
The core of SpamAssassin is a rules engine that applies previously established rules, so you can determine which detection methods are used (e.g., Bayesian filtering, the network test, or the whitelist). SpamAssassin comes with simple text files containing a standard set of rules. Both users and administrators can modify these rules. The Bayes filter – a key component of SpamAssassin – then uses its own database with statistical data from previously processed spam and ad-free email. The auto blacklist/whitelist in turn creates its own database.
Commissioning SpamAssassin
The SpamAssassin spam filter is available via the package manager of any common Linux distribution, be it Debian, openSUSE, or another platform. The installation is simple. To install SpamAssassin manually, download the current archive and copy it to $HOME/src as in Listing 1.
Listing 1: Installing SpamAssassin
cd $HOME
mkdir src
cd src
wget http://www.apache.org/dist/spamassassin/Mail-SpamAssassin-3.4.0.tar.gz
tar xvzf Mail-SpamAssassin-3.4.0.tar.gz
cd Mail-SpamAssassin-3.4.0
perl Makefile.PL PREFIX=$HOME && make && make install
Confirm four times by pressing Enter, and make sure you are using the version just installed, which you should find in /home/user_name/bin/spamassassin.
Next, you should perform a test to see whether the spam filter was installed correctly:
spamassassin < $HOME/src/Mail-SpamAssassin-3.4.0/sample-spam.txt
Output appears on the console telling you that SpamAssassin is creating the user settings file and ensuring that the environment is functional. A separate configuration, as with many other environments, is not necessary.
After the installation, you can turn to the central configuration file local.cf, which you will usually find in the directory /etc/mail/spamassassin. The file looks roughly as shown in Figure 2.
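For orientation, a handful of commonly used local.cf settings can be sketched as follows. The helper writes an example fragment to a file of your choice. The function name is hypothetical and all values are illustrative assumptions rather than recommendations, so review the result before merging anything into /etc/mail/spamassassin/local.cf.

```shell
# Hypothetical helper: writes an example configuration fragment to the
# file named in $1. All values are illustrative assumptions.
write_example_local_cf() {
    cat > "$1" <<'EOF'
# Score at which a message is classified as spam
required_score 5.0
# Prefix the subject line of messages classified as spam
rewrite_header Subject *****SPAM*****
# 0 = keep the original message and only add X-Spam headers
report_safe 0
# Enable the Bayes filter and let it learn from clear-cut messages
use_bayes 1
bayes_auto_learn 1
EOF
}

# Usage sketch: write to a scratch file and review it first.
#   write_example_local_cf ./local.cf.example
```

After merging settings into the real configuration, spamassassin --lint is a quick way to catch syntax errors.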
A mailbox for spam on the email server side allows a client to download email after viewing. You can also use the SpamAssassin Configuration Generator [2] for creating your own configuration. This provides you with a web form in which you can determine the cornerstones of the SpamAssassin configuration and export the configuration file.
Optimizing the Spam Filter
Once you have set up a functional filter system, you can turn to optimizing the environment as the next step. The main problem with using SpamAssassin is how to prevent or minimize false positives. Spammers have also learned over the years and camouflage their advertising messages increasingly well. You need to consider several aspects to reduce the number of messages that are incorrectly identified as spam. First, when you send mail, make sure not to use suspicious subject lines or content. Receivers can work with whitelists or change the assessments that SpamAssassin triggers. Administrators should optimize the use of the Bayes filter in particular.
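Optimizing the Bayes filter mostly means training it with correctly sorted mail. The following is a minimal sketch that assumes one directory of known spam and one of known legitimate mail (ham), with one message per file. The function name and the directory layout are assumptions for illustration.

```shell
# Hypothetical helper: trains the Bayes database from two directories.
#   $1: directory of messages known to be spam
#   $2: directory of messages known to be legitimate (ham)
train_bayes() {
    [ -d "$1" ] && [ -d "$2" ] || {
        echo "usage: train_bayes SPAM_DIR HAM_DIR" >&2
        return 2
    }
    sa-learn --spam "$1" || return 1
    sa-learn --ham "$2" || return 1
    # Print database statistics, including the learned spam/ham counts
    sa-learn --dump magic
}

# Usage sketch (paths are examples only):
#   train_bayes ~/Maildir/.Spam/cur ~/Maildir/cur
```

With default settings, the Bayes filter only starts contributing to the score once it has learned at least 200 spam and 200 ham messages, so the statistics printed at the end are worth a look.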
One goal of the SpamAssassin developers is to make static whitelisting redundant. SpamAssassin has been using the TxRep plugin (reputation plugin) since spring 2014; thanks to its advanced functions, it replaces the auto-whitelist (AWL) plugin.
Like its predecessor, TxRep tracks the assessments of previously received messages and adjusts them as necessary. The status can, however, change for senders who were previously regarded as harmless. From a certain rating they are considered spam distributors. In contrast to AWL, the TxRep plugin is capable of learning. AWL is already disabled in current versions. You can switch it on with
use_auto_whitelist 1
if you do still want to use it.
Checking and Ensuring Performance
The topic of performance tuning is a perennial favorite for all critical infrastructure components. SpamAssassin also provides various approaches. In principle, avoid rulesets that are larger than 100-150KB. The more rules SpamAssassin needs to process, the slower the environment will be.
The blacklist and blacklist-uri rules in particular are considered real performance brakes. Therefore, you are best off removing these rules and replacing them with URIBL_WS_SURBL. Also make sure that network tests are enabled. To this end, edit either the spamd startup script or its startup options, and remove the -L (--local) option if you find it there. The SpamAssassin team also recommends the use of sa-compile.
If you think your rules might be acting as a bottleneck, you can easily find out. Download the SpamAssassin rule timing plugin HitFreqsRuleTiming [3] and copy it to ~/.spamassassin. Add the following line to the file ~/.spamassassin/user_prefs:
loadplugin HitFreqsRuleTiming HitFreqsRuleTiming.pm
Now run a spam check. The logfile timing.log shows how long it took to process the rules. Relatively high values point to the rules that are hurting performance. Note that this test itself slows the environment, possibly considerably.
Formulating Filtering Rules
SpamAssassin already has a solid basic configuration of rules, but they only cover the best known advertising email. It is not usually imperative to become acquainted with this area, but if, for example, you are faced with an above average number of false positives, it might be useful to create your own custom rules.
Before you get down to writing your own rules, be aware that adding rules to the *.cf configuration files in the /usr/share/spamassassin directory is explicitly discouraged. The reason is simple: When you upgrade the filter, all existing rules in this folder are overwritten, including your changes.
The right place for site-wide application of rules is therefore /etc/mail/spamassassin/local.cf. The rules you set here are used independently of the executing user. If rules are only supposed to apply to a specific user, you can specify this in ~/.spamassassin/user_prefs. An example of a very simple rule is shown in Listing 2.
Listing 2: Simple Rule
body DEMONSTRATIONS_RULE /sue/
score DEMONSTRATIONS_RULE 0.1
describe DEMONSTRATIONS_RULE This is a simple example rule
The above example searches the message body for "sue" and adds the value 0.1 to the spam rating. However, the search is very rudimentary because email containing the terms "suede" or "ensue" would also be assessed 0.1. You can refine the search with regular expression features such as the word boundary anchor \b. With the following configuration, email in which the search term is embedded in a longer word no longer matches:
body DEMONSTRATIONS_RULE /\bsue\b/
Adding the i modifier disregards case, so the following configuration would also assess email containing "Sue" with a value of 0.1:
body DEMONSTRATIONS_RULE /\bsue\b/i
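Using the header keyword at the beginning of a rule line searches a header field instead of the body. A header rule also names the field to test and uses the =~ operator; the following example checks the Subject line:
header DEMONSTRATIONS_RULE Subject =~ /\bsue\b/i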
To search for URIs in email, use the uri keyword. SpamAssassin also allows you to create metarules, which are sets of rules linked to each other by a Boolean or arithmetic operation [4].
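A metarule can be sketched as follows. The helper writes two sub-rules and one metarule to a file. Rule names that begin with a double underscore are sub-rules that carry no score of their own. The function name, the rule names, the patterns, and the score are all assumptions chosen for illustration.

```shell
# Hypothetical helper: writes an example metarule to the file named in $1.
write_demo_meta_rules() {
    cat > "$1" <<'EOF'
# Sub-rules: evaluated, but not scored on their own
body   __DEMO_BODY_SUE    /\bsue\b/i
header __DEMO_SUBJ_URGENT Subject =~ /\burgent\b/i
# Metarule: fires only if both sub-rules match
meta     DEMO_SUE_URGENT (__DEMO_BODY_SUE && __DEMO_SUBJ_URGENT)
score    DEMO_SUE_URGENT 0.5
describe DEMO_SUE_URGENT Body mentions "sue" and subject says "urgent"
EOF
}

# Usage sketch: write to a scratch file and review it first.
#   write_demo_meta_rules ./demo-meta.cf
```

Combining weak indicators in this way lets a rule stay silent on each indicator alone, which helps keep false positives down.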
Checking Rule Syntax
The process of creating SpamAssassin rules is very error prone. Even a small typo can cause the filter mechanism to miss tons of unsolicited advertising email. Experience shows that it takes a long time before the problem is linked to faulty rule configuration. Fortunately, SpamAssassin provides a lint feature that checks the syntax of your rules:
spamassassin --lint
The output should provide enough information for you to find and correct any syntax errors. If the lint option does not provide any useful information, you can also check the debugging output:
spamassassin --lint -D
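A rule can be syntactically valid and still never fire, so it can help to run it against a sample message as well. The following minimal sketch assumes the -t (test mode) and --cf (additional configuration line) options of the spamassassin script. The function name is hypothetical, and the rule is the example rule from Listing 2 with the refinements discussed above.

```shell
# Hypothetical helper: runs the message file $1 through SpamAssassin in
# test mode with the example rule added for this run only, then shows
# the output lines that mention the rule.
check_demo_rule() {
    spamassassin -t \
        --cf='body DEMONSTRATIONS_RULE /\bsue\b/i' \
        --cf='score DEMONSTRATIONS_RULE 0.1' \
        --cf='describe DEMONSTRATIONS_RULE This is a simple example rule' \
        < "$1" | grep 'DEMONSTRATIONS_RULE'
}

# Usage sketch:
#   check_demo_rule sample-message.txt || echo "rule did not fire"
```

Because the rule is passed on the command line, neither local.cf nor user_prefs is modified during the experiment.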
The spam filter update mechanism sa-update makes sure SpamAssassin always uses the most current rules. New rules can be transferred promptly to SpamAssassin installations using this tool. By default, the updates are copied to the subdirectory /share/program_version/updates_spamassassin_org. sa-update accepts various options; a typical call looks like this:
sa-update && service spamassassin restart
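This command checks whether any new updates are available, downloads them, and lint-checks them before installation. If an update was installed successfully, sa-update returns the value 0, and the && operator then restarts the SpamAssassin service. If no update is available or a problem occurs, sa-update returns a non-zero value and the service is not restarted.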
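The exit status of sa-update can also be evaluated explicitly, for example in a cron job. This minimal sketch assumes the documented exit codes (0: an update was downloaded and installed; 1: no new updates; higher values: a problem occurred) and reuses the service command from the example above. The function name is hypothetical.

```shell
# Hypothetical helper: updates the rules and restarts the service only
# when new rules were actually installed.
update_rules() {
    if sa-update; then
        status=0
    else
        status=$?
    fi
    case "$status" in
        0)  echo "New rules installed; restarting SpamAssassin."
            service spamassassin restart ;;
        1)  echo "No new rules available; nothing to do." ;;
        *)  echo "sa-update reported a problem (exit code $status)." >&2
            return "$status" ;;
    esac
}

# Usage sketch (e.g., from a nightly cron job):
#   update_rules
```

Treating exit code 1 as a normal outcome keeps a scheduled job from reporting an error on every night without new rules.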
Simplified Configuration with Webmin
The configuration of SpamAssassin at the console level is not for everyone. The process is also error prone, which can have unpleasant consequences if, for example, the wrong email messages are identified as spam. The control and administration of SpamAssassin is significantly easier with a GUI.
Webmin, a classic among admin tools, provides a SpamAssassin Mail Filter module in the Servers category that makes it easy to create allowed and denied email addresses (Figure 3). To do this, follow the Allowed and Denied Addresses link in the straightforward web interface. The associated form also allows you to import existing address files of established email clients that you explicitly want to exclude from filtering.
The classification options for the spam assessment are behind the Spam Classification icon. Here you can determine which tests are applied to incoming email and how they are weighted.
Under Message Notification, you can determine what changes are made to email identified as spam. The subject is expanded with the word Spam by default, and the assessment is added to the subject. Other options include adding your own texts and new headers. You can even introduce your own header and body test via the Webmin module. To do so, click the Header and Body Tests link. These tests can also refer to URLs in the message.
The SpamAssassin module can also draw on MySQL and LDAP databases. The import mechanism simplifies data transfer from common storage systems. The module also allows you to edit the SpamAssassin configuration file and the whitelist configuration.
Webmin modules usually have their own module configuration, and the SpamAssassin module is no exception. Follow the Module Config link at the top if, for example, you want to use a different path to the filter configuration or to Procmail.
Because Webmin does not necessarily need to be installed on the SpamAssassin server, you can also use the tool for remotely administering the spam filter. You can also set up the interaction with Procmail very easily. To this end, follow the Procmail Spam Delivery link and determine what should happen with the email classified as spam.
SpamAssassin for Windows
Smaller companies in particular may prefer to use Windows computers instead of Linux servers because a specialist is not necessarily required for the administration and maintenance of a system. Even if a local email server is operated onsite (e.g., with the XAMPP package), the question remains as to how to set up filtering.
Software developers JAM Software [5] have ported SpamAssassin to Windows and provide a free and a commercial version of the filter. The basic free version includes the following programs in addition to the Perl run-time environment and various plugins:
- spamassassin.exe: the mail filter
- spamd.exe: SpamAssassin as a server process
- spamc.exe: client to the server process
- sa-update.exe: program to update the filter rules
- sa-learn.exe: trains the Bayesian filter with spam/ham email
Although the installation is easy to perform, you might be a bit disappointed at the lack of a GUI. All actions and settings must be carried out in a terminal or in text files, like the Linux console version (Figure 4).
If you want to use SpamAssassin for Windows with Exchange Server, you will need to purchase a special license for the connector. In addition to the free basic version, the developers also offer the SpamAssassin in a Box version which, for example, can use the Windows Event Viewer and for which the developers offer support.
Conclusions
SpamAssassin is a classic among antispam filters and performs its service very reliably. The administrative overhead for setting up, configuring, and maintaining the software is minimal. However, if you do not want to do without a GUI, a suitable solution is available in the form of the Webmin module.