Nuts and Bolts: Mail Archiving Tools
Lead Image © Alexander Dedrin, 123RF.com
 

Open source mail archiving software compared

Locked Away

We compare three mail archiving tools: Piler, Benno MailArchiv, and MailArchiva. By Andrej Radonic

By law, enterprises are required to retain email for a certain period of time. The archiving solutions discussed here, Piler, Benno MailArchiv, and MailArchiva, promise both legally compliant storage and added benefits for corporations.

Many countries provide a comprehensive system of legal regulations for governing the delivery and storage of email. For instance, some mail might contain commercial correspondence or tax-relevant information that forms an official legal record and must stay around for a predetermined time for auditing purposes.

The auditability requirement often dictates that the archived email be immutable and thus protected against access by staff – even the omnipotent administrator. The laws governing email pose several technical problems: Many messages are stashed away in local mailboxes of mail clients where they are stored in an unstructured way so that pertinent messages cannot easily be distinguished from other mail. Targeted searching thus involves a large amount of manual work. The requirements for digital auditing, and possibly also internal compliance policies, are thus not met. Additionally, the daily flood of spam that swamps mailboxes makes it very difficult to sort the digital chaff from genuine content and hugely bloats the data volumes.

More Than Compliance

State-of-the-art archiving solutions address these tasks and problems with a varying degree of internal overhead. They take care of automated, permanent email storage, make the content easy and quick to find through centralized search solutions, and ensure auditable storage of the data. Ideally, these solutions will integrate seamlessly with the enterprise network, collaborate nicely with all popular mail servers, offer web-based access, provide granular authorization, and be able to store any kind of data transparently on any popular kind of storage medium, or even on specialized archiving systems. Ideally, too, the operator will have a choice of infrastructure between on-premises or cloud storage.

A positive side effect of such a solution is the reduced storage space requirement, because mail can be compressed and deduplicated. Additionally, it will be beneficial for business continuity, because mail remains in the archive even if mail servers fail or mail data is lost. Full-text searching against mail content and in mail attachments also makes it possible to find mail content years after archiving a message. See the "Legal Framework Example" box for more information.

Powerful Open Source Applications

In this article, I examine three standalone mail archiving products from the open source camp: Piler, Benno, and MailArchiva; Table 1 compares the features of these archiving tools.

Table 1: Overview of the Test Candidates

| Features | Benno MailArchiv | MailArchiva | Piler |
|---|---|---|---|
| Test version | 2.1.0 | 4 | 1.1.0 |
| Variant | Community Edition, Commercial Version | Open Source Edition, Enterprise Edition | Open Source Edition |
| SaaS model | Via partners, Hosting Edition | Cloud Edition | No, but multitenant capable |
| Operating systems | Debian, Ubuntu, SLES, RHEL, UCS | Windows, Linux, Solaris, BSD, OS X | Linux, Solaris |
| License | GPL¹ | GPLv2 | GPL |
| Mail server | Postfix, Exim, Sendmail, Qmail | Yes | Yes, all SMTP |
| Microsoft Exchange | 2003/2007/2010 | 5.5/2000/2003/2007/2010/2013 | 2003/2007/2010/2013 |
| Google Apps | No | Yes² | Yes |
| Others | Zarafa, Open-Xchange | Lotus Notes, Kerio, CommuniGate Pro, Scalix | Lotus Notes, Zimbra, Office 365 |
| Archiving: | | | |
| Mail standards | POP3, IMAP, SMTP, Maildir, Milter | POP3, IMAP, SMTP, Maildir, Milter | POP3, IMAP, SMTP, Maildir, Milter |
| Archiving rules | No | Yes | Yes |
| Retention rules | No | Yes² | Yes |
| Encryption | No | AES-256 | Blowfish |
| Demonstrable immutability | Checksums and log | Signature,² log signature,² and log | Signature and log |
| Compression | Yes, bzip | Yes, zip | Yes, Zlib |
| Import | POP3, IMAP, Maildir | Maildir, PST, EML, MSG, Exchange, Google, Office 365 | EML, Mailbox, PST |
| Export | EML | EML, PDF² | EML |
| Clustering search | No | Yes² | No |
| Multitenancy | Hosting Edition | Yes² | No |
| Deduplication | Yes, email and attachments | Yes, email and attachments² | Yes, email and attachments |
| CLI | Yes | Yes² | Yes |
| Client/Search: | | | |
| Web client | Yes, Ajax | Yes, Ajax | Yes, responsive |
| Full-text search | Yes | Yes | Yes |
| Multilingual search | Yes | Yes | Yes |
| Forwarding | Yes | Yes | Yes |
| Search in attachments | Word, PPT, Excel, PDF, RTF, OpenOffice, zip, gzip, bzip2, tar, cpio, ar, JPEG metadata, Flash, mp3 | Word, PPT, Excel, PDF, RTF, ZIP, tar, gz, OpenOffice | Word, PPT, Excel, PDF, RTF, ZIP, OpenOffice |
| Permissions | Yes | Yes² | Yes |
| Auditing | Yes | Yes | Yes |
| Integration/Adaptation: | | | |
| Authentication Web GUI | LDAP, MS AD, Univention Corporate Server (UCS), Novell eDirectory | LDAP, MS AD, NTLM, Google, iMail | LDAP, MS AD, Google, NTLM |
| Storage | Filesystem | Filesystem | Filesystem |
| Localization | German | German, English, Portuguese, Czech, Chinese, Greek, French, Dutch, Russian, Japanese, Korean, Thai | German, English, French, Spanish, Hungarian, Portuguese, Russian |
| APIs | REST, XML, Web service API with JSON support | Web services | No |
| Antivirus scanner | No | Yes: ClamAV | Yes: ClamAV |
| Backup | No | Yes¹ | Yes |
| Themes/skins | No | Yes¹ | Yes |
| Price: | | | |
| Licenses | EUR80 per year incl. five mailboxes (Small Business Edition); EUR12.50 per mailbox a year for 20 mailboxes (Standard Edition) | Free up to 20 mailboxes; EUR23 per mailbox one-off, at least 25 mailboxes must be licensed | Free |
| Support | Software maintenance free in the first year, can be purchased separately for additional years | 20 percent of license costs per year | Not available |

¹ Community Edition only. ² Commercial versions only.

All of these products have a fundamentally similar approach: Email is either actively transmitted to the archive (using SMTP) or passively polled by the archive system, that is, retrieved from the mail or groupware server – typically using POP3 or IMAP calls to a journaling mailbox. As you can see in Figure 1, the messages are permanently stored either on the archive filesystem or in the database, including attachments.

Figure 1: Mail archiving solutions store all incoming and outgoing mail in a central archive.

All systems can be managed using a web client and support audits. The tools come with rights management features, including optional directory integration. The systems listed here all claim to be audit-proof and legally compliant. Of course, any technical solution must be accompanied by appropriate organizational policies to guarantee it as a compliant and comprehensive solution. Also, different jurisdictions have different rules for mail archiving. You should familiarize yourself with laws for your own country: Don't depend on the software to know your legal requirements.

Piler

Piler [1] is completely open source software from Hungary; its feature scope has grown immensely in the past two years so that it can now be regarded as a complete solution for mail archiving.

Email can be retrieved from SMTP servers by a variety of vendors, including Microsoft Exchange, and imported from different formats. The data is encrypted with the Blowfish algorithm and stored on the filesystem as compressed files. The matching metadata is stored in a MySQL database. Deduplication applies to both messages and attachments. Searching is handled by the Sphinx search engine.
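
The deduplication idea can be illustrated with a small content-addressing sketch. It is not taken from Piler; the store_once function, the store directory layout, and the choice of SHA-256 are assumptions made purely for illustration. Each file is saved under the digest of its content, so a second copy of the same attachment takes up no additional space.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: content-addressed storage as one way to
# deduplicate messages or attachments. Not taken from Piler's source.

# store_once FILE STORE_DIR
# Saves FILE in STORE_DIR under its SHA-256 digest unless an object with
# the same digest already exists, then prints the digest.
store_once() {
    local file=$1 store=$2 digest
    digest=$(sha256sum "$file" | cut -d' ' -f1)
    mkdir -p "$store"
    if [ ! -e "$store/$digest" ]; then
        cp "$file" "$store/$digest"
    fi
    printf '%s\n' "$digest"
}
```

Calling store_once twice with identical attachments returns the same digest both times and leaves a single object in the store; the metadata database then only needs to record the digest for each message.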

The software takes legal requirements into consideration for the most part: Auditing options are in place, as is logging throughout. When saved, email is digitally signed to be able to check for manipulation or demonstrate a lack of it. Piler can connect to a large number of mail servers, including Lotus Notes, Zimbra, Google Apps, and Office 365. Authentication can be handled by LDAP or Active Directory or controlled by an IMAP server.

Archiving Rules

Administrators can handle the configuration in a web GUI, which is optimized for mobile devices. Additionally, CLI commands are available for automation purposes. By default, Piler transfers all mail from the data stream to the archive; administrators can define flexible rules based on regular expressions to filter messages with specific features and prevent them from ending up in the archive. Retention rules let the operator define how long messages are kept in the archive before they are automatically deleted.
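
How such rules work in principle can be sketched in a few lines of shell. This is not Piler's rule engine: the function names, the rules file with one extended regular expression per line, and the use of file modification times for retention are all assumptions for illustration, and the header check expects messages with LF line endings.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: regex-based exclusion rules and age-based
# retention. File layout and function names are hypothetical.

# is_excluded MESSAGE_FILE RULES_FILE
# Returns 0 if any header line of the message matches one of the extended
# regular expressions in RULES_FILE (one per line), otherwise nonzero.
is_excluded() {
    local msg=$1 rules=$2
    # The header block ends at the first empty line, so stop reading there.
    sed '/^$/q' "$msg" | grep -E -i -q -f "$rules"
}

# expired_messages ARCHIVE_DIR DAYS
# Lists files under ARCHIVE_DIR whose age, counted in whole 24-hour
# periods since the last modification, exceeds DAYS.
expired_messages() {
    local dir=$1 days=$2
    find "$dir" -type f -mtime +"$days" -print
}
```

A rule such as ^Subject:.*newsletter would keep matching messages out of the archive, and expired_messages /path/to/archive 3650 would list candidates for deletion after roughly 10 years; the path and the period are placeholders that depend on local requirements.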

Access to Email

Authorizations for searching and for access to mail via the web GUI can be assigned at the user and group level (Figure 2). Regular users only get to see their own messages; if desired, the archive can be integrated directly with Outlook.

Figure 2: Piler has a neat administration GUI that is also suitable for mobile devices.

Auditors have access to all mail. In a separate window, an auditor can search for specific messages and retrieve a log for each one that shows the operations performed on it (e.g., whether the message has been accessed, searched for, or downloaded) (Figure 3). The search is fast and, in addition to a form-based advanced search, accepts search expressions with a highly detailed syntax:

size:>.2M, subject: viagra OR cialis, \
  body: order < now, from: my@email.address

This is an example of a complex Piler search that finds Viagra spam with a message size of more than 200KB that also matches the other criteria.

Figure 3: The Audit function supports searching for email messages and provides details on their integrity and history.

Installation

If you do not want to download and use the prebuilt VMware appliance [2], you will need a little patience to install Piler because the program, which is written in C, does not provide any installation packages. On a Linux or Solaris host, you first need to set up the required basic packages: OpenSSL, MySQL 5.1+, Sphinx Search 2.1+, PHP 5.3.x+, a web server with URL rewriting support (Apache, Lighttpd, Nginx), the TRE regex library, Libzip, and Iconv. Then, you can download the source code [3] and build it as follows:

tar zxvf piler-x.y.z.tar.gz
cd piler-x.y.z
./configure --localstatedir=/var --with-database=mysql \
  --enable-starttls --enable-tcpwrappers
make
su -c 'make install'

After doing so, set up a user named piler and run the postinstallation routine by typing make postinstall; among other things, this will create the databases, generate cron jobs, and create a web directory. Finally, start the Piler daemon and the Sphinx indexer. The initial login to the web GUI uses the admin@local account and the pilerrocks password.

Once you get there, you can start setting up Piler for production. This involves creating users and groups, defining the desired archiving rules, and configuring the required SMTP server so that it passes the incoming mail data stream to Piler. If you have a Postfix mail server, you can do this with the following entry in main.cf:

always_bcc = archive@piler.my.domain
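
For scripted deployments, the same setting can be applied idempotently. The helper below is a sketch under assumptions: set_always_bcc is a hypothetical function name, it expects GNU sed, it only handles a single-line always_bcc entry, and the address must not contain the characters | or &. On a live system, Postfix's own postconf -e command does the same job, and postfix reload activates the change.

```shell
#!/usr/bin/env bash
# Sketch: add or update always_bcc in a Postfix main.cf (path passed in).
# set_always_bcc is a hypothetical helper, not part of Postfix or Piler.

# set_always_bcc MAIN_CF ADDRESS
set_always_bcc() {
    local cf=$1 addr=$2
    if grep -q '^always_bcc[[:space:]]*=' "$cf"; then
        # Replace the existing single-line setting (GNU sed in-place edit).
        sed -i "s|^always_bcc[[:space:]]*=.*|always_bcc = ${addr}|" "$cf"
    else
        # No entry yet: append one at the end of the file.
        printf 'always_bcc = %s\n' "$addr" >> "$cf"
    fi
}

# Example (commented out because it changes a live configuration):
# set_always_bcc /etc/postfix/main.cf archive@piler.my.domain
# postfix reload
```

Running the helper repeatedly leaves exactly one always_bcc line in the file, which makes it safe to call from a provisioning script.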

Conclusions

Piler leaves users with an impression that the programmers have done their homework; the project is well maintained and comes with competitive documentation (Table 2). The integration options support operation in many environments. If the convoluted setup does not bother you, and you do not need better support, Piler will give you a comprehensive and lean system that meets most of the central requirements for mail archiving.

Table 2: Piler

| Item | Value |
|---|---|
| Manufacturer | Piler (http://www.mailpiler.org) |
| Price | Free |
| Technical data | http://www.mailpiler.org/en/features.html |
| Evaluation (max. 10 points): | |
| Installation overhead | 2 |
| Feature scope | 8 |
| User friendliness | 7 |
| Integration options | 7 |
| Documentation | 5 |
| Overall score | 5.8 |

Before using Piler in heavy load situations with many users, it makes sense to perform a proof of concept, including appropriate load and stability tests.

Benno MailArchiv

Following the general trend, Benno is available as the free Community Edition Open Benno MailArchiv [4] and as the commercially licensed Benno MailArchiv [5]. The two versions are fortunately compatible in terms of data. Benno was created in Germany and is primarily intended for a German audience, but it provides an optional English user interface. The community edition does not advertise official support for legally compliant archiving, although the commercial version does support the GDPdU German guidelines. Vendor support, software maintenance, and accompanying services are only available for the commercial variant.

Complete Package for Archiving

The vendor, LWsystems, presents Benno as a complete package that promotes open standards. Popular standards such as SMTP, POP3, and IMAP are supported for collecting email, so integration with any well-known mail server is possible. Existing or legacy mail collections can be imported directly, for example, in Maildir format. Benno organizes mail data in containers directly on the filesystem. The stored files are safeguarded by checksums and interlinked. Administrators can specify the breakdown of archive containers by year, domain, or other criteria. Data encryption is not provided; however, the vendor points to the option of using an encrypted filesystem.
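
Why interlinked checksums make tampering detectable can be shown with a simple hash chain. The sketch below is not Benno's on-disk format; the function names, the use of SHA-256, and the all-zero start value are assumptions for illustration. Each digest covers the previous digest plus the current file, so changing one file alters every digest that follows it.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: a hash chain over archived files. This is not
# Benno's storage format; names and the start value are assumptions.

# chain_digest PREV_DIGEST FILE
# Prints the SHA-256 digest of PREV_DIGEST followed by the content of FILE.
chain_digest() {
    local prev=$1 file=$2
    { printf '%s\n' "$prev"; cat "$file"; } | sha256sum | cut -d' ' -f1
}

# build_chain FILE...
# Prints one "DIGEST FILE" line per file in the given order. Each digest
# depends on all earlier files, so one change alters every later digest.
build_chain() {
    local prev f
    prev=$(printf '%064d' 0)   # arbitrary start value: 64 zeros
    for f in "$@"; do
        prev=$(chain_digest "$prev" "$f")
        printf '%s %s\n' "$prev" "$f"
    done
}
```

An auditor who holds a trusted copy of the last digest can rebuild the chain from the stored files and compare the results; any mismatch means that at least one file was modified, removed, or reordered.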

Core Feature

Searching is the application's core feature. Benno creates a full-text index of messages and attachments. The program relies on Apache Lucene for indexing and searching and on Apache Tika for extracting content, so that attachments in various formats – from PDF, MS Office, and OpenOffice/LibreOffice to ZIP archives – can be searched quickly. Plugins also provide an option for provisioning content. Benno shows users and auditors a neat web interface that supports convenient email searches, including the option to download or forward messages (Figure 4).

Figure 4: The Benno Web GUI offers quick searching and supports auditing.

User authorization management for the web GUI relies on a local (integrated) database or, alternatively, on Microsoft Active Directory via a matching connector or LDAP server.

Installation from Distribution Packages

Installing Benno is very easy, given that the system does not pose any major technical requirements: A Java JDK 6 runtime must be in place for the archiving back end. To run the Benno MailArchiv front ends, you need PHP5, Smarty templates, and an Apache2 web server.

You can download the required packages from the Benno repository. For Ubuntu, Debian, or UCS, just add the package source and GPG key in your package manager and run the following commands:

apt-get update
apt-get install benno-lib benno-core \
  benno-archive benno-rest-lib benno-rest
apt-get install apache2 php5 php-pear \
  php-db smarty
apt-get install benno-web

Before launching the Benno services, you first need to add a shared secret to the /etc/benno/benno.xml and /etc/benno-web/benno.conf files. This is used to safeguard server communication between Benno Core and the REST API.
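
A shared secret should be long and random. One way to generate such a value with standard tools is sketched below; new_secret is a hypothetical helper, and the length of 32 bytes is an assumption rather than a Benno requirement. Where exactly the value belongs in the two files is described in the Benno documentation.

```shell
#!/usr/bin/env bash
# Sketch: generate a random shared secret as a hex string.
# new_secret is a hypothetical helper, not part of Benno.

# new_secret [BYTES]
# Prints BYTES random bytes (default 32) from /dev/urandom as lowercase hex.
new_secret() {
    local bytes=${1:-32}
    head -c "$bytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    printf '\n'
}
```

The same value has to be entered in both configuration files; otherwise, the two components cannot authenticate each other.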

Then, copy the license file to /etc/benno/benno.lic and restart the Benno REST service by typing /etc/init.d/benno-rest restart. If you want to set up the free open source edition, you need to create an empty benno.lic file and set the USERPERMISSIONS = DISABLED parameter in the /etc/benno-web/benno.conf file. Access to the web interface is via the URL http://bennoserver/benno using the admin/secret account/password combination. You can then create users with the benno-useradmin command-line tool if you are not using a centralized directory service.

Conclusions

Benno is a complete system based on open software in combination with optional vendor support (Table 3). Additionally, a hosting edition is available that targets service providers and offers a flexible billing model. Smart administrators will not want to do without the management GUI. The lack of encryption makes it more difficult to comply with requirements, however.

Table 3: Benno MailArchiv

| Item | Value |
|---|---|
| Manufacturer | LWsystems GmbH & Co. KG (http://www.benno-mailarchiv.de) |
| Price | Free as Open Benno, as a Small Business Edition (SBE) with up to 20 mailboxes, as a Standard Edition (SE) for 20 mailboxes, and a Hosting Edition (HE). SBE with five mailboxes for EUR80 per year. SE for a price of EUR12.50 per mailbox and year. Volume discounts are available. |
| Technical data | http://www.benno-mailarchiv.de/produkt/uebersicht/standard_edition.html |
| Evaluation (max. 10 points): | |
| Installation overhead | 7 |
| Feature scope | 7 |
| User friendliness | 5 |
| Integration options | 8 |
| Documentation | 6 |
| Overall score | 6.6 |

To make up for this, Benno offers interfaces for provisioning, user management, and web services that give administrators the ability to integrate the application seamlessly. The community edition may fit the bill for many uses; however, it does not meet all the legal requirements.

MailArchiva Enterprise Edition v4

MailArchiva [6] is a comprehensive archiving system specially designed for larger environments with many mailboxes (Figure 5); it advertises good scaling capability, in particular for the fully supported commercial version, but a feature-stripped Open Source Edition [7] is also available.

Figure 5: The MailArchiva architecture supports comprehensive integration with enterprise IT.

One of the special features of MailArchiva is its comprehensive support for MS Exchange; like many other advanced features, this is only available in the Enterprise Edition. MailArchiva natively supports all Exchange versions and multiple Exchange Stores. Outlook users can access the archive directly using plugins from within the mail client.

MailArchiva also offers comprehensive support for many of the popular mail server flavors, such as Postfix, Sendmail, Qmail, iMail, Lotus Notes, AXIGen, CommuniGate Pro, Neon Insight, Zimbra, and Google Apps.

Clear-Cut Architecture

The archiving program, which runs on Windows, Linux, Solaris, BSD, and OS X, stores the messages, including all headers, in zipped archive files directly on the filesystem and thus does without a database. The files are encrypted using AES-256 (see Table 1). To save hard disk capacity, attachments that occur in multiple mail messages are saved only once.

Archived data is organized into logical volumes, which can be segmented and stored on separate storage systems if desired. User authentication can be handled by OpenLDAP, Active Directory, or Google Apps for role-based access control.

Painless Installation

Installation has been neatly solved. In just a few steps, administrators can set up a working system: unpack the downloaded tarball, type ./install to launch the setup routine, confirm the license, and answer the prompt for the Max Heap Size (256MB is fine for test operations). The installation routine automatically launches the main process. You can stop this later, or restart it using /etc/init.d/mailarchiva. After this, you can log in to the web console using the URL http://<Servername>:8090. For your initial login as the administrator, use the admin account and – contrary to what the documentation says – the automatically set admin password.

After the initial login, change the administrator's master password below the Login menu item; otherwise, the system will reject any administrative changes. In our lab, this only worked after I created an empty server.conf file in the /usr/local/mailarchiva/server/webapps/ROOT/WEB-INF/conf path. Next, create an encryption password in the Storage groups menu item; this is used for encrypting all archives with the AES-256 algorithm listed in Table 1.

Fast Search

The web GUI offers much convenience. You can configure all the critical system parameters, define rules for archiving and retention, manage certificates, and start the integrated backup routine. The feature scope also includes integrated monitoring, which returns data in JMX format. The web GUI comes with an advanced search routine that can be distributed across multiple servers for performance benefits. In addition to the option of defining and storing your own queries, there are comprehensive export options for archived email, including the ability to generate reports in PDF format. As a special feature, you can integrate the search with Outlook and keep the familiar Outlook look and feel in doing so.

On Premises or in the Cloud

MailArchiva also offers an ISP Edition, which enables hosting for service providers and is designed to be multitenant capable. This edition also contains features for automated billing and underlines its claim of being a complete solution.

Conclusions

MailArchiva leaves no wishes unfulfilled. It provides excellent feature scope in combination with support for very large environments and good scalability (Table 4). Additionally, small businesses will really appreciate the free edition.

Table 4: MailArchiva

| Item | Value |
|---|---|
| Manufacturer | MailArchiva (http://www.mailarchiva.de) |
| Price | MailArchiva can be tested for 45 days free of charge. Enterprises with 20 mailboxes or fewer can use the software free of charge. Licensing is on a per mailbox basis; at least 25 mailboxes must be ordered at a price of EUR559. Extension licenses can be purchased in steps of 10. For a new investment, charges of 20 percent maintenance per annum are added for the first year. |
| Technical data | https://www.mailarchiva.com/enterprisefeatures?page=enterprise |
| Evaluation (max. 10 points): | |
| Installation overhead | 9 |
| Feature scope | 9 |
| User friendliness | 8 |
| Integration options | 8 |
| Documentation | 8 |
| Overall score | 8.4 |

The tests reveal that a solution is available for any budget and set of requirements. The feature scope of all the candidates presented here is huge, and the overhead for implementation and operation is typically manageable.