Lead Image © Boing, Photocase.com
 

Encrypted backup with Duplicity

Packed and Sent

The free Duplicity backup program consistently encrypts all backups, which means that backups can even be stored in an insecure cloud.

By Tim Schürmann

Duplicity [1] packages one or more directories into a tar archive, encrypts the results with GnuPG, and automatically uploads the backup created in this way to a backup server. Signatures help reveal tampering or disk failure, which means backups can be stored on insecure servers or in the cloud. Duplicity even offers native functions for talking to some well-known cloud services.

Additionally, Duplicity can create incremental backups, in which the transferred archive contains only the delta to the previously created backup. This not only saves disk space on the server but also means individual backups are created faster. Duplicity is licensed under the GNU GPL and thus can be used free of charge.

Duplicity is tailored for Linux and other Unix operating systems, such as BSD or OS X. Most major Linux distributions have it in their repositories. Users of OS X can install it via Fink, for example. For Ubuntu-based distributions, there is also a PPA [2] with the current Duplicity version. Alternatively, Duplicity can be built quickly from the source code (see the box "Self-Build"). Duplicity works on Windows in the Cygwin environment but is unable to handle the specific features of the Windows filesystem. Administrators should back up Windows systems with some other software if possible.

Creating Backups

Duplicity is very simple to operate. At the command line, you pass in the directory to be backed up and the storage directory to the tool. The following example packages the complete /etc directory in a tar archive, encrypts it, and uses secure copy (scp) to store the results on the server at example.com below a directory named /var/backup (Figure 1). Note the double slashes after the domain name:

duplicity --progress /etc scp://dd@example.com//var/backup
Figure 1: Here Duplicity has saved the complete /etc directory on the server at 10.0.2.2.

During the backup, Duplicity considers deleted files, all file permissions, subdirectories, FIFOs, device files, and symbolic links, but not hard links. Specifying the --progress parameter tells Duplicity to indicate the progress continuously. Note that the tool always expects parameters in front of the directory information. Furthermore, you must ensure that Duplicity has the correct permissions. In the above case, it must therefore be allowed to access /etc and all its contents.

Duplicity automatically compresses the archive with gzip, which can be switched off with the --no-compression option. Additionally, Duplicity creates some temporary files in the appropriate directory – for Linux, this is usually in /tmp. If you have insufficient free space, you can use --tempdir /<path/to>/tmp to define another directory. In previous Duplicity versions, users had to define the temporary directory in the environment variable TMPDIR. The developers have made this method obsolete, however.
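If /tmp tends to run short on space, a small wrapper can check the alternative scratch directory before the backup starts. The following is only a sketch: the function name backup_with_tempdir and the /var/tmp/duplicity path in the sample call are hypothetical, whereas --tempdir is the option described above.

```shell
# Hypothetical helper: run a backup with a custom temporary directory.
# Usage: backup_with_tempdir <tempdir> <source> <target-url>
backup_with_tempdir() {
    tmpdir=$1
    src=$2
    url=$3

    # Refuse to start if the scratch directory is missing or not writable.
    if [ ! -d "$tmpdir" ] || [ ! -w "$tmpdir" ]; then
        echo "temporary directory $tmpdir is not usable" >&2
        return 1
    fi

    # Options must come before the source and target arguments.
    duplicity --tempdir "$tmpdir" "$src" "$url"
}

# Sample call (directory and host are placeholders):
# backup_with_tempdir /var/tmp/duplicity /etc scp://dd@example.com//var/backup
```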

Duplicity encrypts the resulting archive with GnuPG. For this reason, you need to choose and type a passphrase after calling Duplicity; you will need the same passphrase to restore the backup later. Accordingly, you will want to make the passphrase as long and cryptic as possible, but not so long that you forget it; otherwise, you can say goodbye to your data.

Transferring Passwords for SSH and FTP

The previous command assumes that you log in to the SSH server using private and public keys. If you want to authenticate via password, specify the Duplicity --ssh-askpass parameter. The tool then prompts you for the required SSH password when connecting. If the SSH server is not listening on the default port, you also need to specify the port in the usual way, separated by a colon after the domain name:

duplicity /etc scp://dd@example.com:2222//var/backup

If you want to store the backup on an FTP server, you need to enter the password for the server in the FTP_PASSWORD environment variable. In the following example, it is 123. For the FTP transmission method, the domain name is followed by a slash:

FTP_PASSWORD=123 duplicity /home/tim ftp://dd@example.com/var/backup

Incidentally, Duplicity also evaluates the FTP_PASSWORD environment variable for an SSH connection. You can thus omit the --ssh-askpass parameter and define the SSH password in the FTP_PASSWORD environment variable. This is especially useful if you want to include Duplicity in a script. If you want Duplicity to create the backup archive on a local storage medium, use the file:// protocol:

duplicity /etc file:///mnt/backup

Duplicity can transmit the backup archive with many other protocols such as Rsync and WebDAV. Additionally, Duplicity can store the backups in various cloud services, including Dropbox, Azure, OpenStack Swift, and Amazon S3, along with a couple of quirky storage options such as sending the backup by email. Almost every new release of Duplicity adds new protocols. For a complete and quite long list, check the Duplicity man page. To access this, type man duplicity and look for the "URL format" section. The man page for the current Duplicity version is also available online [1].

For some protocols and services, Duplicity requires additional libraries and tools (see the box "Modules Used"). The backup program points out any missing helpers when called. On Linux, the standard protocols need no manual attention, but this is not true for many cloud services. For example, to access Amazon S3, you need Boto [3] software version 2.0 or newer. For a complete list of all dependencies for all supported services, see the "Requirements" section of the Duplicity man page.
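Before the first run, a script can at least confirm that the required executables are on the PATH. The sketch below is an illustration only: missing_tools is a hypothetical helper, and it checks programs such as duplicity and gpg, not Python libraries such as Boto.

```shell
# Print the name of every required command that is not found in PATH.
# Usage: missing_tools <command>...
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Sample check for the two programs an encrypted backup relies on:
# missing=$(missing_tools duplicity gpg)
# [ -z "$missing" ] || echo "missing: $missing" >&2
```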

Storing Encrypted Backups

Duplicity uses symmetric encryption by default; that is, the same password is used to encrypt and decrypt the backup. Alternatively, the tool can use GnuPG public key encryption. Here, each user has two keys: An archive locked with the public key can only be unlocked again with the private key.

If you want to use a new key pair for the backup, create the key before the first backup using gpg --gen-key. To do this, answer the questions posed; if in doubt, leave the fields blank or accept the default settings by pressing Enter (Figure 2). You will need to type the passphrase each time for encryption and decryption. At the end, GPG outputs a key ID, which you will want to remember.

Figure 2: The call to gpg --gen-key creates a new GPG key.

Because the key pair secures your backup, you will want to save it on an external medium. Use the following two commands to create a copy of the public and private keys in the files /mnt/key_pub.gpg and /mnt/key_sec.gpg:

gpg --output /mnt/key_pub.gpg --armor --export Key-ID
gpg --output /mnt/key_sec.gpg --armor --export-secret-keys Key-ID

On another system, or after system recovery, the key can then be reloaded using gpg --import. When creating a new backup, you need to tell Duplicity the key ID of the public key using the --encrypt-key parameter:

duplicity --ssh-askpass --encrypt-key 12345678 /etc scp://dd@example.com//var/backup

Normally, you need to type the passphrase after calling the command. If Duplicity starts without prompting, the GPG agent has probably cached the passphrase in the background. Furthermore, Duplicity buffers some metadata in the ~/.cache/duplicity/ directory and retrieves it whenever called.

When you call Duplicity from a script, you can enter the passphrase in the PASSPHRASE environment variable. This method, however, poses a security risk: Anyone who can read the script automatically discovers the passphrase. If you set the environment variable in a script, you should at least explicitly dump its contents from memory afterward using unset. You can completely disable encryption with --no-encryption.
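One way to keep the passphrase out of the script itself is to read it from a separate file that only the backup user can read. This is a sketch under that assumption, not a recommendation from the Duplicity developers; the function name backup_unattended and the passphrase file are hypothetical, whereas PASSPHRASE and unset are as described above.

```shell
# Hypothetical helper: run an unattended, encrypted backup.
# Usage: backup_unattended <passphrase-file> <source> <target-url>
backup_unattended() {
    ppfile=$1
    src=$2
    url=$3

    # Refuse to continue if the passphrase file is missing or unreadable.
    [ -r "$ppfile" ] || { echo "cannot read $ppfile" >&2; return 1; }

    # Use the first line of the file (assumed to be readable by its owner only).
    PASSPHRASE=$(head -n 1 "$ppfile")
    export PASSPHRASE

    duplicity "$src" "$url"
    status=$?

    # Remove the passphrase from the environment again.
    unset PASSPHRASE
    return "$status"
}

# Sample call (file name and host are placeholders):
# backup_unattended /root/.backup-passphrase /etc scp://dd@example.com//var/backup
```

The passphrase still exists in clear text on disk, so the file needs the same care as a private key.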

Incremental Backups

When first called, Duplicity always backs up the source directory completely. When you invoke the same command again, Duplicity only backs up the data added or changed since then (the delta). To compute the delta, Duplicity uses the librsync library, which implements the well-known Rsync algorithm. This approach has the advantage that you can include the Duplicity call in a cron job or startup script, thus ensuring that Duplicity runs regularly and automatically.

Incremental backups save space on the server and can be created much faster. However, if a read error occurs in one of the parts, the subsequent backups will very likely be useless. Moreover, recovery will take longer because Duplicity may first need to look at all the incremental backups. For this reason, you should perform a full backup at regular intervals. You can enforce this by specifying full:

duplicity full /home/tim scp://dd@example.com//var/backup

In this case, full is not a parameter but an action that needs to follow the program name directly. The --full-if-older-than parameter tells Duplicity to create a full backup if the last full backup was created more than a predetermined period ago (Figure 3), in this example more than one month:

duplicity --full-if-older-than 1M /home/tim scp://dd@example.com//var/backup
Figure 3: The server here has a full backup and an incremental backup (inc) in two volumes.

You need to leave out the full action in this case; otherwise, it would overrule the --full-if-older-than parameter.

Instead of 1M for a month, you can also specify other periods; for example, 14D is 14 days. The appropriate value depends on your organization's backup strategy.
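For a cron job, the call can be wrapped in a small function that reports success or failure. The following is a minimal sketch; the function name scheduled_backup, the script path in the crontab comment, and the schedule are all placeholders.

```shell
# Hypothetical wrapper for a scheduled backup.
# Usage: scheduled_backup <max-age-of-full-backup> <source> <target-url>
scheduled_backup() {
    max_age=$1
    src=$2
    url=$3

    # No "full" action here: Duplicity itself decides between a full
    # and an incremental backup based on the age of the last full one.
    if duplicity --full-if-older-than "$max_age" "$src" "$url"; then
        echo "backup of $src finished"
    else
        status=$?
        echo "backup of $src failed with exit code $status" >&2
        return "$status"
    fi
}

# Sample call: scheduled_backup 1M /home/tim scp://dd@example.com//var/backup
#
# Sample crontab entry for a script built around this function,
# started every night at 02:30:
# 30 2 * * * /usr/local/bin/nightly-backup.sh
```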

Duplicity does not pack the data to be backed up into a single huge archive; instead, it distributes the data to several smaller archives. Because these volumes can only grow to a maximum of 25MB by default, numerous small files accumulate over time on the server (Figure 4).

Figure 4: The complete backup spans two archives or volumes. The --volsize 5 parameter ensures that each volume occupies a maximum of 5MB.

You can change this behavior using the --volsize parameter, which lets you define the maximum size of each volume in megabytes. For example, --volsize 125 increases the size to 125MB. As the volume size increases, however, Duplicity also needs more RAM. You might want to exercise caution when increasing this value.
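To get a feel for how the volume size affects the number of files on the server, you can estimate the count with simple ceiling division. This is a rough sketch: volume_count is a hypothetical helper, the 1000MB figure is an arbitrary example, and the size refers to the compressed and encrypted archive data, so real counts will differ.

```shell
# Estimate how many volumes a backup of a given size is split into.
# Usage: volume_count <archive-size-in-MB> <volsize-in-MB>
volume_count() {
    size=$1
    volsize=$2
    # Integer ceiling division: a partly filled last volume still counts.
    echo $(( ($size + $volsize - 1) / $volsize ))
}

volume_count 1000 25    # prints 40
volume_count 1000 125   # prints 8
```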

Including and Excluding Data

The --exclude parameter lets you specifically leave out a subdirectory from the backup. In the following example, the tool would not back up the subdirectory /home/klaus/Videos:

duplicity --exclude /home/klaus/Videos /home scp://dd@example.com//var/backup

If you want to back up the entire system via the root directory (/), you should at least always exclude /proc, the dynamic filesystem that provides a window into the running kernel. Otherwise, you are in danger of Duplicity tripping up all over its content. For each directory to exclude, you must specify the --exclude parameter again. The --include parameter lets you specifically include certain subdirectories. This example command

duplicity --include /home --include /etc --exclude / / scp://dd@example.com//var/backup

exclusively backs up the /home and /etc directories.
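Because every excluded directory needs its own --exclude parameter, a script can build the option list from a plain list of paths. The sketch below assumes a hypothetical helper named backup_root; apart from /proc, the directories in the sample call are merely examples of what you might leave out.

```shell
# Hypothetical helper: back up the whole system, skipping the listed directories.
# Usage: backup_root <target-url> <directory-to-exclude>...
backup_root() {
    url=$1
    shift

    # Replace each remaining argument with its own "--exclude <dir>" pair.
    for dir in "$@"; do
        set -- "$@" --exclude "$dir"
        shift
    done

    duplicity "$@" / "$url"
}

# Sample call (host and the directories after /proc are placeholders):
# backup_root scp://dd@example.com//var/backup /proc /sys /tmp
```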

Easy Recovery

You can restore a backup by swapping the source and destination in the call. The following example restores the backup stored in /var/backup on the server example.com to the /home/tim/restore directory:

duplicity scp://dd@example.com//var/backup /home/tim/restore

On request, Duplicity even restores a single file. The parameter responsible for this, --file-to-restore, expects the relative path to the file in which you are interested. For example, if you are backing up the /home/klaus directory, you can restore the letter.txt file originally stored in /home/klaus/Documents with the following command:

duplicity --file-to-restore Documents/letter.txt scp://dd@example.com//var/backup letter_alt.txt

At the end of the call, Duplicity does not expect the directory in which to restore the file but rather a file name. In the preceding example, the tool retrieves the file letter.txt from the backup and stores it in the current directory as letter_alt.txt. The list-current-files action lists all the files in a backup:

duplicity list-current-files scp://dd@example.com//var/backup
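In a script, the listing can be used to check whether a file is in the backup before you try to restore it. The following sketch assumes that each line of the listing consists of a timestamp followed by a space and the relative path; the helper name listing_contains is hypothetical.

```shell
# Succeed if the file listing on standard input contains the given path.
# Assumes each line ends with a space followed by the relative path.
# Usage: duplicity list-current-files <url> | listing_contains <relative-path>
listing_contains() {
    path=$1
    found=1
    # Read the whole listing so the producing command can finish normally.
    while IFS= read -r line; do
        case $line in
            *" $path") found=0 ;;
        esac
    done
    return "$found"
}

# Sample call (host is a placeholder):
# duplicity list-current-files scp://dd@example.com//var/backup |
#     listing_contains Documents/letter.txt && echo "letter.txt is in the backup"
```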

Using the --time parameter, you can even revert to a certain file version. The following example retrieves exactly the version of the letter.txt file that was stored in the backup seven days earlier. This assumes that Duplicity created a backup seven days ago:

duplicity --time 7D --file-to-restore Documents/letter.txt scp://dd@example.com//var/backup letter_alt.txt

Alternatively, you can specify a date; for example, --time 2015/9/10 accesses the backup from September 10, 2015.

Ensuring Data Integrity

The verify action compares the data on the hard drive with the last backup. This tells you which data has changed and whether an unauthorized third party has changed the backup:

duplicity verify --compare-data --ssh-askpass scp://dd@example.com//var/backup /home/tim

The --compare-data parameter ensures that Duplicity also compares the contents of the files (Figure 5). This takes longer, but without specifying the option, Duplicity 0.7 did not always deliver reliable information in my tests. You can write the report generated by verify to the duplog.txt text file using the --log-file duplog.txt parameter. In this way, you can include the check in a cron job and then selectively analyze the logfile with a monitoring tool or have it sent via email.

Figure 5: The "verify" action lets you quickly discover changes.
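For a regular check from a cron job, the verify call can be wrapped so that it prints a one-line result. This sketch assumes that verify signals differences or errors through a non-zero exit status, as the Duplicity man page describes; the function name check_backup and the messages are hypothetical.

```shell
# Hypothetical helper: verify a backup and report the outcome in one line.
# Usage: check_backup <target-url> <local-directory> <logfile>
check_backup() {
    url=$1
    dir=$2
    log=$3

    # Assumption: a non-zero exit status means differences or an error.
    if duplicity verify --compare-data --log-file "$log" "$url" "$dir" >/dev/null; then
        echo "OK: $dir matches the backup at $url"
    else
        echo "WARNING: verifying $dir failed or found differences, see $log" >&2
        return 1
    fi
}

# Sample call (host is a placeholder):
# check_backup scp://dd@example.com//var/backup /home/tim duplog.txt
```

Because the warning goes to standard error, cron will mail it to the job's owner on systems where cron is set up to send mail.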

Conclusions

On closer inspection, Duplicity turns out to be a full-featured backup tool. Although you can quickly and easily back up a single directory, more complex situations require more parameters, which can lead to a very complex command line. The documentation is also restricted to the man page, although it is extremely detailed. On a positive note, Duplicity can be integrated into your own shell scripts. Thanks to automatic encryption, you can also store backups in the cloud, even if they contain sensitive data for which privacy policies need to be observed.