Encrypted backup with Duplicity
Packed and Sent
Duplicity [1] packages one or more directories into a tar archive, encrypts the results with GnuPG, and automatically uploads the backup created in this way to a backup server. Signatures help reveal tampering or disk failure, which means backups can be stored on insecure servers or in the cloud. Duplicity even offers native functions for talking to some well-known cloud services.
Additionally, Duplicity can create incremental backups, in which the transferred archive contains only the delta to the previously created backup. This not only saves disk space on the server but also means individual backups are created faster. Duplicity is licensed under the GNU GPL and thus can be used free of charge.
Duplicity is tailored for Linux and other Unix operating systems, such as BSD or OS X. Most major Linux distributions have it in their repositories. Users of OS X can install it via Fink, for example. For Ubuntu-based distributions, there is also a PPA [2] with the current Duplicity version. Alternatively, Duplicity can be built quickly from the source code (see the box "Self-Build"). Duplicity works on Windows in the Cygwin environment but is unable to handle the specific features of the Windows filesystem. Administrators should back up Windows systems with some other software if possible.
Creating Backups
Duplicity is very simple to operate. At the command line, you pass the directory to be backed up and the storage location to the tool. The following example packages the complete /etc directory in a tar archive, encrypts it, and uses secure copy (scp) to store the results on the server at example.com below a directory named /var/backup (Figure 1). Note the double slashes after the domain name:
duplicity --progress /etc scp://dd@example.com//var/backup
During the backup, Duplicity considers deleted files, all file permissions, subdirectories, FIFOs, device files, and symbolic links, but not hard links. The --progress parameter tells Duplicity to report its progress continuously. Note that the tool always expects parameters in front of the directory information. Furthermore, you must ensure that Duplicity has the correct permissions. In the above case, it must therefore be allowed to read /etc and all its contents.
Duplicity automatically compresses the archive with gzip, which can be switched off with the --no-compression option. Additionally, Duplicity creates some temporary files in the system's temporary directory; on Linux, this is usually /tmp. If you have insufficient free space there, you can use --tempdir /<path/to>/tmp to define another directory. In previous Duplicity versions, users had to define the temporary directory in the environment variable TMPDIR. The developers have made this method obsolete, however.
Duplicity encrypts the resulting archive with GnuPG. For this reason, you need to create and type a password (the GnuPG key) after calling Duplicity; you will need the password to restore the backup later. Accordingly, you will want to make the password as long and cryptic as possible – but not so long that you forget it; otherwise, you can say goodbye to your data.
Transferring Passwords for SSH and FTP
The previous command assumes that you log in to the SSH server using private and public keys. If you want to authenticate via password, specify the Duplicity --ssh-askpass parameter. The tool then prompts you for the required SSH password when connecting. If the SSH server is not listening on the default port, you also need to specify the port in the usual way, separated by a colon after the domain name:
duplicity /etc scp://dd@example.com:2222//var/backup
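The URL layout used in these examples (user, host, optional port after a colon, then a slash followed by the path) can be captured in a small shell helper. This is only a sketch: the function name scp_url is made up for illustration and is not part of Duplicity.

```shell
# scp_url USER HOST PATH [PORT]
# Hypothetical helper (not part of Duplicity): prints an scp:// target URL.
# The URL always contains one separating slash after the host (or port);
# an absolute PATH such as /var/backup therefore yields the double slash
# seen in the examples.
scp_url() {
    user=$1
    host=$2
    path=$3
    port=${4:-}
    if [ -n "$port" ]; then
        printf 'scp://%s@%s:%s/%s\n' "$user" "$host" "$port" "$path"
    else
        printf 'scp://%s@%s/%s\n' "$user" "$host" "$path"
    fi
}
```

A call such as duplicity /etc "$(scp_url dd example.com /var/backup 2222)" would then produce the same target as the command above.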
If you want to store the backup on an FTP server, you need to enter the password for the server in the FTP_PASSWORD environment variable. In the following example, it is 123. For the FTP transmission method, the domain name is followed by a slash:
FTP_PASSWORD=123 duplicity /home/tim ftp://dd@example.com/var/backup
Incidentally, Duplicity also evaluates the FTP_PASSWORD environment variable for an SSH connection. You can thus omit the --ssh-askpass parameter and define the SSH password in the FTP_PASSWORD environment variable. This is especially useful if you want to include Duplicity in a script. If you want Duplicity to create the backup archive on a local storage medium, use the file:// protocol:
duplicity /etc file:///mnt/backup
Duplicity can transmit the backup archive with many other protocols such as Rsync and WebDAV. Additionally, Duplicity can store the backups in various cloud services, including Dropbox, Azure, OpenStack Swift, and Amazon S3, along with a couple of quirky storage options such as sending email. Almost every new release of Duplicity adds new protocols. For a complete and quite long list, check the Duplicity man page. To access this, type man duplicity and look for the "URL format" section. The man page for the current Duplicity version is also available online [1].
For some protocols and services, Duplicity requires additional libraries and tools (see the box "Modules Used"). The backup program prompts for any missing helpers when called. On Linux, there is no need for manual attention for the standard protocols, but this is not true for many cloud services. For example, to access Amazon S3, you need Boto [3] software version 2.0 or newer. For a complete list of all dependencies for all supported services, see the Duplicity man page "Requirements" section.
Storing Encrypted Backups
Duplicity uses symmetric encryption by default; that is, the same password is used to encrypt and decrypt the backup. Alternatively, the tool can use GnuPG public key encryption. Here, each user has two keys: An archive locked with the public key can only be unlocked again with the private key.
If you want to use a new key pair for the backup, create the key before the first backup using gpg --gen-key. To do this, answer the questions posed; if in doubt, leave the fields blank or accept the default settings by pressing Enter (Figure 2). You will need to type the passphrase each time for encryption and decryption. At the end, GPG outputs a key ID, which you will want to remember.
Because the key pair secures your backup, you will want to save it on an external medium. Use the following two commands to create a copy of the public and private keys in the files /mnt/key_pub.gpg and /mnt/key_sec.gpg:
gpg --output /mnt/key_pub.gpg --armor --export Key-ID
gpg --output /mnt/key_sec.gpg --armor --export-secret-keys Key-ID
On another system, or after system recovery, the key can then be reloaded using gpg --import. When creating a new backup, you need to tell Duplicity the key ID of the public key using the --encrypt-key parameter:
duplicity --ssh-askpass --encrypt-key 12345678 /etc scp://dd@example.com//var/backup
Normally, you need to type the passphrase after calling the command. If Duplicity starts working without asking for it, the GPG agent has probably cached the passphrase in the background. Furthermore, Duplicity buffers some metadata in the ~/.cache/duplicity/ directory that it retrieves whenever called.
When you call Duplicity from a script, you can supply the passphrase in the PASSPHRASE environment variable. This method, however, poses a security risk: Anyone who can read the script automatically discovers the passphrase. If you set the environment variable in a script, you should at least explicitly remove it from the environment afterward using unset. You can completely disable encryption with --no-encryption.
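One way to keep the passphrase out of the script itself is to read it from a separate file that only the backup user can read. The following is a minimal sketch under that assumption; the function name with_passphrase and the file path in the usage note are hypothetical.

```shell
# with_passphrase FILE COMMAND [ARGS...]
# Hypothetical helper: exports PASSPHRASE from the first line of FILE,
# runs COMMAND, then removes the variable from the environment again.
with_passphrase() {
    passfile=$1
    shift
    # Fail early if the passphrase file cannot be read.
    PASSPHRASE=$(head -n 1 "$passfile") || return 1
    export PASSPHRASE
    status=0
    "$@" || status=$?
    # Clear the passphrase regardless of how the command ended.
    unset PASSPHRASE
    return "$status"
}
```

It could be used as with_passphrase /root/.backup-passphrase duplicity /etc scp://dd@example.com//var/backup. The file then becomes the secret to protect, for example with chmod 600.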
Incremental Backups
When first called, Duplicity always completely backs up the source directory. Once you invoke the command a second time, Duplicity only backs up the data that has been added or changed since the previous backup (the delta). To do this, Duplicity uses the librsync library, which implements the well-known Rsync algorithm. This approach has the advantage that you can include the Duplicity call in a cron job or startup script, thus ensuring that Duplicity runs regularly and automatically.
Incremental backups save space on the server and can be created much faster. However, if a read error occurs in one of the parts, the subsequent backups will very likely be useless. Moreover, recovery will take longer because Duplicity may first need to look at all the incremental backups. For this reason, you should perform a full backup at regular intervals. You can enforce this by specifying full:
duplicity full /home/tim scp://dd@example.com//var/backup
In this case, full is not a parameter but an action that needs to follow the program name directly. The --full-if-older-than parameter tells Duplicity to create a full backup if the last full backup was created more than a specified period ago (Figure 3), in this example more than one month:
duplicity --full-if-older-than 1M /home/tim scp://dd@example.com//var/backup
You need to leave out the full action in this case; otherwise, it would overrule the --full-if-older-than parameter.
Instead of 1M for a month, you can also specify other periods; for example, 14D is 14 days. The appropriate value depends on your organization's backup strategy.
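For a cron job, the call can be wrapped in a small function so that the period is the only value that changes. This is a sketch, not a finished backup script: the function name backup_home and the DUPLICITY override variable are assumptions introduced here for illustration.

```shell
# backup_home MAX_AGE SOURCE_DIR TARGET_URL
# Hypothetical wrapper for a cron job: runs an incremental backup and lets
# Duplicity switch to a full backup once the last full one is older than
# MAX_AGE (for example 1M or 14D).
# Setting DUPLICITY=echo previews the command line without running it.
backup_home() {
    max_age=$1
    source_dir=$2
    target_url=$3
    "${DUPLICITY:-duplicity}" --full-if-older-than "$max_age" \
        "$source_dir" "$target_url"
}
```

Called as backup_home 14D /home/tim scp://dd@example.com//var/backup, this forces a fresh full backup roughly every two weeks. For unattended runs, the passphrase still has to be supplied, for example through the PASSPHRASE environment variable.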
Duplicity does not pack the data to be backed up into a single huge archive; instead, it distributes the data to several smaller archives. Because these volumes can only grow to a maximum of 25MB by default, numerous small files accumulate over time on the server (Figure 4).
You can change this behavior using the --volsize parameter, which lets you define the maximum size of each volume in megabytes. For example, --volsize 125 increases the size to 125MB. As the volume size increases, however, Duplicity also needs more RAM. You might want to exercise caution when increasing this value.
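To get a feeling for how many volume files a given setting produces, a rough estimate is the data size divided by the volume size, rounded up. The sketch below assumes that the size refers to the compressed, encrypted data and ignores the additional signature and manifest files that Duplicity stores; the function name volume_count is made up for this example.

```shell
# volume_count TOTAL_MB VOLSIZE_MB
# Prints the number of volumes needed for TOTAL_MB of backup data,
# rounding up with integer arithmetic.
volume_count() {
    total_mb=$1
    volsize_mb=$2
    echo $(( ($total_mb + $volsize_mb - 1) / $volsize_mb ))
}
```

With these assumptions, volume_count 10000 25 prints 400, whereas volume_count 10000 125 prints 80.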
Including and Excluding Data
The --exclude parameter lets you specifically leave out a subdirectory from the backup. In the following example, the tool does not back up the subdirectory /home/klaus/Videos:
duplicity --exclude /home/klaus/Videos /home scp://dd@example.com//var/backup
If you want to back up the entire system via the root directory (/), you should at least always exclude /proc, the dynamic filesystem that provides a window into the running kernel. Otherwise, Duplicity is likely to trip over its contents. For each directory to exclude, you must specify the --exclude parameter again. The --include parameter lets you specifically include certain subdirectories. This example command
duplicity --include /home --include /etc --exclude / / scp://dd@example.com//var/backup
exclusively backs up the /home and /etc directories.
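Because every excluded directory needs its own --exclude parameter, a script can build the option list from a plain list of directories. The following sketch does this in portable shell; the function name backup_root and the DUPLICITY override variable are assumptions made for illustration.

```shell
# backup_root TARGET_URL [DIR_TO_EXCLUDE...]
# Hypothetical helper: backs up / and adds one --exclude option per
# directory given after the target URL.
# Setting DUPLICITY=echo previews the command line without running it.
backup_root() {
    target_url=$1
    shift
    # Rotate through the remaining arguments, appending "--exclude DIR"
    # for each one, so that "$@" ends up holding only the options.
    n=$#
    while [ "$n" -gt 0 ]; do
        dir=$1
        shift
        set -- "$@" --exclude "$dir"
        n=$(($n - 1))
    done
    "${DUPLICITY:-duplicity}" "$@" / "$target_url"
}
```

For example, backup_root scp://dd@example.com//var/backup /proc /home/klaus/Videos expands to a duplicity call with two --exclude parameters in front of the source directory.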
Easy Recovery
You can restore a backup by swapping the source and destination arguments. The following example restores the backup stored in /var/backup on the server example.com to the /home/tim/restore directory:
duplicity scp://dd@example.com//var/backup /home/tim/restore
On request, Duplicity even restores a single file. The parameter responsible for this, --file-to-restore, expects the relative path to the file in which you are interested. For example, if you are backing up the /home/klaus directory, you can restore the letter.txt file originally stored in /home/klaus/Documents with the following command:
duplicity --file-to-restore Documents/letter.txt scp://dd@example.com//var/backup letter_alt.txt
At the end of the call, Duplicity does not expect the directory in which to restore the file but rather a file name. In the preceding example, the tool retrieves the file letter.txt from the backup and stores it in the current directory as letter_alt.txt. The list-current-files action lists all the files in a backup:
duplicity list-current-files scp://dd@example.com//var/backup
Using the --time parameter, you can even revert to a certain file version. The following example retrieves exactly the version of the letter.txt file that was stored in the backup seven days earlier. This assumes that Duplicity created a backup seven days ago:
duplicity --time 7D --file-to-restore Documents/letter.txt scp://dd@example.com//var/backup letter_alt.txt
Alternatively, you can also specify a specific date; for example, --time 2015/9/10 accesses the backup from September 10, 2015.
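The single-file restore can be wrapped in a function that also catches a common mistake, namely passing an absolute path where a path relative to the backed-up directory is expected. This is a sketch: restore_file and the DUPLICITY override variable are hypothetical names, and the option is written --file-to-restore as in the Duplicity man page.

```shell
# restore_file AGE REL_PATH BACKUP_URL OUT_FILE
# Hypothetical helper: restores one file as it was AGE ago (for example 7D
# or a date such as 2015/9/10) and writes it to OUT_FILE.
# Setting DUPLICITY=echo previews the command line without running it.
restore_file() {
    age=$1
    rel_path=$2
    backup_url=$3
    out_file=$4
    # The path must be relative to the directory that was backed up.
    case $rel_path in
        /*)
            echo "path must be relative: $rel_path" >&2
            return 2
            ;;
    esac
    "${DUPLICITY:-duplicity}" --time "$age" --file-to-restore "$rel_path" \
        "$backup_url" "$out_file"
}
```

A call such as restore_file 7D Documents/letter.txt scp://dd@example.com//var/backup letter_alt.txt corresponds to the command shown above.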
Ensuring Data Integrity
The verify action compares the data on the hard drive with the last backup. This tells you which data has changed and whether an unauthorized third party has changed the backup:
duplicity verify --compare-data --ssh-askpass scp://dd@example.com//var/backup /home/tim
The --compare-data parameter ensures that Duplicity also compares the contents of the files (Figure 5). This takes longer, but without specifying the option, Duplicity 0.7 did not always deliver reliable information in my tests. You can write the report generated by verify to the duplog.txt text file using the --log-file duplog.txt parameter. In this way, you can include the check in a cron job and then selectively analyze the logfile with a monitoring tool or have it sent via email.
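A cron-friendly check might look like the following sketch. It assumes, as the Duplicity man page describes, that verify ends with a non-zero exit status when files differ or cannot be verified; the function name verify_backup and the DUPLICITY override variable are made up for this example.

```shell
# verify_backup BACKUP_URL SOURCE_DIR LOG_FILE
# Hypothetical helper: runs verify with --compare-data, writes the report
# to LOG_FILE, and passes Duplicity's exit status on to the caller.
verify_backup() {
    backup_url=$1
    source_dir=$2
    log_file=$3
    if "${DUPLICITY:-duplicity}" verify --compare-data \
        --log-file "$log_file" "$backup_url" "$source_dir"; then
        echo "backup verified: $backup_url"
    else
        status=$?
        echo "verify failed with status $status, see $log_file" >&2
        return "$status"
    fi
}
```

Because the function returns the exit status, a monitoring tool or a mail command in the cron job can react to a failed check without parsing the logfile first.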
Conclusions
On closer inspection, Duplicity turns out to be a full-featured backup tool. Although you can quickly and easily back up a single directory, more complex situations require more parameters, which can lead to a very complex command line. The documentation is also restricted to the man page, although it is extremely detailed. On a positive note, Duplicity can be integrated into your own shell scripts. Thanks to automatic encryption, you can also store backups in the cloud, even if this is sensitive data for which privacy policies need to be observed.