Nuts and Bolts rdiff-backup and rsnapshot Lead image: Lead Image © Dmitry Pichugin, Fotolia.com

Backups using rdiff-backup and rsnapshot

Brothers

The easier it is to back up and restore data, the better. Mature Linux tools show that performing regular, automated backups doesn't have to be a pain.

By Georg Schönberger

The first step in ensuring comprehensive backups is to consider where the backups should be stored. Often, a separate backup server is used that connects to the other computers and initiates the backups. Alarm bells will be ringing for security-conscious administrators at this point: The backup server can connect to all the other machines! Safeguarding the backup server and its connection scheme is therefore extremely important, not least because the production data of all systems ends up on the backup server.

Automated backups in Linux usually require a user who connects to the system to be backed up using public key authentication. Two security aspects are critical: First, the user needs root rights on the target system to be able to back up all the data, and, second, the private SSH keys used for automation are not password protected. In this article, I show how to counteract these weak points with a few simple restrictions on what the backup user's key and account are allowed to do.

rdiff-backup vs. rsnapshot

The two command-line tools rdiff-backup and rsnapshot are well-known backup programs in Linux. After initial configuration, their simplicity and reliability are very impressive. Table 1 shows the most important functions for both tools and provides some initial information about backup concepts.

Table 1: rdiff-backup and rsnapshot Differences

| | rdiff-backup | rsnapshot |
|---|---|---|
| Programming language | Mainly Python | Completely in Perl |
| Data transfer | Uses librsync | Uses rsync |
| Data storage | Old versions are saved as increments or deltas to the current version. | Files that don't change are stored as hard links using snapshots. |
| Data access | The last data version (mirror) can be accessed immediately; older versions can be restored via increments. | All snapshot data can be accessed immediately. |
| Removing backups | Backups can be removed using --remove-older-than (i.e., versions that are older than a certain time). | Backups run at certain intervals (e.g., daily or weekly); retain controls which type of snapshot is retained for how long. |

Rdiff-backup, as the name suggests, saves the delta between current data and an old version as a reverse diff. If a file changes, only the changes to the previous version are stored in a backup. The current data version or mirror can then be used straightaway. Older versions are computed from the diffs.

Rsnapshot takes another path: If a file doesn't change between two snapshots, it simply creates another hard link to the file. Identical files then don't take up any more space than needed. Unlike rdiff-backup, there is no diff calculation. If a file changes, it is stored in full in the next snapshot.

Data Backup Using rdiff-backup

Backups using rdiff-backup are created based on a source and a target directory. The following example backs up the /etc directory to /mnt/backup:

# rdiff-backup /etc /mnt/backup
# ls /mnt/backup/hosts
/mnt/backup/hosts

Forward slashes at the end of directory names (trailing slashes) are ignored, so it doesn't matter whether you use them here or not. However, in rsnapshot, you have to use trailing slashes in the rsnapshot.conf file. The example above also shows that the files are located directly below /mnt/backup/: /etc/hosts was backed up to /mnt/backup/hosts. You need to sort out subdirectories yourself.

Rdiff-backup does not provide a progress bar during the backup, but verbosity levels are available for anyone who wants to know what is being backed up at the time. Level 5 reports files that have changed, whereas level 6 lists every processed file:

# rdiff-backup -v5 /etc/ /mnt/backup
[...]
Incrementing mirror file /mnt/backup
Processing changed file X11
Incrementing mirror file /mnt/backup/X11
Processing changed file X11/Xreset
[...]

The --compare function is also very useful; it performs a kind of trial run and lists the files that have changed. In this way, you know in advance about data that would have been backed up:

# rdiff-backup --compare /etc/ /mnt/backup
changed: .
 changed: hosts
 changed: mtab

To perform another backup, you just need to execute the same command again. Continuous backups have the advantage of allowing you to access different versions of data by backup time (Listing 1). The --list-increments option displays how many backups are available and when they were made. The current version is listed in the Current mirror line, and its data can be accessed as normal files.

Listing 1: Incremental Backups

# rdiff-backup /etc /mnt/backup
# rdiff-backup --list-increments /mnt/backup/
Found 2 increments:
   increments.2015-03-15T09:15:19+01:00.dir Sun Mar 15 09:15:19 2015
   increments.2015-03-19T20:15:46+01:00.dir Thu Mar 19 20:15:46 2015
Current mirror: Sat Mar 21 08:43:49 2015

Metadata and increments or diffs are in the rdiff-backup-data directory. It is at least as important as the remaining backup data. After all, the increments are responsible for letting you restore data from previous backups. You thus also need to think about backing up your backups. If the backup system bites the dust, or something goes wrong with rdiff-backup, this mustn't become a huge problem.

Excluding files

Excluding files (excludes) from a backup is just as much an advantage as being able to include them. The simplest way is to pass in the files to be excluded to rdiff-backup using --exclude:

# rdiff-backup --exclude /etc/ld.so.cache /etc /mnt/backup
# ls /mnt/backup/ld.so.cache
ls: cannot access /mnt/backup/ld.so.cache: No such file or directory

This is just the easiest approach to excluding. Shell patterns, regular expressions, and exclude lists are also supported. An exclude list initially consists of the paths for the files to be excluded:

# cat exclude-list
/etc/wpa_supplicant
/etc/dump

This list then serves as a parameter for the --exclude-filelist,

# rdiff-backup --exclude-filelist exclude-list /etc/ /mnt/backup

or --exclude-globbing-filelist options. Globbing lists allow the use of patterns [1].
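
As a rough illustration of a globbing list, the hypothetical file below excludes every path under /etc that ends in .bak, along with the two entries from the previous list. The file name exclude-globs and the .bak pattern are assumptions made for this example. In rdiff-backup's pattern syntax, * matches any characters except a slash, whereas ** also matches slashes:

```
# cat exclude-globs
/etc/**.bak
/etc/wpa_supplicant
/etc/dump
# rdiff-backup --exclude-globbing-filelist exclude-globs /etc/ /mnt/backup
```

Running --compare with the same exclude option first is a cheap way to check that the patterns match what you expect.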

Performing a Restore

Restoring a data version from the last backup using the current mirror is really easy. The best option is to use the cp command with the archive option:

# cp -a /mnt/backup/shadow /etc/shadow

However, rdiff-backup's strengths only become evident when older versions of files need to be restored. This requires regular backups and an associated increment at the desired time. The following example restores the /etc/hosts file exactly as it was when the backup was performed on March 19, 2015 (at 20:15:46 hours; see --list-increments):

# rdiff-backup /mnt/backup/rdiff-backup-data/increments/hosts.\
  2015-03-19T20\:15\:46+01\:00.diff.gz /tmp/hosts

You can also specify a point in time when restoring instead of naming an increment file:

# rdiff-backup -r 7D /mnt/backup/hosts /tmp/hosts

This means you receive the file in the state it was in seven days ago. If you don't have a backup from this time, rdiff-backup selects the previous increment (e.g., the one eight days ago). You can find more information about this on the man page for rdiff-backup in the TIME FORMATS and RESTORING sections.
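
Besides intervals such as 7D, the time formats accepted by rdiff-backup include absolute timestamps (e.g., 2015-03-19T20:15:46+01:00) and session counts (e.g., 3B for the third-newest increment). The sketch below wraps the restore call in a small helper function. The function name restore_as_of and the RDIFF_BACKUP variable are hypothetical; they exist only so that the real command can be swapped for a stub during a dry run.

```shell
#!/bin/sh
# Hypothetical helper, not part of rdiff-backup itself.
# RDIFF_BACKUP can be pointed at a stub command for a dry run.
RDIFF_BACKUP="${RDIFF_BACKUP:-rdiff-backup}"

# restore_as_of TIME SOURCE TARGET
#   TIME   - an rdiff-backup time spec, e.g. 7D, 3B, or
#            2015-03-19T20:15:46+01:00
#   SOURCE - path inside the backup directory, e.g. /mnt/backup/hosts
#   TARGET - where the restored copy is written, e.g. /tmp/hosts
restore_as_of() {
    if [ "$#" -ne 3 ]; then
        echo "usage: restore_as_of TIME SOURCE TARGET" >&2
        return 2
    fi
    "$RDIFF_BACKUP" -r "$1" "$2" "$3"
}
```

For example, restore_as_of 2015-03-19T20:15:46+01:00 /mnt/backup/hosts /tmp/hosts would request the state recorded by the March 19 backup from Listing 1.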

The rdiff-backup-fs command is a useful alternative for restoring files. It mounts a backup directory and presents the individual increments as directories. Listing 2 shows that each backup performed appears as a separate directory.

Listing 2: rdiff-backup-fs

# TZ='+1' rdiff-backup-fs /mnt/fuse /mnt/backup/
# ls /mnt/fuse/
2015-03-15T09:15:19 2015-03-15T09:50:34 2015-03-19T20:15:46
2015-03-21T09:45:05 2015-03-15T09:28:48 2015-03-15T10:44:06
2015-03-21T08:43:49

Make sure the TZ variable is set; otherwise, the directory names won't agree with the backup times. For practical purposes, you will see immediately from the directories the state of the data at a specific time. To restore a file, simply go to a directory and copy it to the desired location.

Removing Backups

The idea of time periods comes up again when you start removing old backups. You can't delete one specific increment, but you can delete all those older than five days, for example:

# rdiff-backup --remove-older-than 5D /mnt/backup

A warning is output if the time period selected would remove multiple increments. If you're sure that the increments to be deleted are correct, you can delete them using the --force option.

Statistics

In a production environment, statistics on data volumes and changes make it easier to dimension backup servers and provide information about data growth. Rdiff-backup logs more than 15 values for each backup process and writes them to separate session statistics files. You can display the values directly by calling --print-statistics, analyze the rdiff-backup-data/session_statistics files, or use the rdiff-backup-statistics program.

Say you unwittingly back up a new file and want to find out which one. The statistics file first tells you what data volume has changed in the mirror:

# grep TotalDestinationSizeChange /mnt/backup/rdiff-backup-data/\
  session_statistics.2015-03-15T10\:44\:06+01\:00.data
TotalDestinationSizeChange 41943106 (40.0 MB)

Another view of the file statistics lists which file effected the change:

# gunzip -c /mnt/backup/rdiff-backup-data/file_statistics.\
  2015-03-15T10\:44\:06+01\:00.data.gz | awk '$2==1{print}' | sort -k3nr,3
dump/data-dump 1 41943040 NA 0
dump 1 0 0 NA
samba 1 0 0 NA
samba/dhcp.conf 1 0 0 66

The columns for the file statistics are FileName, Changed, SourceSize, MirrorSize, and IncrementSize. Because dump/data-dump is a new file, Changed is 1, MirrorSize is NA (the file hasn't been mirrored before), and IncrementSize is 0. Its SourceSize accounts for almost the whole TotalDestinationSizeChange, so this file is what takes up the additional storage space in the backup.
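
The same lookup can be wrapped in a small filter. This is a minimal sketch that assumes the five-column layout just described and file names without whitespace; the function name largest_changed is made up for this example.

```shell
#!/bin/sh
# Hypothetical filter: read file statistics lines on stdin and print the
# changed entry (column 2 is 1) with the largest SourceSize (column 3).
# Assumes file names contain no whitespace; NA sizes count as 0.
largest_changed() {
    awk '
        $2 == 1 {
            size = ($3 == "NA") ? 0 : $3 + 0
            if (!found || size > max) { max = size; name = $1; found = 1 }
        }
        END { if (found) print name, max }
    '
}
```

Piping the gunzip -c output of a file_statistics file into largest_changed prints one line; for the four entries shown earlier, that line is dump/data-dump 41943040.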

Using rsnapshot

The idea behind rsnapshot is clear and effective: Snapshots of the data are created at specified times. Although there are no increments in rsnapshot, data is not simply copied from snapshot to snapshot. Instead, unchanged files are hard linked. A hard link is a directory entry that points to a file's inode in the filesystem. Each file may have several such links, but it only occupies the storage space once.

In rsnapshot, you can check the use of hard links yourself as in Listing 3. The /etc/hosts file remained unchanged in the two snapshots: hourly.2 and hourly.1. They therefore share inode 28147. A new inode was allocated to the file in the latest backup hourly.0 because it had changed. For the data transfer itself, rsnapshot uses rsync, which synchronizes changes from A to B efficiently with its delta copy mechanism [2].

Listing 3: rsnapshot Hard Links

# ls -li hourly.*/localhost/etc/hosts
28589 -rw-r--r-- 1 root root 209 Mar 13 11:30 hourly.0/localhost/etc/hosts
28147 -rw-r--r-- 2 root root 186 Jul 10 2014 hourly.1/localhost/etc/hosts
28147 -rw-r--r-- 2 root root 186 Jul 10 2014 hourly.2/localhost/etc/hosts
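
The effect is easy to reproduce without rsnapshot. The sketch below imitates two snapshots in a scratch directory using plain ln; the directory and file names are placeholders, and it only demonstrates the hard link principle, not rsnapshot's actual rotation logic.

```shell
#!/bin/sh
# Scratch directory so nothing outside of it is touched.
demo=$(mktemp -d)
mkdir "$demo/hourly.1" "$demo/hourly.0"

# The older snapshot holds the original file.
echo "127.0.0.1 localhost" > "$demo/hourly.1/hosts"

# Unchanged file: the newer snapshot gets a hard link, not a copy.
ln "$demo/hourly.1/hosts" "$demo/hourly.0/hosts"

# Helper: print the inode number of a file.
inode_of() { ls -i "$1" | awk '{print $1}'; }

inode_old=$(inode_of "$demo/hourly.1/hosts")
inode_new=$(inode_of "$demo/hourly.0/hosts")
echo "unchanged: $inode_old $inode_new"      # same inode twice

# Changed file: replace the link with a fresh file, which leaves the
# older snapshot untouched.
rm "$demo/hourly.0/hosts"
echo "127.0.0.1 localhost newname" > "$demo/hourly.0/hosts"
inode_changed=$(inode_of "$demo/hourly.0/hosts")
echo "changed:   $inode_old $inode_changed"  # two different inodes
```

Remove the scratch directory with rm -rf "$demo" when you are done.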

Configuration

The main configuration file in rsnapshot, /etc/rsnapshot.conf, has many configuration options – initially, you can limit the focus to the most important settings (Table 2).

Table 2: Important rsnapshot Backup Options

| Option | Purpose |
|---|---|
| snapshot_root | Root directory for storing snapshots. |
| no_create_root | If this option is 1, rsnapshot doesn't automatically create the root directory. This is useful, for example, if mounting snapshot_root failed. |
| retain | The retain lines control which type of snapshot is retained for how long. |
| logfile | Listed commands are logged in logfile. |
| backup | The backup lines define the directories to be backed up. It is also possible to use a remote path (accessible via SSH). |
| cmd_ssh | Path to the SSH program. |

Parameters in the configuration are always separated by tab characters. You are best off running a configtest, as follows:

# rsnapshot configtest
 Syntax OK

to make sure you have set everything up correctly.

Regular Snapshots

The following example shows an rsnapshot configuration for backing up the /etc directory at hourly, daily, and weekly intervals:

retain    hourly       6
retain    daily        7
retain    weekly       4
backup    /etc/        localhost/

Using these settings, you still need to make sure rsnapshot is called regularly, because rsnapshot synchronizes and rotates data but doesn't schedule its own runs.

The cron service, which arranges continuous backups, is in charge of this task:

# vi /etc/cron.d/rsnapshot
0 */4 * * * root /usr/bin/rsnapshot hourly
30 3 * * * root /usr/bin/rsnapshot daily
0 3 * * 1 root /usr/bin/rsnapshot weekly

The entries shown here make sure that an "hourly backup" is performed every four hours (which means six snapshots are kept), a daily backup runs every day at 3:30am (seven snapshots are kept), and a weekly backup starts at 3:00am every Monday (four snapshots are available).
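
To see roughly how far back each level reaches, multiply the retain count by the interval between runs. The following lines merely do that arithmetic for the values used above; the variable names are illustrative only.

```shell
#!/bin/sh
# Retain counts from rsnapshot.conf and run intervals from the cron file.
hourly_retain=6
hourly_interval_h=4     # cron runs the "hourly" level every four hours
daily_retain=7          # one run per day
weekly_retain=4         # one run per week

hourly_span_h=$((hourly_retain * hourly_interval_h))
echo "hourly snapshots cover about $hourly_span_h hours"
echo "daily snapshots cover about $daily_retain days"
echo "weekly snapshots cover about $weekly_retain weeks"
```

With these settings, the hourly level spans about a day, the daily level about a week, and the weekly level about four weeks.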

Running backups regularly using cron is equally suitable for rdiff-backup: Just create a cron job that calls rdiff-backup at the desired intervals.
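
Such a job could call a small wrapper that runs the backup and then prunes old increments. The script below is a sketch only: The function name, the retention period in the usage note, and the RDIFF_BACKUP override are assumptions for illustration, whereas --remove-older-than and --force are the options described earlier.

```shell
#!/bin/sh
# Hypothetical wrapper for a cron job; adjust paths and retention to taste.
RDIFF_BACKUP="${RDIFF_BACKUP:-rdiff-backup}"

# backup_and_prune SOURCE DEST KEEP
#   Backs up SOURCE into DEST, then removes increments older than KEEP
#   (e.g. 4W). Pruning is skipped if the backup itself failed.
backup_and_prune() {
    src=$1
    dest=$2
    keep=$3
    "$RDIFF_BACKUP" "$src" "$dest" || return 1
    # --force is needed when more than one increment would be removed.
    "$RDIFF_BACKUP" --remove-older-than "$keep" --force "$dest"
}
```

A script that ends with backup_and_prune /etc /mnt/backup 4W could then be referenced from a file in /etc/cron.d in the same way as the rsnapshot entries.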

Backups via SSH

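Performing backups from a production system to a backup server over the network via SSH has the advantage of encrypted data communication. Based on the direction of the backup, a distinction is made between pull backups, in which the backup server connects to the production system and fetches the data, and push backups, in which the production system sends its data to the backup server.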

The security measures referred to in the first section relate mainly to pull backups but won't be detrimental for push backups, either. An effective authorized_keys configuration limits the backup user's options:

# cat .ssh/authorized_keys
command="/usr/bin/python /usr/bin/rdiff-backup --server",\
  no-agent-forwarding,no-port-forwarding,no-user-rc,\
  no-X11-forwarding,no-pty ssh-rsa AAAAB3NzaC1y[...]

Finding the right command for the command parameter can be difficult [3].
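
Because the option string is long and easy to mistype, it can help to generate the line. The helper below is a hypothetical sketch: It takes a forced command and a public key as arguments and prints one authorized_keys line with the restrictions shown above. The function name is invented here, and the forced command has to match your own installation.

```shell
#!/bin/sh
# restricted_key_line FORCED_COMMAND PUBLIC_KEY
#   Prints a single authorized_keys line that pins PUBLIC_KEY to
#   FORCED_COMMAND and disables forwarding, user rc files, and PTYs.
#   FORCED_COMMAND must not contain double quotes (no escaping is done).
restricted_key_line() {
    forced=$1
    pubkey=$2
    opts="no-agent-forwarding,no-port-forwarding,no-user-rc"
    opts="$opts,no-X11-forwarding,no-pty"
    printf 'command="%s",%s %s\n' "$forced" "$opts" "$pubkey"
}
```

The output can be appended to the backup user's .ssh/authorized_keys file, for example with restricted_key_line "/usr/bin/rdiff-backup --server" "$(cat backup_key.pub)", where backup_key.pub is a placeholder for the backup server's public key file.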

To back up system data as well, the backup user needs root privileges on the server to be backed up. A matching sudo configuration avoids the use of a genuine root account:

# vi /etc/sudoers.d/rdiff
rdiff ALL=(root) NOPASSWD: /usr/bin/rdiff-backup

The configuration allows the rdiff user to run the /usr/bin/rdiff-backup command with root privileges and without a password. However, rdiff cannot run any other commands with sudo. During the SSH backup, you then need to ensure that the backup command proper is preceded by sudo:

# rdiff-backup --remote-schema 'ssh -C %s sudo rdiff-backup --server' \
  rdiff@192.168.56.105::/etc /mnt/backup

Backups via SSH are part of the rdiff-backup standard repertoire. It supports both pull and push backups equally. The first line below shows the pull variant, and the second uses push mode:

rdiff-backup rdiff@192.168.56.1::/etc /mnt/backup
rdiff-backup /etc/ rdiff@192.168.56.105::/mnt/backup

Rsnapshot only supports pull backups in its configuration file:

backup rsnap@192.168.56.1:/etc remoteA/

By default, it does not support push backups via SSH. In some situations, push backups are essential – for example, if the firewall rules only allow data to travel from the server to the backup server. In this case, you can use rsync and an intelligent configuration to create push backups with rsnapshot:

  1. On the backup server, an rsync daemon including a configuration is assigned to the server's public key; the authorized_keys file takes care of this.
  2. The rsync daemon's configuration on the backup server states the backup path. It uses post-xfer exec to call a script containing rsnapshot.
  3. To manage multiple servers, a configuration independent of /etc/rsnapshot.conf is created; it defines the backup schema.
  4. When the server wants to trigger a backup, it uses rsync and SSH to sync the data with the backup server.

Through the authorized_keys entry, the rsync daemon and its post-xfer exec script (step 2) are triggered, and a snapshot is created automatically.
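
Put together, the pieces might look roughly like the following. Everything specific here is an assumption made for illustration: the module name, the host name, the paths, the script name, and the separate rsnapshot configuration file. Only the mechanisms (a forced command in authorized_keys, an rsync daemon started over SSH, and post-xfer exec) come from the steps above.

```
# --- backup server: ~/.ssh/authorized_keys (one line per client key) ---
command="rsync --server --daemon --config=/home/rsnap/rsyncd-serverA.conf .",no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-pty ssh-rsa AAAAB3NzaC1y[...]

# --- backup server: /home/rsnap/rsyncd-serverA.conf ---
use chroot = false
[push]
    path = /srv/push/serverA
    read only = false
    post-xfer exec = /usr/local/bin/snapshot-serverA.sh

# --- backup server: /usr/local/bin/snapshot-serverA.sh ---
#!/bin/sh
# Take a snapshot only if the transfer succeeded.
[ "$RSYNC_EXIT_STATUS" = "0" ] || exit 1
exec /usr/bin/rsnapshot -c /etc/rsnapshot-serverA.conf hourly

# --- server to be backed up: push /etc to the module ---
rsync -a -e ssh /etc/ rsnap@backupserver::push/etc/
```

In this sketch, the per-server rsnapshot configuration from step 3 would list /srv/push/serverA/ as its backup source.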

The drawback with this solution turns out to be the detour via rsync. The copy synchronized via rsync acts as the source for rsnapshot. In addition to the snapshots, the backed-up data thus also exists in another instance, which is not very efficient in terms of space usage. Solutions for this waste of space are currently being discussed in the rsync mailing list [4].

Conclusions

The Linux on-board tools rdiff-backup and rsnapshot both have their raisons d'être. Incremental backups in rdiff-backup have benefits and drawbacks. The good thing is their efficiency in terms of hard disk capacity, but the computational and time overheads are disadvantageous. If a large number of diffs has to be applied (e.g., for an older version of a MySQL dump), a restore can take a considerable amount of time.

Rsnapshot impresses with its transparency: Each snapshot contains the files at the time of backup. Compared with rdiff-backup, however, it is unable to compute deltas; the next snapshot thus contains a complete version of a modified file. For a list of other benefits and drawbacks, refer to Table 3. At the end of the day, both programs do a reliable job of backing up data; which you decide to use is a matter of choice.

Table 3: Benefits and Drawbacks

| | rdiff-backup | rsnapshot |
|---|---|---|
| Benefits | Increments store data very efficiently | Each snapshot is accessible as an ordinary directory |
| | Detailed logs and statistics data | Hard link mechanism works simply and is fast |
| Drawbacks | Increments cause more overhead | Limited use for files that frequently change |
| | No deleting of a single increment (cf. --remove-older-than) | Push backups via SSH involve a detour |