Backups using rdiff-backup and rsnapshot
Brothers
The first step in ensuring comprehensive backups is to consider where the backups should be stored. Often, a separate backup server connects to the other computers and initiates the backups. Alarm bells will be ringing for security-conscious administrators at this point: the backup server can connect to all the other machines! Safeguarding the backup server and its connection scheme is therefore extremely important, not least because the production data for all systems resides on the backup server.
Automated backups in Linux usually require a user who connects to the system to be backed up using public key authentication. Two security aspects are critical: First, the user needs root rights for the target system to be able to back up all the data, and, second, the private SSH keys for automation are not password protected. In this article, I provide a detailed set of instructions for how to counteract these weak points using the following simple restrictions:
- Create a separate key pair for the backup user and limit the permitted commands on the systems to be backed up using authorized_keys.
- Create a sudo configuration for the backup user that only allows the backup program (rdiff-backup or rsnapshot) to dispense with a password entry.
rdiff-backup vs. rsnapshot
The two command-line tools rdiff-backup and rsnapshot are well-known backup programs in Linux. After initial configuration, their simplicity and reliability are very impressive. Table 1 shows the most important functions for both tools and provides some initial information about backup concepts.
Table 1: rdiff-backup and rsnapshot Differences
| | rdiff-backup | rsnapshot |
|---|---|---|
| Programming language | Mainly Python | Completely in Perl |
| Data transfer | Uses librsync | Uses rsync |
| Data storage | Old versions are saved as increments or deltas to the current version. | Files that don't change are stored as hard links using snapshots. |
| Data access | The last data version (mirror) can be accessed immediately; older versions can be restored via increments. | All snapshot data can be accessed immediately. |
| Removing backups | Backups can be removed using --remove-older-than. | Backups run at certain intervals (e.g., daily or weekly); the retain settings determine how many snapshots are kept. |
Rdiff-backup, as the name suggests, saves the delta between current data and an old version as a reverse diff. If a file changes, only the changes to the previous version are stored in a backup. The current data version or mirror can then be used straightaway. Older versions are computed from the diffs.
Rsnapshot takes another path: If a file doesn't change between two snapshots, it simply creates another hard link to the file. Identical files then don't take up any more space than needed. Unlike rdiff-backup, rsnapshot performs no diff calculation. If a file changes, it is stored in full in the next snapshot.
Data Backup Using rdiff-backup
Backups using rdiff-backup are created based on a source and a target directory. The following example backs up the /etc directory to /mnt/backup:
# rdiff-backup /etc /mnt/backup
# ls /mnt/backup/hosts
/mnt/backup/hosts
Forward slashes at the end of directory names (trailing slashes) are ignored, so it doesn't matter whether you use them here or not. However, in rsnapshot, you have to use trailing slashes in the rsnapshot.conf file. The example above also shows that the files are located directly below /mnt/backup/: /etc/hosts was backed up to /mnt/backup/hosts. You need to sort out subdirectories yourself.
Rdiff-backup does not provide a progress bar during the backup, but verbosity levels are there for anyone who wants to know what is being backed up at the time. Level 5 reports which files have changed, whereas level 6 lists every processed file:
# rdiff-backup -v5 /etc/ /mnt/backup
[...]
Incrementing mirror file /mnt/backup
Processing changed file X11
Incrementing mirror file /mnt/backup/X11
Processing changed file X11/Xreset
[...]
The --compare function is also very useful; it performs a kind of trial run and lists the files that have changed. In this way, you know in advance which data would be backed up:
# rdiff-backup --compare /etc/ /mnt/backup
changed: .
changed: hosts
changed: mtab
To perform another backup, you just need to execute the same command again. Continuous backups have the advantage of allowing you to access different versions of data by backup time (Listing 1). The --list-increments option displays how many backups are available and when they were made. The current version is listed in the "Current mirror" line, and its data can be accessed as normal files.
Listing 1: Incremental Backups
# rdiff-backup /etc /mnt/backup
# rdiff-backup --list-increments /mnt/backup/
Found 2 increments:
    increments.2015-03-15T09:15:19+01:00.dir   Sun Mar 15 09:15:19 2015
    increments.2015-03-19T20:15:46+01:00.dir   Thu Mar 19 20:15:46 2015
Current mirror: Sat Mar 21 08:43:49 2015
Metadata and increments (diffs) are stored in the rdiff-backup-data directory, which is at least as important as the remaining backup data. After all, the increments are what let you restore data from previous backups. You thus also need to think about backing up your backups, so that a failure of the backup system or a problem with rdiff-backup does not turn into a disaster.
Excluding files
Excluding files (excludes) from a backup is just as much an advantage as being able to include them. The simplest way is to pass the files to be excluded to rdiff-backup using --exclude:
# rdiff-backup --exclude /etc/ld.so.cache /etc /mnt/backup
# ls /mnt/backup/ld.so.cache
ls: cannot access /mnt/backup/ld.so.cache: No such file or directory
This is just the easiest approach to excluding. Shell patterns, regular expressions, and exclude lists are also supported. An exclude list initially consists of the paths for the files to be excluded:
# cat exclude-list
/etc/wpa_supplicant
/etc/dump
This list then serves as a parameter for the --exclude-filelist or --exclude-globbing-filelist option; globbing lists additionally allow the use of patterns [1]:
# rdiff-backup --exclude-filelist exclude-list /etc/ /mnt/backup
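A globbing list might look like the following sketch. The file location and both patterns are made up for illustration; according to the rdiff-backup man page, * matches within a single path component, whereas ** also matches across slashes. The final command is only echoed so that nothing is backed up by accident.

```shell
#!/bin/sh
# Hypothetical globbing exclude list; file location and patterns are examples only.
globlist=$(mktemp)
cat > "$globlist" <<'EOF'
/etc/**.bak
/etc/ssl/private
EOF

# Show the resulting call; remove the leading "echo" to run it for real.
echo rdiff-backup --exclude-globbing-filelist "$globlist" /etc /mnt/backup
```

Combining the list with --compare first is a cheap way to check that the patterns exclude what you expect.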
Performing a Restore
Restoring a data version from the last backup using the current mirror is really easy. The best option is to use the cp command with the archive option:
# cp -a /mnt/backup/shadow /etc/shadow
However, rdiff-backup's strengths only become evident if older versions of files need to be restored. This requires regular backups and an associated increment at the desired time. The following example restores the /etc/hosts file exactly as it was when the backup was performed on March 19, 2015 (at 20:15:46 hours, see --list-increments):
# rdiff-backup /mnt/backup/rdiff-backup-data/increments/hosts.\
2015-03-19T20\:15\:46+01\:00.diff.gz /tmp/hosts
You can also specify a point in time when restoring instead of naming an increment file:
# rdiff-backup -r 7D /mnt/backup/hosts /tmp/hosts
This means you receive the file in the state it was in seven days ago. If you don't have a backup from this time, rdiff-backup selects the previous increment (e.g., the one eight days ago). You can find more information about this on the man page for rdiff-backup in the TIME FORMATS and RESTORING sections.
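The time argument accepts more than intervals. The sketch below lists a few variants described in the TIME FORMATS section of the man page, using the classic command-line syntax shown throughout this article. The target file names and the DRY_RUN helper are assumptions of this example, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: restore /etc/hosts as of different points in time.
# DRY_RUN and run() are helpers invented for this example.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Interval: state seven days ago (as in the article)
run rdiff-backup -r 7D /mnt/backup/hosts /tmp/hosts.7d
# Calendar date: state at midnight on that day
run rdiff-backup -r 2015-03-19 /mnt/backup/hosts /tmp/hosts.date
# Session count: state as of the third-newest increment
run rdiff-backup -r 3B /mnt/backup/hosts /tmp/hosts.3b
```

Restoring to a scratch location such as /tmp first, and copying the file into place only after checking it, avoids overwriting a good file with the wrong version.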
The rdiff-backup-fs command is a useful alternative for restoring files. It mounts a backup directory and presents the individual increments as directories. Listing 2 shows that each backup performed appears as a separate directory.
Listing 2: rdiff-backup-fs
# TZ='+1' rdiff-backup-fs /mnt/fuse /mnt/backup/
# ls /mnt/fuse/
2015-03-15T09:15:19  2015-03-15T09:50:34  2015-03-19T20:15:46  2015-03-21T09:45:05
2015-03-15T09:28:48  2015-03-15T10:44:06  2015-03-21T08:43:49
Make sure the TZ variable is set; otherwise, the directory names won't agree with the backup times. The directories show you immediately what state the data was in at a specific time. To restore a file, simply go to a directory and copy it to the desired location.
Removing Backups
The idea of time periods comes up again when you start removing old backups. You can't delete a specific increment, but you can delete all those older than five days, for example:
# rdiff-backup --remove-older-than 5D /mnt/backup
A warning is output if the time period selected would remove multiple increments. If you're sure that the increments to be deleted are correct, you can delete them using the --force option.
Statistics
In production environments, statistics on data volumes and changes make it easier to dimension backup servers and provide information about data growth. Rdiff-backup logs more than 15 values for each backup process and writes them to separate session statistics files.
You can display the values directly by calling --print-statistics, analyze the rdiff-backup-data/session_statistics files, or use the rdiff-backup-statistics program.
Say you unwittingly back up a new file and want to find out which one. The statistics file first tells you what data volume has changed in the mirror:
# grep TotalDestinationSizeChange /mnt/backup/rdiff-backup-data/\
session_statistics.2015-03-15T10\:44\:06+01\:00.data
TotalDestinationSizeChange 41943106 (40.0 MB)
A look at the file statistics reveals which file caused the change:
# gunzip -c /mnt/backup/rdiff-backup-data/file_statistics.\
2015-03-15T10\:44\:06+01\:00.data.gz | awk '$2==1{print}' | sort -k3nr,3
dump/data-dump 1 41943040 NA 0
dump 1 0 0 NA
samba 1 0 0 NA
samba/dhcp.conf 1 0 0 66
The columns in the file statistics are FileName, Changed, SourceSize, MirrorSize, and IncrementSize. Because dump/data-dump is a new file, Changed is 1, MirrorSize is NA (the file hasn't been mirrored yet), and IncrementSize is 0. Its SourceSize accounts for almost the entire TotalDestinationSizeChange, so this is the file occupying the additional storage space in the backup.
Using rsnapshot
The idea behind rsnapshot is clear and effective: snapshots of the data are created at specified times. Although there are no increments in rsnapshot, data is not simply copied from snapshot to snapshot. Instead, unchanged files are hard linked. A hard link is a reference that points to an inode in the filesystem. Each file may have several such references, but it only occupies the storage space once.
In rsnapshot, you can check the use of hard links yourself as in Listing 3. The /etc/hosts file remained unchanged in the two snapshots hourly.2 and hourly.1. They therefore share inode 28147. A new inode was allocated to the file in the latest backup hourly.0 because it had changed. For the data transfer itself, rsnapshot uses rsync, which synchronizes changes from A to B efficiently with its delta copy mechanism [2].
Listing 3: rsnapshot Hard Links
# ls -li hourly.*/localhost/etc/hosts
28589 -rw-r--r-- 1 root root 209 Mar 13 11:30 hourly.0/localhost/etc/hosts
28147 -rw-r--r-- 2 root root 186 Jul 10 2014 hourly.1/localhost/etc/hosts
28147 -rw-r--r-- 2 root root 186 Jul 10 2014 hourly.2/localhost/etc/hosts
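The effect can be reproduced without rsnapshot. The following sketch creates two snapshot-like directories in a scratch location (the directory names merely imitate rsnapshot's layout), hard links an unchanged file, and then replaces the file in the newer directory by writing a new copy and renaming it into place, which is roughly how rsync updates a file by default.

```shell
#!/bin/sh
# Demo of inode sharing between snapshot-like directories (not rsnapshot itself).
work=$(mktemp -d)
mkdir -p "$work/hourly.1/etc" "$work/hourly.0/etc"
echo "127.0.0.1 localhost" > "$work/hourly.1/etc/hosts"

# Unchanged file: the newer snapshot gets a second name for the same inode.
ln "$work/hourly.1/etc/hosts" "$work/hourly.0/etc/hosts"
inode_old=$(ls -i "$work/hourly.1/etc/hosts" | awk '{print $1}')
inode_new=$(ls -i "$work/hourly.0/etc/hosts" | awk '{print $1}')
echo "before change: hourly.1=$inode_old hourly.0=$inode_new"

# Changed file: write a new copy and rename it over the link, so the
# older snapshot keeps its inode and its content.
echo "127.0.0.1 localhost changed" > "$work/hourly.0/etc/hosts.tmp"
mv "$work/hourly.0/etc/hosts.tmp" "$work/hourly.0/etc/hosts"
inode_changed=$(ls -i "$work/hourly.0/etc/hosts" | awk '{print $1}')
echo "after change:  hourly.1=$inode_old hourly.0=$inode_changed"
```

Before the change, both names report the same inode number; afterward, hourly.0 points to a new inode while hourly.1 still holds the original content.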
Configuration
The main configuration file in rsnapshot, /etc/rsnapshot.conf, has many configuration options; initially, you can limit the focus to the most important settings (Table 2).
Table 2: Important rsnapshot Backup Options
| Option | Purpose |
|---|---|
| | Root directory for storing snapshots. |
| | If this option is [...] |
| | The [...] |
| | Listed commands are logged in [...] |
| | The [...] |
| | Path to the SSH program. |
Parameters in the configuration are always separated by tab characters. You are best off running a configtest to make sure you have set everything up correctly:
# rsnapshot configtest
Syntax OK
Regular Snapshots
The following example shows an rsnapshot configuration for backing up the /etc directory at hourly, daily, and weekly intervals:
retain	hourly	6
retain	daily	7
retain	weekly	4
backup	/etc/	localhost/
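Because editors and copy and paste tend to turn tabs into spaces, one way to avoid separator problems is to generate such lines with printf. The following sketch writes the four lines to a scratch file (an assumption of this example; a real setup would edit /etc/rsnapshot.conf) and counts the lines that do not consist of exactly three tab-separated fields.

```shell
#!/bin/sh
# Sketch: generate the retain/backup lines with guaranteed tab separators.
frag=$(mktemp)
{
    printf 'retain\thourly\t6\n'
    printf 'retain\tdaily\t7\n'
    printf 'retain\tweekly\t4\n'
    printf 'backup\t/etc/\tlocalhost/\n'
} > "$frag"

# Count lines that do not have exactly three tab-separated fields.
bad=$(awk -F '\t' 'NF != 3 { n++ } END { print n + 0 }' "$frag")
echo "lines with unexpected separators: $bad"
```

This check covers only the separators; rsnapshot configtest remains the authoritative test for the complete file.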
Using these settings, you still need to make sure rsnapshot is called regularly, because rsnapshot synchronizes and rotates data but does not schedule itself.
The cron service, which arranges continuous backups, is in charge of this task:
# vi /etc/cron.d/rsnapshot
0 */4 * * * root /usr/bin/rsnapshot hourly
30 3 * * * root /usr/bin/rsnapshot daily
0 3 * * 1 root /usr/bin/rsnapshot weekly
The entries shown here make sure that an "hourly backup" is performed every four hours (of which six snapshots are kept), a daily backup runs every day at 3:30am (seven snapshots are kept), and a weekly backup starts at 3:00am every Monday (four snapshots are kept).
Regularly running backups using cron is equally suitable for rdiff-backup: just create a cron job that calls rdiff-backup at regular intervals.
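For rdiff-backup, a small wrapper script keeps the cron entry short. The following is a minimal sketch: the script name, the schedule in the comment, the 30-day retention, and the DRY_RUN helper are all assumptions, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/sbin/rdiff-nightly and called
# from /etc/cron.d/rdiff-backup with a line such as:
#   15 2 * * * root DRY_RUN=0 /usr/local/sbin/rdiff-nightly
SRC=/etc
DEST=/mnt/backup
KEEP=30D
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

nightly() {
    # Back up first; prune old increments only if the backup succeeded.
    run rdiff-backup "$SRC" "$DEST" || return 1
    # --force permits removing several increments in one call.
    run rdiff-backup --force --remove-older-than "$KEEP" "$DEST"
}

nightly
```

Pruning only after a successful run means a failing backup never shrinks the history you still have.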
Backups via SSH
Performing backups from a production system to a backup server over the network via SSH has the advantage of encrypted data communication. Based on the direction of backup, a distinction is made between:
- Pull backups: the backup server backs up a remote server locally.
- Push backups: the server transfers its data to the backup server.
The security measures referred to in the first section relate mainly to pull backups but won't be detrimental for push backups, either. An effective authorized_keys configuration limits the backup user's options:
# cat .ssh/authorized_keys
command="/usr/bin/python /usr/bin/rdiff-backup --server",\
no-agent-forwarding,no-port-forwarding,no-user-rc,\
no-X11-forwarding,no-pty ssh-rsa AAAAB3NzaC1y[...]
Finding the right command for the command parameter can be difficult [3].
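Creating the dedicated key pair and composing the restricted entry might look like the following sketch. The key type (Ed25519 instead of the RSA key shown above), the file names, and the key comment are assumptions; the restriction options are taken from the example above. If ssh-keygen is not available, a placeholder key is used so that the entry format can still be inspected.

```shell
#!/bin/sh
# Sketch: dedicated key pair plus a restricted authorized_keys entry.
keydir=$(mktemp -d)
if command -v ssh-keygen >/dev/null 2>&1; then
    # -N '' creates the key without a passphrase, as unattended backups require.
    ssh-keygen -q -t ed25519 -N '' -C backup@backupserver -f "$keydir/backup_key"
    pubkey=$(cat "$keydir/backup_key.pub")
else
    pubkey='ssh-ed25519 AAAA[...] backup@backupserver'   # placeholder only
fi

# Restrictions as in the article's authorized_keys example.
opts='command="/usr/bin/python /usr/bin/rdiff-backup --server"'
opts="$opts,no-agent-forwarding,no-port-forwarding,no-user-rc,no-X11-forwarding,no-pty"

# In authorized_keys, the whole entry has to be on a single line.
entry="$opts $pubkey"
printf '%s\n' "$entry"
```

The printed line belongs in the backup user's ~/.ssh/authorized_keys file on each system to be backed up, while the private key stays on the backup server.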
To back up system data as well, the backup user needs root privileges on the server to be backed up. A matching sudo configuration avoids the use of a genuine root account:
# vi /etc/sudoers.d/rdiff
rdiff ALL=(root) NOPASSWD: /usr/bin/rdiff-backup
The configuration allows the rdiff user to run the /usr/bin/rdiff-backup command with root privileges and without a password. However, rdiff cannot run any other commands with sudo. During the SSH backup, you then need to ensure that the backup command proper is preceded by sudo:
# rdiff-backup --remote-schema 'ssh -C %s sudo rdiff-backup --server' \
rdiff@192.168.56.105::/etc /mnt/backup
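A syntax error in a sudoers file can lock you out of sudo altogether, so it is worth checking the rule before installing it. The following sketch writes the rule to a scratch file and runs visudo in check-only mode if it is available; the scratch file and the final install step in the comment are assumptions of this example.

```shell
#!/bin/sh
# Sketch: syntax-check the sudo rule before it goes into /etc/sudoers.d.
rule='rdiff ALL=(root) NOPASSWD: /usr/bin/rdiff-backup'
tmp=$(mktemp)
printf '%s\n' "$rule" > "$tmp"

if command -v visudo >/dev/null 2>&1; then
    # -c checks only, -f names the file to check; nothing is installed.
    visudo -c -f "$tmp" || echo "syntax check failed or not permitted" >&2
fi

# After review, as root (target name as in the article):
#   install -m 0440 "$tmp" /etc/sudoers.d/rdiff
```

Note that the rule as written permits rdiff-backup with any arguments; listing the arguments in the rule (for example, the --server call used above) would narrow it further, at the price of a more brittle setup.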
Backups via SSH are part of the rdiff-backup standard repertoire. It supports both pull and push backups equally. The first line below shows the pull variant, and the second uses push mode:
rdiff-backup rdiff@192.168.56.1::/etc /mnt/backup
rdiff-backup /etc/ rdiff@192.168.56.105::/mnt/backup
Rsnapshot only supports pull backups in its configuration file:
backup rsnap@192.168.56.1:/etc remoteA/
By default, it does not support push backups via SSH. In some situations, push backups are essential – for example, if the firewall rules only allow data to travel from the server to the backup server. In this case, you can use rsync and an intelligent configuration to create push backups with rsnapshot:
- On the backup server, an rsync daemon including a configuration is assigned to the server's public key; the authorized_keys file takes care of this.
- The rsync daemon's configuration on the backup server states the backup path. It uses post-xfer exec to call a script containing rsnapshot.
- To manage multiple servers, a configuration independent of /etc/rsnapshot.conf is created; it defines the backup schema.
- When the server wants to trigger a backup, it uses rsync and SSH to sync the data with the backup server.
The entry in authorized_keys then triggers the rsync daemon and its post-xfer exec script (step 2), and a snapshot is created automatically.
The drawback with this solution turns out to be the detour via rsync. The basis, synchronized via rsync, acts as a source for rsnapshot. In addition to the snapshots, the backed up data also exists in another instance, which is not very efficient in terms of space usage. Solutions for this waste of space are currently being discussed in the rsync mailing list [4].
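The four steps could be sketched roughly as follows. Every path, the module name push, the host name backupserver, and the per-server file names are hypothetical; the script only writes the two files to a scratch directory and prints the authorized_keys entry and the client call, so it shows the moving parts rather than a tested setup.

```shell
#!/bin/sh
# Sketch of the push setup; all names and paths are hypothetical.
conf_dir=$(mktemp -d)

# 1) rsync daemon configuration on the backup server (one module per client)
cat > "$conf_dir/rsyncd-serverA.conf" <<'EOF'
use chroot = no
[push]
    path = /srv/push/serverA
    read only = no
    post-xfer exec = /usr/local/sbin/snapshot-serverA
EOF

# 2) Script run after each transfer; it takes a snapshot only after a clean sync.
#    /etc/rsnapshot-serverA.conf would use /srv/push/serverA/ as its backup source.
cat > "$conf_dir/snapshot-serverA" <<'EOF'
#!/bin/sh
[ "$RSYNC_EXIT_STATUS" = 0 ] || exit 1
exec /usr/bin/rsnapshot -c /etc/rsnapshot-serverA.conf hourly
EOF

# 3) authorized_keys entry on the backup server tying the client's key to this daemon
echo 'command="rsync --server --daemon --config=/etc/rsyncd-serverA.conf .",no-pty,no-port-forwarding ssh-rsa AAAA[...]'

# 4) Push from the client: the double colon selects the module, -e runs it over SSH
echo 'rsync -a -e ssh /etc/ rsnap@backupserver::push/etc/'
```

The synchronized copy below /srv/push/serverA is exactly the extra instance of the data that the drawback described above refers to.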
Conclusions
The Linux on-board tools rdiff-backup and rsnapshot both have their raisons d'être. Incremental backups in rdiff-backup have benefits and drawbacks. The good thing is their efficiency in terms of hard disk capacity, but the computational and time overheads are a disadvantage. If a file has accumulated a large number of diffs (e.g., an older MySQL dump), a restore can take a considerable amount of time.
Rsnapshot impresses with its transparency: Each snapshot contains the files at the time of backup. Compared with rdiff-backup, however, it is unable to compute deltas; the next snapshot thus contains a complete version of a modified file. For a list of other benefits and drawbacks, refer to Table 3. At the end of the day, both programs do a reliable job of backing up data; which you decide to use is a matter of choice.
Table 3: Benefits and Drawbacks
| rdiff-backup | rsnapshot |
|---|---|
| Benefits | |
| Increments store data in a very efficient way | Each snapshot is accessible as a regular directory |
| Detailed logs and statistics data | Hard link mechanism is simple and fast |
| Drawbacks | |
| Increments cause more overhead | Limited use for files that frequently change |
| No deleting of a single increment (cf. --remove-older-than) | Push backups via SSH involve a detour |