Tools Bacula Lead image: © andresr, fotolia.com

Free backup tool for data centers

Backup with Bacula

Everybody needs backups; if you are looking for an open source program that will back up heterogeneous networks, you should check out Bacula. By Philipp Storz

One of Bacula's biggest advantages is that it doesn't continually vie for the administrator's attention. Instead, it goes about its work quietly and reliably in the background. Once configured correctly, Bacula will run for a long time without any administrative overhead. And, even in a worst-case scenario where the backup server crashes, you don't lose anything. Instead, you can restore directly from the backup media. In other words, if your priorities are stability, reliability, and robustness, Bacula is the tool you've been looking for.

If you work with Bacula, one thing will immediately grab your attention: In the Bacula system, each task is handled by a separate program. Tasks can include reading the data to be backed up and transferring them across the network, writing to the backup media, or updating the catalogs. A separate daemon exists for each of these tasks in Bacula:

Figure 1 shows the four components of the Bacula system. Communication between the various Bacula daemons is handled via three ports registered with IANA (Table 1).

Table 1: Bacula Ports

Name         Port       Description
bacula-dir   9101/tcp   Bacula Director
bacula-dir   9101/udp   Bacula Director
bacula-fd    9102/tcp   Bacula File Daemon
bacula-fd    9102/udp   Bacula File Daemon
bacula-sd    9103/tcp   Bacula Storage Daemon
bacula-sd    9103/udp   Bacula Storage Daemon

Figure 1: Interaction between the Bacula system components. The central control instance is the director, to which all the user interfaces connect.

Configuration

Each element on a Bacula system has its own configuration file. The configuration file syntax is the same in each case. Each Bacula configuration file comprises one or multiple resources. Each resource comprises configuration directives in keyword = value pairs and can contain sub-resources. Here is the generic configuration format:

ResourceType {
  keyword = value
  keyword = value
  Sub-ResourceType {
    keyword = value
    keyword = value
  }
}

Each resource definition starts with the resource type, followed by the resource content as keyword = value pairs in braces; some resources can have additional sub-resources. Depending on the keyword, the value can be text, a numeric value, or a pointer to another resource; in the latter case, you enter the name of the referenced resource.
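To make the nesting concrete, the following minimal Python sketch reads this generic format into nested dictionaries. It is an illustration only and not Bacula's own parser: the function name parse_resources is hypothetical, and the sketch assumes one directive or brace per line and ignores quoting rules and other details of the real syntax.

```python
def parse_resources(text):
    """Parse 'Type { keyword = value ... }' blocks into nested dicts (toy parser)."""
    root = {"type": None, "directives": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.endswith("{"):
            # A new (sub-)resource opens; its type is the text before the brace.
            node = {"type": line[:-1].strip(), "directives": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()
        else:
            # Split at the first '=' only, so values may contain '=' themselves.
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"expected keyword = value, got: {line!r}")
            # A keyword may repeat (several File lines), so collect values in lists.
            stack[-1]["directives"].setdefault(key.strip(), []).append(value.strip())
    if len(stack) != 1:
        raise ValueError("missing closing brace")
    return root["children"]


# Sample input in the generic format; the values are illustrative.
sample = """
Fileset {
  Name = "Full Set"
  Include {
    File = /usr/sbin
    Options {
      Signature = MD5
    }
  }
  Exclude {
    File = /var/lib/bacula
    File = /tmp
  }
}
"""

resources = parse_resources(sample)
fileset = resources[0]
print(fileset["type"], fileset["directives"]["Name"])
for child in fileset["children"]:
    print(" ", child["type"], child["directives"].get("File"))
```

Each resource becomes a dictionary with its type, its directives, and a list of sub-resources, which mirrors the way sub-resources sit inside their parent in the configuration file.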

Listing 1 shows a couple of genuine resources from a director daemon configuration file. The director resource configures the daemon itself. The fileset resource is a good example of a resource with sub-resources. The director resource points to a messages resource by the name of "Daemon" in the Messages = Daemon line.

Listing 1: Excerpt from a Director Configuration

# Settings for the director daemon itself
Director {
  Name = bacula-dir
  Messages = Daemon
  Password = "YdhKKoy2Huq1CVHwIR"
  Pid Directory = "/var/run"
  Query File = "/usr/lib/bacula/query.sql"
  Working Directory = "/var/lib/bacula"
}

# Fileset with Include and Exclude sub-resources
Fileset {
  Name = "Full Set"
  Include {
    File = /usr/sbin
    Options {
      Signature = MD5
    }
  }
  Exclude {
    File = /var/lib/bacula
    File = /tmp
  }
}

# Messages resource referenced by the director resource above
Messages {
  Name = Daemon
  Append = "/var/lib/bacula/log" = all, !skipped
  Console = all, !skipped, !saved
  Mail = root@localhost = all, !skipped
  Mail Command = "/usr/sbin/bsmtp -h localhost -f \"\(Bacula\) \<%r\>\" -s \"Bacula daemon message\" %r"
}

After the initial configuration for communicating with the director, no further changes are typically needed: The file daemon and console configurations require no manual attention, and changes to the storage daemon are very rare once the Bacula system is up and running. The most frequent changes relate to the Bacula director settings, which are essentially the configuration settings for the overall system. The following list explains the resources in the director configuration file and their tasks:

Table 2: Scheduler Intervals

Time Unit                            Example
Hour, minute                         at 23:05
Day of month                         12
Day of week                          Mon
Week of month                        2nd
Week of year                         w04
Monday through Saturday              mon-sat at 23:10
First Monday in month                1st mon at 23:10
Monday in the 1st week of the year   mon w01 at 23:10
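
How an entry such as 1st mon at 23:10 maps onto the calendar can be sketched in a few lines of Python. The helper names below are hypothetical, and the code is not part of Bacula. It assumes that the "1st" week of a month covers days 1 through 7, the "2nd" week days 8 through 14, and so on, which is what the "First Monday in month" example in the table suggests.

```python
from datetime import datetime

WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
ORDINALS = ["1st", "2nd", "3rd", "4th", "5th"]


def week_of_month(moment):
    """Return '1st'..'5th', assuming week n covers days 7*(n-1)+1 through 7*n."""
    return ORDINALS[(moment.day - 1) // 7]


def matches(moment, week, weekday, at):
    """Check a point in time against a 'WEEK WEEKDAY at HH:MM' style entry."""
    return (
        week_of_month(moment) == week
        and WEEKDAYS[moment.weekday()] == weekday
        and moment.strftime("%H:%M") == at
    )


# 2 January 2023 was the first Monday of that month.
print(matches(datetime(2023, 1, 2, 23, 10), "1st", "mon", "23:10"))
# 9 January 2023 was the second Monday, so the same entry does not match.
print(matches(datetime(2023, 1, 9, 23, 10), "1st", "mon", "23:10"))
```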

Besides the three main Bacula components (director, storage daemon, and file daemon), the Bacula package also contains other tools that run as standalone programs (Table 3).

Table 3: Utilities

Utility         Description
bcopy           Copies Bacula media
bextract        Can open Bacula media and extract files
bscan           Can reconstruct the CatalogDB from Bacula media
bsmtp           Bacula SMTP client
btape           Program for testing tape drives with Bacula
btraceback      Program for collecting information in case of a crash
bregex, bwild   Programs for testing regular expressions or wild cards
dbcheck         Program for managing and plausibility checking the catalog database

Daemon Details

The Bacula file daemon is available for many operating systems, including just about any flavor of Linux, Unix, Windows, and Mac OS. Besides backing up files, the file daemon can also back up filesystem ACLs. On Windows, it uses the Volume Shadow Copy Service (VSS) so that consistent backups of all VSS-capable applications can be created automatically.

Of course, the Bacula file daemon can run scripts before and after the backup. The script output and return values are taken into consideration and added to the backup report. The Bacula file daemon can also compress and encrypt the backup data before sending it to the storage daemon.

Encryption is transparent; however, only a file daemon with the right key can restore the data. A master key kept in secure storage avoids the risk of being unable to access the stored data if the key is lost.

The Bacula storage daemon writes data to backup media, which can include hard disks, single tape drives, and tape libraries. To ensure high throughput, the Bacula storage daemon can cache the data in a spool directory and then transfer it to tape in a single pass at high speed. Tape libraries are controlled by a script, so Bacula can work with basically any device that supports command-line based controls.

Additionally, the storage daemon offers the option of copying data from one storage medium to another, thus supporting migration and virtual backups.

The Director Daemon

The Bacula director stores its catalog in a database and supports SQLite, MySQL, and PostgreSQL for this purpose. Access Control Lists (ACLs) are useful for restricting access to resources, especially in large environments (Table 4). They make sure that certain administrators work only on a certain group of servers and, if necessary, only back up and restore data to these specific servers.

Table 4: Bacula ACLs

ACL name             Meaning
Catalog ACL          Restriction to specific catalogs
Client ACL           Restriction to specific clients
Command ACL          Restriction to specific console commands, e.g. restore only
Fileset ACL          Restriction to specific filesets
Job ACL              Restriction to specific jobs
Plugin Options ACL   Restriction to specific plugin options
Pool ACL             Restriction to specific pools
Storage ACL          Restriction to specific devices
Where ACL            Restriction to specific restore paths
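
The effect of such restrictions can be pictured as a set of allow lists that every request must pass. The Python sketch below is a conceptual model only, not Bacula code or configuration syntax; the console name, the client names, the restore path, and the is_allowed helper are all invented for illustration.

```python
# Hypothetical allow lists for one restricted console, keyed by ACL type.
ACLS = {
    "web-admins": {
        "client": {"web01-fd", "web02-fd"},
        "command": {"restore", "status"},
        "where": {"/tmp/bacula-restores"},
    }
}


def is_allowed(console, **request):
    """Return True only if every requested item is in the matching allow list."""
    acl = ACLS.get(console, {})
    return all(value in acl.get(kind, set()) for kind, value in request.items())


# Restoring to a permitted client passes; other clients and commands do not.
print(is_allowed("web-admins", command="restore", client="web01-fd"))
print(is_allowed("web-admins", command="restore", client="db01-fd"))
print(is_allowed("web-admins", command="delete"))
```

Anything that is not explicitly listed is refused in this model, which is the behavior you would want for a console that should only restore data to a specific group of servers.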

Restarting the Bacula director while a backup is running is problematic; however, the director can reload most changes to the Bacula configuration at runtime without interrupting the backup.

The Bacula Console

Although the Bacula console is a command-line interface (CLI), it is very convenient. The help command gives the administrator an overview of the available commands at any time. When you execute a command, the console prompts for the required parameters interactively in a menu.

If you compile the Bacula console with readline support, it offers the same level of convenience as the Bash shell: Both tab completion for commands and parameters and a searchable history are available.

You can easily automate the Bacula console: Simply pipe the sequence of commands to STDIN, and the output occurs on STDOUT.
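
As a minimal sketch of this kind of automation, the Python snippet below feeds a list of commands to a console process on STDIN and collects whatever arrives on STDOUT. The run_console helper is hypothetical; it assumes the console binary is installed as bconsole and is already configured to reach the director. The commands in the example call are illustrative.

```python
import subprocess


def run_console(commands, console_cmd=("bconsole",)):
    """Pipe commands to the console's STDIN and return its STDOUT as text."""
    script = "\n".join(commands) + "\n"
    result = subprocess.run(
        list(console_cmd),
        input=script,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the console exits non-zero
    )
    return result.stdout


# Example call (requires a working console installation):
# print(run_console(["status director", "quit"]))
```

Because the console program is passed in as a parameter, the same helper can be pointed at a wrapper script or a different installation path without changes.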

Accurate Backup

Besides simple full, incremental, and differential backups, Bacula also has some interesting non-standard options. In normal operations, the number of backup jobs is typically far greater than the number of restores. In high-security environments, the file daemon can be launched in a mode that doesn't support write operations and only allows reads. This prevents manipulation of the target system even if the backup system is compromised. The file daemon obviously has to be restarted with write capability before you can restore data.

Like most other backup solutions, Bacula compares the timestamp of the last backup with the timestamps of the files for incremental and differential backups to decide whether to back up specific files. This is a tried and trusted principle, but it can cause problems in some cases. For example, if you create files with an out-of-date timestamp, they will never be backed up. Also, Bacula will never notice that files have been deleted; in other words, a restore operation will always recreate files that no longer existed at backup time.

The Accurate Backup mode is Bacula's solution for avoiding all of these issues. Before a backup starts, a list of all known files on the system, along with their sizes, permissions, and checksums, is transferred to the file daemon. The file daemon compares this list with the filesystem and backs up the difference. On the downside, Accurate Backup needs far more resources than a conventional backup on both the server and the client side.
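
The difference between the two approaches can be sketched in Python. This is a simplified model and not Bacula's implementation: the file attributes are reduced to a modification time and a checksum, and all function names, paths, and values are invented for illustration.

```python
def timestamp_incremental(current, last_backup_time):
    """Classic approach: pick files modified after the last backup."""
    return {path for path, (mtime, _) in current.items() if mtime > last_backup_time}


def accurate_incremental(current, catalog):
    """Accurate approach: compare the filesystem against the list of known files."""
    changed = {path for path, attrs in current.items() if catalog.get(path) != attrs}
    deleted = set(catalog) - set(current)
    return changed, deleted


# path -> (mtime, checksum) as recorded at the last backup, which ran at time 100.
catalog = {"/etc/hosts": (50, "a1"), "/etc/motd": (60, "b2")}
# Current state: motd was deleted, and old.tar appeared with an old timestamp.
current = {"/etc/hosts": (50, "a1"), "/srv/old.tar": (10, "c3")}

# The timestamp check finds nothing to back up.
print(timestamp_incremental(current, last_backup_time=100))
# The list comparison finds the new file and records the deletion.
print(accurate_incremental(current, catalog))
```

The extra cost mentioned above is visible even in this toy version: the complete list of known files has to be available on the client before the comparison can start.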

Virtual Backups

To improve redundancy and assure compliance with legal requirements, storing backups externally is often necessary. Copy jobs make it easy to copy data onto tapes for external storage. As long as the data are available locally, Bacula will always access the original backups; if the data are no longer available locally, Bacula will request the external backups. You can also migrate backups to external media for long-term archiving purposes. The source media are then released after the migration.

Full backups are normally performed at regular intervals. They are very large and consume correspondingly large amounts of time and network capacity. Most of the data in a full backup will typically already be available on the media from previous backups.

Virtual Full Backup accesses existing full, incremental, or differential backups and combines them with the changes since the last differential backup to create a new full backup. This approach reduces the load on the network and the client to the load created by an incremental backup, which will typically be less than 10% of the load caused by a full backup. If you decide to use Virtual Full Backups, it is a very good idea to use Accurate Backup also.
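
The consolidation step can be illustrated with a short Python sketch. It is a conceptual model under simplifying assumptions and not Bacula's implementation: each backup is represented as a mapping from path to file version, a value of None marks a file recorded as deleted (the kind of information Accurate Backup provides), and the function name is hypothetical.

```python
def virtual_full(full, *later_backups):
    """Merge a full backup with later backups, oldest first, into a new full."""
    merged = dict(full)
    for backup in later_backups:
        for path, version in backup.items():
            if version is None:
                merged.pop(path, None)  # file was deleted since the previous backup
            else:
                merged[path] = version  # newer version replaces the older one
    return merged


full = {"/etc/hosts": "v1", "/etc/motd": "v1", "/home/a.txt": "v1"}
incr_monday = {"/etc/hosts": "v2"}
incr_tuesday = {"/etc/motd": None, "/home/b.txt": "v1"}

print(virtual_full(full, incr_monday, incr_tuesday))
```

Without the deletion markers, /etc/motd would survive into the new full backup even though it no longer exists on the client, which is one reason to combine the two features.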

Bacula also supports deduplication. Base jobs let you save huge volumes of data if you have many identical systems. The base job defines a common basis on which other backups are built. Data that already exists in the base job is stored only there and is not backed up again.

Plugins

Because it supports VSS on Windows, and because of its use of scripting, Bacula can create consistent backups of many programs without any additional tools. However, some programs require special attention to create a consistent backup and provide proprietary interfaces for the purpose.

To address an interface of this kind, the Bacula file daemon has the ability to integrate backup plugins. These plugins communicate with the backup interface provided by the program vendor.

For example, the NDMP plugin backs up and restores NAS servers through a standardized interface. On Windows operating systems, Bacula can back up the system state and, with the corresponding plugins, MS Exchange, MS SQL Server, and SharePoint.

Another interesting plugin supports deduplication at the block level, thus making it possible to back up very large files far more efficiently. Backups of applications such as databases and virtualization solutions will benefit greatly from this capability. Additionally, interacting with software from giants such as SAP and Oracle is no problem for Bacula, thanks to its plugins.

GUIs

Besides the Bacula Console, various other programs are available for managing Bacula. Bacula-Web, Webacula, and Bweb are three web-based programs. Bacula-Web [3] is easy to install and gives the administrator an overview of the system state; however, it cannot interact with the system.

Webacula [4] is far more powerful and, in addition to offering detailed system information, also has the ability to initiate backups and restores. Bweb can be downloaded from the Bacula website at [5] in the form of the bacula-gui archive. Bweb is developed by Bacula Systems and, like Webacula, is very powerful. In the enterprise version, Bweb also supports multiple director daemons.

The Bacula project also offers a native Qt program under the name of Bat (Bacula Admin Tool). Bat runs on various operating systems and is very powerful; it is extremely well suited to restore operations.

The version browser in Bat is a very practical tool. You can use it to view the stored files in a virtual filesystem tree (Figure 2). Additionally, all stored versions of a selected file are displayed to allow for a selective restore. You can tell from the file checksum whether or not the file has changed.

Figure 2: You can use the Bat version browser to view files stored in a virtual filesystem tree.

Conclusions

Although Bacula is powerful, it could be improved in some areas. For example, the number of plugins for commercial software is still fairly low. Having said this, Bacula Systems has already identified this issue and is planning to release a large number of new plugins.

Also, configuring the system through text files is complex and can be error-prone on large systems. I am currently developing a program called dassModus that addresses this issue by supporting modifications to the configuration files for the Bacula system in a graphical interface.

Despite these challenges, Bacula is a free, stable, mature backup solution that has proved its value in large-scale environments. It is reliable and virtually maintenance-free.

If you require more details, visit the project website or attend the annual Bacula conference, which is being held along with the Bacula developer conference this year [6]. Additionally, Open Source Press intends to release a Bacula book this year.