
Capsicum – Additional seasoning for FreeBSD

Hot and Spicy

Applications such as web browsers can open up vulnerabilities and threaten an entire system. As a remedy, Capsicum provides fine-grained allocation of privileges on top of sandboxing. By Jürgen Dankoweit

Administrators often break into a sweat when they read security bulletins explaining the malicious code that is currently in the wild for the programs they use. Web browsers, email programs, archiving tools, and even Office packages are affected. It is not just negligence in the use of libraries that makes it easy for intruders to execute malicious code, but also targeted attacks on vulnerable applications. The mechanisms known on FreeBSD (chroot or jails) are not really an answer.

One remedy is to lock applications up in a sandbox, an environment that provides only very limited resources. However, because FreeBSD up to version 8 does not provide such a mechanism, the Capsicum environment was created in FreeBSD 9. In addition to a protected environment (sandbox) from which applications can't break out, it supports fine-grained allocation of rights.

Traditional Access Authorization

FreeBSD, Linux, and other Unix systems traditionally have had a very simple permissions system. To find the reason for this, you have to look back at early Unix systems, which were not originally designed for a desktop networked into the global Internet. This resulted in two main mechanisms of access control. The first mechanism is discretionary access control (DAC), which depends on the user ID. Here, the decision of whether a resource can be accessed is made solely on the basis of the user's identification. This means that, for each user, access rights to data are set by an administrator or by the user. The best example of this is a home directory, to which only the user has access.

The disadvantage of the method is demonstrated by the passwd command for changing the user password. Because users can assign themselves a password or change their password in the user database, the passwd command needs write access to the /etc/passwd file. Only the root user has permission to change that file, however. Put simply, you use a trick and set the SUID flag for the passwd command; the command is then executed with root privileges and the change to /etc/passwd is carried out. Under certain circumstances, this mechanism can be misused as a gateway for malicious software.

The other mechanism for controlling access is mandatory access control (MAC). Here, access is granted on the basis of a set of rules. The disadvantage of this method is that such rules must be defined within the application, resulting in greater programming overhead. The programmer also bears full responsibility for assigning permissions.

These two types of access control are designed primarily to regulate unauthorized access to files. This approach does not prevent access to storage areas or even control structures of a kernel. Also, the mechanisms were never designed to cover modern desktop applications such as web browsers or office packages, which is critical when you consider that such applications process and display information originating from dubious sources. With DAC or MAC, the execution of malicious code in JavaScript or macro viruses can be difficult to prevent.

FreeBSD connoisseurs will point out that there are jails in which you have the option of building a sandbox. That is correct, but the administrative overhead and resource usage would be enormous if you created a jail for each application. Also, jails don't solve the problem of malicious code infiltrating a system.

Another possibility is that of breaking down an application into smaller processes that can be launched by the main process and equipping them with special access rights.

Figure 1 shows a safe environment for a web server based on the example of the Apache HTTP daemon. When Apache is started, the main process has all rights necessary to access the configuration files and the complete directory structure. Additionally, sockets are created that allow web browsers to retrieve web pages. After this basic configuration is complete, subprocesses that handle the actual task of the HTTP daemon are then started. Each subprocess is given permission to access the directory and resources allocated to it. This setup means that the process runs in a sandbox.

Figure 1: Sandboxing with Apache.

Programming an HTTP daemon hardened in this way involves considerable effort because access mechanisms must be specifically implemented for every Unix system or BSD operating system.

Chili Pepper

FreeBSD offers a solution for this problem in the form of Capsicum. FreeBSD serves as a reference platform here, not only for other BSD systems, but also for other Unix platforms. Capsicum in FreeBSD was implemented in the scope of the Google Summer of Code. Many kudos are owed to Pawel Jakub Dawidek (pjd) and his colleagues on the FreeBSD development team for their support and implementation of the project.

In the development of the Capsicum framework, the problems mentioned here were addressed, and new security features were introduced to harden applications. To fully exploit the benefits of Capsicum, you need to restructure the code of an application or, in the worst case, redevelop it. Restructuring code is not necessarily a bad thing.

One design goal of Capsicum was that the existing access control mechanisms remain functional without any changes. Additionally, the existing application programming interface (API) was to remain unchanged so that existing software would continue to work without any restrictions. Therefore, Capsicum extends the Unix programming interface by implementing its own functions within the operating system kernel.

To use Capsicum in your own applications and with operating system tools, you can build on the C header files (sys/capability.h, libcapsicum.h) and the libcapsicum library, which communicates with the kernel extensions.

To understand Capsicum, some non-trivial basics need to be explained. Capsicum supports what is known as capability mode, a flag that a process sets by calling cap_enter(). It indicates that all file and storage operations are now tightly regulated. This flag is inherited by all child processes and cannot be cleared again.

Processes that are in capability mode have only very limited access to the kernel namespaces (Table 1). Additionally, some system interfaces are protected. This includes all device drivers that allow access to the physical memory or PCI bus. Also, commands such as reboot or kldload can be blocked.
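The call sequence can be sketched as follows. This is a minimal sketch, assuming the sys/capsicum.h header of current FreeBSD releases (the article names the older sys/capability.h); the helper name enter_sandbox() is hypothetical. On platforms without Capsicum, the sketch compiles to a stub that reports ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#if defined(__FreeBSD__)
#include <sys/capsicum.h>
#endif

/*
 * Enter capability mode and read the flag back.
 * Returns 0 and stores 1 in *mode when the process is sandboxed;
 * returns -1 (errno set) and stores 0 in *mode otherwise.
 * enter_sandbox() is a hypothetical helper, not part of any library.
 */
int
enter_sandbox(unsigned int *mode)
{
	assert(mode != NULL);
	*mode = 0;
#if defined(__FreeBSD__)
	/* After this call succeeds, the flag can no longer be cleared. */
	if (cap_enter() < 0)
		return (-1);
	/* cap_getmode() reports whether capability mode is active. */
	return (cap_getmode(mode));
#else
	/* Assumption: treat a system without Capsicum like ENOSYS. */
	errno = ENOSYS;
	return (-1);
#endif
}
```

Because the flag is inherited and permanent, a program would typically call such a helper only after it has opened every file, socket, and device it needs.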

Table 1: Global Namespace of the FreeBSD Kernel

| Namespace | Explanation |
| --- | --- |
| Process ID (PID) | Unix processes are represented by unique identifiers. PIDs are returned at the start of a process and can be used for debugging, to send signals, for monitoring, and to determine the current state. |
| File paths | Unix files exist in a global, hierarchical namespace that is protected by DAC and MAC. |
| NFS file handles | Both NFS clients and NFS servers use file handles to identify files and directories. NFS access management also relies on these. |
| Filesystem IDs | These determine the mapping of mountpoints to paths and are used to perform a forced unmount if a path no longer exists. |
| Protocol addresses | The protocol families use socket addresses to refer to local or remote network endpoints. They exist in the global namespace, as do IPv4 addresses and ports or sockets. |
| Sysctl MIBs | The sysctl management system uses both numeric and alphanumeric entries to read or change system parameters. |
| System V IPC | Message queues, semaphores, and shared memory are used for interprocess communication and are handled according to the System V standard. |
| POSIX IPC | Message queues, semaphores, and shared memory are used for interprocess communication and are handled according to the POSIX standard. |
| System clocks | FreeBSD systems provide several interfaces for managing the system clock. |
| Jail | Jails as FreeBSD-based virtualization use their own namespace as a subset of the global namespace. |
| CPU sets | Assignments between CPU resources and processes and threads. |

Calls to system functions are also regulated in capability mode. Some features that have access to the global namespace are no longer available, whereas others have limited access. An example is the sysctl command and its counterpart sysctl() in the libc programming library: With this command, you can query memory allocation, sniff network connections, or modify kernel parameters, and it can provide potential attackers a vector for an attack or monitoring. To increase security, access was restricted to just 30 parameters, compared with the 3,000 parameters that sysctl() offers. Simply by enabling capability mode, you create a sandbox from which applications cannot break out.

In addition to capability mode, Capsicum also introduces fine-grained permissions without abandoning the previous system of permissions (Figure 2). This trick was possible because the developers expanded the structure behind the file descriptor. A file descriptor is a small integer, unique within a process, that refers to a data structure in the kernel. This data structure, also known as metadata, includes permissions as well as the file name. The most famous file descriptors are STDIN (standard input), STDOUT (standard output), and STDERR (standard output for error messages).

Figure 2: Fine-grained permissions as an extension to the existing rights system in FreeBSD.

The previously used file descriptors already contain the FreeBSD permissions. These are immutable characteristics that can be inherited by child processes. However, in terms of security, their disadvantage is that they allow manipulation of metadata, even if a file or a device has been opened for exclusive read or write operation.

At this point, Capsicum extends the data structure associated with the file descriptor. Once cap_enter() is called by an application, all file descriptors use the extended data structure. Whenever this kind of file descriptor is used, the kernel checks whether the requested operation is covered by the rights attached to the descriptor.

For developers of applications that use the Capsicum system, this step is important, because you have to decide whether to allow access that is already blocked by cap_enter(), be even more restrictive, or add even more rules. This is done by calling cap_new(), which expects an existing file descriptor and the permissions that you want to set as parameters. (The listings in this article use its successor, cap_rights_limit(), which restricts the rights of the existing descriptor in place.) It doesn't matter whether the file descriptor was created for files, Unix or network sockets, directories, or devices. The man page for cap_new() lists all the available permissions, which are OR'd together and then passed to Capsicum. The man page also lists numerous system functions of the libc C library that are affected by Capsicum.

Capsicum therefore requires that you plan your applications carefully. This task is certainly not trivial, because it requires very precise analysis of the resources, including the use of protected shared memory instead of a publicly accessible shared memory area for exchanging data. Capsicum gives the programmer the freedom of choice to use FreeBSD's permission system or the libcapsicum library.

Hot tcpdump

Applications with dubious privileges can be revamped so that they use cap_enter() directly. This approach creates an application whose individual processes run in capability mode and inherit special permissions via their file descriptors. It works well for simple applications that operate on the basis of the following schema: Open all resources and process all incoming and outgoing data in a loop – like a Unix pipeline or through interaction with a network. The speed hit from Capsicum is very low if you restrict permissions when accessing the resources.

On the basis of the FreeBSD network analysis tool tcpdump, this objective is described in detail below. Tcpdump is built in line with the schema I just mentioned and therefore is easy to convert to Capsicum: The program uses the Berkeley packet filter bpf to analyze the data transported over a network. To do so, tcpdump passes a search pattern to the packet filter. In the next step, the filter is defined as an input source to send the information to tcpdump for further processing. Finally, the incoming data are interpreted, reprocessed, and displayed on the console in a loop.

Thus, the application can be switched to Capsicum capability mode with just two additional lines of program code:

if (cap_enter() < 0)
 error("cap_enter: %s",pcap_strerror(errno));

These two lines are inserted directly in front of the call that starts the loop carrying out the traffic analysis:

status = pcap_loop(pd, cnt, callback, pcap_userdata);

This approach improves security considerably. The ability to parse and analyze data packets is typically a vulnerability because memory access is often handled by C pointers and copy actions. As explained above, Capsicum prevents access to privileged memory areas by calling cap_enter().

To restrict communication with standard devices (STDIN, STDOUT, and STDERR) as well, you need to insert Listing 1 before the first call of cap_enter().

Listing 1: Restricting Standard Channels

if (cap_rights_limit(STDIN_FILENO,
    CAP_FSTAT) < 0)
  error("cap_rights_limit: unable to limit STDIN_FILENO");
if (cap_rights_limit(STDOUT_FILENO,
    CAP_FSTAT | CAP_SEEK | CAP_WRITE) < 0)
  error("cap_rights_limit: unable to limit STDOUT_FILENO");
if (cap_rights_limit(STDERR_FILENO,
    CAP_FSTAT | CAP_SEEK | CAP_WRITE) < 0)
  error("cap_rights_limit: unable to limit STDERR_FILENO");

With the cap_rights_limit() function used here, read access to the STDIN device is prevented, whereas write operations are allowed to the standard output devices, STDOUT and STDERR.
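The listing passes the rights to cap_rights_limit() as an OR'd bit mask. In released FreeBSD versions with Capsicum, the function instead takes a pointer to a cap_rights_t structure that is filled by cap_rights_init(). The following is a minimal sketch of the same restriction under that assumption; the helper names limit_fd() and limit_stdio() are hypothetical, and on platforms without Capsicum the code compiles to a stub that reports ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/capsicum.h>
#endif

/*
 * Limit one descriptor: fstat() is always allowed, seeking and writing
 * only when allow_write is nonzero. Returns 0 on success, -1 on error.
 * limit_fd() is a hypothetical helper for this sketch.
 */
static int
limit_fd(int fd, int allow_write)
{
	assert(fd >= 0);
#if defined(__FreeBSD__)
	cap_rights_t rights;

	if (allow_write)
		cap_rights_init(&rights, CAP_FSTAT, CAP_SEEK, CAP_WRITE);
	else
		cap_rights_init(&rights, CAP_FSTAT);
	return (cap_rights_limit(fd, &rights));
#else
	/* Assumption: without Capsicum, report ENOSYS like an old kernel. */
	(void)allow_write;
	errno = ENOSYS;
	return (-1);
#endif
}

/* Apply the restrictions of Listing 1 to the three standard channels. */
int
limit_stdio(void)
{
	if (limit_fd(STDIN_FILENO, 0) < 0 ||
	    limit_fd(STDOUT_FILENO, 1) < 0 ||
	    limit_fd(STDERR_FILENO, 1) < 0)
		return (-1);
	return (0);
}
```

A caller that must also run on kernels without Capsicum might treat a failure with errno set to ENOSYS as non-fatal.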

Analysis with the FreeBSD command procstat using the -C parameter confirms these facts, as shown in Figure 3. In the first and second columns, you can see the process ID and the name of the process; the third column shows the file descriptor. In this example, these are standard input (FD=0), standard output (FD=1), standard error output (FD=2), and the bpf driver for the Berkeley packet filter (FD=3). The fourth column describes the type of file descriptor, and the FLAGS column shows what FreeBSD permissions are set. The letter c shows that Capsicum is active for this file descriptor. The CAPABILITIES column indicates which Capsicum permissions are used. The entry fs (CAP_FSTAT) means that the status of the file descriptor can be queried; wr (CAP_WRITE) stands for write permission, and se (CAP_SEEK) means that the file pointer can be set. An overview of all Capsicum permissions can be found online [2]. The last two columns show the protocol and the device driver used for each file descriptor.

Figure 3: The output from procstat with a Capsicum-secured tcpdump.

Using Capsicum does cause one clearly visible and unpleasant side effect, especially with tcpdump: Access to the name service switch is blocked. In the case of tcpdump, this information is needed to convert IP addresses into fully qualified hostnames. You can work around this shortcoming by sending requests to a local domain name server.

Split

A nice example of compartmentalization is provided by the rwhod program. This system daemon is responsible for retrieving system information. The information includes which user is currently logged in, as well as the use period and time of login.

To switch the daemon to Capsicum, the code must be cleaned up, and the areas to be protected need to be split into functions: The two main functions are void receiver_process(void) for receiving data and void sender_process(void) for sending the requested information to a client.

After completing the hardening in this example, the programmer needs to consider which access rights the tool requires for proper operation. In particular, you need to focus on the void receiver_process(void) function, because it is used to write data to the whod.<hostname> file in the /var/rwho directory.

Earlier in the text, I explained that a conventional file descriptor permits more than the operation it was opened for, such as manipulating metadata. For malicious code, this situation is welcome, because it can be used to leak or tamper with information. However, the Capsicum cap_rights_limit() function prevents precisely this situation if you set the CAP_WRITE | CAP_FTRUNCATE | CAP_FSTAT flags:

if (cap_rights_limit(whod,CAP_WRITE | CAP_FTRUNCATE | CAP_FSTAT) < 0 && errno != ENOSYS) {
      syslog(LOG_WARNING, "cap_rights_limit: %m");
      exit(1);
}

Check out the complete source code [4] starting at line 404 for more details. The flags indicate that the whod file descriptor can only write to the file (CAP_WRITE), change the size of the file (CAP_FTRUNCATE), and retrieve the file status information (CAP_FSTAT). Any other operation is prevented. If malicious code does try to manipulate the flags, it has no chance of success. Flags that have been set can no longer be changed.
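The rule that rights can only be taken away, never added back, can be illustrated with a small user-space model. This is a toy sketch, not kernel code: the M_* bits, struct model_fd, and both functions are hypothetical names invented for the illustration, and the real kernel stores and checks rights differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rights bits for the model; not the kernel's values. */
enum {
	M_READ      = 1u << 0,
	M_WRITE     = 1u << 1,
	M_FSTAT     = 1u << 2,
	M_FTRUNCATE = 1u << 3
};

struct model_fd {
	uint32_t rights;	/* operations still permitted on this descriptor */
};

/* Narrow the rights; a request that would add a right is refused. */
int
model_rights_limit(struct model_fd *fd, uint32_t wanted)
{
	assert(fd != NULL);
	if ((wanted & ~fd->rights) != 0)
		return (-1);	/* would widen the set: rejected */
	fd->rights = wanted;
	return (0);
}

/* Check made before each operation: every needed bit must be held. */
int
model_allowed(const struct model_fd *fd, uint32_t needed)
{
	assert(fd != NULL);
	return ((fd->rights & needed) == needed);
}
```

In this model, a descriptor limited to M_WRITE | M_FTRUNCATE | M_FSTAT passes the check for writing but not for reading, and a later attempt to request M_READ again is rejected, which mirrors the behavior described for the whod descriptor.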

Additionally, you need to define clearly which file operations can be executed in the /var/rwho directory. The code [4] as of line 358 (Listing 2) handles this.

Listing 2: File Operations

if (cap_rights_limit(dirfd, CAP_CREATE | CAP_WRITE | CAP_FTRUNCATE | CAP_SEEK | CAP_LOOKUP | CAP_FSTAT) < 0 && errno != ENOSYS) {
     syslog(LOG_WARNING, "cap_rights_limit: %m");
     exit(1);
}
if (cap_enter() < 0 && errno != ENOSYS) {
     syslog(LOG_ERR, "cap_enter: %m");
     exit(1);
}

These few lines of C code are responsible for ensuring that files can be created in or added to the already open directory using the dirfd file descriptor. However, because CAP_READ is not among the rights granted, the program cannot read the files created here.
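Once capability mode is active, absolute paths are no longer usable, so new files have to be created relative to the directory descriptor that was opened beforehand. The following is a minimal sketch of that pattern using the POSIX openat() call; the helper names create_in_dir() and write_report() are hypothetical, and the real rwhod code may be organized differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) a file inside the directory referred to by dirfd.
 * Names that could point outside the directory are refused up front.
 * Returns a write-only descriptor, or -1 with errno set.
 */
int
create_in_dir(int dirfd, const char *name)
{
	assert(name != NULL);
	if (name[0] == '\0' || strchr(name, '/') != NULL ||
	    strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
		errno = EINVAL;
		return (-1);
	}
	/* The lookup starts at dirfd, not at the global root directory. */
	return (openat(dirfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644));
}

/* Write one buffer to a file below dirfd; returns 0 on success. */
int
write_report(int dirfd, const char *name, const void *buf, size_t len)
{
	int fd;
	ssize_t n;

	fd = create_in_dir(dirfd, name);
	if (fd < 0)
		return (-1);
	n = write(fd, buf, len);
	close(fd);
	return (n == (ssize_t)len ? 0 : -1);
}
```

The descriptors returned here are opened write-only, which matches the intent of the listing: data goes into the directory, but nothing is read back.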

Window to the World

Many utilities and tools need access to certain resources. One example has already been mentioned in connection with tcpdump. This tool requires access to the domain name server to convert IP addresses to hostnames. Within the program, the name service switch (NSS) is called to handle this task, but because Capsicum prevents this access, another solution had to be found. It came in the form of the Casper tool (Capsicum Service), which, as a daemon, provides a controlled option for allowing exceptions.

The function of Casper is quickly explained by Figure 4. Casper starts a program, such as tcpdump, which is locked into a sandbox. Before any monitoring mechanisms can be armed, the program logs the exception rules with the Casper daemon and then activates the protection mechanisms. Such an action must be done beforehand because no communication with system services – and this includes Casper – is possible after enabling Capsicum. In the example, this was handled as shown in Listing 3. The complete code is available online [7].

Figure 4: Communication path of tcpdump and Casper for a DNS query.

Listing 3: Casper

[...]
#ifdef HAVE_LIBCAPSICUM
  if (nflag) {
    capcas = NULL;
    capdns = NULL;
  } else {
    capcas = cap_init();
    if (capcas == NULL)
      error("unable to contact Casper");
    capdns = cap_service_open(capcas, "system.dns");
    if (capdns == NULL)
      error("unable to open system.dns service");
    /* Limit system.dns to reverse DNS lookups. */
    limits = nvlist_create(0);
    nvlist_add_string(limits, "type", "ADDR");
    nvlist_add_number(limits, "family", (uint64_t)AF_INET);
    nvlist_add_number(limits, "family", (uint64_t)AF_INET6);
    if (cap_limit_set(capdns, limits) < 0)
      error("unable to limit access to system.dns service");
    nvlist_destroy(limits);
    /* Casper capability no longer needed. */
    cap_close(capcas);
  }
#endif  /* HAVE_LIBCAPSICUM */
[...]

First, cap_init() is called to contact the Casper daemon and register the program. In the next step, the cap_service_open(...) function reports the desired exception to the daemon. In this example, these are DNS requests, as shown by the system.dns option. The daemon expects a list, created here with limits = nvlist_create(...), that holds the details of the functionality. The first element states that the function relates to converting IP addresses to hostnames, as shown by type and ADDR. The next two entries describe the IP address family. This example covers both IPv4 and IPv6 addresses (AF_INET and AF_INET6).

The program passes this list to the Casper daemon using cap_limit_set(...) and then deletes it, because tcpdump no longer needs it. From this point, Casper has all the information needed to allow the tool access to the domain name service.

You might ask whether this approach undermines the sandbox concept. It does not: Casper does not grant free access to the resource but uses the fine-grained rights provided under Capsicum. Additionally, the author of the program specifies how this communication with the outside world takes place, rather than some external program.

Hot Applications

To demonstrate that not only system programs, but also user programs, can be hardened with Capsicum, Google adapted the new environment for the Google Chromium web browser. When launched, Chromium spawns several processes that handle tasks such as processing HTML code, processing JavaScript, or encrypting data. The original FreeBSD port of the browser contained no security features like sandboxing.

The fact that the program is already compartmentalized facilitated adaptation to the FreeBSD Capsicum environment. The subprocess that renders the graphical representation of the website receives special privileges that allow it to communicate with the Xorg graphics system. Protected memory areas are used for transporting data between the various subprocesses. The subprocesses that process JavaScript, HTML, and XML have no access to locations outside the sandbox.

Although the code size of Chromium is huge (it is said to be around 4.3 million lines of code), the Capsicum implementation was trouble-free and took just a hundred lines of code. If you want to achieve the same level of security on Windows, for example, you need more than 23,000 lines of code [8]. The developer of the GNUstep desktop suite has also announced the intent to use Capsicum in applications.

Conclusions

FreeBSD 9 developers introduced the Capsicum security mechanism. The full extent of Capsicum functionality will be available as of FreeBSD 10. All safety-critical system programs will then use the new framework. Also, some applications, such as Apache, might be adapted to use the new FreeBSD environment. Some developers of GNUstep applications also intend to modify them for FreeBSD Capsicum. The KDE maintainers have announced their intent to implement Capsicum in KDE. It will be exciting to see what kind of interesting applications Capsicum supports in the future.