NUTS AND BOLTS Cgroups Lead image: © Photographer, 123RF.com

Cgroups for resource management in Linux

Take Control

The new cgroups feature provides an administrative approach to restricting resource use. This application is particularly interesting for virtualized systems.

By Ralf Spenneberg

A couple of years ago, I was teaching Linux at a large-scale IT service provider's office. Their administrators had plenty of experience with commercial Unix variants, such as HP-UX, and they asked me how they could implement resource management and controls on Linux. How could an administrator restrict the amount of RAM used by a single process or a group of processes?

At the time, I had to admit that Linux didn't offer this feature. But in 2006, Rohit Seth started to develop this functionality, and as of kernel 2.6.24, administrators can use it. Originally referred to as "process containers," control groups (cgroups for short) [1] can restrict, account for (e.g., for billing purposes), and isolate resources (e.g., RAM, CPU, I/O).

Although many administrators will not need to use this functionality on a standard server, this application is very interesting in environments with KVM-based virtualization. Cgroups let you restrict the resources used by a virtual guest or prioritize use compared with other guests.

A cgroup lets the administrator group multiple processes and then define parameters for specific subsystems for these processes and all of their child processes. A subsystem could be a resource controller that manages the amount of RAM available.

To use cgroups, you first need to define a hierarchy in which the groups will be managed. To do so, you edit the /etc/cgconfig.conf file, which you can see in Listing 1.

Listing 1: /etc/cgconfig.conf

mount {
        cpuset  = /cgroup/cpuset;
        cpu     = /cgroup/cpu;
        cpuacct = /cgroup/cpuacct;
        memory  = /cgroup/memory;
        devices = /cgroup/devices;
        freezer = /cgroup/freezer;
        net_cls = /cgroup/net_cls;
        ns      = /cgroup/ns;
        blkio   = /cgroup/blkio;
}

If this file doesn't exist, you will need to install the libcgroup package. The file creates a separate hierarchy for each subsystem, and you can then define your cgroups below it. The /cgroup/cpu hierarchy lets you manage CPU shares, whereas /cgroup/net_cls tags network packets so that their bandwidth can be managed.

Starting the cgconfig service creates the directories and mounts the cgroup filesystems. The lssubsys command lets you verify that the hierarchies have been created correctly (Listing 2).

Listing 2: lssubsys

# lssubsys -am
cpuset /cgroup/cpuset
cpu /cgroup/cpu
cpuacct /cgroup/cpuacct
memory /cgroup/memory
devices /cgroup/devices
freezer /cgroup/freezer
net_cls /cgroup/net_cls
ns /cgroup/ns
blkio /cgroup/blkio
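
What the cgconfig service does with the mount block from Listing 1 can be approximated by hand with mkdir and mount -t cgroup. The following sketch only prints the commands instead of running them; the function name print_cgroup_mounts is made up for this example, and the parser assumes the simple one-assignment-per-line layout shown in Listing 1.

```shell
#!/bin/sh
# Print (but do not run) the mkdir/mount commands that roughly correspond
# to the mount block of a cgconfig.conf file. Dry run only: review the
# output before executing any of it as root.
print_cgroup_mounts() {
    # $1: path to a file in cgconfig.conf syntax
    sed -n '/^[ 	]*mount[ 	]*{/,/}/p' "$1" |
    sed -n 's/^[ 	]*\([a-z_]*\)[ 	]*=[ 	]*\([^;]*\);.*$/\1 \2/p' |
    while read -r subsys dir; do
        printf 'mkdir -p %s\n' "$dir"
        # The third argument of mount is only a label for cgroup mounts.
        printf 'mount -t cgroup -o %s %s %s\n' "$subsys" "$subsys" "$dir"
    done
}
```

Calling print_cgroup_mounts /etc/cgconfig.conf would list one mkdir and one mount command per subsystem, which you can compare with the lssubsys output in Listing 2.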

You can then create your control groups by issuing the cgcreate command:

cgcreate -g blkio:/dd

The command in Listing 3 tells you which parameters are available for the Block I/O subsystem.

Listing 3: Block I/O Subsystem

# cgget -g blkio /dd
/dd:
blkio.reset_stats=
blkio.io_queued=Total 0
blkio.io_merged=Total 0
blkio.io_wait_time=Total 0
blkio.io_service_time=Total 0
blkio.io_serviced=Total 0
blkio.io_service_bytes=Total 0
...

As of kernel 2.6.37, the kernel also supports the blkio.throttle.* options here. This means that you can restrict the maximum I/O bandwidth for read and write operations by a process group.

To test this, you need the major and minor numbers of the device whose bandwidth you want to restrict. If this is /dev/sda1, you can determine them with a simple ls:

# ls -l /dev/sda1
brw-rw----. 1 root disk 8, 1 10. Oct 08:32 /dev/sda1

Here, you can see the device major and minor numbers 8 and 1, respectively.
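
If you would rather not read the numbers off the ls output, a script can ask stat for them. This is a minimal sketch that assumes GNU stat, whose %t and %T format sequences print the major and minor numbers in hexadecimal; the function name dev_numbers is hypothetical.

```shell
#!/bin/sh
# Print the "major:minor" pair of a device node in decimal.
# Assumes GNU stat: %t and %T are the hexadecimal major and minor numbers.
dev_numbers() {
    # $1: path to a device node, such as /dev/sda1
    hex=$(stat -c '%t %T' "$1") || return 1
    # Split "major minor" into the positional parameters $1 and $2.
    set -- $hex
    # printf converts the hexadecimal values to decimal.
    printf '%d:%d\n' "0x$1" "0x$2"
}
```

For the /dev/sda1 device shown above, dev_numbers /dev/sda1 should print the same 8:1 pair that ls reports.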

To restrict the write bandwidth for the control group to 1MB/s (1,048,576 bytes per second), you then run cgset or simply use the echo command:

echo "8:1  1048576" > /cgroup/blkio/dd/blkio.throttle.write_bps_device
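
The value written here is simply the device numbers followed by a rate in bytes per second. As a sketch, the helper below (throttle_rule is a name invented for this example) builds that string from a limit given in whole MB/s, and the comment shows how the cgset variant mentioned above might look.

```shell
#!/bin/sh
# Build a "<major>:<minor> <bytes_per_second>" rule for the
# blkio.throttle.*_bps_device files.
throttle_rule() {
    # $1: major number, $2: minor number, $3: limit in MB/s (integer,
    #     counted as 1024 * 1024 bytes to match the value used above)
    printf '%s:%s %s\n' "$1" "$2" "$(( $3 * 1024 * 1024 ))"
}

rule=$(throttle_rule 8 1 1)
echo "$rule"    # prints: 8:1 1048576

# With libcgroup installed, the same limit could then be set as root with:
#   cgset -r blkio.throttle.write_bps_device="$rule" dd
```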

Now, you can launch dd for a test.

dd if=/dev/zero of=/tmp/test & pid=$!

The dd process initially runs in the root cgroup, which has no restrictions. You can check its throughput by sending a SIGUSR1 to the process:

# kill -USR1 $pid
578804+0 records in
578804+0 records out
296347648 bytes (296 MB) copied, 7.00803 s, 42.3 MB/s

To move the process to the dd cgroup, you could use the echo command:

# echo $pid > /cgroup/blkio/dd/tasks

Now, when you send a USR1 signal to dd, you will see that the bandwidth drops dramatically because the process is not allowed to write with a bandwidth of more than 1MB/s.
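
To confirm that the process really ended up in the intended group, you can look at /proc/<pid>/cgroup, which lists one line per hierarchy in the form ID:controllers:path. The following sketch extracts the path for one controller; the function name cgroup_of and the optional third parameter (an alternative proc root, useful for dry runs) are assumptions made for this example.

```shell
#!/bin/sh
# Print the cgroup path of a process for one controller, based on the
# "ID:controllers:path" lines in /proc/<pid>/cgroup.
cgroup_of() {
    # $1: PID, $2: controller name, $3: proc root (default: /proc)
    awk -F: -v ctl="$2" '
        {
            # A hierarchy can carry several comma-separated controllers.
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == ctl) { print $3; exit }
        }
    ' "${3:-/proc}/$1/cgroup"
}
```

After the echo command shown above, cgroup_of "$pid" blkio should report /dd rather than /.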

Instead of restricting the maximum bandwidth, you can also prioritize the bandwidth between groups with the use of the blkio.weight parameter. The default value is 500, so if you were to give a group a value of 1000, it could then access the block devices twice as often as a group with the default weight.
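
The arithmetic behind the weights is a simple proportion: Each group's share is its weight divided by the sum of the weights of all competing groups. The sketch below (weight_shares is a hypothetical helper) prints the approximate percentages; it assumes that all listed groups are busy on the same device at the same time, because weights only matter under contention.

```shell
#!/bin/sh
# Print the approximate percentage share for each of the given weights.
weight_shares() {
    total=0
    for w in "$@"; do
        total=$(( total + w ))
    done
    for w in "$@"; do
        # Integer arithmetic, so the percentages are rounded down.
        printf '%s -> %s%%\n' "$w" "$(( w * 100 / total ))"
    done
}

weight_shares 1000 500
# prints:
# 1000 -> 66%
# 500 -> 33%
```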

Instead of using the echo command, you can also assign processes to groups using the cgclassify command.

If you want to launch a process directly in a specific group, you can use the cgexec command. Note that the command and its arguments are passed as separate words, not as one quoted string:

cgexec -g blkio:dd dd if=/dev/zero of=/tmp/test

Automatic

Assigning processes to groups manually can be tiresome and error-prone. It makes far more sense for the cgrulesengd daemon to handle these assignments automatically. To allow this to happen, the service needs the /etc/cgrules.conf file, which tells it which process belonging to which user should be assigned to which control group. The file has a fairly simple syntax:

<user>[:<process>] <controllers> <destination>

Using the example with the dd command, the rule would look like this:

*:dd blkio /dd

This adds dd processes belonging to all users to the /dd control group on the blkio resource controller.
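
To see how such a rule is evaluated, it can help to imitate the lookup in a few lines of shell. The following is only a rough sketch of the matching logic, not the code cgrulesengd uses: It handles plain user names, the * wildcard, and the optional process name, returns the first matching rule, and ignores other features of the real file format. The function name match_cgrule is made up for this example.

```shell
#!/bin/sh
# Print "<controllers> <destination>" of the first rule in a
# cgrules.conf-style file that matches the given user and process.
match_cgrule() {
    # $1: rules file, $2: user name, $3: process name
    awk -v user="$2" -v proc="$3" '
        /^[ \t]*(#|$)/ { next }      # skip comments and blank lines
        {
            # who[1] is the user part, who[2] the optional process part
            n = split($1, who, ":")
            if (who[1] != "*" && who[1] != user) next
            if (n > 1 && who[2] != proc) next
            print $2, $3
            exit                     # first match wins in this sketch
        }
    ' "$1"
}
```

With the rule shown above in a file, match_cgrule on that file for any user and the process name dd would print blkio /dd, whereas other process names produce no output.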

Hierarchies

Thus far, I have only looked at individual, isolated control groups; however, you can create hierarchies of groups to add more structure.

To be more precise, you can create additional cgroups within a control group, as in cgcreate -g blkio:/dd/user1.

The new cgroups appear as subdirectories and inherit the properties of the parent control group. All child cgroups then compete for the resources assigned to the parent cgroup.

If the parent cgroup is only allowed to write at 1MB/s, all of the child groups together are not allowed to exceed this maximum.

Resources are assigned hierarchically; however, these hierarchies don't work for the blkio controller as of this writing. The other controllers, such as CPU, memory, and so on, already support hierarchies.

Virtualization

Where does it make sense to deploy cgroups? Some special applications will benefit from cgroups in your daily work; but in most cases, it makes more sense to let the Linux kernel manage the resources itself rather than establishing limits. If you deploy a virtualization solution like KVM, however, in which you virtualize multiple guests on a single host, it can be very useful to restrict, prioritize, and measure guest resource use. Cgroups give you an ideal approach to implementing this.

You will need to manage the virtualization via the libvirt libraries and use either LXC containers or QEMU/KVM. When a guest is launched, the libvirtd daemon creates a separate cgroup named after the guest. The group exists in the libvirt/qemu/guest or libvirt/lxc/guest hierarchy for each controller. You can now manage and prioritize the resources individually for each guest.

To allow a guest to use twice as much CPU time as a second guest, you need to modify the CPU controller's cpu.shares. To achieve your goal here, just change the default value from 1024 to 2048. You can use a similar approach to configuring RAM or bandwidth usage. To do so, use the memory controller or the net_cls controller in combination with the tc command.
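
Changing cpu.shares for a guest comes down to writing a number into the guest's directory below the cpu hierarchy. The sketch below assumes the /cgroup/cpu mount point from Listing 1 and the libvirt/qemu/<guest> layout that libvirtd creates; the function name set_guest_cpu_shares, the guest name guest1, and the optional third parameter (an alternative hierarchy root for dry runs) are hypothetical.

```shell
#!/bin/sh
# Write a new cpu.shares value for a QEMU/KVM guest managed by libvirt.
set_guest_cpu_shares() {
    # $1: guest name, $2: shares value, $3: cpu hierarchy root
    #     (default: /cgroup/cpu)
    root=${3:-/cgroup/cpu}
    file="$root/libvirt/qemu/$1/cpu.shares"
    # Refuse to create files: the kernel provides cpu.shares itself.
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    echo "$2" > "$file"
}

# Example (as root, with a running guest called guest1):
#   set_guest_cpu_shares guest1 2048
```

A guest with 2048 shares then receives roughly twice the CPU time of a guest that keeps the default of 1024, provided both are competing for the CPU.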

Note that you need the latest Libvirt variants to support the net_cls controller. This controller differs from all other controllers in that it only sets a classID and then expects the administrator to manage the actual bandwidth using the tc command (see the "Bandwidth Management" box).
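
The class ID that net_cls expects is a single hexadecimal number of the form 0xAAAABBBB, where AAAA is the major and BBBB the minor part of the tc class handle. As a small sketch (the function name tc_handle_to_classid is made up here), the conversion from a tc handle such as 10:1 looks like this:

```shell
#!/bin/sh
# Convert a tc class handle "major:minor" (both parts hexadecimal, as tc
# writes them) into the 0xAAAABBBB form used by net_cls.classid.
tc_handle_to_classid() {
    # $1: tc class handle, for example 10:1
    major=${1%%:*}
    minor=${1##*:}
    printf '0x%04x%04x\n' "0x$major" "0x$minor"
}

tc_handle_to_classid 10:1    # prints: 0x00100001
```

The resulting value is what you would write to the net_cls.classid file of the group, and the matching tc class and filter then decide how much bandwidth the tagged packets receive.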

You can't use the blkio controller with libvirt at this time because the controller doesn't currently support the hierarchies that libvirtd wants to create. The kernel developers are already working on a solution [2].

If you want to bill for the time used by individual virtual guests, you can use the cpuacct controller to do so. It counts the CPU time actually used by each guest, in nanoseconds, in /cgroup/cpuacct/libvirt/qemu/guest/cpuacct.usage.
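
Because the counter is in nanoseconds, a billing script will usually want to convert it first. The following sketch uses only integer shell arithmetic and truncates to milliseconds; the function name ns_to_seconds is an assumption for this example.

```shell
#!/bin/sh
# Convert a nanosecond counter, such as the content of cpuacct.usage,
# into seconds with millisecond resolution (truncated, not rounded).
ns_to_seconds() {
    # $1: CPU time in nanoseconds
    printf '%d.%03d\n' "$(( $1 / 1000000000 ))" \
                       "$(( $1 % 1000000000 / 1000000 ))"
}

ns_to_seconds 7008030000    # prints: 7.008

# For a guest, the input could come straight from the counter file:
#   ns_to_seconds "$(cat /cgroup/cpuacct/libvirt/qemu/guest/cpuacct.usage)"
```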

Threads

The current cgroup implementation works on the basis of threads. Each thread in a process can be managed in a separate cgroup, and you need to remember this when you assign processes to cgroups with the echo command after launching them: You need to assign all the running threads (listed in /proc/<pid>/task/) to the corresponding cgroups.

The cgexec command facilitates this task. The command launches the process in the cgroup, and any child processes and threads then inherit from the group.
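
For a process that is already running, the thread-by-thread assignment could be sketched as a loop over /proc/<pid>/task/. The function name move_all_threads and the optional third parameter (an alternative proc root for dry runs) are assumptions made for this example; the tasks file is the one belonging to the target cgroup, such as /cgroup/blkio/dd/tasks.

```shell
#!/bin/sh
# Assign every thread of a running process to a cgroup by writing the
# thread IDs, one at a time, to the group's tasks file.
move_all_threads() {
    # $1: PID, $2: tasks file of the target cgroup,
    # $3: proc root (default: /proc)
    proc=${3:-/proc}
    for t in "$proc/$1"/task/*; do
        [ -e "$t" ] || continue
        # One write per thread ID; the directory name is the thread ID.
        echo "${t##*/}" >> "$2" || return 1
    done
}

# Example (as root): move_all_threads "$pid" /cgroup/blkio/dd/tasks
```

Threads that the process creates while the loop is running can still be missed, which is one more reason to start the process with cgexec where possible.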

Conclusions

Unfortunately, only the very latest distributions support cgroups, and individual functions are available only in the latest Linux kernels. In other words, administrators need to check which properties they can use. After doing so, however, cgroups give you some very powerful functionality for managing process and guest resources, especially in virtualization environments.