Cgroups for resource management in Linux
Take Control
A couple of years ago, I was teaching Linux at a large-scale IT service provider's office. Their administrators had plenty of experience with commercial Unix variants, such as HP-UX, and they asked me how they could implement resource management and controls on Linux. How could an administrator restrict the amount of RAM used by a single process or a group of processes?
At the time, I had to admit that Linux didn't offer this feature. But in 2006, Google engineers Paul Menage and Rohit Seth started to develop this functionality, and as of kernel 2.6.24, administrators can use it. Originally referred to as "process containers," control groups (cgroups for short) [1] can restrict, account for (e.g., for billing purposes), and isolate resources (e.g., RAM, CPU, I/O).
Although many administrators will not need to use this functionality on a standard server, this application is very interesting in environments with KVM-based virtualization. Cgroups let you restrict the resources used by a virtual guest or prioritize use compared with other guests.
A cgroup lets the administrator group multiple processes and then define parameters for specific subsystems for these processes and all of their child processes. A subsystem could be a resource controller that manages the amount of RAM available.
To use cgroups, you first need to define a hierarchy in which the groups will be managed. To do so, you edit the /etc/cgconfig.conf file, which you can see in Listing 1.
Listing 1: /etc/cgconfig.conf
mount {
    cpuset = /cgroup/cpuset;
    cpu = /cgroup/cpu;
    cpuacct = /cgroup/cpuacct;
    memory = /cgroup/memory;
    devices = /cgroup/devices;
    freezer = /cgroup/freezer;
    net_cls = /cgroup/net_cls;
    ns = /cgroup/ns;
    blkio = /cgroup/blkio;
}
If this file doesn't exist, you will need to install the libcgroup package (the package name can vary between distributions). The file creates a separate hierarchy for each subsystem, and you can then define your cgroups below it. The /cgroup/cpu hierarchy lets you manage CPU shares, whereas /cgroup/net_cls takes care of network I/O performance.
Starting the cgconfig service creates the directories and mounts the cgroups filesystem. The lssubsys command lets you verify that the hierarchies have been created correctly (Listing 2).
Listing 2: lssubsys
# lssubsys -am
cpuset /cgroup/cpuset
cpu /cgroup/cpu
cpuacct /cgroup/cpuacct
memory /cgroup/memory
devices /cgroup/devices
freezer /cgroup/freezer
net_cls /cgroup/net_cls
ns /cgroup/ns
blkio /cgroup/blkio
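If a hierarchy is missing from this output, one possible reason is that the running kernel does not provide the controller or has it disabled. The following is a minimal sketch for checking this, assuming /proc/cgroups has its usual four columns (name, hierarchy ID, number of cgroups, enabled flag); the function name list_enabled_controllers is hypothetical and not part of libcgroup.

```shell
# Print the names of enabled cgroup controllers, one per line.
# $1: file in /proc/cgroups format (defaults to /proc/cgroups itself).
# Assumption: the fourth column is 1 for enabled controllers.
list_enabled_controllers() {
    awk '!/^#/ && $4 == 1 { print $1 }' "${1:-/proc/cgroups}"
}

# Example: list_enabled_controllers
```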
You can then create your control groups by issuing the cgcreate command:
cgcreate -g blkio:/dd
The command in Listing 3 tells you which parameters are available for the Block I/O subsystem.
Listing 3: Block I/O Subsystem
# cgget -g blkio /dd
/dd:
blkio.reset_stats=
blkio.io_queued=Total 0
blkio.io_merged=Total 0
blkio.io_wait_time=Total 0
blkio.io_service_time=Total 0
blkio.io_serviced=Total 0
blkio.io_service_bytes=Total 0
...
As of kernel 2.6.37, the kernel also supports the blkio.throttle.* options here. This means that you can restrict the maximum I/O bandwidth for read and write operations by a process group.
To test this, you need the major and minor numbers of the device whose bandwidth you want to restrict. If this is /dev/sda1, you can determine them with a simple ls:
# ls -l /dev/sda1
brw-rw----. 1 root disk 8, 1 10. Oct 08:32 /dev/sda1
Here, you can see the device major and minor numbers 8 and 1, respectively.
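If you would rather not read the numbers out of the ls output by eye, a small helper can print them in the major:minor form that the blkio files expect. This is only a sketch: it assumes GNU stat, whose %t and %T formats print the major and minor numbers in hexadecimal, and the function name dev_numbers is made up for this example.

```shell
# Print "major:minor" in decimal for a device node.
# Assumption: GNU stat, where %t and %T are the hex major and minor numbers.
dev_numbers() {
    maj=$(stat -c '%t' "$1") || return 1
    min=$(stat -c '%T' "$1") || return 1
    # Convert the hex values to decimal with shell arithmetic.
    echo "$((0x$maj)):$((0x$min))"
}

# Example: dev_numbers /dev/sda1
```

For the device shown in the ls output above, this should report 8:1.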
To restrict the write bandwidth for the control group to 1MB/s (1,048,576 bytes per second), you then run cgset or simply use the echo command:
echo "8:1 1048576" > /cgroup/blkio/dd/blkio.throttle.write_bps_device
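The value written to the throttle file is a device number followed by a limit in bytes per second. As a rough sketch, the helper below builds that string from a limit given in megabytes per second; throttle_rule is a hypothetical name, and 1MB is taken to mean 1,048,576 bytes, as in the command above.

```shell
# Print a rule for a blkio.throttle.*_bps_device file:
# "<major>:<minor> <bytes per second>".
# $1: device as major:minor, $2: limit in MB/s (1MB = 1048576 bytes here).
throttle_rule() {
    echo "$1 $(( $2 * 1024 * 1024 ))"
}

# Example (as root, paths as used in this article):
#   throttle_rule 8:1 1 > /cgroup/blkio/dd/blkio.throttle.write_bps_device
# The cgset variant would look something like:
#   cgset -r blkio.throttle.write_bps_device="$(throttle_rule 8:1 1)" dd
```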
Now, you can launch dd for a test:
dd if=/dev/zero of=/tmp/test & pid=$!
The dd process initially runs in the root cgroup, which has no restrictions. You can check its current throughput by sending SIGUSR1 to the process:
# kill -USR1 $pid
578804+0 records in
578804+0 records out
296347648 bytes (296 MB) copied, 7.00803 s, 42.3 MB/s
To move the process to the dd cgroup, you could use the echo command:
# echo $pid > /cgroup/blkio/dd/tasks
Now, when you send a USR1 signal to dd, you will see that the bandwidth drops dramatically because the process is not allowed to write with a bandwidth of more than 1MB/s.
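To confirm that the process really ended up in the intended group, you can look at /proc/<pid>/cgroup, which contains one line per hierarchy in the form ID:controllers:path. The sketch below extracts the path for a single controller; the function name cgroup_of is an assumption made for this example.

```shell
# Print the cgroup path of a process for one controller.
# $1: controller name, $2: file in /proc/<pid>/cgroup format.
cgroup_of() {
    awk -F: -v ctl="$1" '{
        # The second field may list several controllers, e.g. "cpu,cpuacct".
        n = split($2, names, ",")
        for (i = 1; i <= n; i++)
            if (names[i] == ctl) {
                path = $0
                sub(/^[^:]*:[^:]*:/, "", path)   # keep everything after the second colon
                print path
                exit
            }
    }' "$2"
}

# Example: cgroup_of blkio /proc/$pid/cgroup
```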
Instead of restricting the maximum bandwidth, you can also prioritize the bandwidth between groups with the blkio.weight parameter. The default value is 500, so if you give a group a value of 1000, its processes can access the block devices twice as often as those in the other groups.
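As a back-of-the-envelope model, the share a group can expect under contention is its weight divided by the sum of all competing weights. The helper below is a simplified sketch of that arithmetic only (integer percentages, hypothetical name weight_share); real throughput also depends on the I/O scheduler and on whether the groups actually compete for the device.

```shell
# Print the approximate percentage of contended bandwidth for one group.
# $1: weight of the group in question; remaining arguments: weights of the
# other competing groups. Uses integer division, so results are rounded down.
weight_share() {
    mine=$1
    total=$1
    shift
    for w in "$@"; do
        total=$((total + w))
    done
    echo $(( mine * 100 / total ))
}

# Example: weight_share 1000 500   (one group at 1000, one at the default 500)
```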
Instead of using the echo command, you can also assign processes to groups using the cgclassify command.
If you want to launch a process directly in a specific group, you can use the cgexec command like this:
cgexec -g blkio:dd dd if=/dev/zero of=/tmp/test
Automatic
Assigning processes to groups manually can be tiresome and error-prone. It makes far more sense for the cgrulesengd daemon to handle these assignments automatically. To allow this to happen, the service needs the /etc/cgrules.conf file, which tells it which process belonging to which user should be assigned to which control group. The file has a fairly simple syntax:
<user>[:<process>] <controllers> <destination>
Using the example with the dd command, the rule would look like this:
*:dd blkio /dd
This adds dd processes belonging to all users to the /dd control group on the blkio resource controller.
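If you maintain the rules file from scripts, it is easy to end up with duplicate lines. One possible approach, sketched here with a hypothetical helper named add_cgrule, is to append a rule only if exactly the same line is not already present.

```shell
# Append a rule to a cgrules.conf-style file unless the exact line exists.
# $1: rule text, $2: rules file (defaults to /etc/cgrules.conf).
add_cgrule() {
    file="${2:-/etc/cgrules.conf}"
    # -x matches whole lines, -F treats the rule as a fixed string (so "*" is literal).
    grep -qxF -e "$1" "$file" 2>/dev/null || printf '%s\n' "$1" >> "$file"
}

# Example (as root): add_cgrule '*:dd blkio /dd'
```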
Hierarchies
Thus far, I have only looked at individual, isolated control groups; however, you can create hierarchies of groups to add more structure.
To be more precise, you can create additional cgroups within a control group, as in cgcreate -g blkio:/dd/user1.
The new cgroups appear as subdirectories and inherit the properties of the parent control group. All child cgroups then compete for the resources assigned to the parent cgroup.
If the parent cgroup is only allowed to write at 1MB/s, all of the child groups together are not allowed to exceed this maximum.
Resources are assigned hierarchically; however, these hierarchies don't work for the blkio controller as of this writing. The other controllers, such as CPU, memory, and so on, already support hierarchies.
Virtualization
Where does it make sense to deploy cgroups? Some special applications will benefit from cgroups in your daily work; but in most cases, it makes more sense to let the Linux kernel manage the resources itself rather than establishing limits. If you deploy a virtualization solution like KVM, however, in which you virtualize multiple guests on a single host, it can be very useful to restrict, prioritize, and measure guest resource use. Cgroups give you an ideal approach to implementing this.
To do this, you need to manage the virtualization via the libvirt libraries, using either LXC containers or QEMU/KVM. The libvirtd daemon then creates a separate cgroup with the guest name for each guest when launched. The group exists in the libvirt/qemu|lxc/guest hierarchy for each controller. You can now manage and prioritize the resources individually for each guest.
To allow a guest to use twice as much CPU time as a second guest, you need to modify the CPU controller's cpu.shares. To achieve your goal here, just change the default value from 1024 to 2048. You can use a similar approach to configuring RAM or bandwidth usage. To do so, use the memory controller or the net_cls controller in combination with the tc command.
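Changing cpu.shares for a guest comes down to writing a number into a file in the guest's cgroup directory. The following sketch wraps this in a function; the name set_guest_cpu_shares is hypothetical, and the libvirt/qemu/<guest> directory layout below /cgroup/cpu is an assumption based on the paths used in this article, so it may differ on your system.

```shell
# Set cpu.shares for one libvirt-managed QEMU/KVM guest.
# $1: guest name, $2: shares value, $3: cpu hierarchy root (default /cgroup/cpu).
# Assumption: guest cgroups live in <root>/libvirt/qemu/<guest>.
set_guest_cpu_shares() {
    dir="${3:-/cgroup/cpu}/libvirt/qemu/$1"
    if [ ! -d "$dir" ]; then
        echo "no cgroup found for guest $1 in $dir" >&2
        return 1
    fi
    echo "$2" > "$dir/cpu.shares"
}

# Example (as root): set_guest_cpu_shares guest1 2048
```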
Note that you need a recent libvirt version to support the net_cls controller. This controller differs from all other controllers in that it only sets a class ID and then expects the administrator to manage the actual bandwidth using the tc command (see the "Bandwidth Management" box).
You can't use the blkio controller with libvirt at this time because the controller doesn't currently support the hierarchies that libvirtd wants to create. The kernel developers are already working on a solution [2].
If you want to bill for the time used by individual virtual guests, you can use the cpuacct controller to do so. This counts the CPU time actually used by each guest in /cgroup/cpuacct/libvirt/qemu/guest/cpuacct.usage in nanoseconds.
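Because the counter is in nanoseconds, you will usually want to convert it before putting it on a bill. A minimal sketch of the conversion to seconds follows; cpu_seconds is a hypothetical helper name, and the rounding to two decimals is an arbitrary choice for this example.

```shell
# Convert a cpuacct.usage value (nanoseconds) to seconds with two decimals.
# $1: path to a cpuacct.usage file.
cpu_seconds() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk '{ printf "%.2f\n", $1 / 1000000000 }' "$1"
}

# Example: cpu_seconds /cgroup/cpuacct/libvirt/qemu/guest/cpuacct.usage
```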
Threads
The current crop of cgroup implementations works on the basis of threads. Each thread in a process can be managed in a separate cgroup, and you need to remember this when you set out to assign the processes to cgroups with the echo command after launching them. You need to assign all the running threads (/proc/pid/task/) to corresponding cgroups.
The cgexec command facilitates this task. The command launches the process in the cgroup, and any child processes and threads then inherit from the group.
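For a process that is already running, the per-thread assignment can be sketched as a loop over the entries in /proc/<pid>/task, writing one thread ID at a time to the group's tasks file. The function name move_all_threads and its optional third argument (an alternative proc root, useful only for trying the function out) are assumptions made for this example.

```shell
# Write every thread ID of a process to a cgroup's tasks file, one ID per write.
# $1: PID, $2: tasks file of the target cgroup, $3: proc root (default /proc).
move_all_threads() {
    for t in "${3:-/proc}/$1"/task/*; do
        [ -e "$t" ] || continue          # no threads found (process gone?)
        # The directory name is the thread ID; a thread may exit in the meantime.
        echo "${t##*/}" >> "$2" || echo "could not move thread ${t##*/}" >&2
    done
}

# Example (as root): move_all_threads "$pid" /cgroup/blkio/dd/tasks
```

Threads created after the loop has run inherit the cgroup of the thread that creates them, so the loop only has to cover the threads that exist at that moment.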
Conclusions
Unfortunately, only the very latest distributions support cgroups, and individual functions are available only in the latest Linux kernels. In other words, administrators need to check which properties they can use. After doing so, however, cgroups give you some very powerful functionality for managing process and guest resources, especially in virtualization environments.