Lead image: © Russell Shively, 123RF.com

Get the best out of your bandwidth with tc

Control Freak

If you're looking for a way to make the most of available bandwidth, we'll show you how to throttle and shape traffic with tc. By Chris Binnie

System admins have many legitimate reasons for wanting to restrict the bandwidth allocated to a user or service. Bandwidth is still a finite and relatively expensive resource, even on today's Internet. The good news is that Linux provides a formidable bandwidth throttling solution that is hard to ignore.

With the ability to control bandwidth on routers, servers, and desktops, this solution is certainly fit for the purpose – especially because commercial alternatives, usually bolted on or bundled with other proprietary operating systems, can cost significant money to implement. In some cases, these commercial solutions don't even perform well under load, incorrectly allocating bandwidth to one service or user to the detriment of the others using the network.

Before I get too carried away here, let me state that the Linux solution I'm referring to isn't for the faint-hearted. In fact, I'd go as far as to say that anybody who isn't blessed with an abundance of technical chutzpah might be better off looking away now.

That said, if you are up for the challenge, in this article I'll provide some working examples for you to experiment with, and I'll give you a glimpse into the inner workings of this solution. Supporting documentation can be found online, starting with the Linux Advanced Routing and Traffic Control website [1].

Ladies and Gentlemen

Traffic Control abbreviates nicely to tc. The man page for tc describes what it does beautifully (paraphrased for the sake of simplicity): Tc is used to configure Traffic Control in the Linux kernel, which consists of shaping, scheduling, policing, and dropping traffic.

Processing of traffic is controlled by three kinds of objects: qdiscs, classes, and filters.

A class defines the rules or constraints that will be applied to the traffic. A filter defines the set of IP addresses (or other identifiers) to which the rules defined in the class apply.

The Appliance of Science

Imagine you're testing a remote application that reacts differently to how much bandwidth is available for your Internet connection. Using tc, it is possible to alter your throughput quickly with a (relatively long) single command line.

Consider a backup service you use on your desktop – which conveniently pushes any file changes to your home directory – to some type of cloud storage. This backup service might get a little greedy with your DSL's precious upload capacity. You want to throttle specific ports or IP addresses so you can get some work done while the differential backup is quietly running in the background. Your friend tc can also handle this situation perfectly. Of course, this approach could be applied to port 80 (for HTTP traffic) or port 25 (for SMTP traffic) with minor changes.

Another scenario for tc might be when you are using Linux as a router (a home gateway, or a fully fledged multihomed router running BGP), and you want to throttle each LAN machine's outbound capacity so those machines play nicely with your limited bandwidth. With tc, you can separately enforce different levels of inbound bandwidth throttling.

At this stage, it's worth mentioning that, unlike your standard DSL connection, which is asymmetric and provides limited upload capacity but generous download capacity, the connection I'll refer to in the next few sections relates to a Linux router on a symmetric connection that offers an inbound capacity equal to its outbound capacity.

Other Things Starting with B

If you've ever written a shell script, you'll be familiar with the shebang at the beginning of the script (e.g., #!/bin/bash), showing where the interpreter resides on the system. You will also have most likely developed a habit of declaring variables at the start of your shell script. In this case, I will start by building a bandwidth throttling script, using Bash as the interpreter.

Thankfully, I only need a few variables for the script. The first task is to get the network interfaces in order. I will demonstrate how to treat a standard Linux machine as a router. Within that scenario, I need two network interfaces, loosely (and sometimes confusingly) referred to as an ingress and an egress network interface.

Better to Give than Receive

Imagine the group of machines on the LAN are servers, and I want to limit how much of the network link each machine can use at any one time. Also, for the sake of argument, I will assume these servers send more information out to the Internet than they receive (like a typical web server).

I'm going to assume that, in this case, the Internet-facing network interface is eth0. In other words, when traffic is sent to the Internet, or when I get traffic from the Internet, it will arrive on eth0. Conversely, any traffic received by the router on the Local Area Network (LAN) will be received on the network interface labeled eth1. (I can safely assume that most traffic sent to the router is destined for the outside world, and the LAN machines rarely need to communicate with the router itself.)

Remember that the router will receive data from the Internet and also the LAN on both interfaces; in other words, both network interfaces, eth0 and eth1, might be considered ingress interfaces. I will name eth1 DEVING because it is the main ingress interface. For simplicity, the Internet-facing network interface, eth0, will simply be referred to as DEV.

DEV=eth0
DEVING=eth1

In the same vein as command switches (like those used by iptables), I also need to declare which direction the traffic is going. Rather than matching on the destination (dst), this example uses a DIR variable set to src, so that traffic is matched by its source address.

DIR=src

Recall that I am working with a group of LAN machines, like web servers, that send more traffic than they receive out through the router. In this case, I will focus the bandwidth shaping on traffic sent from the LAN machines and not to them.

To declare that the network link is running at Gigabit speeds, I will use the BAND variable:

BAND=1Gbit

This variable lets tc know the ceiling capacity, which is important for accuracy when tc is flexing its muscles with its highly complex algorithmic number-crunching calculations, or even more so when it is sifting through the wheat and chaff on a really busy network link.

Mr. Average

When it comes to the average packet size I'll be policing or shaping, I am admittedly faced with a bit of a minefield. Rather than get too deep into the whys and wherefores, I'll use a setting where PKT equals 920.

PKT=920

For optimum bandwidth throttling results, finding out your average packet size can really help, but generally it is only going to make an obvious difference if you plan on throttling a large part of a gigabit's worth of traffic. One quick way of finding your average packet sizes might be from a switch on your local network, if it gathers useful statistics, or maybe from your ISP's Internet-facing gateway box.

Several packages available on Linux will help you glean what your busiest protocol is and what your average packet size is. Tools such as Snort [2] will run for a few minutes and offer detailed data about your network's protocol usage. Ntop [3] is another example that will also give information about packet sizes. Failing these choices, there's always my preference, the succinct tcpstat [4], which will report on protocol data with a command as simple as:

tcpstat -i eth0

One of these packages (or just the output from your switch on its own) should help you make an informed decision about what to set for the average packet size parameter if you are inclined to look at tc in greater detail to improve its performance.
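If none of those tools is at hand, the kernel's own per-interface counters can give a rough figure. The following is only a sketch: it assumes the usual sysfs counters under /sys/class/net/<interface>/statistics, the function names are my own, and the result is an average since the counters were last reset rather than a picture of current traffic.

```shell
#!/bin/sh
# Integer average packet size: total bytes divided by total packets.
avg_pkt() {
    bytes=$1
    packets=$2
    if [ "$packets" -eq 0 ]; then
        echo 0          # no traffic seen yet, avoid dividing by zero
    else
        echo $(( bytes / packets ))
    fi
}

# Read the transmit counters for an interface from sysfs (assumed layout).
avg_tx_pkt() {
    stats="/sys/class/net/$1/statistics"
    avg_pkt "$(cat "$stats/tx_bytes")" "$(cat "$stats/tx_packets")"
}

# Worked example with made-up counters: 9,200,000 bytes in 10,000 packets.
avg_pkt 9200000 10000    # prints 920
```

On the router itself, something like PKT=$(avg_tx_pkt eth0) would feed the result straight into the script's PKT variable.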

A quick tangent at this point will allow me to point out that, should you make a mistake while building your bandwidth throttling script, running it using Bash's debugging mode can help point out which line is the culprit. Simply run your script as follows:

# bash -x ./bandwidth_throttler

Typing dmesg could give direct feedback from the kernel too.

Cooking with Gas

Now I will move onward into the deep, dark innards of the script. (See the box titled "Weights and Measures" for a note on units referenced in tc.) Each time I run the script, I want to reset the previous values that were set live the last time the script was run. So I begin with the del line:

tc qdisc del dev $DEV root

Next, I'll install a qdisc. qdisc is an abbreviation for queuing discipline. In basic terms, think of it as a buffer between the kernel and your network interface that filters how traffic is distributed to your network. That is a very basic description, but it will suffice for now. I will be using a classful qdisc, which simply means it can contain classes or other rules under its umbrella.

I have had mixed results with the most popular classful qdisc, HTB, so I will focus on CBQ, an older alternative. CBQ stands for Class-Based Queuing.

CBQ's man page (type man tc-cbq) makes a beautifully simple statement early on that might help you understand how shaping or throttling works with tc:

When shaping a 10mbit/s connection to 1mbit/s, the link will be idle 90% of the time. If it isn't, it needs to be throttled so that it IS idle 90% of the time.

To reach that goal, I carefully need to slow or drop packets that are received too quickly. Again, I should point out that I'm keeping the language purposefully simple, so please be aware you will probably need to read the documentation further for a more accurate insight into the terminologies used.
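The arithmetic behind the man page's statement is easy to check. This small sketch (the function name is mine, and it uses whole-number shell arithmetic, so results are rounded) works out how much of the time a link has to sit idle for a given target rate:

```shell
#!/bin/sh
# Percentage of time a link must stay idle when shaped to a lower rate.
# Both arguments use the same unit (for example Mbit/s).
idle_percent() {
    link=$1
    target=$2
    echo $(( 100 - (100 * target / link) ))
}

idle_percent 10 1       # the man page example: prints 90
idle_percent 1000 25    # 25Mbit on a gigabit link: prints 98 (exactly 97.5)
```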

Parental Supervision

After I've cleared the decks with the del line, I need to begin to create a hierarchy. This means adding the qdisc configuration:

tc qdisc add dev $DEV root handle 1: cbq bandwidth $BAND avpkt $PKT allot 1514 cell 8 mpu 64

As you can see, the qdisc is added first and then pointed at $DEV, which is set globally at the start of the file. Second, I am asking tc to opt for the cbq qdisc, as opposed to the other alternatives (each has its own pros and cons, but as mentioned, CBQ should work fine in this case). Finally, I make the statement that I have a gigabit of bandwidth available to the qdisc and then provide a parameter with average packet size declarations and other finely tuned settings. In case you are wondering, 1: is the same as 1:0 in this case. (1:0 is the label I am giving the qdisc.)

Time for School

I have defined a parental qdisc, and now I must create children who live under that qdisc's roof. I need an uppermost class, directly underneath the qdisc, from which to spawn the children:

tc class add dev $DEV parent 1:0 classid 1:1 cbq bandwidth $BAND rate 632Mbit allot 1514 maxburst 20 avpkt $PKT cell 8 weight 64Mbit prio 7

Hopefully, as I go further into the details, the preceding command will become clearer, but again, I'm taking each step carefully so as not to cause too much eye strain along the way.

I am using the network interface defined globally in the script under $DEV. The parent parameter (keep thinking about hierarchies) points up to 1: or 1:0, which is the handle I gave the aforementioned qdisc. And, because it will be the parent of the child classes, I will call this class 1:1.

I have already told tc that I have a gigabit network link available, but the rate section is for declaring the sum total of all the child classes (and from what I've read, there is much variation in how this setting is actually used). In other words, on a 1000Mbit network pipe, I am telling tc that, whatever happens, I will only ever want to use 632Mbits. This could be for a number of reasons; for example, I might be charged by the ISP for exceeding the cap, or my network interface, or a switch, might only be able to cope with a limited volume of traffic.

Children, Please

The child classes are pretty straightforward, if you keep thinking about that hierarchy. First, the qdisc is at the top, and it is followed by a "master" class (of sorts) that then owns child classes, which are populated with filters (which I will come to shortly).

I will piece the jigsaw puzzle together a little later on, but in the meantime, the following is a class configuration line:

tc class add dev $DEV parent 1:1 classid 1:12 cbq bandwidth $BAND rate 25Mbit allot 1514 cell 8 weight 2500Kbit prio 3 maxburst 20 avpkt $PKT bounded

(Note that this line won't work on its own; all the configuration lines need to be present to function in a meaningful way.)

As you can see, I'm using the tc binary to add a class on the network interface held in the $DEV variable. I'm referencing up the hierarchy to the parent, the "master" class, which is labeled 1:1.

How about the bandwidth settings? I have arbitrarily named the first child class 1:12 and have set $BAND at a gigabit as usual. (The number 1 before the colon matches the handle of the qdisc, 1:, under which the "master" class 1:1 and its children live.) I have stated that the class can use up to 25Mbits at any one time. The final bandwidth setting comes into play when spiky traffic does reach that 25Mbit limit; should that happen, I need a way of coping with those spikes quickly while still being in a position to serve the full 25Mbits.

For this reason, I set the weight parameter to 10% of the class's rate, which is 2500Kbits (a tenth of 25Mbits). Again, for optimum performance, the 10% measurement is somewhat debatable, but I haven't had problems using it generally. Of course, it depends on whether you're throttling steady, flat-lining traffic or erratic, spiky traffic.
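If you would rather not work out that tenth by hand for every class, a one-line helper can do it. This is just a sketch of the rule of thumb above; the function name is my own, and it assumes rates are given as whole Mbit values.

```shell
#!/bin/sh
# Weight in Kbit as one tenth of a class rate given in Mbit
# (1Mbit = 1000Kbit in tc's rate units).
weight_kbit() {
    rate_mbit=$1
    echo $(( rate_mbit * 1000 / 10 ))
}

echo "weight $(weight_kbit 25)Kbit"     # prints: weight 2500Kbit
echo "weight $(weight_kbit 100)Kbit"    # prints: weight 10000Kbit
```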

Filtered Coffee

I still haven't declared which IP addresses I want to associate with the shiny new class called 1:12. For this purpose, I need to use filters. (Many, many filters can reside under each child class.) My filter definition is as follows:

tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip $DIR 12.34.56.78 flowid 1:12

You'll be glad to know the filter's config is much simpler than the classes or the qdisc configuration.

Aside from the now somewhat familiar configuration parameters, such as which network interface to use, I am stating that I want to affect the IP protocol under the 1:0 qdisc. Note also that I'm declaring direction here (i.e., source or destination), and in this case, $DIR will catch traffic coming from the next parameter, 12.34.56.78, the source IP address.

I finish with the creation of the first filter by making sure it sits under the 1:12 child class correctly.

Never Been Good at Jigsaws

So far, I have defined a few important configuration lines. Now it is time to start assembling them, which I hope will make things a little clearer. I'll tail the bandwidth throttling script shortly for the purposes of good housekeeping, but Listing 1 shows what I have so far.

Listing 1: Bandwidth Throttling Script

#!/bin/bash
# Our predefined variables
DEV=eth0
DEVING=eth1
DIR=src
PKT=920
BAND=1Gbit
# Reset values from previous script runs
tc qdisc del dev $DEV root
# Install your qdisc
tc qdisc add dev $DEV root handle 1: cbq bandwidth $BAND avpkt $PKT allot 1514 cell 8 mpu 64
# Create your "master" class
tc class add dev $DEV parent 1:0 classid 1:1 cbq bandwidth $BAND rate 632Mbit allot 1514 maxburst 20 avpkt $PKT \
  cell 8 weight 64Mbit prio 7
# Then create your "child" class 1:12
tc class add dev $DEV parent 1:1 classid 1:12 cbq bandwidth $BAND rate 25Mbit allot 1514 cell 8 weight 2500Kbit prio 3 \
  maxburst 20 avpkt $PKT bounded
# Now add filters to your child class
tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip $DIR 12.34.56.78 flowid 1:12

Lots of IPs

If I want to add more IP addresses to class 1:12, so that they might share the 25Mbits of allotted bandwidth, I could just keep adding filters:

tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip $DIR 98.76.54.32 flowid 1:12

Notice the only thing I have changed is the IP address, which is all that's needed to add more filters. Just make sure you're referencing the correct class.
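With more than a handful of addresses, a loop saves typing. The sketch below only prints the filter lines rather than running them, so it is safe to try anywhere; the function name is hypothetical, and the variables mirror those at the top of the script.

```shell
#!/bin/sh
DEV=eth0
DIR=src

# Print (rather than run) one u32 filter line per address for a class.
# Usage: print_filters CLASSID IP [IP...]
print_filters() {
    classid=$1
    shift
    for ip in "$@"; do
        echo "tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip $DIR $ip flowid $classid"
    done
}

print_filters 1:12 12.34.56.78 98.76.54.32
```

Once the output looks right, piping it to sh as root would apply the filters.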

Lots of Boxes

If I want to add different bandwidth policies in addition to the 25Mbit policy, I simply need to create another class – possibly one class for each machine or device on the LAN.

I admit that I'm superstitious, so I'll create a 1:14 class and avoid 1:13. I have the qdisc installed already and the "master" class, so I'm just going to create a child class with some filters and change their bandwidth settings and labels to 1:14. I'll throw in a generous 100Mbits of bandwidth this time (Listing 2).

Listing 2: 1:14 Class

# Configure "child" class 1:14 with new bandwidth settings
tc class add dev $DEV parent 1:1 classid 1:14 cbq bandwidth $BAND rate 100Mbit allot 1514 cell 8 weight 10Mbit \
  prio 3 maxburst 20 avpkt $PKT bounded
# Now add filters to your child class; three IP addresses can share 100Mbit in this case
tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip $DIR 1.2.3.4 flowid 1:14
tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip $DIR 5.4.3.2 flowid 1:14
tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip $DIR 4.3.2.1 flowid 1:14
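Once the number of classes grows, repeating these lines by hand becomes error prone. One possible approach, sketched here, keeps the policies in a small table (class ID, rate in Mbit, then the member addresses) and prints the matching class and filter lines. The table layout and function name are my own invention, and the weight is simply taken as a tenth of the rate.

```shell
#!/bin/sh
DEV=eth0
DIR=src
BAND=1Gbit
PKT=920

# Print the class line and the filter lines for one policy.
# Usage: print_policy CLASSID RATE_MBIT IP [IP...]
print_policy() {
    classid=$1
    rate=$2
    shift 2
    weight=$(( rate * 100 ))    # one tenth of the rate, expressed in Kbit
    echo "tc class add dev $DEV parent 1:1 classid $classid cbq bandwidth $BAND rate ${rate}Mbit allot 1514 cell 8 weight ${weight}Kbit prio 3 maxburst 20 avpkt $PKT bounded"
    for ip in "$@"; do
        echo "tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip $DIR $ip flowid $classid"
    done
}

# One line per policy: class ID, rate in Mbit, then the member addresses.
while read -r id mbit ips; do
    # $ips is left unquoted on purpose so it splits into separate addresses
    print_policy "$id" "$mbit" $ips
done <<EOF
1:12 25 12.34.56.78 98.76.54.32
1:14 100 1.2.3.4 5.4.3.2 4.3.2.1
EOF
```

The output is plain tc commands, so it can be reviewed first and then piped to sh as root after the qdisc and "master" class exist.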

Step Away from the Bandwidth

I've topped, and now I'm almost ready to tail the bandwidth throttling script. What I need is a way of catching any IP addresses not explicitly declared in the policies. There are a couple of ways I can treat these IP addresses. I could let them share a tiny bit of bandwidth (e.g., maybe they're temporary visiting wireless IP addresses that can survive with 256Kbits of bandwidth), or I could categorically, flatly deny them access to the Internet.

The following class will catch any undefined IP addresses and effectively black hole them – in other words, allocate them nearly zero bandwidth using a catch-all:

tc class add dev $DEV parent 1:1 classid 1:265 cbq bandwidth $BAND rate 1Kbit allot 1514 cell 8 weight 1Kbit prio 7 maxburst 20 avpkt $PKT bounded
tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip src 0.0.0.0/0 police mtu 1 drop flowid 1:265

I'm using the 0.0.0.0/0 notation to catch all IP addresses. Although you have other ways to achieve this, if I don't use tc for several months, this notation is easy to recognize and understand when I return to my scripts.
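The gentler option, letting undeclared addresses share a small allowance, might look something like the following sketch. It is meant as a replacement for the black hole pair, not an addition to it. The class ID 1:266 is arbitrary, the function name is hypothetical, and the weight again follows the one-tenth rule of thumb; the lines are printed rather than executed.

```shell
#!/bin/sh
DEV=eth0
BAND=1Gbit
PKT=920

# Print a catch-all class and filter that give undeclared addresses a small
# shared allowance instead of dropping their traffic.
# Usage: print_guest_policy RATE_KBIT
print_guest_policy() {
    rate_kbit=$1
    guest_weight=$(( rate_kbit / 10 ))
    echo "tc class add dev $DEV parent 1:1 classid 1:266 cbq bandwidth $BAND rate ${rate_kbit}Kbit allot 1514 cell 8 weight ${guest_weight}Kbit prio 7 maxburst 20 avpkt $PKT bounded"
    echo "tc filter add dev $DEV protocol ip parent 1:0 prio 100 u32 match ip src 0.0.0.0/0 flowid 1:266"
}

print_guest_policy 256
```

As with the black hole version, these lines belong at the tail of the script, after the filters for the explicitly declared addresses.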

Forward that Letter

If I want to use a commodity Linux box for routing, I need to enable the forwarding of traffic between network interfaces. You can permanently set traffic forwarding on most distributions by opening the file /etc/sysctl.conf in a favorite text editor and adding or uncommenting:

net.ipv4.ip_forward = 1

To apply the change without rebooting, run the following at the command prompt:

sysctl -p /etc/sysctl.conf

As I mentioned, IP forwarding is only needed for bandwidth throttling if you're routing between the interfaces. If you just want to throttle your single machine's services on one network interface, you can probably safely ignore this setting in most circumstances.
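To confirm whether forwarding is actually on, you can read the kernel's flag directly. This sketch assumes the standard procfs location; the function name is mine, and the optional path argument exists only so the check can be tried against a test file.

```shell
#!/bin/sh
# Report whether IPv4 forwarding is on, given the path to the kernel flag.
# The default path is the standard procfs location.
forwarding_state() {
    flag_file=${1:-/proc/sys/net/ipv4/ip_forward}
    if [ "$(cat "$flag_file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

if [ -r /proc/sys/net/ipv4/ip_forward ]; then
    forwarding_state
fi

# To switch forwarding on for the running kernel only (as root), sysctl's -w
# option writes a single value without touching /etc/sysctl.conf:
#   sysctl -w net.ipv4.ip_forward=1
```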

Crank Her Up

Save your script to your filesystem and call it something like bandwidth_throttler. Enter the following command at the prompt:

chmod +x bandwidth_throttler

to make it executable, and then set it live by running it with:

# ./bandwidth_throttler

You will probably see an error from the very first line (remember the del line?) because there's no qdisc to delete the first time you run the script. You can fine tune that later if you like.
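One way to do that fine tuning is to discard the error from the delete and carry on regardless. The helper names below are hypothetical, and the trade-off is worth stating: this also hides genuine failures, such as running the script without root privileges, so treat it as a convenience rather than a fix.

```shell
#!/bin/sh
DEV=eth0

# Run a command, hide its error output, and never fail the script.
quiet() {
    "$@" 2>/dev/null || true
}

# Reset the root qdisc on a device; silent when there is nothing to delete.
reset_root_qdisc() {
    quiet tc qdisc del dev "$1" root
}
```

In the script, the del line would then be replaced with a call such as reset_root_qdisc "$DEV".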

As I mentioned, I'm being liberal with definitions and language on purpose for simplicity's sake; you might want to check dmesg at the command prompt for any kernel errors. If you see errors mentioning "quantum," look for whichever class the error mentions and adjust the weight parameter.

Come in, Tokyo

Now that bandwidth throttling is running, you can check the status. I'll leave it to you to decipher the majority of the output generated by these commands. They each offer important statistics about how tc is being used.

# tc -s -d filter show dev eth0
# tc -s -d class show dev eth0
# tc -s -d qdisc show dev eth0

The second command is probably most pertinent for understanding what the script is doing. The output is as follows:

class cbq 1:48 parent 1:1 rate 4000Kbit cell 8b (bounded) prio 3/3 weight 409600bit allot 1514b
level 0 ewma 5 avpkt 920b maxidle 1.6ms
 Sent 185501283289 bytes 158312905 pkt (dropped 1492, overlimits 451813147 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 borrowed 0 overactions 48498715 avgidle 812750 undertime 0

The first line states that 4Mbit (or more accurately, 4000Kbit) is allocated to the class. The third line tells how many packets and how many bytes of data it has shipped since the script was last run. Further on, in brackets on the same line, the output reveals that the LAN machine's packets have bounced off the class's 4Mbit ceiling 451,813,147 times; however, it looks like tc is working well because it has only had to drop 1,492 of those packets to continue service. You can make further inroads into the complex ins and outs of the other meanings by using the excellent man page: man tc.
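If you only care about those headline numbers, a short awk filter can pull them out of the class output. This is a sketch that assumes the output layout shown above (a line starting with "class" followed by a "Sent" line); the function name is my own.

```shell
#!/bin/sh
# Condense "tc -s -d class show" output to one line per class.
summarize_class_stats() {
    awk '
        $1 == "class" { id = $3 }          # remember the class ID, e.g. 1:48
        $1 == "Sent" {
            gsub(/[(),]/, "")              # strip punctuation, fields are re-split
            for (i = 1; i <= NF; i++) {
                if ($i == "pkt")        pkts    = $(i - 1)
                if ($i == "dropped")    dropped = $(i + 1)
                if ($i == "overlimits") over    = $(i + 1)
            }
            printf "class=%s packets=%s dropped=%s overlimits=%s\n", id, pkts, dropped, over
        }'
}

# Sample input copied from the output above.
summarize_class_stats <<'EOF'
class cbq 1:48 parent 1:1 rate 4000Kbit cell 8b (bounded) prio 3/3 weight 409600bit allot 1514b
 Sent 185501283289 bytes 158312905 pkt (dropped 1492, overlimits 451813147 requeues 0)
EOF
```

On a live system, the same function would sit at the end of a pipe: tc -s -d class show dev eth0 | summarize_class_stats.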

This Vehicle is Reversing

Just to keep you on your toes, I'll give you a working script for another qdisc called the Ingress qdisc (Listing 3). With all you have learned so far, the transition should be pretty simple.

Listing 3: Ingress qdisc Script

#!/bin/bash
DEV=eth0
# Delete the Ingress qdisc to reset it
tc qdisc del dev $DEV handle ffff: ingress
# Bind the Ingress qdisc
tc qdisc add dev $DEV handle ffff: ingress
# First LAN Machine INBOUND
tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 match ip dst 12.34.56.78 \
  police rate 1Mbit burst 100k drop flowid :1001
# Second LAN Machine INBOUND
tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 match ip dst 98.76.54.32 \
  police rate 2Mbit burst 200k drop flowid :1002

Possibly the only tricky element is understanding that this new script is for a different direction of traffic, and it uses dst for destination instead of src for source. This script focuses on the traffic coming into the Linux router from the Internet, and the aim with this specifically built qdisc is to throttle how much of the inbound data is allowed per IP address (in terms of strict definitions, this script manages policing and not shaping). The structure of the script is different with the Ingress qdisc. In simple terms, it might be described as just a container on which you can add filters, rather than a fully fledged qdisc.
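Because every policing filter follows the same pattern, the lines for further LAN machines can be generated. In this sketch the function name is hypothetical, and the burst simply copies the ratio used in Listing 3 (100k per Mbit of rate), which is an assumption on my part rather than a rule. The commands are printed, not run.

```shell
#!/bin/sh
DEV=eth0

# Print one ingress policing filter for a destination address.
# Usage: print_police IP RATE_MBIT FLOW
print_police() {
    ip=$1
    rate=$2
    flow=$3
    burst=$(( rate * 100 ))    # same ratio as the listing: 100k per Mbit
    echo "tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 match ip dst $ip police rate ${rate}Mbit burst ${burst}k drop flowid :$flow"
}

print_police 12.34.56.78 1 1001
print_police 98.76.54.32 2 1002
```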

What's That Ingress qdisc Doing?

The following command lines check the status on ingress policing. Look for references to the flows labeled 1001 and 1002 if you're unsure about the output.

Check for dropped statistics:

# tc -s -d qdisc show dev eth0

Check for overlimit counts:

# tc -d -s filter show dev eth0 parent ffff:

With some trial and error, you should become comfortable with the Ingress qdisc statistical output, so I would recommend spending some time getting used to it. It's far simpler than CBQ.

One-Point-Twenty-One Jiggawatts

I mentioned earlier I was focusing on the IP protocol. The mighty tc lets you control almost any traffic your networking stack might offer. Among other things, it can handle hexadecimal instruction with bizarre port and protocol combinations. One somewhat impressive example might be that tc can be used on a Linux router to throttle floods of traffic effortlessly to a specific port on your entire LAN. This feature means, in the event of an attack, your network stays up, even if every machine on your LAN is being flooded with an attack on a specific port or a number of ports.
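As a rough illustration of the port-based case, the u32 classifier can match on protocol and destination port instead of an address. The sketch below prints a policing filter for inbound TCP traffic to one port, whatever its destination address. It assumes the Ingress qdisc with handle ffff: is already bound to the interface; the function name, rate, burst, and flow ID are illustrative, and u32's dport match assumes IP headers without options.

```shell
#!/bin/sh
DEV=eth0

# Print an ingress policing filter for one TCP destination port.
# Protocol 6 is TCP; 0xff and 0xffff are the masks for the matched fields.
# Usage: print_port_police PORT RATE_MBIT FLOW
print_port_police() {
    port=$1
    rate=$2
    flow=$3
    echo "tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 match ip protocol 6 0xff match ip dport $port 0xffff police rate ${rate}Mbit burst $(( rate * 100 ))k drop flowid :$flow"
}

print_port_police 80 1 2001
```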

The End is Nigh

My aim was to provide you with the simplest possible way to use Traffic Control without becoming bogged down with its many arcane complexities. Traffic Control is a truly powerful addition to the Linux kernel, with a multitude of applications, whether it is just used for keeping bandwidth in check or throttling attacks. I hope you find tc as useful as I have found it, and I hope this introduction will help you scale the initial learning curve much faster.