Creating a redundant array of inexpensive links
RAID for the Network
Stable Internet connections (uplinks) are mission critical in many enterprises. Unfortunately, they often break down. If you want redundant uplinks through two or more providers, you will typically turn to the Border Gateway Protocol (BGP). This solution can be fairly expensive, though, because providers charge dearly for enterprise-level connections. With a few restrictions, you can achieve redundancy far less expensively by opting for Linux and the Fault Tolerant (FT) Router [1].
A Linux host typically sends its packets with the help of a routing table. All packets that do not belong to a specific route follow the default route, which usually leads to the Internet. If this link fails, all the users on the inside are cut off from the Internet. The reasons for failure can be many, including bulldozers digging up cables, Layer 2 or 3 software failures, or routers that fail one hop downstream on the provider's network.
To avoid hard disk failures, administrators have relied on RAID for a long time; in the simplest case, this means simply doubling the number of disks in a mirroring RAID [2]. This isn't quite as easy for access lines. In the classic setup for this scenario, at least two Internet providers safeguard the network; that is, your own connection has two uplinks.
The administrator needs to inform the rest of the world using BGP (on internal networks, this can also be an internal routing protocol such as OSPF, or Open Shortest Path First). If one link fails, the protocols notice this and stop sending packets over the dead link.
The protocols detect failures automatically. If the Internet Protocol (IP) fails even though the link is working perfectly at the lowest level, the routing protocol notices this through active monitoring. Although it can take a while, at least the changeover happens without intervention (i.e., without forcing the administrator out of bed in the middle of the night).
Although the "I" in RAID stands for "Inexpensive," a redundant network connection is not something you can have for $19.99 a month. Instead, you are probably looking at a three-figure amount. However, administrators with less cash at their disposal would probably relish the prospect of turning their Redundant Array of Links (RAL) into a RAIL.
Fortunately, Linux comes with almost all the components that you need to implement such a RAIL. However, the network administrator does need to enable the relevant features kernel-side. On its own, Linux does not detect link failures; it continues sending data packets into a black hole at the end of the failed link, and the user receives non-reproducible load errors. Before you integrate FT Router into your setup, it is a good idea to familiarize yourself with the underlying software.
Wild Routing with iproute2
The ip command in the iproute2 package can configure everything the Linux kernel offers in terms of network technology. Many functions can only be configured with this tool (e.g., routes in additional routing tables); in other cases, it replaces standard tools such as ifconfig, route, or vconfig, which creates VLANs.
For example, ip can place routes in additional routing tables. To do so, you need to assign your new route the table <X> parameter, where <X> can be a number or a name that points to a specific routing table in the /etc/iproute2/rt_tables file. The command
ip route add table 5 192.168.0.0/24 via 10.1.1.1
assigns the route for network 192.168.0.0/24 to routing table number 5, for example.
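As a minimal sketch of the name-based variant, the following dry run prints the line you would append to /etc/iproute2/rt_tables and the matching route command. The table name uplink1 and both helper functions are hypothetical and only serve this illustration; nothing is executed against the kernel.

```shell
# Dry-run sketch: prints what would be configured, executes nothing.
# "uplink1" is a hypothetical table name; 5 is the table number used above.

# Line to append to /etc/iproute2/rt_tables (format: number, then name)
rt_tables_entry() {
    printf '%s %s\n' "$1" "$2"
}

# The route from above, addressed by table name instead of number
named_route_cmd() {
    printf 'ip route add table %s %s via %s\n' "$1" "$2" "$3"
}

rt_tables_entry 5 uplink1
named_route_cmd uplink1 192.168.0.0/24 10.1.1.1
```

Once the name is registered, the number and the name should be interchangeable wherever ip expects a table.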
Policy Routing
If multiple routing tables exist, the administrator needs to tell the kernel when to use which routing table with the ip rule command, which creates rules that match on source and destination IP addresses; network blocks are also permitted. You can also match on a network interface and on a value for the type of service (TOS) field in the IP header.
To keep the rules as flexible as possible, you can also use the fwmark argument. This hexadecimal value matches the firewall marker that iptables sets in its mangle table, which lets you route by ports or protocols (Listing 1). For more detailed information, search the web for "Policy Routing" [3].
Listing 1: Iptables Rules for Policy Routing
iptables -t mangle -A PREROUTING -i <source_interface> \
  -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff
iptables -t mangle -A PREROUTING -i <source_interface> \
  -j CONNMARK -p tcp --dport 80 -m conntrack --ctstate NEW --set-mark 0x10
To send all the packets from subnet 172.16.0.0/24 to routing table 8 with the default route 10.2.2.1, you would enter the following two commands:
ip route add table 8 default via 10.2.2.1
ip rule add from 172.16.0.0/24 lookup 8
Administrators need to be aware of one small thing when using multiple routing tables: An additional table does not contain the routing entries that the kernel creates automatically in the main table when you assign a new IP address. You need to add these manually if you want to use them (e.g., in policy routing). Otherwise, Linux sends all response packets via the wrong route.
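One common way to add the missing entries is to copy the connected routes from the main table. The following dry-run sketch reads output in the format of ip route show table main and prints one ip route add command per connected (scope link) route. The function name, the sample addresses, and the interface names are assumptions for illustration.

```shell
# Dry-run sketch: reads "ip route show table main" output on stdin and prints
# one "ip route add" command per connected (scope link) route.
# Nothing is executed; table 8 matches the example above.
copy_connected_routes() {
    table=$1
    while IFS= read -r route; do
        case $route in
            *"scope link"*)
                printf 'ip route add table %s %s\n' "$table" "$route"
                ;;
        esac
    done
}

# Sample input in the format printed by "ip route show table main"
# (addresses and interface names are illustrative assumptions)
copy_connected_routes 8 <<'EOF'
default via 10.1.1.254 dev eth1
10.2.2.0/24 dev eth2 proto kernel scope link src 10.2.2.2
172.16.0.0/24 dev eth0 proto kernel scope link src 172.16.0.1
EOF
```

On a live system, you would feed the function from ip route show table main and, after checking the printed commands, pipe them to a root shell.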
If you only want to forward outgoing HTTP traffic over a different route, you can create the two iptables rules from Listing 1. The first restores the markers on the outgoing packets, and the second assigns new connections with destination port 80 a hexadecimal marker (--set-mark 0x10). You then need a routing rule that looks for this marker:
ip rule add fwmark 0x10 lookup 8
Using this approach, you can redirect packets, but the kernel does not distribute them across different routes.
Multipath Routing
Using the ip route command, administrators can set up multiple default routes, even with different weighting. This assumes that the kernel was built with multipath routing support up front.
Linux can balance traffic at the packet or connection level. Distributing the packets individually achieves maximum utilization of both links, but it entails a couple of risks in practical applications. For example, a counterpart at the other end of the links must distribute the packets in the same way; otherwise, only the upstream is load balanced.
If the links use different speeds, it's possible that the sender transmits the packets in the right order, but they arrive in the wrong order. Because TCP connections only compensate for differences up to the buffer size, the speed settles to that of the slowest link. Additionally, address translation is impossible on this route.
Alternatively, admins can set up each new connection on a different link. Although the individual streams are, at most, as fast as the quickest single link, this still leads to improved data throughput for users.
To set up two default routes with load balancing, you would enter:
ip route add default nexthop via 10.1.1.1 dev eth1 \
  nexthop via 10.2.1.1 dev eth2
If you want to work at the packet level, you need to add the equalize keyword after the add command. To achieve weighting, type weight <number> after the individual hops. If one of the two routes now fails, every second connection will hit a timeout.
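To illustrate the weight keyword, the following dry-run sketch assembles a weighted multipath default route from a list of hops and prints the resulting command. The multipath_cmd function and its gateway,device,weight argument format are assumptions made for this example, as is the choice of eth1 as the faster link.

```shell
# Dry-run sketch: builds a weighted multipath default route and prints it.
# Each argument is gateway,device,weight (a hypothetical format for this sketch).
multipath_cmd() {
    cmd='ip route add default'
    for hop in "$@"; do
        gw=${hop%%,*}        # text before the first comma
        rest=${hop#*,}       # text after the first comma
        dev=${rest%%,*}
        weight=${rest#*,}
        cmd="$cmd nexthop via $gw dev $dev weight $weight"
    done
    printf '%s\n' "$cmd"
}

# eth1 is assumed to be the faster link and gets twice the weight of eth2
multipath_cmd 10.1.1.1,eth1,2 10.2.1.1,eth2,1
```

With these weights, roughly two out of three new connections should leave through eth1. The exact distribution depends on the kernel version and the traffic mix.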
At this point, you have all the building blocks in place to send data across multiple links, monitor with a couple of simple commands, and remove one link from the cluster in case of failure. So, it's time for FT Router to enter the game.
Fault Tolerant
Source code for the Fault Tolerant Router project [4] and some fairly extensive documentation [1] can be downloaded from GitHub. Typing
gem install fault_tolerant_router
lets you install the software, which was programmed in Ruby (Figure 1). After launching, a daemon monitors your links and switches over when needed. When called with the generate_config option
sudo fault_tolerant_router generate_config
the program generates the /etc/fault_tolerant_router.conf configuration file (Listing 2). You need to modify this to suit your requirements. The file uses a YAML format [5] and contains the configuration groups uplinks, downlinks, tests, log, and email.
Listing 2: Excerpt of Config File
uplinks:
- interface: eth2
  ip: 10.1.1.1
  gateway: 10.1.1.254
  description: Example Provider 1
  weight: 1
  default_route: true
# Other Interfaces
[...]
downlinks:
  lan: eth1
tests:
  ips:
  - 192.168.254.254
  required_successful: 1
  ping_retries: 1
  interval: 60
log:
  file: "/var/log/fault_tolerant_router.log"
  max_size: 1024000
  old_files: 10
[...]
An uplinks definition typically comprises the name of the interface, the matching IP address, the IP address of the gateway for the interface, a description, a flag, and a weighting. The flag determines whether the interface belongs to the default routes. The higher the weighting, the more often this link is used for new connections. You would use weighting if your links run at different speeds (Figure 2).
In the downlinks section, you need to define an internal interface and, if so desired, a DMZ interface. The content below tests shows a list of IP addresses that the router pings regularly in random order in the scope of functional tests. The addresses should be accessible on the Internet. You will also need to define the number of tests a link needs to pass before it is deemed functional and how often the router sends pings. You also specify a timeout parameter and a retry parameter.
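The test logic described here can be approximated in a few lines of shell. The following is a sketch under stated assumptions, not FT Router's actual code: probe sends a single ping from the uplink's source address with a two-second timeout, and uplink_ok reports success as soon as the required number of test addresses has answered. Both function names and the timeout value are hypothetical.

```shell
# Sketch of the per-uplink test logic described above, not FT Router's code.
# probe() is the assumed test primitive: one ping sent from the uplink's
# source address (-I) with a 2-second timeout (-W). Override it for dry runs.
probe() {
    ping -n -c 1 -W 2 -I "$1" "$2" >/dev/null 2>&1
}

# uplink_ok SOURCE_IP REQUIRED TEST_IP...
# Returns 0 as soon as REQUIRED test addresses have answered, 1 otherwise.
uplink_ok() {
    src=$1
    required=$2
    ok=0
    shift 2
    for target in "$@"; do
        if probe "$src" "$target"; then
            ok=$((ok + 1))
        fi
        [ "$ok" -ge "$required" ] && return 0
    done
    return 1
}
```

With the values from Listing 2, a call such as uplink_ok 10.1.1.1 1 192.168.254.254 would correspond to one test cycle for the first uplink. A monitoring loop would repeat it for every uplink at the configured interval.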
The log section of the configuration file defines the maximum size and number of logfiles. If the status of an uplink changes, the daemon can dispatch email. The parameters required for this are also stored in the configuration file. The last parameters act as start counters for routing tables, firewall markers, and priorities, which you want the router to assign automatically.
The script also defines the required mangle and NAT rules on the basis of the configuration. The installation guide recommends merging these with an existing configuration. Because the router typically also assumes the role of the firewall in the infrastructure, this definitely makes sense. The command:
fault_tolerant_router generate_iptables
generates the rules in iptables-save format.
So that downstream routers do not need a configuration that takes the FT Router into consideration, you would NAT all outgoing sessions on each interface to the IP address of that interface. Then, launch the monitoring process with the fault_tolerant_router monitor command. If you additionally set the --debug option, you can follow in detail on the console which routes the FT Router enables and how the various functional tests work out.
Three-Track
The ADMIN test lab was set up with an FT Router with three uplinks in a KVM environment. The VMs are connected using Open vSwitch, which was also used to restrict the bandwidth on the router's uplinks. In other words, it emulated typical bandwidth patterns in a consumer environment. Figure 3 shows a sketch of the network. I then ran wget to download data from a web server.
On the interface between the client and FT Router, iptraf-ng detected the traffic, revealing that wget commands started sequentially actually did run one after another with the assigned bandwidth as a result of session balancing. Starting multiple downloads at the same time activated all the links. On the route between the client and FT Router, the traffic detector revealed that the load was occupying the full bandwidth.
The next step was to block one of the links. The download running on it then stalled. Until the daemon started the next ping test, new connections that would normally have used this link were not established.
Only when the daemon removed the link from the pool did all the connection establishments work again. I then re-enabled the connection, and after another ping cycle, I was able to use the link in the normal way.
Conclusions
If a system administrator were tasked with creating a link that users would never notice failing because the technology compensated within milliseconds, FT Router would be the wrong choice. However, for administrators who can accept a short wait to avoid a triple-figure monthly fee, FT Router offers a solution that is easy to set up.
Beyond this, FT Router uses all links if there are no failures. If your links run at different speeds, a download can take considerably longer if it happens to be running on the slowest link. Although you can try to prevent this outcome with weighting, you have no way to rule out this behavior fully.