ONIE and Cumulus Linux on a switch
Going Cumulus
When you think of switches for datacenters, you probably think of products by Juniper, Cisco, HP, or Arista. Although high-end switches use their own operating systems for configuration, the hardware takes care of directing packets. The ASICs (Application-Specific Integrated Circuits) that handle the switching tasks often do not differ to any great extent; manufacturers create unique value through the configurations they can map. The control software determines what these maps look like – and in some cases, the control software is Linux.
Linux runs on practically any CPU architecture, and it has even found its way onto control units in switches. Ordinary Linux, however, does not provide the drivers and tools necessary to help administrators manage the switch hardware. The Cumulus Linux [1] project is an effort to provide a robust and versatile version of Linux tailored to run on network switches. The Cumulus distribution runs on devices by Dell, Edgecore, and a couple of other manufacturers. Cumulus is not free but is, instead, provided on a subscription basis (see the box "Licensed to Switch").
ONIE
For the operating system to work on the switch, it must be compatible with the Open Network Install Environment (ONIE) [3]. ONIE is an open bootloader environment – a mini Linux/BusyBox system that runs on the bare hardware and supports the installation of a network OS through remote provisioning. According to the project website, ONIE "… allows end-users and channel partners to install the target network OS as part of data center provisioning, in the fashion that servers are provisioned."
The ONIE team refers to it as the "Linux Kernel with BusyBox." ONIE discovers the actual operating system on the network or on a plugged-in USB stick, triggers the installation, and provides repair tools if something goes wrong during the installation or upgrade.
The boot sequence for a bare metal ONIE-capable device is as follows:
- After powering on, the minimal bootloader on the device launches (typically a version of U-Boot).
- The bootloader initiates the Linux kernel of the ONIE installer.
- ONIE searches for the operating system and installs it.
The ONIE installer [4] looks for the operating system in several locations. If ONIE finds a USB storage device and a file with the right name in its root directory (the naming rules are disclosed in the ONIE documentation [5]), the installation starts. If the admin connects a PC with a web server directly using a network cable, ONIE uses neighbor discovery to check the server's IP address for files that comply with the naming rules. The image simply needs to reside in the web server's document root.
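To get a feel for what "a file with the right name" means, the following sketch prints a list of candidate file names from most to least specific. The pattern is an assumption modeled on the ONIE documentation [5], which remains the authoritative source; the function name and the platform values are hypothetical and serve only as an illustration.

```shell
# Print candidate installer file names, most specific first.
# ASSUMPTION: the pattern is modeled on the ONIE documentation [5];
# check there for the authoritative list and order.
onie_candidates() {
    oc_arch=$1; oc_vendor=$2; oc_machine=$3; oc_rev=$4
    printf '%s\n' \
        "onie-installer-${oc_arch}-${oc_vendor}_${oc_machine}-r${oc_rev}" \
        "onie-installer-${oc_arch}-${oc_vendor}_${oc_machine}" \
        "onie-installer-${oc_vendor}_${oc_machine}" \
        "onie-installer-${oc_arch}" \
        "onie-installer"
}

# Hypothetical platform values, for illustration only.
onie_candidates powerpc accton as4600_54t 0
```

The practical upshot is that a generically named image can serve a whole rack, whereas a more specific name lets you target one hardware model.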
The typical case on a network is to use DHCP (Figure 1). You need to set up DHCP option 114 (default-url) for the host with a direct link to the firmware. From then on, everything else works without any intervention. This approach means you can take the switch out of the box, install it, connect the management interface, and power on.
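As a minimal sketch of the DHCP side, the following shell function emits a host declaration that sets the default-url option. It assumes an ISC dhcpd server; the function name, host label, MAC address, and URL are hypothetical placeholders, not values from the lab setup.

```shell
# Emit an ISC dhcpd host declaration that points one switch at its installer.
# Arguments: host label, MAC address of the management port, installer URL.
# ASSUMPTION: ISC dhcpd syntax; adapt for other DHCP servers.
dhcp_host_stanza() {
    cat <<EOF
host $1 {
  hardware ethernet $2;
  option default-url = "$3";
}
EOF
}

# Hypothetical label, MAC address, and URL.
dhcp_host_stanza leaf01 00:11:22:33:44:55 http://192.0.2.10/onie-installer
```

You could append the output to your dhcpd configuration, one declaration per switch, and then restart the DHCP service.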
For more complex DHCP setups, ONIE adds a vendor ID to the DHCP requests so that the administrator can distinguish between the DHCP responses. If something goes wrong during the installation, ONIE offers a command line in the style of BusyBox. Administrators can use it to check out what has happened or restart the installation.
As of this writing, ONIE supports the following operating systems: Switch Light [6] by Big Switch Networks, Cumulus Linux, and MLNX-OS [7] by Mellanox. The system will work on either IPv4 or IPv6.
Cumulus Linux
Cumulus Linux is a Debian-based distribution with a couple of extras that control the switch hardware. Former staff at VMware and Cisco founded the company behind Cumulus back in 2010 with the aim of developing a "Network Linux." Figure 2 shows how the typical Linux components, the Cumulus extensions, and the switch hardware collaborate.
The switch we used in our lab, a 4600-54T by Edgecore, has 52 ports and a management interface.
You can view these ports using the ifconfig -a command; they appear not as eth interfaces, but as swp interfaces. The switchd daemon shown in Figure 2 intercepts all the standard Linux calls and converts them for the switch hardware. The switch hardware, in turn, is configured like a Linux computer with 53 interfaces.
Where the administrator of a Cisco switch would say no shutdown, you need to say ip link set swp1 up (for port swp1). You can enable switching between the switch ports with brctl, the standard bridge configuration tool.
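A minimal sketch of the manual steps might look as follows. The helper names run and bridge_up are hypothetical; the commands themselves are the standard ip link set, brctl addbr, and brctl addif calls. By default, the script only prints what it would do, so you can check the result before running it as root on the switch.

```shell
# Print commands by default; set DRY_RUN=0 to execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create bridge $1 and attach every remaining argument as a member port.
bridge_up() {
    bu_bridge=$1
    shift
    run brctl addbr "$bu_bridge"
    for bu_port in "$@"; do
        run ip link set "$bu_port" up
        run brctl addif "$bu_bridge" "$bu_port"
    done
    run ip link set "$bu_bridge" up
}

bridge_up br0 swp1 swp2
```

Settings made this way are lost at the next reboot.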
For the configuration to survive a reboot, it needs to reside in /etc/network/interfaces. To switch ports 1 to 5 together, you need to add the lines from Listing 1 to the interfaces file.
Listing 1: Simple Switch Configuration
auto br0
iface br0
    bridge-ports swp1 swp2 swp3 swp4 swp5
    bridge-stp on
Because the list would be pretty long for 52 ports, you can also specify a range of ports with the glob keyword:
bridge-ports glob swp1-10
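To make the meaning of the range explicit, the following sketch expands a prefix and an inclusive numeric range into the individual port names. It only illustrates what the glob stands for; it is not how Cumulus implements the expansion, and the function name is hypothetical.

```shell
# Expand a prefix and an inclusive numeric range into individual port names.
expand_ports() {
    ep_prefix=$1
    ep_i=$2
    ep_last=$3
    ep_out=""
    while [ "$ep_i" -le "$ep_last" ]; do
        # Append the next name, separated by a single space.
        ep_out="${ep_out:+$ep_out }${ep_prefix}${ep_i}"
        ep_i=$((ep_i + 1))
    done
    echo "$ep_out"
}

expand_ports swp 1 10
```

The call prints swp1 through swp10 on one line, which matches the port list you would otherwise have to type by hand.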
Readers who have worked with bridge and switch systems will see that Cumulus extends the syntax and options for standard Linux bridges. Of course, it is also possible to assign VLANs to ports and to bundle ports to create redundant uplinks.
In addition to the brctl bridge configuration command, Cumulus supports the Open vSwitch virtual switching technology. This support for Open vSwitch gives you a hardware-accelerated option for configuring overlay networks with Virtual Extensible LAN (VXLAN). If you want the device to operate at Layer 3, you can assign IP addresses to the bridge or swp interfaces. The Quagga suite handles dynamic routing protocols.
Tcpdump is one of the standard tools for troubleshooting, and one that is typically missing on switches. In most switch environments, you first need to configure a mirror port and connect a Tcpdump-capable device to it. In contrast, Cumulus lets you view traffic via tcpdump -n -i br0.
Large Pcap files still need a mirror port for analysis because the flash memory on the switch is neither big enough nor fast enough. But, if you have a full Linux system like Cumulus running on your switch, you can try out other tricks. For example, Collectd [8] will run on the switch and support monitoring with a matching server.
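If you do want to write a small capture to the switch, it makes sense to bound its size. The following sketch builds a tcpdump command line that stops after a fixed number of packets (-c) and truncates each packet (-s). The function name, the snapshot length of 128 bytes, and the output path are assumptions chosen for illustration.

```shell
# Build a tcpdump command line for a size-bounded capture.
# Arguments: interface, packet count, output file.
# ASSUMPTION: a snapshot length of 128 bytes is enough for header analysis.
bounded_capture_cmd() {
    echo "tcpdump -n -i $1 -c $2 -s 128 -w $3"
}

# Hypothetical values: 1000 packets from br0, written to /tmp.
bounded_capture_cmd br0 1000 /tmp/br0.pcap
```

With these values, the file cannot grow much beyond 1000 x 128 bytes plus the Pcap headers.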
Automation
The ONIE installer makes it easy to support unattended provisioning for switches. Clients for Puppet and Chef are available in the repository, and Ansible is also supported. If you have ready-to-use Cookbooks or Playbooks that modify /etc/network/interfaces, you can use automation software to configure the switch.
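What such tools do with the interfaces file boils down to rendering a stanza from a template and rewriting the file only when something has changed. The following shell sketch mimics that idea on a small scale; it is not a replacement for Puppet, Chef, or Ansible, and the function names and the temporary target file are hypothetical.

```shell
# Render a bridge stanza in the style of Listing 1.
# Arguments: bridge name, then the member ports.
render_bridge() {
    rb_name=$1
    shift
    printf 'auto %s\niface %s\n    bridge-ports %s\n    bridge-stp on\n' \
        "$rb_name" "$rb_name" "$*"
}

# Write the stanza to file $1 only if the content would change.
# Returns 0 when the file was updated, 1 when it was already current.
ensure_bridge() {
    eb_file=$1
    shift
    eb_new=$(render_bridge "$@")
    if [ -f "$eb_file" ] && [ "$(cat "$eb_file")" = "$eb_new" ]; then
        return 1
    fi
    printf '%s\n' "$eb_new" > "$eb_file"
    return 0
}

# Demonstration against a temporary file instead of the live configuration.
tmp=$(mktemp)
ensure_bridge "$tmp" br0 swp1 swp2 && echo changed
ensure_bridge "$tmp" br0 swp1 swp2 || echo unchanged
rm -f "$tmp"
```

The second call reports no change, which is the idempotent behavior configuration management tools rely on to decide whether the network needs to be reloaded.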
Conclusions
Traditionally minded network administrators might not like Cumulus and ONIE because they are very different from the standard tools for managing switches. On the other hand, a Linux server administrator with classical command-line skills will soon feel at home, despite facing some new shortcuts and technologies.
Living somewhere between these two worlds, I really enjoy troubleshooting networks with well-known Linux tools, and I am pretty sure the automation aspects will be an exciting feature for most administrators: You can easily integrate Cumulus into existing standard solutions, and it does not impose any limits on the administrator who is looking for more automation. In some cases, Cumulus might also save you some money, because bare metal switches are often considerably less expensive than their brand-name competitors.