Lead Image © Pei Ling Hoo, 123RF.com
 

OS10 and Dell's open networking offensive

Freedom, as in OS10

Dell's OS10 is a Linux-based operating system for network hardware that is designed to free network admins from the stranglehold of established manufacturers. We look at what it is, how the system works, and what it can do for you. By Martin Loschwitz

Dell caused a sensation at the beginning of 2016 when the manufacturer, known more for its server and desktop systems, presented an operating system for network switches named OS10 [1] (Operating System 10). Although Dell switches previously ran an operating system simply dubbed OS, OS10 ups the game in many ways, one of them being that – unlike its predecessor – OS10 is based on Linux. Dell also explicitly promises decoupling: OS10 runs not only on Dell devices, but also on generic network hardware.

In contrast to proprietary switch operating systems, OS10 also offers open APIs, turning switches into normal Linux servers that can be managed in large environments just like their server counterparts. I take a closer look at what Dell promises with OS10, how the operating system differs from classical switch firmware, and the market opportunities that OS10 could open up.

Market Analysis

A look at the market for network infrastructure helps you understand why an open switch operating system such as OS10 is attracting so much attention. The market is not renowned for being flexible and fast moving: For decades, only a few corporations have divided it up, led by Juniper and Cisco.

In most cases, choosing network hardware from one of these manufacturers amounts to a long-term commitment. If you have equipped your entire data center with hardware from one producer, you will find it difficult to break away, for several reasons. Although virtually all common network protocols and technologies are standardized, combining network hardware from two different vendors is nevertheless not easy in practice. If you have ever tried to use jumbo frames between switches from different manufacturers, you will be familiar with the problem.

Additionally, a network administrator cannot simply manage devices from other suppliers without the matching training. Just because you can handle Cisco switches does not mean you can operate Juniper hardware. For the same reasons, admins who only know Linux are usually ruled out of the network hardware game from the start. Switches thus integrate poorly with modern DevOps concepts: Companies typically maintain switch configurations separately from the rest of the installation.

The quasi-monopoly of the established producers is a problem in many ways. In addition to the lack of pressure to develop new features, the lock-in issue in particular prevents competition because it is difficult for new companies to gain a foothold with their own network hardware and reach a critical mass. Mellanox is a good example: In the InfiniBand market, the Israeli company is the undisputed market leader; however, the Ethernet division of the company, which offers some interesting products, is virtually unknown to many networkers.

Switches with proprietary software also hinder the development of additional features, because third parties cannot simply dock their products onto devices that lack open standards and interfaces. Juniper and others charge heavily for this kind of collaboration.

In recent years, there have been signs that the monopoly of the established manufacturers is slowly breaking down. Cloud computing, and especially software-defined networking (SDN), are the major drivers. Today, much of the functionality previously implemented primarily in the switch (i.e., in the physical network hardware) is implemented in software.

Breaking the Monopoly

This software does not need to run on the network devices of the cloud setup. In OpenStack clouds, switches are typically demoted to dumb iron that only receives packets on one port and delivers them to another. This is not a technical necessity, incidentally, because modern switches are actually small servers with many network ports. For a switch to run additional software, however, its firmware must be modular and open, which is where theory often collides with practice: Proprietary operating systems are precisely not open systems. Modifications to the firmware of the device can be made only to the extent permitted by the manufacturer.

Cumulus [2] shows another way: The operating system for switches can be installed on white-label hardware by various vendors; it offers open APIs as well as a genuine Linux kernel and a distribution based on Debian. The idea of the network switch as a simple server becomes a reality. Mellanox therefore relies on cooperation with Cumulus for its Ethernet products, and various Mellanox devices can be ordered with Cumulus installed. Because scalable setups are steadily gaining in importance, the open network infrastructure market has huge potential.

OS10 as a Competitor

The circle now closes with OS10: Dell is taking the same line, aiming to establish a switch operating system that runs on switches already on the market, can also be used on hardware from other vendors, and provides open interfaces. Ultimately, OS10 stakes Dell's claim to a sizable share of the large cloud networking market.

OS10 is similar to Cumulus in many ways: The core system is based on Linux and implements the Switch Abstraction Interface (SAI), a standard that Microsoft, Dell, Mellanox, Facebook, Intel, and Broadcom jointly developed within the framework of the Open Compute Project.

In the past few months, Dell has been beating the marketing drum for OS10. Bearing in mind the many similarities with Cumulus, the questions are: What can OS10 actually do? What can it do better than Cumulus? Where is the potential for improvement?

A Hardware-Software Bridge

The Switch Abstraction Interface is key to understanding the idea behind free operating systems for switches. Basically, manufacturers of network hardware face the challenge of giving the software on their switches access to the network chips built into the device. As with any network card, classical network hardware contains network chipsets: A few manufacturers (e.g., Mellanox) make the silicon themselves, whereas most providers rely on off-the-shelf chips (e.g., from Broadcom).

For an operating system to use the hardware, it typically has to talk to the firmware – which is the whole point: For standard operating systems such as Linux to address and use the network hardware of a switch, the manufacturer has to ensure that these operating systems can access the firmware of the installed chipset via a defined interface.

With classical network hardware, you face a monolithic block: The vendor software runs directly on the device and communicates with the hardware through the proprietary firmware of the respective chipset. The user has no influence on the nature and extent of the interfaces the switch software offers. SAI, by contrast, starts at the firmware of the switch chipset: If the firmware offers a standardized interface, any software running on the switch (e.g., a normal Linux kernel) can access the chipset via the SAI layer.

Dell has done exactly this with OS10. An abstraction layer based on the SAI specification sits between the hardware and the software the end user sees. The abstraction layer communicates downward with the hardware and exports standardized interfaces upward; in the case of Linux, for example, it generates netlink events for the kernel (Figure 1) when a new device is connected. A normal Linux 3.16 kernel runs on top of this abstraction, which in turn makes an interface to the switch firmware (i.e., the SAI) available to the system.

Figure 1: The network processing unit (NPU) is part of the SAI interface and exports a normal network interface and netlink events to the Linux side [3].
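To make "netlink events" concrete, here is a minimal Python sketch that decodes the RTM_NEWLINK and RTM_DELLINK messages the Linux kernel broadcasts on the NETLINK_ROUTE link group whenever an interface appears, disappears, or changes state. It relies only on the standard kernel structures (nlmsghdr, ifinfomsg, rtattr) and the Python standard library; it is not Dell code, and it assumes, as Figure 1 suggests, that the NPU layer surfaces switch ports through these ordinary kernel messages.

```python
import socket
import struct

RTM_NEWLINK = 16
RTM_DELLINK = 17
IFLA_IFNAME = 3      # attribute carrying the interface name
IFF_UP = 0x1
RTMGRP_LINK = 0x1    # multicast group for link events

NLMSG_HDR = struct.Struct("=IHHII")   # nlmsghdr: len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # ifinfomsg: family, pad, type, index, flags, change
RTATTR = struct.Struct("=HH")         # rtattr: len, type


def align4(n):
    """Netlink messages and attributes are padded to 4-byte boundaries."""
    return (n + 3) & ~3


def parse_link_messages(data):
    """Yield (event, ifindex, ifname, is_up) for each link message in a netlink buffer."""
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        msg_len, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if msg_len < NLMSG_HDR.size:
            break  # malformed message: stop parsing this buffer
        if msg_type in (RTM_NEWLINK, RTM_DELLINK):
            body = offset + NLMSG_HDR.size
            _family, _type, index, flags, _change = IFINFOMSG.unpack_from(data, body)
            name = None
            attr = body + IFINFOMSG.size
            end = offset + msg_len
            while attr + RTATTR.size <= end:
                rta_len, rta_type = RTATTR.unpack_from(data, attr)
                if rta_len < RTATTR.size:
                    break
                if rta_type == IFLA_IFNAME:
                    raw = data[attr + RTATTR.size:attr + rta_len]
                    name = raw.split(b"\0", 1)[0].decode()
                attr += align4(rta_len)
            event = "new" if msg_type == RTM_NEWLINK else "del"
            yield event, index, name, bool(flags & IFF_UP)
        offset += align4(msg_len)


def watch_links():
    """Print link events as the kernel emits them (Linux only, runs forever)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))  # (pid, groups): let the kernel pick the pid
    while True:
        for event in parse_link_messages(sock.recv(65536)):
            print(event)
```

Parsing is kept separate from the socket so it can be tested without a live kernel; calling `watch_links()` on the switch and then bringing a port up or down would print the corresponding events.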

On the basis of this setup, it is up to you how to configure the switch: You can use the resources of the running Linux instance in the usual way and deploy additional network services, or you can rely on Dell's Control Plane Services (CPS), an object-oriented framework that accesses the SAI layer directly (Figure 2) and resembles a classic vendor solution.

Figure 2: This overview of the OS10 architecture [3] shows that SAI is the focal point of the platform; both Linux and CPS build on it.

Dell will offer different modules for CPS, including functionality such as L2 networking and L3 routing (i.e., ready-made solutions). The important point is: If you do not want to use CPS, you can implement comparable functions with a standard Linux system. This approach also means that OS10 does not need to run exclusively on Dell switches: Any switch that implements the SAI standard should be capable of running OS10.
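The decoupling that SAI enables can be illustrated with a small analogy in Python: Upper-layer logic is written once against an abstract switch interface, and each hardware vendor supplies its own implementation. The class and method names below are hypothetical and only mirror the idea; the real SAI is a C API with its own function tables, and the fake backend merely records state instead of programming silicon.

```python
from abc import ABC, abstractmethod


class SwitchChipAPI(ABC):
    """Hypothetical vendor-neutral chip interface, loosely modeled on the idea behind SAI."""

    @abstractmethod
    def create_vlan(self, vlan_id): ...

    @abstractmethod
    def add_port_to_vlan(self, vlan_id, port): ...


class FakeChip(SwitchChipAPI):
    """Stand-in for one vendor's implementation; stores VLAN membership in memory."""

    def __init__(self, vendor):
        self.vendor = vendor
        self.vlans = {}  # VLAN ID -> set of member ports

    def create_vlan(self, vlan_id):
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"invalid VLAN ID {vlan_id}")
        self.vlans.setdefault(vlan_id, set())

    def add_port_to_vlan(self, vlan_id, port):
        if vlan_id not in self.vlans:
            raise KeyError(f"VLAN {vlan_id} does not exist")
        self.vlans[vlan_id].add(port)


def provision_access_vlan(chip, vlan_id, ports):
    """Upper-layer logic written only against the abstract API, so any backend works."""
    chip.create_vlan(vlan_id)
    for port in ports:
        chip.add_port_to_vlan(vlan_id, port)
    return chip
```

Swapping `FakeChip("vendor-a")` for `FakeChip("vendor-b")`, or for a real driver, leaves `provision_access_vlan` unchanged, which is the point of putting a standardized interface between the operating system and the chipset.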

However, Dell makes it clear in the OS10 Open Edition guide that OS10 is currently officially supported only on some Dell switches. According to Dell, the fastest way to discover whether a Dell switch officially supports OS10 is to check the product page for the respective device on the Dell website.

Debian Base

Dell offers OS10 in the form of several modules that mesh together. The Open Edition includes the kernel of the Linux operating system: This basic module runs on the switch (Figure 3) and provides a standard, working Linux. Dell advertises OS10 as an unmodified Debian "jessie," so you will have access to Linux 3.16 on your switches.

Figure 3: OS10 is fundamentally modular: The basic package is freely available; extensions or CPS modules will be available separately (illustration from the Dell website [1]).

Logging in via SSH to a switch running OS10 takes you to a normal Linux shell, which is already notable: If you want to delve into the depths of the Cisco or Juniper CLIs, you have a steep learning curve ahead. A Linux-based switch with a normal shell can be managed by typical Linux admins because a complete and familiar environment is available.

Running ip a makes this clear: Thanks to the SAI abstraction, all the ports on the switch appear as configurable network interfaces. From here, you can work with them as on any other Linux system. Because OS10 is a Debian system, you can, for example, call apt-get to install new software.
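Because the ports are ordinary kernel interfaces, the information ip a prints is also available to scripts. The following minimal sketch lists interfaces with the Python standard library and reads each one's operational state from sysfs (/sys/class/net/<name>/operstate). The optional name prefix for selecting front-panel ports is an assumption; check how the ports are actually named on your switch.

```python
import os
import socket

SYSFS_NET = "/sys/class/net"


def port_states(prefix=""):
    """Map each interface whose name starts with prefix to its operstate (e.g., 'up')."""
    states = {}
    for _index, name in socket.if_nameindex():
        if not name.startswith(prefix):
            continue
        path = os.path.join(SYSFS_NET, name, "operstate")
        try:
            with open(path) as f:
                states[name] = f.read().strip()
        except OSError:
            states[name] = "unknown"  # no sysfs entry, e.g., on a non-Linux host
    return states
```

On a Linux host, `port_states()` returns something like `{"lo": "unknown", ...}`; on a switch you would pass the prefix of the front-panel port names.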

This opens up all the opportunities available on a normal Linux server, for example, in terms of monitoring: SNMP support can be set up with snmpd in the usual way, and traffic statistics per port (e.g., using RRD) are also easy to set up. The plain Linux on the switch really comes into its own, though, when it comes to typical networking functionality: With the Border Gateway Protocol (BGP), you can turn a Layer 2 switch into a Layer 3 router.
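Per-port traffic statistics are likewise plain kernel counters. This sketch samples rx_bytes and tx_bytes from /sys/class/net/<name>/statistics and converts the difference into bits per second, the kind of value you would then feed into RRD or another monitoring system. It assumes the switch ports expose the standard Linux counters; a counter that was reset or wrapped between samples is simply reported as zero.

```python
import os
import time

SYSFS_NET = "/sys/class/net"


def read_counter(ifname, counter, root=SYSFS_NET):
    """Read one kernel statistics counter (e.g., rx_bytes) for an interface."""
    with open(os.path.join(root, ifname, "statistics", counter)) as f:
        return int(f.read())


def rate_bps(before, after, interval):
    """Convert two byte-counter samples taken `interval` seconds apart into bits per second."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    delta = after - before
    if delta < 0:  # counter was reset or wrapped: discard this sample
        return 0.0
    return delta * 8 / interval


def sample_port(ifname, interval=1.0, root=SYSFS_NET):
    """Return (rx_bps, tx_bps) for one port, measured over roughly `interval` seconds."""
    rx0 = read_counter(ifname, "rx_bytes", root)
    tx0 = read_counter(ifname, "tx_bytes", root)
    start = time.monotonic()
    time.sleep(interval)
    rx1 = read_counter(ifname, "rx_bytes", root)
    tx1 = read_counter(ifname, "tx_bytes", root)
    elapsed = time.monotonic() - start  # use the measured time, not the requested one
    return rate_bps(rx0, rx1, elapsed), rate_bps(tx0, tx1, elapsed)
```

A call such as `sample_port("port1")`, where port1 is a hypothetical port name, returns the current receive and transmit rates of that port.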

Layer 3 Routing for Cloud Setups

Layer 3 routing is popular with cloud providers. The idea is that, instead of simple switching at Layer 2, the switch acts as a router and delivers packets at Layer 3. For this to happen, BGP daemons such as Quagga or Bird run on both the switch and all connected hosts; each host is connected via two NICs to two different switches and announces routes to its main IP over those links.
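What the host side of such a setup might announce can be sketched with a small Python function that renders a minimal Quagga bgpd.conf: one BGP session per uplink switch plus a network statement for the host's main IP. The AS numbers, addresses, and the one-ASN-per-switch layout in the test values are hypothetical, and Quagga only announces the prefix if it exists locally (e.g., as a /32 on the loopback interface).

```python
import ipaddress


def quagga_bgp_config(local_asn, router_id, host_prefix, uplinks):
    """Render a minimal Quagga bgpd.conf for a host announcing its main IP.

    uplinks: list of (neighbor_ip, remote_asn) pairs, one per NIC/switch.
    """
    ipaddress.ip_address(router_id)    # raises ValueError on a malformed router ID
    ipaddress.ip_network(host_prefix)  # raises ValueError on a malformed prefix
    lines = [
        f"router bgp {local_asn}",
        f" bgp router-id {router_id}",
        f" network {host_prefix}",
    ]
    for neighbor, remote_asn in uplinks:
        ipaddress.ip_address(neighbor)
        lines.append(f" neighbor {neighbor} remote-as {remote_asn}")
    return "\n".join(lines) + "\n"
```

Generating the file instead of editing it by hand keeps the configuration of hundreds of hosts consistent; the result would typically end up in /etc/quagga/bgpd.conf on a Debian-based system.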

Because this solution achieves high availability without bonding, it avoids the problems that bonding causes when specific offloading features (e.g., for VXLAN) are used. Additionally, because the switches act as routers, hosts on different networks can easily be linked to one another at the switch level.

This solution is very convenient in cases where, for example, clouds are distributed across multiple locations and rely on different local networks: Thanks to L3 routing, such constructs no longer need a central router (Figure 4). Moreover, this type of routing integrates much better with a typical leaf-spine network architecture, as commonly used to achieve scalability in clouds. If a router fails, the BGP setup reconfigures itself automatically so that defective paths are dropped and only working paths remain.
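The failover behavior can be illustrated with a toy model of equal-cost multipath (ECMP) routing in Python: Routes to a host's /32 are learned over each uplink, routes over a failed link are withdrawn, and each flow is pinned to one of the remaining next hops by hashing its addresses and ports. This is a simplified illustration under stated assumptions; real switches use their own hardware hash functions, and BGP notices failures through link-down events or session timeouts.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    link: str  # uplink the route was learned over


def usable_next_hops(routes, prefix, failed_links):
    """Return the sorted ECMP set for prefix, ignoring routes over failed links."""
    return sorted({r.next_hop for r in routes
                   if r.prefix == prefix and r.link not in failed_links})


def pick_next_hop(next_hops, flow):
    """Pin a flow (src, dst, proto, sport, dport) to one next hop via a stable hash."""
    if not next_hops:
        raise LookupError("no usable path")
    key = "|".join(map(str, flow)).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]
```

Hashing per flow rather than per packet keeps the packets of one TCP connection on the same path, which avoids reordering while still spreading load across both uplinks.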

Figure 4: A leaf-spine architecture can be achieved on OS10 [3] with on-board resources and, say, Quagga.

Although such a setup can also be created with the proprietary firmware of various manufacturers, you can usually expect non-trivial costs, and it is precisely these costs that quickly make the solution unattractive. OS10 handles this task much better: You can install Quagga or Bird, each with its own configuration files, and the routing part of the setup is then implemented at the switch level.

Easy Automation

Another big obstacle to modern workflows is the static configuration associated with classical network hardware. Hardware by established vendors typically offers just a command line, or perhaps a proprietary interface specified by the provider, for creating new configurations. This setup can be difficult to reconcile with today's typical DevOps workflows, which assume that any server might need to be reconfigured (or configured from scratch) at any time in an automated process. A standard Linux like OS10 makes this easy to achieve: Whether you use Puppet, Chef, or Ansible, switches running OS10 can easily be managed from within your automation solution of choice.
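The core idea behind tools such as Puppet or Ansible is idempotence: Render the desired configuration and touch the system only if it differs. A minimal Python sketch of that pattern for switch ports on a Debian-based system might render ifupdown stanzas and replace the target file atomically only when its content changes. Port names and addresses are hypothetical, the sketch assumes the installed ifupdown accepts CIDR notation on the address line, and a real tool would also reload the affected interfaces.

```python
import os
import tempfile


def render_interfaces(ports):
    """Render ifupdown stanzas; ports maps interface name -> address in CIDR notation."""
    stanzas = []
    for name in sorted(ports):  # stable order so identical input yields identical output
        stanzas.append(f"auto {name}\niface {name} inet static\n    address {ports[name]}\n")
    return "\n".join(stanzas)


def ensure_file(path, content):
    """Write content to path only if it differs; return True if the file was changed."""
    try:
        with open(path) as f:
            if f.read() == content:
                return False  # already in the desired state: do nothing
    except FileNotFoundError:
        pass
    # Write to a temp file in the same directory, then rename atomically.
    # Note: mkstemp creates the file with mode 0600; a real tool would set permissions.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.replace(tmp, path)
    return True
```

Running the same rendering twice reports a change only the first time, which is exactly what lets an automation run be repeated safely on every switch in the rack.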

Dell explicitly highlights this fact in the technical description of the OS10 base module [3]. There are even workflows in which a new switch, once mounted in the rack, is set up automatically without an admin having to log in or otherwise intervene manually. This feat is difficult to achieve with old-fashioned networking hardware. Here, Dell delivers to a T on its promise of a universal switch operating system that integrates with DevOps workflows.

Operable Cloud Software

Precisely because switches are servers with many network cards, it is possible to run cloud components directly on the switches instead of on separate servers. All common cloud approaches envisage network nodes, which provide VMs in the cloud with external connectivity and take care of VXLAN tunneling and packet separation. Thus far, it has been customary in clouds either to handle these tasks with VMs running on the hosts or to set up separate network nodes that perform no other task.
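Packet separation in VXLAN comes down to a 24-bit VXLAN Network Identifier (VNI) carried in an 8-byte header in front of the tenant's Ethernet frame, which is then sent over UDP (IANA port 4789). The Python sketch below builds and parses that header as defined in RFC 7348; it only illustrates what a network node, or a switch taking over that role, adds to every packet, not how OpenStack implements it.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN
FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni):
    """Build the 8-byte VXLAN header (RFC 7348) for a 24-bit VNI."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit into 24 bits")
    # Word 1: 8 flag bits + 24 reserved bits; word 2: 24-bit VNI + 8 reserved bits.
    return struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame, vni):
    """Prefix a tenant's Ethernet frame with its VXLAN header (this becomes the UDP payload)."""
    return vxlan_header(vni) + inner_frame


def parse_vxlan_header(payload):
    """Return the VNI from a UDP payload that starts with a VXLAN header."""
    if len(payload) < 8:
        raise ValueError("payload too short for a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8
```

Two tenants can use overlapping MAC and IP addresses as long as their frames carry different VNIs, which is why this small header is enough to keep their traffic apart.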

Theoretically, nothing prevents you right now from installing and operating the network components (e.g., from OpenStack) on a switch with the appropriate resources, RAM and CPU in particular. Setups of this kind have not caught on so far, not least because the SDN solutions popular in clouds, such as those in OpenStack, can hardly make meaningful use of the extra information available on the switches.

Thanks to solutions like Cumulus or OS10, it is only a matter of time until the SDN manufacturers identify generic switches as target platforms and integrate matching functionality into their solutions. In any case, OS10 is ready for a setup of this kind.

CPS for a Classic Approach

Dell sees OS10 not only as a switch operating system for companies that require maximum flexibility in managing their network hardware, but also as a universal system that offers a choice between flexibility and off-the-shelf solutions. CPS is part of this approach: It is a programming interface Dell uses to add modules retroactively and thus extend the switch's feature set. CPS is implemented on top of the OS10 Linux system and, in the background, also accesses the SAI layer directly.

At least a couple of small dents appear in Dell's brave new dynamic world at this point, because several important features of the switch hardware can be configured only via the CPS interface and not from the Linux command line. These features include port monitoring at the hardware level, quality of service, and access control lists. Dell has at least released detailed documentation for CPS and the matching API and peppered it with many examples. Additionally, bindings exist for the leading scripting language, Python. If you want to use one of the CPS-only features, you can do so via the CPS interface without paying Dell anything (Figure 5).

Figure 5: Python integration is available for CPS, meaning you can read the MAC table of the switch with a Python script, as shown here.
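Once the CPS bindings have returned the MAC table, the rest is ordinary Python. The sketch below groups learned addresses by port and VLAN; it deliberately leaves out the CPS calls themselves, and the entry keys mac, vlan, and port are hypothetical placeholders that you would first map from the attribute names documented in Dell's CPS API reference.

```python
from collections import defaultdict


def macs_per_port(entries):
    """Group learned MAC addresses by (port, VLAN).

    entries: iterable of dicts with the hypothetical keys 'mac', 'vlan', and 'port'.
    MACs are normalized to lowercase, colon-separated notation.
    """
    table = defaultdict(set)
    for entry in entries:
        mac = entry["mac"].lower().replace("-", ":")
        table[(entry["port"], entry["vlan"])].add(mac)
    return {key: sorted(macs) for key, macs in table.items()}
```

A summary like this is a typical starting point for spotting ports with an unexpected number of attached devices.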

This was hardly Dell's original intent: After all, CPS is aimed specifically at companies that want L2 or L3 functionality without having to worry about the actual implementation on the switch. CPS is also Dell's answer to the question of how third-party manufacturers can make their products fit for use on OS10 switches: by using CPS and handling system queries via the local API.

The first versions of OS10 fare poorly compared with competitors like Cumulus because they do not yet support all the features the competition offers. However, Dell will likely use CPS to catch up in the feature race.

Open Networking Benefits

Speaking of Cumulus: It is interesting to note that, with OS10, Dell is offering a direct competitor to the Cumulus operating system for switches without terminating its existing partnership with Cumulus. Even before OS10, Dell switches were available with an alternative operating system: Dell also ships its own switches with Cumulus as the operating system.

So far, Dell has marketed OS10 as part of its open networking contribution: The Open Networking Foundation (ONF) includes virtually all of the major network hardware manufacturers, including Dell. They collaborate under the ONF umbrella with the goal of standardizing SDN and increasing its adoption.

In a separate FAQ on OS10, Dell makes it clear that the system is to be understood as a complement to and extension of Dell's ONF strategy and that it does not affect the existing Cumulus partnership. Despite the marketing hype in early 2016, the number of switches that support OS10 is still low. If you want OS10, you cannot buy a Dell device with OS10 preinstalled; instead, you need to install OS10 on a compatible device yourself. Dell's OS10 offering thus lags behind its Cumulus offering. Until Dell eliminates these problems, it would be unwise for the company to ditch Cumulus.

Clearly, this strategy cannot be Dell's intent in the long term. For some time, the manufacturer will probably offer purchasers the choice between OS10 and Cumulus, at least until feature parity is achieved; then, things are likely to get exciting.

Conclusions

Dell is pointing the way with OS10 in terms of networking in a DevOps-dominated environment. The network dinosaurs – led by Juniper and Cisco – noticed all of this long ago. Juniper is currently attempting to establish Junos OS as an operating system for third-party hardware. Compared with Dell's efforts, this seems like a desperate attempt to put the cart before the horse.

At the end of the day, OS10 is not so much about the target platform on which the operating system runs. Its internal structure is much more important, and Dell's approach could hardly be more radical: Away from closed and proprietary solutions and toward an open platform, so you can set up whatever you need with standard tools.

This approach is so radical that it represents a mental hurdle for companies that have long been accustomed to classical networks dependent on a specific vendor. If you have focused for years on Juniper or proudly display your Cisco training certificates above your desk, you might find it difficult to understand that a switch is just a normal Linux server that can be configured like any other device.

It is all too easy to draw the false conclusion that this devalues specialist network knowledge. In fact, the opposite is the case: Network setups will become more complex, particularly in the context of SDN.