Lead Image © Anton Gvozdikov, 123RF.com

The Aurora Mesos Framework

Cloud Watcher

Apache Aurora is a service daemon built for the data center. By Udo Seidel

Apache's Mesos project is an important building block for a new generation of cloud applications. The goal of the Mesos project is to let the developer "program against the datacenter, like it's a single pool of resources. Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively" [1].

An important tool that has evolved out of the Mesos environment is Aurora, which recently graduated from the Apache Incubator and is now a full Apache project (Figure 1). According to the project website, "Aurora runs applications and services across a shared pool of machines, and is responsible for keeping them running, forever. When machines experience failure, Aurora intelligently reschedules those jobs onto healthy machines" [2]. In other words, Aurora is a little like an init tool for data centers and cloud-based virtual environments.

Figure 1: Aurora is a Mesos Framework; Mesos is in turn an Apache project.

The Aurora project has many fathers: In addition to its kinship with Apache and Mesos, Aurora was initially supported by Twitter, and Google was at least indirectly an inspiration for the project. The beginnings of Aurora date back to 2010. Bill Farner, a member of the research team at Twitter, launched a project to facilitate the operation of Twitter's infrastructure. The IT landscape of the short message service had grown considerably by that time. The operations team was faced with thousands of computers and hundreds of applications. Added to this was the constant rollout of new software versions.

Bill Farner had previously worked at Google and had some experience working with Google's Borg cluster manager [3]. In the early years, development took place only within Twitter and behind closed doors. However, more and more employees contributed to the development, and Aurora became increasingly important for the various Twitter services. Eventually, opening the project to the open source community was a natural step toward maintaining such a fast-growing software project. Aurora has been part of the Apache family since 2013.

Stone on Stone

One of Aurora's main functions is starting and monitoring services in an IT environment. If a server or an application fails, Aurora restarts it on another computer and checks to see if everything is working as expected. If you are familiar with the Mesos environment, you might be thinking that Aurora sounds similar to another Mesos-based service known as Marathon [4]. See the "Aurora vs. Marathon" box for a look at some of the differences.

Aurora vs. Marathon

At first glance, Aurora and the Mesos-based Marathon seem to serve the same purpose. Both start and monitor long-running applications. However, a few features distinguish the two: The installation and first tests are much faster and easier with Marathon; Aurora requires you to learn a whole new description language, whereas Marathon only requires some rudimentary knowledge of JSON.

Aurora, like Mesos, is a natural member of the Apache family. Marathon, on the other hand, is promoted by the company Mesosphere [5]. Marathon is more lightweight and therefore easier to use. Aurora has a more extensive feature set and some impressive powers, but it is therefore less flexible and more difficult to adapt to on-the-fly configurations. Another difference relates to the support of the very latest features and technologies. Aurora is somewhat more conservative about adding new features, which ensures stability but means that new features show up more slowly. The integration of Docker, for example, took place earlier in Marathon.

Aurora registers services and thus allows other programs to use them. Aurora relies on the Zookeeper [6] configuration server to assist with service registration. In the background, Aurora uses Mesos to start and monitor services.

In Mesos environments, the master receives tasks and passes them on to slaves for execution. Aurora acts as an abstraction layer for the distribution of tasks by Mesos. The interplay of the individual components is shown in Figure 2.

Figure 2: Aurora, Zookeeper, Mesos, and Thermos interacting.
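The restart behavior described at the start of this section can be pictured with a few lines of Python. This is only a toy model: the host names, the task labels, and the round-robin placement are assumptions made for illustration and say nothing about how Aurora or Mesos actually choose a new machine.

```python
# Toy model only: host names and round-robin placement are assumptions,
# not Aurora's actual scheduling logic.

def reschedule(assignments, failed_host, healthy_hosts):
    """Return a new task -> host mapping with tasks moved off failed_host."""
    if not healthy_hosts:
        raise ValueError("no healthy hosts left")
    result = {}
    moved = 0
    for task, host in assignments.items():
        if host == failed_host:
            # Spread the displaced tasks over the healthy hosts in turn.
            result[task] = healthy_hosts[moved % len(healthy_hosts)]
            moved += 1
        else:
            # Tasks on healthy hosts stay where they are.
            result[task] = host
    return result


assignments = {"web/0": "host-a", "web/1": "host-b", "web/2": "host-a"}
after_failure = reschedule(assignments, "host-a", ["host-b", "host-c"])
print(after_failure)
```

After the simulated failure of host-a, no task remains on that host, and the task that was already running on host-b is left untouched.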

Jobs, Tasks, and Processes

Aurora may use Mesos as a base, but it also copies its own components to the slaves in order to execute the tasks. These components are known collectively as Thermos and serve two purposes: On one hand, they act as executors that start the processes at the operating system level. On the other hand, they act as observers, a kind of registration office for the executors: They receive the status reports about all started processes and then submit them back to Mesos.

Jobs on the Aurora level are divided into tasks on the Mesos layer. A job usually corresponds to more than one task. The tasks are then divided on the Thermos level into processes, which the administrator can observe with commands such as ps or top. The description of an Aurora job contains instructions for the meta-scheduler itself, for Mesos, and for Thermos. In the end, the information about jobs, tasks, and processes defines the application that the meta-scheduler monitors.
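The three levels can be modeled in plain Python to make the nesting concrete. The classes below are stand-ins written for this illustration, not the Aurora DSL objects, and the instances field and the example command lines are assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Process:
    name: str
    cmdline: str


@dataclass
class Task:
    name: str
    processes: List[Process]


@dataclass
class Job:
    name: str
    task: Task
    instances: int = 1  # assumed field: how many copies of the task to run

    def expand(self) -> List[str]:
        """List one "job/instance/process" label per process that would run."""
        return [
            f"{self.name}/{i}/{p.name}"
            for i in range(self.instances)
            for p in self.task.processes
        ]


# Hypothetical task with two processes, run as two instances.
web = Task("web", [Process("server", "./run_server"),
                   Process("logger", "./ship_logs")])
job = Job("frontend", web, instances=2)
labels = job.expand()
print(labels)
```

One job with two instances of a two-process task yields four entries, which is roughly what an administrator would then look for with ps or top on the slaves.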

The developers use a Python-based Domain-Specific Language (DSL) for defining the configuration. The description file has the extension .aurora and has several components. A component is a job template that lets you preconfigure settings for jobs with similar properties.
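The template idea can be illustrated without the Aurora client. The following sketch uses a plain dataclass as a stand-in for a job description; the class, its fields, and the role name www-data are hypothetical and chosen only to show how shared settings are defined once and overridden per job.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JobSettings:
    """Stand-in for a job description; not an Aurora DSL class."""
    cluster: str
    role: str
    name: str = ""
    cpu: float = 0.1


# The template carries the settings shared by similar jobs.
template = JobSettings(cluster="test", role="www-data")

# Each concrete job copies the template and overrides only what differs.
frontend = replace(template, name="frontend")
batch = replace(template, name="batch", cpu=0.5)

print(frontend)
print(batch)
```

Because the template is frozen, deriving a job never changes the template itself, so every job starts from the same baseline.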

Listing 1 shows a typical "Hello World" example. A Process object in the Aurora DSL requires two pieces of information: a name and the command you want to execute. Here, it is a simple echo command. (The os module is imported only so that the job can read the current user name for its role.) A Task object bundles the related processes and can carry a name of its own, which Listing 1 omits. The resources required are also specified for Mesos itself. Based on this information, the system selects the appropriate slaves. In Listing 1, the requirements are 10 percent of a CPU core, 20MB of RAM, and the same amount of disk space.

Listing 1: Hello World Aurora Job

$ cat hello.world.aurora
import os

# One process: a name plus the command line to run
hello_world_process = Process(
  name = 'hello_world',
  cmdline = 'echo hello world')

# The task bundles the process with its resource requirements
hello_world_task = Task(
  resources = Resources(cpu = 0.1, ram = 20*MB, disk = 20*MB),
  processes = [hello_world_process])

# The job ties the task to a cluster and a role (the current user)
hello_world_job = Job(
  cluster = 'test',
  role = os.getenv('USER'),
  task = hello_world_task)

jobs = [hello_world_job]
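The resource figures in the Task are what the selection of slaves is based on. As a rough illustration, the sketch below filters a handful of hypothetical slaves by the same requirements; the slave names, their free resources, and the fits helper are assumptions, and the real matching is done by Mesos and Aurora, not by code like this.

```python
MB = 1024 * 1024  # stand-in for the MB constant used in Listing 1


def fits(required, offered):
    """True if the offer covers every required resource."""
    return all(offered.get(key, 0) >= amount
               for key, amount in required.items())


# Same requirements as the Task in Listing 1.
required = {"cpu": 0.1, "ram": 20 * MB, "disk": 20 * MB}

# Hypothetical slaves and their free resources.
slaves = {
    "slave-1": {"cpu": 0.05, "ram": 512 * MB, "disk": 1024 * MB},
    "slave-2": {"cpu": 2.0, "ram": 16 * MB, "disk": 1024 * MB},
    "slave-3": {"cpu": 1.0, "ram": 256 * MB, "disk": 100 * MB},
}

candidates = [name for name, free in slaves.items() if fits(required, free)]
print(candidates)
```

Only slave-3 qualifies here: slave-1 lacks CPU and slave-2 lacks RAM, so a single missing resource is enough to rule a machine out.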

The configuration also requires information about the assigned tasks, a role, and the run-time environment (the cluster entry in Listing 1). The role often corresponds to the user context under which the associated processes ultimately run. The run-time environment, in turn, defines which Aurora instances take care of the job. The example demonstrates only some of the options. Anyone wanting to learn more will find sufficient reading material in the project documentation [7]-[10].

If you dare to start working with Aurora, you have several options. It is a good idea to install the Vagrant development tool [11]. The project documentation and other instructions on the Internet use this tool for the installation. The manual route is much more difficult and initially requires the installation of a Mesos cluster [12] with the associated Zookeeper infrastructure.

Difficult Installation

The installation of the various Aurora components is documented in principle, but the documentation is very confusing. It is not easy to understand which steps should be completed in which order. The next task is to learn the description language. I recommend you start with the simple examples from the documentation and then expand them step by step.

The initial hurdles for successfully using Aurora are not exactly low. Administrators should therefore assess the costs and benefits very carefully for smaller IT environments. In large environments with many servers, many services, and constant software updates, evaluating the meta-scheduler is all the more worthwhile.