Management: Puppet versus Ansible
Lead image © Jenzig, photocase.com

Ansible as an alternative to the Puppet configuration tool

Big and Small

Automation is part of life in the data center, and Puppet is commonly regarded as the King of the Hill, but some users prefer the lean alternative Ansible.

By Martin Loschwitz

System administrators love harking back to the good old days when IT setups consisted of a manageable number of servers that you could easily maintain manually. On today's networks, admins and hosting providers sometimes look after thousands of hosts.

Manual work is no longer economically feasible: Tools for configuration management and automation are essential for many networks. Most admins are familiar with popular solutions such as Puppet [1] and Chef [2]. Although the Puppet configuration management tool has many supporters, some sysadmins regularly curse it. Many users believe Puppet is too complex, and some complain that the tool has moved too far away from its original ideals.

Many users who want configuration management but want to avoid the complications associated with Puppet have turned to Ansible [3]. The Ansible configuration tool promises automation with a learning curve that is much shorter than Puppet's – and without compromising quality.

Puppet

Even back in the old days, when every server was hand-reared, admins still occasionally needed to keep configuration files for individual services synchronized across multiple servers. They began to deploy tools such as Rsync or SCP to keep configuration files synchronized, but it quickly became apparent that these approaches were hacks.

When the first version of Puppet arrived in 2005, the promise of the developers seemed to be revolutionary: Puppet could distribute configuration files that were stored and maintained at a central location to all the servers on the network.

Puppet soon added features that allowed it to do much more than just roll out files: It could also run commands on the servers to perform tasks such as installing packages after the fact. Puppet development progressed in leaps and bounds: Its own declarative domain-specific language (DSL) gave Puppet the ability to create generic modules that handled individual tasks. Administrators were able to share prebuilt modules with the community, thus making a contribution to the world of open source system administration.

Puppet grew as a variety of important features made their way into the development – including the ability to retrieve the configuration from external sources; however, the many new features added new complications and made Puppet more difficult to use.

The complaint most frequently leveled at Puppet is that the tool has moved away from its original objectives and is trying to become a panacea for every problem in the data center. Whereas reducing complexity is the goal of automation, Puppet actually adds complexity. The proliferation of Puppet modules is a prime example of this problem: Users used to have precisely one module for each problem. In other words, if you were installing Rsyslog with Puppet, you would find precisely one solution on the web. Then developers began forking the modules, customizing them for their local environments, and publishing the forks. If you look for a Puppet module for Rsyslog today, you will have to invest some time in checking through several available alternatives.

The Puppet project itself has attempted to put an end to this confusion: Puppet Forge [4] is regarded as the authoritative source for Puppet modules, and it clearly tags the modules that are maintained by the Puppet team itself. That said, any other user can upload their modules to Puppet Forge. For some users, Puppet Forge creates more confusion than it prevents.

Users also have to face issues related to module maintenance: Many modules you will find on the web are fairly ancient and will not work – or won't work correctly – on more recent systems.

Powerful Puppet features such as External Node Classifiers, which return a node description in YAML format, promise convenience but can be difficult to understand and use. Puppet's own Hiera tool [5] has become the de facto standard for managing site-specific data outside of manifests. Hiera (Figure 1) supports multiple value types, such as strings, hashes, and arrays, and it even lets you combine these types; however, don't expect an easy learning curve.

Figure 1: Hiera often offers very little benefit to administrators – the values that the user would otherwise enter directly in the template just detour via YAML.
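
To make the value types concrete, a Hiera data file might look like the following minimal sketch. The file name common.yaml and every key and value in it are assumptions chosen for illustration, not settings from a real module; the class::parameter key style follows the convention Puppet uses for automatic parameter lookup.

```yaml
# common.yaml: hypothetical Hiera data file; all keys and values are illustrative
---
# A plain string value
rsyslog::server: "log.example.com"

# An array value
ntp::servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org

# A hash value that combines types: each entry is itself a hash
apache::vhosts:
  webserver.local:
    docroot: /var/www/htdocs
    port: 80
```

The sketch also illustrates the point of the caption: each value could just as well sit directly in a manifest or template, and Hiera only moves it into a separate YAML file.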

Another one of Puppet's problems is the lack of an option for coordinating the execution of individual commands against multiple hosts. Although Puppet is theoretically idempotent (i.e., each command can be run any number of times without causing problems), in practical terms, users face difficulties when they need to coordinate the process of running commands on multiple hosts. You'll need to consider many dependencies; for example, before you can meaningfully operate a web server, you first need to roll out a database to another host and install a matching database schema.

Rolling out a high-availability database, such as MySQL with Galera, is something you can easily achieve. But most administrators prefer to install the database schema manually. Otherwise you might end up with a broken database after two Puppet instances have finished battling it out in an attempt to make changes simultaneously.

The master/agent model used with Puppet also has some drawbacks for many users. (See the box titled "The Master/Agent Model as a Weak Spot.")

The Master/Agent Model as a Weak Spot

Any host that is managed by Puppet has at least one Puppet agent. The agent is responsible for changing the server configuration to the target state. Either you have a complete Puppet configuration on each host, in which case the agent simply accesses and implements the configuration locally, or you run Puppet with a master server, in which case the agents on the various hosts connect to the master server to pick up their configurations.

Both approaches have severe drawbacks: If you want to run a Puppet agent locally and without a master, you need to ensure that all of the relevant configuration files for Puppet exist on the host. Many Puppet installations first call Git when the agent launches on the host to ensure that the local configuration is up to date.

If you decide to use a master server, you are adding a bona fide resource hog to your setup. In large-scale Puppet setups, it is not uncommon to see many agents trying to talk to the Puppet master at the same time and therefore making agonizingly slow progress. This phenomenon quickly gives administrators the impression that they are doing something wrong – no matter what they do.

Enter Ansible Stage Left

Ansible seeks to distinguish itself from its competitors through reduced complexity, and the project seems to be successful with this strategy: Red Hat purchased the company behind Ansible in October 2015 for a reported $150 million.

Ansible understands itself as a tool that mainly combines three tasks: software distribution, ad hoc command execution, and configuration file management. This goal is not that different from Puppet's or Chef's mission statements, but even at first sight, Ansible reveals that it is different from both. What is very noticeable is the way in which Ansible goes about managing individual hosts. Ansible does not use a server-client principle like Puppet. Instead, Ansible relies on plain old SSH: You only need the Ansible configuration on the one host from which you run Ansible.

Ansible uses SSH to talk to the individual computers and run the configuration steps. In other words, Ansible does not use its own protocol, nor does it need its own agent software on the managed hosts (Figure 2). Administrators can even use an ordinary computer, such as their own laptop, as the main Ansible host.

Figure 2: Ping is a built-in Ansible module that checks whether Ansible can connect to all the hosts it is supposed to manage.
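
The same connectivity check can also be written as a small playbook. This is only a sketch; the file name ping.yaml and the inventory file name hosts are assumptions made for the example.

```yaml
# ping.yaml: hypothetical file name
---
- hosts: all
  gather_facts: no      # skip fact collection; only connectivity matters here
  tasks:
    - name: Check that Ansible can reach every host in the inventory
      ping:
```

Assuming an inventory file named hosts, you would run the check with ansible-playbook -i hosts ping.yaml. Every reachable host reports success, and unreachable hosts are listed as such in the run summary.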

Playbooks Instead of Manifests

Ansible's approach to configuring managed machines is different from Puppet's. Puppet administrators refer to manifests, and Ansible talks of playbooks. A playbook is a kind of contingency manual telling the admin what to do in certain situations.

The structure of an Ansible playbook is very similar to that of a classical contingency entry. Listing 1 shows a working playbook that installs a web server.

Listing 1: main.yaml: Installing a Web Server

---
# Install the Apache package (Debian/Ubuntu package name)
- name: Install Apache
  apt: pkg=apache2-mpm-prefork state=present

# Remove the default vhost; notify the handler if this changes anything
- name: Delete default Apache vhost
  file: path=/etc/apache2/sites-enabled/000-default state=absent
  notify: Restart Apache

# Render one vhost file per entry in the vhosts list
- name: Generate Vhosts
  template:
    src: vhost.conf
    dest: "/etc/apache2/sites-enabled/{{ item.hostname }}.conf"
  with_items: "{{ vhosts }}"
  notify: Restart Apache

The entry that follows name describes in plain English what the task does. Below the name comes the module that does the work (apt, file, or template in Listing 1) along with its arguments, such as the package to install or the file to change. Playbooks are thus lists of individual tasks that need to run in a specific order (Figure 3).
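
The notify lines in Listing 1 refer to a handler named Restart Apache, which the listing itself does not define. A minimal sketch of such a handler follows; the file location handlers/main.yaml inside the role and the service name apache2 are assumptions based on the usual role layout and the Debian package used in Listing 1.

```yaml
# handlers/main.yaml: assumed location inside the webserver role
---
- name: Restart Apache      # must match the name used in the notify lines
  service:
    name: apache2           # assumed service name on Debian/Ubuntu
    state: restarted
```

Ansible runs a handler only if at least one task that notifies it reports a change, and then only once at the end of the play, so Apache restarts a single time even when several vhost files change.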

Figure 3: Like Puppet, Ansible has a big community writing playbooks, such as this OpenStack playbook.

Left to its own devices, a playbook is pretty useless: after all, it is not associated with a host. To link a playbook with a host, Ansible uses a role model that is similar to Puppet's model. For each host, Ansible wants to know what roles to install. Listing 2 contains a role definition that assigns the backend host a webserver role.

Listing 2: provision.yaml for the webserver Role

---
- hosts: backend
  # become replaces the sudo keyword used by older Ansible releases
  become: yes
  roles:
    - role: webserver
      vhosts:
        - { hostname: 'webserver.local', dir: '/var/www/htdocs' }

Ansible designates the hosts that it manages with an inventory file that simply lists the hosts and the IP addresses on which they are addressable – and that's it. Listing 3 shows a complete inventory definition that defines the backend group referenced in Listing 2.

Listing 3: Ansible Inventory

[backend]
192.168.0.1 # Backend-Server

Ansible is also an orchestration solution that is useful for customers who operate virtual machines in cloud environments. In the Ansible manual [6], administrators will find a script for practically any cloud environment that generates an inventory file. Precisely because Ansible regularly works its way through all the computers using SSH, it is easy to replace the inventory file on the fly.

One of the biggest horror scenarios in dealings with Puppet is how to express reliable dependencies between individual tasks. Although the metaparameters require, before, notify, and subscribe exist, if you use all of them, you run the risk of losing track of where you are. And Puppet offers no straightforward way to define dependencies across multiple hosts. This is not true of Ansible: based on the simple role model with matching playbooks, administrators can run arbitrary tasks in a deterministic order – even specifically on individual hosts.

The execution order is authoritative: Ansible always runs the steps of a playbook in the order in which the playbook lists them. In other words, if you stipulate for a playbook that the database role is restricted to the hosts db1 and db2, and the webserver role to the host backend, Ansible will process the steps precisely in this way. Only when the database role has been applied to the two target hosts will Ansible even start to process the backend server. For administrators who regularly spend time in Puppet's dependency hell, this fact greatly facilitates the migration to Ansible.
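
The ordering described above could be sketched as a playbook with two plays. The file name site.yaml and the database role are hypothetical, and the sketch assumes that db1, db2, and backend all appear in the inventory.

```yaml
# site.yaml: hypothetical file name
---
# Play 1: runs to completion on db1 and db2 first
- hosts: db1:db2
  become: yes               # older Ansible releases used sudo: yes instead
  roles:
    - role: database        # hypothetical role, named after the text

# Play 2: starts only after play 1 has finished on both database hosts
- hosts: backend
  become: yes
  roles:
    - role: webserver
      vhosts:
        - { hostname: 'webserver.local', dir: '/var/www/htdocs' }
```

Because plays run strictly from top to bottom, the cross-host dependency is expressed simply by the position of the plays in the file; no dependency keywords are needed.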

Extensions by Module

Admins can use Ansible modules to run their own specific commands on the hosts (Figure 4). Modules are small programs that Ansible copies to the target hosts in the scope of an Ansible run. When the modules arrive on the host, they are executed and then finally deleted. Unlike Puppet, which favors the Ruby language, Ansible lets you write modules in any language – the only condition is that the modules return their output to Ansible in JavaScript Object Notation (JSON).

Figure 4: Modules let administrators run arbitrary commands on Ansible target computers.
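
As a rough illustration of running an arbitrary command through a module, the following sketch uses the command and debug modules that ship with Ansible. The choice of uptime as the command, the variable name uptime_result, and the backend host group are assumptions made for the example.

```yaml
---
- hosts: backend
  tasks:
    - name: Run an arbitrary command on the target host
      command: uptime
      register: uptime_result   # store the module's result in a variable
      changed_when: false       # reading the uptime never changes the host

    - name: Show what the command printed
      debug:
        var: uptime_result.stdout
```

The registered variable holds the structured result that the module returned, which is why later tasks can pick out individual fields such as stdout.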

Modules let you manage almost every aspect of the Ansible configuration. For instance, if you are not satisfied with SSH execution, you can move to a different transport module so that Ansible can run its commands on the target hosts.
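
One transport that is easy to try out is the local connection, which skips SSH altogether. The sketch below assumes you want to run a task on the machine that Ansible itself runs on; it is not meant as a recommendation for managing remote hosts.

```yaml
---
- hosts: localhost
  connection: local     # execute modules directly on this machine, not over SSH
  gather_facts: no
  tasks:
    - name: Confirm that modules execute through the local connection
      ping:
```

The connection keyword can be set per play, so a single playbook can mix plays that use SSH with plays that use another transport.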

Prebuilt Playbooks Abound

Puppet and Ansible both support large communities that take care of authoring manifests and playbooks for virtually any task. Like Puppet, Ansible offers a complete solution for deploying OpenStack in a meaningful way [7]. US hosting service provider Rackspace maintains the code required for this OpenStack support. Although there is no site named Ansible Forge, the Ansible Galaxy hub collects community roles, and searching on Google with the required keywords will typically turn up lots of options for third-party Ansible modules and add-ons.

In addition to its strong Linux support, Ansible runs on BSD, and Ansible can also handle Windows PowerShell commands.

Conclusions

Ansible impresses in a direct comparison with Puppet for routine configuration tasks. First and foremost, you have to consider the tool's performance. A Puppet agent in a typical setup with a Puppet master will first download the entire configuration from the master and store this configuration in the local cache – and this takes time.

Then a number of seconds are required for the Puppet master to compile a complete catalog for a host when called on to do so. It is only then that the catalog is applied – and again, this can take a number of minutes, especially in the case of larger catalogs. Ansible is far quicker at processing the outstanding tasks once it is up and running.

Ansible playbooks are clear and structured – you can understand them even if you have had very little experience with Ansible. The fact that Ansible does not even attempt to become a configuration converter like Puppet and Hiera is something for which the developers deserve much praise. Templates are authoritative in a typical Ansible playbook.

Despite all this euphoria, note that Ansible has not been around for nearly as long as Puppet. If you are used to working with Hiera in Puppet and enjoy the experience, you will probably find the structure of the Ansible playbook unnecessarily redundant.

Will Ansible retain its refreshing simplicity as it continues to evolve and gain new features? Only time will tell, but for now, if you want a lean alternative to Puppet, you will probably find what you are looking for with Ansible.