
Automation with Ansible

IT as Planned

Ansible offers automatic provisioning and configuration capabilities similar to those of Chef and Puppet, but it's aimed more at admins than developers. By Daniel Schneller

The objective of DevOps, the combination of software development and IT operations, is to allow close cooperation between software developers and operations staff so they can learn from one another. One central aspect is automatic provisioning of the infrastructure and general software configuration distribution. Well-known products that achieve this goal and have been around for some time include Chef and Puppet. Ansible is a relatively new contender in this field; it offers similar functions but tends to target software developers less and experienced administrators more.

The imperative approach that Ansible uses to formulate the tasks to be automated is, to my mind, more intuitive and involves fewer surprises in complex setups than the declarative approach of Puppet, for example. In a nutshell, Ansible focuses on "how," whereas Puppet concentrates on "what": You describe the desired target state of the system and rely on agents to create this state, with the current system state acting as the starting point. If undesirable effects or problems occur, however, this greater degree of abstraction can make debugging complex and the behavior difficult to understand.

The minimum requirement for Ansible is simply an installed operating system (Ubuntu Linux in my case) that provides SSH access for an administrative user. Because an operating system and SSH access are required, Ansible is explicitly not suitable as a bare metal provisioning tool. In my case, all the newly created machines boot from a lean base image that meets these requirements. As soon as logging in via SSH is possible, Ansible can take the helm.

At run time, all the required components are transferred to the target system over SSH, executed there, and removed on completion. Don't worry, though: This process is not as slow as it might initially seem. Dedicated agents increase the complexity of the overall system and need to be maintained and updated after installation; this overhead does not exist in Ansible, where updates are limited to the local installation.

Flexible Inventorying

To be able to provision machines, Ansible needs to know how to reach them. This information is stored in inventories. An inventory is a text file that uses the INI format and collects DNS hostnames, IP addresses, and optionally variable values, as shown in Listing 1.

Listing 1: Sample Inventory

[controllers]
control01.baremetal
control02.baremetal
[nodes]
node01.baremetal machine_type=Dell_R510
node02.baremetal machine_type=Dell_R720
[baremetals:children]
controllers
nodes

In this example, two host groups named controllers and nodes are defined, each with two members. All of the elements in these two groups are in turn collected in the other group, baremetals, thanks to the children keyword. A group of machines like this offers a simple and extremely flexible way of organizing the infrastructure.
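To make the effect of the children keyword concrete, the following Python sketch parses the inventory from Listing 1 and expands the nested group. It is an illustration only, not Ansible's own parser; parse_inventory and expand are hypothetical names, and host variables are simply ignored.

```python
# Sketch: resolve nested inventory groups (not Ansible's real parser).
INVENTORY = """\
[controllers]
control01.baremetal
control02.baremetal
[nodes]
node01.baremetal machine_type=Dell_R510
node02.baremetal machine_type=Dell_R720
[baremetals:children]
controllers
nodes
"""


def parse_inventory(text):
    """Return (hosts_by_group, children_by_group) for INI-style inventory text."""
    hosts, children = {}, {}
    group, kind = None, "hosts"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            # "[name]" holds hosts, "[name:children]" holds group names
            group, _, suffix = line[1:-1].partition(":")
            kind = suffix or "hosts"
            hosts.setdefault(group, [])
            children.setdefault(group, [])
        elif group is None:
            continue  # ungrouped hosts are ignored in this sketch
        elif kind == "children":
            children[group].append(line)
        elif kind == "hosts":
            # first token is the hostname; key=value variables are dropped
            hosts[group].append(line.split()[0])
        # any other section kind is skipped
    return hosts, children


def expand(group, hosts, children):
    """All hosts of a group, including hosts inherited from child groups."""
    result = list(hosts.get(group, []))
    for child in children.get(group, []):
        for host in expand(child, hosts, children):
            if host not in result:
                result.append(host)
    return result


if __name__ == "__main__":
    hosts, children = parse_inventory(INVENTORY)
    print(expand("baremetals", hosts, children))
```

Expanding baremetals yields the two controllers followed by the two nodes, which is exactly the set of machines a play with hosts: baremetals addresses.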

However, as the infrastructure becomes increasingly dynamic and machines – virtual or physical – are added or removed, this approach reaches its limits. At some point, it becomes impossible to manage all the machines manually in the static text file. For situations like this, Ansible offers the option of running programs to generate inventories, both to supplement static entries and as the sole data source. Prebuilt modules exist for a variety of external products that can contain reusable information about the system landscape (e.g., Amazon EC2 or Zabbix). It is easy to add more integration features; after all, this is a simple question of generating the JSON data structure that Ansible expects.

For example, I wrote my own Python script to query information relating to virtual machines in OpenStack Nova from the machines' metafiles and output this information in a format suitable for Ansible. This capability means that you can create or delete machines as needed without having to change your Ansible inventories. Details of the available default modules and a link to the developer documentation are provided on the Ansible website [1].
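The following is not the OpenStack script mentioned above, but a minimal sketch of what such an inventory program might look like. It assumes the convention that Ansible calls the program with --list and reads a JSON object mapping group names to hosts from standard output, with per-host variables under _meta. The machine list, the group names, and the flavor variable are hypothetical placeholders for data that a real script would query from an external system.

```python
#!/usr/bin/env python3
# Sketch of a dynamic inventory program (hypothetical data, no real API calls).
import json
import sys

# Stand-in for machines discovered in an external system
FAKE_MACHINES = [
    {"name": "vm01.example", "group": "webservers", "flavor": "m1.small"},
    {"name": "vm02.example", "group": "webservers", "flavor": "m1.small"},
    {"name": "vm03.example", "group": "databases", "flavor": "m1.large"},
]


def build_inventory(machines):
    """Group hosts by name and collect per-host variables under _meta."""
    inventory = {"_meta": {"hostvars": {}}}
    for machine in machines:
        group = inventory.setdefault(machine["group"], {"hosts": []})
        group["hosts"].append(machine["name"])
        inventory["_meta"]["hostvars"][machine["name"]] = {
            "flavor": machine["flavor"]
        }
    return inventory


def main(argv):
    """Return the JSON text for the requested mode."""
    if "--list" in argv:
        return json.dumps(build_inventory(FAKE_MACHINES), indent=2)
    # "--host <name>" asks for one host's variables; they are already
    # delivered via _meta, so an empty object is enough here.
    return json.dumps({})


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

Saved as an executable file, a script like this could be passed to the -i option in place of a static inventory file.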

Controlling Ansible

The following examples assume that the computers you use are addressable by their DNS names and that a local user by the name of local-admin exists and can connect using SSH and a key file without a password. Being able to log in without a password is not a prerequisite for the use of Ansible, but it does make your daily work much easier.

After inventorying and grouping the target hardware, you can now move on to the actions to be performed against it. Ansible formulates these actions in Playbooks – text files in an easily understandable, structured YAML format.

My first example will automate the installation of an enterprise-wide root CA certificate to be able to validate all kinds of TLS certificates, software packages, and so on against it. In other words, this first playbook needs to ensure that the required files are transferred to each host and installed there in a suitable way. The following listing shows the playbook site.yml, which is normally the master playbook and uses include to integrate other playbooks. To keep things simple, I am not using includes in this example:

---
- hosts: baremetals
  roles:
    - base

The hosts line points to the host group defined in the inventory. In practical terms, this means that all controllers and nodes execute the tasks that follow. The interesting thing here is that, thus far, neither certificates nor any commands have been mentioned. The reason is that normally it makes sense to organize tasks in smaller, reusable units, which Ansible refers to as roles.

In my example, this means that all members of the baremetals group are currently assigned a single role, triggered by the roles keyword. This role goes by the name of base; it creates the basic preconditions for many other steps later on and looks like Listing 2.

Listing 2: Base Role

---
- name: Install CenterDevice Root CA Certificates
  sudo: true
  copy: src=usr/local/share/ca-certificates/{{ item }}
        dest=/usr/local/share/ca-certificates/{{ item }}
  with_items:
    - centerdevice-intermediate-ca.crt
    - centerdevice-root-ca.crt
- name: Update root certificate database
  sudo: true
  command: update-ca-certificates

After a brief learning curve, the YAML structure is easy to read. Two tasks are performed during provisioning. Each has a name that designates it in the logs and can be used to point explicitly to the task. Names are not mandatory, but intuitive names do make playbooks easier to understand and maintain.

The first task installs two CA certificate files in the target directory, /usr/local/share/ca-certificates/. Because the target path is not writable for standard users, sudo: true ensures that the remote command is run with root privileges. The copy: line transfers the local file to the remote computer. To make the whole thing more interesting, I am using a variable item here and processing a set of files in a loop. The variable is populated with each value from the with_items: list in the next line. This is all I need to do: Ansible checks whether the target path exists and copies the two files. If one of the files already exists (and has the same content), the copy command is ignored. Note that variables can also have more complex structures and are not restricted to strings only.

Most Ansible tasks come with a number of additional parameters that let you modify their behavior. For example, you can define the owner and access privileges for the file while copying. After the copying action, the new certificates are processed as the next task. Ubuntu needs to call update-ca-certificates on the remote computer to do this. Again sudo: true ensures that the required privileges are in place.

Using a Sample Playbook

With the inventory, the first playbook, and the role, you now have all the preconditions for watching Ansible work. Ansible consistently uses convention over configuration, so you need to store the previously mentioned files in a directory tree on the local workstation as shown in Listing 3.

Listing 3: Directory Tree

ansible-demo-scripts
|- inventories
|   \- hosts.baremetal
|- roles
|   \- base
|     |- files
|     |  \- usr
|     |    \- local
|     |      \- share
|     |        \- ca-certificates
|     |             |- centerdevice-intermediate-ca.crt
|     |             \- centerdevice-root-ca.crt
|     \- tasks
|         \- main.yml
\- site.yml

You will find the base role name in the directory tree below the roles folder, and the source files to be copied for this role below the files folder. The actual tasks for the role are stored in main.yml below tasks. For more details of the directory tree conventions and recommendations relating to them, check out the excellent Ansible documentation [2].

Listing 4 shows how I execute ansible-playbook in the ansible-demo-scripts directory. The --ask-sudo-pass parameter is only required if the local-admin user is not authorized to run sudo without a password. Additionally, if ~/.ssh/config already contains the name of the remote user, the -u parameter can be omitted. The -i option (or the longer form --inventory-file) defines the names of the desired inventories; I have just one inventory here. The final argument designates the playbook (site.yml).

Listing 4: Executing ansible-playbook

ansible-demo-scripts$ ansible-playbook -u local-admin --ask-sudo-pass -i inventories/hosts.baremetal site.yml
sudo password: ******
PLAY [baremetals] ***
GATHERING FACTS ***
ok: [control02.baremetal]
ok: [node02.baremetal]
ok: [control01.baremetal]
ok: [node01.baremetal]
TASK: [base | Install CenterDevice Root CA Certificates] ***
changed: [control01.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [control02.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [node01.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [node02.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [node02.baremetal] => (item=centerdevice-root-ca.crt)
changed: [control01.baremetal] => (item=centerdevice-root-ca.crt)
changed: [node01.baremetal] => (item=centerdevice-root-ca.crt)
changed: [control02.baremetal] => (item=centerdevice-root-ca.crt)
TASK: [base | Update root certificate database] ***
changed: [control02.baremetal]
changed: [node01.baremetal]
changed: [control01.baremetal]
changed: [node02.baremetal]
PLAY RECAP ***
control01.baremetal : ok=3 changed=2 unreachable=0 failed=0
control02.baremetal : ok=3 changed=2 unreachable=0 failed=0
node01.baremetal : ok=3 changed=2 unreachable=0 failed=0
node02.baremetal : ok=3 changed=2 unreachable=0 failed=0

The output PLAY [baremetals] shows the host group that was addressed at the outset as defined in site.yml. In more complex playbooks, you can use different groups, of course. This is followed by the GATHERING FACTS section. At the start of each run, Ansible collects a set of data for all computers with which it connects. This includes the host name, IP addresses, names of network interfaces, time zone, hardware information, and many other facts. The documentation contains a comprehensive list of the available facts [3]. Beyond this, you can add more facts through your own extensions or in playbooks. The information stored in this step is available downstream in the playbook execution flow for more complex behaviors.

The nonsequential order of the ok: ... lines during fact collection is due to Ansible opening connections to multiple hosts at the same time (five by default) to speed up execution. Depending on response times, the order can change from run to run. However, you can rely on all hosts completing a task before Ansible moves on to the next task in the playbook.
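This scheduling behavior can be illustrated with a small Python simulation. It is only a sketch, not Ansible's implementation: run_task and run_play are hypothetical names, the random sleep stands in for network latency, and the pool size of five mirrors the default mentioned above.

```python
# Sketch: limited parallelism per task, with all hosts finishing
# one task before the next task starts.
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

FORKS = 5  # number of hosts handled at the same time


def run_task(task, host):
    """Pretend to run a task on a host; the sleep simulates latency."""
    time.sleep(random.uniform(0, 0.02))
    return host, "ok"


def run_play(tasks, hosts, forks=FORKS):
    """Run each task on all hosts in parallel, one task after another."""
    log = []
    with ThreadPoolExecutor(max_workers=forks) as pool:
        for task in tasks:
            futures = [pool.submit(run_task, task, host) for host in hosts]
            # Results arrive in completion order, which varies per run.
            # The loop only ends once every host has finished this task.
            for future in as_completed(futures):
                host, status = future.result()
                log.append((task, host, status))
    return log


if __name__ == "__main__":
    hosts = ["control01.baremetal", "control02.baremetal",
             "node01.baremetal", "node02.baremetal"]
    for task, host, status in run_play(["GATHERING FACTS", "copy"], hosts):
        print(f"{task}: {status}: [{host}]")
```

The host order within a task differs between runs, but no line for the second task ever appears before the last line of the first.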

After acquiring the facts, the tasks that are prescribed by the base role are executed. Again, the order of feedback depends on the performance or the network connection to the remote computer. Because two files are being copied to four computers, there are a total of eight lines in the logfile. The changed: prefix means that all the computers have received the specified file and that it did not previously exist – or contained something different.

After this, Ansible proceeds to the next task and runs the command that updates the certificate database on each host. Because Ansible cannot know what effect this external command has on the remote system, it tags the task as changed:. The criteria for detecting a change, whether through the command's exit code or its output, can be modified as needed (with the changed_when keyword).

At the end of the run, the PLAY RECAP section contains an overview of the program execution. In this case, the counters are identical for all the hosts because none of the tasks failed, all of the computers were reached, and they all performed the same task.

Sample Playbook with Different Original States

Because automatic provisioning has the task of achieving a defined state on a computer but does not necessarily start with the same baseline, I changed one of the certificate files on node02 manually before calling ansible-playbook again (Listing 5).

Listing 5: Modified ansible-playbook Run

ansible-demo-scripts$ ansible-playbook -i inventories/hosts.baremetal site.yml --ask-sudo-pass
sudo password: *******
PLAY [baremetals] ***
GATHERING FACTS ***
ok: [node01.baremetal]
ok: [control01.baremetal]
ok: [control02.baremetal]
ok: [node02.baremetal]
TASK: [base | Install CenterDevice Root CA Certificates] ***
ok: [control02.baremetal] => (item=centerdevice-intermediate-ca.crt)
ok: [control01.baremetal] => (item=centerdevice-intermediate-ca.crt)
ok: [node01.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [node02.baremetal] => (item=centerdevice-intermediate-ca.crt)
ok: [control02.baremetal] => (item=centerdevice-root-ca.crt)
ok: [control01.baremetal] => (item=centerdevice-root-ca.crt)
ok: [node01.baremetal] => (item=centerdevice-root-ca.crt)
ok: [node02.baremetal] => (item=centerdevice-root-ca.crt)
TASK: [base | Update root certificate database] ***
changed: [control02.baremetal]
changed: [control01.baremetal]
changed: [node01.baremetal]
changed: [node02.baremetal]
PLAY RECAP ***
control01.baremetal : ok=3 changed=1 unreachable=0 failed=0
control02.baremetal : ok=3 changed=1 unreachable=0 failed=0
node01.baremetal : ok=3 changed=1 unreachable=0 failed=0
node02.baremetal : ok=3 changed=2 unreachable=0 failed=0

The log confirms that Ansible discovered the modified file and replaced it with the original copy, whereas the other files already had the expected content.
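The ok and changed results in this run come down to comparing content before copying. A minimal Python sketch of the idea follows; it is not how the copy module is implemented, and copy_if_changed as well as the choice of SHA-256 are assumptions made for illustration.

```python
# Sketch: copy a file only if the destination differs from the source.
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def copy_if_changed(src, dest):
    """Copy src to dest unless dest already has identical content.

    Returns "ok" if nothing had to be done, "changed" otherwise.
    """
    src, dest = Path(src), Path(dest)
    if dest.is_file() and file_digest(src) == file_digest(dest):
        return "ok"
    shutil.copyfile(src, dest)
    return "changed"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "root-ca.crt")
        dest = Path(tmp, "installed-root-ca.crt")
        src.write_text("certificate data")
        print(copy_if_changed(src, dest))  # first run: file is missing
        print(copy_if_changed(src, dest))  # second run: content matches
        dest.write_text("manually modified")
        print(copy_if_changed(src, dest))  # third run: content differs
```

Running the function repeatedly is harmless: Only a missing or modified target file triggers a copy, which is the property that makes repeated playbook runs safe.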

Using Variables and Templates

Thus far, I have copied static files to a remote computer and remotely triggered command execution. Now I'll show a more complex case in which all nodes synchronize their internal clocks with a dedicated time server. To allow this to happen, you need to install the matching package, set up the configuration, and make sure the daemon launches automatically at boot time.

Assuming you run your own time server, its address could be hardcoded in the NTP configuration file. To improve maintainability and centralize the configuration, you will instead extract its address, store the address in a variable, and ensure that Ansible dynamically adds it to the service configuration.

In Ansible, variables can have different scopes. They can be valid for a single host, a group of hosts, or an entire site. Because you want to synchronize all of the "baremetal" computers with the NTP servers in the example, the scope of the defined variable is the group. To do this, you need to create a baremetals.yml file in the new ansible-demo-scripts/group_vars directory:

---
# Variables for all baremetal hosts
NTP_SERVERS:
      - 192.168.0.150
      - 192.168.0.151
      - 192.168.0.152
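How the different scopes interact can be sketched in a few lines of Python, assuming the usual rule that the narrower scope wins (host over group over site). The function resolve_vars, the site-wide default, and the per-host override are hypothetical; only the group values come from the file above.

```python
# Sketch: merge variables from site, group, and host scope.
GROUPS = {"baremetals": ["control01.baremetal", "node01.baremetal"]}

SITE_VARS = {"NTP_SERVERS": ["ntp.example"]}  # hypothetical site default
GROUP_VARS = {
    "baremetals": {
        "NTP_SERVERS": ["192.168.0.150", "192.168.0.151", "192.168.0.152"]
    }
}
HOST_VARS = {  # hypothetical override for a single machine
    "node01.baremetal": {"NTP_SERVERS": ["192.168.0.150"]}
}


def resolve_vars(host, groups, site_vars, group_vars, host_vars):
    """Return the variables visible to a host; narrower scopes override."""
    merged = dict(site_vars)
    for group, members in groups.items():
        if host in members:
            merged.update(group_vars.get(group, {}))
    merged.update(host_vars.get(host, {}))
    return merged


if __name__ == "__main__":
    for host in ("control01.baremetal", "node01.baremetal"):
        result = resolve_vars(host, GROUPS, SITE_VARS, GROUP_VARS, HOST_VARS)
        print(host, result["NTP_SERVERS"])
```

In this sketch, control01 receives the three group-level servers, whereas node01 sees only its own single entry.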

You can then add the steps shown in Listing 6 to the tasks of the base role (roles/base/tasks/main.yml).

Listing 6: Additions to the base Role

- name: Install NTP daemon
  sudo: true
  apt: pkg=ntp state=present
- name: Ensure NTP daemon autostart
  sudo: true
  service: name=ntp enabled=yes
- name: Setup NTP daemon config
  sudo: true
  template: src=etc/ntp.conf.j2
            dest=/etc/ntp.conf
  notify: Restart NTP daemon
The first task ensures that the NTP package is installed using apt if it does not already exist. The next task automatically starts the service when the system boots if the installation package doesn't handle this task automatically or if autostart was disabled for other reasons in the meantime.

Things start to get more exciting in the final task: In a similar style to copy, template transfers a local file to the remote system. However, instead of just copying it one-to-one, the task evaluates the file as a Jinja2 template [4]. This gives you the option of dynamically composing the content based on your own variables and Ansible facts.

In line with the convention, Ansible searches in ansible-demo-scripts/roles/base/templates for templates for the base role. Here I am creating an etc/ntp.conf.j2 file, which is a copy of the regular ntp.conf but contains the modification shown in Listing 7.

Listing 7: Modifications to ntp.conf

...
#
# removed all server lines referring to default ubuntu time servers here.
# these are our own servers:
#
{% for item in NTP_SERVERS %}
server {{ item }}
{% endfor %}
...

This fragment iterates through all the values listed previously in the group variable file in NTP_SERVERS, thus effectively generating three new lines in the final configuration.

The last line (Listing 6) in the template task, notify: Restart NTP daemon, tells Ansible to perform a special action (a handler) if the content of the target file has changed compared with the previous version. This makes sense in that it restarts the service only if the configuration really has changed. This means that you can execute the playbook multiple times without causing unnecessary service interruptions.

This handler is a special task and the last element you need to define. It comes as little surprise that it is stored in ansible-demo-scripts/roles/base/handlers/main.yml:

---
- name: Restart NTP daemon
  sudo: true
  service: name=ntp state=restarted

Following these changes, the playbook can now be executed again (Listing 8).

Listing 8: Executing ansible-playbook Again

ansible-demo-scripts$ ansible-playbook -i inventories/hosts.baremetal site.yml --ask-sudo-pass
sudo password: *******
PLAY [baremetals] ***
GATHERING FACTS ***
...
TASK: [base | Install ntp daemon] ***
changed: [control01.baremetal]
changed: [control02.baremetal]
changed: [node02.baremetal]
changed: [node01.baremetal]
TASK: [base | Setup ntp daemon] ***
changed: [control02.baremetal]
changed: [node02.baremetal]
changed: [control01.baremetal]
changed: [node01.baremetal]
NOTIFIED: [base | Restart ntp daemon] ***
changed: [control02.baremetal]
changed: [node01.baremetal]
changed: [node02.baremetal]
changed: [control01.baremetal]
PLAY RECAP ***
...

The logfile shows that the package was installed, the configurations were set up, and, finally, the service was restarted. A quick glance at the node configuration shows that three lines with concrete IP addresses really have been created:

ansible-demo-scripts$ ssh node01.baremetal \
  grep "server\ 192" /etc/ntp.conf
server 192.168.0.150
server 192.168.0.151
server 192.168.0.152

Now the playbook can be run again to demonstrate that the daemon is not restarted unless the file content has changed (Listing 9).

Listing 9: Another Run of the Playbook

ansible-demo-scripts$ ansible-playbook -i inventories/hosts.baremetal site.yml --ask-sudo-pass
sudo password: *******
PLAY [baremetals] ***
GATHERING FACTS ***
TASK: [base | Ensure ntp daemon autostart] ***
ok: [control01.baremetal]
ok: [node01.baremetal]
ok: [node02.baremetal]
ok: [control02.baremetal]
TASK: [base | Setup ntp daemon config] ***
ok: [control02.baremetal]
ok: [node02.baremetal]
ok: [control01.baremetal]
ok: [node01.baremetal]

Because the configuration files were not modified, the handler was not notified, and the service just kept on running.
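The interplay of notify and handler seen in the last two runs can be modeled in a few lines of Python. This is a simplified sketch rather than Ansible's logic: run_tasks, template_task, and restart_handler are hypothetical names, a dictionary stands in for the remote system, and the ordering of multiple handlers is not modeled.

```python
# Sketch: a handler runs once at the end, and only if a task that
# notifies it reported a change.
def run_tasks(tasks, handlers):
    """tasks: list of (name, func, notify); func returns True if it changed."""
    log, notified = [], []
    for name, func, notify in tasks:
        changed = func()
        log.append((name, "changed" if changed else "ok"))
        if changed and notify and notify not in notified:
            notified.append(notify)
    # Handlers fire after all tasks, once each, however often notified
    for handler_name in notified:
        handlers[handler_name]()
        log.append((handler_name, "handler ran"))
    return log


def template_task(state, content):
    """Task that writes a config into state and reports whether it changed."""
    def task():
        changed = state.get("ntp.conf") != content
        state["ntp.conf"] = content
        return changed
    return task


def restart_handler(state):
    """Handler that counts service restarts."""
    def handler():
        state["restarts"] = state.get("restarts", 0) + 1
    return handler


if __name__ == "__main__":
    state = {}
    tasks = [("Setup NTP daemon config",
              template_task(state, "server 192.168.0.150\n"),
              "Restart NTP daemon")]
    handlers = {"Restart NTP daemon": restart_handler(state)}
    print(run_tasks(tasks, handlers))  # config written, handler runs
    print(run_tasks(tasks, handlers))  # nothing changed, no restart
    print(state["restarts"])
```

The second call reports the task as ok and never touches the handler, so the simulated service is restarted exactly once.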

Integrating Sensitive Variables

One frequent problem with version management systems relates to access credentials: An Ansible run often needs them as variable content, for example, to install SSH keys automatically or to access a database.

Of course, you will not want to leave these in the clear in the Git repository for everyone to see. To prevent this from happening, Ansible uses the concept of vaults, an encrypted form of YAML file for storing sensitive data of this type.

Using vaults is easy as pie: The ansible-vault command lets you create and edit these encrypted files. When you run the playbook, you can supply the decryption password at the command line or have it read from a file that can only be accessed by authorized users, which prevents the password from appearing in the shell history.

ansible-demo-scripts$ vim ~/.vaultpass
ansible-demo-scripts$ ansible-vault \
  create --vault-password-file \
  ~/.vaultpass secrets.yml

If you now integrate the secrets.yml file with the playbook, it expects you to provide the decryption password for the vault before running the commands:

ansible-demo-scripts$ ansible-playbook \
  -i inventories/hosts.baremetal \
  --vault-password-file ~/.vaultpass \
  --ask-sudo-pass site.yml

The vault file itself can be checked into version management without any problems; it is AES256 encrypted and encoded in a plain text format.

Conclusions

The examples given here show only a small part of what Ansible is capable of with very little overhead. The structure of the playbooks follows shell commands and constructs fairly closely, so system administrators should find them familiar. This approach keeps the learning curve shallow for your initial steps.

The documentation is excellent and you can look forward to a huge variety of integrated tasks on which to build. These tasks include database setups, several package management variants, simple and complex file management, conditional execution based on the results of the previous tasks, and many other things. In combination with a powerful Jinja2 template syntax, there is virtually nothing you can't do with Ansible.