Virtualization Trove: DBaaS with OpenStack
Lead Image © Brian Kenney, 123RF.com
 

Exploring OpenStack's Trove DBaaS

Cloud Service

DBaaS moves the database service to the cloud, promising a new database instance at the click of a mouse. By Martin Loschwitz

You can install databases such as MySQL, PostgreSQL, or even MongoDB very quickly thanks to package management, but the installation is not even half the battle. A functioning database also needs user accounts and several configuration steps for better performance and security.

This need for additional configuration poses challenges in cloud environments. You can always manually install a virtual machine in traditional settings, but cloud users want to generate an entire virtual environment from a template. Manual intervention is difficult or sometimes even impossible.

Furthermore, the customer isn't supposed to be troubled with setting up the database in today's IT environment. Users expect to be able to set up a service in the cloud with a mouse click.

These considerations have led to the development of a new class of tools that fall under the name Database as a Service (DBaaS). The aim of DBaaS is to make it as easy as possible for cloud customers to use a database. Amazon has offered a DBaaS function in its cloud for years, and cloud solutions like OpenStack now have similar features. In this article, I present OpenStack's Trove DBaaS solution.

Trove [1] has been around for many years already. The service had a difficult start, and the developers needed several attempts to get Trove accepted as an official part of the OpenStack program. The declared goal of Trove is to hide the full technical substructure of a database from the users. Customers just need a database; how that database is implemented in the background remains hidden.

The Architecture of the Solution

The Trove design follows the guidelines used for other OpenStack services: The solution consists of an API that accepts commands and a component that executes them in the background. Like all OpenStack APIs, the Trove API follows RESTful principles and can be operated via HTTP. The Task Manager, the executive component, is linked directly to the API and can see incoming API requests.

The Trove Conductor serves as the focal point for guest agents, which also belong to Trove and perform specific tasks within the VM. The Conductor acts like a proxy server for communication between guest agents within the VM and the Trove Task Manager. All communication with other components of OpenStack is made via API calls in the other services (Figure 1).

Figure 1: The Trove architecture follows the model of other services in OpenStack – the API accepts commands, and the Task Manager implements them.

Users have two options for communication with the API: a command-line (CLI) client or a plugin for the OpenStack dashboard Horizon (Figure 2). As usual, the CLI client supports various commands that are not included in the web interface, so anyone who wants or needs the full scope of Trove functionality will not be able to avoid the console.

Figure 2: You can manage Trove through OpenStack's Horizon dashboard.

Once a command is received via the API, the Task Manager takes care of its implementation. In most cases, the implementation consists of starting a new VM and installing and configuring the necessary database on it. The guest agent performs these tasks, acting as an extension of the Task Manager within the VM.
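The division of labor described above can be pictured with a small toy model. This is only a sketch of the message flow, assuming simple in-process queues in place of the message bus a real deployment would use; none of the function or variable names come from the Trove code base.

```python
import queue

# Toy model only: all names here are hypothetical and exist purely to
# illustrate who talks to whom.
task_queue = queue.Queue()       # API -> Task Manager
guest_queue = queue.Queue()      # Task Manager -> guest agent inside the VM
conductor_queue = queue.Queue()  # guest agent -> Conductor (status reports)


def api_create_instance(name, datastore):
    """API: accept the request and hand it to the Task Manager."""
    task_queue.put({"action": "create", "name": name, "datastore": datastore})


def task_manager_step():
    """Task Manager: 'boot' a VM, then tell its guest agent what to set up."""
    request = task_queue.get()
    vm = {"name": request["name"], "status": "BUILD"}
    guest_queue.put({"vm": vm["name"], "datastore": request["datastore"]})
    return vm


def guest_agent_step():
    """Guest agent: 'install' the database and report back via the Conductor."""
    job = guest_queue.get()
    conductor_queue.put({"vm": job["vm"], "status": "ACTIVE"})


def conductor_step(vm):
    """Conductor: record the status that the guest agent reported."""
    report = conductor_queue.get()
    if report["vm"] == vm["name"]:
        vm["status"] = report["status"]
    return vm


api_create_instance("db1", "mysql")
vm = task_manager_step()
guest_agent_step()
vm = conductor_step(vm)
print(vm)
```

The point of the sketch is the direction of each hop: requests travel from the API through the Task Manager into the VM, and status information travels back out through the Conductor.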

Custom Images

The commercial distribution images directly from Canonical or Red Hat do not include the Trove guest agent (Figure 3) and therefore do not support Trove. If you want to use Trove, you'll need to adapt the image to include the guest agent. The Trove developers explain in a separate document how to make Trove-specific images [2] and provide the necessary tools (Figure 4).

Figure 3: The Trove guest agent takes care of rolling out the database within a VM.
Figure 4: Admins can use the disk-image-create tool to create their own images for use with Trove.

The guest agent within the VM has various tasks. The agent installs all required packages and starts the database so that it operates according to the user's requirements. The agent also sets up users for the database, as previously defined by the user. The complexity of the configuration depends on the database you select: Installing a MySQL instance requires fewer work steps than installing a Redis cluster.

You specify the type of database when starting a Trove instance, and, depending on the configuration, the agent uses various templates to get the desired result. Trove relieves users of a lot of the work in setting up the database. And Trove provides other options that would also be available with a manual setup in a virtual machine: When starting the DBaaS instance, the user has the choice of which hardware profile to use with the database.

Of course, users can also choose whether the database should be on the local storage of a hypervisor or on a volume. It is a good idea to use volumes, even if this might lead to loss of performance. A DBaaS instance offers the same options provided by a normal instance in OpenStack – you can restart, delete, or edit your hardware profile with a mouse click.
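To make these choices concrete, the following sketch assembles the kind of JSON body a client might send to the Trove API when creating an instance. The field names (flavorRef, volume, datastore, databases, users) reflect my recollection of the Trove v1.0 API and should be treated as assumptions to check against the API reference for your release; the helper function, the instance name, and all values are hypothetical.

```python
import json


def build_instance_request(name, flavor_ref, volume_gb, datastore, version,
                           databases, users):
    """Return the body for a 'create instance' request (assumed layout)."""
    return {
        "instance": {
            "name": name,
            "flavorRef": flavor_ref,        # hardware profile (flavor)
            "volume": {"size": volume_gb},  # keep data on a volume, size in GB
            "datastore": {"type": datastore, "version": version},
            "databases": [{"name": db} for db in databases],
            "users": users,                 # accounts the guest agent creates
        }
    }


# Hypothetical example values, for illustration only.
body = build_instance_request(
    name="shop-db",
    flavor_ref="2",
    volume_gb=5,
    datastore="mysql",
    version="5.5",
    databases=["shop"],
    users=[{"name": "app", "password": "change-me",
            "databases": [{"name": "shop"}]}],
)
print(json.dumps(body, indent=2))
```

Everything the text mentions appears in one request: the database type, the hardware profile, the volume, and the users to be set up.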

The value of a solution like Trove stands or falls with the number of supported databases. Trove provides support for the most important members of the fraternity, including MySQL and PostgreSQL. Redis and CouchDB do not cause any problems for Trove, and MongoDB was part of the "primordial soup": the first database officially supported by Trove and the one Trove still handles best today.

An important representative of the enterprise market is missing, however: You will search in vain for the top dog, Oracle. Other popular relational databases, such as Microsoft's SQL Server or IBM DB2, are not included, either.

Orchestration and Clustering

Integration with other OpenStack services, especially the orchestration solution Heat, is of great importance to Trove's success. (A DBaaS is virtually useless if database instances can only be started and managed manually.) The OpenStack developers are aware of the need for Heat support and already included comprehensive Heat integration in OpenStack version 2014.1 "Icehouse." This means, for example, that the resource type OS::Trove::Instance is available for native Heat templates; this resource type starts a DBaaS instance and provides it with the necessary credentials. Heat integration for Trove provides everything needed for the day-to-day operation of clusters made up of multiple database nodes.
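As a rough illustration of what such a template might contain, the sketch below builds a minimal template around OS::Trove::Instance as a Python dictionary and prints it as JSON, which YAML parsers generally accept. The property names (flavor, size, datastore_type, datastore_version, databases, users) and the hostname attribute are recalled from the Heat resource reference and may differ between releases; the resource name app_db and all values are hypothetical.

```python
import json

# Minimal template sketch; verify property names against the Heat
# resource reference for your OpenStack release before using it.
template = {
    "heat_template_version": "2013-05-23",
    "description": "One Trove database instance (illustrative sketch)",
    "resources": {
        "app_db": {  # hypothetical resource name
            "type": "OS::Trove::Instance",
            "properties": {
                "name": "app-db",
                "flavor": "m1.small",       # hardware profile
                "size": 5,                  # volume size in GB
                "datastore_type": "mysql",
                "datastore_version": "5.5",
                "databases": [{"name": "shop"}],
                "users": [{
                    "name": "app",
                    "password": "change-me",
                    "databases": ["shop"],
                }],
            },
        }
    },
    "outputs": {
        # Expose the address so other resources or users can find the database.
        "db_host": {"value": {"get_attr": ["app_db", "hostname"]}}
    },
}

rendered = json.dumps(template, indent=2)
print(rendered)
```

In practice you would more likely write the template directly in YAML; generating it from code is merely convenient when several stacks differ only in a few values.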

Database solutions with their own cluster mechanisms are the most difficult scenario for any DBaaS. The Trove developers have incorporated cluster functions in several places in OpenStack version 2015.1 (Kilo) – MongoDB is definitely the textbook example for DBaaS clustering.

The Trove developers have written an extension of their API with database clustering functionality, and Trove comes with a Cluster instance type. If you start a cluster instance and specify all the necessary parameters, such as the total number of instances, Trove reliably takes care of the rest.
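A cluster request could look roughly like the following sketch, which repeats one node definition for the requested number of instances. The body layout (a cluster object with name, datastore, and a list of instances) is my recollection of the Trove clustering API and is an assumption here, as are the helper function and every value passed to it.

```python
import json


def build_cluster_request(name, datastore, version, node_count,
                          flavor_ref, volume_gb):
    """Return the body for a 'create cluster' request (assumed layout)."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one instance")
    return {
        "cluster": {
            "name": name,
            "datastore": {"type": datastore, "version": version},
            # One entry per node; each entry gets its own dictionaries.
            "instances": [
                {"flavorRef": flavor_ref, "volume": {"size": volume_gb}}
                for _ in range(node_count)
            ],
        }
    }


# Hypothetical example: a three-node MongoDB cluster.
cluster_body = build_cluster_request("docs", "mongodb", "2.4", 3, "7", 10)
print(json.dumps(cluster_body, indent=2))
```

The caller only states how many nodes of which size are wanted; wiring the nodes together is left to Trove.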

Trove copes well with the Galera MySQL clustering tool [3] and builds a functional cluster from multiple Galera instances. Trove's Galera support is limited to the database itself; other aspects of the Galera configuration are left to the user.

Considerations for Performance

When you deal with a DBaaS for the first time, you often forget an important criterion: performance. Good performance is often an essential ingredient for success: Most web applications, for example, are totally dependent on the database working reliably and quickly in the background. However, a user who focuses the optimization effort exclusively on bandwidth and throughput will come up short: It is usually latency that is critical for a database running in the cloud.

The reasons for this emphasis on latency will quickly become clear if you take a look at typical clouds up close: Almost all projects, including OpenStack, prefer scalable distributed storage, and solutions like GlusterFS or Ceph are used. These distributed storage solutions typically use classic Ethernet for exchanging data between the individual disks of the installation. However, a trip across the Ethernet network has a significantly higher inherent latency than access to a local disk.

Because all these data stores also work synchronously, the application running on them feels the full latency. So if a VM is on a Ceph volume, write operations to what the VM sees as its local storage take much longer than on bare metal. Anyone who moves an application from bare metal to the cloud will face significantly higher latencies.
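A little arithmetic shows why latency rather than bandwidth sets the limit. If every write must be acknowledged before the next one starts, a single writer can complete at most one write per round trip. The latency figures in the sketch below are illustrative assumptions, not measurements of any particular system.

```python
def max_sync_writes_per_second(latency_ms):
    """Upper bound on strictly sequential, synchronous writes per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def seconds_for_commits(commits, latency_ms):
    """Time one writer spends waiting for a series of dependent commits."""
    return commits * latency_ms / 1000.0


# Assumed, illustrative per-write latencies in milliseconds.
LOCAL_DISK_MS = 0.2
NETWORK_VOLUME_MS = 2.0

for label, latency in (("local disk", LOCAL_DISK_MS),
                       ("network volume", NETWORK_VOLUME_MS)):
    rate = max_sync_writes_per_second(latency)
    wait = seconds_for_commits(10_000, latency)
    print(f"{label}: about {rate:.0f} writes/s, "
          f"{wait:.1f} s for 10,000 commits")
```

With these assumed values, a tenfold increase in latency cuts the achievable commit rate of a single writer by a factor of 10, no matter how much bandwidth the network offers.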

Backups and Snapshots

The maintenance procedures for a database running within a virtual machine are different from those of a classic DB setup. Backups and snapshots are a good example: Trove has its own functions for these operations. Ultimately, the terms backup and snapshot mean the same thing in Trove. The guest agent does not get involved with backups. Instead, Trove saves a snapshot of the volume belonging to the DBaaS instance using the internal snapshot function of the OpenStack VM service Nova and the Volume service Cinder. Both full and incremental backups are possible.

An object store, for example Swift, serves as a location for storing the snapshots. In its own metadata database, Trove remembers where it stored the snapshot. The status of all snapshots can be restored in this way later, and customers are oblivious to what's happening under the hood: They just need to use the backup and restore functions of the Trove API from the command line or in a web interface.
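From the client's point of view, a backup request is little more than a reference to the instance plus a name. The following sketch builds such a request body and, for an incremental backup, adds a reference to the parent backup. The field names (instance, name, description, parent_id) follow my recollection of the Trove backup API and are assumptions; the helper function and the IDs are placeholders.

```python
import json


def build_backup_request(instance_id, name, description="", parent_id=None):
    """Return the body for a 'create backup' request (assumed layout)."""
    backup = {
        "instance": instance_id,
        "name": name,
        "description": description,
    }
    if parent_id is not None:
        # Incremental backup: only changes since the parent are stored.
        backup["parent_id"] = parent_id
    return {"backup": backup}


# Placeholder IDs, for illustration only.
full = build_backup_request("INSTANCE_ID", "nightly-full")
incremental = build_backup_request("INSTANCE_ID", "hourly-incr",
                                   parent_id="FULL_BACKUP_ID")
print(json.dumps(full, indent=2))
print(json.dumps(incremental, indent=2))
```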

Commercial Solution from Tesora

The American company Tesora [4] is heavily involved with developing Trove in OpenStack. The company describes itself as an enterprise provider of DBaaS with OpenStack, and it is represented on the market with its own product. The software, called Tesora Enterprise, adds many features to Trove that, according to the manufacturer, admins sorely miss. These value-added features include support for more databases: In addition to the databases included with Trove, the Tesora variant offers support for Oracle and several flavors of MySQL, such as Percona XtraDB or MariaDB.

Tesora also comes with its own dashboard plugin for DBaaS, with which it is possible to change more parameters than with the original from OpenStack (Figure 5). Ready-made guest images aim to make it easier for you to maintain Trove during operations, because it is no longer necessary for the cloud provider to build the images. The Enterprise version of Tesora even contains ready-made guest images for commercial databases. In terms of security, the Tesora product offers several methods for hardening the database that are missing in Trove. The manufacturer claims that clustering, particularly for MySQL, is easier to implement in Tesora than in Trove from OpenStack.

Figure 5: Tesora beefs up the GUI for its OpenStack solution and even provides optional around-the-clock support.

Tesora also offers support for its Trove product, including a 24/7 model. This support is probably the most important argument in favor of Tesora for companies running critical applications with DBaaS databases. OpenStack providers only offer support for Trove within the framework of a normal support package from a team that might not specialize in Trove applications.

The company's website does not mention the price of the Tesora Enterprise edition, but interested parties can get the product as a trial version and test it for 60 days. Anyone who doesn't want to spend any money on Tesora still has access to the product's Community edition; however, keep in mind that the Community edition doesn't include any commercial components, and support is missing completely.

Alternatives with Ansible

Almost any automation system provides the tools for editing various database systems. If you aren't using the OpenStack cloud, you could find a way to manage your database with Puppet, Chef, Ansible, or various other solutions. The following example uses Ansible.

It is better to divide the deployment into several steps. The first step involves installing the database. The second step provides the installed database with the desired configuration and then restarts it. In the third and final step, users are created so the application can use MySQL.

Several ready-made Ansible roles are available on the web that allow you to install and configure MySQL [5] [6]. To get a working database, you need to create a corresponding inventory file for Ansible and list the host where MySQL should end up in a host group of its own.

A separate playbook, which assigns the MySQL role to this host group, takes care of the rest. In the playbook, you also set the parameters for configuring MySQL. Running the playbook then leads to an executable MySQL on the target system.

Ansible offers additional functions through modules. The module called mysql_user is perfectly suited for creating users. Because the module calls mysql in the background, the user needs a suitable configuration file for mysql when Ansible invokes commands on the target system. The module documentation [7] reveals all the necessary details. Use the mysql_db module to set up a MySQL database [8].
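The configuration file mentioned above is a small INI-style file. As a sketch, the following helper renders a [client] section with the credentials the MySQL client tools should use. That the module picks up a file such as ~/.my.cnf is my assumption; the module documentation [7] is the authority on which file is read. The function name and the credentials are hypothetical.

```python
import configparser
import io


def render_client_config(user, password, host="localhost"):
    """Render a MySQL option file with a [client] section as a string."""
    # interpolation=None keeps '%' characters in passwords untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser["client"] = {"user": user, "password": password, "host": host}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical credentials, for illustration only.
config_text = render_client_config("root", "s3cret%value")
print(config_text)
```

Because the file contains a password in plain text, it should be readable only by the account that runs the commands on the target system.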

Conclusions

DBaaS in the cloud brings the database into the future. OpenStack's Trove DBaaS tool provides easy integration with the OpenStack orchestration environment. If you aren't using the cloud, you can still use a solution like Chef or Ansible to automate database management.