Nuts and Bolts MariaDB MaxScale Lead image: Lead Image © lightwise, 123RF.com
Lead Image © lightwise, 123RF.com
 

MariaDB MaxScale: A smart proxy for MySQL

Director

MaxScale by MariaDB is a smart proxy server for MySQL that speaks the same protocol as the database server. The manufacturer claims solid high availability and horizontal scalability. By Martin Loschwitz

MySQL databases are just one of many interconnected system components, which call for high availability and horizontal scalability. For databases in particular, horizontal scaling is significantly more complex than with other services, often because most MySQL clients are not smart enough for clustered MySQL installations. If you do not relish the task of programming the missing client functions for database clusters yourself, you are in good hands with MaxScale, which speaks the MySQL protocol, analyzes requests from clients, and routes them to specific servers.

When scaling vertically, admins expand a single large hardware instance on which specific services run, whereas when scaling horizontally, they increase the number of instances, thus distributing the load across more shoulders.

Horizontal scaling has supplanted classic high availability (HA) approaches, because if you have a fair number of instances of a service, the failure of a single instance is only a problem for its currently open connections – and only if the client is not smart enough to reconnect automatically.

Although these two approaches are ready for production use in MySQL – MySQL in master-slave mode and a Galera cluster with multimaster functionality – these solutions are unlikely to do the job if left to their own devices.

MaxScale for Databases

MaxScale comes from MariaDB and made its debut in 2014. The MaxScale team is led by MySQL creator Monty Widenius, who promised that MySQL databases would support high availability, while putting an end to scalability issues. In terms of functionality, MaxScale is a proxy server: At one end, the service opens connections to back ends of the same database in the background; at the other end, it waits for incoming connections from clients. In this article, a client is any application (CRM, web shop, document management, etc.) that uses MySQL as a back end and therefore assumes a client role in its dealings with the database server.

Admins might be reminded of a load balancer, but MaxScale differs in one important aspect: To the client, MaxScale looks like a database, because it speaks the MySQL protocol. It is thus not only transparent to the client, as a local load balancer would be, but it can also analyze requests from clients and route them to specific servers accordingly. In doing so, it can, for example, separate write requests from read requests.

Great Demand, Low Supply

Separating traffic is a common requirement in MySQL installations with more than one node for many reasons:

Most admins work around these issues by installing a load balancer, such as HAProxy [1], or an F5 Appliance [2] between the client and database. However, all of these solutions suffer from the same problem: Because they do not speak MySQL, they do not have an image of the MySQL cluster in their back ends, and a rigid pattern limits how incoming connections are directed to them.

The idea of working with the shortcomings of the MySQL client directly in your application does not promise much more success: The attempt typically degenerates into a code war that mainly leads to confusion. Although several approaches besides MySQL clusters have arisen to solve the problem (e.g., ScaleArc), all of these products suffer from more or less severe technical limitations that makes them unsuitable for production use.

Modular Architecture

In simple terms, MaxScale sees itself as particularly smart client that assumes a kind of bridge function for less smart clients. Common MySQL APIs – such as those in PHP or C – connect with MaxScale, which establishes the connection to Galera or to a master-slave MySQL in the background. However, this function is not the only outstanding MaxScale feature.

The people at MariaDB put a lot of thought into a suitable design for MaxScale, because the results are impressive: Inside, MaxScale is based on a fully modular architecture. The part that opens ports to the outside world and becomes available for clients is little more than a listener on a TCP port. The listener first passes all the incoming traffic to the MySQL client engine, which evaluates the pending query.

In the case of normal MySQL, this would be the end of the road, and the client would receive a response from the server. Not so with MaxScale: From the client module for MySQL, the evaluated information is now forwarded to the routers after applying any filters. At router level, a decision is made as to what to do with the query. A single router is responsible for a special function: The earlier example, in which incoming requests for write and read access are separated, is handled by the ReadWrite Split router. The routers are therefore the most important modules within MaxScale; in addition to ReadWrite Split, you have a filter for regular expressions and a logging router.

Once the packets leave the router, they migrate to the back-end module that works closely with the monitor plugin, which keeps track of the configured back ends and their availability; it also knows whether it is a master or slave. On the basis of this information, the MySQL back-end module finally sends the respective SQL query to one of its back ends. A cache for login credentials is also involved.

Installation and Configuration

MaxScale is available from the MariaDB website as a free download [3]. The manufacturer also distributes the product as open source software; the entire source code of MaxScale is available from GitHub [4]. The open source version has all of the existing features, so an enterprise version is not necessary.

Moreover, it is not necessary to compile MaxScale from the sources: In the user portal [5], you will find instructions for using MaxScale packages for CentOS, SUSE, Ubuntu, and Debian. After you enable the corresponding package source, all you need to do on Ubuntu 14.04 is enter

apt-get install maxscale

to complete the installation.

After the installation, you need to configure. The good news is that MaxScale comes with a single configuration file, and about 60 lines are sufficient for a working installation. The example in Listing 1 is designed for a Galera cluster, which distributes incoming requests to the back ends with the help of the readconnroute router. The individual parameters are practically self-explanatory, but you will probably want to check the MaxScale documentation for the router_options values [6].

Listing 1: MaxScale with Galera Back End

01 [maxscale]
02 threads=4
03
04 [Galera Router]
05 type=service
06 router=readconnroute
07 router_options=synced
08 servers=backend1, [...]
09 user=maxscale
10 passwd=secret
11
12 [Galera Listener]
13 type=listener
14 service=Galera Router
15 protocol=MySQLClient
16 address=10.42.0.1
17 port=3306
18
19 [Galera Monitor]
20 type=monitor
21 module=galeramon
22 servers=backend1, [...]
23 user=maxscale
24 passwd=secret
25
26 [Debug Listener]
27 type=listener
28 service=Debug Interface
29 protocol=telnetd
30 port=4442
31
32 [backend1]
33 type=server
34 address=10.42.0.2
35 port=3006
36 protocol=MySQLBackend
37
38 [CLI]
39 type=service
40 router=cli
41
42 [CLI Listener]
43 type=listener
44 service=CLI
45 protocol=maxscaled
46 address=localhost
47 port=6603

The user specified for user and password must actually exist in the database. Without it, MaxScale has no way of monitoring the Galera cluster back ends. The reward for all this effort is an instance of MaxScale that distributes incoming requests to the same extent across three back ends, which is the simplest MaxScale setup: a load balancer.

Router Parameters

Using router_options, users can determine the principle by which MaxScale selects a back end for a request. In the case of readconnroute, this is done exclusively on the basis of the state of the existing back ends. The synced option in line 10 tells MaxScale that this is a Galera cluster and that a back end can only be a target if it is a synchronized cluster member.

The router_options possibilities for a master-slave setup are master and slave. The running option, on the other hand, basically allows any host as a target if MaxScale can connect to its database. An optional parameter, weightby, allows you to prioritize your machines by implementing weighting in the best load-balancer style.

The configuration for a Galera cluster shown here ultimately amounts to a classic load-balancing scenario. However, in contrast to MaxScale, a typical load balancer would hardly be in a position to coordinate its distribution mechanism based on the state of the MySQL instances on the back ends.

Read-Write Splits

Whereas Galera is only gradually asserting itself in many companies, MySQL master-slave setups are commonplace. The readwritesplit MaxScale router is useful in these cases: It analyzes MySQL queries and forwards read and write queries to different back ends.

The example in Listing 2 contains a sample configuration for a master-slave setup. Write access is sent to the master and read queries to the slave. It lists only the relevant parts for the setup – everything else can be taken from the sample configuration for the Galera setup in Listing 1.

Listing 2: MaxScale for a Master-Slave Setup

01 [ReadWrite Split Router]
02 type=service
03 router=readwritesplit
04 servers=backend1, [...]
05 user=maxscale
06 passwd=secret
07 max_slave_connections=100%
08 max_slave_replication_lag=21
09
10 [ReadWrite Split Listener]
11 type=listener
12 service=ReadWrite Split Router
13 protocol=MySQLClient
14 port=3306
15 address=10.42.0.1

The max_slave_connections and max_slave_replication_lag parameters are important. By specifying 100% for the first parameter (line 7), MaxScale can use any MySQL slave. If you were to enter 80% here, MaxScale would use only four of five slaves. The 21 for the second parameter (line 8) sets the time in seconds that a slave may lag behind its master until MaxScale ignores it.

Out of the box, the readwritesplitrouter comes with a fixed table of query types for MySQL, which are divided into reads and writes, although users can define additional filters. The regex filter lets you classify queries for other types of splitting – other than read-write splitting – with the use of regular expressions.

Galera and Master-Slave

If you use a Galera cluster but still only want to see writes on one Galera node, you can combine the two modes of operation already described. The router then works in readwritesplit mode, which causes MaxScale to make one of the Galera nodes the master, to which writes are routed. In the case of the splitter service, max_slave_connections, as in the master-slave example, ensures that MaxScale only uses a certain number of slave back ends.

By using disable_master_failback = 1 for the Galera monitor, you can also keep the master node from changing during operation when a new node enters the Galera cluster. This would be the default behavior: MaxScale would look at the WSREP_LOCAL_INDEX parameter of the node and choose the node with the lowest value for this variable as the master. If disable_master_failback is set to 1, a server will remain the master as long as it exists in the cluster.

The combination of readwritesplit and the Galera back end ultimately allows you to operate a genuine multimaster cloud that is compatible with older, non-cluster-capable applications. For larger cloud installations that use MySQL in the back end (e.g., Open Stack), the program is a genuine winner.

MaxScale CLI

MySQL now offers a variety of commands to discover the state of the database directly from the command line. MaxScale comes with its own command line (Figure 1), which mimics MySQL behavior in some way. To start the tool, you enter maxadmin. The show server command will give you an overview of the configured back end and its state.

With the help of maxadmin at the MaxScale command line, you can quickly discover the health of a router and its back end.
Figure 1: With the help of maxadmin at the MaxScale command line, you can quickly discover the health of a router and its back end.

The maxadmin command also keeps internal statistics, and eventstats show displays how long various queries take on average and how high the load on the MaxScale server is at the moment (Figure 2). If you need to remove a single back end from the cluster (e.g., for maintenance purposes), you will also find a maintenance mode. Entering

set server backend1 maintenance

enables maintenance mode for backend1; the same command with clear instead of set disables maintenance mode for the node.

Individual back ends can be switched to maintenance mode from the command line for updates or reboots.
Figure 2: Individual back ends can be switched to maintenance mode from the command line for updates or reboots.

MaxScale as the Binlog Server

MaxScale is not only qualified to be the doorman for other MySQL back ends, it can take an active role within a cluster by acting as a source for binlogs that feed the MySQL slaves data. MariaDB calls this feature "replication proxy." In such a setup, MaxScale itself becomes the slave node for the master server in the MySQL cluster while acting as a master for more slaves. Because instances of MaxScale will scale arbitrarily, the number of potential slaves can be high.

In this function, MaxScale competes directly with MySQL. After all, any normal MySQL slave can produce binlogs and distribute them to other slave servers. The developers refer to performance problems in the MySQL version as the motivation for developing the MaxScale version.

A complete guide to the use of MaxScale as a binlog router can be found in the MaxScale GitHub directory [7]. The developers also mention that the show service <replication> command is available on maxadmin binlog servers if the local setup follows the example; otherwise, you need to replace <replication> with the name the replication service uses in the configuration. The service statistics for a binlog router reveal many important details about the replication status or the number of slaves and are thus an important source of information.

MaxScale can also be put to use for compliance and security. If you need performance logging for your database queries, you typically retrieve the data directly from the databases. However, because every query also passes through MaxScale, it can provide an accurate log, as well. MaxScale offers this feature so that admins can further reduce the load on their servers.

Accelerating Cloud Setups

Performance tests in the run-up to this article showed that MaxScale is impressive, particularly in throughput. In terms of latency, the service cannot improve on physical limits; latency obviously cannot be smaller than the latency of Ethernet. However, MaxScale can offer real added value in cloud setups: Existing software-defined network (SDN) implementations are often responsible for latency that is significantly above that of Ethernet.

From version 1.3 on, MaxScale supports persistent database connections to its back ends. MaxScale then opens connections in the background and distributes incoming requests from clients to the already open connections. In this case, it avoids a major part of the network overhead that drives latency in SDN environments. If you need MySQL to be faster than ever before (e.g., on Amazon or Rackspace), MaxScale 1.3 takes you closer to this goal.

Hardening Against Failures

Although MaxScale ensures reliable MySQL, it can itself become a single point of failure. If your own setup is entirely based on MaxScale, a problem arises as soon as MaxScale fails. However, it is not particularly difficult to make MaxScale highly available. In one approach, several instances of MaxScale can access the same database back ends in parallel; then, you manually distribute the physical clients across the available instances of MaxScale so that a failure does not take down the whole setup. Another approach would be deploy a classic load balancer upstream of the MaxScale instances, although this is not ideal in terms of latency.

An alternative solution is the classic cluster manager: For a single instance of MaxScale, all you need is an active-passive setup (e.g., with Pacemaker). A team consisting of a flexible service IP and MaxScale easily moves from one host to another in this variant. In fact, MaxScale describes the setup in a company blog [8].

Conclusions

MaxScale by MariaDB turns out to be a very handy addition to MySQL clusters. It supports both Galera clusters and back ends that follow the master-slave principle, which is already installed as the factory default in MySQL.

MaxScale solves several problems at once for admins. On one hand, the MaxScale-MySQL-Galera team lets you create a database that scales horizontally, even if the client is not specifically equipped to talk with several database instances. Thanks to MaxScale, practically any MySQL client can use this kind of cluster, and MaxScale itself meets all the requirements of the database on the client side.

Administrators of master-slave clusters will particularly look forward to the ability to split reads and writes. Auxiliary features, such as monitoring database clusters or logging queries separately, contribute to the overall positive impression.

If you run MySQL and are thinking of scaling your setup horizontally in the near future, you should consider MaxScale as part of the deal.