MySQL is gearing up with its own high-availability Group Replication solution
Do It Yourself
One important reason for the success of MySQL [1] is asynchronous replication, introduced with release 3.23 in 2001, which allows admins to set up replica instances (slaves) that receive all data changes from the master system. Applications that only need to read data can query a slave, which reduces the load on the master.
Many admins were quick to use this replication method to promote one of the slaves to the master if the master server failed. In practice, users often suffered data loss because the slave was not up to date and the asynchronous transmission of changes caused backlogs of anything from a few seconds to several minutes.
MySQL 5.5 saw the introduction of semi-synchronous replication, in which the master waits until a change reaches one of the slaves (but not until it is applied there). With many concurrent writes, however, this comes at the expense of server throughput and is therefore rarely used.
Highly Available
MySQL has thus not been able to achieve true high availability with its on-board tools. In the community, Galera Cluster by Codership Oy, released under the GPLv2 [2], has enjoyed increasing popularity in recent years. It is based on a MySQL server and supports writing to each of the nodes – given that you have at least three servers.
Galera uses a group protocol to ensure that changes can be made on all nodes. If this is impossible because another user has changed the same data at the same time on another node, MySQL rolls back the transaction. In this case, the client needs to handle the error (e.g., by resending the queries).
Prompted by the popularity of Galera Cluster, Oracle reacted, publishing a stable release of the Group Replication plugin in September 2016 with MySQL Server 5.7.17 after no fewer than eight prereleases. Group replication is a "shared nothing" architecture: All nodes of a group keep all data. Communication between the nodes relies on the Mencius protocol [3] using a variant of the Paxos algorithm.
The system generates a unique order for all writes so that a winner is always found in the case of concurrent access to the same data. A new node can be added to the group at any time – all you need is a reasonably recent backup of an existing node. The plugin then completes final synchronization using the classic MySQL binary logs.
Requirements
Some requirements need to be met if you want to use the Group Replication plugin. Because a majority of nodes is always necessary to make decisions, each group must have at least three nodes. Group replication works only with tables that use InnoDB as the storage engine; the legacy MyISAM engine is not supported. Moreover, it is imperative for each table to have a primary key.
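Before converting an existing installation, you can check the last two prerequisites with queries against the standard information_schema; a quick sketch (the first query lists tables on a different storage engine, the second lists tables without a primary key):

SELECT table_schema, table_name, engine
  FROM information_schema.tables
 WHERE table_type = 'BASE TABLE'
   AND engine <> 'InnoDB';

SELECT t.table_schema, t.table_name
  FROM information_schema.tables t
  LEFT JOIN information_schema.table_constraints c
         ON c.table_schema = t.table_schema
        AND c.table_name = t.table_name
        AND c.constraint_type = 'PRIMARY KEY'
 WHERE t.table_type = 'BASE TABLE'
   AND t.table_schema NOT IN ('mysql','information_schema','performance_schema','sys')
   AND c.constraint_name IS NULL;

Both result sets should be empty before you enable the plugin.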
Surprisingly, IPv6 is still lacking – in 2017! Currently, network communication between the replicating nodes relies on IPv4, but improvements have been promised in one of the upcoming releases. Also, Oracle requires a fast network. Transatlantic WAN links are thus ruled out, for the time being, as potential Group Replication applications.
One Master, or Many
Group replication is primarily a high-availability feature. Higher capacity for read access is a side effect, whereas write capacity does not benefit. The default operation mode for a group is single-primary mode, in which the group designates one node as the primary. Clients can write to this node only. The remaining nodes are available for read queries and can assume the role of the primary if it fails.
A second operating mode known as multi-primary allows writing to every node. If you want to use this mode, you need to take a closer look at its limitations [4]. Make sure there are no concurrent data definition language (DDL) queries. Simultaneous changes to the structure of a single table from multiple nodes can lead to data loss and the failure of the entire group. Therefore, the use of multi-primary mode is not currently recommended.
Limits
Group Replication does not support transaction savepoints, which also rules out the legacy mysqldump command with the --single-transaction parameter as a backup method. Thus, a backup solution for Group Replication requires either MySQL Enterprise Backup [5], which is available when you purchase an enterprise license from Oracle, or the free Percona XtraBackup [6] software. The query shown in Listing 1 discovers whether or not your server uses savepoints.
Listing 1: Testing for Savepoints
mysql> SELECT COUNT_STAR FROM performance_schema.events_statements_summary_global_by_event_name WHERE event_name LIKE "%/savepoint";
+------------+
| COUNT_STAR |
+------------+
|          1 |
+------------+
1 row in set (0.00 sec)
Hands On
The following sections demonstrate the installation and setup of MySQL Group Replication. The example starts with three MySQL 5.7.17 servers that do not house any data.
The settings described in Listing 2 must be enabled on all servers (in the /etc/my.cnf file). Global transaction IDs and binary logging are enabled, and all nodes that belong to the group need to be listed with the port used for Group Replication.
Listing 2: Configuration Settings
gtid_mode = ON
enforce_gtid_consistency = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
binlog_checksum = NONE
log_slave_updates = ON
log_bin = binlog
binlog_format = ROW
transaction_write_set_extraction = XXHASH64
plugin-load = group_replication=group_replication.so
group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaabcdef"
group_replication_start_on_boot = on
group_replication_group_seeds = "192.168.1.10:24901,192.168.1.11:24901,192.168.1.12:24901"
group_replication_bootstrap_group = off
One block of settings needs to be different on each of the nodes, including the server ID, port (for client connections), and local address that has a port for group communications (Listing 3). Because synchronization of the nodes relies on the MySQL binary log infrastructure, an extra replication user also needs to be created (Listing 4).
Listing 3: Node-Specific Settings
server-id = 10001
port = 3306
group_replication_local_address = "127.0.0.1:24901"
Listing 4: Replication User
CREATE USER IF NOT EXISTS rpl@'%' IDENTIFIED BY 'Rplpass1!';
GRANT REPLICATION SLAVE ON *.* TO rpl@'%';
CHANGE MASTER TO MASTER_USER='rpl', MASTER_PASSWORD='Rplpass1!' FOR CHANNEL 'group_replication_recovery';
You then start Group Replication on the first node. Because it cannot yet connect to any other nodes, it must first be put into bootstrap mode:
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
Even if all the servers you use are empty, you still need to transfer the dataset from the first server to the others with the help of a dump; otherwise, the global transaction IDs of the additional systems make it unnecessarily difficult to set up replication. To do this, create a dump on the first server:
mysqldump -uroot -p -A --triggers --routines --events > /tmp/instanz1.sql
Next, copy this dump to the other nodes and import it there:
mysql -uroot -p --init-command="RESET MASTER" < /tmp/instanz1.sql
Then you configure the recovery channel and start Group Replication on each of these nodes:
CHANGE MASTER TO MASTER_USER='rpl', MASTER_PASSWORD='Rplpass1!' FOR CHANNEL 'group_replication_recovery';
START GROUP_REPLICATION;
All the requirements are now met on the other nodes, and the group is up and running. Write queries are already allowed against the first node.
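Whether all members have actually joined can be checked from any node:

SELECT member_host, member_port, member_state
  FROM performance_schema.replication_group_members;

Each node should appear with the state ONLINE; members that are still catching up show up as RECOVERING.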
Operation
If the primary node fails, the system automatically selects a new primary and allows writes there. If the failed node later returns to the group, it picks up the changes that accumulated in the meantime from one of the other nodes.
One pitfall remains: If the group connection fails for some reason (e.g., because of a network failure), the server works as a standalone system from that point on and thus accepts write queries again. This makes it impossible for the cluster to reintegrate the node without restoring it from a completely new backup of one of the other nodes. Potential for improvement thus remains, but as a first stable release, Group Replication leaves a good impression.
Query Distribution
Group replication does not solve the problem of how to route queries from the application to the cluster nodes. Particularly in the case of single-primary mode, some additional intelligence is necessary, because the information about which node currently accepts writes needs to be stored somewhere.
ProxySQL seems to be the most common solution in what is still a very small market. Ideally, you would install the solution on every application server, rather than a central location on the network. All of the solutions are free software except for MaxScale.
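With ProxySQL, the routing rules live in its admin interface; the following is a minimal sketch, assuming ProxySQL 1.4 with native Group Replication support (which also expects a monitoring view installed on the MySQL side) and freely chosen hostgroup numbers:

-- register the three MySQL nodes (hostgroup 2 = writers)
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (2,'192.168.1.10',3306), (2,'192.168.1.11',3306), (2,'192.168.1.12',3306);
-- let ProxySQL sort members into writer/reader/offline hostgroups
INSERT INTO mysql_group_replication_hostgroups
  (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup,
   offline_hostgroup, active, max_writers, writer_is_also_reader,
   max_transactions_behind)
VALUES (2, 4, 3, 1, 1, 1, 0, 100);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

ProxySQL then monitors the group state and redirects writes automatically when the primary changes.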
And Galera?
The previous top dog in the field of virtual synchronous replication in the MySQL camp was Galera Cluster, an extension for MySQL released under GPLv2. Galera manages replication in an additional layer that sits on top of the MySQL core. In contrast to Group Replication, it does not use the binary log to handle past transactions; instead, it relies on GCache, a proprietary file that acts as a ring buffer.
Galera relies on MySQL but implements collision detection and distribution of the write set itself. In contrast to Oracle's solution, it does not rely on the modular system of replication features already integrated into the server. Galera has already seen several years of production use in many places and is ahead of Group Replication in this aspect.
Future
With the release of Group Replication, Oracle has given users broad insight into its roadmap [13]. Group replication is the most important component of the MySQL InnoDB cluster, a product that also includes the MySQL shell for simplified cluster installation and administration, and MySQL Router for automatic query distribution. The InnoDB cluster is available as a preview [14]. Oracle's next steps in the evolution of the MySQL platform will be read scale-out (via additional asynchronous read-only slaves) and write scale-out (via sharding).