Features MySQL Forks and Patches Lead image: © Dmitry Karasew, 123RF.com
© Dmitry Karasew, 123RF.com
 

Modern MySQL Forks and Patches

Spoiled for Choice

MySQL is the standard solution for free relational database systems in web applications, but new forks, more storage engines, and patched versions muddy the water. Now's the time to take stock of the most important offerings. By Caspar Clemens Mierau

If your MySQL server is too slow, you have various approaches to solving the problem. Besides optimizing queries and indexes, reworking the configuration, and upgrading your hardware, moving to a customized version of the MySQL server can be a good idea. In recent years, so many patches, forks, and new storage engines have been released that it is hard to keep track of them. For hard-working developers and database administrators, this means a change from the simple choice of a standard MySQL distribution.

Little Band-aids

Many enhancements for MySQL come from major corporations like Facebook [1] and Google [2], who run their ad services on top of MySQL or from MySQL specialists like Percona [3], whose claim to fame is the "MySQL Performance Blog" [4] (standard reading for anyone interested in MySQL). The patches can be grouped into three categories: (1) reporting enhancements, (2) functional enhancements of the MySQL kernel and database engines, and (3) performance optimizations. In most cases, a combination of patches from all three categories will make the most sense. Moving to a database server that you patched and compiled yourself can be a daunting prospect. Thankfully, projects such as OurDelta [5] offer repositories with meaningfully patched and prebuilt MySQL packages for popular distributions like Debian, Ubuntu, and CentOS/RHEL. The patches I will be looking at in the rest of this article are a cross-section of the current OurDelta versions of the MySQL server; see their website for a complete list of patches and notes on how to use them.

Reporting

Extended reporting allows administrators to collect more granular information about the MySQL server's behavior under load. Thus far, slow.log, which offers very little in the line of configuration options, might be your first port of call. However, its utility value is restricted to identifying individual, computationally intensive queries on the basis of the time they use – and non-used indexes. The MicroSlow patch offers new filters for a more targeted search for poorly formulated queries. Thus, it logs queries that are responsible for writing temporary tables to disk, performing complete table scans, or reading a freely defined minimum number of lines in a table. The mysqldumpslow Slow.log statistics tool, which is not very well known but is part of the MySQL standard distribution, has been modified to be able to read and evaluate the extended entries.

Aggregated run-time statistics on usage behavior are equally as useful. The UserStats patch extends MySQL by adding statistics for users, clients, tables, and indexes. After enabling data collection in my.cnf or issuing the SQL SET GLOBAL userstat_running = 1 command at run time, four tables in the information_schema, USER_STATISTICS, CLIENT_STATISTICS, INDEX_STATISTICS, and TABLE_STATISTICS, are continually populated with data. The statistics can be accessed via the SHOW command. For example, SHOW TABLE_STATISTICS will give you a table-by-table evaluation of lines read and modified and indexes updated.

Direct access to the statistics tables in information_schema is useful because they are accessed as normal tables, and you can target the results to manipulate. Listing 1 shows a query for the five tables with the most frequently read lines. This example taken from Live operations of the Rails-based Moviepilot movie community and the underlying movie database OMDB (both anonymized for this evaluation) shows that read access to the images table is particularly frequent. The next step would be for the database user to experiment with code optimization or other changes to the table format to reduce access incidence and save time. The relatively write-intensive movies table has a suspiciously high write-access count and, at the same time, a large number of index updates.

Listing 1: Userstats Patch in Action

mysql> select * from information_schema.TABLE_STATISTICS ORDER BY ROWS_READ DESC LIMIT 0,5;
+--------------------+---------------+-------------+--------------+------------------------+
| TABLE_SCHEMA       | TABLE_NAME    | ROWS_READ   | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES |
+--------------------+---------------+-------------+--------------+------------------------+
| moviepilot         | images        | 13138219791 |        14778 |                 118224 |
| moviepilot         | events        |  3957858216 |        59964 |                 359784 |
| moviepilot         | comments      |  2650553183 |         3408 |                  20448 |
| moviepilot         | movies        |  2013076357 |       598505 |                7780565 |
| omdb               | log_entries   |  1106683022 |         2737 |                   5474 |
+--------------------+---------------+-------------+--------------+------------------------+

Because the statistics can be reset easily with FLUSH TABLE_STATISTICS, interval-based evaluation by means of a Munin plugin that you write yourself, or some similar method, would be your best bet. Retrospectively, you could investigate load peaks in relation to table access and modification.

New Functions

New functions in the MySQL kernel give administrators additional options so that maintenance of the MySQL server is more secure and convenient. A typical task is to stop MySQL processes with the KILL command. Under load, you might see a process listed as Idle by SHOW PROCESSLIST start to handle a new query just at the moment you kill it. The "Kill if Idle" patch adds an option to kill a process only if it is doing nothing: KILL IF_IDLE Process_Id. This saves you the embarrassment of accidentally killing a process while it is handling a query.

LVM and ZFS snapshots are commonly regarded as the simplest methods for backing up an InnoDB database on the fly without interrupting operations. If this method is not an option for you and you are forced to rely on a legacy dump file, you need to make sure that the data on your MySQL server do not change while a file is being dumped. The legacy approach to doing this is FLUSH TABLES WITH READ LOCK. However, this might not be sufficient in the InnoDB case because background processes also write to the database. The InnoDB Freeze patch executes SET GLOBAL innodb_disallow_writes = 1 and then freezes all processes that write InnoDB data so you can create a backup. Afterward, SET GLOBAL innodb_disallow_writes = 0 disables the freeze.

Performance Enhancements

Thanks to its support for transactions and line-based locking, InnoDB has developed into a modern alternative for the now fairly ancient MyISAM engine. Despite performance gains in write access thanks to line-based locking, the overhead for supporting transactions, foreign keys, and other functions (even if you don't use them) costs valuable CPU cycles and hardware I/O resources. MySQL systems under heavy load thus need a perfectly configured and powerful InnoDB engine.

A variety of performance-boosting patches are available for the version of InnoDB that ships with MySQLM; some of them are included in the OurDelta version. One that is worthy of mention is a reworked RW lock that improves locking behavior on multi-processor systems in particular. A description of all the improvements is beyond the scope of this article, but one thing is clear; patches for InnoDB exist that retain compatibility and offer transparent optimization of the engine. Typically, it is difficult to measure the performance gain in an objective way because the performance of the MySQL server will depend to a great extent on the hardware, configuration, and data it uses. The most exhaustive and reliable source is the MySQL Performance Blog [4], which regularly publishes test results.

Improvements by selectively installing patches for the legacy InnoDB engine are regarded as a fairly conservative approach. The use of an alternative database engine holds more promise. The InnoDB plugin and Percona XtraDB engines are becoming increasingly widespread.

The InnoDB Plugin

The InnoBase InnoDB plugin is an ongoing development of the InnoDB engine that ships with MySQL [6]. Improvements include general optimization of CPU load and I/O access, a faster locking mechanism, extended configuration and reporting options, and optional table compression.

MySQL 5.1 introduced the option of unloading the standard engine and replacing it with a different version. Starting with MySQL 5.1.38, MySQL additionally supplies the InnoDB plugin. As MySQL describes this as a release candidate, administrators do need to enable it manually. The official MySQL documentation describes the steps required to do so [7]. To benefit from the combination of distribution updates for the MySQL kernel and the latest functions and optimizations of the InnoDB plugin, it is a good idea to install the latest InnoDB plugin version from the InnoDB website.

Ubuntu 10.04 comes with MySQL server version 5.1.41. The /usr/lib/mysql/plugin/ houses an InnoDB plugin version 1.0.4 that is disabled by default. The InnoDB website has the current version, 1.0.6, which you can download and unpack. Then copy ha_innodb.so to the /usr/lib/mysql/plugin/ directory. Because Ubuntu uses AppArmor to protect services by default, you need to disable or modify AppArmor to let you load content from the plugin directory by adding these lines to the /etc/apparmor.d/usr.sbin.mysqld ruleset:

/usr/lib/mysql/plugin/ r,
/usr/lib/mysql/plugin/* mr,

Then restart Apparmor with the service apparmor restart command to enable the new rules. In my.cnf, you then need to enable the InnoDB engine and load the InnoDB plugin as shown in Listing 2. The InnoDB plugin documentation contains detailed information on this procedure. The MySQL server error log contains the message shown in Listing 3 after loading the InnoDB plugin.

Listing 2: Loading the InnoDB Plugin into my.cnf

[mysql]
ignore_builtin_innodb
plugin_load=innodb=ha_innodb.so;  innodb_trx=ha_innodb.so;innodb_locks=ha_innodb.so;
  innodb_lock_waits=ha_innodb.so;  innodb_cmp=ha_innodb.so;innodb_cmp_reset=ha_innodb.so;
  innodb_cmpmem=ha_innodb.so;  innodb_cmpmem_reset=ha_innodb.so

Listing 3: With and Without the InnoDB Plugin

<I>MySQL server without InnoDB plugin:<I>
InnoDB: Started; log sequence number 0 44233
<I>MySQL server with current InnoDB plugin<I><ßßI>:<ßßI>
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: highest supported file format is Barracuda.
InnoDB Plugin 1.0.6 started; log sequence number 44233

The documentation provides detailed information on the optimizations and new features offered in the InnoDB plugin. One thing that stands out against the rest is the new "Barracuda" file format. In contrast to the standard "Antelope" format, it stores the InnoDB tables in a compressed format. Although this could cost additional CPU cycles, it will give you huge I/O performance savings – depending on your data structure – because far fewer operations are required on disk/SSD. To discover whether it is worthwhile changing to Barracuda, you will need to measure performance. Tables with large text and blob fields in particular will benefit from compression. Incidentally, MyISAM has supported "compressed" tables for some time; however, you cannot modify compressed MyISAM tables in ongoing operations.

Percona XtraDB

Percona XtraDB [8] takes things one step further than the InnoDB plugin. This storage engine is a merge of the current InnoDB plugin version with additional performance and feature patches. From a codebase point of view, XtraDB is thus the most innovative version of InnoDB. But don't let the name worry you: XtraDB is an InnoDB engine. The new name simply serves to underline the major differences between it and the version of InnoDB that ships with MySQL.

Your easiest approach to installing XtraDB is to resort to the MariaDB packages created by OurDelta. MariaDB [9] itself is a MySQL fork by the well-known MySQL developer Michael "Monty" Widenius. Widenius is working on a transactional alternative to MyISAM – the new Maria engine. At the same time, the MariaDB fork happily integrates XtraDB as a high-performance update to InnoDB. For the administrator, this means a whole lot more optimizations: a state-of-the-art MySQL version reworked by the MariaDB project and extended to include XtraDB (and Maria), along with additional patches courtesy of the OurDelta project.

A conversion from InnoDB to XtraDB tables is not needed because XtraDB replaces the standard InnoDB engine, just as the InnoDB plugin does. Existing or new tables are automatically managed by XtraDB. A downgrade to the InnoDB plugin and the standard InnoDB engine is also possible. To see that XtraDB still refers to itself as "InnoDB," you can call SHOW ENGINES – also, you will see the other modern engines, such as Maria and PBXT, here. Listing 4 shows the engines on a current MariaDB server.

Listing 4: Engines on a MariaDB Server

MariaDB [(none)]> show engines;
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                        | Transactions | XA   | Savepoints |
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| BLACKHOLE  | YES     | /dev/null storage engine (anything you write to it disappears) | NO           | NO   | NO         |
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                          | NO           | NO   | NO         |
| FEDERATED  | YES     | FederatedX pluggable storage engine                            | YES          | NO   | YES        |
| MARIA      | YES     | Crash-safe tables with MyISAM heritage                         | YES          | NO   | NO         |
| CSV        | YES     | CSV storage engine                                             | NO           | NO   | NO         |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables      | NO           | NO   | NO         |
| ARCHIVE    | YES     | Archive storage engine                                         | NO           | NO   | NO         |
| MyISAM     | YES     | Default engine as of MySQL 3.23 with great performance         | NO           | NO   | NO         |
| InnoDB     | DEFAULT | Supports transactions, row-level locking, and foreign keys     | YES          | YES  | YES        |
| PBXT       | YES     | High performance, multi-versioning transactional engine        | YES          | YES  | NO         |
+------------+---------+----------------------------------------------------------------+--------------+------+------------+

Tests on Live systems demonstrates that migrating to the MariaDB package is unproblematic for the most part. MyISAM tables are left unchanged; InnoDB tables continue to work. However, MySQL-specific configurations in my.cnf are interpreted in a fairly strict manner. sql-mode=NO_ENGINE_SUBSTITUTION,TRADITIONAL will put MySQL in traditional mode, which handles many warnings as errors. For example, a typical Rails-style database migration failed because it did not use completely standards-compliant queries, such as setting default values for text and blob fields.

In non-traditional mode, MySQL ignores the default use and still runs the queries; in traditional mode, the query quits with an error. Thus, it is a good idea in many cases to change the line to sql-mode=NO_ENGINE_SUBSTITUTION and then restart the server. This problem is not specific to MariaDB, simply a very restrictive configuration in line with the MySQL standard. Additionally, it makes sense to check whether programs compiled against Libmysqlclient-dev need to be recompiled against the current Libmariadbclient-device. In a Rails environment, this will affect Mysql-Gem.

Besides the benefits of the InnoDB plugin described thus far, XtraDB also offers a considerable performance boost, as a variety of benchmarks with various setups proves [10]. Numerous additional functions make life easier for developers and administrators – and don't forget the ability to write out the InnoDB buffer to disk. If you do need to restart the MySQL or MariaDB server, the InnoDB engine loses its valuable buffer pool in RAM. Depending on your configuration and the application, the pool can be several gigabytes and might be filled in the course of hours. Storing the buffer pool before quitting and loading it again after restarting will save valuable warmup time, which you would notice as slow response on the part of the database server. Listing 5 shows the commands and returns for storing and loading the buffer pool.

Listing 5: Storing and Loading the Buffer Pool

// storing the buffer pool
MariaDB [(none)]> select * from    information_schema.XTRADB_ADMIN_COMMAND    /*!XTRA_LRU_DUMP*/;
+------------------------------+
| result_message               |
+------------------------------+
| XTRA_LRU_DUMP was succeeded. |
+------------------------------+
// loading buffer pool and
MariaDB [(none)]> select * from information_schema.XTRADB_ADMIN_COMMAND /*!XTRA_LRU_RESTORE*/;
+---------------------------------+
| result_message                  |
+---------------------------------+
| XTRA_LRU_RESTORE was succeeded. |
+---------------------------------+

Drizzle

At this point, I'll take a quick look at a new development, Drizzle [11]. According to the project, the fork is a return to the original MySQL values: simplicity, reliability, and performance. Drizzle was originally based on the code of the not-yet-released MySQL 6.0 and mainly pursues the goals of removing unnecessary functions and reducing complexity.

The developers really have been radical in the features they eradicated: storage engines such as Federated and Merged have been removed; others, such as CSV and MyISAM, have been demoted to temporary engines. Modern engines such as XtraDB are maintained in a separate branch. The standard engine for Drizzle is InnoDB. However, this does not mean that data dumped from a classical MySQL server with InnoDB tables can be integrated without problems, because Drizzle has also eradicated many field types, such as TINYINT, TINYTEXT, and YEAR. Migrating to Drizzle thus means architectural changes to your database design. Although a change from TINYINT to INT could simply mean searching and replacing occurrences in a dump file, the lack of a YEAR field can have a more serious effect on existing applications. A generic solution for the migration does not exist.

On a more positive note, Drizzle offers totally new replication mechanisms. One feature that stands out is the ability to perform rabbit replication to NoSQL databases such as Voldemort [12] or services such as Memcached [13]; thus, you would be able to provision a variety of back ends automatically from a central location. As a state-of-the-art, high-performance, non-transactional database engine, the Drizzle project is working on BlitzDB, which will be positioned as an alternative to MyISAM.

Conclusions

The community's response to the lethargic integration of patches into the Community Edition of MySQL server is to launch new and active projects (Figure 1). Existing MySQL 5.0 installations can be replaced easily by the OurDelta MySQL 5.0 build, which accelerates the server, thanks to performance patches, and offers advanced reporting functionality so the administrator can plan further steps on the basis of run-time statistics. Installations of version 5.1 can benefit from the latest optimizations by installing the current InnoDB plugin – ideally without needing to rebuilding. Migrating to the state-of-the-art MariaDB, which outperforms the InnoDB plugin in performance tests, turns out to be more effective. Luckily, MariaDB is packetized by the OurDelta project, which also adds a number of additional patches.

The MySQL development community now has many forks.
Figure 1: The MySQL development community now has many forks.

The Drizzle database, with its simplified variant of InnoDB, is still at a very early stage. The Maria engine also represents a possible fast future alternative to the classical combination of MyISAM/InnoDB; however, you need to perform extensive checks before using it. Both projects' engines require architectural changes to the database system and the program code that accesses it, in contrast to the InnoDB plugin, XtraDB, and popular MySQL patches.

Administrators and developers are put in an ambivalent situation: Although it has become inevitable for administrators to concern themselves with alternative MySQL patches, engines, and forks and, ideally, to deploy benchmarks to discover the perfect solution for their requirements profile, this does involve a considerable amount of overhead. At the same time, the installation of the server is no more complex than using distribution packages thanks to pre-packetized software.

Work is already in progress on integrating modern forks into distribution repositories [14]. The reward for all this effort will ideally be a noticeably faster database server that helps reduce hardware and development costs.