NUTS AND BOLTS Apache 2.4 Lead image: Helder Almeida, 123RF.com

A first look at Apache 2.4: Web server for the cloud

Apache Apogee

High performance and cloud suitability are the thrust of Apache 2.4. We give you the lowdown on transitioning your Apache HTTP Server from 2.2 to 2.4. By Joe "Zonker" Brockmeier

The Apache HTTP team has nothing to prove. The most recent Web Server Survey from Netcraft shows Apache well ahead of its competition. Perhaps Apache is no longer quite as dominant as it was in its heyday of 2005, when it served pages for something like 70 percent of all domains surveyed. But Apache still rules the web with more than 59 percent of the market overall and nearly 67 percent of the million busiest sites on the web – compared with less than 17 percent for Microsoft's IIS and nearly 6 percent for the up-and-coming Nginx HTTP and reverse proxy server.

Apache is a fairly mature project. In the software lifecycle, Apache reached the point years ago at which it handled its primary function quite well, then the pace of development seemed to slow down. Why should Apache push out a new major version every six months, as some projects do, when it's reached a solid and stable point, does what its users need, and serves as a foundation for so much work? You don't want to be upgrading Apache significantly every few months or even every year.

But the Nginx example highlights a small problem for Apache – its users need more, or different, features. Specifically, they need features that help Apache in the cloud. Yes, it's a vastly overused buzzword, but it's also a valid trend that isn't going away anytime soon – much like, and related to, virtualization. Overhyped? Absolutely. Still real? Yes. How many shops do you see that aren't using virtualization these days?

So, the Apache crew has been hard at work on features that are going to help keep Apache as relevant in five years as it is today. By the way, for the purposes of this article, I'm going to simply refer to the Apache httpd project as Apache or Apache 2.4, rather than the more correct – but cumbersome – Apache httpd or Apache HTTP Server. Apache, as a top-level project, now encompasses far more than an HTTP server – but you will almost always hear people discussing the HTTP server as "Apache."

Major Changes in v2.4

At the Palmetto Open Source Software Conference (POSSCON, Columbia, South Carolina) in March, I had the good fortune of seeing Jim Jagielski talk about Apache 2.4. If you want to know what's going on with Apache, Jagielski is one of the best people to ask. He is co-founder and a member, director, and president of the Apache Software Foundation (ASF) and has been contributing to and working with Apache longer than anyone else who is still active with the project.

Apache 2.4 has a lot of bug fixes, performance improvements, and minor changes, but the big picture is high performance and cloud suitability, according to Jagielski. In particular, a lot of the changes focus on using Apache as a reverse proxy server (see the "Reverse Proxy" box).

Reverse Proxy

Organizations look to Apache (and Nginx) as a reverse proxy. Although Apache is already being used in this way by quite a few organizations, it's clear that the 2.2 series is not optimized for that use case, which is one reason why Nginx is seeing bigger numbers.

With the 2.4 release, though, Apache should regain some of its ground as a reverse proxy server. The mod_proxy and mod_proxy_balancer modules aren't new in 2.4 – they were already available in 2.2 – but they've received a lot of improvements.

With the improvements in 2.4, you can now do real load balancing natively in Apache. You can weight by the actual requests, by the traffic, and by how busy the servers are, and you can set up factors that weight between individual servers and groups; that is, you can set up groups for load balancing and then set weights within those groups as well.

For example, say you're managing requests that are going to a set of servers for video and a set of servers for static images. You can configure the groups in Apache, then in the subgroups you can set up load balancing within those groups. So, if you have five single-CPU machines in a group and a quad-socket machine with quad-core CPUs, you can weight the quad-core machine to get the appropriate level of traffic. The configuration is very simple – you should be able to pick it up in about five minutes.
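A minimal sketch of such a setup might look like the following. The hostnames, ports, URL paths, balancer names, and weights are all invented for illustration, and the sketch assumes that mod_proxy, mod_proxy_http, mod_proxy_balancer, and the matching mod_lbmethod_* modules are loaded.

```apache
# Hypothetical video group: the quad-socket, quad-core machine gets a
# larger share of requests than each single-CPU machine.
<Proxy "balancer://video">
    BalancerMember "http://video1.example.com:8080" loadfactor=1
    BalancerMember "http://video2.example.com:8080" loadfactor=1
    BalancerMember "http://bigbox.example.com:8080" loadfactor=16
    # Weight by number of requests
    ProxySet lbmethod=byrequests
</Proxy>

# Hypothetical static-image group, weighted by bytes transferred
<Proxy "balancer://images">
    BalancerMember "http://img1.example.com:8080"
    BalancerMember "http://img2.example.com:8080"
    ProxySet lbmethod=bytraffic
</Proxy>

# Send each URL space to its own group
ProxyPass        "/video/"  "balancer://video/"
ProxyPassReverse "/video/"  "balancer://video/"
ProxyPass        "/images/" "balancer://images/"
ProxyPassReverse "/images/" "balancer://images/"
```

The loadfactor values here are only a starting guess based on core counts; the right weights depend on how the machines actually behave under load.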

Naturally, Apache 2.4 has support for session affinity or cookie-based session support, including specific types for PHP and Tomcat.
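As a rough sketch of session affinity with Tomcat back ends, the fragment below pins a client to the member whose route matches the suffix on the session cookie. The hostnames, balancer name, and route names are hypothetical, and each route value is assumed to match the jvmRoute configured on the corresponding Tomcat instance.

```apache
<Proxy "balancer://app">
    # route must match the jvmRoute set on each Tomcat instance
    BalancerMember "http://tomcat1.example.com:8080" route=node1
    BalancerMember "http://tomcat2.example.com:8080" route=node2
    # Cookie name, then the URL-encoded parameter name as a fallback
    ProxySet stickysession=JSESSIONID|jsessionid
</Proxy>

ProxyPass        "/app/" "balancer://app/"
ProxyPassReverse "/app/" "balancer://app/"
```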

Another improvement for availability, or at least perceived availability, is that mod_cache can now cache HEAD requests and can serve "stale" data when it receives a server error (one of the 5xx errors). This means that if Apache 2.4 uses mod_cache and is set up correctly, users might see old content or data, but they don't have to see an error if one of the transactional servers behind Apache goes down or some other problem occurs.
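A small configuration sketch of this behavior follows, assuming mod_cache and mod_cache_disk are loaded. The cache directory is a made-up path, and caching everything under "/" is only for illustration.

```apache
# Where the disk cache lives (hypothetical path)
CacheRoot "/var/cache/httpd/proxy"

# Cache responses for the whole URL space on disk
CacheEnable disk "/"

# If the back end answers with a 5xx error and a stale copy is in the
# cache, serve the stale copy instead of the error
CacheStaleOnError On
```

Whether a given response can be served stale still depends on its cache headers, so this is worth testing against your own back ends before relying on it.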

What's really impressive with Apache 2.4 is how many changes can be done on the fly. Admins know the drill: (1) make a configuration change, (2) restart Apache, (3) repeat from step 1. Now, many (but not all) configuration changes can be handled on the fly, including adding servers as part of the proxy balancers. A future issue of ADMIN will take an in-depth look at the how-to of load balancing with Apache.
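One place where run-time changes to balancers surface is the balancer-manager handler provided by mod_proxy_balancer. The fragment below is a sketch that exposes it on a hypothetical URL and restricts access to a hypothetical management network; adjust both to your own environment.

```apache
<Location "/balancer-manager">
    SetHandler balancer-manager
    # Apache 2.4 access control syntax; never expose this publicly
    Require ip 192.168.1.0/24
</Location>
```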

Multiprocessing Modules

Another big change in Apache 2.4 is that multiprocessing modules (MPMs) can now be built as loadable modules rather than having a single MPM compiled into Apache. Unlike other Apache modules, only one MPM can be active at a time, because the MPM determines how Apache accepts and handles requests. Depending on what platform you're on or how you're using Apache, a single method is not always best across the board. If you're running Apache on Windows, you probably want to handle things a bit differently than if you're running Apache on Linux or Solaris. So, Apache 2.4 allows you to build several MPMs and choose the one to use at run time. This means that if you want to test the performance of Apache on Linux/Unix, you can build the MPMs as loadable modules and try them at run time with the LoadModule directive.
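For illustration, selecting an MPM at run time might look like the fragment below. It assumes Apache was built with the MPMs as shared modules (for example, with the --enable-mpms-shared=all configure option) and that the module files live in a modules/ directory under ServerRoot; distribution packages often use a different layout.

```apache
# Load exactly one MPM; leave the others commented out
#LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
#LoadModule mpm_worker_module  modules/mod_mpm_worker.so
LoadModule mpm_event_module    modules/mod_mpm_event.so
```

Swapping the uncommented line and restarting the server is then enough to compare MPMs under the same workload.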

If you're on Windows, OS/2, or NetWare, you're probably better off sticking with the platform-specific MPMs. Unix/Linux folks have the choice of prefork or worker MPMs, which have been standard for years, or the now-stable event MPM. The differences between the three could probably fill an entire article. Briefly, the event MPM is based on the worker MPM and is not for older platforms without good thread support. The prefork MPM is best for older platforms.

Lua, Lua, Ohhh, Oh!

One of the nifty features in Apache 2.4 deserves its own section: Apache now comes with mod_lua, which embeds the Lua language into Apache for handling rewrites and other logic.

The Lua module started in Apache 2.3.x (the unstable series) as mod_wombat. The nice thing about Lua is that, according to Jagielski, it reduces the memory required to handle a lot of tasks for which you might use other languages.

In place of mod_rewrite, the Lua module might just save your sanity. Using mod_rewrite requires heavy use of regular expressions – and you know the saying about having one problem, deciding to use regular expressions, and then finding you have two problems. Yes, mod_rewrite is powerful, but it's also a good way to shoot yourself in the foot. With the Lua module, you can do rewrites using real, honest-to-goodness if statements.

If you're not already a Lua programmer, you might want to brush up and see if you can make use of this one.

As Jagielski alluded to during his talk at POSSCON, this will factor in quite a bit in reverse proxying. The Lua module can be used in place of mod_rewrite to create rules for load balancing and filtering requests that would be sent to back-end servers.
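On the configuration side, hooking a Lua function into the URL translation phase, where mod_rewrite rules would otherwise do their work, might be sketched as follows. The script path and the function name remap_legacy are hypothetical.

```apache
LoadModule lua_module modules/mod_lua.so

# Call remap_legacy() from the given script while Apache maps the
# request URL to a file or back end
LuaHookTranslateName "/etc/httpd/lua/rewrite.lua" remap_legacy
```

The function named here would receive the request object, inspect fields such as the URI with ordinary if statements, and return either apache2.OK when it has handled the mapping or apache2.DECLINED to let Apache carry on as usual.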

Get Ready for 2.4

Naturally, I've only been able to address the highlights here, but a lot is going on with Apache 2.4, and it's well worth the time to look over the release notes to see what's new. The final release of Apache 2.4 is expected in May 2011. Because it's an open source project, the release date could slip a bit – but barring a major problem with the 2.4 release, expect it to be done in that time frame.

Do you need to upgrade right away? That depends entirely on how you're using Apache and how demanding your environment is. If you're making light use of Apache and the new features for load balancing and such won't give you any major improvements, then it's probably best to stand pat. Also, third-party software and projects that work with Apache need to be tested and vetted with Apache 2.4, so it might not be as simple as just dropping it in. If you're getting Apache via a Linux distribution or another downstream project or vendor (e.g., Oracle and Solaris), then it's going to be a while before Apache 2.4 shows up on your doorstep.

But, if you can use the features in 2.4, it's time to start testing right away. The move to 2.4 should be fairly trivial, but you might run into a few gotchas or areas that need to be addressed before you can upgrade.

Most notably, all third-party modules for Apache will need to be recompiled before they can be used with 2.4. According to Apache's documentation, "many" of the third-party modules should work without changes for Apache 2.4 (aside from recompiling), depending a great deal on the APIs that the modules use. Apache 2.4 adds and changes a number of APIs, and some of the changes could affect the way modules behave. The ability to use per-module logging, for example, could affect some modules.

Modules that use the ap_log_* calls need to be changed to add the APLOG_MODULE_INDEX argument. The ap_default_type API has gone away, the unixd_config API has changed to ap_unixd_config, and a number of other changes could affect modules in 2.4. See "API Changes in Apache HTTP Server 2.4 since 2.2" [1] for the full set of changes.

Some configuration adjustments might be necessary for 2.4. For example, if you're using the KeepAlive directive, you now only have the On or Off option. A number of directives (e.g., AcceptMutex, SSLMutex, and RewriteLock) have been replaced with the Mutex directive, so you need either to eliminate these directives or replace them with Mutex. The MaxRequestsPerChild directive has been renamed MaxConnectionsPerChild to describe more accurately what it does. Again, Apache's docs are very complete on the changes from 2.2 to 2.4, so be sure to read the documentation [2]. Kudos to the Apache folks for their documentation. Although at times it's a bit terse, it is very complete.
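As a rough before-and-after sketch, a 2.2 configuration that used the older directives might be adjusted along these lines. The lock paths and the connection limit are invented for illustration, and a single default Mutex line is only one possible replacement.

```apache
# Apache 2.2 style: remove or comment out under 2.4
#AcceptMutex fcntl
#SSLMutex file:/var/run/ssl_mutex
#RewriteLock /var/lock/rewrite.lock
#MaxRequestsPerChild 10000

# Apache 2.4 style: one Mutex directive sets the default mechanism
# and lock directory for all mutexes
Mutex file:/var/lock/httpd default
MaxConnectionsPerChild 10000
KeepAlive On
```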

The Apache 2.4 release, although a fairly modest bump compared with the move from the 1.x series to 2.0.x, is a pretty important release. If you work with Apache in any capacity, now's the time to start becoming familiar with the new release and to get ready for the eventual transition.