
Sharing the load with Apache httpd and perhaps Tomcat

Archive - Originally posted on "The Horse's Mouth" - 2007-03-29 17:25:57 - Graham Ellis

"Can you show us how to share the load of a web site between various servers?" is one of the questions that comes up quite often on the more advanced web server configuration courses that we run. And, yes, I can, but I'll probably first ask you a lot of extra questions about exactly how you want to balance the load.

Update - February 2013. The principles remain very much the same as when this article was written six years ago. However, mod_proxy_balancer is now the dominant standard and you should consider it first and foremost, ensuring that you have Apache httpd 2.2.11 or later for correct interaction with mod_rewrite. 2.2.11 itself is now rather mature, so any recent release (2.2.x or 2.4) will do the trick!

At one extreme, you can set up your web site so that certain requests (perhaps those for all the URLs within one group of directories) get forwarded to one server, and other requests (from other directory groups) get forwarded to other servers. Using Apache httpd's mod_proxy, this forwarding is easy to set up, but it relies on a manual assignment of traffic for different site areas to different servers, and it won't in itself adjust dynamically as the load changes.
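The directory-based approach amounts to a fixed lookup table. The short Python sketch below models only that routing decision; it is not httpd configuration, and the prefixes and back-end URLs are hypothetical. As with a list of ProxyPass rules, the entries are checked in order and the first match wins, so the more specific prefixes are listed ahead of the catch-all.

```python
# Hypothetical routing table standing in for a set of static forwarding
# rules: each directory group is pinned to one back-end server.
# Rules are checked in order and the first match wins, so the more
# specific prefixes must come before the "/" catch-all.
ROUTES = [
    ("/shop/", "http://backend-a.example:8080"),
    ("/forum/", "http://backend-b.example:8080"),
    ("/", "http://backend-c.example:8080"),  # everything else
]


def pick_backend(path: str) -> str:
    """Return the back end for a request path using first-match prefix rules."""
    for prefix, backend in ROUTES:
        if path.startswith(prefix):
            return backend
    raise LookupError(f"no route for {path}")


if __name__ == "__main__":
    for p in ["/shop/basket", "/forum/thread/12", "/about.html"]:
        print(p, "->", pick_backend(p))
```

Nothing in the table reacts to load: if /shop/ gets busy, someone has to edit the table by hand, which is exactly the limitation described above.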

At the other extreme, if you run a Tomcat Cluster then you can send requests to any one of the cluster members and they'll handle those requests, even continuing sessions that were started on other cluster members. The price, though, is high in terms of network traffic between cluster members, since session state has to be replicated across the cluster, and for most applications that overhead is unnecessary.

A sensible compromise is to use Apache httpd's mod_rewrite to forward a visitor's initial request to a server selected at random from a list, and then to identify subsequent requests that are part of the same session and forward them to the same server. In other words, a new visitor's initial contact is made with the next available agent, but once the connection has been established, that visitor will keep coming back to the same agent. This approach allows a series of requests to be made throughout the domain (which the first solution did NOT allow), but avoids the need for every agent to keep every other agent informed of each intermediate step in each transaction in case the visitor comes back to a different server next time.
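The per-request decision in this compromise can be modelled in a few lines of Python. This is a sketch of the logic only, assuming that the chosen server's id travels back and forth in a cookie; the server ids, URLs and the route() function are hypothetical, and in httpd itself the random pick would come from mod_rewrite (for example a RewriteMap of type rnd) rather than from Python.

```python
import random
from typing import Optional, Tuple

# Hypothetical pool of back-end servers. The short id is what would be
# stored in the visitor's cookie so later requests can find the same server.
BACKENDS = {
    "s1": "http://app1.example:8080",
    "s2": "http://app2.example:8080",
    "s3": "http://app3.example:8080",
}


def route(cookie_route: Optional[str], rng: random.Random) -> Tuple[str, str]:
    """Return (route_id, backend_url) for one request.

    A visitor with no valid route cookie is treated as new and sent to a
    server picked at random. A visitor whose cookie names a known server
    goes straight back to it, so session state never has to be shared.
    """
    if cookie_route in BACKENDS:
        return cookie_route, BACKENDS[cookie_route]
    route_id = rng.choice(sorted(BACKENDS))
    return route_id, BACKENDS[route_id]


if __name__ == "__main__":
    rng = random.Random(42)
    rid, url = route(None, rng)   # first contact: random choice
    print("new visitor ->", rid, url)
    print("returning  ->", route(rid, rng))  # sticks to the same server
```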

As from Apache 2.2, there's an extra (and even better and more flexible) tool for sharing the load on a "per session" basis. mod_proxy_balancer allows you to configure a number of servers to accept forwarded load, spreading the load between them based either on the number of requests made of each OR on the traffic each is generating. It improves on mod_rewrite in that the forwarding is calculated from these counts rather than chosen at random, so mod_proxy_balancer makes self-tuning adjustments and reduces the forwards to busy servers if it needs to (if, for example, one particular session is burning up disproportionate resources). mod_proxy_balancer can even be told that different servers have different available resources, so the forwarding doesn't have to be equal.
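The "number of requests" method (lbmethod=byrequests) is described in the httpd documentation as a weighted request-counting scheme, and the Python below is a direct transcription of that published pseudocode, offered as an illustration rather than httpd's actual C code. Each worker's weight corresponds to the loadfactor you can give a BalancerMember (lbfactor in the documentation's pseudocode); the worker names are hypothetical, and the sketch leaves out the traffic-based method (bytraffic) and the skipping of failed workers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    name: str
    lbfactor: int = 1   # relative share of requests this worker should receive
    lbstatus: int = 0   # running score used by the selection step


def pick_byrequests(workers: List[Worker]) -> Worker:
    """Weighted request counting, following the httpd docs' pseudocode.

    Every worker's score rises by its own weight; the highest score wins
    and is then pulled back down by the total weight, so over time each
    worker is chosen in proportion to its lbfactor.
    """
    if not workers:
        raise ValueError("no workers to choose from")
    total = 0
    candidate = None
    for w in workers:
        w.lbstatus += w.lbfactor
        total += w.lbfactor
        if candidate is None or w.lbstatus > candidate.lbstatus:
            candidate = w
    candidate.lbstatus -= total
    return candidate


if __name__ == "__main__":
    pool = [Worker("small", 1), Worker("big", 2)]
    print([pick_byrequests(pool).name for _ in range(6)])
```

With weights of 1 and 2 the picks settle into the repeating pattern big, small, big, so over 300 requests the small server gets 100 and the big one 200. With equal weights the same code reduces to plain round robin.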

Other capabilities of mod_proxy_balancer include robustness features such as server timeouts and automatic add-back of servers that go off line and later come back on, as well as hot-swap and standby capabilities so that load can be taken over by a backup system if the main systems fail. mod_proxy_balancer is an Apache httpd module; traffic can be forwarded to other instances of httpd (as you would do if you were balancing a language such as PHP) or to Tomcat (if you were balancing Java).
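The failover behaviour can be sketched the same way. In httpd these features are driven by worker parameters such as retry (how long a worker in an error state is left alone before httpd tries it again; 60 seconds by default) and status=+H (which marks a member as a hot standby). The Python below is a simplified model under those assumptions, with hypothetical member names; it only decides which members are eligible for the next request, and choosing among them would be the job of a method such as byrequests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    standby: bool = False              # models a hot-standby member
    failed_at: Optional[float] = None  # time the member was last marked as failed


RETRY_SECONDS = 60.0  # a failed member is skipped for this long, then retried


def eligible(m: Member, now: float) -> bool:
    """True if the member never failed or its retry period has elapsed."""
    return m.failed_at is None or now - m.failed_at >= RETRY_SECONDS


def candidates(members: List[Member], now: float) -> List[Member]:
    """Return the members that may take the next request.

    Normal members are preferred; hot-standby members are only offered
    when every normal member is currently out of service.
    """
    active = [m for m in members if not m.standby and eligible(m, now)]
    if active:
        return active
    return [m for m in members if m.standby and eligible(m, now)]


if __name__ == "__main__":
    pool = [Member("main1"), Member("main2"), Member("backup", standby=True)]
    pool[0].failed_at = pool[1].failed_at = 100.0
    print([m.name for m in candidates(pool, 120.0)])  # both mains are down
    print([m.name for m in candidates(pool, 170.0)])  # retry period is over
```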

We do not offer a specific public web server balancing course, but we're familiar with these technologies and we can cover them as appropriate (and with practicals) during private courses. Please get in touch if you would like such training; even with just one or two delegates, we do have ways and means of training you cost-effectively!