Distributed, Balanced and Clustered Load Sharing - the difference

Archive - Originally posted on "The Horse's Mouth" - 2012-10-13 16:15:41 - Graham Ellis

If one web server isn't enough to handle all your traffic, you can share the load. But you need to be careful that you "maintain state" for your visitors if you're running applications that involve a series of forms / inputs that follow on from each other.

Specialist hardware load sharing devices are available, though they often turn out to be dedicated computers running Linux and a specialised piece of software. Such devices sit at the public entrance to your website - your front door - and distribute incoming traffic across the servers behind them. If you don't want the expense of a hardware load sharing device, you'll often find that a web server such as Apache httpd, running mod_proxy and perhaps mod_proxy_balancer, will do just as good a job for you. After all, a restaurant with 10 waiters / waitresses only needs one maître d'hôtel to seat people and to check them out at the end, so you'll only need one front door server even if you have a lot of web servers doing the real work behind it.

There are a number of ways of sharing the load of a single website across multiple servers.

In a distributed setup, certain folders / subdirectories are forwarded to one second tier server, other folders to another second tier server, and so on.
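
As a rough illustration of the distributed approach, here is a minimal Python sketch of the decision the front door server makes. The folder names, the host names and the choose_backend function are all made up for the example; a real front end would hold this mapping in its own configuration rather than in code like this.

```python
# Hypothetical mapping of URL folders to second tier servers.
# The paths and host names are invented for illustration only.
ROUTES = {
    "/shop/": "http://app1.internal:8080",
    "/forum/": "http://app2.internal:8080",
}

# Anything that matches no folder above goes here (also an assumption).
DEFAULT_BACKEND = "http://app3.internal:8080"


def choose_backend(path: str) -> str:
    """Return the second tier server for the longest matching folder prefix."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return DEFAULT_BACKEND
    # Prefer the most specific (longest) prefix if several match.
    return ROUTES[max(matches, key=len)]
```

Because the choice depends only on the URL, every request for a given folder lands on the same machine, so any state kept under that folder stays in one place.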

In a balanced setup, NEW connections are forwarded based on a balancing algorithm to a series of near-identical second tier servers. The balancing may be as simple as "round robin", where each second tier server takes its turn to get new clients, or the top (load balancing) server may look at queue length, or even at how quickly each server in the second tier is responding, as it makes its choice. RETURNING (old) connections are passed back to the same server that handled their previous requests, so that a user's series of forms will all be handled by the same machine.
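
The new / returning distinction can be sketched in a few lines of Python. This is only an illustration under assumptions: the StickyBalancer class and the idea of identifying a visitor by a session id (typically carried in a cookie) are mine, and round robin is used because it is the simplest of the algorithms mentioned above.

```python
from itertools import cycle


class StickyBalancer:
    """Round robin for new sessions; returning sessions keep their server."""

    def __init__(self, backends):
        # Endless iterator that hands out the backends in turn.
        self._next_backend = cycle(backends)
        # Remembers which backend each known session was given.
        self._assigned = {}

    def route(self, session_id):
        if session_id not in self._assigned:
            # NEW connection: take the next server in the rotation.
            self._assigned[session_id] = next(self._next_backend)
        # RETURNING connection: same server as before.
        return self._assigned[session_id]


balancer = StickyBalancer(["server_a", "server_b"])
first = balancer.route("visitor-1")   # new visitor, gets server_a
second = balancer.route("visitor-2")  # new visitor, gets server_b
again = balancer.route("visitor-1")   # returning visitor, server_a again
```

The table of assignments lives only on the top server, which is why the second tier servers need no knowledge of each other in this model.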

In a clustered setup, all connections are forwarded based on a balancing algorithm, and no account is taken of whether or not the connection is a new one, so requests can end up at any of the second tier servers. To make that work, each second tier server broadcasts (multicasts) the user's current data to all the other servers after each request is handled. That way, whichever server is selected next, the data will be on hand for it to work correctly.
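
To show the idea of copying state around a cluster, here is a small in-memory Python sketch. It is a simplification under stated assumptions: the Node class and make_cluster helper are invented for the example, and a direct write into each peer's dictionary stands in for the network multicast that a real cluster would use.

```python
class Node:
    """One second tier server holding a copy of every user's session data."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session id -> dict of form data so far
        self.peers = []     # the other nodes in the cluster

    def handle(self, session_id, field, value):
        # Work on the data that earlier requests (on any node) left behind.
        data = dict(self.sessions.get(session_id, {}))
        data[field] = value
        self.sessions[session_id] = data
        # Stand-in for the multicast: push a copy to every other node.
        for peer in self.peers:
            peer.sessions[session_id] = dict(data)
        return data


def make_cluster(names):
    """Build nodes and tell each one about all the others."""
    nodes = [Node(name) for name in names]
    for node in nodes:
        node.peers = [peer for peer in nodes if peer is not node]
    return nodes


nodes = make_cluster(["a", "b", "c"])
nodes[0].handle("visitor-1", "step1", "first form")            # lands on a
result = nodes[2].handle("visitor-1", "step2", "second form")  # lands on c
```

Even though the two requests went to different machines, result holds the data from both forms, which is the behaviour that lets the balancer ignore whether a connection is new or returning.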