Automated server heartbeat and health check

Archive - Originally posted on "The Horse's Mouth" - 2009-01-16 07:49:14 - Graham Ellis

Occasionally - very occasionally - we may have a problem on our public facing web server that's hosted in some network operation centre or other. It could be a software glitch, or it could be an internet connectivity issue. And we need to know about it quickly!

The problem is that none of us might actually be using the server at the time. I could be giving a private course in Kent, Lisa may be in the office confirming a public course booking, and Chris could be at home in Calne. We have some excellent customers who have been known to alert us by email (thanks, customer Chris, the other day!) but really that should not be necessary - in fact, we should find the problem before it gets widely noticed.

I had a bit of a rant the other day about someone who's running a script that pulls a page off our site every five minutes to see if we're still running (see here), but it also set me thinking that we could monitor our own server in a similar way - it's a very different matter to monitor yourself than to monitor someone else uninvited, after all! And we do have a second (backup) server.

So ... here's what we are now doing:

a) I have installed a standalone PHP program on our backup server which checks with our main server and emails all three of us "techies" if the main server does not respond. (source code)

b) We have a regularly timed (crontab) job that runs this program 4 times an hour. The crontab line is included in the source code as a comment / example.
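To give a feel for the two steps above, here is a minimal sketch of such a checker. It is written in Python rather than the PHP of our real program, and it is not our actual source code: the URL, the email addresses, the timeout, the localhost mail server, the script path in the crontab comment and the `--run` switch are all made-up assumptions for illustration.

```python
# Example crontab line (4 times an hour); the script path is hypothetical:
# */15 * * * * /usr/bin/python3 /path/to/heartbeat.py --run

import http.client
import smtplib
import sys
import urllib.request
from email.message import EmailMessage

# Hypothetical values, not taken from the original program.
MAIN_SERVER_URL = "http://www.example.com/status.php"
RECIPIENTS = ["tech1@example.com", "tech2@example.com", "tech3@example.com"]
TIMEOUT_SECONDS = 20


def fetch(url, timeout=TIMEOUT_SECONDS):
    """Return the page body as text; network and HTTP errors raise OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_server(url, fetcher=fetch):
    """Return None if the server answered, else a short failure description."""
    try:
        fetcher(url)
    except (OSError, http.client.HTTPException) as problem:
        return f"{type(problem).__name__}: {problem}"
    return None


def build_alert(url, failure, recipients=RECIPIENTS):
    """Compose the warning email that goes to everyone on the list."""
    message = EmailMessage()
    message["Subject"] = f"Heartbeat FAILED for {url}"
    message["From"] = "heartbeat@backup.example.com"  # hypothetical sender
    message["To"] = ", ".join(recipients)
    message.set_content(f"No response from {url}\nReason: {failure}\n")
    return message


def main():
    failure = check_server(MAIN_SERVER_URL)
    if failure is not None:
        # Assumes a mail server is listening on localhost on the backup machine.
        with smtplib.SMTP("localhost") as mailer:
            mailer.send_message(build_alert(MAIN_SERVER_URL, failure))


# The --run switch (an assumption of this sketch) keeps the script from
# sending mail when it is merely imported or tried out by hand.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

Passing the fetcher in as a parameter means the check can be tried out with a fake fetcher, without touching the network or sending any mail.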

And it can be that easy!

I have chosen to go one stage further - the page I am calling up on the public server is actually a status line generator, so that our backup server can do more than just say "live" or "dead" - it can also say "live but looking a bit sick" if it needs to. The status script is also called up from within my Ajax Demonstration, and you can see the source code here if you wish.
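As a rough illustration of the three-way verdict, here is a Python sketch of a status line and of the check the backup server might make on it. The line format, the two measurements (load average and disk usage) and the thresholds are all invented for this example; our real status script may report quite different things.

```python
import re

# Hypothetical thresholds; what counts as "sick" is an assumption here.
MAX_LOAD = 4.0
MAX_DISK_PERCENT = 90

# Invented status line format, e.g. "STATUS load=0.42 disk=71"
STATUS_PATTERN = re.compile(r"^STATUS load=(\d+(?:\.\d+)?) disk=(\d+)$")


def make_status_line(load_average, disk_used_percent):
    """What the main server's status page might print."""
    return f"STATUS load={load_average:.2f} disk={disk_used_percent:d}"


def classify(body):
    """Turn a fetched status page (None means no response) into a verdict."""
    if body is None:
        return "dead"
    match = STATUS_PATTERN.match(body.strip())
    if match is None:
        # Something answered, but not with a status line we recognise.
        return "live but looking a bit sick"
    load, disk = float(match.group(1)), int(match.group(2))
    if load > MAX_LOAD or disk > MAX_DISK_PERCENT:
        return "live but looking a bit sick"
    return "live"
```

For example, `classify(make_status_line(0.42, 71))` gives "live", while a page that comes back as an error message rather than a status line is reported as "live but looking a bit sick".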

(As an extra, I should have our main server heartbeating our backup server in an equal and opposite arrangement, so that we'll be notified if either one falls over.)