Web Server Admin - some of those things that happen, and solutions

Archive - Originally posted on "The Horse's Mouth" - 2015-05-10 13:57:40 - Graham Ellis
The technical "Buck" with our web sites stops with me - and any little admin issues that come up while I'm away on holiday need to be sorted - and of course Murphy's law states that a problem will always happen at the worst possible time. And so it's been with a couple of never-before issues that have arisen during the past week - while we've been on the "Quantum of the Seas" travelling from Bayonne (New Jersey) across the Atlantic - as I write, we're approaching the Straits of Gibraltar.

I use our server log graphs as one of our tools to keep an eye on things, together with having each of our servers visiting each other from time to time, and regular (crontab) backup jobs which email me to confirm correct operation (or otherwise). Net result is that my email usually flags up problems within an hour, and I can then see what's up.
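As a rough illustration of the "backup job that emails me" idea, here is a minimal Python sketch. It is not the actual script used on these servers: the function names, the subject line format and the local SMTP host are all assumptions made for the example. It runs a backup command, then builds a status email whose subject says OK or FAILED.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def run_backup(command):
    """Run the backup command; return its exit code and combined output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


def build_report(job_name, returncode, output, sender, recipient):
    """Build a status email for one backup run (nothing is sent here)."""
    status = "OK" if returncode == 0 else "FAILED"
    msg = EmailMessage()
    # Hypothetical subject format, chosen so a mail filter can spot failures.
    msg["Subject"] = f"[backup {status}] {job_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(output or "(no output)")
    return msg


def send_report(msg, host="localhost"):
    """Hand the message to an SMTP server (assumed to listen on `host`)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Called from crontab once an hour, a script built on these functions reports both ways: a FAILED subject flags a broken backup, and an email that fails to arrive at all is a clue in its own right.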

First NEW problem was the MySQL server quitting unexpectedly - see [here] for a full description of how that manifested itself. But that story has moved forward, with the problem recurring twice and the initial fix not working the second time - the error was solid. Taking a hunch and a hint from the "google hell" that this problem leads to, I wondered if we might have had some sort of memory leak and rebooted the server - and (touch wood) two days later that appears to have fixed the problem.

Second problem - yesterday morning - the regular hourly emails telling me of successful MySQL database backup for the fast-changing First Great Western Coffee Shop stopped arriving. A quick look at the site, all seemed OK. Similarly a quick look at the web server and all seemed to be in order. A report from one of my team that he'd had trouble getting images was followed up by another saying the problem had gone away after half an hour or so.

Our email uses a shared hosting service, where the administrators do a very good continual job of keeping spamassassin databases up to date - a look at our main server then showed spamassassin and procmail going through incoming emails up to a certain time, but then external emails drying up, and the only emails being received were from the system there itself. Odd. And I was unable to reach their admin server.

Modern technology is wonderful - in the middle of the Atlantic, a phone call to Hurricane Electric in California and - as ever - the phone is answered, without queueing, by a technical person who knows what he's about. And we look at our shared system, the files there and the logs - and he goes (brief hold) for a chat with his admins and tracks some of our emails at their gateways too (telling me all about the emails I should have been getting). I love this company - not the cheapest, but I have yet to find better customer support anywhere. And good that they knew what the problem was within 5 minutes (so the phone call was $$ per minute, I expect, but the overall cost to resolve low) ... problem turns out that we hadn't renewed the domain name and our name registrar (THAT company will remain nameless) appears to have switched "send reminders" off on our account. Oops.

Long story cut short: I haven't a clue as to the password for our name registration account, so resort once again to the phone, and an answer to a secret question. They have me by the "short and curlies" of course ... but at least the registration was back in place within one to two hours, and percolated through DNS caching subsequently. Lesson learned, diary note for the start of May 2024 that we need to renew again.
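A diary note can be backed up by a small scheduled check that does not depend on the registrar sending reminders. The sketch below is only an illustration: the expiry date is assumed to be typed in by hand (it is not fetched from the registrar), and the function names and the 60 day warning window are made up for the example.

```python
from datetime import date


def days_until_expiry(expiry, today=None):
    """Whole days from `today` (default: the current date) to `expiry`."""
    today = today or date.today()
    return (expiry - today).days


def renewal_warning(domain, expiry, today=None, warn_days=60):
    """Return a warning string if renewal is due or overdue, else None."""
    remaining = days_until_expiry(expiry, today)
    if remaining < 0:
        return f"{domain}: registration EXPIRED {-remaining} days ago"
    if remaining <= warn_days:
        return f"{domain}: registration expires in {remaining} days"
    return None
```

For example, `renewal_warning("example.com", date(2024, 5, 10), today=date(2024, 5, 1))` returns `"example.com: registration expires in 9 days"`. Run daily from crontab, with any non-empty result emailed out, this would nag well before the name stops resolving.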

Now that I know what the problems were, and now that the system has bounced back, I can look at the log clues and learn from them ...

• The big spikes during the night are scheduled backups
• The drops to near-zero on 6th and 7th May were when the MySQL server was stopped - and that really shows just how much our site is database driven!
• The drop on the orange (9th) line relates to reduced traffic while images and other pages weren't being requested because the domain name wasn't being resolved for potential visitors.
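The same clues can be pulled out of the raw access log without a graph. What follows is a hedged sketch, not the method behind the graphs above: it assumes Apache-style timestamps such as `[09/May/2015:10:00:01 +0000]`, and the `baseline` of typical hits per hour and the 25% threshold are invented for the example.

```python
import re
from collections import Counter

# Matches the day and hour of an Apache-style timestamp in a log line.
STAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [+-]\d{4}\]")


def hits_per_hour(lines):
    """Count log lines per (day, hour) slot; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        match = STAMP.search(line)
        if match:
            day, hour = match.groups()
            counts[(day, int(hour))] += 1
    return counts


def quiet_hours(counts, day, baseline, fraction=0.25):
    """Hours of `day` whose hits fell below `fraction` of the usual level.

    `baseline` maps hour -> typical hit count (an assumed, hand-set figure).
    Hours with no hits at all are reported too, since they count as zero.
    """
    return [hour for hour, expected in sorted(baseline.items())
            if counts.get((day, hour), 0) < fraction * expected]
```

Feeding it a day's log lines and a baseline would list the hours that dropped away, which is the same picture as the near-zero stretches on the 6th, 7th and 9th.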

Good to see the hard black line for today running very much as I would wish. Take a look at how we generate those graphs [here].

Looking back to our old (and very crude) log file size graph this morning (see [here]) I note a drop in log file size yesterday ... only to be expected:

Finally, looking at our Google Analytics for the past 10 days - firstly the Wellho.net site with the DNS lost, then the one that makes massive database use:

To a very great extent, these reports are shutting the door after the horse has bolted - they show what damage was done (or rather they give it scale) and they confirm at the next increment that the problem appears to be fixed - not only to us, but also to our worldwide web site visitors, and that's important.
