Needle in a haystack - finding the web server overload
Archive - Originally posted on "The Horse's Mouth" - 2011-09-18 16:05:50 - Graham Ellis
Analysis - look at the symptoms, and note that that server has started very seriously swapping at the point that the load has built up. So the problem seems to lie more in memory than in cpu utilisation. And once swapping starts, the throughput drops which means that the server can't cope with the regular load. Come along to a shop and see a long queue, and you'll go elsewhere; come to a webserver and you'll join the queue because you don't realise how long it is, so once there's an overload it's not going to be naturally self-righting.
So - with over 100,000 accesses to our web server each day, how can I find which are the one or two that are causing the problem?
a) I instigated logging as described the other day - [here]. That tells me the process number for each request, and also the time it takes to complete.
b) I instigated a top report to run every minute on the server, logging the memory usage:
top -b -d 60 >> logfile
c) About a day later, I looked for large server threads in the top log and, low and behold:
-bash-3.2$ grep '[0-9][0-9][0-9]m' toplog
1955 apache 20 0 489m 479m 3108 S 4 54.3 0:04.01 httpd
4180 apache 20 0 418m 409m 2972 S 3 46.3 0:02.31 httpd
25587 apache 20 0 195m 188m 3456 S 2 21.3 0:01.75 httpd
25587 apache 20 0 564m 557m 3456 S 4 63.1 0:03.98 httpd
29989 apache 20 0 410m 400m 3132 S 4 45.4 0:02.91 httpd
-bash-3.2$
That's a couple of individual threads which exceeded half a gigabyte ... when the normal size doesn't exceed 25 Mbytes.
d) An examination of the requests served by threads 4180, 25587 and 29989 ...
grep 25587 extra_log
revevealed a commonality of records ...
82.46.91.71 www.wellho.net 20110917.181334 200 200 917852 225 25587 X "en-gb"
"gzip, deflate" "GET /demo/imgcatalog.php HTTP/1.1" "-" "http://www.google.com/
imgres?q=ashchurch+military+watertower&um=1&hl=en&biw=1076&bih=586&tbm=isch&
tbnid=O4TnFYX7AKtYwM:&imgrefurl=http://gb.wellho.net/demo/imgcatalog.php&
docid=-PY_YfY_Je0VRM&itg=1&w=900&h=675&ei=LdV0TtnHJISx8QPrmdH9DQ&zoom=1"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3
(KHTML, like Gecko) Version/5.1 Safari/534.48.3"
199.71.214.212 www.wellho.net 20110917.195838 200 200 918861 49 29989 - "-"
"-" "GET /demo/imgcatalog.php HTTP/1.0" "-" "http://www.wellho.net/demo/imgcatalog.php"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322;
.NET CLR 2.0.50727)"

The script turns out to include the following MySQL query:
$q = mysql_query("select * from im_library order by iid desc");
which looks simple and easy ... except that the im_library table contains most of the images we use on our web site - there are now some 7,500 of them - and the query was written in such a way that it was returning all the text data and each complete image too to the PHP script - and that's a huge amount of data.
e) rewrite the query to return only the fields needed:
$q = mysql_query("select filename,descriptor,iid from im_library order by iid desc");
(and test to make sure that the script still performs, of course). And then bounce on the link a few times to check that it's rather more robust.
Problem - I THINK - solved.
Moral of story: Check your database queries to ensure they're not returning far more than they need to!.
I'm now fairly sure the problem has been fixed. Looking at various server graphs, there's a distinct change in pattern over the past 24 hours since I put the fix in. And I've been having a good "bash" today too - in fact improving things so that the script now links to a page that will search out when the images are used too ...
The image catalogue may be found [here]. I'm sufficiently sure that it's fixed to encourage you to visit that page. And you can link through from there to details of any image. You can also find a random image, and details about it, [here]. I have a private PHP course coming up this week, so I expect you may find some enhancements to the scripts while I'm in PHP mode ;-)
Images accompanying this article? Random ones from our database ...