Perl - still a very effective language indeed for extracting and reporting
Archive - Originally posted on "The Horse's Mouth" - 2014-09-20 19:09:00 - Graham Ellis
Perl remains a marvellous programming language for a number of applications, including quick / one-off scripts which may be needed to manipulate data in ways that vary from day to day and week to week - research type work, if you like.
We manage / look after our own web server, hosting our IT training business, our hotel, the Great Western Coffee Shop Forum, and a whole series of other lower volume sites such as the Melksham Chamber of Commerce and Industry and the TransWilts Community Rail Partnership. Various monitoring and logging scripts run on the system - from an external heartbeat which sends us an email if the server's not responding within predefined metrics (or not responding at all), through a script which takes a snapshot of vital parameters every few minutes, to our web server logs which record details of each and every completed web server access.
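By way of illustration - and this is only a sketch, with the URL, timeout and email addresses as placeholder assumptions rather than our real settings - an external heartbeat of that sort can be as little as this:
#!/usr/bin/perl
# External heartbeat sketch - the URL, timeout and addresses below
# are illustrative placeholders, not our real configuration
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(timeout => 30);
my $response = $ua->get("http://www.example.com/status");

unless ($response->is_success) {
  open my $mail, "|-", "/usr/sbin/sendmail -t" or die "Cannot run sendmail: $!";
  print $mail "To: admin\@example.com\n";
  print $mail "Subject: Server heartbeat failed\n\n";
  print $mail "Status check returned: ", $response->status_line, "\n";
  close $mail;
}
Run from cron on a separate machine, something of that shape checks the server from the outside every few minutes and shouts if it gets no healthy answer.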
With all this data ... when we want to find the reason behind a changing pattern, we're looking for a needle in a haystack - over 200,000 access records / over 50 Mbytes of logging data in some 24 hour periods. We need the "Practical Extraction and Reporting Language".
Here are a couple of examples of Perl code that I've used so far this month, during which time we've had a couple of pattern changes on the server that I wished to investigate:
while (<>) {
  # Pull the HH:MM:SS start time out of the log line
  ($h, $m, $s) = /:(\d+):(\d+):(\d+)/;
  $sid = $h * 3600 + $m * 60 + $s;   # seconds into the day
  # Lines are in completion order, so a start time well behind the
  # latest one seen means this request took a long time to finish
  print if ($sid - $oldsid < -450);
  $oldsid = $oldsid > $sid ? $oldsid : $sid;
}
This identifies slow-running requests to the server - ones that take over 450 seconds to complete. That isn't quite as simple as it sounds, since only the start time of each request is recorded; but the records are written in order of completion, so the Perl job looks back at previous record times ...
while (<>) {
  @n = split;      # first field is the visitor's address
  $c{$n[0]}++;     # count accesses per visitor
}
# Report visitors in ascending order of access count
for $k (sort {$c{$a} <=> $c{$b}} keys %c) {
  printf "%6d %s\n", $c{$k}, $k;
}
And that piece of code counts accesses from each different visitor to the server, printing them out with the most frequent visitor listed last.
Each of these scripts was a vital step in identifying issues that were affecting server performance. One of our housekeeping scripts was being run rather more often than it should have been by visiting search engine crawlers (so we modified that script to ensure it's not run more than once every 24 hours), and we were getting up to 300 visits in 5 seconds from a system in Ukraine. I don't know exactly what that was, but it was up to no good, with accesses from the same computer claiming to be from Chrome, Firefox and Internet Explorer browsers ... I suspect it was an automaton that was either out of control or malicious. A quick fix there is to have our web server simply send back a "forbidden" (403 code) response, which it can do efficiently without any noticeable effect on performance!
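If you want to spot that sort of burst in your own logs, a few more lines of Perl in the same style will do it. This is a sketch rather than the exact script we ran - it assumes the common log format where the client address is the first field, and the 100-hits-in-5-seconds threshold is just an illustrative figure:
#!/usr/bin/perl
# Burst detector sketch - field position and threshold are assumptions
use strict;
use warnings;

my %hits;
while (<>) {
  my ($ip) = split;                                  # client address - first field
  my ($h, $m, $s) = /:(\d+):(\d+):(\d+)/ or next;    # time within the day
  my $bucket = int(($h * 3600 + $m * 60 + $s) / 5);  # 5 second buckets
  $hits{"$ip in period $bucket"}++;
}
for my $key (sort keys %hits) {
  print "$hits{$key} hits in 5 seconds from $key\n" if $hits{$key} > 100;
}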
I have just given you the bare bones scripts above. We strongly recommend that if you're using things like this on a regular basis, you comment your scripts well, you use better variable names, you add flexibility by allowing different values to be fed in, and you refrain from hard coding constant values that you might want to change in the future. Such techniques are taught on both our Learning to program in Perl and Perl Programming courses, which run a number of times each year at our Melksham, Wiltshire training centre or can be run privately if you've got a group of delegates.
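As an illustration of those recommendations - and it's just that, an illustration, with the names and the 450 second default being my choices here rather than a production script - here's the first example again, commented, with clearer names and with the threshold taken from the command line:
#!/usr/bin/perl
# Report requests which took a long time to complete, based on start
# times recorded in completion order (illustrative rework)
use strict;
use warnings;

# Take a numeric threshold from the command line if given, else 450 seconds
my $threshold = ($ARGV[0] // '') =~ /^\d+$/ ? shift : 450;
my $latest_start = 0;

while (<>) {
  # Pull the HH:MM:SS start time out of the log line
  my ($hour, $min, $sec) = /:(\d+):(\d+):(\d+)/ or next;
  my $start = $hour * 3600 + $min * 60 + $sec;

  # A start time well behind the latest seen so far means this
  # request took over $threshold seconds to complete
  print if $latest_start - $start > $threshold;

  $latest_start = $start if $start > $latest_start;
}
You would run that as perl findslow access_log, or as perl findslow 300 access_log to tighten the threshold to 300 seconds - no editing of the script needed when the question changes.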