Firefighting with Perl
Archive - Originally posted on "The Horse's Mouth" - 2009-09-07 08:22:13 - Graham Ellis
If the building is blazing, you'll go right ahead and put the fire out with the extinguisher. Hopefully you'll have been trained to use the right sort of extinguisher on the right fire, but you won't leave the fire to blaze just because the "date last serviced" record has become smudged and you don't know if the extinguisher is technically overdue for overhaul - you can soon chuck it to one side if it fails!
At times, Perl can be a bit of a fire extinguisher - a language that you'll use to plug the emergency gaps, or to get a quick answer to a one-off question that's needed in response to the MD's call out from a meeting. Questions like "why is our web server running slowly - are there particular pages draining the CPU?" and "how many people have visited our newsletter page since we sent out the mailing last week?"
Such code need not be pretty; it can be hacked together and, provided that it works, comments / readability / maintainability / design can - temporarily - go out of the window. But even when you're writing firefighting code, ALWAYS check that your results are correct.
Finding the slow server records
Both of the questions that I raised above have been asked of me. The "slow server" one came at 3 a.m. when responses were dreadful, and I wanted to find all access records in the current log file that stepped back in time by more than six seconds. (Records are logged in the order of completion, but the time stamps are the access start time, so a record stamped well before its predecessor belongs to a request that took a long time to complete.)
The code remains in our server's live log directory to this day. It is "sort of" documented by being named "slow.pl", which is a bit of a clue, and I have inset my loops. A print statement that I used in early debugging remains commented out, and there is no validation to check that the log file really is present. Here's the code in its current state - intentionally left available in case the particular fire re-ignites:
open (FH,"access_log");
$osid = 0;
while (<FH>) {
    $l++;
    ($h, $m, $s) = /(\d\d):(\d\d):(\d\d)/;
    $sid =$s + 60*$m + 3600 * $h;
    if ($osid) {
        $moved = $sid - $osid;
        if ($moved < -6) {
            print "$moved $l ";
            print;
        }
    }
    # print "$moved\n";
    $osid = $sid;
}
It's a cool autumn morning, all is quiet and the logs were tidied up a couple of hours ago by our night porter, affectionately known as 'cron' ... so there have been absolutely no issues yet today. A typical 'Unix Utility' working on the 'no news is good news' paradigm:
-bash-3.2$ perl slow.pl
-bash-3.2$
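In keeping with "ALWAYS check that your results are correct", here is a minimal sketch of the same backward-step test wrapped in a subroutine, so it can be tried on a few known records before being trusted on a live log. The name backward_steps, the threshold parameter and the sample records are my own inventions for illustration, and the sketch assumes Apache's common log format (the post does not say which format the logs use). Under that assumption one detail matters: in a timestamp such as [07/Sep/2009:03:00:10 +0100], a bare \d\d:\d\d:\d\d matches first at the tail of the year and yields 09:03:00, so the sketch anchors the pattern on the four-digit year.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [line_number, seconds_moved, line] for each record whose timestamp
# is more than $threshold seconds earlier than the previous record's.
sub backward_steps {
    my ($lines, $threshold) = @_;
    my ($prev, @found);
    my $n = 0;
    for my $line (@$lines) {
        $n++;
        # Assumes common log format: ...yyyy:hh:mm:ss. Lines without a
        # timestamp are skipped rather than treated as 00:00:00.
        my ($h, $m, $s) = $line =~ /\d{4}:(\d\d):(\d\d):(\d\d)/ or next;
        my $now = $s + 60 * $m + 3600 * $h;
        # defined() rather than truth, so a 00:00:00 record is not ignored.
        if (defined $prev && $now - $prev < -$threshold) {
            push @found, [$n, $now - $prev, $line];
        }
        $prev = $now;
    }
    return @found;
}

# Made-up sample records, for illustration only.
my @sample = (
    '192.0.2.1 - - [07/Sep/2009:03:00:10 +0100] "GET /a.html HTTP/1.1" 200 512',
    '192.0.2.2 - - [07/Sep/2009:03:00:12 +0100] "GET /b.html HTTP/1.1" 200 512',
    '192.0.2.3 - - [07/Sep/2009:03:00:02 +0100] "GET /slow.php HTTP/1.1" 200 512',
    '192.0.2.4 - - [07/Sep/2009:03:00:13 +0100] "GET /c.html HTTP/1.1" 200 512',
);

for my $hit (backward_steps(\@sample, 6)) {
    my ($n, $moved, $line) = @$hit;
    print "$moved $n $line\n";
}
```

With these four records only the third is reported, as it steps back ten seconds from the one before it. Like slow.pl, the sketch takes no account of a log that runs across midnight.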
Counting unique visitors to a particular area
Let's look at the other question - "how many people have come to our newsletter". This is really one-off code at the moment:
#!/usr/bin/perl -na
m!GET /newsletter/! and $counter{$F[0]}++;
m!GET /newsletter/! and $tc++;
END {
    $th = keys(%counter);
    foreach $ho (sort(keys(%counter))) {
        print "$ho $counter{$ho}\n";
    }
    print "Total hits $tc ... hosts $th\n";
}
How does that work?
I run it with the input files named on the command line, and the -n switch reads each line of each file in turn into $_, which is then matched against the regular expression and counted if it matches. Each matching line is actually counted twice: once in a table (hash) keyed by the visitor's IP address - the first field on the line, which has been autosplit into @F by the -a switch - and once in a running total.
Once all the lines have been read, the END block is run once, listing all the IP addresses in order, together with counts per IP, then the final (desired) results - total accesses and total DIFFERENT hosts accessing - are printed out:
-bash-3.2$ ./pgnl ac_2009082* ac_2009083* ac_200909*
snip
89.124.xxx.xxx 4
92.237.xxx.xxx 104
92.239.xxx.xxx 4
Total hits 2823 ... hosts 131
-bash-3.2$
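For anyone unfamiliar with the switches, here is a rough sketch of what -n and -a do for you, written out by hand as an explicit loop and split. The subroutine name count_newsletter_hits and the sample records are invented for illustration; the counting logic follows the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally newsletter requests per host, the long way round.
# Returns (total hits, reference to a hash of host => hits).
sub count_newsletter_hits {
    my @lines = @_;
    my %counter;
    my $total = 0;
    for my $line (@lines) {               # -n: loop over every input line
        my @F = split ' ', $line;         # -a: split on whitespace into @F
        next unless $line =~ m!GET /newsletter/!;
        $counter{$F[0]}++;                # hits per host (first field)
        $total++;                         # hits overall
    }
    return ($total, \%counter);
}

# Made-up sample records, for illustration only.
my @sample = (
    '192.0.2.10 - - [07/Sep/2009:08:00:01 +0100] "GET /newsletter/ HTTP/1.1" 200 900',
    '192.0.2.10 - - [07/Sep/2009:08:00:02 +0100] "GET /newsletter/logo.png HTTP/1.1" 200 300',
    '192.0.2.20 - - [07/Sep/2009:08:05:00 +0100] "GET /newsletter/ HTTP/1.1" 200 900',
    '192.0.2.20 - - [07/Sep/2009:08:05:09 +0100] "GET /index.html HTTP/1.1" 200 700',
);

my ($total, $by_host) = count_newsletter_hits(@sample);
for my $host (sort keys %$by_host) {
    print "$host $by_host->{$host}\n";
}
printf "Total hits %d ... hosts %d\n", $total, scalar keys %$by_host;
```

On the four sample records this counts three newsletter hits from two hosts; the request for /index.html is ignored.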
Even there, the numbers need to be checked somewhat and read with care; each 'newsletter' hit from a 'real' visitor is four accesses - the main newsletter and 3 graphics in the same directory - and the total of 131 hosts includes our own test machines, and also crawlers such as Google. The full list of IP numbers gave me further clues, and I was able to say that I estimated that between 110 and 120 people had clicked through. Expressed as a percentage of the number of emails, and considering that's people who opened the email and took a further step as well, I have to say I'm quite pleased.
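Some of that hand correction could be scripted. The following is only a sketch under stated assumptions: that the newsletter page itself is requested as exactly "GET /newsletter/ " (so the graphics are not counted), and that the hosts to leave out are known in advance. The subroutine estimate_visitors, the exclusion list and the sample records are all hypothetical, and it is not how I arrived at the estimate above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count distinct hosts that fetched the newsletter page itself,
# ignoring graphics and any host on the exclusion list.
sub estimate_visitors {
    my ($lines, $exclude) = @_;
    my %skip = map { $_ => 1 } @$exclude;
    my %seen;
    for my $line (@$lines) {
        my ($host) = split ' ', $line;    # first field is the host
        next unless defined $host;
        next if $skip{$host};
        # Assumption: the page is requested as exactly "/newsletter/".
        next unless $line =~ m!"GET /newsletter/ !;
        $seen{$host}++;
    }
    return scalar keys %seen;
}

# Made-up sample records and exclusion list, for illustration only.
my @sample = (
    '192.0.2.10 - - [07/Sep/2009:08:00:01 +0100] "GET /newsletter/ HTTP/1.1" 200 900',
    '192.0.2.10 - - [07/Sep/2009:08:00:02 +0100] "GET /newsletter/a.png HTTP/1.1" 200 300',
    '192.0.2.10 - - [07/Sep/2009:08:00:02 +0100] "GET /newsletter/b.png HTTP/1.1" 200 300',
    '192.0.2.10 - - [07/Sep/2009:08:00:03 +0100] "GET /newsletter/c.png HTTP/1.1" 200 300',
    '192.0.2.99 - - [07/Sep/2009:08:01:00 +0100] "GET /newsletter/ HTTP/1.1" 200 900',
    '192.0.2.20 - - [07/Sep/2009:08:05:00 +0100] "GET /newsletter/ HTTP/1.1" 200 900',
);
my @own_machines = ('192.0.2.99');

print "Estimated visitors: ", estimate_visitors(\@sample, \@own_machines), "\n";
```

The sample gives two visitors: 192.0.2.10 counts once despite its four accesses, and 192.0.2.99 is left out as one of our own machines. Crawlers would still need to be spotted by eye or added to the list.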
Back to the Perl code
Would I be proud to supply this code as part of a contract? No. But am I happy with what it allowed me to do very quickly? Absolutely. I used to say that "what took me a week in C now takes me a day in Perl". That's changed, somewhat: "What took me a week now takes me a morning in Perl".