Designing your data structures for a robust Perl application
Archive - Originally posted on "The Horse's Mouth" - 2009-08-25 08:09:17 - Graham Ellis
Whatever language you're programming in, the design of your data structures is important. You should consider that design ahead of time, before you start to code: "What am I going to be doing with this data?" and "How do I want to access it?"
It is easy - VERY easy - to fall into the trap of starting to code without adequate thought (and a diagram). In a language like Perl especially, which assumes you know what you are doing and where a few bytes of code can do a lot of work, you can easily head for trouble if you haven't planned.
So here is a piece of PLANNED code! I am going to be writing code to analyse a log file. It's a web server's access log file, where each request is a separate line in the file, with the first field on each line identifying the visiting client computer.
My data design:
• I want a HASH, keyed by the visiting client computer's identity (IP address).
• the values in that hash are to be references to a LIST of accesses from that client and
• each access record is itself to be a list of individual strings from the incoming access records.
So - in summary - a Hash of lists of lists.
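To make that shape concrete before reading any real data, here is a minimal sketch of the same three tiers written out by hand. The IP addresses and field values are made-up sample data (real log lines carry more fields), so treat it as an illustration of the layout only.

```perl
use strict;
use warnings;

# Hypothetical sample data, typed in by hand to show the shape only.
my %all = (
    "192.0.2.10" => [                        # tier 1: hash keyed by client IP
        [ "GET", "/index.html", "200" ],     # tier 3: one access, a list of strings
        [ "GET", "/about.html", "404" ],
    ],                                       # tier 2: list of accesses per client
    "192.0.2.20" => [
        [ "GET", "/index.html", "200" ],
    ],
);

# Number of accesses recorded for one client.
my $hits = scalar @{ $all{"192.0.2.10"} };

# Second access from that client, second field (indexes count from 0).
my $field = $all{"192.0.2.10"}[1][1];

print "$hits hits, second page was $field\n";
```

Each value in the hash is a reference to a list, and each element of that list is in turn a reference to a list of strings, which is why two subscripts follow the hash key.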
Design done - let's NOW write the setup code!
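# %all maps each client IP address to a list of its access records
my %all;
open (my $fh, "<", "ac_20090818") or die "Cannot open ac_20090818: $!";
while (<$fh>) {
    # First field is the client; the rest of the line is the access record
    my ($thisip,@otherparts) = split;
    # The inner list is created automatically the first time an IP is seen
    push @{$all{$thisip}}, \@otherparts;
}
Do not be misled by how short that code is - it really does set up the three-tier structure I described. It is a good job I HAD described it, though, so that it's easy to handle. Let's now test it, by printing out part of one record and also a summary of the number of visitors: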
# For IP address 77.88.28.246, look at the
# 7th hit and tell us the 6th field of its stored record
# (indexes count from 0). The three print lines below are
# equivalent ways of writing the same lookup.
print ${${$all{"77.88.28.246"}}[6]}[5],"\n";
print $all{"77.88.28.246"}->[6]->[5],"\n";
print $all{"77.88.28.246"}[6][5],"\n";
my @visitors = keys %all;
print "Visits from ",scalar(@visitors)," places\n";
And running that:
Dorothy-2:pl grahamellis$ perl actab
/mouth/834_Python-makes-University-Challenge.html?headline=_200
/mouth/834_Python-makes-University-Challenge.html?headline=_200
/mouth/834_Python-makes-University-Challenge.html?headline=_200
Visits from 14872 places
Dorothy-2:pl grahamellis$
What if your code is going to be more than just a few lines long? Are you going to be able to design, recall and easily code structures like these? Probably not - you'll want an approach that is more extensible. That's where you take the complicated bits of logic and hide them ("encapsulate them") within a module or a class. Structured and Object Oriented Programming allows you to go from small to medium and large applications robustly, without writing code that becomes a nightmare to enhance.
The example above was written at the end of yesterday's opening day of our Perl for Larger Projects class. I'll be carrying on with it today - moving to an Object Oriented application where each of the layers will be written with more straightforward, verifiable, testable, re-usable code - leading towards an application in which we can extract and report on information about our web site visitors with ease ... and which we can easily enhance and modify as further analyses are required.
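As a rough idea of what that encapsulation might look like, here is a minimal sketch of a class that hides the hash of lists of lists behind a few methods. The package name AccessLog, its method names and the sample log lines are all hypothetical, chosen for illustration; this is not the code written on the course.

```perl
use strict;
use warnings;

# Hypothetical class and method names, for illustration only.
package AccessLog;

sub new {
    my ($class) = @_;
    # visits: client IP => list of access records (each a list of strings)
    return bless { visits => {} }, $class;
}

# Split one log line and file it under the client's IP address.
sub add_line {
    my ($self, $line) = @_;
    my ($ip, @otherparts) = split ' ', $line;
    return unless defined $ip;               # skip blank lines
    push @{ $self->{visits}{$ip} }, \@otherparts;
    return;
}

# Number of distinct clients seen.
sub visitor_count {
    my ($self) = @_;
    return scalar keys %{ $self->{visits} };
}

# Number of accesses recorded for one client (0 if never seen).
sub hits_for {
    my ($self, $ip) = @_;
    my $records = $self->{visits}{$ip} or return 0;
    return scalar @$records;
}

# One field from one access; both indexes count from 0.
sub field {
    my ($self, $ip, $hit, $fld) = @_;
    my $records = $self->{visits}{$ip} or return;
    my $record  = $records->[$hit]     or return;
    return $record->[$fld];
}

package main;

# Made-up sample lines standing in for the real access log.
my $log = AccessLog->new;
$log->add_line($_) for (
    '192.0.2.10 - - GET /index.html 200',
    '192.0.2.10 - - GET /about.html 404',
    '192.0.2.20 - - GET /index.html 200',
);

print "Visits from ", $log->visitor_count, " places\n";
print $log->field("192.0.2.10", 1, 3), "\n";
```

The calling code no longer needs to know how many levels of reference sit underneath. If the storage is later changed, only the methods inside the package have to be updated, and each method is small enough to test on its own.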