Setting up a matrix of data (2D array) for processing in your program
Archive - Originally posted on "The Horse's Mouth" - 2010-10-21 07:41:52 - Graham Ellis
When you're reading and processing data, it often comes as a series of records, each split into a series of fields, and you'll often want to go through the data several times, looking at different rows and columns, sorting them, comparing them, and so on. If the data is small enough to hold in memory all at once, the best solution is often to read it all at the start and store it in a collection variable, with each member of the collection being a collection itself. If this sounds very theoretical, it's what is colloquially known as a "2 dimensional array" or sometimes as a table.
Many modern languages don't have an explicit 2 dimensional collection structure, but rather collection types that can themselves hold other collection types ... and so it is in Perl. Indeed, Perl has two collection types; you'll use a list if you want to look something up by position number, and perhaps to sort it, and you'll use a hash if you want to look something up by a key - perhaps a string. And you can set up a table with the rows indexed by a number (listish) and the columns indexed by a string (hashish) if you like.
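As a rough sketch of that last combination, here is a list of hashes: rows looked up by number, columns looked up by name. The column names and the sample data are made up for illustration and are not taken from the original examples.

```perl
use strict;
use warnings;

# Made-up sample data: column names, then one tab-separated record per line.
my @names = ('fruit', 'colour', 'stock');
my @lines = ("apple\tred\t12", "banana\tyellow\t7", "plum\tpurple\t30");

my @table;                                # rows indexed by number
for my $line (@lines) {
    my %row;                              # a fresh hash for every record
    @row{@names} = split(/\t/, $line);    # hash slice: columns indexed by name
    push @table, \%row;                   # store a reference to this row
}

print "$table[1]{fruit} is $table[1]{colour}\n";    # banana is yellow
```

Each row is held as a reference, so $table[1]{colour} reads as "row 1, column colour".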
I set up an example in Perl, using a list of lists, yesterday - see [full source] - and as always in Perl (!!) the setup was short and a little hard for the newcomer to follow:
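while ($line = <FH>) {
    chomp $line;                        # drop the trailing newline from the last field
    my @fields = split(/\t/, $line);    # a new list for each record
    push @records, \@fields;            # keep a reference to that list
}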
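So that's each line being read, split, saved into a temporary named list (that my is vital - it gives each record a fresh list, so the stored references don't all end up pointing at the same data) and then a reference to it being added onto the end of the list of all the records so far.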
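Once the table is loaded, cells and rows can be reached through those references. The following is only a sketch: the data is made up, and it is read through an in-memory filehandle ($fh, standing in for FH) so that it runs without an input file. The second loop shows what happens when the my is left off the list.

```perl
use strict;
use warnings;

# Made-up tab-separated data, read via an in-memory filehandle.
my $data = "plum\t30\napple\t12\nbanana\t7\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @records;
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split(/\t/, $line);   # new list each time round the loop
    push @records, \@fields;
}
close $fh;

# One cell: row 1 (the second record), column 0
print "$records[1][0]\n";              # apple

# Sort the rows numerically on column 1, then print each row
for my $row (sort { $a->[1] <=> $b->[1] } @records) {
    print join(", ", @$row), "\n";     # banana, 7 / apple, 12 / plum, 30
}

# Without a fresh "my" list per record, every stored reference points at
# the same list, so all rows end up showing the last line read.
my (@shared, @wrong);
for my $line (split /\n/, $data) {
    @shared = split(/\t/, $line);
    push @wrong, \@shared;
}
print "$wrong[0][0]\n";                # banana, not plum
```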
A further example - [full source code] - set up a hash of lists, where each line is keyed by the value in one of the fields (I chose the first field) but the columns of data are numbered:
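while ($line = <FH>) {
    chomp $line;                                  # drop the trailing newline from the last field
    my ($place, @fields) = split(/\t/, $line);    # first field is the key, the rest are the columns
    $records{$place} = \@fields;                  # keep a reference to the columns under that key
}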
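Looking things up in a hash of lists then goes by key first and column number second. This is a minimal sketch with invented place names and values, again using an in-memory filehandle ($fh) in place of FH so that it is self-contained.

```perl
use strict;
use warnings;

# Invented data: a place name, then two numbered columns.
my $data = "Northtown\t12\t7\nSouthville\t30\t4\nEastham\t5\t9\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my %records;
while (my $line = <$fh>) {
    chomp $line;
    my ($place, @fields) = split(/\t/, $line);
    $records{$place} = \@fields;
}
close $fh;

# One cell: the row keyed "Southville", column 0
print "$records{Southville}[0]\n";     # 30

# A hash has no inherent order, so sort the keys for a repeatable listing
for my $place (sort keys %records) {
    print join(", ", $place, @{ $records{$place} }), "\n";
}
```

Note that if two lines share the same key, the later one replaces the earlier one.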
You'll notice ... Perl has "autovivification" ... in other words, there's no need for you to set up your lists, hashes and scalars ahead of time - they just get set up automatically for you. If you're writing a medium-sized or larger program where you're using subs or modules, though, I would advise you to "use strict" so that you don't accidentally reuse a name across a wide scope and introduce unexpected bugs into your program.
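The two ideas work together. In this small sketch (the variable name and values are invented), only the top-level hash is declared under "use strict", and the nested lists are still created on demand.

```perl
use strict;
use warnings;

my %table;                       # only the top-level variable is declared

# Neither $table{fruit} nor the list it refers to exists yet;
# Perl creates both when the element is assigned.
$table{fruit}[2] = 'plum';
push @{ $table{veg} }, 'leek';

print ref($table{fruit}), "\n";            # ARRAY
print scalar @{ $table{fruit} }, "\n";     # 3 (elements 0 and 1 are undef)
print "$table{veg}[0]\n";                  # leek

# With strict in force, a mistyped variable name is a compile-time
# error rather than a silently created new global:
# $tabel{fruit}[0] = 'apple';    # Global symbol "%tabel" requires explicit package name
```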
Further examples from yesterday -
Various variable types from our Perl review
Changing the behaviour of a hash from our section on tieing
... all covered in public on our Perl for Larger Projects course!