Writing more maintainable Perl - naming fields from your data records

Archive - Originally posted on "The Horse's Mouth" - 2012-09-25 22:34:03 - Graham Ellis
Perl is the Practical Extraction and Report Language, and the data you'll want to extract information from often comes in the form of CSV (comma-separated values), or space- or tab-delimited records.

Opening and reading a file of such records, in a loop, is easy:

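  open (my $fh, '<', 'trains') or die "Cannot open trains: $!";

  while (my $service = <$fh>) {

    chomp $service;   # drop the trailing newline so the last field is clean
    # act on each record here

  }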

and within the loop, you can split each line into its individual fields; if the line is tab delimited, for example, you might write:

  @flds = split(/\t/,$service);

That's short and sweet, and I can then refer to individual elements, for example:

  print "Train to $flds[2] at $flds[0]\n";

This means, however, that in a complex piece of extraction and analysis code you're likely to be making a large number of references to elements by their position in the original lines, making the bulk of the code harder to follow, and making it difficult to reuse / update the code if the format / field order changes in future data files.

A Better Way

In Perl, you can give a list of scalars on the left of an assignment to name each element of a list (from something like a split) in one go:

  ($time, $cars, $place, $capa) = split(/\t/,$service);

and you can then refer to the elements by name during your extraction and analysis phase:

  print "Train to $place at $time\n";

Although the initial splitting line is (a little) longer, you can now write code that's much more self-documenting, with meaningful variable names, in the analysis. And if the field order should ever change, you've just got a single splitting line to recode, rather than having to re-engineer the whole analysis phase.
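To pull the pieces above together, here is a minimal sketch of the named-field approach as a complete script. The two sample records are made up for illustration (they are not the data from the full example), the helper name describe_service is hypothetical, and I'm assuming that $cars is the number of carriages and $capa the seating capacity.

```perl
use strict;
use warnings;

# Made-up sample records, tab delimited, in the assumed order:
# time, carriages, destination, capacity.
my @records = (
    "07:15\t4\tSwindon\t280\n",
    "08:02\t2\tWestbury\t140\n",
);

# Hypothetical helper: turn one raw line into a report line,
# naming every field at the point where the line is split.
sub describe_service {
    my ($service) = @_;
    chomp $service;    # remove the newline so $capa is clean
    my ($time, $cars, $place, $capa) = split(/\t/, $service);
    return "Train to $place at $time ($cars cars, $capa seats)";
}

print describe_service($_), "\n" for @records;
```

If the columns were reordered in a later data file, only the list on the left of the split would need to change; the return line, and any other analysis code using the names, would stay as it is.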

Full example (including sample data and output) is [here] on our web site. And this example is as taught on this week's Learning to program in Perl course.
