Traversing a directory in Perl

Archive - Originally posted on "The Horse's Mouth" - 2012-08-08 22:28:15 - Graham Ellis
As part of the revision of our Perl Programming course, we're adding an example on directory handling in with our basic file handling module. That's because, over recent years, Perl's use in parsing directories and searching around for data - systems admin tasks - has increased disproportionately to its heavy data handling use and web use.

So there's a new example - source code [here] - which parses a directory and produces a summary of the data for each item it contains, and a brief overall report.

In Perl, you open a directory with an opendir function call:
  opendir DH,"." or die "Cannot open directory: $!";
and you then read from the directory with a readdir:
  $item = readdir DH;
with each read returning the name of the next item. Once you've read the whole directory, readdir returns undef, which is false, so that you can easily write a loop to work through a directory:
  while ($item = readdir DH) { [etc]
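
Put together, the open / read / loop steps above might look like the sketch below. It is an illustration rather than the course example itself: the helper name list_items is made up, and it assumes a lexical directory handle ($dh) in place of the bareword DH, plus a closedir once the loop is done.

```perl
use strict;
use warnings;

# list_items is a hypothetical helper, not part of the course example.
# It returns the names of everything in $dir except "." and "..".
sub list_items {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @items;
    # defined() makes the end-of-directory test explicit, so an item
    # named "0" cannot end the loop early.
    while (defined(my $item = readdir $dh)) {
        next if $item eq '.' or $item eq '..';
        push @items, $item;
    }
    closedir $dh;
    my @sorted = sort @items;
    return @sorted;
}

print "$_\n" for list_items('.');
```

readdir always returns the "." and ".." entries as well as the real contents, which is why the sketch skips them.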

Characteristics of individual items in the file system can be checked with operators such as -d ("is it a directory") and -f ("is it a plain file"):
  if (-d $item) {
and other operators in the same family return more data, such as -s for the size of a file in bytes, and -M for its age in days (the time between the file's last modification and the start of the script):
  $result .= "Size is " . (-s $item) . " bytes\n";
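
As a rough sketch of how these tests combine, the hypothetical describe_item helper below (the name and the wording of its output are assumptions, not taken from the course example) reports on one item. One thing to watch: readdir returns bare names, so when the directory being read is not the current one, the directory has to be prepended before the file tests will find the item.

```perl
use strict;
use warnings;

# describe_item is a hypothetical helper, not taken from the course example.
# readdir returns bare names, so the directory is prepended before testing.
sub describe_item {
    my ($dir, $name) = @_;
    my $path = "$dir/$name";
    return "$name: is a directory\n" if -d $path;
    if (-f $path) {
        my $size = (-s $path) || 0;   # -s gives a false value for an empty file
        my $age  = -M $path;          # days between last modification and script start
        return sprintf "%s: plain file, %d bytes, modified %.2f days ago\n",
            $name, $size, $age;
    }
    return "$name: neither a plain file nor a directory\n";
}

print describe_item('.', '.');
```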

When you're traversing a directory, or reading a file of data, you'll often want to generate several reports from that data. Rather than traversing your data multiple times, for efficiency's sake you'll want to traverse it just once, and store each of your reports, as you generate it, in its own variable. You can then print out each of these reports once you have completed your data traversal.

The example above uses this technique to store up a report until the work is completed. At the start, a scalar variable is initialised to an empty string:
  $result = "";
then throughout the traversal, information is added onto the end of that variable using the .= operator, for example:
  $result .= "Is a plain file\n";
  $result .= "Size is " . (-s $item) . " bytes\n";
  $result .= sprintf "Modified %.2f%% of a day ago\n\n", 100 * (-M $item);
and finally the variable is printed out:
  print $result;
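
A minimal sketch of the single-pass idea, building a per-item report and an overall summary in the same traversal. The name build_reports, the report layout and the counters are all assumptions made for illustration; the course example itself may be organised differently.

```perl
use strict;
use warnings;

# build_reports is a hypothetical name; the report layout is an assumption.
# It makes one pass over $dir and returns two strings: per-item detail
# and a short overall summary.
sub build_reports {
    my ($dir) = @_;
    my $detail = "";
    my ($files, $dirs, $bytes) = (0, 0, 0);

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @items = grep { $_ ne '.' and $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $item (sort @items) {
        my $path = "$dir/$item";
        $detail .= "$item\n";
        if (-d $path) {
            $detail .= "  Is a directory\n";
            $dirs++;
        } elsif (-f $path) {
            my $size = (-s $path) || 0;
            $detail .= "  Is a plain file\n";
            $detail .= "  Size is $size bytes\n";
            $files++;
            $bytes += $size;
        }
    }

    my $summary = "$files plain files ($bytes bytes), $dirs directories\n";
    return ($detail, $summary);
}

my ($detail, $summary) = build_reports('.');
print $detail, $summary;
```

Each report lives in its own variable, so the order in which they are printed is decided after the traversal rather than during it.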

Saving the data in a variable in this way has other advantages - it allows the report to be written to two different destinations very easily, and it allows the program to have a "rethink" - i.e. to generate a string that will probably be output, but then suppress or modify that output later on if some condition towards the end of the data dictates that should be the case.
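
Both advantages can be sketched in a few lines. The emit_report helper and the $suppress flag below are hypothetical, and the second destination is an in-memory filehandle purely to keep the sketch self-contained; a handle opened on a real file should work the same way.

```perl
use strict;
use warnings;

# emit_report is a hypothetical helper. It writes the finished report to
# every handle it is given, unless the caller has decided to suppress it.
# It returns the number of destinations written to.
sub emit_report {
    my ($report, $suppress, @handles) = @_;
    return 0 if $suppress;
    print {$_} $report for @handles;
    return scalar @handles;
}

my $result   = "Is a plain file\nSize is 120 bytes\n";   # illustrative data only
my $suppress = ($result eq "");                          # an example "rethink" condition

# Second destination: an in-memory handle standing in for a log file.
open(my $copy, '>', \my $saved) or die "Cannot open in-memory handle: $!";
emit_report($result, $suppress, \*STDOUT, $copy);
close $copy;
```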

There's one downside of this approach when used with a long process - the user won't see any results until the data has been completely traversed. There are techniques for dealing with this sort of issue - the whole topic of handling huge data is covered on our more advanced Perl for Larger Projects course.
