Finding all matches to a pattern in Perl regular expressions

Archive - Originally posted on "The Horse's Mouth" - 2011-12-09 03:12:38 - Graham Ellis
A regular expression usually matches the leftmost occurrence of a pattern within an incoming (source) string.

This doesn't matter if all you want to do is find out whether or not your source string contains something, but it does make a difference if you want to make use of the part that matched. Consider

  $us = 'I am graham@wellho.net and you are pinkpanther@frenchdetectives.fr';

and the match

  /\S+\@\S+/

This will match graham@wellho.net every time you run it in Perl, and never pinkpanther@frenchdetectives.fr .
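A quick way to see this for yourself is to run the same match twice and print what was found each time. This is only a minimal sketch: the capture group and the variable names $first and $again are my additions for illustration, not part of the original example.

```perl
use strict;
use warnings;

my $us = 'I am graham@wellho.net and you are pinkpanther@frenchdetectives.fr';

# Without the g modifier, each match starts again from the beginning
# of the string, so the leftmost address is found every time.
my ($first) = $us =~ /(\S+\@\S+)/;
my ($again) = $us =~ /(\S+\@\S+)/;

print "First try:  $first\n";
print "Second try: $again\n";
```

Both lines report graham@wellho.net.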

What if you want to match that second email address, then? You can add a modifier after the end of the regular expression - the single letter g which stands for "global".

In a scalar context, used in a loop, a global match carries on from where the last match left off each time around the loop, thus letting you loop through all valid, non-overlapping matches in the string. When there are no more matches, a false result is returned. Thus:

  while ($us =~ /\S+\@\S+/g) {
    print "Emma is $&\n";
  }

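will return each match in turn. By contrast, without the g this program would give you an infinite loop, because every test would start again from the beginning of the string and find the first address once more.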
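If you'd like to watch the "carries on where it left off" behaviour in action, Perl's built-in pos function reports the offset at which the next global match on a string will start. The sketch below is a variation on the loop, not the original program: it uses a capture group and $1 rather than $&, and the @seen array is a name I've made up to collect the results.

```perl
use strict;
use warnings;

my $us = 'I am graham@wellho.net and you are pinkpanther@frenchdetectives.fr';

my @seen;    # hypothetical array, just to keep what we find
while ($us =~ /(\S+\@\S+)/g) {
    # pos($us) is the offset just past the match we have just made,
    # which is where the next trip around the loop resumes searching.
    printf "Found %s, next search starts at offset %d\n", $1, pos($us);
    push @seen, $1;
}

print "Total found: ", scalar(@seen), "\n";
```

Once the match fails, the stored position is reset, so a later global match on the same string starts again from the beginning.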

There are, as always, multiple ways of doing the same thing in Perl. If you use the g modifier in a list context (for example, by assigning the result of the match to a list), the match returns all non-overlapping matches as a list. Thus:

  @gotted = $us =~ /\S+\@\S+/g;

  print "We got @gotted\n";

will output

We got graham@wellho.net pinkpanther@frenchdetectives.fr
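The list context form has a couple of handy variations. When the pattern contains capture groups, a global match in list context returns the captured pieces rather than the whole matches, and assigning through an empty list gives you a count. The following is a sketch under my own assumptions: splitting each address at the @ sign, and the names %domain_of and $count, are illustrative additions rather than part of the original program.

```perl
use strict;
use warnings;

my $us = 'I am graham@wellho.net and you are pinkpanther@frenchdetectives.fr';

# Two capture groups per match: the list returned is
# (user, domain, user, domain, ...), which loads neatly into a hash.
my %domain_of = $us =~ /(\S+)\@(\S+)/g;

for my $user (sort keys %domain_of) {
    print "$user is at $domain_of{$user}\n";
}

# Assigning to an empty list forces list context; the outer scalar
# assignment then yields the number of matches.
my $count = () = $us =~ /\S+\@\S+/g;
print "Number of addresses: $count\n";
```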

Complete program [here]. As taught during our Learning to program in Perl / Perl Programming training classes.
