Looking ahead and behind in Regular Expressions - double matching
Archive - Originally posted on "The Horse's Mouth" - 2010-12-23 08:17:38 - Graham Ellis
Look-ahead and look-behind are a way of "double matching" in a regular expression. If you're at a certain point in the match and you think "the next bit should conform to xxx and at the same time it should conform to yyy" then you can describe xxx via a look-ahead, and follow that with matching yyy in the usual way.
Using Perl syntax for this example: $journey =~ /\w+(?=\w+,)ing.on/;
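This says ... "I'm looking for word characters. They are to be followed by more word characters ending in a comma. They are also to be followed (a second match against the same part of the incoming string) by 'ing?on', where '?' is any one character." The look-ahead only checks what follows - it doesn't use up any of the string, which is why the "ing.on" part gets to match the same characters again.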
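To see that double match in action, here is a minimal sketch in Python, whose re module accepts the same look-ahead syntax as the Perl above. The two sample strings are my own assumptions, made up for illustration.

```python
import re

# Same pattern as the Perl example above.
pattern = re.compile(r"\w+(?=\w+,)ing.on")

# Hypothetical sample journeys, not taken from the original post.
with_comma = "Change at Kensington, then walk"
without_comma = "Change at Kensington then walk"

found = pattern.search(with_comma)
# The look-ahead checks for the comma but does not include it in the match.
print(found.group(0))                 # Kensington

# Without the comma the look-ahead fails, so there is no match at all.
print(pattern.search(without_comma))  # None
```

Note that the comma has to be there for a match, yet it never appears in the matched text.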
You can look ahead and look behind. And you can negate the result too. That's actually much more useful than the positive look-ahead - allowing you to exclude special cases: $journey =~ /\w+(?!ingt)ing.on/;
Which says ... "NOT ingt but ing[something else]on".
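A short sketch of that exclusion, again in Python with the same syntax as the Perl. The list of place names is an assumption of mine, chosen so that two end in "ington" and two in "ingdon".

```python
import re

# Same pattern as the Perl example above: "ing", any one character, "on",
# but not where the text at that point starts with "ingt".
pattern = re.compile(r"\w+(?!ingt)ing.on")

# Hypothetical sample data for illustration.
places = ["Kensington", "Huntingdon", "Paddington", "Abingdon"]

kept = [place for place in places if pattern.search(place)]
print(kept)  # ['Huntingdon', 'Abingdon']
```

The "ington" names are rejected by the negative look-ahead, while the "ingdon" names get through.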
There's an example of all four possibilities (lookahead and lookbehind, positive and negative in each case) in a new example - written as a follow-up to a question on yesterday's Perl for Larger Projects course - source code [here]. It's been quite the season for lookahead / lookbehind - there are other new examples in Python [here] and [here].
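For a quick feel of the four forms side by side, here is a small sketch of my own in Python. It is not the linked source code, and the sample journey and the four patterns are assumptions made up for illustration. Python's re module wants look-behinds of fixed width, which "to " satisfies.

```python
import re

# Hypothetical sample string for illustration.
journey = "from Bath to Reading, then to Oxford"

# Positive look-ahead: words directly followed by a comma.
print(re.findall(r"\w+(?=,)", journey))
# ['Reading']

# Negative look-ahead: whole words NOT directly followed by a comma.
print(re.findall(r"\b\w+\b(?!,)", journey))
# ['from', 'Bath', 'to', 'then', 'to', 'Oxford']

# Positive look-behind: words directly preceded by "to ".
print(re.findall(r"(?<=to )\w+", journey))
# ['Reading', 'Oxford']

# Negative look-behind: whole words NOT directly preceded by "to ".
print(re.findall(r"\b(?<!to )\w+", journey))
# ['from', 'Bath', 'to', 'then', 'to']
```

In every case the look-around text itself (the comma, the "to ") stays out of the result.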
Are these facilities useful? Yes - on some occasions they can be, but there are often better alternatives. If you look at my simple examples, the same thing could be achieved in a much more straightforward way using more commonly understood regular expression elements, which will be easier for people less into the depths of regular expressions to support. And it's often very much the case that two simple regular expression matches are better (faster, easier to maintain) than one complicated and obscure one.
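As a rough illustration of that point, the sketch below (Python, with sample place names that are my own assumption) puts the negative look-ahead next to two plainer alternatives: a negated character class, and a pair of simple matches. They agree on these one-word samples; I wouldn't claim they agree on every possible input.

```python
import re

# Hypothetical sample data for illustration.
places = ["Kensington", "Huntingdon", "Abingdon", "Paddington", "London"]

# 1. The look-ahead version from earlier in this post.
lookahead = re.compile(r"\w+(?!ingt)ing.on")

# 2. A plain character class: "ing", then anything except "t", then "on".
char_class = re.compile(r"\w+ing[^t]on")

# 3. Two simple matches: one to accept, one to reject.
wanted = re.compile(r"\w+ing.on")
unwanted = re.compile(r"ington")

def two_step(text):
    """Accept text that matches 'wanted' and does not contain 'unwanted'."""
    return bool(wanted.search(text)) and not unwanted.search(text)

by_lookahead = [p for p in places if lookahead.search(p)]
by_char_class = [p for p in places if char_class.search(p)]
by_two_step = [p for p in places if two_step(p)]

print(by_lookahead)   # ['Huntingdon', 'Abingdon']
print(by_char_class)  # ['Huntingdon', 'Abingdon']
print(by_two_step)    # ['Huntingdon', 'Abingdon']
```

Which of the three is easiest to come back to in six months' time is a judgement call, but the second and third need no knowledge of look-arounds at all.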