First and last match with Regular Expressions
Archive - Originally posted on "The Horse's Mouth" - 2010-04-02 06:27:42 - Graham Ellis

Conventional wisdom says that it's pointless to start a regular expression with ".*" or ".+", as this is implied within a match: a regular expression match looks for the pattern anywhere within the string anyway:
'/abc/'
- contains abc
'/.*abc/'
- contains anything or nothing, followed by abc

However, conventional wisdom isn't the full story.
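First, though, the part that conventional wisdom gets right. This minimal sketch checks that the two patterns give the same yes / no answer. It is written in Python rather than PHP (our choice of language, not the original's), and the helper name `contains` is hypothetical. Python's `re.search` tries the pattern at every starting position, and `.*` is allowed to match nothing, so a leading `.*` never changes whether a match is found.

```python
import re

def contains(pattern, text):
    # re.search tries the pattern at every start position, so an
    # unanchored pattern already means "contains"; a leading .* can
    # match the empty string, so it never changes the yes/no answer
    return re.search(pattern, text) is not None

samples = ["abc", "xxabcxx", "ab c", "line one\nabc", ""]
for s in samples:
    print(repr(s), contains(r"abc", s), contains(r".*abc", s))
```

The two columns of results agree for every sample, including the one where "abc" follows a newline (which `.` does not match by default in Python).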
If you're just checking whether something matches, then - fair enough - the checks above are identical. But if you are using capture parentheses (round brackets to save interesting bits into new variables) then an extra .* DOES make a difference. Consider this piece of Perl:
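use strict; use warnings;
my $first = '/1234234/2234234/3423423/5234234/634234/';
my ($p1) = $first =~ m!(/\d+)!;      # no leading .*: the first (leftmost) match
print "$p1\n";
($p1) = $first =~ m!.*(/\d+)!;       # greedy leading .*: the capture lands on the last match
print "$p1\n";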
Which runs as follows:
[trainee@holt lm10]$ perl dstar.pl
/1234234
/634234
[trainee@holt lm10]$
What is the difference between the two? The first has given the first (leftmost) possible match, and the second has given the last possible match. That's because ".*" is greedy: it acts as a sponge, soaking up as much of the incoming string as possible and giving characters back only until the rest of the pattern can match. If you're looking for a yes / no as to whether something matches, the conventional wisdom of "no leading .*" is correct; but if you're looking for match strings, .* makes all the difference.
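To watch the sponge at work, this sketch (again in Python, our choice) prints how much of the string the greedy `.*` kept for itself before handing over to the capture group. It also shows two related cases. If the group has no leading "/" to pin it down, the sponge leaves `\d+` with just one digit. The lazy quantifier `.*?`, a standard regex feature not covered in the original post, does the opposite: it takes as little as possible, so the capture goes back to the first run.

```python
import re

text = '/1234234/2234234/3423423/5234234/634234/'

greedy = re.search(r'.*(/\d+)', text)
# Everything before the capture group was soaked up by .*
print(text[:greedy.start(1)])   # /1234234/2234234/3423423/5234234
print(greedy.group(1))          # /634234

# With no "/" at the front of the group, the sponge leaves \d+ a single digit
print(re.search(r'.*(\d+)', text).group(1))    # 4

# A lazy .*? matches as little as possible, so the capture is the first run
print(re.search(r'.*?(/\d+)', text).group(1))  # /1234234
```

The single-digit result is the usual trap with a greedy leading `.*`: the capture only gets what the sponge is forced to give back.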
The first examples I gave ('/abc/' and '/.*abc/') were written as PHP "Perl style" regular expressions, and the second was in Perl, but this technique applies across other languages that use regular expressions. Python's match method behaves a little differently, as it only looks for a match at the start of the string.
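Here is a sketch of that Python difference. `re.match` only tries at index 0, so a pattern without a leading `.*` has to fit at the very start of the string. Adding `.*` lets it reach further in, and because `.*` is greedy it again settles on the last occurrence. The string `path` is a made-up sample chosen so that it does not begin with "/".

```python
import re

path = 'dir/1234234/2234234/634234/'   # hypothetical sample; does not start with "/"

# re.match only tries at index 0, and the string starts with "d", so this fails
print(re.match(r'(/\d+)', path))                # None
# re.search scans forward and finds the leftmost run
print(re.search(r'(/\d+)', path).group(1))      # /1234234
# A leading .* lets re.match reach past the start; being greedy, it finds the last run
print(re.match(r'.*(/\d+)', path).group(1))     # /634234
```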
I wrote this technical briefing as a result of a question on a Lua course, and yet Lua does NOT support regular expressions. "Lua is a small language, and a regular expression engine would be bigger than our entire standard library - it is too expensive" is the reasoning. But it does have pattern matching, which gives you almost everything you want and looks very "regular expression like" (see the 80 / 20 rule).
So here is the same application coded in Lua:
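local first = '/1234234/2234234/3423423/5234234/634234/'
-- no leading .*: the first (leftmost) match, /1234234
local p1 = string.match(first, '/%d+')
print(p1)
-- greedy leading .*: the capture lands on the last match, /634234
p1 = string.match(first, '.*(/%d+)')
print(p1)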
These are Lua "patterns", not regular expressions, and yet the techniques look remarkably similar!
We run regular expression workshops from time to time. We also run courses in the other languages mentioned, and those courses include regular expression or pattern matching.