the array returned by preg_match_all

Archive - Originally posted on "The Horse's Mouth" - 2005-06-06 05:40:00 - Graham Ellis

PHP supports two regular expression handlers - the "ereg" series that uses POSIX style regular expressions, and the "preg" series that uses Perl style. Why? Because both have their uses - the POSIX style makes for easier-to-read expressions and is said to be easier to learn, while the Perl style is more powerful and quicker in operation ... at the price of being harder to learn and follow.

I was updating some code in our Wiki yesterday and needed to search for all words in a string that start with a capital and include an embedded capital, since we use them as links - visit our shared data system to see what I mean. But then I hit a curious bug - one I had introduced myself - where there were multiple hits. It turned out to be a good reminder that preg_match_all returns an ARRAY of ARRAYs. The first inner array holds the complete text of every match, and each further inner array holds, for every match, the text that matched one of the relevant "interesting bits" (the bracketed groups) in the regular expression given. Hmm - that might be clear as mud when I describe it, so here's an example:

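<?php
$demo = "SwinDon ChippenHam MelkSham TrowBridge and WestBury";

// Two bracketed groups: the start and the end of each place name
preg_match_all('/([A-Z][a-z]+)([A-Z][a-z]+)/', $demo, $gotten);

// Outer loop: one array per group (plus one for the whole match)
foreach ($gotten as $set) {
    // Inner loop: one item per match found in the string
    foreach ($set as $item) {
        print "$item ... ";
    }
    print ("<br>");
}
?>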


You would EXPECT to get five arrays back - one for each of the place names. But actually you get just three - one for the complete names of the places, one for the first bracketed group (the start of each place name), and one for the second bracketed group (the end of each place name).


SwinDon ... ChippenHam ... MelkSham ... TrowBridge ... WestBury ...
Swin ... Chippen ... Melk ... Trow ... West ...
Don ... Ham ... Sham ... Bridge ... Bury ...
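
If what you actually want is one row per place name, the three arrays can be walked in step, since the same index in each inner array refers to the same match. What follows is only a minimal sketch of that idea, reusing the example string and pattern; the $places variable is a name made up for illustration.

```php
<?php
$demo = "SwinDon ChippenHam MelkSham TrowBridge and WestBury";

preg_match_all('/([A-Z][a-z]+)([A-Z][a-z]+)/', $demo, $gotten);

// $gotten[0] = complete matches, $gotten[1] = first group, $gotten[2] = second group.
// Index $i in each inner array belongs to the same match.
$places = [];   // hypothetical name: one entry per place name
foreach ($gotten[0] as $i => $whole) {
    $places[] = [$whole, $gotten[1][$i], $gotten[2][$i]];
}

foreach ($places as $place) {
    print implode(" ... ", $place) . "<br>\n";
}
```

Each entry of $places then holds the whole name followed by its two halves, so the first line printed reads SwinDon ... Swin ... Don.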


Once you've come across this for the first time, it's easy enough to handle in your program - but I just felt it was worth a "Beware - counterintuitive feature" warning here. And there are flag options to the function that let you alter its behaviour if you wish.
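
One such flag is PREG_SET_ORDER, passed as the fourth parameter to preg_match_all. It groups the results by match rather than by bracketed group (the default grouping is PREG_PATTERN_ORDER). A short sketch, again reusing the example string and pattern; the $sets variable is just an illustrative name.

```php
<?php
$demo = "SwinDon ChippenHam MelkSham TrowBridge and WestBury";

// PREG_SET_ORDER: one inner array per match instead of one per group
preg_match_all('/([A-Z][a-z]+)([A-Z][a-z]+)/', $demo, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    // $set[0] is the whole match; $set[1] and $set[2] are the two groups
    print implode(" ... ", $set) . "<br>\n";
}
```

With this flag you do get five arrays back, one for each place name, each holding the complete name followed by its start and its end.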