Getting more than a yes / no answer from a regular expression pattern match
Archive - Originally posted on "The Horse's Mouth" - 2012-06-30 15:49:45 - Graham Ellis
Guy walks up to me in the street and asks "Could you direct me to the Town Centre?". So I answer him "yes, I could", and walk on. Does he thank me? No - he probably thinks "what a useless half answer" or "how rude can you be", even though I completely and correctly answered his question.
It's sometimes a bit like that in programming too. If I, in my code, ask the question "Does what the user typed in look like it contains a postcode?", then a yes or no may be only half an answer. I may really want to know what the postcode is, and I may want to take significant parts of the postcode and do something more with them. For example, if my user enters
156 Broadway, Chadderton, Oldham, Lancashire, OL9 8AU, UK
I may want to know some or all of:
• Yes, it contains a postcode
• The postcode is OL9 8AU
• The postcode is in the OL area.
Regular Expressions allow me to check whether a string matches a pattern, and in most languages return a true / false (yes / no) type answer. But they're also capable of returning or storing ancillary results from the match so that the programmer isn't required to write loads of other follow up code.
On Friday's Regular Expression Course we took a look at that, using examples in PHP as that was the most relevant language to the student group.
So:
$result = preg_match('/[A-Z]{1,2}[0-9][0-9A-Z]{0,1} {1,}[0-9]{1}[A-Z]{2}/',$line);
print ("5. Result is $result\n");
will say "yes, that line contains a postcode" (setting $result to 1) if the line contains something in postcode format, or "no, that line does not contain a postcode" (setting $result to 0) if it does not. However, if I write:
$result = preg_match('/[A-Z]{1,2}[0-9][0-9A-Z]{0,1} {1,}[0-9]{1}[A-Z]{2}/',$line,$gotten);
print ("6. Result is $result. Side result $gotten[0]\n");
I'll be given the postcode back too, as the first member of a whole array of extra output data which I have chosen to call $gotten. I can take this a whole lot further - identifying multiple postcodes if that's what the incoming string contains, and also telling my program which are the interesting bits that I want to store in further elements of $gotten. Thus:
$result = preg_match_all('/(([A-Z]{1,2})\d[0-9A-Z]?) +(\d[A-Z]{2})/',$line,$gotten);
print ("10. Result is $result. Side result "); spew2d($gotten) ;
With input string:
I live at SN12 6QL which is just up the road from here and in 2 weeks train near E3 4HC in London?
I got the following results from the code above:
5. Result is 1
6. Result is 1. Side result SN12 6QL
10. Result is 2. Side result
0/0: SN12 6QL 0/1: E3 4HC
1/0: SN12 1/1: E3
2/0: SN 2/1: E
3/0: 6QL 3/1: 4HC
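Going back to the Oldham address near the top of this article, the three things I said I might want to know can all be read out of $gotten after a single call. What follows is only a minimal sketch, assuming the same pattern as the preg_match_all example; the variable names $address, $pattern, $count, $postcode and $area are my own choices for illustration and are not taken from the course program.

```php
<?php
// Address quoted earlier in the article; the variable names here are illustrative only.
$address = '156 Broadway, Chadderton, Oldham, Lancashire, OL9 8AU, UK';

// Same pattern as the preg_match_all example:
//   group 1 = outward part, group 2 = area letters, group 3 = inward part.
$pattern = '/(([A-Z]{1,2})\d[0-9A-Z]?) +(\d[A-Z]{2})/';

// $gotten is indexed [group number][match number].
$count = preg_match_all($pattern, $address, $gotten);

if ($count > 0) {
    $postcode = $gotten[0][0];   // whole of the first match
    $area     = $gotten[2][0];   // area letters of the first match
    print("Yes, it contains a postcode\n");
    print("The postcode is $postcode\n");
    print("The postcode is in the $area area\n");
} else {
    print("No postcode found\n");
}
```

The first index of $gotten is the capture group and the second is which match it came from, which is the same layout as the 0/0, 0/1 labels in the results above.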
Full program - including the spew function - [here].
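The spew2d function itself is only in the linked program, so I can't show the real one here. As a rough guess at a helper that would give a similar row / column layout, here is a hypothetical stand-in that I've called spew2d_sketch so it isn't mistaken for the original. It assumes $gotten is the two dimensional array that preg_match_all fills in by default, and it returns a string rather than printing directly so that it's easier to check.

```php
<?php
// Hypothetical stand-in for spew2d(); the real function is in the linked program.
// Formats a two dimensional array as "row/column: value" cells, one row per line.
function spew2d_sketch(array $table): string
{
    $out = "\n";
    foreach ($table as $i => $row) {
        $cells = [];
        foreach ($row as $j => $value) {
            $cells[] = "$i/$j: $value";
        }
        $out .= implode(' ', $cells) . "\n";
    }
    return $out;
}

// Input string and pattern as used in the article.
$line = 'I live at SN12 6QL which is just up the road from here and in 2 weeks train near E3 4HC in London?';
$result = preg_match_all('/(([A-Z]{1,2})\d[0-9A-Z]?) +(\d[A-Z]{2})/', $line, $gotten);

print("10. Result is $result. Side result " . spew2d_sketch($gotten));
```

Row 0 holds the complete matches and rows 1 to 3 hold the three bracketed groups, with one column per postcode found.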