Making Regular Expressions easy to read and maintain
Archive - Originally posted on "The Horse's Mouth" - 2009-05-10 08:30:34 - Graham Ellis
Have you ever seen a long regular expression made up of so many special characters that you can't read or maintain it very easily? Something like:
/\(\s*([A-Z]{4})\s+(\d+(?:\.\d*)?)\s*\/\s*(\d+(?:\.\d*)?)\s*\/\s*(\d+(?:\.\d*)?)\s*\)/
We offer a Regular Expression Course that will help you understand things like this more easily ... and also help you write them in what I believe is an easier-to-maintain way:
# A 4 letter word in capitals, to be captured
$word4c = '\s*([A-Z]{4})\s*';
# White Space
$spaces = '\s+';
# A number - may have a decimal point and digits thereafter, to be captured
$floatc = '\s*(\d+(?:\.\d*)?)\s*';
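# The x modifier lets the components be separated by spaces; m is needed because the delimiter is ! rather than /
m!\( $word4c $spaces $floatc / $floatc / $floatc \)!x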
Although this version is longer, I have set up a series of intermediate variables that save me from repeating complex patterns. Those same intermediate variables can be used in other matches against similar data, avoiding the need for repeated logic. "Why is it that when people are so proud of their skill in removing duplicated code by writing functions, they often go and spoil the whole thing by repeating the same regular expression or printf format many, many times?" I ask myself!
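To see the two styles side by side, here is a minimal sketch that runs both forms against the same text and then reuses the fragments in a second, different match. The sample strings and the names $line, $other, @original, @fields, $code and $value are my own inventions for illustration; only the three fragments and the two patterns come from the example above.

```perl
use strict;
use warnings;

# Fragments from the article. Single quotes keep the backslashes intact
# until the strings are interpolated into a pattern.
my $word4c = '\s*([A-Z]{4})\s*';          # 4 capital letters, captured
my $spaces = '\s+';                       # white space
my $floatc = '\s*(\d+(?:\.\d*)?)\s*';     # number with optional decimal part, captured

# Hypothetical sample record, made up for this sketch.
my $line = 'Reading ( ABCD 12.5 / 7 / 0.25 ) logged';

# The original all-in-one pattern.
my @original = $line =~ /\(\s*([A-Z]{4})\s+(\d+(?:\.\d*)?)\s*\/\s*(\d+(?:\.\d*)?)\s*\/\s*(\d+(?:\.\d*)?)\s*\)/;

# The same match built from the fragments. With the x modifier the spaces
# between the components are ignored; using ! as the delimiter means the
# slashes in the data need no escaping.
my @fields = $line =~ m!\( $word4c $spaces $floatc / $floatc / $floatc \)!x;

print "original:  @original\n";
print "fragments: @fields\n";

# Reuse: the same fragments in a different match (again a made-up layout).
my $other = 'WXYZ 3.0';
my ($code, $value) = $other =~ m!^ $word4c $floatc \z!x;
print "code=$code value=$value\n";
```

Both forms should pull out the same four captures from the sample line, and the second match needed no new regular expression syntax at all, just a different arrangement of fragments that were already defined.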