Be gentle rather than macho ... regular expression techniques

Archive - Originally posted on "The Horse's Mouth" - 2010-08-08 11:22:15 - Graham Ellis

Please don't be "macho" with your regular expressions.

You can write a very long and complex pattern to match something like a postcode or email address, but if you do, it will be hard to test, difficult to debug, and awkward to maintain. And to add insult to injury, it will probably run slower than the alternative approach.

"What alternative approach?" I hear you asking.

Softly, softly.

Just as you would (manually) look at an email address quickly and say "does it have an @ symbol in the middle", do the same thing with an easier regular expression than a full, detailed match. Capture the bit before the @ (the user name) into one variable, and the bit after the @ (the domain name) into another. And only then check each of them to see if it matches a (much simpler) pattern.
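Here's a minimal sketch of that two-step idea. The names (SPLIT_AT, USER_OK, DOMAIN_OK, looks_like_email) and the character rules in the two simple patterns are my own assumptions for illustration, not a complete definition of a valid email address:

```python
import re

# Step 1: a loose pattern that only asks "is there an @ in the middle?"
SPLIT_AT = re.compile(r"^\s*(\S+)@(\S+)\s*$")

# Step 2: deliberately simple checks for each half (illustrative rules only)
USER_OK = re.compile(r"^[A-Za-z0-9._+-]+$")
DOMAIN_OK = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def looks_like_email(text):
    """Return True if text passes the loose split and both simple checks."""
    found = SPLIT_AT.match(text)
    if found is None:
        return False
    user, domain = found.groups()
    return bool(USER_OK.match(user)) and bool(DOMAIN_OK.match(domain))
```

Each of the three patterns is short enough to test and debug on its own, and if the rules for the user name change, only USER_OK needs touching.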


From last week's Python course - an example [full source]:

import re

eare = re.compile(r"^\s*(\S+)@(\S+)\s*$")
emmas_bits = eare.findall(emma)  # emma holds the address to check
domain_els = emmas_bits[0][1].split(".")


The pattern breaks the email address down into two chunks, which findall stores in the emmas_bits collection as a list holding one (user name, domain name) pair. Then the split separates the elements of the domain name at the "." characters. Much easier to read and write than trying to do it all in one!
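One thing to watch: findall gives back an empty list if the address doesn't match at all, so indexing element [0] would then raise an IndexError. A small sketch of a guard follows; the function name domain_elements is hypothetical and not part of the course example:

```python
import re

eare = re.compile(r"^\s*(\S+)@(\S+)\s*$")


def domain_elements(address):
    """Return the dot-separated domain parts, or None if there is no match."""
    bits = eare.findall(address)
    if not bits:  # findall returns [] when the pattern does not match
        return None
    user, domain = bits[0]
    return domain.split(".")
```

Returning None for a non-match is just one choice; raising an exception would work equally well if a bad address should stop the program.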