Regular Expression Myths
Archive - Originally posted on "The Horse's Mouth" - 2010-06-13 04:09:49 - Graham Ellis
Does this look good to you?
/.*\S+[@]\S+.*/
It shows some regular expression myths that I would like to explode!
Myth 1. If you want to match a specific character, you must put it in square brackets.
WRONG ... Square brackets define a character class - a set of alternatives for a single character position. If you're looking to match just one specific character, you can simply write it without the square brackets. A word of caution ... there are a few characters (such as . * + and ?) which need \ protection to make sure they are taken literally outside []s, but which have no special significance within the []s.
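As a rough illustration in Python (the sample text and address are made up for this sketch), the bracketed and unbracketed forms of a literal character find the same thing, while a character like "." behaves differently inside and outside []s:

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "contact graham@example.com today"

# A literal "@" matches itself with or without square brackets.
with_brackets = re.search(r"\S+[@]\S+", text)
without_brackets = re.search(r"\S+@\S+", text)
same_result = with_brackets.group() == without_brackets.group()

# Outside []s a "." means "any character", so it needs a backslash
# to be taken literally; inside []s it is already literal.
unescaped_dot = re.search(r"3.14", "3x14")    # matches: "." accepts the "x"
escaped_dot = re.search(r"3\.14", "3x14")     # no match: needs a real "."
bracketed_dot = re.search(r"3[.]14", "3x14")  # no match: "." is literal here

print(same_result, unescaped_dot is not None, escaped_dot, bracketed_dot)
```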
Myth 2. If you want to match something in the middle of a string, you should start and end your regular expression with ".*" - i.e. anything, then (pattern), then anything.
WRONG ... regular expressions match anywhere within a string, so the .* on the beginning and the end are redundant. Two exceptions, however ... (i) in Python, the match method only matches at the beginning of a string, so if you're using it to look in the middle of a string, you'll need the leading .* (or you can use the search method instead), and (ii) if you are capturing the string that matches - using capture parentheses, for example - a leading .* will select a different match for you. Because .* is greedy, it'll select the last match in your incoming string rather than the first match.
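Both exceptions can be seen in a short Python sketch. The sample line and its addresses are invented for the illustration; single-letter names are used so that the greedy .* has nothing to eat out of the front of the captured address.

```python
import re

# Hypothetical sample line, invented for this illustration.
line = "from a@x.org to b@y.org"

# search() looks anywhere in the string; match() only at the start.
found = re.search(r"\S+@\S+", line)          # finds "a@x.org"
anchored = re.match(r"\S+@\S+", line)        # None: line starts with "from"
padded = re.match(r".*\S+@\S+", line)        # the leading .* lets match() succeed

# With capture parentheses, a greedy leading .* pushes the capture
# to the last candidate in the string rather than the first.
first = re.search(r"(\S+@\S+)", line).group(1)
last = re.search(r".*(\S+@\S+)", line).group(1)

print(found.group(), anchored, padded is not None, first, last)
```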
Myth 3. A "." matches any character at all.
WRONG ... by default, a "." does NOT match a new line character. This only makes a difference if you're matching against a string that may contain multiple lines of text, and this very slight restriction is applied by default so that you can safely match within a single record using .* even if you have multiple records in a long string. "Single line mode" - the s modifier in Perl, and re.DOTALL in Python - allows you to force a dot to truly match any character, including a new line!
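A minimal Python sketch of the difference, using a made-up two-record string:

```python
import re

# Hypothetical two-record string, invented for this illustration.
records = "first record\nsecond record"

# By default .* stops at the end of the line, so it stays inside one record.
default = re.search(r"first.*", records).group()

# With re.DOTALL the dot also matches the new line, so .* runs on to the end.
dotall = re.search(r"first.*", records, re.DOTALL).group()

print(repr(default))
print(repr(dotall))
```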

Illustration - course delegates. This article was inspired by the gentleman on the left of the picture, who had significant data to comb through and with whom I had long, fascinating and wide ranging discussions on regular expressions.