Regular Expression Myths

Archive - Originally posted on "The Horse's Mouth" - 2010-06-13 04:09:49 - Graham Ellis

Does this look good to you?
  /.*\S+[@]\S+.*/

It shows some regular expression myths that I would like to explode!

Myth 1. If you want to match a specific character, you must put it in square brackets.

WRONG ... Square brackets define a character class - a set of characters, any one of which may match at that position. If you're looking to match just a single specific character, you can simply write it without the square brackets. A word of caution ... there are a few characters (such as . * + and ?) which need \ protection to make sure they are taken literally outside []s, but which have no special significance within the []s.
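
As a rough illustration in Python (the sample strings, including the address graham@example.com, are made up for this sketch and do not come from the original post), a bracketed and an unbracketed @ find the same text, and a dot needs its backslash only outside the brackets:

```python
import re

# Hypothetical sample text, invented for illustration only.
sample = "mail graham@example.com now"

# A literal character needs no square brackets: both patterns find the same text.
bracketed = re.search(r"\S+[@]\S+", sample)
plain = re.search(r"\S+@\S+", sample)
same_match = bracketed.group(0) == plain.group(0)

# A dot is special outside brackets, so it needs a backslash there ...
escaped_dot = re.search(r"3\.14", "pi is 3.14").group(0)
# ... but inside brackets it is taken literally without one.
bracket_dot = re.search(r"3[.]14", "pi is 3.14").group(0)
# Left unescaped, the dot stands for any character, so "3x14" matches as well.
loose_dot = re.search(r"3.14", "3x14") is not None

print(same_match, escaped_dot, bracket_dot, loose_dot)
```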

Myth 2. If you want to match something in the middle of a string, you should start and end your regular expression with ".*" - i.e. anything, then (pattern), then anything.

WRONG ... regular expressions match anywhere within a string, so the .* on the beginning and the end are redundant. Two exceptions, however ... (i) in Python, the match method only looks at the beginning of a string, so if you're using it to look in the middle of a string, you'll need the leading .* (or you can use the search method instead) and (ii) if you are capturing the string that matches - using capture parentheses for example - a leading .* will select a different match for you. Because .* is greedy, it'll select the last match in your incoming string rather than the first match.
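
Both exceptions can be seen in a short Python sketch. The sample string and its one-letter names are assumptions chosen to keep the output easy to read:

```python
import re

# Hypothetical sample text with three candidate matches.
text = "ids: a@x then b@y then c@z"

# search() looks anywhere in the string, so no surrounding .* is needed.
found = re.search(r"\S+@\S+", text).group(0)

# match() is anchored at the start of the string, so the bare pattern fails ...
anchored = re.match(r"\S+@\S+", text)
# ... and a leading .* lets it succeed.
with_prefix = re.match(r".*\S+@\S+", text)

# With a capture group, a greedy leading .* changes which match is captured.
first = re.search(r"(\S+@\S+)", text).group(1)
last = re.search(r".*(\S+@\S+)", text).group(1)

print(found, anchored, with_prefix is not None, first, last)
```

Note that the greedy .* gives back as little as it can, so with longer names it would also swallow all but the final character before the @ of the last match. If you want the last match whole, it is safer to make the pattern say where that match starts, for example with a \s before the capture group.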

Myth 3. A "." matches any character at all.

WRONG ... by default, a "." does NOT match a newline character. This only makes a difference if you're matching against a string that may contain multiple lines of text, and this very slight restriction is applied by default so that you can safely match within a single record using .* even if you have multiple records in a long string. "Single line mode" - an s modifier in Perl, and re.DOTALL in Python - allows you to force a dot to truly match any character, including a newline!
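
A small Python sketch of the difference, using a made-up two-record string:

```python
import re

# Hypothetical two-line input, one record per line.
records = "first record\nsecond record"

# By default .* stops at the newline, so the match stays within one record.
default_match = re.search(r"first.*", records).group(0)

# With re.DOTALL the dot also matches the newline, so .* runs to the end of the string.
dotall_match = re.search(r"first.*", records, re.DOTALL).group(0)

print(repr(default_match))
print(repr(dotall_match))
```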

We cover regular expressions on almost all of our courses [Schedule]. That's Perl, PHP, Python, Tcl, Ruby, ... they're also used and briefly covered on MySQL, Apache httpd (Linux Web Server, and Deploying LAMP), and we have a separate one-day regular expression course too, which is suitable for skilled programmers in any of the areas I have mentioned who wish to take their regular expressions further. Regular expression engines are also available in C and Java ... though we only cover them by request during courses on those subjects. Lua's pattern matching is very similar to regular expressions (and you can learn a lot from one about the other), but we do not mix the training - if you want to learn about Lua patterns, come on a Lua Course.


Illustration - course delegates. This article was inspired by the gentleman on the left of the picture, who had significant data to comb through and with whom I had long, fascinating and wide ranging discussions on regular expressions.