Regular Expressions for the petrified - in Ruby
Archive - Originally posted on "The Horse's Mouth" - 2015-06-03 10:53:47 - Graham Ellis
Regular Expressions ... frighten ... newcomers with their apparent perverseness and complexity. But they need not - regular expressions are made up of just a handful of types of element, and once you realise this, they become easy!
The background is that you want to ask whether a string of text looks like a particular pattern.
You describe the pattern, from the left, by specifying a series of:
• specific characters that must be matched or
• character groups (where any one character from a list must be matched)
and each of these specific characters or groups is followed by
• a count of the number of times that character or group is to be matched.
For example:
[A-Z]{1,2}[0-9]{1}
means "1 or 2 letters between A and Z followed by one digit between 0 and 9"
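To see that pattern at work in Ruby, here is a minimal sketch. The constant name, the helper method and the sample strings are all invented for illustration. Note that the pattern as written is happy to find its match anywhere within the string; it does not insist that the whole string fits.

```ruby
# The pattern from the text, wrapped in Ruby's /.../ regular expression literal.
LETTERS_THEN_DIGIT = /[A-Z]{1,2}[0-9]{1}/

# Hypothetical helper: true if the pattern is found anywhere in the text.
def letters_then_digit?(text)
  LETTERS_THEN_DIGIT.match?(text)
end

# Sample strings invented for this sketch.
["SN1", "B4", "sn1", "1234"].each do |text|
  verdict = letters_then_digit?(text) ? "matches" : "does not match"
  puts "#{text} #{verdict}"
end
```

The lower case "sn1" is rejected because the character group lists capital letters only, and "1234" is rejected because there is no letter in front of a digit.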
Here's a full example to match a British Postcode:
[A-Z]{1,2}[0-9]{1}[0-9A-Z]{0,1} {1,}[0-9]{1}[A-Z]{2}
So that's
• one or two letters
• a digit
• possibly another digit or letter
• some spaces
• a digit
• two letters
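The six elements listed above can be tried out in Ruby along these lines. This is only a sketch: the names and the sample strings are assumptions made for the illustration, and the pattern checks the shape of the text rather than whether the postcode really exists.

```ruby
# The postcode pattern from the text, unchanged, in a Ruby regex literal.
POSTCODE = /[A-Z]{1,2}[0-9]{1}[0-9A-Z]{0,1} {1,}[0-9]{1}[A-Z]{2}/

# Hypothetical helper: does the text contain something shaped like a postcode?
def looks_like_postcode?(text)
  POSTCODE.match?(text)
end

# Sample strings invented for this sketch.
["SN12 7NY", "M1 1AA", "W1A 0AX", "SN127NY", "12345"].each do |text|
  verdict = looks_like_postcode?(text) ? "looks like a postcode" : "does not look like a postcode"
  puts "#{text}: #{verdict}"
end
```

"SN127NY" fails because the pattern insists on at least one space between the two halves.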
Alas ...
• that's getting longwinded and there are shortenings that make it more compact, but more complex-looking.
[0-9] can become \d
{1,} can become +
{1} can be left out as it's the default
{0,1} can become ?
and so on.
• having matched, there's usually a requirement in subsequent program lines to make use of the string that was matched, or the parts of it that matched - and there needs to be a mechanism (round brackets are used) to indicate the groups to capture
• the regular expression needs some sort of wrapping within the language to indicate that it is to be treated as a pattern for matching rather than in other ways.
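Here is a sketch of how those three points might come together in Ruby. The postcode pattern is written out in its long form and again using the shortenings. A third version adds round brackets so that the two halves of the postcode can be picked out afterwards. The slashes are the wrapping that tells Ruby this is a pattern. All the names and the sample text are invented for the illustration.

```ruby
# Long form from the text, and the same pattern using the shortenings.
LONG_FORM  = /[A-Z]{1,2}[0-9]{1}[0-9A-Z]{0,1} {1,}[0-9]{1}[A-Z]{2}/
SHORT_FORM = /[A-Z]{1,2}\d[\dA-Z]? +\d[A-Z]{2}/

# Round brackets added to capture the part before and the part after the spaces.
GROUPED = /([A-Z]{1,2}\d[\dA-Z]?) +(\d[A-Z]{2})/

# Hypothetical helper: returns [whole match, first half, second half],
# or nil if nothing in the text looks like a postcode.
def postcode_parts(text)
  found = GROUPED.match(text)
  found && [found[0], found[1], found[2]]
end

# Both forms should agree on every sample.
["SN12 7NY", "M1 1AA", "12345"].each do |text|
  puts "#{text}: long=#{LONG_FORM.match?(text)} short=#{SHORT_FORM.match?(text)}"
end

whole, first_half, second_half = postcode_parts("Please send it to SN12 7NY today")
puts "matched: #{whole}"
puts "first half: #{first_half}"
puts "second half: #{second_half}"
```

The match object behaves rather like an array: element 0 is everything that matched, and elements 1, 2 and so on are the captured groups, counted by their opening bracket from the left.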
As a first illustration of regular expressions to match a postcode in Ruby, there's an example [here] from last week's course. In that example, I've only applied minimal optimisation to keep it clean. A further example [here] makes use of the shortenings above, and makes use of alternative delimiters and the ignore-case flag too.
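For a flavour of alternative delimiters and the ignore-case flag, here is a separate minimal sketch. It is not the linked course example, and the names and sample strings are invented. Ruby lets you write a pattern between %r{ and } instead of between slashes, which is handy when the pattern itself contains slashes, and a trailing i asks for letters to be matched regardless of case.

```ruby
# The shortened postcode pattern with the ignore-case flag, in slashes ...
SLASH_FORM   = /[a-z]{1,2}\d[\da-z]? +\d[a-z]{2}/i

# ... and the same pattern written with the alternative %r{...} delimiters.
PERCENT_FORM = %r{[a-z]{1,2}\d[\da-z]? +\d[a-z]{2}}i

# Hypothetical helper: postcode check that accepts upper, lower or mixed case.
def postcode_any_case?(text)
  PERCENT_FORM.match?(text)
end

# Sample strings invented for this sketch.
["SN12 7NY", "sn12 7ny", "Sn12 7nY", "12345"].each do |text|
  verdict = postcode_any_case?(text) ? "matches" : "does not match"
  puts "#{text} #{verdict}"
end
```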