Regular expressions made easy - building from components

Archive - Originally posted on "The Horse's Mouth" - 2007-08-16 20:25:39 - Graham Ellis

There seems to be a certain macho desire in many programmers' minds to write a single complicated regular expression to match against an input line, ignoring the structured approach that everyone accepts quite cheerfully in almost every other case. Have a look at this Python line:

wholeline = r"\d\d-...-\d\d\d\d\s+(\d\d):(\d\d):(\d\d.\d\d),\s+(-?\d+\.\d+),\s+(-?\d+\.\d+),(-?\d+\.\d+),\s+(-?\d+\.\d+),(-?\d+\.\d+),\s+(-?\d+\.\d+)"

Impressive, isn't it?
Yes
Easy to follow, isn't it?
No!

Much better to build it up from a series of components:

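date = r"\d\d-...-\d\d\d\d"          # two digits, any three characters, four digits
time = r"(\d\d):(\d\d):(\d\d.\d\d)"  # three captured time fields
whitespace = r"\s+"                  # one or more whitespace characters
floater = r"(-?\d+\.\d+)"            # captured decimal number, optionally negative
# Parentheses let the expression span several lines without backslashes
wholeline = (date + whitespace + time + "," + whitespace +
  floater + "," + whitespace + floater + "," + floater +
  "," + whitespace + floater + "," + floater + "," +
  whitespace + floater)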
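
As a minimal sketch of how the assembled pattern might then be used, the snippet below compiles it and pulls the captured fields out of one line. The components are copied from above; the sample line and the names pattern, sample, match and readings are assumptions invented for illustration, not taken from the course log file.

```python
import re

# Components copied from the post
date = r"\d\d-...-\d\d\d\d"
time = r"(\d\d):(\d\d):(\d\d.\d\d)"
whitespace = r"\s+"
floater = r"(-?\d+\.\d+)"
wholeline = (date + whitespace + time + "," + whitespace +
             floater + "," + whitespace + floater + "," + floater +
             "," + whitespace + floater + "," + floater + "," +
             whitespace + floater)

pattern = re.compile(wholeline)

# Hypothetical input line, made up to fit the pattern
sample = "16-Aug-2007 20:25:39.50, 1.25, 3.50,-0.75, 10.00,11.25, 7.00"

match = pattern.match(sample)
if match:
    # Groups 1 to 3 come from the time component
    hours, minutes, seconds = match.group(1, 2, 3)
    # The remaining six groups come from the six uses of floater
    readings = [float(value) for value in match.groups()[3:]]
    print(hours, minutes, seconds, readings)
```

Because floater is defined once, a change to the number format (allowing an exponent, say) would need making in one place rather than six.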


These examples are from the Python Course I have just concluded - the full example is here - where a log file was to be analysed and a short report generated to highlight any change of more than 1%, from one line of the data to the next, in any of the data columns.
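
The 1% comparison might be sketched along the following lines. This is not the course solution: the function name changed_columns, the threshold parameter, the handling of a zero reading and the rows data are all assumptions made up for illustration, with each row standing for the list of readings extracted from one log line.

```python
def changed_columns(previous, current, threshold=0.01):
    """Return the indexes of columns whose relative change exceeds threshold."""
    flagged = []
    for index, (old, new) in enumerate(zip(previous, current)):
        if old == 0:
            # A percentage change from zero is undefined, so this sketch
            # reports any movement away from zero (an assumption)
            if new != 0:
                flagged.append(index)
        elif abs(new - old) / abs(old) > threshold:
            flagged.append(index)
    return flagged

# Hypothetical readings, one list per log line
rows = [
    [100.0, 50.0, -20.0],
    [100.5, 50.0, -20.0],  # column 0 moves by 0.5%: not reported
    [100.5, 52.0, -20.0],  # column 1 moves by 4%: reported
]

for line_number in range(1, len(rows)):
    for column in changed_columns(rows[line_number - 1], rows[line_number]):
        print("line", line_number + 1, "column", column,
              "changed from", rows[line_number - 1][column],
              "to", rows[line_number][column])
```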