
Matching within multiline strings, and ignoring case in regular expressions

Archive - Originally posted on "The Horse's Mouth" - 2006-11-25 05:48:56 - Graham Ellis

Regular expressions are powerful matching tools, and you can specify almost anything within them. But certain facilities naturally apply to the regular expression as a whole rather than to parts of the match, and these are specified in a different way in each language / implementation.

For example, in what is commonly known as multiline mode, you may want ^ and $ to match not only at the start / end of the string as a whole, but also at embedded newlines. You can specify multiline mode as follows:

In Tcl, using the -lineanchor option
In Perl, with the /m modifier on the end of your regex
In Python, by adding re.M or re.MULTILINE to the flags of your compile

Here's an example, in Tcl, looking for embedded lines containing just ABC:

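# Two sample strings: only the first contains a line that is exactly "ABC"
set samples [list "Hello world\nABC\nThis matches" \
        "Another test\nABCD\nNo match" ]

# -lineanchor lets ^ and $ match at embedded newlines; prints 1 then 0
foreach sample $samples {
    puts [regexp -lineanchor {^ABC$} $sample]
}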
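For comparison, here is a small sketch (the variable names are illustrative) that runs the same pattern with and without -lineanchor. Without the option, ^ and $ anchor only at the start and end of the whole string, so the embedded "ABC" line is not found:

```tcl
set sample "Hello world\nABC\nThis matches"

# Without -lineanchor, ^ and $ refer to the whole string: no match
set plain [regexp {^ABC$} $sample]

# With -lineanchor, ^ and $ also match just after / before each newline
set anchored [regexp -lineanchor {^ABC$} $sample]

puts "plain: $plain, lineanchor: $anchored"
```

This should print "plain: 0, lineanchor: 1".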

Other facilities often added onto your regular expression as modifiers include:

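a) The ability to have "." (the dot) match any character at all, including the newline character, which Perl and Python exclude by default. This is sometimes known as single line mode (Perl) or DOTALL mode (Python); in Tcl the dot already matches newline unless you ask otherwise.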

In Tcl, leave off the -linestop option
In Perl, add /s
In Python, add re.DOTALL (or re.S) to the flags of your compile
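A minimal Tcl sketch of the dot's behaviour (the sample text is made up for illustration): by default "." matches the embedded newline, and adding -linestop stops it doing so:

```tcl
set text "start\nend"

# Default Tcl behaviour: "." can match the embedded newline
set default_match [regexp {start.end} $text]

# With -linestop, "." (and bracket expressions) no longer match newline
set linestop_match [regexp -linestop {start.end} $text]

puts "default: $default_match, linestop: $linestop_match"
```

This should print "default: 1, linestop: 0".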

b) The ability to ignore case in the match

In Perl, /i
In Python, re.I or re.IGNORECASE
In Tcl, start the regex with the embedded option (?i), or use the -nocase switch
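A short Tcl sketch comparing the embedded (?i) option, which has to come at the very start of the regex, with the regexp command's -nocase switch (the sample string is illustrative):

```tcl
set phrase "Hello World"

# Embedded option at the very start of the regex
set embedded [regexp {(?i)hello world} $phrase]

# Equivalent command switch
set switched [regexp -nocase {hello world} $phrase]

# Neither: the match is case sensitive and fails
set strict [regexp {hello world} $phrase]

puts "embedded: $embedded, -nocase: $switched, strict: $strict"
```

This should print "embedded: 1, -nocase: 1, strict: 0".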

c) The ability to add white space and comments to your expression, to make it easier to read

In Perl, /x
In Python, re.VERBOSE
In Tcl, start the regex with the embedded option (?x), or use the -expanded switch
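Here is a sketch of Tcl's expanded syntax, using a date pattern chosen purely for illustration. In this mode white space is ignored and "#" starts a comment running to the end of the line; the -expanded switch gives the same effect without the embedded option:

```tcl
# (?x) must be the very first thing in the regex
set pattern {(?x)
    ^ ([0-9]{4})    # year
    - ([0-9]{2})    # month
    - ([0-9]{2}) $  # day
}

set date "2006-11-25"

# "->" is a conventional throwaway variable for the whole match
if {[regexp $pattern $date -> year month day]} {
    puts "year=$year month=$month day=$day"
}

# The -expanded switch does the same job without the embedded (?x)
set compact [regexp -expanded { ^ [0-9]{4} - [0-9]{2} - [0-9]{2} $ } $date]
puts "compact: $compact"
```

This should print "year=2006 month=11 day=25" followed by "compact: 1".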