Global Regular Expression matching in Ruby (using scan)

Archive - Originally posted on "The Horse's Mouth" - 2015-01-08 07:13:16 - Graham Ellis

Regular expression 'engines' start at the left of the input string they're matching against, and counts (such as * and +) are greedy by default, so they return the leftmost match and make it as long as they can - which is what users want in most cases.

However, sometimes you may want to work through all (non-overlapping) matches in a string - for example to pick up a series of email addresses or URLs from a line or block of text. In Ruby, you can achieve that through the scan method on a string. A single match:
  if myStringRecord =~ /\s*(\S{1,})@(\S+)\s*/
or changing that to a multiple or global match:
  myStringRecord.scan(/\s*(\S{1,})@(\S+)\s*/) do |tom,dick|
which is a loop that runs once per match, setting tom to the first capture group and dick to the second.
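
As a minimal, runnable sketch of that loop: the pattern is the one shown above, while the helper name extract_addresses, the constant ADDRESS and the sample line are made up for illustration. Note that \S+ will also pick up any punctuation stuck to an address, so real input may need a tighter pattern.

```ruby
# Pattern taken from the article; the constant name is an assumption.
ADDRESS = /\s*(\S{1,})@(\S+)\s*/

# Hypothetical helper: collects a [user, host] pair for every match in text.
def extract_addresses(text)
  pairs = []
  text.scan(ADDRESS) do |user, host|
    # The block runs once per non-overlapping match, left to right.
    pairs << [user, host]
  end
  pairs
end

# Made-up sample input.
line = "contact alice@example.com or bob@example.org today"

extract_addresses(line).each do |user, host|
  puts "#{user} at #{host}"
end

# Without a block, scan returns the capture groups as an array of arrays.
p line.scan(ADDRESS)
```

The block form suits processing each match as it is found; the array form is handy when you want all the matches in hand before doing anything with them.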

Complete example [here]. Topic covered on our Ruby Courses.

Should you wish to match the shortest rather than the longest alternative, add a ? character after each count you want to make non-greedy. The captured group is then available in $1:
  "banana" =~ /a(.*)a/
  puts $1
  "banana" =~ /a(.*?)a/
  puts $1

outputs
  nan
  n
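
Greedy versus non-greedy counts matter even more once you combine them with scan. The following is a sketch only: the helper first_capture, the constant SAMPLE and the bracketed sample text are assumptions made up for illustration.

```ruby
# Hypothetical helper: first capture group of the leftmost match, or nil.
def first_capture(text, pattern)
  match = pattern.match(text)
  match && match[1]
end

puts first_capture("banana", /a(.*)a/)    # greedy count
puts first_capture("banana", /a(.*?)a/)   # non-greedy count

# Made-up sample with two bracketed fields.
SAMPLE = "[red] and [green]"

# Greedy: .* runs on to the last ] so scan finds a single match.
p SAMPLE.scan(/\[(.*)\]/)

# Non-greedy: .*? stops at the first ] so scan finds one match per field.
p SAMPLE.scan(/\[(.*?)\]/)
```

With the greedy pattern the one capture spans everything from the first opening bracket to the last closing bracket; with the non-greedy pattern scan returns "red" and "green" as separate matches.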