
Splitting data reading code from data processing code - Ruby

Archive - Originally posted on "The Horse's Mouth" - 2011-02-04 09:19:14 - Graham Ellis

An iterator (the Ruby counterpart of a generator in Python) is a function that hands back its results one at a time, as it calculates them. It does not build them up into a larger structure and return them all at once when the function completes. So where you have a big flow of incoming data, you can handle each item as it arrives rather than setting up massive arrays / lists.
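To make the contrast concrete, here is a minimal sketch. The method names `squares` and `squares_array` are hypothetical and chosen only for illustration. The first method hands each value to the caller as soon as it is computed. The second builds the whole array before returning anything. Ruby's `Enumerator` is the closest match to a Python generator, because you can pull values from it one at a time with `next`.

```ruby
# Iterator style: each square is yielded as soon as it is computed,
# so no array is ever built.
def squares(limit)
  n = 1
  while n <= limit
    yield n * n
    n += 1
  end
end

# Array style: the whole result is built before anything is returned.
def squares_array(limit)
  (1..limit).map { |n| n * n }
end

squares(3) { |sq| puts sq }      # prints 1, 4 and 9 on separate lines
p squares_array(3)               # → [1, 4, 9]

# An Enumerator behaves like a Python generator: values are produced on
# demand, so even this endless sequence is safe to pull from.
gen = Enumerator.new do |y|
  n = 1
  loop do
    y << n * n
    n += 1
  end
end

p gen.next       # → 1
p gen.next       # → 4
p gen.first(4)   # → [1, 4, 9, 16]
```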

Ruby makes very heavy use of iterators. You define your iterator method to yield each result as it gets it. You then call the iterator and give it a block of code as an extra parameter, and that block is run once for each value yielded. Here's a complete example:
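Here is a sketch of the two halves of that arrangement. It uses a hypothetical `each_word` iterator rather than anything from the original post. `yield` passes a value to whatever block the caller attached, and `block_given?` tells the method whether a block was supplied at all. Returning `enum_for` when no block is attached is a common Ruby idiom that lets the same method be used without a block too. The second method makes the "extra parameter" explicit with `&block`.

```ruby
# Hypothetical iterator: yields each whitespace-separated word in turn.
def each_word(text)
  # With no block attached, hand back an Enumerator instead of failing.
  return enum_for(:each_word, text) unless block_given?

  text.split.each { |word| yield word }
end

# The same idea with the block captured as an explicit (&block) parameter.
def each_word_explicit(text, &block)
  text.split.each(&block)
end

each_word("ruby perl python") { |w| puts w.upcase }
each_word_explicit("tcl lua") { |w| puts w.length }
p each_word("a b c").to_a   # → ["a", "b", "c"]
```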

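def employees(filename)
   # Open the file with a block so it is closed automatically when done
   File.open(filename) do |fh|
      # Read one line at a time and pass each line to the caller's block
      while staff = fh.gets
         yield staff
      end
   end
end
 
def listskills(info)
   # First field is the person's name, the remaining fields are skills
   els = info.split
   name = els.shift
   puts "#{name} knows #{els.join(", ")}"
end
 
employees("../data/requests.xyz") {|info| listskills(info)}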
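Ruby's standard library already supplies iterators of exactly this shape. `File.foreach` opens a file, yields it one line at a time, and closes it again. The sketch below assumes each line of the data file holds a name followed by that person's skills, separated by whitespace. The post doesn't show the file's format, so that layout is an assumption. The sketch also uses a temporary file with made-up sample records so it runs on its own. `skills_line` is a hypothetical variant of `listskills` that returns its string rather than printing it.

```ruby
require "tempfile"

# Variant of listskills that returns the report line instead of printing it.
def skills_line(info)
  name, *skills = info.split
  "#{name} knows #{skills.join(", ")}"
end

# Made-up sample data, assuming one "name skill skill ..." record per line.
Tempfile.create("staff") do |f|
  f.puts "Alice ruby perl"
  f.puts "Bob python tcl"
  f.flush

  # File.foreach reads and yields one line at a time, then closes the file.
  File.foreach(f.path) { |line| puts skills_line(line) }
end
# Prints:
#   Alice knows ruby, perl
#   Bob knows python, tcl
```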


The exciting thing here is that you have split the code of your data reading phase (in the method "employees") from the code of your data processing phase (in the method "listskills"). Each method performs its own logical task, yet the two run interleaved: each line is read and then processed before the next line is read, so the whole file never has to be held in memory. It would have been easy to write all of this as a single loop. But then the reading and processing elements would be forever intertwined, and there would be no easy way of reusing either of them in another program.
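To illustrate the reuse point, here is a sketch in which one reader is paired with two different processors. It is a variation built on assumptions rather than the course code. `each_record`, `skills_report` and `skill_count` are hypothetical names. The reader takes any IO-like object instead of a filename: a `File`, `$stdin`, or a `StringIO` as here. That makes it easy to feed the reader from in-memory test data.

```ruby
require "stringio"

# Reader: yields each line from any IO-like source (File, StringIO, $stdin).
def each_record(io)
  while line = io.gets
    yield line
  end
end

# Processor 1: a skills report, as in listskills.
def skills_report(info)
  name, *skills = info.split
  "#{name} knows #{skills.join(", ")}"
end

# Processor 2: a different task reusing the same reader.
def skill_count(info)
  name, *skills = info.split
  [name, skills.size]
end

data = "Alice ruby perl\nBob python tcl java\n"

each_record(StringIO.new(data)) { |info| puts skills_report(info) }

counts = {}
each_record(StringIO.new(data)) do |info|
  name, n = skill_count(info)
  counts[name] = n
end
p counts   # prints the hash of name => number of skills
```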

My example above used a {} block to specify the operations to be performed on each value yielded by the iterator. I could equally have used a do ... end block. There's an example of the do ... end form [here], and the example code above is in our course resources [here].
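For comparison, here is the same kind of call written with a `do ... end` block. The reader below, `each_staff`, is a hypothetical stand-in that iterates over an in-memory array instead of a file. The two block forms differ in precedence: `{}` binds to the nearest method call, while `do ... end` binds to the outermost one. That only matters when a method's arguments aren't in parentheses. Otherwise the choice is one of style, with `{}` conventionally used for one-line blocks and `do ... end` for multi-line ones.

```ruby
# Hypothetical reader over an array of "name skill skill ..." strings.
def each_staff(records)
  records.each { |staff| yield staff }
end

records = ["Alice ruby perl", "Bob python tcl"]

# A multi-line do ... end block does the same job as the {} block.
each_staff(records) do |info|
  name, *skills = info.split
  puts "#{name} knows #{skills.join(", ")}"
end
```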