Python's Generator functions
Archive - Originally posted on "The Horse's Mouth" - 2006-01-11 03:06:16 - Graham Ellis
When you write a program, you should split your code into a series of named blocks, each of which performs a single logical action. Blocks are called in different ways in different languages, and they'll be known as functions, subs, methods or procs (and perhaps other names too), but the principle is very much the same in each case.
Named blocks are usually performed one after another as a sequence of statements from within another block.
In this example, the pow2 function is performed completely before the for loop is run - the returned list being used as input to feed the variable called vv:
def pow2(upto):
    # Build the complete list of powers of 2 before returning it
    powers = []
    startat = 1
    startpower = 1
    while startpower <= upto:
        powers.append(startat)
        startat *= 2
        startpower += 1
    return powers
for vv in pow2(10):
    print(vv)
It works, and works well. But the list called "powers" is constructed completely and held in memory while the for loop runs. That's not a problem in this case with a list of ten values, but it could be a problem if the list were to contain a billion values. Writing things in this style is rather like building up a whole supply of water in a reservoir, then letting it back out as it's required. If the reservoir isn't big enough, you'll get serious flooding and damage to nearby properties.
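As a rough way of seeing the reservoir fill up, the sketch below reuses the list-building pow2 and asks Python how much storage the list object itself takes for a small and a larger request. The names small and large are just for illustration, and sys.getsizeof only counts the list's own slots, not the integers stored in them, so treat the figures as a lower bound rather than a precise measurement.

```python
import sys

def pow2(upto):
    # List-building version: everything is stored before anything is returned
    powers = []
    startat = 1
    startpower = 1
    while startpower <= upto:
        powers.append(startat)
        startat *= 2
        startpower += 1
    return powers

small = pow2(10)
large = pow2(10000)

# getsizeof reports the list's own storage only (not the integers it refers to)
small_bytes = sys.getsizeof(small)
large_bytes = sys.getsizeof(large)
print("10 values: %d bytes, 10000 values: %d bytes" % (small_bytes, large_bytes))
```

The exact numbers vary between Python versions and platforms, but the second figure grows with the length of the list.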
Python provides an alternative in generator functions. A generator function is one that does NOT run to completion when it's called. Instead, it only runs until it has a value available, at which point it yields that value back and suspends operation until it is asked for the next value, when it resumes from where it left off. Here's the program I provided above, modified to use a generator:
def pow2(upto):
    startat = 1
    startpower = 1
    while startpower <= upto:
        # Hand back one value, then pause here until the next one is wanted
        yield startat
        startat *= 2
        startpower += 1
for vv in pow2(10):
    print(vv)
The output is exactly the same as in the example above, but there is NOT an intermediate list produced, i.e. no reservoir, and no risk of flooding or damage if the reservoir spills. A better model in this case is to think of a water pipe with a tap (faucet) on the end, which can be turned on and off at will as the next element - drop of water - is required.
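To watch the tap being turned on and off, you can ask the generator for its values by hand rather than through a for loop. This is a minimal sketch using the built-in next() function; the variable names gen, first, second, third and exhausted are just for illustration.

```python
def pow2(upto):
    startat = 1
    startpower = 1
    while startpower <= upto:
        yield startat
        startat *= 2
        startpower += 1

gen = pow2(3)        # nothing inside pow2 has run yet
first = next(gen)    # runs as far as the first yield, then pauses
second = next(gen)   # resumes after the yield and runs to the next one
third = next(gen)

# Asking for a fourth value lets the while loop finish, which raises
# StopIteration - the signal a for loop uses to know when to stop
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print("%d %d %d" % (first, second, third))
```

A for loop does the same thing behind the scenes: it keeps asking for the next value and stops quietly when StopIteration is raised.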
Generators provide a very neat way of supplying data as required in applications that would otherwise use huge intermediate lists. If you've ever wondered what the difference is between range and xrange, or between readlines and xreadlines, it's that the "x" version hands over its values one at a time as they're needed, in the same way that a generator does, rather than building the whole list first. I recall being given a customer problem handling a huge (10 GB) file, and fixing his problem straight away just by adding an "x" in front of "readlines". That is the power of a generator.
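The same idea can be sketched for file handling. In current versions of Python, xreadlines is gone and looping directly over the file object gives you one line at a time, so a generator can filter a large file without ever holding all of it in memory. The function long_lines, its min_length parameter and the small sample file are all made up for this illustration, and the sketch is written for Python 3.

```python
import os
import tempfile

def long_lines(path, min_length):
    # Yield each line longer than min_length, reading one line at a time
    with open(path) as handle:
        for line in handle:   # the file object supplies lines as required
            line = line.rstrip("\n")
            if len(line) > min_length:
                yield line

# Build a small sample file to stand in for the huge one
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("short\na much longer line of text\nok\nanother fairly long line\n")
    sample_path = tmp.name

found = list(long_lines(sample_path, 10))
os.remove(sample_path)
print(found)
```

Only the sample data is small here; the loop inside long_lines would work in just the same way on a file far too large to load with readlines.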