Programming with random numbers - yet re-using the same values for testing

Archive - Originally posted on "The Horse's Mouth" - 2016-06-22 06:50:30 - Graham Ellis

There are certain programming jobs where you want to simulate / model random occurrences - for example to shuffle a pack of cards, or (a manpower planning example from many years back) to work out whether people leave an organisation during a year. It's no good working with percentages, as you'll end up with a third of a person here and half a person there when you really want to look at what the organisation might look like. And you certainly want random numbers if you're going to run a series of simulations to see how consistent your results are ... this is done in (for example) weather forecasting, where random noise is introduced into readings which are likely to be of coarse granularity / limited accuracy.
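As a rough sketch of the manpower example (the headcount of 40 and the 12% annual leaving rate are made-up figures, and simulate_year is a hypothetical helper name), deciding each person's fate individually always gives a whole number of leavers, where the percentage sum gives a fraction:

```python
import random

def simulate_year(headcount, leave_probability, rng):
    """Count how many of `headcount` people leave, deciding each one individually."""
    leavers = 0
    for _ in range(headcount):
        # rng.random() returns a float in [0.0, 1.0)
        if rng.random() < leave_probability:
            leavers += 1
    return leavers

staff = 40    # assumed headcount, for illustration only
rate = 0.12   # assumed chance that any one person leaves during the year

# The percentage approach gives a fractional number of people
print(staff * rate)

# Five simulated years: each result is a whole number, and they vary run to run
rng = random.Random()
print([simulate_year(staff, rate, rng) for _ in range(5)])
```

Running the simulation many times and looking at the spread of results is what tells you how consistent the outcome is.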

One of the issues you'll have with "random" values is that they're not really random. For all intents and purposes, under most circumstances, the built-in functions / modules within programming languages suffice, but if you're concerned, please do read the individual manuals carefully. They are, typically, pseudo-random: they return numbers from a sequence that looks pretty random, starting from a "seed" point which will differ every time you run the program - often the start point is based on the system clock.
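To see what "a sequence starting from a seed" means in practice, here is a minimal sketch using Python's random module. The seed values are arbitrary, and first_values is just a name chosen for this illustration:

```python
import random

def first_values(seed_value, count=5):
    """Return the first `count` dice rolls produced after seeding with seed_value."""
    rng = random.Random(seed_value)   # a private generator, seeded explicitly
    return [rng.randint(1, 6) for _ in range(count)]

run_a = first_values(1466578230)
run_b = first_values(1466578230)   # same seed again
run_c = first_values(1466578231)   # a different seed

print(run_a == run_b)   # the same seed replays the same sequence
print(run_a)
print(run_c)
```

The exact numbers you see may differ between Python versions, but within one version the same seed should always give the same sequence.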

A second issue you'll find when using random numbers is how to test your program. You'll get one set of results and perhaps spot what you think is a bug; you'll fix it and ask "does it work now?". And a test may, indeed, work correctly. But you have to ask "did I fix the problem, or did the test just happen not to use that logic because of the sequence of random numbers used?". Fortunately, in most languages you can set your own seed, so that you can record the sequence start point and replay the same sequence if the need arises.

[here] is an example, written in Python, from yesterday's Python programming course.

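  from random import seed
  from sys import argv
  from time import time
  
  # Default seed: the current time as a whole number of seconds
  now = int(time())
  
  # Are we replaying an old seed given on the command line?
  if len(argv) > 1:
    now = int(argv[1])
    print("replay")
  
  # Set the seed and notify the user
  seed(now)
  print(now)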


The seed normally used is taken from the system clock (as an integer, so we don't get rounding issues when reseeding) and it's reported back. Normal runs of the program differ second by second; add the old time value (which was reported back) to the end of the Python command if you want to re-run a previous set of data.
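The same record-and-replay idea can be wrapped up so that it is easy to check. What follows is only a sketch under some assumptions: choose_seed and shuffled_pack are hypothetical helper names, "cards.py" is a made-up script name, and the argument list is passed in directly (rather than read from sys.argv) so the replay can be demonstrated in one run.

```python
import random
import time

def choose_seed(args):
    """Return (seed, replaying) from a command-line style argument list."""
    if len(args) > 1:
        return int(args[1]), True      # an old seed was supplied: replay it
    return int(time.time()), False     # otherwise use whole seconds from the clock

def shuffled_pack(seed_value):
    """Shuffle a 52 card pack (cards numbered 0 to 51) using the given seed."""
    rng = random.Random(seed_value)
    pack = list(range(52))
    rng.shuffle(pack)
    return pack

# Simulate running "cards.py 1466578230", i.e. replaying a recorded seed
seed_value, replaying = choose_seed(["cards.py", "1466578230"])
print(seed_value)

first = shuffled_pack(seed_value)
second = shuffled_pack(seed_value)
print(first == second)   # both shuffles come out in the same order
```

Using a separate random.Random object rather than the module-level seed() is a design choice here; it keeps the replayed sequence from being disturbed by any other code that also draws random numbers.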