Generating traffic for network testing
Archive - Originally posted on "The Horse's Mouth" - 2007-07-29 14:43:38 - Graham Ellis

"Why do you have a random number generator in a scientific language?" ask some newcomers to the world of programming ... "shouldn't programming be an exact science?". Well, yes and no. It turns out that a random number generator is exceedingly useful in some instances. Here's a real, live example from last week.
In order to test a computer network under load, packets of noise must be sent between two devices up to the capacity of the line. Rather than making all of the packets the same size, which would be unnatural, the packets have to be a mixture of small ones (80% of the data must be in small packets), medium-sized ones (5% of the data) and large ones (the final 15% of the data).
The first - obvious - way to generate the data is 80 small packets, 5 medium ones and 15 big ones. That's wrong - you need to adjust the proportions, increasing the number of small packets and decreasing the number of big ones, since it's the amount of data, not the number of packets, that we're looking at. But even that is going to give a distorted set of traffic, with a large batch of small packets followed by a run of medium ones and a run of large ones. The overall distribution will be correct, but the test results will be broken by the false patterns in the noise data.
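To see why the adjustment matters, here's a minimal sketch of the conversion from data shares to packet shares. The packet sizes in bytes are my own illustrative assumption - the article doesn't give any - but the principle is simply that the number of packets of each size is proportional to the data share divided by the packet size.

# Illustrative only: the packet sizes are assumed, not from the article
set sizes  {64 576 1500}      ;# small, medium, large (bytes)
set datapc {0.80 0.05 0.15}   ;# share of DATA carried by each size

# Packets needed per size are proportional to (data share / packet size)
set weights {}
set total 0.0
foreach s $sizes d $datapc {
    set w [expr {$d / $s}]
    lappend weights $w
    set total [expr {$total + $w}]
}
foreach s $sizes w $weights {
    puts [format "%5d byte packets: %.1f%% of the PACKETS" $s [expr {100.0 * $w / $total}]]
}

With those assumed sizes, the small packets end up being the overwhelming majority of the packets even though they carry only 80% of the data.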
The next approach is, indeed, to bring in random numbers. You start by working out the proportion of PACKETS of each size that are needed to generate the appropriate amounts of DATA. You then take a series of random numbers between 0 and 1 (decimal numbers!) and you compare each random number to the series of proportions - perhaps up to 0.94, you'll send out a small packet, from 0.94 to 0.97 it will be a medium packet, and above that a large packet.
That's better - MUCH better. And indeed I wrote a Tcl proc to illustrate the algorithm during last week's course. Source code.
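The course proc itself isn't reproduced here (follow the source code link), but a sketch along these lines shows the idea; the proc name is mine, while the 0.94 / 0.97 thresholds are the illustrative figures from the paragraph above.

# Sketch only - proc name and structure are illustrative, not the course code
proc pickSize {} {
    set r [expr {rand()}]        ;# uniform random number, between 0 and 1
    if {$r < 0.94} {
        return small
    } elseif {$r < 0.97} {
        return medium
    }
    return large
}

# Generate a short test sequence and tally what we got
foreach key {small medium large} { set count($key) 0 }
for {set i 0} {$i < 10000} {incr i} {
    incr count([pickSize])
}
foreach key {small medium large} {
    puts "$key: $count($key)"
}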
One of the problems with a program such as this is that it will never produce the same results twice. In one way that's good, as the traffic it represents is inherently different each time anyway, but on the other hand it means that any oddity you see as it runs won't be reproducible. So we can "seed" the random number generator - i.e. give it a known starting point, and record that starting point so that we can, if we must, re-run a particular test case. And, yes, this technically means that it isn't random at all but rather pseudorandom.
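In Tcl the seeding can be done with srand() inside expr. Here's a minimal sketch of the seed-and-record idea; using [clock seconds] as the default seed is my assumption - the 1185718234 in the output below looks like a Unix timestamp, but the article doesn't actually say how the seed was chosen.

# Sketch only: the command-line handling and the clock-based seed are assumed
if {$argc > 0} {
    set seed [lindex $argv 0]      ;# re-run a recorded test case
} else {
    set seed [clock seconds]       ;# fresh run: use the current time
}
puts $seed                         ;# report the seed so the run can be repeated
expr {srand($seed)}                ;# seed Tcl's pseudorandom number generator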
Here's the result of me running my test code:
Dorothy:~/tcl grahamellis$ tclsh pk4.tcl
1185718234
80 3 15
79 5 15
29 10 59
62 9 7 1 5 12
Dorothy:~/tcl grahamellis$
Firstly, I reported the seed. Then come the percentages of traffic in my test pattern for each of the three sizes (first three tests) and for each of six sizes (final test). Looks good - I have chosen NOT to output the full test sequence on my blog as it would make it rather long and extremely boring.
Where from here?
In practice, even a straight random distribution isn't perfect. Typically you'll have a long series of small packets after each big packet, as the small packets are supposed to take line priority and a number of them are likely to get queued while a medium or long packet is sent.
Then you'll be looking at traffic models where the bandwidth is less than 100% used, and you'll need to put random timing gaps in. Except they won't be truly random, because once again after a long packet there's likely to be a backlog of smaller ones queued - just as you find when you're waiting in line to check in at the airport, where it can take a few minutes of flurried activity to clear the backlog once the check-in agent has spent some considerable time sorting out the gentleman who brought along his wife's passport by mistake, or the lady who wants to travel on today's flight even though she failed to turn up with her unchangeable ticket that had her booked for last Friday.
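The article doesn't show how such gaps might be generated; as one possible sketch, the snippet below draws exponentially distributed inter-packet gaps sized so that the line runs at a target utilisation. The proc name, the exponential model and all of the numbers are my illustrative assumptions, and it deliberately ignores the queueing effects just described.

# Sketch only: exponential gaps and the figures below are assumptions
proc gapMicroseconds {meanGap} {
    # Inverse-transform sampling: exponential gap with the given mean
    return [expr {-$meanGap * log(rand())}]
}

set meanTransmitUs 100.0   ;# assumed average time to send one packet
set utilisation    0.6     ;# target line loading
set meanGapUs [expr {$meanTransmitUs * (1.0 - $utilisation) / $utilisation}]
for {set i 0} {$i < 5} {incr i} {
    puts [format "wait %.1f us before the next packet" [gapMicroseconds $meanGapUs]]
}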