Arrays v Lists - what is the difference, why use one or the other
Archive - Originally posted on "The Horse's Mouth" - 2010-10-10 08:04:23 - Graham Ellis
If you want a program to run quickly through a data set (the sort of thing you'll be doing in heavy scientific work), you'll want the data loaded into successive memory locations - but that means you have to know how much space to allocate before you set the data up; otherwise, you'll find that you're overwriting other data.
If you don't know in advance how much data you'll have to work with, you cannot set up sequential memory locations ahead of time, so you'll have to use a scheme where each block of data includes a pointer to the next block. That way you can keep adding blocks of data even when the first part(s) are already loaded somewhere on the heap that can't easily be expanded.
Data stored in sequential / successive memory locations, and where each element of the data takes up the same amount of storage, is known to computer scientists as an ARRAY. Data stored with forward and backward pointers is known to them as a LIST. Unfortunately, the terms have been muddied by the authors of programming languages, so the term "array" is or was used for a list in some languages ... and indeed Tcl's Arrays are a different type of collection altogether!
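To make that distinction concrete, here is a minimal sketch of the "block of data plus a pointer to the next block" idea, written in Python. This is the textbook linked-list structure, not a claim about how any particular language implements its own collections:

    class Node:
        """One block of data plus a pointer (reference) to the next block."""
        def __init__(self, value, next_node=None):
            self.value = value
            self.next = next_node

    # Build a three-element chain: 10 -> 20 -> 30
    head = Node(10, Node(20, Node(30)))

    # Walk the chain by following the pointers - no contiguous memory needed
    current = head
    while current is not None:
        print(current.value)
        current = current.next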
Python uses lists. It calls them lists, and they give you exactly that list-style flexibility - they grow on demand and hold references to their elements rather than packing values into contiguous typed storage. A tuple is a similar (but lighter-weight) structure to a list, which is more efficient to access but lacks alteration facilities.
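A quick illustration of that difference - a list can be grown and altered in place, while a tuple cannot:

    values = [10, 20, 30]   # a list - can be grown and altered
    values.append(40)
    values[0] = 99
    print(values)           # [99, 20, 30, 40]

    fixed = (10, 20, 30)    # a tuple - lighter weight, quicker to access, but fixed
    print(fixed[1])         # 20 - reading is fine
    # fixed[0] = 99         # would raise a TypeError - tuples cannot be altered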
If you want to use the power of Python's scripting language to handle heavy scientific data, you can do so through the numpy module, which you can download from the Scipy site - [here]; numpy supports true arrays, and also the basic data types of C, in a nice Python wrapper, giving you the best of both worlds.
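For example, here is a short sketch of a true numpy array of 16-bit integers (the sizes and names below are just illustrative choices, not anything special to numpy):

    import numpy

    # A true array: 1000 contiguous 16-bit integers, one C-style "short" per slot
    samples = numpy.zeros(1000, dtype=numpy.int16)

    samples[0] = 42          # element access is simple offset arithmetic
    print(samples.dtype)     # int16
    print(samples.itemsize)  # 2 bytes per element
    print(samples.nbytes)    # 2000 bytes in one contiguous block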
But there's still the intrinsic problem that you have to know how much data you'll be saving into successive memory locations before you start to fill them ... and there are two ways of solving this (both sketched in the code after this list):
1: by using an internal staging area to hold the data as you load it all in, then transferring it all to an array once you know how big an array to create
2: by looking ahead and working out how big the data set will be - typically by looking at the size of the incoming file.
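Here is a hedged sketch of both approaches using numpy - the file names data.txt and data.bin are just placeholders for whatever you're actually reading:

    import os
    import numpy

    # Approach 1: stage the incoming values in a Python list,
    # then copy them into an array once the total size is known.
    staging = []
    for line in open("data.txt"):          # "data.txt" is a placeholder name
        staging.append(float(line))
    staged_values = numpy.array(staging)   # one contiguous array, sized to fit

    # Approach 2: look ahead at the file size to work out how many
    # elements the array will need before reading any of them.
    filename = "data.bin"                  # placeholder: a file of raw 2-byte integers
    count = os.path.getsize(filename) // 2
    binary_values = numpy.fromfile(filename, dtype=numpy.int16, count=count)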
There are examples that use these techniques on our web site - and I've added a couple that go along those lines to the numpy section of the site this morning, based on examples written for the course that I concluded on Friday. You'll also see from them how numpy may be used to read in binary data.
[source] - numpy loading an array
[source] - numpy loading a list of arrays / checking a file size
I have used my avatar (shown here) as the example file to load as 2-byte integers; if you look at the data shown in the source examples, you'll see that the 4th and 5th values are 72 and 73, showing that the image is 72 pixels wide by 73 pixels high. Of course, you do have to understand the format of an image file to make use of things in this way. Further details (in Perl) [here].
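For completeness, here's a sketch of that header-peeking trick. It assumes a GIF-style file, where the width and height are stored as little-endian 16-bit integers straight after the 6-byte signature; "avatar.gif" is just a stand-in for whatever image you load:

    import numpy

    # Read the first few 2-byte values from the image file
    header = numpy.fromfile("avatar.gif", dtype="<u2", count=8)   # "<u2" = little-endian 16-bit
    print(header)

    # In a GIF, the 6-byte signature is followed by the width and height,
    # so the 4th and 5th values read this way are the pixel dimensions.
    width, height = header[3], header[4]
    print(width, "x", height, "pixels")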