Loading and saving data - Python / numpy
Archive - Originally posted on "The Horse's Mouth" - 2010-10-09 07:38:42 - Graham Ellis
If you're using big data sets in Python, you're probably using the numpy module, which gives you data handling that runs at C speed while you still code at Python speed. But how do you load that data in? Numpy also provides a number of data handlers, data setup routines, and a save and restore capability.
There's a very basic example at [link] where I've generated a numpy object from text (I could have used a file ...) - each row and column in the incoming text string has been placed into a row or column in the numpy array.
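As a rough sketch of that idea (not the code behind the link), the snippet below turns a small whitespace-separated text string into a two-dimensional array with np.loadtxt. The sample text and the names text and grid are made up for illustration.

```python
import io
import numpy as np

# Hypothetical sample data: three rows of three whitespace-separated numbers
text = """1 2 3
4 5 6
7 8 9"""

# np.loadtxt accepts a file-like object, so wrap the string in StringIO;
# a filename could be passed in the same place
grid = np.loadtxt(io.StringIO(text))

print(grid.shape)   # (rows, columns)
print(grid[1, 2])   # value from the second row, third column of the text
```

By default np.loadtxt produces floating point values; passing dtype=int would keep them as integers.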
I've added a further example too ...

Numpy's save and load functions allowed me to dump my array out to a file, and to load it back in again. If I do this for a week of data, my 10 seconds drops to less than 1 second (and for six months it would drop from about four minutes down to 1 second!).
The code to convert the Python list in which I did the counting into a numpy array (that's another numpy extra feature) is:
info = np.asarray(counter)
and the code to save the data to file is:
np.save("logweek.npy",info)
When I came to run the program (again), I simply had it check if the file existed and if it did, I loaded it:
if os.path.exists("logweek.npy"):
    info = np.load("logweek.npy")
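Putting those pieces together, here is a minimal sketch of the "load it if it's there, otherwise build and save it" pattern. The helper load_or_build and the stand-in slow_count are hypothetical names, not taken from the original program, and the file is written to a temporary directory so that the sketch can be run anywhere.

```python
import os
import tempfile
import numpy as np

def load_or_build(path, build):
    """Return the array saved at path, building and saving it if absent."""
    if os.path.exists(path):
        return np.load(path)
    info = np.asarray(build())   # convert the Python list to a numpy array
    np.save(path, info)          # writes numpy's binary .npy format
    return info

def slow_count():
    # Hypothetical stand-in for the expensive counting step
    return [[3, 1, 4], [1, 5, 9]]

cache = os.path.join(tempfile.mkdtemp(), "logweek.npy")
first = load_or_build(cache, slow_count)    # no file yet: builds and saves
second = load_or_build(cache, slow_count)   # file exists: loads it back
print(np.array_equal(first, second))
```

One thing to watch with this approach is that the saved file goes stale if the underlying data changes, so it needs deleting (or the check extending) whenever the source logs are updated.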
The complete source code example is [here] ... note that it also uses matplotlib, a plotting library that's often used in association with numpy and scipy.
If you're looking to save pure Python data, have a look at the pickle and marshal modules that are a part of the standard distribution ... or the cPickle module, which is implemented in C and much quicker; in Python 3 there is no separate cPickle, and the plain pickle module uses the C implementation automatically. We have various examples around - [marshall example] and a [post on pickling].
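For comparison with the numpy save and load calls, a minimal pickle round trip on ordinary Python data might look like this. The counts dictionary is invented for illustration.

```python
import pickle

# Hypothetical pure-Python data: hit counts per page plus a list of tags
counts = {"index.html": 120, "about.html": 45, "tags": ["python", "numpy"]}

# Serialise to bytes and back again; pickle.dump and pickle.load do the
# same job with a file opened in binary mode
blob = pickle.dumps(counts)
restored = pickle.loads(blob)

print(restored == counts)
```

Only unpickle data you trust, since loading a pickle can run arbitrary code.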