What are numpy and scipy?
Archive - Originally posted on "The Horse's Mouth" - 2010-10-09 07:04:08 - Graham Ellis
In Python, all the operators are really methods - in other words, you write
c = d + e
and you're really writing
c = d.__add__(e)
So this means that it's possible to use the language to handle data of any sort, including data types that aren't supported as standard. It's even possible to wrap low level system types into the language, and control 16 bit integers, 32 bit integers, 64 bit integers, 32 bit floats, 64 bit floats and so on through the scripting language.
But Python goes further - even collection references such as
c[4] = jp[3]
use built in methods and you're really writing
c.__setitem__(4,jp.__getitem__(3))
See [example]
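As a minimal sketch of both ideas (it is not the linked example), here are two small classes of my own that supply these methods. The names Money and Shelf, and the values used, are invented purely for illustration.

```python
class Money:
    """Hypothetical example class: an amount held as whole pence."""

    def __init__(self, pence):
        self.pence = pence

    def __add__(self, other):
        # Python calls this for: self + other
        return Money(self.pence + other.pence)


class Shelf:
    """Hypothetical collection that keeps its items in a plain dict."""

    def __init__(self):
        self.slots = {}

    def __getitem__(self, index):
        # Python calls this for: shelf[index]
        return self.slots[index]

    def __setitem__(self, index, value):
        # Python calls this for: shelf[index] = value
        self.slots[index] = value


d = Money(250)
e = Money(175)
c = d + e                  # really c = d.__add__(e)
total_pence = c.pence

jp = Shelf()
jp[3] = "saddle"           # really jp.__setitem__(3, "saddle")
shelf = Shelf()
shelf[4] = jp[3]           # really shelf.__setitem__(4, jp.__getitem__(3))
copied = shelf[4]

print(total_pence, copied)
```

Because the operators and the square brackets are just method calls, any class that provides these methods can be used with the everyday syntax.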
This week, I've presented both a beginner and an intermediate private Python course, and at the start of the beginner course these features were quietly and intentionally overlooked - they're not "first day" stuff for newcomers, and many times they're not even last day stuff. But it turns out that - in the right circumstances - they can be very useful indeed ...
Scenario ... You are doing a lot of statistical work and data analysis on huge sets of information. For the sake of efficiency, you've previously had to code in C, where you can choose the smallest data type that meets your needs, and where arrays occupy sequential memory locations so that lookups are fast. That saves you from having to traipse through the lookup system of a list / hash type collection, with its wonderful (and, in your case, unneeded) flexibility at the cost of speed.
Well - in this scenario, your C coding is going to be a bit slow. C isn't the fastest language to write, even though it can be fast to run, and you find yourself using up a frustrating amount of research and development time as you tweak and tune code that is, after all, only there to experiment with the data you're analysing.
Solution ... Use Python as your controller, via the OO based interface I described at the top, sweetened by the icing of the operators that Python provides. And use a well-tuned C library, with support for the low level data types (dtypes) that you need, and true arrays as a type too, to provide the fast power that's under the instruction of the scripted controller.
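To make dtypes and true arrays concrete, here is a minimal sketch using numpy, the library this article is about. The variable names and sample values are my own, and it assumes numpy is installed and imported under the usual short name np.

```python
import numpy as np

# The same four values stored with three different element types
small = np.array([1, 2, 3, 4], dtype=np.int16)    # 16 bit integers
wide = np.array([1, 2, 3, 4], dtype=np.int64)     # 64 bit integers
real = np.array([1, 2, 3, 4], dtype=np.float32)   # 32 bit floats

# Storage used by the elements: bytes per element times element count
small_bytes = small.itemsize * small.size   # 2 bytes each
wide_bytes = wide.itemsize * wide.size      # 8 bytes each

# The elements sit side by side in a single block of memory
is_contiguous = small.flags['C_CONTIGUOUS']

print(small.dtype, small_bytes)
print(wide.dtype, wide_bytes)
print(real.dtype, is_contiguous)
```

Choosing int16 over int64 here cuts the element storage to a quarter, which is the kind of control that previously meant dropping down to C.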
"Surely someone has done this before" you should be asking, and "surely it's available for me to download". Of course, you're right - you're looking at numpy - numerical Python. You can see the source code of "Hello numpy world" ... [here]
Numpy goes further, though - once you've been provided with the interface to the basic types and true arrays, it becomes natural to add facilities to the module which wouldn't be so natural on the more general types supported by the language; you'll see in the source code example:
print table + 5
which adds five to each member of a numpy array before printing it out - array arithmetic. And it's natural to support array addition and array multiplication too (yes, it does).
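A short sketch of that array arithmetic follows. The values in table and other are sample data of my own, not the data from the linked example, and print is written in the function form used by current Python.

```python
import numpy as np

# Sample values for illustration only
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
other = np.array([[10, 20, 30],
                  [40, 50, 60]])

plus_five = table + 5     # adds five to each member
summed = table + other    # adds matching members together
product = table * other   # multiplies matching members together

print(plus_five)
print(summed)
print(product)
```

Note that * on numpy arrays works member by member, so product above is not a matrix product in the linear algebra sense.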
There's another requirement too - you'll often want to initialise a whole array, perhaps multidimensional, to a mathematical sequence of some sort. Perhaps that would be to a simple series of numbers (an array range or arange), or perhaps to something more complex. And, yes, numpy provides such things:
tab2 = numpy.indices((3,3))
in my little example linked above. There are further examples [here].
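Here is a brief sketch of those initialisers at work. Apart from tab2, the variable names are my own, and the multiplication table at the end is just one illustration of what the index arrays can be used for.

```python
import numpy as np

series = np.arange(10)               # 0, 1, 2 ... 9
stepped = np.arange(2, 20, 4)        # start at 2, stop before 20, step 4
grid = np.arange(9).reshape(3, 3)    # 0 to 8 laid out as 3 rows of 3

tab2 = np.indices((3, 3))            # two 3 x 3 arrays, stacked together
rows, cols = tab2                    # rows[i, j] is i, cols[i, j] is j

# Build a 3 x 3 multiplication table from the row and column numbers
times_table = (rows + 1) * (cols + 1)

print(stepped)
print(grid)
print(times_table)
```

The result of numpy.indices has one sub-array per dimension, each holding the position of every cell along that dimension, which makes it a handy starting point for filling an array from a formula.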
Numpy is only used by a small proportion of Python users - but for that small proportion it's vital stuff. We don't do more than mention it on our regular public courses, but I can provide you with a couple of hours' insight in private sessions. If you're a mathematician, you'll soon get to know far more of the detail than I do - and that's doubly so if you add on the extra scipy module, which adds a whole raft of statistical and mathematical algorithms - all ready-coded for you, operating on the C based structures for C execution speed, but controlled by the Python script for Python coding efficiency.