< a quantity that can be divided into another a whole number of time />

Python for statistical computing

February 7, 2011

Following up on my previous post on the use of Lisp for statistical computing, here are some links for doing statistics with Python. Most of the packages listed below were picked up on sites such as MetaOptimize.

The two core packages are obviously NumPy and SciPy, which provide the infrastructure for handling N-dimensional array objects and tools for numerical computing à la Matlab. Combined with Matplotlib, they make up a complete platform for scientific computing. The SciPy package already includes some common routines for statistical analysis, but see also the Cookbook, which collects worked examples of commonly performed tasks.
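To give a flavour of what the NumPy/SciPy combination looks like in practice, here is a minimal sketch that computes a few descriptive statistics and runs a two-sample t-test. The data are simulated and the variable names (`group_a`, `group_b`) are purely illustrative; nothing here comes from the Cookbook itself.

```python
import numpy as np
from scipy import stats

# Simulated data (hypothetical): two groups of 50 observations each
rng = np.random.RandomState(42)
group_a = rng.normal(loc=10.0, scale=2.0, size=50)
group_b = rng.normal(loc=11.0, scale=2.0, size=50)

# Descriptive statistics with NumPy
mean_a = group_a.mean()
sd_a = group_a.std(ddof=1)  # ddof=1 gives the sample standard deviation

# Two-sample t-test (equal variances assumed by default) with SciPy
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print("mean A = %.2f, sd A = %.2f" % (mean_a, sd_a))
print("t = %.3f, p = %.4f" % (t_stat, p_value))
```

Arrays carry their own methods (`mean`, `std`), while the inferential routines live in `scipy.stats`, which is roughly the division of labour between the two packages.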

The cool thing is that we can also benefit from R's built-in commands from within Python by using RPy; see, e.g., Using Python (and R) to calculate Linear Regressions.
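For comparison, a simple linear regression can also be fitted without leaving Python. The sketch below uses `scipy.stats.linregress` rather than RPy (which needs a working R installation), on a small made-up data set; in R the equivalent call would be `lm(y ~ x)`.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y is roughly 2*x + 1, with small deviations
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.9])

# Ordinary least-squares fit of y on x
result = stats.linregress(x, y)
slope, intercept, r_value = result[0], result[1], result[2]

print("slope = %.4f, intercept = %.4f" % (slope, intercept))
print("R squared = %.4f" % (r_value ** 2))

# Cross-check with a degree-1 polynomial fit from NumPy
slope_np, intercept_np = np.polyfit(x, y, 1)
```

`linregress` only handles a single predictor; for anything resembling R's model formulas, going through RPy remains the more direct route.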

Other packages of interest:

Of course, there are also some full-featured applications, like Orange, which aims to provide an interface comparable to Weka's for machine learning and data mining. Another application, which I discovered two years ago, is VisTrails, which is

an open-source scientific workflow and provenance management system developed at the University of Utah that provides support for data exploration and visualization. Whereas workflows have been traditionally used to automate repetitive tasks, for applications that are exploratory in nature, such as simulations, data analysis and visualization, very little is repeated—change is the norm. As an engineer or scientist generates and evaluates hypotheses about data under study, a series of different, albeit related, workflows are created while a workflow is adjusted in an interactive process. VisTrails was designed to manage these rapidly-evolving workflows.

Finally, Mayavi, which relies on VTK, is great for data visualization, especially in 3D. It is included in the Enthought-flavoured version of Python, together with Chaco for 2D plotting. To get an idea, watch Travis Vaught’s nice screencast, Multidimensional Data Visualization in Python – Mixing Chaco and Mayavi.

Useful enhanced shells for Python include IPython, IEP, and Spyder. And if you like syntax highlighting in your console, bpython does the job nicely (and it works like a charm on OS X).

