Testing BioPython

2008-07-27

Following my previous posts on Bioinformatics with Mac OS X and Installation of Python scientifical packages, I just tried the BioPython package.

BioPython is an open-source project based on the same principles as the older BioPerl project.[1]

It aims to provide a unified interface to traditional methods of computational molecular biology. The question then arises whether it conflicts with the Bioconductor or BioPerl initiatives.

In fact, the Bioconductor project provides a set of about 260 packages (as of July 2008) that extend the core R software, most of them adding to a growing collection of statistical tools. It is a popular library for microarray analysis, since it supports most data formats and comes with many graphical tools. In contrast, BioPerl and BioPython are designed to manipulate data files, organize information, and interface with other programming languages. They thus fulfill their role as “universal” scripting languages and are highly complementary to R and Bioconductor's Biobase. Whether to use Perl or Python is then a matter of taste.
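To give a concrete idea of the kind of file manipulation meant here, below is a small plain-Python sketch (standard library only, not BioPython code) that reads FASTA records and computes their GC content. In BioPython itself, parsing would typically go through Bio.SeqIO.parse(handle, "fasta"); the functions read_fasta and gc_content are hypothetical helpers written only for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple

def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text handle.

    Hypothetical helper: a plain-Python stand-in for what Bio.SeqIO does.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)

def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 for empty input)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

if __name__ == "__main__":
    sample = io.StringIO(">seq1 toy example\nATGC\nGGCC\n>seq2\nAATT\n")
    for name, seq in read_fasta(sample):
        print(name, len(seq), round(gc_content(seq), 2))
```

This is exactly the sort of boilerplate that BioPython saves you from writing, with the added benefit of handling many more formats than FASTA.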

Installing BioPython is quite easy, provided you have easy_install working properly.[2] Just type

$ sudo easy_install -f http://biopython.org/DIST/ biopython

at the command prompt, or follow the installation instructions on the BioPython website.
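A quick way to check that the installation worked is to import the package from Python. This is a minimal sketch: biopython_version is a hypothetical helper, and it assumes the installed release exposes Bio.__version__ (it falls back to "unknown" otherwise).

```python
import importlib

def biopython_version():
    """Return the installed BioPython version string, or None if Bio cannot be imported.

    Hypothetical helper; assumes the release defines Bio.__version__.
    """
    try:
        bio = importlib.import_module("Bio")
    except ImportError:
        return None
    # Fall back to "unknown" if this release does not define __version__.
    return getattr(bio, "__version__", "unknown")

if __name__ == "__main__":
    version = biopython_version()
    if version is None:
        print("BioPython is not installed (or not visible to this interpreter).")
    else:
        print("BioPython version:", version)
```

Run it with the same Python interpreter that easy_install installed into; on a Mac with several Pythons around, that is the most common reason for an unexpected None.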

Next, obviously, you will have to read the documentation. There is also a general paper that describes most of BioPython's functionalities, though it is aimed at a wider audience: S. Bassi, A Primer on Python for Life Science Researchers, PLoS Comput Biol 3(11): e199.

Notes

[1] There is also a BioJava project and a BioRuby project.

[2] To install BioPerl on a Mac, you may use Fink (e.g. $ sudo fink install bioperl-pm588 at the command prompt), use the cpan interface, or follow the instructions on the BioPerl website to do it the hard way.

Further notes: BioPerl needs a working version of GD. Installing GD on Mac OS X is well documented, but you may run into difficulties when compiling the source package because the png and jpeg libraries are missing. First, check that libpng.a and libjpeg.a are installed (usually in /usr/local/lib/). Then, update each archive's symbol index using ranlib, e.g. $ sudo ranlib /usr/local/lib/libpng.a. If you now compile gd, it may produce the expected result and install libgd.a in the proper directory. However, if $ sudo ranlib /usr/local/lib/libgd.a reports a message like ranlib: file: /usr/local/lib/libgd.a(gdcache.o) has no symbols, then you have a problem with gdcache.o. According to this post, gdcache.o contains no symbols because neither libttf nor libfreetype was included in the build, so libgd considered the cache unnecessary. You may therefore safely remove the occurrence of gdcache.o from the last part of the Makefile (the line beginning with libgd.a:) and compile the source package again. Don't forget to clean the previous build with $ sudo make clean before reinstalling.
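If you would rather script the Makefile edit than do it by hand, here is a minimal Python sketch. It assumes that gdcache.o appears as a whitespace-separated token in the rule whose line begins with libgd.a:, possibly continued over several lines with trailing backslashes; the actual gd Makefile may differ, so check the result before running make. drop_object_from_rule is a hypothetical helper, not part of any gd tooling.

```python
import re

def drop_object_from_rule(makefile_text, target="libgd.a", obj="gdcache.o"):
    """Return makefile_text with every whitespace-delimited `obj` token removed
    from the rule whose line begins with `target:` (and its continuation lines).

    Hypothetical helper: assumes the object is listed on that rule line, as
    described in the note above. All other lines are left untouched.
    """
    # Match the object name only as a whole token, not inside a longer name.
    token = re.compile(r"(?<!\S)" + re.escape(obj) + r"(?!\S)")
    out = []
    in_rule = False
    for line in makefile_text.splitlines(keepends=True):
        if line.startswith(target + ":"):
            in_rule = True
        if in_rule:
            line = token.sub("", line)
            # The rule continues on the next line only if this one ends with a backslash.
            in_rule = line.rstrip().endswith("\\")
        out.append(line)
    return "".join(out)

if __name__ == "__main__":
    sample = "libgd.a: gd.o gdcache.o gdfx.o\n\t$(AR) cru libgd.a $^\n"
    print(drop_object_from_rule(sample), end="")
```

On the real file, something like Path("Makefile").write_text(drop_object_from_rule(Path("Makefile").read_text())) (with Path from pathlib) would do the job, ideally after keeping a backup copy of the original Makefile.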
