Here is a short review of Introduction to Bioinformatics Algorithms, written by Jones & Pevzner (The MIT Press, 2004), that I mentioned in a previous post. Pavel Pevzner now offers an updated version on this textbook, Bioinformatics Algorithms: An Active Learning Approach, with full set of slides available on Philip Compeau’s homepage. See also this website, where Saad Mneimneh offers solutions to selected exercises from each chapter.
Briefly, I started reading this book as a way to remind myself of some of the most important algorithms in bioinformatics. I did read Bioinformatics (Polanski & Kimmel, Springer, 2007 — best book I’ve read so far) and part of Bioinformatics Algorithms (Măndoiu & Zelikovsky, eds., Wiley, 2008 — a bit outdated since the advance in next-generation sequencing) a while ago, while I was working as a biostatistician. However, this textbook attracted me because it is aimed at two very distinct audiences: on the one hand, biologists, who can familiarize themselves with the basic concepts of informatics for bioinformatics applications, and on the other hand, computer scientists, who are made aware of the context of genetics and molecular biology and the underlying theoretical issues.
I have indeed been able to apply my skills in this new area, but only after coming to understand that solving biological problems requires far more than clever algorithms: it involves a creative partnership between biologists and mathematical scientists to arrive at an appropriate mathematical model, the acquisition and use of diverse sources of data, and statistical methods to show that the biological patterns and regularities that we discover could not be due to chance. — Richard Karp
The authors make use of simplified pseudo-code for all the algorithms discussed in this book — on the basis that part of the target audience are biologists. I found it nice, as it is heavily inspired from Python syntax (significant indentation is fine for reading purpose, IMHO). The introductory chapter on computer science (CS) is pretty basic stuff that can be found in any introductory textbook (chapter 2): algorithmic complexity, recursive versus iterative approach, type of algorithms (brute force, branch-and-bound, greedy approach, dynamic programming, divide-and-conquer, machine learning, randomized algorithms), and NP-completeness. It is intended for biologists. For CS folks, the third chapter provides a gentle primer to biology.
A striking feature of this book is the way in which the authors introduce the reader to relatively complex algorithmic subjects of dynamic programming via game theory accessible to everyone. For instance, in the case of the well-known Blast algorithm, the authors offers this “winning recipe” for the rock game:
If Alice takes one rock from each pile, I will take the remaining rocks and win. If Alice takes one rock, I will take one rock from the same pile. As a result, there will be only one pile and it will have two rocks in it, so Alice’s only choice will be to take one of them. I will take the remaining rock to win the game.
It may seem that this is yet another introductory textbook to programming for biologists, but bear in mind that the authors rather emphasize the design of efficient algorithms (and no real code is used), and this is not limited to basic string alignment or sorting algorithms. More elaborated topics are covered in large, including graphs algorithms (in the context of alignement and protein sequencing, after introducing basic principle using a chessboard), suffix trees and combinatorial algorithms, evolutionary trees, Hidden Markov models (again, using analogy with betting in a casino), and finally randomized algorithms. This is by far one of the most interesting books I’ve read in a long time — the other one being A Programmer’s Introduction to Mathematics, by Jeremy Kun.