Dataviz

Yet another gray theme for R base graphics

One of the things I like about R is that if you are not happy with the default settings, e.g. for graphics, you can usually update some parameters or write your own plotting function. For instance, Karl Broman proposed his own theme for base R graphics, with a grey background for the main plotting region (à la ggplot2). He even uploaded a full package to CRAN; see the grayplot() function. Here is an example of a beautiful R graphical display:

Python for interactive scientific data visualization

Some random notes on recent ‘pythonic peregrinations’ on my Airbook. Python package management is really painful. My /Library/Python/2.7/site-packages is just a mess. This is probably due in part to the fact that I switched from easy_install to pip two years ago, but in any case there’s a lot of useless stuff in there. I heard about Bokeh, a new plotting library for Python. Basically, it is meant to embed Wilkinson’s Grammar of Graphics into the d3js framework.

Bar charts of counts in Stata

The second part of my course on R and Stata has just started (four weeks to go). This time it is about Stata. The first part of the course covers data management, descriptive statistics and basic tests of association. Although I prefer dotplots over barcharts, I often miss some of the facilities we have with R base barplot, or its lattice equivalent barchart, used in combination with table or xtabs.
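
To make the "table plus barplot" idea concrete outside of R or Stata, here is a minimal sketch in plain Python. It is not the approach used in the course: the function names (crosstab, text_barchart) and the toy data are hypothetical, and the "chart" is only a text rendering of the counts.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count co-occurrences of two categorical variables,
    in the spirit of R's table(x, y)."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    # A Counter returns 0 for unseen pairs, so empty cells are kept.
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

def text_barchart(table):
    """Render each cell count as a run of '#' characters."""
    lines = []
    for r, cells in table.items():
        for c, n in cells.items():
            lines.append(f"{r:>4} | {c:<4} {'#' * n} {n}")
    return "\n".join(lines)

# Hypothetical toy data: two categorical variables of equal length.
sex = ["F", "M", "F", "F", "M", "M", "F"]
smoker = ["no", "yes", "no", "yes", "no", "no", "no"]

tab = crosstab(sex, smoker)
print(text_barchart(tab))
```

The point of the two-step design is the same as in R: tabulate first, then hand the table of counts to whatever draws the bars.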

Interactive Data Visualization With Cranvas

One of the advantages of R over other popular statistical packages is that it now has “natural” support for interactive and dynamic data visualization. This is, for instance, something that is lacking in the Python ecosystem for scientific computing (Mayavi or Enthought Chaco are just too complex for what I have in mind). Some time ago, I started drafting some tutorials on interactive graphics with R. The idea was merely to give an overview of existing packages for interactive and dynamic plotting, and it was supposed to be a three-part document: the first part presents basic capabilities like rgl, aplpack, and iplot (aka Acinonyx), and actually ended up as a very coarse draft; the second part should present ggobi and its R interface; the third and last part would be about the Qt interface, with qtpaint and cranvas.

Easy creation of videos with R

While preparing a talk due in three days or so, I thought it would be good to show a live demonstration of regularization techniques in regression with ggplot2. It seems that a lot of people start with splines or polynomial regression to demonstrate overfitting. I believe this has something to do with Bishop’s book on Pattern Recognition and Machine Learning; see e.g. Shane Conway’s recap of Stanford ML 5.2: Regularization.
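
As a rough illustration of the kind of demonstration meant here, the sketch below fits a polynomial with and without a ridge (L2) penalty, in plain Python rather than R and ggplot2. Ridge is only one possible regularization technique and is an assumption on my part, as are the function names (design_matrix, solve, ridge_fit), the toy data and the fixed "noise" values.

```python
import math

def design_matrix(xs, degree):
    """Polynomial features 1, x, x^2, ..., x^degree for each x."""
    return [[x ** d for d in range(degree + 1)] for x in xs]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge_fit(xs, ys, degree, lam):
    """Minimize RSS + lam * sum(beta_j^2 for j >= 1); intercept unpenalized."""
    X = design_matrix(xs, degree)
    p = degree + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    for i in range(1, p):
        xtx[i][i] += lam  # add the penalty to the normal equations
    return solve(xtx, xty)

# Hypothetical toy data: a sine curve plus fixed, made-up perturbations.
xs = [-1 + 2 * i / 7 for i in range(8)]
noise = [0.10, -0.15, 0.05, 0.20, -0.10, 0.15, -0.05, -0.20]
ys = [math.sin(math.pi * x) + e for x, e in zip(xs, noise)]

for lam in (0.0, 0.1, 1.0):
    beta = ridge_fit(xs, ys, degree=5, lam=lam)
    size = sum(b * b for b in beta[1:])
    print(f"lambda={lam}: squared norm of penalized coefficients = {size:.3f}")
```

Increasing the penalty shrinks the non-intercept coefficients, which is what flattens the wiggles of an overfitted polynomial; plotting the fitted curves for several values of lambda would give the visual version of this.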

Biplots

I thought it would be fun to relate how I got from a query about ‘biplot displays in lisp’ to statistical computing in R, using Google. So, basically I was looking for an existing implementation of biplots for Common Lisp. To be honest, I suspected that something would be available for xlispstat, and that was the very first hit: xls-biplot was written by Frederic Udina eight years ago. His paper published in the Journal of Statistical Software explains the available transformations (functional transformation, weighting, centering) of the raw data and the way coordinates (standard, principal, canonical) can be computed to express relationships between variables in a reduced factorial space.

Visualizing What Random Forests Really Do

Apart from summarizing some notes I took when reading articles and book chapters about RFs, I would like to show a simple way to graphically summarize how RFs work and what results they give. Some time ago, there was a question on stats.stackexchange.com about visualizing RF output. In essence, the responses were that it is not very useful, since a single unpruned tree is not informative about the overall results or classification performance.

GIS on a Mac

I decided to install some GIS software, just to be able to explore some spatial clustering models, play with the visualization of geographical information, and also because of the limited resources available in R. My first idea was to look at the Quantum GIS project. It looks pretty nice and is available for OS X 10.6. I also decided to reinstall GRASS 6.4 (I tried to compile an old version by hand in the past, and it was really a pain…).

Asymptote and Metapost

I am planning to make a lot of illustrations for basic mathematical and statistical concepts, but I am still hesitating about which drawing program to choose. I know a bit of Metapost and Asymptote, but I am not clear about the pros and cons of each of these vector drawing languages. In particular, I’ve heard that Asymptote is somewhat “superior” to Metapost. MetaPost is based on Knuth’s METAFONT, but is intended for figures in technical documents according to its primary author, John Hobby.

Playing with BackupMyTweets

Recently, I gave backupmytweets a try to get an archive of my Twitter account (@chlalanne). Everything went fine, and my tweets were available after a few days. What is annoying, however, is that I cannot download the data. I tried several times during the last two weeks, and I always end up with the following message: So, I decided to save the raw HTML page and then process it manually.