Biplots

2012-02-25

I thought it would be fun to relate how I got from a query about 'biplot displays in lisp' to statistical computing in R, using Google.

So, basically I was looking for an existing implementation of biplots for Common Lisp. To be honest, I suspected that something would be available for xlispstat, and that was the very first hit: xls-biplot was written by Frederic Udina eight years ago. His paper, published in the Journal of Statistical Software, explains the available transformations of the raw data (functional transformation, weighting, centering) and the way coordinates (standard, principal, canonical) can be computed to express the relationships between variables in a reduced factorial space. The picture below was taken when running the demo file in xlispstat ((test-bp)).

One thing leading to another, I came across the following paper:

Weihs, C. and Schmidli, H. (1990). OMEGA (Online Multivariate Exploratory Graphical Analysis): Routine Searching for Structure. Statistical Science, 5(2), 175-208.

with a rejoinder by Forrest Young (author of ViSta). I'd like to add a note for myself here: I learned that Cleveland wrote a book on dynamic graphics, Dynamic Graphics for Statistics (Wadsworth & Brooks, 1988), and also authored another book on Tukey's work, The Collected Works of John W. Tukey: Graphics 1965-1985 (Chapman & Hall, 1988). About 10 minutes after I started browsing Google with cross-links from my initial query, I ended up on this comp.lang.lisp thread, Is Xlisp-Stat Dead?, where Ross Ihaka describes his ongoing project of implementing a new R system... in Lisp (I initially thought that Scheme had been retained). It looks like a closed loop: from Scheme to R, and back to Lisp! (One year ago, I dropped some notes on Lisp for statistical computing in Diving into Lisp for statistical computing.)

Biplots are a really neglected topic in exploratory data analysis, and even more so in explanatory data analysis. The French (and probably the Dutch) school uses them a lot as a support for descriptive and explanatory analysis, but it's hard to find published papers in psychology, health research, or sociology that include a detailed account of the use of biplots or factor-related methods. I have some references on hand.

Good papers can also be found in journals related to ecology. (I mainly came across those papers by using the vegan R package, and by reading some of Gavin Simpson's good replies on the r-sig-ecology mailing list or on his webpage.)

For R users, there's the BiplotGUI package, but it is for Windows only. A lot of packages for multivariate data analysis and factor-related methods have been released in the past few years. Here are the ones I know: ade4, ca, anacor, vegan, FactoMineR. I often use the latter because it reminds me of earlier courses I've taken in data analysis à la française. Its authors published a nice textbook, Exploratory Multivariate Analysis by Example using R (Chapman & Hall/CRC Press, 2011), which was reviewed in the JSS.

I started thinking of a ggplot2 implementation of biplots in R. At the time of this writing, there seems to be only one such attempt, namely ggbiplot,(a) and it is limited to (SVD-based) PCA. I may fork the code at some point. The next picture is one of my experiments in translating the FactoMineR biplot for multiple correspondence analysis to ggplot2.

At this point, I should mention the definitive reference on this topic: Gower, J.C. and Hand, D.J. (1996). Biplots. Chapman & Hall.

Biplots are the multivariate analog of scatter plots, using multidimensional scaling to approximate the multivariate distribution of a sample in a few dimensions, to produce a graphical display. In addition, they superimpose representations of the variables on this display, so that the relationships between the sample and the variables can be studied. Like scatter plots, biplots are useful for detecting patterns and for displaying the results found by more formal methods of analysis.
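To make the idea of superimposing samples and variables on one display a bit more concrete, here is a minimal sketch of the classical SVD-based PCA biplot, written in Python with NumPy rather than R or Lisp. It is only an illustration under my own assumptions: the function name pca_biplot_coords, the alpha parameter, and the synthetic data are hypothetical, and this is not the code used by xls-biplot, ggbiplot, or FactoMineR, nor the more general scaling-based construction discussed by Gower and Hand.

```python
import numpy as np


def pca_biplot_coords(X, alpha=1.0, n_components=2):
    """Row and column coordinates for an SVD-based PCA biplot (sketch).

    alpha = 1: rows in principal coordinates, columns in standard coordinates.
    alpha = 0: rows in standard coordinates, columns in principal coordinates.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # column-centre the raw data
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = n_components
    rows = U[:, :k] * d[:k] ** alpha             # one point per sample
    cols = Vt[:k].T * d[:k] ** (1.0 - alpha)     # one arrow per variable
    return rows, cols, d


# Synthetic data, made up for illustration: 20 samples, 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

rows, cols, d = pca_biplot_coords(X, alpha=1.0, n_components=2)

# Share of the total sum of squares captured by the two displayed dimensions.
explained = (d[:2] ** 2).sum() / (d ** 2).sum()
print(rows.shape, cols.shape, round(float(explained), 3))
```

Plotting rows as points and cols as arrows from the origin gives the biplot. Whatever the value of alpha, the inner products rows @ cols.T give the rank-2 least-squares approximation of the centred data, which is what lets samples and variables be read off the same picture.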

It is also worth citing other related books by Gower:

• Gower, J.C., Gardner-Lubbe, S., and Le Roux, N. (2011). Understanding biplots. Wiley-Blackwell. (Fixed publication date, thanks to Patrick Durausau.)
• Gower, J.C. and Dijksterhuis, G.B. (2004). Procrustes problems. Oxford University Press.

Here are some further papers on biplot construction and/or interpretation that were missing in my list of references:

1. Gower, J.C. (2003). Unified biplot geometry, in Ferligoj, A. and Mrvar, A. (eds.), Developments in Applied Statistics. Ljubljana: Fakulteta za družbene vede.
2. Vicente-Villardon, J.L., Galindo-Villardon, M.P., and Blazquez-Zaballos, A. (2008). Logistic biplots.
3. Meulman, J. (1996). A distance-based biplot for multidimensional scaling of multivariate data.
4. Aitchison, J. and Greenacre, M. (2002). Biplots of compositional data. Journal of the Royal Statistical Society: Series C, 51(4), 375-392.

Notes

(a) Interestingly, the code was made public following one of the author's responses on Cross Validated.
