Here is a quick wrap-up of the BoRdeaux conference. I won’t detail the conference program itself, but will just say a few words about the packages that were presented, together with their applications (in various fields: epidemiology, social sciences, teaching, high-dimensional data, chemometrics).
Stéphanie Bougeard talked about two new functions in the ade4 package aiming at the analysis of K+1 tables (several blocks of explanatory variables and a block of response variables). I can’t find those functions, mbpls and mbpcaiv, but they look interesting. I wonder how they compare to RGCCA or PLS path modeling (e.g., plspm or semPLS).
Her slides from other conferences include more mathematical details: AGROSTAT 2010, CARME 2011. Currently, the key paper seems to be: Bougeard, S, Qannari, EM, Rose, N (2011). Multiblock redundancy analysis: interpretation tools and application in epidemiology. Journal of Chemometrics, 25(9): 467–475.
Other (related) papers of interest:
I’ve also learned that the ade4 graphics capabilities will be rebased on the lattice package, allowing for complex layouts on the graphical device (Alice Julien-Laferriere’s talk). This was done using S4 classes on top of the existing user-visible functions (s.class, dudi.pca, etc.).
Aurélie Thébault presented her work on locally weighted PLS regression, with applications in infrared spectral analysis. The idea is to introduce a local calibration stage before computing the PLS components: each new observation is predicted from a subset of the original samples that resemble it, through a weighting process. This seems to be highly specific to near-infrared spectroscopy, but it might be interesting for signal processing.
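To make the local calibration idea more concrete, here is a minimal sketch in plain Python. It is not the method from the talk: the neighbour selection by Euclidean distance, the tricube-style weights, the single PLS component, and the function name local_pls1_predict are all assumptions chosen for illustration.

```python
import math

def local_pls1_predict(X, y, x_new, k=5):
    """Predict y at x_new from a one-component weighted PLS fitted on the
    k calibration samples closest to x_new (illustrative assumptions only)."""
    p = len(x_new)
    # Local calibration stage: keep the k samples that most resemble x_new.
    dists = [math.dist(row, x_new) for row in X]
    order = sorted(range(len(X)), key=lambda i: dists[i])[:k]
    d_max = max(dists[i] for i in order) or 1.0
    # Tricube-style weights (an assumption): closer samples count more.
    w = [(1 - (dists[i] / d_max) ** 3) ** 3 + 1e-6 for i in order]
    w_sum = sum(w)
    # Weighted centring of the local predictors and response.
    x_mean = [sum(wi * X[i][j] for wi, i in zip(w, order)) / w_sum
              for j in range(p)]
    y_mean = sum(wi * y[i] for wi, i in zip(w, order)) / w_sum
    Xc = [[X[i][j] - x_mean[j] for j in range(p)] for i in order]
    yc = [y[i] - y_mean for i in order]
    # PLS direction: weighted covariance of each predictor with y.
    load = [sum(wi * xr[j] * yv for wi, xr, yv in zip(w, Xc, yc))
            for j in range(p)]
    norm = math.sqrt(sum(v * v for v in load))
    if norm == 0:
        return y_mean  # no local covariance: fall back to the local mean
    load = [v / norm for v in load]
    # Scores on the component, then weighted regression of y on the scores.
    scores = [sum(xr[j] * load[j] for j in range(p)) for xr in Xc]
    q = (sum(wi * t * yv for wi, t, yv in zip(w, scores, yc))
         / sum(wi * t * t for wi, t in zip(w, scores)))
    t_new = sum((x_new[j] - x_mean[j]) * load[j] for j in range(p))
    return y_mean + q * t_new

# Toy data: a curved response, predicted locally around x = 4.5.
X = [[float(i)] for i in range(10)]
y = [float(i * i) for i in range(10)]
print(local_pls1_predict(X, y, [4.5], k=4))
```

With real spectra one would typically use several components and a similarity measure suited to the data; this sketch only shows where the local selection and the weights enter the computation.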
The PCAmixdata package was discussed by Vanessa Kuentz-Simonet. This is a package that deals with VARIMAX rotation in factor analysis: Chavent, M, Kuentz-Simonet, V, and Saracco, J (2011), Orthogonal rotation in PCAMIX (arXiv:1112.0301). At useR! 2011, there was a related talk on the selection of variables by those authors: ClustOfVar: an R package for the clustering of variables.
Other interesting papers I have to read or reread:
The mixOmics package has been updated with new functions, including Independent Principal Component Analysis. It now has an official website where more information is available, and there is also a mixOmics wizard where users can see online illustrations and get explanations of the techniques used therein (good point for reproducible research!).
Charles Bouveyron gave a general overview of the HDclassif package, which handles supervised and unsupervised classification (see also the JSS paper, HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data). There was a nice demo of clustering with the crabs dataset, which can be found in demo_hddc(). Below is a screenshot from running model-based clustering with the EM algorithm, k-means initialization for cluster centres, and the AkBkQkDk model for the general variance-covariance structure (see section 2.1 of the JSS paper for more explanation).
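For readers unfamiliar with the EM-plus-k-means recipe mentioned above, here is a small generic sketch in plain Python. It is not HDclassif’s algorithm: it fits an ordinary one-dimensional Gaussian mixture and does not implement the AkBkQkDk subspace parameterisation. The function names kmeans_1d and em_gmm_1d are hypothetical.

```python
import math

def kmeans_1d(data, k, iters=20):
    """Plain k-means on scalars, used only to initialise the cluster centres."""
    s = sorted(data)
    # Deterministic start (an assumption): spread centres over the sorted data.
    centres = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[nearest].append(x)
        centres = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
    return centres

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, k, iters=50):
    """EM for a 1D Gaussian mixture with k-means initialisation."""
    n = len(data)
    means = kmeans_1d(data, k)
    overall = sum(data) / n
    variances = [sum((x - overall) ** 2 for x in data) / n] * k
    props = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each cluster for each point.
        resp = []
        for x in data:
            dens = [props[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update proportions, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            props[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsing cluster
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return props, means, variances, labels

# Toy data: two well-separated groups.
data = [0.0, 0.2, -0.1, 0.1, -0.2, 0.05, 10.0, 10.3, 9.8, 10.1, 9.9, 10.2]
props, means, variances, labels = em_gmm_1d(data, k=2)
print(labels)
```

The sketch has no safeguards for empty clusters or numerical underflow on difficult data; it is only meant to show how the k-means centres feed the first E-step.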
Florent Langrognet presented the Rmixmod package, a port of the mixmod project for high-performance model-based cluster and discriminant analysis; mixmod itself comes as a C++ library with command-line utilities and a MATLAB frontend. Interestingly, this package also works with semi-supervised problems, and it allows for case weighting.