The conference was held at the Agrocampus in Rennes. I had been to the
same place two years ago for the useR! 2009 conference (I found it too
crowded, but there was really great stuff presented there). In fact,
some of the organizers are also the developers of the FactoMineR
package, a very useful R package. I have used it a lot because it
reminds me of the SPAD software. Other packages of interest are ade4,
anacor, ca, and vegan. Also, as I am currently working through
Greenacre and Blasius's book, *Multiple Correspondence Analysis and
Related Methods* (Chapman & Hall/CRC, 2006), which summarizes the
CARME 2003 conference, this conference was a good opportunity to sum up
what I had already learned.

As I am currently reading some papers on biplots, let me summarize the
talk that John Gower gave on that particular aspect of data analysis
and visualization. Basically, a biplot displays two kinds of
information: individuals and variables. Here, the concept of
*approximation* is central. A biplot provides an approximation of the
relationships among all individuals together, among all variables
together, or both. It is useful as a tool for visualizing *results*
from multidimensional analysis; it is not a multidimensional analysis
technique *per se*. Two kinds of approximation can be used: (a) the
least-squares properties of the SVD, or (b) representing "cases" by any
form of MDS and then superimposing the variables, either by a
regression method or as nonlinear trajectories. The use of the SVD in
CA and data analysis was recognized long ago, e.g. Beltrami (1873),
Jordan (1874), Sylvester (1889), Eckart & Young (1936), Gabriel (1971),
and even Horst (1965). Biplots can summarize the results of many
multidimensional techniques, the differences between them often being
just a matter of how the data matrix X is initially transformed.
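Option (a) rests on the Eckart-Young result: truncating the SVD gives the best least-squares low-rank approximation of X, and splitting the singular values between row and column markers gives the biplot coordinates. The sketch below is a pure-Python illustration with hypothetical helper names; it extracts singular triplets by power iteration with deflation, which is fine for a toy matrix but is not how production SVD routines (or the R packages mentioned above) work.

```python
import math
import random


def mat_vec(A, x):
    """A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def t_mat_vec(A, y):
    """A' y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def leading_triplet(A, iters=1000, seed=0):
    """Largest singular value and vectors (sigma, u, v) of A, by power iteration on A'A."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
    nv = norm(v)
    v = [x / nv for x in v]
    for _ in range(iters):
        w = t_mat_vec(A, mat_vec(A, v))
        nw = norm(w)
        if nw == 0.0:  # A is zero: nothing left to extract
            break
        v = [x / nw for x in w]
    Av = mat_vec(A, v)
    sigma = norm(Av)
    u = [x / sigma for x in Av] if sigma > 0 else [0.0] * len(Av)
    return sigma, u, v


def truncated_svd(X, k):
    """Rank-k least-squares approximation (Eckart-Young) as a list of (sigma, u, v)."""
    R = [[float(x) for x in row] for row in X]
    triplets = []
    for _ in range(k):
        s, u, v = leading_triplet(R)
        triplets.append((s, u, v))
        # Deflate: remove the component just found before looking for the next one.
        R = [[R[i][j] - s * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    return triplets


def biplot_markers(triplets, alpha=1.0):
    """Row markers G = U S^alpha and column markers H = V S^(1 - alpha), so G H' = X_k."""
    G = [[u[i] * s ** alpha for s, u, _ in triplets] for i in range(len(triplets[0][1]))]
    H = [[v[j] * s ** (1 - alpha) for s, _, v in triplets] for j in range(len(triplets[0][2]))]
    return G, H


# Toy 4 x 3 matrix of rank 2, so a rank-2 biplot reproduces it exactly.
X = [[1, 1, 1], [2, -1, 2], [3, 1, 3], [4, -1, 4]]
triplets = truncated_svd(X, k=2)
G, H = biplot_markers(triplets, alpha=1.0)
approx = [[sum(g * h for g, h in zip(G[i], H[j])) for j in range(3)] for i in range(4)]
```

With `alpha = 1` the singular values go to the row markers, with `alpha = 0` to the column markers; in both cases the inner product of row marker $i$ and column marker $j$ approximates $x_{ij}$.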

Transformation of X | Technique |
---|---|
Centre and scale | PCA |
Remove main effects | Biadditive |
Pearson residuals | CA |
Row/column χ² | CA |
Within-group dispersion | CVA |
Dispersion (X'X) | FA |
Dissimilarity (XX') | MDS |
Constrained regression | Rank, CANOCO |
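The first few rows of this table are easy to spell out. The pure-Python sketch below (hypothetical function names, textbook formulas rather than any package's code) implements three of these transformations: column standardization for PCA, double centring for the biadditive model, and the CA standardized residuals, i.e. Pearson residuals divided by $\sqrt{n}$, whose sum of squares is the total inertia $\chi^2/n$.

```python
import math


def centre_and_scale(X):
    """PCA: centre each column and divide it by its standard deviation (n divisor)."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / sd
    return out


def remove_main_effects(X):
    """Biadditive model: x_ij - row mean - column mean + grand mean (double centring)."""
    n, p = len(X), len(X[0])
    row_mean = [sum(row) / p for row in X]
    col_mean = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    grand = sum(row_mean) / n
    return [[X[i][j] - row_mean[i] - col_mean[j] + grand for j in range(p)]
            for i in range(n)]


def ca_standardized_residuals(N):
    """CA: s_ij = (p_ij - r_i c_j) / sqrt(r_i c_j), i.e. Pearson residuals / sqrt(n)."""
    total = sum(sum(row) for row in N)
    P = [[v / total for v in row] for row in N]
    r = [sum(row) for row in P]        # row masses
    c = [sum(col) for col in zip(*P)]  # column masses
    return [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
            for i in range(len(r))]


# Toy contingency table (made-up counts): the squared residuals sum to the
# total inertia, i.e. Pearson's chi-square statistic divided by n.
N = [[10, 20, 30], [20, 10, 5]]
S = ca_standardized_residuals(N)
inertia = sum(v * v for row in S for v in row)
```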

Another important concept raised by Gower was that of Category Level
Points (CLPs): for a categorical variable $k$ with $L_k$ levels, there
are $L_k$ CLPs, one per level. The CLPs give an exact representation in
the full high-dimensional space; in the biplot approximation, the
points nearest to each CLP form convex neighbourhoods, which amount to
prediction regions.
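A minimal sketch of how such a prediction region can be queried, assuming (hypothetically) that we know each CLP's projection onto the biplot plane and its squared distance from that plane. For a point $y$ in the plane, Pythagoras gives $\lVert y - z\rVert^2 = \lVert y - \hat{z}\rVert^2 + \lVert z - \hat{z}\rVert^2$, where $\hat{z}$ is the projection of the CLP $z$; so "nearest CLP in the full space" becomes a nearest-point rule with one additive offset per level, and its cells remain convex. The coordinates are made up for illustration.

```python
# Hypothetical CLPs of one ordinal variable: (projection onto the biplot plane,
# squared distance of the CLP from that plane in the full space).
CLPS = {
    "low": ((-1.0, 0.0), 0.10),
    "medium": ((0.0, 0.2), 0.50),
    "high": ((1.0, 0.0), 0.05),
}


def predict_level(point, clps=CLPS):
    """Level whose CLP is nearest to `point` (a point of the biplot plane) in the full space."""
    def sq_dist(level):
        (cx, cy), off_plane = clps[level]
        # In-plane squared distance plus the CLP's squared off-plane distance.
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2 + off_plane
    return min(clps, key=sq_dist)
```

Each level's prediction region is the set of biplot points for which that level wins the `min`; because the criterion is a squared distance plus a constant, the boundaries are straight lines and the regions are convex.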

The next talk was given by Michael Friendly; the slides are available
on his website, datavis.ca/papers. (Just by browsing through these
pages, I found a lot of other interesting stuff.) The talk offered an
in-depth overview of the vcd, vcdExtra, and gnm packages, the second
one acting as glue between the other two. The advantage of `gnm()` over
the classical `loglin()` function (from the base `stats` package) is
that it is formula-based and allows one to include specific effects
for square two-way tables (e.g., symmetry or quasi-symmetry).
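As a reminder of what the simplest of these models does: the symmetry model for an $R \times R$ table has a closed-form maximum-likelihood fit, $\hat{m}_{ij} = (n_{ij} + n_{ji})/2$, tested against the saturated model with $G^2$ on $R(R-1)/2$ degrees of freedom. The pure-Python sketch below (hypothetical function name, made-up counts) computes exactly that; it neither uses nor mimics the `gnm` API, and quasi-symmetry, which has no closed form, would need an iterative fit.

```python
import math


def symmetry_fit(N):
    """Symmetry model for a square table: m_ij = (n_ij + n_ji) / 2 (so m_ii = n_ii).

    Returns the fitted table, the likelihood-ratio statistic G2 and its degrees of freedom.
    """
    R = len(N)
    M = [[(N[i][j] + N[j][i]) / 2 for j in range(R)] for i in range(R)]
    # Empty cells contribute nothing to G2 (0 * log 0 is taken as 0).
    g2 = 2 * sum(N[i][j] * math.log(N[i][j] / M[i][j])
                 for i in range(R) for j in range(R) if N[i][j] > 0)
    df = R * (R - 1) // 2  # one constraint m_ij = m_ji per off-diagonal pair
    return M, g2, df


# Made-up 3 x 3 table (e.g. a category measured on two occasions).
N = [[50, 12, 3], [8, 40, 10], [2, 15, 30]]
M, g2, df = symmetry_fit(N)
```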

Stéphane Dray also gave an interesting overview of the methods
available to combine multivariate and spatial data, using the
well-known "triplet" notation $(X, Q, D)$, where $X$ ($n \times p$) is
a data matrix (possibly pre-transformed), and $Q$ ($p \times p$) and
$D$ ($n \times n$) are metrics (weighting matrices) associated with the
variables and the individuals, respectively. Such an approach basically
allows two kinds of eigendecomposition, namely $XQX'DK = K\Lambda$ and
$X'DXQA = A\Lambda$. In the case of PCA, $X$ has entries
$(x_{ij} - \bar{x}_j)/s_j$, $Q = I_p$, and $D = n^{-1} I_n$.

By using a spatial weighting matrix $W$ ($n \times n$), we can add a
mathematical representation of the geographical layout of the region
under study, for instance through a connectivity matrix $C$ defined by
$c_{ij} = 1$ if spatial units $i$ and $j$ are neighbors and $0$
otherwise, and setting $w_{ij} = c_{ij} / \sum_{k} c_{ik}$. Spatial
autocorrelation can then be measured with Moran's coefficient or
Geary's ratio, and statistical significance can be assessed with a
randomization (permutation) test. The problem, however, is to consider
both aspects (multivariate and spatial) simultaneously. The
multidimensional aspect can be dealt with using any dimension reduction
technique, like PCA. The geographical information can be taken into
account through a partition of the study area or through extra
explanatory variables. In either case, studying spatial patterns
amounts to maximizing the differences between regions.

If we want to focus on a partition of the geographical regions, we can
use between-class analysis (BCA): given the triplet $(X, Q, D)$ and
$Y$, an $n \times g$ matrix of dummy variables coding the partition, we
obtain a new triplet $(A, Q, D_Y)$, where $A = (Y'DY)^{-1}Y'DX$ holds
the weighted class means and $D_Y = Y'DY$ the class weights. Another
approach is PCAIV (PCA with respect to instrumental variables), where
we consider a matrix $Z$ ($n \times q$) of explanatory variables and
work with the triplet $(\hat{X}, Q, D)$, with
$\hat{X} = P_Z X = Z(Z'DZ)^{-1}Z'DX$. This approach shares similarities
with redundancy analysis, CCA, and BCA. Stéphane Dray and Michael
Friendly released the Guerry package on CRAN, which illustrates these
concepts. There is also a paper in press in the *Annals of Applied
Statistics*, in the special issue on multivariate analysis.
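To make the spatial weighting concrete, here is a pure-Python sketch (hypothetical function names, toy six-region chain) that builds the row-standardized $W$ from a connectivity matrix $C$, then computes Moran's $I$, Geary's $c$, and a one-sided permutation p-value. It uses the textbook formulas and is not the implementation found in ade4 or any other R package.

```python
import random


def row_standardize(C):
    """w_ij = c_ij / sum_k c_ik; assumes every unit has at least one neighbour."""
    return [[c / sum(row) for c in row] for row in C]


def morans_i(x, W):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)


def gearys_c(x, W):
    """Geary's c = (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 * S0 * sum_i z_i^2)."""
    n = len(x)
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    return (n - 1) * num / (2 * s0 * ss)


def permutation_test(x, W, n_perm=999, seed=1):
    """Observed Moran's I and one-sided p-value, shuffling the values over the units."""
    rng = random.Random(seed)
    observed = morans_i(x, W)
    values = list(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if morans_i(values, W) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy example (hypothetical): six regions along a line, each adjacent to the next.
n = 6
C = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
W = row_standardize(C)
x = [1, 2, 3, 4, 5, 6]  # a smooth west-to-east gradient
I, p = permutation_test(x, W)
print(round(I, 4), round(gearys_c(x, W), 4), p)
```

For this gradient, $I = 5/7 \approx 0.714$ (positive autocorrelation) and $c = 1/7 \approx 0.143$ (well below 1); the p-value depends on the random permutations drawn.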

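Both BCA and PCAIV come down to weighted projections. The pure-Python sketch below (hypothetical function names and toy data, uniform weights $D = n^{-1}I_n$ assumed) computes the BCA table $A = (Y'DY)^{-1}Y'DX$ and the PCAIV fitted matrix $\hat{X} = Z(Z'DZ)^{-1}Z'DX$; the eigendecomposition of each new triplet is left out. Taking $Z$ equal to the dummy matrix $Y$ makes $\hat{X}$ repeat the class means row by row, which is one way to see the kinship between PCAIV and BCA.

```python
def between_class(X, groups, d):
    """BCA: A = (Y'DY)^{-1} Y'DX and class weights D_Y = Y'DY, with D = diag(d).

    Y is the dummy matrix of `groups`; Y'DY is then diagonal, so A simply holds
    the D-weighted class means.
    """
    levels = sorted(set(groups))
    p = len(X[0])
    A, dY = [], []
    for g in levels:
        idx = [i for i, gi in enumerate(groups) if gi == g]
        wg = sum(d[i] for i in idx)  # (Y'DY)_gg: total weight of class g
        A.append([sum(d[i] * X[i][j] for i in idx) / wg for j in range(p)])
        dY.append(wg)
    return levels, A, dY


def solve(M, B):
    """Solve M S = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [M[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def pcaiv_fitted(X, Z, d):
    """PCAIV: X_hat = P_Z X = Z (Z'DZ)^{-1} Z'DX, a D-weighted least-squares fit."""
    n, p, q = len(X), len(X[0]), len(Z[0])
    ZtDZ = [[sum(d[i] * Z[i][a] * Z[i][b] for i in range(n)) for b in range(q)] for a in range(q)]
    ZtDX = [[sum(d[i] * Z[i][a] * X[i][j] for i in range(n)) for j in range(p)] for a in range(q)]
    coef = solve(ZtDZ, ZtDX)
    return [[sum(Z[i][a] * coef[a][j] for a in range(q)) for j in range(p)] for i in range(n)]


# Toy data (hypothetical): four units, two variables, two classes.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 9.0]]
groups = ["a", "a", "b", "b"]
d = [0.25] * 4  # D = I_n / n
levels, A, dY = between_class(X, groups, d)
Xhat = pcaiv_fitted(X, [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], d)
```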
Other abstracts that might be of interest are listed below. (See the abstract book for more information.)

- Towards the integration of biological knowledge with canonical correspondence analysis when analyzing Xomic data in an exploratory framework, Verback et al.
- Cluster analysis with k-means: what about the details?, Roux
- First 50 years of Survo: from a statistical program to an interactive environment for data processing, Vehkalahti and Sund
- Combinatorial inference in geometric data analysis: typicality test, Le Roux and Bienaise
- Logistic biplots for binary, nominal and ordinal data, Vicente-Villardon
- Nominal, ordinal and metric variables in the "social space"? Using CatPCA to examine lifestyles and regional identities in a medium-sized German city, Mühlichen