Statistical models for cross-classified data structures


The following discussion describes statistical tools used to estimate the parameters of multilevel models. It is derived from an article published in Methodology,(1) parts of which are briefly extended here with additional references and some applications in educational testing.

There are two main approaches to the estimation of statistical models for cross-classified data structures in the literature on multilevel models: likelihood-based frequentist methods and Markov chain Monte Carlo (MCMC) procedures, which are based on Bayesian statistics.

The estimation of the parameters in multilevel modeling programs is usually carried out by likelihood-based frequentist methods that result in maximum likelihood (ML) estimates. These include:

  • the iterative generalized least squares (IGLS) method
  • the restricted iterative generalized least squares (RIGLS) method
  • the expectation-maximization (EM) algorithm

Specialized software exists for multilevel analysis; probably the most widely used programs are MLwiN and HLM. The methods implemented in these programs are designed specifically for estimating hierarchical multilevel models, although they can be adapted to fit other models. For instance, models with crossed random effects can be estimated with ML using ordinary hierarchical multilevel models.(8,9) Likewise, the EM algorithm can be applied to cross-classified data.(10) ML estimates for cross-classified multilevel models can also be obtained within the framework of linear mixed-effects (LME) models.(11) This approach will be discussed in the next section, with an emphasis on the use of SAS and R for LME fitting. In contrast to the software mentioned above, LME models are based on the specification of sizeable design matrices, which require more computing time and may lead to estimation problems in complex data structures.
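To make the remark about design matrices concrete, here is a minimal sketch in plain Python. It assumes a model of the form y = Xβ + Z_s u_s + Z_a u_a + e with independent random effects for schools and test administrators; the function names, the six observations, and the variance values are all hypothetical and are not taken from the article or from any package. It builds the indicator (design) matrices for the two crossed factors and the implied marginal covariance matrix V of the observations.

```python
# Illustrative sketch only: all names and numbers below are hypothetical.

def indicator_matrix(labels):
    """Return (levels, Z) where Z[i][j] = 1 if observation i belongs to level j."""
    levels = sorted(set(labels))
    index = {level: j for j, level in enumerate(levels)}
    Z = [[0] * len(levels) for _ in labels]
    for i, label in enumerate(labels):
        Z[i][index[label]] = 1
    return levels, Z


def zzt(Z):
    """Compute Z Z^T: entry (i, k) is 1 when observations i and k share a level."""
    return [[sum(a * b for a, b in zip(row_i, row_k)) for row_k in Z]
            for row_i in Z]


def marginal_covariance(schools, admins, var_school, var_admin, var_resid):
    """V = var_school * Zs Zs^T + var_admin * Za Za^T + var_resid * I."""
    _, Zs = indicator_matrix(schools)
    _, Za = indicator_matrix(admins)
    Ss, Sa = zzt(Zs), zzt(Za)
    n = len(schools)
    return [[var_school * Ss[i][k] + var_admin * Sa[i][k]
             + (var_resid if i == k else 0.0)
             for k in range(n)] for i in range(n)]


# Six students; schools and test administrators are crossed, not nested.
schools = ["s1", "s1", "s2", "s2", "s3", "s3"]
admins = ["a1", "a2", "a1", "a2", "a1", "a2"]
V = marginal_covariance(schools, admins,
                        var_school=2.0, var_admin=0.5, var_resid=1.0)
```

Two students covary through a shared school, a shared administrator, or both. Because V is n × n and, under crossing, cannot in general be rearranged into small independent blocks, the cost of working with it grows quickly with the number of observations.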

MCMC methods represent a different approach to the estimation of cross-classified multilevel models. They provide a very general, simulation-based approach that can be used to fit many more statistical models than ML procedures. A general overview of such capabilities and their implementation within the R framework is provided in R News, March 2006, but see also these additional references. For every model parameter, a prior distribution is specified that reflects the previous knowledge about the parameter. Based on the observed data, the posterior distribution, which is the analog to the likelihood function in the ML approach, is then determined. If a “noninformative” prior distribution is selected (i.e. we have only marginal previous knowledge about the parameter of interest), the posterior distribution is essentially proportional to the likelihood function, so both lead to essentially the same inferences.
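The relationship between prior, likelihood, and posterior can be illustrated with the simplest conjugate case. The sketch below is an assumption made for illustration, not a model from the article: normally distributed data with known residual variance and a normal prior on the mean. The function name and the data values are hypothetical.

```python
import math


def normal_posterior(y, sigma2, prior_mean, prior_var):
    """Posterior mean and variance of mu for y_i ~ N(mu, sigma2), mu ~ N(prior_mean, prior_var)."""
    n = len(y)
    precision = 1.0 / prior_var + n / sigma2  # precisions of prior and data add up
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(y) / sigma2)
    return post_mean, post_var


y = [4.8, 5.1, 5.6, 4.9, 5.3]       # hypothetical observations
sigma2 = 1.0                        # residual variance, assumed known

ml_estimate = sum(y) / len(y)       # ML estimate of mu is the sample mean
ml_se = math.sqrt(sigma2 / len(y))  # and its standard error

# Very diffuse ("noninformative") prior: the posterior tracks the likelihood.
diffuse_mean, diffuse_var = normal_posterior(y, sigma2, prior_mean=0.0, prior_var=1e6)

# Tight prior centred on 0: the posterior mean is pulled towards the prior mean.
tight_mean, tight_var = normal_posterior(y, sigma2, prior_mean=0.0, prior_var=0.1)
```

With the diffuse prior, the posterior mean and standard deviation agree with the ML estimate and its standard error to several decimal places. With the tight prior, the posterior mean is shrunk strongly towards the prior mean.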

Instead of using a relatively complex joint posterior distribution, which is in many cases analytically intractable, MCMC simulates the parameters from the conditional distributions, usually by means of Gibbs sampling(12) or Metropolis-Hastings sampling(13). Again, dedicated software is available, e.g. BUGS and WinBUGS, but see also this article on Bayesian inference using BUGS.
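As a rough sketch of how Gibbs sampling alternates between conditional distributions, the following plain-Python example fits a deliberately simple model, not a cross-classified one. It assumes normal data with unknown mean and variance, a diffuse normal prior on the mean, and an inverse-gamma prior on the variance. The function name, prior settings, and data are hypothetical choices for illustration and do not reproduce what BUGS does internally.

```python
import random


def gibbs_normal(y, n_iter=5000, burn_in=1000, seed=1,
                 prior_mean=0.0, prior_var=1e6, a0=0.001, b0=0.001):
    """Gibbs sampler for y_i ~ N(mu, sigma2) with
    mu ~ N(prior_mean, prior_var) and sigma2 ~ Inverse-Gamma(a0, b0)."""
    rng = random.Random(seed)
    n = len(y)
    total = sum(y)
    mu, sigma2 = total / n, 1.0  # starting values
    draws = []
    for it in range(n_iter):
        # Step 1: draw mu from its full conditional, a normal distribution.
        post_var = 1.0 / (1.0 / prior_var + n / sigma2)
        post_mean = post_var * (prior_mean / prior_var + total / sigma2)
        mu = rng.gauss(post_mean, post_var ** 0.5)

        # Step 2: draw sigma2 from its full conditional, an inverse gamma.
        # gammavariate takes (shape, scale), so scale = 1 / rate.
        shape = a0 + n / 2.0
        rate = b0 + 0.5 * sum((v - mu) ** 2 for v in y)
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)

        if it >= burn_in:  # discard the warm-up iterations
            draws.append((mu, sigma2))
    return draws


y = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 4.7, 5.4, 5.2, 5.0]  # hypothetical data
draws = gibbs_normal(y)
mu_draws = [d[0] for d in draws]
sigma2_draws = [d[1] for d in draws]
```

Each iteration conditions on the most recent value of the other parameter, so the joint posterior never has to be written down explicitly. In a cross-classified model the same scheme would cycle through the fixed effects, each set of random effects, and each variance component in turn.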

If the simulations of a conditional distribution converge to a stationary distribution, the single draws from the (conditional) distribution can be regarded as realizations of the posterior distribution that forms the central basis for statistical inference (for a detailed discussion of assessing convergence in MCMC, see (14)). This procedure bears some resemblance to bootstrapping,(15) which facilitates, by repeated sampling from the observed data, the determination of the empirical sampling distribution of a parameter of interest. For Bayesian results, the mean, median, or mode of the posterior distribution is used as a point estimate. Comparable to a confidence interval in the frequentist approach, the Bayesian credibility interval (BCI) is based on the 2.5th and the 97.5th percentile points of the posterior distribution.
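The posterior summaries described above can be computed directly from the retained draws. The sketch below is illustrative: the helper names are hypothetical, the percentile rule is simple linear interpolation (other conventions exist), and the draws are an evenly spaced stand-in for real MCMC output.

```python
import statistics


def percentile(sorted_values, p):
    """Percentile (p in [0, 100]) of a sorted list, using linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one draw")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = rank - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def summarise_posterior(draws):
    """Point estimates and 95% Bayesian credibility interval from posterior draws."""
    ordered = sorted(draws)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "bci_95": (percentile(ordered, 2.5), percentile(ordered, 97.5)),
    }


# Stand-in for MCMC output: 101 evenly spaced values from 0.00 to 1.00.
draws = [i / 100.0 for i in range(101)]
summary = summarise_posterior(draws)
```

The interval is read off the empirical distribution of the draws, which is why no normality assumption is needed, much as with a percentile bootstrap interval.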

In contrast to the ML approach, the MCMC method not only gives a point estimate and a standard error for every parameter, but also provides a distribution of the parameter (posterior distribution). The two approaches have now been compared in a number of studies. For instance, Browne and Draper (16) showed that, relative to the MCMC method, ML has the tendency to underestimate the variance components, but that the performance difference between the two methods in terms of the bias of the fixed effects is negligible. Another crucial advantage of the MCMC method is that it can be easily generalized to fit more complex multilevel models that might not be estimable in an ML framework with the statistical software packages currently available.


  1. Lüdtke, O., Robitzsch, A., Trautwein, U., Kreuter, F., and Ihme, J.M. (2007). Are there Test Administrator effects in large-scale educational assessments? Using cross-classified multilevel analysis to probe for effects on mathematics achievement and sample attrition. Methodology, 3(4), 149-159.
  2. Goldstein, H. (2003). Multilevel Statistical Models (3rd Ed.). London: Edward Arnold. See also this Note.
  3. Goldstein, H. (1989). Restricted unbiased iterative generalized least squares estimation. Biometrika, 76(3), 622-623.
  4. del Pino, G. (1989). The Unifying Role of Iterative Generalized Least Squares in Statistical Algorithms. Statistical Science, 4(4), 394-403.
  5. Björck, Å (1996). Numerical Methods for Least Squares Problems. SIAM.
  6. Dear, K.B. (1994). Iterative generalized least squares for meta-analysis of survival data at multiple times. Biometrics, 50(4), 989-1002.
  7. Díaz, M.M. and Ones, V.G. (2005). Estimating multilevel models for categorical data via generalized least squares. Revista Colombiana de Estadística, 28(1), 63-76.
  8. Goldstein, H. (1987). Multilevel Models in Educational and Social Research. London: Griffin.
  9. Hox, J.J. (2002). Multilevel Analysis: Techniques and Applications. Mahwah, NJ: Erlbaum.
  10. Raudenbush, S.W. (1993). A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational Statistics, 18, 321-349.
  11. McCulloch, C.E. and Searle, S.R. (2001). Generalized, Linear, and Mixed Models. New York: Wiley.
  12. Casella, G. and George, E.I. (1992). Explaining the Gibbs sampler. The American Statistician, 46, 167-174.
  13. Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327-335.
  14. Cowles, M.K. and Carlin, B.P. (1996). Markov Chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association, 91, 883-904.
  15. Efron, B. and Tibshirani, R.J. (1993). An introduction to the bootstrap. New York: Chapman and Hall.
  16. Browne, W.J. and Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis, 1(3), 473-514.


1 Note that there are also other options that keep in line with more traditional software, e.g. SAS or SPSS.

