The following discussion describes statistical tools used to estimate the parameters of multilevel models. It is derived from an article published in Methodology,(1) parts of which are briefly extended here with additional references and some applications in educational testing.
In the literature on multilevel models, there are two main approaches to the estimation of statistical models for cross-classified data structures: likelihood-based frequentist methods and Markov chain Monte Carlo (MCMC) procedures, which are based on Bayesian statistics.(2)
Parameter estimation in multilevel modeling programs is usually carried out by likelihood-based frequentist methods that yield maximum likelihood (ML) estimates.
Specialized software exists for multilevel analysis; probably the most widely used programs are MLwiN and HLM. The methods implemented in these programs are designed specifically for estimating hierarchical multilevel models, although they can be adapted to fit other models. For instance, models with crossed random effects can be estimated with ML using ordinary hierarchical multilevel models(8,9). Likewise, the EM algorithm can be applied to cross-classified data(10). ML estimates for cross-classified multilevel models can also be obtained within the framework of linear mixed-effects (LME) models(11). This approach will be discussed in the next section, with an emphasis on the use of SAS and R for LME fitting; a minimal R illustration follows below. In contrast to the specialized programs above, fitting LME models requires the specification of sizable design matrices, which demands more computing time and may lead to estimation problems with complex data structures.
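As a concrete sketch of the LME route, the following R code fits a cross-classified random-intercepts model with the lme4 package. The data and all variable names (pupils, score, primary_school, secondary_school) are hypothetical and simulated here purely for illustration; they are not taken from the article.

```r
# A minimal sketch, assuming the lme4 package; the data are simulated
# (all names are illustrative, not from the article).
library(lme4)

set.seed(1)
pupils <- data.frame(
  primary_school   = factor(sample(1:20, 500, replace = TRUE)),
  secondary_school = factor(sample(1:10, 500, replace = TRUE))
)
pupils$score <- 50 +
  rnorm(20)[pupils$primary_school] +     # primary-school effects
  rnorm(10)[pupils$secondary_school] +   # secondary-school effects
  rnorm(500, sd = 2)                     # pupil-level residual

# Two independent random-intercept terms yield crossed random effects;
# lmer() obtains (restricted) maximum likelihood estimates.
fit_ml <- lmer(score ~ 1 + (1 | primary_school) + (1 | secondary_school),
               data = pupils)
summary(fit_ml)   # fixed effect, variance components, standard errors
```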
MCMC methods represent a different approach to the estimation of cross-classified multilevel models. They provide a very general, simulation-based framework that can fit many more statistical models than ML procedures. A general overview of these capabilities and their implementation within the R framework is provided in R News, March 2006, but see also these additional references. For every model parameter, a prior distribution is specified that reflects the previous knowledge about the parameter. Combining the prior with the observed data via Bayes' theorem then yields the posterior distribution, which plays a role analogous to the likelihood function in the ML approach. If a “non-informative” prior distribution is selected (i.e., we have only marginal prior knowledge about the parameter of interest), the likelihood function and the posterior distribution will be essentially the same.
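In symbols (a standard identity, not specific to the article): for parameters $\theta$ and data $y$,

$$p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \;\propto\; p(y \mid \theta)\, p(\theta),$$

so that under a flat, non-informative prior $p(\theta) \propto 1$, the posterior distribution is proportional to the likelihood function.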
Instead of working directly with the joint posterior distribution, which is in many cases analytically intractable, MCMC simulates the parameters from their conditional distributions, usually by means of Gibbs sampling(12) or Metropolis-Hastings sampling.(13) Again, dedicated software is available, e.g. BUGS and WinBUGS, but see also this article on Bayesian inference using BUGS.
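To make the idea concrete, the following sketch (a toy example, not from the article) implements a Gibbs sampler in base R for the simplest case: normally distributed data with unknown mean and variance, under standard semi-conjugate priors where both full conditional distributions are available in closed form.

```r
# A toy Gibbs sampler, assuming y ~ N(mu, sig2) with semi-conjugate priors
# mu ~ N(m0, s0sq) and sig2 ~ Inverse-Gamma(a0, b0). All values illustrative.
set.seed(1)
y  <- rnorm(50, mean = 2, sd = 1.5)   # simulated data
n  <- length(y)
m0 <- 0;    s0sq <- 100               # vague prior for mu
a0 <- 0.01; b0   <- 0.01              # vague prior for sig2

n_iter <- 5000
draws  <- matrix(NA_real_, n_iter, 2, dimnames = list(NULL, c("mu", "sig2")))
mu <- mean(y); sig2 <- var(y)         # starting values

for (t in seq_len(n_iter)) {
  # Draw mu from its full conditional (normal)
  prec <- 1 / s0sq + n / sig2
  mu   <- rnorm(1, (m0 / s0sq + sum(y) / sig2) / prec, sqrt(1 / prec))
  # Draw sig2 from its full conditional (inverse gamma)
  sig2 <- 1 / rgamma(1, shape = a0 + n / 2, rate = b0 + sum((y - mu)^2) / 2)
  draws[t, ] <- c(mu, sig2)
}
```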
If the simulated chain converges to a stationary distribution, the individual draws from the conditional distributions can be regarded as realizations of the posterior distribution, which forms the central basis for statistical inference (for a detailed discussion of assessing convergence in MCMC, see (14)). This procedure bears some resemblance to bootstrapping(15), which determines the empirical sampling distribution of a parameter of interest by repeated sampling from the observed data. For Bayesian results, the mean, median, or mode of the posterior distribution is used as a point estimate. Comparable to a confidence interval in the frequentist approach, the Bayesian credibility interval (BCI) is based on the 2.5th and 97.5th percentile points of the posterior distribution.
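Continuing the toy sketch above, the point estimates and the BCI described here can be read directly off the simulated draws, after discarding an initial burn-in period:

```r
# Posterior summaries from the Gibbs draws above (burn-in length illustrative).
burn <- 1000
post <- draws[-(1:burn), ]
colMeans(post)                                    # posterior means
apply(post, 2, median)                            # posterior medians
apply(post, 2, quantile, probs = c(0.025, 0.975)) # 95% credibility intervals
```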
In contrast to the ML approach, the MCMC method not only gives a point estimate and a standard error for every parameter, but provides the entire posterior distribution of the parameter. The two approaches have been compared in a number of studies. For instance, the authors of (16) showed that, relative to the MCMC method, ML tends to underestimate the variance components, whereas the difference between the two methods in the bias of the fixed effects is negligible. Another crucial advantage of the MCMC method is that it can easily be generalized to fit more complex multilevel models that might not be estimable in an ML framework with the statistical software packages currently available.
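As a hedged illustration of the Bayesian route in R (the package is an assumption of this note, not named in the article), the MCMCglmm package fits the same cross-classified model from the earlier ML sketch by MCMC and reports posterior means together with 95% credibility intervals:

```r
# A sketch assuming the MCMCglmm package and the simulated "pupils" data
# from the ML example above; chain-length settings are illustrative.
library(MCMCglmm)

fit_mcmc <- MCMCglmm(score ~ 1,
                     random = ~ primary_school + secondary_school,
                     data   = pupils,
                     nitt   = 13000, burnin = 3000, thin = 10)
summary(fit_mcmc)   # posterior means and 95% credibility intervals
```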