Testlet response theory


Here is a brief overview of Testlet Response Theory and its Applications, by Wainer, Bradlow, and Wang (Cambridge University Press, 2007).

This book provides a very nice introduction to true score theory (which focuses on test scores) and item response theory (which focuses on item scores), and discusses the advantages of using testlets as the basis of measurement. I like such a clear overview of the main concepts that form the basis of one's field of study: no unnecessary maths, just facts, good references, supporting examples, and nice visual illustrations. In the same vein, I enjoyed reading Chapter 2 of Jenkinson's Measuring Health and Medical Outcomes (UCL Press, 1994), which offers a historical overview of subjective health assessment. A review of the book was published in Quality of Life Research.

Testlets are defined as "a group of items related to a single content area that is developed as a unit and contains a fixed number of predetermined paths that an examinee may follow." Classical test (or true score) theory considers the whole test as its fungible unit, while IRT models focus on the item as the basic unit of analysis. The authors recommend a "middle path", coined testlet response theory, which uses

pieces of the test (as its unit of measurement) that are simultaneously small enough to be usefully adaptive and large enough to maintain some stability.

Testlets are thought to overcome several limitations of classical linear and adaptive testing forms, especially context effects (cross-information, unbalanced content, robustness, order effects), as well as the usual assumptions made by common IRT models, including conditional independence (i.e., the probability of answering a particular item correctly is independent of responses to any of the other items, conditional on proficiency), which is hard to control in computer adaptive testing.

Starting with Chapter 3, the authors highlight an interesting and nicely illustrated connection between the usual IRT parameter estimates obtained by the marginal likelihood method and Bayes modal estimates, where (p. 32)

the maximum likelihood estimator is conceptually the same as a Bayes modal estimator with an improper uniform prior on proficiency.

In words, the likelihood (the probability of observing a given response pattern, viewed as a function of proficiency, θ) is obtained by multiplying the corresponding item characteristic curves (which describe the probability of endorsing, positively or negatively, a given item as generated by the IRT model). Of course, in order to write the conditional probability of x_i given θ and β (i.e., the likelihood), with β_j the item parameter vector (a_j, b_j, c_j) for item j, as

\[ P(x_i\mid\theta_i,\beta)=\prod_j P_j(\theta_i)^{x_{ij}}\,Q_j(\theta_i)^{1-x_{ij}}, \]

conditional independence must hold (here Q_j(θ_i) = 1 − P_j(θ_i) is the probability of an incorrect response to item j). From a Bayesian perspective, the Bayes modal estimate is based on the posterior distribution

\[ P(\theta\mid x_i)\propto L(\theta\mid x_i)\,p(\theta), \]

with p(θ) reflecting our knowledge about θ before observing the results, i.e. the prior distribution. The latter is treated as one more item in the estimation scheme: it is multiplied with the likelihood, yielding the posterior distribution. Here, the choice of the prior distribution matters: when using a uniform prior, which basically amounts to saying that p(θ) takes the same value for all θ, the posterior distribution for θ will be proportional to the likelihood function. If instead of a uniform distribution we use a Gaussian distribution, there might be more subtle effects on the results. In any case, we can interpret this prior as the correct answer to the question: "Are you part of the population whose proficiency distribution is N(0,1)?" (Remember that the prior distribution is treated as one supplementary item in this estimation framework.)
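
To make the link between the two estimators concrete, here is a minimal sketch in Python. It is not taken from the book: the logistic 3PL form of the item characteristic curve, the item parameters, the response pattern, and the grid search are all assumptions chosen for illustration. It simply evaluates the likelihood above on a grid of θ values, with and without an N(0,1) prior.

```python
import math

def p_3pl(theta, a, b, c):
    """Item characteristic curve, assuming a logistic 3PL model
    (discrimination a, difficulty b, guessing c)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log of prod_j P_j(theta)^x_j * Q_j(theta)^(1 - x_j);
    the product form relies on conditional independence."""
    total = 0.0
    for x, (a, b, c) in zip(responses, items):
        p = p_3pl(theta, a, b, c)
        total += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return total

def log_posterior(theta, responses, items, prior="uniform"):
    """Log posterior up to an additive constant."""
    lp = log_likelihood(theta, responses, items)
    if prior == "normal":
        # N(0,1) log-density without its constant term: acts like
        # one supplementary "item" added to the log-likelihood.
        lp += -0.5 * theta ** 2
    return lp

def grid_mode(fn, lo=-4.0, hi=4.0, steps=801):
    """Crude mode finder: evaluate fn on an evenly spaced grid."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=fn)

# Hypothetical item parameters (a, b, c) and one response pattern.
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.2), (1.5, 1.0, 0.2)]
responses = [1, 1, 1, 0]

mle = grid_mode(lambda t: log_likelihood(t, responses, items))
map_uniform = grid_mode(lambda t: log_posterior(t, responses, items, "uniform"))
map_normal = grid_mode(lambda t: log_posterior(t, responses, items, "normal"))

print("ML estimate:                ", mle)
print("Bayes modal, uniform prior: ", map_uniform)
print("Bayes modal, N(0,1) prior:  ", map_normal)
```

With the flat prior the Bayes modal estimate coincides with the maximum likelihood estimate, whereas the N(0,1) prior can only pull the mode toward 0 (or leave it unchanged). A grid search is used purely for transparency; in practice one would rely on a proper optimizer.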

Some papers of interest:

  1. Wainer, H. and Kiely, G.L. (1987). Item Clusters and Computerized Adaptive Testing: A Case for Testlets. Journal of Educational Measurement, 24(3), 185–201.
  2. Wainer, H. and Lewis, C. (1990). Toward a psychometrics for testlets. Journal of Educational Measurement, 27, 1–14.
  3. Wang, X., Bradlow, E.T., and Wainer, H. (2002). A General Bayesian Model for Testlets: Theory and Applications. ETS Research Report 02-02.
  4. Lu, Y. and Wang, X. (2006). A Hierarchical Bayesian Framework for Item Response Theory Models with Applications in Ideal Point Estimation.
  5. Glas, C.A.W., Wainer, H., and Bradlow, E.T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W.J. van der Linden and C.A.W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 271–288). Boston, MA: Kluwer Academic Publishers.
