This book provides a very nice introduction to true score theory (which
focuses on test scores) and item response theory (which focuses on item
scores), and discusses the advantages of using testlets as the basis of
measurement. I like such a clear overview of the main concepts that form the
basis of one's field of study: no unnecessary maths, just facts, good
references and supporting examples, and nice visual illustrations. Lastly, I
enjoyed reading Chapter 2 of Jenkinson's *Measuring Health and Medical
Outcomes* (UCL Press, 1994), which offers a historical overview of subjective
health assessment. A review of the book was published in *Quality of Life
Research*.

Testlets are defined as "a group of items related to a single content area
that is developed as a unit and contains a fixed number of predetermined
paths that an examinee may follow." Classical test (or *true score*) theory
considers the whole test as its fungible unit, while IRT models take the
item as the basic unit of analysis. The authors recommend a "middle path",
coined *testlet response theory*, which uses

> pieces of the test (as its unit of measurement) that are simultaneously small enough to be usefully adaptive and large enough to maintain some stability.

Testlets are thought to overcome several limitations of classical linear and adaptive test forms, especially context effects (cross-information, unbalanced content, robustness, order effects), and to relax the usual assumptions made by common IRT models, including conditional independence (i.e., the probability of answering a particular item correctly is independent of responses to any of the other items, conditional on proficiency), which is hard to ensure in computer adaptive testing.
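Conditional independence is exactly what a shared stem or passage tends to break. The simulation below is a minimal sketch, not the authors' code: it assumes a random-effects testlet model in the spirit of the Bayesian testlet model of Wang, Bradlow and Wainer, where each examinee receives an extra effect γ for each testlet that enters a 2PL logit as a(θ − b − γ). The test layout (two testlets of two items), the item parameters and the sample size are all hypothetical. Holding θ fixed, items in the same testlet remain correlated, while items in different testlets do not.

```python
import math
import random

def p_correct(theta, a, b, gamma):
    """2PL probability of a correct answer with a person-by-testlet effect gamma (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b - gamma)))

def corr(x, y):
    """Pearson correlation of two equal-length lists of 0/1 responses."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_persons=20000, testlet_sd=1.0, seed=1):
    """Return (within-testlet, between-testlet) response correlations at fixed theta."""
    rng = random.Random(seed)
    theta = 0.0                       # same proficiency for everyone: we condition on theta
    testlet_of_item = [0, 0, 1, 1]    # hypothetical test: two testlets of two items each
    responses = [[] for _ in testlet_of_item]
    for _ in range(n_persons):
        gammas = [rng.gauss(0.0, testlet_sd) for _ in range(2)]  # one effect per testlet
        for j, d in enumerate(testlet_of_item):
            p = p_correct(theta, a=1.0, b=0.0, gamma=gammas[d])
            responses[j].append(1 if rng.random() < p else 0)
    within = corr(responses[0], responses[1])   # items 1 and 2 share testlet 0
    between = corr(responses[0], responses[2])  # items 1 and 3 belong to different testlets
    return within, between

within, between = simulate()
print(f"within-testlet r = {within:.2f}, between-testlet r = {between:.2f}")
```

Setting `testlet_sd=0.0` removes the testlet effect, and both correlations then hover around zero, which is the situation the usual IRT assumption takes for granted.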

Starting with Chapter 3, the authors highlight an interesting and nicely illustrated connection between IRT parameter estimates obtained by marginal likelihood and Bayes modal estimates, where (p. 32)

> the maximum likelihood estimator is conceptually the same as a Bayes modal estimator with an improper uniform prior on proficiency.
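The remark on p. 32 can be checked numerically. The sketch below is an illustration under stated assumptions, not the book's procedure: it uses a 3PL item characteristic curve (without the 1.7 scaling constant), hypothetical item parameters and one hypothetical response pattern, builds the likelihood as a product of ICCs, and locates its maximum on a grid of θ values. Multiplying by a constant prior, which on a bounded grid stands in for the improper uniform prior, cannot move the maximum, so the two estimates coincide.

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve P_j(theta); no 1.7 scaling constant (an assumption)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def likelihood(theta, responses, items):
    """Product of P_j(theta) for correct and Q_j(theta) = 1 - P_j(theta) for incorrect answers.

    Writing the likelihood as a product presumes conditional independence given theta.
    """
    value = 1.0
    for x, (a, b, c) in zip(responses, items):
        p = icc_3pl(theta, a, b, c)
        value *= p if x == 1 else 1.0 - p
    return value

def flat_prior(theta):
    """Constant prior: on a bounded grid it stands in for the improper uniform prior."""
    return 1.0

def grid_argmax(f, grid):
    """Grid point at which f is largest (first one in case of ties)."""
    return max(grid, key=f)

# Hypothetical item parameters (a_j, b_j, c_j) and one examinee's response pattern.
items = [(1.2, -1.0, 0.2), (0.8, 0.0, 0.2), (1.5, 0.5, 0.25), (1.0, 1.5, 0.2)]
responses = [1, 1, 1, 0]
grid = [-4.0 + 0.01 * k for k in range(801)]

mle = grid_argmax(lambda t: likelihood(t, responses, items), grid)
bayes_modal = grid_argmax(lambda t: likelihood(t, responses, items) * flat_prior(t), grid)
print(mle, bayes_modal)
```

A grid search is used only for transparency; a Newton-Raphson or similar optimizer would typically be used in practice.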

In words, the likelihood (the probability of observing a given response
pattern as a function of proficiency, θ) is obtained by multiplying the
corresponding item characteristic curves, which describe the probability of
endorsing a given item as generated by the IRT model: $P_j(\theta)$ enters
the product for an item answered correctly (or endorsed), and
$Q_j(\theta) = 1 - P_j(\theta)$ for an item answered incorrectly. Of course,
in order to write the conditional probability of $x_i$ given $\theta_i$ and
$\beta$ (i.e., the likelihood), with $\beta_j$ the item parameter vector
$(a_j, b_j, c_j)$ for item $j$, as

$$ P(x_i\mid\theta_i,\beta)=\prod_j P_j(\theta_i)^{x_{ij}}\,Q_j(\theta_i)^{1-x_{ij}}, $$

conditional independence must hold. From a Bayesian perspective, the Bayes modal estimate is based on the posterior distribution

$$ P(\theta\mid x_i)\propto L(\theta\mid x_i)\,p(\theta), $$

with p(θ) reflecting our knowledge about θ before observing the results,
i.e. the *prior distribution*. The prior is treated as one more item in the
estimation scheme: it is multiplied with the item characteristic curves that
make up the likelihood, and the product is, up to a normalizing constant, the
posterior distribution. Here, the choice of the prior distribution matters. A
uniform prior amounts to saying that p(θ) takes the same value for all θ, so
the posterior distribution for θ is proportional to the likelihood function
and its mode is the maximum likelihood estimate. With a Gaussian prior
instead, the effects are more subtle: the posterior mode is pulled toward the
prior mean, all the more so when the responses carry little information about
θ, and it stays finite even for all-correct or all-incorrect response
patterns, for which no finite maximum likelihood estimate exists. In any case,
we can interpret this prior as the correct answer to the question: "Are you
part of the population whose proficiency distribution is N(0,1)?" (Remember
that the prior distribution is treated as one supplementary item in this
estimation framework.)
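To make the "prior as one more item" reading concrete, here is a sketch on the log scale, again assuming a 3PL model (no 1.7 constant) with hypothetical item parameters: the log posterior is a sum with one term per answered item plus one term for the N(0,1) prior. For a mixed response pattern, the Gaussian prior pulls the mode toward 0; for an all-correct pattern, the flat-prior mode runs to the edge of the grid (no finite maximum likelihood estimate), whereas the N(0,1) prior yields a finite Bayes modal estimate.

```python
import math

def log_p_3pl(theta, a, b, c):
    """log P_j(theta) under an assumed 3PL model."""
    return math.log(c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b))))

def log_q_3pl(theta, a, b, c):
    """log Q_j(theta), using 1 - P_j(theta) = (1 - c) / (1 + exp(a (theta - b)))."""
    return math.log(1.0 - c) - math.log1p(math.exp(a * (theta - b)))

def log_normal_prior(theta):
    """log N(0, 1) density: the prior viewed as one supplementary 'item'."""
    return -0.5 * theta * theta - 0.5 * math.log(2.0 * math.pi)

def bayes_modal(responses, items, grid, use_prior=True):
    """Grid mode of the log posterior: one term per item, plus one term for the prior."""
    def log_post(theta):
        terms = [log_p_3pl(theta, *it) if x == 1 else log_q_3pl(theta, *it)
                 for x, it in zip(responses, items)]
        if use_prior:
            terms.append(log_normal_prior(theta))  # the prior enters like one more item
        return sum(terms)
    return max(grid, key=log_post)

items = [(1.2, -1.0, 0.2), (0.8, 0.0, 0.2), (1.5, 0.5, 0.25), (1.0, 1.5, 0.2)]  # hypothetical
grid = [-6.0 + 0.01 * k for k in range(1201)]

for pattern in ([1, 1, 1, 0], [1, 1, 1, 1]):
    flat = bayes_modal(pattern, items, grid, use_prior=False)
    gauss = bayes_modal(pattern, items, grid, use_prior=True)
    print(pattern, "flat prior:", round(flat, 2), "N(0,1) prior:", round(gauss, 2))
```

For the all-correct pattern, the flat-prior mode lands on the last grid point whatever the grid's upper bound, which is the numerical symptom of a likelihood that keeps increasing in θ.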

Some papers of interest:

- Wainer, H. and Kiely, G.L. (1987). Item Clusters and Computerized Adaptive Testing: A Case for Testlets. *Journal of Educational Measurement*, 24(3), 185–201.
- Wainer, H. and Lewis, C. (1990). Toward a psychometrics for testlets. *Journal of Educational Measurement*, 27, 1–14.
- Wang, X., Bradlow, E.T., and Wainer, H. (2002). A General Bayesian Model for Testlets: Theory and Applications. ETS Research Report 02-02.
- Lu, Y. and Wang, X. (2006). A Hierarchical Bayesian Framework for Item Response Theory Models with Applications in Ideal Point Estimation.
- Glas, C.A.W., Wainer, H., and Bradlow, E.T. (2000). MML and EAP estimation in testlet-based adaptive testing. In W.J. van der Linden and C.A.W. Glas (Eds.), *Computerized adaptive testing: Theory and practice* (pp. 271–288). Boston, MA: Kluwer Academic Publishers.