- Quality of Life. The assessment, analysis and interpretation of patient-reported outcomes, by Peter M. Fayers and David Machin (Wiley, 2007)
- Statistical Methods for Quality of Life Studies. Design, Measurements and Analysis, by Mounir Mesbah, Bernard F. Cole, and Mei-Ling Ting Lee (Kluwer Academic Publishers, 2002)
- Advances in Quality of Life Research, by Bruno D. Zumbo (Kluwer Academic Publishers, 2001)
This “emerging” field of work extends the frontiers of knowledge in classical epidemiology. Genetics, too, is becoming a new framework for several theoretical considerations.
My daily job is not directly concerned with these research topics, since I’m primarily involved in educational statistics and more classical biostatistics. Nevertheless, I found it useful to gain insight into the statistical methods used to assess such “subjective” patient ratings. From a psychometric point of view, these ratings can be considered as reflecting one or more latent variables underlying clearly identifiable constructs. Classical Test Theory, together with modern Item Response Theory (IRT), provides valuable additions to the understanding of patient self-assessment. Such statistical toolboxes help us better understand how a person interacts with an item or a bundle of items, while providing a way to locate the individual on a criterion-based measurement scale. The Rasch Model has been successfully applied by Agnes Hamon in her thesis on QoL questionnaire design. She also wrote a review on the use of IRT to validate the SIP questionnaire (De la théorie classique à la théorie psychométrique moderne). Indeed, the Rasch Model quite naturally extends the more traditional Classical Test Theory framework, which is based on the analysis of raw scores. Similarly, recent advances in QoL studies call for an extended use of measurement theory.
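To make the idea of locating a person and an item on a common scale concrete, here is a minimal sketch of the dichotomous Rasch model, in which the probability of endorsing an item is a logistic function of the difference between the person location and the item difficulty. The function name and the symbols `theta` (person) and `b` (item) are my own notation for illustration, not taken from the works cited above.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a positive response under the dichotomous Rasch model.

    theta: person location on the latent scale (hypothetical name)
    b:     item difficulty on the same scale (hypothetical name)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A person located exactly at the item difficulty endorses it half of the time.
p_matched = rasch_probability(0.0, 0.0)

# A person located above the item difficulty is more likely to endorse it.
p_above = rasch_probability(1.0, -1.0)
```

Only the difference `theta - b` matters, which is why persons and items can be placed on the same measurement scale.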
Among other things, IRT provides well-established methods to assess score reliability and to test construct validity. Reliability is a property of the scores delivered to the examinees (in an educational context) or patients (in a biomedical context), not of the test itself. Bruce Thompson offers an extended discussion of this topic in his recent book, Score Reliability (see also his homepage). In the following, we shall focus on educational assessment, but the same considerations can easily be transferred to the biomedical domain. So, we speak of reliable scores, and scores refer to the underlying measurement purpose of any test or exam: they are a way of locating or ranking each candidate with respect to a common continuous or ordinal scale. Assigned scores are not immutable, nor should they be viewed as an absolute measure of the performance or proficiency of a given candidate. But one certainly wants to be able to compare the scores obtained by two candidates, or to evaluate the progression of a candidate who takes the same test on different occasions. Further, the scoring process should be as precise as possible: one also wants to minimize possible random fluctuations or errors when estimating the score of a candidate. Reliability, or repeatability, concerns the random variability associated with such measurements, whatever the kind of scores given to the candidate (binary pass/fail or discrete marks). Thus, scores have to be standardized in some way, and their interpretation is valid whenever they are linked to an external criterion or reference scale. The same line of reasoning applies when pass/fail outcomes are considered. In this latter scheme, the precision of the scoring process is assessed through the accuracy of classification (sensitivity of the examination).
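As an illustration of how the random variability of scores can be quantified, here is a small sketch computing Cronbach's alpha, one common internal-consistency estimate from Classical Test Theory. It is my choice of example, not a method singled out by the authors cited above, and the data are made-up toy values.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of scores.

    scores: list of rows, one per respondent, each row a list of item scores.
    """
    k = len(scores[0])                                  # number of items
    items = list(zip(*scores))                          # one column per item
    sum_item_var = sum(variance(col) for col in items)  # sample variances
    total_var = variance([sum(row) for row in scores])  # variance of raw totals
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical ratings: five respondents, three items (toy data only).
toy_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
    [3, 3, 2],
]
alpha = cronbach_alpha(toy_scores)
```

Note that the coefficient is computed from a given sample of scores, which is consistent with viewing reliability as a property of the scores rather than of the test itself.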
Quoting Fayers and Machin, validation of an instrument is the process of determining whether there are grounds for believing that the instrument measures what it is intended to measure, and that it is useful for its intended purpose. Different theoretical and practical concepts are subsumed under the very broad notion of validity. Some are still debated in the literature (e.g. see J. Bond’s article), and the list proposed in the following is far from exhaustive. Content validity concerns the extent to which the items are sensible and reflect the intended domain of interest. Criterion validity considers whether the scale has an empirical association with external criteria, such as other established instruments. Construct validity examines the theoretical relationship of the items to each other and to the hypothesized scales. This is obviously the form of validity most amenable to exploration by numerical analysis. Two specific aspects of construct validity are convergent validity and discriminant validity.
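A simple numerical way to explore convergent and discriminant validity is to correlate scale scores: a scale should correlate strongly with measures of related constructs and weakly with measures of unrelated ones. The sketch below uses a hand-written Pearson correlation and entirely hypothetical scores; the variable names are mine and the data do not come from any study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scale scores for six respondents (toy data only).
new_scale = [10, 14, 9, 18, 12, 16]
related_scale = [11, 15, 8, 19, 13, 15]   # assumed to target a related construct
unrelated_scale = [7, 7, 6, 6, 8, 8]      # assumed to target an unrelated construct

convergent_r = pearson(new_scale, related_scale)      # expected to be high
discriminant_r = pearson(new_scale, unrelated_scale)  # expected to be near zero
```

In practice one would inspect a whole matrix of such correlations rather than a single pair, but the logic is the same.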
Altogether, validity and reliability (but we could also consider sensitivity and responsiveness) are strongly inter-related and contribute to the quality of the test or exam under consideration. Poor reliability can sometimes be a warning that validity might be suspect, and that the measurement is detecting something different from what is intended to be measured.
As can be seen from the above considerations, psychometrics probably has a role to play in the validation of future questionnaires for QoL assessment. QoL now extends to internet-based resources, as illustrated by the recent Net Scoring QoL management. On-line scoring and adaptive testing are also of potential interest for QoL assessment.
I would like to refer the interested reader to John Uebersax’s website on Latent Class Models and other topics related to reliability and assessment. Of course, the famous book by Nunnally and Bernstein, Psychometric Theory, remains the best concise and elegant book on the subject (in my opinion at least).