Latest reading list on medical statistics
A bunch of papers in early view from Statistics in Medicine suddenly showed up in my Google feed reader. Way too many to tweet them all, so here is a brief list of papers I should read during my forthcoming week off.
Some papers come from a special issue; others are ordinary research papers.
Vexler, A., Tsai, W.-M., Malinovsky, Y. Estimation and testing based on data subject to measurement errors: from parametric to nonparametric likelihood methods.
Measurement error (ME) problems can cause bias or inconsistency of statistical inferences. When investigators are unable to obtain correct measurements of biological assays, special techniques to quantify MEs need to be applied. Sampling based on repeated measurements is a common strategy to allow for ME. This method has been well addressed in the literature under parametric assumptions. The approach with repeated measures data may not be applicable when replication is impractical because of cost and/or time concerns. Pooling designs have been proposed as cost-efficient sampling procedures that can help provide correct statistical inference based on data subject to ME. We demonstrate that a mixture of both pooled and unpooled data (a hybrid pooled–unpooled design) can support very efficient estimation and testing in the presence of ME. Nonparametric techniques have not been well investigated to analyze repeated measures data or pooled data subject to ME. We propose and examine both the parametric and empirical likelihood methodologies for data subject to ME. We conclude that the likelihood methods based on the hybrid samples are very efficient and powerful. The results of an extensive Monte Carlo study support our conclusions. Real data examples demonstrate the efficiency of the proposed methods in practice.
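To see why mixing pooled and unpooled specimens helps with measurement error, a simple moment calculation is enough. This is a sketch under assumed conditions, not the authors' parametric or empirical likelihood method: assume an additive error model Z = X + e, where a pool of p specimens is measured once, giving the average of the p X values plus a single error term. An unpooled reading then has variance var(X) + var(e), a pooled reading has variance var(X)/p + var(e), and the two sample variances together identify both components. The function name `hybrid_moments` is hypothetical.

```python
from statistics import fmean, variance


def hybrid_moments(unpooled, pooled, pool_size):
    """Method-of-moments estimates from a hybrid pooled-unpooled sample.

    Assumes Z = X + e for single specimens and Z = mean(X_1..X_p) + e for
    pools of size p, with X and e independent (an illustrative model only).
    Returns (mean of X, variance of X, variance of the measurement error).
    """
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    v_unpooled = variance(unpooled)  # estimates var(X) + var(e)
    v_pooled = variance(pooled)      # estimates var(X)/p + var(e)
    # Subtracting the two removes var(e): v_u - v_p = var(X) * (1 - 1/p).
    var_x = (v_unpooled - v_pooled) * pool_size / (pool_size - 1)
    var_e = v_unpooled - var_x
    # Pooling does not change the expectation, so both samples estimate E[X].
    mean_x = fmean(list(unpooled) + list(pooled))
    return mean_x, var_x, var_e


m, vx, ve = hybrid_moments([0, 2, 4], [0, 0, 3, 3], pool_size=2)
print(m, vx, ve)
```

In small samples these moment estimates can turn negative; a likelihood approach such as the one the paper proposes handles that more gracefully.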
Chevret, S. Bayesian adaptive clinical trials: a dream for statisticians only?
Adaptive or ‘flexible’ designs have emerged, mostly within frequentist frameworks, as an effective way to speed up the therapeutic evaluation process. Because of their flexibility, Bayesian methods have also been proposed for Phase I through Phase III adaptive trials; however, it has been reported that they are poorly used in practice. We aim to describe the international scientific production of Bayesian clinical trials by investigating the actual development and use of Bayesian ‘adaptive’ methods in the setting of clinical trials. A bibliometric study was conducted using the PubMed and Science Citation Index Expanded databases. Most of the references found were biostatistical papers from various teams around the world. Most of the authors were from the US, and a large proportion were from the MD Anderson Cancer Center (University of Texas, Houston, TX). The spread and use of these articles depended heavily on their topic, with 3.1% of the biostatistical articles accumulating at least 25 citations within 5 years of their publication compared with 15% of the reviews and 32% of the clinical articles. We also examined the reasons for the limited use of Bayesian adaptive design methods in clinical trials and the areas of current and future research to address these challenges. Efforts to promote Bayesian approaches among statisticians and clinicians appear necessary.
Hardouin, J.-B., Amri, S., Feddag, M.-L., Sébille, V. Towards power and sample size calculations for the comparison of two groups of patients with item response theory models.
Evaluation of patient-reported outcomes (PRO) is increasingly performed in health sciences. PRO differs from other measurements because such patient characteristics cannot be directly observed. Item response theory (IRT) is an attractive approach to PRO analysis. However, in the framework of IRT, sample size justification is rarely provided, or it ignores the fact that PRO measures are latent variables by using formulas developed for observed variables. This might be inappropriate and might lead to inadequately sized studies. The objective was to develop a valid sample size methodology for comparing PRO between two groups of patients using IRT. The proposed approach takes into account the parameters of the questionnaire's items, the difference between the latent variable means, and its variance, whose derivation is approximated using the Cramér–Rao bound (CRB). We also computed the associated power. We performed a simulation study varying the sample size, the number of items, and the value of the group effect. We compared the power obtained from the CRB with that obtained from simulations (SIM) and with the power based on observed variables (OBS). For a given sample size, the powers from CRB and SIM were similar and always lower than OBS. The number of items had a strong impact for CRB and SIM, with power increasing with questionnaire length, but not for OBS. In the context of latent variables, it seems important to use an adapted sample size formula, because the formula developed for observed variables seems inadequate and leads to an underestimated study size.
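For reference, the "observed variables" (OBS) approach that the abstract argues underestimates study size is usually the textbook two-sample z formula, n per group = 2 sigma^2 (z_{1-alpha/2} + z_{1-beta})^2 / delta^2, with delta the mean difference and sigma the common standard deviation. The sketch below implements only that standard formula, not the authors' Cramér–Rao-based method; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_observed(delta, sigma, alpha=0.05, power=0.80):
    """Two-sample z-test sample size per group for an observed outcome.

    delta: difference in group means; sigma: common standard deviation.
    Two-sided test at level alpha. This is the OBS-style formula, which
    ignores that a PRO score estimates a latent trait.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (sigma * (z_alpha + z_beta) / delta) ** 2
    return ceil(n)


print(n_per_group_observed(0.5, 1.0))             # effect size 0.5, 80% power
print(n_per_group_observed(0.5, 1.0, power=0.9))  # same effect, 90% power
```

According to the abstract, a latent-variable-aware calculation should call for more patients than these numbers when the questionnaire is short.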
Dazard, J.-E., Rao, J.S., Markowitz, S. Local sparse bump hunting reveals molecular heterogeneity of colon tumors.
The question of molecular heterogeneity and of tumoral phenotype in cancer remains unresolved. To understand the underlying molecular basis of this phenomenon, we analyzed genome-wide expression data of colon cancer metastasis samples, as these tumors are the most advanced and hence would be expected to form the most heterogeneous group of tumors, potentially exhibiting the maximum amount of genetic heterogeneity. Casting a statistical net around such a complex problem proves difficult because of the high dimensionality and multicollinearity of the gene expression space, combined with the fact that genes act in concert with one another and that not all genes surveyed might be involved. We devise a strategy to identify distinct subgroups of samples and determine the genetic/molecular signature that defines them. This involves use of the local sparse bump hunting algorithm, which provides a better-suited and more biologically faithful transformed space within which to search for bumps. In addition, thanks to the variable selection feature of the algorithm, we derived a novel sparse gene expression signature, which appears to divide all colon cancer patients into two populations: a population whose expression pattern can be molecularly encompassed within the bump and an outlier population that cannot be. Although all patients within any given stage of the disease, including the metastatic group, appear clinically homogeneous, our procedure revealed two subgroups in each stage with distinct genetic/molecular profiles. We also discuss implications of such a finding in terms of early detection, diagnosis and prognosis.
Ambler, G., Seaman, S., Omar, R.Z. An evaluation of penalised survival methods for developing prognostic models with rare events.
Prognostic models for survival outcomes are often developed by fitting standard survival regression models, such as the Cox proportional hazards model, to representative datasets. However, these models can be unreliable if the datasets contain few events, which may be the case if either the disease or the event of interest is rare. Specific problems include predictions that are too extreme, and poor discrimination between low-risk and high-risk patients. The objective of this paper is to evaluate three existing penalised methods that have been proposed to improve predictive accuracy. In particular, ridge, lasso and the garrote, which use penalised maximum likelihood to shrink coefficient estimates and in some cases omit predictors entirely, are assessed using simulated data derived from two clinical datasets. The predictions obtained using these methods are compared with those from Cox models fitted using standard maximum likelihood. The simulation results suggest that Cox models fitted using maximum likelihood can perform poorly when there are few events, and that significant improvements are possible by taking a penalised modelling approach. The ridge method generally performed the best, although lasso is recommended if variable selection is required.
Lee, W., Gusnanto, A., Salim, A., Magnusson, P., Sim, X., Tai, E.S., Pawitan, Y. Estimating the number of true discoveries in genome-wide association studies.
Recent genome-wide association studies have reported the discoveries of genetic variants of small to moderate effects. However, most studies of complex diseases face a great challenge because the significant variants found are too few to explain the disease heritability. A new approach is needed to recognize all potential discoveries in the data. In this paper, we present a practical model-free procedure to estimate the number of true discoveries as a function of the number of top-ranking SNPs, together with confidence bounds. This approach provides a practical methodology of general utility and produces relevant statistical quantities with simple interpretation.
van Geloven, N., Broeze, K.A., Opmeer, B.C., Mol, B.W., Zwinderman, A.H. How to deal with double partial verification when evaluating two index tests in relation to a reference test?
Research into the diagnostic accuracy of clinical tests is often hampered by single or double partial verification mechanisms; that is, not all patients have their disease status verified by a reference test, nor do all patients receive all tests under evaluation (index tests). We show methods that reduce verification bias introduced when omitting data from partially tested patients. Adjustment techniques are well established when there are no missing index tests and when the reference test is ‘missing at random’. However, in practice, index tests tend to be omitted, and the choice of applying a reference test may depend on unobserved variables related to disease status; that is, verification may be missing not at random (MNAR). We study double partial verification in a clinical example from reproductive medicine in which we analyse the diagnostic values of the chlamydia antibody test and the hysterosalpingography in relation to a diagnostic laparoscopy. First, we plot all possible combinations of sensitivity and specificity of both index tests in two test ignorance regions. Then, we construct models in which we impose different assumptions for the verification process. We allow for missing index tests, study the influence of patient characteristics and study the accuracy estimates if an MNAR mechanism were operating. It is shown that data on tests used in the diagnostic process of the same population are preferably studied jointly and that the influence of an MNAR verification process was limited in a clinical study where more than half of the patients did not have the reference test.
Pfeiffer, R.M., Forzani, L., Bura, E. Sufficient dimension reduction for longitudinally measured predictors.
We propose a method to combine several predictors (markers) that are measured repeatedly over time into a composite marker score without assuming a model and only requiring a mild condition on the predictor distribution. Assuming that the first and second moments of the predictors can be decomposed into a time and a marker component via a Kronecker product structure that accommodates the longitudinal nature of the predictors, we develop first-moment sufficient dimension reduction techniques to replace the original markers with linear transformations that contain sufficient information for the regression of the predictors on the outcome. These linear combinations can then be combined into a score that has better predictive performance than a score built under a general model that ignores the longitudinal structure of the data. Our methods can be applied to either continuous or categorical outcome measures. In simulations, we focus on binary outcomes and show that our method outperforms existing alternatives by using the AUC, the area under the receiver operating characteristic (ROC) curve, as a summary measure of the discriminatory ability of a single continuous diagnostic marker for binary disease outcomes.
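The AUC used as the summary measure here has a simple empirical form: the proportion of (diseased, non-diseased) pairs in which the diseased subject has the higher composite score, with ties counted as one half (the Mann-Whitney form). A minimal sketch for an already-computed score; the function name is hypothetical, and the paper's dimension-reduction step is not implemented.

```python
def empirical_auc(scores_cases, scores_controls):
    """Empirical AUC as the Mann-Whitney concordance probability.

    Counts, over all case/control pairs, how often the case scores higher;
    tied pairs contribute 0.5.
    """
    if not scores_cases or not scores_controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for x in scores_cases:
        for y in scores_controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


print(empirical_auc([0.9, 0.8, 0.4], [0.3, 0.4, 0.1]))
```

The double loop is O(n*m); a rank-based version is faster for large samples but gives the same value.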
Kifley, A., Heller, G.Z., Beath, K.J., Bulger, D., Ma, J., Gebski, V. Multilevel latent variable models for global health-related quality of life assessment.
Quality of life (QOL) assessment is a key component of many clinical studies and frequently requires the use of single global summary measures that capture the overall balance of findings from a potentially wide-ranging assessment of QOL issues. We propose and evaluate an irregular multilevel latent variable model suitable for use as a global summary tool for health-related QOL assessments. The proposed model is a multiple indicator and multiple cause style of model with a two-level latent variable structure. We approach the modeling from a general multilevel modeling perspective, using a combination of random and nonrandom cluster types to accommodate the mixture of issues commonly evaluated in health-related QOL assessments: overall perceptions of QOL and health, along with specific psychological, physical, social, and functional issues. Using clinical trial data, we evaluate the merits and application of this approach in detail, both for mean global QOL and for change from baseline. We show that the proposed model generally performs well in comparing global patterns of treatment effect and provides more precise and reliable estimates than several common alternatives such as selecting from or averaging observed global item measures. A variety of computational methods could be used for estimation. We derived a closed-form expression for the marginal likelihood that can be used to obtain maximum likelihood parameter estimates when normality assumptions are reasonable. Our approach is useful for QOL evaluations aimed at pharmacoeconomic or individual clinical decision making and in obtaining summary QOL measures for use in quality-adjusted survival analyses.
Banerjee, M., Ding, Y., Noone, A.-M. Identifying representative trees from ensembles.
Tree-based methods have become popular for analyzing complex data structures where the primary goal is risk stratification of patients. Ensemble techniques improve the accuracy in prediction and address the instability in a single tree by growing an ensemble of trees and aggregating. However, in the process, individual trees get lost. In this paper, we propose a methodology for identifying the most representative trees in an ensemble on the basis of several tree distance metrics. Although our focus is on binary outcomes, the methods are applicable to censored data as well. For any two trees, the distance metrics are chosen to (1) measure similarity of the covariates used to split the trees; (2) reflect similar clustering of patients in the terminal nodes of the trees; and (3) measure similarity in predictions from the two trees. Whereas the third metric focuses on prediction, the first two focus on the architectural similarity between two trees. The most representative trees in the ensemble are chosen on the basis of the average distance between a tree and all other trees in the ensemble. The out-of-bag estimate of error rate is obtained using neighborhoods of representative trees. Simulations and data examples show gains in predictive accuracy when averaging over such neighborhoods. We illustrate our methods using a dataset of kidney cancer treatment receipt (binary outcome) and a second dataset of breast cancer survival (censored outcome).
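The selection rule is easy to sketch for the prediction-based metric (3): compute a distance between every pair of trees and keep the tree with the smallest average distance to the rest. Here the distance is assumed to be the mean squared difference between two trees' predicted probabilities on a common set of patients; that choice and the function names are illustrative assumptions, and the paper's exact metric definitions may differ.

```python
def prediction_distance(pred_a, pred_b):
    """Mean squared difference between two trees' predictions (assumed metric)."""
    if len(pred_a) != len(pred_b) or not pred_a:
        raise ValueError("predictions must be non-empty and equal length")
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)


def most_representative(predictions):
    """Index of the tree with the smallest average distance to all others.

    predictions: one list of predicted probabilities per tree, all computed
    on the same patients. Returns (best index, list of average distances).
    """
    n = len(predictions)
    if n < 2:
        raise ValueError("need at least two trees")
    averages = []
    for i in range(n):
        total = sum(prediction_distance(predictions[i], predictions[j])
                    for j in range(n) if j != i)
        averages.append(total / (n - 1))
    best = min(range(n), key=averages.__getitem__)
    return best, averages


best, avg = most_representative([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])
print(best, avg)
```

Swapping in metric (1) or (2) only changes `prediction_distance`; the averaging and selection step stays the same.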
Pencina, M.J., D’Agostino, R.B., Song, L. Quantifying discrimination of Framingham risk functions with different survival C statistics.
Cardiovascular risk prediction functions offer an important diagnostic tool for clinicians and patients themselves. They are usually constructed with the use of parametric or semiparametric survival regression models. It is essential to be able to evaluate the performance of these models, preferably with summaries that offer natural and intuitive interpretations. The concept of discrimination, popular in the logistic regression context, has been extended to survival analysis. However, the extension is not unique. In this paper, we define discrimination in survival analysis as the model’s ability to separate those with longer event-free survival from those with shorter event-free survival within some time horizon of interest. This definition remains consistent with that used in logistic regression, in the sense that it assesses how well the model-based predictions match the observed data. Practical and conceptual examples and numerical simulations are employed to examine four C statistics proposed in the literature to evaluate the performance of survival models. We observe that they differ in the numerical values and aspects of discrimination that they capture. We conclude that the index proposed by Harrell is the most appropriate to capture discrimination described by the above definition. We suggest researchers report which C statistic they are using, provide a rationale for their selection, and be aware that comparing different indices across studies may not be meaningful.
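Harrell's C, the index the authors prefer, is built from usable pairs of patients: a pair is usable when the patient with the shorter follow-up time had an observed event, and it is concordant when that patient also has the higher predicted risk. A minimal sketch, assuming pairs tied on time are skipped and ties in predicted risk count as one half (conventions differ between software packages); it uses the full follow-up rather than truncating at a time horizon, and the function name is hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    risk: predicted risk scores (higher means shorter expected survival).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:  # i had the event while j was still at risk
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


print(harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1]))
```

In the toy data, four of the five usable pairs are concordant; the one discordant pair is the event at time 4 versus the censored subject at time 6 with a higher score.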
Archer, K.J., Williams, A.A.A. L1 penalized continuation ratio models for ordinal response prediction using high-dimensional datasets.
Health status and outcomes are frequently measured on an ordinal scale. For high-throughput genomic datasets, the common approach to analyzing ordinal response data has been to break the problem into one or more dichotomous response analyses. This dichotomous response approach does not make use of all available data and therefore leads to loss of power and increases the number of type I errors. Herein we describe an innovative frequentist approach that combines two statistical techniques, L1 penalization and continuation ratio models, for modeling an ordinal response using gene expression microarray data. We conducted a simulation study to assess the performance of two computational approaches and two model selection criteria for fitting frequentist L1 penalized continuation ratio models. Moreover, we empirically compared the approaches using three application datasets, each of which seeks to classify an ordinal class using microarray gene expression data as the predictor variables. We conclude that the L1 penalized constrained continuation ratio model is a useful approach for modeling an ordinal response for datasets where the number of covariates (p) exceeds the sample size (n) and the decision of whether to use Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) for selecting the final model should depend upon the similarities between the pathologies underlying the disease states to be classified.
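A continuation ratio model treats an ordinal outcome as a sequence of binary "stop at this category or continue" steps. One common (forward) parameterization, assumed here, is logit P(Y = k | Y >= k, x) = alpha_k + x'beta for k = 1, ..., K-1; the paper may use a different form, such as the backward one. The sketch below only maps those conditional probabilities to category probabilities; when fitting, an L1 penalty would add lambda * sum(|beta_j|) to the negative log-likelihood, which is not shown. The function name is hypothetical.

```python
import math


def cr_category_probs(alphas, beta, x):
    """Category probabilities under an assumed forward continuation ratio model.

    alphas: K-1 category-specific intercepts; beta: shared coefficients;
    x: covariate values. Returns P(Y = 1), ..., P(Y = K).
    """
    linear = sum(b * v for b, v in zip(beta, x))
    probs = []
    reach = 1.0  # P(Y >= k): probability of reaching the current category
    for alpha in alphas:
        stop = 1.0 / (1.0 + math.exp(-(alpha + linear)))  # P(Y = k | Y >= k)
        probs.append(reach * stop)
        reach *= 1.0 - stop
    probs.append(reach)  # the last category absorbs whatever probability is left
    return probs


print(cr_category_probs([0.0, 0.0], [0.0], [1.0]))
```

Because each category's probability is "reach it, then stop there", the probabilities always sum to one without any extra normalization.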
Li, Y., Baser, R. Using R and WinBUGS to fit a generalized partial credit model for developing and evaluating patient-reported outcomes assessments.
The US Food and Drug Administration recently announced the final guidelines on the development and validation of patient-reported outcomes (PROs) assessments in drug labeling and clinical trials. This guidance paper may boost the demand for new PRO survey questionnaires. As a result, biostatisticians may encounter psychometric methods more frequently, particularly item response theory (IRT) models to guide the shortening of a PRO assessment instrument. This article aims to provide an introduction to the theory and practical analytic skills for fitting a generalized partial credit model (GPCM) in IRT. GPCM theory is explained first, with special attention to a clearer exposition of the formal mathematics than what is typically available in the psychometric literature. Then, a worked example is presented, using self-reported responses taken from the international personality item pool. The worked example contains step-by-step guides on using the statistical languages R and WinBUGS in fitting the GPCM. Finally, the Fisher information function of the GPCM is derived and used to evaluate, as an illustrative example, the usefulness of assessment items by their information contents. This article aims to encourage biostatisticians to apply IRT models in the reanalysis of existing data and in future research.
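The two GPCM quantities mentioned here, the category probabilities and the item information used to judge items, are short formulas. Under the usual parameterization with discrimination a, step parameters b_1, ..., b_m and scaling constant D = 1 (an assumption; some texts use D = 1.7), P(X = k | theta) is proportional to exp(sum over v <= k of a(theta - b_v)), with an empty sum for k = 0, and the item information is a^2 times the conditional variance of the item score. This is a Python sketch, not the article's R/WinBUGS code, and the function names are hypothetical.

```python
import math


def gpcm_probs(theta, a, steps):
    """GPCM category probabilities P(X = 0..m | theta).

    a: item discrimination; steps: step parameters b_1..b_m (D = 1 assumed).
    """
    cumulative = [0.0]  # category 0 has an empty sum
    for b in steps:
        cumulative.append(cumulative[-1] + a * (theta - b))
    top = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def gpcm_information(theta, a, steps):
    """Item information: a^2 times the variance of the item score at theta."""
    p = gpcm_probs(theta, a, steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    second = sum(k * k * pk for k, pk in enumerate(p))
    return a * a * (second - mean * mean)


print(gpcm_probs(0.3, 1.2, [-1.0, 0.5, 1.5]))
print(gpcm_information(0.0, 1.0, [0.0]))
```

With a single step the model reduces to a two-parameter logistic item, so at theta = b the information is a^2 / 4, which the second line shows for a = 1.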
Nietert, P.J., Wahlquist, A.E., Herbert, T.L. Characteristics of recent biostatistical methods adopted by researchers publishing in general/internal medicine journals.
Novel statistical methods are continuously being developed within the context of biomedical research; however, the characteristics of biostatistics methods that have been adopted by researchers publishing in general/internal medicine (GIM) journals are unclear. This study highlights the statistical journal articles, the statistical journals, and the types of statistical methods that appear to be having the most direct impact on research published in GIM journals. We used descriptive techniques, including analyses of articles’ keywords and controlled vocabulary terms, to characterize the articles published in statistics and probability journals that were subsequently referenced within GIM journal articles during a recent 10-year period (2000–2009). From the 45 statistics and probability journals of interest, we identified a total of 597 unique articles as being cited by 900 (of a total of about 10,501) unique GIM journal articles. The most frequently cited statistical topics included general/other statistical methods, followed by epidemiologic methods, randomized trials, generalized linear models, meta-analysis, and missing data. As statisticians continue to develop and refine techniques, the promotion and adoption of these methods should also be addressed so that the effort spent developing them is not in vain.

The Aging, Demographics, and Memory Study is the first extensive study of cognitive impairment and dementia in the US population. A large sample of participants aged 71 years or older answered an in-depth questionnaire that included an extensive cognitive assessment. One of the principal aims of the study was to assign a diagnosis of dementia, cognitive impairment but not demented, or normal to the respondents. Because of this heterogeneity, we apply a factor mixture model to the study's set of neuropsychological measures in order to cluster subjects and reduce dimension simultaneously. Moreover, we consider an extended variant of the model that incorporates a set of demographic and clinical covariates which directly affect the latent variables of the factor mixture model. The aim of the analysis is to investigate whether respondents exhibit the same association of overall cognitive functioning with the covariates or whether there are groups of respondents that exhibit different associations with the covariates, indicating different determinants of overall cognitive functioning.
Mukherjee, B., Ko, Y.-A., VanderWeele, T., Roy, A., Park, S.K., Chen, J. Principal interactions analysis for repeated measures data: application to gene–gene and gene–environment interactions.
Many existing cohorts with longitudinal data on environmental exposures, occupational history, lifestyle/behavioral characteristics, and health outcomes have collected genetic data in recent years. In this paper, we consider the problem of modeling gene–gene and gene–environment interactions with repeated measures data on a quantitative trait. We review possibilities of using classical models proposed by Tukey (1949) and Mandel (1961) using the cell means of a two-way classification array for such data. Although these models are effective for detecting interactions in the presence of main effects, they fail miserably if the interaction structure is misspecified. We explore a more robust class of interaction models that are based on a singular value decomposition of the cell-means residual matrix after fitting the additive main effect terms. This class of additive main effects and multiplicative interaction models (Gollob, 1968) provides useful summaries for subject-specific and time-varying effects as represented in terms of their contribution to the leading eigenvalues of the interaction matrix. It also makes the interaction structure more amenable to geometric representation. We call this analysis ‘principal interactions analysis’. While the paper primarily focuses on a cell-mean-based analysis of repeated measures outcome, we also introduce resampling-based methods that appropriately recognize the unbalanced and longitudinal nature of the data instead of reducing the response to cell means. We illustrate the proposed methods by using data from the Normative Aging Study, a longitudinal cohort study of Boston area veterans since 1963. We carry out simulation studies under an array of classical interaction models and common epistasis models to illustrate the properties of the principal interactions analysis procedure in comparison with the classical alternatives.
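The core computation of the cell-means analysis can be sketched with plain lists: remove additive row and column effects from a two-way table of cell means, then find the leading singular value of the residual (interaction) matrix; its square is the leading eigenvalue of R'R. Assumptions: a complete table with no empty cells, main effects estimated by row and column means, and a simple power iteration in place of a full SVD routine. Function names are hypothetical, and the authors' resampling-based procedure is not shown.

```python
import math


def interaction_residuals(y):
    """Double-centered residuals y_ij - row_i - col_j + grand for a full table."""
    rows, cols = len(y), len(y[0])
    row_means = [sum(r) / cols for r in y]
    col_means = [sum(y[i][j] for i in range(rows)) / rows for j in range(cols)]
    grand = sum(row_means) / rows
    return [[y[i][j] - row_means[i] - col_means[j] + grand for j in range(cols)]
            for i in range(rows)]


def leading_singular_value(m, iters=500):
    """Largest singular value of matrix m via power iteration on m'm."""
    rows, cols = len(m), len(m[0])
    # Residual rows sum to zero, so a constant start vector would be annihilated.
    v = [float(j + 1) for j in range(cols)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(m[i][j] * w[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0  # no interaction left after removing main effects
        v = [x / norm for x in v]
    w = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in w))


# Cell means built as row effect + column effect + rank-one interaction u_i * v_j.
table = [[2.0, 2.0, 5.0], [1.0, 3.0, 8.0]]
resid = interaction_residuals(table)
print(resid)
print(leading_singular_value(resid))
```

In practice a library routine such as `numpy.linalg.svd` would also return the trailing singular values, which are needed to judge how much of the interaction the leading multiplicative term captures.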
Gaio, A.R., da Costa, J.P., Santos, A.C., Ramos, E., Lopes, C. A restricted mixture model for dietary pattern analysis in small samples.
Multivariate finite mixture models have been applied to the identification of dietary patterns. These models are known to have many parameters, and consequently large samples are usually required. We present a special case of a multivariate mixture model that reduces the number of parameters to be estimated and seems adequate for small to moderately sized samples.
We illustrate our approach with an analysis of Portuguese data from a food-frequency questionnaire and with a simulation study.
Pan, Y., Haber, M., Gao, J., Barnhart, H.X. A new permutation-based method for assessing agreement between two observers making replicated quantitative readings.
The coefficient of individual equivalence is a permutation-based measure of agreement between two observers making replicated readings on each subject. It compares the observed disagreement between the observers to the expected disagreement under individual equivalence. Individual equivalence of observers requires that for every study subject, the conditional distributions of the readings of the observers given the subject’s characteristics are identical. Therefore, under individual equivalence it does not matter which observer is making a particular reading on a given subject. We introduce both nonparametric and parametric methods to estimate the coefficient as well as its standard error. We compare the new coefficient with the coefficient of individual agreement and with the concordance correlation coefficient. We also evaluate the performance of the estimates of the new coefficient via simulations and illustrate this new approach using data from a study comparing two noninvasive techniques for measuring carotid stenosis to an invasive gold standard.
Androulakis, E., Koukouvinos, C., Vonta, F. Estimation and variable selection via frailty models with penalized likelihood.
The penalized likelihood methodology has been consistently demonstrated to be an attractive shrinkage and selection method. It not only automatically and consistently selects the important variables but also produces estimators that are as efficient as the oracle estimator. In this paper, we apply this approach to a general likelihood function for data organized in clusters, which corresponds to a class of frailty models that includes the Cox model and the Gamma frailty model as special cases. Our aim was to provide practitioners in the medical or reliability field with options other than the Gamma frailty model, which has been extensively studied because of its mathematical convenience. We illustrate the penalized likelihood methodology for frailty models through simulations and real data.

There is a single-minded focus on events in survival analysis, and we often ignore longitudinal data that are collected together with the event data. This is due partly to a lack of methodology but also to the artificial distinction between survival and longitudinal data analyses. Understanding the dynamics of such processes is important but has been hampered by a lack of appreciation of the difference between confirmatory and exploratory causal inferences. The latter represents an attempt at elucidating mechanisms by applying mediation analysis to statistical data and will usually be of a more tentative character than a confirmatory analysis.
The concept of local independence and the associated graphs are useful. This is related to Granger causality, an important method from econometrics that is generally undervalued by statisticians. This causality concept is different from the counterfactual one since it lacks the intervention aspect. The notion that one can intervene at will in naturally occurring processes, which seems to underlie much of modern causal inference, is problematic when studying mediation and mechanisms.
It is natural to assume a stochastic process point of view when analyzing dynamic relationships. We present some examples to illustrate this.
It is not clear how survival analysis must be developed to handle the complex life-history data that are increasingly being collected today. We give some suggestions.
DeSantis, S.M., Houseman, E.A., Coull, B.A., Nutt, C.L., Betensky, R.A. Supervised Bayesian latent class models for high-dimensional data.
High-grade gliomas are the most common primary brain tumors in adults and are typically diagnosed using histopathology. However, these diagnostic categories are highly heterogeneous and do not always correlate well with survival. In an attempt to refine these diagnoses, we make several immunohistochemical measurements of YKL-40, a gene previously shown to be differentially expressed between diagnostic groups. We propose two latent class models for classification and variable selection in the presence of high-dimensional binary data, fit by using Bayesian Markov chain Monte Carlo techniques. Penalization and model selection are incorporated in this setting via prior distributions on the unknown parameters. The methods provide valid parameter estimates under conditions in which standard supervised latent class models do not, and outperform two-stage approaches to variable selection and parameter estimation in a variety of settings. We study the properties of these methods in simulations, and apply these methodologies to the glioma study for which identifiable three-class parameter estimates cannot be obtained without penalization. With penalization, the resulting latent classes correlate well with clinical tumor grade and offer additional information on survival prognosis that is not captured by clinical diagnosis alone. The inclusion of YKL-40 features also increases the precision of survival estimates. Fitting models with and without YKL-40 highlights a subgroup of patients who have glioblastoma (GBM) diagnosis but appear to have better prognosis than the typical GBM patient.