Christophe Lalanne
November 25, 2014
These slides provide some complementary material on multiple comparisons in ANOVA modeling and on the analysis of correlated data using mixed-effects models.
The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic. Ronald Fisher (1890-1962)
One-way ANOVA: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$.
Two-way ANOVA: $y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$, $\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)$.
When a factor has more than 2 levels, or when the interaction between the two factors is of interest, this involves post-hoc multiple comparisons of pairs of means.
With $k$ samples (treatments), we can define $k-1$ orthogonal contrasts $\phi = \sum_{i=1}^k c_i \bar{x}_i$, with $\sum_i c_i = 0$ and $\phi_u^t \phi_v = 0$, and use as a test statistic $\phi / s_\phi$, where $s^2_\phi = s^2 \sum_i c_i^2 / n_i$, which follows a Student's t distribution.
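As a quick illustration, here is a minimal sketch (simulated data, not taken from these slides) computing one such contrast and its t statistic by hand:

set.seed(100)
g <- gl(3, 12, labels = c("A", "B", "C"))                # 3 groups, 12 observations each
x <- 0.5 * as.numeric(g) + rnorm(36)
ni <- tapply(x, g, length)
mi <- tapply(x, g, mean)
s2 <- sum((x - ave(x, g))^2) / (length(x) - nlevels(g))  # pooled within-group variance
ci <- c(1, -1, 0)                                        # contrast mu_A - mu_B (weights sum to zero)
phi <- sum(ci * mi)                                      # contrast estimate
s.phi <- sqrt(s2 * sum(ci^2 / ni))                       # its standard error
phi / s.phi                                              # t statistic on N - k degrees of freedom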
- contr.treatment: treatment contrasts (dummy coding), the $k-1$ last levels compared to the baseline category
- contr.sum: deviation contrasts, the $k-1$ first levels compared to the mean of the group means
- contr.poly: polynomial contrasts (ordered factors only), to test for linear, quadratic, or cubic trends

contr.treatment(levels(grp <- gl(3, 2, 36, c("A","B","C"))))
##   B C
## A 0 0
## B 1 0
## C 0 1
y <- 0.5 * as.numeric(grp) + rnorm(36); coef(lm(y ~ grp))
## (Intercept)        grpB        grpC 
##   0.3611311   0.6080719   1.1492023
coef(lm(y ~ grp, contrasts = list(grp = "contr.sum")))
## (Intercept)        grp1        grp2 
##  0.94688920 -0.58575808  0.02231383
Consider a factor A with 4 levels and 8 observations per group.
set.seed(101)
n <- 8
A <- gl(4, n, 4*n, labels = paste("a", 1:4, sep = ""))
y <- 0.5 * as.numeric(A) + rnorm(4*n)
d <- data.frame(y, A)
print(mm <- tapply(y, A, mean))
##        a1        a2        a3        a4 
## 0.7195790 0.9945359 0.8941996 2.2276571
One-way ANOVA model:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, SS = 11.38, $F(3,28) = 6.036$ ($p = 0.00264$).
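The overall F test can be reproduced directly from the simulated data (a minimal sketch):

anova(lm(y ~ A, data = d))   # F(3,28) = 6.036, p = 0.0026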
Let us assume that we are interested in the following comparisons: $H_0^{(1)}: (\mu_1+\mu_2)/2 = (\mu_3+\mu_4)/2$, $H_0^{(2)}: \mu_1 = \mu_2$, and $H_0^{(3)}: \mu_3 = \mu_4$.
cm <- cbind(c(-1,-1,1,1), c(-1,1,0,0), c(0,0,-1,1))
contrasts(d$A) <- cm
summary(lm(y ~ A, data = d))
## 
## Call:
## lm(formula = y ~ A, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.63805 -0.33598  0.07134  0.46073  1.43322 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.2090     0.1401   8.628 2.25e-09 ***
## A1            0.3519     0.1401   2.512  0.01806 *  
## A2            0.1375     0.1982   0.694  0.49355    
## A3            0.6667     0.1982   3.365  0.00224 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7927 on 28 degrees of freedom
## Multiple R-squared:  0.3927, Adjusted R-squared:  0.3277 
## F-statistic: 6.036 on 3 and 28 DF,  p-value: 0.002642
as.numeric((t(cm) %*% mm) / c(4, 2, 2))
## [1] 0.3519355 0.1374784 0.6667288
library(multcomp)
summary(glht(aov(y ~ A, d), linfct = mcp(A = c("a1 - a2 = 0"))))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: aov(formula = y ~ A, data = d)
## 
## Linear Hypotheses:
##              Estimate Std. Error t value Pr(>|t|)
## a1 - a2 == 0  -0.2750     0.3963  -0.694    0.494
## (Adjusted p values reported -- single-step method)
With $k$ samples, there are $k(k-1)/2$ pairs of means to compare. With $k=4$ and $\alpha=0.05$, the family-wise error rate (FWER) becomes $1-(1-0.05)^6 = 0.265$, assuming the tests are independent.
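In R, this computation reads:

k <- 4
m <- choose(k, 2)      # 6 pairwise comparisons
1 - (1 - 0.05)^m       # FWER of about 0.265 under independence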
There are two general (and conservative) strategies to control the inflation of Type I errors (Christensen, 2002): adjusting the individual p-values (e.g., Bonferroni-type corrections), or using simultaneous comparison procedures such as Tukey's HSD.
p.adjust.methods
## [1] "holm" "hochberg" "hommel" "bonferroni" "BH" ## [6] "BY" "fdr" "none"
Usually, Bonferroni correction is applied to unplanned comparisons, and it can be restricted to a subset of all possible tests.
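For instance, p.adjust() can be applied to any chosen subset of raw p-values (the values below are hypothetical):

p.adjust(c(0.012, 0.034, 0.21), method = "bonferroni")   # multiplies each p-value by 3, capped at 1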
pairwise.t.test(y, A, p.adjust.method = "bonf")
## 
##   Pairwise comparisons using t tests with pooled SD 
## 
## data:  y and A 
## 
##    a1     a2     a3    
## a2 1.0000 -      -     
## a3 1.0000 1.0000 -     
## a4 0.0042 0.0255 0.0134
## 
## P value adjustment method: bonferroni
Tukey HSD tests can be used following a significant ANOVA, and they are applied to all pairs of means.
TukeyHSD(aov(y ~ A, d))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = y ~ A, data = d)
## 
## $A
##             diff        lwr       upr     p adj
## a2-a1  0.2749569 -0.8071461 1.3570599 0.8986093
## a3-a1  0.1746206 -0.9074824 1.2567236 0.9708799
## a4-a1  1.5080781  0.4259751 2.5901811 0.0037260
## a3-a2 -0.1003363 -1.1824393 0.9817667 0.9941741
## a4-a2  1.2331212  0.1510182 2.3152242 0.0208968
## a4-a3  1.3334575  0.2513545 2.4155605 0.0113289
When the number of observations differs across treatments, we are more concerned with how the sums of squares are computed (Herr, 1986).
For two factors, A and B, one can use sequential (Type I) sums of squares, e.g. SS(A), SS(B|A), SS(AB|A,B), Type II sums of squares, SS(A|B) and SS(B|A), or Type III sums of squares, SS(A|B,AB) and SS(B|A,AB); with unbalanced data these generally differ, and the Type I decomposition depends on the order in which the terms enter the model (see the sketch below).
See also Venables, W.N. (2000). Exegeses on Linear Models. Paper presented to the S-PLUS User's Conference, Washington, DC, 8-9th October 1998 (§ 5.1).
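To see why the order of terms matters, here is a minimal sketch with a hypothetical unbalanced two-way layout (simulated, not from these slides): the sequential (Type I) sums of squares for A and B change when the terms are swapped.

set.seed(102)
dd <- expand.grid(A = gl(2, 1, labels = c("a1", "a2")),
                  B = gl(3, 1, labels = c("b1", "b2", "b3")))
dd <- dd[rep(1:6, times = c(3, 5, 4, 6, 2, 7)), ]               # unequal cell sizes
dd$y <- as.numeric(dd$A) + 0.5 * as.numeric(dd$B) + rnorm(nrow(dd))
anova(lm(y ~ A * B, data = dd))   # SS(A), SS(B|A), SS(AB|A,B)
anova(lm(y ~ B * A, data = dd))   # SS(B), SS(A|B), SS(AB|A,B): the A and B rows differ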
Evelyn Hall: I would like to know how (if) I can extract some of the information from the summary of my nlme.
Simon Blomberg: This is R. There is no if. Only how.
—Evelyn Hall and Simon ‘Yoda’ Blomberg
R-help (April 2005)
Compared to standard linear models, mixed-effects models further include random-effect terms that allow us to model the correlation between statistical units. Such clustered data may arise as the result of grouping of individuals (e.g., students nested within schools), within-unit correlation (e.g., longitudinal data), or a mix thereof (e.g., performance over time for different groups of subjects) (McCulloch and Searle, 2001; Lindsey, 1999; Gelman and Hill, 2007).
This approach belongs to the family of conditional models, as opposed to marginal models, in which population-averaged effects are modeled by assuming a working (within-unit) correlation matrix. Mixed-effects models are not restricted to a single level of clustering, and they yield predicted values for each cluster or level of the hierarchy.
There seems to be little agreement about what fixed and random effects really mean (Gelman, 2005).
As a general decision workflow, we can ask whether we are interested only in estimating parameters for the random-effect terms, or also in obtaining predictions at the individual level.
The famous 'sleep' study is a good illustration of the importance of using pairing information when available.
a <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
## [1] "t(18)=-1.86, p=0.07919"
b <- t.test(extra ~ group, data = sleep, paired = TRUE)
## [1] "t(9)=-4.06, p=0.00283"
Generally, ignoring the within-unit correlation results in a less powerful test: assuming $\text{Cov}(X_1, X_2) = 0$ amounts to over-estimating the variance of the differences, since $\text{Cov}(X_1, X_2)$ will generally be positive.
A positive covariance here means that subjects with higher values at the first level will also tend to have higher values at the second level.
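This is easy to verify on the sleep data (a minimal sketch, relying on the fact that observations are ordered by subject within each group):

x1 <- sleep$extra[sleep$group == "1"]    # drug 1
x2 <- sleep$extra[sleep$group == "2"]    # drug 2, same 10 subjects
var(x1 - x2)                             # variance used by the paired test
var(x1) + var(x2) - 2 * cov(x1, x2)      # the same quantity, spelled out
var(x1) + var(x2)                        # what is implicitly assumed when the pairing is ignored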
Lack of digestive enzymes in the intestine can cause bowel absorption problems, with excess fat in the feces. Pancreatic enzyme supplements can be given to ameliorate the problem (Vittinghoff, Glidden, Shiboski, and McCulloch, 2005).
| ID | None  | Tablet | Capsule | Coated |
|----|-------|--------|---------|--------|
| 1  | 44.50 | 7.30   | 3.40    | 12.40  |
| 2  | 33.00 | 21.00  | 23.10   | 25.40  |
| 3  | 19.10 | 5.00   | 11.80   | 22.00  |
| 4  | 9.40  | 4.60   | 4.60    | 5.80   |
| 5  | 71.30 | 23.30  | 25.60   | 68.20  |
| 6  | 51.20 | 38.00  | 36.00   | 52.60  |
Wide format: Each level in a separate column.
d <- read.table("../data/pilltype.dat", header = TRUE)
head(d)
##   ID None Tablet Capsule Coated
## 1  1 44.5    7.3     3.4   12.4
## 2  2 33.0   21.0    23.1   25.4
## 3  3 19.1    5.0    11.8   22.0
## 4  4  9.4    4.6     4.6    5.8
## 5  5 71.3   23.3    25.6   68.2
## 6  6 51.2   38.0    36.0   52.6
Long format: All levels in the same column.
Note that we often need to keep an 'ID' variable to identify subjects and repeated measurements.
library(reshape2)
head(fat <- melt(d, id.vars = "ID", variable.name = "pilltype", value.name = "fecfat"), n = 7)
##   ID pilltype fecfat
## 1  1     None   44.5
## 2  2     None   33.0
## 3  3     None   19.1
## 4  4     None    9.4
## 5  5     None   71.3
## 6  6     None   51.2
## 7  1   Tablet    7.3
There is only one predictor, pill type, which is attached to subject and period of time (time being subsumed under the repeated administration of the different treatments).
Here are different ways of decomposing the total variance:
fat$subject <- factor(fat$ID)   # subject must be a factor, not a numeric ID
aov(fecfat ~ pilltype, data = fat)
aov(fecfat ~ pilltype + subject, data = fat)
aov(fecfat ~ pilltype + Error(subject), data = fat)
| Source    | DF | SS   | MS     | M1                     | M2/M3                    |
|-----------|----|------|--------|------------------------|--------------------------|
| pilltype  | 3  | 2009 | 669.5  | 669.5/359.7 (p = 0.17) | 669.5/107.0 (p = 0.01)   |
| subject   | 5  | 5588 | 1117.7 | –                      | 1117.7/107.0 (p < 0.001) |
| residuals | 15 | 1605 | 107.0  | –                      | –                        |
The first model, which assumes independent observations, does not remove variability between subjects (about 77.8% of residual variance).
The last two models incorporate subject-specific effects:
$y_{ij} = \mu + \text{subject}_i + \text{pilltype}_j + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2_\varepsilon)$
The inclusion of a random effect specific to subjects allows us to model several types of within-unit correlation at the level of the outcome. What is the correlation between measurements taken on the same individual? We know that $\text{Cor}(y_{ij}, y_{ik}) = \frac{\text{Cov}(y_{ij}, y_{ik})}{\sqrt{\text{Var}(y_{ij})}\sqrt{\text{Var}(y_{ik})}}$. Because $\mu$ and pilltype are fixed, and $\varepsilon_{ij} \perp \text{subject}_i$, we have $\text{Cov}(y_{ij}, y_{ik}) = \text{Cov}(\text{subject}_i, \text{subject}_i) = \text{Var}(\text{subject}_i) = \sigma^2_s$, and the variance components follow from $\text{Var}(y_{ij}) = \text{Var}(\text{subject}_i) + \text{Var}(\varepsilon_{ij}) = \sigma^2_s + \sigma^2_\varepsilon$, which is assumed to hold for all observations.
Hence, $\text{Cor}(y_{ij}, y_{ik}) = \frac{\sigma^2_s}{\sigma^2_s + \sigma^2_\varepsilon}$, which is the proportion of the total variance that is due to subjects. It is also known as the intraclass correlation, $\rho$, and it measures the similarity of observations taken on the same subject (within-cluster similarity).
Observations on the same subject are modeled as correlated through their shared random subject effect. Using the random-intercept model defined above, we can estimate $\rho$ as follows (the default estimation method is REML, restricted maximum likelihood).
First, we fit a random-intercept model by specifying which variable identifies subjects (ID):
library(nlme)
lme.fit <- lme(fecfat ~ pilltype, data = fat, random = ~ 1 | ID)
anova(lme.fit)
##             numDF denDF   F-value p-value
## (Intercept)     1    15 14.265686  0.0018
## pilltype        3    15  6.257391  0.0057
intervals(lme.fit)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                     lower        est.     upper
## (Intercept)      21.58081   38.083333 54.585860
## pilltypeTablet  -34.27929  -21.550000 -8.820713
## pilltypeCapsule -33.39595  -20.666667 -7.937380
## pilltypeCoated  -19.74595   -7.016667  5.712620
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: ID 
##                    lower     est.    upper
## sd((Intercept)) 8.001119 15.89557 31.57924
## 
##  Within-group standard error:
##     lower      est.     upper 
##  7.232394 10.344027 14.794394
We want to compute $\text{Cor}(y_{ij}, y_{ik}) = \frac{\sigma^2_s}{\sigma^2_s + \sigma^2_\varepsilon}$.
From the preceding output, we can extract everything we need using VarCorr():
sigma.s <- as.numeric(VarCorr(lme.fit)[1, 2])    # between-subject SD
sigma.eps <- as.numeric(VarCorr(lme.fit)[2, 2])  # residual (within-subject) SD
sigma.s^2 / (sigma.s^2 + sigma.eps^2)
## [1] 0.7025064
From the ANOVA table, we can also compute $\hat\rho$ (taking care to treat ID as a factor):
ms <- anova(lm(fecfat ~ pilltype + factor(ID), data = fat))[[3]]  # mean squares
vs <- (ms[2] - ms[3]) / nlevels(fat$pilltype)   # between-subject variance component
vr <- ms[3]                                     # residual variance
vs / (vs + vr)
## [1] 0.7025064
We could also use Generalized Least Squares, imposing compound symmetry:
gls.fit <- gls(fecfat ~ pilltype, data = fat,
               corr = corCompSymm(form = ~ 1 | ID))
anova(gls.fit)
## Denom. DF: 20 
##             numDF   F-value p-value
## (Intercept)     1 14.265652  0.0012
## pilltype        3  6.257396  0.0036
## intervals(gls.fit)
We would get $\hat\rho = 0.7025074$.
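If needed, the estimated common correlation can be pulled out of the fitted compound-symmetry structure (a minimal sketch; intervals(gls.fit) above also reports it):

coef(gls.fit$modelStruct$corStruct, unconstrained = FALSE)   # estimated within-subject correlation (Rho)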
The imposed variance-covariance structure is clearly reflected in the predicted values.
Inspection of the distribution of the residuals and residuals vs. fitted values plots are useful diagnostic tools. It is also interesting to examine the distribution of random effects (intercepts and/or slopes).
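With nlme, a few one-liners go a long way (a minimal sketch):

plot(lme.fit)                                  # standardized residuals vs. fitted values
qqnorm(lme.fit, ~ resid(., type = "pearson"))  # distribution of the within-group residuals
qqnorm(lme.fit, ~ ranef(.))                    # distribution of the estimated random intercepts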
## anova(lme.fit)
lme.fit <- update(lme.fit, method = "ML")   # refit by ML for the likelihood-ratio test
lme.fit0 <- update(lme.fit, fixed = . ~ 1)  # null model without the pilltype effect
anova(lme.fit, lme.fit0)
##          Model df      AIC      BIC     logLik   Test  L.Ratio p-value
## lme.fit      1  6 201.9581 209.0264  -94.97905                        
## lme.fit0     2  3 210.5667 214.1008 -102.28334 1 vs 2 14.60857  0.0022
Other variance-covariance (VC) structures can be chosen, depending on the study design (Pinheiro and Bates, 2000), e.g., unstructured, first-order auto-regressive, band-diagonal, or AR(1) with heterogeneous variances.
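As an illustration of how such structures can be declared with nlme (a minimal sketch; the choice should be driven by the design, and here the ordering of pill types is arbitrary):

lme.ar1 <- update(lme.fit,
                  correlation = corAR1(form = ~ 1 | ID),      # AR(1) within subject
                  weights = varIdent(form = ~ 1 | pilltype))  # heterogeneous variances per pill type
anova(lme.fit, lme.ar1)                                       # compare fits (same fixed effects)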
The random-intercept model ensures that the VC structure is constrained as desired. With repeated-measures ANOVA, a common strategy is to use the Greenhouse-Geisser or Huynh-Feldt correction for violations of sphericity, or to rely on MANOVA, which is less powerful (Abdi, 2010; Zar, 1998).
[1] H. Abdi. "The Greenhouse-Geisser correction". In: Encyclopedia of Research Design. Ed. by N. Salkind. Thousand Oaks, CA: Sage, 2010.
[2] R. Christensen. Plane Answers to Complex Questions: The Theory of Linear Models. 3rd ed. Springer, 2002.
[3] A. Gelman. "Analysis of variance: why it is more important than ever". In: Annals of Statistics 33.1 (2005), pp. 1-53.
[4] A. Gelman and J. Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2007.
[5] D. Herr. "On the History of ANOVA in Unbalanced, Factorial Designs: The First 30 Years". In: The American Statistician 40.4 (1986), pp. 265-270.
[6] J. Lindsey. Models for Repeated Measurements. 2nd ed. Oxford University Press, 1999.
[7] C. McCulloch and S. Searle. Generalized, Linear, and Mixed Models. Wiley, 2001.
[8] J. Pinheiro and D. Bates. Mixed-Effects Models in S and S-PLUS. Springer, 2000.
[9] E. Vittinghoff, D. Glidden, S. Shiboski, and C. McCulloch. Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. Springer, 2005.
[10] J. Zar. Biostatistical Analysis. 4th ed. Pearson, Prentice Hall, 1998.