November 3, 2015


The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic. Ronald Fisher (1890-1962)

one- and two-way ANOVA • multiple comparisons • effect size

One-way ANOVA


We are interested in assessing whether several groups differ with respect to their mean. We use an omnibus Fisher-Snedecor F-test to test the null hypothesis that all group means are equal, assuming that group variances are identical in the population (homoskedasticity) and that the normality assumption holds. The alternative hypothesis is that at least one pair of means is different at a fixed \(\alpha=0.05\) level.


Effect of different sugars on length of pea sections grown in tissue culture with auxin present. (Sokal and Rohlf, 1995)

ctrl X2G X2F X1G1F X2S
1 75 57 58 58 62
2 67 58 61 59 66
3 70 60 56 58 65
4 75 59 58 61 63
5 65 62 57 57 64
6 71 60 56 56 62

The data

  • Response variable: length (in ocular units) of pea sections.
  • Explanatory variable (factor, group, predictor): treatment (control, 2% glucose, 2% fructose, 1% glucose + 1% fructose, 2% sucrose).
head(peas, n = 3)
##   ctrl X2G X2F X1G1F X2S
## 1   75  57  58    58  62
## 2   67  58  61    59  66
## 3   70  60  56    58  65

Reshaping data

peas.m <- melt(peas, = "length", 
      = "treatment")
head(peas.m, n = 3)
##   treatment length
## 1      ctrl     75
## 2      ctrl     67
## 3      ctrl     70

This way, we can express relation like length ~ treatment.

Summary statistics

To compute group means and SDs, we can use aggregate() two times, or write a little helper function.

f <- function(x, digits = 1) 
  round(c(mean = mean(x), sd = sd(x)), digits = digits)
aggregate(length ~ treatment, data = peas.m, f)
##   treatment length.mean
## 1      ctrl        70.1       4.0
## 2       X2G        59.3       1.6
## 3       X2F        58.2       1.9
## 4     X1G1F        58.0       1.4
## 5       X2S        64.1       1.8

Note: aggregate() does not handle multiple output correctly (check the dimension of the returned object), but it is ok for printing on the console.

Graphical display

Here, the idea is to show the distribution of the response variable in each group using box-and-whiskers charts.

One-way ANOVA

There are 5 groups of independant statistical units, and it looks like variance are approximately comparable in all but the Control group.

mod <- aov(length ~ treatment, data = peas.m)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## treatment    4 1077.3  269.33   49.37 6.74e-16 ***
## Residuals   45  245.5    5.46                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • F = 269.33/5.46 = (1077.3/4) / (245.5/45)
  • DF = (5-1) and (50-5)
  • p = pf(49.37, 4, 45, lower.tail = FALSE)

The model reads: \(y_{ij} = \mu + \alpha_i + \varepsilon_{ij}\), where the \(\alpha_i\)'s represent average group variations from the 'grand mean', \(\mu\).

A table of effects can be obtained using:

## Tables of effects
##  treatment 
## treatment
##  ctrl   X2G   X2F X1G1F   X2S 
##  8.16 -2.64 -3.74 -3.94  2.16

This is the same as:

aggregate(length ~ treatment, peas.m, mean)$length - 

Model fit = data + residual

d <-, 
                      fit = fitted(mod), 
                      residual = resid(mod))
res <- aggregate(. ~ treatment, data=d, mean)
res$effect <- res$fit - mean(peas.m$length)
##   treatment length  fit      residual effect
## 1      ctrl   70.1 70.1 -4.330303e-16   8.16
## 2       X2G   59.3 59.3  5.328637e-16  -2.64
## 3       X2F   58.2 58.2  4.440892e-16  -3.74
## 4     X1G1F   58.0 58.0  6.772740e-16  -3.94
## 5       X2S   64.1 64.1  3.552931e-16   2.16

How to interpret results

If the F-test is significant, then we reject the null, \(H_0:\alpha_1=\dots=\alpha_5\). We can conclude that the average length of pea sections differ between the groups (at least two of them), and that this difference is statistically significant at the 5% level (\(p<0.001\)).

What are the problems?

  1. The ANOVA table and the F-test do not tell us which pair(s) of means really differ.
  2. The F-test is not helpful to quantify the size of the observed effect.
  3. The validity of the F-test relies on two main assumptions: homosckedasticity and normality.


  1. We need to carry out some kind of post-hoc multiple comparisons.

    with(peas.m, pairwise.t.test(length, treatment, p.adj = "bonf"))
  2. The ratio SS(effect) / SS(effect+residual) reflects the proportion of variance accounted for by the factor of interest.

    summary(mod)[[1]][["Sum Sq"]]

    It is also possible to get 95% CI for such effect size (Steiger, 2004; Nakagawa and Cuthill, 2007), see the MBESS package.

  3. Formal tests of hypothesis (e.g., Bartlett and Levene tests) can be used, but it is better to verify that those assumptions are met based on some graphics (boxplot and QQ-plot). Or we can make use of a non-parametric test (Kruskal-Wallis ANOVA), or the same parametric test but some correction (Welch, 1951) (oneway.test()).

Two-way ANOVA

From one to two factors

Instead of working with only one factor and a single working hypothesis to test, we might want to study:

  1. The effect of one factor, accounting for the effect of another (known) factor.
  2. The separate effect of two factors.
  3. The joint effect of two factors.

This raises several questions, notably the effect of one factor can be different depending on the levels of the other factor (Sokal and Rohlf, 1995; Montgomery, 2012).

The model

Let \(y_{ijk}\) be the \(k\text{th}\) observation for level \(i\) of factor \(A\) (\(i=1,\dots,a\)) and level \(j\) of factor \(B\) (\(j=1,\dots,b\)). The full model can be written as follows:

\[ y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \]

where \(\mu\) is the overall mean, \(\alpha_i\) (\(\beta_j\)) is the deviation of the \(a\) (\(b\)) means from the overall mean, \(\gamma_{ij}\) is the deviation of the \(A\times B\) treatments from \(\mu\), and \(\varepsilon_{ijk}\sim {\cal N}(0,\sigma^2)\) is the residual term.

The \(\alpha_i\) et \(\beta_j\) are called main effects, and \(\gamma_{ij}\) is the interaction effect.

Null hypothesis testing

Null hypotheses associated to the full factorial design are given below:

  • \(H_0^A:\, \alpha_1=\alpha_2=\dots=\alpha_a\) (a-1 DF), No effect of A
  • \(H_0^B:\, \beta_1=\beta_2=\dots=\beta_b\) (b-1 DF), No effect of B
  • \(H_0^{AB}:\, \gamma_{11}=\gamma_{13}=\dots=\gamma_{ab}\) ((a-1)(b-1) DF), No interaction between A and B

The ratio of Mean Squares corresponding to each factor and that of the residuals can be tested with Fisher-Snedecor F-tests.

Interaction between factors

What to do with the interaction

If the interaction is significant, this means that the effect of the first factor depends on the levels of the second factor. In other words, there is no simple effect for the first factor, and we need to study its effect for each level of the second factor. If the interaction is not statistically significant, then we can remove this term from the model if it was not of primary interest, and refit the model.

Often times, the interaction term is the effect we are interested in. But we need to describe the direction and magnitude of this effect carefully. Graphical displays are useful, but be careful with error bars (Cumming and Finch, 2005; Masson, 2003)!

Finally, why a two-way ANOVA rather than two one-way ANOVAs?

Because of Simpson's paradox and biased estimates of the residual.

R commands

The main commands are: aggregate(), aov(), summary.aov(), model.tables(), interaction.plot(),

See Lab 3 : Analysis of variance with R for an application of two-way ANOVA and extended discussion.



[1] G. Cumming and S. Finch. "Inference by Eye, Confidence Intervals and How to Read Pictures of Data". In: American Psychologist 60.2 (2005), pp. 170-180.

[2] M. Masson. "Using confidence intervals for graphically based data interpretation". In: Canadian Journal of Experimental Psychology 57.3 (2003), pp. 203-220.

[3] D. Montgomery. Design and Analysis of Experiments. 8th. John Wiley & Sons, 2012.

[4] S. Nakagawa and I. Cuthill. "Effect size, confidence interval and statistical significance: a practical guide for biologists". In: Biological Reviews of the Cambridge Philosophical Society 82.4 (2007), pp. 591-605.

[5] R. Sokal and F. Rohlf. Biometry. 3rd. WH Freeman and Company, 1995.

[6] J. Steiger. "Beyond the F Test: Effect size confidence intervals and tests of close fit in the Analysis of Variance and Contrast Analysis". In: Psychological Methods 9 (2004), pp. 164-182.

[7] B. Welch. "On the comparison of several mean values: An alternative approach". In: Biometrika 38 (1951), pp. 330-336.