Here is an article, published in Statistics in Medicine last year, that argues against the systematic use of Fisher’s exact test: Lydersen, S., Fagerland, M.W., and Laake, P., Recommended tests for association in 2×2 tables, Statistics in Medicine (2009) 28: 1159-1175.
Since this is the test most often reported in clinical biostatistics and epidemiology articles, let’s look at the rationale behind this claim. Below is the abstract of the article by Lydersen et al.
The asymptotic Pearson’s chi-squared test and Fisher’s exact test have long been the most used for testing association in 2×2 tables. Unconditional tests preserve the significance level and generally are more powerful than Fisher’s exact test for moderate to small samples, but previously were disadvantaged by being computationally demanding. This disadvantage is now moot, as software to facilitate unconditional tests has been available for years. Moreover, Fisher’s exact test with mid-p adjustment gives about the same results as an unconditional test. Consequently, several better tests are available, and the choice of a test should depend only on its merits for the application involved. Unconditional tests and the mid-p approach ought to be used more than they now are. The traditional Fisher’s exact test should practically never be used.
First of all, let’s recall the difference between Fisher’s test and the standard $\chi^2$ test. Basically, Pearson’s $\chi^2$ is the sum of the squared differences between observed and expected counts, each divided by the expected count (to get a convenient normalization). Under the hypothesis of no association (i.e., statistical independence between the row and column variables), the expected count for cell $n_{ij}$ is simply the product of its corresponding row and column totals (the so-called margins), $n_{i+}$ and $n_{+j}$, divided by the grand total $n$. Under a large-sample assumption, the observed $\chi^2$ statistic follows a $\chi^2$(1) distribution. It generalizes to I×J tables, where the degrees of freedom are (I-1)(J-1). Fisher’s test is an exact test relying on the hypergeometric distribution (the probabilities of all possible tables having the same marginal totals are computed). It is a computer-intensive job, and in R (`fisher.test()`) it is delegated to a C function.
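As a minimal sketch of the two computations just described, the R code below works out Pearson’s statistic from the expected counts and then rebuilds Fisher’s two-sided p-value from the hypergeometric probabilities. The counts are those of the Dallal example discussed further down; the variable names are only illustrative, and the small tolerance used for ties is an assumption meant to mimic the behaviour of `fisher.test()`.

```r
# Illustrative 2x2 table (filled column by column)
tab <- matrix(c(8, 3, 5, 10), nrow = 2)

# Expected counts under independence: n_{i+} * n_{+j} / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson's statistic and its large-sample p-value (1 df for a 2x2 table)
x2 <- sum((tab - expected)^2 / expected)
p_x2 <- pchisq(x2, df = 1, lower.tail = FALSE)

# Fisher's test: with all margins fixed, the top-left cell is hypergeometric
m <- sum(tab[1, ])   # first row total
n <- sum(tab[2, ])   # second row total
k <- sum(tab[, 1])   # first column total
support <- max(0, k - n):min(k, m)
probs <- dhyper(support, m, n, k)
p_obs <- dhyper(tab[1, 1], m, n, k)

# Two-sided p-value: total probability of the tables that are
# no more likely than the observed one
p_fisher <- sum(probs[probs <= p_obs * (1 + 1e-7)])

c(chisq = x2, p_chisq = p_x2, p_fisher = p_fisher)
```

The two results can be compared with `chisq.test(tab, correct = FALSE)` and `fisher.test(tab)`.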
It is important to recall that Fisher’s test assumes that both margins are fixed, whereas Pearson’s $\chi^2$ may be used to test either independence (only the grand total is fixed) or homogeneity of proportions (one marginal total is fixed).
A more comprehensive framework is provided in Sokal and Rohlf,^{(1)} p. 721 ff. They consider three different kinds of sampling schemes or designs, named Model I, II and III, depending on whether “the totals in the margins of the 2x2 Table are fixed by the investigator or are free to vary and reflect population parameters.”
Fisher’s test is appropriate for a Model III design. The G-test of independence is designed for the other two models. The G-test is a likelihood ratio test whose expression may be derived from multinomial expectations (p. 692) as
$$ G=2\sum_{i=1}^af_i\ln\left(\frac{f_i}{\hat f_i}\right), $$
i.e., it summarizes “the sum of the independent contributions of departures from expectations ln(•) weighted by the frequency of the particular class (f_{i}).” It is approximately $\chi^2$(1) distributed.
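The formula translates almost literally into R. The following is only a sketch on a small illustrative table (the counts of the Dallal example used later in this post), with expected counts computed from the margins; it assumes no empty cells, since a zero count would produce `NaN` in the logarithm term.

```r
# Illustrative 2x2 table (filled column by column)
tab <- matrix(c(8, 3, 5, 10), nrow = 2)

# Expected frequencies under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# G = 2 * sum of f * ln(f / f_hat) over all cells
G <- 2 * sum(tab * log(tab / expected))

# Large-sample p-value from the chi-squared distribution with 1 df
p_G <- pchisq(G, df = 1, lower.tail = FALSE)

c(G = G, p = p_G)
```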
Yates’ continuity correction consists in subtracting 0.5 from the absolute difference between each observed value and its expected value before squaring; it applies to 2×2 tables only.^{(1)}
Yates’ correction is known to yield conservative tests, as does Fisher’s “exact” test. However, we can read in Agresti (CDA, 2002, p. 103) that
Yates (1934) mentioned that Fisher suggested the hypergeometric to him for an exact test. He proposed a continuity-corrected version of $\chi^2$, $$ \chi_c^2=\sum\sum\frac{\left(\mid n_{ij}-\hat\mu_{ij}\mid -0.5\right)^2}{\hat\mu_{ij}} $$ to approximate the exact test. (…) Since software now makes Fisher’s exact test feasible even with large samples, this correction is no longer needed.
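Agresti’s formula can be checked numerically in a couple of lines. This sketch applies the 0.5 correction directly to a small illustrative table (the counts of Dallal’s example); it assumes every absolute difference between observed and expected counts exceeds 0.5, whereas R’s `chisq.test()` caps the correction at that absolute difference.

```r
# Illustrative 2x2 table (filled column by column)
tab <- matrix(c(8, 3, 5, 10), nrow = 2)

# Expected counts under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Continuity-corrected statistic: shrink each |observed - expected| by 0.5
x2_yates <- sum((abs(tab - expected) - 0.5)^2 / expected)
p_yates <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

c(chisq_corrected = x2_yates, p = p_yates)
```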
There is an excellent tutorial by Jerry Dallal on Contingency Tables, and the same remark is made: continuity-corrected chi-square tests are useless as approximations to Fisher’s exact test now that the latter is readily available in any statistical software. Dallal considers the following example:
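```r
# 2x2 table of counts, filled column by column
tab <- matrix(c(8, 3, 5, 10), nrow = 2)
chisq.test(tab, correct = FALSE)  # uncorrected Pearson chi-squared
chisq.test(tab)                   # with Yates' continuity correction (the default)
fisher.test(tab)                  # Fisher's exact test
```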
Test | $\chi^2$ | p-value |
---|---|---|
Uncorrected $\chi^2$ | 3.9394 | 0.0472 |
Corrected $\chi^2$ | 2.5212 | 0.1123 |
Fisher’s exact test | – | 0.1107 |
As can be seen, the p-values from the continuity-corrected $\chi^2$ and Fisher’s test are in close agreement. The LRT (G-test) would yield $\chi^2$(1)=4.0573, p=0.0440. It is available through `assocstats()` from the vcd package.
All of the above would lead to the following practical considerations: for large samples, or tables satisfying Cochran’s rule^{1}, use the uncorrected chi-square test;^{(4,5)} otherwise, use Fisher’s test. See also Assumptions/Restrictions for Chi-square Tests on Contingency Tables, by Bruce Weaver.
Campbell^{(6)} discussed the use of Fisher’s test in lieu of the usual chi-square test, but see his associated website on Two by two Tables. As quoted on this website, the following rules are recommended:
Finally, Lydersen et al. argue that, as a conditional test, Fisher’s test may be replaced by more appropriate unconditional tests, depending on the hypothesis of interest.
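To make the two alternatives recommended in the abstract concrete, here is a rough base-R sketch on the counts of Dallal’s example. The mid-p value uses one common definition (Fisher’s two-sided p-value minus half the probability of the observed table), and the unconditional test is a grid-search approximation in the spirit of the z-pooled test usually attributed to Suissa and Shuster. Neither is claimed to reproduce the exact procedures evaluated by Lydersen et al., and all object names are hypothetical.

```r
tab <- matrix(c(8, 3, 5, 10), nrow = 2)
n1 <- sum(tab[1, ]); x1 <- tab[1, 1]   # row 1 treated as a binomial sample
n2 <- sum(tab[2, ]); x2 <- tab[2, 1]   # row 2 treated as a binomial sample

# Mid-p: conditional (Fisher) p-value minus half the observed point probability
p_fisher <- fisher.test(tab)$p.value
p_mid <- p_fisher - 0.5 * dhyper(x1, n1, n2, x1 + x2)

# Pooled z statistic for the difference between two proportions
z_pooled <- function(a, b) {
  p_hat <- (a + b) / (n1 + n2)
  se <- sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
  ifelse(se == 0, 0, (a / n1 - b / n2) / se)
}

# All possible outcomes when only the row totals are fixed
grid <- expand.grid(a = 0:n1, b = 0:n2)
z_obs <- abs(z_pooled(x1, x2))
extreme <- abs(z_pooled(grid$a, grid$b)) >= z_obs - 1e-8

# Tail probability for a given common success probability p (nuisance parameter)
tail_prob <- function(p) {
  sum(dbinom(grid$a[extreme], n1, p) * dbinom(grid$b[extreme], n2, p))
}

# Unconditional p-value: worst case over a grid of nuisance values (an approximation)
p_uncond <- max(sapply(seq(0.001, 0.999, by = 0.001), tail_prob))

c(fisher = p_fisher, mid_p = p_mid, unconditional = p_uncond)
```

The grid over the nuisance parameter is a design shortcut: a finer grid or a proper optimizer would track the maximum more closely, at the cost of more computation.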