Christophe Lalanne
October 6, 2015
"To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of." Ronald Fisher (1890-1962)
Lectures: OpenIntro Statistics, 1.3-1.7, 4.1, 4.3, 4.6, 5.4.
"Statisticians are applied philosophers. Philosophers argue how many angels can dance on the head of a needle; statisticians count them. (…) We can predict nothing with certainty but we can predict how uncertain our predictions will be, on average that is. Statistics is the science that tells us how." Senn (2003)
Descriptive statistics are a fundamental step before any attempt at modeling.
It is very important to summarize the distribution of each variable (univariate approach) and pairwise relationships between variables (bivariate approach).
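Such summaries are readily obtained in R; a minimal sketch, using the built-in sleep data that is analyzed later in this section:

```r
# Univariate summary of the response, then a summary by group (bivariate view)
data(sleep)
summary(sleep$extra)                       # five-number summary plus the mean
with(sleep, tapply(extra, group, median))  # resistant location estimate per group
```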
Anscombe's Quartet
Pearson's correlation, as a measure of linear association, is meaningful in the first case (x1-y1) only. The assumption of linearity must be carefully checked.
| Pair  | r      |
|-------|--------|
| x1-y1 | 0.8164 |
| x2-y2 | 0.8162 |
| x3-y3 | 0.8163 |
| x4-y4 | 0.8165 |
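These correlations can be reproduced from R's built-in `anscombe` data set; a minimal sketch:

```r
# Pearson's r for each of the four Anscombe pairs
data(anscombe)
sapply(1:4, function(i)
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]]))
# => all four values are close to 0.816
```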
EDA is about exploring data for patterns and relationships without requiring prior hypotheses and using resistant methods (Tukey, 1977). This iterative approach makes heavy use of graphical methods to suggest hypotheses and check model assumptions.
The main ideas come down to the "four Rs" (Hoaglin, Mosteller, and Tukey, 1985): resistance, residuals, re-expression, and revelation (graphical displays). A small illustration of resistance is sketched below.
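A minimal sketch of resistance, with hypothetical data: a single aberrant value drags the mean and standard deviation, while the median and IQR barely move.

```r
# Resistant vs. non-resistant summaries in the presence of one outlier
x <- c(4.2, 4.6, 5.0, 5.1, 5.3, 50)    # hypothetical data, last value aberrant
c(mean = mean(x), median = median(x))  # mean is pulled upward, median is not
c(sd = sd(x), IQR = IQR(x))            # sd explodes, IQR stays stable
```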
We focus on a single hypothesis (the null hypothesis) and compute the probability of observing data at least as extreme as ours if that hypothesis were true. If this probability is small enough (usually below 0.05), we "reject" the null. The statistical power of such a test is the probability of rejecting the null when it is indeed false.
In sum, the idea is to confront a single hypothesis with the data, through a designed experiment, with falsification as the only "truth." This approach follows Popper's philosophy of science and was implemented in Fisher's significance tests and in Neyman & Pearson's NHST framework. See Hilborn and Mangel (1997) for further discussion.
We would rather know $P(H_0 \mid \text{data})$ than $P(|S| > |s|)$ under the null, even if "the earth is round (p < .05)."
"Four of these dishes were filled with a conventional nutrient solution and four held an experimental 'life-extending' solution to which vitamin E had been added. I waited three weeks with fingers crossed that there was no contamination of the cell cultures, but at the end of this test period three dishes of each type had survived. My technician and I transplanted the cells, let them grow for 24 hours in contact with a radioactive label, and then fixed and stained them before covering them with a photographic emulsion. Ten days passed and we were ready to examine the autoradiographs. Two years had elapsed since I first envisioned this experiment and now the results were in: I had the six numbers I needed. 'I've lost the labels,' my technician said as she handed me the results. This was a dire situation. Without the labels, I had no way of knowing which cell cultures had been treated with vitamin E and which had not." (Good, 2005)
Here are the data (cell counts in the six dishes): 121, 118, 110, 34, 12, 22.
Let $H_0$: "vitamin E does not impact culture growth." Under the null, the two labels (treated, untreated) carry no information about the response. Let us consider the total cell count in each batch of three dishes.
There are $\binom{6}{3} = 20$ ways to choose 3 elements ($X_1$ to $X_3$) among a total of 6 elements. Let $s$ be the sum of their values. The observed sum is $S = 121 + 118 + 110 = 349$.
How often do we observe a sum as extreme as (or more extreme than) $S$?
|    | X1  | X2  | X3  | s   |
|----|-----|-----|-----|-----|
| 1  | 121 | 118 | 110 | 349 |
| 2  | 121 | 118 | 34  | 273 |
| 3  | 121 | 118 | 12  | 251 |
| …  | …   | …   | …   | …   |
| 18 | 110 | 34  | 22  | 166 |
| 19 | 110 | 12  | 22  | 144 |
| 20 | 34  | 12  | 22  | 68  |

Only 1 of the 20 possible arrangements yields a sum $s \geq S = 349$, hence the one-sided p-value is $1/20 = 0.05$.
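This exhaustive enumeration is easy to reproduce in R; a minimal sketch, assuming the six counts given above:

```r
# All 20 ways of choosing 3 dishes out of 6, and the corresponding sums
x <- c(121, 118, 110, 34, 12, 22)
S <- 121 + 118 + 110                           # observed sum, S = 349
s <- combn(6, 3, FUN = function(i) sum(x[i]))  # the 20 possible sums
mean(s >= S)                                   # one-sided p-value: 1/20 = 0.05
```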
Student's t-test can be used to compare two means estimated from small to moderately sized samples. The idea is to refer the test statistic (the parameter estimate divided by its standard error, often assuming a pooled variance estimated from the whole sample) to a T distribution with $\nu = n_1 + n_2 - 2$ degrees of freedom.
By default, R's `t.test()` does not assume equality of variances when performing this test; it applies the Welch-Satterthwaite correction to the degrees of freedom (the Behrens-Fisher problem).
```
## function (x, y = NULL, alternative = c("two.sided", "less", "greater"),
##     mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95,
##     ...)
## NULL
```
Effect of two soporific drugs (1: D. hyoscyamine hydrobromide vs. 2: L. hyoscyamine hydrobromide), measured as the increase in hours of sleep compared to control, on 10 patients (William Sealy Gosset, 1876-1937, nom de plume "Student").
```r
data(sleep)
head(sleep, n=2)
```
```
##   extra group ID
## 1   0.7     1  1
## 2  -1.6     1  2
```
```r
aggregate(extra ~ group, data=sleep, mean)
```
```
##   group extra
## 1     1  0.75
## 2     2  2.33
```
```r
t.test(extra ~ group, data=sleep, paired=TRUE)
```
```
## 
##  Paired t-test
## 
## data:  extra by group
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean of the differences 
##                   -1.58
```
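To see the impact of the equal-variance assumption, one can compare the default (Welch) and pooled-variance versions of the test on the same data, ignoring the pairing for the sake of illustration; a minimal sketch:

```r
# Welch test (default): separate variances, non-integer corrected df
t.test(extra ~ group, data = sleep)$parameter
# Pooled-variance test: exactly nu = n1 + n2 - 2 = 18 df
t.test(extra ~ group, data = sleep, var.equal = TRUE)$parameter
```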
As $n$ increases ($\nu \propto n$), the T distribution approaches that of the standard Normal. For small $n$, the corresponding T distribution exhibits thicker tails (accounting for unusual, but expected, large results in small samples).
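This convergence is easy to check numerically; a minimal sketch comparing 97.5% quantiles:

```r
# 97.5% quantile of T distributions with increasing df vs. the standard Normal
qt(0.975, df = c(2, 5, 10, 30, 100))  # 4.30, 2.57, 2.23, 2.04, 1.98
qnorm(0.975)                          # 1.96
```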
$H_0: \mu_1 = \mu_2$, or equivalently $\mu_1 - \mu_2 = 0$ (no difference in population means, i.e. the two samples come from the same population), vs. $H_1: \mu_1 \neq \mu_2$ (differences can be in either direction) or, for a one-sided alternative, $H_1: \mu_1 < \mu_2$ (resp. $\mu_1 > \mu_2$).
We tossed a coin 10 times and observed the following outcomes: `TTTTHHHTHT`. Suppose that the events are all independent. Some questions we may want to ask:

- Is the coin fair?
- What is the probability of observing 4 Heads out of 10 tosses with a fair coin?
- What range of values for the true probability of Heads is compatible with these observations?
These are three very different questions, which call for a decision test, simple probability calculus, and a confidence interval for our estimate, respectively. In all cases, however, we need to rely on a known statistical distribution.
To answer the first question, the statistician would say:
"Assuming a fair coin and independent events, the expected number of Heads is 10 × 0.5 = 5. The observed frequency of Heads is 4/10 = 0.4.
We can formulate our null as $H_0: p = 0.5$ and use a binomial test to assess whether the observed frequency significantly differs from the expected one, at risk $\alpha = 5\%$.
The results suggest that this series of Heads and Tails is not incompatible with the hypothesis of equal probability."
```r
binom.test(x=4, n=10)
```
```
## 
##  Exact binomial test
## 
## data:  4 and 10
## number of successes = 4, number of trials = 10, p-value = 0.7539
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.1215523 0.7376219
## sample estimates:
## probability of success 
##                    0.4
```
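The exact two-sided p-value can be reproduced from the binomial probability mass function; a minimal sketch:

```r
# Total probability of outcomes no more likely than the observed x = 4
p <- dbinom(0:10, size = 10, prob = 0.5)       # pmf of Heads count, fair coin
sum(p[p <= dbinom(4, size = 10, prob = 0.5)])  # 0.7539, matching binom.test()
```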
Compare with the following test, which relies on a Gaussian approximation:
```r
prop.test(x=4, n=10)
```
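The chi-squared statistic behind `prop.test()` can be reproduced by hand; a minimal sketch, with Yates' continuity correction (applied by default):

```r
# Normal (chi-squared) approximation with continuity correction, x = 4, n = 10
observed <- c(4, 6)                             # observed Heads and Tails
expected <- c(5, 5)                             # expected counts under p = 0.5
stat <- sum((abs(observed - expected) - 0.5)^2 / expected)  # X-squared = 0.1
pchisq(stat, df = 1, lower.tail = FALSE)        # ~0.75, close to the exact test
```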
[1] D. Ashby. "Bayesian statistics in medicine: a 25 year review". In: Statistics in Medicine 25.21 (2006), pp. 3589-3631.
[2] P. Good. Permutation, Parametric and Bootstrap Tests of Hypotheses. 3rd ed. Springer, 2005.
[3] R. Hilborn and M. Mangel. The Ecological Detective: Confronting Models with Data. Princeton University Press, 1997.
[4] D. Hoaglin, F. Mosteller, and J. Tukey. Understanding Robust and Exploratory Data Analysis. New York: Wiley, 1985.
[5] S. Senn. Dicing with Death: Chance, Risk and Health. Cambridge University Press, 2003.
[6] J. Tukey. Exploratory Data Analysis. Addison-Wesley, 1977.