Data analysis and statistical inference

Christophe Lalanne
October 15, 2013

Synopsis

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. Ronald Fisher (1890-1962)

Descriptive statistics • Exploratory Data Analysis • Statistical inference • Statistical models • Student t test

Readings: OpenIntro Statistics, 1.3-1.7, 4.1, 4.3, 4.6, 5.4.

Statistics are and statistics is

“Statisticians are applied philosophers. Philosophers argue how many angels can dance on the head of a needle; statisticians count them. (…) We can predict nothing with certainty but we can predict how uncertain our predictions will be, on average that is. Statistics is the science that tells us how.” Senn (2003)

  • Providing appropriate (meaningful and robust) numerical and visual summaries of the data.
  • Summarizing associations, spotting unusual observations.
  • Modeling and testing, while reporting honest estimates of uncertainty for model parameters.

Basic summary statistics

Descriptive statistics are a fundamental step before any attempt at modeling. It is very important to summarize the distribution of each variable (univariate approach) and pairwise relationships between variables (bivariate approach).

  • Central tendency: mean, median, mode
  • Dispersion: standard deviation, inter-quartile range, range
  • Distribution: histogram (or density estimate), quantile plot, cumulative distribution function, table of relative frequencies
  • Association: correlation (linear or rank-based), odds ratio, standardized mean difference
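As an illustration, most of the summaries listed above are one-liners in base R. The sketch below uses a small hypothetical sample `x` (illustrative values, not real data) and a second variable `y` derived from it. Base R has no function for the statistical mode (`mode()` returns the storage mode of an object), so `stat_mode()` is a hypothetical helper defined here.

```r
# Hypothetical sample of 10 measurements (illustrative values, not real data)
x <- c(2, 4, 4, 5, 7, 8, 8, 8, 10, 14)

# Central tendency
mean(x)     # arithmetic mean
median(x)   # middle value of the sorted data
# Hypothetical helper: most frequent value (the smallest one in case of ties),
# since base::mode() returns the storage mode, not the statistical mode
stat_mode <- function(v) {
  tab <- table(v)
  as.numeric(names(tab)[which.max(tab)])
}
stat_mode(x)

# Dispersion
sd(x)           # standard deviation (n - 1 denominator)
IQR(x)          # inter-quartile range, Q3 - Q1
diff(range(x))  # range, max - min

# Distribution: table of relative frequencies
table(x) / length(x)

# Association with a second (hypothetical) variable
y <- 2 * x + 1
cor(x, y)                       # Pearson (linear) correlation
cor(x, y, method = "spearman")  # rank-based (Spearman) correlation
```

Because `y` is an exact linear function of `x`, both correlations equal 1 here; with real data the rank-based version is less sensitive to outlying values.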

Comparing two proportions (cont'd)

To answer the first question, the statistician would say:

“Assuming a fair coin and independent tosses, the expected number of Heads is \( 10 × 0.5 = 5 \). The observed proportion of Heads is \( 4/10 = 0.4 \).
We can formulate our null hypothesis as \( H_0: p = 0.5 \) and use a binomial test to assess whether the observed proportion differs significantly from the expected one, at a significance level \( \alpha = 0.05 \).
The results (p = 0.754 > 0.05) suggest that this series of Heads and Tails is not incompatible with the hypothesis of equal probability.”
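Under \( H_0 \), the number of Heads follows a Binomial(10, 0.5) distribution, and the two-sided exact p-value is the total probability of all outcomes that are no more likely than the one observed. A minimal sketch of this computation by hand in R (the variable names are illustrative), which can be checked against `binom.test()`:

```r
# Exact two-sided binomial test of H0: p = 0.5, for 4 Heads in 10 tosses
n  <- 10    # number of tosses
k  <- 4     # observed number of Heads
p0 <- 0.5   # probability of Heads under H0

n * p0      # expected number of Heads under H0
k / n       # observed proportion of Heads

# Null distribution: P(X = j) for j = 0, ..., n when X ~ Binomial(n, p0)
probs <- dbinom(0:n, size = n, prob = p0)

# Two-sided p-value: total probability of all outcomes no more likely than
# the observed one (the small tolerance guards against floating-point ties)
p_value <- sum(probs[probs <= dbinom(k, size = n, prob = p0) * (1 + 1e-7)])
p_value
p_value < 0.05   # FALSE: H0 is not rejected at alpha = 0.05
```

Only the outcome \( X = 5 \) (probability \( 252/1024 \)) is more likely than \( X = 4 \), so the p-value is \( 1 - 252/1024 = 772/1024 \approx 0.754 \).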


binom.test(x=4, n=10)

    Exact binomial test

data:  4 and 10 
number of successes = 4, number of trials = 10, p-value = 0.7539
alternative hypothesis: true probability of success is not equal to 0.5 
95 percent confidence interval:
 0.1216 0.7376 
sample estimates:
probability of success 
                   0.4 
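The 95% confidence interval reported by `binom.test()` is the Clopper–Pearson ("exact") interval, whose bounds are quantiles of Beta distributions. A short sketch reproducing it with `qbeta()`, assuming \( 0 < k < n \) (the boundary cases use 0 or 1 as a bound instead):

```r
# Clopper-Pearson 95% confidence interval for p, from Beta quantiles
n <- 10; k <- 4
alpha <- 0.05
lower <- qbeta(alpha / 2, k, n - k + 1)       # assumes k > 0
upper <- qbeta(1 - alpha / 2, k + 1, n - k)   # assumes k < n
c(lower, upper)
# Compare with the interval stored in the object returned by binom.test()
binom.test(x = k, n = n)$conf.int
```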

Compare with the following test, which relies on a Gaussian (normal) approximation to the binomial distribution:

prop.test(x=4, n=10)
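For a single proportion, `prop.test()` computes a 1-df chi-squared statistic (the square of the normal-approximation z statistic) and, by default (`correct = TRUE`), applies the Yates continuity correction. A sketch of the same computation by hand, under these assumptions:

```r
# Normal (chi-squared) approximation to the binomial test, computed by hand,
# with the Yates continuity correction used by prop.test() by default
n <- 10; x <- 4; p0 <- 0.5
yates <- min(0.5, abs(x - n * p0))                          # continuity correction
chi2  <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))  # 1-df statistic
p_approx <- pchisq(chi2, df = 1, lower.tail = FALSE)
p_approx
prop.test(x = x, n = n)$p.value   # should agree with p_approx
```

Here the approximate p-value (about 0.75) is close to the exact one; with small samples or proportions near 0 or 1, the two can diverge more markedly.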

References

Ashby D (2006). “Bayesian statistics in medicine: a 25 year review.” Statistics in Medicine, 25(21), pp. 3589-3631.

Good P (2005). Permutation, Parametric and Bootstrap Tests of Hypothesis, 3rd edition. Springer.

Hilborn R and Mangel M (1997). The Ecological Detective: Confronting Models with Data. Princeton University Press.

Hoaglin D, Mosteller F and Tukey J (1985). Understanding Robust and Exploratory Data Analysis. New York: Wiley.

Senn S (2003). Dicing with Death: Chance, Risk and Health. Cambridge University Press.

Tukey J (1977). Exploratory Data Analysis. Addison-Wesley.