Mendelian randomization


Some time ago, I proposed a paper on Mendelian randomisation for the Journal Club on Cross Validated. Apparently, the discussion never took off, but here are the main ideas from that paper.

The paper in question is freely available on PLoS Medicine:

Sheehan NA, Didelez V, Burton PR, Tobin MD (2008). Mendelian Randomisation and Causal Inference in Observational Epidemiology. PLoS Med 5(8): e177. doi:10.1371/journal.pmed.0050177

An older paper can also be found in the International Journal of Epidemiology: 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Mendelian randomization in family data (BMC Proc. 2009; 3(Suppl 7): S45) is another paper on the topic, with aggregation at the family level.

In a few words, the idea behind Mendelian randomisation (which, from a genetic perspective, relies on the random assortment of genes transmitted from parents to offspring during gamete formation and conception(a)) is to use a known genetic variant as a proxy for an intermediate phenotype (the exposure), in order to assess whether the association between that phenotype and the disease of interest is causal or due to confounding. This is akin to the use of instrumental variables in econometrics. Most importantly, the genetic variant must be unrelated to the confounding factor(s) while being predictive of the exposure. It must also have no direct effect on the outcome: conditional on exposure and confounders, the genetic variant is independent of the outcome. Under these assumptions, testing the association between the genetic variant and the outcome amounts to testing for a causal effect of the exposure on the outcome.
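
To make the instrumental-variable logic concrete, here is a minimal simulation sketch. It is not taken from Sheehan et al.: the linear data-generating model, the unobserved confounder U and all coefficient values (beta, alpha, maf) are assumptions chosen for illustration. It compares the naive regression slope of the outcome on the exposure, which U biases, with the Wald (ratio) estimator cov(G, Y) / cov(G, X), which should recover the causal effect when the three conditions above hold.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def simulate(n=100_000, beta=0.5, alpha=0.4, maf=0.3, seed=1):
    """Simulate a hypothetical linear model with an unobserved confounder U.

    G: allele count (0, 1, 2), independent of U   (G unrelated to confounders)
    X = alpha * G + U + noise                      (G predicts the exposure)
    Y = beta * X + U + noise                       (G acts on Y only through X)
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < maf) + (rng.random() < maf)  # two alleles, summed as ints
        ui = rng.gauss(0, 1)
        xi = alpha * gi + ui + rng.gauss(0, 1)
        yi = beta * xi + ui + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def naive_slope(x, y):
    """OLS slope of Y on X; biased because U drives both X and Y."""
    return cov(x, y) / cov(x, x)


def wald_ratio(g, x, y):
    """IV (Wald) estimate: G-Y association divided by G-X association."""
    return cov(g, y) / cov(g, x)


if __name__ == "__main__":
    g, x, y = simulate()
    print("naive OLS: ", round(naive_slope(x, y), 2))   # well above the true beta = 0.5
    print("Wald ratio:", round(wald_ratio(g, x, y), 2))  # close to beta = 0.5
```

In this toy model, cov(G, Y) = beta * cov(G, X), so testing cov(G, Y) = 0 is the same as testing beta = 0, which is the sense in which the variant-outcome association speaks to the causal effect.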

Several limitations of Mendelian randomization are discussed, including the presence of linkage disequilibrium, genetic heterogeneity (when a phenotype is influenced by several alleles, generally at different loci), pleiotropy (when a genetic variant has more than one phenotypic effect), and population stratification (when the relation between allele frequencies and disease or exposure varies across subgroups), to name a few. Figures 2-5 provide nice depictions of what happens in each of these cases.
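
The same kind of toy simulation shows how two of these limitations bias the ratio estimate. Again, this is a hypothetical linear model, not the paper's: gamma is an assumed direct (pleiotropic) effect of the variant on the outcome, and strat an assumed outcome shift in one of two unobserved subpopulations that also differ in allele frequency. In the linear case, pleiotropy shifts the Wald ratio by gamma / alpha.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def wald_with_violations(n=100_000, beta=0.5, alpha=0.4,
                         gamma=0.0, strat=0.0, seed=2):
    """Wald ratio cov(G, Y) / cov(G, X) under a hypothetical linear model.

    gamma: direct (pleiotropic) effect of G on Y, bypassing X.
    strat: shift in Y for members of subpopulation 1, which also has a
           higher allele frequency (0.5 vs 0.1) than subpopulation 0.
    With gamma = strat = 0 the IV assumptions hold and the estimate
    should be close to beta.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        sub = rng.random() < 0.5             # unobserved subpopulation label
        p = 0.5 if sub else 0.1              # allele frequency differs by subgroup
        gi = (rng.random() < p) + (rng.random() < p)
        ui = rng.gauss(0, 1)                 # unobserved confounder
        xi = alpha * gi + ui + rng.gauss(0, 1)
        yi = beta * xi + gamma * gi + strat * sub + ui + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return cov(g, y) / cov(g, x)


if __name__ == "__main__":
    print("valid instrument:", round(wald_with_violations(), 2))
    print("pleiotropy:      ", round(wald_with_violations(gamma=0.2), 2))
    print("stratification:  ", round(wald_with_violations(strat=1.0), 2))
```

With these settings the first estimate should sit near beta = 0.5, the second near 0.5 + 0.2 / 0.4 = 1.0, and the third near 1.5, because the covariance between genotype and subpopulation membership (0.2) happens to equal cov(G, X) here.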


  1. Chen L, Smith GD, Harbord RM, Lewis SJ (2008). Alcohol intake and blood pressure: A systematic review implementing a Mendelian randomization approach. PLoS Med 5: e52.
  2. Hernán MA, Robins JM (2006). Instruments for causal inference: An epidemiologist's dream? Epidemiology 17: 360–372.
  3. Smith GD, Ebrahim S, Lewis S, Hansell AL, Palmer LJ, Burton PR (2005). Genetic epidemiology and public health: Hope, hype, and future prospects. Lancet 366: 1484–1498.
  4. Smith GD, Ebrahim S (2003). 'Mendelian randomization': Can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology 32(1): 1–22.
  5. Katan MB (1986). Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet i: 507–508. (See also the related IJE paper.)
  6. Cambien F (2003). On Mendelian Randomisation. GeneCanvas.
  7. Didelez V, Sheehan NA (2007). Mendelian randomisation as an instrumental variable approach to causal inference. Statistical Methods in Medical Research 16: 309–330.
  8. Lawlor DA, Harbord RM, Sterne JAC, Timpson N, Smith GD (2008). Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology. Statistics in Medicine 27: 1133–1163.


(a) In the context of genetic association studies, Mendel's laws ensure that comparing groups of individuals defined by genotype is equivalent to a randomized comparison: genetic or non-genetic traits are expected to be distributed randomly across genotypes, except those that are affected by the polymorphism under study. Hence the (in principle) bias-free comparison of phenotypes across genotypes, and the idea that such results might bring insight into causal pathways.
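
This footnote's claim can be illustrated with a toy simulation, a minimal sketch assuming both parents are heterozygous (A/a) and using two hypothetical traits: one shifted by each copy of allele a, and one (say, a confounder) drawn independently of transmission. The balance of the second trait is built in by construction, which is exactly the assumption Mendel's laws are invoked to justify; the point is to show what a randomized comparison across genotypes looks like.

```python
import random
from collections import defaultdict


def transmit(rng, parent):
    """Mendel's first law: pass on one of the parent's two alleles, each with prob. 1/2."""
    return rng.choice(parent)


def means_by_genotype(n=60_000, effect=0.5, seed=3):
    """Genotype frequency and mean of two hypothetical traits by number of 'a' alleles.

    'affected' increases by `effect` per copy of allele a;
    'unaffected' (e.g. a confounder) does not depend on genotype.
    """
    rng = random.Random(seed)
    mother, father = ("A", "a"), ("A", "a")   # both parents heterozygous
    n_by = defaultdict(int)
    sums = defaultdict(lambda: [0.0, 0.0])    # [affected, unaffected] totals
    for _ in range(n):
        k = (transmit(rng, mother) + transmit(rng, father)).count("a")
        unaffected = rng.gauss(0, 1)
        affected = effect * k + rng.gauss(0, 1)
        n_by[k] += 1
        sums[k][0] += affected
        sums[k][1] += unaffected
    return {k: (n_by[k] / n, sums[k][0] / n_by[k], sums[k][1] / n_by[k])
            for k in sorted(n_by)}


if __name__ == "__main__":
    for k, (freq, aff, unaff) in means_by_genotype().items():
        print(f"{k} copies of a: freq={freq:.3f}  affected={aff:.2f}  unaffected={unaff:.2f}")
```

Genotype frequencies should come out close to the 1:2:1 Mendelian ratio, the affected trait's mean should rise by about 0.5 per copy of a, and the unaffected trait's mean should be near 0 in every genotype group.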

