Review Article

Using genetic data to strengthen causal inference in observational research

Nature Reviews Genetics volume 19, pages 566–580 (2018)


Causal inference is essential across the biomedical, behavioural and social sciences. By progressing from confounded statistical associations to evidence of causal relationships, causal inference can reveal complex pathways underlying traits and diseases and help to prioritize targets for intervention. Recent progress in genetic epidemiology, including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining, has fostered the intense development of methods exploiting genetic data and relatedness to strengthen causal inference in observational research. In this Review, we describe how such genetically informed methods differ in their rationale, applicability and inherent limitations and outline how they should be integrated in the future to offer a rich causal inference toolbox.


Additional information

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




The authors thank S. Gage and J. M. Vink for cannabis initiation summary statistics and syntax and J. Rees for multivariable Mendelian randomization (MR) syntax. J.-B.P. is a fellow of MQ: Transforming Mental Health (MQ16IP16) and affiliated with the Centre for Research in Epidemiology and Population Health (CESP), French National Institute for Health and Medical Research (INSERM), Université de Paris-Sud, Université de Versailles-Saint Quentin, and Université Paris-Saclay, Paris, France. P.F.O. receives funding from the UK Medical Research Council (MR/N015746/1) and the Wellcome Trust (109863/Z/15/Z). This report represents independent research (partly) funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley National Health Service (NHS) Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health.

Reviewer information

Nature Reviews Genetics thanks G. Hemani and M. C. Neale for their contribution to the peer review of this work.

Author information


  1. Department of Clinical, Educational and Health Psychology, University College London, London, UK

    • Jean-Baptiste Pingault & Tabea Schoeler
  2. Social, Genetic, and Developmental Psychiatry, King’s College London, De Crespigny Park, London, UK

    • Jean-Baptiste Pingault, Paul F. O’Reilly & Frühling Rijsdijk
  3. Centre for Longitudinal Studies, Department of Social Science, UCL Institute of Education, University College London, London, UK

    • George B. Ploubidis
  4. Department of Health Sciences, University of Leicester, Leicester, UK

    • Frank Dudbridge




J.-B.P. and T.S. researched data for the article. J.-B.P. and F.D. wrote the manuscript. All authors contributed substantially to discussion of content and reviewed and/or edited the manuscript before submission.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Jean-Baptiste Pingault.

Glossary


Causal risk and protective factors

Factors whose different values predict different risks of the outcome (either an elevated risk or a protective effect), with all other factors being held constant.


Measurable individual characteristics.


Confounding

A phenomenon whereby a variable (the confounder) has a causal effect on both the risk factor and the outcome, generating a spurious association between the two.
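The following simulation makes the definition concrete. A confounder C causes both an exposure X and an outcome Y, and X has no causal effect on Y. The effect sizes (0.8 and 0.6), the sample size and the seed are arbitrary illustrative assumptions. Even so, regressing Y on X gives a clearly non-zero slope.

```python
import random


def simulate_confounded(n=20000, seed=1):
    """Simulate exposure X and outcome Y that share a confounder C.

    X has no causal effect on Y; both are caused by C.
    The coefficients 0.8 and 0.6 are illustrative assumptions.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)            # confounder
        x = 0.8 * c + rng.gauss(0, 1)  # exposure caused by C only
        y = 0.6 * c + rng.gauss(0, 1)  # outcome caused by C only (no X -> Y effect)
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols_slope(xs, ys):
    """Least-squares slope from regressing ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


xs, ys = simulate_confounded()
print(f"slope of Y on X: {ols_slope(xs, ys):.2f}")  # spurious: the true effect is 0
```

Under these assumptions the expected slope is 0.8 × 0.6 / (0.8² + 1) ≈ 0.29. All of it is produced by C.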

Genetic confounding

Confounding created by genetic factors influencing both the risk factor and the outcome.

Causal inference methods

Methods that aim to clarify the causal status of a risk factor, either by providing a direct estimate of the causal effect or by ruling out possible sources of confounding (for example, removing the possibility of genetic confounding).

Genetically informed methods

Methods that use genetic information, such as known genetic relationships (for example, twins) or genetic variation data.

Instrumental variable

A variable that is used as a proxy for an exposure X to estimate the causal effect of X on an outcome Y. This variable must be robustly associated with X and independent of all confounders of the effect of X on Y, and its effect on Y must be entirely mediated by X.
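Below is a minimal simulation of the three conditions. It assumes a single biallelic genotype G as the instrument, an unmeasured confounder U and linear effects. All parameter values are illustrative assumptions. The ratio cov(G, Y) / cov(G, X) is a standard instrumental variable estimator for one instrument. In this sketch it recovers the causal effect that ordinary regression overstates.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate_iv(n=50000, maf=0.3, beta_gx=0.5, beta_xy=0.3, seed=7):
    """Simulate instrument G, exposure X and outcome Y with a confounder U."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = sum(rng.random() < maf for _ in range(2))  # effect-allele count 0/1/2
        u = rng.gauss(0, 1)                             # confounder, generated independently of G
        xi = beta_gx * gi + u + rng.gauss(0, 1)         # G is associated with X
        yi = beta_xy * xi + u + rng.gauss(0, 1)         # G affects Y only through X
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate_iv()
naive = cov(x, y) / cov(x, x)  # confounded regression slope
ratio = cov(g, y) / cov(g, x)  # instrumental variable (ratio) estimate
print(f"naive: {naive:.2f}, IV: {ratio:.2f}, true: 0.30")
```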

Mendelian randomization

A method that uses single nucleotide polymorphisms (SNPs) associated with an exposure as instruments to probe the causal nature of the relationship between this exposure and an outcome of interest.
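With several SNPs and only summary statistics available, per-SNP estimates are commonly combined by inverse-variance weighting (IVW). The function below is a minimal fixed-effect IVW sketch using first-order weights. It assumes that every SNP is a valid instrument and that the SNPs are mutually independent (not in linkage disequilibrium). The input numbers are hypothetical.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_x: SNP-exposure associations
    beta_y: SNP-outcome associations
    se_y:   standard errors of beta_y
    Returns (causal estimate, standard error).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("inputs must have equal length")
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1 / den)


# Hypothetical summary statistics for three independent SNPs
bx = [0.10, 0.05, 0.08]
by = [0.021, 0.009, 0.017]
se = [0.005, 0.004, 0.006]
est, se_est = ivw_estimate(bx, by, se)
print(f"IVW estimate: {est:.3f} (SE {se_est:.3f})")
```

With a single SNP, the IVW estimate reduces to the ratio beta_y / beta_x.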


Counterfactual

Also known as potential outcomes. The counterfactual is a treatment (or value of a risk factor) that an individual is not exposed to; the potential outcome is the outcome that would be obtained under this counterfactual treatment.


Exchangeability

Holds when the outcome expected in the non-treated group, had its members received the treatment, would be the same as the outcome observed in the treated group. Conditional exchangeability holds when exchangeability is satisfied within each stratum of a confounder, that is, after conditioning (adjusting) for the confounder.
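The sketch below illustrates potential outcomes and exchangeability under assumed parameter values. Each simulated individual has two potential outcomes, y0 (untreated) and y1 (treated). The true effect is -0.5 for everyone, but only one of the two outcomes is observed per person. When treatment is assigned at random, the groups are exchangeable and the difference in observed means recovers the effect. When treatment is given to people with a high value of a prognostic factor, exchangeability fails and the comparison is biased.

```python
import random


def simulate_people(n=20000, seed=3):
    """Return (risk, y0, y1) per person; risk is a prognostic factor."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        risk = rng.gauss(0, 1)
        y0 = risk + rng.gauss(0, 1)  # potential outcome without treatment
        y1 = y0 - 0.5                # potential outcome with treatment (true effect -0.5)
        people.append((risk, y0, y1))
    return people


def observed_difference(people, treated):
    """Difference in mean observed outcomes, treated minus untreated.

    Only y1 is seen for treated individuals and only y0 for the others.
    """
    y_t = [p[2] for p, t in zip(people, treated) if t]
    y_c = [p[1] for p, t in zip(people, treated) if not t]
    return sum(y_t) / len(y_t) - sum(y_c) / len(y_c)


people = simulate_people()
rng = random.Random(11)
randomized = [rng.random() < 0.5 for _ in people]  # exchangeable groups
by_risk = [p[0] > 0 for p in people]               # treatment tracks prognosis
print(f"randomized: {observed_difference(people, randomized):.2f}")
print(f"assigned by risk: {observed_difference(people, by_risk):.2f}")
```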

Genetic relatedness

Occurs when two individuals share a proportion of their genome identical by descent, as a result of inheritance from a recent common ancestor.


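d-separation

A graphical criterion for causal diagrams: two variables are d-separated by a set of variables when conditioning on that set blocks every path between them. Applied to an exposure X and an outcome Y, blocking all backdoor paths (while leaving the causal paths from X to Y open) allows the unconfounded effect of X on Y to be estimated.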


Relevance assumption

A core assumption of instrumental variable estimation whereby the instrument must be robustly associated with the exposure of interest.
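Instrument strength is often summarized with the first-stage F-statistic. A value above 10 is a widely used, if crude, rule of thumb for limiting weak-instrument bias. Below is a minimal sketch of the standard formula. It assumes that R², the proportion of exposure variance explained by k instruments in n individuals, is already known. The example inputs are hypothetical.

```python
def first_stage_f(r2, n, k):
    """F-statistic for a regression of the exposure on k instruments.

    r2: proportion of exposure variance explained by the instruments
    n:  sample size
    k:  number of instruments
    """
    if not 0 <= r2 < 1:
        raise ValueError("r2 must lie in [0, 1)")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def single_snp_f(beta, se):
    """Approximate F for one SNP from its exposure association and its standard error."""
    return (beta / se) ** 2


print(round(first_stage_f(0.01, 10000, 1), 1))  # one SNP explaining 1% of exposure variance
print(round(single_snp_f(0.05, 0.01), 1))
```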

Exclusion restriction

A core assumption of instrumental variable estimation whereby the effect of the instrument on the outcome must act entirely through its effect on the exposure (that is, not directly and not via confounders or other mediators).

Backdoor paths

Also known as unblocked paths. A path between an exposure X and an outcome Y through a confounder, which biases the estimation of the causal effect of X on Y.
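When the confounder on a backdoor path is measured, conditioning on it blocks that path. The sketch below assumes linear effects: a true effect of X on Y of 0.3, and a confounder C that affects both. It compares the unadjusted slope with the coefficient of X from a regression of Y on both X and C. That coefficient comes from the standard two-predictor least-squares formula.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate_backdoor(n=20000, seed=5):
    """Simulate the backdoor path X <- C -> Y alongside a causal X -> Y effect."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]       # confounder
    x = [0.8 * ci + rng.gauss(0, 1) for ci in c]  # C -> X
    y = [0.3 * xi + 0.6 * ci + rng.gauss(0, 1)    # X -> Y and C -> Y
         for xi, ci in zip(x, c)]
    return c, x, y


def adjusted_slope(x, y, c):
    """Coefficient of x in the least-squares regression of y on x and c."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)
    return (sxy * scc - scy * sxc) / (sxx * scc - sxc ** 2)


c, x, y = simulate_backdoor()
print(f"unadjusted: {cov(x, y) / cov(x, x):.2f}")
print(f"adjusted for C: {adjusted_slope(x, y, c):.2f} (true effect 0.30)")
```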

Structural equation modelling

Multivariate statistical technique combining factor analysis and regression analysis to estimate networks of relationships between latent and observed variables.

Sensitivity analysis

An analysis conducted to assess how robust an association of interest is to potential unobserved confounding or other sources of bias.


Heritability

The proportion of variance in a phenotype that can be attributed to genetic differences among individuals in a given population. Narrow-sense heritability captures additive genetic effects only; broad-sense heritability captures all genetic effects, including dominance and epistatic (gene–gene interaction) effects.
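In the classical twin design, a common back-of-the-envelope estimate is Falconer's formula. It compares monozygotic (MZ) and dizygotic (DZ) twin correlations. The formula assumes additive genetic effects only, equally similar environments for MZ and DZ pairs, and random mating. Model-fitting approaches such as structural equation models are generally preferred in practice. Below is a minimal sketch with hypothetical twin correlations.

```python
def falconer(r_mz, r_dz):
    """Falconer-style variance decomposition from twin correlations.

    Returns (h2, c2, e2): additive genetic, shared environmental and
    non-shared environmental proportions of phenotypic variance.
    """
    h2 = 2 * (r_mz - r_dz)  # MZ twins share all additive genetic variance, DZ twins half on average
    c2 = r_mz - h2          # remaining MZ similarity attributed to the shared environment
    e2 = 1 - r_mz           # MZ dissimilarity: non-shared environment (plus measurement error)
    return h2, c2, e2


h2, c2, e2 = falconer(0.7, 0.4)  # hypothetical twin correlations
print(f"h2={h2:.2f}, c2={c2:.2f}, e2={e2:.2f}")
```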

Environmental influences

Influences that contribute to making two individuals (for example, twins) similar to each other (shared environmental influences) or dissimilar (non-shared environmental influences).

Single nucleotide polymorphisms

(SNPs). DNA sequence variation arising from differences in a single nucleotide: adenine (A), thymine (T), cytosine (C) or guanine (G).


Polygenic

Influenced by variants in many genes.


Pleiotropy

Occurs when a genetic locus (for example, a single nucleotide polymorphism (SNP)) affects more than one trait.

Genome-wide association studies

(GWAS). Studies in which hundreds of thousands to millions of genetic variants are tested for an association with a phenotype.


Heterogeneity

Occurs when several estimates of the same effect do not converge towards the same value, whether in meta-analyses or in Mendelian randomization analyses using many genetic instruments.
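In meta-analyses, and in Mendelian randomization with many instruments, heterogeneity is commonly quantified with Cochran's Q and the derived I² statistic. Q is compared against a chi-squared distribution with one fewer degrees of freedom than the number of estimates. Below is a minimal sketch, assuming inverse-variance weights. The per-SNP estimates are hypothetical.

```python
def cochran_q(estimates, ses):
    """Cochran's Q and I^2 for a set of estimates of the same effect.

    estimates: e.g. per-SNP ratio estimates of a causal effect
    ses:       their standard errors
    Returns (pooled estimate, Q, degrees of freedom, I^2).
    """
    weights = [1 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond chance
    return pooled, q, df, i2


pooled, q, df, i2 = cochran_q([0.10, 0.35, 0.22, -0.05], [0.05, 0.06, 0.04, 0.07])
print(f"pooled={pooled:.3f}, Q={q:.2f} on {df} df, I2={i2:.0%}")
```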

Collider bias

Bias that arises when a variable (the collider) is caused separately by both the exposure and the outcome of interest; controlling for the collider induces an association between exposure and outcome even if none exists, or distorts an existing one.
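Below is a minimal simulation with arbitrary assumed effect sizes. X and Y are generated independently, and both cause C. Across the whole sample, X and Y are uncorrelated. Restricting the analysis to individuals with high C, as happens in selected samples, is a form of conditioning on the collider and induces a negative correlation.

```python
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def simulate_collider(n=20000, seed=2):
    """X and Y are independent; C is caused by both (X -> C <- Y)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]                # exposure
    ys = [rng.gauss(0, 1) for _ in range(n)]                # outcome, independent of X
    cs = [x + y + rng.gauss(0, 1) for x, y in zip(xs, ys)]  # collider
    return xs, ys, cs


xs, ys, cs = simulate_collider()
selected = [(x, y) for x, y, c in zip(xs, ys, cs) if c > 1]  # condition on C by selection
sx, sy = zip(*selected)
print(f"full sample r = {corr(xs, ys):.2f}")
print(f"among C > 1, r = {corr(sx, sy):.2f}")
```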

Allelic scores

Computed in the same way as a polygenic score, but summarizing genetic information from a few to a few hundred single nucleotide polymorphisms (SNPs), whereas polygenic scores rely on thousands of SNPs, up to all SNPs in the genome.

Polygenic scores

Individual-level scores that summarize genetic risk (or protection) for a given phenotype. For each single nucleotide polymorphism (SNP), a score is computed by counting effect alleles in an individual and weighting them by the effect size of this SNP. A polygenic score is computed by summing scores from a large number, potentially all, of the SNPs in the genome.
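The computation described above is a weighted sum of effect-allele counts. Below is a minimal sketch with hypothetical dosages and weights. In practice the weights come from genome-wide association study summary statistics. Dedicated software such as PRSice also handles SNP selection steps, such as clumping and P-value thresholding, which this sketch omits.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts for one individual.

    dosages: effect-allele counts (0, 1 or 2), one per SNP
    weights: per-SNP effect sizes, in the same SNP order
    """
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(d * w for d, w in zip(dosages, weights))


weights = [0.12, -0.05, 0.08, 0.02]  # hypothetical GWAS effect sizes
individuals = {
    "id1": [0, 1, 2, 1],
    "id2": [2, 0, 1, 2],
}
for ind, dosages in individuals.items():
    print(ind, round(polygenic_score(dosages, weights), 3))
```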

Dynastic effects

Occur when genetic variants in parents are transmitted to the offspring but also contribute to parental phenotype and in turn to the environment experienced by the child. This induces a correlation between offspring genotypes and the offspring’s environment.

Summary association statistics

Effect sizes and standard errors derived from a genome-wide association study for each single nucleotide polymorphism (SNP). They may include other summary statistics (for example, allele frequency or imputation accuracy).

Genetic correlations

The correlation between causal effect sizes for two phenotypes across single nucleotide polymorphisms (SNPs). Typically reported as the correlation across the whole genome and will differ when restricted to pleiotropic SNPs only.

Phenome-wide association studies

(PheWAS). These studies estimate the association of one or a few genetic variants of particular interest with many phenotypes, that is, with a selection from all possible phenotypes (the phenome).

Colocalization methods

When a genetic region contains variants associated with more than one phenotype, colocalization methods aim to determine whether this is due to shared or distinct causal variants.

Linkage disequilibrium

(LD). Nonrandom associations between alleles at different loci.
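Linkage disequilibrium between two biallelic loci is commonly summarized by the coefficient D and its standardized form r². D is the haplotype frequency minus the product of the two allele frequencies. Below is a minimal sketch, assuming phased haplotypes with alleles coded 0/1. The toy inputs are hypothetical.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2), alleles coded 0/1
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d ** 2 / denom


in_ld = [(1, 1), (0, 0)] * 50                        # alleles always inherited together
independent = [(1, 1), (1, 0), (0, 1), (0, 0)] * 25  # all combinations equally common
print(ld_stats(in_ld))
print(ld_stats(independent))
```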
