
Using genetic data to strengthen causal inference in observational research


Causal inference is essential across the biomedical, behavioural and social sciences. By progressing from confounded statistical associations to evidence of causal relationships, causal inference can reveal complex pathways underlying traits and diseases and help to prioritize targets for intervention. Recent progress in genetic epidemiology, including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining, has driven the rapid development of methods that exploit genetic data and relatedness to strengthen causal inference in observational research. In this Review, we describe how such genetically informed methods differ in their rationale, applicability and inherent limitations, and outline how they should be integrated in the future to offer a rich causal inference toolbox.
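As a hedged illustration of what "progressing from confounded statistical associations to evidence of causal relationships" can mean in practice, the toy simulation below generates an exposure and an outcome that share an unmeasured confounder, plus one genetic variant that affects the outcome only through the exposure. All parameter values (effect sizes, allele frequency, sample size) and function names are arbitrary assumptions for illustration. The naive regression slope is biased away from the true effect of 0.3, whereas the Wald ratio, the simplest Mendelian randomization estimator, approximately recovers it. The sketch assumes the variant is a valid instrument (no pleiotropy, not itself confounded), which real studies cannot take for granted.

```python
import random
from statistics import fmean


def slope(a, b):
    """OLS slope of b regressed on a: cov(a, b) / var(a)."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var = sum((ai - ma) ** 2 for ai in a)
    return cov / var


def simulate(n=50_000, true_effect=0.3, maf=0.3, seed=1):
    """Hypothetical data: genotype g, exposure x, outcome y, hidden confounder u."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Allele count (0, 1 or 2) for a biallelic variant; independent of u
        gi = (rng.random() < maf) + (rng.random() < maf)
        u = rng.gauss(0, 1)  # unmeasured confounder of exposure and outcome
        xi = 0.5 * gi + u + rng.gauss(0, 1)  # exposure depends on g and u
        yi = true_effect * xi + u + rng.gauss(0, 1)  # g reaches y only via x
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()
naive = slope(x, y)               # confounded observational association
wald = slope(g, y) / slope(g, x)  # Wald ratio: instrument-based causal estimate
print(f"naive={naive:.2f}, wald={wald:.2f}")
```

Under these assumed values the naive slope is expected to land near 0.78, because the confounder inflates the exposure-outcome covariance, while the Wald ratio should fall close to 0.3.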


Fig. 1: Including multiple instruments in Mendelian randomization.
Fig. 2: Causal mapping.
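Fig. 1 concerns including multiple genetic instruments in Mendelian randomization. The figure itself is not reproduced here, and its analysis is described in the Supplementary information; the sketch below is not that analysis. It shows, under stated assumptions, one widely used way to pool several instruments from summary statistics: the fixed-effect inverse-variance weighted (IVW) estimator, which averages the per-variant ratio estimates beta_y / beta_x, weighting each by beta_x² / se_y². The function name and example values are hypothetical, and the estimator assumes every variant is a valid instrument, that variants are uncorrelated (not in linkage disequilibrium) and that uncertainty in beta_x is negligible.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_x: variant-exposure associations
    beta_y: variant-outcome associations
    se_y:   standard errors of beta_y
    Returns (estimate, standard error).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("inputs must have equal length")
    # Weighted regression of beta_y on beta_x through the origin,
    # with weights 1 / se_y**2
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1 / den)


# Hypothetical summary statistics for three independent variants
est, se = ivw_estimate([0.1, 0.2, 0.15], [0.03, 0.06, 0.045], [0.01, 0.01, 0.01])
print(f"IVW estimate = {est:.3f} (SE {se:.4f})")
```

Because every hypothetical variant here has the same ratio (0.3), the pooled estimate is 0.3. When some instruments may be invalid, robust alternatives such as MR-Egger or the weighted median estimator relax the assumption that all variants are valid, at some cost in precision.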


  1. 1.

    Glass, T. A., Goodman, S. N., Hernán, M. A. & Samet, J. M. Causal inference in public health. Annu. Rev. Public Health 34, 61–75 (2013).

    PubMed  PubMed Central  Article  Google Scholar 

  2. 2.

    Rimm, E. B. et al. Vitamin E consumption and the risk of coronary heart disease in men. N. Engl. J. Med. 328, 1450–1456 (1993).

    PubMed  CAS  Article  Google Scholar 

  3. 3.

    Stampfer, M. J. et al. Vitamin E consumption and the risk of coronary disease in women. N. Engl. J. Med. 328, 1444–1449 (1993).

    PubMed  CAS  Article  Google Scholar 

  4. 4.

    Millen, A. E., Dodd, K. W. & Subar, A. F. Use of vitamin, mineral, nonvitamin, and nonmineral supplements in the United States: the 1987, 1992, and 2000 National Health Interview Survey results. J. Am. Diet Assoc. 104, 942–950 (2004).

    PubMed  Article  Google Scholar 

  5. 5.

    Eidelman, R. S., Hollar, D., Hebert, P. R., Lamas, G. A. & Hennekens, C. H. Randomized trials of vitamin E in the treatment and prevention of cardiovascular disease. Arch. Intern. Med. 164, 1552–1556 (2004).

    PubMed  CAS  Article  Google Scholar 

  6. 6.

    Imai, K., King, G. & Stuart, E. A. Misunderstandings between experimentalists and observationalists about causal inference. J. Royal Stat. Soc. A Stat. Methodol. 171, 481–502 (2008).

    Article  Google Scholar 

  7. 7.

    Jaffee, S. R. & Price, T. S. The implications of genotype-environment correlation for establishing causal processes in psychopathology. Dev. Psychopathol. 24, 1253–1264 (2012).

    PubMed  Article  Google Scholar 

  8. 8.

    Deaton, A. & Cartwright, N. Understanding and misunderstanding randomized controlled trials. Soc. Sci. Med. (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  9. 9.

    DiMasi, J. A., Grabowski, H. G. & Hansen, R. W. Innovation in the pharmaceutical industry: new estimates of R&D costs. J. Health Econ. 47, 20–33 (2016).

    PubMed  Article  Google Scholar 

  10. 10.

    McGue, M., Osler, M. & Christensen, K. Causal inference and observational research: the utility of twins. Perspect. Psychol. Sci. 5, 546–556 (2010).This study is an introduction to the twin model from a causal inference perspective. It includes a discussion of concepts, estimations and limitations.

    PubMed  PubMed Central  Article  Google Scholar 

  11. 11.

    Davey Smith, G. & Ebrahim, S. What can Mendelian randomisation tell us about modifiable behavioural and environmental exposures? BMJ 330, 1076–1079 (2005).

    PubMed  PubMed Central  Article  Google Scholar 

  12. 12.

    Davey Smith, G. & Hemani, G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum. Mol. Genet. 23, R89–98 (2014).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  13. 13.

    Burgess, S., Timpson, N. J., Ebrahim, S. & Davey Smith, G. Mendelian randomization: where are we now and where are we going? Int. J. Epidemiol. 44, 379–388 (2015).

    PubMed  Article  Google Scholar 

  14. 14.

    Nikpay, M. et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130 (2015).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  15. 15.

    Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

    PubMed  PubMed Central  Article  Google Scholar 

  16. 16.

    Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  17. 17.

    Hemani, G. et al. MR-Base: a platform for systematic causal inference across the phenome using billions of genetic associations. Preprint at bioRxiv 78972 (2016).

  18. 18.

    Stuart, E. A. Matching methods for causal inference: a review and a look forward. Stat. Sci. 25, 1–21 (2010).

    PubMed  PubMed Central  Article  Google Scholar 

  19. 19.

    Angrist, J. D., Imbens, G. W. & Rubin, D. B. Identification of causal effects using instrumental variables. J. Am. Stat. Assoc. 91, 444–455 (1996).

    Article  Google Scholar 

  20. 20.

    Tenesa, A. & Haley, C. S. The heritability of human disease: estimation, uses and abuses. Nat. Rev. Genet. 14, 139–149 (2013).

    PubMed  CAS  Article  Google Scholar 

  21. 21.

    Speed, D., Cai, N., Johnson, M. R., Nejentsev, S. & Balding, D. J. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  22. 22.

    Hernán, M. A. A definition of causal effect for epidemiological research. J. Epidemiol. Commun. Health 58, 265–271 (2004). This study is a pedagogical introduction to the counterfactual or potential outcomes framework for causal inference. It includes mathematical notations and a discussion of key concepts, such as association, causation and exchangeability.

    Article  Google Scholar 

  23. 23.

    Imbens, G. W. & Rubin, D. B. Causal Inference for Statistics, Social, and Biomedical Sciences. (Cambridge Univ. Press, Cambridge, 2015).

    Book  Google Scholar 

  24. 24.

    Pearl, J. Causality. (Cambridge Univ. Press, Cambridge, 2009).

    Book  Google Scholar 

  25. 25.

    Rice, F. et al. Disentangling prenatal and inherited influences in humans with an experimental design. Proc. Natl Acad. Sci. USA 106, 2464–2467 (2009).This is an example of the application of the IVF design to examine the effect of smoking during pregnancy on birthweight.

    PubMed  CAS  Article  Google Scholar 

  26. 26.

    Mezuk, B., Myers, J. M. & Kendler, K. S. Integrating social science and behavioral genetics: testing the origin of socioeconomic disparities in depression using a genetically informed design. Am. J. Publ. Heal. 103 (Suppl.), 145–151 (2013).

    Article  Google Scholar 

  27. 27.

    Kendler, K. S. & Gardner, C. O. Dependent stressful life events and prior depressive episodes in the prediction of major depression: the problem of causal inference in psychiatric epidemiology. Arch. Gen. Psychiatry 67, 1120–1127 (2010).

    PubMed  PubMed Central  Article  Google Scholar 

  28. 28.

    Bruder, C. E. G. et al. Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. Am. J. Hum. Genet. 82, 763–771 (2008).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  29. 29.

    Carlin, J. B., Gurrin, L. C., Sterne, J. A., Morley, R. & Dwyer, T. Regression models for twin studies: a critical review. Int. J. Epidemiol. 34, 1089–1099 (2005).

    PubMed  Article  Google Scholar 

  30. 30.

    Vitaro, F., Brendgen, M. & Arseneault, L. The discordant MZ-twin method: one step closer to the holy grail of causality. Int. J. Behav. Dev. 33, 376–382 (2009).

    Article  Google Scholar 

  31. 31.

    Fletcher, J. M. & Lehrer, S. F. Genetic lotteries within families. J. Heal. Econ. 30, 647–659 (2011). This paper provides a model combining family fixed effects and genetic instruments, with a discussion of important concepts, such as dynastic effects.

    Article  Google Scholar 

  32. 32.

    Kohler, H.-P., Behrman, J. R. & Schnittker, J. Social science methods for twins data: integrating causality, endowments, and heritability. Biodemogr. Soc. Biol. 57, 88–141 (2011).

    Article  Google Scholar 

  33. 33.

    Hjelmborg, J. et al. Lung cancer, genetic predisposition and smoking: the Nordic Twin Study of Cancer. Thorax 72, 1021–1027 (2017).

    PubMed  Article  Google Scholar 

  34. 34.

    Bröckerman, P., Hyytinen, A. & Kaprio, J. Smoking and long-term labour market outcomes. Tob. Control 24, 348–353 (2015).

    Article  Google Scholar 

  35. 35.

    Cohen-Cline, H., Turkheimer, E. & Duncan, G. E. Access to green space, physical activity and mental health: a twin study. J. Epidemiol. Commun. Health 69, 523–529 (2015).

    Article  Google Scholar 

  36. 36.

    Singham, T. et al. Concurrent and longitudinal contribution of exposure to bullying in childhood to mental health: the role of vulnerability and resilience. JAMA Psychiatry 74, 1112–1119 (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  37. 37.

    Taylor, M. J. et al. Developmental associations between traits of autism spectrum disorder and attention deficit hyperactivity disorder: a genetically informative, longitudinal twin study. Psychol. Med. 43, 1735–1746 (2013).

    PubMed  CAS  Article  Google Scholar 

  38. 38.

    Frisell, T., Öberg, S., Kuja-Halkola, R. & Sjölander, A. Sibling comparison designs: bias from non-shared confounders and measurement error. Epidemiology 23, 713–720 (2012).

    PubMed  Article  Google Scholar 

  39. 39.

    Heath, A. C. et al. Testing hypotheses about direction of causation using cross-sectional family data. Behav. Genet. 23, 29–50 (1993).

    PubMed  CAS  Article  Google Scholar 

  40. 40.

    Neale, M. C. & Cardon, L. R. Methodology for Genetic Studies of Twins and Families. (Kluwer Academic, 1992).

  41. 41.

    D’Onofrio, B. M. et al. Paternal age at childbearing and offspring psychiatric and academic morbidity. JAMA Psychiatry 71, 432–438 (2014).

    PubMed  PubMed Central  Article  Google Scholar 

  42. 42.

    Tully, E. C., Iacono, W. G. & McGue, M. An adoption study of parental depression as an environmental liability for adolescent depression and childhood disruptive disorders. Am. J. Psychiatry 165, 1148 (2008).

    PubMed  PubMed Central  Article  Google Scholar 

  43. 43.

    Duffy, D. L. & Martin, N. G. Inferring the direction of causation in cross-sectional twin data: theoretical and empirical considerations. Genet. Epidemiol. 11, 483–502 (1994).

    PubMed  CAS  Article  Google Scholar 

  44. 44.

    Wood, A. C., Rijsdijk, F., Asherson, P. & Kuntsi, J. Inferring causation from cross-sectional data: examination of the causal relationship between hyperactivity-impulsivity and novelty seeking. Front. Genet. 2, 6 (2011).

    PubMed  PubMed Central  Article  Google Scholar 

  45. 45.

    Toulopoulou, T. et al. Reciprocal causation models of cognitive versus volumetric cerebral intermediate phenotypes for schizophrenia in a pan-European twin cohort. Mol. Psychiatry 20, 1386 (2015).

    PubMed  CAS  Article  Google Scholar 

  46. 46.

    Katan, M. B. Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet 1, 507–508 (1986).

    PubMed  CAS  Article  Google Scholar 

  47. 47.

    Davey Smith, G. Mendelian randomization for strengthening causal inference in observational studies: application to gene x environment interactions. Perspect. Psychol. Sci. 5, 527–545 (2010).

    Article  Google Scholar 

  48. 48.

    Brion, M.-J. A., Benyamin, B., Visscher, P. M. & Smith, G. D. Beyond the single SNP: emerging developments in Mendelian randomization in the ‘omics’ era. Curr. Epidemiol. Rep. 1, 228–236 (2014).

    Article  Google Scholar 

  49. 49.

    Nitsch, D. et al. Limits to causal inference based on Mendelian randomization: a comparison with randomized controlled trials. Am. J. Epidemiol. 163, 397–403 (2006).

    PubMed  Article  Google Scholar 

  50. 50.

    Davey Smith, G. et al. Genetic epidemiology and public health: hope, hype, and future prospects. Lancet 366, 1484–1498 (2005).

    PubMed  Article  Google Scholar 

  51. 51.

    Davey Smith, G. et al. Association of C-reactive protein with blood pressure and hypertension: life course confounding and Mendelian randomization tests of causality. Arter. Thromb. Vasc. Biol. 25, 1051–1056 (2005).

    Article  CAS  Google Scholar 

  52. 52.

    Hartwig, F. P., Borges, M. C., Horta, B. L., Bowden, J. & Davey Smith, G. Inflammatory biomarkers and risk of schizophrenia: a 2-sample Mendelian randomization study. JAMA Psychiatry 74, 1226 (2017).

    PubMed  Article  Google Scholar 

  53. 53.

    Wensley, F. et al. Association between C reactive protein and coronary heart disease: mendelian randomisation analysis based on individual participant data. BMJ 342, d548 (2011).

    PubMed  Article  Google Scholar 

  54. 54.

    Bolton, C. E. et al. The CRP genotype, serum levels and lung function in men: the Caerphilly Prospective Study. Clin. Sci. 120, 347–355 (2011).

    PubMed  Article  Google Scholar 

  55. 55.

    Pingault, J.-B., Cecil, C.a. M., Murray, J., Munafo, M. & Viding, E. Causal inference in psychopathology: a systematic review of Mendelian randomisation studies aiming to identify environmental risk factors for psychopathology. Psychopathol. Rev. 4, 4–25 (2017).

    Google Scholar 

  56. 56.

    Manousaki, D., Mokry, L. E., Ross, S., Goltzman, D. & Richards, J. B. Mendelian randomization studies do not support a role for vitamin D in coronary artery disease. Circ. Cardiovasc. Genet. 9, 349–356 (2016).

    PubMed  CAS  Article  Google Scholar 

  57. 57.

    Mokry, L. E. et al. Vitamin D and risk of multiple sclerosis: a Mendelian randomization study. PLoS Med. 12, e1001866 (2015).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  58. 58.

    Sheehan, N. A. & Didelez, V. Commentary: Can ‘many weak’ instruments ever be ‘strong’? Int. J. Epidemiol. 40, 752–754 (2011).

    PubMed  Article  Google Scholar 

  59. 59.

    Visscher, P. M. & Yang, J. A plethora of pleiotropy across complex traits. Nat. Genet. 48, 707 (2016).

    PubMed  CAS  Article  Google Scholar 

  60. 60.

    Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).This study introduces the use of a meta-analytical method known as Egger regression to MR analysis. Under certain assumptions, this approach enables causal estimation even when all instruments are invalid.

    PubMed  PubMed Central  Article  Google Scholar 

  61. 61.

    Bowden, J., Davey Smith, G., Haycock, P. C. & Burgess, S. Consistent estimation in mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40, 304–314 (2016).

    PubMed  PubMed Central  Article  Google Scholar 

  62. 62.

    Rees, J. M. B., Wood, A. M. & Burgess, S. Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy. Stat. Med. 36, 4705–4718 (2017).This study provides the analytical framework to combine multivariable-MR and MR-Egger methods, which yields causal estimates robust to invalid genetic instruments.

    PubMed  PubMed Central  Article  Google Scholar 

  63. 63.

    Brion, M.-J. A., Shakhbazov, K. & Visscher, P. M. Calculating statistical power in Mendelian randomization studies. Int. J. Epidemiol. 42, 1497–1501 (2013).

    PubMed  Article  Google Scholar 

  64. 64.

    Burgess, S. & Thompson, S. G. Bias in causal estimates from Mendelian randomization studies with weak instruments. Stat. Med. 30, 1312–1323 (2011).

    PubMed  Article  Google Scholar 

  65. 65.

    Burgess, S. & Thompson, S. G. Improving bias and coverage in instrumental variable analysis with weak instruments for continuous and binary outcomes. Stat. Med. 31, 1582–1600 (2012).

    PubMed  Article  Google Scholar 

  66. 66.

    Gage, S. H. et al. Assessing causality in associations between cannabis use and schizophrenia risk: a two-sample Mendelian randomization study. Psychol. Med. 47, 971–980 (2017).

    PubMed  CAS  Article  Google Scholar 

  67. 67.

    Stringer, S. et al. Genome-wide association study of lifetime cannabis use based on a large meta-analytic sample of 32 330 subjects from the International Cannabis Consortium. Transl Psychiatry 6, e769 (2016).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  68. 68.

    Burgess, S. & Thompson, S. G. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am. J. Epidemiol. 181, 251–260 (2015).

    PubMed  PubMed Central  Article  Google Scholar 

  69. 69.

    Burgess, S., Freitag, D. F., Khan, H., Gorman, D. N. & Thompson, S. G. Using multivariable Mendelian randomization to disentangle the causal effects of lipid fractions. PLoS ONE 9, e108891 (2014).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  70. 70.

    Liu, D. J. et al. Exome-wide association study of plasma lipids in >300,000 individuals. Nat. Genet. 49, 1758 (2017).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  71. 71.

    Tyrrell, J. et al. Genetic evidence for causal relationships between maternal obesity-related traits and birth weight. JAMA 315, 1129–1140 (2016).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  72. 72.

    Richmond, R. C. et al. Using genetic variation to explore the causal effect of maternal pregnancy adiposity on future offspring adiposity:a Mendelian randomisation study. PLoS Med. 14, e1002221 (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  73. 73.

    Zhang, G. et al. Assessing the causal relationship of maternal height on birth size and gestational age at birth: a Mendelian randomization analysis. PLoS Med. 12, e1001865 (2015).This study introduces intergenerational MR by computing allelic scores in the mother containing variants either transmitted or non-transmitted to the offspring. The method enables the estimation of the effect of maternal risk factors on the offspring free from passive gene–environment correlation.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  74. 74.

    Evans, D. M. et al. Mining the human phenome using allelic scores that index biological intermediates. PLoS Genet. 9, e1003919 (2013).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  75. 75.

    Krapohl, E. et al. Widespread covariation of early environmental exposures and trait-associated polygenic variation. Proc. Natl Acad. Sci. USA 114, 11727–11732 (2017).

    PubMed  CAS  Article  Google Scholar 

  76. 76.

    Fletcher, J. M. The promise and pitfalls of combining genetic and economic research. Health Econ. 20, 889–892 (2011).

    PubMed  Article  Google Scholar 

  77. 77.

    Minica, C. C., Dolan, C. V., Boomsma, D. I., de Geus, E. & Neale, M. C. Extending causality tests with genetic instruments: an integration of Mendelian randomization and the classical twin design. Preprint at bioRxiv 134585 (2017).

  78. 78.

    Davey Smith, G. & Ebrahim, S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32, 1–22 (2003).

    Article  Google Scholar 

  79. 79.

    Davey Smith, G. Capitalizing on Mendelian randomization to assess the effects of treatments. J. R. Soc. Med. 100, 432–435 (2007).

    PubMed  Article  Google Scholar 

  80. 80.

    Pasaniuc, B. & Price, A. L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 18, 117–127 (2017).

    PubMed  CAS  Article  Google Scholar 

  81. 81.

    Gill, D. et al. Age at menarche and lung function: a Mendelian randomization study. Eur. J. Epidemiol. 32, 701–710 (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  82. 82.

    Bush, W. S., Oetjens, M. T. & Crawford, D. C. Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat. Rev. Genet. 17, 129 (2016).

    PubMed  CAS  Article  Google Scholar 

  83. 83.

    O’Reilly, P. F. et al. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS ONE 7, e34861 (2012).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  84. 84.

    Porter, H. F. & O’Reilly, P. F. Multivariate simulation framework reveals performance of multi-trait GWAS methods. Sci. Rep. 7, 38837 (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  85. 85.

    Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709 (2016).This study introduces a method to detect shared genetic influences on multiple traits. It includes a test of asymmetry, which helps to identify pairs of phenotypes that are causally related and which phenotype influences the other (that is, direction of causation).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  86. 86.

    Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).This study applies summary Mendelian randomization (SMR) methods to expression data and enables the distinction between shared aetiology between expression and phenotypes owing to shared causal variants or distinct variants in LD.

    PubMed  CAS  Article  Google Scholar 

  87. 87.

    Richardson, T. G. et al. Mendelian randomization analysis identifies CpG sites as putative mediators for genetic influences on cardiovascular disease risk. Am. J. Hum. Genet. 101, 590–602 (2017).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  88. 88.

    Wallace, C. Statistical testing of shared genetic control for potentially related traits. Genet. Epidemiol. 37, 802–813 (2013).

    PubMed  PubMed Central  Article  Google Scholar 

  89. 89.

    Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).This paper introduces a Bayesian colocalization method to identify shared causal variants between phenotypes.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  90. 90.

    Walter, S. et al. Revisiting mendelian randomization studies of the effect of body mass index on depression. Am. J. Med. Genet. B. Neuropsychiatr. Genet. 168B, 108–115 (2015).

    PubMed  Article  CAS  Google Scholar 

  91. 91.

    Hemani, G. et al. Automating Mendelian randomization through machine learning to construct a putative causal map of the human phenome. Preprint at bioRxiv 173682 (2017).

  92. 92.

    Davey Smith, G. et al. Incidence of type 2 diabetes in the randomized multiple risk factor intervention trial. Ann. Intern. Med. 142, 313–322 (2005).

    PubMed  Article  Google Scholar 

  93. 93.

    Åsvold, B. O. et al. Causal associations of tobacco smoking with cardiovascular risk factors: a Mendelian randomization analysis of the HUNT Study in Norway. Int. J. Epidemiol. 43, 1458–1470 (2014).

    PubMed  Article  Google Scholar 

  94. 94.

    Burgess, S., Daniel, R. M., Butterworth, A. S. & Thompson, S. G. Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways. Int. J. Epidemiol. 44, 484–495 (2015).

    PubMed  Article  Google Scholar 

  95. 95.

    Chen, W.-M. & Abecasis, G. R. Family-based association tests for genomewide association scans. Am. J. Hum. Genet. 81, 913–926 (2007).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  96. 96.

    Dudbridge, F. Likelihood-based association analysis for nuclear families and unrelated subjects with missing genotype data. Hum. Hered. 66, 87–98 (2008).

    PubMed  PubMed Central  Article  Google Scholar 

  97. 97.

    Dudbridge, F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 9, e1003348 (2013).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  98. 98.

    Moayyeri, A., Hammond, C. J., Valdes, A. M. & Spector, T. D. Cohort profile: TwinsUK and healthy ageing twin study. Int. J. Epidemiol. 42, 76–85 (2013).

    PubMed  Article  Google Scholar 

  99. 99.

    Haworth, C. M. A., Davis, O. S. P. & Plomin, R. Twins Early Development Study (TEDS): a genetically sensitive investigation of cognitive and behavioral development from childhood to young adulthood. Twin Res. Hum. Genet. 16, 117–125 (2013).

    PubMed  Article  Google Scholar 

  100. 100.

    Magnus, P. et al. Cohort profile update: the Norwegian Mother and Child Cohort Study (MoBa). Int. J. Epidemiol. 45, 382–388 (2016).

    PubMed  Article  Google Scholar 

  101. 101.

    Fraser, A. et al. Cohort profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int. J. Epidemiol. 42, 97–110 (2013).

    PubMed  Article  Google Scholar 

  102. 102.

    Walker, V. M., Davey Smith, G., Davies, N. M. & Martin, R. M. Mendelian randomization: a novel approach for the prediction of adverse drug events and drug repurposing opportunities. Int. J. Epidemiol. 46, 2078–2089 (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  103. 103.

    Scott, R. A. et al. A genomic approach to therapeutic target validation identifies a glucose-lowering GLP1R variant protective for coronary heart disease. Sci. Transl Med. 8, 341ra76 (2016).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  104. 104.

    Lehrer, S. F. & Ding, W. Are genetic markers of interest for economic research? IZA J. Labor Policy. 6, 2 (2017).

    Article  Google Scholar 

  105. 105.

    Glymour, M. M., Tchetgen Tchetgen, E. J. & Robins, J. M. Credible Mendelian randomization studies: approaches for evaluating the instrumental variable assumptions. Am. J. Epidemiol. 175, 332–339 (2012).

    PubMed  PubMed Central  Article  Google Scholar 

  106. 106.

    Lawlor, D. A., Tilling, K. & Davey Smith, G. Triangulation in aetiological epidemiology. Int. J. Epidemiol. 45, 1866–1886 (2016).

    PubMed  Article  Google Scholar 

  107. 107.

    Munafò, M. R. & Davey Smith, G. Robust research needs many lines of evidence. Nature 553, 399–401 (2018).

    PubMed  Article  CAS  Google Scholar 

  108. 108.

    Fisher, R. A. Alleged dangers of cigarette-smoking. BMJ 2, 297–298 (1957).

    Article  Google Scholar 

  109. 109.

    Knopik, V. S., Neiderhiser, J. M., DeFries, J. C. & Plomin, R. Behavioral Genetics. (Worth Publishers, New York, 2016).

    Google Scholar 

  110. 110.

    Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).

    PubMed  PubMed Central  CAS  Article  Google Scholar 

  111. 111.

    Kendler, K. S. & Baker, J. H. Genetic influences on measures of the environment: a systematic review. Psychol. Med. 37, 615–626 (2007).

    PubMed  Article  Google Scholar 

  112. 112.

    Krapohl, E. & Plomin, R. Genetic link between family socioeconomic status and children’s educational achievement estimated from genome-wide SNPs. Mol. Psychiatry 21, 437–443 (2016).

    PubMed  CAS  Article  Google Scholar 

  113. 113.

    Munafò, M. R. et al. Association between genetic variants on chromosome 15q25 locus and objective measures of tobacco exposure. J. Natl Cancer Inst. 104, 740–748 (2012).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  114. 114.

    Tobacco and Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat. Genet. 42, 441–447 (2010).

    Article  CAS  Google Scholar 

  115. 115.

    Morral, A. R., McCaffrey, D. F. & Paddock, S. M. Reassessing the marijuana gateway effect. Addiction 97, 1493–1504 (2002).

    PubMed  Article  Google Scholar 

  116. 116.

    Rutter, M. Proceeding from observed correlation to causal inference: the use of natural experiments. Perspect. Psychol. Sci. 2, 377–395 (2007).

    PubMed  Article  Google Scholar 

  117. 117.

    Greenland, S. Quantifying biases in causal models: classical confounding versus collider-stratification bias. Epidemiology 14, 300–306 (2003).

    PubMed  Google Scholar 

  118. 118.

    Sheehan, N. A., Didelez, V., Burton, P. R. & Tobin, M. D. Mendelian randomisation and causal inference in observational epidemiology. PLoS Med. 5, e177 (2008).

    PubMed  PubMed Central  Article  Google Scholar 

  119. 119.

    Didelez, V. & Sheehan, N. Mendelian randomization as an instrumental variable approach to causal inference. Stat. Methods Med. Res. 16, 309–330 (2007).

    PubMed  Article  Google Scholar 

  120. 120.

    Burgess, S. & Thompson, S. G. Mendelian Randomization: Methods for Using Genetic Variants in Causal Estimation. (CRC Press, Boca Raton, 2015).

    Book  Google Scholar 

  121. 121.

    Davey Smith, G. et al. Clustered environments and randomized genes: a fundamental distinction between conventional and genetic epidemiology. PLoS Med. 4, e352 (2007).

    Article  CAS  Google Scholar 

  122. 122.

    Pierce, B. L. & Burgess, S. Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. Am. J. Epidemiol. 178, 1177–1184 (2013).

  123. Hu, J. X., Thomas, C. E. & Brunak, S. Network biology concepts in complex disease comorbidities. Nat. Rev. Genet. 17, 615–629 (2016).

  124. Solovieff, N., Cotsapas, C., Lee, P. H., Purcell, S. M. & Smoller, J. W. Pleiotropy in complex traits: challenges and strategies. Nat. Rev. Genet. 14, 483–495 (2013).

  125. Paaby, A. B. & Rockman, M. V. The many faces of pleiotropy. Trends Genet. 29, 66–73 (2013).

  126. Kong, A. et al. The nature of nurture: effects of parental genotypes. Science 359, 424–428 (2018).

  127. Bates, T. C. et al. The nature of nurture: using a virtual-parent design to test parenting effects on children’s educational attainment in genotyped families. Twin Res. Hum. Genet. 21, 73–83 (2018).

  128. Euesden, J., Lewis, C. M. & O’Reilly, P. F. PRSice: polygenic risk score software. Bioinformatics 31, 1466–1468 (2015).

  129. Burgess, S., Butterworth, A. & Thompson, S. G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 37, 658–665 (2013).

  130. Hartwig, F. P., Davey Smith, G. & Bowden, J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int. J. Epidemiol. 46, 1985–1998 (2017).

  131. Burgess, S., Zuber, V., Gkatzionis, A., Rees, J. M. B. & Foley, C. Improving on a modal-based estimation method: model averaging for consistent and efficient estimation in Mendelian randomization when a plurality of candidate instruments are valid. Preprint at bioRxiv 175372 (2017).

  132. Bowden, J. et al. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat. Med. 36, 1783–1802 (2017).


Acknowledgements

The authors thank S. Gage and J. M. Vink for cannabis initiation summary statistics and syntax and J. Rees for multivariable Mendelian randomization (MR) syntax. J.-B.P. is a fellow of MQ: Transforming Mental Health (MQ16IP16) and affiliated with the Centre for Research in Epidemiology and Population Health (CESP), French National Institute for Health and Medical Research (INSERM), Université de Paris-Sud, Université de Versailles-Saint Quentin, and Université Paris-Saclay, Paris, France. P.F.O. receives funding from the UK Medical Research Council (MR/N015746/1) and the Wellcome Trust (109863/Z/15/Z). This report represents independent research (partly) funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley National Health Service (NHS) Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health.

Reviewer information

Nature Reviews Genetics thanks G. Hemani and M. C. Neale for their contribution to the peer review of this work.

Author information



Contributions

J.-B.P. and T.S. researched data for the article. J.-B.P. and F.D. wrote the manuscript. All authors contributed substantially to discussion of content and reviewed and/or edited the manuscript before submission.

Corresponding author

Correspondence to Jean-Baptiste Pingault.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Twin Registries:


LD Hub:


Supplementary information


A file of supplementary material providing further details on the data analysis for Figures 1 and 2.

Glossary

Causal risk and protective factors

Factors whose different values predict different risks of the outcome (either an elevated risk or a protective effect), with all other factors being held constant.

Phenotypes

Measurable individual characteristics.

Confounding

A phenomenon whereby a variable (the confounder) has a causal effect on both the risk factor and the outcome, generating a spurious association between the two.

Genetic confounding

Confounding created by genetic factors influencing both the risk factor and the outcome.

Causal inference methods

Methods that aim to clarify the causal status of a risk factor, either by providing a direct estimate of the causal effect or by ruling out possible sources of confounding (for example, removing the possibility of genetic confounding).

Genetically informed methods

Methods that use genetic information, such as known genetic relationships (for example, twins) or genetic variation data.

Instrumental variable

A variable used as a proxy for an exposure X to estimate the causal effect of X on an outcome Y. This variable must be robustly associated with X and independent of all confounders of the relationship between X and Y, and its effect on Y must be entirely mediated by X.

Mendelian randomization

A method that uses single nucleotide polymorphisms (SNPs) associated with an exposure as instruments to probe the causal nature of the relationship between this exposure and an outcome of interest.
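
The instrumental variable logic behind Mendelian randomization can be illustrated with a small simulation. The sketch below assumes a hypothetical linear model with made-up coefficients: a SNP effect of 0.3 on the exposure and a true causal effect of 0.5 of the exposure on the outcome. It contrasts a naive regression of Y on X, which is biased by the unobserved confounder U, with the Wald ratio. The Wald ratio is the simplest instrumental variable estimator: the SNP-outcome association divided by the SNP-exposure association. The `direct_effect` parameter is a hypothetical switch that adds a SNP-to-outcome path, which violates the exclusion restriction.

```python
import random

def slope(a, b):
    """Ordinary least-squares slope from regressing b on a."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var = sum((ai - ma) ** 2 for ai in a)
    return cov / var

def simulate(n=100_000, direct_effect=0.0, seed=1):
    """Hypothetical data: SNP G -> exposure X -> outcome Y, with U confounding X and Y.

    The true causal effect of X on Y is 0.5. A non-zero direct_effect adds a
    G -> Y path that bypasses X (horizontal pleiotropy).
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = rng.choice((0, 1, 2))                    # effect-allele count
        ui = rng.gauss(0, 1)                          # unobserved confounder
        xi = 0.3 * gi + ui + rng.gauss(0, 1)          # exposure
        yi = 0.5 * xi + direct_effect * gi + ui + rng.gauss(0, 1)  # outcome
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y

g, x, y = simulate()
naive = slope(x, y)                  # biased upwards by U
wald = slope(g, y) / slope(g, x)     # Wald ratio: SNP-outcome / SNP-exposure association
print(f"naive regression: {naive:.2f}, Wald ratio: {wald:.2f}")

# Violating the exclusion restriction shifts the ratio by direct_effect / 0.3
g2, x2, y2 = simulate(direct_effect=0.1)
wald_pleiotropic = slope(g2, y2) / slope(g2, x2)
```

With the default settings, the naive slope should come out close to 1, while the Wald ratio should land near the true value of 0.5. With `direct_effect=0.1`, the ratio drifts towards 0.5 + 0.1/0.3 ≈ 0.83, because the pleiotropic path is wrongly attributed to X.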

Counterfactuals

Also known as potential outcomes. The counterfactual is a treatment (or value of a risk factor) that an individual is not exposed to. The potential outcome is the outcome that would be obtained under this counterfactual treatment.

Exchangeability

Holds when the expected outcome in the untreated group would have been the same as the outcome observed in the treated group, had subjects in the untreated group received the treatment. Conditional exchangeability holds when exchangeability holds within each stratum of a confounder, that is, after conditioning (adjusting) for the confounder.
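
Exchangeability can be made concrete with potential outcomes. In this hypothetical simulation, each individual carries both potential outcomes, y0 and y1 = y0 + 1, so the true effect is exactly 1, but only one of them is observed. Random assignment makes the treated and untreated groups exchangeable. Assigning treatment according to a confounder c (an assumed rule, for illustration only) does not, and the observed difference in means is then biased.

```python
import random

def observed_difference(randomized, n=100_000, seed=2):
    """Mean observed outcome in the treated minus the untreated group.

    Hypothetical model: y1 = y0 + 1 for everyone, so the true average effect is 1.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)            # a confounder that also affects the outcome
        y0 = c + rng.gauss(0, 1)       # potential outcome without treatment
        y1 = y0 + 1.0                  # potential outcome with treatment
        t = rng.random() < 0.5 if randomized else c > 0
        if t:
            treated.append(y1)         # only y1 is observed for treated individuals
        else:
            untreated.append(y0)       # only y0 is observed for untreated individuals
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

print(round(observed_difference(randomized=True), 2))   # close to the true effect of 1
print(round(observed_difference(randomized=False), 2))  # inflated: groups are not exchangeable
```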

Genetic relatedness

Occurs when two individuals share a proportion of their genome identical by descent, as a result of inheritance from a recent common ancestor.

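d-Separation

A graphical criterion for reading conditional independencies from a causal diagram: two variables are d-separated by a set of variables when conditioning on that set blocks every path between them. In causal inference, the aim is to block all backdoor paths between an exposure X and an outcome Y (for example, by adjusting for confounders) so that the unconfounded effect of X on Y can be estimated.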

Relevance assumption

A core assumption of instrumental variable estimation, whereby the instrument used must be robustly associated with the exposure of interest.

Exclusion restriction

A core assumption of instrumental variable estimation whereby the effect of the instrument on the outcome must act entirely through its effect on the exposure (that is, not directly and not via confounders or other mediators).

Backdoor paths

A non-causal path between an exposure X and an outcome Y that passes through a confounder (that is, a path entering X through an arrow pointing into it). If left unblocked, a backdoor path biases the estimation of the causal effect of X on Y.
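
Blocking a backdoor path by adjusting for a confounder can be sketched as follows. The data are simulated under an assumed model in which C causes both X and Y and the true effect of X on Y is 0.5. The adjusted estimate uses the Frisch-Waugh-Lovell result instead of a regression library: regressing the residuals of Y on the residuals of X, with C removed from both, gives the same coefficient as a multiple regression including C.

```python
import random

def ols_slope(a, b):
    """Ordinary least-squares slope from regressing b on a."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / sum((ai - ma) ** 2 for ai in a)

def residualize(v, c):
    """Residuals of v after a simple linear regression on c."""
    beta = ols_slope(c, v)
    mv, mc = sum(v) / len(v), sum(c) / len(c)
    return [vi - mv - beta * (ci - mc) for vi, ci in zip(v, c)]

# Hypothetical data-generating model: C -> X, C -> Y and X -> Y (true effect 0.5)
rng = random.Random(3)
n = 100_000
c = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * ci + rng.gauss(0, 1) for ci in c]
y = [0.5 * xi + 0.8 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]

unadjusted = ols_slope(x, y)                                # backdoor path X <- C -> Y left open
adjusted = ols_slope(residualize(x, c), residualize(y, c))  # path blocked by adjusting for C
print(f"unadjusted: {unadjusted:.2f}, adjusted: {adjusted:.2f}")
```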

Structural equation modelling

Multivariate statistical technique combining factor analysis and regression analysis to estimate networks of relationships between latent and observed variables.

Sensitivity analysis

An analysis conducted to assess how robust an association of interest is to potential unobserved confounding or other sources of bias.

Heritability

The proportion of variance in a phenotype that can be attributed to genetic differences among individuals in a given population. Narrow-sense heritability captures additive genetic effects only. Broad-sense heritability includes all genetic effects, that is, dominance and epistatic effects as well as additive effects.

Environmental influences

Influences that contribute to making two individuals (for example, twins) similar (shared environmental influences) or dissimilar (non-shared environmental influences) to each other.
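
In the classical twin design, heritability (A), shared environment (C) and non-shared environment (E) are often introduced through Falconer's approximations. These use the correlations of monozygotic (r_MZ) and dizygotic (r_DZ) twins: A = 2(r_MZ − r_DZ), C = 2r_DZ − r_MZ and E = 1 − r_MZ. This is a back-of-the-envelope sketch, not the structural equation models used in practice. It assumes purely additive genetic effects, equal shared environments for both twin types and no assortative mating, and the correlations in the example are hypothetical.

```python
def falconer(r_mz, r_dz):
    """Falconer's approximation of the ACE variance decomposition from twin correlations.

    Assumes additive genetic effects only, equal shared environments for MZ and DZ
    twins, and no assortative mating.
    """
    a2 = 2 * (r_mz - r_dz)   # narrow-sense heritability (A)
    c2 = 2 * r_dz - r_mz     # shared environmental influences (C)
    e2 = 1 - r_mz            # non-shared environmental influences plus measurement error (E)
    return a2, c2, e2

a2, c2, e2 = falconer(r_mz=0.6, r_dz=0.35)            # hypothetical twin correlations
print(f"A = {a2:.2f}, C = {c2:.2f}, E = {e2:.2f}")    # A = 0.50, C = 0.10, E = 0.40
```

By construction, the three components sum to 1. In real data, an estimate outside [0, 1] is a sign that the simple additive model does not fit.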

Single nucleotide polymorphisms

(SNPs). DNA sequence variation arising from differences in a single nucleotide: adenine (A), thymine (T), cytosine (C) or guanine (G).

Polygenic

Influenced by variants in many genes.

Pleiotropy

Occurs when a genetic locus (for example, a single nucleotide polymorphism (SNP)) affects more than one trait.

Genome-wide association studies

(GWAS). Studies in which hundreds of thousands to millions of genetic variants are tested for an association with a phenotype.

Heterogeneity

When several estimates of the same effect do not converge towards the same value, whether in meta-analyses or in Mendelian randomization analyses using many genetic instruments.
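
Heterogeneity across Mendelian randomization instruments is commonly quantified with Cochran's Q. The sketch assumes hypothetical summary statistics for J SNPs and proceeds in four steps:

1. It computes each SNP's Wald ratio with a first-order standard error (se_Y / |beta_X|).
2. It combines the ratios with fixed-effect inverse-variance weights.
3. It sums the weighted squared deviations from the pooled estimate to obtain Q.
4. It returns Q with J − 1 degrees of freedom.

Under homogeneity, Q approximately follows a chi-squared distribution with J − 1 degrees of freedom. Values well above J − 1 therefore point to heterogeneity, for example from pleiotropic instruments.

```python
def ivw_and_cochran_q(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate and Cochran's Q from per-SNP summary statistics.

    beta_x: SNP-exposure effect sizes; beta_y, se_y: SNP-outcome effect sizes and
    standard errors. Uncertainty in beta_x is ignored (first-order standard errors).
    """
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]    # per-SNP Wald ratios
    ses = [sy / abs(bx) for bx, sy in zip(beta_x, se_y)]
    weights = [1 / s ** 2 for s in ses]                     # inverse-variance weights
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return ivw, q, len(ratios) - 1   # Q is compared with a chi-squared on J - 1 df

# Hypothetical summary statistics for three SNPs
bx = [0.10, 0.20, 0.05]
se = [0.01, 0.01, 0.01]
print(ivw_and_cochran_q(bx, [0.050, 0.100, 0.025], se))  # consistent ratios: Q near 0
print(ivw_and_cochran_q(bx, [0.050, 0.100, 0.050], se))  # third SNP disagrees: Q well above df = 2
```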

Collider bias

Occurs when a variable (the collider) is caused by both the exposure and the outcome of interest; controlling for it induces a spurious association between exposure and outcome.
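
Collider bias is easy to reproduce by simulation. In this hypothetical example, X and Y are generated independently and S depends on both. Restricting the sample to individuals with high S, a common form of conditioning (for instance, selective participation), induces a negative association between X and Y.

```python
import random

def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(4)
n = 50_000
x = [rng.gauss(0, 1) for _ in range(n)]                    # exposure
y = [rng.gauss(0, 1) for _ in range(n)]                    # outcome, independent of x
s = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]    # collider caused by both

r_all = corr(x, y)                                         # close to 0
selected = [(xi, yi) for xi, yi, si in zip(x, y, s) if si > 1]
r_selected = corr([p[0] for p in selected], [p[1] for p in selected])  # clearly negative
print(f"full sample: {r_all:.2f}, selected on collider: {r_selected:.2f}")
```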

Allelic scores

Computed in the same way as polygenic scores but summarizing genetic information from a few to a few hundred single nucleotide polymorphisms (SNPs), whereas polygenic scores rely on thousands of SNPs, up to all SNPs in the genome.

Polygenic scores

Individual-level scores that summarize genetic risk (or protection) for a given phenotype. For each single nucleotide polymorphism (SNP), a score is computed by counting effect alleles in an individual and weighting them by the effect size of this SNP. A polygenic score is computed by summing scores from a large number, potentially all, of the SNPs in the genome.
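
The weighted-sum definition of a polygenic score translates directly into code. The sketch assumes that genotypes are coded as effect-allele counts (0, 1 or 2), aligned to the same effect alleles as the weights; the effect sizes and genotypes are hypothetical. An allelic score is the same computation over fewer SNPs. Dedicated software such as PRSice (ref. 128) additionally handles steps that are omitted here, such as selecting which SNPs to include.

```python
def polygenic_score(genotypes, effect_sizes):
    """Sum over SNPs of effect-allele count times per-SNP effect size, for one individual.

    genotypes: effect-allele counts (0, 1 or 2), one per SNP
    effect_sizes: per-SNP weights (for example GWAS effect sizes) for the same effect alleles
    """
    if len(genotypes) != len(effect_sizes):
        raise ValueError("need exactly one effect size per SNP")
    return sum(g * w for g, w in zip(genotypes, effect_sizes))

weights = [0.12, -0.05, 0.30, 0.08]   # hypothetical per-SNP effect sizes
person = [2, 1, 0, 1]                  # hypothetical effect-allele counts
score = polygenic_score(person, weights)
print(round(score, 2))  # 0.27
```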

Dynastic effects

Occur when genetic variants in parents are transmitted to the offspring but also contribute to parental phenotype and in turn to the environment experienced by the child. This induces a correlation between offspring genotypes and the offspring’s environment.
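
A toy simulation shows how dynastic effects create a correlation between offspring genotypes and the offspring's environment. It assumes a normal approximation for parental genetic scores, each split into a transmitted and a non-transmitted half, with made-up effect sizes. The child's own genotype has no direct effect on the outcome. Even so, it is associated with the outcome through the parentally shaped environment, and so are the non-transmitted parental alleles, which the child does not carry. This contrast between transmitted and non-transmitted alleles is what family-based designs such as those in refs 126 and 127 exploit.

```python
import random

def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(5)
sd_half = 0.5 ** 0.5   # each half has variance 0.5, so a parent's full score has variance 1
child_g, non_transmitted, outcome = [], [], []
for _ in range(50_000):
    # Hypothetical: each parent's score = transmitted half + non-transmitted half
    t_m, nt_m = rng.gauss(0, sd_half), rng.gauss(0, sd_half)
    t_f, nt_f = rng.gauss(0, sd_half), rng.gauss(0, sd_half)
    parental_score = t_m + nt_m + t_f + nt_f
    environment = 0.5 * parental_score + rng.gauss(0, 1)  # parental genotypes shape the environment
    y = environment + rng.gauss(0, 1)                     # no direct effect of the child's genotype
    child_g.append(t_m + t_f)                             # the child inherits the transmitted halves
    non_transmitted.append(nt_m + nt_f)
    outcome.append(y)

print(f"child genotype vs outcome: {corr(child_g, outcome):.2f}")
print(f"non-transmitted alleles vs outcome: {corr(non_transmitted, outcome):.2f}")
```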

Summary association statistics

Effect sizes and standard errors derived from a genome-wide association study for each single nucleotide polymorphism (SNP). They may include other summary statistics (for example, allele frequency or imputation accuracy).
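
Per-SNP effect sizes and standard errors are enough to recover a z-score and a two-sided p-value. This is how genome-wide significant SNPs are often selected as candidate instruments. The sketch assumes a normal approximation for the test statistic and uses the conventional genome-wide significance threshold of 5 × 10^-8. The SNP identifiers and values are placeholders, and real instrument selection usually involves further steps such as LD clumping.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold

def two_sided_p(beta, se):
    """Two-sided p-value for H0: beta = 0, assuming beta / se is standard normal under H0."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))

# Placeholder summary statistics: (SNP identifier, effect size, standard error)
summary_stats = [("snp_1", 0.040, 0.005), ("snp_2", 0.010, 0.008), ("snp_3", -0.035, 0.006)]
instruments = [snp for snp, beta, se in summary_stats if two_sided_p(beta, se) < GENOME_WIDE_P]
print(instruments)  # ['snp_1', 'snp_3']
```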

Genetic correlations

The correlation between causal effect sizes for two phenotypes across single nucleotide polymorphisms (SNPs). Typically reported as the correlation across the whole genome; it may differ when restricted to pleiotropic SNPs only.
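
The gap between genome-wide and pleiotropic-only genetic correlations can be illustrated with simulated causal effect sizes. The sketch assumes that the true per-SNP effects are known. It uses an arbitrary hypothetical architecture: 5% of SNPs affect both traits with correlated effects, and 10% affect each trait alone. In practice, genetic correlations are estimated from summary association statistics, for example with LD score regression (the approach used by LD Hub).

```python
import random

def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(6)
beta1, beta2, is_pleiotropic = [], [], []
for _ in range(10_000):                 # hypothetical SNPs
    u = rng.random()
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    if u < 0.05:                        # pleiotropic: correlated effects (r = 0.8) on both traits
        b1, b2 = z1, 0.8 * z1 + 0.6 * z2
    elif u < 0.15:                      # affects trait 1 only
        b1, b2 = z1, 0.0
    elif u < 0.25:                      # affects trait 2 only
        b1, b2 = 0.0, z2
    else:                               # affects neither trait
        b1, b2 = 0.0, 0.0
    beta1.append(b1)
    beta2.append(b2)
    is_pleiotropic.append(u < 0.05)

r_genome_wide = corr(beta1, beta2)
r_pleiotropic = corr([b for b, p in zip(beta1, is_pleiotropic) if p],
                     [b for b, p in zip(beta2, is_pleiotropic) if p])
print(f"genome-wide: {r_genome_wide:.2f}, pleiotropic SNPs only: {r_pleiotropic:.2f}")
```

Under these assumptions, the genome-wide correlation should come out around 0.27, well below the roughly 0.8 seen among pleiotropic SNPs. The many trait-specific and null SNPs dilute the shared signal.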

Phenome-wide association studies

(PheWAS). These studies estimate the associations of one or a few genetic variants of particular interest with many phenotypes, that is, with a selection from all possible phenotypes (the phenome).

Colocalization methods

When a genetic region contains variants associated with more than one phenotype, colocalization methods aim to determine whether this is due to shared or distinct causal variants.

Linkage disequilibrium

(LD). Nonrandom associations between alleles at different loci.
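
For two biallelic loci, LD is commonly summarized by the coefficient D = p_AB − p_A p_B, where p_AB is the frequency of haplotypes carrying alleles A and B. Its standardized form is r² = D² / (p_A(1 − p_A) p_B(1 − p_B)). The sketch assumes that haplotype frequencies are already known; in practice, they are estimated from phased genotype data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r^2) for allele A at locus 1 and allele B at locus 2.

    p_ab: frequency of haplotypes carrying both A and B; p_a, p_b: allele frequencies.
    """
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

print(ld_stats(0.50, 0.5, 0.5))  # (0.25, 1.0): A always occurs with B (complete LD)
print(ld_stats(0.25, 0.5, 0.5))  # (0.0, 0.0): alleles combine at random (linkage equilibrium)
```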

About this article

Cite this article

Pingault, JB., O’Reilly, P.F., Schoeler, T. et al. Using genetic data to strengthen causal inference in observational research. Nat Rev Genet 19, 566–580 (2018).
