Technical Report

Distinguishing genetic correlation from causation across 52 diseases and complex traits

Nature Genetics (2018)

Abstract

Mendelian randomization, a method to infer causal relationships, is confounded by genetic correlations reflecting shared etiology. We developed a model in which a latent causal variable mediates the genetic correlation; trait 1 is partially genetically causal for trait 2 if it is strongly genetically correlated with the latent causal variable, quantified using the genetic causality proportion. We fit this model using mixed fourth moments \(E(\alpha_1^2\alpha_1\alpha_2)\) and \(E(\alpha_2^2\alpha_1\alpha_2)\) of marginal effect sizes for each trait; if trait 1 is causal for trait 2, then SNPs affecting trait 1 (large \(\alpha_1^2\)) will have correlated effects on trait 2 (large \(\alpha_1\alpha_2\)), but not vice versa. In simulations, our method avoided false positives due to genetic correlations, unlike Mendelian randomization. Across 52 traits (average n = 331,000), we identified 30 causal relationships with high genetic causality proportion estimates. Novel findings included a causal effect of low-density lipoprotein on bone mineral density, consistent with clinical trials of statins in osteoporosis.
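
The asymmetry between the two mixed fourth moments can be illustrated with a toy simulation. The sketch below is not the estimator used in the paper: it works with true effect sizes rather than marginal summary statistics, ignores LD and sampling noise, and assumes a sparse "point-normal" effect distribution scaled to unit variance. The function names, the effect-size distribution, and all parameter values are assumptions chosen for illustration. It compares a fully causal scenario (trait 2 effects are a scaled copy of trait 1 effects plus independent Gaussian noise) with a scenario in which a sparse latent component contributes equally to both traits; both have genetic correlation 0.5.

```python
import random


def point_normal(rng, p):
    """Sparse effect: nonzero with probability p, scaled so the variance is 1."""
    return rng.gauss(0.0, (1.0 / p) ** 0.5) if rng.random() < p else 0.0


def mixed_fourth_moments(a1, a2):
    """Sample estimates of E(a1^2 * a1*a2) and E(a2^2 * a1*a2)."""
    m = len(a1)
    k1 = sum(x * x * x * y for x, y in zip(a1, a2)) / m
    k2 = sum(y * y * x * y for x, y in zip(a1, a2)) / m
    return k1, k2


def simulate_causal(rng, m, c, p):
    """Trait 1 fully causal for trait 2: a2 = c * a1 + independent Gaussian noise."""
    noise_sd = (1.0 - c * c) ** 0.5  # keeps Var(a2) = 1
    a1 = [point_normal(rng, p) for _ in range(m)]
    a2 = [c * x + rng.gauss(0.0, noise_sd) for x in a1]
    return a1, a2


def simulate_shared(rng, m, rho, p):
    """No causality: a sparse latent component contributes equally to both traits."""
    q = rho ** 0.5  # loading of each trait on the latent component
    noise_sd = (1.0 - rho) ** 0.5  # keeps Var(a1) = Var(a2) = 1
    a1, a2 = [], []
    for _ in range(m):
        latent = point_normal(rng, p)
        a1.append(q * latent + rng.gauss(0.0, noise_sd))
        a2.append(q * latent + rng.gauss(0.0, noise_sd))
    return a1, a2


rng = random.Random(0)  # fixed seed so the run is reproducible
causal_k1, causal_k2 = mixed_fourth_moments(*simulate_causal(rng, 200_000, 0.5, 0.05))
shared_k1, shared_k2 = mixed_fourth_moments(*simulate_shared(rng, 200_000, 0.5, 0.05))
print("causal:", causal_k1, causal_k2)
print("shared:", shared_k1, shared_k2)
```

Under these toy assumptions, the expected difference between the two moments in the causal scenario is \(c(1 - c^2)(\kappa - 3)\), where \(c\) is the scaling factor and \(\kappa\) is the kurtosis of the trait 1 effects, so the first moment exceeds the second whenever the effects are sparser than Gaussian. In the shared-component scenario the two moments are equal in expectation, even though the genetic correlation is the same.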

Data availability

UK Biobank summary statistics are publicly available at http://data.broadinstitute.org/alkesgroup/UKBB/.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Change history

  • 06 November 2018

    In the version of this article originally published, there were errors in equations. In the HTML and PDF, the initial term of equation 10 was estimated GCP but should have been estimated standard error, while a ‘hat’ was missing from the first alpha in the second term of the expression at the end of the paragraph following equation (6) in the Methods. In addition, in the abstract in the PDF, a subscript 1 was used instead of a subscript 2 for the final term of the first fourth-moment expression. These errors have been corrected in the HTML, PDF and print versions of the paper.

Acknowledgements

We are grateful to B. Neale, S. Raychaudhuri, C. Patel, S. Kathiresan, B. Pasaniuc, and H. Finucane for helpful discussions and to P.-R. Loh and S. Gazal for producing BOLT-LMM summary statistics for UK Biobank traits. This research was conducted using the UK Biobank Resource under Application #16549 and was funded by National Institutes of Health grants R01 MH107649, U01 CA194393, and R01 MH101244.

Author information

Affiliations

  1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA

    • Luke J. O’Connor & Alkes L. Price
  2. Program in Bioinformatics and Integrative Genomics, Harvard Graduate School of Arts and Sciences, Cambridge, MA, USA

    • Luke J. O’Connor
  3. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA

    • Alkes L. Price
  4. Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA

    • Alkes L. Price

Contributions

L.J.O. and A.L.P. conceived the methods, designed the analyses, and wrote the manuscript. L.J.O. performed the analyses.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Luke J. O’Connor or Alkes L. Price.

Integrated supplementary information

  1. Supplementary Figure 1 Null and causal simulations with no LD and LCV model violations.

    a–g, We report the positive rate (α = 0.05 for null simulations, α = 0.001 for causal simulations) for two-sample MR, MR-Egger, Bidirectional MR and LCV. a–c correspond to Gaussian mixture model extensions of the models in Fig. 2b–d. f and g correspond to causal analogs of the models in a and d, respectively. We also display scatterplots illustrating the bivariate distribution of true SNP effect sizes on the two traits. a, Null simulation with nonzero SNP effects drawn from a mixture of Gaussian distributions; one mixture component has correlated effects on each trait. b, Null simulation with SNP effects drawn from a mixture of Gaussian distributions, and differential polygenicity between the two traits. c, Null simulation with SNP effects drawn from a mixture of Gaussian distributions and unequal power between the two traits. d, Null simulation with two intermediaries having different effects on each trait. e, Null simulation with two intermediaries having different effects on each trait and unequal polygenicity for the two intermediaries. f, Causal simulation with SNP effects drawn from a mixture of Gaussian distributions; all SNPs affecting trait 1 also affect trait 2, but the relative effect sizes were noisy. g, Causal simulation with an additional genetic confounder (i.e., a second intermediary) mediating part of the genetic correlation. Results for each panel are based on 1,000 simulations. Numerical results are reported in Supplementary Tables 4 and 5, which also include comparisons to MR-WME and MR-MBE.

  2. Supplementary Figure 2 Mean GCP estimates in simulations with LCV model violations.

    Error bars show s.d. based on 1,000 simulations. a, Null simulation with two intermediaries having possibly unequal polygenicity. The two intermediaries had either a slightly, moderately, or highly heterogeneous effect on the two traits; that is, when heterogeneity was high, intermediary 1 had a much larger effect on trait 1 while intermediary 2 had a much larger effect on trait 2. Then, we specified a certain difference in polygenicity between the two traits (measured by the proportion of causal SNPs). b, Causal simulation with an additional latent confounder. The latent confounder explained a low, medium, or high proportion of the genetic correlation. We varied the polygenicity of the confounder and of the causal trait, such that a 16× difference in polygenicity indicates that 16× more SNPs were causal for the causal trait than for the genetic confounder.

  3. Supplementary Figure 3 Unbiasedness of posterior mean GCP estimates in simulations with LD and random true GCP values.

    Estimated values of GCP were binned and averaged, and mean true values of GCP are plotted for each bin, with standard errors. Points above the line indicate that GCP estimates were downward biased (toward –1). a, Ascertained simulations (43%) with significant genetic correlation (P < 0.05) and evidence for partial causality (P < 0.001). Only bins with a count of at least 10 are plotted. b, All 10,000 simulations.
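
The binning procedure described for Supplementary Figure 3 can be sketched as follows. This is only an illustration under stated assumptions: the function name `calibration_bins`, the use of equal-width bins over [–1, 1], and the default number of bins are hypothetical choices, since the caption specifies only that estimates were binned, that true values were averaged per bin with standard errors, and that bins with a count below 10 were not plotted.

```python
from math import sqrt


def calibration_bins(estimated, true, n_bins=10, min_count=10):
    """Bin (estimated, true) GCP pairs by the estimated value on [-1, 1].

    Returns one (mean_estimated, mean_true, se_of_mean_true) tuple per bin
    that holds at least min_count pairs (and at least 2, so the s.e. is defined).
    """
    width = 2.0 / n_bins
    bins = [[] for _ in range(n_bins)]
    for est, tru in zip(estimated, true):
        # Clamp the index so that est == 1 (or values just outside the
        # range) fall into the first or last bin instead of raising.
        idx = max(0, min(int((est + 1.0) / width), n_bins - 1))
        bins[idx].append((est, tru))

    rows = []
    for members in bins:
        n = len(members)
        if n < max(min_count, 2):
            continue  # too few simulations in this bin to plot
        mean_est = sum(e for e, _ in members) / n
        mean_true = sum(t for _, t in members) / n
        var_true = sum((t - mean_true) ** 2 for _, t in members) / (n - 1)
        rows.append((mean_est, mean_true, sqrt(var_true / n)))
    return rows
```

Plotting `mean_true` against `mean_est` for each returned row, together with the identity line, gives the kind of calibration plot the caption describes: a point above the line means the true values in that bin were on average larger than the estimates.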

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–3, Supplementary Tables 2–11 and 13–17, and Supplementary Note

  2. Reporting Summary

  3. Supplementary Table 1

    Simulation parameters

  4. Supplementary Table 12

    LCV and MR results for all trait pairs

About this article

DOI

https://doi.org/10.1038/s41588-018-0255-0