Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits

Abstract

We developed a likelihood-based approach for analyzing summary-level statistics and external linkage disequilibrium information to estimate effect-size distributions of common variants, characterized by the proportion of underlying susceptibility SNPs and a flexible normal-mixture model for their effects. Analysis of results available across 32 genome-wide association studies showed that, while all traits are highly polygenic, there is wide diversity in the degree and nature of polygenicity. Psychiatric diseases and traits related to mental health and ability appear to be most polygenic, involving a continuum of small effects. Most other traits, including major chronic diseases, involve clusters of SNPs that have distinct magnitudes of effects. We predict that the sample sizes needed to identify SNPs that explain most heritability found in genome-wide association studies will range from a few hundred thousand to multiple millions, depending on the underlying effect-size distributions of the traits. Accordingly, we project the risk-prediction ability of polygenic risk scores across a wide variety of diseases.
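
As an illustration of the kind of effect-size model the abstract describes, the sketch below draws per-SNP effects from a point mass at zero mixed with zero-mean normal components. The function names, the parameter values, and the use of standardized effects are assumptions made for illustration and are not taken from the article. This is not the authors' likelihood-based estimator, which works from summary-level statistics and external linkage disequilibrium information.

```python
import random


def sample_effects(n_snps, pi_c, weights, variances, seed=0):
    """Draw effects from a point mass at zero plus a normal mixture.

    pi_c      : assumed proportion of susceptibility SNPs (non-zero effects)
    weights   : mixing weights of the normal components (should sum to 1)
    variances : variance of each zero-mean normal component
    """
    rng = random.Random(seed)
    effects = []
    for _ in range(n_snps):
        if rng.random() >= pi_c:
            # Null SNP: effect is exactly zero.
            effects.append(0.0)
            continue
        # Susceptibility SNP: pick a component, then draw its effect.
        var = rng.choices(variances, weights=weights, k=1)[0]
        effects.append(rng.gauss(0.0, var ** 0.5))
    return effects


def expected_total_variance(n_snps, pi_c, weights, variances):
    """Expected sum of squared effects: M * pi_c * sum_k w_k * var_k."""
    return n_snps * pi_c * sum(w * v for w, v in zip(weights, variances))


if __name__ == "__main__":
    # Hypothetical values: 1% susceptibility SNPs, two effect-size clusters.
    effects = sample_effects(100_000, 0.01, [0.9, 0.1], [1e-5, 1e-4])
    n_nonzero = sum(1 for b in effects if b != 0.0)
    print("non-zero effects:", n_nonzero)
    print("expected total variance:",
          expected_total_variance(100_000, 0.01, [0.9, 0.1], [1e-5, 1e-4]))
```

With a single normal component the non-zero effects form one continuum of small effects. With two or more components of different variance they form clusters of distinct magnitude, which is the contrast the abstract draws between traits.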

Fig. 1: Q-Q plots comparing observed distributions of association statistics against those expected under the fitted models for three representative traits.
Fig. 2: Estimated effect-size distributions for susceptibility SNPs based on the best fitted (M2 or M3) model for continuous (top) and binary traits (bottom).
Fig. 3: Projected number of discoveries (top) and corresponding percentage of GWAS heritability explained (bottom) based on the best-fitted (M2 or M3) model for effect-size distribution for continuous (left) and binary (right) traits.
Fig. 4: Expected area under the curve (AUC) for polygenic risk prediction models with SNPs included at the optimal significance (α) threshold (red solid line) and at the genome-wide significance level of 5 × 10⁻⁸ (black solid line).
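
For readers unfamiliar with how an AUC relates to a polygenic risk score, the sketch below evaluates the standard binormal formula, AUC = Φ((μ₁ − μ₀) / √(σ₀² + σ₁²)), which holds when the score is normally distributed in both cases and controls. The function names and the example numbers are assumptions for illustration; this is not necessarily the calculation behind Fig. 4.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binormal_auc(mean_cases, var_cases, mean_controls, var_controls):
    """AUC of a score that is normally distributed in cases and controls.

    Equals the probability that a randomly chosen case has a higher
    score than a randomly chosen control.
    """
    shift = mean_cases - mean_controls
    return normal_cdf(shift / math.sqrt(var_cases + var_controls))


if __name__ == "__main__":
    # Hypothetical score: unit variance in both groups, cases shifted by 1.
    print(round(binormal_auc(1.0, 1.0, 0.0, 1.0), 3))
```

A score with no separation between cases and controls gives an AUC of 0.5. The AUC rises as the mean shift grows relative to the spread of the score.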

Acknowledgements

The authors thank H. Zhang for assistance with computing, P. Kundu for managing the UK Biobank data, and M. Chatterjee for editing the manuscript. The research was supported by a Bloomberg Distinguished Professorship endowment. Some of the simulation studies were conducted using genotype data from the UK Biobank Resource, accessed under Application Number 17712.

Author information

Contributions

Y.Z., G.Q., and N.C. conceived the methods. Y.Z., G.Q., and J.-H.P. carried out all analyses. Y.Z. and N.C. wrote the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Nilanjan Chatterjee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Text Figures and Tables

Supplementary Note 1, Supplementary Tables 1–13, and Supplementary Figures 1–19

Reporting Summary

About this article

Cite this article

Zhang, Y., Qi, G., Park, JH. et al. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat Genet 50, 1318–1326 (2018). https://doi.org/10.1038/s41588-018-0193-x

  • DOI: https://doi.org/10.1038/s41588-018-0193-x
