Abstract
We developed a likelihood-based approach for analyzing summary-level statistics and external linkage disequilibrium information to estimate effect-size distributions of common variants, characterized by the proportion of underlying susceptibility SNPs and a flexible normal-mixture model for their effects. Analysis of results available across 32 genome-wide association studies showed that, while all traits are highly polygenic, there is wide diversity in the degree and nature of polygenicity. Psychiatric diseases and traits related to mental health and ability appear to be most polygenic, involving a continuum of small effects. Most other traits, including major chronic diseases, involve clusters of SNPs that have distinct magnitudes of effects. We predict that the sample sizes needed to identify SNPs that explain most heritability found in genome-wide association studies will range from a few hundred thousand to multiple millions, depending on the underlying effect-size distributions of the traits. Accordingly, we project the risk-prediction ability of polygenic risk scores across a wide variety of diseases.
This is a preview of subscription content, access via your institution
Relevant articles
Open Access articles citing this article.
-
Reciprocal causation mixture model for robust Mendelian randomization analysis using genome-scale summary data
Nature Communications Open Access 28 February 2023
-
Functional and molecular characterization of suicidality factors using phenotypic and genome-wide data
Molecular Psychiatry Open Access 06 January 2023
-
Risk of type 2 diabetes and KCNJ11 gene polymorphisms: a nested case–control study and meta-analysis
Scientific Reports Open Access 01 December 2022
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 per month
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$189.00 per year
only $15.75 per issue
Rent or buy this article
Get just this article for as long as you need it
$39.95
Prices may be subject to local taxes which are calculated during checkout




References
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45(D1), D896–D901 (2017).
Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).
Maas, P. et al. Breast cancer risk from modifiable and nonmodifiable risk factors among white women in the United States. JAMA Oncol. 2, 1295–1302 (2016).
Garcia-Closas, M., Gunsoy, N. B. & Chatterjee, N. Combined associations of genetic and environmental risk factors: implications for prevention of breast cancer. J. Natl. Cancer Inst. 106, dju305 (2014).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
Chen, G.-B. et al. Estimation and partitioning of (co)heritability of inflammatory bowel disease from GWAS and immunochip data. Hum. Mol. Genet. 23, 4710–4720 (2014).
Vattikuti, S., Guo, J. & Chow, C. C. Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet. 8, e1002637 (2012).
So, H. C., Gui, A. H. S., Cherny, S. S. & Sham, P. C. Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases. Genet. Epidemiol. 35, 310–317 (2011).
Sampson, J. N. et al. Analysis of heritability and shared heritability based on genome-wide association studies for thirteen cancer types. J. Natl. Cancer Inst. 107, djv279 (2015).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Lee, S. H. et al. Cross-Disorder Group of the Psychiatric Genomics Consortium. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet. 45, 984–994 (2013).
Park, J. H. et al. Estimation of effect size distribution from genome-wide association studies and implications for future discoveries. Nat. Genet. 42, 570–575 (2010).
Dudbridge, F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 9, e1003348 (2013).
Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat. Genet. 45, 400–405 (2013).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Zheng, J. et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2017).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Purcell, S. M. et al. International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
Stephens, M. False discovery rates: a new deal. Biostatistics 18, 275–294 (2017).
Efron, B. & Tibshirani, R. Empirical bayes methods and false discovery rates for microarrays. Genet. Epidemiol. 23, 70–86 (2002).
Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011).
Speed, D., Hemani, G., Johnson, M. R. & Balding, D. J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021 (2012).
Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl. Acad. Sci. USA 111, E5272–E5281 (2014).
So, H. C., Li, M. & Sham, P. C. Uncovering the total heritability explained by all true susceptibility variants in a genome-wide association study. Genet. Epidemiol. 35, 447–456 (2011).
Stahl, E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 44, 483–489 (2012).
Palla, L. & Dudbridge, F. A fast method that uses polygenic scores to estimate the variance explained by genome-wide marker panels and the proportion of variants affecting a trait. Am. J. Hum. Genet. 97, 250–259 (2015).
Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with bayesian sparse linear mixed models. PLoS Genet. 9, e1003264 (2013).
Zhou, X. & Stephens, M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Methods 11, 407–409 (2014).
Moser, G. et al. Simultaneous discovery, estimation and prediction analysis of complex traits using a bayesian mixture model. PLoS Genet. 11, e1004969 (2015).
Speed, D. & Balding, D. J. MultiBLUP: improved SNP-based prediction for complex traits. Genome Res. 24, 1550–1557 (2014).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Thompson, W. K. et al. An empirical Bayes mixture model for effect size distributions in genome-wide association studies. PLoS Genet. 11, e1005717 (2015).
Zhu, X. & Stephens, M. Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Ann. Appl. Stat. 11, 1561–1592 (2017).
Holland, D. et al. Estimating phenotypic polygenicity and causal effect size variance from GWAS summary statistics while accounting for inflation due to cryptic relatedness. Preprint at b ioRxiv https://doi.org/10.1101/133132 (2017).
Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry. Preprint at bioRxiv https://doi.org/10.1101/274654 (2018).
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Nelson, C. P. et al. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat. Genet. 49, 1385–1391 (2017).
Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).
Wray, N. R., Goddard, M. E. & Visscher, P. M. Prediction of individual genetic risk to disease from genome-wide association studies. Genome Res. 17, 1520–1528 (2007).
Evans, D. M., Visscher, P. M. & Wray, N. R. Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk. Hum. Mol. Genet. 18, 3525–3531 (2009).
So, H.-C., Kwan, J. S. H., Cherny, S. S. & Sham, P. C. Risk prediction of complex diseases from family history and known susceptibility loci, with applications for cancer screening. Am. J. Hum. Genet. 88, 548–565 (2011).
Kraft, P. & Hunter, D. J. Genetic risk prediction--are we there yet? N. Engl. J. Med. 360, 1701–1703 (2009).
Speed, D., Cai, N., Johnson, M. R., Nejentsev, S. & Balding, D. J. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).
Derkach, A., Zhang, H. & Chatterjee, N. Power analysis for genetic association test (PAGEANT) provides insights to challenges for rare variant association studies. Bioinformatics 34, 1506–1513 (2018).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Lindsay, B. G. Composite likelihood methods. Contemp. Math. 80, 221–239 (1988).
Varin, C., Reid, N. & Firth, D. An overview of composite likelihood methods. Stat. Sin. 21, 5–42 (2011).
Heagerty, P. J. & Lumley, T. Window subsampling of estimating functions with application to regression models. J. Am. Stat. Assoc. 95, 197–211 (2000).
Lumley, T. & Heagerty, P. Weighted empirical adaptive variance estimators for correlated data regression. J. R. Stat. Soc. Series B Stat. Methodol. 61, 459–477 (1999).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Acknowledgements
The authors thank H. Zhang for assistance with computing, P. Kundu for data management for the UK Biobank data, and M. Chatterjee for editing of manuscript. The research was supported by Bloomberg Distinguished Professorship endowment. Some of the simulation studies were conducted using genotype data from the UK Biobank Resource accessed under Application Number 17712.
Author information
Authors and Affiliations
Contributions
Y.Z., G.Q., and N.C. conceived the methods. Y.Z., G.Q., and J.-H.P. carried out all analyses. Y.Z. and N.C. wrote the manuscript. All authors reviewed the manuscripts.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Text Figures and Tables
Supplementary Note 1, Supplementary Tables 1–13, and Supplementary Figures 1–19
Rights and permissions
About this article
Cite this article
Zhang, Y., Qi, G., Park, JH. et al. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat Genet 50, 1318–1326 (2018). https://doi.org/10.1038/s41588-018-0193-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41588-018-0193-x
This article is cited by
-
Functional and molecular characterization of suicidality factors using phenotypic and genome-wide data
Molecular Psychiatry (2023)
-
Reciprocal causation mixture model for robust Mendelian randomization analysis using genome-scale summary data
Nature Communications (2023)
-
Multivariate genome-wide association meta-analysis of over 1 million subjects identifies loci underlying multiple substance use disorders
Nature Mental Health (2023)
-
Deciphering colorectal cancer genetics through multi-omic analysis of 100,204 cases and 154,587 controls of European and east Asian ancestries
Nature Genetics (2023)
-
Missing heritability found for height
Nature (2022)