The distribution of common-variant effect sizes

Abstract

The genetic effect-size distribution of a disease describes the number of risk variants, the range of their effect sizes and sample sizes that will be required to discover them. Accurate estimation has been a challenge. Here I propose Fourier Mixture Regression (FMR), validating that it accurately estimates real and simulated effect-size distributions. Applied to summary statistics for ten diseases (average \(N_{\textrm{eff}} = 169,000\)), FMR estimates that 100,000–1,000,000 cases will be required for genome-wide significant SNPs to explain 50% of SNP heritability. In such large studies, genome-wide significance becomes increasingly conservative, and less stringent thresholds achieve high true positive rates if confounding is controlled. Across traits, polygenicity varies, but the range of their effect sizes is similar. Compared with effect sizes in the top 10% of heritability, including most discovered thus far, those in the bottom 10–50% are orders of magnitude smaller and more numerous, spanning a large fraction of the genome.

Fig. 1: Performance of Fourier regression in well-powered simulations (N = 460,000) with real LD.
Fig. 2: Predicted versus observed GWAS results in UK Biobank.
Fig. 3: Sample size requirements and predictions for future GWAS.
Fig. 4: The NTPR.
Fig. 5: The genetic effect-size distribution.

Data availability

GWAS summary statistics are available at https://alkesgroup.broadinstitute.org/. Numerical results for Figs. 2–5 are reported in the Supplementary Tables.

Code availability

Open-source software is available at https://github.com/lukejoconnor (ref. 63). GENESIS (ref. 14) software is available at https://github.com/yandorazhang/GENESIS.

References

1. International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).

2. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).

3. Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).

4. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

5. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).

6. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).

7. Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).

8. Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).

9. Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. B 82, 1273–1300 (2020).

10. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

11. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).

12. Yang, J. et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 43, 519–525 (2011).

13. Stahl, E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 44, 483–489 (2012).

14. Zhang, Y., Qi, G., Park, J.-H. & Chatterjee, N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat. Genet. 50, 1318–1326 (2018).

15. O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).

16. Park, J.-H. et al. Estimation of effect size distribution from genome-wide association studies and implications for future discoveries. Nat. Genet. 42, 570–575 (2010).

17. Palla, L. & Dudbridge, F. A fast method that uses polygenic scores to estimate the variance explained by genome-wide marker panels and the proportion of variants affecting a trait. Am. J. Hum. Genet. 97, 250–259 (2015).

18. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).

19. Zhu, X. & Stephens, M. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat. Commun. 9, 4361 (2018).

20. Holland, D. et al. Beyond SNP heritability: polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model. PLoS Genet. 16, e1008612 (2020).

21. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).

22. Moser, G. et al. Simultaneous discovery, estimation and prediction analysis of complex traits using a Bayesian mixture model. PLoS Genet. 11, e1004969 (2015).

23. Loh, P. R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).

24. Shi, H., Kichaev, G. & Pasaniuc, B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 99, 139–153 (2016).

25. Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).

26. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

27. Price, A. L., Zaitlen, N. A., Reich, D. & Patterson, N. New approaches to population stratification in genome-wide association studies. Nat. Rev. Genet. 11, 459–463 (2010).

28. Kong, A. et al. The nature of nurture: effects of parental genotypes. Science 359, 424–428 (2018).

29. Morris, T. T., Davies, N. M., Hemani, G. & Smith, G. D. Population phenomena inflate genetic associations of complex social traits. Sci. Adv. 6, eaay0328 (2020).

30. Hormozdiari, F. et al. Widespread allelic heterogeneity in complex traits. Am. J. Hum. Genet. 100, 789–802 (2017).

31. Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).

32. Nelson, C. P. et al. Genetically determined height and coronary artery disease. N. Engl. J. Med. 372, 1608–1618 (2015).

33. Palmer, C. & Pe’er, I. Statistical correction of the Winner’s Curse explains replication variability in quantitative trait genome-wide association studies. PLoS Genet. 13, e1006916 (2017).

34. Galinsky, K. J. et al. Fast principal-component analysis reveals convergent evolution of ADH1B in Europe and East Asia. Am. J. Hum. Genet. 98, 456–472 (2016).

35. Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).

36. Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).

37. Turchin, M. C. et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 44, 1015–1019 (2012).

38. Sohail, M. et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 8, e39702 (2019).

39. Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).

40. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).

41. Stephens, M. False discovery rates: a new deal. Biostatistics 18, 275–294 (2017).

42. Zhu, X. & Stephens, M. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat. Commun. 9, 4361 (2018).

43. Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl Acad. Sci. USA 111, E455–E464 (2014).

44. Asgari, S. et al. A positively selected FBN1 missense variant reduces height in Peruvian individuals. Nature 582, 234–239 (2020).

45. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).

46. Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).

47. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).

48. Chung, W. et al. Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes. Nat. Commun. 10, 569 (2019).

49. Chun, S. et al. Non-parametric polygenic risk prediction via partitioned GWAS summary statistics. Am. J. Hum. Genet. 107, 46–59 (2020).

50. Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 10, 5086 (2019).

51. Chen, W., McDonnell, S. K., Thibodeau, S. N., Tillmans, L. S. & Schaid, D. J. Incorporating functional annotations for fine-mapping causal variants in a Bayesian framework using summary statistics. Genetics 204, 933–958 (2016).

52. Kichaev, G. et al. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics 33, 248–255 (2017).

53. Benner, C., Havulinna, A., Salomaa, V., Ripatti, S. & Pirinen, M. Refining fine-mapping: effect sizes and regional heritability. Preprint at bioRxiv https://doi.org/10.1101/318618 (2018).

54. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

55. Simons, Y. B., Bullaughey, K., Hudson, R. R. & Sella, G. A population genetic interpretation of GWAS findings for human quantitative traits. PLoS Biol. 16, e2002985 (2018).

56. Brown, B. C., Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).

57. Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).

58. Speed, D. et al. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).

59. Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).

60. Belmont, J. W. et al. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).

61. Conneely, K. N. & Boehnke, M. So many correlated tests, so little time! Rapid adjustment of P values for multiple correlated tests. Am. J. Hum. Genet. 81, 1158–1168 (2007).

62. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

63. O’Connor, L. J. lukejoconnor/FMR: initial release of FMR software. Zenodo https://doi.org/10.5281/zenodo.4670516 (2021).

Acknowledgements

I am grateful to A. Price, J. Ballard, A. Nadig, D. Weiner, O. Weissbrod, G. Getz, B. Neale and E. Lander for suggestions and discussions.

Author information

Contributions

L.J.O. wrote the manuscript.

Corresponding author

Correspondence to Luke J. O’Connor.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Peer review information Nature Genetics thanks Yan Zhang for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance of FMR in simulations at different sample sizes.

I show the true HDM (yellow), estimates for 10 individual simulation replicates (grey), the mean estimate across 20 replicates (blue), and the mean uncorrected estimate. The uncorrected estimate is obtained by running FMR without any correction for sampling variation in the GWAS summary statistics (see Supplementary Note). Data were simulated under a point-normal model with either 1% or 10% of SNPs having nonzero causal effect sizes.
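
To make the simulation setup concrete, the following is a minimal sketch of drawing causal effect sizes under a point-normal model. It is not the simulation code used for this figure: it ignores LD and the sampling noise of GWAS summary statistics, and the function name `simulate_point_normal`, the number of SNPs `m` and the heritability `h2` are illustrative assumptions.

```python
import numpy as np

def simulate_point_normal(m, p_causal, h2, rng):
    """Draw per-SNP effects: zero with probability 1 - p_causal,
    otherwise normal with variance h2 / (m * p_causal), so that the
    expected sum of squared effects equals h2."""
    causal = rng.random(m) < p_causal
    beta = np.zeros(m)
    beta[causal] = rng.normal(0.0, np.sqrt(h2 / (m * p_causal)), size=causal.sum())
    return beta

rng = np.random.default_rng(0)
for p in (0.01, 0.10):  # 1% or 10% of SNPs causal, as in the caption
    beta = simulate_point_normal(m=100_000, p_causal=p, h2=0.5, rng=rng)
    # Fraction of nonzero effects and total variance explained by them
    print(p, np.mean(beta != 0), np.sum(beta ** 2))
```

In both settings the total heritability is held fixed, so the 10% setting spreads the same variance over ten times as many SNPs with correspondingly smaller effects.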

Extended Data Fig. 2 Calibration of FMR jackknife standard errors.

Simulations were performed under a normal mixture model with small-, medium- and large-effect SNPs (similar to Fig. 1d), at sample size N=460k, N=145k or N=50k. For different effect-size thresholds, I calculated the standard error of the proportion of random-effect heritability explained by SNPs with effect sizes less than that threshold. Bar plots show root-mean-squared jackknife standard errors (blue) and empirical standard errors (orange) based on 25 replicates. At large sample size (N=460k), standard errors were sometimes underestimated, probably due to the nonnegativity constraints in the regression. Caution is needed when making comparisons between the genetic architecture of different traits, as underestimated standard errors could lead to false-positive differences.
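
The comparison in this figure can be sketched as follows, under loud assumptions: a toy estimator (the sample mean of per-block statistics) stands in for FMR, and the block count, the simulated data and the helper `jackknife_se` are hypothetical. The sketch only shows how a root-mean-squared jackknife standard error and an empirical standard error across replicates are computed and compared.

```python
import numpy as np

def jackknife_se(blocks, estimator):
    """Delete-one-block jackknife standard error of estimator(blocks)."""
    b = len(blocks)
    # Re-estimate with each block left out in turn
    loo = np.array([estimator(np.delete(blocks, j, axis=0)) for j in range(b)])
    return np.sqrt((b - 1) / b * np.sum((loo - loo.mean()) ** 2))

rng = np.random.default_rng(0)
n_rep, n_blocks = 25, 100  # 25 replicates as in the caption; block count is assumed
estimates, ses = [], []
for _ in range(n_rep):
    blocks = rng.normal(size=n_blocks)  # toy per-block statistics
    estimates.append(np.mean(blocks))
    ses.append(jackknife_se(blocks, np.mean))

# Root-mean-squared jackknife SE versus empirical SE across replicates
rms_jackknife_se = np.sqrt(np.mean(np.square(ses)))
empirical_se = np.std(estimates, ddof=1)
print(rms_jackknife_se, empirical_se)
```

For a well-calibrated procedure the two numbers should be close; a jackknife value clearly below the empirical one corresponds to the underestimation described in the caption.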

Extended Data Fig. 3 Effect of changing the FMR sampling times and mixture components in simulations.

Simulations were performed under a normal mixture model with small-, medium- and large-effect SNPs (similar to Fig. 1d), at sample size N=460k. I specified a set of 17 mixture components (\(\sigma ^2 = [2^{ - 9},2^{ - 8}, \ldots ,2^7]\)) and 17 sampling times (\(t_k = 1/\sigma _k\)), and performed simulations with various subsets of the respective values. In panels a-d, I use the same values of \(\sigma ^2\) (\(\sigma _3^2,\sigma _4^2, \ldots ,\sigma _{15}^2\), which correspond to the default FMR model) and various values of \(t\). In panels e-f, I vary the values of \(\sigma ^2\). In most cases, very similar results are obtained, except when too few sampling times are used (panel d). 25 replicates are performed (identical between the figure panels), the first 10 of which are plotted in grey. The mean and standard deviation across replicates are shown in blue. (a) \(\sigma ^2 = [\sigma _3^2,\sigma _4^2, \ldots ,\sigma _{15}^2]\), \({\boldsymbol{t}} = [t_3,t_4, \ldots ,t_{15}]\); (b) \(\sigma ^2 = [\sigma _3^2,\sigma _4^2, \ldots ,\sigma _{15}^2]\), \({\boldsymbol{t}} = [t_1,t_2, \ldots ,t_{17}]\); (c) \(\sigma ^2 = [\sigma _3^2,\sigma _4^2, \ldots ,\sigma _{15}^2]\), \({\boldsymbol{t}} = [t_5,t_6, \ldots ,t_{11}]\); (d) \(\sigma ^2 = [\sigma _3^2,\sigma _4^2, \ldots ,\sigma _{15}^2]\), \({\boldsymbol{t}} = [t_5,t_7, \ldots ,t_{15}]\); (e) \(\sigma ^2 = [\sigma _1^2,\sigma _2^2, \ldots ,\sigma _{17}^2]\), \({\boldsymbol{t}} = [t_1,t_2, \ldots ,t_{17}]\); (f) \(\sigma ^2 = [\sigma _5^2,\sigma _6^2, \ldots ,\sigma _{13}^2]\), \({\boldsymbol{t}} = [t_5,t_6, \ldots ,t_{13}]\). I recommend using 13 mixture components and 13 sampling times, even though a smaller number may suffice (Extended Data Fig. 3f).
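
The grids described in this caption can be written down in a few lines. This is only a sketch of the index bookkeeping, not code from the FMR software: the helper `subset` is hypothetical, and it assumes that the caption's subscripts are 1-based and follow the listed order, so that the first component is \(2^{-9}\).

```python
import numpy as np

# 17 mixture-component variances 2^-9, ..., 2^7 and sampling times t_k = 1 / sigma_k
sigma2_all = 2.0 ** np.arange(-9, 8)
t_all = 1.0 / np.sqrt(sigma2_all)

def subset(values, first, last, step=1):
    """Select values[first..last] using the caption's 1-based indices."""
    return values[first - 1:last:step]

default_sigma2 = subset(sigma2_all, 3, 15)   # default model: components 3..15
default_t = subset(t_all, 3, 15)             # panel a sampling times
panel_d_t = subset(t_all, 5, 15, step=2)     # panel d: t_5, t_7, ..., t_15

print(len(sigma2_all), len(default_sigma2), len(panel_d_t))  # 17 13 6
```

Under these assumptions the default model spans variances from \(2^{-7}\) to \(2^{5}\), and panel d uses only six sampling times, which is the case where the caption reports degraded results.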

Extended Data Fig. 4 Observed number of genome-wide significant SNPs and proportion of heritability explained at N=145k vs 460k.

For numerical results, see Supplementary Table 2.

Extended Data Fig. 5 Predicted vs. observed heritability explained by genome-wide significant SNPs at different significance thresholds.

\(\% h_{{\mathrm{GWAS}}}^2\) was predicted using interim-release UK Biobank summary statistics (maximum N=145k) and evaluated in the full release (maximum N=460k). Squared correlations between predicted and observed values were 0.94, 0.95, 0.93, and 0.88 in panels a-d respectively. Lower \(r^2\) at \(\chi ^2 > 1000\) (panel d) could result from the small number of loci with large effect sizes, which may increase the sampling variance of both the FMR predictions and the observed values. In panel d, the data points for several traits are superimposed near the origin. For numerical results, see Supplementary Table 2.

Extended Data Fig. 6 Predicted vs. observed number of genome-wide significant SNPs at different significance thresholds.

\(M_{{\mathrm{GWAS}}}\) was predicted using interim-release UK Biobank summary statistics (maximum N=145k) and evaluated in the full release (maximum N=460k). Squared correlations between predicted and observed values were 0.92, 0.97, 0.92 and 0.91 in panels a-d respectively. In panel d, the data points for several traits are superimposed near the origin. For numerical results, see Supplementary Table 2.

Extended Data Fig. 7 Consistency of FMR predictions at different sample sizes.

\(\% h_{{\mathrm{GWAS}}}^2\) and \(M_{{\mathrm{GWAS}}}\) were predicted for 22 traits based on N=145k vs. N=460k summary statistics, with target sample size equal to 460k, 2M or 10M. Predictions assume that the LD score regression intercept will be equal to what was observed at N=145k for both sets of estimates. Numerical results are presented in Supplementary Table 2.

Extended Data Fig. 8 Performance of GENESIS vs. FMR predictions in UK Biobank.

FMR and GENESIS were applied to interim-release UK Biobank summary statistics (maximum N=145k) for 22 traits in order to predict the results of the full release (maximum N=460k). Numerical results are presented in Supplementary Table 1.

Extended Data Fig. 9 Consistency of GENESIS predictions at different sample sizes.

\(\% h_{{\mathrm{GWAS}}}^2\) and \(M_{{\mathrm{GWAS}}}\) were predicted for 19 traits (Supplementary Table 3) based on N=145k vs. N=460k summary statistics, with target sample size equal to 460k, 2M or 10M. At N=460k, predictions of large-N \(\% h_{{\mathrm{GWAS}}}^2\) were slightly smaller (panels b-c), while predictions of \(M_{{\mathrm{GWAS}}}\) were slightly larger (panel f). This difference could result from a less severe form of the power-dependent bias that is known to affect the point-normal (2-component) model when it is misspecified: as sample size increases, SNPs with smaller effect sizes become detectable, and estimates shift toward a larger number of causal SNPs with smaller effect sizes. (This only occurs when the model is misspecified, with a larger-than-expected number of small-effect SNPs.) The 3-component model ameliorates this bias by including a small-effect heritability component even at small sample sizes. However, if this model too is misspecified (for example when there is a mixture of small-, medium- and large-effect SNPs), then it would be affected in the same way as the point-normal model, to a lesser degree. Numerical results are presented in Supplementary Table 3. The same analysis using FMR is presented in Extended Data Fig. 7.

Extended Data Fig. 10 Estimated HDM of height using summary statistics from GIANT vs. UK Biobank.

If results were biased by population stratification, the bottom-left portion of the curve (corresponding to small-effect SNPs) would be inflated for estimates based on GIANT.

Supplementary information

Supplementary Information

Supplementary Tables 1, 4, 6 and 8, Figs. 1–7 and Note

Reporting Summary

Peer Review Information

Supplementary Tables

Supplementary Tables 2, 3, 5, 7 and 9–12

About this article

Cite this article

O’Connor, L.J. The distribution of common-variant effect sizes. Nat Genet 53, 1243–1249 (2021). https://doi.org/10.1038/s41588-021-00901-3