Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture

Abstract

SNP-heritability is a fundamental quantity in the study of complex traits. Recent studies have shown that existing methods to estimate genome-wide SNP-heritability can yield biases when their assumptions are violated. While various approaches have been proposed to account for frequency- and linkage disequilibrium (LD)-dependent genetic architectures, it remains unclear which estimates reported in the literature are reliable. Here we show that genome-wide SNP-heritability can be accurately estimated from biobank-scale data irrespective of genetic architecture, without specifying a heritability model or partitioning SNPs by allele frequency and/or LD. We show analytically and through extensive simulations starting from real genotypes (UK Biobank, N = 337 K) that, unlike existing methods, our closed-form estimator is robust across a wide range of architectures. We provide estimates of SNP-heritability for 22 complex traits in the UK Biobank and show that, consistent with our results in simulations, existing biobank-scale methods yield estimates that differ by up to 30% from our theoretically justified approach.
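
The abstract refers to a closed-form estimator without writing it out. As a rough illustration only, the sketch below assumes the estimator takes the form \(\hat{h}^2_{\mathrm{GRE}} = (N \hat{\beta}^{\top} \hat{V}^{\dagger} \hat{\beta} - q)/(N - q)\), where \(\hat{\beta}\) are marginal effects from standardized genotypes and phenotype, \(\hat{V}\) is the in-sample LD matrix, \(\hat{V}^{\dagger}\) is its pseudoinverse and \(q\) is its rank. That formula, the function names `standardize`, `gre_heritability` and `simulate_trait`, and the toy simulation (independent SNPs, N = 2,000, M = 50) are assumptions made for this example; they are not taken from this page and are not the authors' implementation, for which the released code should be consulted.

```python
import numpy as np


def standardize(a):
    """Center each column (or a vector) and scale it to unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def gre_heritability(X, y):
    """Assumed closed form: (N * b' pinv(V) b - q) / (N - q).

    X is an (N, M) genotype matrix and y an (N,) phenotype vector.
    """
    N = X.shape[0]
    Xs = standardize(X)
    ys = standardize(y)
    beta_hat = Xs.T @ ys / N          # marginal (OLS) effect estimates
    V = Xs.T @ Xs / N                 # in-sample LD matrix
    q = np.linalg.matrix_rank(V)      # rank of the LD matrix
    quad = beta_hat @ np.linalg.pinv(V) @ beta_hat
    return (N * quad - q) / (N - q)


def simulate_trait(X, h2, rng):
    """Hypothetical helper: phenotype with in-sample heritability h2."""
    g = standardize(standardize(X) @ rng.normal(size=X.shape[1]))
    e = standardize(rng.normal(size=X.shape[0]))
    return np.sqrt(h2) * g + np.sqrt(1.0 - h2) * e


# Toy data: independent SNPs (no LD), far smaller than biobank scale.
rng = np.random.default_rng(0)
N, M = 2000, 50
freqs = rng.uniform(0.05, 0.5, size=M)
X = rng.binomial(2, freqs, size=(N, M)).astype(float)
y = simulate_trait(X, h2=0.5, rng=rng)

h2_est = gre_heritability(X, y)
print(f"estimated h2: {h2_est:.3f}")
```

Under these assumptions the estimate is close to an adjusted \(R^2\) from regressing the phenotype on all SNPs jointly, which is why no heritability model or MAF/LD partitioning enters the calculation. A genome-wide application would need a strategy for handling LD matrices far too large to invert directly, which this sketch does not attempt.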

Fig. 1: Simulations under 64 distinct MAF/LD-dependent architectures (N = 337,205).
Fig. 2: Comparison of \(\hat{h}^2_{\mathrm{GRE}}\) with LDSC, S-LDSC (MAF) and SumHer in genome-wide simulations (N = 337,205, M = 593,300).
Fig. 3: Comparison of \(\hat{h}^2_{\mathrm{GRE}}\) with GREML, BOLT-REML, GREML-LDMS-I and LDAK in small-scale simulations (N = 8,430, M = 14,821 SNPs).
Fig. 4: Percentage difference of \(h_g^2\) estimates from LDSC (in-sample), S-LDSC (baseline-LD/in-sample) and SumHer (in-sample) with respect to \(\hat{h}^2_{\mathrm{GRE}}\) for 18 complex traits and diseases in the UK Biobank for which \(\hat{h}^2_{\mathrm{GRE}} > 0.05\) (N = 290,641 unrelated British individuals, M = 459,792 typed SNPs; Methods).

Data availability

The baseline-LD annotations used in Fig. 4 are available at https://data.broadinstitute.org/alkesgroup/LDSCORE/. All individual-level genotypes and phenotypes were obtained from the UK Biobank (https://www.ukbiobank.ac.uk); we do not have permission to release these data. The 1000 Genomes Phase 3 reference panel can be downloaded at http://www.internationalgenome.org/data.

Code availability

Open-source code implementing the GRE estimator and our simulation framework is available on GitHub at https://github.com/bogdanlab/h2-GRE.

Acknowledgements

This research was conducted using the UK Biobank Resource under applications 33297 and 33127. We thank the participants of UK Biobank for making this work possible. We also thank R. Johnson, M. Freund, M. Major, S. Gazal, A. Price and D. Balding for helpful discussions. This work was funded by the National Institutes of Health (NIH) under awards R01HG009120, R01MH115676, R01HG006399, U01CA194393, R35GM125055, T32NS048004, T32MH073526 and T32HG002536 and the National Science Foundation (NSF) under award III-1705121.

Author information

Contributions

K.H., K.S.B., H.S. and B.P. conceived and designed the experiments. K.H. and K.S.B. performed the experiments and statistical analyses. A.M., H.S., N.M. and S.S. provided statistical support. K.H., K.S.B. and Y.W. collected and managed the data. K.S.B. and B.P. wrote the manuscript with the participation of all authors.

Corresponding authors

Correspondence to Kathryn S. Burch or Bogdan Pasaniuc.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Information

Supplementary Notes and Supplementary Figs. 1–22

Reporting Summary

Supplementary Tables

Supplementary Tables 1–26

Cite this article

Hou, K., Burch, K.S., Majumdar, A. et al. Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nat Genet 51, 1244–1251 (2019). https://doi.org/10.1038/s41588-019-0465-0
