Abstract
SNP-heritability is a fundamental quantity in the study of complex traits. Recent studies have shown that existing methods to estimate genome-wide SNP-heritability can yield biases when their assumptions are violated. While various approaches have been proposed to account for frequency- and linkage disequilibrium (LD)-dependent genetic architectures, it remains unclear which estimates reported in the literature are reliable. Here we show that genome-wide SNP-heritability can be accurately estimated from biobank-scale data irrespective of genetic architecture, without specifying a heritability model or partitioning SNPs by allele frequency and/or LD. We show analytically and through extensive simulations starting from real genotypes (UK Biobank, N = 337K) that, unlike existing methods, our closed-form estimator is robust across a wide range of architectures. We provide estimates of SNP-heritability for 22 complex traits in the UK Biobank and show that, consistent with our results in simulations, existing biobank-scale methods yield estimates up to 30% different from our theoretically justified approach.
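To make the idea of a closed-form, model-free estimator concrete, the following is a minimal sketch rather than the released implementation. It assumes the estimator takes the form (N b'V⁺b − q) / (N − q), where b holds the marginal OLS effects on standardized data, V is the in-sample LD matrix, V⁺ is its pseudoinverse and q is its rank. The function name `gre_h2`, the toy simulation and all parameter values are hypothetical and chosen only for illustration. The toy uses independent SNPs with N much larger than the number of SNPs, whereas a real analysis has to handle LD among many more variants.

```python
import numpy as np


def gre_h2(X, y):
    """Toy closed-form estimate (N * b' V^+ b - q) / (N - q).

    Hypothetical helper for illustration; not the h2-GRE package API.
    X: (N, M) genotype matrix, y: (N,) phenotype vector.
    """
    N = X.shape[0]
    # Standardize genotypes and phenotype to mean 0, variance 1.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = (y - y.mean()) / y.std()
    beta_hat = X.T @ y / N            # marginal OLS effect estimates
    V = X.T @ X / N                   # in-sample LD (correlation) matrix
    q = np.linalg.matrix_rank(V)      # rank of the LD matrix
    quad = N * beta_hat @ np.linalg.pinv(V) @ beta_hat
    return (quad - q) / (N - q)


# Toy simulation (assumed settings): independent SNPs, no heritability
# model is passed to the estimator.
rng = np.random.default_rng(0)
N, M, h2_true = 2000, 200, 0.5
maf = rng.uniform(0.05, 0.5, size=M)
G = rng.binomial(2, maf, size=(N, M)).astype(float)

# Genetic component scaled to variance h2_true, noise to 1 - h2_true.
g = G @ rng.normal(size=M)
g = (g - g.mean()) / g.std() * np.sqrt(h2_true)
e = rng.normal(size=N)
e = (e - e.mean()) / e.std() * np.sqrt(1.0 - h2_true)
y = g + e

h2_hat = gre_h2(G, y)
print(f"simulated h2 = {h2_true}, estimated h2 = {h2_hat:.3f}")
```

In this form the estimate is the in-sample variance explained by a joint regression on all SNPs, corrected for the q degrees of freedom spent on fitting, which is why no assumption about how effect sizes depend on allele frequency or LD enters the calculation.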
Data availability
The baseline-LD annotations used in Fig. 4 are available at https://data.broadinstitute.org/alkesgroup/LDSCORE/. All individual-level genotypes and phenotypes were obtained from the UK Biobank (https://www.ukbiobank.ac.uk); we do not have permission to release this data. The 1000 Genomes Phase 3 reference panel can be downloaded at http://www.internationalgenome.org/data.
Code availability
Open-source code implementing the GRE estimator and our simulation framework is available on GitHub at https://github.com/bogdanlab/h2-GRE.
Acknowledgements
This research was conducted using the UK Biobank Resource under applications 33297 and 33127. We thank the participants of UK Biobank for making this work possible. We also thank R. Johnson, M. Freund, M. Major, S. Gazal, A. Price and D. Balding for helpful discussions. This work was funded by the National Institutes of Health (NIH) under awards R01HG009120, R01MH115676, R01HG006399, U01CA194393, R35GM125055, T32NS048004, T32MH073526 and T32HG002536 and the National Science Foundation (NSF) under award III-1705121.
Author information
Contributions
K.H., K.S.B., H.S. and B.P. conceived and designed the experiments. K.H. and K.S.B. performed the experiments and statistical analyses. A.M., H.S., N.M. and S.S. provided statistical support. K.H., K.S.B. and Y.W. collected and managed the data. K.S.B. and B.P. wrote the manuscript with the participation of all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes and Supplementary Figs. 1–22
Supplementary Tables
Supplementary Tables 1–26
About this article
Cite this article
Hou, K., Burch, K.S., Majumdar, A. et al. Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nat Genet 51, 1244–1251 (2019). https://doi.org/10.1038/s41588-019-0465-0
DOI: https://doi.org/10.1038/s41588-019-0465-0
This article is cited by
- A method to estimate the contribution of rare coding variants to complex trait heritability. Nature Communications (2024)
- A versatile, fast and unbiased method for estimation of gene-by-environment interaction effects on biobank-scale datasets. Nature Communications (2023)
- Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature (2023)
- Accurate and efficient estimation of local heritability using summary statistics and the linkage disequilibrium matrix. Nature Communications (2023)
- Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nature Genetics (2023)