Analyses of data from genome-wide association studies on unrelated individuals have shown that, for human traits and diseases, approximately one-third to two-thirds of heritability is captured by common SNPs. However, it is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular whether the causal variants are rare, or whether it is overestimated due to bias in inference from pedigree data. Here we estimated heritability for height and body mass index (BMI) from whole-genome sequence data on 25,465 unrelated individuals of European ancestry. The estimated heritability was 0.68 (standard error 0.10) for height and 0.30 (standard error 0.10) for body mass index. Low minor allele frequency variants in low linkage disequilibrium (LD) with neighboring variants were enriched for heritability, to a greater extent for protein-altering variants, consistent with negative selection. Our results imply that rare variants, in particular those in regions of low linkage disequilibrium, are a major source of the still missing heritability of complex traits and disease.
This is a preview of subscription content, access via your institution
Open Access articles citing this article.
Genome Medicine Open Access 28 November 2023
Genome Medicine Open Access 15 November 2023
Cell Research Open Access 27 July 2023
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$209.00 per year
only $17.42 per issue
Rent or buy this article
Prices vary by article type
Prices may be subject to local taxes which are calculated during checkout
The individual-level genotype and phenotype TOPMed data used in this study are available through dbGaP. The dbGaP accession numbers for all TOPMed studies referenced in this paper are listed in Supplementary Table 1. The genotypic data are under restricted access. This research was conducted under TOPMed proposal ID 3235. Individual-level genotype and phenotype data for the UKB are available through formal application (http://www.ukbiobank.ac.uk). The UK10K data are accessible at https://www.uk10k.org. The 1000 Genomes genotype data are available at https://www.internationalgenome.org.
The code used for the main analysis and figures is available at https://github.com/CNSGenomics/Heritability_WGS. GRM computation, LD score calculations, PC projections and GREML analyses were performed using GCTA 1.92.4 (https://cnsgenomics.com/software/gcta/#Download). WGS analyses followed the steps described at https://cnsgenomics.com/software/gcta/#GREMLinWGSorimputeddata. Plink 1.9 (https://www.cog-genomics.org/plink/1.9) and 2.0 were used in the present study (https://www.cog-genomics.org/plink/2.0). R 3.4.1 (https://www.r-project.org) and Tidyverse packages (https://www.tidyverse.org) were used to generate figures and additional analyses. KING 2.2.6 was used for IBD calculations (https://www.kingrelatedness.com). All the parameters used for analyses are described in Methods.
Lynch, M. & Walsh, B. Genetics and Analysis of Quantitative Traits (Sinauer, 1998).
Fisher, R. A. XV—the correlation between relatives on the supposition of mendelian inheritance. Trans. R. Soc. Edinb. 52, 399–433 (1918).
MacArthur, J. et al. The new NHGRI-EBI catalog of published genome-wide association studies (GWAS catalog). Nucleic Acids Res. 45, D896–D901 (2017).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
Speed, D. et al. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).
Zuk, O., Hechter, E., Sunyaev, S. R. & Lander, E. S. The mystery of missing heritability: genetic interactions create phantom heritability. Proc. Natl Acad. Sci. USA 109, 1193–1198 (2012).
Young, A. I. et al. Relatedness disequilibrium regression estimates heritability without environmental bias. Nat. Genet. 50, 1304–1310 (2018).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed program. Nature 590, 290–299 (2021).
The 1000 Genomes Project Consortium A global reference for human genetic variation. Nature 526, 68–74 (2015).
Bergstrom, A. et al. Insights into human genetic variation and population history from 929 diverse genomes. Science https://doi.org/10.1126/science.aay5012 (2020).
Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in approximately 700000 individuals of European ancestry. Hum. Mol. Genet. https://doi.org/10.1093/hmg/ddy271 (2018).
International HapMap 3 Consortium Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Yang, J. et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 43, 519–525 (2011).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Evans, L. M. et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nat. Genet. 50, 737–745 (2018).
Elks, C. E. et al. Variability in the heritability of body mass index: a systematic review and meta-regression. Front. Endocrinol. 3, 29 (2012).
Mitt, M. et al. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur. J. Hum. Genet. 25, 869–876 (2017).
Mathieson, I. & McVean, G. Differential confounding of rare and common variants in spatially structured populations. Nat. Genet. 44, 243–246 (2012).
Zaidi, A. A. & Mathieson, I. Demographic history mediates the effect of stratification on polygenic scores. eLife 9, e61548 (2020).
UK10K Consortium The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6, 80–92 (2012).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).
Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).
Genome of the Netherlands Consortium Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat. Genet. 46, 818–825 (2014).
Stulp, G., Simons, M. J., Grasman, S. & Pollet, T. V. Assortative mating for human height: a meta-analysis. Am. J. Hum. Biol. https://doi.org/10.1002/ajhb.22917 (2017).
Border, R. et al. Assortative mating biases marker-based heritability estimators. Preprint at bioRxiv https://doi.org/10.1101/2021.03.18.436091 (2021).
Visscher, P. M. et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2, e41 (2006).
Kemper, K. E. et al. Phenotypic covariance across the entire spectrum of relatedness for 86 billion pairs of individuals. Nat. Commun. 12, 1050 (2021).
Hernandez, R. D. et al. Ultrarare variants drive substantial cis heritability of human gene expression. Nat. Genet. 51, 1349–1355 (2019).
Nurk, S. et al. The complete sequence of a human genome. Preprint at bioRxiv https://doi.org/10.1101/2021.05.26.445798 (2021).
Visscher, P. M. et al. Statistical power to detect genetic (co)variance of complex traits using SNP data in unrelated samples. PLoS Genet. 10, e1004269 (2014).
Shihab, H. A. et al. An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics 31, 1536–1543 (2015).
Yengo, L. et al. Imprint of assortative mating on the human genome. Nat. Hum. Behav. 2, 948–954 (2018).
Uricchio, L. H., Zaitlen, N. A., Ye, C. J., Witte, J. S. & Hernandez, R. D. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res. 26, 863–873 (2016).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Maples, B. K., Gravel, S., Kenny, E. E. & Bustamante, C. D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288 (2013).
Jiang, L. et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet. 51, 1749–1755 (2019).
Goudet, J., Kay, T. & Weir, B. S. How to estimate kinship. Mol. Ecol. 27, 4121–4135 (2018).
VanRaden, P. M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423 (2008).
Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
P.M.V. was supported by the Australian Research Council (grant nos. DP160102400 and FL180100072), the Australian National Health and Medical Research Council (grant nos. 1113400 and 1078037) and the US National Institutes of Health (NIH; grant no. R01MH100141). J.Y. was supported by the Australian Research Council (grant no. FT180100186), the Sylvia & Charles Viertel Charitable Foundation and the Westlake Education Foundation. L.Y. was supported by the Australian Research Council (grant no. DE200100425). The present study makes use of data from the TOPMed program, the UKB and the UK10K projects. WGS for the TOPMed program was supported by the NHLBI. A full list of acknowledgements is provided in the Supplementary Information.
P.T.E. is supported by a grant from Bayer AG to the Broad Institute focused on the genetics and therapeutics of cardiovascular diseases. He has also served on advisory boards or consulted for Bayer AG, Quest Diagnostics and Novartis. S.A.L. receives sponsored research support from Bristol Myers Squibb/Pfizer, Bayer AG, Boehringer Ingelheim, Fitbit and IBM, and has consulted for Bristol Myers Squibb/Pfizer, Bayer AG and Blackstone Life Sciences. The other authors declare no competing interests.
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wainschtein, P., Jain, D., Zheng, Z. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat Genet 54, 263–273 (2022). https://doi.org/10.1038/s41588-021-00997-7
This article is cited by
Genome Medicine (2023)
Genome Medicine (2023)
Nature Communications (2023)
Conventional twin studies overestimate the environmental differences between families relevant to educational attainment
npj Science of Learning (2023)