Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data

Abstract

Analyses of data from genome-wide association studies on unrelated individuals have shown that, for human traits and diseases, approximately one-third to two-thirds of heritability is captured by common SNPs. However, it is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular whether the causal variants are rare, or whether it is overestimated due to bias in inference from pedigree data. Here we estimated heritability for height and body mass index (BMI) from whole-genome sequence data on 25,465 unrelated individuals of European ancestry. The estimated heritability was 0.68 (standard error 0.10) for height and 0.30 (standard error 0.10) for body mass index. Low minor allele frequency variants in low linkage disequilibrium (LD) with neighboring variants were enriched for heritability, to a greater extent for protein-altering variants, consistent with negative selection. Our results imply that rare variants, in particular those in regions of low linkage disequilibrium, are a major source of the still missing heritability of complex traits and disease.
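To help read the reported standard errors, the sketch below converts each estimate into an approximate 95% interval. It assumes the estimator is roughly normally distributed (a Wald interval with z = 1.96), which is an assumption made here for illustration and not a statement from the paper. The function name `wald_interval` is hypothetical.

```python
def wald_interval(estimate, standard_error, z=1.96):
    """Approximate confidence interval: estimate +/- z * standard error.

    Assumes approximate normality of the estimator (an illustrative
    assumption, not something the abstract states).
    """
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Estimates and standard errors as reported in the abstract.
reported = [("height", 0.68, 0.10), ("BMI", 0.30, 0.10)]

for trait, h2, se in reported:
    low, high = wald_interval(h2, se)
    print(f"{trait}: {h2:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

Under this assumption the intervals are wide (about 0.2 on either side of each estimate), which is worth keeping in mind when comparing these values with pedigree-based estimates.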

Fig. 1: GREML-LDMS estimates with 8 bins (2 LD bins for each of the 4 MAF bins) correcting for 20 PCs (calculated from LD-pruned HM3 SNPs) after imputing SNPs from Illumina InfiniumCore24, GSA 24 and Affymetrix Axiom arrays using HRC reference panels for n = 25,465 samples.
Fig. 2: GREML-LDMS of height and BMI for n = 25,465 samples using 3 or 4 LD groups for each MAF bin, correcting for 48, 160 or 320 PCs computed from WGS variants.
Fig. 3: Variance explained per variant (the estimate of genetic variance divided by the number of variants in each bin) from GREML-LDMS with the low-LD and low-MAF (<0.1) variants partitioned into two distinct categories according to the SnpEff putative effect of the variant (protein-altering or non-protein-altering), correcting for 48 PCs from WGS variants for n = 25,465 samples.
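The binning referred to in the captions above can be sketched in Python as follows. The code assigns each variant to a MAF bin and then to an LD group within that bin, and computes variance explained per variant as defined in the Fig. 3 caption (the genetic variance estimate for a bin divided by the number of variants in it). The MAF boundaries, the choice to split on LD score quantiles within each MAF bin, the function names and all numbers are illustrative assumptions. They are not taken from the paper's Methods.

```python
import numpy as np


def assign_ldms_bins(maf, ld_score, maf_edges, n_ld_groups=2):
    """Label each variant with maf_bin * n_ld_groups + ld_group.

    maf_edges are the interior MAF boundaries (hypothetical values here);
    LD groups are equal-frequency splits of the LD score within each MAF bin.
    """
    maf = np.asarray(maf, dtype=float)
    ld_score = np.asarray(ld_score, dtype=float)
    maf_bin = np.digitize(maf, maf_edges)  # 0 .. len(maf_edges)
    ld_group = np.zeros(maf.shape, dtype=int)
    for b in np.unique(maf_bin):
        in_bin = maf_bin == b
        # Interior quantile cut points of the LD score inside this MAF bin.
        probs = np.linspace(0.0, 1.0, n_ld_groups + 1)[1:-1]
        cuts = np.quantile(ld_score[in_bin], probs)
        ld_group[in_bin] = np.digitize(ld_score[in_bin], cuts)
    return maf_bin * n_ld_groups + ld_group


def variance_per_variant(bin_labels, genetic_variance_by_bin):
    """Divide each bin's genetic variance estimate by its variant count."""
    labels, counts = np.unique(bin_labels, return_counts=True)
    return {
        int(label): genetic_variance_by_bin[int(label)] / int(count)
        for label, count in zip(labels, counts)
    }


# Toy data: 8 variants, 4 MAF bins (3 hypothetical interior edges), 2 LD groups.
toy_maf = [0.0005, 0.0007, 0.005, 0.006, 0.05, 0.06, 0.3, 0.4]
toy_ld = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
toy_labels = assign_ldms_bins(toy_maf, toy_ld, maf_edges=[0.001, 0.01, 0.1])
print(toy_labels.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With 4 MAF bins and 2 LD groups this gives the 8 bins mentioned in the Fig. 1 caption. Passing `n_ld_groups=3` or `n_ld_groups=4` gives the layouts mentioned in the Fig. 2 caption.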

Data availability

The individual-level genotype and phenotype TOPMed data used in this study are available through dbGaP. The dbGaP accession numbers for all TOPMed studies referenced in this paper are listed in Supplementary Table 1. The genotypic data are under restricted access. This research was conducted under TOPMed proposal ID 3235. Individual-level genotype and phenotype data for the UKB are available through formal application (http://www.ukbiobank.ac.uk). The UK10K data are accessible at https://www.uk10k.org. The 1000 Genomes genotype data are available at https://www.internationalgenome.org.

Code availability

The code used for the main analysis and figures is available at https://github.com/CNSGenomics/Heritability_WGS. GRM computation, LD score calculations, PC projections and GREML analyses were performed using GCTA 1.92.4 (https://cnsgenomics.com/software/gcta/#Download). WGS analyses followed the steps described at https://cnsgenomics.com/software/gcta/#GREMLinWGSorimputeddata. PLINK 1.9 (https://www.cog-genomics.org/plink/1.9) and PLINK 2.0 (https://www.cog-genomics.org/plink/2.0) were used in the present study. R 3.4.1 (https://www.r-project.org) and Tidyverse packages (https://www.tidyverse.org) were used to generate figures and additional analyses. KING 2.2.6 was used for IBD calculations (https://www.kingrelatedness.com). All the parameters used for analyses are described in Methods.
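For orientation, the GCTA steps named above (LD score calculation, per-bin GRM computation, multi-GRM GREML) might be assembled along these lines. This is a minimal Python sketch that only builds command lines and does not run them. It is not the authors' pipeline, which is in the repository linked above. The flags shown (`--bfile`, `--ld-score-region`, `--extract`, `--make-grm`, `--reml`, `--mgrm`, `--pheno`, `--qcovar`, `--out`) are documented GCTA options. The binary name `gcta64`, the 200 kb window, every file name and the helper function names are placeholders assumed for illustration.

```python
def ld_score_command(bfile, out, window_kb=200, gcta="gcta64"):
    """Segment-based LD score calculation (window size is a placeholder)."""
    return [gcta, "--bfile", bfile, "--ld-score-region", str(window_kb), "--out", out]


def grm_command(bfile, snp_list, out, gcta="gcta64"):
    """GRM restricted to the variants of one MAF/LD bin."""
    return [gcta, "--bfile", bfile, "--extract", snp_list, "--make-grm", "--out", out]


def mgrm_text(grm_prefixes):
    """Contents of the multi-GRM list file: one GRM prefix per line."""
    return "\n".join(grm_prefixes) + "\n"


def greml_command(mgrm_file, pheno, qcovar, out, gcta="gcta64"):
    """GREML with several GRMs, with PCs supplied as quantitative covariates."""
    return [gcta, "--reml", "--mgrm", mgrm_file, "--pheno", pheno,
            "--qcovar", qcovar, "--out", out]


# Hypothetical file names for an 8-bin analysis.
bins = [f"maf{m}_ld{l}" for m in range(1, 5) for l in range(1, 3)]
commands = [ld_score_command("wgs_data", "wgs_ldscore")]
commands += [grm_command("wgs_data", f"{b}.snplist", f"grm_{b}") for b in bins]
commands.append(greml_command("grm_list.txt", "height.phen", "pcs.qcovar", "height_ldms"))

for cmd in commands:
    print(" ".join(cmd))
```

Keeping each command as a list of arguments makes it straightforward to pass to `subprocess.run` later without shell quoting problems. The exact parameter values should come from the paper's Methods and the GCTA documentation.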

Acknowledgements

P.M.V. was supported by the Australian Research Council (grant nos. DP160102400 and FL180100072), the Australian National Health and Medical Research Council (grant nos. 1113400 and 1078037) and the US National Institutes of Health (NIH; grant no. R01MH100141). J.Y. was supported by the Australian Research Council (grant no. FT180100186), the Sylvia & Charles Viertel Charitable Foundation and the Westlake Education Foundation. L.Y. was supported by the Australian Research Council (grant no. DE200100425). The present study makes use of data from the TOPMed program, the UKB and the UK10K projects. WGS for the TOPMed program was supported by the NHLBI. A full list of acknowledgements is provided in the Supplementary Information.

Author information

Contributions

P.M.V. and J.Y. conceived the study. P.W. performed the analyses, contributed to methods and interpretation of results, and wrote the first draft of the manuscript and supplementary materials. P.M.V., J.Y. and L.Y. provided supervision and contributed to the analyses and to writing and revising the manuscript. M.E.G. contributed to supervision and analysis methods. D.J. and Z.Z. contributed to the analyses. C.A.L., R.D.H., S.T.M., C.C.L., K.E.N., L.A.L. and B.S.W. provided suggestions on the analyses and details of the phenotype data. L.A.C., A.H.S., B. MK., B.M.S., B.D.M., B.M.P., C.K., C.-T.L., C.M.A., D.R., D.I.C., D.D., D.M.L.-J., D.K.A., E.A.R., E.B., J.I.R., J.R.O., L.R.Y., M.A., M.A.A., M.-L.N.M., M.K.C., M.F., N.C., N.L.S., P.T.E., R.S.V., R.A.M., R.J.F.L., S.S.R., S.A.L., S.R.H., S.R., X.G. and Y.-D.I.C. provided phenotypic and/or WGS data through the TOPMed Consortium. All authors reviewed the manuscript, suggested revisions as needed and approved the final version. A full list of members and affiliations of the NHLBI TOPMed Consortium is available at https://topmed.nhlbi.nih.gov/topmed-banner-authorship.

Corresponding authors

Correspondence to Pierrick Wainschtein, Jian Yang or Peter M. Visscher.

Ethics declarations

Competing interests

P.T.E. is supported by a grant from Bayer AG to the Broad Institute focused on the genetics and therapeutics of cardiovascular diseases. He has also served on advisory boards or consulted for Bayer AG, Quest Diagnostics and Novartis. S.A.L. receives sponsored research support from Bristol Myers Squibb/Pfizer, Bayer AG, Boehringer Ingelheim, Fitbit and IBM, and has consulted for Bristol Myers Squibb/Pfizer, Bayer AG and Blackstone Life Sciences. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–3, Acknowledgements, Tables 1–8 and Figs. 1–43.

Reporting Summary

About this article

Cite this article

Wainschtein, P., Jain, D., Zheng, Z. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat Genet 54, 263–273 (2022). https://doi.org/10.1038/s41588-021-00997-7

  • DOI: https://doi.org/10.1038/s41588-021-00997-7
