Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits

Abstract

Multiple methods have been developed to estimate narrow-sense heritability, \(h^{2}\), using single nucleotide polymorphisms (SNPs) in unrelated individuals. However, a comprehensive evaluation of these methods has not yet been performed, leading to confusion and discrepancy in the literature. We present the most thorough and realistic comparison of these methods to date. We used thousands of real whole-genome sequences to simulate phenotypes under varying genetic architectures and confounding variables, and we used array, imputed, or whole genome sequence SNPs to obtain ‘SNP-heritability’ estimates. We show that SNP-heritability can be highly sensitive to assumptions about the frequencies, effect sizes, and levels of linkage disequilibrium of underlying causal variants, but that methods that bin SNPs according to minor allele frequency and linkage disequilibrium are less sensitive to these assumptions across a wide range of genetic architectures and possible confounding factors. These findings provide guidance for best practices and proper interpretation of published estimates.
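
As a rough illustration of what binning SNPs by minor allele frequency (MAF) and linkage disequilibrium (LD) can look like, the sketch below assigns each SNP to a MAF bin and then to an LD bin within that MAF bin. It is not the authors' pipeline: the function names, the MAF boundaries (0.01 and 0.05), the number of LD bins, and the assumption that a per-SNP LD score is already available are all illustrative choices.

```python
import numpy as np


def maf_from_genotypes(G):
    """Minor allele frequency per SNP.

    G is an (individuals x SNPs) array of 0/1/2 allele counts (assumed input format).
    """
    p = np.asarray(G, dtype=float).mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)


def bin_snps(maf, ld_score, maf_edges=(0.0, 0.01, 0.05, 0.5), n_ld_bins=2):
    """Assign each SNP a MAF bin and an LD bin.

    maf_edges and n_ld_bins are hypothetical defaults, not values from the paper.
    LD bins are quantiles of ld_score computed separately within each MAF bin.
    """
    maf = np.asarray(maf, dtype=float)
    ld_score = np.asarray(ld_score, dtype=float)

    # right=True places a value equal to an edge in the lower bin:
    # (0, 0.01], (0.01, 0.05], (0.05, 0.5]
    maf_bin = np.digitize(maf, maf_edges[1:-1], right=True)

    ld_bin = np.zeros(maf.shape, dtype=int)
    for b in np.unique(maf_bin):
        idx = np.where(maf_bin == b)[0]
        # Interior quantile cut points for this MAF bin (the median when n_ld_bins=2).
        probs = np.linspace(0.0, 1.0, n_ld_bins + 1)[1:-1]
        cuts = np.quantile(ld_score[idx], probs)
        ld_bin[idx] = np.digitize(ld_score[idx], cuts, right=True)
    return maf_bin, ld_bin
```

In a multicomponent analysis, each (MAF bin, LD bin) group of SNPs would typically get its own genetic relationship matrix, so that variance is estimated per bin rather than under one assumed relationship between frequency, LD, and effect size.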

Fig. 1: Comparison of heritability estimation methods. Mean \(\hat{h}^{2}_{\mathrm{SNP}}\) across 100 replicates from GRMs built from WGS SNPs in the least structured subsamples.
Fig. 2: Partitioned heritability methods to explore allelic spectra of traits. Mean \(\hat{h}^{2}_{\mathrm{SNP}}\) for four MAF bins across 100 replicates from multicomponent approaches in unrelated individuals using WGS SNPs in the least structured subsample.
Fig. 3: Influence of model assumptions using phenotypes simulated under alternative genetic architectures. Mean \(\hat{h}^{2}_{\mathrm{SNP}}\) across 100 replicates from GRMs built from imputed SNPs in the least structured subsamples across different model assumptions (bars) and different ways of simulating CVs (x axes).
Fig. 4: Influence of model assumptions using phenotypes simulated with LD-dependent genetic architecture. Mean \(\hat{h}^{2}_{\mathrm{SNP}}\) across 100 replicates from GRMs built from imputed SNPs in the least structured subsamples across different model assumptions (bars) and different ways of simulating CVs (x axes).
Fig. 5: Bias of heritability estimates under different model assumptions. Boxplots of the absolute bias of heritability estimates (\(\left|E\left(\hat{h}^{2}_{\mathrm{SNP}}\right)-h^{2}\right|\)) across all simulated phenotypes.
Fig. 6: Estimated \(\hat{h}^{2}_{\mathrm{SNP}}\) using multiple methods with imputed variants for six complex traits in the UK Biobank.
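
The absolute bias named in the Fig. 5 caption can be computed from replicate estimates as in the minimal sketch below. It assumes that the expectation is approximated by the mean over simulation replicates (the captions report means across 100 replicates); the function name and the example numbers are hypothetical and are not results from the paper.

```python
import numpy as np


def absolute_bias(estimates, true_h2):
    """Return |mean(estimates) - true_h2|.

    The mean over replicates stands in for the expectation E(h2_SNP_hat);
    true_h2 is the heritability used to simulate the phenotype.
    """
    estimates = np.asarray(estimates, dtype=float)
    return abs(estimates.mean() - true_h2)


# Made-up replicate estimates for a phenotype simulated with h2 = 0.5.
example_bias = absolute_bias([0.6, 0.7], true_h2=0.5)
```

Taking the absolute value after averaging means that over- and under-estimates within one set of replicates can cancel, so this measures systematic bias rather than replicate-to-replicate variability.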

Acknowledgements

We thank D. Speed (University College London) for providing LDAK5. We thank the Keller and Vrieze lab groups, the Institute for Behavioral Genetics, N. Wray, A. Price, and S. Caron for helpful comments. This work was supported by NIH grant R01MH100141 (to M.C.K.), NHMRC grants 1078037 (to P.M.V.) and 1113400 (to P.M.V. and J.Y.), Sylvia & Charles Viertel Charitable Foundation Senior Medical Research Fellowship (to J.Y.), and NIH grants R01DA037904 and R01HG008983 (to S.V.). This work used the Janus supercomputer, which is supported by the National Science Foundation (award number CNS-0821794), the University of Colorado Boulder, the University of Colorado Denver, and the National Center for Atmospheric Research. The Janus supercomputer is operated by the University of Colorado Boulder. We thank the participants of the individual Haplotype Reference Consortium cohorts. This research has been conducted using the UK Biobank Resource.

Author information

Contributions

L.M.E. and M.C.K. conceived and designed the study. L.M.E. performed the statistical analyses and simulations. R.T., S.I.V., S.G., G.R.A., S.D., D.W.B., T.R.d.C., M.E.G., B.M.N., J.Y., and P.M.V. provided statistical support. The Haplotype Reference Consortium, G.R.A., and S.D. contributed to data collection and management. L.M.E. and M.C.K. wrote the manuscript with participation of all authors.

Corresponding authors

Correspondence to Luke M. Evans or Matthew C. Keller.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–34 and Supplementary Note

Reporting Summary

Supplementary Tables

Supplementary Tables 1–10

Cite this article

Evans, L.M., Tahmasbi, R., Vrieze, S.I. et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nat Genet 50, 737–745 (2018). https://doi.org/10.1038/s41588-018-0108-x
