Abstract

SNP heritability, the proportion of phenotypic variance explained by SNPs, has been reported for many hundreds of traits. Its estimation requires strong prior assumptions about the distribution of heritability across the genome, but current assumptions have not been thoroughly tested. By analyzing imputed data for a large number of human traits, we empirically derive a model that more accurately describes how heritability varies with minor allele frequency (MAF), linkage disequilibrium (LD) and genotype certainty. Across 19 traits, our improved model leads to estimates of common SNP heritability on average 43% (s.d. 3%) higher than those obtained from the widely used software GCTA and 25% (s.d. 2%) higher than those from the recently proposed extension GCTA-LDMS. Previously, DNase I hypersensitivity sites were reported to explain 79% of SNP heritability; using our improved heritability model, their estimated contribution is only 24%.
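The abstract does not give the exact form of the improved heritability model. As a minimal sketch, assume an LDAK-style parametrization in which the expected heritability contributed by SNP j is proportional to w_j [f_j(1 - f_j)]^(1 + α) r_j. Here f_j is the MAF, w_j is an LD weight, r_j is a genotype-certainty (imputation information) score, and α scales the dependence on MAF; the default α = -0.25 is an assumption of this sketch. The function `expected_snp_heritability` is hypothetical, and the LD weights and information scores are taken as precomputed inputs. Setting α = -1 with all weights equal to 1 gives every SNP the same expected share. That is the equal-per-SNP assumption implied by GCTA's standardized genotypes.

```python
import numpy as np


def expected_snp_heritability(maf, ld_weight, info, alpha=-0.25, h2_total=1.0):
    """Hypothetical helper: expected per-SNP share of heritability.

    Assumes E[h2_j] is proportional to w_j * [f_j (1 - f_j)]^(1 + alpha) * r_j,
    then rescales the shares so they sum to h2_total.
    """
    f = np.asarray(maf, dtype=float)         # minor allele frequencies f_j
    w = np.asarray(ld_weight, dtype=float)   # assumed precomputed LD weights w_j
    r = np.asarray(info, dtype=float)        # assumed genotype-certainty scores r_j
    raw = w * (f * (1.0 - f)) ** (1.0 + alpha) * r  # unnormalized contributions
    return h2_total * raw / raw.sum()        # normalize to the total heritability


maf = np.array([0.01, 0.1, 0.5])
ones = np.ones(3)

# GCTA-like assumption: alpha = -1, no LD or certainty weighting -> equal shares.
gcta_like = expected_snp_heritability(maf, ones, ones, alpha=-1.0)

# Sketch of a weighted model: down-weight a SNP in high LD (w = 0.5)
# and an uncertainly imputed SNP (r = 0.9).
weighted = expected_snp_heritability(maf, [1.0, 0.5, 1.0], [0.9, 1.0, 1.0])
```

Under α = -0.25 with equal weights, rarer SNPs receive smaller expected shares than common ones. The two weighting terms then shift heritability away from SNPs in regions of high LD and away from poorly imputed SNPs.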

Acknowledgements

Access to Wellcome Trust Case Control Consortium data was authorized as work related to the project “Genome-wide association study of susceptibility and clinical phenotypes in epilepsy,” while access to Children's Hospital of Philadelphia (CHOP) data was granted under Project 49228-1, “Assumptions underlying estimates of SNP heritability.” We thank A. Molloy, J. Mills and L. Brody for permission to use genotype data from the Trinity College Dublin Student Study and S. Langley for help accessing the CHOP data. This work is funded by the UK Medical Research Council under grant MR/L012561/1 (awarded to D.S.) and the British Heart Foundation under grant RG/10/12/28456 (the UCLEB Consortium) and is supported by researchers at the National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre. N.C. is an ESPOD Fellow from the European Molecular Biology Laboratory, European Bioinformatics Institute, and Wellcome Trust Sanger Institute. M.R.J. receives funding from the Imperial College NIHR Biomedical Research Centre (BRC) Scheme. S.N. is a Wellcome Trust Senior Research Fellow in Basic Biomedical Science and is also supported by the NIHR Cambridge Biomedical Research Centre. Analyses were performed with the use of the UCL Computer Science Cluster and the help of the CS Technical Support Group, as well as the use of the UCL Legion High-Performance Computing Facility (Legion@UCL) and associated support services.

Author information

Affiliations

  1. UCL Genetics Institute, University College London, London, UK.

    • Doug Speed
    • David J Balding
  2. Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK.

    • Na Cai
  3. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.

    • Na Cai
  4. Division of Brain Science, Imperial College London, London, UK.

    • Michael R Johnson
  5. Department of Medicine, University of Cambridge, Cambridge, UK.

    • Sergey Nejentsev
  6. Centre for Systems Genomics, School of BioSciences, and School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria, Australia.

    • David J Balding

Consortia

  1. the UCLEB Consortium

    A full list of members and affiliations appears in the Supplementary Note.

Contributions

D.S. and N.C. performed the analyses. D.S. and D.J.B. wrote the manuscript with assistance from N.C., M.R.J., S.N. and members of the UCLEB Consortium.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Doug Speed.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–25 and Supplementary Tables 1–12

About this article

DOI

https://doi.org/10.1038/ng.3865
