Partitioning heritability by functional annotation using genome-wide association summary statistics

Journal name: Nature Genetics
Volume: 47
Pages: 1228–1235
DOI: doi:10.1038/ng.3404

Abstract

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type–specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease–specific enrichment of heritability in FANTOM5 enhancers and many cell type–specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.
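As a rough illustration of the regression idea behind stratified LD score regression, the sketch below assumes the commonly stated model in which the expected χ² statistic of SNP j equals N Σ_C τ_C ℓ(j, C) plus an intercept, where ℓ(j, C) is the LD score of SNP j with respect to category C and τ_C is the per-SNP heritability contribution of category C. It is not the authors' implementation: the function names (solve_linear, fit_tau, simulate), the two toy categories and all numeric values are hypothetical, the data are simulated with small Gaussian noise, and the fit is plain unweighted least squares with no standard-error estimation.

```python
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_tau(chi2, ld_scores, n_samples):
    """Unweighted least-squares fit of chi2_j = N * sum_C tau_C * l(j, C) + intercept.

    ld_scores[j][C] is the LD score of SNP j with respect to category C.
    Returns (per-category tau estimates, intercept).
    """
    rows = [list(scores) + [1.0] for scores in ld_scores]  # add intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, chi2)) for i in range(k)]
    coef = solve_linear(xtx, xty)
    # The fitted slopes estimate N * tau_C, so divide by N to recover tau_C.
    return [slope / n_samples for slope in coef[:-1]], coef[-1]


def simulate(true_tau, n_samples, n_snps=5000, noise_sd=0.1, seed=0):
    """Simulate toy LD scores for two categories and noisy chi-square-like values."""
    rng = random.Random(seed)
    ld_scores, chi2 = [], []
    for _ in range(n_snps):
        scores = [rng.uniform(0, 100), rng.uniform(0, 50)]  # hypothetical LD scores
        mean = n_samples * sum(t * l for t, l in zip(true_tau, scores)) + 1.0
        ld_scores.append(scores)
        chi2.append(mean + rng.gauss(0, noise_sd))
    return chi2, ld_scores


if __name__ == "__main__":
    chi2, ld = simulate(true_tau=[2e-7, 1e-7], n_samples=50_000)
    tau_hat, intercept = fit_tau(chi2, ld, n_samples=50_000)
    print(tau_hat, intercept)
```

Real association statistics are far noisier than these toy values, so a sketch like this only shows how per-category coefficients can be read off a regression on LD scores from summary statistics alone.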

Figures

  1. Simulation results for null calibration and power.

    We simulated genetic architectures with positive total SNP heritability, with and without functional enrichment, for two values of pcausal and a range of values of Nhg² (sample size times SNP heritability). (a) Proportion of simulations in which a null hypothesis of no functional enrichment is rejected, as a function of Nhg² and pcausal. (b) The z score of total SNP heritability depends on Nhg² and pcausal but does not depend on the presence or absence of functional enrichment. (c) Proportion of simulations in which a null hypothesis of no functional enrichment is rejected, as a function of the z score of total SNP heritability. Here the z score of total SNP heritability for pcausal = 0.005 did not exceed 7.3, even at maximum Nhg².

  2. Simulation results for model misspecification.

    Enrichment is the proportion of heritability in DHS regions divided by the proportion of SNPs in DHS regions. Bars show 95% confidence intervals around the mean of 100 trials. (a) From left to right, the simulated genetic architectures are 1× DHS enrichment, 3× DHS enrichment and 5.5× DHS enrichment (100% of heritability from DHS SNPs). (b) From left to right, the simulated genetic architectures are 200-bp flanking regions causal, coding regions causal and FANTOM5 enhancer regions causal. For simulations with coding region or FANTOM5 enhancer as the causal category, we removed the causal category and the 500-bp window around that category from the full baseline model to simulate enrichment in an unknown functional category.

  3. Simulation results for ranking cell type groups and cell types.

    For each cell type group, 500 simulations were performed with baseline enrichment and either realistic enrichment or low enrichment in that cell type group. Results for the left two columns are aggregated over the ten cell type groups; results for individual groups are displayed in Supplementary Figure 5. The right two columns represent 500 simulations each of realistic or low enrichment of a single cell type–specific annotation—H3K4me3 in fetal brain cells.

  4. Enrichment estimates for the 24 main annotations, averaged over nine independent traits.

    Annotations are ordered by size. Error bars represent jackknife standard errors around the estimates of enrichment, and an asterisk indicates significance at P < 0.05 after Bonferroni correction for the 24 hypotheses tested. Negative point estimates, significance testing and the choice of nine independent traits are discussed in the Online Methods and Supplementary Note. TFBS, transcription factor binding site. DGF, digital genomic footprint.

  5. Enrichment estimates for selected annotations and traits.

    Error bars represent jackknife standard errors (s.e.) around the estimates of enrichment.

  6. Enrichment of cell type groups.

    We report the significance of enrichment for each of ten cell type groups for each of 11 traits. The black dashed line at −log10 (P) = 3.5 is the cutoff for Bonferroni significance. The gray dashed line at −log10 (P) = 2.1 is the cutoff for FDR < 0.05. For HDL, three of the top individual cell types are adipose nuclei, which explains the enrichment of the 'other' category.

  7. Comparison of stratified LD score regression to other methods for identifying enriched cell types.

    In null simulations, there is no enrichment. In null (baseline enrichment) simulations, there is enrichment in the baseline categories, some of which overlap the cell type or cell type group, but no additional enrichment in the cell type or cell type group. In the true enrichment simulations, there is enrichment in either the CNS cell type group (top) or the fetal brain cell type (bottom). In all simulations, N = 14,000 and hg² = 0.7. We report the proportion of 100 simulations in which the null model is rejected for six methods: GoShifter (ref. 6), fgwas (ref. 9), Top SNPs (ref. 10), PICS (ref. 7), stratified LD score (unadjusted) and stratified LD score. Stratified LD score (unadjusted) refers to total unadjusted enrichment, that is, (proportion of heritability)/(proportion of SNPs); stratified LD score refers to the coefficient τ of the category, controlling for all other categories in the model.
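Three quantities recur in the figure captions above: enrichment as a ratio of proportions, a Bonferroni cutoff for a stated number of hypotheses, and jackknife standard errors. The sketch below is a minimal illustration of each, with hypothetical function names and toy inputs. The captions do not spell out the jackknife procedure, so a generic delete-one jackknife is assumed here and may differ in detail from what the authors used.

```python
import math


def enrichment(prop_h2, prop_snps):
    """Enrichment = proportion of heritability / proportion of SNPs."""
    return prop_h2 / prop_snps


def bonferroni_threshold(alpha, n_tests):
    """Per-test P value cutoff after Bonferroni correction for n_tests hypotheses."""
    return alpha / n_tests


def jackknife_se(leave_one_out):
    """Standard error from delete-one jackknife estimates (generic form, assumed)."""
    n = len(leave_one_out)
    mean = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((e - mean) ** 2 for e in leave_one_out))


if __name__ == "__main__":
    # Toy case: a category containing 2 of every 11 SNPs carries all heritability.
    print(enrichment(prop_h2=1.0, prop_snps=2 / 11))
    # Cutoff for 24 hypotheses tested at a family-wise level of 0.05.
    print(bonferroni_threshold(0.05, 24))
    # P value that corresponds to a plotted cutoff of -log10(P) = 3.5.
    print(10 ** -3.5)
    # Jackknife standard error of a mean, from leave-one-out means of toy data.
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    loo_means = [(sum(data) - x) / (len(data) - 1) for x in data]
    print(jackknife_se(loo_means))
```

The toy enrichment case mirrors the caption's point that a category holding 100% of heritability has an enrichment equal to the reciprocal of its share of SNPs.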

References

  1. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
  2. Stahl, E.A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 44, 483–489 (2012).
  3. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  4. Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
  5. Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
  6. Trynka, G. et al. Disentangling effects of colocalizing genomic annotations to functionally prioritize non-coding variants within complex trait loci. Am. J. Hum. Genet. 97, 139–152 (2015).
  7. Farh, K.K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
  8. Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).
  9. Pickrell, J.K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
  10. Maurano, M.T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
  11. Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
  12. Lee, S.H. et al. Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat. Genet. 44, 247–250 (2012).
  13. Davis, L.K. et al. Partitioning the heritability of Tourette syndrome and obsessive compulsive disorder reveals differences in genetic architecture. PLoS Genet. 9, e1003864 (2013).
  14. Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).
  15. Yang, J. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet. 19, 807–812 (2011).
  16. Bulik-Sullivan, B.K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
  17. Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
  18. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
  19. Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013).
  20. Hoffman, M.M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 41, 827–841 (2013).
  21. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
  22. Ward, L.D. & Kellis, M. Evidence of abundant purifying selection in humans for recently-acquired regulatory functions. Science 337, 1675–1678 (2012).
  23. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
  24. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
  25. Speliotes, E.K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
  26. Perry, J.R. et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature 514, 92–97 (2014).
  27. Teslovich, T.M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
  28. Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).
  29. Morris, A.P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).
  30. Manning, A.K. et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nat. Genet. 44, 659–669 (2012).
  31. Psychiatric GWAS Consortium Bipolar Disorder Working Group. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat. Genet. 43, 977–983 (2011).
  32. Boraska, V. et al. A genome-wide association study of anorexia nervosa. Mol. Psychiatry 19, 1085–1094 (2014).
  33. Rietveld, C.A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).
  34. Tobacco and Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat. Genet. 42, 441–447 (2010).
  35. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
  36. Jostins, L. et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119–124 (2012).
  37. Stamatoyannopoulos, J.A. What does our genome encode? Genome Res. 22, 1602–1611 (2012).
  38. Pott, S. & Lieb, J.D. What are super-enhancers? Nat. Genet. 47, 8–12 (2015).
  39. Lilly, L.S. Pathophysiology of Heart Disease: A Collaborative Project of Medical Students and Faculty (Lippincott Williams & Wilkins, 2012).
  40. Kettyle, W.M. & Arky, R.A. Endocrine Pathophysiology (Lippincott Williams & Wilkins, 1998).
  41. Parker, S.C.J. et al. Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants. Proc. Natl. Acad. Sci. USA 110, 17921–17926 (2013).
  42. Pasquali, L. et al. Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants. Nat. Genet. 46, 136–143 (2014).
  43. Wang, W. et al. The Th17/Treg imbalance and cytokine environment in peripheral blood of patients with rheumatoid arthritis. Rheumatol. Int. 32, 887–893 (2012).
  44. Farooqi, I.S. Defining the neural basis of appetite and obesity: from genes to behaviour. Clin. Med. 14, 286–289 (2014).
  45. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. doi:10.1038/ng.3406 (28 September 2015).
  46. International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
  47. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
  48. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 446, 661–663 (2007).
  49. Liu, D.J. et al. Meta-analysis of gene-level tests for rare variant association. Nat. Genet. 46, 200–204 (2014).

Author information

  1. These authors contributed equally to this work.

    • Hilary K Finucane &
    • Brendan Bulik-Sullivan
  2. These authors jointly supervised this work.

    • Benjamin M Neale &
    • Alkes L Price

Affiliations

  1. Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Hilary K Finucane
  2. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

    • Hilary K Finucane,
    • Alexander Gusev,
    • Po-Ru Loh,
    • Sara Lindstrom &
    • Alkes L Price
  3. Analytic and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.

    • Brendan Bulik-Sullivan,
    • Verneri Anttila,
    • Kyle Farh,
    • Stephan Ripke,
    • Mark J Daly &
    • Benjamin M Neale
  4. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Brendan Bulik-Sullivan,
    • Verneri Anttila,
    • Stephan Ripke,
    • Mark J Daly &
    • Benjamin M Neale
  5. Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.

    • Gosia Trynka,
    • Shaun Purcell &
    • Soumya Raychaudhuri
  6. Division of Rheumatology, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.

    • Gosia Trynka,
    • Shaun Purcell &
    • Soumya Raychaudhuri
  7. Partners Center for Personalized Genetic Medicine, Boston, Massachusetts, USA.

    • Gosia Trynka &
    • Soumya Raychaudhuri
  8. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Gosia Trynka,
    • Verneri Anttila,
    • Soumya Raychaudhuri,
    • Mark J Daly,
    • Nick Patterson,
    • Benjamin M Neale &
    • Alkes L Price
  9. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK.

    • Gosia Trynka
  10. Department of Computer Science, Harvard University, Cambridge, Massachusetts, USA.

    • Yakir Reshef
  11. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

    • Han Xu &
    • Chongzhi Zang
  12. Epigenomics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Kyle Farh
  13. Medical Research Council (MRC) Epidemiology Unit, University of Cambridge School of Clinical Medicine, Institute of Metabolic Science, Cambridge Biomedical Campus, Cambridge, UK.

    • Felix R Day &
    • John R B Perry
  14. Department of Psychiatry, Mount Sinai School of Medicine, New York, New York, USA.

    • Shaun Purcell &
    • Eli Stahl
  15. Department of Human Genetics and Disease Diversity, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan.

    • Yukinori Okada
  16. Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.

    • Yukinori Okada
  17. Faculty of Medical and Human Sciences, University of Manchester, Manchester, UK.

    • Soumya Raychaudhuri

Consortia

  1. ReproGen Consortium

    A full list of members appears in the Supplementary Note.

  2. Schizophrenia Working Group of the Psychiatric Genomics Consortium

    A full list of members appears in the Supplementary Note.

  3. The RACI Consortium

    A full list of members appears in the Supplementary Note.

Contributions

H.K.F., B.B.-S., A.G., G.T., Y.R., P.-R.L., V.A., S. Raychaudhuri, M.J.D., N.P., B.M.N. and A.L.P. conceived and designed the experiments. H.K.F. and B.B.-S. performed the experiments, performed the statistical analysis and analyzed the data. H.X., C.Z., K.F., S. Ripke, F.R.D., S.P., E.S., S.L., J.R.B.P. and Y.O. contributed reagents. H.K.F., B.B.-S., B.M.N. and A.L.P. wrote the manuscript with feedback from all authors.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (3,129 KB)

    Supplementary Figures 1–9, Supplementary Tables 1–8 and Supplementary Note.