Protocol

Chromatin-state discovery and genome annotation with ChromHMM

Nature Protocols volume 12, pages 2478–2492 (2017)

Abstract

Noncoding DNA regions have central roles in human biology, evolution, and disease. ChromHMM helps to annotate the noncoding genome using epigenomic information across one or multiple cell types. It combines multiple genome-wide epigenomic maps and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type. ChromHMM learns chromatin-state signatures using a multivariate hidden Markov model (HMM) that explicitly models the combinatorial presence or absence of each mark. ChromHMM uses these signatures to generate a genome-wide annotation for each cell type by calculating the most probable state for each genomic segment. ChromHMM provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretation of each chromatin state. ChromHMM is distinguished by its modeling emphasis on combinations of marks, its tight integration with downstream functional enrichment analyses, its speed, and its ease of use. Chromatin states are learned, annotations are produced, and enrichments are computed within 1 d.
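The abstract names three computations: an emission model over the presence or absence of each mark, assignment of the most probable state to each genomic segment, and enrichment of states for external annotations. The sketch below illustrates them in plain Python under stated assumptions. Input is binarized marks in fixed-size bins (ChromHMM uses 200-bp bins by default). Marks are treated as independent Bernoulli variables given the state. The parameters are set by hand rather than learned. ChromHMM itself is a Java program that learns its parameters with expectation-maximization, and the function names here (`bernoulli_emission`, `posterior_states`, `segment`, `fold_enrichment`) are hypothetical and not part of its interface. "Most probable state" is assumed to mean the maximum-posterior state of each bin. The enrichment is assumed to be a simple bin-level fold enrichment.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def bernoulli_emission(x: Sequence[int], p: Sequence[float]) -> float:
    """P(x | state) for a binarized mark vector x, treating marks as independent
    Bernoulli variables given the state (p[m] = P(mark m present | state))."""
    prob = 1.0
    for x_m, p_m in zip(x, p):
        prob *= p_m if x_m == 1 else 1.0 - p_m
    return prob


def posterior_states(obs: Sequence[Sequence[int]], init: Sequence[float],
                     trans: Matrix, emit: Matrix) -> Matrix:
    """Scaled forward-backward; returns post[t][k] = P(state k at bin t | all bins)."""
    n, K = len(obs), len(init)
    e = [[bernoulli_emission(o, emit[k]) for k in range(K)] for o in obs]

    # Forward pass, rescaling each bin so alpha sums to 1 (avoids underflow).
    alpha: Matrix = []
    scale: List[float] = []
    prev: List[float] = []
    for t in range(n):
        if t == 0:
            row = [init[k] * e[0][k] for k in range(K)]
        else:
            row = [e[t][k] * sum(prev[j] * trans[j][k] for j in range(K))
                   for k in range(K)]
        c = sum(row)
        prev = [a / c for a in row]
        alpha.append(prev)
        scale.append(c)

    # Backward pass, reusing the forward pass's per-bin scaling factors.
    beta: Matrix = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[j][k] * e[t + 1][k] * beta[t + 1][k] for k in range(K))
                   / scale[t + 1] for j in range(K)]

    # Combine forward and backward terms and renormalize each bin.
    post: Matrix = []
    for t in range(n):
        row = [alpha[t][k] * beta[t][k] for k in range(K)]
        z = sum(row)
        post.append([r / z for r in row])
    return post


def segment(post: Matrix) -> List[int]:
    """Assign each bin to its maximum-posterior state."""
    return [max(range(len(row)), key=row.__getitem__) for row in post]


def fold_enrichment(states: Sequence[int], feature: Sequence[int], state: int) -> float:
    """(fraction of feature bins in `state`) / (fraction of all bins in `state`)."""
    n = len(states)
    in_state = sum(1 for s in states if s == state)
    n_feature = sum(1 for f in feature if f)
    overlap = sum(1 for s, f in zip(states, feature) if s == state and f)
    if in_state == 0 or n_feature == 0:
        return float("nan")
    # Algebraically (overlap / n_feature) / (in_state / n), kept in integers.
    return (overlap * n) / (n_feature * in_state)


# Toy example with hand-set (not learned) parameters for two hypothetical marks
# and two states: state 0 = "mark A usually present", state 1 = "background".
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.9, 0.05], [0.05, 0.05]]
OBS = [[0, 0]] * 3 + [[1, 0]] * 3 + [[0, 0]] * 3
FEATURE = [0, 0, 0, 1, 1, 0, 0, 0, 0]  # hypothetical annotation, e.g. TSS bins

if __name__ == "__main__":
    states = segment(posterior_states(OBS, INIT, TRANS, EMIT))
    print(states)                               # [1, 1, 1, 0, 0, 0, 1, 1, 1]
    print(fold_enrichment(states, FEATURE, 0))  # 3.0
```

In the toy run, the three bins carrying mark A are assigned to state 0. Both hypothetical feature bins fall inside those three bins, which gives a threefold enrichment for state 0. Rescaling the forward and backward passes per bin keeps the probabilities from underflowing on long sequences. A genome-scale implementation would more likely work in log space or vectorize these loops.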

Acknowledgements

We acknowledge the ENCODE and Roadmap Epigenomics consortia for generation and processing of data to which we have previously applied ChromHMM. We acknowledge the users of ChromHMM who have provided useful feedback on the software. We acknowledge funding from U.S. National Institutes of Health grants U54HG004570, RC1HG005334 (M.K.), R01ES024995, U01HG007912 and U01MH105578 (J.E.); a U.S. National Science Foundation Postdoctoral Fellowship (0905968) and CAREER Award 1254200 (J.E.); and an Alfred P. Sloan Fellowship (J.E.).

Author information

Affiliations

  1. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  2. Department of Computer Science, University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  3. Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  4. Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  5. Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  6. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Manolis Kellis
  7. MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA.

    • Manolis Kellis

Contributions

J.E. and M.K. wrote this protocol and previously developed ChromHMM.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Jason Ernst or Manolis Kellis.

About this article

DOI

https://doi.org/10.1038/nprot.2017.124
