Discovery and characterization of chromatin states for systematic annotation of the human genome

Abstract

A plethora of epigenetic modifications have been described in the human genome and shown to play diverse roles in gene regulation, cellular differentiation and the onset of disease. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. Here, we use a multivariate Hidden Markov Model to reveal 'chromatin states' in human T cells, based on recurrent and spatially coherent combinations of chromatin marks. We define 51 distinct chromatin states, including promoter-associated, transcription-associated, active intergenic, large-scale repressed and repeat-associated states. Each chromatin state shows specific enrichments in functional annotations, sequence motifs and specific experimentally observed characteristics, suggesting distinct biological roles. This approach provides a complementary functional annotation of the human genome that reveals the genome-wide locations of diverse classes of epigenetic function.
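
The idea of assigning each genomic interval to a hidden 'chromatin state' from its combination of marks can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes each interval is summarized as a binary present/absent vector over marks, that marks are emitted independently given the state, and that the parameters are already known. A real analysis would instead learn the parameters from data, for example by expectation-maximization. The mark names, state labels and all probabilities are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical toy model: 2 states and 3 marks. Every number here is invented
# for illustration and is not taken from the paper.
MARKS = ["H3K4me3", "H3K36me3", "H3K27me3"]
STATES = ["promoter-like", "repressed-like"]

# EMISSION[s][m] = P(mark m is present | state s)
EMISSION = [
    [0.90, 0.20, 0.05],
    [0.05, 0.05, 0.80],
]
# TRANSITION[i][j] = P(next interval is in state j | current interval is in state i)
TRANSITION = [
    [0.9, 0.1],
    [0.1, 0.9],
]
INITIAL = [0.5, 0.5]


def log_emission(state, observed):
    """Log-probability of a binary mark vector, assuming independent marks."""
    total = 0.0
    for p, present in zip(EMISSION[state], observed):
        total += math.log(p if present else 1.0 - p)
    return total


def viterbi(observations):
    """Return the most likely state index for each interval (Viterbi decoding)."""
    n_states = len(STATES)
    scores = [
        math.log(INITIAL[s]) + log_emission(s, observations[0])
        for s in range(n_states)
    ]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j at this interval.
            best_prev = max(
                range(n_states),
                key=lambda i: scores[i] + math.log(TRANSITION[i][j]),
            )
            pointers.append(best_prev)
            new_scores.append(
                scores[best_prev]
                + math.log(TRANSITION[best_prev][j])
                + log_emission(j, obs)
            )
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Five consecutive intervals, each a (H3K4me3, H3K36me3, H3K27me3) presence vector.
bins = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 0, 1), (0, 0, 0)]
labels = [STATES[s] for s in viterbi(bins)]
```

Note that the last interval carries no marks at all, yet the decoder still places it in the same state as its neighbors because the transition probabilities favor staying in a state. This is the sense in which such a model captures spatially coherent combinations rather than classifying each interval in isolation.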

Figure 1: Example of chromatin state annotation.
Figure 2: Chromatin state definition and functional interpretation.
Figure 3: Promoter and transcribed chromatin states show distinct functional and positional enrichments.
Figure 4: SNP and GWAS enrichments for chromatin states.
Figure 5: Discovery power of chromatin states for genome annotation.
Figure 6: Recovery of chromatin states with subsets of marks.

Acknowledgements

We thank P. Kheradpour for regulatory motif instances and M.F. Lin for predicted new exons. We thank M. Garber, A. Siepel, K. Lindblad-Toh, and E. Lander for use of comparative information on 29 mammals. We thank B. Bernstein, N. Shoresh, C. Epstein and T. Mikkelsen for helpful discussions. We thank L. Goff, C. Bristow, R. Sealfon and all members of the MIT CompBio Group for comments, feedback and support. This material is based upon work supported by the National Science Foundation under award no. 0905968 and funding from the US National Human Genome Research Institute (NHGRI) under awards U54-HG004570 and RC1-HG005334.

Author information

J.E. and M.K. developed the method, analyzed results and wrote the paper.

Correspondence to Manolis Kellis.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1 and 2, Supplementary Notes and Supplementary Figs. 1–41 (PDF 5184 kb)
