Genome-wide mapping of 5-hydroxymethylcytosine in embryonic stem cells

Journal name: Nature
Volume: 473
Pages: 394–397
DOI: 10.1038/nature10102

5-hydroxymethylcytosine (5hmC) is a modified base present at low levels in diverse cell types in mammals (refs 1–5). 5hmC is generated by the TET family of Fe(II) and 2-oxoglutarate-dependent enzymes through oxidation of 5-methylcytosine (5mC; refs 1, 2, 4–7). 5hmC and TET proteins have been implicated in stem cell biology and cancer (refs 1, 4, 5, 8, 9), but information on the genome-wide distribution of 5hmC is limited. Here we describe two novel and specific approaches to profile the genomic localization of 5hmC. The first approach, termed GLIB (glucosylation, periodate oxidation, biotinylation), uses a combination of enzymatic and chemical steps to isolate DNA fragments containing as few as a single 5hmC. The second approach involves conversion of 5hmC to cytosine 5-methylenesulphonate (CMS) by treatment of genomic DNA with sodium bisulphite, followed by immunoprecipitation of CMS-containing DNA with a specific antiserum to CMS (ref. 5). High-throughput sequencing of 5hmC-containing DNA from mouse embryonic stem (ES) cells showed strong enrichment within exons and near transcriptional start sites. 5hmC was especially enriched at the start sites of genes whose promoters bear dual histone 3 lysine 27 trimethylation (H3K27me3) and histone 3 lysine 4 trimethylation (H3K4me3) marks. Our results indicate that 5hmC has a probable role in transcriptional regulation, and suggest a model in which 5hmC contributes to the ‘poised’ chromatin signature found at developmentally regulated genes in ES cells.

Figures

  1. Figure 1: Comparison of 5hmC enrichment methods.

    a, The GLIB method. Glucose is added to 5hmC by BGT, oxidized with sodium periodate to yield aldehydes, and reacted with the aldehyde reactive probe (ARP), yielding two biotins at the site of every 5hmC. b, 5hmC is converted to CMS by sodium bisulphite. c–f, Precipitation of PCR amplicons containing (1) varying amounts of 5hmC by GLIB methodology (c), anti-CMS methodology (d), or anti-5hmC antibody (e); or (2) varying amounts of 5mC by anti-5mC antibody (f). pAb, polyclonal antibody; mAb, monoclonal antibody. Between 1 and 6 independent experiments per method; mean percentage of input precipitated ± s.d. is indicated. g, Overlap between HERGs identified by the GLIB and anti-CMS methodologies. Left panel, number of HERGs; right panel, number of base pairs contained within HERGs.

  2. Figure 2: Genomic distribution of 5hmC or 5mC enriched regions of the genome.

    a, Correlation of HERG or MERG density on each chromosome (y-axis) with gene density in the same chromosome (x-axis). Density is defined as frequency divided by chromosome length. b, c, Both HERGs and MERGs are enriched in transcribed regions (b), whereas HERGs are preferentially enriched at enhancers and the start sites of genes (c). The percentage of HERGs or MERGs mapping to the indicated genomic feature (darker bar) is compared with the percentage of randomly chosen sequences mapping to that feature (lighter bar). 5′ UTR, 5′ untranslated region. TSS, transcription start site (−800 bp to +200 bp relative to start of transcription). See Supplementary Methods for a detailed definition of how HERGs or MERGs were classified as mapping to genomic features. d, Distribution of HERGs and MERGs relative to the TSS. The centre of each HERG was plotted relative to the nearest TSS in 1,000-bp increments from −10 kb to +10 kb surrounding the TSS.

  3. Figure 3: Properties of HERGs at transcription start sites.

    a, The percentage of genes with 5hmC at the TSS (blue and red bars) reported to contain histone H3 trimethylation (left) or PRC components (right) at their promoters is compared to the fraction of all genes (grey bars) with these promoter marks (ref. 22). Number of genes in each category is indicated. b, HERGs are enriched at the TSSs of genes with low expression in ES cells. All genes were ranked by level of expression in ES cells (ref. 21) and sorted into deciles from lowest to highest. The percentage of genes within each decile category with 5hmC enriched at the TSS (left) or within gene bodies (right) is shown for each methodology. The first five deciles, which comprise genes lacking statistically significant expression, are pooled and averaged in this analysis. c, HERGs are enriched at the TSS of genes upregulated upon differentiation to embryoid bodies (EB; ref. 26). The percentage of genes with 5hmC at their TSS (blue bars) that are substantially upregulated or downregulated upon differentiation to EB is compared with the percentage of total genes similarly regulated (grey bars). Number of genes in each category is indicated. d, Overlap between genes with 5hmC at the TSS and genes positively or negatively regulated by Tet1 (ref. 8).
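
Two of the summaries described in the legends above, the TSS-relative histogram of Fig. 2d and the expression-decile breakdown of Fig. 3b, can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their procedure is given in the Supplementary Methods). The function names, the single-chromosome input, the strand-aware sign convention and the choice to pool deciles 1 to 5 by gene count are all assumptions made here for illustration.

```python
from bisect import bisect_left
from collections import Counter

BIN_SIZE = 1_000    # bp per bin, as stated in the legend to Fig. 2d
MAX_DIST = 10_000   # plotted range is -10 kb to +10 kb


def nearest_tss_offset(centre, tss_positions, tss_strands):
    """Signed distance (bp) from a region centre to the nearest TSS.

    Assumes one chromosome, a non-empty tss_positions list sorted in
    ascending order, and tss_strands[i] equal to '+' or '-'.
    Negative values mean the centre lies upstream of the TSS.
    """
    i = bisect_left(tss_positions, centre)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    j = min(candidates, key=lambda k: abs(tss_positions[k] - centre))
    offset = centre - tss_positions[j]
    return offset if tss_strands[j] == "+" else -offset


def bin_offsets(offsets):
    """Count offsets per 1,000-bp bin; keys are the lower edge of each bin."""
    counts = Counter()
    for off in offsets:
        if -MAX_DIST <= off < MAX_DIST:
            counts[(off // BIN_SIZE) * BIN_SIZE] += 1
    return counts


def percent_marked_by_decile(expression, marked):
    """Percentage of genes in each expression decile that are in `marked`.

    expression maps gene name to expression level; marked is the set of
    genes with an enriched region at the TSS. Deciles 1-5 are pooled into
    a single '1-5' category (an assumption about how pooling is done).
    """
    genes = sorted(expression, key=expression.get)
    n = len(genes)
    totals, hits = Counter(), Counter()
    for rank, gene in enumerate(genes):
        decile = rank * 10 // n + 1          # 1 = lowest expression
        label = "1-5" if decile <= 5 else str(decile)
        totals[label] += 1
        hits[label] += gene in marked
    return {label: 100.0 * hits[label] / totals[label] for label in totals}
```

Binning by the lower edge of each interval keeps upstream and downstream offsets symmetric around zero, so an offset of −500 bp falls in the bin starting at −1,000 and an offset of +500 bp in the bin starting at 0.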

Accession codes

Primary accessions

Gene Expression Omnibus

References

  1. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009)
  2. Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324, 929–930 (2009)
  3. Szwagierczak, A., Bultmann, S., Schmidt, C. S., Spada, F. & Leonhardt, H. Sensitive enzymatic quantification of 5-hydroxymethylcytosine in genomic DNA. Nucleic Acids Res. 38, e181 (2010)
  4. Ito, S. et al. Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466, 1129–1133 (2010)
  5. Ko, M. et al. Impaired hydroxylation of 5-methylcytosine in myeloid cancers with mutant TET2. Nature 468, 839–843 (2010)
  6. Iyer, L. M., Tahiliani, M., Rao, A. & Aravind, L. Prediction of novel families of enzymes involved in oxidative and other complex modifications of bases in nucleic acids. Cell Cycle 8, 1698–1710 (2009)
  7. Loenarz, C. & Schofield, C. J. Oxygenase catalyzed 5-methylcytosine hydroxylation. Chem. Biol. 16, 580–583 (2009)
  8. Koh, K. P. et al. Tet1 and Tet2 regulate 5-hydroxymethylcytosine production and cell lineage specification in mouse embryonic stem cells. Cell Stem Cell 8, 200–213 (2011)
  9. Delhommeau, F. et al. Mutation in TET2 in myeloid cancers. N. Engl. J. Med. 360, 2289–2301 (2009)
  10. Zhang, H., Li, X. J., Martin, D. B. & Aebersold, R. Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nature Biotechnol. 21, 660–666 (2003)
  11. Song, C. X. et al. Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nature Biotechnol. 29, 68–72 (2011)
  12. Hayatsu, H. & Shiragami, M. Reaction of bisulfite with the 5-hydroxymethyl group in pyrimidines and in phage DNAs. Biochemistry 18, 632–637 (1979)
  13. Huang, Y. et al. The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing. PLoS ONE 5, e8888 (2010)
  14. Harris, T. D. et al. Single-molecule DNA sequencing of a viral genome. Science 320, 106–109 (2008)
  15. Bowers, J. et al. Virtual terminator nucleotides for next-generation DNA sequencing. Nature Methods 6, 593–595 (2009)
  16. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009)
  17. Saxonov, S., Berg, P. & Brutlag, D. L. A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc. Natl Acad. Sci. USA 103, 1412–1417 (2006)
  18. Feng, S. et al. Conservation and divergence of methylation patterning in plants and animals. Proc. Natl Acad. Sci. USA 107, 8689–8694 (2010)
  19. Creyghton, M. P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl Acad. Sci. USA doi:10.1073/pnas.1016071107 (24 November 2010)
  20. Boyer, L. A. et al. Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature 441, 349–353 (2006)
  21. Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnol. 28, 503–510 (2010)
  22. Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007)
  23. Ku, M. et al. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 4, e1000242 (2008)
  24. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008)
  25. Zhang, H. et al. TET1 is a DNA-binding protein that modulates DNA methylation and gene transcription via hydroxylation of 5-methylcytosine. Cell Res. 20, 1390–1393 (2010)
  26. Lee, T. I. et al. Control of developmental regulators by Polycomb in human embryonic stem cells. Cell 125, 301–313 (2006)
  27. Bernstein, B. E. et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006)
  28. Fouse, S. D. et al. Promoter CpG methylation contributes to ES cell gene regulation in parallel with Oct4/Nanog, PcG complex, and histone H3 K4/K27 trimethylation. Cell Stem Cell 2, 160–169 (2008)

Author information

  1. These authors contributed equally to this work.

    • William A. Pastor,
    • Utz J. Pape &
    • Yun Huang

Affiliations

  1. Harvard Medical School, Immune Disease Institute and Program in Cellular and Molecular Medicine, Children’s Hospital Boston, Boston, Massachusetts 02115, USA

    • William A. Pastor,
    • Utz J. Pape,
    • Hope R. Henderson,
    • Mamta Tahiliani &
    • Anjana Rao
  2. La Jolla Institute for Allergy & Immunology, La Jolla, California 92037, USA

    • William A. Pastor,
    • Yun Huang,
    • Hope R. Henderson,
    • Myunggon Ko,
    • Sahasransu Mahapatra &
    • Anjana Rao
  3. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, Massachusetts 02115, USA

    • Utz J. Pape &
    • X. Shirley Liu
  4. Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037, USA

    • Ryan Lister &
    • Joseph R. Ecker
  5. Division of Hematology/Oncology, Children’s Hospital Boston; Dana-Farber Cancer Institute; Harvard Stem Cell Institute, Boston, Massachusetts 02115, USA

    • Erin M. McLoughlin,
    • George Q. Daley &
    • Suneet Agarwal
  6. Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA

    • Yevgeny Brudno
  7. Helicos BioSciences Corporation, Cambridge, Massachusetts 02139, USA

    • Philipp Kapranov &
    • Patrice M. Milos
  8. Present address: Department of Biochemistry, New York University Langone Medical Centre, New York, New York 10016, USA.

    • Mamta Tahiliani

Contributions

W.A.P., Y.B. and S.A. devised the GLIB method. W.A.P., S.A., H.R.H. and E.M.M. optimized the GLIB method. Y.H. generated the anti-CMS antiserum, and Y.H. and W.A.P. optimized the anti-CMS pull-down. W.A.P. and Y.H. grew ES cells. W.A.P. prepared GLIB samples for sequencing, Y.H. prepared CMS samples and H.R.H. performed MeDIPs. Helicos sequencing and mapping were performed by P.K. and P.M.M., Illumina sequencing and mapping were performed by R.L. and J.R.E., and U.J.P. was responsible for bioinformatic analysis. M.K. performed the anti-5hmC dot blot. W.A.P. and M.T. performed anti-5hmC pull-downs. H.R.H. and S.M. performed and optimized in vitro tests of Tet substrate specificity. W.A.P., S.A. and A.R. wrote the manuscript. S.A. and A.R. coordinated research.

Competing financial interests

P.K. and P.M.M. are employees of Helicos BioSciences.

Corresponding authors

Correspondence to:

Data have been deposited at GEO under accession number GSE28682.

Supplementary information

PDF files

  1. nature10102-s1.pdf Supplementary Information (1.1M)

    This file contains Supplementary Figures 1–5 with legends, Supplementary Methods, additional references and Supplementary Tables 1–3.

Excel files

  1. Supplementary Table 4. GLIB Peak Annotation. (21.8M)

    This table shows the location of every GLIB HERG and the genomic features it is enclosed within (or in the case of exons, transcription start sites, or enhancers, touches).

  2. Supplementary Table 6. CMS Peak Annotation. (22.6M)

    This table shows the location of every CMS HERG and the genomic features it is enclosed within (or in the case of exons, transcription start sites, or enhancers, touches).

  3. Supplementary Table 8. 5mC Peak Annotation. (20.9M)

    This table shows the location of every MERG and the genomic features it is enclosed within (or in the case of exons, transcription start sites, or enhancers, touches).

  4. Supplementary Table 11. List of hydroxymethylated TSS genes (by GLIB). (4.6M)

    This table lists every gene that overlaps with a HERG (as determined by GLIB) at or immediately prior to the TSS (−800 bp to +200 bp). Featured are the RefSeq output, the promoter CpG class and histone methylation state (ref. 22), the expression decile in ES cells (ref. 21), the presence or absence of polycomb features at the promoter (ref. 23), the change in expression upon differentiation to embryoid bodies (ref. 20), and upregulation or downregulation in response to Tet1 depletion (ref. 8).

  5. Supplementary Table 12. List of hydroxymethylated TSS genes (by anti-CMS). (4.1M)

    This table lists every gene that overlaps with a HERG (as determined by anti-CMS precipitation) at or immediately prior to the TSS (−800 bp to +200 bp). Featured are the RefSeq output, the promoter CpG class and histone methylation state (ref. 22), the expression decile in ES cells (ref. 21), the presence or absence of polycomb features at the promoter (ref. 23), the change in expression upon differentiation to embryoid bodies (ref. 20), and upregulation or downregulation in response to Tet1 depletion (ref. 8).

  6. Supplementary Table 13. List of methylated TSS genes (by MeDIP). (644K)

    This table lists every gene that overlaps with a MERG at or immediately prior to the TSS (−800 bp to +200 bp). Featured are the RefSeq output, the promoter CpG class and histone methylation state (ref. 22), the expression decile in ES cells (ref. 21), the presence or absence of polycomb features at the promoter (ref. 23), the change in expression upon differentiation to embryoid bodies (ref. 20), and upregulation or downregulation in response to Tet1 depletion (ref. 8).
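
The gene lists above depend on whether an enriched region overlaps a window from 800 bp upstream to 200 bp downstream of the TSS. A minimal Python sketch of such a test follows. It is not the authors' code, and the function names, the 0-based half-open coordinate convention and the strand handling are assumptions made here for illustration.

```python
def tss_window(tss, strand, upstream=800, downstream=200):
    """Return the half-open genomic interval [start, end) around a TSS.

    Covers offsets -upstream .. +downstream-1 in the direction of
    transcription. For '-' strand genes, upstream lies at higher
    genomic coordinates, so the window is mirrored.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream + 1, tss + upstream + 1


def overlaps_tss(region_start, region_end, tss, strand):
    """True if the half-open region [region_start, region_end) touches the window."""
    w_start, w_end = tss_window(tss, strand)
    return region_start < w_end and w_start < region_end


def genes_with_marked_tss(genes, regions):
    """Names of genes whose TSS window overlaps at least one region.

    genes: iterable of (name, tss, strand); regions: iterable of
    (start, end) on the same chromosome. Quadratic, for clarity only.
    """
    regions = list(regions)
    return [name for name, tss, strand in genes
            if any(overlaps_tss(s, e, tss, strand) for s, e in regions)]
```

For genome-scale region sets, an interval index or a sorted sweep would replace the nested loop; the overlap test itself would stay the same.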

Text files

  1. Supplementary Table 5. GLIB Peak Locations. (4.9M)

    This table shows the location of each GLIB HERG in the Mus musculus (mm9) genome.

  2. Supplementary Table 7. CMS Peak Locations. (4.4M)

    This table shows the location of each CMS HERG in the Mus musculus (mm9) genome.

  3. Supplementary Table 9. 5mC Peak Locations. (2.5M)

    This table shows the location of each MERG in the Mus musculus (mm9) genome.

  4. Supplementary Table 9. Visualization of hoxb locus (3M)

    This file shows reads from the GLIB (reads.glib.hmc) and anti-CMS (reads.cms.hmc) precipitations. Also shown are the reads from the –BGT control (reads.glib.bg), the bisulphite treated input (reads.cms.bg), and the HERGs from each method (peaks.glib and peaks.cms).
