  • Brief Communication
  • Published:

Bisulfite-free, base-resolution analysis of 5-formylcytosine at the genome scale

This article has been updated

Abstract

Active DNA demethylation in mammals involves oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). However, genome-wide detection of 5fC at single-base resolution remains challenging. Here we present fC-CET, a bisulfite-free method for whole-genome analysis of 5fC based on selective chemical labeling of 5fC and subsequent C-to-T transition during PCR. Base-resolution 5fC maps showed limited overlap with 5hmC, with 5fC-marked regions more active than 5hmC-marked ones.

Figure 1: Cyclization labeling of 5fC and fC-CET.
Figure 2: fC-CET reveals base-resolution 5fC maps in the whole genome.
Figure 3: 5fC represents a more active marker than 5hmC.

Accession codes

Primary accessions

Gene Expression Omnibus

Change history

  • 21 September 2015

    In the version of this article initially published online, Chuan He is incorrectly affiliated with Tsinghua-Peking Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing, China, and is missing an affiliation with the Department of Chemical Biology, College of Chemistry and Molecular Engineering, Peking University, Beijing, China. This error has been corrected for the print, PDF and HTML versions of this article.

References

  1. Tahiliani, M. et al. Science 324, 930–935 (2009).

  2. Ito, S. et al. Science 333, 1300–1303 (2011).

  3. He, Y.F. et al. Science 333, 1303–1307 (2011).

  4. Song, C.X. & He, C. Trends Biochem. Sci. 38, 480–484 (2013).

  5. Bachman, M. et al. Nat. Chem. 6, 1049–1055 (2014).

  6. Raiber, E.A. et al. Genome Biol. 13, R69 (2012).

  7. Shen, L. et al. Cell 153, 692–706 (2013).

  8. Song, C.X. et al. Cell 153, 678–691 (2013).

  9. Maiti, A. & Drohat, A.C. J. Biol. Chem. 286, 35334–35338 (2011).

  10. Pfaffeneder, T. et al. Angew. Chem. Int. Ed. Engl. 50, 7008–7012 (2011).

  11. Booth, M.J., Marsico, G., Bachman, M., Beraldi, D. & Balasubramanian, S. Nat. Chem. 6, 435–440 (2014).

  12. Lu, X. et al. Cell Res. 25, 386–389 (2015).

  13. Wu, H., Wu, X., Shen, L. & Zhang, Y. Nat. Biotechnol. 32, 1231–1240 (2014).

  14. Booth, M.J. et al. Science 336, 934–937 (2012).

  15. Rivera, C.M. & Ren, B. Cell 155, 39–55 (2013).

  16. Neri, F. et al. Cell Rep. 10, 674–683 (2015).

  17. Song, C.X. et al. Nat. Biotechnol. 29, 68–72 (2011).

  18. Yu, M. et al. Cell 149, 1368–1380 (2012).

  19. Iurlaro, M. et al. Genome Biol. 14, R119 (2013).

  20. Spruijt, C.G. et al. Cell 152, 1146–1159 (2013).

  21. Wang, D. et al. Biochemistry 42, 6747–6753 (2003).

  22. Obeid, S. et al. EMBO J. 29, 1738–1747 (2010).

  23. Peng, X. et al. Nucleic Acids Res. 43, e35 (2015).

  24. Krueger, F. & Andrews, S.R. Bioinformatics 27, 1571–1572 (2011).

  25. Zhang, Y. et al. Genome Biol. 9, R137 (2008).

  26. Heinz, S. et al. Mol. Cell 38, 576–589 (2010).

  27. Xiao, S. et al. Cell 149, 1381–1392 (2012).

  28. Williams, K. et al. Nature 473, 343–348 (2011).

  29. Hon, G.C. et al. Mol. Cell 56, 286–297 (2014).

Acknowledgements

The authors thank R. Meng, S.T. Huang, J.Y. Liu, J.Y. Li, X.T. Shu, X.Y. Li and C.X. Zhu for technical assistance; X.X. Zhang and H.S. Guo (Peking University, Beijing, China) for providing genomic DNA at the beginning of the project; C.F. Xia for synthetic suggestions; and O. Stovicek for editing the manuscript. This work was supported by the National Basic Research Foundation of China (grant 2014CB964900 to C.Y.), the National Natural Science Foundation of China (grants 31270838 and 21472009 to C.Y.), and the US National Institutes of Health (grant R01 HG006827 to C.H.). C.H. is supported by the Howard Hughes Medical Institute.

Author information

Authors and Affiliations

Authors

Contributions

B.X. and C.Y. conceived the original idea and designed the experiments with the help of X.L. and C.H.; B.X. performed the experiments with the help of X.L., H.Z., M.L. and X.J.; D.H. performed bioinformatics analysis; Z.S. and A.Z. synthesized the chemicals; Q.Y. and W.X. helped with the library preparation; C.H. and C.Y. supervised the project; and B.X. and C.Y. wrote the manuscript with contributions from D.H., X.L. and C.H.

Corresponding authors

Correspondence to Chuan He or Chengqi Yi.

Ethics declarations

Competing interests

B.X., A.Z. and C.Y. are co-inventors on a filed patent (WO2015043493) for the labeling strategies and sequencing methods reported herein.

Integrated supplementary information

Supplementary Figure 1 Chemical labeling of 5fC via Friedländer reaction.

(a) Proposed chemical labeling of 5fC based on the principle of the Friedländer reaction (ref. 30). (b) Chemicals screened for 5fC labeling, with reaction yield indicated. 1, ethyl acetoacetate (EA). 2, methyl acetoacetate (MA). 3, ethyl 6-azido-3-oxohexanoate (EAO). 4, diethyl malonate (DM). 5, 1,3-indandione (I). 6, 5-(2-azidoethyl)-1,3-indandione (AI). 7, ethyl nitroacetate. 8, ethyl cyanoacetate. 9, acetylacetone. 10, benzoylacetone. 11, barbituric acid. 12, benzoylacetonitrile. n.d., no detectable products as measured by MALDI-TOF mass spectrometry.

Supplementary Figure 2 MALDI-TOF characterizations of chemical labeling of 5fC in a 9-mer model DNA.

Calculated and observed molecular weights are shown. The proposed chemical structures of labeling products of 5fC are also shown. *Because this reaction was performed in alkaline methanol solution, the ethyl ester underwent transesterification to form the methyl ester. 5fC-AA: reaction product between 5fC and ethyl or methyl acetoacetate. 5fC-EAO: reaction product between 5fC and ethyl 6-azido-3-oxohexanoate. 5fC-DM: reaction product between 5fC and diethyl malonate. 5fC-I: reaction product between 5fC and 1,3-indandione.

Supplementary Figure 3 HPLC analysis of 1,3-indandione labeling on 5fC-containing 9-mer model DNA.

9-mer 5fC and 5fC-I oligonucleotides were digested to nucleosides and analyzed by HPLC on a C18 column (260 nm and 310 nm). The 5fC peak was undetectable after the reaction, indicating full labeling. Authentic 5fC and 5fC-I nucleosides were also analyzed as positive controls.

Supplementary Figure 4 Properties of 5fC labeling products during PCR and/or bisulfite treatment.

(a) Labeling 5fC with diethyl malonate protects it from bisulfite-mediated deamination; the adduct is hence read as C in canonical bisulfite sequencing, similar to the previously reported fCAB-Seq (ref. 8). (b) The adduct between 5fC and 1,3-indandione results in C-to-T transition during PCR. (c) Single-nucleotide incorporation of dATP, dGTP, dCTP or dTTP opposite 5fC-I after 1, 2, 5, 10, 30 and 60 min. A template where X is a T was also included as a positive control. (d) Quantification of single-nucleotide incorporation. Values are presented as mean ± s.d. (n = 3). (e) Schematic comparison of fC-CET with fCAB-Seq.

Supplementary Figure 5 Potential base-pairing properties of different cyclic adducts.

(a) Cyclic adducts between 5fC and ethyl acetoacetate, diethyl malonate or ethyl 6-azido-3-oxohexanoate. These adducts contain a 4-NH group and are expected to pair with dG in a canonical Watson-Crick fashion. (b) Hypothesized base-pairing modes of 5fC-I with dA. Left: N3-protonated 5fC-I could base-pair with dA in the Watson-Crick manner. Right: alternatively, the 6-amino group of dA could form a bidentate hydrogen bond to 5fC-I. (c) Acid/base titration curves of 5fC-I solution (in DMSO:H2O = 5.5:1 solvent). The 13C NMR spectra of the free nucleoside alone and in the presence of 2 equivalents of hydrochloric acid (pH ~2) are shown in Supplementary Note 2. These data suggest that the free 5fC-I nucleoside was not protonated.

Supplementary Figure 6 Monitoring the efficiency of AI-mediated 5fC labeling on both model sequence and genomic DNA.

(a) AI labeling of 5fC generates a pair of isomers. HPLC chromatograms (260 nm and 310 nm) of the nucleosides digested from the 9-mer 5fC-AI oligonucleotides are shown, along with the split peaks of the two reaction products. (b) Dot-blot assay monitoring the efficiency of 5fC labeling on both model sequence (starting from 100 ng) and Tdg–/– mESC gDNA (starting from 1,000 ng). Lambda-DNA (λDNA, starting from 1,000 ng) was used as a negative control. (c) Matrix-assisted laser desorption–ionization time-of-flight (MALDI-TOF) mass spectrometry characterization of fC-CET in 9-mer 5fC oligonucleotides. Calculated and observed m/z are shown. (d) Agarose gel analysis of AI-labeled double-stranded 5fC model DNAs, indicating little degradation. Four replicates are shown. (e) Recovery efficiency for each step of fC-CET. 9-mer 5fC oligonucleotide, 5fC model DNA or gDNA samples were used to characterize the recovery efficiency. Values are presented as mean ± s.d. (n = 4 for oligonucleotide and model DNA; n = 5 for gDNA).

Supplementary Figure 7 AI-mediated labeling of 5fC is highly selective among cytosine derivatives.

9-mer DNAs with a central C (a), 5mC (b), 5hmC (c) and 5caC (d) were labeled under the same conditions as 5fC and then analyzed with MALDI-TOF. No cross-reactivity was observed for these cytosines.

Supplementary Figure 8 AI labeling of 5fC and subsequent C-to-T transition.

(a) Biotin conjugation for pulldown and DTT cleavage. (b) Sanger sequencing results for the fC-CET-treated 5fC-containing 76-mer DNA, showing the C-to-T transition introduced during PCR amplification. (c) FspI-digestion tests to confirm the C-to-T transition caused by AI-mediated 5fC labeling. PCR-amplified input 70-mer 5fC oligo with an FspI restriction site in the middle can be cleaved, whereas amplified products of 5fC-AI-Biotin or 5fC-AI-SH stay intact, suggesting loss of the restriction site and hence C-to-T transition. (d,e) Efficiency of 5fC labeling on a model DNA. T%, calculated from TOPO-cloning (d) or high-throughput sequencing using MiSeq (e), was used to measure the C-to-T transition rate of 5fC after AI labeling and AI-mediated pulldown. In both cases, a normal C next to the target 5fCpG was used as a control.
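
As a rough illustration of the T% readout in panels d and e, the sketch below computes the fraction of T calls at a single position from per-base read counts. The function name `t_fraction` and all counts are hypothetical and are not taken from the study; restricting the denominator to C and T calls is also an assumption.

```python
def t_fraction(base_counts):
    """Return the fraction of T calls among C and T calls at one position.

    base_counts maps a base ('A', 'C', 'G', 'T') to its read count.
    Assumption: only C and T calls enter the denominator, because the
    readout of interest is the C-to-T transition.
    """
    c = base_counts.get("C", 0)
    t = base_counts.get("T", 0)
    total = c + t
    if total == 0:
        raise ValueError("no C or T reads at this position")
    return t / total


# Hypothetical counts, for illustration only.
target_site = {"C": 20, "T": 80}   # labeled 5fC position
control_c = {"C": 99, "T": 1}      # neighboring unmodified C

print(f"target T%:  {100 * t_fraction(target_site):.1f}")
print(f"control T%: {100 * t_fraction(control_c):.1f}")
```

Comparing the target position with the neighboring control C mirrors the control described in the legend: a high T% at the target alongside a low T% at the control is what a selective C-to-T readout would look like.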

Supplementary Figure 9 Choice of polymerase to maximize C-to-T signals during PCR.

(a) 76-mer oligonucleotide with two 5fC sites labeled with AI, PCR amplified and then subjected to Sanger sequencing. The graph shows that the polymerase can readily read through the modified sequence even with two successive 5fC-AI adducts next to each other (in a 5ʹ-fCGfCG-3ʹ context). (b) 76-mer control oligonucleotide with G and A replacing the two 5fCs in the sequence in a. (c) Sanger sequencing results of Taq polymerase reading through the adduct between 5fC and AI. PCR bias diminishes the effective C-to-T signal; hence, different polymerases were tested to maximize the signal for 5fC detection. Alternatively, new labeling probes could be explored to minimize chemical scarring on the 5fC base (ref. 31). (d) Commercially available MightyAmp DNA polymerase best overcomes the PCR bias and hence allows maximal C-to-T transition during PCR. (e) Schematic diagram of the on-bead wash step (with NaOH solution) to remove any strands that do not contain 5fC (“supernatant” depicted in Step 2). The output (depicted in Step 3) is therefore enriched for 5fC-AI-SH. (f) AI-labeled 76-bp dsDNA with a central 5fC-AI-SH:G pair. The C-to-T transition peak can be observed at the 5fC site, although the signal of C (resulting from the G in the complementary strand) is higher than that of T (from 5fC-AI-SH). (g) Elution from the DTT-cleavage step. The eluted fraction is enriched for 5fC-AI-SH, and hence a near-complete C-to-T transition peak can be observed at the 5fC site.

Supplementary Figure 10 Genome browser view of 5fC at the Fbxl18 gene.

(a) 5fC-enriched regions in both Tdgfl/fl and Tdg–/– mESCs. Results from two replicates are shown, demonstrating high reproducibility of fC-CET. 5hmC-enriched peaks by hmC-Seal are also shown. (b) Single-base 5fC sites, along with 5mC and 5hmC, in Tdgfl/fl and Tdg–/– mESCs are shown. The peaks of 5fC correspond to 5mC sites with low abundance.

Supplementary Figure 11 Venn diagrams of the 5fC-enriched regions.

(a,b) 5fC-enriched regions in two biological replicates of Tdgfl/fl (a) and Tdg–/– (b) mESCs. (c) The majority of 5fC-marked regions in the Tdgfl/fl mESCs fall within those in Tdg–/– mESCs. (d,e) Comparisons of 5fC-enriched regions detected by fC-CET with results from fC-Seal in the Tdgfl/fl and Tdg–/– mESCs.

Supplementary Figure 12 5fC sites at single-base resolution.

(a) Comparisons of 5fC sites in the Tdgfl/fl and Tdg–/– mESCs. (b) 5fC sites in Tdg–/– mESCs are grouped based on genomic elements. (c) The relative enrichment of Tdg–/– 5fC sites in different genomic elements.

Supplementary Figure 13 Locus-specific validation of 5fC sites.

(a,b) fCAB-Seq (ref. 8) and MAB-Seq (ref. 13) were performed (in parallel with fC-CET) on the 76-mer 5fC model sequence. Results from the TOPO-cloning assay (a) and high-throughput sequencing (b) are shown. In both cases, a normal C next to the target 5fCpG was used as a control. (c–e) Locus-specific validation of 5fC sites with fC-CET (c,d) and fCAB-Seq (e). The columns represent the percentage of T at the target 5fC sites. For each site, a P value is given to show whether it is a statistically significant 5fC site (binomial test for c and d, and Fisher’s test for differences between fCAB and BS datasets in e). *P < 0.05, **P < 0.01, ***P < 0.001; #, not detected because of technical issues.
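
The binomial test mentioned for panels c and d can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the legend does not state how the null rate was defined or whether the test was one-sided, so the one-sided form, the function name `binomial_tail`, the background rate and the read counts are all assumptions.

```python
from math import comb


def binomial_tail(k, n, p):
    """One-sided binomial P value: P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs: 12 T reads out of 40 at a candidate site,
# tested against an assumed 5% background C-to-T rate.
p_value = binomial_tail(12, 40, 0.05)
print(f"P = {p_value:.2e}")
```

A small P value here would mean that the observed number of T reads is unlikely under the assumed background rate alone.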

Supplementary Figure 14 5fC sites in Tdg–/– mESCs at different regulatory elements.

Normalized read densities of 5fC in Tdg–/– mESCs at the H3K4me1, H3K27ac, p300 and Tet1 regions. The 5fC signals at these regions are higher than those of 5hmC, suggesting that 5fC represents a more active marker than 5hmC.

Supplementary Figure 15 5fC is more enriched than 5hmC at different genomic elements.

The genomic loci analyzed included CTCF-binding regions, DNase I hypersensitive regions, enhancers and p300-binding regions; relative enrichment (observed over expected) is used for comparison. The 5fC sites are more enriched than the 5hmC sites in enhancers and p300-binding regions, suggesting a more active nature of 5fC-marked regions. This is further supported by the enrichment at DNase I hypersensitive regions, which represent more decondensed genomic regions.
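
One common way to compute an observed-over-expected enrichment is sketched below. The legend gives only the phrase "observed over expected"; defining the expected fraction as the share of the genome covered by the element is an assumption here, and the function name and all numbers are hypothetical.

```python
def relative_enrichment(sites_in_element, total_sites, element_bp, genome_bp):
    """Observed-over-expected enrichment of sites within a genomic element.

    observed: fraction of all sites that fall inside the element.
    expected: fraction of the genome covered by the element (assumed
              null model of sites spread uniformly across the genome).
    """
    if total_sites <= 0 or element_bp <= 0 or genome_bp <= 0:
        raise ValueError("counts and lengths must be positive")
    observed = sites_in_element / total_sites
    expected = element_bp / genome_bp
    return observed / expected


# Hypothetical numbers: 50 of 1,000 sites fall in an element type
# that covers 1 Mb of a 100-Mb genome.
print(relative_enrichment(50, 1000, 1_000_000, 100_000_000))
```

Values above 1 indicate more sites in the element than the uniform null would predict; values below 1 indicate depletion.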

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15, Supplementary Tables 1–3 and Supplementary Notes 1–3 (PDF 3048 kb)

Cite this article

Xia, B., Han, D., Lu, X. et al. Bisulfite-free, base-resolution analysis of 5-formylcytosine at the genome scale. Nat Methods 12, 1047–1050 (2015). https://doi.org/10.1038/nmeth.3569

  • DOI: https://doi.org/10.1038/nmeth.3569
