5-Formylcytosine alters the structure of the DNA double helix

Abstract

The modified base 5-formylcytosine (5fC) was recently identified in mammalian DNA and might be considered to be the 'seventh' base of the genome. This nucleotide has been implicated in active demethylation mediated by the base excision repair enzyme thymine DNA glycosylase. Genomics and proteomics studies have suggested an additional role for 5fC in transcription regulation through chromatin remodeling. Here we propose that 5fC might affect these processes through its effect on DNA conformation. Biophysical and structural analysis revealed that 5fC alters the structure of the DNA double helix and leads to a conformation unique among known DNA structures including those comprising other cytosine modifications. The 1.4-Å-resolution X-ray crystal structure of a DNA dodecamer comprising three 5fCpG sites shows how 5fC changes the geometry of the grooves and base pairs associated with the modified base, leading to helical underwinding.

Figure 1: High levels of cytosine formylation in genomic DNA are observed at CpG repeats.
Figure 2: 5fC-containing oligonucleotides are characterized by unusual spectroscopic and structural signatures.
Figure 3: Comparison of base-step and groove parameters of the 5fC-containing duplex (F-DNA) with B- and A-form DNA.
Figure 4: Induced conformational transformation of F-DNA to B-DNA.

References

  1. Ito, S. et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science 333, 1300–1303 (2011).

  2. Pfaffeneder, T. et al. The discovery of 5-formylcytosine in embryonic stem cell DNA. Angew. Chem. Int. Ed. Engl. 50, 7008–7012 (2011).

  3. Maiti, A. & Drohat, A.C. Thymine DNA glycosylase can rapidly excise 5-formylcytosine and 5-carboxylcytosine: potential implications for active demethylation of CpG sites. J. Biol. Chem. 286, 35334–35338 (2011).

  4. Hashimoto, H., Hong, S., Bhagwat, A.S., Zhang, X. & Cheng, X. Excision of 5-hydroxymethyluracil and 5-carboxylcytosine by the thymine DNA glycosylase domain: its structural basis and implications for active DNA demethylation. Nucleic Acids Res. 40, 10203–10214 (2012).

  5. Iurlaro, M. et al. A screen for hydroxymethylcytosine and formylcytosine binding proteins suggests functions in transcription and chromatin regulation. Genome Biol. 14, R119 (2013).

  6. Renciuk, D., Blacque, O., Vorlickova, M. & Spingler, B. Crystal structures of B-DNA dodecamer containing the epigenetic modifications 5-hydroxymethylcytosine or 5-methylcytosine. Nucleic Acids Res. 41, 9891–9900 (2013).

  7. Lercher, L. et al. Structural insights into how 5-hydroxymethylation influences transcription factor binding. Chem. Commun. (Camb.) 50, 1794–1796 (2014).

  8. Wang, L. et al. Programming and inheritance of parental DNA methylomes in mammals. Cell 157, 979–991 (2014).

  9. Raiber, E.A. et al. Genome-wide distribution of 5-formylcytosine in embryonic stem cells is associated with transcription and depends on thymine DNA glycosylase. Genome Biol. 13, R69 (2012).

  10. Song, C.X. et al. Genome-wide profiling of 5-formylcytosine reveals its roles in epigenetic priming. Cell 153, 678–691 (2013).

  11. Shen, L. et al. Genome-wide analysis reveals TET- and TDG-dependent 5-methylcytosine oxidation dynamics. Cell 153, 692–706 (2013).

  12. You, C. et al. Effects of Tet-mediated oxidation products of 5-methylcytosine on DNA transcription in vitro and in mammalian cells. Sci. Rep. 4, 7052 (2014).

  13. Hu, L. et al. Crystal structure of TET2-DNA complex: insight into TET-mediated 5mC oxidation. Cell 155, 1545–1555 (2013).

  14. Xu, L. et al. Pyrene-based quantitative detection of the 5-formylcytosine loci symmetry in the CpG duplex content during TET-dependent demethylation. Angew. Chem. Int. Ed. Engl. 53, 11223–11227 (2014).

  15. Thalhammer, A., Hansen, A.S., El-Sagheer, A.H., Brown, T. & Schofield, C.J. Hydroxylation of methylated CpG dinucleotides reverses stabilisation of DNA duplexes by cytosine 5-methylation. Chem. Commun. (Camb.) 47, 5325–5327 (2011).

  16. Sutherland, J.C., Griffin, K.P., Keck, P.C. & Takacs, P.Z. Z-DNA: vacuum ultraviolet circular dichroism. Proc. Natl. Acad. Sci. USA 78, 4801–4804 (1981).

  17. Booth, M.J., Marsico, G., Bachman, M., Beraldi, D. & Balasubramanian, S. Quantitative sequencing of 5-formylcytosine in DNA at single-base resolution. Nat. Chem. 6, 435–440 (2014).

  18. Spruijt, C.G. et al. Dynamic readers for 5-(hydroxy)methylcytosine and its oxidized derivatives. Cell 152, 1146–1159 (2013).

  19. Wyatt, M.D., Allan, J.M., Lau, A.Y., Ellenberger, T.E. & Samson, L.D. 3-methyladenine DNA glycosylases: structure, function, and biological importance. BioEssays 21, 668–676 (1999).

  20. Kabsch, W. Integration, scaling, space-group assignment and post-refinement. Acta Crystallogr. D Biol. Crystallogr. 66, 133–144 (2010).

  21. Adams, P.D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 66, 213–221 (2010).

  22. Emsley, P., Lohkamp, B., Scott, W.G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).

  23. Zheng, G., Lu, X.J. & Olson, W.K. Web 3DNA: a web server for the analysis, reconstruction, and visualization of three-dimensional nucleic-acid structures. Nucleic Acids Res. 37, W240–W246 (2009).

  24. Lavery, R., Moakher, M., Maddocks, J.H., Petkeviciute, D. & Zakrzewska, K. CURVES+ web server for analyzing and visualizing the helical, backbone and groove parameters of nucleic acid structures. Nucleic Acids Res. 37, 5917–5929 (2009).

  25. Bingman, C., Jain, S., Zon, S. & Sundaralingam, M. Crystal and molecular structure of the alternating dodecamer d(GCGTACGTACGC) in the A-DNA form: comparison with the isomorphous non-alternating dodecamer d(CCGTACGTACGG). Nucleic Acids Res. 20, 6637–6647 (1992).

  26. Bingman, C.A., Zon, G. & Sundaralingam, M. Crystal and molecular structure of the A-DNA dodecamer d(CCGTACGTACGG). Choice of fragment helical axis. J. Mol. Biol. 227, 738–756 (1992).

  27. Drew, H.R. et al. Structure of a B-DNA dodecamer: conformation and dynamics. Proc. Natl. Acad. Sci. USA 78, 2179–2183 (1981).

  28. Locasale, J.W., Napoli, A.A., Chen, S., Berman, H.M. & Lawson, C.L. Signatures of protein-DNA recognition in free DNA binding sites. J. Mol. Biol. 386, 1054–1065 (2009).

  29. Leonard, G.A. & Hunter, W.N. Crystal and molecular structure of d(CGTAGATCTACG) at 2.25 Å resolution. J. Mol. Biol. 234, 198–208 (1993).

Acknowledgements

E.-A.R. is supported as a Herchel Smith Fellow. The Balasubramanian laboratory is supported by a Senior Investigator Award from the Wellcome Trust (099232/Z/12/Z to S.B.), and it also receives core funding from Cancer Research UK (C9681/A11961 to S.B.). D.Y.C. is supported by the Crystallographic X-ray Facility (CXF) at the Department of Biochemistry, University of Cambridge, and B.F.L. is supported by the Wellcome Trust (076846/Z/05/A to B.F.L.). We thank the staff of Soleil and Diamond Light Source for use of facilities. We thank C. Calladine for stimulating discussions.

Author information

Contributions

E.-A.R., P.M. and S.B. designed the project and wrote the manuscript with contributions from all authors. E.-A.R. and P.M. performed biophysical experiments and analyzed X-ray crystallographic data. D.Y.C. and B.F.L. acquired and analyzed X-ray crystallographic data. D.Y.C. solved the structure with P-SAD. D.B. performed computational analysis of sequence data sets. S.B. supervised the project. All authors interpreted the data and read and approved the manuscript.

Corresponding author

Correspondence to Shankar Balasubramanian.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 High levels of 5fC are found in CpG repeats.

a) Histograms of percentage 5fC for increasing lengths of CpG repeats. b) To highlight the increase of 5fC with repeat length (that is, a right shift), the same data as in a) are plotted on a single graph as density histograms (the density represents the probability of an x value falling within a given range). c) Influence of the length of CpG repeats, d(CG)n, on the distribution of significant 5fC sites. This is the same representation as in Figure 1b but applied to the Booth et al. data set (ref. 17). Formylation levels of (CpG)3 repeats are similar within a strand and across both strands. d) Same data as in Figure 1c but with CpG sites ordered by significance of 5fC (that is, FDR) instead of genomic position. e) Formylation levels of 16 randomly chosen (CpG)3 repeats: 8 highly formylated (above the median, blue lines) and 8 weakly formylated (below the median, red lines).
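
As a rough illustration of the kind of analysis summarized in these panels, the sketch below locates d(CG)n repeats in a sequence and splits a list of per-repeat formylation levels at the median. It is not the authors' pipeline: the function names, the regular-expression approach and the input layout are assumptions made for illustration only.

```python
import re
from statistics import median


def find_cpg_repeats(seq, min_units=3):
    """Return (start, n_units) for each maximal run of at least
    `min_units` consecutive CG dinucleotides in `seq`.

    Hypothetical helper; positions are 0-based.
    """
    pattern = re.compile(r"(?:CG){%d,}" % min_units)
    return [
        (m.start(), (m.end() - m.start()) // 2)
        for m in pattern.finditer(seq.upper())
    ]


def split_by_median(levels):
    """Split formylation levels (in percent) into values above the
    median and values at or below it. Returns (median, high, low)."""
    med = median(levels)
    high = [x for x in levels if x > med]
    low = [x for x in levels if x <= med]
    return med, high, low


# Unmodified version of the dodecamer sequence used for crystallization:
# one (CG)3 repeat starting at position 3.
print(find_cpg_repeats("CTACGCGCGTAG"))  # [(3, 3)]
```

Treating values equal to the median as "low" is an arbitrary choice in this sketch; a real analysis would need to state how ties are handled.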

Supplementary Figure 2 Examples of genes showing 5fC enrichment in CpG repeats.

Genes showing 5fC enrichment in CpG repeats were selected from single-base-resolution 5fC sequencing of mouse two-cell embryos (ref. 8). The list includes genes coding for proteins that are involved in the regulation of key cellular processes.

Supplementary Figure 3 Genomic location and gene ontology analysis of highly formylated CpG repeats.

a) Genomic distribution of CpG repeats of length three or more with formylation above the median of the repeat (18.51%, “Significant high”, orange). For comparison, the distribution of all CpG repeats of length three or more is also shown (blue). The genomic regions were obtained from ANNOVAR (see Supplementary Note) and, for brevity, only those with ≥2% of sites are shown here. Gene ontology analysis shows that formylated CpG repeats are enriched in genes involved in transcriptional, developmental and differentiation processes. Enrichment analyses were performed using b) the Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov) or c) the Gene Ontology project database (AmiGO, http://www.geneontology.org).

Supplementary Figure 4 Biophysical and structural characterization of the 5fC nucleotide and the 5fC-containing dodecamer.

a) CD analysis of 5-modified C nucleotides (10 μM in PBS, pH 7.2) shows that 5fC differs from the other cytosine derivatives by a molar ellipticity maximum in the 280–300 nm region. b) CD analysis of the dodecamer (5′-CTA5fCG5fCG5fCGTAG-3′) in the crystallization buffer (0.01 M magnesium sulphate, 0.05 M sodium cacodylate pH 6.0, 1.8 M lithium sulphate) and in PBS shows a negative ellipticity in the near-UV region. c) Sample of the experimental electron density map. A stereoscopic view of the density-modified map calculated with the phases obtained by phosphorus single-wavelength anomalous dispersion (P-SAD) phasing. The map is contoured at 1.5 σ. All 11 phosphorus atom sites in the asymmetric unit were identified in the experimental map, and these are shown as grey balls. The two-fold symmetry-related set of phosphorus atom sites that completes the DNA double helix is shown as green balls.

Supplementary Figure 5 Comparison of base-step parameters of the 5fC-containing duplex (F-DNA) with B- and A-form DNA.

a) Twist and b) tilt angles of F-DNA (red line). F-DNA parameters are compared to those of canonical A- and B-DNA (blue and black lines, respectively). The values presented for A-DNA and B-DNA are the mean and standard deviation obtained from experimental structures of similar length and base composition (n=3, see Online Methods).
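
The per-step mean and standard deviation used for the reference curves can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout (one list of twist angles per reference structure), the use of the sample standard deviation and the function names are all assumptions. The second helper applies the standard relation between mean helical twist and helical repeat (360° divided by the twist angle), which is how a lower twist translates into more base pairs per turn, that is, underwinding.

```python
from statistics import mean, stdev


def step_stats(twists_by_structure):
    """Per-base-step mean and sample SD across reference structures.

    `twists_by_structure` is a list of equal-length lists of twist
    angles in degrees, one list per structure (hypothetical layout).
    Returns a list of (mean, sd) tuples, one per base step.
    """
    n_steps = len(twists_by_structure[0])
    if any(len(t) != n_steps for t in twists_by_structure):
        raise ValueError("all structures must have the same number of steps")
    # zip(*...) regroups the values so each tuple holds one step
    # measured across all structures.
    return [(mean(v), stdev(v)) for v in zip(*twists_by_structure)]


def base_pairs_per_turn(mean_twist_deg):
    """Helical repeat implied by a mean twist angle in degrees."""
    if mean_twist_deg <= 0:
        raise ValueError("twist must be positive")
    return 360.0 / mean_twist_deg
```

For example, a mean twist of 36° corresponds to 10 base pairs per turn, so any duplex with a smaller mean twist needs more than 10 base pairs to complete a turn.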

Supplementary Figure 6 Effect of hemiformylated 5fCpGs on B-DNA structure.

a) Kimura et al. report the 1.60 Å resolution crystal structure of a hemiformylated Dickerson-Drew duplex (PDB entry 1VE8), showing that 5fC (green spheres) does not affect the overall B-DNA structure but promotes b) unusual 5fC-G / 5fC-G/G-C steps that display c) local rotational and translational parameters similar to those observed within F-DNA.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6, Supplementary Table 1 and Supplementary Note (PDF 3484 kb)

Cite this article

Raiber, EA., Murat, P., Chirgadze, D. et al. 5-Formylcytosine alters the structure of the DNA double helix. Nat Struct Mol Biol 22, 44–49 (2015). https://doi.org/10.1038/nsmb.2936

  • DOI: https://doi.org/10.1038/nsmb.2936
