Abstract
The modified base 5-formylcytosine (5fC) was recently identified in mammalian DNA and might be considered to be the 'seventh' base of the genome. This nucleotide has been implicated in active demethylation mediated by the base excision repair enzyme thymine DNA glycosylase. Genomics and proteomics studies have suggested an additional role for 5fC in transcription regulation through chromatin remodeling. Here we propose that 5fC might affect these processes through its effect on DNA conformation. Biophysical and structural analysis revealed that 5fC alters the structure of the DNA double helix and leads to a conformation unique among known DNA structures including those comprising other cytosine modifications. The 1.4-Å-resolution X-ray crystal structure of a DNA dodecamer comprising three 5fCpG sites shows how 5fC changes the geometry of the grooves and base pairs associated with the modified base, leading to helical underwinding.
References
Ito, S. et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science 333, 1300–1303 (2011).
Pfaffeneder, T. et al. The discovery of 5-formylcytosine in embryonic stem cell DNA. Angew. Chem. Int. Ed. Engl. 50, 7008–7012 (2011).
Maiti, A. & Drohat, A.C. Thymine DNA glycosylase can rapidly excise 5-formylcytosine and 5-carboxylcytosine: potential implications for active demethylation of CpG sites. J. Biol. Chem. 286, 35334–35338 (2011).
Hashimoto, H., Hong, S., Bhagwat, A.S., Zhang, X. & Cheng, X. Excision of 5-hydroxymethyluracil and 5-carboxylcytosine by the thymine DNA glycosylase domain: its structural basis and implications for active DNA demethylation. Nucleic Acids Res. 40, 10203–10214 (2012).
Iurlaro, M. et al. A screen for hydroxymethylcytosine and formylcytosine binding proteins suggests functions in transcription and chromatin regulation. Genome Biol. 14, R119 (2013).
Renciuk, D., Blacque, O., Vorlickova, M. & Spingler, B. Crystal structures of B-DNA dodecamer containing the epigenetic modifications 5-hydroxymethylcytosine or 5-methylcytosine. Nucleic Acids Res. 41, 9891–9900 (2013).
Lercher, L. et al. Structural insights into how 5-hydroxymethylation influences transcription factor binding. Chem. Commun. (Camb.) 50, 1794–1796 (2014).
Wang, L. et al. Programming and inheritance of parental DNA methylomes in mammals. Cell 157, 979–991 (2014).
Raiber, E.A. et al. Genome-wide distribution of 5-formylcytosine in embryonic stem cells is associated with transcription and depends on thymine DNA glycosylase. Genome Biol. 13, R69 (2012).
Song, C.X. et al. Genome-wide profiling of 5-formylcytosine reveals its roles in epigenetic priming. Cell 153, 678–691 (2013).
Shen, L. et al. Genome-wide analysis reveals TET- and TDG-dependent 5-methylcytosine oxidation dynamics. Cell 153, 692–706 (2013).
You, C. et al. Effects of Tet-mediated oxidation products of 5-methylcytosine on DNA transcription in vitro and in mammalian cells. Sci. Rep. 4, 7052 (2014).
Hu, L. et al. Crystal structure of TET2-DNA complex: insight into TET-mediated 5mC oxidation. Cell 155, 1545–1555 (2013).
Xu, L. et al. Pyrene-based quantitative detection of the 5-formylcytosine loci symmetry in the CpG duplex content during TET-dependent demethylation. Angew. Chem. Int. Ed. Engl. 53, 11223–11227 (2014).
Thalhammer, A., Hansen, A.S., El-Sagheer, A.H., Brown, T. & Schofield, C.J. Hydroxylation of methylated CpG dinucleotides reverses stabilisation of DNA duplexes by cytosine 5-methylation. Chem. Commun. (Camb.) 47, 5325–5327 (2011).
Sutherland, J.C., Griffin, K.P., Keck, P.C. & Takacs, P.Z. Z-DNA: vacuum ultraviolet circular dichroism. Proc. Natl. Acad. Sci. USA 78, 4801–4804 (1981).
Booth, M.J., Marsico, G., Bachman, M., Beraldi, D. & Balasubramanian, S. Quantitative sequencing of 5-formylcytosine in DNA at single-base resolution. Nat. Chem. 6, 435–440 (2014).
Spruijt, C.G. et al. Dynamic readers for 5-(hydroxy)methylcytosine and its oxidized derivatives. Cell 152, 1146–1159 (2013).
Wyatt, M.D., Allan, J.M., Lau, A.Y., Ellenberger, T.E. & Samson, L.D. 3-methyladenine DNA glycosylases: structure, function, and biological importance. BioEssays 21, 668–676 (1999).
Kabsch, W. Integration, scaling, space-group assignment and post-refinement. Acta Crystallogr. D Biol. Crystallogr. 66, 133–144 (2010).
Adams, P.D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 66, 213–221 (2010).
Emsley, P., Lohkamp, B., Scott, W.G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 (2010).
Zheng, G., Lu, X.J. & Olson, W.K. Web 3DNA: a web server for the analysis, reconstruction, and visualization of three-dimensional nucleic-acid structures. Nucleic Acids Res. 37, W240–W246 (2009).
Lavery, R., Moakher, M., Maddocks, J.H., Petkeviciute, D. & Zakrzewska, K. CURVES+ web server for analyzing and visualizing the helical, backbone and groove parameters of nucleic acid structures. Nucleic Acids Res. 37, 5917–5929 (2009).
Bingman, C., Jain, S., Zon, G. & Sundaralingam, M. Crystal and molecular structure of the alternating dodecamer d(GCGTACGTACGC) in the A-DNA form: comparison with the isomorphous non-alternating dodecamer d(CCGTACGTACGG). Nucleic Acids Res. 20, 6637–6647 (1992).
Bingman, C.A., Zon, G. & Sundaralingam, M. Crystal and molecular structure of the A-DNA dodecamer d(CCGTACGTACGG). Choice of fragment helical axis. J. Mol. Biol. 227, 738–756 (1992).
Drew, H.R. et al. Structure of a B-DNA dodecamer: conformation and dynamics. Proc. Natl. Acad. Sci. USA 78, 2179–2183 (1981).
Locasale, J.W., Napoli, A.A., Chen, S., Berman, H.M. & Lawson, C.L. Signatures of protein-DNA recognition in free DNA binding sites. J. Mol. Biol. 386, 1054–1065 (2009).
Leonard, G.A. & Hunter, W.N. Crystal and molecular structure of d(CGTAGATCTACG) at 2.25 Å resolution. J. Mol. Biol. 234, 198–208 (1993).
Acknowledgements
E.-A.R. is supported as a Herchel Smith Fellow. The Balasubramanian laboratory is supported by a Senior Investigator Award from the Wellcome Trust (099232/Z/12/Z to S.B.), and it also receives core funding from Cancer Research UK (C9681/A11961 to S.B.). D.Y.C. is supported by the Crystallographic X-ray Facility (CXF) at the Department of Biochemistry, University of Cambridge, and B.F.L. is supported by the Wellcome Trust (076846/Z/05/A to B.F.L.). We thank the staff of Soleil and Diamond Light Source for use of facilities. We thank C. Calladine for stimulating discussions.
Author information
Contributions
E.-A.R., P.M. and S.B. designed the project and wrote the manuscript with contributions from all authors. E.-A.R. and P.M. performed biophysical experiments and analyzed X-ray crystallographic data. D.Y.C. and B.F.L. acquired and analyzed X-ray crystallographic data. D.Y.C. solved the structure with P-SAD. D.B. performed computational analysis of sequence data sets. S.B. supervised the project. All authors interpreted the data and read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 High levels of 5fC are found in CpG repeats.
a) Histograms of percentage 5fC for increasing length of CpG repeats. b) To highlight the increase of 5fC with repeat length (i.e., a right shift), the same data as in a) are plotted on a single graph as density histograms (the density represents the probability of an x value falling within a given range of x values). c) Influence of the length of CpG repeats, d(CG)n, on the distribution of significant 5fC sites. This is the same representation as in Figure 1b but applied to the Booth et al. data set (ref. 17). Formylation levels of (CpG)3 repeats are similar within a strand and across both strands. d) Same data as in Figure 1c but with CpG sites ordered by significance of 5fC (i.e., FDR) instead of genomic position. e) Formylation level of 16 randomly chosen (CpG)3 repeats: 8 highly formylated (above the median, blue lines) and 8 lowly formylated (below the median, red lines).
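The grouping described in this caption (CpG repeats binned by length, then split around the median formylation level) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the regular-expression approach and the example inputs are assumptions made here for illustration only, and the formylation values in the usage note are placeholders rather than data from the study.

```python
import re
from statistics import median

# Hypothetical helper: locate maximal runs of consecutive CG dinucleotides,
# i.e. d(CG)n repeats, on one strand of a sequence.
def find_cpg_repeats(seq, min_units=1):
    """Return (start, n) for each maximal d(CG)n run with n >= min_units."""
    hits = []
    for match in re.finditer(r"(?:CG)+", seq.upper()):
        n_units = len(match.group()) // 2
        if n_units >= min_units:
            hits.append((match.start(), n_units))
    return hits

# Hypothetical helper: split per-repeat formylation levels (in percent)
# into those above the median and those at or below it.
def split_by_median(levels):
    """Return (median, high, low) for a list of formylation percentages."""
    mid = median(levels)
    high = [x for x in levels if x > mid]
    low = [x for x in levels if x <= mid]
    return mid, high, low
```

For example, `find_cpg_repeats("ATCGCGCGTTCGA", min_units=3)` keeps only the (CpG)3 run that starts at index 2, and `split_by_median([5.0, 10.0, 20.0, 40.0])` separates the placeholder values around their median of 15.0.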
Supplementary Figure 2 Examples of genes showing 5fC enrichment in CpG repeats.
A selection of genes showing 5fC enrichment in CpG repeats was picked from single-base-resolution 5fC sequencing of mouse two-cell embryos (ref. 8). The list includes genes coding for proteins that are involved in the regulation of key cellular processes.
Supplementary Figure 3 Genomic location and gene ontology analysis of highly formylated CpG repeats.
a) Genomic distribution of CpG repeats of length three or more and with formylation above the median of the repeat (18.51%, “Significant high”, orange). For comparison, the distribution of all CpG repeats of length three or more is also shown (blue). The genomic regions were obtained from annovar (see Supplementary Note) and, for brevity, only those with ≥ 2% sites are shown here. Gene ontology analysis shows that formylated CpG repeats are enriched in genes involved in transcriptional, developmental and differentiation processes. Enrichment analyses were performed using b) the Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov) or c) the Gene Ontology project database (AmiGO, http://www.geneontology.org).
Supplementary Figure 4 Biophysical and structural characterization of the 5fC nucleotide and the 5fC-containing dodecamer.
a) CD analysis of 5′-modified C nucleotides (10 μM in PBS pH 7.2) shows that 5fC differs from the other cytosine derivatives by a maximum in molar ellipticity in the 280–300 nm region. b) CD analysis of the dodecamer (5′-CTA5fCG5fCG5fCGTAG-3′) in the crystallization buffer (0.01 M magnesium sulphate, 0.05 M sodium cacodylate pH 6.0, 1.8 M lithium sulphate) and in PBS shows a negative ellipticity in the near-UV region. c) Sample of the experimental electron density map. A stereoscopic view of the density-modified map calculated with the phases obtained by the phosphorus single-wavelength anomalous dispersion (P-SAD) phasing technique. The map is contoured at 1.5 σ. All 11 phosphorus atom sites in the asymmetric unit were identified in the experimental map, and these are shown as grey balls. The 2-fold symmetry-related set of phosphorus atom sites that completes the DNA double helix is shown as green balls.
Supplementary Figure 5 Comparison of base-step parameters of the 5fC-containing duplex (F-DNA) to B- and A-form DNA.
a) Twist and b) Tilt angles of F-DNA (red line). F-DNA parameters are compared to canonical A- and B-DNA (blue and black lines, respectively). The presented values are the mean and standard deviation obtained from experimental structures of A-DNA and B-DNA of similar length and base composition (n=3, see Online Methods).
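The per-step mean and standard deviation mentioned in this caption can be sketched in a few lines of Python. This is an illustration only, assuming that each reference structure supplies one value per base step (for example, as exported from a tool such as Web 3DNA or CURVES+); the function names are hypothetical and the use of the sample standard deviation is an assumption, since the caption does not state which estimator was used. The second helper applies the standard geometric relation that a helix with mean twist t (in degrees) has 360/t base pairs per turn, so a lower twist (underwinding) means more base pairs per turn.

```python
from statistics import mean, stdev

# Hypothetical helper: combine base-step profiles from several structures.
# Each profile is a list of one parameter (e.g. twist, in degrees), one
# value per base step; all profiles must cover the same number of steps.
def per_step_stats(profiles):
    """Return a list of (mean, sample SD) tuples, one per base step."""
    if len(profiles) < 2:
        raise ValueError("need at least two structures to compute an SD")
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("all profiles must have the same number of steps")
    return [(mean(values), stdev(values)) for values in zip(*profiles)]

# Standard relation between mean helical twist and helical repeat.
def base_pairs_per_turn(mean_twist_deg):
    """Return the number of base pairs per 360 degree helical turn."""
    if mean_twist_deg <= 0:
        raise ValueError("mean twist must be positive")
    return 360.0 / mean_twist_deg
```

As a usage note, `base_pairs_per_turn(36.0)` gives 10.0; any twist values passed to `per_step_stats` would need to come from the actual structure analyses, which are not reproduced here.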
Supplementary Figure 6 Effect of hemiformylated 5fCpGs on B-DNA structure.
a) Kimura et al. report the 1.60 Å resolution crystal structure of a hemiformylated Dickerson-Drew duplex (PDB entry 1VE8), showing that 5fC (green spheres) does not affect the overall B-DNA structure but promotes b) unusual 5fC-G / 5fC-G/G-C steps that display c) local rotational and translational parameters similar to those observed within F-DNA.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–6, Supplementary Table 1 and Supplementary Note (PDF 3484 kb)
About this article
Cite this article
Raiber, EA., Murat, P., Chirgadze, D. et al. 5-Formylcytosine alters the structure of the DNA double helix. Nat Struct Mol Biol 22, 44–49 (2015). https://doi.org/10.1038/nsmb.2936
DOI: https://doi.org/10.1038/nsmb.2936