Letter | Published:

A vitamin-C-derived DNA modification catalysed by an algal TET homologue

Abstract

Methylation of cytosine to 5-methylcytosine (5mC) is a prevalent DNA modification found in many organisms. Sequential oxidation of 5mC by ten-eleven translocation (TET) dioxygenases results in a cascade of additional epigenetic marks and promotes demethylation of DNA in mammals [1,2]. However, the enzymatic activity and function of TET homologues in other eukaryotes remain largely unexplored. Here we show that the green alga Chlamydomonas reinhardtii contains a 5mC-modifying enzyme (CMD1) that is a TET homologue and catalyses the conjugation of a glyceryl moiety to the methyl group of 5mC through a carbon–carbon bond, resulting in two stereoisomeric nucleobase products. The catalytic activity of CMD1 requires Fe(II) and the integrity of its binding motif His-X-Asp, which is conserved in Fe-dependent dioxygenases [3]. However, unlike previously described TET enzymes, which use 2-oxoglutarate as a co-substrate [4], CMD1 uses L-ascorbic acid (vitamin C) as an essential co-substrate. Vitamin C donates the glyceryl moiety to 5mC with concurrent formation of glyoxylic acid and CO2. The vitamin-C-derived DNA modification is present in the genome of wild-type C. reinhardtii but at a substantially lower level in a CMD1 mutant strain. The fitness of CMD1 mutant cells during exposure to high light levels is reduced. LHCSR3, a gene that is critical for the protection of C. reinhardtii from photo-oxidative damage under high light conditions, is hypermethylated and downregulated in CMD1 mutant cells compared to wild-type cells, causing a reduced capacity for photoprotective non-photochemical quenching. Our study thus identifies a eukaryotic DNA base modification that is catalysed by a divergent TET homologue and unexpectedly derived from vitamin C, and describes its role as a potential epigenetic mark that may counteract DNA methylation in the regulation of photosynthesis.

Data availability

All the sequencing data reported in this paper are summarized in Supplementary Table 1 and have been deposited in the Gene Expression Omnibus database under accession code GSE122719. Source data for Figs. 1b, 4d and Extended Data Figs. 8d–g, 11b, c are presented in Supplementary Fig. 1. All other data are available from the corresponding author on request.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Pastor, W. A., Aravind, L. & Rao, A. TETonic shift: biological roles of TET proteins in DNA demethylation and transcription. Nat. Rev. Mol. Cell Biol. 14, 341–356 (2013).
  2. Bochtler, M., Kolano, A. & Xu, G. L. DNA demethylation pathways: Additional players and regulators. BioEssays 39, 1–13 (2017).
  3. Martinez, S. & Hausinger, R. P. Catalytic mechanisms of Fe(II)- and 2-oxoglutarate-dependent oxygenases. J. Biol. Chem. 290, 20702–20711 (2015).
  4. Walport, L. J., Hopkinson, R. J. & Schofield, C. J. Mechanisms of human histone and nucleic acid demethylases. Curr. Opin. Chem. Biol. 16, 525–534 (2012).
  5. Morales-Ruiz, T. et al. DEMETER and REPRESSOR OF SILENCING 1 encode 5-methylcytosine DNA glycosylases. Proc. Natl Acad. Sci. USA 103, 6853–6858 (2006).
  6. He, Y. F. et al. Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science 333, 1303–1307 (2011).
  7. Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930–935 (2009).
  8. Ito, S. et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science 333, 1300–1303 (2011).
  9. Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324, 929–930 (2009).
  10. Wu, X. & Zhang, Y. TET-mediated active DNA demethylation: mechanism, function and beyond. Nat. Rev. Genet. 18, 517–534 (2017).
  11. Zhang, H. & Zhu, J. K. Active DNA demethylation in plants and animals. Cold Spring Harb. Symp. Quant. Biol. 77, 161–173 (2012).
  12. Hashimoto, H. et al. Structure of a Naegleria Tet-like dioxygenase in complex with 5-methylcytosine DNA. Nature 506, 391–395 (2014).
  13. Zhang, L. et al. A TET homologue protein from Coprinopsis cinerea (CcTET) that biochemically converts 5-methylcytosine to 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine. J. Am. Chem. Soc. 136, 4801–4804 (2014).
  14. Chavez, L. et al. Simultaneous sequencing of oxidized methylcytosines produced by TET/JBP dioxygenases in Coprinopsis cinerea. Proc. Natl Acad. Sci. USA 111, E5149–E5158 (2014).
  15. Carell, T. et al. Structure and function of noncanonical nucleobases. Angew. Chem. Int. Ed. Engl. 51, 7110–7131 (2012).
  16. Iyer, L. M., Tahiliani, M., Rao, A. & Aravind, L. Prediction of novel families of enzymes involved in oxidative and other complex modifications of bases in nucleic acids. Cell Cycle 8, 1698–1710 (2009).
  17. Merchant, S. S. et al. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318, 245–250 (2007).
  18. Hu, L. et al. Crystal structure of TET2–DNA complex: insight into TET-mediated 5mC oxidation. Cell 155, 1545–1555 (2013).
  19. Hausinger, R. P. Fe(II)/α-ketoglutarate-dependent hydroxylases and related enzymes. Crit. Rev. Biochem. Mol. Biol. 39, 21–68 (2004).
  20. Karplus, M. Vicinal proton coupling in nuclear magnetic resonance. J. Am. Chem. Soc. 85, 2870–2871 (1963).
  21. Urzica, E. I. et al. Impact of oxidative stress on ascorbate biosynthesis in Chlamydomonas via regulation of the VTC2 gene encoding a GDP-L-galactose phosphorylase. J. Biol. Chem. 287, 14234–14245 (2012).
  22. Peers, G. et al. An ancient light-harvesting protein is critical for the regulation of algal photosynthesis. Nature 462, 518–521 (2009).
  23. Dai, H. Q. et al. TET-mediated DNA demethylation controls gastrulation by regulating Lefty-Nodal signalling. Nature 538, 528–532 (2016).
  24. Lopez, D. et al. Dynamic changes in the transcriptome and methylome of Chlamydomonas reinhardtii throughout its life cycle. Plant Physiol. 169, 2730–2743 (2015).
  25. Young, J. I., Züchner, S. & Wang, G. Regulation of the epigenome by vitamin C. Annu. Rev. Nutr. 35, 545–564 (2015).
  26. Cimmino, L., Neel, B. G. & Aifantis, I. Vitamin C in stem cell reprogramming and cancer. Trends Cell Biol. 28, 698–708 (2018).
  27. Bonente, G. et al. Analysis of LhcSR3, a protein essential for feedback de-excitation in the green alga Chlamydomonas reinhardtii. PLoS Biol. 9, e1000577 (2011).
  28. Petroutsos, D. et al. A blue-light photoreceptor mediates the feedback regulation of photosynthesis. Nature 537, 563–566 (2016).
  29. Mullins, E. A. et al. The DNA glycosylase AlkD uses a non-base-flipping mechanism to excise bulky lesions. Nature 527, 254–258 (2015).
  30. Heyn, H. & Esteller, M. An adenine code for DNA: a second life for N6-methyladenine. Cell 161, 710–713 (2015).
  31. Hemming, B. C. & Gubler, C. J. High-pressure liquid chromatography of alpha-keto acid 2,4-dinitrophenylhydrazones. Anal. Biochem. 92, 31–40 (1979).
  32. Vidal-Meireles, A. et al. Regulation of ascorbate biosynthesis in green algae has evolved to enable rapid stress-induced response via the VTC2 gene encoding GDP-L-galactose phosphorylase. New Phytol. 214, 668–681 (2017).
  33. Jiang, L., Huang, J., Wang, Y. & Tang, H. Eliminating the dication-induced intersample chemical-shift variations for NMR-based biofluid metabonomic analysis. Analyst 137, 4209–4219 (2012).
  34. Liu, H. et al. Identification of three novel polyphenolic compounds, origanine A-C, with unique skeleton from Origanum vulgare L. using the hyphenated LC-DAD-SPE-NMR/MS methods. J. Agric. Food Chem. 60, 129–135 (2012).
  35. Lambert, J. B. & Mazzola, E. P. Nuclear Magnetic Resonance Spectroscopy: An Introduction to Principles, Applications, and Experimental Methods (Pearson Education, 2004).
  36. Frisch, M. J. et al. Gaussian 09 (Gaussian, Wallingford, 2009).
  37. Sueoka, N., Chiang, K. S. & Kates, J. R. Deoxyribonucleic acid replication in meiosis of Chlamydomonas reinhardi. I. Isotopic transfer experiments with a strain producing eight zoospores. J. Mol. Biol. 25, 47–66 (1967).
  38. Baek, K. et al. DNA-free two-gene knockout in Chlamydomonas reinhardtii via CRISPR-Cas9 ribonucleoproteins. Sci. Rep. 6, 30620 (2016).
  39. Maniatis, T. Molecular Cloning: a Laboratory Manual (Cold Spring Harbor Laboratory Press, 1982).
  40. Strenkert, D., Schmollinger, S. & Schroda, M. Protocol: methodology for chromatin immunoprecipitation (ChIP) in Chlamydomonas reinhardtii. Plant Methods 7, 35 (2011).
  41. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
  42. Xi, Y. & Li, W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10, 232 (2009).
  43. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Met. 57, 289–300 (1995).
  44. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
  45. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protocols 7, 562–578 (2012).
  46. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

Acknowledgements

We thank Y. Xu for the pPEI-His-Sumo vector; Y. Shan, D. Qiu, J. Kang, B. Han and L. Xu for assistance with mass spectrometry analysis; N. Xu for assistance in C. reinhardtii culturing and the gametogenesis experiment; W. Yang for the npq4 strain; J. Minagawa, G. Peers, S. Toth, M. Levine, C. Fulton, Y. Wang, W. Yang, and C. Yi for discussions. This work is supported by the National Key R&D Program of China (2017YFA0102700 to G.X.; 2017YFC0906800 to Huiru Tang), the National Science Foundation of China (31830018 and 31430049 to G.-L.X.; 81590953 and 21575151 to Huiru Tang; 91851201 to K.H.), the Shanghai Municipal Science and Technology Project (2017SHZDZX01, 16JC1400500 to Huiru Tang), the Chinese Academy of Sciences (XDB19010102 to G.-L.X.), Heye Health Technology Inc. and NIH grant R01-GM118501. Z.-J.Z. is also supported by the Thousand Youth Talents Program and an Agilent Technologies Thought Leader Award.

Reviewer information

Nature thanks Thomas Carell, Arne Klungland, Skirmantas Kriaucionis, Krishna Niyogi and Daniel Zilberman for their contribution to the peer review of this work.

Author information

G.-L.X. conceived the project. J.-H.X. identified 5gmC and the CMD1 reaction. J.-H.X., G.-D.C., Q.-Q.F., X.W. and H.Y. performed the enzymatic assays, HPLC and MS analyses. F.H., Y.C. and Huiru Tang conducted NMR and DFT analysis. H.C., J.-H.X., Q.-L.Y., X.-J.Z., J.Z., B.-A.W., X.D., W.M. and K.H. generated the mutant C. reinhardtii strains and performed the phenotype analysis. B.P., W.L., J.D. and E.W. proposed the reaction mechanism. F.-F.C. and Z.-J.Z. performed multiple-reaction-monitoring-based LC–MS analysis. Z.F., C.X., Hui Tang and L.C. performed the RNA-seq and whole-genome bisulfite sequencing analyses. J.-H.X., R.M.K. and G.-L.X. wrote the paper, with contributions from all other authors.

Competing interests

The authors declare no competing interests.

Correspondence to Kaiyao Huang or Huiru Tang or Guo-Liang Xu.

Extended data figures and tables

Extended Data Fig. 1 Alignment of TET homologues in C. reinhardtii with Naegleria Tet1.

Eight TET-like proteins were found using the TET–JBP domain as a query for BLAST search in the Phytozome database of C. reinhardtii. These proteins have a conserved His-X-Asp motif, as observed in TET proteins from mammals and Naegleria. The symbols above the sequence denote the functional residues in N. gruberi (Ng)Tet1 determined by structural and biochemical analyses. m, metal (iron) binding site; C, 5mC interaction; a, active centre; α, 2-OG binding site, which is not conserved in CrTET1 (CMD1). The gene names for the CrTET in the Phytozome database are as follows: CrTET1: Cre12.g553400, CrTET2: Cre16.g654100, CrTET3: Cre02.g081150, CrTET4: Cre02.g141466, CrTET5: Cre17.g734757, CrTET6: Cre15.g643388, CrTET7: Cre02.g142867, CrTET8: Cre15.g642800.

Extended Data Fig. 2 Purification of recombinant CMD1 and determination of DNA substrate specificity.

a, Coomassie blue staining of untagged full-length CMD1 protein purified from E. coli. An image of fractions collected from the gel filtration chromatography column (eluted between 14 and 17 min, 1 ml min–1) is shown. Representative image from at least three independent experiments. b, Coomassie blue staining of purified wild-type or mutant CMD1 proteins. H345 and D347 correspond to the conserved residues of the iron-binding sites based on the sequence alignment of TET homologues; A330 is predicted to be in the active site required for CMD1 enzymatic activity; and D350 might be involved in the 5mC interaction. Representative image from two independent experiments. For source data in a, b, see Supplementary Fig. 1. c, CMD1 mutants had no or substantially reduced ability to convert 5mC into P1 and P2. Data are representative of two independent experiments. d, P1 and P2 nucleosides accumulate over a period of 2 h upon incubation of the 5mC–DNA substrate with CMD1, as shown by HPLC analysis of nucleosides in DNA samples collected at the indicated time points. Data are representative of two independent experiments. e, Time-course of the relative amounts of 5mC, P1 and P2 during incubation of 5mC–DNA with CMD1. The amount was determined by the peak area of each nucleoside in the HPLC analysis in d. Data are representative of two independent experiments. f, 5mC–DNA, but not C- or 5hmC-containing DNA, serves as a substrate for CMD1. DNA substrates containing C, 5hmC or 5mC were prepared by PCR, incubated with CMD1, and then subjected to nucleoside composition analysis using HPLC. Note that P1 and P2 nucleosides appear in 5mC–DNA only upon incubation with wild-type CMD1. Mut CMD1 is an inactive mutant carrying point mutations (H345Y/D347A). Data are representative of two independent experiments.

Extended Data Fig. 3 Deuterium tracing of the methyl group in 5mC–DNA.

a, b, Tandem mass spectrometry analysis of the HPLC fractions corresponding to the minor side products generated in the CMD1 reaction and comparison with authentic 5hmC (a) and 5caC (b) standards (Fig. 1a; see the reaction mechanism proposed in Extended Data Fig. 7c for further discussion of the origins of 5hmC and 5caC). Data are representative of two independent experiments. c, Mass spectrometry detection of 5mC nucleoside in a DNA substrate methylated in vitro with M.SssI using D3-labelled S-adenosyl-L-methionine ([methyl-D3]-SAM). The mass of 5mC increases by three units when [methyl-D3]-SAM is used. Data are representative of two independent experiments. d, Identification of P1 and P2 bases using the masses of molecules and fragmentation products from tandem mass spectrometry. P1 and P2 produce identical collision-induced dissociation (CID) fragments, suggesting that they are stereoisomers. The most abundant fragments generated by CID of P1 and P2 are shown. Molecular formulae were deduced from the molecular masses. As all the fragment ions of P1 and P2 that are generated from D3-labelled 5mC are 2 Da larger than those from unlabelled 5mC, the new modification is likely to occur at the methyl group; the bridging methylene linked to the pyrimidine ring seems unaltered in the CID. P1 and P2 appeared to lose three H2O molecules (molecular mass 18.0100) consecutively in CID, indicating the presence of three hydroxyl groups in the P1 and P2 structures. Data are representative of two independent experiments.
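
The mass arithmetic behind this deuterium tracing can be checked with a short calculation. The following is only a sketch: the elemental compositions are assumptions derived from the compound names (neutral 5-methyl-2′-deoxycytidine and 5-(2,3,4-trihydroxybutyl)-2′-deoxycytidine), the isotope masses are standard monoisotopic values, and the function name mono_mass is illustrative.

```python
# Standard monoisotopic masses (Da) of the isotopes used below.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146,
        "D": 2.01410178}


def mono_mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MASS[element] * count for element, count in formula.items())


# Assumed compositions of the neutral nucleosides, derived from their names.
methyl_dC = {"C": 10, "H": 15, "N": 3, "O": 4}   # 5-methyl-2'-deoxycytidine
product = {"C": 13, "H": 21, "N": 3, "O": 7}     # P1/P2 nucleoside

# Net mass gained by the modification (composition C3H6O3).
modification = mono_mass(product) - mono_mass(methyl_dC)

# Each H replaced by D adds the D - H mass difference.
d_minus_h = MASS["D"] - MASS["H"]
substrate_shift = 3 * d_minus_h   # CD3 substrate versus CH3 substrate
product_shift = 2 * d_minus_h     # only the bridging CD2 is left in P1/P2

print(f"modification: {modification:.4f} Da")
print(f"substrate shift: {substrate_shift:.4f} Da")
print(f"product shift: {product_shift:.4f} Da")
```

Under these assumptions the modification is roughly 90 Da, and the loss of one of the three deuterium atoms accounts for the product fragments being about 2 Da, rather than 3 Da, heavier than their unlabelled counterparts.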

Extended Data Fig. 4 NMR signal assignments support P1 identity as 5-(1-[2,3,4-trihydroxybutyl])-2′-deoxycytidine.

a, 1H-NMR spectrum of P1 with signal assignments. The spectrum shows all the non-exchangeable proton signals with their chemical shifts and J-coupling constants for P1 (Extended Data Table 1). b, 1H-1H 2D COSY spectrum for P1 with assignments. The sequential positions of protons are shown in two spin-coupling systems: δH 6.299–2.320/2.437–4.455–4.062–3.773/3.860 in a deoxyribosyl moiety, and δH 3.813/3.664–3.615–3.811–2.793/2.505 in the trihydroxybutyl (THB) moiety. c, 1H-1H 2D TOCSY spectrum for P1 with assignments. Three coupling systems were observed in this TOCSY spectrum. The first coupling system showed a typical signal pattern for a deoxyriboside moiety, here with seven protons at δH 6.299 (1H, t, H1′), 4.455 (1H, m, H3′), 4.062 (1H, m, H4′), 3.860 (1H, dd, H5′b), 3.773 (1H, dd, H5′a), 2.437 (1H, ddd, H2′b) and 2.320 (1H, dt, H2′a). The second one was observed for six protons at δH 3.813 (1H, H10b), 3.811 (1H, ddd, H8), 3.664 (1H, dd, H10a), 3.615 (1H, ddd, H9), 2.793 (1H, ddd, H7b) and 2.505 (1H, ddd, H7a). A third coupling system was observed as a weak correlation between δH 7.759 (1H, t, H6) and a CH2 moiety (H7a and H7b, δH 2.793 and 2.505). d, 1H-1H JRES spectrum for P1 showing J-coupling patterns from all protons (Extended Data Table 1). The F1 dimension gives coupling constants (Hz) and the F2 dimension gives chemical shift information. e, 1H-13C 2D HSQC spectrum for P1 with assignments. The direct H–C linkages were detected by the one-bond 1H–13C correlations in this HSQC spectrum. f, 1H-13C 2D HMBC spectrum for P1 with assignments. The long-range 1H–13C correlations were detected in the HMBC spectrum. The proton at δH 7.759 showed long-range correlations with C2, C4 and C5 (δC 159.98, 168.53 and 107.64, respectively) of a cytosine residue, with C7 of the THB moiety (δC 33.64), and with the deoxyribosyl C1′ (δC 88.95). This indicates that C7 (CH2) of the THB moiety was attached to C5 of the cytosine ring. This was further confirmed by long-range correlations between H7 (δH 2.793, 2.505) and C4, C5, C6, C8 and C9 (δC 168.53, 107.64, 143.83, 72.56 and 76.94). The long-range correlations between H1′ (δH 6.299) and C2 and C6 (δC 159.98 and 143.83) in the HMBC spectrum further confirmed the N1–C1′ linkage between the deoxyribosyl and cytosine moieties. Taking the above into consideration, P1 was finally determined as 5-(1-[2,3,4-trihydroxybutyl])-2′-deoxycytidine (Fig. 2c) with its 1H and 13C signals unambiguously assigned and tabulated in Extended Data Table 1. Representative results are shown from two independent experiments.

Extended Data Fig. 5 P2 is determined as a stereoisomer of P1.

a, 1H NMR spectrum for P2 with signal assignments. b, 1H-1H COSY spectrum for P2 with assignments. c, 1H-1H TOCSY spectrum for P2 with assignments. d, 1H-1H JRES spectrum for P2. e, 1H-13C HSQC spectrum for P2 with assignments. f, 1H-13C HMBC spectrum for P2 with assignments. In the same manner as in Extended Data Fig. 4, the structure of P2 (Fig. 2c) was determined as 5-(1-[2,3,4-trihydroxybutyl])-2′-deoxycytidine using a 1H NMR spectrum and a series of 2D NMR spectra, indicating that P2 is a stereoisomer of P1. Unlike in P1, there were stronger coupling relationships among H8, H9, H10a and H10b, which resulted in more complicated peak splitting in P2. Therefore, accurate chemical shifts and coupling constants were simulated with NMR-Sim5.4 to achieve the maximum similarity to the experimental data (Extended Data Table 1). Representative results are shown from two independent experiments.

Extended Data Fig. 6 Comparison of co-factor requirements of CMD1 and hTET2.

a, The 90-Da modification on 5mC does not originate from CMD1 or co-purified small compounds. The CMD1 protein was purified from E. coli grown in M9 medium with 12C- or 13C-labelled glucose as the only carbon source. The lack of mass increase in P1 generated with the 13C-CMD1 preparation suggests that the P1 modification is derived from a reaction component rather than a compound co-purified with the CMD1 enzyme. Data are representative of two independent experiments. b, O2 is indispensable for CMD1 activity. P1 and P2 were not detectable unless O2 was bubbled into a reaction mixture that was incubated under an N2 atmosphere in a glove box. Data are representative of two independent experiments. c, Mass analysis of P1 nucleoside from reactions using 18O-labelled oxygen or water. The mass of P1 nucleoside remained unaltered compared to that of P1 obtained from the reaction using unlabelled oxygen or water. Data are representative of two independent experiments. d, 2-OG is not required for CMD1. Reactions were performed under indicated conditions and HPLC was used to analyse the nucleosides of DNA products. N-oxalylglycine (N-OG), an analogue of 2-OG, does not inhibit the activity of CMD1. Data are representative of two independent experiments. e, Fe2+ is indispensable for CMD1 activity. Reactions were performed in the presence of indicated metal ions or EDTA. Data are representative of two independent experiments. f, 2-OG and Fe2+, but not vitamin C, are required for the activity of hTET2. Reactions were performed under indicated conditions. N-OG inhibits the activity of hTET2. Data are representative of two independent experiments. g, Analogues of vitamin C do not support CMD1 activity. Data are representative of at least three independent experiments. h, Dehydroascorbic acid (DHA), an oxidized form of vitamin C, supports CMD1 activity only upon its reduction into vitamin C by DTT.
The conversion of DHA into vitamin C by DTT was confirmed by mass spectrometry analysis (not shown). Data are representative of at least three independent experiments. i, Heat-inactivated vitamin C (100 °C overnight) does not support CMD1 activity. Data are representative of two independent experiments.

Extended Data Fig. 7 Characterization of reaction mechanism of CMD1.

a, Mass spectrometry analysis of P1 nucleoside from reactions using various 13C-labelled vitamin-C co-substrates. The use of [13C6]-VC led to a 3-Da increase in the mass of P1, whereas no mass change was detected when [1-13C]-VC or [3-13C]-VC was used. This indicated that the glyceryl moiety was from C4–C6 of vitamin C. Data are representative of two independent experiments. b, Mass determination of the most abundant fragment ions generated by CID of P1. Arched arrows denote the relationship of ions featuring the loss of 13C carbons (top three panels) and loss of 12C carbons (bottom panel). The masses corresponding to the fragments containing 13C atoms are indicated in red. These data indicate that 13C6 of vitamin C ends up in the distal carbon of the side chain of P1 (C10 in Fig. 2c), and that 13C from [5-13C]-VC ends up in C9. Data are representative of two independent experiments. c, Proposed mechanism of CMD1 catalysis. The catalysis starts with the coordination of Fe(II) to the conserved 2-His-1-carboxylate triad of the enzyme, leaving three sites on the metal that are occupied by water molecules (A). Deprotonated vitamin C displaces two bound water molecules and coordinates to Fe(II) with its C-1 carbonyl group and C-2 alkoxide (B). Hydrolysis of the bound vitamin C yields the ring-opened intermediate (C), which then tautomerizes to the α-keto form (D). The remaining bound water molecule leaves when 5mC binds to the active site (E). The binding of O2 to the iron centre generates an Fe(III)-superoxo intermediate (F). The nucleophilic attack of the distal oxygen onto C-2 of 2-keto-L-gulonate yields an Fe(IV)-peroxo species (G). This species initiates oxidative decarboxylation of vitamin C to produce an Fe(IV)-oxo species, which is coordinated with the C-1 carboxylate of the resulting L-xylonic acid (H). The Fe(IV)-oxo species abstracts a hydrogen atom from 5mC to generate an Fe(III)-hydroxide species and a 5mC radical (I). The C-2 hydroxyl group of the coordinated L-xylonic acid binds to the Fe(III) centre with loss of a bound water molecule (J). Homolysis of the C2–C3 bond of the coordinated L-xylonic acid and non-stereoselective attack of the 5mC radical lead to the formation of the product nucleobases P1 and P2 and Fe(II)-bound glyoxylic acid (K). Eventually, glyoxylate dissociates from the iron centre to complete the catalytic cycle. The side reaction that generates 5hmC can be explained by this reaction mechanism; the 5mC radical combines with a hydroxide group linked to Fe(III) (intermediate I), in a manner similar to reactions catalysed by TET dioxygenases. Notably, however, the generation of trace 5hmC is not dependent on 2-OG (Fig. 3a, Extended Data Fig. 6d), confirming that a different mechanism is involved. d, GC–MS analysis of the co-product CO2 from CMD1-catalysed reactions using 13C-labelled vitamin C. The reactions were carried out in airtight vials and directly subjected to GC–MS analysis. The carbon atom of CO2 is shown to come from the C1 of vitamin C. Data are representative of two independent experiments. e, Mass spectrometry analysis of the co-product glyoxylic acid upon DNP derivatization. As C4–C6 and C1 of vitamin C were transferred into base P and CO2, respectively, the remaining two carbons of vitamin C were converted into glyoxylic acid. This is in close agreement with the mass increases of the glyoxylic acid derivatives when using uniformly labelled (13C6) and singly labelled (3-13C) vitamin C. The arrow indicates the peak of the DNP conjugate in the LC profiles. Data are representative of two independent experiments.
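
The carbon bookkeeping in this legend (C1 of vitamin C to CO2, the two remaining carbons C2 and C3 to glyoxylic acid, C4–C6 to the side chain of base P) can be written as a small lookup that predicts the nominal mass shift of each product for a given 13C labelling pattern. This is an illustrative sketch only; the names CARBON_FATE and predicted_shifts are hypothetical, and shifts are counted as whole 13C atoms (about 1 Da each).

```python
# Fate of each vitamin C carbon as described in the legend:
# C1 -> CO2; C2 and C3 -> glyoxylic acid; C4-C6 -> side chain of base P.
CARBON_FATE = {
    1: "CO2",
    2: "glyoxylic acid",
    3: "glyoxylic acid",
    4: "P",
    5: "P",
    6: "P",
}


def predicted_shifts(labelled_positions):
    """Nominal mass increase (Da) of each product for the given 13C positions."""
    shifts = {"CO2": 0, "glyoxylic acid": 0, "P": 0}
    for position in labelled_positions:
        shifts[CARBON_FATE[position]] += 1
    return shifts


uniform = predicted_shifts(range(1, 7))   # uniformly labelled [13C6]-VC
c1_only = predicted_shifts([1])           # [1-13C]-VC
c3_only = predicted_shifts([3])           # [3-13C]-VC

for name, shifts in [("13C6", uniform), ("1-13C", c1_only), ("3-13C", c3_only)]:
    print(name, shifts)
```

With this mapping, uniform labelling shifts P by three units while labelling at C1 or C3 leaves P unchanged, which is the pattern reported in panel a.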

Extended Data Fig. 8 Generation of a cmd1 strain using a CRISPR–Cas9-based co-selection strategy and co-segregation of the high light-sensitive phenotype with the CMD1 mutation.

a, The conversion of indole to tryptophan is catalysed by the tryptophan (Trp) synthase-β subunit encoded by the endogenous MAA7 gene in C. reinhardtii. When 5-fluoroindole (5-FI) is used in place of indole, it will be converted into 5-fluorotryptophan, which is lethally toxic to cells. b, The CRISPR–Cas9-mediated co-selection strategy to introduce a mutation into C. reinhardtii. Recombinant Cas9 protein purified from E. coli was assembled with sgRNA for both the MAA7 gene and a target gene of interest to form RNP complexes. Upon electroporation of the mixture of the two RNP complexes into cells, 5-FI-resistant colonies were selected and genotyped to identify clones with a desired mutation in the targeted gene. The mutant strains were then backcrossed with the wild-type strain to segregate the target gene mutation from the MAA7 mutation or any other off-target mutations. c, The genomic loci of CMD1 (also known as CrTET1) and its close paralogue CrTET2. At the CMD1 locus of cmd1 cells, there is an insertion of 245 bp in exon 3, thus generating a frame-shift mutation. Chromosome locations of the two paralogues are indicated above. DNA sequences from the targeted loci in wild-type and cmd1 strains are shown below. The 3-nt protospacer adjacent motif (PAM) (red) and 20-nt sgRNA-binding sequences (blue) are distinctively coloured. d, Genomic PCR genotyping of the cmd1 strain using two primer pairs as shown in c. Sizes expected for the PCR products are indicated. Note that the forward primer of primer pair 1 (c) can bind to both the CMD1 and CrTET2 genomic loci. The forward primer of primer pair 2 is specific for a site upstream of CMD1. Representative image from at least three independent experiments. e, Southern blot analysis of the CMD1 genomic locus. The locations of the probe (dark blue bar) and the SalI and NheI restriction sites used for the digestion of the genomic DNA are indicated in c. 
Two bands detected in the lane of the cmd1 DNA sample arose from the mutant CMD1 locus with a 245-bp insert and the unaltered CrTET2 paralogous locus of almost identical sequence, respectively. The expected lengths of the detected restriction fragments are shown in parentheses. Representative image from two independent experiments. f, RT–PCR analysis of the region spanning the targeted site of exon 3. The expected lengths of PCR products from the wild-type and cmd1 cells are shown in parentheses. Representative image from two independent experiments. g, Co-segregation analysis of the CMD1 mutation in the progeny of a cross between wild-type CC124 and the cmd1 strain. Equal numbers of the cells were dripped onto agar plates and exposed to low light (20 μmol photons m−2 s−1) or high light (1,000 μmol photons m−2 s−1) for 66 h. A1 and A2 are the cmd1 and wild-type CC124 cells, respectively. Red circles mark the clones of the parental cmd1 strain and the progeny lines, for which growth was inhibited under high light. Forty-eight progeny clones were tested and 14 representative clones are shown here. Right, result of algal colony PCR for genotyping of the progeny clones. Primer pair 2 shown in c was used. For source data in panels d–g, see Supplementary Fig. 1.

Extended Data Fig. 9 Role of vitamin C in the regulation of LHCSR3 expression and NPQ.

a, Generation of vtc2 mutant strains. Genomic structure of the VTC2 gene and the sequences flanking the Cas9 cleavage site (downward arrows) in wild-type (WT) and mutant strains. An 83-nt donor oligonucleotide carrying a frame-shift mutation (insertion of A) was co-electroporated into algal cells for homology-directed repair (HDR) with VTC2 in the CRISPR–Cas9-based co-selection procedure (Extended Data Fig. 8b). Out of 48 5-FI-resistant MAA7 mutant clones obtained, 7 were identified to be vtc2 mutants by sequencing. Among them, two clones (numbers 1 and 2) carried the desired insertion of an A, apparently derived from HDR-mediated editing, and the other five clones (3–7) carried indels, arising from non-homologous end joining. In the wild-type gene sequence, the 20-nt sgRNA-binding (blue) and 3-nt PAM (red) sequences are distinctively coloured. b, Cellular vitamin C content in wild-type, vtc2 and cmd1 mutant strains determined by LC–MS. Cells were cultured in TAP medium under continuous illumination with 50 μmol photons m−2 s−1. Data are mean ± s.e.m. of two independent biological replicates with individual data shown as shapes. c, Methylation analysis of the genomic locus 5′ of the LHCSR3.1 gene in wild-type and vtc2 strains after exposure to high light (300 μmol photons m−2 s−1). The open and black circles represent unmethylated and methylated CpG sites, respectively. Representative results from two independent experiments. d, Determination of mRNA expression of LHCSR3.1 and LHCSR3.2 in wild-type and vtc2 strains after exposure to high light (300 μmol photons m−2 s−1). The expression of LHCSR3.1 and LHCSR3.2 was first normalized to the expression of the housekeeping gene GBLP, and the resulting values were compared to those of wild-type samples, which were set to 1.0. Data are mean ± s.e.m. of two independent biological replicates with individual data shown as shapes. e, NPQ induction kinetics of wild-type and mutant strains. 
Cells were grown under a light intensity of 180 μmol photons m−2 s−1 for 24 h. NPQ was then recorded upon illumination with 600 μmol photons m−2 s−1 for 5 min (white bar) followed by 2.5 min in darkness (black bar). Data are mean ± s.e.m. of five independent biological replicates. f, VTC2 mRNA expression in wild-type and cmd1 strains after exposure to high light (300 μmol photons m−2 s−1). Real-time RT–PCR analysis was used for quantification. The expression of VTC2 was first normalized to the expression of GBLP and the resulting values were compared to that of the wild-type sample, which was set to 1.0. Data are mean ± s.e.m. of four independent biological replicates with individual data shown as shapes.
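
The two-step normalization described for the expression panels (first to the housekeeping gene GBLP, then to the wild-type sample set to 1.0) amounts to a ratio of ratios. The sketch below assumes linear-scale expression values; the function name and the numbers are hypothetical and are not data from this study.

```python
def relative_expression(target, reference, wt_target, wt_reference):
    """Expression normalized to a reference gene, relative to wild type (= 1.0)."""
    sample_ratio = target / reference      # step 1: normalize to the reference gene
    wt_ratio = wt_target / wt_reference    # same normalization for the wild type
    return sample_ratio / wt_ratio         # step 2: express relative to wild type


# Hypothetical linear-scale signals, for illustration only.
wt_value = relative_expression(200.0, 100.0, 200.0, 100.0)
mutant_value = relative_expression(50.0, 100.0, 200.0, 100.0)

print(wt_value, mutant_value)
```

By construction the wild-type sample evaluates to 1.0, so a mutant value below 1.0 indicates lower reference-normalized expression than in the wild type.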

Extended Data Fig. 10 Functional analyses of vitamin-C-derived modification in C. reinhardtii.

a, Quantification of 5gmC and 5mC nucleosides in genomic DNA from the wild-type CC125 strain treated with 400 μM 5-aza. Data are mean ± s.e.m. from three independent biological replicates (circles). Two-tailed Student’s t-test was used without adjustment for multiple comparisons. b, Determination of the electron transport rate of wild-type and cmd1 cells with Dual-PAM-100. Cells were prepared as for Fig. 4c. Data are mean ± s.e.m. from three independent biological replicates. c, Expression of photosynthesis-related genes in cmd1 cells determined by RNA-seq analysis. Cells were grown under high light (300 μmol photons m−2 s−1). Expression levels are relative to the wild type, which is set as 1.0. d, Volcano plot showing differentially expressed genes (DEGs) in cmd1 cells versus wild-type cells. n = 3. The analysis was based on edgeR’s quasi-likelihood F-test, which is a two-sided test without adjustment for multiple comparisons. e, Gene Ontology (GO) analysis of DEGs in cmd1 cells. n = 3. Functional enrichment was based on one-sided Fisher’s exact test and the top significant GO terms were selected without adjustment for multiple comparisons. f, Nucleotide contexts enriched in differentially methylated cytosines in cmd1 cells compared to the wild type. g, Genomic feature distribution of differentially methylated regions (DMRs) in cmd1 mutant cells compared to the wild type. DMRs were filtered by length (at least 400 bp) and by the difference in methylation ratio between wild-type and cmd1 cells (at least 20% methylation change). The DMRs were annotated and analysed for feature distribution. h, DNA methylation frequency distribution in wild-type and cmd1 mutant cells. The cytosines were categorized into ten intervals on the basis of their methylation levels and their numbers in each interval were counted. i, Abundance of 5mC at genes with low and high expression in wild-type cells. 5mC has slightly higher abundance in genes expressed at lower levels. All genes were divided into the low 50% and high 50% expression categories. Methylation at –2 to 0 kb upstream of the TSS was analysed. n = 2. The two-sided Wilcoxon signed-rank test was used without adjustment for multiple comparisons. j, Comparison of the expression of hypermethylated and hypomethylated genes in cmd1 cells and wild-type cells. Hypermethylated genes show reduced expression. Methylation at –2 to 0 kb upstream of the TSS was analysed. n = 2. Two groups of genes were chosen by controlling the FDR at 0.001 after adjustment for multiple comparisons. The two-sided Wilcoxon signed-rank test was used. In box plots, outer edges show first and third quartiles; midline indicates median. Whiskers indicate the maximum and minimum values within 1.5 times the interquartile range. k, Gene Ontology of genes that are differentially methylated at the promoter region in cmd1 cells. n = 2. Two-sided Fisher’s exact test was used without adjustment for multiple comparisons. l, Methylation pattern at the genomic locus of LHCSR3.1 in wild-type and cmd1 mutant cells. Vertical bars indicate the methylation level at individual CpG dyads. The grey-shaded area indicates the region analysed in Fig. 4f. Representative image from two independent experiments.
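The DMR filter in panel g and the ten-interval binning in panel h can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `DMR` record, its field names and the half-open coordinate convention are hypothetical, methylation levels are taken as fractions between 0 and 1, and the 20% threshold is read as an absolute difference in methylation ratio.

```python
from dataclasses import dataclass


@dataclass
class DMR:
    # Hypothetical record layout; coordinates assumed half-open [start, end).
    chrom: str
    start: int
    end: int
    wt_level: float   # methylation ratio in wild type, 0.0 to 1.0
    mut_level: float  # methylation ratio in cmd1, 0.0 to 1.0


def filter_dmrs(dmrs, min_length=400, min_delta=0.20):
    """Keep regions of at least min_length bp whose methylation ratio
    differs by at least min_delta between the two strains."""
    return [
        d for d in dmrs
        if (d.end - d.start) >= min_length
        and abs(d.mut_level - d.wt_level) >= min_delta
    ]


def bin_methylation(levels, n_bins=10):
    """Count cytosines per methylation interval (ten equal-width bins)."""
    counts = [0] * n_bins
    for level in levels:
        # A level of exactly 1.0 is placed in the last interval.
        index = min(int(level * n_bins), n_bins - 1)
        counts[index] += 1
    return counts
```

Equal-width intervals are an assumption here; the legend states only that ten intervals were used.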

Extended Data Fig. 11 CMD1 regulates LHCSR3 expression by promoting DNA demethylation through 5gmC generation.

a, Schematics of the CMD1 and LHCSR3 transgene expression constructs used for complementation of the cmd1 strain. The paromomycin resistance marker (AphVIII) was used for selection of transgenic clones. The HSP70A–RBCS2 fusion promoter (HSRB) drives transgene expression. An HA epitope added to the C terminus of CMD1 allows detection of the fusion protein. b, Western blot analysis of CMD1–HA expression in wild-type cells, cmd1 cells and cmd1 cells complemented with wild-type CMD1–HA (WT-1 and -2) or mutant CMD1–HA (HD-1 and -2) as indicated above. Anti-HA antibody was used for detection. Detection with anti-α-tubulin provided a sample processing control. Wild-type and cmd1 strains without the CMD1–HA transgene served as negative controls. Representative results from two independent experiments. c, Western blot analysis of LHCSR3 protein in wild-type cells, cmd1 cells and cmd1 cells complemented with CMD1–HA or LHCSR3 as indicated above. Detection with anti-α-tubulin provided a sample processing control. Representative results from two independent experiments. For source data for b, c, see Supplementary Fig. 1. d, Erlenmeyer flasks containing cells as indicated growing photo-autotrophically after 16 h of exposure to high light (750 μmol photons m−2 s−1). Representative photographs from three independent experiments. e, Determination of the effect of 5mC and 5gmC on transcription in C. reinhardtii using a luciferase reporter assay. Luciferase reporters driven by promoters (HSRB or LHCSR3) containing unmodified cytosine, 5mC or 5gmC, prepared by M.SssI treatment or further treated by CMD1, were transformed into C. reinhardtii. The cells were collected at different time points for measurement of luciferase activity. The mock sample was transformed with an empty vector. The luciferase activity was normalized to the corresponding chlorophyll fluorescence and then compared to the value of the mock control, which is set to 1. Data are mean ± s.e.m. of two independent biological replicates (shapes). f, Schematic diagram of TET-BS sequencing analysis. In conventional bisulfite sequencing, C, 5fC and 5caC but not 5mC or 5hmC are converted into U by bisulfite treatment, which is read as T in PCR and sequencing. However, 5gmC is read as C, which is thus indistinguishable from 5mC or 5hmC. By TET treatment, both 5mC and 5hmC are oxidized into 5caC, which is then read as T in subsequent bisulfite sequencing. Therefore, only 5gmC (orange lollipop) in the starting DNA sample is read as C (blank lollipop, lower right) in TET-BS sequencing. g, Establishment of the TET-BS assay to distinguish 5gmC from all other forms. A lambda DNA fragment was used to test the feasibility of the assay. After methylation with M.SssI enzyme, all CpG sites are resistant to deamination and thus read as C in BS-seq. 5gmCs, which exist only in the CMD1-treated 5mC–λDNA, are detected as C because they are non-convertible in TET-BS treatment. Each circle represents a CpG site. Representative results from two independent experiments. h, BS-seq and TET-BS-seq analysis of the HSRB promoter used in the luciferase assay. Upon nuclear transformation of the cytosine-modified DNA, a substantial portion of 5gmC underwent conversion to C (reduced from 84.2% to 70.8%) and the high 5mC level remained. Notably, individual 5gmCs at neighbouring Cs on the same DNA template appear to behave differently. Although the mechanism of conversion is not clear, 5gmC might be lost slowly over time through DNA repair or an alternative demethylation process. Representative results from two independent experiments. i, ChIP analysis of the interaction of CMD1–HA with the 5′ genomic region of LHCSR3.1. The different regions of DNA fragments precipitated with anti-HA antibodies were amplified by qPCR. The region amplified by primer pair 3 (chromosome_8: 1947066–1947226) exhibited the strongest interaction with CMD1–HA. The enrichment relative to IgG was normalized to that of cmd1 cells, which was set as 1. Data are mean ± s.e.m. of two independent biological replicates (shapes).
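The read-out logic of panel f can be written as a small lookup, which makes explicit why only 5gmC is read as C in both assays. This is an idealized sketch that assumes complete bisulfite conversion and complete TET oxidation; the dictionary and function names are hypothetical and do not come from any analysis software used in the study.

```python
# Base call after conventional bisulfite sequencing, as stated in the legend.
BS_READOUT = {
    "C": "T", "5fC": "T", "5caC": "T",   # converted to U, read as T
    "5mC": "C", "5hmC": "C", "5gmC": "C",  # protected, read as C
}

# TET treatment oxidizes 5mC and 5hmC to 5caC; 5gmC is left unchanged.
TET_OXIDATION = {"5mC": "5caC", "5hmC": "5caC"}


def bs_read(base):
    """Base call for a cytosine form in conventional BS-seq."""
    return BS_READOUT[base]


def tet_bs_read(base):
    """Base call in TET-BS-seq: oxidize first, then apply bisulfite."""
    return BS_READOUT[TET_OXIDATION.get(base, base)]


def classify_site(bs_call, tet_bs_call):
    """Infer the possible cytosine forms from a pair of base calls."""
    if bs_call == "C" and tet_bs_call == "C":
        return "5gmC"
    if bs_call == "C" and tet_bs_call == "T":
        return "5mC or 5hmC"
    if bs_call == "T" and tet_bs_call == "T":
        return "C, 5fC or 5caC"
    return "inconsistent"  # T in BS but C in TET-BS is not expected
```

In practice each site is read across many DNA molecules, so the paired calls correspond to fractions of reads rather than single values; the sketch covers only the per-molecule logic.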

Extended Data Table 1 NMR assignment of compound P1/P2 and comparison of experimental and calculated 3JH–H coupling constants (Hz)

Supplementary information

Supplementary Figure 1: The original source images for all data obtained by thin-layer chromatography and electrophoretic separation. This file contains full scanned uncropped images for all data obtained by thin-layer chromatography or electrophoretic separation shown in the manuscript, with the size or molecular weight markers labeled.

Reporting Summary

Supplementary Table 1: Summary of the high-throughput sequencing data. This file contains a summary of the RNA-sequencing data and whole genome bisulfite sequencing data. The expression level of all genes in wild-type and cmd1 cells is calculated and shown in datasheet 1. The methylation level at the promoter region of all genes is shown in datasheet 2.

Fig. 1: CMD1 catalyses DNA modifications of 5-methylcytosine.
Fig. 2: Structural determination of the modified nucleosides P1 and P2.
Fig. 3: Vitamin C is required as a glyceryl donor in CMD1-catalysed modification of 5mC.
Fig. 4: Identification of the vitamin-C-derived modification and its function in the regulation of photosynthesis in C. reinhardtii.
Extended Data Fig. 1: Alignment of TET homologues in C. reinhardtii with Naegleria Tet1.
Extended Data Fig. 2: Purification of recombinant CMD1 and determination of DNA substrate specificity.
Extended Data Fig. 3: Deuterium tracing of the methyl group in 5mC–DNA.
Extended Data Fig. 4: NMR signal assignments support P1 identity as 5-(1-[2,3,4-trihydroxybutyl])-2′-deoxycytidine.
Extended Data Fig. 5: P2 is determined as a stereoisomer of P1.
Extended Data Fig. 6: Comparison of co-factor requirements of CMD1 and hTET2.
Extended Data Fig. 7: Characterization of reaction mechanism of CMD1.
Extended Data Fig. 8: Generation of a cmd1 strain using a CRISPR–Cas9-based co-selection strategy and co-segregation of the high light-sensitive phenotype with the CMD1 mutation.
Extended Data Fig. 9: Role of vitamin C in the regulation of LHCSR3 expression and NPQ.
Extended Data Fig. 10: Functional analyses of vitamin-C-derived modification in C. reinhardtii.
Extended Data Fig. 11: CMD1 regulates LHCSR3 expression by promoting DNA demethylation through 5gmC generation.
