Guanine-5-carboxylcytosine base pairs mimic mismatches during DNA replication

The genetic information encoded in genomes must be faithfully replicated and transmitted to daughter cells. The recent discovery that TET family proteins consecutively convert 5-methylcytosine into 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxylcytosine (5caC) suggests these modified cytosines act as DNA lesions, which could threaten genome integrity. Here, we show that although 5caC pairs with guanine during DNA replication in vitro, G·5caC pairs stimulated DNA polymerase exonuclease activity and were recognized by the mismatch repair (MMR) proteins. Knockdown of thymine DNA glycosylase increased 5caC levels in the genome and affected cell proliferation via MMR, indicating that MMR is a novel reader for 5caC. These results suggest that 5caC, a product of epigenetic modification, behaves as a DNA lesion.

DNA methylation at the C5 position of cytosine (5mC) in the context of CpG dinucleotides regulates gene expression, retrovirus silencing, X chromosome inactivation, and other functions in mammalian cells 1. The enzymes responsible for this modification, i.e., DNA methyltransferases (DNMT1-3), are well characterized and are required for normal development in mice 2. Although DNA methylation was previously assumed to be a stable epigenetic modification, the recent discovery of the ten-eleven translocation (TET) family of DNA dioxygenases (TET1-3) has shown that the methyl group of 5mC can be oxidized to yield 5-hydroxymethylcytosine (5hmC), adding a layer of complexity to the epigenetic regulation of DNA methylation 3-5. Several studies have developed methods for genome-wide mapping of 5hmC by using either 5hmC-specific antibodies 6-8 or chemical labeling to enrich 5hmC-containing DNA 9,10. More recently, methods for mapping at single-nucleotide resolution were also reported 11,12. These studies suggest a role for 5hmC in transcriptional activation and repression in a genomic context-dependent manner 6,7,13.
5hmC is relatively stable and can be found in various mouse tissues and embryonic stem (ES) cells, although levels differ between cell types 14,15 ; therefore, 5hmC is viewed as an epigenetic modification.
During DNA replication, the maintenance methyltransferase DNMT1 restores symmetric CpG methylation with high specificity by methylating the unmethylated strand of a hemi-methylated CpG sequence, but it does not act on a hemi-hydroxymethylated CpG sequence, which could lead to passive DNA demethylation 16,17. Alternatively, 5hmC can be converted to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) via TET protein-mediated consecutive oxidations (Figure 1A) 18,19. Biochemical analyses suggest thymine DNA glycosylase (TDG) excises 5fC and 5caC, thereby generating an apurinic/apyrimidinic site that is in turn processed by the base excision repair (BER) machinery, suggesting an active DNA demethylation pathway 20. The importance of TDG in maintaining appropriate DNA methylation has been indicated by targeted Tdg allele disruptions in mice; these knockout mice exhibit aberrant DNA methylation in a subset of gene promoters and enhancers, coincident with dysregulated gene expression 21,22. Unlike knockout of other DNA glycosylases required for the BER pathway, Tdg knockout is embryonic lethal in mice, despite leaving DNA repair largely intact. Thus, TDG is essential for proper embryonic development, in part due to its role in maintaining epigenetic stability during cell-lineage commitment.
Although 5fC and 5caC are assumed to be part of the DNA demethylation pathway and should therefore have short half-lives, substantial amounts of 5fC and 5caC are present in various mouse tissues and ES cells 19. Recent studies showed that 5fC and 5caC are enriched at gene regulatory elements in Tdg-deficient ES cells, suggesting the involvement of 5fC and 5caC in transcriptional regulation 23,24. Other studies have suggested 5fC induces G·C to A·T transition mutations during DNA replication when DNA polymerase encounters 5fC on template-strand DNA 25-28.
If 5fC and 5caC behave as mutagenic bases, TET protein-mediated consecutive oxidations of 5mC and 5hmC would lead to deleterious consequences such as predisposition to cancer or apoptosis due to the accumulation of genomic mutations, unless 5mC oxidation is coupled with efficient elimination of 5fC and 5caC. Here, we studied the activity of DNA polymerases on oligonucleotide templates containing site-specific oxidized derivatives of 5mC and found that 5caC forms G·T mismatch-mimicking base pairs with unmodified guanine. These mismatch-like base pairs induced DNA polymerase exonuclease activity and were recognized by mismatch repair (MMR) proteins, suggesting a novel DNA damage effect of 5caC via unexpected abortive MMR.

DNA polymerases incorporate dGTP opposite modified cytosines.
To assess whether these modified cytosines, which are involved in epigenetic processes, behave as DNA lesions that induce genomic mutations or block DNA synthesis, we first investigated whether DNA polymerases catalyze DNA synthesis on templates containing a site-specific C, 5mC, 5hmC, 5fC, or 5caC. Figures 1B and 1C show that Klenow fragment exonuclease minus (KF exo-) synthesized ~30-mer DNA fragments on templates containing C, 5mC, or 5hmC, but synthesized fewer ~30-mer DNA fragments on templates containing 5fC or 5caC. In the case of 5caC, a small fraction of polymerases stalled briefly at the modified cytosine. These results suggest that 5fC and 5caC reduced the efficiency of DNA synthesis by KF exo-, although not through the pronounced stalling typically induced by DNA damage. Next, we examined the nucleotide preference for incorporation opposite a modified cytosine by KF exo-. The polymerase preferentially incorporated dGTP opposite modified cytosines, suggesting these cytosine derivatives were not highly mutagenic and that the less-efficient DNA synthesis observed in the case of 5fC and 5caC was not due to miscoding (Figure 1D). Human DNA polymerase η (Polη) permits replication past DNA lesions on templates 29,30; Polη exhibited a similar difference in DNA synthesis efficiency (Figure S1A and S1B) and preferentially incorporated dGTP opposite modified cytosines (Figure S1C).
5caC pairing with guanine stimulates the proofreading function of Polδ. Neither KF exo- nor Polη is capable of proofreading during DNA synthesis; therefore, we investigated whether human DNA polymerase δ (Polδ), which harbors an intrinsic 3′ to 5′ exonuclease domain, catalyzes DNA synthesis past the modified cytosines during replication. Interestingly, although Polδ synthesized DNA fragments on all templates, proofreading cleavage products were observed only with the 5caC templates (Figure 2A).
Next, we examined the nucleotide preference for incorporation opposite a modified cytosine by Polδ exo-, which catalyzes DNA synthesis (Figure S2) but lacks 3′ to 5′ exonuclease activity. As shown in Figure 2B, this polymerase also incorporated dGTP opposite modified cytosines.
When a nucleotide is misincorporated during DNA synthesis, the proofreading exonuclease activity of DNA polymerase removes the incorrect base. To test whether this exonuclease activity acts on the correct pairing of 5caC with guanine, polymerization reactions by Polδ were performed in the presence of dGTP as the only nucleotide. Fragment degradation by the exonuclease activity of Polδ was simultaneously observed on the 5caC templates (Figure 2C), indicating that 5caC pairing with guanine stimulates the proofreading function of Polδ.
We observed similar DNA synthesis and exonuclease effects for 5caC during DNA synthesis by Klenow fragment (KF exo+), which possesses 3′ to 5′ exonuclease activity during DNA synthesis (Figure S3). KF exo+ possesses intrinsic terminal-deoxynucleotidyl transferase activity; therefore, it could synthesize DNA fragments up to approximately 20 bp on all templates (Figure S3C). In the case of KF exo+, primers annealed with 5caC templates were more degraded than those annealed with 5fC templates, indicating that G·5caC pairings stimulate the proofreading function more than G·5fC pairings do.
MutSα complex recognizes G·5caC pairs in DNA substrates. The proofreading function of DNA polymerases plays an important role in correcting replicative mismatch errors. Our results suggest this proofreading occurs at G·5caC pairings but not at other cytosine pairings. Although 5caC forms appropriate base pairs with guanine, we hypothesized that these pairings behave like mismatches (Figure 3A). If this holds true, the mismatch repair (MMR) protein MutS should recognize these pairings as it does G·T mismatches, which are a canonical MutS substrate 31-33. To test this possibility, we performed electrophoretic mobility shift assays (EMSAs) with Taq MutS and 30-mer DNA substrates containing G·T, G·C, G·5mC, G·5hmC, G·5fC, and G·5caC. We observed a striking difference in MutS binding efficiency between these forms of cytosine. MutS bound G·T and G·5caC pairs (Figure S4A and S4B). The binding preference order was G·T = G·5caC > G·5fC > G·C = G·5mC = G·5hmC. Next, because the exonuclease activity of Polδ was observed only on the 5caC templates, we performed EMSAs with 34-mer G·5caC-containing DNA substrates and the human MMR protein complex MutSα, which consists of MSH2 and MSH6 (Figure S5). The MutSα complex is a human homolog of the MMR protein MutS and is indispensable for the mammalian MMR system 34,35. MutSα bound to the positive control G·T pairs and to the G·5caC pairs (Figure 3B); addition of excess cold G·T DNA substrates inhibited binding between MutSα and G·5caC DNA substrates (Figure 3C). To confirm this interaction, biotin-labeled G·5caC DNA substrates were incubated with HeLa whole cell extracts and the DNA-bound proteins were pulled down with streptavidin-coated beads; MSH2 and MSH6 were detected by immunoblotting. The results confirmed that the MutSα complex recognized G·5caC pairs in DNA substrates (Figures 3D and 3E).
Thus, the G·5caC pairs behaved similarly to a G·T mismatch when Polδ synthesized new DNA fragments opposite 5caC, although DNA polymerase correctly incorporated dGTP. In addition, G·5caC pairs may be subjected to MMR in mammalian cells.
Accumulation of 5caC affects cell proliferation. In the MMR system, exonuclease I removes the daughter DNA strand but cannot remove template DNA containing the modified cytosine. Thus, the "offending" site persists in the template. The ensuing abortive turnover of new DNA may result in a death response. Earlier studies have shown that TDG binds to these G·5caC pairs 36 and excises the modified cytosine 20. To investigate the effects of these G·T mismatch-mimicking base pairs in mammalian cells in vivo, we confirmed the expression levels of Tets, TDG, and MSH2 in various human cells (Figure S6), then knocked down TDG expression to induce the accumulation of G·5caC base pairs and counted the viable cells in TDG-knockdown cultures (Figure 4A and Figure S7). As expected, 5caC levels were increased in TDG-knockdown cells (Figure 4B and Figure S8) relative to control-knockdown or MSH2-knockdown cells. TDG-knockdown HeLa cells showed reduced viability (Figure 4C and 4D), indicating that the accumulated G·5caC base pairs are recognized by MMR, which induces the effects of DNA damage. This phenotype was partially rescued by knockdown of MSH2 expression. In addition, because Tet1-overexpressing 293 cells exhibit increased 5hmC and 5caC 19, we investigated the effects of TDG knockdown in Tet1-overexpressing 293 cells (Figure 4E and Figure S9). When Tet1 expression was induced by treatment with doxycycline (Dox), cell number was reduced in all cases, indicating that Tet1-modified cytosines behave as DNA lesions (Figure 4F). TDG knockdown led to reduced cell numbers, which were rescued by TDG-MSH2 double knockdown (Figure 4F). Thus, MMR was required for these DNA damage effects, which result in cell proliferation defects and decreased cell number.

Discussion
In this study, we investigated the effects of oxidized forms of 5mC on DNA synthesis by replicative or translesion DNA polymerases with DNA templates containing a site-specific C, 5mC, 5hmC, 5fC, or 5caC. Although DNA polymerase correctly incorporated dGTP opposite any modified cytosine, the amount of DNA degradation products generated by the exonuclease activity of Polδ was significantly higher with 5caC than with other modified cytosines. Base pairing of guanine and the imino tautomer of 5caC (Figure 3A) has the same geometry as a G·T mismatch 37 and was suggested in a previous study 38; the results of exonucleolytic degradation in our study may be attributed to this type of base-pair formation. Münzel et al. demonstrated intramolecular hydrogen bonding between the amino and formyl groups of 5fC, but suggested that a substantial shift of tautomer equilibrium toward the imino form was unlikely 27. Although it is very difficult to experimentally detect the unfavored imino tautomers of cytosine derivatives 39,40, electron-withdrawing substituents at the C5 position of cytosine may facilitate base pair formation with the minor tautomer because this type of substitution destabilizes the Watson-Crick G·C base pair 41. Formation of the G·5caC base pair in the same geometry as that of a G·T mismatch would stimulate the exonuclease activity of human Polδ.
To characterize the base-pair formation of oxidized 5mC from another viewpoint, we examined binding of Taq MutS and human MutSα to C5-modified duplexes. As shown in Figures 3B and S4A, these proteins bound to the duplex containing 5caC; binding competition with a G·T mismatch-containing duplex was apparent (Figure 3C). MutS wedges a Phe side chain into the mismatch site, where this side chain is stacked onto one of the mismatched bases. This interaction changes the orientation of the stacked base, which originally formed the G·T-type mismatch shown in Figure 3A, so that a hydrogen bond forms between this base and the adjacent Glu. Then, bifurcated hydrogen bonds form between the thymine O4 and the nitrogen atoms of guanine in the MutS-DNA complex 31,32. Our results support the formation of the G·5caC base pair shown in Figure 3A.
In living cells, DNA synthesis across 5caC, which neither induces a typical mutation nor blocks DNA polymerase during replication, may nevertheless lead to adverse effects. Although TDG can remove these modified forms of cytosine from the genome, substantial amounts of 5caC remain in tissues and cells 19. Therefore, when the 5caC on a DNA template pairs with the incoming dGTP via replicative DNA polymerase in the S phase, the delay in DNA synthesis may slow replication around the site and delay proper cell cycle progression. We showed that DNA polymerase generated G·T mismatch-mimicking G·5caC pairs that were recognized by the MutSα complex. MMR can eliminate the G·T mismatch-mimicking base pairs by removing the daughter strand, but cannot remove the modified cytosine. This process may induce abortive turnover of DNA synthesis (Figure 4G). As shown in Figure 4, MSH2 knockdown rescued the cell death phenotype induced by TDG knockdown. We suggest the G·T mismatch-mimicking base pairs formed by 5caC behaved as DNA lesions processed by MMR. This scenario is similar to a model of apoptosis triggered by O6-methylguanine (O6-meG) 34,35, which gives rise to O6-meG·T mismatches that are subject to abortive MMR and apoptosis. Therefore, 5caC induced by knockdown of TDG may drive genomic instability in mammalian cells, similar to O6-meG. TDG knockout leads to embryonic lethality in mice, presumably due to aberrant epigenetic modifications, especially DNA methylation status and transcriptional defects 21,22. However, G·T mismatch-mimicking G·5caC pairs may also contribute to embryonic lethality in TDG-knockout mice. Additionally, because 5caC residues behave as DNA lesions that slow DNA replication via the exonuclease function of DNA polymerase, they may need to be removed before replication to prevent formation of lethal DNA lesions.
MSH2 knockdown partially rescued the TDG-knockdown phenotype. This suggests that other effects of 5caC persist, independent of replication and MMR. Because the 5caC on transcribed strands induces transcriptional pausing 42 , a part of the residual lethality may be attributed to lesions encountered during transcription. Spruijt et al. reported that 5caC recruits a large number of DNA repair proteins in mouse ES cells, including BER (Neil1, Neil3, and Mpg) and MMR (Msh3 and Exo1) 43 . Thus, these DNA repair proteins may be involved in removing 5caC from genomic DNA to rescue the TDG-knockdown phenotype under certain circumstances. Interestingly, in TDG-knockdown mouse ES cells, no apoptotic effects are observed 23 . This may support the observation that 5caC decarboxylation exists in mouse ES but not HeLa cells 44 . This activity might determine the extent of cell viability in cells with accumulated 5caC DNA lesions.
Our results indicated that the electrostatic repulsion of the oxidatively modified cytosine 5caC paired with guanine influenced the exonuclease activity of DNA polymerases and the damage recognition step of MMR. This process may lead to abortive turnover of MMR. Thus, 5caC residues that were assumed to be intermediates for an active demethylation pathway may be oxidative DNA lesions that must be removed before replication. These findings provide an important new perspective on the potential functional interplay between cytosine modification status and replication.

Methods
DNA substrates. Thirty-mer DNA substrates containing 5mC or 5hmC were synthesized at Tsukuba Oligo Service and purified by high-performance liquid chromatography (HPLC). DNA substrates containing 5fC or 5caC were synthesized in an Applied Biosystems 3400 DNA synthesizer (Applied Biosystems) by using phosphoramidite building blocks purchased from Glen Research and were purified by HPLC. The oligonucleotides were 5′-phosphorylated using [γ-32P]ATP (PerkinElmer Life Sciences) and T4 polynucleotide kinase (TaKaRa). Unincorporated nucleotides were removed using MicroSpin G-25 columns (GE Healthcare).
In vitro DNA synthesis assays. DNA synthesis assays were performed as described 30. Briefly, the 5′-32P-labeled primer-template complex was prepared by mixing the primer with a template containing the indicated sequence context at a molar ratio of 1:1. Ten-microliter reaction mixtures containing 10 mM Tris-HCl (pH 7.9), 50 mM NaCl, 10 mM MgCl2, 40 nM of a labeled primer-template complex, and the indicated DNA polymerases were incubated. The reactions were terminated by adding 10 μL of stop solution containing 95% formamide, 10 mM EDTA, 0.025% bromophenol blue, and 0.025% xylene cyanol. The fragments were separated by electrophoresis on a denaturing polyacrylamide gel, dried, and analyzed using a Fuji FLA-7000 phosphorimager (Fujifilm).
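The quantities implied by these reaction conditions can be checked with a short calculation. The following is a minimal sketch, not part of the published protocol: the function names are hypothetical, and it assumes that the stop solution is added in a volume equal to the 10-μL reaction, so that every component is diluted twofold.

```python
# Illustrative arithmetic for the DNA synthesis assay conditions.
# Function names are hypothetical; the stop-solution volume (10 uL,
# equal to the reaction volume) is an assumption of this sketch.

def amount_pmol(conc_nM: float, volume_uL: float) -> float:
    """Amount of solute in pmol (nM x uL gives fmol; 1000 fmol = 1 pmol)."""
    return conc_nM * volume_uL / 1000.0

def after_dilution(conc: float, volume_uL: float, added_uL: float) -> float:
    """Concentration (same unit as conc) after adding added_uL of another solution."""
    return conc * volume_uL / (volume_uL + added_uL)

REACTION_UL = 10.0
STOP_UL = 10.0  # assumed equal to the reaction volume

# Labeled primer-template complex present in each reaction.
substrate_pmol = amount_pmol(40.0, REACTION_UL)

# Final concentrations once reaction and stop solution are mixed.
formamide_final_pct = after_dilution(95.0, STOP_UL, REACTION_UL)
edta_final_mM = after_dilution(10.0, STOP_UL, REACTION_UL)
mg_final_mM = after_dilution(10.0, REACTION_UL, STOP_UL)

print(f"primer-template per reaction: {substrate_pmol} pmol")
print(f"formamide after mixing: {formamide_final_pct}%")
print(f"EDTA after mixing: {edta_final_mM} mM; MgCl2 after mixing: {mg_final_mM} mM")
```

Under these assumptions each reaction contains 0.4 pmol of labeled substrate, and the mixed sample is loaded in roughly 47.5% formamide.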
Knockdown experiments. siRNA duplexes specifically targeting TDG (SI02665040) and MSH2 (SI02663563) and nontargeting control siRNAs (1027280) were purchased from Qiagen and transfected into cells using Lipofectamine RNAiMAX (Invitrogen) according to the manufacturer's instructions. Four days after siRNA transfection, the cells were trypsinized and viable cells were counted. Total RNA was isolated using RNeasy Mini Kit (Qiagen) and cDNA was generated with the SuperScript VILO Master Mix (Invitrogen). Real-time quantitative PCR (qPCR) was performed on an Mx3005P QPCR System (Agilent Technologies) using SYBR Green reagent (Roche Applied Science). cDNA levels of the target genes were analyzed by the comparative Ct method and normalized to ACTB. qPCR primers are listed in the Supplementary Table. To quantify apoptotic cells, a Tali Apoptosis Kit - Annexin V Alexa Fluor 488 and Propidium Iodide (Invitrogen) was used according to the manufacturer's instructions. Mass spectrometry analyses to quantify 5caC were performed as previously described 19.
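The comparative Ct calculation used for the qPCR analysis can be sketched as follows. This is a minimal illustration and not the authors' analysis script: it assumes the common 2^(-ΔΔCt) form of the method with a twofold increase in product per cycle (100% amplification efficiency), and the function name and all Ct values are hypothetical.

```python
# Minimal sketch of the comparative Ct (2^-ddCt) calculation.
# Assumptions: amplification efficiency of 100% for both amplicons;
# the function name and the example Ct values are hypothetical.

def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene in a sample relative to a control,
    normalized to a reference gene such as ACTB."""
    d_ct_sample = ct_target_sample - ct_ref_sample      # normalize sample to reference
    d_ct_control = ct_target_control - ct_ref_control   # normalize control to reference
    dd_ct = d_ct_sample - d_ct_control                  # compare sample with control
    return 2.0 ** (-dd_ct)

# Hypothetical example: TDG Ct rises by two cycles after knockdown
# while ACTB Ct is unchanged, giving one quarter of the control level.
fold = relative_expression(ct_target_sample=26.0, ct_ref_sample=18.0,
                           ct_target_control=24.0, ct_ref_control=18.0)
print(fold)  # 0.25
```

If the amplification efficiencies of the target and reference amplicons differ appreciably from 100%, an efficiency-corrected model would be needed in place of the fixed base of 2.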