The mesophilic archaeon Methanosarcina acetivorans counteracts uracil in DNA with multiple enzymes: EndoQ, ExoIII, and UDG

Cytosine deamination into uracil is one of the most prevalent and pro-mutagenic forms of DNA damage. Base excision repair is a well-known process for removing uracil from DNA, which is achieved by uracil DNA glycosylase (UDG), an enzyme found in all three domains of life. However, other strategies for uracil removal seem to have evolved in Archaea. Exonuclease III (ExoIII) from the euryarchaeon Methanothermobacter thermautotrophicus has been described to exhibit endonuclease activity toward uracil-containing DNA. Another uracil-acting protein, endonuclease Q (EndoQ), was recently identified in the euryarchaeon Pyrococcus furiosus. Here, we describe the uracil-counteracting system in the mesophilic euryarchaeon Methanosarcina acetivorans through genomic sequence analyses and biochemical characterizations. Three enzymes from M. acetivorans, UDG, ExoIII, and EndoQ, exhibited uracil cleavage activities in DNA with distinct ranges of substrate specificity in vitro, and the transcripts for these three enzymes were detected in M. acetivorans cells. Thus, this organism appears to conduct uracil repair using at least three distinct pathways. The distribution of homologs of these uracil-targeting proteins in Archaea showed that this tendency is not restricted to M. acetivorans, but is prevalent and diverse in most Archaea. This work further underscores the importance of uracil-removal systems for maintaining genome integrity in Archaea, including 'UDG-lacking' organisms.


Results
Preparation of MacEndoQ, MacExoIII, and MacUDG. The recombinant proteins used in this study were overproduced in E. coli and purified close to homogeneity. The protein bands appearing in the gels were consistent with the calculated molecular weights of each N-terminal His-tagged protein (Fig. 1). To confirm that the nuclease activities are intrinsic to the proteins of interest, inactive mutants MacExoIII E39A and MacEndoQ D192A were prepared by introducing site-specific mutations into the active site predicted by sequence alignments with each family protein. The mutation E39A in MacExoIII was generated in accordance with a highly conserved residue among ExoIII homologs because of its involvement in metal binding that is essential for nuclease activity (Supplementary Fig. S1) 38-40. The mutation D192A in MacEndoQ was generated in accordance with the essential residue for activity in P. furiosus 32. The family-4 UDGs are the most prevalent UDGs in the archaeal domain and have been characterized as glycosylases specific for deoxyuridine (dU) 21. Two putative family-4 UDGs, MA_RS11760 and MA_RS18745, were found in the M. acetivorans genome in the National Center for Biotechnology Information (NCBI) database. MA_RS18745 aligned better with the other characterized family-4 UDGs in Archaea, including the two catalytic motifs (Supplementary Fig. S2), and was therefore designated MacUDG. MA_RS11760 was found to deviate in the two catalytic motifs (Supplementary Fig. S2), and was therefore designated MacUDG-like.
MacEndoQ acts on uracil, hypoxanthine, and the AP site in DNA. To characterize the activity of MacEndoQ, DNA cleavage assays were conducted using normal DNA or DNA containing a single damaged base. As shown in Fig. 2A, MacEndoQ cleaved dsDNA containing dU, dI, or an AP site, generating 23-, 24-, and 24-nucleotide (nt) fragments, respectively. In contrast, no product was detected with undamaged DNA or with the inactive mutant MacEndoQ D192A. The prepared MacEndoQ sample was also incubated with supercoiled plasmid DNA and circular ssDNA, and we did not observe any non-specific endonuclease contamination (Supplementary Fig. S3). The strand opposite the damaged site remained intact (Supplementary Fig. S4). These results indicate that MacEndoQ incised the DNA backbone at the 5′-sides of the damaged sites. A similar cleavage pattern was observed when using ssDNA (Fig. 2B). Cleavage of AP-ssDNA was not detected under this condition, owing to its low cleavage efficiency compared to that of dsDNA (compare Fig. 2A,B, lane 18). Such decreased cleavage efficiency towards AP-ssDNA has also been described for other characterized EndoQ homologs 32,33. These findings suggest that MacEndoQ possesses endonuclease activity with the substrate specificity observed for EndoQ of the hyperthermophilic order Thermococcales and for the bacterial EndoQ from B. pumilus 32,33. Thus, MacEndoQ appears to be involved in the repair of dU, dI, and AP sites. This finding further suggests that other putative EndoQ homologs in Euryarchaeota are most likely functional, taking into consideration the high conservation in protein sequences and the phyletic analysis of this protein 33.
MacExoIII shows dU endonuclease activity as well as AP endonuclease activity. The ExoIII (Xth) protein family has been intensively investigated as 3′-5′ exonucleases and AP endonucleases, and the homologs show high evolutionary conservation. In some organisms, the 3′-phosphodiesterase activity of these proteins has also been reported 41,42. We first examined whether MacExoIII also exhibits these typical characteristics of the ExoIII family proteins (Fig. 3). MacExoIII WT digested blunt-ended dsDNA successively (Fig. 3A, lanes 9-14), while the ssDNA remained almost completely intact (Fig. 3A, lanes 2-7). Using 3′-recessed dsDNA, the substrate was digested in the same manner as blunt-ended dsDNA (Fig. 3B, lanes 2-7), whereas 3′-protruding ended dsDNA showed resistance to degradation (Fig. 3B, lanes 9-14), indicating that MacExoIII WT exhibits dsDNA-specific exonuclease activity in the 3′ to 5′ direction. A nick site in dsDNA (Fig. 3C, lanes 2-7) and dsDNA with 3′-phosphate termini (Fig. 3D, lanes 9-14) were also susceptible to digestion by MacExoIII WT, indicating that MacExoIII can exhibit exonuclease activity from a nick and possesses 3′-phosphomonoesterase activity. These properties of exonuclease activity are conserved in the typical ExoIII family proteins. In addition, we investigated the endonuclease activity of MacExoIII using DNA with or without damage (AP site, dU, or dI) (Fig. 4). To prevent the 3′-5′ exonuclease activity at the ends of DNA, a 3′-protruding structure was used in this experiment. When AP site-containing DNA was used, 24-nt products and shorter fragments were detected with both ssDNA and dsDNA, suggesting that MacExoIII WT cleaved the DNA backbone at the 5′-side of the AP site, followed by exonuclease digestion (Fig. 4A and B, lanes 8-12). The endonuclease activities were only detected on the damaged strands (Supplementary Fig. S5).
These findings are also consistent with the classical property of ExoIII, as represented by the homologs from E. coli, M. thermautotrophicus, and humans 41-43. Importantly, these data therefore suggest a role of MacExoIII as an AP endonuclease in the BER pathway. The shortest product of MacExoIII WT exonuclease digestion was 11 nt, suggesting that MacExoIII may require at least 11 nt of DNA for binding. Next, we investigated the endonuclease activity of MacExoIII towards dU and dI in DNA. Using ssDNA with a larger amount of MacExoIII, only unspecific cleavage from the end was observed, and MacExoIII WT did not show any specific endonuclease activity on either dU-ssDNA (Fig. 4B, lanes 14-18) or dI-ss/dsDNA (Fig. 4A,B, lanes 21-25) under this condition. In addition, we did not detect nicked products with 5′-labeled dU-dsDNA, and only very short DNA fragments were obtained (Fig. 4A, lanes 14-18). Further attempts to detect the intermediate forms of the observed products with decreased reaction time or a lower protein concentration were unsuccessful (data not shown). Interestingly, however, when DNA labeled on the opposite side (3′-end) was used, 22-nt products were detected, demonstrating incision by MacExoIII at the site immediately 5′ to dU (Fig. 4C). Since MacExoIII exhibited 3′-5′ exonuclease activity in a processive manner (Fig. 3B,C), it is possible that the nicked products of dU-dsDNA escaped detection. It must be noted that intact AP-dsDNA was completely digested with 5 nM MacExoIII, while dU-dsDNA was not (compare Fig. 4A lane 10 with lane 16), indicating a strong preference for the AP site over dU. This preference can also be explained by the fact that uracil recognition is restricted to dsDNA and does not occur with ssDNA. Surprisingly, the subsequent exonuclease activity from the resulting nick was suppressed with AP-DNA compared to that for dU or normal DNA (compare Fig. 3 with Fig. 4A).
To investigate the binding affinity of MacExoIII toward normal, AP-, and dU-containing DNA, we conducted an electrophoretic mobility shift assay (Supplementary Fig. S6). MacExoIII formed a stable protein-DNA complex in the presence of an AP site. In contrast, a protein-DNA complex was not detected in the case of dU-containing DNA, and only the degraded DNA fragments were observed. Considering that MacExoIII produced nicked DNA at a concentration of 5 nM against AP-containing DNA (Fig. 4 and Supplementary Fig. S5), these results suggest that MacExoIII binds persistently to the AP site after cleaving the strand.
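The fragment lengths reported above follow from simple arithmetic on where the label sits relative to the incision. The sketch below is only an illustration of that reasoning: the function name is hypothetical, and the 45-nt strand with the lesion at position 24 is an assumed substrate chosen so that the numbers reproduce the 23-nt (5′-labeled) and 22-nt (3′-labeled) products mentioned in the text. The actual oligonucleotide sequences are not given in this section, and it is not stated here whether both assays used the same oligonucleotide.

```python
def incision_products(total_len: int, lesion_pos: int) -> tuple[int, int]:
    """Fragment lengths after one incision immediately 5' of a lesion.

    total_len  -- length of the damaged strand in nt
    lesion_pos -- 1-based position of the damaged nucleotide, counted from the 5' end
    Returns (length of the 5' fragment, length of the 3' fragment).
    """
    if not 1 < lesion_pos <= total_len:
        raise ValueError("lesion must lie inside the strand and not at the 5' end")
    five_prime = lesion_pos - 1            # everything upstream of the lesion
    three_prime = total_len - five_prime   # lesion plus everything downstream
    return five_prime, three_prime


# Assumed (hypothetical) substrate: 45-nt strand, dU at position 24.
five, three = incision_products(45, 24)
print(f"5'-labeled product: {five} nt; 3'-labeled product: {three} nt")
```

A 5′ label reports only the upstream fragment and a 3′ label only the downstream one, which is why a nick that is quickly resected by the 3′-5′ exonuclease can be invisible with one label and visible with the other.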

MacUDG is functional and may act with both MacExoIII and MacEndoQ in a single pathway.
To investigate whether the two candidate proteins, MacUDG and MacUDG-like, have the ability to release damaged bases from DNA, enzymatic assays were performed using normal or damaged DNA. As shown in Fig. 5, MacUDG exhibited glycosylase activity towards uracil in DNA, but no activity was detected when using hypoxanthine, xanthine, and G/T mismatch-containing DNA. This result agrees with the common uracil-specific glycosylase activities detected in the family-4 archaeal UDGs 44-46. By contrast, MacUDG-like did not show any activity even at a concentration of 1 μM in the reaction mixture (Fig. 5), which may reflect the lack of some key catalytic residues (Supplementary Fig. S2). To investigate whether MacUDG and the AP endonucleases MacExoIII/MacEndoQ act in a single pathway in M. acetivorans, we carried out in vitro experiments to test whether MacExoIII/MacEndoQ cleaves the substrate generated by MacUDG. The results using dU-containing DNA showed that both MacExoIII and MacEndoQ can exhibit endonuclease activities on the product after UDG removes uracil from DNA (Supplementary Fig. S7).
All genes encoding MacExoIII, MacEndoQ, MacUDG, and MacUDG-like are expressed in M. acetivorans. To assess whether the genes encoding MacExoIII, MacEndoQ, MacUDG, and MacUDG-like are actually expressed in M. acetivorans cells, we conducted RT-PCR with or without RT enzyme using gene-specific primers. Besides these four proteins, we also examined the expression of the gene for the family-6 UDG (designated MacHDG) 24. Expression of the 16S rRNA gene and of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was analyzed as a positive control for RT-PCR. As shown in Fig. 6, the mRNAs of MacExoIII, MacEndoQ, MacUDG, MacUDG-like, and MacHDG were detected as amplified DNA products when the RT enzyme was added. The sizes of the amplified DNA products matched the expected sizes (Fig. 6 and Supplementary Table S1). This result indicates that the mRNA of each examined gene is present in M. acetivorans.

Distribution of enzymes involved in deaminated base repair in Archaea.
To examine the distributions of the proteins for deaminated base repair in the domain Archaea, we performed a comprehensive search for homologs of 12 enzymes in 95 archaeal genome sequences: ExoIII, EndoIV, endonuclease III (EndoIII), family 4, 5, 6 UDGs, AlkA, 8-oxoguanine glycosylase (OGG1), U/G and T/G mismatch-specific glycosylase (MIG), EndoQ, endonuclease V (EndoV), and endonuclease MS (EndoMS). EndoIII from Pyrobaculum aerophilum has been described as an AP lyase and glycosylase for 5,6-dihydrothymine 47,48. MIGs from M. thermautotrophicus and P. aerophilum have been described as glycosylases specific for mismatches, including U/G 49. Archaeal EndoV incises the DNA backbone 1 nt 3′ to the hypoxanthine and is thought to be potentially involved in hypoxanthine repair 50,51. EndoMS, which cleaves both strands of hypoxanthine-containing DNA, was recently identified as a mismatch-specific endonuclease from P. furiosus 52. Overall, this distribution analysis showed that several species have redundancy with respect to uracil repair proteins, suggesting that the archaeal domain has evolved a strong backup system (Fig. 7).
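A distribution analysis of this kind amounts to a presence/absence table of homologs per genome, from which redundancy can be read off by counting. The following is a minimal sketch of that bookkeeping and not the procedure used in this study: the genome names, the presence data, and the choice of which enzymes to count as uracil-acting are all hypothetical placeholders for illustration.

```python
# Hypothetical presence/absence data: genome -> set of enzyme families with a detected homolog.
presence = {
    "genome_A": {"UDG4", "ExoIII", "EndoQ", "EndoIII"},
    "genome_B": {"EndoQ", "EndoIII", "EndoIV"},
    "genome_C": {"UDG4", "EndoIV", "EndoIII"},
    "genome_D": {"EndoIII"},
}

# Assumed set of uracil-acting families to count (a choice made for this example only).
URACIL_ACTING = frozenset({"UDG4", "ExoIII", "EndoQ", "MIG"})


def uracil_redundancy(table: dict[str, set[str]],
                      enzymes: frozenset[str] = URACIL_ACTING) -> dict[str, int]:
    """Count, per genome, how many of the given enzyme families are present."""
    return {genome: len(found & enzymes) for genome, found in table.items()}


def redundant_genomes(table: dict[str, set[str]], minimum: int = 2) -> list[str]:
    """Genomes carrying at least `minimum` uracil-acting families, sorted by name."""
    counts = uracil_redundancy(table)
    return sorted(g for g, n in counts.items() if n >= minimum)


print(uracil_redundancy(presence))
print(redundant_genomes(presence))
```

Keeping the enzyme set as a parameter makes it easy to repeat the same count for another lesion class, for example the hypoxanthine repair enzymes.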

Discussion
Removal of uracil to repair DNA damage is generally accomplished by ubiquitously conserved UDG proteins in all three domains of life. However, our in vitro data indicate that the mesophilic euryarchaeon M. acetivorans appears to have developed a different strategy to counteract uracil damage to DNA with at least three distinct enzymes: EndoQ, ExoIII, and family-4 UDG. We further confirmed the expression of the individual genes in M. acetivorans, and their substrate specificities agreed with those of previously reported archaeal homologs, suggesting the existence of overlapping uracil-counteracting systems in this single organism. Based on data from a previous study demonstrating a correlation between bacterial genes and those present in the M. acetivorans genome 53 , we found that none of the proteins examined in this work (MacExoIII, MacEndoQ, MacUDG, and MacUDG-like) was acquired by horizontal gene transfer from bacteria, indicating that these proteins evolved in the archaeal domain and/or ancestor. We further revealed that (i) MacExoIII, MacEndoQ, and MacUDG all have enzymatic activity with uracil-specificity; (ii) their distributions across Archaea are not associated with each other; and (iii) there is a lack of sequence homologies in these proteins, indicating that the three proteins may not share a common ancestor (i.e., not derived by gene duplications). Therefore, we speculate that UDG, ExoIII, and EndoQ in M. acetivorans may comprise an individual repair pathway for uracil, each acting as a backup system for the other. Moreover, we found that MacExoIII and MacEndoQ may act on the MacUDG-catalyzed substrates (Supplemental Fig. S7), leading us to propose the potential uracil repair pathways in M. acetivorans (Fig. 8). Notably, MacExoIII, MacEndoQ, and MacUDG showed a unique range of substrate specificities and preferences. This may provide the driving force of the evolutionary stability underlying this redundancy 54 . 
Further studies are warranted to address which pathway is dominant or to determine the activation mechanisms of the respective pathway from this redundancy point of view.
A previous phylogenetic analysis of the UDG proteins raised the question of how 'UDG-lacking' archaea can remove uracil from DNA 18. A subsequent study proposed a potential answer by revisiting the properties of ExoIII from the 'UDG-lacking' organism M. thermautotrophicus to reveal its function as a dU endonuclease 28. Here, we extend the understanding of uracil-counteracting systems in Archaea, including 'UDG-lacking' organisms. It appears that several archaeal ExoIIIs exhibit dU endonuclease activity based on analyses at the amino acid level. Such endonuclease activities towards oxidative DNA lesions have been described in various bacterial ExoIIIs such as those of E. coli and Bacillus subtilis 30,55; however, since the uracil-recognition ability has thus far only been observed in Archaea and higher eukaryotes, we speculate that this recognition ability may have been acquired by the common ancestor of Archaea and Eukarya, as suggested previously 29. Importantly, our results further suggest that EndoQ also functions in many archaea besides the Thermococcales and may play a key role in keeping DNA free of uracil in 'UDG-lacking' organisms (Fig. 7 and earlier reports 32,33). Thus, our findings highlight the substantial redundancy in uracil repair proteins in Archaea, raising a new question: what is the advantage of having multiple co-existing repair pathways? Moreover, several orders such as Methanopyrales, Methanococcales, and Methanobacteriales seem to have lost UDG proteins entirely; however, the conserved EndoQ may function as a uracil repair enzyme in these particular orders instead of UDG. MIGs appear to be dispersed, but may contribute to the removal of uracil in Methanomicrobiales together with EndoQ. As shown in Supplementary Fig. S1, many archaeal ExoIIIs bear the four key residues responsible for uracil recognition (Lys125, Asn153, Ser171, and Arg209 in Mth212 38) and crucial residues for nuclease activity 39, suggesting intrinsic dU endonuclease activity. This speculation is reinforced by the fact that human APE1, which has less overall homology, bears weak dU endonuclease activity (Supplementary Fig. S1).

The distribution map revealed some additional interesting facts. First, surprisingly, EndoIII was found to be ubiquitously distributed in Archaea. This high conservation implies a significant contribution of this protein to maintenance of the archaeal genome; however, this issue has been poorly investigated to date. Importantly, considering their high conservation, most archaeal organisms seem to conduct BER using EndoIV and family-4 UDGs, which is supported by previous studies 56-59. We speculate that the archaeal BER pathway is regulated by the interplay of DNA glycosylases (UDG, MIG, AlkA, OGG1), AP lyase (EndoIII), and AP endonucleases (EndoIV, ExoIII). Given that most archaea or their ancestral organisms are considered to be thermophiles 35, we assume that an efficient backup system is required as a necessary survival tactic in extreme environments where DNA would be more susceptible to base deamination. Furthermore, many organisms in Archaea appear to have gone through drastic habitat changes in the course of evolution. For example, the ancestor of M. acetivorans had to cope with thermophilic to mesophilic temperatures, which would be considered an extreme environment. This can explain the relatively high diversity of otherwise conserved uracil-acting proteins, even among species classified in the same order such as Methanosarcinales or Methanomicrobiales (Fig. 7). It must be noted that the same trend was also observed with respect to the conservation of hypoxanthine repair enzymes (Fig. 7).
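Checking whether homologs "bear" key residues means locating the reference residue numbers (numbered on the ungapped reference sequence) as columns of a multiple sequence alignment and reading the other sequences at those columns. The sketch below illustrates that mapping under stated assumptions: the function names are hypothetical, and the toy alignment and residue numbers are invented to keep the example short. They are not the Mth212 alignment of Supplementary Fig. S1.

```python
def column_of(aligned_ref: str, residue_number: int) -> int:
    """0-based alignment column of the residue_number-th (1-based) non-gap reference residue."""
    seen = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError(f"reference has fewer than {residue_number} residues")


def residues_at(alignment: dict[str, str], ref_name: str,
                positions: list[int]) -> dict[str, str]:
    """For every aligned sequence, the residues found at the reference positions."""
    cols = [column_of(alignment[ref_name], p) for p in positions]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Toy (hypothetical) alignment; '-' marks a gap.
toy_alignment = {
    "ref":  "MK-ANS",
    "hom1": "MKTANS",
    "hom2": "MR-A-S",
}

# Reference residues 2, 4 and 5 (K, N, S) stand in for the key residues.
print(residues_at(toy_alignment, "ref", [2, 4, 5]))
```

A homolog conserves the key residues when its string equals that of the reference; a gap character at any position signals that the residue has no aligned counterpart.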
Archaeal replicative DNA polymerases, members of the B and D families, have specific properties that result in stalling at the replication fork when encountering uracil in the template strand 60,61; therefore, the association between uracil and DNA replication in Archaea is of particular interest. A previous study demonstrated inhibition of EndoQ and UDG activities by both family B and D DNA polymerases, suggesting potential interplay to prevent strand scission at the ongoing replication fork 62. To date, several archaeal family-4 UDGs have been reported to interact with proliferating cell nuclear antigen (PCNA). PCNA functions as a sliding clamp that tethers proteins such as DNA polymerase to DNA, thereby acting as a scaffold to facilitate the events on DNA. The interactions between PCNA and its interacting proteins are often achieved via the consensus peptide PCNA-interacting protein-box (PIP-box) 63. Family-4 UDG and EndoIV from P. furiosus have been proposed to act together by interacting with PCNA 44,44. This interaction appears to be well-conserved from archaeal UDGs to the human nuclear UDG (UNG2) 45,64,65, indicating strong coordination with PCNA; however, MacUDG lacks the PIP-box found in P. furiosus (Supplementary Fig. S2). Moreover, to our knowledge, archaeal ExoIIIs have not been found to interact with PCNA or other PCNA-interacting proteins. In contrast, only MacEndoQ seems to interact via the PIP-box 33. Although it remains possible that MacUDG and MacExoIII could interact with PCNA via an unidentified motif, the gene, and likewise its interacting partners, seems to have been selected among archaeal species during evolution. Since our study focused on in vitro protein characterization, it is currently not clear how these proteins mediate the pathway or in what context they are activated.
In future studies, revealing the functional connections of these uracil-acting proteins with DNA replication and repair-associated proteins may provide new clues into the development of cellular mechanisms for maintaining genome integrity.

Table S1). Each amplified gene was digested with NdeI and NotI and ligated into the corresponding sites of the expression vector pET22-28TEV, a derivative of pET21d (Novagen) in which the thrombin recognition site is replaced by the TEV protease recognition site and the kanamycin-resistance gene by the ampicillin-resistance gene. The resulting plasmids were designated pET-MA_RS10790 WT, pET-MA_RS03380 WT, pET-MA_RS11760, and pET-MA_RS18745. The expression plasmids for MacExoIII with the E39A mutation and for MacEndoQ with the D192A mutation were generated using the primers MA_RS10790-E39A and MA_RS03380-D192A (Supplementary Table S1