Crystal structure and DNA cleavage mechanism of the restriction DNA glycosylase R.CcoLI from Campylobacter coli

While most restriction enzymes catalyze the hydrolysis of phosphodiester bonds at specific nucleotide sequences in DNA, restriction enzymes of the HALFPIPE superfamily cleave N-glycosidic bonds, similar to DNA glycosylases. Apurinic/apyrimidinic (AP) sites generated by HALFPIPE superfamily proteins are cleaved by their inherent AP lyase activities, by other AP endonuclease activities or by heat-promoted β-elimination. Although the HALFPIPE superfamily protein R.PabI, obtained from the hyperthermophilic archaeon Pyrococcus abyssi, shows weak AP lyase activity, HALFPIPE superfamily proteins in mesophiles, such as R.CcoLI from Campylobacter coli and R.HpyAXII from Helicobacter pylori, show significant AP lyase activities. To identify the structural basis for the AP lyase activity of R.CcoLI, we determined the structure of R.CcoLI by X-ray crystallography. The structure of R.CcoLI, obtained at 2.35-Å resolution, shows that a conserved lysine residue (Lys71), which is stabilized by a characteristic β-sheet structure of R.CcoLI, protrudes into the active site. The results of mutational assays indicate that Lys71 is important for the AP lyase activity of R.CcoLI. Our results help to elucidate the mechanism by which HALFPIPE superfamily proteins from mesophiles efficiently introduce double-strand breaks at specific sites in double-stranded DNA.

the D225N mutant of R.CcoLI in E. coli. Asp225 of R.CcoLI corresponds to the active site residue of R.PabI (Asp214) (Fig. 1a). Asp214 of R.PabI is employed to stabilize the oxocarbenium ion intermediate, to deprotonate the catalytic water and to bind substrate DNA; the D214N mutation reduces the DNA glycosylase activity and the substrate DNA-binding ability of R.PabI 11 . Similar to the D214N mutation of R.PabI, the D225N mutation of R.CcoLI is predicted to reduce its DNA glycosylase activity. The D225N mutation was introduced into R.CcoLI so that the otherwise toxic protein could be overexpressed in E. coli cells. After purification by Ni-NTA affinity chromatography, ion-exchange chromatography and gel-filtration chromatography, the highly purified R.CcoLI(D225N) mutant was obtained (see Supplementary Fig. S3 online). The gel-filtration analysis showed that R.CcoLI(D225N) forms a homodimer in solution, similar to R.PabI (Fig. 1b). The melting temperature of the R.CcoLI(D225N) mutant was 57.5-58.5 °C (Fig. 1c). The R.CcoLI(D225N) mutant retained sequence-specific DNA cleavage activity (Fig. 1d and Supplementary Fig. S4 online). Although we could obtain crystals of the R.CcoLI(D225N) mutant and their X-ray diffraction data, the refinement statistics of the R.CcoLI(D225N) mutant structure were poor. To improve the quality of the crystals, the C189S mutation was introduced into the R.CcoLI(D225N) mutant. Cys189 of R.CcoLI corresponds to Val181 of R.PabI. The side chain of R.PabI Val181 is exposed to solvent and does not interact with dsDNA 11,13,14 . The C189S mutation of R.CcoLI was predicted to reduce the heterogeneity of R.CcoLI that was caused by the oxidation of Cys189. The R.CcoLI(C189S-D225N) mutant was overexpressed and purified by a method similar to that employed for the R.CcoLI(D225N) mutant (Fig. 1b). The C189S mutation did not affect the melting temperature or the sequence-specific DNA cleavage activity of R.CcoLI (Fig. 1c,d).
These results suggested that the C189S mutation does not affect the function of R.CcoLI.
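The melting temperatures quoted above come from a dye-based thermal denaturation assay (20 to 95 °C in 0.5 °C steps, see Methods). The paper states only that the data were analyzed with the instrument software, so the following is a minimal sketch of one common way to read a melting temperature from such a curve, namely taking the temperature at which the first derivative of fluorescence is largest. The function name and the synthetic sigmoidal curve are assumptions made for illustration and are not data from this study.

```python
import math

def melting_temperature(temps, fluorescence):
    """Return the temperature at which dF/dT (central difference) is largest."""
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Synthetic two-state unfolding curve (hypothetical, for illustration only).
temps = [20.0 + 0.5 * i for i in range(151)]  # 20 to 95 °C in 0.5 °C steps
assumed_tm = 58.0                              # hypothetical midpoint
signal = [1.0 / (1.0 + math.exp(-(t - assumed_tm) / 1.5)) for t in temps]

tm = melting_temperature(temps, signal)
print(f"Estimated Tm: {tm:.1f} °C")
```

Real dye fluorescence usually drops again after unfolding because of aggregation, so a practical analysis would restrict the search to the rising part of the curve.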
The R.CcoLI structure was determined at a resolution of 2.35 Å using the crystal of the R.CcoLI(C189S-D225N) mutant. The final model contained one R.CcoLI dimer (chains A and B) in the asymmetric unit (Fig. 2a and Supplementary Fig. S5 online). R.CcoLI consists of five α-helices, fifteen β-strands and one 3₁₀ (η)-helix and forms the characteristic HALFPIPE structure using strands β3-β2-β5-β13-β12-β11. Similar to the R.PabI structure, the HALFPIPE region of R.CcoLI has a positively charged surface (Fig. 2b) 9,11,13,14 . Due to poor electron density, the structure models of the β2-β3 and β3-β4 loops of chain A, the β1-β2 loop, the β2-β5 region, the β6-β7 loop, the β11-β12 loop, the β12-β13 loop and the η1-β15 region of chain B are not included in the final model. The final model contains only ten water molecules, despite the relatively high-resolution structure (2.35 Å). These properties cause the structure to exhibit a relatively high Rfree value (Table 1). In the R.CcoLI structure, C189S is exposed to solvent, as predicted (Fig. 2a).
Structure comparison with R.PabI. When the structures of R.CcoLI and R.PabI (the product DNA-binding state, PDB: 3WAZ) were compared, the secondary structures of two regions (sites 1 and 2) were observed to differ between the proteins (Fig. 3a) 11 . Site 1 exists on helices α1, α4 and α5 of R.CcoLI. In this region of R.CcoLI, three catalytic residues of R.PabI, that is, Tyr68, His211 and Asp214, are conserved as Tyr52, His222 and Asp225, respectively. In the R.CcoLI structure, the insertion residues of R.CcoLI (Fig. 1a) form an antiparallel β-sheet (β6-β8-β7) adjacent to the active site (Fig. 3b and Supplementary Fig. S5 (Fig. 1a). However, in the R.CcoLI structure, the side-chain orientation of Lys71 is flipped toward the active site due to the insertion of the R.CcoLI-specific antiparallel β-sheet. In contrast, Asp53 and Asn67 are R.CcoLI-specific residues. Site 2 is the N-terminal region of R.CcoLI. In the structure of R.PabI, the N-terminal region forms a three-stranded antiparallel β-sheet. However, R.CcoLI lacks the first β-strand because its N-terminal region is shorter (Fig. 1a) and forms a two-stranded antiparallel β-sheet (β1-β15, Fig. 3c). Because R.CcoLI lacks the first β-strand, the side chains of Phe3, Ile5 and Tyr7 are partially exposed to the solvent. In addition to these two sites, the lengths of two loops (β10-β11 and η1-β14 loops in R.CcoLI) also differ between R.CcoLI and R.PabI. In the R.PabI structure, the β10-β11 and η1-β14 loops are shorter by two and five residues, respectively. The structural comparison between R.CcoLI and R.PabI (the product DNA-binding state) shows that the structures of the dimerization region are also different (Fig. 3a). The structure of the R.PabI dimer changes depending on the DNA-binding state 9,11,13,14 (see Supplementary Fig. S1 online).
www.nature.com/scientificreports/
When the protomer structure of R.CcoLI (chain A) was compared to those of R.PabI in the DNA-free state (PDB: 2DVY), the sequence-nonspecific DNA-binding state (PDB: 5IFF), the intermediate state (PDB: 6L2O) and the product DNA-binding state (PDB: 3WAZ), the R.PabI structure in the sequence-nonspecific DNA-binding state showed the highest structural similarity (Z-score from the Dali server = 21.2, root-mean-square deviation (RMSD) = 1.9 Å, sequence identity = 26%) 28 , although the R.CcoLI structure was determined without DNA. The dimeric structure of R.CcoLI was also well superposed with the dimeric structure of R.PabI in the sequence-nonspecific DNA-binding state, and the RMSD was 2.3 Å for 298 superposed Cα atoms (Fig. 3d and Supplementary Fig. S6 online).
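For readers unfamiliar with the RMSD values quoted above, the following minimal sketch shows how an RMSD over paired Cα atoms is computed once two structures have already been superposed. It does not reproduce the alignment and superposition that the Dali server performs; the function name and the toy coordinates are assumptions made for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two equal-length lists of
    already-superposed (x, y, z) coordinates, paired by index."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical Cα positions for three paired residues (not from any PDB entry).
chain_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_2 = [(0.0, 1.0, 0.0), (3.8, -1.0, 0.0), (7.6, 0.0, 2.0)]

value = rmsd(chain_1, chain_2)
print(f"RMSD = {value:.3f} Å over {len(chain_1)} Cα pairs")
```

Because the mean is taken over squared deviations, a few poorly matching residues can dominate the value, which is why the number of superposed atoms is reported alongside the RMSD.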
Active site structure. The amino acid sequence alignment of R.PabI homologs shows that most of the conserved residues are located near the positively charged HALFPIPE region (Figs. 1a, 2b and 4a). R.CcoLI is predicted to recognize negatively charged dsDNA in the HALFPIPE region, similar to R.PabI. To elucidate the DNA cleavage mechanism of R.CcoLI, we created a model of the R.CcoLI-product DNA complex using the coordinates of R.CcoLI and those of R.PabI in the product DNA-binding state (PDB: 3WAZ) (Fig. 4a,b) 11 . The model structure of the R.CcoLI-product DNA complex shows that the catalytic residues of R.CcoLI (Tyr52, His222 and Asp225) lie near the cleaved N-glycosidic bond of the adenine in the recognition sequence (Fig. 4b,c). Among the base-recognizing residues of R.PabI, ten residues are conserved in R.CcoLI (Fig. 1a). According to the amino acid sequence similarity, Lys17, Arg19, Glu47, Gln49 and Tyr168 of R.CcoLI are predicted to recognize a guanine base in the recognition sequence; Gln164 and Met166 are predicted to recognize a thymine base; Ile50 and Phe215 are predicted to recognize an adenine base; and Gln158 is predicted to recognize a cytosine base (Fig. 4d). In addition to these residues, Asp53, Asn67 and Lys71 of R.CcoLI are located near the active site due to the insertion of the characteristic antiparallel β-sheet (Fig. 4b,c). The model structure of the R.CcoLI-product DNA complex shows that the distance between the side-chain amine group of Lys71 and the C1′ carbon atom of the cleaved deoxyadenosine is approximately 3 Å (Fig. 4b). Because the AP lyase activity of DNA glycosylases is initiated by iminium crosslink formation between C1′ and an amine group 22-27 (see Supplementary Fig. S2 online), Lys71 of R.CcoLI is predicted to be an important residue for the AP lyase activity of R.CcoLI.
In contrast, the side chains of Asp53 and Asn67 are located near the O4′ oxygen atom of deoxyribose and the phosphate group of the DNA backbone, respectively (Fig. 4b,d). These residues are predicted to be utilized for the stabilization of the R.CcoLI-DNA complex.
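An interatomic distance such as the approximately 3 Å separation between the Lys71 amine nitrogen and C1′ can be measured directly from a coordinate file. The sketch below parses two fixed-column PDB ATOM records and computes the distance between them. The records, chain identifiers, residue number of the nucleotide and coordinates are all hypothetical and were chosen only so that the arithmetic is easy to follow; they are not taken from the deposited model or from the complex model described in the text.

```python
import math

def parse_atom(line):
    """Parse one fixed-column PDB ATOM record.
    Returns ((chain, residue number, atom name), (x, y, z))."""
    key = (line[21], int(line[22:26]), line[12:16].strip())
    xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return key, xyz

def distance(p, q):
    """Euclidean distance (Å) between two (x, y, z) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical ATOM records (illustrative values only).
records = [
    "ATOM    560  NZ  LYS A  71      10.000  12.000   8.000",
    "ATOM   3001  C1'  DA C   7      12.000  14.000   9.000",
]

atoms = dict(parse_atom(r) for r in records)
d = distance(atoms[("A", 71, "NZ")], atoms[("C", 7, "C1'")])
print(f"NZ(Lys71) to C1' distance: {d:.2f} Å")
```

Fixed-column slicing is used instead of whitespace splitting because adjacent PDB fields can run together when values are wide.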
Mutation assay. The model structure of the R.CcoLI-DNA complex suggests that Asp53, Asn67 and Lys71 are important for the catalytic activity of R.CcoLI; in particular, the side-chain amine group of Lys71 is predicted to be important for the AP lyase activity of R.CcoLI. To analyze the importance of the side-chain atoms of these residues, we prepared the D53A-D225N, N67A-D225N and K71A-D225N mutants and analyzed their DNA glycosylase activities (Fig. 5a-c). In this study, we analyzed the effects of mutations using the D225N mutant as a control. Asp225 of R.CcoLI is predicted to be used to cleave the N-glycosidic bond and to generate the oxocarbenium intermediate. The D225N mutation was predicted to reduce the efficiency of this reaction, although the D225N mutant exhibited sequence-specific DNA cleavage activity (Fig. 1d). If the AP lyase activity of R.CcoLI is mediated by the interaction between the side-chain amine group of Lys71 and DNA, the K71A mutant of R.CcoLI should act as a monofunctional DNA glycosylase that hydrolyzes the N-glycosidic bond; in that case, the oxocarbenium intermediate is not attacked by the amine group of Lys71 but is attacked by water to generate an AP site. The DNA backbone at an AP site is unstable and is easily cleaved by NaOH treatment at the 3′ and 5′ sides of the AP site (β- and δ-eliminations, respectively) 11,15 . The results of the DNA glycosylase assay of the R.CcoLI(D225N) mutant showed that the fractions of cleaved DNA were the same, regardless of whether NaOH was added. This finding indicated that the D225N mutant of R.CcoLI functions as a bifunctional DNA glycosylase. In contrast, the results of the DNA glycosylase assay of the K71A-D225N mutant showed that the fraction of cleaved DNA was markedly increased by NaOH treatment.
This result indicated that the K71A-D225N mutant of R.CcoLI functions as a monofunctional DNA glycosylase and that the side-chain amine group of Lys71 is important for the DNA cleavage activity of R.CcoLI. The reaction rate of the K71A-D225N mutant was determined to be higher than that of the D225N mutant (Fig. 5c). This finding suggested that the Lys71-dependent β-elimination is the rate-limiting step of the R.CcoLI activity. Notably, the K71A-D225N mutant slightly cleaved the substrate DNA in the absence of NaOH treatment (Fig. 5a-c); that is, the K71A-D225N mutant retained weak bifunctional DNA glycosylase activity. Although Asp53 does not possess amine groups, the results of the DNA glycosylase assay of the D53A-D225N mutant also showed a cleavage pattern similar to that of the K71A-D225N mutant; the fraction of cleaved DNA was markedly increased by NaOH treatment (Fig. 5a-c). Because the side chain of Asp53 is located adjacent to the side chain of Lys71 and the O4′ oxygen atom of deoxyribose (Fig. 4b), the D53A mutation might decrease the stability of the side chain of Lys71 and of the deoxyribose, and the resulting increase in flexibility in the D53A-D225N mutant is predicted to reduce the efficiency of iminium crosslink formation between Lys71 and DNA. The results of the DNA glycosylase assay examining the N67A-D225N mutant showed that the cleavage activity was reduced compared to that of the D225N mutant, regardless of whether NaOH was added (Fig. 5a-c). This result suggested that the side chain of Asn67 is not employed for DNA backbone cleavage; instead, Asn67 is predicted to be used for DNA stabilization (Fig. 4b,d).
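The logic of the assay can be summarized as a comparison of the cleaved fraction with and without NaOH treatment: NaOH breaks the backbone at any AP sites that the enzyme left intact, so a large NaOH-dependent increase points to glycosylase activity without efficient AP lyase activity. The following sketch encodes that reasoning. The function name, the 0.1 tolerance and the example fractions are assumptions chosen for illustration; they are not the measured values or a criterion used by the authors.

```python
def classify_glycosylase(frac_without_naoh, frac_with_naoh, tolerance=0.1):
    """Classify an enzyme from the fraction of substrate cleaved without and
    with NaOH treatment. The tolerance is a hypothetical cut-off."""
    for value in (frac_without_naoh, frac_with_naoh):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    # AP sites that were generated but not cleaved by the enzyme itself.
    uncleaved_ap_sites = frac_with_naoh - frac_without_naoh
    if frac_with_naoh < tolerance:
        return "inactive"
    if uncleaved_ap_sites <= tolerance:
        return "bifunctional"        # base excision and strand cleavage together
    if frac_without_naoh <= tolerance:
        return "monofunctional"      # base excision only; NaOH reveals AP sites
    return "mixed"                   # partial strand cleavage by the enzyme

# Hypothetical fractions (not the values shown in Fig. 5).
print(classify_glycosylase(0.60, 0.62))  # control-like pattern
print(classify_glycosylase(0.05, 0.80))  # lysine-mutant-like pattern
```

A mutant that cleaves a small fraction of the substrate without NaOH, as reported for K71A-D225N, would fall into the "monofunctional" or "mixed" class depending on the tolerance chosen.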

Discussion
The first HALFPIPE superfamily restriction enzyme was discovered in the hyperthermophilic archaeon P. abyssi and was designated R.PabI. Structural studies have shown that HALFPIPE superfamily proteins are not restriction endonucleases, but rather are restriction DNA glycosylases. Because hyperthermophiles, including P. abyssi, grow optimally at high temperatures (over 80 °C), proteins from these organisms possess extremely high thermal stability; in fact, R.PabI cleaves dsDNA at temperatures ranging from 60 to 90 °C 10 . Mesophiles, such as Campylobacter and Helicobacter, also have HALFPIPE superfamily proteins. Although R.PabI is still active above 80 °C, R.CcoLI from C. coli is denatured at temperatures above approximately 60 °C (Fig. 1c).
To demonstrate the structural basis for the functional switching of these proteins, that is, the switching between the monofunctional DNA glycosylase and the bifunctional DNA glycosylase, we determined the crystal structure of the R.CcoLI(C189S-D225N) mutant. The most striking feature of the R.CcoLI structure is that the insertion residues that are not conserved in R.PabI form the characteristic antiparallel β-sheet structure (β6, β7 and β8) (Fig. 3a,b). Due to the formation of the antiparallel β-sheet structure, the side-chain amine group of Lys71 is located near the N-glycosidic bond of deoxyadenosine (Fig. 4b). Lys71 of R.CcoLI is conserved in R.PabI as Lys73. The superposition of the structures of R.CcoLI and R.PabI shows that the Cα atom of R.CcoLI Lys71 is close to that of R.PabI Lys73 (~3 Å). However, the R.CcoLI-specific β-sheet formation inverts the direction of the Lys71 side chain (Fig. 3b). The enzymatic activity assays demonstrated that Lys71 of R.CcoLI is important for AP lyase activity (Fig. 5a-c).
These results indicate that the AP lyase activity of R.CcoLI is facilitated by the insertion of the antiparallel β-sheet structure near the active site. The amino acid sequence of the characteristic antiparallel β-sheet structure of R.CcoLI is largely conserved in R.HpyAXII (Fig. 1a), which is an R.PabI homolog from a mesophile that also shows significant AP lyase activity 15 . Although the structure of R.HpyAXII has not been determined, this sequence similarity suggests that the corresponding region of R.HpyAXII forms an antiparallel β-sheet structure similar to that of R.CcoLI. The insertion of the antiparallel β-sheet structure in this region is predicted to be a signature of the HALFPIPE superfamily enzymes with AP lyase activity (bifunctional DNA glycosylases). As mentioned in the results section, the K71A-D225N mutant showed weak DNA cleavage activity. R.CcoLI is therefore also predicted to cleave dsDNA through a Lys71-independent mechanism. However, the Lys71-independent DNA cleavage mechanism of R.CcoLI has not been elucidated to date. In general, the thermostabilities of proteins have been attributed to several factors: increased numbers of ion-pair networks on protein surfaces, loop shortening and decreased hydrophobic accessible surface area 29-33 . In the R.PabI structure, loop regions that correspond to the β10-β11 and η1-β14 loops of R.CcoLI are shortened compared to the structure of R.CcoLI (Fig. 3a). These loop shortenings are predicted to be important for the high thermostability of R.PabI. The existence of the additional antiparallel β-sheet structure near the active site is characteristic of R.CcoLI. However, this structure is truncated in R.PabI (Fig. 3a,b). In the R.CcoLI structure, this region shows relatively high temperature factors compared to the protein core region (Fig. 6a,b and Supplementary Fig. S5 online). It is predicted that the truncation of this region is also important for the high thermostability of R.PabI.
In the R.CcoLI structure, the hydrophobic residues in the N-terminal region (Phe3, Ile5 and Tyr7) are exposed to the solvent. In contrast, the corresponding region of R.PabI is covered by the additional β-strand (Fig. 3c). Therefore, the hydrophobic surface area of this region is decreased in the R.PabI structure. It is predicted that this difference contributes to the high thermostability of R.PabI. The AP lyase activity of R.CcoLI is mediated by Lys71 in the characteristic antiparallel β-sheet structure. In contrast, R.PabI shows only weak AP lyase activity due to the lack of a lysine amine group near the active site (Fig. 3b). Because C. coli is a mesophile, high thermostability is not necessary for R.CcoLI; R.CcoLI can utilize the relatively flexible regions (that is, the antiparallel β-sheet structure containing Lys71) for its catalytic mechanism. Meanwhile, R.PabI from the hyperthermophile P. abyssi must possess high thermostability to function at high temperature. Because AP sites are unstable in high-temperature conditions 11 , AP lyase activity is predicted not to be required for the DNA-damaging function of R.PabI. R.PabI might have relinquished its AP lyase activity to obtain high thermostability. The structures of restriction enzymes are frequently modified by the binding of DNA. For example, the structures of restriction endonucleases such as EcoRV and BamHI, which belong to the PD-(D/E)XK superfamily, show that these proteins widen their DNA-binding clefts when they weakly bind sequence-nonspecific dsDNA, and the clefts become narrow when they tightly bind their recognition sequences in dsDNA; the weak sequence-nonspecific dsDNA-binding states are utilized for facilitated diffusion on dsDNA 3,4,34 . Our previous studies also demonstrated that the structure of R.PabI is modified by the binding of dsDNA 11,13,14 .
Although the structure of R.CcoLI was determined in the absence of DNA, the R.CcoLI structure is most similar to the R.PabI structure in the sequence-nonspecific dsDNA-binding state ( Fig. 3d and Supplementary Fig. S6 online). This structural similarity might indicate that the R.CcoLI structure is not modified by the binding of sequence-nonspecific dsDNA. However, the precise DNA recognition mechanism of R.CcoLI will be clarified by the determination of the R.CcoLI-dsDNA complex structure. In this study, we analyzed the structure and function of R.CcoLI using the D225N mutant. Structural and functional studies utilizing the wild-type enzyme may demonstrate the function of R.CcoLI more precisely.

Methods
Expression and purification. The gene fragment of R.CcoLI (NCBI Reference Sequence: WP_002830209) was synthesized by GenScript (see Supplementary Table S1 online). Each codon in the synthesized gene was optimized for expression in E. coli. The R.CcoLI gene fragment was amplified by PCR using primers in Supplementary Table S2 online (Cloning-F, R), and it was cloned into the SmaI-HindIII site of pET48b (pET48b-R.CcoLI) to express R.CcoLI with an N-terminal thioredoxin tag. To reduce the cytotoxicity of R.CcoLI, the D225N mutation, which corresponds to the D214N mutation of R.PabI (Fig. 1a), was introduced to pET48b-R.CcoLI plasmid using the PrimeSTAR Mutagenesis Basal Kit (TAKARA) and primers (D225N-F, R) in Supplementary Table S2 Table S2 online. Each mutant was expressed and purified using the same method as that described above.
DNA cleavage assay. A modified pET26b plasmid, possessing only one 5′-GTAC-3′ site, was employed as a substrate for R.CcoLI (see Supplementary Fig. S4 online) 11 . The modified pET26b plasmid was cut with HindIII (TAKARA) to linearize the plasmid. To analyze the DNA cleavage activity of R.CcoLI, 0.2 μg of R.CcoLI mutants and 0.2 μg of the linearized plasmid were mixed in 0.1 M sodium phosphate buffer pH 6.5 and 1 mM TCEP and were incubated at 37 °C for 30 min. The cleaved DNAs were separated by electrophoresis through a 1% agarose gel. The DNAs were visualized with blue-LED light after GelGreen (Biotium) staining. Products by AfaI (TAKARA), which cleaves the sequence 5′-GTAC-3′, were also separated as a control.
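Both cleavage assays rely on substrates that contain exactly one 5′-GTAC-3′ site. As a minimal sketch of how such a substrate can be checked, the code below scans a sequence for the recognition site and confirms that the site is palindromic, so that a single scan of one strand covers both strands. The sequence used is the 24-bp oligonucleotide given for the DNA glycosylase activity assay in the Methods; the function names are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def find_sites(seq, site="GTAC"):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]

# 24-bp substrate from the Methods, written without spaces.
substrate = "GGATGCATGAGTACGAGGACCATC"

positions = find_sites(substrate)
is_palindromic = reverse_complement("GTAC") == "GTAC"
print(f"length = {len(substrate)} bp, GTAC sites at {positions}, "
      f"palindromic = {is_palindromic}")
```

The same scan applied to a plasmid sequence would show whether a construct carries one site or several, which determines how many fragments are expected on the gel.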
DNA glycosylase activity assay. DNA glycosylase activity assays of R.CcoLI mutants were performed using 24-bp dsDNA containing one 5′-GTAC-3′ sequence (5′-fluorescein-GGA TGC ATGA GTAC GAG GAC CATC-3′, see Supplementary Fig. S4 online). A total of 0.2 μM of the substrate dsDNA was mixed with 0.8 μM of the R.CcoLI dimer in a reaction buffer (0.1 M sodium phosphate buffer pH 6.5, 1 mM TCEP). The reaction solutions were incubated at 37 °C for 15 min or for 1, 3, 7, 15 and 30 min. After the enzymatic reaction, the solutions were supplemented with 0.1 M NaOH or 0.1 M HCl to terminate the enzymatic reaction. To cleave the 5′ and 3′ sides of the AP sites generated by R.CcoLI, the solutions supplemented with NaOH were heated at 70 °C for 10 min. The reaction solutions were neutralized by the addition of an equal concentration of HCl or NaOH and separated on a denaturing 18% polyacrylamide gel in 0.5 × TBE and 7 M urea. The fluorescence was measured using an Amersham Imager 680 (GE Healthcare). Data were quantified using Amersham Imager 680 Analysis Software (GE Healthcare). The enzymatic rate constant $k_{\mathrm{obs}}$ was obtained from a single-exponential fit to the data from three independent measurements: $f_p = f_p^{\max} \times (1 - e^{-k_{\mathrm{obs}} t})$, where $f_p$ is the fraction of product, $f_p^{\max}$ is the maximum value of $f_p$, and $t$ is the reaction time (min).
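The single-exponential fit described above can be sketched without external libraries. For any trial rate constant, the best-fitting amplitude has a closed form (linear least squares), so the problem reduces to a one-dimensional search over the rate constant. The paper does not state which fitting program was used, so the routine below is an illustration rather than the authors' procedure; the function name, search bounds and the noise-free synthetic data are assumptions. Only the time points (1, 3, 7, 15 and 30 min) are taken from the text.

```python
import math

def fit_single_exponential(times, fractions, k_min=1e-3, k_max=10.0):
    """Least-squares fit of f = f_max * (1 - exp(-k * t)).
    Returns (k, f_max). k is searched between k_min and k_max (per min)."""
    def amplitude_and_sse(k):
        g = [1.0 - math.exp(-k * t) for t in times]
        # Closed-form optimal amplitude for this k.
        f_max = sum(y * gi for y, gi in zip(fractions, g)) / sum(gi * gi for gi in g)
        sse = sum((y - f_max * gi) ** 2 for y, gi in zip(fractions, g))
        return f_max, sse

    # Coarse scan on a logarithmic grid to bracket the minimum.
    n = 200
    grid = [k_min * (k_max / k_min) ** (i / n) for i in range(n + 1)]
    best = min(range(n + 1), key=lambda i: amplitude_and_sse(grid[i])[1])
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n)]

    # Golden-section refinement inside the bracket.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if amplitude_and_sse(c)[1] < amplitude_and_sse(d)[1]:
            b = d
        else:
            a = c
    k = (a + b) / 2.0
    return k, amplitude_and_sse(k)[0]

# Synthetic, noise-free data generated with assumed parameters (illustration only).
times = [1.0, 3.0, 7.0, 15.0, 30.0]            # min, as in the time-course assay
assumed_k, assumed_f_max = 0.2, 0.9             # hypothetical values
fractions = [assumed_f_max * (1.0 - math.exp(-assumed_k * t)) for t in times]

k_fit, f_max_fit = fit_single_exponential(times, fractions)
print(f"k_obs = {k_fit:.4f} per min, f_max = {f_max_fit:.4f}")
```

With replicate measurements, the points from all three experiments can be pooled into the same two lists, or each replicate can be fitted separately and the rate constants averaged; the text does not specify which was done.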
Denaturation assay. For the denaturation assay of R.CcoLI, proteins (10 μM) and 2.5 × SYPRO Orange (Thermo Fisher Scientific) were mixed in 0.1 M sodium phosphate buffer pH 6.5 and 1 mM TCEP. Denaturation assays were performed using a CFX Connect Real-Time PCR Detection System (Bio-Rad Laboratories). Fluorescence was measured from 20 to 95 °C in 0.5 °C steps (excitation, 450-490 nm; detection, 560-580 nm). Data were analyzed using Bio-Rad CFX Manager 3.0 software.
Crystallization, data collection and structure determination. The purified protein was concentrated to 18 mg/ml using Vivaspin 6 (MWCO 30 kDa, Sartorius) for crystallization. Crystallization experiments of the R.CcoLI(C189S-D225N) mutant were performed using the sitting-drop vapor-diffusion method at 20 °C. The crystals of R.CcoLI(C189S-D225N) were obtained using a reservoir solution of 0.1 M MES pH 6.0 and 8% PEG6000 one day later. X-ray diffraction data of the R.CcoLI(C189S-D225N) crystal were collected on the AR-NW12A beamline at the Photon Factory (Tsukuba, Japan) under cryogenic conditions (95 K). For cryoprotection, the R.CcoLI(C189S-D225N) crystal was soaked in a reservoir solution supplemented with 40% ethylene glycol for several seconds. The R.CcoLI(C189S-D225N) crystal diffracted X-rays to 2.35-Å resolution. The X-ray diffraction data were indexed, integrated and scaled with XDS 35 . The initial model of R.CcoLI(C189S-D225N) was determined by the molecular replacement method using the program Phaser 37 . The ensemble of the R.PabI structures (the DNA-free state (PDB code: 2DVY) 9 , the product DNA-binding state (PDB code: 3WAZ) 11 and the sequence-nonspecific DNA-binding state (PDB code: 5IFF) 13 ) was used as the search model. The initial model was refined and rebuilt using the programs Phenix.refine 38 and Coot 39 . The final model of R.CcoLI(C189S-D225N) was refined to 2.35 Å resolution with R and Rfree values of 22.9% and 25.9%, respectively.
The geometry of the final model was evaluated using the program MolProbity 40 . In the Ramachandran plot, 98.0% of the residues were in the favored region, and the rest were in the allowed region. The data collection and refinement statistics are summarized in Table 1.
Computational analysis. The protein structure was analyzed using the following computer programs: PISA 41 for the analysis of the protein interface, surface and assemblies, APBS 42 for the calculation of macromolecular electrostatics, Dali for the search for similar structures in the database 28 , Clustal Omega 43 for the amino acid sequence alignment, ESPript 44 for the preparation of the alignment figure, and PyMOL (http://pymol.org) for the depiction of structures.

Data availability
Atomic coordinates and structure factors for the reported crystal structures have been deposited with the Protein Data Bank under accession number 7CFA.