Introduction

Restriction endonucleases (REases) are a diverse group of DNA-cleaving enzymes that serve to protect bacteria against phage infection or invasion by mobile genetic elements (see reviews1,2). To overcome attack by REases, bacteriophages evolve elaborate modifications of their genomic DNA. Bacteria, in turn, develop new enzymes that can specifically target modified DNA. Modification-dependent REases such as McrBC, McrA, Mrr, GmrSD and PvuRts1I are loosely grouped together and referred to as Type IV REases (recent review in Ref. 3). The modified bases in DNA include N6-methyladenine (N6mA), restricted by Mrr; 5-methylcytosine (5mC), restricted by McrBC and Mrr; 5-hydroxymethylcytosine (5hmC), restricted by PvuRts1I-family enzymes and McrBC; and glucosyl-5-hydroxymethylcytosine (glc-5hmC), restricted by AbaSI or GmrS/GmrD enzymes4,5,6,7,8,9,10. The first characterized GmrSD enzyme was found in E. coli strain CT596 and is encoded by two adjacent genes, gmrS and gmrD (Refs 6,7). The GmrS and GmrD subunits were separately expressed as intein and chitin-binding domain (CBD) fusion proteins, and the tags were removed by intein cleavage. The reconstituted enzyme is active and specific for glc-5hmC-modified DNA, but it has poor activity on 5hmC- or 5mC-modified DNA. The reaction buffer for the reconstituted enzyme included UTP, Ca2+ and Mg2+ (Ref. 6). It has been reported that T-even phages encapsidate a diverse set of internal proteins encoded at the ip1 locus that function to counteract GmrSD nuclease activity during DNA injection7,11 (ip1 is the T4 phage inhibitor gene encoding the IPI protein, which is processed into the encapsidated IPI* protein). Interestingly, a close homolog (UTI89_C2960, or UT enzyme) found in E. coli O18 K1 H7 UTI89 is a fused single-chain enzyme. The UT enzyme is insensitive to IPI* inhibition owing to its altered amino acid (aa) sequence and specificity, but it restricts neither IPI*-deficient nor wild-type (WT) T4 phage, although it restricts many other T-even-like phages such as T2 and T6 (Ref. 12).
The enzymatic tools that differentially cleave 5hmC DNA are limited, since McrBC- and MspJI-family enzymes cleave both 5hmC- and 5mC-modified DNA13,14. Structural studies that could determine the interactions between GmrSD and its inhibitor protein, IPI*, have been hampered by the poor expression of the two-chain GmrS/GmrD enzyme.

The influx of sequenced bacterial genomes has revealed many GmrSD homologs in proteobacterial genomes. As with the UTI89_C2960 protein, in these homologs the gmrS and gmrD genes are fused to form a single gene, which may encode a single-chain GmrSD enzyme. The goal of this work was to evaluate the endonuclease activity of one such single-chain GmrSD homolog, found in the genome of E. coli strain STEC_94C, and to develop methods for simple purification of the target protein. In addition, we studied the metal ion requirement and preferred substrate size of Eco94GmrSD and identified a potential endonuclease catalytic site (a conserved Asp-His-Asn (D-H-N) nuclease motif in its C-terminus). We found that the single-chain enzyme cleaves both 5hmC and glc-5hmC DNA in vitro. This property differs from that of the two-chain GmrS/GmrD enzyme complex, which cleaved only glc-5hmC DNA. However, despite this difference in in vitro substrate specificity, the phage restriction activity of Eco94GmrSD is very similar to that of the two-chain GmrS/GmrD: Eco94GmrSD only weakly restricted WT T4 and T4gt (a phage deficient in α- and β-glucosyltransferases, gt), but strongly restricted T4Δip1 phage (about a million-fold). The possible involvement of GmrSD-like enzymes in the bacterial immigration control region (ICR) is also discussed.

Results

The hypothetical protein ECSTEC94C_1402 (GenBank accession WP_000834395) from E. coli STEC_94C has 629 amino acid (aa) residues. It shares 93% aa sequence identity with the GmrSD fusion protein found in E. coli UTI89 (UTI89_C2960, EcoUTI89GmrSD or UT enzyme) (see sequence alignment in Supplementary Fig. S1). The ECSTEC94C_1402 gene is located in a 41.5-kb region of the STEC_94C genome that PHAST identified as a prophage most similar to Shigella phage SfII (NC_021857). This similarity lies mostly over the first 17 kb of the SfII genome (97% by blastn analysis), a region that encodes the major morphogenesis genes, although there are other, shorter homologous regions. Notably, the Eco94GmrSD gene is immediately downstream of the genes predicted to form the phage tail fibers, which are responsible for host adsorption. This morphogenic region is known to evolve rapidly to adapt to changes in host cell receptors. Shigella phage SfII, by contrast, does not encode a GmrSD homolog at this position (or elsewhere). We hypothesize that the prophage location of the GmrSD gene indicates a role in an evolutionary arms race between phage and host, a view supported by its in vitro and in vivo restriction properties (see below). EcoCT596GmrSD (CT enzyme) was also encoded by a curable prophage, which conferred restriction of T4 ip1 and rII mutants.

Three major differences were found when the Eco94GmrSD sequence was compared with the prototype EcoCT596GmrS/GmrD (Supplementary Fig. S1): 1) Eco94GmrSD lacks three residues, Ser97-Leu98-Ala99; 2) Eco94GmrSD carries one additional amino acid difference (Arg313 in Eco94GmrSD vs Gln313 in the two-chain CT enzyme); 3) Eco94GmrSD contains an 84-aa connector between the GmrS and GmrD segments, which fuses the two subunits into a single polypeptide. Recently, the gene encoding the CT enzyme was resequenced, and the sequence was updated to a single gene that restricts glc-5hmC-containing T-even phages when cloned in pBeloBac11, a single-copy vector (GenBank ID: AF493796_1). It is now apparent that the originally cloned two-chain GmrSD had suffered mutational events but still retained activity. Subcloning of the ecoCT596gmrSD gene into higher-copy plasmids was enabled by introduction of a stop codon, which produced a truncated GmrS product and a re-initiated GmrD product, with a small deletion between the gmrS and gmrD genes (see below)6.

We used two strategies to express ECSTEC94C_1402. One was to clone its ORF into pET21b in fusion with a C-terminal 6xHis tag (Eco94GmrSD-6xHis) and purify the protein on a nickel-NTA agarose column. The other was to clone its ORF in fusion with an intein and CBD, the same strategy that proved successful for expressing the two-chain GmrS/GmrD originally cloned from E. coli strain CT596 (see Supplementary information). The Eco94GmrSD enzymes purified by the two methods (6xHis-tagged protein via nickel column or intein-CBD-tagged protein via chitin column) have nearly identical properties, except that the His-tagged enzyme displays higher specific activity (see below).

Expression and purification of 6xHis-tagged single-chain GmrSD (GmrSD-6xHis) and endonuclease activity assay

In one expression strategy we cloned the single-chain eco94gmrSD gene into pET21b and purified the product with a C-terminal 6xHis tag (GmrSD-6xHis). This protein was insoluble when IPTG induction was carried out at 37°C; however, it expressed well at 16°C to 18°C when GroEL/GroES was co-overexpressed from a compatible plasmid (data not shown). GmrSD-6xHis protein was purified by nickel-NTA agarose column chromatography and further purified on a heparin column. Most of the GroEL protein was removed in the second step. Fig. 1 shows the partially purified GmrSD-6xHis enzyme (pooled heparin fractions used for activity assays) and its low activity on T4 (panel C, lanes 1–2), T4gt (lanes 4–5) and λ DNA (lanes 7–8). The endonuclease activity was strongly stimulated by addition of 1 mM ATP in digestion of T4 and T4gt DNA (lanes 10–11, 13–14); poor activity was detected on λ DNA (Dam+ Dcm+). In a control experiment, T4, T4gt and λ DNAs were digested by MluCI (AATT), whose activity is not affected by cytosine modifications. The specific activity of the purified enzyme was estimated to be ~500 units/mg protein on T4 DNA (see unit definition in Methods). The final protein yield was estimated at 4 mg/L of IPTG/arabinose-induced culture. The GmrSD enzyme appeared to cleave T4gt DNA less efficiently than T4 DNA (less than a 2-fold difference).

Figure 1

SDS-PAGE analysis of purified Eco94GmrSD-6xHis protein and agarose gel analysis of endonuclease activity assays.

(A). Purified C-terminal 6xHis-tagged GmrSD from a nickel column (Ni). The enzyme was purified from a cell lysate of T7 Express [pET21b-gmrSD, pGro7]. L, protein ladder. (B). SDS-PAGE analysis of purified GmrSD fractions from a heparin column. (C). Digestion of T4 (glc-5hmC), T4gt (5hmC) and λ DNA (Dam+ Dcm+) by purified GmrSD-6xHis in the presence (+) or absence (−) of 1 mM ATP. Lanes 3, 6 and 9, MluCI-digested DNAs. One μl or 5 μl of GmrSD (~0.5 μg/μl; 0.14 or 0.70 μM) was used to digest 1 μg DNA in buffer 2 with or without 1 mM ATP.
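The molar concentrations quoted in the legend can be sanity-checked by converting mass to moles. The sketch below is only an illustration under stated assumptions: the reaction volume (50 μl) and the molecular mass, estimated as (629 residues + 8 assumed tag residues) × 110 Da, are not given in the legend, and the helper names are hypothetical.

```python
# Hypothetical helper: final molar concentration of an enzyme in a reaction.
AVG_RESIDUE_MASS_DA = 110.0  # assumption: average mass of one amino acid residue
ECO94_RESIDUES = 629         # from the text (ECSTEC94C_1402)
TAG_RESIDUES = 8             # assumption: Leu-Glu plus 6xHis added by the pET21b C-terminal tag


def estimated_mass_da(n_residues: int) -> float:
    """Rough molecular mass (Da) from a residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA


def final_concentration_uM(enzyme_ul: float, stock_ug_per_ul: float,
                           mass_da: float, reaction_ul: float) -> float:
    """Micromolar concentration after adding enzyme_ul of stock to a reaction."""
    grams = enzyme_ul * stock_ug_per_ul * 1e-6  # ug -> g
    litres = reaction_ul * 1e-6                 # ul -> l
    molar = grams / litres / mass_da            # (g/l) / (g/mol) -> mol/l
    return molar * 1e6                          # mol/l -> umol/l


if __name__ == "__main__":
    mass = estimated_mass_da(ECO94_RESIDUES + TAG_RESIDUES)  # roughly 70 kDa
    for volume in (1.0, 5.0):
        conc = final_concentration_uM(volume, 0.5, mass, reaction_ul=50.0)
        print(f"{volume:.0f} ul of 0.5 ug/ul stock -> {conc:.2f} uM")
```

Under these assumptions the script gives about 0.14 and 0.71 μM, close to the values in the legend; the exact figures depend on the true molecular mass and reaction volume.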

Metal ion and dithiothreitol (DTT) requirement for GmrSD endonuclease activity

The purified Eco94GmrSD was tested for activity on T4 DNA in a basic buffer (50 mM NaCl, 10 mM Tris-HCl, pH 7.5, 1 mM DTT) supplemented with different divalent cations. Eco94GmrSD digested T4 DNA when the basic buffer was supplemented with Mg2+ or Mn2+ (Fig. 2A, lanes 1–4). Interestingly, the metal ion concentration modulated the relative endonuclease activity. At 1 mM divalent cation, the enzyme is more active with Mn2+ than with Mg2+; at 10 mM, GmrSD is more active with Mg2+ than with Mn2+. The free Mg2+ concentration in E. coli cells has been estimated at 1 to 2 mM (Refs 15,16), and the intracellular Mn2+ concentration in bacteria has been estimated to be in the μM range (www.bionumbers.hms.harvard.edu). The Eco94GmrSD enzyme shows poor activity with other metal ions such as Ca2+, Co2+, Ni2+ or Zn2+ (Fig. 2A, lanes 5–9). We also compared GmrSD endonuclease activity in 10 μM, 0.1 mM and 1 mM Co2+, Ni2+ or Zn2+ (since high concentrations of transition metal ions may inhibit activity). Eco94GmrSD displayed very low activity in 0.1 mM Co2+ or Zn2+ (data not shown). GmrSD nuclease activity on T4 DNA was clearly detected in Mn2+ buffer (optimal concentration 1 mM MnCl2). However, this low nuclease activity did not depend on the ATP cofactor (see Supplementary Fig. S2A, lanes 1–4). Somewhat unexpectedly, addition of 1 mM ATP inhibited GmrSD activity in Mn2+ buffer (lanes 6–8). In a control digestion, GmrSD degraded T4 DNA into small fragments (100–300 bp) in NEB buffers 2 and 4 containing 10 mM Mg2+. It was puzzling that ATP stimulated GmrSD nuclease activity in Mg2+ buffer but inhibited it in Mn2+ buffer (at 0.1 to 0.5 mM). HNH-family endonucleases are well known to be more promiscuous in their metal ion requirement for catalytic activity17,18,19.
Perhaps this negative regulation by ATP provides a safeguard against GmrSD star activity on unmodified DNA when the enzyme is “accidentally” bound by Mn2+ ions. To see whether GmrSD displays any nuclease activity on λ DNA in Mn2+ buffer, λ DNA was digested by GmrSD in the absence or presence of 1 mM ATP. Supplementary Fig. S2B shows that GmrSD caused some smearing of λ DNA, an indication of low nuclease activity. Supplementing the reaction with 1 mM ATP appeared to inhibit this nuclease activity at 0.1 to 0.5 mM Mn2+. In a control digestion, GmrSD showed no smearing in NEB buffer 2 and a low level of smearing in buffer 4. We speculate that GmrSD displays relaxed specificity (star activity) in Mn2+ buffer, since it partially cleaved DNA lacking glc-5hmC or 5hmC. This star activity is consistent with the observation that GmrSD over-expression in a RecA-deficient E. coli host was quite toxic (see below), probably because of dsDNA breaks at star sites combined with the lack of RecA-mediated DNA recombination and repair.

Figure 2

Determination of divalent cation cofactor requirement and DNA substrate preference for GmrSD digestion.

(A). Metal ion cofactor requirement for GmrSD digestion. Divalent cations or EDTA are indicated on top of each lane. (B). Substrate preference and optimal substrate size for GmrSD digestion. PCR DNA substrates containing 5hmC or regular dC were generated by PCR using pBR322 template and digested by GmrSD endonuclease in the presence of 1 mM ATP in NEB buffer 2. The same DNA substrates were also digested by HpaII (CCGG) in NEB buffer 4 (to confirm modified DNA).

Eco94GmrSD gradually loses activity during storage at −20°C; however, its activity can be restored by addition of fresh DTT (data not shown). Eco94GmrSD contains seven Cys residues, and oxidation of these residues may contribute to the loss of activity during storage. The optimal temperature for Eco94GmrSD activity was determined to be 37°C (see Supplementary Fig. S3).

Preferred substrate and substrate size for the single-chain Eco94GmrSD

To study the substrate size preference, we used PCR products that contain 5hmC, incorporated by including 5hm-dCTP in the PCR reactions. PCR products (3.8, 1.9, 1.0, 0.5 and 0.3 kb) containing 5hmC or unmodified dC were purified on spin columns and digested with Eco94GmrSD. The 5hmC-modified PCR substrates of 3.8, 1.9 and 1.0 kb were efficiently digested, whereas the 0.5- and 0.3-kb modified products were cleaved less efficiently (Fig. 2B). PCR products of the same sizes containing regular dC were poorly digested by Eco94GmrSD at the same enzyme concentration (Fig. 2B). This result is consistent with the preference for modified DNA (T4gt) shown in Fig. 1. In a control experiment, the 5hmC-modified PCR substrates were resistant to HpaII digestion, and PCR DNAs with unmodified cytosine were digested by HpaII (Fig. 2B, right panel). A 60-bp PCR fragment containing 5hmC-N20-G (two 5hmC on opposite strands separated by 20 bp) was partially cleaved by Eco94GmrSD, but the 60-mer with 5hmC-N10-G (two 5hmC on opposite strands separated by 10 bp) was not cleaved (Supplementary Fig. S4). We also cloned and sequenced some GmrSD cleavage products of T4gt DNA and determined the cut sites (see Supplementary information and Table S1). The common feature of these cut sites was 5hmC N(17–23) G (two 5hmC on opposite strands separated by 17–23 bp), where cleavage takes place mostly at the symmetric sites 5hmC N(9–11)↓N(9–11) G. Sequencing of more cleavage products is required to pinpoint the substrate preference and the effect of flanking sequence on the cleavage efficiency of 5hmC-modified DNA.
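The spacing rule 5hmC N(17–23) G can be expressed as a simple sequence scan. This is a hypothetical sketch, not the authors' analysis: it assumes every C on the top strand is 5hmC (as in T4gt DNA or a PCR product made with 5hm-dCTP), treats a top-strand G as marking a 5hmC on the bottom strand, and ignores the flanking-sequence effects that the text notes are still undefined. The toy sequences below only mimic the N20 and N10 arrangements; they are not the actual 60-mer sequences.

```python
# Hypothetical scanner for candidate GmrSD sites of the form 5hmC N(17-23) G.
# Assumption: every top-strand C is 5hmC, so a top-strand G marks a 5hmC
# on the opposite strand.


def candidate_sites(seq: str, min_spacer: int = 17, max_spacer: int = 23):
    """Return (c_index, g_index, spacer) for each C ... G pair with the given spacing."""
    seq = seq.upper()
    sites = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        for spacer in range(min_spacer, max_spacer + 1):
            j = i + spacer + 1  # index of the G that closes the site
            if j >= len(seq):
                break
            if seq[j] == "G":
                sites.append((i, j, spacer))
    return sites


if __name__ == "__main__":
    n20 = "C" + "A" * 20 + "G"   # toy 5hmC-N20-G arrangement
    n10 = "C" + "A" * 10 + "G"   # toy 5hmC-N10-G arrangement
    print(candidate_sites(n20))  # [(0, 21, 20)]
    print(candidate_sites(n10))  # []
```

In natural, fully substituted DNA almost every window contains such C/G pairs, so a scan like this only lists candidates; it cannot predict which ones are actually cut.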

NTP- and dNTP-stimulated GmrSD endonuclease activity

A previous report showed that NTPs stimulate CT enzyme activity (Ref. 6). We therefore examined the endonuclease activity in the presence of NTPs, dNTPs or non-hydrolysable γ-S-ATP. Fig. 3A and 3B show strong stimulation of endonuclease activity by addition of 0.1 to 1 mM ATP (a higher ATP concentration, 10 mM, inhibits activity). Stimulation was also detected at 2 mM ATP (data not shown). Supplementing with 0.1, 0.5 or 1 mM CTP or UTP had a minimal effect, whereas GTP (0.5–1 mM) had a moderate effect on enzyme activity. Supplementing with dATP (1–10 mM) or dTTP (1–10 mM) also strongly stimulated the endonuclease activity, while dCTP and dGTP had moderate effects. However, non-hydrolysable γ-S-ATP (1–2 mM) had no stimulatory effect on enzyme activity. We have not directly measured NTP hydrolysis in GmrSD cleavage reactions.

Figure 3

Stimulation of GmrSD endonuclease activity by supplementation with NTP or dNTP in digestion of T4 DNA.

No stimulatory effect on enzyme activity was detected upon supplementation with non-hydrolysable γ-S-ATP. NTP or dNTP concentrations are indicated on top of each lane.

Site-directed mutagenesis of a putative catalytic site in the C-terminal domain

In a protein homology search, Eco94GmrSD had a weak hit to the His-metal finger nuclease family (conserved amino acid residues DHxxP). The putative endonuclease active-site residues near the C-terminus are D507-H508-N522-(N528-N535), with additional Asn/Gln/Lys residues in close proximity (conforming to a DH-N-N catalytic site). The HNH (HNK or HNN) motif is found in colicin nucleases, homing endonucleases, DNA repair enzymes, REases, DNA nicking enzymes, transposases, group II intron-encoded reverse transcriptases and Cas9 (Refs 20–27). To investigate the importance of these residues, six GmrSD variants (D507A, H508A, C517A, N522A, N528A and N535A) were constructed by site-directed mutagenesis, and the mutant proteins were purified by nickel column chromatography. Three inactive mutants (D507A, H508A and N522A) and three partially active variants (C517A, N528A and N535A) were further purified on a heparin column and analyzed by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) (Fig. 4). The protein yield and purity of D507A, H508A, C517A, N528A and N535A were comparable to those of the WT enzyme, but the N522A variant showed reduced yield and purity after the heparin column. Fig. 4B and 4C show that GmrSD variants D507A, H508A and N522A are devoid of endonuclease activity. Variants C517A, N528A and N535A are partially active (~25% to 50% of WT activity). Although these relative activity estimates are crude, C517, N528 and N535 can likely be ruled out as catalytic residues. The relative activities of WT and mutant enzymes are summarized in Table 1.

Table 1 Summary of endonuclease activity on T4 DNA and protein expression levels of WT and mutant forms of Eco94GmrSD
Figure 4

Analysis of partially purified WT Eco94GmrSD and mutant proteins D507A, H508A, C517A, N522A, N528A, N535A, E271A/E273A and E278A/K280A by SDS-PAGE, and endonuclease activity assays of the mutant enzymes on T4 DNA.

(A). SDS-PAGE analysis of WT and mutant proteins. Left panel: WT and mutant proteins (D507A, H508A, N522A and N535A) purified on nickel-NTA agarose and heparin HP columns (purified D507A showed aberrant migration). Right panel: WT, E271A/E273A, E278A/K280A, C517A and N528A proteins. (B and C). Endonuclease activity assay for WT and GmrSD variants D507A, H508A, C517A, N522A, N528A and N535A on T4 DNA. The amounts of input protein were 0.5 μg, 1 μg and 2 μg, respectively, in digestion of 1 μg T4 DNA. For the double mutants E271A/E273A and E278A/K280A, the amounts of input protein were 0.5 μg, 1 μg, 1.5 μg and 2 μg, respectively, in digestion of 1 μg T4 DNA.

The mutagenesis results indicate that the critical amino acids of the GmrSD endonuclease catalytic site are likely D507, H508 and N522 (see Supplementary Fig. S5 for a model of the predicted active site). This catalytic site is similar to those found in the I-HmuI and I-PpoI homing endonucleases and other HNH-family nucleases19,20,28. When a catalytic residue of a REase is mutated, the mutant protein is often still capable of DNA binding, which can be detected by a DNA mobility shift assay (DNA-REase complexes migrate more slowly than substrate DNA in native PAGE)29,30. Purified WT GmrSD, D507A, H508A and N522A were used to bind a 266-bp PCR fragment containing 5hmC or dC. In an EDTA buffer (no divalent cation), the WT enzyme formed two major complexes, and at high enzyme concentration most of the substrate DNA was bound and shifted to the top of the gel (data not shown). Divalent cations are known to modulate REase specificity: KpnI displays higher specificity (lower star activity) in Ca2+ buffer than in Mg2+ or Mn2+ buffers24. Similarly, divalent cations enhanced the binding specificity of EcoRV catalytic-deficient mutants31. We therefore examined DNA binding in a buffer containing the cofactors MgCl2 and ATP (binding at room temperature for 10 min to minimize cleavage). Fig. 5A shows that the WT enzyme formed two bound complexes on both dC and 5hmC substrates. D507A appeared to have lower DNA binding affinity than the WT enzyme (Fig. 5B) (a large complex is not discernible because of a large DNA fragment present in the substrate DNA). Like the WT enzyme, the H508A variant shifted both 5hmC- and dC-DNAs in the binding assay and appeared to have enhanced binding activity, since all the substrate was shifted into the loading well at a 60:1 protein-to-DNA molar ratio (Fig. 5C, lanes 4 and 8).
The N522A variant appeared to have reduced binding affinity for 5hmC DNA: a major bound complex was detected at 100, 250 and 500 ng protein in the gel shift assay with dC-PCR DNA (Fig. 5D, lanes 2–4), but only weak complex formation was detected with 5hmC-PCR DNA, at high enzyme concentration (Fig. 5D, lane 8). To further confirm the DNA binding activity of the D507A, H508A and N522A mutant proteins, T4 MluCI (AATT) restriction fragments were used in the DNA mobility shift assay. The WT enzyme and H508A formed similar complexes, except that at high enzyme concentration H508A shifted all the substrate; D507A and N522A displayed lower affinity and produced shifted (bound) complexes only at high enzyme concentrations (data not shown). We conclude that H508A is a binding-proficient, cleavage-deficient mutant that fits the definition of a REase catalytic mutant. The binding results for the cleavage-deficient mutants D507A and N522A were not conclusive, but suggest that D507 and N522 may be involved in both binding (specificity determination) and catalysis. Further biochemical and structural analyses are needed to refine the roles of the D507 and N522 residues.
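Protein-to-DNA molar ratios such as the 60:1 ratio mentioned above follow from converting each mass to picomoles. The sketch below is illustrative only: the ~70 kDa protein mass, the average of 650 Da per base pair and the 20 ng DNA input are assumptions (the DNA amount is not stated in the text), the function names are hypothetical, and the ratio counts polypeptide chains without assuming any oligomeric state.

```python
# Hypothetical helpers for protein:DNA molar ratios in gel-shift assays.
BP_MASS_DA = 650.0  # assumption: average mass of one base pair of dsDNA


def pmol_protein(ng: float, mass_kda: float) -> float:
    """ng divided by kDa gives pmol."""
    return ng / mass_kda


def pmol_dsdna(ng: float, length_bp: int) -> float:
    """Convert ng of linear dsDNA to pmol using an average base-pair mass."""
    mass_kda = length_bp * BP_MASS_DA / 1000.0
    return ng / mass_kda


def protein_to_dna_ratio(protein_ng: float, protein_kda: float,
                         dna_ng: float, dna_bp: int) -> float:
    """Molar ratio of protein chains to DNA molecules."""
    return pmol_protein(protein_ng, protein_kda) / pmol_dsdna(dna_ng, dna_bp)


if __name__ == "__main__":
    # Hypothetical amounts: 500 ng of a ~70 kDa protein and 20 ng of the 266-bp probe
    ratio = protein_to_dna_ratio(500, 70, 20, 266)
    print(f"protein:DNA ~ {ratio:.0f}:1")
```

With these made-up inputs the ratio comes out near 60:1; real values depend on the actual DNA amount loaded.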

Figure 5

DNA mobility shift assays for WT GmrSD (panel A) and its variants D507A (panel B), H508A (panel C) and N522A (panel D) in the presence of 10 mM Mg2+ and 1 mM ATP.

Bound DNA was resolved in 10% TBE native gels. Two PCR substrates were used: 266-bp dC-DNA (unmodified) and 266-bp 5hmC-DNA (modified). Arrows indicate bound complexes (shifted bands).

Restriction of phage by GmrSD endonuclease

The proposed biological role of GmrSD endonuclease is to serve as a phage exclusion protein (the resident prophage expresses GmrSD to restrict incoming phage with sugar-modified DNA). To counteract GmrSD restriction, T4-like phages evolved inhibitor proteins (internal proteins), such as encapsidated IPI*, which inhibit GmrSD activity following T4 DNA ejection into the host cytoplasm6,12. To determine whether Eco94GmrSD was capable of phage restriction, we tested it against WT T4, several T4 mutants and λ phage. We measured phage plating efficiency using the T7 Express strain containing pET21-eco94gmrSD (constitutive expression, no IPTG added), with the pET21b vector as a control. Consistent with the poor in vitro cleavage activity on λ DNA, Eco94GmrSD did not restrict λ phage (Table 2). Eco94GmrSD restricted T4 and T4gt by 15- to 20-fold (Table 2), and this lack of strong restriction could be attributed to a countermeasure evolved by T4 phage. T4 phage co-ejects an inhibitor protein (IPI*, ~360 copies per viral capsid) into the host cytoplasm; this inhibitor can antagonize GmrSD nuclease activity and overcome the phage exclusion mechanism, allowing successful phage DNA replication and virus packaging12. Consistent with this explanation, Eco94GmrSD strongly restricted T4Δip1 (T4 mutant eG506, IPI*-deficient). The plating efficiency of T4Δip1 on the Eco94GmrSD-expressing strain under non-induced conditions is in the range of 10⁻⁶ to 10⁻⁷. In a control experiment, DH10B cells expressing EcoCT596GmrSD from a single-copy pBeloBAC plasmid restricted T4 and T4gt 5- to 10-fold and restricted T4Δip1 phage ~10⁶-fold. Phage restriction was also assessed by phage spot tests (10 μl of diluted phage spotted on a host cell lawn pre-plated in soft agar), shown in Fig. 6. Consistent with the phage titers (EOP) in the restriction assay, expression of Eco94GmrSD strongly restricted T4Δip1 in the phage spot test (Fig. 6, bottom panel).
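The efficiency of plating (EOP) values discussed above are ratios of titers on the GmrSD-expressing strain and on the vector-control strain, and fold restriction is the reciprocal of EOP. The sketch below shows that arithmetic with made-up plaque counts; the helper names and numbers are hypothetical and are not data from Table 2.

```python
# Hypothetical helpers for plaque-assay arithmetic (not the authors' code).


def titer_pfu_per_ml(plaques: int, dilution: float, plated_ml: float) -> float:
    """Titer from a plaque count; dilution is e.g. 1e-6 for a 10^6-fold dilution."""
    return plaques / (dilution * plated_ml)


def efficiency_of_plating(titer_test: float, titer_control: float) -> float:
    """EOP = titer on the restricting strain / titer on the vector-control strain."""
    return titer_test / titer_control


def fold_restriction(titer_test: float, titer_control: float) -> float:
    """Fold restriction is the reciprocal of EOP."""
    return titer_control / titer_test


if __name__ == "__main__":
    # Made-up counts for illustration only
    control = titer_pfu_per_ml(50, 1e-7, 0.1)  # vector-only host
    test = titer_pfu_per_ml(25, 1e-1, 0.1)     # GmrSD-expressing host
    eop = efficiency_of_plating(test, control)
    print(f"EOP = {eop:.1e}, restriction = {fold_restriction(test, control):.1e}-fold")
```

These invented counts give an EOP of about 5 × 10⁻⁷ (about 2 × 10⁶-fold restriction), i.e. within the range reported for T4Δip1.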

Table 2 Restriction of phages by Eco94GmrSD endonuclease
Figure 6

Phage spot tests for T4, T4gt, T4Δip1 (IPI*-deficient), T4 eG192 IPI+ (ΔIPII ΔIPIII) and λvir on E. coli strains expressing EcoCT596GmrSD or Eco94GmrSD.

(A). DH10B strains carrying the pBeloBAC vector or pBeloBAC-EcoCT596GmrSD were compared. The difference in phage spot (plaque) formation is most evident at 10⁶-fold dilution, where EcoCT596GmrSD restricted T4 and T4gt approximately 5-fold. T4Δip1 (IPI*-deficient) failed to form plaques on the EcoCT596GmrSD-expressing strain (input phage ~2–3 × 10⁵ pfu). (B). T7 Express [pET21] and T7 Express [pET21-Eco94GmrSD] strains were used for phage spot tests. Eco94GmrSD moderately restricted T4, T4gt and T4 eG192, and it did not restrict λvir. T4Δip1 phage was strongly restricted by Eco94GmrSD (no plaque formation at 100-fold dilution, estimated phage input ~2–3 × 10⁵ pfu).

Co-expression of ecoCT596gmrSD and ip1 (IPI*) genes to alleviate toxicity

E. coli CT596 cells grow normally, perhaps because ecoCT596gmrSD expression is tightly regulated in the native strain or because of an unknown detoxification mechanism carried by the surrounding prophage-encoded gene products. In a heterologous host, RecA-deficient E. coli DH10B, a single-copy plasmid carrying the ecoCT596gmrSD gene restricts T4-like glc-5hmC-containing phages. However, subcloning of this gene into higher-copy plasmids (pBR322-based ColE1 origin) was toxic: successful cloning was apparently enabled by introduction of a stop codon, which produced a truncated GmrS product and a re-initiated GmrD product, with a small deletion between the gmrS and gmrD genes (GenBank ID: AF493796_1)6. In other words, the two-chain GmrS and GmrD were the result of a cloning artifact that still retains endonuclease activity on T4 DNA. Toxicity of the ecoCT596gmrSD gene in DH10B was reflected in a survival frequency below ~10⁻⁶ upon even low-level expression from vector pHERD20T; co-expression of the phage IPI* inhibitor protein eliminated this toxicity (see schematic diagram in Fig. 7A, B), presumably because IPI* neutralizes the GmrSD REase. Table 3 summarizes the growth of different T4-related phages on cell lawns of E. coli DH10B expressing either CT596GmrSD or CT596GmrSD plus IPI*. Co-expression of IPI* prevented restriction of phages normally sensitive to EcoCT596GmrSD, e.g., T2. Purification of the single-chain EcoCT596GmrSD will ultimately be needed to confirm its nuclease activity and cofactor requirements in vitro. Consistent with the toxicity of over-expressed CT596GmrSD in RecA-deficient E. coli cells, constitutive expression of Eco94GmrSD from pBR322 (with a strong ribosome binding site, GGAGGT-N6-ATG start codon, under the Tc promoter) was quite toxic to RecA-deficient E. coli cells, probably as a result of GmrSD star activity (relaxed nuclease activity on dC and 5mC DNA).
The toxicity was reflected in two observations: 100- to 1000-fold lower transformation efficiency of RecA-deficient cells and poor cell lawn formation during phage infection (data not shown).

Table 3 Growth of different T4-related phages on cell lawns of E. coli DH10B expressing either CT596GmrSD or CT596GmrSD and IPI*
Figure 7

Co-expression of CT596GmrSD and IPI* in the same host and the proposed mechanism of anti-restriction activity of IPI*.

(A). Scheme of the pHERD20T plasmid construct used to express both CT596GmrSD and T4 IPI* genes, and inhibition of GmrSD restriction by IPI* as shown by dual gene expression. The phage restriction activities of GmrSD are summarized in Table 3. (B). Schematic diagram of the packaged internal protein IPI* (a.k.a. inhibitor protein) in the T4 head and its inhibition of GmrSD restriction activity following DNA/IPI* ejection into host cells.

Discussion

ATP/GTP stimulate endonuclease activity

Although GmrSD endonuclease activity is stimulated by ATP/GTP, NCBI BlastP analysis predicts no ATPase/GTPase domain in the protein. Eco94GmrSD may therefore carry a novel type of NTPase activity. ATP binding and/or hydrolysis may help with protein translocation, tracking along the DNA substrate, or allosteric activation of the enzyme. The Type IV REase McrBC is known to require GTP hydrolysis for endonuclease activity, and SauUSI requires ATP hydrolysis for activity32. ATP and GTP also stimulate the endonuclease activity of BceSIV (GCWGC)33. More biochemical and structural studies of GmrSD are necessary to understand the molecular mechanism of NTP/dNTP stimulation of endonuclease activity.

In log-phase E. coli cells cultured in LB broth, the average ATP concentration was calculated to be 1.54 mM (Ref. 16). GmrSD endonuclease activity is stimulated by ATP concentrations of 0.1–2 mM, with the upper limit near the physiological concentration. Conversely, a high concentration of ATP (10 mM) inhibits GmrSD activity by an as yet unknown mechanism.

Eco94GmrSD cut sites

The sequenced cut sites can be summarized as 5hmC N(17–23) G (two 5hmC on opposite strands separated by 17–23 bp), where cleavage frequently takes place at the semi-symmetric sites 5hmC N(9–11)↓N(8–11) G. Sequencing a much larger number of cleavage products would be required to determine the preferred cut sites. The plasmid-borne modification-dependent REase PvuRts1I prefers to cleave a symmetric site at 5′-5hmC N(11–12)↓N(9–10) G-3′ (Ref. 9). The crystal structure of PvuRts1I has been solved recently34,35. Based on the structure, PvuRts1I variants have been engineered to preferentially cleave 5hmC-modified DNA over glc-5hmC DNA35. AbaSI endonuclease, a member of the PvuRts1I family, cleaves DNA containing 5hmC and glc-5hmC, but not DNA containing 5mC or dC. The best substrate for AbaSI cleavage is symmetrically modified 5hmC with a 22-bp spacer (5hmC N22 G), most likely cleaved by a homotetramer36.

Domain organization of Eco94GmrSD

EcoCT596GmrSD and Eco94GmrSD both contain two conserved protein domains: DUF262 (Domain of Unknown Function 262), or pfam03235 (Protein family 03235), at the N-terminus and DUF1524 (pfam07510) at the C-terminus. DUF1524 family proteins (pfam07510) contain the conserved amino acid motif (D/E/H)HXXP, a motif found in the His-metal nuclease superfamily. It is possible that the N-terminal DUF262 domain is involved in DNA recognition and the C-terminal DUF1524 domain in Mg2+/Mn2+ ion binding and DNA cleavage. A similar domain organization exists in the Type IIS restriction enzymes MnlI and FokI, whose N-termini are involved in DNA binding/recognition and whose C-termini carry the nuclease catalytic activity28,37. In contrast, the N-terminus of AbaSI contains a Vsr-like nuclease domain with a single catalytic site, and its C-terminal domain harbors the Sra-like 5hmC-binding domain36.

Differences in enzyme properties between the single-chain Eco94GmrSD and the two-chain GmrS/GmrD complex

The major differences between the single-chain Eco94GmrSD and the two-chain GmrS/GmrD are: 1) Ca2+ is not required for Eco94GmrSD activity (the two-chain enzyme requires Ca2+ and Mg2+); Eco94GmrSD requires Mg2+ or Mn2+ as a cofactor for catalytic activity; 2) ATP, dATP and dTTP strongly stimulate the activity of Eco94GmrSD, while UTP, GTP and CTP stimulate the activity of the CT enzyme; 3) Eco94GmrSD displays endonuclease activity on 5hmC-modified T4gt DNA and on PCR DNA containing 5hmC, but the two-chain GmrS/GmrD has poor activity on 5hmC DNA (the four amino acid changes and the 84-aa deletion may have contributed to this altered specificity); 4) GroEL/ES proteins co-purify with Eco94GmrSD, as with the two-chain enzyme, but they can be easily removed by heparin column chromatography.

Potential application for Eco94GmrSD endonuclease

GmrSD endonuclease activity may be utilized for in vivo detection of 5mC conversion to 5hmC. For example, the E. coli dinD::lacZ “endo-blue” indicator strain ER1992 is Dcm+, McrBC, Mrr and McrA38 (dinD, DNA damage-inducible gene D). The gmrSD gene could be cloned into plasmid pACYC184 under ParaB control (chloramphenicol resistant, CmR). Co-transformation with, and expression from, a plasmid (AmpR) carrying a Tet-family dioxygenase would likely convert C5mCWGG to C5hmCWGG in the presence of cofactors39. The C5hmCWGG modified sites are substrates for Eco94GmrSD endonuclease. Controlled low-level expression of GmrSD can cause dsDNA damage and induce the host SOS response, so the dinD::lacZ indicator strain would likely form dark blue colonies on X-gal, Amp, Cm plates. Thus, co-expression of GmrSD and a DNA hydroxylase in a dinD::lacZ indicator strain could be used to screen functional DNA demethylase variants from a cDNA expression library40.
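To estimate how many potential C5hmCWGG substrates a target sequence would carry after Tet-mediated oxidation, one could simply enumerate Dcm sites (CCWGG, W = A or T). This is a hypothetical sketch: it assumes that every Dcm site is fully methylated and fully converted to 5hmC, which the proposed screen would not guarantee, and the function names are illustrative only. Because CCWGG is palindromic (CCAGG and CCTGG are reverse complements of each other), scanning one strand finds every site.

```python
import re

# Dcm recognition sequence CCWGG (W = A or T); Dcm methylates the internal C.
# A lookahead is used so that overlapping matches would also be reported.
DCM_SITE = re.compile(r"(?=CC[AT]GG)")


def dcm_sites(seq: str):
    """Return 0-based start positions of CCWGG sites on the top strand."""
    return [m.start() for m in DCM_SITE.finditer(seq.upper())]


def modified_c_positions(seq: str):
    """Top-strand positions of the C that Dcm methylates (second C of CCWGG)."""
    return [start + 1 for start in dcm_sites(seq)]


if __name__ == "__main__":
    demo = "AACCAGGTTCCTGG"  # toy sequence with one CCAGG and one CCTGG site
    print(dcm_sites(demo))             # [2, 9]
    print(modified_c_positions(demo))  # [3, 10]
```

Counting sites in this way says nothing about conversion efficiency in vivo; it only gives an upper bound on the number of C5hmCWGG sites available to GmrSD.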

Other gmrSD genes associated with Type I and IV restriction systems in the immigration control region (ICR)

Close homologs of GmrSD are found in some pathogenic E. coli strains, and more divergent homologs in other bacterial genomes. Fig. 8 shows that in some E. coli strains the GmrSD genes are associated with the immigration control region (ICR), which carries Type I and Type IV Mrr restriction systems3,41. The Type IV restriction enzyme Mrr restricts DNA methylated with N6mA or 5mC modifications42. The 5mC- and 5hmC-dependent McrBC endonuclease (of E. coli K-12) is not present at this locus in these strains. For example, the avian pathogenic E. coli strain APEC O1 genome carries two GmrSD homologs: one closely related to Eco94GmrSD, the 604-aa APECO1_3911 (93% identity by BlastP), and a more divergent homolog, the 733-aa APECO1_2080 (24% identity by BlastP). Both proteins contain the conserved motifs of DUF262 and DUF1524 characteristic of these enzymes. Like Eco94GmrSD and UTI89GmrSD, APECO1_3911 is likely located on a prophage (its gene is next to the putative phage tail fiber gene APECO1_3910). APECO1_2080, however, is located in an ICR that encodes a putative DNA transposase, an endoribonuclease, Type I specificity (hsdS), modification (hsdM) and restriction (hsdR) subunits, Mrr and a GTPase. It is possible that the APECO1_3911 and APECO1_2080 enzymes are both maintained in the same bacterium to restrict/exclude T-even phages that differ in sugar modifications, and/or that the two enzymes display different immunity to the diverse inhibitor proteins (ip1 locus-encoded IPI* proteins) ejected by T4-like phages. Either or both of these functions would give this host greater fitness in resisting phage infection than hosts carrying only one GmrSD (or none).

Figure 8

Some putative E. coli gmrSD genes associated with putative DNA transposases, Type I R-M systems and Type IV restriction systems (Mrr) in the bacterial immigration control region (ICR).

The “Gene cluster” function on the KEGG web server (kegg.jp) was used to generate the table listing GmrSD homologs and the associated DNA transposases and Type I and IV restriction systems. All gene product abbreviations can be found at www.kegg.jp. GmrSD homologs (second column) in some E. coli strains: E. coli O1 K1 H7 (APEC) = APECO1_2080; E. coli O6 K15 H31 536 (UPEC) = ECP_4674; E. coli O18 K1 H7 UTI89 (UPEC) = UTI89_C5048; E. coli LF82 = LF82_736; E. coli clone D i14 = i14_4938; E. coli O6 K2 H1 CFT073 (UPEC) = c5421; E. coli clone D i2 = i02_4938; E. coli ABU 83972 = ECABU_c49760; E. coli O81 ED1a (commensal) = ECED1_5210; E. coli PMV-1 = ECOPMV1_04800; E. coli O7 K1 IAI39 (ExPEC) = ECIAI39_4815; E. coli IHE3034 = ECOK1_4852; E. coli UM146 = UM146_22465; E. coli O83 H1 NRG 857C = NRG857_21960; E. coli P12b = P12B_c4425; E. coli SMS-3-5 (environmental) = EcSMS35_4888; E. coli C ATCC 8739 = EcolC_3721; E. coli O157 H7 EDL933 (EHEC) = Z5943m; E. coli O7 K1 CE10 = CE10_5085. Note that in these genomes the 5mC (and 5hmC)-dependent Type IV restriction genes mcrB/mcrC are replaced by a gmrSD gene.

Methods

Bacterial strains, culture media, cloning vector and DNA substrates

E. coli B strain T7 Express (C2566) (New England Biolabs, NEB) was used for gene cloning and protein expression. E. coli cells were grown in LB or phage broth (10 g tryptone, 5 g NaCl, 0.5 g MgCl2 in 1 L) supplemented with appropriate antibiotics (Amp at 100 μg/ml, Cm at 33 μg/ml, Km at 50 μg/ml). All restriction and modification enzymes and DNA polymerases were from NEB. The IMPACT protein expression and purification system (with pTYB1 vector, NEB) was used for GmrSD expression43. The eco94gmrSD gene (GenBank ID WP_000834395, flanked by NdeI and XhoI sites) was synthesized by IDT and inserted into a pIDT vector (kanamycin resistant, KmR). The NdeI-XhoI fragment was sub-cloned into pET21b in fusion with a C-terminal 6xHis tag (an N-terminal 6xHis tag was not tested), or into pTYB1, which allows expression of the target protein fused to a C-terminal intein-CBD tag. T4, T4gt and λvir phages were from Lise Raleigh's collection (NEB). T4 eG506 Δip1 (ip1 gene deletion mutant), T4 eG192 IPI+ (control with the overlapping ΔIPII ΔIPIII deletion)44, the IPI-deficient ip1 missense mutants HA35 and KAI12, and E. coli DH10B strains containing the pBeloBAC vector (CmR) or pBeloBAC-ecoCT596gmrSD (a.k.a. DL26)7 were from Lindsay W. Black's collection. Cells were grown to mid-log phase in phage broth plus Amp or Cm, concentrated 10-fold and used for phage plating assays or phage spot tests. For phage spot tests on E. coli lawns, phage stocks were serially diluted in 100-fold steps and 10 μl of each dilution was spotted onto the cell lawn.
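Spot and plating results are commonly summarized as titers (plaque-forming units, PFU, per ml) and as an efficiency of plating (EOP): the titer on a test strain divided by the titer on a non-restricting control strain. The Python sketch below assumes the 100-fold dilution steps and 10-μl spots described above; the function names and plaque counts are hypothetical and are not data from this study.

```python
def titer_pfu_per_ml(plaques: int, steps: int,
                     spot_volume_ul: float = 10.0, fold_per_step: int = 100) -> float:
    """Titer of the undiluted stock from plaques counted in a single spot.

    steps: number of serial dilution steps applied before spotting (0 = undiluted).
    """
    dilution_factor = fold_per_step ** steps      # e.g. 3 steps of 100-fold -> 10^6
    # plaques / spotted volume (ml) * dilution factor, with 1 ml = 1000 ul
    return plaques * dilution_factor * 1000.0 / spot_volume_ul


def efficiency_of_plating(titer_test: float, titer_control: float) -> float:
    """EOP = titer on the test (e.g. GmrSD-expressing) strain / titer on the control."""
    return titer_test / titer_control


if __name__ == "__main__":
    control = titer_pfu_per_ml(plaques=25, steps=3)   # 2.5e9 PFU/ml (hypothetical counts)
    test = titer_pfu_per_ml(plaques=5, steps=1)       # 5.0e4 PFU/ml (hypothetical counts)
    print(f"EOP = {efficiency_of_plating(test, control):.1e}")
```

Concentrating the cells 10-fold changes the density of the lawn, not the number of phage spotted, so it does not enter the titer calculation.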

Protein purification

For enzyme purification from 2 L of IPTG-induced cells, Eco94GmrSD-6xHis was purified on nickel-NTA agarose fast flow columns (Qiagen). The eluted fractions (5 ml × 6) were analyzed by SDS-PAGE, and fractions containing GmrSD were further purified by chromatography through a 5 ml HiTrap heparin HP column (GE Life Sciences). Pooled protein fractions were diluted in a low-salt buffer (20 mM Tris-HCl, pH 7.5, 50 mM NaCl, 1 mM DTT, 1 mM EDTA, 20 mM NaCl, 5% glycerol) and loaded onto the heparin HP column using an AKTA FPLC system (GE Life Sciences). Elution was carried out with a salt gradient of elution buffer (50 mM to 1 M NaCl, 20 mM Tris-HCl, pH 7.5, 1 mM DTT, 1 mM EDTA, 5% glycerol). The eluted fractions corresponding to UV absorbance peaks were analyzed by SDS-PAGE. Active enzyme fractions were pooled and buffer-exchanged using an Amicon protein concentrator (Millipore). Protein was carefully recovered from the membrane by washing it several times with storage buffer (100 mM NaCl, 20 mM Tris-HCl, pH 7.5, 1 mM DTT, 50% glycerol), and the purified enzyme was stored at −20°C.

To determine the optimal temperature for GmrSD-intein-CBD fusion protein production, IPTG induction (0.5 mM) was carried out at 16°C to 37°C for 4 h to overnight. Protein purification followed NEB's manual, except that DTT-stimulated intein cleavage was carried out at 4°C for 48 h. The target protein was then eluted and analyzed by SDS-PAGE. Eco94GmrSD protein was further purified by heparin column chromatography as described above for the 6xHis-tagged version.

Site-directed mutagenesis of the putative active site residues

Site-directed mutagenesis of the eco94gmrSD gene was carried out by PCR as described22. Mutant alleles were sequenced to confirm the desired mutation(s). Six single or double Eco94GmrSD mutants in the putative endonuclease catalytic motif of the N-terminal region (PD-Xn-E/D-X-K or PD-Xn-E/D-X-E) were constructed this way using pTYB1-eco94gmrSD: (1) D217A, (2) E228A/D230A, (3) D249A, (4) E260A/E262A, (5) E271A/E273A, (6) E278A/K280A. Six additional single GmrSD mutants (with a C-terminal 6xHis tag) in the putative endonuclease catalytic motif D-H-N of the C-terminal region were also constructed using pET21b-eco94gmrSD: (7) D507A, (8) H508A, (9) C517A, (10) N522A, (11) N528A, (12) N535A. Eight mutants were purified by chromatography through nickel-NTA agarose columns. Three inactive mutants (D507A, H508A, N522A), three partially active mutants (C517A, N528A and N535A) and two double mutants (E271A/E273A, E278A/K280A) were further purified by chromatography through a HiTrap heparin HP column.
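The mutant names above use the standard single-letter notation: wild-type residue, position, substituted residue (D217A replaces Asp217 with Ala), with double mutants joined by a slash. A minimal Python sketch that parses these names and applies them to a protein sequence, checking the wild-type residue first, is given below. The helper names and the toy sequence are hypothetical and do not represent Eco94GmrSD.

```python
import re

# One substitution: wild-type residue, 1-based position, new residue (e.g. "D217A").
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutant(name: str) -> list[tuple[str, int, str]]:
    """Split a name such as "E228A/D230A" into (wild type, position, new) tuples."""
    changes = []
    for part in name.split("/"):
        m = SUBSTITUTION.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"unrecognized substitution: {part!r}")
        changes.append((m.group(1), int(m.group(2)), m.group(3)))
    return changes


def apply_mutant(protein: str, name: str) -> str:
    """Return the mutant sequence, refusing changes that do not match the wild type."""
    residues = list(protein)
    for wild_type, pos, new in parse_mutant(name):
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != wild_type:
            raise ValueError(f"residue {pos} is {residues[pos - 1]}, not {wild_type}")
        residues[pos - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    print(parse_mutant("E228A/D230A"))        # [('E', 228, 'A'), ('D', 230, 'A')]
    print(apply_mutant("MDKEAE", "D2A/E4A"))  # hypothetical toy sequence -> MAKAAE
```

Checking the wild-type residue before substituting catches numbering errors, for example an offset introduced by a tag or a missing initiator methionine.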

DNA binding assay (DNA mobility shift assay)

DNA mobility shift assays were carried out as described29. A 266-bp PCR fragment containing 5hmC or dC was used in the binding assays. For binding to glc-5hmC-modified DNA, T4 MluCI restriction fragments (a mixture of 100 to 500 bp) were used. PCR DNA (10 ng) was incubated with 50 ng, 0.1 μg, 0.25 μg or 0.5 μg of protein (estimated GmrSD:DNA molar ratios of 6.0, 11.9, 29.7 and 59.5, assuming that the active form of the enzyme bound to DNA is a dimer) in 1× binding buffer (0.1 M NaCl, 10 mM Tris-HCl, pH 7.5, 1 mM DTT, 0.1 μg of λ carrier DNA) supplemented with either 1) 5 mM EDTA, 2) 10 mM CaCl2, 3) 10 mM MgCl2, or 4) 10 mM MgCl2 and 1 mM ATP, at room temperature for 10 min. Glycerol was added to a final concentration of 10%, and the DNA-protein complexes were loaded onto a pre-run 10% TBE gel (Life Technologies); electrophoresis was carried out in 0.5× TBE buffer with the gel box immersed in ice water. DNA was stained with SYBR Gold (Life Technologies) in 0.5× TBE for 15 min and imaged on a Typhoon 9400 Imager (GE Life Sciences).
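The molar ratios quoted above follow from converting both masses to moles. A minimal Python sketch of that arithmetic is given below. It assumes an average of 650 g/mol per base pair of dsDNA and a GmrSD monomer mass of about 72 kDa; both values are assumptions (the values behind the published estimates are not stated), while the dimer assumption comes from the text. With these inputs the sketch gives ratios within about 1% of the reported 6.0, 11.9, 29.7 and 59.5.

```python
BP_MASS_G_PER_MOL = 650.0      # assumed average mass of one dsDNA base pair
MONOMER_MASS_G_PER_MOL = 72e3  # assumed GmrSD-6xHis monomer mass (hypothetical)


def molar_ratio(protein_ng: float, dna_ng: float, dna_bp: int,
                monomer_mass: float = MONOMER_MASS_G_PER_MOL,
                subunits: int = 2) -> float:
    """Moles of protein complex (a dimer by default) per mole of DNA fragment."""
    dna_mol = dna_ng * 1e-9 / (dna_bp * BP_MASS_G_PER_MOL)
    complex_mol = protein_ng * 1e-9 / (monomer_mass * subunits)
    return complex_mol / dna_mol


if __name__ == "__main__":
    for protein_ng in (50, 100, 250, 500):       # 50 ng to 0.5 ug of protein
        ratio = molar_ratio(protein_ng, dna_ng=10, dna_bp=266)
        print(f"{protein_ng:>4} ng protein: ratio {ratio:.1f}")
```

Because the ratio is linear in protein mass, the four loadings differ by the same factors as the masses (1:2:5:10), which matches the spacing of the reported values.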

GmrSD enzyme activity assay

T4 (glc-5hmC), T4gt (5hmC) and λ (Dam+ Dcm+) DNA, or 5hmC-modified PCR DNA, were digested with purified GmrSD enzyme in NEB buffer 2 (50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 1 mM DTT) supplemented with 1 mM ATP at 37°C for 1 h unless specified otherwise. To generate 5hmC-modified PCR DNA substrates (266 bp, 0.5 kb, 1.0 kb, 1.9 kb, 3.8 kb), 5hm-dCTP (Zymo Research) was incorporated into PCR DNA by Taq DNA polymerase during PCR. As a control, similar PCR fragments were also generated using regular dNTPs. To test the enzyme requirement for divalent cations, T4 DNA was digested in a base buffer (50 mM NaCl, 10 mM Tris-HCl, pH 7.5, 1 mM DTT) supplemented with different metal ions (MgCl2, MnCl2, CaCl2, CoCl2, NiSO4, ZnSO4) as indicated for each digestion. To test NTP stimulation of GmrSD activity, NTPs (0.1, 0.5 and 1 mM), dNTPs (1 and 10 mM) and γ-S-ATP (1 mM) were added to GmrSD digestions. One GmrSD endonuclease unit is defined as the amount of enzyme required for complete digestion of T4 DNA (170 kb) into fragments smaller than 500 bp in 1 h at 37°C in buffer 2 supplemented with 1 mM ATP. To examine the optimal temperature for GmrSD activity, limited digestions of T4 DNA were carried out at 25°C to 65°C for 30 min.
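The unit definition implies an endpoint titration: the enzyme stock is serially diluted, each dilution is tested on T4 DNA under the defined conditions, and the highest dilution that still gives complete digestion is taken to contain one unit. The source does not describe its titration scheme, so the two-fold dilution series, the 1-μl enzyme volume per reaction and the function name in this Python sketch are all assumptions.

```python
def units_per_ul(complete: list[bool], fold: int = 2, enzyme_ul: float = 1.0) -> float:
    """Estimate stock activity (units/ul) from an endpoint dilution series.

    complete[k] is True if enzyme diluted fold**k times gave complete digestion
    (all T4 DNA fragments < 500 bp after 1 h at 37°C), as judged on a gel.
    The last consecutive True entry marks the dilution containing 1 unit per
    enzyme_ul. If every dilution is complete, the result is only a lower bound.
    """
    endpoint = None
    for k, digested in enumerate(complete):
        if not digested:
            break
        endpoint = k
    if endpoint is None:
        raise ValueError("no dilution gave complete digestion; test a more concentrated stock")
    # 1 unit in enzyme_ul of the endpoint dilution -> fold**endpoint units in enzyme_ul of stock
    return fold ** endpoint / enzyme_ul


if __name__ == "__main__":
    # Hypothetical gel readout for undiluted, 2x, 4x, 8x, 16x and 32x dilutions.
    print(units_per_ul([True, True, True, True, False, False]))  # 8.0
```

With two-fold steps the estimate is resolved only to within a factor of two; finer dilution steps narrow it.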

Additional Information

How to cite this article: He, X. et al. Expression and purification of a single-chain Type IV restriction enzyme Eco94GmrSD and determination of its substrate preference. Sci. Rep. 5, 9747; doi: 10.1038/srep09747 (2015).