Crystal structure report of the ImmR transcriptional regulator DNA-binding domain of the Bacillus subtilis ICEBs1 transposon

Bacillus subtilis is a commensal member of the human oral and gut microbiomes, which can become infectious to immunocompromised patients. It possesses a conjugative transposon, ICEBs1, which includes >20 genes and can be passed by horizontal gene transfer to other bacteria, including pathogenic Bacillus anthracis and Listeria monocytogenes. ICEBs1 is regulated by the ImmR/ImmA tandem, which consists of a transcriptional repressor that constitutively blocks transcription and a metallopeptidase that acts as an anti-repressor and inactivates ImmR by proteolytic cleavage. Here, we report the production and purification of 127-residue ImmR from ICEBs1 and the crystal structure of its DNA-binding domain. It features a five-helix bundle centred on a helix-turn-helix motif potentially binding the major groove of double-stranded target DNA. ImmR shows structural and mechanistic similarity with the B. subtilis SinR repressor, which is engaged in sporulation inhibition.


Results and discussion
Structure analysis of the ImmR-DBD. Full-length ImmR of B. subtilis was produced by recombinant overexpression in Escherichia coli and purified through two chromatography steps. Apparently suitable crystals were routinely obtained, but diffraction was consistently restricted to 7-8 Å. Eventually, crystals diffracting to around 2 Å were measured in 2013 at the ESRF synchrotron beamline ID23-2 (Table 1). However, these crystals suffered from high mosaicity and anisotropy. Moreover, diffraction showed diffuse streaks in several regions of reciprocal space, potentially arising from planar or linear lattice defects, so that individual diffraction spots were not properly resolved. Given the absence of heavy-atom/ion derivatives or a suitable model for molecular replacement, the project was discontinued until this year, when a predicted model for full-length 127-residue ImmR was obtained with AlphaFold 17. This model divides into a compact high-confidence (∅pLDDT = 96.7%; see 17 for definition) N-terminal DBD (M1-G63) and a loose C-terminal domain (K64-E127) containing two large isolated α-helices (K64-K88 and E103-K126), which was predicted with lower overall confidence (∅pLDDT = 74.8%). This result motivated us to reprocess the original diffraction data with up-to-date software. Data processing with Xds 18 and Dials 19 failed in our hands to yield data that would enable crystallographic refinement. Eventually, iMosflm 20 processing, which reportedly deals better with data with large mosaicity and ΔΦ values, followed by anisotropy correction with Staraniso 21, enabled us to get a suitable reflection file for model refinement. This processing yielded comparatively high values for the Rmerge parameter 22 (see Table 1) but revealed neither twinning nor translational non-crystallographic symmetry. Subsequently, the structure was solved by molecular replacement.
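The per-domain confidence averages quoted above can be reproduced from a model file, because AlphaFold writes the per-residue pLDDT into the B-factor column of its PDB output. The following is a minimal sketch, assuming fixed-column PDB records and averaging over Cα atoms only; the helper fake_ca_line and the pLDDT values 95.0 and 70.0 are hypothetical and serve only to make the example self-contained.

```python
from statistics import mean


def mean_plddt(pdb_lines, first_res, last_res):
    """Average pLDDT over C-alpha atoms with residue numbers in [first_res, last_res]."""
    values = []
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in columns 13-16,
        # residue number in columns 23-26, B-factor (pLDDT here) in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_num = int(line[22:26])
            if first_res <= res_num <= last_res:
                values.append(float(line[60:66]))
    return mean(values)


def fake_ca_line(res_num, plddt):
    """Build a synthetic C-alpha ATOM record (hypothetical data, illustration only)."""
    return ("ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (res_num, res_num, 0.0, 0.0, 0.0, 1.0, plddt))


# Synthetic 127-residue model: residues 1-63 mimic the DBD, 64-127 the C-terminal domain.
model = [fake_ca_line(i, 95.0) for i in range(1, 64)]
model += [fake_ca_line(i, 70.0) for i in range(64, 128)]

print("DBD (1-63):       ", mean_plddt(model, 1, 63))
print("C-term. (64-127): ", mean_plddt(model, 64, 127))
```

With a real AlphaFold file, the list of lines would simply be read from disk and passed to mean_plddt with the same residue ranges.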
While no solution satisfying the packing criterion was obtained for the full-length search model, two clear solutions were found for the DBD model alone. These solutions showed values for the refined translation functions of 9.9 and 20.4, respectively, and a final log-likelihood gain of 356. After successive rounds of model building and refinement, the final model comprised residues M1-K64 of molecules A and B, plus respective N-terminal alanines (A0) from the purification tag 23, and 167 solvent molecules. The final values for Rfactor and Rfree 22 were comparatively high for a dataset to 2.1 Å resolution (Table 1), which we attribute to the above crystal pathologies. This notwithstanding, the final (2mFobs − DFcalc)-type Fourier map was of excellent quality for both molecules (Fig. 1A), as were the general model validation parameters (Table 1), so that we …
Description of the ImmR-DBD. The protein is a compact, almost spherical pentahelical bundle (α1-α5) held together by a central hydrophobic core, in which the N- and C-terminal helices are nearly antiparallel, so that the chain termini are adjacent (Fig. 1B). Helices α2-α4 form a flap that folds back onto the two terminal helices. Overall, the five helices are connected by short linkers of 2 to 5 residues, and each helix is approximately perpendicular to the preceding one. Following the nomenclature of HTH GBB-DBDs 26, helices α2 and α3 would correspond to the "positioning helix" and the "recognition helix" of the HTH-motif engaged in double-stranded DNA recognition. The two protomers in the asymmetric unit (a.u.) are related by a dyad, which gives rise to an interface of 573 Å² (ΔiG = −2.1 kcal/mol; ΔiG P-value = 0.424 27). The interface involves 56 and 49 atoms of 18 residues of molecules A and B, respectively, which overall form nine hydrogen bonds, as well as symmetric hydrophobic interactions between 11 residues of either molecule.
The main participating residues are L3, D41, T44, L47, L48, S51, N52, T58, D59, L62, and K64 (Fig. 1C), which are provided by helices α4 and α5 plus the linker preceding α4. Finally, the experimental structure is in very good agreement with the predicted dimer (Fig. 1D). Indeed, the 130 residues of the former coincided with the predicted model with a core rmsd of 0.43 Å. Moreover, this superposition further revealed that the C-terminal α-helix of the full-length protein would clash with a symmetric DBD mate, which further supports that the crystal only contained the DBD (see "Structure analysis of the ImmR-DBD" section).
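For readers unfamiliar with the rmsd figure quoted above, the quantity is the root-mean-square distance between paired atoms once the two structures have been superposed. The sketch below only evaluates that formula on already superposed Cα coordinates; it does not perform the superposition itself, and the coordinates are hypothetical values chosen for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha positions (in Å); every atom of the second set is displaced by 0.5 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
displaced = [(x + 0.3, y, z + 0.4) for x, y, z in reference]

print(round(rmsd(reference, displaced), 3))  # 0.5
```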

Similar structure. A search with Dali identified several members of the "434 Cro family" of HTH-DBDs from bacteria or bacteriophages 26 as structurally related. Closest similarity was found with 111-residue SinR from B. subtilis, followed by the C2 repressor of Salmonella bacteriophage P22 (PDB 1ADR 28), CylR2 of Enterococcus faecalis, and DdrO of Deinococcus geothermalis (Fig. 2A).
In all structures, the first four helices have a very similar arrangement (Fig. 2A), and significant differences are only found in the respective fifth helices. These have variable length and are shifted along the polypeptide chain in the different structures, which supports that the minimal functional unit for these domains is a four-helix bundle 26. Moreover, SinR, CylR2, and DdrO form dimers in their crystal structures that are equivalent to that of ImmR (Fig. 2B). In the case of SinR, this dimeric arrangement was functionally validated through the crystal structure of a dsDNA complex 29 and further suggests that ImmR may oligomerize to produce DNA-loop structures similar to those of SinR 30. We constructed a homology model for the DNA complex of the ImmR-DBD dimer based on the SinR complex (Fig. 2C). Accordingly, the DNA major groove would be contacted through the recognition helices, and flanking helices α3 and α4 would play a supportive role. Putative residues engaged in binding would encompass T17-E20, N29-N31, S33-Y35, R37, and Y39-D43 of either protomer.
Remarkably, the archetypal 434 Cro repressor spans just the pentahelical HTH-DBD 31, but other family members are C-terminally extended and comprise additional domains. This is the case for SinR, which has two helices engaged in dimerization and binding to other proteins (PDB 1B0N 32) that are very similar to the AlphaFold prediction for ImmR (see "Structure analysis of the ImmR-DBD" section). Given that SinR is currently the closest structural relative of ImmR, both C-terminal regions may have similar functions. Indeed, ImmA inactivates ImmR through cleavage at F95-M96, which is in the linker between the two predicted helices 14. This would be consistent with the protein:dsDNA complex falling apart upon cleavage, thus releasing transcriptional repression.

Materials and methods
Protein production and purification. The ImmR gene was amplified from Bacillus subtilis strain 168 using 5ʹ-CAA TCA TAT GAG CCT AGG CAA ACG ATT AAA AGAAG-3ʹ and 5ʹ-CAA TCT CGA GTC AC TCT TTC TTC TTT AAT TCG TCA ATG-3ʹ as forward and reverse primers, respectively. The PCR product was cloned into the pCri8b vector using NdeI and XhoI restriction sites, which attaches an N-terminal hexahistidine (His6)-tag followed by a tobacco-etch virus (TEV) recognition sequence to the target protein 23. The plasmid was transformed into Escherichia coli BL21 (DE3) cells, which were grown at 20 °C in Luria Bertani medium containing ampicillin (30 μg/mL) and chloramphenicol (34 μg/mL) under agitation (220 rpm) until an OD600 of 0.6-1.0 was reached. Expression was then induced by adding 400 μM isopropyl-β-D-thiogalactopyranoside, and the culture was incubated for a further 12 h. Cells were harvested by centrifugation at 4000×g for 15 min at 4 °C and resuspended in lysis buffer (20 mM Tris-HCl pH 7.5, 5 mM magnesium chloride, 20 mM imidazole, 10 μg/mL DNAse). Cells were lysed in a cell disruptor (Constant Systems, Ltd.), and the lysate was clarified by centrifugation for 1 h at 30,000×g at 4 °C. Sodium chloride (1.5 M) was then added to the supernatant, and the mixture was incubated at room temperature for 45 min prior to nickel-nitrilotriacetic acid affinity chromatography purification (NiNTA resin from Invitrogen). The resin had been pre-equilibrated with buffer A (20 mM Tris-HCl pH 7.5, 1.5 M sodium chloride, 20 mM imidazole), and the protein was eluted with buffer B (20 mM Tris-HCl pH 7.5, 1.5 M sodium chloride, 300 mM imidazole). The protein was then dialysed against buffer A to remove excess imidazole and incubated with His6-tagged TEV protease at a 1:10 molar ratio overnight at 4 °C to cleave the N-terminal His6-tag.
The protein solution was then reapplied to the NiNTA resin pre-equilibrated as before to remove the TEV protease, the cleaved His6-tags and non-cleaved N-terminally His6-tagged ImmR. The flow-through was collected and concentrated to ~2 mL using a Vivaspin 20 ultrafiltration device of 5-kDa cut-off (Sartorius). The sample was then run through a Superdex 200 16/60 column (GE Healthcare), which had been attached to an ÄKTA liquid chromatography system (GE Healthcare) and equilibrated with buffer C (20 mM Tris-HCl pH 7.5, 1 M sodium chloride). Fractions corresponding to the protein of interest were collected, and the protein purity and molecular mass (theoretical value 14.8 kDa) were assessed through SDS-PAGE. Protein concentration was determined with a Nanodrop spectrophotometer (Thermo Fisher Scientific) using the theoretical absorption coefficient (ε = 7450 M⁻¹ cm⁻¹) calculated by ProtParam within Expasy 33. Protein identity was confirmed by peptide mass fingerprinting analysis at the Protein Chemistry Service and the Proteomics Facilities of the Centro de Investigaciones Biológicas (Madrid, Spain). Briefly, samples were subjected to 10% SDS-PAGE, and gels were stained for 5 min with freshly prepared Coomassie Blue Stain (0.1% solution in 40% methanol/10% acetic acid) and destained for 15 min in 50% methanol. Gel bands were excised with a clean razor blade and placed in a 1.5-mL Eppendorf tube with 50 μL H2O for wet shipment.
Figure caption (partial): The orientation of the protein is the same as in Fig. 1C. Protein residues hypothetically participating in the protein:dsDNA interface are shown as sticks with white and grey carbons for either protomer, respectively, and labelled.
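The spectrophotometric concentration estimate described above follows the Beer-Lambert law, c = A280 / (ε · l). The sketch below converts an absorbance reading into mg/mL using the absorption coefficient and theoretical mass given in the text; the 1-cm path length default, the dilution factor argument, and the example reading of 0.745 are assumptions introduced here for illustration, not values from the study.

```python
EXTINCTION_COEFF = 7450.0  # M^-1 cm^-1, theoretical value quoted in the text
MOLAR_MASS = 14800.0       # g/mol, theoretical mass of 14.8 kDa quoted in the text


def concentration_mg_per_ml(a280, path_cm=1.0, dilution=1.0):
    """Beer-Lambert estimate of protein concentration from absorbance at 280 nm.

    path_cm and dilution are assumptions of this sketch (1-cm equivalent reading,
    undiluted sample by default).
    """
    molar = a280 * dilution / (EXTINCTION_COEFF * path_cm)  # mol/L
    return molar * MOLAR_MASS  # g/L, numerically equal to mg/mL


# Hypothetical reading: A280 = 0.745 corresponds to 100 μM protein.
print(round(concentration_mg_per_ml(0.745), 2))  # 1.48
```

Because the absorption coefficient is low, a small error in the absorbance reading translates into a comparatively large error in the estimated concentration.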
Crystallisation and data collection. Pure protein in 20 mM Tris-HCl pH 7.5, 100 mM sodium chloride was concentrated to 6.5 mg/mL and employed to screen crystallisation conditions applying the sitting-drop vapor diffusion method at the Automated Crystallography Platform (https://www.ibmb.csic.es/en/facilities/automated-crystallographic-platform). Crystallization solutions were prepared with a Freedom EVO robot (Tecan) and pipetted into the reservoir wells of 96 × 2-well MRC crystallization plates (Innovadyne Tech.). Nanodrops consisting of 100 nL of each reservoir solution and protein solution were dispensed by a Cartesian Microsys 4000 XL robot (Genomic Solutions) into the shallow wells of the crystallization plates, which were stored at 4 °C or 20 °C in thermostatic crystal farms (Bruker). Upscaling and optimization were performed by sitting-drop vapor diffusion, using 2 μL protein solution and 1 μL precipitant solution in 24-well Cryschem crystallization plates (Hampton Research). Suitable crystals of ImmR-DBD were obtained with 18% (w/v) PEG 3350, 10 mM magnesium chloride, 50 mM Tris-HCl pH 8.5 as reservoir solution. Crystals were harvested with cryo-loops (Molecular Dimensions), cryoprotected, flash-vitrified in liquid nitrogen, and stored for data collection. X-ray diffraction data were recorded at 100 K on a 225-mm MARMOSAIC CCD detector (MAR Research) at the ID23-2 beamline 34 of the ESRF synchrotron (Grenoble, France). Crystals were indexed as space group I2, with two protomers per a.u. Diffraction data were processed with programs iMosflm 20 and Staraniso 21, which included the Mrfana analysis routine, to obtain structure-factor amplitudes in MTZ format for the Phenix 35 and Ccp4 36 suites of programs. Data were further assessed with Xtriage 37 within Phenix and Pointless 38 within Ccp4. Statistics on data collection and processing are provided in Table 1.
Structure solution and refinement. The structure was solved by molecular replacement with the Phaser program 39 using a predicted model for the ImmR-DBD monomer obtained with AlphaFold 17. These calculations yielded two unique solutions at Eulerian angles (in °) α = 116.1, β = 73.5, γ = 25.4 (fractional cell coordinates 0.214, 0.998, 0.333) and α = 296.7, β = 73.6, γ = 25.2 (fractional cell coordinates 0.815, 0.893, 0.331), respectively, which are related by a dyad parallel to cell axis c. The associated values for the translation functions after refinement were 9.9 and 20.4, respectively, and the final log-likelihood gain was 356. The adequately rotated and translated molecules were subjected to the Autobuild 40 protocol within Phenix, which yielded a Fourier map of high quality for manual model building with the Coot program 41. The latter alternated with crystallographic refinement using the Refine protocol of Phenix 35, which included hydrogens in riding positions and translation/libration/screw-motion plus non-crystallographic symmetry restraints, until completion of the model. Table 1 provides essential statistics on the final refined model, which was validated through the wwPDB validation service (https://validate-rcsb-1.wwpdb.org/validservice). The coordinates can be retrieved from the Protein Data Bank (www.pdb.org) under accession code 7T8I.
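The statement that the two molecular-replacement solutions are related by a dyad can be checked numerically from the Eulerian angles alone. The sketch below builds a rotation matrix for each solution and computes the angle of the rotation that maps one onto the other; a value close to 180° would be consistent with a twofold axis. It assumes the ZYZ convention, R = Rz(α)·Ry(β)·Rz(γ), which is an assumption of this example and should be checked against the program documentation; the function names are hypothetical.

```python
import math


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def euler_zyz(alpha, beta, gamma):
    # ASSUMPTION: ZYZ convention, R = Rz(alpha) * Ry(beta) * Rz(gamma).
    return matmul(rot_z(alpha), matmul(rot_y(beta), rot_z(gamma)))


def relative_rotation_angle(r1, r2):
    """Angle (degrees, 0-180) of the rotation taking orientation r1 to r2."""
    rel = matmul(r2, transpose(r1))
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # Clamp against rounding errors before taking the arccosine.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


solution_a = euler_zyz(116.1, 73.5, 25.4)  # angles of the first solution in the text
solution_b = euler_zyz(296.7, 73.6, 25.2)  # angles of the second solution in the text
angle = relative_rotation_angle(solution_a, solution_b)
print(f"relative rotation between the two solutions: {angle:.1f} degrees")
```

The angle is obtained from the trace of the relative rotation matrix, so it does not depend on where the rotation axis lies; identifying the axis direction would require the orthogonalisation convention of the unit cell, which is not covered here.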
Miscellaneous. Structural relatives were identified through the Dali 42 server (ekhidna2.biocenter.helsinki.fi/dali). Structure superpositions were calculated with Ssm 43 in Coot. Figures were prepared using Chimera 44. Protein interfaces and intermolecular interactions were analyzed using PDBePISA (www.ebi.ac.uk/pdbe/pisa) 27 and verified by visual inspection. The interacting surface of a complex was taken as half the sum of the buried surface areas of either molecule. A homology model of the complex between the ImmR-DBD dimer and target dsDNA was obtained by superposing the ImmR dimer onto the SinR dimer within its experimental DNA complex (PDB 3ZKC 29). The ImmR chain was then slightly readjusted manually with Coot and geometry-minimised with the same program to iron out clashes and unfavourable side-chain conformations. The dsDNA part was kept intact. This model is provided as Supplementary File 1.
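The interface-area convention stated above (half the sum of the surface buried on each partner) amounts to a one-line calculation. In the sketch below, the two per-molecule buried areas are hypothetical values, chosen only so that the result equals the 573 Å² reported for the dimer; the individual values are not given in the text.

```python
def interface_area(buried_a, buried_b):
    """Interface area as half the sum of the areas buried on each partner (Å^2)."""
    return 0.5 * (buried_a + buried_b)


# Hypothetical buried surface areas for molecules A and B (Å^2).
print(interface_area(580.0, 566.0))  # 573.0
```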

Data availability
The coordinates and structure factors generated during the current study are available from the Protein Data Bank (www.pdb.org) under accession code 7T8I.