Introduction

Diarrheal diseases are estimated to cause nearly 1.3 million deaths every year1. Enterotoxigenic E. coli (ETEC) is a leading cause of diarrhea, however there are various estimates of its prevalence. Older estimates suggested 280 million global infections with 380,000 deaths per year of children under five2, while newer estimates suggest only 75 million cases each year, killing up to 30,659 children under the age of five3. Despite this variation, ETEC is a leading cause of traveler’s diarrhea, a serious concern during military deployments, and remains a significant cause of childhood morbidity and mortality in low income nations3,4,5.

ETEC colonization is dependent on production of pili, which are sometimes referred to as colonization factors. ETEC pathogenicity is dependent on the attachment of these proteinaceous rods to the intestinal wall, and human challenge studies have shown that ETEC lacking adhesive pili are significantly attenuated6,7,8. Over 20 of these highly immunogenic colonization factors have been identified9, and while it has been suggested that vaccines incorporating them could be protective from ETEC infection10,11, to date such vaccines have not demonstrated protective immunity10,12. The lack of effective vaccines, coupled with the rise of antibiotic resistant strains that have associated treatment challenges and increased treatment cost13, highlights the importance of identifying new avenues for treatment.

While ETEC colonization factors are diverse, nearly half are regulated by the transcription factor Rns (also known as CfaD or CfaR), a member of the AraC family14,15,16,17,18,19,20,21,22. Rns activates pili loci expression by binding to a site immediately adjacent to the -35 hexamer, which may be accompanied by one or more additional sites further upstream19,23. Like most other AraC family members, Rns contains two helix-turn-helix DNA binding motifs in its carboxy terminal domain21,24, as well as an amino terminal domain that has been suggested to be involved in dimerization25 and ligand binding21. However, prior to this study ligand binding by Rns had not been experimentally demonstrated.

ToxT, an AraC family member from V. cholerae regulates transcription of genes encoding the two major virulence factors, the toxin-coregulated pilus (TCP) and cholera toxin (CT)24,26. Previous work in our laboratory found that unsaturated fatty acids (UFAs) inhibit ToxT activity when bound to a pocket in the N-terminal domain. This inhibition is due to disruption of ToxT’s ability to bind to promoter sites, as supported by the finding that the monounsaturated fatty acids oleic (C18) and palmitoelic (C16) acids inhibit ToxT DNA binding in vitro27. Because fatty acids are a component of bile, which is present in the GI tract of most organisms affected by V. cholerae, these compounds likely act as signals for virulence regulation28, and other virulence-controlling AraC family members could be inhibited by a similar mechanism27. A computational screen of AraC proteins identified several candidates for such regulation, including Rns.

In order to determine whether  Rns could be inhibited by UFA’s in a manner similar to ToxT, we solved the crystal structure of Rns. We observed two protein conformations, “open” and “closed” and show that Rns forms a dimer, consistent with the fact that it binds DNA as a dimer in a manner similar to other AraC proteins29,30,31. The open conformation contained a ligand bound in a groove between the N- and C-terminal domains, which we modeled as decanoic acid. Differential scanning fluorometry demonstrated that decanoic acid increased the melting temperature (Tm) of Rns, supporting a model in which decanoic acid specifically binds to the protein. When Rns was crystallized with excess decanoic acid, only the closed conformation was observed. Furthermore, addition of exogenous decanoic acid abolished the expression of colonization factors by inhibiting Rns-dependent transcriptional regulation. These results support the hypothesis that Rns and ToxT utilize a common mechanism of binding fatty acid effector molecules to regulate virulence gene expression.

Materials and methods

Identification of candidate AraC proteins

Fifty-eight proteins that had been previously characterized and identified as AraC family members were chosen for analysis32. Although these orthologs were linked to various functions (categories included: general metabolism, adaptive responses to nutrient sources, stress, and virulence), we analyzed all 58, regardless of functional class. As the prior study utilized sequence alignments for their analysis32, we used alignments of predicted secondary structure for our analysis. The following criteria were used to identify possible candidates: (1) The protein length was similar to that of ToxT. (2) The predicted secondary structure indicated a protein with an N-terminal β-strand rich domain and a C-terminal DNA binding domain comprised of seven α-helices (the canonical AraC DNA binding domain). (3) The presence of a positively charged amino acid at the C-terminal end of the α-helix predicted to be analogous to Lys230 in α7 of ToxT. (4) The presence of a positively charged amino acid in an appropriate position in the β-strand N-terminal domain. As β-strands are much more difficult to predict than α-helices, and as β-strand domains are inherently more variable, the exact location of this residue was less stringent than in rule #3. Using these rules, we identified four AraC virulence regulators out of the 58 that met all four primary criteria and for which comparison with an existing phylogenic tree32 showed they were among the closest relatives of ToxT. From members of the initial 58 candidates that failed our virtual screen based on rule #3, we identified three additional AraC proteins that are involved in virulence and have a positively charged residue near the end of the helix (Fig. 1, Table 1).

Figure 1
figure 1

Structural alignment of the C-terminal domains of the top AraC family hits. ToxT is shown at the top with structurally determined α-helices shown in boxes. For the other proteins, H indicates predicted α-helices and L indicates predicted loop regions. The position of α7 is indicated and the positive residues are highlighted for each sequence. Predictions by the PROF algorithm using the server at http://predictprotein.org.

Table 1 List of potential ToxT like AraC proteins.

Cloning of rns into an expression vector

The sequence encoding rns was optimized to remove rare codons and flanking sequences to insert into the plasmid pCDB24, a gift from Dr. Christopher Bahl (Institute for Protein Innovation), which contains a N-terminal 10xHis-SUMO tag (pCDB24 addgene.org), were added. The resulting construct was synthesized by IDT. The construct sequence was amplified using PCR and the resulting product was purified using the QIAquick PCR purification kit from Qiagen. The vector was digested with XhoI and purified using the QIAqucik PCR purification kit. The rns construct was cloned into the digested vector in frame with the SUMO tag, using the DNA Assembly Mix from NEB according to the manufacturer’s instructions, to create the 10xHis-SUMO-Rns (SMT-Rns) construct. The correct insertion of rns was verified by sequencing. The resulting plasmid was transformed into BL21 DE3 cells for unlabeled protein expression and into B834 DE3 cells for SeMet labeling.

SeMet labeled protein expression

SeMet labeled protein was expressed using Selenomet media from Molecular Dimensions. The cultures were initiated from a frozen stock into Selenomet media supplemented with 50 μg/ml of l-methionine and 100 μg/ml carbenicillin. The culture was incubated at 37 °C overnight with shaking. The next morning the culture was centrifuged at 25 °C for 10 min at 3000×g to collect the cells. The cells were washed three times with water. After washing the cells were resuspended in a tenfold larger volume of Selenomet media supplemented with 50 μg/ml of seleno-l-methionine, and 50 μg/ml of carbenicillin. The culture was incubated at 37 °C till an OD600 of 1–2. The culture was induced by adding 500 μM IPTG and incubated at 18 °C overnight.

Unlabeled Rns expression

SMT-Rns was expressed in modified TB media (Fisher Scientific). To start a frozen stock was used to inoculate a starter culture of 2 ml ZYP-0.8G media (ZYP media33 supplemented with 0.8% glucose) with 200 μg/ml carbenicillin, which was incubated overnight at 30 °C. The next morning of the starter was diluted 1:100 in TB containing 2 mM MgSO4 and 200 μg/ml of carbenicillin. This culture was incubated at 37 °C till an OD600 of 2–3. The culture was then diluted 1:10 in TB with 2 mM MgSO4 and 50 μg/ml carbenicillin in a 3 L baffled flask. This was again incubated at 37 °C till it reached an OD600 of 2–3. The culture was then induced with 500 μM IPTG and 5% glycerol was added. The culture was incubated at 18 °C overnight.

Rns purification

SMT-Rns purification was performed as follows. The culture was pelleted at 3000×g for 25 min. The bacteria were resuspended in wash buffer (20 mM TRIS pH 8, 20 mM imidazole, 500 mM NaCl) supplemented with 500 μM EDTA, 500 μM PMSF, and a Roche protease inhibitor tablet. The culture was lysed by three passes through a French press. The lysate was clarified by ultracentrifugation at ~ 100,000×g for 45 min. The supernatant was filtered using a 0.45 μm filter, and 1 M MgCl2 was added to a final concentration of 1 mM.

The SMT-Rns was captured using a HisTrap column from GE Healthcare. The column was equilibrated with 10 CV of elution buffer (20 mM TRIS pH 8, 500 mM imidazole, 500 mM NaCl) followed by 10 CV of wash buffer. The supernatant was loaded onto the column at 2 ml/min. The column was washed with 10 CV of wash buffer followed by 9 CV of 9% elution buffer, and 2 CV of 20% elution buffer. SMT-RNS was eluted from the column with a 10 CV gradient of 20–100% elution buffer. Fractions were collected in tubes with EDTA for a final concentration of 100 μM.

Cleavage of the SMT-Rns was achieved using previously purified 6xHis-Ulp1-6xHis SUMO protease, a gift from Dr. Bahl (Institute for Protein Innovation). The relevant fractions from the His-Trap column were pooled and dialyzed against 2 l of (20 mM TRIS pH 8, 200 mM NaCl, 1 mM DTT, 500 μM EDTA) at 4 °C. After ~ 4 h the dialysis buffer was changed, and a 10 mg aliquot of the protease was added to the pooled fractions. Dialysis was continued overnight at 4 °C. The next morning a HisTrap column was equilibrated with 5 CV of dialysis buffer with 1 mM MgCl2 and 20 mM imidazole. Imidazole was added to the dialysate for a final concentration of 20 mM and MgCl2 was added for a final concentration of 1 mM. The dialysate was then applied to the column and the flow through was collected.

Final purification was carried using an HiTrap Sp ion exchange column (GE Healthcare). The column was equilibrated with 10 CV of Sp elution buffer (20 mM TRIS pH 8, 1 M NaCl, 250 μM EDTA) followed by 10 CV of Sp wash buffer (20 mM TRIS pH 8, 200 mM NaCl, 250 μM EDTA). Then the flow through from the cleavage step was loaded onto the Sp column. The column was washed with 10 CV of Sp wash buffer. Elution was performed using a gradient to 100% Sp elution buffer over 5 CV. The concentrations of the relevant fractions were determined, and DTT to a final concentration of 1 mM was added to the fractions.

Crystallization and data collection

Initial crystal screening for the SeMet labeled Rns was done using 96 well block screens from either Qiagen or Hampton Research. Initial drops were setup either at 2 mg/ml or 0.6 mg/ml using a NT8 robot. The crystals were imaged using Rock Imager. The initial hits were then optimized. The best crystals of Rns were obtained in a base condition of 0.1 M (D/L) malic acid pH 7, with 6–10% PEG 3350. These conditions were then used in additive screens, Additive Screen (Hampton Research). The final crystallization condition for SeMet labeled Rns was 0.6 mg/ml protein added 1:1 to 0.1 M (D/L) malic acid pH 7, 6% PEG 3350, 0.01 M Betaine hydrochloride in sitting drops. The crystals were frozen using the well solution supplemented with 10% PEG 3350, and 40% glycerol as the cyro-protectant. Data was collected using the FMX beam line at NSLS-II. Two anomalous data sets were collected from the same crystal with 360º of rotation at a wavelength of 0.979184 Å.

Another set of screens were setup with native Rns at 0.6 mg/ml with 1 mM decanoic acid. The native Rns, decanoic acid mix was crystallized by adding the mixture in a 1:1 ratio to 0.1 M succinnic acid, 14% PEG 3350, 0.03 M glycyl-glycyl-gylcine in hanging drops. The crystals were frozen with the well solution supplemented with 30% ethylene glycol as the cryo-protectant. Diffraction data was collected at the AMX beam line at NSLS-II.

Anomalous data processing and refinement

The anomalous data sets were processed using XDS34. The space group was determined to be P 21 with a unit cell of 72.51, 49.97, 102.92, 90.00, 106.11, 90.00. The two datasets were combined in XSCALE34. The processed data was cut at 2.8 Å and used in a Hybrid Substructure Search followed by one round of AutoSol and one round of AutoBuild in PHENIX35. The model was refined using iterative rounds of automated refinement with Refine as implemented in PHENIX with NCS, secondary structure, as well as experimental phase restraints35, followed by manual model building using COOT36. Chimera and ChimeraX were used for model visualization37,38. Refinement statistics are listed in Table S1.

Native data processing and refinement

The native data sets were processed using XDS34. The space group was P212121 with a unit cell of 48.24 95.15 133.09 90 90 90. This crystal had significant pseudo-translational symmetry with an off-origin peak 46% of the origin peak as determined by PHENIX35. The structure was solved using Phaser39 as implemented in PHENIX35 using the previously solved SeMet-Rns as a search model. After phaser there was visible density in the ligand binding pockets of both monomers. The density was still visible after a cycle of automated refinement in PHENIX35. Decanoic acid was added to the density using COOT36 and refinement was performed using iterative rounds of PHENIX Refine35 followed by manual model building in COOT36. Chimera and ChimeraX were used for model visualization37,38. Refinement statistics are listed in Table S1.

Differential scanning fluorometry (DSF)

DSF was performed to assess the effect of fatty acids on Rns stability40. First stocks of fatty acids at 100× of final concentration were made by adding octanoic, decanoic, and palmitic acid to methanol. The decanoic acid was serially diluted by ½, in methanol, to obtain the concentrations for the dose response. 1 μl of the appropriate fatty acid was added to 99 μl of Rns, at a concentration of ~ 0.7 mg/ml, and incubated at room temperature for 1 h. Then 18 μl of the mixture was added to a PCR plate in triplicate. Sypro Orange dye (Life Technologies), diluted in buffer, was added to the PCR plate for a final concentration 5x, and a total reaction volume of 20 μl. For each condition a buffer only control was also performed in triplicate. The melting curves were generated using a StepOne + RT PCR machine (Life Technologies) with a gradient from 25 to 95 °C utilizing 1 °C steps. The normalized melt data was exported into STATA for analysis as described41.

Plasmids

Plasmid pGPMRns-Myc is a derivative of pTags2 (Addgene) that expresses Rns-myc from lacp. rns was amplified from pGPMRns42 with primers 1519/1522. pTags2 vector backbone was amplified with primers 1414/1419. The two PCR products were Dpn1 digested then circularized with NEB HiFi. All plasmids used in this study are listed on Table S2 and oligo-nucleotides are listed on Table S3.

Reporter strains

Plasmid pHKLac1 is a promoterless lacZ reporter integration plasmid with a pir-dependent origin of replication43. It carries attPHK022 for IntHK022-mediated integration into the chromosome of E. coli at attBHK022. The CS3 promoter was amplified from ETEC strain 1392/75-2a with primers 401/402. The CFA/I, cexE and nlpA promoters were amplified from ETEC strain H10407 with primers 38/4044, 394/39545, and 415/41642, respectively. The PCR products were digested with BamHI and EcoRI and then ligated into the same sites of pHKLac1 to construct pCS3Lac1 [CS3p (− 121 to + 352 relative to ORF)::lacZ], pCFAILac1 [CFA/Ip (− 486 to + 343 relative to ORF)::lacZ], pCexELac1 [cexEp (− 549 to + 264 relative to ORF)::lacZ], and pNlpALac1 [nlpAp(-391 to + 58 relative to ORF)::lacZ]. Each reporter plasmid was then integrated into the chromosome of MC4100 [F-araD139 ∆(argF-lac)U169 rpsL150 (StrR) relA1 flhD5301 deoC1 ptsF25 rbsR]46 as previously described43 resulting in strains GPM1072 (attBHK022::pCS3Lac1), GPM1061 (attBHK022::pCFAILac1), GPM1070 (attBHK022::pCexELac1) and GPM1080 (attBHK022::pNlpALac1). Colony PCR was used to verify that each strain possessed only a single plasmid integrant, as previously described43. Strains used in this study are listed in Table S4.

The λ Red recombinase system47 was used to generate a rns knockout in ETEC 1392/75-2a similarly to the cfaD::kan H10407 strain previously described48. Primers 1242/1150 were used to amplify a tetracycline resistance cassette targeting rns from pAH162. Electroporation of the cassette in to 1392/75-2a resulted in strain GPM3002 (rns::tet) which was confirmed via PCR and the loss of CexE and CS3 pilin expression.

β-galactosidase assays

Lac reporter strains GPM1061, GPM1070, GPM1072, and GPM1080 were transformed with pGPMRns-Myc or vector pTags2 to determine the effects of decanoic acid on expression from Rns regulated promoters. All strains were grown aerobically at 37 °C to stationary phase in LB medium with 100 μg/ml ampicillin with or without decanoic acid in 0.4% vol/vol DMSO. β-galactosidase activity was assayed as previously described49.

Analyses of protein expression

ETEC strains H10407 and 1392/75-2a were grown to stationary phase in CFA50,51 broth with or without decanoic acid in 0.4% vol/vol DMSO. Pilins were released from the outer membrane by incubating ETEC in 1/10 volume PBS at 65 °C for 20 min. Supernatants were clarified by centrifugation to remove insoluble material. Stationary phase whole cell lysates of ETEC or 5 μg pilin supernatants were subjected to SDS-PAGE then transferred to PVDF membranes or stained with 0.1% Coomassie R-250 in 40% ethanol, 10% acetic acid. PVDF membranes were blocked in TBS-Blotto (25 mM TrisCl pH 7.6, 150 mM NaCl, 5% wt/vol powdered nonfat milk). Antibodies against CexE homologs CexEα (H10407) and CexEε (1392/75-2a) were produced by immunization of rabbits (Proteintech Group, Inc.) with purified antigens and used at a dilution of 1:5000 in TBS-Blotto with 0.05% vol/vol Tween2045,48. The primary antibody against DnaK (AbCam ab69617) was used at a 1:10,000 dilution. HRP conjugated goat anti-rabbit (Santa Cruz Biotechnology sc-2030) and goat anti-mouse (Jackson ImmunoResearch 115-036-062) antibodies were used at 1:10,000 dilutions. Chemiluminescence and Coomassie staining was detected with an Odyssey FC Imaging System (LI-COR Biosciences).

Results

Identification of proteins potentially regulated by UFAs

Given the lack of overall sequence identity between ToxT and other AraC proteins we aligned secondary structure predictions and sequence alignments to identify eight AraC proteins that are known virulence regulators and that we predict may be regulated by UFAs (Fig. 1, Table 1). Many of these proteins did not contain a lysine in the area corresponding to Lys230 in ToxT, but rather an arginine, and in one case histidine. When these proteins were compared to an existing phylogenic tree of fifty-eight AraC family proteins32, all were found to be among the most closely related to ToxT. This supports our hypothesis that these AraC family members could share a common mechanism of being regulated by fatty acid ligands and served as a foundation for subsequent structural and functional analysis of these target proteins.

Initial structure determination

We pursued several of these targets for biochemical and structural studies, and Rns was the first target that led to a crystal structure. Following crystallization and data collection, attempts at molecular replacement using ToxT and other AraC family member structures as phasing models were unsuccessful. Therefore, we used SeMet labeling coupled with single-wavelength anomalous dispersion (SAD) to experimentally determine phases and solve the structure. Rns shows the typical characteristics of two-domain AraC family proteins. The DNA binding domain (DBD) (residues 162–265) consists of seven alpha helices linked to the N-terminal domain by a 3 residue linker. The N-terminal domain (residues 1–158) has a cupin-like fold made up of 6 β-strands and 4 α-helices (Fig. 2a). One monomer in the asymmetric unit (ASU) was found to be in an open conformation in which the N- and C-terminal domains were separated by a groove containing electron density consistent with a bound ligand, which we modeled as decanoic acid.

Figure 2
figure 2

Structure of Rns and its potential dimerization interface. (a) Structure of SeMet-Rns showing decanoic acid modeled in the ligand binding pocket with α-helices in blue, β-sheets in orange, coils in gray, and the decanoic acid in pink. The N-terminal and DNA binding domains are labeled. (b) Proposed biological dimer with the N-terminal domains of the two monomers in orange or blue and both DNA binding domains in teal. The N-terminal α-helices are labeled. Images created with ChimeraX38.

The Rns dimer interface

Rns formed a dimer across a crystallographic symmetry axis in a manner similar to other AraC proteins30,31. The interface consists of helices α1–α3 from the N-terminal domain of both monomers interacting in an antiparallel manner, burying approximately 274 Å2 of surface area (Fig. 2b). This is similar to the dimer interface of ExsA, an AraC family member from Pseudomonas aeruginosa involved in the transcription of the type three secretion system31,52. In Rns, the two DNA binding domains point almost 180° away from each other (Fig. 2b), which would allow the dimer to bind looped DNA in a manner similar to AraC53. Ongoing studies are focused on investigating the physiological relevance of Rns dimer formation.

Rns contains a fatty acid ligand

The crystal structure of the open conformation showed unexpected electron density consistent with the presence of a medium chain fatty acid. As we hypothesized Rns binds fatty acids, we initially modeled the density as eight-carbon octanoic acid. Following refinement, because the density was not completely filled by octanoic acid, we built in decanoic acid, which completely filled the electron density and further improved the refinement statistics (Fig. 3a). As no exogenous fatty acids were added to Rns during purification or crystallization, the presence of bound fatty acid suggested a physiological role; therefore we proceeded with analysis of the effect of fatty acids on Rns activity.

Figure 3
figure 3

Rns binds decanoic acid. (a) Detail showing the electron density from an 2Fo-Fc omit map contoured to 1σ around the decanoic acid as a blue volume. Image created with ChimeraX38. (b) DSF showing an increase in the Tm of Rns in a dose dependent manner with decanoic acid. Octanoic acid does not affect the Tm of Rns and palmitic acid decreases the melting temperature of Rns. Data is shown as the mean ± SD, n = 3.

Decanoic acid binds to and inhibits Rns activity

Differential scanning fluorometry (DSF) was used to determine if decanoic acid and other fatty acids could interact with Rns. DSF experiments were performed with decanoic acid (10 carbons), octanoic acid (8 carbons), and palmitic acid (16 carbons). While octanoic acid did not change the melting temperature (Tm) of Rns, palmitic acid decreased its Tm and decanoic acid increased the Tm in a dose dependent manner up to a concentration of 1 mM (Fig. 3b). This is consistent with decanoic acid interacting specifically with Rns.

To determine the biological relevance of our in vitro studies we evaluated the effects of exogenous decanoic acid on the expression of Rns-dependent virulence factors with two different ETEC strains. Coomassie stained gels of outer membrane proteins released by heat shock revealed that decanoic acid inhibits the expression of both strains’ major pilins (Fig. 4a). This effect is specific to the Rns-dependent pilins because the expression of flagellin and other proteins of unknown identity was largely unaffected by the addition of decanoic acid. Likewise Rns-dependent expression of the outer membrane lipoproteins CexEα and CexEε54,55 was inhibited by decanoic acid in a dose dependent manner (Fig. 4a). The effects of decanoic acid on Rns-independent DnaK were negligible. This was expected because decanoic acid is not microbiocidal even at concentrations ca. 16-fold higher than the levels required to inhibit the expression of Rns-dependent virulence factors (Fig. S2). Thus, our in situ results reveal decanoic acid is a specific inhibitor of the Rns virulence regulon in pathogenic strains of ETEC.

Figure 4
figure 4

Decanoic acid inhibits Rns activity. (a) Excerpts of western blots of DnaK/CexE (same blot, probed sequentially for CexE then DnaK and cropped) and Coomassie stained pilin preparations from two ETEC strains (H10407 top and 1392/75-2a bottom). Expression of both CexE (12 kDa) and the major pilins (17 kDa) decreased with increasing concentrations of decanoic acid while DnaK (70 kDa) and flagellin (51 kDa) controls were unaffected. For full western blots see Fig. S1. (b) β-galactosidase assays showing decanoic acid repressed Rns activity at the cexE, CFA, and CS3 promoters and derepressed at the nlpA promoter. Data is given as the mean response ± SD, n = 3.

Because we predict the effects of decanoic acid on Rns function should occur at the level of transcription, we employed Lac reporter strains to evaluate this hypothesis at Rns activated promoters42,44,45. As expected, the addition of decanoic acid inhibited Rns-dependent expression of β-galactosidase from cexEp, CFA/Ip, and CS3p in a dose dependent manner (Fig. 4b). Decanoic acid also relieved Rns-dependent repression of nlpAp, mirroring the effects of the activated systems (Fig. 4b). Although we also observed off target effects of decanoic acid, for example at nlpAp in the absence of Rns and the Rns-independent promoter tibDBp (Fig. 4b, Supplemental Fig. S3), the magnitude of these effects was less than when Rns was involved. As both repression and activation require Rns binding at sites near each promoter (Ref.45, and Munson, unpublished), these results suggest decanoic acid directly interferes with the ability of Rns to bind DNA. Thus, exogenous decanoic acid abolishes the expression of ETEC virulence factors by inhibiting the activity of the Rns transcription factor. This is the first evidence of a small molecule that specifically inhibits Rns mediated ETEC virulence and has significant implications for therapeutic applications.

Structure with added decanoic acid

Given decanoic acid both stabilized and affected Rns function, we crystalized it in the presence of excess decanoic acid, ensuring saturation, to assess what effects this might have on its structure. Despite the diffraction data showing significant pseudo-translation, the structure was solved to 3.0 Å using molecular replacement with the apo SeMet-Rns structure as a search model. Analysis of the resulting electron density showed both monomers contained electron density consistent with a ligand, which we again modeled as decanoic acid.

Overall, the SeMet and the unlabeled Rns structures are quite similar. In the unlabeled structure both monomers in the ASU were in the closed conformation. Therefore, we examined differences between the open monomer observed in the SeMet structure and a closed monomer of the unlabeled structure. The two conformations were aligned using the long helix in the DBD. This highlighted that helices making up the dimer interface and β-strands 2, 4, and 6 in the open conformation are shifted away from the long helix in the DBD by ~ 1.7 Å in comparison to the closed conformation (Fig. 5a,b). This shift results in a change from a pocket in the closed conformation to a groove in the open conformation (compare Fig. 5c to d).

Figure 5
figure 5

Comparison of the SeMet labeled structure in the open conformation and the native Rns structure in the closed conformation. (a) Superposition of the native (orange) and SeMet-Rns (blue) structures, the two domains were aligned using the long helix in the DNA binding domain (Phe205-Glu223), showing how the N-terminal domain is shifted away from the DBD in the SeMet structure. (b) Detail showing the β-sheet is shifted by 1.7 Å from the DBD in the SeMet structure (line to blue β-strand) compared to the native structure (line to orange β-strand). (c) Surface representation of the SeMet-Rns structure in blue with the decanoic acid in pink showing how the fatty acid is exposed to solvent. (d) Surface representation of the native-Rns in orange with the decanoic acid in purple showing how the ligand is enclosed in more of a pocket. Images created with ChimeraX38.

Arg75 and His20 in the structures make contact with the decanoic acid

In both the open and closed conformations, His20 and Arg75 interact with the carboxyl group of the decanoic acid (Fig. 6a,b). Notably, Lys216, which was the positively charged residue identified by our computational screen as potentially interacting with the fatty acid (Fig. 1), does not interact with the ligand. In the open structure, Arg75 is between the fatty acid and the rest of the N-terminal domain (Fig. 6a left panel), whereas in the closed structure, Arg75 is more alongside the ligand (Fig. 6a right panel). The position of the decanoic acid head group is in a different orientation in the open and closed conformations. In the open conformer, the head group is perpendicular to the plane of the imidazole group of His20, resulting in only one oxygen from the fatty acid making contacts with His20 and Arg75 (Fig. 6a,b left panel). In the closed conformation the fatty acid carboxyl group is turned 90° allowing both oxygens to make contacts with either His20 or Arg75 (Fig. 6a,b right panel). We speculate the position of the Arg75 drives the transition between open and closed conformations, and are currently exploring the role Arg75 and His20 play in mediating the structural response to decanoic acid binding.

Figure 6
figure 6

Detail showing the interactions between His20 and Arg75 to decanoic acid. (a) “front” view showing how the Arg75 in the SeMet-Rns, in blue, is above the decanoic acid (left panel) while in the native-Rns, in orange, Arg75 (right panel) is alongside the fatty acid. (b) “Top” view with the SeMet-Rns in blue (left panel) and native-Rns in orange (right panel). Images created with ChimeraX38.

Comparison of Rns and ToxT binding pockets

Because Rns and ToxT are the only full-length AraC structures solved that contain fatty acid ligands, we compared the binding modes of the two proteins. The proteins bind fatty acids of different lengths and degree of saturation: palmitoleate (16 carbon monounsaturated) binds to ToxT, while decanoate (10 carbon saturated) binds to Rns. We aligned ToxT and Rns using ToxT residues 211–231 (3GBG27) and the native Rns structure residues 205–223 with ChimeraX38. While both fatty acids are bound between the N-terminal domain and the DBD (Fig. 7), in ToxT the bound fatty acid projects more deeply into the N-terminal domain (compare Fig. 7a to b). In ToxT, residues from the N-terminal domain (Lys31) and the DBD (Lys230) interact with carboxyl group on the fatty acid (Fig. 7a), whereas in Rns the two residues that interact with the carboxyl group are in the N-terminal domain (Fig. 7b). The β1-strand in ToxT is absent in the Rns structure (compare Fig. 7a to b), creating a more enclosed pocket in ToxT (Fig. 7c) than in Rns, which has an opening in the side of the pocket (Fig. 7d). Both pockets are predominantly hydrophobic except for the charged groups involved in binding the carboxyl portion of the fatty acid (Fig. 7c,d). Palmitoleate occupies a larger portion of the binding pocket in ToxT than the decanoate in Rns (Fig. 7c,d), suggesting Rns is able to bind ligands larger than decanoate. While fatty acids regulate both proteins, the details of the binding are different, which is likely to be a theme among fatty acid regulated AraC’s.

Figure 7
figure 7

Comparison of the binding pockets of ToxT and Rns. (a) ToxT in blue with the palmitoleic acid in dark blue. (b) The native Rns in orange with the decanoic acid in purple were aligned in ChimeraX38 using the long helix in the DBD. The β-strand for each protein closest to the long helix in the DBD are labeled, as well as the residues which make contacts with the fatty acids. The binding pockets for (c) ToxT and (d) Rns are shown. The surfaces are colored according to lipophilicity calculated using ChimeraX38, with cyan for charged areas and orange for hydrophobic surfaces. Again, the residues interacting with fatty acids are labeled. Images created with ChimeraX38.

Discussion

As part of ongoing efforts to determine whether other AraC proteins are regulated by fatty acids in a similar fashion as ToxT, we solved the structure of Rns using X-ray crystallography in two conformations: open, distinguished by having a groove between the N and C-terminal domains for ligand binding; and closed, where the two domains have come closer together with the pocket enclosing the ligand. The Rns structures were dimeric, with an interface similar to structures of other AraC proteins30,31. The DNA binding domains of the monomers are pointing ~ 180° away from each other, suggesting Rns binds DNA loops in a manner similar to AraC53. Such an orientation is also consistent with the observation that most Rns regulated promoters have proximal and distal sites separated by about forty base pairs, although for some promoters this separation is less and for others it is more16,19,23,42,45,56. For example, CS1 pili expression is mediated by two sites, one proximal to the -35 RNAP binding site, and one distal, around − 106, to the transcription start site23. The sites are asymmetric and contribute additively to the expression of the CS1 pili. The rns promoter has two binding sites, one at the − 227 position and the other downstream of the transcription start site56, which are widely spaced and act synergistically, with both required for activating transcription56. These findings, coupled with the observations from our structures, indicate Rns likely binds to looped DNA. Work is ongoing to investigate the relevance of this interface and to clarify if Rns binds looped DNA.

The ligand binding pocket of Rns was occupied by the 10-carbon fatty acid, decanoic acid. DSF studies showed decanoic acid specifically stabilizes Rns, and that it inhibits Rns activated expression of CexE as well as the CS3 and CFA/I major pilins in ETEC strains. Decanoic acid also inhibits Rns activity at the cexE, CFA/I, and CS3 promoters, where Rns activates transcription, and at the nlpA promoter, where Rns represses transcription16,42. Therefore, we have identified the first small molecule that inhibits Rns mediated expression of ETEC virulence factors. Based on homology, we expect these results to be applicable to CsvR, which is found in other ETEC strains, and AggR from enteroaggregative E. coli17,57.

The finding that decanoic acid inhibits expression of ETEC colonization factors raises several questions including, what is the source of the decanoic acid, what are its physiological concentrations, and where in the intestine would ETEC be exposed to the molecule? One source of decanoic acid is food, including milk products, coconut oil, palm kernel oil, and some vegetable oils58,59,60,61. Decanoic acid, like all medium chain fatty acids, is rapidly absorbed by the small intestine, resulting in a decreasing concentration gradient of dietary sourced decanoate along the intestinal tract62,63. In addition to diet, decanoic acid derivatives may be synthesized by the microbiota for signaling purposes. For example, it is known Psuedomonas aeruginosa produces cis-2-decenoic acid to signal biofilm dispersal64,65, and it is unlikely to be only bacteria to produce such molecules. Furthermore, while we identified decanoic acid as a Rns ligand, the structure of the Rns binding pocket suggests a number of relevant natural products could be capable of binding to and regulating the protein. While the specific structural mechanism by which Rns is regulated by decanoic acid is not clear, it likely involves influencing the stability of the Rns dimer, either directly or by a dynamic allosteric mechanism as demonstrated for V. cholerae ToxT66, and future studies will be directed at clarifying this mechanism.

This study demonstrates the first time that a small molecule binds to Rns and inhibits virulence gene expression in ETEC, and provides a foundation for decanoic acid and its analogs to be pursued as anti-virulence lead compounds for limiting the morbidity and mortality caused by ETEC.