Introduction

Viral dark matter is a term given to describe the vast number of unknown genes and their products1. Even bacteriophage λ, whose history spans the entirety of molecular biology, has a genetic region between the exo and xis genes that has remained under-characterized, until recently. Driven by the pL promoter, the λ exo-xis region genes comprising orf61, orf63, orf60a, orf73, and ea22 are among the earliest transcribed during infection2 (Fig. 1). The exo-xis region was associated with cell cycle synchronization3 and inhibition of host DNA replication4. Since these initial observations, exo-xis region genes have been identified as regulators of the viral life cycle and the decision to either actively produce progeny and lyse the host cell (lytic cycle) or remain domain and integrated into the host genome (lysogenic cycle). Since deletion of the exo-xis region does not inhibit replication4, it leads to the idea that exo-xis region genes serve an auxiliary and more nuanced role during development possibly by engaging host transcription factors, toxin-antitoxin regulatory pathways, and stress response pathways.

Figure 1
figure 1

The exo-xis region of phage λ. A genomic cassette including ea22 (red) and four other genes (black) are typically observed among lambdoid phages. Beyond this cassette, these is considerable diversity. In phage λ, two additional genes are observed (grey).

At 182 amino acids in length, Ea22 is the largest of the exo-xis proteins. A combination of sequence comparisons, structure predictions, and deletion studies suggest that Ea22 may be considered in three parts consisting of a short variable N-terminal region, a central coiled-coiled region comprising over half the protein, and a C-terminal domain (CTD)5. Here, we present the NMR structure of the λ Ea22 CTD. This structure represents the second high-resolution examination of the exo-xis region, following the NMR structures of two λ Ea8.5 homologs6. A recent review of all exo-xis region proteins predicted by AlphaFold and RoseTTAfold machine learning methods includes a general discussion of Ea227.

While the shorter and presumably single domain exo-xis proteins Orf61, Orf63 and Orf60a promote the lytic developmental pathway8,9, Ea22 remains in sole contrast as pro-lysogenic developmental protein10. Providing evidence for this role, bacterial lysogens with a deleted ea22 gene are rapidly induced towards lytic development by UV irradiation and produce more viral progeny as the infection proceeds10. Since all exo-xis region proteins are expressed within minutes of infection, their combined action towards promoting lytic or lysogenic development may be the net result of several host metabolic and stress related pathways being interrogated or manipulated at once.

Enterohemorrhagic E. coli (EHEC) such as O157:H711,12,13 are responsible for food poisoning outbreaks and more severe outcomes in immunocompromised people14. These strains contain a variety of genetic elements that contribute to their pathogenicity, including being lysogens of phages that share many similarities with phage λ15. One important distinction between the resident prophage sequences within these strains and phage λ is the presence of a gene encoding Shiga toxin (either stx1 or stx2) that is produced during the late stages of the lytic developmental cycle16. Like ricin, Shiga toxin disables ribosomes in intestinal epithelial cells thus aggravating the bacterial infection. Stx phages also contain copies of exo-xis genes. Differences in gene expression and developmental effects arising from exo-xis region gene expression in Stx phages tend to be more amplified17,18. The ea22 gene is most illustrative of this distinction between Stx phages and phage λ since not only are the effects more pronounced, but the Ea22 proteins of Stx phages have CTDs that are distinct from the Ea22 CTD of phage λ.

The λ Ea22 CTD alone cannot reproduce the pro-lysogenic properties of the full-length protein5 suggesting that the CTD does not have any intrinsic activity or it requires support from the amino-terminal sequence and central coiled-coil domain. From previously published multi-angle laser scattering studies, the λ Ea22 CTD was observed to be dimeric5. The full-length protein; however, was observed to be exclusively tetrameric5. It is currently unknown if the coiled-coil domain mediates tetramerization or the protein is organized as dimer-of-dimers via interactions contributed by the short terminal amino-terminal segment. A previous comparison of 1H,15N-HSQC spectra of full-length λ Ea22 and the λ Ea22 CTD revealed similar spectra5 suggesting the CTD was decoupled from the rest of the 84 kDa tetramer by a mobile segment, otherwise its 1H-15N resonances would have experienced the same degree of line broadening. During the same study, circular dichroism and differential scanning calorimetry revealed the Ea22 CTD was shown be soluble and thermostable in excess of 95 °C suggesting it would be an excellent candidate for high-resolution structural studies.

Results

The NMR structure of the λ Ea22 CTD dimer (21.4 kDa) was solved using a conventional heteronuclear approach that incorporated 993 intramolecular distance restraints equating to approximately 15 restraints per residue, 25 intermolecular distance restraints, 23 inferred hydrogen bonds, and 43 backbone dihedral angle restraint pairs that were predicted from chemical shifts. A complete statistical summary of the ensemble is available in Supplementary Table S1. The ensemble of the twenty best structure solutions (Fig. 2a) has a precision of 0.6 Å for ordered backbone atoms and 1.0 Å for all ordered heavy atoms.

Figure 2
figure 2

The NMR structure of the λ Ea22 CTD. (a) Two views of the structure ensemble with the two chains colored blue and magenta. (b) A schematic representation. (c) Similar dimeric proteins identified from the Protein Data Bank. To guide the comparison, topologically similar helices are colored red, purple and green. (d) A sequence alignment following the same coloring scheme of secondary structure elements.

The λ Ea22 CTD has a secondary structure composition of β1β2α1β3β4α1 (Fig. 2b). The two α-helices cross each other on one face of the four stranded β-sheet leaving the opposite face of the β-sheet to form the dimerization interface. The dimer interface incorporates contributions from all four β-strands (β1: F112/V114/R116; β2: P122/ I124; β3: T146/D148/I150; β4: V162). An ionic bond between R116 and D148 bridges the two protomers. Complete chemical shift assignments could not be made for the amino-terminal affinity tag and residues S102-Q111, the β2/α1 loop (H126-D130), and a longer β3/β4 loop (T151-G158). Thus, a unique conformation of the β2/α1 and β3/β4 loops could not be determined. A graphical presentation of all assigned heavy atoms is provided as Supplementary Fig. S1. Experimental data supporting the configuration of the β-sheet is provided as a Supplementary Fig. S2 and a set of inter- and intramolecular NOEs supporting key contacts made by I124 is provided as Supplementary Fig. S3. A schematic depicting the organization of the dimer interface is provided as Supplementary Fig. S4. From an analysis of the intermolecular interface with PISA19, 23 of the 64 structured amino acids in the Ea22 promoter bury a total of 904 Å2 of surface area. Since the NMR study was performed at 308 K and pH 7.6, segments with high mobility would be attenuated due to hydrogen exchange with solvent. When 1H,15N-HSQC spectra of 15N-labeled Ea22 CTD at 0.3 mM were compared at 303 K and 298 K (Supplementary Fig. S5), no additional resonances corresponding to the the β2/α1 and β3/β4 loops were observed. Unexpectedly, a set of resonances in the 298 K spectrum were noticeably more broadened compared to the 308 K spectrum. These majority of these resonances, presumably in an intermediate exchange regime at 298 K (F112-V114, R116, H117, K125, H126, L144, D148, I150, I152, G159, Q163-A165) mapped to amino acids at the dimer interface. Thus, in contrast to previously published multi-angle laser scattering data5 where no monomer was observed at a comparable concentration (~ 0.2 mM), NMR spectroscopy was able to detect a possible monomer–dimer equilibrium.

The λ Ea22 CTD represents an example of how caution should be taken with machine learning predictions. Using the CTD sequence alone, the AlphaFold20 prediction of the dimer was very similar to the NMR structure (1.25 Å Cα RMSD on one protomer; Supplementary Fig. S6), with the only difference being that the AlphaFold method suggested a longer α2 helix that extending implausibly beyond the domain. In contrast, when the full-length sequence was used or a portion of the coiled-coil region was appended to the CTD, a completely different model was predicted. This model placed the α1 and α2 on opposing sides of the β-sheet rather than interacting together on the same side of the β-sheet. ESMfold is a complementary machine-learning based method to AlphaFold that uses a protein language model in lieu of a multiple sequence alignment21. The NMR solution structure was dissimilar to the ESMFold predicted models of the full-length sequences and the CTD sequence alone.

A structure-based search of the Ea22 CTD dimer against the PDB using FoldSeek22 and SSM23 did not identify any homologous structures. To present this unique configuration in context, the Ea22 CTD with its two α-helices is presented in Fig. 2c in between dimeric proteins that have either one or three α-helices. A sequence comparison follows the structural comparison in Fig. 2d. The λ Ea22 CTD superimposes somewhat (2.63 Å Cα RMSD over 47 residues) to the antitoxin protein Dmd from phage T4. The λ Ea22 CTD dimer also has some superficial resemblance to C. elegans Erh1 which participates in a complex with three other proteins to facilitate the processing of regulatory RNAs24; however, not only is the orientation of the β-sheets is reversed and not as extensive, but there is also no analogous three-helix platform for supporting a donated fourth helix from a protein partner in the complex. In summary, given the lack of close homologs, the Ea22 CTD represents a unique structural assembly whose function remains unknown.

To investigate where variation is occurring in the Ea22 CTD, a search was performed against the four million non-redundant sequences comprising the UniRef100 database. From the fifty-four sequences identified spanning the entirety of the CTD (Supplementary Fig. S7) amino acids with at least 30% variation were mapped on the λ Ea22 CTD NMR structure. As shown in Fig. 3, the most variation was observed at the solvent-facing sides of helices α1 and α2 and the β2/α1 loop. No variation was observed for the unstructured β3/β4 loop.

Figure 3
figure 3

Sequence conservation in the λ Ea22 CTD protein fragments from the UniRef100 database. (a) WebLogo plot. A red dot indicates a position where > 30% variability is observed. (b) The variability is mapped onto one chain (light grey) of the λ Ea22 CTD dimer.

The ea22 gene is the last and largest of four conserved genes in the exo-xis region. In a standard laboratory E. coli MG1655 strain infected with any of four commonly studied Stx phages, ea22 transcription exceeds all other exo-xis genes and early infection markers such as N10. However, in the same host strain infected with λ, the opposite result is observed with transcription being the lowest among the exo-xis region genes10. In addition to this major difference in gene expression, Stx phage ea22 genes are mosaic7,10. To gain a greater understanding of variation among ea22 genes, a previously published dataset of thirty-seven phage and EHEC prophage genomes25 were analyzed. In eleven prophage genomes, no ea22 protein was detected using a translated nucleotide BLAST search. The remaining genomes resembled λ ea22 throughout the first half of the gene but no EHEC-associated ea22 gene encoded a domain like the λ Ea22 CTD. Among the EHEC-associated ea22 genes in the dataset, the observed Ea22 protein sequences could be reduced to three possible variants (Table 1). One sequence of each variant was selected as the prototype (Fig. 4). In eight cases. Two Ea22 variants were encoded in the same genome. All three Ea22 variants were encoded in the genome of E. coli O157:H7 Xuzhou21, a strain that was first isolated in 1986 from China and was responsible for a large outbreak in 199926 (Supplementary Fig. S8).

Table 1 Classification of Ea22 variants in a collection of shigatoxigenic phages (italics) and EHEC strains.
Figure 4
figure 4

Three major variants of Ea22 are observed in a set of Stx phage and EHEC genomes. The exo-xis regions of an EHEC representative bearing ea22 gene variant 1 (ea22_v1) and two Stx phage representatives bearing ea22_v2 and ea22_v3, are shown. The encoded Ea22_V1 and Ea22_V2 and Ea22_V3 proteins serve as prototypes for further analyses.

A complete sequence comparison of λ Ea22 and the three Ea22 variants is presented as Supplementary Fig. S9. When the common regions are presented as blocks, the mosaic nature of the genes becomes apparent (Fig. 5a). Each variant shares the same amino-terminal segment and central coiled-coil region (with InterPro signature IPR025153/Ead_Ea22). The three Stx Ea22 variants contain a 63 amino segment that is absent in the λ Ea22 sequence. The carboxy-terminal sequences bear no similarity to each other except for a 16 amino acid segment in V1 and V2. An InterPro signature (IPR007539/DUF551) is unique to the Ea22 V2 variant. Like the λ Ea22 CTD, the DUF551 signature sequence has the hallmarks of a portable domain since it is found in E. coli RNA polymerase σ70, helix-turn-helix transcription factors, a cryptic prophage CPS-53 protein YfdS that alters sensitivity to oxidative stress27, and a dATP/dGTP pyrophosphohydrase from phage MazZ that inhibits host restriction enzymes28. Despite the considerable differences in CTD architectures among the Ea22 variants, an electrostatic analysis reveals that all CTD are predominantly acidic (Fig. 5b).

Figure 5
figure 5

Ea22 proteins have a mosaic organization. All models were predicted by AlphaFold except for the λ Ea22 CTD which was grafted onto the full-length λ Ea22 protein. The green- and magenta-colored regions are signature sequences in the InterPro database. The N-terminal regions may be considered in terms of a common region shared with λ Ea22 (red) and another region (orange) only observed in the Ea22 variants. The CTD regions of each Ea22 variant (blue and boxed) are dissimilar. An electrostatic surface of each CTD is shaded with a gradient from a kT/e value of − 5 (red) to 0 (white) to 5 (blue).

While biochemical studies have determined that full-length λ Ea22 and Ea22 V3 (from phage vB_ecoS_P27) are tetrameric in solution, it is not presently known how the tetramer is formed. The two most straightforward models of a tetramer would be a dimer-of-dimers mediated by either the N-terminal region or the coiled-coil domain. Attempts to model the N-terminal and coiled-coiled regions as tetramers with AlphaFold Multimer produced candidates with poor packing. Given these modeling challenges, we took a reductionist rationale and modeled each Ea22 variant as a dimer using AlphaFold multimer (Fig. 5b). In Supplementary Fig. S10, the same modeling is presented using ESMFold that made similar predictions for the N-terminal and coiled regions. For the Ea22 V3 variant, AlphaFold and ESMFold diverged considerably in the predictions of the C-terminal regions. Neither prediction of the C-terminal region could explain why a stable fragment of the Ea22 V3 variant (termed P27 Ea22 in that study) was observed from limited proteolysis assay5.

Discussion

The structure of the Ea22 CTD is unique to lambdoid phages. While the structure of the Ea22 CTD did not reveal clues towards its function in λ phage, its conservation suggests the preservation of a useful function in viral development. Shigatoxigenic phages and prophages associated with EHEC possess an ea22 gene within the same exo-xis region of their genomes, albeit with some important structural differences. The most variability among the Ea22 proteins occurs at the C-terminal region of the protein. In a few cases where purified proteins can be studied, the C-terminal region is dimeric with acidic electrostatic surface profile like λ regardless of sequence, and the full-length protein is tetrameric suggesting there is evolutionary pressure to maintain a specific oligomeric state even when the sequence and structure are drifting5. It is possible, therefore, that Stx phages have acquired new domains that improve fitness while maintaining a similar architecture. Within the complex environment of the human gut where bacteria experience oxidative attack by the immune system, Ea22 attenuates the lytic cycle thereby improving the number of bacterial survivors and those that become lysogens10. Interestingly, the uncharacterized motif (DUF551) of the Ea22_V2 is also observed in the protein YfdS that enhances resistance to oxidative stress27,29. Cells lacking yfdS lose the ability to adapt to their environment and are sensitive to H2O2 which is a commonly known natural inductor of Stx prophages in the human gut18. Importantly, the genes encoding Shiga toxins remain inactive within lysogenic bacteria, and the activation of prophages is required for their effective expression and the subsequent release of toxins. While UV light exposure or antibiotics that disrupt DNA replication are frequently employed in lab settings to trigger lambdoid prophages, such circumstances are unlikely to occur in the human intestinal environment. Exposure to H2O2 by intestinal neutrophils or protist predators creates oxidative stress that promotes the activation of Shiga toxin-converting prophages17. Thus, a potential role of Ea22 as a silencer of oxidative stress mediated Stx prophage induction is reasonable and may explain its pro-lysogenic activity10. The metabolic and signaling pathways through which lysogeny is maintained are likely employed by other stress responses since H2O2 induction as an effect is lower than UV irradiation or application of the DNA crosslinking antibiotic, mitomycin C.

Phage-phage and phage-host yeast two-hybrid surveys did not identify any potential partners of λ Ea22; however, these findings might have been influenced by factors such as the limited expression of one of the partners, interactions of low strength, and a requirement of Ea22 to function as one part of a multi-protein assembly30,31. If Ea22 accesses other phage and host proteins via the CTD, the β3/β4 loop (152-HRYYGVGG-159) serves as an interesting possibility since it is dynamic and can make both ionic and hydrophobic contacts. Notably, the β3/β4 loop also appears to be conserved.

Viral non-coding small RNAs (sRNAs) are established mediates of the development switch from the lysogenic to lytic development32,33. In the prevailing model, high concentrations of sRNAs act as a repressor of lytic gene expression. Since Ea22 is the only protein expressed from the exo-xis gene region that has pro-lysogenic characteristics10, we can only speculate at this time if it can act as a sRNA binding protein as a facilitator or mediator.

New structure prediction and sequence comparative methods offer a powerful means to jumpstart molecular and structural investigations of the vast amount of viral dark matter remaining to be explored. Future studies of proteins encoded by the exo-xis gene region may also reveal new ways in which to control the lysogenic-lytic developmental decision and make therapeutics more effective against food poisoning and a fraction of cases that become potentially life-threatening.

Methods

Expression and purification

A carboxy-terminal domain (CTD) fragment of λ Ea22 (102–182, reference #179,655, UniProt: P03756) was gene synthesized by ATUM (Menlo Park, CA) and inserted into plasmid pD441NH for expression at 25 ˚C in an E. coli BL21:DE3 strain. At an A600nm of 0.8, expression was induced by the addition of 1 mM isopropyl thiogalactoside (IPTG) and the culture was grown for three hours further before harvesting. The expressed protein additionally contains a 6xHis tag at the amino-terminus and the sequence IETAV at the carboxy-terminus. Detailed information regarding the expression and purification of this fragment by nickel affinity and gel filtration chromatography has been published previously5. For NMR spectroscopy, a uniformly 13C, 15N labelled protein was made from a 1 L culture containing M9 minimal media salts supplemented with 3 g of 13C (99%)-glucose (Sigma-Aldrich), 1 g of 15N (99%) ammonium chloride (Sigma-Aldrich) and 1 g of 13C,15N (99%, 99%) Celtone algal extract (Cambridge Isotopes). Protein concentration was estimated by UV absorbance at 280 nm.

NMR sample preparation and NMR spectroscopy

The λ Ea22 CTD purified protein was concentrated to 0.6 mM in 5 mM Tris–Cl pH 7.6, 50 mM NaCl, 0.05% (w/v) NaN3 and 10% (v/v) D2O for NMR spectroscopy. For NMR experiments requiring a sample in D2O, the Ea22 CTD protein was dialyzed to a similar buffer in 98% (v/v) D2O. A mixed 12C/13C sample was made by mixing 12C/14N and 13C/15N proteins at a 1:1 molar ratio, adding urea to 6 M and rapidly diluting the mixture into a 20-fold excess of NMR buffer. The protein was concentrated and dialyzed to a 98% (v/v) D2O buffer. A standard set of 2D and 3D heteronuclear NMR spectra were acquired at a temperature of 308 K on a 700 MHz Bruker AvanceIII spectrometer equipped with a nitrogen chilled probe. For backbone and side chain assignments, these experiments included (2D-15N-HSQC, 2D-13C-HSQC, 3D-HNCACB, 3D-CBCACONH, 3D-HNCA, 3D-HNCOCA, 3D-HNCO, 3D-HNCACO, 3D-CCONH, 3D-HCCONH, 3D-HBHACONH, and 3D-CCH-TOCSY). The 3D experiments were acquired according to a 10–20% sparse sampling schedule and processed with NMRPipe34 and HMSist35. Distance restraints were obtained from a 3D-15N-NOESY and a 3D-13C-NOESY experiments sparsely sampled at 25%. Intermolecular distance restraints were obtained from a 3D-13C-filtered/13C-separated NOESY experiment acquired at the laboratory of Lewis Kay (Univ. Toronto) on a Varian Inova 600 MHz instrument equipped with a room temperature probe.

Structure determination

NMR spectra were analyzed with CCPN Analysis 2.5236. Backbone dihedral angles were predicted from chemical shifts with TALOS37. An initial ensemble of 100 structures sorted by the lowest number of NOE violations were calculated from a set of 10,000 structures using CYANA 338. The ensemble was refined using Rosetta build 2021.16.61629 with distance, angle and hydrogen bond restraints converted to Rosetta .cst format. Symmetry was enforced throughout the calculation. Details of the Rosetta refinement method have been published39. The top 20 refined structure solutions that satisfied the experimental restraints formed the final ensemble. Structural quality was assessed with PSVS40 and PROCHECK41.

Bioinformatics

The structures of dimeric T4 Dmd (PDB: 5I8J) and the λ Ea22 CTD were superimposed with Superpose23 to identify analogous amino acids on Ea22 that would interact E. coli RnlA (PDB:6Y2P), LsoA (PDB:5HY3). The UniRef100 database42 was searched with mmseqs243 for homologs to the λ Ea22 CTD. A few of the identified sequences in the initial set that were truncated beyond the minimally folded domain were removed. The dataset was realigned in AliView44 and exported as FASTA formatted list to WebLogo345. The exo-xis regions of two phage genomes and a set of previously published prophage genomes and were mapped with TBLASTN using λ Exo, Xis, Orf60a, Orf61, Orf63, Orf55, and Ea22 proteins as query sequences. Identified genes were manually inspected and presented SnapGene (www.snapgene.com). AlphaFold Multimer was used to predict the dimeric structure of full-length λ Ea22 and one representative of each variant class (www.github.com/sokrypton/ColabFold). A final full-length λ Ea22 dimer was constructed by using SWISS-MODEL46 to graft the N-terminal and coiled-coiled domains from the AlphaFold Multimer prediction to the NMR structure of the λ Ea22 CTD. Coiled-coiled region predictions were also made with CoCoPred47. Electrostatic analysis of the λ Ea22 CTD structure and models of CTDs from other Ea22 variants was performed with APBS48 and visualized with PyMOL (www.pymol.org).