Introduction

Pathogenic bacteria possess a number of different secretion systems that facilitate host infection as well as interbacterial competition1. One of these is the type III secretion system (T3SS), which is found in a number of different Gram-negative pathogens and is key to the ability of these microbes to cause disease2,3,4. Broadly, T3SS comprise two elements: a highly conserved multiprotein structural complex that forms the conduit between the bacterial and the host cell; and various effector proteins that are translocated through this channel. Genes encoding the T3SS channel, or needle complex, are contained within pathogenicity islands comprised of a single cluster of genes5,6. Genes encoding effectors are more widely spread within the genome and vary greatly between different bacterial species.

Certain strains of Escherichia coli possess a well-defined T3SS, notably enteropathogenic E. coli (EPEC) and enterohemorrhagic E. coli (EHEC). This T3SS is encoded on the locus of enterocyte effacement (LEE) and, in concert with its secreted effectors, produces the characteristic attaching and effacing lesions that mediate close attachment of the pathogen to the intestinal epithelial wall7. Whole genome sequencing of strains of EHEC revealed the presence of a putative additional T3SS8,9, which has been termed E. coli T3SS 2 (ETT2). The gene cluster encoding this additional T3SS shows significant homology to the SPI-1 T3SS of Salmonella serotype Typhimurium10,11. However, unlike the LEE, the T3SS first described in EHEC and EPEC, there do not appear to be any putative effector proteins encoded within ETT2, and there are also some differences in the structural genes present11. Compared to the SPI-1 T3SS of S. Typhimurium, the ETT2 apparently lacks homologues of genes encoding the needle tip complex, SipBCD. Further studies attempted to delineate the frequency with which this ETT2 locus was found in different E. coli strains12,13,14. However, a study by Ren et al.15 showed that although the ETT2 locus was present in many lineages of E. coli, it had undergone extensive mutational attrition. The phylogenetic analysis showed that ETT2 was absent from phylogroup B2, which is thought to be the oldest phylogroup of E. coli16,17 and contains many uropathogenic E. coli, but had been acquired by the time of divergence of the next oldest phylogroup, D. Analysis showed multiple inactivating mutations were present within the locus, which would render the T3SS functionless, including in the ETT2 locus of the EHEC O157 strains in which it was originally described. However, a complete and potentially fully functional ETT2 was found in the enteroaggregative E. coli 042 (EAEC 042) strain; other E. coli strains analysed either had no ETT2 locus, or it had undergone extensive deletion and/or mutational inactivation. Ren et al. also showed that E. coli strains with the most intact ETT2 locus also carried an additional T3SS-like island adjacent to the selC tRNA gene, the eip locus, which encoded homologues of translocated proteins from the Salmonella pathogenicity island 1 (SPI-1) T3SS, as well as genes encoding a transcriptional regulator (eilA), a chaperone (eicA) and an outer membrane invasion/intimin-like protein (eaeX)15,18.

The functional effects of ETT2 remain unclear. Mutation of the ETT2 cluster in an avian pathogenic E. coli reduced virulence, even though the cluster had undergone mutational attrition and could not encode a functional T3SS, suggesting potential alternative roles in pathogenesis19. Other studies have also suggested a role for proteins encoded in the ETT2 in virulence of avian pathogenic E. coli and K1 strains20,21,22. A recent study18 examined the role of the putative transcriptional regulator gene eilA at the selC locus in EAEC strain 042. This demonstrated that eilA was responsible for regulating transcription of genes within the selC locus, as well as eivF and eivA within the ETT2 locus. Mutants lacking eilA were less adherent to epithelial cells and had reduced biofilm formation; this phenotype was also observed for mutants in the eaeX gene, which encodes the invasin/intimin homologue. This suggested important functional roles of the selC and ETT2 loci in pathogenesis of this strain of E. coli.

To date, there has been no evidence of an intact ETT2 in human pathogenic strains of E. coli other than a few strains of EAEC. However, given the findings described above, we hypothesised that ETT2 might be of importance in human infections caused by E. coli phylogroups other than B2. In particular, given the roles of T3SSs in attachment, invasion and immune evasion, we hypothesised that strains with an intact ETT2 might be found among invasive bloodstream isolates of extraintestinal pathogenic E. coli, where the ETT2 might have allowed the organism to overcome epithelial barriers and immune clearance. Thus, we set out to determine whether an intact ETT2 was present in a collection of invasive bloodstream isolates of E. coli. We studied 162 isolates of E. coli obtained from bacteremic patients in Scotland in 2013 and 2015, which we subjected to whole genome sequencing. Within this group, we identified 26 strains of E. coli sequence type (ST) 69, of phylogroup D, which were largely derived from community-acquired infections. Virtually all of these strains had completely intact ETT2 and selC loci, with no inactivating mutations. Similarly, intact ETT2/selC loci were also found in some minor STs in our collection. The eilA transcriptional regulator was functional in these strains. Analysis of E. coli strains with worldwide representation showed that ST69 strains from this wider collection also contained an intact ETT2. Our results show that an intact ETT2 locus is widely present in human pathogenic E. coli ST69 strains, suggesting a functional role for this cryptic T3SS in human disease caused by this sequence type.

Results

ETT2 locus within Scottish E. coli blood stream isolates

We have performed whole genome sequencing and analysis23 of 162 isolates of Escherichia coli obtained from blood cultures of patients within Scotland in 2013 and 2015. Sequence comparisons with other isolates of E. coli showed that strains belonging to ST69 contained an intact ETT2 locus. The gene content of this locus from one of these ST69 strains, ST69 1#9, was compared to the complete ETT2 found in enteroaggregative E. coli strain 042 (EAEC 042) and the degenerate ETT2 found in E. coli O157:H7 Sakai (Fig. 1). The ETT2 locus in this ST69 strain was found in the ~30 kb region spanning the yqeG gene and the tRNA gene glyU, with over 98% identity to the ETT2 locus in EAEC 042. Importantly, this locus did not contain any of the deletions, insertions or inactivating mutations found in the E. coli O157:H7 Sakai strain and thus was characterised as intact.

Figure 1

Comparisons of the ETT2 locus between EAEC 042, ST69 (1#9) and O157:H7 Sakai. Degree of identity is shown by the level of grey shading as indicated. Genes are colour coded according to putative function as shown.

We extended this analysis to compare all of the ST69 strains in our collection over this region. Of 26 ST69 genomes sequenced, 24 were assembled in one contig covering this region; these are shown compared to each other in Fig. 2. In all these assemblies, there was greater than 95% identity between the sequences and the reference ETT2 sequence in EAEC 042 (Table 1). Two strains appeared to lack the extreme left-hand end of the complete ETT2 locus (ECO#35 and EC1#2), and two strains had a stop codon in the epaO gene at the same site as noted for E. coli O157:H7 Sakai (EC1#70 and ECO1#18; gene highlighted in green); no other ST69 strains had any inactivating mutations.

Figure 2

Comparison of the ETT2 operon in 24 ST69 strains. Degree of identity is shown by the level of grey shading as indicated. Genes are colour coded according to putative function as shown. The epaO gene is shown in green.

Table 1 Similarities of length and identity between the ETT2 and eip loci in the strains indicated.

Next, we analysed other STs within our collection of bacteremic isolates for the presence of the ETT2 locus (Fig. 3). Four non-ST69 isolates contained an intact ETT2 region, belonging to ST405, 38, 362 and 349. BLAST percentage identity and length coverage of the ETT2 from these strains relative to EAEC 042 are shown in Table 2; all are closely related to ST69 (Supplementary Fig. S1). Other strains showed variable loss and/or degradation of the locus as previously described. Notably, none of the isolates of the common epidemic strain ST131 (phylogroup B2) contained any elements of this ETT2 region; one representative example is shown at the bottom of Fig. 3.

Figure 3

Comparison of elements of the ETT2 locus found in non-ST69 strains. Degree of identity is shown by the level of grey shading as indicated. Genes are colour coded according to putative function as shown. Genes unrelated to the ETT2 locus genes are coloured grey.

Table 2 Similarities of the ETT2 locus between the strains indicated.

selC/eip locus within Scottish blood culture isolates

Closely associated with an intact ETT2 region is a group of genes related to type III secretion effectors adjacent to the selC tRNA gene15,18. Two distinct genome insertions were noted at this site: selC-A and selC-B, as defined and described by Sheikh et al.18. selC-A contains mainly phage-related genes. selC-B contains homologues of putative type III secretion effectors (eipB, eipX and eipD), a putative type III effector chaperone, eicA, a transcriptional regulator, eilA, and a gene, eaeX, which encodes a large protein containing bacterial immunoglobulin repeats with homology to outer membrane adhesion/invasion protein intimin found in Yersinia spp. as well as intimins of invasive E. coli strains. A comparison of this region in representative ST69 and other strains with that in EAEC 042 is shown in Fig. 4. In EAEC 042, selC-A lies between an intact copy of the selC gene and a 21 bp direct repeat of the 3′ end of the selC tRNA gene. Three backbone genes then intervene (setC, yicL, nlpA) before the selC-B region. All ST69 strains in our collection contained the selC-B locus with over 95% identity to the EAEC 042 region (Table 1). The variations were found within the central domain of the EaeX product, which contains the bacterial immunoglobulin (Big) repeats, with variation in the number of repeats contained within this domain. A similar region was also found in non-ST69 isolates; one ST59 strain and one ST349 strain also possessed the ETT2 locus. The major differences in the number of Big repeats between the strains are shown in Supplementary Fig. S2. Domain analysis with ScanProsite also identified an N-terminal LysM domain, a module that recognizes polysaccharides containing N-acetylglucosamine (GlcNAc) residues, including peptidoglycan24. 
The Big repeat number was conserved within the ST69 strains suggesting that once the eaeX gene was acquired within this strain it has been maintained; there are too few isolates with the eaeX gene from other STs to be able to comment on its conservation or otherwise in these groups. As with the ETT2 locus, the selC-B region was entirely missing in ST131 isolates. The selC-A region was largely absent from our isolates but was partially present in one of the ST69 isolates (ECO#72, Fig. 4).

Figure 4

Comparison of the selC operon in different strains. Degree of identity is shown by the level of grey shading as indicated. Genes are colour coded according to putative function as shown. *Shows the position of a frameshift mutation in the eilA gene of sample 1#47 (ST59).

EilA has been shown to regulate genes within the selC-B region as well as the ETT2 island adjacent to the tRNA glyU gene18. We wished to determine whether we could define conditions under which eilA was transcriptionally active and hence potentially activating the ETT2 island. We constructed a reporter gene containing 500 bp of upstream sequence from the eilA gene found in the neonatal meningitis-associated E. coli CE10 strain25. Analysis of this region in strain EC1#2, used for the detailed reporter expression studies, showed 96.4% identity with the same region in CE10, with perfect conservation of putative binding sites for purR, fnr, argr2, argR and a 7/8 nucleotide match to a putative site for rpoS17. Using this reporter in five of our ST69 isolates containing the ETT2 locus, we could readily detect reporter gene activity that peaked in the late log phase of growth in equal parts LB and Dulbecco’s Modified Eagle’s Medium (LB:DMEM media) (Fig. 5A,B). Previous studies of transcriptional activation of the LEE have shown this is maximal in less rich media designed for growth of eukaryotic cells, such as DMEM, compared to the rich medium LB26,27. Following optimization of growth in different media, we compared transcriptional activity of the eilA reporter construct in an ST69 strain grown in LB alone with that in the 1:1 mixture of LB and DMEM (Fig. 5C,D). Growth in the different media was not significantly different, but induction of the promoter was much more marked in the LB:DMEM mix. Transcription of eilA and two other putatively co-regulated genes was confirmed using qPCR at one time point; however, detection was at the limits of detectability and there was no significant difference between transcript levels in bacteria grown in LB or LB:DMEM (Supplementary Fig. S3). Given the short half-life of bacterial mRNAs, of the order of 2–10 minutes28, we feel the reporter assay is a more sensitive and accurate measurement of eilA promoter activity. 
In an attempt to identify proteins potentially secreted into the growth media by ETT2, we compared the pattern of secreted proteins from an ST69 strain with intact ETT2 between the two different media but we did not identify any putative T3SS secreted proteins or secreted components of the T3SS structural domains (data not shown).

Figure 5

Activity of the eilA reporter in different strains and media. (A,B) Graphs show growth (Optical Density, panels A) and reporter activity (GFP fluorescence, panels B) at the times indicated. The strains are: EC1#2 (A), EC1#19 (B), EC1#5 (C), EC1#21 (D), and EC1#9 (E), all grown in LB:DMEM mixture. Each point is the mean of a triplicate determination; error bars (sem) are contained within the points. (C,D) strain EC1#2 is grown in the different media as indicated.

Presence of ETT2 and selC/eip locus within worldwide collection of E. coli

In order to ascertain whether the intact ETT2 and selC/Eip loci within ST69 strains were specific to Scotland or more widespread, we analysed the genomes of E. coli available from public repositories with worldwide representation. We identified 269 strains with full sequence data (Supplementary Table S1). The distribution of STs within this group compared to those within the Scottish blood culture isolates is shown in Supplementary Fig. S4. The major STs within both groups are very similar: ST131, ST69, ST73, ST95, ST12 and ST127. Analysis of the length conservation of the ETT2 locus within these sequences is shown in Fig. 6 for both the local and the global sequences. Of 26 ST69 sequences within the global collection (Fig. 6B), 22 had a 98.4% length identity to the reference ETT2 locus in the EAEC 042 strain, two strains had a 95.1% match, and one had an 87.9% match; one ST69 strain had lost virtually all of the locus (2.0% length identity). Among the non-ST69 strains that showed >95% conservation of the ETT2 locus, no ST was represented by more than 4 members. Two ST38 strains were included in this group; ST38 was also found within our collection of Scottish bacteremic strains with high conservation of the ETT2 locus. The length conservation of the selC/Eip locus (over the selC-B region) for the local and global E. coli strains is shown in Fig. 6C,D. Nineteen of the 26 ST69 strains (73%) had >60% length conservation compared to the reference EAEC 042 strain. The major differences in length between the different strains and the EAEC reference were in the eaeX gene, which contains different numbers of the bacterial immunoglobulin-like repeats. As for the ETT2 locus, the ST131 strains did not contain any of the selC-B locus.

Figure 6

Length conservation of the ETT2 and selC/Eip loci in different strains of E. coli compared to the reference strain, EAEC 042. Data from the local Scottish strains (panels A and C) and global data (panels B and D) are shown. (A,B) are the comparisons for the ETT2 locus and (C,D) are for the selC/Eip locus. STs with fewer than 4 representatives are classed as Other in panels A, B, and D.

Discussion

We report here the presence of genomic regions encoding ETT2 and associated putative T3SS effectors within E. coli ST69 isolates from bacteremic patients within Scotland. In virtually all of the isolates, the two regions encoding these proteins contained a full complement of genes with no deletions, insertions or inactivating mutations, suggesting that the ETT2 and associated effectors could be functionally active. This is in contrast to the vast majority of ETT2 sequences reported to date, which have undergone significant mutational attrition. The conserved nature of the ETT2 sequences reported here strongly suggests that there has been selection pressure for these regions to be conserved within the ST69 lineage.

ST69 belongs to phylogroup D of the E. coli lineage. We did not detect ETT2 in E. coli of ST131, which is phylogroup B2. Although the picture is not completely clear, our data are in agreement with the origin of the different phylogroups as discussed by Ren et al.15, who suggest that ETT2 is not present in the ancestral B2 phylogroup but was acquired at some point in the evolution of the D group. Feature frequency profiling also supports the view that B2 is the ancestral group, with phylogroup D diverging thereafter16. Subsequent lineages show significant mutational attrition of the ETT2 locus, although our data show strong conservation in the isolates of ST69 studied here. ST69 is one of the common STs found in bloodstream isolates of E. coli. In our collection, ST69 was mostly found in infections acquired from the community23. The natural environment of these human pathogenic E. coli is the gastrointestinal tract; passage into blood is predominantly through ascending infection into the bladder and renal tract. Evolutionary pressure to retain ETT2 might therefore have arisen through its ability to provide a selective advantage in gut colonization and/or in infection of the renal tract. Importantly, we also found a high degree of conservation of the ETT2 and selC/Eip loci in E. coli strains from global collections, showing that the preservation of these regions is not confined to local Scottish strains.

We noted that two strains had a stop codon in the epaO gene at the same site as noted for E. coli O157:H7 Sakai (EC1#70 and ECO1#18). epaO is homologous to the Salmonella Typhimurium T3SS gene, spaO13, which encodes a protein that forms part of the cytoplasmic sorting platform essential for energizing and sorting substrates for delivery to the needle complex29. spaO is essential for type III secretion in S. Typhimurium30. Recent work has shown that spaO produces two protein products by tandem translation: a full-length protein and a shorter C terminal portion that is translated from an internal ribosome binding site and alternative initiator codon31. Both are needed for functionality of the T3SS in S. Typhimurium, so the loss of the full-length product of epaO will likely also render the ETT2 non-functional.

The functional effects of ETT2 remain obscure. In strains with a disrupted ETT2, genetic deletion does seem to confer a changed phenotype, with defective invasion and survival within brain microvascular endothelial cells22; this suggests that even these apparently non-functional regions have a pathogenic role or can be complemented by other gene products. Additionally, experiments in avian strains with ETT2 also suggest a functional role for the ETT2 in pathogenesis20. ETT2 has also been implicated in the control of gene expression from the locus of enterocyte effacement in enterohemorrhagic E. coli O15732. We could not identify any putative secreted ETT2 substrates from the ST69 strains reported here. A recent study of E. coli serotype O2 that causes avian colibacillosis also failed to identify potential ETT2 secreted proteins, but did find that the intact ETT2 mediated expression and secretion of flagellar proteins, as well as other changes in cell surface behaviour33. It may be that the conditions under which the ETT2 mediates secretion have not been identified, or that it carries out different functions.

In summary, we show here that the ST69 strain of human pathogenic E. coli has an intact genetic locus for ETT2 and associated proteins. The preservation of these sequences in the ST69 strain suggests that the locus might confer a significant selective advantage. However, its exact functional effects remain obscure.

Methods

Sequencing and genome analysis

A total of 162 strains of E. coli from human clinical samples were collected and subjected to whole genome sequencing as previously described23. The mean Phred score of the reads was 34.57 (99.9% base call accuracy), the mean N50 of the assemblies was 355,277, and the mean number of contigs assembled per sample was 62.6. The data for the individual samples are shown in Supplementary Table S2.
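
As a rough illustration of how the summary statistics above are defined, the following Python sketch converts a Phred score to the implied base-call accuracy and computes N50 from a list of contig lengths. The function names and the example contig lengths are hypothetical and are not taken from the pipeline used in this study.

```python
def phred_to_accuracy(q):
    """Base-call accuracy implied by a Phred score q, where q = -10 * log10(P_error)."""
    return 1.0 - 10 ** (-q / 10.0)


def n50(contig_lengths):
    """Length L such that contigs of length >= L contain at least half of the assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


# Hypothetical contig lengths, for illustration only.
example_contigs = [500_000, 300_000, 120_000, 50_000, 30_000]
print(phred_to_accuracy(34.57))
print(n50(example_contigs))
```

A mean Phred score above 30 corresponds to an error probability below 1 in 1,000, which is consistent with the 99.9% accuracy quoted above.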

For pangenome analysis, Illumina reads were assembled using the de novo assembler SPAdes34. After filtering out contigs less than 100 bp long, genomes were annotated for genus Escherichia using PROKKA35 with default parameters. Annotated genomes were then studied using the pan-genome pipeline Roary36, with a minimum blastp identity of 95% and the percentage of isolates required for a gene to be in the core genome set to 99%. The presence and absence of genes in the accessory genome (>5% and <99% of isolates) was used to create the binary tree. The ETT2 locus genes were identified using BLAST against the ETT2 region of the EAEC 042 strain.
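
The core and accessory thresholds described above can be sketched in Python as follows. This is a minimal illustration that assumes a simple gene-by-isolate presence/absence table; the gene names, the table layout and the "rare" label for genes in 5% or fewer of isolates are assumptions made here, not part of the Roary output format or of the analysis itself.

```python
def classify_genes(presence, core_fraction=0.99, accessory_floor=0.05):
    """Label each gene by the fraction of isolates that carry it.

    presence: dict mapping gene name -> list of 0/1 flags, one per isolate.
    Genes in >= core_fraction of isolates are 'core'; genes in more than
    accessory_floor but fewer than core_fraction are 'accessory'.
    """
    labels = {}
    for gene, flags in presence.items():
        fraction = sum(flags) / len(flags)
        if fraction >= core_fraction:
            labels[gene] = "core"
        elif fraction > accessory_floor:
            labels[gene] = "accessory"
        else:
            labels[gene] = "rare"  # hypothetical label for genes below the 5% floor
    return labels


# Hypothetical table for 100 isolates, for illustration only.
example = {
    "geneA": [1] * 100,             # present in every isolate
    "geneB": [1] * 40 + [0] * 60,   # present in 40% of isolates
    "geneC": [1] * 3 + [0] * 97,    # present in 3% of isolates
}
labels = classify_genes(example)
print(labels)
```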

Comparisons between selected sequences were made and visualised using Easyfig37. Multilocus sequence typing (MLST) was performed in silico using the Achtman profile in BIGSdb38.

Identification of the ETT2 and eip loci was performed using BLAST. Coverage and sequence percentage identities to the reference genome of EAEC 042 are shown in Table 1. The ETT2 locus was termed intact if it contained no deletions, insertions or inactivating mutations.
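
The text does not state exactly how length coverage was derived from the BLAST results. One plausible approach, sketched below in Python, is to merge the reference coordinates of all hits and express the covered bases as a percentage of the reference locus length. The function names, the hit coordinates and the 30,000 bp reference length are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def length_coverage(hits, reference_length):
    """Percentage of the reference covered by at least one hit.

    hits: list of (ref_start, ref_end) pairs; reversed pairs (hits on the
    opposite strand) are normalised before merging.
    """
    normalised = [(min(s, e), max(s, e)) for s, e in hits]
    covered = sum(end - start + 1 for start, end in merge_intervals(normalised))
    return 100.0 * covered / reference_length


# Hypothetical hits against a 30,000 bp reference locus.
example_hits = [(1, 10_000), (9_000, 20_000), (29_500, 25_000)]
coverage = length_coverage(example_hits, 30_000)
print(round(coverage, 2))
```

Merging before summing prevents overlapping hits from being counted twice, so the value cannot exceed 100% as long as the hits lie within the reference.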

Maximum likelihood trees of the sequences shown in Fig. S1 were constructed using RAxML39 on core genes (2804 genes) using a Generalised Time Reversible Gamma model (-m set to GTRGAMMA), with the algorithm set to rapid bootstrap analysis and search for the best-scoring ML tree in one program run (flag -f set to a), the number of bootstraps set to 100, and the seed values in flags -p and -x set to 12345.
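
For readers who wish to reproduce a comparable run, the settings listed above can be assembled into a command line as in the Python sketch below. The flags -f, -m, -p and -x and their values are those stated in the text; the executable name (raxmlHPC), the use of -# for the number of bootstraps, and the input and run names passed to -s and -n are assumptions, since the text does not give them. The sketch only builds and prints the command and does not run it.

```python
import shlex


def raxml_command(alignment, run_name, executable="raxmlHPC"):
    """Build an argument list for a rapid-bootstrap plus best-tree RAxML run."""
    return [
        executable,
        "-f", "a",          # rapid bootstrap analysis and best-scoring ML tree search
        "-m", "GTRGAMMA",   # Generalised Time Reversible model with Gamma rate heterogeneity
        "-p", "12345",      # random seed for parsimony inferences
        "-x", "12345",      # random seed for rapid bootstrapping
        "-#", "100",        # number of bootstrap replicates
        "-s", alignment,    # input alignment (hypothetical file name below)
        "-n", run_name,     # suffix for output files (hypothetical name below)
    ]


cmd = raxml_command("core_gene_alignment.aln", "core_tree")
print(shlex.join(cmd))
```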

Global representative sequences filtered as bloodstream isolates were downloaded from the EnteroBase archive40 with accession numbers set out in Table S1. They were from diverse geographical locations and patients as indicated in available metadata.

Domain analysis of the EaeX protein

Domain identification within the EaeX protein was determined using ScanProsite and the PROSITE data base41.

Growth and eilA reporter assay

Growth media used in this study were DMEM (Invitrogen, UK), LB, and a 1:1 mix of LB with DMEM. The eilA reporter construct contains a ~500 bp fragment from upstream of the eilA gene of the CE10 strain of E. coli (accession number NC_017646) that was cloned into a plasmid (pAJR70) used in a previous study for the assessment of transcription of ETT1 operons by enhanced green fluorescent protein (GFP) monitoring from liquid culture42. The different bacterial strains were transformed with this plasmid using standard methods. Chloramphenicol (25 µg/ml) was added to media when required for the selection of strains containing the eilA reporter. Induction of GFP in the different media at 37 °C was measured using a fluorescence plate-reader (FLUOstar Optima; BMG Labtech, UK). Optical densities and fluorescence were recorded every 24 minutes for 9 hours. Measurements from bacteria transformed with the promoterless pAJR70 showed that no signal was produced above the fluorescence of bacteria alone, which was subtracted from all readings.
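
The reading schedule and background correction described above can be illustrated with a short Python sketch. It assumes that the first reading is taken at time zero and that the background series (bacteria alone) is measured at the same time points as the reporter series; the fluorescence values and function names are hypothetical.

```python
def reading_times(interval_min=24, duration_h=9):
    """Time points in minutes, assuming the first reading is taken at time zero."""
    return list(range(0, duration_h * 60 + 1, interval_min))


def subtract_background(reporter, bacteria_alone):
    """Subtract the fluorescence of bacteria alone from each reporter reading."""
    if len(reporter) != len(bacteria_alone):
        raise ValueError("series must contain the same number of readings")
    return [r - b for r, b in zip(reporter, bacteria_alone)]


# Hypothetical fluorescence values (arbitrary units), for illustration only.
reporter = [120.0, 180.0, 400.0]
bacteria_alone = [100.0, 110.0, 130.0]
corrected = subtract_background(reporter, bacteria_alone)
times = reading_times()
print(len(times), times[-1])
print(corrected)
```

Under these assumptions a 9-hour run yields 23 readings, the last at 528 minutes.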

Type III secretion assay

Secreted proteins were extracted by trichloroacetic acid precipitation performed as previously described43. Briefly, overnight LB cultures were diluted 1/100 in 50 ml of the culture media and grown for 9 hours before precipitation of secreted proteins. Secreted proteins were resuspended in 150 µl of loading buffer and analysed by SDS-PAGE.

Quantitative PCR

Bacteria were grown to late-log phase in the indicated media and harvested into RNAprotect (Qiagen) according to the manufacturer’s guidelines. RNA was extracted using an RNeasy kit (Qiagen) according to the manufacturer’s guidelines. Contaminating DNA was removed using TURBO DNase (Ambion) followed by phenol-chloroform extraction and ethanol precipitation. RNA was reverse transcribed and quantitative PCR performed using PowerUp SYBR Green master mix (Applied Biosystems) according to the manufacturer’s protocol. Specific primers used are shown in Supplementary Table S3. Amplification was performed using a 7500 series real-time PCR system (Applied Biosystems). Quantification of results was performed using the ΔΔCt method of Livak and Schmittgen, using gapA as a reference gene44. Results were expressed as fold induction in LB:DMEM relative to the level in LB alone.
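
The ΔΔCt calculation can be sketched in Python as below. The sketch uses the standard 2^(-ΔΔCt) form, which assumes approximately 100% amplification efficiency for both target and reference genes; the Ct values are hypothetical and are not data from this study.

```python
def fold_induction(ct_target_test, ct_ref_test, ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    'test' is the condition of interest (here LB:DMEM) and 'control' is the
    baseline condition (here LB alone); 'ref' is the reference gene (gapA).
    """
    delta_test = ct_target_test - ct_ref_test
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_test - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values, for illustration only.
example = fold_induction(
    ct_target_test=28.0, ct_ref_test=18.0,        # LB:DMEM
    ct_target_control=30.0, ct_ref_control=18.0,  # LB alone
)
print(example)
```

A value above 1 indicates higher normalised transcript levels in LB:DMEM than in LB alone, and a value of 1 indicates no difference.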

Ethical approval

Advice was sought from the Local Research Ethics Committee of Greater Glasgow and Clyde NHS Board. Specific ethical permission was deemed not to be required as the study was viewed as service improvement. Approval for access to clinical patient data was given by the Caldicott Guardian of the relevant health boards, who is the designated regulator of confidential patient information within NHS Scotland.