Phylogenomic approaches to determine the zoonotic potential of Shiga toxin-producing Escherichia coli (STEC) isolated from Zambian dairy cattle

This study assessed the prevalence and zoonotic potential of Shiga toxin-producing Escherichia coli (STEC) sampled from 104 dairy units in the central region of Zambia and compared these with isolates from patients presenting with diarrhoea in the same region. A subset of 297 E. coli strains were sequenced allowing in silico analyses of phylo- and sero-groups. The majority of the bovine strains clustered in the B1 ‘commensal’ phylogroup (67%) and included a diverse array of serogroups. 11% (41/371) of the isolates from Zambian dairy cattle contained Shiga toxin genes (stx) while none (0/73) of the human isolates were positive. While the toxicity of a subset of these isolates was demonstrated, none of the randomly selected STEC belonged to key serogroups associated with human disease and none encoded a type 3 secretion system synonymous with typical enterohaemorrhagic strains. Positive selection for E. coli O157:H7 across the farms identified only one positive isolate again indicating this serotype is rare in these animals. In summary, while Stx-encoding E. coli strains are common in this dairy population, the majority of these strains are unlikely to cause disease in humans. However, the threat remains of the emergence of strains virulent to humans from this reservoir.

This study assessed the prevalence and zoonotic potential of Shiga toxin-producing Escherichia coli (STEC) sampled from 104 dairy units in the central region of Zambia and compared these with isolates from patients presenting with diarrhoea in the same region. A subset of 297 E. coli strains were sequenced allowing in silico analyses of phylo-and sero-groups. The majority of the bovine strains clustered in the B1 'commensal' phylogroup (67%) and included a diverse array of serogroups. 11% (41/371) of the isolates from Zambian dairy cattle contained Shiga toxin genes (stx) while none (0/73) of the human isolates were positive. While the toxicity of a subset of these isolates was demonstrated, none of the randomly selected STEC belonged to key serogroups associated with human disease and none encoded a type 3 secretion system synonymous with typical enterohaemorrhagic strains. Positive selection for E. coli O157:H7 across the farms identified only one positive isolate again indicating this serotype is rare in these animals. In summary, while Stx-encoding E. coli strains are common in this dairy population, the majority of these strains are unlikely to cause disease in humans. However, the threat remains of the emergence of strains virulent to humans from this reservoir.
Shiga toxigenic Escherichia coli (STEC) are emerging pathogens of public health concern worldwide, including in Europe, North and South America and Asia 1,2 . Ruminants, in particular cattle, have been identified as the predominant reservoir of STEC 3,4 , indicating that the bacteriophage-encoded Shiga toxins (Stx) are likely to confer an advantage to E. coli in these host animals. In Africa there is little information on the epidemiology of STEC in livestock systems and their impact on human health 1 . It is evident that only a subset of STEC are a serious threat to human health, these enterohaemorrhagic E. coli (EHEC) are associated with specific serogroups in particular the seven that have been defined as adulterants in beef production in the USA, O157, O26, O111, O45, O145, O103, O121 5,6 . Similar serotypes, especially O157 & O26 are also an issue in Europe. Typical EHEC strains can be further characterised by possession of a type 3 secretion system (T3SS) that enables colonisation of the gastrointestinal tract 7 . EHEC infections in humans are associated with diarrhoea and bloody diarrhoea, with the more serious sequelae of kidney and brain damage due to activity of Stx on the microvasculature in these organs 4,8 .
The cost of whole genome sequencing (WGS) has drastically reduced and it is now possible to sequence large numbers of isolates and use bioinformatics approaches to extract strain relatedness and gene carriage data. For E. coli, strains have classically been assigned into phylogroups that provide a good correlation with commensal versus pathogenic strains 9 . The phylogroups are based on particular combinations of specific genes and can be assigned from the WGS. Whole genome core SNP analysis to define strain relatedness is now commonly used and provides greater resolution than MLST 10 . In addition the serogroup, the O-chain of LPS, can also be inferred from their genetic determinants using WGS. The WGS of any strain collection is therefore a valuable resource allowing relatively rapid comparison of phylogeny and pathogenic potential.
Both small holding and large-scale dairy farming is important to the economic survival of communities in many developing nations, including Zambia 11,12 . As such, it is important to understand if practices on these units and their products may represent a threat to human health and where such risks exist suggest possible mitigation measures. A study has recently been carried out to sample E. coli strains from cattle across small, medium and large-scale (commercial) dairy farms in central Zambia, with the primary aim of understanding antibiotic use and antimicrobial resistance patterns in this sector 13 .
These isolates have now been further analysed in the present study for stx prevalence and any association with the farming system. In addition, E. coli isolates from patients with diarrhoea were also screened and sequenced to determine evidence of relationship to the bovine strains.

Results
Detection of Shiga toxin alleles (stx1 and stx2) in isolates from Zambian dairy cattle and humans. Eleven percent (41/371) of the bovine E. coli isolates were positive for the presence of Shiga toxin genes as defined by detection of appropriately sized PCR amplification products using an established stx multiplex assay 14 . Based on this, both stx1 and stx2 were detected in 54% (22/41) of the STEC, while 37% (15/41) had stx2 only and 10% (4/41) had stx1 only. Using this data, the overall adjusted prevalence of STEC across the different farming scales for the central Zambian study area can be estimated at 6% (95% CI: 2.5-10.2). The adjusted prevalence per farming scale was higher in medium-scale 17.1% (95% CI: 5.9-28.2) and small-scale 10.6% (95% CI: 6.6-14.5) farms when compared to the commercial farms 2.8% (95% CI: 0.3-6.0). Based on these ranges, there is a significant difference in estimated prevalence between the small and commercial scales. Logistic regression indicates that medium-and small-scale farming are significant risk factors for Shiga toxin producing E. coli (STEC) with commercial as a reference ( Table 1). Out of the 73 E. coli isolates from human patients with diarrhoea for which good quality sequence information was generated, no Shiga toxin genes were detected.
As an additional investigation, the enrichment cultures for all the animals (n = 371) were streaked onto sorbitol MacConkey agar plates and any non-sorbitol fermenting colonies tested for O157 agglutination. Only one animal yielded a positive strain (ZB-2213N0194) and this was then added to the study.

Phylogenetics.
In order to understand the genetic backgrounds of the STEC strains isolated in this study, including their potential threat to human health, their relationship to other human disease-associated EHEC were tested by phylogenetic methods. The WGS of 297 of the Zambian isolates (224 bovine and 73 human) were determined. This included 41 STEC, 37 of the 41 defined as stx+ by PCR from main study, three STEC strains from a pilot study and the single positively selected E. coli O157 strain. These were compared with 262 E. coli sequences from human, cattle, avian and canine hosts; one hundred and twenty nine strains in this second collection were human clinical STEC isolates (see Supplementary Table 1).
Alignment to a reference genome (E. coli O157:H7 str. Sakai, RefSeq assembly accession: GCF_000008865) resulted in 715,632 core positions with 68,327 single nucleotide polymorphism (SNPs) across all 559 sequences. A maximum likelihood phylogeny revealed the population structure of the E. coli strains (Fig. 1). While there was no clear clustering of the strains based on geographical location or host, there was, as anticipated, good correlation with established E. coli phylogroups, with only minor discordance. All possible phylogroups and cryptic clades were identified, however the majority of the E. coli strains (97%) were distributed across 5 phylogroups (Fig. 1).
Serotyping. The majority of EHEC strains that are a threat to human health are associated with 7 specific serogroups. A bioinformatic approach was used to serotype the sequenced strains. H typing was possible for 550/559 strains and O-typing for 483/559 (summarized in Supplementary Tables 3 and 4). Failure to detect specific genes in some of the strains was most likely due to assembly issues with short read sequences.
With the exception of the positively selected E. coli O157 strain, none of the Zambian bovine STEC strains (0/40) were assigned to any of these seven serogroups. In fact, only 3 strains across the whole set of cattle isolates could be assigned within these serogroups (ZB-244; serogroup O45 and ZB-2213N0112; serogroup O111 and ZB-2213N0194; serogroup O157). Overall, the Zambian strains (cattle and human) exhibited an extensive array

Toxicity analysis of stx+ strains.
To determine if the genotypically positive stx strains were able to express Stx, eighteen of the bovine STEC were examined for Vero cell cytotoxicity with and without mitomycin C (MMC) induction. 89% (16/18) of the MMC-induced STEC strains had a cytotoxic effect on Vero cells (Fig. 3). These samples were verified as Stx positive using a commercial ELISA (Fig. 3), with only one strain (4) exhibiting toxicity on Vero cells without any detection of Stx by ELISA.
Shiga toxin subtyping. Forty one STEC positive strains were included in the WGS analysis (Fig. 2) and from this stx alleles could be further subtyped using a published BLAST-based methodology 15 (Supplementary  Table 5). It was evident that the most cytotoxic strains (Fig. 3) were those encoding Stx2a often in combination with Stx1a, in line with studies of cytotoxicity and pathology induced by enterohaemorrhagic strains with different Stx variants 16,17 .
Stx association with type 3 secretion and enteroaggregative virulence factors. Typical enterohaemorrhagic E. coli strains are defined by the co-association of stx genes with a type 3 secretion system (T3SS) 7 .
In the present study, the presence of a T3SS was determined by detection of both eae and sepL. Based on BLAST analysis, 3.6% (8/224) and 2.7% (2/73) of the Zambian bovine and human isolates respectively may encode a T3SS (Fig. 2). Excluding the positively selected O157 strain, neither intimin (eae) nor sepL were detected in the bovine STEC. Other non-STEC but intimin positive strains were present within the Zambian strains analysed and some were in close proximity to clinical human STEC strains (Fig. 2). The cattle strains were also checked for the presence of the enteroaggregative E. coli (EAEC) adherence factors AggR and AA probe by PCR but all were negative indicating that these dairy cattle are not a common reservoir of enteroaggregative E. coli as associated with the atypical EHEC O104 outbreak in Northern Germany in 2011.

Discussion
Shiga toxins (Stx) can pose a serious threat to human health, and human infections are usually restricted to a subset of serogroups that can express Stx and a type 3 secretion system (T3SS) or other adherence mechanisms that can facilitate colonisation of the human gastrointestinal tract. In this study, E. coli isolates were obtained from cattle associated with dairy production in a region of central Zambia. A total of 371 isolates, each from an individual animal, covering 104 farms were tested for the presence of stx by PCR. Of these, 41 (11%) were positive. This gives an estimated prevalence (taking into account sampling and the study design effect) of 6% (95% CI: 2.5-10.2). To our knowledge, this is one of the first surveys to systematically analyse the proportion of random E. coli from a farm animal source that are positive for stx as other studies usually use positive selection methods from animals. Our survey does indicate that STEC are common in these dairy cattle. It was also evident that the small and medium sized production units had a higher prevalence of STEC than the commercial units sampled. This is of interest as it does indicate that management practices potentially influence the selection of STEC. On-going work will investigate these differences including the influence of breed which can differ between the farming scales 13 and/or diet which can change EHEC O157 prevalence 18 . Our study also examined 73 E. coli strains from human patients with diarrhoea presenting at the University Teaching Hospital (UTH) in Lusaka over the same time period. While the human sample numbers were relatively low, stx was not detected, indicating that STEC are unlikely to be common in the local human population. The overall phylogenetic analyses of the strains into phylogroups was as anticipated, with the majority of the Zambian bovine strains being present in the B1 group associated more with commensal strains, although this cluster contains non-O157 EHEC serotypes causing infections in humans that likely originate from cattle. A greater proportion of the human Zambian isolates clustered within the phylogroups associated with human disease, reflecting that the strains were collected from patients with diarrhoea and in some cases the strain may be the etiological agent.
Based on bioinformatic analyses of WGS, there was marked diversity of serotypes in the cattle and human sample populations ( Fig. 1 and Supplementary Tables 3 and 4). None of the randomly selected bovine STEC were allocated to a serogroup commonly associated with human EHEC infections. Furthermore, none of these bovine STEC encoded a T3SS based on detection of intimin (eae) and sepL alleles (Fig. 2), as both genes are present on the locus of enterocyte effacement that encodes the system. An additional study to positively select for E. coli O157 isolates from the faecal pat enrichments only identified one positive sample from a farm. Taken together, this study indicates that while STEC are common in the Zambian dairy cattle these strains would not be classified as EHEC and are unlikely to be associated with serious human disease.
While it is encouraging that EHEC strains were extremely rare, many of the supernatants from the STEC strains were cytotoxic and appropriate backgrounds for EHEC emergence are present. As such, we should remain vigilant in case Stx-encoding prophages from this reservoir do emerge in other strain backgrounds that have a higher capacity to cause disease in the human host. Continuing work on factors driving the maintenance of STEC strains in the bovine host will hopefully clarify approaches to reducing the threat from this emerging group of pathogens.

Methods
Bovine and human isolates from Zambia. Bovine E. coli isolates (n = 371) were collected as part of a previously published study investigating antimicrobial resistance 13 . In addition, a further 81 E. coli isolates from cattle were collected as part of a pilot study in 2013 in the same region 13 . Faecal sampling and animal handling of the farm animals was carried out in accordance with the approved guidelines issued by The Roslin Institute Animal Welfare and Ethical Review Body which approved this study 13 . In the main study, 376 dairy cattle from 104 farms representing about 20% of the dairy herds in the study area were randomly sampled and an E. coli was isolated from 371 animals (E. coli was not isolated from 5 animals) based on growth characteristics on both MacConkey agar and Bile-X-Glucuronide (TBX) plates (Oxoid, UK). Subsequent phylotyping indicated that 97% (361/371) could be allocated to established E. coli phylogroups 19 . In terms of subsequent studies the isolates were chosen as follows: From the main study all isolates (n = 371) were tested by the stx PCR to allow the prevalence to be estimated. 188 were sequenced but 186 were used in the phylogenetic analyses due to quality issues with 2 sets of reads. The sub-selected strains for sequencing were as follows (those with poor reads removed): (1) All strains showing phenotypic antibiotic resistance in the original study (n = 61) 13 ; Strains positive for stx by PCR (37/41); the rest (n = 88) were randomly chosen from the remainder. From the pilot study 40 from 81 strains were sequenced including three that were stx+ as determined by PCR; 37/40 were included in the phylogenetic study as three had read quality issues. As a separate study, enrichments from all animals (n = 371) were plated onto sorbitol MacConkey and any non-sorbitol fermenting colonies (3 per plate) tested for O157 agglutination. Only one animal yielded a positive isolate (ZB-2213N0194) and this positively selected isolated was then sequenced and added into the phylogenetic analysis. In total there were 224 bovine E. coli good quality whole genome sequences that were analysed in this study.
E. coli isolates (n = 79) from patients presenting with diarrhoea were collected at Lusaka hospital between 4th December 2014 and 7th January 2015 as part of another project managed by Prof. J.B. Muma and generously supplied for sequencing. Informed consent was obtained from all subjects. Six of the isolate sequences were not analysed due to read quality leaving n = 73 for phylogenetic and virulence determinant analysis. Further strain and sequence details are provided in the Supplementary Table 1. DNA extraction. DNA extraction was carried out using either a Wizard Genomic DNA Extraction Kit ® or a Qiagen ® DNA extraction kit from 1 ml of bacterial culture as defined in the manufacturers' protocols.
PCR detection of virulence determinants. All the bovine strains were screened by a published multiplex PCR for Shiga toxin genes and intimin 14 . The strains were also screened with a multiplex PCR for aggR and AA probe genes as markers for enteroaggregative E. coli 20 . The PCR products were visualised and captured using multi imaging software (Fluorchem HD2) following electrophoresis in 1.5% w/v agarose gel (Agarose, Melford, UK) and staining with Gelred ® . Verocytotoxicity assays. Established method 21,22 , with these minor variations: Single colonies were selected from LB agar plates and suspended in 10 ml of LB broth for 24 h (overnight). 50 μ l of overnight culture was added to 5 ml (1:100) of fresh LB broth and incubated for 60 min. Then 20 μ l of 5 μ g/ml mitomycin C (MMC) was added followed by overnight incubation. Sequencing analysis. All reads were generated by Illumina 1.9 paired-end read sequencing with read lengths from 36 to 251 bp. FASTQC 24 was used for quality assessment and where necessary trimming was done with cutadapt 25 . Short reads were aligned to a reference E. coli O157:H7 str. Sakai (RefSeq assembly accession: GCF_000008865) by combining BWA 26 , SAMtools and SnpEff 27 in a custom-made python script. The consensus sequence for each alignment of 5,590,092 bp was produced using the majority rule. Consensus sequences for each alignment were concatenated into one multifasta file that were then parsed to find core positions. Multifasta files of concatenated core nucleotides for each strain were used for recombination analysis with GUBBINS 28 . The recombinatorial regions were removed from the final alignment. The final Scientific RepoRts | 6:26589 | DOI: 10.1038/srep26589 alignment was then used to construct a Maximum Likelihood (ML) tree with RAxML 29 under a GAMMA model of heterogeneity with 100 bootstrap replicates (BS). The trees were visualised with ITOL 30 .
An established phylotyping scheme 31 was used as a starting point to develop a programme that assigned each strain to one of the 4 possible phylogroups (A, B1, B2, D) based on the presence or absence of one of 3 genes chuA, yjaA, arpA and one genetic fragment TspE4.C2. To further distinguish between groups and assign strains to an additional 4 phylogroups (C, E, F or cryptic clades), it was necessary to check for the presence of a fifth gene trpA and for the presence of specific alleles for the above genes. arpA alleles were used to distinguish between phylogroups D and E based on specific primer sequences described in 31 .
To establish gene presence or absence a database that includes all sequences from the collection were built with BLAST+ 15 . Query gene's sequences of intimin, sepL, chuA, yjaA, arpA, trpA, genetic fragment TspE4.C2 were downloaded from the NCBI website. Gene identifiers are presented in the (Supplementary Table 2). Query Shiga toxin sequences identified in 32 also were downloaded from the NCBI website. Gene's presence were established based on a E-value = 0 and similarity match at > 90% coverage of the query sequence. For Shiga toxins blast results were filtered based on bit score above 1000, if multiple contigs were involved only the highest result was kept.
Serogroups were identified based on presence of one or several alleles from the following genes: for O-typingwzx, wzy wzm and wzt; for H-typing the flagellin genes fliC, flkA, fllA, flmA and flnA. Databases were provided by Dr Flemming Scheutz and colleges 32 . Multi locus sequence type were identified using SRST2 software 33 .