Identification and pathogenomic analysis of an Escherichia coli strain producing a novel Shiga toxin 2 subtype

Shiga toxin (Stx) is the key virulent factor in Shiga toxin-producing Escherichia coli (STEC). To date, three Stx1 subtypes and seven Stx2 subtypes have been described in E. coli, which differed in receptor preference and toxin potency. Here, we identified a novel Stx2 subtype designated Stx2h in E. coli strains isolated from wild marmots in the Qinghai-Tibetan plateau, China. Stx2h shares 91.9% nucleic acid sequence identity and 92.9% amino acid identity to the nearest Stx2 subtype. The expression of Stx2h in type strain STEC299 was inducible by mitomycin C, and culture supernatant from STEC299 was cytotoxic to Vero cells. The Stx2h converting prophage was unique in terms of insertion site and genetic composition. Whole genome-based phylo- and patho-genomic analysis revealed STEC299 was closer to other pathotypes of E. coli than STEC, and possesses virulence factors from other pathotypes. Our finding enlarges the pool of Stx2 subtypes and highlights the extraordinary genomic plasticity of E. coli strains. As the emergence of new Shiga toxin genotypes and new Stx-producing pathotypes pose a great threat to the public health, Stx2h should be further included in E. coli molecular typing, and in epidemiological surveillance of E. coli infections.


Identificaition of a novel Stx2 subtype in STEC strains of Marmot origin. Among six STEC strains
isolated in this study, two of them carried stx 2 gene that can be subtyped by the PCR-based method, one carried stx 2a , and the other harbored stx 2g (Table 1). However, the stx 2 carried by other four isolates (STEC293, STEC294, STEC295, and STEC299) failed to give amplicon using stx 2 -subtypes specific primers. The full length stx 2 genes of the four isolates was then amplified. Sequence analysis revealed the stx 2 genes in these four isolates were identical.
Phylogenetic trees reconstructed using the neighbor-joining algorithm (Fig. 1, see Supplementary Fig. S2 for extended version of this tree), maximum-likelihood and maximum parsimony algorithms shared the same topology and demonstrated that the Stx2 from STEC293, STEC294, STEC295, and STEC299 form a distinct lineage, not clustering with any known Stx2 subtypes and variants. These data suggest these four STEC strains harbor a novel Stx2 subtype. Based on the new nomenclature for Stx, the new Stx subtype was designated Stx2h. STEC299 encoding variant Stx2h-O102-STEC299 was used as type Stx2h strain for further analysis. Comparison of sequences of the stx 2h subtype and other existing stx 2 subtypes revealed that the nucleic acid sequence of subunit A of the stx 2h showed a similarity ranging from 69.7 to 92.9% to the previously reported stx 2 subtypes, and 67.2% to 91.3% for subunit B. When comparing sequences of Stx2 holotoxin, the similarity with others ranged from 63.8% to 91.9% in nucleic acid level, and from 71.9% to 92.9% in amino acid level ( Table 2). The amino acid alignment for Stx2h-O102-STEC299 holotoxin against other seven subtypes, demonstrated 13 amino acids difference (Fig. 2). The intergenic region between the A and B subunit of stx 2h contained 12 nucleotides, exhibiting the same intergenic region size with stx 2b , stx 2e , stx 2f and stx 2g , but display distinct nucleotides composition (CAGGAGTTAAAC) with others (Fig. S1).
All four stx 2h -isolates gave an expected band about 150-bp by using the stx 2h -specific PCR, but seven stx 2 subtypes reference strains and a non-O157 STEC collection strains from wild animals in the same sampling region were all negative for stx 2h.

Stx2h is inducible and functional.
To determine if Stx2h is inducible, the levels of basal and induced stx 2 expression were determined using real-time RT-PCR. Result showed stx 2 was expressed constitutively (Fig. 3A), while the basal transcription level in STEC299 was lower (4.2 times) than that observed in the O157:H7 outbreak strain Xuzhou21 under non-inducing conditions. Notably, the induced stx 2 expression level is 18.3 times higher than its basal level in STEC299, posting a greater inducing ability than Xuzhou21.The A subunit of novel Stx2h can be recognized against a Stx2 rabbit polyclonal antibody that have been used effectively to recognize all other Stx2 subtypes, while with a lower amount of the protein loaded on the gel (Fig. 3C). Vero cell cytotoxicity assay showed that STEC299 had similar cytotoxicity after 24 hours incubation comparing to the outbreak strain Xuzhou21 (Fig. 3B). These results indicated that stx 2h expression was inducible and Stx2h had cytotoxicity to Vero cells.

STEC299 exhibited a hybrid virulence genes spectrum.
To determine the pathogenic potential of this newly identified Stx2h harboring strain STEC299, the virulence factors were investigated by using a BLAST search against VFDB (http://www.mgc.ac.cn/). Remarkably, except for stx 2 , none of STEC main virulence-related factors, like eae, ehxA, saa, efa1was identified in STEC299. Instead, many other virulence factors were identified (Table S2), which mainly belonged to five categories: adherence, invasion, autotransporter, iron uptake, Notably, the majority of virulence genes in STEC299 are miscellaneous ExPEC virulence determinants, including type I fimbriae genes and invasion gene ibeA, which have been regarded as virulence determinants common to ExPEC 19,20 . Additionally, the presence of fyuA, chuA, and tsh identified in STEC299 has been significantly associated with UPEC and NMEC but not diarrhoeagenic E. coli (DEC) 21 .
Phylogenetic position of STEC299. The phylogenetic position of STEC299 among a diverse collection of E. coli and Shigella spp. genome sequences comprised of representatives of all major pathotypes was assessed using ribosomal multilocus sequence analysis, whole-genome multilocus sequence typing (wgMLST) and whole-genome phylogenetic analysis. In total 53 ribosomal protein gene sequences were extracted from the annotated whole-genome sequence of the 33 strains. Ribosomal protein L36 and L31 type B gene was excluded from the analysis because of the presence of paralogues in some genomes. A ClonalFrame tree (Fig. 5A) was inferred from the 53 concatenated ribosomal protein gene sequences that are single-copy and shared by the 33 strains, which revealed that the novel Stx2h converting strain STEC299 formed a remote cluster with all 10 reference STEC genomes, while grouped close with strains representing EPEC, UPEC, NMEC, APEC, AIEC. Similarly, neighbor-net phylogeny (Fig. 5B) and Gubbins tree (Fig. 5C) generated with the concatenated sequences of all the 2,321 shared loci found in wgMLST analysis were also consistent with this finding. Our study indicates that the Stx2h-containing strain STEC299 is phylogenetically closer to other pathotypes of E. coli than STEC, thus we propose that STEC299 might evolve from other pathotypes into STEC by acquiring Stx prophage and other virulence factors.  Stxs normally reside in bacteriophages, where horizontal gene transfer could lead to emergence of new Stx-subtypes/variants or Stx-producing pathogens 18 . In a recent study, a novel Stx1 subtype, Stx1e, was identified from an Enterobacter cloacae strain 30 . The emerging of the new stx subtypes/variants, and new Stx-producing pathotypes pose a great threaten to the public health. Here, we reported a novel Shiga toxin 2 subtype, named Stx2h produced by E. coli O102:H18 strains from marmots in Qinghai-Tibet plateau of China. Stx2h shows high induced level of stx 2 expression, and cytotoxicity to Vero cell, and the reactivity with anti-Stx2 antibody, posing diagnostic challenge for the emerging of new Stx subtypes/variants. Remarbly, the novel Stx2h subtype occurred at an unexpectedly high rate of 66.7% (4 of 6) in marmot STEC strains, but was not detected in any of other animal-derived or human STEC strains we investigated in different regions of China so far. The absence of Stx2h in other animal in the same enviroment suggests that it may be a recently emerged subtype that has not yet extensively spread among animals, or it could be limited to a specific host or ecosystem. The occurrence of the new Stx2h subtype from marmots enlarges the pool of Stx2 subtypes and add further information to the global epidemiological picture of STEC strains. Future work should bring into light if the novel Stx2h subtype is specific for STEC strains adapted to marmot or wild animal on the Qinghai-Tibetan plateau.

Discussion
Pathogenic E. coli can be categorized into different pathotypes based on the presence of specific virulence markers. The stx genes specific for STEC reside in the genome of heterogeneous lambdoid prophages, Stx-converting bacteriophages 31 , which could infect various bacterial hosts wider than expected 18 . The potential genetic combinations due to gene transfer may result in hybrid pathotype strains. Stx2-phages can infect and lysogenize almost all known pathotypes of E. coli, including both diarrheagenic E. coli (DEC) and extraintestinal pathogenic E. coli (ExPEC) 17 . The emergence of novel hybrid form of STEC and other E. coli pathotypes might result in more severe disease. For instance, the STEC/enteroaggregative E. coli (EAEC) hybrid strain O104:H4 caused a large outbreak with numerous HUS cases in Germany in 2011 32 . STEC/enterotoxigenic E. coli (ETEC) hybrid strains have been recovered from various sources and correlated with diarrheal disease and even HUS in humans 33,34 . A recent report described an STEC/ExPEC hybrid that caused HUS and bacteremia 35 . STEC/UPEC hybrid strains have also been identified from hospital patients 20,36 . Our study reveals the presence of virulence factors from multiple pathotypes of E. coli in the Stx2h converting strain STEC299, including type I fimbriae genes, ibeA, chuA, fyuA, and tsh [19][20][21]37 . The gene pic, originally identified in the EAEC prototype strain 042 38 , was present frequently among UPEC strains, with a positive rate of 13% reported by Abe et al. 39 . The autotransporter gene tsh in E. coli strains are associated with acute pyelonephritis, and are expressed during urinary tract infection 40 . Considering the severe disease caused by hybrid pathotypes, the pathogenic potential of STEC299 should not be neglected and calls for considerable attention. Further, the combined virulence traits of STEC299 is in accordance with our previous findings that most of the marmot E. coli strains exhibited hybrid forms carrying virulence markers from various pathotypes 16 , indicating that marmot E. coli strains exhibit a marked genome plasticity. Future work should aid to ascertain if the E. coli strains from marmots show more tendency to represent hybrid genotypes.
Phylogenies inferred from whole genome comparision clearly underlines that STEC299 are phylogenetically closer to other pathotypes of E. coli than STEC group, thus we propose that marmot strain STEC299 may evolved from other pathotypes by horizontal gene transfer and gaining Stx2 phage, which is supported by view that pathogenic E. coli was evolved from non-pathogenic E. coli through horizontal transfer of virulence genes, resulting in mixed pathotypes with enhanced pathogenicity 41 . However, there are some limitations in genomic analysis, as the number of ExPEC and other pathotypes used for genome comparison is small due to the limited completed reference genomes available, further study are needed to clarify the evolution pattern. Moreover, reference genomes used for comparison are mostly from human-derived strains, there might be a possibility that strains are phylogenetically divergent based on the host origins, thus a various collection of strains are further needed for better understand the phylogenetic placement.
In conclusion, we report the discovery of a novel Shiga toxin 2 subtype from marmot E. coli strains, and enlarges the pool of Stx2 subtypes. Our study shows the new Stx2h converting strain STEC299 is a heteropathogenic strain, which is closer to other pathotypes of E. coli in terms of both phylogenies and virulence gene spectrum. As the emergence of new Stx subtypes and Stx-producing pathotypes has represented a serious problem with the tendency to cause more severe disease, the novel Stx2h should be further included in molecular typing of E. coli strain, and in epidemiological surveillance of E. coli infections.

Methods
Ethics statement. The Marmots (M. himalayana) were sampled as part of the animal plague surveillance program conducted in Yushu Tibetan autonomous prefecture, Qinghai province. The sampling was performed in accordance with the protocol for national plague surveillance program in animals. The study has been reviewed and approved by the ethic committee of National Institute for Communicable Diseases Control and Prevention, China CDC.
Sampling and strain isolation. Of a total of 200 Marmots sampled between July and August 2013, 51 were from Zhongdaxiang (with an altitude of 3599.6 m above sea level (a.s.l)), 120 from Dezhuotan (3025 m a.s.l), and 29 from Dedacun (3625.6 m a.s.l), respectively. The Marmots were captured by cages in the field and sampled in the laboratory of local Centre for Disease Control (CDC). The intestinal contents were collected in 2 ml sterile tubes containing Luria-Bertani (LB) medium in 30% glycerol, which were stored at −20 °C immediately and transported to the laboratory in the National Institute for Communicable Disease Control and Prevention in Beijing. Strains were isolated and confirmed to be STEC by the methods we previously described 11,13 . Briefly, enriched samples in E. coli broth (Land Bridge, Beijing, China) were examined by PCR for the presence of stx genes with primers stx1F/Stx1R and Stx2F/Stx2R respectively 13 . PCR-positive enrichments were then streaked onto CHROMagar TM ECC agar (CHROMagar, Paris, France), and MacConkey agar (Oxoid, Hampshire, UK). Colonies resembling E. coli were picked and tested for stx genes by single colony duplex PCR assay. Serotyping, detection of main STEC-related virulence factors (stx, eae, ehxA, efa1, saa, paa, toxB, and astA), and multilocus sequence typing (MLST) were conducted as we previously described 11,13 . Stx subtyping based on phylogenetic analysis. stx subtypes of STEC isolates were determined by the PCR-based subtyping method 4 . For strains that failed to be detected by the stx 2 subtype-specific primers, the completed stx 2 gene was amplified as described previously 11,42 , then cloned into vector pMD18-T and transformed into E. coli JM109 (Takara, Dalian, China). About 10 transformants were selected for sequencing to discern multiple stx 2 subtypes in a PCR product. The 93 representative reference nucleotide sequences of the full stx 2 operon of stx 2 subtypes and variants (stx 2a -stx 2g ) were downloaded from GenBank as previously described 4 . The amino acid sequences for the combined A and B holotoxin were translated from the open reading frames. The full nucleotide and amino acid sequences, including A and B subunits, the intergenic regions, were aligned and compared by using Clustal Omega to evaluate the differences between stx 2 sequences. Phylogenetic trees based on the holotoxin amino acid sequences were reconstructed with three algorithms, neighbor-joining, maximum likelihood and maximum parsimony, using MEGA 7 software (www.megasoftware.net) 43 , and the stability of the groupings was estimated by bootstrap analysis (1000 replications). Genetic distances were calculated by the maximum composite likelihood method.
Developing a specific PCR to detect and subtype stx 2h . The stx 2h and reference nucleotide sequences of stx 2a -stx 2g were aligned. Subtype-conserved areas were searched to develop a pair of stx 2h -specific primers Stx2h-F (5′-AGATCTCATTCTTTATATG-3′) and Stx2h-R (5′-TCCCCATTATATTTAGAG-3′). The PCR cycling conditions were as follows: an initial denaturation at 94 °C for 5 minutes followed by 30 cycles, each consisting of 30 seconds at 94 °C, 30 seconds at 51 °C and 30 seconds at 72 °C, and a final elongation at 72 °C for 5 minutes using Premix Taq ™ ((TaKaRa, Japan). The expected PCR product is 149-bp. Seven stx 2 reference strains and a non-STEC collection from yak, marmot, pika, antelope, cattle, goat, pig, food, diarrheal patients and healthy carriers reported previously 14 were tested by this subtyping protocol.

Determination of stx 2 transcription by real-time reverse-transcription (RT)-PCR. Strains were
grown in Luria-Bertani medium at 37 °C with shaking to an OD 600 of 0.6., Mitomycin C (BBI, USA) was added to a final concentration of 0.5 μg/ml and incubated for three hours to induce the Stx2 phage. Total RNA was extracted with RNeasy Mini Kit (Qiagen, Germany). Real-time RT-PCR was performed with the Rotor-Gene Q system (Qiagen, Germany) using a One Step SYBRH PrimeScript TM RT-PCR kit (TaKaRa, Japan), according to the manufacturer's instructions. The RT-PCR profile was as follows: 42 °C for 10 minutes, 95 °C for 10 seconds, and 40 cycles of 95 °C for 15 seconds, 60 °C for 1 minute. Primers stx2F (CAACGGACAGCAGTTATACCACTCT) and stx2R (TTAACGCCAGATATGATGAAACCA) allowed amplification of an stx 2 fragment. Primers gapA-F (TATGACTGGTCCGTCTAAAGACAA) and gapA-R (GGTTTTCTGAGTAGCGGTAGTAGC) allowed amplification of an gapA fragment. Expression levels of house-keeping gene gapA (D-glyceraldehyde-3-phosphate dehydrogenase) were used as endogenous control within each sample. The relative level of stx 2 expression was calculated using the 2 −ΔΔCT method 44  Detection of Stx2 production by Western blot. Western blots were conducted as described 45 . Briefly, the culture was induced with mitomycin C and incubated overnight. The supernatants and cells were harvested and separated by SDS-PAGE. After PAGE, the proteins were transferred to an Immobilon ® PVDF membrane (pore size, 0.45 µm; Merck, Germany). Stx2 rabbit polyclonal antibody was diluted to 1 μg/ml in PBS buffer and incubated for 3 hours at room temperature, and then washed three times in PBST. Goat anti-Rabbit IgG (H + L) (IRDye ® 800CW) at a 1/20,000 dilution was incubated for 2 h at RT. The blots were washed four more times with PBST, and then visualized with a 5 minutes exposure with a LI-COR Odyssey scanner (LI-COR Biosciences, USA).
Vero cell cytotoxicity. The cell-free supernatants were used for Vero cytotoxicity assay. Briefly, Vero cells were maintained in tissue culture flasks in DMEM (Difco, USA) supplemented with 10% fetal calf serum at 37 °C in an atmosphere of 5% CO 2 for 24 h. Filtrates were added in triplicate to Vero cells (10 4 cells per well) in 96-well tissue culture plates, then incubated at 37 °C in a 5% CO 2 atmosphere. After 24 h, the release of the cytoplasmic lactate dehydrogenase (LDH) was detected using the Cytotox96 kit (Promega, USA) according to the manufacturer's instructions. The percentage of cytotoxicity was determined as (experimental release-spontaneous release)/(maximum release-spontaneous release) × 100. The spontaneous release was the amount of LDH activity in the supernatant of uninfected cells, the maximum release was that when cells were lysed with the lysis buffer. E. coli Xuzhou21 and E. coli MG1655 were used as control. stx 2 stability. The stability of stx 2 was evaluated as previously described 30 . Briefly, subcultures of bacteria were prepared for 14 consecutive days. Nucleic acid extracts from the consecutive subcultures were tested using the Rotor-Gene Q system (Qiagen, Germany). The primer Stx2F (GGGCAGTTATTTTGCTGTGGA), Stx2R (GAAAGTATTTGTTGCCGTATTAACGA) and probe Stx2-P FAM-ATGTCTATCAGGCGCGTTTTGACCATCTT-BHQ1were used for stx 2 detection as described previously 46 . The reaction mixture consisted of 5 μL of DNA extract, 15 μL of 2 × Premix Ex Tap (Probe qPCR) (TaKaRa, Japan), 1 μL of Stx2-F primer (50 μM), 1 μL of Stx2-R primer (50 μM), 0.5 μL of FAM-labeled probe of stx 2 (5 μM). Cycling conditions used for real-time PCR amplification were as follows: 50 °C for 2 min, 95 °C for 10 min, and 40 cycles at 95 °C for 15 s and60 °C for 1 min.
DNA isolation and whole-genome sequencing. Genomic DNA was isolated from an overnight culture using the Wizard Genomic DNA purification kit (Promega, USA) according to the manufacturer's instructions. The complete genome was sequenced by single molecule, real-time (SMRT) technology using the Pacific Biosciences (PacBio) sequencing platform 47 . The data were assembled to generate one circular genome without gaps by using SMRT Analysis 2.3.0 48 . The protein-coding sequences (CDSs), tRNAs and rRNAs were predicted using GeneMarkS 49 . The prophages were predicated by the PHAge Search Tool (PHAST) 50 . The virulence factors were predicted through the BLAST tool of NCBI and by using the virulence factor database (VFDB; http://www. mgc.ac.cn/) 51 .
Stx containing prophage sequence analysis. The sequence of the Stx converting phage was extracted from the complete genome by using the PHASTER (http://phaster.ca/) 52 , the genome of Stx converting phage was reannotated using the RAST server (http://rast.nmpdr.org/) 53 , and then manually verified and corrected. Functional annotation of selected CDSs was performed based on the results of homology searches against the public nonredundant protein database (http://www.ncbi.nlm.nih.gov/) by using BLASTP. The gene adjacent to the integrase was designated as the phage insertion site 54 . The full Stx2 phage sequence of the STEC299 was compared in detail to representative Stx2 converting phages and visualized using perl script. The Stx2 phage sequences of the reference strains used in the current study were kindly provided by Dr. David A. Rasko, University of Maryland School of Medicine 54 .

Phylogenomic Analysis.
To generate a robust, high-resolution phylogenomic tree depicting position of the novel Stx2 converting strain STEC299, the genome were compared with 32 E. coli/Shigella spp. completed genomes comprised of representatives of all the major pathotypes (Table S1) by using two strategies: ribosomal protein gene sequence analysis (rMLST) 55 and whole-genome multilocus typing (wgMLST). The ribosomal protein subunites (rps) gene sequences were extracted from the annotated whole-genome sequence of the 33 strains. Three independent runs were then carried out with ClonalFrame (version 1.2) on the extracted rps gene sequences and the outputs of the analyses were converged and merged to generate a 95% consensus tree 56 . For wgMLST analysis, the completed whole-genome sequence of strain EDL933was used as reference to perform an ad hoc wgMLST analysis using Genome Profiler version 2.0 57 . The relationship of the strains was further analyzed with Splits Tree 4 58 . The whole-genome phylogeny was inferred from the concatenated sequences of the loci shared by the 33 whole-genome sequences, which was found in the wgMLST analysis. All the regions with elevated densities of base substitutions were eliminated and phylogenetic relationship were generated by Gubbins 59 .