Pathogenomic analyses of Shigella isolates inform factors limiting shigellosis prevention and control across LMICs

Shigella spp. are the leading bacterial cause of severe childhood diarrhoea in low- and middle-income countries (LMICs), are increasingly antimicrobial resistant and have no widely available licensed vaccine. We performed genomic analyses of 1,246 systematically collected shigellae sampled from seven countries in sub-Saharan Africa and South Asia as part of the Global Enteric Multicenter Study (GEMS) between 2007 and 2011, to inform control and identify factors that could limit the effectiveness of current approaches. Through contemporaneous comparison among major subgroups, we found that S. sonnei contributes ≥6-fold more disease than other Shigella species relative to its genomic diversity, and we highlight existing diversity and adaptive capacity among S. flexneri that may generate vaccine escape variants in <6 months. Furthermore, we show convergent evolution of resistance against ciprofloxacin, the current WHO-recommended antimicrobial for the treatment of shigellosis, among Shigella isolates. These findings demonstrate the urgent need to integrate existing genomic diversity into vaccine and treatment plans for Shigella, and provide a framework for the focused application of comparative genomics to guide vaccine development and to optimize control and prevention strategies for other pathogens relevant to public health policy.

Phylogeny of the S. flexneri population. ML phylogenetic trees were constructed using core genome SNPs from alignments of (A) 659 S. flexneri PG1-7 genomes from GEMS with 45 publicly available genomes, and (B) 147 Sf6 genomes from GEMS. Trees were rooted using an E. coli genome. The outer concentric rings illustrate different genotypic and epidemiological data according to the numbered inlaid keys displayed next to the tree in A. Scale bars represent the number of SNPs.
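To make the SNP unit on the scale bars concrete, the sketch below counts single-nucleotide differences between sequences taken from a core-genome alignment. It is a minimal illustration under stated assumptions, not the pipeline used to build these trees: the function names, the toy isolates and the choice to skip gap and ambiguous characters (`-`, `N`, `?`) are hypothetical. Pairwise counts like these are not identical to ML branch lengths, but they show what a distance of n SNPs means.

```python
def snp_distance(seq_a: str, seq_b: str, ignore: str = "-N?") -> int:
    """Count positions where two aligned core-genome sequences differ.

    Assumption (illustrative only): columns where either sequence carries a
    gap or ambiguous character listed in `ignore` are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    skip = set(ignore.upper())
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # uninformative column for this pair
        if a != b:
            diffs += 1
    return diffs


def pairwise_snp_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric SNP distances for every pair of named sequences."""
    names = sorted(alignment)
    matrix: dict[tuple[str, str], int] = {}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            d = snp_distance(alignment[x], alignment[y])
            matrix[(x, y)] = d
            matrix[(y, x)] = d
    return matrix


if __name__ == "__main__":
    # Hypothetical toy alignment, not GEMS data.
    aln = {
        "isolate_1": "ACGTACGTAC",
        "isolate_2": "ACGTTCGTAC",
        "isolate_3": "ACGAT-GTAN",
    }
    for (x, y), d in sorted(pairwise_snp_matrix(aln).items()):
        if x < y:
            print(x, y, d)
```

Skipping gapped or ambiguous columns pair by pair keeps poorly covered positions from inflating distances; other conventions (for example, removing such columns from the whole alignment first) give different numbers.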

Supplementary Figure. 4
Phylogeny of the S. sonnei population. Midpoint-rooted ML phylogenetic tree constructed using core genome SNPs from alignments of 308 S. sonnei genomes from GEMS and 40 publicly available genomes.

Phylogeny of the S. boydii and S. dysenteriae populations. ML phylogenetic trees were constructed based on core genome SNPs outside regions of recombination from alignments of (A) 79 S. boydii and (B) 60 S. dysenteriae genomes from GEMS and 24 publicly available genomes. Both trees were rooted using an E. coli genome.
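Midpoint rooting, used for the S. sonnei tree as an alternative to rooting with an E. coli outgroup, places the root halfway along the longest leaf-to-leaf path. The sketch below illustrates only that idea, on a tree stored as a hypothetical weighted adjacency dictionary; it is not the software used in this study. In practice, phylogenetics libraries such as Biopython's `Bio.Phylo` (`Tree.root_at_midpoint()`) handle this directly.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: undirected tree, node -> {neighbour: branch length}
Tree = Dict[str, Dict[str, float]]


def _paths_from(tree: Tree, start: str) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    """Distance from `start` to every node plus parent pointers (paths in a tree are unique)."""
    dist: Dict[str, float] = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint_root(tree: Tree) -> Tuple[str, str, float]:
    """Return (u, v, offset): the midpoint root lies on edge u-v, `offset` away from u."""
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    if len(leaves) < 2:
        raise ValueError("tree needs at least two leaves")
    # 1. Find the most distant pair of leaves (the longest path in the tree).
    longest, leaf_a, leaf_b = -1.0, "", ""
    for a in leaves:
        dist, _ = _paths_from(tree, a)
        for b in leaves:
            if dist[b] > longest:
                longest, leaf_a, leaf_b = dist[b], a, b
    # 2. Walk back from leaf_b towards leaf_a and stop on the edge spanning the halfway point.
    dist, parent = _paths_from(tree, leaf_a)
    half = longest / 2
    node = leaf_b
    while parent[node] is not None:
        up = parent[node]
        if dist[up] <= half <= dist[node]:
            return up, node, half - dist[up]
        node = up
    raise RuntimeError("midpoint not found")  # unreachable for a valid tree


# Hypothetical example: leaves A-D, internal nodes X and Y.
EXAMPLE_TREE: Tree = {
    "A": {"X": 1.0},
    "B": {"X": 2.0},
    "C": {"Y": 2.0},
    "D": {"Y": 5.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 2.0, "D": 5.0},
}

if __name__ == "__main__":
    print(midpoint_root(EXAMPLE_TREE))
```

In the example, the longest path runs from B to D (length 8), so the root falls 4 units from B, that is, 1 unit past Y on the Y-D edge. Actually rerooting would then insert a node at that position and orient all edges away from it; that step is omitted here.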

Supplementary Figure. 6
Estimation of the timeframe for serotype switching among S. flexneri PG3 isolates. The ML phylogenetic tree of S. flexneri PG3 (n=384) generated using core genome SNPs is displayed on the left; isolate serotype is shown on the outer ring and coloured according to the inlaid key displayed next to the tree. The two subclades with branches highlighted in red were selected for BEAST analysis. Maximum clade credibility trees for the two subclades within PG3 are displayed on the right. Independent switching events occurring along the various phylogenetic branches are highlighted in black, labelled and annotated. BEAST-estimated divergence timeframes along the branches of the seven isolates that underwent serotype switching are shown in Table S5.

Supplementary Figure. 7
The distribution of vaccine antigen candidates and their protein sequence identity among Shigella spp. (A) The left-hand y-axis refers to the grouped bar plot showing the presence of vaccine candidate genes among Shigella isolates from GEMS (n=1246). Bars are grouped by gene and coloured by species. The right-hand y-axis (blue) refers to the boxplot showing the interquartile range, median (red) and minimum/maximum pairwise percentage identity of the amino acid sequences of vaccine antigen candidates among GEMS isolates, compared against the reference sequences. Gene presence was identified using BLASTn searches against draft genome assemblies, and amino acid sequence percentage identity was inferred using BLASTp. (B) Bar plot showing, on the y-axis, the percentage of isolates of each species in which the virulence plasmid was detected. The virulence plasmid was detected in a low percentage of S. sonnei isolates, likely because the S. sonnei virulence plasmid is comparatively unstable and often lost during subculturing.
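Panel A summarizes BLASTp percentage identity as a boxplot. As a hedged sketch of those two quantities, the code below computes percentage identity from an already-aligned query/reference pair (identical positions divided by alignment length, the way BLAST reports it) and then the minimum, quartiles, median and maximum that such a boxplot draws. The function names and toy sequences are hypothetical. The code performs no alignment itself, and plotting software may interpolate quartiles differently from the `"inclusive"` method assumed here.

```python
import statistics


def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Identical (non-gap) positions / alignment length * 100; gap columns count toward length."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("empty alignment")
    matches = sum(1 for q, r in zip(aligned_query, aligned_ref) if q == r and q != "-")
    return 100.0 * matches / len(aligned_query)


def boxplot_summary(values: list[float]) -> dict[str, float]:
    """Min, quartiles and max of per-isolate identities (needs at least two values)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


if __name__ == "__main__":
    # Hypothetical aligned protein fragments versus a reference.
    reference = "MKVLSAQR"
    isolates = ["MKVLSAQR", "MKILSAQR", "MKV-SAQR"]
    identities = [percent_identity(seq, reference) for seq in isolates]
    print(identities)
    print(boxplot_summary(identities))
```

Counting gap columns in the denominator means that a truncated or indel-carrying antigen scores lower identity than a simple substitution-only comparison would suggest.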

Supplementary Table 2: Details of Shigella isolates used in this study
Includes accession numbers of the sequencing reads used in the study, Shigella serotype, assembly statistics, year and country of isolation, the condition (case/control, as defined by GEMS) of the child from whom the isolate was derived, genomic subtype, AMR genes and QRDR mutations.
Please see additional file

Supplementary Table 3: Details of publicly available E. coli/Shigella genomes used in this study
Publicly available E. coli/Shigella genomes used to contextualize GEMS isolates within the established phylogroup/lineage/clade/subtype are listed in Supplementary Table 3A and Shigella genomes incorporated in the assessment of vaccine protein antigen variation across LMICs are listed in Supplementary Table 3B.
Please see additional file