A Shigella sonnei clone with extensive drug resistance associated with waterborne outbreaks in China

Antimicrobial resistance of Shigella sonnei has become a global concern. Here, we report a phylogenetic group of S. sonnei with extensive drug resistance, including a combination of multidrug resistance, coresistance to ceftriaxone and azithromycin (cefRaziR), reduced susceptibility to fluoroquinolones, and even colistin resistance (colR). This distinct clone caused six waterborne shigellosis outbreaks in China from 2015 to 2020. We collect 155 outbreak isolates and 152 sporadic isolates. The cefRaziR isolates, including outbreak strains, are mainly distributed in a distinct clade located in global Lineage III. The outbreak strains form a recently derived monophyletic group that may have emerged circa 2010. The cefRaziR and colR phenotypes are attributed to the acquisition of different plasmids, particularly the IncB/O/K/Z plasmid coharboring the blaCTX-M-14, mphA, aac(3)-IId, dfrA17, aadA5, and sul1 genes and the IncI2 plasmid with an mcr-1 gene. Genetic analyses identify 92 accessory genes and 60 single-nucleotide polymorphisms associated with the cefRaziR phenotype. Surveillance of this clone is required to determine its dissemination and threat to global public health.

Line 529: Were the genome assemblies and sequence reads of each isolate analyzed by WGS in this study deposited into a public repository such as GenBank? This should be described here and accession numbers of assemblies and SRAs should be added to the isolate table.
Fig S1: How were the ladder images generated that are overlayed on the southerns? Were the ladders from an inverse exposure of panel A? In the figure legend please mention the ladder used and describe that panels B, C, and D contain the ladders from the ethidium-stained gel prior to transfer, overlayed on the southern hybridization images. Also, please add the ladder band sizes to the bottom row of images.
Extended Data Figure 1 and Extended Data Table 1: It is confusing to have a supplemental Figure S1, and also an Extended Data Figure 1. Can the extended data table and figure be included among the supplementary tables and figures?

Responses to Reviewers' Comments
We thank the reviewers for the valuable and constructive comments. We have carefully incorporated the reviewers' advice and, as a result, our paper has improved substantially.
Below, we explain in detail how we have taken each of the comments into account in the revision of our manuscript. We show the reviewer's comments in italics and reply to them in the standard font. All the revisions have been highlighted in the revised manuscript.

Responses to Reviewer #1
This work describes about the molecular characteristics of a an extensively drug-resistant clone of Shigella sonnei associated with waterborne outbreaks of shigellosis in China. This work has covered susceptibility patterns of the pathogen, plasmid analysis, antimicrobial resistance genes and phylogenetic analysis using the whole genome sequence. Comments:

Abstract is not reflecting the all the findings made in this investigation.
Response: Thank you for pointing out this problem. Since Nature Communications requires the Abstract section to be no longer than 150 words, we omitted some secondary findings in the original manuscript. In the revised manuscript, in response to your comment, we have rewritten the Abstract section to summarize our findings as comprehensively as possible: "Antimicrobial resistance of Shigella sonnei has become a global concern. Here, we report a phylogenetic group of S. sonnei with extensive drug resistance, including a combination of multidrug resistance, coresistance to ceftriaxone and azithromycin (cef R azi R ), reduced susceptibility to fluoroquinolones, and even colistin resistance (col R ). This distinct clone caused six waterborne shigellosis outbreaks in China from 2015 to 2020. We collected 155 outbreak isolates and 152 sporadic isolates. The cef R azi R isolates, including outbreak strains, were mainly distributed in a distinct clade located in global Lineage III. The outbreak strains formed a recently derived monophyletic group that may have emerged circa 2010. The cef R azi R and col R phenotypes were attributed to the acquisition of different plasmids, particularly the IncB/O/K/Z plasmid coharboring the blaCTX-M-14, mphA, aac(3)-IId, dfrA17, aadA5, and sul1 genes and the IncI2 plasmid with an mcr-1 gene. Genetic analyses identified 92 specific accessory genes and 60 single-nucleotide polymorphisms associated with the cef R azi R phenotype. Surveillance of this clone is required to determine its dissemination and threat to global public health." (Lines 39-52) 2. What is the source of the water, pipeline/stored water? Evidence of waterborne outbreaks is very weak. Why only children are the affected group in all these six outbreaks Response: Thanks for your comment. In this study, investigations on water supply systems were performed for five of the six outbreaks (all except for the Wuzhou outbreak). For the locations of the Yulin outbreak, Nanning outbreak, and Guigang outbreak, water supplies were sourced from self-provided reservoirs pumped from the digging wells. For the Baise outbreak location, a centralized water supply sourced from a mountain spring water well was used, and for the Huainan outbreak location, a centralized water supply sourced from a waterworks facility was used. We have rephrased the paragraph in lines 135-140.
Antimicrobial susceptibility tests showed that the outbreak isolates, including 149 isolates from patients and six from water samples, all had the cef R azi R phenotype, with additional resistance to ampicillin, ticarcillin, piperacillin, trimethoprim/sulfamethoxazole, gentamicin and tetracycline (Supplementary Table   S2). The cef R azi R isolates recovered from water samples and outbreak patients were similar in the Yulin, Baise, Nanning, and Guigang outbreaks, with only one to three different SNPs, suggesting that the contaminated water supply may have been the cause of the outbreaks (Figure 2).
We propose the following points as the reasons why most of the outbreak patients in this study were children: 1) The six outbreaks occurred mainly in kindergartens and schools, which were places where children gathered together.
2) These outbreaks occurred in southern China, where heavy precipitation and floods were frequent. During heavy precipitation and floods, the water supply systems, sewage systems, and waste disposal systems of the kindergartens and schools were damaged, resulting in contamination of drinking water sources. Meanwhile, some students in the kindergartens and schools would drink unboiled and unsterilized water directly, leading to an increased risk of infection among them.
3) According to previous studies, children under five years old are more susceptible to the infection of Shigella species. 1 4) It was found that most of the teaching staff did not take meals in the kindergartens and schools. Therefore, their risk of infection would be lower than that of students.

Does these outbreaks repeatedly appeared in the same geographical location
(more specifically the same school) during 2015-2020?
Response: These outbreaks were independent, and appeared in different geographical locations. Five of the outbreaks occurred in Guangxi province, but in different schools in Yulin, Baise, Wuzhou, Nanning, and Guigang cities (Supplementary Table S1), and the remaining outbreak in Huainan was in Anhui province. Locations of the six outbreaks were marked on the map of the two provinces ( Figure 1A). Response: Thanks for your suggestion. We present the collection time of the sporadic strains in Figure 1B:

Introduction
We have also described the collection time and isolation source of the sporadic strains in the revised manuscript: "The sporadic strains were isolated from patients infected with S. sonnei, mainly  Response: Thank you for this comment. In this study, S1-nuclease pulsed-field gel electrophoresis (PFGE) was performed to investigate the plasmid profiles of the cef R azi R and col R strains, and Southern blotting was then used to determine the location of blaCTX-M, mphA, and mcr-1 genes on plasmids or chromosomes. The detailed method for S1-PFGE and Southern blotting was described by Zou D, et al 3 .
The primer and probe sequences used in this study are listed in Supplementary Table   S7. The S1-PFGE and Southern blotting results showed that the blaCTX-M-14 and mphA genes were co-located on the ~100 kbp plasmid, and the mcr-1 gene was located on the ~60 kbp plasmid (Supplementary Figure S1).
We have rephrased the description of the methods for obtaining plasmids: "To determine the blaCTX-M-14-, mphA-, and mcr-1-harboring plasmid sequences, we extracted fragments of plasmids using a Qiagen Plasmid Midi kit (Qiagen, Germany) and submitted them for sequencing using the Illumina MiSeq platform to ensure that most of the fragments sequenced belonged to plasmids. We performed quality control using Trimmomatic and assembly using plasmidSPAdes 4 on the plasmid sequence data before mapping them to the Plasmidfinder database 5  Response: Thank you for your comment. The 5.5kb region was composed of the intI1 integron and the sul1, dfrA17, and aadA5 genes ( Figure 4B).
In the revised manuscript, we have added the corresponding descriptions: "the plasmids from the Baise, Nanning, Guigang, and Wuzhou outbreak isolates lost an IS26 copy and acquired a 5.5 kb region composed of the intI1 integron and the sul1, dfrA17, and aadA5 genes ( Figure 4B)." (Lines 253-255) "The WGS data and representative plasmids were mapped to the local INTEGRALL 6 database to identify the type of integron using BLAST." (Lines

584-585)
"We found that there was an intI1-type integron located downstream of the dfrA17 gene associated with the cluster of dfrA17, aadA5, qacEdelta, and sul1 (Supplementary Figure S4). In addition, there were four other gene clusters associated types. This information may also be included in the phylogenetic analysis.
Response: Thank you for this comment. There were 12 genes mentioned in lines 170-172 and 205 of the original manuscript. We found that there were mainly five combinations of 10 related genes.
We present this finding in Supplementary Figure S4: We have also provided information on the types of integrons in the revised manuscript: "We found that there was an intI1-type integron located downstream of the dfrA17 gene associated with the cluster of dfrA17, aadA5, qacEdelta, and sul1 (Supplementary Figure S4). In addition, there were four other gene clusters associated with integrons. The genes tet(A), strA, strB, and sul2 were mainly present in the order 11. Most importantly, the virulence encoding genes are missing in the analysis.
Epidemiologically, it is important to undertake this kind of analysis, as it is directly associated with the diseases. Authors can generate this formation using the WGS data.
Response: Thanks for your helpful comment. We have analyzed the virulence encoding genes and present the findings in Supplementary Figure S5: Figure S5. Distribution of virulence-encoding genes with different functions. The exoenzyme, effector delivery system, motility, regulation, and two other function genes were absent in most of the strains. There were no significant associations between the genes and the cef R azi R strains. cef R azi R coresistance to ceftriaxone and azithromycin.
In the revised manuscript, we have also added and corrected the following descriptions: "SNPs and genes associated with cef R azi R isolates and outbreak strains" (Line

270)
"Analysis of potential markers and related genes" (Line 587) "Considering that virulence genes may play certain roles, we also mapped WGS data to the VFDB 7 using BLAST to identify the virulence-encoding genes and have displayed the results in a heatmap." (Lines 601-603) "Analysis of virulence-encoding genes showed that there were a total of 97 virulence factor genes, including 70 genes involved in the effector delivery system, three genes involved in exoenzyme activity, one involved in regulation, one involved in enterotoxin activity, one involved in motility, nineteen involved in nutritional or metabolic factor activity, and two involved in other functions. The genes of the effector delivery system, especially the T3SS, motility, regulation, and two other functions, were significantly different in distribution among the lineages. These virulence genes were absent in most of the cef R azi R strains but present in the strains from the Guigang outbreak and some of the Nanning outbreak strains (Supplementary Figure S5)." (Lines 286-294)

The SNPs analysis has to be extended in virulence encoding genes and its transcriptions and functions.
Response: Thank you for this helpful comment. We have extracted and analyzed SNPs located in the virulence factors genes of S. sonnei and supplemented Supplementary "MLST analysis was also conducted using MLST 8 tools according to the database of the PubMLST website 9 ." (Lines 559-560) "Multilocus sequence typing (MLST) analysis showed that all of the strains belonged to ST152, as previously described 10 ." (Lines 174-175)

Discussion section mostly reflects the information provided in the results and this
can be avoided. Technology Co., Ltd., China). Suspected isolates recovered from the samples were identified using a commercial biochemical test kit (API 20E system; bioMérieux

Response
Vitek, France) according to the manufacturer's recommendations." (Lines 493-505) 16. Reference numbers mentioned in the text and in the concerned section are not matching.

Response:
We are very sorry for the negligence. We have updated reference numbers throughout the text to match those in the References section.
17. Finally, the English language has to be improved. In many places, grammar and syntax errors exists.

Response:
The manuscript has been thoroughly revised and polished by native speakers, and we hope it can meet the journal's standards. Thank you so much for your useful comments. Response: Thank you for your suggestion. We have interpreted and discussed the results of the COG analysis in detail in the revised manuscript:

This study reports on the sampling and genomic epidemiology investigation of
"The COG results of phenotype-related pangenomic genes and SNP sites showed that all these genes involved were related to energy production; carbohydrate transport and metabolism; transcription, replication, recombination, and repair; cell wall or membrane or envelope biogenesis, and other functions. Although the effects of these related genes on the biological functions of the strains cannot be determined by simple mathematical addition of the number of genes involved in these two COGs, there were many more genes associated with transport and energy production, such as the fep gene encoding iron enterobactin, msb gene encoding a full transporter, and tauB gene encoding the taurine ATP-binding component of a transport system 13,14 . We hypothesize that the strains with the cef R azi R phenotype may have stronger nutrient transport and metabolic capacity, and to some extent, this may contribute to better adaptation to the environment of these strains. In addition, cell wall and membrane and envelope biogenesis 15 -associated genes, such as the phage-related lysozyme gene rrrD, may play a role in adaptation to poor environments. These may be important reasons why certain waterborne strains can survive in water and contribute to a series of outbreaks." (Lines 408-421) 4. Use of the "first time/study" seems a bit overstated throughout the MS given the novelty of original research.
Response: Thank you for this comment. We have deleted the inappropriate "first time/study" from the revised manuscript.

Some methods need clarification.
Response: Thanks for your comment. We have supplemented detailed information about the methods of isolating strains, the tools used in correcting errors, and the analyses on genotyping, multi-locus sequence typing, plasmids comparations, integrons, and virulence encoding genes: "All food and water samples were collected by sterile sampling according to the national food safety standards 11  "De novo assemblies were generated using Canu v1.6, 16 and errors of sequencing were corrected by the Illumina sequencing data using Pilon v1.24 17 ." (Lines 537-538) "Further genotyping of the strains was performed using Mykrobe software according to the study of Jane Hawkey et al. 10 " (Lines 558-559) "MLST analysis was also conducted using MLST 8 tools according to the database of the PubMLST website 9 ." (Lines 559-560) "Plasmid sequence comparison and map generation were performed using BLAST 18 and custom Perl scripts, respectively." (Lines 573-575) "The WGS data and representative plasmids were mapped to the local INTEGRALL 6 database to identify the type of integron using BLAST." (Lines

584-585)
"Considering that virulence genes may play certain roles, we also mapped WGS data to the VFDB 7 using BLAST to identify the virulence-encoding genes and have

Line 222: elaborate on "almost the same"
Response: There were a total of 215 cef R azi R isolates in our study, including 155 outbreak isolates. The outbreaks isolates were an important component of these cef R azi R isolates. We performed a GWAS on the outbreak and the cef R azi R isolates. A total of 59 significant SNPs were associated with the outbreak isolates ( Figure 5A) and 60 significant SNPs were associated with the cef R azi R isolates ( Figure 5B).
Among these SNPs, only three were different, including one same-sense mutation, one in gcd, and one in fhuE (Supplementary Table S5). Therefore, we concluded that associated SNPs with the outbreak and the cef R azi R isolates were almost the same. To some degree, this revealed that the SNPs found in outbreak isolates also had a strong correlation with other cef R azi R isolates. These SNPs may play an important role in transportation and energy production. 13 We have elaborated on "almost the same" in the revised manuscript: "In addition to the 10 highly significant SNPs, other SNPs found in the cef R azi R isolates were almost the same as those in the outbreak isolates, with only three SNPs different (one same-sense mutation, one in gcd, and one in fhuE) (Supplementary

Line 245: report
Response: Thank you for this comment. We have revised it in line 318: "Concerningly, here, we report the prevalence and outbreaks of shigellosis caused by S. sonnei with a combination of MDR, cef R azi R , reduced susceptibility to fluoroquinolones, and even col R ."

Line 257: may have emerged
Response: Thank you for this comment. We have revised it in line 330: "Our data showed that the Chinese S. sonnei isolates formed a main Chinese clade within the global Lineage III that may have emerged circa 1987."

Line 307: locally, in the local population
Response: Thank you for this comment. We have revised it in lines 380-381: "These results indicated that this distinct clone harboring the blaCTX-M-14-and mphA-positive plasmids might have been prevalent in the local populations and spread interregionally."

Line 322: rephrase ever reported
Response: We appreciate the valuable comment. We have rewritten the sentence to make it easier to understand: "We have reported the emergence of mcr-1-positive S. sonnei strains in China, 19 and here, we report the first outbreak of shigellosis caused by mcr-1-positive S. sonnei strains with the cef R azi R phenotype." (Lines 396-398)

Line 328: in the process > towards?
Response: Thank you for this comment. We have revised it in line 402: "these distinct plasmids might have been prevalent in the local populations and spread interregionally and seem to be evolving toward local establishment."

Line 333: Suggest to delete> … than just the treatment of Shigellosis.
Response: Thank you for the helpful suggestion. We have deleted these redundant words and revised the sentence from "which would represent a major threat to global public health and thus have a much broader implication than just the treatment of shigellosis" to "which would represent a major threat to global public health and thus have much broader implications." (Lines 405-406)

Line 336: While… ? check intended meaning of this sentence.
Response: Thank you for pointing out this problem. This sentence has been removed from the revised manuscript since the paragraph in which the sentence is located overlapped with the Results section.

Line 380: globalization
Response: Thank you for this comment. We have corrected it in line 456:  Response: Thank you for this comment. We apologized for our negligence in missing the introduction of the tool used. We have revised the description: "Plasmid sequence comparison and map generation were performed using BLAST 18 and custom Perl scripts, respectively." (Lines 573-575)

Line 507: analyses
Response: Thank you for your helpful comments. We have corrected it in line 600: "COGs of genes identified by pangenome analysis and a GWAS were determined by mapping to the COG function database using BLAST and visualized using R."  (lines 318, 397, and 451).

Responses to Reviewer #3
We used the software Mykrobe 10 to identify genotyping of the strains in our study, and we found that most of the Chinese strains belonged to the 3.7.6 genotype, corresponding to global Lineage III. The findings were presented in Figure 1C. We have added the corresponding descriptions in the Methods and Results sections: "Further genotyping of the strains was performed using Mykrobe software according to the study of Jane Hawkey et al. 10 " (Lines 558-559) "Further genotyping of the main Chinese clade strains showed that they belonged to the 3.7.6 genotype, corresponding to global Lineage III ( Figure 1C)." (Lines

Line 45: What is the number of outbreaks?
Response: Thank you for this comment. There were six outbreaks in our study during 2015-2020. We have changed "a series of" to "six" in line 43.

Line 48: Change "with mcr-1 gene" to "with an mcr-1 gene".
Response: Thank you for pointing out this error. We have added "an" before the "mcr-1 gene" in line 49: "The cef R azi R and col R phenotypes were attributed to the acquisition of diverse distinct plasmids, particularly the IncB/O/K/Z plasmid coharboring the blaCTX-M-14, mphA, aac(3)-IId, dfrA17, aadA5, and sul1 genes and the IncI2 plasmid with an mcr-1 gene."

Line 72: Please add a reference.
Response: According to your comment, we have added a reference to support our statement: "Antimicrobials, especially fluoroquinolones (e.g., ciprofloxacin and norfloxacin), third-generation cephalosporins (e.g., ceftriaxone), and macrolides (e.g., azithromycin), are recommended by the World Health Organization (WHO) for the treatment of shigellosis to accelerate recovery, prevent complications, and reduce onward transmission. 24 " (Lines 69-72) 7. Line 74: Change "locates" to "located" and "Global III Lineage" to "global Lineage III".
Response: Thank you for your valuable comment. We have accordingly revised the sentence in line 74: "this pandemic clone is located within global Lineage III, which recently emerged from Europe and underwent contemporary global dispersal and localized clonal expansion. 2,25 "

Lines 77-78: Change to "has been increasingly detected among MDR S. sonnei"
Response: Thank you for the suggestion. The sentence has been revised in line 78: "Moreover, resistance to fluoroquinolones has been increasingly detected among MDR S. sonnei" 9. Line 82: Change "within" to "at".

Response:
We are sorry for this mistake. We have reworded the sentence according to your comment in line 84: "Several studies have investigated the genomic epidemiology of S. sonnei at the global and regional levels"  Response: Thank you. We apologize for the confusion. In lines 168-169 of the original manuscript, we did refer to Extended Data Figure 1A and Figure 3.
Nonetheless, as per your comment below, we have included Extended Data Figure 1 in supplementary figures, which is now Supplementary Figure S3. Therefore, we have Response: Thank you for this comment. We have corrected "-log10(P) > 100" and "-log10(P) > 50" to "P < 1e-100" and "P < 1e-50", respectively, in the revised manuscript. Response: Thank you for your comments. We have elaborated on the significance of these mutations to the expansion of the MDR clones in the Discussion section: "The COG results of phenotype-related pangenomic genes and SNP sites showed that all these genes involved were related to energy production; carbohydrate transport and metabolism; transcription, replication, recombination, and repair; cell wall or membrane or envelope biogenesis, and other functions. Although the effects of these related genes on the biological functions of the strains cannot be determined by simple mathematical addition of the number of genes involved in these two COGs, there were many more genes associated with transport and energy production, such as the fep gene encoding iron enterobactin, msb gene encoding a full transporter, and tauB gene encoding the taurine ATP-binding component of a transport system 13,14 .
We hypothesize that the strains with the cef R azi R phenotype may have stronger nutrient transport and metabolic capacity, and to some extent, this may contribute to better adaptation to the environment of these strains. In addition, cell wall and membrane and envelope biogenesis 15   "The COG results of phenotype-related pangenomic genes and SNP sites showed that all these genes involved were related to energy production; carbohydrate transport and metabolism; transcription, replication, recombination, and repair; cell wall or membrane or envelope biogenesis, and other functions. Although the effects of these related genes on the biological functions of the strains cannot be determined by simple mathematical addition of the number of genes involved in these two COGs, there were many more genes associated with transport and energy production, such as the fep gene encoding iron enterobactin, msb gene encoding a full transporter, and tauB gene encoding the taurine ATP-binding component of a transport system 13,14 . We hypothesize that the strains with the cef R azi R phenotype may have stronger nutrient transport and metabolic capacity, and to some extent, this may contribute to better adaptation to the environment of these strains. In addition, cell wall and membrane and envelope biogenesis 15 -associated genes, such as the phage-related lysozyme gene rrrD, may play a role in adaptation to poor environments. These may be important reasons why certain waterborne strains can survive in water and contribute to a series of outbreaks." (Lines 408-421) 28 shows that the plasmids of ~60kbp carried the mcr-1 gene. Panels C and D show that the plasmids of ~100kbp carried the blaCTX-M-14 and mphA genes.
Following your suggestion, we have added corresponding markers to panels B, C, and D of the revised Supplementary Figure S1 and revised its legend: Figure S1. S1-PFGE and Southern blotting analysis of the mcr-1-, blaCTX-M-14, and mphA-positive plasmids. Eleven representative isolates, including ten from patients and one from tap water, were selected for S1-PFGE and Southern hybridization analysis. The universal standard strain Salmonella braenderup H9812 digested with XbaI was used as the size marker for the PFGE and southern. The size of the Nylon membrane was the same as the PFGE. A S1 nuclease plasmid profile obtained by PFGE. B Southern hybridization for the mcr-1 gene. C Southern hybridization for the blaCTX-M-14 gene. D Southern hybridization for the mphA gene. S1-PFGE S1 nuclease pulsed-field gel electrophoresis. Figure 1 and Extended Data Table 1: It is confusing to have a supplemental Figure S1, and also an Extended Data Figure  Response: Thank you for your comments to help us improve the quality of this paper.

Extended Data
In the revised manuscript, we have included Extended Data Figure 1 and Extended Data Table 1 in the Supplementary Information as Figure S3 and Table S2, respectively.
As suggested, we have added Supplementary Figure S4 to present the distribution,  Table S1. Characteristics of the patients from the six outbreaks Table S2. Antimicrobial resistance of the Chinese S. sonnei isolates including the outbreak strains