Genomic profiling of antimicrobial resistance genes in clinical isolates of Salmonella Typhi from patients infected with Typhoid fever in India

The development of multidrug resistance in Salmonella enterica serovar Typhi currently forms a major roadblock for the treatment of enteric fever. This poses a major health problem in endemic regions and extends to travellers returning from developing countries. The appearance of fluoroquinolone non-susceptible strains has resulted in use of ceftriaxone as drug of choice with azithromycin being recommended for uncomplicated cases of typhoid fever. A recent sporadic instance of decreased susceptibility to the latest drug regime has necessitated a detailed analysis of antimicrobial resistance genes and possible relationships with their phenotypes to facilitate selection of future treatment regimes. Whole genome sequencing (WGS) was conducted for 133 clinical isolates from typhoid patients. Sequence output files were processed for pan-genome analysis and prediction of antimicrobial resistance genes. The WGS analyses disclosed the existence of fluoroquinolone resistance conferring mutations in gyrA, gyrB, parC and parE genes of all strains. Acquired resistance determining mechanisms observed included catA1 genes for chloramphenicol resistance, dfrA7, dfrA15, sul1 and sul2 for trimethoprim-sulfamethoxazole and blaTEM-116/blaTEM-1B genes for amoxicillin. No resistance determinants were found for ceftriaxone and cefixime. The genotypes were further correlated with their respective phenotypes for chloramphenicol, ampicillin, co-trimoxazole, ciprofloxacin and ceftriaxone. A high correlation was observed between genotypes and phenotypes in isolates of S. Typhi. The pan-genome analysis revealed that core genes were enriched in metabolic functions and accessory genes were majorly implicated in pathogenesis and antimicrobial resistance. The pan-genome of S. Typhi appears to be closed (Bpan  =  0.09) as analysed by Heap’s law. Simpson’s diversity index of 0.51 showed a lower level of genetic diversity among isolates of S. Typhi. Overall, this study augments the present knowledge that WGS can help predict resistance genotypes and eventual correlation with phenotypes, enabling the chance to spot AMR determinants for fast diagnosis and prioritize antibiotic use directly from sequence.

plasmid mediated antimicrobial resistance determinants by horizontal transfer has prevented its reuse in clinical practice 8 . Sporadic episodes of extended spectrum beta lactamase genes conferring resistance to current third generation cephalosporins have also emerged 9,10 . Recent emergence of S. Typhi isolates resistant to azithromycin in Bangladesh indicated an increasing trend in resistance to azithromycin as compared to the rest of the world 11 . Of great concern is a recent study from Pakistan that reports the outbreak of enteric fever caused by extremely drug resistant (XDR) strains of S. Typhi 12 . In this scenario of evolving resistance mechanisms, an examination of the basis of resistance to existing drugs coupled with a genotypic-phenotypic analysis can aid in devising alternative therapeutic targets or strategies for the treatment of multidrug resistant bacterial infections.
Advancements in next generation sequencing technologies have aided whole genome sequencing (WGS) and evaluation of the complete DNA sequence of a bacterium, making it an ideal technique for surveillance 13 . WGS provides definitive genotype information and gives best possible resolution for characterization of an individual organism. Furthermore, strains possessing identical resistance phenotypes conferred by different mechanisms can also be differentiated 14 . Antimicrobial resistance determination by WGS can complement traditional laboratory-based surveillance and provide direct insights into their evolution and transmission from one strain to another. Current genome sequencing methods afford improved and exhaustive data related to the pathogen genotypic characteristics together with the identification of virulence determinants, antimicrobial resistance genes and serotypes whereas conventional antimicrobial susceptibility tests yield the phenotypes of strains. WGS data can help in revealing the antibiotic resistance mechanism 15-18 for drugs not being tested routinely or where the mechanisms of antimicrobial resistance are not yet identified. Several studies have established a strong antimicrobial genotypes-phenotypes correlation in E. coli, Campylobacter and non-typhoidal Salmonella respectively 16,19,20 .
Antimicrobial resistance is either caused by mutations in chromosomal genes (intrinsic resistance) or by acquisition of plasmid mediated resistance determinants (extrinsic resistance). The intrinsic resistance is mainly due to selection pressure whereas extrinsic genes are acquired by horizontal transfer 21 . The recent trends in the development of antimicrobial resistance (AMR) among S. Typhi in India proposed that multidrug resistant (MDR) enteric fever was decreasing in India and being replaced by enteric fever with fluoroquinolone resistant strains 22 . Similarly, hospital-based genomic surveillance for enteric fever in Bengaluru, India using WGS method suggested that large number of isolates showed non-susceptibility to fluoroquinolones 23 . Genome characterization assessment for AMR and pan genome of clinical isolates in endemic countries like India will assist recognition of variations in resistance mechanism and epidemiology leading to better selection of antimicrobial therapy and implementation of appropriate preventive measures. Though typhoidal fever is common across India and specifically in the northern sub-continent, the diversity between the strains is yet unexplored.
In the present study, 133 strains of S. Typhi isolated from patients presented with enteric fever to AIIMS hospital during past 24 years were subjected to whole genome sequencing. The pan genome was constructed and the occurrence of plasmid mediated antimicrobial resistance conferring genes and gene mutations determining resistance to anti-typhoidal agents analysed. This was further compared to the phenotypes of each respective strain. The pan genome analysis indicated that the genome remains almost closed. The genetic diversity among S. Typhi strains were investigated using the available online resources, such as BacWGSTdb 24 , Center for Genomic Pathogen Surveillance, and Centre for Genomic Epidemiology. These tools offer bacterial typing, rapid classification, source tracing, and phylogenetic relatedness linked to antibiotic resistance genes and clinical data important in a globalized community.

Result and Discussion
Salmonella draft genome. A total of 13,645 contigs varying from 48 to 654 with an average of 102.59 contigs per genome were generated. The average genome size of 4.6 Mb with reads per genome of 47,78,163 base pairs corroborates with other Salmonella strains 25 . Likewise, observed averaged G + C content of 51.97% corresponded to other isolates 26 . Functional annotation of the genome predicted a total of 4,607 CDSs, 73 tRNAs, and 1 transfer-messenger RNA. Analysis by Kruskal-Wallis statistical test did not yield any significant differences (P ≤ 0.05) amongst the strains related to their CDS, genome size, GC content, and average gene number. The submission detail for isolates, including statistics of genome assembly and transcript annotation is summarized in Table S1 in Supplementary File 1.
Salmonella pan-and core-genome. The pan-genome furnishes a comparative analysis across same species but different strains to determine specific features of that species. The constructed pan-genome offers insights into shared and diverse roles of genes amongst studied isolates. BPGA pipeline with similarity threshold of 90% identified a total of 4185 (70.79%) non-redundant core genes present in all genomes (Fig. S1, Supplementary File 2) and 1273 (21.53%) non-redundant accessory genes occurring in at least one but not present in all genomes. Nearly 454 (7.68%) strain specific unique genes (singletons) with no orthologs in corresponding genomic strains were observed. Overall, the pan-genome with 5912 genes, contains 1.28 times more gene content than each individual Salmonella strain. The core to pan-genome ratio disclosed a relatively high degree of conservation with the core-genome comprising 70.79% of the pan-genome. The contribution of 1432 new gene families to the pan-genome from 91 genomes implies that it contained on an average 9.22 new genes per genome. The estimated core-genome size (4185 genes) was higher than previously reported core-genomes of S. Typhi (3944 genes) 27 and S. Typhimurium (3846, 3910 and 3890 genes) [28][29][30] . The predicted core genes from this study were found to be ranging from 4016 to 4545 as determined by various pan-genome tools using S. Typhi data 31 . Addition of 42 genomes did not supplement the core genome indicating that further augmentation of genomes would not significantly decrease the size of core genome. Genome-wise statistics revealed a wide variation in the shared accessory genes from 129 (PGS-33) to 523 (PGS-26). The singletons were exclusively present in 76 isolates (57.14%) with the highest in PGS-123 (48 genes). Likewise, 63 genes were exclusively absent in 27 isolates (20.30%) with a maximum of 11 in PGS-65. The presence or absence of gene-families might be attributed to specific roles in emergence of virulence and adaptation to their respective habitats. The pan-genome size and distribution is summarized in Table S2 in Supplementary File 1.
The closed or open nature of pan-genome evaluates the versatility of the studied strains. The expected gene number in the pan and core-genome was calculated via curve fitting in accordance with Heaps' law 32 , Y pan = A pan. x B pan + C pan , where y is the size of pan-genome, x is the number of genomes and A pan , B pan , and C pan the fitting parameters and exponential equations (Y core = A core. e-B core .x + C core ), respectively. In this equation, B pan defines whether the pan-genome is closed (B pan < 0 or B pan > 1) or open (0 < B pan < 1) 33 . Total number of gene families (pan-genome) and shared gene families (core genome) are plotted for a given number of genomes added sequentially. The plot indicated a minor expansion of the pan-genome (Bpan = 0.09) while the core genome appeared to extend to almost a closed state as the addition of newer genomes failed to increment any novel gene in the existing genome (Fig. 1). This supports earlier studies demonstrating S. Typhi to be a closed pan-genome 34 .
Functional distribution of genes in pan-genome. The pan-genome was further gauged for functional role of the constituent genes. The core, accessory and unique genes were examined for their diverse features by exploring the functional databases COG and KEGG [35][36][37][38] . This annotation reflected that the genes are scattered in an assorted range of diverse functional categories throughout the genome. The COG distribution (Fig. 2) predicted the core genome (41.82%) to be predominantly associated with metabolic functions like transport and metabolism of nucleotides, amino acids, coenzyme, carbohydrates, secondary metabolites biosynthesis, lipids and inorganic ions, energy production and conversion and transport and catabolism 39,40 . These categories had some representation in the accessory genome. Some genes were implicated in translation and ribosomal structure mechanism. This analysis suggests that the majority of the core genes are essential for cell survival and necessary for the basic activity of the species. Nearly one-third genes in the accessory (33.86%) and unique (33.31%) genome were primarily involved in basic cellular processes of 'information storage and processing' , namely translation, ribosomal structure and biogenesis, replication, recombination, repair, and transcription. Though observed in the core genome, these genes were highly represented in the accessory genome signifying inter-species variations. The 'cellular processes and signalling' category was enriched among the core genes (21.65%) and depleted in the unique (13.32%) genome. More than one-fifth genes in the core (20.64%), accessory (24.10%) and unique (24.65%) genome were either poorly categorized or uncategorized (Fig. 3). The COG functional distribution of pan-genome is summarized in Table S3 in Supplementary File 1.
The KEGG functional distribution revealed similar overall division of core genes associated with "metabolisms" that accounted for a huge part in core (61.32%) compared to accessory (24.59%), and unique (13.50%) genomes (Fig. 3). Genes were mainly associated with 'carbohydrate metabolism' , 'overview' , 'amino acid metabolism' and 'energy metabolism' . A sizeable gene proportion in unique (88%) and accessory (50.55%) genome, compared to the core-genome, participated in 'human diseases' related to 'infectious diseases' , 'immune diseases' , 'cancer' and 'cardiovascular disease' . Remarkably, 70 genes in the core-genome were associated with pathway 'drug resistances: antimicrobial' comprising 38 ko01503: cationic antimicrobial peptide (CAMP) resistance genes, 23 ko01501: beta-lactam resistance genes, and 9 ko01502: vancomycin resistance genes. This suggests that antibiotic resistance genotype plays a vital role in determining S. Typhi susceptibility. Overall 2711 (64.78%), 490 (38.49%), and 413 (90.97%) genes in core, accessory and unique-genome, respectively were assigned to a KEGG category. The KEGG functional distribution of pan-genome is summarized in Table S4 in Supplementary File 1.
Genotypic and Phenotypic resistance to antityphoidal agents. Antimicrobial susceptibility patterns were detected by WGS for resistance determining genes (genotype) and disk diffusion method (phenotype) for first line antibiotics (co-trimoxazole, amoxicillin, and chloramphenicol), fluoroquinolones (ciprofloxacin, and pefloxacin), third generation cephalosporins (cefixime and ceftriaxone) and macrolides (azithromycin) as per CLSI guidelines. All the strains which were collectively resistant to ampicillin, chloramphenicol and co-trimoxazole were considered as MDR (Multiple drug resistant) strains. The relationships between resistance gene content identified from WGS was deciphered with the drug resistance profile for each corresponding clinical isolate (phenotype). None of the 133 strains tested by phenotypic method was found to be susceptible to all antimicrobial agents. Out of the 133 strains, 111 (83.5%; 95% CI, 76.2% to 88.8%) revealed non-susceptibility to one antibiotic and 23 (17.3%; 95% CI, 11.8% to 24.6%) showed non-susceptibility to two or more antibiotics. The antibiotic susceptibility patterns are discussed individually.
Amoxicillin resistance. The amoxicillin resistance is associated with the presence of beta-lactam genes which were observed in 15.79% (21/133) strains by WGS. The most common beta-lactam resistance gene bla TEM-1B was observed in 19 strains, TEM-1 in 16 strains and bla TEM116 in one strain (Table 1; Table S5 in Supplementary File 1). The resistance genes encode for the predominant plasmid-mediated β-lactamases of Enterobacteriaceae 41 . Overall, antimicrobial resistance was observed in 12.03% (16/133) isolates by phenotypic method. Earlier reports for amoxicillin resistance in Salmonella strains isolated pan-India was 3% 4 . The sensitivity and specificity was 100% and 95.73%, respectively for beta-lactams ( Fig. 4; Table 3; Table S7 in Supplementary File 1).
Chloramphenicol resistance. Chloramphenicol resistance determinants were observed in 15.79% (21/133) strains by WGS (Table 1; Table S5 in Supplementary File 1). Nineteen non-susceptible strains harboured catA1 gene which encodes chloramphenicol acetyltransferase enzyme causing chloramphenicol resistance by chemical modification of the drug molecule, whereas sixteen isolates harboured the catI genes. Our findings are consistent with other studies reporting chloramphenicol susceptibility in S. enterica 4,32 . Antimicrobial resistance to chloramphenicol was 8.27% (11/133) by disk diffusion method. Similar findings have been reported by other studies where resistance gene carriage rate was higher than phenotypically reported resistance 42,43 . The sensitivity was 100% and specificity was 91.80% for phenicol resistance ( Fig. 4; Table 3; Table S7 in Supplementary File 1).
www.nature.com/scientificreports www.nature.com/scientificreports/ Co-trimoxazole resistance. Trimethoprim-sulfamethoxazole were considered in combination for treatment as the first-line drug, co-trimoxazole. Out of 133 strains, trimethoprim resistance determining genes were found in 15% isolates (20/133). The most prevalent dfrA7 was observed in 18 isolates followed by dfrA15 and dfrA1 in three isolates. Likewise, gene sul1 and sul2, encoding dihydropteroate synthases known to disseminate sulfamethoxazole resistance, were detected in overall 15.79% isolates (21/133), where 16 isolates harboured both sul1 and sul2 genes and separately sul1 and sul2 genes alone were observed in 3 and 2 strains, respectively ( Table 1; Table S5 in Supplementary File 1). Overall, antimicrobial resistance to co-trimoxazole was detected in 21 (15.79%) strains by WGS as compared to 12 (9.02%) by phenotypic method. The trimethoprim and sulfamethoxazole resistance markers individually were perceived in two isolates. These results are in concordance with previous findings from India which reported antimicrobial resistance in 6% and 5% of typhoidal salmonella isolates 4,32 . The sensitivity was 100% and specificity was 99.56% for co-trimoxazole ( Fig. 4; Table 3; Table S7 in Supplementary File 1).

Figure 2.
COG functional distribution of pan-genome. The histogram illustrates the predicted functionality of proteins assigned to core (orange), accessory (yellow) and unique (green) genes. The COG families were broadly grouped into four category namely information storage and processing (assigned to the L, K, and J categories), cellular processes and signalling (assigned to the V, U, T, O, N, M and D categories), metabolism (assigned to the P, Q, I, H, F, E, G, and C categories), and poorly characterized (assigned to the S, and R categories).  . KEGG functional distribution of pan-genome. The graph represent the KEGG functionality of proteins assigned to core (green), accessory (blue) and unique (yellow) genes. The KEGG families were broadly grouped into five category namely cellular processes, human diseases, genetic information processing, environmental information processing, metabolism, and organismal systems. (2020) 10 Supplementary File 1). The identified genes were associated with mutations in Quinolone Resistance Determining Region of DNA gyrase enzyme, the binding site for fluoroquinolone. Antimicrobial resistance to fluoroquinolones was 97.74% (130/133) by both disc diffusion and E-test method. MIC distribution ranged between 2-24 mg/L and peaked at 12 mg/L. A gradual increase in decreased susceptibility to ciprofloxacin was demonstrated in a longitudinal analysis of past 20 years 44 . DNA Gyrase A mutations at position 83 (Ser-83→Phe and Ser-83→Tyr) are the most prevalent resistance mechanisms for fluoroquinolone in India 45 , followed by Ser-80→Ile substitution in parC gene. We also found one fluoroquinolone susceptible strains with mutation in gyrA gene which might be occurring due to bacterial promoter modifications leading to the overexpression of other DNA topoisomerases. Highly non-susceptible strains (with ciprofloxacin MIC > 8 mg/L) were found to be double or triple mutants with mutations in gyrA83, gyrA87 and parC80. Strains with moderate resistance to ciprofloxacin possessed single mutations in DNA gyrA gene at Ser83 position. The observed sensitivity was 99.23% and specificity was 100% for both ciprofloxacin and peflox resistance ( Fig. 4; Table 3; Table S7 in Supplementary File 1).

Cephalosporins resistance.
• Antimicrobial susceptibility to antibiotics, cefixime and ceftriaxone, observed for all strains is consistent with other studies from India 1,32 . Resistance determinants were also not found for cephalosporins (Table 1; Table S5 in Supplementary File 1). Though all the strains were susceptible, however, a gradual increase in median MIC values was perceived over a time period. This clearly raises an alarm towards the judicial use of these antibiotics.
Azithromycin resistance. Only 0.75% non-susceptible strains (1/133) to azithromycin were observed by WGS as compared to 1.50% (2/133) strains by phenotypic method. The non-susceptible strains harboured mutations in the acrB efflux pump regulator genes that have been known to confer macrolide resistance 46 . In this study, we detected R717Q mutation on AcrB efflux pump as a cause of azithromycin resistance in S. Typhi 11 (Tables 1-2; Table S5-S6 in Supplementary File 1). AcrB-R717Q precisely leads to increased macrolide resistance. This study indicated that most of the S. Typhi strains were susceptible to azithromycin. These strains were screened separately for presence of macrolide resistance determinants (mphA, ermA, ereA, ereB, mefA and msrA genes) by PCR and presence of mutations in rlpD and rlpV, where also no macrolide resistance mechanism was observed 46 . The sensitivity was 50% and specificity was 100% for azithromycin resistance ( Fig. 4; Table 3; Table S7 in Supplementary File 1).  (Table 4; Table S8 in Supplementary File 1). In addition, one novel ST, aroC1 was identified which consists of a missense substitution (C-> T) compared to the predominant aroC1 allele present in almost all S. Typhi strains. MLST analysis revealed the presence of ST1 in all strains, irrespective of the geographical location, whereas ST2 was completely absent in strains isolated at Chennai and Wardha city of India. We next investigated the epidemiological outbreak using the phylogenetic tree based on gene-by-gene-based core genome MLST (cgMLST) method. The analysis yielded a total of 3002 core loci, where the number of successfully called alleles per isolate ranged from 2664 (88.74%) to 2706 (90.14%) depending on the gene. For high-resolution subtyping, core-genome sequence types (cgST) were considered, where the allele matches in cgST per isolate ranged from 2634 (87.74%) to 2699 (89.91%). The cgST-based tree showed that isolates with identical ST1 (119 strains) and novel ST (1 strain) were clustered in one group, whereas isolates with ST2 (23 strains) were separated into outlier cluster (Fig. S2 in Supplementary File 3; Table S9 in Supplementary File 1). Inter-strain diversity among all isolates yielded the Simpson's diversity index (SDI) equal to 0.51. The result demonstrates a lower level of genetic diversity and epidemiological links among isolates of S. Typhi.
Population structure and QRDR mutation patterns. The genotyphi classification according to the scheme of Wong et al. 50 (Fig. S3 in Supplementary File 4) and recombination free (Fig. S4 in Supplementary File 5) phylogenetic analysis was carried out separately to locate outbreak clusters based on SNPs shared among S. Typhi isolates collected from diverse geographical locations 55 . The tree showed that the strains were mainly grouped into two clusters, where 17.29% of strains (23/133) clustered close together with a distinct segregation from others strains (109/133). The phylogenies revealed that 93.39% of the reference genome (S. Typhi CT18) was covered in all isolates. The minimum and maximum SNPs differences in comparison to reference genome was observed for PGS-123 (275 SNPs) and PGS-66 (521 SNPs), respectively (Table S10 in Supplementary File 1). SNP tree was further explored to investigate the distribution and mutation patterns in the genes encoding DNA gyrase (gyrA and gyrB) or topoisomerase (parC and parE) that reside in the quinolone resistance-determining region (QRDR). The overall high genotypic (96.99%) and phenotypic (97.74%) resistance to ciprofloxacin and pefloxacin was observed owing to QRDR mutants in Salmonella isolates ( Fig. 4; Table S6 in Supplementary File 1). This is similar to the previously reported cases where the QRDR mutants were dominant in India (97%), Bangladesh (94%), Nepal (66%), Pakistan (94%), Myanmar (100%), Uganda (100%), and Nigeria (50%) 54 . The close examination of the tree exposed the genotype-based clustering in S. Typhi. The minimum and maximum numbers of SNP differences within the cluster strains were observed to be low as compared to strains belonging to different   The mutation frequency of gyrA_S83F (n = 74; 55.64%) observed in our data was accordance with previously reported cases of travel to South Asia including India (n = 71), Pakistan (n = 134), and Bangladesh (n = 25), and in contrast with travel history to Central Asia (n = 1), Europe (n = 1), Middle East (n = 1), and South America (n = 1) 54 . The mutation frequency of gyrA_S83Y (n = 48; 36.09%) was similar with previously discussed cases of travel to India (n = 56). The shared QRDR mutations between S. Typhi lineages in the phylogeny indicated that they might have inherited from a shared common ancestor. Genotype-associated mutation patterns are given in Table 4 and Fig. S4 in Supplementary File 5.

Methods
Study design. This study was conducted on clinical strains isolated from the blood culture of patients presented with typhoid fever from January 1993 to December 2016. We used blood culture isolates from bacteriology laboratory, A.I.I.M.S., New Delhi, where the samples were sent as a part of routine diagnosis. All the procedures were performed as per CLSI guidelines. All the experiments and data analysis were completed at All India Bacterial isolates. Clinical isolates of S. Typhi were revived from the archived cryopreserved collection of blood culture isolates and identified by standard biochemical and serological methods as described earlier 57 . The collection comprised 686 strains from which 133 strains of S. Typhi were selected using random number generator function in Stata v.14.1 (StataCorp, College Station, TX, USA) 58 . The strains were serially named as PGS1 (Pan Genome Study) to PGS133 for simplicity in data analysis.
Whole genome sequencing. Genomic DNA isolated from freshly grown overnight culture (QIAamp DNA minikit; Qiagen, Germany) was quantified using Qubit fluorometer (Life Technologies, USA). Sequencing library preparation was by Illumina Nextera XT DNA sample preparation kit (Illumina, USA). Genomic DNA samples were fragmented using Covaris M-series (M220) at temperature of 5.5 to 6 °C for 40 seconds. DNA fragments were end repaired using dA bases before ligation with Illumina indexed adapters, amplified for 10 cycles of PCR and sequenced employing v2 and v3 chemistry with paired-end 2 × 151 bp reads on Illumina MiSeq (Illumina, USA). Output data files were de-multiplexed and transformed with Casava v.1.8.2. into FASTQ files (Illumina, Inc, USA).
Preprocessing, genome assembly and analysis. The selected sequence paired-end reads from FastQC v0.11.4 59 were pre-processed and assembled de novo with A5-miseq pipeline 60 . Sequence adapters and low-quality (<Q30) regions were filtered with trimmomatic v0.36 61 . Read errors were corrected by SGA's k-mer-based error correction algorithm 62 . Paired and unpaired reads were assembled utilizing IDBA-UD algorithm and quality of genome assembly evaluated by QUAST (http://quast.sourceforge.net/quast) 63 . The redundant homologues with identity cut-off of 0.9 were removed from assembled de novo contigs by CD-HIT 64 . The assembled genomes were aligned to a reference genome of S. Typhi to avoid the risk of cross-contamination. The assembled bacterial genomes were annotated with Prokka v1.12 65 (http://www.vicbioinformatics.com/software.prokka.shtml). The total numbers of coding sequence regions (CDS) in annotated genomes were compared among all strains to filter out outlier strains.
Pan-genome construction. To identify strain-specific genomic features and genomic diversity among S.
Typhi isolates, the pan-genome was constructed using computational pipeline BPGA 66 . The entire set of proteins served as input for the BPGA analysis. The clustering of genes into families was generated with USEARCH 67 with 90% sequence identity as a cutoff. To reduce the impact of outliers, the core-and pan-genome size was determined for two isolates at first, and then the remaining isolates were added iteratively till the exact contaminant strain liable for shrinking the core-genome size was detected. The outliers were excluded in the final version of pan-genome. In order to evade any bias during the sequential inclusion of genomes, random sequence/order of genomes permutations were performed. The pan-genome functional analysis module of BPGA was used to assign Cluster of Orthologous Genes (COG) 35 and Kyoto Encyclopedia of Genes and Genomes (KEGG) 36 classes to the core, accessory, and unique gene families. The results were plotted with Gnuplot and PanGP 68,69 .
Correlation of susceptibility genotypes and phenotypes. Overall 1064 phenotypic data analysis points detected from 133 strains through phenotypic antimicrobial susceptibility testing were correlated with the existence of corresponding plasmid mediated resistance gene(s) as well as structural gene mutations. Phenotypically intermediate and resistant strains were together referred as non-susceptible in this investigation. Considering phenotypic results as the reference outcome, sensitivity of genomic method of antimicrobial susceptibility determination was calculated by dividing the number of genotypically non-susceptible isolates by the overall sum of isolates displaying phenotypic resistance. Specificity was determined by dividing the number of genotypically susceptible isolates by total number of phenotypically susceptible isolates.
Phylogenetic tree based on cgMLST and SNPs. Whole-genome sequencing (WGS)-based in silico multi-locus sequence typing (MLST) of Salmonella strains were performed using MLST 2.0 server 74 . The gene-by-gene-based core genome MLST (cgMLST) was completed using cgMLSTFinder (version 1.1) server 75 , whereas referenced-based SNP tree using maximum likelihood method was constructed by CSI phylogeny (version 1.4) server 76 from the Centre for Genomic Epidemiology (www.genomicepidemiology.org). The CSI phylogeny build phylogenetic tree is based on the concatenated alignment of the high-quality SNPs, where SNPs were filtered out if the depth at the SNP position was not at least 10x or at least 10% of the average depth for the particular genome mapping. SNPs were also filtered out if the mapping quality was below 25 or the SNP quality was below 30. The genotypic profile and recombination free SNP tree was built using Pathogenwatch from the Center for Genomic Pathogen Surveillance (CGPS, available at https://pathogen.watch). The phylogenetic tree was visualized using FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/) 77

Conclusions
Salmonella Typhi continues to pose a major challenge in treatment as it causes invasive infections and acquires antimicrobial resistance (AMR) genes to become non-susceptible to available drugs 79 . As the pipeline for new antityphoidal drugs are exhausted, it has become imperative to explore whole genomes to understand the characteristics and search for novel diagnostic targets. Findings from this study revealed a high correlation between the phenotypes and their corresponding genotypes. The pan-genome analysis revealed the characteristics of gene pool shared by clinical Salmonella Typhi isolates, indicating that the core genes were enriched in metabolism related function whereas accessory genes were majorly implicated in pathogenesis and antimicrobial resistance mechanisms. The cgMLST and SNP-based phylogenetic analysis revealed lower level of genetic diversity where a larger of the strains belonged to the same outbreak and few were separated into outlier cluster. This work overall highlights the importance of WGS-based resistance gene screening for the tracking of AMR mechanisms in S. Typhi and thus can 66 contribute to the battle against the expanding AMR threat worldwide.

Data availability
WGS data of all 133 Salmonella isolates have been submitted to the National Center for Biotechnology Information (NCBI; https://submit.ncbi.nlm.nih.gov/subs/bioproject/) under BioProject accession number PRJNA564922.