Comparative genomics of Australian and international isolates of Salmonella Typhimurium: correlation of core genome evolution with CRISPR and prophage profiles

Salmonella enterica subsp enterica serovar Typhimurium (S. Typhimurium) is a serovar with broad host range. To determine the genomic diversity of S. Typhimurium, we sequenced 39 isolates (37 Australian and 2 UK isolates) representing 14 Repeats Groups (RGs) determined primarily by clustered regularly interspaced short palindromic repeats (CRISPR). Analysis of single nucleotide polymorphisms (SNPs) among the 39 isolates yielded an average of 1,232 SNPs per isolate, ranging from 128 SNPs to 11,339 SNPs relative to the reference strain LT2. Phylogenetic analysis of the 39 isolates together with 66 publicly available genomes divided the 105 isolates into five clades and 19 lineages, with the majority of the isolates belonging to clades I and II. The composition of CRISPR profiles correlated well with the lineages, showing progressive deletion and occasional duplication of spacers. Prophage genes contributed nearly a quarter of the S. Typhimurium accessory genome. Prophage profiles were found to be correlated with lineages and CRISPR profiles. Three new variants of HP2-like P2 prophage, several new variants of P22 prophage and a plasmid-like genomic island StmGI_0323 were found. This study presents evidence of horizontal transfer from other serovars or species and provides a broader understanding of the global genomic diversity of S. Typhimurium.


Supplementary text
Plasmids and antibiotic resistance
We further analysed the distribution of plasmids and antibiotic resistance genes among the 105 strains. Antimicrobial resistance genes were identified using ResFinder 1. Contigs that did not align with LT2 were identified using progressiveMauve.
To determine the homologues and functions of the unaligned sequences, these contigs were searched against the GenBank non-redundant nucleotide database using BLASTn 2.
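As an illustration of how BLASTn results for unaligned contigs might be summarised, the following minimal Python sketch keeps the highest-scoring hit per contig. It assumes the search was run with tabular output (`-outfmt 6`), which the text does not state. The function name `best_hits`, the identity threshold and the example rows are hypothetical and are not taken from this study.

```python
import csv
import io

# Default 12 columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_identity=0.0):
    """Return {contig: hit}, keeping the highest-bitscore hit per contig.

    Hits below min_identity (percent) are ignored. The threshold is an
    illustrative parameter, not a value used in the study.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["pident"] < min_identity:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up rows for demonstration only (hypothetical contig and subject names).
example = io.StringIO(
    "contig_1\tsubj_A\t94.0\t5000\t300\t0\t1\t5000\t1\t5000\t0.0\t7000\n"
    "contig_1\tsubj_B\t78.0\t5000\t1100\t0\t1\t5000\t1\t5000\t0.0\t9000\n"
    "contig_2\tsubj_C\t99.0\t400\t4\t0\t1\t400\t1\t400\t1e-150\t500\n"
    "contig_2\tsubj_D\t98.5\t600\t9\t0\t1\t600\t1\t600\t0.0\t800\n"
)
hits = best_hits(example, min_identity=80.0)
for contig, hit in sorted(hits.items()):
    print(contig, hit["sseqid"], hit["pident"])
```

Ranking by bitscore is one simple choice. Ranking by query coverage or by alignment length would be equally defensible, depending on how fragmented the contigs are.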
DT97 had a 108 kb plasmid with 78% and 94% DNA sequence similarity to pSTM7 (KF290377) and pC49-108 (KJ484638), respectively. Apart from pSLT, Overall, antibiotic resistance genes were randomly distributed among different sub-lineages, except that DT97, L1874 and DT193 in RG6B, T000240 and ST1660/06 in RG2, and 138736, DT104, L1860 and U302 in RG8 shared similar antibiotic resistance patterns. In 138736, DT104, L1860 and U302 from RG8, the antibiotic resistance genes were chromosomal and located in the SGI1 genomic island. T000240 and ST1660/06 from RG2 also contain multiple antibiotic resistance genes in their chromosomes, as described previously 4.

Figure S1
Phylogenetic tree of 105 S. Typhimurium strains based on their SNPs obtained from S. Typhimurium core genome. The minimum evolution method was used to infer evolutionary relationships of the isolates. The SNPs supporting each branch are shown next to the branches. Scale bar indicates the number of SNPs.
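For readers unfamiliar with SNP-based distances, the sketch below shows one simple way to count pairwise SNP differences from aligned core-genome sequences. It is not the pipeline used to build this tree and it does not implement the minimum evolution method. It only illustrates the kind of distance matrix such methods take as input. The isolate names and sequences are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Positions with a gap or ambiguous base (anything other than A, C, G, T)
    in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def distance_matrix(alignment):
    """Return {(name_a, name_b): SNP count} for every pair of sequences."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


# Hypothetical toy alignment, for demonstration only.
alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGAAC",
    "isolate_3": "ACCTACGANC",
}
distances = distance_matrix(alignment)
for pair, count in distances.items():
    print(pair, count)
```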

Figure S2
Phylogenetic tree of core genome sequence of P2 phage. The minimum evolution method was used to infer evolutionary relationships of the P2 phages. The SNPs supporting each branch are shown next to the branches. Scale bar indicates the number of SNPs.

Figure S3
Phylogenetic tree of core genome sequence of SPN9CC phage found in this study. The minimum evolution method was used to infer evolutionary relationships of the SPN9CC phages. The SNPs supporting each branch are shown next to the branches.

Figure S4
Summary of the phylogenetic relationships among the sequences of 24 P22-like prophages and their homology with other serovars or Escherichia coli. Left: the UPGMA tree based on the presence and absence of DNA fragments in the P22 pan-genome was inferred using DendroUPGMA. The scale bar indicates the number of DNA fragment differences. The tree is drawn to scale, with branch lengths in the same units as those of the distances used to infer the UPGMA tree. Right: the sequence similarity of the 24 P22-like phages and their homology with other serovars or E. coli. The cut-off value of sequence similarity is 0.9. Phage P22 gene functions are shown at the top in the order in which the genes are located on the phage pan-genome (not drawn to scale). Gene functions were annotated by RAST (except for the use of P22 phage gene names for ORF485, gtrB, gtrA, int, ORF56, Ea cluster, ORF30 and intergenic region (IG)). The different colours of the phage names indicate the genus of their host as follows: blue: only found in Escherichia coli; black: not found in any other serovars or E. coli but only in Typhimurium; yellow: Dublin/Choleraesuis; orange: Heidelberg; purple: Newport/Paratyphi A; green: only found in one serovar except for Heidelberg; red: found in more than three serovars. Black vertical lines indicate protein/gene boundaries.
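To make the UPGMA clustering on fragment presence/absence concrete, here is a minimal pure-Python sketch. It is not the implementation of the tool named in the caption. It assumes the distance between two prophages is the number of fragments present in one but not the other, which matches the scale-bar description. The profile names and values are hypothetical.

```python
from itertools import combinations


def fragment_distance(a, b):
    """Number of fragments present in one profile but absent from the other."""
    return sum(x != y for x, y in zip(a, b))


def upgma(profiles):
    """Cluster presence/absence profiles by UPGMA.

    Returns a nested tuple (left, right, height), where height is half the
    average distance between the two merged clusters. Leaves are names.
    """
    names = list(profiles)
    # Each cluster id maps to (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): fragment_distance(profiles[names[i]], profiles[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda p: dist[frozenset(p)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        height = dist[frozenset((a, b))] / 2
        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean, i.e. the average over all leaf pairs.
        for c in clusters:
            d_ac = dist[frozenset((a, c))]
            d_bc = dist[frozenset((b, c))]
            dist[frozenset((next_id, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[next_id] = ((tree_a, tree_b, height), n_a + n_b)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Hypothetical presence (1) / absence (0) profiles over six DNA fragments.
profiles = {
    "phage_A": [1, 1, 1, 0, 0, 0],
    "phage_B": [1, 1, 1, 1, 0, 0],
    "phage_C": [0, 0, 1, 1, 1, 1],
}
tree = upgma(profiles)
print(tree)
```

In this toy input, phage_A and phage_B differ by one fragment and are joined first. phage_C then joins at half the mean of its distances to the other two.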

Figure S5
Genome comparison of Shigella phage SfV and its variant in DT193. The nucleotide matches between phage SfV and its variant in DT193 were visualised with the Artemis Comparison Tool (ACT) (http://www.sanger.ac.uk/Software/ACT/). The red bars represent individual sequence matches between the two DNA sequences.