A multilocus sequence typing scheme of Pseudomonas putida for clinical and environmental isolates

Pseudomonas putida is a bacterium commonly found in soils, water and plants. Although P. putida group strains are considered to have low virulence, several nosocomial isolates with carbapenem- or multidrug-resistance have recently been reported. In the present study, we developed a multilocus sequence typing (MLST) scheme for P. putida. MLST loci and primers were selected and designed using the genomic information of 86 clinical isolates sequenced in this study as well as the sequences of 20 isolates previously reported. The genomes were categorised into 68 sequence types (STs). Significant linkage disequilibrium was detected for the 68 STs, indicating that the P. putida isolates are clonal. The MLST tree was similar to the haplotype network tree based on single nucleotide morphisms, demonstrating that our MLST scheme reflects the genetic diversity of P. putida group isolated from both clinical and environmental sites.

Selection of MLSt loci. Using the 106 genomes, we selected loci for MLST. First we chose 17 genes (acsA, argS, aroE, dnaN, dnaQ, era, gltA, guaA, gyrB, ileS, mutL, nuoC, ppnK, ppsA, recA, rpoB, rpoD and trpE), which are utilized in MLST schemes of other Pseudomonas group (P. aeruginosa and P. fluorescens) and a multilocus sequence analysis (MLSA) of P. putida group 13,20 . Among the 17 genes, consensus sequences applicable for primers were found in 8 housekeeping genes (argS, gyrB, ileS, nuoC, ppsA, recA, rpoB and rpoD genes) which encode Arginine-tRNA ligase (ArgS), DNA gyrase subunit B (GyrB), Isoleucine-tRNA ligase (IleS), NADH-quinone oxidoreductase subunit C/D (NuoC), Phosphoenolpyruvate synthase (PpsA), DNA recombination and repair protein (RecA), DNA-directed RNA polymerase subunit beta (RpoB), and RNA polymerase sigma factor (RpoD), respectively. The primers listed in Table 1 were used to confirm amplification by PCR. All PCR amplicons localised at the predicted corresponding sizes in electrophoresis. MLST allele sequences were extracted from sequences of the PCR amplicons. Table 1 shows the ratios of substitutions at non-silent (non-synonymous, change of amino acid) sites (dN) to those at silent (synonymous, no change of amino acid) sites (dS). The dN/dS ratios were extremely high (dN/dS > 1) in four genes (nuoC, ppsA, recA and rpoB), indicating a positive selection in the four housekeeping proteins. To test for positive selection of codons, we used the site-model analysis function of CodeML program, which is a part of the PAML software package 21 . No significant selection of codons was detected. Moreover, no deviation from random evolution was detected among any of the populations following neutrality test using Tajima's D statistic, Fu's D and F statistics, or Ramos-Onsins & Rozas' R2 (Table 1). Our MLST scheme showed that 106 P. putida strains were categorised into 68 sequence types (STs) ( Table S1). The MLST data have been deposited in the PubMLST database, available for public analysis. The number of unique alleles varied from 18 (rpoB) to 48 (gyrB, ileS, recA and rpoD) (Table S3).

Linkage disequilibrium.
To detect the non-random association of alleles at different loci, an index of association (I A ) was calculated. Significant linkage disequilibrium (P = 0.0) was detected for the 68 STs with I A = 2.94 in the classical (Maynard Smith) method and I S A = 0.42 in the standardised (Haubold) method. The significant linkage disequilibrium revealed close associations among the eight housekeeping genes. In addition, these close associations among the P. putida strains indicated that the strains used in this MLST scheme are clonal. tree of the MLSt scheme based on unweighted pair group method with arithmetic mean. Figure 1 shows an unweighted pair group method with arithmetic mean (UPGMA) tree, which was constructed from pairwise differences in the allelic profiles of the 68 STs. The analysis of the clonal complex showed that one group consisted of ST-9, -10, -11, -12 and -13. Although some non-clinical isolates (DOT-T1E, JB, BIRD-1, S12, KT2440 and B6-2) were relatively close, they were not categorised into the same clonal complex. www.nature.com/scientificreports www.nature.com/scientificreports/ comparison of the MLSt scheme with Snp-based distance tree. To evaluate the MLST scheme, we prepared a haplotype network tree based on 7,194 single nucleotide polymorphisms (SNPs) extracted from the total 106 whole genome sequences included in the present study (Fig. 2). Strains belonging to identical STs and identical clonal complex groups were close to each other, which was consistent with the MLST tree. Therefore, the MLST scheme reflects the whole genome profiles and is applicable for the characterisation of P. putida isolates.
16S rRNA sequences analysis. The 20 complete genomes were registered as P. putida genomes. Our 86 strains were judged as P. putida by MALDI-TOF MS. 16S rRNA sequence alignment, showed that some of strains were P. mosselii and P. parafulva, both of which belonged to the P. putida group. As shown in Figs 1 and 2, P. mosselii and P. parafulva strains were closely located in the MLST and SNP trees.
DnA-DnA distance analysis. To calculate the genetic distance of the 106 strains, digital DNA-DNA hybridization (DDH) was conducted using the KT2440 strain as a query. Interestingly, only 9 strains were estimated to be the same species as the KT2440 strain (Table S1, right end). Further DDH analysis revealed that the 9 strains were P. asiatica, whose 16S rRNA sequence was identical to that of P. putida. While in the SNP tree (Fig. 2), the 9 P. asiatica strains were closely located, in the MLST tree (Fig. 1), one ST (ST-14) branched away from 7 of the STs. www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
Based on the 16S rRNA sequence, Pseudomonas species was categorized into 5 groups; the P. aeruginosa group, the P. chlororaphis group, the P. fluorescens group, the P. pertucinogena group, and the P. putida group 22 . We found that this MLST scheme was applicable for P. putida, P. mosselii, and P. parafulva. Recently Tohya et al. reported that P. asiatica, newly split from P. putida based on average nucleotide identity and digital DDH and was phylogenetically close to Pseudomonas monteilii and P. putida 23 . In this MLST scheme, most of the P. asiatica strains were located close to each other, while one strain was not (Fig. 1). This result indicates that some of P. asiatica strains could be detected by our MLST scheme.
Yonezuka et al. reported that a phylogenetic trees based on 16S rRNA sequences was inadequate and that multi-locus sequence analysis (MLSA) based on 9 concatenated housekeeping genes (argS-dnaN-dnaQ-era-gltA-gyrB-ppnK-rpoB-rpoD) improved the resolution of the phylogenetic tree 13 . Consistent with the report, our MLST scheme yielded approximately the same results as the SNP analysis of the whole genomic sequence, while we utilized some different housekeeping genes because some genes lacked consensus sequences for PCR primers (Table 1).
Antibiotic-resistant strains in the P. putida group have been increasingly isolated from clinical sites 10,24 . Because none of the 106 strains possessed the blaVIM gene, distribution of the genes was not indicated in our MLST scheme.
In conclusion, we developed an MLST scheme for P. putida group strains using 8 housekeeping genes. This scheme was applicable to both clinical and environmental isolates.

Methods
DnA of bacterial strains. Overall, 86 strains, isolated from clinical sites in Japan in 2017, were judges as P.
putida by MALDI-TOF MS (Vitek MS, bioMérieux). The P. putida strains were cultured in brain heart infusion media (Becton Dickinson) at 37 °C overnight. Genomic DNA was extracted using DNeasy Blood & Tissue Kit (QIAGEN), according to the manufacturer's instructions.
Determination of whole genome sequences. Sequencing libraries were prepared using the Next Ultra II DNA library prep kit for Illumina (New England Biolabs) and subjected to HiSeq X platform (Illumina). The obtained 151 bp paired-end reads were de novo assembled to contigs using CLC Genomics Workbench (version 11.0.1, QIAGEN) after trimming depending on the base quality (quality score limit = 0.05)-reads with > 2 ambiguous nucleotides and those <15 bp in length were removed. Antimicrobial genes. Drug resistant genes were detected using BLAST against the AMR database of CLC Genomics Workbench. The blaVIM genes were retrieved from NCBI database and used for BLAST. pcR conditions and amplicon sequencing. According to previous reports 13,17,18 , we selected acsA, argS, aroE, dnaN, dnaQ, era, gltA, guaA, gyrB, ileS, mutL, nuoC, ppnK, ppsA, recA, rpoB, rpoD and trpE genes as the candidates of MLST loci. These were extracted from the genome sequences of the 86 isolates and from the 20 genome sequences available from the NCBI website by CLC Genomics Workbench, and the sequences were aligned using the MAFFT program 25 . In argS, gyrB, ileS, nuoC, ppsA, recA, rpoB and rpoD genes, conserved sequences for primers were 'manually' identified using the Genetyx software (Genetyx Corporation, Japan; Table 1). PCR amplification thermal cycles consisted of 3 min at 98 °C, 30 cycles at 98 °C for 20 sec, at 45 °C for 20 sec and at 72 °C for 1 min, with a final extension at 72 °C for 5 min using the TaKaRa LA Taq DNA Polymerase (Takara Bio) in the Veriti Thermal Cycler (ThermoFisher Scientific). PCR amplicons were treated with ExoSAP-IT Express PCR Cleanup Reagents (ThermoFisher Scientific) and sequenced using the primers listed in Table 1 and the BigDye Terminator v3.1 Cycle Sequencing Kit (ThermoFisher Scientific) on an ABI 3730xl (ThermoFisher Scientific).
Sequence typing. Allele sequences of the 8 housekeeping genes from the 86 isolates and the 20 sequences available in the database were 'manually' extracted using the Genetyx software and uploaded to PubMLST website 20 . I A values were calculated using the START2 software 26 . The UPGMA tree was prepared in the START2 program 26 . STs were grouped using the BURST program 27 . Tajima's D statistic, Fu's F and D statistic and Ramos-Onsins & Rozas' R2 were tested using the DnaSP 6 software 28-30 . calculation of dn/dS ratio. dN/dS substitution ratio of the allelic genes were calculated using the START2 program 26 . For further analysis of the dN/dS ratio, positive selection (dN/dS > 1) of the allelic genes were examined using the CodeML program in the PAML software package 21 . Codon sequences were manually extracted using Genetyx software. Head and tail nucleotides that do not constitute codons were removed. Moreover, stop codons and sequences after the codons were removed. Phlyip 3.0 format files and Neighbor-Joining tree were prepared in MEGA5 software 31 . Site models were tested for the eight housekeeping genes. Snp analysis. SNPs were extracted by the Parsnp tool using KT2440 as a reference after removing the phage regions predicted by the PHAST tool 32,33 . Following the conversion of a VCF file using VCFtools 34 , SNP-based distance tree was constructed using SNiPlay 35 web-tool. The tree was visualised using FigTree (http://tree.bio. ed.ac.uk/software/figtree/). 16S rRNA analysis and digital DNA-DNA hybridization. Sequences of 16S rRNA were extracted from the genome sequences and aligned with NCBI 16S database using commercial software CLC Genomics Workbench. Digital DNA-DNA hybridization was conducted using Genome-to-Genome Distance Calculator 2.1 Webtool 36 .