Fast Klebsiella pneumoniae typing for outbreak reconstruction: an highly discriminatory HRM protocol on wzi capsular gene developed using EasyPrimer tool

In this work we present EasyPrimer, a user-friendly online tool developed to assist pan-PCR and High Resolution Melting (HRM) primer design. The tool finds the most suitable regions for primer design in a gene alignment and returns a clear graphical representation of their positions on the gene. EasyPrimer is particularly useful in difficult contexts, e.g. on gene alignments of hundreds of sequences and/or on highly variable genes. HRM analysis is an emerging method for fast and cost saving bacterial typing and an HRM scheme of six primer sets on five Multi-Locus Sequence Type (MLST) genes is already available for Klebsiella pneumoniae. We validated the tool designing a scheme of two HRM primer sets on the hypervariable gene wzi of Klebsiella pneumoniae and compared the two schemes. The wzi scheme resulted to have a discriminatory power comparable to the HRM MLST scheme, using only one third of primer sets. Then we successfully used the wzi HRM primer scheme to reconstruct a Klebsiella pneumoniae nosocomial outbreak in few hours. The use of hypervariable genes reduces the number of HRM primer sets required for bacterial typing allowing to perform cost saving, large-scale surveillance programs.


Introduction
Most methods used for the identification and typing of prokaryotes are based on DNA amplification and sequencing. Indeed, the sequence of specific genes can harbour enough information to classify bacteria at both species and subspecies level. For instance, Multi-Locus Sequence Typing (MLST) is one of the most used methods for bacterial typing and it is based on the amplification and sequencing of few housekeeping genes 1 . During the last ten years, the analysis of the entire bacterial genome by Whole Genome Sequencing (WGS) approach revolutionized the field, drastically increasing the typing precision 1  Similarly, PFGE typing requires five days and also MLST needs few days. During the last years, the High Resolution Melting (HRM) assay has emerged as a low-cost and fast method for bacterial typing, particularly promising for epidemiological applications 3,4,5,6 . HRM is a single-step procedure for the discrimination of sequence variants on the basis of their melting temperature. This method allows to perform bacterial typing in less than five hours 7 .
To develop a novel HRM-based typing procedure, it is necessary to: i) select one or more core genes; ii) design a primer set in conserved regions flanking a gene portion where the melting temperature varies among the strains.
Andersson and colleagues have developed the "MinimumSNP" tool 8 , which identifies, in a gene alignment, the variable positions that can lead to a melting temperature change (called informative SNPs). MinimumSNP identifies single informative positions, that could be spread along the entire alignment. In other words, it does not indicate which regions are more suitable for primer design: two low-variable regions neighbouring a SNP-rich, informative stretch. Thus, the user has to choose one (or few) SNPs and then design primers around it (or around them).
Herein, we present EasyPrimer, a web-based tool for the identification of the gene regions suitable for primers design to perform HRM studies and any kind of pan-PCR experiments.
Moreover, we validated EasyPrimer by designing HRM primers for the discrimination of clinical isolates of Klebsiella pneumoniae, an important opportunistic pathogen frequently cause of infections in humans and animals 9 .

EasyPrimer: a tool for primers design.
EasyPrimer is a user-friendly open-source tool developed to assists primer design in difficult contexts, e.g. on an alignment of hundreds of sequences and/or on hypervariable genes.
The tool uses as input a sequence alignment and identifies the best regions for primer design: two low variable regions flanking a highly variable one. The on-line and the standalone  versions  of  the  tool  are  freely  available  at https://skynet.unimi.it/index.php/tools/easyprimer.

Primers design.
We downloaded pgi, gapA and wzi gene sequences from BigsDB database (https://bigsdb.pasteur.fr) and we run EasyPrimer to identify the best regions for primer design. The EasyPrimer output for the wzi gene is reported in Figure 1, while the outputs relative to pgi and gapA genes are reported in Supplementary Fig. S1 and Supplementary   Fig. S2, respectively. Then we designed a total of four primer sets: one for pgi, one for gapA and two for wzi (reported in Table 1).

High-Resolution Melting analysis.
We performed HRM experiments using ten primer sets on two strain collections. Four out of the ten primer sets were newly designed in this work (see above), while the remaining six were already available in literature 7 . The two strain collections were: i) the "background" collection, which includes 17 K. pneumoniae strains belonging to 17 different Sequence Types (STs); ii) the "outbreak" collection, which includes 11 K. pneumoniae strains isolated during a nosocomial outbreak. The obtained melting temperatures ("Tm") of the three HRM replicates and their relative average temperature ("aTm") values are reported in Supplementary Table S1.

Primer sets and schemes comparison.
For each of the ten primer sets we calculated the strain distance matrix among the background strains based on the aTm values (see Methods). The calculated aTm distances ranged from zero to three degrees, and the median distances varied among the genes (as shown in Figure 2). In particular, the two wzi primer sets showed median distance values significantly higher than those obtained for many of the other primer sets (see Supplementary Table S2 for details).
We also compared the aTm distance matrices of the following schemes: The median pairwise distance did not significantly change among the three schemes (Wilcoxon test with Holm post-hoc correction, p-value > 0.05) and the relative boxplot graphs are reported in Supplementary Fig. S3.
Furthermore, we compared the aTm distance matrices of wzi and MLST8 schemes for each strain pair of the background collection, subtracting the two matrices (see Figure 3). We found that, among all the 136 possible strain pairs, 66 (48.5%) showed a higher distance for wzi than MLST8. More in detail, the ST147 resulted better discriminated by MLST8 scheme, while the ST15 and ST307 resulted discriminated by wzi.
Whole Genome Sequencing-based strain typing.
Genomic reads of the 11 K. pneumoniae strains of the outbreak collection and genomic reads of the eight strains of the background collection isolated during the San Raffaele hospital surveillance program were obtained.
Ten out of the 11 outbreak isolates belonged to the ST512 while the isolate "BG-Kpn-22-18" resulted to belong to the ST307. All the wzi alleles and the STs identified for the 28 K.
pneumoniae strains (19 sequenced in this work and 9 from Gaiarsa and colleagues 10 ) as well as their accession numbers are reported are reported in Supplementary Table S3.

WGS-based outbreak reconstruction.
An alignment of 66 core-SNPs was obtained from the 11 outbreak strains. The relative Maximum Likelihood phylogenetic tree is reported in Supplementary Fig. S4. The ST512 strains have SNP distances ranging from zero to four SNPs. Conversely, the SNP distances among the ST512 strains and the ST307 strain ranged from 63 to 66 SNPs.

Discussion
High Resolution Melting (HRM) is a real-time PCR analysis for the detection of mutations and polymorphisms 3,4 , also applicable for fast bacterial typing in hospital surveillance and real-time nosocomial outbreak detection 5 . Several works applied HRM to bacterial typing, exploiting Multi Locus Sequence Type (MLST) genes 7,11,12 , which has been considered the gold standard genes for bacterial typing for almost 20 years 13 . These genes have been selected to be housekeeping therefore they display low variability. In this work we show that it is possible to increase HRM discriminatory power using hypervariable genes.  Our analyses showed a good discriminatory power for both the MLST-based and wzi-based HRM assays. Indeed, both schemes successfully discriminated the 17 K. pneumoniae STs.
Additionally, we want to highlight that the observed HRM discriminatory power was obtained using a BioRad CFX Connect real-time PCR instrument (BioRad, Hercules, California): a machine not specifically designed for HRM experiments but for real-time PCR, with a melting temperature sensitivity of 0.5°C (higher than other available HRM machines).
We found that the discriminatory power of an HRM scheme does not strictly depend on the number of genes but also on the genetic variability of the genes. Indeed, comparing the MLST6 and MLST8 schemes, we found that the median distance among the strains did not change significantly. As shown in Figure 4, wzi scheme can discriminate the highly epidemiologically relevant ST258, ST512 and ST307. Furthermore, the ST258 is better discriminated from ST307 and ST512 by the wzi scheme rather than MLST8 scheme ( Figure   3). The wzi scheme contains two primer sets and this reduces drastically the amount of time We applied the wzi scheme to the reconstruction of a nosocomial outbreak occurred in an Italian hospital. During the outbreak, 11 patients resulted colonized or infected by K.
pneumoniae and the WGS typing revealed that the isolates belonged to two different clones.
These clones were identified on the basis of core SNP distance (SNP distance < 5) and MLST profile (one isolate belongs to the ST307 and ten isolates to the ST512). As shown in The use of hypervariable genes in HRM-based bacterial typing, such as wzi in K.
pneumoniae, can drastically increase the discriminatory power of the method. With the large number of genomes available in databases it is now possible to find the most variable genes for a species. Unfortunately, it is not easy to identify the best regions to design primers in such hypervariable genes, particularly when hundreds of different alleles are available.
EasyPrimer can represent a useful tool to overcome this limit.

Isolates collection
We considered two strain collections: the "background" and the "outbreak" collections. The background collection includes 17 strains belonging to 17 different STs: nine retrieved from a previously WGS typed 10 bacterial collection (see Supplementary Table S3 Table S4).

DNA extraction and Whole-Genome Sequencing
The genomic DNA of the 11 outbreak strains was extracted using the DNeasy blood and tissue kit (Qiagen, Hilden, Germany) following the manufacturer's instructions.

High Resolution Melting primer design using EasyPrimer
The EasyPrimer tool was developed for the identification of the most suitable regions for primers design in HRM and, more in general, in pan-PCR experiments. Briefly, the tool starts from the gene alignment, evaluates the amount of genetic variation for each position and identifies the most reliable regions for primer design. EasyPrimer flags as good candidates for primer design two conserved regions flanking a highly variable one (taking into consideration, in advance, the optimal lengths of primers and amplicon). The user can decide either to evaluate the variability of the amplicon considering HRM-detectable SNPs only (the best option for HRM primer design) or all the SNPs (the best set for pan-PCR experiments). A detailed description of the algorithm is reported in the Supplementary Note S1.
To develop an HRM-based protocol for K. pneumoniae typing, we focused on the seven MLST genes and on the hypervariable capsular gene wzi 14 . The HRM primer sets for five out of the seven K. pneumoniae MLST genes were already available in literature 7 (infB, mdh, phoE, rpoB and two sets on tonB). For the remaining two MLST genes (pgi and gapA) and for the wzi capsular gene (two primer sets) the primers were designed using EasyPrimer.
For each gene the sequences were downloaded from the BigsDB database (https://bigsdb.pasteur.fr, 218 alleles for pgi, 183 for gapA and 563 for wzi), EasyPrimer was run and primer sets were designed on the basis of its output.

High-Resolution Melting assays
We performed HRM assays using the genomic DNA extracted from each of the 28 K.
pneumoniae strains included in this work, using each of the ten primer sets mentioned above. HRM analyses were performed on the BioRad CFX Connect real-time PCR System (BioRad, Hercules, California). Each 10µl reaction contained: 5µl of iTaq™ Universal SYBR® Green Supermix (BioRad, Hercules, California), 0.4µl of each primer (0.4µM) and 1µl of template DNA (25-50ng/µl). The thermal profile was as follows: 98°C for 2min, 40 cycles of [95°C for 7s, 61°C for 7s, and 72°C for 15s], 95°C for 2min, followed by HRM ramping from 70-95°C with fluorescence data acquisition at 0.5°C increments. Three technical replicates were performed for each strain and for each gene analysed. Negative controls were added in every run and for each gene.

Comparison of the HRM primer sets and schemes
We compared the discriminatory power of the ten HRM primer sets on the 17 strains of the background collection. For each set we calculated the average melting temperatures (aTms) of the three replicates for each strain and the relative strain distance matrix based on the obtained aTms. Thus, we compared the discriminatory power of the different primer sets by comparing the relative distance matrix values using Wilcoxon test with Holm post-hoc correction.
Furthermore, we grouped the primers sets in three schemes (MLST6, MLST8 and wzi) and we compared the relative strain distance matrices using Wilcoxon test with Holm post-hoc correction. The scheme compositions were as follows: the MLST6 scheme included the six primer sets proposed by Andersson and colleagues 7 for five MLST genes (with two primer sets for tonB); the MLST8 included all the MLST6 primer sets, the primers for pgi and gapA (newly designed in this work using the EasyPrimer tool); the wzi scheme included the two primer sets for the wzi gene (newly designed in this work). For details see Table 1 and Figure 2.
Then, we compared the discriminatory power of MLST8 and wzi schemes by subtracting the relative distance matrices (wzi -MLST8) and studying the obtained matrix with heatmaps.
All these analyses were performed using R (https://www.r-project.org/) and the R libraries Ape and Gplots.

HRM-based outbreak reconstruction
From the aTms of the outbreak and background collections we calculated the distance matrices for MLST6, MLST8 and wzi primer schemes (for more details see above) and clustered the strains using the hierarchical clustering method implemented in the Hclust function in R.

WGS-based strain typing
We retrieved the reads of the nine K. pneumoniae strains previously WGS-typed by Gaiarsa and colleagues 10 from NCBI database using fastq-dump tool (for accession numbers see Supplementary Table S3).
Then we performed de novo assembly on the reads obtained from the 19 strains sequenced in this work and those from the nine strains from database using SPAdes software 15 .
We retrieved the wzi allele sequences from BigsDB database and we annotated the wzi allele present in each of the 28 genome assemblies included in the study by Blastn search and manual curation of the results.
We retrieved the sequences of the K. pneumoniae MLST gene alleles and the relative scheme tables from the BigsDB database. Thus, we determined the MLST profiles using an in-house Blastn-based Perl script.

Core-SNP-based phylogenetic reconstruction on outbreak strains
We aligned the reads obtained from the 11 outbreak strains against the NTUH_K2044 reference genome (accession number NC_016845.1), and performed the SNPs calling following the GATK best practice procedure. We masked SNPs localized within repeated regions, identified using MUMmer 16 , or prophages, identified using PhiSpy 17 , and we called the core-SNPs among the strains using an in-house Python script. Thus, we subjected the core-SNPs alignment to phylogenetic analysis as follows: the best evolutionary model was assessed by ModelTest-NG and phylogenetic reconstruction was performed using the selected best model, with RAxML8 software 18 . We evaluated the core-SNPs distances among the strains using the R Ape library.