A large-scale genomic snapshot of Klebsiella spp. isolates in Northern Italy reveals limited transmission between clinical and non-clinical settings

The Klebsiella group, found in humans, livestock, plants, soil, water and wild animals, is genetically and ecologically diverse. Many species are opportunistic pathogens and can harbour diverse classes of antimicrobial resistance genes. Healthcare-associated Klebsiella pneumoniae clones that are non-susceptible to carbapenems can spread rapidly, representing a high public health burden. Here we report an analysis of 3,482 genome sequences representing 15 Klebsiella species sampled over a 17-month period from a wide range of clinical, community, animal and environmental settings in and around the Italian city of Pavia. Northern Italy is a hotspot for hospital-acquired carbapenem non-susceptible Klebsiella and thus a pertinent setting to examine the overlap between isolates in clinical and non-clinical settings. We found no genotypic or phenotypic evidence for non-susceptibility to carbapenems outside the clinical environment. Although we noted occasional transmission between clinical and non-clinical settings, our data point to a limited role of animal and environmental reservoirs in the human acquisition of Klebsiella spp. We also provide a detailed genus-wide view of genomic diversity and population structure, including the identification of new groups.

The Klebsiella group, found in humans, livestock, plants, soil, water and wild animals, is genetically and ecologically diverse. Many species are opportunistic pathogens and can harbour diverse classes of antimicrobial resistance genes. Healthcare-associated Klebsiella pneumoniae clones that are non-susceptible to carbapenems can spread rapidly, representing a high public health burden. Here we report an analysis of 3,482 genome sequences representing 15 Klebsiella species sampled over a 17-month period from a wide range of clinical, community, animal and environmental settings in and around the Italian city of Pavia. Northern Italy is a hotspot for hospital-acquired carbapenem non-susceptible Klebsiella and thus a pertinent setting to examine the overlap between isolates in clinical and non-clinical settings. We found no genotypic or phenotypic evidence for non-susceptibility to carbapenems outside the clinical environment. Although we noted occasional transmission between clinical and non-clinical settings, our data point to a limited role of animal and environmental reservoirs in the human acquisition of Klebsiella spp. We also provide a detailed genus-wide view of genomic diversity and population structure, including the identification of new groups.
The Klebsiella genus is a member of the Enterobacteriaceae family. The most well studied species is Klebsiella pneumoniae, which the World Health Organization has recognized as a critical priority healthcare-associated pathogen 1 . Antibiotic resistance has spread rapidly within K. pneumoniae and other members of the genus since the early 1980s 2 ; the plasmid-mediated spread of genes encoding carbapenemases over the last two decades is of particular concern 3,4 . Widespread clones of K. pneumoniae and other Klebsiella species associated with these genes are spread through the healthcare network 5 ; in common with other key antimicrobial resistance (AMR) determinants, genes encoding carbapenemases have also been detected in multiple non-clinical settings including livestock and wastewater [6][7][8] .
The potential public health risks posed by these non-clinical reservoirs of antibiotic resistance has led to a widening adoption of the One Health framework for AMR management 9 . This integrative approach is underpinned by a synthesis of antibiotic stewardship and AMR surveillance within clinical, community, agricultural and environmental settings. However, existing data on the abundance and distribution of AMR strains and genes in the environment does not provide a full picture and our current understanding of their maintenance and spread within and between ecological settings is fragmentary 10 . Given the urgent requirement for policy priorities informed by robust risk assessments, this represents a key knowledge gap.
A powerful way to infer pathogen transmission dynamics is to use whole-genome sequencing (WGS) combined with phylogenetic and other statistical analyses. WGS has been applied successfully within healthcare settings 11,12 and played a key role in the management of the severe acute respiratory syndrome coronavirus 2 pandemic; however, capturing transmission pathways within complex non-clinical settings is more challenging. Large contemporaneous samples from well-defined regions are required to control for the confounding effects of divergence time and geographical variation. Nevertheless, a picture has begun to emerge that the risk of transmission of AMR genes and strains from environmental or agricultural settings into the clinic may be rather low, at least in well-resourced regions; this is a view seemingly at odds with the prevailing One Health cri de coeur 13 . Recent reports suggested that transmission of AMR strains and/or genes between humans and agricultural animals is limited in Escherichia coli 14 , Enterococcus faecium 15 and K. pneumoniae 16,17 . The evidence, however, is equivocal 18 ; a compelling counter-example is the study on colistin

Species clonality and population structures
We compared the population structures of the different species by delineating SCs using PopPunk 31 (Extended Data Fig. 5 and Fig. 3). This revealed high levels of diversity in all species, as previously described for K. pne 32,33 . In total, we identified 1,168 SCs across all species, of which only 41 (3.5%) were represented by more than 10 isolates and 50% of all isolates corresponded to SCs that were observed no more than 6 times. The most common SC within each species represented between 3 and 10% of the population (Fig. 3a) and the SC accumulation curves were not close to saturation (Fig. 3b). K. orn showed particularly high diversity; 147 SCs were identified from 258 isolates and the most common SC accounted for 3% of the isolates. Pairwise divergences tended to be distributed around a modal average of approximately 1% (Fig. 3c) and each lineage was roughly equidistant to every other lineage (Extended Data Fig. 6). In some cases (for example, K. pne, K. gri), a much smaller peak was also evident at a much lower divergence, reflecting expansion of individual SCs. Klebsiella michiganensis (K. mic), K. hua, K. spa, Klebsiella terrigena (K. ter) and K. aer also showed more diverged modal peaks, with core genome distances up to 3%; this reflects the presence of deep subdivisions within these species, which is consistent with nascent speciation; this was also evident in the individual species trees (Extended Data Fig. 6 and Supplementary We used a permutation test to gauge whether different species were non-randomly distributed between sources (Fig. 4). K. pne was significantly overrepresented in hospital carriage and in livestock (cows and pigs), as expected 16,34 , but was underrepresented in sheep, water (and turtles), invertebrates and soil/plants. Species within the K. orn SPEC were significantly overrepresented in soil and plants and underrepresented in hospital carriage, which is also consistent with previous work 34 . Other distributions were more surprising; for example, we did not find any evidence that Klebsiella variicola (K. var) is associated with plants, contrary to its original description 35 and species from the K. oxy SPEC tended to be overrepresented in invertebrates. While this is consistent with a previous report of a symbiotic relationship between houseflies and K. oxy 36 , to our knowledge this specialism has not been described before in other species of this SPEC. These data also point to a significant overrepresentation of K. mic in hospital carriage and we note that a small but consequential proportion (17 out of 600; 2.8%) of the diagnostic isolates from hospital disease correspond to this species.
An important caveat with this analysis is that statistical association can result from clonality rather than ecological adaptation. For a, Maximum-likelihood phylogenetic tree constructed from core genes, coloured by species, with the SPECs shown. Only one isolate from each species is shown as this tree is intended to show the distances between species. b, Neighbour-joining phylogenetic tree constructed from pairwise Mash distances between all isolates, coloured by species, with the SPECs shown. The metadata rings show sources (inner rings) and resistance and virulence scores (outer rings). c, Bar  example, the apparent overrepresentation of K. oxy in turtles is due to the clonal expansion of a single lineage (K. oxy SC1) within a population of turtles in a pond at a botanical garden. However, we did not find evidence for clonal expansion of K. mic within hospital settings nor for certain K. mic lineages being more strongly associated with humans than others.

Distribution of resistance genes
Kleborate 26 assigns isolates to 1 of 4 resistance scores: 0 = low level resistance; 1 = extended-spectrum beta-lactamase (ESBL)-positive; 2 = carbapenemase-positive; and 3 = carbapenemase plus colistin-positive. The distribution of species according to these categories and to each source is shown in Figs. 2 and 5a; a full breakdown   Only three isolates of species other than K. pne from outside the hospital setting harboured an ESBL gene; in each case, the gene in question was bla SHV-12 . These were a K. orn isolate recovered from a fly caught within a hospital (SPARK_2923_C1), a K. orn isolate from environmental water (SPARK_1613_C1) and a Klebsiella quasivariicola (K. qva) isolate from a pig (SPARK_1906_C1). Excluding K. pne, there were 9 isolates from other species recovered from hospital patients that harboured ESBLs (bla CTX-M-15 , n = 4; bla SHV-12 , n = 5). Of note are a pair of clonally related isolates (SPARK_1773_C1, SPARK_2031_C1) belonging to clone K.qpq_SC_11_ST571, which harboured bla CTX-M-15 plus the virulence factors ybt, iro and rmpA. These isolates were recovered from urine samples from two inpatients at the same hospital in April 2018. This is consistent with hospital transmission of a new Klebsiella quasipneumoniae subsp. quasipneumoniae (K. qpq) clone exhibiting both resistance and virulence genes.
Excluding K. pne, only three isolates from other species harboured a carbapenemase gene; these were all isolated from the hospital environment and carried bla VIM-1 . Two of these (K. mic SPARK_1816_C1 and K. gri SPARK_1652_C1) presented nearly identical genotypic and phenotypic resistance profiles to each other and to five isolates of K. pne. This resistance profile is characterized by the presence of the bla SHV-12 , bla VIM-1 , mph(A) and qnrS genes, harboured by a class 1 integron (GenBank accession no. MN783743) associated with the conjugative IncA plasmid pR210-2-VIM 37 . This plasmid is known to circulate in multiple Enterobacteriaceae species in Italy 38 and the re-emergence of VIM-1 in this region is thought to reflect the increased use of ceftazidime-avibactam against K. pneumoniae carbapenemase (KPC)-producing bacteria. Closer analysis revealed the presence of this plasmid in distinct K. pne clones within a single patient and in other Klebsiella species (Extended Data Fig. 8).
Regarding the 1,705 isolates of K. pne, 1,105 (64.8%) exhibited a low level of resistance (category 0), 411 (24.1%) carried an ESBL (category 1), 175 (10.3%) carried a carbapenemase gene (category 2) and only 14 (0.8%) carried a carbapenemase gene and colistin resistance (category 3). Two ESBL genes were dominant; bla CTX-M-15 and variants of bla SHV-27 , which together accounted for 83.5% of all ESBL genes. These were distributed non-randomly between sources; 238 out of 256 (93%) of   Only Klebsiella samples from the SCAI dataset (n = 2,795) are shown and 23 of these samples were removed either because they were from very poorly sampled sources (21) or could not be confidently assigned to a species (2). The rows represent species delimited according to SPECs and the columns represent sources delimited according to source categories. The grey shaded rows at the bottom of the table give the total number of positive samples for the corresponding source, and below, the total number of samples for that source. The grey shading reflects the percentage prevalence from each source. The number of positive samples are shown for each species from each source and a blank cell indicates zero positive samples. The red shading shows the relative enrichment of each species from each source, given the overall prevalence from that source and assuming a null hypothesis whereby all species would be equally likely to be observed from any given source. The dark red and blue borders show those categories where the number of samples is significantly higher or lower than expected, respectively, as determined by a permutation test. The bar plot to the right shows the number of samples from each species and the total sampling effort.  6  21  2  3  3  7  5  8  5  3  3  30  2  2   4  1  1  2   6  25  16  6  3  9  109  18  20  1  2  2  9  2  2  3  4  5  1  9  34  1  2   1  1  1  1  1  3  1   5  8  15  1  3  2  63  39  22  1  2  3  2  2  2  7  4  6  24  34  2   6  1  6  2  1  1  1  2  2  2  3   1  9  1  3  6  5  20  1  3  2  1  12  9  16  1  12  21  3  3   16  344  26  2  17  337  301  16  3  2  24  13  1  2  1  9  5  17  12  53  10  20   3  6  3  20  5  3  1  2  3  1  23  1  2   3  2  25  4  1  2  2 9 1 4 1 1  Article https://doi.org/10.1038/s41564-022-01263-0 the K. pne isolates bearing bla CTX-M-15 were from humans, the exceptions being from hospital surfaces and companion animals. In contrast, only 51 out of 170 (30%) of the K. pne isolates bearing bla SHV-27 variants were from human sources compared to 87 out of 170 (51%) from cows. Of the 175 K. pne isolates harbouring carbapenemase genes, all were isolated from the hospital environment and the majority (n = 161; 92%) carried bla KPC and corresponded to the healthcare-associated clones ST258/512 or ST307. K. pne_ST307_SC1 was the most abundant clone in the dataset and was isolated from hospital surfaces and companion animals as well as hospital patients, although none of the ST307 isolates from non-human sources harboured bla KPC . Eleven K. pne isolates harboured bla VIM-1 , including those discussed above, and 3 K. pne isolates harboured bla  . Of the 192 isolates with a carbapenemase gene for which phenotypic resistance data were also available, 91% showed phenotypic resistance to ertapenem, 71.7% to imipenem and 77.7% to meropenem. In contrast, the values were 0.8% (27 isolates), 0.18% (6 isolates) and 0.28% (9 isolates), respectively, for isolates (from all species) without a carbapenemase gene; these exceptions are likely due to changes in membrane permeability 39 . Consistent with the genotypic data, there was no evidence for any phenotypic resistance to carbapenems outside of the hospital environment.
There were 14 isolates in the highest resistance category; these were all K. pne isolates from hospitals and all harboured the carbapenemase gene bla KPC plus a mutated mgrB gene known to confer colistin resistance. All except one of these isolates belong to the common healthcare-associated clone ST258/512, with the exception of a single ST307 isolate. Phylogenetic analysis of the 95 ST258/512 isolates suggested at least 5 acquisitions of the mgrB chromosomal mutation into this clone ( Supplementary Fig. 1). The available phenotypic data confirmed resistance to colistin in 13 out of 14 of these isolates, 1 of which (SPARK_1222_C1) was originally assigned as sensitive using the BD Phoenix 100 automated system but was subsequently found to be resistant using the Sensititre platform. Phenotypic sensitivity to the other isolate containing mgrB (SPARK_372_C2) could not be confirmed because this isolate lost viability. In total, phenotypic resistance to colistin was observed in 46 K. pne isolates, 41 of which were from humans. Besides the 12 phenotypically resistant isolates harbouring an mgrB mutation, Kleborate did not detect a mechanism for colistin resistance in the other cases, including three K. pne isolates from pigs and a single K. aer isolate from a goat. This was not unexpected since many mcr variants are not included in the Kleborate database and colistin resistance can also be conferred through mutations responsible for membrane synthesis 40 . The final non-human colistin-resistant isolate was a single K. pne isolate from a cow that harboured mcr-1.

Distribution of virulence genes
Like the genotypic resistance profiles, all isolates were assigned to 1 of 6 categories based on the presence of genes encoding the known virulence factors yersiniabactin (ybt), aerobactin (iuc), salmochelin (iro) and colibactin (clb), as identified by Kleborate (Figs. 2 and 5b); 2,749 out of 3,482 (78.9%) of all isolates and 1233 out of 1705 (72.3%) of K. pne isolates were in the lowest virulence category and 669 out of 3,483 (19.2%) of all isolates corresponded to virulence category 1, reflecting the presence of ybt, but the frequency of this locus varied markedly between species: ybt was present in 410 out of 1706 (24%) of the K. pne isolates, 249 out of 258 (96.5%) of the K. orn isolates, 6 out of 279 (2.1%) of the K. var isolates and 2 out of 171 (1.1%) of the K. aer isolates. The ybt locus in K. orn was assigned as an 'unknown' type by Kleborate, chromosomally located close to an transfer RNA-Asparagine site (with no evidence for an associated integrative conjugative element) and was phylogenetically distinct from the ybt locus in K. pne 41,42 . Despite being a core locus in K. orn, this distinct ybt variant was not found in any other species, including related species from K. orn SPEC. While only 7 isolates corresponded to virulence category 2 (ybt + clb), 45 K. pne isolates and 1 K. oxy isolate were assigned as virulence category 3. These isolates harboured the iuc locus that encodes the siderophore iuc and 38 out of 46 were recovered from pigs. In total, 42 out of 87 (48%) of the pig isolates harboured iuc and in 40 out of 42 (95%) cases harboured iuc3. Three K. pne isolates and one K. oxy isolate from the farm environment (water and surfaces) also harboured iuc3; a similar association between iuc3 and pig isolates has recently been described in Germany 43 . The high frequency of iuc3 in porcine isolates contrasts with clinical isolates, in which iuc1 and iuc2 are more common 44 . The porcine iuc3 was observed on multiple sequence types (STs) and from different farms; hence, it is not a simple consequence of clonal spread. Preliminary analysis also suggests that iuc3 is carried by diverse plasmids (Extended Data Fig. 9).
Twelve isolates were predicted to show a high level of virulence (categories 4 and 5). The two category 5 isolates corresponded to the hypervirulent lineage K. pne ST23 and contained all five virulence loci. These isolates were from patients in different hospitals and were sufficiently diverged to rule out epidemiological linkage. One of these ST23 isolates, SPARK_1158_C1, isolated from the urine of a hospital inpatient, had also acquired the resistance genes qnrS1 and bla TEM and exhibited phenotypic resistance to ciprofloxacin and levofloxacin. Of the ten K. pne isolates corresponding to category 4, four were from cases of hospital disease, four from pigs (all containing iuc3) and two from dogs. The two isolates from dogs, representing STs 5 and 25, harboured ybt, iuc, iro and rmpA.

The distribution of sublineages below the species level
We examined the distribution across sources of subspecies SCs as defined using PopPunk, using the same permutation test used to examine species distributions ( Supplementary Figs. 2-17). This analysis revealed that different K. pne lineages were associated with either cows or humans ( Supplementary Fig. 2) and this was also borne out by phylogenetic analysis (Supplementary Fig. 18). The lineages SC1_ST307, SC2_ST17, SC3_ST512, SC4_ST45 and SC11_ST392 were mostly associated with humans, although these varied in the degree to which they were associated with hospital carriage versus hospital disease. For example, 66% of the SC1_ST307 isolates were associated with hospital disease and 28% with hospital carriage. These figures contrasted with K.pne_SC2_ST17, the second most common lineage in our dataset (n = 128), for which the equivalent figures were 20% and 57%, respectively. Other K. pne SCs were associated with cows rather than humans (for example, SC5_ST661, SC9_ ST3068, SC10_ST2703, SC17_ST3345). Some intermingling occurred, particularly in SC5_ST661, which contained clonal expansions of both bovine and human isolates. This lineage was previously observed from both human and bovine sources 16 and may represent a more generalist clone that is adapted to and able to transmit between both cows and humans.
Fewer statistically significant SC enrichments were apparent for other species, likely due to smaller sample sizes. However, a number of observations were notable. As discussed, K. mic was enriched within hospital carriage (Supplementary Table 4) but this was not due to the expansion of a single SC. Twenty-five of the 30 most common SCs of this species were present in hospital carriage samples but no single SC was significantly more commonly associated with hospital carriage relative to the others (Supplementary Fig. 3). In contrast, the association of K. gri with invertebrates was largely driven by K. gri SC1 ( Supplementary  Fig. 7). This is unlikely to reflect clonal expansion or sampling bias since K. gri SC1 was associated with different invertebrate hosts (a cockroach, fly, wasp and an unspecified bug) sampled in different locations. This clone, which has no notable resistance or virulence attributes, was also recovered from a cockroach caught in a hospital environment; an isolate very closely related to this clone was recovered from an outpatient of the same hospital ( Supplementary Fig. 19).

Quantifying transmission
To quantify and compare transmission events between different settings, we used a single-nucleotide polymorphism (SNP) threshold-based approach (thresholds: 0, 1, 2, 5, 10, 20). It was clear from the resulting transmission matrices and networks ( Fig. 6 and Supplementary Table 6) that most of the transmission occurred within a single source and, most importantly, that acquisition by humans almost always originated from other humans rather than from animals or the environment. In particular, our analysis further reinforces the view that transmission of K. pne, and other species, between cows and humans, which are the two most deeply sampled sources, is limited. Despite this, we note that sporadic transmission events occur relatively commonly between humans and companion animals and very occasionally between humans and other sources, including river water and invertebrates.

Discussion
The One Health framework is integral to AMR research programmes aiming to mitigate the risks posed by non-clinical reservoirs of AMR through careful surveillance and stewardship. These risks exist on multiple levels, with the simplest being sporadic transmission events, for example, from livestock to farmers. If these events are epidemiologically distinct, and onward human-to-human transmission is unlikely, then the public health risk posed by each event is likely to be low. However, the recovery of isolates from dogs that are predicted to be hypervirulent, and the widespread healthcare-associated clone ST307 from both cats and dogs, suggest that transmission of high-risk clones between companion animals and humans may warrant close attention. We note that ST307 and ST15, the latter lineage being the most common K. pne lineage recovered from dogs in our data, have both previously been associated with companion animals 45 .
A more complex question relates to the emergence of high-risk lineages that combine heightened virulence and/or resistance attributes with the ability to move between, and spread within, different settings. A full understanding of the emergence of such clones requires a consideration of likely anthropogenic drivers, such as inappropriate use of antibiotics or other environmental stresses, but also of the underlying ecology since free transmission will be less likely if the selective landscape is characterized by local adaptation and specialization.
Our data revealed low levels of resistance and virulence genes outside of clinical settings and within species other than K. pne. Given the caveat that this may in part reflect biases in the database used by Kleborate towards well-characterized genes common in K. pne, this suggests that the emergence and subsequent spread of highly virulent and/or resistant lineages within the environment is a rare event. Human-community carriage     The acquisition of a variant ybt by K. orn as a core locus provides a possible exception, although the functional and epidemiological significance of this is unknown and it has not been observed in any other species. An additional example is the high frequency of porcine K. pne isolates that harbour the virulence locus iuc3. Although none of the pigs harbouring iuc3 isolates showed signs of disease, it is likely that this iuc lineage is adaptive in the pig host. While much is unclear about the function and significance of the iuc3 locus in K. pne isolates from pigs, and the near universal presence of ybt in K. orn, their close surveillance is warranted.
In contrast to the evidence from environmental and animal settings, our data provide evidence for the emergence of new and potentially high-risk lineages within the hospital setting, even in species other than K. pne. The new lineage K. qpq ST571, associated with hospital disease, is a worrying example since this clone harbours both resistance (bla CTX-M-15 ) and virulence (ybt, iro and rmpA) genes. Although only two isolates of this lineage were observed, they were both isolated from urine from different patients in the same hospital and are likely to be epidemiologically linked. A second example is the interspecies transfer of the plasmid pR210-2-VIM-like carrying bla VIM-1 from K. pne to K. mic and K. gri isolates, again within the hospital environment, as well as the intrapatient transfer of this plasmid between different K. pne lineages. Our data also revealed a surprisingly high rate of K. mic within hospital carriage, although in this case there was no evidence for the hospital spread of high-risk K. mic lineages. The high frequency of this species in hospital carriage, combined with the recovery of a K. mic strain harbouring the bla VIM-1 , as well as previous reports of strains of this species harbouring carbapenemase genes 46 , urges heightened clinical awareness of this species.
We note that species and SCs within species are non-randomly distributed, with the clearest example being the distinct sets of K. pne SCs of human and bovine origin, which is consistent with previous studies 16 . Moreover, our analysis revealed that transmission is much more common within, than between settings, and that most cases of acquisition of Klebsiella by humans is likely to be from other humans, which has been suggested earlier for E. coli 47 . Thus, our data and analyses broadly challenge the view that bacterial strains and the resistance genes they carry can 'flow' unimpeded between different settings and we argue that local adaptation plays a role in mitigating transmission. The ecological and phylogenetic distribution of virulence and resistance genes also points to barriers to transmission between the clinical environment, animals and the environment. High levels of virulence and/or resistance tends to be rare in species other than in K. pne and outside the hospital environment. The complete absence of genotypic or phenotypic evidence for carbapenem non-susceptibility outside of healthcare settings is particularly noteworthy.
In conclusion, in this study we described WGS data incorporating multiple species of the Klebsiella genus from diverse sources. Our findings broadly corroborate recent research indicating hospitals as the hubs of K. pne resistance dissemination in Europe 5 and justify a continued focus on breaking the transmission chains throughout the healthcare network. Moreover, our analysis suggested that new lineages of heightened virulence and/or resistance are most likely to emerge within hospital settings rather than in the environment or animals, although this possibility cannot be discounted. We contend that the One Health perspective is pertinent for restricting sporadic transmission events and that transmission dynamics will vary according to the region and the pathogen under study. For example, contact between humans and animals may be much more common in many low-resource regions 48 . We also recognize that the transmission dynamics of strains can be distinct from the plasmids they carry. Finally, we acknowledge limitations in our study with respect to wastewater and food-borne transmission, which may play important roles in the transmission cycle between humans, animals and the environment.

Sampling
A summary of the sampling and subsequent methodology is provided in Extended Data Fig. 1. In this study, two strategies were used to obtain samples: 5,900 samples were collected in the city of Pavia (Northern Italy) and the surrounding province between June 2017 and November 2018. Information on the samples collected is given in Supplementary  Table 1. To summarize, the following types of samples were collected: stool and rectal swabs from hospital inpatients and outpatients (four different hospitals) and from a nursing home; stool from healthy volunteers; stool and rectal swabs from companion animals, farm animals and animals admitted to veterinary clinics (dogs, cats, cattle, pigs, poultry, turtles, rabbits and wild birds); invertebrates; samples of edible and ornamental plants, both wild and purchased from grocery shops, garden centres and wholesale distribution; soil samples; samples of drinking water (drinking fountains) and surface water (rivers and irrigation ditches); surface swabs from hospital, anthropic surfaces (including ATM keypads, ticket machines, buses, benches, supermarket trolleys) and farm surfaces (including enclosure, buckets and milking machines). Seven hundred and twenty-two Klebsiella isolates obtained from the laboratory diagnostic routine from urine, wound swabs, respiratory samples and blood cultures of hospital patients with infections were also processed. All sampling metadata was stored in Microsoft Excel 365 spreadsheets.

Metadata
Detailed metadata on all the sequenced isolates is given in Supplementary Table 2 and the Microreact project. For guidance, a summary of the main metadata fields used in the Microreact project is provided in Supplementary Table 3 and instructions on the use of this platform are available at https://microreact.org/. Maps showing the locations of the sequenced isolates and a sampling timeline (sequenced isolates only) is given in Fig. 1.

Sample processing
Stool and rectal swab samples (fecal swabs; Copan), both from human and animals, were enriched in Luria Bertani (LB) broth with amoxicillin (10 mg ml −1 ) at 36 ± 1 °C for 24 h. Invertebrates were frozen for at least 24 h after sampling and surviving bacteria were recovered from the surface of the animals as well as the gut. Freezing was carried out for practical and sanitary reasons, although this procedure was not optimal due to a loss of sensitivity and potentially some sampling bias resulting from bacterial cell death. For the surface, each insect was washed with sterile water for 2 min and an aliquot of the washing was enriched in LB broth with amoxicillin (10 mg ml −1 ). For the gut, the insect's surface was washed with ethanol 70% for 5 min (once the surface had been sampled) and then air-dried. Small insects were ground with a pestle, while larger ones were dissected with a sterile scalpel to separate the gut. The gut was then used to inoculate LB broth with amoxicillin (10 mg ml −1 ), which was left to incubate at 36 ± 1 °C for 24 h.
For plants, each sample was divided into four portions: rhizosphere; rhizoplane; epiphyte; endophyte. The portions corresponding to the rhizosphere, rhizoplane and epiphytes were washed once with PBS (pH 7.2). The buffer used for washing was then added to the LB broth with amoxicillin (10 mg ml −1 ) at 36 ± 1 °C for 24 h. Endophytes were washed with ethanol 70% for 2 min (once the surface had been sampled) and rinsed with sterile water before being washed with a sodium hypochlorite 2% and Triton X-100 1% solution and incubated for 2 min before washing with sterile water. Endophytes were ground once in PBS (pH 7.2) with pestle and mortar. An aliquot of 1 ml was enriched in LB broth with amoxicillin (10 mg ml −1 ) at 36 ± 1 °C for 24 h.
Soil samples (5 g) were washed in PBS (pH 7.2), which was then added to the LB broth with amoxicillin (10 mg ml −1 ) at 36 ± 1 °C for 24 h. Water samples (1 l for both drinking and river water) were filtered through a sterile filter unit (pore size 0.2 µm; Thermo Fisher Article https://doi.org/10.1038/s41564-022-01263-0 Scientific) and the membranes were enriched in LB broth with amoxicillin (10 mg ml −1 ) at 36 ± 1 °C for 24 h. For environmental water (mainly from ditches and ponds), we sampled and filtered at least 50 ml of water (higher volumes when possible) and then proceeded in the same way as the drinking and river waters. Surface swabbing was performed on areas of 10 cm 2 for each point by using a swab rinse kit (Copan). After the collection, the swab and its medium were enriched in LB broth with amoxicillin (10 mg ml −1 ) at 36 ± 1 °C for 24 h. For each of the above samples, 1 µl of each enrichment was plated on SCAI 50,51 medium and the plates were incubated at 36 ± 1 °C for 48 h.

Diagnostic isolates
In addition to the Klebsiella isolates recovered using selective SCAI media from diverse sources, we also assembled an additional 722 diagnostic Klebsiella isolates recovered from hospital patients and veterinary clinics as part of ongoing surveillance programmes within the Pavia catchment area, of which 687 were successfully sequenced: 676 (98%) of the isolates were from human clinical cases, with 484 (70%) being from a single hospital, 600 (89%) were from hospital disease and 76 (11%) were from hospital carriage. Most of the human clinical isolates were recovered from urine (n = 417; 62%), the other sample types being blood (n = 74; 11%), rectal swab (n = 70; 10%), bronchial (n = 45; 7%), sputum (n = 20; 3%) and other minor sources.

Species identification and antimicrobial susceptibility testing
Yellow and mucoid colonies on SCAI plates suspected to belong to the Klebsiella genus were identified at the species level through matrix-assisted laser desorption/ionization (MALDI)-time-of-flight (TOF) mass spectrometry (Microflex LT/SH; Bruker Daltonik GmbH) equipped with the Bruker biotyper 3.1 software. Once confirmed to be members of this genus, the isolates were subcultured on MacConkey agar for antibiotic susceptibility testing and DNA extraction. Antibiotic susceptibility was tested for all the isolates using the BD Phoenix 100 automated system and the dedicated panels NMIC-402 for all the diagnostic routine samples and NMIC-417 for all the other samples. The antibiotics and range of antibiotic concentrations present in the two panels are listed in Supplementary Table 7. We also used broth microdilution (Sensititre panel DKMGN; Thermo Fisher Diagnostics) on a subset of isolates to confirm phenotypic resistance to colistin in those cases where there were discrepancies between the genotypic and phenotypic data.

MALDI-TOF
All of the sequenced isolates were initially assigned to 1 of 7 Klebsiella species by MALDI-TOF and, apart from 1 exception, subsequently assigned to 1 of 15 species on the basis of the genome sequences using Kleborate and phylogenetics. The accuracy of the MALDI-TOF assignments varied according to species; 88.4% of the isolates assigned as K. pneumoniae by MALDI-TOF were confirmed by WGS, whereas only 30% of the isolates assigned as K. oxytoca were confirmed as this species. These discrepancies reflect the fact that reference databases are currently unable to distinguish K. oxytoca from closely related species (Supplementary Table 8).

Lineage assignment at the subspecies level
Kleborate was used to assign STs to the isolates but this was only possible for those species for which MLST schemes have been previously established. Therefore, we carried out intraspecies lineage assignments into SCs for all species using PopPunk v.2.0.2. For each species, the number of components to fit in the mixture model (k) was chosen based on the scatter plot of core and accessory distances. The model was then fitted and the boundary refined using an iterative process of moving the boundary and reassessing the network features. In all cases, the core boundary was used to define the clusters. For most species, 2 components provided the best fit, the exceptions being K. aer (k = 6), K. mic, K. qpq and K. ter (k = 3). For two species, there were outlying isolates (SPARK_1532_C1, SPARK_1532_C2; SPARK_1553_C1, SPARK_871_C1), which were removed to fit the model and then reassigned as query sequences. Plots of the final fits are shown in Extended Data Fig. 5. The SCs defined by PopPunk were named according to their relative abundance within each species, with SC1 being the most abundant, followed by SC2 and so on. For those species where STs could also be identified by Kleborate, the SCs defined by PopPunk matched closely with STs (Rand Index > 0.98), although SCs tended to be slightly more inclusive groupings. For ease of reference, we also used a compound identifier that combined SC with the corresponding canonical ST (for example, the most common group in K. pne was designated K.pne_SC1_ST307).

Inferring transmission events
To quantify the frequency of transmission events within and between different sources, we aggregated data over all Klebsiella species and identified transmission events using a threshold-based approach based on SNP distances using a mapping procedure. Briefly, for each SC in the dataset we used the 'sample_n' function from the dplyr v.0.8.3 package in R v.3.6.1 to randomly choose a reference isolate to represent the SC. This was an objective and practical solution, given the high diversity of the dataset (1,168 SCs across 15 species). We mapped all isolates from a given SC to this random SC-specific reference using Snippy v.4.6.0 (https:// github.com/tseemann/snippy), with variant calling by freebayes v.1.3.2 and the parameters --mapqual 60, --basequal 13, --mincov 10, --minfrac AUTO, --minqual 100, --maxsoft 10. We then used snp-dists v.0.7.0 (https://github.com/tseemann/snp-dists) to count the number of SNPs between each pair of isolates from that SC. Article https://doi.org/10.1038/s41564-022-01263-0 For each SC, we created a network where all isolates were connected to all other isolates and then pruned this network into 'putative transmission clusters' by removing all the links between isolates where the SNP distance was greater than the chosen threshold SNP distance (an example is given in Extended Data Fig. 10). For each of these putative transmission clusters, we recorded every unique source pair (including same source pairs) connected by a pair of isolates differing by an SNP distance ≤ threshold. We excluded pairs of samples from the same individual hosts. Each unique source pair (not isolate pair) within each cluster, which satisfied these criteria, was recorded as a single transmission event. This approach is conservative because multiple transmission events occurring within a cluster between any given source pair are only counted once, but this avoids double counting of single transmission events. To obtain transmission frequencies, we normalized the count of transmission events for each source pair by the total number of isolates in each pair of sources. We carried out this analysis using 6 different SNP thresholds: 0, 1, 2, 5, 10 and 20 SNPs. The upper threshold (20 SNPs) was informed by a recent analysis of K. pne transmission within and between healthcare settings 5 .

Statistics and plotting
All statistics and plotting were performed with R v.3.6.1. The permutation tests were done by randomly permuting the source categories and cluster labels (either species or SC depending on the dataset) 10,000 times and then deriving a P value by dividing the number of permutations that were at least as extreme as the observed data by 10,000. The test was two-tailed; therefore, a significance level of 0.025 was used at each tail. The P values were corrected using the Benjamini-Hochberg method from the p.adjust function in R. Other tests of association were done using the chi-squared test. All the plotting was done in ggplot2 v.3.3.5 and the multi-panel plots were produced using cowplot v.1.1.0.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
All raw Illumina sequence data are available from the European Nucleotide Archive under accession no. PRJEB27342. The individual accession numbers for all sequenced isolates are in Supplementary Table 9. The metadata and phylogenetic analysis are available for download from the Microreact project https://microreact.org/project/KLEB-PAVIA; additional metadata is provided in Supplementary Tables

Code availability
The R code for the transmission analysis and permutation tests is available from https://github.com/harry-thorpe/SPARK_code. https://doi.org/10.1038/s41564-022-01263-0 Extended Data Fig. 8 | Alignment of bla VIM plasmids. Alignment of the MN783743 and CP034084 reference plasmids with the contigs identified in our data as carrying the bla VIM gene in IncA/C plasmids. Arrows represent the bla VIM gene (turquoise), conjugal transfer system (green), resistance genes (red), mobile elements (yellow), hypothetical proteins (plum), replication protein repA (orange), integrases (dark blue) and the mercury resistance operon (pink). The three bracketed SPARK_315 SCAI carriage isolates were obtained from the same faecal sample from an outpatient in Hospital 1 (H1). The plasmid was found in two hospitals (H1 and H2). Two of these (SPARK_315_C1 and SPARK_315_C2) were members of the same clonal lineage (SC152_ST2418), whilst SPARK_315_C3 was an unrelated isolate related to the hypervirulent lineage ST23 (SC299_ST23-2LV). The plasmids are however essentially identical within all three isolates (which is identical to CP034084), indicating within-patient transfer. SPARK_355_C1 is a diagnostic carriage sample from a rectal swab from a different inpatient in Hospital 1 corresponding to lineage K.pne_SC66_ST253 and also contains the same VIM plasmid. SPARK_1466_C1 also contains a very similar plasmid. This isolate corresponds to a different lineage (K.pne_SC195_ST606) and is a diagnostic isolate from a urine sample from inpatient in Hospital 2. SPARK_1816_ C1 is a K.mic diagnostic isolate (SC14) from a rectal swab from inpatient in Hospital 1 and contains also contains this plasmid. Finally, SPARK_1652_C1 is a K.gri diagnostic isolate (SC44), also from a urine sample from an outpatient in hospital 1. This isolate contains the same plasmid, but with a deletion near the resistance cassette.

Corresponding author(s): Edward Feil
Last updated by author(s): Sep 22, 2022 Reporting Summary Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value Noted. Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code Data collectionExcel 365 was used to compile and record the list of samples as they were being generated Data analysis Bruker Biotyper 3.1 was used to interpret the data from Maldi-ToF for species ID Trimmomatic v. 0.33 was used to trim the Illumina reads SPAdes v3.9.0 was used to generated de novo assemblies Prokka v1.12 was used to annotate the genome assemblies, and Prokka v1.4.5 was used to annotate the plamids Kleborate v2.0.0 was used to assign species and ST (and call resistance and virulence genes) from the assemblies PopPunk v2.0.2 was used to delimit sequence clusters (SCs) Snippy v4.6.0 was used for mapping Abricate v1.0.0 was used to assign AMR genes to contigs RapidNJ, (no version informaion available), and RaXML v8.2.8 were used for phylogenetic analysis R code for the transmission analysis and permutation tests has been deposited in github: https://github.com/harry-thorpe/SPARK_code For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A description of any restrictions on data availability -For clinical datasets or third party data, please ensure that the statement adheres to our policy All studies must disclose on these points even when the disclosure is negative.

Study description
The study focussed on the characterisation -by whole-genome sequencing-of a large collection of bacteria within to the genus Klebsiella. The aim was to examine the likely role of one-health reservoirs of Klebsiella in the emergence and spread of antimicrobial resistance.

Research sample
The Klebsiella isolates were sampled from multiple clinical, community, animal and environmental settings. Animal samples were from swabs only.

Sampling strategy
Most isolates were recovered by culturing on selective media (SCAI). Clinical isolates were acquired as part of ongoing surveillance projects in the Pavia region. The final sample size (3482 complete genomes) is one of the largest studies of its kind and is sufficient to draw inferences regarding transmission frequencies between different settings. No size calculation was carried out as we were not aiming to capture all the diversity within these populations.

Data collection
Metadata (place, time, source information) pertaining to all the samples (both positive and negative) were recorded by members of the team in Pavia. The sequence data was generated on the Illumina platform at the Wellcome Trust Sanger Institute in Cambridge, UK.
Timing and spatial scale The data were collected in and around the Northern Italian city of Pavia between June 2017 and November 2018.

Data exclusions
A small number of genome sequences were excluded on the basis that they did not pass QC.

Reproducibility
Multiple samples were taken from each source

Randomization
The samples were grouped by source (eg human, cow, river water, rhizosphere etc). These are discrete categories.

Blinding
The isolates were cultured without selecting for distinct species or resistance profile. Species ID by WGS was derived Independently from species ID by Maldi-ToF.
Did the study involve field work?

Yes No
Field work, collection and transport

Field conditions
Environmental parameters such as temperature and rainfall were not recorded.

Location
In and around the Northern Italian city of Pavia Access & import/export Permission was granted for access to farms, veterinary clinics and for clinical samples. All appropriate procedures were followed when transporting the isolates.
Disturbance NA Reporting for specific materials, systems and methods We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Recruitment
The samples included stool and rectal swabs from hospital inpatients and outpatients (four different hospitals) and from a nursing home; stool from healthy volunteers.
For patients, no selection was performed, and all the samples obtained by the research group through the clinical laboratory of the San Matteo Hospital in Pavia was analyzed, up to the monthly analytical capacity of the research Group.
For healthy volunteers, no selection was performed. The study was publicized by word of mouth and through the channels of the Hospital and of the University of Pavia. All the volunteers that contacted us were enrolled. Limited metadata was required (see above) and full anonymity was guaranteed.
Ethics oversight Note that full information on the approval of the study protocol must also be provided in the manuscript.