Genomic dissection of Klebsiella pneumoniae infections in hospital patients reveals insights into an opportunistic pathogen

Klebsiella pneumoniae is a major cause of opportunistic healthcare-associated infections, which are increasingly complicated by the presence of extended-spectrum beta-lactamases (ESBLs) and carbapenem resistance. We conducted a year-long prospective surveillance study of K. pneumoniae clinical isolates in hospital patients. Whole-genome sequence (WGS) data reveal a diverse pathogen population, including other species within the K. pneumoniae species complex (18%). Several infections were caused by K. variicola/K. pneumoniae hybrids, one of which shows evidence of nosocomial transmission. A wide range of antimicrobial resistance (AMR) phenotypes are observed, and diverse genetic mechanisms are identified (mainly plasmid-borne genes). ESBLs are correlated with the presence of other acquired AMR genes (median n = 10). Bacterial genomic features associated with nosocomial onset are ESBLs (OR 2.34, p = 0.015) and rhamnose-positive capsules (OR 3.12, p < 0.001). Virulence plasmid-encoded features (aerobactin, hypermucoidy) are observed at low prevalence (<3%), mostly in community-onset cases. WGS-confirmed nosocomial transmission is implicated in just 10% of cases, but is strongly associated with ESBLs (OR 21, p < 1 × 10−11). We estimate a 28% risk of onward nosocomial transmission for ESBL-positive strains vs 1.7% for ESBL-negative strains. These data indicate that K. pneumoniae infections in hospitalised patients are due largely to opportunistic infections with diverse strains, with an additional burden from nosocomially-transmitted AMR strains and community-acquired hypervirulent strains.

Healthcare-associated infections (HAIs) are common throughout the world, and in industrialised countries the burden of HAIs exceeds that of all other communicable diseases combined 1 . The major causative agents are opportunistic bacterial pathogens, which are generally viewed as commensals but can take advantage of the weakened immune system and altered microbiome of hospitalized patients to cause disease [2][3][4][5] . Particularly concerning are HAIs caused by Gram-negative bacteria including Klebsiella pneumoniae, which is intrinsically resistant to ampicillin and increasingly displays acquired resistance to multiple additional drugs (multidrug resistance, MDR). Of particular concern, this organism readily acquires extended-spectrum beta-lactamase (ESBL) or carbapenemase genes that confer resistance to third-generation cephalosporins or carbapenems, respectively, leaving very limited options for antimicrobial therapy 6 . K. pneumoniae is amongst the leading causes of HAIs in hospitals globally, including urinary tract infections (UTI), pneumonia, wound infections and sepsis 2,7 . Recent studies confirm it is also a leading cause of neonatal sepsis in Africa 8 and Asia 9 . K. pneumoniae is a clear opportunist, colonizing the human gut, nasopharynx and skin at high frequency. K. pneumoniae is the 'K' in the ESKAPE pathogens, the group of opportunistic pathogens that together account for the majority of clinically significant MDR HAIs 3,10 . ESBL and carbapenemase-producing (CP) K. pneumoniae combined make up the fastest-growing cause of drug-resistant infections in European hospitals 11 .
Asymptomatic K. pneumoniae colonization has been shown to be a source of HAIs, with attack rates estimated at 4-35% in colonized hospital inpatients [12][13][14][15][16][17] . Comparatively little is understood about K. pneumoniae pathogenesis, in contrast with closely related 'true' pathogens of the family Enterobacteriaceae, such as Salmonella or Shigella. Nearly all K. pneumoniae express the basic traits required for human infection, which are core to all strains: O antigen lipopolysaccharide (LPS) and polysaccharide capsule (K antigen) (encoded by diverse K and O loci), the siderophore enterobactin (ent locus), and type I and type III fimbriae (fim and mrk loci) 18,19 . So-called 'hypervirulent' K. pneumoniae clones, which are typically hypermucoid and aerobactin-producing and carry the K. pneumoniae virulence plasmid (Kp-VP) and serum-resistant capsular (K) types such as K1, K2 and K5 18,20,21 , are associated with community-acquired invasive infections such as liver abscess and ophthalmitis 22,23 . However, there is no evidence that such strains are more likely than others to cause opportunistic infections in the hospital setting. The acquired siderophore yersiniabactin (ybt locus) has been identified as a virulence factor relevant to nosocomial infection 22,[24][25][26] ; indeed, a recent study of infection risk amongst patients colonized with CP sequence type (ST) 258 K. pneumoniae confirmed carriage of yersiniabactin (ybt) and modified O antigen synthesis as independent bacterial risk factors for subsequent infection 26 . K. pneumoniae also shows a propensity for nosocomial spread within the hospital environment, with sinks, drains, medical devices and cleaning products all having been demonstrated as potential reservoirs of infection [27][28][29] . However, the relative contribution of patients' own gut bacteria vs nosocomially acquired bacteria to the burden of K. pneumoniae HAI remains unclear, as does the question of whether bacterial features contribute to propensity for nosocomial infection.
Here we aimed to dissect the burden of K. pneumoniae HAIs in a tertiary hospital in Australia, via whole genome sequencing (WGS) of all clinical isolates identified in the hospital microbiological diagnostic laboratory over a one-year period. WGS studies have previously unveiled extensive diversity in the global K. pneumoniae population, which comprises hundreds of phylogenetically distinct lineages with variable gene content 18,22 . Additional diversity is harboured by related species in the K. pneumoniae species complex (KpSC), which includes seven species and subspecies that are difficult to distinguish by MALDI-TOF or biochemical tests [30][31][32] . However, the implications of this population structure for the role of K. pneumoniae as an opportunistic pathogen in the hospital setting are unclear, because most WGS studies have focused on either (i) MDR (CP or ESBL) HAI or (ii) hypervirulent community-acquired infections. Each of these clinical manifestations is associated with just a subset of clonal lineages (a few dozen common CP lineages, and fewer than a dozen hypervirulent ones) 18,[33][34][35] . Hence WGS studies focused on these subsets of infections reveal little about the diverse agents underlying the general burden of opportunistic infections. WGS can also be used to identify nosocomial transmission clusters, although this has mainly been applied to investigation of CP or ESBL HAIs [36][37][38][39] , or restricted to blood isolates 18,40,41 , and so the overall contribution of nosocomial transmission to the total burden of K. pneumoniae HAI has not previously been well characterised.

Results
Infection burden. A total of 362 clinical isolates of KpSC were identified at the microbiological diagnostic laboratory during the 1-year study period, collected from 318 patients (Fig. 1 and Table 1). The patients were 55% female and ranged in age from 20 to 97 years, with a median age of 70 years. The median age for females was significantly higher than for males (75 vs 67, p = 0.001 using Wilcoxon rank-sum test, see Fig. 1d). The majority of patients had UTI (66%), 15% had pneumonia and 10% wound/tissue infections (Fig. 1a and Table 1). Ten percent of patients had disseminated infections (bloodstream and/or cerebrospinal fluid (CSF) isolates); most of these had no other isolate and the primary site of infection was not known (Table 1). UTIs were more common in females whilst pneumonia was more common in males (see Fig. 1d and Table 1). No statistically significant differences were observed in sex or age distribution for other infection types (Table 1). KpSC clinical isolates originated from specimens taken in 49 clinical units/wards; those contributing the greatest number of isolates were the emergency department (n = 72, majority causing UTIs (n = 54) plus 12 disseminated infections), ICU (n = 41, majority causing pneumonia (n = 25) plus four disseminated infections), and haematology ward (n = 15, majority causing pneumonia (n = 5) or disseminated infections (n = 7)). Sixteen isolates were collected in outpatient clinics (n = 15 UTI, n = 1 pneumonia).
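The age comparison above relies on a Wilcoxon rank-sum test. As an illustration only (the authors' analysis code is not described in this section), the Python sketch below implements the test with the large-sample normal approximation, a tie-corrected variance and no continuity correction; the function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, p-value). The variance is tie-corrected;
    no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    r1 = sum(average_ranks(pooled)[:n1])
    u1 = r1 - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))
```

A call such as `rank_sum_test(female_ages, male_ages)` would take two hypothetical lists of patient ages. Ages contain many ties, which is why the variance correction is included; exact or continuity-corrected variants (as offered by statistical packages) give slightly different p-values.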
Only 40% of patients had their first clinical KpSC isolate collected >2 days into their current hospital admission (i.e., meeting the standard definition of nosocomial onset) (Table 1). Taking into account prior contact with the hospital network in the last 1-12 months to ascertain the likely mode of acquisition (nosocomial: onset on day >2 and/or prior hospital admission within 30 days; HA: hospital or outpatient contact in the last 12 months and onset on day ≤2; CA: no such contact and onset on day ≤2; see Methods for details), we estimate that 49% of KpSC infections were nosocomial and a further 37% were HA. Just 13% of infections could be considered CA, of which 67% were UTIs and 19% wound infections (Fig. 1b). Forty percent of CA infections were admitted via the emergency department (n = 12 UTI, 3 disseminated and 2 wound infections) and 12% via ICU (n = 4 wound infections and 1 pneumonia). Pneumonia was significantly more common amongst nosocomially acquired infections (21%; vs 9.5% of HA and 7.1% of CA, p = 0.005 for test of difference in proportions), whilst wound infections were significantly more common amongst CA infections (19%; vs 5% of HA and 10% of nosocomial, p = 0.045 for test of difference in proportions) (see Fig. 1b).
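The acquisition categories above are defined by simple rules on onset day and prior contact. A minimal Python sketch of those rules follows; the `Episode` fields are hypothetical, outpatient specimens are assumed to have no admission day (so they cannot meet the day >2 criterion), and "12 months" is approximated as 365 days. The full rules in Methods may contain details not captured here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Episode:
    # Hypothetical fields; the study's actual data model is not described here.
    onset_day: Optional[int]                   # admission day of first isolate; None = outpatient
    days_since_prior_admission: Optional[int]  # None = no prior admission on record
    days_since_any_contact: Optional[int]      # inpatient or outpatient contact; None = none


def classify_acquisition(ep: Episode) -> str:
    """Return 'nosocomial', 'HA' or 'CA' following the rules quoted in the text."""
    late_onset = ep.onset_day is not None and ep.onset_day > 2
    recent_admission = (
        ep.days_since_prior_admission is not None
        and ep.days_since_prior_admission <= 30
    )
    if late_onset or recent_admission:
        return "nosocomial"
    # Onset on day <= 2 (or outpatient): check for contact in the last 12 months.
    if ep.days_since_any_contact is not None and ep.days_since_any_contact <= 365:
        return "HA"
    return "CA"
```

The order of the checks matters: the nosocomial criteria are tested first, so an episode with both early onset and a recent admission is classified as nosocomial rather than HA.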
The frequencies of AMR phenotypes per KpSC isolate are shown in Fig. 1c. Most isolates (63%) were susceptible to all drugs tested except ampicillin. The remaining 37% had acquired resistance to ≥1 drug tested, 21.3% were MDR (acquired resistance to ≥3 drug classes), and 19.6% were third-generation cephalosporin resistant (3GCR, of which 96% were MDR). At the patient level, four patients (1.4%) had ≥1 carbapenem (meropenem) resistant isolate and 46 (16%) had ≥1 3GCR isolate. 3GCR KpSC infections (using the first clinical isolate per patient) were significantly associated with nosocomial infection, whether defined as onset >2 days after admission (OR 1.12 and p = 0.01, vs day 0-2 and adjusting for patient age, sex and specimen type using logistic regression), or accounting for other recent admissions (OR 1.13 and p = 0.05, vs CA and adjusting as above). Whilst no temporal or seasonal trends in monthly isolation rates were detected, either for individual infection types or in aggregate (p > 0.1 using Bartels rank test), the 3GCR frequency showed an increasing trend (p = 0.036 using Bartels rank test for trend, see Fig. 1a), rising from a mean of 15% (range 10-21% per month) in the first nine months of the study to a mean of 34% (27-46% per month) … 1). WGS confirmed the majority of pure-culture isolates were K. pneumoniae (82.3%); the rest were K. variicola subsp. variicola (13.7%), K. quasipneumoniae subsp. similipneumoniae (3.7%) and K. quasipneumoniae subsp. quasipneumoniae (n = 1) (Table 2 and Fig. 2a). There were no significant associations between species and infection type, onset or acquisition; however, K. pneumoniae infections were more likely to display acquired resistance phenotypes than the other taxa combined (32.9% vs 14.5%; p = 0.01 using test for difference in proportions) (see Table 2).
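Several comparisons above use a "test for difference in proportions". The variant used (for example, with or without a continuity correction) is not stated in this section; the Python sketch below assumes the pooled two-sample z-test without continuity correction, and its inputs are hypothetical counts rather than the study's Table 2 denominators.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for a difference in proportions (two-sided).

    x1/n1 and x2/n2 are successes/trials in each group. Assumes the pooled
    proportion is strictly between 0 and 1 (otherwise the SE is zero).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

For example, 30/100 vs 10/100 gives z ≈ 3.54 and p < 0.001. A Yates-corrected chi-square test, which some packages use by default, would give a somewhat larger p-value for the same counts.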
We assessed genomic diversity of the clinical isolates in terms of phylogenetic lineages, gene content, plasmid content, AMR and acquired virulence determinants, and surface antigen synthesis loci. We inferred a maximum likelihood core-genome phylogeny (Fig. 2) and used this to cluster the 328 genomes into 182 lineages (138 K. pneumoniae, 35 K. variicola, 9 K. quasipneumoniae) representing distinct strain types that have been separated from other lineages over many years of evolution (see Methods). These correlated very closely with the 179 unique STs defined by 7-locus MLST (see ST line vs lineage line in Fig. 3a; assignments of individual genomes to lineages and STs are given in Supplementary Data 1 and can be explored against the phylogenetic tree using the interactive viewer at https://microreact.org/project/kaspahclinical). Twenty-six patients contributed more than one sequenced isolate. In most cases (n = 21, 81%), isolates from the same patient matched at the lineage level, consistent with a single infecting strain, and we classified these as a single infection episode (within-patient pairwise distances ranged from 0 to 16 SNVs, median 1 SNV). Of the remaining cases, three patients had one lineage identified in urine followed by a disseminated infection with a second lineage 2-19 days later; one patient had different MDR lineages (ST347 and ST491) detected in wound swab specimens collected from the same site three days apart; and one patient had one lineage (sensitive ST520) isolated from sputum and a second lineage detected 32 days later in both blood and sputum (norfloxacin-resistant ST111). We therefore classified these five patients as each having two distinct infection episodes, bringing the total number of genomically-defined infection episodes to 294. The cumulative counts of infection episodes, lineages and STs during the study are plotted in Fig. 3a.
All curves were nearly linear (linear regression, adjusted R² ≥ 0.97), and the slopes of the lineage and ST regression lines were 69% of the slope for total infection episodes (67% considering K. pneumoniae only, dashed lines in Fig. 3), indicating extensive diversity of strains underlying the total infection burden.
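The slope comparison above can be illustrated by accumulating, month by month, the number of infection episodes and the number of distinct lineages seen so far, then fitting an ordinary least-squares line to each curve. A minimal Python sketch, assuming episodes are supplied as hypothetical (month index, lineage) pairs; the time unit and fitting details used by the authors are not stated here.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def accumulation_slopes(episodes):
    """episodes: list of (month_index, lineage) tuples, one per infection episode.

    Returns (slope of cumulative episodes, slope of cumulative distinct lineages).
    """
    months = sorted({m for m, _ in episodes})
    seen, cum_eps, cum_lin = set(), [], []
    total = 0
    for m in months:
        this_month = [lin for mm, lin in episodes if mm == m]
        total += len(this_month)
        seen.update(this_month)
        cum_eps.append(total)
        cum_lin.append(len(seen))
    return ols_slope(months, cum_eps), ols_slope(months, cum_lin)
```

The ratio `lineage_slope / episode_slope` corresponds to the 69% figure reported above: a ratio near 1 would mean almost every new episode introduces a new lineage, while a ratio near 0 would mean new episodes mostly repeat lineages already seen.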
The mean number of genes per genome was 5031 (IQR 4936-5161), including 3,095 core genes that were present in all genomes. The pangenome comprised 23,075 genes in total, of which 4067 were very common (present in ≥95% of genomes), 4,001 were common (present in 5-95%), and 15,007 were rare (present in <5%), including 9845 very rare genes (present in <1%) (Supplementary Fig. 2). Gene content was largely conserved within lineages (median pairwise Jaccard similarity 0.93 for all genes including core genes, and 0.75 across common genes) but was quite distinct between lineages of the same species (median values 0.79 and 0.32 for all genes and common accessory genes, respectively) (Fig. 3b). We used multiple methods to assess plasmid load and diversity. Mob markers (associated with distinct plasmid mobility types 42 ) were detected in genomes from 89% of infection episodes (median n = 2, IQR 1-3), and rep markers (associated with distinct plasmid replicons 43 ) were detected in 84% (median n = 5, IQR 2-9). A total of 55 uniquely distributed rep markers were identified, including 25 that were present in ≥5% of infection episodes and 18 in ≥10% (Fig. 4). The numbers of unique mob and rep markers were significantly correlated across genomes (Pearson correlation coefficient = 0.614, p < 1 × 10−15), and both were significantly independently associated with total DNA in contigs predicted to be plasmid-derived (see Fig. 4 … isolates (96%). Carbapenemase genes were identified in K. pneumoniae ST340 (n = 2 infections, bla IMP-4 ) and ST231 (n = 3 infections, bla OXA-48 ). The majority of acquired AMR genes were predicted to be plasmid-borne (67.5%), and 8% were chromosomally located (the rest were unassignable, Fig. 3d). Chromosomally integrated AMR genes were identified in genomes from 12 infection episodes and confirmed by long-read sequencing (Supplementary Data 1).
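The within- and between-lineage gene-content comparison above can be sketched with pairwise Jaccard similarity (shared genes divided by genes present in either genome). The Python below is illustrative: the input layout and function names are hypothetical, and the "common" band is assumed to include 5% and exclude 95%, which may not match the authors' exact boundary handling.

```python
from collections import Counter
from itertools import combinations
from statistics import median


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def common_accessory_genes(genomes, lo=0.05, hi=0.95):
    """Genes present in at least `lo` and fewer than `hi` of all genomes."""
    counts = Counter(g for _, genes in genomes.values() for g in genes)
    n = len(genomes)
    return {g for g, c in counts.items() if lo <= c / n < hi}


def within_between_medians(genomes, restrict_to=None):
    """Median pairwise Jaccard similarity within and between lineages.

    genomes: dict genome_id -> (lineage, set of gene ids)
    restrict_to: optional gene set (e.g. common accessory genes) to compare on
    """
    within, between = [], []
    for (lin1, g1), (lin2, g2) in combinations(genomes.values(), 2):
        if restrict_to is not None:
            g1, g2 = g1 & restrict_to, g2 & restrict_to
        (within if lin1 == lin2 else between).append(jaccard(g1, g2))
    return median(within), median(between)
```

Restricting to common accessory genes removes the shared core, which explains why the text reports much lower between-lineage similarity for common genes (0.32) than for all genes (0.79).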
These included six genomes (three ST231, three ST340) with bla CTX-M-15 integrated in the chromosome via ISEcp1, and one (ST29) with an entire 243 kbp MDR plasmid fused with the chromosome, as previously reported 12 . Three chromosomes (K. variicola ST616 and ST1456) carried a novel acquired fosA homolog (closest relative fosA7, 9% nucleotide divergence) in addition to the intrinsic fosA gene; however, only one of these isolates (INF136) had an elevated fosfomycin MIC (128 mg/L, vs wildtype range 16-32 mg/L, measured using agar dilution), so it is not clear whether this gene confers resistance. Concerningly, the 16S rRNA methylase gene rmtB (which confers high-level resistance to aminoglycosides) was found in isolates from three patients, which displayed resistance to amikacin, gentamicin and tobramycin in addition to 3GC, amoxicillin-clavulanic acid, ticarcillin-clavulanic acid, tazobactam-piperacillin, ciprofloxacin, norfloxacin and trimethoprim/sulfamethoxazole. One was resistant to meropenem (MIC ≥ 16 mg/L) and the other two had elevated MICs (1 mg/L, compared to the wildtype <0.25 mg/L for 96.4% of isolates). These three isolates were all ST231 and harboured three quinolone-resistance mutations (GyrA-83I, GyrA-87Y, ParC-80I) and a bla CTX-M-15 insertion in the chromosome; an IncC plasmid carrying rmtB plus 11 other AMR genes including the ESBLs bla CTX-M-15 and bla VEB-1 , ermB (azithromycin resistance) and arr-2 (rifampicin resistance); and an IncL plasmid carrying the bla OXA-48 carbapenemase.
AMR phenotypes for key drugs were quite well predicted by known AMR determinants; however, there were instances of unexplained resistance for most drugs (Table 3). Of the 47 3GCR infections, 42 carried known ESBL genes (n = 33 bla CTX-M-15 (n = 3/33 also carried bla VEB-1 ), n = 6 bla CTX-M-14 , n = 1 bla CTX-M-3 , n = 1 bla CTX-M-62 , n = 1 bla SHV-12 ). The remaining five isolates were MDR and had tested positive for ESBL production in the diagnostic laboratory, but the corresponding genome sequences lacked ESBL genes and four lacked any acquired AMR genes. Re-testing of stocked cultures confirmed that four remained resistant to ceftriaxone (and were MDR), but one (INF255) had regained susceptibility to cephalosporins, fluoroquinolones, aminoglycosides, and trimethoprim/sulfamethoxazole. Another (INF018) was an … SNVs. We therefore speculate that all five isolates initially carried ESBL/MDR plasmids upon first isolation and testing, but these were lost during culture for DNA extraction (this would also account for most unexplained resistance for other drugs except trimethoprim).
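The genotype-phenotype concordance behind Table 3 uses two error types: a major error is a susceptible isolate in which a resistance determinant was identified, and a very major error is a resistant isolate in which no determinant was identified. A minimal Python sketch of that tally, assuming each episode has been reduced to a hypothetical `(resistant, determinant_found)` pair of booleans, with percentages taken over susceptible and resistant isolates respectively:

```python
def genotype_phenotype_errors(records):
    """records: iterable of (phenotype_resistant, determinant_found) booleans.

    Major error (ME): susceptible phenotype but a determinant was found.
    Very major error (VME): resistant phenotype but no determinant was found.
    """
    records = list(records)
    susceptible = [found for resistant, found in records if not resistant]
    resistant = [found for resistant, found in records if resistant]
    me = sum(susceptible)                       # True counts as 1
    vme = sum(1 for found in resistant if not found)
    return {
        "major_error": me,
        "major_error_pct": 100 * me / len(susceptible) if susceptible else 0.0,
        "very_major_error": vme,
        "very_major_error_pct": 100 * vme / len(resistant) if resistant else 0.0,
    }
```

Using the 3GCR figures from the text (42 resistant episodes with ESBL genes and 5 without), the very major error count is 5, about 10.6% of resistant episodes.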
Table 3 notes: based on most resistant isolate per infection episode (n = 294). Major error = number (%) of susceptible isolates in which a resistance determinant was identified; very major error = number (%) of resistant isolates in which a resistance determinant was not identified (i.e., unexplained resistance). Genetic determinants explaining ≥5% of resistance to the given drug or class are listed. *Note: aac(6')-Ib-cr reduces susceptibility to fluoroquinolones (Machuca et al.) but does not on its own raise the MIC above the breakpoint for clinical resistance; it was present in 60% of ciprofloxacin-resistant strains, but these all carried qnr genes and/or gyrA + parC mutations that could explain resistance; hence aac(6')-Ib-cr is not included in the list of genetic determinants explaining ciprofloxacin resistance. § Errors for amikacin resistance prediction (MIC ≥ 16 mg/L) shown in the table are based on excluding aac(6')-Ib-cr as a determinant, as we could not find any evidence that this specific allele confers resistance in Klebsiella (causative genes detected were rmtB and aac(6')-Ib4); including aac(6')-Ib-cr removes the two very major errors but results in 24 major errors due to isolates carrying the gene but with low MICs for amikacin (MIC 8 (n = 2), 4 (n = 7) or <2 (n = 15)).
The ybt locus encoding the acquired siderophore yersiniabactin was detected in isolates from 33% of K. pneumoniae infection episodes, associated with 48 lineages (Fig. 2), but was not detected in other KpSC. Fourteen ybt locus types were identified (including the plasmid-borne ybt 4 in 14 strains from 11 STs, as previously reported 24 ). Presence of ybt was not significantly associated with the presence of ESBL or other acquired AMR genes; however, the additional acquired virulence factors iuc, iro, rmp and clb were exclusively found in ybt + isolates, with overall frequencies of 2.7% (iuc) and 3.1% (iro, rmp, clb). Iuc, iro and rmp were mostly co-located, and restricted to five known hypervirulent clonal groups associated with virulence plasmids (CG86, CG23, CG66, CG420, CG91/subsp. ozaenae; Table 4). Isolates harbouring the complete rmp locus were confirmed to be hypermucoid via the string test (except for a single ST86 isolate). Three clb variants were identified in five STs (Table 4), located with ybt in ICEKp10.
… Table 1). Forty-one K loci (45%) were identified only once each (Supplementary Table 1). Eight KL types were found in ≥3% of infection episodes each (Supplementary Fig. 3); together these accounted for 33% of all infection episodes, including 57% of 3GCR cases (Fig. 3e). These top eight KL types were each associated with a dominant ST (Supplementary Fig. 3), suggesting that their comparatively high prevalence was driven by local clonal expansion (and potentially transmission) of specific lineages. However, all except one (KL21) were also found in multiple STs (4-7 STs each), indicating that these K types are also widely distributed across lineages (Supplementary Fig. 3). Operons for synthesis of the most common capsular polysaccharide sugar components, mannose (man) and rhamnose (rml), were present in the K loci of 64% and 29% of unique infection episodes, respectively (Supplementary Table 1); these were not significantly associated with infection type (p = 0.71 and p = 0.19, respectively, using the chi-square test). The theoretical coverage provided by multivalent vaccines targeting increasing numbers of KL types (ordered by KL frequency in the population) is shown in Fig. 3e. KL diversity was similar for each type of infection (Simpson's diversities between 0.93 and 0.97), and theoretical vaccine coverage was also similar (Fig. 3e).
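Simpson's diversity index can be read as the probability that two infection episodes drawn at random carry different KL types. The section does not state which form of the index was used; the Python sketch below assumes the finite-sample form 1 − Σ nᵢ(nᵢ − 1) / (N(N − 1)) (sampling without replacement), whereas the alternative 1 − Σ pᵢ² gives slightly lower values for small samples.

```python
from collections import Counter


def simpsons_diversity(labels):
    """Simpson's diversity, 1 - sum n_i(n_i - 1) / (N(N - 1)).

    labels: one KL type (or any category) per infection episode.
    Returns 0.0 for fewer than two observations.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
```

Computing this separately for the KL labels of UTI, pneumonia, wound and disseminated episodes would give one value per infection type, the quantity summarised above as lying between 0.93 and 0.97.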
Overall, 16 KL types (each with frequency ≥2%) would need to be targeted to cover 50% of infections (79% of 3GCR), and 31 KL types (each with frequency ≥1%) to cover 70% of infections (89% of 3GCR) (Fig. 3e). Enhanced coverage of 3GCR infections is attributable to high numbers of the ESBL-producing strains ST323 (KL21) and ST29 (KL30), which were transmitted in the hospital (discussed below).
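The coverage figures above come from ranking KL types by frequency and counting how many are needed before their combined share of episodes reaches a target. A Python sketch of that greedy ranking follows, with hypothetical function names; ties in frequency are broken by first appearance (the documented behaviour of `Counter.most_common`), which may differ from the authors' ordering. The second function applies the same overall ranking to a subset, analogous to the 3GCR coverage figures.

```python
from collections import Counter


def kl_types_needed(kl_labels, target):
    """Smallest number of KL types, taken in descending frequency order, whose
    combined share of episodes reaches `target` (a fraction in [0, 1])."""
    counts = Counter(kl_labels)
    total = sum(counts.values())
    covered = 0
    for k, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered / total >= target:
            return k
    return None  # target not reachable


def coverage_of_subset(kl_labels, subset_labels, k):
    """Share of `subset_labels` covered by the top-k KL types of the whole population."""
    top = {kl for kl, _ in Counter(kl_labels).most_common(k)}
    return sum(kl in top for kl in subset_labels) / len(subset_labels)
```

Because the ranking is fixed by the whole population, a subset dominated by a few expanding clones (such as the ESBL-producing ST323 and ST29 strains) can reach higher coverage with the same number of types.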
Species hybrids. As hybridization between KpSC members has been reported previously 22 , we screened our genome collection for evidence of cross-species hybridization (see Methods) and identified 12 K. variicola clinical isolates whose genomes harboured imports of between ~100 kbp and ~1 Mbp of sequence from K. pneumoniae. These represent 12 unique infection episodes and 8 lineages, i.e., 27% of K. variicola infections and 23% of K. variicola lineages. Four further hybrids were identified amongst isolates reported previously from screening swabs at the same hospitals 12,31 . Ten of the hybrids belonged to K. variicola ST681 (3 UTI, 2 respiratory, 1 disseminated, 1 unknown, 3 throat swabs). One clinical respiratory isolate and two subsequent screening throat swabs were isolated from a single ICU patient. Genomic comparisons with publicly available ST681 genomes suggest that our ST681 isolates were in fact hybrids of K. variicola ST681 and K. pneumoniae (see Methods and Fig. 5). One isolate (INF232; from a woman in her 90s with UTI at Hospital D) comprised an ST681 K. variicola genome backbone with a 281 kbp recombinant region whose sequence closely matched K. pneumoniae (≤0.5% divergence, Fig. 5a). This isolate harboured KL143 (man-positive) and a truncated form of the O3/O3a locus (broken by an insertion of IS903B and likely non-functional). The nine other local ST681 isolates were very closely related to one another (0-7 pairwise SNVs) and shared with INF232 the 281 kbp K. pneumoniae import and also a second recombination import of 311 kbp, which spanned the K and O biosynthesis loci and resulted in import of intact KL10 (man/mannose-positive) and O1/O2v2 (O2afg) loci. Again, the imported region showed close homology (≤0.5% divergence) with K. pneumoniae, in which KL10 was originally described (Fig. 5a).
The other six hybrids all included recombinant regions spanning the K locus, resulting in import of various capsule loci from K. pneumoniae (see Fig. 5b-d). Five were associated with infections (UTI, respiratory, wound, sepsis) and one was from a rectal screening swab in the ICU study 31 . Four of these hybrids belonged to ST925, each comprising a K. variicola ST925 backbone with a different recombinant block spanning the K locus, between 90 and 565 kbp in size, apparently imported from K. pneumoniae (≤0.5% divergence; see Fig. 5b) and encoding distinct K and O types (KL9/ O1, KL28/O3, KL102/O1, KL169/OL104). These strains, isolated from four different patients at Hospital A, differed from one another by >12,000 SNVs in the non-recombinant backbone regions, confirming they were not related to one another by recent local transmission. The other two hybrids were novel singleton STs (ST3095 and ST3060), also comprising K. variicola with one or two imported regions from K. pneumoniae (393 to 1043 kbp in size, Fig. 5c, d).
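The hybrid-detection Methods are not included in this section. Purely as an illustration, assume the per-window divergence of a query genome to a K. pneumoniae reference and to a K. variicola reference has already been computed (for example from whole-genome alignments). Windows within the ≤0.5% divergence quoted above for K. pneumoniae, and closer to it than to K. variicola, can then be merged into candidate import blocks. The function name, window size and 50 kbp minimum block length are hypothetical parameters, not the authors' settings.

```python
def find_imports(div_kp, div_kv, window_bp, max_div=0.005, min_block_bp=50_000):
    """Merge consecutive K. pneumoniae-like windows into candidate import blocks.

    div_kp, div_kv: per-window divergence (fraction) of the query genome to a
    K. pneumoniae and a K. variicola reference, listed in genome order.
    Returns (start_bp, end_bp) blocks at least min_block_bp long.
    """
    blocks, start = [], None
    for i, (dp, dv) in enumerate(zip(div_kp, div_kv)):
        kp_like = dp <= max_div and dp < dv
        if kp_like and start is None:
            start = i  # open a new block
        elif not kp_like and start is not None:
            blocks.append((start * window_bp, i * window_bp))
            start = None  # close the current block
    if start is not None:  # block runs to the end of the sequence
        blocks.append((start * window_bp, len(div_kp) * window_bp))
    return [(s, e) for s, e in blocks if e - s >= min_block_bp]
```

This simplification treats the chromosome as linear (a block spanning the origin would be split in two) and assumes both references share the query's coordinate system; it would not, on its own, distinguish imports from regions of unusually low divergence between the two species.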
Genomics-informed understanding of disease burden. Of the total 182 lineages associated with 294 unique infection episodes, 139 (76.4%) were unique to an individual patient. These singleton lineages accounted for nearly half (47.3%) of all infection episodes; these infections most likely originated from the patients' pre-existing gut microbiome. The remaining infection episodes were associated with 43 lineages that were detected in multiple patients, including 21 'common' lineages (11.5% of all detected) that were each isolated from ≥3 patients and accounted for 37.8% of all infection episodes (labelled in Fig. 2). These comprised 20 K. pneumoniae lineages and the hybrid K. variicola ST681 cluster. Isolates belonging to the common lineages were significantly and independently positively associated with ESBL, ybt, man-positive K loci, and nosocomial onset, and negatively associated with non-K. pneumoniae species and rml-positive K loci (Fig. 6a).
The frequency of the common lineages could potentially reflect nosocomial transmission, or a higher propensity to cause disease in colonized patients. We defined probable nosocomial transmission clusters as groups of genomes with ≤25 pairwise SNVs, isolated from the same hospital within 45 days and with plausible epidemiological links (Fig. 6b, see Methods). This identified 12 clusters of 2-9 patients each, involving 11 STs and associated with 41 infection episodes (14%) (Supplementary Table 2). Mean pairwise distance between clustered isolates was 0.7 SNVs (median 0, IQR 0-0, range 0-22). As expected, these infections were significantly associated with onset several days into the hospital stay (median onset day 4 for infections in transmission clusters, vs day 1 for other infections, p = 0.007 using Wilcoxon test). Notably, one of these clusters involved the ST681 hybrid strain, which infected six ICU patients over a 2.5-month period. Patient age was independently associated with transmission (OR 0.98 [95% CI, 0.96-0.996], p = 0.02 in multivariable logistic regression model), but patient sex and bacterial virulence factors were not (Supplementary Table 3). Two-thirds of the transmission clusters involved ESBL+ strains (Fig. 6b); by comparison, just 4/139 (2.9%) of singleton lineages were ESBL+ (OR 61 [95% CI, 11-422], p < 1 × 10−7 for association between ESBL+ and transmission, using Fisher's exact test). ESBL carriage was a strong predictor of onward transmission of a lineage: we estimate a crude probability of onward nosocomial transmission of 28% (n = 8/29 [95% CI, 11-44%]) for unique ESBL+ strains and 1.7% (n = 4/236 [95% CI, 0-3.3%]) for ESBL-negative strains.
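The crude transmission probabilities above are simple binomial proportions. A normal-approximation (Wald) interval, sketched below in Python, reproduces the reported bounds: about 11-44% for 8/29, and about 0.05-3.3% for 4/236 (reported as 0-3.3%). This agreement suggests, but does not confirm, that the Wald method was used; the function name is hypothetical.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Proportion and normal-approximation (Wald) 95% CI, clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

Wald intervals behave poorly for small counts or proportions near 0 (as for 4/236); Wilson or exact (Clopper-Pearson) intervals are common alternatives and would give a somewhat wider, asymmetric interval here.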
Overall, probable transmission clusters accounted for 55% of all ESBL+ infection episodes and 6.1% of ESBL-negative infection episodes (Fig. 6c). Assuming the first clinical isolate from each cluster represents the index case, this implies that 29 infection episodes (9.9%), including 21 ESBL+ infections (45%), resulted from nosocomial transmission (note this is a lower limit, as it is possible that the first clinical isolate was also acquired in the hospital from an unsampled source, such as asymptomatic colonization of another patient or staff member, or an environmental reservoir). Transmission-linked ESBL+ infections occurred throughout the study period but were concentrated between December 2013 and February 2014 (Fig. 6d), during which time they accounted for 88% of ESBL+ infections (vs 38% in earlier months, p = 0.005 using proportion test). This was associated with clusters of ST29 (n = 8 patients) and ST323 (n = 4 patients) bearing the same bla CTX-M-15 plasmid (described previously 12 ), and the highly resistant ST231 strain noted above (n = 3 patients).
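The cluster definition given earlier (≤25 pairwise SNVs, same hospital, isolated within 45 days) and the index-case assumption can be sketched as single-linkage clustering followed by a count of non-index episodes. This Python sketch is an assumption about how such rules might be applied: it links any pair meeting all three thresholds and merges links transitively, and it does not model the review of epidemiological links described in Methods. The names `transmission_clusters` and `secondary_episodes` are hypothetical.

```python
from itertools import combinations


def transmission_clusters(isolates, snv, max_snv=25, max_days=45):
    """Single-linkage clusters of episodes meeting the SNV, hospital and time rules.

    isolates: dict episode_id -> (hospital, collection_day)
    snv: function (id_a, id_b) -> pairwise SNV distance
    Returns clusters of >= 2 episodes, each sorted by collection day.
    """
    parent = {i: i for i in isolates}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(isolates, 2):
        (hosp_a, day_a), (hosp_b, day_b) = isolates[a], isolates[b]
        if hosp_a == hosp_b and abs(day_a - day_b) <= max_days and snv(a, b) <= max_snv:
            parent[find(a)] = find(b)  # union the two groups

    groups = {}
    for i in isolates:
        groups.setdefault(find(i), []).append(i)
    return [sorted(g, key=lambda i: isolates[i][1]) for g in groups.values() if len(g) > 1]


def secondary_episodes(clusters):
    """Episodes attributed to transmission, treating the earliest in each cluster as index."""
    return sum(len(c) - 1 for c in clusters)
```

With the counts reported above (12 clusters containing 41 episodes), treating one episode per cluster as the index leaves 41 − 12 = 29 episodes attributed to transmission. Note that single linkage can join two isolates more than 45 days apart through an intermediate case, which may or may not match the authors' intent.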
Only 38 infections were classed as true community-onset, and these were not significantly different from healthcare linked infections in terms of age, sex, AMR, hypervirulence determinants, or K/O types (Supplementary Table 4a). Nosocomial onset of infection (i.e., day 3 or later of hospital stay, vs earlier or outpatient onset) was significantly positively associated with male sex, ESBL carriage and rml-positive capsule, independently and in a multivariable model (see Supplementary Table 4b). Similar results were observed when including those with a prior hospital admission within 30 days in the definition of nosocomial onset (Supplementary Table 4c).

Discussion
Here we analysed all clinical isolates identified as K. pneumoniae in a hospital clinical microbiology laboratory over a one-year period (Fig. 1 and Table 1), and found remarkable genomic diversity in the underlying population of organisms (Figs. 2 and 3). Consistent with previous studies, we found that 19% of isolates identified as K. pneumoniae by MALDI-TOF in fact belonged to other common members of the wider K. pneumoniae complex (Table 2) 12,22,30,31 . However, even amongst the isolates confirmed as K. pneumoniae, in this single 1-year local snapshot of disease-associated strains, we detected huge genetic diversity in the form of 138 phylogenetic lineages bearing 78 distinct capsular biosynthesis loci (half of all K loci ever described), 60 acquired AMR genes and 55 plasmid replicons, with just 80% of genes shared pairwise between lineages (Figs. 2 and 3).
The sheer scale of genomic diversity associated here with clinical disease supports the view of K. pneumoniae as a classic opportunistic pathogen, namely that: (i) any member of the population has the potential to cause disease in hospitalized patients whose underlying health is sufficiently compromised; and (ii) much of the hospital-associated disease burden stems from extraintestinal 'escape' by the patients' own colonizing strains, rather than acquisition of the bacteria through nosocomial transmission 12,31,45 . Indeed, only 10% of infections in this study were attributed to WGS-supported nosocomial transmission. Furthermore, our data suggest that most of the common lineages were not transmitted within the hospital system; rather, they were detected in multiple patients because they circulate widely in the human population. Indeed, in many cases these lineages showed evidence of global dissemination. The reasons for the apparent success of these global clones within the human host population are not yet clear; however, as a group they were enriched for ESBL genes and the ybt locus, as well as man-positive and rml-negative capsule types. Notably, many of these lineages (including ST17, ST35, ST37, ST111, ST629, ST661) are amongst those frequently detected in food animals (cows, pigs and/or poultry) [46][47][48][49] , so they may constitute animal-adapted strains to which humans are frequently exposed via the food chain. Our data reveal a clear association between AMR and nosocomial onset (using either definition). Notably, whilst nosocomial transmission added relatively little to the overall infection burden (~10%), we estimate that it roughly doubled the burden of ESBL infections. In particular, the rise in overall ESBL frequency observed between December 2013 and February 2014 (Fig. 1a) can be attributed to transmission of ST323, ST29 and ST231 during this period (Fig. 6d).
Consistent with this, we estimate that the crude risk of transmission resulting in secondary infection(s) was low for non-ESBL infections (1.7%; 95% CI, 0-3.3%) but substantial for ESBL infections (28%; 95% CI, 11-44%). These observations support the notion that interventions aimed at preventing cross-transmission in hospitals (e.g., hand hygiene, or seek-and-contain approaches to CP infections) could substantially reduce the total burden of AMR infections. However, the data also suggest that the underlying burden of opportunistic K. pneumoniae infections, which originate from diverse strains present in the patients' own gut microbiomes, may remain high unless this source of infection is specifically targeted (e.g., by colonization or colonization-density screening) 50 .
One caveat of these analyses is the use of simple genetic and temporal distance rules to define WGS-supported nosocomial transmission (see Methods). However, the transmission clusters identified using these simple rules are concordant with those we identified previously in related contemporaneous studies 12,31 . Those studies incorporated patient movement data, as well as carriage screening isolates, to detect silent transmission in the Hospital A ICU 31 and in geriatric wards of Hospital C 12 . They found strong evidence for transmission of ST231, CG323 and ST681 in the ICU 31 , and for transmission of CG29, CG323 and ST340 more widely in Hospital A. These transmission events also accounted for all instances of MDR K. pneumoniae colonization and infection detected at Hospital C, to which Hospital A geriatric patients are often referred for longer-term care 12 . Similar detailed analyses of ST27, ST35, ST111, ST133, ST412 and ST792 in the Hospital A ICU found no evidence to support intrahospital transmission of these clones, consistent with the analysis in the present study 31 . Hence, we consider the lack of detailed patient movement data to confirm transmission of the novel clones identified in the present study to be a minor limitation. Another limitation of the study is the reliance on stored clinical isolates for WGS. This allows the isolate to evolve during storage and passage, between the initial identification and susceptibility testing in the clinical laboratory and the later subculture and DNA extraction for WGS. Indeed, we identified five cases of probable plasmid loss, based on comparison of the initial susceptibility data with the later WGS data. Additionally, reliance on single representative isolates means that we were unable to assess whether patients were co-infected with multiple KpSC strains. Five of the 26 patients from whom multiple isolates were captured had distinct lineages identified by WGS.
We defined these as distinct infection episodes; however, it is possible that some of these instances represent a single prolonged episode of co-infection with two distinct strains, whereby both strains were present in both specimens but a different strain happened to be picked for storage from each specimen. Even if all five of these patients actually had a single episode of co-infection rather than multiple unique infection episodes, this would reduce the total number of unique episodes by only 1.7%, which would have little impact on the overall picture of diversity or on associations with clinical features.
Besides AMR, we noted some significant associations between other bacterial factors and infection traits. Virulence plasmid-encoded loci (aerobactin, salmochelin, rmp) were associated with known hypervirulent clones and with community onset of infection, often with diagnosis made upon presentation to the emergency department or ICU (Table 4). The ybt locus was associated with common lineages (Fig. 6a). This is consistent with (i) previous observations that ybt is enriched amongst clinical infection isolates compared with asymptomatic carriage isolates (>30% vs <10%) 22,24,51 ; (ii) a recent report that ybt-positive ST258 have a higher attack rate than ybt-negative ST258 in colonised patients 26 ; and (iii) the known mechanism by which yersiniabactin can enhance the potential for extraintestinal infection, by evading Lcn2-mediated host immunity 25,52 . Notably, the presence of the man operon in the K locus (correlated with presence of mannose in the expressed capsule 53 ) was also associated with common lineages (Fig. 6a). Mannose-containing capsules have been shown to be recognised by the mannose receptor on human and murine macrophages, promoting clearance and resulting in lower virulence (higher LD50) in a murine infection model 54 . Hence, we hypothesise that any advantage conferred by man+ K loci relates to the process of colonization rather than infection. Consistent with this, the overall prevalence of man+ K loci in our clinical isolates (64%) was similar to that in a recently published collection of n = 464 community carriage isolates from Norway 51 (65%). Presence of the rml operon in the K locus (expected to produce a rhamnose-containing capsule 53 ) was negatively associated with common lineages (Fig. 6a) and positively associated with nosocomial onset (using either definition, Supplementary Table 4b, c), but not with nosocomial transmission.
Indeed, the frequency of rml+ was particularly high amongst patients whose first clinical isolate was collected at least a week into their hospital stay (41% vs 23%; OR 2.3, 95% CI, 1.3-4.2, p = 0.003 using Fisher's exact test). We therefore hypothesise that K. pneumoniae with rhamnose-containing capsules have reduced virulence, as they are apparently less able to establish an infection until the patient's condition has deteriorated sufficiently in hospital. Consistent with this interpretation, the frequency of rml+ K loci was low (10%) amongst infections diagnosed in the emergency department.
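The rml+ comparison above rests on a 2x2 table (rml+ vs rml-negative, by whether the first isolate was collected before or after day 7) analysed with Fisher's exact test. A minimal pure-Python sketch of the two-sided test is given below for readers wishing to reproduce such comparisons; the function names are illustrative, and because the underlying cell counts are not reported here, none are assumed. Note that R's fisher.test reports the conditional maximum-likelihood odds ratio, which can differ slightly from the simple cross-product ratio computed in this sketch.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """Probability that the top-left cell equals `a` when all margins are fixed."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    p_value = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p_x = hypergeom_pmf(x, row1, col1, n)
        # Small relative tolerance so that tables tied with the observed one
        # are not dropped because of floating-point rounding.
        if p_x <= p_obs * (1 + 1e-7):
            p_value += p_x
    return min(1.0, p_value)


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio (a*d)/(b*c); undefined if b or c is zero."""
    return (a * d) / (b * c)
```

For example, the table [[3, 1], [1, 3]] gives p = 34/70 (about 0.486) and a cross-product odds ratio of 9.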
In contrast to the findings above, we found no evidence of association between infection site and genomic features of the bacteria; rather, infection site was associated with patient demographics and infection onset. Specifically, UTIs were significantly associated with age and female sex, respiratory infections with nosocomial onset, and wound infections with community onset. This is consistent with there being no genetically determined tissue tropism (or with effects of bacterial factors too small for this study to detect), and suggests that the outcome of the host-pathogen interaction is primarily determined by the patient's health status and vulnerabilities, consistent with the concept of opportunistic infection. The likely exception is the so-called 'hypervirulent' strains that express virulence plasmid-encoded siderophores and hypermucoidy 18,20,21 ; such strains were rare in our study, but were mostly detected in community-onset infections (bold in Table 4). The spectrum of K. pneumoniae disease identified here (~two-thirds UTI, 15% respiratory, ~10% wound, ~10% sepsis) mirrors patterns in other hospitals around the world 55 , and the genomic diversity we uncovered is consistent with WGS studies of unselected bloodstream isolates 18,40,41 and even of AMR-selected collections 39,55,56 ; hence, our results are likely to be broadly representative of the K. pneumoniae clinical picture in other hospital settings.
Aside from K. pneumoniae, the clinical significance of the other species in the K. pneumoniae complex remains open to investigation, although it is clear that both K. variicola and K. quasipneumoniae are capable of causing disease in hospitalized patients 30,57,58 . However, neither of these species was particularly prevalent in our study (18% of all infections, combined). Interestingly, within just a single year of collection in our hospital, we detected 12 K. variicola/K. pneumoniae hybrid isolates (8 unique strains or variants), all of which resulted from import of K. pneumoniae sequence into a K. variicola background, including import of a K. pneumoniae capsular biosynthesis locus (Fig. 5). Large-scale recombination between K. pneumoniae lineages, centred mainly around the K locus, has been reported several times; the best-known example is the emergence of the carbapenemase-associated clone ST258 59,60 . However, to our knowledge, very few species hybrids have been reported previously 22,46,61 , and those reported appear to be sporadic isolates, consistent with the expectation that cross-species hybrids have compromised fitness. In the present study, by contrast, the KL10 ST681 hybrid strain showed evidence of local transmission in the hospital, causing silent gut colonization in three patients and various infections in six patients, demonstrating that it is fit to transmit. Notably, the imported capsular synthesis locus KL10 is relatively common in K. pneumoniae clinical isolates 62 : a recent study of the >13,000 publicly available K. pneumoniae genomes in GenBank reported frequencies of 2.1% amongst human blood isolates, 1.5% amongst other human isolates, and 0.3% and 0.5% amongst animal and environmental isolates, respectively 63 . We therefore hypothesise that acquisition of KL10 may have contributed to the hybrid strain's ability to colonise or infect humans.
Overall, we show that by sequencing all clinical isolates, we can gain a much more nuanced view of the burden of K. pneumoniae infections. WGS showed that the burden of infection in this setting resulted mainly from diverse strains present in the patients' own gut microbiomes; these included a very low frequency (<3%) of hypervirulent strains and were enriched for a small number of successful lineages associated with AMR, yersiniabactin and man+ K loci. On top of this, there was an additional burden (~10%) resulting from nosocomial transmission, which was strongly associated with ESBLs.

Methods
Ethics approval and consent to participate. This project complies with all relevant ethical regulations. Ethical approval for the project was granted by the Alfred Hospital Ethics Committee in Melbourne, Australia (Project numbers #550/12 and #526/13). A consent waiver was granted for the inclusion of limited patient data related to clinical isolates, extracted from hospital and laboratory records by hospital staff who normally have access to the data, and shared in deidentified form with research staff for analysis in this project.
Setting and sample collection. The Klebsiella Acquisition Surveillance Project at Alfred Health (KASPAH) was conducted over a one-year period from April 1, 2013 to March 31, 2014 in Melbourne, Australia. All clinical isolates identified as K. pneumoniae by the Alfred Health microbiological diagnostic laboratory as part of routine care were included in the study if they were reported by the laboratory as associated with infection (see details below). This laboratory serves four hospitals in the Alfred Health Network. All patients from whose specimens the isolates were cultured were recruited into the study (consent was not required). Relevant clinical data were extracted from the laboratory and hospital records at the time of recruitment (date of specimen collection, specimen type, referring hospital, patient age and gender). Clinical review of hospital records was undertaken retrospectively for all participants, 4 years post-recruitment, in order to classify each isolate as community-acquired (CA), healthcare-associated (HA) or nosocomial in origin (details below).
Clinical isolates. Clinical isolates were included in the study when the treating physician referred a specimen to the diagnostic service of the microbiology laboratory for analysis based on clinical suspicion of infection, and K. pneumoniae was then identified and reported as a pathogen according to the in-house standard operating procedures as previously described 31 . Species identification was performed using matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry (Vitek MS ® , bioMérieux, Marcy-l'Étoile, France). All K. pneumoniae identified from sterile sites (blood cultures, cerebrospinal fluid, deep tissue biopsies, pleural fluid) and from cultured prosthetic material (e.g., central venous catheters) were reported as pathogens, provided that other enteric or skin flora were not detected. For other specimen types, a K. pneumoniae infection was deemed present if sufficient concentrations of neutrophils were seen on microscopy or Gram stain, and K. pneumoniae was the sole organism present, or the predominant organism if the sample was also expected to contain normal flora. In neutropenic patients, K. pneumoniae could be reported as an infection in the absence of neutrophils. The vast majority of isolates came from wound swabs, sputum samples or urine samples. Where K. pneumoniae was identified in urine samples or wound swabs along with other enteric bacteria (e.g., E. coli), the laboratory reported this as mixed enteric flora; K. pneumoniae isolates from such specimens were excluded from the study. Wound swabs were collected when signs of infection were present (i.e., purulent discharge) and reported as K. pneumoniae only when other enteric or skin bacteria were not also identified. K. pneumoniae was reported as clinically significant when it was the predominant isolate from a well-collected sputum specimen.
When samples were clearly from the oral cavity (indicated by the presence of saliva macroscopically, squamous epithelial cells microscopically, and mixed oral flora on culture), a small amount of K. pneumoniae was reported not as pathogenic but as 'mixed oral/enteric flora', and such isolates were not included in this study. Laboratory data entry was done using Microsoft Access 2013 and Excel 2013.
Three distinct infection acquisition statuses were defined: community-acquired, nosocomial and healthcare-associated. Community-acquired (CA) infections were defined by isolation of K. pneumoniae from an outpatient, or on day 0, 1 or 2 of the current admission as an inpatient, with no recorded prior contact with the Alfred Health Network (either as an inpatient or outpatient) in the previous 12 months. Nosocomial infections were defined by isolation of K. pneumoniae on day 3 or later of the current inpatient admission, or with a recent inpatient admission (in the last month). K. pneumoniae infections in patients not meeting the criteria for nosocomial infection but having some recorded contact with Alfred Health in the last 12 months (including as an inpatient 1-11 months prior to the current infection, or with one or more prior outpatient visits up to 12 months prior to the current infection) were considered healthcare-associated (HA).
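The three acquisition statuses defined above can be expressed as a short decision rule. The sketch below is illustrative only: the `Episode` fields and the function name are hypothetical, and the "last month" and "12 months" windows are approximated as 30 and 365 days, which the definitions themselves do not specify.

```python
from dataclasses import dataclass
from typing import Optional

# Assumptions: "last month" ~ 30 days, "previous 12 months" ~ 365 days.
RECENT_ADMISSION_DAYS = 30
PRIOR_CONTACT_DAYS = 365


@dataclass
class Episode:
    inpatient: bool
    # Day of the current admission on which the specimen was collected
    # (day 0 = admission day); None for outpatients.
    admission_day: Optional[int]
    # Days since the most recent prior inpatient admission; None if none recorded.
    days_since_prior_admission: Optional[int]
    # Days since the most recent prior outpatient visit; None if none recorded.
    days_since_outpatient_visit: Optional[int]


def _within(days: Optional[int], limit: int) -> bool:
    return days is not None and days <= limit


def classify_onset(ep: Episode) -> str:
    # Nosocomial: day 3 or later of the current inpatient stay,
    # or an inpatient admission within the last month.
    late_inpatient = (ep.inpatient and ep.admission_day is not None
                      and ep.admission_day >= 3)
    if late_inpatient or _within(ep.days_since_prior_admission, RECENT_ADMISSION_DAYS):
        return "nosocomial"
    # HA: not nosocomial, but some recorded inpatient or outpatient
    # contact with the network in the last 12 months.
    if (_within(ep.days_since_prior_admission, PRIOR_CONTACT_DAYS)
            or _within(ep.days_since_outpatient_visit, PRIOR_CONTACT_DAYS)):
        return "HA"
    # CA: outpatient or day 0-2 of admission, with no contact in the last 12 months.
    return "CA"
```

The order of the checks matters: the nosocomial criteria are evaluated first, so HA is assigned only to episodes that fail them, as the definition requires.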
Antimicrobial susceptibility testing. All clinical isolates were subjected to antimicrobial susceptibility testing in the clinical microbiological diagnostics laboratory upon isolation (i.e., in 2013-2014) using the Vitek2 GN card, and results were interpreted using 2020 EUCAST breakpoints. Antimicrobials tested were: ampicillin (to which K. pneumoniae is intrinsically resistant via chromosomally encoded beta-lactamases), amoxicillin-clavulanate, ticarcillin-clavulanate, piperacillin-tazobactam, cefazolin, ceftazidime, ceftriaxone, cefepime, norfloxacin, ciprofloxacin, amikacin, gentamicin, tobramycin, trimethoprim and trimethoprim/sulfamethoxazole. If the susceptibility pattern suggested that an ESBL enzyme was present, this was confirmed using the method of Jarlier 64 . Isolates were classified as MDR based on acquired resistance to three or more classes of antimicrobials (i.e., not counting ampicillin resistance, which is intrinsic), as previously defined 65 . Selected stored isolates were re-tested in 2021 to investigate issues identified upon sequencing: INF034, INF155, INF167, INF255 and INF018 (whose genomes were ESBL-negative but which were ceftriaxone resistant) were re-tested with Vitek2 GN cards; INF307, INF048 and INF136 carried novel fosA genes in the chromosome and were subjected to agar dilution in triplicate to determine the fosfomycin MIC.
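The MDR rule (acquired resistance to three or more antimicrobial classes, ignoring intrinsic ampicillin resistance) can be sketched as a class count. The grouping of the tested agents into classes below is a simplified, hypothetical mapping for illustration; the scheme referenced above 65 may use finer categories (e.g., separating cephalosporin generations), so class counts from this sketch may differ from the published classification.

```python
# Hypothetical, simplified drug-to-class mapping for the agents listed above.
# Ampicillin is deliberately omitted: K. pneumoniae resistance to it is intrinsic.
DRUG_CLASS = {
    "amoxicillin-clavulanate": "penicillin/beta-lactamase inhibitor",
    "ticarcillin-clavulanate": "penicillin/beta-lactamase inhibitor",
    "piperacillin-tazobactam": "penicillin/beta-lactamase inhibitor",
    "cefazolin": "cephalosporin",
    "ceftazidime": "cephalosporin",
    "ceftriaxone": "cephalosporin",
    "cefepime": "cephalosporin",
    "norfloxacin": "fluoroquinolone",
    "ciprofloxacin": "fluoroquinolone",
    "amikacin": "aminoglycoside",
    "gentamicin": "aminoglycoside",
    "tobramycin": "aminoglycoside",
    "trimethoprim": "folate pathway inhibitor",
    "trimethoprim/sulfamethoxazole": "folate pathway inhibitor",
}


def resistant_classes(resistant_drugs):
    """Return the set of antimicrobial classes with acquired resistance.

    Drugs not in the mapping (including ampicillin) are ignored.
    """
    return {DRUG_CLASS[d] for d in resistant_drugs if d in DRUG_CLASS}


def is_mdr(resistant_drugs, min_classes=3):
    """MDR if acquired resistance spans at least `min_classes` classes."""
    return len(resistant_classes(resistant_drugs)) >= min_classes
```

Counting classes rather than drugs means that resistance to several cephalosporins contributes only once, which is the intent of a class-based MDR definition.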
DNA extraction and whole genome sequencing (WGS). All isolates were subjected to DNA extraction for Illumina sequencing in 2015, using a phenol:chloroform method via phase lock gel tubes (5PRIME) as previously described 31 . Barcoded Illumina DNA libraries were prepared using Nextera XT or TruSeq protocols and sequenced on the HiSeq 2500 platform, generating paired-end reads of 125 bp each. Eighty-seven isolates (26%, see Supplementary Data 1) were later subjected to fresh DNA extraction using a protocol based on GenFind (Beckman Coulter) reagents (doi: 10.17504/protocols.io.p5mdq46), and multiplex long-read sequencing with an Oxford Nanopore Technologies (ONT) MinION device (as described previously 66 ), to facilitate assembly, pan-genome and plasmid analyses.
Species analysis and quality control of WGS data. A total of 362 infection isolates from 318 patients were included in the study. Two of these isolates failed the Illumina library preparation step prior to sequencing. Three of the sequenced read sets were excluded from further analysis because preliminary analysis showed that the sequences were dominated by non-K. pneumoniae DNA (two were predominantly Klebsiella oxytoca and one was predominantly Acinetobacter baumannii). This could be due to either mixed culture of K. pneumoniae with another bacterium, or contamination following the initial identification of K. pneumoniae. Since the original identification was recorded as K. pneumoniae, and the presence of K. pneumoniae DNA was confirmed by sequencing, we included these three specimens in the reporting of K. pneumoniae clinical isolates, but excluded them from further genomic analysis. A further 29 read sets were excluded from genomic analysis due to either (i) failing quality control thresholds (mean read depth <25×, coverage of reference sequence <85%), or (ii) suspicion of mixed Klebsiella strains (ratio of heterozygous:homozygous core gene variant sites ≥2%). The remaining 328 WGS-confirmed K. pneumoniae isolates (from 289 patients) underwent detailed genomic analyses. A flow chart of isolate and genome processing is given in Supplementary Fig. 1, and details of isolates and WGS data accessions are given in Supplementary Data 1.
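The exclusion criteria above amount to three thresholds applied per read set. A minimal sketch, assuming mean depth, reference coverage and heterozygous/homozygous core-gene site counts have already been computed from the read mapping (the field and function names are hypothetical), together with a check that the reported isolate counts reconcile:

```python
from dataclasses import dataclass
from typing import List

MIN_MEAN_DEPTH = 25.0      # mean read depth (x)
MIN_REF_COVERAGE = 0.85    # fraction of the reference sequence covered
MAX_HET_HOM_RATIO = 0.02   # heterozygous:homozygous core-gene variant sites


@dataclass
class ReadSetQC:
    isolate: str
    mean_depth: float
    ref_coverage: float
    het_sites: int
    hom_sites: int


def qc_fail_reasons(qc: ReadSetQC) -> List[str]:
    """Return the reasons a read set fails QC (an empty list means it passes)."""
    reasons = []
    if qc.mean_depth < MIN_MEAN_DEPTH:
        reasons.append("mean depth < 25x")
    if qc.ref_coverage < MIN_REF_COVERAGE:
        reasons.append("reference coverage < 85%")
    # Guard against division by zero when no homozygous sites are called.
    if qc.hom_sites:
        ratio = qc.het_sites / qc.hom_sites
    else:
        ratio = float("inf") if qc.het_sites else 0.0
    if ratio >= MAX_HET_HOM_RATIO:
        reasons.append("suspected mixed strains")
    return reasons


# Isolate flow reported above: 362 isolates, minus 2 library failures,
# 3 read sets dominated by non-K. pneumoniae DNA, and 29 QC exclusions.
included = 362 - 2 - 3 - 29
print(included)  # 328
```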
Single nucleotide variant analysis and multi-locus sequence typing. Single nucleotide variants (SNVs) were identified by mapping Illumina reads against the reference genome of ST23 K. pneumoniae strain NTUH-K2044 (NC_012731.1), using the mapping pipeline RedDog v1b.10.3 (https://github.com/katholt/reddog). RedDog uses Bowtie2 v2.2.5 70 to map reads and SAMtools v1.2 71,72 to call SNVs with Phred quality score ≥30, as described previously 31 . Multi-locus sequence typing (MLST) was performed by analysing assemblies with Kleborate v2.0.0 63 , and sequence types (STs) were assigned based on the 7-locus scheme 73 . Novel STs were submitted to the K. pneumoniae BIGSdb-Pasteur database 23 for allele assignments. To identify the continents from which STs detected in this study had previously been reported, we used MLST data reported for 13,156 whole genome sequences publicly available in RefSeq in July 2020 (available as Supplementary Data 2 in Lam et al. 63 ).
Phylogenetic analysis. Core genes were defined as those that were annotated in the reference genome and present (coverage ≥95% and mean read depth ≥5×) in all of the sequenced isolates, based on the mapping analysis. A maximum likelihood phylogenetic tree was inferred from an alignment of all homozygous SNVs (n = 690,727 SNVs) identified within 3,135 core genes in the 328 genomes, using FastTree v2.1.8 74 . The tree file is available via MicroReact (https://microreact.org/project/kaspahclinical). Lineages were defined as phylogenetic clusters extracted from the tree in R using a patristic distance threshold of 0.01. This threshold was chosen by assessing the distribution of pairwise distances, which showed an inflection point at patristic distance d = 0.01 (0.0044% of pairs cluster using d = 0.01, compared with 0.00025% using d = 0.005, or 0.0376% with d = 0.015). A patristic distance of 0.01 in our tree equates to ~6,900 core SNVs, or 0.13% nucleotide divergence.
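One way to turn a patristic distance threshold into lineage clusters is single-linkage grouping on the matrix of pairwise tip-to-tip distances, sketched below with a union-find structure. This is an assumption for illustration: the text states only that clusters were extracted in R using d = 0.01, not the exact procedure, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def single_linkage_clusters(names: Sequence[str],
                            dist: Sequence[Sequence[float]],
                            threshold: float = 0.01) -> List[List[str]]:
    """Group tips whose pairwise patristic distance is below `threshold`,
    allowing chains through intermediate tips (single linkage)."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only the upper triangle of the symmetric distance matrix is needed.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] < threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Under single linkage, tips A, B and C at distances 0.002 (A-B) and 0.008 (B-C) form one lineage even though A-C is 0.010; a complete-linkage rule would instead require every pair within a cluster to fall below the threshold.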
Surface antigen biosynthesis and acquired virulence loci. Capsule locus (KL) types and lipopolysaccharide O antigen (O) types were identified from the resulting assemblies using Kaptive v2.0 62,75 . KL and O types with a match confidence of 'good' or better (as described at https://github.com/katholt/Kaptive) were reported; genomes with a match confidence of 'low' or 'none' were investigated through manual exploration of the assembly graph in Bandage v0.8.1 76 . Putative novel loci were extracted and annotated with Prokka 68 followed by manual curation. Loci that could not be resolved via manual inspection (e.g., because the assembly graph was fragmented in the region of the K/O locus, or because there was no single unambiguous path through the locus) were marked as "unknown". Kleborate v2.0.0 63 was used to screen each genome assembly for key acquired virulence factors that are significantly associated with invasive infections in humans 22 : yersiniabactin 24 (ybt), salmochelin 35 (iro), colibactin 24 (clb), aerobactin 35 (iuc), and regulators of the mucoid phenotype (rmp locus, rmpA2).
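The reporting rule for Kaptive calls can be sketched as a threshold on an ordered confidence scale. The ordering below (None < Low < Good < High < Very high < Perfect) is our reading of the Kaptive documentation and should be checked against the version used; the function name and return labels are hypothetical.

```python
# Assumed ordering of Kaptive match-confidence levels, lowest to highest.
CONFIDENCE_ORDER = ["None", "Low", "Good", "High", "Very high", "Perfect"]


def locus_call(best_match: str, confidence: str, minimum: str = "Good") -> str:
    """Report the Kaptive locus if its confidence meets the threshold;
    otherwise flag it for manual inspection of the assembly graph."""
    if CONFIDENCE_ORDER.index(confidence) >= CONFIDENCE_ORDER.index(minimum):
        return best_match
    return "manual review"
```

Loci flagged this way would then either be resolved manually or, failing that, recorded as "unknown", as described above.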
Plasmid analyses. Plasmid content was assessed using multiple methods. Replicon (rep) markers were identified by screening assemblies against the PlasmidFinder database 43 using BLASTn (80% identity and 80% coverage thresholds). Mob types (mob) were assigned using iterative PSI-BLAST as described previously 42 . Contigs were classified as being of plasmid or chromosomal origin using Kraken as previously described 77 ; all other contigs were marked as 'unknown'.
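A minimal sketch of the replicon screen is shown below. It assumes that PlasmidFinder sequences were used as BLASTn queries against each assembly, with tabular output `-outfmt "6 qseqid sseqid pident length qlen"`, and that coverage is measured relative to the replicon length; both are assumptions, as the text states only the 80% thresholds. Replicon names in the example are placeholders.

```python
from typing import Iterable, List


def replicon_hits(blast_tsv_lines: Iterable[str],
                  min_identity: float = 80.0,
                  min_coverage: float = 80.0) -> List[str]:
    """Parse BLASTn tabular output with columns
    qseqid sseqid pident length qlen
    (query = replicon marker, subject = assembly contig) and return the
    replicons with at least one hit passing both thresholds."""
    found = set()
    for line in blast_tsv_lines:
        if not line.strip():
            continue  # skip blank lines
        qseqid, _sseqid, pident, length, qlen = line.rstrip("\n").split("\t")
        # Alignment length includes gaps, so coverage can slightly exceed 100%.
        coverage = 100.0 * int(length) / int(qlen)
        if float(pident) >= min_identity and coverage >= min_coverage:
            found.add(qseqid)
    return sorted(found)
```

For example, a hit at 85% identity covering only 100 of 200 replicon bases is rejected on coverage, whereas a full-length hit at 99.5% identity is kept.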
Genetic determinants of antimicrobial resistance (AMR). Kleborate v2.0.0 was used to screen each genome assembly for acquired resistance genes and known chromosomal mutations associated with resistance to fluoroquinolones, colistin and carbapenems 63 . The detected AMR determinants were used to predict resistance to ceftriaxone (based on presence of acquired ESBL genes), meropenem (acquired carbapenemases and ompK35/36 alleles), ciprofloxacin (known gyrA and parC mutations and acquired qnr genes, but not aac(6')-Ib-cr, as there is no evidence that this gene can raise the MIC above the breakpoint for clinical resistance in the absence of other determinants 78 ), gentamicin (acquired resistance genes defined in CARD v3.0.8 79 ), trimethoprim (acquired dfr genes), and trimethoprim/sulfamethoxazole (acquired dfr genes plus sul genes). In line with established norms for reporting on the accuracy of susceptibility testing in clinical laboratories 80 , and the translation of these principles to reporting on WGS-based identification of resistance 81,82 , results were expressed in terms of major and very major errors. A major error was defined as a phenotypically susceptible isolate that carried one or more determinants of resistance for the specified drug; a very major error was defined as a phenotypically resistant isolate in which no known resistance determinant for that drug was identified in the genome.
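The ciprofloxacin prediction rule and the two error types defined above can be sketched as follows. The rule mirrors the text (gyrA/parC mutations or acquired qnr genes, ignoring aac(6')-Ib-cr alone); the gene-name matching, the function names and the denominators (all susceptible isolates for major errors, all resistant isolates for very major errors, the usual convention) are assumptions rather than details given in the Methods.

```python
from typing import Iterable, Sequence, Tuple


def predict_ciprofloxacin_resistant(qrdr_mutations: Sequence[str],
                                    acquired_genes: Sequence[str]) -> bool:
    """Resistant if any known gyrA/parC mutation or acquired qnr gene is present.
    aac(6')-Ib-cr on its own is deliberately not counted."""
    has_qnr = any(g.lower().startswith("qnr") for g in acquired_genes)
    return bool(qrdr_mutations) or has_qnr


def error_summary(records: Iterable[Tuple[str, bool]]) -> dict:
    """records: (phenotype, predicted_resistant) pairs, phenotype 'S' or 'R'.

    Major error (ME): phenotypically susceptible, but a determinant was found.
    Very major error (VME): phenotypically resistant, but no determinant found.
    Other phenotype codes (e.g. 'I') are skipped in this sketch.
    """
    n_s = n_r = me = vme = 0
    for phenotype, predicted_resistant in records:
        if phenotype == "S":
            n_s += 1
            if predicted_resistant:
                me += 1
        elif phenotype == "R":
            n_r += 1
            if not predicted_resistant:
                vme += 1
    return {
        "major_errors": me,
        "very_major_errors": vme,
        # Conventional denominators: all S isolates (ME), all R isolates (VME).
        "major_error_rate": me / n_s if n_s else float("nan"),
        "very_major_error_rate": vme / n_r if n_r else float("nan"),
    }
```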
Detection of species hybrids. The genome collection was screened for hybrids by using BLASTn to align contigs against a set of reference assemblies for each Klebsiella species. The BLAST alignments were then used to assign per-species sequence identity to each position in the contig. The overall species composition of each assembly was then quantified by assigning genomic regions to the closest matching species, and hybrids were identified as those with ≥3% of the genome assigned to a second KpSC species. This analysis was implemented in a Python script, available at http://github.com/rrwick/Klebsiella-assembly-species.
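Once each genomic segment has been assigned to its closest species, the hybrid call reduces to a composition threshold. A minimal sketch, assuming the segments and their best-matching species are already available from the BLASTn step; this is not the published script, whose interface is not reproduced here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def species_composition(segments: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """segments: (best_matching_species, length_bp) pairs covering the assembly.
    Returns the fraction of assigned bases attributed to each species."""
    totals = Counter()
    for species, length in segments:
        totals[species] += length
    genome = sum(totals.values())
    return {sp: bp / genome for sp, bp in totals.items()}


def is_hybrid(segments: Iterable[Tuple[str, int]], min_fraction: float = 0.03) -> bool:
    """Hybrid if the second most abundant species accounts for >= min_fraction."""
    ranked = sorted(species_composition(segments).values(), reverse=True)
    return len(ranked) > 1 and ranked[1] >= min_fraction
```

Checking only the second-ranked species is sufficient: if any minor species reaches 3%, the largest minor species does too.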
Transmission analysis. Pairwise core gene SNV counts were extracted from the SNV alignment described above, and used to infer transmission networks comprising nodes (one representative isolate per infection episode) connected by edges where the pairwise distance was ≤25 SNVs (based on our previous investigation of within-patient vs between-patient SNV distances 31 , and other recent studies of CP K. pneumoniae transmission 38,39 ) and the temporal distance was ≤45 days (twice the median time to infection estimated for colonized patients 14 ). The network function in the R package network (v1.17.1) was used to construct the transmission network and to extract clusters of connected nodes (isolates). Putative transmission clusters were manually reviewed for plausible epidemiological links; one was removed because onset of the second case occurred on day 1 of admission and no previous admissions with Alfred Health were recorded, and another was removed because it comprised specimens from one outpatient and one inpatient collected on the same day.
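The edge rule and cluster extraction described above can be sketched in pure Python as an analogue of the R network workflow: link two episodes when they are within 25 core SNVs and 45 days of each other, then report connected components. The function name and input structures are hypothetical, and the manual epidemiological review step is not modelled.

```python
from datetime import date
from itertools import combinations
from typing import Dict, FrozenSet, List


def transmission_clusters(collected: Dict[str, date],
                          snvs: Dict[FrozenSet[str], int],
                          max_snvs: int = 25,
                          max_days: int = 45) -> List[List[str]]:
    """collected: episode id -> specimen collection date.
    snvs: frozenset({id_a, id_b}) -> pairwise core-gene SNV count.
    Returns connected components containing two or more episodes."""
    adjacency = {e: set() for e in collected}
    for a, b in combinations(collected, 2):
        d = snvs.get(frozenset((a, b)))
        if (d is not None and d <= max_snvs
                and abs((collected[a] - collected[b]).days) <= max_days):
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for start in collected:
        if start in seen:
            continue
        # Depth-first search to collect one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(component) > 1:
            clusters.append(sorted(component))
    return sorted(clusters)
```

Because clusters are connected components, two episodes can share a cluster through an intermediate case even when they are themselves more than 45 days apart.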
Statistical analysis. All statistical analyses were conducted using R (v3.6.3). Specific tests used are given together with each result in the text; the corresponding R functions are: wilcox.test for the Wilcoxon rank-sum test (two-sided); prop.test for the test of differences in proportions (two-sided); fisher.test for Fisher's exact test (two-sided); chisq.test for the chi-square test (two-sided); bartels.rank.test with a left-sided test for trend, to test for temporal trends in monthly isolate counts; and glm with family = binomial(link = "logit") for logistic regression. Figures were plotted in R using ggplot2 v3.3.5, ggnetwork 83 v0.5.10 and ggtree 84 v2.4.2.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The sequence data generated in this study have been deposited in public databases as follows. Raw Illumina reads are available in the European Nucleotide Archive (ENA), under BioProject accessions PRJEB6891 and PRJNA351909. Run accessions for each genome are provided in Supplementary Data 1. Reference-quality hybrid Illumina+ONT assemblies are available in GenBank; individual genome accessions are given in Supplementary Data 1. Illumina-only assemblies were not deposited as they are considered draft assemblies; however, the full set of assemblies and pan-genome data used in this study are available in FigShare under https://doi.org/10.26180/16811344. The annotated phylogeny is available in the MicroReact online viewer [http://microreact.org/project/kaspahclinical].