Convergence of virulence and antimicrobial resistance in increasingly prevalent Escherichia coli ST131 papGII+ sublineages

Escherichia coli lineage ST131 is an important cause of urinary tract and bloodstream infections worldwide and is highly resistant to antimicrobials. Specific ST131 lineages carrying invasiveness-associated papGII pathogenicity islands (PAIs) were previously described, but it is unknown how invasiveness relates to the acquisition of antimicrobial resistance (AMR). In this study, we analysed 1638 ST131 genomes and found that papGII+ isolates carry significantly more AMR genes than papGII-negative isolates, suggesting a convergence of virulence and AMR. The prevalence of papGII+ isolates among human clinical ST131 isolates increased dramatically since 2005, accounting for half of the recent E. coli bloodstream isolates. Emerging papGII+ lineages within clade C2 were characterized by a chromosomally integrated blaCTX-M-15 and the loss and replacement of F2:A1:B- plasmids. Convergence of virulence and AMR is worrying, and further dissemination of papGII+ ST131 lineages may lead to a rise in severe and difficult-to-treat extraintestinal infections.

A large proportion of human urinary tract and bloodstream infections are caused by a few globally dispersed E. coli clones, including sequence type (ST) 69, ST73, ST95, and ST131 1 . Despite its recent emergence, ST131 is the dominant multi-drug resistant clone among extraintestinal pathogenic E. coli (ExPEC) isolates today 2 . In particular, high rates of resistance to 3 rd -generation cephalosporins and fluoroquinolones among ST131 isolates present a major public health risk, leading to its classification as a critical priority pathogen by the WHO 3 .
ST131 colonizes the human gastrointestinal tract as a commensal but causes mild to severe infections in the urinary tract including pyelonephritis and urosepsis. Specific ST131 sublineages are also overrepresented among asymptomatic bacteriuria 10,11 , which may be explained by varying underlying virulence profiles in these sublineages. Irrespective of their clade affiliation, most ST131 isolates carry mobile genetic elements encoding synthesis of the siderophores aerobactin (iuc) and yersiniabactin (ybt), two important factors promoting extraintestinal colonization 12,13 . Other virulence factors such as hly (hemolysin), iro (salmochelin siderophore), agg (aggregative adherence fimbriae AAF), or papGII (P fimbrial tip adhesin variant PapGII) are restricted to specific ST131 sublineages 11,14,15 . Of particular significance is papGII, which was recently identified as a key determinant of invasive uropathogenic E. coli (UPEC) in infection experiments and genome-wide association studies 11,16,17 . Approximately 60% of E. coli isolates from invasive urinary tract infections (i.e., pyelonephritis or bacteremia with a urinary portal of entry) carry papGII, while the gene is less common among isolates from patients with cystitis or asymptomatic bacteriuria 11 . PapGII drives inflammation and renal tissue damage through transcriptional activation of signalling pathway genes in kidney cells, resulting in kidney and bloodstream infections 16 . Overall, approximately half of all E. coli bacteremia cases are associated with an entry through the urinary tract [18][19][20] .
ST131 comprises multiple sublineages that independently acquired papGII-containing pathogenicity islands (papGII PAIs) 11,21 . Currently, it is unknown how the antimicrobial resistance gene (ARG) content relates to these papGII-containing (papGII+) lineages. Dissemination of highly resistant and more invasive lineages would aggravate the global burden of already difficult-to-treat infections caused by E. coli.
In E. coli, virulence and AMR are frequently encoded by mobile genetic elements such as plasmids, genomic islands (PAIs and resistance islands [REIs]), bacteriophages, or transposons [22][23][24] . Relative high rates of acquisitions of mobile genetic elements with a highly dynamic accessory genome 9 make E. coli ST131 an interesting model to examine the co-evolution of AMR and virulence. In this study, we investigate the population structure, resistome, and distribution of papGII in ST131 using 1638 publicly available genomes of human isolates. Our results reveal significant evolutionary changes and a genetic convergence of virulence and AMR in increasingly prevalent papGII+ sublineages of ST131.

Results
Isolate collections and resistome. Publicly available genomes of 1638 E. coli ST131 isolates were analysed in this study. These included 1538 whole-genome draft assemblies from 11 collections and 100 high-quality reference assemblies (Supplementary Data 1). The isolates originated from human bloodstream infections (n = 843), urinary tract infections (n = 306), feces (n = 83), and other (n = 9) or unknown (n = 397) clinical sources, and were isolated between 2001 and 2017 in Europe, North America, Asia, and Oceania (Table 1). In eight of the 11 source studies (comprising 1148 isolates), isolates were specifically selected for being ESBL-producing. Genome sizes ranged from 4.69 Mb to 5.73 Mb, and all assemblies passed quality control (N50 > 45 kb, >99% completeness).
papGII+ isolates presented with an increased ARG content: on average, papGII+ isolates harboured 8.7 ARGs (median 9; SD 3.6) versus 6.3 ARGs (median 7; SD 3.7) among papGII-negative isolates. The positive association between papGII presence and ARGs was found to be significant (P adj < 0.05, Mann-Whitney U test) within each of the ST131 clades A, B, C1, and C2 and across different isolation time intervals (Supplementary Table 3). This significant association was also found irrespective of whether isolates were pre-selected for being ESBL-producing and among urinary and blood isolates. The increased ARG content was not significant among faecal isolates, for which limited data was available. This association was confirmed using an extended dataset, comprising assemblies of the main dataset and 3,608 additional assemblies of human ST131 isolates from EnteroBase. The extended dataset showed that papGII+ isolates from bloodstream infections carry similar numbers of ARGs than papGII+ isolates of urinary or faecal origin. Regardless of their source, significantly more ARGs were found among papGII+ isolates than among papGII-negative isolates (Supplementary Table 4). As the acquisition of papGII occurs via PAIs 25 and acquisition of ARGs predominantly via plasmids 24 , virulence and AMR acquisition likely occurred independently. Among the 30 resolved papGII+ PAIs from high-quality assemblies, only one contained ARGs within the same PAI (Supplementary Table 2).
The difference in the ARG content between papGII+ and papGII-negative isolates could not be attributed to one specific ARG or AMR class. When stratified by clade, different ARGs were significantly (P adj < 0.05, Fisher's exact test) associated with papGII+ isolates, including those conferring resistance against 3 rd -generation aminoglycosides, cephalosporins, fluoroquinolones, sulfamethoxazole, and trimethoprim ( Table 2, Supplementary Table 5, Supplementary Fig. 5).
The third-largest ST131 papGII+ sublineage L3 (clade C2; 34 isolates) was epidemiologically more diverse with isolates originating from Asia, Europe, and North America (Supplementary Fig. 2). In 14 isolates, blaCTX-M-15 was chromosomally integrated into ydhS (encoding a putative oxidoreductase); in 4 isolates blaCTX-M-15 was integrated at other chromosomal positions (near aphA, ghrB, ubiX, or into prophage mEp460); in one isolate it was located on a plasmid; and in the remaining 15 isolates, the blaCTX-M-15 genetic context could not be resolved from short-read assemblies. Also in papGII+ sublineages of ST131 clades A and C1, blaCTX-M genes were often chromosomally integrated (Table 3). Incomplete assemblies from shortread data did however not allow a systematic evaluation of papGII+ versus papGII-negative isolates.

Discussion
Our genomic study of global public ST131 data suggests a convergence of virulence and AMR in increasingly prevalent papGII + E. coli ST131 sublineages. ARGs enriched in papGII+ isolates included those conferring resistance against fluoroquinolones, 3 rd -generation cephalosporins, aminoglycosides, and trimethoprim/sulfamethoxazole, which are important treatment options for urinary tract infections and bacteraemia. Most papGII+ sublineages expanded after the year 2005 within the multi-drug resistant clade C2, implying that PAIs harbouring papGII+ were acquired after AMR determinants such as the clade C-specific QRDR mutations and blaCTX-M-15. However, also within this clade C2, we observed higher levels of AMR in papGII+ isolates compared to papGII-negative isolates, suggesting a further synergy between AMR and virulence. We assume that virulence genes may contribute to the maintenance and further acquisition of ARGs by causing more severe disease which needs more extensive treatment. UPEC typically reside in the gut implying that antibiotic treatment for UTI may create evolutionary pressure both in the gut and at the site of infection 23,28 . Increased AMR may hence result in prolonged extraintestinal colonization, intestinal blooms, and eventually enhanced dissemination of specific UPEC clones 29 . The observed convergence of AMR and virulence in ST131 may not be generalized to other E. coli   11,14,30 . This might be explained by differences between lineages in their ability to acquire genes and integrate them into regulatory and functional processes, or by their different lifestyles and occupation of niches 31,32 . While the success of early clade C2 sublineages was partly attributed to the maintenance of pMLST F2:A1:B-plasmids containing blaCTX-M-15 5 , we observed that the more recently emerged papGII+ sublineages within clade C2 were characterized by the presence of various IncFIB plasmids and chromosomally encoded blaCTX-M-15. Transpositions of blaCTX-M-15 from the plasmid to the chromosome of clade C2 isolates have been described in multiple prior studies 5,6,9,27,33,34 . The chronological order of AMR and papGII acquisitions could not be elucidated with confidence from our data. Conceivably, (i) transposition of blaCTX-M-15 from an F2:A1:B-plasmid to the chromosome was followed by (ii) the loss of F2:A1:B-plasmids, (iii) the acquisition of FIB plasmids, and (iv) the acquisition of papGII+ PAIs. Varying selective advantages of plasmids or plasmid-PAI incompatibilities may underly the co-presence of papGII+ and IncFIB in clade C2. Specific plasmids may for example be involved in the horizontal co-transfer of papGII+ PAIs: E. coli PAIs typically lack mobilization and transfer genes, but conjugative (co-)transfer of UPEC PAIs was previously shown in vitro using helper plasmids 35,36 . A complex plasmid-island interaction is for example known for S. enterica, where the mobilizable resistance island SGI1 is incompatible with IncC/A plasmids, but relies on those for propagation 37 .
The increasing dominance of papGII+ ST131 strains has been reported before. Royer et al. 21 6 . This C2 sublineage carried a chromosomal blaCTX-M-15 and was here found to also harbour papGII.
Sublineage L1a was most abundant among the papGII+ ST131 sublineages. L1a was globally disseminated and characterized by (i) a PAI with papGII, cnf1, and hly, (ii) a chromosomal resistance cassette with aac(3)-IIa, aac(6')-Ib-cr, blaOXA-1,   38 . An apparent stable integration of blaCTX-M-15 and aac(6')-Ib-cr might have contributed to its success: in clade C2, the ciprofloxacin-inactivating AAC(6')-Ib-cr was shown to confer a selective advantage in the presence of ciprofloxacin over isolates that contained QRDR mutations alone 39 . A limitation of our study was that 8 of the 11 ST131 collections (1148 isolates) were pre-selected for ESBL-producing isolates, introducing sampling bias. Among those, ESBL genes were detected in 1062 (92.5%) isolates, compared to 146/390 (37.4%) isolates from the 3 remaining collections, suggesting that AMR (in particular ESBL prevalence) was overestimated here. We stratified the statistical analysis by pre-selection criteria to take this into account. In addition, with more than half of all isolates originating from bloodstream infections, papGII-containing isolates are likely also overrepresented in our data relative to the overall ST131 population. Furthermore, the investigated isolates were not available for phenotypic AMR validation. Sensitivities and specificities of AMR genotype-phenotype predictions in E. coli were previously estimated to be >95% and >90%, respectively, for most antibiotics 22 . Lastly, we were unable to determine the genomic location of most ARGs from the available short-read data. Long-read sequencing of more isolates would allow a better understanding of the observed association between papGII, chromosomal blaCTX-M elements, and specific plasmid replicons, but has not been employed for large collections like the ones studied here due to the high costs.
In conclusion, we describe the convergence of virulence and AMR in papGII+ ST131 sublineages, which is an important Collections that consisted only of ESBL isolates are labelled with an asterisk in the legend. The number of substitutions per core genome alignment site is indicated by the scale bar. The tree was visualized using iTOL 74 . Numbers at the outermost ring refer to the 12 isolates with high-quality assemblies and resolved DUF4132 region, which are illustrated in panel b. Sublineage L1 was unambiguously defined within the ST131 population by a distinct mtlD allele (encoding a mannitol dehydrogenase family protein; 96% sequence identity to the mtlD allele of other ST131 isolates; Genbank accession number TLH04005.1) b Genetic context of the DUF4132 region in high-quality assemblies from 12 isolates of papGII+ sublineage L1. All 9 isolates of subbranch L1a (isolates 4′ to 12′) harboured TnMB1860 with the resistance genes aac(3)-IIa, aac(6')-Ib-cr, blaOXA-1, blaCTX-M-15, and ΔcatB3 inserted into the chromosomal DUF4132 region. Vertical boxes between sequences indicate shared homologies (100% identity). Sequence comparisons were performed using EasyFig 64 .
concept among emerging pathogens. A similar convergence of virulence and AMR in specific clones has been observed in other pathogens, including K. pneumoniae 40 , S. enterica 41 , and S. aureus 42 . UTIs caused by ST131 papGII+ strains presumably lead to more severe infections and are more challenging to treat with commonly used antibiotics, underlining the need for novel preventive and curative strategies to manage infections.

Methods
Bacterial genomes: main dataset. Genomes of 1638 E. coli ST131 isolates of human origin were included in this study. 1538 genomes originated from 11 publicly available collections. Inclusion criteria for collections were (i) a large number (>40) of ST131 genomes within a single collection, (ii) the availability of metadata, and (iii) a human source of isolation. In addition, 100 public high-quality assemblies (N50 > 1.5 Mb; derived from long-read sequencing) were included for the analyses of mobile genetic elements that could not be resolved in assemblies from short-read data. Assemblies were obtained from EnteroBase 43 or NCBI and sequence types were confirmed with the Achtman scheme 43 55 in conjunction with the resfinder database 56 (minimum sequence coverage/identity 70%/90%). Network graphs were constructed in R 3.5.0 for ARG combinations co-occurring in at least 15 isolates with a Jaccard distance of <0.5. For each pair of ARGs (ARG A and ARG B ), Jaccard distance (1-|ARG A ∩ARG B |/|ARG A ∪ ARG B |) were calculated with the vegdist function in the R package vegan v2.5-7 57 and networks were analysed and visualized using the R package igraph v1.2.6 58 . The genetic context of blaCTX-M genes was inspected manually using CLC sequence viewer 8. Disruptions of the mppA and DUF4132 loci were investigated by determining their BLAST alignment coverage using ABRicate v0.9.3 with mppA (P423_RS08085) and the DUF4132 region (P423_RS12390-P423_RS12392) from strain JJ1886 (GCF_000493755.1) as query. Hits with <90% (mppA) and <70% (DUF4132 region) query coverage were classified as disrupted.
Statistical tests. Statistical analyses were performed using R version 3.5.3. Frequency counts were compared using a two-tailed Fisher's exact test, while nonnormally distributed continuous variables were analysed using the Mann-Whitney U test (two-sided). P values were adjusted for multiple testing using Bonferroni correction and adjusted P values of <0.05 were considered to reflect statistical significance.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All genome assemblies were obtained from public databases. Accession numbers are listed in Supplementary Data 1 (main dataset) and Supplementary Data 3 (validation dataset). Source data for the main figures and calculations can be found in Supplementary Data 1, 2 and 3. All other data are available from the corresponding author on reasonable request.