Evolutionary Highways to Persistent Bacterial Infection

Persistent infections require bacteria to evolve from their naïve colonization state by optimizing fitness in the host. This optimization involves coordinated adaptation of multiple traits, obscuring evolutionary trends and complicating infection management. Accordingly, we screen 8 infection-relevant phenotypes of 443 longitudinal Pseudomonas aeruginosa isolates from 39 young cystic fibrosis patients over 10 years. Using statistical modeling, we map evolutionary trajectories and identify trait correlations accounting for patient-specific influences. By integrating previous genetic analyses of 474 isolates, we provide a window into early adaptation to the host, finding: 1) a 2-3 year timeline of rapid adaptation after colonization, 2) variant “naïve” and “adapted” states reflecting discordance between phenotypic and genetic adaptation, 3) adaptive trajectories leading to persistent infection via 3 distinct evolutionary modes, and 4) new associations between phenotypes and pathoadaptive mutations. Ultimately, we effectively deconvolute complex trait adaptation, offering a framework for evolutionary studies and precision medicine in clinical microbiology.

we screen 8 infection-relevant phenotypes of 443 longitudinal Pseudomonas aeruginosa 23 isolates from 39 young cystic fibrosis patients over 10 years. Using statistical modeling, we 24 map evolutionary trajectories and identify trait correlations accounting for patient-specific 25 influences. By integrating previous genetic analyses of 474 isolates, we provide a window into 26 early adaptation to the host, finding: 1) a 2-3 year timeline of rapid adaptation after 27 colonization, 2) variant "naïve" and "adapted" states reflecting discordance between 28 phenotypic and genetic adaptation, 3) adaptive trajectories leading to persistent infection via 29 3 distinct evolutionary modes, and 4) new associations between phenotypes and 30 pathoadaptive mutations. Ultimately, we effectively deconvolute complex trait adaptation, 31 offering a framework for evolutionary studies and precision medicine in clinical microbiology. to map due in part to competing modes of evolution. We know from laboratory evolution 48 studies in highly controlled conditions that these multiple modes are at work and induce 49 substantial phenotypic adaptation to minimal media within the initial 5,000-10,000 50 generations 6-8 , but only an estimate is available of the timeline of adaptation in the complex 51 CF lung environment 9 . Multiple recent studies have shown a high degree of population 52 heterogeneity in chronic CF infections that could be influenced by competing evolutionary 53 modes, but past consensus has been that select traits converge torwards similar "evolved" 54 states during most CF infections (e.g. loss of virulence and increase in antibiotic 55 resistance) 3,10-12 . This convergence can be complex and drug-driven, as recent studies have 56 shown development of collateral sensitivity to antibiotics (treatment with one drug can 57 induce reciprocal changes in sensitivity to other drugs) 13 ; this illustrates that a single selection 58 A unique dataset. The 443 clinical P. aeruginosa isolates originate from a cohort of 39 youth 96 with CF (median age at first P. aeruginosa isolate = 8.1 years) treated at the Copenhagen CF 97 Centre at Rigshospitalet and capture the early period of adaptation, spanning 0.2-10.2 years 98 of colonization by a total of 52 clone types. Of these isolates, 373 were previously 99 characterized in a molecular study of adaptation 16 . The "colonization time of an isolate" (ColT) 100 is defined for each specific lineage, approximating the length of time since a given clone type 101 began colonization of the CF airways in the specific patient. Importantly, our colonization time 102 metric does not necessarily start at the true "time zero", since a significant bacterial load is 103 necessary for a positive culture. Our isolate collection also does not capture the complete 104 To obtain systems-level readouts of pathogen adaptation in the host and thereby assess 111 multi-trait evolutionary trajectories, we present an infection-relevant characterization of our 112 isolate collection entailing high-throughput measurements of 8 phenotypes: growth rates (in 113 Luria-Bertani broth (LB) and Artificial Sputum Medium (ASM)), antibiotic susceptibility (to 114 ciprofloxacin and aztreonam), virulence factors (protease production and mucoidity), and 115 adherence (adhesion and aggregation) (Figure 1 and 2). We define adherence as a shared 116 trend in adhesion and aggregation which we associate with a biofilm-like lifestyle (see 117 Methods for further discussion of limitations of these measures). These phenotypes are 118 generally accepted to change over the course of colonization and infection of CF airways 119 based primarily on studies of chronically-infected patients 10,17,23,24 . 120 121 That is, an "evolved" isolate would grow slowly, adhere proficiently, be more likely to exhibit 122 a mucoid and/or hypermutator phenotype, have reduced protease production, and resist 123 antibiotics, in contrast to a "naïve" isolate ( Figure 2B). However, simply ordering our 124 measurements by colonization time does not illustrate an overarching adaptive trajectory 125 from naïve to evolved phenotypes ( Figure 2C). Instead, we see substantial heterogeneity, with 126 isolates that resemble both naïve and evolved phenotypic states throughout the study period. 127 Given that we are investigating a unique collection from a young patient cohort that we track 128 for a substantial period of colonization, this data fills the critical gap between studies of acute 129 infections and chronic infections 25 Figure 2. Phenotypic characterization. We present summary statistics of our phenotype screen including (A) mean and standard deviation of isolate data versus the P. aeruginosa PAO1 value or antibiotic breakpoint we use for normalization, respectively, as well as boxplots of continuous variables (showing the median, 1 st and 3 rd quartile hinges, whiskers extending from the hinges to the most extant value within 1.5x inter-quartile range, and outliers as points). We then compare the (B) expected adaptation over time based on field consensus versus (C) the measured raw adaptation of our isolate collection over time. After sorting the isolates (x-axis) by the time since colonization of a specific lineage or "colonization time" (ColT), it is still difficult to see consistent patterns of phenotypic change over time. Colors are linked with the expected change of the specific phenotype (B), so that blue denotes a "naïve" phenotype and red denotes an "evolved" phenotype. For growth rate (in artificial sputum medium (ASM)), adhesion, and aggregation, naïve and evolved phenotypes are roughly divided by comparison with the reference isolate PAO1 phenotype. For aztreonam and ciprofloxacin MIC, naïve and evolved phenotypes are based on sensitivity or resistance as indicated by the EUCAST breakpoint values as of March 2017.
using GAMMs (Figure 1). We describe our approach below in brief, with more extended 147 explanation available in both the Methods and Supplements 1 and 2. 148 149 With AA, we want to assess multi-trait adaptive paths within the context of the evolutionary 150 landscape. We map these paths (or trajectories) by first fitting idealized extreme isolates 151 ("archetypes") located on the boundaries of the evolutionary landscape and then evaluating 152 every other isolate according to its similarity to these idealized extremes. The archetypes are 153 positioned at the "corners" of the principal convex hull (PCH), the polytope of minimal volume 154 that effectively encapsulates our phenotype dataset 29 (Figure 1, bottom panel). We 155 conceptualize archetypes as the "naïve" and "evolved" states of plausible adaptive 156 trajectories and predict both the optimal number of archetypes and their distinct phenotypic 157 profiles. We illustrate the AA by the 2D projection of our multi-trait model via a "simplex" 158 plot, as shown in Figure 3C Figure 3A. We use only growth rate 177 in ASM due to its correlation with growth rate in LB ( Figure 3D). The simplex plot of Figure 3C  178 highlights the standout features of each archetype by annotating according to the highest or 179 lowest values for each phenotype across all archetype trait profiles ( Figure 3B). This simplex 180 key illustrates that two archetypes resembled naïve and un-evolved isolates with fast growth, 181 antibiotic susceptibility, and low adherence (Archetype A3 and A5), while two others 182 accounted for slow-growing evolved archetypes (A2 and A6), in accordance with the accepted 183 paradigm 10,24 . A substantial portion of isolates in our study resemble the naïve archetypes 184 more closely than the evolved archetypes as indicated by their localization in the simplex plot 185 ( Figure 3C, most isolates cluster on the left near the naïve archetypes). This aligns with the 186 infection stage of the patients included in this study. Importantly, we also find two regions in 187 the simplex visualization which represent different focal points of adaptation: 1) an increase 188 in adherence (A2 and A4) and 2) ciprofloxacin resistance (A1 and A6). 189

190
We also built a GAMM for each of our six continuous phenotypes to identify whether any of 191 the other traits and time influenced it significantly across our patient cohort ( Figure 3D). 192 When evaluating adaptation of the specific phenotypes, we found that the colonization time 193 had a significant impact on both growth rate and sensitivity to ciprofloxacin but did not 194 significantly influence sensitivity to aztreonam ( Figure 3C, Figure  An important distinction between AA and GAMMs is that many isolates clearly cluster in AA 201 according to phenotypes whose adaptation is not significantly influenced by time of 202 colonization as shown by GAMMs. This contrast shows the importance of combining these 203 approaches to understand our data. As an example, both adhesion and aggregation do not 204 correlate with colonization time for this population of young patients, though we see 205 selection for adherence in a few specific patients via AA. That this is not a major trend in our 206 data is surprising when we consider that a biofilm lifestyle is expected to be beneficial to 207 persistence in chronically infected patients 4,34-36 . Furthermore, the biofilm-related metric of 208

F a s t G r o w t h A
A D.    Figure 4. Rapid early adaptation. We present specific GAMM and AA models to illustrate the rapid adaptation of growth rate and ciprofloxacin over time and contrast these patterns with genetic adaptation via the accumulation of nonsynonymous mutations. Here, we use GAMMs to illustrate the significant impact of the explanatory variable colonization time on (A) growth rate in ASM, (B) ciprofloxacin sensitivity in ASM, (E) the accumulation of all mutations (orange) and nonsynonymous SNPs (blue) and indels (insertions and deletions). We use simplex visualizations of AA to show (C) "naïve" trait alignment of the first isolate of the twenty patients where we have analyzed the first P. aeruginosa isolate ever cultured at the CF clinic (blue circles) in contrast to "evolved" isolates that have been cultured at year 2-3 of colonization (red squares, all patients of the dataset). We contrast this trait-based ordination with (D) genetic adaptation, shown by a color overlay of the number of nonsynonymous mutations present in each isolate. Isolate 95 (purple circle) of the DK12 clone type has a very high number of mutations (>100) because one isolate in that lineage (isolate 96) is very different from the remaining 11 isolates. For the GAMM analysis shown in Figure 4E, we filtered out the mutations from the errant DK12 96 single isolate that affected the whole lineage. Hypermutators are marked by purple triangles. (A/B/E notation) GAMMs are illustrated by solid smoothed trendlines, dashed two standard error bounds, and gray points as residuals. Y-axes are labelled by the predictor variable on which the effect of colonization time of the clone type ("ColT") has been estimated as well as the estimated degrees of freedom (edf) (for the E upper panel the edf is ordered as all mutations/NS SNPs). Residuals have not been plotted in the upper panel of (E) for clarity reasons. X-axes are the ColT in years and patients are included as random smooths together with ColT. A rug plot is also visible on the x-axis to indicate the density of observations over time.

C. A. # Archetypes mean RSS (E-02)
accumulation of all nonsynonymous mutations appears logarithmic with accumulation 249 slowing after 2 years ( Figure 4E). This behavior resembles that of the laboratory evolution of 250 E. coli (propagated for more than 60,000 generations) 40 , though accumulation may slow 251 sooner in the CF lung. When we plot accumulation of indels alone, we see the likely driver of 252 the logarithmic trend. When combined with the discordance found by AA, these findings 253 support the theory that select beneficial mutations (for example, a highly impactful indel) can 254 alone induce important phenotypic changes that improve fitness 41 nfxB (B, right panel). Furthermore, gyrB-mutated isolates cluster more closely with A2 and A4 than gyrA mutated isolates, indicating a potential association with adhesion; GAMMs predicts that gyrB mutation has a significant impact on adhesion (GAMM, p-value << 0.01). (C, left panel) Mutations in the retS/gacAS/rsmA system shows a clear phenotypic change when retS is mutated alone (blue circles) or in combination with gacA or gacS (red squares and circles). The associated lineage plot (C, right panel) shows the appearance of double mutations (retS + gacA/S) after a colonization period by retS mutated isolates in three patient lineages. (A/B/C -lineage plot notation) Lineage length is based on the span of time for which we have collected isolates and is indicated by gray bracketed lines, with only isolates affected by a mutation of interest plotted using shape to indicate mutation type. Symbol color indicates the specific mutation location in the affected gene and (A/B only) symbol size indicates the level of resistance to ciprofloxacin. Multiple isolates may be collected at the same sampling date based on differences in colony morphology or collected from different sinuses at sinus surgery, which explains the vertical overlap of isolates for some lineages.

Differential evolutionary potential via ciprofloxacin resistance mechanisms 282
The primary drivers of ciprofloxacin resistance in P. aeruginosa are theorized to be mutations 283 in drug efflux pump repressor nfxB and the gyrase subunits gyrA and gyrB of the DNA 284 replication system 47-49 . We would therefore expect isolates with mutations in these genes to 285 cluster around archetypes A1 and A6 characterized by high ciprofloxacin minimal inhibitory 286 concentrations (MICs) ( Figure 3C). However, AA illustrates a broad distribution of gyrA/B 287 mutants among archetypes, and a contrasting narrow distribution of nfxB mutants ( Figure 5A- Interestingly, we noted that isolates with a gyrB mutation (22 isolates alone or 14 in concert 300 with gyrA mutation) are concentrated closer to "biofilm-linked" archetypes A2 and A4 than 301 isolates with only a gyrA mutation (33 isolates). To our knowledge, there is no direct 302 relationship between gyrB and the capability to adhere 49 . This positive association of gyrB on 303 adhesion was confirmed by GAMM, but when we moved the two SNPs affecting the most 304 isolates in both gyrA and gyrB (2 lineages each, Figure S4) into lab strain P. aeruginosa 305 PAO1, we did not find the same association ( Figure S5-6) (p-values > 0.05, ANOVA with Tukey 306 correction, F(4,10)=0.233). We then looked for co-occurring mutations in biofilm-linked genes 307 in the gyrB-mutated lineages; for all but one lineage, there was no obvious explanation for 308 increased adhesion. Ultimately, this association underlines the impact that genetic 309 background and the multi-genetic signature of biofilm regulation can have on the 310 identification of links between genotype and phenotype 50 . 311

Infection trajectory reversal via a regulatory switch 313
The functional model of the retS/gacA/gacS/rsmA regulatory system is theorized to be a 314 bimodal switch between acute and chronic infection phenotypes 46 influences by mapping patient-specific adaptive trajectories. We find 3 overarching modes of 341 evolution that P. aeruginosa can utilize to persist successfully in individual patients: 1) 342 convergent evolution, 2) directed diversity or 3) general diversity. Figure 6A-D shows  343 examples of adapting lineages employing these modes. We see rapid convergent evolution 344 towards an endpoint of ciprofloxacin resistance in patient P5304 ( Figure 6A). Diverse isolates 345 appear to move in the same general direction of increased adhesion and aggregation in 346 patient P4104 (Figure 6B), which we term "directed diversity", while no directionality is 347 apparent in the diverse isolates of the trajectory of patient P0804 (Figure 6C), which we term 348 "general diversity". In the complex trajectory of patient P1404 (Figure 6D 424 We provide our complete phenotype dataset in raw form as a supplemental spreadsheet and 425 include a visualization and summary statistics of normalized data in Figure 2. Data 426 normalization, processing and construction of all models was performed in R as described 427 above and all essential code for reproduction of these steps is provided in R Markdown format 428 in supplemental files 1-2. These files also include code for replicating the model visualizations 429 of Figure    This collection is a complement to and extension of the collection previously published 16 and 622 captures the period of initial rapid adaptation 6,7,9 , with 389 isolates of the previously 623 published collection included here in addition to 54 new isolates. To build a homogeneous 624 collection for our study of evolution, we excluded two patients with a sustained multi-clonal 625 infection. For the GAMM analysis, we excluded isolates belonging to clone types present in a 626 patient at two or fewer time-points, unless the two time-points were sampled more than 6 627 months apart. The isolates not included in the previous study have been clone typed as a 628 routine step at the Department of Clinical Microbiology at Rigshospitalet. This clone type 629 identification was performed as described previously 16  Isolates were re-grown from frozen in 96 well plates in 150ul media (LB or ASM) and 645 incubated for 20h at 37°C with OD630nm measurements every 20 min on an ELISA reader. 646 Microtiter plates were constantly shaking at 150 rpm. LB growth rates were first assessed by 647 manual fitting of a line to the exponential phase of the growth curve. This dataset was then 648 used to confirm the accuracy of R code that calculated the fastest growth rate from each 649 growth curve using a "sliding window" approach where a line was fit to a 3-9 timepoint 650 interval based on the level of noise in the entire curve (higher levels of noise triggered a larger 651 window to smooth the fit). To develop an automated method of analyzing the ASM growth 652 curves, which are much more noisy and irregular than the LB growth curves across the 653 collection, we used standardized metrics for identifying problematic curves that we then also 654 evaluated visually. Curves with a maximum OD increase of less than 0.05 were discarded as 655 non-growing. Curves with linear fits with an R 2 of less than 0.7 were discarded as non-656 analyzable, and a small number of outlier curves (defined as curves analyzed for growth rates 657 of 1.5 times the mean strain growth rate) were also discarded. Examples of our analyzed 658 curves are shown in Figure S7 and all visualizations are available upon request. 659 660 "Adherence" measures 661 The ability to form biofilm is a complex trait that is impacted by multiple factors, such as the 662 production of polysaccharides, motility and the ability to adhere 56-58 . In this study, we have 663 measured adhesion to peg-lids and estimated the ability to make aggregates -both traits 664 have been linked with an isolate's ability to make biofilm 59,60 . Because of this, we are using 665 these two measures as an estimate of our isolates' ability to make biofilm. However, because 666 we are aware of the complexity of the actual biofilm-forming phenotype, we have chosen to 667 refer to this adhesion/aggregation phenotype as "adherence" and not "biofilm formation".