Genomic plasticity and rapid host switching can promote the evolution of generalism: a case study in the zoonotic pathogen Campylobacter

Horizontal gene transfer accelerates bacterial adaptation to novel environments, allowing selection to act on genes that have evolved in multiple genetic backgrounds. This can lead to ecological specialization. However, little is known about how zoonotic bacteria maintain the ability to colonize multiple hosts whilst competing with specialists in the same niche. Here we develop a stochastic evolutionary model and show how genetic transfer of host segregating alleles, distributed as predicted for niche specifying genes, and the opportunity for host transition could interact to promote the emergence of host generalist lineages of the zoonotic bacterium Campylobacter. Using a modelling approach we show that increasing levels of homologous recombination enhance the efficiency with which selection can fix combinations of beneficial alleles, speeding adaptation. We then show how these predictions change in a multi-host system, with low levels of recombination, consistent with real r/m estimates, increasing the standing variation in the population, allowing a more effective response to changes in the selective landscape. Our analysis explains how observed gradients of host specialism and generalism can evolve in a multihost system through the transfer of ecologically important loci among coexisting strains.


Modelling bacterial population dynamics in silico
Here we detail the model used throughout the main paper (MP) and discuss how the data was incorporated into the model and how simulations were performed.

Fitness function
The concept of fitness is fundamental to our approach and we model this using a fitness function, which allows us to quantify the fitness of the cell relative to its environment. Quantifying fitness has been the subject of much investigation, and a number of mathematical approaches have been developed which explore results and ramifications of applying various models. However, it is 1 beyond the scope of this study to evaluate the efficacy of the multitude of approaches to fitness quantification and so we opt for simple but valid approach to aid interpretation, namely the classical additive multilocus fitness model. This is an instance of the well known N K model [1] where no epistasis between loci is assumed, which is equivalent to setting K = 0.
In this model, each allele at each locus is assigned a value between 0 and 1 inclusive, and cell i will survive and remain in the population with probability F (C i ) per generation.

Selection of loci and determination of fitness
From our data set, we selected five loci that contained alleles which were most highly represented in the bacteria found in chicken but not in cow, and the five loci with alleles highly represented in the converse case and took these to be our 'niche-specifying genes'. Furthermore, we ensured that these alleles were in diverse positions upon the bacterial chromosome, which would break frequency dependence due to chromosomal proximity and confer more confidence in our assumption of no epistasis in our fitness function. It of course remains possible that we may have selected alleles where epistatic interactions occur due to functional interdependence but as this function would be likely to be related to the survival in a particular host, this is somewhat implicit in both the selection procedure and the model, and so should not greatly affect the outcome of the simulations.
Allelic fitnesses were determined by taking dividing the frequency of occurence of each allele n a k at a locus L j by the frequency of occurrence of the most frequent allele n max for each As such, the most frequent allele was assigned a fitness of 1, and the less frequent ones were given values relative to this. This was repeated for all ten loci. We used this method to construct fitnesses for the bacteria that were sampled from chickens and those that were sampled from cattle. Furthermore, we created a composite fitness function by elementwise addition of the allelic fitnesses of chicken and cattle, maximised at 1.
To these ten loci, we added five from the MLST schema [2] and the alleles at these loci were all given a fitness of 1 to reflect their neutral nature. These were used as a buffer so that even bacteria with low fitness had a chance of survival (unless highly mutated), but also so we could verify that the algorithm was working correctly and population mixing was occurring as expected as these alleles would become uniformly distributed in the populations, irrespective of fitness function.

Base Model
For the purposes of this study, we represent each cell by a sequence type that corresponds to the alleles found at each locus, selected as detailed above. The model incorporates five cellular processes: resource consumption, mutation, recombination, cell division and cell death. Cell death is further divided by its two causes, fitness and resource-related death. The replenishment of resources is also included in the model which confers an informal carrying capacity as the population will tend to an equilibrium where resources are being used as fast as they are being generated and the population cannot grow any further.
Mutation and recombination occur at the level of the individual locus and cell death and cell division occur at the ST (cell) level. With cell division, a copy of the cell that divides is added to the population, this occurs with rate b per generation. Mutation occurs with rate m per allele per generation, and any allele that mutates is deemed to offer no selective advantage at all and is assigned a fitness of zero. Similarly, recombination occurs with rate r per allele per generation, and an allele that recombines is assigned the value of another allele randomly chosen from those at the same loci proportional to their frequency in the current population.
Resources are generated with a rate of g per generation and are utilised/consumed proportional to their abundance, where each of the N pop −N res cells that are not already using resources in the current population have an equal chance of beginning to utilise a resources, with a rate of u + per generation for each resource R. The N res cells in the population which are using a resource are immune to resource related death for a period of time, which elapses at rate u − per generation. Cells which are unable to locate a resource have die at a rate of d per generation.
Including the fitness related death function from Section 1.1, the full set of processes governing this system can be represented as where C i represents a cell with index i, and the superscript indicates whether cell i is currently using (U ) or is not currently using (N ) a resource. Where the superscript is omitted, the resource status of the cell is irrelevant. C i,j denotes allele j of cell i. R are the number of resources, M is a mutant allele and F (C i ) is the fitness of cell C i .

Stochastic Model
We utilise a stochastic sampling algorithm in which each rate in the model is accorded a corresponding probability of occurence per time step. Performing such a simulation using the Gillespie algorithm would be computationally infeasible and so we use distributions to update the system in discrete steps. Most values were generated via a Binomial distribution (B(n, p)), which determines the number of successes given a number of trials n for a given probability of occurrence per trial p. This was used instead of the more common Poisson distribution for stochastic rate modelling as the number of trials was determined by population size which can vary in our model. As such, we split each generation into ten discrete time periods (∆ = 0.1 generations) and evaluated the operations once per time period. This seemed to give the best tradeoff between accuracy and efficiency. We describe how each process is modelled stochastically at each time step, for a given population consisting of N pop bacterial cells (of which the number using a resource was N res ), each consisting of K loci.
Mutation The number of mutations that will occur in a time step was calculated via a Binomial distribution, n m = B(N pop , mK∆), and then we assigned the loci to mutate uniformly at random. Each mutation was given an identical identifier (allele number 0), which was assigned a fitness of 0.
Recombination The number of recombination events that will occur in a time step was calculated via a Binomial distribution n r = B(N pop , rK∆), and then we assigned the donor alleles uniformly at random. We then determined recipient alleles at the same loci, again uniformly at random, and gave the recipient the allele number and fitness value of the donor.
Cell division The number of cell division events that will occur in a time step was calculated with a Binomial distribution n b = B(N pop , b∆) and then we determined which cells were to divide at random. These cells were duplicated and added to the population.
Cell death (fitness-related) The fitness of each cell was calculated in turn and then we determined whether it survives by drawing a uniform random number q. If q > 1 − F (C i ) then the cell dies and is removed from the population.
Cell death (resource-related) The number of cells that will die from those not using a resource was calculated using a Binomial distribution n d = B(N pop − N res , d∆), and we determined which cells to remove from the population uniformly at random.
Resource utilisation (cell) The number of cells that will start using a resource was calculated using a Binomial distribution n u = B(N pop −N res , uR∆) and this number of cells were selected from those not using a resource uniformly at random to become designated as using a resource.
Resource utilisation (resource) The utilisation of a resource also removes one of the resources from the population making it unable to be utilised. As each cell can only use exactly one resource at at time, when n u is calculated using for the cell case above, this number is subtracted from the number of available resources, R new = R old − n u .
Cessation of resource utilisation As the probability of cessation of resource utilisation is independent of the amount of time already spent using a resource (i.e. follows an exponential decay), we do not need to track the time spent using a resource. As such, the number of cells that will stop using a resource was calculated using a Binomial distribution n c = B(N res , c∆) and this number of cells was selected from those using a resource uniformly at random and were 6 designated as not using a resource.
Resource generation As the number of resources is unbound, the generation of resources can be modelled with a Poisson distribution. As such the new number of resources is R new = R old + P oiss(g∆).

Simulation
An initial population was randomly generated of 50 million cells with allele values appearing proportional to their frequency in the data set. The same initial population was used in every run.
The simulation began with the composite fitness function which was changed to the chicken fitness function after two hundred generations. The purpose of the composite fitness function is to allow alleles to align themselves in fit genotypes, and reach an equilibrium level with no bias toward either niche. In the first experiment, the fitness function was changed to the cow fitness function after a further 1000 generations, whereas in the second experiment it alternated between cattle and chicken every two hundred generations.
Parameter values used in the simulations are given in Table 1 2 Interplay between recombination and the two causes of cell death Aside from imparting a carrying capacity, the resource abundance also creates an interesting interplay between fitness and resource related death in deciding the fate of the cell. When resources are high compared to the population, the predominant driver of cell death is fitness.
Conversely, when the cell has approached its optimal genotype, the cause of death is predominantly resource-driven. Figure 1 shows the relative contribution of fitness and resources to the  (2) ensures an unrestricted growth of approximately double per generation when dividing each generation up into multiple time steps. † As cessation essentially follows an exponential decay, this parameter equal to 1 means the mean time of resource utilisation is one generation.
fate of a cell for a given population size in a simple simulation of one generation with a starting population size and fixed fitness.
This subtlety is important for understanding how the population dynamics affect the composition of the population, and hence how it reacts to host transitions. Immediately following a transition to a new niche after a host switch, resources will be abundant and mean fitness will be low. As such, in this period where cells grow and adapt to the new niche, cell death with be driven by fitness, with the effect of diminishing the less fit genotypes via natural selection. However, once the maximal fitness for this population in this niche has been reached, the cell fate will be determined by resource availability which is entirely random and unrelated to fitness. As such, when the population hits its carrying capacity, in this case enacted by the availability of resources, the fitness of the population ceases to increase in any meaningful capacity.
Crucially, this helps maintain genetic variance in the population, severely slowing convergence to the optimal genotype, and so allows alleles which may be less fit to survive until the next host transition, where they may have a higher fitness. For populations above the carrying capacity, the likelihood of any death being due to resources becomes even more likely, which can be seen in the change in the curves after 2.5 million.
This reenforces the importance of recombination rate to the dynamics and long term viability of the population. As the rate of convergence to the optimal genotype increases with increasing recombination rate, when the recombination rate and carrying capacity are high, the population will have converged to an optimal genotype before the carrying capacity is reached. This has the 9 knock on effect of purging more alleles from the population, and reducing variance, potentially leading to poor adaptation in subsequent niches. Conversely, if the recombination rate is so low that the carrying capacity is reached before the population is sufficiently adapted, then alleles are lost at random meaning the population may become ill equipped to survive in either niche.
As such, this gives more insight into why, in multiple hosts, an intermediate recombination rate is favoured as the interplay with the carrying capacity provides a trade-off between adaptation and maintenance of currently unfit alleles.