Migration and horizontal gene transfer divide microbial genomes into multiple niches

Horizontal gene transfer is central to microbial evolution, because it enables genetic regions to spread horizontally through diverse communities. However, how gene transfer exerts such a strong effect is not understood. Here we develop an eco-evolutionary model and show how genetic transfer, even when rare, can transform the evolution and ecology of microbes. We recapitulate existing models, which suggest that asexual reproduction will overpower horizontal transfer and greatly limit its effects. We then show that allowing immigration completely changes these predictions. With migration, the rates and impacts of horizontal transfer are greatly increased, and transfer is most frequent for loci under positive natural selection. Our analysis explains how ecologically important loci can sweep through competing strains and species. In this way, microbial genomes can evolve to become ecologically diverse where different genomic regions encode for partially overlapping, but distinct, ecologies. Under these conditions ecological species do not exist, because genes, not species, inhabit niches.


Supplementary Methods.
Here we present an extended version of the methods presented in the paper.

A) Continuous model
Here we introduce an extended version of the compartment model presented in the main text. This extended model captures how a focal gene spreads both inside a focal patch where it is under positive selection, and outside this patch. Outside the focal patch the cells do not benefit from the focal gene but pay a fitness cost when they carry it. For simplicity, we capture the external environment as a single patch that is connected to the focal patch by cell migration.
Our model is analogous to an SIR (Sensitive, Infected, Resistant) model, often known as a "compartment model", which is commonly used to describe the dynamics of an epidemic by representing the flow between sub-communities ("compartments").
Our model contains two patches, the focal patch and the environment patch, each comprising two compartments: carriers ("infected") and non-carriers ("sensitive") of the selected trait. In the focal patch there are a constant number of cells (N) and varying fractions of beneficial gene carriers (C) and non-carriers (1-C), outside the patch there is a p-times larger number of cells (pN) and varying fractions of focal gene carriers (E) and non-carriers (1-E). For simplicity, each patch is well-mixed and we do not consider spatial effects. We also do not consider stochastic effects in this model but our later models do. The dynamics within both patches are determined by the following three processes, which are considered to occur continuously in time, where one time unit is one cell generation:

Selection:
A cell's fitness depends on whether it carries the focal gene or not and whether it lives in the focal patch or not. The general fitness term of a cell is w = 1 -ε + β, where ε is the cost that all trait-carriers experience and β is the benefit that only trait-carriers within the focal patch experience. We assume that the benefit of the focal trait in the selective patch is larger than its cost (β > ε) so that the net change in fitness is positive (s = β -ε, s > 0). This leads to the following fitnesses: Carriers in focal patch: w = 1 + s, non-carriers in focal patch: w = 1 Carriers in environment patch: w = 1 -ε, non-carriers in environment patch: w = 1

Horizontal gene transfer:
We let r be the probability that a carrier transmits the focal gene to a non-carrier per cell generation and we assume that horizontal transfer occurs according to the mass action principle: the transmission rate is proportional to both the fraction of recipients (1-C or 1-E) and the fraction of donors (C or E).

Migration:
Migration works as an exchange of cells between focal patch and environment patch. The migration rate is specified through m, the fraction of the cells in the focal patch that is replaced randomly by individuals from the external patch landing in the focal patch per unit time. In the external patch cells from the focal patch replace a fraction of m/p per unit time.
The evolution of our compartment model can be described using two coupled ordinary differential equations (ODEs). We let C N (t) and E N (t) be the number of carriers in the focal patch and the environmental patch, respectively, at time t. Due to the constant carrying capacities of the patches, the number of non-carriers is given through N -C N (t) for the focal patch and through pN -E N (t) for the environment patch. We can then describe the dynamics of the whole system by considering sub-communities C and E only. The rate of transitions from sub- Similarly, in the environment patch the rate of gene transitions is given by In the environment patch the carrier-compartment E is reduced by selection at rate ). In the external patch, migration increases the carrier-compartment E at rate ω migration =mN(C N (t)/N- In summary, the system evolves according to the system of ODEs  (pN), the coupled equations: For a visualisation of our extended compartment model see Supplementary Figure   S1. In the external patch, trait carriers are added through migration from the focal patch, and the trait spreads through horizontal transfer and is removed due to positive selection. Given a very small gene transfer rate, the key parameters that determine to what extend the trait is maintained in the external patch are the cost ε and p, the size of the external patch relative to size of the focal patch. Because microbial genome evolution by gene gain and loss is biased towards loss of unnecessary genes 1,2 , we assume that ε is larger than the gene transfer rate r (ε > r).
We explore in our model an according range of values for ε (10 -5 < ε < 10 -1 in Supplementary Fig. 2 and 10 -5 < ε < 10 -3 in Supplementary Fig. 3). The migration parameter m determines the amount of cell exchange between the patches, and when m is very high it causes the two patches to behave close to a single, well-mixed patch. Because we assume that the focal patch and its environment are spatially separated, we consider m = 0.02 as a high migration rate and choose this to be the upper limit. We plot the fraction of carriers in the external patch at equilibrium (E*) for a range of parameters ε and p (see Supplementary Fig. 2). We determine E* by solving equations (A3), in MATLAB until equilibrium is reached, with initial conditions C(0) = 0.001, E(0) = 0 and for different migration rates m and a selection pressure of s = 2m so that the focal trait is fixed in the selective patch. Supplementary Figure S2 shows that when the external patch is much larger than the focal patch (p > 10 4 ), then the focal trait is extremely rare outside the focal patch (E* < 0.0001). We begin by studying this case in detail, by setting E(t) = 0: The steady states of this system for r ≈ 0 is given approximately by: with C*= 0 defining an unstable equilibrium and C* = (s -m)/((m + 1)s) a stable equilibrium, given s > 0, according to linear stability analysis. Here and throughout, we use an asterisk to denote the steady-state value of a variable. Table 1 provides a summary of the parameters present in this model.  Fig. 1 and Fig. 2, we employ the midpoint rule.
Explicitly modelling immigration. We now move to explicitly modelling immigrating cells to relax the assumption that migrants never arrive pre-adapted in the focal patch and test how this changes our results. For this we return to equations (A3) and we use these equations to repeat the calculations of the final cumulative gene flux using different values for p and ε, and we compare the new results with our results from Figure 2b (Supplementary Fig. 3).
Very small fractions of external carriers ( Supplementary Fig. 2, black areas) then mean that the system with external patch (Eq. A3) behaves like the simplified system (Eq. A4) and the latter is a good approximation of the former. In our next models, we explore the effects of the horizontal gene flux observed when migrants lack the adaptive trait (E(t) = 0) which corresponds to the case were the outside environment is much larger than the focal patch.

B) Coalescence model
We use a coalescence approach to model the genomes of our community under influence of a selective sweep in combination with genetic transfer and migration.
We ask how these processes influence the evolution of genetic diversity in the focal We first explain how we apply the coalescence model to the background genome.
When taking two random loci of the background genome at t = t end , we know that these loci can be found in one of the first 3 states: 11 (with probability P 11 (t end ) = C(t end ) 2 ), 00 (with probability P 00 (t end )= (1 -C(t end )) 2 ), or 01 (with probability P 01 (t end ) = 2C(t end )(1-C(t end ))). The probabilities for the remaining states are zero (P 1 (t end ) = 0, P 0 (t end ) = 0, P m (t end ) = 0). We use these probabilities as our initial condition and simulate the change of all six probabilities backward in time, until t = 0, by solving the following set of coupled ODEs: where we use the following probabilities of the loci to change their state at time t: where f a is the relative fitness of a carrier cell at time t -1, given by: Here we make the typical assumptions of a coalescence model 3 that generations are discrete and non-overlapping and that the community size (N) is constant throughout time.
After running the coalescence simulation from t = t end to t = 0, we obtain the unconditional probabilities for the two background loci (i) to have coalesced within the simulated time interval (P 1 + P 0 ), or (ii) to be derived from two distinct cells that have either been in the community since t = 0, or that have migrated into the patch (P 11 + P 10 + P 00 + P m ). In case (i) the two loci are identical and in case (ii) the two loci are different. Only, however, when the two loci descended from 2 distinct cells at t = 0 that are carriers (with probability P 11 (0)), we assume that the loci are identical, because the genotype that initially carries the beneficial gene comes into the patch as a clonal group. We can calculate the probability of the two background loci being identical as: P(2 background loci are identical) = P 1 (0) + P 0 (0) + P 11 (0).
The probability of two randomly chosen gene sites to be identical is the Simpson's index of diversity 4 . Here we use the inverse of the Simpson's index to measure diversity, which is the effective number of species in a community (of order 2, 5 ). The diversity in the background genome is then given by: Similarly, we can calculate the probability of two focal loci to be identical. Note that two randomly sampled focal loci at t = t end are identical when they are in two carrier cells (state 11) and different when only one locus is in a carrier cell (state 01). We define the diversity of the focal locus as: We measure the power of the horizontal sweep using the ratio of the diversity in the background genome over the diversity in the focal locus, and we call this ratio the diversity ratio DR, given by: (A12)

C) Individual-based model
We where R ij is the amount or resources obtained by cell i from niche j and  j is the summed competitive weight in niche j given by: The total amount of resources obtained by cell i per generation is then given by: The resources of a cell determine the reproduction of a cell. Because the resource share per niche is scaled to N/n, the mean amount of resources per cell per generation is 1. Therefore, we can use R i directly as the mean number of offspring of a cell. We update the cell numbers stochastically using a Poisson distribution. Then, cells that lack the selected trait have a chance of acquiring the trait with a probability C(t)r. We use a rate of gene transfer of r = 10 -4 , which is higher than conservative estimates (10 -6 ≤ r ≤ 10 -5 , 10 ), in order for simulations to run in a reasonable computational time. However, the qualitative effects of niche separation that we observe should generalize to lower rates of genetic transfer. Moreover, recent studies suggest that rates of gene transfer may be orders of magnitude larger than these conservative estimates 11,12 . In simulations with migration, we model this process by replacing existing cells at random with new incoming genotypes (and matching focal locus). Each new genotype is also assigned to an associated niche at random.
Each simulation (implemented in MATLAB) is run for t end = 10 5 time steps, which is equivalent to about 6 years of microbial evolution given a generation time of 30 minutes 13 . We show that the results of our individual-based model match our coalescence model (see Supplementary Fig. 4). We track the background genomes and focal loci of all individuals in the community over time and we also measure the horizontal gene flux by counting the number of gene transfer events. From there we determine the parameters of interest, such as the fraction of carrier cells C, or diversities in different parts of the genome. The diversity of a genome is calculated as: where n is the number of different locus variants present in the community and p i is their respective proportion. This calculation is analogous to the effective number of species in a community (of order 2, 5 ). We measure the power of the horizontal sweep using the ratio of the diversity in the background genome over the diversity in the focal locus, and we call this ratio the diversity ratio DR, given by: where D bg is the diversity in the background genome and D f is the diversity in the focal locus. A diversity ratio of 5, for example, means that effectively 5 different background genomes per single focal locus appear in the genomes of the community.
This diversity ratio can also be calculated for compartment C and compartment 1-C separately. For the compartment of non-carriers (1-C) the background genomes match their focal locus, giving a diversity ratio of 1. For the compartment of carriers (C), gene transfer allows for an increase of diversity in the background genome against the homogeneous focal locus. Thus, the diversity ratio in C is directly linked to the cumulative gene flux. The diversity ratio of the whole community lies between the diversity ratios of compartments C and 1-C, depending on the size of C. It therefore behaves similarly, but not exactly like the cumulative gene flux with respect to a changing selection pressure.