Construction of arbitrarily strong amplifiers of natural selection using evolutionary graph theory

Because of the intrinsic randomness of the evolutionary process, a mutant with a fitness advantage has some chance to be selected but no certainty. Any experiment that searches for advantageous mutants will lose many of them due to random drift. It is therefore of great interest to find population structures that improve the odds of advantageous mutants. Such structures are called amplifiers of natural selection: they increase the probability that advantageous mutants are selected. Arbitrarily strong amplifiers guarantee the selection of advantageous mutants, even for very small fitness advantage. Despite intensive research over the past decade, arbitrarily strong amplifiers have remained rare. Here we show how to construct a large variety of them. Our amplifiers are so simple that they could be useful in biotechnology, when optimizing biological molecules, or as a diagnostic tool, when searching for faster dividing cells or viruses. They could also occur in natural population structures.

Summary of our results on the existence of strong amplifiers. We consider different initialization schemes (temperature initialization or uniform initialization) and graph families (presence or absence of loops and/or weights). The "✓" symbol marks that, for the given choice of initialization scheme and graph family, almost all graphs admit a weight function that makes them strong amplifiers. The "×" symbol marks that, for the given choice of initialization scheme and graph family, no strong amplifiers exist (under any weight function). The asterisk signifies that the negative results under uniform initialization hold only for bounded degree graphs.
state the negative results (Section 4) and the positive result (Section 5). Fully rigorous mathematical proofs, with all the technical details and complete calculations, are available in [21]. Here we focus on presenting the key ideas, omitting some of those details. Next we present several explicit examples of arbitrarily strong amplifiers (Section 6) and the detailed description of simulation results (Section 7).
2 Model and Summary of Results

Model
The birth-death Moran process. The Moran process considers a population of n individuals, which undergoes reproduction and death, and each individual is either a resident or a mutant [18]. The residents and the mutants have constant fitness 1 and r, respectively. The Moran process is a discrete-time stochastic process defined as follows: in the initial step, a single mutant is introduced into a homogeneous resident population. At each step, an individual is chosen randomly for reproduction with probability proportional to its fitness; another individual is chosen uniformly at random for death and is replaced by a new individual of the same type as the reproducing individual. Eventually, this Markovian process ends when all individuals become of one of the two types. The probability of the event that all individuals become mutants is called the fixation probability.
The Moran process on graphs. In general, the Moran process takes place on a population structure, which is represented as a graph. The vertices of the graph represent individuals and edges represent interactions between individuals [14,19]. Formally, let $G_n = (V_n, E_n, W_n)$ be a weighted, directed graph, where $V_n = \{1, 2, \ldots, n\}$ is the vertex set, $E_n$ is the Boolean edge matrix, and $W_n$ is a stochastic weight matrix. An edge is a pair of vertices $(i, j)$, which is indicated by $E_n[i, j] = 1$ and denotes that there is an interaction from $i$ to $j$ (whereas $E_n[i, j] = 0$ if there is no interaction from $i$ to $j$). The stochastic weight matrix $W_n$ assigns weights to interactions, i.e., $W_n[i, j]$ is positive iff $E_n[i, j] = 1$, and for all $i$ we have $\sum_j W_n[i, j] = 1$. For a vertex $i$, we denote by $\mathrm{In}(i)$ (resp., $\mathrm{Out}(i)$) the set of vertices that have an incoming (resp., outgoing) interaction or edge to (resp., from) $i$. Similarly to the Moran process, at each step an individual is chosen randomly for reproduction with probability proportional to its fitness. An edge originating from the reproducing vertex is selected randomly with probability equal to its weight. The terminal vertex of the chosen edge takes on the type of the vertex at the origin of the edge. In other words, the stochastic matrix $W_n$ is the weight matrix that represents the choice probability of the edges. We only consider graphs which are connected, i.e., every pair of vertices is connected by a path. This is a sufficient condition to ensure that in the long run, the Moran process reaches a homogeneous state (i.e., the population consists entirely of individuals of a single type). The well-mixed population is represented by a complete graph where all edges have equal weight of $1/n$.

Classification of graphs. We consider the following classification of graphs:
1. Directed vs undirected graphs. A graph $G_n = (V_n, E_n, W_n)$ is called an undirected graph if for all $1 \le i, j \le n$ we have $E_n[i, j] = E_n[j, i]$. In other words, there is an edge from $i$ to $j$ iff there is an edge from $j$ to $i$, which represents symmetric interaction. If a graph is not undirected, then it is called a directed graph.
2. Self-loop free graphs. A graph $G_n = (V_n, E_n, W_n)$ is called a self-loop free graph iff for all $1 \le i \le n$ we have $E_n[i, i] = 0$.
3. Weighted vs unweighted graphs. A graph $G_n = (V_n, E_n, W_n)$ is called an unweighted graph if for all $1 \le i \le n$ and all $j \in \mathrm{Out}(i)$ we have $W_n[i, j] = 1/|\mathrm{Out}(i)|$. In other words, in unweighted graphs, for every vertex the outgoing edge is chosen uniformly at random. Note that for unweighted graphs the weight matrix is determined by the graph structure $(V_n, E_n)$ and need not be specified separately. In the sequel, we will represent unweighted graphs as $G_n = (V_n, E_n)$.
4. Bounded degree graphs. The degree of a graph $G_n = (V_n, E_n, W_n)$, denoted $\deg(G_n)$, is $\max\{|\mathrm{In}(i)|, |\mathrm{Out}(i)| \mid 1 \le i \le n\}$, i.e., the maximum in-degree or out-degree. For a family of graphs $(G_n)_{n>0}$ we say that the family has bounded degree if there exists a constant $c$ such that the degree of all graphs in the family is at most $c$, i.e., for all $n$ we have $\deg(G_n) \le c$.
Initialization of the mutant. The fixation probability is affected by many different factors [20]. In a well-mixed population, the fixation probability depends on the population size n and the relative fitness advantage r of mutants [15,19]. For the Moran process on graphs, the fixation probability also depends on the population structure, which breaks the symmetry and homogeneity of the well-mixed population [13,12,6,14,4,7,22,10]. Finally, for general population structures, the fixation probability typically depends on the initial location of the mutant [2,3], unlike the well-mixed population, where the probability of the mutant fixing is independent of where the mutant arises [15,19]. There are two standard ways mutants may arise in a population [14,1]. First, mutants may arise spontaneously and with equal probability at any vertex of the population structure. In this case the mutant arises at a vertex chosen uniformly at random, and we call this uniform initialization. Second, mutants may be introduced through reproduction, and thus arise at a vertex with rate proportional to the incoming edge weights of the vertex. We call this temperature initialization. In general, uniform and temperature initialization result in different fixation probabilities.
Amplifiers, quadratic amplifiers, and strong amplifiers. Depending on the initialization, a population structure can distort fitness differences [14,19,4], where the well-mixed population serves as a canonical point of comparison. Intuitively, amplifiers of selection exaggerate variations in fitness by increasing (respectively decreasing) the chance of fitter (respectively weaker) mutants fixing compared to their chance of fixing in the well-mixed population. In a well-mixed population of size n, the fixation probability is $(1 - 1/r)/(1 - 1/r^n)$. Thus, in the limit of large population (i.e., as $n \to \infty$) the fixation probability in a well-mixed population is $1 - 1/r$ for $r > 1$. We focus on two particular classes of amplifiers that are of special interest. A family of graphs $(G_n)_{n>0}$ is a quadratic amplifier if in the limit of large population the fixation probability is $1 - 1/r^2$. Thus, a mutant with a 10% fitness advantage over the resident has approximately the same chance of fixing in quadratic amplifiers as a mutant with a 21% fitness advantage in the well-mixed population. A family of graphs $(G_n)_{n>0}$ is an arbitrarily strong amplifier (hereinafter called simply a strong amplifier) if for any constant $r > 1$ the fixation probability approaches 1 in the limit of large population sizes, whereas when $r < 1$, the fixation probability approaches 0. A much finer classification of amplifiers is presented in [1]. We focus on quadratic amplifiers, which are the most well-known among polynomial amplifiers, and strong amplifiers, which represent the strongest form of amplification.
Remark 1. Amplifiers tend to have fixation times longer than the well-mixed population. Therefore they are especially useful in situations where the rate-limiting step is the discovery and evaluation of marginally advantageous mutants. An interesting direction for future work would be to consider amplifiers together with the time-scale of evolutionary trajectories.
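As a small worked illustration of the comparison above, the following sketch evaluates the standard well-mixed fixation probability $(1 - 1/r)/(1 - 1/r^n)$ and the two large-population limits. The function names are ours and purely illustrative; they are not taken from the paper.

```python
def rho_well_mixed(r: float, n: int) -> float:
    """Fixation probability of a single mutant in a well-mixed population."""
    if r == 1:
        return 1 / n  # neutral mutant: every individual is equally likely to fix
    return (1 - 1 / r) / (1 - r ** (-n))


def well_mixed_limit(r: float) -> float:
    """Large-population limit for the well-mixed population (r > 1)."""
    return 1 - 1 / r


def quadratic_limit(r: float) -> float:
    """Large-population limit for a quadratic amplifier (r > 1)."""
    return 1 - 1 / r ** 2


# A 10% advantage on a quadratic amplifier versus a 21% advantage when well mixed.
print(quadratic_limit(1.10), well_mixed_limit(1.21))
```

The two printed values coincide up to rounding because $1.1^2 = 1.21$, which is exactly the 10% versus 21% comparison made in the text.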
Existing results. We summarize the main existing results in terms of uniform and temperature initialization.
1. Uniform initialization. First, consider the family of Star graphs, which consist of one central vertex and n − 1 leaf vertices, with each leaf being connected to and from the central vertex. Star graphs are unweighted, undirected, self-loop free graphs, whose degree is linear in the population size. Under uniform initialization, the family of Star graphs is a quadratic amplifier [14,19]. A generalization of Star graphs, called Superstars [14,19,11,5], are known to be strong amplifiers under uniform initialization [8]. The Superstar family consists of unweighted, self-loop free, but directed graphs where the degree is linear in the population size. Another family of directed graphs with strong amplification properties, called Megastars, was recently introduced in [8]. The Megastars are stronger amplifiers than the Superstars, as the fixation probability on the former is approximately $1 - n^{-1/2}$ (ignoring logarithmic factors), and is asymptotically optimal (again, ignoring logarithmic factors). In contrast, the fixation probability on the Superstars is approximately $1 - n^{-1/3}$. In the limit of $n \to \infty$, both families approach the fixation probability 1.
2. Temperature initialization. While the family of Star graphs is a quadratic amplifier under uniform initialization, it is not even an amplifier under temperature initialization [1]. It was shown in [1] that by adding self-loops and weights to the edges of the Star graph, a graph family, namely the family of Looping Stars, can be constructed, which is a quadratic amplifier simultaneously under temperature and uniform initialization. Note that in contrast to Star graphs, the Looping Star graphs are weighted and also have self-loops.

Results
In this work we present several negative as well as positive results that answer the open questions (Questions 1-5) mentioned above. We first present our negative results.
Negative results. Our main negative results are as follows:
1. Our first result (Theorem 1) shows that for any self-loop free weighted graph $G_n = (V_n, E_n, W_n)$ and any $r \ge 1$, the fixation probability under temperature initialization is at most $1 - 1/(r + 1)$. This answers Question 1 in the negative.
2. Our second result (Theorem 2) shows that for any unweighted graph $G_n = (V_n, E_n)$ (with or without self-loops) and any $r \ge 1$, the fixation probability under temperature initialization is at most $1 - 1/(4r + 2)$. This answers Question 2 in the negative.
3. Our third result (Theorem 3) shows that for any bounded degree self-loop free graph (possibly weighted) $G_n = (V_n, E_n, W_n)$ and any $r \ge 1$, the fixation probability under uniform initialization is at most $1 - 1/(c + c^2 r)$, where $c$ is the bound on the degree, i.e., $\deg(G_n) \le c$. This answers Question 3 in the negative.
4. Our fourth result (Theorem 4) shows that for any unweighted, bounded degree graph $G_n = (V_n, E_n)$ (with or without self-loops) and any $r \ge 1$, the fixation probability under uniform initialization is at most $1 - 1/(1 + rc)$, where $c$ is the bound on the degree, i.e., $\deg(G_n) \le c$. This answers Question 4 in the negative.
Significance of the negative results. We now discuss the significance of the above results.
1. The first two negative results show that in order to obtain quadratic amplifiers under temperature initialization, self-loops and weights are inevitable, complementing the existing results of [1]. More importantly, they show a sharp contrast between temperature and uniform initialization: while self-loop free, unweighted graphs (namely, Star graphs) are quadratic amplifiers under uniform initialization, no such graph families are quadratic amplifiers under temperature initialization.
2. The third and fourth results show that without using self-loops and weights, bounded degree graphs cannot be made strong amplifiers even under uniform initialization. See also Remark 4.
Positive result. Our main positive result shows the following:
1. For any constant $\varepsilon > 0$, consider any connected unweighted graph $G_n = (V_n, E_n)$ of n vertices with self-loops and with diameter at most $n^{1-\varepsilon}$. The diameter of a connected graph is the maximum, among all pairs of vertices, of the length of the shortest path between that pair. We establish (Theorem 5) that there is a stochastic weight matrix $W_n$ such that for any $r > 1$ the fixation probability on $G_n = (V_n, E_n, W_n)$ under both uniform and temperature initialization is at least $1 - 1/n^{\varepsilon/3}$. An immediate consequence of our result is the following: for any family of connected unweighted graphs with self-loops $(G_n = (V_n, E_n))_{n>0}$ such that the diameter of $G_n$ is at most $n^{1-\varepsilon}$, for a constant $\varepsilon > 0$, one can construct a stochastic weight matrix $W_n$ such that the resulting family $(G_n = (V_n, E_n, W_n))_{n>0}$ of weighted graphs is a strong amplifier simultaneously under uniform and temperature initialization. Thus we answer Question 5 in the affirmative.
Significance of the positive result. We highlight some important aspects of the results established in this work.
1. First, note that for the fixation probability of the Moran process on graphs to be well defined, a necessary and sufficient condition is that the graph is connected. A uniformly chosen random connected unweighted graph of n vertices has diameter bounded by a constant, with high probability. Hence, within the family of connected, unweighted graphs, the family of graphs of diameter at most $O(n^{1-\varepsilon})$, for any constant $0 < \varepsilon < 1$, has probability measure 1. Our results establish a strong dichotomy: (a) the negative results state that without self-loops and/or without weights, no family of graphs can be a quadratic amplifier (even more so a strong amplifier) even for only temperature initialization; and (b) in contrast, for almost all families of connected graphs with self-loops, there exist weight functions such that the resulting family of weighted graphs is a strong amplifier both under temperature and uniform initialization.
2. Second, with the use of self-loops and weights, even simple graph structures, such as Star graphs, Grids, and well-mixed structures (i.e., complete graphs) can be made strong amplifiers.
3. Third, our positive result is constructive, rather than existential. In other words, we not only show the existence of strong amplifiers, but present a construction of them.
Our results are summarized in Supplementary Table 1.
Remark 2. Edges with zero weight. Note that edges can be effectively erased by being assigned zero weight. (However, no weight assignment can create edges that don't exist.) Therefore, when our construction works for some graph, it also works for a graph that contains some additional edges. In particular, our construction easily works for complete graphs. However, erasing edges is not necessary: the construction can be easily modified to a scenario in which each existing edge is assigned positive (that is, non-zero) weight.
Remark 3. Relation between theoretical and simulation results. Strong amplifiers can only exist in the limit of large population size (see Theorem 5). In Section 7 we show that our construction and weight assignment significantly increase the fixation probability even for graphs with small population size.
3 Preliminaries: Formal Notation

The Moran Process on Weighted Structured Populations
We consider a population of n individuals on a graph $G_n = (V_n, E_n, W_n)$. Each individual of the population is either a resident or a mutant. Mutants are associated with a reproductive rate (or fitness) r, whereas the reproductive rate of residents is normalized to 1. Typically we consider the case where $r > 1$, i.e., mutants are advantageous, whereas when $r < 1$ we call the mutants disadvantageous. We now introduce the formal notation related to the process.
Configuration. A configuration of $G_n$ is a subset $S \subseteq V_n$ which specifies the vertices of $G_n$ that are occupied by mutants; the remaining vertices $V_n \setminus S$ are occupied by residents. We denote by $F(S) = r \cdot |S| + n - |S|$ the total fitness of the population in configuration S, where $|S|$ is the number of mutants in S.
The Moran process. The birth-death Moran process on $G_n$ is a discrete-time Markovian random process. We denote by $X_i$ the random variable for a configuration at time step i, and $F(X_i)$ and $|X_i|$ denote the total fitness and the number of mutants of the corresponding configuration, respectively. The probability distribution for the next configuration $X_{i+1}$ at time i + 1 is determined by the following two events in succession:
Birth: One individual is chosen at random to reproduce, with probability proportional to its fitness. That is, the probability to reproduce is $r/F(X_i)$ for a mutant, and $1/F(X_i)$ for a resident. Let u be the vertex occupied by the reproducing individual.
Death: A vertex $v \in \mathrm{Out}(u)$ is chosen randomly with probability $W_n[u, v]$. The individual occupying v dies, and the reproducing individual places a copy of its own on v.
The above process is known as the birth-death Moran process, where the death event is conditioned on the birth event, and the dying individual is a neighbor of the reproducing one.
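The birth and death steps described above can be sketched as a short Monte Carlo simulation. This is a minimal illustration under our own assumptions, not the authors' simulation code: the weight matrix is a plain list of lists whose rows sum to 1, and the names `simulate_fixation`, `well_mixed` and `estimate` are hypothetical.

```python
import random


def simulate_fixation(W, r, start, rng):
    """Run one birth-death Moran trajectory; return True if the mutants fix.

    W is a row-stochastic weight matrix (list of lists), r is the mutant
    fitness, and start is the vertex holding the initial mutant.
    """
    n = len(W)
    mutant = [False] * n
    mutant[start] = True
    count = 1
    while 0 < count < n:
        # Birth: choose a vertex with probability proportional to its fitness.
        fitness = [r if m else 1.0 for m in mutant]
        u = rng.choices(range(n), weights=fitness)[0]
        # Death: choose an outgoing edge (u, v) with probability W[u][v].
        v = rng.choices(range(n), weights=W[u])[0]
        if mutant[v] != mutant[u]:
            count += 1 if mutant[u] else -1
            mutant[v] = mutant[u]
    return count == n


def well_mixed(n):
    """Complete graph with self-loops where every edge has weight 1/n."""
    return [[1.0 / n] * n for _ in range(n)]


def estimate(W, r, trials, seed=0):
    """Estimate the fixation probability under uniform initialization."""
    rng = random.Random(seed)
    n = len(W)
    wins = sum(simulate_fixation(W, r, rng.randrange(n), rng) for _ in range(trials))
    return wins / trials
```

For example, `estimate(well_mixed(5), 2.0, 4000)` should land close to the closed-form well-mixed value $(1 - 1/2)/(1 - 1/2^5)$, up to sampling noise.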
Probability measure. Given a graph $G_n$ and the fitness r, the birth-death Moran process defines a probability measure on sequences of configurations, which we denote as $P^{G_n,r}[\cdot]$. If the initial configuration is $\{u\}$, then we define the probability measure as $P^{G_n,r}_u[\cdot]$, and if the graph and the fitness r are clear from the context, then we drop the superscript.
Fixation event. The fixation event, denoted $\mathcal{E}$, represents that all vertices are mutants, i.e., $X_i = V_n$ for some i. In particular, $P^{G_n,r}_u[\mathcal{E}]$ denotes the fixation probability in $G_n$ for fitness r of the mutant, when the initial mutant is placed on vertex u. We will denote this fixation probability as $\rho(G_n, r, u) = P^{G_n,r}_u[\mathcal{E}]$.

Initialization and Fixation Probabilities
We will consider three types of initialization, namely, (a) uniform initialization, where the mutant arises at vertices with uniform probability, (b) temperature initialization, where the mutant arises at vertices proportional to the temperature, and (c) convex combination of the above two.
Temperature. For a weighted graph $G_n = (V_n, E_n, W_n)$, the temperature of a vertex u, denoted $T(u)$, is $\sum_{v \in \mathrm{In}(u)} W_n[v, u]$, i.e., the sum of the incoming weights. Note that $\sum_{u \in V_n} T(u) = n$, and a graph is isothermal iff $T(u) = 1$ for all vertices u.
Fixation probabilities. We now define the fixation probabilities under different initialization.
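To make the initialization schemes concrete, the sketch below computes vertex temperatures and the resulting initial distributions, and averages per-vertex fixation probabilities $\rho(G_n, r, u)$ over them. It follows the verbal descriptions of uniform and temperature initialization given earlier. The parametrization of the convex combination by a mixing weight `eta` placed on the temperature part is our assumption, as are all function names.

```python
from fractions import Fraction


def temperature(W):
    """T(u): sum of the weights on edges entering u (column sums of W)."""
    n = len(W)
    return [sum(W[v][u] for v in range(n)) for u in range(n)]


def uniform_init(W):
    """Mutant arises at each vertex with probability 1/n."""
    n = len(W)
    return [Fraction(1, n)] * n


def temperature_init(W):
    """Mutant arises at vertex u with probability T(u)/n."""
    n = len(W)
    return [t / n for t in temperature(W)]


def convex_init(W, eta):
    # ASSUMPTION: eta in [0, 1] is the weight on temperature initialization.
    return [(1 - eta) * p + eta * q
            for p, q in zip(uniform_init(W), temperature_init(W))]


def fixation_probability(rho_by_vertex, init):
    """Average the per-vertex fixation probabilities over an initial distribution."""
    return sum(p * rho for p, rho in zip(init, rho_by_vertex))


# Unweighted, self-loop free Star with center 0 and leaves 1..3.
third, one, zero = Fraction(1, 3), Fraction(1), Fraction(0)
star = [[zero, third, third, third],
        [one, zero, zero, zero],
        [one, zero, zero, zero],
        [one, zero, zero, zero]]
print(temperature(star))       # center has temperature 3, each leaf 1/3
print(temperature_init(star))  # center receives probability 3/4
```

On this small Star, temperature initialization places the mutant on the center with probability 3/4, whereas uniform initialization does so with probability 1/4, which illustrates why the two schemes can give different fixation probabilities.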

Strong Amplifier Graph Families
A family of graphs $\mathcal{G}$ is an infinite sequence of weighted graphs $\mathcal{G} = (G_n)_{n \in \mathbb{N}^+}$.
• Strong amplifiers. A family of graphs $\mathcal{G}$ is a strong uniform amplifier (resp. strong temperature amplifier, strong convex amplifier) if for every fixed $r_1 > 1$ and $r_2 < 1$ we have that $\liminf_{n \to \infty} \rho(G_n, r_1, Z) = 1$ and $\limsup_{n \to \infty} \rho(G_n, r_2, Z) = 0$, where Z = U (resp., Z = T, Z = η).
Intuitively, strong amplifiers ensure (a) fixation of advantageous mutants with probability 1 and (b) extinction of disadvantageous mutants with probability 1. In other words, strong amplifiers represent the strongest form of amplifiers possible.

Negative Results
In the current section we state our negative results, which show the nonexistence of strong amplifiers in the absence of either self-loops or weights. Fully rigorous mathematical proofs, with all the technical details and complete calculations, are available in [21].
Theorem 1. For all self-loop free graphs $G_n$ and for every $r \ge 1$ we have $\rho(G_n, r, T) \le 1 - 1/(r + 1)$.
Corollary 1. There exists no self-loop free family of graphs which is a strong temperature amplifier.
Theorem 2. For all unweighted graphs $G_n$ and for every $r \ge 1$ we have $\rho(G_n, r, T) \le 1 - 1/(4r + 2)$.
Corollary 2. There exists no unweighted family of graphs which is a strong temperature amplifier.
Theorem 3. For all self-loop free graphs $G_n$ with $c = \deg(G_n)$, and for every $r \ge 1$ we have $\rho(G_n, r, U) \le 1 - 1/(c + r \cdot c^2)$.
Corollary 3. There exists no self-loop free, bounded-degree family of graphs which is a strong uniform amplifier.
Theorem 4. For all unweighted graphs $G_n$ with $c = \deg(G_n)$, and for every $r \ge 1$ we have $\rho(G_n, r, U) \le 1 - 1/(1 + r \cdot c)$.
Corollary 4. There exists no unweighted, bounded-degree family of graphs which is a strong uniform amplifier.
Remark 4. Theorems 3 and 4 establish the nonexistence of strong amplification with bounded degree graphs. A relevant result can be found in [16], which establishes an upper bound on the fixation probability of mutants under uniform initialization on unweighted, undirected graphs. If the bounded degree restriction is relaxed to bounded average degree, then recent results show that strong amplifiers (called sparse incubators) exist [9].
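The four upper bounds above do not depend on the population size n, which is why the corresponding families cannot approach fixation probability 1. A minimal sketch that evaluates them for concrete values (the function names are ours and purely illustrative):

```python
def bound_thm1(r: float) -> float:
    """Theorem 1: self-loop free graphs, temperature initialization."""
    return 1 - 1 / (r + 1)


def bound_thm2(r: float) -> float:
    """Theorem 2: unweighted graphs, temperature initialization."""
    return 1 - 1 / (4 * r + 2)


def bound_thm3(r: float, c: int) -> float:
    """Theorem 3: self-loop free graphs of degree c, uniform initialization."""
    return 1 - 1 / (c + r * c ** 2)


def bound_thm4(r: float, c: int) -> float:
    """Theorem 4: unweighted graphs of degree c, uniform initialization."""
    return 1 - 1 / (1 + r * c)


for r in (1.1, 2.0, 10.0):
    print(r, bound_thm1(r), bound_thm2(r), bound_thm3(r, 4), bound_thm4(r, 4))
```

For every fixed r and c each bound is strictly below 1, so the fixation probability stays bounded away from 1 no matter how large the graph becomes.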

Positive Result
In the previous section we showed that self-loops and weights are necessary for the existence of strong amplifiers. In this section we state our positive result, namely that every family of undirected graphs with self-loops whose diameter is not "too large" can be made a strong amplifier by using appropriate weight functions W that assign non-negative real weights to the edges and self-loops of the graphs. Fully rigorous mathematical proofs, with all the technical details and complete calculations, are available in [21].
Theorem 5. Let $\varepsilon > 0$ and $n_0 > 0$ be any two fixed constants, and consider any sequence of unweighted, undirected graphs $(G_n)_{n>0}$ such that $\mathrm{diam}(G_n) \le n^{1-\varepsilon}$ for all $n > n_0$. There exists a sequence of weight functions $(w_n)_{n>0}$ such that the graph family $\mathcal{G} = (G_n^{w_n})_{n>0}$ is a (i) strong uniform, (ii) strong temperature, and (iii) strong convex amplifier.
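The diameter hypothesis of Theorem 5 can be checked on a concrete graph with a breadth-first search from every vertex. The sketch below is only an illustration: the theorem is a statement about a whole family for all sufficiently large n, so a check at a single size proves nothing by itself. The adjacency-list representation and the helper names are our assumptions, and self-loops are omitted because they do not affect distances.

```python
from collections import deque


def diameter(adj):
    """Largest shortest-path distance over all vertex pairs (BFS from each vertex)."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        best = max(best, max(dist.values()))
    return best


def satisfies_diameter_condition(adj, eps):
    """Check diam(G) <= n^(1 - eps) for a single graph with n vertices."""
    n = len(adj)
    return diameter(adj) <= n ** (1 - eps)


def torus_grid(n):
    """n x n Grid that wraps around in both directions."""
    return {(i, j): [((i + 1) % n, j), ((i - 1) % n, j),
                     (i, (j + 1) % n), (i, (j - 1) % n)]
            for i in range(n) for j in range(n)}


def path_graph(n):
    """Path on n vertices, whose diameter n - 1 grows linearly."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


print(satisfies_diameter_condition(torus_grid(6), 0.4))   # True
print(satisfies_diameter_condition(path_graph(36), 0.4))  # False
```

The wrap-around Grid on 36 vertices has diameter 6, well within $36^{0.6}$, while the path on 36 vertices has diameter 35 and fails the check.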

Explicit Examples
In this section we present explicit examples of unweighted graph families and our weight assignment that turns them into arbitrarily strong amplifiers.

Explicit Unweighted Graphs
Complete graph K N .
• Edges: Every vertex is connected to every vertex (including self-loops).
Star graph S N .
• Vertices: One vertex in the center, N − 1 vertices (leaves) around it.
• Edges: The center is connected to all the leaves.
Grid graph G a,b .
• Vertices: Aligned in a grid-like fashion with a rows and b columns.
• Edges: Every vertex is connected to four surrounding vertices (one above, below, to the left, and to the right). In order to avoid boundary conditions, the grid "wraps around", i.e. the vertices in the first row are connected to the vertices in the last row and the same holds for columns.
• Vertices: A complete graph of size n in the center, and n surrounding petals which are complete graphs of sizes k 1 , . . . , k n , respectively.
• Edges: In addition to the edges within center and petals, every petal is connected with all its vertices to a unique vertex from the center.
Remark 5. Note that Grids are graphs of bounded degree: any vertex has only four neighbors (five including a self-loop), no matter how large the Grid is.
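The complete, Star and Grid families above can be written down as Boolean edge matrices in a few lines. The following sketch uses our own helper names and places the Star center at index 0. The `self_loops` flag is an assumption added so that both the self-loop free and the looped variants can be produced.

```python
from fractions import Fraction


def complete_graph(N):
    """Every vertex is connected to every vertex, including itself."""
    return [[1] * N for _ in range(N)]


def star_graph(N, self_loops=True):
    """Center at index 0, leaves at indices 1..N-1."""
    E = [[0] * N for _ in range(N)]
    for leaf in range(1, N):
        E[0][leaf] = E[leaf][0] = 1
    if self_loops:
        for u in range(N):
            E[u][u] = 1
    return E


def grid_graph(a, b, self_loops=True):
    """a x b Grid that wraps around in both directions."""
    N = a * b
    E = [[0] * N for _ in range(N)]
    for i in range(a):
        for j in range(b):
            u = i * b + j
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                E[u][((i + di) % a) * b + (j + dj) % b] = 1
            if self_loops:
                E[u][u] = 1
    return E


def unweighted_weights(E):
    """Weight matrix of the unweighted graph: outgoing edges chosen uniformly."""
    return [[Fraction(e, sum(row)) for e in row] for row in E]


def degree(E):
    """Maximum in-degree or out-degree, counting self-loops."""
    n = len(E)
    out_deg = [sum(row) for row in E]
    in_deg = [sum(E[i][j] for i in range(n)) for j in range(n)]
    return max(out_deg + in_deg)


print(degree(grid_graph(5, 7)), degree(grid_graph(20, 20)))  # 5 5
print(degree(star_graph(6)), degree(star_graph(60)))         # 6 60
```

The printed degrees show the contrast noted in Remark 5: the Grid keeps degree 5 (four neighbors plus a self-loop) at every size, while the degree of the Star grows with the number of vertices.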

Explicit Weight Assignments
For several explicit graph families, we present a way to partition them into hub and branches and to assign weights to edges and self-loops to obtain strong amplifiers.
Star graph $S_N$: $N \to \infty$.
• Hub and branches: The hub H consists of the center vertex together with $\sqrt{N}$ other leaves, called hub-leaves. Every other leaf is a single branch.
• Weights: if u is the center, (N − |H|) · 2 −N + |H| − 2 if u is a hub-leaf, N −2 if u is outside the hub.
See Supplementary Fig. 1(a) for an illustration.
Grid graph $G_{n,n}$: $n \to \infty$.
• Hub and branches: The hub H is the middle row (or one of the two middle rows if n is even); the rest is split into branches by assigning weight 0 to all the remaining horizontal edges.
• Weights: Recall that $N = n^2$ and λ(u) is the distance from vertex u to the hub.
• Hub and branches: The hub H is the complete graph in the center. Every other vertex is a single branch.
See Supplementary Fig. 1(c) for an illustration.
Remark 6. Note that n × n Grid graphs are bounded degree graphs and our construction turns them into strong amplifiers.

Details of Simulation Results
Here we present details of the computer simulation results shown in Fig. 3 of the main article. Fig. 3 of the main article shows how simple structures can be turned into strong amplifiers under uniform initialization by assigning weights according to our algorithm (see Section 6.2). Unless stated otherwise, the values plotted are obtained by simulating the process 10 000 times. For completeness, in Supplementary Fig. 2 we also present analogous comparisons for temperature initialization.
Fig. 3(A). We consider Star graphs $S_N$ with N = 10, 20, ..., 500. For the unweighted Star, the exact fixation probability under both uniform and temperature initialization follows from the formula in [17]. The values for the weighted Star were computed numerically by solving large systems of linear equations.
Fig. 3(B). We consider n × n and n × (n + 1) Grid graphs of sizes N = 9, 12, 16, 20, ..., 100. The unweighted Grid with N vertices is isothermal, so the fixation probability under both uniform and temperature initialization is given by $(1 - 1/r)/(1 - 1/r^N)$.
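For very small graphs, the linear-system approach mentioned for the weighted Star can be illustrated by brute force: one unknown per configuration, with fixation and extinction as absorbing states. The sketch below is not the authors' computation, which presumably exploits the structure of the Star to handle hundreds of vertices. It enumerates all $2^n$ configurations and is therefore only feasible for a handful of vertices. The function name and the use of exact rational arithmetic are our choices.

```python
from fractions import Fraction


def exact_fixation(W, r):
    """Return rho(G, r, u) for every start vertex u by solving (I - P) x = b.

    W is a row-stochastic matrix of Fractions and r a Fraction. Configurations
    are bitmasks; bit u is set when vertex u is occupied by a mutant.
    """
    n = len(W)
    full = (1 << n) - 1
    states = list(range(1, full))  # transient configurations
    index = {s: k for k, s in enumerate(states)}
    m = len(states)
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]  # augmented matrix
    for s in states:
        row = A[index[s]]
        row[index[s]] += 1
        k = bin(s).count("1")
        total = r * k + (n - k)  # total fitness F(S)
        for u in range(n):
            u_mut = (s >> u) & 1
            p_u = (r if u_mut else 1) / total  # birth at u
            for v in range(n):
                if W[u][v] == 0:
                    continue
                t = s | (1 << v) if u_mut else s & ~(1 << v)
                p = p_u * W[u][v]  # death at v
                if t == full:
                    row[m] += p  # one-step jump to fixation
                elif t != 0:
                    row[index[t]] -= p  # extinction (t == 0) contributes 0
    # Gauss-Jordan elimination over exact rationals.
    for c in range(m):
        piv = next(i for i in range(c, m) if A[i][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        inv = 1 / A[c][c]
        A[c] = [x * inv for x in A[c]]
        for i in range(m):
            if i != c and A[i][c] != 0:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[c])]
    return [A[index[1 << u]][m] for u in range(n)]


# Well-mixed population of size 4 with r = 2.
W = [[Fraction(1, 4)] * 4 for _ in range(4)]
print(exact_fixation(W, Fraction(2))[0])  # 8/15
```

The printed value agrees with the closed form $(1 - 1/r)/(1 - 1/r^N)$ for r = 2 and N = 4, which gives a quick sanity check before applying the same routine to a small weighted graph.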