Fast and strong amplifiers of natural selection

Selection and random drift determine the probability that novel mutations fixate in a population. Population structure is known to affect the dynamics of the evolutionary process. Amplifiers of selection are population structures that increase the fixation probability of beneficial mutants compared to well-mixed populations. Over the past 15 years, extensive research has produced remarkable structures called strong amplifiers which guarantee that every beneficial mutation fixates with high probability. But strong amplification has come at the cost of considerably delaying the fixation event, which can slow down the overall rate of evolution. However, the precise relationship between fixation probability and time has remained elusive. Here we characterize the slowdown effect of strong amplification. First, we prove that all strong amplifiers must delay the fixation event at least to some extent. Second, we construct strong amplifiers that delay the fixation event only marginally as compared to the well-mixed populations. Our results thus establish a tight relationship between fixation probability and time: Strong amplification always comes at a cost of a slowdown, but more than a marginal slowdown is not needed.

This is a supplementary note to the manuscript Fast and strong amplifiers of natural selection. The organization of the text is as follows: In Supplementary Note 1 we introduce the necessary terminology and notation, together with the tools we will be using in the proofs. In Supplementary Note 2 we formally state our mathematical results and provide a high level overview of the ideas behind our proofs. In Supplementary Note 3 we then provide those formal proofs.

Evolutionary graph theory
Here we formally define the Moran Birth-death process on a population structure, the key quantities of fixation probability and timescale of fixation, and the notions of amplifiers and strong amplifiers.
Moran process on a population structure. The population structure is described by a connected graph (network) G = (V, E), where V is a set of |V| = N nodes and E is a set of edges, possibly including self-loops, where each edge uv is assigned a weight w_uv. The nodes of G represent sites, each site hosting one individual. The individuals are of two types: residents with fitness normalized to 1, and mutants with (relative) fitness advantage r > 1. The set of nodes occupied by mutants is called a configuration. Given a configuration S, we denote by F(S) the total fitness of the population. The edges of G represent connections between adjacent sites, marking where an individual could place its offspring, and the weights represent the strengths of those interactions. The Moran Birth-death process is a stochastic (random) process that acts in discrete time steps until all nodes are occupied either by mutants (fixation occurred) or by residents (extinction occurred): At each time step, first ("Birth") an individual is selected randomly, proportionally to its fitness, and it produces a copy of itself. Denote the corresponding site by u. Second ("Death"), a site v adjacent to u is selected randomly, proportionally to the weight w_uv of the edge uv, and the individual at site v gets replaced by a copy of the individual at site u.
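The step rule above can be sketched in code. The following minimal Python simulation is ours, not from the paper: it encodes a graph as a dict mapping each node to its outgoing edge weights, and residents/mutants to fitness 1 and r.

```python
import random

def moran_step(weights, mutants, r):
    """One Birth-death step. `weights[u][v]` is the edge weight w_uv;
    residents have fitness 1 and mutants fitness r (our encoding)."""
    nodes = list(weights)
    # Birth: select an individual proportionally to its fitness.
    u = random.choices(nodes, weights=[r if x in mutants else 1.0 for x in nodes])[0]
    # Death: select a neighbor v of u proportionally to the weight w_uv.
    nbrs = list(weights[u])
    v = random.choices(nbrs, weights=[weights[u][x] for x in nbrs])[0]
    # The offspring of u replaces the individual at v.
    if u in mutants:
        mutants.add(v)
    else:
        mutants.discard(v)
    return mutants

def fixated(weights, r, start):
    """Run to absorption; True if the mutants fixate, False on extinction."""
    mutants = {start}
    while 0 < len(mutants) < len(weights):
        mutants = moran_step(weights, mutants, r)
    return len(mutants) == len(weights)
```

For example, on a small complete graph a Monte Carlo average of `fixated` approaches the fixation probability of a single mutant.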
Fixation probability, Extinction probability, Fixation time. Given a graph G with N nodes, a mutant advantage r > 1 and a single node v, we denote by fp(G, r, v) the fixation probability of the Moran Birth-death process, starting with a configuration containing a single mutant with fitness r occupying node v, against a background population of residents with fitness normalized to 1. Similarly, we denote by ep(G, r, v) = 1−fp(G, r, v) the extinction probability.
Concerning the duration of the process, we denote by FT(G, r, v) the (unconditional) fixation time, that is, the (expected) number of steps the process makes until either the mutants or the residents fixate, divided by the population size N. (This quantity, without rescaling by the factor of N, is sometimes also called the absorption time.) Note that the fixation time counts all the steps of the process, including those where the set of nodes occupied by mutants might not have changed due to an offspring replacing an individual of the same type. The argument for rescaling the fixation time by the population size N is that the Moran process can be viewed as a discretization of a continuous-time Markov chain where a node with fitness F reproduces after a time given by an exponentially distributed random variable with mean 1/F (by listing only the timepoints at which a reproduction event has happened). In this continuous viewpoint, a population with double the size undergoes twice as many reproduction events per unit time. Thus we prefer to measure the fixation time in generations rather than in individual steps.
Finally, for completeness we briefly mention another quantity that is sometimes used to measure time. We denote by CFT(G, r, v) the conditional fixation time, that is, the (expected) number of steps the process makes until the mutants fixate, conditioned on the fact that they do, divided by the population size N. In other words, when computing the conditional fixation time, we average over only those evolutionary trajectories that eventually reach fixation of the mutant. We note that the conditional fixation time is typically longer than the (unconditional) fixation time, since the trajectories leading to fixation are typically longer than those leading to extinction. Nevertheless, for strong amplifiers (defined below), the fixation time and the conditional fixation time are typically asymptotically equivalent, since the fixation event has probability tending to 1.
Uniform and temperature initialization. Averaging over the |V| = N possible locations of the initial mutant, we denote by fp(G, r) = (1/N) · Σ_{v∈V} fp(G, r, v) the fixation probability under uniform initialization. This is meaningful under the assumption that mutations arise spontaneously, with the same rate at each node. If one assumes that mutations arise during reproduction, one prefers to study the weighted average fp_T(G, r) = Σ_{v∈V} (T(v) / (N − Σ_{u∈V} L(u))) · fp(G, r, v), where L(v) = w_vv / Σ_{u∈V} w_vu is the probability that when a node v reproduces it self-loops, and T(v) = Σ_{u∈V\{v}} w_uv / W(u) is the temperature of node v, with W(u) = Σ_{v'∈V} w_uv' denoting the total outgoing weight of node u. (Note that Σ_{v∈V} T(v) = N − Σ_{u∈V} L(u), so this is indeed a weighted average.) Intuitively, the temperature of a node v counts how many times per generation v is replaced by its neighbors, on average, when all individuals are residents. Therefore the temperature initialization can be thought of as follows: Consider a uniform population of residents, sample a random reproduction event, and then place a mutant on the node being replaced. The extinction probabilities ep(G, r), ep_T(G, r) and the fixation times FT(G, r), FT_T(G, r) under uniform and temperature initialization are defined analogously.
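Under our reading of the definition above (with W(u) the total outgoing weight of node u), temperatures can be computed directly from the edge weights; the snippet and its names are illustrative only:

```python
def temperatures(weights):
    """Temperature T(v) = sum over u != v of w_uv / W(u), where W(u)
    is the total outgoing weight of node u (our reading of the
    definition above; `weights[u][v]` encodes w_uv)."""
    W = {u: sum(weights[u].values()) for u in weights}
    T = {v: 0.0 for v in weights}
    for u in weights:
        for v, w in weights[u].items():
            if v != u:
                T[v] += w / W[u]
    return T
```

On the star graph, for instance, this yields a hot center of temperature N − 1 and cold leaves of temperature 1/(N − 1) each.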
Complete graph K_N, amplifiers. The well-mixed ("unstructured") population consisting of N individuals is modeled by a complete graph K_N. It is known [1,2,3] that for any fixed r > 1 we have fp(K_N, r) → 1 − 1/r as N → ∞. This gives a natural baseline to which one can compare the other population structures. In particular, given r > 1, a population structure A_N with N nodes is called an r-amplifier if it increases the fixation probability as compared to the complete graph of the same size (i.e. fp(A_N, r) > fp(K_N, r)). A population structure that is an r-amplifier for all r > 1 is called an amplifier. (We note that other authors sometimes further require that a graph A_N satisfies fp(A_N, r) < fp(K_N, r) for all r < 1 to be called an amplifier.)
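For any finite N, the baseline itself has the classical closed form fp(K_N, r) = (1 − 1/r)/(1 − r^(−N)) for a single mutant, which tends to 1 − 1/r; a one-line helper (name ours):

```python
def fp_complete(N, r):
    """Fixation probability of a single mutant on K_N, via the
    classical birth-death chain formula (1 - 1/r) / (1 - r**-N)."""
    return (1.0 - 1.0 / r) / (1.0 - r ** (-N))
```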

Results for large N
Superamplification and the timescale of fixation concern the behavior of population structures in the limit of a large population size N → ∞. First we recall the standard mathematical notation for asymptotic results.
Notation for asymptotic results. In the limit N → ∞, we use the standard mathematical notation o(·), O(·), Ω(·), ω(·), and Θ(·), denoting asymptotically strictly less than, less than or equal to, greater than or equal to, strictly greater than, and asymptotically equal to (up to a constant factor), respectively. Finally, we use the approximation sign x ≈ y to state that y = x + o(x).
For a detailed treatment see [4, Section 1.3].
We say that an event happens with high probability if the probability p N that the event happens tends to 1 as N tends to infinity (equivalently, if 1 − p N = o(1)).
Strong amplifiers and timescale of fixation. In order to compare the behavior of population structures for large N, we consider sequences {G_N}_{N=1}^∞ of graphs of increasing population size. Regarding probability, such a sequence {G_N}_{N=1}^∞ is called a strong amplifier (also known as a superamplifier) if, for any fixed r > 1, it satisfies fp(G_N, r) → 1 as N → ∞. Strong amplification is, in a sense, the strongest possible form of amplification: an arbitrarily minute (but fixed) fitness advantage r > 1 guarantees that a single mutant with that advantage will fixate with high probability, as the population size N grows large.
Regarding the duration of the process, we define the timescale of fixation by focusing on how the fixation time depends on N, ignoring both the lower-order terms and the possible constants that depend on r but not on N. For instance, for complete graphs K_N we have FT(K_N, r) = (1 + 1/r) · log N + o(log N), hence for any fixed r > 1 we can write FT(K_N, r) = Θ(log N). Formally, given a sequence {G_N}_{N=1}^∞ and r > 1, the timescale of fixation is a function T(G_N, r) such that FT(G_N, r) = Θ(T(G_N, r)). If the same function T(G_N, r) applies for all r > 1, as is the case for instance for the complete graphs, we omit the parameter r and write T(G_N).

Selection reactors
In this section we introduce Selection reactors. First we describe their underlying (unweighted) graph structure.
Unweighted selection reactor. Given integers N and n, the unweighted selection reactor USR_N(n) is an unweighted graph with N nodes such that: 1. There are n hub nodes and N − n leaf nodes.
2. Every two hub nodes are connected, every hub node is connected to every leaf node, and every leaf node has a self-loop.
Firing in and out. Due to the symmetry of selection reactors, any step of the process can be classified by one of the following four types of replacement events:
1. A hub node replaces a leaf node: We say the hub node fires out.
2. A hub node replaces another hub node: We say the hub node does not fire out (or stays).
3. A leaf node replaces a hub node: We say the leaf fires in.
4. A leaf node replaces itself due to a self-loop: We say the leaf does not fire in (or loops).
Moreover, we say that a fire-in event is a resident fire-in when the individual reproducing is a resident (and similarly for mutant and/or fire-out events).
Selection reactors. Now we can formally define (weighted) selection reactors. Given integers N and n and two real numbers p_in, p_out ∈ (0, 1), the selection reactor SR_N(n, p_in, p_out) is a weighted graph with N nodes such that: 1. There are n hub nodes and N − n leaf nodes.
2. Every two hub nodes are connected by an edge with weight w_h, every leaf node has a self-loop with weight w_l, and every hub node is connected to every leaf node by an edge with weight w_a = 1. The numbers w_h and w_l are chosen such that:
(a) For any leaf node, once it is selected for reproduction, it fires in the hub (as opposed to self-looping) with probability p_in. In other words, we expect to see (N − n) · p_in fire-in events per generation.
(b) For any hub node, once it is selected for reproduction, it fires out to some leaf (as opposed to replacing another hub node) with probability p_out. In other words, we expect to see n · p_out fire-out events per generation.
It is readily checked that the properties 2a and 2b are satisfied by assigning weight w_h = (N − n)(1 − p_out)/((n − 1) · p_out) to each edge within the hub, and weight w_l = n · (1 − p_in)/p_in to each self-loop, respectively.
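The check in the last sentence can be reproduced numerically; this sketch (function names ours) recomputes the fire-in and fire-out probabilities from the weights:

```python
def reactor_weights(N, n, p_in, p_out):
    """Weights from the construction above: w_h within the hub,
    w_l on leaf self-loops, w_a = 1 on hub-leaf edges."""
    w_h = (N - n) * (1.0 - p_out) / ((n - 1) * p_out)
    w_l = n * (1.0 - p_in) / p_in
    return w_h, w_l

def fire_probs(N, n, p_in, p_out):
    """Probability that a selected leaf fires in / a selected hub node fires out."""
    w_h, w_l = reactor_weights(N, n, p_in, p_out)
    fire_in = n / (n + w_l)                         # n hub edges vs one self-loop
    fire_out = (N - n) / ((N - n) + (n - 1) * w_h)  # N-n leaf edges vs n-1 hub edges
    return fire_in, fire_out
```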

Mathematical tools
Here we introduce the mathematical notions and tools required for our proofs.
Martingales. Note that the Moran process on a population structure G = (V, E) can be represented as an absorbing Markov chain, whose states correspond to all the possible configurations S ⊆ V (i.e. subsets of nodes occupied by mutants) and whose transition probabilities can be expressed in terms of the weights of the edges. We analyze such Markov chains using the machinery of martingales: Intuitively, a martingale is a "potential function" φ : 2^V → R that assigns a real number to every configuration S ∈ 2^V such that, in expectation, the value of the potential does not change in one step of the Moran process. Formally, we consider the sequence {S_t}_{t=0}^∞ of random variables where S_0 is the (possibly random) initial configuration of the process and S_t is the configuration after t steps. We say that a function φ is a martingale if E[φ(S_{t+1}) | S_t] = φ(S_t). Similarly, we say that a function ψ is a sub-martingale if, in expectation, it does not decrease, that is, if E[ψ(S_{t+1}) | S_t] ≥ ψ(S_t).
Example martingale. As an example, consider the Moran process running on a complete graph G = K_N and any r ≥ 1. Given any configuration S, let p+_S (resp. p−_S) be the probability that in the next step we gain (resp. lose) one mutant. Let F(S) = N + (r − 1) · |S| be the total fitness of the population. Then we have
p+_S = (r · |S| / F(S)) · (N − |S|)/(N − 1) and p−_S = ((N − |S|) / F(S)) · |S|/(N − 1),
hence p+_S = r · p−_S. Now consider a function φ_c that counts the number of mutants in the current configuration. Formally, φ_c is defined as φ_c(S) = |S|. We claim that φ_c is a martingale when r = 1 and a sub-martingale when r > 1: Indeed,
E[φ_c(S_{t+1}) | S_t = S] = |S| + p+_S − p−_S = |S| + (r − 1) · p−_S,
where the last term is 0 when r = 1 and non-negative when r > 1.
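The identity p+_S = r · p−_S is easy to verify numerically; a small sketch (names ours), assuming K_N has no self-loops:

```python
def step_probs(N, k, r):
    """Probabilities of gaining / losing one mutant on K_N (no self-loops)
    when the current configuration has k mutants; F is the total fitness."""
    F = N + (r - 1.0) * k
    p_plus = (r * k / F) * (N - k) / (N - 1)
    p_minus = ((N - k) / F) * (k / (N - 1))
    return p_plus, p_minus
```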
Specific mathematical tools. We make use of the following standard results.
Proposition 1 (Azuma's inequality). Suppose that {S_k}_{k=0}^∞ is a sub-martingale and there exists d ≥ 0 such that |S_{k+1} − S_k| ≤ d for any k ≥ 0. Then for any t ≥ 0 and any real ε > 0 we have P[S_t − S_0 ≤ −ε] ≤ exp(−ε² / (2 · t · d²)).
Proposition 2 (Markov's inequality). Let X be a nonnegative random variable with expectation µ = E[X] and let λ > 0 be a real number. Then P[X ≥ λ] ≤ µ/λ.
Proposition 4 (Asymptotics of the harmonic series). For any N ≥ 1 we have log N ≤ Σ_{i=1}^N 1/i ≤ log N + 1.
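Proposition 4 can be sanity-checked directly; a quick sketch (name ours):

```python
import math

def harmonic(N):
    """H_N = sum_{i=1}^N 1/i."""
    return sum(1.0 / i for i in range(1, N + 1))
```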

Supplementary Note 2: Results
Here we restate our two main results from the main text and present the high level ideas behind their proofs. Conceptually, our first main result shows that in order to achieve strong amplification, some asymptotic slowdown is necessary. In other words, there are no strong amplifiers that would be only constant factor slower than complete graphs.

Theorem 1. For any strong amplifier {G_N}_{N=1}^∞ and any fixed r > 1 we have FT(G_N, r) = ω(log N); that is, the fixation time is asymptotically longer than the Θ(log N) fixation time of the complete graphs.
To complement this, our second main result shows that strong amplification can be achieved with an arbitrarily small slowdown as compared to the complete graph. The slowdown is given by a function β(N ) which can increase arbitrarily slowly as long as it is unbounded.
In fact, our two main results (Theorem 1 and Theorem 2) can be viewed as immediate corollaries of two slightly more general theorems (Theorem A and Theorem B, respectively). Next we state those two more general theorems and provide the high-level ideas behind their proofs. The full formal proofs appear in Supplementary Note 3.

Proof idea for Theorem 1
Theorem 1 follows directly from Theorem A which, for any fixed r > 1 and any amplifier (not necessarily a strong amplifier), gives a lower bound on fixation time that is inversely proportional to the extinction probability.

Theorem A. Fix r > 1 and let G be a graph with N nodes.
1. If G is an r-amplifier under uniform initialization then FT(G, r) = Ω((1/ep(G, r)) · log N).
2. If G is an r-amplifier under temperature initialization then FT_T(G, r) = Ω((1/ep_T(G, r)) · log N).
Note that this indeed immediately implies Theorem 1: if {G_N}_{N=1}^∞ is a strong amplifier, then for any fixed r > 1 the extinction probability tends to zero in the limit N → ∞, thus FT(G_N, r) = Ω((1/ep(G_N, r)) · log N) = ω(log N) and the slowdown as compared to the complete graph, for which FT(K_N, r) = Θ(log N), is asymptotically non-negligible.
Proof Idea. The proof of Theorem A rests on two ideas. The first idea is that the fixation time on any graph G can be bounded from below in terms of temperatures of its nodes (see Lemma 1) by noting that, on any trajectory to fixation, each node (except for the node where the initial mutant appeared) has to be replaced at least once by one of its neighbors: In particular, if many nodes have a relatively low temperature ("cold nodes") then the fixation time is relatively long since we must wait until each of the cold nodes is replaced at least once (this corresponds to a coupon-collector-like process with many rare coupons).
The second idea is that when fp(G, r) is relatively high then there exist many initial nodes u for which ep(G, r, u) is relatively low. However, when ep(G, r, u) is low then T (u) has to be low too -otherwise u would be likely to be replaced by one of its neighbors before reproducing even once. Thus any graph G with high fixation probability inevitably contains many cold nodes and the general bound given by Lemma 1 becomes particularly strong for strong amplifiers.

Proof idea for Theorem 2
Theorem 2 immediately follows from Theorem B which states that selection reactors with suitably tuned parameters are strong amplifiers whose fixation time is asymptotically arbitrarily close to the fixation time of the complete graphs. We believe that the proof of Theorem B can be modified to apply to the sparse incubators introduced in [5], but for convenience in what follows we focus solely on selection reactors. Specifically, for any function α : N → N that is both o(N^{1/2}) and ω(1), and for any integer N, let SR_α = SR_N(n, p_in, p_out) be the selection reactor with n = N/α(N) hub nodes, ℓ = N − n ≈ N leaf nodes in the periphery, and with fire-in and fire-out rates both equal to p_in = p_out = 1/α²(N). With this notation we have the following theorem.
Theorem B (Reactors are fast strong amplifiers). For any function α : N → R that is both o(N^{1/2}) and ω(1), and for any fixed r > 1, the selection reactors {SR_α}_{N=1}^∞ satisfy the following, under both uniform and temperature initialization: fp(SR_α, r) → 1 as N → ∞ and FT(SR_α, r) = O(α^5(N) · log N).
Note that this indeed immediately implies Theorem 2, as one can take α(N) = β(N)^{1/5}. Before we present the high-level idea behind the proof of Theorem B, we need to introduce more notions. Recall that once a leaf node is selected for reproduction, it fires in the hub with probability p_in. Similarly, a hub node, once selected, fires out with probability p_out. We will make extensive use of a key parameter
h = (p_in / n) / (p_out / ℓ) = (ℓ · p_in) / (n · p_out)
that characterizes the bias towards the hub along any fixed edge that connects a leaf node with a hub node. Our proofs rely on the fact that h = ω(1) and on several other properties. For the parameters stated in Theorem B, we have ℓ = Θ(N) and p_in = p_out = 1/α²(N), and thus h = Θ(α(N)) and all the properties are satisfied.
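For the concrete parameters of Theorem B, the quantities above can be tabulated; this helper is ours (it treats α as a given number and rounds the hub size):

```python
def reactor_params(N, alpha):
    """Parameters of SR_alpha for a given value alpha = alpha(N)
    (helper names ours): hub size n, leaf count ell, rates, and the bias h."""
    n = round(N / alpha)            # hub size n = N / alpha(N)
    ell = N - n                     # number of leaves, ell ~ N
    p_in = p_out = 1.0 / alpha**2
    h = (ell * p_in) / (n * p_out)  # = ell / n here, i.e. Theta(alpha(N))
    return n, ell, p_in, h
```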
Proof idea for probability. Here we sketch the intuition behind our proof that Selection reactors from Theorem B are strong amplifiers. The argument proceeds in five steps. We show that: 1. Whenever the hub contains j ∈ [1, n/2] mutants, the size of the mutant population in the hub is more likely to increase than to decrease, see Lemma 4. This is because the hub by itself is biased towards gaining mutants (due to r > 1), and the leaves do not change this as they interact with the hub only rarely.
2. With high probability, the initial mutant appears at a leaf (for both uniform and temperature initialization), see Lemma 5. This is a simple computation.
3. With high probability, that initial mutant repeatedly fires in the hub before being eliminated and, by the result of step 1, this establishes a mutant subpopulation in the hub that has a superconstant size, see Lemma 6.
4. By Item 1 that subpopulation of mutants in the hub grows to size n/2, with high probability, see Lemma 7. This is because the hub is biased towards gaining mutants.
5. Once there are j ≥ n/2 mutants in the hub, fixation on the whole graph occurs with high probability, see Lemma 8. This is because, at this point, the mutants have established such a large subpopulation that an extinction is extremely unlikely.
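Steps 3-5 lean on the absorption probability of a constant-bias random walk; the standard gambler's ruin formula (helper name ours) makes this intuition quantitative:

```python
def ruin_prob(j, M, bias):
    """Exact probability that a random walk with constant forward bias
    (P(up)/P(down) = bias > 1) started at j hits M before 0."""
    return (1.0 - bias ** (-j)) / (1.0 - bias ** (-M))
```

Even a modest starting population j succeeds with probability at least 1 − (1/bias)^j, which tends to 1 as j grows.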
Proof idea for time. Here we sketch the intuition behind our proof that Selection reactors from Theorem B are fast strong amplifiers. Recall that a configuration S is the subset of nodes that are occupied by mutants. We say that a step of the process is active if it changes the configuration (due to a mutant replacing a resident or vice versa).
The idea is to decompose the process into the active process that tracks the active steps and the waiting process that tracks the steps in which the configuration S stays the same (either due to a node self-looping or due to a node replacing a neighbor of the same type). The argument then proceeds in four steps: 1. We define a certain potential function ψ that assigns a non-negative integer to each configuration and satisfies ψ(S) = O(ℓ · h) for any configuration S.

2. For the active process, we argue that, in expectation, the value of the potential function increases by at least a constant in each active step (and always by at most O(h) in one step). This then implies that, in expectation, the active process attains each potential value k only a few times (namely R_k = O(h²) times), see Lemma 9 and Lemma 10.
3. For the waiting process, we establish an upper bound on the number W_k of (waiting) steps that the process makes until an active step occurs, when the current configuration S is assigned potential value k = ψ(S), see Lemma 11 and Lemma 12.
4. By linearity of expectation, this implies that, in expectation, the process makes in total at most Σ_k R_k · W_k = O(h²) · Σ_k W_k steps. Summing up our upper bounds for W_k from the previous item gives the result.

Supplementary Note 3: Proofs
In the next two subsections we present the formal proofs of our two main theorems.

No superamplifiers are as fast as the complete graph
The goal of this section is to prove Theorem A. First, we present a lower bound on the fixation time for an arbitrary graph G in terms of the temperatures of its nodes. The bound is based on the fact that, on any trajectory to fixation, each node has to be replaced by one of its neighbors at least once. This reduces to a coupon collector process where nodes with lower temperature correspond to coupons that are more rare. Flajolet et al. [6] found a closed-form expression for the expected number of steps of such a coupon collector process, for any given list of temperatures. Here we present a simplified (weaker) bound that takes into account only those nodes whose temperature is lower than a given threshold ("cold" nodes). The proof idea is similar to the proof of Theorem 1 in [3].
Lemma 1. Let G be a graph, u a node of G and r > 1. Suppose that at least k nodes of G different from u have temperature at most t each. Then FT(G, r, u) ≥ fp(G, r, u) · (log k)/(r · t).
Proof. Consider a modified Moran process M′ that is identical to the standard Moran process with a mutant initialized at u, except that whenever the mutants go extinct we instead again initialize a single mutant at u and continue the process. Clearly, the modified process M′ always terminates with the mutants fixating. We claim that its expected fixation time is given by FT′(G, r, u) = (1/fp(G, r, u)) · FT(G, r, u): Indeed, denote by p = fp(G, r, u) the fixation probability and by E = ET(G, r, u), C = CFT(G, r, u), T = FT(G, r, u) the (expected) extinction time, conditional fixation time, and fixation time, respectively. Then we have T = p · C + (1 − p) · E. Denoting by T′ = FT′(G, r, u) the fixation time of M′, by linearity of expectation we have
T′ = p · C + (1 − p) · (E + T′),
since the first mutant either fixates, in which case M′ makes C steps on average and then terminates (probability p), or goes extinct, in which case M′ makes E steps on average and then restarts (probability 1 − p). This rewrites as the desired T′ = T/p.
Let D ⊆ V be a set of k nodes with temperature at most t each (the "cold" nodes), none of them equal to u. Since the process M′ eventually reaches fixation, for each i = 0, ..., k − 1 it transitions from some configuration with i mutants in D to some configuration with i + 1 mutants in D at least once. Any time the process is in a configuration with i mutants in D, the probability p+_i that it gains another mutant in D satisfies p+_i ≤ (r/N) · (k − i) · t, because each of the remaining k − i resident nodes v ∈ D has temperature T(v) ≤ t and thus is replaced in one step with probability at most (r/N) · t. Therefore the expected number of generations spent in such configurations is at least 1/(r · (k − i) · t). By linearity of expectation, summing over i = 0, ..., k − 1 we get
T′ ≥ Σ_{i=0}^{k−1} 1/(r · (k − i) · t) ≥ (log k)/(r · t),
where the last inequality uses Proposition 4. Since FT(G, r, u) = p · T′, the claim follows. Remark. We remark that Lemma 1 yields lower bounds on fixation time for various specific graph families and initialization schemes.
For instance, star graphs S_N contain N − 1 = Θ(N) nodes of temperature 1/(N − 1) = Θ(1/N) each, and therefore for each node u we have FT(S_N, r, u) ≥ fp(S_N, r, u) · Θ(N log N). For uniform initialization we have fp(S_N, r) → 1 − 1/r² = Θ(1), thus averaging over the initial nodes we obtain FT(S_N, r) = Ω(N log N), which is asymptotically tight [3]. Similarly, dense incubators D_N (see [5] for their definition) are superamplifiers that contain Θ(N) nodes of temperature Θ(N^{−2/3}) each, thus FT(D_N, r) = Ω(N^{2/3} log N).
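The star-graph instantiation of Lemma 1 can be evaluated explicitly; the sketch below (function names ours) assumes the uniform-initialization fixation probability 1 − 1/r² quoted above:

```python
import math

def lemma1_bound(k, t, r, fp):
    """Lower bound of Lemma 1: FT >= fp * log(k) / (r * t), in generations."""
    return fp * math.log(k) / (r * t)

def star_bound(N, r):
    """Instantiation for the star S_N: k = N - 1 cold leaves of
    temperature t = 1/(N - 1); fp -> 1 - 1/r**2 under uniform initialization."""
    return lemma1_bound(N - 1, 1.0 / (N - 1), r, 1.0 - 1.0 / r**2)
```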
In order to prove Theorem A using Lemma 1, we aim to show that any superamplifier has many nodes with low temperature. To that end, we use two lemmas. The first one (Lemma 2) allows us to look for nodes u with small extinction probability ep(G, r, u) (instead of nodes with low temperature T(u)). The second one (Lemma 3) states that for any graph G that amplifies (under either uniform or temperature initialization), a constant portion of its nodes have a small extinction probability ep(G, r, u).
Lemma 2. Fix r > 1. Let G be a graph and u its node. Denote by ep_u = ep(G, r, u) the extinction probability of a single mutant starting at node u. Then T(u) ≤ r · ep_u/(1 − ep_u).
Proof. Assume that the configuration of the Moran process at some step t is S_t = {u}. Let p (resp. q) be the probability that |S_{t+1}| = 2 (resp. |S_{t+1}| = 0). The event |S_{t+1}| = 2 occurs when u is selected for reproduction and it does not self-loop. The event |S_{t+1}| = 0 occurs when u is replaced by one of its neighbors. Hence
p = (r/F({u})) · (1 − L(u)) and q = T(u)/F({u}).
Note that if |S_{t+1}| ≠ 2 and |S_{t+1}| ≠ 0, then we have S_{t+1} = S_t = {u}, therefore the extinction probability from u is lower-bounded by ep_u ≥ q/(p + q) = T(u)/(r · (1 − L(u)) + T(u)).
Clearing the denominators and using 1 − L(u) ≤ 1 we get the desired bound.

Lemma 3. Fix r > 1, let c_r = (r + 1)/2 > 1 and let G be a graph with N nodes.
1. (Uniform initialization) Let S = {u ∈ V : ep(G, r, u) ≤ c_r · ep(G, r)}. If G is an r-amplifier under uniform initialization then |S| ≥ N · (r − 1)/(r + 1).
2. (Temperature initialization) Let S_T = {u ∈ V : ep(G, r, u) ≤ c_r · ep_T(G, r)}. If G is an r-amplifier under temperature initialization then |S_T| ≥ N · (r − 1)²/(r(r + 1)²).
Proof. We prove the two claims independently (but along the same lines).
1. Let X be a random variable that denotes the extinction probability of a node of V chosen uniformly at random. Note that X is non-negative, and E[X] = ep(G, r). By Markov's inequality, we have P[X > c_r · ep(G, r)] ≤ 1/c_r. Finally, note that |S| = N · P[X ≤ c_r · ep(G, r)], and thus |S| ≥ N · (1 − 1/c_r) = N · (r − 1)/(r + 1) as claimed.
2. Likewise, let Y be a random variable that denotes the extinction probability of a node of V chosen according to the temperature initialization. Proceeding as before, E[Y] = ep_T(G, r), and since Y is non-negative, by Markov's inequality we have P[Y > c_r · ep_T(G, r)] ≤ 1/c_r; that is, the nodes of S_T carry at least a (1 − 1/c_r)-fraction of the total temperature. Fix u ∈ S_T and denote by ep_u = ep(G, r, u) its extinction probability. By definition of S_T we have ep_u ≤ c_r · ep_T(G, r), and since G is an amplifier under temperature initialization, we have ep_T(G, r) ≤ 1/r, therefore ep_u ≤ (r + 1)/(2r). Plugging this into the bound given by Lemma 2 we obtain T(u) ≤ r · ep_u/(1 − ep_u) ≤ r(r + 1)/(r − 1). Since each node of S_T thus carries only a bounded amount of temperature, this yields |S_T| ≥ N · (r − 1)²/(r(r + 1)²) as claimed.
We are now ready to prove Theorem A.

Theorem A (restated). Fix r > 1 and let G be a graph with N nodes.
1. If G is an r-amplifier under uniform initialization then FT(G, r) = Ω((1/ep(G, r)) · log N).
2. If G is an r-amplifier under temperature initialization then FT_T(G, r) = Ω((1/ep_T(G, r)) · log N).
Proof. Both claims are proved in essentially the same way.
1. Let c_r = (r + 1)/2 > 1 and consider the set S = {u ∈ V : ep(G, r, u) ≤ c_r · ep(G, r)} of nodes with not too high an extinction probability ("cold nodes"). By Lemma 3 we have |S| ≥ N · (r − 1)/(r + 1). Moreover, for any node u ∈ S we have, by Lemma 2,
T(u) ≤ r · ep_u/(1 − ep_u) ≤ r · c_r · ep(G, r)/(1 − c_r · ep(G, r)) = O(ep(G, r)),
where throughout the computation all denominators are positive. Therefore we can apply Lemma 1 to all nodes u ∈ S, with k = |S| − 1 = Θ(N) cold nodes of temperature t = O(ep(G, r)) each; since moreover fp(G, r, u) = Θ(1) for each u ∈ S, averaging over the initial node gives FT(G, r) = Ω((log N)/ep(G, r)).
2. Likewise, let c_r = (r + 1)/2 > 1 and consider the set S_T = {u ∈ V : ep(G, r, u) ≤ c_r · ep_T(G, r)} of nodes with not too high an extinction probability ("cold nodes"). By Lemma 3 we have |S_T| ≥ N · (r − 1)²/(r(r + 1)²). By exactly the same computation as before, for any node u ∈ S_T we have T(u) = O(ep_T(G, r)), where again all denominators are positive. Therefore we can apply Lemma 1 to all nodes u ∈ S_T and conclude that FT_T(G, r) = Ω((log N)/ep_T(G, r)).

Selection reactors are superamplifiers
The goal of this section is to prove Item 1 of Theorem B. Let S_{i,j} denote a configuration consisting of i ∈ [0, ℓ] mutants among the leaves and j ∈ [0, n] mutants in the hub.
The following lemma states that whenever the hub consists mostly of residents, we are at least a constant factor r′ = (r + 1)/2 > 1 more likely to gain a mutant there rather than to lose one. It holds since p_out = o(1) and ℓ · p_in = o(n).
Lemma 4. Consider the Moran process starting from a configuration S_{i,j} with 1 ≤ j ≤ n/2. Let S_{i′,j′} be the first subsequent configuration with j′ ≠ j. Then, for all large enough ℓ, we have P[j′ = j + 1] ≥ r′/(r′ + 1).
Proof. Let p+(i, j) (resp. p−(i, j)) be the probability that, when in configuration S_{i,j}, the number j of mutants in the hub increases (resp. decreases) in a single step. Let F = F(S_{i,j}) = N + (r − 1)(i + j) be the total fitness. A straightforward computation gives
p+(i, j) ≥ (r · j/F) · (1 − p_out) · (n − j)/(n − 1) and
p−(i, j) ≤ (j/F) · (1 − p_out) · (n − j)/(n − 1) + (ℓ/F) · p_in · (j/n),
where in p+(i, j) we counted only the "within-hub" interactions and in p−(i, j) we bounded by the worst case when all leaf nodes are residents. Writing A := (r · j/F) · (1 − p_out) · (n − j)/(n − 1) and B := (ℓ/F) · p_in · (j/n), we thus have p+(i, j) ≥ A and p−(i, j) ≤ A/r + B. We claim that B = o(A): Indeed, we have n − j ≥ n/2 and 1 − p_out = 1 − 1/α(N)² = Θ(1), hence the claim is equivalent to ℓ · p_in = o(n), which is true in our case as ℓ · p_in ≤ N/α(N)² and n = N/α(N). Therefore p+(i, j)/p−(i, j) ≥ A/(A/r + B) = r · (1 − o(1)). In particular, for all large enough N the ratio is at least the smaller constant r′ = (r + 1)/2 > 1, and thus also P[j′ = j + 1] = p+(i, j)/(p+(i, j) + p−(i, j)) ≥ r′/(r′ + 1).
The next lemma states that, with high probability, the first mutant appears in a leaf node. It holds since n = o(ℓ) and p_in = o(1).

Lemma 5. Under both uniform and temperature initialization, the first mutant is initialized at a leaf node with high probability.
Proof. This is a straightforward computation. For uniform initialization, the first mutant appears at a leaf with probability ℓ/N → 1 as N → ∞.
The next lemma states that, with high probability, the initial mutant produces a mutant subpopulation in the hub of a superconstant size. It holds since h = ω(1) and h = o(n³). Lemma 6. Consider the Moran process starting from a configuration S_{i,j} with i ≥ 1. Then, with high probability, the process reaches a configuration S_{i′,j′} with j′ ≥ h^{1/3}.

Proof. Fix a leaf u that hosts a mutant in the initial configuration S_{i,j}.
First, we claim that, with high probability, u fires in the hub √h times before being replaced by a resident: The probability p+ that in a single step u is selected for reproduction and fires in the hub satisfies p+ ≥ (1/N) · p_in. The probability p− that in a single step u is replaced by a resident firing out from the hub satisfies p− ≤ (n/N) · p_out · (1/ℓ). Hence, as long as u contains a mutant, the probability that u fires in the hub before being replaced by a resident satisfies
p ≥ p+/(p+ + p−) ≥ 1/(1 + 1/h) = 1 − 1/(h + 1).
The probability q that u fires in the hub at least √h times before being replaced satisfies q ≥ p^{√h}. Since h = ω(1), by Proposition 3 this event indeed happens with high probability as claimed.
Next, we claim that after √ h mutant fire-ins, the hub contains at least h 1/3 mutants, with high probability. Note that this claim is not obvious, for at least two reasons: First, if the hub was biased towards losing mutants (which it is not), it could happen that each invading mutant gets swiftly eliminated. Second, if the hub was very small (e.g. a constant size), it could happen that the fire-ins from u repeatedly replace the same node.
To prove the second claim, run the process until either the hub contains at least h^{1/3} mutants, or u has fired in the hub √h times. Let t be the (random, stopping) time when the earlier of those two conditions occurs and let p_t be the probability that the configuration S_{i′,j′} at time t still satisfies j′ < h^{1/3}. Consider the potential φ(S_{i,j}) = j that counts the number of mutants in the hub. As in the proof of Lemma 4, when j ≤ h^{1/3} ≤ n/2, the hub together with all the resident fire-ins is still biased towards gaining mutants (or not biased either way when j = 0). Hence in steps when u does not fire in, the potential φ does not decrease in expectation. Moreover, in steps when u does fire in, the potential increases in expectation by at least (n − j)/n ≥ 1 − h^{1/3}/n = 1 − o(1), where the last equality holds due to h = o(n³). Hence, up to time t, the potential φ is a sub-martingale that gains at least 1 − o(1) in expectation per fire-in of u; since after √h fire-ins its expectation is (1 − o(1)) · √h = ω(h^{1/3}), the concentration bound for sub-martingales (Proposition 1) yields p_t = o(1).
The next lemma states that once the hub contains a superconstant number of mutants, mutants eventually gain majority in the hub, with high probability. It holds since h = ω(1) and n = ω(1).
Lemma 7. Let S_{i,j} be a configuration with j ≥ h^{1/3}. Then, with high probability, the process reaches a configuration S_{i′,j′} with j′ ≥ n/2.
Proof. By Lemma 4, for any j ∈ [1, n/2] the mutant subpopulation in the hub is a constant factor r' = (r + 1)/2 > 1 more likely to increase rather than decrease. Hence the probability p that, starting with j mutants, the mutant subpopulation in the hub grows to size n/2 before being eliminated can be lower-bounded by the absorption probability of a 1-dimensional random walk with a constant forward bias r' as

p ≥ (1 − (1/r')^j)/(1 − (1/r')^{n/2}) ≥ 1 − (1/r')^j.

Since both j and n/2 are ω(1), we have p = 1 − o(1).

The next lemma states that once the mutants have majority in the hub, fixation on the whole graph occurs with high probability. It holds since h = ω(1) and h = o(n).

Lemma 8. Let S_{i,j} be a configuration with j ≥ n/2. Then, with high probability, the process reaches the configuration S_{ℓ,n}.
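The gambler's-ruin bound used in the proof of Lemma 7 can be sanity-checked numerically. The sketch below (with illustrative values of r and of the target size, standing in for n/2) verifies that the closed form solves the one-step recurrence of the biased walk and dominates the lower bound 1 − (1/r')^j:

```python
# Sketch: absorption probability of a random walk with forward bias r' > 1
# (steps +1 w.p. r'/(r'+1), -1 w.p. 1/(r'+1)). Reaching b before 0 from j
# has probability (1 - rho^j)/(1 - rho^b) with rho = 1/r'.
# All numeric values below are illustrative only.
def absorption_prob(j, b, r_prime):
    rho = 1.0 / r_prime
    return (1.0 - rho**j) / (1.0 - rho**b)

r = 1.5
r_prime = (r + 1) / 2            # the dampened bias r' = (r+1)/2
b = 100                          # target size, standing in for n/2
q_up = r_prime / (r_prime + 1)   # probability of a +1 step
for j in range(1, b):
    p = absorption_prob(j, b, r_prime)
    # the closed form solves p(j) = q*p(j+1) + (1-q)*p(j-1)
    rec = q_up * absorption_prob(j + 1, b, r_prime) \
        + (1 - q_up) * absorption_prob(j - 1, b, r_prime)
    assert abs(p - rec) < 1e-9
    # and dominates the lower bound used in the proof
    assert p >= 1.0 - (1.0 / r_prime)**j
```

Since the denominator 1 − (1/r')^{n/2} is less than one, the lower bound 1 − (1/r')^j follows immediately, and it tends to 1 whenever j = ω(1).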
Proof. We will show that, in fact, once we have j = ω(h) mutants in the hub, fixation on the whole graph occurs with high probability (this suffices since h = o(n)). For any configuration S_{i,j}, define its "star-like" potential by

φ(S_{i,j}) = β^i · γ^j,

where β = (h + r)/(hr^2 + r) → 1/r^2 and γ = (hr + 1)/(hr + r^2) → 1 are positive real numbers less than one. In particular, we have φ(S_{0,0}) = 1 and φ(S_{ℓ,n}) → 0 as N → ∞. Also, since j = ω(h) we have φ(S_{i,j}) ≤ γ^j = (1 − Θ(1/h))^{ω(h)} → 0 as N → ∞. As we verify below, the potential is chosen such that it does not change in expectation when the reproduction event happens along an edge that connects a hub node with a leaf node. Moreover, for reproduction events that happen along edges within the hub, the potential decreases in expectation. Thus the function φ is a supermartingale and by Doob's optional stopping theorem, the expectation φ_∞ = E[φ(S_T)] of the potential at the (random, stopping) time T when the process terminates satisfies

φ(S_{i,j}) ≥ φ_∞ = fp_{i,j} · φ(S_{ℓ,n}) + (1 − fp_{i,j}) · φ(S_{0,0}),

where fp_{i,j} is the fixation probability starting from the configuration S_{i,j}. Rearranging the terms and dividing by the (positive) expression φ(S_{0,0}) − φ(S_{ℓ,n}), this gives

fp_{i,j} ≥ (φ(S_{0,0}) − φ(S_{i,j}))/(φ(S_{0,0}) − φ(S_{ℓ,n})) = 1 − o(1).

To verify our claim, it suffices to check contributions of three different types of edges:

(a) "within-hub" edges connecting a mutant and a resident: When this edge is used for reproduction, it is r times more likely that we gain a mutant rather than lose one. Denoting the current potential by x, we need to show that

(r/(r + 1)) · xγ + (1/(r + 1)) · (x/γ) ≤ x.

This reduces to (1 − γ)(rγ − 1) ≥ 0, which is true because the first factor is positive and the second one is positive for any large enough population size (recall that h = ω(1)).
(b) "resident leaf – mutant hub" edge: The edge is used in the direction towards the hub with probability p_1 = (1/F) · p_in · (1/n) and in the direction towards the leaf with probability p_2 = (r/F) · p_out · (1/ℓ). Hence we need to verify that

p_1 · (x/γ) + p_2 · βx = (p_1 + p_2) · x,

which, after dividing by x and using p_1/p_2 = h/r, reduces to the identity (h/r)(1/γ − 1) = 1 − β, readily checked by plugging in the values of β and γ.
(c) "mutant leaf – resident hub" edge: Similarly as in the previous case, we get p_1 = (r/F) · p_in · (1/n) and p_2 = (1/F) · p_out · (1/ℓ), with p_1/p_2 = hr. Then we need to verify that

(hr/(hr + 1)) · γx + (1/(hr + 1)) · (x/β) = x,

which is done by rewriting the left-hand side as (x/(hr + 1)) · (hrγ + 1/β) and checking that hrγ + 1/β = hr + 1 by plugging in the values of β and γ.

We are now ready to prove Item 1 of Theorem B.
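The verifications in cases (a)–(c) above reduce to algebraic identities in β, γ, h and r. The following exact-arithmetic sketch (with illustrative values r = 3/2 and h = 1000) confirms them:

```python
from fractions import Fraction

# Exact check of the star-like potential phi(S_{i,j}) = beta^i * gamma^j.
# Direction weights per active edge: within-hub r : 1, case (b) h/r : 1,
# case (c) h*r : 1. The values of r and h are illustrative only.
r = Fraction(3, 2)
h = Fraction(1000)
beta = (h + r) / (h * r**2 + r)
gamma = (h * r + 1) / (h * r + r**2)
assert 0 < beta < 1 and 0 < gamma < 1

# (a) within-hub edge: potential strictly decreases in expectation
assert r / (r + 1) * gamma + Fraction(1) / (r + 1) / gamma < 1

# (b) resident leaf vs mutant hub: exact martingale identity
p1, p2 = h / r, Fraction(1)
assert p1 / gamma + p2 * beta == p1 + p2

# (c) mutant leaf vs resident hub: exact martingale identity
p1, p2 = h * r, Fraction(1)
assert p1 * gamma + p2 / beta == p1 + p2
```

Exact rational arithmetic makes the hub-leaf cases hold with equality, confirming that φ is unchanged in expectation across hub-leaf edges and is a strict supermartingale on within-hub edges.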
Theorem B (Reactors are fast strong amplifiers). For any function α : N → R that is both o(N^{1/2}) and ω(1), and for any fixed r > 1, the selection reactors {SR_α}_{N=1}^∞ satisfy the following, under both uniform and temperature initialization:
1. The fixation probability tends to 1 as N → ∞.
2. The expected number of generations until fixation is O(log N · α^5(N)).

Proof of Theorem B, Item 1. Each of the following four steps happens with high probability: By Lemma 5, the first mutant appears at a leaf node. From there, by Lemma 6, this mutant establishes a mutant subpopulation in the hub with superconstant size j ≥ h^{1/3}. From there, by Lemma 7, the mutant subpopulation in the hub grows until the mutants have majority in the hub. From there, by Lemma 8, the mutants fixate.

Selection reactors are fast superamplifiers
The goal of this section is to prove Item 2 of Theorem B. Denote the configuration with i mutant leaf-nodes and j mutant hub-nodes by S_{i,j}. Let h_0 = ⌊h⌋ be the parameter h = (ℓ · p_in)/(n · p_out) rounded down to the nearest integer. We define ψ(S_{i,j}) = h_0 · i + j. Intuitively, each mutant leaf-node is worth h_0, each mutant hub-node is worth 1, and the potential of a configuration is the total worth of its mutant nodes. The potential function ψ is a rescaled and rounded version of the function ψ'(S) = Σ_{v∈S} 1/deg(v), which was used in several earlier works such as [7, 8, 9].
First we show that, in expectation, each active step increases the potential by at least a constant c (that depends on r > 1). The claim is essentially the same as Lemma 2 in [8] and Lemma 67 in [9].

Lemma 9. Fix a selection reactor SR(ℓ, n, p_in, p_out) and r > 1. Fix a configuration S different from S_{0,0} and S_{ℓ,n}. Sample a random active step of the process and denote the new configuration by S' ≠ S. Let

c_1 = (r − 1)/(r + 1), c_2 = (r · h_0 − h)/(h + r), c_3 = (r · h − h_0)/(rh + 1)

be positive constants and let c = min{c_1, c_2, c_3}. Then E[ψ(S') − ψ(S)] ≥ c.

Proof. We say that an edge is active if its two endpoints are occupied by one mutant and one resident. There are three types of active edges: 1. Type 1 edges ("hub-hub") that connect a mutant in the hub to a resident also in the hub; 2. Type 2 edges ("hub-leaf") that connect a mutant in the hub to a resident in a leaf; and 3. Type 3 edges ("leaf-hub") that connect a mutant in a leaf to a resident in the hub.
Each active edge can be used during a replacement event in either of its two directions. Fix an active edge uv with u a mutant and v a resident and suppose it has type i (where 1 ≤ i ≤ 3). Below we show that, assuming the edge uv is selected for replacement (in one of its two directions), the potential increases at least by c_i in expectation. Since in any one step of the active process some active edge is used for replacement, we conclude that each active step increases the potential by at least min_i {c_i} = c, in expectation. We treat the three types of edges separately. For two nodes x, y let p_{x→y} be the probability that in a single step of the Moran process, node x replaces node y. Let F = ℓ + n + (i + j)(r − 1) be the total fitness of the population at a configuration S_{i,j}.
1. Type 1 edges ("hub-hub"): If the mutant at a hub-node u replaces a resident at hub-node v, the potential increases by 1, otherwise it decreases by 1. We have p_{u→v} = r · p_{v→u}, thus, in expectation, the potential increases by

(r · 1 + 1 · (−1))/(r + 1) = (r − 1)/(r + 1) = c_1.

2. Type 2 edges ("hub-leaf"): If the mutant at a hub-node u replaces a resident at leaf-node v, the potential increases by h_0, otherwise it decreases by 1. We have p_{u→v} = (r/F) · p_out · (1/ℓ) and p_{v→u} = (1/F) · p_in · (1/n), thus, in expectation, the potential increases by

(h_0 · p_{u→v} − p_{v→u})/(p_{u→v} + p_{v→u}) = (r · h_0 − h)/(h + r) = c_2.

3. Type 3 edges ("leaf-hub"): If the mutant at a leaf-node u replaces a resident at hub-node v, the potential increases by 1, otherwise it decreases by h_0. We have p_{u→v} = (r/F) · p_in · (1/n) and p_{v→u} = (1/F) · p_out · (1/ℓ), thus, in expectation, the potential increases by

(p_{u→v} − h_0 · p_{v→u})/(p_{u→v} + p_{v→u}) = (r · h − h_0)/(rh + 1) = c_3.

In particular, for any fixed r > 1 and as N → ∞ we have h → ∞, hence c_2 → r − 1 and c_3 → (r − 1)/r, and thus c = c_1 = (r − 1)/(r + 1) is a constant that only depends on r.
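These three expected increments can be checked numerically. The sketch below (with illustrative values of r and h, and the increment expressions derived from the direction weights above) confirms that c_1 is the minimum once h is large:

```python
import math

# Sketch of the drift constants in Lemma 9 (r and h values illustrative):
#   c1 = (r-1)/(r+1), c2 = (r*h0 - h)/(h + r), c3 = (r*h - h0)/(r*h + 1),
# where h0 = floor(h).
def drift_constants(r, h):
    h0 = math.floor(h)
    c1 = (r - 1) / (r + 1)
    c2 = (r * h0 - h) / (h + r)
    c3 = (r * h - h0) / (r * h + 1)
    return c1, c2, c3

r = 1.5
for h in (100.0, 1000.0, 10000.0):
    c1, c2, c3 = drift_constants(r, h)
    assert 0 < c1 == min(c1, c2, c3)      # c = c1 = (r-1)/(r+1)

# limits as h -> infinity: c2 -> r - 1 and c3 -> (r-1)/r
c1, c2, c3 = drift_constants(r, 1e6)
assert abs(c2 - (r - 1)) < 1e-3 and abs(c3 - (r - 1) / r) < 1e-3
```

Since (r − 1)/(r + 1) < (r − 1)/r < r − 1, the hub-hub term c_1 is asymptotically the bottleneck, as claimed.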
Let R_k be the expected number of times the active process attains a potential value k. The next lemma bounds R_k from above.

Lemma 10. Fix an integer k. Run the active process starting from any configuration. Then the expected total number of visits to configurations T that satisfy ψ(T) = k is O(h^2).
Proof. Run the active process until the potential value k is attained for the first time (if it is never attained, the claim is trivial). Denote the corresponding configuration by S^0 and for t ≥ 0 let S_t be a random variable denoting the configuration after t active steps, starting from S^0 (S_0 = S^0). Consider the function ξ(S_t) = ψ(S_t) − c · t, where c is the positive constant from Lemma 9. We claim that the function ξ is a sub-martingale with differences bounded by h_0 + c: Indeed, by Lemma 9 the sequence ξ(S_0), ξ(S_1), ... is a sub-martingale, and since a single active step changes ψ by at most h_0, its differences are upper-bounded by h_0 + c. By Azuma's inequality [10] we can bound the probability that after precisely t steps the potential is at most its initial value as

P[ψ(S_t) ≤ ψ(S^0)] = P[ξ(S_t) − ξ(S_0) ≤ −c · t] ≤ exp(−c^2 t/(2(c + h_0)^2)).

Summing the infinite geometric series over t = 0, ..., ∞ we obtain an upper bound on the expected number of visits to states T with ψ(T) ≤ ψ(S^0), and thereby also on the number of visits to states with ψ(T) = ψ(S^0). Specifically, we denote the exponent by x = −c^2/(2(c + h_0)^2) and note that since x ∈ (−1/2, 0), we have e^x ≤ 1 + x/2. Upon summing the series we obtain the desired

Σ_{t=0}^∞ e^{xt} = 1/(1 − e^x) ≤ 2/|x| = 4(c + h_0)^2/c^2 = O(h^2).

Next we move our attention to the waiting process. Given a configuration S_{i,j}, let p_{i,j} be the probability that the next step of the process is active. The expected number W_{i,j} of waiting steps the process makes when at S_{i,j} before we observe an active step then satisfies W_{i,j} = 1/p_{i,j}. The next lemma establishes a lower bound on p_{i,j}.
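The series summation in the proof of Lemma 10 can be verified numerically. A small sketch, with illustrative values of c and h_0:

```python
import math

# Sketch of the series bound in Lemma 10 (c and h0 values illustrative):
# with x = -c^2/(2*(c + h0)^2) in (-1/2, 0) we have e^x <= 1 + x/2, hence
# sum_{t>=0} e^{x t} = 1/(1 - e^x) <= 2/|x| = 4*(c + h0)^2/c^2 = O(h^2).
def visit_bound(c, h0):
    x = -c**2 / (2.0 * (c + h0)**2)
    assert -0.5 < x < 0
    assert math.exp(x) <= 1 + x / 2      # e^x <= 1 + x/2 on (-1/2, 0)
    series = 1.0 / (1.0 - math.exp(x))   # exact sum of the geometric series
    cap = 4.0 * (c + h0)**2 / c**2
    assert series <= cap
    return series, cap

for h0 in (10, 100, 1000):
    visit_bound(0.2, h0)                 # c = (r-1)/(r+1) = 0.2 for r = 1.5
```

The inequality e^x ≤ 1 + x/2 holds on (−1/2, 0) because e^x − 1 − x/2 vanishes at 0 and is increasing there, which is what makes the cap 2/|x| valid.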
Lemma 11. Consider a selection reactor with n ≤ ℓ and any configuration S = S_{i,j} different from S_{0,0} and S_{ℓ,n}. Then the probability p_{i,j} that the next step of the process is active satisfies

p_{i,j} ≥ (p_in/(2rℓn)) · (i(n − j) + (ℓ − i)j).

Proof. We account for the fire-in events only. Let F = ℓ + n + (r − 1)(i + j) be the total fitness. Since n ≤ ℓ we have F ≤ 2rℓ. The probability p_r that a resident fires in the hub and replaces a mutant equals

p_r = ((ℓ − i)/F) · p_in · (j/n) = (p_in/(Fn)) · (ℓ − i)j,

and the probability p_m that a mutant fires in the hub and replaces a resident equals

p_m = (ri/F) · p_in · ((n − j)/n) ≥ (p_in/(Fn)) · i(n − j).

Hence p_{i,j} ≥ p_r + p_m ≥ (p_in/(2rℓn)) · (i(n − j) + (ℓ − i)j), as desired.

Next, for each k ∈ [1, ℓ · h_0 + n − 1] let W_k = max{W_{i,j} | ψ(S_{i,j}) = k} be the largest possible number of waiting steps we can expect to encounter, when at a configuration with potential value equal to ψ(S_{i,j}) = k. The following lemma bounds the sum Σ_{k=1}^{ℓh_0+n−1} W_k from above.
Lemma 12. Suppose n ≤ ℓ/4. Then Σ_{k=1}^{ℓh_0+n−1} W_k = O((ℓh_0/p_in) · log N).

Proof. Split the configuration space into subsets where each subset X_k contains configurations S_{i,j} with k = ψ(S_{i,j}) = h_0 · i + j. First, within each slice, we identify the configuration where the bound on p_{i,j} from Lemma 11 is weakest. To that end, it suffices to analyze the expression t(i, j) = i(n − j) + (ℓ − i)j for a fixed k = h_0 · i + j. Note that t(i, j) = t(ℓ − i, n − j). Plugging in j = k − h_0 · i we get

t(i, j) = i(n − (k − h_0 i)) + (ℓ − i)(k − h_0 i) = 2h_0 · i^2 − i · (2k + h_0 ℓ − n) + ℓk,

which is a quadratic function of i with a positive coefficient 2h_0 by the leading term. Its minimum over real numbers is attained for i_0 = (2k + h_0 ℓ − n)/(4h_0). Note that when k ≤ ½(ℓh_0 + n) − n then we have i_0 ≥ k/h_0, which implies that the corresponding j_0 satisfies j_0 ≤ 0, thus the minimum of t(i, j) over real numbers is attained at a point outside of the allowed configuration space. For such k we thus get t(i, j) ≥ t(k/h_0, 0) = (n/h_0) · k.
Finally, for the 2n + 1 intermediate values k ∈ [½(ℓh_0 + n) − n, ½(ℓh_0 + n) + n], note that any corresponding configuration S_{i,j} satisfies i ∈ [ℓ/2 − 3n/(2h_0), ℓ/2 + 3n/(2h_0)]. Since t(i, j) is non-decreasing in j when i ≤ ℓ/2 and non-increasing in j when i ≥ ℓ/2, together with the symmetry t(i, j) = t(ℓ − i, n − j) this implies that

t(i, j) ≥ t(ℓ/2 − 3n/(2h_0), 0) = (ℓ/2 − 3n/(2h_0)) · n ≥ ℓn/4,

where for convenience in the second inequality we used the bound 3n/(2h_0) ≤ ℓ/4, which holds for all large enough ℓ (recall that h_0 = ω(1)). It remains to compute the sum over all k ∈ [1, ℓh_0 + n − 1]. Using Lemma 11, the above analysis and the fact that h_0 = o(ℓ), we can write

Σ_{k=1}^{ℓh_0+n−1} W_k ≤ 2 · Σ_{k ≤ ½(ℓh_0+n)−n} (2rℓn)/(p_in · (n/h_0) · k) + (2n + 1) · (2rℓn)/(p_in · ℓn/4) = O((ℓh_0/p_in) · log(ℓh_0)) + O(n/p_in) = O((ℓh_0/p_in) · log N),

where the factor 2 accounts, via the symmetry t(i, j) = t(ℓ − i, n − j), for the values k ≥ ½(ℓh_0 + n) + n. We are now ready to prove Item 2 of Theorem B.
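The algebra of the slice analysis in Lemma 12 is easy to verify by machine. The sketch below (with hypothetical sizes ℓ, n, h_0 chosen only for illustration) checks the symmetry of t, its quadratic form on a slice, and the lower bound (n/h_0) · k for small k:

```python
# Sketch of the slice analysis in Lemma 12 (sizes are illustrative):
# t(i, j) = i*(n - j) + (l - i)*j obeys t(i, j) = t(l - i, n - j), equals
# 2*h0*i^2 - i*(2k + h0*l - n) + l*k on the slice k = h0*i + j, and
# satisfies t >= (n/h0)*k whenever k <= (l*h0 + n)/2 - n.
def t(i, j, l, n):
    return i * (n - j) + (l - i) * j

l, n, h0 = 4000, 1000, 50            # hypothetical sizes with n <= l/4
for i, j in ((0, 1), (17, 3), (1000, 500), (3999, 999)):
    assert t(i, j, l, n) == t(l - i, n - j, l, n)          # symmetry
    k = h0 * i + j
    assert t(i, j, l, n) == 2*h0*i*i - i*(2*k + h0*l - n) + l*k

# brute-force minimum over a slice for several small k
for k in (1, 100, 5000, (h0 * l - n) // 2):
    vals = [t(i, k - h0 * i, l, n)
            for i in range(k // h0 + 1) if 0 <= k - h0 * i <= n]
    assert min(vals) >= (n / h0) * k
```

The brute-force loop confirms that, on each small-k slice, the weakest configuration is the one with all mutants on leaves (j = 0), exactly as the quadratic analysis predicts.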
Proof of Theorem B, Item 2. For each k ∈ [1, ℓh_0 + n − 1], consider the "slice" X_k consisting of configurations S_{i,j} with k = ψ(S_{i,j}) = h_0 · i + j. Let R_k be the expected number of times a slice X_k is visited (by an active step) and let W_k be an upper bound on the expected number of steps before the slice is left, once it is visited. Rescaling from steps to generations, the expected fixation time T(N) then satisfies

T(N) ≤ (1/N) · Σ_{k=1}^{ℓh_0+n−1} R_k · W_k.

By Lemma 10, for any k we have R_k = O(h^2). Hence by Lemma 12 we have

T(N) = O((h^2/N) · (ℓh_0/p_in) · log N) = O((h^3/p_in) · log N).

Since we have h = Θ(α(N)) and p_in = 1/α^2(N), this gives the desired bound O(log N · α^5(N)).