Aspiration dynamics generate robust predictions in heterogeneous populations

Update rules, which describe how individuals adjust their behavior over time, affect the outcome of social interactions. Theoretical studies have shown that evolutionary outcomes are sensitive to model details when update rules are imitation-based but are robust when update rules are self-evaluation based. However, studies of self-evaluation based rules have focused on homogeneous population structures where each individual has the same number of neighbors. Here, we consider heterogeneous population structures represented by weighted networks. Under weak selection, we analytically derive the condition for strategy success, which coincides with the classical condition of risk-dominance. This condition holds for all weighted networks and distributions of aspiration levels, and for individualized ways of self-evaluation. Our findings recover previous results as special cases and demonstrate the universality of the robustness property under self-evaluation based rules. Our work thus sheds light on the intrinsic difference between evolutionary dynamics under self-evaluation based and imitation-based update rules.


Supplementary Note 1: Overview
The supplementary text is organized as follows. First, we introduce the model and define the notation. Then we derive the general condition for strategy success, which applies to a large class of evolutionary dynamics. After that, we focus on aspiration dynamics on weighted networks and derive the exact condition for strategy success under different model assumptions, for instance, symmetric or asymmetric aspirations, and average or accumulated payoffs. For comparison, we also prove a Proposition that connects our approach to those in [1,2] for imitation-based update rules. In addition, we give the condition for strategy success under pairwise comparison rules [3,4].
Supplementary Note 2: Model and notation

Population structure
The population size is $N$ and the structure of the population is described by an undirected weighted graph with edge weights $w_{ij} \ge 0$. This graph is fixed during the whole evolutionary process. Here, $w_{ij} > 0$ denotes the weight of edge $ij$, while $w_{ij} = 0$ means there is no connection between nodes $i$ and $j$. Since edges are undirected, $w_{ij} = w_{ji}$. Moreover, we define the weighted degree of node $i$ as $d_i = \sum_{j=1}^{N} w_{ij}$ [1].
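As a minimal sketch (the graph and weights below are assumptions for illustration, not taken from the paper), the weighted adjacency matrix and weighted degrees can be set up as follows:

import numpy as np

N = 4
W = np.zeros((N, N))
edges = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 2.0, (2, 3): 1.5}  # hypothetical edge weights
for (i, j), w in edges.items():
    W[i, j] = W[j, i] = w            # undirected: w_ij = w_ji
assert np.all(np.diag(W) == 0)       # no self-interactions (w_ll = 0)
d = W.sum(axis=1)                    # weighted degrees d_i = sum_j w_ij
assert np.all(d > 0)                 # every node has at least one neighbor
print(d)                             # [1.5 3.  4.  1.5]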

Discrete-time Markov chain with finite states
In our model, individuals can play one of two strategies, strategy A or strategy B. We consider stochastic aspiration dynamics in structured populations. The state of the system can be described by a (column) binary vector $s = (s_1, s_2, \cdots, s_l, \cdots, s_N)^T$, where $s(l) = s_l$ is the strategy of individual $l$: $s_l = 1$ if individual $l$ uses strategy A and $s_l = 0$ if it uses B. We model the resulting evolutionary process as a discrete-time Markov chain $\{S_n, n = 0, 1, 2, 3, \cdots\}$ with state space $\{s_1, s_2, \cdots, s_M\}$, where $M = 2^N$. For each individual $l$, its strategy at the $k$-th step is thus $S_k(l)$. Note that throughout this supplementary text, all vectors are column vectors by default.
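A convenient implementation choice (an assumption of these sketches, not prescribed by the paper) is to enumerate the $M = 2^N$ states via the bits of an integer index:

import numpy as np

N = 3
M = 2 ** N
states = np.array([[(i >> l) & 1 for l in range(N)] for i in range(M)])
# states[i] is the binary vector s for state s_i; states[i, l] = 1 iff individual l plays A
print(states.shape)  # (8, 3)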

Payoff
On the weighted graph, nodes are occupied by individuals and links indicate who interacts with whom. Individuals interact with their nearest neighbors according to the game given by the following payoff matrix, and they collect the degree-weighted average payoff $\pi$:

$$\begin{array}{c|cc} & A & B \\ \hline A & a & b \\ B & c & d \end{array} \qquad (1)$$

Here, an A-player obtains $a$ against an A-player and $b$ against a B-player, while a B-player obtains $c$ against an A-player and $d$ against a B-player. Self-interactions are excluded and each individual must have at least one neighbor to interact with, which requires $w_{ll} = 0$ and $d_l > 0$ for all $l$.

For each individual $l$, the (weighted) number of times it interacts with A-players at state $s$ is $\sum_{k=1}^{N} w_{lk} s_k$ and that with B-players is $\sum_{k=1}^{N} w_{lk} (1 - s_k)$. The degree-weighted average payoff of individual $l$ at state $s$ is thus

$$\pi_l(s) = \frac{1}{d_l} \sum_{k=1}^{N} w_{lk} \left[ s_l \left( a s_k + b (1 - s_k) \right) + (1 - s_l) \left( c s_k + d (1 - s_k) \right) \right]. \qquad (2)$$
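A minimal sketch of the degree-weighted average payoff (2); the function name and the vectorized form are implementation choices, not from the paper:

import numpy as np

def payoffs(W, s, a, b, c, d):
    """Degree-weighted average payoffs pi_l(s) for all individuals at state s."""
    deg = W.sum(axis=1)
    nA = W @ s               # weighted number of interactions with A-players
    nB = W @ (1 - s)         # weighted number of interactions with B-players
    return (s * (a * nA + b * nB) + (1 - s) * (c * nA + d * nB)) / deg

# example: s = np.array([0, 1, 1, 1]); print(payoffs(W, s, a=3, b=1, c=2, d=4))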

Aspiration-based update functions
In the real world, different individuals may make distinct decisions given the same shortfall of payoffs, which means they may have individualized ways of self-evaluation. Such heterogeneity can be characterized by different update functions [5,6]. We define individual $l$'s update function as $g_l(u)$ ($l = 1, 2, \cdots, N$); these functions represent the tendency to switch strategies. For aspiration-based update functions, $u = \beta(\alpha_l - \pi_l)$, where $\alpha_l$ is individual $l$'s aspiration level, $\pi_l$ is its payoff, and $\beta > 0$ is the intensity of selection [7,1]. Weak selection means $\beta \ll 1$, and $\beta = 0$ is the neutral drift [8]. In addition, each function $g_l(u)$ should satisfy the following restrictions:
• it is a probability, i.e., $g_l(u) \in [0, 1]$ for $u \in \mathbb{R}$;
• it is a strictly increasing function of $u$, i.e., $g_l'(u) = \mathrm{d}g_l(u)/\mathrm{d}u > 0$ for all $u$, which indicates that individuals with higher payoffs have a lower tendency to switch;
• $g_l(0) > 0$, which avoids frozen dynamics at the neutral drift.
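For concreteness, here is a sketch of three Fermi-type update functions that satisfy the restrictions above; the same three functions appear as base functions in the simulations described at the end of this document:

import numpy as np

g1 = lambda u: 1.0 / (1.0 + np.exp(-u))
g2 = lambda u: 1.0 / (1.0 + 10.0 * np.exp(-u))
g3 = lambda u: 10.0 / (10.0 + np.exp(-u))
# each is a probability, strictly increasing, with 0 < g(0) < 1
print(g1(0.0), g2(0.0), g3(0.0))  # 0.5, 1/11, 10/11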

Transitions under aspiration dynamics
Under aspiration dynamics, individuals' self-assessment of their strategies triggers transitions between states. At each time step, a single individual, selected uniformly at random, updates its strategy: it either switches to the other strategy or keeps the current strategy. In the former case, a one-step transition between two distinct states of the associated Markov chain happens only if the states possess the same entries everywhere except at one place, which marks the strategy switch of the selected individual. Formally, a one-step transition between $s_i$ and $s_j$ ($i \ne j$) is possible only if $\sum_l |s_i(l) - s_j(l)| = 1$, where the operator $|\cdot|$ denotes the absolute value. For convenience, we denote the set of states that $s_i$ can transit out of (or into) as $\mathcal{N}_i$ ($s_i \notin \mathcal{N}_i$), namely, the neighboring states of $s_i$. If the selected individual does not switch, the state of the Markov chain stays unchanged. Therefore, the transition probabilities of the Markov chain are

$$p_{ij} = \begin{cases} \dfrac{1}{N}\, g_{l_{ij}}\!\left(\beta\left(\alpha_{l_{ij}} - \pi_{l_{ij}}(s_i)\right)\right), & s_j \in \mathcal{N}_i, \\[2mm] 1 - \displaystyle\sum_{s_k \in \mathcal{N}_i} \frac{1}{N}\, g_{l_{ik}}\!\left(\beta\left(\alpha_{l_{ik}} - \pi_{l_{ik}}(s_i)\right)\right), & j = i, \\[2mm] 0, & \text{otherwise}, \end{cases} \qquad (3)$$

where $l_{ij}$ ($l_{ik}$) is the selected individual, which is uniquely determined by $s_i$ and $s_j$ ($s_k$). A simple way to locate $l_{ij}$ is to look at the (unique) non-zero entry of the vector $s_j - s_i$ when $s_j \in \mathcal{N}_i$. With all these transition probabilities, we obtain a transition matrix $P$ whose $(i, j)$-th entry is $p_{ij}$, describing the one-step transition probability between any pair of states.
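A sketch assembling the full $2^N \times 2^N$ transition matrix under rule (3) as reconstructed above (only feasible for small $N$; function and argument names are assumptions of this sketch):

import numpy as np

def transition_matrix(W, a, b, c, d, alpha, beta, g_list):
    N = W.shape[0]
    deg = W.sum(axis=1)
    M = 2 ** N
    P = np.zeros((M, M))
    for i in range(M):
        s = np.array([(i >> l) & 1 for l in range(N)], dtype=float)
        nA, nB = W @ s, W @ (1 - s)
        pi = (s * (a * nA + b * nB) + (1 - s) * (c * nA + d * nB)) / deg
        for l in range(N):
            j = i ^ (1 << l)                       # flip individual l's strategy
            P[i, j] = g_list[l](beta * (alpha[l] - pi[l])) / N
        P[i, i] = 1.0 - P[i].sum()                 # staying probability
    return P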
Supplementary Note 3: Condition for strategy success

General condition
Let us denote the number of individuals playing strategy A at state $s$ as $\phi(s) = \sum_{l=1}^{N} s_l$. The abundance (frequency) of strategy A and of B at state $s$ is $x_A = \phi(s)/N$ and $x_B = 1 - x_A$, respectively. The frequencies of strategy A at all states are collected in a vector $x = (\phi(s_1)/N, \phi(s_2)/N, \cdots, \phi(s_M)/N)^T$, and those of B are $\mathbf{1} - x$, where $\mathbf{1}$ is a vector with all elements equal to one. Under aspiration dynamics, transitions are possible between any two states (not necessarily in one step) and the number of steps taken can be odd or even. This implies that aspiration dynamics admit a unique stationary distribution $u$, in which each state occurs with a positive probability [9], and all these probabilities sum up to one, i.e., $u^T \mathbf{1} = 1$. The stationary distribution $u$ can be obtained by solving the set of linear equations $u^T P = u^T$. In this stationary distribution, we calculate the average abundance of strategy A as $\langle x_A \rangle = u^T x$ and that of B as $\langle x_B \rangle = u^T (\mathbf{1} - x) = 1 - \langle x_A \rangle$, where the angle bracket $\langle \cdot \rangle$ means taking the average over all states.
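A minimal sketch for extracting $u$ from $u^T P = u^T$ (as the left eigenvector of $P$ for eigenvalue 1) and computing $\langle x_A \rangle$:

import numpy as np

def average_abundance(P, N):
    M = P.shape[0]
    vals, vecs = np.linalg.eig(P.T)
    u = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    u = u / u.sum()                                   # normalize so that u^T 1 = 1
    xA = np.array([bin(i).count("1") / N for i in range(M)])  # phi(s_i)/N
    return u @ xA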
Strategy A is more abundant than strategy B if and only if $\langle x_A \rangle > \frac{1}{2}$ [8,10]. Since $\langle x_A \rangle + \langle x_B \rangle = 1$ [11,12], this is further equivalent to $\langle x_A \rangle - \langle x_B \rangle > 0$. Note that $u$ and $P$ depend on the selection intensity $\beta$, and so do $\langle x_A \rangle$ and $\langle x_B \rangle$. To make this dependence explicit, we write them as $u_\beta$, $P_\beta$, $\langle x_A \rangle_\beta$, and $\langle x_B \rangle_\beta$. In the limit of weak selection $\beta \to 0$, we Taylor expand $\langle x_A - x_B \rangle_\beta$ with respect to $\beta$ at $\beta = 0$ and get

$$\langle x_A - x_B \rangle_\beta = \langle x_A - x_B \rangle_0 + \beta \, \langle x_A - x_B \rangle_0' + O(\beta^2), \qquad (4)$$

where $\langle x_A - x_B \rangle_0'$ is the first-order derivative of $\langle x_A - x_B \rangle_\beta$ with respect to $\beta$, evaluated at $\beta = 0$, and $O(\beta^2)$ collects the higher-order terms in $\beta$, which are negligible as $\beta \to 0$. Note that the first term on the right-hand side of (4) is the average abundance difference between strategies A and B at the neutral drift, which is zero [5].
Since $\beta > 0$, the condition for strategy A to be more abundant than B becomes

$$\langle x_A - x_B \rangle_0' > 0. \qquad (5)$$

To get $u_0'$, we differentiate $u_\beta^T P_\beta = u_\beta^T$ with respect to $\beta$ and evaluate it at $\beta = 0$; rearranging terms, we obtain

$$(u_0')^T (I - P_0) = u_0^T P_0', \qquad (6)$$

where $I$ is the identity matrix of dimension $M$, $u_0' = \frac{\mathrm{d}}{\mathrm{d}\beta} u_\beta |_{\beta=0}$, and $P_0' = \frac{\mathrm{d}}{\mathrm{d}\beta} P_\beta |_{\beta=0}$. Note that $I - P_0$ is not invertible since it is not of full rank. Introducing the matrix $W = \mathbf{1} u_0^T$, the matrix $I - P_0 + W$ is invertible and its inverse, denoted by $Z_0$, is called the fundamental matrix [9]. As shown in ref. [9], $Z_0 = I + \sum_{k=1}^{\infty} \left( P_0^k - W \right)$. Note that $(u_0')^T W = 0^T$ (since $u_\beta^T \mathbf{1} = 1$ implies $(u_0')^T \mathbf{1} = 0$); right multiplying both sides of equation (6) by $Z_0$, we have

$$(u_0')^T = u_0^T P_0' Z_0. \qquad (7)$$

Substituting equation (7) into (5), and writing $\langle x_A - x_B \rangle_0' = (u_0')^T (2x - \mathbf{1})$, we have that strategy A is favored over strategy B if

$$u_0^T P_0' c > 0, \qquad (8)$$

where $c = Z_0 (2x - \mathbf{1})$. Using $\langle x_A - x_B \rangle_0 = u_0^T (2x - \mathbf{1}) = 0$, the entries of $c$ can be written as $c_i = \sum_{k=0}^{\infty} \sum_{j=1}^{M} p_{ij}^{(k)} \left( x_A(s_j) - x_B(s_j) \right)$, where $p_{ij}^{(k)}$ is the $k$-step transition probability from state $s_i$ to $s_j$. Summing over all possible states, $\sum_{j=1}^{M} p_{ij}^{(k)} \left( x_A(s_j) - x_B(s_j) \right)$ is the average abundance difference between strategies A and B at the $k$-th step when starting from state $s_i$. Therefore, $c_i$ is the accumulated average abundance difference during the whole evolution when the initial state is $s_i$.
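A minimal sketch computing the fundamental matrix $Z_0$ and the accumulated abundance differences $c = Z_0(2x - \mathbf{1})$ at the neutral drift; it assumes the neutral chain of the aspiration dynamics, whose stationary distribution is uniform because $P_0$ is symmetric (shown later in this text):

import numpy as np

def accumulated_difference(P0, N):
    M = P0.shape[0]
    u0 = np.full(M, 1.0 / M)            # neutral stationary distribution u_0 = 2^{-N} 1
    Wmat = np.outer(np.ones(M), u0)     # W = 1 u_0^T
    Z0 = np.linalg.inv(np.eye(M) - P0 + Wmat)
    xA = np.array([bin(i).count("1") / N for i in range(M)])
    return Z0 @ (2 * xA - 1)            # vector of c_i, one entry per state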
In fact, equation (8) holds for any evolutionary dynamics that have a unique limiting stationary distribution and equal abundances of strategies A and B at the neutral drift. For example, the death-birth and birth-death processes and pairwise comparison with symmetric mutations belong to this class [13,4,8,12,14,1].

Calculating the accumulated average abundance difference
Recall from equation (8) that $c_i = \sum_{k=0}^{\infty} \sum_{j=1}^{M} p_{ij}^{(k)} \left( x_A(s_j) - x_B(s_j) \right)$, where $p_{ij}^{(k)}$ is the $k$-step transition probability from state $s_i$ to $s_j$ at the neutral drift (i.e., $\beta = 0$). It represents the accumulated average abundance difference during the whole evolution when starting at state $s_i$. Here, the abundance difference at each state aggregates contributions from all individuals: at state $s_j$, individual $l$'s contribution to the abundance difference is $(2 s_j(l) - 1)/N$. The idea is that, to calculate the accumulated average abundance difference during the whole evolution, we can first calculate the individual contributions and then aggregate them. This approach is made possible by the following properties of aspiration dynamics.
Under aspiration dynamics, when $\beta = 0$, the payoffs collected from the games do not affect individuals' strategy revisions. More importantly, during the whole evolution, individuals' strategy revisions are completely independent of each other. Due to this independence, we can study a much simpler two-state Markov chain for each individual instead of dealing with the whole chain with $2^N$ states. For the two-state Markov chain of individual $l$, the state space is $\{0, 1\}$ and the associated transition matrix is

$$P_l = \begin{pmatrix} 1 - \dfrac{g_l(0)}{N} & \dfrac{g_l(0)}{N} \\[2mm] \dfrac{g_l(0)}{N} & 1 - \dfrac{g_l(0)}{N} \end{pmatrix}. \qquad (9)$$

The associated $k$-step transition probabilities are

$$p_l^{(k)}(s \to s) = \frac{1}{2}\left[ 1 + \left( 1 - \frac{2 g_l(0)}{N} \right)^{\!k} \right], \qquad p_l^{(k)}(s \to 1 - s) = \frac{1}{2}\left[ 1 - \left( 1 - \frac{2 g_l(0)}{N} \right)^{\!k} \right].$$

Since individuals' strategy revisions are independent of each other, we can calculate their contributions to the accumulated average abundance difference separately. Starting at state $s_i$, at step $k$, the contribution of individual $l$ to the average abundance difference is

$$\frac{1}{N} \left( 2 s_i(l) - 1 \right) \left( 1 - \frac{2 g_l(0)}{N} \right)^{\!k}.$$

By definition of the update functions, $0 < g_l(0) < 1$ for all $l$. The contribution of individual $l$ to the accumulated average abundance difference during the whole evolution is thus

$$\sum_{k=0}^{\infty} \frac{1}{N} \left( 2 s_i(l) - 1 \right) \left( 1 - \frac{2 g_l(0)}{N} \right)^{\!k} = \frac{2 s_i(l) - 1}{2 g_l(0)}.$$

Summing up all the individual contributions, we obtain the accumulated average abundance difference of the whole Markov chain as

$$c_i = \sum_{l=1}^{N} \frac{2 s_i(l) - 1}{2 g_l(0)}. \qquad (10)$$

By this formula, $c$ depends only on the update functions; it does not depend on the game, the aspiration levels, or the population structure.
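A minimal sketch checking formula (10) against the matrix computation $c = Z_0(2x - \mathbf{1})$ for $N = 3$ at the neutral drift (the values of $g_l(0)$ are assumptions for illustration):

import numpy as np

N, M = 3, 8
g0 = np.array([0.5, 1 / 11, 10 / 11])              # g_l(0) for the three individuals
P0 = np.zeros((M, M))
for i in range(M):
    for l in range(N):
        P0[i, i ^ (1 << l)] = g0[l] / N            # neutral switching probabilities
    P0[i, i] = 1.0 - P0[i].sum()
u0 = np.full(M, 1.0 / M)
Z0 = np.linalg.inv(np.eye(M) - P0 + np.outer(np.ones(M), u0))
xA = np.array([bin(i).count("1") / N for i in range(M)])
c_matrix = Z0 @ (2 * xA - 1)
s = np.array([[(i >> l) & 1 for l in range(N)] for i in range(M)])
c_formula = ((2 * s - 1) / (2 * g0)).sum(axis=1)   # formula (10)
print(np.allclose(c_matrix, c_formula))            # True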

Deriving the exact formula
Denote $h(s_i) = \sum_{j=1}^{M} p_{ij}' c_j$, which is the $i$-th entry of $P_0' c$. Equation (8) implies that the condition for strategy A to be favored over B is

$$\langle h(s) \rangle_0 > 0, \qquad (11)$$

where the bracket $\langle \cdot \rangle_0$ means taking the average over the neutral stationary distribution (i.e., when $\beta = 0$). Substituting formula (10) for $c_j$ into $h(s_i)$, we have

$$h(s_i) = \sum_{j=1}^{M} p_{ij}' \sum_{l=1}^{N} \frac{2 s_j(l) - 1}{2 g_l(0)}.$$

Note that a transition can happen only if $s_i$ and $s_j$ ($i \ne j$) differ in exactly one individual's strategy. Combining this with the transition probability (3), we have that for $j \ne i$,

$$p_{ij}' = \frac{1}{N}\, g_{l_{ij}}'(0) \left( \alpha_{l_{ij}} - \pi_{l_{ij}}(s_i) \right), \qquad (12)$$

where $g_{l_{ij}}'(u) = \mathrm{d}g_{l_{ij}}(u)/\mathrm{d}u$ (see Supplementary Note 2.4). Since $\sum_j p_{ij} = 1$ for all $\beta$, we also have $p_{ii}' = -\sum_{j \ne i} p_{ij}'$, so that $h(s_i) = \sum_{s_j \in \mathcal{N}_i} p_{ij}' (c_j - c_i)$, where $c_j - c_i = (1 - 2 s_i(l_{ij}))/g_{l_{ij}}(0)$ by formula (10). The summation over all neighboring states can then be rewritten as a summation over all individuals, which leads to

$$h(s) = \frac{1}{N} \sum_{l=1}^{N} \frac{g_l'(0)}{g_l(0)} \left( \alpha_l - \pi_l(s) \right) \left( 1 - 2 s_l \right).$$

Here, we drop the subscript $l_{ij}$ since the summation goes over all individuals. Substituting the payoff formula (2) into the above equation and using $s_l^2 = s_l$, we obtain that for any state $s$,

$$h(s) = \frac{1}{N} \sum_{l=1}^{N} \frac{g_l'(0)}{g_l(0)} \left[ \alpha_l (1 - 2 s_l) + \frac{1}{d_l} \sum_{k=1}^{N} w_{lk} \left( s_l \left( a s_k + b (1 - s_k) \right) - (1 - s_l) \left( c s_k + d (1 - s_k) \right) \right) \right]. \qquad (13)$$

Evaluating $\langle h(s) \rangle_0$ requires analyzing the correlation of strategies in the neutral stationary distribution. Under aspiration dynamics, when the selection intensity $\beta = 0$ (neutral drift), individuals' strategy updating does not depend on the aspiration levels, the payoffs, or the population structure. The transition probabilities between any two states are then the same in both directions, which means that the transition matrix is symmetric, i.e., $P_0 = P_0^T$. By the uniqueness of the stationary distribution $u_0$ and the property of the transition (stochastic) matrix $P_0 \mathbf{1} = \mathbf{1}$, we have $\mathbf{1}^T P_0 = \mathbf{1}^T$, which leads to $u_0 = 2^{-N} \mathbf{1}$ through normalization. As a result, in the neutral stationary distribution each individual plays strategy A with probability one half, i.e., $\langle s_l \rangle_0 = 1/2$ for all $l$. Moreover, since individuals' strategy updating is independent of each other, $\langle s_l s_k \rangle_0 = \langle s_l \rangle_0 \langle s_k \rangle_0 = 1/4$ for $l \ne k$. Based on these, the correlations of strategies in the neutral stationary distribution are

$$\langle s_l s_k \rangle_0 = \frac{1 + \delta_{lk}}{4}, \qquad (14)$$

where $\delta_{lk} = 0$ if $l \ne k$ and $\delta_{lk} = 1$ if $l = k$. Equation (14) indicates that aspiration dynamics do not lead to assortment of strategies in the neutral stationary distribution. Combining equations (11), (14), and (13), we obtain

$$\langle h(s) \rangle_0 = \frac{a + b - c - d}{4N} \sum_{l=1}^{N} \frac{g_l'(0)}{g_l(0)}. \qquad (15)$$

Since $g_l'(0) > 0$ and $g_l(0) > 0$ for all $l$, under weak selection strategy A is favored over B if and only if $a + b > c + d$. Moreover, the average abundance of strategy A in the stationary distribution can be obtained by combining equation (15) and the identity $\langle x_A + x_B \rangle_\beta = 1$, which leads to

$$\langle x_A \rangle_\beta = \frac{1}{2} + \frac{\beta}{2} \cdot \frac{a + b - c - d}{4N} \sum_{l=1}^{N} \frac{g_l'(0)}{g_l(0)} + O(\beta^2). \qquad (16)$$
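A minimal sketch confirming prediction (16) numerically on a hypothetical 4-node weighted graph (graph, payoffs, and aspiration levels below are assumptions for illustration), with the Fermi function $g(u) = 1/(1 + e^{-u})$, for which $g'(0)/g(0) = 1/2$:

import numpy as np

N = 4
W = np.array([[0, 1, 0.5, 0], [1, 0, 2, 0], [0.5, 2, 0, 1.5], [0, 0, 1.5, 0.0]])
deg = W.sum(axis=1)
a, b, c, d = 3.0, 1.0, 2.0, 4.0
alpha = np.array([1.0, 2.0, 0.5, 1.5])        # arbitrary aspiration levels
g = lambda u: 1.0 / (1.0 + np.exp(-u))
beta, M = 1e-3, 2 ** N
P = np.zeros((M, M))
for i in range(M):
    s = np.array([(i >> l) & 1 for l in range(N)], dtype=float)
    nA, nB = W @ s, W @ (1 - s)
    pi = (s * (a * nA + b * nB) + (1 - s) * (c * nA + d * nB)) / deg
    for l in range(N):
        P[i, i ^ (1 << l)] = g(beta * (alpha[l] - pi[l])) / N
    P[i, i] = 1.0 - P[i].sum()
vals, vecs = np.linalg.eig(P.T)
u = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]); u /= u.sum()
xA = np.array([bin(i).count("1") / N for i in range(M)])
lhs = u @ xA - 0.5                             # simulated <x_A> - 1/2
rhs = beta * (a + b - c - d) * N * 0.5 / (8 * N)  # prediction from (16)
print(lhs, rhs)                                # both approx -1.25e-4 for small beta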

Asymmetric aspirations
For aspirations contingent on the strategy in use, individuals playing strategy A have aspiration level $\alpha_A$ while those using B have $\alpha_B$. As shown in Supplementary Note 3.2.1, the coefficients $c$ in equation (8) are evaluated at $\beta = 0$; the asymmetry of aspirations thus does not affect $c$. However, $h(s)$ in Supplementary Note 3.2.2 now depends on both $\alpha_A$ and $\alpha_B$, and equation (13) is modified by replacing $\alpha_l$ with the strategy-dependent aspiration $\alpha_A s_l + \alpha_B (1 - s_l)$:

$$h(s) = \frac{1}{N} \sum_{l=1}^{N} \frac{g_l'(0)}{g_l(0)} \left( \alpha_A s_l + \alpha_B (1 - s_l) - \pi_l(s) \right) (1 - 2 s_l). \qquad (17)$$

Note that in the neutral stationary distribution, the strategy correlations $\langle s_l s_k \rangle_0$ are independent of the aspirations. Combining equations (11), (14), and (17), we obtain

$$\langle h(s) \rangle_0 = \frac{a + b - c - d + 2(\alpha_B - \alpha_A)}{4N} \sum_{l=1}^{N} \frac{g_l'(0)}{g_l(0)}. \qquad (18)$$

Therefore, in the limit of weak selection, the condition for strategy A to prevail over B under asymmetric aspirations is

$$a + b > c + d - 2(\alpha_B - \alpha_A). \qquad (19)$$

In the Prisoner's Dilemma game (i.e., $a < c$ and $b < d$), $a + b$ is always less than $c + d$. If the aspirations are symmetric, strategy A is therefore never favored. With asymmetric aspirations, if $\alpha_B - \alpha_A$ is large enough, it is possible that $a + b > c + d - 2(\alpha_B - \alpha_A)$, which selects strategy A over B. In terms of the evolution of cooperation, this means that cooperation can prevail over defection if individuals aspire to more when they defect.
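As a hypothetical numerical illustration (the values are not from the main text): take $a = 3$, $b = 0$, $c = 5$, $d = 1$, so $a + b = 3 < 6 = c + d$ and strategy A is disfavored under symmetric aspirations; with $\alpha_B - \alpha_A = 2$, the right-hand side of (19) becomes $c + d - 2(\alpha_B - \alpha_A) = 2 < 3 = a + b$, so strategy A is favored.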

Accumulated payoffs
Under accumulated payoffs, the payoff of individual $l$ at state $s$ is

$$\pi_l(s) = \sum_{k=1}^{N} w_{lk} \left[ s_l \left( a s_k + b (1 - s_k) \right) + (1 - s_l) \left( c s_k + d (1 - s_k) \right) \right],$$

i.e., formula (2) without the normalization by $d_l$. This leads to

$$h(s) = \frac{1}{N} \sum_{l=1}^{N} \frac{g_l'(0)}{g_l(0)} \left[ \alpha_l (1 - 2 s_l) + \sum_{k=1}^{N} w_{lk} \left( s_l \left( a s_k + b (1 - s_k) \right) - (1 - s_l) \left( c s_k + d (1 - s_k) \right) \right) \right].$$

Since the way payoffs are collected does not affect the correlation of strategies in the neutral stationary distribution, we obtain

$$\langle h(s) \rangle_0 = \frac{a + b - c - d}{4N} \sum_{l=1}^{N} \frac{g_l'(0)}{g_l(0)}\, d_l, \qquad (20)$$

and, since $d_l > 0$ for all $l$, the condition for strategy A to be favored over B is again $a + b > c + d$, and that for strategy B to be favored over A is $a + b < c + d$. This means that accumulated payoffs do not change the condition for strategy success under aspiration dynamics.
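In the payoff sketch given earlier, this variant amounts to dropping the division by the weighted degree (again an assumed illustration, with a hypothetical function name):

import numpy as np

def payoffs_accumulated(W, s, a, b, c, d):
    nA, nB = W @ s, W @ (1 - s)        # weighted numbers of A- and B-interactions
    return s * (a * nA + b * nB) + (1 - s) * (c * nA + d * nB)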

Supplementary Note 4: Pairwise comparison rules

The general formula
The pairwise comparison rule is an imitation-based update rule [3,4]. At each time step, a focal individual $l$ is selected uniformly at random. It then selects a role model $m$ among its neighbors with probability proportional to the edge weights and considers copying the strategy of individual $m$ based on the payoff difference $\Delta\pi$. When mutation (random strategy exploration) is included, individual $l$ uses this update rule with probability $1 - \mu$ ($0 < \mu < 1$) or switches to a random strategy with probability $\mu$. Here, we assume that all individuals in the population use the same imitation-based update function $g(u)$, where $u = \beta \Delta\pi$; such an update function satisfies the same restrictions as in Supplementary Note 2.4. For the pairwise comparison rule, the probability for individual $l$ to adopt the strategy of individual $m$ is thus $(1 - \mu)\, g(\beta \Delta\pi) + \mu/2$, where $\Delta\pi = \pi_m - \pi_l$. Define the degree-weighted frequency of strategy A at state $s$ as $\tilde{x}_A(s) = \sum_{l=1}^{N} d_l s_l / \Omega$, where $\Omega = \sum_{l=1}^{N} d_l$ [1]. In the neutral stationary distribution, the average degree-weighted abundance of strategy A equals that of B, i.e., $\langle \tilde{x}_A \rangle_0 = \langle \tilde{x}_B \rangle_0 = \frac{1}{2}$. In the limit of weak selection $\beta \to 0$, using equation (7), we have

$$\langle \tilde{x}_A - \tilde{x}_B \rangle_0' = u_0^T P_0' Z_0 \left( 2\tilde{x} - \mathbf{1} \right). \qquad (21)$$

In the following, we prove a Proposition that connects our approach to those in [1,2].

Proposition 1. With mutation rate $0 < \mu < 1$, under the neutral drift $\beta = 0$, $2\tilde{x} - \mathbf{1}$ is an eigenvector of $Z_0$ with eigenvalue $N/\mu$.

Proof. By definition, $Z_0 = (I - P_0 + W)^{-1}$. Hence, if $2\tilde{x} - \mathbf{1}$ is an eigenvector of $I - P_0 + W$ with eigenvalue $\mu/N$, then Proposition 1 is proved.
Here, we first calculate the expected one-step change of the degree-weighted frequency at the neutral drift (denoted by the subscript $\circ$):

$$\mathbb{E}\left[ \tilde{x}_A(S_{n+1}) - \tilde{x}_A(S_n) \mid S_n = s \right]_\circ = (1 - \mu)\, \Delta_\circ(s) + \frac{\mu}{N} \left( \frac{1}{2} - \tilde{x}_A(s) \right). \qquad (22)$$

The first term on the right-hand side of equation (22) is the expected instantaneous rate of degree-weighted frequency change [1], which equals zero when mutation is absent ($\Delta_\circ(s) = 0$). With mutations, the expected change is non-zero due to the contribution of mutations. We thus have

$$P_0 \tilde{x} = \tilde{x} + \frac{\mu}{N} \left( \frac{1}{2}\mathbf{1} - \tilde{x} \right). \qquad (23)$$

Combining equations (22) and (23), we get

$$P_0 \left( 2\tilde{x} - \mathbf{1} \right) = \left( 1 - \frac{\mu}{N} \right) \left( 2\tilde{x} - \mathbf{1} \right). \qquad (24)$$

This leads to

$$(I - P_0 + W)\left( 2\tilde{x} - \mathbf{1} \right) = \frac{\mu}{N} \left( 2\tilde{x} - \mathbf{1} \right) + \mathbf{1}\, u_0^T \left( 2\tilde{x} - \mathbf{1} \right) = \frac{\mu}{N} \left( 2\tilde{x} - \mathbf{1} \right), \qquad (25)$$

since $u_0^T (2\tilde{x} - \mathbf{1}) = 2\langle \tilde{x}_A \rangle_0 - 1 = 0$, which proves Proposition 1. By Proposition 1, equation (21) becomes

$$\langle \tilde{x}_A - \tilde{x}_B \rangle_0' = \frac{N}{\mu}\, u_0^T P_0' \left( 2\tilde{x} - \mathbf{1} \right). \qquad (26)$$

Note that the $i$-th entry of $P_0' \tilde{x}$ is

$$\sum_{j=1}^{M} p_{ij}' \, \tilde{x}_A(s_j) = \tilde{\Delta}_{\mathrm{sel}}(s_i), \qquad (27)$$

where $\tilde{\Delta}_{\mathrm{sel}}(s_i)$ is the first-order coefficient of the reproductive-value-weighted change due to selection [1,2]. Since $P_\beta \mathbf{1} = \mathbf{1}$ for all $\beta$ implies $P_0' \mathbf{1} = 0$, equation (26) can be rewritten as

$$\langle \tilde{x}_A - \tilde{x}_B \rangle_0' = \frac{2N}{\mu} \left\langle \tilde{\Delta}_{\mathrm{sel}} \right\rangle_0. \qquad (28)$$

Therefore, $\langle \tilde{x}_A \rangle > \langle \tilde{x}_B \rangle$ if and only if $\langle \tilde{\Delta}_{\mathrm{sel}} \rangle_0 > 0$. Up to now, we have demonstrated that our approach is equivalent to that in [1,2] for homogeneous imitation-based update rules. In what follows, we use their results to derive the condition for strategy success under pairwise comparison rules in the limit of rare mutation ($\mu \to 0$) and weak selection ($\beta \to 0$).
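A minimal sketch verifying Proposition 1 numerically on the hypothetical 4-node graph used earlier, building the neutral ($\beta = 0$) pairwise-comparison transition matrix with mutation and checking $Z_0 (2\tilde{x} - \mathbf{1}) = (N/\mu)(2\tilde{x} - \mathbf{1})$; the parameter values are assumptions of this sketch:

import numpy as np

N, mu, g0 = 4, 0.1, 0.5                         # g0 = g(0) for the shared update function
W = np.array([[0, 1, 0.5, 0], [1, 0, 2, 0], [0.5, 2, 0, 1.5], [0, 0, 1.5, 0.0]])
deg = W.sum(axis=1); V = W / deg[:, None]; Omega = deg.sum()
M = 2 ** N
P0 = np.zeros((M, M))
for i in range(M):
    s = np.array([(i >> l) & 1 for l in range(N)], dtype=float)
    for l in range(N):
        # probability that the focal individual l plays A after one step, at beta = 0
        pA = (1 - mu) * ((1 - g0) * s[l] + g0 * (V[l] @ s)) + mu / 2
        P0[i, i ^ (1 << l)] = (pA if s[l] == 0 else 1 - pA) / N
    P0[i, i] = 1.0 - P0[i].sum()
vals, vecs = np.linalg.eig(P0.T)
u0 = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]); u0 /= u0.sum()
Z0 = np.linalg.inv(np.eye(M) - P0 + np.outer(np.ones(M), u0))
xt = np.array([[(i >> l) & 1 for l in range(N)] for i in range(M)]) @ deg / Omega
print(np.allclose(Z0 @ (2 * xt - 1), (N / mu) * (2 * xt - 1)))  # True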
By Corollary 1 in [2], under rare mutation $\mu \to 0$,

$$\left\langle \tilde{\Delta}_{\mathrm{sel}} \right\rangle_0 = \mu \left. \frac{\mathrm{d}}{\mathrm{d}\mu} \left\langle \tilde{\Delta}_{\mathrm{sel}} \right\rangle_0 \right|_{\mu=0} + O(\mu^2). \qquad (29)$$

Here, for the pairwise comparison rule,

$$\tilde{\Delta}_{\mathrm{sel}}(s) = \frac{(1 - \mu)\, g'(0)}{N \Omega} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} \left( x_j - x_i \right) \left( \pi_j - \pi_i \right), \qquad (30)$$

where $x_i = s_i$ denotes the strategy of individual $i$. Inserting equation (2) with the donation-game payoffs $a = B - C$, $b = -C$, $c = B$, $d = 0$ (strategy A pays a cost $C$ to deliver a benefit $B$ to the co-player), the payoff simplifies to $\pi_i = -C x_i + \frac{B}{d_i} \sum_{k=1}^{N} w_{ik} x_k$, and $\langle \tilde{\Delta}_{\mathrm{sel}} \rangle_0$ becomes a linear combination of the pairwise correlations $\langle x_i x_j \rangle_0$. At the neutral stationary distribution, evaluating the assortment of types $\langle x_i x_j \rangle_0$ requires the identity-by-descent probabilities $q_{ij}$. Following [1], the relation reads $\langle x_i x_j \rangle_0 = (1 + q_{ij})/4$. As a special case, when $i = j$, $\langle x_i^2 \rangle_0 = \langle x_i \rangle_0 = 1/2$ since $q_{ii} = 1$ for all $i$. Furthermore, under rare mutation $\mu \to 0$, the identity-by-descent probabilities $q_{ij}$ have the low-mutation expansion $q_{ij} = 1 - \mu \tau_{ij} + O(\mu^2)$ [15,1], which connects $\langle x_i x_j \rangle_0$ to the coalescence times $\tau_{ij}$ by

$$\langle x_i x_j \rangle_0 = \frac{2 - \mu \tau_{ij}}{4} + O(\mu^2). \qquad (31)$$

To calculate $\tau_{ij}$, we first introduce the rescaled quantity $\tilde{\tau}_{ij} = g(0)\, \tau_{ij}$. Analogous to other imitation-based update rules, for pairwise comparison rules the $\tilde{\tau}_{ij}$ satisfy the recurrence relations

$$\tilde{\tau}_{ij} = \begin{cases} 0, & i = j, \\[1mm] 1 + \dfrac{1}{2} \displaystyle\sum_{k=1}^{N} \left( v_{ik} \tilde{\tau}_{kj} + v_{jk} \tilde{\tau}_{ik} \right), & i \ne j, \end{cases} \qquad (32)$$

where $v_{ij} = w_{ij}/d_i$ is the probability to step from node $i$ to node $j$ in a one-step random walk. Note that the form of the above relations is exactly the same as that for the death-birth update rule [1]. Therefore, at the neutral drift, the rescaled coalescence times $\tilde{\tau}_{ij}$ under the pairwise comparison rule equal the coalescence times $\tau_{ij}$ under the death-birth update rule. Inserting relation (31) into equation (30), we get

$$\left. \frac{\mathrm{d}}{\mathrm{d}\mu} \left\langle \tilde{\Delta}_{\mathrm{sel}} \right\rangle_0 \right|_{\mu=0} = \frac{g'(0)}{2 N \Omega} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} \left[ -C\, \tau_{ij} + \frac{B}{2} \sum_{k=1}^{N} \left( v_{jk} - v_{ik} \right) \left( \tau_{ik} - \tau_{jk} \right) \right]. \qquad (33)$$

Since $g'(0) > 0$, using equation (33) we have that $\left. \frac{\mathrm{d}}{\mathrm{d}\mu} \langle \tilde{\Delta}_{\mathrm{sel}} \rangle_0 \right|_{\mu=0} > 0$ is equivalent to

$$\sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} \left[ -C\, \tau_{ij} + \frac{B}{2} \sum_{k=1}^{N} \left( v_{jk} - v_{ik} \right) \left( \tau_{ik} - \tau_{jk} \right) \right] > 0. \qquad (34)$$

To simplify this inequality, we denote the probability of an $n$-step random walk from node $i$ to node $j$ as $v_{ij}^{(n)}$. By introducing the quantity $\tilde{\tau}^{(n)} = \sum_{k=1}^{N} \sum_{l=1}^{N} \frac{d_k}{\Omega}\, v_{kl}^{(n)}\, \tilde{\tau}_{kl}$ and multiplying both sides of inequality (34) by $g(0)$, we have that $\langle \tilde{x}_A \rangle > \langle \tilde{x}_B \rangle$ is equivalent to

$$-C\, \tilde{\tau}^{(1)} + B \left( \tilde{\tau}^{(2)} - \tilde{\tau}^{(1)} \right) > 0. \qquad (35)$$

For graphs without self-loops, $\tilde{\tau}^{(2)} - \tilde{\tau}^{(1)} = -1$ [1]. Therefore, under rare mutation and weak selection, strategy A outcompetes B in abundance (i.e., $\langle \tilde{x}_A \rangle > \langle \tilde{x}_B \rangle$) if and only if

$$\frac{B}{C} < -\tilde{\tau}^{(1)}. \qquad (36)$$

Note that $\tilde{\tau}^{(1)} = \frac{g(0)}{\Omega} \sum_{k=1}^{N} \sum_{l=1}^{N} w_{kl} \tau_{kl} > 0$. This indicates that under pairwise comparison rules, cooperation can never evolve on any weighted network. The reason is that for strategy A to be a cooperative behavior, the benefit $B$ must exceed the cost $C$ (i.e., $B > C > 0$); but then $B/C > 0 > -\tilde{\tau}^{(1)}$, so cooperation is never favored by natural selection according to condition (36).
In addition, condition (36) implies that the critical benefit-to-cost ratio is $(B/C)^* = -\tilde{\tau}^{(1)}$. Using the Structure Coefficient Theorem [8], for a general game with payoff values $a, b, c, d$, $\langle x_A \rangle > \langle x_B \rangle$ if and only if $\sigma a + b > c + \sigma d$, where $\sigma = \left( (B/C)^* + 1 \right) / \left( (B/C)^* - 1 \right)$. Hence, for the pairwise comparison rule,

$$\sigma = \frac{\tilde{\tau}^{(1)} - 1}{\tilde{\tau}^{(1)} + 1}. \qquad (37)$$
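A minimal sketch solving the recurrence (32) by linear algebra and evaluating $\tilde{\tau}^{(1)}$ and $\sigma$ via (37); the function name is an assumption, and the printed values reproduce the star-graph and regular-graph results derived below:

import numpy as np

def sigma_pairwise(W):
    N = W.shape[0]
    V = W / W.sum(axis=1, keepdims=True)           # one-step random walk v_ij = w_ij / d_i
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
    idx = {p: k for k, p in enumerate(pairs)}
    A = np.eye(len(pairs)); rhs = np.ones(len(pairs))
    for (i, j), row in idx.items():                # build linear system from (32)
        for k in range(N):
            if k != j:
                A[row, idx[tuple(sorted((k, j)))]] -= V[i, k] / 2
            if k != i:
                A[row, idx[tuple(sorted((k, i)))]] -= V[j, k] / 2
    tau = np.linalg.solve(A, rhs)
    Tau = np.zeros((N, N))
    for (i, j), k in idx.items():
        Tau[i, j] = Tau[j, i] = tau[k]
    pi_stat = W.sum(axis=1) / W.sum()              # stationary random-walk weights d_k / Omega
    tau1 = pi_stat @ (V * Tau).sum(axis=1)         # tau^(1)
    return (tau1 - 1) / (tau1 + 1)                 # sigma from (37)

N = 5
star = np.zeros((N, N)); star[0, 1:] = star[1:, 0] = 1.0
print(sigma_pairwise(star))                        # 0.375 = (N-2)/(2(N-1))
complete = np.ones((N, N)) - np.eye(N)             # a regular graph
print(sigma_pairwise(complete))                    # 0.6   = (N-2)/N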

σ for regular graphs
To calculate $\sigma$ for regular graphs, we use the results from Section 5.3 of the Supplementary Information of [1], which imply that $\tilde{\tau}^{(1)} = N - 1$. Substituting this into equation (37), we have that under pairwise comparison rules,

$$\sigma = \frac{N - 2}{N}$$

for regular graphs.

σ for star graphs
For star graphs, the node in the center is called the hub (H) and those connected to the hub are called the leaves (L). By symmetry, the rescaled coalescence times between any two (distinct) leaves are the same, denoted $\tilde{\tau}_{LL}$; similarly, the rescaled coalescence time between the hub and a leaf is denoted $\tilde{\tau}_{HL}$. According to equations (32), we get

$$\tilde{\tau}_{LL} = 1 + \tilde{\tau}_{HL}, \qquad \tilde{\tau}_{HL} = 1 + \frac{N - 2}{2(N - 1)}\, \tilde{\tau}_{LL}. \qquad (38)$$

Solving the above equations, we have $\tilde{\tau}_{LL} = 4(N - 1)/N$ and $\tilde{\tau}_{HL} = (3N - 4)/N$, which leads to $\tilde{\tau}^{(1)} = (3N - 4)/N$. Based on these, we have that under pairwise comparison rules,

$$\sigma = \frac{N - 2}{2(N - 1)}$$

for star graphs.

In Figure 2 of the main text, we construct weighted networks by first generating an undirected network and then assigning weights to edges. In the figure, three kinds of unweighted networks (random, regular, and scale-free networks) and three types of edge-weight distributions (homogeneous, uniform, and power-law distributions) are considered (see Figure 2 of the main text for the definitions of the different edge-weight distributions), giving nine kinds of weighted networks in total. Each panel shows the simulation results for one kind of weighted network: a, random networks with homogeneous weights; b, random networks with uniform weights; c, random networks with power-law weights; d, regular networks with homogeneous weights; e, regular networks with uniform weights; f, regular networks with power-law weights; g, scale-free networks with homogeneous weights; h, scale-free networks with uniform weights; and i, scale-free networks with power-law weights. In each panel, the color indicates the average abundance of strategy A, and the dashed line the estimated critical line of the form $1 + k_S S - k_T T = 0$, which corresponds to the condition for strategy success (inequality (3) in the main text). The parameters $k_S$ and $k_T$ are normalized regression coefficients and the $R^2$ score is the R-squared value; they are estimated by linear regression with $R, S, T, P$ as independent variables and the average abundance of strategy A minus one half, divided by the selection intensity, as the dependent variable (see the similar methods in Section "Estimating the coefficients via simulation" of the Supplementary Methods in [5] for detailed explanations). Other parameters and simulation settings are the same as those in Figure 2 of the main text.

To better illustrate the critical value of $S$, above which $\langle x_A \rangle > \langle x_B \rangle$ and below which $\langle x_A \rangle < \langle x_B \rangle$, we fix the value of $T$ and leave $S$ as a tunable parameter. Three values of $T$ are considered: $T = 0.6$ (panels (a) and (b)), $T = 1.2$ (panels (c) and (d)), and $T = 1.8$ (panels (e) and (f)). In each panel, we run simulations for five sets of update functions, one homogeneous and the other four heterogeneous. For the homogeneous case, we set the update function of all individuals to $g_1(u) = 1/(1 + \exp(-u))$. For the heterogeneous cases, we construct update functions by taking convex combinations of three base update functions, $g_1(u) = 1/(1 + \exp(-u))$, $g_2(u) = 1/(1 + 10\exp(-u))$, and $g_3(u) = 10/(10 + \exp(-u))$. Update functions are thus of the form $\gamma_1 g_1 + \gamma_2 g_2 + \gamma_3 g_3$, where $\gamma_1, \gamma_2, \gamma_3 \in [0, 1]$ and $\gamma_1 + \gamma_2 + \gamma_3 = 1$. Under these constraints, we randomly sample $\gamma_1, \gamma_2, \gamma_3$ and use them to parameterize the update function of each individual. In the simulations, four sets of heterogeneous update functions are considered, denoted Heter (set 1), Heter (set 2), Heter (set 3), and Heter (set 4) in the panels.
In all the panels, symbols represent the simulation results and red dashed lines the critical value of $S$, $S^*$, determined by the condition for strategy success: $S^* = T - 1$ for aspiration-based update rules with both homogeneous and heterogeneous update functions (see inequality (3) in the main text for reference); $S^* = T - \tilde{\tau}$