Rethinking network reciprocity over social ties: local interactions make direct reciprocity possible and pave the rational way to cooperation

Since Nowak and May's (1992) influential paper, network reciprocity--the fact that individuals' interactions repeated within a local neighborhood support the evolution of cooperation--has been confirmed in several theoretical models. Essentially, local interactions allow cooperators to stay protected from exploiters by assorting into clusters, and the heterogeneity of the network of contacts--the co-presence of low- and high-connected nodes--has been shown to further favor cooperation. The few available large-scale experiments on humans have however missed these effects. The reason is that, while models assume that individuals update strategy by imitating better performing neighbors, experiments showed that humans are more prone to reciprocate cooperation than to compare payoffs. Inspired by the empirical results, we rethink network reciprocity as a rational form of direct reciprocity on networks--networked rational reciprocity--indeed made possible by the locality of interactions. We show that reciprocal altruism in a networked prisoner's dilemma can invade and fixate in any network of rational agents, profit-maximizing over a horizon of future interactions. We find that networked rational reciprocity works better at low average connectivity and we unveil the role of network heterogeneity. The invasion of cooperation is boosted only if cooperating hubs invest in the initial cost of exploitation; it is otherwise hindered. Although humans might not be as rational as assumed here, our results could help the design and interpretation of new experiments in social and economic networks.


Introduction
Cooperation among rational (self-interested) agents is a longstanding and still debated puzzle in biology and the social sciences, with countless contributions since Axelrod, Hamilton, and Trivers' seminal works, 1, 2 and the topic has recently received attention in several fields of engineering as well. 3 The standard modeling framework is evolutionary game theory (EGT), 4 in which a non-cooperative game (where any altruistic act is self-enforcing) describes a repeated interaction among pairs (or larger groups) of individuals, a given set of strategies is confronted, and an evolutionary process links the obtained payoffs to reproduction and death in biology or to strategy update in socio-economic systems. The paradigmatic game used to study the evolution of cooperation is the prisoner's dilemma (PD), the two-player, two-option interaction in which a cooperator (option C) provides a benefit b to the opponent at a cost c < b to herself, whereas a defector (option D) provides no benefit at no cost. The benefit-to-cost ratio, or game return r = b/c, is often used to parameterize the game (taking c = 1 as monetary unit).
To test whether a cooperative strategy (strategy C) has any chance to evolve, it is confronted with the benchmark strategy 'unconditional defection' (strategy D, played by individuals who always defect) under one or a few evolutionary processes. The three issues to be discussed are the invasion of the strategy (the spreading of cooperators, or C-strategists, in a population dominated by defectors, or D-strategists), its persistence (the long-term presence, fluctuating or not, of C's), and its fixation (the convergence to the state all-C). For example, it is well known that when the PD is played in large and well-mixed (unstructured) populations, there is no hope for the strategy 'unconditional cooperation' (played by individuals who always cooperate). Defecting gives the largest payoff regardless of what the others are doing, so that, without any specific incentive to cooperation, C's cannot invade under any reasonable evolutionary process and disappear if initially present in the population. Compared to other social dilemmas, the PD is the worst-case scenario for the evolution of cooperation.
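The dominance of defection in a single PD interaction can be checked with a few lines of Python. This is only an illustrative sketch of the payoff structure defined above (benefit b = r, cost c = 1); the function names `pd_payoff` and `defection_gain` are ours and do not come from the paper.

```python
def pd_payoff(my_move: str, other_move: str, r: float) -> float:
    """Payoff of one PD interaction, with cost c = 1 and benefit b = r."""
    benefit = r if other_move == "C" else 0.0  # received only from a cooperating opponent
    cost = 1.0 if my_move == "C" else 0.0      # paid only when I cooperate
    return benefit - cost


def defection_gain(other_move: str, r: float) -> float:
    """Payoff advantage of playing D instead of C against a fixed opponent move."""
    return pd_payoff("D", other_move, r) - pd_payoff("C", other_move, r)


if __name__ == "__main__":
    r = 3.0  # illustrative game return (assumption)
    for other in ("C", "D"):
        print(other, pd_payoff("C", other, r), pd_payoff("D", other, r),
              defection_gain(other, r))
```

Whatever the opponent does, defecting saves the unit cost, so the gain from defecting equals c = 1 for any r; this is why unconditional cooperation cannot survive in a well-mixed population.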
Traditional incentivizing mechanisms 5 either make cooperation conditional, such as reciprocal altruism 1, 2 (also known as direct reciprocity), the establishment of reputations 6 (also known as indirect reciprocity), and mechanisms of kin 7 or group selection 8 (or other forms of assortment 9, 10 ), or change the rules of the game, for example by introducing volunteering (optional participation) 11 and punishment of antisocial behaviors. 12,13 All these mechanisms add degrees of strategical complexity, in terms of players' cognitive abilities and/or information flows, or due to extra options in the underlying game.
Starting with Nowak and coauthors' influential papers, 14-16 the fact that interactions in real populations are structured according to the individuals' personal contacts has been proposed as the simplest mechanism-requiring no strategical

Model description
Before presenting results, we introduce our model (implementation details are given in the Methods section; the elements of novelty are commented in the Discussion). We consider two strategies, a reciprocating form of conditional cooperation (strategy C) and unconditional defection (strategy D). At each game round, each individual is a C- or a D-strategist, the strategy representing the individual's mood. A PD is played by all pairs of connected individuals. The individual payoff in the round is the sum of the outcomes of the PD interactions in the neighborhood. After each round, each individual revises her strategy with a probability δ assumed small and uniform across the population; parameter δ measures the rate (per game round) of (asynchronous) strategy update, and 1/δ is the average number of rounds between two consecutive revisions by the same individual, a sort of 'inertia' to change.
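The statement that 1/δ is the average number of rounds between two revisions follows from the waiting time being geometrically distributed. The short simulation below is a sketch under the assumption that revisions are independent draws with probability δ after each round; the names `rounds_until_revision` and `mean_waiting_time` are illustrative and not taken from the paper.

```python
import random


def rounds_until_revision(delta: float, rng: random.Random) -> int:
    """Number of game rounds played until the individual revises her strategy."""
    rounds = 1
    while rng.random() >= delta:  # no revision after this round, play another one
        rounds += 1
    return rounds


def mean_waiting_time(delta: float, samples: int, seed: int = 0) -> float:
    """Empirical average number of rounds between two consecutive revisions."""
    rng = random.Random(seed)
    total = sum(rounds_until_revision(delta, rng) for _ in range(samples))
    return total / samples


if __name__ == "__main__":
    delta = 0.05  # illustrative value (assumption)
    print(mean_waiting_time(delta, 20000), 1.0 / delta)
```

For small δ the empirical mean should lie close to 1/δ, which is the 'inertia' to change referred to above.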
In each PD interaction, a C can selectively opt for cooperation or abstention; specifically, she stops playing with an exploiting neighbor for a number of rounds drawn in accordance with the probability that the exploiter revises her strategy just after. D's always play and defect. Our conditional C-strategy implements a form of direct reciprocity. As in the famous tit-for-tat strategy, C's cooperate with neighbors who showed cooperation in the previous round. However, they do not retaliate for defection; rather, they abstain from playing, communicating their mood to the opponent; moreover, they forgive defection or, better, they poll previous exploiters to seek cooperation. To modulate reciprocity in the population, the length of abstention periods is drawn according to a reciprocity-biased rate of strategy update
from complete) network, unconditional C's can do better than unconditional D's by grouping into clusters, i.e., the network's structure allows C's to reciprocate (Fig. 1). More specifically, in large regular networks (each node having degree k, i.e., k neighbors) driven by an imitation-like evolutionary process (individuals imitate from time to time a better performing neighbor; SI note 2) and in the limit of weak selection (the game payoff marginally impacting the individual performance; SI note 3), unconditional C's can invade and fixate under a simple condition: 16 the game return r must exceed the connectivity k. That is, if r > k in the above setting, the probability that cooperation invades and fixates starting from a single C placed in a random position (the fixation probability) is larger than 1/N, the fixation probability under a totally random process of strategy update, N being the network's size. The rule has also been generalized to non-regular networks, 17, 18 essentially requiring r to exceed the average degree $\langle k \rangle$.
As several authors did in the last decade, 19-23 we question network reciprocity as a mechanism supporting cooperation in socio-economic networks. In particular, we question the rationale behind the imitation process of strategy update. Why should we copy a better performing neighbor whose neighborhood might be considerably different, in size as well as in composition, from ours? Especially in heterogeneous networks, imitation may turn counterproductive (see Fig. 1, where individual j reduces her payoff by copying i). Recent experiments on (impressively large) human networks playing a PD 19,20 indeed showed that we are more prone to reciprocate cooperation than to compare payoffs. Specifically, the probability to cooperate (as a rule of the experiment, the same game option, C or D, is taken toward all neighbors) is conditioned by the player's previous choice, i.e., by the player's cooperative or defective 'mood.' 21,23 In the C mood, a player is the more willing to cooperate the more cooperation she observed in the previous game round, even though the neighbors' previous payoffs were made available; in the D mood, defection is rather unconditional (or weakly correlated with the previous level of cooperation). Nonetheless, the simple rule of network reciprocity, $r > \langle k \rangle$, found empirical validation. 24,25
Motivated by the empirical findings, we rethink the role of the population structure. In the C mood, players showed a form of direct reciprocity, 1, 2 i.e., a higher chance to cooperate with neighbors who showed altruism in past interactions. Like all forms of reciprocity, it requires repeated interactions with the same individuals as well as cognitive abilities to recognize individuals and remember past interactions. This is exactly the environment provided by a static and sparse network. In particular, it is the locality of interactions that makes cognitive tasks possible, as the required resources (in terms of memory and abilities) scale with the player's neighborhood.
At the same time, a local interaction opens the way to more complex rules of strategy update, based not only on past interactions but also on foreseeing future ones. We theoretically test reciprocal cooperation against the benchmark 'unconditional defection' under a rational process of strategy update, in which individuals rely on a model prediction of their future income. Reciprocity is implemented by allowing C's to selectively abstain from playing for a few game rounds with exploiting neighbors. 11,26,27 In so doing, C's reduce exploitation risks (with respect to unconditional C's) and, at the same time, communicate their mood, thus increasing their chances to reciprocate cooperation in future interactions. We name this new mechanism for the social evolution of cooperation networked rational reciprocity.
Our main result is that, under a rule qualitatively similar to $r > \langle k \rangle$, networked rational reciprocity grants the fixation of cooperation starting from any cluster of two C's. The more local the interactions are (the sparser the network), the lower the required return r. And even starting from a single C, the fixation probability remains high, the more so the more connected the initial C is, highlighting the role of the network's structure. We hence rediscover the simple rule of network reciprocity, but we provide a different underlying explanation, more in line with the observed social behavior.
Figure 1. Network reciprocity in homogeneous and heterogeneous networks. (a) A cluster of 4 C's surrounded by D's in a square lattice with periodic boundary conditions. The payoff (per game round, obtained by each individual by playing a PD with all neighbors and summing up outcomes) of C's is 2r − 4; that of the D's at the border of the cluster is r. C's do better than D's if r > 4 and the payoff of a D copying a C (e.g., j copying i) drops to r − 4. (b) A cluster of $k_i$ C's protected from $k_j$ D's ($k_i, k_j > 2$; $k_i$ and $k_j$ are the degrees of nodes i and j). The payoff of the C-individual i is $\pi_i = r(k_i - 1) - k_i$; that of the D-individual j is $\pi_j = r$; $\pi_i > \pi_j$ if $r > 1 + 2/(k_i - 2)$ and j drastically reduces her payoff when copying i. Under imitation update and $\pi_i > \pi_j$, j alternates between C (when selecting i for comparison) and D (when selecting a D-neighbor); the other nodes do not change strategy.
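The payoff bookkeeping in the caption of Fig. 1(a) can be reproduced with a small script. The sketch below assumes unconditional C's and D's on a 6 × 6 square lattice with periodic boundaries; the lattice size, the coordinates of the cluster, and the helper names are illustrative choices of ours.

```python
from itertools import product


def torus_neighbors(node, size):
    """Four nearest neighbors on a square lattice with periodic boundaries."""
    x, y = node
    return [((x + 1) % size, y), ((x - 1) % size, y),
            (x, (y + 1) % size), (x, (y - 1) % size)]


def payoff(node, strategy, size, r):
    """Per-round payoff: r from each C-neighbor, minus 1 per neighbor if node is C."""
    neigh = torus_neighbors(node, size)
    received = r * sum(strategy[n] == "C" for n in neigh)
    paid = len(neigh) if strategy[node] == "C" else 0
    return received - paid


def cluster_example(size=6):
    """All-D lattice with a 2 x 2 cluster of unconditional C's."""
    strategy = {node: "D" for node in product(range(size), repeat=2)}
    for node in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        strategy[node] = "C"
    return strategy


if __name__ == "__main__":
    size, r = 6, 5.0
    s = cluster_example(size)
    print("C in cluster:", payoff((0, 0), s, size, r))        # 2r - 4
    print("D at border:", payoff((2, 0), s, size, r))         # r
    s[(2, 0)] = "C"                                           # border D copies a C
    print("after copying:", payoff((2, 0), s, size, r))       # r - 4
```

With r = 5 the clustered C earns more than the bordering D, yet the D that copies her ends up with r − 4, which is the counterproductive side of imitation discussed in the text.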

where parameter ε, also assumed small and uniform across the population, measures reciprocity; super/sub reciprocating C's (ε ≷ 0) wait longer/shorter, on average, than normally reciprocating ones (ε = 0) before going back to play. When revising strategy, an individual computes her expected accumulated payoff behaving as C or D during a horizon of h future interactions and decides for the more profitable strategy until the next update. The prediction is based on the model society described here, which is assumed to be public knowledge. We consider the minimal-information scenario, in which individuals have no access to neighbors' payoffs and connectivity and infer the neighbors' strategies only from past interactions. Consequently, the model prediction cannot account for the concomitant changes in the neighbors' strategies. This limits the horizon to a few rounds under a relatively slow strategy update (small δ). Because of the short horizon, no discount of future incomes is adopted.
The model parameters and their numerical values used in the analysis are summarized in Table S3.

Analytical results
To gain insight into the system's dynamics, we preliminarily consider an infinite predictive horizon, because it allows a simpler analysis. We prove (in SI Sects. S1-S6) that when a C-player with degree k and $k_C$ C-neighbors (known from past interactions) revises her strategy according to an infinite horizon, she remains C (no expected gain in changing to D) if condition (2) holds, where $P^\infty_{CD}$ is the probability (computed in Sect. S2) that a C-player who remains C forever will get exploited by a D-neighbor who remains D forever in a far-future interaction. Similarly, under the same condition (2), the D-to-C strategy change occurs (positive expected gain in changing to C) according to an infinite predictive horizon.
With a finite horizon of h ≥ 1 future interactions, the conditions governing strategy update are more complex (see Sects. S4 and S5, where the expected gains $\Delta\pi^h_C$ and $\Delta\pi^h_D$ respectively predicted by a C and a D for a strategy change are computed). Given a C and a D with identical neighborhoods, the r-threshold for C to remain C and that for D to change to C are different. Typically, the former is lower, as the C-neighbors of a C are more prone to play in the near future (see Sect. S8). Not surprisingly, for h = 1 (best-response update 28 ) we have $\Delta\pi^1_C > 0$ and $\Delta\pi^1_D < 0$, i.e., defecting assures the highest payoff independently of the network's structure. Moreover, under a condition on r more restrictive than (2) (derived in Sect. S7), the predicted payoff gains $\Delta\pi^h_C$ and $\Delta\pi^h_D$ are monotonic in h, respectively decreasing and increasing to the negative and positive infinite-horizon limits. For intermediate r, $\Delta\pi^h_C$ (resp. $\Delta\pi^h_D$) first increases (decreases) with h up to a positive (negative) extremum, then decreases (increases) to the negative (positive) infinite-horizon limit. Finally, we show (Sect. S7) that the effect on $\Delta\pi^h_C$ (resp. $\Delta\pi^h_D$) of adding one prediction step can be made arbitrarily negative (positive) by a sufficiently large r.
Despite the system's complexity, the above results have straightforward consequences.

(i) The state all-C is invariant under a weak requirement on r (condition (2) with $k_C = k$; recall that ε and δ are small).
(ii) The state all-D is invariant (condition (2) with $k_C = 0$), though a coordinated switch to C by a small cluster of players can give a payoff gain.
(iii) Isolated C's ($k_C = 0$) change to D as soon as they revise their strategy. This does not mean that cooperation cannot start from isolated individuals, since D-neighbors could change to C before the isolated C changes to D.
(iv) Indeed, a D player connected to a single C ($k_C = 1$) can change to C, provided the game return r is sufficiently large.
(vi) A multi-step predictive horizon (h ≥ 2) is essential for cooperation (see the above discussion on $\Delta\pi^1_C$ and $\Delta\pi^1_D$).
(vii) The inertia to change also helps cooperation, in the sense that a lower rate δ of strategy update results in longer abstentions by C's and hence in a lower $P^\infty_{CD}$ in (2).
(viii) Main result: with both reciprocity and multi-step horizon, given all other details, there is a threshold $r^h_C$ on r above which cooperation fixates starting from any cluster of (at least two) C's. An upper bound to $r^h_C$ is obtained by considering the most connected node with one C-neighbor (Sect. S8).
(ix) The previous result probabilistically holds also starting from an isolated C, provided her degree k is not too small. With $r > r^h_C$, the probability that a D-neighbor changes to C before the isolated C changes to D goes as 1 − 1/(k + 1) for small δ (Sect. S9).
(x) Increasing the horizon h always reduces $r^h_C$ (Sect. S8).
(xi) According to condition (2), cooperation seems to be favored in homogeneous sparse networks (low largest degree) compared to heterogeneous and/or dense ones. This is evident at low levels of cooperation, at which adding links to a node is not likely to increase the number of its C-neighbors, thus increasing the ratio $k/k_C$.
(xii) In the complete network, to be used as benchmark since direct reciprocity is unfeasible there, the threshold $r^h_C$ is maximal.
(xiii) The role of the network's structure: the comment at point (ix) and two simple examples in Fig. 2 suggest that degree heterogeneity helps cooperation only if C's initially occupy the network's hubs. Essentially, low-degree D's connected to a C change to C under a mild requirement on r (low ratio $k/k_C$ in (2)). However, to have many low-degree D's connected to few initial C's, we need high-connected C's. Hence, especially starting at low levels of cooperation, degree heterogeneity and the placement of the initial C's in the network's hubs together reduce the required r to evolve to all-C. Compared to imitation update (in which C-hubs need a significant fraction of C-neighbors to persist), our rational process of strategy update allows the formation of C-clusters even starting from isolated C-hubs, who pay (or better invest in) the initial cost of exploitation. If however most of the hubs are D's, network heterogeneity turns harmful to cooperation (see point (xi)).
Figure 2. Networked rational reciprocity in homogeneous and heterogeneous networks. Consider (a) the ring (k = 2 for all nodes) and (b) the star network ($\langle k \rangle \approx 2$ for large N) in the infinite-horizon limit. The ring-$r^\infty_C$ is given by (2) with $k/k_C = 2$ and a single D drives the population to all-D if $r < r^\infty_C$. The star-$r^\infty_C$ is much higher, because the ratio $k/k_C$ peaks at N − 1 for the central node with only one C-neighbor. However, if the central node is a C, the D-leaves will change to C under the weakest requirement on r ($k/k_C = 1$ in (2)) and there are high chances that this occurs before the central node revises her strategy. (The probability that a D-leaf revises before the central node is given by the formula at point (ix), in which k is replaced by the number of D-leaves.) If the number $k_C$ of C-leaves rises to satisfy condition (2), the population then evolves to all-C, and this occurs with probability higher than 1/2 starting with no C-leaves and r equal to the ring-$r^\infty_C$ (the probability goes as 1/2 + 1/(2N) for small δ, see Sect. S9). That is, on average, the isolated central C drives the star to all-C under a milder condition on r w.r.t. the ring with some initial D's.
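The small-δ behavior quoted at point (ix), 1 − 1/(k + 1), can be recovered from an elementary race between independent revisions. The sketch below is our own back-of-the-envelope calculation, not the derivation of Sect. S9: it assumes that each of the k D-neighbors and the isolated C revise independently with probability δ after every round, that a revising D-neighbor does change to C (as holds for $r > r^h_C$), and that a revision in the same round as the C does not count as "before". The function names are illustrative.

```python
def prob_neighbor_revises_first(k: int, delta: float) -> float:
    """P(first revision among k D-neighbors occurs strictly before the C's own).

    With q = 1 - delta, summing P(T_D = t) * P(T_C > t) over t >= 1 gives
    (1 - q**k) * q / (1 - q**(k + 1)).
    """
    q = 1.0 - delta
    return (1.0 - q**k) * q / (1.0 - q**(k + 1))


def small_delta_limit(k: int) -> float:
    """Limit of the probability above as delta goes to 0."""
    return 1.0 - 1.0 / (k + 1)


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, prob_neighbor_revises_first(k, 1e-4), small_delta_limit(k))
```

Under these assumptions the probability approaches k/(k + 1) as δ vanishes, so a well-connected isolated C is likely to see a neighbor convert before she reconsiders her own strategy.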
open interval (0, 1)) indeed reveals that most of the dots above 0.2 denote simulations that converge to all-C on a longer timescale, whereas dots below 0.2 typically represent simulations ending in a nontrivial stalemate (different from all-C and all-D) or showing long-term fluctuations (see SI Sect. S12 for further details and Sect. S10 for examples of nontrivial stalemates and fluctuations in the simple network of Fig. 1b).
Note that the threshold $r^h_{C,\max}$ is much smaller than the theoretical $r^h_C$ discussed at point (viii) of the Analytical results (see the average $\bar r^h_C$ over the simulations of a given type in Fig. 3, where $\bar r^h_C$ is the upper bound to $r^h_C$ derived in Sect. S8). This is due to the stochastic effect described at point (ix). Essentially, even if some of the C's initially need a higher r to remain C, by the time they revise their strategy the r-gap could have vanished because of D-to-C changes in the neighborhood. This overcompensates the opposite effect due to the fact that $r^h_C$ is computed starting from a cluster of two C's, whereas initial C's are most often isolated (except for scale-free networks with degree-rank-C-placement, because hubs are likely connected among themselves; see the average % for each panel in Fig. 3). Starting with random pairs of connected initial C's indeed results in lower $r^h_{C,\min}$ and $r^h_{C,\max}$ (Fig. S4). As expected from the arguments at points (x) and (xi), the thresholds $r^h_{C,\min}$ and $r^h_{C,\max}$ decrease with the horizon h and increase with the network's average degree, given all other details (Fig. 3: compare the different colors within each panel and left vs right panels). Note that the effects are weakened (as expected) starting from higher initial C-levels (see Fig. S1).
Degree heterogeneity also works as theoretically predicted. Placing the initial C's in the network's hubs does favor cooperation, both in terms of invasion and fixation (Fig. 3: compare single-scale and scale-free networks under degree-rank-C-placement and note the lower $r^h_{C,\min}$ and $r^h_{C,\max}$ in scale-free networks; also note that the type of placement is irrelevant for single-scale networks). The effect is however moderate. Degree heterogeneity works against the fixation of cooperation under random placement (Fig. 3: compare single-scale and scale-free networks and note the significantly higher $r^h_{C,\max}$ in the latter; $r^h_{C,\min}$ is slightly lower, because of the larger number of low-connected D's connected to the initial C's, who change to C under mild returns). The effects are again weakened starting from higher initial C-levels (see Fig. S1).

Discussion
We have questioned network reciprocity as a mechanism supporting cooperation in socio-economic networks. The imitation processes used for strategy update in all theoretical models make little sense when interaction patterns are local and/or heterogeneous, and the interest in unconditional cooperation disappears under any rational strategy update. Imitation indeed did not emerge in social experiments. 19, 20, 24, 25, 31-34 What did emerge 19-21, 23 is a mood to be C or D and a form of direct reciprocity 1, 2 in the C mood, i.e., the tendency to cooperate with subjects who showed cooperation in past interactions. Nevertheless, the network plays a fundamental role in making cooperation possible. Static and sparse networks indeed make direct reciprocity feasible, granting repeated interactions within the same groups of individuals and limiting the required cognitive abilities. We have theoretically shown that these are the networks in which a simple form of direct reciprocity, driven by a rational strategy update, more easily allows cooperation to invade and reach fixation.
Figure 3. Invasion, persistence, and fixation of cooperation under networked rational reciprocity. Panels show the level of cooperation reached in $10^4$ game rounds starting from 1% initial C's on different network structures (left: average degree $\langle k \rangle = 4$; right: $\langle k \rangle = 8$) as a function of the PD game return r. Solid lines show the average fraction over 100 random initializations (random placement of the initial C's in planar lattices; network generation and random placement of the initial C's for random networks, both single-scale and scale-free; the average % of isolated initial C's is reported). Dots show the outcome of single simulations (only for $r^h_{C,\min} < r < r^h_{C,\max}$, i.e., only if some of the outcomes lie in the open interval (0, 1)); transparency is used to show dot accumulation. Colors code the predictive horizon h, from 2 to 5, and the corresponding upper bound $\bar r^h_C$ to the threshold $r^h_C$ is reported. See Sects. S11 and S12 for further details on networks and numerical simulations.

Aims and novelty of the model
Our primary motivation was not to model the human behavior observed in the available experiments, but rather to test a specific hypothesis: whether a form of direct reciprocity could evolve on networks (invasion and fixation) under a process of strategy update that rationally pursues the individual's interest. We named this EGT scenario networked rational reciprocity, of which our model is a minimal benchmark version. Minimal in two respects: the number of confronted strategies (two, strategies C and D) and, especially, the amount of information available to the players (nothing more than what they know from direct interaction). Benchmark, because of the choices of the PD interaction and the D strategy (unconditional D), both setting the worst-case scenario for the evolution of cooperation.
The main novelty of our model is the model-predictive rule for strategy update. Except for best-response update 28 (corresponding to our 1-step prediction), the evolution of unconditional C on networks has always been studied with update rules that implement imitation in socio-economic contexts. Direct reciprocity has been similarly investigated. 35 Other elements are rather standard. 36 Individuals repeatedly play in a static network. At each round, each individual is in the C or D mood 21, 23 and accordingly behaves using the C or D strategy. A round consists of a PD interaction among all pairs of neighbors and payoffs are collected. Strategy update is slow (compared to the frequency of game rounds) and asynchronous: 37 each individual reconsiders which strategy is best to follow at a rate that is assumed small and uniform across the population. This is the assumption that makes predictions of short-future incomes possible, by disregarding the neighbors' updates in the predictive horizon.
The way in which we implement direct reciprocity, allowing C's to abstain from playing, is also not new. Optional participation is known to relax social dilemmas when a baseline payoff is granted to loners, 11, 26 whereas link disconnection has been recently considered. 27 Our link inhibition is temporary and grants no profit to C's. We prefer abstention to forced retaliation (cooperators defecting against neighboring exploiters) because it is more natural to the C mood. Although there is no difference in a single round (because we assume no payoff for mutual defection), abstaining C's communicate their mood to exploiters. This choice is however not crucial for our findings. We expect similar results by confronting unconditional D with any reciprocating form of conditional cooperation under a rational process of strategy update (e.g., we preliminarily tested the well-known tit-for-tat and forgiving tit-for-tat 2 ).
Direct reciprocity deserves another comment. Originally, 1, 2 it was studied in iterated games, i.e., (non-evolutionary) games involving only two players (rather than a population of two types of players) who know the probability w > 0 of a next interaction. In the evolutionary context, there are two ways in which one can study repeated interactions between two given players in the population. Either the single game round consists of an iterated game between each pair of neighbors (an option allowing direct reciprocity even in large, dense or highly dynamic networks, so far investigated in the socio-economic context under imitation-like update rules 35 ), or one relies on a static, sparse network, as we do. Our game round involves a one-shot, optional PD interaction with neighbors. The game is however repeated over an indefinite number of rounds in a static network, so that a next round is essentially always granted. The sparsity of interactions makes direct reciprocity possible, by limiting the cognitive abilities required to remember neighbors and past interactions. Of course hubs need more resources than leaves, but this is typically built into the socio-economic structure.

From network reciprocity to networked rational reciprocity
Network reciprocity and networked rational reciprocity are substantially different mechanisms for the evolution of cooperation. Limiting the discussion to socio-economic systems, the former models the competition between unconditional cooperation and defection under an imitation-like process of strategy update; the latter studies the competition between a reciprocating form of conditional cooperation and unconditional defection under a rational strategy update. Their common bond is the need for a population structure, steady and local, to support cooperation: to allow clusters of C's protected from D's in the former case, and to make direct reciprocity feasible and effective in the latter.
We confirm the fundamental role played by static and sparse networks of contacts in the evolution of cooperation. We rethink, however, the underlying evolutionary mechanism. Direct reciprocity combined with a farsighted rule of strategy update (our multi-step predictive horizon) is the key to explaining the success of cooperation. With no reciprocity (or other mechanisms) supporting cooperation, players should rationally opt for defection; this is well known in unstructured populations, but it holds as well on any structure. Similarly, the myopic optimization of the next game round suggests defecting.
Interestingly, our results are in line with those theoretically obtained for network reciprocity: 16-18 there are good chances that cooperation invades and fixates if the game return r sufficiently exceeds a measure of the network connectivity (see condition (2) for the case of an infinite predictive horizon; the threshold r^h_C in Sect. S8 for a finite horizon; see also all our numerical simulations). Two aspects in which the evolution of cooperation under network reciprocity and under networked rational reciprocity differ are discussed in the following two sections.

The invasion of cooperation
We have shown that if the game return r is large enough, networked rational reciprocity grants high chances of (invasion and) fixation starting from a single C, i.e., chances of the order 1 − 1/(k + 1) + O(δ), where k is the degree of the initial C and δ is the rate of strategy update (see analytical results (viii) and (ix)). This is different from what is granted by network reciprocity under the rule r > k, i.e., a fixation probability larger than 1/N (in large regular networks in the limit of weak selection; 1/N is the fixation probability under totally random strategy update). When the rule of network reciprocity is weakly satisfied in a large network, cooperation almost surely disappears starting from a single C (probability 1 − 1/N). To have higher chances of fixation, a significantly larger r is typically required and, especially when selection is strong (SI note 3), cooperation cannot invade anyhow. Consider, e.g., a single C in the lattice of Fig. 1a. If selection is strong, the C most likely imitates a D-neighbor as soon as she revises her strategy (probability δ), whereas D-neighbors do not imitate the C. The probability of invasion (going from one to two C's) is negligible after each game round, whereas the C sooner or later switches to D.
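To give a feeling for the gap between the two figures quoted above, the following minimal sketch evaluates the leading-order chance 1 − 1/(k + 1) of networked rational reciprocity (the O(δ) correction is simply dropped) against the neutral benchmark 1/N of network reciprocity. The function names and the network size N = 1000 used in the printout are illustrative assumptions, not part of the model's code.

```python
def rational_reciprocity_chance(k: int) -> float:
    """Leading-order fixation chance 1 - 1/(k + 1) from a single C of degree k.

    The O(delta) correction quoted in the text is neglected here.
    """
    if k < 1:
        raise ValueError("the initial C needs at least one neighbor")
    return 1.0 - 1.0 / (k + 1)


def neutral_fixation_probability(n: int) -> float:
    """Fixation probability 1/N under totally random strategy update."""
    if n < 1:
        raise ValueError("the population must contain at least one individual")
    return 1.0 / n


if __name__ == "__main__":
    n = 1000  # illustrative network size (assumption)
    for k in (2, 4, 8):
        print(f"k = {k}: rational reciprocity ~ {rational_reciprocity_chance(k):.3f}, "
              f"neutral benchmark = {neutral_fixation_probability(n):.3f}")
```

For the degrees considered in the simulations the first quantity stays well above one half, whereas the second vanishes with the network size.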
We hence conclude that network reciprocity does not support the invasion of cooperation. Not surprisingly, all theoretical studies showing significant fixation probabilities for cooperation under intermediate or strong selection considered random initial conditions with 50% C's 29, 30, 38-51 (33% has been considered in Ref. 28). Starting from isolated C's or small clusters, cooperation most likely disappears. We have, e.g., tested network reciprocity starting from 1% initial C's on the same network structures of Fig. 3, using the pairwise comparison imitation rule under strong selection (the one used in the majority of the above-mentioned works; see SI notes 2 and 3). Cooperation systematically disappeared up to r = 5000, except for scale-free networks with degree-rank-C-placement (where C-hubs are known to form clusters), in which we found invasion only for r larger than 20.
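The single-C argument can be made concrete with a small sketch. It assumes the PD payoffs with c = 1 (an unconditional C of degree k surrounded by D's earns −k, and a D-neighbor with no other C-neighbor earns r) and, as a further assumption, a logistic (Fermi) form for the pairwise comparison probability with intensity of selection beta; the text only states that copying depends on the payoff difference, so the specific function and the names below are hypothetical.

```python
import math


def fermi_copy_probability(payoff_self: float, payoff_other: float, beta: float) -> float:
    """Probability that 'self' copies 'other' under an assumed logistic rule.

    beta = 0 gives 50% regardless of payoffs; large beta mimics strong selection.
    """
    x = beta * (payoff_other - payoff_self)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # numerically stable branch for negative arguments
    return e / (1.0 + e)


def single_c_payoffs(r: float, k: int) -> tuple[float, float]:
    """Round payoffs (C, D-neighbor) for an isolated unconditional C of degree k."""
    payoff_c = -float(k)  # pays the unit cost to each of the k D-neighbors
    payoff_d = float(r)   # the D-neighbor only collects the benefit r from the C
    return payoff_c, payoff_d


if __name__ == "__main__":
    pay_c, pay_d = single_c_payoffs(r=3.0, k=4)
    for beta in (0.01, 1.0, 10.0):
        print(f"beta = {beta}: C copies D with prob. "
              f"{fermi_copy_probability(pay_c, pay_d, beta):.4f}, "
              f"D copies C with prob. {fermi_copy_probability(pay_d, pay_c, beta):.2e}")
```

Under this assumed rule, raising beta drives the probability that the D-neighbor copies the C toward zero, which is the sense in which invasion is negligible under strong selection.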

The effect of the network structure
A considerable effort has been devoted to identifying the network structures that best favor the evolution of cooperation under network reciprocity. 38-41 The general answer is that, for a given (sufficiently small) average degree ⟨k⟩ and starting from a significant C-level, heterogeneous networks (e.g., scale-free networks) do better than homogeneous networks (lattices or single-scale networks). Indeed, a C-hub (individual i with degree k_i ≫ ⟨k⟩) with a significant fraction (say 50%) of C-neighbors is imitated by a low-connected D-neighbor j under a mild requirement on the game return r (π_i = (r − 1) k_i/2 − k_i/2, i.e., a net gain r − 1 from each C-neighbor and a loss of 1 to each D-neighbor; π_j = r for a leaf j). C-hubs can then build C-clusters, whereas this requires higher returns in homogeneous networks (k_i, k_j ≈ ⟨k⟩).
On the contrary, heterogeneity works against cooperation under networked rational reciprocity. From condition (2) (for the case of an infinite horizon; similarly, see the threshold r^h_C in Sect. S8 for a finite horizon) we see that the threshold on r above which strategy-revising C's (resp. D's) remain (resp. change to) C increases with the node degree k. Especially at low levels of cooperation (low k_C), C- or D-hubs (connected to one or a few C's) require a larger r to opt for C than nodes with degree closer to the average. The simulations in Fig. 3 indeed show that the r-thresholds required for invasion and fixation of cooperation are higher for scale-free than for single-scale networks, if the initial C's are placed at random.
However, C-hubs encourage low-connected D-neighbors to change to C under mild returns. Provided the rate of strategy update is small enough, the number of C-neighbors of a C-hub will rise, while the hub pays the cost of building the cluster. In other words, initially isolated C-hubs invest in the future establishment of cooperation. Moreover, hubs are likely connected among themselves, so that placing the initial C's in the network's hubs forms clusters of C's that mitigate the investment. Network heterogeneity therefore turns beneficial to cooperation under this strategic placement of the initial C's.

Links with social experiments
Four experiments on relatively large (N ≥ 100), static and sparse human networks playing a PD have been performed to date. 19, 20, 24, 25 The two most recent ones (on ring lattices: N = 100, degree k = 2, 4, 6 in Ref. 24; N = 225, k = 2 in Ref. 25) have shown significant levels of stable cooperation under the rule r > k of network reciprocity (r = 2, 4, 6 in Ref. 24 and r = 2 in Ref. 25; significant cooperation was observed also for r = k), though the consistency of human behavior with the unconditional C and D strategies under imitation update was not documented. In the first two experiments (on planar lattices: N = 13 × 13 = 169, k = 8 in Ref. 19; N = 25 × 25 = 625, k = 4 in Ref. 20; on a heterogeneous network: N = 604, degree from 2 to 16, ⟨k⟩ = 3.4, in Ref. 20), the game return r was set below the average connectivity (r = 10/3) and cooperation dropped to levels comparable to those expected in the complete network (the all-to-all interaction was mimicked in a control treatment by reshuffling neighbors at each round). Subjects were however documented not to imitate better performing neighbors. They cooperated by essentially reciprocating the benefit obtained in the previous round. Moreover, the positive effect predicted by network reciprocity in heterogeneous networks 38-41 was missed. 20, 21 Because the experimental setting was very similar in the four experiments (in particular, the same game option, C or D, is taken for all neighbors at each round), we might expect a similar behavior.
Other experiments have been performed on smaller networks, with results consistent with the four larger experiments. On small ring lattices (N = 18, k = 4), low/medium levels of cooperation were observed for r = 4, 5 (Refs. 31, 32), with subjects reacting to cooperation in previous rounds rather than to neighbors' previous payoffs (the payoff per round was however normalized by the node degree). Small-world and random networks (N = 18, k = 4; Ref. 31) as well as complete networks (N from 2 to 5; Refs. 32, 34) showed lower cooperation than homogeneous sparse networks. On small square lattices (N = 16, k = 4), low cooperation was observed for r = 3 (Ref. 33), with an apparently unconditional behavior driven by imitation update, though the cooperative strategy was later shown to be more robustly described as conditioned by direct reciprocity under a random strategy update 23 (see below).
The identification of the strategy (one or many) adopted (or learned) by humans when playing a PD experiment is a difficult task and the result is likely to depend on the experimental setting. A few general traits are however apparent from the analysis of the experiments of Refs. 19, 20, 33 (with similar settings). Apart from subjects who mostly cooperate or defect, who are always minorities, the analysis revealed that humans behave according to a cooperative or defective mood, 21, 23 thus justifying the modeling assumption of two strategies. In the C mood, subjects more likely cooperate (with all neighbors, as a rule of the experiment) the more cooperation was observed in the neighborhood in the previous round, i.e., they reciprocate cooperation. Subjects in the D mood most likely defect irrespective of the previous round, i.e., defection is largely unconditional. Strategy change is identified as a random process biased by the subject's mood. Precisely, a C/D-player changes strategy after opting for defection/cooperation at a given round. This shows that subjects did not copy better performing neighbors (though payoffs were made available), but does not unveil what pushes subjects to change strategy. Our basic rational hypothesis is that subjects try to maximize, to the best of their knowledge, the outcome of one or a few future rounds.
Summarizing, the social experiments show that the C strategy is conditioned by direct reciprocity, that defection is rather unconditional, and that a rule of the kind r > k essentially works, in the sense of a threshold on r over which cooperation stabilizes at levels that increase with the game return r. They do not suggest the underlying evolutionary mechanism, but show it is not network reciprocity. These results provide the motivation and the basis for our model. However, we imagined a different model society with respect to the one imposed by the experimental setting in Refs. 19, 20, 33. The two major differences concern the C strategy. We assume that C-players do not defect; rather, they can abstain from playing with defecting neighbors. Moreover, the choice is taken independently for each neighbor, instead of forcing a common decision. Assuming a rational process of strategy update, based on a model prediction of future payoffs, we show that cooperation fixates if the game return r is larger than a threshold that scales with the average connectivity of the network.
Although our simulations cannot be directly compared with the available experiments, we provide an evolutionary mechanism that has the potential to explain human cooperation. Our model-predictive rule for strategy update is not an easy one to apply in real networks, in which the rate of strategy update might be far from uniform across the population. Moreover, it requires nontrivial cognitive abilities and humans might not be as rational as we assume. However, it could emerge, approximately, as the result of an intuitive, rather than computational, human behavior. To test this claim new experiments must be designed, and our model suggests how. For example, to confirm that humans do have a C or D mood, it is important to allow them to temporarily abstain from playing with specific neighbors, to avoid confounding risk-avoiding defections with a D mood. Allowing independent decisions with different neighbors is also important, e.g., to identify a mix of C and D as the absence of a mood. Allowing independent decisions definitely poses experimental challenges, and it has been recently shown to enhance cooperation in static networks. 52

Methods
When the C-individual i (normally reciprocating, ε = 0) gets exploited by neighbor j, she draws the number a of game rounds to skip according to the distribution Prob(a) = (1 − δ)^a δ, a ≥ 0 (with mean 1/δ − 1), i.e., i goes back to playing with j at the (a + 1)-th round following the exploitation with the probability that j first revises her strategy after the a-th round (e.g., i does not stop playing with j, a = 0, with the probability δ that j revises strategy just after exploiting i). Without knowing the number drawn by i, the probability p_ij that i will agree to play with j at the t-th round following exploitation is the cumulative distribution of Prob(a) from a = 0 to a = t − 1, i.e., the probability that the drawn number is smaller than t. Instead of managing abstention periods, we implement our model by endowing each individual i with the set of probabilities p_ij that i will agree to play with j at the next round, i, j = 1, ..., N. Initially, p_ij = p_ji = 1 if i and j are neighbors; p_ij = p_ji = 0 otherwise. The N × N matrix P = [p_ij] defines the (static) network topology. At each game round, each PD interaction takes place with probability p_ij p_ji, i.e., only if both players agree to play. When the C-individual i gets exploited by neighbor j, she sets p_ij = p_1 = δ, i.e., to the probability that j revises her strategy just after. If i decides not to play with j at the next round, p_ij is updated to p_2 = 1 − (1 − δ)^2, the probability that j has revised strategy by the second round following exploitation. After t − 1 consecutive abstentions, the probability to play at the t-th round is p_t in (3), which increases to one with t. When the C-individual i gets reciprocated by the C-neighbor j, she resets p_ij = 1. D's always have p_ij = 1 toward all neighbors. (Note that either p_ij = 1 or p_ji = 1 by construction.) We therefore implement a different model, w.r.t. the one so far described, by shifting the network's dynamics from the links (on-off) to the links' weights (the probabilities p_ij). The two models are statistically equivalent and the latter is simpler to analyze (see Supplementary Information).
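The cumulative probability described above can be checked numerically. The sketch below takes the closed form 1 − (1 − δ)^t (which follows from summing Prob(a) from a = 0 to a = t − 1 and agrees with the quoted p_1 = δ and p_2 = 1 − (1 − δ)^2), compares it with the term-by-term sum, and with the fraction of sampled abstention periods shorter than t. Function names, the seed and the sample size are illustrative assumptions; the sketch covers the case ε = 0 only and does not reproduce the equivalence argument of the Supplementary Information.

```python
import random


def play_probability(t: int, delta: float) -> float:
    """Closed form of P(a < t) for Prob(a) = (1 - delta)**a * delta, with t >= 1."""
    if t < 1:
        raise ValueError("t counts rounds after the exploitation, so t >= 1")
    return 1.0 - (1.0 - delta) ** t


def cumulative_from_pmf(t: int, delta: float) -> float:
    """Same quantity, summed term by term from a = 0 to a = t - 1."""
    return sum((1.0 - delta) ** a * delta for a in range(t))


def draw_abstention(delta: float, rng: random.Random) -> int:
    """Draw the number a of rounds to skip (a = 0 with probability delta)."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    a = 0
    while rng.random() >= delta:
        a += 1
    return a


if __name__ == "__main__":
    delta = 0.05              # revision rate quoted for Fig. 3
    rng = random.Random(1)    # arbitrary seed (assumption)
    draws = [draw_abstention(delta, rng) for _ in range(20000)]
    for t in (1, 2, 5, 20):
        empirical = sum(a < t for a in draws) / len(draws)
        print(t, round(play_probability(t, delta), 4), round(empirical, 4))
    print("mean abstention:", sum(draws) / len(draws), "vs 1/delta - 1 =", 1 / delta - 1)
```

The sampled mean abstention should also come out close to 1/δ − 1, the mean stated in the text.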
For super/sub reciprocating C's (ε ≷ 0), the rate of strategy update δ must be replaced with the biased rate δ_ε in the above formulas. Recall (from (1)) that ε = −(δ_ε − δ)/δ is the relative mismatch between δ and δ_ε, i.e., the under/over-estimation of δ adopted by super/sub reciprocating C's in deciding how long to abstain. For super/sub reciprocating C's, 1/δ_ε is larger/smaller than the average number of rounds 1/δ within which the exploiter once revises her strategy, so that, on average, C's stop playing with exploiters for longer/shorter than the time taken by the latter to possibly change to C. (C's stop playing forever if ε → ε_max = 1; they never stop, i.e., they play C unconditionally, if ε → ε_min = −(1 − δ)/δ.)
After each game round, each individual independently decides whether to revise strategy with probability δ. C's who revise compute their expected accumulated payoff assuming to remain C, π^h_CC, or to behave as D, π^h_CD, in the next h rounds (see Sect. S4) and change to D if the expected gain ∆π^h_C = π^h_CD − π^h_CC is positive. In so doing, they assume the neighbors' strategy unchanged since the last PD interaction that took place, so that D's who changed to C in the meantime are considered as D's. D's who revise compute their expected accumulated payoff gain ∆π^h_D = π^h_DC − π^h_DD under full information (see Sect. S5) and change to C under a positive gain. When changing to D, C's set p_ij = 1 toward all neighbors. When changing to C, D's set p_ij = δ_ε toward D-neighbors, as if drawing an abstention period.
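Inverting the definition of ε gives δ_ε = δ(1 − ε), which reproduces the two limits just quoted (δ_ε = 0 for ε = 1, δ_ε = 1 for ε = ε_min) and the halving/doubling reported for ε = 0.5 and ε = −1 in the caption of Figure S3. The sketch below encodes this relation together with the sign rule for strategy revision; the predicted gains themselves are taken as inputs, since their expressions are given in Sects. S4 and S5 and are not reproduced here. All function names are hypothetical.

```python
def biased_rate(delta: float, eps: float) -> float:
    """delta_eps obtained by inverting eps = -(delta_eps - delta) / delta."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    eps_min = -(1.0 - delta) / delta  # C's never abstain (delta_eps = 1)
    eps_max = 1.0                     # C's abstain forever (delta_eps = 0)
    if not eps_min <= eps <= eps_max:
        raise ValueError("eps outside [eps_min, eps_max]")
    return delta * (1.0 - eps)


def revise_strategy(current: str, predicted_gain: float) -> str:
    """Switch strategy only if the predicted gain of switching is positive.

    predicted_gain stands for the h-round gain of a revising C or D; how it is
    computed is left outside this sketch.
    """
    if current not in ("C", "D"):
        raise ValueError("strategy must be 'C' or 'D'")
    if predicted_gain > 0:
        return "D" if current == "C" else "C"
    return current


if __name__ == "__main__":
    delta = 0.05
    for eps in (0.5, 0.0, -1.0):
        print(f"eps = {eps}: delta_eps = {biased_rate(delta, eps):.3f}")
    print(revise_strategy("C", -0.2), revise_strategy("D", 0.7))
```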
To compute their expected payoff gains ∆π^h_C and ∆π^h_D, C's and D's evaluate the probabilities to interact with their neighbors during the predictive horizon. To this end, two probabilities are defined in Sects. S2 and S3 for the C-individual i with p_ij = p_{t_ij} for some t_ij ≥ 0 (p_0 = 1 by definition): the probability P^t_CD(t_ij) to play with the D-neighbor j at round t ≥ 1 of the horizon; and, similarly, the probability P^t_CC(t_ij) to play with the C-neighbor j (with p_ji = 1). As t → ∞, P^t_CD converges (independently of the initialization) to the infinite-horizon limit P^∞_CD of condition (2). Although making predictions assuming no strategy change in the neighborhood makes sense only for short horizons, relative to the revision rate δ, players can hardly make better predictions anyway. In our simulations we have limited the product δh, which upper bounds the neighborhood fraction possibly subject to change within the horizon, to 0.3 (e.g., δ = 0.05 and h ≤ 5 in Fig. 3).
For details on networks' structure and generation and on numerical simulations, see Sects. S11 and S12, respectively.
Author contributions statement: All authors were involved in the design of the research and in the analysis. F.D.R. performed the numerical analysis and produced the graphics; C.P. performed the network analysis; F.D. wrote the paper.
2. Network reciprocity has been shown to work under several evolutionary processes. These include 16, 40 the two imitation processes traditionally used to describe strategy update in socio-economic networks: Imitation (IM), in which an individual is selected uniformly at random to revise her strategy and stays or copies one of the neighbors' strategies with probabilities proportional to her own and the neighbors' (normalized) fitnesses, and Pairwise Comparison (PC), in which a randomly chosen individual compares payoffs with a random neighbor and stays or copies the neighbor's strategy proportionally to the fitness difference. Fitness is a measure of performance in the underlying game (in terms of reproduction in biology and status in socio-economic systems); it can simply be the game payoff in the last round (see note 3). Network reciprocity works best for the biological process known as Death-Birth (DB), 16 in which a random individual is selected to die and the neighbors compete for the empty site proportionally to fitness; this is equivalent to a modified IM process where the selected individual is forced to imitate a neighbor. Interestingly, network reciprocity does not work (the condition on the game return r is highly demanding 18 ) for the dual biological process of Birth-Death (BD), in which an individual is selected to reproduce proportionally to fitness over the whole population and the offspring replaces a randomly selected neighbor. Essentially, under BD, D's at the border of C-clusters reproduce more than bordering C's. (See Ref. 51 for other biologically-inspired evolutionary processes, in which selection acts globally or locally on both birth and death with possibly independent dispersal and interaction graphs.)
3. Weak selection means that the game payoffs contribute to the individual's fitness only to a small extent. It is an interesting limit because it simplifies analytic computations. It represents situations in which the individual's performance is largely determined by factors that are independent of the game interaction and typically assumed time-invariant. They give a baseline fitness to all players, to which the game output marginally adds. E.g., in the PC process, the probability that an individual with low payoff imitates a selected one with high payoff is slightly above 50%; likewise, there is a significant probability, slightly below 50%, to copy individuals with lower payoff. That is, the performance in the game is weakly selected. Selection is strong when the fitness is totally determined by the game. It can simply be the payoff, or even a function of the payoff that increases more than linearly, so as to increase the probability that a given payoff difference will result in the selection of the best performance. The strongest selection (extreme selection 57 ) is the case in which the player with larger payoff is always selected to reproduce or be copied. In our model selection is extreme, in the sense that when revising strategy individuals always opt for the strategy giving the largest payoff prediction.
Figure S1. Persistence and fixation of cooperation under networked rational reciprocity. The figure complements Fig. 3 by showing the results obtained starting from 50% initial C's, all other details unchanged (same model parameters, networks, and initializations). Note that the thresholds on r identified with 50% initial C's have a different meaning w.r.t. the significantly higher r^h_{C,min} and r^h_{C,max} of Fig. 3. Here cooperation persists (resp. fixates), on average, for r above the lower (resp. higher) threshold, being however unable to invade (resp. fixate after invasion) if r is below the r^h_{C,min} (resp. r^h_{C,max}) of Fig. 3. The effects of the different network structures and initializations observed in Fig. 3 are still present but weakened.

Figure S2. Networked rational reciprocity on other regular network structures. The figure complements Figs. 3 and S1 by adding a degree-4 ring lattice and the complete network, starting from 1% (left panels) and 50% (right panels) initial C's. The ring behaves similarly to the square lattice, while better supporting the invasion of C's (slightly lower r^h_{C,min} for 1% initial C's). In the ring lattice, half of the D's at the boundary of a small C-cluster (those next to a C in the main loop) have 2 D-neighbors, whereas they typically have 3 D-neighbors in the square lattice, so that D-to-C strategy changes require a lower return at low levels of C. The increased number of simulations not ended in all-C or all-D w.r.t. the square lattice (dots in the open interval (0, 1), especially for 1% initial C's) is due to the slower convergence, because of the essentially one-dimensional ring structure. The all-to-all connection is the most demanding structure for the invasion of C's (note the different scale of the r-axis in panel c), and even for their fixation starting from 50% initial C's (compare with Fig. S1). The complete network is in a stalemate starting from 50% initial C's with intermediate returns. Stalemate at the initial state is possible on any network structure (see, e.g., the simple network of Fig. 1b analyzed in Sect. S10), though we observed it in our numerical simulations only for the complete network. It typically requires significant C-levels, since isolated C's are willing to change, though we observed it also at 1% initial C's for r slightly larger than r^h_{C,min}. (See Sect. S12 for more details on the numerical results.) Horizontal axes: PD return r.
Figure S3. Sensitivity analysis. The figure complements Fig. 3 by reproducing the simulations with average degree ⟨k⟩ = 4 and predictive horizon h = 3 (red in the left panels of Fig. 3) for two different (larger/smaller) values of the reciprocity parameter ε (0.5 and −1, halving/doubling the reciprocity-biased rate of strategy update δ_ε (see (1)) w.r.t. the baseline value in Table S3; left column) and of the rate of strategy update δ (0.1 and 0.025, double/half of the baseline value; right column). As expected at points (v) and (vii) of the Analytical results, super/sub reciprocation (ε ≷ 0) as well as higher/lower inertia to change (lower/higher δ) result in lower/higher thresholds r^h_{C,min} and r^h_{C,max}.
Figure S4. Networked rational reciprocity under random pair placement of the initial C's. The figure complements Fig. 3 by reproducing the simulations with random placement of the initial C's by randomly selecting pairs of C-neighbors instead of single individuals. The 10 initial C's (1% of N = 1000 nodes) are iteratively selected as follows: each time a selected node has no C-neighbors, the next node is selected among its neighbors; the last node is either selected to pair the previous one, or among the D's connected to C's. Both r^h_{C,min} and r^h_{C,max}, as well as their gap, get reduced w.r.t. random placement in Fig. 3. Recall that isolated C's switch to D at first strategy revision, so that starting with all isolated C's it is always possible to end in the trivial stalemate all-D. Note that degree-rank-pair-placement (selecting initial C's according to degree-ranking by pairing isolated ones) would have reproduced the results of Fig. 3, as random and degree-rank selections are essentially equivalent for single-scale networks and degree-rank-C-placement is unlikely to leave isolated C's in scale-free networks. Horizontal axes: PD return r.
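The pair-placement procedure described in the caption of Figure S4 can be sketched as follows. This is one possible reading of the caption, not the code used for the simulations: when the previously selected node already has a C-neighbor, the next node is assumed to be drawn uniformly among the remaining D's, and a uniform fallback is used if a candidate pool happens to be empty. The adjacency-dictionary format, the function name and the toy ring network are illustrative assumptions.

```python
import random


def pair_placement(adj: dict, n_c: int, rng: random.Random) -> set:
    """Select n_c initial C's so that, where possible, none is left isolated.

    adj maps each node to the list of its neighbors (undirected network).
    """
    cs = set()
    prev = None

    def has_c_neighbor(v):
        return any(u in cs for u in adj[v])

    while len(cs) < n_c:
        last_slot = len(cs) == n_c - 1
        if prev is not None and not has_c_neighbor(prev):
            # The previous pick is an isolated C: pair it with one of its neighbors.
            pool = [u for u in adj[prev] if u not in cs]
        elif last_slot:
            # Last pick: choose among the D's already connected to some C.
            pool = [v for v in adj if v not in cs and has_c_neighbor(v)]
        else:
            # Assumption: otherwise pick uniformly among the remaining D's.
            pool = [v for v in adj if v not in cs]
        if not pool:
            pool = [v for v in adj if v not in cs]  # fallback (assumption)
        prev = rng.choice(pool)
        cs.add(prev)
    return cs


if __name__ == "__main__":
    n = 100  # toy degree-4 ring lattice (assumption)
    ring = {i: [(i - 2) % n, (i - 1) % n, (i + 1) % n, (i + 2) % n] for i in range(n)}
    initial_cs = pair_placement(ring, 10, random.Random(0))
    print(sorted(initial_cs))
```

On networks without degree-zero nodes and with at least two initial C's, every selected C ends up with at least one C-neighbor under this reading.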

Figure S5. Networked rational reciprocity in scale-free networks with medium-high transitivity. (Plot not reproduced; horizontal axis of both panels: PD return, r.) We checked the robustness of our results with respect to a different type of scale-free network (Holme-Kim model, 63 HK) with tunable transitivity (the average probability that the neighbors of a node are neighbors themselves, see SI note 4), since network transitivity, shown to favor cooperation under imitation update, 29, 30 vanishes with the size of Barabási-Albert networks. 62 (See Sect. S11 for details on the BA and HK algorithms.) The resulting transitivity (average over 100 networks) is 0.74 for average degree k = 4 (left) and 0.28 for k = 8 (right) (respectively 0.027 and 0.037 in Fig. 3; transitivity is known to increase/decrease with k in BA/HK networks 62,63 ). Although the theoretical degree distribution for large size is the same as that of a BA network, the effect of raising transitivity in a finite network is an increase of nodes with low and high degree to the detriment of nodes with intermediate degree (checked on average on our 100 networks). This 'finite size effect' explains our results: slightly lower $r^h_{C,\min}$ and higher $r^h_{C,\max}$ compared with the same panel in Fig. 3. Invasion is indeed favored by the enhanced initial presence of low-connected D's, who change to C under mild returns if connected to a C, whereas fixation is hindered by the enhanced initial presence of D-hubs. The increased number of simulations not ended in all-C or all-D (dots in the open interval (0, 1), essentially in the left panels where transitivity is very high, w.r.t. Fig. 3) can also be explained in terms of the loss of nodes with intermediate degree. On the one hand, the loss makes the evolution slower, possibly forming bottlenecks through which cooperation must percolate, so that some of the dots at significant C-levels might denote simulations ending in all-C on a longer timescale. On the other hand, the loss makes nontrivial stalemates and fluctuations more frequent, as D-nodes with high degree can prevent the convergence to all-C even under degree-rank-C-placement. (See Sect. S12 for more details on the numerical results.)

Supplementary Methods

S1 The probability to play $p_t$
After $t - 1 \ge 0$ consecutive abstentions following an exploitation by the neighbor $j$, the probability $p_{ij}$ that the C-individual $i$ agrees to play with $j$ at the next game round is set to $p_t = 1 - (1 - \delta_\varepsilon)^t$. It is the probability that $j$ has revised her strategy at least once since the exploitation, according to the reciprocity-biased rate of strategy update $\delta_\varepsilon$. (Recall that $\delta_\varepsilon = (1 - \varepsilon)\delta \in (0, 1)$ is equal to, resp. smaller/larger than, the actual rate $\delta$ for normally, resp. super/sub, reciprocating individuals; $\varepsilon = 0$, resp. $\varepsilon \gtrless 0$.) If the $(i, j)$ interaction takes place at the next round, then $i$ sets $p_{ij}$ to 1 or to $p_1 = \delta_\varepsilon$ depending on whether $j$ cooperates or defects. Otherwise, $i$ updates $p_{ij}$ to $p_{t+1}$. Note that $p_{t+1}$ can be obtained with the recursion
$$p_{t+1} = \delta_\varepsilon + (1 - \delta_\varepsilon)\,p_t, \qquad t \ge 1. \tag{S1}$$
We set $p_0 = 1$ and, whenever needed in the following, $t_{ij}$ denotes the integer giving $p_{t_{ij}} = p_{ij}$ for the neighbor pair $(i, j)$.
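The relation between the closed form of $p_t$ and its one-step update can be checked numerically. The sketch below is only illustrative: it assumes that recursion (S1) has the form $p_{t+1} = \delta_\varepsilon + (1 - \delta_\varepsilon)p_t$ (which is what the closed form implies for $t \ge 1$), and the function names are hypothetical.

```python
def p_closed(t, delta_eps):
    """Closed form p_t = 1 - (1 - delta_eps)**t for t >= 1; p_0 = 1 by convention."""
    if t == 0:
        return 1.0
    return 1.0 - (1.0 - delta_eps) ** t


def p_next(p_t, delta_eps):
    """Assumed form of recursion (S1): p_{t+1} = delta_eps + (1 - delta_eps) * p_t."""
    return delta_eps + (1.0 - delta_eps) * p_t


def p_sequence(t_max, delta_eps):
    """Return [p_1, ..., p_{t_max}] built with the recursion from p_1 = delta_eps."""
    seq = [delta_eps]
    while len(seq) < t_max:
        seq.append(p_next(seq[-1], delta_eps))
    return seq
```

With the baseline value $\delta_\varepsilon = 0.05$, `p_sequence(10, 0.05)` should agree with `p_closed(t, 0.05)` for $t = 1, \dots, 10$ up to rounding.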

S2 The probability $P^t_{CD}$
Starting after a given game round with $p_{ij} = p_{t_{ij}}$ for some $t_{ij} \ge 0$ (or after initialization with $t_{ij} = 0$), $P^t_{CD}(t_{ij})$ is the probability that the C-individual $i$ plays with the D-neighbor $j$ at the $t$-th future round, assuming no change of strategy. (The $t_{ij}$-argument, sometimes omitted in the following, makes initialization explicit.) We therefore have $P^1_{CD}(t_{ij}) = p_{t_{ij}}$ for $t = 1$, while for $t > 1$ we use the recursion (written here from round $t$ to round $t + 1$)
$$P^{t+1}_{CD} = P^t_{CD}\,\delta_\varepsilon + (1 - P^t_{CD})\big(\delta_\varepsilon + (1 - \delta_\varepsilon)P^t_{CD}\big) = \delta_\varepsilon + (1 - \delta_\varepsilon)\,P^t_{CD}\,(1 - P^t_{CD}). \tag{S2}$$
That is, if $i$ is exploited by $j$ at round $t$, she will then play at round $t + 1$ with probability $p_1 = \delta_\varepsilon$ (first term after the first equal sign in (S2)); otherwise $P^t_{CD}$ is updated as $p_t$ with the rule (S1) (second term). Starting with $p_{ij} = \delta_\varepsilon$ (i.e., $t_{ij} = 1$), the probabilities $p_t$ and $P^t_{CD}$ are listed in Table S1 for $t \ge 1$ (first and second columns). Both probabilities have a linear (1-st-order) leading $\delta_\varepsilon$-term with the same coefficient $t$, i.e., $P^t_{CD}(t_{ij}) \approx (t_{ij} + t - 1)\delta_\varepsilon$ for small $\delta_\varepsilon$ (up to 1-st-order, only the terms $\delta_\varepsilon$ and $P^t_{CD}$ matter in the right-most side of (S2)). As $t \to \infty$, $P^t_{CD}$ converges (independently of the initialization) to the infinite-horizon limit
$$P^\infty_{CD} = \frac{\sqrt{\delta_\varepsilon(4 - 3\delta_\varepsilon)} - \delta_\varepsilon}{2(1 - \delta_\varepsilon)}. \tag{S3}$$
The limit is reached monotonically if $\delta_\varepsilon < 1/3$ (from below if $P^1_{CD} < P^\infty_{CD}$ or $P^1_{CD} > 1 - P^\infty_{CD}$; from above if $P^\infty_{CD} < P^1_{CD} \le 1 - P^\infty_{CD}$; see Fig. S6). This condition is met in the numerical analysis, where we use at most $\delta_\varepsilon = 0.1$ (corresponding to the perturbation of Fig. S3 w.r.t. the baseline value $\delta_\varepsilon = 0.05$, see Table S3).
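The convergence of $P^t_{CD}$ can be illustrated by iterating the map directly. The sketch below assumes that recursion (S2) reads $P^{t+1}_{CD} = \delta_\varepsilon + (1 - \delta_\varepsilon)P^t_{CD}(1 - P^t_{CD})$, a form inferred from the verbal description (exploitation with probability $P^t_{CD}$ followed by $p_1 = \delta_\varepsilon$, otherwise rule (S1)); the limit is then the positive fixed point of that map. Function names are hypothetical.

```python
import math


def pcd_step(P, d):
    """Assumed right-most side of (S2): d + (1 - d) * P * (1 - P)."""
    return d + (1.0 - d) * P * (1.0 - P)


def pcd_limit(d):
    """Positive fixed point of pcd_step, i.e. the infinite-horizon limit."""
    return (math.sqrt(d * (4.0 - 3.0 * d)) - d) / (2.0 * (1.0 - d))


def pcd_path(t_ij, d, t_max):
    """Return [P^1_CD, ..., P^{t_max}_CD], starting from p_{t_ij} (p_0 = 1)."""
    P = 1.0 if t_ij == 0 else 1.0 - (1.0 - d) ** t_ij
    path = [P]
    for _ in range(t_max - 1):
        P = pcd_step(P, d)
        path.append(P)
    return path


def local_slope(d):
    """Derivative of pcd_step at the fixed point: 1 - sqrt(d * (4 - 3 d))."""
    return 1.0 - math.sqrt(d * (4.0 - 3.0 * d))
```

Under this assumption, `local_slope(d)` is positive exactly for `d < 1/3`, which matches the stated condition for monotonic convergence.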

S3 The probability $P^t_{CC}$
Starting after a given game round with $p_{ij} = p_{t_{ij}}$ for some $t_{ij} \ge 0$ (or after initialization with $t_{ij} = 0$), $P^t_{CC}(t_{ij})$ is the probability that the C-individual $i$ plays at the $t$-th future round with the C-neighbor $j$, assuming no change of strategy. If $t_{ij} = 0$, then $i$ and $j$ reciprocated cooperation in the last round, so that $P^t_{CC}(0) = 1$ for all $t \ge 1$. Otherwise, $i$ stopped playing after getting exploited by $j$, who turned C in the meanwhile (so that $p_{ji} = 1$). In both cases we have $P^1_{CC}(t_{ij}) = p_{t_{ij}}$ for $t = 1$, while for $t > 1$ we use the recursion (written here from round $t$ to round $t + 1$)
$$P^{t+1}_{CC} = P^t_{CC} + (1 - P^t_{CC})\,p_{t_{ij}+t} = 1 - (1 - \delta_\varepsilon)^{t_{ij}+t} + P^t_{CC}\,(1 - \delta_\varepsilon)^{t_{ij}+t}. \tag{S4}$$
That is, if $i$ does play at round $t$, she will certainly play at round $t + 1$ (first term after the first equal sign in (S4)); otherwise $P^t_{CC}$ changes from $p_{t_{ij}+t-1}$ to $p_{t_{ij}+t}$ (second term). Starting with $p_{ij} = \delta_\varepsilon$ (i.e., $t_{ij} = 1$), the probability $P^t_{CC}$ is listed in Table S1 for $t \ge 1$ (third column). The leading $\delta_\varepsilon$-term is linear (1-st-order) and from (S4) it follows that its coefficient in $P^{t+1}_{CC}$ is $t_{ij} + t$ plus the coefficient in $P^t_{CC}$ (up to 1-st-order, only the 1-st-order term of the first power $(1 - \delta_\varepsilon)^{t_{ij}+t}$ and the 0-order term of the second matter in the right-most side of (S4)). For sufficiently small $\delta_\varepsilon$, we hence have
$$P^t_{CC}(t_{ij}) > P^t_{CD}(t_{ij}) \quad \text{for any } t_{ij} \ge 0 \text{ and } t > 1. \tag{S5}$$
The probability $P^t_{CC}$ monotonically increases to 1 (independently of $t_{ij}$). We now show that inequality (S5) holds true for any $\delta_\varepsilon \in (0, 1)$. From the first right-hand side in (S4) (time-shifted from $t - 1$ to $t$), we see that $P^t_{CC}(t_{ij})$ is an interior convex combination (by construction $0 < P^{t-1}_{CC} < 1$) of 1 and $p_{t_{ij}+t-1} < 1$, so that $P^t_{CC}(t_{ij}) > p_{t_{ij}+t-1}$ for any $t > 1$. Similarly, from (S2) (time-shifted from $t - 1$ to $t$), we see that $P^t_{CD}(t_{ij})$ is an interior convex combination of $\delta_\varepsilon$ (the smallest of the $p_t$ for $t \ge 1$) and the transformation of $P^{t-1}_{CD}$ by rule (S1). By construction, we hence have $P^t_{CD}(t_{ij}) < p_{t_{ij}+t-1}$ for any $t > 1$.
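Inequality (S5) can also be spot-checked numerically. The sketch below iterates both probabilities side by side, assuming the recursions take the forms $P^{t+1}_{CD} = \delta_\varepsilon + (1 - \delta_\varepsilon)P^t_{CD}(1 - P^t_{CD})$ and $P^{t+1}_{CC} = P^t_{CC} + (1 - P^t_{CC})\,p_{t_{ij}+t}$ (forms inferred from the verbal descriptions of (S2) and (S4)); the names are hypothetical and the check is no substitute for the proof.

```python
def p(t, d):
    """Probability to play p_t, with the convention p_0 = 1."""
    return 1.0 if t == 0 else 1.0 - (1.0 - d) ** t


def play_probabilities(t_ij, d, t_max):
    """Return a list of (P^t_CD, P^t_CC) for t = 1..t_max, both started at p_{t_ij}."""
    p_cd = p_cc = p(t_ij, d)
    out = [(p_cd, p_cc)]
    for t in range(1, t_max):
        # Assumed (S2): exploited -> delta_eps, otherwise rule (S1).
        p_cd = d + (1.0 - d) * p_cd * (1.0 - p_cd)
        # Assumed (S4): played -> plays again; otherwise p rises to p_{t_ij + t}.
        p_cc = p_cc + (1.0 - p_cc) * p(t_ij + t, d)
        out.append((p_cd, p_cc))
    return out


def inequality_holds(t_ij, d, t_max=30):
    """True if P^t_CC > P^t_CD for every t = 2..t_max."""
    return all(cc > cd for cd, cc in play_probabilities(t_ij, d, t_max)[1:])
```

For instance, `inequality_holds(1, 0.05)` covers the case tabulated in Table S1 (start from $p_1 = \delta_\varepsilon$).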

S4 The predicted payoff gain $\Delta\pi^h_C$
After any game round, each of the $k_i$ probabilities $p_{ij}$ of the C-individual $i$ is either 1 or equal to $p_{t_{ij}}$ for some $t_{ij} \ge 1$. In revising her strategy, $i$ assumes the strategy of $j$ unchanged since they last interacted. That is, $i$ believes $j$ to be a C if they either both cooperated at the last round or $j$ did not play because the last time they interacted $i$ exploited $j$, so that $p_{ji} < 1$, and then $i$ changed to C. In both situations, $p_{ij} = 1$. Conversely, $i$ believes $j$ to be a D if the last time they interacted $j$ exploited $i$, so that $p_{ij} < 1$. C's can hence underestimate the number of their C-neighbors. Let $k_{C,i}$ ($\le k_i$) be the number of C-neighbors of individual $i$ and $k'_{C,i}$ ($\le k_{C,i}$) the number of $p_{ij} = 1$. Assuming no strategy change in the neighborhood within the predictive horizon $h$ (see Methods in the main text for a discussion of this assumption), $i$ computes the expected payoffs accumulated in the next $h$ rounds behaving as D, $\pi^h_{CD}$, or as C, $\pi^h_{CC}$, as follows:
• Initialize $\pi^h_{CD} = 0$ and $\pi^h_{CC} = 0$ and consider the sums
$$S^h_{CD}(\tau) = \sum_{t=1}^{h} P^t_{CD}(\tau), \qquad S^h_{CC}(\tau) = \sum_{t=1}^{h} P^t_{CC}(\tau) \tag{S6}$$
(note that $t_{ij}$ must be replaced with $\tau$ in the right-hand side of recursion (S4) to define $P^t_{CC}(\tau)$ in terms of $P^{t-1}_{CC}(\tau)$);
• For each neighbor $j$ with $p_{ij} = 1$, add $(r - 1)S^h_{CC}(t_{ji})$ to $\pi^h_{CC}$ and $rS^h_{CD}(t_{ji})$ to $\pi^h_{CD}$;
• For each neighbor $j$ with $p_{ij} < 1$, subtract $S^h_{CD}(t_{ij})$ from $\pi^h_{CC}$.
The predicted payoff gain of changing to D is then
$$\Delta\pi^h_C = \pi^h_{CD} - \pi^h_{CC} = \sum_{j:\,p_{ij}=1}\big[rS^h_{CD}(t_{ji}) - (r - 1)S^h_{CC}(t_{ji})\big] + \sum_{j:\,p_{ij}<1} S^h_{CD}(t_{ij}). \tag{S7}$$
Note that the indexes $t_{ji}$ are involved in the computation, i.e., if $j$ refused to play with $i$ at the last game round, we assume that $i$ remembers when she last exploited $j$. Also note that
$$\Delta\pi^1_C = \sum_{j:\,p_{ij}=1} p_{t_{ji}} + \sum_{j:\,p_{ij}<1} p_{t_{ij}} > 0, \tag{S8}$$
i.e., defect is the best-response for $h = 1$.

Figure S6. The dynamics of the recursion (S2). (a) The right-hand side of (S2) for $\delta_\varepsilon = 0.05$ (solid thick line; the thin line is the diagonal) and the probabilities $P^t_{CD}(1)$, $t = 1, \dots, 10$ (dots). (b) The fixed point $P^\infty_{CD}$ and the asymptotic rate of convergence (the local slope $1 - \sqrt{\delta_\varepsilon(4 - 3\delta_\varepsilon)}$ of the right-hand side of (S2) at the fixed point) as a function of $\delta_\varepsilon$. The slope is in $(-1, 1)$ (asymptotic stability) for all $\delta_\varepsilon \in (0, 1)$ and in $(0, 1)$ (monotonic convergence) for $\delta_\varepsilon \in (0, 1/3)$.

Table S1. Probabilities $p_t$, $P^t_{CD}(1)$, and $P^t_{CC}(1)$ (i.e., starting with $P^1_{CD} = p_1 = \delta_\varepsilon$).

S5 The predicted payoff gain $\Delta\pi^h_D$
After any game round, the D-individual $i$ has probability $p_{ij} = 1$ toward each of her $k_i$ neighbors and full information, i.e., knowledge of the $p_{ji}$ of the $k_{C,i} \le k_i$ C-neighbors. In revising her strategy (assuming no strategy change in the neighborhood within the predictive horizon $h$, see Methods in the main text), $i$ computes the expected payoffs accumulated in the next $h$ rounds behaving as C, $\pi^h_{DC}$, or as D, $\pi^h_{DD}$, as follows:
• Initialize $\pi^h_{DC} = 0$ and $\pi^h_{DD} = 0$ and consider the sums $S^h_{CD}$ and $S^h_{CC}$ in (S6);
• For each C-neighbor $j$, add $(r - 1)S^h_{CC}(t_{ji})$ to $\pi^h_{DC}$;
• For each C-neighbor $j$, add $rS^h_{CD}(t_{ji})$ to $\pi^h_{DD}$;
• For each D-neighbor $j$, subtract $S^h_{CD}(1)$ from $\pi^h_{DC}$.
The predicted payoff gain of changing to C is then
$$\Delta\pi^h_D = \pi^h_{DC} - \pi^h_{DD} = \sum_{m_j=C}\big[(r - 1)S^h_{CC}(t_{ji}) - rS^h_{CD}(t_{ji})\big] - \sum_{m_j=D} S^h_{CD}(1), \tag{S9}$$
where $m_j \in \{C, D\}$ is the strategy of individual $j$ and the sums span the neighborhood of individual $i$. Note that
$$\Delta\pi^1_D = -\sum_{m_j=C} p_{t_{ji}} - (k_i - k_{C,i})\,\delta_\varepsilon < 0, \tag{S10}$$
i.e., defect is the best-response for $h = 1$.

S6 Proof of condition (2) and the threshold $r^\infty_C$
With an infinite predictive horizon ($h \to \infty$), the sums $S^h_{CD}(\tau)$ and $S^h_{CC}(\tau)$ in (S6) diverge with $S^h_{CD}/h \to P^\infty_{CD}$ and $S^h_{CC}/h \to 1$ independently of $\tau$. Consequently, for a C- and a D-individual with degree $k$ and $k_C \le k$ C-neighbors, the predicted payoff gains $\Delta\pi^h_C$ and $\Delta\pi^h_D$ (from eqs. (S7) and (S9)) are unbounded with
$$\frac{\Delta\pi^h_C}{h} \to k'_C\big[rP^\infty_{CD} - (r - 1)\big] + (k - k'_C)P^\infty_{CD}, \qquad \frac{\Delta\pi^h_D}{h} \to k_C\big[(r - 1) - rP^\infty_{CD}\big] - (k - k_C)P^\infty_{CD}, \tag{S11}$$
where, for a C, $k'_C \le k_C$ is the number of $p_{ij} = 1$.
Solving $\Delta\pi^h_C/h < 0$ and $\Delta\pi^h_D/h > 0$ in the limits in (S11) gives condition (2) in the main text. The threshold $r^\infty_C$ on $r$ above which cooperation fixates starting from any cluster of (at least two) C's is then obtained from (2) by setting $k$ to the largest degree $k_{\max}$ in the network and $k_C = 1$, i.e.,
$$r^\infty_C = 1 + \frac{k_{\max}\,P^\infty_{CD}}{1 - P^\infty_{CD}}. \tag{S12}$$
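The bookkeeping of Sects. S5 and S6 can be sketched in a few lines. This is a minimal illustration, not the authors' Matlab implementation: it assumes the sums in (S6) are $S^h_{CD}(\tau) = \sum_{t=1}^h P^t_{CD}(\tau)$ and $S^h_{CC}(\tau) = \sum_{t=1}^h P^t_{CC}(\tau)$, the recursions inferred for (S2) and (S4), and an infinite-horizon threshold of the form $1 + k_{\max}P^\infty_{CD}/(1 - P^\infty_{CD})$ obtained from the stated limits. All function and argument names are hypothetical.

```python
import math


def p(t, d):
    """Probability to play p_t, with the convention p_0 = 1."""
    return 1.0 if t == 0 else 1.0 - (1.0 - d) ** t


def sums(tau, d, h):
    """Return (S_CD, S_CC): sums over t = 1..h of P^t_CD(tau) and P^t_CC(tau)."""
    p_cd = p_cc = p(tau, d)
    s_cd, s_cc = p_cd, p_cc
    for t in range(1, h):
        p_cd = d + (1.0 - d) * p_cd * (1.0 - p_cd)   # assumed form of (S2)
        p_cc = p_cc + (1.0 - p_cc) * p(tau + t, d)   # assumed form of (S4)
        s_cd += p_cd
        s_cc += p_cc
    return s_cd, s_cc


def gain_d(r, t_ji_of_c_neighbors, n_d_neighbors, d, h):
    """Predicted gain of a D-individual from switching to C (bullets of Sect. S5)."""
    gain = 0.0
    for t_ji in t_ji_of_c_neighbors:
        s_cd, s_cc = sums(t_ji, d, h)
        gain += (r - 1.0) * s_cc - r * s_cd   # payoff as C minus payoff as D
    gain -= n_d_neighbors * sums(1, d, h)[0]  # exploitation by D-neighbors
    return gain


def p_cd_limit(d):
    """Fixed point of the assumed recursion for P^t_CD."""
    return (math.sqrt(d * (4.0 - 3.0 * d)) - d) / (2.0 * (1.0 - d))


def r_inf_threshold(k_max, d):
    """Infinite-horizon threshold for k = k_max and a single C-neighbor."""
    P = p_cd_limit(d)
    return 1.0 + k_max * P / (1.0 - P)
```

For a D-individual with one C-neighbor ($t_{ji} = 1$) and three D-neighbors, `gain_d(r, [1], 3, 0.05, h)` is negative for `h = 1` whatever `r`, and for large `h` it changes sign near `r_inf_threshold(4, 0.05)`.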

S7 The effect of the predictive horizon h
From eqs. (S7) and (S9), it immediately follows that the contributions to the predicted payoff gains $\Delta\pi^h_C$ and $\Delta\pi^h_D$ of one more prediction step are
$$\Delta\pi^{h+1}_C - \Delta\pi^h_C = \sum_{j:\,p_{ij}=1}\big[rP^{h+1}_{CD}(t_{ji}) - (r - 1)P^{h+1}_{CC}(t_{ji})\big] + \sum_{j:\,p_{ij}<1} P^{h+1}_{CD}(t_{ij}) \tag{S13}$$
and
$$\Delta\pi^{h+1}_D - \Delta\pi^h_D = \sum_{m_j=C}\big[(r - 1)P^{h+1}_{CC}(t_{ji}) - rP^{h+1}_{CD}(t_{ji})\big] - \sum_{m_j=D} P^{h+1}_{CD}(1). \tag{S14}$$
For any $h \ge 1$, the first can be made arbitrarily negative, the second arbitrarily positive, by a sufficiently large $r$. Any multi-step predictive horizon ($h \ge 2$) is hence sufficient for the evolution of rational reciprocity, provided the game return is large enough.
Instead, for a given $r$, the effect on cooperation of extending the predictive horizon is positive, i.e., $\Delta\pi^h_C$ and $\Delta\pi^h_D$ are respectively decreasing and increasing with $h \ge 1$, if
$$r > 1 + \frac{\sum_{j:\,p_{ij}=1} P^{h+1}_{CD}(t_{ji}) + \sum_{j:\,p_{ij}<1} P^{h+1}_{CD}(t_{ij})}{\sum_{j:\,p_{ij}=1}\big[P^{h+1}_{CC}(t_{ji}) - P^{h+1}_{CD}(t_{ji})\big]} \quad \text{and} \quad r > 1 + \frac{\sum_{m_j=C} P^{h+1}_{CD}(t_{ji}) + \sum_{m_j=D} P^{h+1}_{CD}(1)}{\sum_{m_j=C}\big[P^{h+1}_{CC}(t_{ji}) - P^{h+1}_{CD}(t_{ji})\big]}, \tag{S15}$$
obtained by solving $\Delta\pi^{h+1}_C - \Delta\pi^h_C < 0$ and $\Delta\pi^{h+1}_D - \Delta\pi^h_D > 0$ for $r$ from eqs. (S13) and (S14). The denominators in (S15) are minimized by $t_{ji} = 1$. Indeed, $t_{ji} = 0$ gives $P^{h+1}_{CC}(0) - P^{h+1}_{CD}(0) = 1 - P^{h+1}_{CD}(0)$, which is not smaller than $1 - P^\infty_{CD}$ whenever $P^{h+1}_{CD}(0)$ approaches its limit from below (as it does for $\delta_\varepsilon < 1/3$). And for $t_{ji} \ge 1$, $P^{h+1}_{CC}(t_{ji}) - P^{h+1}_{CD}(t_{ji})$ increases with $t_{ji}$ for any $\delta_\varepsilon \in (0, 1)$ and remains below $1 - P^\infty_{CD}$ for $t_{ji} = 1$ (checked numerically, see Fig. S7a for $h = 2, \dots, 5$). Since the quantity $P^{h+1}_{CC}(1) - P^{h+1}_{CD}(1)$ increases with $h$ (and converges to $1 - P^\infty_{CD}$ for large $h$, see Fig. S7b), a lower bound to the denominators in (S15) is $P^2_{CC}(1) - P^2_{CD}(1) = \delta_\varepsilon(1 - \delta_\varepsilon)$ (see Table S1), to be multiplied by $k'_C$ (the number of $p_{ij} = 1$) in the left condition and by $k_C$ (the number of C-neighbors of individual $i$) in the right one. An upper bound to the numerators is $k(1 + 3\delta_\varepsilon)/4$, obtained by replacing $P^{h+1}_{CD}(t_{ji})$, $P^{h+1}_{CD}(t_{ij})$, and $P^{h+1}_{CD}(1)$ with the maximal value assumed by the right-most side of (S2), i.e., with $\delta_\varepsilon + \tfrac{1}{4}(1 - \delta_\varepsilon)$. We then have $\Delta\pi^{h+1}_C - \Delta\pi^h_C < 0$ and $\Delta\pi^{h+1}_D - \Delta\pi^h_D > 0$ for any $h \ge 1$ under
$$r > 1 + \frac{k(1 + 3\delta_\varepsilon)}{4k'_C\,\delta_\varepsilon(1 - \delta_\varepsilon)} \quad \text{and} \quad r > 1 + \frac{k(1 + 3\delta_\varepsilon)}{4k_C\,\delta_\varepsilon(1 - \delta_\varepsilon)}, \tag{S16}$$
respectively. Note that condition (S16) might be very conservative, especially for small $\delta_\varepsilon$. Moreover, even if $\Delta\pi^h_C$ and $\Delta\pi^h_D$ are respectively increasing and decreasing with small $h$, they become and remain decreasing and increasing (under condition (2) in the main text) for sufficiently large $h$. Indeed, the right-hand sides in (S15) converge to the right-hand side of (2) for large $h$.
Solving $\Delta\pi^h_C < 0$ and $\Delta\pi^h_D > 0$ for $r$ (from eqs. (S7) and (S9)) gives
$$r > 1 + \frac{\sum_{j:\,p_{ij}=1} S^h_{CD}(t_{ji}) + \sum_{j:\,p_{ij}<1} S^h_{CD}(t_{ij})}{\sum_{j:\,p_{ij}=1}\big[S^h_{CC}(t_{ji}) - S^h_{CD}(t_{ji})\big]} \quad \text{and} \quad r > 1 + \frac{\sum_{m_j=C} S^h_{CD}(t_{ji}) + \sum_{m_j=D} S^h_{CD}(1)}{\sum_{m_j=C}\big[S^h_{CC}(t_{ji}) - S^h_{CD}(t_{ji})\big]}, \tag{S17}$$
from which we note that the $r$-threshold for a C to remain C (left) is typically lower than that for a D with the same neighborhood to switch to C (right). Indeed, the denominators in (S17) grow with $t_{ji}$, as each (positive) element $P^t_{CC}(t_{ji}) - P^t_{CD}(t_{ji})$ of the sum $S^h_{CC}(t_{ji}) - S^h_{CD}(t_{ji})$ does (see Fig. S7a), and the indexes $t_{ji}$ are expected to be higher for a C-individual $i$. In the following we derive an upper bound to $r^h_C$ consistent with the limit in (S12). For any $h \ge 2$, the threshold $r^h_C$ is the maximal value attained by the right-hand sides in (S17) over all possible choices of the node $i$ and over all possible configurations of its neighborhood (in terms of the strategies $m_i$ and $m_j$ and of the probabilities $p_{ij}$ and $p_{ji}$), restricting however the search to configurations with at least one known C-neighbor that can be reached from an initial state (a state with $p_{ij} = p_{ji} = 1$ for all connected pairs $(i, j)$ and $p_{ij} = p_{ji} = 0$ otherwise).
As noted above, the denominators in (S17) are minimized by $t_{ji} = 1$. Moreover, the sum over $j$ comprises a single element if individual $i$ knows to have only one C-neighbor. A lower bound to the denominators in (S17) is hence $S^h_{CC}(1) - S^h_{CD}(1)$, which can easily be tabulated w.r.t. $h$ from Table S1. In the numerator there are $k_i$ terms of the kind $S^h_{CD}(\tau)$. The value of $\tau \ge 0$ that maximizes $S^h_{CD}(\tau)$ unfortunately depends on both $h$ and $\delta_\varepsilon$, so we cannot upper bound the numerator by a specific choice of $\tau$. However, we note that $S^h_{CD}(\tau)$ is upper bounded by $1 + (h - 1)P^\infty_{CD}$ (checked numerically, see Fig. S8 for $h = 2, \dots, 5$). The threshold $r^h_C$ is hence upper bounded by
$$\bar{r}^h_C = 1 + \frac{k_{\max}\big(1 + (h - 1)P^\infty_{CD}\big)}{S^h_{CC}(1) - S^h_{CD}(1)}, \tag{S18}$$
obtained by taking $i$ as the node with highest degree $k_{\max}$. The bound $\bar{r}^h_C$ converges for large $h$ to $r^\infty_C$ (from above under condition (2); $S^h_{CC}(1)/h \to 1$ and $S^h_{CD}(1)/h \to P^\infty_{CD}$). It is confirmed by all our simulations, though it can be quite conservative, especially for small $\delta_\varepsilon$.
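The conservative bound on $r^h_C$ can be tabulated for the horizons used in the simulations. The sketch below is hedged: it assumes the bound has the form $1 + k_{\max}\big(1 + (h - 1)P^\infty_{CD}\big)/\big(S^h_{CC}(1) - S^h_{CD}(1)\big)$, which follows from the bounds on numerator and denominator stated in the text, together with the recursions inferred for (S2) and (S4). Names are hypothetical.

```python
import math


def p(t, d):
    """Probability to play p_t, with the convention p_0 = 1."""
    return 1.0 if t == 0 else 1.0 - (1.0 - d) ** t


def sums(tau, d, h):
    """Return (S_CD, S_CC) over the next h rounds, both started at p_tau."""
    p_cd = p_cc = p(tau, d)
    s_cd, s_cc = p_cd, p_cc
    for t in range(1, h):
        p_cd = d + (1.0 - d) * p_cd * (1.0 - p_cd)   # assumed form of (S2)
        p_cc = p_cc + (1.0 - p_cc) * p(tau + t, d)   # assumed form of (S4)
        s_cd += p_cd
        s_cc += p_cc
    return s_cd, s_cc


def p_cd_limit(d):
    """Fixed point of the assumed recursion for P^t_CD."""
    return (math.sqrt(d * (4.0 - 3.0 * d)) - d) / (2.0 * (1.0 - d))


def r_bar(k_max, d, h):
    """Assumed upper bound on the finite-horizon threshold r^h_C (h >= 2)."""
    s_cd, s_cc = sums(1, d, h)
    return 1.0 + k_max * (1.0 + (h - 1) * p_cd_limit(d)) / (s_cc - s_cd)


def r_inf(k_max, d):
    """Assumed infinite-horizon threshold, the large-h limit of r_bar."""
    P = p_cd_limit(d)
    return 1.0 + k_max * P / (1.0 - P)
```

Calling `r_bar(k_max, 0.05, h)` for `h = 2, ..., 5` shows how quickly the bound tightens with the horizon; for `h = 2` the denominator reduces to $\delta_\varepsilon(1 - \delta_\varepsilon)$, which explains why the bound is so loose at small $\delta_\varepsilon$.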

S9 Relevant probabilities for strategy update
Two probabilities regarding the process of strategy update have been used in discussing the analytical results in the main text. The first, $P_{\mathrm{update},k}$, is the probability that, in a (possibly infinite) sequence of game rounds, one or more of the $k \ge 1$ neighbors of a given individual revise their strategy before she does. The second, $P'_{\mathrm{update},k}$, is the probability that, in a (possibly infinite) sequence of game rounds in a star configuration, at least half of the $k \ge 2$ leaves revise their strategy (at least once) before the central one does ($k$ even).
To compute these two probabilities, recall that after each game round the probability that an individual with degree $k$ does not revise strategy, while some of her neighbors do, is
$$(1 - \delta)\big(1 - (1 - \delta)^k\big). \tag{S19}$$
Then, $P_{\mathrm{update},k}$ is obtained as the following geometric series
$$P_{\mathrm{update},k} = \sum_{t=0}^{\infty} (1 - \delta)^{(k+1)t}\,(1 - \delta)\big(1 - (1 - \delta)^k\big) = \frac{(1 - \delta)\big(1 - (1 - \delta)^k\big)}{1 - (1 - \delta)^{k+1}}, \tag{S20}$$
where the $t$-th element of the sum is the probability that the strategy update occurs after round $t + 1$ (i.e., with no revision after the first $t$ rounds). Note that, for small $\delta$, $P_{\mathrm{update},k}$ approaches $1 - 1/(k + 1)$ from below (use Hôpital or note that $(1 - \delta)^k = 1 - k\delta + O(\delta^2)$ by the binomial expansion). The probability $P'_{\mathrm{update},k}$ is too complex to be characterized analytically. However, for small $\delta$, we can assume that after each game round either none or at most one of the $k$ leaves of the star does revise strategy, i.e., we neglect the probability $1 - (1 - \delta)^k - k\delta(1 - \delta)^{k-1}$ that two or more leaves revise after the same round. Under this approximation, $P'_{\mathrm{update},k} > \tfrac{1}{2}$ for small $\delta$, whereas it obviously drops to zero as $\delta \to 1$ (for $\delta = 1$, all individuals revise strategy after each game round). Numerically (with a Monte-Carlo approach), we have checked that for $\delta = 0.05$ (the baseline value used in Fig. 3) $P'_{\mathrm{update},k}$ remains larger than $\tfrac{1}{2}$ for stars with up to 70 nodes, which is not far from the maximal degree of the scale-free networks used in our simulations (see Table S2).
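Both probabilities lend themselves to a quick numerical illustration. The sketch below evaluates the geometric series in closed form, assuming the per-round probability in (S19) is $(1 - \delta)(1 - (1 - \delta)^k)$ and that "no revision" in a round has probability $(1 - \delta)^{k+1}$, and estimates the star probability by Monte Carlo. The Monte Carlo makes one further assumption: a leaf counts as revising "before" the center only if its first revision falls in a strictly earlier round. It is a stand-in for, not a copy of, the authors' Monte-Carlo check; all names are hypothetical.

```python
import math
import random


def p_update(k, delta):
    """Closed form of the geometric series for the first probability."""
    q = 1.0 - delta
    return q * (1.0 - q ** k) / (1.0 - q ** (k + 1))


def first_revision_round(delta, rng):
    """Sample the round (1, 2, ...) of an individual's first strategy revision."""
    u = 1.0 - rng.random()                      # uniform in (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - delta))


def star_estimate(k, delta, trials, rng):
    """Monte Carlo estimate: at least half of k leaves revise before the center."""
    hits = 0
    for _ in range(trials):
        center = first_revision_round(delta, rng)
        early = sum(1 for _ in range(k)
                    if first_revision_round(delta, rng) < center)
        if 2 * early >= k:
            hits += 1
    return hits / trials
```

For example, `star_estimate(10, 0.05, 20000, random.Random(0))` gives an estimate for a star with ten leaves at the baseline value of $\delta$.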

S10 Stalemate and fluctuations in the network of Fig. 1b
Consider the network and the initial state of Fig. 1b, with i = 1, j = 2, nodes 3 to k 1 +1 being the k 1 −1 initial C-neighbors of 1, and nodes k 1 +2 to k 1 +k 2 being the k 2 − 1 initial D-neighbors of 2, k 1 , k 2 > 2.
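After the first game round, at which all individuals participate (i.e., no C abstains from playing), the strategy-revising C-individual 1 remains C if
$$r > r^0_{C,1} = 1 + \frac{(k_1 - 1)\,S^h_{CD}(0) + S^h_{CD}(1)}{(k_1 - 1)\big[S^h_{CC}(0) - S^h_{CD}(0)\big]} \tag{S22}$$
(from the left inequality in (S17), taking into account that 1 lowered to $p_1$ the probability to play with 2 at the second round, i.e., $t_{12} = 1$). The strategy-revising D-individual 2 changes to C if
$$r > r^0_{D,2} = 1 + \frac{k_2\,S^h_{CD}(1)}{S^h_{CC}(1) - S^h_{CD}(1)} \tag{S23}$$
(from the right inequality in (S17)). Strategy-revising C-individuals 3 to $k_1+1$ remain C under
$$r > r^0_{C,3} = 1 + \frac{S^h_{CD}(0)}{S^h_{CC}(0) - S^h_{CD}(0)} \tag{S24}$$
(from the left inequality in (S17) with $t_{i1} = t_{1i} = 0$, $i = 3, \dots, k_1+1$), a condition implied by (S22). Strategy-revising D-individuals $k_1+2$ to $k_1+k_2$ remain D because they have no C-neighbors. Assume that no one changes. After the second game round, if 1 played with 2, the situation is the same as after the first round; otherwise, after $a$ consecutive abstentions, the strategy-revising C-individual 1 remains C if
$$r > r^a_{C,1} = 1 + \frac{(k_1 - 1)\,S^h_{CD}(0) + S^h_{CD}(a+1)}{(k_1 - 1)\big[S^h_{CC}(0) - S^h_{CD}(0)\big]}, \tag{S25}$$
while the strategy-revising D-individual 2 changes to C if
$$r > r^a_{D,2} = 1 + \frac{S^h_{CD}(a+1) + (k_2 - 1)\,S^h_{CD}(1)}{S^h_{CC}(a+1) - S^h_{CD}(a+1)}. \tag{S26}$$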
Again, strategy-revising C-individuals 3 to $k_1+1$ remain C under (S24) and strategy-revising D-individuals $k_1+2$ to $k_1+k_2$ do not change strategy. The threshold $r^a_{C,1}$ is larger than $r^0_{C,1}$ for small $a$ ($S^h_{CD}(a+1)$ grows with $a \ge 0$ as long as $a$ is sufficiently small, see Fig. S8) and decreases with $k_1$, while $r^a_{D,2}$ decreases with $a$ for sufficiently large $k_2$ ($S^h_{CC}(a+1) - S^h_{CD}(a+1)$ grows with $a \ge 0$ as discussed in Sect. S8) and increases with $k_2$. For sufficiently large $k_1$ and $k_2$, we therefore have $\max_a r^a_{C,1} < \min_a r^a_{D,2}$. However, the opposite relation is also possible for any $a$, $k_1$, $k_2$, provided $\delta_\varepsilon$ is sufficiently small.
If $\max_a r^a_{C,1} < r < \min_a r^a_{D,2}$, the network is in a stalemate. The network can also reach a stalemate. Imagine node 2 to be initially a C, with its condition to remain such after the first round unsatisfied, i.e.,
$$r < 1 + \frac{S^h_{CD}(0) + (k_2 - 1)\,S^h_{CD}(1)}{S^h_{CC}(0) - S^h_{CD}(0)} \tag{S27}$$
(possible for sufficiently large $k_2$) and the condition for 1 to remain C met, that is $r > r^0_{C,3}$ from (S24). Then, if 2 changes to D, the network enters the stalemate.
But the same network can also produce long-term fluctuations. Consider the initial state of Fig. 1b with $r^0_{C,1} < r < r^a_{C,1} < \min_a r^a_{D,2}$ for some $a > 0$. Then, the C-individual 1 is the only one willing to change after $a$ consecutive abstentions. Once 1 switches to D, she will remain such for some game rounds. If in the meantime the C-neighbors 3 to $k_1+1$ do not change strategy and abstain for $a$ rounds, the condition for 1 to go back to C becomes
$$r > 1 + \frac{(k_1 - 1)\,S^h_{CD}(a+1) + S^h_{CD}(1)}{(k_1 - 1)\big[S^h_{CC}(a+1) - S^h_{CD}(a+1)\big]}, \tag{S28}$$
which is satisfied for large enough $a$ ($S^h_{CD}(a+1)$ converges to $S^h_{CD}(0)$ for large $a$, so that the right-hand side of (S28) approaches $r^0_{C,1}$). Once 1 switches back to C after a sufficiently long abstention of nodes 3 to $k_1+1$, the situation is essentially the one just after the first game round.
The above example shows that long-term fluctuations are possible, though sometimes difficult to observe. They might require specific sequences of events. Essentially, once an individual changes strategy and plays a game round, she is not in the condition to switch back. A D changes to C when the probabilities that most of her C-neighbors play in the next rounds are sufficiently high, otherwise she will mostly play with D-neighbors; and the strategy change further raises such probabilities, making the switch to D unattractive in the near future. Likewise, switching to D lowers the probabilities that the C-neighbors play in the next rounds, thus preventing a switch back to C in the near future.

S11 Networks
We used six standard types of networks (three regular and three random) of N = 1000 nodes and M = N⟨k⟩/2 links, ⟨k⟩ denoting the average degree. For each random type, we generated 100 networks. Details on the networks' structure and generation algorithms are given below. Table S2 reports several structural indicators (averaged over the 100 generated networks for random types). See Ref. 58 for further details on generation and analysis of complex networks.

S11.1 Regular networks
Planar lattices: rectangular lattices of N nodes with degree k = 4 (horizontal and vertical links-square lattices) and k = 8 (also including diagonal links) with periodic boundary conditions. We used lattices of 40 × 25 nodes. Ring lattices: loops of N nodes each connected to the k/2 nearest nodes in both left and right directions in the loop. We used only k = 4 (the degree k must be an even integer). Complete network: each node is connected to all N − 1 others.

S11.2 Random networks
Watts-Strogatz (WS) with full rewiring: WS rewiring of all left links of a degree-k ring lattice. 59 We use this model to generate single-scale random networks, i.e., networks with sufficiently narrow degree distribution (small variance), so that the average degree ⟨k⟩ well describes the 'scale' of the connections. The standard single-scale model is the Erdős-Rényi 58 (ER) random network, where each possible link is included with probability p = M/(N(N − 1)/2) = ⟨k⟩/(N − 1), with M = N⟨k⟩/2, and the binomial degree distribution, binomial(k, N − 1, p), converges for large N to the Poisson with parameter ⟨k⟩. The resulting network is however disconnected if ⟨k⟩/N is too small (it is disconnected with probability 1 if p < ln N/N, a condition that is met in our simulations with N = 1000 and ⟨k⟩ = 4). Though connectivity can be easily forced, the effects on the degree [...] is Poissonian-like for large N (see, e.g., Fig. 1b).
Barabási-Albert (BA): BA degree-rank preferential attachment of ⟨k⟩/2 links. 61 We use this model to generate scale-free random networks, i.e., networks with broad degree distribution (large variance), showing low- and high-connected nodes, in spite of their average degree. The BA algorithm produces networks with a degree distribution that is zero for k < ⟨k⟩/2 and converges, for large k and N, to a power law with exponent −3, hence showing a large variance (infinite variance in the limit N → ∞). The transitivity (the average over the network's nodes of the fraction of connected neighbor pairs) increases with ⟨k⟩ but vanishes as (ln N)^2/N for large N. 62
Holme-Kim (HK): HK scale-free networks with tunable transitivity. 63 We use this model to generate scale-free random networks with non-vanishing transitivity for large size. With a tunable probability (that we set to 1), the HK algorithm alternates steps of degree-rank preferential attachment with steps in which a triangle is closed between the new node, the last preferred node and a neighbor of the latter. The resulting transitivity does not vanish with N and decreases with ⟨k⟩, because the higher number of triangles closed with larger ⟨k⟩ does not compensate for the increased number of possibilities. The theoretical degree distribution for large N is the same as that of the BA model. However, w.r.t. a finite BA network, the HK algorithm raises the number of low- and high-connected nodes to the detriment of nodes with intermediate degree (checked on average in our 100 networks). Imagine, e.g., raising transitivity by rewiring. One option is to increase the number of hub-hub connections, to close triangles with common leaves. This is achieved by detaching the termination of a hub's link to reconnect it to another hub. The latter node gains a link, while the node that loses it moves left in the degree distribution. For another option to increase transitivity in scale-free networks (by introducing the small-world property) see Ref. 62.
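The structural quantities used in this section are easy to compute on small examples. The sketch below builds a ring lattice as described in Sect. S11.1 and evaluates the transitivity as defined above (average over nodes of the fraction of connected neighbor pairs). It is a pure-Python illustration with hypothetical function names; treating nodes with fewer than two neighbors as contributing zero is an assumption.

```python
def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k/2 nearest nodes on both sides (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def transitivity(adj):
    """Average over nodes of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for v, nbrs in adj.items():
        nb = sorted(nbrs)
        pairs = len(nb) * (len(nb) - 1) // 2
        if pairs == 0:
            continue  # assumption: nodes with < 2 neighbors contribute zero
        closed = sum(1 for a in range(len(nb)) for b in range(a + 1, len(nb))
                     if nb[b] in adj[nb[a]])
        total += closed / pairs
    return total / len(adj)


def average_degree(adj):
    """Average degree, i.e. 2M/N."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)
```

A ring lattice with k = 4 has three linked pairs among the six neighbor pairs of every node, so its transitivity is 0.5 regardless of N (for N large enough that the ring does not wrap onto itself).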

S12 Numerical simulations
For each panel of Figs. 3 and S1-S5 and for each value of the predictive horizon (h = 2, . . . , 5), we first ran simulations to identify the thresholds $r^h_{C,\min}$ and $r^h_{C,\max}$ (all simulations reach all-D for $r \le r^h_{C,\min}$; all-C for $r \ge r^h_{C,\max}$). We then ran simulations for r in the open interval $(r^h_{C,\min}, r^h_{C,\max})$ using an equally spaced grid with a resolution of about 0.2 (except for Fig. S2b, in which the scale of r is larger). For a given network of N nodes (labeled 1 to N) and assigned model parameters (δ, ε, h, r) (see Table S3), we used the following simulation procedure (implemented in Matlab).

Initial state
Initial C-level: we used either 1% or 50%. Let N C denote the number of initial C's, i.e., N C = 10 or 500 for our networks of N = 1000 nodes. Random placement of the initial C's: The first N C nodes of a random (uniform distribution) permutation of the integers {1, . . . , N} are set to C; all others are set to D.
Degree-rank placement of the initial C's: the N C nodes with highest degree are set to C (random choice, if needed, among the nodes sharing the smallest selected degree); all others are set to D. Probabilities to play: p i j = p ji = 1 for all connected pairs (i, j); p i j = p ji = 0 otherwise.
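The two placements of the initial C's can be sketched as follows. This is an illustrative Python version (the simulations themselves were implemented in Matlab); function names are hypothetical, nodes are labeled from 0 rather than 1, and ties at the smallest selected degree are broken by a random shuffle followed by a stable sort.

```python
import random


def random_placement(n, n_c, rng):
    """Set to C the first n_c nodes of a uniform random permutation of 0..n-1."""
    perm = list(range(n))
    rng.shuffle(perm)
    return set(perm[:n_c])


def degree_rank_placement(degrees, n_c, rng):
    """Set to C the n_c nodes with highest degree, breaking ties at random."""
    nodes = list(range(len(degrees)))
    rng.shuffle(nodes)                      # random order among equal degrees
    nodes.sort(key=lambda v: -degrees[v])   # stable sort keeps the shuffled ties
    return set(nodes[:n_c])


def initial_play_probabilities(adj):
    """p_ij = 1 for all connected pairs; unconnected pairs are simply absent."""
    return {(i, j): 1.0 for i, nbrs in adj.items() for j in nbrs}
```

With N = 1000, `random_placement(1000, 10, rng)` and `random_placement(1000, 500, rng)` correspond to the 1% and 50% initial C-levels.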

Random number generation
For each simulation, we used two independent random number generators, one to generate the network (for random network models) and the initial state, and one to perform the simulation (game interaction and strategy update). The pair of initialization seeds for the two generators uniquely identify the simulation and are stored to allow reproduction.

Simulation length and outcome
Max. number of game rounds: 500/δ game rounds (= 10^4 for our baseline value of δ). It is the time length within which each individual revises strategy 500 times, on average. Early termination: termination in all-C or all-D before the last game round. There is no reason to continue the simulation, as all-C and all-D are both invariant states (trivial stalemates).
Outcome: 1/0 in case of termination in all-C/D; otherwise, the average C-level (fraction of C-nodes) over the last 100/δ (20% of) game rounds (after strategy update).

Classification
Trivial stalemate: termination in all-C or all-D. Nontrivial stalemate: termination different from all-C and all-D with no strategy change in the last 100/δ (20% of) game rounds. There is of course no guarantee that this criterion identifies real stalemates. In practice, we classify as nontrivial stalemates the cases in which evolution slows down so much as to be in a 'practical' stalemate. Long-term fluctuation: termination different from all-C and all-D with some strategy change in the last 100/δ game rounds yielding a regression slope within ±10^−6 (the slope of the regression line over the last 100/δ game rounds does not exceed ±1 individual over 1000 game rounds). This is also a practical criterion. Non-convergence: termination different from all-C and all-D with some strategy change in the last 100/δ game rounds yielding a regression slope exceeding ±10^−6.
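The classification rules above can be written as a small decision function. The sketch below is one possible reading, not the original Matlab code: it takes the per-round C-level (fraction of C-nodes after strategy update) and a per-round flag saying whether any strategy changed, both hypothetical inputs, and fits an ordinary least-squares line to the C-level over the last 100/δ rounds.

```python
def ols_slope(y):
    """Slope of the least-squares line through (0, y[0]), (1, y[1]), ..."""
    n = len(y)
    x_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    num = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def classify(c_levels, changed, delta, tol=1e-6):
    """Classify a finished run from its C-level series and change flags."""
    if c_levels[-1] in (0.0, 1.0):
        return "trivial stalemate"
    window = round(100 / delta)             # last 100/delta game rounds
    if not any(changed[-window:]):
        return "nontrivial stalemate"
    slope = ols_slope(c_levels[-window:])
    if abs(slope) <= tol:
        return "long-term fluctuation"
    return "non-convergence"
```

With N = 1000, a slope of 10^−6 per round in the C-level corresponds to one individual per 1000 game rounds, which is the practical tolerance quoted in the text.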

Results
We applied the above classification to all simulations of Figs. 3 and S1-S5 performed for $r^h_{C,\min} < r < r^h_{C,\max}$, i.e., for values of the game return r and the predictive horizon h for which the simulations ended neither all in all-C nor all in all-D. The results, grouped by type of network and initialization (i.e., by figure panel), are reported in Table S4. (With respect to the predictive horizon h within each panel, we note that the occurrence of stalemates and fluctuations and the corresponding levels of C slightly increase up to h = 4 or 5.) For Fig. 3 only, the simulations that did not end in all-C or all-D were extended to a ten-times longer timescale to validate the classification (see rows labeled 'l.t.' in Table S4).

Trivial stalemates
• They constitute a significant fraction of outcomes independently of the network's structure and initialization, with all-D dominating close to $r^h_{C,\min}$ and all-C close to $r^h_{C,\max}$.
• They are the majority when starting from 1% initial C's, especially in regular and single-scale networks, and their frequencies slightly increase (especially all-C) when the simulations are extended to a longer timescale (compare regular with 'l.t.' rows in the classification of Fig. 3), not only because of the reduced fractions of non-convergent simulations, but also because apparent nontrivial stalemates and fluctuations eventually end up in all-C or all-D.
• They seem to occur less frequently for 50% initial C's (see the classification of Figs. S1 and S2b,d), though part of the effect is due to the slower convergence (see the fractions of non-convergent simulations and the comment below on convergence).

• Non-convergent simulations are more frequent when starting from 50% initial C's. The evolution of cooperation is indeed slower, because the values of r for which cooperation persists from 50% are lower than those for which cooperation invades. At low r, a D needs to have a significant fraction of C-neighbors to change to C (see condition (2) in the main text), so she will wait longer (i.e., more strategy updates) before opting for a strategy change, compared to a case with higher r. In contrast, if r allows the invasion of C's, D's are willing to change strategy with few C-neighbors, and this speeds up the spread of C's. The classification of simulations starting from 50% initial C's is therefore imprecise on our timescale. It nevertheless gives the interesting insights discussed above on nontrivial stalemates and fluctuations.
• The connectivity of the network speeds up the evolution of cooperation (compare the fractions of non-convergent simulations for k = 4 and k = 8), especially when starting from 1% initial C's. More connections favor the spread of the better strategy, and the higher values of r required for the invasion of cooperation further contribute to the faster convergence.
• Network heterogeneity seems to slow down the evolution of cooperation, because of possible bottlenecks in the network structure. The effect is amplified under degree-rank C-placement, as the thresholds $r^h_{C,\min}$ and $r^h_{C,\max}$ are lower than in single-scale networks.
• Finally, note that the majority of non-convergent simulations have a positive regression slope, meaning that the C-level observed on our timescale is underestimated. This is confirmed by the extended simulations performed for Fig. 3 (compare regular with 'l.t.' rows in Table S4), where all-C systematically gains with respect to the other outcomes.

[Figure S9 panels: (a) Scale-free network, strategic placement of initial C's; (b) Scale-free network, random placement of initial C's. Horizontal axis: time (game rounds).]
Figure S9. Long-term fluctuations. Two examples observed in scale-free (BA) networks starting from 1% randomly placed (a) and 50% degree-rank placed (b) initial C's. Other parameters: see legends and baseline values in Table S3. Both simulations are classified as long-term fluctuations on the timescale of $10^4$ game rounds, i.e., based on rounds from $0.8 \times 10^4$ to $10^4$ (shaded area); average asymptotic C-level 0.140 (a), 0.492 (b); fluctuation amplitude 0.008 (a), 0.011 (b).

Table S4. Classification of the simulations performed for $r^h_{C,\min} < r < r^h_{C,\max}$, grouped by panel of Figs. 3 and S1-S5. For each panel, the table reports the fractions of the six possible outcomes (all-C, all-D, stalemate, fluctuation, non-convergence with positive/negative regression slope). To quantify stalemates and fluctuations, the table reports the mean and standard deviation ($C$ and $\sigma_C$) of the average asymptotic C-level and, only for fluctuations, the mean and standard deviation ($A$ and $\sigma_A$) of the oscillation's amplitude (the min-max excursion of the asymptotic C-level). To validate the classification, the simulations of Fig. 3 have been extended over a ten-times longer timescale (5000/δ = $10^5$ game rounds, the last 20% of which are used for the classification; see row label 'l.t.'), showing no significant change.