Evolution of cooperation in stochastic games

Hilbe, Christian; Šimsa, Štěpán; Chatterjee, Krishnendu; Nowak, Martin A.

doi:10.1038/s41586-018-0277-x

Letter
Published: 04 July 2018

Christian Hilbe^{1,2}, Štěpán Šimsa^{3}, Krishnendu Chatterjee^{2} & Martin A. Nowak^{1,4}

Nature volume 559, pages 246–249 (2018)

Abstract

Social dilemmas occur when incentives for individuals are misaligned with group interests^{1,2,3,4,5,6,7}. According to the ‘tragedy of the commons’, these misalignments can lead to overexploitation and collapse of public resources. The resulting behaviours can be analysed with the tools of game theory^{8}. The theory of direct reciprocity^{9,10,11,12,13,14,15} suggests that repeated interactions can alleviate such dilemmas, but previous work has assumed that the public resource remains constant over time. Here we introduce the idea that the public resource is instead changeable and depends on the strategic choices of individuals. An intuitive scenario is that cooperation increases the public resource, whereas defection decreases it. Thus, cooperation allows the possibility of playing a more valuable game with higher payoffs, whereas defection leads to a less valuable game. We analyse this idea using the theory of stochastic games^{16,17,18,19} and evolutionary game theory. We find that the dependence of the public resource on previous interactions can greatly enhance the propensity for cooperation. For these results, the interaction between reciprocity and payoff feedback is crucial: neither repeated interactions in a constant environment nor single interactions in a changing environment yield similar cooperation rates. Our framework shows which feedbacks between exploitation and environment, either naturally occurring or designed, help to overcome social dilemmas.
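
To make the idea of a game that changes with the players' choices concrete, the following minimal Python sketch simulates a two-state stochastic game without errors. It is an illustration under stated assumptions, not the authors' code: both states are modelled as donation games with benefits b₁ = 2.0 and b₂ = 1.2 and cost c = 1 (the default values quoted in the Extended Data captions), players are assumed to be in state 1 only after mutual cooperation, and the function names and the convention for the first round are hypothetical.

```python
C, D = 1, 0  # cooperate, defect

def donation_payoffs(a1, a2, b, c):
    """Payoffs (player 1, player 2): a cooperator pays c, the co-player gains b."""
    return (b * a2 - c * a1, b * a1 - c * a2)

def next_state(a1, a2):
    # Assumed feedback: state 1 after mutual cooperation, state 2 otherwise.
    return 1 if (a1 == C and a2 == C) else 2

def wsls(my_prev, other_prev):
    # Win-stay, lose-shift: cooperate only if both players chose the same action.
    return C if my_prev == other_prev else D

def all_d(my_prev, other_prev):
    # Always defect, regardless of history.
    return D

def average_payoffs(s1, s2, rounds, b=(2.0, 1.2), c=1.0):
    """Average per-round payoffs of two memory-one strategies (no errors)."""
    state = 1
    prev = (C, C)  # assumption: the first round is played as if after mutual cooperation
    totals = [0.0, 0.0]
    for _ in range(rounds):
        a1 = s1(prev[0], prev[1])
        a2 = s2(prev[1], prev[0])
        p1, p2 = donation_payoffs(a1, a2, b[state - 1], c)
        totals[0] += p1
        totals[1] += p2
        state = next_state(a1, a2)
        prev = (a1, a2)
    return totals[0] / rounds, totals[1] / rounds

if __name__ == "__main__":
    print(average_payoffs(wsls, wsls, 100))
    print(average_payoffs(wsls, all_d, 4))
```

In this sketch two WSLS players stay in the valuable state 1, whereas a defector facing WSLS pushes both players into the less valuable state 2, which reduces the defector's own future gains from exploitation.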

**Fig. 1: In stochastic games, the decisions made by players in one round determine the game that will be played next round.**

**Fig. 2: Stochastic games can promote cooperation even if all individual games favour defection.**

**Fig. 3: Probabilistic transitions maximize cooperation in three different stochastic games.**

**Fig. 4: Strong immediate feedback maximizes cooperation.**

References

1. Lloyd, W. F. Two Lectures on the Checks to Population (Oxford Univ. Press, Oxford, 1833).
2. Hardin, G. The tragedy of the commons. Science 162, 1243–1248 (1968).
3. Trivers, R. L. The evolution of reciprocal altruism. Q. Rev. Biol. 46, 35–57 (1971).
4. Axelrod, R. The Evolution of Cooperation (Basic Books, New York, NY, 1984).
5. Ostrom, E. Governing the Commons: The Evolution of Institutions for Collective Action (Cambridge Univ. Press, Cambridge, 1990).
6. Nowak, M. A. Five rules for the evolution of cooperation. Science 314, 1560–1563 (2006).
7. Van Lange, P. A. M., Balliet, D., Parks, C. D. & Van Vugt, M. Social Dilemmas – The Psychology of Human Cooperation (Oxford Univ. Press, Oxford, 2015).
8. Sigmund, K. The Calculus of Selfishness (Princeton Univ. Press, Princeton, 2010).
9. Nowak, M. & Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature 364, 56–58 (1993).
10. Hauert, C. & Schuster, H. G. Effects of increasing the number of players and memory size in the iterated prisoner’s dilemma: a numerical approach. Proc. R. Soc. Lond. B 264, 513–519 (1997).
11. Killingback, T. & Doebeli, M. The continuous prisoner’s dilemma and the evolution of cooperation through reciprocal altruism with variable investment. Am. Nat. 160, 421–438 (2002).
12. Szolnoki, A., Perc, M. & Szabó, G. Phase diagrams for three-strategy evolutionary prisoner’s dilemma games on regular graphs. Phys. Rev. E 80, 056104 (2009).
13. Grujić, J., Cuesta, J. A. & Sánchez, A. On the coexistence of cooperators, defectors and conditional cooperators in the multiplayer iterated Prisoner’s Dilemma. J. Theor. Biol. 300, 299–308 (2012).
14. García, J. & van Veelen, M. In and out of equilibrium I: evolution of strategies in repeated games with discounting. J. Econ. Theory 161, 161–189 (2016).
15. Hilbe, C., Chatterjee, K. & Nowak, M. A. Partners and rivals in direct reciprocity. Nat. Hum. Behav. (2018).
16. Shapley, L. S. Stochastic games. Proc. Natl Acad. Sci. USA 39, 1095–1100 (1953).
17. Neyman, A. & Sorin, S. (eds) Stochastic Games and Applications (Kluwer Academic Press, Dordrecht, 2003).
18. Mertens, J. F. & Neyman, A. Stochastic games. Int. J. Game Theory 10, 53–66 (1981).
19. Mertens, J. F. & Neyman, A. Stochastic games have a value. Proc. Natl Acad. Sci. USA 79, 2145–2146 (1982).
20. Rand, D. G. & Nowak, M. A. Human cooperation. Trends Cogn. Sci. 17, 413–425 (2013).
21. Ledyard, J. O. in The Handbook of Experimental Economics (eds Kagel, J. H. & Roth, A. E.) 111–194 (Princeton Univ. Press, Princeton, 1995).
22. Milinski, M., Sommerfeld, R. D., Krambeck, H.-J., Reed, F. A. & Marotzke, J. The collective-risk social dilemma and the prevention of simulated dangerous climate change. Proc. Natl Acad. Sci. USA 105, 2291–2294 (2008).
23. Alur, R., Henzinger, T. & Kupferman, O. Alternating-time temporal logic. J. Assoc. Comput. Mach. 49, 672–713 (2002).
24. Miltersen, P. B. & Sorensen, T. B. A near-optimal strategy for a heads-up no-limit Texas Hold’em poker tournament. In Proc. 6th International Joint Conference on Autonomous Agents and Multiagent Systems 191 (ACM, 2007).
25. Ashcroft, P., Altrock, P. M. & Galla, T. Fixation in finite populations evolving in fluctuating environments. J. R. Soc. Interface 11, 20140663 (2014).
26. Gokhale, C. S. & Hauert, C. Eco-evolutionary dynamics of social dilemmas. Theor. Popul. Biol. 111, 28–42 (2016).
27. Hauert, C., Holmes, M. & Doebeli, M. Evolutionary games and population dynamics: maintenance of cooperation in public goods games. Proc. R. Soc. Lond. B 273, 2565–2570 (2006); corrigendum 273, 3131–313 (2006).
28. Weitz, J. S., Eksin, C., Paarporn, K., Brown, S. P. & Ratcliff, W. C. An oscillating tragedy of the commons in replicator dynamics with game-environment feedback. Proc. Natl Acad. Sci. USA 113, E7518–E7525 (2016).
29. Tavoni, A., Schlüter, M. & Levin, S. The survival of the conformist: social pressure and renewable resource management. J. Theor. Biol. 299, 152–161 (2012).
30. Traulsen, A., Nowak, M. A. & Pacheco, J. M. Stochastic dynamics of invasion and fixation. Phys. Rev. E 74, 011909 (2006).
31. Neyman, A. Continuous-time stochastic games. Games Econ. Behav. 104, 92–130 (2017).
32. Nowak, M. A. & Sigmund, K. The evolution of stochastic strategies in the prisoner’s dilemma. Acta Appl. Math. 20, 247–265 (1990).
33. Ohtsuki, H. & Iwasa, Y. The leading eight: social norms that can maintain cooperation by indirect reciprocity. J. Theor. Biol. 239, 435–444 (2006).
34. Stewart, A. J. & Plotkin, J. B. Collapse of cooperation in evolving games. Proc. Natl Acad. Sci. USA 111, 17558–17563 (2014).
35. Pinheiro, F. L., Vasconcelos, V. V., Santos, F. C. & Pacheco, J. M. Evolution of all-or-none strategies in repeated public goods dilemmas. PLOS Comput. Biol. 10, e1003945 (2014).
36. Akin, E. in Ergodic Theory, Advances in Dynamics (ed. Assani, I.) 77–107 (de Gruyter, Berlin, 2016).
37. Hilbe, C., Martinez-Vaquero, L. A., Chatterjee, K. & Nowak, M. A. Memory-n strategies of direct reciprocity. Proc. Natl Acad. Sci. USA 114, 4715–4720 (2017).
38. Stewart, A. J. & Plotkin, J. B. Small groups and long memories promote cooperation. Sci. Rep. 6, 26889 (2016).
39. Reiter, J. G., Hilbe, C., Rand, D. G., Chatterjee, K. & Nowak, M. A. Crosstalk in concurrent repeated games impedes direct reciprocity and requires stronger levels of forgiveness. Nat. Commun. 9, 555 (2018).
40. Fudenberg, D. & Imhof, L. A. Imitation processes with small mutations. J. Econ. Theory 131, 251–262 (2006).

Acknowledgements

This work was supported by the European Research Council Start Grant 279307: Graph Games (to K.C.), Austrian Science Fund (FWF) grant P23499-N23 (to K.C.), FWF NFN grant S11407-N23 Rigorous Systems Engineering/Systematic Methods in Systems Engineering (to K.C.), Office of Naval Research Grant N00014-16-1-2914 (to M.A.N.) and the John Templeton Foundation (to M.A.N.). C.H. acknowledges support from the ISTFELLOW programme.

Reviewer information

Nature thanks A. Neyman and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Authors and Affiliations

Program for Evolutionary Dynamics, Harvard University, Cambridge, MA, USA
Christian Hilbe & Martin A. Nowak
IST Austria, Klosterneuburg, Austria
Christian Hilbe & Krishnendu Chatterjee
Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
Štěpán Šimsa
Department of Organismic and Evolutionary Biology, Department of Mathematics, Harvard University, Cambridge, MA, USA
Martin A. Nowak

Contributions

All authors conceived the study, performed the analysis, discussed the results and wrote the manuscript.

Corresponding authors

Correspondence to Christian Hilbe, Krishnendu Chatterjee or Martin A. Nowak.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Our findings are robust with respect to parameter changes.

To test the robustness of our findings, we consider the stochastic game introduced in Fig. 2a and independently vary several key parameters. a, b, When we vary the benefit of cooperation in state 1, we find that the advantage of the stochastic game is most pronounced when this benefit is intermediate, 1.5 ≤ b₁ ≤ 2.5. This conclusion holds independently of whether individuals use pure strategies only (a) or stochastic ones (b). c–f, We obtain similar results when we vary the error rate ε (c), the strength of selection β (d), the discount factor δ (e) and the mutation rate μ (f). In all cases, we observe that stochastic games yield a cooperation premium, provided that errors are sufficiently rare, selection is sufficiently strong, players give sufficient weight to future payoffs and mutations are comparatively rare. Solid lines indicate exact results in the limit of rare mutations, whereas square symbols and dashed lines represent simulation results (see Supplementary Information for details). Filled circles highlight the results obtained for the parameters in Fig. 2a. As default parameters, we used the same values as in Fig. 2a: N = 100, b₁ = 2.0, b₂ = 1.2, c = 1, β = 1, ε = 0.001, δ → 1 and μ → 0.
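
The roles of the population size N and the selection strength β in the limit of rare mutations can be illustrated with a short sketch. The caption does not spell out the update rule, so the code below assumes a pairwise-comparison (Fermi) imitation process of the kind analysed by Traulsen et al. (ref. 30) and uses the standard fixation-probability formula for that process. The function name and the payoff arguments are hypothetical, and the details in the Supplementary Information may differ.

```python
import math

def fixation_probability(pi_mm, pi_mr, pi_rm, pi_rr, N=100, beta=1.0):
    """Probability that a single mutant takes over a resident population.

    pi_xy is the payoff of an x-player against a y-player (m = mutant,
    r = resident). Assumes a pairwise-comparison process in which a player
    imitates a role model with probability 1 / (1 + exp(-beta * payoff_gap)).
    """
    total = 1.0
    log_prod = 0.0
    for j in range(1, N):  # j = current number of mutants
        # Average payoffs when j mutants and N - j residents are present.
        pi_m = ((j - 1) * pi_mm + (N - j) * pi_mr) / (N - 1)
        pi_r = (j * pi_rm + (N - j - 1) * pi_rr) / (N - 1)
        # Running product of backward/forward transition ratios, kept in log space.
        log_prod += -beta * (pi_m - pi_r)
        total += math.exp(log_prod)
    return 1.0 / total

if __name__ == "__main__":
    # Neutral mutant: fixation probability equals 1 / N.
    print(fixation_probability(1.0, 1.0, 1.0, 1.0, N=100, beta=1.0))
```

For β = 0 the payoffs become irrelevant and the function returns 1/N; for larger β, mutants with a payoff advantage fix with higher probability. Very large payoff gaps could overflow `math.exp`, which this sketch does not guard against.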

Extended Data Fig. 2 Whether cooperation evolves in two-player games depends critically on the form of the environmental feedback.

Keeping the game parameters fixed at the values used in Fig. 2a, we explored how the evolution of cooperation depends on the underlying transition structure of the stochastic game in the limit of rare mutations (see Supplementary Information). a–h, We calculated the selection–mutation equilibrium for all possible stochastic games with two states when transitions are state-independent and deterministic. i, Overall, six of the eight transition structures lead players to spend more time in the more profitable state 1, in which mutual cooperation has a higher benefit. j, However, cooperation evolves in only two out of these six transition structures. These two structures have in common that mutual cooperation always leads to the beneficial state 1, whereas mutual defection leads to the detrimental state 2. Thus, cooperation is most likely to evolve if the environmental feedback itself incentivizes mutual cooperation and disincentivizes mutual defection. The transitions after unilateral defection have a less prominent role.
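
The count of eight transition structures can be reproduced with a few lines of Python. This sketch assumes that a state-independent, deterministic transition is fully described by the next state reached after two, one or zero players cooperate, which gives 2³ = 8 possibilities; the dictionary representation and the function name are hypothetical.

```python
from itertools import product

def transition_structures():
    """All maps from the number of cooperators (2, 1, 0) to the next state (1 or 2)."""
    return [dict(zip((2, 1, 0), states)) for states in product((1, 2), repeat=3)]

structures = transition_structures()

# Structures in which mutual cooperation leads to state 1 and mutual
# defection leads to state 2; only the outcome after a unilateral
# defection (one cooperator) is left free.
rewarding = [t for t in structures if t[2] == 1 and t[0] == 2]

if __name__ == "__main__":
    print(len(structures), len(rewarding))
```

Under this representation exactly two structures reward mutual cooperation and punish mutual defection, matching the two cooperative cases described in the caption.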

Extended Data Fig. 3 Analysis of the evolving strategies suggests that the evolution of cooperation hinges on the success of WSLS.

Here, we consider all state-invariant and deterministic stochastic games with two states and two players. a–h, For each of the eight possible cases, we recorded the evolving cooperation rate (lower plots) and the relative abundance of each pure memory-one strategy (upper plots) for different values of b₁. For clarity, we depict only two memory-one strategies explicitly, All D (the strategy that prescribes to always defect) and WSLS. The colour-shaded bars on top of the upper plots show parameter regimes in which either All D or WSLS is most abundant among all 16 strategies. In four of the eight cases, we observe that full cooperation evolves as the benefit to cooperation in state 1 approaches b₁ = 3. These are exactly the cases in which mutual cooperation leads players towards the more beneficial state 1. Moreover, in these four cases the upper plots show that cooperation emerges owing to the success of WSLS, which is the predominant strategy whenever cooperation prevails. Except for the value of b₁, all other parameter values are the same as in Extended Data Fig. 2.

Extended Data Fig. 4 Effect of transitions on cooperation in four-player public-goods games.

We also explored the effect of different transition structures for stochastic games between multiple players (with a public-goods game being played in each state). State 1 is again more beneficial because r₁ > r₂, but to be in state 1 there must be a minimum number k of cooperators in the previous round. a–f, For a four-player public-goods game, there are six possible monotonic configurations of the stochastic game because k can be any number from 0 (players always move to the first state) to 5 (players never move to the first state). h, There is a non-monotonic relationship between the six transition structures and the time spent in the more beneficial state 1. g, The evolving cooperation rate becomes maximal when any deviation from mutual cooperation leads players to state 2 (e). Parameters are as in Fig. 2b, but with the multiplication factor in the first state fixed to r₁ = 2 and selection strength β = 1; to derive exact results, we considered the limit of rare mutations μ → 0 (see Supplementary Information for details).
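
The threshold rule for the multiplayer case can be sketched as follows. The payoff function uses the standard linear public-goods game (contributions are multiplied by r and shared equally), which is an assumption about the exact form used in the paper; the function names are hypothetical, and the value of r₂ is not given in this caption, so it is left as a parameter.

```python
def public_goods_payoffs(actions, r, c=1.0):
    """Payoffs in a linear public-goods game.

    actions: list of 1 (contribute c) or 0 (do not contribute).
    Contributions are multiplied by r and split equally among all players.
    """
    n = len(actions)
    share = r * c * sum(actions) / n
    return [share - c * a for a in actions]

def next_state(actions, k):
    """Monotonic transition: state 1 if at least k players cooperated, else state 2."""
    return 1 if sum(actions) >= k else 2

if __name__ == "__main__":
    group = [1, 1, 1, 0]  # three cooperators, one defector
    print(public_goods_payoffs(group, r=2.0))
    # k ranges from 0 (always state 1) to 5 (never state 1) for four players.
    print([next_state(group, k) for k in range(6)])
```

With k = 4, any deviation from full cooperation sends the group to state 2, which corresponds to the configuration the caption identifies as maximizing cooperation.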

Extended Data Fig. 5 WSLS sustains cooperation in multiplayer public-goods games.

This figure is analogous to Extended Data Fig. 3 for the case of multiplayer interactions. Again, we show evolving cooperation rates and the relative abundance of All D and WSLS for the six state-independent and deterministic games in which transitions are monotonic. In five of these games, cooperation emerges once the multiplication factor r₁ becomes sufficiently large. In all of those, WSLS is the most abundant strategy when cooperation evolves. Except for r₁, all parameters are the same as in Extended Data Fig. 4.

Extended Data Fig. 6 Probabilistic transitions can further enhance cooperation.

a, Here, we explore in more detail the stochastic game introduced in Fig. 3a (see Supplementary Information for details), in which any defection always leads to state 2. After mutual cooperation in state 1, players remain in state 1 with certainty. After mutual cooperation in state 2, players move towards state 1 with probability q. b, Calculating the cooperation rate in the selection–mutation equilibrium in the limit of rare mutations shows that the highest cooperation rate is achieved for intermediate values of q. c, We recorded the abundance of all 32 memory-one strategies in the selection–mutation equilibrium. The most abundant strategy is either All D (for small values of q, as indicated by the red squares), WSLS (for small but positive values of q, green circles) or AWSLS (for all other values of q, yellow triangles; AWSLS is a more ambitious variant of WSLS, see Supplementary Information, section 4.1). d, To estimate the time that it takes each resident strategy to be invaded, we randomly introduced other mutant strategies and recorded how long it took until a mutant successfully fixed (that is, the number of independent mutant strategies introduced before the mutant strategy was adopted by the whole population). To obtain a reliable estimate, we performed 10,000 runs for each resident strategy. e, f, In addition, we recorded which strategy eventually reaches fixation if the resident applies either All D or WSLS when q = 1. Parameters: b₁ = 1.9, b₂ = 1.4, c = 1, β = 1, N = 100.
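
The probabilistic transition rule described in panel a can be written down directly. The sketch below encodes only what the caption states (any defection leads to state 2; mutual cooperation keeps players in state 1 or returns them to it with probability q); the function names are hypothetical.

```python
import random

def prob_state1(state, a1, a2, q):
    """Probability of being in state 1 next round (actions: 1 = cooperate, 0 = defect)."""
    if not (a1 == 1 and a2 == 1):
        return 0.0          # any defection always leads to state 2
    if state == 1:
        return 1.0          # mutual cooperation in state 1: remain there
    return q                # mutual cooperation in state 2: recover with probability q

def sample_next_state(state, a1, a2, q, rng=random):
    """Draw the next state according to prob_state1."""
    return 1 if rng.random() < prob_state1(state, a1, a2, q) else 2

if __name__ == "__main__":
    q = 0.25
    # Under sustained mutual cooperation, the return to state 1 takes a
    # geometrically distributed number of rounds with mean 1 / q.
    print(prob_state1(2, 1, 1, q), 1 / q)
```

A small q therefore makes a lapse into state 2 long-lasting even for players who resume cooperation, whereas q = 1 recovers the deterministic transition structure.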

Extended Data Fig. 7 Players benefit from a small endogenous risk that the game stops early.

a, We consider the stochastic game in Fig. 3b, in which players remain in state 1 after cooperation, but move towards state 2 with transition probability q if one of the players defects. In state 2, no profitable interactions are possible. All results are discussed in detail in Supplementary Information; here we provide a summary. b, According to our evolutionary simulations, a higher transition probability leads to more cooperation. c, However, a higher probability q also makes players move to the second state if one of them defected merely owing to an error; hence, the dependence of payoffs on q is non-monotonic. d, e, When q is small, Grim is the predominant strategy. Players with this strategy cooperate until one of the players defects; from then on, they defect forever. As q increases, WSLS strategies take over. As q → 1, unconditional cooperation becomes most successful. f, For the given parameter values, a homogeneous Grim population achieves only one-third of the maximum payoff possible, because any error leads to relentless defection. The other three strategies result in the maximum payoff b₁ − c for q = 0, but this payoff decreases with q. Parameters: b₁ = 2, c = 1, δ = 0.999, ε = 0.001, β = 1, N = 100.
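
The one-third figure for Grim can be made plausible with a back-of-the-envelope estimate. This is a rough sketch rather than the authors' calculation: it assumes q = 0 (so players stay in state 1), that payoffs are discounted averages with weights (1 − δ)δ^t, that each player errs independently with probability ε per round, and it ignores the small payoffs earned in and after the first round with an error.

```latex
% Probability that both Grim players have cooperated without error
% in every round up to and including round t (t = 0, 1, 2, ...):
%   (1-\varepsilon)^{2(t+1)}
\frac{\pi_{\text{Grim}}}{b_1 - c}
  \;\approx\; (1-\delta)\sum_{t=0}^{\infty} \delta^{t}\,(1-\varepsilon)^{2(t+1)}
  \;=\; \frac{(1-\delta)\,(1-\varepsilon)^{2}}{1-\delta\,(1-\varepsilon)^{2}}
  \;=\; \frac{0.001 \times 0.998001}{1 - 0.999 \times 0.998001}
  \;\approx\; 0.333
```

With δ = 0.999 and ε = 0.001, the expected game length and the expected time until the first error are of the same order, which is why roughly two-thirds of the attainable payoff is lost to permanent defection.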

Extended Data Fig. 8 Immediate environmental feedback enhances cooperation.

a, We consider a state-dependent stochastic game with two players and three states. Mutual cooperation always leads players to move to a superior state (or to remain in the most beneficial state s₁). Similarly, mutual defection always leads to an inferior state (or players remain in the most detrimental state s₃). After a unilateral defection, players remain in the same state. We consider four different versions of this game, depending on how quickly the payoffs decrease as players move towards an inferior state. b, Our numerical results show that an immediate negative response of the environment to defection is most favourable to the evolution of cooperation. c, As a consequence, the scenario with immediate consequences also yields the highest average payoffs once the benefit in state 1 exceeds a moderate threshold. d–g, On the level of evolving strategies, we find that an immediately responding environment is most favourable to the evolution of WSLS strategies and strongly selects against defecting strategies. Again, the coloured bars on top of each panel indicate the strategy that is most favoured by selection for the respective value of b₁ (see Supplementary Information for all details). Parameters: c = 1; b₁ varies from 1 to 3; b₂ is equal to c, (b₁ + c)/2 or b₁; and b₃ is equal to either c or b₁ depending on the scenario considered (as depicted in a); N = 100, β = 1, δ → 1, ε = 0.001.

Extended Data Fig. 9 Cooperation in stochastic games requires that players take future payoff consequences into account.

We repeated the numerical computations in Extended Data Fig. 8 for various discount factors δ. When players focus entirely on the present (δ = 0), cooperation evolves in none of the four treatments. As players increasingly take future payoffs into account, cooperation rates increase. Immediate payoff feedback is most conducive to cooperation across all values of δ considered. Except for the discount factor, parameters are the same as in Extended Data Fig. 8, with b₁ = 1.8.
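
How δ weights present against future payoffs can be seen in a small sketch. It assumes the common normalization in which the payoff of round t receives weight (1 − δ)δ^t; the caption does not state the exact definition, and the function name is hypothetical.

```python
def discounted_average(payoffs, delta):
    """Normalized discounted payoff: (1 - delta) * sum_t delta**t * payoffs[t]."""
    return (1.0 - delta) * sum(delta ** t * p for t, p in enumerate(payoffs))

if __name__ == "__main__":
    stream = [3.0, 0.0, 0.0, 0.0]  # a one-off gain followed by nothing
    for delta in (0.0, 0.5, 0.9):
        print(delta, discounted_average(stream, delta))
```

For δ = 0 only the first round counts, so a one-off gain from defection is all that matters; as δ grows, later rounds, and hence the state that current actions lead to, carry more weight.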

Extended Data Fig. 10 A systematic analysis of the expected game dynamics for different game payoffs.

Keeping the two-player game in state 2 fixed to the game in Fig. 2a, we varied the game that is played in state 1. We assume that payoffs in the first state are 1 (for mutual cooperation), S₁ (for unilateral cooperation), T₁ (for unilateral defection) and 0 (for mutual defection). Depending on T₁ and S₁, game 1 can be one of four different types: harmony game (HG), snowdrift game (SD), stag-hunt game (SH) or prisoner’s dilemma (PD); see Supplementary Information for details. For each of the eight possible state-independent transitions q, we systematically varied the temptation payoff T₁ (x axis) and the sucker’s payoff S₁ (y axis) in the first state (see Supplementary Information for details). For each combination of T₁, S₁ and q, we computed how often players cooperate in the selection–mutation equilibrium (left panels) and in what fraction of rounds they switch from one state to the other (right panels). a–c, e, Full cooperation can evolve when players find themselves in state 1 after mutual cooperation. d, f, Players learn to switch between states only when mutual cooperation leads to state 2 and mutual defection leads to state 1. g, h, In the remaining cases, players hardly cooperate. The payoffs in game 2 are the same as in Fig. 2a—a prisoner’s dilemma with b₂ = 1.2 and c = 1. For the evolutionary parameters we considered population size N = 100 and selection strength β = 1.
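
The four game types can be distinguished by comparing T₁ with the mutual-cooperation payoff 1 and S₁ with the mutual-defection payoff 0. The sketch below uses the conventional textbook classification, which is assumed to match the one in the Supplementary Information; the function name and the treatment of boundary cases (T₁ = 1 or S₁ = 0) are arbitrary choices.

```python
def classify_game(T1, S1):
    """Classify a symmetric 2x2 game with payoffs R = 1, S = S1, T = T1, P = 0."""
    if T1 > 1:
        # Defecting against a cooperator pays more than mutual cooperation.
        return "SD" if S1 > 0 else "PD"
    # Mutual cooperation beats unilateral defection.
    return "HG" if S1 > 0 else "SH"

if __name__ == "__main__":
    for T1, S1 in [(0.5, 0.5), (1.5, 0.5), (0.5, -0.5), (1.5, -0.5)]:
        print(T1, S1, classify_game(T1, S1))
```

The four example points fall into the harmony, snowdrift, stag-hunt and prisoner's dilemma quadrants, respectively.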

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion, Supplementary Table 1 and Supplementary References. Supplementary Table 1 provides several examples of memory-one strategies of stochastic games.

Reporting Summary

About this article

Cite this article

Hilbe, C., Šimsa, Š., Chatterjee, K. et al. Evolution of cooperation in stochastic games. Nature 559, 246–249 (2018). https://doi.org/10.1038/s41586-018-0277-x

Received: 01 November 2017
Accepted: 17 May 2018
Published: 04 July 2018
Issue Date: 12 July 2018
DOI: https://doi.org/10.1038/s41586-018-0277-x
