Letter | Published:

Evolution of cooperation in stochastic games

Nature volume 559, pages 246–249 (2018)


Social dilemmas occur when incentives for individuals are misaligned with group interests1,2,3,4,5,6,7. According to the ‘tragedy of the commons’, these misalignments can lead to overexploitation and collapse of public resources. The resulting behaviours can be analysed with the tools of game theory8. The theory of direct reciprocity9,10,11,12,13,14,15 suggests that repeated interactions can alleviate such dilemmas, but previous work has assumed that the public resource remains constant over time. Here we introduce the idea that the public resource is instead changeable and depends on the strategic choices of individuals. An intuitive scenario is that cooperation increases the public resource, whereas defection decreases it. Thus, cooperation allows the possibility of playing a more valuable game with higher payoffs, whereas defection leads to a less valuable game. We analyse this idea using the theory of stochastic games16,17,18,19 and evolutionary game theory. We find that the dependence of the public resource on previous interactions can greatly enhance the propensity for cooperation. For these results, the interaction between reciprocity and payoff feedback is crucial: neither repeated interactions in a constant environment nor single interactions in a changing environment yield similar cooperation rates. Our framework shows which feedbacks between exploitation and environment—either naturally occurring or designed—help to overcome social dilemmas.
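
The feedback described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' code. It assumes a two-state donation game in which a cooperator pays a cost c so that the co-player receives a benefit b1 (state 1) or b2 (state 2), and it assumes that players reach state 1 only after mutual cooperation. The values b1 = 2.0, b2 = 1.2 and c = 1 are the defaults quoted in the Extended Data captions; the names `play`, `next_state`, `WSLS` and `ALLD` are hypothetical.

```python
# Illustrative sketch: two players repeatedly interact in a two-state game.
# Assumed payoff form (donation game): cooperating costs C and gives the
# co-player the benefit B[state]; defecting costs nothing and gives nothing.
B = {1: 2.0, 2: 1.2}   # benefit of cooperation in state 1 and state 2
C = 1.0                # cost of cooperation

# Pure memory-one rules: (own last action, co-player's last action) -> next action
WSLS = {("C", "C"): "C", ("C", "D"): "D", ("D", "C"): "D", ("D", "D"): "C"}
ALLD = {outcome: "D" for outcome in WSLS}


def next_state(a1, a2):
    """Assumed feedback: only mutual cooperation leads to the valuable state 1."""
    return 1 if (a1, a2) == ("C", "C") else 2


def play(rule1, rule2, first=("C", "C"), rounds=1000):
    """Return the average per-round payoffs of both players (no errors, no discounting)."""
    state = 1
    a1, a2 = first
    total1 = total2 = 0.0
    for _ in range(rounds):
        b = B[state]
        total1 += b * (a2 == "C") - C * (a1 == "C")
        total2 += b * (a1 == "C") - C * (a2 == "C")
        state = next_state(a1, a2)
        # Both players update simultaneously from the same previous outcome.
        a1, a2 = rule1[(a1, a2)], rule2[(a2, a1)]
    return total1 / rounds, total2 / rounds


if __name__ == "__main__":
    print("WSLS vs WSLS:", play(WSLS, WSLS))
    print("All D vs All D:", play(ALLD, ALLD, first=("D", "D")))
    print("WSLS vs All D:", play(WSLS, ALLD, first=("C", "D")))
```

Under these assumptions, two WSLS players stay in state 1 and each earn b1 − c per round, whereas a defector facing WSLS moves both players into state 2, where exploitation pays only b2 in every second round.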


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


  1. Lloyd, W. F. Two Lectures on the Checks to Population (Oxford Univ. Press, Oxford, 1833).
  2. Hardin, G. The tragedy of the commons. Science 162, 1243–1248 (1968).
  3. Trivers, R. L. The evolution of reciprocal altruism. Q. Rev. Biol. 46, 35–57 (1971).
  4. Axelrod, R. The Evolution of Cooperation (Basic Books, New York, NY, 1984).
  5. Ostrom, E. Governing the Commons: The Evolution of Institutions for Collective Action (Cambridge Univ. Press, Cambridge, 1990).
  6. Nowak, M. A. Five rules for the evolution of cooperation. Science 314, 1560–1563 (2006).
  7. Van Lange, P. A. M., Balliet, D., Parks, C. D. & Van Vugt, M. Social Dilemmas – The Psychology of Human Cooperation (Oxford Univ. Press, Oxford, 2015).
  8. Sigmund, K. The Calculus of Selfishness (Princeton Univ. Press, Princeton, 2010).
  9. Nowak, M. & Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature 364, 56–58 (1993).
  10. Hauert, C. & Schuster, H. G. Effects of increasing the number of players and memory size in the iterated prisoner’s dilemma: a numerical approach. Proc. R. Soc. Lond. B 264, 513–519 (1997).
  11. Killingback, T. & Doebeli, M. The continuous prisoner’s dilemma and the evolution of cooperation through reciprocal altruism with variable investment. Am. Nat. 160, 421–438 (2002).
  12. Szolnoki, A., Perc, M. & Szabó, G. Phase diagrams for three-strategy evolutionary prisoner’s dilemma games on regular graphs. Phys. Rev. E 80, 056104 (2009).
  13. Grujić, J., Cuesta, J. A. & Sánchez, A. On the coexistence of cooperators, defectors and conditional cooperators in the multiplayer iterated Prisoner’s Dilemma. J. Theor. Biol. 300, 299–308 (2012).
  14. García, J. & van Veelen, M. In and out of equilibrium I: evolution of strategies in repeated games with discounting. J. Econ. Theory 161, 161–189 (2016).
  15. Hilbe, C., Chatterjee, K. & Nowak, M. A. Partners and rivals in direct reciprocity. Nat. Hum. Behav. (2018).
  16. Shapley, L. S. Stochastic games. Proc. Natl Acad. Sci. USA 39, 1095–1100 (1953).
  17. Neyman, A. & Sorin, S. (eds) Stochastic Games and Applications (Kluwer Academic Press, Dordrecht, 2003).
  18. Mertens, J. F. & Neyman, A. Stochastic games. Int. J. Game Theory 10, 53–66 (1981).
  19. Mertens, J. F. & Neyman, A. Stochastic games have a value. Proc. Natl Acad. Sci. USA 79, 2145–2146 (1982).
  20. Rand, D. G. & Nowak, M. A. Human cooperation. Trends Cogn. Sci. 17, 413–425 (2013).
  21. Ledyard, J. O. in The Handbook of Experimental Economics (eds Kagel, J. H. & Roth, A. E.) 111–194 (Princeton Univ. Press, Princeton, 1995).
  22. Milinski, M., Sommerfeld, R. D., Krambeck, H.-J., Reed, F. A. & Marotzke, J. The collective-risk social dilemma and the prevention of simulated dangerous climate change. Proc. Natl Acad. Sci. USA 105, 2291–2294 (2008).
  23. Alur, R., Henzinger, T. & Kupferman, O. Alternating-time temporal logic. J. Assoc. Comput. Mach. 49, 672–713 (2002).
  24. Miltersen, P. B. & Sorensen, T. B. A near-optimal strategy for a heads-up no-limit Texas hold’em poker tournament. In Proc. 6th International Joint Conference on Autonomous Agents and Multiagent Systems 191 (ACM, 2007).
  25. Ashcroft, P., Altrock, P. M. & Galla, T. Fixation in finite populations evolving in fluctuating environments. J. R. Soc. Interface 11, 20140663 (2014).
  26. Gokhale, C. S. & Hauert, C. Eco-evolutionary dynamics of social dilemmas. Theor. Popul. Biol. 111, 28–42 (2016).
  27. Hauert, C., Holmes, M. & Doebeli, M. Evolutionary games and population dynamics: maintenance of cooperation in public goods games. Proc. R. Soc. Lond. B 273, 2565–2570 (2006); corrigendum 273, 3131–313 (2006).
  28. Weitz, J. S., Eksin, C., Paarporn, K., Brown, S. P. & Ratcliff, W. C. An oscillating tragedy of the commons in replicator dynamics with game-environment feedback. Proc. Natl Acad. Sci. USA 113, E7518–E7525 (2016).
  29. Tavoni, A., Schlüter, M. & Levin, S. The survival of the conformist: social pressure and renewable resource management. J. Theor. Biol. 299, 152–161 (2012).
  30. Traulsen, A., Nowak, M. A. & Pacheco, J. M. Stochastic dynamics of invasion and fixation. Phys. Rev. E 74, 011909 (2006).
  31. Neyman, A. Continuous-time stochastic games. Games Econ. Behav. 104, 92–130 (2017).
  32. Nowak, M. A. & Sigmund, K. The evolution of stochastic strategies in the prisoner’s dilemma. Acta Appl. Math. 20, 247–265 (1990).
  33. Ohtsuki, H. & Iwasa, Y. The leading eight: social norms that can maintain cooperation by indirect reciprocity. J. Theor. Biol. 239, 435–444 (2006).
  34. Stewart, A. J. & Plotkin, J. B. Collapse of cooperation in evolving games. Proc. Natl Acad. Sci. USA 111, 17558–17563 (2014).
  35. Pinheiro, F. L., Vasconcelos, V. V., Santos, F. C. & Pacheco, J. M. Evolution of all-or-none strategies in repeated public goods dilemmas. PLOS Comput. Biol. 10, e1003945 (2014).
  36. Akin, E. in Ergodic Theory, Advances in Dynamics (ed. Assani, I.) 77–107 (de Gruyter, Berlin, 2016).
  37. Hilbe, C., Martinez-Vaquero, L. A., Chatterjee, K. & Nowak, M. A. Memory-n strategies of direct reciprocity. Proc. Natl Acad. Sci. USA 114, 4715–4720 (2017).
  38. Stewart, A. J. & Plotkin, J. B. Small groups and long memories promote cooperation. Sci. Rep. 6, 26889 (2016).
  39. Reiter, J. G., Hilbe, C., Rand, D. G., Chatterjee, K. & Nowak, M. A. Crosstalk in concurrent repeated games impedes direct reciprocity and requires stronger levels of forgiveness. Nat. Commun. 9, 555 (2018).
  40. Fudenberg, D. & Imhof, L. A. Imitation processes with small mutations. J. Econ. Theory 131, 251–262 (2006).



This work was supported by the European Research Council Start Grant 279307: Graph Games (to K.C.), Austrian Science Fund (FWF) grant P23499-N23 (to K.C.), FWF NFN grant S11407-N23 Rigorous Systems Engineering/Systematic Methods in Systems Engineering (to K.C.), Office of Naval Research Grant N00014-16-1-2914 (to M.A.N.) and the John Templeton Foundation (to M.A.N.). C.H. acknowledges support from the ISTFELLOW programme.

Reviewer information

Nature thanks A. Neyman and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information


  1. Program for Evolutionary Dynamics, Harvard University, Cambridge, MA, USA

    • Christian Hilbe
    •  & Martin A. Nowak
  2. IST Austria, Klosterneuburg, Austria

    • Christian Hilbe
    •  & Krishnendu Chatterjee
  3. Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

    • Štěpán Šimsa
  4. Department of Organismic and Evolutionary Biology, Department of Mathematics, Harvard University, Cambridge, MA, USA

    • Martin A. Nowak




All authors conceived the study, performed the analysis, discussed the results and wrote the manuscript.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Christian Hilbe or Krishnendu Chatterjee or Martin A. Nowak.

Extended data figures and tables

  1. Extended Data Fig. 1 Our findings are robust with respect to parameter changes.

    To test the robustness of our findings, we consider the stochastic game introduced in Fig. 2a and independently vary several key parameters. a, b, When we vary the benefit of cooperation in state 1, we find that the advantage of the stochastic game is most pronounced when this benefit is intermediate, 1.5 ≤ b1 ≤ 2.5. This conclusion holds independently of whether individuals use pure strategies only (a) or stochastic ones (b). c–f, We obtain similar results when we vary the error rate ε (c), the strength of selection β (d), the discount factor δ (e) and the mutation rate μ (f). In all cases, we observe that stochastic games yield a cooperation premium, provided that errors are sufficiently rare, selection is sufficiently strong, players give sufficient weight to future payoffs and mutations are comparably rare. Solid lines indicate exact results in the limit of rare mutations, whereas square symbols and dashed lines represent simulation results (see Supplementary Information for details). Filled circles highlight the results obtained for the parameters in Fig. 2a. As default parameters, we used the same values as in Fig. 2a: N = 100, b1 = 2.0, b2 = 1.2, c = 1, β = 1, ε = 0.001, δ → 1 and μ → 0.

  2. Extended Data Fig. 2 Whether cooperation evolves in two-player games depends critically on the form of the environmental feedback.

    Keeping the game parameters fixed at the values used in Fig. 2a, we explored how the evolution of cooperation depends on the underlying transition structure of the stochastic game in the limit of rare mutations (see Supplementary Information). a–h, We calculated the selection–mutation equilibrium for all possible stochastic games with two states when transitions are state-independent and deterministic. i, Overall, six of the eight transition structures lead players to spend more time in the more profitable state 1, in which mutual cooperation has a higher benefit. j, However, cooperation evolves in only two out of these six transition structures. These two structures have in common that mutual cooperation always leads to the beneficial state 1, whereas mutual defection leads to the detrimental state 2. Thus, cooperation is most likely to evolve if the environmental feedback itself incentivizes mutual cooperation and disincentivizes mutual defection. The transitions after unilateral defection have a less prominent role.

  3. Extended Data Fig. 3 Analysis of the evolving strategies suggests that the evolution of cooperation hinges on the success of WSLS.

    Here, we consider all state-invariant and deterministic stochastic games with two states and two players. a–h, For each of the eight possible cases, we recorded the evolving cooperation rate (lower plots) and the relative abundance of each pure memory-one strategy (upper plots) for different values of b1. For clarity, we depict only two memory-one strategies explicitly, All D (the strategy that prescribes to always defect) and WSLS. The colour-shaded bars on top of the upper plots show parameter regimes in which either All D or WSLS is most abundant among all 16 strategies. In four of the eight cases, we observe that full cooperation evolves as the benefit to cooperation in state 1 approaches b1 = 3. These are exactly the cases in which mutual cooperation leads players towards the more beneficial state 1. Moreover, in these four cases the upper plots show that cooperation emerges owing to the success of WSLS, which is the predominant strategy whenever cooperation prevails. Except for the value of b1, all other parameter values are the same as in Extended Data Fig. 2.

  4. Extended Data Fig. 4 Effect of transitions on cooperation in four-player public-goods games.

    We also explored the effect of different transition structures for stochastic games between multiple players (with a public-goods game being played in each state). State 1 is again more beneficial because r1 > r2, but to be in state 1 there must be a minimum number k of cooperators in the previous round. a–f, For a four-player public-goods game, there are six possible monotonic configurations of the stochastic game because k can be any number from 0 (players always move to first state) to 5 (players never move to first state). h, There is a non-monotonic relationship between the six transition structures and the time spent in the more beneficial state 1. g, The evolving cooperation rate becomes maximal when any deviation from mutual cooperation leads players to state 2 (e). Parameters are as in Fig. 2b, but with the multiplication factor in the first state fixed to r1 = 2 and selection strength β = 1; to derive exact results, we considered the limit of rare mutations μ → 0 (see Supplementary Information for details).

  5. Extended Data Fig. 5 WSLS sustains cooperation in multiplayer public-goods games.

    This figure is analogous to Extended Data Fig. 3 for the case of multiplayer interactions. Again, we show evolving cooperation rates and the relative abundance of All D and WSLS for the six state-independent and deterministic games in which transitions are monotonic. In five of these games, cooperation emerges once the multiplication factor r1 becomes sufficiently large. In all of those, WSLS is the most abundant strategy when cooperation evolves. Except for r1, all parameters are the same as in Extended Data Fig. 4.

  6. Extended Data Fig. 6 Probabilistic transitions can further enhance cooperation.

    a, Here, we explore in more detail the stochastic game introduced in Fig. 3a (see Supplementary Information for details), in which any defection always leads to state 2. After mutual cooperation in state 1, players remain in state 1 with certainty. After mutual cooperation in state 2, players move towards state 1 with probability q. b, Calculating the cooperation rate in the selection–mutation equilibrium in the limit of rare mutations shows that the highest cooperation rate is achieved for intermediate values of q. c, We recorded the abundance of all 32 memory-one strategies in the selection–mutation equilibrium. The most abundant strategy is either All D (for small values of q, as indicated by the red squares), WSLS (for small but positive values of q, green circles) or AWSLS (for all other values of q, yellow triangles; AWSLS is a more ambitious variant of WSLS, see Supplementary Information, section 4.1). d, To estimate the time that it takes each resident strategy to be invaded, we randomly introduced other mutant strategies and recorded how long it took until a mutant successfully fixed (that is, the number of independent mutant strategies introduced before the mutant strategy was adopted by the whole population). To obtain a reliable estimate, we performed 10,000 runs for each resident strategy. e, f, In addition, we recorded which strategy eventually reaches fixation if the resident applies either All D or WSLS when q = 1. Parameters: b1 = 1.9, b2 = 1.4, c = 1, β = 1, N = 100.

  7. Extended Data Fig. 7 Players benefit from a small endogenous risk that the game stops early.

    a, We consider the stochastic game in Fig. 3b, in which players remain in state 1 after cooperation, but move towards state 2 with transition probability q if one of the players defects. In state 2, no profitable interactions are possible. All results are discussed in detail in Supplementary Information; here we provide a summary. b, According to our evolutionary simulations, a higher transition probability leads to more cooperation. c, However, a higher probability q also makes players move to the second state if one of them defected merely owing to an error; hence, the dependence of payoffs on q is non-monotonic. d, e, When q is small, Grim is the predominant strategy. Players with this strategy cooperate until one of the players defects; from then on, they defect forever. As q increases, WSLS strategies take over. As q → 1, unconditional cooperation becomes most successful. f, For the given parameter values, a homogeneous Grim population achieves only one-third of the maximum payoff possible, because any error leads to relentless defection. The other three strategies result in the maximum payoff b1 − c for q = 0, but this payoff decreases with q. Parameters: b1 = 2, c = 1, δ = 0.999, ε = 0.001, β = 1, N = 100.

  8. Extended Data Fig. 8 Immediate environmental feedback enhances cooperation.

    a, We consider a state-dependent stochastic game with two players and three states. Mutual cooperation always leads players to move to a superior state (or to remain in the most beneficial state s1). Similarly, mutual defection always leads to an inferior state (or players remain in the most detrimental state s3). After a unilateral defection, players remain in the same state. We consider four different versions of this game, depending on how quickly the payoffs decrease as players move towards an inferior state. b, Our numerical results show that an immediate negative response of the environment to defection is most favourable to the evolution of cooperation. c, As a consequence, the scenario with immediate consequences also yields the highest average payoffs once the benefit in state 1 exceeds a moderate threshold. d–g, On the level of evolving strategies, we find that an immediately responding environment is most favourable to the evolution of WSLS strategies and strongly selects against defecting strategies. Again, the coloured bars on top of each panel indicate the strategy that is most favoured by selection for the respective value of b1 (see Supplementary Information for all details). Parameters: c = 1; b1 varies from 1 to 3; b2 is equal to c, (b1 + c)/2 or b1; and b3 is equal to either c or b1 depending on the scenario considered (as depicted in a); N = 100, β = 1, δ → 1, ε = 0.001.

  9. Extended Data Fig. 9 Cooperation in stochastic games requires that players take future payoff consequences into account.

    We repeated the numerical computations in Extended Data Fig. 8 for various discount rates δ. When players focus entirely on the present (δ = 0), cooperation evolves in none of the four treatments. As players increasingly take future payoffs into account, cooperation rates increase. Immediate payoff feedback is most conducive to cooperation across all values of δ considered. Except for the discount rate, parameters are the same as in Extended Data Fig. 8, with b1 = 1.8.

  10. Extended Data Fig. 10 A systematic analysis of the expected game dynamics for different game payoffs.

    Keeping the two-player game in state 2 fixed to the game in Fig. 2a, we varied the game that is played in state 1. We assume that payoffs in the first state are 1 (for mutual cooperation), S1 (for unilateral cooperation), T1 (for unilateral defection) and 0 (for mutual defection). Depending on T1 and S1, game 1 can be one of four different types: harmony game (HG), snowdrift game (SD), stag-hunt game (SH) or prisoner’s dilemma (PD); see Supplementary Information for details. For each of the eight possible state-independent transitions q, we systematically varied the temptation payoff T1 (x axis) and the sucker’s payoff S1 (y axis) in the first state (see Supplementary Information for details). For each combination of T1, S1 and q, we computed how often players cooperate in the selection–mutation equilibrium (left panels) and in what fraction of rounds they switch from one state to the other (right panels). a–c, e, Full cooperation can evolve when players find themselves in state 1 after mutual cooperation. d, f, Players learn to switch between states only when mutual cooperation leads to state 2 and mutual defection leads to state 1. g, h, In the remaining cases, players hardly cooperate. The payoffs in game 2 are the same as in Fig. 2a: a prisoner’s dilemma with b2 = 1.2 and c = 1. For the evolutionary parameters we considered population size N = 100 and selection strength β = 1.
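
The parameter grid behind Extended Data Fig. 10 (and the eight transition structures of Extended Data Figs. 2 and 3) can be sketched in a few lines. The code below is a hedged illustration, not the authors' implementation. It uses the standard ordering of payoffs to label the game in state 1, with mutual cooperation normalized to 1 and mutual defection to 0 as in the caption. The encoding of a transition structure as a tuple q = (q_CC, q_CD, q_DD), where 1 means "next round is played in state 1" after two, one or zero cooperators, is an assumption made here, as are the function names.

```python
from itertools import product


def classify_game(T1, S1):
    """Label the state-1 game from temptation T1 and sucker's payoff S1 (R = 1, P = 0)."""
    if T1 < 1 and S1 > 0:
        return "HG"   # harmony game: cooperation is dominant
    if T1 > 1 and S1 > 0:
        return "SD"   # snowdrift game: best to do the opposite of the co-player
    if T1 < 1 and S1 < 0:
        return "SH"   # stag-hunt game: best to do the same as the co-player
    if T1 > 1 and S1 < 0:
        return "PD"   # prisoner's dilemma: defection is dominant
    return "boundary"  # T1 = 1 or S1 = 0: between two game types


def transition_structures():
    """All state-independent, deterministic transitions for two states.

    Assumed encoding: q = (q_CC, q_CD, q_DD); an entry of 1 sends players to
    state 1 and an entry of 0 sends them to state 2.
    """
    return list(product((0, 1), repeat=3))


ALL_Q = transition_structures()

# Mutual cooperation leads to state 1 and mutual defection to state 2.
REWARDING = [q for q in ALL_Q if q[0] == 1 and q[2] == 0]

# Mutual cooperation leads to state 2 and mutual defection to state 1.
REVERSED = [q for q in ALL_Q if q[0] == 0 and q[2] == 1]

if __name__ == "__main__":
    print(len(ALL_Q), "transition structures")
    print("rewarding feedback:", REWARDING)
    print("reversed feedback:", REVERSED)
    print("T1 = 1.5, S1 = -0.5 ->", classify_game(1.5, -0.5))
```

With this encoding, the two structures in `REWARDING` differ only in what happens after unilateral defection, which matches the captions' remark that those transitions play a less prominent role.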

Supplementary information

  1. Supplementary Information

    This file contains a Supplementary Discussion, Supplementary Table 1 and Supplementary References. Supplementary Table 1 provides several examples of memory-1 strategies of stochastic games.

  2. Reporting Summary
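
Memory-one strategies of the kind listed in Supplementary Table 1 can be written as a rule that maps the outcome of the previous round to the next action. The sketch below enumerates the 16 pure rules for a two-player game when the rule conditions only on the last outcome. That restriction is an assumption made for illustration (in a stochastic game a strategy may also condition on the current state), and the encoding and names are hypothetical rather than taken from the Supplementary Information.

```python
from itertools import product

# Outcome of the previous round: own action first, co-player's action second.
OUTCOMES = ("CC", "CD", "DC", "DD")


def pure_memory_one_strategies():
    """Enumerate every deterministic map from last outcome to next action."""
    return [dict(zip(OUTCOMES, moves)) for moves in product("CD", repeat=len(OUTCOMES))]


# Strategies named in the Extended Data captions, in the encoding assumed above.
NAMED = {
    "All D": dict(CC="D", CD="D", DC="D", DD="D"),  # always defect
    "WSLS": dict(CC="C", CD="D", DC="D", DD="C"),   # win-stay, lose-shift
    "Grim": dict(CC="C", CD="D", DC="D", DD="D"),   # cooperate until any defection
}


def name_of(strategy):
    """Return the caption name of a rule, or None if it has no name here."""
    for name, rule in NAMED.items():
        if rule == strategy:
            return name
    return None


ALL_PURE = pure_memory_one_strategies()

if __name__ == "__main__":
    for rule in ALL_PURE:
        print("".join(rule[o] for o in OUTCOMES), name_of(rule))
```

WSLS repeats its action after a favourable outcome (CC or DC) and switches after an unfavourable one (CD or DD), whereas the memory-one version of Grim cooperates only after mutual cooperation.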
