Abstract
Many human interactions feature the characteristics of social dilemmas, where individual actions have consequences for the group and the environment. The feedback between behavior and environment can be studied with the framework of stochastic games. In stochastic games, the state of the environment can change, depending on the choices made by group members. Past work suggests that such feedback can reinforce cooperative behaviors. In particular, cooperation can evolve in stochastic games even if it is infeasible in each separate repeated game. In stochastic games, participants have an interest in conditioning their strategies on the state of the environment. Yet in many applications, precise information about the state could be scarce. Here, we study how the availability of information (or lack thereof) shapes the evolution of cooperation. Even for simple examples of two-state games, we find surprising effects. In some cases, cooperation is only possible if there is precise information about the state of the environment. In other cases, cooperation is most abundant when there is no information about the state of the environment. We systematically analyze all stochastic games of a given complexity class to determine when receiving information about the environment is better, neutral, or worse for the evolution of cooperation.
Introduction
Cooperation can be conceptualized as an individually costly behavior that creates a benefit to others1. Such cooperative behaviors have evolved in many species, from unicellular organisms to mammals2. Yet they are arguably most abundant and complex in humans, where they form the very basis of families, institutions, and society3,4. Humans often support cooperation through direct reciprocity5. Here, people preferentially help those who have been helpful in the past6. Such forms of direct reciprocity naturally emerge when groups are stable, and when cooperation yields substantial returns7. In that case, individuals readily learn to engage in conditional cooperation, using strategies like Tit-for-tat8,9,10,11 (TFT), Win-Stay Lose-Shift12,13 (WSLS), or multiplayer variants thereof14,15,16. When everyone adopts these strategies, groups can sustain cooperation despite any short-run incentives to free ride17,18.
To describe direct reciprocity formally, traditional models of cooperation consider individuals who face the same strategic interaction (game) over and over again. The most prominent model of this kind is the iterated prisoner’s dilemma8. In this game, two individuals (players) repeatedly decide whether to cooperate or defect. While the players’ decisions may change from one round to the next, the feasible payoffs remain constant. Models based on iterated games have become fundamental for our understanding of reciprocity. However, they presume that interactions take place in a constant social and natural environment. Individual actions in one round have no effect on the exact game being played in the future. In contrast, in many applications the environment responds to the individuals’ behavior, such as when populations aim to control an epidemic19,20,21, manage natural resources22,23,24, or mitigate climate change25,26,27. Changing environments in turn often bring about a change in the exact game being played. Such applications are therefore best described with models in which there is a feedback between behavior and environment. In the context of direct reciprocity, such feedbacks can be incorporated with the framework of stochastic games28,29,30.
In stochastic games, individuals interact over multiple time periods. Each period, the players’ environment is in one of several possible states. This state can change from one period to the next, depending on the current state, the players’ actions, and on chance. Changes in the state affect the players’ available strategies and their feasible payoffs. In this way, stochastic games are better able to describe social dilemmas in which individual actions affect the nature of a group’s future interactions. Yet previous evolutionary models of stochastic games presume that individuals are perfectly aware of the current state31,32,33. This allows individuals to coordinate on appropriate responses once the state has changed. In contrast, in many applications, any knowledge about the state of the environment is at best incomplete. Such uncertainties can in turn have dramatic effects on human behavior34,35,36,37. Understanding the impact of information on decision-making has been a rich field of study in economics. These studies suggest that the effect of information is often positive, even though there are situations in which it has adverse effects38,39,40. Additionally, studies of partially observable stochastic games suggest that settings with incomplete information can benefit decision-makers41,42.
In the following, we explore how state uncertainty in stochastic games shapes the evolution of cooperation. To this end, we compare two scenarios. First, we consider the case in which individuals are able to learn the state of their environment and condition their decisions on the current state. We refer to this case as the ‘full-information setting’. In the second case, individuals may be aware that they are engaged in a stochastic game, but they either ignore or are unable to obtain information about the current state. As a result, their decisions are independent of their environment. We refer to this case as the ‘no-information setting’. To compare these two settings, we focus on the simplest possible case, in which two players may experience two possible states. Even this elementary setup yields an extremely rich family of models that gives rise to many different dynamics, and already here we observe that conditioning strategies on state information can have drastic effects on how people cooperate.
To quantify the importance of state information, we introduce a measure that we call the ‘value of information’. This value reflects how much a population’s cooperation rate changes when it gains access to information about the present state. When this value is positive, access to information makes the population more cooperative. In that case, we speak of a ‘benefit of information’. In general, it is also possible to observe negative values, in which case we speak of a ‘benefit of ignorance’. With analytical methods for the important limit of weak selection43,44,45, and with numerical computations for arbitrary selection strengths, we compare the value of information across many stochastic games. We identify settings where receiving information is better, neutral, or worse for the evolution of cooperation. Most often, information is highly beneficial. However, there are also a few notable exceptions in which populations can achieve more cooperation when they are ignorant of their state. In the following, we describe and characterize these cases in detail.
Results
Stochastic games with and without state information
To explore the dynamics of cooperation in variable environments, we consider stochastic games31,32,33. We introduce our framework for the simplest setup, in which two players interact for infinitely many rounds, without discounting their future payoffs. In each round, players can find themselves in one of two possible states, S = {s1, s2}. Depending on the state, players engage in one of two possible prisoner’s dilemma games. In each game, they can either cooperate (C) or defect (D). Cooperation means paying a cost c so that the other player receives a benefit bi. The cost of cooperation is fixed, but the benefit bi depends on the present state si (Fig. 1a). Without loss of generality, we assume that the first state is more profitable, such that b1 ≥ b2 > c ≔ 1. However, states can change from one round to the next, depending on the game’s transition vector
\[\mathbf{q}=\left({q}_{CC}^{1},\,{q}_{CD}^{1},\,{q}_{DD}^{1};\;{q}_{CC}^{2},\,{q}_{CD}^{2},\,{q}_{DD}^{2}\right). \qquad (1)\]
Here, each entry \({q}_{a\tilde{a}}^{i}\in [0,\, 1]\) is the probability that players find themselves in the more profitable state s1 in the next round. This probability depends on the previous state si and on the players’ previous actions a and \(\tilde{a}\). For example, the transition vector q = (1, 0, 0; 1, 0, 0) corresponds to a game in which players are only in the more profitable state if they both cooperated in the previous round. Note that we assume the transition vector to be symmetric. That is, transition probabilities depend on the number of cooperators, but they are independent of who cooperated (\({q}_{CD}^{i}={q}_{DC}^{i}\) for all i). We say a transition vector is deterministic if each entry \({q}_{a\tilde{a}}^{i}\) is either zero or one (Fig. 1b). Even for deterministic vectors we speak of a ‘stochastic game’, because games with deterministic transitions represent a special case of our framework. Based on Eq. (1), there are \(2^{6}=64\) deterministic transition vectors in total. We call a transition vector single-stochastic if there is exactly one entry that is strictly between zero and one. Games with single-stochastic transitions can serve as the most elementary example of an interaction for which the environment depends on chance events.
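To make the encoding of Eq. (1) concrete, the following minimal sketch stores a transition vector as a flat 6-tuple and looks up the probability of reaching s1 in the next round. The tuple layout and the function name `prob_next_s1` are our own illustrative choices, not taken from the paper; the symmetry assumption means that the outcomes CD and DC share one entry.

```python
from itertools import product

# Hypothetical encoding (ours): q = (q1_CC, q1_CD, q1_DD, q2_CC, q2_CD, q2_DD),
# each entry being the probability that the next round is played in state s1.


def prob_next_s1(q, state, a, a_tilde):
    """Probability of being in s1 next round, given the current state and both actions."""
    n_coop = (a == "C") + (a_tilde == "C")  # symmetric: CD and DC share one entry
    column = {2: 0, 1: 1, 0: 2}[n_coop]
    return q[3 * (state - 1) + column]


# Example from the text: s1 is reached only after mutual cooperation
q = (1, 0, 0, 1, 0, 0)
for state in (1, 2):
    for a, a_tilde in [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]:
        print(state, a + a_tilde, prob_next_s1(q, state, a, a_tilde))

# Deterministic transition vectors: every entry is 0 or 1
deterministic = list(product((0, 1), repeat=6))
print(len(deterministic))  # 64
```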
To explore how often players cooperate depending on the information they have, we compare two settings (Fig. 1c). In the full-information setting, players learn the present state before making decisions. Thus, their strategies may depend on both the present state and the players’ actions in the previous round. Here, we assume that players make decisions based on memory-1 strategies. Such strategies only take into account the outcome of the last round46 (extensions to more complex strategies47,48,49,50,51,52 are possible, but for simplicity we do not explore them here). In the full-information setting, memory-1 strategies take the form of an 8-tuple,
\[\mathbf{p}^{F}=\left({p}_{CC}^{1},\,{p}_{CD}^{1},\,{p}_{DC}^{1},\,{p}_{DD}^{1};\;{p}_{CC}^{2},\,{p}_{CD}^{2},\,{p}_{DC}^{2},\,{p}_{DD}^{2}\right). \qquad (2)\]
Here, \({p}_{a\tilde{a}}^{i}\) is the player’s probability to cooperate in state si, depending on the focal player’s and the co-player’s previous actions a and \(\tilde{a}\), respectively. We compare this full-information setting with a no-information setting, in which individuals are unable to condition their behavior on the current state. In that case, strategies are 4-tuples
\[\mathbf{p}^{N}=\left({p}_{CC},\,{p}_{CD},\,{p}_{DC},\,{p}_{DD}\right). \qquad (3)\]
We note that the set of no-information strategies is a strict subset of the full-information strategies (they correspond to those pF for which \({p}_{a\tilde{a}}^{1}={p}_{a\tilde{a}}^{2}\) for all actions a and \(\tilde{a}\)). For simplicity, we assume in the following that the players’ strategies are deterministic, such that each entry is either zero or one. For full information, there are \(2^{8}=256\) deterministic strategies. For no information, there are \(2^{4}=16\) deterministic strategies. Some results for stochastic strategies are shown in Fig. S1a, b.
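The subset relation between the two strategy spaces can be checked directly. In this sketch (function and variable names are ours), a no-information strategy is embedded into the full-information space by reusing the same four reactions in both states, with the entry order of Eqs. (2) and (3).

```python
from itertools import product


def embed_no_info(p_n):
    """Map a no-information strategy (p_CC, p_CD, p_DC, p_DD) to the full-information
    8-tuple that reacts identically in both states (p^1 = p^2)."""
    return tuple(p_n) + tuple(p_n)


full_info = set(product((0, 1), repeat=8))  # deterministic full-information strategies
no_info = list(product((0, 1), repeat=4))   # deterministic no-information strategies
embedded = {embed_no_info(p) for p in no_info}

print(len(full_info), len(no_info))  # 256 16
print(embedded <= full_info)         # True
print(embed_no_info((1, 0, 0, 1)))   # WSLS -> (1, 0, 0, 1, 1, 0, 0, 1)
```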
The players’ strategies may be subject to errors with some small probability ε. This model parameter reflects the assumption that people may occasionally make mistakes when engaging in reciprocity53,54. In that case, an intended cooperation may be misimplemented as a defection (and vice versa). Games with errors have the useful technical property that the long-run dynamics is independent of the players’ initial moves46. For ε > 0, a player with strategy p effectively implements the strategy (1 − ε)p + ε(1 − p). In particular, even when the original strategy p is deterministic, the effective strategy is stochastic. Given the error probability, the players’ strategies, and the game’s transition vector, we can compute how often players cooperate on average and which payoffs they get (see Methods).
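One standard way to obtain these long-run quantities, sketched here under our own assumptions rather than as the paper's Methods, is to treat the game as a Markov chain over the eight combinations of (state, last-round outcome), apply the error rule (1 − ε)p + ε(1 − p) to each cooperation probability, and read cooperation rates and payoffs off the stationary distribution. The tuple layouts, function names, and default parameters (b1 = 1.8, b2 = 1.3, c = 1, ε = 0.01) are illustrative assumptions. The solver assumes a unique stationary distribution, which can fail if both states are absorbing.

```python
import numpy as np

# Hypothetical encodings (ours, not from the paper's code):
#   q  = (q1_CC, q1_CD, q1_DD, q2_CC, q2_CD, q2_DD): probability of being in s1 next round
#   pF = (p1_CC, p1_CD, p1_DC, p1_DD, p2_CC, p2_CD, p2_DC, p2_DD): cooperation probabilities,
#        indexed by state and by (own previous action, co-player's previous action)
OUTCOMES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]
CHAIN = [(s, a, at) for s in (1, 2) for (a, at) in OUTCOMES]  # 8 Markov states


def prob_s1(q, state, a, at):
    """Probability that the next round is played in s1 (CD and DC share an entry)."""
    column = {2: 0, 1: 1, 0: 2}[(a == "C") + (at == "C")]
    return q[3 * (state - 1) + column]


def effective(p, state, own, other, eps):
    """Cooperation probability after implementation errors: (1 - eps) p + eps (1 - p)."""
    x = p[4 * (state - 1) + OUTCOMES.index((own, other))]
    return (1 - eps) * x + eps * (1 - x)


def transition_matrix(p1, p2, q, eps):
    M = np.zeros((8, 8))
    for i, (s, a, at) in enumerate(CHAIN):
        to_s1 = prob_s1(q, s, a, at)
        for j, (s_next, b, bt) in enumerate(CHAIN):
            p_state = to_s1 if s_next == 1 else 1 - to_s1
            x = effective(p1, s_next, a, at, eps)  # player 1 sees (own, other) = (a, at)
            y = effective(p2, s_next, at, a, eps)  # player 2 sees (own, other) = (at, a)
            M[i, j] = p_state * (x if b == "C" else 1 - x) * (y if bt == "C" else 1 - y)
    return M


def stationary(M):
    """Solve v M = v with sum(v) = 1 (assumes a unique stationary distribution)."""
    n = M.shape[0]
    A = np.vstack([M.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    v, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return v


def long_run(p1, p2, q, b=(1.8, 1.3), c=1.0, eps=0.01):
    """Player 1's average cooperation rate and average payoff per round."""
    v = stationary(transition_matrix(p1, p2, q, eps))
    coop = sum(v[k] for k, (s, a, at) in enumerate(CHAIN) if a == "C")
    payoff = sum(v[k] * (b[s - 1] * (at == "C") - c * (a == "C"))
                 for k, (s, a, at) in enumerate(CHAIN))
    return coop, payoff


q_timeout = (1, 0, 0, 1, 1, 1)           # 'time-out' game: s2 is always left after one round
wsls_variant = (1, 0, 0, 0, 1, 0, 0, 1)  # pF = (1,0,0,0; x,0,0,1) with x = 1
alld = (0,) * 8
print(long_run(wsls_variant, wsls_variant, q_timeout))
print(long_run(alld, alld, q_timeout))
```

With this encoding, two ALLD players cooperate only through errors, so their cooperation rate equals ε, while the state-dependent WSLS variant recovers from errors within a few rounds.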
Because we are interested in how cooperation evolves, we do not consider players with fixed strategies. Rather, players can change their strategies over time, depending on the payoffs they yield. To describe this evolutionary dynamics, we use a pairwise comparison process55. This process considers populations of fixed size N. Players receive payoffs by interacting with all other population members. At regular time intervals, one player is randomly chosen and given the opportunity to revise its strategy. The player may do so in two ways. With probability μ, the player switches to a random deterministic memory-1 strategy (similar to a mutation in biological models of evolution). Otherwise, with probability 1 − μ, the focal player compares its own payoff π to the payoff \(\tilde{\pi }\) of a random role model. The player switches to the role model’s strategy with probability \({(1+\exp [-\beta (\tilde{\pi }-\pi )])}^{-1}\). The parameter β > 0 is the strength of selection. The higher this parameter, the more strongly individuals favor role models with high payoffs. Overall, these assumptions define a stochastic process on the space of all possible population compositions. Because mutations occur at a positive rate (μ > 0), no population composition is absorbing, and evolutionary trajectories do not converge to any particular outcome. However, because the process is ergodic, the respective time averages converge to an invariant distribution. This invariant distribution describes how often the population has a given composition in the long run (see Methods).
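A single revision event of this process can be sketched as follows. The helper `update_step` and its `payoff_of` callback are hypothetical names of ours; the imitation probability is the Fermi function quoted above, and the mutation step draws uniformly from a supplied strategy space. For very large β times payoff differences, `math.exp` may overflow, which this sketch does not guard against.

```python
import math
import random


def imitation_prob(pi_focal, pi_model, beta):
    """Probability that the focal player adopts the role model's strategy (Fermi rule)."""
    return 1.0 / (1.0 + math.exp(-beta * (pi_model - pi_focal)))


def update_step(strategies, payoff_of, strategy_space, mu, beta, rng=random):
    """One revision event of the pairwise comparison process (hypothetical helper).

    strategies: list with one strategy label per player
    payoff_of: function (index, strategies) -> payoff of that player
    """
    N = len(strategies)
    i = rng.randrange(N)  # randomly chosen focal player
    new = list(strategies)
    if rng.random() < mu:
        new[i] = rng.choice(strategy_space)  # mutation: random strategy
    else:
        j = rng.choice([k for k in range(N) if k != i])  # random role model
        p = imitation_prob(payoff_of(i, strategies), payoff_of(j, strategies), beta)
        if rng.random() < p:
            new[i] = strategies[j]
    return new
```

At β = 0 the imitation probability is 1/2 regardless of payoffs, which is why weak selection makes the process nearly neutral.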
We study this evolutionary process analytically when mutations are rare and selection is weak (that is, when μ, β → 0). In addition, we numerically explore the process for arbitrary selection strengths. In either case, we compute which payoffs players receive on average and how likely they are to cooperate over time. By comparing the cooperation rates \({\hat{\gamma }}^{F}\) and \({\hat{\gamma }}^{N}\) for populations with full and no information, respectively, we quantify how favorable information is for the evolution of cooperation. We refer to the difference \(V_{\beta}(\mathbf{q}):={\hat{\gamma }}^{F}-{\hat{\gamma }}^{N}\) as the value of (state) information. In general, this value depends on the game’s transition vector q, as well as on the strength of selection β. When this value is positive, populations achieve more cooperation when they learn the present state of the stochastic game.
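For rare mutations, a common approach (assumed here, not confirmed as the paper's exact procedure) is to track only homogeneous populations: a mutant either fixes or goes extinct before the next mutation arises. The sketch below computes fixation probabilities of the pairwise comparison process, the resulting stationary distribution over homogeneous populations, and the abundance-weighted cooperation rate \(\hat{\gamma}\). Applying `average_cooperation` once to the 256 full-information strategies and once to the 16 no-information strategies, and subtracting, would estimate Vβ(q) for positive β. The toy payoffs below (ALLC versus ALLD in a donation game with b = 2, c = 1) are only a sanity check; all names are ours.

```python
import numpy as np


def fixation_prob(mutant, resident, payoff, N, beta):
    """Probability that a single mutant takes over a population of N - 1 residents.

    Pairwise comparison (Fermi) process; payoff[x][y] is the payoff of strategy x
    against strategy y, and players interact with all N - 1 others (no self-play).
    """
    total, prod = 1.0, 1.0
    for j in range(1, N):  # j = current number of mutants
        pi_mut = ((j - 1) * payoff[mutant][mutant] + (N - j) * payoff[mutant][resident]) / (N - 1)
        pi_res = (j * payoff[resident][mutant] + (N - j - 1) * payoff[resident][resident]) / (N - 1)
        prod *= np.exp(-beta * (pi_mut - pi_res))  # ratio T^-_j / T^+_j
        total += prod
    return 1.0 / total


def rare_mutation_distribution(payoff, N, beta):
    """Stationary distribution over homogeneous populations when mu -> 0.

    Assumes a mutant adopts each of the n strategies with probability 1/n.
    """
    n = len(payoff)
    T = np.zeros((n, n))
    for r in range(n):
        for m in range(n):
            if m != r:
                T[r, m] = fixation_prob(m, r, payoff, N, beta) / n
        T[r, r] = 1.0 - T[r].sum()
    A = np.vstack([T.T - np.eye(n), np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    w, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return w


def average_cooperation(payoff, coop, N, beta):
    """gamma_hat: cooperation rate of each homogeneous population, weighted by abundance."""
    return float(np.dot(rare_mutation_distribution(payoff, N, beta), coop))


# Toy check with ALLC (index 0) and ALLD (index 1) in a donation game with b = 2, c = 1
payoff = [[1.0, -1.0], [2.0, 0.0]]
coop = [1.0, 0.0]
print(average_cooperation(payoff, coop, N=10, beta=0.0))
print(average_cooperation(payoff, coop, N=10, beta=1.0))
```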
In the following, we describe the results of this baseline model in detail. In the SI, we provide further results on the impact of different game parameters (Fig. S1), other strategy spaces (Fig. S2), and alternative learning rules (Fig. S3).
The effect of state information in two examples
To begin with, we illustrate the effect of state information by exploring the dynamics of two examples. Both examples are variants of models that have been previously used to highlight the importance of stochastic games for the evolution of cooperation31. In the first example (Fig. 2a), players only remain in the more profitable first state if they both cooperate. If either of them defects, they transition to the inferior second state. Once there, they transition back to the more profitable state after one round, irrespective of the players’ actions, which corresponds to the transition vector q = (1, 0, 0; 1, 1, 1). The second state may thus be interpreted as a ‘time-out’31. For numerical results, we assume that cooperation yields an intermediate benefit in the more profitable state and a low benefit in the inferior state (b1 = 1.8, b2 = 1.3).
When we simulate the evolutionary dynamics of this stochastic game, we observe that individuals consistently learn to cooperate when they have full information. In contrast, without information, they mostly defect (Fig. 2b). To explain this result, we numerically compute which strategies are most likely to evolve according to the process’s invariant distribution, for each of the two cases (Fig. 2c). In the full-information setting, individuals predominantly adopt a strategy pF = (1, 0, 0, 0; x, 0, 0, 1), where x ∈ {0, 1} is arbitrary. This strategy may be considered as a variant of the WSLS rule that has been successful in the traditional prisoner’s dilemma12. In particular, it is fully cooperative with itself. We prove in Supplementary Note 3 that this strategy forms a subgame perfect (Nash) equilibrium if 2b1 − b2 ≥ 2c, which is satisfied for the parameters we use (see also Fig. 3a). On the other hand, in the no-information setting, this strategy is no longer available. Instead, players can only sustain cooperation with the traditional WSLS rule pN = (1, 0, 0, 1). This strategy is only an equilibrium under the more stringent condition b1 > 2c. Because our parameters do not satisfy this condition, cooperation does not evolve in the no-information setting (Fig. 3b). To explore how these results depend on the benefit of cooperation b1 and on the selection strength β, Fig. 2d shows further simulations where we systematically vary both parameters. In all considered cases, state information is beneficial because it allows individuals to give more nuanced responses.
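Evaluating the two equilibrium conditions at the parameters used here (b1 = 1.8, b2 = 1.3, c = 1) shows why the state-dependent WSLS variant is an equilibrium while the traditional WSLS rule is not:

\[2b_1-b_2=2(1.8)-1.3=2.3\;\ge\;2=2c, \qquad b_1=1.8\;<\;2=2c.\]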
The second example has a transition vector similar to that of the first, with a single modification. This time, the inferior state is only left if at least one of the two players cooperates (Fig. 2e), which gives q = (1, 0, 0; 1, 1, 0). Although this modification may appear minor, the resulting dynamics is strikingly different. We observe that with and without state information, individuals are now largely cooperative. However, they are most cooperative when they do not condition their strategies on state information (Fig. 2f). For this stochastic game, we show in Supplementary Note 3 that the traditional WSLS rule is already subgame perfect for 2b1 − b2 ≥ 2c. As a result, WSLS is predominant in the no-information setting (Fig. 3d). In contrast, in the full-information setting, WSLS is subject to (almost) neutral drift by strategies that only differ from WSLS in a few bits (Fig. 3c). These other strategies may in turn give rise to the occasional invasion of defectors. Overall, we find that this stochastic game exhibits a benefit of ignorance when selection is sufficiently strong, and when cooperation is particularly valuable in the more profitable state (i.e., in the upper right corner of Fig. 2h).
These examples highlight three observations. First, just as there are instances in which state information is beneficial, there are also instances in which state information can reduce how much cooperation players achieve. Second, the stochastic games (transition vectors) for which state information is beneficial may only differ marginally from games with a benefit of ignorance. Finally, even if a stochastic game admits a benefit of ignorance, this benefit may not be present for all parameter values. Taken together, these observations suggest that in general, the effect of state information can be non-trivial and requires further investigation.
A systematic analysis of the weak-selection limit
To explore more systematically in which cases there is a benefit of information (or ignorance), we study the class of all games with deterministic transition vectors. We first consider the limit of weak selection (β → 0). Here, game payoffs only weakly influence how individuals adopt new strategies. While a vanishingly small selection strength is a mathematical idealization, this limit plays an important role in evolutionary game theory43,44,45. It often permits researchers to derive explicit solutions when analytical results are difficult to obtain otherwise. In our case, the limit of weak selection is particularly convenient, because it allows us to exploit certain symmetries between the two possible states s1 and s2, and between the two possible actions C and D, see Supplementary Note 1. As a result, we show that instead of 64 stochastic games, we only need to analyze 24. For each of these 24 transition vectors q, we explore whether information is beneficial, detrimental, or neutral (i.e., whether V0(q) is positive, negative, or zero).
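The reduction from 64 to 24 games can be reproduced by counting orbits under two relabelings. We assume here, as our reading of the symmetries referred to above (the precise argument is in Supplementary Note 1), that they are (i) exchanging the labels of s1 and s2 and (ii) exchanging the labels of C and D. Both are involutions that commute, so each orbit has the form {q, σq, τq, στq}.

```python
from itertools import product

# Hypothetical encoding: q = (q1_CC, q1_CD, q1_DD, q2_CC, q2_CD, q2_DD)


def swap_states(q):
    """Relabel s1 <-> s2: the new chance of reaching s1 is the old chance of reaching s2."""
    a1, b1, c1, a2, b2, c2 = q
    return (1 - a2, 1 - b2, 1 - c2, 1 - a1, 1 - b1, 1 - c1)


def swap_actions(q):
    """Relabel C <-> D: mutual cooperation and mutual defection exchange roles."""
    a1, b1, c1, a2, b2, c2 = q
    return (c1, b1, a1, c2, b2, a2)


def orbit(q):
    """All vectors reachable from q by the two relabelings."""
    return frozenset({q, swap_states(q), swap_actions(q), swap_states(swap_actions(q))})


vectors = list(product((0, 1), repeat=6))
classes = {orbit(q) for q in vectors}
print(len(vectors), len(classes))  # 64 24
```

The same count follows from Burnside's lemma: the four group elements fix 64, 8, 16, and 8 vectors, and (64 + 8 + 16 + 8)/4 = 24.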
First, we prove that half of the 64 stochastic games are neutral. In these games, the full-information and the no-information settings yield the same average cooperation rate in the limit of weak selection. Among the neutral games, we identify three (overlapping) subclasses. (i) The first subclass consists of those games that have an absorbing state (15 cases). Here, either the first or the second state can no longer be left once it is reached, because \({q}_{a\tilde{a}}^{1}=1\) or \({q}_{a\tilde{a}}^{2}=0\) for all a and \(\tilde{a}\). For these games, state information is neutral because players can be sure that they will eventually be in the absorbing state. (ii) In the second subclass, transitions are state-independent31, which means \({q}_{a\tilde{a}}^{1}={q}_{a\tilde{a}}^{2}\) for all a and \(\tilde{a}\) (6 additional cases). For deterministic transitions, state-independence implies that the current state can be directly inferred from the players’ previous actions, even without obtaining explicit state information. (iii) In the third subclass, neutrality arises because of more abstract symmetry arguments, described in detail in Supplementary Note 1. Importantly, while the games in the first two subclasses are neutral for all selection strengths, the games in the third subclass only become neutral for vanishing selection. One particular example of this last subclass is the game with transition vector q = (1, 0, 0; 1, 1, 0), which we studied in the previous section (Figs. 2e–h and 3c, d). There, we observed that this game can give rise to a benefit of ignorance when selection is intermediate or strong. Here, we conclude that this benefit disappears completely for vanishing selection (see also the lower boundary of Fig. 2h).
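The sizes of the first two neutral subclasses follow from a direct enumeration. This sketch (names are ours) counts the deterministic vectors with an absorbing state and the state-independent vectors that are not already counted among them.

```python
from itertools import product

vectors = list(product((0, 1), repeat=6))  # (q1_CC, q1_CD, q1_DD, q2_CC, q2_CD, q2_DD)


def has_absorbing_state(q):
    """s1 absorbing: all state-1 entries are 1; s2 absorbing: all state-2 entries are 0."""
    return all(x == 1 for x in q[:3]) or all(x == 0 for x in q[3:])


def state_independent(q):
    """Transitions do not depend on the current state: q^1 = q^2."""
    return q[:3] == q[3:]


absorbing = [q for q in vectors if has_absorbing_state(q)]
indep_only = [q for q in vectors if state_independent(q) and not has_absorbing_state(q)]
print(len(absorbing), len(indep_only))  # 15 6
```

Together these account for 21 games that are neutral for every selection strength.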
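For the remaining 32 non-neutral cases, we identify a simple proxy variable that indicates whether or not the respective game exhibits a benefit of information for weak selection (Fig. 4a). Specifically, in a non-neutral game, information is beneficial if and only if X > 0, with X being
\[X={{\mathbb{1}}}_{{q}_{CC}^{1}=1}+{{\mathbb{1}}}_{{q}_{CC}^{2}=0}-{{\mathbb{1}}}_{{q}_{DD}^{1}=1}-{{\mathbb{1}}}_{{q}_{DD}^{2}=0}. \qquad (4)\]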
Here, \({{\mathbb{1}}}_{A}\) is an indicator function that is one if assertion A is true and zero otherwise. One can interpret the variable X as a measure for how easily the game can be absorbed in mutual cooperation (X ≥ 0) or mutual defection (X ≤ 0). For example, if a game has a transition vector with \({q}_{CC}^{1}=1\), groups can easily implement indefinite cooperation by choosing strategies with \({p}_{CC}^{1}=1\). By doing so, players ensure they remain in the first state, in which they again would continue to cooperate. Using the proxy variable X, we can conclude that there are two properties of transition vectors that make state information beneficial in the limit of weak selection. The transition vector either needs to allow players to coordinate on mutual cooperation in a stable environment (\({q}_{CC}^{1}=1\), \({q}_{CC}^{2}=0\)); or it needs to prevent players from coordinating on mutual defection in a stable environment (\({q}_{DD}^{1} \, \ne \, 1\), \({q}_{DD}^{2} \, \ne \, 0\)). Again by symmetry considerations, we find that there are as many games with a benefit of information as there are games with a benefit of ignorance (16 cases each, see Fig. 4a).
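The following sketch evaluates the proxy for the two examples of Fig. 2, assuming that X has the indicator form X = 1[q¹CC = 1] + 1[q²CC = 0] − 1[q¹DD = 1] − 1[q²DD = 0], which is our reading of the description above. It also checks that this form is unchanged when the states are relabeled and flips sign when C and D are exchanged, which fits the observation that games with a benefit of information and games with a benefit of ignorance come in equal numbers.

```python
from itertools import product


def proxy_X(q):
    """Assumed form: X = 1[q1_CC = 1] + 1[q2_CC = 0] - 1[q1_DD = 1] - 1[q2_DD = 0]."""
    q1_cc, _, q1_dd, q2_cc, _, q2_dd = q
    return int(q1_cc == 1) + int(q2_cc == 0) - int(q1_dd == 1) - int(q2_dd == 0)


def swap_states(q):
    a1, b1, c1, a2, b2, c2 = q
    return (1 - a2, 1 - b2, 1 - c2, 1 - a1, 1 - b1, 1 - c1)


def swap_actions(q):
    a1, b1, c1, a2, b2, c2 = q
    return (c1, b1, a1, c2, b2, a2)


print(proxy_X((1, 0, 0, 1, 1, 1)))  # first example of Fig. 2: 1
print(proxy_X((1, 0, 0, 1, 1, 0)))  # second example of Fig. 2: 0

# X is unchanged by relabeling the states and flips sign when C and D are swapped
vectors = list(product((0, 1), repeat=6))
print(all(proxy_X(swap_states(q)) == proxy_X(q) for q in vectors))    # True
print(all(proxy_X(swap_actions(q)) == -proxy_X(q) for q in vectors))  # True
```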
Exploring the impact of other game parameters
After characterizing the case of weak selection, we next explore the dynamics under strictly positive selection. To this end, we numerically compute the population’s average cooperation rate with and without state information, for each of the 64 stochastic games considered previously. To explore the impact of different game parameters, we systematically vary the strength of selection (Figs. 4b and S4), the benefit of cooperation (Figs. 4c and S5), and the error rate (Fig. S6). For 21 games, the evolving cooperation rates are the same with and without information. These games are neutral either because there is an absorbing state, or because transitions are state-independent (as described earlier). For the remaining cases, we find that a clear majority of them result in a benefit of information (Fig. 4b, c).
In the few cases with a consistent benefit of ignorance (the red squares in Figs. S4–S6), there is overall very little cooperation. As a result, the magnitude of this benefit is often negligible. Only in two cases can one find parameter combinations that lead to a sizeable benefit of ignorance. The first case is the stochastic game considered in Fig. 2e–h with transition vector q = (1, 0, 0; 1, 1, 0). The other case is a slight modification of the first, having the transition vector q = (1, 0, 1; 1, 1, 0). In both cases mutual cooperation leads to the more profitable first state. Moreover, in both cases, players can use WSLS to sustain cooperation even without state information, provided that 2b1 − b2 ≥ 2c. But even when this condition holds, the benefit of ignorance is constrained, because even fully informed populations tend to achieve substantial cooperation rates (Figs. S4–S6). Overall, these results suggest that for positive selection strengths, a sizeable benefit of ignorance is rare. Moreover, there seems to be no simple rule that predicts for which stochastic games we can expect a benefit of ignorance (see Supplementary Note 3, Section 3.3 for a more detailed discussion).
The effect of environmental stochasticity
In our analysis so far, we assumed that the environment changes deterministically. Individuals who know the present state and the players’ actions can therefore anticipate the game’s next state. This form of predictability may overall diminish the impact of explicit state information because it reduces uncertainty. In the following, we extend our analysis to allow for stochasticity in the game’s transitions. To gain some intuition, we start with a simple example taken from the previous literature31 (see Fig. 5a for a depiction). According to the game’s transition vector, q = (1, 0, 0; q, 0, 0), players always find themselves in the less profitable second state if one or both players defect. If both players cooperate, however, they either remain in the first state (if they are already there), or they transition to the first state with probability q (if they start out in the second state). This stochastic game represents a scenario in which an environment deteriorates immediately once players defect. If players resume cooperating, it may take several rounds for the environment to recover.
For this example, we find that the value of information varies non-trivially, depending on the transition probability q and the strength of selection β (Fig. 5b–e). Overall, parameter regions with a benefit of ignorance seem to prevail (Fig. 5f). To obtain analytical results, we again study the game for weak selection (β → 0). In that case, the value of information can be computed explicitly, as \(V_{0}(\mathbf{q})=-\frac{3q(1-q)}{64(1+q)}\). In particular, there is a benefit of ignorance for all intermediate values q ∈ (0, 1). This benefit becomes most pronounced for \(q=\sqrt{2}-1\) (for more details, see Supplementary Note 3, Section 3.4). As we increase the selection strength, however, the dynamics can change, depending on q. For small q, we continue to observe a benefit of ignorance, whereas for larger q information tends to become beneficial (Fig. 5f).
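The location of this maximum follows from the closed-form expression. Writing V0(q) = −(3/64) f(q) with f(q) = q(1 − q)/(1 + q), the benefit of ignorance is largest where f is largest:

\[f'(q)=\frac{(1-2q)(1+q)-q(1-q)}{(1+q)^{2}}=\frac{1-2q-q^{2}}{(1+q)^{2}},\]
\[f'(q)=0\iff q^{2}+2q-1=0\iff q^{*}=\sqrt{2}-1\approx 0.414,\]
\[f(q^{*})=(\sqrt{2}-1)^{2}=3-2\sqrt{2},\qquad V_{0}=-\frac{3(3-2\sqrt{2})}{64}\approx -0.0080.\]

Because the numerator 1 − 2q − q² is positive below q* and negative above it, q* maximizes f on (0, 1). Even at this point, the difference in cooperation rates is below one percentage point.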
To explore the scenarios with a benefit of ignorance, we record which strategies players adopt for q = 0.2. Without state information, we find that players adopt WSLS almost all of the time (Fig. 5g). In contrast, when players condition their strategies on state information, WSLS is risk-dominated by a strategy that has been termed Ambitious WSLS31 (AWSLS). AWSLS differs from WSLS after mutual cooperation, in which case AWSLS only cooperates when players are in the first state (i.e., \({p}_{CC}^{1}=1\) but \({p}_{CC}^{2}=0\)). Once AWSLS is common in the population, it opens up opportunities for less cooperative strategies to invade. In particular, even non-cooperative strategies such as Always Defect (ALLD) are adopted for a non-negligible fraction of time (Fig. 5h). Overall, we find that predicting the effect of information is non-trivial. While some parameter combinations favor populations with full information, we also observe a benefit of ignorance for a significant portion of the parameter space.
To obtain a more comprehensive picture, we numerically analyze all stochastic games with single-stochastic transition vectors. Because the corresponding transition vectors have exactly one entry q strictly between 0 and 1, there are \(6\cdot 2^{5}=192\) cases in total. We find several regularities. First, similarly to games with deterministic transitions, we find that there are 24 transition vectors for which the game is neutral. In all of these games, one of the two states is absorbing. Second, we analyze the remaining cases in the limit of vanishing selection (Fig. S7). Most of these games follow the rule defined by the proxy variable X in Eq. (4), with some exceptions discussed in detail in Supplementary Note 2. Finally, for positive selection strengths we can again compute the players’ average cooperation rates numerically. We do this for all 192 families of games for weak (Fig. S8), intermediate (Fig. S9), and strong selection (Fig. S10). Similar to the case of deterministic transitions, state information is beneficial in an absolute majority of cases (Fig. S11). However, exceptions can and do occur. A notable benefit of ignorance arises most frequently when mutual cooperation in the more profitable state leads the players to remain in that state, and when mutual defection in either state is punished with deteriorating environmental conditions.
Our computational methods are not limited to games with deterministic or single-stochastic transitions. To obtain a comprehensive understanding of the general effect of state information, we systematically explore the space of all stochastic transition vectors. To make this analysis feasible, we assume the entries of q are taken from a finite grid \({q}_{a\tilde{a}}^{i}\in \{0,\, 0.2,\, 0.4,\, 0.6,\, 0.8,\, 1.0\}\), leading to \(6^{6}=46{,}656\) possible cases. Our numerical results again confirm that for the majority of these cases, environmental information is beneficial (Fig. S12b). Although there is also a non-negligible number of games for which populations are better off without information, the respective benefit of ignorance is often small (Fig. S12a).
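The case counts in the last two paragraphs can be reproduced by enumeration. In this sketch (our own encoding), a single-stochastic family is a 6-tuple in which the free parameter q is marked by `None`. Because the free entry lies strictly inside (0, 1), a state can only be absorbing if the free entry belongs to the other state.

```python
from itertools import product

# Hypothetical encoding: (q1_CC, q1_CD, q1_DD, q2_CC, q2_CD, q2_DD) with exactly one
# entry being the free parameter q in (0, 1), marked here by None.


def single_stochastic_families():
    families = []
    for position in range(6):
        for rest in product((0, 1), repeat=5):
            entries = list(rest)
            entries.insert(position, None)
            families.append(tuple(entries))
    return families


def has_absorbing_state(f):
    """s1 is absorbing if all state-1 entries are 1; s2 if all state-2 entries are 0.
    The free entry (None) never compares equal to 0 or 1."""
    return all(x == 1 for x in f[:3]) or all(x == 0 for x in f[3:])


families = single_stochastic_families()
print(len(families))                                  # 192
print(sum(has_absorbing_state(f) for f in families))  # 24

grid = (0, 0.2, 0.4, 0.6, 0.8, 1.0)
print(sum(1 for _ in product(grid, repeat=6)))        # 46656
```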
Discussion
When people interact in a social dilemma, their actions often spill over into their social, natural, and economic environment56,57,58,59. Changes in the environment may in turn modulate the characteristics of the social dilemma. One important example of such a feedback loop is the ‘tragedy of the commons’60. Here, groups with little cooperation may degrade their environment, thereby restricting their own feasible long-run payoffs.
Such spillovers between the groups’ behavior and their environment can be formalized as a stochastic game28. In stochastic games, individuals interact for many time periods. In each period, they may face a different kind of social dilemma (state). The way they act in one state may affect the state they experience next. Recently, stochastic games have become a valuable model for the evolution of cooperation, because changing environments can reinforce reciprocity31,32,33. In particular, the evolution of cooperation may be favored in stochastic games even if cooperation is disfavored in each individual state31, see also Fig. S2a, b. However, implicit in these studies is the assumption that individuals are perfectly aware of the state they are in. Here, we systematically explore the implications of this assumption. We study to which extent individuals learn to cooperate, depending on whether or not they know the present state of their environment. We say the stochastic game shows a benefit of information if well-informed groups tend to be more cooperative. Otherwise, we speak of a benefit of ignorance.
Even for the most basic instantiation of a stochastic game, with two individuals and two states, we find that the impact of information is non-trivial. All three cases are possible: state information can be beneficial, neutral, or detrimental for cooperation. To explore these complex dynamics, we employ a mixture of analytical techniques and numerical approaches. Analytical results are feasible in the limiting case of weak selection43,44,45. Here, we observe an interesting symmetry. For every stochastic game in which there is a benefit of information, there is a corresponding game with a benefit of ignorance. This symmetry breaks down for positive selection. As selection increases, we observe more and more cases in which state information becomes beneficial. Moreover, in those few cases in which a benefit of ignorance persists, this benefit tends to be small. These results highlight the importance of accurate state information for responsible decision making.
However, our research also highlights a few notable exceptions. We identify several ecologically plausible scenarios in which individuals cooperate more when they ignore their environment’s state. One example is the game displayed in Fig. 2e–h. Here, players only remain in the profitable state when they both cooperate. If either player defects, they transition to the inferior state. From there, they can only escape if at least one player cooperates. This game reflects a scenario in which the group’s environment reinforces cooperation: cooperative groups are rewarded by keeping access to the more profitable state, whereas non-cooperative groups are punished by transitioning to an inferior state. For this kind of environmental feedback, it was previously observed that the simple WSLS strategy can easily sustain cooperation31,32,33. WSLS can be implemented without any state information. Once a population settles at WSLS, providing state information can even be harmful: individuals may then deviate towards more nuanced strategies, which in turn can destabilize cooperation. In this sense, our study mirrors previous results suggesting that richer strategy spaces can sometimes reduce a population’s potential to cooperate61.
To allow for a systematic treatment, we focus on comparatively simple games. Nevertheless, the number of games we consider is large. For example, if all transitions between states are assumed to be deterministic (independent of chance), each of the six transition probabilities is either zero or one, which gives \(2^{6}=64\) cases to consider (Figs. S4–S6). If all but one transition are deterministic, we obtain \(6\times 2^{5}=192\) families of games (each having a free parameter q ∈ [0, 1], Figs. S7–S10). In addition, we systematically explore the set of fully stochastic transition functions by considering 46,656 different cases (Fig. S12). In all these instances, we observe that seemingly innocuous changes in the environmental feedback or in the game parameters can lead to complex changes in the dynamics. In particular, games with a benefit of information may turn into games with a benefit of ignorance. As shown in Fig. S13, we observe a similar sensitivity in games with more than two players. These observations suggest that there may be no simple rule that predicts the impact of state information. These difficulties are likely to increase further as we extend the model to more complex strategies47,48,49,50,51,52 or to environments with more than two states31.
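The game counts above can be reproduced by enumeration. The sketch below is a minimal illustration that assumes the transition vector is written as \({\bf{q}}=({q}_{CC}^{1},{q}_{CD}^{1},{q}_{DD}^{1},{q}_{CC}^{2},{q}_{CD}^{2},{q}_{DD}^{2})\), using \({q}_{DC}^{i}={q}_{CD}^{i}\). The function names are hypothetical, and the six-point grid in stochastic_grid is only one choice that happens to give \(6^{6}=46{,}656\) cases; the grid actually used for Fig. S12 may differ.

```python
from itertools import product

# Assumed encoding: q = (q1_CC, q1_CD, q1_DD, q2_CC, q2_CD, q2_DD);
# q_DC equals q_CD by symmetry, so each state contributes three entries.
N_ENTRIES = 6


def deterministic_games():
    """All transition vectors whose entries are 0 or 1."""
    return list(product((0, 1), repeat=N_ENTRIES))


def one_parameter_families():
    """One entry is a free parameter (marked 'q'); all other entries are 0 or 1."""
    families = []
    for free_pos in range(N_ENTRIES):
        for rest in product((0, 1), repeat=N_ENTRIES - 1):
            vec = list(rest)
            vec.insert(free_pos, "q")
            families.append(tuple(vec))
    return families


def stochastic_grid(values=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Every entry drawn from a finite grid (hypothetical six-point grid)."""
    return list(product(values, repeat=N_ENTRIES))


print(len(deterministic_games()), len(one_parameter_families()), len(stochastic_grid()))
# prints: 64 192 46656
```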
Overall, we believe our work makes at least two contributions. First, we introduce a simple and easily generalizable framework to explore how state information (or the lack thereof) affects the evolution of cooperation. This framework can be generalized in various directions. For example, our model compares two limiting cases: either no one in the population knows the state of the environment, or everyone gets precise information about the environment’s state. There are many interesting cases in between. In some applications, population members may only obtain an imperfect signal of the environment’s true state42. Alternatively, one may adapt our model to explore games with information asymmetries. As one instance of such a model extension, individuals may choose to acquire state information at a small cost. Such a model would allow researchers to explore whether individuals acquire information exactly in those games for which we find a benefit of information.
As our second contribution, our results illustrate the intricate dynamics that arise in the presence of environmental, informational, and behavioral feedbacks. By exploring these feedbacks in elementary stochastic games, we can better understand the more complex dynamics of the socio-ecological systems around us.
Methods
Calculation of payoffs in stochastic games
In this study, we compare the evolutionary dynamics for two strategy sets. The first set \({\mathcal{S}}_{F}\) is the set of all memory-one strategies for the full-information setting. The second set \({\mathcal{S}}_{N}\) consists of all memory-one strategies for the no-information setting. Equivalently, we can define \({\mathcal{S}}_{N}\) as the set of all full-information strategies that do not condition their behavior on the current state,
\[{\mathcal{S}}_{N}=\left\{{\bf{p}}\in {\mathcal{S}}_{F}\;\middle|\;{p}_{a\tilde{a}}^{1}={p}_{a\tilde{a}}^{2}\ \text{for all}\ a,\tilde{a}\in \{C,D\}\right\}.\]
We denote by \({\mathcal{P}}_{F}\) and \({\mathcal{P}}_{N}\) the respective sets of deterministic strategies, for which all entries are required to be either zero or one. In the following, we describe how to calculate payoffs when players have full information. Since any strategy for the case of no information can be associated with a full-information strategy, the same method also applies to the case of no information.
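As a small illustration of the two deterministic strategy sets, the following sketch enumerates \({\mathcal{P}}_{F}\) and \({\mathcal{P}}_{N}\) and embeds each no-information strategy into the full-information format by copying its four entries into both states. The entry ordering (state first, then the previous outcome CC, CD, DC, DD from the focal player's perspective) and the function names are assumptions chosen for illustration. WSLS is included as an example of a strategy that needs no state information.

```python
from itertools import product

OUTCOMES = ("CC", "CD", "DC", "DD")  # (own previous action, co-player's previous action)


def full_info_pure():
    """P_F: one 0/1 entry p^i_{a a~} per state i in {1, 2} and previous outcome (8 entries)."""
    return list(product((0, 1), repeat=2 * len(OUTCOMES)))


def no_info_pure():
    """P_N before embedding: one 0/1 entry per previous outcome (4 entries)."""
    return list(product((0, 1), repeat=len(OUTCOMES)))


def embed(p_no_info):
    """Represent a no-information strategy as a full-information one with p^1 = p^2."""
    return tuple(p_no_info) + tuple(p_no_info)


P_F = full_info_pure()
P_N = [embed(p) for p in no_info_pure()]
assert set(P_N) <= set(P_F)  # every no-information strategy is also a full-information one
print(len(P_F), len(P_N))  # prints: 256 16

WSLS = (1, 0, 0, 1)  # cooperate after mutual cooperation or mutual defection
print(embed(WSLS))  # prints: (1, 0, 0, 1, 1, 0, 0, 1)
```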
As our baseline, we consider games that are infinitely repeated and in which there is no discounting of the future. Given player 1’s effective memory-1 strategy \({\bf{p}}\) and player 2’s effective strategy \(\tilde{{\bf{p}}}\), such games can be described as a Markov chain. The states of this Markov chain correspond to the eight possible outcomes \(\omega=({s}_{i},\, a,\, \tilde{a})\) of a given round. Here, \({s}_{i}\in \{{s}_{1},{s}_{2}\}\) reflects the environmental state, and \(a,\, \tilde{a}\in \{C,\, D\}\) are player 1’s and player 2’s actions, respectively. The transition probability to move from state \(\omega=({s}_{i},\, a,\, \tilde{a})\) in one round to \({\omega }^{{\prime} }=({s}_{i}^{{\prime} },\, {a}^{{\prime} },\, {\tilde{a}}^{{\prime} })\) in the next round is a product of three factors,
\[{m}_{\omega,{\omega }^{{\prime} }}=x\cdot y\cdot \tilde{y}.\]
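The first factor
\[x=\left\{\begin{array}{ll}{q}_{a\tilde{a}}^{i} & \text{if}\ {s}_{i}^{{\prime} }={s}_{1},\\ 1-{q}_{a\tilde{a}}^{i} & \text{if}\ {s}_{i}^{{\prime} }={s}_{2}\end{array}\right.\]
reflects the probability to move from environmental state \({s}_{i}\) to \({s}_{i}^{{\prime} }\), given the players’ previous actions; here \({q}_{a\tilde{a}}^{i}\) is the probability that the next state is \({s}_{1}\). Since the game is symmetric, \({q}_{DC}^{i}\) is defined to be equal to \({q}_{CD}^{i}\). The other two factors are
\[y=\left\{\begin{array}{ll}{p}_{a\tilde{a}}^{i^{\prime}} & \text{if}\ {a}^{{\prime} }=C,\\ 1-{p}_{a\tilde{a}}^{i^{\prime}} & \text{if}\ {a}^{{\prime} }=D,\end{array}\right.\qquad \tilde{y}=\left\{\begin{array}{ll}{\tilde{p}}_{\tilde{a}a}^{i^{\prime}} & \text{if}\ {\tilde{a}}^{{\prime} }=C,\\ 1-{\tilde{p}}_{\tilde{a}a}^{i^{\prime}} & \text{if}\ {\tilde{a}}^{{\prime} }=D,\end{array}\right.\]
where \(i^{\prime}\) denotes the index of the new state \({s}_{i}^{{\prime} }\).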
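They correspond to the conditional probability that each of the two players chooses the action prescribed in \({\omega }^{{\prime} }\). By collecting all these transition probabilities, we obtain an 8 × 8 transition matrix \(M=({m}_{\omega,{\omega }^{{\prime} }})\). Assuming that players are subject to errors and that the game’s transition vector satisfies q ≠ (1, 1, 1, 0, 0, 0) (for this vector, each state is absorbing, so the long-run outcome would depend on the initial state), this transition matrix has a unique left eigenvector v with eigenvalue one whose entries sum to one. The entries \({v}_{a\tilde{a}}^{i}\) of this eigenvector give the frequency with which players observe the outcome \(\omega=({s}_{i},\, a,\, \tilde{a})\) over the course of the game. For a given transition vector q, we can thus compute the first player’s expected payoff as
\[\pi ({\bf{p}},\tilde{{\bf{p}}})=\sum_{i=1}^{2}\sum_{a,\tilde{a}\in \{C,D\}}{v}_{a\tilde{a}}^{i}\,{u}_{a\tilde{a}}^{i},\tag{10}\]
where \({u}_{a\tilde{a}}^{i}\) denotes player 1’s one-shot payoff in state \({s}_{i}\) when player 1 plays \(a\) and player 2 plays \(\tilde{a}\).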
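The second player’s payoff can be computed analogously. Similarly, the average cooperation rate of the two players is
\[\gamma ({\bf{p}},\tilde{{\bf{p}}})=\sum_{i=1}^{2}\left({v}_{CC}^{i}+\frac{1}{2}\left({v}_{CD}^{i}+{v}_{DC}^{i}\right)\right).\tag{11}\]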
In this work, we focus on games without discounting. However, similar methods can be applied to games in which future payoffs are discounted by a factor of δ (or equivalently, to games with a continuation probability δ). For δ < 1, instead of computing the left eigenvector of the transition matrix, we define v to be the vector
\[{\bf{v}}=(1-\delta )\,{{\bf{v}}}_{0}\,{({I}_{8}-\delta M)}^{-1}.\]
In this expression, \({{\bf{v}}}_{0}\) is the vector that contains the probabilities to observe each of the eight possible outcomes ω in the very first round. Moreover, \({I}_{8}\) is the 8 × 8 identity matrix. As before, the entries of \({\bf{v}}=({v}_{a\tilde{a}}^{i})\) represent a weighted average that describes how often the two players observe each outcome ω over the course of the game62. The payoffs and the average cooperation rate can then again be computed with the formulas in (10) and (11). We use this approach when we explore the impact of the continuation probability δ on the robustness of our results in Fig. S1e, f.
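The payoff calculation can be sketched in plain Python without external libraries. This is a minimal sketch under stated assumptions, not the authors' Matlab implementation: (i) "effective" strategies are modeled by implementation errors that flip each intended action with probability eps; (ii) the one-shot payoffs are donation games with a state-dependent benefit b_i and a common cost c; (iii) strategies are eight-tuples ordered by (state, own previous action, co-player's previous action). Function names and default values are hypothetical. One small linear solver serves both the undiscounted case (the left eigenvector of M, normalized to sum to one) and the discounted case \({\bf{v}}=(1-\delta)\,{{\bf{v}}}_{0}\,{({I}_{8}-\delta M)}^{-1}\). The undiscounted branch relies on a unique stationary distribution, so it is not meant for the excluded vector q = (1, 1, 1, 0, 0, 0).

```python
# Outcomes w = (state, own previous action, co-player's previous action), in a fixed order
STATES = (1, 2)
ACTIONS = ("C", "D")
OMEGA = [(s, a, o) for s in STATES for a in ACTIONS for o in ACTIONS]


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    A = [row[:] for row in A]
    rhs = list(rhs)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (rhs[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def transition_matrix(p8, pt8, q6, eps):
    """8 x 8 matrix with entries m[w][w'] = x * y * y~ (strategies given as 8-tuples)."""
    def effective(prob):
        # assumed error model: the intended action is flipped with probability eps
        return (1 - eps) * prob + eps * (1 - prob)

    p = {w: effective(val) for w, val in zip(OMEGA, p8)}
    pt = {w: effective(val) for w, val in zip(OMEGA, pt8)}
    M = []
    for (s, a, o) in OMEGA:
        pair = "CD" if a + o == "DC" else a + o  # q_DC = q_CD by symmetry
        to_s1 = q6[3 * (s - 1) + ("CC", "CD", "DD").index(pair)]
        row = []
        for (s2, a2, o2) in OMEGA:
            x = to_s1 if s2 == 1 else 1 - to_s1
            y = p[(s2, a, o)] if a2 == "C" else 1 - p[(s2, a, o)]
            yt = pt[(s2, o, a)] if o2 == "C" else 1 - pt[(s2, o, a)]  # player 2's own view
            row.append(x * y * yt)
        M.append(row)
    return M


def outcome_frequencies(M, delta=None, v0=None):
    """delta=None: left eigenvector of M summing to 1; else (1 - delta) v0 (I - delta M)^-1."""
    n = len(M)
    if delta is None:
        A = [[M[j][i] - float(i == j) for j in range(n)] for i in range(n)]  # (M - I)^T
        A[-1] = [1.0] * n  # replace one redundant equation by sum(v) = 1
        return solve(A, [0.0] * (n - 1) + [1.0])
    A = [[float(i == j) - delta * M[j][i] for j in range(n)] for i in range(n)]  # (I - delta M)^T
    return solve(A, [(1 - delta) * val for val in v0])


def payoff_and_cooperation(p8, pt8, q6, b=(2.0, 1.2), c=1.0, eps=0.01, delta=None, v0=None):
    """Player 1's payoff (donation game in each state, assumed) and the mean cooperation rate."""
    v = outcome_frequencies(transition_matrix(p8, pt8, q6, eps), delta, v0)
    payoff = sum(vw * (b[s - 1] * (o == "C") - c * (a == "C")) for vw, (s, a, o) in zip(v, OMEGA))
    coop = sum(vw * ((a == "C") + (o == "C")) / 2 for vw, (s, a, o) in zip(v, OMEGA))
    return payoff, coop


WSLS = (1, 0, 0, 1) * 2  # no-information WSLS embedded in the full-information format
q_fig2_like = (1, 0, 0, 1, 1, 0)  # hypothetical reading of the Fig. 2e-h transitions
print(payoff_and_cooperation(WSLS, WSLS, q_fig2_like))
```

The last lines evaluate WSLS against itself in a hypothetical game that follows the verbal description of Fig. 2e–h (s1 is kept only after mutual cooperation, s2 is left whenever at least one player cooperates), with s1 taken as the more profitable state. Passing delta and v0 switches to the discounted variant.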
Evolutionary dynamics
To model how players learn to adopt new strategies over time, we study a pairwise comparison process55 in the limit of rare mutations63,64,65,66. We consider a population of fixed size N. Initially, all players adopt the same resident strategy \({{\bf{p}}}_{R}\) = ALLD. Then one of the players switches to a randomly chosen alternative strategy \({{\bf{p}}}_{M}\). This mutant strategy may either go extinct or reach fixation, depending on the payoff it yields compared to the resident strategy. If k players adopt the mutant strategy, the expected payoffs of the two strategies are
\[{\pi }_{M}(k)=\frac{(k-1)\,\pi ({{\bf{p}}}_{M},{{\bf{p}}}_{M})+(N-k)\,\pi ({{\bf{p}}}_{M},{{\bf{p}}}_{R})}{N-1},\qquad {\pi }_{R}(k)=\frac{k\,\pi ({{\bf{p}}}_{R},{{\bf{p}}}_{M})+(N-k-1)\,\pi ({{\bf{p}}}_{R},{{\bf{p}}}_{R})}{N-1}.\]
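Based on these payoffs, the fixation probability of the mutant strategy can be computed explicitly43,67,
\[\rho ({{\bf{p}}}_{R},{{\bf{p}}}_{M})={\left(1+\sum_{i=1}^{N-1}\prod_{k=1}^{i}\exp \left[-\beta \left({\pi }_{M}(k)-{\pi }_{R}(k)\right)\right]\right)}^{-1}.\]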
As the selection strength parameter β approaches zero, this fixation probability approaches the neutral probability 1/N, as one may expect. As β increases, the fixation probability is increasingly biased in favor of mutant strategies with a high relative payoff.
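The fixation probability above is straightforward to evaluate numerically. The sketch takes the four pairwise payoffs as plain numbers; in the model they would be computed from the stochastic-game payoffs. Argument names are hypothetical, and the donation-game numbers (b = 2, c = 1) in the example call are purely illustrative. For large β the exponentials can overflow, so a more careful version might accumulate the products in log space.

```python
import math


def mutant_payoffs(k, N, pi_MM, pi_MR, pi_RM, pi_RR):
    """Expected payoffs of a mutant and a resident when k of N players are mutants.

    pi_XY is the payoff of a player using strategy X against a co-player using Y.
    """
    pay_M = ((k - 1) * pi_MM + (N - k) * pi_MR) / (N - 1)
    pay_R = (k * pi_RM + (N - k - 1) * pi_RR) / (N - 1)
    return pay_M, pay_R


def fixation_probability(N, beta, pi_MM, pi_MR, pi_RM, pi_RR):
    """rho = 1 / (1 + sum_{i=1}^{N-1} prod_{k=1}^{i} exp(-beta * (pay_M(k) - pay_R(k))))."""
    total, running_product = 1.0, 1.0
    for i in range(1, N):
        pay_M, pay_R = mutant_payoffs(i, N, pi_MM, pi_MR, pi_RM, pi_RR)
        running_product *= math.exp(-beta * (pay_M - pay_R))
        total += running_product
    return 1.0 / total


print(fixation_probability(100, 0.0, 1.0, -1.0, 2.0, 0.0))  # neutral limit: prints 0.01
```

With β = 0 every factor equals one, so the denominator is N and the result is the neutral value 1/N, matching the limit described in the text.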
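If the mutant fixes, it becomes the new resident strategy. Then another mutant strategy is introduced, which either fixes or goes extinct. By iterating this basic process for τ time steps, we obtain a sequence \(({{\bf{p}}}_{0},{{\bf{p}}}_{1},{{\bf{p}}}_{2},\ldots ,{{\bf{p}}}_{\tau })\), where \({{\bf{p}}}_{t}\) is the resident strategy present in the population after t mutant strategies have been introduced. Based on this sequence, we can calculate the population’s average cooperation rate and payoff as
\[\hat{\gamma }=\lim_{\tau \to \infty }\frac{1}{\tau+1}\sum_{t=0}^{\tau }\gamma ({{\bf{p}}}_{t},{{\bf{p}}}_{t}),\qquad \hat{\pi }=\lim_{\tau \to \infty }\frac{1}{\tau+1}\sum_{t=0}^{\tau }\pi ({{\bf{p}}}_{t},{{\bf{p}}}_{t}).\tag{16}\]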
Because the evolutionary process is ergodic for any finite β, these time averages exist and are independent of the population’s initial composition.
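The iterated invasion process can be simulated directly. In this sketch the caller supplies the strategy list, a fixation-probability function rho(resident, mutant) and a function coop(p) returning the cooperation rate of a homogeneous population; these names are hypothetical stand-ins for the quantities defined above. Mutants are drawn uniformly from the full set, so drawing the resident itself leaves the population unchanged. Only the cooperation average is tracked; the payoff average would be accumulated in the same way.

```python
import random


def simulate_average_cooperation(strategies, rho, coop, tau, resident, rng):
    """Approximate gamma_hat by (1 / (tau + 1)) * sum_{t=0}^{tau} coop(p_t)."""
    total = coop(resident)
    for _ in range(tau):
        mutant = rng.choice(strategies)  # mutant strategy drawn uniformly at random
        if mutant != resident and rng.random() < rho(resident, mutant):
            resident = mutant  # the mutant fixes and becomes the new resident
        total += coop(resident)
    return total / (tau + 1)


coop_of = {"ALLD": 0.0, "ALLC": 1.0}.get
estimate = simulate_average_cooperation(
    ["ALLD", "ALLC"], lambda r, m: 0.5, coop_of, tau=10_000, resident="ALLD", rng=random.Random(3)
)
```

Because the process is ergodic, the estimate for large τ should not depend on the starting resident; a fixed seed only makes a single run reproducible.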
If players can choose among infinitely many strategies, the payoff and cooperation averages in (16) can only be approximated by simulating the process described above for a sufficiently long time τ. However, when strategies are taken from a finite set \({\mathcal{P}}\), these quantities can be computed exactly. In that case, the evolutionary dynamics can again be described as a Markov chain63. Each state of this Markov chain corresponds to one possible resident population \({\bf{p}}\in {\mathcal{P}}\). Given that the current resident population uses \({\bf{p}}\), the probability that the next resident population uses strategy \(\tilde{{\bf{p}}}\ne {\bf{p}}\) is given by \(\rho ({\bf{p}},\tilde{{\bf{p}}})/|{\mathcal{P}}|\). By calculating the invariant distribution \({\bf{w}}=({w}_{{\bf{p}}})\) of this Markov chain, we can compute the average cooperation rates and payoffs according to Eq. (16) by evaluating
\[\hat{\gamma }=\sum_{{\bf{p}}\in {\mathcal{P}}}{w}_{{\bf{p}}}\,\gamma ({\bf{p}},{\bf{p}}),\qquad \hat{\pi }=\sum_{{\bf{p}}\in {\mathcal{P}}}{w}_{{\bf{p}}}\,\pi ({\bf{p}},{\bf{p}}).\]
Herein, we perform these calculations for the specific strategy sets for full information and no information, \({\mathcal{P}}_{F}\) and \({\mathcal{P}}_{N}\), respectively. By comparing the respective averages \({\hat{\gamma }}^{F}\) and \({\hat{\gamma }}^{N}\), we characterize for which stochastic games there is a benefit of information. To this end, we compute \({V}_{\beta }({\bf{q}})={\hat{\gamma }}^{F}-{\hat{\gamma }}^{N}\); positive values indicate a benefit of information, and negative values a benefit of ignorance.
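For a finite strategy set, the exact averages and the resulting \({V}_{\beta }({\bf{q}})\) can be sketched as follows. The functions rho and coop are assumed to be supplied by the caller (for example, built from the payoff and fixation-probability formulas in this section, for a fixed β and q); all names and the toy numbers in the usage lines are hypothetical. Following the text, a transition to a different resident \(\tilde{{\bf{p}}}\) happens with probability \(\rho ({\bf{p}},\tilde{{\bf{p}}})/|{\mathcal{P}}|\), and the diagonal collects the remaining probability.

```python
def stationary_distribution(T):
    """Left eigenvector w of a row-stochastic matrix T with sum(w) = 1 (Gaussian elimination)."""
    n = len(T)
    A = [[T[j][i] - float(i == j) for j in range(n)] for i in range(n)]  # (T - I)^T
    A[-1] = [1.0] * n  # normalization replaces one redundant equation
    rhs = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (rhs[r] - sum(A[r][c] * w[c] for c in range(r + 1, n))) / A[r][r]
    return w


def average_cooperation(strategies, rho, coop):
    """gamma_hat = sum_p w_p * coop(p), where w is invariant for the rare-mutation chain."""
    n = len(strategies)
    T = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(strategies):
        for j, p_new in enumerate(strategies):
            if i != j:
                T[i][j] = rho(p, p_new) / n  # mutant drawn uniformly from the whole set
        T[i][i] = 1.0 - sum(T[i])
    w = stationary_distribution(T)
    return sum(wi * coop(p) for wi, p in zip(w, strategies))


def benefit_of_information(P_F, P_N, rho, coop):
    """V = gamma_hat^F - gamma_hat^N; positive: benefit of information, negative: of ignorance."""
    return average_cooperation(P_F, rho, coop) - average_cooperation(P_N, rho, coop)


# Toy usage with made-up fixation probabilities between two strategies "A" and "B"
toy_rho = {("A", "B"): 0.2, ("B", "A"): 0.05}
toy_coop = {"A": 1.0, "B": 0.0}
print(average_cooperation(["A", "B"], lambda r, m: toy_rho[(r, m)], toy_coop.get))  # approximately 0.2
```

In the toy case, balance between the two strategies gives \({w}_{A}\cdot 0.1={w}_{B}\cdot 0.025\), so \({w}_{A}=0.2\), which is also the average cooperation rate.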
We use this process based on deterministic strategies, pairwise comparisons, and rare mutations for all of our main text figures. As robustness checks, we present several variations of this model in the SI. For example, in Fig. S1a,b, we show simulation results for players with stochastic memory-1 strategies. To this end, we assume that mutant strategies are randomly drawn from the spaces \({\mathcal{S}}_{F}\) and \({\mathcal{S}}_{N}\). To make sure that strategies close to the corners get sufficient weight, the entries \({p}_{a\tilde{a}}^{i}\) are sampled according to an arcsine distribution, as, for example, in Nowak and Sigmund12. Similarly, in Fig. S1h, i, we show simulations for positive mutation rates. In Fig. S2a, b, we compare the results from Fig. 2 to a setup in which players only engage in the game in the first state (without any transitions), or in which they only engage in the game in the second state. In addition, in Fig. S2c, d, we run simulations in which players are unable to condition their behavior on the outcome of the previous round. Finally, to explore whether our qualitative results depend on the specific learning process we use, we have also implemented simulations with an alternative learning process, introspection dynamics68,69,70. The respective results are shown in Fig. S3.
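Sampling from the arcsine distribution is simple because it coincides with the Beta(1/2, 1/2) distribution, whose density \(1/(\pi \sqrt{x(1-x)})\) diverges at 0 and 1; entries close to the corners are therefore drawn more often than under uniform sampling. The sketch uses Python's standard random module; the function name and seed are illustrative, and the eight-entry layout follows the same assumed ordering as a full-information strategy.

```python
import random


def sample_memory_one(n_entries, rng):
    """Draw each cooperation probability independently from Beta(1/2, 1/2), the arcsine law."""
    return tuple(rng.betavariate(0.5, 0.5) for _ in range(n_entries))


rng = random.Random(1)
p_full = sample_memory_one(8, rng)  # a mutant from S_F: one entry per state and previous outcome
p_none = sample_memory_one(4, rng) * 2  # a mutant from S_N, embedded with p^1 = p^2
```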
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The generated simulation data is available at https://github.com/kleshnina/stochgames_info.
Code availability
All numerical computations were performed with Matlab. For some of the symbolic calculations we used Mathematica. The respective code is available at zenodo71 and on GitHub: https://github.com/kleshnina/stochgames_info.
References
Nowak, M. A. Five rules for the evolution of cooperation. Science. 314, 1560–1563 (2006).
Dugatkin, L. A. Cooperation among animals: an evolutionary perspective. (Oxford Univ. Press, 1997).
Melis, A. P. & Semmann, D. How is human cooperation different? Philos. Trans. R. Soc. B. 365, 2663–2674 (2010).
Rand, D. G. & Nowak, M. A. Human cooperation. Trends Cogn. Sci. 17, 413–425 (2013).
Fischbacher, U., Gächter, S. & Fehr, E. Are people conditionally cooperative? Evidence from a public goods experiment. Econ. Lett. 71, 397–404 (2001).
Trivers, R. The evolution of reciprocal altruism. Q. Rev. Biol. 46, 35–57 (1971).
Hilbe, C., Chatterjee, K. & Nowak, M. A. Partners and rivals in direct reciprocity. Nat. Human Behav. 2, 469–477 (2018).
Rapoport, A., Chammah, A. M. & Orwant, C. J. Prisoner’s dilemma: A study in conflict and cooperation. vol. 165. (University of Michigan Press, 1965).
Axelrod, R. The emergence of cooperation among egoists. Am. Political Sci. Rev. 75, 306–318 (1981).
Molander, P. The optimal level of generosity in a selfish, uncertain environment. J. Confl. Resol. 29, 611–618 (1985).
Nowak, M. A. & Sigmund, K. Tit for tat in heterogeneous populations. Nature. 355, 250–253 (1992).
Nowak, M. & Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature. 364, 56–58 (1993).
Kraines, D. P. & Kraines, V. Y. Learning to cooperate with Pavlov an adaptive strategy for the iterated prisoner’s dilemma with noise. Theor. Decis. 35, 107–150 (1993).
van Segbroeck, S., Pacheco, J. M., Lenaerts, T. & Santos, F. C. Emergence of fairness in repeated group interactions. Phys. Rev. Lett. 108, 158104 (2012).
Pinheiro, F. L., Vasconcelos, V. V., Santos, F. C. & Pacheco, J. M. Evolution of all-or-none strategies in repeated public goods dilemmas. PLoS Comput. Biol. 10, e1003945 (2014).
Hilbe, C., Wu, B., Traulsen, A. & Nowak, M. A. Cooperation and control in multiplayer social dilemmas. Proc. Natl Acad. Sci. USA. 111, 16425–16430 (2014).
Stewart, A. J. & Plotkin, J. B. From extortion to generosity, evolution in the iterated prisoner’s dilemma. Proc. Natl Acad. Sci. 110, 15348–15353 (2013).
Stewart, A. J. & Plotkin, J. B. Collapse of cooperation in evolving games. Proc. Natl Acad. Sci. 111, 17558–17563 (2014).
Chica, M., Hernández, J. M. & Bulchand-Gidumal, J. A collective risk dilemma for tourism restrictions under the COVID-19 context. Sci. Rep. 11, 1–12 (2021).
Johnson, T. et al. Slowing COVID-19 transmission as a social dilemma: Lessons for government officials from interdisciplinary research on cooperation. J. Behav. Public Adminis. 3, 1–13 (2020).
Abel, M., Byker, T. & Carpenter, J. Socially optimal mistakes? Debiasing COVID-19 mortality risk perceptions and prosocial behavior. J. Econ. Behav. Org. 183, 456–480 (2021).
Samuelson, C. D. Energy conservation: A social dilemma approach. Soc. Behav. 5, 207–230 (1990).
Van Vugt, M. Central, individual, or collective control? Social dilemma strategies for natural resource management. Am. Behav. Sci. 45, 783–800 (2002).
Cumming, G. S. A review of social dilemmas and social-ecological traps in conservation and natural resource management. Conserv. Lett. 11, e12376 (2018).
Vesely, S., Klöckner, C. A. & Brick, C. Pro-environmental behavior as a signal of cooperativeness: Evidence from a social dilemma experiment. J. Environ. Psychol. 67, 101362 (2020).
Milfont, T. L. Global warming, climate change and human psychology. In Psychology approaches to sustainability: Current trends in theory, research and practice. Vol 19, 42 (Nova Science, 2010).
Tavoni, A., Dannenberg, A., Kallis, G. & Löschel, A. Inequality, communication, and the avoidance of disastrous climate change in a public goods game. Proc. Natl Acad. Sci. 108, 11825–11829 (2011).
Shapley, L. S. Stochastic games. Proc. Natl Acad. Sci. 39, 1095–1100 (1953).
Neyman, A. & Sorin, S. Stochastic games and applications. (Kluwer Academic Press, Dordrecht, 2003).
Barfuss, W., Donges, J. F. & Kurths, J. Deterministic limit of temporal difference reinforcement learning for stochastic games. Phys. Rev. E. 99, 043305 (2019).
Hilbe, C., Simsa, S., Chatterjee, K. & Nowak, M. A. Evolution of cooperation in stochastic games. Nature. 559, 246–249 (2018).
Su, Q., Zhou, L. & Wang, L. Evolutionary multiplayer games on graphs with edge diversity. PLoS Comput. Biol. 15, e1006947 (2019).
Wang, G., Su, Q. & Wang, L. Evolution of state-dependent strategies in stochastic games. J. Theor. Biol. 527, e110818 (2021).
Barrett, S. & Dannenberg, A. Sensitivity of collective action to uncertainty about climate tipping points. Nat. Clim. Change. 4, 36–39 (2014).
Abou Chakra, M., Bumann, S., Schenk, H., Oschlies, A. & Traulsen, A. Immediate action is the best strategy when facing uncertain climate change. Nat. Commun. 9, 1–9 (2018).
Morton, T. A., Rabinovich, A., Marshall, D. & Bretschneider, P. The future that may (or may not) come: How framing changes responses to uncertainty in climate change communications. Glob. Environ. Change 21, 103–109 (2011).
Paarporn, K., Eksin, C. & Weitz, J. S. Information sharing for a coordination game in fluctuating environments. J. Theor. Biol. 454, 376–385 (2018).
Harsanyi, J. C. Games with incomplete information played by “Bayesian” players, I–III Part I. The basic model. Manag. Sci. 14, 159–182 (1967).
Levine, P. & Ponssard, J. P. The values of information in some nonzero sum games. Int. J. Game Theory. 6, 221–229 (1977).
Bagh, A. & Kusunose, Y. On the economic value of signals. The BE Journal of Theoretical Economics. 20(1), (Walter de Gruyter GmbH, 2020).
Hansen, E. A., Bernstein, D. S. & Zilberstein, S. Dynamic programming for partially observable stochastic games. In: AAAI. vol. 4; p. 709–715, (2004).
Barfuss, W. & Mann, R. P. Modeling the effects of environmental and perceptual uncertainty using deterministic reinforcement learning dynamics with partial observability. Phys. Rev. E. 105, 034409 (2022).
Nowak, M. A., Sasaki, A., Taylor, C. & Fudenberg, D. Emergence of cooperation and evolutionary stability in finite populations. Nature. 428, 646–650 (2004).
Wild, G. & Traulsen, A. The different limits of weak selection and the evolutionary dynamics of finite populations. J. Theor. Biol. 247, 382–390 (2007).
Wu, B., Altrock, P. M., Wang, L. & Traulsen, A. Universality of weak selection. Phys. Rev. E. 82, 046106 (2010).
Sigmund, K. The calculus of selfishness. vol. 6. (Princeton University Press, 2010).
van Veelen, M., García, J., Rand, D. G. & Nowak, M. A. Direct reciprocity in structured populations. Proc. Natl Acad. Sci. USA. 109, 9929–9934 (2012).
García, J. & van Veelen, M. In and out of equilibrium I: Evolution of strategies in repeated games with discounting. J. Econ. Theory. 161, 161–189 (2016).
García, J. & van Veelen, M. No strategy can win in the repeated prisoner’s dilemma: Linking game theory and computer simulations. Front. Robot. AI. 5, 102 (2018).
Hilbe, C., Martinez-Vaquero, L. A., Chatterjee, K. & Nowak, M. A. Memory-n strategies of direct reciprocity. Proc. Natl Acad. Sci. USA. 114, 4715–4720 (2017).
Murase, Y. & Baek, S. K. Five rules for friendly rivalry in direct reciprocity. Sci. Rep. 10, 16904 (2020).
Li, J. et al. Evolution of cooperation through cumulative reciprocity. Nat. Comput. Sci. 2, 677–686 (2022).
Boyd, R. Mistakes allow evolutionary stability in the repeated Prisoner’s Dilemma game. J. Theor. Biol. 136, 47–56 (1989).
Brandt, H. & Sigmund, K. The good, the bad and the discriminator - Errors in direct and indirect reciprocity. J. Theor. Biol. 239, 183–194 (2006).
Traulsen, A., Pacheco, J. M. & Nowak, M. A. Pairwise comparison and selection temperature in evolutionary game dynamics. J. Theor. Biol. 246, 522–529 (2007).
Weitz, J. S., Eksin, C., Paarporn, K., Brown, S. P. & Ratcliff, W. C. An oscillating tragedy of the commons in replicator dynamics with game-environment feedback. Proc. Natl Acad. Sci. 113, E7518–E7525 (2016).
Tilman, A. R., Plotkin, J. B. & Akçay, E. Evolutionary games with environmental feedbacks. Nat. Commun. 11, 1–11 (2020).
Wang, X., Zheng, Z. & Fu, F. Steering eco-evolutionary game dynamics with manifold control. Proc. R. Soc. A. 476, 20190643 (2020).
Barfuss, W., Donges, J. F., Vasconcelos, V. V., Kurths, J. & Levin, S. A. Caring for the future can turn tragedy into comedy for long-term collective action under risk of collapse. Proc. Natl Acad. Sci. USA. 117, 12915–12922 (2020).
Hardin, G. The Tragedy of the Commons. Science 162, 1243–1248 (1968).
Stewart, A. J., Parsons, T. L. & Plotkin, J. B. Evolutionary consequences of behavioral diversity. Proc. Natl Acad. Sci. 113, E7003–E7009 (2016).
Hilbe, C., Traulsen, A. & Sigmund, K. Partners or rivals? Strategies for the iterated prisoner’s dilemma. Games Econ. Behav. 92, 41–52 (2015).
Fudenberg, D. & Imhof, L. A. Imitation processes with small mutations. J. Econ. Theory. 131, 251–262 (2006).
Wu, B., Gokhale, C. S., Wang, L. & Traulsen, A. How small are small mutation rates? J. Math. Biol. 64, 803–827 (2012).
Imhof, L. A. & Nowak, M. A. Stochastic evolutionary dynamics of direct reciprocity. Proc. R. Soc. B. 277, 463–468 (2010).
McAvoy, A. Comment on “Imitation processes with small mutations”[J. Econ. Theory 131 (2006) 251–262]. J. Econ. Theory. 159, 66–69 (2015).
Traulsen, A. & Hauert, C. Stochastic evolutionary game dynamics. Rev. Nonlinear Dynam. Complex. 2, 25–61 (2009).
Hauser, O. P., Hilbe, C., Chatterjee, K. & Nowak, M. A. Social dilemmas among unequals. Nature 572, 524–527 (2019).
Couto, M. C., Giaimo, S. & Hilbe, C. Introspection dynamics: A simple model of counterfactual learning in asymmetric games. N. J. Phys. 24, 063010 (2022).
Ramírez, M. A., Smerlak, M., Traulsen, A. & Jost, J. Diversity enables the jump towards cooperation for the traveler’s dilemma. Sci. Rep. 13, 1441 (2023).
Kleshnina, M., Hilbe, C., Šimsa, S., Chatterjee, K. & Nowak, M. The effect of environmental information on evolution of cooperation in stochastic games; (2023). Available from: https://zenodo.org/badge/latestdoi/417090802.
Acknowledgements
This work was supported by the European Research Council CoG 863818 (ForM-SMArt) (to K.C.), the European Research Council Starting Grant 850529: E-DIRECT (to C.H.), the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Grant Agreement #754411 and the French Agence Nationale de la Recherche (under the Investissement d’Avenir programme, ANR-17-EURE-0010) (to M.K.).
Author information
Contributions
All authors conceived and discussed the study; S.S. ran some preliminary simulations; M.K. and C.H. analyzed the model, conducted further simulations, and wrote the first draft of the manuscript; M.K., C.H., S.S., M.N. and K.C. discussed the results and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Ceyhun Eksin, Alexander Stewart and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kleshnina, M., Hilbe, C., Šimsa, Š. et al. The effect of environmental information on evolution of cooperation in stochastic games. Nat Commun 14, 4153 (2023). https://doi.org/10.1038/s41467-023-39625-9
DOI: https://doi.org/10.1038/s41467-023-39625-9