Abstract
Mitigating climate change effects involves strategic decisions by individuals who may choose to limit their emissions at a cost. Everyone shares the ensuing benefits and thereby individuals can free ride on the effort of others, which may lead to the tragedy of the commons. For this reason, climate action can be conveniently formulated in terms of Public Goods Dilemmas, often assuming that a minimum collective effort is required to ensure any benefit, and that decision-making may be contingent on the risk associated with future losses. Here we investigate the impact of reward and punishment in this type of collective endeavor, coined collective-risk dilemmas, by means of a dynamic, evolutionary approach. We show that rewards (positive incentives) are essential to initiate cooperation, mostly when the perception of risk is low. On the other hand, we find that sanctions (negative incentives) are instrumental to maintain cooperation. Altogether, our results are gratifying, given the a priori limitations of effectively implementing sanctions in international agreements. Finally, we show that whenever collective action is most challenging to succeed, the best results are obtained when both rewards and sanctions are synergistically combined into a single policy.
Introduction
Climate change stands as one of our biggest challenges in what concerns the emergence and sustainability of cooperation^{1,2}. Indeed, world citizens build up high expectations every time a new International Environmental Summit is convened, unfortunately with few resulting solutions implemented so far. This calls for the development of more effective incentives, agreements and binding mechanisms. The problem can be conveniently framed resorting to the mathematics of game theory, being a paradigmatic example of a Public Goods Game^{3}: at stake there is a global good from which every single individual can profit, irrespective of contributing to maintain it. Parties may free ride on the efforts of others, avoiding any effort themselves, while driving the population into the tragedy of the commons^{4}. Moreover, since here cooperation aims at averting collective losses, this type of dilemma is often referred to as a public bad game, in which achieving collective goals often depends on reaching a threshold number of cooperative group members^{5,6,7,8}.
One of the multiple obstacles attributed to such agreements is misperceiving the actual risk of future losses, which significantly affects the ensuing dynamics of cooperation^{5,9}. Another problem relates to the incapacity to sanction those who do not contribute to the welfare of the planet, and/or to reward those who subscribe to green policies^{10}. Previous cooperation studies show that reward (positive incentives), punishment (negative incentives) and the combination of both^{11,12,13,14,15,16,17,18,19,20,21,22,23} have a different impact depending on the dilemma in place. Assessing the impact of reward and punishment (isolated or combined) in the context of N-person threshold games, and in the particular case of climate change dilemmas, remains, however, an open problem.
Here we study, theoretically, the role of both institutional reward and punishment in the context of climate change agreements. Previous works consider the public good as a linear function of the number of contributors^{12,17,21,22} and conclude that punishment is more effective than reward (for an optimal combination of punishment and reward see ref.^{12}). We depart from this linear regime by modeling the returns on the public good as a threshold problem, combined with an uncertain outcome, represented by a risk of failure. As a result, and as detailed below, the dynamical portrait of our model reveals new internal equilibria^{9}, allowing us to identify the dynamics of coordination and coexistence typifying collective action problems. As discussed below, the reward and punishment mechanisms will impact, in a nontrivial way, those equilibria.
We consider a population of size Z, where each individual can be either a Cooperator (C) or a Defector (D), when participating in an N-player Collective-Risk Dilemma (CRD)^{5,9,10,24,25,26,27,28,29,30}. In this game, each participant starts with an initial endowment B (viewed as the asset value at stake) that may be used to contribute to the mitigation of the effects of climate change. A cooperator incurs a cost corresponding to a fraction c of her initial endowment B, in order to help prevent a collective failure. On the other hand, a defector refuses to incur any cost, hoping to free ride on the contributions of others. We require a minimum number of 0 < M ≤ N cooperators in a group of size N before collective action is realized; if a group of size N does not contain at least M Cs, all members lose their remaining endowments with a probability r, where r (0 ≤ r ≤ 1) stands as the risk of collective failure. Otherwise, everyone will keep whatever she has. This CRD formulation has been shown to capture some of the key features discovered in recent experiments^{5,24,31,32,33}, while highlighting the importance of risk. In addition, it allows one to test model parameters in a systematic way that is not possible in human experiments. Moreover, the adoption of nonlinear returns mimics situations common to many human and nonhuman endeavors^{6,34,35,36,37,38,39,40,41}, where a minimum joint effort is required to achieve a collective goal. Thus, the applicability of this framework extends well beyond environmental governance, given the ubiquity of this type of social dilemma in nature and societies.
Following Chen et al.^{12}, we include both reward and punishment mechanisms in this model. A fixed group budget Nδ (where δ ≥ 0 stands for a per-capita incentive) is assumed to be available, of which a fraction w is applied to a reward policy and the remaining 1 − w to a punishment policy. We assume the effective impact of both policies to be equivalent, meaning that each unit spent will directly increase/decrease the payoff of a cooperator/defector by the same amount. For details on policies with different efficiencies, see Methods.
Instead of considering a collection of rational agents engaging in one-shot Public Goods Games^{32,42}, here we adopt an evolutionary description of the behavioral dynamics^{9}, in which individuals tend to copy those appearing to be more successful. Success (or fitness) of individuals is here associated with their average payoff. All individuals are equally likely to interact with each other, causing all cooperators and defectors to be equivalent, on average, and only distinguishable by the strategy they adopt. Therefore, and considering that only two strategies are available, the number of cooperators is sufficient to describe any configuration of the population. The number of individuals adopting a given strategy (either C or D) evolves in time according to a stochastic birth–death process^{43,44}, which describes the time evolution of the social learning dynamics (with exploration): at each time step each individual (X, with fitness f_{X}) is given the opportunity to change strategy; with probability μ, X randomly explores the strategy space^{45} (a process similar to mutations in a biological context that precludes the existence of absorbing states). With probability (1 − μ), X may adopt the strategy of a randomly selected individual (Y, with fitness f_{Y}), with a probability that increases with the fitness difference (f_{Y} − f_{X})^{44}. This renders the stationary distribution (see Methods) an extremely useful tool to rank the most visited states given the ensuing evolutionary dynamics of the population. Indeed, the stationary distribution provides the prevalence of each of the population's possible configurations, in terms of the number of Cs (k) and Ds (Z − k). Combined with the probability of success characterizing each configuration, the stationary distribution can be used to compute the overall success probability of a given population, the average group achievement η_{G}. This value represents the average fraction of groups that will overcome the CRD, successfully preserving the public good.
Results
In Fig. 1 we compare the average group achievement η_{G} (as a function of risk) in four scenarios: (i) a reference scenario without any policy (i.e., no reward or punishment, in black); and three scenarios where a budget is applied to (ii) rewards, (iii) punishment and (iv) a combination of rewards and sanctions (see below). Our results are shown for the two most paradigmatic regimes: low (Fig. 1A) and high (Fig. 1B) coordination requirements. Naturally, η_{G} improves whenever a policy is applied. Less obvious is the difference between the various policies. Applying only rewards (blue curves in Fig. 1) is more effective than applying only punishment (red curves) for low values of risk. The opposite happens when risk is high. In scenarios with a low relative threshold (Fig. 1A), rewards play the key role, with sanctions only marginally outperforming them for very high values of risk. For high coordination thresholds (Fig. 1B), reward and punishment show comparable efficiency in the promotion of cooperation, with pure-Punishment (w = 0) performing slightly better than pure-Reward (w = 1).
Justifying these differences is difficult from the analysis of η_{G} alone. To better understand the behavior dynamics under Reward and Punishment, we show in Fig. 2 the gradients of selection (top panels) and stationary distributions (lower panels) for each case and different budget values. Each gradient of selection represents, for each discrete state k/Z (i.e., fraction of Cs), the difference \(G(k)={T}^{+}(k)-{T}^{-}(k)\) between the probabilities to increase (T^{+}(k)) and decrease (T^{−}(k)) the number of cooperators by one (see Methods). Whenever G(k) > 0 the fraction of Cs is likely to increase; whenever G(k) < 0 the opposite is expected to happen. The stationary distributions show how likely it is to find the population in each (discrete) configuration of our system. The panels on the left-hand side show the results obtained for the CRD under pure-Reward; on the right-hand side, we show the results obtained for pure-Punishment.
Naturally, both mechanisms are inoperative whenever the per-capita incentives are absent (δ = 0), creating a natural reference scenario in which to study the impact of Reward and Punishment on the CRD. In this case, above a certain value of risk (r), decision-making is characterized by two internal equilibria (i.e., adjacent finite population states with opposite gradient sign, representing the analogue of fixed points in a dynamical system characterizing evolution in infinite populations). Above a certain fraction of cooperators the population overcomes the coordination barrier and naturally self-organizes towards a stable coexistence of cooperators and defectors. Otherwise, the population is condemned to evolve towards a monomorphic population of defectors, leading to the tragedy of the commons^{9}. As the budget for incentives increases, using either Reward or Punishment leads to very different outcomes, as depicted in Fig. 2.
Contrary to the case of linear Public Goods Games^{12}, in the CRD coordination and coexistence dynamics already exist in the absence of any reward/punishment incentive. Reward is particularly effective when cooperation is low (small k/Z), showing a significant impact on the location of the finite population analogue of an unstable fixed point. Indeed, increasing δ lowers the minimum number of cooperators required to reach the cooperative basin of attraction (as well as increasing the prevalence of cooperators at the coexistence point on the right), a barrier which ultimately disappears for high δ (Fig. 2A). This means that a smaller coordination effort is required before the population dynamics start to naturally favor the increase of cooperators. Once this initial barrier is surpassed, the population will naturally tend towards an equilibrium state, which does not improve appreciably under Reward. The opposite happens under Punishment. The location of the coordination point is little affected, yet once this barrier is overcome, the population will evolve towards a more favorable equilibrium (Fig. 2B). Thus, while Reward seems to be particularly effective to bootstrap cooperation towards a more cooperative basin of attraction, Punishment seems effective in sustaining high levels of cooperation.
As a consequence, the most frequently observed configurations are very different when using each of the policies. As shown by the stationary distributions (Fig. 2C,D), under Reward the population more often visits states with intermediate values of cooperation (i.e., where Cs and Ds coexist). Intuitively, this happens because the coordination effort is eased by the rewards, causing the population to effectively overcome it and reach the coexistence point (the equilibrium state with an intermediate number of cooperators), thus spending most of the time near it. On the other hand, Punishment will not ease the coordination effort, and thus the population will spend most of the time in states of low cooperation, failing to overcome this barrier. Notwithstanding, once the barrier is surpassed, the population will stabilize on higher states of cooperation. This is especially evident for high budgets, as shown with δ = 0.02 (blue line). Moreover, since Nδ corresponds to a fixed total amount which is distributed among the existing cooperators/defectors, the per-cooperator/per-defector budget varies with the number of existing cooperators/defectors (i.e., each of the j cooperators receives wδN/j and each defector loses (1 − w)δN/(N − j)). In other words, positive (negative) incentives become very profitable (or severe) if defection (cooperation) prevails within a group. In particular, whenever the budget is significant (see, e.g., δ = 0.02 in Fig. 2) the punishment becomes so high when there are few defectors within a group that a new equilibrium emerges close to full cooperation.
The results in Fig. 2 show that Reward can be instrumental in fostering prosocial behavior, while Punishment can be used for its maintenance. This suggests that, to combine both policies synergistically, pure-Reward (w = 1) should be applied at first, when there are few cooperators (low k/Z); above a certain critical point (k/Z = s) one should switch to pure-Punishment (w = 0). In the Methods section, we demonstrate that, similar to linear Public Goods Games^{12}, in CRDs this is indeed the policy which minimizes the advantage of the defector, even if we consider the alternative possibility of applying both policies simultaneously. In Methods, we also compute a general expression for the optimal switching point s*, that is, the value of k above which Punishment should be applied instead of Reward to maximize cooperation and group achievement. By using such a policy, which we denote by s*, we obtain the best results, shown with an orange line in Fig. 1. We propose, however, to explore what happens in the context of a CRD when s* is not used. How much cooperation is lost when we deviate from s* to either of the pure policies, or to a policy which uses a switching point different from the optimal one?
Figure 3 illustrates how the choice of the switching point s impacts the overall cooperation, as evaluated by η_{G}, for different values of risk. For a switching point of s = k/Z = 1.0 (0.0) a static policy of always pure-Reward (pure-Punishment) is used. This can be seen on the far right (left) of Fig. 3. Figure 3 suggests that, for low thresholds, an optimal policy switching (which, for the parameters shown, occurs for s = 50%, see Methods) is only marginally better than a policy solely based on rewards (s = 1). Figure 3 also allows for a comparison of what happens when the switching point occurs too late (excessive rewards) or too early (excessive sanctions) in a low-threshold scenario. A late switch is significantly less harmful than an early one. In other words, our results suggest that when the population configuration cannot be precisely observed, it is preferable to keep rewarding for longer. That said, whenever the perception of risk is high (an unlikely situation these days) an early switch is slightly less harmful than a late one. In the most difficult scenarios, where stringent coordination requirements (large M) are combined with a low perception of risk (low r), the adoption of a combined policy becomes necessary (see right panel of Fig. 1).
Discussion
One might expect the impact of Reward and Punishment to lead to symmetric outcomes: Punishment would be effective for high cooperation in the same way that Reward is effective for low cooperation. In low-cooperation scenarios (under low risk, threshold or budget) Reward alone plays the most important role. However, in the opposite scenario, Punishment alone does not have the same impact. Either a favourable scenario occurs, where any policy yields a satisfying result, or Punishment cannot improve outcomes on its own. In the latter case, the synergy between both policies becomes essential to achieve cooperation. Such an optimal policy involves a combination of the single policies, Reward and Punishment, which is dynamic, in the sense that the combination does not remain the same for all configurations of the population. It corresponds to employing pure Reward at first, when cooperation is low, switching subsequently to Punishment whenever a predetermined level of cooperation is reached.
The optimal procedure, however, is unlikely to be realistic in the context of Climate Change agreements. Indeed, and unlike other Public Goods Dilemmas, where Reward and Punishment constitute the main policies available for Institutions to foster cooperative collective action, in International Agreements it is widely recognized that Punishment is very difficult to implement^{2,42}. This has been, in fact, one of the main criticisms put forward in connection with Global Agreements on Climate Mitigation: they suffer from the lack of sanctioning mechanisms, as it is practically impossible to enforce any type of sanctioning at a Global level. In this sense, the results obtained here by means of our dynamical, evolutionary approach are gratifying, given these a priori limitations of sanctioning in CRDs. Not only do we show that Reward is essential to foster cooperation, mostly when both the perception of risk is low and the overall number of engaged parties is small (low k/Z), but we also show that Punishment mostly acts to sustain cooperation, after it has been installed. Given that low-risk scenarios are more common and harmful to cooperation than high-risk ones, our results in connection with rewards provide a viable way to explore in the quest for establishing Global cooperative collective action. Reward policies may also be very relevant in scenarios where Climate Agreements are coupled with other International agreements from which parties are not interested in deviating^{2,42}. Finally, the fact that rewards ease coordination towards cooperative states suggests that positive incentives should also be used within intervention mechanisms aiming at fostering prosociality in artificial systems and hybrid populations comprising humans and machines^{46,47,48,49}.
The model used takes for granted the existence of an institution with a budget available to implement either Reward or Punishment. New behaviours may emerge once individuals are called to decide whether or not to contribute to such an institution, allowing for a scenario where this institution fails to exist^{10,28,50,51}. At present, and under the Paris agreement, we are witnessing the potential birth of an informal funding institution, whose goal is to finance developing countries to help them increase their mitigation capacity. Clearly, this is just an example pointing to the fact that the prevalence of local and global institutional incentives may depend on, and be influenced by, the distribution of wealth available among parties, in the same way that it influences the actual contributions to the public good^{10,29,33}. Finally, several other effects may further influence the present results. Among others, if intermediate tasks are considered^{33}, or if individuals have the opportunity to pledge their contribution before their actual action^{7,40,52}, it is likely that prosocial behavior may be enhanced. Work along these lines is in progress.
Methods
Public goods and collective risks
Let us consider a population with Z individuals, where each individual can be a cooperator (C) or a defector (D). For each round of this game, a group of N players is sampled from the original finite population of size Z, which corresponds to a process of sampling without replacement. The probability of a group comprising any possible combination of Cs and Ds is given by the hypergeometric distribution. In the context of a given group, a strategy is associated with a payoff value corresponding to an individual's earnings in that round, which depend on the actions of the rest of the group. Fitness is the expected payoff of an individual in a population, before knowing to which group she was assigned. This way, for a population with k out of Z Cs and each group containing j out of N Cs, the fitness of a D and a C can be written as:
\[{f}_{D}(k)={\binom{Z-1}{N-1}}^{-1}\sum_{j=0}^{N-1}\binom{k}{j}\binom{Z-k-1}{N-j-1}{\Pi }_{D}(j)\tag{1}\]
\[{f}_{C}(k)={\binom{Z-1}{N-1}}^{-1}\sum_{j=0}^{N-1}\binom{k-1}{j}\binom{Z-k}{N-j-1}{\Pi }_{C}(j+1)\tag{2}\]
where \({\Pi }_{C}(j)\) and \({\Pi }_{D}(j)\) stand for the payoff of a C and a D in a single round, in a group with N players and j Cs. To define the payoff functions, let \(\theta (x)\) be the Heaviside step function, where θ(x) = 0 if x < 0 and θ(x) = 1 if x ≥ 0. Each player can contribute a fraction c of her endowment B (with 0 ≤ c ≤ 1), and in case a group contains fewer than M cooperators (0 < M ≤ N) there is a risk r of failure (0 ≤ r ≤ 1), in which case no player obtains her remaining endowment. The payoff of a defector (\({\Pi }_{D}(j)\)) and the payoff of a cooperator (\({\Pi }_{C}(j)\)), before incorporating any policy, can be written as^{9}:
\[{\Pi }_{D}(j)=B\{\theta (j-M)+(1-r)[1-\theta (j-M)]\}\tag{3}\]
\[{\Pi }_{C}(j)={\Pi }_{D}(j)-cB\tag{4}\]
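As a minimal sketch, the payoff and fitness definitions of this subsection can be expressed in Python as follows. The function names and the default parameter values (B = 1, r = 0.5, c = 0.1) are illustrative assumptions, not values taken from the figures; the fitness sums assume that a focal cooperator meets j cooperators among the other N − 1 group members drawn from k − 1 cooperators, and a focal defector meets j cooperators drawn from k.

```python
from math import comb

def payoff_d(j, M, B=1.0, r=0.5):
    """Payoff of a defector in a group with j cooperators (no incentives)."""
    # Heaviside step: the group succeeds when it holds at least M cooperators.
    success = 1.0 if j >= M else 0.0
    # On failure the endowment survives only with probability 1 - r.
    return B * (success + (1.0 - r) * (1.0 - success))

def payoff_c(j, M, B=1.0, r=0.5, c=0.1):
    """Payoff of a cooperator: the defector payoff minus the contribution c*B."""
    return payoff_d(j, M, B, r) - c * B

def fitness(k, Z, N, M, B=1.0, r=0.5, c=0.1):
    """Return (f_C, f_D) for a population with k cooperators out of Z.

    f_C is only meaningful for k >= 1 and f_D for k <= Z - 1; the
    undefined one is returned as 0.0.
    """
    norm = comb(Z - 1, N - 1)
    f_c = f_d = 0.0
    for j in range(N):  # j = cooperators among the other N - 1 group members
        if k >= 1:
            f_c += comb(k - 1, j) * comb(Z - k, N - 1 - j) * payoff_c(j + 1, M, B, r, c)
        if k <= Z - 1:
            f_d += comb(k, j) * comb(Z - k - 1, N - 1 - j) * payoff_d(j, M, B, r)
    return f_c / norm, f_d / norm
```

With r = 0 nothing is ever lost, so a call such as `fitness(10, 50, 6, 3, r=0.0)` should return a cooperator fitness of B − cB and a defector fitness of B, which is a convenient sanity check on the hypergeometric weights.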
Reward and punishment
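To include a Reward or a Punishment policy, let us follow ref.^{12} and consider a group budget Nδ which can be used to implement any type of policy. The fraction of Nδ applied to Reward is represented by the weight w, with 0 ≤ w ≤ 1. Parameters a and b correspond to the efficiency of Reward and Punishment (for all Figures above it was assumed that a = b = 1). Since the reward budget is shared among the j cooperators of a group and the punishment budget among its N − j defectors, the payoffs under a given policy read:
\[{\Pi }_{D}^{P}(j)={\Pi }_{D}(j)-\frac{b(1-w)N\delta }{N-j}\tag{5}\]
\[{\Pi }_{C}^{R}(j)={\Pi }_{C}(j)+\frac{awN\delta }{j}\tag{6}\]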
Naturally, these new payoff functions can be included into the previous fitness functions (\({\Pi }_{D}^{P}\) replaces \({\Pi }_{D}\) and \({\Pi }_{C}^{R}\) replaces \({\Pi }_{C}\)), letting fitness values account for the different policies.
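The following sketch illustrates how the fixed group budget Nδ translates into per-player incentives, assuming (as described in Results) that each of the j cooperators receives awδN/j and each of the N − j defectors loses b(1 − w)δN/(N − j). The function names and the example numbers are hypothetical.

```python
def payoff_c_reward(base_c, j, N, delta, w, a=1.0):
    """Cooperator payoff once the reward share w*N*delta is split among j cooperators."""
    # Requires j >= 1: a cooperator can only be rewarded if she exists.
    return base_c + a * w * N * delta / j

def payoff_d_punish(base_d, j, N, delta, w, b=1.0):
    """Defector payoff once the fine (1-w)*N*delta is split among N-j defectors."""
    # Requires j <= N - 1: there must be at least one defector to fine.
    return base_d - b * (1.0 - w) * N * delta / (N - j)

# Pure Reward (w = 1) with only two cooperators in a group of ten:
rewarded = payoff_c_reward(0.9, j=2, N=10, delta=0.02, w=1.0)
# Pure Punishment (w = 0) with a single defector left in the group:
punished = payoff_d_punish(1.0, j=9, N=10, delta=0.02, w=0.0)
```

In this toy setting the lone defector bears the whole budget Nδ, which is the mechanism behind the new equilibrium close to full cooperation reported for large δ.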
Evolutionary dynamics in finite populations
The fitness functions written above allow us to set up the (discrete time) evolutionary dynamics. Indeed, the configurations of the entire population may be used to define a Markov Chain, where each state is characterized by the number of cooperators^{9,44}. To decide in which direction the system will evolve, at each step a player i and one of her neighbours j are drawn at random from the population. Player i decides whether to imitate her neighbour j with a probability depending on the difference between their fitness^{43,44}. This way, a system with k cooperators may stay in the same state, switch to k − 1 or to k + 1. The probability of player i imitating player j can be given by the Fermi function:
\[p={[1+{e}^{-\beta ({f}_{j}-{f}_{i})}]}^{-1}\]
where β is the intensity of selection. Using this probability distribution, we can fully characterize this Markov process. Let k be the total number of cooperators in the population and Z the total size of the population. \({T}^{+}(k)\) and \({T}^{-}(k)\) are the probabilities to increase and decrease k by one, respectively^{44}:
\[{T}^{\pm }(k)=\frac{k}{Z}\frac{Z-k}{Z}{[1+{e}^{\mp \beta ({f}_{C}-{f}_{D})}]}^{-1}\]
The most likely direction can be computed using the difference \(G(k)\equiv {T}^{+}(k)-{T}^{-}(k)\). A mutation rate can be introduced by using transition probabilities \({T}_{\mu }^{+}(k)=(1-\mu ){T}^{+}(k)+\mu \frac{Z-k}{Z}\) and \({T}_{\mu }^{-}(k)=(1-\mu ){T}^{-}(k)+\mu \frac{k}{Z}\). In all cases we used a mutation rate μ = 0.01, thereby preventing the population from fixating in a monomorphic configuration. In this context, the stationary distribution becomes a very useful tool to analyse the overall population dynamics, providing the probability \({\bar{p}}_{k}=P(\frac{k}{Z})\) for each of the Z + 1 states of this Markov Chain to be occupied^{53,54}. For each given population state k, the hypergeometric distribution can be used to compute the average fraction of groups that obtain success, a_{G}(k). Using the stationary distribution and the average group success, the average group achievement (η_{G}) can then be computed, providing the overall probability of achieving success: \({\eta }_{G}=\sum_{k=0}^{Z}{\bar{p}}_{k}{a}_{G}(k)\).
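The chain described in this subsection can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the fitness difference f_C(k) − f_D(k) is passed in as a hypothetical callable `gap_of`, the value β = 5 is arbitrary, the stationary distribution is obtained from the detailed-balance recursion that holds for any one-step (birth–death) chain, and a_G(k) is taken to be the hypergeometric probability that a sampled group holds at least M cooperators.

```python
from math import comb, exp

def transitions(k, Z, gap, beta=5.0, mu=0.01):
    """Return (T_mu^+, T_mu^-) for a state with k cooperators; gap = f_C - f_D."""
    pair = (k / Z) * ((Z - k) / Z)            # one C and one D are drawn
    t_plus = pair / (1.0 + exp(-beta * gap))  # the D imitates the C
    t_minus = pair / (1.0 + exp(beta * gap))  # the C imitates the D
    # Exploration: with probability mu a player switches strategy at random.
    t_plus = (1.0 - mu) * t_plus + mu * (Z - k) / Z
    t_minus = (1.0 - mu) * t_minus + mu * k / Z
    return t_plus, t_minus

def stationary_distribution(Z, gap_of, beta=5.0, mu=0.01):
    """Stationary distribution over k = 0..Z (requires mu > 0)."""
    t = [transitions(k, Z, gap_of(k), beta, mu) for k in range(Z + 1)]
    weights = [1.0]
    for k in range(Z):
        # Detailed balance: p_{k+1} * T^-(k+1) = p_k * T^+(k)
        weights.append(weights[-1] * t[k][0] / t[k + 1][1])
    total = sum(weights)
    return [x / total for x in weights]

def group_success(k, Z, N, M):
    """a_G(k): fraction of sampled groups with at least M cooperators."""
    return sum(comb(k, j) * comb(Z - k, N - j) for j in range(M, N + 1)) / comb(Z, N)

def group_achievement(Z, N, M, gap_of, beta=5.0, mu=0.01):
    """eta_G: stationary average of the group success a_G(k)."""
    p = stationary_distribution(Z, gap_of, beta, mu)
    return sum(p[k] * group_success(k, Z, N, M) for k in range(Z + 1))
```

Under neutral selection, `stationary_distribution(20, lambda k: 0.0)` is symmetric around k = Z/2, since the two transition probabilities then mirror each other; supplying the fitness difference of the CRD with incentives in place of the constant gap gives the quantity plotted as η_{G}.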
Combined policies
By allowing the weight w to depend on the frequency of cooperators, we can derive the optimal switching point s* between positive and negative incentives by minimizing the defector’s advantage (f_{D} − f_{C}). This is done similarly to ref.^{12}, but using finite populations and therefore a hypergeometric distribution (see Eqs (1), (2), (5), and (6)), to account for sampling without replacement. From Eqs (1) and (2), we get
from which we aim at finding the value of w (with respect to k) that minimizes F′ = f_{D} − f_{C}. Since \({\Pi }_{D}(j)\), \({\Pi }_{C}(j+1)\) and c do not depend on w, these quantities do not affect the choice of the optimal w, leaving us with the problem of minimizing the following expression:
Since \(\binom{k}{j}=\binom{k-1}{j}\frac{k}{k-j}\) and \(\binom{Z-1-k}{N-1-j}=\binom{Z-k}{N-1-j}\frac{Z-k-(N-1-j)}{Z-k},\)
The second summation does not depend on w; thus the optimal policy is given by the minimization of:
Since N and δ are always positive, the whole expression can be divided by Nδ without changing the optimization problem. Moreover, by multiplying the expression by (−1), it can finally be shown that minimizing f_{D} − f_{C} is equivalent to maximizing the following expression:
where j represents the number of Cs in a group of size N, sampled without replacement from a population of size Z containing k Cs. Now, let us consider that the optimal switching point s* depends on k. Since this sum decreases as k increases, containing only one root, the solution to this optimization problem corresponds to having w set to 1 (pure Reward) for positive values of the sum, suddenly switching to w = 0 (pure Punishment) once the sum becomes negative. The optimal switching point s* depends on the ratio \(\frac{a}{b}\), group size N and population size Z. The effect of population size (Z) and group size (N) on s* is limited, while the impact of the efficiency of reward (a) and punishment (b) is illustrated in Fig. 4. For \(\frac{a}{b}=1\) the switching point is s* = 0.5 (see Fig. 4). Interestingly, we note that, also in the CRD, s* is not impacted by the group success threshold (M) or the risk associated with losing the retained endowment when collective success is not attained (r). This is the case as we assume that the decision to punish or reward is independent of M or r. Notwithstanding, the model that we present can, in the future, be tuned to test more sophisticated incentive tools, such as rewarding or punishing depending on (i) how far group contributions remained from (or surpassed) the minimum required to achieve group success or (ii) how soft/strict the dilemma at stake is, given the likelihood of losing everything when collective success is not accomplished.
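A numerical sketch of this switching rule is given below. It assumes the incentive terms described earlier (each cooperator gains awδN/j, each defector loses b(1 − w)δN/(N − j)), so that f_C − f_D is linear in w and the sign of its w-derivative decides between pure Reward and pure Punishment. The function names and the tolerance `tol` are hypothetical choices, and the common factor Nδ is divided out.

```python
from math import comb

def incentive_balance(k, Z, N, a=1.0, b=1.0):
    """Derivative of f_C - f_D with respect to w, divided by N*delta.

    Positive values favour Reward (w = 1), negative values Punishment (w = 0).
    Valid for 1 <= k <= Z - 1.
    """
    norm = comb(Z - 1, N - 1)
    reward = punish = 0.0
    for j in range(N):  # j = cooperators among the other N - 1 group members
        # Focal cooperator: the reward is shared among j + 1 cooperators.
        reward += comb(k - 1, j) * comb(Z - k, N - 1 - j) * a / (j + 1)
        # Focal defector: the fine is shared among N - j defectors.
        punish += comb(k, j) * comb(Z - k - 1, N - 1 - j) * b / (N - j)
    return (reward - punish) / norm

def optimal_switch(Z, N, a=1.0, b=1.0, tol=1e-12):
    """Smallest fraction k/Z at which Reward stops being strictly preferable."""
    for k in range(1, Z):
        if incentive_balance(k, Z, N, a, b) <= tol:
            return k / Z
    return 1.0

s_equal = optimal_switch(50, 10)                   # a = b
s_reward_heavy = optimal_switch(50, 10, a=2.0, b=1.0)
```

For a = b the two sums mirror each other under k → Z − k, so the balance vanishes at k = Z/2 and `s_equal` evaluates to 0.5, in line with the value quoted above; a more efficient reward (a > b) pushes the switch to a later point. Neither M nor r enters the computation.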
References
 1.
Barrett, S. Selfenforcing international environmental agreements. Oxford Economic Papers 46, 878–894 (1994).
 2.
Barrett, S. Why cooperate?: the incentive to supply global public goods. (Oxford UP, 2007).
 3.
Dreber, A. & Nowak, M. A. Gambling for Global Goods. Proc Natl Acad Sci USA 105, 2261–2262 (2008).
 4.
Hardin, G. The Tragedy of the Commons. Science 162, 1243 (1968).
 5.
Milinski, M., Sommerfeld, R. D., Krambeck, H. J., Reed, F. A. & Marotzke, J. The collectiverisk social dilemma and the prevention of simulated dangerous climate change. Proc Natl Acad Sci USA 105, 2291–2294 (2008).
 6.
Pacheco, J. M., Santos, F. C., Souza, M. O. & Skyrms, B. Evolutionary dynamics of collective action in Nperson stag hunt dilemmas. Proc R Soc Lond B 276, 315–321 (2009).
 7.
Tavoni, A., Dannenberg, A., Kallis, G. & Löschel, A. Inequality, communication and the avoidance of disastrous climate change in a public goods game. Proc Natl Acad Sci USA 108, 11825–11829 (2011).
 8.
Bosetti, V., Heugues, M. & Tavoni, A. Luring others into climate action: coalition formation games with threshold and spillover effects. Oxford Economic Papers 69, 410–431 (2017).
 9.
Santos, F. C. & Pacheco, J. M. Risk of collective failure provides an escape from the tragedy of the commons. Proc Natl Acad Sci USA 108, 10421–10425 (2011).
 10.
Vasconcelos, V. V., Santos, F. C. & Pacheco, J. M. A bottomup institutional approach to cooperative governance of risky commons. Nat. Clim. Change 3, 797–801 (2013).
 11.
Sigmund, K., Hauert, C. & Nowak, M. A. Reward and punishment. Proc. Natl. Acad. Sci. USA 98, 10757–10762 (2001).
 12.
Chen, X., Sasaki, T., Brännström, Å. & Dieckmann, U. First carrot, then stick: how the adaptive hybridization of incentives promotes cooperation. Journal of The Royal Society Interface 12, 20140935 (2015).
 13.
Hilbe, C. & Sigmund, K. Incentives and opportunism: from the carrot to the stick. Proceedings of the Royal Society of London B: Biological Sciences 277, 2427–2433 (2010).
 14.
Gneezy, A. & Fessler, D. M. Conflict, sticks and carrots: war increases prosocial punishments and rewards. Proceedings of the Royal Society of London B: Biological Sciences, rspb20110805 (2011).
 15.
Sasaki, T. & Uchida, S. Rewards and the evolution of cooperation in public good games. Biology letters 10, 20130903 (2014).
 16.
Fehr, E. & Gächter, S. Altruistic punishment in humans. Nature 415, 137–140 (2002).
 17.
Sigmund, K. Punish or perish? Retaliation and collaboration among humans. Trends in ecology & evolution 22, 593–600 (2007).
 18.
Masclet, D., Noussair, C., Tucker, S. & Villeval, M.C. Monetary and nonmonetary punishment in the voluntary contributions mechanism. Am. Econ. Rev. 93, 366–380 (2003).
 19.
Charness, G. & Haruvy, E. Altruism, equity, and reciprocity in a giftexchange experiment: an encompassing approach. Games and Economic Behavior 40, 203–231 (2002).
 20.
Andreoni, J., Harbaugh, W. & Vesterlund, L. The carrot or the stick: Rewards, punishments, and cooperation. The American economic review 93, 893–902 (2003).
 21.
Szolnoki, A. & Perc, M. Reward and cooperation in the spatial public goods game. EPL (Europhysics Letters) 92, 38003 (2010).
 22.
Perc, M. et al. Statistical physics of human cooperation. Physics Reports 687, 1–51 (2017).
 23.
Fang, Y., Benko, T. P., Perc, M., Xu, H. & Tan, Q. Synergistic thirdparty rewarding and punishment in the public goods game. Proc. Roy. Soc. A 475, 20190349 (2019).
 24.
Milinski, M., Semmann, D., Krambeck, H. J. & Marotzke, J. Stabilizing the Earth’s climate is not a losing game: Supporting evidence from public goods experiments. Proc Natl Acad Sci USA 103, 3994–3998 (2006).
 25.
Chen, X., Szolnoki, A. & Perc, M. Averting group failures in collectiverisk social dilemmas. EPL (Europhysics Letters) 99, 68003 (2012).
 26.
Chakra, M. A. & Traulsen, A. Evolutionary dynamics of strategic behavior in a collectiverisk dilemma. PLoS Comput Biol 8, e1002652 (2012).
 27.
Chen, X., Szolnoki, A. & Perc, M. Riskdriven migration and the collectiverisk social dilemma. Physical Review E 86, 036101 (2012).
 28.
Pacheco, J. M., Vasconcelos, V. V. & Santos, F. C. Climate change governance, cooperation and selforganization. Phys Life Rev 11, 595–597 (2014).
 29.
Vasconcelos, V. V., Santos, F. C., Pacheco, J. M. & Levin, S. A. Climate policies under wealth inequality. Proc Natl Acad Sci USA 111, 2212–2216 (2014).
 30.
Hilbe, C., Chakra, M. A., Altrock, P. M. & Traulsen, A. The evolution of strategic timing in collective-risk dilemmas. PLoS ONE 8, e66490 (2013).
 31.
Barrett, S. Avoiding disastrous climate change is possible but not inevitable. Proc Natl Acad Sci USA 108, 11733 (2011).
 32.
Barrett, S. & Dannenberg, A. Climate negotiations under scientific uncertainty. Proc Natl Acad Sci USA 109, 17372–17376 (2012).
 33.
Milinski, M., Röhl, T. & Marotzke, J. Cooperative interaction of rich and poor can be catalyzed by intermediate climate targets. Climatic change 1–8 (2011).
 34.
Boesch, C. Cooperative hunting roles among Tai chimpanzees. Human Nature 13, 27–46 (2002).
 35.
Creel, S. & Creel, N. M. Communal hunting and pack size in African wild dogs, Lycaon pictus. Animal Behaviour 50, 1325–1339 (1995).
 36.
Black, J., Levi, M. D. & De Meza, D. Creating a good atmosphere: minimum participation for tackling the ‘greenhouse effect’. Economica 281–293 (1993).
 37.
Stander, P. E. Cooperative hunting in lions: the role of the individual. Behavioral ecology and sociobiology 29, 445–454 (1992).
 38.
Alvard, M. S. et al. Rousseau’s whale hunt? Coordination among big-game hunters. Current anthropology 43, 533–559 (2002).
 39.
Souza, M. O., Pacheco, J. M. & Santos, F. C. Evolution of cooperation under N-person snowdrift games. J Theor Biol 260, 581–588 (2009).
 40.
Pacheco, J. M., Vasconcelos, V. V., Santos, F. C. & Skyrms, B. Coevolutionary dynamics of collective action with signaling for a quorum. PLoS Comput Biol 11, e1004101 (2015).
 41.
Skyrms, B. The Stag Hunt and the Evolution of Social Structure. (Cambridge Univ Press, 2004).
 42.
Barrett, S. Environment and statecraft: the strategy of environmental treaty-making. (Oxford UP, 2005).
 43.
Sigmund, K. The Calculus of Selfishness. (Princeton Univ Press, 2010).
 44.
Traulsen, A., Nowak, M. A. & Pacheco, J. M. Stochastic dynamics of invasion and fixation. Phys. Rev. E 74, 011909 (2006).
 45.
Traulsen, A., Hauert, C., De Silva, H., Nowak, M. A. & Sigmund, K. Exploration dynamics in evolutionary games. PNAS 106, 709–712 (2009).
 46.
Paiva, A., Santos, F. P. & Santos, F. C. Engineering pro-sociality with autonomous agents in Thirty-Second AAAI Conference on Artificial Intelligence, pp. 7994–7999 (2018).
 47.
Shirado, H. & Christakis, N. A. Locally noisy autonomous agents improve global human coordination in network experiments. Nature 545, 370 (2017).
 48.
Santos, F. P., Pacheco, J. M., Paiva, A. & Santos, F. C. Evolution of collective fairness in hybrid populations of humans and agents in Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Vol. 33, pp. 6146–6153 (2019).
 49.
Rahwan, I. et al. Machine behaviour. Nature 568, 477 (2019).
 50.
Powers, S. T., van Schaik, C. P. & Lehmann, L. How institutions shaped the last major evolutionary transition to large-scale human societies. Philosophical Transactions of the Royal Society B: Biological Sciences 371, 20150098 (2016).
 51.
Sigmund, K., De Silva, H., Traulsen, A. & Hauert, C. Social learning promotes institutions for governing the commons. Nature 466, 861 (2010).
 52.
Santos, F. C., Pacheco, J. M. & Skyrms, B. Co-evolution of pre-play signaling and cooperation. J Theor Biol 274, 30–35 (2011).
 53.
Kulkarni, V. G. Modeling and analysis of stochastic systems. (Chapman and Hall/CRC, 2016).
 54.
Hindersin, L., Wu, B., Traulsen, A. & García, J. Computation and simulation of evolutionary game dynamics in finite populations. Sci. Rep. 9, 6946 (2019).
Acknowledgements
This research was supported by Fundação para a Ciência e Tecnologia (FCT) through grants PTDC/EEI-SII/5081/2014 and PTDC/MAT/STA/3358/2014 and by multi-annual funding of INESC-ID and CBMA (under the projects UID/CEC/50021/2019 and UID/BIA/04050/2013). F.P.S. acknowledges support from the James S. McDonnell Foundation 21st Century Science Initiative in Understanding Dynamic and Multi-scale Systems - Postdoctoral Fellowship Award. All authors declare no competing financial or non-financial interests in relation to the work described.
Author information
Contributions
A.R.G., F.P.S., J.M.P. and F.C.S. designed and implemented the research, prepared all the figures, wrote the manuscript, and reviewed the manuscript.
Corresponding author
Correspondence to Francisco C. Santos.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Góis, A.R., Santos, F.P., Pacheco, J.M. et al. Reward and punishment in climate change dilemmas. Sci Rep 9, 16193 (2019). https://doi.org/10.1038/s41598-019-52524-8