Voluntary rewards mediate the evolution of pool punishment for maintaining public goods in large populations

Punishment is a popular tool for governing commons in situations where free riders would otherwise take over. It is well known that sanctioning systems, such as the police and courts, are costly and thus can suffer from those who free ride on others' efforts to maintain the sanctioning systems (second-order free riders). Previous game-theory studies showed that if populations are very large, pool punishment rarely emerges in public good games, even when participation is optional, because of second-order free riders. Here we show that a matching fund for rewarding cooperation leads to the emergence of pool punishment, despite the presence of second-order free riders. We demonstrate that reward funds can pave the way for a transition from a population of free riders to a population of pool punishers. A key factor in promoting the transition is that the reward fund also rewards those who contribute to pool punishment, while excluding those who abstain from participation. Reward funds eventually vanish once pool punishment has been raised, and pool punishment is then sustained by punishing second-order free riders. This suggests that considering the interdependence of reward and punishment may help to better understand the origins and transitions of social norms and institutions.

Apart from the issue of system stabilization, there remains another issue relevant to the evolution of costly punishment: the emergence problem. Indeed, punishing right and left in large populations of free riders requires considerable effort and expense from pool punishers. Reflecting this, it is often explicitly assumed that pool punishment becomes active only if at least a threshold number of players (more than one) contribute to it [28][29][30] . This means that in such large populations it is not easy to successfully initiate costly punishment 31,32 , even when punishment of second-order free riders is considered 33 (Fig. 1a,b).
Over the last decade, several studies have attempted to resolve the emergence problem. Most of the theoretical results have been based on assuming small, finite populations and analyzing their stochastic dynamics 13,24,[34][35][36] . In addition, optional participation and mutual aid games (MAGs) have been considered key factors in a resolution 14,22,25 . When participation in games is optional, players can simply escape the social trap of mutual defection 37,38 . MAGs are variants of public good games (PGGs). In PGGs, the resulting benefits are shared equally among all members of the group. In MAGs, players are not allowed to benefit from their own contribution to the public goods provision 13,20 . That is, MAGs deal with excludable goods, not public goods, and when combined with optional participation they involve two-fold exclusion. As such, previous studies have shed light on excludable good games in small populations.
Here we turn to pool reward 39,40 , and thereby tackle the emergence of pool punishment in non-excludable good games in very large populations. We model a situation resembling a matching fund, as commonly arises for charities or common goods, in which contributors donate to an external nonprofit source. The external source then enhances the input and makes returns to a broader range of beneficiaries that includes the contributors. Previous studies have investigated reward and punishment, often comparatively [41][42][43][44][45] , and have also examined the selection or interplay of these incentives [46][47][48][49][50][51][52] . It is thus surprising that little is known about what happens if those who commit to pool punishment are promptly rewarded, rather than rewarded through iterated interactions or reputation.
Rewarding is costly. The pool reward considered here allows contributors to receive returns from their own contributions and allows others to share in those contributions without contributing, similar to PGGs. It follows that a pool reward can suffer from those who free ride on the raising of the reward fund. From the viewpoint of its initiator, rewarding can be less expensive than punishing and can thus stimulate cooperative behaviors more efficiently 4,43,46,52 . Indeed, voluntary rewarding can be maintained even in public good games with second-order free riders 39,53 . We thus predict that a pool reward that also rewards volunteers for pool punishment will provide a foothold for initially rare volunteers to proliferate, overcoming the emergence problem even without the assistance of optional participation. We confirm this prediction using the following game-theoretical model.

Methods
Evolutionary games for a public good and multi-strategy interactions. We consider a well-mixed, infinitely large population. We assume that a player is more likely to adopt another player's strategy if that player earns a higher payoff (''imitate better''). In the population this can be implemented by replicator dynamics 54,55 . We analyze the replicator dynamics for five strategies, comprising four types of participants in the PGGs and non-participants: (i) cooperators (C) contribute to the PGG, but not to the incentive funds; (ii) defectors (D) do not contribute at all; (iii) punishers (P) contribute to the PGG and to the punishment fund; (iv) rewarders (R) contribute to the PGG and to the reward fund; and (v) nonparticipants (N). We denote by $x_S$ and $\Pi_S$ the relative frequency and expected payoff of each strategy $S \in \{C, D, P, R, N\}$ (thus, $0 \le x_S \le 1$ and $\sum_S x_S = 1$). The replicator dynamics for the five strategies are given by $\dot{x}_S = x_S(\Pi_S - \bar{\Pi})$, in which $\bar{\Pi}$ is the average payoff over the population, that is, $\bar{\Pi} = \sum_S x_S \Pi_S$.
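The replicator equation can be integrated numerically. The following is a minimal sketch, not the authors' code. It uses a classical fourth-order Runge-Kutta step (an assumption, since the text does not state which integrator was used) and renormalizes onto the simplex after each step to absorb round-off. The expected payoffs (written $\Pi_S$ here) come from a caller-supplied function. Equation (1) is not reproduced in this sketch, so the demo uses a hypothetical `toy_payoff` with constant payoffs only to exercise the integrator. All function names are illustrative.

```python
from typing import Callable, List, Sequence

# Order of strategies in the frequency vector x.
STRATEGIES = ("C", "D", "P", "R", "N")
PayoffFn = Callable[[Sequence[float]], List[float]]


def replicator_rhs(x: Sequence[float], payoff_fn: PayoffFn) -> List[float]:
    """dx_S/dt = x_S * (Pi_S - mean payoff over the population)."""
    pi = payoff_fn(x)
    mean_pi = sum(xs * ps for xs, ps in zip(x, pi))
    return [xs * (ps - mean_pi) for xs, ps in zip(x, pi)]


def _axpy(x: Sequence[float], k: Sequence[float], h: float) -> List[float]:
    """Return x + h * k, element-wise."""
    return [xi + h * ki for xi, ki in zip(x, k)]


def rk4_step(x: Sequence[float], payoff_fn: PayoffFn, dt: float) -> List[float]:
    """One classical Runge-Kutta step, projected back onto the simplex."""
    k1 = replicator_rhs(x, payoff_fn)
    k2 = replicator_rhs(_axpy(x, k1, dt / 2), payoff_fn)
    k3 = replicator_rhs(_axpy(x, k2, dt / 2), payoff_fn)
    k4 = replicator_rhs(_axpy(x, k3, dt), payoff_fn)
    new = [xi + dt / 6 * (a + 2 * b + 2 * c + d)
           for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    new = [max(v, 0.0) for v in new]  # clip tiny negative round-off
    total = sum(new)
    return [v / total for v in new]   # keep sum_S x_S = 1


def integrate(x0: Sequence[float], payoff_fn: PayoffFn,
              dt: float = 0.01, steps: int = 1000) -> List[List[float]]:
    """Return the trajectory [x(0), x(dt), ..., x(steps * dt)]."""
    total = sum(x0)
    x = [v / total for v in x0]
    traj = [x]
    for _ in range(steps):
        x = rk4_step(x, payoff_fn, dt)
        traj.append(x)
    return traj


def toy_payoff(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the model payoffs: C earns 1, others earn 0."""
    return [1.0, 0.0, 0.0, 0.0, 0.0]


traj = integrate([0.2] * 5, toy_payoff)
print([round(v, 4) for v in traj[-1]])
```

With these constant toy payoffs, $x_C$ grows logistically and ends close to 1. This is a quick sanity check to run before plugging in the model's actual payoffs.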
Game procedure and parameters. A group for the public good interaction consists of $n > 1$ members who are randomly chosen from the population. First, each member is offered an opportunity to participate in the PGG. Each participant is then offered separate opportunities to contribute: first to the reward fund, then to the punishment fund, and finally to the PGG. Each contribution to the PGG, reward fund, or punishment fund means an investment of $c_1$, $c_2$, or $c_3 > 0$, respectively, at a cost to the contributor. In the PGG, the total contribution, multiplied by a factor $r_1 > 1$, is shared equally by all participants, excluding N-players. To examine a previously problematic situation in which C, D, and N coexist, we assume in particular that $2 < r_1 < n$ [37]. In the reward fund, the total contribution, multiplied by an intermediate factor $r_2$ with $1 < r_2 < n$, is shared, though not always equally, among all of the contributors to the PGG (C-, R-, and P-players), excluding D- and N-players 39,40,43 . We assume weights $k_{RP}, k_{RR} \ge 0$ for the P- and R-players' shares. In the punishment fund, non-contributors to punishment (D-, C-, and R-players) incur fines. We assume that the fines are proportional to the contribution accumulated over all P-players 14,20 , with proportionality factor $r_3 > 1$ and weights $k_{PC}, k_{PR} \ge 0$ for the C- and R-players' fines. Finally, the fifth type (v), the non-participant, is a loner that independently earns a small payoff $g > 0$. Hence, the individual payoff for an interaction, $f_S$, of each strategy $S \in \{C, D, P, R, N\}$ is as follows:

Figure 1 | … participation or (b) bistability of the P node or periodic oscillations among C, D, and N for optional participation. (c) Non-participant N replaced with R. As in panel b, population states on the CDR face oscillate along periodic closed orbits. In contrast to panels a and b, rare P-players, when rewarded, can invade the CDR face. Typically, the population state converges to the DPR face, on which the dynamics are repelling. The trajectory then comes close to the edges connecting the three nodes D, P, and R, and finally reaches the P node. Parameter values: $n = 5$, $c_1 = 1$, $r_1 = 3$, $c_2 = 1$, $r_2 = 2$, $c_3 = 0.1$, $r_3 = 1.6$, $k_{RP} = 2$, $k_{PC} = 1$, and $g = 1$. The system includes second-order punishment. Open and filled circles denote unstable and asymptotically stable equilibria, respectively.
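www.nature.com/scientificreports SCIENTIFIC REPORTS | 5 : 8917 | DOI: 10.1038/srep08917
in which $n_S$ denotes the number of S-players among the $n-1$ co-players, $b_1 = r_1 c_1$, $b_2 = r_2 c_2$, and $b_3 = r_3 c_3$. The expected payoff for each strategy is given by

$$\Pi_S = \sum_{n_C + n_D + n_P + n_R + n_N = n-1} \frac{(n-1)!}{n_C!\, n_D!\, n_P!\, n_R!\, n_N!}\, x_C^{n_C} x_D^{n_D} x_P^{n_P} x_R^{n_R} x_N^{n_N}\, f_S,$$

in which $\frac{(n-1)!}{n_C!\, n_D!\, n_P!\, n_R!\, n_N!}\, x_C^{n_C} x_D^{n_D} x_P^{n_P} x_R^{n_R} x_N^{n_N}$ describes the probability of finding a specific set of $n-1$ co-players that includes $n_S$ S-players ($S \in \{C, D, P, R, N\}$).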
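The expected payoff $\Pi_S$ is a multinomial average of the per-interaction payoff $f_S$ over all compositions $(n_C, n_D, n_P, n_R, n_N)$ of the $n-1$ co-players. It can therefore be computed by direct enumeration. The sketch below assumes that $f_S$ is passed in as a function of the co-player counts. The helper names are hypothetical, and equation (1) itself is not implemented here. Two checks hold for any valid frequencies: a constant payoff $g$ averages to $g$, and a payoff equal to $n_D$ averages to $(n-1)x_D$.

```python
from math import factorial
from typing import Callable, Dict, Iterator, Sequence, Tuple

STRATEGIES = ("C", "D", "P", "R", "N")


def compositions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def composition_probability(counts: Sequence[int], x: Sequence[float]) -> float:
    """Multinomial probability of drawing co-players with these strategy counts."""
    coef = factorial(sum(counts))
    for k in counts:
        coef //= factorial(k)  # exact integer multinomial coefficient
    prob = float(coef)
    for k, xs in zip(counts, x):
        prob *= xs ** k
    return prob


def expected_payoff(f_S: Callable[[Dict[str, int]], float],
                    x: Sequence[float], n: int) -> float:
    """Pi_S: average of f_S over all compositions of the n - 1 co-players."""
    return sum(
        composition_probability(c, x) * f_S(dict(zip(STRATEGIES, c)))
        for c in compositions(n - 1, len(STRATEGIES))
    )


x = [0.1, 0.2, 0.3, 0.25, 0.15]  # hypothetical frequencies of C, D, P, R, N
print(expected_payoff(lambda m: 1.0, x, n=5))     # constant payoff g = 1
print(expected_payoff(lambda m: m["D"], x, n=5))  # mean number of D co-players
```

The two printed values should be close to 1 and to $4 \times 0.2 = 0.8$, respectively. For $n = 5$ the enumeration visits only 70 compositions, so brute force is adequate at the group sizes used here.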
Here it is assumed that a PGG requires more than one participant; if there is only a single participant, she or he acts as a non-participant and earns the same payoff $g$ 31,37 . In the model, the reward weights $k_{RP}$ and $k_{RR}$ describe an extra bonus for those who contributed not only to the PGG but also to another public fund. Thus, $k_{RP}$ and $k_{RR}$ are supposed to be greater than 1. The punishment weights $k_{PC}$ and $k_{PR}$ are usually smaller than 1, denoting a discount factor for those who free ride at the second order but not at the first order. For simplicity, we hereafter assume that $k_{RR} - 1$ and $k_{PR}$ offset each other, and in particular set $k_{RR} = 1$ and $k_{PR} = 0$.

Results
Using evolutionary game theory 54 , we show that voluntary rewarding of pool punishers can lead to a state in which all are P-players, whether participation is compulsory or optional.
Stability of a coercive society. We start by analyzing the local stability of the all-P state. In particular, for the all-P state to be robust against invasion by a rare C-player, we consider second-order punishment with $k_{PC} r_3 > 1/(n-1)$, under which there is no temptation to switch to C when all play P, unless stated otherwise. Equation (1) also readily shows under which conditions the all-P state is stable against invasion by a rare D- or N-player. For D, this holds when $c_1(1 - r_1/n) < c_3[(n-1)r_3 - 1]$, where the left and right sides describe the marginal costs of cooperating in PGGs and of being punished by $n-1$ punishers, respectively. For N, the condition is $g < c_1(r_1 - 1) - c_3$, where the right side is the payoff in a group of all P-players.
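As a quick arithmetic check, the three invasion conditions can be evaluated at the parameter values listed for Fig. 1 ($n = 5$, $c_1 = 1$, $r_1 = 3$, $c_3 = 0.1$, $r_3 = 1.6$, $k_{PC} = 1$, $g = 1$). The function below is a hypothetical helper that encodes the inequalities exactly as stated in the text; it does not derive them from equation (1). At these values $k_{PC} r_3 = 1.6 > 1/4$, $0.4 < 0.1 \times 5.4 = 0.54$, and $1 < 1.9$, so all three conditions hold.

```python
def all_p_stability(n, c1, r1, c3, r3, k_pc, g):
    """Check the invasion conditions for the all-P state as stated in the text."""
    return {
        # rare C: second-order fines outweigh the saved punishment fee
        "vs_C": k_pc * r3 > 1 / (n - 1),
        # rare D: marginal PGG cost < net cost of being punished by n - 1 punishers
        "vs_D": c1 * (1 - r1 / n) < c3 * ((n - 1) * r3 - 1),
        # rare N: loner payoff g < payoff in a group of all P-players
        "vs_N": g < c1 * (r1 - 1) - c3,
    }


# Parameter values listed for Fig. 1
print(all_p_stability(n=5, c1=1, r1=3, c3=0.1, r3=1.6, k_pc=1, g=1))
# → {'vs_C': True, 'vs_D': True, 'vs_N': True}
```

Setting `k_pc=0` (no second-order punishment) makes the `vs_C` check fail. This matches the instability of the all-P state discussed for that case.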
Conditions of rock-scissors-paper cycles. It is known that two kinds of periodic cycles can exist among three strategies. The last inequality above is clearly also a sufficient condition for C to dominate N. Considering also that N dominates D when $g > 0$ and that D dominates C when $r_1 < n$, it follows that when the PGG multiplication factor $r_1$ is greater than 2, C-, D-, and N-players become dominant in the population in turn 37,38 . Otherwise, a population consisting of these three strategies ends up in a homogeneous state in which all play N 37 . We thus focus on PGGs with $r_1 > 2$ (and thus $n > 2$) in what follows. In addition, for such periodic oscillations to occur among another triplet, C-, D-, and R-players, it is necessary that $c_1(1 - r_1/n) < c_2(r_2 - 1)$ 39 . Based on these rock-scissors-paper-type cycles, we investigate the evolutionary dynamics of more than three strategies.
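The cyclic-dominance conditions can be checked in the same way. The sketch below encodes the inequalities as they are stated in the text (the sufficient conditions for the C, D, N cycle and the necessary condition for the C, D, R cycle). "Beats" is used in the rock-scissors-paper sense of dominance. The function names are hypothetical, and the demo again uses the parameter values listed for Fig. 1.

```python
def cdn_cycle_conditions(n, c1, r1, c3, g):
    """Sufficient conditions from the text for C, D, N to dominate one another in turn."""
    d_beats_c = r1 < n                  # D dominates C in PGGs with r1 < n
    n_beats_d = g > 0                   # N dominates D when the loner payoff is positive
    c_beats_n = g < c1 * (r1 - 1) - c3  # sufficient for C to dominate N
    return d_beats_c and n_beats_d and c_beats_n and r1 > 2  # r1 > 2: cycles, not all-N


def cdr_cycle_condition(n, c1, r1, c2, r2):
    """Necessary condition from the text for oscillations among C, D, and R."""
    return c1 * (1 - r1 / n) < c2 * (r2 - 1)


print(cdn_cycle_conditions(n=5, c1=1, r1=3, c3=0.1, g=1))  # → True
print(cdr_cycle_condition(n=5, c1=1, r1=3, c2=1, r2=2))    # → True
```

Lowering `r1` to 2 makes `cdn_cycle_conditions` return `False`. This corresponds to the case in which the text says the population ends up with all playing N.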
With no reward, pool punishment never emerges (Fig. 1a,b). We first consider combinations of C-, D-, and P-players with or without N-players. We show that no P-players evolve if they are initially very rare, whatever the condition of participation. Let us start with compulsory participation (Fig. 1a). In a population that consists exclusively of P and D (or C), the replicator dynamics exhibit bistability: depending on the initial fraction of P-players, the population evolves either to a state of all P-players or to a state of all D-players (or all C-players). By assumption, D-players are always better off than C-players. Thus, for the three strategies, the dynamics exhibit bistability of the two homogeneous states of P-players and D-players (the all-P and all-D states). Next, consider optional participation (Fig. 1b). In competition among the three strategies C, D, and N, it is known that the CDN face is filled with periodic closed orbits surrounding a center 37 (see Supplementary Fig. S1 for detailed phase portraits on the faces). For a coexisting state of C, D, and N within the CDN face, a rare, innovative P-player cannot invade. This is because the time average of its transversal growth rate (i.e., the difference between the expected payoff of a rare P-player and the average payoff over the population) is negative owing to the punishing cost $c_3$, as in the case of peer punishment 31 . Thus, in the given parameter settings, the dynamics exhibit bistability of the all-P state and periodic oscillations among C, D, and N (see Supplementary Fig. S2 for time series).
With reward, pool punishment emerges for compulsory participation (Figs. 1c and 2a). Replacing non-participation with pool reward leads to similar dynamics on the corresponding CDR face, which is filled with periodic closed orbits surrounding a center 39 (Supplementary Fig. S1). The dynamics on the CDP face are obviously unchanged. With an extra reward for P-players ($k_{RP} > 1$), even a rare P-player can be encouraged to invade the coexisting population on the CDR face. Numerical simulations show that the population state typically comes close to the DPR face, with the fraction of P-players increasing and that of C-players decreasing. This is because of second-order punishment. Among the three strategies D, P, and R, the dynamics are repelling (Supplementary Fig. S1). Over time, the trajectories of population states converge to the boundaries connecting the three homogeneous states of D, P, and R. Since P-players are better off than R-players, and R-players are better off than D-players, the trajectories are attracted to the all-P state, which again is robust against invasion by rare D- or C-players.
Pool reward emerges for optional participation (Fig. 3). To expand the applicable range of pool reward, we also consider the case where participation is optional. It turns out that with a sufficiently high reward multiplication factor $r_2$, rare R-players can invade the CDN face, replacing N-players. The population state is eventually attracted to a periodic orbit on the CDR face (see Supplementary Fig. S1 for detailed phase portraits on the faces). We remark that although C-players exploit the rewards provided by R-players, R-players can spread in the presence of these second-order free riders. The successful invasion of a rare R-player is an example of the well-known Simpson's paradox 37,56,57 for second-order social dilemmas: in spite of the cost of rewarding in each game, the rare R-player's payoff, when averaged over the whole population, exceeds that of the second-order free-riding C-player 13,58 . This is in striking contrast to the former case of pool punishment (Fig. 1b). The CDR face, shared with Fig. 1c, is an ''interface'' connecting to the evolution of pool punishment and thus opens the door to the full course of the five strategies, as described in what follows.
With reward, pool punishment emerges for optional participation (Fig. 2b). The initial population consists almost exclusively of C-, D-, and N-players, with R- and P-players present only at very small frequencies. The population first follows periodic oscillations among the three resident strategies. As in the previous case, the initially rare R-players then gradually spread in the population, replacing N-players, and can take over almost all of the population. Finally, however, P-players replace the R-players and the population reaches the all-P state. Without this intermediate rise and fall of voluntary rewarding, we would only have continuous oscillations among C, D, and N.
With no second-order punishment, pool punishment is unstable (Supplementary Figs. S3 and S4). We explore the effects of not punishing C-players, who do not pay the punishment fee $c_3 > 0$. Without second-order punishment, C obviously dominates P: switching from P to C is beneficial, whatever others do (see equation (1) with $k_{PC} = 0$). Therefore, the all-P state is unstable against invasion by C, and a small random shock will cause a population of P-players to converge to a boundary state that completely excludes P-players. For compulsory participation with no reward, such a population is eventually attracted to the all-D state (Supplementary Fig. S3a), and for optional participation with no reward, to cycles among C, D, and N (Supplementary Fig. S3b). With pool reward, however, the bonus weight for P-players, $k_{RP} > 1$, can lead to a temporary increase in P-players. The trajectories of population states can then converge to heteroclinic cycles among C, D, R, and P for compulsory participation (Supplementary Figs. S3c and S4a). In particular, for optional participation the population stays in an almost-all-N state for a long time on a heteroclinic cycle connecting the five homogeneous states of C, D, N, R, and P (Supplementary Fig. S4b).
We tested our main results over a range of model parameters and initial conditions. Our main result, that reward funds facilitate the emergence of costly pool punishment, is robust against various initial conditions, whether participation is optional or not (Supplementary Fig. S5). In particular, we numerically explore the lower bound of the reward weight for punishers, $k_{RP}$, under various settings of the other parameters $r_1$, $r_2$, $c_2$, and $c_3$ (Fig. 4b,d and Supplementary Fig. S6). For the settings of Figs. 1c and 2b, we also investigate the effects of (i) different group sizes $n$, (ii) different combinations of the multiplication factors in PGGs and reward funds, $r_1$ and $r_2$ (Fig. 4a,c), (iii) nonlinearity of benefit production functions (Supplementary Text S1 and Fig. S7), and (iv) pool punishment that imposes constant fees. None of the variants (i)-(iv) qualitatively affects the main results.

Discussion
Carrot or stick? This is a commonly used dichotomy in studies on selective incentives. Here we have focused on the interdependence of reward and punishment. The evolution of costly punishment will indeed be promoted if ample positive incentives cover its net cost; in that case, choosing costly punishment is rational behavior. Thus, the core problem has been whether efforts to provide such rewards can evolve endogenously. Only a few studies have explored the evolution of a meta-norm that rewards peer punishers [59][60][61][62] . We have instead considered pool reward in n-person public good games, which can proliferate from rarity even in the presence of second-order free riders. We examined the mediating effect of pool reward in overcoming the emergence problem of pool punishment. It turned out that pool reward enables a complete evolutionary transition between societies in different equilibrium states, from one of norm deviators to one of norm followers. The latter state is protected by pool punishment.
Turning to the real world, laws for official subsidies or tax reductions that smoothly promote social changes (e.g., green cars and eco-friendly homes) often include their own expiration conditions. In our model, once a foothold for the evolution of pool punishment has been achieved, the pool reward is evolutionarily retired. These mediation dynamics can also be seen in variants of the model. For instance, rewarding mediation is applicable to nonlinear public good games in which the benefit production function has decreasing returns to scale 32 . It also applies to threshold public good games, in which a certain level of cooperation is required to produce public goods 40,63 . In either case, with a sufficiently concave benefit function, the homogeneous state of cooperators becomes stable, and even punishing free riders is redundant for maintaining cooperation. The essence of sustaining pool punishment is its prior commitment scheme followed by second-order punishment. Exploring whether and how such a commitment system can emerge is beyond the scope of the model considered. Second-order punishment has been found to effectively prevent second-order free riders from eroding the voluntary sanctioning system 7,64,65 . In the case of peer punishment, it has also been reported that second-order punishment is not likely to be observed 62 . In contrast, pool punishment of second-order free riders is often conspicuously observed (e.g., against tax evaders). However, individuals cannot be assumed to abide by the norm of pool punishment a priori. In particular, at the very beginning, when people had no concept of pool punishment and there were also second-order free riders, how did a norm that assesses second-order free riders as bad emerge? 66 A better understanding of this could be relevant to the quest to understand the ''roots of sanctioning institutions'' 23 .
The fascinating origin of norm assessment for second-order pool punishment thus deserves further investigation.
Nowadays, various modern issues of the commons, such as energy, the natural environment, and climate change, are reaching every corner and covering all stages of human life. It therefore appears that there is almost no time or space for people to opt out of either the corresponding dilemma situations or the related laws 34,67 . Our results, based on compulsory participation but voluntary rewards, could thus be more applicable than previous theories with optional participation 14,34 . This implies an improved scenario for accomplishing Garrett Hardin's recipe for the commons: mutual coercion mutually agreed upon 1 . In Isaiah Berlin's terms 68 , optional participation (''leaving loners alone'' 36 ) can be viewed as a negative liberty: freedom from interference by other players in one's individual payoff.
In contrast, voluntary rewards could be a positive liberty: freedom aimed at modifying the payoffs of others. Recent studies have also shown that participants who can influence one another through a majority vote prefer a coercive society with second-order pool punishment 27 . We have shown that under a broad range of conditions with large populations, non-excludable public goods, or general benefit functions, optional participation alone is often not sufficient 32,67 , but, when combined with voluntary rewards, can be effective for establishing pool punishment. All in all, the results may suggest: through positive liberty, corrective coercion.