Introduction

The emergence of cooperation is a ubiquitous phenomenon in biological and social systems. In recent years, owing to the increasing availability of experimental results1,2,3,4,5 and the development of new techniques to characterize actual networks of contacts, new insights have been gained into how cooperative behavior arises and survives. However, many fundamental questions remain open. Evolutionary game theory is a powerful mathematical tool for the analysis of diverse dilemmas in biological and social systems6,7,8,9,10,11,12. In this context, different games have been developed as metaphors of real biological, human and economic behaviors. Among them, the Prisoner's Dilemma (PD) and the Snowdrift (SD) games have received a lot of attention in the literature13,14,15,16,17,18,19,20,21,22,23,24,25.

In the PD, two players simultaneously decide whether to cooperate (C) or defect (D). They both receive R under mutual cooperation and P under mutual defection, while a cooperator receives S when confronted with a defector, which in turn gets T. The payoffs are ordered as T > R > P ≥ S, so that in the well-mixed case defection is the best strategy regardless of the opponent's strategy8. In the SD, players interact in a similar way, but mutual defection is assumed to carry a higher cost for both players, and thus the payoff order is T > R > S > P. This variation, although very small, induces a significant change in the game dynamics: a second Nash equilibrium appears in which both strategies coexist, even in the well-mixed case.
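The difference between the two payoff orderings can be checked numerically in the well-mixed case. The following is a minimal sketch, assuming the one-parameter payoffs given in the Methods (R = 1, P = S = 0, T = b for the PD; R = 1, S = 1 − r, P = 0, T = 1 + r for the SD); the function names are illustrative and not part of the original model description.

```python
def expected_payoffs(x, R, S, T, P):
    """Expected payoffs of a cooperator and of a defector in a well-mixed
    population in which a fraction x of the opponents cooperate."""
    pay_c = x * R + (1 - x) * S
    pay_d = x * T + (1 - x) * P
    return pay_c, pay_d


def pd_payoffs(b):
    # PD parametrization from the Methods: R = 1, P = S = 0, T = b > 1.
    return dict(R=1.0, S=0.0, T=b, P=0.0)


def sd_payoffs(r):
    # SD parametrization from the Methods: R = 1, S = 1 - r, P = 0, T = 1 + r.
    return dict(R=1.0, S=1.0 - r, T=1.0 + r, P=0.0)


def sd_mixed_equilibrium(r):
    # Setting x*R + (1-x)*S equal to x*T + (1-x)*P with the SD payoffs
    # above gives x + (1-x)(1-r) = x(1+r), i.e. x* = 1 - r.
    return 1.0 - r
```

With these payoffs a defector never earns less than a cooperator in the PD for any x (the two tie only at x = 0, because P = S), whereas in the SD the two payoffs cross at the interior point x* = 1 − r, which is the coexistence equilibrium mentioned above.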

Unlike the well-mixed case, when a spatial structure is added to guide players' interactions, cooperators can survive by forming cohesive clusters that prevent invasion by defectors. This mechanism, known as spatial reciprocity, can lead to the formation of different equilibria where, even in the PD, cooperators and defectors can coexist26,27,28,29,30,31,32,33,34 and, in some cases, cooperation can also become the dominant strategy35,36,37,38. Following these works, many studies have exploited the potential of complex interaction structures to obtain high levels of cooperation between the players. Recently, several mechanisms have been shown to favor cooperation. For instance, high values of the clustering coefficient39,40, different rewiring mechanisms41,42,43 or the diversity between players44,45 all allow cooperation to prevail even if the temptation to defect reaches very high values.

In spite of the achievements of recent years, there is a situation of particular relevance that has received relatively little attention so far. This is the case of social punishment, where cooperators can decide to spend a small part of their resources to punish defectors for their free-riding behavior. Although this kind of mechanism is almost ubiquitous in real-world scenarios, only a few works have addressed the effects of social punishment on evolutionary dynamics46,47,48,49,50,51,52,53. Most of these works are limited to public goods games47,48,49,50,51,52,53, where the evolution of cooperation is further supported by the interaction between agents that belong to groups of different sizes. However, a recent work54 with pairwise interactions, in which cooperators were able to punish defectors as a second-stage behavior, revealed that severe punishment was not necessarily more effective in improving the survival of cooperation. It is therefore of further interest to inspect the dynamics of the PD and SD games when punishment is considered an independent strategy.

In this paper, we explore the effects of social punishment on 2-person games in which interactions are driven by complex topologies. Specifically, we introduce social punishers into otherwise standard PD and SD games and study how this new strategy affects the emergence and the organization of cooperation in several topological settings.

Results

To include social punishment in the standard PD and SD, we consider, along with cooperation (C) and defection (D), a third strategy, Punish (Pu), as an independent yet particular type of cooperation. When interacting with a cooperator or with one another, punishers act exactly as cooperators, and both players earn the same payoffs. In contrast, when a punisher meets a defector, the punisher, at a cost γ, imposes a fine β on the defector, reducing the effective payoff gained by the latter. In the model we impose β > γ, ensuring that only a small cost is needed for punishment. We expect that severe punishment could lead to a more beneficial environment for the survival of cooperation. The Methods section summarizes the interactions between players and their corresponding payoffs.

Having defined our model, we analyze its behavior at two different levels of granularity. First, we focus on the macroscopic response of the system by measuring the average fraction of cooperative agents 〈c〉, defined as the fraction of cooperators and punishers present at the steady state (see Methods). Next, to provide a deeper understanding of the effects of social punishment, we also study the evolution of individuals' strategies at the single-node level and the formation of local patterns of interaction. In 2-person evolutionary games on networks, the evolution of individuals' strategies can follow two different behaviors36,55. If an individual keeps the same strategy in all generations after a transient period, she is defined as a pure strategist. Conversely, individuals that change their strategy at the steady state are defined as fluctuating. Since we are interested in cooperative behavior in general, we define three types of pure strategists: pure cooperators, pure punishers, and cooperators plus punishers, where the last class accounts for agents that alternately spend some time as cooperators and as punishers.
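The classification just described can be sketched as follows. This is only an illustration under stated assumptions: the strategy labels 'C', 'D' and 'Pu', the layout of `history` (one list of node strategies per post-transient time step) and the function names are hypothetical choices, not taken from the original simulations.

```python
def classify_strategists(history):
    """Label each node from its post-transient strategy history.

    history: list of time steps, each a list with one strategy
    ('C', 'D' or 'Pu') per node.
    """
    n_nodes = len(history[0])
    labels = []
    for i in range(n_nodes):
        seen = {step[i] for step in history}  # strategies node i ever played
        if seen == {"C"}:
            labels.append("pure C")
        elif seen == {"Pu"}:
            labels.append("pure Pu")
        elif seen == {"C", "Pu"}:
            # Always cooperative, alternating between C and Pu.
            labels.append("C+Pu")
        elif seen == {"D"}:
            labels.append("pure D")
        else:
            labels.append("fluctuating")
    return labels


def cooperative_fraction(history):
    """<c>: time average of the fraction of nodes playing C or Pu."""
    per_step = [
        sum(s in ("C", "Pu") for s in step) / len(step) for step in history
    ]
    return sum(per_step) / len(per_step)
```

For example, a node that plays C at every recorded step is labeled "pure C", while one that visits both C and D is labeled "fluctuating".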

Macroscopic behavior

We start our analysis at the macroscopic level, studying whether social punishment can favor cooperation. Fig. 1 presents results obtained for the PD on the three classes of networks considered (see Methods) and for different values of β. We first focus on the PD on a regular square lattice (Fig. 1A) since, of the three graphs, it is the one that yields the lowest levels of cooperation in the standard setting of the games. In the standard formulation (i.e., no social punishment), the fraction of cooperators at the stationary state drops abruptly as soon as b > 1 and vanishes shortly afterwards, at values of the temptation b only slightly above 1. Interestingly, even a small punishment (β = 0.1 or 0.3) can radically change the dynamics of the system: cooperators can survive and become the dominant strategy for higher values of b. Increasing β produces an even more marked dominance of cooperators, so that cooperation is extinguished only at larger values of b, which is consistent with our expectation that severe punishment is more effective in promoting cooperation. Note that even when the cost of imposing the social fine and the fine itself are identical (β = γ = 0.1), cooperation is still favored with respect to the standard case. Results for ER and SF networks (Fig. 1B and Fig. 1C) are along the same lines as for the square lattice, indicating that the increase in cooperation due to the presence of punishers is a general feature.

Figure 1: Fraction of cooperators and punishers 〈c〉 in the Prisoner's Dilemma as a function of the temptation to defect b for different values of the social fine β and the three network classes considered.

From left (A) to right (C) the networks are a square lattice, an ER graph and an SF network, respectively. All the results have been obtained for N = 10⁴ nodes, 〈k〉 = 4 and γ = 0.1.

Due to the differences24,56,57 between the SD and the PD, the SD is an appropriate candidate to test the universality of our results. Figure 2 depicts the fraction of cooperators 〈c〉 as a function of the cost-to-benefit ratio r for the three topologies. Also in this case, compared with the results obtained for the standard setting, punishment significantly facilitates the evolution of cooperation. For large values of β, cooperation can survive for a wider range of r values. This is in agreement with the observations made for the PD, suggesting that social punishment of free-riders is generally effective in promoting the evolution of cooperation, irrespective of the particular game and of the underlying interaction network.

Figure 2: Fraction of cooperators and punishers 〈c〉 in the Snowdrift Game as a function of the cost-to-benefit ratio r for different values of the social fine β and for the three network classes considered.

From left (A) to right (C) the networks are a square lattice, an ER graph and an SF network, respectively. Other parameters are the same as in Fig. 1.

Microscopic organization

In what follows, we focus on the PD to inspect the mechanisms that allow social punishment to favor cooperation. To this end, we analyze the system at the microscopic scale. Important clues come from the analysis of the local distribution of pure cooperators. As described in the previous section, we focus on three types of clusters of pure strategists: clusters formed by pure cooperators, clusters formed by pure punishers, and clusters formed by cooperators and punishers together. In addition, we look at the size of the largest cluster for each of the three configurations.
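Counting such clusters amounts to finding the connected components of the subgraph induced by the nodes of a given type. A minimal sketch using breadth-first search is given below; the adjacency-dictionary representation and the function names are assumptions made for illustration only.

```python
from collections import deque


def clusters(adjacency, members):
    """Connected components of the subgraph induced by `members`.

    adjacency: dict mapping each node to an iterable of its neighbours.
    members: nodes of the type of interest (e.g. all pure cooperators).
    """
    unvisited = set(members)
    components = []
    while unvisited:
        start = unvisited.pop()
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                # Only neighbours of the same type extend the cluster.
                if neighbour in unvisited:
                    unvisited.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def cluster_stats(adjacency, members):
    """Return (number of clusters, size of the largest cluster)."""
    components = clusters(adjacency, members)
    largest = max((len(c) for c in components), default=0)
    return len(components), largest
```

Calling `cluster_stats` once with the pure cooperators, once with the pure punishers and once with their union would give the three pairs of quantities discussed in this section.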

Figure 3 shows the evolution of the number of cooperative clusters and the size of the largest ones as the temptation b increases for the square lattice. For low values of b, the number of C clusters is much larger than in the standard version (i.e., no punishment, see insets of Fig. 3). For the Pu clusters, on the other hand, the microscopic organization is totally different: only one giant cluster exists, reaching almost the size of the entire system. This indicates that for low values of b the system is composed of small islands of cooperators surrounded by punishers that prevent defectors from invading the cooperators. As b increases, an interesting phenomenon takes place. For intermediate b, the number of C clusters rapidly decreases while the giant cluster of punishers grows. This is the protection mechanism that allows cooperation to survive at higher temptation values than in the traditional PD: cooperators who come into contact with defectors become punishers and, in this way, can stop the spreading of defectors in the system. Once all cooperators have become punishers, these strategists have no other way to resist the invasion of defectors, essentially because the interaction between punishers yields fewer benefits than that between a cooperator and a punisher. From that point on, a small increase in b produces the breakdown of the Pu cluster into smaller clusters, up to the point at which all punishers die out.

Figure 3: Number of clusters of pure cooperative agents Ncc (upper panels) and number of cooperative agents in the corresponding largest cluster Nc (lower panels) in the square lattice as a function of b.

Insets represent the results of the standard two-strategy game. From left to right, the values of β are 0.1, 0.3 and 0.5. All the results are obtained for γ = 0.1, N = 10⁴ and 〈k〉 = 4.

To support the previous qualitative picture, we inspect the characteristic spatial configuration of the agents for different values of b. Figure 4 displays the results obtained for β = 0.3 and γ = 0.1. For low b (Fig. 4A), a number of islands of pure cooperators survive in the interior of the giant Pu cluster, which protects them from exploitation by defectors. On the other hand, for high values of b (Fig. 4B), defectors start invading the Pu cluster until it splits into smaller parts.

Figure 4: Spatial distribution of the different clusters for different values of b in square lattices.

For low temptation b = 1.05 (A), numerous C clusters (blue) are surrounded by a giant Pu cluster (green), whereas for large temptation b = 1.25 (B), the giant Pu cluster is fragmented by many defectors (red). All the results are obtained for β = 0.3 and γ = 0.1.

Next, we analyze the microscopic organization of cooperation on ER graphs. Figure 5 shows the evolution of the three types of clusters and the size of the largest one as a function of the temptation b, for the same settings as in Fig. 3 but with an ER graph as the underlying topology. In general, the behavior of the system is the same as in the square lattice, but small differences arise. As before, for low values of b, a giant cluster formed by both cooperators and punishers is present. At variance with the lattice case, this cluster is mostly made up of pure cooperators and not punishers; the difference is due to the fact that in ER networks the “surface” of the cluster made exclusively of pure cooperators is smaller than in the square lattice. When the temptation increases, the number of pure C clusters decreases until the transition point is reached. From that point on, as observed for the square lattice, the giant C + Pu cluster starts to collapse into several smaller isolated clusters until defectors invade the system. This behavior is in line with previous results for the standard PD on ER graphs29,36. Additionally, note that the previous picture depends on the value of β: the larger β is, the larger the temptation to defect needed for defectors to invade. Moreover, when the social fine β increases, punishers, and not pure cooperators, populate the largest cluster.

Figure 5: Number of clusters of pure cooperative agents Ncc (upper panel) and number of cooperative agents in the respective largest cluster Nc as a function of b for ER networks.

The insets show the results obtained for the standard setup. From left [(A) and (D)] to right [(C) and (F)], the values of β are 0.1, 0.3 and 0.5, respectively. All the results are obtained for γ = 0.1.

Another important result of refs. 29,36 is that, in general, in scale-free networks the rise (or breakdown) of cooperation follows a different path than in ER graphs. It is therefore also of interest to study the behavior of Ncc and Nc for SF topologies. In the standard PD on SF graphs, hubs are usually occupied by cooperators and a giant cluster of pure cooperators starts to grow around them until the entire network forms a single cluster. Increasing b produces a reduction in the size of the C cluster, which does not break up until very high values of the temptation are reached. Figure 6 presents the same analysis as Figs. 3 and 5 for the case of SF networks. In sharp contrast with the behavior observed for square lattices and ER graphs, the results of Fig. 6 show that Ncc and Nc behave differently as b grows. The number of pure C and pure Pu clusters decreases monotonically, while only one C + Pu cluster is present in the system until it disappears for very high values of b. This is in agreement with what is known for the standard formulation of the PD on SF networks. Moreover, the results point out that, also in the presence of social punishment, the heterogeneity of the network strongly affects the structure and evolution of cooperation.

Figure 6: Number of clusters of pure cooperative strategists Ncc (upper panel) and number of cooperative players in the corresponding largest cluster Nc as a function of b for SF networks.

The insets depict the results for the standard setup. From left [(A) and (D)] to right [(C) and (F)], the values of β are 0.1, 0.3 and 0.5, respectively. All results are obtained for γ = 0.1.

Finally, we have also monitored how cooperators and punishers are distributed across degree classes. Figure 7 presents the distribution of strategies at the steady state for different degree classes on SF networks for the traditional PD (panel A) and for different values of β (panels B to D). As can be seen, for intermediate and high values of β, cooperators and punishers have a higher probability of occupying large- and medium-degree nodes, while defectors are localized in poorly connected nodes. As was the case for the cluster organization, when β is relatively small (Fig. 7B), pure cooperators are more abundant and tend to dominate in intermediate- and high-degree nodes. However, increasing β produces a growth in the fraction of punishers until, for high fines (β = 0.7, Fig. 7D), a crossover has taken place and cooperators and punishers are practically indistinguishable as far as the degree of the nodes they occupy is concerned.

Figure 7: Distribution of strategies in SF networks.

The points represent the ratios of cooperators, defectors and punishers as a function of the nodes' degrees at the steady state in SF networks. Panel (A) depicts the standard version of the PD, whereas panels (B) to (D) show results obtained for our model with social punishment (β = 0.1, 0.3 and 0.7, respectively). All the results are obtained for b = 2.4 and γ = 0.1.
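The ratios plotted in Fig. 7 can be obtained by grouping nodes by degree and normalizing the strategy counts within each group. The sketch below illustrates one possible way to do so; the input layout (two parallel lists) and the function name are hypothetical, and no binning of degrees is applied, which may differ from what was used to produce the figure.

```python
from collections import Counter, defaultdict


def strategy_by_degree(degrees, strategies):
    """Fraction of each strategy among the nodes of each degree.

    degrees[i]: degree of node i.
    strategies[i]: strategy of node i at the steady state ('C', 'D', 'Pu').
    Returns {degree: {'C': ..., 'D': ..., 'Pu': ...}}.
    """
    counts = defaultdict(Counter)
    for k, s in zip(degrees, strategies):
        counts[k][s] += 1
    result = {}
    for k, counter in counts.items():
        total = sum(counter.values())
        # A Counter returns 0 for strategies absent from this degree class.
        result[k] = {s: counter[s] / total for s in ("C", "D", "Pu")}
    return result
```

In practice the counts would be accumulated over many realizations before normalizing, since high-degree classes contain few nodes in any single SF network.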

Discussion

Inspired by many real-world human, economic and biological scenarios, the inclusion of social punishment in evolutionary models seems a natural choice. In this work we have studied the impact of such a mechanism in spatial evolutionary games when the underlying interaction networks are regular or complex. Numerical simulations have shown that when punishers, who reduce the benefits of defectors at a small cost to themselves, are taken into account, cooperation is further enhanced in both the Prisoner's Dilemma and the Snowdrift games.

The analysis of the system at the microscopic level for the PD game allowed us to identify the mechanisms that drive the survival of cooperative behavior. In homogeneous graphs, small patches of cooperators arise surrounded by punishers that help to protect the cluster against the invasion of defectors, until a giant cluster of pure punishers percolates the system. When the temptation to defect increases further, cooperators first turn into punishers and then the giant cluster breaks down into several smaller ones until defection becomes the dominant strategy. On the other hand, in heterogeneous networks, the rise of cooperation is driven by hubs that can be either cooperators or punishers. When the temptation to defect increases, making cooperation a costly strategy, the invasion of defectors takes place slowly through the erosion of the single cluster of pure cooperators and punishers present in the system. In summary, our work shows that a social punishment mechanism like the one discussed here can be beneficial for sustaining cooperative behavior. Given that only small differences at the microscopic level arise with respect to the standard formulation, we anticipate that similar mechanisms will produce the same qualitative behavior.

Methods

2-person evolutionary games with social punishment

We introduce social punishment in the PD and SD through a new type of agent, the Punisher (Pu). When playing against a defector, a punisher imposes a fine β on the defector at a small cost γ to itself; when interacting with a cooperator or with another punisher, it acts as a cooperator. Punishers are therefore special cooperators, but note that they exist as an independent strategy, at variance with the second-stage behavior of cooperators in ref. 54. Table 1 summarizes the interactions between the agents and the corresponding payoffs. Following a common parametrization in the recent literature26,58,59, we choose the PD's payoffs as R = 1, P = S = 0, and T = b > 1, satisfying the restricted condition T > R > P = S. For the SD we choose a similar scheme with R = 1, S = 1 − r, P = 0 and T = 1 + r, where 0 ≤ r ≤ 1 represents the so-called cost-to-benefit ratio (satisfying the ranking T > R > S > P). Evolution has been simulated employing the finite population analogue of replicator dynamics35,36, implemented as follows. As initial conditions, we assign to each individual, with equal probability, one of the three available strategies: cooperation (C), defection (D) or punishment (Pu). Then, at each time step, each node i in the network plays with all her neighbors and gets a payoff Pi. Next, all the players synchronously update their strategy: each one picks one of her neighbors at random, say j, and compares the respective payoffs Pi and Pj. If Pi > Pj, player i keeps her strategy for the next step. If, on the contrary, Pj > Pi, agent i copies j's strategy with a probability proportional to the payoff difference:

$$\Pi_{i \to j} = \frac{P_j - P_i}{\max\{k_i, k_j\}\,\Delta} \qquad (1)$$

where ki and kj stand for the degrees of agents i and j, respectively, and Δ represents the maximum possible payoff difference between two players. Note that, under Eq. (1), an individual can switch from any strategy to any other. For example, a punisher can become a defector, but this results from the evolutionary dynamics and is not a consequence of a second-stage behavior as in ref. 54.
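The update step can be sketched in code as follows. This is a minimal illustration, assuming that the imitation probability takes the form commonly used for the finite population analogue of replicator dynamics, (Pj − Pi) / (max{ki, kj} Δ); the function names, the dictionary-based data layout and the optional `rng` argument are hypothetical choices made here for readability.

```python
import random


def imitation_probability(p_i, p_j, k_i, k_j, delta):
    """Probability that i copies j (assumed form, see lead-in).

    Zero unless j earned strictly more than i; otherwise the payoff
    difference normalized by max(k_i, k_j) * delta.
    """
    if p_j <= p_i:
        return 0.0
    return (p_j - p_i) / (max(k_i, k_j) * delta)


def synchronous_update(adjacency, strategies, payoffs, delta, rng=random):
    """One synchronous update of all players; returns the new strategies.

    adjacency: dict node -> list of neighbours.
    strategies, payoffs: dicts keyed by node, from the current time step.
    """
    new_strategies = dict(strategies)  # decisions use the old state only
    for i, neighbours in adjacency.items():
        if not neighbours:
            continue  # isolated nodes have nobody to imitate
        j = rng.choice(list(neighbours))
        prob = imitation_probability(
            payoffs[i], payoffs[j], len(neighbours), len(adjacency[j]), delta
        )
        if rng.random() < prob:
            new_strategies[i] = strategies[j]
    return new_strategies
```

Writing the new strategies into a copy, rather than in place, is what makes the update synchronous: every player decides on the basis of the strategies and payoffs of the same time step.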

Table 1 Payoff matrix of the studied evolutionary game. The three strategies are cooperation (C), defection (D) and punishment (Pu). Here, γ stands for the cost of punishment and β is the fine applied to defectors
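As an illustration of how such a payoff matrix can be encoded, the sketch below follows the verbal description of the model: punishers and cooperators earn R among themselves, and against a defector a punisher pays the cost γ while the defector loses the fine β. The specific entries S − γ and T − β are inferred here from that description and should be read as an assumption, as should the function names and the strategy labels.

```python
def payoff(me, other, R, S, T, P, beta, gamma):
    """Payoff earned by `me` when playing against `other`.

    Strategies are 'C' (cooperate), 'D' (defect) and 'Pu' (punish).
    Entries involving Pu are inferred from the model description.
    """
    valid = ("C", "D", "Pu")
    if me not in valid or other not in valid:
        raise ValueError("strategies must be 'C', 'D' or 'Pu'")
    cooperative = ("C", "Pu")
    if me in cooperative and other in cooperative:
        return R  # punishers behave as cooperators among themselves
    if me == "C":
        return S  # cooperator exploited by a defector
    if me == "Pu":
        return S - gamma  # exploited, and pays the cost of punishing
    if other == "C":
        return T  # defector exploiting a cooperator
    if other == "Pu":
        return T - beta  # defector exploiting a punisher, minus the fine
    return P  # mutual defection


def node_payoffs(adjacency, strategies, **params):
    """Total payoff of every node after playing with all its neighbours."""
    return {
        i: sum(payoff(strategies[i], strategies[j], **params) for j in nbrs)
        for i, nbrs in adjacency.items()
    }
```

For the PD one would call these functions with R = 1, S = P = 0 and T = b, and for the SD with R = 1, S = 1 − r, P = 0 and T = 1 + r.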

To ensure that the system has reached a stationary state, we wait a transient time of t0 = 10⁵ time-steps and then calculate 〈c〉 as the average over 10⁴ additional time-steps. As a further check, once t0 has been reached we analyze the size of the fluctuations in c(t): if it is smaller than 10⁻², we assume that the stationary state has been reached; otherwise, we wait for another 10⁴ time-steps and repeat the check. In all the simulations the system reached the stationary state before t0 and no additional time-steps were needed. Moreover, since the heterogeneity of some of the networks could introduce additional noise, all the results have been averaged over 400 independent realizations of the network topology and initial conditions. Finally, we test the robustness of the results by considering three different classes of networks: regular square lattices with periodic boundary conditions, Erdős-Rényi (ER) random graphs60 and Barabási-Albert scale-free (SF) networks61. For all the considered networks we set the same size (N = 10⁴ nodes) and the same average degree, i.e., 〈k〉 = 4.
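The stationarity check described above can be sketched as follows. The text does not specify how the size of the fluctuations is measured; the sketch assumes the difference between the maximum and the minimum of c(t) within a window, which is only one possible choice. The function name and arguments are likewise illustrative.

```python
def stationary_average(c_series, window, tol=1e-2):
    """Average of c(t) over the first window that passes the check.

    c_series: values of c(t) recorded after the transient t0.
    window: number of time steps per check (10**4 in the text).
    tol: maximum allowed fluctuation (10**-2 in the text).
    Returns None if no window in the series passes the check.
    """
    start = 0
    while start + window <= len(c_series):
        chunk = c_series[start:start + window]
        # Assumed measure of the fluctuation size: max minus min.
        fluctuation = max(chunk) - min(chunk)
        if fluctuation < tol:
            return sum(chunk) / len(chunk)
        start += window  # wait for another window and redo the check
    return None
```

The value returned for each realization would then be averaged over the independent realizations of the network topology and initial conditions.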