Introduction

A social dilemma arises when individual rationality leads to collective irrationality1,2,3. Cooperation among multiple agents is a typical social dilemma4. On the one hand, the dilemma of cooperation is prevalent in social and economic activities involving multiple people and organizations5,6,7. On the other hand, cooperation is ubiquitous in actual social relationships8,9,10. Cooperation is the foundation of human social progress and civilization, and most real-world cooperation arises spontaneously, without a central authority. Two questions follow: (1) Under what conditions can cooperation emerge spontaneously among self-interested individuals without centralized power? (2) What mechanism sustains cooperation when everyone has selfish motives? These questions have puzzled theorists for many years11,12.

Scholars from many disciplines have studied the evolution of human cooperative behaviours6,9,10,13,14,15,16,17,18,19,20. Several mature research frameworks and methodologies are now in place21,22,23,24,25, and mechanisms to explain cooperation have been proposed24,26,27,28,29,30,31,32. From the perspective of behavioural science, cooperation is essentially an incentive problem33,34. Hence, to achieve collective rationality through individual choice and obtain the benefits of cooperation, individual behaviour must be motivated and guided, for example by punishing uncooperative behaviours29,35,36,37,38,39 or rewarding cooperative behaviours33,40,41,42,43. In reality, however, punishments and rewards are costly, so providing them is itself a second-order social dilemma44,45,46. Who will impose punishments or rewards, and how are they implemented? These questions are the core of the incentive problem.

If punishment or reward is treated as a strategy available to individuals, one can explore the conditions under which cooperative strategies emerge through self-organization. Various forms of punishment and reward have been proposed under the direct reciprocity mechanism36,37,43,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61. Examples include punishing strategies that punish both cooperators and defectors52,59; moralists, who cooperate whilst punishing non-cooperative behaviours, and immoralists, who defect whilst punishing other non-cooperative behaviours60; self-organised punishment, which allows players to adapt their punishment to the frequency of cooperation49; tolerance-based punishment, in which individuals punish their co-players based on social tolerance61; conditional punishment, with a fine that depends on the number of punishers47; implicated punishment, which punishes all individuals in the group once an evildoer is caught56; probabilistic sharing of the cost of punishing defectors48; and heterogeneous punishment, which groups punishers by their willingness to bear the cost of punishing57. Perc et al.10 presented a thorough review of punishment mechanisms in evolutionary games. In the real world, however, punishment has shortcomings as a means of promoting cooperation62. Apart from the efficiency loss that implementing it may cause, its effect may also depend on other factors that arise in the punishment process. For instance, through behavioural experiments, Nikiforakis et al.63 showed that the form of punishment and its feedback in the public goods game affect group cooperation, and that only appropriate punishment feedback promotes cooperation in the population.

In traditional punishment, the punisher generally pays a cost, and the punished individual simultaneously pays a larger penalty. The exclusion strategy, proposed recently, can be regarded as a new form of punishment64. Rather than assuming that punished defectors pay a fixed penalty, it assumes that excluders expel defectors from the group with a certain probability, and that expelled individuals cannot share the cooperative benefits of the group. This mechanism has received widespread attention since its introduction65,66,67. For instance, Li et al.66 were the first to extend the evolutionary public goods game model with exclusion strategies to a finite population. Liu et al.67 introduced prosocial punishment and exclusion strategies simultaneously, studied the competition between them, and found that exclusion can outperform punishment when they coexist. Moreover, Li et al.68 proposed sequential exclusion, also called asynchronous exclusion, and argued that it is more effective than synchronous exclusion when three strategy types exist, namely, cooperation, defection and exclusion. Whether this conclusion remains valid when other strategy types or structured populations are introduced requires investigation. In addition, how to measure the advantage of the asynchronous exclusion mechanism, and how that advantage depends on the system parameters, remain open questions. This study aims to address them.

This study considers four strategy types in evolutionary public goods games in two kinds of population, namely, finite, well-mixed populations and structured populations, under both synchronous and asynchronous exclusion. In the well-mixed population, as in previous work37,69, the population state is described by a Markov process; unlike that work, we introduce competitive evolution between different strategies. Based on the same evolutionary dynamic model, we analyze the stochastic stable equilibria of the system in the two exclusion cases and the effects of the parameters on the probabilities with which the system selects each equilibrium. The benefit of the asynchronous exclusion mechanism can then be measured by comparing the probability that the system selects the cooperative states in the two situations. In the structured population, we also compare the evolutionary stable state of the system under the two exclusion mechanisms, and thereby obtain the effects of the parameters on the strategy frequencies in the stable state. We provide the range of parameters in which the asynchronous exclusion mechanism is relatively efficient. Together, these results elucidate the emergence of cooperation under different exclusion mechanisms.

Results

Synchronous and asynchronous exclusion in optional public goods games

We introduce four types of individuals in the public goods game (PGG), namely, cooperation (denoted C), defection (denoted D), non-participation (denoted L) and exclusion (denoted E). Cooperators invest in the public good and share the resulting benefits, whereas defectors do not invest but free ride on the cooperators’ investment income. Without loss of generality, we let the investment cost equal one. Non-participants, typically called loners, do not take part in the game but obtain a fixed income σ. Excluders not only invest in the public good but also exclude defectors from the group. Exclusion is costly for the excluders, but it prevents free-riders from sharing the investment income: defectors who are excluded from the group cannot share the benefit of the public good. We assume that an exclusion attempt succeeds not with certainty but only with probability β. Let cE denote the unit exclusion cost for an excluder. A higher success probability β should cost more, so we assume that cE is a function of β with \({c}_{E}^{^{\prime} }(\beta ) > 0\) and \({c}_{E}^{^{\prime\prime} }(\beta ) > 0\). We consider two exclusion mechanisms. The first is synchronous exclusion, under which each excluder independently and simultaneously attempts to expel all defectors. The other is asynchronous exclusion, in which the exclusion process is sequential: once a defector has been expelled by one excluder, later excluders no longer pay the exclusion cost for that defector. The other parameters of our model are as follows: M is the population size (number of individuals); r is the amplification coefficient of the N-person public goods game (1 < r < N); and κ describes how quickly individuals react to the environment when making decisions.

Numerical experiments in a finite size and well-mixed population

In this situation, each individual interacts with others with equal probability. At each time step, N individuals are randomly sampled from the population to participate in the PGG. We focus on the stochastic stable states of the evolutionary system and the probability with which the system selects each of them, and we compare these probabilities under the two exclusion mechanisms. A large number of numerical experiments with arbitrarily chosen parameters in both situations indicate that only the states (0, M, 0, 0), (0, 0, M, 0), (0, 1, M − 1, 0), (0, 0, M − 1, 1) and (i, 0, 0, M − i) (0 ≤ i ≤ M) may be stochastically stable, where state (i, j, k, l) gives the numbers of cooperators, defectors, loners and excluders, respectively, in the population. For example, (0, M, 0, 0) indicates that all individuals choose the defection strategy, and we denote it as the ‘All D’ state. Similarly, (i, 0, 0, M − i) corresponds to the ‘C + E’ states. According to the model assumption, when N − 1 loners exist in the sampled group, the other individual can only obtain a fixed payoff, regardless of its type, which is equivalent to all individuals being loners. Therefore, states (0, 0, M, 0), (0, 1, M − 1, 0) and (0, 0, M − 1, 1) can be collectively referred to as the ‘All L’ state. In the following, we fix parameters M = 20, N = 5, κ = 1 and let \({c}_{E}=0.2\times {10}^{\beta }\) to study the effects of parameters β, r and σ on the probability of the system selecting each stable equilibrium. We provide results for larger populations (M = 50 and 100) in Supplementary Information. The probability that the system selects some stable states changes significantly when M increases to 100, but the main conclusions we present do not differ in any essential way between M = 100 and M = 20.

Figure 1 shows the limit probability of the system selecting each stochastic stable state for all parameter combinations (r, σ) at fixed β = 0.1 under asynchronous exclusion. When β is small, there is a large region of parameters (r, σ), denoted D1, in which the system selects the cooperative state with low probability, and a corresponding region, denoted D2, in which the system selects the ‘All L’ state with high probability. A small region, denoted D3, corresponding to large values of r and small values of σ, leads the system to select the ‘All D’ state with high probability. As β slowly increases, all three regions shrink rapidly, and the probability of the system selecting the ‘All D’ state for (r, σ) in D3 also drops rapidly. More details can be found in Supplementary Information. The results are qualitatively similar under synchronous exclusion, but the probability of the system reaching the cooperative state is lower than under asynchronous exclusion for the same combination of parameters. This finding indicates that the asynchronous exclusion mechanism works better for the evolution of cooperation. We use the probability difference to represent the benefit of asynchronous exclusion in promoting cooperation. Figure 2 shows the relationship between the probability difference and parameters (r, σ) for six fixed values of β. When β is small, the probability difference of the system selecting the cooperative states under the two exclusion mechanisms is small, and the points with relatively large differences lie in a small region D4, which changes as β increases. Because the maximal probability difference does not exceed 0.1 when β ≤ 0.3, the advantage of the asynchronous exclusion mechanism is not evident in that range. As β increases, the maximal probability difference also increases, and the benefits of the asynchronous exclusion mechanism slowly emerge; the points with relatively large differences lie in an area with small values of r and small values of σ. To compare the two exclusion situations more clearly, we fix σ = 0.1. Figures 3 and 4 show the relationship between the limit probability of the system selecting each type of stable state and parameters r and β, respectively, for fixed σ = 0.1 under the two exclusion mechanisms. Only when r is small and β is large does the asynchronous exclusion mechanism have a relatively large advantage in promoting cooperation, which is consistent with our theoretical analysis. In fact, when the number of excluders in the population is l, the expected unit exclusion costs in the asynchronous situation cR and in the synchronous situation cE satisfy the following relation: \({c}_{{\rm{R}}}=\frac{1-{(1-\beta )}^{l+1}}{(l+1)\beta }{c}_{E} < {c}_{E}\). The difference between cR and cE increases as β increases. Moreover, when β → 0+, no difference exists between the two costs.
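The behaviour of the cost relation can be checked numerically. The following is a minimal sketch that only evaluates the ratio cR/cE from the expression above; the function name `async_cost_ratio` is a hypothetical helper, not part of the original study, and l is used exactly as it appears in the formula.

```python
def async_cost_ratio(beta: float, l: int) -> float:
    """Return c_R / c_E = (1 - (1 - beta)**(l + 1)) / ((l + 1) * beta).

    beta: exclusion success probability, 0 < beta <= 1.
    l:    the index l of the formula (a non-negative integer).
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    if l < 0:
        raise ValueError("l must be non-negative")
    return (1.0 - (1.0 - beta) ** (l + 1)) / ((l + 1) * beta)


if __name__ == "__main__":
    # The ratio falls as beta grows, i.e. the saving c_E - c_R increases.
    for beta in (0.1, 0.3, 0.5, 0.8, 1.0):
        print(f"beta={beta:.1f}  c_R/c_E={async_cost_ratio(beta, l=3):.4f}")
```

For l = 0 the ratio equals one, so in this sketch the saving appears only when l ≥ 1.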

Figure 1

Limit probability of the system selecting each stochastic stable state under any parameter combinations of (r, σ) and fixed β = 0.1 in the asynchronous exclusion. In this situation, a large parameter region D1 exists, leading the system to select the cooperative state with a low probability. Moreover, a corresponding region D2 exists, leading the system to select the ‘All L’ state with a high probability. Notably, a small parameter area D3 also exists corresponding to large values of r and small values of σ, which leads the system to select the ‘All D’ state with a high probability.

Figure 2

Relationship between the probability difference of the system selecting the cooperative states and parameters (r, σ) under the two exclusion mechanisms. (a) β = 0.1; (b) β = 0.2; (c) β = 0.3; (d) β = 0.5; (e) β = 0.8; (f) β = 1. When β is small, the probability difference of the system selecting the cooperative states under the two exclusion mechanisms is small. Moreover, the points with relatively large differences in probability are located in a small region that changes with the increase in β. Given that the maximal probability difference does not exceed 0.1 when β ≤ 0.3, the advantage of the asynchronous exclusion mechanism is not evident. As β increases, the maximal probability difference also increases, and the benefits of the asynchronous exclusion mechanism slowly emerge.

Figure 3

Relationship between the limit probability of the system selecting each type of stable state and parameter r for fixed σ = 0.1 under the two exclusion mechanisms. (a) β = 0.1; (b) β = 0.2; (c) β = 0.3; (d) β = 0.5; (e) β = 0.8; (f) β = 1. The values of β in the first three subgraphs are relatively small, and the advantages of the asynchronous exclusion mechanism are not evident. The values of β in the latter three subgraphs are relatively large, and when r is small, the benefits of the asynchronous exclusion mechanism emerge. This confirms that only when r is small and β is large does the asynchronous exclusion mechanism have a relatively large advantage in promoting cooperation.

Figure 4

Relationship between the limit probability of the system selecting each type of stable state and parameter β for fixed σ = 0.1 under the two exclusion mechanisms. (a) r = 1.5; (b) r = 2.0; (c) r = 2.5; (d) r = 3.0; (e) r = 3.5; (f) r = 4. The values of r in the first two subgraphs are relatively small, and when β is large, the advantages of the asynchronous exclusion mechanism emerge. The values of r in the latter four subgraphs are relatively large, and the benefits of the asynchronous exclusion mechanism are not evident.

Simulation results in a structured population

In this situation, each individual is located on a node of a square lattice and plays the PGG with its four direct neighbors. Thus, each individual participates in five rounds of games to accumulate its payoff. Each excluder E excludes each of its adjacent defectors with probability β, whilst paying the β-dependent unit cost cE. Defectors who are expelled from the group cannot obtain investment income in the corresponding single-round game. We consider both synchronous and asynchronous exclusion. Figure 5 shows the frequencies of all strategy types after the system reaches evolutionary stability in the two exclusion situations for β = 0.1 and β = 0.8, as r changes from 1.5 to 4.5. For each simulation, the system iterates for 10,000 to 50,000 rounds (each round consisting of 10,000 elementary updates), ensuring that the system has reached an evolutionary stable state. We take the average over the last 1000 rounds, and every reported result is the average of 20 independent simulation experiments.

Figure 5

Relationship between the frequencies of all the strategy types on the square lattice and parameter r for fixed σ = 0.1 after the system reaches evolutionary stability under the two exclusion mechanisms. (a) β = 0.1; (b) β = 0.8. When β = 0.1, the advantage of the asynchronous exclusion is not evident, and a common r interval (3.1, 3.7) exists in the two exclusion situations in which the D strategy can emerge, with the frequency of D peaking at r = 3.5 after the system becomes stable. When β = 0.8, the asynchronous exclusion mechanism has a significant advantage mainly in the middle interval of r. The D strategy can emerge in two different r intervals, (2.2, 2.9) and (2.5, 3.1), which correspond to the asynchronous and synchronous exclusions, respectively.

The simulation results also verify that asynchronous exclusion works better than synchronous exclusion for promoting cooperation in the structured population. When β is small (β = 0.1), the advantage of the asynchronous exclusion is not evident. In this situation, a common r interval (3.1, 3.7) exists in the two situations in which the D strategy can emerge, and the frequency of D peaks at r = 3.5 after the system becomes stable. Conversely, when β is large (β = 0.8), the asynchronous exclusion mechanism has a significant advantage mainly in the middle interval of r, a range that differs from that in the well-mixed population. The D strategy can emerge in two different r intervals, (2.2, 2.9) and (2.5, 3.1), which correspond to the asynchronous and synchronous exclusions, respectively. To observe the evolution of the spatial distribution of strategies under the two exclusion mechanisms, we choose four parameter combinations for comparison. Figures 6 to 9 show the spatiotemporal distribution of the four strategies in the PGG at different Monte Carlo steps (MCS) in the two exclusion situations.

Figure 6

Spatiotemporal distribution of the four strategies in the PGG at t = 20, 40, 60, 300 Monte Carlo steps (MCS) and t = 20, 200, 2000, 5000 MCS, respectively, in the two exclusion situations for one simulation, when β = 0.1, r = 3.5 and σ = 0.1. (a) Synchronous exclusion; (b) asynchronous exclusion. In both exclusion situations, the system reaches the stable state of C + D + L, that is, the three strategies coexist.

Figure 7

Spatiotemporal distribution of the four strategies in the PGG at t = 20, 40, 60, 500 MCS and t = 20, 40, 240, 9000 MCS, respectively, in the two exclusion situations for one simulation, when β = 0.8, r = 2.6 and σ = 0.1. (a) Synchronous exclusion; (b) asynchronous exclusion. In the synchronous exclusion situation, the system reaches the state of L, whereas in the asynchronous exclusion situation, the system reaches the stable state of C + D + L, that is, the three strategies coexist.

Figure 8

Spatiotemporal distribution of the four strategies in the PGG at t = 20, 40, 60, 300 MCS and t = 20, 200, 2000, 5000 MCS, respectively, in the two exclusion situations for one simulation, when β = 0.8, r = 2.9 and σ = 0.1. (a) Synchronous exclusion; (b) asynchronous exclusion. In the synchronous exclusion situation, the system reaches the state of D, whereas in the asynchronous exclusion situation, the system reaches the stable state of C + E, that is, the two cooperative strategies coexist.

Figure 9

Spatiotemporal distribution of the four strategies in the PGG at t = 20, 500, 5000, 15000 MCS and t = 20, 40, 200, 2000 MCS, respectively, in the two exclusion situations for one simulation, when β = 0.8, r = 3.1 and σ = 0.1. (a) Synchronous exclusion; (b) asynchronous exclusion. In both exclusion situations, the system reaches the stable state of C + E, that is, the two cooperative strategies coexist.

Discussion

We verified that the asynchronous exclusion mechanism is indeed better than the synchronous exclusion mechanism for promoting cooperation in the well-mixed and structured populations. The benefits of the asynchronous exclusion are measured by comparing the probability that the system chooses the cooperative states in the two situations. In the well-mixed population cases, only when the investment amplification factor is small and the probability of exclusion success is high will the asynchronous exclusion mechanism have a relatively large advantage in promoting cooperation. However, in the structured population cases, the range of the investment amplification factor, in which the asynchronous exclusion mechanism has relatively large advantages in promoting cooperation, is somewhat different and is mainly in the middle of the interval under our parameters.

Population structure, voluntary participation and exclusion can each promote the evolution of cooperation. However, when these three mechanisms act simultaneously, we find that, within our parameters, there is an interval of r in which a large proportion of individuals choose defection after the system becomes stable. Figure 5(b) illustrates that when β = 0.8, the defection frequency peaks at approximately 0.4 around r = 2.7 under asynchronous exclusion and at roughly 0.8 around r = 3.0 under synchronous exclusion. The figure further depicts that in the well-mixed population under the same parameters, the frequency of defection is nearly zero. Thus, when non-participation and exclusion strategies exist, population structure does not necessarily promote cooperation relative to the well-mixed population for some parameter combinations. These results enrich the existing conclusions on the voluntary and exclusion mechanisms for the evolution of cooperation. Whether they hold when the population has a heterogeneous network structure needs further verification, and we will explore this issue in subsequent research.

Methods

Evolutionary dynamics in a finite size and well-mixed population

Consider a finite population consisting of M individuals. Let variables X, Y, Z and W denote the numbers of cooperators, defectors, loners and excluders in the population, respectively. At each time step, N individuals are sampled randomly from the population to play the PGG. Let variables i, j, k and l denote the numbers of cooperators, defectors, loners and excluders, respectively, in a sampled group.
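The sampling step can be made concrete with a short sketch. Drawing the N − 1 co-players of a focal individual uniformly without replacement from the other M − 1 individuals gives a multivariate hypergeometric weight for each group composition. The function below is a hypothetical helper written for a focal cooperator, so that the remaining cooperators number M − 1 − W − Z − Y; it is an illustration, not the authors' code.

```python
from math import comb


def coplayer_composition_prob(M: int, N: int, W: int, Z: int, Y: int,
                              l: int, k: int, j: int) -> float:
    """Probability that the N - 1 co-players of a focal cooperator contain
    l excluders, k loners and j defectors (the rest being cooperators).

    M: population size; N: group size;
    W, Z, Y: numbers of excluders, loners and defectors in the population.
    """
    other_cooperators = M - 1 - W - Z - Y   # cooperators besides the focal one
    i = N - 1 - l - k - j                   # cooperators among the co-players
    if other_cooperators < 0 or min(i, l, k, j) < 0:
        return 0.0
    ways = comb(W, l) * comb(Z, k) * comb(Y, j) * comb(other_cooperators, i)
    return ways / comb(M - 1, N - 1)


if __name__ == "__main__":
    M, N, W, Z, Y = 20, 5, 4, 5, 6
    total = sum(coplayer_composition_prob(M, N, W, Z, Y, l, k, j)
                for l in range(N)
                for k in range(N - l)
                for j in range(N - l - k))
    print(total)  # the weights of all compositions should sum to one
```

Summing the weights over all admissible (l, k, j) is a quick sanity check that no composition has been missed or double counted.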

In the synchronous exclusion situation, the expected payoffs of the cooperation, defection, loner and exclusion type strategies are as follows. Details of the analysis are given in the Supplementary Information.

$$\begin{array}{rcl}{\pi }_{C}^{(X,Y,Z,W)} & = & \sum _{l=1}^{N-1}\sum _{k=0}^{N-1-l}\sum _{j=0}^{N-1-l-k}\frac{(\begin{array}{c}W\\ l\end{array})(\begin{array}{c}Z\\ k\end{array})(\begin{array}{c}Y\\ j\end{array})(\begin{array}{c}M-1-W-Z-Y\\ N-1-l-k-j\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\\ & & \times \,\sum _{s=0}^{j}(\begin{array}{c}j\\ s\end{array}){{p}_{1}}^{s}{(1-{p}_{1})}^{j-s}[\frac{r(N-k-j)}{N-k-j+s}-1]\\ & & +\,\frac{(\begin{array}{c}M-1-W\\ N-1\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\sum _{k=0}^{N-2}\sum _{j=0}^{N-1-k}\frac{(\begin{array}{c}Z\\ k\end{array})(\begin{array}{c}Y\\ j\end{array})(\begin{array}{c}M-1-W-Z-Y\\ N-1-k-j\end{array})}{(\begin{array}{c}M-1-W\\ N-1\end{array})}\\ & & \times \,[\frac{r(N-k-j)}{N-k}-1]+\frac{(\begin{array}{c}Z\\ N-1\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\sigma (X\ne 0),\end{array}$$

where \({p}_{1}={(1-\beta )}^{l}\).
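To illustrate the inner sum of the cooperator payoff above, the following sketch evaluates it for one fixed group composition with at least one excluder. It reads s as the number of defectors that escape exclusion, each independently with probability p1 = (1 − β)^l; this reading and the function name are assumptions made for illustration.

```python
from math import comb


def conditional_cooperator_payoff(N: int, l: int, k: int, j: int,
                                  r: float, beta: float) -> float:
    """Expected payoff of a focal cooperator whose N - 1 co-players include
    l >= 1 excluders, k loners and j defectors (synchronous exclusion)."""
    if l < 1 or k < 0 or j < 0 or l + k + j > N - 1:
        raise ValueError("invalid group composition")
    p1 = (1.0 - beta) ** l        # a given defector escapes all l excluders
    contributors = N - k - j      # cooperators and excluders, focal included
    expected = 0.0
    for s in range(j + 1):        # s defectors remain in the group
        weight = comb(j, s) * p1 ** s * (1.0 - p1) ** (j - s)
        expected += weight * (r * contributors / (contributors + s) - 1.0)
    return expected


if __name__ == "__main__":
    for beta in (0.0, 0.5, 1.0):
        print(beta, conditional_cooperator_payoff(5, 1, 0, 2, 3.0, beta))
```

With β = 1 every defector is expelled and the payoff reduces to r − 1; with β = 0 no defector is expelled and all j defectors share the public good.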

$$\begin{array}{rcl}{\pi }_{D}^{(X,Y,Z,W)} & = & \sum _{l=1}^{N-1}\sum _{k=0}^{N-1-l}\sum _{j=0}^{N-1-l-k}\frac{(\begin{array}{c}W\\ l\end{array})(\begin{array}{c}Z\\ k\end{array})(\begin{array}{c}Y-1\\ j\end{array})(\begin{array}{c}M-W-Z-Y\\ N-1-l-k-j\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\\ & & \times \,{p}_{1}\sum _{s=0}^{j}(\begin{array}{c}j\\ s\end{array}){{p}_{1}}^{s}{(1-{p}_{1})}^{j-s}[\frac{r(N-k-j-1)}{N-k-j+s}]\\ & & +\,\frac{(\begin{array}{c}M-1-W\\ N-1\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\sum _{k=0}^{N-2}\sum _{j=0}^{N-1-k}\frac{(\begin{array}{c}Z\\ k\end{array})(\begin{array}{c}Y-1\\ j\end{array})(\begin{array}{c}M-W-Z-Y\\ N-1-k-j\end{array})}{(\begin{array}{c}M-1-W\\ N-1\end{array})}\\ & & \times \,[\frac{r(N-k-j-1)}{N-k}]+\frac{(\begin{array}{c}Z\\ N-1\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\sigma \,(Y\ne 0),\end{array}$$
$${\pi }_{L}^{(X,Y,Z,W)}=\sigma \,(Z\ne 0),$$
$$\begin{array}{rcl}{\pi }_{E}^{(X,Y,Z,W)} & = & \sum _{l=0}^{N-1}\sum _{k=0}^{N-1-l}\sum _{j=0}^{N-1-l-k}\frac{(\begin{array}{c}W-1\\ l\end{array})(\begin{array}{c}Z\\ k\end{array})(\begin{array}{c}Y\\ j\end{array})(\begin{array}{c}M-W-Z-Y\\ N-1-l-k-j\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\sum _{s=0}^{j}(\begin{array}{c}j\\ s\end{array})\\ & & \times \,{{p}_{2}}^{s}{(1-{p}_{2})}^{j-s}[\frac{r(N-k-j)}{N-k-j+s}-1-{c}_{E}j]\\ & & -\frac{(\begin{array}{c}Z\\ N-1\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}(r-1)+\frac{(\begin{array}{c}Z\\ N-1\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\sigma \,(W\ne 0),\end{array}$$

where \({p}_{2}={(1-\beta )}^{l+1}\).

In the asynchronous exclusion situation, the expected payoffs of cooperators, defectors and loners remain the same, but the expected payoff of the exclusion-type strategy becomes

$$\begin{array}{rcl}{\pi }_{E}^{(X,Y,Z,W)} & = & \sum _{l=0}^{N-1}\sum _{k=0}^{N-1-l}\sum _{j=0}^{N-1-l-k}\frac{(\begin{array}{c}W-1\\ l\end{array})(\begin{array}{c}Z\\ k\end{array})(\begin{array}{c}Y\\ j\end{array})(\begin{array}{c}M-W-Z-Y\\ N-1-l-k-j\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\sum _{s=0}^{j}(\begin{array}{c}j\\ s\end{array}){{p}_{2}}^{s}\\ & & \times \,{(1-{p}_{2})}^{j-s}[\frac{r(N-k-j)}{N-k-j+s}-1-{c}_{R}j]\\ & & -\frac{(\begin{array}{c}Z\\ N-1\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}(r-1)+\frac{(\begin{array}{c}Z\\ N-1\end{array})}{(\begin{array}{c}M-1\\ N-1\end{array})}\sigma \,(W\ne 0).\end{array}$$

cR is the expected unit expulsion cost in the asynchronous exclusion situation, where \({c}_{{\rm{R}}}=\frac{1-{(1-\beta )}^{l+1}}{(l+1)\beta }{c}_{E}\).
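One way to arrive at this expression, under the assumption (not stated explicitly in the text) that the focal excluder is equally likely to occupy any of the l + 1 positions in the exclusion sequence, is as follows. An excluder in position m pays for a given defector only if the m − 1 excluders before it have all failed, which happens with probability \({(1-\beta )}^{m-1}\). Averaging over the positions and summing the geometric series gives

$$c_{R}=\frac{c_{E}}{l+1}\sum _{m=1}^{l+1}{(1-\beta )}^{m-1}=\frac{1-{(1-\beta )}^{l+1}}{(l+1)\beta }\,c_{E}.$$

Every term with m ≥ 2 is smaller than one when β > 0, so cR < cE whenever l ≥ 1, and each term tends to one as β → 0+, so cR → cE in that limit.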

When X, Y, Z or W equals zero, the corresponding payoff \({\pi }_{C}^{(0,Y,Z,W)}\), \({\pi }_{D}^{(X,0,Z,W)}\), \({\pi }_{L}^{(X,Y,0,W)}\) or \({\pi }_{E}^{(X,Y,Z,0)}\) is not defined by the expressions above, because no individual of that type exists. In this situation, the payoff of that strategy type is defined as the average payoff of the population.

During evolution, strategies transfer to one another according to their relative payoffs. We describe the relative rate of transfer between strategies by the transfer rate \({p}_{{s}_{1}\to {s}_{2}}^{(X,Y,Z)}=\varepsilon +\kappa \cdot {({\pi }_{{s}_{2}}^{(X,Y,Z)}-{\pi }_{{s}_{1}}^{(X,Y,Z)})}^{+}\), where \({f}^{+}=\,{\rm{\max }}(f,0)\), ε > 0 is a small positive number, κ > 0 is a parameter describing individuals’ reaction speed to the environment in their decisions, and \({s}_{1},{s}_{2}\in \{C,D,L,E\}\) with s1 ≠ s2. The transfer rate differs from the transition probability (such as that given by the Fermi function) of individuals in a structured population: it is a macro-indicator of the intensity of mutual transfer between the four strategies in the well-mixed system, whereas the agent-based transition probability is a micro-indicator of how individuals change their strategies. Nevertheless, the transfer rate can be used to understand how individuals change their strategies. In each time interval t (t sufficiently small), one of the four strategies is chosen. Without loss of generality, suppose it is the C strategy. Then it transfers to the D, L and E strategies with probabilities \({p}_{C\to D}^{(X,Y,Z)}t+o(t)\), \({p}_{C\to L}^{(X,Y,Z)}t+o(t)\) and \({p}_{C\to E}^{(X,Y,Z)}t+o(t)\), respectively, and remains the same with probability \(1-\sum _{S\in \{D,L,E\}}{p}_{C\to S}^{(X,Y,Z)}t-o(t)\), where (X, Y, Z) is the state of the system and o(t) is a higher-order infinitesimal of t. Changes in individual strategies change the state of the system, whose evolution can be described as an ergodic multi-dimensional Markov process. From the limit distribution of this process, we obtain the stochastic stable states of the system and their corresponding limit probabilities. Details of the Markov-process-based dynamics and the definition of stochastic stable equilibrium are given in the Supplementary Information.
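The transfer rate and the resulting one-step probabilities can be sketched as follows. This is an illustration only: the value of ε, the payoff numbers in the usage example and the helper names are assumptions, and the o(t) terms are dropped.

```python
STRATEGIES = ("C", "D", "L", "E")


def transfer_rate(pi_from: float, pi_to: float,
                  kappa: float = 1.0, eps: float = 1e-3) -> float:
    """Transfer rate eps + kappa * max(pi_to - pi_from, 0)."""
    return eps + kappa * max(pi_to - pi_from, 0.0)


def step_probabilities(payoffs: dict, current: str, t: float,
                       kappa: float = 1.0, eps: float = 1e-3) -> dict:
    """First-order probabilities that the chosen strategy `current`
    transfers to each other strategy, or stays, within a short interval t."""
    probs = {s: transfer_rate(payoffs[current], payoffs[s], kappa, eps) * t
             for s in STRATEGIES if s != current}
    stay = 1.0 - sum(probs.values())
    if stay < 0.0:
        raise ValueError("t is too large for the first-order approximation")
    probs[current] = stay
    return probs


if __name__ == "__main__":
    # Hypothetical payoffs in some state (X, Y, Z); values are made up.
    payoffs = {"C": 1.0, "D": 1.5, "L": 0.1, "E": 0.8}
    print(step_probabilities(payoffs, "C", t=0.01))
```

Because ε > 0, every transfer has positive probability even towards a strategy with a lower payoff, which is what keeps the chain from being trapped in a single state.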

Evolutionary dynamics in a structured population

We consider a 100 × 100 square lattice with periodic boundary conditions. Initially, the four strategy types (C, D, L and E) are distributed uniformly at random on the lattice. Let \({\pi }_{{s}_{i}}\) denote the cumulative payoff of individual i (with strategy si) from participating in five rounds of the PGG. The system evolves according to the following rules. At each step, an individual i is selected at random, and i then chooses one of its neighbors, j, at random. The chosen individual j imitates the strategy of individual i with probability \(p({s}_{i}\to {s}_{j})=\frac{1}{1+\exp [({\pi }_{{s}_{j}}-{\pi }_{{s}_{i}})/\tau ]}\), where τ is a parameter indicating the intensity of noise. When τ → 0, individual j imitates the strategy of individual i if and only if \({\pi }_{{s}_{i}} > {\pi }_{{s}_{j}}\). When τ → +∞, whether individual j imitates the strategy of individual i is completely random. In our simulation, we fix τ = 0.1 and count 10,000 repetitions of the above process as one round of iterations, so that each node has, on average, one opportunity to adjust its strategy per round. The system stabilizes after a sufficiently large number of rounds.
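The imitation rule and the periodic neighborhood can be sketched in a few lines. This is a minimal illustration, not the simulation code used for the results: the function names are hypothetical, and the exponential is evaluated in a numerically stable form so that large payoff differences do not overflow.

```python
import math


def imitation_probability(pi_i: float, pi_j: float, tau: float = 0.1) -> float:
    """Probability that j adopts i's strategy:
    1 / (1 + exp((pi_j - pi_i) / tau))."""
    x = (pi_j - pi_i) / tau
    if x >= 0.0:
        e = math.exp(-x)              # rewritten form avoids overflow
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def lattice_neighbors(x: int, y: int, size: int = 100) -> list:
    """Four direct neighbors of node (x, y) with periodic boundaries."""
    return [((x - 1) % size, y), ((x + 1) % size, y),
            (x, (y - 1) % size), (x, (y + 1) % size)]


if __name__ == "__main__":
    print(imitation_probability(1.0, 1.0))   # equal payoffs
    print(imitation_probability(1.0, 0.0))   # i earns more than j
    print(lattice_neighbors(0, 0))           # wraps around the edges
```

With τ = 0.1, a payoff advantage of one unit for i already makes imitation by j almost certain, while equal payoffs give a probability of one half.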