Coevolution of teaching ability and cooperation in spatial evolutionary games

Individuals with higher reputation are able to spread their social strategies easily. At the same time, one's reputation changes according to one's previous behavior, which leads to different teaching abilities among players. To explore the effect of reputation-dependent teaching ability, we consider a coevolutionary model in which the reputation score affects the updating rule in spatial evolutionary games. More precisely, the probability that a player adopts a partner's strategy becomes larger if the partner has a positive reputation and smaller otherwise. This simple design captures the influence of teaching ability on strategy adoption. Numerical results focus on the proportion of cooperators for different values of the amplitude of reputation change and of the range of reputation. Under this dynamics, the fraction of cooperators grows over a wide range of parameters. In addition, to validate the generality of this mechanism, we also employ the snowdrift game. Moreover, the evolution of cooperation on the Erdős-Rényi random graph is studied for the prisoner's dilemma game. Our results may be conducive to understanding the emergence and sustainability of cooperation during strategy adoption in reality.

defectors through forming clusters, which can protect those cooperators that are located in the interior of clusters. Following this pioneering work, a great number of cooperation-promoting mechanisms have been studied; see, for example, the survey articles 32,33. Complex networks, whose connectivity distributions resemble those of real complex systems such as air transportation networks and the Internet, provide a unified framework for understanding common cooperative behaviors [34][35][36][37].
Teaching activity 38,39 is an important process in the evolution of cooperation; it refers to the influence or reproduction rate of individuals. Players with high influence are more likely to reproduce than individuals with low influence, i.e., they have a higher teaching ability. In previous works 38,39, teaching ability is a control variable that remains unchanged during the evolution. In reality, however, teaching ability changes continuously, and we therefore adopt a coevolutionary model in this paper. In addition, it is not hard to imagine that players with higher reputation are able to spread their social strategies easily. For example, a company gains an ever higher reputation if it always completes its production tasks on time according to its contracts with other enterprises. More and more firms are then not only inclined to cooperate or deal with it, but also more likely to refer to and imitate its operational management or technologies. Conversely, other companies are unwilling to imitate a company with a poor reputation. Therefore, it is reasonable to use the reputation score to represent teaching ability. Reputation is a class of individual information about one's past behaviors, and it changes according to those behaviors. It has promoted the evolution of cooperation effectively in games of indirect reciprocity. As a classical theoretical model of reputation, image scoring, in which cooperative behaviors increase the score and defective actions decrease it by one unit 40, has been studied extensively. It has been shown that cooperation can be enhanced markedly with the aid of reputation. Migration based on reputation has been introduced into the spatial PDG 41. Individuals can adjust their partnerships on the basis of local information about reputation 42. The time scales of selection and updating change if reputation is introduced 43,44.
Some coevolutionary models of time scale and cooperation were employed in previous works 45,46, and the results show that cooperation can be promoted when an individual with a high payoff holds a successful strategy for a longer time. In the present paper, strategy updating and teaching ability have the same time scale. In addition, cognitive ability based on reputation has also been studied, suggesting that the reputation mechanism can be seen as a universally applicable promoter of cooperation that works on various interaction networks and in different types of evolutionary games [47][48][49][50][51]. However, one cannot ignore a partner's reputation (teaching ability) when updating one's strategy, because reputation carries a lot of information about the partner. A player's teaching ability can directly affect a partner's decision. Generally speaking, one is more likely to adopt the strategy of a partner with a good reputation and to reject that of a partner with a bad reputation. For example, virtuous people usually spread their ideas easily in reality. This form of connection between reputation and a partner's teaching ability has not been studied in previous work. Therefore, a more realistic scenario acknowledges that a player makes a decision by taking teaching ability into consideration.
Based on the above considerations, in the present paper we propose a modified updating rule that incorporates the partner's reputation to describe teaching ability. It is assumed that individuals acquire reputation information without extra cost because it can spread among neighbors by gossip. The prisoner's dilemma game (PDG) and the snowdrift game (SDG) are employed to model social dilemmas, with interactions structured by network topologies. In this paper, we consider the regular lattice (with either the von Neumann neighborhood or the Moore neighborhood, i.e., the degree k is equal to 4 or 8 for each vertex, respectively) and the Erdős-Rényi (ER) random graph. For the ER random graph, the average degree k is equal to 4. Simulation results show that a higher level of cooperation appears when teaching ability is in effect during the decision-making process.

Results
Teaching ability, represented by the reputation score R_i, is introduced into the strategy updating rule to explore its influence on the emergence of cooperative behavior in spatial evolutionary games. A player's teaching ability changes during the evolution of the game. The reputation R_i changes by an amplitude δ (> 0) at each step: it increases by δ if player i chooses cooperation and decreases by δ otherwise. Additionally, R_i ∈ [−α, α] (α > 0), which means that reputation saturates whether it is good or bad. The reputation score has an important effect on strategy updating. Qualitatively, the probability that a player adopts a neighbor's strategy becomes larger if that neighbor's reputation is positive and smaller otherwise. Furthermore, δ/α is the fluctuation ratio of reputation, which represents the intensity of teaching ability. In the following results, the size of the regular lattice ranges from 100 × 100 to 200 × 200, and the size of the ER random graph is 10000. Each Monte Carlo (MC) simulation runs for 61000 steps. The details of the interactions between agents and their corresponding payoffs are summarized in the Methods section.
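The reputation dynamics described above (a step of ±δ followed by saturation in [−α, α]) can be illustrated with a minimal Python sketch. The function name `update_reputation` and the example sequence of actions are illustrative assumptions, not part of the original model description.

```python
def update_reputation(reputation, cooperated, delta, alpha):
    """Return the reputation after one round, saturated to [-alpha, alpha].

    Cooperation raises the score by delta, defection lowers it by delta.
    """
    step = delta if cooperated else -delta
    return max(-alpha, min(alpha, reputation + step))


# Hypothetical action history: three cooperations followed by one defection.
r = 0.0
history = []
for cooperated in [True, True, True, False]:
    r = update_reputation(r, cooperated, delta=1.5, alpha=3.0)
    history.append(r)

print(history)  # [1.5, 3.0, 3.0, 1.5]
```

The third cooperation leaves the score at α = 3.0, which is the saturation effect; with δ/α = 0.5, two consecutive cooperations are enough to move a neutral player to the upper bound.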
We start by examining the effect of the new strategy adoption rule on the persistence of cooperation. As shown in Fig. 1, two neighborhood sizes are compared to analyze the impact of strategy selection on the evolution of cooperation on the regular lattice (200 × 200) with periodic boundary conditions. More concretely, panels (a) and (b) correspond to the von Neumann neighborhood and the Moore neighborhood 52, respectively. δ = 0 means that the model reduces to the traditional version, in which the normalized payoff difference (P_i − P_j)/k is the sole determinant of strategy updating. According to previous research 31, cooperators can form clusters to resist invasion by defectors, which is called network reciprocity. However, the cooperators located on the edge of the cooperative clusters are prone to switching to defection as the value of b increases, which eventually results in the dissolution of the cooperative clusters (i.e., in the two traditional cases, the values of b at which cooperators vanish are less than 1.1).
As shown in panels (a) and (b), once δ > 0 the evolution of the whole system becomes totally different because teaching ability is taken into account. The normalized payoff difference (P_i − P_j)/k and the teaching ability (reputation score) jointly determine strategy updating. Here, b is fixed at 1.15. For each curve in Fig. 1, the fraction of cooperators ρ_c increases monotonically with the fluctuation ratio of reputation δ/α. Although the temptation to defect b is high, this mechanism guides players toward cooperation so that cooperators survive in the system. Cooperators can even dominate the whole network in some cases. With teaching ability introduced, individuals update their strategies depending on both the normalized payoff difference and teaching ability; this updating rule appears reasonable and makes cooperation the dominant strategy. Furthermore, the range of reputation α has a great impact on the evolution of cooperation. The emergence and spread of cooperators are faster and stronger for α = 3 than for α = 1 or α = 2. However, the gap between the curves for α = 2 and α = 3 is smaller than that between α = 1 and α = 2, which suggests that the effect of the range of reputation saturates as α grows. The above results clearly indicate that the evolution of cooperation is greatly promoted under the newly introduced mechanism.
It remains interesting to elucidate how this new mechanism promotes cooperation. To this end, we show some characteristic snapshots on a 100 × 100 square lattice (von Neumann neighborhood) in Fig. 2 (green and red represent cooperators and defectors, respectively). The parameter b is held constant in all snapshots (b = 1.17). In the upper row, the snapshots are given for t = 0, 5, 10, 100, 60000. Initially, cooperators and defectors are scattered uniformly over the lattice. As described earlier, cooperators die out in the traditional version under this condition. However, the evolution is clearly different once teaching ability (δ = 1.5 and α = 3.0) is incorporated. Compared with the traditional case, all players take more information (the teaching ability) into account when they make a decision. Cooperative clusters can protect cooperators even though the value of b is high. Many cooperators are able to survive at the stable stage even though the fraction of cooperators falls at the beginning of the evolution.
To compare with Fig. 1, we also explore the distribution of strategies at the 60000th Monte Carlo step (MCS) in the lower panels of Fig. 2. The parameters are δ = 0, 0.75, 1.5, 2.25, 3.0 from left to right, with α = 3.0. The other settings are consistent with the upper row. For δ = 0, cooperators still cannot survive because the selection intensity is not enough to resist the temptation to defect. As δ increases, many large and compact clusters form steadily. For example, the territory of the defectors becomes smaller and smaller as δ increases from 0 to 3. The increased teaching ability means that information other than the payoff becomes more and more important for individuals. Such a consideration is reasonable, e.g., one considers many things besides profit when making a decision. These results illustrate that this mechanism can facilitate network reciprocity remarkably. Based on this, it is not hard to understand that cooperators can dominate the whole system when δ reaches its maximum under the same conditions. The simulated phenomena imply that cooperators can survive or even thrive owing to the consideration of appropriate teaching ability.
Note that, from the above analyses, two major factors affect the probability of strategy updating: the normalized payoff difference and the value of reputation. Therefore, it is necessary to study individuals' preference for strategies. What follows is an observation of the temporal behavior of the strategy retention rate. In Fig. 3, the parameter b is fixed at 1.2 in every panel and δ/α = 0.75 except in the traditional case. Therein, ρ_{c→c} represents the rate at which a cooperator remains a cooperator between two rounds. Analogously, ρ_{d→d} is the rate at which defectors retain the defective strategy over time. For the traditional case, ρ_{c→c} quickly reaches 0 and ρ_{d→d} quickly becomes 1, since the temptation to defect is high. Consequently, cooperators soon become extinct because the payoff of defectors exceeds that of cooperators. However, cooperators can occupy a certain territory when teaching ability (α = 1.0) is taken into consideration. For α = 2.0, ρ_{c→c} first drops while ρ_{d→d} shows the opposite trend, which indicates that, in the early stage, the overall environment is still unfavorable for the persistence of cooperation even with the help of different teaching abilities. After that stage, ρ_{c→c} rises quickly, ρ_{d→d} falls rapidly, and ρ_{c→c} even exceeds ρ_{d→d}. Since the cooperative strategy is more likely to become the reference choice, it is not hard to understand why cooperation spreads widely in Fig. 2. For α = 4.0, the evolutionary trend is similar to that for α = 2.0. However, the gap between the two curves becomes bigger and bigger, and ρ_{c→c} increases to 1. This process means that the prevalence of cooperation is positively related to the value of α. Based on these results, we conclude that the incorporation of reputation-dependent teaching ability adjusts the microscopic preferences of players and accelerates the dissemination of cooperation. These promoting effects are consistent with the aforementioned results.
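As an illustration of how the retention rates could be measured, the following Python sketch compares the strategies of the same players in two consecutive rounds. It assumes that both rates are normalized by the total population size, which is consistent with ρ_{c→c} reaching 0 and ρ_{d→d} reaching 1 when cooperators go extinct; the text does not state the normalization explicitly, and the function name `retention_rates` is hypothetical.

```python
def retention_rates(previous, current):
    """Return (rho_cc, rho_dd) for two consecutive strategy snapshots.

    previous, current: sequences of 'C' / 'D', one entry per player,
    in the same player order. Both rates are divided by the population
    size (an assumption, see the accompanying text).
    """
    if len(previous) != len(current) or len(previous) == 0:
        raise ValueError("snapshots must be non-empty and of equal length")
    n = len(previous)
    stay_c = sum(1 for a, b in zip(previous, current) if a == 'C' and b == 'C')
    stay_d = sum(1 for a, b in zip(previous, current) if a == 'D' and b == 'D')
    return stay_c / n, stay_d / n


# Five hypothetical players: one cooperator switches to defection.
rho_cc, rho_dd = retention_rates("CCDDC", "CDDDC")
print(rho_cc, rho_dd)  # 0.4 0.4
```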
Besides, it is worth considering how the critical threshold b_c changes with the fluctuation ratio δ/α. Fig. 4 shows the simulation results on a 200 × 200 square lattice (k = 4), where b_c denotes the threshold at which cooperators die out. For the traditional version, we see a straight line (black) at the bottom of Fig. 4. It indicates that all players care only about payoffs, so that defection becomes the rational choice. However, b_c increases monotonically from left to right for the other three curves, which means that the parameter region in which cooperators survive is enlarged as δ/α increases. Players with good reputation restrain selfish agents from adopting defection. For example, the cooperators located on the edge of clusters in Fig. 2 are more loyal to cooperation. This result shows that the new mechanism can promote the survival of cooperators among selfish players.
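One simple way to read a threshold such as b_c off a parameter sweep is sketched below in Python. It assumes that b_c is taken as the smallest sampled b at which the stationary fraction of cooperators falls to zero within a tolerance; the function name, the tolerance, and the toy numbers are illustrative assumptions and are not data from this study.

```python
def critical_b(b_values, rho_c_values, tol=1e-3):
    """Smallest sampled b whose stationary cooperator fraction is <= tol.

    Returns None if cooperators survive for every sampled b.
    """
    for b, rho in sorted(zip(b_values, rho_c_values)):
        if rho <= tol:
            return b
    return None


# Toy sweep with made-up numbers, for illustration only.
b_grid = [1.00, 1.05, 1.10, 1.15, 1.20]
rho_toy = [0.90, 0.60, 0.20, 0.00, 0.00]
b_c = critical_b(b_grid, rho_toy)
print(b_c)  # 1.15
```

The resolution of such an estimate is limited by the spacing of the sampled b values, so a finer grid near the transition would be needed for a precise threshold.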
Lastly, it is worth exploring the robustness and generality of the above observations by means of different networks and evolutionary game models. Here we set δ/α = 1 for all curves in Fig. 5. The MC simulation results in the left panel are for the prisoner's dilemma game on the ER random graph. The network has the same average degree (i.e., k = 4) and size (N = 10^4 nodes) as the regular lattice. As shown, a clear promotion effect on the evolution of cooperation can be observed compared with the traditional version (δ = 0). Cooperation is strengthened effectively with increasing δ in the PDG, which is qualitatively consistent with the results obtained on the regular network. As an example, the critical value b_c exceeds 2.0 when δ = 3.0, which implies that cooperators can survive or even thrive within a large range of b. The right panel of Fig. 5 depicts the fraction of cooperators ρ_c in the SDG on the regular network (200 × 200 and k = 4) as a function of the parameter r. Likewise, cooperation is clearly enhanced. These results support the view that teaching ability influenced by reputation is a broadly effective way to sustain and promote cooperation, regardless of the form of the underlying game and interaction network considered here.
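For readers who wish to reproduce a network of this kind, the Python sketch below builds a random graph with N = 10^4 nodes and average degree 4. It assumes the fixed-edge-count variant of the ER model, G(N, M) with M = N·k/2 edges, chosen here because it gives the target average degree exactly; the text does not specify which ER variant was used, and the function name `er_graph` is hypothetical.

```python
import random


def er_graph(n, avg_degree, rng):
    """Adjacency lists of a random graph with exactly n * avg_degree / 2 edges.

    Edges are drawn uniformly at random without self-loops or duplicates
    (the G(N, M) variant of the ER model, an assumption of this sketch).
    """
    m = n * avg_degree // 2
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))  # store each edge once
    adjacency = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


adjacency = er_graph(10**4, 4, random.Random(1))
mean_degree = sum(len(nb) for nb in adjacency) / len(adjacency)
print(mean_degree)  # 4.0
```

Unlike on the lattice, individual degrees vary from node to node, so the degree used to normalize payoffs has to be taken per player.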

Discussion
In sum, we have proposed a coevolutionary model to investigate the impact of reputation-dependent teaching ability on the evolution of cooperation in spatial evolutionary games. This model emphasizes the relevance of strategy adoption and teaching ability when human behaviors are modeled. In this form of strategy adoption, players with high reputation spread their strategies easily, and vice versa. This coevolutionary process is arguably closer to real situations than the traditional case. Numerical simulations indicate that the amplitude of reputation change δ and the range of reputation α have a significant impact on the persistence of cooperation. Cooperative clusters form more easily under this updating dynamics. Players with good reputation restrain selfish agents from adopting the defection strategy. In addition, the robustness of the enhancement effect is checked on the ER graph for the prisoner's dilemma game, and the promoting effects are confirmed in the snowdrift game as well. These results suggest that this new mechanism has a certain degree of universality, because it works effectively for the prisoner's dilemma game on two kinds of networks (the regular square lattice and the ER random graph) and for the snowdrift game on the regular lattice. This work may be conducive to understanding cooperative behaviors in complex economic systems as well as in human society.

Methods
In this work, the evolutionary PDG and SDG are employed to explore the role of teaching ability influenced by reputation; every player occupies a vertex of the underlying network. To test the robustness of the impact of this newly introduced mechanism on the evolution of cooperation, different network topologies, including the regular lattice and the Erdős-Rényi (ER) random graph, are taken into consideration. For simplicity but without loss of generality, we consider the so-called weak PDG 31, which is characterized by the temptation to defect T = b, the reward for mutual cooperation R = 1, the punishment for mutual defection P = 0 and the sucker's payoff S = 0. The outcome of the game therefore depends only on the parameter b, where 1 < b < 2 quantifies the temptation to defect and represents the advantage of defectors over cooperators. For the SDG, the rescaled payoffs are T = 1 + r, R = 1, S = 1 − r and P = 0, where 0 < r < 1 represents the so-called cost-to-benefit ratio and the payoffs satisfy the ranking T > R > S > P. Initially, each individual i is designated as a cooperator (s_i = C) or a defector (s_i = D) with equal probability and is also given a reputation score R_i. To avoid any initial bias, we set R_i = 0 before the game. Reputation is important in human society. It reflects one's history or status and is accessible to all members of one's community. Individuals with different reputation scores have different influences on the players interacting with them. As a consequence, we use the reputation score to represent teaching ability. Moreover, it is assumed that reputation spreads among neighbors by gossip and is evaluated without cost under the simple protocol below.
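The two payoff schemes can be written down compactly. The following Python sketch encodes the weak PDG (T = b, R = 1, P = 0, S = 0) and the SDG (T = 1 + r, R = 1, S = 1 − r, P = 0) as lookup tables and accumulates a player's total payoff over its neighbors; the function names and the example neighborhood are illustrative assumptions.

```python
def pdg_payoff(own, other, b):
    """Weak prisoner's dilemma: T = b, R = 1, P = 0, S = 0."""
    table = {('C', 'C'): 1.0, ('C', 'D'): 0.0,
             ('D', 'C'): b, ('D', 'D'): 0.0}
    return table[(own, other)]


def sdg_payoff(own, other, r):
    """Snowdrift game: T = 1 + r, R = 1, S = 1 - r, P = 0."""
    table = {('C', 'C'): 1.0, ('C', 'D'): 1.0 - r,
             ('D', 'C'): 1.0 + r, ('D', 'D'): 0.0}
    return table[(own, other)]


def total_payoff(own, neighbor_strategies, payoff, parameter):
    """Sum of pairwise payoffs against every neighbor."""
    return sum(payoff(own, s, parameter) for s in neighbor_strategies)


# Hypothetical von Neumann neighborhood with two cooperators and two defectors.
neighbors = ['C', 'C', 'D', 'D']
print(total_payoff('C', neighbors, pdg_payoff, 1.15))  # 2.0
print(total_payoff('D', neighbors, pdg_payoff, 1.15))  # 2.3
```

In this mixed neighborhood the defector earns more than the cooperator for any b > 1, which is the advantage of defectors that the parameter b quantifies.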
The game is simulated with the following Monte Carlo (MC) procedure. First, player i obtains his total payoff P_i by playing the game with his nearest neighbors. Next, player i chooses a neighbor j at random as the reference target, who acquires payoff P_j in the same way. Last, all agents synchronously update their strategies: player i adopts the strategy of player j with the probability Prob_i given by formula (1), in which K represents the intensity of selection. Without loss of generality, we set K to 0.1 in this paper unless stated otherwise 53. Player i adopts j's strategy depending on the normalized payoff difference (P_i − P_j)/k and on R_j, where k is the degree of player i. Each player thus has, on average, one chance to adopt a neighbor's strategy during one full MC step. As mentioned above, we assume that players have local information about their nearest neighbors. As a consequence, each neighbor's reputation is known to the focal player and R_j can affect Prob_i directly. Furthermore, it is more realistic for reputation to change during the evolution.
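The procedure can be sketched in Python for the weak PDG on a periodic lattice with the von Neumann neighborhood. The adoption probability is not reproduced in this text, so the sketch substitutes a hypothetical Fermi-type function, 1 / (1 + exp{[(P_i − P_j)/k − R_j] / K}), chosen only because it reduces to the payoff-only rule when all reputations are zero and grows with R_j; it should not be read as the authors' formula. Updating reputation from the strategy played in the current round is likewise an assumption, as are all function names.

```python
import math
import random


def adoption_probability(p_i, p_j, r_j, k, K=0.1):
    """ASSUMED Fermi-type form, not the formula of the original paper."""
    x = ((p_i - p_j) / k - r_j) / K
    x = max(-700.0, min(700.0, x))  # keep math.exp within float range
    return 1.0 / (1.0 + math.exp(x))


def von_neumann(site, L):
    """Four nearest neighbors on an L x L lattice with periodic boundaries."""
    x, y = site
    return [((x + 1) % L, y), ((x - 1) % L, y),
            (x, (y + 1) % L), (x, (y - 1) % L)]


def mc_step(strategy, reputation, L, b, delta, alpha, rng):
    """One synchronous step; returns new strategy and reputation dicts."""
    # 1. Every player collects its weak-PDG payoff from its four neighbors.
    payoff = {}
    for site, own in strategy.items():
        gain = 1.0 if own == 'C' else b  # payoff against a cooperator
        payoff[site] = sum(gain for n in von_neumann(site, L)
                           if strategy[n] == 'C')
    # 2. Every player picks a random neighbor j and may copy its strategy.
    new_strategy = {}
    for site, own in strategy.items():
        j = rng.choice(von_neumann(site, L))
        p = adoption_probability(payoff[site], payoff[j], reputation[j], k=4)
        new_strategy[site] = strategy[j] if rng.random() < p else own
    # 3. Reputation moves by +/- delta according to the strategy just played
    #    (assumed ordering) and saturates in [-alpha, alpha].
    new_reputation = {}
    for site, own in strategy.items():
        step = delta if own == 'C' else -delta
        new_reputation[site] = max(-alpha, min(alpha, reputation[site] + step))
    return new_strategy, new_reputation


# Small illustrative run: 10 x 10 lattice, random initial strategies, R_i = 0.
rng = random.Random(0)
L = 10
strategy = {(x, y): rng.choice('CD') for x in range(L) for y in range(L)}
reputation = {site: 0.0 for site in strategy}
for _ in range(20):
    strategy, reputation = mc_step(strategy, reputation, L,
                                   b=1.15, delta=1.5, alpha=3.0, rng=rng)
rho_c = sum(s == 'C' for s in strategy.values()) / (L * L)
```

The lattice here is far smaller and the run far shorter than in the reported simulations, so `rho_c` from this sketch is not comparable with the published curves.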
Here we assume that δ > 0 is the amplitude of reputation change. That is, R_i increases by δ when i cooperates and decreases by δ when i defects. Additionally, R_i ∈ [−α, α] (α > 0), which means that reputation saturates whether it is good or bad, and δ/α is the fluctuation ratio of reputation. According to formula (1), player i prefers to adopt player j's strategy if R_j > 0. Conversely, player i tends to reject player j's strategy if j has a bad reputation (R_j < 0). This simple design captures the influence of teaching ability on strategy adoption. A full Monte Carlo step (MCS) is completed once the above procedures have been carried out. For the regular lattice, the MC results presented in the Results section are obtained on populations comprising 100 × 100 up to 200 × 200 agents. The neighborhood is either the von Neumann neighborhood or the Moore neighborhood, i.e., the degree k is equal to 4 or 8 for each agent, respectively. For the ER random graph, the size and the average degree are N = 10^4 and k = 4, respectively. Additionally, the fraction of cooperators ρ_c is obtained by averaging over the last 1000 full MCS of the total 61000, and the final results are averaged over 10-20 independent runs to ensure accuracy.
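The averaging protocol (mean over the last 1000 of 61000 MCS, then mean over independent runs) can be expressed as a short Python sketch. The function names are hypothetical and the time series below are synthetic placeholders used only to exercise the code, not simulation output.

```python
def stationary_fraction(rho_series, tail=1000):
    """Mean of the last `tail` entries of one run's cooperator-fraction series."""
    if len(rho_series) < tail:
        raise ValueError("series is shorter than the averaging window")
    window = rho_series[-tail:]
    return sum(window) / len(window)


def average_over_runs(runs, tail=1000):
    """Mean of the stationary fractions of several independent runs."""
    values = [stationary_fraction(series, tail) for series in runs]
    return sum(values) / len(values)


# Synthetic placeholder series of length 61000 (not simulation results):
# a transient at 0.5 followed by a plateau over the last 1000 steps.
run_a = [0.5] * 60000 + [0.75] * 1000
run_b = [0.5] * 60000 + [0.25] * 1000
print(stationary_fraction(run_a))         # 0.75
print(average_over_runs([run_a, run_b]))  # 0.5
```

Discarding the first 60000 steps keeps the transient out of the reported value, so the average reflects only the stationary regime.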