Stability of cooperation under image scoring in group interactions

Image scoring sustains cooperation in the repeated two-player prisoner's dilemma through indirect reciprocity, even though defection is the uniquely dominant selfish behaviour in the one-shot game. Many real-world dilemmas, however, take place in groups and lack the transparency needed to inform subjects reliably of others' individual past actions. Instead, information is revealed about groups, which allows for 'group scoring' but not for image scoring. Here, we study how sensitive the positive results related to image scoring are to information based on group scoring. We combine analytic results and computer simulations to specify the conditions for the emergence of cooperation. We show that under pure group scoring, that is, in the complete absence of image-scoring information, cooperation is unsustainable. Away from this extreme case, however, the necessary degree of image scoring relative to group scoring depends on the population size and is generally very small. We thus conclude that the positive results based on image scoring apply to a much broader range of real-world informational settings than previously assumed.

Scientific Reports | 5:12145 | DOI: 10.1038/srep12145

[...] selfish behaviour in the one-shot game 14. As a result, if reputation matters sufficiently in determining with whom individuals interact, then cooperation can survive despite limited foresight.
Unfortunately, due to the inherent simultaneity of interactions 23, it is impossible to condition one's own decision on the decisions of the others 14. One of the most important, and perhaps the simplest, reputation mechanisms known in the literature to overcome this problem is image scoring 24. Famously, image scoring can sustain cooperation in the repeated two-player prisoner's dilemma through various forms of indirect reciprocity [25][26][27]. Under image scoring, agents learn who cooperated and who defected in previous interactions, and consequently condition their own actions on this information. Essentially, image scoring enables cooperators to find each other, and this overcomes the negative Nash equilibrium prediction of universal defection from the one-shot game. Interestingly, image scoring has been shown to work in the laboratory [28][29][30][31][32], but in general, it is considered to provide relatively frail support for cooperative behaviour 33,34.
In fact, many real-world social dilemmas unfold in groups 35, and it is unlikely that individuals will have access to others' individual action histories 1,2. Information that should be readily available, however, concerns the performance of the groups as a whole. Such information enables 'group scoring' as an alternative to image scoring. In particular, the image of an individual is no longer determined by its own past action, but by the performance of the group of which the individual is a member. More precisely, each player's group score summarizes the aggregate cooperativeness of the groups of which he was a member in the past, without any additional information regarding what the player did individually. Two important and previously unaddressed questions emerge: (i) How do results related to image scoring generalize to group scoring? and (ii) How sensitive are these results to information, that is, when image scoring constitutes a proportion p ∈ [0,1] of the information made available and the residual information is based on group scoring? The common feature of previous research on image scoring is that, over time, cooperators achieve higher scores and defectors achieve lower scores, and interactions are matched based on these scores such that cooperators play with cooperators and defectors play with defectors. In our paper, we build on this common feature by assuming the existence of a mechanism that assorts and matches players by their scores. In addition, we extend the scope of image-scoring-based models by analysing the sensitivity of the results to imperfections in the scores, which ought to reflect individuals' true past cooperativeness rather than the overall performance of the groups of which they are members. Our analysis continuously spans the range between two extremes: image scoring and group scoring.
As we will show, image scoring, reflecting accurately individuals' past actions, works perfectly also in the generalized prisoner's dilemma game that is governed by group interaction. Conversely, group scoring fails, as it enables defectors to effectively hide behind the cooperative efforts of others in the group. But how many true images are needed for cooperation to evolve in group interaction? In other words, what is the necessary proportion p ∈ (0,1) of image scoring? It turns out that this depends sensitively on the underlying parameters of the interactions in ways that provide a formal basis for some of Ostrom's conditions for successful common-pool resource management 3 . Key determinants are the rate of return, the size of the population, and the group size. Remarkably, for large populations only a 'grain' of image scoring is generally sufficient for cooperation to become dominant.

Results
We shall now formalize our arguments, generalizing step by step the two-by-two prisoner's dilemma model due to 24. [...] of a fixed size s = n/k according to the ranking of players' scores (with random tie-breaking). After groups have formed, i's resulting payoff turns out to be $\phi_i = (1 - c_i) + (r/s)\sum_{j \in G_i} c_j$, where $c_i \in \{0, 1\}$ is i's contribution, $G_i$ is i's group, r is the game's fixed rate of return, and r/s is the game's 'marginal per-capita rate of return', which summarizes the underlying game's synergy. We assume, as is standard, r ∈ (1, s), that is, contributing a unit of budget is socially beneficial (yielding a sum total of payoffs to all players larger than one), but individually costly (r/s < 1).
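The score-based group formation and the payoff computation can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes binary contributions and a payoff of the form (1 − c_i) + (r/s) × (contributions in i's group), which matches the payoffs used in the Methods, and the function names `form_groups` and `payoffs` are hypothetical.

```python
import random

def form_groups(scores, s, rng=random):
    """Rank players by score (highest first, ties broken at random)
    and cut the ranking into consecutive groups of size s."""
    n = len(scores)
    if n % s != 0:
        raise ValueError("population size must be a multiple of the group size")
    # The random second key implements random tie-breaking among equal scores.
    order = sorted(range(n), key=lambda i: (-scores[i], rng.random()))
    return [order[g:g + s] for g in range(0, n, s)]

def payoffs(contrib, groups, r):
    """Assumed payoff: phi_i = (1 - c_i) + (r/s) * sum of contributions in i's group."""
    phi = [0.0] * len(contrib)
    for group in groups:
        share = r / len(group) * sum(contrib[j] for j in group)
        for i in group:
            phi[i] = (1 - contrib[i]) + share
    return phi
```

With scores that perfectly reveal past actions, cooperators and defectors end up in separate groups, so each cooperator earns r while each defector only keeps the uncontributed unit.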
We consider the following range of scoring mechanisms between image scoring and group scoring.
Image scoring. First, we formulate the equivalent of image scoring 24 in our setup: at time t each player i has an image score, $s_{Ii}^{t}$, known to every player, which is based on decisions prior to t.
Group scoring. Analogously, we formulate group scoring: at time t each player i has a group score, $s_{Gi}$ [...]

Hybrid scoring. A hybrid between image scoring and group scoring in our setup means that, at time t, each player i has a hybrid score, $s_{Hi}^{t}$, known to every player, which is based on decisions prior to t. In period t + 1, i's score is updated according to image scoring with probability p, and according to group scoring with probability 1 − p.
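A minimal sketch of the hybrid update may help fix ideas. The exact score definitions are not recoverable from the text here, so the following are labelled assumptions: scores are binary, the image score equals the player's own last contribution, and the group score is 1 when at least half of the group contributed. The function name `update_scores` is hypothetical.

```python
import random

def update_scores(contrib, groups, p, rng=random):
    """Return new 0/1 scores: image scoring with probability p,
    group scoring with probability 1 - p (per player)."""
    new = [0] * len(contrib)
    for group in groups:
        # ASSUMPTION: the group score is 1 if at least half of the group contributed.
        group_score = 1 if 2 * sum(contrib[j] for j in group) >= len(group) else 0
        for i in group:
            # ASSUMPTION: the image score is the player's own last contribution.
            new[i] = contrib[i] if rng.random() < p else group_score
    return new
```

Setting p = 1 recovers pure image scoring and p = 0 pure group scoring; in the latter case a defector in a mostly cooperative group inherits the group's high score.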
To summarize the three scoring methods, the types of information necessary under the different regimes are as follows. For image scoring, ex post individual-level information about contribution decisions is necessary; group-level information is therefore trivially also available. For group scoring, group associations and ex post group-level contributions need to be known; individual-level information is not necessary. For the hybrid case, characterized by a degree of image scoring p ∈ [0, 1], the probability that individual-level information rather than only group-level information becomes available must be larger than zero.
The stability of cooperation under the different scoring rules can now be evaluated. One result is that the tragedy of the commons (resulting from universal defection) is a potential risk in all cases: universal defection is stable under every scoring rule, because unilateral defection is a best response to a state of universal defection. However, the stability of this worst-case outcome relative to a highly cooperative state depends critically on the scoring rule, so the choice of rule can mitigate this risk.
Under image scoring, high levels of cooperation can be stabilized and then turn out to be the more stable outcome. This is the case if the proportion of cooperators with score one (matched in good groups) grows at exactly the speed needed to offset the shrinking of the proportion of cooperators with score zero (matched in bad groups). The defectors profit from the contributions of the latter group and achieve an average growth equal to that of the average cooperator.
Simulation results presented in Fig. 1 summarize this stability. The Methods section contains an analytical proof of these results, as well as a description of the employed Monte Carlo simulation procedure. It can be observed that the state of complete cooperation is reached exponentially fast. We emphasize that this result is recovered independently of the values of s, r and n, and it is also robust against variations of the strategy adoption rule. Cooperation will always prevail under image scoring, as it allows cooperators to separate from defectors. In general, cooperators form homogeneous groups that provide them with a competitive payoff. Conversely, defectors must be content to form groups with their like, which provides them with a null payoff. Cooperators can therefore easily invade defectors, and they do so with a speed that is proportional to their number, which ultimately gives rise to the exponentially fast downfall of defectors.
At first sight, such a state of cooperation may also seem a candidate for stability under group scoring. Inspection of the individual growth dynamics, however, reveals one crucial difference. Namely, cooperative states are not robust against the influx of defectors with score one. These players outperform all others, which, jointly with the fact that score-zero defectors outperform score-zero cooperators, implies an above-average growth rate for defection vis-a-vis cooperation. In other words, the key difference between image scoring and group scoring is that defectors can only free-ride on the contributions of others under image scoring, while, under group scoring, defectors can free-ride on both the contributions and the scores of others.

Results presented in Fig. 2 are obtained with the same simulation procedure as in Fig. 1, except that group scoring is used instead of image scoring. It can be observed that, irrespective of their initial fraction, cooperators eventually die out. As with image scoring, this outcome too is robust against variations of s, r, n and the strategy adoption rule. Group scoring allows defectors to have the same high score as cooperators, which in turn prevents the separation of the two strategies into homogeneous groups. In agreement with the outcome of the public goods game in a well-mixed population, even a single defector can therefore eventually invade the entire population. Group scoring thus completely fails to mitigate the tragedy of the commons.
Since image scoring and group scoring could not be more different in their ability to stabilize cooperation, it remains of interest to determine the merit of hybrid scoring. While it seems reasonable to assume that sometimes the information about the past of each particular individual is readily available, more often than not the scoring of an individual is possible only indirectly through the achievements of the groups of which s/he was a member. We note that individual contributions in group efforts are notoriously difficult to pinpoint, which is also why the reciprocation to such efforts is quite a vague concept: if a group contains a cooperator and a defector, who do you reciprocate with 36? The question thus is, just how much individual-level information is needed to stabilize cooperation? To answer this question, we introduce the probability p that a player's score is determined by image scoring, while otherwise, with probability 1 − p, group scoring is used. All other simulation details remain the same as in Figs 1 and 2.
Results presented in Fig. 3 show that cooperation can evolve even at a very small value of p, provided the population size is sufficiently large. The key to the stability of cooperation is that cooperators are able to recognize each other through their high scores, and thus to form homogeneous groups. The lower the value of p and the lower the value of r, the longer it takes for cooperators to segregate from defectors. Since cooperators are threatened by extinction, it is imperative that the segregation occurs before defectors take over. Accordingly, the lower the values of p and r, the larger the population size needs to be to give cooperators sufficient time to segregate before they die out. A lower bound for p is p ≥ 2/n. Results presented in Fig. 4 make these arguments quantitatively more accurate. Evidently, the smaller the population size, the higher the values of p and r need to be for cooperation to prevail. In small populations, there exist critical threshold values for both r (main panel) and p (inset), where drops to defector dominance are abrupt and occur without precursors.
We emphasize that the results concerning hybrid scoring mechanisms are independent of the group size as long as the number of groups, and hence the population size, is sufficiently large, and they are also independent of the strategy adoption rule. This corroborates our main argument, which is that, regardless of the scoring that is used, conditions need to be given for cooperators to completely segregate from defectors, i.e., to form homogeneous groups without a single defector. The identification of defectors has second-order importance. The key goal of scoring is thus to allow cooperators to recognize each other efficiently and to form homogeneous groups accordingly.

Discussion
One explanation for the puzzle of cooperation is based on image scoring 24, a mechanism that is both stunningly successful and stunningly simple. However, in its original formulation and application 24, it came with the restriction to interactions that are pairwise and to informational environments that allow a complete tracking of individual-level information.
In the real world, these restrictive assumptions may not be germane. Instead, cooperation may involve several groups of individuals, and ex post information regarding individual-level cooperativeness may only percolate imperfectly through group-level information. Our focus in this paper has been to determine robust theoretical predictions regarding the emergence and survival of cooperation in such situations. The presented results have two important implications. One is negative. Namely, when individual-level information is not available, cooperation cannot spread. But there is a silver lining. When there is at least a 'grain' of individual-level information, this may suffice for cooperators to find each other and form groups that are impervious to an invasion by defectors. We have shown that the spread of cooperation is robust to extensive imperfections in image scoring, thus extending the domain of environments where we should expect flourishing cooperation levels based on this established mechanism that fosters indirect reciprocity. Factors that affect the effectiveness of hybrid scoring negatively and positively are group size and population size, respectively. The rate of return matters as well: the lower it is, the larger the population needs to be for cooperation to prevail.
There is one important component that the model we considered here externalises, namely the issue of how a hierarchy of scores translates into an analogous group formation hierarchy. This is a question future work should address. In particular, we need to address how mechanisms known to play such a role in the context of image scoring would translate into the informational setting considered here, and how such mechanisms may be designed. Little is known in this direction. Economic experiments 37,38 might be particularly conducive to such research and guide future theoretical work to address these fundamental dilemmas of human cooperation.

Methods
Simulation procedure. The employed Monte Carlo simulation procedure 39 requires the iteration of the following three elementary steps. First, two randomly selected players i and j play one instance of the public goods game in their current group, thereby obtaining payoffs $\phi_i$ and $\phi_j$, respectively. Next, player j adopts the strategy of player i with the probability given by the Fermi function [...]

[...] where $\pi_{as}$ is the expected payoff to action-score pair as and $\bar{\pi}$ is the average population payoff. [...] Hence, $\partial p_C/\partial t < 0$ and therefore any such process is stable at $p_C = 0$.
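For reference, the strategy-adoption step can be sketched with the standard form of the Fermi function, W = 1 / (1 + exp((φ_j − φ_i)/K)). The noise parameter K and its default value below are assumptions for illustration only; the value used in the simulations is not stated in the text here, and the function name is hypothetical.

```python
import math

def fermi_adoption_probability(phi_i, phi_j, K=0.5):
    """Probability that player j adopts player i's strategy.

    Standard Fermi function: 1 / (1 + exp((phi_j - phi_i) / K)).
    K is an ASSUMED noise parameter (uncertainty in strategy adoption).
    """
    return 1.0 / (1.0 + math.exp((phi_j - phi_i) / K))
```

The probability equals 1/2 for equal payoffs and approaches 1 when player i earns much more than player j, so better-performing strategies are more likely to be imitated, but not deterministically so.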
Stability of high cooperation levels under image scoring. Suppose the four different strategies at time t have mass $p_{C1}$, $p_{D1}$, $p_{C0}$, $p_{D0}$, respectively, such that $p_{D1} = 0$. We shall now show that there exists a starting state with $p_C > (s - 1)/s$ such that $\partial p_C/\partial t = 0$. Suppose that $p_{C1} = (n - s)/n$. Then $\pi_{C1} = r$, $\pi_{C0} = (s - n p_{D0})(r/s)$ and $\pi_{D0} = 1 + (s - n p_{D0})(r/s)$.

Figure 4. Hybrid scoring requires a critical population size to stabilize cooperation under adverse conditions. Cooperators essentially compete to segregate from defectors before being completely wiped out. The lower the value of r (main panel) and p (inset), the larger the population size needed for cooperation to be stabilized. Note that for a sufficiently small population size (see legend), there exist sudden drops to zero cooperation levels at critical values of r and p. Shown is the stationary fraction of cooperators in dependence on r (main) and p (inset), as obtained for different numbers of groups forming the population (see legend). Parameter values are: p = 0.01 (main) and r = 1.1 (inset).
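The payoffs in the image-scoring stability argument can be checked numerically for concrete parameter values. The following is only a sketch of the stated formulas, assuming that all n − s score-one cooperators sit in fully cooperative groups and that the single remaining group mixes score-zero cooperators and defectors; the function name `mixed_group_payoffs` is hypothetical.

```python
def mixed_group_payoffs(n, s, r, p_D0):
    """Payoffs when p_C1 = (n - s)/n and the last group mixes C0 and D0 players."""
    contributors = s - n * p_D0       # cooperators in the one mixed group
    pi_C1 = r                         # fully cooperative group: (r/s) * s
    pi_C0 = contributors * (r / s)    # contributes, shares the mixed group's return
    pi_D0 = 1 + contributors * (r / s)  # keeps the unit, shares the same return
    return pi_C1, pi_C0, pi_D0
```

For instance, with n = 20, s = 4, r = 2 and one defector (p_D0 = 0.05), the formulas give payoffs of 2, 1.5 and 2.5. Within the mixed group, a defector always earns exactly one unit more than a cooperator, which is why score-zero cooperators shrink while score-one cooperators, earning r, can grow.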