Quantitative assessment can stabilize indirect reciprocity under imperfect information

The field of indirect reciprocity investigates how social norms can foster cooperation when individuals continuously monitor and assess each other's social interactions. By adhering to certain social norms, cooperating individuals can improve their reputation and, in turn, receive benefits from others. Eight social norms, known as the "leading eight," have been shown to effectively promote the evolution of cooperation as long as information is public and reliable. These norms categorize group members as either "good" or "bad". In this study, we examine a scenario where individuals instead assign nuanced reputation scores to each other, and only cooperate with those whose reputation exceeds a certain threshold. We find both analytically and through simulations that such quantitative assessments are error-correcting, thus facilitating cooperation in situations where information is private and unreliable. Moreover, our results identify four specific norms that are robust to such conditions, and may be relevant for helping to sustain cooperation in natural populations.

Additionally, we have that (iv) $r^{t+1}_{ij} \ge r^{t}_{ij}$ for all $i, j \ge 2$, and that (v) $r^{t+1}_{ii} \ge r^{t}_{ii}$ for all $i$.

(i) All players keep assigning themselves a good reputation. Each leading-eight action rule prescribes an action that lets a good donor maintain her good reputation in her own eyes, independent of which reputation she assigns to the recipient. Thus, all players keep considering themselves as good after one interaction; either they do not need to make a decision (because they were not chosen to act as the donor), or they choose an action they themselves evaluate as good.

(ii)-(iii) $r_{ij} = r_{i'j}$ for $i, i' \ge 2$, $j \ge 1$, and $r_{ij} \in \{0, 1\}$ for all $i, j \ge 2$. Since $M \in \mathcal{M}$, all players $i, j \ge 2$ initially agree on the reputations of all population members. Because they all apply the same assessment rule and observation errors are excluded, they also agree on how the donor's action in the subsequent interaction needs to be assessed. This shows $r_{il} = r_{jl}$ for all $i, j \ge 2$ and all $l$. Moreover, since all players $i, j \ge 2$ consider each other as good initially, and since their common action rule only lets them choose actions that let them keep their good reputation, we conclude $r_{ij} \in \{0, 1\}$ for $i, j \ge 2$.

(iv)-(v) $r^{t+1}_{ij} \ge r^{t}_{ij}$ for all $i, j \ge 2$ and $r^{t+1}_{ii} \ge r^{t}_{ii}$ for all $i$. Since $M \in \mathcal{M}$, all players $i \ge 2$ initially agree on the good reputations of all population members. By (ii) and (iii), all players $i \ge 2$ also keep their good image of each other, and only potentially change their opinion about player 1.

Since they thus never act against their assessment of each other for any reason, their reputation scores in each other's eyes can only increase. Hence $r^{t+1}_{ij} \ge r^{t}_{ij}$ for all $i, j \ge 2$. The same reasoning applies to all self-images, including that of player 1: due to the lack of observation errors of any kind, players never act against their own assessment rule, and can only improve their self-image over time. This shows $r^{t+1}_{ii} \ge r^{t}_{ii}$ for all $i$.

Proposition 1 guarantees that when we consider a process with private information, perfect observation, and no noise, (i) all players assign themselves a good reputation overall, (ii) all players $2 \le i, j \le N$ assign each other a good reputation overall, (iii) all players $2 \le i, j \le N$ assign the same reputation to player 1, and (iv) the exact reputation scores that players $2 \le i, j \le N$ assign to each other at time $t$ cannot be smaller than in the initial configuration. Furthermore, properties (iv) and (v) imply that the exact reputation scores that players $2 \le i, j \le N$ assign to each other are nondecreasing over all time steps $t$, which means that the overall reputations among these players cannot turn "bad".
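For reference, the properties established in Proposition 1 can be collected in the notation of this section (our restatement; $r^t_{ij}$ denotes the score that player $i$ assigns to player $j$ after $t$ interactions):

\[
r^t_{ij} = r^t_{i'j} \;\; (i, i' \ge 2,\ j \ge 1), \qquad r^t_{ij} \in \{0, 1\} \;\; (i, j \ge 2),
\]
\[
r^{t+1}_{ij} \ge r^t_{ij} \;\; (i, j \ge 2), \qquad r^{t+1}_{ii} \ge r^t_{ii} \;\; (\text{all } i),
\]

together with the statement that every player's self-image remains good.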

Proposition 1 also lets us reduce the state space when we consider a model of private information.

Instead of tracking entire image matrices $M$, we can focus on 3-tuples $(s, k, l)$, with $s \in \{-1, 0, 1\}$, $k \in \{0, \dots, N-1\}$, $l \in \{0, \dots, N-1\}$, and $0 \le k + l \le N - 1$. We identify $s$ as player 1's reputation score from the perspective of all other players (due to Proposition 1(iii), all other players agree on player 1's reputation). The value of $k$ denotes the number of players that player 1 considers to have score $r_{1i} = 0$, whereas $l$ denotes the number of players that player 1 considers to have score $r_{1j} = 1$. The sum of $k$ and $l$ thus corresponds to the number of players that player 1 considers to have a good reputation overall.

We can use this reduction due to Proposition 1(iii), which says that all other players can be considered equivalent.

In this reduced state space, the Markov chain has $3(N+1)N/2$ states in total. With the initial states as defined, we can identify the fully recovered state with a group of configurations $A$. We can now write down the transition probabilities $f_i(s, k, l; s', k', l')$ for $L_i$ on the reduced state space. They denote the probabilities of the population moving from state $(s, k, l)$ to $(s', k', l')$ in one round.
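As a quick consistency check on this count, the following Python sketch (ours, not part of the original analysis) enumerates the reduced state space:

```python
# Enumerate all tuples (s, k, l) with s in {-1, 0, 1}, k, l >= 0 and
# k + l <= N - 1, and compare the count with 3(N+1)N/2.

def reduced_states(N):
    """Enumerate the reduced states (s, k, l) for a population of size N."""
    return [
        (s, k, l)
        for s in (-1, 0, 1)
        for k in range(N)
        for l in range(N - k)  # ensures k + l <= N - 1
    ]

if __name__ == "__main__":
    for N in (3, 5, 10, 50):
        states = reduced_states(N)
        assert len(states) == 3 * (N + 1) * N // 2
        print(N, len(states))
```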

There are at most 12 different transitions the population can take in the course of a run of the reputation dynamics. Given the nature of the quantitative assessment dynamics we have introduced, many transition probabilities are independent of the exact value of $s$; instead, they depend on whether $s \ge S$, with $S$ the threshold for the overall assessment, i.e., whether player 1 has a "good" or "bad" image in the eyes of the other players. This means that in these cases $f_i(1, k, l; 1, k', l') = f_i(0, k, l; 0, k', l')$ for all $i, k, l$, and we write them as $f_i(G, k, l; G, k', l')$. In analogy, we write $f_i(B, k, l; B, k', l')$ for $f_i(-1, k, l; -1, k', l')$.

We can calculate the transition probabilities as follows (a code sketch of these transitions is given after the list below):

Transition $(G, k, l) \to (G, k+1, l)$. This case can only occur if a player $i > 1$ is chosen to be the donor who is perceived as bad by player 1. Given that the current state is $(G, k, l)$, it follows from Proposition 1 that the donor considers everyone as good, and hence they cooperate. If player 1 considers the receiver to be good, this leads them to assign a good reputation to the donor, independent of the applied leading-eight strategy $L_i$. Otherwise, if player 1 considers the receiver to be bad, the donor only obtains a good reputation for $L_1$, $L_2$, $L_3$, and $L_5$.

Transition $(G, k, l) \to (G, k-1, l)$. This case can only occur if a player $i > 1$ is randomly chosen to act as the donor who is perceived to have reputation score $r_{1i} = 0$ by player 1. Similar to before, player $i$ will always cooperate, which is only considered as bad by player 1 if the receiver is considered as bad by player 1 and if the applied strategy is either $L_2$, $L_5$, $L_6$, or $L_8$.

Transition $(G, k, l) \to (B, k, l)$. This corresponds to the probability $f_i(0, k, l; -1, k, l)$. The transition can only occur if player 1 is chosen to be the donor, and if player 1 defects against the receiver (which in turn requires player 1 to consider the receiver as bad).

Transition $(B, k, l) \to (B, k+1, l)$. This case requires that a player $i > 1$ is chosen to be the donor who is considered as bad by player 1. This donor cooperates, unless the randomly chosen receiver happens to be player 1 (who is bad from the perspective of all other players). Thus, player 1 considers the donor as good after this round unless the receiver is player 1, or the receiver is a group member that is considered as bad by player 1 and the applied leading-eight strategy is $L_4$, $L_6$, $L_7$, or $L_8$.

Transition $(B, k, l) \to (B, k-1, l)$. This case requires that a player $i > 1$ is chosen to be the donor whom player 1 considers to have reputation score $r_{1i} = 0$. To become bad in player 1's eyes, this donor then either needs to defect against player 1, or he needs to cooperate with a receiver who is considered as bad by player 1 (provided that the applied leading-eight strategy is $L_2$, $L_5$, $L_6$, or $L_8$).

Transition $(B, k, l) \to (G, k, l)$. This corresponds to the probability $f_i(-1, k, l; 0, k, l)$. It requires player 1 to be the donor, and that player 1 cooperates with her co-player.

Transition $(G, k, l) \to (G, k-1, l+1)$. This case requires that a player $i > 1$ is chosen to be the donor who is considered to have reputation score $r_{1i} = 0$ by player 1. This player cooperates with probability 1, since they consider everyone to be good. Player 1 will increment the reputation score of the donor unless the social norm applied is $L_4$, $L_6$, $L_7$, or $L_8$ and the receiver is considered to be bad by player 1.

Transition $(G, k, l) \to (G, k+1, l-1)$. This case requires that a player $i > 1$ is chosen to be the donor who is considered to have reputation score $r_{1i} = 1$ by player 1. This player cooperates with probability 1, since they consider everyone to be good. Player 1 will decrement the reputation score of the donor only if the social norm applied is $L_2$, $L_5$, $L_6$, or $L_8$ and the receiver is considered to be bad by player 1.

Transition $(B, k, l) \to (B, k-1, l+1)$. This case requires that a player $i > 1$ is chosen to be the donor who is considered to have reputation score $r_{1i} = 0$ by player 1. This player then has to cooperate, which means that player 1 cannot be the receiver. Player 1 will increment the reputation score of the donor unless the social norm applied is $L_4$, $L_6$, $L_7$, or $L_8$ and the receiver is considered to be bad by player 1.

Transition $(B, k, l) \to (B, k+1, l-1)$. This case requires that a player $i > 1$ is chosen to be the donor who is considered to have reputation score $r_{1i} = 1$ by player 1. The receiver then has to be either player 1, against whom the donor will defect, or someone to whom player 1 assigns a bad reputation, in case the applied norm is $L_2$, $L_5$, $L_6$, or $L_8$.

Transition $(0, k, l) \to (1, k, l)$. This case requires player 1 to be the donor, and that player 1 cooperates with their co-player. The probability is equal to $f_i(-1, k, l; 0, k, l)$.

Transition $(1, k, l) \to (0, k, l)$. This case requires player 1 to be the donor, and that player 1 defects against their co-player. The probability is equal to $f_i(0, k, l; -1, k, l)$ in (9).

All other transitions from $(s, k, l)$ to $(s', k', l')$ have transition probability $f_i(s, k, l; s', k', l') = 0$.
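The explicit formulas are not reproduced in the text above, so the following Python sketch (ours) reconstructs plausible expressions from the verbal descriptions alone. Two assumptions are not fixed by the descriptions: the donor is drawn uniformly from all $N$ players, and the receiver uniformly from the remaining $N-1$. The norm index sets are copied verbatim from the transition descriptions.

```python
# Hedged reconstruction (ours) of the transition probabilities described above.
GOOD_FOR_BAD_DONOR = {1, 2, 3, 5}      # bad donor + bad receiver + C assessed good
DECREMENT_ON_C_BAD = {2, 5, 6, 8}      # cooperation with a bad receiver loses a point
NO_INCREMENT_ON_C_BAD = {4, 6, 7, 8}   # cooperation with a bad receiver not rewarded


def transition_probs(i, N, s, k, l):
    """Nonzero one-round transitions out of the reduced state (s, k, l)
    under norm L_i; the remaining probability mass stays in (s, k, l).
    Here b = N - 1 - k - l players are bad in player 1's eyes, and player 1
    always considers herself good (Proposition 1)."""
    b = N - 1 - k - l
    out = {}
    if s >= 0:  # player 1 is labeled "G" by the other players
        # donor with score -1 in player 1's eyes is upgraded to 0:
        out[(s, k + 1, l)] = (b / N) * (k + l + 1 + (b - 1) * (i in GOOD_FOR_BAD_DONOR)) / (N - 1)
        # donor with score 0 drops to -1 after cooperating with a bad receiver:
        out[(s, k - 1, l)] = (k / N) * b * (i in DECREMENT_ON_C_BAD) / (N - 1)
        # donor with score 0 rises to 1:
        out[(s, k - 1, l + 1)] = (k / N) * (k + l + b * (i not in NO_INCREMENT_ON_C_BAD)) / (N - 1)
        # donor with score 1 drops to 0:
        out[(s, k + 1, l - 1)] = (l / N) * b * (i in DECREMENT_ON_C_BAD) / (N - 1)
        # player 1 defects against a receiver she considers bad:
        out[(s - 1, k, l)] = (1 / N) * b / (N - 1)
        if s == 0:  # player 1 cooperates, and her score rises from 0 to 1:
            out[(1, k, l)] = (1 / N) * (k + l) / (N - 1)
    else:       # player 1 is labeled "B"; the other players defect against her
        out[(s, k + 1, l)] = (b / N) * (k + l + (b - 1) * (i in GOOD_FOR_BAD_DONOR)) / (N - 1)
        out[(s, k - 1, l)] = (k / N) * (1 + b * (i in DECREMENT_ON_C_BAD)) / (N - 1)
        out[(s, k - 1, l + 1)] = (k / N) * (k + l - 1 + b * (i not in NO_INCREMENT_ON_C_BAD)) / (N - 1)
        out[(s, k + 1, l - 1)] = (l / N) * (1 + b * (i in DECREMENT_ON_C_BAD)) / (N - 1)
        # player 1 cooperates with a receiver she considers good:
        out[(0, k, l)] = (1 / N) * (k + l) / (N - 1)
    return {state: p for state, p in out.items() if p > 0}
```

Under this reconstruction one can verify, for example, that the outgoing probabilities of each state sum to at most 1 and that the recovery set $A$ is absorbing; where this sketch disagrees with the original equations (such as the one labeled (9)), the latter take precedence.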

We observe that for this reduced Markov chain, the set of recovery states $A$ is absorbing. As can be seen from the transitions above, the only other candidate for an absorbing state is $(B, 0, 0)$, where player 1 considers everyone else as bad, whereas the remaining players consider player 1 to be bad. Note that $(G, 0, 0)$ is never an absorbing state.

In the following, we will show that for all $L_i$, both the recovery probability and the expected time to recovery compare favorably with the binary assessment scenario. We use a coupling argument to show that these chains perform better than, or at least as well as, their binary counterparts.

We visualize the Markov chains with quantitative assessment by first noting that the transitions of type $(s, k, l) \to (s, k \pm 1, l \mp 1)$ do not change the overall reputations, i.e., the labels "good" and "bad" of any players. We can identify them with internal transitions inside the $3N$ "aggregated" states of type $(s, k + l = t)$ (Supplementary Figure 1a), where $t$ is the overall number of players that player 1 considers to be "good". In our illustrations of the chains, we will omit these internal transitions for ease of visualization (Supplementary Figure 1b). We note however that some of the remaining state transitions can depend on the internal state (Supplementary Figure 2a). Intuitively, our argument works by considering that all "bad" moves (i.e., moving downwards or to the right in the chain) in the quantitative case always have probabilities smaller than or equal to those of the corresponding "bad" moves in the binary case.
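One way to state the dominance step precisely is the following (our formalization; the symbols are not from the original text). Couple the quantitative chain $(X_t)$ with the corresponding binary chain $(\hat{X}_t)$ so that $X_t$ is never in a worse state than $\hat{X}_t$. If, under this coupling, $\hat{X}_t$ lying in the recovery set $A$ implies $X_t \in A$, then the hitting times $\tau$ and $\hat{\tau}$ of $A$ satisfy

\[
\hat{X}_t \in A \;\Rightarrow\; X_t \in A \ \text{ for all } t
\qquad \Longrightarrow \qquad
\tau \le \hat{\tau} \ \text{ pathwise, and } \ \mathbb{E}[\tau] \le \mathbb{E}[\hat{\tau}].
\]

In particular, any upper bound on the expected recovery time of the binary chain carries over to the quantitative chain.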

More specifically, consider an arbitrary trace $T$ in $M_1$. If $T$ never takes a transition that changes the value of $s$, we can associate with it the identical trace $T'$ in the corresponding binary chain that never leaves the level $s = G$, since the levels $s = 0$ and $s = 1$ of $M_1$ are indistinguishable in this case. Otherwise, there is a moment where $T$ has a transition into a state with $s' = s + 1$ or $s' = s - 1$. In these cases, depending on whether $T$ is in $s = 0$ or $s = 1$, the two traces either both take a step into the state where player 1 is considered to be bad, or $T$ remains in a state where player 1 is considered good, whereas he is considered bad in the state that $T'$ reaches (i.e., the middle layer $s = 0$ of $M_1$ can act as a buffer). In both cases, we can couple the traces such that $T$ is never below or to the right of $T'$. The latter holds due to how the "lateral" transition probabilities $f_1(G, k, l; -1, k + 1, l)$ and $f_1(G, t; G, t + 1)$ compare. Therefore, it follows that if $T'$ has reached the absorbing set $A$ in $n$ steps, $T$ has reached it in $n$ steps with at least the same probability. Thus, we get an upper bound for the number of steps required to reach the absorbing set of states $A$ in $M_1$, which is equivalent to the bound calculated for the chain in the binary assessment scenario. Since this bound was found to be $\tau_1 = N + 7$, and the lower bound is $\tau = N - 1$, we also find that the tight bound of $\tau_1 = \Theta(N)$ holds in the quantitative assessment case.

We can use similar coupling arguments as we look at the remaining cases. We proceed with the case of $L_2$ and $L_5$ (Supplementary Figure 3a), which differs from the previous chain in the positive probability of making a step to the right in the upper levels of the chain. For the chain $M_4$, the upper bound corresponds to the lower bound in the first case, with $\tau_4 \le N - 1$, such that we again get $\tau_4 = \Theta(N)$.

The same reasoning holds for $M_6$ (Supplementary Figure 5), which differs from $M_4$ again in the positive probability of making a step to the right in an upper level of the chain ($f_6(G, k, l; G, k-1, l) > 0$).

We get, by comparing with the bounds for the binary case, that $\rho_6 \ge 1 - \frac{1}{N}$ and $\tau_6 \le N \cdot H_N - N$, with $H_N = \sum_{n=1}^{N} \frac{1}{n}$ the $N$-th harmonic number.
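To get a sense of the gap between these bounds, a few lines of Python (ours) evaluate $N \cdot H_N - N$ against the linear lower bound $N - 1$:

```python
# Evaluate the upper bound tau_6 <= N * H_N - N from the text, where H_N is
# the N-th harmonic number, and compare it with the lower bound N - 1.

def harmonic(N):
    """Return H_N = sum_{n=1}^{N} 1/n."""
    return sum(1.0 / n for n in range(1, N + 1))

for N in (10, 50, 100, 1000):
    upper = N * harmonic(N) - N   # grows like N ln N
    lower = N - 1
    print(f"N={N:5d}  lower={lower:7d}  upper={upper:10.1f}")
```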

These are very rough upper bounds. In fact, when we take a look at the actual recovery times of the system, we find that in all eight cases, $\tau_i = O(N)$, i.e., the recovery time is approximately linear in $N$ for all leading-eight norms. We show the resulting plot in Figure S3a. When we do linear regression on these curves, $\tau_i \approx N$ for $i \in \{1, 3, 4, 7\}$ and $\tau_i \approx 1.3N$ for $i \in \{2, 5, 6, 8\}$. This is a substantial improvement over the recovery times for the case of binary reputations.
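The linear scaling can also be probed directly. The following self-contained sketch (ours) simulates the full reputation dynamics for $L_1$, starting from a single perturbed opinion, and records the time until all opinions are good again; the initial condition and the $\pm 1$ encoding of assessments are our modeling choices, not taken from the text.

```python
import random

# Agent-based sketch: N players hold quantitative scores in {-1, 0, 1} about
# each other, cooperate with co-players whose score is >= 0, and update the
# donor's score by +/-1 (capped) according to the norm L1 as reconstructed
# from the assessment statements in this section.

# L1 assessment: (donor_good, receiver_good, action) -> assessed as good?
L1 = {
    (True, True, "C"): True,   (True, True, "D"): False,
    (True, False, "C"): True,  (True, False, "D"): True,
    (False, True, "C"): True,  (False, True, "D"): False,
    (False, False, "C"): True, (False, False, "D"): False,
}

def recovery_time(N, rng, max_rounds=10**6):
    # r[i][j]: score player i assigns to player j; start from a single
    # perturbed opinion (player 0 thinks badly of player 1).
    r = [[1] * N for _ in range(N)]
    r[0][1] = -1
    for t in range(1, max_rounds + 1):
        donor = rng.randrange(N)
        receiver = rng.randrange(N - 1)
        receiver += receiver >= donor  # uniform among the other players
        action = "C" if r[donor][receiver] >= 0 else "D"
        for obs in range(N):  # perfect observation, private assessment
            good = L1[(r[obs][donor] >= 0, r[obs][receiver] >= 0, action)]
            r[obs][donor] = min(1, max(-1, r[obs][donor] + (1 if good else -1)))
        if all(r[i][j] >= 0 for i in range(N) for j in range(N)):
            return t
    return max_rounds

rng = random.Random(1)
for N in (10, 20, 40):
    times = [recovery_time(N, rng) for _ in range(200)]
    print(N, sum(times) / len(times))  # expected to grow roughly linearly in N
```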

In analogy to the work of Ohtsuki and Iwasa [2], we now explain the characteristics of those third-order strategies that are successful under public information as well as under private and noisy information. For this axiomatic approach, we assume that players use quantitative assessment, since binary assessment does not lead to the evolution of cooperation once information is not public.

In the following, we use notation similar to previous work, adapted to our model of quantitative assessment. We again distinguish between reputation scores $r_{ij} \in [-R, R]$ and the corresponding overall judgments (labels) "good" or "bad", which arise from comparing these scores with the threshold $S$.
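For concreteness, the overall label induced by a score can be written as (our notation for the comparison just described):

\[
\operatorname{label}(r_{ij}) =
\begin{cases}
\text{"good"}, & r_{ij} \ge S,\\
\text{"bad"}, & r_{ij} < S,
\end{cases}
\qquad r_{ij} \in [-R, R].
\]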

The assessment (i.e., adding to or subtracting from the score) of an action $X$ by a donor with label $A$ toward a recipient with label $B$ is defined in analogy to the binary case. We note that the four required properties (the conditions on successful norms carried over from the analysis under public information) are independent of whether players use binary or quantitative assessment. These conditions fix five elements (bits) of a successful norm's assessment rule.

In Ohtsuki and Iwasa's original work, three bits were then left unspecified, giving the leading eight. However, if we consider the setting where information is private and noisy, we need to specify one more bit with the following condition: a donor labeled as bad who defects against a recipient labeled as bad must not be assessed as good. With these requirements, four of the leading eight norms remain: $L_1$, $L_2$, $L_7$, $L_8$. They are exactly the four norms that we see being able to evolve under private and noisy information, as long as players use quantitative assessment. The norms $L_3$, $L_4$, $L_5$, $L_6$, in contrast, are more gullible, and let defectors regain some of their reputation by defecting against another of their kind.

We note however that among the successful norms, $L_8$ has the fewest opportunities for a player labeled as bad to improve his score and be labeled good (Fig. 1a). For example, an unconditional cooperator easily gets a bad label in the eyes of an $L_8$ player. This explains why we see the lowest abundance and cooperation rate in equilibrium for $L_8$ out of all four successful norms, and why the success of $L_8$ is also more sensitive to an increased number of reputation ranks (Fig. 5).

Supplementary Figure 1 | a, Aggregated states of the type $(s, k + l)$, with $s$ the assessment of player 1 in the eyes of the other players, and $k + l$ the number of players that player 1 assesses as good. They aggregate the states $(s, k', l')$ with $k' + l' = k + l$, with internal ("hidden") transitions that change the values of $k$ and $l$ while keeping their sum constant. b, For ease of visualization, we only show the aggregated states and omit the internal states when we illustrate the Markov chains in the following. Note however that the internal state can determine the transitions out of a state.

Caption fragment (upper-bound construction): For the upper bound, we consider a chain where states with $s = 0$ are erased and transition probabilities to the right that are proportional to $k$ are upper bounded by $k + l$. This is equivalent to the chain for the binary assessment case, $M_6$.
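The bit-counting in the characterization above can be made concrete in a few lines. In the following sketch (ours), the mapping from bit patterns to the labels $L_1, \dots, L_8$ is reconstructed from the norm lists quoted in this section: $d(B,B,C)$ is good for $\{L_1, L_2, L_3, L_5\}$, $d(G,B,C)$ is bad for $\{L_2, L_5, L_6, L_8\}$, and $d(B,B,D)$ is good for the gullible $\{L_3, L_4, L_5, L_6\}$.

```python
# Enumerate the three free assessment bits of the leading eight and apply
# the additional requirement for private, noisy information.

# Free bits per norm: (d(G,B,C), d(B,B,C), d(B,B,D)), True = "good".
LEADING_EIGHT = {
    1: (True,  True,  False),
    2: (False, True,  False),
    3: (True,  True,  True),
    4: (True,  False, True),
    5: (False, True,  True),
    6: (False, False, True),
    7: (True,  False, False),
    8: (False, False, False),
}

# Extra condition: a bad donor defecting against a bad recipient must not
# be assessed as good, i.e. d(B,B,D) = bad.
robust = sorted(i for i, (_, _, bbd) in LEADING_EIGHT.items() if not bbd)
print(robust)  # -> [1, 2, 7, 8], matching the four norms named above
```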