Social norms in indirect reciprocity with ternary reputations

Indirect reciprocity is a key mechanism that promotes cooperation in social dilemmas by means of reputation. Although it has been a common practice to represent reputations by binary values, either ‘good’ or ‘bad’, such a dichotomy is a crude approximation considering the complexity of reality. In this work, we studied norms with three different reputations, i.e., ‘good’, ‘neutral’, and ‘bad’. Through massive supercomputing for handling more than thirty billion possibilities, we fully identified which norms achieve cooperation and possess evolutionary stability against behavioural mutants. By systematically categorizing all these norms according to their behaviours, we found similarities and dissimilarities to their binary-reputation counterpart, the leading eight. We obtained four rules that should be satisfied by the successful norms, and the behaviour of the leading eight can be understood as a special case of these rules. A couple of norms that show counter-intuitive behaviours are also presented. We believe the findings are also useful for designing successful norms with more general reputation systems.

The ability to cooperate with others that are genetically unrelated is a remarkable trait of humans. Reputation is formed by evaluating each other after observing who does what against whom, which in turn incentivizes an action that is costly but beneficial to others because those with good reputations are likely to receive benefits subsequently in society. This is known as indirect reciprocity, one of the most fundamental mechanisms for maintaining cooperation 1 . Whether a given action is perceived as good depends on the action itself, the context, and the social norm used by the observer. A central question is thus what are the requirements of social norms to achieve social cooperation.
The leading eight is a set of highly successful social norms for maintaining cooperation at a high level 2 . By comprehensive enumeration of possible social norms, it has been found that the leading eight are the only ones that can sustain evolutionarily stable cooperation for a broad range of the benefit-to-cost ratios of cooperation. Because of its simplicity and effectiveness, the leading eight have served as a baseline in a wide range of theoretical studies of indirect reciprocity [3][4][5][6][7][8][9][10] .
Most of the previous studies on indirect reciprocity, including the leading eight, assume that the reputation of a player is represented by either 'good'(G) or 'bad'(B) [11][12][13][14] . Whereas the assumption of binary reputation has been widely adopted as a common practice for its simplicity and theoretical tractability, such dichotomy is not always realistic given experimental evidence and our daily experience. Furthermore, it is not always clear how much we can generalize the conclusions obtained from the binary-reputation models to more realistic and complex reputation models because the conclusions may be consequences of the oversimplification.
What would be the universal characteristics that every successful norm shares irrespective of the form of reputation? How should we revise the conclusions learned from the binary-reputation system when the binarity assumption is relaxed? Answering these questions has been a serious challenge because the strategy space expands super-exponentially with the number of possible reputation values: The number of third-order social norms with $k$ reputations is about $(2k^2)^{k^2}/k!$, as we will discuss in the next section. Although several studies go beyond the binary assumption, they examined only a small subset of the norms by assuming ordinal relationships between reputations [15][16][17][18][19][20][21][22] . In particular, a continuum formulation of indirect reciprocity allows a perturbative analysis, from which one can derive a condition for linear stability against erroneous disagreement, but it is applicable only to mutants that are sufficiently close to the resident norm 22 .
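To get a feeling for how quickly this estimate grows, the expression can be evaluated directly. The following is a minimal sketch; the function name `approx_norm_count` is a hypothetical helper, and the formula is only the approximate count quoted above, not an exact enumeration.

```python
from math import factorial

def approx_norm_count(k: int) -> float:
    """Approximate number of third-order norms with k reputations: (2k^2)^(k^2) / k!."""
    return (2 * k * k) ** (k * k) / factorial(k)

# Binary, ternary, and quaternary reputations.
for k in (2, 3, 4):
    print(k, f"{approx_norm_count(k):.3e}")
```

For k = 3 the estimate is of the order of 3.3 × 10^10, in line with the "more than thirty billion possibilities" mentioned in the abstract.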
In this study, to bridge the gap between the binary and more general models of reputation, we comprehensively study the norms with a ternary-reputation model under public reputation, in which players are labelled by three types of reputations. The ternary counterparts of the leading eight will be fully identified by comprehensive enumeration of the third-order social norms through state-of-the-art supercomputing. Such a large-scale enumerative approach has also proved useful in studies of direct reciprocity [23][24][25] , and this study is an application of the method to the study of indirect reciprocity. As we will see in the following, both similarity and dissimilarity

1. Let $h_B$, $h_N$, and $h_G$ denote the respective fractions of B, N, and G. Calculate their values in a stationary state, denoted by $h_B^*$, $h_N^*$, and $h_G^*$, respectively, under the assumption that the entire population uses S.
2. Calculate the cooperation level $p_c$, which means the probability that a donor cooperates towards a recipient when both are randomly picked from the resident population.
3. Reject the norm if $p_c < p_c^{\mathrm{th}}$, where $p_c^{\mathrm{th}}$ is a threshold for the cooperation level.
4. Otherwise, calculate the payoff of a mutant with a different action rule from the resident one under the assumption that mutants occupy a sufficiently small fraction.
5. Repeat the above step for all possible action rules. If the payoff of a resident is higher than that of any possible mutant, S is a CESS.
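Once the stationary quantities are known, steps 3 to 5 reduce to a simple decision. The following is a minimal sketch under the assumption that the cooperation level and the payoffs have already been computed elsewhere; the function name `is_cess` and all numerical values are hypothetical and serve only as illustration.

```python
def is_cess(p_c, resident_payoff, mutant_payoffs, p_th=0.99):
    """Apply steps 3-5: cooperation threshold, then strict payoff dominance."""
    if p_c < p_th:
        # Step 3: the norm is not cooperative enough.
        return False
    # Steps 4-5: the resident must strictly beat every mutant action rule.
    return all(resident_payoff > m for m in mutant_payoffs)

# Hypothetical values, for illustration only.
print(is_cess(0.998, 1.99, [1.20, 1.95, 0.0]))  # resident beats all mutants
print(is_cess(0.998, 1.99, [1.20, 2.10]))       # one mutant earns more
print(is_cess(0.90, 1.80, [0.0]))               # fails the cooperation threshold
```

In the ternary model an action rule assigns C or D to each of the nine donor-recipient reputation pairs, so `mutant_payoffs` would hold one entry per alternative action rule.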
Here, we define a CESS as a norm whose defection level $p_d \equiv 1 - p_c$ scales as $O(\mu_a) + O(\mu_e)$ as $\mu \to 0$, that is, the probability of prescribing defection is of the same order as the error rates. Some norms show slower convergence, such as [...] such norms that are fragile against noise. This requirement is consistent with the criteria for finding the leading eight in the binary-reputation case. In the following calculation, we use $\mu = 10^{-3}$, where $\mu \equiv \mu_e = \mu_a$, and $p_c^{\mathrm{th}} = 0.99$. We choose these values such that $1 - p_c^{\mathrm{th}}$ is sufficiently larger than $\mu$ but smaller than $\sqrt{\mu}$. For the CESS's found in the following, we numerically confirmed that $p_d = O(\mu)$ by calculating the cooperation levels for different values of $\mu$.
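As a quick check of this choice, the three quantities can be compared numerically. The comparison below is only a worked illustration and assumes prefactors of order one in the scaling relations:

```latex
\mu = 10^{-3}
\;<\; 1 - p_c^{\mathrm{th}} = 10^{-2}
\;<\; \sqrt{\mu} \approx 3.2 \times 10^{-2}
```

Under this assumption, a norm with $p_d \sim \mu$ passes the threshold as long as its prefactor stays below about ten, whereas a norm with $p_d \sim \sqrt{\mu}$ would have $p_c \approx 0.97$ and be rejected.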
As mentioned above, the norms generated by permuting G, N, and B are equivalent because we assume no ordinal relations among them. To remove this trivial multiplicity, we use the following protocol: First, the reputation with the largest fraction is labelled as G. In other words, we always have $h_G^* \ge h_N^*$ and $h_G^* \ge h_B^*$. To be a CESS, therefore, the action rule must achieve high $p_c$ by prescribing C when both players have reputation G. If this rule is violated by mistake, the donor must get a reputation other than G because, otherwise, the donor would have no incentive to cooperate. We define the reputation resulting from such defection as B, and the last remaining one as N. The labels assigned by these guidelines are overall consistent with our common sense of 'good', 'neutral', and 'bad', as we will see in the following.
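The labelling protocol can be written as a short routine. This is a minimal sketch, assuming that the three reputations are initially given arbitrary ids and that the assignment rule is stored as a dictionary keyed by (donor, recipient, action); the function name `canonical_labels` and the example values are hypothetical.

```python
def canonical_labels(h_star, assign):
    """Map raw reputation ids to 'G', 'B', 'N' following the protocol in the text.

    h_star : dict mapping raw id -> stationary fraction
    assign : dict mapping (donor id, recipient id, action) -> new raw id
    """
    # G is the reputation held by the largest fraction of players.
    good = max(h_star, key=h_star.get)
    # B is what a majority player receives after defecting against a majority player.
    bad = assign[(good, good, "D")]
    if bad == good:
        raise ValueError("defection among majority players is not penalised")
    # N is the one remaining reputation.
    (neutral,) = set(h_star) - {good, bad}
    return {good: "G", bad: "B", neutral: "N"}

# Hypothetical stationary fractions and the two relevant assignment entries.
h_star = {0: 0.001, 1: 0.997, 2: 0.002}
assign = {(1, 1, "C"): 1, (1, 1, "D"): 2}
print(canonical_labels(h_star, assign))
```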

Results
Leading eight in the binary-reputation model. Before showing our results for the ternary-reputation model, let us review the characteristics of the leading eight. We will check whether these are universally shared with successful norms in the ternary-reputation model. By the leading eight, we mean eight norms that qualify as CESS's in the binary-reputation model, and they are characterized by the following four properties 12 :
1. Maintenance of cooperation: $P(G, G) = C$ and $R(G, G, C) = G$.
2. Identification of defectors: $R(G, G, D) = B$.
3. Punishment and justification of punishment: $P(G, B) = D$ and $R(G, B, D) = G$.
4. Apology and forgiveness: $P(B, G) = C$ and $R(B, G, C) = G$.
With the leading eight, the community is mostly occupied by a single type of players (G) who form mutual cooperation, whereas the fraction of B-players is of $O(\mu)$. When someone defects from cooperation, the population assigns reputation B to the defector to distinguish him or her from cooperators. G-players punish such a B-player by refusing cooperation, and the punishment is justified in the sense that the defection does not hurt their G-reputation. A B-player can obtain G-reputation by donating to a G-player as an apology. The prescriptions of the leading eight are summarized in Table 1.
To describe a norm concisely, we hereafter use a notation composed of five characters separated by colons, such as GB:DG:N. The first two characters denote the reputations of a donor and a recipient, respectively. In this example, the donor's reputation is G, and the recipient's reputation is B. The third character denotes the prescribed action, and the fourth character means the reputation that the donor obtains by following the prescription. Finally, the last character denotes the donor's new reputation when choosing the opposite action. The above example is thus interpreted as follows: "When a G-donor meets a B-recipient, the donor should defect. He or she gets G if following the prescription, and N otherwise." In addition, square brackets [...] and a wildcard * are used to indicate a set of prescriptions. For instance, GB:D[NG]:B means a set of prescriptions, according to which a G-donor should defect against a B-recipient. By defecting, the donor earns either N or G. Otherwise, the donor's reputation becomes B. By this notation, the leading eight can be characterized by the following prescriptions:

GG:CG:B (1a)
GB:DG:* (1b)
BG:CG:B (1c)

In a stationary state, the fraction of B-players $h_B^*$ and the defection level $p_d$ scale as $O(\mu)$. This is because players get B-reputation only by implementation or assignment errors, whereas a B-player can almost always recover

Table 1. Prescriptions that are commonly shared by the leading eight. The asterisk (*) is a wildcard, meaning that it can be any of G and B. The left two columns show reputations, and the third column is the action A prescribed by the action rule. The fourth column indicates the reputation assigned to the donor who executed the action A, and the last column shows the reputation resulting from the other action ¬A. The dagger (†) means that the action is either C or D depending on the assignment rule, so it is C if and only if $R(B, B, C) = G$ and $R(B, B, D) = B$.
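The bracket and wildcard notation can be expanded mechanically into the individual prescriptions it stands for. The following is a minimal sketch; the function name `expand` and its `reps` parameter are hypothetical, and it assumes that the third position is always an action (C or D) while the other four positions are reputations.

```python
import re
from itertools import product

def expand(prescription: str, reps: str = "BNG"):
    """Expand a pattern such as 'GB:D[NG]:B' into the list of plain prescriptions."""
    tokens = re.findall(r"\[[A-Z]+\]|[A-Z*]", prescription.replace(":", ""))
    if len(tokens) != 5:
        raise ValueError("expected five positions, got %d" % len(tokens))
    options = []
    for pos, tok in enumerate(tokens):
        # Position 2 is the prescribed action; all other positions are reputations.
        alphabet = "CD" if pos == 2 else reps
        options.append(alphabet if tok == "*" else tok.strip("[]"))
    return ["{}{}:{}{}:{}".format(*combo) for combo in product(*options)]

print(expand("GB:D[NG]:B"))            # the example from the text
print(expand("GB:DG:*"))               # wildcard over the three ternary reputations
print(expand("GB:DG:*", reps="BG"))    # the same wildcard in the binary model
```

Passing `reps="BG"` restricts the wildcard to the two reputations of the binary model.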
where the first term on the right-hand side means the rate of change from B to G represented by Eq. (1c), and the last term represents the opposite caused by error. We note that all the wildcards in Table 1 are prescribed for events that happen with probability smaller than $O(\mu)$. For instance, a B-donor meets another B-recipient with probability $O(\mu^2)$. Such events are so rare that the prescriptions for them remain arbitrary within the leading eight. In other words, we can understand the working mechanism of the leading eight by investigating the events that occur with probability $O(\mu)$.
Here, it is worth pointing out that the leading eight are the only CESS's in the binary-reputation model. If any of the prescriptions in Eq. (1) is missing, the norm is no longer a CESS. For instance, without the justification of punishment [Eq. (1b)], the norm is essentially equivalent to Image Scoring, which cannot sustain stable cooperation because those who punish a B-player also lose good reputation, making $h_B^*$ greater than $O(\mu)$.
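The count of eight follows from the three wildcard entries. The sketch below enumerates them in the five-character notation. It assumes the standard form of the leading eight, namely the fixed prescriptions GG:CG:B, GB:DG:* and BG:CG:B together with free values of R(B, B, C) and R(B, B, D), and it applies the dagger rule from the caption of Table 1 to determine the action between two B-players. The function name `leading_eight` is hypothetical.

```python
from itertools import product

def leading_eight():
    """Enumerate the leading eight as tuples of four prescriptions."""
    norms = []
    # Free entries: R(G,B,C), R(B,B,C), R(B,B,D), each either G or B.
    for gb_c, bb_c, bb_d in product("GB", repeat=3):
        # Dagger rule: a B-donor cooperates with a B-recipient if and only if
        # R(B,B,C) = G and R(B,B,D) = B.
        if bb_c == "G" and bb_d == "B":
            bb = "BB:C{}:{}".format(bb_c, bb_d)
        else:
            bb = "BB:D{}:{}".format(bb_d, bb_c)
        norms.append(("GG:CG:B", "GB:DG:" + gb_c, "BG:CG:B", bb))
    return norms

for norm in leading_eight():
    print(" ".join(norm))
```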

CESS's in the ternary-reputation model.
We exhaustively enumerated all the norms to find CESS's in the ternary model. The number of CESS's is shown as a function of b/c in Fig. 1. As shown in this figure, the number tends to increase in a stepwise fashion with b/c, indicating the existence of norms that qualify as CESS's only for a certain range of b/c. Let us define the "core" set as the norms that are evolutionarily stable within a reasonable range of b/c, say, [1.1, 10]. In other words, the core set is the common subset of the discovered CESS's. Of course, even the norms in the core set may be evolutionarily unstable if b/c is extremely large or close to unity, but such edge cases were excluded from consideration. Our core set contained 2,067,861 CESS's in total, and we examined this set. Note that the size of the core set is smaller than the number of CESS's for the lowest b/c. Furthermore, the number of CESS's does not increase monotonically as b/c grows in Fig. 1. Such behaviour implies the existence of nontrivial social norms that are CESS's for a certain value of b/c but not when b/c takes a higher value. For example, if a player may lose good reputation by punishing an ill-reputed player, it could be better to overlook him or her than to inflict costly punishment as long as b/c is high enough. Figure 2a shows the distributions of $h_B^*$ and $h_G^*$ for the norms in the core set. The plot of $h_N^*$ has been omitted because $h_N^* = 1 - h_B^* - h_G^*$. The figure shows $h_B^* \approx 0$ for all the cases, and we numerically verified that $h_B^* \sim O(\mu)$ for $\mu \ll 1$. However, whereas most norms have $h_G^* \approx 1$ similarly to the leading eight, we found a small but nonnegligible number of norms for which $h_G^*$ is significantly smaller than unity, indicating the existence of CESS's having different working mechanisms from those of the leading eight.
To understand CESS's systematically, we first classified them according to how many G-players exist in the stationary state. For the leading eight, the majority of players have reputation G, i.e. $h_G^* \sim O(1)$ and $h_B^* \sim O(\mu)$, and mutual cooperation is formed by these G-players. For some of the ternary strategies, on the other hand, not only G but also N may occupy a significant fraction of the population as shown in Fig. 2a. Depending on the scaling behaviour of $h_N^*$ as $\mu \to 0$, we found that the CESS's in the core set are classified into the following three types:
• Type C1: $h_N^* \sim O(\mu)$,
• Type C2: $h_N^* \sim O(\sqrt{\mu})$,
• Type C3: $h_N^* \sim O(1)$.
Figure 2b shows examples of the scaling relations between $h_N^*$ and $\mu$ for three norms, one in each class. In this way, despite the considerable differences at the prescription level, the vast number of CESS's can be categorized into three well-defined classes unambiguously.
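One way to assign a norm to a class numerically is to estimate the exponent of $h_N^*$ as a function of $\mu$ from a log-log fit. The following is a minimal sketch under the assumption that $h_N^*$ has been computed for several error rates; the function names, the tolerance, and the synthetic data are hypothetical and do not reproduce the authors' procedure.

```python
from math import log, sqrt

def scaling_exponent(mu_values, h_values):
    """Least-squares slope of log(h) against log(mu)."""
    xs = [log(m) for m in mu_values]
    ys = [log(h) for h in h_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def classify(exponent, tol=0.25):
    """Map an exponent near 1, 1/2, or 0 to the class C1, C2, or C3."""
    if abs(exponent - 1.0) < tol:
        return "C1"
    if abs(exponent - 0.5) < tol:
        return "C2"
    if abs(exponent) < tol:
        return "C3"
    return "unclassified"

# Synthetic data, for illustration only.
mus = [1e-4, 1e-3, 1e-2]
print(classify(scaling_exponent(mus, [2.0 * m for m in mus])))
print(classify(scaling_exponent(mus, [0.9 * sqrt(m) for m in mus])))
print(classify(scaling_exponent(mus, [0.3 for _ in mus])))
```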
We confirmed that all the core CESS's, as in the leading eight, commonly have mechanisms to punish defectors and to recover their reputations from erroneous actions. However, we also observed a couple of variants in their ways of punishment and recovery. After classifying the norms into the above three types, we further classified them according to how players conducted punishment and recovery, which we call punishment and recovery patterns.
What do we mean by punishment patterns? With the leading eight, the majority G-players punish B-players by defecting against them, and those who inflicted punishment keep G-reputation after their punishment. Namely, the punishment is justified. However, this is not always the case for some of the ternary CESS's, under which a punishing player's reputation does change. We thus classify the norms into those with full justification (P1) and those with partial justification (P2).
Likewise, a recovery pattern means the way that B-players recover their reputations. With the CESS's, a B-player can restore reputation by making an apology. In the leading-eight community, a B-player can immediately return to G after cooperating with a G-player, and such norms that allow instantaneous recovery are labeled as R1. However, R1 is not the unique recovery pattern in the case of the ternary reputation because some norms allow B-players to only gradually recover their reputation, which we call R2.
In summary, the CESS's are grouped into three classes and 12 subclasses according to the taxonomy shown in Fig. 3. We will see the details in the following.
Details of each type. Type C1. The first class (C1) is the most common type. It contains more than two million norms, which make up about 97% of the core CESS's. With this class of norms, the majority of the players have reputation G and form mutual cooperation. In other words, the norms prescribe GG:CG:B in common. The master equations near the stationary state are approximated in the following forms:

which mean that players with N or B quickly change reputation by meeting G-players, the majority of the population ($h_G \approx 1$). If $\mu \ll 1$, we can see from Eqs. (3) and (4) that both $h_N$ and $h_B$ will decrease exponentially as time goes by. In the stationary state, the population will thus end up with $h_N \sim h_B \sim O(\mu)$, sharing a high degree of similarity to the leading eight. Nevertheless, we find some distinctions from the leading eight in the punishment and recovery patterns. Example norms in each subclass are shown in Table 2.
Let us first look at two different punishment patterns, depending on which reputation is assigned to a punishing player:
• Type P1: Norms with GB:DG:*,
• Type P2: Norms with GB:DN:*.
As we have seen in the binary-reputation model, a punishing G-player must not get B-reputation to keep the cooperation level high. Thus, the above two are the only possibilities. Class P1 works similarly to the leading eight. Namely, the punishing behaviour is fully justified, and a G-player can maintain the reputation. On the other hand, P2 is a novel class that has not been reported before. Under a P2 norm, a punishing G-player cannot maintain his or her original reputation but gets N. Their punishment is not always justified because the resulting

R1 is the most basic type, similar to the leading eight. R2 is unique to the ternary model: It takes two steps for a B-player to improve reputation to G. During the recovery process, a player needs to cooperate with G-players at least once to ensure that defection does not pay.

Figure 3. Each class is further categorized into four subclasses. Norms that fully (partially) justify punishment are labeled as P1 (P2). They are also categorized according to whether B-players are allowed to recover their reputation instantaneously (R1) or gradually (R2). The leading eight correspond to C1P1R1 in which N is totally irrelevant, or to C3P1R1 when G and N merge into a single reputation.
Table 2 (columns: Type, Prescriptions). [...] and the cooperation level $p_c$ for $\mu_a = \mu_e = 10^{-3}$ are shown together with their prescriptions.

Figure 4. [...] Table 2. Transitions in the stationary state [Eq. (11)] are depicted as weighted edges. The corresponding graph for the leading eight is also presented for comparison: The thick blue self-loop at G indicates that the majority of the players have G with a high level of cooperation. When an implementation error happens, the state moves to B as indicated by the dashed edge GG:DB. The other self-loop is GB:DG, which means justified punishment inflicted by G-players. The remaining edge, BG:CG, corresponds to the recovery of reputation.
The topology of the graph for the C1P1R1 norm is identical to that for the leading eight except for the unused node N, indicating that their working mechanisms are essentially the same. P2 norms have a directed edge from G to N in common, which corresponds to GB:DN instead of the self-loop GB:DG in P1 norms, and it implies reputation change caused by punishment. Whereas punishment against B-players is not justified, the action against N-players is justified as seen in the self-loop GN:DG or GN:CG, which is required to keep $h_G$ high. We can also find a difference between R1 and R2 in that R2 norms require two steps to reach G from B, as seen from a path B → N → G instead of a direct edge B → G.
Type C2. The second class contains 51,363 norms, far fewer than C1. In this class, we have $h_N^* \sim O(\sqrt{\mu})$, which is small but significantly greater than $O(\mu)$. Near the stationary state, the leading terms of the master equations are written as follows:

As in C1, C2 norms can be further classified according to punishment patterns as follows:

where R1 and R2 correspond to the instantaneous and gradual recovery processes, respectively. Examples of C2 norms are shown in Table 3, and their state-transition graphs are depicted in Fig. 5. The difference between these subclasses is clear: The graphs for P1 have self-edges GB:DG in common, whereas those for P2 have edges from G to N (GB:DN). Similarly, the graphs for R1 have edges from B to G (BG:CG) whereas those for R2 have a path B → N → G.
Type C3. Finally, we categorize 11,337 norms into the third class (C3). Unlike in C1 and C2, the stationary fraction $h_N^*$ remains finite even when $\mu \to 0$. As a result, most players end up with either G or N. Mutual cooperation is formed between G- and N-players, whereas those who defect receive reputation B. The fraction of B-players is a small quantity of $O(\mu)$ because one can easily escape from B-reputation by meeting G- or N-players. We also obtained asymptotic dynamics of $h_G$ and $h_N$ as shown in "Methods".
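The sizes of the three classes can be cross-checked against the size of the core set. The arithmetic below is a worked illustration that assumes the three classes exhaust the core set of 2,067,861 norms; the symbols $N_{\mathrm{C1}}$, $N_{\mathrm{C2}}$, and $N_{\mathrm{C3}}$ are introduced here only for this purpose.

```latex
\begin{aligned}
N_{\mathrm{C1}} &= 2{,}067{,}861 - 51{,}363 - 11{,}337 = 2{,}005{,}161 \quad (\approx 97.0\%),\\
N_{\mathrm{C2}} &= 51{,}363 \quad (\approx 2.5\%),\\
N_{\mathrm{C3}} &= 11{,}337 \quad (\approx 0.5\%).
\end{aligned}
```

The resulting C1 count is consistent with the statement that C1 contains more than two million norms, or about 97% of the core CESS's.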
Similarly to the other classes, a finer classification of C3 norms according to their punishment patterns can be defined as follows:

R1 is similar to the leading eight, and R2 is unique to the ternary-reputation model because B-players cooperate only with either G- or N-players and recover reputation. Thus, on average, more than one step is required to return to the original reputation state if it is lost by mistake. Examples of C3 norms as well as their state-transition graphs are shown in Table 4 and in Fig. 6. The graph for C3P1R1 is equivalent to that for the leading eight if G and N merge into one. Whereas P1 norms have no solid edges to B, P2 norms have a solid edge to B either from G or N, indicating that punishment is not always justified in P2 norms. No self-loop exists around B in the R1 graphs, whereas the R2 graphs have a self-loop BN:DB or BG:DB, indicating that B-players cannot always escape from B.

Counter-intuitive examples.
Here we show a couple of norms with interesting differences from the leading eight and discuss why they nevertheless qualify as CESS's with the ternary reputation.
The first example is "unfair" punishment, which is observed in the C1P2R1 norm in Table 2 and Fig. 4. Under this norm, a punishing player does not maintain G but gets N, and the player is punished by a G-player in the subsequent round. In other words, a player who initially had G-reputation gets unfairly punished after the unfortunate encounter with a B-player although the player has accurately followed the prescription. This unfairness is never observed in the leading eight because B cannot be assigned to a punishing player to keep $h_B^* \sim O(\mu)$. However, when the reputation is not binary, a norm without the full justification can be a CESS because both $h^*$

Figure 5. The state transitions for the C2 norms shown in Table 3. The notations of the graphs are the same as those in Fig. 4.

In the second example, we see the peculiar behaviour of "making an apology by defecting." Such behaviour is observed in the C1P1R2 example in Table 2 and Fig. 4. A player needs two steps (B → N → G) to return to G once he or she gets B. First, a B-player must cooperate with a G-player to become N. Then, the N-player must defect, not cooperate, against a G-player to become G, which goes against our common sense. This counterintuitive apology is not possible with the binary reputation because such a norm would allow a constantly defecting player to be better off than the rest of the population. However, when more than two kinds of reputation are available, this is not the case. A norm can be a CESS as long as cooperation is prescribed at least once in the course of apology, but not necessarily twice or more, because a single move of cooperation is enough to compensate for the defection.

Figure 6. [...] Table 4. The notations of the graphs are the same as those in Fig. 4.

Scientific Reports | (2022) 12:455 | https://doi.org/10.1038/s41598-021-04033-w

Another interesting behaviour is found in the C2P2R1 norm in Table 3 and in Fig. 5, where "inequality" among cooperators spontaneously emerges. As shown above, C2 norms are characterized by the fact that most players have G except a small fraction $O(\sqrt{\mu})$ of N-players in the stationary state. Those players form mutual cooperation, but N-players defect against each other under some C2 norms. Such an event occurs with probability $O(\mu)$, thus negligible at the societal level, but it makes a significant difference from a player's perspective. That is, although G-players almost surely receive cooperation from the community, N-players do not benefit from N-players, yielding a drop of the individual cooperation level by $O(\sqrt{\mu})$. This inequality becomes significant as $\mu$ grows. For instance, when $\mu_e = \mu_a = 0.05$, we see that $h_N^* \approx 0.22$, and more than 20% of the population suffers such a loss. Again, the drop of the cooperation level is still acceptable at the society level, and this C2 norm qualifies as a CESS.
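The quoted value of $h_N^*$ is close to what the $\sqrt{\mu}$ scaling suggests. The estimate below is only a leading-order illustration that assumes a prefactor of one, which need not hold at an error rate as large as 0.05:

```latex
h_N^* \sim \sqrt{\mu} = \sqrt{0.05} \approx 0.224,
\qquad
\Pr(\text{N meets N}) \sim (h_N^*)^2 \sim \mu = 0.05 .
```

Under the same assumption, an N-player meets another N-player in roughly one out of five interactions, whereas such encounters make up only about 5% of all interactions in the population.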

Summary and discussion
Although reputation in our real life is not always distinguished between good and bad, most previous works have accepted an idealized assumption of binary reputation. Some researchers have attempted to go beyond the binary reputation [15][16][17][18][19][20][21] , and the motivation behind the ternary reputation in Tanabe et al. 15 is close to ours. However, they studied only second-order norms and assumed an ordinal relationship among reputations to limit the number of norms to 512. By considering the third-order assessment rules, this study naturally takes into account the Self strategy 27 as well as all the second-order norms (see "Methods"). One may extend the binary system by representing reputation as integer values 16,19,20 . The other extreme is to regard reputation as a continuous variable. This approach makes it possible to use analytic tools, but it has its own limitations because we can only examine a small neighborhood of the existing cooperative norm 22 . Little is known about the consequences of these simplifying assumptions: For example, some lessons from indirect reciprocity might be due to the oversimplification. We should ask what are the fundamental properties that are preserved irrespective of the complexity of the reputation system to sustain cooperation. To address this question, we comprehensively searched for the CESS pairs of assignment and action rules with ternary reputations to compare with the leading eight. From more than thirty billion possibilities, we filtered out "core" CESS norms that constitute the counterpart of the well-known leading eight.
The result shows that the previous conclusions drawn from the binary- and continuous-reputation models do not fully capture the various possibilities of CESS's. For example, under a certain norm, a player may lose G-reputation even when he or she has obeyed all the prescriptions of the norm, something unimaginable in the leading eight. Another norm requires an ill-reputed player to defect against the community to gain a better reputation. This observation suggests that a population may well achieve cooperation by actively making use of reputations far from G, a possibility that has been ignored by the linear stability analysis for the continuous model. Put differently, the viewpoint of this work is that N does not necessarily mean 'less bad' but functions in its own way. Based on this idea, we explored the strategy space in full without imposing any ordinal relationship a priori. Indeed, we have observed cases where N cannot be interpreted as the middle reputation between G and B. For instance, in some of the C1P2R1 norms, a G-player who punished a B-player gets N and the N-player can defect against G-players while receiving cooperation from G-players, which implies that N is deemed better than the majority's reputation G. Such a CESS would not be found if we assumed an ordinal relationship among the three reputations.
It is still instructive to compare our findings with the leading eight. As we have seen above, the leading eight have the following characteristics: (i) Maintenance of cooperation, (ii) Identification of defectors, (iii) Punishment and justification of punishment, (iv) Apology and forgiveness. Overall, these characteristics are shared with the ternary CESS's, indicating key features universally required for general reputation systems. However, some of the above characteristics are relaxed when we go beyond the binary assumption. First, cooperation is not always maintained among a single type of players but among multiple types, as seen in C2 and C3. Second, partial justification of punishment is allowed: A player who inflicted punishment does not always keep the original reputation, differently from the leading eight. Third, forgiveness may be non-instantaneous: Instead of being forgiven right after cooperating with G as in the leading eight, it may take some steps for a B-player to recover his or her reputation.
In summary, based on the results for the ternary-reputation system, we conjecture that the CESS norms for general reputation models will share the common characteristics in a more relaxed form: 1. Maintenance of cooperation by the majority (but not necessarily all) of the population. 2. Identification of defectors. 3. Punishment, followed by partial or full justification. 4. Apology accompanied by gradual or instantaneous forgiveness.
We believe these rules serve as useful guiding principles when designing a norm with even more intricate reputations.
Finally, we would like to stress that the ternary-reputation model is an interesting system in its own right because the third reputation may provide additional historical information for players. An important future direction is the study of the private reputation in a noisy environment 4,[8][9][10][28][29][30][31] . Investigation into a polymorphic population in this context remains an open problem at large, and the same remark applies to stochastic reputation dynamics 15,21 [...] of $h_N$ (C2 and some C3 norms, see Table 6), these norms could show behaviours that are significantly different from the binary ones.

Methods
Calculation of the stationary-state population. Let $h_Z$ be the fraction of players having reputation $Z \in \{B, N, G\}$. By construction, we always have $h_{\mathrm{sum}} \equiv h_B + h_N + h_G = 1$. When an RP pair is given, one can calculate the time evolution of $h_Z$ for a short time interval $[t, t + \Delta t]$ in the error-free limit as follows: Within this interval, we randomly choose a small fraction of players as donors, and we denote their fraction as $\alpha \Delta t \ll 1$. The fraction of those players having reputation $X$ is $h_X(t) \alpha \Delta t$. A recipient is assigned to each donor through random sampling, so the donor meets a recipient with reputation $Y$ with probability $h_Y(t)$. If everyone abides by the given RP pair, we may rewrite the assignment rule as $R(X, Y) \equiv R(X, Y, P(X, Y))$. The inflow of $h_Z$ is thus proportional to $h_X(t) h_Y(t) \delta_{R(X,Y),Z}$, where $\delta_{i,j}$ is the Kronecker delta, because the donor with $X$ has to interact with the recipient with $Y$ and obtain the new reputation $Z$ according to the rule $R(X, Y)$. On the other hand, the outflow is $h_Z(t) \alpha \Delta t$ because the donors will have updated reputations other than $Z$ in general. Thus, the time evolution of $h_Z(t)$ is given by

$$h_Z(t + \Delta t) = h_Z(t) + \alpha \Delta t \left[ \sum_{X,Y} h_X(t) h_Y(t) \delta_{R(X,Y),Z} - h_Z(t) \right].$$

Taking the limit of $\Delta t \to 0$, we have the following differential equation:

$$\frac{d h_Z}{dt} = \sum_{X,Y} T_{XY \to Z} - h_Z,$$

where $T_{XY \to Z} \equiv h_X(t) h_Y(t) \delta_{R(X,Y),Z}$ and $\alpha \equiv 1$ after rescaling the unit of time.
In the presence of implementation error, a donor fails to cooperate with probability $\mu_e$. In other words, the prescribed action is correctly executed with probability $1 - \mu_e$, and the player defects otherwise. By taking into account the implementation error, $T_{XY \to Z}$ is thus redefined as

In the presence of assignment error, the donor does action $A$ and receives the correct reputation $Z$ with probability $1 - \mu_a$, but $Z$ may be assigned by mistake with probability $\mu_a/2$ although the assignment rule does not prescribe $Z$. Therefore, the probability that the donor obtains $Z$ is given as

Thus, when both implementation and assignment errors may occur, $T_{XY \to Z}$ is redefined as

Note that the dynamics preserves $h_{\mathrm{sum}} \equiv h_G + h_N + h_B = 1$ because

According to our numerical check, $h_Z(t)$ converges to a unique stationary state, $h_Z^* = \lim_{t \to \infty} h_Z(t)$, irrespective of the initial condition for most cases. However, in some cases where multiple stationary states coexist, we adopted the one obtained from the initial condition (1/3, 1/3, 1/3), regarding it as the most representative one. For each social norm, we obtain $h_Z^*$ by using the fourth-order Runge-Kutta algorithm, normalizing $h_Z(t)$ by $h_{\mathrm{sum}}$ at each time step.
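The procedure described above can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: the transition probability is written from the verbal description (a prescribed C is executed with probability $1 - \mu_e$, a prescribed D is always executed, and each wrong reputation is assigned with probability $\mu_a/2$), the cooperation level is taken as the probability that cooperation is prescribed, and the example norm, the step size, and all function names are hypothetical.

```python
REPS = ("B", "N", "G")

def transition_prob(R, P, x, y, z, mu_e, mu_a):
    """Probability that an x-donor meeting a y-recipient ends up with reputation z."""
    if P[(x, y)] == "C":
        action_probs = {"C": 1.0 - mu_e, "D": mu_e}   # cooperation may fail
    else:
        action_probs = {"D": 1.0}                     # defection never fails
    return sum(pa * ((1.0 - mu_a) if R[(x, y, a)] == z else mu_a / 2.0)
               for a, pa in action_probs.items())

def derivative(h, R, P, mu_e, mu_a):
    """dh_Z/dt = sum over (X, Y) of T_{XY->Z}, minus h_Z."""
    return {z: sum(h[x] * h[y] * transition_prob(R, P, x, y, z, mu_e, mu_a)
                   for x in REPS for y in REPS) - h[z]
            for z in REPS}

def rk4_step(h, dt, R, P, mu_e, mu_a):
    """One fourth-order Runge-Kutta step followed by normalisation."""
    def shifted(k, f):
        return {z: h[z] + f * k[z] for z in REPS}
    k1 = derivative(h, R, P, mu_e, mu_a)
    k2 = derivative(shifted(k1, dt / 2), R, P, mu_e, mu_a)
    k3 = derivative(shifted(k2, dt / 2), R, P, mu_e, mu_a)
    k4 = derivative(shifted(k3, dt), R, P, mu_e, mu_a)
    new = {z: h[z] + dt * (k1[z] + 2 * k2[z] + 2 * k3[z] + k4[z]) / 6 for z in REPS}
    total = sum(new.values())
    return {z: v / total for z, v in new.items()}

def stationary_state(R, P, mu_e=1e-3, mu_a=1e-3, dt=0.1, steps=500):
    """Integrate from the initial condition (1/3, 1/3, 1/3)."""
    h = {z: 1.0 / 3.0 for z in REPS}
    for _ in range(steps):
        h = rk4_step(h, dt, R, P, mu_e, mu_a)
    return h

def cooperation_level(h, P):
    """Probability that cooperation is prescribed for a random donor-recipient pair."""
    return sum(h[x] * h[y] for x in REPS for y in REPS if P[(x, y)] == "C")

def example_norm():
    """Hypothetical norm: cooperate with G only; helping G or refusing non-G earns G."""
    P = {(x, y): ("C" if y == "G" else "D") for x in REPS for y in REPS}
    R = {}
    for x in REPS:
        for y in REPS:
            R[(x, y, "C")] = "G" if y == "G" else "B"
            R[(x, y, "D")] = "B" if y == "G" else "G"
    return R, P

R, P = example_norm()
h_star = stationary_state(R, P)
print(h_star, cooperation_level(h_star, P))
```

The example norm treats N like B on the recipient side, so it behaves like a binary norm embedded in the ternary space and is meant only to exercise the integrator.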

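Calculation of the cooperation level and the payoffs. Cooperation level $p_c$ for a resident species is defined as

$$p_c \equiv \sum_{X,Y} h_X^* h_Y^*\, \delta_{P(X,Y),C}.$$

The payoff of a resident player $\pi_{\mathrm{res}}$ is calculated as

$$\pi_{\mathrm{res}} = p_c (1 - \mu_e)(b - c).$$

Then, we calculate the dynamics of the fraction of mutant players for each reputation, $\{H_B, H_N, H_G\}$, when a small number of mutant players exist in the community. Because mutants are rare, a mutant donor meets a resident recipient, and by analogy with the resident dynamics the fraction of mutants having reputation $Z$ is updated as

$$\frac{dH_Z}{dt} = -H_Z + \sum_{X,Y} H_X h_Y^* \left\{ (1 - \mu_e) \left[ \frac{2 - 3\mu_a}{2}\, \delta_{R'_{XY},Z} + \frac{\mu_a}{2} \right] + \mu_e \left[ \frac{2 - 3\mu_a}{2}\, \delta_{R(X,Y,D),Z} + \frac{\mu_a}{2} \right] \right\},$$

where $R'_{XY} \equiv R(X, Y, P'(X, Y))$ and $P'(X, Y)$ is the action rule of the mutant. We numerically confirmed that $H_Z$ converges to a stationary value $H_Z^*$ after an initial transient period. Using these stationary values, the probability that a mutant cooperates with a resident is

$$p_{\mathrm{mut} \to \mathrm{res}} = \sum_{X,Y} H_X^* h_Y^*\, \delta_{P'(X,Y),C},$$

whereas its counterpart is

$$p_{\mathrm{res} \to \mathrm{mut}} = \sum_{X,Y} h_X^* H_Y^*\, \delta_{P(X,Y),C}.$$

Using these, the payoff of the mutant is given as

$$\pi_{\mathrm{mut}} = (1 - \mu_e) \left( b\, p_{\mathrm{res} \to \mathrm{mut}} - c\, p_{\mathrm{mut} \to \mathrm{res}} \right).$$

Enumeration of norms. We enumerated all possible combinations of assignment and action rules to find every CESS. A supercomputer was used to deal with the large number of possibilities, which amounts to $64{,}573{,}605 \times 2^9 = 33{,}061{,}685{,}760$. To speed up the calculation, we removed in advance some of the norms that cannot be CESS's, as follows: When an assignment rule $R$ assigns the same reputation for both actions in some context $(X, Y)$, the action rule must prescribe D in that context to be an ESS. For instance, when $R(G, G, C) = R(G, G, D)$, a G-player would have no incentive to cooperate with another G-player. In such a case, an action rule prescribing C at $(G, G)$ cannot be an ESS because a mutant that defects there gains a strictly higher payoff than the resident. We excluded these cases from the computation.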
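The pruning step in the enumeration can be illustrated with a short Python sketch. It is only an illustration under assumptions: the helper name `admissible_action_rules`, the dictionary encoding of the rules, and the toy assignment rule are hypothetical and are not taken from the enumeration code.

```python
from itertools import product

REPS = "BNG"  # bad, neutral, good

def admissible_action_rules(R):
    """Yield action rules P[(X, Y)] that survive the pruning.

    R maps (donor X, recipient Y, action A) to the donor's new reputation.
    Wherever R(X, Y, C) == R(X, Y, D), cooperation cannot pay off, so D is forced.
    """
    contexts = list(product(REPS, repeat=2))
    forced_d = [ctx for ctx in contexts if R[ctx + ("C",)] == R[ctx + ("D",)]]
    free = [ctx for ctx in contexts if ctx not in forced_d]
    for actions in product("CD", repeat=len(free)):
        P = {ctx: "D" for ctx in forced_d}
        P.update(zip(free, actions))
        yield P

# Hypothetical toy assignment rule: C always earns G; D earns G only against B.
R_toy = {(X, Y, A): ("G" if A == "C" or Y == "B" else "B")
         for X, Y, A in product(REPS, REPS, "CD")}

survivors = list(admissible_action_rules(R_toy))
total_norms = 64_573_605 * 2**9  # size of the full search space quoted above
```

In this toy case the three contexts with a B recipient assign G regardless of the action, so only the remaining six contexts are free and the candidate action rules shrink from $2^9$ to $2^6$.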
Second-order norms. Although we have focused on the third-order norms, second-order norms are included in the third-order CESS's as a subset. Under a second-order norm, both the new reputation assigned to a donor and the prescribed action are independent of the donor's reputation: the assignment rule is a function of the recipient's reputation and the conducted action, and the action rule is a function of the recipient's reputation alone. These norms are thus simpler than third-order ones.
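As a concrete reading of this definition, the following Python sketch checks whether a given pair of rules is second-order. The function name `is_second_order`, the dictionary encoding, and the two toy norms are assumptions introduced here for illustration only.

```python
from itertools import product

REPS = "BNG"  # bad, neutral, good

def is_second_order(R, P):
    """True if neither R(X, Y, A) nor P(X, Y) depends on the donor's reputation X."""
    for Y in REPS:
        if len({P[(X, Y)] for X in REPS}) != 1:
            return False
        for A in "CD":
            if len({R[(X, Y, A)] for X in REPS}) != 1:
                return False
    return True

# Hypothetical toy norm: cooperate unless the recipient is B;
# C earns G, D earns G against B and B otherwise.
P_toy = {(X, Y): ("D" if Y == "B" else "C") for X, Y in product(REPS, repeat=2)}
R_toy = {(X, Y, A): ("G" if A == "C" or Y == "B" else "B")
         for X, Y, A in product(REPS, REPS, "CD")}

# Making a single entry depend on the donor's reputation yields a third-order norm.
R_third = dict(R_toy)
R_third[("B", "G", "C")] = "N"

print(is_second_order(R_toy, P_toy), is_second_order(R_third, P_toy))
```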
The full list of the second-order CESS's is shown in Table 5. We represent the norms by using the same notation as in the main text, but with the first character unspecified (denoted as _) because the second-order norms are independent of the donor's reputation.
As shown in the table, there are 18, 9, and 6 second-order norms in the C1-P1-R1, C1-P2-R1, and C3-P1-R1 classes, respectively. Most of these are closely related to norms with binary reputations. The only two second-order norms in the leading eight are Simple Standing (SS) and Stern Judging (SJ), which are denoted as (_B:DG:*, _G:CG:B). As seen in Table 5, a large fraction of the norms, those denoted by ♠ or ♣, are equivalent to SS or SJ when two of the reputations are merged into one.
Dynamics of C3 norms. There are different kinds of dynamics of $h_N$ within C3 norms. The fraction of B-players is a small quantity of $O(\mu)$ because one can easily escape from B-reputation by meeting G- or N-players. If we merge G and N into a single reputation, the merged reputation corresponds to G for the leading eight. A radical example is a mechanism found in 2,139 norms, for which $h_G$ and $h_N$ are almost non-interacting: Just as G-players preserve their reputations through GG:CG:B, the same is true for N-players with NN:CN:B, and the interaction between G- and N-players cannot change their numbers because the reputations are either preserved or swapped. Their interaction is basically mediated by B-players, originating from error $\sim O(\mu)$. Therefore, for each of these 2,139 norms, the dynamics towards a fixed point becomes frozen as $\mu \to 0$.

Table 5. List of the second-order norms that are included in the CESS's. The asterisk * represents a wildcard, and the square bracket [BN] represents either B or N. Those denoted by ♠ are equivalent to SS or SJ when B and N are merged into a single reputation. Those denoted by ♣ are equivalent to SS or SJ when N and G are merged into a single reputation.

For the rest, the coupling between $h_G$ and $h_N$ is more explicit, and the convergence rate is a finite constant independent of $\mu$. Let us give two representative examples: A common pattern among 2,754 norms in C3 is an oscillation between G and N due to GG:CN:B and NN:CG:B. In the absence of error, a G-donor who meets a G-recipient changes its reputation to N, and an N-donor who meets an N-recipient changes its reputation to G. The master equation for $h_N$ is thus written as

$$\frac{dh_N}{dt} \approx h_G^2 - h_N^2 = 1 - 2h_N, \tag{20}$$

where we have plugged in $h_G = 1 - h_N$, considering $h_B \ll 1$. The above equation clearly shows $h_N^* = 1/2$ in the limit of small $\mu$.
For the other 3,003 of the C3 norms, the interaction between G and N is more delicate. In addition to the above prescriptions for Eq. (20), they also have GN:CG:B and NG:CG:B in common. Therefore, when an N-player meets a G-player, either of them earns G-reputation as a donor by choosing C. As a whole, these features lead to the following master equation:

$$\frac{dh_N}{dt} \approx h_G^2 - h_N^2 - h_N h_G = 1 - 3h_N + h_N^2,$$

whose stationary value is obtained as $\lim_{\mu \to 0} h_N^* = \phi^{-2} = (3 - \sqrt{5})/2 \approx 0.38$, where $\phi \equiv (\sqrt{5} + 1)/2$ is the golden ratio. The dynamical behaviours of the other C3 norms can be explained in similar ways. Table 6 summarizes our dynamical characterization of the three classes.
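The two stationary values quoted above can be checked numerically with a short Python sketch. The rate functions below are an assumption based on the stated prescriptions in the error-free limit with $h_G = 1 - h_N$: a flow $h_G^2$ into N from GG:CN:B, a flow $h_N^2$ out of N from NN:CG:B, and, for the second class, an extra outflow $h_N h_G$ from NG:CG:B. The helper name `relax`, the explicit Euler scheme, and the step size are illustrative choices.

```python
import math

def relax(rate, h=1 / 3, dt=0.01, steps=5000):
    """Integrate dh/dt = rate(h) with explicit Euler steps from h(0) = 1/3."""
    for _ in range(steps):
        h += dt * rate(h)
    return h

def oscillating(h_n):
    # GG -> N feeds N; NN -> G drains N.
    return (1 - h_n) ** 2 - h_n ** 2

def golden(h_n):
    # Same as above, plus N-donors meeting G-recipients becoming G.
    return (1 - h_n) ** 2 - h_n ** 2 - h_n * (1 - h_n)

phi = (math.sqrt(5) + 1) / 2
h_osc = relax(oscillating)
h_gold = relax(golden)
```

Under these assumptions, `h_osc` settles at 1/2 and `h_gold` at $\phi^{-2}$, and both relax exponentially at a rate of order one, consistent with a convergence rate independent of $\mu$.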
Received: 21 October 2021; Accepted: 7 December 2021

Table 6. Summary of stationary values of $h_N^*$ and the asymptotic time evolution near stationarity. In every case, the common findings are $h_B(\mu, t \to \infty) = O(\mu)$ and $h_B(\mu \to 0, t) \sim \exp(-t/\tau_B)$. If necessary, a certain time scale of $O(1)$ is denoted by $\tau_N$ or $\tau_B$, which may differ norm by norm, and $\phi \equiv (\sqrt{5} + 1)/2$ is the golden ratio. Columns: Stationary values; Time dependence.