Punitive preferences, monetary incentives and tacit coordination in the punishment of defectors promote cooperation in humans

Peer-punishment is effective in promoting cooperation, but the costs associated with punishing defectors often exceed the benefits for the group. It has been argued that centralized punishment institutions can overcome the detrimental effects of peer-punishment. However, this argument presupposes the existence of a legitimate authority and leaves an unresolved gap in the transition from peer-punishment to centralized punishment. Here we show that the origins of centralized punishment could lie in individuals’ distinct ability to punish defectors. In our laboratory experiment, we vary the structure of the punishment situation to disentangle the effects of punitive preferences, monetary incentives, and individual punishment costs on the punishment of defectors. We find that actors tacitly coordinate on the strongest group member to punish defectors, even if the strongest individual incurs a net loss from punishment. Such coordination leads to a more effective and more efficient provision of a cooperative environment than we observe in groups of all equals. Our results show that even an arbitrary assignment of an individual to a focal position in the social hierarchy can trigger the endogenous emergence of more centralized forms of punishment.


S2. Further data analyses and results
All test statistics reported in the main article and in figures 1 and 2 are based on the regression model estimations presented in this section. Statistical significance is set at the 5% level (i.e., α = 0.05) for two-sided tests, and we account for the repeated measures obtained from the same subject by estimating cluster-robust standard errors. We use Stata's margins command to calculate proportions from the logit regressions and test the statistical significance of differences between proportions using Wald tests of linear hypotheses. Regression tables are created using the estout command in Stata [S1]. The data are available from the authors on request.

Table S1 lists the coefficient estimates from logit regression models of individual group members' punishment decisions. Model M1 accounts for the type of the stage game only (MHD vs. VOD). The overall punishment rates in the MHD (19.8%) and in the VOD (32.7%) are significantly different from each other (χ²(1) = 15.61, p < 0.001). Model M2 shows that the difference in punishment rates between the MHD and the VOD is also statistically significant within both the symmetric (17.7% vs. 32.9%; χ²(1) = 9.04, p = 0.003) and the asymmetric (22.2% vs. 32.6%; χ²(1) = 9.30, p = 0.002) versions of the games. Model M3 accounts for the entire structure of the stage games, but it does not differentiate between the first part of the experiment (without a penalty) and the second part (with a penalty). This model shows that a weak person is much less likely to punish defectors than the strong person in both the asymmetric MHD (5.1% vs. 56.3%; χ²(1) = 81.99, p < 0.001) and the asymmetric VOD (10.6% vs. 76.8%; χ²(1) = 119.44, p < 0.001).
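Because the models are estimated without a constant, each coefficient maps directly onto a punishment rate via the inverse-logit transformation; this is, in effect, what Stata's margins command computes from the estimates. The following pure-Python sketch is illustrative only, and the coefficient value used in the example is hypothetical (chosen to match the overall MHD rate), not taken from Table S1:

```python
import math

def inverse_logit(b):
    """Map a logit coefficient b to a probability p = 1 / (1 + exp(-b))."""
    return 1.0 / (1.0 + math.exp(-b))

# A coefficient of 0 corresponds to a rate of 0.5, which is why the
# no-constant coefficients can be read as deviations from a 50% rate.
assert inverse_logit(0.0) == 0.5

# Hypothetical example: a coefficient of about -1.40 implies a
# punishment rate of roughly 19.8%, the overall MHD rate reported above.
p = inverse_logit(-1.3988)
print(round(p, 3))  # 0.198
```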
Model M4 is the most unrestricted model, as it also accounts for the two parts of the experiment. The punishment rates displayed in Figure 1 are based on this model. Except for the difference in the symmetric VOD (χ²(1) = 6.50, p = 0.011), the differences between the punishment rates in the first part of the experiment (without a penalty) and the second part (with a penalty) are all statistically insignificant. Based on this result, and the fact that the punishment rates do not increase substantially from the first to the second part, we are confident that reporting results regarding the punishment of defectors based on the pooled data from both parts does not obfuscate any relevant facts.

Notes: The table lists coefficient estimates from logistic regression models and cluster-robust standard errors (*** p < 0.001, ** p < 0.01, * p < 0.05, for two-sided tests). The models were estimated without a constant so that the coefficients can be interpreted as deviations from zero (i.e., from a rate of 0.5). Goodness-of-fit measures are based on model estimations with a constant. The outcome variable in all models is 1 if a person punished the defection and 0 otherwise. Figure 1 is based on the estimates in model M4. N₁ denotes the number of decisions and N₂ denotes the number of clusters.

Table S2 lists the coefficient estimates from logit regression models of punishment at the group level, where the outcome is whether a defection was punished by at least one group member. As at the individual level, defections are punished less often in the MHD than in the VOD, and this also holds within the symmetric (43.8% vs. 72.8%; χ²(1) = 54.60, p < 0.001) and the asymmetric (59.9% vs. 80.9%; χ²(1) = 26.25, p < 0.001) versions of the games.
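The pairwise comparisons reported throughout are Wald tests of linear hypotheses on the logit coefficients. As a simplified, purely illustrative sketch of the underlying arithmetic, assuming a zero covariance between the two estimates (the actual tests use the full cluster-robust covariance matrix, so the numbers below are not reproducible from this formula):

```python
import math

def wald_chi2(b1, se1, b2, se2, cov=0.0):
    """Wald chi-square statistic (1 df) for H0: b1 = b2.

    cov is the covariance of the two estimates; here it defaults to 0,
    whereas the real tests use the full cluster-robust covariance matrix.
    """
    var_diff = se1**2 + se2**2 - 2.0 * cov
    return (b1 - b2) ** 2 / var_diff

def chi2_pvalue_1df(w):
    """Upper-tail p-value of a chi-square(1) statistic via the error function."""
    return math.erfc(math.sqrt(w / 2.0))

# Sanity check: the 5% critical value of chi-square(1) is about 3.841.
print(round(chi2_pvalue_1df(3.841), 3))  # 0.05
```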
As stated in the main paper, there is also a significant difference in punishment rates between the symmetric and asymmetric MHD (44% vs. 60%; χ²(1) = 15.20, p < 0.001) and between the symmetric and asymmetric VOD (73% vs. 81%; χ²(1) = 4.34, p = 0.037).

Notes: The table lists coefficient estimates from logistic regression models and heteroskedasticity-robust standard errors (*** p < 0.001, ** p < 0.01, * p < 0.05, for two-sided tests). The models were estimated without a constant so that the coefficients can be interpreted as deviations from zero (i.e., from a rate of 0.5). Goodness-of-fit measures are based on model estimations with a constant. The outcome variable in all models is 1 if the defection was punished by at least one group member and 0 otherwise. Figure S5 is based on the estimates in model M7.
Recall that the latter two differences are estimated based on the pooled data from both parts of the experiment. Model M7 also accounts for the two parts of the experiment and allows for testing the differences between the symmetric and asymmetric games for each part separately.
Except for the difference in the VOD with penalty (χ²(1) = 0.89, p = 0.345), the differences in punishment rates between the symmetric and asymmetric games are all statistically significant. Note, however, that the insignificant difference in the VOD with penalty may partly be due to low power in the data: defection rates are lowest in the VOD with penalty (see Fig. 2), making it most difficult to identify a statistically significant difference in this experimental condition. We are therefore confident that reporting results regarding the punishment rates at the group level based on the pooled data from both parts does not obfuscate any relevant facts. The punishment rates displayed in Figure S5 are based on model M7.

Figure S5: Punishment rate at the group level

Table S3 lists the coefficient estimates from logit regression models of single group member punishment. The outcome variable in all three models is 1 if the defection was punished by exactly one group member, and it is 0 otherwise. In other words, these models estimate the rates at which the second-order public good is produced efficiently. Again, model M8 distinguishes between the two types of the stage game only (MHD vs. VOD). The second-order public good is produced efficiently less often in the MHD than in the VOD, both overall and within the symmetric and asymmetric versions of the games. Model M10 is the most unrestricted model, as it also accounts for the two parts of the experiment. The rates of efficient public good provision displayed in Figure S6 are based on this model. None of the differences in single punisher rates between the first part (without a penalty) and the second part (with a penalty) are statistically significant.

Model χ² statistics (df in parentheses): M8: 15.82(1)***; M9: 44.77(3)***; M10: 47.28(7)***.

Notes: The table lists coefficient estimates from logistic regression models and heteroskedasticity-robust standard errors (*** p < 0.001, ** p < 0.01, * p < 0.05, for two-sided tests).
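The group-level outcomes in Tables S2 and S3 are derived from the individual punishment decisions: "at least one punisher" for Table S2 and "exactly one punisher" for Table S3. A minimal sketch of this coding (the function and variable names are ours, not taken from the replication files):

```python
def group_outcomes(decisions):
    """decisions: list of 0/1 punishment decisions of one group's members
    after a defection. Returns (any_punisher, single_punisher) as 0/1."""
    n_punishers = sum(decisions)
    any_punisher = 1 if n_punishers >= 1 else 0      # outcome in Table S2
    single_punisher = 1 if n_punishers == 1 else 0   # outcome in Table S3
    return any_punisher, single_punisher

# One punisher counts for both outcomes; two punishers mean the defection
# was punished, but the second-order public good was over-provided.
print(group_outcomes([0, 1, 0]))  # (1, 1)
print(group_outcomes([1, 1, 0]))  # (1, 0)
print(group_outcomes([0, 0, 0]))  # (0, 0)
```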
The models were estimated without a constant so that the coefficients can be interpreted as deviations from zero (i.e., from a rate of 0.5). Goodness-of-fit measures are based on model estimations with a constant. The outcome variable in all models is 1 if the defection was punished by exactly one group member and 0 otherwise. Figure S6 is based on the estimates in model M10.

Figure S6: Single punisher rate

Finally, Table S4 lists the coefficient estimates from logit regression models of defections.
The outcome variable in all three models is 1 if a defection occurred, and it is 0 otherwise.
Unlike in the case of punishment rates, the latter two differences are somewhat obfuscated by the fact that model M12 does not distinguish between the first part of the experiment (without a penalty) and the second part (with a penalty). As mentioned in the main paper, the penalty matters a great deal with regard to defection. Model M13 shows that in the first part of the experiment (without a penalty), only the difference between the symmetric and asymmetric VOD is statistically significant (χ²(1) = 6.84, p = 0.009). As soon as a penalty is introduced in the second part, overall defection rates drop dramatically (87.7% vs. 40.9%; χ²(1) = 252.23, p < 0.001). From model M13 we can see that defection rates in the second part depend substantially on the experimental condition. Defection rates are higher in the symmetric MHD than in the asymmetric VOD (62.6% vs. 21.0%; χ²(1) = 32.77, p < 0.001); they are higher in the symmetric MHD than in the asymmetric MHD (62.6% vs. 41.4%; χ²(1) = 7.83, p = 0.005); and they are higher in the symmetric VOD than in the asymmetric VOD (38.6% vs. 21.0%; χ²(1) = 6.17, p = 0.013). The difference in defection rates between the asymmetric MHD and the symmetric VOD is statistically insignificant (41.4% vs. 38.6%; χ²(1) = 0.15, p = 0.698). The defection rates displayed in Figure 2 are based on model M13.

Model χ² statistics (df in parentheses): M11: 9.91(1)**; M12: 21.07(3)***; M13: 188.56(7)***.

Notes: The table lists coefficient estimates from logistic regression models and cluster-robust standard errors (*** p < 0.001, ** p < 0.01, * p < 0.05, for two-sided tests). The models were estimated without a constant so that the coefficients can be interpreted as deviations from zero (i.e., from a rate of 0.5). Goodness-of-fit measures are based on model estimations with a constant. The outcome variable in all models is 1 if a defection occurred and 0 otherwise. Figure 2 is based on the estimates in model M13.
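The cluster-robust standard errors used throughout account for the correlation of repeated decisions made by the same subject. For intuition only, here is a simplified pure-Python sketch of the sandwich idea applied to a simple proportion; the actual estimates come from cluster-robust variance estimation for logit models (Stata's vce(cluster)), not from this formula:

```python
def cluster_robust_se(clusters):
    """clusters: list of lists of 0/1 outcomes, one inner list per subject.

    Returns a cluster-robust standard error of the overall proportion:
    residuals are summed within each cluster before being squared, so
    positive within-subject correlation inflates the variance estimate.
    """
    obs = [x for c in clusters for x in c]
    n = len(obs)
    p_hat = sum(obs) / n
    # Sum of squared cluster-level residual totals (CR0 sandwich form,
    # without any small-sample correction).
    var = sum(sum(x - p_hat for x in c) ** 2 for c in clusters) / n**2
    return var ** 0.5

# Perfectly correlated decisions within the first two subjects yield a
# larger SE than treating all six observations as independent would.
se = cluster_robust_se([[1, 1], [0, 0], [1, 0]])
print(round(se, 4))  # 0.2357
```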