Third-party punishment (TPP)1,2,3,4,5,6,7, in which unaffected observers punish selfishness, promotes cooperation by deterring defection. But why should individuals choose to bear the costs of punishing? We present a game theoretic model of TPP as a costly signal8,9,10 of trustworthiness. Our model is based on individual differences in the costs and/or benefits of being trustworthy. We argue that individuals for whom trustworthiness is payoff-maximizing will find TPP to be less net costly (for example, because mechanisms11 that incentivize some individuals to be trustworthy also create benefits for deterring selfishness via TPP). We show that because of this relationship, it can be advantageous for individuals to punish selfishness in order to signal that they are not selfish themselves. We then empirically validate our model using economic game experiments. We show that TPP is indeed a signal of trustworthiness: third-party punishers are trusted more, and actually behave in a more trustworthy way, than non-punishers. Furthermore, as predicted by our model, introducing a more informative signal—the opportunity to help directly—attenuates these signalling effects. When potential punishers have the chance to help, they are less likely to punish, and punishment is perceived as, and actually is, a weaker signal of trustworthiness. Costly helping, in contrast, is a strong and highly used signal even when TPP is also possible. Together, our model and experiments provide a formal reputational account of TPP, and demonstrate how the costs of punishing may be recouped by the long-run benefits of signalling one’s trustworthiness.
Costly third-party punishment (TPP) is widely observed in laboratory1,2,3,4,7 and field5,6 experiments (although see ref. 12), and appears to be universal across cultures13. While collectively beneficial, TPP poses a puzzle: why should individuals incur the costs of punishment?
We propose an answer based on reputation4,14,15,16,17,18. Specifically, we introduce a game theoretic model of TPP as a costly signal of trustworthiness: if you see me punish selfishness, it can signal that I will not be selfish to you. Our model involves a partner-choice19 game with two roles. In each interaction, the ‘Signaller’ decides whether to send one or more costly signals; then the ‘Chooser’ decides whether to partner with the Signaller.
As with all costly signalling models8,9,10, our model is based on individual differences: two ‘types’ of Signallers differ in their quality as interaction partners. For trustworthy types, it is payoff-maximizing to cooperate when trusted; for exploitative types, it is payoff-maximizing to defect. Choosers thus benefit from partnering with trustworthy Signallers, but are harmed by partnering with exploitative Signallers.
Signallers’ types are fixed, but not directly observable. Therefore, Choosers must base their partner choice on the aforementioned costly signals. In each interaction, the Signaller’s cost of signalling is either small (less than the benefit of being chosen as a partner) or large (greater than the benefit of being chosen). It is thus beneficial to signal (in order to be chosen) when the cost is small, but not large. The key premise of costly signalling is that high-quality types are more likely to experience small signalling costs than low-quality types (and are thus more likely to signal). Therefore, signals convey information about Signallers’ types, and Choosers benefit from preferring partners who signal.
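The logic of this premise can be sketched numerically. The following is a minimal illustration (not the paper's actual model parameters; all numeric values are hypothetical) of why a Chooser benefits from conditioning acceptance on the signal when trustworthy types are more likely to draw small signalling costs:

```python
# Minimal sketch of the costly-signalling logic described above.
# All numeric values are hypothetical illustrations, not the paper's parameters.

def chooser_expected_payoff(p_trustworthy, p_small_given_T, p_small_given_E,
                            benefit_T, harm_E, condition_on_signal):
    """Chooser's expected payoff per interaction.

    Signallers signal exactly when their cost draw is small (small cost <
    benefit of being chosen < large cost), so the probability of observing
    a signal equals the probability of a small cost for that type.
    """
    p_T, p_E = p_trustworthy, 1.0 - p_trustworthy
    if not condition_on_signal:
        # Accept everyone: payoff is the unconditional mix of types.
        return p_T * benefit_T + p_E * (-harm_E)
    # Accept only Signallers who signal (i.e. who drew a small cost).
    p_signal = p_T * p_small_given_T + p_E * p_small_given_E
    if p_signal == 0:
        return 0.0  # no one ever signals, so no one is accepted
    # Bayes' rule: posterior that a signalling partner is trustworthy.
    post_T = p_T * p_small_given_T / p_signal
    # Payoff conditional on accepting, weighted by how often a partner signals.
    return p_signal * (post_T * benefit_T + (1 - post_T) * (-harm_E))

# Hypothetical numbers: half the population is trustworthy; trustworthy types
# draw small signalling costs 80% of the time, exploitative types 20%.
unconditional = chooser_expected_payoff(0.5, 0.8, 0.2, 1.0, 1.0, False)
conditional = chooser_expected_payoff(0.5, 0.8, 0.2, 1.0, 1.0, True)
print(unconditional, conditional)  # 0.0 0.3 — conditioning on the signal pays
```

Because the signal correlates with type, the posterior probability of trustworthiness given a signal exceeds the prior, which is what makes the Chooser's conditional strategy profitable.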
How does this relate to TPP and trustworthiness? We argue that TPP will typically be less net costly for trustworthy types (that is, individuals who find it payoff-maximizing to cooperate when trusted). Because TPP deters future harm against others, punishing may benefit the punisher (for example, via direct reciprocity from the victim of the punished transgression, or rewards from institutions or leaders seeking to promote cooperation); and these benefits should be larger for trustworthy types, because the same mechanisms11 that make trustworthy behaviour advantageous also increase the benefits of preventing harm against others. (This argument implies that the costly signalling mechanism we propose may interact positively with other mechanisms for TPP that are based on deterrence benefits.) Furthermore, because trustworthy types are more desirable to interact with than exploitative types, they typically attract more partners—which may reduce TPP costs by offering protection against retaliation and facilitating coordinated punishment20. See Supplementary Information sections 1.2.3 and 1.3 and Extended Data Fig. 1 for formal models of these two microfoundations for our central argument.
When TPP is less net costly for trustworthy types, it can serve as a costly signal of trustworthiness. Agents should thus sometimes punish for the express purpose of signalling their trustworthiness to Choosers (like a peacock’s tail signals genetic quality)—specifically, when the deterrence benefits of TPP are too small to outweigh the costs on their own (otherwise, TPP would occur without signalling), but the reputational benefit of appearing trustworthy makes TPP worthwhile.
Although TPP can convey information about type, there are often several possible ways to signal trustworthiness, and TPP is not always the most informative. Therefore, a crucial prediction of this signalling account is that when a more informative signal is available, the signalling value of TPP should be attenuated and less TPP should occur. To illustrate this fact, our model also includes the possibility of signalling via costly helping of third parties: because being trustworthy and helping both involve paying costs to benefit others, helping should typically be a very informative signal of trustworthiness (see Supplementary Information section 1.2.3).
Agents in our model make decisions across three different scenarios in which Signallers have the opportunity to engage in (1) TPP, (2) third-party helping, or (3) both. In each scenario, Choosers know which signals were available to Signallers. An agent’s strategy specifies her actions as both the Signaller and Chooser in each scenario.
Our equilibrium analysis identifies Nash equilibria that are robust against indirect invasion (RAII)21 (and thus likely to be favoured by natural selection; see Supplementary Information section 2.1). We also directly test which strategies are favoured by selection using stochastic evolutionary dynamics where agents interact at random to earn payoffs, strategies with higher payoffs become more common, and mutation maintains variation (see Supplementary Information section 3.1). This process can describe genetic evolution, as well as social learning whereby people imitate successful others.
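A stochastic evolutionary process of the kind described above can be sketched schematically. The following toy simulation (the 2x2 payoff matrix, population size, selection strength and mutation rate are hypothetical placeholders, not the paper's model) uses pairwise-comparison imitation with mutation, which can represent either genetic evolution or payoff-biased social learning:

```python
# Schematic sketch of stochastic evolutionary dynamics: agents interact at
# random to earn payoffs, imitate more successful others, and occasionally
# mutate. All parameters are hypothetical placeholders.
import math
import random

def simulate(payoff, n=100, steps=20000, beta=1.0, mu=0.01, seed=1):
    rng = random.Random(seed)
    pop = [rng.randrange(2) for _ in range(n)]  # strategies 0 and 1

    def avg_payoff(i):
        # Expected payoff of individual i against a random other member.
        s = pop[i]
        return sum(payoff[s][pop[j]] for j in range(n) if j != i) / (n - 1)

    for _ in range(steps):
        i = rng.randrange(n)
        if rng.random() < mu:
            pop[i] = rng.randrange(2)  # mutation maintains variation
            continue
        j = rng.randrange(n)
        # Fermi rule: imitate j with probability increasing in payoff difference.
        p_imitate = 1.0 / (1.0 + math.exp(-beta * (avg_payoff(j) - avg_payoff(i))))
        if rng.random() < p_imitate:
            pop[i] = pop[j]
    return sum(pop) / n  # final frequency of strategy 1

# Strategy 1 strictly dominates strategy 0 in this toy matrix, so selection
# should carry it to high frequency despite mutation.
freq = simulate([[1.0, 1.0], [2.0, 2.0]])
print(freq)
```

In the paper's actual analysis, the payoffs would be generated by the three-scenario signalling game rather than a fixed matrix, but the updating process has this general shape.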
We first consider scenario 1, where Signallers have the opportunity to punish but not help. Here a punishment-signalling strategy profile (in which Signallers punish when experiencing small signalling costs, and Choosers only accept Signallers who punish) can be an equilibrium when punishment is sufficiently informative: that is, when trustworthy types are sufficiently more likely to receive small punishment costs, and less likely to receive large punishment costs, than exploitative types (see Fig. 1a for precise conditions). Thus, we confirm that TPP can signal trustworthiness when it is the only available signal. By symmetry, the same is true for helping when it is the only available signal (scenario 2). See Supplementary Information section 2.2 for details.
What, then, happens in scenario 3 when TPP and helping are both possible? If helping is more informative, TPP may be ignored. To see why, consider a Signaller who punishes but does not help. If she does not have the opportunity to help, her choice to punish conveys positive information, and a Chooser might accept her. However, if helping is possible, her choice not to help conveys negative information—and when not helping is informative enough to outweigh the positive effect of punishing, the same Chooser might reject her.

To formalize this argument, we vary the informativeness of TPP and helping in scenario 3. We focus on the parameter region where both TPP and helping are informative enough to serve as signals on their own (that is, punishment-signalling and helping-signalling are the unique equilibria in scenarios 1 and 2, respectively; everywhere in Fig. 1b). We find that when the informativeness of the two signals is sufficiently similar, there are equilibria in which Signallers are equally likely to engage in TPP and helping, and Choosers equally demand TPP and helping. However, as the informativeness of helping increases, and/or the informativeness of TPP decreases, the unique equilibrium becomes an only-helping strategy profile in which helping is signalled and demanded, and TPP is ignored. Specifically, only-helping becomes the unique equilibrium when Choosers receive (i) a positive expected payoff from accepting any Signaller with a small helping cost (even if she has a large punishing cost), and (ii) a negative expected payoff from accepting any Signaller with a large helping cost (even if she has a small punishing cost). See Fig. 1b for precise conditions.
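Conditions (i) and (ii) can be illustrated numerically. The following sketch (with hypothetical probabilities and payoffs; cost draws for the two signals are treated as independent within type for simplicity) shows a case where helping is diagnostic enough of type that the punishment signal is outweighed either way:

```python
# Sketch of conditions (i) and (ii) for only-helping to be the unique
# equilibrium. Probabilities and payoffs are hypothetical illustrations.

def posterior_T(p_T, likelihood_T, likelihood_E):
    """P(trustworthy | observed cost profile), by Bayes' rule."""
    num = p_T * likelihood_T
    return num / (num + (1 - p_T) * likelihood_E)

def expected_payoff(post_T, benefit_T, harm_E):
    return post_T * benefit_T - (1 - post_T) * harm_E

# Hypothetical informativeness: helping costs are very diagnostic of type,
# punishment costs only mildly so.
p_T = 0.5
p_small_help = {'T': 0.9, 'E': 0.1}   # P(small helping cost | type)
p_small_pun = {'T': 0.6, 'E': 0.4}    # P(small punishing cost | type)
benefit_T, harm_E = 1.0, 1.0

# (i) Small helping cost but large punishing cost: still worth accepting?
post_i = posterior_T(p_T,
                     p_small_help['T'] * (1 - p_small_pun['T']),
                     p_small_help['E'] * (1 - p_small_pun['E']))
# (ii) Large helping cost but small punishing cost: still worth rejecting?
post_ii = posterior_T(p_T,
                      (1 - p_small_help['T']) * p_small_pun['T'],
                      (1 - p_small_help['E']) * p_small_pun['E'])

cond_i = expected_payoff(post_i, benefit_T, harm_E) > 0
cond_ii = expected_payoff(post_ii, benefit_T, harm_E) < 0
print(cond_i, cond_ii)  # True True with these illustrative numbers
```

When both conditions hold, the Chooser's best response depends only on the helping signal, so TPP carries no decision-relevant information and drops out of use.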
Critically, then, there are parameter regions in which it is an equilibrium to punish (and condition partner choice on punishment) in scenario 1 but not in scenario 3. Evolutionary dynamics show that as a result, TPP can evolve as a costly signal that is preferentially used when helping is not possible (Fig. 1c and Extended Data Fig. 2).
Our model thus makes clear predictions. First, when TPP is the only possible signal, it should be perceived as, and should actually be, an honest signal of trustworthiness. Second, when a more informative signal (for example, helping) is also available, third parties should be less likely to punish, and the perceived and actual signalling value of TPP should be attenuated. Third, the same should not be true of helping, which should continue to serve as a strong signal even when TPP is possible.
We next test these predictions in a two-stage economic game conducted using Amazon Mechanical Turk in which TPP and helping signals can be sent, and then partner choice occurs (Extended Data Fig. 3 illustrates the experimental setup, and Supplementary Information section 4 discusses the link between our theoretical and experimental setups). As in our model, there are two roles in this game: Signaller and Chooser.
In the first stage, the Signaller participates in a TPP game1 (TPPG), interacting with people other than the Chooser. In the TPPG, a Helper decides whether to share money with a Recipient, and an unaffected Punisher decides whether to pay to punish the Helper if the Helper is selfish. To investigate the three scenarios from the model in which helping, punishment or both are available as signals, we manipulate whether the Signaller participates in the TPPG as the Helper, Punisher or both (playing twice with two different sets of other people).
The second stage captures the psychology of partner choice using a trust game (TG). Here, both the Signaller and Chooser participate. The Chooser first decides how much of an endowment to send to the Signaller; any money sent is tripled. The Signaller then decides how much to return to the Chooser. The Chooser can condition her sending on the Signaller’s behaviour in the TPPG—and the Signaller knows this when deciding how to behave in the TPPG.
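The stage-2 payoff structure can be made concrete with a toy walk-through (the stakes below are hypothetical; the actual endowments are given in the paper's methods):

```python
# Toy walk-through of the stage-2 trust game, with hypothetical stakes.

def trust_game(endowment, fraction_sent, fraction_returned, multiplier=3):
    """Trust game: any money sent is tripled, then the Signaller returns a share."""
    sent = endowment * fraction_sent
    pot = sent * multiplier
    returned = pot * fraction_returned
    chooser_payoff = endowment - sent + returned
    signaller_payoff = pot - returned
    return chooser_payoff, signaller_payoff

# A Chooser who trusts fully paired with a Signaller who returns half the pot:
# both end up with more than the Chooser's initial endowment split would give.
chooser, signaller = trust_game(endowment=1.0, fraction_sent=1.0,
                                fraction_returned=0.5)
print(chooser, signaller)  # 1.5 1.5
```

This structure is what makes sending an act of trust: the Chooser profits only if the Signaller returns more than was sent, so the Chooser has reason to condition on any available signal of trustworthiness.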
Overall, therefore, our experiment is designed to include opportunities to signal via TPP and/or helping, and to make helping more informative than TPP (see Supplementary Information section 5.1 for further discussion).
The results confirm our theoretical predictions. First, in the punishment-only condition (where punishment is the only available signal, n = 397 Signaller–Chooser pairs), punishment is perceived by Choosers as a signal of trustworthiness: Choosers trust Signallers who punish in the TPPG more than those who do not (sending 16 percentage points more to punishers than non-punishers, P < 0.001, Fig. 2a). Furthermore, punishment actually is an honest signal of trustworthiness: Signallers who punish return significantly more in the TG than non-punishers (returning 8 percentage points more, P = 0.001, Fig. 2b). P values were generated using linear regression with robust standard errors; see Supplementary Information section 5.2.
Second, in the punishment-plus-helping condition (where helping is also possible, n = 393 Signaller–Chooser pairs), Signallers use punishment less than in the punishment-only condition: only 30% of Signallers punish in punishment-plus-helping, compared to 41% in punishment-only (P = 0.002, Fig. 2c). Furthermore, providing the option to help attenuates the perceived and actual signalling value of punishment: in the punishment-plus-helping condition, controlling for helping, Choosers trust punishers only slightly more than non-punishers (4 percentage points more sent to punishers than non-punishers, P = 0.004, Fig. 2a), and Signallers who punish in the TPPG do not return significantly more in the TG than non-punishers (0.3 percentage points less returned by punishers than non-punishers, P = 0.900, Fig. 2b). Thus the effects of punishment on trust and trustworthiness are significantly smaller in punishment-plus-helping than punishment-only (interactions: P < 0.001 and P = 0.016, respectively). See Supplementary Information section 5.2 for details.
Third, in the helping-only condition (n = 409 Signaller–Chooser pairs), just as many Signallers help (81%) as in the punishment-plus-helping condition (82%) (P = 0.650, Fig. 2c). Furthermore, in both conditions, Choosers preferentially trust Signallers who help (39 percentage points more sent to helpers in helping-only, P < 0.001; 37 percentage points more sent in punishment-plus-helping (controlling for TPP), P < 0.001, Fig. 2a), and Signallers who help are more trustworthy (25 percentage points more returned by helpers in helping-only, P < 0.001; 22 percentage points more returned in punishment-plus-helping (controlling for TPP), P < 0.001, Fig. 2b). These differences between conditions are not significant (interactions: P = 0.539 and P = 0.623, respectively). Thus, while helping attenuates the signalling value of TPP, TPP does not attenuate the signalling value of helping.
These results offer clear support for our model of TPP as a costly signal of trustworthiness. We therefore provide evidence that people may punish to provide information about their character to observers, rather than just to harm defectors or deter selfishness. This theory helps to reconcile conflicting previous experimental results about whether TPP confers reputational benefits. Our conclusion that the signalling value of TPP is mitigated when more informative signals of trustworthiness are also available explains why a large positive effect of punishment on trust was found in one experiment in which helping information was absent22, while little effect was found in another experiment in which helping was observable16. This conclusion also provides an explanation for why TPP and trustworthiness were found to be uncorrelated in an experiment in which both punishment and helping were possible23. Finally, our theory also explains why participants preferred punishers as partners to a greater extent in situations in which participants could benefit from choosing a prosocial partner24.
Our results cannot be explained by the alternative theory that TPP is perceived as a signal of willingness to retaliate when harmed directly (although TPP may signal retaliation in other contexts), because retaliation is not possible in the TG. Even if Choosers sent more to punishing Signallers out of an ‘irrational’ fear of retaliation, helping information should not attenuate this effect (as helping is unlikely to be a more informative signal of retaliation than TPP). Furthermore, an additional experiment (Extended Data Fig. 4 and Supplementary Information section 6) finds that TPP elicits larger reputational benefits when stage 2 is a TG than an ultimatum game (where signalling retaliatoriness is advantageous).
Importantly, punishers need not be consciously seeking to signal their trustworthiness; at a proximate level, TPP may be motivated by emotions like moral outrage1,3. Thus, TPP may be based on social heuristics25 rather than explicit reasoning, and is unlikely to be perfectly sensitive to context—signalling motives may ‘spill over’26 to settings where TPP cannot function as a signal (for example, anonymous interactions, or settings in which engaging with trustworthy Signallers is not actually advantageous to Choosers, such as the Dictator Game27).
Relatedly, while our model assumes that different types of individuals have different costs of TPP and helping, and different optimal responses to being trusted, our experiments do not vary subjects’ payoffs of punishing, helping and being trustworthy. Instead, the experiments tap into participants’ pre-existing inclinations to punish, help and reciprocate the trust of others, reflecting the incentives experienced in daily life25. Thus, because we do not exactly recreate the model in the laboratory, our results are consistent with the idea that the model operates outside of the laboratory (rather than merely showing that participants can reason strategically about a novel game).
In sum, we help answer a fundamental question regarding human nature: why do humans care about selfish behaviours that do not affect them personally? Although TPP may often appear ‘altruistic’, we show how punishing can be self-interested in the long-run because of reputational benefits. Sometimes punishing wrongdoers is the best way to show that you care.
We gratefully acknowledge the John Templeton Foundation for financial support; A. Bear, R. Boyd, M. Crockett, J. Cone, F. Cushman, E. Fehr, M. Krasnow, R. Kurzban, J. Martin, M. Nowak, N. Raihani, L. Santos, and A. Shaw for helpful feedback; and A. Arechar, Z. Epstein, and G. Kraft-Todd for technical assistance.
Nature Human Behaviour (2018)