Introduction

Understanding mechanisms of cooperation in a competitive environment remains an unresolved issue in the 21st century1. Although several mechanisms have been revealed to promote cooperation2, direct reciprocity is one of the most primitive and powerful foundations and can be observed in species other than humans3,4. The most powerful format for understanding direct reciprocity is the Prisoner’s Dilemma (PD)5,6,7. The PD has been researched in a wide range of fields, including physics, economics, psychology, and informatics, and it has become one of the common bases for understanding human behaviour.

In a typical (simultaneous) PD game, two players make their decisions at the same time. In the alternating PD, on the other hand, the two players take turns. Direct reciprocity, often encapsulated in the phrase “I will help you because you helped me”, is fundamental in the case of alternating actions. Both alternating and simultaneous actions are relevant for reciprocal altruism because, on the one hand, reciprocal cooperation in an actual situation has a behavioural time lag6, while on the other hand, mutual help exists when the actions of two players are simultaneous8. A theoretical study9 has shown the differences between the dominant strategies in alternating and simultaneous games.

What behavioural strategies are adaptive in a PD? The Prisoner’s Dilemma Tournament by Axelrod explored adaptive strategies for the best-known PD10,11. The study showed that the tournament winner was the Tit-For-Tat (TFT) strategy. A subsequent study showed that the Win-Stay-Lose-Shift (WSLS) strategy is adaptive in simultaneous games, and the Generous TFT is adaptive in alternating games9,12. Numerous theoretical studies have employed the PD to investigate various aspects, such as the effects of network structure13,14,15,16, incomplete information17, and individual differences18. Recent theoretical studies have highlighted the effects of game transitions19 and unexpected promotions of unidirectional social interactions20. Empirical research has also revealed the coupling between different domains within multilayer social structures21. However, only a few studies have examined how these theoretical models align with actual social environments and human behaviour22.

Do humans actually adopt the behavioural strategies that theory has shown to be adaptive? One of the earliest subject experiments analysed behavioural strategies in simultaneous games23. Surprisingly, the results showed that neither TFT nor WSLS was adaptive; instead, humans were relatively inclined to choose cooperation even after being defected against. Studies have also reported a trend of decreasing cooperation rates as the rounds progress24,25. Most studies have focused on the simultaneous game, and few experiments have analysed human behavioural strategies in the alternating game. We explore humans’ behavioural strategies in simultaneous and alternating games to clarify the differences between theoretical prediction and actual human behaviour.

In the basic structure of a PD, players choose between two actions: cooperation (C) or defection (D). A pioneering paper26 introduced a third option where a player opts out of the game, receiving a small, independent income. These non-participating players are called “loners.” Loners can thwart defectors and resolve the social dilemma. The role of loners has been extensively studied in both public goods games26,27,28,29,30,31,32 and PD games33,34,35,36. A simulation study37 revealed adaptive strategies for PDs with voluntary participation in simultaneous and alternating games. However, no study has analysed human behavioural strategies in PDs with voluntary participation. We clarify humans’ behavioural strategies in four games, obtained by combining the game type (the standard PD or the PD with voluntary participation) with the move order (simultaneous or alternating).

Results

First, we analyse the results of the standard PD. Figure 1 shows the time trends for the alternating game and the simultaneous game. As is clear from the figure, in all 20 rounds, the cooperation rate was higher in the alternating game. Fisher’s exact test for the total number of C and D occurrences in simultaneous and alternating games showed a significant difference (\(p<.001\)). The overall average cooperation rate is higher for alternating games.
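The comparison above rests on a Fisher’s exact test of a 2×2 table (game type by action). The following is a minimal sketch of the two-sided test in plain Python, assuming the table holds the total C and D counts for each game; the function name `fisher_exact_2x2` is introduced here for illustration and is not the analysis code used in the study.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be the game type (simultaneous, alternating) and columns
    the action (C, D). Returns the p-value: the total probability of all
    tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))
```

Calling `fisher_exact_2x2` with the four observed counts returns the two-sided p-value. The voluntary games have three actions (C, D, L), so they need an extension to 2×3 tables that this sketch does not cover.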

Fig. 1

Time trend of the mean cooperation rate: Solid and dotted lines indicate alternating and simultaneous games, respectively. The error bars show standard errors.

Table 1 shows the cooperation ratio \(P_{c(r=1)}\) in the first round and the behavioural strategy from the second round onward. The behavioural strategy is described by the player’s and the partner’s actions in the previous round; thus, the strategy space of the model consists of the player’s previous action (C, D) and the partner’s previous action (C, D). For example, \(P_{c(CD)}\) shows the cooperation ratio and variance in the current round under the condition that the player chose C and the partner chose D in the previous round. The same is true for the other combinations. The rightmost column shows the number of times each combination occurred. The simultaneous game results by Rapoport23 are \(P_{c(CC)}=.808\), \(P_{c(CD)}=.434\), \(P_{c(DC)}=.369\) and \(P_{c(DD)}=.223\), which are mostly consistent with our results. Interestingly, if the player’s strategy is TFT or WSLS, then \(P_{c(CD)}=0\) should be the case, but the player chooses cooperation in roughly half of the cases. Also, \(P_{c(DD)}\) is the lowest, suggesting that it is difficult to escape once a DD combination occurs. A similar trend is observed for the alternating game. The surprising result is that \(P_{c(CD)}\) exceeds 0.5. Contrary to theoretical expectations, people tended to cooperate after they had been exploited. In both games, players’ actions tended to be the same as their previous actions. The results indicate that actual human behaviour in the iterated PD deviates from theoretical predictions.
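The conditional cooperation ratios described above can be estimated from a player’s action history. The sketch below is one way to do so, assuming each round is recorded as a pair of 'C'/'D' actions regardless of move order; the function name `conditional_cooperation` and the data layout are assumptions made for illustration, not the study’s analysis code.

```python
from collections import defaultdict


def conditional_cooperation(own, partner):
    """Estimate first-round and memory-1 cooperation rates for one player.

    own, partner: equal-length, non-empty sequences of 'C' or 'D',
    one entry per round.
    Returns (first_round_cooperated, rates), where rates maps a state
    such as 'CD' (own previous action, partner's previous action) to a
    tuple (number of occurrences, cooperation rate in the next round).
    """
    if len(own) != len(partner) or len(own) == 0:
        raise ValueError("sequences must be non-empty and of equal length")

    counts = defaultdict(int)  # how often each previous-round state occurred
    coops = defaultdict(int)   # how often the player then chose C

    for t in range(1, len(own)):
        state = own[t - 1] + partner[t - 1]
        counts[state] += 1
        coops[state] += own[t] == "C"

    rates = {s: (counts[s], coops[s] / counts[s]) for s in counts}
    return own[0] == "C", rates
```

For a pure TFT player, the rate for state 'CD' returned by this function is 0, which is the theoretical benchmark the observed value of roughly one half is compared against. Pooling over participants would sum the counts per state before dividing.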

Table 1 Behavioural strategies in standard PD: The left and right tables show simultaneous and alternating games, respectively. The numbers in the second and third columns show the number of times the event occurred and the cooperation rate under that condition, respectively. Within each game, the numbers of occurrences of the \(P_{c(CD)}\) and \(P_{c(DC)}\) cases are always identical because if the actions in a round are (C, D) from one player’s viewpoint, they are always (D, C) from the opponent’s viewpoint.

We then analyse the voluntary PD. Figure 2 shows the time trends for the simultaneous and alternating games. A Fisher’s exact test for the total number of C, D, and L (loner) occurrences in simultaneous and alternating games showed no significant difference (\(p=.245\)). In addition, neither game showed a characteristic change over time.

Fig. 2

The time trend of the voluntary PD games: The red, blue, and green bars express the ratio of defection, cooperation and loner, respectively. Panel A shows simultaneous games and panel B shows alternating games.

Tables 2 and 3 show the behavioural strategies in the same way as Table 1. For example, the row \(P_{CL}\) covers the case where the player chose C and the partner chose L in the previous round; it shows the number of times the D, C, and L behaviours then appeared, their total, and the rate of occurrence of each behaviour. The same is true for the other combinations.

Table 2 Behavioural strategies in voluntary and simultaneous PD: The numbers in the table indicate the total number of behaviours that appeared in each condition and their rate of occurrence.
Table 3 Behavioural strategies in voluntary and alternating PD: The meaning of each number in the table is the same as in Table 2.

In voluntary games, the number of combinations describing a strategy increases, which makes the strategy difficult to read from a table, so it needs to be visualised. To examine the results, we employed a visualisation method developed by Yamamoto et al.37 that maps the ratio of all the players’ actions to an RGB colour chart (see panel (A) in Fig. 3). Each vertex of the triangle shows perfect domination of cooperation (blue), loners (green), or defection (red). The centre of the triangle shows that all three actions occur equally often. For example, if an action ratio is \((C, D, L) = (0.6, 0.2, 0.2)\), the colour on the graph is expressed as \((R, G, B) = (51, 51, 153)\) in the RGB colour chart. If the cooperation ratio equals one, \((R, G, B)\) becomes \((0, 0, 255)\) and is mapped in blue.
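The colour mapping can be written down in a few lines. The following sketch assumes a linear scaling of each ratio to the 0–255 range with rounding to the nearest integer, which reproduces the two examples given in the text; the function name `action_ratio_to_rgb` and the exact rounding rule are assumptions, not details taken from Yamamoto et al.37.

```python
def action_ratio_to_rgb(c, d, l):
    """Map action ratios (cooperation, defection, loner) to an (R, G, B) triple.

    Defection drives the red channel, the loner option the green channel,
    and cooperation the blue channel.
    """
    total = c + d + l
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")
    # Normalise so that raw counts can be passed as well as ratios.
    return tuple(round(255 * x / total) for x in (d, l, c))


# The worked example from the text: (C, D, L) = (0.6, 0.2, 0.2).
print(action_ratio_to_rgb(0.6, 0.2, 0.2))  # (51, 51, 153)
```

Because the function normalises its input, the per-condition counts from Tables 2 and 3 could be passed directly instead of ratios.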

Figure 3 visualises the strategies employed in each game using the results from Tables 2 and 3 (panels (C) and (E)). Panels (B) and (D) show the agent-based simulation results for games with the same payoff structure as in Yamamoto et al.37. The simulation results show that, in the alternating game, a strategy that can be described as “escape from interaction if a partner defected, or cooperate if a partner escaped from interaction” dominates the population. In the simultaneous game, on the other hand, WSLS is adopted. However, actual human behaviour deviates from the adaptive strategies shown in the simulation. In contrast to the simulation results, players tend to stick to their previous behaviour in both simultaneous and alternating games. This tendency is especially marked if they chose the loner option in the previous round. In addition, the round after DD tends to be non-cooperative in the simultaneous game, while the loner option is more likely to be chosen in the alternating game. Interestingly, the choice of the loner option following DD in the alternating game aligns with the outcomes predicted by the simulation. The simulation results show that adaptive strategies clearly differ depending on the game structure. The fact that these differences are not as pronounced in human behaviour has significant implications for our understanding of game theory and behavioural economics.

Fig. 3

Behavioural strategies in voluntary PD: Panels (B) and (C) show the simultaneous game, and panels (D) and (E) show the alternating game. Panels (B) and (D) show the results of the agent-based simulation37, and panels (C) and (E) show the results of the experiment conducted in this paper.

In the alternating games, the order of moves may influence behaviour. A Fisher’s exact test on the behaviour of the first and second movers in standard alternating games showed a significant trend (\(p=0.05\)). A two-tailed test (\(\alpha = 0.05\)) for residuals showed that the first mover cooperated significantly more often (\(z=2.015\), adjusted \(p=0.043\)). A Fisher’s exact test on the behaviour of the first and second movers in the voluntary alternating games was significant (\(p=0.005\)). A two-tailed test (\(\alpha =0.05\)) for residuals showed that the second mover was more likely to cooperate (\(z=3.107\), adjusted \(p=0.005\)) and the first mover was more likely to be a loner (\(z=2.699\), adjusted \(p=0.01\)). Second movers in standard PD may be less cooperative because they decide after observing the first mover’s action and can therefore defect at low risk. Conversely, no trend consistent with standard PD is evident in voluntary PD. The added complexity from the new behavioural option could influence behaviour, so its effect on move order requires further examination.
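A residual analysis of this kind is commonly carried out with adjusted standardised residuals of the contingency table. The sketch below assumes that approach, with rows as first and second movers and columns as actions; the text does not name the multiple-comparison correction behind the reported adjusted p-values, so this hypothetical `adjusted_residuals` function returns unadjusted two-tailed p-values only.

```python
from math import erfc, sqrt


def adjusted_residuals(table):
    """Adjusted standardised residuals and two-tailed p-values.

    table: list of rows of observed counts (e.g. rows = first and second
    mover, columns = actions). All row and column totals must be positive
    and smaller than the grand total.
    Returns (z, p) as nested lists with the same shape as the table.
    The p-values are NOT corrected for multiple comparisons.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)

    z, p = [], []
    for i, row in enumerate(table):
        z_row, p_row = [], []
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            variance = (expected
                        * (1 - row_sums[i] / n)
                        * (1 - col_sums[j] / n))
            z_ij = (observed - expected) / sqrt(variance)
            z_row.append(z_ij)
            # Two-tailed p-value under the standard normal distribution.
            p_row.append(erfc(abs(z_ij) / sqrt(2)))
        z.append(z_row)
        p.append(p_row)
    return z, p
```

A positive residual in a cell means that the action occurred more often for that mover than expected under independence.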

Discussion

Our study aimed to clarify human behavioural strategies in four variations of the PD: standard and voluntary participation, both in simultaneous and alternating formats. The results provide significant insights into the discrepancies between theoretical predictions and actual human behaviour. The analysis of standard PD games reveals a higher overall cooperation rate in alternating games than in simultaneous ones. This finding aligns with the hypothesis that alternating actions, which allow for direct reciprocity, promote higher cooperation.

Interestingly, our results indicate that humans often cooperate even after being defected against, contradicting the theoretical expectations for TFT and WSLS strategies. This behaviour, observed in both simultaneous and alternating games, suggests that humans are more forgiving and inclined towards cooperation than theoretical studies would predict. This tendency to cooperate after defection may reflect a more complex understanding of social interactions, where maintaining relationships could be valued over immediate retaliation.

In the voluntary games, introducing the “loner” option, where participants can opt out of the game, adds another layer of complexity. Our analysis shows no significant difference in the overall occurrence of cooperation, defection, and loner behaviours between simultaneous and alternating games. This finding contrasts with simulation studies that predict distinct adaptive strategies for each game type37. In human behaviour, however, the choice to become a loner after a defection suggests a strategy of avoiding further negative outcomes rather than immediately seeking reciprocity or retaliation.

Theoretically, cooperative behaviour is dominant in the payoff matrix employed in our experiments in both standard and voluntary PD. However, the average cooperation ratios in standard PD were 0.509 and 0.700 for simultaneous and alternating games, respectively. On the other hand, the cooperation ratios in voluntary PD were 0.428 and 0.446, respectively, which were lower than those in standard PD. One possible explanation for the persistently low cooperation rates in voluntary PD could be that introducing the loner option may have distorted participant behaviour. This result may indicate that the decoy effect38,39 distorted the behaviour or that risk-averse behaviour was chosen40,41,42. However, our experiments do not permit a detailed analysis of the mechanisms associated with the loner option, necessitating future experiments be designed to explore these possibilities.

The observed deviations between human behaviour and theoretical predictions in PD games with voluntary participation have significant implications. The results suggest that human social strategies are influenced by a broader range of factors, including the desire to avoid conflict and the inclination towards forgiveness and cooperation, even in competitive environments. These findings require a reassessment of current game theory and behavioural economics models to better account for human decision-making’s nuanced and often context-dependent nature.

Our results provide noteworthy insight into the role of forgiveness in the evolution of cooperation. Several studies have shown that people who punish non-cooperation are not always positively evaluated43,44,45,46. On a theoretical level, the punishment of free riders is deemed necessary to uphold cooperation. However, our research reveals a paradox: people do not always respond positively to punishment, which complicates the maintenance of cooperation. Our results suggest we must actively consider the influence of “forgiveness,” a human quality of tolerance, on people’s cooperative behaviour. The biblical passage “If anyone slaps you on the right cheek, turn to them the other cheek also” may have profound implications for the evolution of cooperation.

The limitations of this study need to be noted. Our experiments deal with memory-1 strategies, whereas many theoretical studies have tested the effects of longer memories47,48. Although longer memory-n strategies are a significant extension for the evolution of cooperation, sufficient data are not easy to collect in the current experimental settings, because experiments with human participants produce large variations in the number of times each behavioural combination occurs. This variation becomes particularly noticeable as the memory length increases. An approach that integrates mathematical models and subject experiments is also needed49.

Future research should explore the underlying psychological mechanisms driving these deviations from theoretical strategies. Understanding the role of trust, reputation, and long-term relationship building in PD games could provide deeper insights into the observed behaviours. Moreover, experiments incorporating more diverse social and game structures19,20,21 could help generalise these findings and refine existing models. In conclusion, our study highlights the importance of considering human behavioural nuances in game theoretical models. The discrepancies between theoretical predictions and actual human behaviour underscore the need for more comprehensive approaches to understanding cooperation in competitive environments.

Methods

We implemented four types of games, combining two game structures (standard PD and voluntary PD) and sequences of actions (alternating game and simultaneous game). Standard PD is a well-known game in which players have two action choices: cooperation or defection. Voluntary PD is a game in which the player can choose from three actions: cooperation, defection, or loner. In an alternating game one player decides an action, and the partner observes the action and decides the next action. In a simultaneous game, the two players decide their actions simultaneously. The detailed experimental procedure is as follows.

We conducted our experiment in a 2×2 between-subjects design with game type (standard and voluntary) and move order (simultaneous and alternating) as factors. We recruited 689 participants (female=191; age mean=49.3, SD=11.3) using a Japanese crowdsourcing service (http://crowdsourcing.yahoo.co.jp/) and randomly assigned them to one of the four conditions. The 394 participants who completed the experiment were included in the analysis.

The experiment was conducted on 4th and 10th June 2024. We developed these experimental systems using oTree50. Each game lasted 20 rounds, but participants were not informed of the end condition.

In the simultaneous game, the result and gain were displayed after both players had decided their actions, and the pair proceeded to the next round. In the alternating game, the 1st and 2nd movers were randomly assigned at the beginning of the game, and each time a player decided an action, the opponent was notified of it. Once the 2nd mover had decided in each round, the gain for that round was determined, and each player was notified. These conditions were the same for both standard and voluntary games.

Table 4 shows the payoff matrix of the games. The payoff matrix is symmetric, and the numbers in the table represent the payoffs of player A. In the voluntary game, if either player chose loner, both players gained 3 pts, regardless of the other’s choice. Final rewards were calculated as follows. In addition to the show-up fee, participants received the cumulative payoffs multiplied by 0.75 and rounded up to the nearest JPY. Participants with negative cumulative payoffs received only the show-up fee. Participants received up to 150 JPY for the cumulative gain over 20 rounds.
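The reward rule can be summarised as a short function. This is a sketch under two assumptions: per-round payoffs are integers, so the cumulative payoff is an integer, and the show-up fee, whose amount is not stated in the text, is passed in as a parameter. The name `final_reward_jpy` is introduced here for illustration.

```python
def final_reward_jpy(cumulative_points, show_up_fee_jpy):
    """Final reward in JPY for one participant.

    cumulative_points: integer sum of the payoffs over all rounds
        (assumed to be an integer).
    show_up_fee_jpy: the show-up fee; its amount is not given in the
        text, so it is a parameter here.
    """
    if cumulative_points < 0:
        # Negative cumulative payoffs earn only the show-up fee.
        return show_up_fee_jpy
    # ceil(0.75 * points), computed in exact integer arithmetic.
    bonus = -(-3 * cumulative_points // 4)
    return show_up_fee_jpy + bonus
```

Under this rule, a cumulative payoff of 200 pts yields the stated maximum of 150 JPY on top of the show-up fee, and 20 rounds at the loner payoff of 3 pts (60 pts in total) yield 45 JPY.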

Table 4 The payoff matrices of the standard and voluntary games. The left and right tables show the standard and voluntary games, respectively.