Cooperation among rational agents in co-action equilibrium of Prisoner's Dilemma and other single-stage symmetric games

The conventional solution concept used for solving non-cooperative games is that of the Nash equilibrium - a strategy choice by each player so that no player can do better by deviating unilaterally from it. In this paper, we propose an alternative framework referred to as the co-action equilibrium for solving such games. This equilibrium is guaranteed to exist for all games having a symmetric payoff structure. It also has the advantage of being unique for a given game. We analyze in detail three well-known two-person single-stage games, viz., Prisoner's Dilemma (PD), Chicken and Stag Hunt, to illustrate the differences between Nash and co-action solutions. The latter, in general, lead to "nicer" strategies being selected by the agents resulting in globally more efficient outcomes. For example, the co-action equilibrium in PD corresponds to full cooperation among agents at lower values of temptation to defect, while for higher temptation each agent employs a probabilistic (or mixed) strategy, thus essentially solving the dilemma. The key idea underlying the co-action solution is that agents make independent choices from the possible actions available, taking into account that other agents will behave the same way as them and they are also aware of this. It defines a new benchmark strategy for agents in non-cooperative games which is very different from the existing ones. The concept can be generalized to game situations where the symmetry assumption does not hold across all agents by clustering players into different symmetry groups, which results in a novel class of games.


I. INTRODUCTION
Strategic interactions occur all around us in a multitude of forms between autonomous agents. These interacting agents could correspond to individual humans or animals or even computer algorithms, as well as collective entities such as groups, organizations or nations. Analyzing their interactions in terms of games [1] is a promising approach for understanding the behavior of a wide variety of socio-economic and biological systems, and finds applications in fields ranging from economics and political science to computer science and evolutionary biology [2]. A game is described by the set of all possible actions by a specified number of agents, where each possible combination of actions is associated with a payoff for each agent. Thus, the payoff received by an agent depends on her choice of action, as well as that of others. The standard assumption used in game theory is that agents are rational and selfish. Further, they want to maximize their individual payoffs and can implement their decisions without any error. In addition, every agent knows that all agents satisfy these criteria (for a detailed discussion of these ideas see, e.g., Ref. [3]).
In order to solve a game, i.e., to find the set of actions that the agents will employ given the structure of the game, one also needs a solution concept that forms the basis for strategy selection by the agents. For solving non-cooperative games (where agents choose their actions independently without communicating with other agents), the canonical solution concept employed is that of the Nash equilibrium. Informally, it is defined as the set of actions chosen by the agents where no agent can gain by unilaterally deviating from this equilibrium [4]. Nash equilibria exist for all games having a finite number of agents choosing from a finite set of actions, making it a very general concept that has wide applicability [5]. Indeed, the concept has been central to various attempts at developing quantitative descriptions of socio-economic phenomena [6]. However, analyzing specific games using the concept of Nash equilibrium can raise the following issues: (i) A game may have more than one Nash equilibrium and hence, deciding which of these will be adopted by rational agents is a non-trivial problem [7]. Additional criteria need to be provided for selecting an equilibrium; however, their success is not always guaranteed [8]. (ii) The Nash equilibrium of a game may sometimes be inferior to an alternate choice of actions by the agents in which all the parties get higher payoffs. This gives rise to the dilemmas in games such as Prisoner's Dilemma (PD) [9], Traveler's Dilemma [10,11], etc. For example, in PD, where each agent has the option of either cooperating with or defecting against the other agent, mutual defection is the only Nash equilibrium, although mutual cooperation would result in higher payoffs for both agents. Results of experimental realizations of such games also show deviation from the Nash solutions [9,12].
That rational action by individual agents can result in an undesirable collective outcome for the agents is a longstanding puzzle and a considerable body of research literature exists devoted to understanding it [13].
In this paper we propose a novel solution paradigm referred to as co-action equilibrium for payoff-symmetric games such as PD, in which the optimal action of rational agents is markedly different from the Nash equilibrium. The concept of co-action equilibrium, which was originally introduced in the restricted context of minority games [14], is generalized here to analyze all single-stage games with two actions per agent, where the payoff structure is unchanged on exchanging the identities of the agents (payoff symmetry). We primarily focus on two-person games, with agents playing the game once (in contrast to repeated games where agents can interact many times in an iterative manner) and analyze in detail three well-known instances, viz., PD, Chicken (also referred to as Snowdrift or Hawk-Dove) and Stag Hunt. We describe the differences between Nash and co-action solutions for these games, with the latter, in general, leading to "nicer" strategies being selected by the agents. For example, the co-action equilibrium in PD corresponds to full cooperation among agents at lower values of temptation to defect, while for higher temptation each agent employs a probabilistic (or mixed) strategy. Thus, co-action typically results in more globally efficient outcomes, reconciling the apparent conflict between individual rationality and collective benefit. The co-action solution of a game has the additional benefit that the equilibrium is unique and therefore does not have the problem of equilibrium selection. We also discuss how the concept can be extended to other scenarios, such as symmetric games involving several players, or even non-symmetric games when agents can be grouped into clusters with symmetry holding within each. In fact, the latter case can be seen as defining a new class of games between players, where each "player" represents a group of agents who independently choose the same strategy.

II. THE CO-ACTION EQUILIBRIUM
To describe the co-action solution concept, we consider the general case of a payoff-symmetric, two-person game where each agent (say, A and B) has two possible actions (Action 1 and Action 2) available to her. Each agent receives a payoff corresponding to the pair of choices made by them. If both agents choose the same option, Action 1 (or 2), each receives the payoff R (or P, respectively), while if they opt for different choices, the agent choosing Action 1 receives payoff S while the other receives T. Thus, the game can be represented by a payoff matrix that specifies all possible outcomes (Fig. 1). An agent may employ a mixed strategy, in which she randomly selects her options, choosing Action 1 with some probability p (say) and Action 2 with probability (1 − p). A pure strategy corresponds to p being either 0 or 1. A Nash equilibrium for a game can be in pure strategies or in mixed strategies. As noted earlier, a given game may have more than one Nash equilibrium, possibly involving mixed strategies. Assuming that agent A (B) chooses Action 1 with probability p_1 (p_2) and Action 2 with probability 1 − p_1 (1 − p_2, respectively), their expected payoffs are

W_A = p_1 p_2 R + p_1 (1 − p_2) S + (1 − p_1) p_2 T + (1 − p_1)(1 − p_2) P,
W_B = p_1 p_2 R + p_2 (1 − p_1) S + (1 − p_2) p_1 T + (1 − p_1)(1 − p_2) P.    (1)

The symmetry of the game is reflected in the fact that W_A and W_B are interchanged on exchanging p_1 with p_2.
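As a concrete illustration, the expected payoffs above can be computed directly from the payoff matrix of Fig. 1. The following is a minimal Python sketch; the function name `expected_payoffs` and the numerical payoff values are our own illustrative choices, not part of the original analysis:

```python
# Expected payoffs for the symmetric two-person game of Fig. 1.
# Agent A plays Action 1 with probability p1, agent B with probability p2.

def expected_payoffs(p1, p2, R, S, T, P):
    """Return (W_A, W_B) for the payoff matrix R, S, T, P of Fig. 1."""
    W_A = p1*p2*R + p1*(1 - p2)*S + (1 - p1)*p2*T + (1 - p1)*(1 - p2)*P
    W_B = p1*p2*R + p2*(1 - p1)*S + (1 - p2)*p1*T + (1 - p1)*(1 - p2)*P
    return W_A, W_B

# Symmetry of the game: exchanging p1 and p2 interchanges W_A and W_B.
wa, wb = expected_payoffs(0.3, 0.8, R=3, S=0, T=5, P=1)
wa_swapped, wb_swapped = expected_payoffs(0.8, 0.3, R=3, S=0, T=5, P=1)
assert abs(wa - wb_swapped) < 1e-12 and abs(wb - wa_swapped) < 1e-12
```

The final assertion is exactly the symmetry property stated in the text: the two payoff functions are interchanged on exchanging p_1 with p_2.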
It is easily seen that if a mixed strategy Nash equilibrium exists, it is the same for both agents and is given by the probabilities

p*_1 = p*_2 = (P − S) / [(P − S) + (R − T)],

obtained from the condition that each agent is indifferent between her two actions given the other's strategy. The Nash solution assumes that all agents are rational and that each agent knows the planned equilibrium strategies of the other agents. Furthermore, a unilateral deviation in strategy by one of them will not change the strategy choice of others (who are assumed to be just as rational as the one who deviated!). Although this latter assumption is deeply embedded in standard game theory, it is not a necessary component of rational behavior.
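The indifference condition behind the mixed strategy Nash equilibrium can be checked numerically: at the equilibrium probability, a player's two pure actions earn the same expected payoff against the opponent. A short Python sketch (function names and numerical payoffs are ours):

```python
# Mixed strategy Nash equilibrium from the indifference condition:
# against an opponent playing Action 1 with probability q, a player is
# indifferent between her two actions when
#   q*R + (1-q)*S = q*T + (1-q)*P,
# giving q = (P - S) / ((P - S) + (R - T)).

def mixed_nash(R, S, T, P):
    return (P - S) / ((P - S) + (R - T))

def payoff_action1(q, R, S):
    # Expected payoff of pure Action 1 vs an opponent playing Action 1 w.p. q
    return q * R + (1 - q) * S

def payoff_action2(q, T, P):
    # Expected payoff of pure Action 2 vs the same opponent
    return q * T + (1 - q) * P

# Chicken payoffs (T > R > S > P, with P = 0 as assumed later in the text)
R, S, T, P = 3, 1, 5, 0
q = mixed_nash(R, S, T, P)
assert abs(q - S / (T + S - R)) < 1e-12   # matches the form quoted in Sec. III B
assert abs(payoff_action1(q, R, S) - payoff_action2(q, T, P)) < 1e-12
```

With P = 0 the general expression reduces to S/(T + S − R), the form used later for Chicken.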
Here, in the co-action concept, we argue that two rational agents facing a symmetric situation will adopt the same strategy. Thus, by virtue of the symmetry of the game, each agent will argue that whatever complicated processes she employs in arriving at the optimal decision, the other agents will choose the same strategy as they have the same information and capabilities. It is important to note that this does not require any communication between the agents nor does it invoke the existence of trust or other extraneous concepts. Rather, it arises from the fact that both agents are equally rational and being in a symmetric situation, will reach the same conclusion about the choice of action; moreover, they realize and consider this in deciding their strategy. Similar arguments were used earlier by Hofstadter to suggest that rational agents will always cooperate in single-stage PD [15]. However, unlike these earlier arguments, the co-action concept does not imply that both agents will necessarily end up choosing the same action. For instance, the co-action solution for the single-stage PD is not to always cooperate (as suggested by Hofstadter's argument) but to resort to a mixed strategy when the temptation to defect is sufficiently high.
As in the co-action concept each agent maximizes her payoff assuming that all other agents in a symmetric situation will make the same decision, formally this amounts to optimizing the expected payoff functions of the two agents, which in this case are identical:

W = p^2 R + p(1 − p)(S + T) + (1 − p)^2 P.    (2)

Here p (= p_1 = p_2) is the probability with which each of the agents A and B chooses Action 1. Under the co-action concept, the equilibrium strategy p* of the agents is obtained by maximizing W with respect to p ∈ [0, 1]. If the maximum of the function W in [0, 1] occurs at one of the ends (i.e., p = 0 or 1), it results in a pure strategy co-action equilibrium. However, if W has a maximum inside (0, 1), then the co-action equilibrium is a non-trivial mixed strategy, viz.,

p* = (S + T − 2P) / [2(S + T − R − P)].    (3)

The existence of the co-action equilibrium for all symmetric games is guaranteed, as the polynomial payoff function of Eq. (2) is continuous and therefore attains its maximum on the closed interval [0, 1]. Also, unlike the Nash equilibrium, the co-action equilibrium is unique and thus, for a given symmetric game, there is no ambiguity about the optimal choice of action for the agents in this solution concept.
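The maximization of W over [0, 1] can also be carried out numerically, which provides a check on the closed-form mixed strategy. A Python sketch (grid search is our illustrative method, not the paper's; names and payoff values are ours):

```python
def coaction_payoff(p, R, S, T, P):
    # Eq. (2): identical expected payoff when both agents play Action 1 w.p. p
    return p*p*R + p*(1 - p)*(S + T) + (1 - p)*(1 - p)*P

def coaction_equilibrium(R, S, T, P, grid=100001):
    # Maximize W over p in [0, 1] by a dense grid search (illustrative only;
    # the closed form of Eq. (3) is checked against it below)
    return max((i/(grid - 1) for i in range(grid)),
               key=lambda p: coaction_payoff(p, R, S, T, P))

# An interior maximum (here, PD-type payoffs with T > 2R) matches Eq. (3)
R, S, T, P = 3, 0, 7, 1
p_star = coaction_equilibrium(R, S, T, P)
assert abs(p_star - (S + T - 2*P)/(2*(S + T - R - P))) < 1e-3
```

When the maximum falls at an end-point of [0, 1], the same search returns the pure strategy p* = 0 or 1 instead of the value given by Eq. (3).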

III. EXAMPLES
Having described the general procedure for obtaining the co-action equilibrium, we will now apply it to three well-known two-person symmetric games, illustrating in each case the differences between the co-action and Nash equilibria. Each of these games is defined in terms of a specific hierarchical relationship between the payoffs R, S, T and P (using the terminology of the payoff matrix shown in Fig. 1).

A. Prisoner's Dilemma
Prisoner's Dilemma (PD) is one of the most well-studied games in the literature of strategic choices in social sciences and evolutionary biology [16,17]. It is the canonical paradigm for analyzing the problems associated with the evolution of cooperation among selfish individuals [18]. The game represents a strategic interaction between two agents who have to choose between cooperation (Action 1) and defection (Action 2). If both players decide to cooperate, each receives a "reward" payoff R and if both players decide to defect, then each receives a "punishment" payoff P. If one of the players decides to defect and the other to cooperate, then the former gets a payoff T (often termed the "temptation" to defect) and the latter gets the "sucker's payoff" S.
In PD the hierarchical relation between the different payoffs is T > R > P > S. The only Nash equilibrium for this game is when both agents choose defection (with each receiving payoff P), as unilateral deviation by an agent would yield a lower payoff (S) for her. Note that mutual defection remains the only Nash solution even if the game is played repeatedly between the players a finite number of times. However, it is easy to see that mutual cooperation would have resulted in a higher payoff (R) for both agents. This illustrates the apparently paradoxical aspect of the Nash solution for PD where pursuit of self-interest by rational agents leads to a less preferable outcome for all parties involved. The failure on the part of the agents - who have been referred to as "rational fools" [19] - to see the obviously better strategy is at the core of the dilemma and has important implications for the social sciences, including economists' assumptions about the efficiency of markets [20]. Further, experimental realizations of PD show that some degree of cooperation is achieved when the game is played by human subjects, which is at variance with the Nash solution [9,12,21].
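The best-response logic behind this Nash argument can be made explicit in a few lines of Python. The dictionary encoding and the payoff values (satisfying T > R > P > S) are our illustrative choices:

```python
# Best-response check for PD pure strategies, with T > R > P > S.
R, S, T, P = 3, 0, 5, 1
payoff = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}

def is_nash(a, b):
    # Neither player gains by a unilateral deviation from (a, b)
    other = {'C': 'D', 'D': 'C'}
    return (payoff[(a, b)] >= payoff[(other[a], b)] and
            payoff[(b, a)] >= payoff[(other[b], a)])

assert is_nash('D', 'D')                      # mutual defection is Nash
assert not is_nash('C', 'C')                  # mutual cooperation is not
assert payoff[('C', 'C')] > payoff[('D', 'D')]  # yet it pays both agents more
```

The three assertions together express the dilemma: the unique pure-strategy Nash equilibrium is Pareto-inferior to mutual cooperation.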
In more general terms, PD raises questions about how cooperation can emerge in a society of rational individuals pursuing their self-interest [18] and there have been several proposals to address this issue. These have mostly been in the context of the iterative PD (rather than the single-stage game that we are considering here) and typically involve going beyond the standard structure of the game, e.g., by introducing behavioral rules such as direct or indirect reciprocity [22], assuming informational asymmetry [23], etc. By contrast, the co-action solution concept allows non-zero levels of cooperation in the standard single-stage PD played among rational selfish agents, with the degree of cooperation depending on the ratio of temptation T to reward R.
To obtain the co-action solution of PD, we use the formalism described in Section II, with the value of the lowest payoff S assumed to be zero without loss of generality. From Eq. (2) and the hierarchical relation among the payoffs T, R and P for PD, it is easily seen that when T ≤ 2R, the optimal strategy for the agents is p* = 1, i.e., both agents always cooperate. On the other hand, when the temptation to defect T > 2R, the optimal strategy is a mixed one, with the probability of cooperation given by p* = (T − 2P)/[2(T − R − P)] [Eq. (3) with S = 0], i.e., the agents randomly choose between the available actions, defecting with probability 1 − p*.

[Caption of Fig. 2: For low values of T (corresponding to the temptation to defect in PD and to be aggressive in Chicken), the agents always opt for Action 1 (corresponding to cooperation in PD and being docile in Chicken). However, as T increases, agents opt for a mixed strategy, where Action 1 is chosen with decreasing probability. In both cases, in the limit of very high T, the agents' strategy becomes fully random, with the two actions being chosen with equal probability. Note that in PD, the optimal strategy also has a very weak dependence on P (corresponding to the punishment payoff for mutual defection).]

As temptation keeps increasing, the probability of cooperation decreases and in the limit T → ∞, p* → 1/2, i.e., the agents choose to cooperate or defect with equal probability, receiving an expected payoff W* → T/4. Thus, unlike the Nash solution of PD where cooperation is not possible, the co-action solution of the game always allows a non-zero level of cooperation, with 1/2 < p* < 1 [Fig. 2(a)]. The co-action solution also differs from the result expected based on the reasoning given in Ref. [15] (essentially a collective rationality argument [24]), which suggested that rational agents will always cooperate. To the best of our knowledge, co-action is the first solution concept that allows probabilistic cooperation by the players in the single-stage PD. The existence of a non-zero level of cooperation in the co-action solution means that there is no longer any incompatibility between the individual actions of rational agents trying to maximize their payoffs and achieving the best possible collective outcome, thereby resolving the "dilemma" in PD. The co-action concept may be used to solve other games involving similar dilemmas, such as Traveler's Dilemma [10]. We would like to emphasize that in both the Nash and co-action frameworks, the agents consider other agents to be just as rational as themselves, although the conclusions about the optimal strategy under these two concepts are very different. It is of interest to note in this context that in the various experimental realizations of PD, the level of cooperation observed is neither zero (as in the Nash solution) nor complete [21]. While the co-action solution may perhaps be too idealized to explain the results obtained under realistic conditions, it nevertheless provides a new benchmark strategy for such game situations.
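The co-action solution of PD described above can be summarized, under the stated assumption S = 0, in the following sketch (the function name and payoff values are our illustrative choices):

```python
def coaction_pd(R, T, P):
    # Co-action solution of PD (S = 0): full cooperation for T <= 2R,
    # otherwise the mixed strategy of Eq. (3) with S = 0
    if T <= 2*R:
        return 1.0
    return (T - 2*P) / (2*(T - R - P))

assert coaction_pd(R=3, T=5, P=1) == 1.0       # T <= 2R: always cooperate
p = coaction_pd(R=3, T=10, P=1)                # T > 2R: mixed strategy
assert 0.5 < p < 1.0                           # cooperation stays above 1/2
assert abs(coaction_pd(R=3, T=10**9, P=1) - 0.5) < 1e-6   # T -> infinity limit
```

The middle assertion reflects the claim in the text that R > P guarantees 1/2 < p* < 1 whenever the mixed branch applies, and the last one reproduces the T → ∞ limit p* → 1/2.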

B. Chicken
A two-person game that has been extensively investigated in the context of the study of social interactions and evolutionary biology is the game of Chicken (also referred to as Snowdrift or Hawk-Dove) [16,25]. It represents a strategic interaction between two agents who have to choose between being docile (Action 1) or being aggressive (Action 2). If both agents decide to be docile, they receive the payoff R, while if one is docile when the other resorts to aggression, the former -considered the "loser" -receives a lower payoff S (< R) and the latter -the "winner" -receives a higher payoff T (> R). However, the worst possible outcome corresponds to when both players choose to be aggressive, presumably resulting in severe damage to both, which is associated with the lowest payoff P . Thus, the hierarchical relation between the different payoffs in Chicken is T > R > S > P . Note that it differs from PD in that the payoff S is higher than P . Therefore, an agent benefits by being aggressive only if the other is docile but is better off being docile otherwise, as the cost of mutual aggression is high.
The game has three Nash equilibria, of which two correspond to pure strategies where one agent is docile while the other is aggressive. The mixed strategy Nash equilibrium, p*_1 = p*_2 = S/(T + S − R), follows from the indifference condition on the agents' expected payoffs, where it is assumed that the lowest of the possible payoffs, P, is zero [see Fig. 2(b)]. As in many other non-cooperative games with multiple Nash equilibria, one has to invoke additional criteria (viz., equilibrium refinements [7]) to decide which of these solutions will be selected by the agents. In Chicken, a commonly used refinement concept is that of the evolutionarily stable strategy (ESS) [25], an important concept in evolutionary game theory [26] which, in this game, gives the mixed strategy Nash equilibrium as the unique solution.
To obtain the co-action solution for Chicken, we note that under this solution concept, agents choose their actions so as to optimize the payoff function of Eq. (2). Using the hierarchical relation of the payoffs for Chicken (assuming the lowest payoff P is zero without loss of generality), it is easy to see that for 2R ≥ T + S, p* = 1 is the optimal choice. On the other hand, when 2R < T + S, agents choose to be docile with a probability p* = (S + T)/[2(S + T − R)] [Eq. (3) with P = 0]. Thus, for low values of T, both agents always decide to be docile (non-aggressive) and avoid damaging each other, whereas, when the stakes are high (for large T), they randomly choose between the available actions, being docile with probability p* and aggressive with probability 1 − p*. As in PD, in the limit of large T, i.e., T → ∞, the optimal strategy is p* → 1/2, where the agents choose to be aggressive or docile with equal probability, receiving an expected payoff W* → T/4. It is instructive to compare the optimal strategy of the agents under the different solution concepts when the stakes are very high. In the limit of T → ∞, the ESS suggests that both agents should resort to mutual aggression [i.e., p*_1 = p*_2 = S/(T + S − R) → 0]. This would result in both agents suffering serious damage and receiving the lowest possible payoff P. Compared to this, the co-action concept yields a significantly better outcome for both agents, as noted above.
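The contrast between the ESS and co-action predictions for large T can be checked numerically. A sketch assuming P = 0 as in the text (function names and payoff values are ours):

```python
def chicken_coaction(R, S, T):
    # Co-action solution of Chicken (P = 0): docile for 2R >= T + S,
    # otherwise the mixed strategy of Eq. (3) with P = 0
    if 2*R >= T + S:
        return 1.0
    return (S + T) / (2*(S + T - R))

def chicken_mixed_nash(R, S, T):
    # Mixed Nash / ESS probability of being docile (P = 0)
    return S / (T + S - R)

# For large T the ESS probability of being docile vanishes (mutual
# aggression), while the co-action strategy stays docile half the time.
R, S = 3, 1
assert chicken_coaction(R, S, T=4) == 1.0          # low stakes: always docile
assert chicken_mixed_nash(R, S, T=10**6) < 1e-5    # ESS: p* -> 0
assert abs(chicken_coaction(R, S, T=10**6) - 0.5) < 1e-5   # co-action: p* -> 1/2
```

The last two assertions are exactly the comparison made in the text: in the high-stakes limit the ESS drives both agents toward mutual aggression, whereas co-action keeps the probability of docility near 1/2.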

C. Stag Hunt
The last of the two-person games we discuss here is the Stag Hunt which is used to describe many social situations where cooperation is required to achieve the best possible outcome [27]. The game represents a strategic interaction between two agents who have to choose between a high-risk strategy having potentially large reward, viz., hunting for stag (Action 1), or a relatively low-risk, but poor-yield, strategy, viz., hunting for hare (Action 2). The agents can catch a stag (which is worth more than a hare) only if they both opt for it, i.e., cooperate, thereby receiving the highest payoff R. However, being unsure of what the other will do, they may both choose the safer option of hunting hare, which can be done alone, so that each receives a lower payoff P. If, instead, one agent chooses to hunt stag while the other decides to hunt hare, the former, being unsuccessful in the hunt, receives the lowest possible payoff S, while the latter (who succeeds in catching a hare) gets the payoff T. Thus, the hierarchical relation between the payoffs in Stag Hunt is R > T ≥ P > S.
As in Chicken, the game has three Nash equilibria, of which two correspond to pure strategies where both agents opt for hunting stag or both choose to hunt hare. Note that both pure strategies are also ESS for this game, so that this refinement, unlike in Chicken, does not yield a unique solution. The mixed strategy Nash equilibrium, p*_1 = p*_2 = P/(P + R − T), follows from the indifference condition on the agents' expected payoffs, where it is assumed that the lowest of the possible payoffs, S, is zero.
The co-action solution for Stag Hunt is obtained by noting that, as R is greater than T, the coefficient (R + P − T) of p^2 in the co-action payoff [Eq. (2) with S = 0], W = p^2(R + P − T) + p(T − 2P) + P, is positive. The payoff is therefore a convex function of p and attains its maximum at one of the end-points of [0, 1]; since W(1) = R > P = W(0), it is maximized at p* = 1, regardless of the values of R, T and P. Therefore, the solution of the game under the co-action concept is unique, with both agents opting to hunt stag, resulting in the best outcome for them.
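This end-point argument is easy to verify numerically for any payoffs satisfying R > T ≥ P > S = 0. A sketch (the grid search and the illustrative payoff triples are ours):

```python
def coaction_payoff(p, R, T, P):
    # Eq. (2) with S = 0, as used for Stag Hunt in the text
    return p*p*(R + P - T) + p*(T - 2*P) + P

# R > T >= P > S = 0: the convex parabola attains its maximum at p = 1,
# i.e., both agents hunt stag, whatever the exact payoff values.
for (R, T, P) in [(5, 4, 1), (5, 1, 1), (10, 2, 1)]:
    best = max((i/1000 for i in range(1001)),
               key=lambda p: coaction_payoff(p, R, T, P))
    assert best == 1.0
```

Note that for T < 2P the payoff W is not monotonically increasing on [0, 1] (it dips before rising), yet the maximum still sits at p = 1 because the parabola is convex and W(1) = R exceeds W(0) = P.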

IV. DISCUSSION
The different games that are analyzed in detail here show that the co-action solution concept gives rise to radically different outcomes compared to the standard Nash equilibrium concept. Co-action leads the agents to select strategies that are relatively "nicer" and globally more efficient. In particular, it resolves the dilemma in PD as the mutually beneficial action, viz., cooperation, always has a significant probability (≥ 1/2) of being chosen by both agents. Similarly, co-action yields more cooperative outcomes in the other games, i.e., agents playing Chicken resort to non-aggressive strategies and agents achieve perfect coordination to receive the highest possible payoff in Stag Hunt. Thus, this solution concept reconciles the idea of individual self-interest pursued by rational agents with the achievement of collective outcomes that are mutually beneficial, even for single-stage games. In addition, the co-action solutions for these games are also unique and do not require any additional refinement concepts.
The key idea underlying the co-action solution is that agents make independent choices from the possible actions available, taking into account that other agents will behave the same way as them and they are also aware of this. If the interactions in an N-player (N > 2) game can be considered as the set of all pair-wise interactions between agents who are symmetric in every respect, it is easy to see that the optimal co-action strategy will be exactly the same as that of the two-person game. The co-action solution concept can be generalized even to cases where the symmetry assumption does not hold across all agents. If the agents are aware that some of the other agents are different from them, one can still apply co-action within each cluster of agents (group) whose members consider each other to be identical (i.e., the symmetry assumption holds). For agents belonging to different groups, however, the payoffs are not invariant under interchanging the identities of the players. Thus, the symmetry of agents is broken across groups. For a population of agents whose members can be considered as belonging to two groups, one can treat the game as a two-player Nash-like scenario where each "player" is now a group of agents. However, unlike the standard Nash setting where one cannot have a mixed strategy as a stable Nash equilibrium, it is now possible for mixed strategy equilibria to be stable [28]. In general, one can consider a game with N agents, clustered into M symmetry groups, who have to choose between two actions. Assuming that the size of each group i is n_i (Σ_i n_i = N), the payoff for an agent belonging to the i-th group is a polynomial of degree n_i in p_i (i = 1, . . . , M), where each p_i is the probability of agents in that group choosing one of the actions. By contrast, the corresponding formulation of the game in terms of the Nash solution concept will involve N variables, with the payoffs being linear in each of these variables.
Therefore, this defines a novel class of games between multiple clusters of agents, with agents independently choosing the same strategy as the other members of the cluster they belong to.
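The claim above that pairwise interactions among N fully symmetric agents leave the two-person co-action strategy unchanged follows because each agent's total payoff is then (N − 1) times the two-person payoff W of Eq. (2), and scaling by a positive constant does not move the maximizer. A numerical sketch (illustrative PD payoffs; function names are ours):

```python
def coaction_payoff(p, R, S, T, P):
    # Two-person co-action payoff of Eq. (2)
    return p*p*R + p*(1 - p)*(S + T) + (1 - p)*(1 - p)*P

def argmax_p(f, grid=10001):
    # Grid search for the maximizer of f over p in [0, 1]
    return max((i/(grid - 1) for i in range(grid)), key=f)

# N fully symmetric agents playing pairwise PD: each agent's total payoff
# is (N - 1) times the two-person co-action payoff, so the optimal p is
# unchanged for any N.
R, S, T, P = 3, 0, 7, 1
p2 = argmax_p(lambda p: coaction_payoff(p, R, S, T, P))
for N in (3, 10, 50):
    pN = argmax_p(lambda p: (N - 1)*coaction_payoff(p, R, S, T, P))
    assert pN == p2
```

For agents split across symmetry groups, by contrast, the group payoffs become higher-degree polynomials in the group probabilities p_i, and this shortcut no longer applies.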
In this paper we have focused on single-stage games, but the co-action concept applies also to repeated games, where information about the choices made by agents in the past is used to decide their future actions. Here, the co-action solution developed in the context of single-stage games is applied at each iteration, with the past actions of agents used to define the different symmetry groups [14]. This is inherently a dynamical process, as the membership of these groups can evolve in time. For example, in iterative PD with N agents having memory of the choices made in the previous iteration, all agents who made the same decision in the last round will belong to the same symmetry group and will behave identically. The resulting solution can allow coexistence of cooperators and defectors in the game, which we discuss in detail in Ref. [29].
To conclude, we have introduced here an alternative solution framework for non-cooperative games which makes use of the symmetry between agents. The resulting co-action equilibrium for a game can have properties radically different from that of the corresponding Nash equilibrium, the conventional solution employed for such games. In particular, the co-action concept resolves the apparent conflict between rationality of individual agents and globally efficient outcomes in games such as PD. We believe that the co-action and Nash solutions represent two extreme benchmark strategies for non-cooperative games. While we do not address here the question of which concept is more appropriate for a given situation, it is conceivable that agent behavior in reality may be described by a strategy between these two extremes and can potentially be represented by a combination of them.
This work was partially supported by the IMSc Econophysics project funded by the Department of Atomic Energy, Government of India. We thank Deepak Dhar for useful discussions and Shakti N. Menon for a careful reading of the manuscript.