The coevolution of overconfidence and bluffing in the resource competition game

Resources are often limited, so how convincingly competitors present their claims to them is essential. Besides a player's natural capacity, overconfidence and bluffing may also play a decisive role and influence how a restricted reward is shared. While bluffing provides a clear but risky advantage, overconfidence, as a form of self-deception, can be harmful to its user. It is therefore a long-standing puzzle why these potentially damaging biases are maintained and evolve to a high level in human society. Within the framework of evolutionary game theory, we present a simple version of the resource competition game in which overconfidence and bluffing coevolve, and which is capable of explaining their prevalence in structured populations. Interestingly, bluffing seems apt to evolve to a higher level than the corresponding overconfidence, and in general the former is less resistant to punishment than the latter. Moreover, the topological features of the social network play an intricate role in the spreading of overconfidence and bluffing. While the heterogeneity of interactions facilitates bluffing, it also increases the efficiency of adequate punishment against overconfident behavior. Furthermore, increasing the degree of homogeneous networks can trigger a similar effect. We also observe that high real capability may accommodate both bluffing ability and overconfidence simultaneously.

The emergence of overconfidence is a well-established bias in which a person's subjective confidence in self-assessment is greater than the objective accuracy of those judgments, especially when confidence is relatively high 1 . In human societies overconfidence has been recognized in many different ways, such as overestimation of one's actual performance, over-ranking of personal achievement relative to others, and the excessive certainty regarding the accuracy of individual beliefs 2 . Although it is often blamed for hubris, market bubbles, financial collapses, policy failures and costly wars, overconfidence remains prevalent in our daily experience [3][4][5] . Such a bias can evolve due to the competition of alternative strategies and may contribute significantly to the increase of morale, ambition, resolve and persistence [6][7][8][9] . Very high levels of core self-evaluations, a stable personality trait composed of locus of control, neuroticism, self-efficacy, and self-esteem, may also be related to the overconfidence effect 10,11 .
As a concomitant bias, bluffing, also called boasting or exaggeration, is the representation of something in an excessive manner 12,13. The boaster is regarded as one who pretends to have distinguished qualities but has them not at all or only to a lesser degree 6,12,14. Usually bluffing cannot be reliably distinguished from true ability 15, and it exists in different forms, such as amplifying achievements or deceiving others' expectations by magnifying emotional expressions 12. It is important to stress that the deception profile, including the appropriate levels of overconfidence and bluffing intensities, plays a decisive role in determining what an individual gets in resource competitions. Our specific interest here is to explore how such profiles develop through an evolutionary process.
The application of realistic evolutionary rule, however, requires sanctioning of uncovered bluffing, which represents a sort of social norm of the population. In fact, the ability to develop and enforce social norms is probably one of the distinguishing features of the human species 16 . Several experiments and theoretical investigations have revealed that sanctions are able to create a sufficiently strong selective pressure to prevent cheating, which is necessary to stabilize human cooperation [17][18][19][20][21][22][23] . Similarly, the deception behavior, regardless of self-deception

Results
We start by presenting the stationary overconfidence level f_O and bluffing level f_B as a function of the resource-to-cost ratio r/c, obtained on a square lattice, as shown in Fig. 1(a). The results suggest that increasing r/c does not noticeably change f_O but decreases f_B, especially when the system bias δ is relatively large. Raising δ also significantly reduces the overconfidence level f_O for a moderate punishment probability (p = 0.5). Note that positive values of δ induce extra conflicts and thus boost the chances of centralized sanctions. It therefore seems that the value of r/c has little impact on the stabilization of overconfidence, regardless of whether punishment is rare or frequent. Meanwhile, boasting behavior (f_B) slightly decreases as r/c increases. Importantly, the results for a regular random graph with k = 4 are in accordance with those for the translation-invariant square lattice. Thus it seems that the structure of interactions does not play a prominent role as long as the average degrees k are identical. The value of k, however, could play a decisive role in the evolution of the deception profile. To explore this effect, we investigate the impact of r/c on f_O and f_B under extreme conditions (p = 1, δ = 1) on homogeneous networks with different k values (k = 4, 8, 16). As shown in Fig. 1(b), overconfidence almost goes extinct irrespective of the values of r/c and k, showing that sufficient sanctions can effectively reduce the general level of overconfidence. Meanwhile, f_B drops sharply to a minimum value as r/c increases when k = 4, in contrast to larger degrees such as k = 8 or k = 16. In other words, having more available neighbors partially offsets the effect of punishment on boasters. We next evaluate the impact of the punishment probability p and the system bias δ on the general overconfidence level f_O and bluffing level f_B (see Fig. 2). Besides homogeneous networks, summarized in Fig.
2(a,b), we also explored the possible impact of interaction heterogeneity by considering BA scale-free networks, shown in Fig. 2(c). To avoid additional effects we used the same average degree k = 4 as for the random graph in Fig. 2(a). It can be observed that at any given value of δ, increasing the punishment rate p slightly reduces both f_O and f_B. Meanwhile, for any given p, both f_O and f_B drop monotonically with δ, signalling that δ plays a decisive role in restraining the deception behaviours (both overconfidence and bluffing). This behaviour is based on the fact that large δ ensures frequent conflicts between competing players, which reveal their real abilities. In the other extreme case, negative parameter values δ < 0 inhibit conflicts, which results in prompt fixation into a high-overconfidence, high-bluffing deception profile (this case is not shown in the figures). Moreover, another common trait of the color maps, independent of the applied topology, is that f_B always evolves to a higher level than the corresponding f_O, highlighting that natural selection provides a higher bluffing level than overconfidence level when other factors are equal. Furthermore, the comparison of Fig. 2(a,c) illustrates that network heterogeneity can apparently elevate the average bluffing intensity f_B. It also illustrates that the heterogeneity of the interaction topology helps to restrain overconfidence for relatively large δ values. Interestingly, increasing k in homogeneous networks is capable of lifting the bluffing level f_B while the overconfidence level f_O is slightly reduced (see also Fig. 1(b)). To better understand the possible influence of the sanctioning mechanism on the evolution of the deception profile (α, β), we monitor the time evolution of α and β values on a square lattice without and with punishment (shown in Fig. 3(a) and Fig. 3(b), respectively).
Figure 3(a) shows how the probability distribution f(α, β) of profile pairs evolves in time in the absence of punishment, when only imitation of deception profiles is possible. It can be observed that small β values die out first, signalling that boasting is most favored by natural selection. Later, when only large β values are present, players who apply higher α values become more successful. As a result, the whole population becomes trapped in a large (α, β) pair after sufficiently long relaxation (t = 100000 MC steps). In fact, once fixation occurs the evolutionary process stops. Here, f_O and f_B can then be determined by averaging over the final states that emerge from different initial conditions. We conclude that a high α-high β combination survives when only imitation exists, which is in accordance with our previous observations 14. However, fixation never happens when sanctions determine the evolution (see Fig. 3(b)). In the early stage almost half of the population is punished, hence low α-low β combinations form the majority of the f(α, β) distribution. Later, as time passes, a dynamic balance emerges between low α-low β combinations and moderate α-high β pairs. The specific position of the latter depends on the actual values of the δ and p parameters. In general, punishment plays a "shunting" role here, undermining the stabilization of overconfidence and bluffing in the whole population. Importantly, these results hold for any homogeneous network besides the square lattice. For strongly heterogeneous networks, sometimes more than one α − β pair can survive around strong hubs even without punishment, which is in agreement with related works where other player-specific profiles evolved 41,48.
After realizing the significant impact of sanctions on the evolution of the deception profile, we are next interested in the targets of such punishments. More precisely, we ask whether the deception profiles of the truly inferior players are minimized on homogeneous networks with different k values. For this reason we measure separately the average real capability of those players who are punished and of those who are not. The ratio of these averages (punished to unpunished) is denoted by R_ability. Similarly, we also measure the average payoff of the two subclasses, and their ratio is denoted by R_payoff. These ratios are depicted in Fig. 4(a) for different random regular networks, where we gradually increase the degree k. Apparently, R_ability < 1 indicates that, on average, players having lower real abilities γ are punished more frequently. At the same time, R_payoff < 1 highlights that this small-γ group benefits less than their higher-ability opponents. As the degree of nodes increases, both R_ability and R_payoff rise unambiguously, showing that the enhancement of connections narrows the real-capability gap and the payoff gap between the punished players and those who are not punished. In other words, punishment is directed principally towards who are […] Fig. 4(b). The plot clearly suggests that both R_O and R_B ascend with γ and exceed 1 once γ > 0.5. Note that homogeneous networks with other k values show a similar tendency. Thus we conclude that players with high ability are inclined to evolve to a higher state of both overconfidence and bluffing because they have a higher chance to collect the resource without conflict. Furthermore, if conflict is inevitable and competitors must reveal their real abilities, then these players still have a higher chance to win. Lastly, it is instructive to investigate the impact of the upper limits α_max and γ_max on the evolution of the f_O and f_B values.
By keeping γ_max = β_max = 1, α_max > 1 means that excessive overconfidence intensity is allowed for competitors. In contrast, γ_max > 1 with α_max = β_max = 1 implies that the real abilities of players are significantly higher than the changing α or β values. We note that β_max > 1 is not taken into consideration, because extravagant boasting could easily be recognized from real facts. For appropriate comparison, f_O is normalized as f_O^norm = f_O/α_max when α_max > 1 is applied. As demonstrated in Fig. 5(a), the possibility of sanctioning results in drastic reductions in the normalized overconfidence level f_O^norm as α_max is increased. This suggests that punishment can effectively restrain excessive overconfidence, but is unable to decrease the bluffing level significantly. However, without punishment (p = 0), raising α_max gives rise to intensive conflicts that help competitors to recognize others' real capabilities. Therefore, f_O^norm and f_B decrease monotonically with α_max and finally converge to 0.5, which equals the initial value of the average bluffing intensity. We stress that the results presented in Fig. 5(a) are robust and remain valid if we use other interaction topologies. Increasing γ_max drives the evolution toward "neutral drift" because peer biases, such as overconfidence and bluffing, become of second-order importance in resource competitions when real abilities dominate. Importantly, however, f_O and f_B may fluctuate heavily in heterogeneous networks, showing that the existence of strong hubs might significantly influence the evolution of both overconfidence and bluffing.
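The punished-to-unpunished ratios discussed with Fig. 4(a) (R_ability and R_payoff) can be illustrated with a small helper. This is only a sketch under stated assumptions: the function name `group_ratio` and the boolean flag list marking punished players are hypothetical bookkeeping choices, not taken from the paper.

```python
def group_ratio(values, punished):
    """Ratio of the mean of `values` over punished players to the mean over
    unpunished players (hypothetical helper; names are illustrative only).

    values   : per-player quantity, e.g. real capability gamma or payoff
    punished : per-player flags, True if the player was punished
    Returns None when the ratio is undefined (an empty group or a zero
    denominator).
    """
    pun = [v for v, flag in zip(values, punished) if flag]
    rest = [v for v, flag in zip(values, punished) if not flag]
    if not pun or not rest:
        return None
    mean_pun = sum(pun) / len(pun)
    mean_rest = sum(rest) / len(rest)
    if mean_rest == 0:
        return None
    return mean_pun / mean_rest


# Example: the two weakest of four players are punished, so the ratio is < 1.
gammas = [0.2, 0.4, 0.6, 0.8]
flags = [True, True, False, False]
r_ability = group_ratio(gammas, flags)
```

Passing payoffs instead of capabilities to the same helper would give the analogue of R_payoff.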

Discussion
In summary, we have investigated how overconfidence and bluffing coevolve within the framework of a resource competition game. It is a well-recognized fact that when confidence is relatively high the whole population easily falls victim to overconfidence, which is considered by some psychologists to be the most "pervasive and potentially catastrophic" of all the cognitive biases 1,10. Counterintuitively, this "erroneous" psychology can maximize individual fitness in many situations, leading to its prosperity in human society. Meanwhile, the existence of bluffing behavior, which sometimes cannot be detected, usually leads to ambiguity in one's perception of others' real ability. Our previous study highlighted that bluffing promotes overconfidence and that both stabilize at a high level when evolution is limited to imitation without the chance to reveal competitors' real abilities 14. However, the ability to develop and enforce social norms is probably one of the most characteristic features of the human species 16. Motivated by this fact, we propose an evolutionary rule that combines a sanction mechanism with the celebrated rule of "imitating the better" 54. Punishment here, instead of reducing individuals' real income, acts only on their deception behaviors, including both self-deception (overconfidence) and other-deception (bluffing). It is a key point of our model that these two mechanisms, which may determine a player's success, can coevolve. Furthermore, besides the deception profile, the system bias describing the group inclination towards extra conflicts is also considered. Accordingly, system bias can be treated as an integral effect, caused by all the other factors, that stimulates conflicts (δ > 0) or inhibits conflicts (δ < 0) between competitors. In addition, punishment is not certain to occur, but happens with probability p here.
Lastly, we stress that we have tested different interaction topologies to explore the possible consequences of a structured population. All these details make our model more realistic. Our extended model gives deeper insight into previous findings 14. As shown in Fig. 2, overconfidence and bluffing follow essentially the same trend irrespective of p, δ and topological properties. This is in accordance with the previous observation that bluffing promotes overconfidence. There is, however, a significant difference when both sides of deception can coevolve. Namely, boasting seems more stable than the fatal psychology of overconfidence because individuals can take advantage of bluffing immediately. As a consequence, eliminating boasting behavior requires a more intensive sanction mechanism. We also find that increasing the heterogeneity or the average degree of the interaction network significantly promotes bluffing, and simultaneously increases the efficiency of adequate punishment (when p and δ are large) against overconfident behavior. More importantly, this third-party punishment prominently limits overconfidence of excessive intensity. Intriguingly, the high capability of an elite player might induce a high level of his/her deception profile, which stems from the fact that elites hardly ever fail in conflicts.
In conclusion, to better understand the intricate relation between overconfidence and bluffing, we have proposed a more realistic model in which the individual deception profile coevolves. Overall, both social norms and topological properties of interaction networks have a substantial influence on the evolution of these "peer biases". We hope that these observations will motivate further research aimed at improving our comprehension of the evolution of these "erroneous" but sometimes meaningful inclinations.

Methods
The traditional setup of an evolutionary game assumes N players occupying the vertices of an interaction graph. Our basic model is a resource competition game (RCG) in which neighbors compete for resources and their success is based on how convincingly they claim them. Without loss of generality, an individual i is characterized by a time-independent real capability γ_i ∈ [0, γ_max], an evolving overconfidence intensity α_i ∈ [0, α_max], and an evolving bluffing intensity β_i ∈ [0, β_max]. While the real capability γ_i is a fixed and inalienable feature of each player, α_i represents the actual overconfidence state (OS), a perception error about self-ability. Similarly, β_i characterizes the bluffing state (BS) of the player, which helps to over-represent abilities towards competitors. In particular, i believes he/she owns a "self-perceived capability" k_i, determined by γ_i and α_i, while his/her "displaying capability" m_i, determined by γ_i and β_i, is what competitors observe. Suppose a resource r is potentially available to neighboring individuals that claim it. If neither of them claims it, then the resource remains unused. If only one individual makes a claim, then it acquires the resource and gains fitness r while the other gains nothing. When players i and j both claim this resource, an RCG takes place. In the latter case each individual pays a cost c due to the conflict between them, and the one who has the higher real capability acquires the resource. In this model, the recognition ability of each player is also influenced by a uniform system bias δ, which allows us to control the intensity of conflicts between competitors. Summing up, a player i facing player j gains a payoff P_ij that can be calculated as follows:
(1) If k_i > m_j − δ and k_j < m_i − δ, player i claims but player j does not, thus P_ij = r.
(2) If k_i < m_j − δ, player i does not claim and remains empty-handed, thus P_ij = 0.
(3) If k_i > m_j − δ and k_j > m_i − δ, a conflict emerges between players i and j, in which they have to reveal their real capabilities, which determine what they get: P_ij = r − c if γ_i > γ_j, and P_ij = −c if γ_i < γ_j.
Here the parameter δ represents a uniform group inclination in handling possible conflicts: for positive values δ > 0, group members are motivated to "open their cards" impulsively and bravely, and thus more conflicts take place. In the case of δ < 0, however, conflicts are avoided because all players in the group are excessively cautious.
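The three payoff rules above can be sketched as a single function. This is a minimal illustration, not the authors' code: the function name `pair_payoff` and its argument order are hypothetical, and k and m are taken as given inputs. The conflict outcome follows the verbal description (both pay c, the player with the higher real capability takes r). Ties are excluded in the text, so their handling here (no claim on equality, loss on equal γ) is an arbitrary assumption.

```python
def pair_payoff(k_i, m_i, gamma_i, k_j, m_j, gamma_j, r, c, delta):
    """Payoff P_ij of player i against neighbor j (illustrative sketch).

    k_*     : self-perceived capabilities
    m_*     : displayed capabilities
    gamma_* : real capabilities
    r, c    : resource value and conflict cost
    delta   : uniform system bias (delta > 0 encourages claims)
    """
    i_claims = k_i > m_j - delta
    j_claims = k_j > m_i - delta
    if not i_claims:
        return 0.0            # rule (2): i stays empty-handed
    if not j_claims:
        return r              # rule (1): i takes the resource unopposed
    # rule (3): conflict, both pay c and real capabilities decide
    # (assumption: on an exact tie in gamma, i is treated as the loser)
    return r - c if gamma_i > gamma_j else -c


# Example: both players claim, i has the higher real capability.
p_ij = pair_payoff(0.8, 0.5, 0.6, 0.7, 0.6, 0.4, r=2.0, c=1.0, delta=0.0)
p_ji = pair_payoff(0.7, 0.6, 0.4, 0.8, 0.5, 0.6, r=2.0, c=1.0, delta=0.0)
```

With a strongly negative δ in the same example, neither player would claim and both payoffs would be zero, which matches the conflict-avoiding regime described in the text.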
Initially each player i is assigned random γ_i, α_i and β_i values. The situation in which two values are equal is not taken into consideration. In stark contrast to our preliminary work 14, in the extended model both α_i and β_i can coevolve, which dramatically influences a player's success in resource competition. During an elementary Monte Carlo (MC) step a randomly selected player i collects its payoff P_i by playing the RCG with all k_i neighbors, where k_i here represents the degree of player i in the interaction graph. The total payoff of player i is P_i = Σ_{j∈Ω(i)} P_ij, where Ω(i) represents all players in i's neighborhood. Subsequently, a randomly chosen neighbor j acquires its payoff P_j in a similar way.
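The accumulation of a player's payoff over its neighborhood can be sketched as follows, here for a square lattice with periodic boundary conditions. The helper names `square_lattice_neighbors` and `total_payoff` are hypothetical, and the pairwise payoff is passed in as a callable so that the sketch makes no assumption about how P_ij itself is computed.

```python
def square_lattice_neighbors(L):
    """Neighbor lists for an L x L square lattice with periodic boundaries.

    Sites are indexed as x * L + y. Each site has four neighbors, which are
    distinct for L >= 3.
    """
    nbrs = {}
    for x in range(L):
        for y in range(L):
            nbrs[x * L + y] = [
                ((x + 1) % L) * L + y,   # next row
                ((x - 1) % L) * L + y,   # previous row
                x * L + (y + 1) % L,     # next column
                x * L + (y - 1) % L,     # previous column
            ]
    return nbrs


def total_payoff(i, neighbors, pair_payoff):
    """Sum of pairwise payoffs of player i over all players in its
    neighborhood; pair_payoff(i, j) is a user-supplied callable."""
    return sum(pair_payoff(i, j) for j in neighbors[i])


# Example: with a constant pairwise payoff of 1, the total equals the degree.
lattice = square_lattice_neighbors(4)
p_0 = total_payoff(0, lattice, lambda i, j: 1.0)
```

The same `total_payoff` helper works unchanged for any graph given as a dictionary of neighbor lists, for instance a random regular or scale-free graph.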
As we noted, a crucial point of the evolution is that players may change their deception profile to collect more resources. In particular, if a player i loses a conflict against player j, then his/her extreme overconfidence and bluffing levels are revealed, hence player i is punished with probability p. As a result, the (α_i, β_i) values are reduced to the minimum levels of the whole population, ε_α and ε_β, which represent the minimum overconfidence and bluffing intensity, respectively. Otherwise, player i adopts the deception profile of a randomly selected neighbor j with probability W = W(P_j − P_i). The parameter K, which enters W, characterizes the level of uncertainty in deception profile adoption 55. Without loss of generality we use K = 0.1, but qualitatively similar results can be obtained for other K values. Importantly, since the profile consists of two parameters, two independent random numbers are drawn to enable uncorrelated imitation of the α_i and β_i values, as suggested in ref. 48. The presented simulation results were obtained using different interaction graphs, such as the square lattice with periodic boundary conditions, regular random graphs with different degrees, and the Barabási-Albert (BA) scale-free graph 56. The latter serves to explore the possible consequences of heterogeneities. In accordance with random sequential update, each full MC step, which consists of N repeated elementary steps, gives each player one chance on average to update its deception profile. The typical system size is N = 10^4 to 10^5 nodes, and the stationary frequencies are determined by averaging over 10^4 MC generations in the stationary state after sufficiently long relaxation times. The stationary state is considered to be reached when the averages of the overconfidence level f_O (the stable average value of α) and the bluffing level f_B (the stable average value of β) no longer change in time.
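The profile update described above (punishment with probability p after a lost conflict, otherwise payoff-driven imitation with two independent random draws) can be sketched as below. One assumption is made loudly: the functional form of W is not spelled out in the text here, so the sketch uses the Fermi function W = 1/(1 + exp((P_i − P_j)/K)), a common choice in evolutionary game simulations with a noise parameter K. The function names `fermi` and `update_profile` are hypothetical.

```python
import math
import random


def fermi(p_i, p_j, K=0.1):
    """ASSUMED adoption probability W(P_j - P_i): the Fermi function.
    The text only states that W depends on P_j - P_i and on a noise
    parameter K; this specific form is an assumption of the sketch."""
    x = (p_i - p_j) / K
    if x > 700:               # avoid overflow in exp for huge payoff gaps
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def update_profile(alpha_i, beta_i, p_i, alpha_j, beta_j, p_j,
                   lost_conflict, p, eps_alpha, eps_beta, K=0.1, rng=random):
    """Return the new (alpha_i, beta_i) of player i after one update."""
    # Sanction: a revealed loser is reset to the minimum profile with prob. p.
    if lost_conflict and rng.random() < p:
        return eps_alpha, eps_beta
    # Otherwise imitate neighbor j; two independent draws make the adoption
    # of alpha and beta uncorrelated.
    w = fermi(p_i, p_j, K)
    new_alpha = alpha_j if rng.random() < w else alpha_i
    new_beta = beta_j if rng.random() < w else beta_i
    return new_alpha, new_beta


# Example: certain punishment (p = 1) resets a losing player's profile.
punished = update_profile(0.9, 0.8, 1.0, 0.5, 0.5, 3.0,
                          lost_conflict=True, p=1.0,
                          eps_alpha=0.0, eps_beta=0.0)
```

Calling `update_profile` N times on randomly chosen players, each time with a randomly chosen neighbor, would correspond to one full MC step in the sense used in the text.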
We have averaged the final outcome over 50 independent initial conditions.
Scientific Reports | 6:21104 | DOI: 10.1038/srep21104