Evolution of costly signaling and partial cooperation

Two seemingly unrelated but fundamental challenges in evolutionary theory are the evolution of costly signals and of costly cooperative traits, both of which are expected to reduce an individual’s fitness and to be eliminated by natural selection. Here, by considering a well-mixed population of individuals who produce signals and choose their strategies in a game based on those signals, we show that costly signals and costly cooperative strategies can co-evolve as a result of the internal dynamics of the system. Costly signals evolve, despite their apparent cost, because of the favorable cooperative response they elicit. This favorable strategic response can be quantified in a fitness term which governs the distribution of costly signals better than their apparent cost does. In the same way, cooperative strategies evolve because they can reach a high fitness due to the internal dynamics of the system.

Here, w_α is the net payoff of individual α. (If an agent's net payoff becomes negative, it is set to zero, which ensures that the agent does not contribute any offspring to the next generation.) The offspring inherit the signal production probability P(σ) and the strategy matrix s(σ_1, σ_2) of their parent. However, mutations can occur. With probability ν_σ, a mutation in the signal production probability occurs, in which case the probability that the offspring produces a randomly chosen signal i increases. This is done by setting P_o(σ) = (1 − dσ) P_p(σ) + dσ [i]. Here, [i] is a vector whose ith element is 1 and whose other elements are zero, and the subscripts o and p refer, respectively, to offspring and parent. With probability ν_s, a mutation in strategy occurs, in which case a randomly chosen entry of the strategy matrix of the offspring is set at random to either C or D.
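The two mutation operators described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the list-based data layout, and the choice to draw the mutated strategy entry uniformly over all (σ_1, σ_2) pairs are assumptions.

```python
import random

def mutate_signal_probs(parent_probs, d_sigma, rng):
    """Offspring probabilities P_o = (1 - d_sigma) * P_p + d_sigma * [i]."""
    i = rng.randrange(len(parent_probs))      # randomly chosen signal index i
    child = [(1.0 - d_sigma) * p for p in parent_probs]
    child[i] += d_sigma                       # add d_sigma times the unit vector [i]
    return child

def mutate_strategy(parent_matrix, rng):
    """Copy the strategy matrix and set one randomly chosen entry to C or D."""
    n = len(parent_matrix)
    child = [row[:] for row in parent_matrix]
    a, b = rng.randrange(n), rng.randrange(n)  # assumed: uniform over all entries
    child[a][b] = rng.choice("CD")
    return child

def make_offspring(probs, matrix, nu_sigma, nu_s, d_sigma, rng):
    """Inherit the parent's traits, mutating each with its own probability."""
    child_probs = list(probs)
    child_matrix = [row[:] for row in matrix]
    if rng.random() < nu_sigma:
        child_probs = mutate_signal_probs(child_probs, d_sigma, rng)
    if rng.random() < nu_s:
        child_matrix = mutate_strategy(child_matrix, rng)
    return child_probs, child_matrix
```

Because the update is a convex combination of the parent's distribution and a unit vector, the offspring's probabilities still sum to one without any renormalization.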
We have considered different game structures. In the next section we start with the prisoner's dilemma (PD) game. We then turn to another game frequently used in studying the evolution of cooperation, namely the snow drift (SD) game, also known as the hawk-dove or chicken game. The payoffs of the games considered in this study are presented in Table (SI.1).

Game     R     S     T     P
PD       0.6   0     0.8   0.2
TTD      0.6   0     1.4   0.2
SD       0.6   0.4   0.8   0.2
BS       0.4   0.8   0.6   0.2
leader   0.4   0.6   0.8   0.2

Table SI.1. Games and their payoffs. R is the payoff for mutual cooperation, T the payoff for defecting against a cooperator, S the payoff for cooperating with a defector, and P the payoff for mutual defection.

SI. 2 Mean field theory of the model
In this section we develop a mean field theory for the model. We write down expressions for the fitness of signals and strategies and use a selection-mutation equation to obtain mean field equations for their densities. To do so, we make some simplifying assumptions. First, we assume that an individual's signal is fixed during its lifetime. This simplified model is of interest in many biological situations in which an individual's signal is a phenotype that is fixed during its lifetime. We note that this corresponds to the case dσ = 1 in our model. Second, we assume that mutations in signal production and strategies occur as a result of agents randomly switching their signal and strategy. Specifically, each agent switches its signal to another signal chosen uniformly at random with rate (probability per unit time) µ_σ. In addition, agents switch each entry of their strategy matrix to the opposite value (C to D and vice versa) with rate µ_s. This formulation is more consistent with an interpretation of the evolutionary process as an imitation-exploration process, in which agents imitate the strategies and signals available in the population in proportion to their success and, at the same time, explore by randomly switching to novel strategies and signals. In this section we use lower case Latin letters (a, b, ...) to denote signals.
Consider the set of strategies which determine an individual's action if it shows signal a while its opponent shows signal b, that is, s(a, b). This can be either C or D. We denote the density of strategies of the form s(a, b) = C by ρ_{a,b}. The density of strategies of the form s(a, b) = D is then equal to 1 − ρ_{a,b}. In addition, we denote the density of signal a by ρ_a. This is the number of times signal a is produced in the population at a time step.
We begin by writing an expression for the fitness of signals. Consider the expected payoff of signal a (by this we mean the expected payoff accrued to an individual by showing signal a) against signal b. We denote this by w(a|b). Under the mean field assumption, this can be written as:
$$w(a|b) = \rho_{a,b}\,\rho_{b,a}\,R + \rho_{a,b}\,(1-\rho_{b,a})\,S + (1-\rho_{a,b})\,\rho_{b,a}\,T + (1-\rho_{a,b})\,(1-\rho_{b,a})\,P. \qquad \text{(SI.1)}$$
Here, the first term is the probability that both signalers cooperate times the payoff of signal a in this case (R), the second term is the probability that the a-signaler cooperates while the b-signaler defects times the payoff of signal a (S), the third term is the probability that the a-signaler defects while the b-signaler cooperates times the payoff of signal a (T), and the last term is the probability that both signalers defect times the payoff of signal a (P). The expected payoff of signal a is equal to the average of w(a|b) over b:
$$w(a) = \sum_b \rho_b\, w(a|b). \qquad \text{(SI.2)}$$
As mentioned above, signals are also subject to mutation. Taking into account both the selection of signals based on their payoff and mutations in signals, the following mean field equation for the evolution of signal density can be written: (SI.3) Here, c(a) is the cost of signal a, and ⟨w(a) − c(a)⟩_a = ∑_a ρ_a (w(a) − c(a)) is the average net payoff (payoff minus cost) of signals. The first term ensures that signal densities grow in proportion to their fitness, the second term is the loss of density due to mutations out of a signal, and the last term is the gain in density due to mutations to the signal.
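The verbal description of Eq. (SI.1) translates directly into code. The sketch below is illustrative only: the function names and the nested-list layout `rho_c[a][b]` (density of strategies with s(a, b) = C) are assumptions, and the average over b is taken with weights `rho_sig[b]`, which is one reading of "the average of w(a|b) over b".

```python
def signal_payoff_against(rho_c, a, b, R, S, T, P):
    """Expected payoff w(a|b) of an a-signaler meeting a b-signaler."""
    p_a = rho_c[a][b]   # probability that the a-signaler cooperates
    p_b = rho_c[b][a]   # probability that the b-signaler cooperates
    return (p_a * p_b * R                   # both cooperate
            + p_a * (1 - p_b) * S           # a cooperates, b defects
            + (1 - p_a) * p_b * T           # a defects, b cooperates
            + (1 - p_a) * (1 - p_b) * P)    # both defect

def signal_payoff(rho_c, rho_sig, a, R, S, T, P):
    """Expected payoff w(a): w(a|b) averaged over b, weighted by signal density."""
    return sum(rho_sig[b] * signal_payoff_against(rho_c, a, b, R, S, T, P)
               for b in range(len(rho_sig)))

# Illustrative two-signal state with the PD payoffs of Table SI.1.
PD = dict(R=0.6, S=0.0, T=0.8, P=0.2)
rho_c = [[1.0, 0.0],
         [1.0, 1.0]]
rho_sig = [0.5, 0.5]
w0 = signal_payoff(rho_c, rho_sig, 0, **PD)
w1 = signal_payoff(rho_c, rho_sig, 1, **PD)
```

In this made-up state, signal 0 is cooperated with by everyone but defects against signal 1, so it collects the temptation payoff in mixed encounters and ends up with the higher expected payoff.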
In the same way, we can write an evolution equation for the density of strategies. The expected payoff of a strategy of the form s(a, b) = C is:
$$w(s(a,b)=C) = \rho_a\,\rho_b\,\rho_{b,a}\,R + \rho_a\,\rho_b\,(1-\rho_{b,a})\,S. \qquad \text{(SI.4)}$$
The first term is the probability that an individual shows signal a while its opponent shows signal b, times the probability that the opponent cooperates (ρ_{b,a}), times R (the payoff of the strategy s(a, b) = C in this case). The second term is the probability that an individual shows signal a while its opponent shows signal b, times the probability that the opponent defects (1 − ρ_{b,a}), times S (the payoff of the strategy s(a, b) = C in this case). In the same way, the expected payoff of the complementary strategy s(a, b) = D is:
$$w(s(a,b)=D) = \rho_a\,\rho_b\,\rho_{b,a}\,T + \rho_a\,\rho_b\,(1-\rho_{b,a})\,P. \qquad \text{(SI.5)}$$
In addition to growing in proportion to their payoff, strategies are also subject to mutations. Taking both effects into account, the following mean field equation for the density of strategies can be written: (SI.6) The term in the bracket is the average payoff of strategies s(a, b) = C and s(a, b) = D. The first term ensures that the density of each possible entry of the strategy matrix (s(a, b) = C and s(a, b) = D) grows at a rate proportional to its payoff, the second term is the mutation out of a strategy, and the last term is the mutation to a strategy. Eq. (SI.3) and Eq. (SI.6) provide a set of equations which can be solved simultaneously for all a and b to yield the evolution of the density of signals and strategies in the system.

In Fig. (SI.1), numerical solutions to the mean field equations are presented. Here, the prisoner's dilemma game with the payoff values given in Table (SI.1) is considered, and the signal costs are randomly distributed in the interval [0, c_max], with c_max = 0.1. In Fig. (SI.1.a) and Fig. (SI.1.b), the densities of the different strategy pairs CD (solid green line), CC (dash-dotted blue line), and DD (dashed red line) are presented for different mutation rates (indicated in the figure).
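The strategy payoffs described for Eqs. (SI.4) and (SI.5) can be sketched as follows. The payoff function follows the verbal description term by term. The update step, by contrast, is only one plausible replicator-mutator form (selection relative to the average payoff of the two entries, plus symmetric switching at rate `mu_s`) and should be read as an assumption, not as the authors' Eq. (SI.6); all function names are hypothetical.

```python
def strategy_payoffs(rho_c, rho_sig, a, b, R, S, T, P):
    """Return (w_C, w_D), the expected payoffs of s(a, b) = C and s(a, b) = D."""
    meet = rho_sig[a] * rho_sig[b]   # own signal is a and the opponent's is b
    q = rho_c[b][a]                  # probability that the opponent cooperates
    w_c = meet * (q * R + (1 - q) * S)
    w_d = meet * (q * T + (1 - q) * P)
    return w_c, w_d

def euler_step_strategy_density(rho, w_c, w_d, mu_s, dt):
    """One Euler step for rho_{a,b} under an ASSUMED replicator-mutator form."""
    avg = rho * w_c + (1 - rho) * w_d        # average payoff of the two entries
    selection = rho * (w_c - avg)            # growth relative to the average
    mutation = -mu_s * rho + mu_s * (1 - rho)  # loss from C, gain from D
    return rho + dt * (selection + mutation)

# Illustrative uniform state with the PD payoffs of Table SI.1.
PD = dict(R=0.6, S=0.0, T=0.8, P=0.2)
rho_c = [[0.5, 0.5], [0.5, 0.5]]
rho_sig = [0.5, 0.5]
w_c, w_d = strategy_payoffs(rho_c, rho_sig, 0, 1, **PD)
rho_next = euler_step_strategy_density(0.5, w_c, w_d, mu_s=0.0, dt=1.0)
```

With the PD payoffs, defection earns more than cooperation against any opponent, so without mutation the density of s(a, b) = C decreases in this sketch; the mutation term pulls the density back toward 1/2.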
For large mutation rates (≥ 10^{-4}), a large density of cooperative strategies is maintained in the system. The fraction of CD strategy pairs is close to 0.5, and the fraction of CC strategies increases with increasing mutation rate and approaches 0.25 for very large mutation rates. These features are consistent with agent based simulations of the model. The density of cooperative strategies, however, decreases for very small mutation rates (≤ 10^{-5}). We have not observed such a decrease in agent based simulations for comparable mutation rates. This suggests that the mean field theory deviates from the simulated dynamics for very small mutation rates. The reason is not difficult to see. In the mean field theory, correlations between different entries of the strategy matrix are neglected. As described below, these correlations play an important role in the dynamics of the system.

Figure SI.1. In the mean field approximation, the level of cooperation increases with increasing mutation rate in strategies ν_s. (c): Time evolution of the density of a randomly chosen signal σ, the density of the strategies which cooperate with that signal, ρ_{C(σ)}, and the fitness of that signal, w_σ, for two different mutation rates indicated in the figures. The density of the strategies which cooperate with a signal increases when the density of that signal is low. When ρ_{C(σ)} increases enough, the fitness of the signal increases, and consequently its density increases too. This decreases the fitness of the strategies C(σ), and thus ρ_{C(σ)} decreases. The dynamics go through the same cycles. (d): Time average density of signals ⟨ρ_σ⟩_t as a function of signal cost (top), and as a function of time average signal fitness ⟨w_σ⟩_t (bottom). Mean field theory predicts a decreasing trend between signal cost and density, and an increasing trend between signal fitness and signal density. Here, the mean field equations are solved numerically for t = 2000 time steps and an average over the last 1000 time steps is taken.
When the density of a given signal b is low, strategies which cooperate with that signal (i.e. strategies of the form s(a, b) = C) impose a small fitness cost and can grow and become fixed in the population through association with fit strategies. This means that the same individual can have strategies of the form s(a, b) = C and s(a, b′) = D. When the density of signal b is low while the density of b′ is high, such individuals accumulate a high payoff and grow in number. This leads to an increase of strategies that cooperate with the rare signal b. However, under the mean field assumption such correlations are neglected and, consequently, cooperative strategies cannot accumulate through association with fit defective strategies. This leads to a lower level of cooperative strategies in the mean field theory than in the agent based simulations.
In Fig. (SI.1.c), we take a look at the dynamics of signals and strategies in the mean field theory by plotting the density of one of the signals, ρ_σ, and its fitness, w_σ, together with the fraction of strategies which cooperate with that signal, ρ_{C(σ)}, for two different mutation rates (indicated in the figure). The mean field theory captures the same dynamics observed in the agent based simulations. When the density of a signal is low, strategies which cooperate with that signal impose a small cost on their bearer. Consequently, the fraction of such strategies increases. This is observable in both cases presented in Fig. (SI.1.c), but is more salient in the lower panel, where the rate of mutation in strategies is much smaller than that in signals. When the density of strategies cooperating with the given signal becomes large enough, the signal reaches a high fitness. Consequently, its density increases. This in turn decreases the fitness of strategies which cooperate with that signal, and thus ρ_{C(σ)} decreases. However, such dynamics are observed only for small ν_s. When ν_s increases, the mean field theory predicts that signal and strategy densities converge to a fixed point and do not vary in time. According to the simulations, in contrast, the same interrelated dynamics of signals and strategies observed at low mutation rates in the mean field theory is at work at large mutation rates as well. This is another feature of the dynamics that results from correlations between strategies, which are neglected in the mean field theory.
Finally, in Fig. (SI.1.d), we look at the relation between signal density and cost, and between signal density and fitness. As can be seen, the mean field theory predicts a non-zero density for costly signals. This results from the fact that, in the course of the dynamics, costly signals receive a high payoff which compensates for their cost. More precisely, as the density of costly signals is smaller than that of less costly signals, the fraction of strategies that cooperate with costlier signals is higher. This partially compensates for their cost and leads to a non-zero abundance of costly signals. We note that, without such a mechanism, the dynamics would produce a stationary state in which the density of all signals except the cheapest one is zero. In addition, the mean field theory predicts that signal density is a decreasing function of signal cost (upper panel in Fig. (SI.1.d)) and an increasing function of signal fitness (lower panel in Fig. (SI.1.d)). These features are consistent with agent based simulations of the model.

SI. 3.1 Prisoner's dilemma game
The dependence of the cooperation level on the model parameters in the prisoner's dilemma game is investigated in Fig. (SI.2). The upper part of each panel shows the average fraction of C (blue lines marked by circles) and D (red lines marked by squares) strategies in the population. The lower part of each panel shows the fraction of strategy pairs actually played in the population. These include mutual cooperation CC (blue circles), mutual defection DD (red squares), and heterogeneous cooperation-defection CD (green triangles). The base parameter values are given in Table (SI.2). These are the same parameter values used in the simulations in the main text. In each simulation, one of the parameters of the model is varied as specified in the figure. The fractions of strategies are calculated from a time series of length t = 50000 after discarding the first t = 1000 steps.

The dependence on the temptation T is investigated in Fig. (SI.2.a). Here, T is varied from 0.5 to 2. As R is fixed at 0.6, the prisoner's dilemma corresponds to values of T in the interval [0.6, 1.2]. In the upper part of Fig. (SI.2.a), we see that, with increasing T, the fraction of C strategies decreases and the fraction of D strategies increases. For small T the fraction of C strategies is the larger of the two, and at a certain value of T in the PD range the fraction of D strategies becomes larger. More interesting is how the fraction of strategy pairs actually played in the population changes. As can be seen in the lower part of Fig. (SI.2.a), the fraction of CC pairs decreases with increasing T. For small T this reduction in the density of CC is accompanied by an increase in the density of both DD and CD pairs. For larger T, however, the density of DD pairs levels off while the density of CD pairs continues to increase well into the turn taking dilemma regime (i.e. when the prisoner's dilemma becomes a turn taking dilemma, which happens at T ≥ 2R = 1.2).
In the turn taking dilemma range, the heterogeneous CD pair is in fact the socially optimal solution to the game. Thus, signaling can be a natural way to solve the cooperation dilemma and provides a socially optimal solution in a regime (the turn taking dilemma) where many of the conventional solutions of the cooperation dilemma fail, as conventional solutions can only give rise to homogeneous CC strategies.
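The claim that the CD pair becomes socially optimal in the turn taking dilemma can be checked with the payoffs of Table SI.1. The sketch below compares the average payoff per player of each strategy pair; the function names are hypothetical and the per-player average is just one convenient way to express the social payoff.

```python
def social_payoffs(R, S, T, P):
    """Average payoff per player for each strategy pair."""
    return {"CC": R, "CD": (T + S) / 2, "DD": P}

def socially_optimal_pair(R, S, T, P):
    """Strategy pair with the highest average payoff per player."""
    pay = social_payoffs(R, S, T, P)
    return max(pay, key=pay.get)

PD = dict(R=0.6, S=0.0, T=0.8, P=0.2)    # prisoner's dilemma, Table SI.1
TTD = dict(R=0.6, S=0.0, T=1.4, P=0.2)   # turn taking dilemma, Table SI.1

best_pd = socially_optimal_pair(**PD)     # mutual cooperation
best_ttd = socially_optimal_pair(**TTD)   # heterogeneous pair
```

With S = 0, the CD pair overtakes CC once (T + S) / 2 exceeds R, which is the condition T ≥ 2R quoted above for the onset of the turn taking dilemma.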
The dependence on the number of signals n is investigated in [...] rapidly increases with increasing n up to n = 20. For larger n it continues to increase, but at a slower rate. This shows that signaling is more efficient in resolving the social dilemma in the PD when the number of signals is large enough.
In Fig. (SI.2.c) the dependence of the cooperation level on the maximum cost of signals c_max is investigated. For c_max = 0 signals have no cost. We see that up to a moderate cost, c_max = 0.2, the cost of signals does not adversely affect the level of cooperation. An absolute cost of dc = 0.2 corresponds to a normalized cost (absolute cost divided by the mean payoff of the individuals from the game) of approximately 0.5. This means that agents can invest up to half of their payoff in growing and maintaining costly signals. For larger costs, however, it becomes less profitable for agents to produce costly signals. Consequently, agents resort less to signaling as a way to solve the dilemma. Signal distributions become more homogeneous (concentrated around the cheapest signals), and the level of competition between agents to maximize their payoff by engaging in a signal war decreases. Consequently, the level of D strategies and of mutual defection DD increases.
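The normalized cost used above is a simple ratio, and the clamping of negative net payoffs is the rule stated in the model description. The sketch below shows both; the function names are hypothetical, and the mean game payoff of 0.4 is an illustrative value implied by the quoted numbers (0.2 / 0.5), not a figure reported in the text.

```python
def normalized_cost(absolute_cost, mean_game_payoff):
    """Absolute signal cost divided by the mean payoff earned in the game."""
    return absolute_cost / mean_game_payoff

def net_payoff(game_payoff, signal_cost):
    """Payoff minus cost, set to zero when negative (no offspring)."""
    return max(game_payoff - signal_cost, 0.0)

# Illustrative value only: a mean game payoff of 0.4 makes a cost of 0.2
# correspond to a normalized cost of 0.5.
ratio = normalized_cost(0.2, 0.4)

# A signal costing more than the payoff it earns leaves no net payoff.
clamped = net_payoff(0.3, 0.5)
```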
The dependence on the population size N is investigated in Fig. (SI.2.d). We see that for small populations, increasing the population size decreases the fraction of C strategies and increases the fraction of D strategies. This leads to a reduction of mutual cooperation, while the fraction of CD strategy pairs shows little sensitivity to the population size. For larger populations, the sensitivity of the density of strategies to the population size decreases.
The dependence on the mutation rate in strategies is investigated in Fig. (SI.2.e). For small mutation rates, both the fraction of cooperation and the fraction of mutual cooperation decrease with increasing ν_s. The sensitivity to the mutation rate is stronger for small mutation rates and decreases significantly for large mutation rates.
The dependence on the mutation rate in signals is investigated in Fig. (SI.2.f). Here, the cooperation level changes depending on whether the mutation rate in signals is smaller or larger than the mutation rate in strategies. Both ρ_C and ρ_CC are maximized when ν_σ is comparable to ν_s. In other words, the level of cooperation is maximized when the time scales of the evolution of signals and strategies are comparable, and a gap between the time scales of change of strategies and signals undermines cooperation. When ν_σ ≪ ν_s, strategy densities become fast changing variables which adapt to the slowly changing signal distributions in the population. Such adaptation undermines cooperation, as defective strategies with high fitness are produced more often and grow in number by selection. In the opposite regime, when ν_s ≪ ν_σ, strategies are slowly changing variables compared with the signal production probabilities, and agents can adjust the probabilities with which they produce signals so as to exploit cooperators as much as possible. This reduces the cooperators' payoff and thus decreases the frequency of cooperative strategies and of mutual cooperation. To summarize, when there is a large gap between the time scales of change of strategies and signals, agents can use this gap either to exploit the slowly changing profile of strategies by producing fit signals (when ν_s ≪ ν_σ), or to switch to fitter strategies which defect more often against the prevalent signals (when ν_σ ≪ ν_s). Both effects undermine cooperation, and the cooperation level is maximized when the time scales of change of signals and strategies are comparable.

Figure SI.2. In each panel the dependence on one of the parameters is investigated. T is the temptation, n is the number of signals, c_max is the maximum cost of signals, N is the population size, ν_σ is the rate of mutation in signals, ν_s is the rate of mutation in strategies, and dσ is the strength of mutations in signals. The upper part of each panel shows the fraction of C (cooperation) and D (defection) in the strategy matrix of individuals, averaged over the population. The lower part shows the fraction of strategy pairs actually played in the population. CC refers to mutual cooperation, DD to mutual defection, and CD to the case in which an individual cooperates while the opponent defects. The simulations are run for t = 50000 time steps and an average is taken after discarding the first t = 1000 time steps.
Dependence on the strength of mutations in signals is investigated in Fig

SI. 3.2 Snow drift game
Another game frequently used in studying the evolution of cooperation is the snow drift (SD) game [18]. In contrast to the PD, in the SD a certain fraction of cooperation is maintained at equilibrium in a well-mixed population. For this reason, the evolution of cooperation in this game is not as serious a problem as in the PD. Consequently, the SD has attracted less attention than the PD in studies of the evolution of cooperation. However, the equilibrium level of cooperation in the SD game is smaller than the social optimum. For this reason, this game, just like the PD game, poses a social dilemma, and the mechanisms by which the level of cooperation can be increased in this game have been studied [18]. Many of these mechanisms resolve the social dilemma in the SD by promoting mutual cooperation. However, as both mutual cooperation and heterogeneous cooperation-defection strategy pairs lead to the same social payoff in the SD, mechanisms which allow individuals to coordinate on heterogeneous strategy pairs can be as efficient as mutual cooperation in resolving the social dilemma. Some mechanisms which accomplish this by breaking the symmetry between the players have been identified. For example, if individuals hold a territory, a strategy which defects if its bearer is the owner of the territory and cooperates otherwise can lead to a socially optimal solution of the SD game [18]. Our study shows that signaling can also serve as a fundamental mechanism to resolve or reduce the social dilemma in the SD by promoting heterogeneous cooperation-defection pairs. Such heterogeneous strategies are as efficient as homogeneous cooperation in the SD game.

Figure SI.3. In each panel the dependence on one of the parameters is investigated. N is the population size, n is the number of signals, c_max is the maximum cost of signals, ν_σ is the rate of mutation in signals, ν_s is the rate of mutation in strategies, and dσ is the strength of mutations in signals. The upper part of each panel shows the fraction of C (cooperation) and D (defection) in the strategy matrix of individuals, averaged over the population. The lower part shows the fraction of strategy pairs actually played in the population. CC refers to mutual cooperation, DD to mutual defection, and CD to the case in which an individual cooperates while the opponent defects. The simulations are run for t = 50000 time steps and an average is taken after discarding the first t = 1000 time steps.

Below, we investigate the parameter dependence of the cooperation and partial cooperation levels in the SD game. The dependence of the density of cooperative strategies, ρ_C, and of defective strategies, ρ_D, on the population size is shown in the top panel of Fig. (SI.3.a). As can be seen, the densities of cooperation and defection are both close to 1/2, with the latter slightly larger than the former, for all population sizes. Comparison with the PD (Fig. (SI.2.d)) shows that the level of cooperation is higher in the SD than in the PD. The densities of the strategy pairs actually played in the population are shown in the lower panel of Fig. (SI.3.a). While the densities of both DD and CC strategy pairs decrease with increasing population size, the density of the heterogeneous CD strategy pair increases with population size.
Furthermore, the density of CC strategy pairs is larger than that of DD pairs for all population sizes. These features show that signaling is an efficient mechanism to resolve the social dilemma of cooperation in the SD game, especially in larger populations.
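The statements at the start of this section, that the well-mixed equilibrium of the SD falls short of the social optimum and that CC and CD pairs yield the same social payoff, can be checked numerically with the SD payoffs of Table SI.1. The sketch below uses the standard mixed-equilibrium condition for a well-mixed two-strategy game (equal expected payoffs for C and D); this textbook result and the function names are assumptions brought in for illustration, not part of the model above. Exact fractions avoid floating point noise in the equality checks.

```python
from fractions import Fraction

# SD payoffs from Table SI.1, as exact fractions.
R, S, T, P = (Fraction(x) for x in ("0.6", "0.4", "0.8", "0.2"))

def mixed_equilibrium(R, S, T, P):
    """Cooperator frequency x* at which C and D earn the same expected payoff."""
    return (S - P) / ((S - P) + (T - R))

def mean_payoff(x, R, S, T, P):
    """Mean payoff in a well-mixed population with a fraction x of cooperators."""
    w_c = x * R + (1 - x) * S   # expected payoff of a cooperator
    w_d = x * T + (1 - x) * P   # expected payoff of a defector
    return x * w_c + (1 - x) * w_d

x_star = mixed_equilibrium(R, S, T, P)
equilibrium_payoff = mean_payoff(x_star, R, S, T, P)
cc_payoff = R               # per-player payoff of a CC pair
cd_payoff = (T + S) / 2     # average per-player payoff of a CD pair
```

For these payoffs the equilibrium cooperator frequency is 1/2 and the mean payoff is 1/2, below the 3/5 per player obtained from either a CC or a CD pair.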
The dependence on the number of signals n is investigated in Fig. (SI.3.b). Although for any number of signals both ρ_C and ρ_D are close to 1/2, the density of cooperative strategies slightly increases with n. To see what effect this has on the density of strategy pairs played in the population, we plot the density of strategy pairs as a function of n in the lower panel of Fig. (SI.3.b). As can be seen, the density of the heterogeneous CD strategy pair decreases with increasing n, especially for small values of n. This is accompanied by an increase in the density of mutual cooperation (CC). The variation of the density of strategy pairs with respect to n is very small for larger values of n.
We investigate the dependence of the density of strategies on the maximum signal cost c_max in Fig. (SI.3.c). As can be seen in the top panel, ρ_C and ρ_D are close to 1/2 for all c_max and show little variation with respect to c_max. However, for very large c_max, a slight decrease in the density of cooperative strategies is visible. Turning to the density of strategy pairs in the lower panel of Fig. (SI.3.c), we see that with increasing signal cost the density of heterogeneous strategy pairs increases, while both ρ_CC and ρ_DD decrease. This shows that, in the SD game, individuals can induce their opponent to cooperate when the cost of signals is higher. This was not the case in the PD game, as can be seen by comparing Fig. (SI.2.c) for the PD and Fig. (SI.3.c) for the SD.
In Fig. (SI.3.d) and Fig. (SI.3.e), the dependence on, respectively, the mutation rate in strategies ν_s and the mutation rate in signal production ν_σ is investigated. In the top panels, only a small variation with respect to ν_s is seen in both ρ_C and ρ_D. The behavior of ρ_CC seems to be similar to that in the PD game: it decreases with increasing ν_s (Fig. (SI.3.d)) and is maximized when ν_σ is comparable with ν_s (Fig. (SI.3.e)).
Finally, the dependence on the strength of mutations dσ is investigated in Fig. (SI.3.f). As can be seen in the top panel, the density of cooperative strategies ρ_C increases, while ρ_D decreases, with increasing dσ. In the lower panel, the density of strategy pairs played in the population is plotted as a function of dσ. Here, we see that ρ_CD shows little variation with respect to dσ. On the other hand, the density of mutual cooperation increases and the density of mutual defection decreases with increasing dσ.

SI. 4 Dependence of signal frequency on cost, and on fitness for different values of model parameters
In this section we systematically investigate the dependence of signal density on fitness and on apparent cost for different values of the parameters of the model. We show that in all the cases (all the games and all values of the model parameters) the time average of signal density depends strongly on, and is an increasing function of, the time average of signal payoff. In contrast, the dependence of the time average signal density on the apparent cost of signals is much weaker, and in many cases it is not possible to establish a trend between the two. We begin with the prisoner's dilemma game and then proceed to other games.

SI. 4.1 Prisoner's dilemma game
To show that signal fitness determines signal densities for all parameter values, we run simulations with the base parameter values given in Table (SI.2). In each simulation, however, we change one of the parameters to test whether the trend between signal density and signal fitness found in the main text is valid in all parameter regions. The result is given in Fig. (SI.4) for the PD game. In each panel, one of the parameters is changed and the time average signal density ⟨ρ_σ⟩_t is plotted as a function of the average payoff gathered by that signal, ⟨w^a_σ⟩_t, over the same time window (this can be considered the realized fitness of the signals and is called realized fitness, or fitness of signals, in the following). In the inset of each panel, the time average signal density is also plotted as a function of the normalized apparent cost (defined as the apparent cost divided by the average payoff that individuals receive from the games). In all the cases, when the time average signal density is plotted as a function of the time average payoff of signals, a strong increasing trend is observable. This trend results from the fact that signals are indirectly subject to selection based on their fitness. Consequently, signals with larger fitness are selected more often and assume a higher density in the population. On the other hand, when the time average density of signals is plotted as a function of signal cost (insets), in many cases no trend is observed. For some parameter values a weak decreasing trend is observed. However, in all the cases, the trend between signal density and signal fitness is (much) stronger than that between signal density and signal cost (this is shown more quantitatively below).
Of particular interest are [...] determined by signal fitness in this case too. In Fig. (SI.4.f), the maximum signal cost is c_max = 1. As signal costs are distributed randomly in the interval [0, c_max], most of the signals have an extravagant cost in this case. In fact, noting that the maximum payoff that can be reached in the game is T = 0.8, signal costs can be larger than an agent's payoff. This is seen in the inset of Fig. (SI.4.f), where the normalized cost of signals can be close to 3 (i.e. the signal cost can be three times an agent's average payoff from the game). Obviously, no agent can afford to produce such heavily costly signals. Consequently, for excessively costly signals, the density of signals becomes zero or close to zero.

Figure SI.4. [...] Table (SI.2). In order to investigate the parameter dependence, in each panel one of the parameters is changed (as specified in the figure). As can be seen, in all the cases ⟨ρ_σ⟩_t shows an increasing trend with respect to ⟨w^a_σ⟩_t. For each panel, ⟨ρ_σ⟩_t as a function of the normalized apparent cost of the signals c_σ is shown in the corresponding inset. For some parameter values no trend between ⟨ρ_σ⟩_t and c_σ is seen, and in all the cases the trend between ⟨ρ_σ⟩_t and ⟨w^a_σ⟩_t is much stronger than the trend between ⟨ρ_σ⟩_t and c_σ. Here, the time average is taken over a time window of length t = 15000, starting from the beginning of the simulations.
[Figure caption, beginning truncated] ... Table. (SI.2). As can be seen, in all the cases ρ σ t shows an increasing trend with respect to w a σ t . For each panel, ρ σ t as a function of the normalized apparent cost of the signals c σ is shown in the corresponding inset. For some parameter values no trend between ρ σ t and c σ is seen, and in all the cases the trend between ρ σ t and w a σ t is much stronger than the trend between ρ σ t and c σ . Here, the time average is taken over a time window of t = 10000, starting from the beginning of the simulations.
We also put the parameter dependence of the presence or absence of a trend between signal density and fitness (and between signal density and cost) to a more quantitative test. For this purpose we use two statistical tests for the presence of a trend: Spearman's rank correlation test and the Mann-Kendall trend test [57,58]. The results of the tests are given in Table. (SI.3). Here, Spearman's rank correlation coefficient between time average signal density and normalized apparent cost, r(c, ρ σ t ), and Spearman's rank correlation coefficient between ρ σ t and the time average realized signal fitness w a σ t , denoted as r( w a σ t , ρ σ t ), are given in the first rows. The p value of Spearman's test, p r , and the p value of the Mann-Kendall test, p MK , are also given. A p value smaller than 0.05 is conventionally taken as establishing a trend, and a p value larger than 0.05 as failing to establish one. For each of the model parameters, 3 different values are considered, and r, p r , and p MK (each first for the normalized cost and time average density, and then for time average fitness and time average density) are presented. As can be seen, in most of the cases both tests fail to establish a trend between time average density and normalized apparent cost, while a strong or very strong trend between time average density and time average fitness of signals is established by both tests. In addition, in all the cases, the (increasing) trend between time average density and time average fitness is (much) stronger than the (decreasing) trend between time average density and normalized cost of signals. This means that signal fitness explains signal densities in the population much better than signal costs do, and can be seen as a solution to the puzzle posed by the presence of costly signals.
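The two trend tests named in this section can be sketched as follows. This is a minimal standard-library illustration, not the authors' analysis code. It assumes the Mann-Kendall test is applied to densities ordered by the explanatory variable (cost or fitness), uses the normal approximation without a tie correction, and does not compute the Spearman p value. All function names are hypothetical.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

def mann_kendall(x, y):
    """Mann-Kendall S statistic and two-sided p value for y ordered by x.

    Assumption: normal approximation with no tie correction.
    """
    series = [b for _, b in sorted(zip(x, y))]
    n = len(series)
    s = sum((series[j] > series[i]) - (series[j] < series[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)   # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return s, p
```

Here x would hold the time average fitness (or the normalized cost) of each signal and y the time average density of the same signals. A p value below 0.05 would then be read as establishing a trend, as in the text.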
The time window over which the average is taken differs for different parameters in Table. (SI.3). It is equal to 1000 for ν s , c max , and N (the same for all three values presented in the table), 5000 for ν σ , and 10^4 for dσ . Varying this time window does not affect the overall result, as long as it is larger than the possible time lag between signal density and signal fitness (for signal fitness to affect signal density, the average must be taken over a time window larger than the time lag between the two). This time lag results from the fact that when a signal's fitness increases, it takes some time until enough mutations occur and spread for the density of that signal to increase as well (see Fig. (1) in the main text for an example of a time lag between signal fitness and signal density). For most of the parameter values considered here, this time lag is rather small (less than 100 time steps). However, for small dσ the time lag can be larger, as it takes longer for mutations in signal production to accumulate enough for the signal densities to follow signal fitness. Generally, increasing the averaging time window yields a stronger trend between density and fitness.
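The requirement that the averaging window exceed the lag between signal fitness and signal density can be illustrated with a short sketch. The lag estimator below (maximizing a lagged Pearson correlation) is an assumption made here for illustration. The text does not state how the lag was measured, and the function names are hypothetical.

```python
import math

def time_average(series, window):
    """Mean of the first `window` samples, i.e. an average starting at t = 0."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[:window]) / window

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def estimate_lag(fitness, density, max_lag):
    """Lag (in time steps) that maximizes corr(fitness[t], density[t + lag])."""
    best_lag, best_r = 0, -2.0
    for lag in range(max_lag + 1):
        r = pearson(fitness[:len(fitness) - lag], density[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag

# Synthetic series (not simulation output): density copies fitness 3 steps later.
fitness = [(7 * t) % 11 for t in range(60)]
density = [0, 0, 0] + fitness[:-3]
lag = estimate_lag(fitness, density, max_lag=10)
window = 50
window_is_long_enough = window > lag
```

With real simulation output, `fitness` and `density` would be the per-step fitness and density of one signal, and the window would be chosen well above the estimated lag.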
In , the population size is set equal to, respectively, N = 100 and N = 1600. In all the cases the time window over which an average is taken is set equal to t = 5000. Variation of the time window length does not affect the results and our conclusions are valid for other time durations as long as the time duration is larger than a possible time lag that can exist between signal density and signal fitness.
As in the case of the prisoner's dilemma game studied in the last section, in all the cases, when time average signal density as a function of time average payoff of signals is plotted, a strong increasing trend is observable. This trend results from the fact that signals are indirectly subject to selection based on their fitness. Consequently, signals with larger fitness are selected more often and assume a higher density in the population. On the other hand, when the time average density of signals as a function of signal cost is plotted (insets), in many cases no trend is observed. For some parameter values a weak decreasing trend is observed. However, in all the cases, the trend between signal density and signal fitness is (much) stronger than that between signal density and signal cost. This conclusion is also valid in the case of costless signals plotted in Fig. (SI.6.e). In this case, even though signals are symmetric in that they all are costless, in the course of dynamics, this symmetry breaks and at each time instant, some signals find higher fitness compared to others. Such signals are produced more often. This leads to the increasing trend between signal fitness and signal density observed here.

SI. 4.3 Battle of the sexes game
Here, we check that a strong increasing trend between signal fitness and signal density also exists for different parameter values in the case of the BS game. The procedure is as before: we use the parameter values given in Table. (SI.2) as the base parameter values, but in each simulation one of the parameters is changed to test whether the trend found between signal density and signal fitness is valid in all parameter regions. The result is given in Fig. (SI.7) for the BS game. In each panel, one of the parameters is changed and the time average signal density ρ σ t is plotted as a function of the average payoff gathered by that signal over the same time window, w a σ t . In the inset of each panel, the time average signal density as a function of normalized apparent cost (defined as the apparent cost divided by the average payoff of the individuals from the games) is plotted too. In Fig. (SI.7.a) and Fig. (SI.7.b), the mutation rate in strategies is set equal to, respectively, ν s = 0.001 and ν s = 0.5. In Fig. (SI.7.c) and Fig. (SI.7.d), the mutation rate in signals is set equal to, respectively, ν σ = 0.001 and ν σ = 0.5. In Fig. (SI.7.e) and Fig. (SI.7.f), the maximum cost is set equal to, respectively, c max = 0 and c max = 1. In Fig. (SI.7.g) and Fig. (SI.7.h), the strength of mutation in signals is set equal to, respectively, dσ = 0.01 and dσ = 0.5. In Fig. (SI.7.i) and Fig. (SI.7.j), the number of signals is set equal to, respectively, n = 5 and n = 50. Finally, in Fig. (SI.5.e) and Fig. (SI.5.f), the population size is set equal to, respectively, N = 100 and N = 1600. In all the cases the time window over which an average is taken is set equal to t = 5000. Varying the time window length does not affect the results, and our conclusions are valid for other time durations as long as the duration is larger than a possible time lag between signal density and signal fitness.
[Figure caption, beginning truncated] ... Table. (SI.2). In order to investigate the parameter dependence, in each panel, one of the parameters is changed (as specified in the figure). As can be seen, in all the cases ρ σ t shows an increasing trend with respect to w a σ t . For each panel, ρ σ t as a function of the normalized apparent cost of the signals c σ is shown in the corresponding inset. For some parameter values no trend between ρ σ t and c σ is seen, and in all the cases the trend between ρ σ t and w a σ t is much stronger than the trend between ρ σ t and c σ . Here, the time average is taken over a time window of t = 5000, starting from the beginning of the simulations.
As in the case of the prisoner's dilemma game studied before, in all the cases, when the time average signal density is plotted as a function of the time average payoff of signals, a strong increasing trend is observable. This trend results from the fact that signals are indirectly subject to selection based on their fitness. Consequently, signals with larger fitness are selected more often and assume a higher density in the population. On the other hand, when the time average density of signals is plotted as a function of signal cost (insets), in many cases no trend is observed. For some parameter values a weak decreasing trend is observed. However, in all the cases, the trend between signal density and signal fitness is (much) stronger than that between signal density and signal cost. This conclusion is also valid in the case of costless signals plotted in Fig. (SI.7.e). In this case, even though signals are symmetric in that they are all costless, this symmetry breaks in the course of the dynamics, and at each time instant some signals have higher fitness than others. Such signals are produced more often. This leads to the increasing trend between signal fitness and signal density observed here.
[Figure caption, beginning truncated] ... Table. (SI.2). In order to investigate the parameter dependence, in each panel, one of the parameters is changed (as specified in the figure). As can be seen, in all cases ρ σ t shows an increasing trend with respect to w a σ t . For each panel, ρ σ t as a function of the normalized apparent cost of the signals c σ is shown in the corresponding inset. For some parameter values no trend between ρ σ t and c σ is seen, and in all the cases the trend between ρ σ t and w a σ t is much stronger than the trend between ρ σ t and c σ . Here, the time average is taken over a time window of t = 5000, starting from the beginning of the simulations.

SI. 4.4 Leader game
Finally, we check that the increasing trend between signal density and signal fitness holds for all parameter values also in the case of the leader game. As before, we use the base parameter values given in Table. (SI.2), but in each simulation we change one of the parameters to test whether the trend found between signal density and signal fitness is valid in all parameter regions. The result is given in Fig. (SI.8) for the leader game. In each panel, one of the parameters is changed and the time average signal density ρ σ t is plotted as a function of the average payoff gathered by that signal over the same time window, w a σ t . In the inset of each panel, the time average signal density as a function of normalized apparent cost (defined as the apparent cost divided by the average payoff of the individuals) is plotted too. In Fig. (SI.8.a) and Fig. (SI.8.b), the mutation rate in strategies is set equal to, respectively, ν s = 0.001 and ν s = 0.5. In Fig. (SI.8.c) and Fig. (SI.8.d), the mutation rate in signals is set equal to, respectively, ν σ = 0.001 and ν σ = 0.5. In Fig. (SI.7.e) and Fig. (SI.7.f), the maximum cost is set equal to, respectively, c max = 0 and c max = 1. In Fig. (SI.8.g) and Fig. (SI.8.h), the strength of mutation in signals is set equal to, respectively, dσ = 0.01 and dσ = 0.5. In Fig. (SI.8.i) and Fig. (SI.8.j), the number of signals is set equal to, respectively, n = 5 and n = 50. Finally, in Fig. (SI.5.g) and Fig. (SI.5.h), the population size is set equal to, respectively, N = 100 and N = 1600. In all the cases the time window over which an average is taken is set equal to t = 10000. Varying the time window length does not affect the results, and our conclusions are valid for other time durations as long as the duration is larger than a possible time lag between signal density and signal fitness.
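The one-parameter-at-a-time design described above can be sketched as a small generator. This is an illustrative sketch only. The parameter names are hypothetical identifiers for ν s , ν σ , c max , dσ , n and N, and the base values are left as `None` because Table. (SI.2) is not reproduced in this section.

```python
def one_at_a_time(base, variations):
    """Yield (label, params) pairs in which exactly one entry of `base` is replaced."""
    for name, values in variations.items():
        for value in values:
            params = dict(base)      # copy so the base set is never mutated
            params[name] = value
            yield f"{name}={value}", params

# Varied values as listed in the text for each pair of panels.
variations = {
    "nu_s": [0.001, 0.5],      # mutation rate in strategies
    "nu_sigma": [0.001, 0.5],  # mutation rate in signals
    "c_max": [0, 1],           # maximum signal cost
    "d_sigma": [0.01, 0.5],    # strength of mutation in signals
    "n": [5, 50],              # number of signals
    "N": [100, 1600],          # population size
}

# None stands in for each base value of Table (SI.2), which is not given here.
base = {name: None for name in variations}

runs = list(one_at_a_time(base, variations))
```

Each element of `runs` would then parameterize one simulation, corresponding to one panel of the figure.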
As in the case of the prisoner's dilemma game, in all the cases, when the time average signal density is plotted as a function of the time average payoff of signals, a strong increasing trend is observable. On the other hand, when the time average density of signals is plotted as a function of signal cost (insets), in many cases no trend is observed. For some parameter values a weak decreasing trend is observed. However, in all the cases, the trend between signal density and signal fitness is (much) stronger than that between signal density and signal cost. This conclusion is also valid in the case of costless signals plotted in Fig. (SI.7.e). In this case, even though signals are symmetric in that they are all costless, this symmetry breaks in the course of the dynamics, and at each time instant some signals have higher fitness than others. Such signals are produced more often. This leads to the increasing trend between signal fitness and signal density observed here.
[Figure caption, beginning truncated] ... Table. (SI.2). In order to investigate the parameter dependence, in each panel, one of the parameters is changed (as specified in the figure). As can be seen, in all cases ρ σ t shows an increasing trend with respect to w a σ t . For each panel, ρ σ t as a function of the normalized apparent cost of the signals c σ is shown in the corresponding inset. For some parameter values no trend between ρ σ t and c σ is seen, and in all the cases the trend between ρ σ t and w a σ t is much stronger than the trend between ρ σ t and c σ . Here, the time average is taken over a time window of t = 10000, starting from the beginning of the simulations.

SI. 5 When selection occurs with a probability proportional to the exponential of payoff
In this section, we consider a rather different model, in which selection occurs with a probability proportional to the exponential of payoff. All other details of this model are as before (as explained in the Model Section in this text). The only difference is that in the selection step of the evolutionary dynamics, agents are selected with a probability proportional to the exponential of their payoff. That is, agent α reproduces with a probability proportional to exp(β w α ). Here β can be considered as the strength of selection and determines the degree to which payoff differences determine reproductive success. As has been noted in other models of the evolution of cooperation, this seemingly innocent change can make important differences [27]. For example, cooperation evolves on networks with a certain update rule in a model in which individuals reproduce with a probability proportional to the exponential of their payoff, but not if individuals reproduce with a probability proportional to their payoff.
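The selection rule just described can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a generation of the same size is drawn with replacement, and the function names are hypothetical. Subtracting the maximum payoff before exponentiating is only a numerical-stability device and leaves the probabilities unchanged.

```python
import math
import random

def reproduction_probabilities(payoffs, beta):
    """Probability of each agent reproducing, proportional to exp(beta * payoff)."""
    top = max(payoffs)
    # exp(beta * (w - top)) differs from exp(beta * w) by a common factor only.
    weights = [math.exp(beta * (w - top)) for w in payoffs]
    total = sum(weights)
    return [x / total for x in weights]

def sample_parents(payoffs, beta, rng):
    """Draw one parent index per offspring (assumed: constant population size)."""
    probs = reproduction_probabilities(payoffs, beta)
    return rng.choices(range(len(payoffs)), weights=probs, k=len(payoffs))

# Illustrative payoffs only; a larger beta concentrates reproduction on the fittest agent.
payoffs = [0.2, 0.6, 0.8]
weak = reproduction_probabilities(payoffs, beta=1.0)
strong = reproduction_probabilities(payoffs, beta=50.0)
parents = sample_parents(payoffs, beta=1.0, rng=random.Random(0))
```

For small β the probabilities are nearly uniform, while for large β almost all offspring descend from the highest-payoff agent, which is the sense in which β acts as the strength of selection.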
Remarkably, in our model, this change introduces no qualitative difference in the results: in a model in which individuals reproduce with a probability proportional to the exponential of their payoff, cooperation and partial cooperation evolve. Individuals compete to exploit the existing strategies by producing the signals which are more likely to be cooperated with. They also compete by changing their strategies to pay less of the cost of cooperation, becoming more defective towards the signals which are more frequent in the population. As a result of this signaling war, signals obtain a fitness due to the internal dynamics of the system, and the signal densities are determined by their fitness. The density of signals is explained better by the payoff they gather (i.e. the payoff an individual gathers by showing that signal) than by their apparent cost. Below we look at this model in more detail. In the following, we will occasionally call this model the exponential model to distinguish it from the model considered before.
[Figure caption, beginning truncated] ... These include mutual cooperation CC (blue circles), mutual defection DD (red squares), and heterogeneous cooperation-defection strategy pairs CD (green triangles). The payoff values used in the simulations are the same as those given in Table. (SI.1) and the base parameter values are the same as those given in Table. (SI.2). In each simulation, one of the parameters of the model is varied as specified in the figure. The fractions of strategies are calculated from a time series of the system of length t = 50000 after discarding the first t = 1000 steps.
The dependence on the temptation T is investigated in Fig. (SI.9.a). Here, T is varied from 0.5 to 2. As R is fixed at 0.6, the prisoner's dilemma corresponds to values of T in the interval [0.6, 1.2]. We see that, as T increases, the fractions of C strategies and CC pairs decrease, and the fraction of D strategies increases. The fractions of both C strategies and CC strategy pairs show a sharp decrease at T = R = 0.6. This is the value at which the game becomes a (prisoner's) dilemma. The reduction in the fraction of C strategies is accompanied by an increase in the fraction of D strategies (upper part of Fig. (SI.9.a)), while the reduction in the fraction of CC strategy pairs is accompanied by an increase in both DD and CD strategy pairs (lower part of Fig. (SI.9.a)). In the PD region, the fraction of CD strategy pairs is the largest of the three possible strategy pairs, which means that CD is the dominant strategy pair. This continues to hold well into the turn taking dilemma region (T > 2R), where the heterogeneous CD strategy pair becomes the social optimum (as it yields the largest total payoff compared to the homogeneous CC or DD strategy pairs). Comparing with the model in which individuals reproduce with a probability proportional to their payoff, presented in Fig. (SI.2.a), we see that the main quantitative difference is a slight reduction in mutual cooperation in the exponential model.
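The regions of T referred to above, and the statement that CD becomes the social optimum for T > 2R, can be checked with a short sketch. It uses the PD and TTD payoffs of Table. (SI.1). The function names and the label for T < R are hypothetical, and the comparison of total pair payoffs follows the parenthetical remark in the text.

```python
def region(T, R):
    """Region of the temptation axis, following the intervals quoted in the text."""
    if T < R:
        return "below PD range"       # hypothetical label for T < R
    if T <= 2 * R:
        return "prisoner's dilemma"   # T in [R, 2R]
    return "turn taking dilemma"      # T > 2R

def social_optimum(R, S, T, P):
    """Strategy pair with the largest total payoff to the two players."""
    totals = {"CC": 2 * R, "CD": T + S, "DD": 2 * P}
    return max(totals, key=totals.get)

# Payoffs (R, S, T, P) taken from Table (SI.1).
pd_payoffs = (0.6, 0.0, 0.8, 0.2)
ttd_payoffs = (0.6, 0.0, 1.4, 0.2)

pd_optimum = social_optimum(*pd_payoffs)    # total of CC is 1.2, of CD is 0.8
ttd_optimum = social_optimum(*ttd_payoffs)  # total of CD is 1.4, of CC is 1.2
```

With S = 0, the CD total T + S exceeds the CC total 2R exactly when T > 2R, which matches the boundary of the turn taking dilemma region.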
The dependence on the number of signals n is investigated in Fig. (SI.9.b). We see that the fraction of C strategies (upper part of Fig. (SI.9.b)), and also the fraction of CC and CD strategy pairs (lower part of Fig. (SI.9.b)) increase with increasing n. The main difference with the model in which reproduction is done with a probability proportional to payoff is that the rate of increase in cooperative strategies with increasing n is smaller in the exponential model.
In Fig. (SI.9.c), the dependence of the cooperation level on the maximum cost of signals is investigated. For c max = 0 signals have no cost. We see that both the fraction of C strategies (upper part of Fig. (SI.9.c)) and the fractions of CC and CD strategy pairs (lower part of Fig. (SI.9.c)) decrease with increasing cost. This reduction in the level of cooperation is less rapid for small costs and becomes more rapid as the cost of signals increases. The reason is that for larger costs, it pays less for the agents to engage in signaling wars by producing different costly signals, and they stick more to homogeneous production of the cheapest signals, which are defected against more often.
The dependence of the cooperation level on the population size N is investigated in Fig. (SI.9.d). We see that for small population sizes, increasing the population size while keeping the number of signals n fixed decreases the fraction of C strategies and increases the fraction of D strategies. This reduction in ρ C leads to a reduction of mutual cooperation, while the fraction of CD strategy pairs slightly increases. The sensitivity to the population size is stronger for smaller populations and becomes weaker for larger population sizes.
The dependence on the mutation rate in strategies is investigated in Fig. (SI.9.e). The fractions of C and D strategies show little sensitivity to the mutation rate (upper part of Fig. (SI.9.e)). However, for the very small mutation rates considered here (ν σ = 0.001), both the level of mutual cooperation (CC) and the level of partial cooperation (CD) increase strongly as the mutation rate decreases.
The dependence on the mutation rate in signals is investigated in Fig. (SI.9.f). Similarly to the model in which reproduction occurs with a probability proportional to the payoff, the behavior seems to change depending on whether the mutation rate in signals is smaller or larger than the mutation rate in strategies. Both ρ C and ρ CC are maximized when ν σ is comparable to ν s . In other words, the level of cooperation is maximized when the time scales of the evolution of signals and strategies are comparable, and a gap between these time scales undermines cooperation. When ν σ ≪ ν s , strategy densities become fast changing variables which adapt to the slowly changing signal distributions in the population. Such adaptation undermines cooperation, as defective strategies with high fitness are produced more often and grow in number by selection. In the opposite regime, when ν s ≪ ν σ , agents have more chances to exploit cooperators by changing the probability with which they produce signals. In other words, as strategies are slowly changing variables with respect to the signal production probability, agents can change their signal production probabilities so as to get the most exploitation out of cooperators. This reduces cooperators' payoff and thus decreases the frequency of cooperative strategies and mutual cooperation. To summarize, when there is a large gap between the time scales of change of strategies and signals, agents can use this gap either to exploit the slowly changing profile of strategies by producing fit signals (when ν s ≪ ν σ ), or to switch to fitter strategies which defect more often against the prevalent signals (when ν σ ≪ ν s ). Both effects undermine cooperation, and the cooperation level is maximized when the time scales of change of signals and strategies are comparable.
The dependence on the strength of mutations in signals is investigated in Fig. (SI.9.g). Here, the fractions of strategies seem to show very little sensitivity to dσ .

SI. 5.2 The relation between signal fitness and signal density
In this section, we investigate the relation between signal fitness and signal density in the model in which agents reproduce with a probability proportional to the exponential of their payoff. In Fig. (SI.10), we plot the time average of signal density as a function of the time average signal payoff, for different games. In Fig. (SI.10.a) agents play a PD game, in Fig. (SI.10.b) the game is a TTD, in Fig. (SI.10.c) agents play the SD game, in Fig. (SI.10.d) the game is BS, and finally in Fig. (SI.10.e) the game is the leader game. The time average density as a function of normalized apparent cost is plotted in the insets. Here, the time window over which an average is taken is equal to t = 5000 in all the cases. As can be seen, similarly to the model considered in the main text, when signal densities are plotted as a function of their apparent cost (insets) a confusing picture emerges. Not only are signals with large cost produced, but signal densities also seem to have little or no dependence on their apparent cost. In contrast, when the signal densities are plotted as a function of their average payoff, a strong increasing trend is observed in all the cases. This results from the fact that signals are indirectly subject to selection based on their payoff. Consequently, signals with larger payoff are selected more often and take a higher density. In Table. (SI.4), we put the existence of a trend between signal densities and signal payoffs, and also between signal densities and their apparent cost, to a more quantitative test. For this purpose we use two statistical tests for the presence of a trend: Spearman's rank correlation test and the Mann-Kendall test [57,58]. The results of the tests are given in Table. (SI.4). Here, Spearman's rank correlation coefficient between time average signal density and normalized apparent cost, r(c, ρ σ t ), and Spearman's rank correlation coefficient between ρ σ t and time average signal payoff w a σ t , denoted as r( w a σ t , ρ σ t ), are given in the first rows. The p value of Spearman's test, p r , and the p value of the Mann-Kendall test, p MK , are also given. Here an average over a time window of length t = 5000 is taken (the same quantities presented in Fig. (SI.10) are used for the tests). A p value smaller than 0.05 is conventionally taken as establishing a trend, and a p value larger than 0.05 as failing to establish one. As can be seen, in the case of TTD, SD and BS, both tests fail to establish a trend between signal density and signal cost, and in the case of PD and the leader game a weak trend is established. On the other hand, a very strong trend between signal density and signal payoff is established in all the cases. Furthermore, the trend between signal payoff and signal density is much stronger than the trend between signal density and apparent cost.
[Figure caption, beginning truncated] ... show ρ σ t , as a function of normalized cost (cost divided by mean payoff). When signal densities are plotted against apparent cost, a puzzling picture emerges: not only are costly signals produced, but signal densities also show little or no dependence on their cost. However, when signal densities are plotted against payoff, a strong pattern emerges: signal densities are distributed as an increasing function of their payoff. Here, in all the cases, time averages are taken over a window of length t = 5000.
The trend between signal density and signal payoff holds for other parameter values of the model for all the games considered. As an example, we show here the existence of this relation for different values of the model parameters in the case of the prisoner's dilemma game. For this purpose, for each of the model parameters, 3 different values are considered, and r, p r , and p MK are calculated and presented in Table. (SI.5) (each first for the normalized cost and time average density, and then for time average fitness and time average density). As can be seen, in most of the cases both tests fail to establish a trend between time average density and normalized apparent cost, while a strong or very strong trend between time average density and time average fitness of signals is established by both tests. In addition, in almost all the cases the (increasing) trend between time average density and time average fitness is (much) stronger than the (decreasing) trend between time average density and normalized cost of signals. This means that signal fitness explains signal densities in the population much better than signal costs do, and can be seen as a solution to the puzzle posed by the presence of costly signals.
The only cases in which the tests suggest that the trend between time average density and time average payoff is similar (in magnitude) to the trend between time average density and normalized apparent cost are those of large cost (c max = 0.5) and large β (β = 50). In the case of very high signal costs, such that the cost of signals is comparable to or larger than the average payoff of the individuals, it does not pay for the individuals to produce very costly signals. Consequently, the cost differences between the signals can become more significant than the fitness differences resulting from the internal dynamics of the system, and signal density shows a similar trend (in magnitude) with respect to the cost and the payoff of signals. In the case of very large β , the fittest agents are selected with higher probability, and it can happen that a single individual contributes many offspring to the pool of the next generation. Under such conditions the heterogeneity in the population and the magnitude of the signal war are reduced. Consequently, agents who produce the cheapest signals obtain a higher payoff and are selected more often. This is why a rather strong trend between signal cost and signal density is observed in this case. This predicts that in a situation where small fitness differences can be decisive in selection (such as the case where individuals produce offspring with a probability proportional to the exponential of their payoff, in the large β regime), costly signaling is more difficult to evolve.
[Table SI.4, partially recovered: column headers and the rows for r are missing]
p r (c σ , ρ σ t )         2.8 × 10^-2    4.4 × 10^-1    5.6 × 10^-2    3.3 × 10^-1    2.5 × 10^-2
p r ( w a σ t , ρ σ t )    1.1 × 10^-10   1.4 × 10^-10   4.4 × 10^-7    3.3 × 10^-11   6.2 × 10^-11
p MK (c σ , ρ σ t )        4.7 × 10^-2    4.9 × 10^-1    6.4 × 10^-2    3.1 × 10^-1    2.5 × 10^-2
p MK ( w a σ t , ρ σ t )   1.7 × 10^-7    1.7 × 10^-7    2.1 × 10^-5    2.4 × 10^-7    4.9 × 10^-7
Table SI.4. Trend tests between signal density and signal payoff in the model in which individuals reproduce with a probability proportional to the exponential of their payoff, with different game structures. From top to bottom: Spearman's rank correlation coefficient between apparent cost and time average density of signals r(c σ , ρ σ t ), and between the time average fitness and time average density of signals r( w a σ t , ρ σ t ); p value of the Spearman test between apparent cost and time average density of signals p r (c σ , ρ σ t ), and between time average fitness and time average density of signals p r ( w a σ t , ρ σ t ); p value of the Mann-Kendall test between apparent cost and time average density of signals p MK (c σ , ρ σ t ), and between time average fitness and time average density of signals p MK ( w a σ t , ρ σ t ). Here, an average over a window of length t = 5000 is taken. In all the cases both tests strongly support a trend between fitness and density of signals, but in the case of TTD and SD they fail to establish a trend between the apparent cost and density of signals. In all the cases, the trend between density and fitness is significantly stronger than the trend between apparent cost and density.
In Table. (SI.5), the time window over which an average is taken is equal to t = 5000. The only exception is the case of dσ = 0.01, in which case t = 10000. Varying the time window does not affect the results, as long as it is larger than a possible time lag between signal fitness and signal densities. This time lag results from the fact that when a signal's fitness increases, it takes some time until enough mutations occur and spread for the density of that signal to increase as well. Just as in the model in which selection occurs with a probability proportional to payoff, for most of the parameter values considered here this time lag is rather small (less than 100 time steps). However, for small dσ the time lag can be larger, as it takes longer for mutations to accumulate enough for the signal densities to follow signal fitness.