Assortativity in cognition

In pairwise interactions, where two individuals meet and play a social game with each other, assortativity in cognition means that pairs in which both decision-makers use the same cognitive process are more likely to occur than under random matching. In this paper, we show theoretically that assortativity in cognition may arise as a consequence of assortativity in other dimensions. Moreover, we analyze an applied model to investigate the effects of assortativity in cognition on the emergence of cooperation and on the degree of prosociality of intuition and deliberation, the two cognitive processes postulated by dual process theory in psychology. In particular, with assortativity in cognition, deliberation is able to shape the intuitive heuristic toward cooperation, increasing the degree of prosociality of intuition and ultimately promoting overall cooperation. Our findings rely on agent-based simulations, but analytical results are also obtained in a special case. We conclude with examples involving different payoff matrices for the underlying social games, showing that assortativity in cognition can have non-trivial implications for its societal desirability.


Introduction
Assortativity is a broad concept that can be applied to different contexts. In general, assortativity means that individuals are more likely to be engaged in interactions with people that are similar to them along some dimensions. It is related to homophily: the tendency of individuals to associate and bond with similar others (from Ancient Greek: homoû + philíē, 'love of the same') [1,2]. Assortativity is a widespread phenomenon. A large amount of evidence has been collected showing that individuals often stay and interact with similar others, in some form or another: similarities may refer to belonging to the same cultural group, the same social or ethnic group, or the same religion [3]. In network theory, the assortativity coefficient measures the correlation between nodes of similar degree [4]. The effects of assortativity have also been studied extensively, e.g., in genetics [5,6] or for the evolution of cooperation [7,8]. If we think of agents as divided into groups according to some characteristic or action, an index of assortativity can be formalized as the difference between the probability of matching with an individual of a group conditional on belonging to that same group and the probability conditional on belonging to a different one [9]. Preferences may be used to rationalize different types of assortativity [10,11,12].
Dual process theory is a paradigm that has become prominent in cognitive psychology and social psychology over the last thirty years. In the dual process framework, decision making is described as an interaction between an intuitive cognitive process and a deliberative one. Although different approaches emerge from the literature [13,14,15], some common characteristics of the two processes are well established. The intuitive process, also called system 1 or type 1, is fast, automatic, and unconscious, while the deliberative process, also called system 2 or type 2, is slow, effortful, and conscious. In evolutionary terms, the intuitive cognitive process is older than the deliberative one, and it is shared with other animals [16]. The existence of two systems in reasoning and decision making extends to the domain of learning, with associative implicit processes and rule-based explicit processes [17,18].
To the best of our knowledge, assortativity in cognition has not been considered and analyzed in the literature so far. In some cases it is involved, or even implied, but the focus has never been on it. For instance, priming has been shown to affect the activation of cognitive processes [19]; hence interacting partners who are exposed to the same priming are more likely to rely on the same cognitive process. Also, assortativity in actions often implies assortativity in cognition as a byproduct [20].

Sources of assortativity in cognition
Assortativity in cognition may arise as a consequence of assortativity on other dimensions, such as the characteristics of the interaction or the characteristics of the interacting agents.
Let p(D|D) be the probability, for a given agent, of interacting with a deliberating agent given that the agent is deliberating as well. Following the same notation, p(D|I) is the probability of interacting with a deliberating agent given that the agent is deciding intuitively. Let p(I|I) and p(I|D) be defined analogously. There is assortativity in cognition if p(D|D) > p(D|I), which implies, and is implied by, p(I|D) < p(I|I).

State-based assortativity
The characteristics of an interaction (e.g., payoffs, information, complexity of choice) vary across interactions but are often the same, or at least similar, for the individuals in the same interaction. When such characteristics determine the likelihood of deliberation, assortativity in cognition emerges. To fix ideas, consider a case with two states of the world, A and B, that differ in the likelihood that deliberation and intuition are used by agents. State A and state B occur with probabilities p(A) and p(B) = 1 − p(A), respectively. Agents involved in the same interaction make decisions in the same state. In state A an agent decides intuitively with probability k_A, while she deliberates with probability 1 − k_A. Analogously, in state B an agent decides intuitively with probability k_B, while she deliberates with probability 1 − k_B. In this setting, assortativity in cognition arises if and only if the likelihood of intuition differs in the two states, i.e., k_A ≠ k_B (for the proof see Appendix, Subsection A.1).
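This claim is easy to check by Monte Carlo simulation. The sketch below is illustrative only (not the paper's code; the function and variable names are ours): pairs share a state, each member draws intuition or deliberation independently with the state's probability, and we estimate p(D|D) and p(D|I).

```python
import random

def cognition_cond_probs(p_a, k_a, k_b, n=200_000, seed=0):
    """Estimate p(D|D) and p(D|I) when both partners share the state of the world."""
    rng = random.Random(seed)
    dd = d_total = di = i_total = 0
    for _ in range(n):
        k = k_a if rng.random() < p_a else k_b   # shared state of the pair
        me_intuits = rng.random() < k            # my cognitive mode
        partner_deliberates = rng.random() >= k  # partner's mode, same state
        if me_intuits:
            i_total += 1
            di += partner_deliberates
        else:
            d_total += 1
            dd += partner_deliberates
    return dd / d_total, di / i_total

p_dd, p_di = cognition_cond_probs(p_a=0.5, k_a=0.2, k_b=0.8)
# with k_A != k_B, the estimate of p(D|D) exceeds that of p(D|I)
```

With k_A = k_B the two conditional probabilities coincide, in line with the if-and-only-if condition above.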

Type-based assortativity
Agents can have heterogeneous characteristics (e.g., skills, abilities, preferences, knowledge) which may determine the likelihood of deliberation. In this case, when the agents participating in the same interaction tend to share the same characteristics, assortativity in cognition emerges. To fix ideas, consider the case where the population is composed of two types of agents, X and Y, that differ in the likelihood of resorting to deliberation and intuition. The fraction of X agents is equal to q and, consequently, 1 − q is the fraction of Y agents. Type X agents and type Y agents decide intuitively with probability k_X and k_Y, respectively, while they deliberate with the remaining probability 1 − k_X and 1 − k_Y. Let p(X|X) and p(X|Y) be the probability of interacting with a type X agent conditional on being of type X and of type Y, respectively. There is assortativity in types if p(X|X) > p(X|Y), which implies, and is implied by, p(Y|X) < p(Y|Y).
In this setting, if we assume assortativity in types, then assortativity in cognition arises if and only if the likelihood of intuition is different for the two types, i.e., k_X ≠ k_Y (for the proof see Appendix, Subsection A.2).
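The conditional cognition probabilities can also be computed in closed form. The sketch below is illustrative (function and parameter names are ours) and assumes, for concreteness, the standard index of assortativity r of [9]: with probability r the partner has the same type, otherwise the partner is drawn at random from the population.

```python
def cond_cognition(q, r, k_x, k_y):
    """Return p(D|D) and p(D|I) under type-assortative matching with index r."""
    types = {"X": (q, k_x), "Y": (1 - q, k_y)}

    def partner_dist(t):
        # with prob r the partner shares type t, else a random population draw
        return {s: r * (s == t) + (1 - r) * types[s][0] for s in types}

    num_d = den_d = num_i = den_i = 0.0
    for t, (share, k) in types.items():
        # probability that the partner deliberates, given own type t
        pd_partner = sum(p * (1 - types[s][1]) for s, p in partner_dist(t).items())
        den_d += share * (1 - k); num_d += share * (1 - k) * pd_partner
        den_i += share * k;       num_i += share * k * pd_partner
    return num_d / den_d, num_i / den_i

pdd, pdi = cond_cognition(q=0.5, r=0.5, k_x=0.2, k_y=0.8)
# pdd == 0.59 and pdi == 0.41: assortativity in cognition emerges
```

Setting r = 0 (no assortativity in types) or k_X = k_Y makes the two probabilities coincide, matching the result above.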

Learning intuitive cooperation through deliberation
Cooperation is a central feature of human behavior that differentiates Homo Sapiens from other species [21,22]. When people cooperate, they pay a cost to benefit others. The emergence of cooperation as a persistent phenomenon is a major focus of research across different disciplines, such as the social sciences [23] and biology [24]. Indeed, the wide empirical evidence on cooperation is puzzling. For social scientists, it is at variance with the paradigmatic rational self-interested individual known as Homo Economicus, even if other-regarding individuals can have reasons to cooperate [25]. For biologists, competition among individuals is at the basis of natural selection, and this is likely to wipe out cooperators, though it is not necessarily the case [26]. In the literature on evolutionary game theory, great attention has been devoted to the mechanisms through which selection can favor the evolution of cooperation [27,28,29]. Recently, the cognitive basis of cooperative decision-making has also been explored, both experimentally [30,31,32] and through theoretical modeling [20,33]. In the following we show that cognition can play an important role in the evolution of cooperation through the channel of assortativity. By doing so, we exemplify how assortativity in cognition can be incorporated in a fully-fledged model, giving insights on the phenomenon under analysis, namely the emergence of cooperation and the degree of prosociality of intuition and deliberation.

The model
We describe a setting in which agents from a population interact repeatedly in random pairs. There are two possible types of interaction: the one-shot prisoner's dilemma (Table 1(a)), which occurs with probability 1 − p, and the repeated interaction (Table 1(b)), which occurs with probability p. Table 1 reports the payoff earned by the row player, under the assumption that b > c > 0. When the two players in an interaction both play C, they both earn b irrespective of the type of interaction. Similarly, when the two players in an interaction play D, they both earn c irrespective of the type of interaction. When the two players choose different actions, the payoffs depend on the type of interaction: in the one-shot prisoner's dilemma, the defecting player earns b + c and the cooperating player earns 0; in the repeated prisoner's dilemma, both players earn c. The assumption b > c > 0 makes D strictly dominant in the one-shot interaction and C weakly dominant in the repeated interaction. This payoff structure is already used in the literature [20], with the only difference that c is added in every cell to avoid negative values.
Each agent is able to elaborate the rewards obtained in the past when playing the two different actions, cooperation and defection. This information is stored in the memory of agents. When an agent chooses a certain action, she updates the information about the past rewards obtained with that action, keeping unchanged the information about the past rewards obtained with the other action. Indeed, the memory of a generic agent i at time t, m^t_i, is made of two elements: the information about the past rewards obtained in the previous periods when playing cooperation, R^t_{i,C}, and the information about the past rewards obtained in the previous periods when playing defection, R^t_{i,D}:

m^t_i = {R^t_{i,C}, R^t_{i,D}}.

In particular, if agent i plays cooperation at time t, then the agent's memory is updated in the following way:

R^{t+1}_{i,C} = (1 − α) R^t_{i,C} + α R^t_i,   R^{t+1}_{i,D} = R^t_{i,D},

with α ∈ (0, 1] measuring the learning rate and R^t_i being the reward obtained in the last period. Analogously, if agent i plays defection at time t, then the agent's memory is updated in the following way:

R^{t+1}_{i,D} = (1 − α) R^t_{i,D} + α R^t_i,   R^{t+1}_{i,C} = R^t_{i,C}.

This process of memory update, by which we compute a weighted mean between the value stored in memory and the last reward obtained, is a form of reinforcement learning: it can be seen as myopic Q-learning [34], i.e., the case in which agents are not able to make any prediction about the future. We note that, when the learning rate α is equal to one, only the last reward obtained for each action matters. The decision process used by agents relies on either intuition or deliberation, with the latter following a more consequentialist rule (based on best reply) than the former (based on reinforcement learning).
• Under intuition the agent is not able to recognize the type of occurring interaction.
The intuitive decision is based on the information saved in memory. The action with the highest past reward is chosen: cooperation when R^t_{i,C} > R^t_{i,D}, defection when R^t_{i,C} < R^t_{i,D}. In case of a tie, i.e., when R^t_{i,C} = R^t_{i,D}, each action is chosen with one-half probability.
• Under deliberation the agent is able to recognize the type of occurring interaction. The deliberative decision is driven by best response: defection is chosen in the one-shot prisoner's dilemma because it is strictly dominant, while cooperation is chosen in the repeated prisoner's dilemma because it is weakly dominant. In the Appendix (Subsection C.3) we consider a variant in which deliberative decisions are based on myopic Q-learning with finer information, distinguishing between the past performance of cooperation and defection under deliberation in the two types of interaction. We show that qualitatively similar results hold in that case as well.
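The memory update and the intuitive decision rule can be sketched as follows. This is a minimal illustration, not the authors' implementation; the dict-based memory and the function names are our own.

```python
import random

def update_memory(memory, action, reward, alpha=0.5):
    """Myopic Q-learning: only the played action's entry is revised,
    as a weighted mean of the stored value and the last reward."""
    memory = dict(memory)
    memory[action] = (1 - alpha) * memory[action] + alpha * reward
    return memory

def intuitive_choice(memory, rng):
    """Pick the action with the highest past reward; break ties at random."""
    if memory["C"] > memory["D"]:
        return "C"
    if memory["C"] < memory["D"]:
        return "D"
    return rng.choice(["C", "D"])

m = {"C": 1.0, "D": 1.0}
m = update_memory(m, "C", 4.0)   # cooperated and earned b = 4
# m["C"] == 2.5 with alpha = 0.5; m["D"] is unchanged
```

With α = 1, `update_memory(m, "C", 4.0, alpha=1.0)["C"]` equals 4.0: only the last reward per action is retained, which is the case analyzed by the Markov chain below.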
We assume that an agent adopts intuition or deliberation depending on the realization of a random variable. In particular, we let K ∈ [0, 1] denote the probability that an agent responds intuitively, so that 1 − K denotes the probability of deliberation. The cognitive processes adopted by two interacting agents exhibit assortativity, as measured by the parameter A ∈ [0, 1]. Indeed, with probability A there is a single draw of the random variable, which means that the two agents are forced to use the same cognitive process. With probability 1 − A, there are two independent draws of the random variable, one for each agent, whose cognitive processes will be the same or different depending on the realized draws.
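The assortative draw of cognitive modes can be sketched as follows (an illustrative fragment; the function name is ours):

```python
import random

def draw_cognitive_modes(K, A, rng):
    """Return the pair of cognitive modes, 'I' (intuition) or 'D' (deliberation).
    With probability A a single draw is shared by both agents;
    otherwise each agent gets an independent draw."""
    draw = lambda: "I" if rng.random() < K else "D"
    if rng.random() < A:
        mode = draw()
        return mode, mode
    return draw(), draw()

rng = random.Random(0)
pair = draw_cognitive_modes(K=0.5, A=1.0, rng=rng)
# under full assortativity (A = 1) the two modes always coincide
```

With A = 0 the draws are independent, so both agents are intuitive with probability K²; with A = 1 that probability rises to K, which is the source of assortativity in cognition.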
We stress that K is homogeneous and exogenous in our model. This is so because our aim is not to study the evolution of dual process reasoning; rather, we want to focus on the effects of assortativity in cognition given dual process reasoning, for which the literature has already provided evolutionary arguments [35,36,20]. Quite interestingly, we find that in our model the value of K that maximizes the average payoff is often strictly between 0 and 1 (Appendix, Section D).

Results
The findings in this subsection are based on simulations over 5000 periods, with 500 agents, payoffs b = 4 and c = 1, and learning rate α = 0.5. The code in Python is available at https://github.com/EugenioVicario/Assortativity_in_Cognition.
A first result is that the average cooperation rate increases monotonically in the level of assortativity in cognition. The result is depicted in Figure 1, where solid lines represent the average cooperation rate under intuition as assortativity in cognition varies. Since the cooperation rate under deliberation is constant and equal to p, which is depicted with dashed lines in the figure, the result is driven by the increase in the cooperation rate under intuition. When assortativity in cognition emerges through assortativity in types, it also comes with assortativity in behavior [20], at least if types are defined to include actions. When this is the case, it is impossible to disentangle the effect of assortativity in cognition from the effect of assortativity in behavior. Our result suggests that assortativity in cognition is able to promote cooperation per se, also in the absence of other forms of assortativity.
A second result points to the interaction effect between assortativity in cognition and other parameters of the model. In particular, Figure 1 suggests that assortativity in cognition can be a substitute for both the likelihood of repeated interactions, i.e., p, and the recourse to deliberation, i.e., 1 − K. Indeed, when p is quite large there is no room for a significant effect of assortativity in cognition, because repeated interactions are frequent and this, in itself, sustains high rates of intuitive cooperation. Also, when K is small every agent frequently deliberates, which implies that both agents in an interaction are often deliberative, even in the absence of assortativity in cognition.
A third result is an observation that is independent of assortativity in cognition. The average cooperation rate under intuition, for given p and A, increases as K decreases, i.e., the more frequently agents resort to deliberation. Deliberation is able to shape the intuitive heuristic toward cooperation or, in other words, agents learn intuitive cooperation through deliberation.
Finally, a fourth result concerns the role of assortativity in cognition in determining whether intuition is more cooperative than deliberation, a theme that has been hotly debated in the literature [32,37]. In our model, intuition can be more cooperative than deliberation, or vice versa, and assortativity in cognition plays a role in this. Looking at Figure 1, we observe that the average cooperation rate is always higher under intuition than under deliberation when K is quite small or p is quite large. When K is large and p is small, assortativity in cognition matters: intuition is often still more cooperative than deliberation for high values of assortativity, while deliberation turns out to be more cooperative than intuition when assortativity in cognition is small. In this sense, assortativity in cognition helps intuition to be more cooperative than deliberation, in that it enlarges the region of the parameter space where this holds.
In the Appendix we provide robustness checks of our results, considering different entries of the payoff matrix (Subsection C.1) and different learning rates (Subsection C.2).

Markov process
When the learning rate α is equal to one, the behavior of one agent i, given the behavior of all the other agents, can be described through a discrete-time Markov process P, defined on a finite state space S and characterized by a transition matrix T. The state space is made of all the feasible memories of agent i, i.e., all the pairs {R^t_{i,C}, R^t_{i,D}}. The transition matrix describes the probabilities of moving from each state to any other. Transition probabilities depend on the current memory, i.e., the state, the parameters K and p, and the probability of intuitive cooperation of the rest of the population, denoted by x. A probability distribution π defined on S is a vector of probabilities such that Σ_{m ∈ S} π_m = 1, where m ∈ S denotes a memory and π_m the probability that the agent has memory m. A probability distribution is said to be invariant if:

π T = π.

In words, an invariant distribution remains unchanged as the Markov process progresses in time. Since the Markov process has a unique recurrent class, the invariant distribution exists and is unique. Once the invariant distribution is obtained, the probability of cooperation under intuition for agent i is the sum of the probabilities, in the invariant distribution, of the states in which the agent cooperates under intuition, plus half the probabilities of the states with R^t_{i,C} = R^t_{i,D}, where the intuitive response is chosen at random. We denote with x_i the probability of intuitive cooperation in the invariant distribution for agent i. Finally, we introduce the consistency condition: in the long-run equilibrium of the model, the cooperation rate of agent i is equal to the cooperation rate of the other agents, i.e., x = x_i.
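The invariant distribution can be computed numerically. A minimal sketch with NumPy follows (the model's six-state transition matrix would take the place of the hypothetical two-state T below):

```python
import numpy as np

def invariant_distribution(T, tol=1e-12, max_iter=100_000):
    """Left fixed point pi = pi T of a row-stochastic matrix, by power iteration."""
    pi = np.full(T.shape[0], 1.0 / T.shape[0])
    for _ in range(max_iter):
        nxt = pi @ T
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt
    return pi

# two-state sanity check (a toy chain, not the paper's six-state one)
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = invariant_distribution(T)   # pi @ T equals pi, pi == [5/6, 1/6]
```

Power iteration converges here because the chain has a unique recurrent class; the same argument applies to the model's chain.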
For the sake of simplicity, we focus on the case of perfect assortativity. In the Appendix (Subsection B.1) we develop the analysis in detail for this simplifying case of full assortativity, i.e., A = 1.
Figure 2 represents the cooperation rate under intuition, distinguishing between the empirical frequencies obtained through simulations and the theoretical frequencies resulting from the long-run Markov chain analysis. For most values of p and K, the theoretical analysis overlaps with the simulations, with perceptible differences only for cooperation rates that are very close to one. See the Appendix (Subsection B.2) for more details on this.

Bivalence of assortativity in cognition on payoffs
Drawing on the results in Subsection 3.2, one may be tempted to conclude that assortativity in cognition is welfare enhancing. In this section, we show that this conclusion would be an overstatement: the effects of assortativity in cognition on overall welfare are complicated in general, and hence must be evaluated case by case.
In the previous section we focused on the cooperation rate, since the total reward of agents is increasing in it. In the following examples there is no action that is always more cooperative than the other, hence we focus on the average total reward, i.e., the average reward over the whole population along the entire time span.
We replicate the simulations of the previous section, changing the types of interaction in which the agents are involved. For simplicity, we consider each of the two interactions in Table 1 combined with a variant of it in which the two actions are permuted, i.e., the actions have inverted payoff consequences in the two types of interaction. In particular, in Subsection 4.1 we consider two one-shot prisoner's dilemmas, while in Subsection 4.2 we consider two repeated prisoner's dilemmas.

Double one-shot prisoner dilemma
Under deliberation agents choose the dominant action, S in game 2a and F in game 2b. Let p be the probability of game 2b. In this setting, playing the dominated action increases the overall payoff, with the result that miscoordination in behaviors can be beneficial with respect to coordination on the dominant action. Figure 3 shows in (IV) that an increase in assortativity is welfare-increasing when K is low and welfare-decreasing when K is high. To grasp the learning effects contributing to this result, we can focus on pairs with one intuitive agent and one deliberative agent, given that the main effect of assortativity is to reduce the likelihood of such pairs. Consider p > 0.5. As K increases, i.e., agents are more often intuitive, the probability of choosing action F gets larger under intuition (Figure 3, II). Suppose first that the intuitive agent chooses F. With probability p both agents play F, since F is dominant and hence surely chosen by the deliberative agent, yielding no substantial effects on learning. With probability 1 − p, the deliberative agent chooses S because it is dominant, with the result that S performs well and F performs poorly, which makes S more likely to be adopted in the future by both agents. Suppose now that the intuitive agent chooses S.
Analogously, with probability 1 − p both agents play S, with no substantial effect on learning, while with probability p the deliberative agent chooses F, since it is dominant, which triggers a learning effect. Indeed, in the latter case F performs well and S performs poorly, which makes F more likely to be adopted in the future by both agents. Note that S is the welfare-enhancing action when p > 0.5. We also note that the former event is more likely, and the latter less likely, as K increases, because a larger K makes the intuitive player choose F more often (Figure 3, II). To complete the reasoning, we observe that the two learning effects described above are weakened when assortativity in cognition increases, due to the reduction in the likelihood of a pair with one intuitive agent and one deliberative agent. Therefore, an increase in assortativity reduces the likelihood of playing the dominant action when K is low and increases it when K is high (Figure 3, III). Since the dominated action is socially optimal, this leads us to conclude that assortativity in cognition is welfare-enhancing for low values of K and welfare-decreasing for high values of K (Figure 3, IV).

Double repeated prisoner dilemma
Under deliberation, agents choose the weakly dominant action, S in game 3a and F in game 3b. Let p be the probability of game 3b. In this setting, average payoffs are maximized when both players choose the weakly dominant action, while all other outcomes pay the same. Intuitively, greater deliberation, i.e., a lower K, is beneficial because it makes agents choose the weakly dominant action (Figure 4, I); the average payoff also increases for extreme values of p, close to either 0 or 1 (again Figure 4, I), because intuitive agents also choose the weakly dominant action most of the time (Figure 4, II).
As already pointed out in the previous subsection, assortativity in cognition decreases the probability of interaction between an intuitive agent and a deliberative one, thus increasing the probability of interaction between two intuitive agents and between two deliberative agents.
On the one hand, an increase of assortativity yields a direct effect on payoffs in that the increased likelihood of two deliberative agents interacting together allows an easier coordination on the weakly dominant action.
On the other hand, there are other effects triggered by learning. To grasp these learning effects, we focus again on pairs with an intuitive agent and a deliberative one. Consider p > 0.5. The most likely occurrence here is that agents play game 3b, which happens with probability p, and that the intuitive agent plays action F (Figure 4, II). Since the deliberative agent surely chooses F as well, they obtain the highest payoff b, which increases the likelihood of playing action F in the future. The least likely occurrence is that agents play game 3a, which happens with probability 1 − p, and that the intuitive agent plays action S (Figure 4, II). Since the deliberative agent surely chooses S as well, they obtain the highest payoff b, which increases the likelihood of playing action S in the future. Since F is the weakly dominant action in the more likely game, given p > 0.5, the former effect is stronger than the latter. To complete the picture, there are two other cases in which the intuitive agent plays the dominated action, yielding no substantial effect on learning because both players earn a payoff equal to c, even if for different actions. Overall, an increase in assortativity in cognition leads to a decrease in the rate at which intuitive agents play the action that is dominant in the interaction occurring with higher probability (Figure 4, III). In turn, this has a negative impact on average payoffs, and this impact is greater for extreme values of p, close to either 0 or 1 (again Figure 4, III). It turns out that, for extreme values of p, this negative indirect effect through learning more than offsets the positive direct effect on payoffs, resulting in the blue areas in Figure 4, IV.

A Models of assortativity in cognition
In the following two subsections we provide simple models where assortativity in cognition arises as a consequence of state-based assortativity (A.1) and type-based assortativity (A.2).

A.1 State-based assortativity
There are two states of the world, A and B, that occur with probabilities p(A) and p(B) = 1 − p(A), respectively. We assume p(A) ∈ (0, 1). Agents involved together in an interaction are in the same state of the world, i.e., there is full assortativity in the state of the world. Suppose that in state A there is a probability k_A of intuition and a probability 1 − k_A of deliberation. Analogously, k_B and 1 − k_B are the probabilities of intuition and deliberation, respectively, in state B. We denote with p(D|D) the probability for an agent, conditional on being deliberative, to interact with an agent who is deliberative as well. Following the same notation, p(D|I) is the probability to interact with a deliberative agent, conditional on being intuitive. Assortativity in cognition occurs when:

p(D|D) > p(D|I). (2)

From inequality (2), it follows that p(I|D) < p(I|I). Applying Bayes' formula and the definition of conditional probability, inequality (2) can be rewritten as:

[p(A)(1 − k_A)^2 + p(B)(1 − k_B)^2] / [p(A)(1 − k_A) + p(B)(1 − k_B)] > [p(A)k_A(1 − k_A) + p(B)k_B(1 − k_B)] / [p(A)k_A + p(B)k_B]. (3)

Both denominators are positive whenever intuition and deliberation each occur with positive probability, so cross-multiplying gives:

[p(A)(1 − k_A)^2 + p(B)(1 − k_B)^2][p(A)k_A + p(B)k_B] > [p(A)k_A(1 − k_A) + p(B)k_B(1 − k_B)][p(A)(1 − k_A) + p(B)(1 − k_B)].

Expanding both sides, the terms in p(A)^2 and p(B)^2 coincide and cancel, and the inequality reduces to:

p(A)p(B)[k_B(1 − k_A)^2 + k_A(1 − k_B)^2 − (k_A + k_B)(1 − k_A)(1 − k_B)] > 0,

where the term in square brackets factors as (k_A − k_B)^2. The inequality can thus be reduced to:

p(A)p(B)(k_A − k_B)^2 > 0.

In conclusion, since p(A) ∈ (0, 1), there exists assortativity in cognition, as defined in inequality (2), if and only if k_A ≠ k_B.
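The reduced condition can be checked numerically against the direct conditional probabilities (an illustrative sketch; the function name is ours):

```python
from itertools import product

def p_dd_minus_p_di(p_a, k_a, k_b):
    """Direct computation of p(D|D) - p(D|I) in the state-based model."""
    p_b = 1 - p_a
    p_dd = (p_a * (1 - k_a) ** 2 + p_b * (1 - k_b) ** 2) / (
        p_a * (1 - k_a) + p_b * (1 - k_b))
    p_di = (p_a * k_a * (1 - k_a) + p_b * k_b * (1 - k_b)) / (
        p_a * k_a + p_b * k_b)
    return p_dd - p_di

# the sign agrees with the reduced condition p(A)p(B)(k_A - k_B)^2 > 0
for p_a, k_a, k_b in product([0.3, 0.7], [0.2, 0.5, 0.8], [0.2, 0.5, 0.8]):
    gap = p_dd_minus_p_di(p_a, k_a, k_b)
    assert (gap > 1e-12) == (k_a != k_b)
```

The gap is strictly positive exactly when k_A ≠ k_B, and zero (up to floating-point noise) otherwise, as the algebra above predicts.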

A.2 Type-based assortativity
The population comprises two types of agents, X and Y. The fraction of X agents is equal to p(X) and, consequently, p(Y) = 1 − p(X) is the fraction of Y agents. We assume p(X) ∈ (0, 1). The two types differ in the probability to respond intuitively or deliberately. Assume that type X agents respond intuitively with probability k_X while they deliberate with probability 1 − k_X. The probabilities of intuitive and deliberative responses for type Y agents are, respectively, k_Y and 1 − k_Y. Assume that there exists assortativity in types, i.e., the probability to interact with a type X agent is greater for a type X agent than for a type Y agent:

p(X|X) > p(X|Y). (8)

From inequality (8), it follows that p(Y|X) < p(Y|Y). As in the previous subsection, assortativity in cognition is defined as:

p(D|D) > p(D|I).

Applying again Bayes' formula and the definition of conditional probability, and proceeding analogously as for state-based assortativity, the inequality holds whenever p(X) ≠ 0, p(Y) ≠ 0, and k_X ≠ k_Y. In conclusion, if there is assortativity in types and the probability of deliberation is different for the two types, then assortativity in cognition emerges.

B Theoretical analysis

B.1 Markov chain
To describe the transition probabilities from state to state, we first need the probabilities of obtaining each possible reward with cooperation and defection, conditional on the state. With α = 1 and d denoting b + c, the feasible memories are the pairs {R_C, R_D} with R_C ∈ {0, c, b} and R_D ∈ {c, d}. The state space can be partitioned into three subsets. In the first one, labeled S_1, there are the states in which R_C > R_D, so that intuition chooses cooperation; in S_2 there are the states with R_C = R_D, where intuition randomizes between the two actions; in S_3 there are the states with R_C < R_D, where intuition chooses defection. Recall that, under full assortativity (A = 1), both agents in a pair are intuitive with probability K and deliberative with probability 1 − K, and that x denotes the probability of intuitive cooperation in the rest of the population.

In S_1 the agent cooperates under intuition, so the probabilities of the different rewards are:

P(R_C ← b) = Kx + (1 − K)p, P(R_C ← c) = K(1 − x)p, P(R_C ← 0) = K(1 − x)(1 − p), P(R_D ← c) = (1 − K)(1 − p).

In S_3 the agent defects under intuition, so the probabilities of the different rewards are:

P(R_D ← d) = Kx(1 − p), P(R_D ← c) = K[xp + (1 − x)] + (1 − K)(1 − p), P(R_C ← b) = (1 − K)p.

In S_2 the intuitive action is randomized with probability one half, so the probabilities are the corresponding mixtures of the two cases above. Starting from these probabilities, it is straightforward to build the transition probabilities between states. To give an example, we focus on the transitions from the state {c, c} ∈ S_2. The probability of a one-step transition from {c, c} to {0, d} or {b, d} is equal to zero: at least two steps are required to change both rewards stored in memory. The probability of the transition from {c, c} to {c, d} is equal to (K/2)x(1 − p), while the probabilities of the transitions from {c, c} to {0, c} and {b, c} are, respectively, (K/2)(1 − x)(1 − p) and (K/2)x + (1 − K)p. Finally, the probability of remaining in {c, c} is equal to the probability of obtaining again c from cooperation plus the probability of obtaining again c from defection, i.e., (K/2)(1 − x)p + (K/2)[xp + (1 − x)] + (1 − K)(1 − p). See Figure 5 for a graphical representation.
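Under this reading of the {c, c} row (with α = 1 and A = 1), the transition probabilities can be sketched and sanity-checked as follows; the function name and the dict representation are ours:

```python
def row_from_cc(K, p, x):
    """Transition probabilities out of the tie state {c, c} (alpha = 1, A = 1);
    x is the population rate of intuitive cooperation, d stands for b + c."""
    to_cd = (K / 2) * x * (1 - p)        # intuitive D meets a cooperator, one-shot game
    to_0c = (K / 2) * (1 - x) * (1 - p)  # intuitive C meets a defector, one-shot game
    to_bc = (K / 2) * x + (1 - K) * p    # mutual cooperation rewrites R_C with b
    stay = 1 - to_cd - to_0c - to_bc     # every other outcome rewrites a c with c
    return {"{c,d}": to_cd, "{0,c}": to_0c, "{b,c}": to_bc, "{c,c}": stay}

row = row_from_cc(K=0.6, p=0.4, x=0.5)
# the four probabilities form a distribution over the reachable states
```

A check that each row of the full transition matrix sums to one, as stated below, is a useful guard when assembling all six states.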
In the transition matrix, the entry T_{m_i m_j} represents the probability of a transition from state m_i to state m_j. The sum of the entries along every row is equal to one.
As stated above, the Markov chain has a unique recurrent class and thus the invariant distribution exists and is unique. The probability of each state in π is a function of the probability of intuitive cooperation x:

π(x) = (π_{0,c}(x), π_{c,c}(x), π_{b,c}(x), π_{0,d}(x), π_{c,d}(x), π_{b,d}(x)).

The probability of intuitive cooperation is equal to the probability of being in a state belonging to S_1 plus half the probability of being in a state belonging to S_2.
For the consistency condition we should have:

x_i = x. (13)

Solving Equation 12 together with the consistency condition in Equation 13, we obtain the equilibrium values of x_i and x. For all values of K ∈ (0, 1) and p ∈ (0, 1), x = 1 is always a solution. When all agents intuitively cooperate, the reward of cooperation remains higher than the reward of defection whatever the type of interaction and the cognitive mode chosen. When large values of K are associated with low values of p, a second solution emerges strictly between 0 and 1.
The analysis carried out so far is for given aggregate behavior, focusing on the values of x where the resulting individual behavior is consistent with, i.e., coincides with, the aggregate behavior. When this does not happen, it is natural to ask how the aggregate behavior evolves in response to individual behaviors disconfirming it. We posit that the aggregate behavior decreases over time when x_i < x, while it increases when x_i > x. Avoiding the burden of introducing a formal dynamical model, we rely on this assumption and on the observation from Figure 6 on the shapes of x_i(x) to conclude that: when only the x = 1 solution exists, it is an attractor; when another solution also exists, this latter solution is attractive and x = 1 is no longer so.

B.2 Analytical vs. simulative results
In Figure 7 we observe that there is a discrepancy between analytical and simulative results. In fact, simulations match analytical results when the cooperation rate under intuition is equal to one and when it is small enough. When the cooperation rate under intuition is close to 1, but different from 1, simulations tend to overestimate it. Overestimation occurs with perfect assortativity when, at some point t in time, we have that R^t_C > R^t_D for all agents. In the following periods, all agents always cooperate under intuition. The system reaches the equilibrium with x = 1 and, unless perturbations are introduced, the system cannot leave such an equilibrium. When two equilibria exist and the minimum x in those equilibria is close to 1, it is possible that, during the process of convergence to the equilibrium with x < 1 (which is the attractor), the dynamic reaches the equilibrium with x = 1 due to stochastic realizations of intuitive behavior as cooperation, which is quite likely since x is close to 1. In Figure 7 we show that the greater the number of agents, the lower the overestimation, because it becomes less likely that all realized behaviors are cooperative. The payoff matrices of the two types of interaction are equivalent to the games described by [20], in that they are obtained from theirs by adding c to every entry to avoid negative payoffs.

C Robustness analysis

C.1 Payoff matrix
In the simulations presented in the paper we consider b = 4 and c = 1, as in [20]. Results are qualitatively similar for different values of b and c: cooperation rates under intuition monotonically increase as assortativity in cognition increases. Moreover, the lower p, the lower the cooperation rate; likewise, the greater K, the lower the cooperation rate. Furthermore, we notice that the greater b is, holding c constant, the greater the rate of cooperation.
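For concreteness, the one shot prisoner dilemma payoffs can be written out with the shift by c that the text describes (adding c to every entry of the games in [20] to avoid negative payoffs). This is an illustrative reconstruction under that assumption, not the paper's code:

```python
def shifted_pd(b=4.0, c=1.0):
    """One shot prisoner dilemma payoffs after adding c to every entry.
    Keys are (own action, opponent's action); values are own payoff."""
    return {('C', 'C'): b,      # mutual cooperation: b - c, plus shift c
            ('C', 'D'): 0.0,    # exploited cooperator: -c, plus shift c
            ('D', 'C'): b + c,  # temptation payoff: b, plus shift c
            ('D', 'D'): c}      # mutual defection: 0, plus shift c

m = shifted_pd()
print(min(m.values()))  # 0.0 -- no negative payoffs after the shift
```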

C.2 Q-Learning
The standard formulation of the Q-learning update is

Q t (x, a) = (1 − α t ) Q t−1 (x, a) + α t [r t + γ V t−1 (y t )],

where x is the state and a is the action performed, α t is the learning rate, r t is the reward obtained in period t, and V t−1 (y t ) is the future value of the state y t that can be reached by playing action a in state x, discounted by the parameter γ [34]. In our formulation, γ is equal to zero, representing the situation in which agents are myopic, i.e., unable to make any prediction about future rewards. Since our results in the main text are given for α = 0.5, here we explore their robustness by considering different learning rates. Figure 9 shows the cooperation rate attained under intuition for α taking values 0.25, 0.75, and 1, as K, p, and A change. In particular, K and p range from 0.2 to 0.8 in steps of 0.2, while A ranges from 0 to 1 in steps of 0.1. We notice that the quality of the results does not vary as α changes: cooperation rates under intuition monotonically increase as assortativity in cognition increases. Moreover, the lower p, the lower the cooperation rate; likewise, the greater K, the lower the cooperation rate. More precisely, we observe that a larger weight given to past information, i.e., a smaller α, leads to greater cooperation under intuition and, hence, overall.
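The update above, with γ = 0, reduces to an exponential moving average of past rewards. A minimal sketch:

```python
def q_update(q_prev, reward, alpha=0.5, gamma=0.0, future_value=0.0):
    """Standard Q-learning update. With gamma = 0 (myopic agents, as in
    the paper) the future-value term drops out and the stored value is
    an exponential moving average of realized rewards."""
    return (1 - alpha) * q_prev + alpha * (reward + gamma * future_value)

print(q_update(2.0, 4.0))             # 3.0 with alpha = 0.5
print(q_update(2.0, 4.0, alpha=1.0))  # 4.0: only the last payoff is kept
```

With α = 1 the memory keeps only the last payoff obtained, which is the case discussed in the next subsection.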

C.3 Q-Learning under deliberation
In this subsection we introduce a learning process for deliberation as well. Agents are characterized by three different memories. The first is the memory through which agents take an intuitive decision, m t i , which comprises two elements: one is the information about the past rewards obtained in previous periods when playing cooperation, R t i,C ; the other is the information about the past rewards obtained in previous periods when playing defection, R t i,D . The second and third memories are instead used to take decisions under deliberation, one for each of the two games. The second memory, m t i,0 , comprises two elements: one is R t i,C,0 , a statistic of the payoffs obtained in the past when agent i cooperates under deliberation in the repeated prisoner dilemma; the other is R t i,D,0 , a statistic of the payoffs obtained in the past when agent i defects under deliberation in the repeated prisoner dilemma. Agent i at time t, when choosing under deliberation in the repeated prisoner dilemma, takes the action associated with the highest between R t i,C,0 and R t i,D,0 . In Figure 10 we plot the evolution in time of the average rate of cooperation under deliberation when playing the one shot and the repeated prisoner dilemma for different values of the parameters K, A, and α. The parameter p is constant and equal to 0.5 to maintain symmetry between the two games. Colors correspond to different values of K, as reported in the legend. In each subplot, for each color, there are two lines, one increasing and the other decreasing. The increasing line is the average cooperation rate under deliberation for the repeated prisoner dilemma, while the decreasing one is the average cooperation rate under deliberation for the one shot prisoner dilemma.
When α is lower than one, i.e., equal to 0.5 or 0.75, agents quickly learn to cooperate under deliberation when they play the repeated prisoner dilemma, while they defect when playing the one shot interaction. In these cases simulations are qualitatively the same as in the baseline model: after a few iterations, agents cooperate with probability one when they deliberate in the repeated interaction, and with probability zero when they deliberate in the one shot interaction. When α = 1, agents learn to defect under deliberation in the one shot prisoner dilemma, but they are not able to completely learn to cooperate in the repeated prisoner dilemma under deliberation. This is because cooperation in the repeated interaction is only weakly, not strongly, dominant. Thus, agents who keep in memory only the last payoff obtained, as happens with α = 1, are often indifferent between the two actions and choose randomly.
In general, we observe that the greater K is, the slower the learning process under deliberation, because deliberation is less frequent. Furthermore, the learning process is quicker in the one shot interaction than in the repeated one, which is probably due to the different types of dominance (strong as opposed to weak) in the two types of interaction. Moreover, higher assortativity and a smaller learning rate speed up the learning process.
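The deliberative choice rule described above, including the random tie-breaking that matters under α = 1 and weak dominance, can be sketched as follows (an illustrative sketch of the rule, not the paper's implementation):

```python
import random

def deliberate(r_coop, r_defect, rng=random):
    """Deliberative choice: pick the action with the higher remembered
    reward statistic; break ties uniformly at random. With alpha = 1
    only the last payoff is stored, so under weak dominance ties are
    frequent and the random branch fires often."""
    if r_coop > r_defect:
        return 'C'
    if r_defect > r_coop:
        return 'D'
    return rng.choice(['C', 'D'])  # indifference: randomize

print(deliberate(4.0, 1.0))  # 'C'
print(deliberate(1.0, 4.0))  # 'D'
```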

D The optimality of dual process reasoning
As we argue in the paper, our aim is not to study the evolution of dual process reasoning. Rather, we assume that agents are dual process reasoners, for which the literature has already provided evolutionary arguments, and we focus on the effects of assortativity in cognition on cooperation. Nevertheless, it is interesting to notice that the value of K that, for given p and A, maximizes the overall level of cooperation is often strictly between 0 and 1. This means that a population of dual process reasoners would perform better than a population of purely intuitive or purely deliberative agents.
A number of remarks can be made by looking at Figure 11. First, we notice that the value of K that maximizes total cooperation increases as p increases. In other words, a higher probability of repeated interactions, for which cooperation performs better than defection at the individual level, makes deliberation less important for maximizing cooperation. Second, the higher the assortativity in cognition, the greater the optimal value of K: when assortativity is high, deliberation is more effective in shaping the heuristics, and thus less deliberation is needed to sustain cooperation. Third, we notice that populations of purely deliberative agents (i.e., K = 0) are never optimal, while populations of purely intuitive agents (i.e., K = 1) can be optimal, which happens when p is high enough. These remarks appear not to be fully general from the figure (for instance, the lines are not always monotonically increasing), but this may be an artifact of the rather limited number of simulations, also considering that the difference in cooperation rates between the maximizing K and the K attaining the second-highest cooperation rate is tiny.
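Operationally, finding the cooperation-maximizing K amounts to a grid search over K for fixed p and A. A minimal sketch, where `cooperation_rate` is a hypothetical stand-in for one run of the agent-based model at a given K:

```python
def best_k(cooperation_rate, k_grid):
    """Return the K on the grid that maximizes the (simulated) total
    cooperation rate; `cooperation_rate` stands in for a model run."""
    return max(k_grid, key=cooperation_rate)

# Toy objective with an interior maximum, mimicking the finding that
# the optimal K is often strictly between 0 and 1:
grid = [i / 10 for i in range(11)]
print(best_k(lambda k: k * (1 - k), grid))  # 0.5
```

In practice, each evaluation would average several simulation runs, since the tiny differences near the maximizing K noted above make a single noisy run unreliable.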
b) that occurs with probability p. Two actions are available in both interactions, namely cooperation, C, and defection, D. When the two players in an interaction play C, they both earn b irrespective of the type

Figure 1 :
Figure 1: Average cooperation rate varying assortativity in cognition. Each subplot refers to a specific value of K. Solid lines represent the average rate of cooperation under intuition; dashed lines represent the average cooperation rate under deliberation, i.e., the value of p. Each color refers to a specific value of p.
R t i,C belongs to {0, c, b} and analogously R t i,D belongs to {c, d}, where d = b + c. Hence the memory of each agent belongs to the Cartesian product of the column vector [0, c, b] and the row vector [c, d].

Figure 2 :
Figure 2: Solid lines are the theoretical frequencies obtained through the long-run Markov chain analysis. Dots are the empirical frequencies obtained through simulations with 500 agents, 5000 time periods, and A = 1.

Figure 3 :
Figure 3: (I) Average reward with A = 0; (II) Rate of intuitive play of action F when A = 0; (III) Difference in the rate of intuitive play of action F when A = 1 and A = 0; (IV) Average reward with A = 1 minus average reward with A = 0.

Figure 4 :
Figure 4: (I) Average reward with A = 0; (II) Rate of intuitive play of action F when A = 0; (III) Difference in the rate of intuitive play of action F when A = 1 and A = 0; (IV) Average reward with A = 1 minus average reward with A = 0.
The states are divided into three groups: in the first group S 1 there are the states in which R t i,C > R t i,D ; in the second group S 2 , the states in which R t i,C = R t i,D ; and in the third group S 3 , the remaining states in which R t i,C < R t i,D . The probabilities of obtaining a certain reward with cooperation and defection depend on the parameters K, p, and x.

Figure 5 :
Figure 5: Transition probabilities from state {c, c}. Colors blue, violet, and red denote states belonging to S 1 , S 2 , and S 3 , respectively.

Figure 6 :
Figure 6: Colored lines represent x i (x) for A = 1 and different values of p. The intersection with the 45° line identifies the equilibria satisfying the consistency condition x i = x. Each subplot refers to a different value of K.

Figure 7 :
Figure 7: Three different values of the population size M. Solid lines represent the theoretical frequencies obtained through the long-run Markov chain analysis. Dots represent the empirical frequencies obtained through simulations with 500 agents, 5000 time periods, and A = 1.

Figure 8 :
Figure 8: Average cooperation rate varying assortativity in cognition. Each subplot refers to a specific value of K and b. Solid lines represent the average rate of cooperation under intuition; dashed lines represent the average cooperation rate under deliberation, i.e., the value of p. Each color refers to a specific value of p.
(a) One shot prisoner dilemma. (b) Repeated prisoner dilemma.

Figure 9 :
Figure 9: Average cooperation rate for different levels of assortativity in cognition. Each subplot refers to a specific value of K and α. Solid lines represent the average rate of cooperation under intuition; dashed lines represent the average cooperation rate under deliberation (which coincides with p). Each color refers to a specific value of p.
The third memory is m t i,1 , and it similarly comprises two elements: R t i,C,1 and R t i,D,1 , which are statistics of past payoffs, for cooperation and defection respectively, when agent i has deliberated in the one shot prisoner dilemma. Both memories used for deliberation, m t i,0 and m t i,1 , are updated following the same procedure as m t i .

Figure 10 :
Figure 10: Evolution over time of the average rate of cooperation under deliberation when playing the one shot and the repeated prisoner dilemma for different values of the parameters K, A, and α.

Figure 11 :
Figure 11: Values of K that maximize total cooperation, varying the parameter p. Lines with different colors correspond to different values of A.