Learning leads to bounded rationality and the evolution of cognitive bias in public goods games

In social interactions, including cooperation and conflict, individuals can adjust their behaviour over the shorter term through learning within a generation, and natural selection can change behaviour over the longer term of many generations. Here we investigate the evolution of cognitive bias by individuals investing into a project that delivers joint benefits. For members of a group that learn how much to invest using the costs and benefits they experience in repeated interactions, we show that overestimation of the cost of investing can evolve. The bias causes individuals to invest less into the project. Our explanation is that learning responds to immediate rather than longer-term rewards. There are thus cognitive limitations in learning, which can be seen as bounded rationality. Over a time horizon of several rounds of interaction, individuals respond to each other’s investments, for instance by partially compensating for another’s shortfall. However, learning individuals fail to strategically take into account that social partners respond in this way. Learning instead converges to a one-shot Nash equilibrium of a game with perceived rewards as payoffs. Evolution of bias can then compensate for the cognitive limitations of learning.


Payoff function and one-shot game
For the payoff function in equation (4), we note the first and second derivatives of the payoff with respect to the actions, where δ_ij is the Kronecker delta, equal to one if i = j and zero otherwise. By our assumptions about the payoffs, the matrix with elements given by equation (S2), for i = 1, . . . , g, j = i, k = 1, . . . , g, is symmetric and negative definite. According to [36], this implies that a one-shot game with this payoff function has a unique Nash equilibrium.
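As a numerical illustration of this uniqueness argument (with a hypothetical quadratic payoff, not the paper's payoff function in equation (4); the parameters beta, c and gamma below are assumed values), when the matrix of second derivatives is symmetric and negative definite, the first-order conditions are linear and best-response iteration converges to the unique Nash equilibrium:

```python
import numpy as np

g, beta, c, gamma = 4, 1.0, 2.0, 0.3  # hypothetical parameters

# Hypothetical quadratic payoff u_i(a) = beta*sum(a) - (c/2)*a_i^2 + gamma*a_i*sum_{j!=i} a_j.
# Matrix of second derivatives M[i, k] = d^2 u_i / (da_i da_k):
M = gamma * (np.ones((g, g)) - np.eye(g)) - c * np.eye(g)
assert np.allclose(M, M.T) and np.all(np.linalg.eigvalsh(M) < 0)  # symmetric, negative definite

# The first-order conditions are linear, so the unique Nash equilibrium solves M a* = -beta*1
a_star = np.linalg.solve(M, -beta * np.ones(g))

# Best-response iteration a_i <- (beta + gamma*sum_{j!=i} a_j)/c converges to the same point
a = np.zeros(g)
for _ in range(200):
    a = (beta + gamma * (a.sum() - a)) / c
assert np.allclose(a, a_star)
```
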

Approximate actor-critic process
Here we investigate the actor-critic learning dynamics in the vicinity of a one-shot Nash equilibrium, for low rates of learning and small σ, and assuming that g > 1. With a*_i = a*_i(q·) the equilibrium from equation (13), i.e. equations (14, S21) for our special case of equations (2, 3), we define the deviations x_it and y_it. For convenience we use the notation ω for the coefficients, where the expressions on the right-hand side are for the special case. The reason for writing out second-order terms like y_jt z_it in equation (S3) is that they contribute to the covariance below. Because the z_jt are (independent and) normal with mean zero and standard deviation σ, the expectation of the TD error in equation (8), conditional on the x_jt and y_jt, j = 1, . . . , g, is given by equation (S4), which gives the deterministic part of the increment in equation (7). For equations (10, 11), we need the (conditional) covariance of δ_it with the eligibility ζ_it in order to compute the deterministic part of the increment in the learning parameter θ_it; we get the covariance in equation (S5). Because the eligibility from equation (9) is equal to z_it/σ², which has variance 1/σ², becoming large for small σ, terms containing z_it, like ω_11 y_it z_it in equation (S3), contribute to lowest order to the covariance.
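The kind of updates described here can be sketched as a minimal actor-critic loop. Everything numerical below is a hypothetical illustration: the reward function, the parameter values, and g = 2 are assumptions, and the updates are a simplified stand-in for equations (7)-(10), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
g, sigma = 2, 0.1                   # group size and exploration noise (assumed values)
alpha_w, alpha_theta = 0.05, 0.01   # learning rates (assumed values)
theta = np.zeros(g)                 # actor parameters: mean actions
w = np.zeros(g)                     # critic parameters: estimated rewards

def reward(a):
    # Hypothetical stand-in for the perceived one-round rewards,
    # NOT the paper's payoff function: u_i = sum_j a_j - a_i^2.
    return a.sum() - a**2

for t in range(20000):
    z = sigma * rng.standard_normal(g)           # exploration noise z_it
    a = theta + z                                # actions
    delta = reward(a) - w                        # TD error, critic as baseline
    w += alpha_w * delta                         # critic update
    theta += alpha_theta * delta * z / sigma**2  # actor update, eligibility z_it/sigma^2

# theta converges near the one-shot Nash equilibrium of the perceived game
# (here a_i = 1/2, from 1 - 2 a_i = 0), not a longer-term optimum.
```

This illustrates the point made in the abstract: learning of this kind converges toward a one-shot Nash equilibrium of the game with perceived rewards as payoffs.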
To approximate the actor-critic learning dynamics as a vector autoregressive process, we introduce the vector ξ_t = (x_1t, . . . , x_gt, y_1t, . . . , y_gt)ᵀ, with elements ξ_lt, l = 1, . . . , 2g (if we approximate time to be continuous, we instead obtain a multivariate Ornstein-Uhlenbeck process). We then have a VAR(1) process ξ_{t+1} = A ξ_t + ε_t (S6), where the matrix A is expressed using the approximate deterministic dynamics, given in equation (S10) below, and ε_t is a vector of zero-mean, serially uncorrelated stochastic increments. The process is stable (stationary and ergodic) if the eigenvalues of A have modulus less than one (see a textbook on multivariate time series analysis, e.g., [9]). From equations (7, 10, S3), the stochastic increments are given by equation (S7) for i = 1, . . . , g. Except for the special case g = 1, which we do not consider here, the first of these, involving z_jt, is normally distributed to lowest order, but the second, involving products z_jt z_it of two independent normally distributed variables, has a leptokurtic distribution. Nevertheless, for g > 1, numerical simulation of the learning dynamics shows that the equilibrium distribution of the process is approximately multivariate normal. Let P be the variance-covariance matrix of the increment vector ε_t to lowest order, which is given in equation (S11) below. The equilibrium variance-covariance matrix Q of the process ξ_t satisfies Q = A Q Aᵀ + P (S8), which is sometimes called the discrete-time Lyapunov equation. This is readily solved, as the linear system in equation (S12) below, or numerically through iteration. The solution Q was used to generate the comparison in Fig. S1, showing that the approximation is at least reasonable for low rates of learning, which is also illustrated in Fig. S2. We write the matrix A in equation (S6) as a block matrix with g × g matrices as blocks. From equations (7, 10, S4, S5), the blocks are given by A_21 = 0 and A_22 = I_g + α_θ (ω_11 1 + ω_22 I_g),

where I_g is the g × g identity matrix and 1 and 0 indicate g × g matrices with all elements 1 and 0, respectively. The variance-covariance matrix P of the stochastic increments in equation (S7) is also expressed as a block matrix, with P_11 = α_w² ω_1² σ² ((g − 2)1 + I_g) (S11) and P_12 = 0, where 0 and 1 again indicate g × g matrices with all elements 0 and 1, respectively. Using the vectorization operator and the Kronecker product, equation (S8) can be written vec(Q) = (A ⊗ A) vec(Q) + vec(P). (S12) For a stable process, the 4g² × 4g² matrix I_{4g²} − A ⊗ A can be inverted, providing the solution vec(Q) = (I_{4g²} − A ⊗ A)⁻¹ vec(P).
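The two solution routes mentioned above (the linear system via the vec trick, and numerical iteration) can be checked against each other in a small sketch; the matrices A and P below are arbitrary illustrative values, not the blocks of equations (S10, S11):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                  # stands in for 2g
A = rng.uniform(0, 0.9 / n, (n, n))    # arbitrary stable matrix (spectral radius < 1)
B = rng.standard_normal((n, n))
P = B @ B.T                            # arbitrary increment covariance matrix

# vec trick as in (S12): vec(Q) = (I - A kron A)^{-1} vec(P)
Q_vec = np.linalg.solve(np.eye(n * n) - np.kron(A, A), P.flatten()).reshape(n, n)

# fixed-point iteration of Q <- A Q A^T + P
Q_it = np.zeros((n, n))
for _ in range(1000):
    Q_it = A @ Q_it @ A.T + P

assert np.allclose(Q_vec, Q_it)        # both solve the discrete Lyapunov equation
```

SciPy's `scipy.linalg.solve_discrete_lyapunov` computes the same solution directly.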

Comparative statics of the Nash equilibrium
Here we show that, for a Nash equilibrium a*_i(q·), i = 1, . . . , g, satisfying equation (13), the equilibrium actions depend on the qualities as stated in equation (S13), where j ≠ i. To show this we note that equation (13) holds for all qualities, so we can take the partial derivative with respect to q_j, leading to equation (S14) for i, j = 1, . . . , g. If we let A be the matrix with elements A_ij = ∂a*_i/∂q_j, we can write this as equation (S15), where G and F are diagonal matrices and H = β J, with J a matrix with all elements equal to one. Noting that H has rank 1, we can use a result from [37] to write the inverse of G + H. The solution to equation (S15) then follows.

Figure S1. Illustration of how the limiting variance-covariance matrix from equations (S8, S12) is approached by the multivariate distribution of the learning parameters w_i and θ_i after many rounds of learning, for successively smaller rates of learning. The rate of learning is expressed through a time constant, 1/(1 − λ), where λ is the leading eigenvalue of the matrix A from equations (S9, S10). Panel A shows the SD of w_1 (blue) and θ_1 (red) for g = 2, shifted slightly left and right to avoid overlap. The lighter coloured symbols are the limiting predictions from equation (S12) and the bolder symbols show values from individual-based simulations. Approach to the limit means that bold symbols come closer to the corresponding lighter coloured symbols. The different symbol shapes indicate different values of q̄, for group compositions as in Fig. 2 (q̄ = 0, circle; q̄ = 1/2, triangle; q̄ = 1, plus). Panel B shows correlations between w_1 and w_2 (blue) and between θ_1 and θ_2 in the same way (q̄ = 0, circle; q̄ = 1/3, triangle; q̄ = 1, plus). Note that the limiting distribution is degenerate when g = 2, with the limiting correlation between θ_1 and θ_2 equal to 1. Panels C and D illustrate the same thing for g = 3.

Figure S2.
A snapshot of the values of the mean actions θ_1 and θ_2 from simulations of many groups of size g = 2. The individual qualities in each group are q_1 = q_2 = 1. Panel A has the same parameters as the leftmost red plus symbol in Fig. S1B, which corresponds to the highest rate of learning, and panel B has the same parameters as the rightmost red plus symbol in Fig. S1B, which corresponds to the lowest rate of learning. In each panel the line, with slope 1 and intercept 0, shows the limiting relation of a correlation equal to 1. The plus symbol at θ_1 = θ_2 = 1 shows the location of the Nash equilibrium.

As G⁻¹F is a diagonal matrix, we see directly that ∂a*_i/∂q_j < 0 for j ≠ i, and then it follows from equation (S14) that ∂a*_i/∂q_i > 0. For our special case of equations (2, 3), one readily solves equation (13), yielding equation (14) with coefficients e_0, e_1 and e_2. We note that e_1 > 0 and e_2 < 0, so that ∂a*_i/∂q_i > 0 and ∂a*_i/∂q_j < 0 with j ≠ i, which agrees with equation (S13). Also, g = 1 is a special case where e_2 is not relevant, but e_0 and e_1 above apply to this case. The sensitivity of the equilibrium actions to differences in quality between group members can then be written in terms of these coefficients.
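The rank-1 inverse used above (the result from [37]) has the form of the Sherman-Morrison formula. A quick numerical check, with arbitrary illustrative values for the diagonal of G and for β (not values derived from the model):

```python
import numpy as np

rng = np.random.default_rng(2)
g, beta = 3, 0.7                       # arbitrary illustrative values
G = np.diag(rng.uniform(1.0, 2.0, g))  # diagonal matrix, as in the text
one = np.ones((g, 1))
H = beta * (one @ one.T)               # rank-1 matrix H = beta * J

# Sherman-Morrison: (G + H)^{-1} = G^{-1} - beta * G^{-1} 1 1^T G^{-1} / (1 + beta * 1^T G^{-1} 1)
Ginv = np.diag(1.0 / np.diag(G))
denom = 1.0 + beta * (one.T @ Ginv @ one)[0, 0]
inv_sm = Ginv - beta * (Ginv @ one) @ (one.T @ Ginv) / denom

assert np.allclose(inv_sm, np.linalg.inv(G + H))
```
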

Evolution of cognitive bias
First, if the true qualities of group members are q_i, an evolutionary equilibrium for the perceived qualities p_i should satisfy the stationarity condition ∂W_i/∂p_i = 0 for i = 1, . . . , g, where W_i is the Darwinian reproductive value from equation (4). Note that this expression for the derivative takes into account that a Nash equilibrium a*(p·) depends on the perceived qualities of the group members. Using equation (15), this becomes the condition for the derivative in equation (16). Next, replacing the true qualities q_i with the perceived qualities p_i, equation (14) for the Nash equilibrium becomes a*_i(p·) = e_0 + e_1 p_i + e_2 ∑_{j≠i} p_j = e_0 + e_1 p_i + e_2 (g − 1)p̄_{−i}. (S25)