## Introduction

Second, prior expectations (i.e., descriptive norms, the perceptions of what most people do27,28,29) inform decision making by generating predictions about the behaviour of others (e.g., I will cooperate if this person is likely to reciprocate). Individual differences in initial expectations about others may lead to individual differences in choices30, and it is conceivable that different age groups have varying prior expectations31. However, prior expectations have hardly been studied in developmental populations, despite being important determinants of cooperative behaviours.

Third, adjusting behaviour requires expectations to be updated in response to new information. Updating of expectations in social environments can be captured by reinforcement learning (RL) models (e.g.,32,33,34), in which learning is driven by differences between expected and received rewards (i.e., prediction errors). Adolescence is characterized by substantial improvements in flexible learning and quick adaptation to novel non-social contexts35,36,37; whether this extends to the social domain, however, is still unclear (but see38).

Here we examine experimentally how children, adolescents, and adults adjust to social environments that differ in their level of cooperation, and aim to provide a mechanistic explanation by evaluating the role of social preferences, prior expectations, and expectation updating. To achieve this goal, we deployed a set of economic games, together with behavioural analyses and computational reinforcement learning modelling. Our cross-sectional sample spanned from late childhood into early adulthood (8 to 23 years old, N = 244). Participants played age-appropriate versions of two well-studied incentivized economic games: A Trust Game (Fig. 1b) and a Coordination Game (Fig. 1d). These two games involve key types of cooperative behaviours: trust and coordination. Trust is key for mutually beneficial cooperation to be initiated and sustained (e.g.,12,19,39), and for achieving beneficial outcomes for all interaction partners involved. Yet, trust also creates a hazard of being betrayed. Similarly, coordinating one’s behaviour with others is often critical for collective welfare, even though outcomes may not always equally benefit all interaction partners40,41.

The two games consisted of repeated one-shot interactions in which both players had to choose between two options. In each trial of both the Trust Game and the Coordination Game, participants encountered one new anonymous player from either a Cooperative environment or an Uncooperative environment. The decisions of these players had been recorded in a previous session with age-matched unfamiliar others (see “Methods”, Pre-test). Participants were told that players from different environments could differ in their tendency to choose X (see Fig. 1b,d). To maximise their earnings, participants had to learn over the course of the game which environment was Cooperative and which environment was Uncooperative, and adjust their choices accordingly. That is, in the Trust Game they had to learn in which environment the typical behaviour was choosing X (labelled the ‘Trustworthy environment’) and in which environment the typical behaviour was choosing Y (labelled the ‘Untrustworthy environment’). Participants maximized their monetary outcomes by trusting (i.e., choosing A) Trustworthy others and withholding trust (i.e., choosing B) from Untrustworthy others. Similarly, in the Coordination Game, participants had to learn in which environment players tended to choose X (labelled the ‘Friendly environment’) and in which environment they tended to choose Y (labelled the ‘Unfriendly environment’). Participants maximized their outcomes by coordinating with the response of the others, i.e., by accepting a disadvantage (i.e., choosing A) when interacting with the ‘Unfriendly environment’, or accepting an advantage (i.e., choosing B) when interacting with the ‘Friendly environment’. The social environments in these games were probabilistic: cooperative behaviours were displayed by 73% of the players in the Cooperative environments, and by 27% of the players in the Uncooperative environments.

Participants also played an iterative Ultimatum Game (UG) and Dictator Game (DG), which allowed us to estimate participants’ social preferences (i.e., advantageous and disadvantageous inequality aversion; see “Methods”). We separately assessed participants’ prior expectations of the behaviour of others before the start of the Trust Game and Coordination Game (see “Methods”). Furthermore, we used computational reinforcement-learning models42 to model the updating of expectations between interactions. In these models, the learning rate quantifies how much an expectation violation modifies our subsequent expectations and consequently our decision making. We allowed learning rates to decay over the course of the games because we expected that most of the learning about the environments would happen in the first set of trials. After that, behaviour would stabilize, provided the environments did not change their behaviour (for more on learning rates and environmental stability see43,44,45). We extended these reinforcement learning models to account for the measured prior expectations and social preferences32, and compared the parameters of these models across age cohorts (see “Methods”).

We hypothesized that participants would be able to learn to adjust their behaviours to social environments differing in their level of (non)cooperation, but that across adolescence this ability would improve rapidly. We expected that these developmental differences could be explained by a combination of (1) social preferences (i.e., age-related changes in levels of advantageous and disadvantageous inequality aversion), (2) prior expectations (i.e., age-related changes in expectations about others’ trustworthiness and tendencies to prioritise their own payoffs over those of others) and (3) updating of expectations (i.e., age-related changes in learning rates).

## Results

### Learning to adjust to cooperative and uncooperative social environments across age

First, we examined decisions over the course of the games to assess whether children, adolescents, and young adults adjust their behaviour to social environments with different levels of cooperation. For this, we used the Trust Game, in which participants maximized their monetary outcomes by trusting Trustworthy others and withholding trust from Untrustworthy others (Fig. 1b), and the Coordination Game, in which participants maximized their outcomes by coordinating with the response of the others, i.e., by accepting an advantage when interacting with the ‘Friendly environment’ and accepting a disadvantage when interacting with the ‘Unfriendly environment’ (Fig. 1d). We fitted a binomial generalized linear mixed model (GLMM) per game to participants’ binary choices, including social preferences and prior expectations of others’ behaviour as predictors (see “Methods”).

For the Trust Game, results indicated an accelerated change in adolescence in which people differentiated more between the Trustworthy and Untrustworthy environment (environment × age linear, B = − 0.307, P < 0.001; environment × age quadratic, B = 0.205, P = 0.015; N = 244; see Table S1 for full statistical analysis; Fig. 2a). Post-hoc tests per social environment showed that trusting the Trustworthy others increased rapidly in early to mid-adolescence (age linear, B = − 0.384, P = 0.006; age quadratic, B = 0.316, P = 0.020). In contrast, adjusting to Untrustworthy others improved slightly and monotonically across adolescence (age linear, B = 0.233, P = 0.031).

For the Coordination Game, results again indicated that with age, people differentiated more between the Friendly and Unfriendly environment (environment × age linear, B = − 0.458, P < 0.001; N = 202; see Table S3 for full statistical analysis; Fig. 2b). Post-hoc tests per social environment showed that optimally coordinating to the Unfriendly environment (i.e., participants accepting their disadvantage) increased across adolescence (age linear, B = − 0.446, P < 0.001). However, coordinating to the Friendly environment (i.e., participants accepting their advantage) did not change with age; participants from all age cohorts adjusted quickly to this environment. Together, these results show that people coordinated to both environments, but younger participants were less likely to accept a disadvantage than older participants.

### Social preferences and prior expectations

Social preferences (advantageous and disadvantageous inequality aversion) and prior expectations of others’ behaviour are features that may account for age-related changes in learning to adjust to different social environments. Before further testing their relation to behaviour in the Trust Game and Coordination Game, we first examined the age-related changes in these parameters. Robust linear regression analyses (5000 bootstraps) indicated that only disadvantageous inequality aversion changed across age (Fig. 3a–d). Specifically, older participants were, compared to younger participants, less averse to being behind (age linear, B = − 0.098, β = − 0.308, P < 0.001, 95% CI = [− 0.139, − 0.057], N = 244). We did not observe significant age-related change for advantageous inequality aversion (age linear, B = − 0.099, β = − 0.118, P = 0.133, 95% CI = [− 0.229, 0.031], N = 202), nor for prior expectations of others’ trustworthiness (age linear, B = − 0.048, β = − 0.085, P = 0.209, 95% CI = [− 0.124, − 0.027], N = 245) or for prior expectations of others’ tendency to prefer to have more than the other (age linear B = − 0.031, β = − 0.076, P = 0.209, 95% CI = [− 0.087, − 0.024], N = 245).

In a binomial GLMM analysis, advantageous and disadvantageous inequality aversion were related to choices in the games (see Tables S1 and S3 for full statistical analysis). Greater disadvantageous inequality aversion was associated with overall fewer trusting choices (B = 0.215, P = 0.012) and with fewer choices in which participants accepted a disadvantage (B = 0.134, P = 0.048). In addition, greater advantageous inequality aversion was associated with greater acceptance of a disadvantage (B = 0.172, P = 0.009). In contrast, prior expectations were not related to choices in either game.

### Computational modelling of updating expectations

To understand how children, adolescents, and young adults update their expectations in different social environments, we developed computational models that extend basic reinforcement learning models46. In our models, participants use the outcome of interactions to update their expectations of their interaction partners’ choices in each social environment (Fig. 4a–c). The extent to which these expectations are updated is reflected in a learning rate (λ). Besides quantifying the updating of expectations, this computational approach allows us to confirm the role of social preferences as observed in our behavioural analyses. We extended the basic reinforcement model by i) incorporating mean cohort-level social preferences to calculate a subjective value of interaction monetary outcomes (Fig. 4a,c) that drives decision making, and ii) by allowing learning rates (expectation updating) to exponentially decay over trials of the game. Thus, we fitted four variants of this model (with and without social preferences; with and without decaying learning rates) to our experimental data for each age cohort and each game, to allow estimating different parameters (learning rates, expectation updating) across cohorts per game (see “Methods”).
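The expectation-updating mechanism described above can be illustrated with a minimal sketch. It assumes a simple delta-rule update of the expected probability that a partner chooses X, and an exponential decay of the form λ_t = λ_0 · exp(−k · t). The decay parametrization, the starting expectation `p0`, and all function names are illustrative assumptions and not the fitted model reported in this paper.

```python
import math


def decayed_learning_rate(lam0, decay, trial):
    """Learning rate on a 0-indexed trial, decaying exponentially (assumed form)."""
    return lam0 * math.exp(-decay * trial)


def update_expectation(p, partner_chose_x, lam):
    """Delta-rule update of the expected probability that a partner chooses X."""
    outcome = 1.0 if partner_chose_x else 0.0
    prediction_error = outcome - p  # difference between observed and expected
    return p + lam * prediction_error


def simulate(observations, p0=0.5, lam0=1.0, decay=0.3):
    """Return the expectation after each observed partner choice.

    p0, lam0 and decay are hypothetical values chosen for illustration.
    """
    p = p0
    trajectory = []
    for trial, chose_x in enumerate(observations):
        lam = decayed_learning_rate(lam0, decay, trial)
        p = update_expectation(p, chose_x, lam)
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    # A mostly cooperative environment with one inconsistent choice on trial 4.
    choices = [True, True, True, False, True, True]
    for trial, p in enumerate(simulate(choices), start=1):
        print(f"trial {trial}: expected P(X) = {p:.3f}")
```

With `decay=0` the learning rate stays constant, so a single surprising outcome late in the game shifts the expectation as much as an early one. With `decay > 0`, late surprises have progressively less influence, which corresponds to expectations stabilizing over trials.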

For both the Trust Game and the Coordination Game, a comparison of model fits provided strong support for models extended with social preferences (Fig. 4d), confirming the results from our behavioural analyses that social preferences impact decision making. The best models also included decaying learning rates (see Fig. 4d and Table S7). For the Trust Game, we observe that for the 8–11 year-olds, estimated learning rates are constant over the course of the game, suggesting that in late phases, individuals in this youngest age cohort still updated their expectations of the behaviour in the different social environments. In the older age cohorts, learning rates start high (around λ = 1; asymptote not shown in Fig. 4e) and decay over trials, indicating that expectations take form relatively early in the game, and remain relatively stable later on. For the Coordination Game (Fig. 4f), we observe a similar pattern: older participants tended to show the strongest decay in learning rates over trials, whereas participants from the younger cohorts tended to update their expectations more early in the game.

## Discussion

Here, we examined children’s, adolescents’, and adults’ ability to learn to adjust to social environments that differ in their level of cooperation. We examined the role of social preferences (inequality aversion), prior expectations about others’ behaviour, and the updating of expectations as potential mechanisms underlying this behaviour. To this end, participants played a series of economic games with groups of age-matched unfamiliar others, which captured two important cooperative behaviours: trust and coordination. Our results show a striking developmental asymmetry in learning to adjust to (un)cooperative environments: people adjust well to environments that require uncooperative behaviour (i.e., withholding trust, accepting an advantage) from a young age, yet only during adolescence do they learn to adjust to environments that require cooperative behaviours (i.e., trusting, accepting a disadvantage). Our results provide several insights into the mechanisms that explain these age-related differences.

First, age-related differences in learning to adjust to cooperative behaviours can be partly explained by differences in social preferences. Specifically, older participants showed lower levels of disadvantageous inequality aversion which explained their higher levels of cooperative behaviours in a Trust Game and Coordination Game. That is, younger participants are less willing to cooperate (trust, accept a disadvantage) given that they are more averse to potential non-cooperation of the other player. Moreover, our computational models confirmed that participants’ decisions were best captured by a reinforcement learning model extended with social preferences in all age bins. Note that in our current RL modelling approach social preferences influence choice behaviour through a subjective transformation on the participants’ expected payoffs in these games. This way, the RL models show e.g., how disadvantageous inequality aversion (dislike of being behind) can reduce the ability to adjust to environments that require cooperative behaviours. Future work should explore whether other factors underlie the age-related changes in social preferences and their mediating effects on cooperative (adjusting) behaviours. For example, the willingness to punish disadvantageous outcomes, or trying to force the other to coordinate in your favour may be alternative motivations underlying these effects. However, those questions are better answered by using experimental designs in which participants play multiple rounds with the same person, rather than a series of one-shot games. Taken together, our results underline that for understanding age-related changes in social decision making it is critical to understand the development in social preferences, which differ across developmental windows and largely drive social decision making.

A potential mechanism that may relate to the influence of inequality aversion on decision making is behavioural control21. Behavioural control refers to the ability to control thoughts and actions in order to regulate behaviour towards (long-term) goals47,48. Developmental studies have shown that behavioural control undergoes protracted development due to a prolonged maturation of underlying neural circuitry in regulatory brain regions including the prefrontal cortex48,49,50. In turn, this would result in developmental changes in responses to inequality into childhood and presumably into adolescence24,51. An experiment in children also confirmed a direct role of behavioural control in behaviour that benefits others: taxing children with a response inhibition task resulted in less prosocial behaviour and more costly punishment of violations of fairness52. An alternative explanation is that inequality may evoke stronger emotional responses, such as increased levels of anger19,53. This would yield a different view on social preferences in which responses to inequality can be based on emotion regulation ability. Future studies are necessary to further disentangle whether such self-regulatory behaviours drive the development of social preferences and their influence on cooperative behaviours. Consequently, an interesting question for future studies is whether strengthening self-regulatory processes is a promising pathway for stimulating cooperative behaviour in young people.

Besides social preferences, we also examined how people’s prior expectations of others’ trustworthiness and inclination to take more than others influenced learning in different social environments. Our results indicated that reported prior expectations of others’ behaviour were stable across age cohorts. This is surprising given the consistently reported increase in cooperativeness across age (e.g.,5,12,13,14), which was also observed in the current experiment. This suggests that there is a developmental mismatch between prior expectations and the actual levels of cooperation. Moreover, contrary to our hypotheses, we did not find effects of prior expectations on learning to adjust to different social environments. Perhaps people do not have strong prior expectations about others’ behaviour in the anonymous games used in the current study, and any expectations they might have are overridden quickly by outcomes of interactions. Presumably, effects of prior expectations in the current setup would be more prominent in a more heterogeneous sample with greater diversity in, for example, life-history backgrounds. For instance, prior expectations (as well as the updating of these expectations) may be different for people who have grown up in an environment where rewards and punishments are unpredictable. This may be particularly the case for children who have experienced harsh and inconsistent discipline, maltreatment and neglect54,55. These expectations of others’ behaviour may match their environmental experiences and, as such, they may engage in social situations differently. Thus, when assessing the generalizability of our results it would be important to include a more heterogeneous sample with greater diversity in life-history backgrounds. Including different populations could also help answer the question of to what extent prior expectations about behaviour in games reflect prior expectations about cooperative behaviour in the real world (e.g.,56).

Here we used computational modelling to quantify how quickly children and adolescents updated their expectations based on choice outcomes in previous interactions. Interestingly, when placed in a new social environment, people were initially highly sensitive to behaviours of other players, and quickly adapted their behaviour to the outcomes they experienced. For older ages, behaviour stabilized after a few interactions as signalled by a decrease in learning rate. Children and young adolescents, however, continued to react to the choices of others across the games. That is, they often switched strategies after a surprising response from one of the environments. This finding indicates that during adolescence, people more effectively integrate outcomes over time, and consequently form stable expectations of others based on their behaviour, which are not quickly overridden by a single experience. Building lasting relations may crucially depend on this integrated information of others’ behaviour. Although the continuous expectation updating of children and adolescents hampers their learning in stable environments, this actually may provide an advantage in fast-changing or unpredictable environments57. That is, in such environments, immediately responding to changing feedback is more beneficial than sticking to prior expectations36. Whether fast-updating better fits children’s and adolescents’ experienced social environments is an interesting question for future studies.

In the current study, participants were confronted with choices from actual peers and real-life consequences of their actions for all interaction partners. This two-directional approach, rather than often-used one-way decision making, is acknowledged as an important aspect of paradigms in social sciences58. However, the controlled social environments in our study are less complex than real-life social interactions, in which factors such as social status, culture, or reputation may complicate social decision making. Future studies, e.g., field studies or studies using virtual reality, could aim to further approach the complexity of real-life social interactions, while retaining experimental control. In addition, we included a specific experimental set-up of social learning in which participants were given prior information on the different social environments. Future studies will need to assess whether our developmental findings hold in settings where participants need to figure out base rates of cooperativeness and exploitation on their own. Another limitation of the current study is that whereas social preferences were revealed preferences, prior expectations were stated expectations about others. People find it hard to estimate probabilities, and future studies need to assess the validity of these preferences with individual difference measures. Moreover, although IQ did not differ between groups and did not influence any of our findings, our adult participants were mainly recruited through university advertisements. Future studies should aim for a representative sampling strategy in each age cohort. A final limitation of the current study is its cross-sectional design, as longitudinal studies are necessary to identify developmental patterns. Therefore, developmental interpretations of behavioural results and the underlying mechanisms remain speculative.

In sum, we combined computational learning models and experimental social manipulations to demonstrate age-related changes in adjusting cooperative behaviours. Well-developed social skills are essential for succeeding in society and for long-term positive outcomes. The ability to adapt to different social environments and discern who we should trust and cooperate with may benefit short-term outcomes, but may also foster social relationships and restrain behavioural and mental health problems in the long term1,2,3,4. Knowledge of how such social adjustment behaviour manifests in different developmental stages informs which ages are important developmental phases for monitoring social development, and which ages are potentially more receptive to interventions2,59.

Our study has shown that the adjustment of cooperative behaviours develops rapidly in early adolescence. Improvements in adjustment to different social environments are driven by developing social preferences (waning disadvantageous inequality aversion) and increasingly effective updating of own behaviour in response to others’ behaviour. Early adolescence would, therefore, be a key target window for interventions aimed at stimulating cooperative and well-adjusted social behaviour. Moreover, these findings provide important starting points for interventions for youth with maladaptive social tendencies, such as youth with conduct disorder problems60,61.

## Methods

### Participants

A total of 269 participants (58.4% female) between ages 8 and 23 years took part in this study. Participants were recruited from a primary school (n = 60), two secondary schools (n = 128), and through local advertisements at a university campus (n = 81) in the western and middle parts of The Netherlands. The majority of the participants (92.3%) were born in the Netherlands, and a minority was born elsewhere (Morocco 1.4%; all other countries < 1%), or information was missing (1.4%). Twenty participants from secondary schools (ages 14–16) were excluded due to technical problems with saving the learning data. Four participants were excluded because they did not finish the cognitive behavioural measures, and therefore IQ could not be estimated. The final sample consisted of 245 individuals aged between 8 and 23 years.

For analyses using age cohorts (see Computational modelling in the section below), we divided the sample into four roughly equally-sized age cohorts: 8–11 year-olds (n = 54, 46.3% female, mean age 10.6, SD 0.9), 12–14 year-olds (n = 73, 52.1% female, mean age 13.4, SD 0.7), 15–18 year-olds (n = 57, 59.6% female, mean age 17.0, SD 1.3), and 19–23 year-olds (n = 61, 80.3% female, mean age 21.1, SD 1.4). A χ²-test indicated sex differences between age cohorts ($$\chi^{2}_{(3)}$$ = 16.6, P = 0.001), with more females in the oldest age cohort. IQ was estimated using a speeded version of the Raven Standard Progressive Matrices62. The estimated IQ scores were largely within the normal range, varying between 79 and 136 (mean IQ = 106, SD = 10.3), and did not differ significantly between age cohorts (F(3,237) = 2.18, P = 0.090) or sexes (F(1,237) = 0.28, P = 0.770). Additional analyses showed that sex differences and IQ did not confound performance on the social games, and did not influence any of our observed age-related changes therein (see Tables S2 and S4).

### Pre-test

A key component of the economic games used in the current study is that choices have consequences not only for oneself, but also for the other player. To ensure this, we performed a pre-test at a separate high school and in a separate adult sample (both in The Netherlands). The pre-test participants functioned primarily as matches for determining the participants’ outcomes, thereby creating a true social consequence of behaviour.

In total, 82 adolescents and 44 adults were asked to make one choice (X or Y) for each social game (Trust Game and Coordination Game, see Fig. 1). We randomly linked each participant in the full experiment with one pre-test participant. This match and the combined outcomes of their choices determined the outcome (number of points) for the participant, as well as for the pre-test participant. The pre-test participants had a similar lottery ticket procedure to the participants from the full experiment, i.e., points were lottery tickets with which they had a chance of winning a 10 Euro gift voucher. All pre-test participants received similar instructions to the participants of the main study. That is, it was stressed that their choices would have consequences for themselves and another participant, since their outcomes would result from their combined choices.

### Economic games: Trust Game and Coordination Game

Participants completed two incentivized economic games: a Trust Game and a Coordination Game (Fig. 1). Each game was composed of 30 trials in total: each trial was a one-shot game with a new anonymous player (whose decision had been recorded in the pre-test; see above). Every trial, the participants chose between two options (A or B) to distribute points between themselves and the other. After their decision, they could see the choice of the player (X or Y) and the outcomes for themselves and the player. Outcomes for self and the player resulted from their combined choices, as shown with payoff matrix $$\left[ \begin{array}{cc} \mathbf{a}, a^{\prime} & \mathbf{b}, b^{\prime} \\ \mathbf{c}, c^{\prime} & \mathbf{d}, d^{\prime} \end{array} \right]$$ where, in each cell, the bold entry (without apostrophe) indicates the payoff for self and the entry with an apostrophe indicates the payoff for the other.

In each of the games, the two social environments consisted of 20 players each (but note that participants interacted with only 15 players per environment). Environments were formed based on pre-test responses, which were matched to create a ‘Cooperative’ social environment (73% cooperative choices, i.e., 11 out of 15) and an ‘Uncooperative’ social environment (27% cooperative choices, i.e., 4 out of 15) (Fig. 1). Over the course of the game trials, participants could learn the tendency of choosing X for each environment of other players and adjust their responses accordingly. Participants were incentivised by associating their performance with the chance of winning a gift voucher (see Supplementary Information for the instruction protocol).

The Trust Game (Fig. 1b) was characterized by payoff matrix $$\left[ \begin{array}{cc} 3,3 & 1,5 \\ 2,2 & 2,2 \end{array} \right]$$. Participants could maximise their earnings by choosing A (‘trust’; top row) when matched with a member of the Trustworthy environment, and choosing B (‘not-trust’; bottom row) when matched with a member of the Untrustworthy environment. The Coordination Game (Fig. 1d) was characterized by payoff matrix $$\left[ \begin{array}{cc} 2,3 & 0,0 \\ 0,0 & 3,2 \end{array} \right]$$. Participants could maximize their earnings by coordinating to their partners’ choices. That is, the participant needed to accept a disadvantage (choose A; top row) when matched with a member of the ‘Unfriendly’ environment, but when matched with a member of the ‘Friendly’ environment the participant needed to accept an advantage (choose B; bottom row).
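To see why these are the earnings-maximizing choices, the expected own payoff of each option can be worked out from the two payoff matrices and the 73%/27% composition of the environments. The sketch below is illustrative only: it indexes the partner's choice by matrix column (left or right) rather than by the X/Y labels, and `p_left`, the probability that the partner picks the left column, is a name introduced here for illustration.

```python
# Each cell is (payoff for self, payoff for other).
# Rows: participant chooses A (top) or B (bottom).
# Columns: partner's choice, in the left/right order of the matrices in the text.
TRUST_GAME = [[(3, 3), (1, 5)],
              [(2, 2), (2, 2)]]
COORDINATION_GAME = [[(2, 3), (0, 0)],
                     [(0, 0), (3, 2)]]


def expected_self_payoffs(game, p_left):
    """Expected own payoff of A and B when the partner picks the left column
    with probability p_left (an illustrative parameter)."""
    return tuple(
        p_left * row[0][0] + (1 - p_left) * row[1][0]
        for row in game
    )


def best_choice(game, p_left):
    """Option with the higher expected own payoff."""
    ev_a, ev_b = expected_self_payoffs(game, p_left)
    return "A" if ev_a > ev_b else "B"


if __name__ == "__main__":
    # Trust Game: left column is the trustworthy response (3, 3).
    print("Trustworthy   (p_left=0.73):", best_choice(TRUST_GAME, 0.73))
    print("Untrustworthy (p_left=0.27):", best_choice(TRUST_GAME, 0.27))
    # Coordination Game: left column leaves the participant at a disadvantage (2, 3).
    print("Unfriendly    (p_left=0.73):", best_choice(COORDINATION_GAME, 0.73))
    print("Friendly      (p_left=0.27):", best_choice(COORDINATION_GAME, 0.27))
```

In the Trust Game, trusting a Trustworthy partner yields 0.73 × 3 + 0.27 × 1 = 2.46 points in expectation, against a certain 2 points for not trusting. In the Untrustworthy environment, trusting yields only 0.27 × 3 + 0.73 × 1 = 1.54 points.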

The order of these two games was counterbalanced across participants. Within each game, participants played 30 trials, 15 trials with each environment of players (e.g., Trustworthy and Untrustworthy environment). The inconsistent choices within an environment (e.g., Y when playing with someone from the environment that prefers X) were distributed across trials and fixed on trials 4, 8, 12, and 14. Within a game, the order of interactions with the two environments was randomly determined but identical for all participants.

Although our main research questions center on the factors specific to learning to adjust behaviour in different social environments (e.g., the role of prior expectations about others, and getting more or less than others), we also included a non-social learning task to examine the level of behavioural adjustment in a simple learning context (Figure S1 and Tables S5-S6). In this non-social learning task, participants played with computers as interaction partners, and only the participant—not the computers—could receive payoffs. A formal comparison between age-related changes in learning to adjust to non-social versus social environments is included in the Supplementary Information. A computational modelling approach on the non-social learning task is discussed in Figure S5.

### Social preferences

We measured advantageous inequality aversion and disadvantageous inequality aversion in two separate tasks: respectively, a modified Dictator Game (DG) and Ultimatum Game (UG). These measures were derived from an adapted (i.e., child-friendly and short) version of a DG and UG (based on63,64). Participants always performed the DG and UG right before the economic games.

In the Dictator Game, participants were given six binary choices to divide 10 points between themselves and another anonymous participant in the study; one option was always an unequal distribution (10/0; 10 points for self, 0 points for the recipient), and the other option an equal distribution of points for themselves and the recipient (i.e., starting with (5, 5) and decreasing to (0, 0) with each subsequent trial [(4, 4), (3, 3), (2, 2), (1, 1), (0, 0)] or increasing to (10, 10) with each subsequent trial [(6, 6), (7, 7), (8, 8), (9, 9), (10, 10)], depending on the first choice; see Supplementary Information).

In the Ultimatum Game, participants responded to six proposals from another anonymous participant in the study on how to divide 10 points. If the participant rejected a proposal, both players earned zero; if the participant accepted it, both players received the proposed outcome. The first proposal was an equal split, and every subsequent proposal was more beneficial for the other than for self (i.e., (5, 5), (4, 6), (3, 7), (2, 8), (1, 9), (0, 10)). For both games, we were interested in the point at which a participant switched their preference from an equal to an unequal distribution, or vice versa. This allowed us to infer the point at which participants were indifferent between the two distributions. This ‘indifference point’ represents participants’ inequality aversion. That is, higher indifference points in the UG indicate stronger disadvantageous inequality aversion [range 0–5], whereas lower indifference points in the DG indicate stronger advantageous inequality aversion [range 0–10]. We used indifference points as measures of inequality aversion in all behavioural analyses.

Note that, to use social preferences in the reinforcement learning models, we transformed the indifference points to measures of advantageous (β) and disadvantageous (α) inequality aversion, following the equations of64. Accordingly, α varied between 0 and 4.5 and β varied between 0 and 1 (see Supplementary Information for a detailed description, and Figure S4). These transformations are only relevant for the computational modelling, where they are used to obtain a subjective payoff matrix. If we rerun the behavioural analyses with these transformed inequality aversion measures, all conclusions remain the same.

Finally, indifference points and the inequality parameters α and β can only be calculated for participants who show consistent choice behaviour in the DG and UG. In total, 54 participants were excluded due to missing values for social preferences (missing disadvantageous inequality aversion, n = 1; missing advantageous inequality aversion, n = 53). See Supplementary Information and Figures S2–S4 for a more detailed description of the Dictator Game and Ultimatum Game, and of the calculation of indifference points and inequality aversion measures (α, β).

### Prior expectations

Before the start of each of the economic games (Trust Game and Coordination Game), we assessed participants’ prior expectations about the behaviour of other people. We asked participants “Suppose that there are 10 other players, how many of these 10 do you think will choose X?” (i.e., ‘trustworthy’ choice in the Trust Game, or choice to have an advantage over another person in the Coordination Game). This resulted in a prior expectation of the trustworthiness of others (Fig. 3c) and a prior expectation of others’ tendencies to accept an advantage (Fig. 3d), both varying from 0 to 10.

### Procedure

All tests were administered in school settings. In the instruction of each learning task, three control questions were included to ensure understanding of the experimental procedure. Two questions quizzed the participant on their understanding of the point distribution (e.g., type how many points each player was winning in a certain choice combination), and one question referred to the colour denotation of the two environments. If participants failed one of the control questions, the instruction was repeated until participants understood the procedure of the game. For participants younger than 12, instructions were read out loud by an experimenter. All participants completed the tasks by themselves on computers in a quiet environment at school or at the university. Background variables such as the Raven SPM (estimated IQ) and several questionnaires (not relevant to the current study) were administered online using Qualtrics (www.qualtrics.com). In a separate session the DG, UG, and learning tasks were completed using the online software LIONESS Lab65.

### Statistical analyses of behavioural data

To assess age-related changes in prior expectations and social preferences, we ran separate robust linear regression analyses (5000 bootstraps), each with linear and quadratic age as predictors. Multiple mediation analyses were conducted in SPSS using the computational tool PROCESS (version 3.3)66. For indirect effects, 95% (two-tailed) bias-corrected bootstrapped confidence intervals were calculated using 5000 repetitions. An indirect effect was considered significant if its confidence interval did not include zero. These analyses were conducted in SPSS 25, and all tests were two-sided.
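To make the logic of a bootstrapped age regression concrete, the Python sketch below resamples participants and refits a model with linear and quadratic age. It is not the SPSS procedure used in the study: it assumes ordinary least squares, case resampling, and a simple percentile interval (rather than a bias-corrected one), and it runs on simulated stand-in data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (not the study data): age in years and an outcome.
age = rng.uniform(8, 23, size=200)
outcome = 5.0 + 0.1 * (age - age.mean()) + rng.normal(0, 1, size=200)


def fit_age_model(age, outcome):
    """OLS fit with intercept, linear age and quadratic age (age centred)."""
    centred = age - age.mean()
    design = np.column_stack([np.ones_like(centred), centred, centred**2])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


def bootstrap_ci(age, outcome, n_boot=5000, level=0.95):
    """Percentile bootstrap intervals obtained by resampling participants."""
    n = len(age)
    draws = np.empty((n_boot, 3))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample participants with replacement
        draws[i] = fit_age_model(age[idx], outcome[idx])
    tail = 100 * (1 - level) / 2
    return np.percentile(draws, [tail, 100 - tail], axis=0)


# Rows: lower and upper bound; columns: intercept, linear age, quadratic age.
ci = bootstrap_ci(age, outcome)
```

An effect would then be read as significant when its interval excludes zero, mirroring the criterion stated above for indirect effects.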

### Generalized linear mixed models

To analyse choice behaviour in the Trust Game and Coordination Game, we fitted logistic generalized linear mixed models (GLMMs) to decisions to choose A (coded as 0) or B (coded as 1) for each game separately. Analyses were conducted in R (version 3.6.1)67, using the lme4 package68. In all models, participant ID entered the regression as a random intercept to handle the repeated nature of the data. Where appropriate, the environment was entered as a random slope in our analyses to handle the differences between individuals in their responsiveness to learning different levels of (non)cooperation. Our GLMMs included a main effect of environment (e.g., Trustworthy environment, Untrustworthy environment), age in years (linear and quadratic), prior expectations of others’ choices and social preferences, and all two-way interactions with environment (see Tables S1-S4 for all GLMM results). Note that for the Trust Game we only added disadvantageous inequality aversion, whereas for the Coordination Game both social preferences were included. That is, in the Coordination Game both types of inequality can occur and drive choice behaviour, in contrast to the Trust Game in which only disadvantageous inequality is present.

In all GLMMs, age, prior expectations, and disadvantageous and advantageous inequality aversion were mean-centered and scaled, and categorical predictor variables were specified by a sum-to-zero contrast (e.g., sex: −1 = boy, 1 = girl). For the mixed-effects model analyses the optimizer “bobyqa”69 was used, with a maximum of $1 \times 10^5$ iterations. P values for all individual terms were determined by likelihood ratio tests as implemented in the mixed function in the afex package70. All statistics, including odds ratios and confidence intervals, are reported in Tables S1–S6.
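The predictor preparation described here can be sketched in a few lines of Python. The helper names are hypothetical, the input values are made up, and scaling by the sample standard deviation is an assumption (the text does not state which scaling was used).

```python
import numpy as np


def standardise(values):
    """Mean-centre and scale to unit standard deviation (sample SD, ddof=1)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=1)


def sum_to_zero(labels, negative_level, positive_level):
    """Code a two-level factor as -1 / +1 (sum-to-zero contrast)."""
    mapping = {negative_level: -1.0, positive_level: 1.0}
    return np.array([mapping[label] for label in labels])


age_z = standardise([8, 12, 16, 20, 23])  # made-up ages
sex_c = sum_to_zero(["boy", "girl", "girl"], "boy", "girl")
```

With continuous predictors standardised and factors coded this way, main effects are evaluated at the average level of the other predictors, which eases interpretation when interactions with environment are included.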

### Computational modelling

To gain a mechanistic understanding of participants’ learning to adjust in the Trust Game and the Coordination Game, we used a basic reinforcement learning (RL) model71 and extended it to accommodate social preferences20 (aversion to unequal outcomes). All our models follow the basic logic of RL, in which agents learn about others’ behaviour by updating their expectations with experience. In the case of the games, these expectations (denoted $p$) concern the behaviour of their interaction partners (X or Y; cf. Fig. 1). In each trial, $p$ is updated with a magnitude proportional to the prediction error (PE; the difference between the actual and expected choice) and the learning rate $\lambda$. Formally, $p_{t+1} = p_t + \lambda \cdot \mathrm{PE}_t$, where $\mathrm{PE}_t = c_t - p_t$ and $c_t$ is the choice of the other on trial $t$ (1 if X, 0 otherwise). We fit a set of reinforcement learning models to the data to investigate how $\lambda$ changes across age cohorts. This parameter is bounded between 0 (which means no updating of expectations at all) and 1 (which means that expectations match the decision of the most recent player).
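The update rule can be illustrated with a minimal Python sketch. It takes the prediction error as the observed choice minus the expected choice, in line with the verbal definition and with the limiting case in which a learning rate of 1 makes the expectation equal to the most recent choice. The function name, starting value, and choice sequence are illustrative.

```python
def update_expectation(p, other_chose_x, learning_rate):
    """One delta-rule step: move p towards the observed choice of the other."""
    observed = 1.0 if other_chose_x else 0.0
    prediction_error = observed - p  # actual minus expected choice
    return p + learning_rate * prediction_error


# Illustrative run: start indifferent and observe X, X, Y, X.
p = 0.5
for chose_x in [True, True, False, True]:
    p = update_expectation(p, chose_x, learning_rate=0.3)
```

With a learning rate of 0 the expectation never moves, and with a learning rate of 1 it jumps to the last observed choice, matching the two bounds described above.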

In our models, the value of $p$ determines the relative weights $w_A$ and $w_B$ (Fig. 4). Each of the games is characterized by a payoff matrix (Fig. 1). In each trial $t$, the expected monetary payoffs of choosing A or B are given by $w_{A,t} = p_t \cdot a + (1 - p_t) \cdot b$ and $w_{B,t} = p_t \cdot c + (1 - p_t) \cdot d$, respectively. We set the initial value $p_0$ to the cohort mean prior measured in our experiment. The probability that a participant chooses A is determined by a standard softmax function: $\Pr(A) = \left[1 + e^{-\theta (w_A - w_B)}\right]^{-1}$. As there are only two options (A and B) to choose from, the probability of choosing B is simply $1 - \Pr(A)$. In the softmax formula, $\theta$ reflects ‘decision sensitivity’ and accounts for stochasticity in participants’ choices: low values of $\theta$ indicate high levels of stochasticity ($\Pr(A)$ and $\Pr(B)$ tend to be near 0.5), and high values of $\theta$ indicate low levels of stochasticity. In our model fits, $\theta$ is a free parameter allowed to vary between 0 and 5.
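A small Python sketch of the choice rule may help. It computes the expected payoffs of A and B from the expectation p and converts their difference into a choice probability. The payoff values below are placeholders, not the values from Fig. 1, and the function names are illustrative.

```python
import math


def expected_payoffs(p, a, b, c, d):
    """Expected payoffs of A and B given belief p that the other chooses X."""
    w_a = p * a + (1 - p) * b
    w_b = p * c + (1 - p) * d
    return w_a, w_b


def prob_choose_a(w_a, w_b, theta):
    """Two-option softmax (logistic) rule with decision sensitivity theta."""
    return 1.0 / (1.0 + math.exp(-theta * (w_a - w_b)))


# Placeholder payoffs, not the values from Fig. 1.
w_a, w_b = expected_payoffs(p=0.8, a=6, b=0, c=3, d=3)
pr_a = prob_choose_a(w_a, w_b, theta=1.0)
pr_b = 1.0 - pr_a
```

Setting theta to 0 yields a probability of 0.5 regardless of the payoffs, which corresponds to fully random choice; larger values make choices follow the payoff difference more deterministically.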

We extended this baseline model with two factors. First, we included the cohort mean measures of social preferences; that is, we added the measured cohort averages of disadvantageous and advantageous inequality aversion when calculating $w_A$ and $w_B$. In particular, for the Trust Game, the weight of option A was penalized with a value proportional to the disadvantageous inequality aversion $\alpha$ (note that we drop the subscripts because we assume social preferences to be parameters with a constant value20): $w_A = p \cdot a + (1 - p) \cdot [b - \alpha \cdot (b' - b)]$. Because the payoffs for both partners are always equal for option B, $w_B$ is unaffected by social preferences. For the Coordination Game, social preferences can affect the weights of both A and B: $w_A = p \cdot \alpha \cdot (a' - a)$, and $w_B = p \cdot \beta \cdot (d - d')$, where $\beta$ denotes advantageous inequality aversion.
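For the Trust Game, the penalised weight of option A can be sketched in Python as follows. The sketch assumes that the primed payoff denotes what the partner receives in the same outcome; the payoff values and the function name are placeholders chosen for illustration.

```python
def trust_game_weight_a(p, a, b, b_other, alpha):
    """Weight of option A, with the unequal outcome penalised by alpha.

    b is the participant's payoff when trust is not reciprocated and
    b_other is assumed to be the partner's payoff in that same outcome.
    """
    penalised_b = b - alpha * (b_other - b)
    return p * a + (1 - p) * penalised_b


# Placeholder payoffs: compare no aversion with moderate aversion.
baseline = trust_game_weight_a(p=0.5, a=6, b=0, b_other=10, alpha=0.0)
averse = trust_game_weight_a(p=0.5, a=6, b=0, b_other=10, alpha=0.5)
```

With alpha equal to 0 the expression reduces to the baseline expected payoff; a positive alpha lowers the weight of A whenever the partner would earn more than the participant.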

Second, we allowed the learning rate $\lambda$ to decay over the course of interactions. We implemented this by defining $\lambda_t = \lambda_0 \cdot r^{-\tau}$, where $r$ denotes the trial number, and $\tau$ is a free parameter that reflects the speed of the decay in learning, allowed to vary between 0 and 5. The values of the estimated parameters ($\theta$, $\lambda$, $\tau$) per age cohort and per game can be found in Table S7. For each of the four age cohorts, we pooled the data and fitted the model with each possible combination of the factors ‘social preferences’ and ‘decay’, yielding a total of four models per cohort per game. Note that we also evaluated a potential role for prior expectations by including mean cohort-level prior expectations in the initial valuation of the choice options. However, because prior expectations were relatively close to 5 (range 0–10; Fig. 2), this was close to the default expectation $p$ of 0.5, marking indifference between the environments at the first choice. Hence, we did not apply formal tests of improved model fit for prior expectations.
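The decaying learning rate can be sketched in Python under the assumption that the decay follows a power law with a negative exponent, so that a larger τ means faster decay and τ = 0 recovers a constant learning rate. The trial counter is assumed to start at 1, and the parameter values are illustrative.

```python
def decayed_learning_rate(lambda_0, trial, tau):
    """Power-law decay; trial is counted from 1, so trial 1 gives lambda_0."""
    return lambda_0 * trial ** (-tau)


# Learning rates over 15 trials for an illustrative lambda_0 and tau.
rates = [decayed_learning_rate(0.6, r, tau=1.0) for r in range(1, 16)]
```

Because the rate shrinks with every trial when τ is positive, early partner choices shift expectations more than later ones.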

Figure 4d shows the goodness-of-fit for each model summed across the four age cohorts relative to the best model, which includes both social preferences and decay. We included a simulation study with a parameter recovery component in the Supplementary Information. We fitted the reinforcement learning models to cohort-level data because we had too few observations to fit our model accurately to individual-level choice data. Note that sensitivity analyses with individually derived parameters indicated that this did not influence any of our model-fit conclusions or main findings.
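To show what fitting such a model to pooled data can look like, the Python sketch below computes the negative log-likelihood of the baseline model (no social preferences, no decay) for one environment and searches a coarse parameter grid. The fitting routine is an assumption, since the text does not specify the optimiser or the goodness-of-fit measure; the data and payoffs are made up.

```python
import math


def negative_log_likelihood(sequences, payoffs, learning_rate, theta, p0=0.5):
    """Sum of -log Pr(observed choice) over all trials of all participants.

    sequences: one list per participant of (chose_a, other_chose_x) pairs.
    payoffs: (a, b, c, d) as in the expected-payoff equations.
    """
    a, b, c, d = payoffs
    total = 0.0
    for trials in sequences:
        p = p0  # each participant starts from the same expectation
        for chose_a, other_chose_x in trials:
            w_a = p * a + (1 - p) * b
            w_b = p * c + (1 - p) * d
            pr_a = 1.0 / (1.0 + math.exp(-theta * (w_a - w_b)))
            pr_choice = pr_a if chose_a else 1.0 - pr_a
            total -= math.log(max(pr_choice, 1e-12))  # guard against log(0)
            p += learning_rate * ((1.0 if other_chose_x else 0.0) - p)
    return total


def grid_search(sequences, payoffs, steps=21):
    """Coarse grid over learning rate in [0, 1] and theta in [0, 5]."""
    candidates = [
        (negative_log_likelihood(sequences, payoffs, lr, th), lr, th)
        for lr in (i / (steps - 1) for i in range(steps))
        for th in (5 * j / (steps - 1) for j in range(steps))
    ]
    return min(candidates)


# Toy data: two made-up participants in a mostly-X environment.
toy = [
    [(True, True), (True, True), (False, False), (True, True)],
    [(False, True), (True, True), (True, True), (True, False)],
]
best_nll, best_lr, best_theta = grid_search(toy, payoffs=(6, 0, 3, 3))
```

Pooling all participants of a cohort into one likelihood, as here, yields a single parameter set per cohort; comparing the minimised values across model variants is one way to obtain a relative goodness-of-fit.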