Animal and human societies are organised in multiple layers and clusters with ingrained socioeconomic classes bearing inequalities that have been the target of interest of scholars for decades (Landtman, 1934). From 10,000 years ago to today, human societies have greatly changed due to the shift from hunting and gathering to farming, and to industrialisation. Many impute the appearance of these hierarchical levels to the emergence of agriculture and the formation of larger and more complex societies (e.g. Fix, 2019; Landtman, 1934; Lee, 1990). Others, instead, believe they were always present (e.g. David-Barrett and Dunbar, 2012, 2013), as if hierarchy inevitably appears in larger groups whose individuals display more complex interactions.

The social relationships in which cooperative or defective individuals typically engage, both in nature and within human societies, are commonly ruled by an asymmetric use of power exerted by most individuals (Ames, 2007; Boehm, 1999; Chapais, 2015; Cohen, 1998; Keltner et al., 2003; Mattison et al., 2016; Pansini, 2012; van Vugt and Tybur, 2015). Individuals’ unequal resource holdings are known to affect, unwittingly (Hauser et al., 2019), the willingness to cooperate at a given time (in empirically tested subjects, which also holds true in nature: Burton-Chellew et al., 2013; Cronin et al., 2015; Frank, 1996, 2010; Gavrilets and Fortunato, 2014; Herrmann et al., 2019; Pansini, 2011; Suchak et al., 2016; with different states’ economic systems: Acemoglu et al., 2017; in international joint ventures: Inkpen and Beamish, 1997). Moreover, these inequalities also affect future expectations in iterated exchange (Zeng et al., 2019). Therefore, when inter-hierarchical exchanges occur, the exercise of power alters the initial, hypothetical centrality of the game, with an eventual shift of the strategies of the players towards disjoint interests (Barker et al., 2015; Campennì and Schino, 2016; Cox et al., 2013; Herbst et al., 2017; Phillips, 2017; Wang et al., 2010) and, eventually, a decreased drive to engage in any deal.

As Hume already described almost three centuries ago (Hume, 1975), at the heart of this reluctance to interact may lie a lack of reciprocal trust (Greif and Tabellini, 2010; Steckermeier and Delhey, 2018; Xin, 2017) between the poorer and richer classes of individuals, who eventually prefer not to enter into transactions with each other. As a result, the civic engagement and higher political participation of lower classes are prevented (Uslaner, 2005), with detrimental impacts on the countries’ economies (Algan and Cahuc, 2014; Benabou, 1996; Bowles, 2012; Horváth, 2013).

Socioeconomic class segregation appears precisely in those societies with a high rate of income inequality (Axtell et al., 2001; Henrich and Boyd, 2008; Smith and Choi, 2007). Modelling the behaviour of individuals coming from such economic constellations allows us to better scrutinise human cooperation at an evolutionary level (Bowles et al., 2014), for example during times when states have not yet implemented thorough wealth redistribution policies (Joyce and Xu, 2019).

Although behavioural economics research has traditionally focussed on the factors that determine the individuals’ choices between cooperation and defection, researchers have widened the perspective by including punishment as a further strategy to induce cooperation in unwilling partners (in the case of repeated social interactions, e.g.: Boyd and Richerson, 1992; Dreber et al., 2008; Fehr and Gachter, 2002; West et al., 2007; Yamagishi, 1986; conversely, rewards can also be computed in a game, e.g.: Jusup et al., 2018). A number of examples have been found by ethologists in nature whereby several species display punishment providing the individuals with an option to attempt increasing their own fitness (Clutton-Brock and Parker, 1995), but whether or not this behaviour prompts a higher use of cooperation or, rather, retaliation is still a question open to debate in humans and other species (Baldassarri and Grossman, 2011; Balliet et al., 2011; Baumard, 2010; Bone et al., 2016; Boyd et al., 2010; Dong et al., 2019; Fehr and Gachter, 2002; Fehr and Schurtenberger, 2018; Fowler et al., 2005; Gächter et al., 2008; Gao et al., 2015; Hilbe et al., 2014; Janssen and Bushman, 2008; Li et al., 2018; Nikiforakis, 2008; Raihani and Bshary, 2015; Raihani et al., 2012; Riehl and Frederickson, 2016; Wubs et al., 2016).

When punishment is a strategy the subjects can enforce, asymmetric interests may become more apparent (in Public Good Games: Baldassarri and Grossman, 2011; Gächter et al., 2008; Kuwabara et al., 2016; O’Gorman et al., 2009; Reuben and Riedl, 2013; Vincent, 2017; in Prisoner’s Dilemma Games: Pansini et al., 2016) and the players’ final payoffs may be even further differentiated (Burton-Chellew et al., 2013; Gächter et al., 2017; Raihani et al., 2012). This may be the case especially when social classes are segregated, leaving it largely up to more powerful subjects to impose such a strategy over their less dominant partners. We therefore deemed it appropriate to include punishment in the analysis of inequalities.

In 2013, for the first time in 12 years, the Gini coefficient (a measure of income inequality within countries) of China, the world’s most populous country, became publicly known (The Economist, 2012), showing predictably high levels in the range of 0.53–0.55 (Xie and Zhou, 2014). Top-scoring countries such as South Africa, Brazil, and Nigeria stand at Gini levels of 0.4–0.6. More recent data, accessed from possibly biased governmental sources, report that the Gini index stood at 0.462 in 2015 and at 0.465 in 2016 (Jizhe, 2016). Although China is the second richest nation in the world, its per capita income ranks 58th in a worldwide list of 79 countries, between Costa Rica and Brazil (Bloomberg, 2018). Although the negative effects of class segregation are well known in these countries, and 77% of Chinese respondents declare large concerns in this regard (Lockett, 2016; Wike and Stokes, 2016), we do not know of any empirical examples in the behavioural economics literature or in Prisoner’s Dilemma Games showing how to lessen individuals’ disparity in incomes. Recently, Bone and colleagues (2015) provided social dilemma game participants with different punishment powers. We chose a different approach here. In our experiment, we estimated and accounted for the participants’ socioeconomic background as a life history covariate determining their ability to punish or not punish their opponents.

In this paper, we model socioeconomic class segregation wherein individuals that are more dominant can exert punishment over less dominant ones. Our experiments and ensuing simulations were implemented in a country with a high rate of income inequality, China. We found that a redistribution of wealth could be attained only when both rich and poor subjects are allowed to punish each other. More specifically, we invited richer and poorer individuals to play a Prisoner’s Dilemma Game with a costly punishment option in both a segregated and an integrated condition. Steep differences were found in payoff gains in the segregated condition (rich people playing with poor ones), while no differences were found in the integrated condition (designed and implemented with random socioeconomic matching). To model the effect of punishment realistically, we imposed a simplified and skewed payoff strategy on the players, whereby in a first treatment only dominant and richer players were able to exert punishment on subordinate and poorer players. Later, we allowed only poorer subjects to punish richer ones instead. By doing so, we established two treatments to predict whether such segregated societies, in relation to both punishment and socioeconomic status, would influence changes in the players’ preferred strategies and payoff gains versus the reference results of an integrated society. This hypothetical integrated society was the one we assumed would give rise to a redistribution of wealth.


Laboratory experiments

The experiments were performed at Yunnan University of Finance and Economics in Kunming, Yunnan province, China, with students who were new to game theory and behavioural economics tests.

Informed consent forms were submitted by all the anonymous participants in our research. The Yunnan University of Finance and Economics Ethics Committee approved all the experiments utilising human subjects, which, in turn, were carried out in accordance with the approved guidelines.

The experiments took place on 9 different days in November and December 2014 and in April 2015. The students’ ages ranged from 18 to 22 and they came from different parts of China. We tested a total of 348 students (164 males, 184 females). No subject took part in more than one experiment, and none had previously participated in other behavioural experiments. Our analyses included 9 experimental sessions with three replicate sessions for each of the three game scenarios.

While sitting in a computer lab, the subjects were briefed in neutral terms regarding the rules of the Iterated Prisoner’s Dilemma Game. The experiment consisted of three between-subject treatments. In each of them, the students were allocated to two different trading classes. In the CDP class, the subjects were free to choose between three strategies: cooperation, defection, and punishment (referred to as A, B, and C to the students). In the CD class, they were only allowed to choose between two strategies: cooperation and defection. For simplicity, the exit option to terminate the interaction (Vanberg and Congleton, 1992) was not given to the players. The treatments were implemented as follows: in T0 the subjects were assigned randomly to the CDP and CD classes; in T1 richer students were assigned to the CDP class and poorer ones to the CD class; in T2 (the reverse of T1) poorer subjects could choose any option amongst CDP while the richer ones could opt only for CD. The treatments comprised similar numbers of subjects so as to be counterbalanced: T0 116 students, T1 122 students, and T2 110 students. The students were not informed of the socioeconomic criteria on which we based the allocation for the purposes of our research, nor of why their options differed. They were simply informed that one group had the option of employing the extra strategy, namely, punishment.

The purpose of T1 is to simulate societies often found in developing countries with high Gini coefficients, with China at present providing an extreme example (Xie and Zhou, 2014). T2, by contrast, stands out as an unlikely scenario in which only poor people can punish people richer than them. This treatment helps ascertain some degree of causation arising between T0 and T1, that is, whether the results can be interpreted simply via the segregation effect or whether the segregation effect combined with social status influenced the outcome.

The payoff matrix as designated in the diagram below shows the outcome of the round of each game expressed in earned points. Upper and lower case letters indicate the reading order.


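$$\begin{array}{l|cc} & \text{player 2: cooperation} & \text{player 2: defection} \\ \hline \text{player 1: cooperation} & 1,\ 1\ \ (\mathrm{A},\ \mathrm{a}) & -1,\ 2\ \ (\mathrm{B},\ \mathrm{d}) \\ \text{player 1: defection} & 2,\ -1\ \ (\mathrm{C},\ \mathrm{b}) & 0,\ 0\ \ (\mathrm{D},\ \mathrm{e}) \\ \text{player 1: punishment} & 1,\ -3\ \ (\mathrm{E},\ \mathrm{c}) & -1,\ -2\ \ (\mathrm{F},\ \mathrm{f}) \end{array}$$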

The values for defection and punishment in this experiment are lower than in Dreber et al. (2008) and Wu et al. (2009), largely due to the concerns raised by Rankin et al. (2009), who showed that assigning levels too high to defection may confound the players about the difference between these two game options.
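As a minimal sketch, the six payoff pairs of the matrix can be encoded as a lookup table. The row and column labels are our assumed reading of the matrix (player 1 in the CDP class choosing C, D, or P; player 2 in the CD class choosing C or D), and the names `PAYOFFS`, `round_payoff`, and `total_payoffs` are hypothetical, introduced only for illustration.

```python
# Payoffs in points as (player 1, player 2), copied from the payoff matrix.
# ASSUMPTION: player 1 belongs to the CDP class (C, D, or P) and
# player 2 to the CD class (C or D).
PAYOFFS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (-1, 2),
    ("D", "C"): (2, -1),
    ("D", "D"): (0, 0),
    ("P", "C"): (1, -3),
    ("P", "D"): (-1, -2),
}


def round_payoff(cdp_choice, cd_choice):
    """Return (CDP player's points, CD player's points) for one round."""
    return PAYOFFS[(cdp_choice, cd_choice)]


def total_payoffs(history):
    """Sum the payoffs over a list of (cdp_choice, cd_choice) rounds."""
    cdp_total = 0
    cd_total = 0
    for cdp_choice, cd_choice in history:
        cdp_points, cd_points = round_payoff(cdp_choice, cd_choice)
        cdp_total += cdp_points
        cd_total += cd_points
    return cdp_total, cd_total


# Three hypothetical rounds: mutual cooperation, punishment of a defector,
# and defection against a cooperator.
print(total_payoffs([("C", "C"), ("P", "D"), ("D", "C")]))  # (2, -2)
```

Under this reading, punishing costs the punisher 1 point and takes 2 points from the partner, which is consistent with every cell in the punishment row.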

We matched the individuals into dyads who had to repeatedly interact at successive rounds. We tallied an average of 82.5 iterated rounds per session and 22.4 interactions per session. During each round of interactions, two participants simultaneously chose between two or three available options (CD or CDP). At the completion of each round, each participant was shown his or her partner’s choice together with the total payoff scores. The experimental sessions averaged 38 participants each, up to 9 repeated interactions with the same dyad according to a 75% probability stopping rule (Dal Bó, 2005), with an average duration of each experiment of ~1 h (unknown to the players, to avoid end-game effects). At the end, each subject received an average payment of ¥53.64 Chinese renminbi.
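The random stopping rule can be illustrated with a short sketch. We assume here that the 75% figure is the probability of continuing with the same partner after each round, and that 9 acts as a hard cap on the number of interactions per dyad; both readings are assumptions, and `sample_dyad_length` is a hypothetical helper rather than the z-Tree implementation.

```python
import random


def sample_dyad_length(rng, continue_prob=0.75, max_rounds=9):
    """Draw how many rounds one dyad plays under a random stopping rule.

    ASSUMPTIONS: `continue_prob` is the chance of playing another round
    with the same partner, and `max_rounds` is a hard upper limit.
    """
    rounds = 1  # every dyad plays at least one round
    while rounds < max_rounds and rng.random() < continue_prob:
        rounds += 1
    return rounds


rng = random.Random(0)  # fixed seed so the sketch is reproducible
lengths = [sample_dyad_length(rng) for _ in range(10_000)]
print(min(lengths), max(lengths), sum(lengths) / len(lengths))
```

Without the cap, a continuation probability of 0.75 would give an expected dyad length of 1/(1 − 0.75) = 4 rounds; the cap lowers this slightly.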

At present in China there is no consistent criterion for assigning scholarships to students from lower income families. Thus, university enrolment offices are not aware of the precise socioeconomic status of their students. To obtain this information, we asked each student to fill in an anonymous questionnaire right before the start of the experiments. The questionnaire was answered electronically in the computer lab. From the students’ answers, a precompiled algorithm calculated the results in real time, so that we could assign each subject to one of the two classes and apply the corresponding permissible strategies, either CDP or CD. In particular, we asked students to provide basic demographic information focussing on their families’ income and other elements (reported in Supplementary Methods). The list included how many houses their family owns (HOUSE#), how many cars (CAR#), how many rooms are present in their first house divided by the number of family members (ROOM#/FAMMEMB#), their parents’ job type (JOB), and whether their parents have an undergraduate degree (BACDEG) (for the full list of questions, see Supplementary information). From their answers, we calculated an index in order to quantify where they stood along a variable socioeconomic class spectrum. This index was determined arbitrarily as

$$\begin{array}{l}{Y} = 2.5\,{\mathrm{HOUSE}}\# \, + \,2\,\left( {{\mathrm{CAR}}\# \, + \,1} \right)/{\mathrm{FAMMEMB}}\# \, + \\ 2\,\left( {\left( {{\mathrm{ROOM}}\# \, + \,1} \right)/{\mathrm{FAMMEMB}}\# } \right) + 1.5\left( {1.5\,{\mathrm{FATJOB}}} \right) + \\ {\mathrm{MOTJOB + FATBACDEG + MOTBACDEG}}\end{array}$$

We assigned different weights to these indicators of wealth, based on the common assumptions that, in order of importance and weight, (a) the possession of properties weighs more than the possession of cars, (b) a house with more rooms is more valuable because it is usually larger, (c) parents with jobs as managers weigh more than employees, and (d) parents who graduated from university also contribute to increasing this socioeconomic index.

Using the results, we could rank and assign each subject to one of the two classes we created, splitting them according to whether their answers placed them below or above the median of their respective indexes.
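The index and the median split can be sketched as follows. The weights follow the formula for Y term by term; the numeric codings of the job and degree variables are not specified in the text, so the values used below are hypothetical, as are the function names and the rule that ties at the median fall into the poorer class.

```python
from statistics import median


def socioeconomic_index(house, car, room, fam_members,
                        fat_job, mot_job, fat_bacdeg, mot_bacdeg):
    """Index Y as written in the formula (numeric codings are ASSUMED)."""
    return (2.5 * house
            + 2 * (car + 1) / fam_members
            + 2 * ((room + 1) / fam_members)
            + 1.5 * (1.5 * fat_job)
            + mot_job + fat_bacdeg + mot_bacdeg)


def split_by_median(scores):
    """Label subjects 'richer' above the median index, 'poorer' otherwise.

    ASSUMPTION: subjects exactly at the median are labelled 'poorer'.
    """
    cutoff = median(scores.values())
    return {sid: ("richer" if y > cutoff else "poorer")
            for sid, y in scores.items()}


# Four hypothetical students (not data from the experiment).
scores = {
    "s1": socioeconomic_index(1, 0, 3, 4, 1, 1, 0, 0),  # 8.25
    "s2": socioeconomic_index(2, 1, 5, 3, 2, 1, 1, 1),
    "s3": socioeconomic_index(0, 0, 1, 5, 0, 0, 0, 0),
    "s4": socioeconomic_index(1, 1, 4, 4, 1, 0, 1, 0),  # 9.25
}
print(split_by_median(scores))
```

In T1 the 'richer' half would then be given the CDP options and the 'poorer' half the CD options, with the assignment reversed in T2.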

Validation of the socioeconomic index

To validate whether the answers to the questionnaire were representative for ascribing the two student types to their actual socioeconomic classes, we checked whether individuals with higher scores were also those individuals earning more during the experiment. Due to their elevated socioeconomic background, we hypothesised that, in fact, they could have chosen more effective strategies and eventually earned more points (Bone et al., 2015; Holland et al., 2012; Nikiforakis et al., 2010). This hypothesis was confirmed by our sample, since the single students who earned relatively more within their experimental class division were those coming from a financially better off background (F1,266 = 4.84, p = 0.029) (Figs 1 and 2 in Supplementary Results).

Agent-based model

Following the lab experiment, we developed a set of Agent-based simulations to compare the simpler behaviour of agents with humans, when the former adopted similar or different strategies than the latter group. Adopting an Agent-based modelling approach, in fact, offers one significant advantage: the experimenters can test different hypotheses pertaining to the evolutionary and cognitive mechanisms needed to perform different behaviours (Campennì and Schino, 2014).

As in the segregation treatments of T1 and T2, we split the agents into two economic classes (rich and poor) and we assigned a group-dependent propensity in adopting a cooperative or defective behaviour.

The agents played according to the same payoff matrix adopted in the experiments with human subjects.

The first hypothesis we decided to test is whether the social status of the individuals affected their propensity to cooperate. In the model, in fact, we took for granted that such a phenomenon exists, given the results of the human experiment. We sought to ascertain whether richer individuals may be more likely to cooperate because they know they can rely on other people with more wealth. Poorer individuals, instead, may be more likely to play safe and defect because their life circumstances are generally those of greater need (Gächter et al., 2017; Piff et al., 2010). For this reason, in the first set of simulations we assigned a probability of p = 0.7 to the richer agents to cooperate and to the poorer agents to defect in the first rounds lacking prior information. Later, we decided to test a set of different hypotheses considering: (i) a scenario where there was no difference in belonging to one of the two classes and there was a generalised high propensity to perform defective acts in first rounds (i.e., p = 0.95 for all agents) (see Piff, 2014; Piff et al., 2012 for the scientific background which inspired this); and (ii) the size of the memory in the short memory condition, set to 5 instead of 10 rounds.
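The two first-round scenarios described above can be sketched as a single function. This is an illustration under our own assumptions, not the simulation code: the names `first_move`, `class_dependent`, and `general_defection` are hypothetical, and the sketch only chooses between cooperation and defection, leaving punishment aside.

```python
import random


def first_move(agent_class, rng, scenario="class_dependent"):
    """Choose 'C' or 'D' for a first round with no prior information."""
    if scenario == "class_dependent":
        # Richer agents cooperate with p = 0.7; poorer agents defect with
        # p = 0.7, i.e. they cooperate with p = 0.3.
        p_cooperate = 0.7 if agent_class == "rich" else 0.3
    elif scenario == "general_defection":
        # Every agent defects with p = 0.95, regardless of class.
        p_cooperate = 0.05
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return "C" if rng.random() < p_cooperate else "D"


rng = random.Random(42)  # fixed seed so the sketch is reproducible
rich_moves = [first_move("rich", rng) for _ in range(10_000)]
poor_moves = [first_move("poor", rng) for _ in range(10_000)]
all_moves = [first_move("poor", rng, "general_defection") for _ in range(10_000)]
print(rich_moves.count("C") / 10_000,
      poor_moves.count("D") / 10_000,
      all_moves.count("D") / 10_000)
```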

Statistical analysis

We acquired two types of empirical data relating to cooperation via z-Tree (Fischbacher, 2007): first, the proportions of behaviours chosen in the CDP and CD classes and, secondly, the final payoffs gained by each student. These sets of data were analysed with generalised linear mixed models in R 3.2.4 (R Core Team, 2016) using the lmerTest package (Kuznetsova et al., 2015). The algorithm calculating the socioeconomic index was coded in IBM SPSS Statistics 24, which was also used to produce the graphs reported in this paper and its Supplementary information.
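The two descriptive measures can be derived from round-level records as in the sketch below. It is written in Python purely for illustration (the analyses themselves were run in R and SPSS), and the record layout with the keys `subject`, `class`, `choice`, and `points` is a hypothetical format, not the z-Tree export.

```python
from collections import Counter, defaultdict


def choice_proportions(records):
    """Share of each choice (C, D, P) within each trading class."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["class"]][rec["choice"]] += 1
    return {
        cls: {choice: n / sum(c.values()) for choice, n in c.items()}
        for cls, c in counts.items()
    }


def final_payoffs(records):
    """Total points earned by each subject over all recorded rounds."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["subject"]] += rec["points"]
    return dict(totals)


# Four hypothetical rounds of one dyad (not data from the experiment).
records = [
    {"subject": "s1", "class": "CDP", "choice": "C", "points": 1},
    {"subject": "s2", "class": "CD", "choice": "C", "points": 1},
    {"subject": "s1", "class": "CDP", "choice": "P", "points": -1},
    {"subject": "s2", "class": "CD", "choice": "D", "points": -2},
    {"subject": "s1", "class": "CDP", "choice": "D", "points": 2},
    {"subject": "s2", "class": "CD", "choice": "C", "points": -1},
    {"subject": "s1", "class": "CDP", "choice": "D", "points": 0},
    {"subject": "s2", "class": "CD", "choice": "D", "points": 0},
]
print(choice_proportions(records))
print(final_payoffs(records))
```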


Frequencies of cooperation, defection and punishment

To understand how the players behaved differently as a result of their division into the two CDP and CD classes, we obtained behavioural measures and derived the earnings from them. We used a GLMM with the frequencies of their choices (C, D, or P) as the dependent variable, the split into the two trading classes by treatment as fixed effects, and the subjects as random effects. (This GLMM is the reduced version of a more general one, reported in Supplementary information, which incorporates several demographic and social characteristics obtained with the questionnaire.) Figure 1 shows the behavioural frequencies the players displayed. Belonging to either one of the trading classes significantly explained the variation in behavioural strategies (F1,347 = 49.15, p < 0.001). However, the integrated or segregated conditions proved to have no significant effect on the selection of strategies (F2,345 = 0.74, p = 0.47).

Fig. 1: Frequencies of Cooperation, Defection, and Punishment (C, D, P) across the three treatments.

In T0 the subjects could punish each other by random partner matching, regardless of their socioeconomic class (integrated society model); in T1 and T2 the matching was predetermined (segregated society models): in T1 richer subjects could punish poorer ones, whereas in T2 poorer could punish richer ones. High levels of defections are found in all treatments, but, in the integrated condition, the punishers limited themselves in defecting while punishing. In both segregation conditions, instead, the punishers decreased their cooperation levels when punishing. ** Means p < 0.01, *** means p < 0.001; error bars show standard errors of the means.

By analysing the differential frequencies of the students’ choices, we can infer the impact that punishment had on the players. In T0, the two classes cooperated at similar levels (t2,9518 = −0.47, p = 0.63). However, the 12.4% rate of punishment added to the 68.36% of defection makes up for the defection rate of the CD class not able to punish (F2,9518 = 4.21, p = 0.015). This indicates that, in the integrated condition, the punishers partly restrain themselves from defecting while punishing. In both T1 and T2, conversely, cooperation levels were significantly different across classes (F1,20510 = 266, p < 0.001) whereas defection levels were not (F2,20505 = 0.226, p = 0.79). Hence, in both segregation conditions the punishers decreased their cooperation levels when choosing to punish. For a finer representation of what happens during the behavioural exchanges between partners over time, see Fig. 3 of Supplementary information.

Players’ earnings

One of the driving questions we mined our data for was whether the players earn differently as a result of their disparity in socioeconomic background. Moreover, did their incomes vary because only some players were permitted to implement the punishment strategy? The treatments and class separation allowed us to explore such questions (as displayed in Fig. 2). In all three conditions the total earnings achieved by both classes combined were statistically in the same small range (T1 = 101; T2 = 102; T3 = 104 units + comparable SE’s). Thus, we can conclude that this manipulation, redistributing the subjects into the two distinct classes, did not cause a sizeable difference in total earnings.

Fig. 2: Final earnings the subjects obtained from the experiment (measured in Chinese renminbi).

In the segregated conditions, the subjects who could punish (CDP) earned double what those who could not (CD) did. (*) Means marginally significant, p = 0.048, *** means p < 0.001; error bars show standard errors of the means.

When comparing the treatments, we checked whether in the integrated condition the players earned differently. In this treatment, the players were randomly assigned to the two different classes regardless of their socioeconomic status. The ability to punish the players belonging to the other class made a marginal increase in earnings possible (of 11 points), which proved to be marginally statistically significant (t1,108 = 1.999, p = 0.048). In the two segregation treatments, instead, it is readily apparent how the two classes of players differentiated their earnings. The punishers earned more, almost double as much as the other CD class (F1,229 = 147, p < 0.0001), regardless of the chosen treatment (F1,229 = 0.064, p = 0.8).

Agent-based model

Frequencies of Cooperation, Defection, and Punishment

We tested whether agents behaved differently due to: (i) their division into CDP and CD classes and ensuing ability to either perform only a subset of all possible actions (i.e., the CD class) or all possible actions (i.e., the CDP class); (ii) the different cognitive mechanisms involved in selecting to cooperate or to defect. Following this, we report results from a population of 350 agents, which is an analogous size to the students involved in the human experiments.

Before running the simulation which most closely resembles the behaviour displayed by the human subjects, we ran prior simulations to test for occurrences of a class-dependent predisposition of rich agents to behave as cooperators or defectors. Additionally, we also ran other simulations to test for the ability of the agents to remember past interactions (see Supplementary Results).

In the final set of simulations which we present here, we tested for a predisposition in all agents towards defection with p = 0.95 (for a general predisposition to defect, see Piff, 2014; Piff et al., 2012), exclusively implementing a short-term memory strategy. In this case, the number of past round outcomes each agent could remember was fixed at 5. The simulation shows that defection is the behaviour performed most often by the agents (Fig. 3; for additional analyses with 80 rounds, 1000 agents and different strategies depending on varying memory size, see Figs 4–6 in Supplementary information).
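A short-term memory agent of this kind can be sketched as below. Only the memory size of 5 and the first-encounter defection probability of 0.95 come from the text; the decision rule applied once the memory is filled (cooperating as often as the partner did in the remembered rounds) is purely an illustrative assumption, and `ShortMemoryAgent` is a hypothetical name.

```python
import random
from collections import deque


class ShortMemoryAgent:
    """Agent that remembers only the partner's last `memory_size` moves."""

    def __init__(self, rng, memory_size=5, p_defect_first=0.95):
        self.rng = rng
        # A bounded deque silently drops the oldest outcome when full.
        self.memory = deque(maxlen=memory_size)
        self.p_defect_first = p_defect_first

    def choose(self):
        if not self.memory:
            # First encounter: defect with probability 0.95.
            return "D" if self.rng.random() < self.p_defect_first else "C"
        # ASSUMED rule: cooperate with a probability equal to the share of
        # cooperative moves by the partner among the remembered rounds.
        coop_share = sum(m == "C" for m in self.memory) / len(self.memory)
        return "C" if self.rng.random() < coop_share else "D"

    def observe(self, partner_move):
        """Store the partner's latest move."""
        self.memory.append(partner_move)


agent = ShortMemoryAgent(random.Random(7))
for move in ["C", "C", "D", "D", "D", "D", "D"]:
    agent.observe(move)
print(list(agent.memory), agent.choose())
```

After seven observed rounds the agent retains only the last five, so earlier cooperation by the partner no longer affects its choice.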

Fig. 3: Behavioural frequencies of Cooperation, Defection and Punishment (C, D, P) shown by the agents in the Agent Based model with a memory size equal to five rounds.

Defection levels were lower and cooperation was higher compared to the human experiment. The predisposition to defect at first encounter is 95%. N = 350 agents. Supplementary information reports similar results with a larger number of agents, different memory sizes and different strategies. The minute error bars show standard error of the mean values with 95% confidence interval.

In T0 and T1 we can see that the artificial agents performed behaviours similar to what the humans did, with the exception of a generally higher propensity to cooperate. In T2, though, when agents adopted a Tit-for-Tat strategy, the rich agents selected less punishment; when agents instead adopted a short-term memory strategy (i.e., they were enabled to use outcome information from the preceding five interactions), agents from both classes clearly chose to defect more and to cooperate less. Finally, when the agents adopted an incremental memory strategy, there was a small increase in defection performed by agents from both classes (see Supplementary information for all these results). This combination of parameters (i.e., short-term memory strategy and T2 treatment with a strong propensity to perform defective acts for agents from both classes) led to results remarkably similar to those obtained from the human experiments with Chinese students. Consequently, the models proved sound, since a varying group size does not lead to any variation in the results.

Agents’ earnings

For the computerised simulations, we consider here a generalised predisposition to defect of p = 0.95 and a short memory size consisting of five previous rounds. In T0, the agents from both classes tallied the same amount of earnings. In T1, the richer agents earned double the amount of the poorer agents. In T2, the richer agents earned more than double the amount of the poorer agents (see Fig. 4 for details). For additional analyses of earnings, see Figs 7–9 in Supplementary information.

Fig. 4: Agent Based model final earnings agents obtained from the simulation with a memory size of five rounds.

In the integration treatment, the agents who could punish (CDP) earned the same amount as those who could not (CD). In the segregated treatments, those that could punish earned approximately double the amount of those who could not. These results are in line with those of the human experiments. The predisposition to defect in such cases at first encounter is 95%. N = 350 agents. Supplementary information reports similar results with a larger number of agents, different memory sizes and different strategies. The minute error bars show standard errors of the mean values with a 95% confidence interval.


The discipline of economics has historically looked at the effects of income distribution on overall economic growth, essentially taking the position that while a more equal distribution of income might indeed benefit society, any policy efforts to promote redistribution do exact an overall cost (e.g. Benabou, 2000). However, if the risk-taking for anonymous cooperation is deterred by income inequality (Brosnan and Bshary, 2016) leading to the formation of segregated groups within the same society, this overall cost might be balanced out.

We allowed for the evolution of behavioural exchanges when relatively richer and poorer individuals were allowed to interact within the context of a Prisoner’s Dilemma Game. Selecting the subjects from a country where the distribution of wealth is plainly uneven and in which socioeconomic classes are just as evidently distinct (Triandis, 1995; Xie and Zhou, 2014), we were interested in examining how this state of socioeconomic segregation alters the anonymous subjects’ behaviour and their incomes. In the treatment modelling an integrated society, the subjects could freely trade points across socioeconomic classes. In the more realistic models of socioeconomic class segregation, the classes were permitted to alternatively interact from one class to the other. In all treatments, the use of punishment could be realistically carried out by only one class. Comparing the treatments, we could ascertain that the behavioural and monetary differences we eventually found were principally due to the effect of socioeconomic segregation.

The class division we implemented caused a difference in the final earnings gained by the players, whether they interacted in a segregated or integrated socioeconomic condition. In the segregated models with people and artificial agents, final incomes amounted to an increase of just over 90% in favour of the class of individuals or agents that were provided with the option to punish others. Social inequality therefore increased after the experiments. However, in the integrated model, the incomes proved to be, statistically, only modestly dissimilar for the human subjects, which also produced the same result in the computer simulations. There were no differences in the amounts of cooperation, defection, and punishment behaviours of rich and poor individuals in the segregated societies (T1 with T2). We therefore conclude people and agents from both backgrounds do not implement punishment according to social learning and life experience in their different milieus to gain any further.

By splitting the anonymous players according to their different socioeconomic classes, we brought about behavioural changes in subjects who were not aware of the social status of the other partners (similar to Kuwabara et al., 2016). These two elements lead us to conclude that it is precisely the integration effect that allows for a redistribution of wealth. Firstly, by alternating encounters between partners with similar or different socioeconomic origins, they could balance out their earnings and, secondly, the use of intra-class punishment (rich versus rich, and poor versus poor) may well also narrow the gap in final payoff earnings (a larger within-group sample size should provide these meaningful statistics in a follow-up study). Furthermore, the simulations suggested that the strategy adopted by our subjects relied on the memory of the five preceding rounds. It therefore seems that players adjust their strategies to mostly defect during these initial exchanges. A prevalence of defection by Chinese subjects is not novel to our study, but rather a feature previously noted by other authors (e.g. Pansini et al., 2016; Wu et al., 2009), probably to be imputed to the fact that only 11.3% of Chinese trust a person whom they met for the first time, compared to about 38% in the West (Greif and Tabellini, 2010), and that in China the level of trust toward strangers is <40%, whereas in the U.S. it is over 60% (Inglehart et al., 1998).

By focussing on the ability to punish, an option realistically used more frequently downward in the social hierarchy than upward, we showed how disparity in socioeconomic classes does not cause a modification in cooperative behaviours per se. (It should be noted that the students not able to punish might have been affected in their behaviours by the threat of being punished, irrespective of the actual use of punishment employed by their more powerful partners: Bone et al., 2015; Fehr and Gachter, 2002.) Rather than allowing the subjects to punish their partners themselves, a different game could be implemented by giving designated individuals or a centralised authority the power to use second-order punishment, as performed in the Public Good Games of Hilbe et al. (2014) and Baldassarri and Grossman (2011). As it stands, this experiment can be compared to other IPD games in similar anonymous player conditions. However, another game scenario would need evaluation if we inform the subjects (Xie et al., 2017) about their reciprocal social status (Cherry et al., 2005; Zhang et al., 2020).

Our results differ somewhat from previous research, which concluded that members of different socioeconomic classes cooperate in their own ways (Holland et al., 2012; Wilson et al., 2009). In these two studies, for instance, people from lower socioeconomic classes offered less than the wealthy ones in donation experiments due to their perceived higher relative cost of cooperation. Furthermore, in the Bone et al. study (2015), students were given different punishing powers and, as a result, the weaker subjects also wound up investing less. In our case, by contrast, the investments made by the poorer subjects were at the same levels as those of the richer ones. Access to higher education is, in fact, a relatively recent phenomenon in China and, unlike in the UK or US (Bone et al., 2015; Holland et al., 2012; Wilson et al., 2009), tuition fees are always subsidised here, making our ‘poorer’ subjects probably better off. Yet, the experiment took place in Yunnan province, which is the second poorest province (Bloomberg, 2018), with an inequality issue likely larger than the country’s average. Regardless of the absolute wealth of the subjects, what counts is the relative gap amongst them. When considering the pre-experiment wealth conditions of these individuals, given how, due to punishment, the poorer ones can earn more than the rich (in T2), inequalities are overall potentially reduced by favouring the poor individuals, who seem to have the ability to achieve power hastily.

Through the differential use of punishment, we modelled how a redistribution of wealth can be achieved following some of the principles of the Keynesian approach to a free-market economy between the states (see Bardhan et al., 2006; Bowles, 2012). In a more basic and practical sense, state systems in high Gini-coefficient countries should seek to implement practical measures and act decisively to narrow the yawning gap between socioeconomic classes. Regardless of its impressive, top-line economic development, China should consider enacting reforms (e.g. Jacobs and Mazzucato, 2016; Scheffer et al., 2017) in pursuit of this critical goal. We believe that fostering the integration of economic exchanges (Acemoglu et al., 2017; Benabou, 1996) and enlarging network exchanges (Bowles et al., 2014; David-Barrett and Dunbar, 2013; Kets et al., 2011; Wang et al., 2015; Xia et al., 2015) are key elements for the future progress of developing economies. Our study shows an alternative pathway for the evolution of social integration and redistribution (yet with a very simple design, similar to what is now seen even in the redistribution of unequal resources by mutualistic bacteria (Whiteside et al., 2019)), departing from individuals who adjust their social position by levelling out their earnings. Via this approach, though, the older Confucian approach separating people into distinct classes (Greif and Tabellini, 2010) shall be progressively superseded by the old saying that a chain is only as strong as its weakest link.