Equal status in Ultimatum Games promotes rational sharing

Experiments on the Ultimatum Game (UG) repeatedly show that people’s behaviour is far from rational. In UG experiments, a subject proposes how to divide a pot and the other can accept or reject the proposal, in which case both lose everything. While rational people would offer and accept the minimum possible amount, in experiments low offers are often rejected and offers are typically larger than the minimum, and even fair. Several theoretical works have proposed that these results may arise evolutionarily when subjects act in both roles and there is a fixed interaction structure in the population specifying who plays with whom. We report the first experiments on structured UG with subjects playing simultaneously both roles. We observe that acceptance levels of responders approach rationality and proposers accommodate their offers to their environment. More precisely, subjects keep low acceptance levels all the time, but as proposers they follow a best-response-like approach to choose their offers. We thus find that status equality promotes rational sharing while the influence of structure leads to fairer offers compared to well-mixed populations. Our results are far from what is observed in single-role UG experiments and largely different from available predictions based on evolutionary game theory.

We conducted a series of repeated dual-role UG experiments, including 9 treatment groups with structured populations and 2 control groups with well-mixed populations, in computer labs of Beijing Normal University. Detailed settings of the groups are shown in Supplementary Table 2. All 321 subjects were freshmen and sophomores recruited from Beijing Normal University who had not taken classes in game theory or economics. The interactions were anonymous and took place via computers. Frosted glass dividers ensured that the students could not see each other (see Supplementary Figure 1). We built the experimental platform using PHP, MySQL and JavaScript, and ran the platform programs on a server. Schematic diagrams of our experimental platform are shown in Supplementary Figure 2.
Before starting the experiment, we spent 20 minutes explaining the game to all subjects, including the rules of the game, the purpose of the game, and the feedback information shown on the computer. All subjects in each session were given the same instructions (in Chinese). To ensure that all subjects fully understood the game, we ran 2 exercises and 5 practice rounds before the formal experiment (lasting about 10 minutes). During this period, subjects could raise their hands and our experimenters would answer their questions. The formal experiment lasted about 60 minutes, and subjects were not told the number of rounds, so as to avoid end-round effects. Each round was time-limited (except for the first round). In T3-T4, subjects had 30 seconds to submit their decisions in each round; in the other groups, they had 45 seconds. Subjects knew that if they did not decide within the given time, their decisions from the previous round would be reused. Since the subjects had familiarized themselves with the game during the practice rounds, this happened only 412 times in 24672 decisions (1.67%). After the experiment, the total payoff each subject obtained in the formal experiment was converted to Chinese Yuan at a ratio of 100:1. This payment plus 30 Chinese Yuan was his/her final income (see Supplementary Table 2 for details).
Across all experiments, the data of three subjects (two in C1 and one in T3) were excluded because we noticed that they did not really understand the game. To keep the comparison unbiased, all results were calculated using data from rounds 1-70.

Experimental instructions for T1-T9
Welcome and thanks for participating in this game. Please read the game instructions carefully. If you have any questions, please raise your hand. One of the experimenters will then come to you and answer your questions. From now on, communication with other participants is not allowed. Please switch off your mobile phone and keep quiet throughout the game.
You will play a decision-making game. In the game, you will not know the other persons' true identities.
Your scores depend on your and your partners' decisions. Your final income = fixed income 30 Chinese Yuan + 0.01 × total scores.

Game instruction
1. In this game, you play two roles, proposer and responder, and submit your offer and demand simultaneously. You play the game with four fixed partners, who also play the two roles.
2. At each round, you play the game twice with different roles at the same time. When you play proposer, all your partners play responders; when you play responder, all your partners play proposers.
3. A proposer and a responder share 100 points. If the proposer's offer is greater than or equal to the responder's demand, the proposer receives (100 − proposer's offer) and the responder receives (proposer's offer); otherwise, both receive 0.
4. Your total points are the sum of your four interactions. Your score = (your total points)/(number of partners). After all the participants submit their offer and demand, the system will calculate your points obtained as a proposer, your points obtained as a responder, your total points, and your scores.
Example (A sketch map of the ring structure is shown to subjects.)
1. Suppose you have four partners, A, B, C and D.
2. At each round, you play the game with all your four partners.
3. You submit your offer p and demand q.
4. Suppose the four partners' offers are $p_A$, $p_B$, $p_C$ and $p_D$, and their demands are $q_A$, $q_B$, $q_C$ and $q_D$.
5. Suppose your offer $p$ satisfies $q_C, q_D > p \ge q_A, q_B$. Then, as a proposer, your offer is accepted by A and B, and your points obtained as a proposer are $(100 - p) + (100 - p)$.
6. Suppose your demand $q$ satisfies $p_A < q \le p_B, p_C, p_D$. Then, as a responder, you accept offers from B, C and D, and your points obtained as a responder are $p_B + p_C + p_D$.
7. In this round, your total points are $(200 - 2p) + p_B + p_C + p_D$.

Exercise 1
Now we generate your and your partners' offers and demands randomly. For simplicity, we only generate multiples of 10 for offers and demands. You need to calculate your points obtained as a proposer, your points obtained as a responder, your total points and your score. Different subjects may have different partners, so you cannot calculate your partners' scores.

Exercise 2
Same as Exercise 1.

Experimental instructions for C1-C2
Game instruction
1. In this game, you play two roles, proposer and responder, and submit your offer and demand simultaneously. You play the game with four partners, who also play the two roles.
2. At each round, you play the game twice with different roles at the same time. When you play proposer, all your partners play responders; when you play responder, all your partners play proposers.
3. A proposer and a responder share 100 points. If the proposer's offer is greater than or equal to the responder's demand, the proposer receives (100 − proposer's offer) and the responder receives (proposer's offer); otherwise, both receive 0.
4. Your total points are the sum of your four interactions. Your score = (your total points)/(number of partners). After all the participants submit their offer and demand, the system will calculate your points obtained as a proposer, your points obtained as a responder, your total points, and your scores.
5. At the beginning of each round, you will randomly encounter four new partners.
[The remaining parts are the same as in the instructions for T1-T2.]

Calculating best-response behaviour
We used a rigorous definition of best-response behaviour to identify whether proposers were rational.
The best strategy for rational proposers in each round was to offer the amount that maximized their payoff, given the acceptance levels indicated by neighbouring responders in the previous round [1,2]. For a proposer with $k$ neighbouring responders whose acceptance levels in the previous round were respectively $q_1, \ldots, q_k$ (with $q_1 < \cdots < q_k$), the best strategy was $p = \arg\max_{q_i} u(q_i)$, where $u(q_i)$ was the payoff if the proposer offered $q_i$. We found that the proportion of rational proposers gradually increased and that eventually about half of all proposers adopted best-response behaviours in all groups. Because our definition of best-response behaviour was extremely rigorous, this indicates that the proportion of rational proposers was quite high in the last several rounds.
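The best-response rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: it assumes the acceptance rule from the game instructions (an offer is accepted when it is at least the responder's acceptance level) and the score defined there (total points divided by the number of partners). The function names `proposer_payoff` and `best_response_offers` are hypothetical.

```python
from typing import List, Sequence

POT = 100  # points shared in each proposer-responder interaction


def proposer_payoff(offer: float, acceptance_levels: Sequence[float]) -> float:
    """Average points a proposer earns when making `offer` to every neighbour.

    Assumption: a responder accepts whenever offer >= acceptance level,
    and the proposer then keeps POT - offer from that interaction.
    """
    accepted = sum(1 for q in acceptance_levels if offer >= q)
    return accepted * (POT - offer) / len(acceptance_levels)


def best_response_offers(acceptance_levels: Sequence[float]) -> List[float]:
    """Offers, among the neighbours' previous acceptance levels, with maximal payoff.

    Several offers can tie, so a list is returned.
    """
    candidates = sorted(set(acceptance_levels))
    best = max(proposer_payoff(q, acceptance_levels) for q in candidates)
    return [q for q in candidates if proposer_payoff(q, acceptance_levels) == best]
```

For example, with previous acceptance levels 10, 20, 30 and 60, offering 30 is accepted by three responders and yields 3 × 70 / 4 = 52.5 points, which beats offering 60 to all four (4 × 40 / 4 = 40).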

Experimental settings for single-role ultimatum game experiments
In our previous single-role UG experiments, we conducted a total of 2 treatment groups and 2 control groups [3]. In each group, we recruited 50 subjects, half of whom were randomly assigned to be proposers and the rest to be responders. The interactions were executed via computer and were anonymous.
We built the experimental platform using z-Tree [4]. In the single-role UG experiments, each subject enacted only one role, proposer or responder, and his/her role did not change during the experiment. In the treatment groups, each subject was assigned to a location within a static bipartite network and played the UG with his/her neighbours, who enacted the other role. All of the proposers' neighbours were responders and vice versa. We used two static bipartite networks: a regular bipartite network in which each node has four neighbours, and a random bipartite network in which the number of neighbours ranges from 2 to 6 (with an average degree of 4). In the control groups, the population structure changed and players randomly encountered their neighbours in each round.
In the single-role UG, all subjects had to use one decision when interacting with their neighbours; that is, a proposer had to make the same offer $p$ ($0 \le p \le 100$) to all of his or her neighbouring responders, and a responder had to indicate the same minimum acceptance level $q$ ($0 \le q \le 100$) to all of his or her neighbouring proposers. The payoff of a subject (proposer or responder) was taken as the average points over all of his/her interactions. That is, subject $i$'s payoff can be calculated as $U_i = \sum_{j \in \Gamma_i} U_{ij} / k_i$, where $\Gamma_i$ is the set of his/her neighbours, $k_i$ is the number of his/her neighbours, and $U_{ij}$ is the points subject $i$ obtains from interacting with neighbour $j$.
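The averaged payoff can be illustrated with a short sketch. It assumes that an interaction is settled as in the dual-role instructions (the offer is accepted when it is at least the acceptance level, otherwise both sides get 0); the function names and the toy network in the usage note are hypothetical and not taken from the experiments.

```python
from typing import Dict, List, Tuple


def interaction_points(offer: float, acceptance: float) -> Tuple[float, float]:
    """Return (proposer points, responder points) for one interaction over 100 points."""
    if offer >= acceptance:  # assumed acceptance rule
        return 100 - offer, offer
    return 0, 0


def average_payoffs(
    neighbours: Dict[str, List[str]],
    offers: Dict[str, float],        # one offer per proposer
    acceptances: Dict[str, float],   # one acceptance level per responder
) -> Dict[str, float]:
    """Payoff of each subject i: sum of points over neighbours divided by k_i."""
    payoffs = {}
    for i, gamma_i in neighbours.items():
        total = 0.0
        for j in gamma_i:
            if i in offers:   # i is a proposer, so neighbour j is a responder
                total += interaction_points(offers[i], acceptances[j])[0]
            else:             # i is a responder, so neighbour j is a proposer
                total += interaction_points(offers[j], acceptances[i])[1]
        payoffs[i] = total / len(gamma_i)
    return payoffs
```

On a toy bipartite network with proposers P1 (offer 40) and P2 (offer 10), each linked to responders R1 (acceptance level 20) and R2 (acceptance level 50), only the P1-R1 interaction succeeds, so P1 earns 60 / 2 = 30 and R1 earns 40 / 2 = 20, while P2 and R2 earn 0.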

Calculating 'offering the maximum acceptance level' behaviour
We note that the best-response offer for a proposer must be equal to one of his/her neighbours' acceptance levels. In particular, the best-response offer is exactly the maximum acceptance level if $\max_i\{q_i(t)\} \le 25$. We then calculate the proportion of behaviours that offer the maximum acceptance level, i.e., $p(t+1) = \max_i\{q_i(t)\}$. The proportions of 'offering the maximum acceptance level' behaviours are 0.3043 and 0.2250 for the treatment groups and the control groups, respectively. Note that the proportions of best-response behaviours are 0.3047 and 0.2264 for the treatment groups and the control groups, respectively, i.e., the proportion of behaviours that offer the maximum acceptance level is slightly lower than that of the best-response behaviours. This reveals that some subjects indeed choose their offers based on best-response considerations rather than simply adopting their neighbours' maximum acceptance level.
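The threshold of 25 can be checked directly. The sketch below assumes $k = 4$ neighbours (as in the four-partner instructions) and the acceptance rule of the game, so that an offer equal to the maximum acceptance level $q_{\max}$ is accepted by all four responders, while any lower offer $p'$ is accepted by at most three of them.

```latex
\begin{align*}
\text{offer } q_{\max} \le 25: \quad & 4\,(100 - q_{\max}) \;\ge\; 4 \times 75 \;=\; 300, \\
\text{offer } p' < q_{\max}: \quad & n\,(100 - p') \;\le\; 3 \times 100 \;=\; 300, \qquad n \le 3 \text{ acceptances}.
\end{align*}
```

Under these assumptions, offering the maximum acceptance level is never worse than any lower offer, and the two payoffs coincide only in the boundary case $q_{\max} = 25$ with three neighbours accepting an offer of 0.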

Reinforcement learning model
To gain deeper insight into the theoretical significance of our experimental results, we ran simulations based on a type of reinforcement learning model [5]. First, we built two databases of all responder acceptance levels obtained from all treatment groups and all control groups, respectively. We then randomly picked responder acceptance level sequences from the two databases and used the reinforcement learning model to reproduce proposers' offers $p$ for the treatment groups and the control groups, respectively. We used a static 4-degree ring structure and a well-mixed population with 50 subjects in the treatment simulations and the control simulations, respectively. The simulation process of the reinforcement learning model is as follows.
1. Initial propensities: We reduce the offer set of proposers to {0, 5, 10, ..., 100} in our simulations and assume that all proposers have the same initial propensities for all offers $p$ in the simplified strategy set, which are set equal to the fair split, 50.

2. Update propensities: Suppose proposer $i$ has chosen offer $p_k$ in round $t$. The propensity in round $t + 1$ is updated by [equation missing], where $Q_i^k(t+1)$ and $Q_i^k(t)$ denote the propensities of proposer $i$ for the chosen offer $p_k$ at round $t + 1$ and round $t$, respectively, $u_i^k(t)$ is the payoff of proposer $i$ with the chosen offer $p_k$ at round $t$, and $\theta$ is the learning rate, which is set to 0.2 in our simulations.
3. Update probabilities: The probability of choosing offer $p_k$ in round $t + 1$ is determined by [equation missing], where $k = 1, 2, \ldots, 21$ and $\lambda$ is a parameter that determines reinforcement sensitivity, which is set to 0.2 in our simulations.
We repeat step 2 and step 3 until the simulation reaches a predetermined round.

Supplementary Figure caption (fragment): The ordinate represents the spatial order of proposers. Two proposers with the most common neighbours are adjacent to each other. e-h, Spatio-temporal patterns of rational behaviours (q = 0 or 1) and irrational behaviours of responders in T1-T2 (e), T3-T4 (f), T5-T9 (g) and C1-C2 (h). The ordinate represents the spatial order of responders. Two responders with the most common neighbours are adjacent to each other. The red colour represents rational behaviours and the blue colour represents irrational behaviours.

Supplementary Figure caption (fragment): i.e., both faster and slower decisions are more likely to be best-response. However, in T3-T4 (b), subjects do not have enough time to make a slow decision given the 30-second time limit. This explains why the proportion of best-response behaviours in T3-T4 is lower than in T1-T2 and T5-T9. Finally, in C1-C2 (d), the correlation between best-response behaviours and t −(t) is not significant (Pearson correlation coefficient = 0.0981, P-value = 0.4933). This implies that decision time may not affect best-response behaviours in a well-mixed population.

Supplementary Table caption (fragment): of Mann-Whitney U-test for offer p and acceptance level q (n = 103 in T1-T2, n = 59 in T3-T4, n = 50 in T5-T9, and n = 106 in C1-C2). A subject's offer (or acceptance level) is taken as the average of his/her p (or q) over 70 rounds. The symbol "*" denotes that the mean values of two groups are significantly different, i.e., P-value < 0.05.
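The three-step reinforcement learning procedure described in the section "Reinforcement learning model" can be sketched as follows. The update and choice equations are not given in the text, so this sketch substitutes two common forms that are assumptions, not the authors' confirmed model: an exponential-smoothing update Q ← (1 − θ)Q + θu for the chosen offer, and a logit (softmax) choice rule with sensitivity λ. Only the offer set, the initial propensity of 50, and θ = λ = 0.2 come from the text; all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

OFFERS = list(range(0, 101, 5))  # simplified offer set {0, 5, ..., 100}: 21 offers
THETA = 0.2    # learning rate (value stated in the text)
LAMBDA = 0.2   # reinforcement sensitivity (value stated in the text)


def initial_propensities() -> List[float]:
    """Step 1: every offer starts with the same propensity, the fair split 50."""
    return [50.0] * len(OFFERS)


def update_propensity(Q: Sequence[float], k: int, payoff: float,
                      theta: float = THETA) -> List[float]:
    """Step 2. ASSUMED form: smooth the chosen offer's propensity towards its payoff."""
    new_Q = list(Q)
    new_Q[k] = (1 - theta) * new_Q[k] + theta * payoff
    return new_Q


def choice_probabilities(Q: Sequence[float], lam: float = LAMBDA) -> List[float]:
    """Step 3. ASSUMED form: logit (softmax) rule over the propensities."""
    m = max(Q)  # subtract the maximum for numerical stability
    weights = [math.exp(lam * (q - m)) for q in Q]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_proposer(acceptance_history: Sequence[Sequence[float]],
                      rng: random.Random) -> List[int]:
    """Offers chosen by one proposer facing a sequence of neighbour acceptance levels."""
    Q = initial_propensities()
    chosen = []
    for levels in acceptance_history:
        probs = choice_probabilities(Q)
        k = rng.choices(range(len(OFFERS)), weights=probs)[0]
        offer = OFFERS[k]
        accepted = sum(1 for q in levels if offer >= q)
        payoff = accepted * (100 - offer) / len(levels)  # average points per neighbour
        Q = update_propensity(Q, k, payoff)
        chosen.append(offer)
    return chosen
```

A call such as `simulate_proposer([[0, 5, 10, 25]] * 70, random.Random(0))` would produce one simulated sequence of 70 offers; in the simulations described above, the acceptance levels would instead be drawn from the experimental databases.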