Reinforcement learning accounts for moody conditional cooperation behavior: experimental results

In social dilemma games, human participants often show conditional cooperation (CC) behavior or its variant called moody conditional cooperation (MCC), whereby they tend to cooperate when many of their peers cooperated in the previous round. Recent computational studies showed that CC and MCC behavioral patterns could be explained by reinforcement learning. In the present study, we use a repeated multiplayer prisoner’s dilemma game and a repeated public goods game played by human participants to examine whether MCC is observed across different types of games and whether reinforcement learning explains the observed behavior. We observed MCC behavior in both games, but the MCC that we observed differed from that reported in past experiments: whether or not a focal participant had cooperated previously affected the overall level of cooperation, rather than modulating the tendency to cooperate in response to the other participants’ cooperation in the previous round. We found that, across different conditions, reinforcement learning models were approximately as accurate as an MCC model in describing the experimental results. Consistent with the previous computational studies, the present results suggest that reinforcement learning may be a major proximate mechanism governing MCC behavior.


Supporting Information for "Reinforcement learning accounts for moody conditional cooperation behavior: experimental results"
Yutaka Horita, Masanori Takezawa, Keigo Inukai, Toshimasa Kita, Naoki Masuda

Supplementary Figures
Figure S1: Decision screens presented to the participants
Figure S2: The relationship between the level of cooperation in the PDG and that in the PGG
Figure S3: The expected probability of C predicted by the reinforcement learning models in the PDG
Figure S4: The expected fraction of contributions predicted by the reinforcement learning models in the PGG

Supplementary Tables
Table S1: Maximum likelihood estimators obtained for the behavioral data in the PDG
Table S2: Maximum likelihood estimators obtained for the behavioral data in the PGG

Supplementary Figures
Supplementary Figure S1: Decision screens presented to the participants. (a) PDG in the fixed treatment, (b) PDG in the mixed treatment, (c) PGG in the fixed treatment, and (d) PGG in the mixed treatment. The actual screens were displayed in Japanese; the text shown here is an English translation.

Supplementary Figure S3: The expected probability of C predicted by the reinforcement learning models in the PDG. We predicted the probability of C by running the BM or RE model with the estimated parameter values. We then averaged the predicted probability of cooperation over participants and rounds and plotted it as a function of N_c (i.e., the number of other group members that cooperated in the previous round).
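To make the averaging procedure concrete, here is a minimal sketch in Python (ours, not the authors' code) of one standard aspiration-based Bush-Mosteller (BM) update and of the per-N_c averaging described above. The paper's exact parameterization of the BM and RE models may differ from this textbook form, and all names here (bm_update, aspiration, beta) are illustrative.

# Illustrative sketch, not the authors' code: a textbook Bush-Mosteller
# (BM) update and the averaging used in Figure S3.
import numpy as np

def bm_update(p, action, payoff, aspiration=0.5, beta=1.0):
    """One aspiration-based BM step. p is the current probability of C;
    the stimulus s compares the realized payoff with the aspiration and
    reinforces (s >= 0) or inhibits (s < 0) the chosen action."""
    s = np.tanh(beta * (payoff - aspiration))
    if action == 1:                      # chose C
        return p + (1 - p) * s if s >= 0 else p + p * s
    else:                                # chose D
        return p - p * s if s >= 0 else p - (1 - p) * s

def mean_prediction_by_nc(pred_p, n_c):
    """Average predicted P(C) over participants and rounds for each value
    of N_c (number of other group members that cooperated last round)."""
    pred_p, n_c = np.asarray(pred_p), np.asarray(n_c)
    return {int(k): float(pred_p[n_c == k].mean()) for k in np.unique(n_c)}

For example, after computing a predicted P(C) for every participant-round, mean_prediction_by_nc(pred_p, n_c) returns one averaged value per N_c, which is what each point in Figure S3 plots.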

Values of the payoffs used in the experiments
We determined the values of b, c, and m so that the payoff when a player chose C or D in the PDG was identical to the payoff when the player maximally cooperated or maximally defected in the PGG, respectively. Denote by N the number of participants in the group. Denote by $\bar{K}$ the fraction of the other group members who selected C in the case of the PDG and the normalized contribution averaged over the other group members in the case of the PGG. Denote by $a'_i$ the action ($C = 1$, $D = 0$) of the $i$th other member in the group ($1 \le i \le N-1$) in the case of the PDG and the normalized contribution by the $i$th other member ($0 \le a'_i \le 1$) in the case of the PGG. Then, we obtain

$$\bar{K} = \frac{1}{N-1}\sum_{i=1}^{N-1} a'_i.$$
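A minimal sketch (ours, not the authors' code) of how $\bar{K}$ and the two per-round payoffs can be computed, assuming that equations (1) and (2) of the main text take the standard forms $b\bar{K} - c\,a_t$ (PDG) and $c(1 - a_t) + (mc/N)\{a_t + (N-1)\bar{K}\}$ (PGG); this reading of the two equations, and all function and parameter names, are our assumptions, not the published forms.

def k_bar(others):
    """Mean action of the N-1 other members: fraction of cooperators in
    the PDG, mean normalized contribution in the PGG."""
    return sum(others) / len(others)

def payoff_pdg(a_t, others, b, c):
    """Assumed form of eq. (1): benefit b spread over the N-1 others per
    cooperator, cost c for cooperating (a_t = 1) oneself."""
    return b * k_bar(others) - c * a_t

def payoff_pgg(a_t, others, c, m, n):
    """Assumed form of eq. (2): keep the uncontributed endowment
    c * (1 - a_t); the pooled contributions are multiplied by m and
    shared equally among all n group members."""
    return c * (1 - a_t) + (m * c / n) * (a_t + sum(others))

For example, with n = 4 and all other members fully cooperating, payoff_pdg(1, [1, 1, 1], b, c) returns b - c and payoff_pgg(1, [1, 1, 1], c, m, 4) returns mc, so at $\bar{K} = 1$ the first condition below reduces to $b - c = mc$.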
First, consider the case in which a player maximally cooperates in both the PDG and the PGG. By equating the payoff value given by equation (1) with $a_t = 1$ and that given by equation (2) with $a_t = 1$, we obtain

$$b\bar{K} - c = \frac{mc}{N}\left(1 + (N-1)\bar{K}\right).$$

Second, consider the case in which a player maximally defects in both the PDG and the PGG. By equating equation (1) with $a_t = 0$ and equation (2) with $a_t = 0$, we obtain

$$b\bar{K} = c + \frac{mc(N-1)}{N}\bar{K}.$$
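As a consistency check, the two conditions above can be reproduced symbolically. The following sketch (again under our assumed reading of equations (1) and (2), not necessarily the published forms) substitutes $a_t = 1$ and $a_t = 0$ into the two payoffs and prints the resulting equalities.

# Sketch: derive the two matching conditions symbolically with sympy,
# under the same assumed payoff forms as in the sketch above.
import sympy as sp

a, b, c, m, N, K = sp.symbols('a b c m N K', positive=True)

pdg = b * K - c * a                                   # assumed eq. (1)
pgg = c * (1 - a) + (m * c / N) * (a + (N - 1) * K)   # assumed eq. (2)

# Maximal cooperation in both games (a_t = 1):
cond_coop = sp.Eq(pdg.subs(a, 1), pgg.subs(a, 1))
# Maximal defection in both games (a_t = 0):
cond_defect = sp.Eq(pdg.subs(a, 0), pgg.subs(a, 0))

print(sp.simplify(cond_coop))    # -> b*K - c == m*c*(1 + (N - 1)*K)/N
print(sp.simplify(cond_defect))  # -> b*K == c + m*c*(N - 1)*K/N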