Examining learning coherence in group decision-making: triads vs. tetrads

This study examined whether three heads are better than four in terms of performance and learning properties in group decision-making. We predicted that learning incoherence would occur in tetrads, because the majority rule cannot be applied when two equal subgroups emerge, and that tetrads would consequently underperform triads. To test this hypothesis, we adopted a reinforcement learning framework using simple Q-learning and estimated learning parameters. Overall, the results were consistent with the hypothesis. Further, this study is one of a few attempts to apply a computational approach to learning behavior in small groups. This approach enables the identification of underlying learning parameters in group decision-making.

www.nature.com/scientificreports/

learning strategies, conflicts arise, leading to a situation in which both strategies are randomly adopted. The probability of adopting these strategies is $6p^2(1-p)^2$. In contrast, the triads never encounter this situation because the majority, who decides the preferred strategy, always exists. Thus, learning coherence could be achieved in triads. The difference between expected rewards for the tetrads and triads is $-3p^2$, indicating that tetrads underperform triads. The purpose of this study was to examine the hypothesis that learning coherence emerges in individuals and triads and learning incoherence occurs in tetrads by estimating and comparing learning parameters.
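The tie probability quoted above can be checked by brute-force enumeration. The following is a minimal sketch, assuming that each member independently favors one of two strategies with probability p (an assumption, since the sentence defining p is not part of this excerpt). The function name `tie_probability` is hypothetical.

```python
from itertools import product


def tie_probability(group_size: int, p: float) -> float:
    """Probability that a group splits evenly between two strategies.

    Assumption: each member independently favors strategy A with
    probability p and strategy B with probability 1 - p.
    """
    total = 0.0
    # Enumerate every possible pattern of individual preferences (1 = A, 0 = B).
    for votes in product((0, 1), repeat=group_size):
        supporters_of_a = sum(votes)
        if 2 * supporters_of_a == group_size:  # even split, no majority
            total += p ** supporters_of_a * (1 - p) ** (group_size - supporters_of_a)
    return total


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        print(p, tie_probability(3, p), tie_probability(4, p))
```

For a group of four the result equals 6p^2(1-p)^2, whereas for a group of three it is always zero because an even split is impossible.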

Participants.
A total of 343 healthy undergraduate students at Kobe University (103 women, age range = 19-25 years, SD = 1.21) participated in the study for course credit. All experimental protocols in this study were approved by the Ethics Committee, Graduate School of Business Administration, Kobe University, and the study was carried out in accordance with the relevant guidelines and regulations. All participants signed an informed consent form before the experiment.
Experiments. In test 1, participants undertook cognitive tasks (two-armed bandit [TAB] problems) individually. In test 2, they formed groups of three and performed the same tasks as groups. In test 3, they formed groups of four and performed the same cognitive tasks as groups. There were seven rounds of tests. To control for learning effects, the three tests were randomly assigned to either groups or individuals in each round; that is, some groups were triads, other groups were tetrads, and the remaining participants worked individually. All tests were performed with PsyToolkit 32,33, and when participants performed the TAB tasks as a group, they communicated with each other via a breakout session in Zoom to decide on their choices in the TAB.
All participants undertook test 1, and most of the participants took part in tests 2 and 3. In each of the triad and tetrad groups, at least one member participated in both tests 2 and 3. Eight triads were dropped from the sample because their members did not participate in test 3, and four tetrads were dropped because their members did not participate in test 2. As a result, the numbers of triads and tetrads examined in this study were 100 and 104, respectively.
Q-learning model. In this study, a simple Q-learning reinforcement learning algorithm 34 was adopted to account for asymmetric learning rates (learning biases). Participants played a TAB problem in which they chose either the right or the left box on the screen. After each selection, the participants were awarded either 10 or 0 points, and they were instructed to try to achieve the highest score over a series of 100 choices. One of the boxes had a higher probability (70%) of being worth 10 points, and the corresponding probability for the other box was set at 30%. However, we switched these probabilities twice over the 100 choices. For example, the right and left boxes had a respective 70% and 30% probability of being worth 10 points for the first 30 choices, and from the 31st to the 70th choice, the probabilities switched such that the probabilities of earning 10 points for the right and left boxes became 30% and 70%, respectively. Then, for the last 30 choices, the probabilities of the right and left boxes returned to the initial respective levels of 70% and 30%. Thus, in each round of tests 1 and 2, the 100 choices comprised three phases separated by two changes in probability. Moreover, the probabilities were randomized for every round of tests 1 and 2 so that even in the same test, the probability for each round differed. Therefore, participants could not transfer learning obtained in one round of the test to other rounds.
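The reward schedule in the example above can be sketched in a few lines. This is an illustration only: it hard-codes the switch points from the example (choices 31 and 71), takes rewards to be 10 or 0 points as in the definition of the reward used in the model, and represents the per-round randomization only through a hypothetical `start_side` argument. The function names are not from the study.

```python
import random


def high_reward_side(trial: int, start_side: str = "right") -> str:
    """Return the box with the 70% reward probability on a 1-indexed trial.

    Assumption: the advantageous box is `start_side` for choices 1-30,
    the other box for choices 31-70, and `start_side` again for 71-100.
    """
    other_side = "left" if start_side == "right" else "right"
    return other_side if 31 <= trial <= 70 else start_side


def draw_reward(trial: int, choice: str, rng: random.Random,
                start_side: str = "right") -> int:
    """Draw a reward of 10 or 0 points for choosing `choice` on `trial`."""
    p_reward = 0.7 if choice == high_reward_side(trial, start_side) else 0.3
    return 10 if rng.random() < p_reward else 0


if __name__ == "__main__":
    rng = random.Random(0)
    # Score of a player who always picks the right box over 100 choices.
    score = sum(draw_reward(t, "right", rng) for t in range(1, 101))
    print("always-right score:", score)
```

A player who never switches boxes is on the advantageous side for 60 of the 100 choices, which is why tracking the reversals matters for the final score.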
In the Q-learning framework, a decision-maker is assumed to calculate the action value for each choice (i.e., the right and left boxes). The action value of option $i$ at trial $t$ is denoted by $Q_i(t)$ and is updated through the reward prediction error $\delta(t)$ (Eq. 1), where $R_i(t)$ is the reward associated with option $i$ at trial $t$, either 10 or 0 points. $\alpha^{\pm}$ denotes the learning rates, so that learning biases are measured by $\alpha^{+} - \alpha^{-}$. If this difference is positive (negative), positivity (negativity) biases exist. $\varphi$ is added in Eq. 1 as the choice trace to account for autocorrelation of choice, which could affect learning biases 35.
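As an illustration of the update described above, the following is a minimal sketch of an asymmetric Q-learning step. It assumes the standard form, with prediction error equal to the reward minus the current action value, and the action value incremented by the learning rate times that error, using the positive learning rate when the error is non-negative and the negative learning rate otherwise. The choice-trace term is left out because its exact functional form cannot be recovered from the text here, and updating only the chosen option is a further assumption. The function name `update_q` is hypothetical.

```python
from typing import List, Tuple


def update_q(q: List[float], choice: int, reward: float,
             alpha_pos: float, alpha_neg: float) -> Tuple[List[float], float]:
    """Return updated action values and the reward prediction error.

    Assumptions (not confirmed by the source): only the chosen option is
    updated, and the choice-trace term is omitted.
    """
    delta = reward - q[choice]                       # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg   # asymmetric learning rate
    new_q = list(q)
    new_q[choice] = q[choice] + alpha * delta
    return new_q, delta


if __name__ == "__main__":
    q = [0.0, 0.0]
    # A rewarded choice followed by an unrewarded choice of the same box.
    for reward in (10, 0):
        q, delta = update_q(q, choice=0, reward=reward,
                            alpha_pos=0.5, alpha_neg=0.2)
        print(q, delta)
```

With a larger positive than negative learning rate, as in this example, the action value rises quickly after a reward and falls slowly after an omission, which is what a positivity bias means in this framework.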
As one of the characteristics of learning, this study compared positivity biases. Positivity and confirmation biases refer, respectively, to the tendency to respond more sensitively to positive news than to negative news, and the tendency to give greater weight to outcomes consistent with one's hypothesis 36. Related studies examined the existence of these biases in individual reinforcement learning and reported that learning rates tend to be positively biased 37-41. Katahira 35 suggested that the autocorrelation of choices itself tends to generate pseudo-positivity biases. Harada 42 controlled for this autocorrelation by incorporating the effects of past choices into the learning model, and demonstrated that positivity biases were indeed confirmed in a simple Q-learning model. However, once a more dynamic model was introduced, the positivity biases disappeared. Therefore, learning biases depended not only on the autocorrelation of choices but also on the autocorrelation of learning parameters in the model. While previous studies examined learning biases for individuals, this study investigated the existence of positivity biases in group learning of triads and tetrads. As related studies indicated, it could be inferred that either positivity biases existed or no biases existed for both triads and tetrads. According to our hypothesis, we speculated that learning coherence in triads would lead to positivity biases, because individual learning was reported to generate positivity biases in related studies, whereas tetrads would generate no biases owing to learning incoherence.
Given these action values of the two options, the decision-maker selects one of the two options according to the softmax decision rule:

$$P(a(t) = i) = \frac{\exp\big(\beta Q_i(t)\big)}{\sum_{j} \exp\big(\beta Q_j(t)\big)},$$

where $P(a(t) = i)$ is the probability of choosing the action $a(t) = i$ at trial $t$. The parameter $\beta$ is the inverse temperature, which measures the relative strength of exploitation vs. exploration (exploitation/exploration ratio).
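To make the role of the inverse temperature concrete, the sketch below evaluates a softmax choice rule for two action values. It assumes the standard form, in which each option's choice probability is proportional to the exponential of the inverse temperature times its action value. The function name `softmax_probs` is hypothetical.

```python
import math
from typing import List


def softmax_probs(q_values: List[float], beta: float) -> List[float]:
    """Choice probabilities under a softmax rule with inverse temperature beta."""
    # Subtract the maximum before exponentiating for numerical stability;
    # this leaves the resulting probabilities unchanged.
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [4.0, 0.0]  # the first option currently has the higher action value
    for beta in (0.0, 0.5, 1.0, 5.0):
        print(beta, softmax_probs(q, beta))
```

At an inverse temperature of zero both options are chosen with equal probability (pure exploration), and as the inverse temperature grows the probability of the higher-valued option approaches one (pure exploitation).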
Exploitation is related to optimization under current contexts, implying the choice of the option with the highest action value $Q_i(t)$. Exploration, on the other hand, refers to a departure from optimization, so that an option without the highest action value is selected. If $\beta$ is high, the probability of choosing the option with the highest action value increases, leading to exploitation. In contrast, if $\beta$ is low, the probability of choosing the option without the highest action value increases. Thus, $\beta$ measures the exploitation/exploration ratio.
The parameters were estimated by maximizing the posterior,

$$\hat{\theta}_s = \arg\max_{\theta_s} \, p(D_s \mid \theta_s)\, p(\theta_s),$$

where $p(D_s \mid \theta_s)$ is the likelihood of data $D_s$ for a subject $s$ conditional on parameters $\theta_s = (\alpha_s^{\pm}, \varphi_s, \beta_s)$, and $p(\theta_s)$ is the prior probability of $\theta_s$. Note that $\alpha^{\pm}$ should be bounded between 0 and 1, and $\beta$ takes non-negative values. Therefore, the corresponding priors were assumed to follow beta distributions for $\alpha^{\pm}$ with shape parameters of 2 and 2, and gamma distributions for $\beta$ with a shape parameter of 2 and a scale parameter of 3. In addition, $\varphi_s$ is assumed to follow a standard normal distribution with mean 0 and variance 1.
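The priors described above can be written down directly. The following is a minimal sketch of the log-prior and of a maximum a posteriori search over a list of candidate parameter vectors. The likelihood is passed in as a function because its exact form (in particular how the choice trace enters) is not specified in this excerpt. The function names, the candidate-list search, and the treatment of a zero inverse temperature as having zero prior density are assumptions made for illustration, not the estimation procedure used in the study.

```python
import math
from typing import Callable, Iterable, Tuple

Params = Tuple[float, float, float, float]  # (alpha_pos, alpha_neg, phi, beta)


def log_prior(alpha_pos: float, alpha_neg: float, phi: float, beta: float) -> float:
    """Log prior density: Beta(2, 2) on each learning rate, standard normal
    on phi, and Gamma(shape=2, scale=3) on beta."""
    if not (0.0 < alpha_pos < 1.0 and 0.0 < alpha_neg < 1.0 and beta > 0.0):
        return -math.inf  # outside the support (or zero density at the edge)
    lp = 0.0
    for alpha in (alpha_pos, alpha_neg):
        # Beta(2, 2) density is 6 * x * (1 - x).
        lp += math.log(6.0) + math.log(alpha) + math.log(1.0 - alpha)
    # Gamma(shape=2, scale=3) density is x * exp(-x / 3) / 9.
    lp += math.log(beta) - beta / 3.0 - math.log(9.0)
    # Standard normal density for the choice-trace parameter.
    lp += -0.5 * phi ** 2 - 0.5 * math.log(2.0 * math.pi)
    return lp


def map_estimate(candidates: Iterable[Params],
                 log_likelihood_fn: Callable[..., float]) -> Params:
    """Return the candidate with the highest log-likelihood plus log-prior."""
    return max(candidates,
               key=lambda theta: log_likelihood_fn(*theta) + log_prior(*theta))


if __name__ == "__main__":
    # With a flat likelihood the search simply returns the most probable
    # candidate under the prior.
    flat = lambda alpha_pos, alpha_neg, phi, beta: 0.0
    grid = [(a, 0.5, 0.0, b) for a in (0.1, 0.5, 0.9) for b in (1.0, 3.0, 6.0)]
    print(map_estimate(grid, flat))
```

In practice the candidate list would be replaced by a numerical optimizer, but the objective (log-likelihood plus log-prior) stays the same.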

Results
This study investigated underlying learning mechanisms of triads and tetrads from two perspectives: (1) group differences and (2) within-group effects. The descriptive statistics for relevant variables are reported in Table 1.
Because the data failed either the Bartlett test for homogeneity of variance or the Shapiro-Wilk test for normality in the comparisons across and within groups, the Kruskal-Wallis test was applied in all subsequent analyses. Owing to space limitations, the results of the Bartlett and Shapiro-Wilk tests are not reported.
Group differences. Performance. First, the performance difference between triads and tetrads was examined. The result indicated that triads outperformed tetrads ($\chi^2 = 4.12$, $p = 0.04$). Thus, triads achieved slightly higher performance than tetrads (see Fig. 1).
Inverse temperature. As the first characteristic of learning, the magnitude of the inverse temperature was compared between triads and tetrads. Inverse temperature measures the degree of exploitation vis-à-vis exploration. Exploitation adopts the optimal choices, given existing information, whereas exploration makes random choices. Inverse temperature was significantly higher for triads than for tetrads ($\chi^2 = 42.88$, $p = 5.8\mathrm{e}{-11}$) (see Fig. 2). It follows that tetrads were more likely to make random choices, regardless of past records. It could be inferred that this result arose because the majority rule was harder to apply in tetrads than in triads. This implied learning coherence in triads and incoherence in tetrads.
Positivity biases. As the second characteristic of learning, this study compared positivity biases. While previous studies examined learning biases for individuals, this study investigated the existence of positivity biases in group learning of triads and tetrads. As related studies indicated, it could be inferred that either positivity biases existed or no biases existed for both triads and tetrads. For triads, positivity biases were confirmed ($\chi^2 = 13.39$, $p = 2.5\mathrm{e}{-04}$). For tetrads, however, we found negativity biases ($\chi^2 = 24.05$, $p = 9.4\mathrm{e}{-07}$). This study also investigated learning biases for individuals, revealing that positivity biases existed ($\chi^2 = 22.08$, $p = 2.6\mathrm{e}{-06}$). Thus, while individuals and triads exhibited positivity biases, tetrads exhibited negativity biases (see Fig. 3). According to related studies, this result suggested learning coherence for triads and learning incoherence for tetrads.
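The reported p-values can be cross-checked against the test statistics. The sketch below assumes that each Kruskal-Wallis statistic compares two samples, so that it is referred to a chi-square distribution with one degree of freedom; the degrees of freedom are not stated in the text, and the helper name `chi2_sf_df1` is hypothetical.

```python
import math


def chi2_sf_df1(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    A chi-square variable with 1 df is the square of a standard normal Z,
    so P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Test statistics reported in the Results section.
    reported = {
        "performance (triads vs. tetrads)": 4.12,
        "inverse temperature (triads vs. tetrads)": 42.88,
        "learning bias, triads": 13.39,
        "learning bias, tetrads": 24.05,
        "learning bias, individuals": 22.08,
    }
    for label, stat in reported.items():
        print(f"{label}: chi2 = {stat}, p = {chi2_sf_df1(stat):.2g}")
```

Under the one-degree-of-freedom assumption, the computed values agree with the p-values reported in the text to the precision given there.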

Within-group effects.
As the within-group effects, the maximum, minimum, and the average of group members' individual performances and learning parameters were compared with the corresponding group variables.
Thus, group effects in triads were high in generating positivity biases, but those in tetrads were also significant in giving rise to negativity biases.

Discussion
Overall, our statistical analysis revealed that triads had higher performance, higher inverse temperature, and stronger positivity biases than tetrads. Since inverse temperature and positivity biases were indicated to be positively related to performance, these results implied that triads achieved learning coherence, whereas tetrads experienced learning incoherence. On the one hand, it can be inferred that the tendency of triads to break into majority and minority subgroups enabled the group to achieve consistent and efficient learning over 100 choices, as indicated by high performance, high inverse temperature, and positivity biases. On the other hand, tetrads, which might be constrained by two equal subgroups, encountered dispute and confrontation, sometimes leading to deadlock and resulting in lower performance and inconsistent learning behavior, represented by low inverse temperature and high negativity biases. These results were consistent with related studies 14,25-29,31.
Of course, dispute and confrontation do not necessarily impair group performance. For example, in more creative tasks that require insight and experimentation, the high exploration observed in tetrads might be more efficient than triads subjected to majority influence. However, in the TAB problems, insight and experimentation were not required. Instead, utilizing past information and efficiently guessing the advantageous box played a critical role in achieving higher performance, which in turn hinged on consistency in learning strategies. Thus, while this study confirmed that odd-sized groups (i.e., triads) performed better than even-sized groups (i.e., tetrads) in learning tasks that do not require creativity or insight, tetrads might be superior to triads in creative tasks and insight problem-solving. This could be an interesting research topic in the future.
In contrast to performance, inverse temperature, and positivity biases, the risk parameters $\mu$ and $\nu$ did not account for the difference between triads and tetrads. Note that risk-seeking behavior also involves a tendency to diverge from current learning strategies. In this sense, risk-seeking has some similarity to exploration. However, in our model, exploration corresponded to divergence from the optimal Q value, which already incorporated risk-seeking behavior. Hence, risk-seeking and exploration differ subtly. The fact that inverse temperature, but not the risk parameters, differed between triads and tetrads implies that the divergence from a consistent learning strategy was reflected in the inverse temperature but not in risk attitudes.
In addition to these results, this paper contributes a novel methodology for the study of small groups. To the best of our knowledge, this is one of the first attempts to take a computational approach to the study of small-group dynamics. Of course, a large body of literature on group dynamics has empirically investigated the properties of the dynamics of small groups. However, most of these studies did not explicitly model the underlying mechanism of group decision-making or estimate parameters that characterize group dynamics. The computational approach proposed in this paper articulates the algorithm of group decision-making and enables the underlying learning parameters to be estimated, allowing for rigorous comparison among small groups in terms of learning parameters such as inverse temperature and risk attitudes. We hope this computational approach sheds new light on group dynamics and group decision-making.
In this respect, it should also be noted that a simple Q-learning model, or reinforcement learning in general, closely corresponds to the actual working of neural networks in the brain. The key variables are the actual rewards and reward prediction errors. The Q value is the expected reward, which is updated by feedback from a reward prediction error. This reinforcement learning framework is supported by a number of empirical studies, including those reporting neural signals in various cortical and subcortical structures that behave as predicted 43-46. For example, it is now commonly accepted that dopamine neurons in the midbrain of humans and monkeys encode reward prediction errors 46-48. Thus, the reinforcement learning model class is typically matched by brain activity. Since the simple Q-learning model considered in this paper belongs to this model class, the model matches brain activity, unlike abstract and unrealistic models without an empirical foundation.
One of the managerial implications derived from this study is that group size is crucial to the management of small groups. In particular, when groups undertake learning under uncertainty without the burden of creativity and insight, triads, rather than tetrads, should be selected. However, when tasks require much creativity and insight, tetrads, rather than triads, might be preferred, although this idea was not examined in this study. In broader contexts, odd-sized groups are favored for learning tasks without creativity and even-sized groups for creative problem-solving 25. This rule is clear and straightforward to implement, although diversity in knowledge, skills, working experience, cultural backgrounds, and personalities also accounts for group performance. Nevertheless, unless managers have sufficient time to take these factors into account, this simple rule should be implemented.
Finally, we would like to point out the limitations of this study. First, while we confirmed that triads outperformed tetrads, the learning coherence of triads and the learning incoherence of tetrads were inferred from the results on inverse temperature and positivity biases, rather than derived from a strong theoretical background. In this sense, these learning characteristics were exploratory in our hypothesis. In future studies, more detailed learning mechanisms generating learning coherence and incoherence should be specified and empirically tested. Second, the learning tasks (TAB) are fundamental to the results of this study. If different kinds of tasks were assigned, the relative performance of triads and tetrads might differ. In particular, as described above, insight problem-solving or creative tasks might yield opposite results regarding the relative performance of triads and tetrads. This constitutes one of our future research challenges.

Conclusion
This study focused on the relative performance and learning characteristics of triads and tetrads as an extension of Simmel's 49 research on dyads vs. triads to triads vs. tetrads, and also serves as a specific investigation of odd- vs. even-sized group dynamics 14,25. Generally, our study confirmed that the odd-sized groups performed better than the even-sized groups. Moreover, it was revealed that learning coherence and incoherence were observed in triads and tetrads, respectively. In addition to the confirmation of the theoretical predictions, this study developed a new computational model that enables the estimation of the underlying learning properties of small groups. In related work, Harada 50 also showed that individuals and triads performed better than dyads owing to the learning coherence of individuals and triads and the learning incoherence of dyads. This study was consistent with this result in that the odd-sized groups (triads) performed better than even-sized groups (tetrads). To the best of our knowledge, this study is one of a few attempts to apply the reinforcement learning framework to group decision-making.