Introduction

Individual decision-making is a key research area in microeconomics and a considerable amount of theoretical and experimental research has been devoted to its analysis. However, many decisions in firms, households, and other contexts are taken by groups, not by individuals. In addition, most groups, especially in firms, are characterized by hierarchical organizational structures. While hierarchies are a common feature of organizations, surprisingly little is known about their effect on outcomes in economic experiments, despite the well-known advantages of experiments when attempting to isolate true ceteris paribus effects (Falk and Heckman, 2009).

Magee and Galinsky (2008) posit that status and power are the two bases of hierarchy and describe power as related to ‘control over valued resources’ and status as ‘respect one has in the eyes of others’. To explain the pervasiveness of hierarchy in social settings, they outline two functions of hierarchy, namely, establishing order and improving decision-making (related to power) and motivating individuals (related to status). The outcomes of interest analyzed here are choices under hierarchy in a lottery task (Holt and Laury, 2002) and answers in intellective tasks (following the group task continuum definition proposed by Laughlin, 1980), compared to the choices of a majority voting group of three. Hierarchy is conceptualized as a formal authority (power) over a group of three’s decision-making in the abovementioned two tasks, i.e. the group leader decides about the group’s choices after a group discussion period, and it is implemented according to four different mechanisms: a hierarchy by vote, by age, by merit, and by a random mechanism. This research uses a sample of undergraduate students, data gathered using experimental methods and Bayesian hypothesis testing, as well as frequentist regression analysis, to analyze the effects of hierarchy on outcomes.

Results suggest that voting groups and hierarchy groups do not differ in the number of safe choices in the Holt and Laury (2002) lottery task, and therefore in risk attitude, nor in the probability of making an inconsistent choice in the lottery. There are, however, differences in the probability of providing a correct answer to the intellective tasks: both Bayesian ANOVA and frequentist regression analysis show that groups with a leader assigned based on merit are more likely to provide a correct answer. Comparing individuals’ choices as group leaders to their choices in an individual task (and therefore controlling for individual differences) using Bayesian hypothesis testing suggests that leaders’ choices do not change compared to their choices as individuals.

The remainder of this study is organized as follows: the section “Literature review” briefly discusses related literature in experimental economics, management, and psychology. Section “Experimental methods” describes the experimental procedures and section “Results” presents the results, while section “Discussion and conclusion” discusses and concludes.

Literature review

Research in management, sociology, and psychology has a long tradition of investigating group decision-making and the role of hierarchies using different research methods and analyzing different tasks (see, for example, Granovetter, 2005 for an overview of results in sociology), while experimental economic research has only in recent years increasingly focused on group decision-making. Research results in management typically show mixed effects of hierarchies on outcomes, with some studies suggesting that hierarchies might improve performance because they reduce conflict and promote coordination and other studies suggesting they might reduce performance because hierarchies decrease group members’ motivation and stifle innovation (see Bunderson et al., 2016 and the literature cited there). On a related topic, the effect of different decision rules, such as unanimity or majority voting, has been studied in the political economy literature (Feddersen and Pesendorfer, 1998; Messner and Polborn, 2004) as well as in psychology (Kerr and Tindale, 2004; Tindale and Winget, 2019). However, unlike this study, these related strands of literature do not analyze tasks such as lottery choice and the possible effect of hierarchies on decision-making outcomes using an (economic) experimental approach.

Previous experimental economic research on group decision-making has analyzed group behavior both in games (i.e. decision-making tasks against another player that can be analyzed using game theory as “the study of mathematical models of conflict and cooperation among intelligent rational decision-makers” (Myerson, 1991)) and in non-strategic situations (i.e. games against nature) that can be analyzed using the expected utility model as a framework for the analysis of choice situations where an individual chooses “between risky or uncertain prospects by comparing their expected utility values, i.e., the weighted sums obtained by adding the utility values of outcomes multiplied by their respective probabilities” (Mongin, 1997). As the goal of this study is an analysis of non-strategic situations, only the literature that analyzes tasks similar to the ones in this study will be reviewed. For a more complete overview, see Kugler et al. (2012).

The finding that “biases” (i.e., choices that are not in line with the predictions of the expected utility model) exist in individual decision-making in experiments led researchers to investigate whether these biases persist in groups. There is no clear-cut answer to this research question: in some settings, groups show stronger biases than individuals, while in other settings, the biases become weaker (Kerr et al., 1996). Similarly, research in psychology also suggests that groups’ decisions are not necessarily superior to those of individuals (Tindale and Winget, 2019), although Charness and Sutter (2012) analyze results from game theoretical experiments where groups are the players and conclude that groups are more likely to make choices that are predicted by game theory. For the closely related topic of risky choice on behalf of others, and whether there is a “risky shift” (i.e. a tendency to make riskier choices when choosing on behalf of others) or a “cautious shift” (i.e. a tendency to make less risky choices when choosing on behalf of others), Polman and Wu (2020) provide a literature review and meta-analysis of previous studies where they calculated effect sizes and found that authors reported a medium-to-large size (Cohen’s d) statistically significant (α = 0.05) risky shift in 22.7% of the effects reported from previous studies, with 13.3% of the effects suggesting a cautious shift, and the remaining effects suggesting there is no statistically significant difference between decisions made for oneself and for others. They also point out the importance of moderator variables, such as the frame of decision analyzed (choice over gains or losses) or the identity of the choice recipient (stranger vs. friend, or more vulnerable individual). 
However, a key difference between the experimental design of the studies that they analyzed and this research is the fact that group leaders have to bear the consequences of their choices since their payoff is also determined by those choices, and therefore the choice situation in this research can be seen as a simultaneous choice for themselves and others.

These differences in outcomes in previous research might be because mechanisms used for reaching an agreement about a group’s choice differ widely. Except for Baillon et al. (2016) who compare the majority rule and unanimity rule for decision-making, none of the previous research in economics pays attention to the possible effects of different types of hierarchy or decision-making rules on decision-making outcomes. Baillon et al. (2016) analyze group behavior in lottery choices and found that whereas groups of three are less likely to violate stochastic dominance, they make riskier choices than individuals in Allais paradox tasks (Allais, 1953). Concerning decision-making rules, the unanimity rule is found to improve both group communication and group rationality. Baker et al. (2008) also compare individual and three-person group behavior in lottery tasks with the unanimity rule and found that the number of safe lotteries chosen by groups is higher than the number of safe lotteries chosen by individual group members. Masclet et al. (2009) compare individual and three-person group behavior with the majority rule and found that groups chose safer lotteries. Rockenbach et al. (2007) found that groups of three accumulate more expected value at a lower risk than individuals, although both violate the expected utility theory. However, Shupp and Williams (2008) found that whereas groups of three are less risk averse than individuals over lottery choices with high winning probabilities, they become more risk averse as the winning probability decreases.

Bone et al. (1999) and Bateman and Munro (2005) both analyze if individuals and groups differ concerning violations of expected utility theory. Bateman and Munro (2005) compare couples to individuals and found no differences in behavior. Bone et al. (1999) compare randomly matched groups of two individuals and found no differences either. Charness et al. (2007) analyze if groups and individuals differ in violation of monotonicity concerning first-order stochastic dominance. This violation can actually be a decision-making error and not a mere expression of preferences; that is, an individual should always prefer a better chance of winning more money. For groups of two and three and no specified decision rule, they found that groups are less likely to make errors when facing these lottery choices. Deck et al. (2012) analyze differences between individuals and groups of two and found that gender and age influence bargaining strength, and that making a pair decision first increases risk-taking in subsequent individual choices. Ertac and Gurdal (2012) analyze risk-taking behavior on behalf of a group of six and found that men are more willing to become leaders than women and that both take fewer risks when deciding on behalf of a group. Maciejovsky et al. (2013) found that groups of two make choices closer to the rational prediction and learn the solution faster than individuals in challenging probability and reasoning tasks, namely, the Monty Hall problem and the Wason selection task.

To summarize the experimental literature on group decision-making, there is no consensus about whether groups make better decisions than individuals, or whether they are more rational than individuals. Except for Baillon et al. (2016), no previous research focuses on the possible role of different decision-making rules, and none focuses on hierarchies, suggesting an investigation of choice over lotteries and performance in intellective tasks under different hierarchies presents a novel research question in the field.

The literature on organizational behavior and psychology has mostly focused on the impact hierarchies might have on performance. Following Laughlin (1980), there has been a broad distinction between performance in ‘intellective tasks’ that have a correct answer (e.g. algebra problems) and ‘judgmental tasks’ that do not. In this framework, the lottery choice task belongs to the second category, as there is no correct answer, in contrast to the intellective tasks. Levine and Smith (2013) distinguish between ‘group problem-solving tasks’ that have high demonstrability and are intellective and ‘group decision-making tasks’ that have low demonstrability and are judgmental. In social psychology, much of the second type involves tasks where groups have to choose from among a small number of discrete alternatives (such as lotteries). This study uses tasks of both types and analyzes the effect of hierarchies on their outcomes.

According to Halevy et al. (2011), hierarchies are one of the most common forms of organization and often emerge spontaneously. They identify five ways in which hierarchies can enhance performance and success: They fulfill psychological needs, such as power and status; provide an incentive system, therefore motivating both high- and low-ranked individuals; increase coordination; reduce conflict; and improve cooperation. Finally, they also bring about complementary psychological processes related to group members having or lacking power, with those having power hypothesized by Van Vugt (2006) to “see the big picture, initiate, and lead whereas those who lack power attend to details, conform, and follow” (as cited in Halevy et al., 2011). Halevy et al. (2011) also identify three moderators that determine when hierarchy is beneficial for organizations: when there is procedural interdependence (such as when the members’ input is needed for reaching a decision), when the hierarchy is legitimate (i.e. it is seen as fair by members (Tyler, 2001), and they have internalized the leader’s entitlement to lead, following orders out of their own will rather than out of anticipation of positive or negative consequences (Tyler, 2006)), and when different bases of hierarchy such as competence or power are aligned rather than misaligned (i.e. those with high levels of competence also are the ones with the right to lead). Consequently, when status (i.e. leader’s status in a hierarchy) is conferred based on individual characteristics that are irrelevant or even detrimental to goals, hierarchies might not be beneficial (Halevy et al., 2011).

Other authors have focused on why group inequalities such as hierarchy might worsen group performance. Ronay et al. (2012) focus on the role of status conflicts and their potentially detrimental effect on group performance. They found that hierarchically differentiated groups are more productive in a high, rather than low, procedural interdependence task.

There is scant literature on hierarchy and its possible impact on risk-taking behavior. One exception is the work by Mihet (2013), who does not take an experimental approach but analyzes firm-level data from 51 countries for the effect of ‘culture’ on corporate risk-taking behavior. This is measured using data from the corporate vulnerability utility (CVU) developed by the IMF using Worldscope and Datastream data. The findings suggest (among other results) that risk-taking is higher in countries with a low tolerance for hierarchical relationships.

The previous section has shown that the effect of hierarchies depends on several factors, such as the type of task that groups are supposed to perform and the criteria upon which leadership is conferred. Based on the surveyed literature, the following hypotheses are derived:

H1: Groups with a leader assigned based on irrelevant characteristics (such as age or using a random assignment mechanism) perform worse on intellective tasks than groups with a leader assigned based on merit.

H2: Groups with a hierarchy make more safe choices in a lottery choice task than those without a hierarchy.

Experimental methods

This section describes experimental methods and procedures: the experimental design and treatments used, the implementation of ‘hierarchy’ in experimental treatments, the decision-making tasks used during the experiments, and the process of conducting the experiments.

The experiment consisted of three stages and six treatments, as summarized in Fig. 1.

To control for possible group composition effects (Moreland et al., 1996; Yoon and Kim, 2021), experiments were conducted using a within-subjects design. To investigate the possible effects of hierarchy on decision-making outcomes, three stages were required: (1) individual decision-making tasks as the baseline case to compare leaders’ choices as individuals and as leaders; (2) group decision-making tasks without hierarchy; and (3) group decision-making tasks with hierarchy, where four different types of hierarchies were investigated. The effect of hierarchy on group decision-making can then be observed by comparing the results between the second and the third stage. The effect of becoming a leader on individuals’ decision-making outcomes can be investigated by comparing choices in the first and third stages.

The decision-making tasks used were chosen to enable comparison with previous research on group decision-making. The lottery described in Holt and Laury (2002) was selected because of its widespread use, and the fact that this decision-making task has been used in a considerable number of previous studies enables both the replication of previous results and a comparison of the effects of different hierarchies. In this lottery choice task, which can be used to measure risk attitude, subjects make choices between 10 paired lotteries A (with a low variability) and B (with a high variability). In the first decision, the probability of a high payoff is 10% and the probability of a low payoff is 90%, leading to an expected value of KRW 1640 for A and KRW 475 for B. The probability of the high payoff is then increased by 10 percentage points in each step, so that the expected-value advantage of A over B shrinks and eventually reverses. Assuming risk neutrality, a decision-maker should only care about expected values and therefore “switch” from Lottery A to Lottery B once the probability of the higher payoff is above 40%, where B has the higher expected value. However, if the decision-maker continues to choose A, this indicates risk aversion, and correspondingly, if she chooses B over A even when B has the lower expected value, this indicates risk love. The lottery choice sheet is provided in Fig. 2. Full experimental instructions can be found in Appendix B.
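The switching logic described above can be illustrated with a short calculation. The sketch below assumes the standard Holt and Laury (2002) payoffs scaled by 1000 (KRW 2000/1600 for Lottery A and KRW 3850/100 for Lottery B). These payoff values are an assumption that reproduces the expected values of KRW 1640 and KRW 475 stated for the first decision; the function names are illustrative only.

```python
# Assumed payoffs in KRW (Holt-Laury payoffs x 1000); not read off the decision sheet.
A_HIGH, A_LOW = 2000, 1600   # Lottery A: low variability ("safe")
B_HIGH, B_LOW = 3850, 100    # Lottery B: high variability ("risky")


def expected_values(row):
    """Return (EV of A, EV of B) for decision row 1..10.

    In row r the probability of the high payoff is r/10. Integer
    arithmetic is used before dividing to avoid rounding noise.
    """
    ev_a = (row * A_HIGH + (10 - row) * A_LOW) / 10
    ev_b = (row * B_HIGH + (10 - row) * B_LOW) / 10
    return ev_a, ev_b


def risk_neutral_safe_choices():
    """Count the rows in which a risk-neutral decision-maker prefers A."""
    count = 0
    for row in range(1, 11):
        ev_a, ev_b = expected_values(row)
        if ev_a > ev_b:
            count += 1
    return count
```

Under these assumed payoffs, the expected value of A exceeds that of B in the first four rows only, which matches the switch point described above: a risk-neutral decision-maker would make four safe choices.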

Fig. 1: Structure of the experiment.

In addition to this lottery choice task, individuals and groups also had to solve intellective tasks, similar to those described in Curşeu et al. (2013) and Huang and Wang (2010). Those tasks represented the application of framing effects, the Ellsberg paradox, and basic probability. Examples of all intellective tasks can be found in Appendix B. The intellective tasks provided were different in each round to ensure that there was no repetition.

Fig. 2: Decision sheet for lottery choice task (Holt and Laury, 2002).

KRW Korean Won.

The first experimental stage was implemented as an individual task, with all subjects making a lottery choice and solving an intellective task.

The second experimental stage (group without hierarchy) was implemented as group decision-making with a majority vote as the decision-making mechanism. With a majority vote, every group member’s decision has the same weight and there is no hierarchy.

In the third experimental stage (group with hierarchy), four different types of hierarchies were investigated. The first type of hierarchy was to have a ‘random hierarchy’, where the group leader was selected by a random mechanism (i.e., the throw of a die). The second type was to have an elected leader for every group (‘hierarchy by vote’). The third type was a ‘hierarchy by age’, where the oldest group member was the group leader. Finally, the fourth type was a ‘hierarchy by merit’, where the group member who performed best on a financial literacy test became the group leader. While hierarchies are a common feature of organizations, the existing literature on typology of hierarchies is surprisingly scant (Romme, 2021), with the only distinction typically being made between formal and informal hierarchy (Magee and Galinsky, 2008; Diefenbach and Sillince, 2011) and all types of hierarchy in this experiment being implemented as formal hierarchies.
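The four leader-assignment mechanisms can be summarized in a short sketch. The `Member` record, its field names, and the tie-breaking rule (lowest member ID) are hypothetical and introduced only for illustration; the text does not state how ties were resolved, and the random draw here stands in for the throw of a die.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    member_id: int
    age: int
    literacy_score: int   # score on the financial literacy test
    vote_for: int         # member_id this member votes for as leader


def select_leader(group, mechanism, rng=random):
    """Return the member_id of the group leader under the given mechanism.

    Ties (for example equal ages or a three-way split vote) are broken by
    the lowest member_id; this tie-break is an assumption of the sketch.
    """
    if mechanism == "random":
        return rng.choice(group).member_id
    if mechanism == "vote":
        tally = {m.member_id: 0 for m in group}
        for m in group:
            tally[m.vote_for] += 1
        key = lambda m: (-tally[m.member_id], m.member_id)
    elif mechanism == "age":
        key = lambda m: (-m.age, m.member_id)
    elif mechanism == "merit":
        key = lambda m: (-m.literacy_score, m.member_id)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return min(group, key=key).member_id
```

For a hypothetical group such as `[Member(1, 21, 5, 2), Member(2, 24, 7, 2), Member(3, 22, 9, 1)]`, member 2 would lead under the vote-based and age-based hierarchies and member 3 under the merit-based hierarchy.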

The leadership assignment mechanisms in this research were chosen to test the first research hypothesis and to represent (albeit necessarily stylized) versions of mechanisms used in organizations. To test the hypothesis that groups with a leader assigned based on irrelevant characteristics will perform worse in intellective tasks than those where the leader was assigned based on merit, a hierarchy with a randomly selected leader was implemented. Secondly, a hierarchy with an elected leader was implemented to resemble the typical assignment of political leadership in democracies. Thirdly, a hierarchy by age was implemented to consider the fact that seniority-based promotion decisions are still widespread, especially in the Republic of Korea, where the experiment was conducted (Horak and Yang, 2019), but also in other contexts (see, for example, Cirone et al. (2021) for an analysis of seniority-based nomination allocations in political parties in Norway). Fourthly, a merit-based hierarchy was implemented to resemble promotion decisions based on individual performance-based evaluation systems as a part of high-performance work practices (Cappelli, 1999). While the experimental approach chosen in this research comes at the cost of simplification and abstraction from many features of hierarchical organizations outside of the laboratory, it is uniquely suited to isolate and analyze the ceteris paribus effects of a certain type of hierarchy on groups’ decision-making, since it allows the researcher to hold fixed all other parameters that might matter for outcomes, such as group size, group composition, or group and/or power dynamics that might matter in “real-world settings”, such as firms.

In all hierarchies, the leader took the final decision about the group tasks after a 10-min discussion time with group members. The effects of hierarchy type on groups’ decision-making outcomes can be investigated by comparing choices in the second and third stages.

Group sizes of three were chosen to ensure a majority decision in the group without hierarchy and the election of a leader in the hierarchy by vote, to improve comparability with previous research that has typically used groups of three (Baillon et al., 2016; Baker et al., 2008; Masclet et al., 2009; Rockenbach et al., 2007; Shupp and Williams, 2008), to take into account the fact that small groups are “primary agents for performing many of the tasks of organizations and institutions” (Witte and Davis, 1996), and to manage the cost of the experiment.

Sessions were conducted as paper-and-pencil experiments due to the lack of an experimental lab at the university in Wonju (Republic of Korea) where the sessions took place. In line with standard practice, undergraduate students were recruited as subjects, with no restrictions based on major, age, or gender. As this was the first economic experiment conducted at the university, there were no restrictions on prior experiment participation either. The instructions were translated into Korean by one person and back into English by another person to check for inconsistencies. Non-Korean subjects were provided with the original English instructions, whereas Korean subjects were provided with translated instructions. The instructions were pre-tested and the overall time needed for the completion of all tasks and questionnaires was determined first. The experiment was conducted in three sessions, with a total of 99 participants. On average, the experimental sessions lasted 103 min and average earnings were KRW 19,200. To provide a measure of comparison for judging the appropriate size of incentives, most on-campus student jobs pay the Korean minimum wage, which increased from KRW 5580 to KRW 6030 per hour after the first experimental session, suggesting that the incentives should be sufficient.

In the pretest and the three experimental sessions, subjects were first welcomed to the experiment and provided their consent for participation. Then, the instructions were distributed and read out aloud. In sessions with Korean and non-Korean subjects, the instructions were read in both Korean and English and in sessions with Korean subjects only, they were read in Korean. In the first treatment, which consisted of individual choices and tasks, subjects made their choice over the lottery and completed three different intellective tasks individually.

After this stage, subjects were randomly assigned into groups of three and they participated in all five group treatments (i.e. the voting group and the four hierarchy group treatments). Participation order in the different types of hierarchies was randomized. After the final treatment in this last stage, subjects answered a background questionnaire.

Finally, payments were determined and subjects were paid according to their choices in all treatments (individual, group without hierarchy, and group with four different types of hierarchy). For each of the lottery choices, one row was randomly determined by the throw of a 10-sided die to be relevant for the payout. In the group treatments, payment was determined in the same way, was the same amount for all members of the group, and corresponded to the choice that the group (or the leader on behalf of the group) had made in the respective treatment. In line with standard procedures, subjects were presented with incentivized choices, i.e. paid cash for their participation in the experiment based on the choices they made. In addition, subjects received a show-up fee of KRW 5000 to ensure they received adequate payment for participation.
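The random selection of the payoff-relevant row can be sketched as follows. Two elements are assumptions and not taken from the text: the payoffs (Holt and Laury (2002) payoffs scaled by 1000) and the second die throw that resolves the chosen lottery, which follows the original Holt and Laury procedure. The function name is illustrative.

```python
import random

# Assumed payoffs in KRW as (high, low) pairs; Holt-Laury payoffs x 1000.
PAYOFFS = {"A": (2000, 1600), "B": (3850, 100)}


def lottery_payout(choices, rng=random):
    """Return the lottery payout in KRW for one treatment.

    choices: list of ten 'A'/'B' entries, one per decision row.
    """
    row = rng.randint(1, 10)        # 10-sided die selects the payoff-relevant row
    outcome = rng.randint(1, 10)    # assumed second throw resolves the lottery
    high, low = PAYOFFS[choices[row - 1]]
    # In row r the high payoff has probability r/10.
    return high if outcome <= row else low
```

In a group treatment, the same call would be made once with the group's (or the leader's) choices, and every group member would receive the resulting amount.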

Full experimental instructions, examples of the intellective tasks, and financial literacy questions can be found in Appendix B.

After deleting observations with missing values, a total of n = 96 participants in 32 groups remained. As each group participated in five rounds (as a voting group and under all four types of hierarchy investigated in this study), the group dataset consists of n = 160 observations. Since each of the 32 groups had a leader under each of the four types of hierarchy, and participants could be leaders in one or more types of hierarchies, a dataset consisting of n = 128 leader observations is also available for analysis.

Table 12 in the appendix provides basic demographics for the sample.

Results

This section presents and discusses results from both Bayesian hypothesis testing and classical (frequentist) regression analysis using experimental data. All Bayesian analyses were carried out using JASP (JASP Team, 2023), an open-source statistical software program based on R, following the suggestions of Rouder et al. (2012) for the specification and implementation of default Bayes factor tests that are both invariant to measurement units and computationally convenient, as well as facilitating the interpretation of results (van den Bergh et al., 2020). Since there is little previous knowledge about the research question analyzed here, no changes to JASP’s default prior distributions were made, following the recommendations in van Doorn et al. (2021). JASP provides Bayes factors for null and alternative model(s) and the changes in predictive power from null to alternative model that are presented in the following analyses. The Bayes factors were categorized based on the suggestions for evidence categories by Andraszewicz et al. (2014) following Jeffreys (1961), based on the size of the Bayes factor BF01 (i.e., the factor in favor of the null hypothesis relative to the alternative hypothesis). All frequentist regression analyses were carried out using STATA/SE 17.0.

To give an overview of individual choices, the choices from the first treatment (individual choice) will be briefly discussed before moving on to group choices as the main research focus of this study, first providing Bayesian hypothesis testing results to investigate possible differences in group outcomes before moving on to a classical/frequentist regression analysis of determinants of group outcomes and changes in predicted probabilities of choosing a certain outcome. Then, a comparison of individuals’ choices in the first treatment (individual choice) to their choices when they became leaders under different hierarchy types using Bayesian hypothesis testing is provided, followed by regression analysis of the determinants of leaders’ choices, and lastly, changes in predicted probabilities of outcomes for four different “ideal types” to detect possible heterogeneities in leaders’ behavior are presented.

Table 1 presents an overview of choices in all tasks for individuals. The average number of safe choices in the lottery is 4.833. About 6.25% of individuals made inconsistent choices, that is, they switched several times between the safe and the risky option, which implies that they cannot be categorized as having risk-averse, risk-neutral, or risk-loving preferences. On average, subjects solved 32.29% of the intellective tasks correctly.
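The two lottery outcomes can be derived from a decision sheet as in the sketch below. It assumes that a sheet is coded as a sequence of ten 'A' (safe) or 'B' (risky) entries and that "inconsistent" means more than one switch between the options; both the coding and the function names are illustrative assumptions.

```python
def count_safe_choices(choices):
    """Number of rows in which the safe Lottery A was chosen."""
    return sum(1 for choice in choices if choice == "A")


def is_inconsistent(choices):
    """True if the decision-maker switches between A and B more than once."""
    switches = sum(1 for prev, cur in zip(choices, choices[1:]) if prev != cur)
    return switches > 1
```

For example, the sheet `"AAAABBBBBB"` has four safe choices and a single switch, so it is consistent, whereas `"AABABBBBBB"` switches three times and would be coded as inconsistent.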

Table 1 Number of safe choices, percentage of inconsistent choices for individuals.

Table 2 compares the choices of five different types of groups: a voting group without a hierarchy, a group with a randomly determined hierarchy, a group with an elected leader, a group where the eldest group member became the leader and a group where the group member who performed best on a financial literacy test became the leader. Please note that there are n = 160 observations in this data set, as all 32 groups made decisions under the aforementioned five different group treatments (a voting group and four different types of hierarchy).

Table 2 Number of safe choices, percentage of inconsistent choices and correct number of answers to intellective task for groups.

As Table 2 shows, groups with an age-based hierarchy make the most safe choices (5.1563), followed by groups with a voted leader (5.0938), those with a randomly determined leader and the voting group (5 each), and groups with a hierarchy by merit (4.9688). Voting groups, random, and age-based hierarchies make fewer inconsistent choices in the lottery (3.125%) than those with a hierarchy by merit (6.25%) and an elected leader (12.5%). The probability of providing a correct answer in the intellective task is highest under the hierarchy by merit (53.125%), followed by a random hierarchy (28.125%), a hierarchy by age and by vote (both 21.875%), and finally lowest for the voting group (16.125%).

Results using Bayesian repeated-measures ANOVA are presented in the next table. Results from frequentist ANOVA and non-parametric frequentist hypothesis tests (Friedman’s test (Friedman, 1940) and Cochran’s Q test (Cochran, 1950, using the Stata version of the test by Dinno, 2017)) are presented in the Appendix, Table 13, since the data analyzed are ordered and binary and q–q plots showed that they violate the normality assumption. However, if one is willing to extend the argument that classical/frequentist parametric tests are reasonably robust to violations of the normality assumption (Norman, 2010; Knief and Forstmeier, 2021) to Bayesian testing, the advantages of Bayesian hypothesis testing may justify the use of a Bayesian repeated-measures ANOVA that requires normality (Table 3).

Table 3 Bayesian repeated measures ANOVA results, post-hoc analysis: (a) number of safe choices, (b) probability of inconsistent choice, and (c) probability of correct IT answer.

For the number of safe choices in the Holt/Laury-type lottery, the Bayes factor BF01 in favor of the null hypothesis is 47.719, representing “very strong” evidence in favor of the null hypothesis relative to the alternative hypothesis following the framework suggested by Andraszewicz et al. (2014), suggesting there seem to be no differences in risk attitude between groups with different types of hierarchies. For the probability of making an inconsistent choice in the lottery, the Bayes factor BF01 in favor of the null hypothesis is 4.860, representing “moderate” evidence in favor of the null hypothesis relative to the alternative hypothesis, suggesting there seem to be no differences between groups with different types of hierarchies. Lastly, for the probability of providing a correct answer in the intellective task, the Bayes factor BF01 in favor of the null hypothesis relative to the alternative hypothesis is 0.105 (i.e., BF10 ≈ 9.5), representing “moderate” evidence, bordering on “strong”, in favor of the alternative hypothesis relative to the null hypothesis and suggesting that there are indeed differences between groups with different hierarchies. A post-hoc analysis reveals that when comparing all five types of groups (no hierarchy, random hierarchy, hierarchy by vote, by age, and by merit), the Bayes factor BF01 in favor of the null hypothesis is 0.019 for the comparison between a voting group and a group with a merit-based hierarchy, indicating “very strong” evidence in favor of the alternative hypothesis that groups with a merit-based hierarchy indeed are more likely to provide a correct answer. Bayes factors BF01 for the comparisons between other types of hierarchies and the voting group range between 2.084 and 4.415, suggesting “anecdotal” to “moderate” evidence in favor of the null hypothesis relative to the alternative hypothesis.
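The mapping from a Bayes factor to an evidence category can be sketched as follows. The cut-offs used (3, 10, 30, and 100) are the commonly cited Jeffreys-type thresholds and are assumed here to be the ones applied in this study; the function name is illustrative.

```python
def classify_bf01(bf01):
    """Return (favoured hypothesis, evidence label) for a Bayes factor BF01.

    Values above 1 favour the null hypothesis; values below 1 favour the
    alternative, with strength measured by BF10 = 1 / BF01.
    """
    if bf01 <= 0:
        raise ValueError("a Bayes factor must be positive")
    favoured = "null" if bf01 >= 1 else "alternative"
    strength = bf01 if bf01 >= 1 else 1 / bf01
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength < 100:
        label = "very strong"
    else:
        label = "extreme"
    return favoured, label
```

Applied to the values reported above, BF01 = 47.719 is classified as very strong evidence for the null hypothesis, BF01 = 4.860 as moderate evidence for the null hypothesis, and BF01 = 0.019 (BF10 ≈ 52.6) as very strong evidence for the alternative hypothesis.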

Table 4 presents regression results for the same outcomes (number of safe choices in the lottery, probability of making an inconsistent choice, probability of a correct answer in the intellective task). Column 1 presents the determinants of the number of safe choices in the lottery task, using an ordered probit model. Column 2 presents the determinants of the probability of making an inconsistent choice in the lottery task, using a probit model. Column 3 presents the determinants of the probability of providing the correct answer in the intellective task, using a probit model. The baseline case is the voting group, and four dummy variables denote the aforementioned four different types of hierarchy.

Table 4 Regression results—number of safe choices, probability of inconsistent choices, and correct answers to intellective task for groups.

To save space, only the regressors of interest are presented here. Full regression results including the constants for probit and cutpoints for ordered probit models are provided in Table 12 in Appendix A. ***, **, and * denote significance levels of 1%, 5%, and 10%, respectively.

There seem to be no effects of different types of hierarchies on the number of safe choices and the probability of making an inconsistent choice in Holt and Laury (2002) type lottery choice tasks. However, compared to voting groups, groups with a leader assigned based on merit are significantly more likely to provide a correct answer in the intellective tasks.

To gauge the size of this effect, Table 5 presents changes in the predicted probabilities of making an inconsistent choice in the lottery task and of giving a correct answer in the intellective task. These were computed using the SPost13 package (Long and Freese, 2014). P-values are provided in parentheses. Again, to save space, changes in predicted probabilities from the ordered probit models are presented only in Appendix A, Table 15.

Table 5 Changes in predicted probabilities, probability of inconsistent choices and correct answers to intellective task for groups.

Compared to a voting group, having a leader assigned based on merit increases the probability of giving a correct answer in the intellective task by 38.9 percentage points. This effect is statistically significant at the 1% level.
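
As a rough guide to how such a change can be read, the discrete change associated with the merit dummy in a probit model may be written as below. This is a sketch in assumed notation: $\hat{\alpha}$ denotes the constant, $\hat{\beta}_{\text{merit}}$ the coefficient on the merit dummy, and $x'\hat{\gamma}$ any further regressors. None of these symbols appear in the original text, and whether the reported change is averaged over observations or evaluated at fixed values is not stated here.

```latex
\Delta \hat{P}_{\text{merit}}
  = \Phi\!\left(\hat{\alpha} + \hat{\beta}_{\text{merit}} + x'\hat{\gamma}\right)
  - \Phi\!\left(\hat{\alpha} + x'\hat{\gamma}\right)
```

Here $\Phi$ is the standard normal cumulative distribution function, and the voting group corresponds to all four hierarchy dummies being equal to zero.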

The groups’ choices might simply be a repetition of the leaders’ choices in the first individual choice task, both in the lottery and in the intellective tasks. To investigate this possibility, Table 6 compares the choices of leaders in the individual tasks to those they made as leaders of groups, using Bayesian paired-samples t-tests. Again, results from frequentist non-parametric tests (Wilcoxon signed-rank test (Wilcoxon, 1945) and McNemar’s χ2-test (McNemar, 1947)) and frequentist t-tests are provided in the Appendix, Table 14. This data set consists of the n = 128 individuals who were leaders in one or more of the different hierarchy treatments. Since this is a within-subject comparison of choices, it also controls for differences in cognitive and non-cognitive ability and for other individual differences that might matter for the quality of answers.
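
For readers unfamiliar with the frequentist robustness check on the binary outcome, the following is a minimal sketch of McNemar's χ² test on paired yes/no data. The function name `mcnemar_chi2`, the absence of a continuity correction, and the counts in the usage example are all assumptions made for illustration; they are not taken from the experiment or from the appendix.

```python
import math


def mcnemar_chi2(b, c):
    """McNemar's chi-square test (no continuity correction, an assumption).

    b: pairs that are 'yes' in the individual task and 'no' as leader
    c: pairs that are 'no' in the individual task and 'yes' as leader
    Returns (test statistic, two-sided p-value).
    """
    if b < 0 or c < 0:
        raise ValueError("Counts must be non-negative.")
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, hence no evidence of change
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical discordant counts, NOT data from the experiment.
stat, p = mcnemar_chi2(b=9, c=1)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

Only the discordant pairs enter the statistic, which is why a within-subject design of this kind nets out stable individual differences.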

In all treatments, the number of safe choices made by leaders increased, compared to the choices they made as individuals. The percentage of those who made inconsistent choices decreased or was identical. However, an analysis of Bayes factors BF01 in favor of the null hypothesis relative to the alternative hypothesis shows that the factors are between 1.354 and 5.295, suggesting “anecdotal” to “moderate” evidence in favor of the null hypothesis relative to the alternative hypothesis (Table 6).

Table 6 Leaders’ choices in individual choice and as leaders, results from Bayesian paired-samples t-tests.

As there might be other determinants of leaders’ behavior, Table 7 presents results from regression analyses of the determinants of leaders’ choices, which also allows controlling for leaders’ characteristics. This data set consists of the n = 128 individuals who were leaders in one or more hierarchy treatments. While this analysis allows controlling for leaders’ characteristics, it only allows comparing choices across the four hierarchy treatments and not with choices in the voting group treatment, which, by definition, had no leader.

Table 7 Determinants of leaders’ choices: estimated coefficients.

Table 7 presents results for the determinants of the following three outcomes: the number of safe choices (column 1), using an ordered probit model; the probability of making an inconsistent choice (column 2), using a probit model; and the probability of making a correct decision in the intellective task (column 3), using a probit model. In all three regressions, the following variables are included as regressors: a leader’s choices in the individual tasks (namely, their number of safe choices, whether they made an inconsistent choice in the individual lottery choice task, and their number of correct decisions in the individual intellective tasks), their gender (1 if female, 0 if male), the number of group members they had known before the experiment (if any), their citizenship (1 if Korean, 0 if other), and the type of hierarchy under which they were the leader. The merit-based hierarchy is the baseline case, and three dummy variables take the value of 1 if the person was a leader in an age-based, vote-based, or random mechanism-based hierarchy, respectively. To save space, only the regressors of interest are presented here. Full regression results including the constants and cutpoints can be found in Appendix A, Table 14. ***, **, and * denote significance levels of 1%, 5%, and 10%, respectively.

As these estimated coefficients are not very meaningful in non-linear models, I will only briefly discuss them before moving on to a more thorough discussion of marginal effects and changes in predicted probabilities for “ideal types”.

For the determinants of the number of safe choices in the lottery task (column 1), a measure of risk aversion, no effect was found for any type of hierarchy, compared to the baseline case of hierarchy by merit. Two individual-level regressors are statistically significant: leaders who made more safe choices in the individual choice task (i.e. leaders who are more risk averse) tend to make more safe choices as the leader as well. Female leaders, interestingly, tend to make fewer safe choices than males.

For the determinants of the probability of making an inconsistent choice in the lottery task (column 2), again, no effect was found for any type of hierarchy, compared to the baseline case of hierarchy by merit. Two individual-level regressors are statistically significant: the more group members a leader knew before the experiment, the higher their probability of making an inconsistent choice. In addition, Koreans are less likely to make an inconsistent choice, compared to non-Koreans.

For the determinants of the probability of making a correct choice in the intellective task (column 3), a clear result emerges: compared to leaders who were appointed in a hierarchy by merit, all others are significantly less likely to make a correct choice, even after controlling for the leader’s number of correct choices in the intellective tasks of the individual choice task and other individual characteristics. In addition, Koreans are significantly less likely to make a correct choice in the intellective task, compared to non-Koreans.

Table 8 presents average marginal effects (AMEs) for the same three regressions. The AMEs were calculated by computing the change for each observation at its observed values and then averaging these changes, using the SPost13 package in Stata 16.1 (Long and Freese, 2014). P-values are reported in parentheses. The AMEs refer to the change from hierarchy by merit to the respective other type of hierarchy (hierarchy by age, by vote, or a random hierarchy). For the probability of making an inconsistent choice and of giving a correct answer, the marginal effects are based on the probit models.
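
The averaging step described above can be sketched in a few lines. The coefficients in `COEF` and the observations in `LEADERS` are hypothetical placeholders, not the estimates or data behind Table 8, and the function names are illustrative. The sketch only shows the mechanics of an average discrete change in a probit model, not the SPost13 implementation.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, the probit link

# Hypothetical probit coefficients (NOT the estimates in Table 7).
COEF = {"const": 0.9, "age": -0.7, "vote": -0.6, "random": -0.5,
        "female": 0.1, "korean": -0.8, "correct_indiv": 0.2}

# Hypothetical leaders: observed covariates other than the hierarchy dummies.
LEADERS = [
    {"female": 1, "korean": 1, "correct_indiv": 2},
    {"female": 0, "korean": 1, "correct_indiv": 3},
    {"female": 1, "korean": 0, "correct_indiv": 1},
    {"female": 0, "korean": 0, "correct_indiv": 2},
]


def linear_index(row, hierarchy):
    """Probit index x'b for one leader; 'merit' is the baseline category."""
    xb = COEF["const"]
    xb += sum(COEF[k] * row[k] for k in ("female", "korean", "correct_indiv"))
    if hierarchy != "merit":
        xb += COEF[hierarchy]
    return xb


def average_marginal_effect(rows, hierarchy):
    """Mean over observations of P(correct | hierarchy) - P(correct | merit)."""
    changes = [PHI(linear_index(r, hierarchy)) - PHI(linear_index(r, "merit"))
               for r in rows]
    return sum(changes) / len(changes)


for h in ("age", "vote", "random"):
    print(h, round(average_marginal_effect(LEADERS, h), 3))
```

Each observation keeps its own covariate values and only the hierarchy dummy is switched, so the result is an average change in probability rather than a change at one particular profile.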

Table 8 Type of hierarchy and leaders’ choices—OLS estimates, marginal effects.

As the interpretation of marginal effects in an ordered probit model with a dependent variable that can take values between 0 and 10 is rather space-consuming and probably not very informative for the research question, estimated coefficients from an OLS regression for the number of safe choices are reported here. Marginal effects from an ordered probit model are presented in Appendix A, Table 17.

The results for the determinants of the number of safe choices in the Holt and Laury (2002) lottery task (column 1) will be discussed first. None of the different types of hierarchy has a statistically significant effect on the number of safe choices. However, the leader’s number of safe choices in the individual choice task has a statistically significant effect, suggesting that individual risk attitude matters for a leader’s choices as well. For every additional safe choice that a group’s leader made in the individual choice task, the number of safe choices they make as leader increases by 0.6757. This estimated coefficient is statistically significant at the 1% level. In addition, female leaders make 0.5442 fewer safe choices than male leaders, and this estimated coefficient is statistically significant at the 5% level.
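
To illustrate how these OLS coefficients combine, consider two purely hypothetical comparisons: a leader who made two more safe choices in the individual task than another leader of the same gender, and a female leader who made two more safe individual choices than a male leader. Holding all other regressors fixed, the implied differences in the number of safe choices as leader are:

```latex
\begin{align*}
\Delta \widehat{\text{safe}} &= 0.6757 \times 2 = 1.3514
  && \text{(two more safe individual choices, same gender)} \\
\Delta \widehat{\text{safe}} &= 0.6757 \times 2 - 0.5442 = 0.8072
  && \text{(two more safe individual choices, female versus male leader)}
\end{align*}
```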

Regarding the determinants of the probability of making an inconsistent choice in the lottery task (column 2), there is an effect of having a leader appointed based on age rather than merit: it decreases the probability of making an inconsistent choice by 5.3 percentage points. In addition, Korean leaders have a lower probability of making an inconsistent choice than non-Korean leaders: their probability of making an inconsistent lottery choice as a leader is 6.7 percentage points lower, and this effect is statistically significant at the 1% level.

Finally, for the probability of making a correct choice in the intellective task (column 3), the following results can be stated: compared to leaders who were appointed in a hierarchy by merit, all others are significantly less likely to make a correct choice, even after controlling for the leader’s number of correct choices in the intellective tasks in the individual choice task. More specifically, compared to leaders appointed by merit, leaders appointed by a random hierarchy are 16.4 percentage points less likely to make a correct choice, and this effect is statistically significant at the 5% level. Leaders appointed by vote are 18.6 percentage points less likely to make a correct choice, and this effect is statistically significant at the 1% level. Lastly, leaders appointed by age are 21.9 percentage points less likely to make a correct choice in the intellective task, and this effect is statistically significant at the 1% level. In addition, Koreans are significantly less likely to make a correct choice in the intellective task, compared to non-Koreans: the probability that they make a correct choice is 26.3 percentage points lower, and this effect is statistically significant at the 1% level.

Lastly, to provide a more complete analysis of the possibly heterogeneous effects of the type of hierarchy on group decision-making outcomes, I also present changes in the predicted probabilities of making an inconsistent choice and of giving a correct answer in the intellective tasks for four different “ideal types” of leaders: a non-Korean woman, a non-Korean man, a Korean woman, and a Korean man. The changes in predicted probabilities were calculated for a change from the merit-based hierarchy to a vote-based, age-based, and random hierarchy. The other regressors were held constant at the sample means, except for the number of safe choices in individual choice and the number of group members known before the experiment, which were held constant at the sample modes (Tables 9–12).
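
The “ideal type” calculation differs from an average marginal effect in that the change is evaluated at one fixed covariate profile. A minimal sketch under stated assumptions follows: the coefficients in `COEF` and the small `SAMPLE` are hypothetical and unrelated to the estimates behind Tables 9 to 12, and the set of regressors is simplified for illustration.

```python
from statistics import NormalDist, mean, mode

PHI = NormalDist().cdf  # standard normal CDF, the probit link

# Hypothetical probit coefficients for the correct-answer equation.
COEF = {"const": 0.9, "age": -0.7, "vote": -0.6, "random": -0.5,
        "female": 0.1, "korean": -0.8, "correct_indiv": 0.2, "known": -0.1}

# Hypothetical sample used only to obtain the values held constant.
SAMPLE = [
    {"correct_indiv": 2, "known": 0},
    {"correct_indiv": 3, "known": 0},
    {"correct_indiv": 1, "known": 1},
    {"correct_indiv": 2, "known": 2},
]

# Hold one regressor at its sample mean and one at its sample mode.
HELD = {
    "correct_indiv": mean([r["correct_indiv"] for r in SAMPLE]),
    "known": mode([r["known"] for r in SAMPLE]),
}

IDEAL_TYPES = {
    "Korean woman": (1, 1), "non-Korean woman": (1, 0),
    "Korean man": (0, 1), "non-Korean man": (0, 0),
}  # (female, korean)


def change_from_merit(female, korean, hierarchy):
    """P(correct | hierarchy) - P(correct | merit) at a fixed profile."""
    base = (COEF["const"] + COEF["female"] * female + COEF["korean"] * korean
            + COEF["correct_indiv"] * HELD["correct_indiv"]
            + COEF["known"] * HELD["known"])
    return PHI(base + COEF[hierarchy]) - PHI(base)


CHANGES = {(name, h): change_from_merit(f, k, h)
           for name, (f, k) in IDEAL_TYPES.items()
           for h in ("vote", "age", "random")}

for key, value in CHANGES.items():
    print(key, round(value, 3))
```

Because the probit link is non-linear, the same coefficient translates into different probability changes for different profiles, which is why the changes are reported separately for each ideal type.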

Table 9 Changes in predicted probabilities, Korean woman.
Table 10 Changes in predicted probabilities, non-Korean woman.
Table 11 Changes in predicted probabilities, Korean man.
Table 12 Changes in predicted probabilities, non-Korean man.

To save space, again, the results from ordered probit models for the number of safe choices are only presented in Appendix A (Table 18).

For the predicted probabilities of making an inconsistent choice in the lottery choice task, there are no statistically significant changes. For the predicted probabilities of giving a correct answer in the intellective tasks, however, the predicted probabilities decrease for all types of hierarchy and all four “ideal types”, compared to a hierarchy by merit. Compared to a hierarchy by merit, the predicted probabilities decrease by between 22.7 and 30.8 percentage points for a Korean woman, between 17.7 and 29.7 percentage points for a non-Korean woman, between 24.2 and 33.5 percentage points for a Korean man, and between 15.2 and 26.3 percentage points for a non-Korean man. The largest decrease in predicted probabilities is found for having a leader by age, compared to a leader by merit, and this effect is always found to be statistically significant.

Discussion and conclusion

This study presented experimental evidence on the choices of groups of three characterized by different types of hierarchies. The choice and decision tasks considered were a Holt and Laury-type lottery choice task (Holt and Laury, 2002) and intellective tasks, that is, applications of basic probability, framing, and Ellsberg paradox tasks. Bayesian ANOVA comparing the choices of voting groups and groups under four different types of hierarchies suggests that there are no differences in the number of safe choices or in the probability of making an inconsistent choice in the lottery task, but that there are differences in the probability of providing a correct answer in the intellective tasks, with a post-hoc analysis suggesting that groups with a leader appointed based on merit perform better. Bayesian hypothesis testing comparing leaders’ choices in the individual tasks with their choices as leaders found no evidence that, under any type of hierarchy, acting as a leader changes the number of safe choices, the probability of making an inconsistent choice in the lottery choice task, or the probability of providing a correct answer to intellective tasks.

Regression analysis of the determinants of groups’ choices further showed that those groups with a leader appointed by merit performed better on the intellective task choices. An additional regression analysis of the determinants of leaders’ choices suggests that compared to leaders appointed by merit, all others have lower probabilities of providing a correct answer to intellective tasks. Comparing these results to previous findings is difficult, as there is scant research on group behavior in experiments and none, to the best of my knowledge, that analyzes the role of hierarchies in group decision-making tasks such as the ones used in this study.

Although it is not the main focus of this research, the findings confirm previous results comparing the choices of groups and individuals. Maciejovsky et al. (2013) found that groups make choices closer to the rational prediction and learn the solution faster than individuals in challenging probability and reasoning tasks, although the tasks used here differ from theirs. The findings also confirm those of Baker et al. (2008), who found that the number of safe lotteries chosen by groups under unanimity voting is higher than the number of safe lotteries chosen by individual group members, of Masclet et al. (2009), who found that with the majority rule, groups choose safer lotteries, and of Rockenbach et al. (2007), who found that groups accumulate more expected value at a lower risk than individuals.

Results on the determinants of the quality of group decision-making in psychology have typically also focused on group members’ individual characteristics, such as their expertise, or their openness to other opinions (Tindale and Winget, 2019). The finding that groups with wiser members usually make better choices is confirmed by the finding that a merit-based leader makes better choices in intellective tasks. The results also confirm the findings by Drazen and Ozbay (2019) who found that the appointment mechanism matters for leaders’ choices, although they analyzed a dictator game and not the types of tasks analyzed in this research. Concerning differences between female and male leaders, results from this research suggest that female leaders make fewer safe choices than male leaders, which is in contrast to previous findings by Ertac and Gurdal (2012). While the earlier experimental economics literature on gender differences had concluded that women are more risk averse than men in individual decision-making (Croson and Gneezy, 2009), this finding has recently been challenged and re-evaluated in meta-analyses using statistical methods typically not used by economists, such as effect sizes, resulting in decidedly less drastic differences between the sexes concerning risk attitudes (Nelson, 2015), or finding that the differences are task-specific (Filippin and Crosetto, 2016). Possible explanations for the finding that female leaders made fewer safe choices might be cultural differences, or the fact that the experiment made the leader’s power salient and this has a differing effect on females’ and males’ risk preferences. 
Cultural differences (simplistically defined as being a matrilineal or a patriarchal culture) have been found to matter in previous economics research findings, such as for competitiveness (Gneezy et al., 2009), bargaining (Andersen et al., 2018), and propensity to lead (Banerjee et al., 2015), but not for risk preferences in individual choices (Gong and Yang, 2012). However, they seem unlikely as an explanation in South Korea, a patrilineal society with the highest gender wage gap (31.1%) among all OECD countries as of 2021 (OECD, 2022). If the salience of power has indeed differing effects on females and males, a priming experiment might be an interesting avenue for further research on the topic.

Concerning the research hypotheses posited in this study, H1 (Groups where the leader has been assigned based on irrelevant characteristics, such as age or using a random assignment mechanism, perform worse on intellective tasks than those where the leader was assigned based on merit) was confirmed, but H2 (Groups with a hierarchy make more safe choices in a lottery choice task than those without hierarchy) was rejected. From a management perspective, these experimental results suggest that the still common practice of assigning leadership based on seniority might lead to decisions of suboptimal quality, while assigning leadership based on merit could improve the quality of decisions, with no effect on groups’ risk-taking behavior. However, it should be kept in mind that the experimental approach used to derive these implications has its limits concerning the external validity of findings and that the results were derived using a sample of undergraduate students and groups of three, while groups or teams in organizations might have different sizes and additional dynamics that were not present in this experimental study but could matter for decision-making outcomes. Additionally, the relatively narrow age range of students in the sample (between 19 and 30 years old, with a mean of 21.09 and a median of 21) is a further caveat for the external validity of the experimental results presented in this research.

In future research, it would be worthwhile to analyze the effect of hierarchies on other decision-making tasks, as well as the effect of other types of hierarchies on different types of outcomes, and to provide a more detailed analysis of the gender differences uncovered here. Also, experiments with non-student subject pools (such as managers, and individuals with a wider range of age groups than college students) might provide an interesting avenue for further research.