With the rapid development of artificial intelligence have come concerns about how machines will make moral decisions, and the major challenge of quantifying societal expectations about the ethical principles that should guide machine behaviour. To address this challenge, we deployed the Moral Machine, an online experimental platform designed to explore the moral dilemmas faced by autonomous vehicles. This platform gathered 40 million decisions in ten languages from millions of people in 233 countries and territories. Here we describe the results of this experiment. First, we summarize global moral preferences. Second, we document individual variations in preferences, based on respondents’ demographics. Third, we report cross-cultural ethical variation, and uncover three major clusters of countries. Fourth, we show that these differences correlate with modern institutions and deep cultural traits. We discuss how these preferences can contribute to developing global, socially acceptable principles for machine ethics. All data used in this article are publicly available.
Source data and code that can be used to reproduce Figs. 2–4, Extended Data Figs. 1–7, Extended Data Tables 1, 2, Supplementary Figs. 3–21, and Supplementary Table 2 are all available at the following link: https://goo.gl/JXRrBP. The provided data, both at the individual level (anonymized IDs) and the country level, can be used beyond replication to answer follow-up research questions.
Greene, J. Moral Tribes: Emotion, Reason and the Gap Between Us and Them (Atlantic Books, London, 2013).
Tomasello, M. A Natural History of Human Thinking (Harvard Univ. Press, Cambridge, 2014).
Cushman, F. & Young, L. The psychology of dilemmas and the philosophy of morality. Ethical Theory Moral Pract. 12, 9–24 (2009).
Asimov, I. I, Robot (Doubleday, New York, 1950).
Bryson, J. & Winfield, A. Standardizing ethical design for artificial intelligence and autonomous systems. Computer 50, 116–119 (2017).
Wiener, N. Some moral and technical consequences of automation. Science 131, 1355–1358 (1960).
Wallach, W. & Allen, C. Moral Machines: Teaching Robots Right from Wrong (Oxford Univ. Press, Oxford, 2008).
Dignum, V. Responsible autonomy. In Proc. 26th International Joint Conference on Artificial Intelligence 4698–4704 (IJCAI, 2017).
Dadich, S. Barack Obama, neural nets, self-driving cars, and the future of the world. Wired https://www.wired.com/2016/10/president-obama-mit-joi-ito-interview/ (2016).
Shariff, A., Bonnefon, J.-F. & Rahwan, I. Psychological roadblocks to the adoption of self-driving vehicles. Nat. Hum. Behav. 1, 694–696 (2017).
Conitzer, V., Brill, M. & Freeman, R. Crowdsourcing societal tradeoffs. In Proc. 2015 International Conference on Autonomous Agents and Multiagent Systems 1213–1217 (IFAAMAS, 2015).
Bonnefon, J.-F., Shariff, A. & Rahwan, I. The social dilemma of autonomous vehicles. Science 352, 1573–1576 (2016).
Hauser, M., Cushman, F., Young, L., Jin, K.-X. R. & Mikhail, J. A dissociation between moral judgments and justifications. Mind Lang. 22, 1–21 (2007).
Carlsson, F., Daruvala, D. & Jaldell, H. Preferences for lives, injuries, and age: a stated preference survey. Accid. Anal. Prev. 42, 1814–1821 (2010).
Johansson-Stenman, O. & Martinsson, P. Are some lives more valuable? An ethical preferences approach. J. Health Econ. 27, 739–752 (2008).
Johansson-Stenman, O., Mahmud, M. & Martinsson, P. Saving lives versus life-years in rural Bangladesh: an ethical preferences approach. Health Econ. 20, 723–736 (2011).
Graham, J., Meindl, P., Beall, E., Johnson, K. M. & Zhang, L. Cultural differences in moral judgment and behavior, across and within societies. Curr. Opin. Psychol. 8, 125–130 (2016).
Hainmueller, J., Hopkins, D. J. & Yamamoto, T. Causal inference in conjoint analysis: understanding multidimensional choices via stated preference experiments. Polit. Anal. 22, 1–30 (2014).
Luetge, C. The German Ethics Code for automated and connected driving. Philos. Technol. 30, 547–558 (2017).
Müllner, D. Modern hierarchical, agglomerative clustering algorithms. Preprint at https://arxiv.org/abs/1109.2378 (2011).
Inglehart, R. & Welzel, C. Modernization, Cultural Change, and Democracy: The Human Development Sequence (Cambridge Univ. Press, Cambridge, 2005).
Muthukrishna, M. Beyond WEIRD psychology: measuring and mapping scales of cultural and psychological distance. Preprint at https://ssrn.com/abstract=3259613 (2018).
Hofstede, G. Culture’s Consequences: Comparing Values, Behaviors, Institutions and Organizations Across Nations (Sage, Thousand Oaks, 2003).
International Monetary Fund. World Economic Outlook Database https://www.imf.org/external/pubs/ft/weo/2017/01/weodata/index.aspx (2017).
Kaufmann, D., Kraay, A. & Mastruzzi, M. The worldwide governance indicators: methodology and analytical issues. Hague J. Rule Law 3, 220–246 (2011).
Gächter, S. & Schulz, J. F. Intrinsic honesty and the prevalence of rule violations across societies. Nature 531, 496–499 (2016).
O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Penguin, London, 2016).
Henrich, J. et al. In search of Homo Economicus: behavioral experiments in 15 small-scale societies. Am. Econ. Rev. 91, 73–78 (2001).
Future of Life Institute. Asilomar AI Principles https://futureoflife.org/ai-principles/ (2017).
Haidt, J. The Righteous Mind: Why Good People Are Divided by Politics and Religion (Knopf Doubleday, New York, 2012).
Gastil, J., Braman, D., Kahan, D. & Slovic, P. The cultural orientation of mass political opinion. PS Polit. Sci. Polit. 44, 711–714 (2011).
Nishi, A., Christakis, N. A. & Rand, D. G. Cooperation, decision time, and culture: online experiments with American and Indian participants. PLoS ONE 12, e0171252 (2017).
I.R., E.A., S.D., and R.K. acknowledge support from the Ethics and Governance of Artificial Intelligence Fund. J.-F.B. acknowledges support from the ANR-Labex Institute for Advanced Study in Toulouse.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Calculated values correspond to values in Fig. 2a (that is, AMCE calculated using conjoint analysis). For example, ‘Sparing Pedestrians [Relation to AV]’ refers to the difference between the probability of sparing pedestrians, and the probability of sparing passengers (attribute name: Relation to AV), aggregated over all other attributes. Error bars represent 95% confidence intervals of the means. AV, autonomous vehicle. a, Validation of assumption 1 (stability and no-carryover effect): potential outcomes remain stable regardless of scenario order. b, Validation of assumption 2 (no profile-order effects): potential outcomes remain stable regardless of left–right positioning of choice options on the screen. c, Validation of assumption 3 (randomization of the profiles): potential outcomes are statistically independent of the profiles. This assumption should be satisfied by design. However, a mismatch between the design and the collected data can happen during data collection. This panel shows that using theoretical proportions (by design) and actual proportions (in collected data) of subgroups results in similar effect estimates. See Supplementary Information for more details.
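As a rough illustration of the quantity described in this caption, the sketch below computes the difference between two sparing probabilities, with a normal-approximation 95% confidence interval. It is a simplified stand-in for the conjoint-analysis AMCE and not the authors' code: the function name `sparing_difference` and all counts are hypothetical, and a plain difference of proportions is assumed to be a reasonable estimator only because the profiles are randomized by design.

```python
import math


def sparing_difference(spared_a, total_a, spared_b, total_b, z=1.96):
    """Difference in sparing probability between two attribute levels.

    Returns the point estimate and a normal-approximation confidence
    interval (z=1.96 corresponds to roughly 95%).
    """
    p_a = spared_a / total_a
    p_b = spared_b / total_b
    diff = p_a - p_b
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts, NOT taken from the Moral Machine data set:
# pedestrians spared in 560 of 1,000 scenarios, passengers in 440 of 1,000.
diff, (low, high) = sparing_difference(560, 1000, 440, 1000)
print(f"difference = {diff:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

A positive difference whose interval excludes zero would be read as a preference for sparing the first group, which is how the error bars in these panels are meant to be interpreted.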
Calculated values correspond to values in Fig. 2a (AMCE calculated using conjoint analysis). For example, ‘Sparing Pedestrians [Relation to AV]’ refers to the difference between the probability of sparing pedestrians, and the probability of sparing passengers (attribute name: Relation to AV), aggregated over all other attributes. Error bars represent 95% confidence intervals of the means. a, Validation of textual description (seen versus not seen). By default, respondents see only the visual representation of a scenario. Interpretation of what type of characters they represent (for example, female doctor) may not be obvious. Optionally, respondents can read a textual description of the scenario by clicking on ‘see description’. This panel shows that direction and (except in one case) order of effect estimates remain stable. The magnitude of the effects increases for respondents who read the textual descriptions, which means that the effects reported in Fig. 2a were not overestimated because of visual ambiguity. b, Validation of device used (desktop versus mobile). Direction and order of effect estimates remain stable regardless of whether respondents used desktop or mobile devices when completing the task. c, Validation of data set (all data versus full first-session data versus survey-only data). Direction and order of effect estimates remain stable regardless of whether the data used in analysis are all data, data restricted to only first completed (13-scenario) session by any user, or data restricted to completed sessions after which the demographic survey was taken. First completed session by any user is an interesting subset of the data because respondents had not seen their summary of results yet, and respondents ended up completing the session. Survey-only data are also interesting given that the conclusions about individual variations in the main paper and from Extended Data Fig. 3 and Extended Data Table 1 are drawn from this subset. 
See Supplementary Information for more details.
Extended Data Fig. 3 Average marginal causal effect (AMCE) of attributes for different subpopulations.
Subpopulations are characterized by respondents’ age (a, older versus younger), gender (b, male versus female), education (c, less versus more educated), income (d, higher versus lower income), political views (e, conservative versus progressive), and religious views (f, not religious versus very religious). Error bars represent 95% confidence intervals of the means. Note that AMCE has a positive value for all considered subpopulations; for example, both male and female respondents indicated a preference for sparing females, but the latter group showed a stronger preference. See Supplementary Information for a detailed description of the cutoffs and the groupings of ordinal categories that were used to define each subpopulation.
Extended Data Fig. 4 Hierarchical cluster of countries based on country-level effect sizes calculated after filtering out responses for which the linguistic description was seen, thus neutralizing any potential effect of language.
The three colours of the dendrogram branches represent three large clusters: Western, Eastern, and Southern. The names of the countries are coloured according to the Inglehart–Welzel Cultural Map 2010–2014 (ref. 21; Inglehart & Welzel). See Supplementary Information for more details. The dendrogram is essentially similar to that shown in Fig. 3a.
a, b, We use two internal validation metrics to compare three linkage criteria for hierarchical clustering (Ward, Complete and Average) and the K-means algorithm: a, Calinski–Harabasz index; b, silhouette index. The x axis indicates the number of clusters. For both internal metrics, a higher index value indicates a ‘better’ fit of partition to the data. c, d, We use two external validation metrics to compare the hierarchical clustering algorithm used (Ward) with random cluster assignment: c, purity; d, maximum matching. The histograms show the distributions of purity and maximum matching values derived from randomly assigning countries to nine clusters. The red dotted lines indicate purity and maximum matching values computed from the clustering output of the hierarchical clustering algorithm using AMCE values. See Supplementary Information for more details.
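The external validation in panel c can be sketched as follows: compute the purity of a clustering against reference labels, then compare it with the purity obtained from many random assignments. This is a minimal illustration under stated assumptions, not the authors' implementation; the function names, the toy labels, and the choice of three random clusters are all hypothetical (the caption describes random assignment to nine clusters).

```python
import random
from collections import Counter


def purity(predicted, reference):
    """Share of items that carry the majority reference label of their cluster."""
    groups = {}
    for cluster, label in zip(predicted, reference):
        groups.setdefault(cluster, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in groups.values()
    )
    return majority_total / len(predicted)


def random_purity_distribution(reference, n_clusters, n_draws=1000, seed=0):
    """Purity values obtained by assigning every item to a random cluster."""
    rng = random.Random(seed)
    return [
        purity([rng.randrange(n_clusters) for _ in reference], reference)
        for _ in range(n_draws)
    ]


# Toy data (hypothetical): nine countries with reference cultural labels
# and a clustering that misplaces one of them.
reference = ["W", "W", "W", "E", "E", "E", "S", "S", "S"]
predicted = [0, 0, 0, 1, 1, 2, 2, 2, 2]

observed = purity(predicted, reference)
baseline = random_purity_distribution(reference, n_clusters=3)
share_at_least = sum(p >= observed for p in baseline) / len(baseline)
print(f"observed purity = {observed:.3f}")
print(f"share of random draws reaching it = {share_at_least:.3f}")
```

The observed value plays the role of the red dotted line, and the list `baseline` plays the role of the histogram.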
Extended Data Fig. 6 Demographic distributions of sample of population that completed the survey on Moral Machine (MM) website.
Distributions are based on gender (a), age (b), income (c), and education attributes (d). Most users on Moral Machine are male, went through college, and are in their 20s or 30s. While this indicates that the users of Moral Machine are not a representative sample of the whole population, it is important to note that this sample at least covers broad demographics. See Supplementary Information for more details.
Extended Data Fig. 7 Demographic distributions of US sample of population that completed the survey on Moral Machine website versus US sample of population in American Community Survey (ACS) data set.
a–d, Only gender (a), age (b), income (c), and education (d) attributes are available for both data sets. The MM US sample has an over-representation of males and younger individuals compared to the ACS US sample. e, A comparison of effect sizes as calculated for US respondents who took the survey on MM with the use of post-stratification to match the corresponding proportions for the ACS sample. Except for ‘Relation to AV’ (the second smallest effect), the direction and order of all effects are unaffected. See Supplementary Information for more details.
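Post-stratification, as used in panel e, can be illustrated by reweighting stratum-level estimates so that each stratum counts in proportion to its share of the target population rather than its share of the sample. The sketch below is a minimal version under that assumption and is not the authors' procedure; the strata, effect values, sample counts, and population shares are all hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of per-stratum values, weighted by per-stratum weights."""
    total = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / total


# Hypothetical effect estimates within each stratum (not real MM results).
effect_by_stratum = {"male": 0.10, "female": 0.20}

# Hypothetical sample composition: males are over-represented.
sample_counts = {"male": 800, "female": 200}

# Hypothetical population shares, standing in for census-style data.
population_shares = {"male": 0.49, "female": 0.51}

raw = weighted_mean(effect_by_stratum, sample_counts)
adjusted = weighted_mean(effect_by_stratum, population_shares)
print(f"raw estimate = {raw:.3f}, post-stratified estimate = {adjusted:.3f}")
```

When the two estimates have the same sign and a similar rank among the other effects, the over-representation in the sample has not changed the qualitative conclusion, which is the kind of comparison panel e reports.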
About this article
Cite this article
Awad, E., Dsouza, S., Kim, R. et al. The Moral Machine experiment. Nature 563, 59–64 (2018). https://doi.org/10.1038/s41586-018-0637-6
- Moral Machines
- Machine Ethics
- Moral Orientation
- Autonomous Vehicles
- Human Spare