The Moral Machine experiment

Awad, Edmond; Dsouza, Sohan; Kim, Richard; Schulz, Jonathan; Henrich, Joseph; Shariff, Azim; Bonnefon, Jean-François; Rahwan, Iyad

doi:10.1038/s41586-018-0637-6

Article
Published: 24 October 2018

Edmond Awad¹,
Sohan Dsouza¹,
Richard Kim¹,
Jonathan Schulz²,
Joseph Henrich²,
Azim Shariff³,
Jean-François Bonnefon⁴ &
…
Iyad Rahwan¹,⁵

Nature volume 563, pages 59–64 (2018)

Matters Arising to this article was published on 04 March 2020

Abstract

With the rapid development of artificial intelligence have come concerns about how machines will make moral decisions, and the major challenge of quantifying societal expectations about the ethical principles that should guide machine behaviour. To address this challenge, we deployed the Moral Machine, an online experimental platform designed to explore the moral dilemmas faced by autonomous vehicles. This platform gathered 40 million decisions in ten languages from millions of people in 233 countries and territories. Here we describe the results of this experiment. First, we summarize global moral preferences. Second, we document individual variations in preferences, based on respondents’ demographics. Third, we report cross-cultural ethical variation, and uncover three major clusters of countries. Fourth, we show that these differences correlate with modern institutions and deep cultural traits. We discuss how these preferences can contribute to developing global, socially acceptable principles for machine ethics. All data used in this article are publicly available.

**Fig. 4: Association between Moral Machine preferences and other variables at the country level.**

Data availability

Source data and code that can be used to reproduce Figs. 2–4, Extended Data Figs. 1–7, Extended Data Tables 1, 2, Supplementary Figs. 3–21, and Supplementary Table 2 are all available at the following link: https://goo.gl/JXRrBP. The provided data, both at the individual level (anonymized IDs) and the country level, can be used beyond replication to answer follow-up research questions.

References

1. Greene, J. Moral Tribes: Emotion, Reason and the Gap Between Us and Them (Atlantic Books, London, 2013).
2. Tomasello, M. A Natural History of Human Thinking (Harvard Univ. Press, Cambridge, 2014).
3. Cushman, F. & Young, L. The psychology of dilemmas and the philosophy of morality. Ethical Theory Moral Pract. 12, 9–24 (2009).
4. Asimov, I. I, Robot (Doubleday, New York, 1950).
5. Bryson, J. & Winfield, A. Standardizing ethical design for artificial intelligence and autonomous systems. Computer 50, 116–119 (2017).
6. Wiener, N. Some moral and technical consequences of automation. Science 131, 1355–1358 (1960).
7. Wallach, W. & Allen, C. Moral Machines: Teaching Robots Right from Wrong (Oxford Univ. Press, Oxford, 2008).
8. Dignum, V. Responsible autonomy. In Proc. 26th International Joint Conference on Artificial Intelligence 4698–4704 (IJCAI, 2017).
9. Dadich, S. Barack Obama, neural nets, self-driving cars, and the future of the world. Wired https://www.wired.com/2016/10/president-obama-mit-joi-ito-interview/ (2016).
10. Shariff, A., Bonnefon, J.-F. & Rahwan, I. Psychological roadblocks to the adoption of self-driving vehicles. Nat. Hum. Behav. 1, 694–696 (2017).
11. Conitzer, V., Brill, M. & Freeman, R. Crowdsourcing societal tradeoffs. In Proc. 2015 International Conference on Autonomous Agents and Multiagent Systems 1213–1217 (IFAAMAS, 2015).
12. Bonnefon, J.-F., Shariff, A. & Rahwan, I. The social dilemma of autonomous vehicles. Science 352, 1573–1576 (2016).
13. Hauser, M., Cushman, F., Young, L., Jin, K.-X. R. & Mikhail, J. A dissociation between moral judgments and justifications. Mind Lang. 22, 1–21 (2007).
14. Carlsson, F., Daruvala, D. & Jaldell, H. Preferences for lives, injuries, and age: a stated preference survey. Accid. Anal. Prev. 42, 1814–1821 (2010).
15. Johansson-Stenman, O. & Martinsson, P. Are some lives more valuable? An ethical preferences approach. J. Health Econ. 27, 739–752 (2008).
16. Johansson-Stenman, O., Mahmud, M. & Martinsson, P. Saving lives versus life-years in rural Bangladesh: an ethical preferences approach. Health Econ. 20, 723–736 (2011).
17. Graham, J., Meindl, P., Beall, E., Johnson, K. M. & Zhang, L. Cultural differences in moral judgment and behavior, across and within societies. Curr. Opin. Psychol. 8, 125–130 (2016).
18. Hainmueller, J., Hopkins, D. J. & Yamamoto, T. Causal inference in conjoint analysis: understanding multidimensional choices via stated preference experiments. Polit. Anal. 22, 1–30 (2014).
19. Luetge, C. The German Ethics Code for automated and connected driving. Philos. Technol. 30, 547–558 (2017).
20. Müllner, D. Modern hierarchical, agglomerative clustering algorithms. Preprint at https://arxiv.org/abs/1109.2378 (2011).
21. Inglehart, R. & Welzel, C. Modernization, Cultural Change, and Democracy: The Human Development Sequence (Cambridge Univ. Press, Cambridge, 2005).
22. Muthukrishna, M. Beyond WEIRD psychology: measuring and mapping scales of cultural and psychological distance. Preprint at https://ssrn.com/abstract=3259613 (2018).
23. Hofstede, G. Culture’s Consequences: Comparing Values, Behaviors, Institutions and Organizations Across Nations (Sage, Thousand Oaks, 2003).
24. International Monetary Fund. World Economic Outlook Database https://www.imf.org/external/pubs/ft/weo/2017/01/weodata/index.aspx (2017).
25. Kaufmann, D., Kraay, A. & Mastruzzi, M. The worldwide governance indicators: methodology and analytical issues. Hague J. Rule Law 3, 220–246 (2011).
26. Gächter, S. & Schulz, J. F. Intrinsic honesty and the prevalence of rule violations across societies. Nature 531, 496–499 (2016).
27. O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Penguin, London, 2016).
28. Henrich, J. et al. In search of Homo Economicus: behavioral experiments in 15 small-scale societies. Am. Econ. Rev. 91, 73–78 (2001).
29. Future of Life Institute. Asilomar AI Principles https://futureoflife.org/ai-principles/ (2017).
30. Haidt, J. The Righteous Mind: Why Good People Are Divided by Politics and Religion (Knopf Doubleday, New York, 2012).
31. Gastil, J., Braman, D., Kahan, D. & Slovic, P. The cultural orientation of mass political opinion. PS Polit. Sci. Polit. 44, 711–714 (2011).
32. Nishi, A., Christakis, N. A. & Rand, D. G. Cooperation, decision time, and culture: online experiments with American and Indian participants. PLoS ONE 12, e0171252 (2017).

Acknowledgements

I.R., E.A., S.D., and R.K. acknowledge support from the Ethics and Governance of Artificial Intelligence Fund. J.-F.B. acknowledges support from the ANR-Labex Institute for Advanced Study in Toulouse.

Author information

Authors and Affiliations

The Media Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
Edmond Awad, Sohan Dsouza, Richard Kim & Iyad Rahwan
Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
Jonathan Schulz & Joseph Henrich
Department of Psychology, University of British Columbia, Vancouver, British Columbia, Canada
Azim Shariff
Toulouse School of Economics (TSM-R), CNRS, Université Toulouse Capitole, Toulouse, France
Jean-François Bonnefon
Institute for Data, Systems & Society, Massachusetts Institute of Technology, Cambridge, MA, USA
Iyad Rahwan

Contributions

I.R., A.S. and J.-F.B. planned the research. I.R., A.S., J.-F.B., E.A. and S.D. designed the experiment. E.A. and S.D. built the platform and collected the data. E.A., S.D., R.K., J.S. and A.S. analysed the data. E.A., S.D., R.K., J.S., J.H., A.S., J.-F.B. and I.R. interpreted the results and wrote the paper.

Corresponding authors

Correspondence to Azim Shariff, Jean-François Bonnefon or Iyad Rahwan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Robustness checks: internal validation of three simplifying assumptions.

Calculated values correspond to values in Fig. 2a (that is, AMCE calculated using conjoint analysis). For example, ‘Sparing Pedestrians [Relation to AV]’ refers to the difference between the probability of sparing pedestrians, and the probability of sparing passengers (attribute name: Relation to AV), aggregated over all other attributes. Error bars represent 95% confidence intervals of the means. AV, autonomous vehicle. a, Validation of assumption 1 (stability and no-carryover effect): potential outcomes remain stable regardless of scenario order. b, Validation of assumption 2 (no profile-order effects): potential outcomes remain stable regardless of left–right positioning of choice options on the screen. c, Validation of assumption 3 (randomization of the profiles): potential outcomes are statistically independent of the profiles. This assumption should be satisfied by design. However, a mismatch between the design and the collected data can happen during data collection. This panel shows that using theoretical proportions (by design) and actual proportions (in collected data) of subgroups results in similar effect estimates. See Supplementary Information for more details.
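As a rough illustration of the quantity described above, the following sketch computes a difference in sparing probabilities between two groups together with a normal-approximation 95% confidence interval. It is a simplified stand-in, not the authors' estimator: the conjoint analysis in the paper aggregates over all other attributes and may treat respondent-level dependence differently. The function names and the toy data are hypothetical.

```python
import math


def sparing_rate(outcomes):
    """Return (proportion spared, number of observations) for a list of 0/1 outcomes."""
    n = len(outcomes)
    return sum(outcomes) / n, n


def difference_with_ci(spared_a, spared_b, z=1.96):
    """Difference in sparing probability (a minus b) with a normal-approximation 95% CI.

    Assumes independent observations, which is a simplification.
    """
    p_a, n_a = sparing_rate(spared_a)
    p_b, n_b = sparing_rate(spared_b)
    diff = p_a - p_b
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical toy data: 1 = the character group was spared, 0 = it was not.
pedestrians = [1] * 60 + [0] * 40  # spared in 60 of 100 scenarios
passengers = [1] * 50 + [0] * 50   # spared in 50 of 100 scenarios

diff, (low, high) = difference_with_ci(pedestrians, passengers)
print(f"difference = {diff:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With these toy numbers the interval spans zero, so such a sample would not by itself indicate a preference for either group.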

Extended Data Fig. 2 Robustness checks: external validation of three factors.

Calculated values correspond to values in Fig. 2a (AMCE calculated using conjoint analysis). For example, ‘Sparing Pedestrians [Relation to AV]’ refers to the difference between the probability of sparing pedestrians, and the probability of sparing passengers (attribute name: Relation to AV), aggregated over all other attributes. Error bars represent 95% confidence intervals of the means. a, Validation of textual description (seen versus not seen). By default, respondents see only the visual representation of a scenario. Interpretation of what type of characters they represent (for example, female doctor) may not be obvious. Optionally, respondents can read a textual description of the scenario by clicking on ‘see description’. This panel shows that direction and (except in one case) order of effect estimates remain stable. The magnitude of the effects increases for respondents who read the textual descriptions, which means that the effects reported in Fig. 2a were not overestimated because of visual ambiguity. b, Validation of device used (desktop versus mobile). Direction and order of effect estimates remain stable regardless of whether respondents used desktop or mobile devices when completing the task. c, Validation of data set (all data versus full first-session data versus survey-only data). Direction and order of effect estimates remain stable regardless of whether the data used in analysis are all data, data restricted to only first completed (13-scenario) session by any user, or data restricted to completed sessions after which the demographic survey was taken. First completed session by any user is an interesting subset of the data because respondents had not seen their summary of results yet, and respondents ended up completing the session. Survey-only data are also interesting given that the conclusions about individual variations in the main paper and from Extended Data Fig. 3 and Extended Data Table 1 are drawn from this subset. See Supplementary Information for more details.

Extended Data Fig. 3 Average marginal causal effect (AMCE) of attributes for different subpopulations.

Subpopulations are characterized by respondents’ age (a, older versus younger), gender (b, male versus female), education (c, less versus more educated), income (d, higher versus lower income), political views (e, conservative versus progressive), and religious views (f, not religious versus very religious). Error bars represent 95% confidence intervals of the means. Note that AMCE has a positive value for all considered subpopulations; for example, both male and female respondents indicated a preference for sparing females, but the latter group showed a stronger preference. See Supplementary Information for a detailed description of the cutoffs and the groupings of ordinal categories that were used to define each subpopulation.

Extended Data Fig. 4 Hierarchical cluster of countries based on country-level effect sizes calculated after filtering out responses for which the linguistic description was seen, thus neutralizing any potential effect of language.

The three colours of the dendrogram branches represent three large clusters: Western, Eastern, and Southern. The names of the countries are coloured according to the Inglehart–Welzel Cultural Map 2010–2014²¹. See Supplementary Information for more details. The dendrogram is essentially similar to that shown in Fig. 3a.

Extended Data Fig. 5 Validation of hierarchical cluster of countries.

a, b, We use two internal metrics of validation of three linkage criteria of calculating hierarchical clustering (Ward, Complete and Average) in addition to the K-means algorithm: a, Calinski–Harabasz index; b, silhouette index. The x axis indicates the number of clusters. For both internal metrics, a higher index value indicates a ‘better’ fit of partition to the data. c, d, We use two external metrics of validation of the used hierarchical clustering algorithm (Ward) versus those of random clustering assignment: c, purity; d, maximum matching. The histogram shows the distributions of purity and maximum matching values derived from randomly assigning countries to nine clusters. The red dotted lines indicate purity and maximum matching values computed from the clustering output of the hierarchical clustering algorithm using AMCE values. See Supplementary Information for more details.
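The external validation in panel c can be pictured with a small sketch: compute the purity of a clustering against reference labels, then compare it with the purity of many random assignments. This is only an illustration under assumed inputs. The labels, the number of clusters, the number of random draws and the function names below are all hypothetical, and the authors' exact procedure is described in their Supplementary Information.

```python
import random
from collections import Counter


def purity(predicted, reference):
    """Share of items that carry the most common reference label within their predicted cluster."""
    by_cluster = {}
    for cluster, label in zip(predicted, reference):
        by_cluster.setdefault(cluster, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(predicted)


def random_purities(reference, n_clusters, n_draws, seed=0):
    """Purity values obtained by assigning every item to a uniformly random cluster."""
    rng = random.Random(seed)
    return [
        purity([rng.randrange(n_clusters) for _ in reference], reference)
        for _ in range(n_draws)
    ]


# Hypothetical toy example: nine items, three reference groups, three clusters.
reference = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
predicted = [0, 0, 0, 1, 1, 2, 2, 2, 2]

observed = purity(predicted, reference)
baseline = random_purities(reference, n_clusters=3, n_draws=1000)
share_at_least = sum(p >= observed for p in baseline) / len(baseline)
print(f"observed purity = {observed:.3f}")
print(f"share of random assignments with purity >= observed: {share_at_least:.3f}")
```

The observed value plays the role of the red dotted line, and the list of random purities plays the role of the histogram.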

Extended Data Fig. 6 Demographic distributions of sample of population that completed the survey on Moral Machine (MM) website.

Distributions are based on gender (a), age (b), income (c), and education attributes (d). Most users on Moral Machine are male, went through college, and are in their 20s or 30s. While this indicates that the users of Moral Machine are not a representative sample of the whole population, it is important to note that this sample at least covers broad demographics. See Supplementary Information for more details.

Extended Data Fig. 7 Demographic distributions of US sample of population that completed the survey on Moral Machine website versus US sample of population in American Community Survey (ACS) data set.

a–d, Only gender (a), age (b), income (c), and education (d) attributes are available for both data sets. The MM US sample has an over-representation of males and younger individuals compared to the ACS US sample. e, A comparison of effect sizes as calculated for US respondents who took the survey on MM with the use of post-stratification to match the corresponding proportions for the ACS sample. Except for ‘Relation to AV’ (the second smallest effect), the direction and order of all effects are unaffected. See Supplementary Information for more details.
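Post-stratification, as used in panel e, can be sketched as reweighting each demographic stratum so that its share matches a target population. The example below is a minimal illustration with one stratifying variable. The data, the target shares and the function name are hypothetical and are not taken from the Moral Machine or ACS data sets.

```python
def post_stratified_mean(sample, target_shares):
    """Weighted mean in which each stratum counts according to its target population share.

    sample: list of (stratum, value) pairs.
    target_shares: dict mapping stratum to its share in the target population.
    """
    totals, counts = {}, {}
    for stratum, value in sample:
        totals[stratum] = totals.get(stratum, 0.0) + value
        counts[stratum] = counts.get(stratum, 0) + 1
    missing = set(target_shares) - set(counts)
    if missing:
        raise ValueError(f"no observations for strata: {sorted(missing)}")
    # Combine the stratum means using the target shares as weights.
    return sum(
        share * totals[stratum] / counts[stratum]
        for stratum, share in target_shares.items()
    )


# Hypothetical sample that over-represents one group (80 versus 20 respondents).
sample = (
    [("male", 1)] * 48 + [("male", 0)] * 32
    + [("female", 1)] * 8 + [("female", 0)] * 12
)
target_shares = {"male": 0.5, "female": 0.5}  # assumed population shares

raw_mean = sum(value for _, value in sample) / len(sample)
adjusted_mean = post_stratified_mean(sample, target_shares)
print(f"raw mean = {raw_mean:.2f}, post-stratified mean = {adjusted_mean:.2f}")
```

In this toy case the raw mean is pulled towards the over-represented group, and the adjustment moves it back to the equally weighted average of the two stratum means.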

Extended Data Table 1 Regression table showing the individual variations for each of the nine attributes

Extended Data Table 2 Country-level OLS regressions showing the relationships between key ethical preferences and various social, political and economic measures

Supplementary information

Supplementary Information

This file contains Supplementary Text, Supplementary Figures 1–11, Supplementary Tables 1–2 and Supplementary References.

Reporting Summary

About this article

Cite this article

Awad, E., Dsouza, S., Kim, R. et al. The Moral Machine experiment. Nature 563, 59–64 (2018). https://doi.org/10.1038/s41586-018-0637-6

Received: 02 March 2018
Accepted: 25 September 2018
Published: 24 October 2018
Issue Date: 01 November 2018
DOI: https://doi.org/10.1038/s41586-018-0637-6
