An autonomous debating system

Slonim, Noam; Bilu, Yonatan; Alzate, Carlos; Bar-Haim, Roy; Bogin, Ben; Bonin, Francesca; Choshen, Leshem; Cohen-Karlik, Edo; Dankin, Lena; Edelstein, Lilach; Ein-Dor, Liat; Friedman-Melamed, Roni; Gavron, Assaf; Gera, Ariel; Gleize, Martin; Gretz, Shai; Gutfreund, Dan; Halfon, Alon; Hershcovich, Daniel; Hoory, Ron; Hou, Yufang; Hummel, Shay; Jacovi, Michal; Jochim, Charles; Kantor, Yoav; Katz, Yoav; Konopnicki, David; Kons, Zvi; Kotlerman, Lili; Krieger, Dalia; Lahav, Dan; Lavee, Tamar; Levy, Ran; Liberman, Naftali; Mass, Yosi; Menczel, Amir; Mirkin, Shachar; Moshkowich, Guy; Ofek-Koifman, Shila; Orbach, Matan; Rabinovich, Ella; Rinott, Ruty; Shechtman, Slava; Sheinwald, Dafna; Shnarch, Eyal; Shnayderman, Ilya; Soffer, Aya; Spector, Artem; Sznajder, Benjamin; Toledo, Assaf; Toledo-Ronen, Orith; Venezian, Elad; Aharonov, Ranit

doi:10.1038/s41586-021-03215-w

Article
Published: 17 March 2021

Noam Slonim ORCID: orcid.org/0000-0001-5171-8264¹,
Yonatan Bilu¹,
Carlos Alzate²,
Roy Bar-Haim¹,
Ben Bogin¹,
Francesca Bonin²,
Leshem Choshen¹,
Edo Cohen-Karlik¹,
Lena Dankin¹,
Lilach Edelstein¹,
Liat Ein-Dor¹,
Roni Friedman-Melamed¹,
Assaf Gavron¹,
Ariel Gera¹,
Martin Gleize²,
Shai Gretz¹,
Dan Gutfreund¹,
Alon Halfon¹,
Daniel Hershcovich¹,
Ron Hoory¹,
Yufang Hou²,
Shay Hummel¹,
Michal Jacovi¹,
Charles Jochim²,
Yoav Kantor¹,
Yoav Katz¹,
David Konopnicki¹,
Zvi Kons¹,
Lili Kotlerman¹,
Dalia Krieger¹,
Dan Lahav¹,
Tamar Lavee¹,
Ran Levy¹,
Naftali Liberman¹,
Yosi Mass¹,
Amir Menczel¹,
Shachar Mirkin¹,
Guy Moshkowich¹,
Shila Ofek-Koifman¹,
Matan Orbach¹,
Ella Rabinovich¹,
Ruty Rinott¹,
Slava Shechtman¹,
Dafna Sheinwald¹,
Eyal Shnarch¹,
Ilya Shnayderman¹,
Aya Soffer¹,
Artem Spector¹,
Benjamin Sznajder¹,
Assaf Toledo¹,
Orith Toledo-Ronen¹,
Elad Venezian¹ &
Ranit Aharonov¹

Nature volume 591, pages 379–384 (2021)

Abstract

Artificial intelligence (AI) is defined as the ability of machines to perform tasks that are usually associated with intelligent beings. Argument and debate are fundamental capabilities of human intelligence, essential for a wide range of human activities, and common to all human societies. The development of computational argumentation technologies is therefore an important emerging discipline in AI research¹. Here we present Project Debater, an autonomous debating system that can engage in a competitive debate with humans. We provide a complete description of the system’s architecture, a thorough and systematic evaluation of its operation across a wide range of debate topics, and a detailed account of the system’s performance in its public debut against three expert human debaters. We also highlight the fundamental differences between debating with humans as opposed to challenging humans in game competitions, the latter being the focus of classical ‘grand challenges’ pursued by the AI research community over the past few decades. We suggest that such challenges lie in the ‘comfort zone’ of AI, whereas debating with humans lies in a different territory, in which humans still prevail, and for which novel paradigms are required to make substantial progress.

**Fig. 3: Evaluation of Project Debater.**

Data availability

The full transcripts of the three public debates in which Project Debater participated are available in Supplementary Information section 11, including information that elucidates the system’s operation throughout, and the results of the audience votes. In addition, multiple datasets that were constructed and used while developing Project Debater are available at https://www.research.ibm.com/haifa/dept/vst/debating_data.shtml. Source data are provided with this paper for Fig. 3.

Code availability

Most of the underlying capabilities of Project Debater, including the argument mining components, are freely available for academic research upon request as cloud services via https://early-access-program.debater.res.ibm.com/academic_use. Note that the terminology there differs: what we call here ‘motion’ and ‘topic’ are denoted there as ‘topic’ and ‘concept’, respectively.

References

Lawrence, J. & Reed, C. Argument mining: a survey. Comput. Linguist. 45, 765–818 (2019).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. Preprint at https://arxiv.org/abs/1810.04805 (2018).
Peters, M. et al. Deep contextualized word representations. In Proc. 2018 Conf. North Am. Ch. Assoc. for Computational Linguistics: Human Language Technologies Vol. 1, 2227–2237 (Association for Computational Linguistics, 2018); https://www.aclweb.org/anthology/N18-1202
Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, http://www.persagen.com/files/misc/radford2019language.pdf (2019).
Socher, R. et al. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. Empirical Methods in Natural Language Processing (EMNLP) 1631–1642 (Association for Computational Linguistics, 2013).
Yang, Z. et al. XLNet: generalized autoregressive pretraining for language understanding. In Adv. in Neural Information Processing Systems (NIPS) 5753–5763 (Curran Associates, 2019).
Cho, K., van Merriënboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation: encoder–decoder approaches. In Proc. 8th Worksh. on Syntax, Semantics and Structure in Statistical Translation 103–111 (Association for Computational Linguistics, 2014).
Gambhir, M. & Gupta, V. Recent automatic text summarization techniques: a survey. Artif. Intell. Rev. 47, 1–66 (2017).
Young, S., Gašić, M., Thomson, B. & Williams, J. POMDP-based statistical spoken dialog systems: A review. Proc. IEEE 101, 1160–1179 (2013).
Gurevych, I., Hovy, E. H., Slonim, N. & Stein, B. Debating Technologies (Dagstuhl Seminar 15512) Dagstuhl Report 5 (2016).
Levy, R., Bilu, Y., Hershcovich, D., Aharoni, E. & Slonim, N. Context dependent claim detection. In Proc. COLING 2014, the 25th Int. Conf. on Computational Linguistics: Technical Papers 1489–1500 (Dublin City University and Association for Computational Linguistics, 2014); https://www.aclweb.org/anthology/C14-1141
Rinott, R. et al. Show me your evidence—an automatic method for context dependent evidence detection. In Proc. 2015 Conf. on Empirical Methods in Natural Language Processing 440–450 (Association for Computational Linguistics, 2015); https://www.aclweb.org/anthology/D15-1050
Shnayderman, I. et al. Fast end-to-end wikification. Preprint at https://arxiv.org/abs/1908.06785 (2019).
Borthwick, A. A Maximum Entropy Approach To Named Entity Recognition. PhD thesis, New York Univ. https://cs.nyu.edu/media/publications/borthwick_andrew.pdf (1999).
Finkel, J. R., Grenager, T. & Manning, C. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proc. 43rd Ann. Meet. Assoc. for Computational Linguistics 363–370 (Association for Computational Linguistics, 2005).
Levy, R., Bogin, B., Gretz, S., Aharonov, R. & Slonim, N. Towards an argumentative content search engine using weak supervision. In Proc. 27th Int. Conf. on Computational Linguistics (COLING 2018) 2066–2081, https://www.aclweb.org/anthology/C18-1176.pdf (International Committee on Computational Linguistics, 2018).
Ein-Dor, L. et al. Corpus wide argument mining—a working solution. In Proc. Thirty-Fourth AAAI Conf. on Artificial Intelligence 7683–7691 (AAAI Press, 2020).
Levy, R. et al. Unsupervised corpus-wide claim detection. In Proc. 4th Worksh. on Argument Mining 79–84 (Association for Computational Linguistics, 2017); https://www.aclweb.org/anthology/W17-5110
Shnarch, E. et al. Will it blend? Blending weak and strong labeled data in a neural network for argumentation mining. In Proc. 56th Ann. Meet. Assoc. for Computational Linguistics Vol. 2, 599–605 (Association for Computational Linguistics, 2018); https://www.aclweb.org/anthology/P18-2095
Gleize, M. et al. Are you convinced? Choosing the more convincing evidence with a Siamese network. In Proc. 57th Conf. Assoc. for Computational Linguistics 967–976 (Association for Computational Linguistics, 2019).
Bar-Haim, R., Bhattacharya, I., Dinuzzo, F., Saha, A. & Slonim, N. Stance classification of context-dependent claims. In Proc. 15th Conf. Eur. Ch. Assoc. for Computational Linguistics Vol. 1, 251–261 (Association for Computational Linguistics, 2017).
Bar-Haim, R., Edelstein, L., Jochim, C. & Slonim, N. Improving claim stance classification with lexical knowledge expansion and context utilization. In Proc. 4th Worksh. on Argument Mining 32–38 (Association for Computational Linguistics, 2017).
Bar-Haim, R. et al. From surrogacy to adoption; from bitcoin to cryptocurrency: debate topic expansion. In Proc. 57th Conf. Assoc. for Computational Linguistics 977–990 (Association for Computational Linguistics, 2019).
Bilu, Y. et al. Argument invention from first principles. In Proc. 57th Ann. Meet. Assoc. for Computational Linguistics 1013–1026 (Association for Computational Linguistics, 2019).
Ein-Dor, L. et al. Semantic relatedness of Wikipedia concepts—benchmark data and a working solution. In Proc. Eleventh Int. Conf. on Language Resources and Evaluation (LREC 2018) 2571–2575 (Springer, 2018).
Pahuja, V. et al. Joint learning of correlated sequence labelling tasks using bidirectional recurrent neural networks. In Proc. Interspeech 548–552 (International Speech Communication Association, 2017).
Mirkin, S. et al. Listening comprehension over argumentative content. In Proc. 2018 Conf. on Empirical Methods in Natural Language Processing 719–724 (Association for Computational Linguistics, 2018).
Lavee, T. et al. Listening for claims: listening comprehension using corpus-wide claim mining. In ArgMining Worksh. 58–66 (Association for Computational Linguistics, 2019).
Orbach, M. et al. A dataset of general-purpose rebuttal. In Proc. 2019 Conf. on Empirical Methods in Natural Language Processing 5595–5605 (Association for Computational Linguistics, 2019).
Slonim, N., Atwal, G. S., Tkačik, G. & Bialek, W. Information-based clustering. Proc. Natl Acad. Sci. USA 102, 18297–18302 (2005).
Ein Dor, L. et al. Learning thematic similarity metric from article sections using triplet networks. In Proc. 56th Ann. Meet. Assoc. for Computational Linguistics Vol. 2, 49–54 (Association for Computational Linguistics, 2018); https://www.aclweb.org/anthology/P18-2009
Shechtman, S. & Mordechay, M. Emphatic speech prosody prediction with deep LSTM networks. In 2018 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) 5119–5123 (IEEE, 2018).
Mass, Y. et al. Word emphasis prediction for expressive text to speech. In Interspeech 2868–2872 (International Speech Communication Association, 2018).
Feigenblat, G., Roitman, H., Boni, O. & Konopnicki, D. Unsupervised query-focused multi-document summarization using the cross entropy method. In Proc. 40th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval 961–964 (Association for Computing Machinery, 2017).
Daxenberger, J., Schiller, B., Stahlhut, C., Kaiser, E. & Gurevych, I. ArgumenText: argument classification and clustering in a generalized search scenario. Datenbank-Spektrum 20, 115–121 (2020).
Gretz, S. et al. A large-scale dataset for argument quality ranking: construction and analysis. In Thirty-Fourth AAAI Conf. on Artificial Intelligence 7805–7813 (AAAI Press, 2020); https://aaai.org/ojs/index.php/AAAI/article/view/6285
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
Samuel, A. L. Some studies in machine learning using the game of checkers. IBM J. Res. Develop. 3, 210–229 (1959).
Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994).
Campbell, M., Hoane, A. J., Jr & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).
Ferrucci, D. A. Introduction to “This is Watson”. IBM J. Res. Dev. 56, 235–249 (2012).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In 5th Int. Conf. on Computers and Games inria-0011699 (Springer, 2006).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

Acknowledgements

We thank E. Aharoni, D. Carmel, S. Fine, M. Levinger, and L. Haas for invaluable help during the early stages of this work. We thank A. Aaron and R. Fernandez for help in developing the Project Debater voice; P. Levin-Slesarev for work on the figures; G. Feigenblat and J. Daxenberger for help in generating baseline results; Y. Katsis for comments on the draft; N. Ovadia, D. Zafrir and H. Natarajan for their sportsmanship; and I. Dagan, I. Gurevych, C. Reed, B. Stein, H. Wachsmuth and U. Zakai for many discussions. We are indebted to the in-house annotators and in-house debaters, and especially to A. Polnarov and H. Goldlist-Eichler, who worked on this project over the years. Finally, we thank the additional researchers and managers from the Haifa, Dublin, India and Yorktown IBM Research labs who contributed to this project over the years, and especially to J. E. Kelly, A. Krishna, D. Gil and the IBM communications team, Epic Digital and Intelligence Squared for their support and ideas.

Author information

Authors and Affiliations

IBM Research AI, Haifa, Israel
Noam Slonim, Yonatan Bilu, Roy Bar-Haim, Ben Bogin, Leshem Choshen, Edo Cohen-Karlik, Lena Dankin, Lilach Edelstein, Liat Ein-Dor, Roni Friedman-Melamed, Assaf Gavron, Ariel Gera, Shai Gretz, Dan Gutfreund, Alon Halfon, Daniel Hershcovich, Ron Hoory, Shay Hummel, Michal Jacovi, Yoav Kantor, Yoav Katz, David Konopnicki, Zvi Kons, Lili Kotlerman, Dalia Krieger, Dan Lahav, Tamar Lavee, Ran Levy, Naftali Liberman, Yosi Mass, Amir Menczel, Shachar Mirkin, Guy Moshkowich, Shila Ofek-Koifman, Matan Orbach, Ella Rabinovich, Ruty Rinott, Slava Shechtman, Dafna Sheinwald, Eyal Shnarch, Ilya Shnayderman, Aya Soffer, Artem Spector, Benjamin Sznajder, Assaf Toledo, Orith Toledo-Ronen, Elad Venezian & Ranit Aharonov
IBM Research AI, Dublin, Ireland
Carlos Alzate, Francesca Bonin, Martin Gleize, Yufang Hou & Charles Jochim

Contributions

N.S. conceived the idea of Project Debater. N.S., Y.B., C.A., R.B.-H., B.B., F.B., L.C., E.C.-K., L.D., L.E., L.E.-D., R.F.-M., A. Gavron, A. Gera, M.G., S.G., D.G., A.H., D.H., R.H., Y.H., S.H., M.J., C.J., Y. Kantor, Y. Katz, D. Konopnicki, Z.K., L.K., D. Krieger, D.L., T.L., R.L., N.L., Y.M., A.M., S.M., G.M., M.O., E.R., R.R., S.S., D.S., E.S., I.S., A. Spector, B.S., A.T., O.T.-R., E.V. and R.A. designed and built Project Debater, with guidance from S.O.-K. and A. Soffer. N.S., Y.B., R.F.-M. and R.A. designed the evaluation framework. N.S., Y.B. and R.A. wrote the paper, with a contribution from A. Gera to the In Depth Analysis section. N.S., Y.B., R.B.-H., L.C., L.D., L.E.-D., A. Gera, R.F.-M., S.G., C.J., Y. Kantor, D.L., G.M., M.O., E.S., A.T., E.V. and R.A. wrote the Supplementary Information. Y. Katz led the software engineering of the project. N.S. and R.A. led the team, with D.G. co-leading during the early stages of the project.

Corresponding author

Correspondence to Noam Slonim.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Claire Cardie and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

This file contains Supplementary Information Sections 1-11, including Supplementary Tables 1-3, Supplementary Figures 1-6 and Supplementary References – see contents pages for details.

Supplementary Information

This file contains additional information, including: query_sentiment_lexicon - a lexicon of sentiment words, used as a building block to create queries for sentence retrieval in the claim detection and evidence detection components; action_verb_expansions - a mapping between common action verbs and their syntactic and semantic expansions; claim_verb_phrases - a list of verb phrases commonly found in sentences containing claims; contrastive_expressions - a lexicon of expressions indicating contrast; and study_conclusions - a list of phrases (unigrams to 5-grams) that frequently appear in reports of study results and conclusions.

Peer Review File

Source data

Source Data Fig. 3

About this article

Cite this article

Slonim, N., Bilu, Y., Alzate, C. et al. An autonomous debating system. Nature 591, 379–384 (2021). https://doi.org/10.1038/s41586-021-03215-w

Received: 19 May 2020
Accepted: 08 January 2021
Published: 17 March 2021
Issue Date: 18 March 2021
DOI: https://doi.org/10.1038/s41586-021-03215-w
