An autonomous debating system


Artificial intelligence (AI) is defined as the ability of machines to perform tasks that are usually associated with intelligent beings. Argument and debate are fundamental capabilities of human intelligence, essential for a wide range of human activities, and common to all human societies. The development of computational argumentation technologies is therefore an important emerging discipline in AI research¹. Here we present Project Debater, an autonomous debating system that can engage in a competitive debate with humans. We provide a complete description of the system’s architecture, a thorough and systematic evaluation of its operation across a wide range of debate topics, and a detailed account of the system’s performance in its public debut against three expert human debaters. We also highlight the fundamental differences between debating with humans as opposed to challenging humans in game competitions, the latter being the focus of classical ‘grand challenges’ pursued by the AI research community over the past few decades. We suggest that such challenges lie in the ‘comfort zone’ of AI, whereas debating with humans lies in a different territory, in which humans still prevail, and for which novel paradigms are required to make substantial progress.

Fig. 1: Debate flow.
Fig. 2: System architecture.
Fig. 3: Evaluation of Project Debater.
Fig. 4: Content type analysis.

Data availability

The full transcripts of the three public debates in which Project Debater participated are available in Supplementary Information section 11, including information that elucidates the system’s operation throughout, and the results of the audience votes. In addition, multiple datasets that were constructed and used while developing Project Debater are available online. Source data are provided with this paper for Fig. 3.

Code availability

Most of the underlying capabilities of Project Debater, including the argument mining components, are freely available for academic research upon request as cloud services (in which the terminology differs: what we call here ‘motion’ and ‘topic’ are denoted ‘topic’ and ‘concept’, respectively).


References

  1. Lawrence, J. & Reed, C. Argument mining: a survey. Comput. Linguist. 45, 765–818 (2019).
  2. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. Preprint (2018).
  3. Peters, M. et al. Deep contextualized word representations. In Proc. 2018 Conf. North Am. Ch. Assoc. for Computational Linguistics: Human Language Technologies Vol. 1, 2227–2237 (Association for Computational Linguistics, 2018).
  4. Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1 (2019).
  5. Socher, R. et al. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. Empirical Methods in Natural Language Processing (EMNLP) 1631–1642 (Association for Computational Linguistics, 2013).
  6. Yang, Z. et al. XLNet: generalized autoregressive pretraining for language understanding. In Adv. in Neural Information Processing Systems (NIPS) 5753–5763 (Curran Associates, 2019).
  7. Cho, K., van Merriënboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation: encoder–decoder approaches. In Proc. 8th Worksh. on Syntax, Semantics and Structure in Statistical Translation 103–111 (Association for Computational Linguistics, 2014).
  8. Gambhir, M. & Gupta, V. Recent automatic text summarization techniques: a survey. Artif. Intell. Rev. 47, 1–66 (2017).
  9. Young, S., Gašić, M., Thomson, B. & Williams, J. POMDP-based statistical spoken dialog systems: a review. Proc. IEEE 101, 1160–1179 (2013).
  10. Gurevych, I., Hovy, E. H., Slonim, N. & Stein, B. Debating Technologies (Dagstuhl Seminar 15512) Dagstuhl Report 5 (2016).
  11. Levy, R., Bilu, Y., Hershcovich, D., Aharoni, E. & Slonim, N. Context dependent claim detection. In Proc. COLING 2014, the 25th Int. Conf. on Computational Linguistics: Technical Papers 1489–1500 (Dublin City University and Association for Computational Linguistics, 2014).
  12. Rinott, R. et al. Show me your evidence—an automatic method for context dependent evidence detection. In Proc. 2015 Conf. on Empirical Methods in Natural Language Processing 440–450 (Association for Computational Linguistics, 2015).
  13. Shnayderman, I. et al. Fast end-to-end wikification. Preprint (2019).
  14. Borthwick, A. A Maximum Entropy Approach To Named Entity Recognition. PhD thesis, New York Univ. (1999).
  15. Finkel, J. R., Grenager, T. & Manning, C. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proc. 43rd Ann. Meet. Assoc. for Computational Linguistics 363–370 (Association for Computational Linguistics, 2005).
  16. Levy, R., Bogin, B., Gretz, S., Aharonov, R. & Slonim, N. Towards an argumentative content search engine using weak supervision. In Proc. 27th Int. Conf. on Computational Linguistics (COLING 2018) 2066–2081 (International Committee on Computational Linguistics, 2018).
  17. Ein-Dor, L. et al. Corpus wide argument mining—a working solution. In Proc. Thirty-Fourth AAAI Conf. on Artificial Intelligence 7683–7691 (AAAI Press, 2020).
  18. Levy, R. et al. Unsupervised corpus-wide claim detection. In Proc. 4th Worksh. on Argument Mining 79–84 (Association for Computational Linguistics, 2017).
  19. Shnarch, E. et al. Will it blend? Blending weak and strong labeled data in a neural network for argumentation mining. In Proc. 56th Ann. Meet. Assoc. for Computational Linguistics Vol. 2, 599–605 (Association for Computational Linguistics, 2018).
  20. Gleize, M. et al. Are you convinced? Choosing the more convincing evidence with a Siamese network. In Proc. 57th Conf. Assoc. for Computational Linguistics 967–976 (Association for Computational Linguistics, 2019).
  21. Bar-Haim, R., Bhattacharya, I., Dinuzzo, F., Saha, A. & Slonim, N. Stance classification of context-dependent claims. In Proc. 15th Conf. Eur. Ch. Assoc. for Computational Linguistics Vol. 1, 251–261 (Association for Computational Linguistics, 2017).
  22. Bar-Haim, R., Edelstein, L., Jochim, C. & Slonim, N. Improving claim stance classification with lexical knowledge expansion and context utilization. In Proc. 4th Worksh. on Argument Mining 32–38 (Association for Computational Linguistics, 2017).
  23. Bar-Haim, R. et al. From surrogacy to adoption; from bitcoin to cryptocurrency: debate topic expansion. In Proc. 57th Conf. Assoc. for Computational Linguistics 977–990 (Association for Computational Linguistics, 2019).
  24. Bilu, Y. et al. Argument invention from first principles. In Proc. 57th Ann. Meet. Assoc. for Computational Linguistics 1013–1026 (Association for Computational Linguistics, 2019).
  25. Ein-Dor, L. et al. Semantic relatedness of Wikipedia concepts—benchmark data and a working solution. In Proc. Eleventh Int. Conf. on Language Resources and Evaluation (LREC 2018) 2571–2575 (Springer, 2018).
  26. Pahuja, V. et al. Joint learning of correlated sequence labelling tasks using bidirectional recurrent neural networks. In Proc. Interspeech 548–552 (International Speech Communication Association, 2017).
  27. Mirkin, S. et al. Listening comprehension over argumentative content. In Proc. 2018 Conf. on Empirical Methods in Natural Language Processing 719–724 (Association for Computational Linguistics, 2018).
  28. Lavee, T. et al. Listening for claims: listening comprehension using corpus-wide claim mining. In ArgMining Worksh. 58–66 (Association for Computational Linguistics, 2019).
  29. Orbach, M. et al. A dataset of general-purpose rebuttal. In Proc. 2019 Conf. on Empirical Methods in Natural Language Processing 5595–5605 (Association for Computational Linguistics, 2019).
  30. Slonim, N., Atwal, G. S., Tkačik, G. & Bialek, W. Information-based clustering. Proc. Natl Acad. Sci. USA 102, 18297–18302 (2005).
  31. Ein Dor, L. et al. Learning thematic similarity metric from article sections using triplet networks. In Proc. 56th Ann. Meet. Assoc. for Computational Linguistics Vol. 2, 49–54 (Association for Computational Linguistics, 2018).
  32. Shechtman, S. & Mordechay, M. Emphatic speech prosody prediction with deep LSTM networks. In 2018 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP) 5119–5123 (IEEE, 2018).
  33. Mass, Y. et al. Word emphasis prediction for expressive text to speech. In Interspeech 2868–2872 (International Speech Communication Association, 2018).
  34. Feigenblat, G., Roitman, H., Boni, O. & Konopnicki, D. Unsupervised query-focused multi-document summarization using the cross entropy method. In Proc. 40th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval 961–964 (Association for Computing Machinery, 2017).
  35. Daxenberger, J., Schiller, B., Stahlhut, C., Kaiser, E. & Gurevych, I. Argumentext: argument classification and clustering in a generalized search scenario. Datenbank-Spektrum 20, 115–121 (2020).
  36. Gretz, S. et al. A large-scale dataset for argument quality ranking: construction and analysis. In Thirty-Fourth AAAI Conf. on Artificial Intelligence 7805–7813 (AAAI Press, 2020).
  37. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
  38. Samuel, A. L. Some studies in machine learning using the game of checkers. IBM J. Res. Develop. 3, 210–229 (1959).
  39. Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994).
  40. Campbell, M., Hoane, A. J., Jr & Hsu, F.-h. Deep Blue. Artif. Intell. 134, 57–83 (2002).
  41. Ferrucci, D. A. Introduction to “This is Watson”. IBM J. Res. Dev. 56, 235–249 (2012).
  42. Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
  43. Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In 5th Int. Conf. on Computers and Games inria-0011699 (Springer, 2006).
  44. Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).


We thank E. Aharoni, D. Carmel, S. Fine, M. Levinger, and L. Haas for invaluable help during the early stages of this work. We thank A. Aaron and R. Fernandez for help in developing the Project Debater voice; P. Levin-Slesarev for work on the figures; G. Feigenblat and J. Daxenberger for help in generating baseline results; Y. Katsis for comments on the draft; N. Ovadia, D. Zafrir and H. Natarajan for their sportsmanship; and I. Dagan, I. Gurevych, C. Reed, B. Stein, H. Wachsmuth and U. Zakai for many discussions. We are indebted to the in-house annotators and in-house debaters, and especially to A. Polnarov and H. Goldlist-Eichler, who worked on this project over the years. Finally, we thank the additional researchers and managers from the Haifa, Dublin, India and Yorktown IBM Research labs who contributed to this project over the years, and especially to J. E. Kelly, A. Krishna, D. Gil and the IBM communications team, Epic Digital and Intelligence Squared for their support and ideas.

Author information




N.S. conceived the idea of Project Debater. N.S., Y.B., C.A., R.B.-H., B.B., F.B., L.C., E.C.-K., L.D., L.E., L.E.-D., R.F.-M., A. Gavron, A. Gera, M.G., S.G., D.G., A.H., D.H., R.H., Y.H., S.H., M.J., C.J., Y. Kantor, Y. Katz, D. Konopnicki, Z.K., L.K., D. Krieger, D.L., T.L., R.L., N.L., Y.M., A.M., S.M., G.M., M.O., E.R., R.R., S.S., D.S., E.S., I.S., A. Spector, B.S., A.T., O.T.-R., E.V. and R.A. designed and built Project Debater, with guidance from S.O.-K. and A. Soffer. N.S., Y.B., R.F.-M. and R.A. designed the evaluation framework. N.S., Y.B. and R.A. wrote the paper, with contribution from A. Gera to the In Depth Analysis section. N.S., Y.B., R.B.-H., L.C., L.D., L.E.-D., A. Gera, R.F.-M., S.G., C.J., Y. Kantor, D.L., G.M., M.O., E.S., A.T., E.V. and R.A. wrote the Supplementary Information. Y. Katz led the software engineering of the project. N.S. and R.A. led the team, with D.G. co-leading during the early stages of the project.

Corresponding author

Correspondence to Noam Slonim.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Claire Cardie and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

This file contains Supplementary Information Sections 1-11, including Supplementary Tables 1-3, Supplementary Figures 1-6 and Supplementary References – see contents pages for details.

Supplementary Information

This file contains additional information, including: query_sentiment_lexicon - a lexicon of sentiment words, used as a building block to create queries for sentence retrieval in the claim detection and evidence detection components; action_verb_expansions - a mapping between common action verbs and their syntactic and semantic expansions; claim_verb_phrases - a list of verb phrases commonly found in sentences containing claims; contrastive_expressions - a lexicon of expressions indicating contrast; and study_conclusions - a list of phrases (unigrams to 5-grams) that frequently appear in reports of study results and conclusions.

About this article

Cite this article

Slonim, N., Bilu, Y., Alzate, C. et al. An autonomous debating system. Nature 591, 379–384 (2021).
