Designing a strong test for measuring true common-sense reasoning

Common-sense reasoning has recently emerged as an important test for artificial general intelligence, especially given the much-publicized successes of language representation models such as T5, BERT and GPT-3. Currently, typical benchmarks involve question answering tasks, but to test the full complexity of common-sense reasoning, more comprehensive evaluation methods that are grounded in theory should be developed.

Fig. 1: A tree-based visualization of the 48 representational areas in the Gordon–Hobbs common-sense theory.




This work was funded under the DARPA Machine Common Sense (MCS) program under award number N660011924033.

Author information

M.K. and D.M. conceived the ideas behind the manuscript and its outline. M.K., H.S. and A.M. co-wrote the manuscript and designed the figures, examples and supplementary material. All authors reviewed the manuscript.

Corresponding author

Correspondence to Mayank Kejriwal.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.

About this article

Cite this article

Kejriwal, M., Santos, H., Mulvehill, A.M. et al. Designing a strong test for measuring true common-sense reasoning. Nat Mach Intell 4, 318–322 (2022).
