
  • Perspective

Challenges, evaluation and opportunities for open-world learning


Environmental changes can profoundly impact the performance of artificial intelligence systems operating in the real world, with effects ranging from overt catastrophic failures to non-robust behaviours that do not take changing context into account. Here we argue that designing machine intelligence that can operate in open worlds, including detecting, characterizing and adapting to structurally unexpected environmental changes, is a critical goal on the path to building systems that can solve complex and relatively under-determined problems. We present and distinguish between three forms of open-world learning (OWL)—weak, semi-strong and strong—and argue that a fully developed OWL system should be antifragile, rather than merely robust. An antifragile system, an example of which is the immune system, is not only robust to adverse events, but adapts to them quickly and becomes better at handling them in subsequent encounters. We also argue that, because OWL approaches must be capable of handling the unexpected, their practical evaluation can pose an interesting conceptual problem.
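The difference between a merely robust system and an antifragile one can be illustrated with a toy sketch. The code below is not the authors' method: `ToyOpenWorldAgent`, its threshold-based novelty check and the `steps_to_adapt` helper are all hypothetical names and simplifications introduced here. The agent tracks a scalar observation stream, flags large deviations as novelty, and remembers the regimes it has seen so that a repeated change is handled faster than the first encounter.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOpenWorldAgent:
    """Hypothetical agent tracking the level of a scalar observation stream."""
    threshold: float = 3.0      # deviation treated as a structural change (assumption)
    learning_rate: float = 0.5  # speed of gradual adaptation (assumption)
    estimate: float = 0.0       # current belief about the environment
    memory: list = field(default_factory=list)  # estimates held when a change was detected

    def step(self, observation: float) -> bool:
        """Process one observation; return True if it was flagged as novel."""
        novel = abs(observation - self.estimate) > self.threshold
        if novel:
            # Detect: store the estimate held just before the change.
            self.memory.append(self.estimate)
            # Characterize: does the observation match a remembered regime?
            known = [m for m in self.memory
                     if abs(observation - m) <= self.threshold]
            if known:
                # Adapt quickly by reusing the closest remembered regime.
                self.estimate = min(known, key=lambda m: abs(observation - m))
                return True
        # Otherwise adapt gradually towards the observation.
        self.estimate += self.learning_rate * (observation - self.estimate)
        return novel


def steps_to_adapt(agent, value, n_steps=10, tol=0.5):
    """Feed `value` for n_steps; return the first step at which the agent is within tol."""
    adapted_at = None
    for t in range(1, n_steps + 1):
        agent.step(value)
        if adapted_at is None and abs(agent.estimate - value) <= tol:
            adapted_at = t
    return adapted_at


agent = ToyOpenWorldAgent()
first = steps_to_adapt(agent, 10.0)   # first encounter with the changed environment
steps_to_adapt(agent, 0.0)            # environment reverts to its original regime
second = steps_to_adapt(agent, 10.0)  # same change, encountered again
print(first, second)
```

In this sketch the first encounter takes several steps to absorb, whereas the second is resolved in a single step because the earlier regime is recalled from memory. A purely robust agent, by contrast, would recover at the same rate every time. The memory rule is deliberately crude (it also stores partially converged estimates) and is meant only to make the distinction concrete.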


Fig. 1: Examples of structural environmental changes across three common domains.
Fig. 2: Research paradigm for evaluating OWL agents (weak, semi-strong and strong) in the general setting of continuous deployment.
Fig. 3: Opportunities for an OWL system with important human consequences, compared to more traditional machine learning.





This work was funded by multiple awards under the DARPA Science of Artificial Intelligence and Learning for Open-world Novelty program and the Army Research Office (ARO; W911NF2020010, W911NF2020003 and W911NF2020009). The views contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA, ARO or the US government. We also acknowledge M.-H. Chiu for assisting us with the bibliographic typesetting.

Author information




M.K. conceived the outline of the manuscript. M.K., E.K. and A.S. conceived the ideas behind the manuscript and co-wrote the manuscript. E.K., R.S. and M.K. designed the figures, tables and examples. All authors reviewed the manuscript.

Corresponding author

Correspondence to Mayank Kejriwal.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Antoine Cully and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kejriwal, M., Kildebeck, E., Steininger, R. et al. Challenges, evaluation and opportunities for open-world learning. Nat Mach Intell 6, 580–588 (2024).
