Self-orienting in human and machine learning


A current proposal for a computational notion of self is a representation of one’s body in a specific time and place, which includes the recognition of that representation as the agent. This turns self-representation into a process of self-orientation, a challenging computational problem for any human-like agent. Here, to examine this process, we created several ‘self-finding’ tasks based on simple video games, in which players (N = 124) had to identify themselves out of a set of candidates in order to play effectively. Quantitative and qualitative testing showed that human players are nearly optimal at self-orienting. In contrast, well-known deep reinforcement learning algorithms, which excel at learning much more complex video games, are far from optimal. We suggest that self-orienting allows humans to flexibly navigate new settings.
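As an illustration only, the self-finding problem described above can be sketched as a process of elimination: the player acts, watches how each candidate moves, and discards every candidate whose movement did not match the action. This is not the authors' task implementation, nor any of the reinforcement learning algorithms they tested. The action set, the `find_self` function and the randomly moving distractors are all assumptions made for this sketch.

```python
import random

# Hypothetical action set: each action maps to a (dx, dy) displacement.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def find_self(n_candidates=4, true_self=2, seed=0, max_steps=50):
    """Return (set of candidates still consistent with our actions, steps taken).

    Assumption for this sketch: the true self always follows the chosen
    action, while every other candidate moves in a random direction.
    """
    rng = random.Random(seed)
    names = list(ACTIONS)
    remaining = set(range(n_candidates))  # every candidate starts as a possible self
    steps = 0
    while len(remaining) > 1 and steps < max_steps:
        action = rng.choice(names)
        steps += 1
        # Simulated observation of how each candidate moved on this step.
        observed = [
            ACTIONS[action] if i == true_self else ACTIONS[rng.choice(names)]
            for i in range(n_candidates)
        ]
        # Keep only the candidates whose movement matched the action just taken.
        remaining = {i for i in remaining if observed[i] == ACTIONS[action]}
    return remaining, steps


if __name__ == "__main__":
    candidates, steps = find_self()
    print(f"Remaining candidates after {steps} steps: {sorted(candidates)}")
```

Because the true self is never eliminated, the remaining set always contains it. Under these assumptions each distractor survives a step with probability 1/4, so the set typically shrinks to a single candidate within a few steps.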


Fig. 1: The Logic Game.
Fig. 2: Results of study 1 (Logic Game).
Fig. 3: The Contingency Game.
Fig. 4: Results of study 2 (Contingency Game).
Fig. 5: Results of study 3 (Switching Mappings Game).
Fig. 6: The Switching Embodiments Game.
Fig. 7: Results for study 4 (Switching Embodiments Game).
Fig. 8: Results for the mean number of steps during the last 50 levels and (where relevant) all post-perturbation levels.

Data availability

The data that support the findings of this study are available in the Open Science Framework at

Code availability

All code for data analysis and reproducing the plots is available at




For running the artificial models, we used the Harvard Business School compute cluster. This research was funded by Harvard Business School. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

J.D.F. initiated the research, J.D.F., A.K.U. and Z.O.-U. put together the data and conducted the analyses, and J.D.F., A.K.U., Z.O.-U., L.A.P., J.T. and T.D.U. wrote the manuscript.

Corresponding author

Correspondence to Julian De Freitas.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Nathan Faivre, Tony Prescott and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–20 and Tables 1–20.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

De Freitas, J., Uğuralp, A.K., Oğuz-Uğuralp, Z. et al. Self-orienting in human and machine learning. Nat Hum Behav (2023).


