Intelligent problem-solving as integrated hierarchical reinforcement learning


According to cognitive psychology and related disciplines, the development of complex problem-solving behaviour in biological agents depends on hierarchical cognitive mechanisms. Hierarchical reinforcement learning is a promising computational approach that may eventually yield comparable problem-solving behaviour in artificial agents and robots. So far, however, the problem-solving abilities of many human and non-human animals remain clearly superior to those of artificial systems. Here we propose steps to integrate biologically inspired hierarchical mechanisms to enable advanced problem-solving skills in artificial agents. We first review the literature in cognitive psychology to highlight the importance of compositional abstraction and predictive processing. We then relate the resulting insights to contemporary hierarchical reinforcement learning methods. Our results suggest that each of the identified cognitive mechanisms has been implemented individually in isolated computational architectures, which raises the question of why no single unifying architecture integrates them all. As our final contribution, we address this question by providing an integrative perspective on the computational challenges of developing such a unifying architecture. We expect our results to guide the development of more sophisticated, cognitively inspired hierarchical machine learning architectures.

Fig. 1: A New Caledonian crow solves a food-access problem.
Fig. 2: Prerequisites, mechanisms and features of biological problem-solving agents.
Fig. 3: Compositional action and state abstractions.
Fig. 4: General hierarchical problem-solving architecture.
Fig. 5: Types of action abstraction.
Fig. 6: Shortcomings, challenges and suggestions for HRL.



We acknowledge funding from the DFG (projects IDEAS, LeCAREbot, TRR169, SPP 2134, RTG 1808 and EXC 2064/1), the Humboldt Foundation and Max Planck Research School IMPRS-IS.

Author information

Corresponding author

Correspondence to Manfred Eppe.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Boxes 1–6 and Table 1.

About this article

Cite this article

Eppe, M., Gumbsch, C., Kerzel, M. et al. Intelligent problem-solving as integrated hierarchical reinforcement learning. Nat Mach Intell 4, 11–20 (2022).
