
Multi-task reinforcement learning in humans


The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet not much is known about human multitask reinforcement learning. We study participants’ behaviour in a two-step decision-making task with multiple features and changing reward functions. We compare their behaviour with two algorithms for multitask reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as with standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered confirmatory experiment, our results provide evidence that participants who are able to learn the task use a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.
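To make the first multitask strategy concrete, the sketch below illustrates one way of mapping previously learned policies and features to a new reward function. It assumes that this description corresponds to successor features combined with generalized policy improvement, which the abstract does not state by name. The policy names, feature vectors, reward weights and the `gpi_action` helper are all hypothetical and are not taken from the authors' code or task.

```python
# Toy sketch of generalized policy improvement (GPI) over successor features.
# Every policy name, feature vector and reward weight here is hypothetical.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def gpi_action(successor_features, w_new):
    """Pick the first-step action with the best value under any stored policy.

    successor_features[policy][action] is the feature vector the agent expects
    to collect after taking `action` and then following `policy`.
    The value of that pair under a new task is its inner product with w_new.
    """
    best_action, best_value = None, float("-inf")
    actions = next(iter(successor_features.values())).keys()
    for action in actions:
        # Evaluate every previously learned policy and keep the best one.
        value = max(dot(psi[action], w_new) for psi in successor_features.values())
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value


# Two policies assumed to have been learned on earlier reward functions.
successor_features = {
    "pi_1": {"left": [10, 0, 0], "right": [7, 7, 0]},
    "pi_2": {"left": [0, 10, 0], "right": [7, 7, 0]},
}

# Two novel reward functions, expressed as weights over the three features.
choice_a = gpi_action(successor_features, [1, 1, 0])
choice_b = gpi_action(successor_features, [1, -1, 0])
print(choice_a, choice_b)
```

In this toy setting no new learning is needed when the reward weights change: the agent only re-evaluates stored feature expectations. A value-function approximator across tasks would instead take the weights as an input and generalize from values it was trained on.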


Fig. 1: Overview of theoretical setup and experiments.
Fig. 2: Overview and results of experiment 1.
Fig. 3: Overview and results of experiment 2.
Fig. 4: Overview and results of experiment 3.
Fig. 5: Results of preregistered experiment 4.

Data availability

Anonymized participant data and model simulation data are available at

Code availability

Code for all models and analyses is available at




Acknowledgements

The authors thank N. Franklin and W. Yang for helpful discussions. This research was supported by the Toyota Corporation, the Office of Naval Research (award N000141712984), the Harvard Data Science Initiative and the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information




M.S.T. and E.S. contributed equally. M.S.T., E.S. and S.J.G. conceived the experiments, M.S.T. and E.S. conducted the experiments and analysed the results. All authors wrote the manuscript.

Corresponding authors

Correspondence to Momchil S. Tomov or Eric Schulz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Primary Handling Editor: Marike Schiffer

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Information with additional analyses and plots, Supplementary Figs. 1–6 and Supplementary References.

Reporting Summary


About this article


Cite this article

Tomov, M.S., Schulz, E. & Gershman, S.J. Multi-task reinforcement learning in humans. Nat Hum Behav 5, 764–773 (2021).
