Abstract
The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet little is known about human multitask reinforcement learning. We study participants’ behaviour in a two-step decision-making task with multiple features and changing reward functions. We compare their behaviour with that of two algorithms for multitask reinforcement learning, one that maps previously learned policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as with standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered confirmatory experiment, our results provide evidence that participants who are able to learn the task use a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.
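As a concrete illustration of the policy-reuse idea summarized above, the minimal Python sketch below shows how an agent that has learned successor features for a set of old policies can transfer them to a novel reward function via generalized policy improvement, in the spirit of the first model class described in the abstract; a universal value function approximator would instead condition a single value estimate directly on the task. The toy environment, array shapes and variable names are our own assumptions for illustration and are not the models fitted in the paper (those are available at https://github.com/tomov/MTRL).

```python
# Illustrative sketch (not the authors' code): successor features (SF) with
# generalized policy improvement (GPI) for transfer to a new reward function.
# Assumes rewards are linear in state features, r(s) = phi(s) . w, so that
# Q_pi(s, a) = psi_pi(s, a) . w, where psi_pi are the successor features of pi.
import numpy as np

rng = np.random.default_rng(0)

n_actions, n_features = 3, 2   # hypothetical: 3 first-stage actions, 2 reward features
n_train_tasks = 2              # number of previously learned policies

# Successor features psi[pi, a]: expected discounted feature counts when taking
# action a and then following old policy pi. Made up here; normally learned by TD.
psi = rng.uniform(0.0, 1.0, size=(n_train_tasks, n_actions, n_features))

# A novel task is specified only by a new reward weight vector w_new.
w_new = np.array([1.0, -1.0])

# GPI: evaluate every old policy on the new task and act greedily with
# respect to the best of them, without any further learning.
q_gpi = np.max(psi @ w_new, axis=0)   # shape (n_actions,): max over old policies
a_gpi = int(np.argmax(q_gpi))

print("GPI action on the new task:", a_gpi, "Q values:", q_gpi)
```

The key design contrast is that this strategy reuses old policies and only re-weights their feature predictions by the new reward function, whereas a UVFA-style learner would interpolate a single task-conditioned value function; the behavioural results reported in the abstract favour the policy-reuse strategy for participants who learned the task.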
Data availability
Anonymized participant data and model simulation data are available at https://github.com/tomov/MTRL.
Code availability
Code for all models and analyses is available at https://github.com/tomov/MTRL.
Acknowledgements
The authors thank N. Franklin and W. Yang for helpful discussions. This research was supported by the Toyota Corporation, the Office of Naval Research (award N000141712984), the Harvard Data Science Initiative and the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
M.S.T. and E.S. contributed equally. M.S.T., E.S. and S.J.G. conceived the experiments; M.S.T. and E.S. conducted the experiments and analysed the results. All authors wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information
Primary Handling Editor: Marike Schiffer
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Information with additional analyses and plots, Supplementary Figs. 1–6 and Supplementary References.
About this article
Cite this article
Tomov, M.S., Schulz, E. & Gershman, S.J. Multi-task reinforcement learning in humans. Nat Hum Behav 5, 764–773 (2021). https://doi.org/10.1038/s41562-020-01035-y