The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet not much is known about human multitask reinforcement learning. We study participants’ behaviour in a two-step decision-making task with multiple features and changing reward functions. We compare their behaviour with two algorithms for multitask reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as to standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered confirmatory experiment, our results provide evidence that participants who are able to learn the task use a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.
Subscribe to Journal
Get full journal access for 1 year
only $8.25 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Tax calculation will be finalised during checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
Anonymized participant data and model simulation data are available at https://github.com/tomov/MTRL.
Code for all models and analyses is available at https://github.com/tomov/MTRL.
Meyer, D. E. & Kieras, D. E. A computational theory of executive cognitive processes and multiple-task performance: part I. Basic mechanisms. Psychol. Rev. 104, 3 (1997).
Lake, B. M., Ullman, T. D., Tenenbaum, J. B. & Gershman, S. J. Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 1998).
Schaul, T., Horgan, D., Gregor, K. & Silver, D. Universal Value Function Approximators. In International Conference on Machine Learning, 1312–1320 (2015).
Barreto, A. et al. Successor features for transfer in reinforcement learning. Adv. Neural Inform. Process. Syst. 30, 4055–4065 (2017).
Barreto, A. et al. Transfer in deep reinforcement learning using successor features and generalised policy improvement. Proc. Mach. Learn. Res. 80, 501–510 (2018).
Borsa, D. et al. Universal successor features approximators. Preprint at xrXiv https://arxiv.org/abs/1812.07626 (2018).
Taylor, M. E. & Stone, P. Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res. 10, 1633–1685 (2009).
Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning 70, 1126–1135 (JMLR.org, 2017).
Caruana, R. Multitask learning. Mach. Learn. 28, 41–75 (1997).
Frans, K., Ho, J., Chen, X., Abbeel, P. & Schulman, J. Meta learning shared hierarchies. Preprint at arXiv https://arxiv.org/abs/1710.09767 (2017).
Wang, J. X. et al. Prefrontal cortex as a meta-reinforcement learning system. Nat. Neurosci. 21, 860 (2018).
Duan, Y. et al. Rl2: fast reinforcement learning via slow reinforcement learning. Preprint at arXiv https://arxiv.org/abs/1611.02779 (2016).
Harlow, H. F. The formation of learning sets. Psychol. Rev. 56, 51 (1949).
Yang, G. R., Joglekar, M. R., Song, H. F., Newsome, W. T. & Wang, X.-J. Task representations in neural networks trained to perform many cognitive tasks. Nat. Neurosci. 22, 297 (2019).
Stachenfeld, K. L., Botvinick, M. M. & Gershman, S. J. The hippocampus as a predictive map. Nat. Neurosci. 20, 1643 (2017).
O’Keefe, J. & Nadel, L. The Hippocampus as a Cognitive Map (Clarendon Press, 1978).
Gardner, M. P., Schoenbaum, G. & Gershman, S. J. Rethinking dopamine as generalized prediction error. Proc. R. Soc. B 285, 20181645 (2018).
Niv, Y. et al. Reinforcement learning in multidimensional environments relies on attention mechanisms. J. Neurosci. 35, 8145–8157 (2015).
Leong, Y. C., Radulescu, A., Daniel, R., DeWoskin, V. & Niv, Y. Dynamic interaction between reinforcement learning and attention in multidimensional environments. Neuron 93, 451–463 (2017).
Flesch, T., Balaguer, J., Dekker, R., Nili, H. & Summerfield, C. Comparing continual task learning in minds and machines. Proc. Natl Acad. Sci. U. S. A. 115, E10313–E10322 (2018).
Keramati, M. & Gutkin, B. Homeostatic reinforcement learning for integrating reward collection and physiological stability. eLife 3, e04811 (2014).
Schuck, N. W., Cai, M. B., Wilson, R. C. & Niv, Y. Human orbitofrontal cortex represents a cognitive map of state space. Neuron 91, 1402–1412 (2016).
Wilson, R. C., Takahashi, Y. K., Schoenbaum, G. & Niv, Y. Orbitofrontal cortex as a cognitive map of task space. Neuron 81, 267–279 (2014).
Williams, G. et al. The hypothalamus and the control of energy homeostasis: different circuits, different purposes. Physiol. Behav. 74, 683–701 (2001).
Burgess, C. R., Livneh, Y., Ramesh, R. N. & Andermann, M. L. Gating of visual processing by physiological need. Curr. Opin. Neurobiol. 49, 16–23 (2018).
Juechems, K. & Summerfield, C. Where does value come from?. Trends Cogn. Sci. 23, 836–850 (2019).
Botvinick, M. M. Hierarchical models of behavior and prefrontal function. Trends Cogn. Sci. 12, 201–208 (2008).
Chang, M. B., Gupta, A., Levine, S. & Griffiths, T. L. Automatically composing representation transformations as a means for generalization. in International Conference on Learning Representations https://openreview.net/forum?id=B1ffQnRcKX (2019).
Saxe, A. M., McClelland, J. L. & Ganguli, S. A mathematical theory of semantic development in deep neural networks. Proc. Natl Acad. Sci. U. S. A. 116, 11537–11546 (2019).
Tsividis, P. A., Pouncy, T., Xu, J. L., Tenenbaum, J. B. & Gershman, S. J. Human learning in Atari. in 2017 AAAI Spring Symposium Series (2017).
Momennejad, I. et al. The successor representation in human reinforcement learning. Nat. Hum. Behav. 1, 680 (2017).
Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D. & Meder, B. Generalization guides human exploration in vast decision spaces. Nat. Hum. Behav. 2, 915 (2018).
Stojić, H., Schulz, E., Analytis, P. & Speekenbrink, M. It’s new, but is it good? How generalization and uncertainty guide the exploration of novel options. J. Exp. Psychol. 149, 1878–1907 (2020).
Morey, R. D., Rouder, J. N., Jamil, T. & Morey, M. R. D. Package ‘BayesFactor’ (R Project, 2015).
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D. & Iverson, G. Bayesian t tests for accepting and rejecting the null hypothesis. Psychon. Bull. Rev. 16, 225–237 (2009).
Gronau, Q. F., Singmann, H. & Wagenmakers, E.-J. bridgesampling: an R package for estimating normalizing constants. J. Stat. Soft. https://doi.org/10.18637/jss.v092.i10 (2020).
Lazaric, A. in Reinforcement Learning (ed. Wiering, M. & van Otterlo, M.) 143–173 (Springer, 2012).
Gershman, S. J. The successor representation: its computational logic and neural substrates. J. Neurosci. 38, 7193–7200 (2018).
Dayan, P. Improving generalization for temporal difference learning: the successor representation. Neural Comput. 5, 613–624 (1993).
Russek, E. M., Momennejad, I., Botvinick, M. M., Gershman, S. J. & Daw, N. D. Predictive representations can link model-based reinforcement learning to model-free mechanisms. PLoS Comput. Biol. 13, e1005768 (2017).
Stachenfeld, K. L., Botvinick, M. & Gershman, S. J. Adv. Neural Inform. Process. Syst. 27, 2528–2536 (2014).
Tomov, M., Yagati, S., Kumar, A., Yang, W. & Gershman, S. Discovery of hierarchical representations for efficient planning. PLoS Comput. Biol. https://doi.org/10.1371/journal.pcbi.1007594 (2020).
Franklin, N. T. & Frank, M. J. Compositional clustering in task structure learning. PLoS Comput. Biol. 14, e1006116 (2018).
Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B. & Dolan, R. J. Cortical substrates for exploratory decisions in humans. Nature 441, 876 (2006).
The authors thank N. Franklin and W. Yang for helpful discussions. This research was supported by the Toyota Corporation, the Office of Naval Research (award N000141712984), the Harvard Data Science Initiative and the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
The authors declare no competing interests.
Peer review information
Primary Handling Editor: Marike Schiffer
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Tomov, M.S., Schulz, E. & Gershman, S.J. Multi-task reinforcement learning in humans. Nat Hum Behav 5, 764–773 (2021). https://doi.org/10.1038/s41562-020-01035-y