Multi-task reinforcement learning in humans

Tomov, Momchil S.; Schulz, Eric; Gershman, Samuel J.

doi:10.1038/s41562-020-01035-y

Article
Published: 28 January 2021

Nature Human Behaviour volume 5, pages 764–773 (2021)

Abstract

The ability to transfer knowledge across tasks and generalize to novel ones is an important hallmark of human intelligence. Yet little is known about human multitask reinforcement learning. We study participants’ behaviour in a two-step decision-making task with multiple features and changing reward functions. We compare their behaviour with two algorithms for multitask reinforcement learning, one that maps previous policies and encountered features to new reward functions and one that approximates value functions across tasks, as well as with standard model-based and model-free algorithms. Across three exploratory experiments and a large preregistered confirmatory experiment, our results provide evidence that participants who are able to learn the task use a strategy that maps previously learned policies to novel scenarios. These results enrich our understanding of human reinforcement learning in complex environments with changing task demands.
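The abstract does not spell out how previous policies and features can be mapped to a new reward function. One common formalization, assumed here purely for illustration, is successor features combined with generalized policy improvement: each stored policy is summarized by the features it is expected to accumulate, and a new task, given as a weight vector over those features, is solved by picking the action whose best stored policy yields the highest value. The sketch below uses hypothetical policy names, feature vectors and weights; none of these numbers come from the study.

```python
# Toy sketch of generalized policy improvement over successor features.
# All names and numbers are illustrative assumptions, not data from the paper.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

# psi[policy][action]: expected cumulative feature vector obtained by taking
# `action` in the start state and following `policy` afterwards (hypothetical).
psi = {
    "policy_A": {"left": [1.0, 0.0, 0.0], "right": [0.0, 0.5, 0.0]},
    "policy_B": {"left": [0.0, 0.0, 0.2], "right": [0.0, 1.0, 0.0]},
}

def gpi_action(psi, w):
    """Return the (action, value) pair maximizing psi . w over all stored
    policies and actions, where w holds the reward weights of the new task."""
    best_action, best_value = None, float("-inf")
    for by_action in psi.values():
        for action, features in by_action.items():
            value = dot(features, w)  # value of this action under this policy
            if value > best_value:
                best_action, best_value = action, value
    return best_action, best_value

# A new task that rewards the first feature and, more strongly, the second.
w_new = [1.0, 2.0, 0.0]
action, value = gpi_action(psi, w_new)
print(action, value)
```

In this sketch no new learning is needed when the reward weights change: only the inner products are recomputed, which is what makes reuse of earlier policies cheap.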

**Fig. 1: Overview of theoretical setup and experiments.**

**Fig. 2: Overview and results of experiment 1.**

**Fig. 3: Overview and results of experiment 2.**

**Fig. 4: Overview and results of experiment 3.**

**Fig. 5: Results of preregistered experiment 4.**

Data availability

Anonymized participant data and model simulation data are available at https://github.com/tomov/MTRL.

Code availability

Code for all models and analyses is available at https://github.com/tomov/MTRL.

Acknowledgements

The authors thank N. Franklin and W. Yang for helpful discussions. This research was supported by the Toyota Corporation, the Office of Naval Research (award N000141712984), the Harvard Data Science Initiative and the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

These authors contributed equally: Momchil S. Tomov, Eric Schulz.

Authors and Affiliations

Program in Neuroscience, Harvard Medical School, Boston, MA, USA
Momchil S. Tomov
Center for Brain Science, Harvard University, Cambridge, MA, USA
Momchil S. Tomov & Samuel J. Gershman
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Eric Schulz
Department of Psychology, Harvard University, Cambridge, MA, USA
Eric Schulz & Samuel J. Gershman
Center for Brains, Minds and Machines, Cambridge, MA, USA
Samuel J. Gershman

Contributions

M.S.T. and E.S. contributed equally. M.S.T., E.S. and S.J.G. conceived the experiments. M.S.T. and E.S. conducted the experiments and analysed the results. All authors wrote the manuscript.

Corresponding authors

Correspondence to Momchil S. Tomov or Eric Schulz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Primary Handling Editor: Marike Schiffer

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Information with additional analyses and plots, Supplementary Figs. 1–6 and Supplementary References.

Reporting Summary

About this article

Cite this article

Tomov, M.S., Schulz, E. & Gershman, S.J. Multi-task reinforcement learning in humans. Nat Hum Behav 5, 764–773 (2021). https://doi.org/10.1038/s41562-020-01035-y

Received: 22 October 2019
Accepted: 10 December 2020
Published: 28 January 2021
Issue Date: June 2021
DOI: https://doi.org/10.1038/s41562-020-01035-y
