The successor representation in human reinforcement learning

Momennejad, I.; Russek, E. M.; Cheong, J. H.; Botvinick, M. M.; Daw, N. D.; Gershman, S. J.

doi:10.1038/s41562-017-0180-8

Article
Published: 28 August 2017


Nature Human Behaviour volume 1, pages 680–692 (2017)


Abstract

Theories of reward learning in neuroscience have focused on two families of algorithms thought to capture deliberative versus habitual choice. ‘Model-based’ algorithms compute the value of candidate actions from scratch, whereas ‘model-free’ algorithms make choice more efficient but less flexible by storing pre-computed action values. We examine an intermediate algorithmic family, the successor representation, which balances flexibility and efficiency by storing partially computed action values: predictions about future events. These pre-computation strategies differ in how they update their choices following changes in a task. The successor representation’s reliance on stored predictions about future states predicts a unique signature of insensitivity to changes in the task’s sequence of events, but flexible adjustment following changes to rewards. We provide evidence for such differential sensitivity in two behavioural studies with humans. These results suggest that the successor representation is a computational substrate for semi-flexible choice in humans, introducing a subtler, more cognitive notion of habit.
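The abstract's central prediction (stored predictions of future states let the successor representation adapt to reward changes but not to sequence changes) can be illustrated with a minimal sketch. It assumes a hypothetical six-state task (two start states, two middle states, two terminal states), a successor matrix `M` learned by temporal-difference updates, and values computed as `V(s) = sum_j M[s][j] * r[j]`. The state layout, rewards, discount `GAMMA`, learning rate `ALPHA` and all function names are illustrative choices made here, not the paper's stimuli, fitted parameters or code. For contrast, a simple model-based evaluator recomputes value from a one-step transition model.

```python
# Toy successor-representation (SR) sketch. The task layout, rewards and
# parameters below are hypothetical illustrations, not the paper's materials.

GAMMA = 0.9  # hypothetical discount factor
ALPHA = 0.5  # hypothetical SR learning rate
N = 6        # states: starts 0 and 1, middles 2 and 3, terminals 4 and 5


def one_hot(s):
    return [1.0 if i == s else 0.0 for i in range(N)]


def td_update_sr(M, s, s_next):
    """TD update of row M[s] after observing s -> s_next (None = episode end)."""
    future = M[s_next] if s_next is not None else [0.0] * N
    target = [e + GAMMA * f for e, f in zip(one_hot(s), future)]
    M[s] = [m + ALPHA * (t - m) for m, t in zip(M[s], target)]


def train(M, trajectories, sweeps=200):
    """Replay each trajectory `sweeps` times; only visited rows of M change."""
    for _ in range(sweeps):
        for traj in trajectories:
            for s, s_next in zip(traj, traj[1:] + [None]):
                td_update_sr(M, s, s_next)


def sr_value(M, r, s):
    """SR evaluation: V(s) = sum_j M[s][j] * r[j], reusing the stored M."""
    return sum(m * rj for m, rj in zip(M[s], r))


def mb_value(next_state, r, s):
    """Model-based evaluation: roll a one-step model forward from scratch."""
    v, discount = 0.0, 1.0
    while s is not None:
        v += discount * r[s]
        discount *= GAMMA
        s = next_state.get(s)
    return v


# Phase 1: experience 0 -> 2 -> 4 and 1 -> 3 -> 5; terminal 4 pays more.
M = [one_hot(s) for s in range(N)]
train(M, [[0, 2, 4], [1, 3, 5]])
r = [0, 0, 0, 0, 10, 1]

# Reward revaluation: same transitions from the middle states, rewards swap.
M_rew = [row[:] for row in M]
train(M_rew, [[2, 4], [3, 5]])
r_new = [0, 0, 0, 0, 1, 10]

# Transition revaluation: middle states now lead to the other terminal.
M_trans = [row[:] for row in M]
train(M_trans, [[2, 5], [3, 4]])
new_model = {0: 2, 1: 3, 2: 5, 3: 4}  # one-step model a model-based learner would store

print("reward reval, SR prefers start 1:",
      sr_value(M_rew, r_new, 1) > sr_value(M_rew, r_new, 0))
print("transition reval, SR prefers start 1:",
      sr_value(M_trans, r, 1) > sr_value(M_trans, r, 0))
print("transition reval, model-based prefers start 1:",
      mb_value(new_model, r, 1) > mb_value(new_model, r, 0))
```

Under these assumptions the SR agent switches to start state 1 after the reward change, because the new reward vector is combined with the unchanged `M` at decision time. After the transition change it keeps preferring start state 0: row `M[0]` still predicts a visit to terminal 4 because state 0 is never re-experienced in phase 2. The model-based evaluator, which recomputes value from the updated one-step model, switches in both cases.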

**Fig. 1: Comparison of stored representations, computations at the decision time and behaviour across models.**

**Fig. 2: Schematic of retrieved representations at test and model predictions in reward and transition revaluation trials.**

**Fig. 3: Schematic of the design of experiment 1.**

**Fig. 4: Behavioural performance in a passive sequential learning task.**

**Fig. 5: Model fits to the phase 3 test data from the passive learning task.**

**Fig. 6: Schematic of the active sequential learning task.**

**Fig. 7: Behavioural performance in a sequential decision task.**

**Fig. 8: Model fits to the data from the sequential decision task.**

Acknowledgements

This project was made possible by grant support through the National Institutes of Health Collaborative Research in Computational Neuroscience award 1R01MH109177, the National Institutes of Health under R. L. Kirschstein National Research Service Award 1F31MH110111-01 and the John Templeton Foundation. The authors acknowledge K. Norman and R. Otto for helpful conversations, and A. Rich and S. Tubridy for assistance with psiTurk. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies. No funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

I. Momennejad and E. M. Russek contributed equally to this work.

Authors and Affiliations

Department of Psychology, Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
I. Momennejad & N. D. Daw
Center for Neural Science, New York University, New York, NY, USA
E. M. Russek
Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
J. H. Cheong
DeepMind and Gatsby Computational Neuroscience Unit, University College London, London, UK
M. M. Botvinick
Department of Psychology, Center for Brain Science, Harvard University, Cambridge, MA, USA
S. J. Gershman

Contributions

I.M., M.M.B. and S.J.G. designed experiment 1. J.H.C. conducted the experiment and collected the data. E.M.R. and N.D.D. designed and conducted experiment 2. I.M., E.M.R. and S.J.G. analysed the data and ran model simulations. I.M., E.M.R., M.M.B., N.D.D. and S.J.G. wrote the paper.

Corresponding author

Correspondence to I. Momennejad.

Ethics declarations

Competing interests

The authors declare no competing interests.

Electronic supplementary material

Supplementary Information

Supplementary Figures 1–9, Supplementary Methods and Results.

Reporting Summary

About this article

Cite this article

Momennejad, I., Russek, E.M., Cheong, J.H. et al. The successor representation in human reinforcement learning. Nat Hum Behav 1, 680–692 (2017). https://doi.org/10.1038/s41562-017-0180-8

Received: 17 October 2016
Accepted: 07 July 2017
Published: 28 August 2017
Issue Date: September 2017
DOI: https://doi.org/10.1038/s41562-017-0180-8
