Article

The successor representation in human reinforcement learning

Abstract

Theories of reward learning in neuroscience have focused on two families of algorithms thought to capture deliberative versus habitual choice. ‘Model-based’ algorithms compute the value of candidate actions from scratch, whereas ‘model-free’ algorithms make choice more efficient but less flexible by storing pre-computed action values. We examine an intermediate algorithmic family, the successor representation, which balances flexibility and efficiency by storing partially computed action values: predictions about future events. These pre-computation strategies differ in how they update their choices following changes in a task. The successor representation’s reliance on stored predictions about future states predicts a unique signature of insensitivity to changes in the task’s sequence of events, but flexible adjustment following changes to rewards. We provide evidence for such differential sensitivity in two behavioural studies with humans. These results suggest that the successor representation is a computational substrate for semi-flexible choice in humans, introducing a subtler, more cognitive notion of habit.
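The predicted signature can be illustrated with a toy simulation. The sketch below is not the authors' task or model code. It assumes a hypothetical six-state deterministic task (two start states, two middle states, two terminal states), a discount factor of 0.9, and made-up function names. It computes a successor matrix M (expected discounted future state occupancies) and evaluates states as V = M r. When rewards change, the cached M combined with the new rewards gives the right preference. When transitions change, the cached M is stale, so the values only come out right if the predictions are recomputed from the new transition structure, as a model-based agent would.

```python
# Illustrative sketch only: a hypothetical 6-state task, not the authors' code.
GAMMA = 0.9      # assumed discount factor
N_STATES = 6     # states 0,1 = start; 2,3 = middle; 4,5 = terminal


def transition_matrix(edges, n=N_STATES):
    """Build a deterministic one-step transition matrix from {state: next_state}."""
    T = [[0.0] * n for _ in range(n)]
    for s, s_next in edges.items():
        T[s][s_next] = 1.0
    return T


def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def successor_matrix(T, gamma=GAMMA, sweeps=10):
    """Compute M = sum_k gamma^k T^k by iterating M <- I + gamma * T @ M.

    The task here is acyclic, so the iteration is exact after two sweeps.
    """
    n = len(T)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in eye]
    for _ in range(sweeps):
        TM = matmul(T, M)
        M = [[eye[i][j] + gamma * TM[i][j] for j in range(n)] for i in range(n)]
    return M


def state_values(M, rewards):
    """Combine stored state predictions with current rewards: V = M r."""
    return [sum(m * r for m, r in zip(row, rewards)) for row in M]


# Learning phase: 0 -> 2 -> 4 (large reward), 1 -> 3 -> 5 (small reward).
T_old = transition_matrix({0: 2, 2: 4, 1: 3, 3: 5})
r_old = [0, 0, 0, 0, 10, 1]
M_cached = successor_matrix(T_old)
v_learn = state_values(M_cached, r_old)

# Reward revaluation: rewards swap, transitions stay the same.
# The cached predictions are still valid, so V = M r adapts at once.
r_new = [0, 0, 0, 0, 1, 10]
v_sr_reward = state_values(M_cached, r_new)

# Transition revaluation: middle states now lead to the opposite terminals.
# The cached predictions are stale, so the SR values do not change.
T_new = transition_matrix({0: 2, 2: 5, 1: 3, 3: 4})
v_sr_transition = state_values(M_cached, r_old)
# A model-based agent recomputes its predictions from the new transitions.
v_mb_transition = state_values(successor_matrix(T_new), r_old)

print("start-state values after learning:       ", v_learn[:2])
print("SR after reward revaluation:             ", v_sr_reward[:2])
print("SR after transition revaluation (stale): ", v_sr_transition[:2])
print("model-based after transition revaluation:", v_mb_transition[:2])
```

In this simplified version the whole cached matrix stays frozen after the transition change. An agent that updates its predictions only for the states it revisits would show the same pattern at the start states, which is where the insensitivity is expected to appear.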

Acknowledgements

This project was made possible by grant support through the National Institutes of Health Collaborative Research in Computational Neuroscience award 1R01MH109177, the National Institutes of Health under R. L. Kirschstein National Research Service Award 1F31MH110111-01 and the John Templeton Foundation. The authors acknowledge K. Norman and R. Otto for helpful conversations, and A. Rich and S. Tubridy for assistance with psiTurk. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies. No funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Author notes

  1. I. Momennejad and E. M. Russek contributed equally to this work.

Affiliations

  1. Department of Psychology, Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA

    • I. Momennejad & N. D. Daw
  2. Center for Neural Science, New York University, New York, NY, USA

    • E. M. Russek
  3. Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA

    • J. H. Cheong
  4. DeepMind and Gatsby Computational Neuroscience Unit, University College London, London, UK

    • M. M. Botvinick
  5. Department of Psychology, Center for Brain Science, Harvard University, Cambridge, MA, USA

    • S. J. Gershman

Contributions

I.M., M.M.B. and S.J.G. designed experiment 1. J.H.C. conducted the experiment and collected the data. E.M.R. and N.D.D. designed and conducted experiment 2. I.M., E.M.R. and S.J.G. analysed the data and ran model simulations. I.M., E.M.R., M.M.B., N.D.D. and S.J.G. wrote the paper.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to I. Momennejad.

Electronic supplementary material

  1. Supplementary Information: Supplementary Figures 1–9, Supplementary Methods and Results.

  2. Reporting Summary