The successor representation in human reinforcement learning

Abstract

Theories of reward learning in neuroscience have focused on two families of algorithms thought to capture deliberative versus habitual choice. ‘Model-based’ algorithms compute the value of candidate actions from scratch, whereas ‘model-free’ algorithms make choice more efficient but less flexible by storing pre-computed action values. We examine an intermediate algorithmic family, the successor representation, which balances flexibility and efficiency by storing partially computed action values: predictions about future events. These pre-computation strategies differ in how they update their choices following changes in a task. The successor representation’s reliance on stored predictions about future states predicts a unique signature of insensitivity to changes in the task’s sequence of events, but flexible adjustment following changes to rewards. We provide evidence for such differential sensitivity in two behavioural studies with humans. These results suggest that the successor representation is a computational substrate for semi-flexible choice in humans, introducing a subtler, more cognitive notion of habit.
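The differential sensitivity described above can be illustrated with a minimal sketch. It assumes a fixed policy, so that the successor matrix has the standard closed form M = (I − γT)⁻¹ and values are read out as V = M·r. The six-state layout, the discount factor and the reward magnitudes are hypothetical choices made for illustration and are not taken from the experiments. The sketch also assumes that the cached matrix is not relearned at all after the task changes, which is a simplification.

```python
import numpy as np

def successor_matrix(T, gamma):
    """Expected discounted future state occupancies: M = (I - gamma * T)^-1."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

def build_transitions(edges, n_states):
    """Deterministic transition matrix; states without an outgoing edge are terminal."""
    T = np.zeros((n_states, n_states))
    for s, s_next in edges.items():
        T[s, s_next] = 1.0
    return T

gamma = 0.9  # illustrative discount factor (assumption)

# Hypothetical task: two start states (0, 1), two middle states (2, 3),
# two terminal states (4, 5). Initially 0 -> 2 -> 4 and 1 -> 3 -> 5.
T_old = build_transitions({0: 2, 1: 3, 2: 4, 3: 5}, 6)
r_old = np.array([0.0, 0.0, 0.0, 0.0, 10.0, 1.0])

M_cached = successor_matrix(T_old, gamma)  # stored predictions about future states
v_old = M_cached @ r_old                   # start state 0 is preferred

# Reward revaluation: terminal rewards swap, transitions unchanged.
# Combining the cached predictions with the new rewards flips the preference.
r_new = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 10.0])
v_reward_reval = M_cached @ r_new

# Transition revaluation: now 2 -> 5 and 3 -> 4, rewards unchanged.
# The stale cached matrix still predicts the old successors, so the
# preference does not change; recomputing from the new model does flip it.
T_new = build_transitions({0: 2, 1: 3, 2: 5, 3: 4}, 6)
v_sr_stale = M_cached @ r_old
v_model_based = successor_matrix(T_new, gamma) @ r_old

print("initial:               ", v_old[:2])
print("reward reval (SR):     ", v_reward_reval[:2])
print("transition reval (SR): ", v_sr_stale[:2])
print("transition reval (MB): ", v_model_based[:2])
```

In this toy setting the cached predictions adapt immediately to new rewards but not to a new sequence of events, whereas recomputing from the updated transitions handles both.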

Fig. 1: Comparison of stored representations, computations at the decision time and behaviour across models.
Fig. 2: Schematic of retrieved representations at test and model predictions in reward and transition revaluation trials.
Fig. 3: Schematic of the design of experiment 1.
Fig. 4: Behavioural performance in a passive sequential learning task.
Fig. 5: Model fits to the phase 3 test data from the passive learning task.
Fig. 6: Schematic of the active sequential learning task.
Fig. 7: Behavioural performance in a sequential decision task.
Fig. 8: Model fits to the data from the sequential decision task.

Acknowledgements

This project was made possible by grant support through the National Institutes of Health Collaborative Research in Computational Neuroscience award 1R01MH109177, the National Institutes of Health under R. L. Kirschstein National Research Service Award 1F31MH110111-01 and the John Templeton Foundation. The authors acknowledge K. Norman and R. Otto for helpful conversations, and A. Rich and S. Tubridy for assistance with psiTurk. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies. No funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

I.M., M.M.B. and S.J.G. designed experiment 1. J.H.C. conducted and collected the data. E.M.R. and N.D.D. designed and conducted experiment 2. I.M., E.M.R. and S.J.G. analysed the data and ran model simulations. I.M., E.M.R., M.M.B., N.D.D. and S.J.G. wrote the paper.

Corresponding author

Correspondence to I. Momennejad.

Ethics declarations

Competing interests

The authors declare no competing interests.

Electronic supplementary material

Supplementary Information

Supplementary Figures 1–9, Supplementary Methods and Results.

Reporting Summary

About this article

Cite this article

Momennejad, I., Russek, E.M., Cheong, J.H. et al. The successor representation in human reinforcement learning. Nat Hum Behav 1, 680–692 (2017). https://doi.org/10.1038/s41562-017-0180-8
