Prioritized memory access explains planning and hippocampal replay

Mattar, Marcelo G.; Daw, Nathaniel D.

doi:10.1038/s41593-018-0232-z

Article
Published: 22 October 2018

Nature Neuroscience volume 21, pages 1609–1617 (2018)

Abstract

To make decisions, animals must evaluate candidate choices by accessing memories of relevant experiences. Yet little is known about which experiences are considered or ignored during deliberation, which ultimately governs choice. We propose a normative theory predicting which memories should be accessed at each moment to optimize future decisions. Using nonlocal ‘replay’ of spatial locations in hippocampus as a window into memory access, we simulate a spatial navigation task in which an agent accesses memories of locations sequentially, ordered by utility: how much extra reward would be earned due to better choices. This prioritization balances two desiderata: the need to evaluate imminent choices versus the gain from propagating newly encountered information to preceding locations. Our theory offers a simple explanation for numerous findings about place cells; unifies seemingly disparate proposed functions of replay including planning, learning, and consolidation; and posits a mechanism whose dysfunction may underlie pathologies like rumination and craving.
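
The prioritization described above can be illustrated with a minimal sketch. It assumes that the utility of accessing a memory is the product of a gain term (how much the resulting policy change improves expected return at that state) and a need term (discounted expected future visits to that state). The five-state linear track, the softmax policy, and the parameter values `GAMMA` and `BETA` are hypothetical choices made for illustration, not the simulation settings used in the article.

```python
import math

GAMMA = 0.9          # discount factor (assumed value)
BETA = 5.0           # softmax inverse temperature (assumed value)
N_STATES = 5         # hypothetical linear track: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)   # step left / step right

def step(s, a):
    """Deterministic world model: next state and reward for taking a in s."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

def softmax(qs):
    m = max(qs)
    exps = [math.exp(BETA * (q - m)) for q in qs]
    z = sum(exps)
    return [e / z for e in exps]

def need(q, start, horizon=50):
    """Discounted expected future occupancy of each state, starting from
    `start` and following the current softmax policy (truncated sum)."""
    occ = [0.0] * N_STATES
    dist = [1.0 if s == start else 0.0 for s in range(N_STATES)]
    for t in range(horizon):
        for s in range(N_STATES):
            occ[s] += (GAMMA ** t) * dist[s]
        nxt = [0.0] * N_STATES
        for s in range(N_STATES - 1):          # the goal state is terminal
            for p, a in zip(softmax(q[s]), ACTIONS):
                nxt[step(s, a)[0]] += dist[s] * p
        dist = nxt
    return occ

def gain(q, s, ai):
    """Expected return gained at s from the policy change that a one-step
    backup of (s, ACTIONS[ai]) would cause. Also returns the backup target."""
    s2, r = step(s, ACTIONS[ai])
    terminal = s2 == N_STATES - 1
    target = r + (0.0 if terminal else GAMMA * max(q[s2]))
    new_q = list(q[s])
    new_q[ai] = target
    old_pi, new_pi = softmax(q[s]), softmax(new_q)
    g = sum((pn - po) * qn for pn, po, qn in zip(new_pi, old_pi, new_q))
    return g, target

def replay(q, agent_state, n_backups):
    """Repeatedly apply the backup with the highest utility = gain * need."""
    order = []
    for _ in range(n_backups):
        nd = need(q, agent_state)
        best = None
        for s in range(N_STATES - 1):
            for ai in range(len(ACTIONS)):
                g, target = gain(q, s, ai)
                utility = g * nd[s]
                if best is None or utility > best[0]:
                    best = (utility, s, ai, target)
        utility, s, ai, target = best
        if utility <= 1e-12:                   # nothing left worth replaying
            break
        q[s][ai] = target
        order.append((s, ACTIONS[ai]))
    return order

q = [[0.0, 0.0] for _ in range(N_STATES)]
order = replay(q, agent_state=0, n_backups=10)
print(order)  # [(3, 1), (2, 1), (1, 1), (0, 1)]
```

In this toy setting the selected backups start next to the reward and step backward toward the agent, which is one way a reverse sequence can fall out of ordering memories by utility. The stopping threshold and the one-step backup rule are simplifications of this sketch.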

**Fig. 1: A rational model of prioritized memory access.**

**Fig. 2: Replay produces extended trajectories in forward and reverse directions.**

**Fig. 3: Forward and reverse sequences happen at different times and are modulated asymmetrically by reward.**

**Fig. 4: Replay over-represents agent and reward locations and predicts subsequent and past behavior.**

**Fig. 5: Forward and reverse sequences happen at different times and are modulated asymmetrically by reward.**

**Fig. 6: Replay frequency decays with familiarity and increases with experience.**

Acknowledgments

We thank M. Lengyel, D. Shohamy, and D. Acosta-Kane for many helpful discussions, and we thank P.D. Rich for his comments on an earlier draft of the manuscript. We acknowledge support from NIDA through grant R01DA038891, part of the CRCNS program, and Google DeepMind. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

Author information

Authors and Affiliations

Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA
Marcelo G. Mattar & Nathaniel D. Daw
Department of Psychology, Princeton University, Princeton, NJ, USA
Nathaniel D. Daw

Contributions

Conceptualization, M.G.M. and N.D.D.; methodology, M.G.M. and N.D.D.; software, M.G.M.; simulations, M.G.M.; writing (original draft), M.G.M. and N.D.D.; writing (review & editing), M.G.M. and N.D.D.; funding acquisition, N.D.D.

Corresponding author

Correspondence to Marcelo G. Mattar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1: The need term is required for channeling backups along depth-first trajectories.

(a) Number of significant forward and reverse sequences on the open field when setting the need term to the Successor Representation (as predicted by the theory). (b) Number of significant forward and reverse sequences on the open field when setting the need term to be one at every state. Notice the reduction of reverse sequences at the end of a run due to a breadth-first pattern of value propagation.

Supplementary Fig. 2: A positive need term is required for sequential value propagation.

(a) Number of significant forward and reverse sequences on the linear track (left) and on the open field (right) when setting the need term to the Successor Representation (as predicted by the theory). (b) Number of significant forward and reverse sequences on the linear track (left) and on the open field (right) when setting the need term to be one at the agent’s location and zero elsewhere. Notice the complete absence of both forward and reverse sequences due to repeated replay of actions at the agent’s location.
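
The three need-term settings compared in Supplementary Figs. 1 and 2 can be written out side by side. The sketch below is illustrative only: the four-state track, the fixed "always step right" transition matrix `T`, and the discount `gamma=0.9` are hypothetical, and the Successor Representation is approximated by a truncated discounted sum rather than computed as in the article.

```python
def successor_need(T, start, gamma=0.9, horizon=500):
    """Row `start` of the successor representation, sum_t gamma^t T^t,
    approximated by truncating the sum at `horizon` steps."""
    n = len(T)
    dist = [1.0 if s == start else 0.0 for s in range(n)]
    need = [0.0] * n
    discount = 1.0
    for _ in range(horizon):
        need = [nd + discount * d for nd, d in zip(need, dist)]
        dist = [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]
        discount *= gamma
    return need

def uniform_need(n):
    """Need set to one at every state (Supplementary Fig. 1b setting)."""
    return [1.0] * n

def local_need(n, agent):
    """Need set to one at the agent's location and zero elsewhere
    (Supplementary Fig. 2b setting)."""
    return [1.0 if s == agent else 0.0 for s in range(n)]

# Hypothetical 4-state track; the policy always steps right and the
# last state is terminal (its row is all zeros).
T = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]

sr = successor_need(T, start=0)
print([round(x, 3) for x in sr])   # [1.0, 0.9, 0.81, 0.729]
print(uniform_need(4))             # [1.0, 1.0, 1.0, 1.0]
print(local_need(4, agent=0))      # [1.0, 0.0, 0.0, 0.0]
```

If utility is taken to be gain multiplied by need, the last setting gives every state other than the agent's own a utility of zero, which is consistent with the caption's observation that replay then stays at the agent's location.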

Supplementary Fig. 3: Replay over-represents reward locations in both reverse and forward sequences.

(a) Probability that a state is part of a forward sequence at a given episode. Notice that because of the random starting location, the observed results cannot be explained by initiation bias. Notice also that locations corresponding to the final turn toward the reward were emphasized even more than locations nearer the reward itself, a consequence of the gain term being higher where there is a greater effect on behavior. (b) Probability that a state is part of a reverse sequence at a given episode. Notice that because of the random starting location, the observed results cannot be explained by initiation bias. Reverse replay tends to concentrate near the reward. Notice that the higher activation probability for reverse events is due to a combination of a reward-location bias and initiation bias, given that reverse sequences tend to start near the reward, where the agent usually is.

Supplementary Fig. 4: Differential modulation of reverse replay in increased (or decreased) reward conditions.

(a) Increased reward condition where the reward encountered by the agent was four times larger in half of the episodes. In comparison to the baseline (1x-1x) reward setting (left), there was a lower number of significant reverse sequences for 1x rewards (middle) and a greater number of significant reverse sequences for 4x rewards (right). The results presented in Fig. 5c correspond to a ratio between the two red bars (4x/1x). (b) Decreased reward condition where the reward encountered by the agent was zero in half of the episodes. In comparison to the baseline (1x-1x) reward setting (left), there was a greater number of significant reverse sequences for 1x rewards (middle) and a lower number of significant reverse sequences for 0x rewards (right). The results presented in Fig. 5e correspond to a ratio between the two blue bars (0x/1x).

About this article

Cite this article

Mattar, M.G., Daw, N.D. Prioritized memory access explains planning and hippocampal replay. Nat Neurosci 21, 1609–1617 (2018). https://doi.org/10.1038/s41593-018-0232-z

Received: 09 January 2018
Accepted: 17 August 2018
Published: 22 October 2018
Issue Date: November 2018
DOI: https://doi.org/10.1038/s41593-018-0232-z
