Prioritized memory access explains planning and hippocampal replay

Abstract

To make decisions, animals must evaluate candidate choices by accessing memories of relevant experiences. Yet little is known about which experiences are considered or ignored during deliberation, which ultimately governs choice. We propose a normative theory predicting which memories should be accessed at each moment to optimize future decisions. Using nonlocal ‘replay’ of spatial locations in hippocampus as a window into memory access, we simulate a spatial navigation task in which an agent accesses memories of locations sequentially, ordered by utility: how much extra reward would be earned due to better choices. This prioritization balances two desiderata: the need to evaluate imminent choices versus the gain from propagating newly encountered information to preceding locations. Our theory offers a simple explanation for numerous findings about place cells; unifies seemingly disparate proposed functions of replay including planning, learning, and consolidation; and posits a mechanism whose dysfunction may underlie pathologies like rumination and craving.
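The abstract describes memory accesses ordered by their utility, meaning the extra reward expected from better choices. The sketch below is a minimal illustration of that idea, not the authors' code. It assumes that the utility of a one-step backup is the product of a gain term and a need term, the two terms named in the supplementary figure captions. The gain term is the improvement in expected value at the updated state under a softmax policy. The need term is the agent's expected discounted future occupancy of that state, given by the successor representation. The linear track, discount factor, inverse temperature, stopping threshold and all function names are illustrative assumptions. With one newly experienced reward at the end of the track, greedy prioritization yields backups ordered from the reward back toward the agent, that is, a reverse sequence that propagates the new information to preceding locations.

```python
import numpy as np

# Illustrative world (assumption): a 5-state linear track, reward on entering state 4.
N_STATES = 5
GOAL = 4              # terminal, rewarded end of the track
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
GAMMA = 0.9           # discount factor (assumed value)
BETA = 5.0            # softmax inverse temperature (assumed value)


def step(s, a):
    """Deterministic world model: returns (next_state, reward)."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    return s_next, (1.0 if s_next == GOAL else 0.0)


def softmax_policy(q_row):
    z = np.exp(BETA * (q_row - q_row.max()))
    return z / z.sum()


def successor_representation(Q):
    """Need term: M = (I - gamma * T_pi)^-1 under the current softmax policy."""
    T = np.zeros((N_STATES, N_STATES))
    for s in range(N_STATES):
        if s == GOAL:
            continue  # terminal state: no outgoing transitions
        pi = softmax_policy(Q[s])
        for a in ACTIONS:
            s_next, _ = step(s, a)
            T[s, s_next] += pi[a]
    return np.linalg.inv(np.eye(N_STATES) - GAMMA * T)


def backup_target(Q, s, a):
    """One-step Bellman backup of (s, a) using the world model."""
    s_next, r = step(s, a)
    future = 0.0 if s_next == GOAL else Q[s_next].max()
    return r + GAMMA * future


def gain(Q, s, a):
    """Value gained at s by acting on the updated rather than the old Q-values."""
    q_new = Q[s].copy()
    q_new[a] = backup_target(Q, s, a)
    pi_old, pi_new = softmax_policy(Q[s]), softmax_policy(q_new)
    return float(np.dot(pi_new - pi_old, q_new))


def replay(Q, agent_state, n_steps, min_evb=1e-6):
    """Repeatedly perform the backup with the largest gain x need (modifies Q)."""
    sequence = []
    for _ in range(n_steps):
        need = successor_representation(Q)[agent_state]
        candidates = [(s, a) for s in range(N_STATES) if s != GOAL for a in ACTIONS]
        utility = [gain(Q, s, a) * need[s] for s, a in candidates]
        best = int(np.argmax(utility))
        if utility[best] <= min_evb:
            break  # no remaining backup is worth performing
        s, a = candidates[best]
        Q[s, a] = backup_target(Q, s, a)  # full update (learning rate 1)
        sequence.append(s)
    return sequence


Q = np.zeros((N_STATES, len(ACTIONS)))
print(replay(Q, agent_state=0, n_steps=4))  # [3, 2, 1, 0]
```

Recomputing the need term after every backup matters because each backup changes the policy and, with it, the expected future occupancy of every state.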

Fig. 1: A rational model of prioritized memory access.
Fig. 2: Replay produces extended trajectories in forward and reverse directions.
Fig. 3: Forward and reverse sequences happen at different times and are modulated asymmetrically by reward.
Fig. 4: Replay over-represents agent and reward locations and predicts subsequent and past behavior.
Fig. 5: Forward and reverse sequences happen at different times and are modulated asymmetrically by reward.
Fig. 6: Replay frequency decays with familiarity and increases with experience.

Acknowledgments

We thank M. Lengyel, D. Shohamy, and D. Acosta-Kane for many helpful discussions, and we thank P.D. Rich for his comments on an earlier draft of the manuscript. We acknowledge support from NIDA through grant R01DA038891, part of the CRCNS program, and Google DeepMind. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

Author information

Contributions

Conceptualization, M.G.M. and N.D.D.; methodology, M.G.M. and N.D.D.; software, M.G.M.; simulations, M.G.M.; writing (original draft), M.G.M. and N.D.D.; writing (review & editing), M.G.M. and N.D.D.; funding acquisition, N.D.D.

Corresponding author

Correspondence to Marcelo G. Mattar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1: The need term is required for channeling backups along depth-first trajectories.

(a) Number of significant forward and reverse sequences on the open field when setting the need term to the Successor Representation (as predicted by the theory). (b) Number of significant forward and reverse sequences on the open field when setting the need term to one at every state. Notice the reduction in reverse sequences at the end of a run, due to a breadth-first pattern of value propagation.
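As a reminder of what the need term in panel (a) computes: for a fixed policy with state-transition matrix T and discount factor γ, entry M[s, s′] of the Successor Representation is the expected discounted number of future visits to s′ starting from s, so M = Σ_t γ^t T^t = (I − γT)^−1. The sketch below computes M both in closed form and as a truncated series. It assumes a 3×3 open grid, an unbiased random-walk policy and γ = 0.9; these are illustrative choices, not the simulation settings used for the figure, and the function names are hypothetical. The row of M at the agent's state plays the role of the need term, while a vector of ones corresponds to the control in panel (b).

```python
import numpy as np


def grid_random_walk(n):
    """Transition matrix of an unbiased random walk on an n x n open grid (illustrative)."""
    T = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < n and 0 <= c + dc < n]
            for rr, cc in nbrs:
                T[r * n + c, rr * n + cc] = 1.0 / len(nbrs)
    return T


def successor_representation(T, gamma):
    """Closed form: M = sum_t gamma^t T^t = (I - gamma T)^-1."""
    return np.linalg.inv(np.eye(len(T)) - gamma * T)


def sr_by_series(T, gamma, n_terms=500):
    """The same quantity as a truncated power series, showing what each entry counts."""
    M = np.zeros_like(T)
    P = np.eye(len(T))
    for _ in range(n_terms):
        M += P              # add gamma^t T^t
        P = gamma * P @ T   # advance to gamma^(t+1) T^(t+1)
    return M


T = grid_random_walk(3)
M = successor_representation(T, gamma=0.9)
agent = 0                          # agent in the top-left corner (assumption)
sr_need = M[agent]                 # need term predicted by the theory, as in panel (a)
uniform_need = np.ones(len(T))     # need fixed at one in every state, as in panel (b)
print(np.round(sr_need, 2))
```

Because every row of T sums to one, each row of M sums to 1/(1 − γ), so the SR need concentrates a fixed budget of expected visits on states the agent is likely to reach soon, whereas the uniform control weights all states equally.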

Supplementary Fig. 2: A positive need term is required for sequential value propagation.

(a) Number of significant forward and reverse sequences on the linear track (left) and on the open field (right) when setting the need term to the Successor Representation (as predicted by the theory). (b) Number of significant forward and reverse sequences on the linear track (left) and on the open field (right) when setting the need term to one at the agent’s location and zero elsewhere. Notice the complete absence of both forward and reverse sequences, due to repeated replay of actions at the agent’s location.
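Why a one-hot need term abolishes sequences can be seen in a few lines, assuming that a memory's priority is the product of its gain and need terms (the two terms named in these captions). After a newly experienced reward at the far end of a track, the backups with positive gain lie away from the agent, and a need term that is zero everywhere except the agent's location multiplies them to zero. The gain and need values below are hypothetical numbers chosen for illustration, not simulation output, and `priorities` is a hypothetical helper.

```python
import numpy as np


def priorities(gain, need):
    """Priority of backing up each state: elementwise gain x need."""
    return np.asarray(gain, dtype=float) * np.asarray(need, dtype=float)


# Hypothetical gains on a 5-state track just after a reward at the far end (state 4):
# only the transition into the goal, from state 3, has changed value so far.
gain = np.array([0.0, 0.0, 0.0, 0.5, 0.0])
agent = 0

one_hot_need = np.eye(5)[agent]                       # 1 at the agent's location, 0 elsewhere
positive_need = np.array([1.9, 1.3, 0.9, 0.6, 0.3])   # hypothetical strictly positive need

print(priorities(gain, one_hot_need))   # all zeros: the useful backup at state 3 is never favored
print(priorities(gain, positive_need))  # nonzero only at state 3, where propagation can begin
```

With every priority at zero, nothing favors the backup at state 3 over backups at the agent's own location, so value never starts to propagate backward from the reward and no sequence can form.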

Supplementary Fig. 3: Replay over-represents reward locations in both reverse and forward sequences.

(a) Probability that a state is part of a forward sequence in a given episode. Because the starting location is random, the observed results cannot be explained by initiation bias. Notice also that locations corresponding to the final turn toward the reward were emphasized even more than locations nearer the reward itself, a consequence of the gain term being higher where a backup has a greater effect on behavior. (b) Probability that a state is part of a reverse sequence in a given episode. Because the starting location is random, the concentration of reverse replay near the reward cannot be explained by initiation bias alone. The higher activation probability for reverse events, however, reflects a combination of a reward-location bias and initiation bias, since reverse sequences tend to start near the reward, where the agent usually is.
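Panel (a) attributes the emphasis on the final turn to a gain term that is higher where a backup changes behavior more. A minimal sketch of that effect, assuming a softmax policy and a gain of Σ_a [π_new(a|s) − π_old(a|s)] q_new(s, a), the expected value, under the updated action values, of switching from the old to the new policy. The two-action Q-values and the inverse temperature are hypothetical. A comparable value update produces a large gain at a choice point, where it flips the preferred action, and almost none in a corridor, where the rewarded action was already preferred.

```python
import numpy as np


def softmax(q, beta):
    z = np.exp(beta * (q - np.max(q)))
    return z / z.sum()


def gain(q_old, q_new, beta=5.0):
    """Expected value, under q_new, of switching from the old to the new softmax policy."""
    q_old, q_new = np.asarray(q_old, dtype=float), np.asarray(q_new, dtype=float)
    return float(np.dot(softmax(q_new, beta) - softmax(q_old, beta), q_new))


# Hypothetical choice point: the backup reveals that the non-preferred turn leads to reward.
at_turn = gain(q_old=[0.5, 0.0], q_new=[0.5, 1.0])
# Hypothetical corridor: the rewarded action was already strongly preferred.
in_corridor = gain(q_old=[0.0, 0.9], q_new=[0.0, 1.0])
print(at_turn, in_corridor)  # the gain at the turn is far larger
```

Because the gain measures a change in choices rather than a change in stored value, locations where a single decision determines whether the reward is reached can outrank locations closer to the reward itself.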

Supplementary Fig. 4: Differential modulation of reverse replay in increased (or decreased) reward conditions.

(a) Increased-reward condition, in which the reward encountered by the agent was four times larger in half of the episodes. Compared with the baseline (1x-1x) reward setting (left), there were fewer significant reverse sequences for 1x rewards (middle) and more significant reverse sequences for 4x rewards (right). The results presented in Fig. 5c correspond to the ratio between the two red bars (4x/1x). (b) Decreased-reward condition, in which the reward encountered by the agent was zero in half of the episodes. Compared with the baseline (1x-1x) reward setting (left), there were more significant reverse sequences for 1x rewards (middle) and fewer significant reverse sequences for 0x rewards (right). The results presented in Fig. 5e correspond to the ratio between the two blue bars (0x/1x).

About this article

Cite this article

Mattar, M.G., Daw, N.D. Prioritized memory access explains planning and hippocampal replay. Nat Neurosci 21, 1609–1617 (2018). https://doi.org/10.1038/s41593-018-0232-z
