Article

Prioritized memory access explains planning and hippocampal replay

Nature Neuroscience volume 21, pages 1609–1617 (2018)

Abstract

To make decisions, animals must evaluate candidate choices by accessing memories of relevant experiences. Yet little is known about which experiences are considered or ignored during deliberation, which ultimately governs choice. We propose a normative theory predicting which memories should be accessed at each moment to optimize future decisions. Using nonlocal ‘replay’ of spatial locations in hippocampus as a window into memory access, we simulate a spatial navigation task in which an agent accesses memories of locations sequentially, ordered by utility: how much extra reward would be earned due to better choices. This prioritization balances two desiderata: the need to evaluate imminent choices versus the gain from propagating newly encountered information to preceding locations. Our theory offers a simple explanation for numerous findings about place cells; unifies seemingly disparate proposed functions of replay including planning, learning, and consolidation; and posits a mechanism whose dysfunction may underlie pathologies like rumination and craving.
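The prioritization described in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that the utility of accessing a memory is the product of a gain term (how much the expected reward at a state improves if the memory updates the policy there) and a need term (how often the agent is expected to visit that state in the future). The four-state track, the softmax policy, the parameter values, and every function name below are hypothetical choices and are not taken from the article.

```python
import numpy as np

GAMMA = 0.9  # discount factor (assumed value)
BETA = 5.0   # softmax inverse temperature (assumed value)

def softmax(q, beta=BETA):
    """Action probabilities for one state's action values."""
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

def gain(Q, s, a, target):
    """Extra reward expected at state s if Q[s, a] is replaced by target."""
    q_new = Q[s].copy()
    q_new[a] = target
    return float(np.dot(softmax(q_new) - softmax(Q[s]), q_new))

def need(T, s_now, gamma=GAMMA):
    """Discounted expected future visits to each state, starting from s_now."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)[s_now]

def replay(Q, T, memories, s_now, n_steps):
    """Repeatedly access the memory with the highest gain * need."""
    Q = Q.copy()
    need_now = need(T, s_now)
    order = []
    for _ in range(n_steps):
        def utility(m):
            s, a, r, s2 = m
            return gain(Q, s, a, r + GAMMA * Q[s2].max()) * need_now[s]
        s, a, r, s2 = best = max(memories, key=utility)
        Q[s, a] = r + GAMMA * Q[s2].max()  # one-step value backup
        order.append(best)
    return order, Q

# Hypothetical linear track: states 0..3, action 1 = "move right",
# reward 1 on entering state 3, which ends the run.
T = np.zeros((4, 4))
T[0, 1] = T[1, 2] = T[2, 3] = 1.0  # transitions under a rightward policy
Q0 = np.zeros((4, 2))
memories = [(0, 1, 0.0, 1), (1, 1, 0.0, 2), (2, 1, 1.0, 3)]  # (s, a, r, s')

order, Q = replay(Q0, T, memories, s_now=0, n_steps=3)
print([m[0] for m in order])
```

In this toy setting the memories are accessed at states 2, 1, and 0 in turn: only the transition that reached the reward has nonzero gain at first, and each backup creates gain at the preceding state, so value propagates backward from the reward in a reverse sequence.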

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgments

We thank M. Lengyel, D. Shohamy, and D. Acosta-Kane for many helpful discussions, and we thank P.D. Rich for his comments on an earlier draft of the manuscript. We acknowledge support from NIDA through grant R01DA038891, part of the CRCNS program, and Google DeepMind. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

Author information

Affiliations

  1. Princeton Neuroscience Institute, Princeton University, Princeton, NJ, USA

    • Marcelo G. Mattar & Nathaniel D. Daw
  2. Department of Psychology, Princeton University, Princeton, NJ, USA

    • Nathaniel D. Daw

Contributions

Conceptualization, M.G.M. and N.D.D.; methodology, M.G.M. and N.D.D.; software, M.G.M.; simulations, M.G.M.; writing (original draft), M.G.M. and N.D.D.; writing (review & editing), M.G.M. and N.D.D.; funding acquisition, N.D.D.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Marcelo G. Mattar.

Integrated supplementary information

  1. Supplementary Fig. 1 The need term is required for channeling backups along depth-first trajectories.

    (a) Number of significant forward and reverse sequences on the open field when setting the need term to the Successor Representation (as predicted by the theory). (b) Number of significant forward and reverse sequences on the open field when setting the need term to be one at every state. Notice the reduction of reverse sequences at the end of a run due to a breadth-first pattern of value propagation.

  2. Supplementary Fig. 2 A positive need term is required for sequential value propagation.

    (a) Number of significant forward and reverse sequences on the linear track (left) and on the open field (right) when setting the need term to the Successor Representation (as predicted by the theory). (b) Number of significant forward and reverse sequences on the linear track (left) and on the open field (right) when setting the need term to be one at the agent’s location and zero elsewhere. Notice the complete absence of both forward and reverse sequences due to repeated replay of actions at the agent’s location.

  3. Supplementary Fig. 3 Replay over-represents reward locations in both reverse and forward sequences.

    (a) Probability that a state is part of a forward sequence at a given episode. Notice that because of the random starting location, the observed results cannot be explained by initiation bias. Notice also that locations corresponding to the final turn toward the reward were emphasized even more than locations nearer the reward itself, a consequence of the gain term being higher where there is a greater effect on behavior. (b) Probability that a state is part of a reverse sequence at a given episode. Notice that because of the random starting location, the observed results cannot be explained by initiation bias. Reverse replay tends to concentrate near the reward. Notice that the higher activation probability for reverse events is due to a combination of a reward-location bias and initiation bias, given that reverse sequences tend to start near the reward, where the agent usually is.

  4. Supplementary Fig. 4 Differential modulation of reverse replay in increased (or decreased) reward conditions.

    (a) Increased reward condition where the reward encountered by the agent was four times larger in half of the episodes. In comparison to the baseline (1x-1x) reward setting (left), there was a lower number of significant reverse sequences for 1x rewards (middle) and a greater number of significant reverse sequences for 4x rewards (right). The results presented in Fig. 5c correspond to a ratio between the two red bars (4x/1x). (b) Decreased reward condition where the reward encountered by the agent was zero in half of the episodes. In comparison to the baseline (1x-1x) reward setting (left), there was a greater number of significant reverse sequences for 1x rewards (middle) and a lower number of significant reverse sequences for 0x rewards (right). The results presented in Fig. 5e correspond to a ratio between the two blue bars (0x/1x).
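
Supplementary Figs. 1 and 2 compare three settings of the need term: the Successor Representation, a value of one at every state, and a value of one at the agent’s location with zero elsewhere. A rough sketch of these three vectors follows. It assumes the standard closed form of the Successor Representation for a fixed transition matrix T, namely the matrix inverse of (I − γT), and uses a hypothetical four-state track; the function name, dictionary keys, and discount factor are illustrative and do not come from the article.

```python
import numpy as np

def need_variants(T, s_now, gamma=0.9):
    """Three candidate need vectors for an agent located at s_now."""
    n = T.shape[0]
    # Successor Representation row: discounted expected future visits.
    sr = np.linalg.inv(np.eye(n) - gamma * T)[s_now]
    # Need of one at every state.
    uniform = np.ones(n)
    # Need of one at the agent's location and zero elsewhere.
    local = np.zeros(n)
    local[s_now] = 1.0
    return {"successor_representation": sr, "uniform": uniform, "local_only": local}

# Hypothetical track on which the agent moves 0 -> 1 -> 2 -> 3 and then stops.
T = np.zeros((4, 4))
T[0, 1] = T[1, 2] = T[2, 3] = 1.0

variants = need_variants(T, s_now=0)
for name, vec in variants.items():
    print(name, np.round(vec, 3))
```

With the local-only vector, any backup away from the agent’s location is multiplied by zero, which is consistent with the absence of sequences described for Supplementary Fig. 2b. The uniform vector removes any preference for states the agent is likely to visit soon.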

About this article

DOI

https://doi.org/10.1038/s41593-018-0232-z