Abstract
Neural replay is implicated in planning, where states relevant to a task goal are rapidly reactivated in sequence. It remains unclear whether, during planning, replay relates to an actual prospective choice. Here, using magnetoencephalography (MEG), we studied replay in human participants while they planned to either approach or avoid an uncertain environment containing paths leading to reward or punishment. We find evidence for forward sequential replay during planning, with rapid state-to-state transitions from 20 to 90 ms. Replay of rewarding paths was boosted, relative to aversive paths, before a decision to avoid and attenuated before a decision to approach. A trial-by-trial bias toward replaying prospective punishing paths predicted irrational decisions to approach riskier environments, an effect more pronounced in participants with higher trait anxiety. The findings indicate a coupling of replay with planned behavior, where replay prioritizes an online representation of a worst-case scenario for approaching or avoiding.
Data availability
Data are freely available on the Open Science Framework: https://osf.io/6ndu9/. Source data are provided with this paper.
Code availability
All code for the experimental paradigm and analysis pipeline is freely available on GitHub: https://github.com/jjmcfadyen/approach-avoid-replay.
Acknowledgements
We thank T. Wise and P. Sharp for their helpful discussions about the study design. This work is supported by the Wellcome Trust (098362/A/12/Z and 091593/Z/10/Z supporting R.J.D. and J.M., respectively). Y.L. is supported by the National Science and Technology Innovation 2030 Major Program (2022ZD0205500) and the National Natural Science Foundation of China (32271093). The Max Planck University College London Centre for Computational Psychiatry and Ageing Research is a joint initiative supported by University College London and the Max Planck Society. The Wellcome Centre for Human Neuroimaging is supported by core funding from the Wellcome Trust (203147/Z/16/Z). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Authors and Affiliations
Contributions
J.M. designed the experiment with input from Y.L. J.M. collected data, and J.M. and Y.L. wrote the analysis code. J.M. and Y.L. interpreted data with input from R.J.D. J.M. wrote the manuscript with input and edits from Y.L. and R.J.D.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review information
Nature Neuroscience thanks Philippe Albouy and Matthijs van der Meer for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Experimental paradigm.
(a) All 12 stimuli used in the experiment. Six stimuli were pseudo-randomly allocated to each participant, per session, ensuring equal allocation of each stimulus. (b) For visualisation purposes, stimuli are represented by letters: A, B, and C for path 1, and D, E, and F for path 2. In an initial functional localiser in session 2, participants viewed an image and then reported the correct label (left or right). Correct and incorrect responses produced a green or red fixation cross, respectively. (c) To learn the image order, participants selected either Path 1 or Path 2 (forced-choice, 2 selections of each) and then observed an animation of the sequence of states along each path. Participants were then tested on their memory for the image order. (d) After image learning, participants observed the animated sequences again, but this time with the value of each state displayed underneath each image. Participants were tested on their memory for the values associated with each state, as well as their ability to calculate the cumulative sum at different states. (e) Two protocols were used, counterbalanced across participants. One such protocol is shown. The designation of the odd rule to one state per path (first row) dictated the final value of each path (second row). One path was mostly negative (here, starting with path 1; circles) and the other mostly positive (here, starting with path 2; triangles), with the exception of catch trials (marked in yellow). This tendency switched halfway through the experiment. The transition probability (third row) to each path, in combination with the path value, dictated the expected value of approaching (fourth row). Expected value was the sum of each path's value weighted by its probability. When the expected value was greater than 1 (the guaranteed outcome of avoiding), participants should approach (green); otherwise, participants should avoid (red).
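The decision rule in panel (e) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the path values and transition probabilities shown are made up for the example.

```python
# Hypothetical sketch of the expected-value rule described above.
def expected_value(path_values, path_probs):
    """Sum of each path's value weighted by its transition probability."""
    return sum(v * p for v, p in zip(path_values, path_probs))

def should_approach(path_values, path_probs, avoid_value=1):
    """Approach when expected value exceeds the guaranteed avoid outcome (1)."""
    return expected_value(path_values, path_probs) > avoid_value

# Example: a mostly punishing path 1 (-4) and a rewarding path 2 (+6),
# with a 30%/70% transition split.
ev = expected_value([-4, 6], [0.3, 0.7])   # -1.2 + 4.2 = 3.0 -> approach
```

Under these illustrative numbers the expected value (3.0) exceeds the avoid outcome (1), so the rational choice is to approach; flipping the transition split toward the punishing path reverses the decision.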
Extended Data Fig. 2 Classifier confusion matrices.
(a) Temporal generalisation matrix for classifiers trained (x axis) and tested (y axis) across all time points. Five-fold cross-validated accuracy is displayed, averaged across all participants (N = 26) and all six states. (b) Confusion matrices for all six states (x axis = trained, y axis = tested) per classifier training time. Five-fold cross-validated accuracy is averaged across all participants (N = 26).
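The confusion matrices in panel (b) summarise how cross-validated predictions distribute over the six states. A minimal sketch of that bookkeeping step (our construction, not the authors' pipeline; the example labels are made up) is:

```python
import numpy as np

# Build a state-by-state confusion matrix from cross-validated predictions:
# entry [i, j] is the proportion of test trials of true state j that were
# classified as state i, so the diagonal gives per-state accuracy.
def confusion_matrix(true_states, predicted_states, n_states):
    cm = np.zeros((n_states, n_states))
    for t, p in zip(true_states, predicted_states):
        cm[p, t] += 1
    col_totals = cm.sum(axis=0, keepdims=True)
    return cm / np.maximum(col_totals, 1)  # column-normalise to proportions

# Toy example with three states and six held-out trials.
true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(true, pred, n_states=3)
```

Each column sums to 1, so off-diagonal mass directly shows which states the classifier confuses with one another.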
Extended Data Fig. 3 Extending replay intervals from 600 ms to 3 seconds.
(a) Evidence for forwards-minus-backwards sequenceness (averaged across subjects, with shaded standard error of the mean) during planning, for state-to-state intervals from 0 to 3 seconds in steps of 10 ms. Only replay within the original analysis window of lags from 0 to 600 ms exceeded the significance threshold (dashed horizontal line). (b) Same as A, except that only sequenceness for lags from 600 ms to 3 seconds was considered in the permutation threshold, giving a more conservative significance threshold (by excluding shorter lags known to produce stronger sequenceness estimates). No sequenceness estimates exceeded the significance threshold.
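The permutation threshold referred to here follows the TDLM convention (Liu et al., eLife 2021): the threshold is a high percentile, across shuffled transition matrices, of the peak absolute sequenceness over all lags. A hedged sketch with placeholder data (the arrays stand in for real sequenceness estimates) is:

```python
import numpy as np

# TDLM-style permutation threshold: for each shuffled transition matrix, take
# the peak |sequenceness| across all lags; the 95th percentile of those peaks
# is the significance threshold for the observed curve.
rng = np.random.default_rng(0)
n_perms, n_lags = 100, 60
observed = rng.normal(0, 1, n_lags)          # placeholder: sequenceness per lag
null = rng.normal(0, 1, (n_perms, n_lags))   # placeholder: permutation results

peak_per_perm = np.abs(null).max(axis=1)     # peak |sequenceness| per shuffle
threshold = np.percentile(peak_per_perm, 95)
significant_lags = np.where(np.abs(observed) > threshold)[0]
```

Taking the maximum over lags within each permutation controls for multiple comparisons across lags, which is why restricting the window (as in panel b) changes the threshold.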
Extended Data Fig. 4 Modulation of path replay by experience and expectations.
(a) Replay strength (y axis) during planning as predicted by a model containing path type (reward or loss), path experience (x axis), and path transition probability (darker lines indicate higher transition probability). Evidence of rewarding path replay increased when rewarding paths had not been experienced for longer, whereas the opposite was true for punishing paths. This was most prominent when rewarding paths were more likely to be transitioned to. (b) Replay strength (y axis) during planning was not significantly predicted by a model containing path probability (x axis) and choice (approach or avoid).
Extended Data Fig. 5 Behavioural modelling.
(a) Model specificity, as shown by the proportion of times each fitted model (rows) was the best fit for data simulated by different models (columns). (b) Expected task performance of different strategies, as indexed by model fit (BIC) to rational responses. (c) Group-level estimates of model fit. BIC scores (y axis) are shown for each model (x axis), where each BIC has been subtracted from that of the worst-performing model so that higher numbers indicate better fit. (d) Pie chart of the winning model per subject. Solid outlines indicate 'mental arithmetic' models, dashed outlines indicate 'learn path value' models, and dotted outlines indicate 'learn odd rule positions' models.
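The relative-BIC scoring in panel (c) can be sketched as follows. The model names, log-likelihoods, and parameter counts below are made up for illustration; only the arithmetic (BIC, then subtraction from the worst model) reflects the caption.

```python
import numpy as np

# BIC = k * ln(n) - 2 * logL; lower raw BIC is better, so subtracting each
# score from the worst model's score makes higher values indicate better fit.
def bic(log_likelihood, n_params, n_obs):
    return n_params * np.log(n_obs) - 2 * log_likelihood

# Hypothetical (log-likelihood, n_params) per model family.
models = {"mental_arithmetic": (-110.0, 3),
          "learn_path_value": (-120.0, 4),
          "learn_odd_rule": (-140.0, 4)}
n_trials = 200

scores = {name: bic(ll, k, n_trials) for name, (ll, k) in models.items()}
worst = max(scores.values())
relative = {name: worst - s for name, s in scores.items()}  # higher = better
```

The worst model scores 0 by construction, and the penalty term `k * ln(n)` guards against favouring models simply because they have more parameters.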
Extended Data Fig. 6 Overall reactivation of states along reward and loss paths during planning.
To estimate overall state reactivation, we conducted a GLM on the probabilistic state reactivation time series per trial, comparing overall reactivation of rewarding vs punishing paths. The resultant beta coefficients for each path's overall state reactivation were entered into two linear mixed effects models: model 9 predicting state reactivation [Reactivation ~ (Choice × Path Type) + RT + (1 | Subject)] and model 10 predicting sequenceness [Sequenceness ~ (Choice × Path Type) + Reactivation + RT + (1 | Subject / Lag)]. Note that trials with state reactivation coefficients more than 5 standard deviations from the group mean were excluded (0.03% of trials). (a) Estimated marginal means from the linear mixed effects model (on N = 24 participants), where choice and path type (rewarding or punishing, green and red respectively) predicted the state reactivation coefficient. There was significantly greater (p = 0.028) reactivation for rewarding than punishing paths (significance given by a two-tailed statistic using a Satterthwaite approximation; error bars indicate 95% confidence intervals; same for B and C). (b) Similar model to A, except that this model predicted sequenceness, with state reactivation included as a fixed effect. An interaction between choice and path type on sequenceness was significant (p = 3.023E-6), even when controlling for reactivation. (c) Same model as B but showing the effect of state reactivation on sequenceness (p = 8.542E-14). * p < .05, ** p < .01, *** p < .001.
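Model 9 above can be sketched in Python with statsmodels (a stand-in for whatever package the authors used; the data below are simulated, and the nested random effect of model 10, (1 | Subject / Lag), is simplified here to a subject-level random intercept):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated trial-level data for a mixed-effects sketch of
# Reactivation ~ (Choice x PathType) + RT + (1 | Subject).
rng = np.random.default_rng(1)
n = 240
df = pd.DataFrame({
    "Reactivation": rng.normal(size=n),
    "Choice": rng.choice(["approach", "avoid"], size=n),
    "PathType": rng.choice(["reward", "punish"], size=n),
    "RT": rng.uniform(0.5, 3.0, size=n),
    "Subject": rng.integers(1, 25, size=n),
})

# Choice * PathType expands to both main effects plus their interaction;
# groups= gives each subject its own random intercept.
model = smf.mixedlm("Reactivation ~ Choice * PathType + RT",
                    data=df, groups=df["Subject"])
result = model.fit()
```

With real data, the fixed-effect estimates for the Choice × PathType interaction correspond to the effects plotted in panels (a) and (b).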
Supplementary information
Supplementary Information
Supplementary Methods and Tables 1–3
Source data
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
McFadyen, J., Liu, Y. & Dolan, R.J. Differential replay of reward and punishment paths predicts approach and avoidance. Nat Neurosci 26, 627–637 (2023). https://doi.org/10.1038/s41593-023-01287-7