The rodent hippocampus spontaneously generates bursts of neural activity (replay) that can depict spatial trajectories to reward locations, suggesting a role in model-based behavioral control. A largely separate literature emphasizes reward revaluation as the litmus test for such control, yet the content of hippocampal replay under revaluation conditions is unknown. We examined the content of awake replay events following motivational shifts between hunger and thirst. On a T-maze offering free choice between food and water outcomes, rats shifted their behavior toward the restricted outcome, but replay content was shifted away from the restricted outcome. This effect preceded experience on the task each day and did not reverse with experience. These results demonstrate that replay content is not limited to reflecting recent experience or trajectories toward the preferred goal and suggest a role for motivational states in determining replay content.
Data files including metadata are publicly available on DataLad (http://datasets.datalad.org/?dir=/labs/mvdm/, ‘MotivationalT’ dataset).
All analyses were performed using MATLAB 2017a and can be reproduced using code available on our public GitHub repository (http://github.com/vandermeerlab/papers).
We thank N. Gibson, M. Ryan and J. Flanagan for animal care and M.-C. Kuo, J. Espinosa and E. Carmichael for technical assistance. We thank E. Grant for developing the SWR detection method used for the main analyses in this paper. This work was supported by the University of Waterloo and Dartmouth College (start-up funds to M.v.d.M.), the Netherlands Organization for Scientific Research (grant no. 863.10.013 to M.v.d.M.) and the Human Frontiers Science Project (grant no. RGY0088/2014).
The authors declare no competing interests.
Peer review information: Nature Neuroscience thanks Daojun Ji and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Fig. 1 Schematic of SWR content analysis using sequenceless decoding (a) and sequence-based decoding (b).
In the sequenceless decoding analysis (a), each candidate SWR event is decoded as a single time bin, using joint tuning curves containing left and right trial data. This analysis produces two outputs reported in the paper (drawn as cyan diamond shapes): the z-scored left vs. right log odds averaged across all events, and a left vs. right count ratio of significant events only. In contrast, sequence-based decoding (b) first decodes all data in a given session for left and right trial tuning curves separately, using a 25 ms moving window (step size, 5 ms). Next, sequences are detected in the left and right posteriors separately, before removing sequences that (1) did not overlap with a candidate SWR event, (2) were not sufficiently distinct from the number of sequences obtained from a random resampling procedure, or (3) were detected for both left and right.
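The 25 ms moving window with a 5 ms step used in sequence-based decoding can be illustrated with a minimal sketch. The function name `sliding_window_counts` and its parameters are hypothetical and are not taken from the authors' MATLAB code. Times are in milliseconds, and only the spike counting step for a single cell is shown, not the decoding itself.

```python
def sliding_window_counts(spike_times_ms, t_start_ms, t_end_ms,
                          window_ms=25, step_ms=5):
    """Count spikes of one cell in overlapping windows [t, t + window_ms)."""
    # Number of windows that fit entirely inside [t_start_ms, t_end_ms]
    n_windows = (t_end_ms - t_start_ms - window_ms) // step_ms + 1
    counts = []
    for k in range(max(n_windows, 0)):
        lo = t_start_ms + k * step_ms
        hi = lo + window_ms
        counts.append(sum(1 for t in spike_times_ms if lo <= t < hi))
    return counts


# Toy spike train (ms); six windows start at 0, 5, ..., 25 ms.
example_counts = sliding_window_counts([3.0, 12.0, 27.0, 40.0], 0, 50)
```

Because consecutive windows overlap by 20 ms, neighboring decoded time bins share most of their spikes, which smooths the decoded posterior over time.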
Each SWR event (top left) is converted into a vector of spike counts and decoded into a joint probability distribution that includes both left and right trajectories of the T-maze. This probability distribution yields a left vs. right log odds score for each event. Because any left or right bias in this score may simply be due to unequal distributions of the number of place cells, average firing rates, and so on, this raw log odds score is compared to a distribution of log odds obtained from 1000 permutations of left and right trial tuning curves used in the decoding. This comparison yields a log odds z-score for each event, which is either averaged across all events in a session (bottom right figure and Fig. 2b), or thresholded to keep only significant events to yield a proportion of left events (Fig. 2c). To determine if SWR content is related to motivational state, both measures are averaged across food-restriction (fr) and water-restriction (wr) sessions, and the resulting values (black vertical bars in bottom right plot; dots indicate single sessions) compared to a distribution of averages obtained from randomly permuting food- and water-restriction labels across sessions (top right; gray bars indicate mean and SD of this shuffled distribution). Thus, this analysis uses two independent bootstraps: the first ‘tuning curve shuffle’ quantifies SWR content on an event-by-event basis, and the second ‘motivational state shuffle’ quantifies the effect of motivational state on SWR content averaged across sessions.
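The event-by-event part of this procedure can be sketched as follows, under stated assumptions: a Poisson likelihood with a uniform prior over position bins, equal numbers of left and right bins, and a 'tuning curve shuffle' implemented by swapping each cell's left and right tuning curves with probability 0.5. The exact shuffle and decoder used in the paper may differ; `log_odds`, `log_odds_zscore` and `RATE_FLOOR` are hypothetical names.

```python
import math
import random
import statistics

RATE_FLOOR = 1e-3  # assumed floor (Hz) to avoid log(0) for silent cells


def _log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_odds(counts, tc_left, tc_right, tau):
    """Left vs. right log odds for one event decoded as a single time bin.

    counts: spike count per cell; tc_left / tc_right: per-cell tuning curves
    (firing rate in Hz per position bin); tau: event duration in seconds.
    """
    def arm_log_evidence(tcs):
        per_bin = []
        for b in range(len(tcs[0])):
            ll = 0.0
            for n, tc in zip(counts, tcs):
                rate = max(tc[b], RATE_FLOOR)
                # Poisson log likelihood, dropping terms that do not depend on position
                ll += n * math.log(rate * tau) - rate * tau
            per_bin.append(ll)
        return _log_sum_exp(per_bin)

    # With a uniform prior, the posterior normalization cancels in the ratio.
    return arm_log_evidence(tc_left) - arm_log_evidence(tc_right)


def log_odds_zscore(counts, tc_left, tc_right, tau, n_shuffles=1000, seed=0):
    """z-score the observed log odds against a left/right tuning curve shuffle."""
    rng = random.Random(seed)
    observed = log_odds(counts, tc_left, tc_right, tau)
    null = []
    for _ in range(n_shuffles):
        shuf_left, shuf_right = [], []
        for left, right in zip(tc_left, tc_right):
            if rng.random() < 0.5:  # swap this cell's left and right curves
                left, right = right, left
            shuf_left.append(left)
            shuf_right.append(right)
        null.append(log_odds(counts, shuf_left, shuf_right, tau))
    sd = statistics.pstdev(null)
    return (observed - statistics.fmean(null)) / sd if sd > 0 else 0.0
```

Comparing against the shuffled distribution, rather than interpreting the raw log odds directly, is what corrects for unequal numbers of place cells or firing rates on the two arms.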
a: Decoding confusion matrix for a single example session. Each decoded time bin (bin size = 100 ms) is assigned to the corresponding actual trial type (left, right) and position of the animal (horizontal axis). Decoded posteriors for all bins at that location are averaged to obtain the values in each column of the matrix. Perfect decoding would result in all diagonal elements being 1. Note the clear diagonal indicating non-random decoding overall, and the fainter, off-diagonal elements corresponding to confusion of left and right trials. b: Average classification performance across all subjects and sessions, based on the trial type (left, right) and location (before the choice point, ‘pre’; or after, ‘post’) of the maximum a posteriori decoding. Note that for all trial types and locations, classification performance is clearly above chance (all p < .001, bootstrap compared to shuffling trial type and location). Overall, classification performance was better for the right (water) arm compared to the left (food) arm (post-CP .92 ± .04 for right, .83 ± .06 for left, p = .009; n.s. for pre-CP; two-tailed Mann-Whitney U test). Importantly, there was no indication that classification performance differed between food- and water-restriction sessions (all comparisons n.s.). Analyses were performed on n = 4 rats and 19 total sessions. c: Single-trial tuning curve Pearson’s correlations between trials of the same type (left-left, right-right) are systematically higher than correlations between trials of different type (left-right), both when averaged across all cells within a session (left panel) and on a cell-by-cell basis (right panel; both two-tailed Mann-Whitney U tests, n = 971 cells across all rats and sessions). Only positions on the common, central stem of the maze were included in this analysis. The higher correlations between trials of the same type, compared to the correlations across types, indicate left and right trials are distinct. 
Thus, as measured by two different approaches, left and right trials are clearly distinguishable during behavior on the track, even on the common portion of the maze.
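The single-trial tuning curve comparison in panel c can be sketched for one cell as below. This is a minimal illustration, not the authors' code: `pearson` and `within_vs_across` are hypothetical helpers, and each trial is assumed to be a list of firing rates over the same position bins on the central stem.

```python
import math
from itertools import combinations, product


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def within_vs_across(left_trials, right_trials):
    """Mean correlation between same-type trial pairs and different-type pairs."""
    within = [pearson(a, b)
              for trials in (left_trials, right_trials)
              for a, b in combinations(trials, 2)]
    across = [pearson(a, b) for a, b in product(left_trials, right_trials)]
    return sum(within) / len(within), sum(across) / len(across)


# Toy single-trial tuning curves for one cell (synthetic values).
left = [[1, 3, 5, 3, 1], [1, 4, 5, 2, 1]]
right = [[5, 3, 1, 3, 5], [4, 3, 1, 2, 5]]
within_r, across_r = within_vs_across(left, right)
```

A within-type mean that exceeds the across-type mean indicates that the cell's firing on the shared stem already distinguishes left from right trials.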
Supplementary Fig. 4 SWR content, as measured by z-scored log odds, for every individual recording session.
Columns correspond to pre-task rest, intertrial intervals, and post-task rest respectively. Rows show data from individual subjects (n = 4 rats), and each data point corresponds to a single session (19 total). Individual subjects may show idiosyncratic biases such as an overall shift in SWR content towards the water (right) arm, but in each subject and epoch, changes across days in SWR content tended to be opposite to changes in behavior. For instance, in the pre-task (left) column, it can be seen that this is the case for all pairs except for R042 day 3 to 4, and R050 day 3 to 4. In other words, 13 out of 15 motivational shifts resulted in a SWR content shift opposite from behavior.
Supplementary Fig. 5 SWR content z-scored log odds based on maze arms only (that is excluding the central stem of the maze).
Figure layout is the same as Fig. 2, as is the pattern of results: across sessions, sequenceless SWR content is negatively correlated with behavioral experience (a, Pearson’s correlation −.88, p = 10⁻⁵). Both unthresholded (b) and thresholded z-scored log odds (c) showed a SWR content shift between food- and water-restriction sessions in the opposite direction from behavior (unthresholded z-scored log odds difference z = −3.09, p = 2.1 × 10⁻³; thresholded proportions difference z = −2.60, p = 9.2 × 10⁻³). Asterisks indicate significance level (*: p < 0.05, **: p < 0.01, ***: p < 0.001, two-tailed bootstrap) by comparison with a resampled distribution based on shuffling food and water restriction labels across sessions (gray bar width indicates standard error of the mean across shuffles). Analyses were performed on n = 4 rats and 19 total sessions.
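The comparison with a label-shuffled distribution can be sketched as follows. This is a minimal illustration under assumptions: the statistic is the difference in session means between food-restriction ('fr') and water-restriction ('wr') sessions, and the two-tailed p value is the fraction of shuffles at least as extreme as the observed difference. The function name `label_shuffle_test` is hypothetical, and the way z and p were computed in the paper may differ.

```python
import random
import statistics


def label_shuffle_test(values, labels, n_shuffles=1000, seed=0):
    """Compare the fr minus wr difference in means to a label-shuffled null."""
    def mean_difference(lbls):
        fr = [v for v, l in zip(values, lbls) if l == "fr"]
        wr = [v for v, l in zip(values, lbls) if l == "wr"]
        return statistics.fmean(fr) - statistics.fmean(wr)

    rng = random.Random(seed)
    observed = mean_difference(labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # reassign restriction labels across sessions
        null.append(mean_difference(shuffled))

    z = (observed - statistics.fmean(null)) / statistics.pstdev(null)
    n_extreme = sum(abs(d) >= abs(observed) for d in null)
    p = (1 + n_extreme) / (1 + n_shuffles)  # two-tailed, with add-one correction
    return z, p


# Synthetic per-session SWR content scores (not data from the paper).
scores = [-1.0, -0.8, -1.2, -0.9, 1.0, 0.9, 1.1, 0.8]
session_labels = ["fr"] * 4 + ["wr"] * 4
z_value, p_value = label_shuffle_test(scores, session_labels)
```

Shuffling labels across sessions keeps the per-session values intact, so the null distribution reflects only what would be expected if motivational state were unrelated to SWR content.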
Main panel (top) shows the total number of detected sequences (across all subjects) corresponding to the left (L) and right (R) trajectories on the maze, for food-restriction sessions (fr, red) and water-restriction sessions (wr, blue). Note that the same overall pattern is apparent in these raw sequence counts as is shown in the proportion-based analysis (main text, Fig. 5): for food-restriction sessions, more sequences are detected on the right trajectory (leading to water) and for water-restriction sessions, more sequences are detected on the left trajectory (leading to food), indicating a change in SWR sequence content in the opposite direction from the motivational shift. Such a shift between food- and water-restriction sessions was apparent in each individual subject (lower panels). Individual sequence numbers are 1257 sequences for R042 (5 sessions), 95 for R044 (2 sessions), 429 for R050 (6 sessions) and 1999 for R064 (6 sessions).
Supplementary Fig. 7 SWR content, as measured by the proportion of detected sequences on the left (food) trajectory, for every individual recording session.
Columns correspond to pre-task rest, intertrial intervals, and post-task rest respectively. Rows show data from individual subjects (n = 4 rats), and each data point corresponds to a single session (19 total).
Supplementary Fig. 8 SWR sequence content (proportion of sequences depicting the left/food trajectory) for forward sequences (a), reverse sequences (b) and sequences beyond the choice point (c).
The same overall pattern as in the main analysis was apparent (compare Fig. 5), although individual comparisons with a resampled distribution based on shuffling food and water restriction labels across sessions did not reach statistical significance (forward sequences difference z = −1.62, p = .10; reverse sequences difference z = −1.45, p = .15; post-choice point sequences difference z = −1.04, p = .29). Layout as in Fig. 5b (n = 4 rats, 19 sessions total).
Supplementary Fig. 9 Comparison between the data and SWR content expected based on different hypotheses.
a: Schematic depicting the hypothesis that replay content is proportional to experience (solid line) alongside a sketch of the pattern found in the data (dashed line). The horizontal axis indicates time, including four daily recording sessions starting with a water restriction session (shown in blue on the lower left) and ending with a food restriction session (shown in red on the top right). During each recording session, experience is biased towards the restricted food type (for illustration purposes given here as a .75 probability of choosing water for water-restriction sessions, and .75 food probability for food-restriction sessions). This bias shifts predicted replay content (shown on the vertical axis) towards the corresponding side of the maze (right trials for water, left trials for food; note changes in the solid line (indicating predicted replay content) that occur during the ITI epoch of each recording session). (The size of the experience-driven changes depends on factors such as whether all experience or only within-session experience is considered, as in Fig. 6c, d, but the pattern is the same.) The experience account thus predicts (1) a difference in replay content between the pre- and post-task rest periods, and (2) a bias in post-task replay content towards the recently preferred outcome. As indicated by the dashed line, neither prediction is confirmed by the data. b: Schematic depicting the hypothesis that replay content favors the preferred behavioral choice (or outcome). This account predicts that pre-task replay content (1) and ITI replay content (2) are shifted towards the preferred choice (note solid line is on the water side during water-restriction sessions, and on the food side during food-restriction sessions). As indicated by the dashed line, neither prediction is supported by the data.
A variation of this account, which assumes animals can plan for the next session, correctly predicts post-task replay content, but not pre-task replay content.
Supplementary Fig. 10 Comparison between the data and SWR content expected based on delayed-experience and motivational state accounts.
a: Schematic depicting the hypothesis that replay content reflects delayed experience. Although unlikely given the reported rapid effects of experience on replay content (see main text for discussion), this scenario correctly predicts no change between pre- and post-task rest, and an overall replay bias opposite the preferred outcome. However, it further predicts that the behavioral bias (preference for one side or the other, defined as max(p_left, 1 − p_left)) on day n predicts SWR content bias the next day: note the relatively small swing in SWR content following a relatively unbiased session (for example session 2 with 63% food (left) arm experience) and comparatively large swing following a strongly biased session (session 4 with 95%, top right). The two bottom panels depict the three example session data points shown (dark gray symbols), which form the predicted positive correlation; the data, depicted here schematically as light gray circles, do not exhibit such a relationship. This figure also illustrates why it is informative to compute bias scores (bottom right panel; ranging from 0.5 to 1 by using the max operation above) rather than using raw values (bottom left panel). This is an instance of Simpson’s Paradox, where using raw values would always show a positive correlation between day n behavior and day n + 1 pre-task replay content (lower left panel): the structure of the task combined with the overall opposite bias in replay content confines the data points to the lower left and upper right quadrants. Geometrically, the bias scores align the food and water sessions to a common axis (note the reflection of the yellow and green quadrants in the lower right panel, made visible by the notch in one corner), enabling the testing of the more specific predictions shown here.
b: If motivational state determines replay content, replay content bias during the pre-task rest period as well as other epochs should predict that session’s behavioral bias (after all, a hungrier animal would show a stronger preference for food). This prediction, illustrated in the lower two panels, is confirmed in the data. Note that again, bias scores are important in avoiding spurious results (Simpson’s Paradox). Because the comparison is within-session rather than across-session (as in a), the raw data are now confined to the upper left and lower right quadrants (lower left panel).
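The effect of the bias score on the correlation can be illustrated with a toy example for the delayed-experience case in panel a. All values below are synthetic and chosen only to reproduce the quadrant structure described above; `bias_score` and `pearson` are hypothetical helper names, and p_left denotes the proportion of left (food) choices or left replay content.

```python
import math


def bias_score(p_left):
    """Strength of preference regardless of side: max(p_left, 1 - p_left)."""
    return max(p_left, 1.0 - p_left)


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Synthetic pairs: (day n behavior p_left, day n + 1 pre-task replay p_left).
# Food-restriction days fall in the upper right quadrant, water-restriction
# days in the lower left, as imposed by the task structure.
behavior = [0.65, 0.80, 0.95, 0.35, 0.20, 0.05]
replay = [0.70, 0.62, 0.66, 0.30, 0.38, 0.34]

raw_r = pearson(behavior, replay)
bias_r = pearson([bias_score(p) for p in behavior],
                 [bias_score(p) for p in replay])
```

In this toy data set the raw values are strongly positively correlated purely because of the two clusters, whereas the bias scores, which fold both session types onto a common axis, show no positive relationship.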
About this article
Cite this article
Carey, A.A., Tanaka, Y. & van der Meer, M.A.A. Reward revaluation biases hippocampal replay content away from the preferred outcome. Nat Neurosci 22, 1450–1459 (2019). https://doi.org/10.1038/s41593-019-0464-6