Hippocampal-neocortical interactions sharpen over time for predictive actions

Hindy, Nicholas C.; Avery, Emily W.; Turk-Browne, Nicholas B.

doi:10.1038/s41467-019-12016-9

Published: 05 September 2019

Nicholas C. Hindy ORCID: orcid.org/0000-0003-1643-0251¹,
Emily W. Avery² &
Nicholas B. Turk-Browne²

Nature Communications volume 10, Article number: 3989 (2019)

Abstract

When an action is familiar, we are able to anticipate how it will change the state of the world. These expectations can result from retrieval of action-outcome associations in the hippocampus and the reinstatement of anticipated outcomes in visual cortex. How does this role for the hippocampus in action-based prediction change over time? We use high-resolution fMRI and a dual-training behavioral paradigm to examine how the hippocampus interacts with visual cortex during predictive and nonpredictive actions learned either three days earlier or immediately before the scan. Just-learned associations led to comparable background connectivity between the hippocampus and V1/V2, regardless of whether actions predicted outcomes. However, three-day-old associations led to stronger background connectivity and greater differentiation between neural patterns for predictive vs. nonpredictive actions. Hippocampal prediction may initially reflect indiscriminate binding of co-occurring events, with action information pruning weaker associations and leading to more selective and accurate predictions over time.

Introduction

As you open the door to a familiar room, you are able to anticipate specific objects that will come into view. A neural source of such predictions may be pattern completion in the hippocampus^1,2,3. Repeated experience and interaction allow associative learning mechanisms in the hippocampus to bind recurring patterns of objects and actions over space and time^4,5. Once these links are formed, making an action in response to a familiar cue may prompt the hippocampus to retrieve a conjunctive representation of past events. These representations could contain information about the cue and action, but also about the yet-to-occur sensory consequences of the action. These retrieved consequences could in turn be reinstated via feedback to sensory systems, a form of memory-based predictive coding of action outcomes.

Decoding of stimulus-related information during action-based prediction provides suggestive evidence of a link between pattern completion in the hippocampus and predictive coding in early visual cortex (EVC)³. For pre-learned cue–action–outcome sequences, qualitatively different stimulus representations were evoked in the hippocampus and EVC for predictive actions (i.e., actions that determine an outcome given a cue). Given a part of a sequence, the hippocampus represented the full cue–action–outcome sequence, and this was related within and across participants to evidence of the same outcome in EVC. In control analyses with nonpredictive actions (i.e., actions that did not determine which outcome appeared after a cue), actions could not be decoded from stimulus-evoked activity in either region.

Beyond correlations in stimulus-evoked information, we hypothesize that the intrinsic coupling of the hippocampus and EVC may be enhanced during action-based prediction. This hypothesis is motivated by findings in human neurophysiology that link perceptual inference to long-range oscillatory synchronization between the hippocampus and visual cortex^6,7, together with the observation that stimulus-evoked responses and coherent spontaneous fluctuations are linearly superimposed in human functional magnetic resonance imaging (fMRI) data⁸. Critically, although correlated classification of stimulus-evoked responses is suggestive of hippocampal–neocortical interactions, such correlations depend upon the precision of memories and associated predictions represented within each region. Therefore, along with measuring multivariate patterns in the hippocampus and EVC, here we use background connectivity to quantify the temporal dynamics and covariance of these regions after removing stimulus-evoked responses^9,10. Because background connectivity may more directly measure hippocampal–neocortical interactions than stimulus-specific decoding on its own, we reason that it should provide an objective index of the contexts in which the hippocampus is and is not involved in action-based predictive coding.

Critically, different accounts of memory retrieval make diverging predictions about the temporal contexts in which the hippocampus may be involved in predictive coding. On the one hand, involvement of the hippocampus could be specific to associations that have been learned very recently. In this case, hippocampal–neocortical interactions may diminish over time with the neocortex playing a more autonomous role in action-based prediction as a result of systems consolidation^11,12,13 (cf. ref. ¹⁴). Alternatively, unique computational processes of the hippocampus like multimodal binding and pattern completion may serve an important function in prediction regardless of the more canonical role of the hippocampus as a memory system^15,16,17. Thus, in the current study we test the role of the hippocampus in action-based prediction over two timescales. We hypothesize that background connectivity between the hippocampus and EVC depends on both the lag between training and scanning and the predictiveness of actions, and that this relates to the representational contents of these areas.

Participants learned cue–action–outcome sequences in a first training session 3 days before an fMRI scan and in a second training session immediately before the scan (Fig. 1a). Separate sets of cues and outcomes were used in each training session and actions were either predictive or nonpredictive of outcomes depending on the cue (Fig. 1b). For predictive actions, one outcome reliably followed the cue after a left button press and a different outcome reliably appeared after a right button press; explicit memory of predictable outcomes was at ceiling on verbal tests administered during each training session and before and after the fMRI scan (Fig. 1c). For nonpredictive actions, the two outcomes followed the cue with equal probability when either the left or right button was pressed. After both training sessions, participants performed the same task in the fMRI scanner, with stimuli from the two training sessions presented separately in alternating runs, and cues with predictive vs. nonpredictive actions presented separately in alternating blocks within each run type. Background connectivity was calculated for each of these blocks and then collapsed within condition, resulting in four key measures of hippocampal–EVC interaction: 3-day vs. no-delay learning of predictive vs. nonpredictive actions.

Results

Verbal tests

To verify that predictive actions had been learned during training and remembered across the delay, participants were required to be 100% accurate in identifying expected outcomes of predictive actions in verbal outcome-identification tests outside of the fMRI scanner. Participants who did not reach this accuracy criterion on each test were excluded from the fMRI scan. Thus, by definition, all 24 scanned participants reached perfect accuracy. Two additional participants completed training but did not participate in the fMRI scan because of accuracy less than 100% even after repeating a pre-scan test.

Choice RT

Throughout training and in the scanner, we measured choice response time (RT) as the time it took for participants to press the left or right button in response to a cue. During training sessions outside of the scanner, choice RT did not differ among the conditions (p's > 0.26 in repeated-measures ANOVAs). The lack of a timescale difference between training sessions is not surprising, as these conditions were equivalent at this point in the study. However, in the scanner, we observed a reliable interaction between timescale and predictiveness (F(1, 22) = 5.49, p = 0.03; Fig. 1d). For no-delay sequences, choice RT was comparable for predictive and nonpredictive actions (t(23) = 0.18, p = 0.86), whereas for 3-day delay sequences, choice RT was faster for predictive vs. nonpredictive actions (t(23) = 3.96, p < 0.001). When predictive and nonpredictive events were separately compared across delay conditions, speeded RT over time for predictive actions was marginally significant (t(23) = 1.85, p = 0.08), while slower RT over time for nonpredictive actions was not significant (t(23) = 1.57, p = 0.13).
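For a fully within-participant 2 × 2 design such as timescale by predictiveness, the interaction F statistic with (1, n − 1) degrees of freedom equals the square of a one-sample t statistic computed on each participant's difference of differences. The Python sketch below illustrates that equivalence. The function and argument names are hypothetical, the inputs are assumed to be per-participant condition means, and this is not the authors' analysis code.

```python
import numpy as np

def interaction_from_differences(delay_pred, delay_nonpred, nodelay_pred, nodelay_nonpred):
    """Return (t, F, df) for the 2 x 2 within-participant interaction.

    Each argument holds one condition mean per participant (hypothetical
    inputs; any dependent measure, such as choice RT, could be used).
    """
    delay_pred, delay_nonpred, nodelay_pred, nodelay_nonpred = (
        np.asarray(a, dtype=float)
        for a in (delay_pred, delay_nonpred, nodelay_pred, nodelay_nonpred)
    )
    # Predictiveness effect at each timescale, then the difference between them.
    diff_of_diffs = (delay_pred - delay_nonpred) - (nodelay_pred - nodelay_nonpred)
    n = diff_of_diffs.size
    standard_error = diff_of_diffs.std(ddof=1) / np.sqrt(n)
    t = diff_of_diffs.mean() / standard_error
    # For a single-degree-of-freedom contrast, F(1, n - 1) is t squared.
    return float(t), float(t ** 2), n - 1
```

The same contrast underlies the follow-up paired t-tests, which simply test one of the two simple effects at a time.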

Stimulus-evoked responses

A general linear model (GLM) containing finite impulse response (FIR) basis functions was used to estimate evoked blood-oxygen level-dependent (BOLD) activity in the hippocampus and EVC (Fig. 2a). Stimulus-evoked activity for each condition was averaged within block to capture the peak response. Although activity in the hippocampus was marginally reduced for both predictive and nonpredictive actions after the 3-day delay (F(1, 22) = 4.09, p = 0.05), no other main effects or interactions were observed in either ROI (p's > 0.35 in repeated-measures ANOVAs).
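As a rough illustration of how an FIR model estimates an evoked response without assuming its shape, the sketch below builds one regressor per post-onset scan and fits them by least squares. It assumes onsets expressed in scan (TR) indices and a fixed response window; the names `fir_design` and `fir_estimate` are hypothetical and do not correspond to the software used in the study.

```python
import numpy as np

def fir_design(onsets, n_scans, n_lags):
    """Build an FIR design matrix with one column per post-onset lag.

    Entry (t, k) counts the onsets that occurred exactly k scans before scan t.
    """
    design = np.zeros((n_scans, n_lags))
    for onset in onsets:
        for lag in range(n_lags):
            if onset + lag < n_scans:
                design[onset + lag, lag] += 1.0
    return design

def fir_estimate(timeseries, design):
    """Least-squares estimate of the evoked response at each lag.

    A constant column is added so the baseline is absorbed by an intercept.
    """
    full = np.column_stack([design, np.ones(design.shape[0])])
    beta, *_ = np.linalg.lstsq(full, timeseries, rcond=None)
    return beta[:-1]  # drop the intercept
```

Because each lag has its own free parameter, the fitted response can take any shape within the window, which is what makes the approach data-driven.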

ROI background connectivity

Task-specific background connectivity between the hippocampus and EVC was measured after removing stimulus-evoked activity and confounding variables through linear regression in a multistep procedure^9,18,19,20,21. We first used a GLM to regress out white matter and ventricle activity along with motion parameters from preprocessing, and then used FIR basis functions to capture and remove the average timing and shape of the hemodynamic response in each voxel in a data-driven way. Background connectivity was measured as the correlation between the residual timeseries of the two ROIs. There were no differences across hemispheres in background connectivity between the hippocampus and EVC (p's > 0.61 in repeated-measures ANOVAs). Critically, we observed a reliable interaction between timescale and predictiveness (F(1, 23) = 8.28, p = 0.008; Fig. 2b). This interaction was driven by a reliable difference between predictive and nonpredictive actions for sequences learned 3 days before the fMRI scan (t(23) = 2.90, p = 0.008), with no hint of an effect of predictiveness for sequences learned immediately before the scan (t(23) = 0.12, p = 0.90). When predictive and nonpredictive events were separately compared across delay conditions, enhanced background connectivity over time for predictive actions was not significant (t(23) = 1.67, p = 0.11), while diminished background connectivity over time for nonpredictive actions was significant (t(23) = 2.34, p = 0.03). Furthermore, although the interactions between timescale and predictiveness in background connectivity paralleled interactions in RT, differences among conditions in background connectivity were not correlated with RT either across participants or across runs for each participant (p's > 0.27 for all Pearson correlation coefficients and repeated-measures ANOVAs; Supplementary Fig. 1).
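The multistep residualizing procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: nuisance regressors and an evoked-response design matrix are taken as given, residuals are averaged across voxels within each ROI before correlating (an assumption made here for simplicity), and the function names are hypothetical.

```python
import numpy as np

def residualize(data, design):
    """Remove everything the design (plus a constant) explains from each column of data."""
    full = np.column_stack([design, np.ones(design.shape[0])])
    beta, *_ = np.linalg.lstsq(full, data, rcond=None)
    return data - full @ beta

def background_connectivity(roi_a, roi_b, nuisance, evoked_design):
    """Pearson r between two ROIs after removing nuisance and evoked signals.

    roi_a, roi_b: arrays of shape (n_scans, n_voxels).
    nuisance: (n_scans, n_nuisance), e.g. white matter, ventricle, motion.
    evoked_design: (n_scans, n_regressors), e.g. an FIR design matrix.
    """
    # Step 1: regress out nuisance signals. Step 2: regress out evoked responses.
    resid_a = residualize(residualize(roi_a, nuisance), evoked_design)
    resid_b = residualize(residualize(roi_b, nuisance), evoked_design)
    # Assumption: summarize each ROI by its mean residual timeseries.
    return float(np.corrcoef(resid_a.mean(axis=1), resid_b.mean(axis=1))[0, 1])
```

In this sketch, two regions that share only a task-evoked response will show a high raw correlation but a background correlation near zero, whereas regions that share spontaneous fluctuations remain correlated after residualizing.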

Control correlations across matched runs

The goal of background connectivity is to remove stimulus-evoked responses to isolate idiosyncratic fluctuations that reveal how experimental conditions modulate functional connectivity. To verify that the residualizing approach above was effective, we performed an across-run control analysis⁹. Each training condition was tested in two runs that used the same block order and the same cue–stimulus–outcome sequences. If the key findings above were confounded by unmodeled stimulus-evoked responses, the residual activity in the hippocampus in one run should be correlated with the residual activity in EVC in the other run. However, there were no reliable interactions or main effects when computing connectivity across runs (p's > 0.17 in repeated-measures ANOVAs; Fig. 2c); moreover, connectivity was reliably lower for each condition when it was calculated across vs. within run (p's < 0.01 in paired t-tests).

Specificity within V1/V2

Are differences in background connectivity specific to only the voxels that are most responsive to the specific retinotopic location of the experimental stimuli, or are they widespread throughout V1/V2? The EVC ROI for each participant included (1.5 mm isotropic) voxels responsive to square- and diamond-masked stimuli in a localizer scan, ranging from 732 to 4200 voxels in volume (6.2–30.0% of V1/V2). To examine the specificity of hippocampal background connectivity within EVC, we varied the extent of the EVC ROI from including just the 50 voxels (<1% of V1/V2) most responsive to functional localizer stimuli to including all V1/V2 voxels (Fig. 3a). Immediately after training, hippocampal background connectivity was equivalent for predictive and nonpredictive actions regardless of the size of the EVC ROI (Fig. 3b). In contrast, after the 3-day delay, background connectivity was reliably stronger for predictive than nonpredictive actions across a wide range of ROI sizes (Fig. 3c). Likewise, the interaction between predictiveness and timescale was significant for ROIs ranging from 50 to 1000 voxels (p's < 0.05) and marginally reliable for 5000 voxels (F(1, 23) = 3.81, p = 0.06). However, this interaction was not significant for the mean background timeseries across all V1 and V2 voxels (F(1, 23) = 0.48, p = 0.50). Thus, while differences in hippocampal background connectivity were robust to the size of the EVC ROI, they were not entirely pervasive within V1/V2.
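The ROI-size analysis amounts to ranking voxels by their localizer response, keeping the top N, and recomputing connectivity for each N. The sketch below shows one way this could be done in Python. It assumes that residual timeseries and a per-voxel localizer statistic are already available; all names are hypothetical.

```python
import numpy as np

def top_responsive_voxels(localizer_stat, n_voxels):
    """Indices of the n_voxels with the largest localizer statistic, in ascending order."""
    localizer_stat = np.asarray(localizer_stat, dtype=float)
    n_voxels = min(n_voxels, localizer_stat.size)
    ranked = np.argsort(localizer_stat)[::-1]  # most responsive first
    return np.sort(ranked[:n_voxels])

def connectivity_by_roi_size(seed_resid, evc_resid, localizer_stat, sizes):
    """Correlate a seed residual timeseries with the mean of the top-N EVC voxels.

    seed_resid: (n_scans,), evc_resid: (n_scans, n_voxels).
    Returns a dict mapping each requested ROI size to a Pearson r.
    """
    result = {}
    for size in sizes:
        keep = top_responsive_voxels(localizer_stat, size)
        roi_mean = evc_resid[:, keep].mean(axis=1)
        result[size] = float(np.corrcoef(seed_resid, roi_mean)[0, 1])
    return result
```

Averaging over progressively less responsive voxels dilutes any coupling confined to the stimulated retinotopic region, which is one way an effect could be reliable for small ROIs and absent for all of V1/V2.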

Voxelwise background connectivity

To what extent are timescale and predictiveness differences in background connectivity between the hippocampus and EVC specific to these regions vs. widespread in the brain? To assess the anatomical specificity of the effects, we performed exploratory analyses using the residual timeseries from bilateral hippocampus and EVC ROIs (Fig. 4a) to calculate background connectivity with all voxels in the partial volume collected for each participant. After registering these correlation maps to MNI space, we conducted nonparametric randomization tests of their reliability across participants. Immediately after training, predictiveness did not reliably modulate background connectivity anywhere in the partial volume when the hippocampus or EVC served as the seed (Fig. 4b). Conversely, for sequences learned 3 days before the scan, several clusters showed reliably greater background connectivity with each seed during blocks of predictive vs. nonpredictive actions (Fig. 4c). Specifically, predictiveness enhanced the background connectivity of the hippocampus with left (−9, −87, −13) and right (19, −91, −10) occipitotemporal cortex and left (−20, 6, −3) and right (25, 18, −4) putamen, and enhanced the background connectivity of EVC with anterior (−30, −12, −27) and posterior (−21, −42, −7) left hippocampus (bilateral at uncorrected threshold), left parahippocampal gyrus (−17, −52, 2), and left posterior cingulate cortex (−9, −55, 14). At each timescale, no voxels showed stronger background connectivity with the hippocampus or EVC for nonpredictive actions.
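To give a sense of how a nonparametric randomization test assesses reliability across participants, the sketch below implements a simple one-sided sign-flipping test for a single voxel or ROI. It is an illustration only: the study's tests operated on whole correlation maps, whereas this example handles one value per participant and applies no correction across voxels. The function name, the number of permutations, and the fixed seed are all assumptions.

```python
import numpy as np

def sign_flip_test(differences, n_permutations=5000, seed=0):
    """One-sided sign-flip randomization test that the mean difference exceeds zero.

    differences: one value per participant, e.g. predictive minus nonpredictive
    background connectivity at a single voxel.
    """
    rng = np.random.default_rng(seed)
    differences = np.asarray(differences, dtype=float)
    observed = differences.mean()
    # Under the null hypothesis, the sign of each participant's difference is arbitrary.
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, differences.size))
    null_means = (signs * differences).mean(axis=1)
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return float((np.sum(null_means >= observed) + 1) / (n_permutations + 1))
```

Extending this to whole-brain maps typically involves recording the maximum statistic or cluster size in each permutation, which this sketch omits.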

Verbal predictions for nonpredictive actions

There are multiple potential explanations for the observed interaction between timescale and predictiveness in background connectivity. First, it could be that background connectivity between hippocampus and EVC was at equivalent baseline levels for predictive and nonpredictive actions immediately after training, and was then enhanced specifically for predictive actions after the 3-day delay. Alternatively, it could be that background connectivity was already enhanced above baseline for both predictive and nonpredictive actions immediately after training, and was then reduced specifically for nonpredictive actions after the 3-day delay. Beyond the control correlations across matched runs that could be used to infer a baseline correlation for the context, behavior on the verbal tests before and after the fMRI scan can be used to help disentangle these possibilities. While participants were required to be 100% accurate in identifying outcomes of predictive actions, there were no correct or incorrect responses for nonpredictive actions. Nonetheless, participants could be consistent or inconsistent in their verbal predictions of unpredictable outcomes. We quantified this behavior for nonpredictive actions based on how consistently each participant mapped each outcome onto specific cue–action combinations (Fig. 5a). In fact, participants were significantly less consistent in verbally identifying expected outcomes of nonpredictive actions learned before the 3-day delay than of nonpredictive actions learned immediately before the scan (t(23) = 3.86, p < 0.001), suggesting that action-based prediction may have diminished over time for nonpredictive events (Fig. 5b).

Are consistent vs. inconsistent predictions sufficient to modulate hippocampal–neocortical interactions for nonpredictive actions? In total, 14 of the 24 participants were 100% consistent in identifying outcomes for nonpredictive cues and actions immediately after training, while 4 participants were 100% consistent after the 3-day delay. We reasoned that participants who were consistent in verbally identifying the outcomes of nonpredictive actions may have likewise maintained stronger visual predictions for nonpredictive actions than participants who were inconsistent in their responses. If so, such differences across participants may also be reflected in their hippocampal–neocortical interactions. Indeed, background connectivity during nonpredictive actions tended to be greater among participants who made 100% consistent test responses than among participants who made inconsistent responses (Fig. 5c). While this difference between participants was not significant immediately after training (t(22) = 1.25, p = 0.22), it was significant after the 3-day delay (t(22) = 2.85, p = 0.009). Moreover, among participants with 100% consistent test responses, background connectivity was the same for predictive and nonpredictive actions at each timescale (p's > 0.79 in paired t-tests).

Time-lagged background connectivity

Background connectivity between the hippocampus and visual cortex during predictive action is agnostic to the direction of the interaction. Questions about directionality can only be addressed definitively with techniques that allow for causal interventions. Moreover, the slow sampling rate of fMRI and the temporal autocorrelation of BOLD activity severely limit the analysis of temporal dynamics. Nevertheless, it is possible to test whether there exists any evidence for a temporal asymmetry in the signals between these regions³ that would be consistent with processing in one region preceding the other. Specifically, we hypothesized that insofar as the hippocampus is relying on learned predictiveness to reinstate expected outcomes in visual cortex, the activity in the hippocampus at one time point should predict activity in visual cortex at the next time point, at least more than the reverse. Indeed, we were able to replicate the main timescale by predictiveness interaction reported above when EVC was lagged by one time point with respect to the hippocampus (F(1, 22) = 4.77, p = 0.04; Fig. 6a). This interaction reflected a reliable difference in background connectivity between predictive and nonpredictive actions for sequences learned 3 days before the fMRI scan (t(23) = 2.98, p = 0.007) and not for sequences learned immediately before the scan (t(23) = −0.33, p = 0.74). Critically, using nonpredictive blocks as the baseline controls for the possibility that BOLD activity merely peaks later in visual cortex than the hippocampus. In contrast, no such interaction was found when the hippocampus was lagged with respect to EVC (F(1, 22) = 0.04, p = 0.85; Fig. 6b), with no differences between predictive and nonpredictive actions at either timescale (p's > 0.21 in paired t-tests).
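A time-lagged correlation of this kind can be written compactly. The sketch below, with a hypothetical function name, correlates the leading region's residual at each time point with the following region's residual a fixed number of time points later. It assumes both timeseries are residuals from the same block and equally sampled.

```python
import numpy as np

def lagged_correlation(leader, follower, lag=1):
    """Pearson r between leader[t] and follower[t + lag], for lag >= 1."""
    leader = np.asarray(leader, dtype=float)
    follower = np.asarray(follower, dtype=float)
    if lag < 1 or lag >= leader.size:
        raise ValueError("lag must be at least 1 and shorter than the timeseries")
    # Drop the last `lag` samples of the leader and the first `lag` of the follower
    # so that the two arrays line up with the requested offset.
    return float(np.corrcoef(leader[:-lag], follower[lag:])[0, 1])
```

Calling the function twice with the arguments swapped gives the two directions compared in the text, for example `lagged_correlation(hippocampus, evc)` versus `lagged_correlation(evc, hippocampus)`, where both variable names are placeholders.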

Multivariate pattern similarity

We have shown that background connectivity between the hippocampus and EVC strengthens over time for predictive relative to nonpredictive actions. How does this relate to the information represented in each ROI? Specifically, we hypothesized that greater connectivity for predictive actions after 3 days should be accompanied by greater information about expected outcomes. We tested this by measuring the neural similarity between visual transitions in which the same cue appeared but was followed by different outcomes (Fig. 7a). Insofar as these overlapping cue–outcome transitions are more differentiated after 3 days vs. immediately after training, it would imply that the actions led to a stronger and/or clearer prediction of the outcome. To calculate pattern similarity, we correlated spatial patterns of parameter estimates in the hippocampus and EVC obtained from an event-related GLM, as a function of which cue was presented, whether it was associated with predictive vs. nonpredictive actions, and at what timescale it was learned. Critically, visual stimulation was the same for cue–outcome transitions containing either predictive or nonpredictive actions: either of two outcomes followed a cue and double-sided arrow with equal probability. But since nonpredictive actions could not be decoded in either the hippocampus or EVC in a prior study with the same action-based prediction task³, we averaged across left and right button presses for visual transitions that contained nonpredictive actions. However, all of the pattern similarity effects replicated in follow-up analyses with resampled data that split between left and right nonpredictive actions (Supplementary Fig. 2). Moreover, predictiveness significantly interacted in each ROI with comparisons of within-cue vs. across-cue pattern similarity of the exact same multivoxel patterns (p's < 0.01; Fig. 7b).
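The within-cue comparison can be illustrated with a short Python sketch that correlates multivoxel patterns for visual transitions sharing a cue but ending in different outcomes. The dictionary layout, key format, and function name are assumptions made for illustration; the patterns stand in for parameter estimates from an event-related GLM.

```python
from itertools import combinations

import numpy as np

def within_cue_similarity(patterns):
    """Mean Pearson r between patterns that share a cue but differ in outcome.

    patterns: dict mapping (cue, outcome) -> 1-D array of voxel parameter estimates.
    """
    values = []
    for (key_a, pattern_a), (key_b, pattern_b) in combinations(patterns.items(), 2):
        same_cue = key_a[0] == key_b[0]
        different_outcome = key_a[1] != key_b[1]
        if same_cue and different_outcome:
            values.append(np.corrcoef(pattern_a, pattern_b)[0, 1])
    if not values:
        raise ValueError("no pair of patterns shares a cue")
    return float(np.mean(values))
```

Lower values indicate more differentiated representations of the two transitions that follow the same cue. Changing the pair-selection rule to require different cues would give the across-cue comparison.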

Consistent with prior findings³, neural representations of the two alternative visual transitions associated with each predictive cue were less similar to one another than those of each nonpredictive cue in both the hippocampus (F(1, 23) = 17.77, p < 0.001) and EVC (F(1, 23) = 32.16, p < 0.001), suggesting that predictive actions helped disambiguate action outcomes in these regions. Importantly, differentiation effects were modulated by delay condition, including a marginally reliable interaction between predictiveness and timescale in the hippocampus (F(1, 23) = 4.26, p = 0.051) and a significant interaction in EVC (F(1, 23) = 5.36, p = 0.03). In the hippocampus, pattern similarity was reliably reduced for predictive vs. nonpredictive cues trained 3 days before the scan (t(23) = 4.73, p < 0.001), but did not differ for visual transitions trained immediately before the scan (t(23) = 1.61, p = 0.12). In EVC, despite a reliable interaction, the difference in pattern similarity for predictive and nonpredictive events was reliable both after the 3-day delay (t(23) = 5.60, p < 0.001) and immediately after training (t(23) = 3.42, p = 0.002). Unlike background connectivity, pattern similarity did not significantly differ between immediate and 3-day delay conditions within just predictive or just nonpredictive actions in either ROI (p's > 0.10 in paired t-tests).

Do predictive events become more neurally distinct than nonpredictive events specifically when they share a cue (and thus initially overlap), or do they become more neurally distinct in general? To test whether differences in pattern similarity extend to non-overlapping events, we measured similarity between cue–outcome transitions with different cues (Fig. 7b). In the hippocampus, there was no difference in pattern similarity between predictive vs. nonpredictive events when both the cues and the outcomes were distinct (F(1, 23) = 0.00, p = 0.97). Conversely, this difference was reliable in EVC (F(1, 23) = 17.24, p < 0.001), with reduced similarity between cue–outcome visual transitions with predictive vs. nonpredictive actions. However, as noted above, within-cue vs. across-cue similarity reliably interacted with predictiveness in both the hippocampus (F(1, 23) = 8.95, p = 0.007) and EVC (F(1, 23) = 11.33, p = 0.003). Unlike the differentiation effect between overlapping cue–outcome transitions, across-cue similarity did not interact with delay condition in either the hippocampus (F(1, 23) = 1.74, p = 0.20) or EVC (F(1, 23) = 0.96, p = 0.34), though the three-way interaction of within-cue vs. across-cue similarity, predictiveness, and timescale was not reliable in either ROI (p's > 0.12 in repeated-measures ANOVAs).

Finally, multivariate pattern similarity in the hippocampus was correlated across participants with background connectivity only after the 3-day delay. Individual differences across participants in background connectivity were unrelated immediately after training to within-cue pattern similarity in either the hippocampus (r(22) = 0.11, p = 0.62) or EVC (r(22) = 0.10, p = 0.63; Supplementary Fig. 3a). In contrast, after the 3-day delay, background connectivity was significantly negatively correlated with pattern similarity in the hippocampus (r(22) = −0.62, p = 0.001) though not EVC (r(22) = −0.21, p = 0.32; Supplementary Fig. 3b). Like background connectivity, pattern similarity in each ROI was not correlated with individual differences in RT at either timescale (p's > 0.08 for all Pearson correlation coefficients). Since left and right button presses were averaged together in order to estimate multivoxel patterns corresponding to visual transitions, pattern similarity expectedly did not differ in either the hippocampus or EVC between participants who made 100% consistent vs. inconsistent verbal predictions for nonpredictive actions (p's > 0.20 in two-sample t-tests).

Discussion

Using high-resolution fMRI and a multi-session training paradigm, we examined how functional interactions between the hippocampus and EVC change over the early periods of a memory. Results build upon recent evidence of a link between hippocampal pattern completion and predictive coding in visual cortex³, but suggest that the role of the hippocampus in visual prediction depends on the age of the knowledge on which the prediction was based. Specifically, interactions between the hippocampus and visual cortex became weaker for nonpredictive actions (and relatively stronger for predictive actions) 3 days after learning compared to immediately after learning. Over the same timescale, predictive actions led neural representations in these regions to become more differentiated for sequences with overlapping stimuli. Hippocampal prediction may be based at first on indiscriminate binding of co-occurring stimuli, with time and offline processing leading to gradual pruning of weaker associations, in this case, associations without informative actions.

Immediately after training, hippocampal–neocortical interactions were the same for predictive and nonpredictive actions. At first glance, the absence of a difference in background connectivity between these conditions may appear to be at odds with the finding that multivariate pattern similarity in EVC was significantly reduced for predictive vs. nonpredictive actions even immediately after training, and also with previous multivoxel pattern analysis (MVPA) findings in which classifier accuracy was at chance in both the hippocampus and EVC for nonpredictive actions while above chance for predictive actions³. Critically, however, background connectivity and MVPA are differentially sensitive to prediction in this task. Specifically, although participants cannot accurately predict outcomes of nonpredictive actions, they may nonetheless inaccurately predict outcomes. For example, the less predictable transitions for these cues may encourage hypothesis testing or other attempts to continue learning, or participants may be predicting both outcomes associated with the cue (which each still co-occur 50% of the time, far higher than any other outcome). Less differentiated patterns in visual cortex may in fact reflect less differentiated neural predictions, as opposed to a lack of prediction. Likewise, in any of these cases, a multivariate classifier will seek evidence of the correct outcome, and so performance will be at chance on average. However, to the extent that background connectivity between the hippocampus and visual cortex reflects the process of prediction, whether accurate or inaccurate, it may be enhanced for both predictive and nonpredictive actions.

Verbal predictions for nonpredictive actions before and after each fMRI scan—and their relationship across participants with background connectivity—support the idea that the hippocampus may at first generate spurious predictions for nonpredictive events. While participants were required to be 100% accurate in identifying outcomes of predictive actions, there were no objectively correct or incorrect responses for nonpredictive actions. However, participants were more than 90% consistent on average in matching nonpredictive cues and actions to specific unpredictable outcomes immediately after training and were significantly less consistent in making such predictions for nonpredictive actions after the 3-day delay. Moreover, the small subset of participants who were still 100% consistent in their verbal predictions for nonpredictive actions after the 3-day delay exhibited significantly stronger background connectivity during nonpredictive events than did participants who made inconsistent predictions.

Three days after training, participants were also significantly quicker in making predictive actions than in making nonpredictive actions. Although RT in the scanner did not correlate with background connectivity across participants, quicker responses to predictive cues coincided across conditions with greater background connectivity. Accordingly, changes in hippocampal–neocortical interaction may relate to the perceptual fluency of cue–action–outcome sequences. At the same time that strengthening of sparse hippocampal representations may lead to faster responses to predictive cues, weak or noisy representations for nonpredictive associations may lead to slower responses to nonpredictive cues. While time-dependent changes in perceptual fluency may be independent of the hippocampus for tasks that are completely perceptual^22,23, hippocampal function is necessary for learning arbitrary associations among stimuli²⁴. Notably, however, the statistical learning required for action-based prediction may involve different pathways within the hippocampus than other forms of hippocampally dependent learning^5,24.

While background connectivity effects were robust across a wide range of ROI sizes within EVC, voxelwise background connectivity largely overlapped with the specific a priori ROIs. Immediately after training, predictive actions did not reliably modulate voxelwise background connectivity with either the hippocampus or EVC. However, after a 3-day delay, predictive actions significantly modulated hippocampal background connectivity with voxels in V1 and V2, as well as EVC background connectivity with voxels in the hippocampus. In addition to overlap with the specific ROIs, a few other interesting findings emerged including enhanced hippocampal background connectivity for predictive actions with the putamen and object-selective visual areas in posterior fusiform and lateral occipital cortex. The putamen is especially intriguing because it has frequently been linked with action selection²⁵ and with offline motor-sequence learning²⁶. Moreover, this finding converges with previous MVPA findings for action decoding in the putamen³.

Along with background connectivity and behavior, multivariate pattern similarity within the hippocampus and visual cortex depended on the combination of predictiveness and delay interval. In the hippocampus, consistent with hippocampal models of episodic memory that emphasize the importance of representational overlap for neural differentiation^27,28, we observed reduced pattern similarity for predictive relative to nonpredictive actions only between visual transitions that shared the same cue stimulus. In EVC, predictive actions led to more distinctive neural patterns at each timescale for both overlapping visual transitions (that shared a cue stimulus) and non-overlapping transitions (in which both the cue and outcome differed). However, just as observed in the hippocampus, delay condition significantly modulated the effect of predictiveness on EVC pattern similarity only between overlapping visual transitions. Thus, the passage of time modulated neural differentiation effects in EVC in the same way as in the hippocampus, further linking these regions together.

At the same time that changes across time in hippocampal–neocortical interaction are inconsistent with a time-invariant role for the hippocampus in predictive coding, models of memory retrieval that posit a reduced role for the hippocampus over time^11,12,13 would not obviously predict the findings: identical hippocampal–neocortical interaction during predictive and nonpredictive actions immediately after training followed by greater interaction specifically during predictive actions after a 3-day delay. In order to accommodate these findings, models that include the hippocampus need to include a role for predictive action in offline processing. Specifically, predictive action may provide a mechanism for prioritizing which representations are either strengthened through synaptic potentiation or weakened through synaptic depression during periods of offline rest^29,30. Activity-dependent synaptic potentiation and depression may in turn be mediated by offline replay within the hippocampus^31,32 and between the hippocampus and neocortex^33,34. By transforming noisy recent associations into sparser remote associations, this offline processing may increase the efficiency and utility of hippocampal associations over time^35,36. Ultimately, sparser hippocampal representations may increase the signal-to-noise ratio of the hippocampal–neocortical interactions during action-based prediction.

While feedback across layers of visual cortex may be sufficient to fill in adjacent elements of a sequence or scene, top-down connections such as from the hippocampus may be needed to simultaneously predict multiple elements in a sequence^37,38 and to make predictions based on prior co-occurrence and arbitrary associations^3,16. Indeed, time-lagged background connectivity here converges with previous MVPA findings in which sequence information in the hippocampus temporally preceded outcome information in EVC during mnemonic prediction³. The timescale by predictiveness interaction observed for background connectivity was preserved when hippocampal background activity was shifted earlier to lead EVC, while it was eliminated when hippocampal background activity was shifted later to trail EVC. Although the causal direction of the relationship between the hippocampus and EVC cannot be established with correlational measures such as fMRI, converging data across this experiment and a previous study³ are at least consistent with the hippocampus reinstating expected outcomes in visual cortex.
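The logic of a time-lagged comparison can be illustrated with a minimal sketch. It assumes two already-residualized time series sampled at the same rate; the function names, the sign convention for the lag, and the unit of one sample per step are illustrative choices, not details taken from the analysis reported here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(hipp, evc, lag):
    """Correlate hipp[t] with evc[t + lag].

    Assumed convention: lag > 0 pairs each hippocampal sample with a later
    EVC sample (hippocampus leads); lag < 0 pairs it with an earlier EVC
    sample (hippocampus trails).
    """
    if lag > 0:
        x, y = hipp[:-lag], evc[lag:]
    elif lag < 0:
        x, y = hipp[-lag:], evc[:lag]
    else:
        x, y = hipp, evc
    return pearson(x, y)
```

In a toy case where the EVC series is simply a one-sample-delayed copy of the hippocampal series, the correlation is highest at a lag of +1, which is the pattern expected if hippocampal activity precedes EVC activity. As noted above, such a result does not establish causal direction.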

Hippocampal–neocortical interactions measured here through background connectivity are consistent with previous findings in human neurophysiology that link perceptual inference to the synchronization of long-range hippocampal-cortical oscillations^6,7. Because stimulus-evoked responses and coherent spontaneous fluctuations are linearly superimposed in human fMRI data⁸, intrinsic activity within the hippocampus and EVC can be separated from stimulus-evoked responses and other variables^9,10. Whereas correlations in classification of stimulus-evoked responses depend upon the precision of memories and associated predictions represented within each region, background correlations may more directly reflect hippocampal–neocortical interactions themselves. In this way, background connectivity provides a more objective index of hippocampal involvement in action-based predictive coding. By using background connectivity to reveal consolidation-related effects on visual prediction, findings here further develop the link between hippocampal representation^2,39 and models of predictive coding in visual cortex^40,41.
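The separation of intrinsic from stimulus-evoked activity can be sketched as a two-step computation: regress the evoked component out of each region's time series, then correlate the residuals. The example below is a deliberately simplified illustration, assuming a single hypothetical task regressor plus an intercept; a real analysis would typically remove many evoked and nuisance regressors, and the function names here are illustrative.

```python
from math import sqrt


def _mean(xs):
    return sum(xs) / len(xs)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residualize(signal, regressor):
    """Remove an intercept and one regressor by ordinary least squares."""
    ms, mr = _mean(signal), _mean(regressor)
    cov = sum((r - mr) * (s - ms) for r, s in zip(regressor, signal))
    var = sum((r - mr) ** 2 for r in regressor)
    slope = cov / var
    return [(s - ms) - slope * (r - mr) for s, r in zip(signal, regressor)]


def background_connectivity(roi_a, roi_b, evoked):
    """Correlate two ROI time series after removing the evoked component."""
    return pearson(residualize(roi_a, evoked), residualize(roi_b, evoked))
```

If two regions respond to the task with opposite signs but share the same spontaneous fluctuation, their raw correlation can be negative while the residual (background) correlation is strongly positive, which is why the evoked component is removed first.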

In sum, interactions between the hippocampus and EVC, and representations in these areas, strengthen over time for predictive actions relative to nonpredictive actions. Hippocampal prediction may occur by default, based at first on indiscriminate binding of co-occurring stimuli. Time and offline processing may gradually prune weaker associations, in this case ones without informative actions, so that hippocampal reinstatement becomes increasingly specific to predictive events.

Methods

Participants

Twenty-four individuals (19 female, aged 18–33 years) from the Princeton University community participated in the study. Each participant was right-handed and had normal or corrected-to-normal vision. Two additional participants completed the training sessions but did not participate in the fMRI component of the experiment due to below-criterion accuracy on verbal outcome-identification tests prior to the scan. Participants were paid $20 per hour and provided informed consent to a protocol approved by the Princeton University Institutional Review Board.

Stimuli

The primary stimulus set included 24 fractal-like images that were masked to be either square or diamond in shape. An additional 144 unique fractal and phase-scrambled images were included in a localizer to identify V1/V2 voxels reliably responsive to the experimental stimuli. All fractal images were created using ArtMatic Pro (www.artmatic.com). Both square- and diamond-masked stimuli subtended ∼4° of visual angle in diameter on the training/testing laptop computer, and 4.5° in the scanner. We counterbalanced the assignment of images to 3-day delay and no-delay conditions and to sequences containing either predictive or nonpredictive actions, and randomly assigned images to serve as cues or outcomes. The Psychophysics Toolbox⁴² for MATLAB (MathWorks) was used for stimulus presentation and response collection.

First training session (3-day delay)

The first training session was proctored 3 days before the fMRI scan on a laptop computer in a behavioral testing room. As in previous studies involving the same action-based training paradigm, the training session began with an exploratory training phase, followed by a verbal outcome-identification test, and finally a directed training phase^3,43. The exploratory training phase included 320 trials in which a cue stimulus appeared on the computer screen for 1000 ms and then a double-headed arrow appeared below the cue. Participants were allowed an unlimited amount of time for each trial to make either a left button press or a right button press, in order to replace the cue with an outcome stimulus that appeared for 1000 ms. A meter at the bottom of the screen tracked the proportion of left and right button presses throughout the exploratory training phase, and participants were instructed to keep the meter pointer within a specified central zone, in order to roughly equate the frequency of actions and outcomes.

The directed training phase included 160 trials in which the onset of the cue was followed by a single-headed arrow that instructed participants to make a left or right button press for that trial. Directed training was included in order to equate the stimulus frequencies and transitional probabilities of the two outcomes associated with each cue throughout training. For example, if participants responded left more than right during the exploratory training, they were more likely to be instructed to respond right in the directed training.
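The balancing idea in the example above can be made concrete with a small sketch. It is a simplification under stated assumptions: it balances only the left and right press counts for a single cue, whereas the directed phase described here equated outcome frequencies and transitional probabilities. The function name and the per-cue trial counts in the usage note are hypothetical.

```python
def directed_counts(n_left_explore, n_right_explore, n_directed):
    """Split directed trials so total left and right presses end up equal.

    Returns (n_left_directed, n_right_directed). If the exploratory imbalance
    is larger than the directed phase can correct, all directed trials go to
    the under-used side.
    """
    total = n_left_explore + n_right_explore + n_directed
    target_per_side = total / 2
    n_left = round(target_per_side - n_left_explore)
    n_left = max(0, min(n_directed, n_left))  # keep within the available trials
    return n_left, n_directed - n_left
```

For instance, if a participant pressed left 45 times and right 35 times for one cue during exploration, and 40 directed trials were available for that cue, `directed_counts(45, 35, 40)` assigns 15 left and 25 right instructions, giving 60 presses on each side overall.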

Second training session (no delay)

The second training session was also proctored on a laptop computer in a behavioral testing room, but immediately before the fMRI scan and with new cue and outcome stimuli. To minimize interference between stimulus sets from different sessions, we masked one set with squares and the other with diamonds, with the order counterbalanced across participants. The structure of the second training session was identical to the first, with an exploratory training phase, then a verbal outcome-identification test, and finally a directed training phase.

Predictive actions

For half of the sequences within each training session, actions were highly predictive of the outcome. For instance, given the predictive cue A, outcome B appeared with 95% probability when the left button was pressed, and outcome C appeared with just 5% probability. Similarly, when the right button was pressed, outcome C appeared with 95% probability and outcome B appeared with just 5% probability. Within each training session, participants were exposed to two different cue stimuli for which actions were highly predictive of outcomes.

Nonpredictive actions

Randomly intermixed with the predictive action trials, the remaining half of the sequences within each training session contained nonpredictive actions: the two outcomes for each cue appeared with equal probability, irrespective of which button was pressed. That is, given the nonpredictive cue D, outcome E or outcome F appeared with 50% probability when either the left or right button was pressed. Within each training session, participants were exposed to two different cue stimuli for which actions were nonpredictive of outcomes.
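The two contingency types can be summarized as a table of outcome probabilities conditional on the button pressed. The sketch below encodes the 95%/5% and 50%/50% values stated above and samples outcomes from them; the cue and outcome letters follow the examples in the text, while the function names, the seed, and the simulated trial count are illustrative assumptions.

```python
import random

# Probability that the first-listed outcome appears, given the button pressed.
P_FIRST_OUTCOME = {
    "predictive": {"left": 0.95, "right": 0.05},     # cue A -> B or C
    "nonpredictive": {"left": 0.50, "right": 0.50},  # cue D -> E or F
}
OUTCOMES = {"predictive": ("B", "C"), "nonpredictive": ("E", "F")}


def sample_outcome(condition, action, rng):
    """Draw one outcome for a cue of the given condition and button press."""
    first, second = OUTCOMES[condition]
    return first if rng.random() < P_FIRST_OUTCOME[condition][action] else second


def estimate_p_first(condition, action, n_trials=10000, seed=0):
    """Estimate, by simulation, how often the first-listed outcome appears."""
    rng = random.Random(seed)
    first = OUTCOMES[condition][0]
    hits = sum(
        sample_outcome(condition, action, rng) == first for _ in range(n_trials)
    )
    return hits / n_trials
```

With many simulated trials, the estimate for a left press on a predictive cue approaches 0.95, while the estimate for either press on a nonpredictive cue approaches 0.50, so only in the predictive case does the action carry information about the outcome.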

Scan task

The task in the fMRI scanner resembled the training sessions. Participants were instructed to continue to keep track of probabilistic relationships between button presses and fractal pairs while in the scanner and they knew to expect a final set of behavioral tests after the scan. A total of 320 sequence trials were organized into eight 6-min runs. Each run contained sequences from either the first training session or the second training session, alternating between runs. Within each run, four blocks of predictive actions alternated with four blocks of nonpredictive actions. Pairs of runs for each participant contained the same stimuli and block order, while the trial order of the cue stimuli was randomized within and across blocks of predictive or nonpredictive actions. For nonpredictive actions, the trial order of the associated outcomes was also randomized within and across blocks. Each block included five trials and lasted 22.5 s, followed by 18 s of fixation. To match the outcome probabilities during the scan with the trained probabilities, participants were instructed to balance their left and right responses, and one block of predictive actions in each run contained a trial with an incorrectly predicted outcome (modeled separately and excluded from analysis).

As during exploratory training, each trial in the scanner involved three parts: a cue stimulus for 1000 ms, an action prompt consisting of a double-headed arrow below the cue that remained on screen until a button press or until the 1500 ms response window elapsed, and an outcome stimulus for 1000 ms. Participants used a separate response box for each hand to press the left and right buttons. If participants did not press a button within the response window, the cue stimulus and action prompt were replaced with a fixation cross that remained on screen until the next trial.
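The reported design numbers can be cross-checked with simple arithmetic. The sketch below uses only values stated in this section; the derived per-trial slot and the remaining gap are inferences from those values rather than reported parameters, and any time at the start or end of a run beyond the alternating blocks and fixation periods is not modeled.

```python
N_RUNS = 8
BLOCKS_PER_RUN = 8          # four predictive + four nonpredictive blocks
TRIALS_PER_BLOCK = 5
BLOCK_S = 22.5              # block duration in seconds
FIXATION_S = 18.0           # fixation after each block in seconds
CUE_S, RESPONSE_WINDOW_S, OUTCOME_S = 1.0, 1.5, 1.0

# Total trials across the session.
total_trials = N_RUNS * BLOCKS_PER_RUN * TRIALS_PER_BLOCK

# Time accounted for by blocks and fixation within one run.
run_duration_s = BLOCKS_PER_RUN * (BLOCK_S + FIXATION_S)

# Time available per trial within a block (inferred, not reported).
trial_slot_s = BLOCK_S / TRIALS_PER_BLOCK

# Longest possible stimulus sequence, if the full response window is used.
max_stimulus_s = CUE_S + RESPONSE_WINDOW_S + OUTCOME_S

# Minimum time left over in each trial slot (inferred, not reported).
min_gap_s = trial_slot_s - max_stimulus_s
```

These values give 320 trials in total, matching the reported count, and 324 s of blocks plus fixation per run, which fits within the stated 6-min run length. Each trial occupies an inferred 4.5 s slot, of which at most 3.5 s is taken up by the cue, response window, and outcome.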