Retrieval-Based Model Accounts for Striking Profile of Episodic Memory and Generalization

Banino, Andrea; Koster, Raphael; Hassabis, Demis; Kumaran, Dharshan

doi:10.1038/srep31330


Article
Open access
Published: 11 August 2016

Andrea Banino¹, Raphael Koster¹, Demis Hassabis^1,2 & Dharshan Kumaran^1,3

Scientific Reports volume 6, Article number: 31330 (2016)


Abstract

A fundamental theoretical tension exists between the role of the hippocampus in generalizing across a set of related episodes, and in supporting memory for individual episodes. Whilst the former requires an appreciation of the commonalities across episodes, the latter emphasizes the representation of the specifics of individual experiences. We developed a novel version of the hippocampal-dependent paired associate inference (PAI) paradigm, which afforded us the unique opportunity to investigate the relationship between episodic memory and generalization in parallel. Across four experiments, we provide surprising evidence that the overlap between object pairs in the PAI paradigm results in a marked loss of episodic memory. Critically, however, we demonstrate that superior generalization ability was associated with stronger episodic memory. Through computational simulations we show that this striking profile of behavioral findings is best accounted for by a mechanism by which generalization occurs at the point of retrieval, through the recombination of related episodes on the fly. Taken together, our study offers new insights into the intricate relationship between episodic memory and generalization, and constrains theories of the mechanisms by which the hippocampus supports generalization.

Introduction

The hippocampus is widely accepted to play a critical role in episodic memory, the capacity to remember individual experiences from the past (e.g. where one parked the car on a given day)^1,2. However, recent evidence suggests that the hippocampus plays an important role across species in experimental paradigms where successful performance depends on exploiting the commonalities present across multiple related experiences^{3,4,5,6,7,8,9,10,11,12,13,14,15,16}. These findings provoke fundamental questions about the nature of hippocampal representations, and about how the putative mechanisms by which the hippocampus might support generalization in such scenarios fit with its well-established role in episodic memory.

Two classes of mechanisms have been proposed to account for the role of the hippocampus in generalization. Firstly, “encoding-based” models^11,17,18 argue that the hippocampus integrates together related experiences (e.g. two object pairs A-B & B-C) at the point of encoding, resulting in the formation of blended representations (e.g. A-B-C: linking A, B and C) that directly support generalization at the time of test. In contrast, “retrieval-based” models¹⁹ (also see ref. 20) seek to retain a computational principle that is viewed to be critical to the functioning of an episodic memory system: pattern separation^{21,22,23,24,25,26,27,28,29}, whereby even related experiences result in the formation of orthogonalized representations in the hippocampus during the study phase. Generalization, according to this view, represents an emergent phenomenon whereby multiple related episodic memory traces (e.g. coding for experiences A-B, and B-C) are re-activated and recombined within a memory space dynamically shaped at the point of retrieval (i.e. during test trials). Which of these two very different mechanisms provides a more accurate account of the hippocampal contribution to generalization remains an open question based on the empirical evidence to date.

In this study, we employed a prototypical paradigm widely used to study the role of the hippocampus in generalization – the paired associate inference (PAI) task^3,4,5,30,31 – to address these issues. Whilst the PAI paradigm has been used to test the mechanisms underlying generalization, the relationship between episodic memory for the individual associative experiences (i.e. A-B, B-C object pairs) and ability to generalize (i.e. appreciate the indirect relationship between A and C objects) has not been studied before. Here we develop an adapted version of the task that allowed us to obtain measures of participants’ episodic memory for the individual experiences as well as their ability to generalize (see Fig. 1). Importantly, the two classes of mechanisms discussed above suggest divergent predictions about the nature of the relationship between episodic memory and generalization: encoding-based models hold that the ability to generalize depends on blended representations (i.e. A-B-C) that imply a concomitant loss of episodic memory. In contrast, better memory for the individual episodes is associated with superior generalization performance in retrieval-based models, because robust reactivation of pattern separated episodic traces facilitates episodic recombination mediated by a recurrent mechanism. We conducted several studies to explore the effects of different experimental conditions – for example, whether participants knew during the study phase of the task that their ability to generalize would subsequently be tested – on the relationship between episodic memory and generalization.

Results

For all experiments we report the results in a Bayesian statistical framework (see Methods for an explanation of the advantages of Bayesian statistics over null hypothesis significance testing in the context of our experiments; see also e.g.^32,33). Notably, the posterior distributions were summarised using the highest density interval (HDI), which is defined as the interval that covers a certain percentage of the distribution (in our case 95%) in such a way that every point inside the interval has a higher credibility than any point outside it. The HDI is then used to obtain the set of credible values for a given parameter, which can be used to make unbiased decisions about that parameter’s value. For instance, if the HDI of the estimated posterior distribution of a parameter includes 0, then it is possible to infer that the parameter is not credibly different from 0. A similar concept in null hypothesis significance testing (NHST) is the confidence interval (CI); however, this does not describe a probability distribution over parameter values, but merely defines two end points, which are based on the hidden intentions of the experimenter and not on prior knowledge (cf. stated prior distribution)^32,33. We used standard methods for performing approximate Bayesian inference (i.e. Markov chain Monte Carlo (MCMC) algorithms: see Methods for details), which are suited to situations where calculating the exact posterior distribution is intractable. For all the models (across experiments) visual inspection of both trace and density plots revealed an almost perfect overlap between chains. The Gelman-Rubin test³⁴ confirmed the convergence of the chains for each parameter (mean values were 1.00 for all parameters). Finally, the analysis of the autocorrelation function (ACF) and effective sample size (ESS)³⁵ indicated that the MCMC samples were sufficiently large to yield accurate estimates of the posterior distributions. In addition, for clarity, we also report the results of a classical statistical analysis for Experiment 1 (see Supplementary Information).
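As an illustration of the summaries described above, the following is a minimal sketch of how a 95% HDI and the classic Gelman-Rubin statistic can be computed from posterior draws. It is not the authors' analysis code: the function names are hypothetical, the draws are synthetic rather than the study's data, and the Gelman-Rubin formula shown is the textbook (non-split) version.

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))            # draws that must fall inside the interval
    widths = x[k - 1:] - x[:n - k + 1]    # width of every window of k sorted draws
    i = int(np.argmin(widths))            # the narrowest window is the HDI
    return float(x[i]), float(x[i + k - 1])

def credibly_different_from_zero(samples, mass=0.95):
    """True when zero lies outside the HDI of the draws."""
    lo, hi = hdi(samples, mass)
    return not (lo <= 0.0 <= hi)

def gelman_rubin(chains):
    """Classic potential scale reduction factor for an (m chains, n draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)    # B: between-chain variance
    pooled = (n - 1) / n * within + between / n      # pooled variance estimate
    return float(np.sqrt(pooled / within))

# Synthetic draws (assumed values, for illustration only).
rng = np.random.default_rng(0)
ab = rng.normal(0.93, 0.010, size=20_000)
bc = rng.normal(0.89, 0.010, size=20_000)
diff_hdi = hdi(ab - bc)   # HDI of the difference between two posteriors
```

A difference between conditions is treated as credible when zero falls outside `diff_hdi`, and a Gelman-Rubin value close to 1.00 indicates that the chains overlap.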

Experiment 1

During choice test trials (see Methods), participants were required to select which of two objects was associated with the probe object (e.g. whether B1 or B2 was associated with A1, in an AB choice test trial: see Fig. 1 and Methods). Their performance was highly proficient on AB trials [95% HDI from 0.91 to 0.95, mean of HDI = 0.93], and BC trials [95% HDI from 0.87 to 0.91, mean of HDI = 0.89] (see Fig. 2). Notably, they were also successful on choice test trials where an inference was required: AC trials [95% HDI from 0.81 to 0.86, mean of HDI = 0.83]. Interestingly, participants’ performance on AB choice trials was credibly better than on BC choice trials (i.e. zero was not among the credible values of the difference between the relevant posterior distributions) [AB − BC: 95% HDI from 0.01 to 0.07, mean of HDI = 0.04].

Critically, our experimental design included source test trials, which followed choice test trials and specifically probed the episodic nature of participants’ underlying representations by asking them to judge whether the two relevant items (e.g. A1 and B1 on an AB trial) were “directly” associated (i.e. had been presented together during the study phase) or “indirectly” associated (i.e. were related through an intervening item: e.g. A1 and C1 on an AC trial: see Methods). In contrast, participants’ responses during choice test trials assessed their memory for whether objects were associated with one another (i.e. part of the same triplet), regardless of whether they had been presented together as part of a single episode (e.g. A1-B1) or not (e.g. A1-C1) (see Supplemental Information for further details on the distinction between choice and source trials).

Our results revealed a striking asymmetry between performance on AB and BC trials. Whilst participants performed successfully on AB source trials [95% HDI from 0.79 to 0.88, mean of HDI = 0.83], their performance on BC source trials was not far above chance level [95% HDI from 0.52 to 0.62, mean of HDI = 0.57]. Performance on AC source trials had a mean of the HDI equal to 0.69 [95% HDI from 0.65 to 0.73].

Notably, performance on both AB source trials (see above) and AC source trials was credibly different from performance on BC source trials: for both comparisons the value of zero was not among the credible values of the HDI [AB − BC: 95% HDI from 0.18 to 0.32, mean of HDI = 0.26; BC − AC: 95% HDI from −0.18 to −0.05, mean of HDI = −0.11].

This profile of results was also reflected in our analysis of the reaction time (RT) data. Note that the data are reported on the original scale after the log-posterior distribution had been transformed. Participants were faster to respond on AB choice trials [95% HDI from 2.62 seconds to 2.98 seconds, mean of HDI = 2.80 seconds] than on both the BC choice trials [95% HDI from 3.35 seconds to 3.75 seconds, mean of HDI = 3.55 seconds] and AC choice trials [95% HDI from 4.45 seconds to 4.95 seconds, mean of HDI = 4.70 seconds] (see Fig. 3). The direct comparison between these conditions showed that participants’ RT in AB choice trials were credibly different from both BC and AC choice trials [AB − BC: 95% HDI from −1.02 seconds to −0.48 seconds, mean of HDI = −0.75 seconds; AB − AC: 95% HDI from −2.21 seconds to −1.60 seconds, mean of HDI = −1.90 seconds]. RTs on BC choice trials were credibly faster than on AC trials [BC − AC: 95% HDI from −1.48 seconds to −0.84 seconds, mean of HDI = −1.15 seconds].

We also analyzed the RT data relating to source judgments: this revealed a similar profile. Specifically, participants were faster on AB source trials than on BC source trials [AB − BC: 95% HDI from −0.20 seconds to −0.09 seconds, mean of HDI = −0.15 seconds]. RTs during AB source trials were also credibly lower than during AC source trials [AB − AC: 95% HDI from −0.13 seconds to −0.03 seconds, mean of HDI = −0.08 seconds]. Participants were also slower to respond on BC source trials than on AC trials [BC − AC: 95% HDI from 0.01 seconds to 0.12 seconds, mean of HDI = 0.07 seconds].

Logistic regression

The results described above provide evidence for a striking asymmetry in participants’ episodic memory – indexed by the performance on source test trials – whereby memory for A-B experiences is credibly stronger than memory for B-C experiences, the latter being close to chance levels. To explore the relationship between episodic memory (i.e. indexed by source test trial performance) and generalization/inference (i.e. A-C choice test trial performance), we conducted a logistic regression analysis that included these critical variables (see Methods).

The regressors coding for performance on AB source trials and BC source trials were credibly positively related to performance on AC choice trials [β_ABsource: 95% HDI from 0.37 to 1.03, mean of HDI = 0.66; β_BCsource: 95% HDI from 0.33 to 1.13, mean of HDI = 0.70] with all the other predictors held constant (see Fig. 4 for posterior odds ratios). The magnitude of the AB and BC predictors was similar, suggesting that they contribute equally to the probability of making a correct inference on A-C choice test trials. Indeed, the comparison of their posterior distributions showed no credible difference between the magnitude of their coefficients [β_ABsource − β_BCsource: 95% HDI from −0.47 to 0.47, mean of HDI = 0.02]. The predictor coding for BC choice trials also showed a positive relationship with AC choice performance [β_BCchoice: 95% HDI from 0.21 to 1.23, mean of HDI = 0.72], but this effect was not credibly different from that of the predictor coding for BC source [β_BCchoice − β_BCsource: 95% HDI from −0.42 to 0.47, mean of HDI = 0.02]. As such, our findings imply that better episodic memory for both A-B and B-C trials is associated with superior A-C performance – a result that points towards retrieval-based models of inference (cf. encoding-based models: see Discussion).
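To make the link between the regression coefficients and the posterior odds ratios concrete, here is a small sketch of the standard conversion for a logistic model: a coefficient β corresponds to an odds ratio of exp(β) per unit increase in the predictor. The function names are hypothetical, and how the predictors were scaled in the actual analysis is not stated here, so "one unit" is an assumption.

```python
import math

def odds_ratio(beta):
    """Multiplicative change in the odds of a correct AC choice per unit of the predictor."""
    return math.exp(beta)

def shifted_probability(p_base, beta, delta=1.0):
    """Probability after raising the predictor by `delta`, starting from probability `p_base`."""
    log_odds = math.log(p_base / (1.0 - p_base)) + beta * delta
    return 1.0 / (1.0 + math.exp(-log_odds))

# Using the posterior mean reported for the AB source predictor in Experiment 1.
or_ab_source = odds_ratio(0.66)                 # roughly 1.93
p_after = shifted_probability(0.5, 0.66)        # roughly 0.66 when starting from 0.5
```

Applying `odds_ratio` to each posterior draw of β, and then summarising the transformed draws, gives a posterior distribution over odds ratios.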

Experiment 2

In Experiment 1, participants were instructed prior to the experiment that they would need to complete generalization trials (i.e. AC) and the experimental schedule was organized into study-test cycles. In Experiment 2, we asked whether a similar profile of results would be observed under different experimental conditions where participants were not aware during the study phase that their ability to generalize would be subsequently tested. Specifically, in this experiment participants completed all encoding sessions before test, and were in fact instructed to keep the object pairs (e.g. A1-B1, B1-C1) experienced during the study phase as separate as possible to avoid interference (see Methods). Despite these considerable differences in the set-up of the experimental paradigm, we observed a similar profile of results (see Fig. 2).

Participants’ performance on AB, BC and AC choice test trials was: [AB trials 95% HDI from 0.86 to 0.93, mean of HDI = 0.89; BC trials 95% HDI from 0.77 to 0.83, mean of HDI = 0.80; AC trials 95% HDI from 0.62 to 0.69, mean of HDI = 0.66]. The posterior mean comparison tests again revealed that participants’ performance on AB choice trials was credibly better than on both BC choice trials [95% HDI from 0.04 to 0.14, mean of HDI = 0.09] and AC choice trials [95% HDI from 0.10 to 0.19, mean of HDI = 0.15] (see Fig. 2).

Performance on source test trials also showed a clear asymmetry between AB and BC test trials: [AB trials 95% HDI from 0.67 to 0.77, mean of HDI = 0.72; BC trials 95% HDI from 0.45 to 0.54, mean of HDI = 0.50]. Performance on AC source trials was slightly above chance [95% HDI from 0.51 to 0.58, mean of HDI = 0.54]. Notably, a mean comparison analysis revealed that performance on AB source trials was credibly different from that on both BC and AC source trials: for both comparisons zero was not among the credible values of the HDI [AB − BC: 95% HDI from 0.16 to 0.30, mean of HDI = 0.23; AB − AC: 95% HDI from 0.12 to 0.24, mean of HDI = 0.18].

RT analysis during choice trials revealed that participants were faster on AB choice trials [95% HDI from 1.81 seconds to 2.01 seconds, mean of HDI = 1.91 seconds] than on both BC choice trials [95% HDI from 2.05 seconds to 2.26 seconds, mean of HDI = 2.16 seconds] and AC choice trials [95% HDI from 2.49 seconds to 2.76 seconds, mean of HDI = 2.62 seconds] (see Fig. 3). Participants were credibly faster in responding to AB choice trials (cf. BC and AC trials): [AB − BC: 95% HDI from −0.39 seconds to −0.10 seconds, mean of HDI = −0.25 seconds; AB − AC: 95% HDI from −0.87 seconds to −0.55 seconds, mean of HDI = −0.71 seconds]. Additionally, RT on BC choice trials was credibly faster than AC trials [BC − AC: 95% HDI from −0.63 seconds to −0.29 seconds, mean of HDI = −0.46 seconds].

Lastly, the analysis of source RT data revealed that participants were credibly faster on AB source trials than on both BC and AC trials [AB − BC: 95% HDI from −0.09 seconds to 0 seconds, mean of HDI = −0.05 seconds; AB − AC: 95% HDI from −0.13 seconds to −0.03 seconds, mean of HDI = −0.08 seconds]. No difference was found between BC and AC source trials [BC − AC: 95% HDI from −0.09 seconds to 0.02 seconds, mean of HDI = −0.04 seconds].

Logistic regression

As in experiment 1, in a logistic regression analysis where AC test trial performance was the dependent variable, the predictors coding for performance on AB source trials and BC source trials were credibly different from zero: [β_ABsource: 95% HDI from 0.31 to 0.87, mean of HDI = 0.59; β_BCsource: 95% HDI from 0.17 to 0.68, mean of HDI = 0.42] with all the other predictors held constant (see Fig. 4 for posterior odds ratios). It is worth noting that although the coefficient for AB source performance was numerically higher than the coefficient for BC source performance, a comparison of their posterior distributions revealed that they were not credibly different [β_ABsource − β_BCsource: 95% HDI from −0.24 to 0.58, mean of HDI = 0.16]. Furthermore, the predictor coding for BC choice showed a positive relationship with AC choice [β_BCchoice: 95% HDI from 0.36 to 0.55, mean of HDI = 0.46]; a further comparison revealed that this predictor was not credibly different from the predictor coding for BC source [β_BCchoice − β_BCsource: 95% HDI from −0.33 to 0.47, mean of HDI = 0.04]. These findings provide further support for the view that better episodic memory – even for BC trials where source performance was found to be at chance level across participants – is associated with superior generalization on AC choice test trials.

Experiment 3

This experiment followed the structure of experiment 1 (i.e. encoding-test cycles), with the exception that a novel/familiar scene was presented before each BC trial (see Methods). The rationale for this manipulation was that there are theoretical proposals³⁶ and empirical evidence³⁷ that novelty/familiarity biases the hippocampal system towards an encoding/retrieval mode, respectively. A hypothesis consequent on this perspective in the PAI paradigm is that BC trials which are preceded by a novel scene may result in relatively more pattern separated representations for the relevant AB and BC experiences, and therefore a preservation of AB and BC episodic memory (i.e. indexed by source memory trials). In comparison, BC trials preceded by a familiar scene may trigger pattern completion (i.e. of the initial AB experience), integration/blending, and therefore relatively worse episodic memory for AB and BC experiences.

However, in our experiment we did not find credible effects of the novelty/familiarity manipulation – specifically, the familiarity factor (β_2[k]) and its interaction with the trial category factor (β_1x2[j,k]) were not credibly different from zero in any of the relevant analyses. As such, the results we report focus on the difference between trial types, effectively collapsing across the novelty/familiarity factor (see Methods), and therefore serve largely to replicate the findings of experiment 1.

Participants’ performance on choice test trials was: AB trials [95% HDI from 0.89 to 0.94, mean of HDI = 0.91], BC trials [95% HDI from 0.84 to 0.91, mean of HDI = 0.88], and AC trials [95% HDI from 0.75 to 0.85, mean of HDI = 0.80]. In this experiment participants’ choice performance on AB was numerically higher than that on BC trials, but not credibly different [AB − BC: 95% HDI from 0 to 0.08, mean of HDI = 0.04] (see Fig. 2).

Source data analysis confirmed the marked asymmetry between performance on AB and BC trials: [AB: 95% HDI from 0.73 to 0.84, mean of HDI = 0.79; BC trials 95% HDI from 0.50 to 0.68, mean of HDI = 0.59]. In contrast to BC source performance which did not differ from chance levels, performance on AC source trials was well above chance [95% HDI from 0.63 to 0.76, mean of HDI = 0.69]. Performance on AB source trials was credibly different from performance on BC source trials [AB − BC: 95% HDI from 0.18 to 0.32, mean of HDI = 0.26] (see Fig. 2).

Analysis of reaction times showed that participants were faster on AB choice trials [95% HDI from 2.07 seconds to 2.27 seconds, mean of HDI = 2.17 seconds] than on both BC choice trials [95% HDI from 2.32 seconds to 2.55 seconds, mean of HDI = 2.43 seconds] and AC choice trials [95% HDI from 2.81 seconds to 3.05 seconds, mean of HDI = 2.93 seconds]. Further, participants were credibly faster on AB choice trials than on both BC and AC choice trials [AB − BC: 95% HDI from −0.47 seconds to −0.05 seconds, mean of HDI = −0.26 seconds; AB − AC: 95% HDI from −0.98 seconds to −0.52 seconds, mean of HDI = −0.75 seconds], and RTs on BC choice trials were credibly lower than on AC trials [BC − AC: 95% HDI from −0.72 seconds to −0.24 seconds, mean of HDI = −0.49 seconds] (see Fig. 3).

Analysis of RT source data showed a similar pattern: participants were credibly faster on AB source trials than both BC and AC trials [AB − BC: 95% HDI from −0.21 seconds to −0.07 seconds, mean of HDI = −0.14 seconds; AB − AC: 95% HDI from −0.39 seconds to −0.25 seconds, mean of HDI = −0.32 seconds]. Also, participants were faster on BC source trials than on AC trials [BC − AC: 95% HDI from −0.24 seconds to −0.12 seconds, mean of HDI = −0.18 seconds].

Logistic regression

The categorical predictor coding for familiarity (β_2[k]) was not credibly different from zero and hence we considered only the predictors coding for the different trial types. Both predictors for AB and BC source performance were positively related to AC test trial performance, as in the previous two experiments: [β_ABsource: 95% HDI from 0.44 to 1.12, mean of HDI = 0.76; β_BCsource: 95% HDI from 0.10 to 1.03, mean of HDI = 0.49] with all the other predictors held fixed (see Fig. 4 for posterior odds ratios). No credible difference was found between their magnitudes: [β_ABsource − β_BCsource: 95% HDI from −0.27 to 0.88, mean of HDI = 0.30]. Finally, the predictor coding for BC choice trials showed a positive relationship with AC trial performance [β_BCchoice: 95% HDI from 0.14 to 1.18, mean of HDI = 0.67], but this was not credibly different from the BC source predictor [β_BCchoice − β_BCsource: 95% HDI from −0.30 to 0.57, mean of HDI = 0.19], as in the previous experiments.

Recognition memory

Participants showed above-chance recognition memory for novel scenes [Mean corrected hit rate = 29%, SD = 16%]. We also tested participants’ performance on the main experimental test trials as a function of their subsequent recognition memory for test trials preceded by novel scenes. This analysis revealed no difference between choice test trials in which participants subsequently recognized the novel scene and trials in which they did not [AB_hit − AB_miss: 95% HDI from −2.54 to 2.68, mean of HDI = 0.07; BC_hit − BC_miss: 95% HDI from −2.55 to 2.60, mean of HDI = 0.06; AC_hit − AC_miss: 95% HDI from −2.53 to 2.64, mean of HDI = 0.07; where “hit” and “miss” denote whether the scene was recognized or not]. The same pattern of results held for source test trials [AB_hit − AB_miss: 95% HDI from −2.07 to 2.42, mean of HDI = 0.17; BC_hit − BC_miss: 95% HDI from −2.00 to 2.47, mean of HDI = 0.17; AC_hit − AC_miss: 95% HDI from −2.06 to 2.41, mean of HDI = 0.16].
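The corrected hit rate is not defined in this section. A common convention, assumed here and not confirmed by the text, is the hit rate for previously seen scenes minus the false alarm rate for unseen scenes. Under that assumption, a minimal sketch (with a hypothetical function name and made-up counts) is:

```python
def corrected_hit_rate(hits, n_old, false_alarms, n_new):
    """Hit rate minus false alarm rate (assumed definition of the corrected hit rate)."""
    if n_old <= 0 or n_new <= 0:
        raise ValueError("n_old and n_new must be positive")
    return hits / n_old - false_alarms / n_new

# Made-up counts: 30 of 50 old scenes recognised, 10 of 50 new scenes wrongly endorsed.
example = corrected_hit_rate(30, 50, 10, 50)
```

With this definition a value of zero corresponds to chance, which is why a positive mean corrected hit rate is read as above-chance recognition.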

Experiment 4

Our findings across three experiments show a consistent profile, marked by a clear asymmetry between performance on AB and BC source test trials. In this follow-up experiment (see Supplemental Information and Fig. S1 for details) – in which AC test trials were omitted for half of the triplets (i.e. there were only AB and BC test trials) – we considered and excluded the possibility that this effect might be driven by learning occurring during AC test trials, which by design preceded AB and BC test trials (see Methods).

Computational Modelling

We next sought to provide a mechanistic account of the striking profile of behavioural findings observed across the three experiments. In particular, we pitted a retrieval-based model of generalization¹⁹ directly against encoding-based mechanisms^11,17,18 that were constructed within the same connectionist framework (see Fig. 5). Intuitively, these models represent a highly simplified abstraction of the processing within the hippocampal system (e.g.¹⁹): broadly, the feature layer, where units denote individual objects (e.g. object A1), can be related to the entorhinal cortex, and the conjunctive layer, where units are conjunctive (e.g. A1B1 unit), to the CA3 region of the hippocampus. The conjunctive units implement an idealization of the notion of pattern separated representations for overlapping episodes (e.g. A1B1, B1C1 – see Fig. 5). The retrieval-based REMERGE model implements a principle of big-loop recurrence within the hippocampal system, whereby the output of the system can be fed back in as a new input, through bidirectional excitatory connections between the feature and conjunctive layer¹⁹. This allows generalization, even between distantly related experiences, to occur through a memory space that is dynamically constructed at the point of retrieval. In contrast, encoding-based models^11,17,18 do not have recurrent connections, but implement the notion that blended/integrated representations that span episodes are formed during encoding through units on the conjunctive layer (e.g. A1B1C1). We examine variants of both classes of models that additionally encompass the assumption of proactive interference during BC encoding.

**Figure 5: Schematic of computational models used to simulate performance in the PAI task: 2 triplets illustrated (A₁B₁C₁, A₂B₂C₂).**

Model Architectures

REMERGE model

As shown in Fig. 5, units in the conjunctive layer (e.g. A₁B₁, B₁C₁) correspond to object pairs (e.g. A₁-B₁, B₁-C₁) presented during the study phase. Two triplets (i.e. A₁-B₁-C₁, A₂-B₂-C₂) are shown^15,19. The feature layer is connected to the conjunctive layer by bidirectional excitatory connections, and individual units denote individual objects (e.g. A₁, B₁). External input is presented to the feature layer (e.g. A₁, C₁, C₃ on an AC test trial). Activity of feature units was determined by a logistic function. Processing in the network continued through an iterative, constraint satisfaction-style process, mediated by the recurrent connections. The curved arrow indicates inhibitory competition between conjunctive units, implemented by the standard softmax function. The conjunctive layer connects to the response layer (not shown in figure) through feedforward weights that are the same as the feature-conjunctive weights, such that activation of the A₁B₁ conjunctive unit drives choice of the B₁ object. A softmax function operating on response unit activities determined the network’s choice in a given trial. For a more detailed description of the model see ref. 19; a schematic description of its operating principles is given in ref. 15. To capture the asymmetry between AB and BC test trial performance, we made the assumption that proactive interference occurred during BC encoding, resulting in a slight decrement in weight strength (see below).
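The architecture described above can be illustrated with a toy network. This is a simplified sketch and not the published REMERGE implementation: the weight values, temperature, number of iterations, and the leaky update rule with its `rate` constant are all assumptions chosen for illustration, and only two triplets are modelled.

```python
import numpy as np

FEATURES = ["A1", "B1", "C1", "A2", "B2", "C2"]
PAIRS = [("A1", "B1"), ("B1", "C1"), ("A2", "B2"), ("B2", "C2")]  # conjunctive units

def softmax(z, temp):
    z = np.asarray(z, dtype=float) / temp
    e = np.exp(z - z.max())
    return e / e.sum()

def logistic(z, temp):
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float) / temp))

def build_weights(w_ab=1.0, bc_reduction=0.1):
    """Feature x conjunctive weights; BC weights are slightly weaker (proactive interference)."""
    W = np.zeros((len(FEATURES), len(PAIRS)))
    for j, (f1, f2) in enumerate(PAIRS):
        w = w_ab if f1.startswith("A") else w_ab - bc_reduction
        W[FEATURES.index(f1), j] = w
        W[FEATURES.index(f2), j] = w
    return W

def remerge_choice(probe, options, W, temp=0.25, n_iter=50, rate=0.2):
    """Choice probabilities over `options` after recurrent settling (assumed dynamics)."""
    ext = np.zeros(len(FEATURES))
    for name in [probe, *options]:
        ext[FEATURES.index(name)] = 1.0        # external input to the feature layer
    feat_net = np.zeros(len(FEATURES))
    conj_net = np.zeros(len(PAIRS))
    for _ in range(n_iter):
        conj = softmax(conj_net, temp)         # competition between conjunctive units
        feat = logistic(feat_net, temp)        # logistic feature activity
        feat_net += rate * (ext + W @ conj - feat_net)   # conjunctive -> feature feedback
        conj_net += rate * (W.T @ feat - conj_net)       # feature -> conjunctive drive
    conj = softmax(conj_net, temp)
    resp_net = W @ conj                        # response weights mirror feature weights
    idx = [FEATURES.index(o) for o in options]
    return dict(zip(options, softmax(resp_net[idx], temp)))

W = build_weights()
p_ab = remerge_choice("A1", ["B1", "B2"], W)   # direct (AB) choice trial
p_ac = remerge_choice("A1", ["C1", "C2"], W)   # inference (AC) choice trial
```

On the AC trial no conjunctive unit links A₁ and C₁ directly; the preference for C₁ arises because activity on the A₁B₁ unit feeds back to the B₁ feature unit, which in turn supports the B₁C₁ unit over successive iterations.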

The 3 free parameters in this model used to simulate choice test trial performance were: i) the magnitude of weights between AB conjunctive units (e.g. A₁B₁) and A and B feature layer units (e.g. A₁, B₁); ii) the reduction in weight strength between BC conjunctive units and B and C feature units; iii) the network temperature used across feature, conjunctive and response layers.

Following previous work²⁷, we simulated performance on source test trials through a measure that captured the amount of mismatching activity present on the feature layer (x in equation (1)), where the presence of mismatching recall provides evidence that an input pattern has not actually been experienced before. Intuitively, an A₁C₁ trial would induce significant activation of the B₁ feature unit through recurrence, leading the network to make an “indirect” response. As such, two additional free parameters, common to all 4 models, were included: iv) a threshold value against which the amount of mismatching activity was compared (thresh); v) τ_source, a temperature specific to source judgments (see equation (1)).
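Equation (1) is not reproduced in this section, so its exact form is not known here. One plausible reading, offered only as an assumption, is that the mismatch x is compared against thresh and passed through a logistic function scaled by τ_source to give the probability of an “indirect” response. A sketch under that assumption, with a hypothetical function name:

```python
import math

def p_indirect(mismatch, thresh, tau_source):
    """Assumed form: logistic of (mismatch - thresh) scaled by the source temperature."""
    if tau_source <= 0:
        raise ValueError("tau_source must be positive")
    return 1.0 / (1.0 + math.exp(-(mismatch - thresh) / tau_source))

# Illustrative values only: more mismatching activity makes "indirect" more likely.
low = p_indirect(0.1, thresh=0.3, tau_source=0.1)
high = p_indirect(0.6, thresh=0.3, tau_source=0.1)
```

In this form a larger τ_source flattens the function, so that source judgments move towards chance regardless of the amount of mismatching activity.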

BLEND models

These models were implemented as described for the REMERGE model (e.g. logistic function on feature layer etc), except where stated. Importantly, these were feedforward models (i.e. feature→ conjunctive, with no recurrence) designed to implement the principle of integrated/blended representations of encoding-based models through conjunctive units (e.g. A₁B₁C₁) which were connected to all 3 feature units for a given triplet (i.e. A₁, B₁, C₁). Note that although recurrence was not present in these blend networks, one backward pass (from conjunctive→ feature layer) was allowed to compute the level of mismatching activity on the feature layer (i.e. as in ref. 27). In order of increasing complexity (see Fig. 5):

Blend_1 model has 4 free parameters: the three denoted (iii), (iv) and (v) in REMERGE above, plus a free parameter specifying the magnitude of feedforward weights from all 3 feature units (i.e. A₁, B₁, C₁) to the “blended” conjunctive unit (e.g. A₁B₁C₁).

Blend_2 model incorporates additional AB and BC units in the conjunctive layer (i.e. as well as the integrated ABC unit), as if there are pattern separated representations for individual episodes as well as a blended representation. It has the 4 free parameters of blend_1 plus one additional parameter (i.e. parameter (i) of REMERGE above).

Blend_3 model incorporates the 5 free parameters of Blend_2, plus an additional parameter to allow for the assumption of proactive interference during BC encoding (i.e. parameter (ii) of REMERGE above: see Fig. 5).
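The parameter sets described above can be tabulated as follows. This is a bookkeeping sketch using hypothetical identifiers: the labels (i) to (v) follow the text, and `blend` stands for the feature-to-blended-unit weight parameter introduced for Blend_1.

```python
PARAMETERS = {
    "i": "magnitude of AB conjunctive-feature weights",
    "ii": "reduction in BC weight strength (proactive interference)",
    "iii": "network temperature",
    "iv": "mismatch threshold (thresh)",
    "v": "source temperature (tau_source)",
    "blend": "magnitude of feature-to-blended-unit (ABC) weights",
}

MODELS = {
    "REMERGE": ["i", "ii", "iii", "iv", "v"],
    "blend_1": ["iii", "iv", "v", "blend"],
    "blend_2": ["iii", "iv", "v", "blend", "i"],
    "blend_3": ["iii", "iv", "v", "blend", "i", "ii"],
}

def n_free_parameters(model):
    """Number of free parameters for a named model."""
    return len(MODELS[model])

counts = {name: n_free_parameters(name) for name in MODELS}
```

Counted this way, REMERGE and Blend_2 each have 5 free parameters, Blend_1 has 4, and Blend_3 has 6, which matters for any fit index that penalises model complexity.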

Model Results

Given the highly similar profiles of the choice and source data in experiments 1–3, we focus here on simulating the observed data in experiment 1. We performed a hyperparameter sweep for all networks (i.e. REMERGE and the 3 versions of the blend model), to identify the parameters that best fit the empirical data in terms of the choice and source data across the group of subjects. We report the results of the relevant indices (averaged across subjects), i.e. the negative log likelihood (NLL) and the Bayesian information criterion (BIC)³⁸, in Table 1 (see below).
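For reference, the standard definition of the BIC in terms of the NLL is BIC = k ln(n) + 2 NLL, where k is the number of free parameters and n the number of observations. The sketch below applies that definition; the function names are hypothetical and the numbers are made up, not the values in Table 1.

```python
import math

def bic(nll, n_params, n_obs):
    """Bayesian information criterion from a negative log likelihood."""
    return n_params * math.log(n_obs) + 2.0 * nll

def best_by_bic(fits, n_obs):
    """Return the name of the model with the lowest BIC, plus all scores.

    `fits` maps model name -> (negative log likelihood, number of free parameters).
    """
    scores = {name: bic(nll, k, n_obs) for name, (nll, k) in fits.items()}
    return min(scores, key=scores.get), scores

# Made-up fits for illustration: equal NLL, different numbers of parameters.
winner, scores = best_by_bic({"model_a": (100.0, 5), "model_b": (100.0, 6)}, n_obs=200)
```

Lower BIC values indicate a better trade-off between fit and complexity, so of two models with the same NLL, the one with fewer free parameters is preferred.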

Table 1 Summary of model fits.

As shown in Fig. 6, REMERGE – as described in ref. 19, but incorporating the assumption of a degree of proactive interference during BC encoding through a slight weight asymmetry (see Methods) – was able to reproduce the key features of the empirical data: i.e. the slightly superior AB (cf. BC) choice performance, and the striking asymmetry between AB and BC source judgments, with the latter being near chance levels. REMERGE also produced the best quantitative fit (see Table 1), with the BIC index providing evidence in favour of it over the next best fitting model (i.e. blend_3)³⁹. Notably, the simplest interpretation of the encoding-based hypothesis – the blend_1 model – where a single integrated representation (i.e. represented by a single ABC unit) is formed during encoding of the overlapping A-B and B-C pairs, provided a poor qualitative and quantitative fit to both choice and source data (see Fig. 6 and Table 1). The only blend model that produced a reasonable qualitative and quantitative fit to the data (i.e. blend_3) can be considered a specific instantiation of an encoding-based model, since it incorporated both the notion of proactive interference during encoding, and maintained individual representations for AB and BC episodes (see Discussion).

Discussion

Despite rising evidence that the hippocampus plays an important role in certain types of generalization^13,15, it remains unclear whether this is primarily a result of representations formed at the stage of encoding^11,17,18 or mechanisms operating at the point of retrieval (i.e. at test)¹⁴. Here we took the PAI paradigm^3,4,5,30,31,40, a widely used hippocampal-dependent task, and modified it to enable us to study the relationship between episodic memory and generalization. Through combining computational modeling with this richer behavioral dataset, we aimed to reveal the underlying mechanisms that support generalization.

Results across 4 separate studies involving different experimental conditions demonstrated that participants’ ability to generalize (i.e. perform inferences on AC choice test trials) was associated with a surprising loss of episodic memory. Whilst participants’ performance was superior on AB as compared to BC choice test trials, the asymmetry in terms of episodic memory performance was much more pronounced: performance on BC source judgments was near chance levels, with a relative preservation of AB episodic memory. Interestingly, however, despite the poor episodic memory overall on BC trials, logistic regression analyses across all experiments (Fig. 4) revealed that better BC (and indeed AB) episodic memory was associated with a superior capacity for generalization.

Computational modeling demonstrated the ability of the retrieval-based model REMERGE to provide the best qualitative and quantitative fit to the empirical data among the models tested (see Fig. 6 and Table 1). Indeed, the finding from the logistic regression that superior episodic memory for BC (and AB) pairs relates to better generalization performance is also consistent with a retrieval-based model of generalization – because robust reactivation of pattern separated episodic traces facilitates episodic recombination mediated by a recurrent mechanism (see ref. 19 for simulation of this effect). Notably, the only alteration to the original model¹⁹ was the assumption of a degree of proactive interference during BC encoding, resulting in weaker encoding of BC, as compared to AB, object pairs. This notion of proactive interference – which relates to a wider literature on the associative learning of overlapping pairs^41,42,43 – was implemented in the model as slightly stronger weights between the A and B feature units and the AB conjunctive unit, as compared to the weights between the B and C feature units and the BC conjunctive unit (see Methods).

In contrast, the simplest model incorporating the notion of the formation of blended/integrated representations during encoding^11,13,17 (i.e. coded by a single ABC unit in the blend_1 model), provided a poor fit to the qualitative profile of the data (see Fig. 5). The most complex version of the blend model (i.e. blend_3) was able to provide a reasonable qualitative fit to the data, though the quantitative fit was substantially worse than that of REMERGE (see Table 1) – despite being able to incorporate the assumption of stronger encoding of AB pairs than BC pairs through an additional AB unit as well as a similar weight asymmetry to that used in REMERGE (see Methods and Fig. 5). It is also worth noting that this model can be viewed as a specific instantiation of an encoding-based model that preserves pattern separated representations for individual episodes, whilst allowing for the creation of a new blended representation. As such, this model configuration would appear to differ from current proposals of encoding-based models (e.g.^11,17,18,40). Moreover, this representational scheme is consistent with proposed extensions of the REMERGE model that allow for the creation of “stored generalizations” (see ref. 19) that would coexist with representations of individual episodes.

The results of these simulations – and additionally the profile of reaction times (i.e. AB < BC < AC) – favour a retrieval-based (cf. encoding-based) mechanism such as REMERGE¹⁹, in which overlapping episodes (i.e. AB, BC object pairs in the PAI paradigm) are represented in a pattern separated fashion, with related episodes being recombined at the point of retrieval to support generalization. However, a natural question is what explains the pronounced loss of BC episodic memory in the context of such putatively pattern separated representations. To address this, it is worth considering what occurs during source trials in the REMERGE model: during a BC source test trial there is partial activation of the A object unit on the feature layer, due to the slightly weaker encoding of the BC object pair based on the assumption of proactive interference. It is this partial activation of feature units denoting objects not actually present on the screen (i.e. the A unit in a BC source trial) that translates into a mismatching recall signal (cf.²⁷). This mismatching recall signal leads the network to assign a relatively high probability (i.e. around 50%) that the objects (i.e. B & C) weren’t actually experienced as a pair in one episode. This mismatching recall signal arises to a much lesser extent in AB source trials due to the stronger encoding of AB pairs, which through competitive inhibition in the conjunctive layer effectively curtails activation of the BC unit (and therefore activation of the C unit on the feature layer). Note that it is for similar reasons that high levels of mismatching recall (i.e. partial activation of the B unit) during an AC source trial cause the network to assign a lower probability of objects A and C having been studied together (i.e. ~30% in experiment 1) – and therefore produce relatively high levels of accuracy (i.e. ~70% correct in experiment 1).
To summarize: the simulations demonstrate that the loss of BC episodic memory does not imply the existence of blended representations. Instead, the observed episodic memory loss can be accounted for by mismatching recall, in the context of pattern separated representations for the individual pairs.
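The logic of mismatching recall can be illustrated with a toy network. The sketch below is not the REMERGE implementation used in our simulations: it assumes a single feedforward and feedback sweep (no recurrence), a softmax over two conjunctive units standing in for competitive inhibition, and illustrative values for the weights and temperature. It also stops short of mapping the mismatch signal onto a source response probability. All names and numbers are hypothetical.

```python
import math

FEATURES = ("A", "B", "C")

# Conjunctive unit -> (member feature units, feature<->conjunctive weight).
# The weaker BC weight stands in for proactive interference during BC
# encoding; both values are illustrative assumptions.
CONJUNCTIVE = {
    "AB": (("A", "B"), 1.0),
    "BC": (("B", "C"), 0.9),
}
TEMPERATURE = 0.4  # hypothetical softmax temperature


def recall(presented):
    """One sweep: features -> conjunctive (softmax) -> fed-back feature activity."""
    net = {
        name: weight * sum(1.0 for f in members if f in presented)
        for name, (members, weight) in CONJUNCTIVE.items()
    }
    norm = sum(math.exp(v / TEMPERATURE) for v in net.values())
    conj = {name: math.exp(v / TEMPERATURE) / norm for name, v in net.items()}
    return {
        f: sum(
            weight * conj[name]
            for name, (members, weight) in CONJUNCTIVE.items()
            if f in members
        )
        for f in FEATURES
    }


def mismatch(presented):
    """Fed-back activation of feature units for objects NOT on the screen."""
    feedback = recall(presented)
    return sum(v for f, v in feedback.items() if f not in presented)


mismatch_ab = mismatch({"A", "B"})  # activation of C during an AB trial
mismatch_bc = mismatch({"B", "C"})  # activation of A during a BC trial
mismatch_ac = mismatch({"A", "C"})  # activation of B during an AC trial
```

Under these assumed settings the mismatch signal is largest on AC trials, intermediate on BC trials and smallest on AB trials, which is the ordering described above.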

Our simulations were set up to directly compare the fit of the retrieval-based REMERGE model to the behavioral data with that of a range of increasingly complex encoding-based models with blended/integrated representations, all implemented within the same architectural framework. Whilst the results of the quantitative model fitting procedure (i.e. BIC values: see ref. 44) provide strong evidence that the empirical data is best captured by the retrieval-based REMERGE mechanism, it is important to note that we have not tested the full space of models that incorporate the notion of blended representations (e.g. the temporal context model (TCM)^18,45). Indeed, previous research on the PAI paradigm suggests that encoding and retrieval-based mechanisms may both operate (e.g.¹³). In particular, a recent multivariate fMRI study³¹ (also see ref. 46) provided evidence that in the anterior hippocampus the representation of individual items (i.e. A and C objects) becomes more similar following exposure to overlapping pairs (i.e. AB, BC) during the study phase of the PAI paradigm – whereas item representations within the posterior hippocampus became less similar. Whilst it was not possible to link these representational differences to generalization at the behavioral level – because participants were intentionally trained to ceiling levels of performance – one hypothesis would be that both encoding-based and retrieval-based mechanisms support generalization, with a differential contribution of anterior and posterior hippocampus, respectively.

Encoding and retrieval-based mechanisms may also play differential roles depending on the experimental conditions (e.g.¹³). It is interesting to note, however, that at least in our study we observed the same qualitative profile of findings in all 4 experiments, which spanned a range of experimental conditions. In experiments 1 and 4, which involved encoding-test cycles, participants encoded object pairs having been instructed that they would be subsequently tested on their ability to generalize. In experiment 2, however, participants were told to keep the object pairs as separate as possible so as to perform as well as possible on a subsequent episodic memory test (i.e. the generalization test was a surprise). In experiment 3, we manipulated the novelty/familiarity of scenes that preceded BC encoding trials with the aim of biasing hippocampal processing towards encoding/retrieval mode, respectively³⁶ – drawing inspiration from a previous study that showed significant effects on generalization³⁷. Whilst we did not find an effect of novelty/familiarity – either as a main effect or as a function of subsequent memory – one should be cautious in interpreting this null finding, particularly given the significant differences between paradigms.

The PAI paradigm has been widely used to study the mechanisms by which the hippocampus supports generalization. Our study, however, represents the first to relate episodic memory in the PAI task to the ability to generalize. Indeed, existing paradigms have tended to study episodic memory for overlapping associations (e.g.^41,42,43), or generalization^4,30,31,40 in isolation from one another. Our study, therefore, reveals new aspects of the widely used PAI paradigm and shows that an apparent loss of episodic memory occurs alongside an ability to generalize. Further, our findings demonstrate that the combination of pattern separated episodic representations and recurrence implemented in the REMERGE model can account for the complex relationship we discovered between episodic memory and generalization.

Methods

Participants

Healthy individuals who were free from neurological or psychiatric disease, and who were currently undertaking or had recently completed a university degree at University College London, took part in the experiments (experiment 1: n = 24 (15 females); experiment 2: n = 21 (13 females); experiment 3: n = 16 (11 females); experiment 4: n = 16 (6 females)). All participants gave full informed consent prior to the experiment. All experimental procedures were approved by the local research ethics committee (Division of Psychology and Language Sciences, University College London), and procedures were carried out in accordance with the approved guidelines. Participants were paid a minimum of £10 for completing the experiment, and an additional amount (i.e. up to £10) depending on their performance on the task.

Object stimuli

Object stimuli were a selection of pictures obtained from the Bank of Standardized Stimuli (BOSS - https://sites.google.com/site/bosstimuli), which is a database of photo stimuli that have been normalized across several parameters⁴⁷. All objects were presented in the RGB colour space (8 Bits/pixel) on a white background, with a resolution of 600 × 600 dpi and they were all resized to a width and a height of 220 pixels.

Scene stimuli

A set of 80 landscape pictures was obtained from various sources on the internet to serve as novel/familiar stimuli in Experiment 3. All the pictures depicted a landscape scenario, without any prominent objects. All scenes were presented in the RGB colour space (8 Bits/pixel), with a resolution of 72 × 72 dpi and they were all resized to a width and a height of 500 pixels using Photoshop© CS5.

Procedures common to all 3 experiments

In all three experiments, 80 different object triplets (e.g. A1-B1-C1, A2-B2-C2) were used. The allocation of stimuli into triplets and the session order were randomized across participants.

Encoding Trials

During study trials participants viewed a pair of objects (e.g. A1-B1, B1-C1) each displayed on either side of the screen. The left-right position of the objects on the screen was pseudo-randomised across trials and each pair was presented for 2.5 seconds. A fixation cross, presented for 0.5 second, preceded the presentation of the next trial (Fig. 1).

Test Trials

Choice trials

Participants were presented with three objects: a cue was presented at the top-centre of the screen (e.g. object A1) and two possible choices were presented on either side at the bottom of the screen (e.g. C1 and C7). To control for familiarity, the incorrect choice was a familiar item (i.e. C7), which was a member of a different triplet. As in previous studies (e.g.⁴⁰), there were 2 types of choice test trials: those that required generalization (i.e. associative inference: A-C choice test trials), and those that did not (i.e. A-B and B-C choice test trials). For example, in an A-B choice test trial, object A1 could be presented at the top of the screen, with the subject required to choose the B1 object (correct) over the B2 object (incorrect). Participants had a maximum of 10 seconds to respond, by pressing the left or the right arrow key, corresponding to the item at the bottom (e.g. B1) that they thought was associated with the cue (e.g. A1). After the participant’s choice, a blue bar appeared below the chosen item (duration 1 second), regardless of its correctness (Fig. 1B).
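The construction of a choice test trial can be sketched as follows. This is a minimal illustration rather than the task code used in the experiments: the function name, the dictionary layout and the random left/right placement of the two options are assumptions, and the object labels (A1, B1, C1, ...) are placeholders for the actual stimuli.

```python
import random


def make_choice_trial(triplets, index, kind, rng):
    """Build one choice trial of kind 'AB', 'BC' or 'AC' for triplets[index].

    The foil is a studied (familiar) item occupying the same position in a
    different triplet, e.g. C7 as the foil for C1.
    """
    cue_pos = "ABC".index(kind[0])
    target_pos = "ABC".index(kind[1])
    foil_index = rng.choice([i for i in range(len(triplets)) if i != index])
    target = triplets[index][target_pos]
    foil = triplets[foil_index][target_pos]
    options = [target, foil]
    rng.shuffle(options)  # assumed: left/right placement is randomised
    return {
        "cue": triplets[index][cue_pos],
        "left": options[0],
        "right": options[1],
        "correct": "left" if options[0] == target else "right",
        "requires_inference": kind == "AC",  # only A-C trials need generalization
    }


# 80 triplets, as in the experiments; the labels are placeholders.
triplets = [(f"A{i}", f"B{i}", f"C{i}") for i in range(1, 81)]
example_trial = make_choice_trial(triplets, 0, "AC", random.Random(0))
```

Drawing the foil from the same position in another triplet means both options have been seen equally often, so item familiarity alone cannot identify the correct choice.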

Source trials

This type of trial was a novel feature of our PAI paradigm, and was not incorporated in previous studies (e.g.⁴⁰). Critically, source trials provided a measure of participants’ episodic memory, for example testing whether they knew that A1 and B1 objects had actually been presented together. In contrast, participants’ responses during choice test trials probed their memory for whether objects were associated with one another (i.e. part of the same triplet), regardless of whether they had been presented together (e.g. A1-B1) or not (e.g. A1-C1).

Source trials were implemented as follows (see Fig. 1): after a participant’s response during a given choice test trial, participants were asked to state whether the chosen object had actually been presented with the cue object (i.e. was “directly” associated), or whether it was “indirectly” associated through a third object. As such, on an A-C test trial, an “indirect” response would be appropriate, whilst in A-B and B-C test trials “direct” responses would be correct. This phase was also self-paced, with a maximum of 10 seconds given for participants to make the direct/indirect judgement. A fixation cross, presented for 0.5 second, preceded the presentation of the next trial.
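The scoring rule for source judgments described above can be written compactly. The sketch below is illustrative only: the function names and the (trial type, response) tuple format are assumptions rather than part of the original analysis code.

```python
from collections import defaultdict


def correct_source_response(kind: str) -> str:
    """'indirect' is correct on A-C trials; 'direct' on A-B and B-C trials."""
    return "indirect" if kind == "AC" else "direct"


def source_accuracy(responses):
    """Proportion of correct source judgments per trial type.

    `responses` is an iterable of (kind, response) tuples, where kind is
    'AB', 'BC' or 'AC' and response is 'direct' or 'indirect'.
    """
    n_correct = defaultdict(int)
    n_total = defaultdict(int)
    for kind, response in responses:
        n_total[kind] += 1
        n_correct[kind] += int(response == correct_source_response(kind))
    return {kind: n_correct[kind] / n_total[kind] for kind in n_total}


# Made-up responses, for illustration only.
example = [("AB", "direct"), ("BC", "indirect"), ("BC", "direct"), ("AC", "indirect")]
accuracy = source_accuracy(example)
```

With two response options, chance performance on any trial type corresponds to an accuracy of 0.5.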

Specifics of each experiment

Prior to the start of each experiment, participants completed a short demo (involving different object stimuli) that was specific to each experiment: for example in experiments 1 & 3, this informed them that there would be encoding-test cycles. In experiment 2, participants were shown a demo relating to how to perform test trials after the last encoding session.