Hippocampal neurons code individual episodic memories in humans

The hippocampus is an essential hub for episodic memory processing. However, how human hippocampal single neurons code multi-element associations remains unknown. In particular, it is debated whether each hippocampal neuron represents an invariant element within an episode or whether single neurons bind together all the elements of a discrete episodic memory. Here we provide evidence for the latter hypothesis. Using single-neuron recordings from a total of 30 participants, we show that individual neurons, which we term episode-specific neurons, code discrete episodic memories using either a rate code or a temporal firing code. These neurons were observed exclusively in the hippocampus. Importantly, these episode-specific neurons do not reflect the coding of a particular element in the episode (that is, concept or time). Instead, they code for the conjunction of the different elements that make up the episode. Kolibius et al. show that individual neurons in the human hippocampus code for particular episodic memories.


Abstract:
The hippocampus is an essential hub for episodic memory processing. However, how human hippocampal single neurons code multi-element associations remains unknown. Some argue that each hippocampal neuron codes for an invariant element within an episode. Others have proposed that hippocampal neurons bind together all elements present in a discrete episodic memory. Here, we provide evidence for the latter. We show that individual neurons, which we term Episode Specific Neurons (ESNs), code discrete episodic memories using either a rate code or a temporal firing code. We find evidence for these neurons exclusively in the hippocampus. Importantly, these ESNs do not reflect the coding of a particular element in the episode (i.e., concept or time). Instead, they code for the conjunction of the different elements that make up the episode.

One-Sentence Summary:
Individual neurons in the hippocampus code for discrete episodic memories.

Introduction:
Episodic memory refers to our ability to reinstate the what, where and when of past experiences (Tulving, 2002). This ability is thought to depend on the reinstatement of neural activity that was present at memory encoding (Pacheco Estefan et al., 2019). It is undisputed that the hippocampus plays an integral role in episodic memory processing (Lisman et al., 2017; Marr, 1971; Squire, 1992) and the binding of multimodal information (Cooper and Ritchey, 2020).

However, how it codes episodic memories remains controversial.
One important open question is whether neurons in the hippocampus code for specific elements or an entire episode. Concept Neurons in the hippocampus fire in response to specific invariant elements independent of the context in which they are presented (Gelbard-Sagiv et al., 2008; Mormann et al., 2008; Mormann et al., 2011; Quiroga et al., 2005). One contemporary idea is that the diverse elements that make up an episode are coded by the simultaneous activity of a set of these Concept Neurons (Quiroga, 2012; 2020) or by expanding the selectivity of existing Concept Neurons (Ison et al., 2015). According to this framework, when you are sitting in your favourite coffee shop with your best friend, one set of Concept Neurons might code for the coffee shop and a separate set for your friend (Figure 1A).
Alternatively, single units in the hippocampus might sparsely encode a specific set of elements within an individual episode and act as pointers to cortical modules during memory reinstatement. According to this so-called Indexing Theory (Teyler and DiScenna, 1986; Teyler and Rudy, 2007), the entire episode with your friend in the coffee shop is represented by a set of hippocampal neurons (Figure 1A). Unlike Concept Neurons, these Episode Specific Neurons (ESNs) would fire in response to the conjunction of all the diverse information within an episode and not in response to individual content elements. Despite computational models pointing towards the existence of ESNs (Bowman and Wyble, 2007; Krotov and Hopfield, 2020; Parish et al., 2021; Whittington et al., 2022), to this day there is no evidence for such a sparse conjunctive code in humans.
In the present work, we provide support for the existence of this content-agnostic episodic memory code implemented through Episode Specific Neurons. We leveraged intracranial microwire recordings to investigate the firing patterns of neurons in the human hippocampus and hypothesized that a significant number of hippocampal neurons reinstate their firing rate within a specific episode (i.e., fire during encoding and retrieval).
Importantly, these ESNs would code for the conjunctive elements present within an episode and are not tuned to individual elements within the episode. The existence of ESNs does not preclude Concept Neurons from participating in episodic memory processing. However, investigating the role of Concept Neurons in episodic memories goes beyond the scope of this work. As control analyses, we investigated whether this firing activity can be explained by a firing response to specific invariant elements, as occurs in Concept Neurons (Quiroga et al., 2005), or by a time preference, as occurs in Time Cells (TC; Reddy et al., 2021; Umbach et al., 2020).
(A) Left: The classic Indexing Theory (Teyler and DiScenna, 1986) proposes that neurons in the hippocampus represent a conjunctive code that binds together all the elements that make up the episode in the form of an index. Within this framework, neurons do not directly code for the elements themselves (i.e., the smell of the coffee, your friend, the background music, the café, etc.), but rather act as pointers to these different elements which themselves are coded elsewhere (i.e., the neocortex). Right: Some hippocampal neurons are thought to code for specific elements or concepts, which is why they are called Concept Neurons (Mormann et al., 2008; Mormann et al., 2011; Quiroga et al., 2005). Within this framework, a group of neurons collectively code an episodic memory, with each neuron representing a specific element involved in that episode (i.e., a neuron coding for the coffee, another neuron coding for your friend, etc.; Quiroga, 2012; 2020). It is important to note that one index or one concept is likely to be coded by an assembly of neurons, not a single neuron.
(B) Outline of the procedure for Experiment 1. During encoding, all participants were instructed to imagine a vivid episode involving an animal cue and two associate images (two faces, two places or a face and a place) and rated its plausibility. This approach is suitable for investigating episodic memory as originally defined by Tulving in 1972 (Tulving, 1972). During recall, participants were asked to retrieve the associated images when cued with the animal cue. The experiment was self-paced and every episode was learned and tested only once. Following each encoding block of roughly 20 episodes, participants performed a short distractor task. The pink areas represent the time windows used for subsequent analyses (see Methods).
(C) Outline of the procedure for Experiment 2. Left: The memory task was largely the same as in Experiment 1 (see Figure 1B). However, events consisted of one cue (either an animal, a face or a place) and one associate image (either an animal, a face or a place). Right: After the memory task, patients performed a visual tuning task where the previously used stimuli were shown multiple times in quick succession without a memory component. This approach has been traditionally used to identify putative Concept Neurons.
During the encoding phase of Experiment 1, participants created a vivid mental story consisting of an animal cue and two associate images (two faces, two places or a face and a place). By contrast, Experiment 2 consisted of one cue and one associate image (both either an animal, face, or place). The encoding and recall phases of the experiment were interleaved with a short distractor task where patients had to judge whether a series of 15 numbers was odd or even. The distractor phase lasted between 22.43 s and 224.52 s (median duration: 42.19 s). During the recall phase, the animal cue was presented again and participants were asked to retrieve the associate image(s).
The experiments were self-paced and every episode was learned and retrieved only once. Participants correctly recalled on average 68.38% (SE = 4.64%) of episodes in the first experiment (see Supplements Table S1) and on average 65.63% (SE = 4.45%) of episodes in the second experiment (see Supplements Table S2). This is substantially more than would be expected by chance (16.7% and 25%, respectively).

Identifying Episode Specific Neurons (ESNs)
For every neuron, we determined the firing rate during each episode at encoding and retrieval.
We then z-scored the firing rate across all encoding and retrieval episodes and excluded all later forgotten episodes. This was done independently for encoding and retrieval to account for general differences in firing rates. We measured episode-specific firing reinstatement as the product of the standardized firing rates at encoding and retrieval (Figure 2A).
Using an episode-shuffling procedure, we generated a distribution of reinstatement values expected by chance. A neuron was considered an ESN if (i) the empirical reinstatement value exceeded the 99th percentile of the shuffled distribution for at least one episode and (ii) the standardized firing rate for encoding and retrieval of that episode each exceeded 1.645 (≙ right-tailed p < 0.05). The second criterion prevented the identification of ESNs which would excessively fire at only one phase of the task (i.e., encoding or retrieval).
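The two criteria above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the analysis code used in the study: the function names, the pooling of shuffled products into a single null distribution, and the choice to shuffle retrieval order against a fixed encoding order are all assumptions made here, since the exact shuffling scheme is described in the Methods rather than in this section.

```python
import random
import statistics


def zscore(values):
    """Standardize firing rates (assumes the rates are not all identical)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def reinstated_episodes(enc_rates, ret_rates, remembered,
                        n_shuffles=1000, z_min=1.645, seed=0):
    """Return indices of episodes that satisfy both criteria for one neuron."""
    rng = random.Random(seed)
    # z-score across all episodes, separately for encoding and retrieval,
    # then drop the later forgotten episodes.
    z_enc_all, z_ret_all = zscore(enc_rates), zscore(ret_rates)
    kept = [i for i, ok in enumerate(remembered) if ok]
    z_enc = [z_enc_all[i] for i in kept]
    z_ret = [z_ret_all[i] for i in kept]
    empirical = [e * r for e, r in zip(z_enc, z_ret)]

    # Assumed null: products after shuffling the retrieval episode order.
    null, shuffled = [], list(z_ret)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.extend(e * r for e, r in zip(z_enc, shuffled))
    threshold = percentile(null, 99)

    return [kept[j] for j, value in enumerate(empirical)
            if value > threshold and z_enc[j] >= z_min and z_ret[j] >= z_min]
```

Under these assumptions, a neuron would be labelled an ESN whenever `reinstated_episodes` returns at least one index.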
It could be argued that ESNs identified in this manner could reflect the firing of cells tuned to the image of the animal cue, rather than the conjunction of all elements, since the cue is episode-unique and presented during encoding and retrieval. To address this issue, in Experiment 1, we excluded neurons that showed a significant firing increase during the first second after the encoding of the animal cue for episodes that were later reinstated (see Methods). This procedure has traditionally been used to identify putative concept neurons (Quiroga et al., 2005; Mormann et al., 2008; Mormann et al., 2011). Using this approach, we identified a significant number of hippocampal ESNs in Experiment 1 (136 out of 585 neurons ≙ 23.25%; p < 0.001; permutation test; Figure 2B). Comparable results are obtained when (i) adding up the standardized firing rate between encoding and retrieval instead of multiplying them (E&R >= 1.645, reinstatement: E+R) (125 ESNs; p < 0.001), (ii) increasing the minimum standardized firing rate from z = 1.645 to z = 2.6 (E&R >= 2.6, reinstatement: E*R) (29 ESNs; p < 0.001) and (iii) using a different reinstatement measure that normalizes the encoding and retrieval product by their absolute difference (E&R >= 1.645, reinstatement: (E*R)/|E-R|) (53 ESNs; p < 0.001). This last reinstatement measure has the important advantage of considering the similarity between the encoding and retrieval firing rate.
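The three reinstatement measures compared above (E*R, E+R and (E*R)/|E-R|) can be written as small functions of the standardized encoding and retrieval rates. The function names below are hypothetical, and the handling of E = R in the normalized measure (returning infinity) is an assumption made for this sketch, because the text does not specify it.

```python
import math


def reinstatement_product(e, r):
    """Default measure: product of standardized rates (E*R)."""
    return e * r


def reinstatement_sum(e, r):
    """Alternative (i): sum of standardized rates (E+R)."""
    return e + r


def reinstatement_normalized(e, r):
    """Alternative (iii): product divided by the absolute difference.

    Assumption: identical rates are mapped to infinity, since the
    behaviour for E == R is not stated in the text.
    """
    if e == r:
        return math.inf
    return (e * r) / abs(e - r)


# Two episodes with the same encoding rate but different retrieval rates:
# the normalized measure is larger when the two rates are more similar.
far_apart = reinstatement_normalized(2.0, 3.0)
close = reinstatement_normalized(2.0, 2.5)
```

In this toy example the product alone would rank the first pair higher (6.0 versus 5.0), whereas the normalized measure favours the more similar pair.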
In Experiment 1, 117 out of 136 ESNs (≙ 86.03%) coded for a single episode, whereas the rest coded for multiple episodes. Two example ESNs are shown in Figure 3. These ESNs are unlikely to be concept cells tuned to the animal cue as the firing rate during encoding reaches its maximum only after the presentation of the associate stimulus (see Figure 4).
It is of note that the proportion of neurons that can be classified as ESNs is proportional to the number of events learned and retrieved (the same is the case for Concept Neurons). This is because we apply the threshold derived from the first permutation test to all episodes, without family-wise error correction. As such it is not suitable to determine the sparseness of the hippocampal code. However, the proportion of ESNs of all recorded neurons is useful as an estimation of how many ESNs we can expect in future analyses.
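The dependence on the number of episodes can be illustrated with a simple calculation. As a deliberately simplified sketch, assume each episode is an independent test at the 99th-percentile threshold (alpha = 0.01) and ignore the additional z >= 1.645 criterion. Both are assumptions made here for illustration and do not reproduce the actual analysis. The chance that a neuron passes the threshold for at least one episode is then 1 - (1 - alpha)^n.

```python
def prob_at_least_one_false_positive(n_episodes, alpha=0.01):
    """Chance of >= 1 threshold crossing across n independent tests."""
    return 1 - (1 - alpha) ** n_episodes


# The neuron-level false positive rate grows with the number of episodes.
for n in (1, 20, 40):
    print(n, round(prob_at_least_one_false_positive(n), 4))
```

This is why the raw proportion of ESNs should not be read as a measure of sparseness, whereas a group-level test that applies the same number of tests to shuffled data is not affected in the same way.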
It is crucial to understand that this alpha-level inflation does not extend to the group-level permutation test, where the same number of tests are applied to randomly shuffled data. We have added a simulation using random values as spike rates and using circularly shuffled spike times to show that there is no inflation of the alpha error at the group-level at which we interpret our findings (see Methods; Figure S2).
ESNs are suggested to reflect a unique coding mechanism of the hippocampus (Teyler and DiScenna, 1986; Teyler and Rudy, 2007). In line with this, we did not find a significant number of ESNs in the parahippocampus.
To conclude, we find a significant number of ESNs in the hippocampus, but not in the parahippocampus. The analysis approach we use to identify ESNs is robust to deviations in the parameter space.
(A) A schematic for identifying Episode Specific Neurons (ESNs) is shown. The diagram shows the z-scored firing rate on the y-axis for ten simulated episodes on the x-axis colour-coded for encoding and retrieval (purple and orange, respectively). The transparent bars encompassing encoding and retrieval indicate the product of encoding and retrieval firing rates, which is used as the measure of episode-specific firing reinstatement. The dotted red line shows the threshold (derived from a shuffling procedure, see Methods). Because of the way ESNs are defined, they are required to fire substantially above their average firing rate during encoding and retrieval, which rules out neurons that generally show an increased firing rate during remembered episodes.

ESNs do not code for the content/visual properties of the cue or associate image
Traditionally, visually responsive neurons have been identified using the repeated presentation of a stimulus. In the above analysis, we only present the animal cue once, which is suboptimal for ruling out Concept Neurons tuned to the animal cue.
To ameliorate this shortcoming, in Experiment 2 we added a visual tuning task (Figure 1C) after the memory association task. During the visual tuning task, images from the memory task were repeatedly shown in quick succession. This approach is widely used to identify putative Concept Neurons that respond to one of the images independently of any memory processes (for example Concept Neurons, see Figure S3; Mormann et al., 2008; Mormann et al., 2011; Quiroga et al., 2005). When excluding Concept Neuron activity in this independent dataset, we replicated our previous results and identified a significant number of ESNs (38 out of 216 neurons ≙ 17.59%; p = 0.0053; permutation test; Figure 2C). In Experiment 2, 34 out of 38 ESNs (≙ 89.47%) coded for a single episode, whereas the rest coded for multiple episodes.
However, traditional Concept Neuron detection methods might be too conservative to identify weakly tuned Concept Neurons. To address this concern, we drastically reduced the threshold of what constitutes a Concept Neuron, i.e., lowering the uncorrected threshold from p = 0.0005 to p = 0.05, which increased the number of Concept Neurons from 58 to 155 (out of 216 neurons). During a typical tuning task an average of 108.7 (min: 80; max: 156) different images are shown and each image is tested for visual tuning. There is no correction for multiple testing, which renders a threshold of p < 0.05 very liberal.
Remarkably, incorporating this liberal threshold to exclude potential Concept Neurons had little effect on the number of ESNs, which remained almost unchanged (36 out of 216 neurons ≙ 16.67%; p = 0.0025; permutation test). It is conceivable that some images that are presented during the visual tuning task act as cues that reactivate some ESNs. These reactivated ESNs would then be erroneously rejected as Concept Neurons. However, in practice, only four potential ESNs were excluded based on the visual tuning task (six when lowering the Concept Neuron threshold). We suspect that ESNs were not reactivated during the visual tuning task because patients were not instructed to actively retrieve memories and thus may not have been in a "retrieval mode" (Lepage et al., 2000; Addante et al., 2011). Tulving first proposed this concept in 1983 (Tulving, 1983; Nyberg et al., 1995), referring to the cognitive state that occurs when we actively attempt to remember something. Being in a retrieval mode increases the likelihood that a
(C) Spike density plot for reinstated episodes. Note that the experiment is self-paced and episode length varies.
(E-H) same as (A-D) but for a different example ESN.

ESNs are limited to later remembered episodes
We have so far demonstrated that ESNs reinstate their firing rate when remembering a unique episode. This reinstatement cannot be explained by the semantic content or visual properties of the used image, which strengthens the notion that ESNs code for memories. In line with this, we did not find a significant number of miss-ESNs when limiting our analysis to later forgotten episodes (15 out of 585 neurons ≙ 2.56%; p = 0.4229; permutation test). However, this result could stem from a lower number of forgotten events (see Table S1). To counter this bias, we equalized event numbers between later remembered and later forgotten events for every neuron by randomly sampling (with replacement) later remembered events as many times as participants forgot an event. If any of the sampled events were later reinstated, we considered this neuron a miss-ESN under the null hypothesis. By repeating this procedure 10,000 times we generated a distribution of how many miss-ESNs were expected if the number of later remembered and later forgotten events were equal. This analysis did not result in a significantly lower empirical number of miss-ESNs compared to hit-ESNs (p = 0.7032, bootstrapping test). To conclude, we did not find a significant number of ESNs when restricting our analysis to episodes that were forgotten. However, when considering that fewer episodes were forgotten than remembered, there was no difference in the number of hit-ESNs and miss-ESNs.
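The equalization step described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the original implementation: the input layout (one list of per-episode reinstatement flags for remembered episodes plus the number of forgotten episodes per neuron), the function names, and the one-sided "lower tail" p-value are choices made here for the example.

```python
import random


def bootstrap_null_counts(neurons, n_iter=10000, seed=0):
    """Expected miss-ESN counts if forgotten episodes behaved like remembered ones.

    neurons: list of (reinstated_flags, n_forgotten) tuples, where
    reinstated_flags marks which remembered episodes were reinstated.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iter):
        count = 0
        for reinstated_flags, n_forgotten in neurons:
            # Sample remembered episodes with replacement, once per forgotten episode.
            sample = rng.choices(reinstated_flags, k=n_forgotten)
            if any(sample):
                count += 1
        counts.append(count)
    return counts


def lower_tail_p(null_counts, empirical_count):
    """Share of bootstrap counts that are at most the empirical count."""
    return sum(c <= empirical_count for c in null_counts) / len(null_counts)
```

A small p-value from `lower_tail_p` would indicate fewer miss-ESNs than expected from the remembered episodes at matched event numbers.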

Identification of temporal Episode Specific Neurons (tESNs)
The previous identification of ESNs relied on a rate code, i.e., the standardized mean firing rate during one episode at encoding and retrieval. We have adapted this analysis to identify neurons that reinstate a temporal pattern of firing. For every neuron, we considered the spiking activity from six seconds before until one second after the response during encoding and retrieval (the first and last second were later excluded to avoid edge artefacts). By convolving each spike with a Gaussian kernel (standard deviation: 100 ms, length: ±300 ms, peak normalized to 1) we created a measure of instantaneous firing rate. Because we do not know the exact times when an episode is encoded or retrieved, we cross-correlated this episode-specific instantaneous firing rate during encoding and retrieval and considered the maximum value as the reinstatement value. We repeated this process after shuffling the encoding and retrieval episode order 1,000 times and took the 99th percentile as a threshold for the empirical reinstatement value. If the empirical reinstatement value reached this threshold, we considered the neuron a temporal Episode Specific Neuron (tESN; Figure S4). In the next step, we randomly drew for each neuron one of the previously calculated permutations. If these permuted values reached or exceeded the threshold, the neuron was considered a tESN under the null hypothesis. We repeated this process 1,000 times to build a null distribution against which we compared our empirical number of tESNs. We found a significant number of empirical tESNs.
We employed a permutation test to assess the degree of overlap between episodes reinstated by a rate code (ESNs) and those reinstated by a temporal code (tESNs).
Specifically, for each neuron we shuffled the identity of whether an episode was reinstated or not and compared the overlap in the shuffled dataset with the empirical overlap (i.e., is an episode reinstated using a rate code also reinstated using a temporal code and vice versa). Our analysis revealed a significant overlap in Experiment 1 (Experiment 2), with 20.25% (26.19%) of all episodes reinstated by ESNs also being reinstated by tESNs, and 25.81% (24.44%) of all episodes reinstated by tESNs also being reinstated by ESNs (both p < 0.001).
We then tested the validity of this analysis using random spike times. We generated these random spike times by first rounding the empirical spike times to the nearest integer and then drawing an equal number of pseudorandom integer values from a discrete uniform distribution between the first and last empirical spike times. We did not find a significant number of tESNs in either experiment (both p > 0.2). We next repeated the analysis using 500 surrogate datasets. These datasets were created by segmenting all spike times into episodes in the order they occurred, and then circularly shuffling them. The results of this analysis indicated that the percentage of significant tESN identifications was below the 5% threshold (4.18%), providing further evidence of the credibility and reliability of our analysis.
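The temporal reinstatement value used in this section (Gaussian smoothing of spikes followed by the maximum of the encoding-retrieval cross-correlation) can be sketched as below. The kernel parameters follow the text. The 10 ms bin width, the use of a raw (unnormalized) cross-correlation over all lags, and the function names are assumptions made for this illustration only.

```python
import math


def gaussian_kernel(sd_ms=100.0, half_width_ms=300.0, bin_ms=10.0):
    """Gaussian truncated at +/- half_width_ms with a peak value of 1."""
    n = int(round(half_width_ms / bin_ms))
    return [math.exp(-0.5 * (k * bin_ms / sd_ms) ** 2) for k in range(-n, n + 1)]


def instantaneous_rate(spike_times_ms, duration_ms, bin_ms=10.0):
    """Place one kernel at every spike and sum them on a binned time axis."""
    kernel = gaussian_kernel(bin_ms=bin_ms)
    half = len(kernel) // 2
    n_bins = int(round(duration_ms / bin_ms))
    rate = [0.0] * n_bins
    for t in spike_times_ms:
        centre = int(t // bin_ms)
        for k, weight in enumerate(kernel):
            idx = centre + k - half
            if 0 <= idx < n_bins:
                rate[idx] += weight
    return rate


def max_cross_correlation(a, b):
    """Maximum of the raw cross-correlation of a and b over all lags."""
    best = float("-inf")
    for lag in range(-(len(b) - 1), len(a)):
        start = max(0, lag)
        stop = min(len(a), len(b) + lag)
        total = sum(a[i] * b[i - lag] for i in range(start, stop))
        best = max(best, total)
    return best


# Toy example: the same spike pattern shifted by 700 ms versus a different pattern.
encoding = instantaneous_rate([1000, 1200, 2500], duration_ms=5000)
retrieval_same = instantaneous_rate([1700, 1900, 3200], duration_ms=5000)
retrieval_other = instantaneous_rate([400, 2000, 4400], duration_ms=5000)
same = max_cross_correlation(encoding, retrieval_same)
other = max_cross_correlation(encoding, retrieval_other)
```

Taking the maximum over lags is what makes the measure tolerant to not knowing exactly when within the window an episode was encoded or retrieved: the shifted copy of the pattern yields a larger value than the unrelated pattern.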
In conclusion, we show in two separate experiments a significant number of neurons that reinstate an event-specific temporal firing pattern during successful memory retrieval.
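The group-level step described in this section (drawing one stored permutation per neuron, counting how many neurons pass under the null, and comparing that distribution with the empirical count) might look like the following sketch. The data layout, with one boolean per stored permutation and neuron, and the simple proportion-based p-value are assumptions made for this example.

```python
import random


def group_level_p(empirical_flags, permuted_flags, n_iter=1000, seed=0):
    """Compare the empirical number of detected neurons with a null distribution.

    empirical_flags: one boolean per neuron (detected or not).
    permuted_flags: per neuron, a list of booleans stating whether each
    stored permutation reached the neuron's threshold.
    """
    rng = random.Random(seed)
    empirical_count = sum(empirical_flags)
    null_counts = []
    for _ in range(n_iter):
        # Draw one stored permutation per neuron and count null detections.
        null_counts.append(sum(rng.choice(flags) for flags in permuted_flags))
    p_value = sum(c >= empirical_count for c in null_counts) / n_iter
    return empirical_count, p_value
```

Because the null counts are built from the same number of tests per neuron as the empirical count, any neuron-level inflation affects both sides of the comparison equally.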

ESNs do not code for time
Recent studies in humans show that some hippocampal neurons code specific time points invariant across repetitions, which are referred to as Time Cells (Reddy et al., 2021; Umbach et al., 2020). We investigated whether our dataset contains such Time Cells (TC) using a method similar to that employed by Umbach et al. (2020). Due to the self-paced nature of our experiment, each encoding block varied in length. To accommodate this, we used both the unaltered block length, as well as a normalized block length within one recording session (see Methods). Of all 585 recorded cells, 12 (normalized) and 10 (non-normalized) fulfilled the criteria of TCs, which is below chance level (p values > 0.9; permutation test). Critically, there was no significant overlap between neurons that behaved like TCs and ESNs (p values > 0.3; permutation test), indicating that ESNs cannot be construed as TCs.

ESNs show a wider waveshape than other neurons
We found some evidence that spike waveshapes of ESNs are wider than those of other units (Figure S5A; p = 0.0563 with data from Experiment 1 and p = 0.0121 with data from both experiments combined; both independent samples t-test), possibly indicating that ESNs are physiologically different from other neurons. In the hippocampus, a wider waveshape has previously been associated with excitatory cells (Prestigio et al., 2019).

Recorded neurons are mostly single neurons and not multi-units
Although we tried to separate multi-units into single neurons as best as possible during the spike sorting procedure (see Methods), some units might still represent activity from multiple neurons.
We thus employed the method outlined by Tankus and colleagues (Tankus et al., 2009)
Firing rate of ESNs (n = 136) from cue onset until five seconds later during memory encoding (left) and retrieval (right). We utilized a bootstrapping method to ensure an equal number of reinstated and non-reinstated episodes for each ESN, followed by a computation of the cluster-based permutation test (Maris and Oostenveld, 2007). The proportion of iterations that contained a significant cluster at a specific timepoint is represented by the grayscale inset at the bottom of the figure. The coloured shaded areas represent the standard error of the mean (SEM) that was calculated across all ESNs. We ensured an equal number of reinstated and non-reinstated episodes per ESN while calculating the SEM using a bootstrapping method.

Discussion
Using an associative episodic memory paradigm in human epilepsy patients, we identified hippocampal neurons that are active during the initial encoding of a unique episode and later reinstate their firing rate when successfully remembering the same episode. Therefore, we term these neurons Episode Specific Neurons (ESNs). The activity of these neurons could not be
Previous studies have demonstrated that Concept Neurons increase their firing rate during memory retrieval when the image they are tuned to is part of the memory (Gelbard-Sagiv et al., 2008; Ison et al., 2015). We used two approaches to ensure that the ESNs we identified are not Concept Neurons that selectively respond to visual elements or semantic concepts: (i) in Experiment 1 we excluded ESNs that were visually responsive to the presentation of the animal cue at encoding; (ii) following the episodic memory task in Experiment 2, patients completed a visual tuning task using all previously presented stimuli. This is a standard method to identify putative Concept Neurons (Ison et al., 2015; Mormann et al., 2008; Mormann et al., 2011; Quiroga et al., 2005) and allowed us to exclude episodes where a neuron showed a visual tuning to either the cue or the associate image. Using this approach, we replicated our results from Experiment 1 in a new sample of patients and found a significant number of ESNs while also verifying that these neurons do not selectively respond to visual elements or semantic concepts. Importantly, this finding was robust even when dramatically reducing the threshold of what constitutes a Concept Neuron. Taken together, these analyses reinforce the argument that ESNs are memory related.
The existence of ESNs does not exclude Concept Neurons from playing a role in episodic memory processes. Concept Neurons might code the semantic aspect of an episode (i.e., the general concept of "coffee shop"). However, according to the Indexing Theory (Teyler and DiScenna, 1986; Teyler and Rudy, 2007), hippocampal neurons that perform this indexing function should have no initial tuning and are allocated to a specific episode during memory formation (i.e., the coffee shop in a specific setting). The behaviour of ESNs would be consistent with such an indexing function and may add crucial event-specific information to an episode that Concept Neurons cannot encode themselves.
We found a significant number of ESNs when excluding potential multi-units in the first experiment. However, in the second experiment we only obtained a statistical trend for a significant number of ESNs. This was likely because this restriction resulted in too few single neurons in the second experiment, thus reducing statistical power.
Because we do not know the exact time points when episodes are encoded or retrieved, we used a rate code approach in the first instance for these analyses (i.e., averaging the number of spikes over a time of interest at encoding and retrieval). In addition, we present evidence for a reinstatement of a temporal firing code which we uncovered by shifting the instantaneous firing rate (i.e., the spike times convolved with a Gaussian kernel) using a cross-correlation.
Interestingly, we found a significant overlap of episodes that were coded both by tESNs and ESNs. This suggests that in many cases, a temporal firing code can still be identified through a rate code analysis. In some cases we detected a reinstated episode using either a rate code or a temporal code alone. It is possible that hippocampal neurons employ two distinct coding mechanisms, or in some instances, we may have missed certain spikes, resulting in the inability to identify either firing code, which is an interesting question for future studies.
The Indexing Theory proposes that this coding mechanism is unique to the hippocampus. In line with this, we did not find a significant number of ESNs in the parahippocampus. However, these findings are based on a relatively small sample size and should be considered preliminary. Future studies are needed to ascertain the regional specificity of ESNs to the hippocampus.
We did not find a significant number of ESNs when restricting our analysis to later forgotten episodes. However, there was no significant difference between the number of ESNs when considering later remembered and later forgotten events. This suggests that hippocampal neural reinstatement might occur without behavioural memory retrieval. This could be due to downstream processing being disrupted (i.e., due to interference, selective attention).
Alternatively, it is possible that in some cases during memory encoding patients created an episodic memory that did not incorporate the presented associate stimuli. While retrieval would lead to neural reinstatement, the patients would not be able to choose the correct associate images. Another possible explanation for this finding is that the relatively small number of forgotten episodes led to insufficient power to detect a significant number of ESNs that code for later forgotten episodes.
Time Cells (TC) are neurons that invariantly fire at specific, reoccurring time points (Reddy et al., 2021; Umbach et al., 2020). We did not find a significant number of TCs in our study and there was no significant overlap between TCs and ESNs. This might be because the self-paced nature of the task introduced too much time variation between too few learning blocks to uncover TC dynamics. However, the absence of TCs in our paradigm corroborates ESNs as independent from TCs.
We found that ESNs have a wider waveshape than other neurons, which suggests that they are physiologically different from other single neurons. Specifically, the broader waveshape of ESNs suggests that they are likely excitatory cells (Prestigio et al., 2019). Alternatively, it is possible that ESNs and neurons with a narrower waveshape are located in different hippocampal subfields. Unfortunately, with the current methods, we lack the precision to designate neurons to individual subfields (Quiroga, 2019).
One limitation of the current study is that every event was encoded and retrieved only once. However, the very nature of episodic memories is one-shot learning and the ability to subsequently perform mental time travel. Any neural substrate that supports this function must occur after a single bout of learning and subsequent retrieval of a single episode. Our method honours this fundamental characteristic which is the defining feature of episodic memory as originally stated by Endel Tulving (Tulving, 1972). Arguably, a repeated design would have allowed for a more reliable ESN identification. However, each memory reactivation leads to a transient plasticity of the memory trace until it is reconsolidated again. During this time window profound changes in the neurons that code for the initial memory trace might occur (Nader & Hardt, 2009). To avoid this potential confound, every episode is learned and retrieved only once in the present experiments. The stability of ESNs over repeated reactivations and extended periods, therefore, remains an interesting topic of research for future studies.
Our results are consistent with previous studies using fMRI that have shown item-specific activity reinstatement in the hippocampus (Chadwick et al., 2010; Mack and Preston, 2016) where similar representations are associated with distinct activity patterns (Bakker et al., 2008; Berron et al., 2016). These findings are suggestive of an episode-specific neural code, which is consistent with our results. However, due to the coarse resolution of fMRI, these previous results cannot disambiguate whether this event-specific code is driven by a population of event-specific concept neurons, or whether it is driven by a population of event-specific indexing neurons. We here provide evidence for the latter.
Previous intracranial work has identified a multitude of different neurons that detect episode boundaries and event onsets (Zheng et al., 2022) as well as novelty or familiarity (Rutishauser et al., 2006; Rutishauser et al., 2008; Rutishauser et al., 2015; Urgolites et al., 2022). Recent work showed that this is a generic signal that can be observed across the brain and is not unique to the hippocampus (Urgolites et al., 2022). These cell types generally did not fire specifically for a particular episode, but instead across many episodes (Rutishauser et al., 2006; Rutishauser et al., 2008; Rutishauser et al., 2015). Previous research quantified neural firing reinstatement during scene encoding and recognition by considering the activity of all recorded neurons (i.e., population activity) (Zheng et al., 2022). In contrast, the vast majority of ESNs identified in our experiments coded for a single episode, and we demonstrated that neural reinstatement takes place at the single-neuron level. In line with our findings, recent work identified a sparse and item-specific memory recognition signal that was unique to the hippocampus (Urgolites et al., 2022). Importantly, we expect that an episode is coded by an assembly of ESNs from which we sampled only one due to the limited number of neurons that can be recorded with the currently available methods.
Our findings are in line with previous work showing that episodic memories in the hippocampus are coded in a sparse distributed way (Wixted et al., 2014; Wixted et al., 2018). However, there are various reasons why we refrain from making any claims regarding sparsity in the present study. A neural code can be sparse in two ways (Wixted et al., 2018). It can be population sparse, which is the case when a low percentage of neurons respond to a given stimulus. It can also be lifetime sparse, which refers to a low percentage of stimuli that a given neuron responds to. On the one hand, we artificially induce lifetime sparsity (and by extension population sparsity) because we (i) standardize the firing rate during encoding and retrieval and then (ii) multiply these two values. On the other hand, we drastically reduce the sparsity because we test for reinstatement at each episode without correcting for multiple comparisons. Importantly, while this leads to alpha-level inflation at the level of the neuron, it does not extend to the group level at which we interpret our findings. We have confirmed that our analysis does not have a bias towards positive findings using a simulation (see Figure S2). Unfortunately, this also means that in the present study we have to refrain from making any claims regarding lifetime sparsity. Although most ESNs reinstate only a single episode, some code more than one. We expect that episodic memories are represented in the hippocampus as assemblies of neurons rather than as individual neurons. Thus, it is plausible that a subset of neurons within a neural assembly coding for memory A is also part of an assembly that codes for memory B. This coding mechanism is more efficient, as a partial overlap between neural assemblies reduces the number of required assemblies.
In conclusion, we found neurons in the hippocampus that show firing reinstatement in response to a specific conjunction of elements within a unique episode. These Episode Specific Neurons did not fire in response to individual concepts (Concept Neurons) or to specific, re-occurring time points (Time Cells). We propose that during memory formation an assembly of ESNs acts as a pointer or index that initially binds the elements of an episode together, in line with the Indexing Theory (Bowman and Wyble, 2007; Teyler and DiScenna, 1986; Teyler and Rudy, 2007). Reactivation of this pointer allows ESNs to reinstate the episodic memory previously encoded. Importantly, because ESNs reinstate unique episodes, they contain a time and content component. However, rather than reflecting the underlying coding mechanism, this time and content aspect necessarily emerges from the conjunctive code of an episode that is unique in content and time.

Procedure of memory experiment 1
During the encoding phase of the experiment, the participant associated a cue with two other stimuli. For each episode, the cue was a new picture of an animal. The stimuli could be pictures of either places, faces or both. Every picture was shown only once. Two seconds after the animal cue was presented, the associate stimuli were shown, while the animal cue remained on the screen. The participant was asked to create a vivid imaginary story involving the cue and the two stimuli. This part of the experiment was self-paced. The task continued once the participant rated the plausibility of the imaginary story (plausible/implausible).
After the encoding phase, the participant performed a distractor task to rule out working memory effects. During the distractor task, participants had to indicate whether a random number (up to two digits) that appeared serially on the screen was odd or even. After each response, the participant received feedback indicating a correct or incorrect response. This task consisted of 15 trials.
During the retrieval phase, all cues from the previous encoding phase were presented sequentially in pseudorandom order. Each animal cue was presented for two seconds and subjects were tasked to retrieve the corresponding images. The participant was then asked how many associated images they remembered (none, one, or two). Participants had as much time to respond as they required. If the participant indicated that they remembered one or two images, they were then asked to select two pictures from an array of four pictures (two targets and two distractors that consisted of pictures from the previous encoding block which were associated with a different cue).
The experiment ended after the retrieval phase if the total runtime exceeded 40 minutes, or if the patient asked to abort the experiment. Otherwise, the experiment continued with the next encoding block. The encoding block initially consisted of 20 episodes but could be adjusted depending on the cognitive abilities of the patient. If the hit rate fell below 66.25%, fewer episodes were shown for the next block and vice versa if the hit rate surpassed 73.75%.
The patients performed the memory task on a laptop computer (experiment 1: Toshiba Tecra W50, 60 Hz refresh rate; experiment 2: Lenovo L390 Yoga, 60.01 Hz refresh rate), while seated either in a chair next to their bed or in their hospital bed.

Procedure of memory experiment 2
The second experiment was based on the first experiment with the following adaptations: participants were presented with one cue image (depicting an animal/place/face) and only one associate image (depicting an animal/place/face). During retrieval, participants were asked whether they remembered the associate image and had to choose the correct associate from an array of four pictures (one target and three distractors that consisted of pictures from the previous encoding block which were associated with a different cue). The experiment was terminated upon request or when the runtime at the end of a retrieval block exceeded 30 minutes.

Visual tuning task procedure
For experiment 2, the memory task was followed by a visual tuning task. During this tuning task, every image that was shown during the preceding memory task was displayed. Each image was shown six times in pseudorandom order on the screen for a duration of one second. The inter-image interval was jittered between 500 ms and 550 ms. To ensure attention, patients had to categorize the image as an animal, a place, or a face using the arrow keys on the keyboard.

Ethical approval
Ethical approval was granted by the National Health Service Health Research Authority (15/WM/0219) and the Ethik-Kommission of the Friedrich-Alexander Universität Erlangen-Nürnberg (142_12 B). Informed consent was obtained in accordance with the Declaration of Helsinki.

Behavioural analysis
For the analysis of the first experiment, we considered an episode a hit if the participant correctly identified both stimuli. We considered an episode a miss if the participant either indicated not to remember any stimuli or did not remember either stimulus correctly. Participants correctly recalled on average 68.38% (SE = 4.64%) of episodes in the first experiment (see Table S1) and on average 65.63% (SE = 4.45%) of episodes in the second experiment (see Table S2). This is substantially more than would be expected by chance (16.7% and 25%, respectively).
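The two chance levels quoted above follow from counting the possible selections in the four-picture array. The short sketch below reproduces them under the assumption that a guessing participant picks uniformly at random among all possible selections; the function name is illustrative only.

```python
from math import comb

def chance_of_full_hit(n_options: int, n_targets: int) -> float:
    """Probability of selecting exactly the target set by uniform guessing."""
    return 1 / comb(n_options, n_targets)

# Experiment 1: two targets must both be chosen from four pictures.
chance_exp1 = chance_of_full_hit(4, 2)
# Experiment 2: one target must be chosen from four pictures.
chance_exp2 = chance_of_full_hit(4, 1)

print(f"experiment 1: {chance_exp1:.1%}, experiment 2: {chance_exp2:.1%}")
```

With two targets among four pictures there are six equally likely pairs, hence 1/6; with a single target there are four options, hence 1/4.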

All statistical analyses were conducted using MATLAB R2020a on a computer running Windows 10 Enterprise. The significance threshold for all statistical tests was set at 0.05. Unless specified otherwise, all permutation tests were implemented with N = 10,000 random draws.

Co-Registering
For all but one patient, a pre-operative T1-weighted MRI scan was co-registered with a post-operative scan and normalized in MNI space using SPM12. For one patient, a post-operative CT scan was used instead of a post-operative MRI scan. Each microelectrode was localized either within the hippocampus, within the parahippocampus, or outside of both brain structures through visual inspection (see Figure S1). Only activity from microwires in Behnke-Fried electrodes assigned to the hippocampus was analysed in the main analysis of the current study. Neurons in the parahippocampus were analysed in an independent follow-up analysis.

Patients were implanted with one to eight (see Table S1 and S2 for an overview) depth electrodes of the Behnke-Fried type with microwire bundles (Ad-Tech Medical Instrument Corporation, USA) to localize epileptic foci. The electrode location was determined by clinical need. These single-use electrodes are made from platinum, have a diameter of 1.3 mm and allow for simultaneous macro- and microcontact recordings. Platinum has a high impedance for lower frequency and a low impedance for higher frequency bands. As such, it is suitable to pick up local extra-cellular action potentials. The micro contacts extended radially past the endpoint of the macro depth electrode, and each contained eight high-impedance microwires (38-micron diameter) and one low-impedance microwire that is typically used for referencing.
The electrodes were connected to an ATLAS system (Neuralynx Inc, USA) consisting of CHET-10-A pre-amplifiers and a Digital Lynx NX amplifier and recorded with a sampling rate of either 32,000 Hz (Location: Birmingham) or 32,768 Hz (Location: Erlangen). Upon acquisition, an analogue bandpass filter from 0.1 Hz to 9,000 Hz was applied.
In the following paragraph, we will outline the process used to filter the raw data, detect spike timestamps, extract features of the waveshape and cluster spike waveshapes into putative single neurons using the wave_clus toolbox. For a more in-depth description of the wave_clus algorithm, the reader is referred to Chaure et al. (2018).

Spike detection and spike sorting
The unfiltered signal included both the local field potential and the action potentials of individual neurons. Action potentials are characterized by a very steep and transient amplitude in the signal.
To extract these spikes, we first applied zero-phase filtering using a second-order bandpass elliptic filter in the range of 300-3,000 Hz. The resulting signal contained the information of the so-called spike band.
Next, we segmented the continuous filtered data into epochs of five minutes. Segmenting the continuous data into smaller epochs had the advantage that noise in the signal did not increase the detection threshold for the whole recording and instead was limited to the segment in which it occurred (Chaure et al., 2018).
Spike detection was performed separately for positive and negative deflections. Once a spike was identified, 64 data points around the spike maximum were extracted. This corresponds to a 2 ms window at a sampling rate of 32,000 Hz. The spike peak was aligned to the 20th sampling point.
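As a small illustration of the window arithmetic and peak alignment described above, the sketch below cuts 64 samples around a detected peak so that the peak lands on the 20th sample. It is not the wave_clus implementation; the function name and the handling of peaks near the recording edges (they are skipped) are assumptions made for the example.

```python
def extract_waveform(signal, peak_idx, n_samples=64, peak_pos=20):
    """Cut n_samples around a peak so that it sits on the peak_pos-th sample
    (1-based). Returns None if the window would leave the recording
    (assumed edge handling, not taken from the paper)."""
    start = peak_idx - (peak_pos - 1)
    stop = start + n_samples
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]

fs_hz = 32_000
window_ms = 64 / fs_hz * 1000  # duration of 64 samples in milliseconds

# Toy signal with a single "spike" at sample 100.
toy_signal = [0.0] * 200
toy_signal[100] = 5.0
waveform = extract_waveform(toy_signal, peak_idx=100)
print(window_ms, len(waveform), waveform.index(5.0) + 1)
```

At the 32,768 Hz sampling rate used at the second site, the same 64 samples would span slightly less than 2 ms.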
To avoid misalignment of the spike, the waveshape was first up-sampled to 320 data points using cubic spline-interpolated waveforms and then down-sampled again (Chaure et al., 2018).
Based on the extracted spike waveform, features were computed using a four-scale multiresolution decomposition with a Haar wavelet. This results in 64 wavelet coefficients for each spike. The 10 most significant coefficients were identified using a Lilliefors test and used for the clustering procedure (Chaure et al., 2018).
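To see why a four-scale Haar decomposition of a 64-sample waveform yields exactly 64 coefficients, consider the following minimal sketch. It uses an orthonormal Haar step written from scratch; the coefficient ordering and the function name are assumptions for illustration and may differ from what wave_clus uses internally.

```python
from math import sqrt

def haar_decompose(x, levels=4):
    """Multi-level orthonormal Haar transform.
    Returns [approximation, coarsest detail, ..., finest detail]."""
    approx = list(x)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / sqrt(2) for a, b in pairs])
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    coeffs = approx
    for d in reversed(details):
        coeffs = coeffs + d
    return coeffs

toy_waveform = [float(i % 7) for i in range(64)]
coeffs = haar_decompose(toy_waveform)
print(len(coeffs))
```

The four detail levels contribute 32, 16, 8 and 4 coefficients, and the remaining approximation contributes 4, which sums to 64.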
Nonparametric clustering in the feature space was performed using superparamagnetic clustering (SPC). SPC grouped spike waves into clusters based on nearest-neighbour interactions (Blatt et al., 1996). Template-matching in Euclidean space was performed to assign unclassified waveforms to one of the identified clusters. The resulting clustering solution was then manually inspected and further optimized by rejecting artefact clusters, splitting clusters that represent multi-unit activity and merging clusters that likely stem from the same neural source. See Figures S5 to S7 for an overview of the spike width, spike height, the Fano factor and the firing rate separately for ESNs and all other single units.

Identification of Episode Specific Neurons (ESNs)
For every single unit, we determined the number of spikes within each episode. During encoding, spikes from the onset of the associate images (two seconds after the cue onset, i.e., when the whole information of the episode was present) until the end of the episode were considered. During the retrieval phase, spikes from cue onset until the time point at which participants indicated how many images they remembered were considered. We chose this time window because an episode could be reinstated following cue presentation, while after the response patients were presented with an array of images that could have potentially induced single-unit firing. Because the experiment was self-paced and longer episodes trivially contained more spikes, the firing rate (in hertz) was computed for each episode and single unit. In the next step, we z-scored this firing rate per single unit within all encoding episodes and retrieval episodes separately. Afterwards, we excluded all episodes that were later forgotten (for hit-ESNs) or that were later remembered (for miss-ESNs). Only sessions with at least eight episodes after this restriction were considered for further analysis. We then multiplied this standardized firing rate for encoding and retrieval episodes elementwise to gain an indicator for the reinstatement of firing for each episode (Figure 2A).
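The reinstatement indicator described above can be sketched for a single unit as follows. This is a minimal illustration rather than the authors' MATLAB code: the function names and toy numbers are hypothetical, and the use of the sample standard deviation (N - 1) for z-scoring is an assumption.

```python
from statistics import mean, stdev

def zscore(values):
    """Standardize with the sample standard deviation (assumed convention)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def reinstatement(enc_spikes, enc_dur, ret_spikes, ret_dur, keep):
    """Elementwise product of z-scored encoding and retrieval firing rates.
    Rates are z-scored over all episodes first; episodes with keep == False
    (e.g. later forgotten ones for hit-ESNs) are dropped afterwards."""
    enc_z = zscore([n / d for n, d in zip(enc_spikes, enc_dur)])
    ret_z = zscore([n / d for n, d in zip(ret_spikes, ret_dur)])
    enc_z = [z for z, k in zip(enc_z, keep) if k]
    ret_z = [z for z, k in zip(ret_z, keep) if k]
    return [e * r for e, r in zip(enc_z, ret_z)]

# Toy unit: ten self-paced episodes, the sixth with elevated firing in both phases.
enc_spikes = [4, 5, 3, 6, 4, 30, 5, 4, 3, 5]
enc_dur = [5.0, 6.0, 4.5, 5.5, 5.0, 5.0, 6.0, 4.0, 5.0, 5.5]  # seconds
ret_spikes = [2, 3, 2, 2, 3, 18, 2, 3, 2, 2]
ret_dur = [3.0, 3.5, 2.5, 3.0, 3.0, 3.0, 2.5, 3.5, 3.0, 3.0]  # seconds
remembered = [True, True, False, True, True, True, True, True, True, True]

product = reinstatement(enc_spikes, enc_dur, ret_spikes, ret_dur, remembered)
print([round(p, 2) for p in product])
```

Dividing by the episode duration before standardizing is what removes the trivial dependence of spike count on the self-paced episode length.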
To estimate a threshold at which episode-specific firing reinstatement occurs on a single-unit level, we permuted the order of the encoding episodes and recomputed the elementwise product of the shuffled episode series. We repeated this permutation step 10,000 times and stored all output values. The 99th percentile of these pooled values was then used as a threshold for firing reinstatement. As an additional constraint, z-scored firing during encoding and retrieval each had to exceed 1.645 (≙ p right-tailed < 0.05) to make sure the elementwise product was not predominantly driven by a high firing rate in one of the two phases alone (i.e., either encoding or retrieval). This procedure yields a threshold, but it does not provide family-wise-error-corrected statistical significance at the single-unit level (there is no alpha inflation at the group level, see #Simulation of ESN identification). Furthermore, we assume that single units fire independently. To ensure that Concept Neurons tuned to the animal cue were not falsely interpreted as ESN activity, we excluded ESNs that showed a significant firing increase in response to the animal cue at encoding using the method described below under #Identification of putative Concept Cells.
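A minimal sketch of this first-level thresholding for one unit is given below. The percentile helper uses linear interpolation, which may differ slightly from the percentile definition in the original analysis, and the strict inequalities are an assumption; all names and the toy values are hypothetical.

```python
import random

def percentile(values, q):
    """Percentile (q in 0..100) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def reinstated_episodes(enc_z, ret_z, n_perm=10_000, z_min=1.645, seed=0):
    """Shuffle the encoding order, pool all shuffled products, and take their
    99th percentile as the reinstatement threshold for this unit."""
    rng = random.Random(seed)
    shuffled = list(enc_z)
    pooled = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        pooled.extend(e * r for e, r in zip(shuffled, ret_z))
    threshold = percentile(pooled, 99)
    hits = [i for i, (e, r) in enumerate(zip(enc_z, ret_z))
            if e * r > threshold and e > z_min and r > z_min]
    return hits, threshold

# Toy standardized rates: only the last episode is elevated in both phases.
enc_z = [-0.2, 0.1, -0.4, 0.3, -0.1, 0.2, -0.3, 0.0,
         0.4, -0.5, 0.1, -0.2, 0.3, -0.1, 0.0, 3.5]
ret_z = [0.2, -0.1, 0.3, -0.4, 0.0, 0.1, -0.2, 0.4,
         -0.3, 0.5, -0.5, 0.2, -0.1, 0.1, 0.0, 3.2]
hits, threshold = reinstated_episodes(enc_z, ret_z, n_perm=2000)
print(hits, round(threshold, 3))
```

The additional z > 1.645 constraint matters because a product can also be large when one phase has a very high value and the other only a moderate one.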
Alternative reinstatement measures are explored in the results section under #Identifying Episode Specific Neurons (ESNs) and include (i) adding up the standardized firing rate between encoding and retrieval instead of multiplying them (E&R >= 1.645, reinstatement: E+R), (ii) increasing the minimum standardized firing rate from z = 1.645 to z = 2.6 (E&R >= 2.6, reinstatement: E*R) and (iii) using a different reinstatement measure that normalizes the encoding and retrieval product by their absolute difference (E&R >= 1.645, reinstatement: (E*R)/|E-R|), thereby taking into account the similarity in the standardized firing rate between encoding and retrieval.
In the second step, we calculated whether the number of ESNs (as identified in the above procedure) was above chance level. We did this by randomly choosing one of the permutations calculated in the first step for every single unit and checking whether it would be classified as an ESN under the same criteria outlined above. This approach is similar to a set-level effect in SPM (Penny et al., 2011). This process was repeated 10,000 times and the total number of single units which would be classified as an ESN in every single iteration of this process was used to build a distribution against which we compared our empirically discovered number of ESNs.
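The group-level test can be sketched as below, assuming that each unit's first-level permutations have been stored together with its retrieval series and threshold. The data layout (a list of dictionaries), the function names and the p-value definition (the proportion of null counts at or above the empirical count) are assumptions made for illustration.

```python
import random

def would_be_esn(enc_z_perm, ret_z, threshold, z_min=1.645):
    """True if any episode of a shuffled series meets the ESN criteria."""
    return any(e * r > threshold and e > z_min and r > z_min
               for e, r in zip(enc_z_perm, ret_z))

def group_level_p(units, n_empirical, n_iter=10_000, seed=0):
    """units: list of dicts with keys 'perms' (stored shuffled encoding
    series), 'ret_z' and 'threshold' (hypothetical layout)."""
    rng = random.Random(seed)
    # Classify every stored permutation once, then resample the flags.
    flags = [[would_be_esn(p, u["ret_z"], u["threshold"]) for p in u["perms"]]
             for u in units]
    null_counts = [sum(rng.choice(f) for f in flags) for _ in range(n_iter)]
    p_value = sum(c >= n_empirical for c in null_counts) / n_iter
    return p_value, null_counts

# Toy example with two units and two stored permutations each.
toy_units = [
    {"perms": [[2.0, 0.0], [0.0, 2.0]], "ret_z": [2.0, 0.0], "threshold": 1.0},
    {"perms": [[0.0, 0.0], [0.0, 0.0]], "ret_z": [2.0, 0.0], "threshold": 1.0},
]
p_value, null_counts = group_level_p(toy_units, n_empirical=2, n_iter=1000)
print(p_value, sorted(set(null_counts)))
```

In the toy example only the first unit can ever qualify under the null, so observing two ESNs never occurs by chance.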

Simulation of ESN identification
We created a simulation using random pseudo-spike rates to determine whether our ESN analysis pipeline contains a bias towards false positive results. To create this simulation, we simulated the firing rate of 585 single neurons during 40 encoding and 40 retrieval trials by randomly drawing from a standard uniform distribution in the open interval of 0 to 1. These values were first multiplied by a variance factor that cycled from 2 to 5 and then z-scored independently for encoding and retrieval. Just as in the main ESN analysis, we computed a reinstatement value for each trial by multiplying the two standardized synthetic firing rates.
Next, we created a threshold by permuting the encoding and retrieval trial order 10,000 times while recomputing the shuffled reinstatement value. The 99th percentile was used as a threshold, while the empirical standardized pseudo-firing rate had to be at least 1.645 during encoding and retrieval. If these criteria were met, we considered the neuron an ESN.
Then we computed the second-order (group level) permutation test by drawing a random first-order permutation for every single neuron and contrasted these values with the single neuron specific threshold. If the shuffled values satisfied the criteria for ESNs (i.e., encoding and retrieval standardized pseudo-firing rate at or above 1.645 and a reinstatement value above the neuron specific threshold), we considered the single neuron an ESN under the null distribution.
By repeating this step 10,000 times, we created a distribution under the H0 against which we could compare our initial random values. We repeated this entire process 1,000 times for each level of variance (2 to 5).
Because our initial pseudo-spikes were just random values, we expected 5% of all repetitions to yield a significant number of ESNs at any level of variance. If there was a bias, then more than 5% of all repetitions would contain a significant number of ESNs. As evidenced by Figure S2, this was not the case for any level of variance.
To ensure the robustness of our analysis approach, we repeated the ESN identification analysis using 500 surrogate datasets. These datasets were generated by segmenting all spike times into all available episodes in the order they occurred, and then circularly shuffling them. The results of this analysis revealed that the percentage of significant ESN identifications was 1.8%, which is well below the 5% threshold. These findings provide additional evidence for the credibility and reliability of our analysis.

Identification of putative Concept Cells
We followed the method outlined by Mormann et al. (Mormann et al., 2011) to detect significant single-unit responses towards images. To this end, the 1,000 ms period after the stimulus onset was divided into 19 overlapping 100 ms bins. The spike counts of each bin over all presentations of an image were compared to the 500 ms baseline periods before stimulus onset for all images in the session using a two-tailed Mann-Whitney U test. We used the Simes' procedure to correct for multiple comparisons (Rødland, 2006). We performed this test twice, once with the commonly used threshold of p < 0.0005 and again with a liberal threshold of p = 0.05.
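Two details of this procedure can be made concrete with a short sketch. First, 19 bins of 100 ms covering 1,000 ms imply a 50 ms step if the bins are evenly spaced (an inference, not stated in the text). Second, the Simes procedure combines the per-bin p-values into a single p-value; the version below is the standard Simes formula, and applying it across the 19 bins of one image is an assumption about how the correction was used.

```python
def overlapping_bins(total_ms=1000, width_ms=100, n_bins=19):
    """Evenly spaced, overlapping bins as (start, end) tuples in ms."""
    step = (total_ms - width_ms) / (n_bins - 1)
    return [(i * step, i * step + width_ms) for i in range(n_bins)]

def simes_p(p_values):
    """Simes combined p-value: min over i of n * p_(i) / i for sorted p-values."""
    s = sorted(p_values)
    n = len(s)
    return min(n * p / (i + 1) for i, p in enumerate(s))

bins = overlapping_bins()
combined = simes_p([0.01, 0.04, 0.03])
print(bins[0], bins[1], bins[-1], combined)
```

A unit would then count as responsive to an image if the combined p-value falls below the chosen threshold.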

Identification of temporal Episode Specific Neurons (tESNs)
The analysis to identify neurons that showed a temporal firing reinstatement for specific episodes closely followed the outline described in #Identification of Episode Specific Neurons (ESNs).
For every neuron, we considered the spiking activity from six seconds before until one second after the response during encoding and retrieval (the first and last seconds were later excluded to avoid edge artefacts). We set a minimum threshold of ten spikes per trial and ten trials per neuron to enter the analysis. These thresholds were chosen to avoid artificially high cross-correlations due to low numbers of spikes, and to have enough trials for the randomization procedure.
We then convolved each spike with a Gaussian kernel (standard deviation: 100 ms, length: ±300 ms, peak normalized to one), creating a measure of instantaneous firing rate.
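The smoothing step can be illustrated with the sketch below, which builds the truncated Gaussian kernel and sums one kernel per spike. The 1 ms time resolution and the function names are assumptions; contributions falling outside the analysed window are simply dropped.

```python
from math import exp

def gaussian_kernel(sd_ms=100, half_len_ms=300, dt_ms=1):
    """Gaussian truncated at +/- half_len_ms with its peak equal to one."""
    n = int(half_len_ms / dt_ms)
    return [exp(-0.5 * ((k * dt_ms) / sd_ms) ** 2) for k in range(-n, n + 1)]

def instantaneous_rate(spike_times_ms, duration_ms, kernel, dt_ms=1):
    """Place one kernel copy at every spike time and sum the copies."""
    n_bins = int(duration_ms / dt_ms)
    half = len(kernel) // 2
    rate = [0.0] * n_bins
    for t in spike_times_ms:
        centre = int(t / dt_ms)
        for k, w in enumerate(kernel):
            idx = centre + k - half
            if 0 <= idx < n_bins:
                rate[idx] += w
    return rate

kernel = gaussian_kernel()
rate = instantaneous_rate([500, 520, 900], duration_ms=1000, kernel=kernel)
print(len(kernel), round(max(rate), 3))
```

Because the peak is normalized to one rather than the area, the resulting signal is a relative measure of firing and not a rate in hertz.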
A main problem with comparing neural time courses between encoding and retrieval is that we do not know the time point at which an episode was encoded or retrieved. Therefore, we cross-correlated the instantaneous firing rate during encoding with the instantaneous firing rate during the corresponding retrieval trial (maximum lag of ±2.5 s). The maximum value of this sequence served as our empirical reinstatement value. We then shuffled the encoding and retrieval order and recomputed this reinstatement value 1,000 times. The 99th percentile of these values was used as a threshold. If the empirical reinstatement value reached this threshold, we considered the neuron a temporal Episode Specific Neuron (tESN). In the next step, for each neuron we randomly drew one of the permutations we calculated previously. Neurons whose permuted values reached or exceeded the threshold were considered tESNs under the null hypothesis. We repeated this process 1,000 times to build a null distribution against which we compared our empirical number of tESNs.
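The lag-tolerant comparison described above can be sketched as a maximum over a bounded set of lags. The version below computes the raw (unnormalized) cross-correlation; whether the original analysis normalized the sequence is not stated, so this choice, like the function name, is an assumption.

```python
def max_xcorr(enc, ret, max_lag):
    """Largest raw cross-correlation of two series within +/- max_lag samples."""
    best = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, e in enumerate(enc):
            j = i + lag
            if 0 <= j < len(ret):
                total += e * ret[j]
        best = max(best, total)
    return best

# Toy traces: the same firing pattern occurs three samples later at retrieval.
enc_trace = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
ret_trace = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
print(max_xcorr(enc_trace, ret_trace, max_lag=5),
      max_xcorr(enc_trace, ret_trace, max_lag=1))
```

With a sufficiently large lag window the shifted pattern is recovered in full, whereas a narrow window only captures partial overlap. At a 1 ms resolution, the ±2.5 s window would correspond to max_lag = 2500.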
For experiment 2, we further excluded all trials in which the given neuron showed a significant visual tuning using the methodology outlined under #Identification of putative Concept Cells.
To evaluate the extent of trial overlap between rate code (ESNs) and temporal code (tESNs) reinstated trials, we used a permutation test. The trial identity (reinstated vs. non-reinstated) was shuffled within each neuron, and the resulting overlap values were compared with the empirical overlap. The analysis showed a significant overlap between rate code and temporally reinstated episodes in Experiment 1 and Experiment 2 (p < 0.001). We tested the validity of this analysis by repeating the same analysis using random spike times. We generated these random spike times by first rounding the empirical spike times to the nearest integer and then drawing an equal number of pseudorandom integer values from a discrete uniform distribution between the first and last empirical spike times. We next followed a similar approach to ensure the robustness of our analysis in the tESN identification as outlined in #Simulation of ESN identification, whereby we repeated the analysis using 500 surrogate datasets. These datasets were created by segmenting all spike times into episodes in the order they occurred, and then circularly shuffling them. Upon analysis, the results indicated that the percentage of significant tESN identifications was below the 5% threshold (4.18%), providing further evidence of the credibility and reliability of our analysis.

Spike density calculation
To produce the visualisations in Figure 4, we extracted spikes from one second before the cue onset until five seconds after cue onset for each episode. Binary spike times were convolved with a 100 ms Gaussian kernel (length: ±300 ms, peak normalized to one) to create a time-resolved signal of spike activity. We computed the average firing rate over time for all episodes (ep) during the baseline (BL) period 1,000 ms preceding the animal cue ($\bar{x}_{BL}$). We then z-scored the spike activity during the episode ($x_{ep,t}$) using the standard deviation ($\mathrm{std}(x_{BL})$) and mean ($\bar{x}_{BL}$) across all pre-cue baseline periods (see equation (1)). To account for instances where no spiking activity occurred during the baseline period, 0.1 (see Ison et al., 2015) was added to the standard deviation ($\mathrm{std}(x_{BL})$). Episodes were then split into reinstated and non-reinstated episodes. Firing rates for each episode type (reinstated/non-reinstated) were then averaged over ESNs.
$$z_{ep,t} = \frac{x_{ep,t} - \bar{x}_{BL}}{\mathrm{std}(x_{BL}) + 0.1} \qquad (1)$$
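The baseline normalization can be sketched as follows, assuming that the mean and standard deviation are computed over all baseline samples pooled across episodes and that the sample standard deviation is used; both points, as well as the function name, are assumptions.

```python
from statistics import mean, stdev

def baseline_zscore(episode_signal, baseline_signals, offset=0.1):
    """z-score one episode against the pooled pre-cue baselines.
    The offset is added to the standard deviation so that a silent
    baseline (standard deviation of zero) does not cause a division by zero."""
    pooled = [v for bl in baseline_signals for v in bl]
    mu, sd = mean(pooled), stdev(pooled)
    return [(v - mu) / (sd + offset) for v in episode_signal]

# A unit that is silent during every baseline period still gets finite values.
silent_baselines = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
z_silent = baseline_zscore([0.0, 0.5, 1.0], silent_baselines)

active_baselines = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
z_active = baseline_zscore([2.0, 4.0], active_baselines)
print(z_silent, [round(z, 3) for z in z_active])
```

The offset slightly shrinks all z-values, with the strongest effect on units that have a very quiet baseline.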
We employed a bootstrapping technique (N = 100 random draws) to ensure that the number of reinstated and non-reinstated episodes was the same for each ESN. Next, we computed the standard error of the mean (SEM) and performed a cluster-based permutation test (Maris and Oostenveld, 2007). The grayscale inset at the bottom of the figure shows the proportion of iterations that had a significant cluster at a specific time point.

Identification of Time Cells
We defined the beginning of an encoding block as the most salient event. Based on Umbach and colleagues (Umbach et al., 2020), we then extracted all spikes within each block and convolved them with a 251 ms Gaussian kernel (width factor: 2.5). This created a block number × time points matrix. For our first analysis, we cut each encoding block into 40 equally sized bins, thereby normalizing block duration. We then used a Kruskal-Wallis test to determine whether any of the 40 bins significantly differed from each other.
We then performed a circular shifting permutation test to calculate whether we found a significant number of Time Cells. This is done by shifting a random number of values from the beginning of the vector to the end. This shifting was imposed on each block separately and repeated N = 10,000 times for every single unit.
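The two building blocks of this analysis, normalizing block duration into 40 bins and circularly shifting a block, can be sketched as below. The bin-edge rounding and the function names are assumptions, and the Kruskal-Wallis test itself is omitted.

```python
import random

def bin_means(signal, n_bins=40):
    """Average a block into n_bins (nearly) equal bins.
    Assumes the block has at least n_bins samples."""
    n = len(signal)
    edges = [round(i * n / n_bins) for i in range(n_bins + 1)]
    return [sum(signal[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]

def circular_shift(values, rng):
    """Move a random number of leading values to the end of the vector."""
    k = rng.randrange(len(values))
    return values[k:] + values[:k]

rng = random.Random(0)
block = [float(i) for i in range(80)]      # toy smoothed firing of one block
binned = bin_means(block)                  # 40 time-normalized bins
surrogate = circular_shift(binned, rng)    # one surrogate for this block
print(len(binned), binned[0], binned[-1])
```

Circular shifting preserves the temporal structure within a block but breaks its alignment to block onset, which is what the null hypothesis for Time Cells requires.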
In a second test, the block length was determined by the longest block and shorter blocks were filled up with NaN values. This resulted in no normalization of time between blocks. The rest of the procedure is the same as described in the above paragraph.
Data and analysis code are currently available in a private repository on figshare (https://figshare.com/s/12fcaf4069e972a5c362) and will be made publicly available upon publication.

Materials and Methods
Figures S1 to S7 Tables S1 and S2