## Introduction

How do we strengthen memories while we sleep? The prime vehicle of systems consolidation is thought to be the reactivation of information encoded during prior wakefulness1,2,3,4. Through reactivation, memory representations are relayed between the hippocampus and neocortical long-term stores, transforming initially labile representations into long-lasting memories during sleep5,6. The communication between the hippocampus and neocortical networks is thought to be facilitated by an intricate interplay of the cardinal NREM sleep-related oscillations, namely cortical slow oscillations (SOs), thalamo-cortical sleep spindles, and hippocampal sharp-wave ripples7,8,9,10,11,12. SOs reflect fluctuations of the membrane potential and orchestrate transitions from neuronal silence (hyperpolarization, i.e., downstate) to neuronal excitation (depolarization, i.e., upstate)13,14. Importantly, they initiate time windows of excitability and inhibition not only in cortical but also in subcortical areas15,16,17. They trigger the emergence of sleep spindles in the thalamus18, which nest in the excitable upstates of the SOs. Spindles have been shown to gate Ca2+ influx into dendrites, thereby facilitating synaptic plasticity19,20. Importantly, recent evidence from two-photon imaging in mice suggests that Ca2+ influx is strongly amplified when spindles coincide with SO upstates21. Lastly, hippocampal ripples are transient network oscillations and have been closely linked to reactivation/replay of learning experiences22,23. They have been shown to occur in the excitable troughs of the spindle, suggesting that spindles might facilitate information transfer from the hippocampus to neocortical target sites24,25. The efficacy of systems consolidation through memory reactivation might thus hinge on concurrent SO-spindle coupling, ensuring optimal conditions to ignite structural changes in cortical target sites8,11,26,27.

Indeed, recent work in humans has revealed a key role of SO-spindle coupling during NREM sleep for behavioral expressions of consolidation. For instance, the precision of SO-spindle coupling, i.e., the exact timing of spindle maxima with respect to the SO upstate, has been shown to correlate with retention of declarative learning material28,29. Moreover, levels of SO-spindle coupling track the rise and decline of memory performance across development30,31,32. What is unknown, however, is whether there is a link between SO-spindle coupling and physiological expressions of consolidation, i.e., memory reactivation. A recent rodent study revealed that precise SO-spindle coupling is key for maintaining the reactivation of neural ensembles33, but whether and how this relates to episodic memory consolidation in humans is unclear.

In humans, the study of memory reactivation during sleep has mainly relied on targeted memory reactivation (TMR) protocols34,35. This experimental technique follows the rationale that reminder cues are presented during sleep to exogenously trigger memory reactivation. Intriguingly, presenting auditory reminder cues during NREM sleep reliably induces SO-spindle complexes36,37,38. However, to what extent TMR-induced processes reflect natural/endogenous consolidation processes remains unknown.

Building on the work summarized above, we propose that SO-spindle complexes might clock endogenous memory reactivation in service of consolidation during human sleep. To test this notion, we devised an experimental paradigm in which participants acquired associative memories before taking a nap. Multivariate decoding was then used to assess endogenous memory reactivation during NREM sleep. In this work, we show that memory reactivation is specifically bound to the presence of SO-spindle complexes, with the precision of their coupling correlating with reactivation strength. Reactivation strength in turn predicts the extent of consolidation across participants. These findings elucidate the memory function of sleep in humans and illustrate the importance of SO-spindle coupling for clocking endogenous consolidation processes.

## Results

Twenty participants (age: 20.75 ± 0.35; 17 female) took part in two experimental sessions. In both sessions they performed an episodic learning task, with memory performance being assessed before and after taking a 120 min nap (Fig. 1a). Depending on the experimental session, participants learned to associate verbs with images of objects or scenes during the presleep learning phase. These stimulus categories were chosen as they recruit distinctive brain networks (e.g., lateral occipital complex for objects, parahippocampal place area for scenes39,40), thus facilitating the analytical readout of endogenous, experience-dependent memory reactivation during sleep. Specifically, learning-related memory reactivation during sleep would manifest as enhanced representational evidence for the stimulus category learned before sleep (i.e., greater evidence for object representations after word-object encoding and greater evidence for scene representations after word-scene encoding, respectively).

Memory performance was tested both before and after the sleep period in a stepwise manner. First, participants made word-recognition judgments (old or new). Then, for recognized words only, recall of the associated image exemplar (object or scene, depending on experimental session) was assessed. The resulting recall performance was then normalized by the amount of correctly recognized items (i.e., “hits”). To avoid any impact of presleep testing on our behavioral consolidation measures41,42, only half of the learned material was tested before sleep, while the remaining half was tested after sleep. Finally, at the end of the experimental sessions participants performed an independent “localizer task”, where a new set of object and scene images was presented (including both stimulus categories, irrespective of experimental session). This localizer served to train a linear classifier to distinguish object- vs. scene-related electroencephalographic (EEG) patterns.

### Behavioral results and category classification during the localizer task

First, we calculated d-prime (d′43) as a general measure of recognition memory performance (for a detailed overview of memory measures as well as sleep characteristics see Supplementary Tables 1 and 2). Both pre- and post-sleep d′ levels confirmed that participants could reliably discriminate between old and new items (i.e., d′ > 0; presleep objects: d′ = 2.11 ± 0.14, scenes: d′ = 2.02 ± 0.22; post-sleep objects: d′ = 1.76 ± 0.19, scenes: d′ = 1.69 ± 0.23). Out of hits, participants recalled the correct image for 64.31 ± 3.23% before sleep (objects: 64.90 ± 3.99%, scenes: 63.72 ± 5.20%) and for 57.61 ± 3.91% after sleep (objects: 59.39 ± 5.71%, scenes: 55.82 ± 5.47%).

To test for potential differences in memory performance between test times and stimulus categories, we conducted ANOVAs for recognition memory (d′) and cued recall, including the factors category (object vs. scene) and test-time (pre- vs. post-sleep). Results indicated that memory performance (both recognition and recall) declined over the course of sleep (main factor test-time: recognition memory: F1,19 = 10.91; p = 0.004; cued recall: F1,19 = 15.53; p = 0.001). Importantly though, no difference in memory performance between categories was observable (main effect category: recognition memory: F1,19 = 0.21; p = 0.65; cued recall: F1,19 = 0.38; p = 0.54) and no interaction between test-time and learning category (recognition memory: F1,19 = 0.003; p = 0.95; associative memory: F1,19 = 0.69; p = 0.41), ensuring that task difficulty was highly comparable between image categories (also see Table S1).

The localizer task at the end of each session was employed to derive the neural signatures of object vs. scene processing, which were then used to track category-specific memory reactivation during NREM sleep (see below). Participants were presented with novel sets of object and scene images and performed a continuous recognition task on these images. Specifically, each image was presented twice (mean distance between successive presentations = 8.06, range = 2–33) and participants were instructed to indicate whether a given item was “new” (first presentation) or “old” (second presentation). As expected, participants showed high accuracy levels on this task (objects: 97.02 ± 0.61 correct decisions; scenes: 92.57 ± 4.44 correct decisions), with performance again matched between image categories (t(19) = 1.05, p = 0.31).

To extract the category-specific (i.e., object and scene) patterns of neuronal activity, we pooled the localizer data across experimental sessions and performed multivariate classification (linear discriminant analysis; LDA) on these data (Fig. 1c). Using fivefold cross-validation (see Methods), above-chance classification accuracy emerged around 150 ms following image onset, was sustained until 2800 ms and peaked at 600 ms (p = 0.002, corrected for multiple comparisons across time). Hence, the localizer data allowed us to isolate brain patterns associated with the processing of object and scene images, which we then used to guide analysis of category-specific reactivation during sleep (for results concerning the stability of the decoding approach see Supplementary Fig. 1).

### Endogenous memory reactivation during NREM sleep is clocked by SO-spindle complexes

As mentioned above, theoretical models and recent empirical findings point to particular role of SO-spindle coupling for memory consolidation. We thus tested the resulting prediction that the joint presence of SOs and sleep spindles (henceforth referred to as “SO-spindle complexes”) would drive endogenous memory reactivation during human sleep. SOs and sleep spindles were detected in the EEG data using established algorithms8,44. To isolate SO-spindle complexes, we identified events where SO downstates were followed by sleep spindles within a time window of 1.5 s (for a time–frequency representation of the SO-spindle complexes see Fig. 2a; for a peri-event SO-spindle histogram, see Supplementary Fig. 2). To determine whether learning-related (i.e., category-specific) neuronal activity would be differentially reactivated during SO-spindle complexes, we first trained a classifier on the concatenated localizer data from both experimental sessions [−0.5 to 3 s]. Importantly, the localizer tasks of both sessions included object and scene images, to ensure that multivariate measures of potential reactivation not merely reflect session-specific EEG properties. The resulting training weights were then applied on both sessions’ sleep data, centered around the downstate of SO-spindle complexes (for related results where the data were locked to different spindle features see Supplementary Fig. 3). Classifier testing labels reflected the stimulus category used in the preceding encoding session (object or scene), such that above-chance classification signifies endogenous activation patterns more strongly resembling the just-learned stimulus category than the alternative stimulus category.

As shown in Fig. 2b, results revealed a cluster of significant above-chance classification from 800 to 1200 ms relative to the SO downstate (p = 0.016, corrected for multiple comparisons across time, localizer time-window [1000 to 1800 ms]), emerging between maximum and offset of coupled sleep spindles (for the corresponding accuracy map see Supplementary Fig. 4; for participant specific classification values see Supplementary Table 3). No negative cluster survived correction for multiple comparisons (cluster with smallest p > 0.6).

But does endogenous memory reactivation indeed require the joint presence of SOs and spindles? To address this question, we performed the same decoding procedure, but locking the data to solitary SO or spindle events (thus, SOs without spindles and vice versa). For both types of events, when testing accuracy levels against chance at any localizer time × sleep time point, no significant cluster of above-chance classification emerged (in both cases cluster with the smallest p > 0.2, see Supplementary Fig. 5; similarly, testing the classifier on Slow spindle—SO-locked data did not yield any significant cluster of above-chance classification (cluster with smallest p = 0.67; see Supplementary Fig. 6)).

### Precision of SO-spindle coupling correlates with reactivation strength

If SO-spindle coupling is indeed instrumental for consolidation, its precision should impact the extent of endogenous memory reactivation. To quantify the preferred phase of SO-spindle modulation, we determined in every participant the SO phases corresponding to the spindle peak amplitudes (electrode Cz). In 16/20 participants we found significant nonuniform distributions (p < 0.05; Rayleigh test, mean vector length: 0.34 ± 0.03). In line with previous findings, we found a significant nonuniform distribution across participants (Rayleigh z = 16.71, p < 0.0001), with spindles peaking near the SO upstate (corresponding to 0°; mean coupling direction: −36.78° ± 5.48°; see Fig. 2c).

To further test whether the precision of SO-spindle coupling would be relevant for the reactivation of memories we computed a circular-linear correlation between each participant’s preferred SO-spindle phase (averaged across sessions) and their mean reactivation strength (averaged across the significant cluster shown in Fig. 2b). The individual SO-spindle modulation phase was significantly correlated with decoding accuracy (r = 0.66; p = 0.011). The distribution indicated that the closer the spindles were nested towards the SO upstate, the higher the fidelity of the associated reactivation signal (see Fig. 2c, for a scatter plot see Supplementary Fig. 7; for additional analyses estimating the impact of trait-like characteristics in this context, see Supplementary Notes).

To ensure that the results described above were not driven by differential wake classification characteristics, we conducted a partial circular-linear correlation with the mean decoding levels from the localizer tasks (averaged across the significant cluster shown in Fig. 1c) as a covariate. Again, we observed a positive relationship between the individual SO-spindle modulation phase and decoding accuracy (r = 0.65; p = 0.012).

### Reactivation strength predicts consolidation of associative memories

If SO-spindle triggered reactivation reflects memory-related processes, one would expect a functional link with behavioral expressions of consolidation. To address this question, we correlated, across participants, levels of post-sleep memory retention and reactivation strength. Specifically, a “retention index” (proportion of post-sleep recalled images (out of hits) in relation to pre–sleep memory performance; see Methods section for details) was collapsed across sessions and correlated with decoding accuracies averaged across the significant cluster reported above. As shown in Fig. 2d, we observed a significant positive relationship between the two variables (Spearman rho = 0.45, p = 0.048). Of note, no association between decoding accuracy and recognition memory performance was detectable (r = 0.02, p = 0.93), indicating that reactivation strength was specifically linked to the consolidation of hippocampal-dependent associative memories45. However, the correlation between reactivation and consolidation of associative memory was not significantly greater than that with recognition memory (z = 1.35; p = 0.17). Lastly, we again controlled this analysis for localizer decoding levels using a partial correlation, which substantiated the results (Spearman rho = 0.45, p = 0.049).

## Discussion

Our results demonstrate that consolidation relies on endogenous memory reactivation clocked by SO-spindle complexes. In particular, we found that during the presence of SO-spindle complexes, activation patterns were biased towards the previously encoded learning material (Fig. 2a, b). Moreover, the precision of SO-spindle coupling predicted the fidelity of memory reactivation (Fig. 2c). Finally, reactivation strength predicted the amount of consolidation across participants, highlighting its functional significance for behavior (Fig. 2d).

NREM sleep oscillations (SOs, spindles, and ripples) have long been implicated in the memory function of sleep, and recent work has emphasized the importance of their temporal synchronization46. Specifically, the precise timing of SOs, spindles, and ripples is thought to enable the relay of hippocampus-dependent memories to cortical networks1. Indeed, recent work in rodents has shown that their co-occurrence is necessary for effective consolidation as assessed via fear conditioning10 or an object-in-place recognition task9. However, how these tasks relate to expressions of episodic memory in humans is not entirely clear. Human iEEG work with epilepsy patients has corroborated the triple-interaction of these sleep oscillations8,24,27, but none of these studies has assessed memory reactivation or the effects on behavior. Investigation of healthy participants via scalp EEG has shown that brain patterns across sleep differ as a function of prior learning tasks47, but these activation patterns were not directly related to wake activity or to discrete SOs/spindles. Another study employed simultaneous EEG-fMRI and found univariate signal increases in learning-related areas during spindles48 (see also44), but it remained open whether such reactivation bears relevance for memory consolidation. Finally, the advent of TMR protocols49,50 has shown evidence for both SO-spindle complexes and information processing in response to external reminders36,38,51,52,53,54,55, but it is unclear whether and how such exogenous memory reactivation relates to endogenous reactivation in service of memory consolidation. In sum, different lines of research across species point to a key role in coupled sleep oscillations, but the dynamics of endogenous reactivation in humans and its relevance for memory consolidation has remained unclear.

In the current study, we tackled this question by employing two learning sessions per participant, each using different and analytically discriminable learning stimuli (object and scene images, Fig. 1a). To ensure that multivariate measures of reactivation not merely reflect session-specific EEG properties, we included an object/scene localizer task in each session and trained a linear classifier on the combined data. This allowed us to track the reemergence of learning categories during the nap periods. It deserves mention that decoding levels were modest in general and not every participant reached above-chance classification (18/20, see Fig. 2d and Supplementary Table 3). Several reasons might limit the effect size when decoding memory reprocessing during sleep. First, the signal of interest (i.e., sleep electrophysiology) is inherently noisy. Guided by theoretical considerations we limited the search-space for memory reactivation to the presence of SO-spindle complexes. Still, it is unlikely that each single SO-spindle complex is associated with memory reactivation. Including the presence of ripples as a criterion may increase sensitivity, but even SO-spindle-ripple complexes are unlikely to yield robust memory reactivation in every instance56. Second, our data show that SO downstates represent viable reference points for time-locking the analysis of memory reactivation. However, there is considerable variability in signal characteristics across SOs and spindles (e.g., event durations or peak times), and such across-event variability diminishes classification power which relies on spatiotemporal activation patterns common across events. That said, decoding levels observed here are in line with previous TMR studies examining sleep-related memory reactivation with multivariate classification36,53,57. Importantly, we found that higher decoding performance correlates with the behavioral expression of memory consolidation across participants, further corroborating the functional significance of reactivation.

Another key feature of our paradigm was the assessment of both item- and associative memory performance. Interestingly, the strength of memory reactivation during sleep predicted consolidation levels for associative memory only. This finding could indicate that reactivation particularly benefits hippocampus-dependent memories45. However, it might also reflect the fact that reactivation pertained to the categorical features of the learning material, which was also the aspect relevant for associative- and not item memory. Moreover, while performance levels were carefully matched between object and scene tasks (Fig. 1b), performance was lower for associative memory than for item recognition. Thus, differential effects of reactivation for associative- vs. item memory could also suggest differential benefits of sleep for weaker vs. stronger memories58,59,60,61 but see ref. 62.

Owing to the limited spatial resolution of scalp EEG (especially for transient high-frequency oscillations), our current data remain agnostic with regard to hippocampal ripples. That said, a recent iEEG study has shown that both hippocampal ripples and hippocampal–cortical interactions are most eminent when preceded by a cortical SO-spindle complex24. To the extent that reactivation observed here is linked to hippocampal engagement, the timing of our effects (Fig. 2a, b) is consistent with accumulating evidence that the hippocampal–cortical dialog is in fact initiated by cortex24,25,63,64,65. One tentative interpretation of our results might thus be that cortical SO-spindle complexes trigger hippocampal memory reactivation while ensuring that the cortical target area is optimally tuned for synaptic plasticity and memory reprocessing19,21,66. Indeed, recent rodent work has shown that optogenetic induction of SO-locked spindles enhances SOs-spindle-ripple coupling and the consolidation of hippocampus-dependent memories10. Our finding that reactivation peaks towards the end of spindles (Fig. 2b) is consistent with the idea that mnemonic reprocessing and integration into neocortical networks continue after sleep spindles, i.e., during periods of spindle “refractoriness”67. Likewise, intracranial recordings in humans have shown that hippocampal–cortical connectivity (“mutual information”) mediated by hippocampal ripples occurred ~500–1500 ms after the SO downstate24, again matching the time window in which we observed memory reactivation. Together, one tentative scenario might be that memory processing is most beneficial after SO-spindle complexes, i.e., at time points of elevated cortical plasticity.

Analytically, our approach relied on (i) matching behavioral performance between sessions, (ii) pooling sleep data across both sessions, and (iii) deriving evidence for the reactivation of learning material across all aggregated SO-spindle complexes. These design features leave some interesting questions open for future work. First, to what extent might trait-like participant characteristics drive both reactivation and memory processes? Using our sleep questionnaires, we were able to rule out subjective sleep quality and circadian rhythm as confounds (see Supplementary Notes), but there may be other trait-like factors impacting reactivation and consolidation. An alternative design would be to conduct a longitudinal study in which within-participant levels of learning and consolidation are experimentally manipulated across multiple sessions (e.g., by varying encoding depth or task difficulty). Second, while aggregating all SO-spindle events is essential for the classification approach, it leaves open whether reactivation occurs during each SO-spindle event. An alternative approach might be to use intracranial recordings to identify single neurons that are tuned to stimuli used in a specific learning session and then track engagement of these neurons during individual SO-spindle complexes. Such more fine-grained methods might provide additional insights into reactivation-related characteristics (e.g., accuracy and frequency of reactivation processes). In conclusion, our results indicate that endogenous memory reactivation in service of sleep-dependent consolidation is clocked by the fine-tuned coupling of SOs and spindles. Future work employing simultaneous recordings from the hippocampus will further elucidate the intricate dynamics underlying the hippocampal–cortical dialog of systems consolidation.

## Methods

### Participants

Twenty healthy, right-handed participants (mean age: 20.75 ± 0.35; 17 female) with normal or corrected-to-normal vision took part in the experiment. An additional five participants had to be excluded due to insufficient sleep (less than 30 min sleep during one of the sessions). The sample size was determined in accordance with previous human sleep and memory studies (e.g., 30,68). Pre-study screening questionnaires (including the Pittsburgh Sleep Quality Index (PSQI, 69), the morningness–eveningness questionnaire70, and a self-developed questionnaire querying general health status and the use of stimulants) indicated that participants did not take any medication at the time of the experimental session and did not suffer from any neurological or psychiatric disorders. All participants reported good overall sleep quality. Furthermore, they had not been on a night shift for at least 8 weeks before the experiment. All participants were instructed to wake up by 7 a.m. and avoid alcohol the evening before and caffeine on the day of the experimental sessions. They confirmed at the beginning of each experimental session their adherence to the requirements. The study was approved by the University of Birmingham Research Ethics Committee and written informed consent was obtained from participants.

### Stimuli and procedures

#### Overview

The experiment consisted of two experimental sessions (object and scene condition), separated by at least 1 week (mean = 8.5 ± 0.85 days). The order of the two sessions was counterbalanced across participants. On experimental days participants arrived at the sleep laboratory at 11 a.m. The experimental session started with the set-up for polysomnographic recordings during which electrodes for electroencephalographic (EEG), electromyographic (EMG), and electrocardiographic (ECG) recordings were applied. Before the experimental sessions, participants were habituated to the environment by spending an adaptation nap in the sleep laboratory.

At around 12 a.m. the experiment started with a modified version of the psychomotor vigilance task (“PVT”71), followed by the memory task (for details see Memory Task below). The sleep period began at ~1 p.m. and participants were given 120 min to nap (mean total sleep time: 101.63 ± 2.23 min; for sleep characteristics see Supplementary Table 2). Afterwards, the vigilance of all participants was assessed using the PVT and memory performance was tested again. At the end of each session a localizer task was conducted (see Localizer Task for details).

#### Stimuli

A set of in total 360 verbs and 240 images (half objects and half scenes) served as experimental stimuli during both sessions. Objects were images of animals, food, clothing, tools, or household items presented on a plain white background (e.g., a hammer). Scenes were images of nameable landscapes or places (e.g., a coffee shop). All images were taken from72.

For the recording of behavioral responses and the presentation of all experimental tasks, Psychophysics Toolbox Version 373 and MATLAB 2018b (MathWorks, Natick, USA) were used. Participants completed a practice run (five trials) of each experimental task in advance to ensure they fully understood the instructions. Responses were made via keyboard presses on a dedicated PC. Across all experimental phases, presentation order of stimuli was randomized across participants.

The vigilance of the participants was assessed using a modified version of the “PVT”71 before the encoding phase and right after the sleep period. Participants were presented with a centered fixation cross on the computer screen. Every 2–10 s the fixation cross was replaced by a counter counting up from 0 to 2 s in steps of 20 ms. Participants were instructed to stop the counter as fast as possible by pressing the space bar. After each trial participants were provided with feedback about their reaction time. The task was administered for 5 min. For PVT related results see Supplementary Fig. 8.

#### Familiarization

The experiment began with an image familiarization phase. The purpose of this part was (i) to facilitate learning of the verb-image pairs in the main encoding session and (ii) to provide the proper image names for subsequent cued recall. Each trial started with a fixation cross, presented for 1.5 ± 0.1 s. Subsequently, participants saw one of 130 images showing objects or scenes (depending on the experimental session). About 120 of these images were part of the subsequent learning material and were accompanied by a caption naming the exemplar. Ten additional images, which were not further used during the experiment, were accompanied by an erroneous description. Each stimulus combination was presented for 2.5 s on the computer screen. The participants’ task was to press a button whenever they encountered a wrong image-word combination.

#### Encoding

Participants learned pairwise associations between 120 verbs and images. The images comprised either objects or scenes (depending on experimental session).

Each trial started with a fixation cross, presented for 1.5 ± 0.1 s. Afterwards, a verb (e.g., “jump”) was presented for 1 s on the computer screen and immediately followed by the to-be-associated image for 4 s. Participants were instructed to form a vivid mental image or story linking the verb and the object/scene. After the presentation of the image (4 s), they had to indicate whether the image they had formed was realistic or bizarre. In addition, participants were informed that their memory performance for verb- image pairs would be tested later. The learning block was run twice with varying trial order to reach satisfactory levels of presleep memory performance (as determined in a pilot study).

#### Presleep memory test

In order to prevent any testing effect on our behavioral measures of memory consolidation41,42, only half of the learned verb-image combinations was tested during the presleep memory test. Thus, the presleep memory test included 60 randomly chosen verbs intermixed with 30 new verbs, which were not seen by the participants before (“foils”). Each trial started with a fixation cross, presented for 1.5 ± 0.1 s. After the fixation cross, a verb was presented on the computer screen. After 3 s, participants had to indicate whether the verb was “old” (i.e., part of the learning material) or “new”’ (i.e., it was not seen during learning) within the next 10 s. In case of “new” responses, participants immediately moved on to the next trial. In case of “old” responses, participants were required to type a description of the image they had in mind or to type “do not know” in case they could not recall the target image. Trials were coded as correct if (i) the participant typed the same caption as shown during the familiarization phase or (ii) the description unambiguously matched the content of the image

#### Sleep period

The nap period began at ~1 p.m. Participants had the opportunity to sleep in a laboratory bedroom for 120 min, while their brain activity was monitored using polysomnography).

#### Post-sleep memory test

Twenty minutes after waking up, participants performed another memory test on the remaining 60 study items. This followed the same procedures as the presleep memory test with the exception that new foil verbs were used.

During the localizer task participants were presented with a new set of images comprising objects and scenes (90 objects and 90 scenes, irrespective of session). Each trial started with a fixation cross, presented for 1.5 ± 0.1 s. Subsequently, a randomly chosen image (object or scene) was presented on the computer screen for a minimum of 2.5 and a maximum of 10 s. Each image was presented twice during the task and participants were instructed to indicate whether it was shown for the first (“new”) or second (“old”) time (mean distance between successive presentations = 8.06, range = 2–33).

By administering the localizer task at the very end of each session, we assured that participants engaged exclusively with a given stimulus category before sleep (objects or scenes, respectively). The rationale of this approach was to keep the category-specific representations during learning as pure as possible, in an effort to bias their reactivation during the subsequent sleep period. However, presenting both stimulus categories during the localizer task ensured that category-specific classifier evidence during sleep would not merely reflect general differences between sessions (e.g., electrode impedances, electrode positions, etc.).

### EEG

A Brain Products 64 channel EEG system was used to record electroencephalography (EEG) throughout the experiment. Impedances were kept below 10 kΩ. EEG signals were referenced online to electrode FCz and sampled at a rate of 1000 Hz. Furthermore, EMG and the ECG was recorded for polysomnography. Sleep architecture was determined offline according to standard criteria by two independent raters74.

### Data analysis

#### Behavioral preprocessing

To assess recognition memory performance, we calculated the sensitivity index d′ [i.e., z(Hits)—z(False Alarms)] according to signal detection theory. Proportions of 0 and 1 were replaced by 1/2 N and 1–1/2 N, respectively, with N representing the number of trials in each proportion (i.e., N = 60, see ref. 43).

For associative memory performance we calculated the proportion of correctly recalled images relative to the number of recognized words (i.e., (recalled images/hits) $$*$$ 100). To correlate levels of memory retention and reactivation strength we derived a “retention index”. We computed the proportion of post-sleep recalled images (out of hits) in relation to presleep memory performance (i.e., (recalled out of hits post-sleep/recalled out of hits presleep) $$*$$ 100) and collapsed these measures across sessions.

#### EEG data analysis

EEG data were preprocessed using the FieldTrip toolbox for EEG/MEG analysis75. All data were downsampled to 200 Hz. Subsequently, the localizer and sleep data were segmented into epochs. The temporal range of the epochs was [–1 to 3] s around stimulus onset for localizer trials. As in other studies concentrating on the coordination of SOs and spindles (e.g.,8,30,31,32,76) we specifically focused on electrode Cz due to the spatial distribution of both oscillations. Both oscillations show strong presence over central areas, rendering Cz an optimal target zone for investigating concomitant activity of SOs and (fast) spindles. Hence, for the sleep data, slow oscillation—spindle epochs [−2.5 to +2.5 s] time-locked to SO downstates were extracted from channel Cz (for details see Event detection).

Noisy EEG channels were identified by visual inspection, discarded, and interpolated, using a weighted average of the neighboring channels. The localizer data were additionally subjected to an independent component analysis77 and ICA components associated with eye blinks and eye movements were identified and rejected.

#### Event detection and SO-spindle coupling

SOs and sleep spindles were identified for each participant, based on established detection algorithms8,44. Following standard procedures, all sleep data were re-referenced against linked mastoids for sleep scoring and event detection74,78,79; please note that the classification results reported in Fig. 2b remained unchanged when using a CAR scheme). SOs were detected as follows: Data were filtered between 0.3–1.25 Hz (two-pass FIR bandpass filter, order = three cycles of the low frequency cut-off). Only movement-free data (as determined during sleep scoring) from NREM sleep stages 2 and 3 were taken into account. All zero-crossings were determined in the filtered signal at channel Cz, and event duration was determined for SO candidates (that is, downstates followed by upstates) as time between two successive positive- to-negative zero-crossings. Events that met the SO duration criteria (minimum of 0.8 and maximum of 2 s, 0.5–1.25 Hz) entered the analysis. 5-s-long segments (±2.5 s centered on the downstate) were extracted from the unfiltered raw signal.

For spindle detection, data were filtered between 12–18 Hz25,80 (two-pass FIR bandpass filter, order = three cycles of the low frequency cut-off), and again only artifact-free data from NREM sleep stages 2 and 3 were used for event detection. The root mean square (RMS) signal was calculated for the filtered signal at channel Cz using a moving average of 200 ms, and a spindle amplitude criterion was defined as the 75% percentile of RMS values. Whenever the signal exceeded this threshold for more than 0.5 s but less than 3 s (duration criteria), a spindle event was detected. Epochs time-locked to the minimum spindle trough (−2.5 to +2.5 s) were extracted from the unfiltered raw signal for all events. To isolate SO-spindle complexes, we determined for all SOs whether a spindle was detected following the SO (SO downstate + 1.5 s). Finally, SO-spindle events were extracted (−2.5 to +2.5 s with regards to the SO downstate) from the raw signal at channel Cz.

For the analysis of SO-spindle coupling8,24, we filtered the SO-spindle data in the SO range (0.3–1.25 Hz, two-pass Butterworth bandpass filter), applied a Hilbert transform and extracted the instantaneous phase angle. Next we filtered the same data segments in the spindle range (12–18 Hz two-pass Butterworth bandpass filter), Hilbert transformed the signal and extracted the instantaneous amplitude. Only data points within ±1.5 s were considered to avoid filter-related edge artifacts. Then we detected the maximal sleep spindle amplitude in channel Cz and isolated the corresponding SO phase angle. The preferred phase of SO-spindle coupling was then obtained from averaging all individual events’ preferred phases of each participant, and the resulting distribution across participants was tested against uniformity (Rayleigh test, CircStat toolbox81).

#### Multivariate analysis

Multivariate classification of single-trial EEG data was performed using MVPA-Light, a MATLAB-based toolbox for multivariate pattern analysis82. For all multivariate analyses, a LDA was used as a classifier82. Prior to classification, all data were re-referenced using a common average reference (CAR).

For classification within the localizer task, the localizer data were z-scored across all trials for each time point separately. Next, data from both sessions were collapsed and subjected to a principal component analysis (PCA), which transforms the data into linearly uncorrelated components, ordered by the amount of variance explained by each component83. PCA was applied to reduce dimensionality and limit over-fitting84 and the first 30 principal components were retained for further analysis85,86,87. To quantify whether object and scene representations can be differentiated in the localizer, the classifier was trained and tested to discriminate between object and scene trials. Data were smoothed using a running average window of 150 ms. The EEG channels served as features and a different classifier was trained and tested on every time point. As metric, we used Area Under the ROC Curve (AUC), which indexes the mean accuracy with which a randomly chosen pair of Class A and Class B trials could be assigned to their correct classes (0.5 = random performance; 1.0 = perfect performance). To avoid overfitting, data were split into training and test sets using fivefold cross-validation88. Since cross-validation results are stochastic due to the random assignment of trials into folds, the analysis was repeated five times and results were averaged. For statistical evaluation, surrogate decoding performance was calculated by shuffling the training labels 250 times. Resulting surrogate performance values were then averaged, providing baseline values for each participant under the null hypothesis of label exchangeability.

To investigate differential evidence for object vs. scene representations as a function of prior learning during SO-spindle complexes (Fig. 2b), we used the temporal generalization method89. Prior to decoding, a baseline correction was applied based on the whole trial ([−0.5 to 3 s] for localizer segments; [–1.5 to 1.5 s] for SO-spindle segments). Next, localizer and sleep data were z-scored across trials and collapsed across sessions. PCA was applied to the pooled wake-sleep data and the first 30 principal components were retained. Localizer and sleep data were smoothed using a running average window of 150 ms. A classifier was then trained for every time point in the localizer data (Fig. 2b, vertical axis) and applied on every time point during SO-spindle complexes (horizontal axis). No cross-validation was required since localizer and sleep datasets were independent. As metric, we again used AUC (see above). For statistical evaluation, surrogate decoding performance was calculated by shuffling the training labels (stemming from the localizer task) 250 times. Again, the resulting performance values were averaged, providing baseline values for each participant under the null hypothesis of label exchangeability

To resolve the topography of diagnostic features, we conducted a “searchlight decoding procedure”. In brief, PCA components were projected back to sensor space and the classification procedure was repeated across moving kernels of small electrode clusters, with neighboring electrodes being selected as features [feature number range: 5 to 9]. Classifiers were trained for every time point in the localizer data and applied on every time point during SO-spindle complexes. Finally, classification values were collapsed across our time windows of interest [localizer time: 1000 to 2000 ms; SO-spindle time: 800 to 1200 ms] and tested against chance level (corrected for multiple comparisons across space). A broad cluster of above-chance classification comprising bilateral parietal and occipital areas emerged (pcluster = 0.004).

#### Time–frequency analysis

Time–frequency analysis of the SO-spindle segments was performed using FieldTrip. Frequency decomposition of the data, using Fourier analysis based on sliding time windows (moving forward in 50 ms increments). The window length was set to five cycles of a given frequency (frequency range: 1–30 Hz in 1 Hz steps). The windowed data segments were multiplied with a Hanning taper before Fourier analysis. Afterwards, power values were z-scored across time [−4 to 4 s]. The longer time segments were chosen to allow for resolving low frequency activity within the time windows of interest [−1.5 to 1.5 s] and avoid edge artifacts.

### Statistics

Behavioral retrieval data were subjected to a 2 (Category: Object/Scene) × 2 (Test-Time: Presleep/Post-sleep) repeated measures ANOVA. To test for potential differences in memory accuracy between sessions in the localizer task, a paired sampled t-test was computed. The statistical significance thresholds for all behavioral analyses were set at p < .05. Spearman correlation was used to assess the relationship between memory retention and reactivation strength. To control for mean decoding levels from the localizer tasks (averaged across the significant cluster), a partial Spearman correlation was used. SPSS (IBM Corp., Version 26) and Matlab was used for behavioral data analyses.

FieldTrip’s cluster permutation test90 was used to deal with the multiple comparisons problem for all classification analyses. A dependent-samples t-test was used at the sample level to identify clusters of contiguous time points across participants and values were thresholded at p = 0.05. Maxsum (sum of all t values in cluster) served as cluster statistic and Monte Carlo simulations were used to calculate the cluster p value (alpha = 0.05, two-tailed) under the permutation distribution. Analyses were performed at the group level. The input data were either classification values across time (Fig. 1c) or time x time classification values (Fig. 2b). In all cases a two-sided cluster permutation test with 1000 randomizations was used to contrast classification accuracy against chance performance.

Non-uniformity of the preferred phase with regard to SO-spindle coupling was assessed using the Rayleigh test (CircStat toolbox). The nonlinear relationship between SO-spindle coupling and reactivation strength was determined with a circular-linear correlation as implemented in the CircStat toolbox. A partial circular-linear correlation modified from the CircStat toolbox was used to control for the mean decoding levels from the localizer task. In all cases the statistical significance thresholds were set at p < 0.05.

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.