Memory for spatio-temporal contextual details during the retrieval of naturalistic episodes

Episodic memory entails the storage of events together with their spatio-temporal context and retrieval comprises the subjective experience of a link between the person who remembers and the episode itself. We used an encoding procedure with mobile-phones to generate experimentally-controlled episodes in the real world: object-images were sent to the participants' phone, with encoding durations up to 3 weeks. In other groups of participants, the same objects were encoded during the exploration of a virtual town (45 min) or using a standard laboratory paradigm, with pairs of object/place-images presented in a sequence of unrelated trials (15 min). At retrieval, we tested subjective memory for the objects (remember/familiar) and memory for the context (place and time). We found that accurate and confident context-memory increased the likelihood of “remember” responses, in all encoding contexts. We also tested the participants' ability to judge the temporal-order of the encoded episodes. Using a model of temporal similarity, we demonstrate scale-invariant properties of order-retrieval, but also highlight the contribution of non-chronological factors. We conclude that the mechanisms governing episodic memory retrieval can operate across a wide range of spatio-temporal contexts and that the multi-dimensional nature of the episodic traces contributes to the subjective experience of retrieval.

the ownership status of the encoded item 17,18 or by asking the participants to perform simple actions during encoding ("enactment") [19][20][21] . These manipulations have been found to enhance the subjective relevance of the encoded episode and to strengthen memory performance. Nonetheless, they only weakly approximate the personal experience of acting in the real world.
Virtual reality (VR) provides a means of investigating episodic memory in conditions that entail active behavior within a coherent spatio-temporal continuum and can result in high levels of personal engagement. Episodic memory studies in VR typically involve a first phase in which the participants encode a series of items/events in the environment, followed by a memory retrieval phase testing for some aspects of the memorized events. Overall, encoding in VR has been associated with an increase in memory performance [22][23][24]; see also 25 for a review.
Besides overall performance, VR protocols have made it possible to address the more specific issue of how the characteristics of the encoding experience affect the quality of the subsequent retrieval. The latter includes, on the one hand, the availability of information about the encoding context (e.g. the "where and when" of a given episode) and, on the other hand, the subjective experience of recollection. In the framework of dual-process models of episodic memory 26,27, it has been suggested that the binding of the multiple elements of an episode may contribute to recollection, which would entail access to the specific spatio-temporal context associated with the memorized episode [28][29][30]. For example, Jebara et al. 31 investigated how "active vs. passive" VR navigation affected subsequent event- and context-retrieval and their memory status. The results in young adults showed better event memory and what-where-when binding in a passive-planning condition compared with an active-driving condition. These somewhat surprising results indicate that active behavior in complex environments (here, active-driving) may actually weaken memory encoding. In this study, the memory status of the episode (Remember vs. Know) was unaffected by the encoding condition (but note that the different encoding conditions did affect performance in an older group of participants). Persson et al. 32 investigated the relationship between the memory status of the events (Remember vs. Familiar) and the ability to retrieve contextual information. Memory encoding took place in a small VR environment (2 rooms, with windows facing an external courtyard) and the results showed that context details (i.e. the outdoor weather) could be reliably retrieved only when the memory status of the object was Remember. By contrast, memory for the temporal position of the different encoding events was available also when the episode was Familiar.
Using a far larger environment (48 rooms), Horner and colleagues 33 also reported some uncoupling between temporal-order performance and context-memory. Specifically, temporal-order memory was above chance and was modulated by the presence of physical boundaries (i.e., a change of room), while spatial memory (where/in-what-room a given object was seen) was instead at chance level. In sum, previous studies using VR have demonstrated that several aspects of the encoding context, such as the spatio-temporal characteristics of the environment and the cognitive demands of the task, can affect memory performance and modulate the relationship between the subjective memory status of the what-event (recollection vs. familiarity) and memory for the associated where/when-sources.
However, encoding in VR still entails substantial differences compared to any real-world situation. In particular, the real physical position of the participant is fixed in space (i.e. the participant sits in the laboratory throughout encoding), leading to a crossmodal mismatch between the visual input and body position. This may weaken the participants' subjective experience of the spatial context and reduce the binding of the episodic elements 25. Furthermore, VR protocols necessarily entail relatively short encoding periods (i.e., a few hours at most). This is likely to constrain how the temporal context is represented and, thus, how the timing of specific events is embedded within such a representation 34.
To address these limitations, we propose an innovative strategy to generate experimentally-controlled episodes in the everyday life of the participants (see also "Discussion" section, for related studies using portable cameras [35][36][37] ). Using mobile-phone technology, we tracked the geolocation of the participants in the real world for relatively long periods, up to several weeks (RW protocol, see Fig. 1b; Table 1). Based on specific spatiotemporal constraints (see Suppl. Material), we sent images of objects to the participants' phone and we recorded the real-world place and time of each of these events. We then tested the memory status of the items (what-object, Remember/Familiar/New), as well as the accuracy and confidence for the associated context (i.e. where-place and when-time), see Fig. 2a. Moreover, we employed a temporal-order judgment task to investigate the temporal representation of events encoded in such a large scale spatio-temporal context (Fig. 2b). The same retrieval tasks were also administered to two separate groups of participants, who encoded the same set of object-images either in a highly interactive large-scale virtual environment (VR protocol, Fig. 1a) or using a "standard" encoding procedure that included a series of unrelated trials each comprising the pairing of an object and a place-image on a computer screen (SL: standard laboratory, Fig. 1c).
The three protocols differed vastly in terms of the spatio-temporal scale of the encoding context. The participants travelled over distances of up to hundreds of kilometers during encoding in the real world (RW), while their position was fixed in the lab for VR and SL. On the temporal side, both encoding durations and retention intervals lasted up to several weeks in RW vs. minutes-to-hours in VR and SL (see Table 1). Because of this, our analyses sought primarily to evaluate whether specific retrieval mechanisms held across protocols, rather than seeking to interpret any difference in overall memory performance. Specifically, we carried out logistic regression analyses to predict the memory status of the what-objects (Remember/Familiar) from the accuracy and the confidence for the corresponding where/when-context; and we used a model of temporal similarity 38 to test for scale invariance in temporal-order retrieval. In particular, the temporal similarity scores account both for the encoding times and for the time intervals between encoding and retrieval (see "Methods" section), allowing us to compare the three protocols with different ranges of intervals.

Results
Explicit source retrieval. The main aim of the source retrieval task was to test how the availability of place/time source information contributes to the subjective memory status of the encoded objects (Rem vs Fam), in the three different encoding contexts. Each trial started with the presentation of the picture of an object. The task of the participant was to choose one of 3 possible responses: (a) I have seen this object and I have some memory of the place/time when this happened; (b) The object is familiar but cannot remember when/where I saw it; (c) I have not seen the object. We label these three response-types as "Remembered" (Rem), "Familiar" (Fam) and "New" (New). If the object was seen during encoding and the participant responded Rem or Fam, the trial went on testing for when/where source-memory and confidence (see Fig. 2a). First, we report the overall object- and source-memory performance. A between-groups ANOVA tested the effect of the encoding context on the participants' memory for old/seen objects, irrespective of Rem/Fam responses (see Fig. 3a). This revealed a significant effect of experiment (F(2,57) = 7.23; p = 0.002; see also Table S1). Subsequent post-hoc tests (Tukey HSD) revealed that the accuracy in RW was significantly lower than in SL (p = 0.001).

Figure 1. (a) The main task of the participants was to collect specific items in virtual shops. At unpredictable intervals, pictures of objects appeared in the bottom part of the screen and the participants performed a "like/dislike" judgment (see inset, top-left). The main panel shows the path of one participant (yellow dotted-line) and the locations where the memory probes were presented (stars, with the different colors coding whether the object was subsequently retrieved as Remember, Familiar or wrongly categorized as a New object). (b) The real-world protocol (RW) entailed the encoding of the memory probes via a dedicated mobile-phone system, over a period of up to several weeks (see also Table 1). The participants received the object-images on the mobile-phone and responded with a "like/dislike" judgment. The position of the participant was geolocalised via the GPS functionality of the mobile-phone. The panel presents the data of one participant, who traveled in different cities and countries during the encoding phase. The bottom part of the panel shows a close-up view of the participant's path in the area of Lyon during a 5-day period (see z-axis). (c) Illustration of the Standard Laboratory protocol (SL) that comprised a sequence of unrelated trials presented on the computer screen. Half of the trials included arbitrarily paired object- and place-images ("Trial with object-event"). The other half of the trials included only the place-images ("Trial without any object-event"). When the trial included the object-image, the participant performed the "like/dislike" judgment. The place-images showed views of the virtual town (cf. a).

Table 1. Main spatio-temporal characteristics of the three encoding contexts. Encoding: time between the encoding of the first and the last object. Retention: time between the encoding of the last object and the retrieval of the first object (explicit source retrieval task). Times are in the format "DAYS:HOURS:MINUTES:SECONDS". Path length: The participants' total movement during the encoding phase. For the VR protocol the values correspond to "virtual" kilometers in the virtual town. For the RW protocol, the path length corresponds to kilometers in the real world. The SL protocol did not involve any real or virtual movement of the participant. VR virtual reality, RW real world, SL standard laboratory, n.a. not applicable.

Figure 2. Retrieval tasks. (a) Illustration of the phases of the explicit source retrieval task. The task began with the presentation of the object-probe and the participant reported whether they had seen the object during the encoding phase and whether they could remember the place/time of that event. We label the three possible responses as Remember, Familiar or New. If the object was seen at encoding and correctly recognized as old (i.e. Rem or Fam response), the trial continued with a sequence of questions concerning the time/place-context. The place-test comprised a 2-alternative forced-choice discrimination task involving the presentation of an image associated with the place where the object was presented during encoding, plus a foil place-image. For the VR and SL protocols the place-images corresponded to the participant's view at encoding, while for the RW protocol the images were obtained from Google-images based on GPS coordinates and phone orientation. A confidence question followed the place-image discrimination ("how sure are you? very much/little"). The time-test also entailed a 2-alternative forced-choice, now comprising two time windows. One of the two windows included the moment when the object was presented. A confidence question followed the time-discrimination. The presentation order of the place and time source-tests was randomized across trials. (b) Example trials of the temporal-order judgment task. All trials included the presentation of two images side-by-side and the task of the participant was to report which image referred to the event that happened earlier during the encoding. In two separate blocks, the images comprised either the memorized objects (TOobj), or images of places (TOloc). The place-images were either associated with an object-event during encoding (ev-TOloc) or referred to seen/visited places where no object-event took place (noe-TOloc). The ev-TOloc and noe-TOloc trials were randomized within the TOloc block. The task's instructions were displayed in French.

www.nature.com/scientificreports/
We then tested whether the encoding context affected the subjective memory status of the correctly recognized seen objects. The ratios of Rem/(Rem + Fam) correct responses were submitted to a between-groups ANOVA. The ANOVA did not reveal any significant difference between the three experiments (F(2,57) = 0.84; p = 0.438; see Fig. 3b).

Scientific Reports
Next, we tested the effect of the encoding context on memory for the when/where sources, as a function of the confidence of the source discrimination. The discrimination accuracy data were submitted to a mixed-ANOVA including 3 factors: "source" (place/time: within-subject), "confidence" (high/low: within-subject) and "experiment" (VR/RW/SL: between-groups). The ANOVA revealed a main effect of confidence (F(1,56) = 68.02; p < 0.001), a main effect of experiment (F(2,56) = 5.66; p = 0.006), a significant interaction between confidence and experiment (F(2,56) = 3.29; p = 0.044), as well as a significant 3-way interaction (F(2,56) = 5.20; p = 0.008), see Fig. 3c. The participants could reliably retrieve the place and time sources when they reported being confident about their response: the accuracies for all the high-confidence conditions were significantly above chance-level (all p < 0.05; see also Table S1). By contrast, the only low-confidence condition leading to performance above chance was the time-test in SL (cf. last bar in the plot of Fig. 3c), contributing to the observed 3-way interaction.
To address our main question about the contribution of source memory to the subjective memory status of the objects, mixed-effects logistic regressions sought to predict the probability of Rem responses (vs. Fam, considering only correctly recognized old/seen trials) based on the accuracy and the confidence of the source discrimination. The regression models included 4 predictors coding for place/time accuracy (irrespective of confidence) and for correct-source discrimination with high-confidence (vs. all the other combinations of accuracy and confidence). The results revealed a significant contribution of source-memory to the object memory status, but only when the sources were discriminated correctly and with high-confidence. Both confident-place and confident-time memory led to an increase in the probability of responding Rem (vs Fam) and this held for all three encoding contexts (see Fig. 3d). For confident-place the significance levels were: < 0.001 (VR), < 0.001 (RW), 0.010 (SL); and for confident-time they were: < 0.001 (VR), 0.022 (RW), 0.016 (SL). None of the predictors coding for accuracy, irrespective of confidence, were significant (all p's > 0.2). An additional set of logistic regressions tested whether trials including both confident-place and confident-time judgments led to any further increase in the probability of Rem responses, but this was not significant in any of the three contexts (all p values > 0.05).
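As an illustration of the predictor coding described above, the following minimal Python sketch builds the four source-memory predictors (plus an intercept) and fits a plain logistic regression by Newton-Raphson. It is a simplification under stated assumptions: it uses fixed effects only, whereas the reported analyses were mixed-effects models, and the function names `code_predictors` and `fit_logistic` as well as the toy counts are hypothetical and are not taken from the study.

```python
import numpy as np

def code_predictors(place_correct, place_high_conf, time_correct, time_high_conf):
    """Return a design matrix with an intercept and 4 source-memory predictors.

    All inputs are per-trial 0/1 indicators (hypothetical coding scheme).
    """
    pc = np.asarray(place_correct, dtype=float)
    ph = np.asarray(place_high_conf, dtype=float)
    tc = np.asarray(time_correct, dtype=float)
    th = np.asarray(time_high_conf, dtype=float)
    return np.column_stack([
        np.ones_like(pc),  # intercept
        pc,                # place accuracy, irrespective of confidence
        tc,                # time accuracy, irrespective of confidence
        pc * ph,           # place correct AND high confidence
        tc * th,           # time correct AND high confidence
    ])

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression (fixed effects only).

    y codes the response as 1 = Rem, 0 = Fam. Newton-Raphson updates are
    used; no random effects are modelled, unlike in the reported analyses.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted P(Rem)
        w = p * (1.0 - p)                      # Bernoulli variances
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])          # observed information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy check with a single binary predictor (made-up counts, not study data):
# 2 of 4 Rem responses without the predictor, 3 of 4 with it.
x_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_toy = np.array([1, 1, 0, 0, 1, 1, 1, 0])
beta_toy = fit_logistic(np.column_stack([np.ones(8), x_toy]), y_toy)
```

In the toy case the fitted slope equals the log odds ratio between the two groups of trials, which is how a positive coefficient for a "confident-source" predictor should be read: confident and correct source discrimination raises the odds of a Rem response relative to Fam.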
Temporal-order judgments. The main aim of the second retrieval task was to investigate the representation of temporal distances in our 3 protocols, which entailed vastly different temporal scales (see Table 1). In two separate blocks, the participants judged the temporal-order of two events, based either on pairs of object-images (TOobj task) or pairs of place-images (TOloc task). Further, in TOloc, the two images showed either places where an object-event had taken place during encoding (ev-TOloc); or two places that were also visited/seen, but without any object-event (noe-TOloc), see Fig. 2b. Across trials, the temporal distance at encoding between the two to-be-judged events was varied parametrically. Temporal distances were transformed into temporal similarity scores (TS) allowing us to test for invariance in temporal-order retrieval across the three encoding protocols.
First, we report the overall accuracy of the temporal-order judgments. A between-groups ANOVA tested whether the encoding context (VR, RW, SL) modulated the overall accuracy of the temporal-order retrieval using the object-cues. This was not significant (F(2,57) = 0.02, p > 0.9; see Fig. 4a, left panel), suggesting that information about the presentation order of the objects was available irrespective of context. Next, we asked whether the participants could retrieve the order of the visited/seen places and whether having received, and responded to, an object-event in these places affected the temporal-order performance. A mixed-ANOVA with the factors "experiment" (VR, RW, SL) and place "with/without" events (ev-TOloc, noe-TOloc) revealed only a main effect of experiment (F(2,57) = 10.13; p < 0.001). Post hoc tests (Tukey HSD) revealed that VR accuracy was higher than both RW (p = 0.009) and SL (p < 0.001), while RW and SL did not differ significantly from each other (p > 0.3), see Fig. 4a, panel on the right.
Our main analyses examined the influence of temporal similarity on the retrieval reaction times (RTs). We used the SIMPLE model 38 to compute the similarity between the two events to be judged on each trial and correlated this with the corresponding RT on a trial-by-trial basis for each subject, condition (TOobj, ev-TOloc, noe-TOloc) and experiment (VR, RW, SL). Figure 4b illustrates the influence of temporal similarity on RTs for the TOobj condition. For statistical inference, the regression slopes were estimated at the level of the individual participant using trial-specific similarity values and RTs. The resulting slopes (beta values) were submitted to a mixed-ANOVA with the factors "experiment" and "cue-condition", see Fig. 4c. This highlighted a significant main effect of "condition" (F(2,114) = 7.00; p = 0.001), with steeper slopes in TOobj compared with the two TOloc conditions. The ANOVA did not reveal any significant main effect or interaction related to the factor "experiment" (p's > 0.3), suggesting that analogous order-retrieval processes operated across temporal scales ranging from seconds (SL) to weeks (RW). We sought to confirm this scale invariance by submitting the average similarity slopes first to a one-sample t-test including all three experiments (t(59) = 5.84, p < 0.001), and then to a Bayesian analysis to evaluate the null-hypothesis of no difference between the experiments (see also 5). The results showed "extreme evidence" in favor of the null-model (Bayes factor = 0.009, with JZS prior), supporting scale invariance.
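The two-step logic of the slope analysis (a per-participant regression of RT on temporal similarity, followed by a group-level test of the slopes against zero) can be sketched as below. This is only an illustration: ordinary least squares is assumed for the per-participant fit, the similarity values are taken as already computed, and the helper names `similarity_slope` and `one_sample_t` are hypothetical.

```python
import numpy as np

def similarity_slope(similarity, rt):
    """Least-squares slope of RT on temporal similarity.

    Intended to be called once per participant and cue-condition,
    with one (similarity, RT) pair per trial.
    """
    x = np.asarray(similarity, dtype=float)
    y = np.asarray(rt, dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)  # degree-1 fit: slope, intercept
    return slope

def one_sample_t(values, popmean=0.0):
    """One-sample t statistic and degrees of freedom for a set of slopes."""
    v = np.asarray(values, dtype=float)
    n = v.size
    t = (v.mean() - popmean) / (v.std(ddof=1) / np.sqrt(n))
    return t, n - 1
```

A positive group-level t statistic on the slopes corresponds to RTs that increase with temporal similarity, i.e. slower order judgments for pairs of events that are harder to tell apart in time.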

Discussion
We investigated episodic memory retrieval following encoding in three contexts that comprised vastly different spatio-temporal scales and levels of naturalism (see Fig. 1; Table 1). In the framework of the what-where-when dimensions of episodic memory, we tested participants' memory for the encoded what-item (pictures of objects) and for the corresponding where-place and when-time context (Fig. 2a), as well as the participants' ability to retrieve temporal order information based on what-object or where-place cues (Fig. 2b). The results revealed a set of general principles linking the different episodic dimensions, irrespective of encoding context. These included the proportion of objects retrieved with a Remember memory status (vs. Familiar, Fig. 3b), the contribution of source memory and confidence to this (cf. logistic regressions, Fig. 3d), the role of the temporal similarity between two episodes when retrieving their temporal order, as well as the impact of the cueing dimension on the latter (what-object vs. where-places, see Fig. 4c). By using an innovative real-world encoding protocol, these findings contribute to the understanding of binding processes in episodic memory and support proposals of temporal scale invariance in long-term memory.

A first objective of the current study was to assess the impact of the encoding context on item-memory, source-memory, and their relationship. In the framework of dual-process models of episodic memory it has been suggested that the availability of source information should contribute to recollection as opposed to familiarity 39. Studies employing arbitrary associations between the item and the sources (e.g. using the item's color, the stimulus' position on the screen or in a list, as the context/source) have yielded mixed results (e.g. see 40, reporting chance-level source performance for familiar items, vs. a high proportion of correct source judgments for "know" responses in 41).
Possible explanations for these differences relate to factors such as the emotional content of the stimuli, the availability of prior knowledge during encoding, the retrieval testing procedure, and the degree of similarity of the sources associated with the different items [42][43][44][45][46]. Most relevant here, the strengths of the links between the different dimensions of the episode during encoding and the level of the participant's active engagement are also thought to play a role 4,19,47,48.

Figure 4. Results of temporal-order judgment task. (a) Mean accuracy (%, ± SEM) of the temporal-order discrimination task using object-images as retrieval cues (TOobj). (b) Mean accuracy (%, ± SEM) of the temporal-order discrimination task using place-images as retrieval cues (TOloc). The first 3 bars show the accuracy for places where an object-event took place (ev-TOloc), while the 3 bars on the right show the accuracy for trials comprising place-images without any object-event (noe-TOloc). (c) For illustration purposes only, the TOobj trial-specific data were averaged in 15 bins, allowing us to display the regression lines for the three protocols (VR, RW, SL). The regression lines show that the RTs increased with increasing temporal similarity, in all 3 experiments. (d) Average slopes (± SEM) of the regressions between temporal similarity and RTs, now calculated at the level of the individual participant and displayed separately for the different types of retrieval cues (TOobj, ev-TOloc, noe-TOloc) and the three encoding protocols (VR, RW, SL). The data analysis revealed a general effect of temporal similarity across experiments, supporting scale invariance of order retrieval, but also a significant effect of cue-type indicating that factors other than chronological distance also contributed to temporal order retrieval, see "Discussion" section.
Here we assessed the relationship between recollection and source memory, following encoding in conditions that either entailed active behavior within rich and coherent spatio-temporal contexts (RW and VR), or merely involved sequences of unrelated trials, each comprising arbitrary pairs of object- and place-images (SL). Overall, we found that item- and source-memory were slightly better following encoding in SL compared to RW and VR (see Fig. 3a,c). At first, this may appear surprising given previous evidence that encoding in active and immersive conditions can strengthen memory 22,23. However, it should be noted that our participants engaged in many concurrent activities during RW and VR encoding, but not in SL, which may explain the overall lower memory performance (see also 4). Unlike previous memory studies that made use of VR (e.g. 4,32,33), in the current study the participants had to perform a series of complex tasks throughout the encoding phase. The participants interacted with avatars, received instructions, stored and updated information about the items they had to buy and actively searched for the relevant shops in a large virtual town. The level of cognitive load and the complexity of the task were higher than in previous studies, in which the participants typically focused attentional/cognitive resources on elements that were potentially relevant for the subsequent memory tests (e.g. exploring the VR, when later on asked to reproduce maps 49 or to make judgments about crossing points 50). The possible impact of concomitant activities was further exacerbated in the RW protocol. Further, when participants received the object-images, they may have focused on the phone screen and paid little attention to the surrounding environment.
On the other hand, in RW, participants could perceive the environment for longer periods (several minutes before/after responding to the phone event), while in SL participants had only a few seconds to process the place-images. All these factors make it difficult to interpret the differences in the overall memory performance for objects and sources (see Fig. 3a,c). This is why our main analyses focused on the relationship between source-memory and the subjective status of the memorized objects. The results showed that, in spite of the lowered overall performance, the contribution of incidentally encoded contextual information (places and times) was maintained across the three encoding contexts. The logistic-regression analyses showed that the likelihood of reporting the what-item as Remember, as opposed to Familiar, increased when the participants were able to discriminate with a high-level of confidence the associated place-or time-source (Fig. 3d). These results are in line with the notion that the availability of source information supports the subjective sense of recollection, as proposed by Tulving 15 and postulated by dual-process models of episodic memory 39 . Nonetheless, it should also be noted that additional logistic analyses indicated that confident recognition of both time and place sources did not lead to any additional increase of Remember responses (see also 40 ). Thus, while both spatial and temporal source signals contributed to subjective recollection, the underlying memory trace of the episode did not necessarily include an integrated representation of the what-where-when dimensions 31,48 : that is, in the framework of the current task, confident memory of one source-dimension was sufficient to explain the subjective sense of recollection.
Most importantly, we show that this relationship between subjective recollection and source-memory held across three vastly different encoding contexts. Extending the investigation of episodic memory to real-world situations makes it possible to embrace two characterizing features of episodic memory: namely, the rich and multidimensional nature of the episodic events, and the strong association between the events and the person who experiences them 14,16. Several previous studies addressed this by making use of wearable cameras (e.g. [35][36][37]51). These protocols entail recording images of the everyday life of the participant and then using these scenes for subsequent memory assessments. Studies using wearable cameras revealed that the presentation of participant-specific scene images can strengthen episodic retrieval [35][36][37]51 (see also 52,53 for reviews, including the clinical relevance of these devices). However, other studies indicated that participants' memories of their own real-world experiences can be relatively poor 54. These studies often made use of simple old/new judgments 54 or more specific remember/know tasks 35 but did not directly address the issue of the contribution of source memory to subjective recollection (but see 55,56, for neuroimaging evidence of the role of the hippocampus in combining spatial and temporal information about everyday life events). Using a real-world procedure, Mazurek et al. 57 directly tested the binding of what-where-when elements and the relationship between this and the subjective experience of retrieval (remember vs know). The results revealed no significant difference in correct what-where-when combinations between remember and know episodes. These results contrast with the findings of the current RW protocol but it should be noted that the number of objects/events, the task at encoding and the spatio-temporal characteristics of the encoding context substantially differed between the two studies.
Indeed, the main feature of our RW mobile-protocol was that it enabled us to generate events over long temporal windows (up to 3 weeks, vs. hours in 57 ) and widespread spatial locations (kilometers, vs. meters in 57 ).
The possibility of extending the duration of the encoding phase enabled us to address the hypothesis of scale invariance in long-term memory 58,59 . In the framework of the SIMPLE model 38 , we have previously shown that the relationship between temporal similarity and temporal-order reaction times holds across different encoding durations and retention intervals, in the seconds-to-hours range 5 . In the current study, we extended this approach to encoding periods lasting up to several weeks (RW protocol) and including retrieval cues based either on memorized what-items or contextual where-places. We found that the relationship between similarity and retrieval times holds across the three encoding contexts, consistent with the hypothesis that similar mechanisms contribute to order-retrieval over different scales (Fig. 4c). In the SIMPLE model, this would entail local interference arising when the two to-be-retrieved items are close within the "psychological space" 38 , see also below.
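The scale-invariance argument can be made concrete with a small sketch of a SIMPLE-style temporal similarity. In SIMPLE, items are placed on a logarithmically transformed axis of temporal distance from the moment of retrieval, and similarity decays exponentially with the separation on that axis. The exact parameterisation used for the analyses reported here is given in the "Methods" section and is not reproduced in this sketch: the function name `temporal_similarity`, the default `c = 1.0` and the example times are assumptions made for illustration only.

```python
import math

def temporal_similarity(t_enc_i, t_enc_j, t_retrieval, c=1.0):
    """SIMPLE-style similarity between two events on a log-time axis.

    t_enc_i, t_enc_j: encoding times of the two events.
    t_retrieval: time of the retrieval test (same unit, later than both).
    c: free scaling parameter of the model (assumed value, not from the study).
    """
    d_i = t_retrieval - t_enc_i   # temporal distance of event i from retrieval
    d_j = t_retrieval - t_enc_j   # temporal distance of event j from retrieval
    if d_i <= 0 or d_j <= 0:
        raise ValueError("retrieval must occur after both encoding events")
    return math.exp(-c * abs(math.log(d_i) - math.log(d_j)))

# Two events encoded 30 time units apart and tested 60 units after the first one.
sim_short = temporal_similarity(0.0, 30.0, 60.0)

# The same design stretched by an arbitrary factor (e.g. seconds to a much longer scale).
k = 20160.0
sim_long = temporal_similarity(0.0, 30.0 * k, 60.0 * k)
```

Because the similarity depends only on the ratio of the two temporal distances, stretching every time by the same factor leaves it unchanged (`sim_short` and `sim_long` coincide up to floating-point error). This ratio property is what allows similarity values to be compared across protocols whose encoding and retention intervals differ by orders of magnitude.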
Our results also showed that the relationship between temporal similarity and retrieval times changed as a function of the cue employed to retrieve the temporal order of the events. The effect of temporal similarity was significantly larger when the participants judged the order of two objects than when they judged the order of two places. Notably, this was true also when we specifically tested places where the objects had been encoded (i.e. the ev-TOloc trials) and that, thus, had identical temporal similarity values as the corresponding object-trials. A straightforward explanation for this difference may be that the participants did not retain temporal information about places as well as they did for the objects. However, the effect of retrieval-cue was evident also for the VR protocol, which overall led to analogous order-retrieval performance irrespective of cue dimension (see Fig. 4a, light-gray bars). The finding that the type of retrieval-cue modulates the relationship between temporal similarity and retrieval times suggests that factors other than chronological information contributed to the order judgment. One class of proposals that considers the interplay between chronological and non-chronological signals during temporal retrieval is temporal context models 60. These have been found to account for temporal-order performance following encoding in complex VR settings 33. Temporal context models can also explain scale invariance, but they substantially differ from SIMPLE because they include context-specific temporal drifts and decays. Our current study was not designed to adjudicate between these models. However, the finding that SIMPLE accounted for the retrieval times across three contexts (RW, VR, SL) highlights the relevance of logarithmic scaling besides any context-specific effect. Moreover, our three encoding conditions substantially differed in terms of their spatio-temporal structure, i.e. a coherent continuum with different levels of complexity in RW and VR, vs. a set of unrelated trials in SL. These differences should impact processes relevant for context-based retrieval (e.g. event segmentation 10), while our results showed analogous effects of temporal similarity in the three encoding protocols, see Fig. 4c.
Instead, we suggest that the impact of cue-type here may reflect the existence of a multi-dimensional space storing the memorized episodes. In the SIMPLE model, the "psychological space" determining the competition between neighboring events can comprise not only the chronological dimension, but also additional dimensions. In the initial formulation that considered the retrieval of simple stimuli, these additional dimensions would, for example, encode whether two word-items belonged to the same vs. different lists at encoding, or more complex hierarchical effects related to semantic content 38 . In the current study, a key additional dimension concerned the place-related information that was associated with each event. Accordingly, events' similarity in the psychological space, and any resulting competition during retrieval, would entail not only chronological distances, but also interference caused by place information. In the framework of the SIMPLE model, Brown and colleagues 38 proposed that the allocation of attentional resources during recall would bias the contribution of the different dimensions of the psychological space, "stretching out the relevant dimension, while simultaneously squashing it along another". Here, this would correspond to the use of the different retrieval cues that in the TOloc condition (place-cues) would diminish any interference arising from the chronological distance and reduce the correlation between the temporal similarity values and the retrieval times (cf. Fig. 4c). Future studies may attempt to quantify the events' distance along the place-dimension, although this may be particularly difficult with naturalistic stimuli. Place-related information includes the real-world distance between events encoded in RW (e.g. see 55 ), but also other factors such as the visual similarity between the place-images presented during retrieval 61,62 , as well as more complex parameters arising from the reconstructive mechanisms thought to govern the retrieval of complex, multi-dimensional naturalistic events 63,64 .
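The attentional-weighting idea can be made concrete with a small sketch. It assumes a two-dimensional psychological space (log-transformed retention delay plus a hypothetical one-dimensional place coordinate) and a weight `w_time` between 0 and 1 that trades the two dimensions off, loosely following the weighted-distance idea in SIMPLE. The place coordinate, the weights, the constant `c` and the example events are illustrative assumptions, not quantities estimated in this study.

```python
import math

def similarity_2d(ev_i, ev_j, w_time, c=1.0):
    """Similarity from a weighted city-block distance over (log-time, place).

    Each event is (retention_delay, place_coordinate); both are hypothetical.
    """
    (t_i, p_i), (t_j, p_j) = ev_i, ev_j
    d_time = abs(math.log(t_i) - math.log(t_j))
    d_place = abs(p_i - p_j)
    return math.exp(-c * (w_time * d_time + (1.0 - w_time) * d_place))

def spread(w_time):
    """Difference in similarity between a temporally close and a distant pair."""
    ref = (100.0, 0.3)    # delay in minutes, place coordinate
    near = (110.0, 0.3)   # 10 min from ref, same place
    far = (200.0, 0.3)    # 100 min from ref, same place
    return similarity_2d(ref, near, w_time) - similarity_2d(ref, far, w_time)

# Assumed weights: attention mostly on time (object cues) vs. mostly on place.
spread_object_cue = spread(w_time=0.9)
spread_place_cue = spread(w_time=0.3)
```

With these assumed values, "squashing" the temporal dimension (lower `w_time`) compresses the range of similarity values produced by a given range of chronological distances, which is one way a weaker coupling between temporal similarity and retrieval times could arise.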
In conclusion, we investigated the contribution of contextual information to episodic memory, introducing a novel protocol based on mobile-phone technology that allowed us to study this relationship in a vast spatiotemporal, real-world context (see also 65 ). The results highlighted the contribution of source memory to the subjective memory status of the tested item and demonstrated the impact of temporal similarity on temporal-order retrieval irrespective of encoding context. These findings corroborate the link between the two characterizing aspects of episodic memory, namely the multi-dimensional nature of the memory trace and the subjective experience of retrieval, and they support proposals of temporal scale invariance in long-term memory. The study of cognitive functions outside the laboratory comes with many limitations that relate to the impossibility of controlling for a large number of factors likely to affect the processes under investigation. Nonetheless, naturalistic approaches are fundamental to bridge the gap between basic scientific knowledge and any application in the real world, including clinical practice, as well as to put to the test cognitive theories typically constructed on the basis of simple and stereotyped paradigms.

Methods
Experimental procedures. Each of the three experiments comprised an encoding phase and a retrieval phase (see Figs. 1, 2). The retrieval tasks were analogous in the three experiments. The encoding phase comprised the memorization of 60 objects that, across the 3 experiments, were presented in vastly different spatiotemporal contexts, see Fig. 1 and Table 1. Pictures of the objects were taken from the "Massive Memory Object Categories" database 66 and included common objects of different categories (e.g. tools, animals, food).
Encoding protocols. In experiment 1 (Virtual Reality, VR), the pictures of the 60 objects were presented over a period of 45 min, while the participants actively explored a large-scale virtual town (see Fig. 1a). The virtual environment was viewed on a standard PC screen. The participants used the PC-keyboard to navigate within the town and to control various actions associated with a set of "missions". The missions entailed collecting specific objects (e.g. T-shirts, medicines, shoes, fish, cigarettes, etc.) in specific virtual shops (e.g. clothes-shop, pharmacy, fishmonger, tobacconist, etc.). At unpredictable times, the participant was presented with the objects for the main memory task. The picture of the memory-object was shown in the lower part of the visual field and the participants used the keyboard to indicate whether they "liked/disliked" the object. The like/dislike judgment was included to make sure that the participant paid attention to the memory-objects. The retrieval phase took place the day after the encoding phase.
In experiment 2 (Real-World, RW), the same 60 objects were presented over a period of 3-17 days during the everyday life of the participants, via a dedicated mobile-phone application (Fig. 1b). The system acquires contextual data based on mobile-phone functionalities, including real-time GPS coordinates, current speed and motion direction, and can use this real-time information to decide what to send to the participant and when (see Suppl. Material, for the specific constraints used to decide when to send the object-images). During encoding, the only task of the participant was to look at the picture of the object sent on the mobile-phone and to respond whether they "liked/disliked" the object, via the mobile-phone interface. The mobile application signaled the object-events with a sound, plus a vibration notice. The retrieval phase took place at variable intervals after the encoding of the last object (hours to days, see Table 1).
In experiment 3 (Standard Laboratory, SL), the encoding phase followed a standard laboratory procedure, including the pairing of each object with the picture of a place on a computer screen in the laboratory. The encoding phase comprised 120 trials: 60 trials with place-images paired with the memory-objects and 60 trials comprising place-images only (see Fig. 1c). The place-images were obtained from the VR experiment and the task of the participant was again to indicate whether they "liked/disliked" the object. The SL encoding took place over a short period of 15 min and the retrieval started approx. 15 min after the end of the encoding phase. Further details about the three encoding procedures are provided in the Supplementary Materials.

Retrieval tasks. The retrieval phase was identical in the three experiments and comprised two main tasks: explicit source retrieval and temporal-order judgment. The aim of the source retrieval task was to assess the participants' memory for the 60 objects (what) and for the associated spatial (where) and temporal (when) context. Each trial included multiple phases, first assessing the subjective memory status of the object (Remembered object with where/when-context vs. Familiar object only) and then explicitly testing memory for the where/when-sources using a two-alternative forced-choice procedure, plus confidence judgments (see Fig. 2a). The place-discrimination comprised the presentation of two place-images shown side-by-side. For the VR and SL experiments, one of the two images (the target) corresponded to the exact scene that was seen by the participant at the moment they encoded the object. For the RW experiment, the images were obtained from Google-image and depicted the real-world location where the participant had received the object on their mobile-phone. The time-discrimination entailed the presentation of two time-windows (written text), one of which included the time when the memory-object was presented.
The second retrieval task comprised a temporal-order judgment. Each trial entailed the side-by-side presentation of two images (Fig. 2b). In different conditions, these retrieval cues included either what-objects or where-places. Specifically, the two images depicted either: (1) two objects presented during encoding (TOobj task); (2) two images of places, both associated with an object-presentation event (ev-TOloc); or (3) two images of places that were also visited/seen during the encoding phase, but that were not associated with any object-presentation event (noe-TOloc). The task of the participant was to report which of the two images (objects or places) referred to the time-point that had happened earlier during the encoding. The main data analysis fitted the retrieval reaction times with a temporal similarity model (SIMPLE 38,67 ) that allowed us to assess scale invariance of the temporal-order retrieval taking into account both the temporal distance between each pair of events during encoding and the retention times between encoding and retrieval (see Table 2). Further details about the retrieval tasks are provided in the Supplementary Materials.

Table 2. Temporal-order discrimination task. Temporal characteristics of the to-be-judged pairs of events during the temporal-order task and the average behavioral performance for the different conditions (TOobj, ev-TOloc, noe-TOloc) and encoding contexts (VR, RW, SL). Temporal distance: Time between the occurrence of the two events during encoding (range, in minutes); Temporal similarity: Similarity values computed according to the SIMPLE model (range, in similarity units [0-1], see "Methods" section). RT: Mean reaction times during retrieval (in ms, with standard error); Accuracy: Mean accuracy during retrieval (in %, with standard error).

Data analysis. Explicit source retrieval. The aim of the explicit source-retrieval test was to assess the influence of source-memory on the subjective memory status of the object (Rem vs. Fam). The data analysis was carried out using mixed-effect binomial logistic regressions, implemented in Matlab R2017a (Mathworks, Inc.). Separately for the 3 experiments, the logistic models considered all single trials when an old/seen-object was correctly recognized either as Rem or Fam. The dependent variable was the Rem or Fam memory status of the object. The four predictors of interest comprised the place and time retrieval accuracies (irrespective of confidence) and the interaction between accuracy and confidence for the two sources. The factor "subject" was included as a random effect in the models.

An additional set of models tested for possible interactions between the two sources. Given the results of the main models, which showed that only high-confidence correct source-responses predicted Rem, the additional logistic model directly coded high-confidence correct responses (= 1, vs. all other accuracy/confidence combinations = 0) for the two sources, plus the interaction between the two sources. The interaction term should highlight whether the ability to retrieve both place and time sources (correct and with high confidence) leads to any further increase in the likelihood of responding Rem to the initial object-retrieval question.
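The predictor coding for this additional model can be sketched as follows. This is only an illustration of the 0/1 coding and of the interaction term as a product of the two source predictors. The function name, argument names and example trials are hypothetical, and the mixed-effect logistic fit itself (with "subject" as a random effect) is not shown.

```python
def code_trial(place_correct, place_confident, time_correct, time_confident):
    """Return (place_hc, time_hc, interaction) predictors for one trial.

    A source is coded 1 only when it is both correct and high-confidence;
    every other accuracy/confidence combination is coded 0.
    """
    place_hc = int(place_correct and place_confident)
    time_hc = int(time_correct and time_confident)
    return place_hc, time_hc, place_hc * time_hc

# Hypothetical trials: (place_correct, place_confident, time_correct, time_confident)
trials = [
    (True, True, True, True),      # both sources correct with high confidence
    (True, True, True, False),     # time correct, but with low confidence
    (False, True, True, True),     # place incorrect, despite high confidence
    (True, False, False, False),   # neither source qualifies
]
design = [code_trial(*t) for t in trials]
```

Only the first trial contributes to the interaction term, since it is the only one in which both sources are retrieved correctly and with high confidence.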
Temporal-order judgments. The aim of the temporal-order task was to investigate scale-invariant mechanisms of temporal-order retrieval using the SIMPLE model proposed by Brown and colleagues 38 . For each single trial of each of the 3 experiments, we computed a "temporal similarity" value (TS) that takes into account the temporal distance between the two probes during encoding, as well as the time between the encoding of the probes and their retrieval. For instance, two events separated by a fixed temporal distance (e.g. 10 min) will be more "similar" (high TS) if they occurred a long time before retrieval, compared to two events with the same distance but occurring closer to the time of retrieval (lower TS). The TS were computed as:

$$\mathrm{TS}_{ij} = e^{-c\,\left|\log T_i - \log T_j\right|}$$

where $T_i$ and $T_j$ are the encoding-to-retrieval retention delays for the pair of events "i" and "j", and "c" is a power constant that here was computed as the inverse of the range of log-transformed $T_i$ and $T_j$ 67 . The TSs take values between 0 and 1 irrespective of the range of temporal distances and retention delays (see Table 2, reporting the initial ranges of temporal distances and the corresponding TS ranges).
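As a minimal sketch of this computation, the example below assumes the standard SIMPLE form, TS = exp(-c · |log T_i − log T_j|), and takes c as the inverse of the range of the log-transformed delays over the set of probes considered. The function names and the example delays are illustrative and are not taken from the authors' code or data.

```python
import math

def temporal_similarity(t_i, t_j, c):
    """SIMPLE-style similarity between two retention delays (same time unit)."""
    return math.exp(-c * abs(math.log(t_i) - math.log(t_j)))

def power_constant(delays):
    """Assumed choice of c: inverse of the range of the log-transformed delays."""
    logs = [math.log(t) for t in delays]
    return 1.0 / (max(logs) - min(logs))

# Hypothetical retention delays (in minutes) for two pairs of events,
# each pair separated by the same 10 min at encoding.
far_pair = (1000.0, 1010.0)   # encoded a long time before retrieval
near_pair = (20.0, 30.0)      # encoded shortly before retrieval

c = power_constant(far_pair + near_pair)
ts_far = temporal_similarity(far_pair[0], far_pair[1], c)
ts_near = temporal_similarity(near_pair[0], near_pair[1], c)
```

Under these assumptions the pair encoded long before retrieval yields the higher TS, in line with the logarithmic compression described above: the same 10 min separation is a smaller proportion of a long retention delay than of a short one.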
Following the computation of the TS for each trial, we used robust regressions implemented in Matlab R2017a to obtain the relationship between TSs and RTs for each participant, separately for the three temporal-order conditions (TOobj, ev-TOloc and noe-TOloc), considering correct trials only. For statistical inference at the group level, the corresponding regression slopes (betas) were submitted to a 3 × 3 mixed-ANOVA with the factors: "cue-condition" (TOobj, ev-TOloc, noe-TOloc; within-subjects) and "experiment" (VR/RW/SL: between-groups). The ANOVA was carried out using SPSS (version 21, IBM).
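The per-participant slope estimation can be illustrated with a simplified robust regression. The sketch below uses iteratively reweighted least squares with bisquare weights and a MAD-based scale estimate; it is a stand-in written for illustration and does not reproduce the exact Matlab routine used in the study (for example, it applies no leverage correction). The tuning constant, the function names and the example data are assumptions.

```python
import statistics

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def robust_slope(ts, rt, tune=4.685, n_iter=50):
    """Slope of RT on TS via iteratively reweighted least squares (bisquare)."""
    w = [1.0] * len(ts)
    for _ in range(n_iter):
        a, b = weighted_line(ts, rt, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(ts, rt)]
        # Robust scale from the median absolute residual.
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break
        u = [r / (tune * scale) for r in resid]
        # Bisquare weights: large residuals are down-weighted, extreme ones dropped.
        w = [(1 - ui ** 2) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
    return b

# Hypothetical correct trials of one participant in one cue-condition:
# RT (ms) increases linearly with TS, plus one very slow outlier trial.
ts_values = [0.1 * k for k in range(1, 11)]
rt_values = [1000.0 + 500.0 * t for t in ts_values]
rt_values[4] += 2000.0

ols_slope = weighted_line(ts_values, rt_values, [1.0] * len(ts_values))[1]
rob_slope = robust_slope(ts_values, rt_values)
```

In this toy example the ordinary least-squares slope is pulled away from the underlying value by the single slow trial, whereas the robust estimate is not. One such slope per participant and cue-condition would then enter the group-level ANOVA.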