Emergence of hierarchical organization in memory for random material

Structured information is easier to remember and recall than random information. In real life, information exhibits multi-level hierarchical organization, such as clauses, sentences, episodes and narratives in language. Here we show that multi-level grouping emerges even when participants perform memory recall experiments with random sets of words. To quantitatively probe brain mechanisms involved in memory structuring, we consider an experimental protocol where participants perform ‘final free recall’ (FFR) of several random lists of words, each of which was first presented and recalled individually. We observe a hierarchy of grouping organizations in FFR; most notably, many participants sequentially recalled relatively long chunks of words from each list before recalling words from another list. Moreover, participants who exhibited the strongest organization during FFR achieved the highest levels of performance. Based on these results, we develop a hierarchical model of memory recall that is broadly compatible with our findings. Our study shows how highly controlled memory experiments with random and meaningless material, when combined with simple models, can be used to quantitatively probe the way meaningful information can be efficiently organized and processed in the brain.

each group. The recall of each item is preceded by the retrieval of a group context, which in turn is triggered by group-specific cues and control elements. The main focus of the model is induced or spontaneous chunking in single list recall, characterized by the appearance of short clusters of 3-4 consecutively presented words. Another influential model of clustering, called Context Maintenance and Retrieval (CMR), was proposed in 14 . This model generalizes the earlier Temporal Context Model 12 to include the possibility that different memory items are grouped into distinct sources (e.g. words presented aurally vs visually). The model accounts for the experimentally observed interplay between two different types of clustering: temporal clustering, based on the presentation position of different items, and source clustering.
In this contribution, we focus on the paradigm of final free recall (FFR), which can be considered a strong version of induced chunking. In this paradigm, participants recall several lists in a single daily session, at the end of which they are asked (with no prior warning) to recall the words of all the lists in an arbitrary order. FFR was studied in several previous publications (see e.g. [35][36][37][38][39]). Most of these studies concerned differences in the temporal organization of recall within lists, as assessed by the classical serial position curve 2 , between single list recall and FFR. In some studies, similar organization was reported for both cases, characterized by primacy and recency effects 35 , while other studies, involving a larger number of longer lists, reported within-list 'anti-recency' in FFR 38,39 . It was also reported that words from the lists presented towards the end of the session were more likely to be recalled (list recency) and that between-list transitions tend to occur between lists that were presented one after another (list contiguity) 36,37 .
We conjectured that FFR should exhibit strong grouping over the lists, because each list was presented and recalled individually in the same session. We also wanted to elucidate the possible effects of grouping on FFR performance. Indeed, over-the-list grouping was observed in 36 , but only the average length of within-list clusters was reported (2.6 words out of 10 words in each list) and no analysis of its effect on FFR performance was presented. Our analysis not only uncovered a highly significant overall degree of grouping in FFR, but also demonstrated its great diversity across participants. Moreover, we found a very strong positive correlation between grouping and FFR performance, thus reinforcing the crucial role of temporal organization of information in episodic memory, in a precisely quantifiable way.
To elucidate the possible mechanisms of grouping and its effect on FFR performance, we developed a highly reduced version of hierarchical recall models of 14,24 . It generalizes our previous model that successfully accounted for power-law scaling in free recall of single lists 32,40,41 and includes some of the features that are similar to 42 . To reduce the complexity of the model, we did not include any mechanisms for generating within-list temporal organization which resulted in significantly fewer free parameters than in the previous theoretical studies. We found that the model accounts well for both overall degree of grouping and its diversity, as well as the correlation between grouping and FFR performance in terms of the number of words recalled.

Results
Grouping over lists, induced by presentation protocol. The experimental dataset we analyze, obtained in the lab of Prof. Kahana at the University of Pennsylvania 43 , follows this protocol. Each participant performed 16 Immediate Free Recall (IFR) trials a day with randomly assembled non-overlapping lists of 16 words. On selected days they were subsequently asked to recall all the words presented on that day (FFR; Fig. 1a). Averaged over roughly 900 FFR sessions, participants recalled 57 words per session. This level of performance is much higher than the typical recall performance for lists of 16 × 16 = 256 words 2,41 , indicating that participants take advantage of the structural organization of presented words imposed by prior IFR trials. To verify that this is indeed the case, we quantify the level of grouping in FFR over the presented lists with a value $p_{16}$ that reflects the tendency to recall subsequent words from the same list before switching to another list (see Materials and Methods) 44 . The distribution of $p_{16}$ over the data is very wide (Fig. 1b), covering the range from 0 (random recall) to 0.9 (strong degree of grouping; see Fig. 1c-e for three prototypical examples). Plotting the FFR performance against the grouping measure $p_{16}$ revealed a striking correlation between the two (r = 0.62, p = $4 \times 10^{-97}$), with the bulk of the data well characterized by a linear dependence of performance on $p_{16}$. Interestingly, in the limit $p_{16} \to 0$, i.e. when no grouping is employed, performance approached a value of 30 words, supporting the theoretical prediction 32 . We also observe that in FFR sessions with the highest values of $p_{16}$ participants occasionally recalled single words from a list in between longer sequences from other lists (Fig. 1c; see e.g. a single word from the 15th list recalled between two groups of words from the 4th list).
We speculate that these short 'intrusions' are analogous to famous 'slips of the tongue' in natural speech 45 .
A possible interpretation of the above results is that participants perform FFR by applying a mixture of two recall strategies: one that treats all the words as one long random list, and another that operates on two levels, namely individually presented lists and words within a list. As the second strategy gains prominence, recall becomes progressively more grouped and the value of $p_{16}$ increases, accompanied by an increase in performance. In particular, the participants could develop stable representations of each list as a separate entity and 'recall' a list before recalling words from that list.
Spontaneous grouping within presented lists. The grouping over lists exhibited in Fig. 1 is induced by the experimental protocol, as lists are first presented and recalled individually in the IFR protocol. Another level of grouping, which was not induced by the protocol, was identified in FFR through the analysis of IFR data: a small proportion of participants develop chunking strategies in IFR 46,47 . These participants divide lists of 16 words into chunks of 3 or 4 consecutively presented words (e.g. words 1-4, 5-8, 9-12 and 13-16 in the case of chunks of size 4) and recall these chunks as single entities 44 . This kind of chunking is not imposed by the protocol; hence, it must emerge from active manipulation of the presented list, for example by representing chunks of words as separate items in memory. Here we asked whether the chunks observed during IFR remained in memory until the FFR trials. It is hard to infer whether chunking occurred in every single trial, hence we assumed that a chunk is recalled as a unit when all words from that chunk are recalled consecutively in IFR (not necessarily in the correct order). We therefore isolated all chunks of size 4 that were recalled during IFR trials (as described above), and considered the recall of the constituent words during FFR. For each such chunk, we computed the probability that a given number of its words was recalled.
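The chunk criterion described above (all four words of a chunk recalled consecutively in IFR, in any order) can be sketched in a few lines. This is only an illustrative sketch, not the authors' analysis code: the function names, the 0-based chunk indices and the toy recall sequences are assumptions made for the example.

```python
def intact_chunks(ifr_positions, chunk_size=4, list_length=16):
    """Return 0-based indices of chunks recalled as a unit in one IFR trial.

    ifr_positions: within-list serial positions (1..list_length) in recall order.
    A chunk counts as intact when all of its words occupy consecutive
    output positions, in any order.
    """
    intact = []
    for c in range(list_length // chunk_size):
        members = set(range(c * chunk_size + 1, (c + 1) * chunk_size + 1))
        for start in range(len(ifr_positions) - chunk_size + 1):
            if set(ifr_positions[start:start + chunk_size]) == members:
                intact.append(c)
                break
    return intact


def words_recalled_from_chunk(ffr_positions, chunk_index, chunk_size=4):
    """Count how many words of a chunk reappear among FFR within-list positions."""
    members = set(range(chunk_index * chunk_size + 1,
                        (chunk_index + 1) * chunk_size + 1))
    return len(members & set(ffr_positions))


# Toy trial (hypothetical data): words 13-16 and 1-4 come out as units.
ifr = [13, 14, 16, 15, 2, 1, 3, 4, 7, 9]
ffr = [1, 3, 4, 14]  # words of the same list that reappear in FFR
chunks = intact_chunks(ifr)
counts = {c: words_recalled_from_chunk(ffr, c) for c in chunks}
```

Accumulating such counts over all intact chunks, separately for each chunk position in the list, would give histograms of the kind the text describes for Fig. 2a.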
The results are shown in Fig. 2a. We found that for the first three chunks in the list, the probability has two peaks, at 0 and 4 words, indicating the tendency for all 4 words in these chunks to be recalled or omitted as a single unit. Interestingly, the probability curve for the last chunk in a list decays monotonically, indicating that words from that chunk are recalled independently. A plausible explanation of this effect is that the last several words in a list are typically recalled immediately during IFR, since they are maintained in working memory after the list is presented; hence their recall is effortless and does not lead to the formation of a chunk representation in memory. A similar explanation also accounts for a recently reported 'anti-recency' effect in FFR, where the last words in a list have a lower probability of being recalled, as opposed to the well-documented positive recency effect during IFR 39 . For comparison, if the same analysis is performed for IFR trials where the same four words were recalled but with at least one intervening word, the corresponding probabilities do not exhibit a peak at four words recalled (Fig. 2b).
Spontaneous grouping of lists. Some of the best participants, who employ strong over-the-list grouping imposed by the presentation protocol, also exhibit a higher-level grouping of lists. In particular, they tend to recall lists in chunks of four consecutive lists, as illustrated in Fig. 3.
Taken together, the results presented above illustrate that our memory is trained to create a structure on different levels of organization, including those that are not directly imposed by the presentation protocol.
Hierarchical model of memory recall. The model developed for this study generalizes our previous model of single list free recall, which is based on two principles 32,40 :
1. The encoding principle states that each memory item is encoded ("represented") in the brain by a specific group of neurons in a dedicated memory network. When an item is retrieved ("recalled"), either spontaneously or when triggered by an external cue, this specific group of neurons is activated.
2. The associativity principle states that, in the absence of specific retrieval cues, the currently retrieved item plays the role of an internal cue that triggers the retrieval of the next item.
From these two principles we were able to theoretically predict that, out of L remembered words, on average $\sqrt{3\pi L/2} \approx 2.17\sqrt{L}$ words would be recalled 41 . This matches well with the average performance and its distribution in single list recall of approximately 8 words out of 16 32 . However, the average FFR performance of 57 words recalled out of 256 words presented over the entire session (16 lists of 16 words each) is much higher than predicted by this model, which motivated the current extension. To this end, we build a hierarchical model of memory based on these two principles and show that its behavior is in agreement with the experimental results presented above. Our model could be viewed as a radically simplified version of 24 and 14 .
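As a quick numerical check of the comparison made above, the short sketch below evaluates the single-list prediction for L = 16 and L = 256, reading the prediction as $\sqrt{3\pi L/2} \approx 2.17\sqrt{L}$. The function name is an assumption introduced for the example.

```python
import math


def predicted_recall(n_words):
    """Expected number of recalled words, sqrt(3*pi*L/2), for L words in memory."""
    return math.sqrt(1.5 * math.pi * n_words)


for n in (16, 256):
    # Rounded to one decimal place for display only.
    print(n, round(predicted_recall(n), 1))
```

For a single list of 16 words this gives a value between 8 and 9, in line with the roughly 8 words quoted above, while for 256 words it gives about 35 words, well below the observed FFR average of 57.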
Modeling the encoding. We extend the encoding principle formulated above for the recall of single lists in the following way. Following 14,24 we postulate that distinct levels of information (words, chunks, lists, context…) are encoded in the form of sparse random neuronal populations in the corresponding distinct subnetworks (see Fig. 4a). In the experimental paradigm words are presented in lists of 16 items and each session consists of 16 lists. Accordingly, each word W is labeled by a triple of indexes, W = (w, l, s), denoting the presentation position of the word in the session (from 1 to 256), the presentation position of the list (from 1 to 16), and the session number, respectively. Similar to CMR 14 , we represent each word W by the concatenation of three binary patterns, representing a session, a list within a session and a word within a list, respectively (Eq. (1)). The length of each of the three vectors equals the number of neurons in the corresponding subnetwork, $N_w$, $N_l$, $N_s$. Each neuron contributes to the encoding of a word W with probability f, so that the total number of neurons which encode a word W is on average $f \cdot (N_w + N_l + N_s) = f \cdot N$.
In our previous studies 32,48 , transitions between words were driven by similarity defined as a dot product between the corresponding representations (see Methods). Due to the decomposition of representations into three parts (see Eq. (1)), the similarity between any two words $W_1$ and $W_2$ can be presented as a sum of three corresponding terms:

$$S^{\mathrm{tot}}_{W_1 W_2} = S^{\mathrm{word}}_{w_1 w_2} + \alpha\, S^{\mathrm{list}}_{l_1 l_2} + \beta\, S^{\mathrm{session}}_{s_1 s_2}, \qquad (2)$$

where $S^{\mathrm{word}}_{w_1 w_2}$ is the similarity between words $w_1$ and $w_2$ in the word subnetwork; $S^{\mathrm{list}}_{l_1 l_2}$ is the similarity between lists $l_1$ and $l_2$, to which the words $W_1$ and $W_2$ belong; $S^{\mathrm{session}}_{s_1 s_2}$ is the similarity of sessions $s_1$ and $s_2$ in the session subnetwork; and the parameters α and β weight the relative strength of the list and session context populations, respectively, in driving the retrieval process.
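The decomposition of similarity into word, list and session terms can be illustrated with a small sketch. It assumes the total similarity is the word overlap plus α times the list overlap plus β times the session overlap, as the text describes, with each overlap computed as a dot product of sparse binary patterns. The subnetwork size, the sparseness and all names are illustrative assumptions, not values from the study, and the restriction of the list and session terms to words recalled in IFR is left out for brevity.

```python
import random


def sparse_pattern(n_neurons, f, rng):
    """Binary pattern with each neuron active with probability f,
    stored as the set of active neuron indices."""
    return {i for i in range(n_neurons) if rng.random() < f}


def overlap(a, b):
    """Dot product of two binary patterns = number of shared active neurons."""
    return len(a & b)


def total_similarity(word_a, word_b, patterns, alpha, beta):
    """Similarity of two words given as (w, l, s) triples."""
    (w1, l1, s1), (w2, l2, s2) = word_a, word_b
    return (overlap(patterns["word"][w1], patterns["word"][w2])
            + alpha * overlap(patterns["list"][l1], patterns["list"][l2])
            + beta * overlap(patterns["session"][s1], patterns["session"][s2]))


rng = random.Random(0)
N_SUB, F = 1000, 0.05  # illustrative values, not taken from the source
patterns = {
    "word": {w: sparse_pattern(N_SUB, F, rng) for w in range(1, 33)},
    "list": {l: sparse_pattern(N_SUB, F, rng) for l in (1, 2)},
    "session": {1: sparse_pattern(N_SUB, F, rng)},
}
a, b, c = (1, 1, 1), (2, 1, 1), (17, 2, 1)  # words 1, 2 in list 1; word 17 in list 2
same_list = total_similarity(a, b, patterns, alpha=10, beta=1)
other_list = total_similarity(a, c, patterns, alpha=10, beta=1)
```

With a sizeable α, two words from the same list share the whole list pattern, so `same_list` comes out far larger than `other_list`; this is what biases transitions towards staying within a list.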
The data shows that IFR had a strong effect on FFR, since, while average IFR performance was 50%, 87% of the words recalled in FFR are the words that were previously recalled in IFR. We therefore assumed that only the words that are recalled during IFR are bound to list and session representations, i.e. the last two terms in the total similarity matrix of Eq. (2) are only added for pairs of words that were both recalled during IFR (see Methods).
Associative transitions. The model of the encoding principle provides a simple mathematical characterization of word representations, but it does not describe how these representations are exploited in the retrieval dynamics. This is described by the associativity principle, which determines transitions between words.
According to the associativity principle, the currently retrieved item functions as an internal cue that triggers the retrieval of the next one. Transitions between words are brought about by similarities between the active word (the last retrieved one) and other encoded words. Simplifying the previous models 14,24 , we use a deterministic transition rule: the word that is most similar to the currently retrieved one is activated next, and the process continues, leading to the retrieval of more and more words. Importantly, the word retrieved just before the current one cannot be activated, so that a transition which has just occurred cannot immediately happen in the reverse direction. The IFR of a single list was obtained by the non-hierarchical recall model of 40 with the word-to-word contribution $S^{\mathrm{word}}$ to the similarity matrix of Eq. (2) (see Fig. 4b). The words recalled in IFR were then used to build the total similarity matrix $S^{\mathrm{tot}}$ of Eq. (2) for different strengths of binding between words and lists, α, and FFR was modeled as follows. The dynamical recall process is driven by $S^{\mathrm{tot}}$ (see Fig. 4c, black arrows), unless the same within-list transition is attempted for the second time (Fig. 4b,c, red arrows). At this point, the process, if continued, would enter a loop, recapitulating words of a given list that were already retrieved, and hence no new words would be recalled (Fig. 4b, bold arrows). Note that the repeated retrieval of the same word does not always initiate a loop, because the retrieval can sometimes proceed in the opposite direction (see 40 and Fig. 4b). Similar to 42 , we assume that when the process approaches a loop, i.e. the same within-list transition is attempted for the second time, the list representation is suppressed and the next transition is determined by the other two contributions to the similarity matrix, corresponding to session context and word-to-word similarity, $S^{\mathrm{word}}_{w_1 w_2} + \beta\, S^{\mathrm{session}}_{s_1 s_2}$. We call these transitions 'random', in contrast to the 'structured' ones induced by $S^{\mathrm{tot}}$, and show them with green arrows in Fig. 4c. Upon triggering the retrieval of a new word through a random transition, the process reverts to using the full similarity matrix $S^{\mathrm{tot}}$, with the list representation corresponding to the retrieved word activated, until it eventually enters a big loop that includes several lists.
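The basic deterministic rule described above can be sketched as follows. This is a deliberately reduced illustration: it covers only the single-matrix dynamics (move to the most similar item, never step straight back, stop when a transition is attempted a second time) and omits the list-suppression step used for FFR. The function name and the toy similarity matrix are assumptions made for the example.

```python
def recall_sequence(sim, start=0):
    """Deterministic associative recall on a symmetric similarity matrix.

    From the current item, move to the most similar other item, excluding
    the item just left. Stop when a transition (i -> j) is attempted for
    the second time, since the dynamics would then repeat forever.
    """
    n = len(sim)
    current, previous = start, None
    seen_transitions = set()
    recalled = [start]
    while True:
        candidates = [j for j in range(n) if j != current and j != previous]
        if not candidates:
            break
        nxt = max(candidates, key=lambda j: sim[current][j])
        if (current, nxt) in seen_transitions:
            break  # the process has entered a loop
        seen_transitions.add((current, nxt))
        if nxt not in recalled:
            recalled.append(nxt)
        previous, current = current, nxt
    return recalled


# Toy symmetric matrix (hypothetical values); item 4 is only weakly
# similar to all other items and is never reached.
toy_sim = [
    [0.0, 5.0, 1.0, 2.0, 0.1],
    [5.0, 0.0, 4.0, 3.0, 0.2],
    [1.0, 4.0, 0.0, 6.0, 0.3],
    [2.0, 3.0, 6.0, 0.0, 0.4],
    [0.1, 0.2, 0.3, 0.4, 0.0],
]
order = recall_sequence(toy_sim, start=0)
```

In this toy run the trajectory revisits already recalled items several times before a transition repeats, which shows how retrieval can stall in a loop while some items remain unrecalled.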
Comparison between data and model simulation. We now turn to deploying this model in simulating the experimental paradigm analyzed previously. To qualitatively compare the model to experimental findings, we examine how the sequences generated by our model present grouping of items as measured by p 16 . In the model, the parameter α controls the strength of binding between the words recalled during IFR and the list representation. When α is high, the similarity between the words from the same list is high and hence most of the transitions happen between such words. We let α vary across sessions, and set the binding between words and sessions according to β γ = + α 2 . Here γ is a constant that controls the binding of words recalled during IFR to a session, irrespective of how strong the list binding is. The reason for this contribution is that the words recalled in IFR have a higher chance to be recalled in FFR even for sessions with no list grouping (i.e. sessions with p 16 = 0), see Fig. 5d. The exact relation between β and α is not important, besides setting the value of α for which grouping saturates (see Fig. 5a).
Using the described model, 6500 sessions of FFR were simulated. We compute $p_{16}$ for all simulated FFR sessions and find that, on average, $p_{16}$ increases monotonically with the value of α (Fig. 5a). This is the expected behavior, since large values of α force structured recall. Similarly to the experimental data, the model shows a linear dependence of the number of recalled words on $p_{16}$ (Fig. 5b; cf. Fig. 1b). Intriguingly, the number of sequences of words recalled from the same list shows a non-monotonic dependence on $p_{16}$ (Fig. 5c, red dots), which we also observed in the experimental data (blue dots). For small values of α, and thus $p_{16}$, the recall is unstructured and the number of sequences is roughly equal to the number of recalled words (see Fig. 1d). When α and, therefore, $p_{16}$ increase, the number of sequences increases, since there is a mixture of two recall processes, random and structured (see Fig. 1e). For intermediate values of $p_{16}$ the contributions of $S^{\mathrm{word}}$ and $S^{\mathrm{list}}$ to driving structured transitions are comparable, and across-list transitions may still be triggered by structured transitions. As we further increase α, the recall becomes very structured and the words from a single list are predominantly recalled before words from other lists (see Fig. 1c). Consequently, the number of sequences becomes comparable to, or even smaller than, the number of presented lists. To further assess the validity of our model, we compute the percentage of newly recalled words in FFR (the words that were not recalled in IFR). Figure 5d shows that this percentage steadily decreases with $p_{16}$ for both the model (red dots) and the experimental data (blue dots).
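The "number of sequences of words recalled from the same list" discussed above can be computed by splitting an FFR output order into maximal runs of same-list words. The sketch below assumes 16 words per list and session-wide serial positions from 1 to 256, as in the protocol; the function names and the toy recall order are illustrative.

```python
from itertools import groupby


def list_of(position, list_length=16):
    """Map a session-wide serial position (1..256) to its list number (1..16)."""
    return (position - 1) // list_length + 1


def same_list_runs(ffr_positions, list_length=16):
    """Split an FFR output order into maximal runs of words from the same list.

    Returns (list number, run length) pairs in output order.
    """
    lists = [list_of(p, list_length) for p in ffr_positions]
    return [(lst, len(list(run))) for lst, run in groupby(lists)]


# Toy FFR order (hypothetical): a single word from list 15 interrupts
# two groups of words from list 4, followed by two words from list 1.
runs = same_list_runs([49, 50, 52, 230, 53, 51, 1, 2])
n_sequences = len(runs)
```

Here the single interrupting word splits the recall from list 4 into two runs, so the session counts four sequences although only three lists are visited.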

Discussion
We studied the final free recall of sets of 256 unrelated words that were previously presented and recalled on the same day as 16 lists of 16 words each. We found that FFR trials exhibit various degrees of hierarchical organization: within-list chunking that spontaneously emerged in IFR, over-the-list organization induced by the presentation protocol, and finally list chunking for the very best participants (see Figs 1-3 above). The dominant recall organization, exhibited in the bulk of the data, was the tendency to recall subsequent words from the same list. This type of grouping correlated strongly with performance (Fig. 1b). When extrapolated to the limit of random recall, the performance dipped below the level of 30 words, which closely matched our theoretical prediction for structure-less recall. The average performance was almost twice this level, indicating a strong effect of information structure on memory retrieval. We also found that within-list chunks that emerged spontaneously in a limited number of IFR trials 44 have a high probability of being recalled or omitted as single units during FFR trials as well. Taken together, our results strongly indicate that people tend to organize information to be remembered in a way that facilitates subsequent recall, even when the information itself lacks any meaning, as in the case of free recall of random words.
From a theoretical point of view, we extended the model of associative memory recall 40 to take into account the hierarchical representation of information in FFR that we found in experimental data. More specifically, we added list and session context subnetworks. The resulting model is compatible with the proposed principles of sparse encoding and associative transitions. Our model is much simpler than the previous models of hierarchical contextual recall 14,24 , which helps to better understand the relation between the binding of memorized words to context and FFR properties. It should be noted, however, that the simplicity of the model comes at a price, since it does not account for the temporal organization of single list recall, such as primacy, recency and contiguity. Our experimental and theoretical results indicate that the recall of the words in IFR, rather than passive acquisition, is a dominating factor in the emergence of the grouping. The model can be easily generalized to any number of hierarchical levels by adding further layers of representations, similar to the list and session representations.

Materials and Methods
49 , the data reported in this manuscript were collected in the lab of M. Kahana as part of the Penn Electrophysiology of Encoding and Retrieval Study (see 43 for details of the experiments). Here we analyzed the results from the 217 participants (age 17-30) who completed the first phase of the experiment, consisting of 7 experimental sessions. All experiments were performed in accordance with relevant guidelines and regulations. Participants gave consent according to the University of Pennsylvania's IRB protocol and were compensated for their participation. Informed consent was obtained from all participants or, if participants were under 18, from a parent or legal guardian. Each session consisted of 16 lists of 16 words presented one at a time on a computer screen and lasted approximately 1.5 hours. Each study list was followed by an immediate free recall test. Words were drawn from a pool of 1638 words. For each list, there was a 1500 ms delay before the first word appeared on the screen. Each item was on the screen for 3000 ms, followed by a jittered 800-1200 ms inter-stimulus interval (uniform distribution). After the last item in the list, there was a 1200-1400 ms jittered delay, after which participants were given 75 seconds to attempt to recall any of the just-presented items. In 4 out of 7 experimental sessions, following the immediate free recall test for the last list, participants were shown an instruction screen for final free recall, telling them to recall all the items from the preceding lists in any order. After a 5 s delay, a tone sounded and a row of asterisks appeared. Participants had 5 minutes to orally recall any item from the preceding lists.
Grouping measures. For each final free recall trial we consider the ordered set of recalled words (W) defined as $w_1 \to w_2 \to \dots \to w_n$, where n is the number of words recalled in a given trial and $w_1$ ($w_2$, …, $w_n$) denotes the input serial position during the day of the first (second, …, last) word recalled, which is a number between 1 and 256 (see Fig. 1a). We introduce the grouping measure (p) and assign a probability to each transition by assuming that the next word recalled is chosen from the same list as the currently recalled word with probability p, and that a random word is chosen with probability 1 − p. The probability of the whole sequence is computed as the product of the individual transition probabilities. Formally, if $l_i$ is the number of the list (from 1 to 16) in which word $w_i$ was presented, the probability $P_i$ of the transition ($w_i \to w_{i+1}$) and the total log-probability of the whole sequence (log-likelihood) are
, γ = 15. This value for γ was chosen to match the proportion of new words recalled during FFR on sessions with little over-the-list grouping (see Fig. 5d).
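Returning to the grouping measure p, the sketch below shows one way such a mixture likelihood could be evaluated and maximized. The exact transition probabilities are not recoverable from the text here, so the form used is an assumption: with probability p the next word comes from the same list, and with probability 1 − p a random word is drawn, landing in any of the 16 lists with equal chance. The grid search and all names are likewise illustrative and are not the authors' procedure.

```python
import math


def log_likelihood(lists, p, n_lists=16):
    """Log-likelihood of a sequence of list labels under the assumed mixture.

    Assumed per-transition probability of landing in list b after list a:
    (1 - p) / n_lists, plus p when b is the same list as a.
    """
    total = 0.0
    for a, b in zip(lists, lists[1:]):
        prob = (1.0 - p) / n_lists + (p if a == b else 0.0)
        total += math.log(prob)
    return total


def estimate_grouping(lists, n_lists=16, steps=1000):
    """Grid-search the p in [0, 1) that maximizes the log-likelihood."""
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda p: log_likelihood(lists, p, n_lists))


# Toy sequences of list labels (hypothetical data).
grouped = [1] * 5 + [2] * 5 + [3] * 5   # long same-list runs
scattered = [1, 2, 3, 4, 5, 6]          # no same-list transitions
p_grouped = estimate_grouping(grouped)
p_scattered = estimate_grouping(scattered)
```

Under this assumed form, a sequence with no same-list transitions yields an estimate of 0, while long same-list runs push the estimate towards 1.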