Statistical learning occurs during practice while high-order rule learning during rest period

Knowing when the brain learns is crucial for both the comprehension of memory formation and consolidation and for developing new training and neurorehabilitation strategies in healthy and patient populations. Recently, a rapid form of offline learning developing during short rest periods has been shown to account for most of procedural learning, leading to the hypothesis that the brain mainly learns during rest between practice periods. Nonetheless, procedural learning has several subcomponents not disentangled in previous studies investigating learning dynamics, such as acquiring the statistical regularities of the task, or else the high-order rules that regulate its organization. Here we analyzed 506 behavioral sessions of implicit visuomotor deterministic and probabilistic sequence learning tasks, allowing the distinction between general skill learning, statistical learning, and high-order rule learning. Our results show that the temporal dynamics of apparently simultaneous learning processes differ. While high-order rule learning is acquired offline, statistical learning is evidenced online. These findings open new avenues on the short-scale temporal dynamics of learning and memory consolidation and reveal a fundamental distinction between statistical and high-order rule learning, the former benefiting from online evidence accumulation and the latter requiring short rest periods for rapid consolidation.


INTRODUCTION
Learning is the ability to acquire knowledge or skills through new or repeated experiences. To understand the neural mechanisms of learning, it is crucial to identify the specific periods during which it occurs. In the laboratory, learning is usually assessed by measuring specific knowledge or skill before and after a period of training. For example, a seminal experiment consists of measuring the speed and accuracy with which participants play a sequence (a simplified version of learning a piano piece without the artistic component) before and after practicing it several times 1 . This type of research revealed that following a training session and during a resting or sleep period, the acquisition of a new skill may continue to develop, a process called offline learning 2 . Indeed, performance 3 or the stability of the memories against interference (e.g., caused by the learning of a second sequence) 4,5 is enhanced several hours after the end of the practice compared to just after the practice. This offline learning, which occurs during awake or sleep periods, has been linked to functional brain changes 6,7 . This demonstrates that the neural mechanisms of learning do not necessarily only develop during practice. Recently, rapid offline consolidation of skill has also been documented in the course of short rest periods, from seconds 8,9 to minutes 10 during the learning of a perceptual-motor sequence. In Bönstrup et al. 8,9 , this fast offline learning even accounted for most behavioral gains during early skill learning, raising the hypothesis that the brain mainly learns during short rest periods and not during the practice itself. However, these studies investigating ultra-fast consolidation during sequence learning did not evaluate the relative contribution of online and offline learning to different crucial components of learning.
Here we used sequence learning tasks with random, probabilistic, and deterministic transitions that made possible the identification of the short-scale dynamics of general skill (the general speed-up in the task), statistical, and high-order rule learning.
Statistical learning is a fundamental learning mechanism responsible for picking up probabilistic regularities in the environment. The ability of an organism to extract such statistical environmental information is critical for its survival 11,12 and is present across species and modalities 13 . In humans, this ability is present in babies 11 and at the core of a wide range of behaviors, including linguistic processing 14 or perceptual decision making 15 . One challenge of language acquisition, for example, is the segmentation of words from fluent speech. Within a language, the transitional probability between two syllables will generally be higher within a word than between two words, creating inhomogeneities in transitional probabilities between sounds. Such statistical information is used by adults and babies as young as 8 months old in order to segment words 11,16 .
Nevertheless, learning does not rely solely on the extraction of statistical regularities. High-order rule learning is also needed to extract deterministic rules that can be generalized to new elements that have never been encountered before. For instance, it has been shown that 7-month-old babies can also extract and generalize abstract rules from an artificial language 17 and that these rules are captured during speech processing 18 . Such rules are abstract in the sense that they can be applied to new elements in the environment that have never been encountered before. They are often said to be "high-order" because the knowledge of several elements (n − 1, n − 2, etc.) is necessary to predict an upcoming element (n). Well beyond language acquisition, the brain is constantly making predictions based on previous knowledge in virtually all types of learning [19][20][21] . Such predictions may be inferred from both statistical regularities and high-order rules.
Here we explore whether statistical learning and high-order rule learning are related to different ultra-fast consolidation dynamics.
Learning a new visuomotor skill also requires the development of lower-level perceptual and motor skills that do not depend on statistical or high-order rule learning, including visuomotor mapping and dexterity 22 . We refer to this type of learning as general skill learning.
In this study, we used serial reaction time (SRT 23 ) and alternating serial reaction time (ASRT) tasks 24 in which healthy participants encounter an array of four positions on a screen, each paired with a designated response key. Positions are filled sequentially with deterministic (in both SRT and ASRT) or probabilistic (in ASRT) patterns and the participant has to push the corresponding key as fast and as accurately as possible. These task designs allow the distinction between statistical learning, high-order rule learning, and general skill learning. Note that the measure of general skill learning is a mixed measure that includes deterministic sequence learning in Experiment 1 and fatigue effects in all experiments. We identified the short-scale temporal dynamics of these three types of learning by measuring the performance gains during short practice (online) or rest (offline) periods. Our analyses revealed a critical distinction between statistical learning that is acquired during practice and high-order rule learning that is acquired during rest periods (Fig. 1). These results suggest that the brain mechanisms leading to statistical and high-order rule learning are fundamentally different: the former requires online evidence accumulation, while the latter requires a rest consolidation period.

Dynamics of general skill learning
In the three experiments, average RT per block for all trials (excluding random blocks in the SRT task) decreased over time. These results may suggest that general skill learning occurs offline. However, the general skill learning dynamic is highly sensitive to within-block fatigue, as clearly observed with the decrease of performance within each block in all experiments (Fig. 2a, e, h). The observed performance increase during rest periods might then be mainly due to fatigue release [25][26][27] . To investigate whether that performance increase during rest periods reflects, at least in part, offline learning and not only fatigue/inhibition release, we analyzed rest periods following the first blocks of each session, during which no performance decrements were observed (average of the first blocks of the two sessions for Experiment 1, first block of the session for Experiment 2, and average of the first blocks of the eight sessions for Experiment 3).
Fig. 1 General design and main results. a Structure of the sequences used in the SRT task and ASRT task. In the SRT task, a deterministic sequence of 12 elements is repeated five times per block. In the ASRT task, a deterministic sequence of four elements is interleaved with four random elements resulting in an eight-element probabilistic sequence, which is repeated ten times per block. b The number of participants and sessions in Experiment 1 (SRT experiment), Experiment 2 (ASRT experiment), and Experiment 3 (long ASRT experiment). c Type of learning investigated in each experiment. d Summary of the results. General skill and high-order rule learning occur during rest periods (offline) while statistical learning occurs during practice (online).
No decrease in performance occurred during these first session blocks and even a modest performance increase occurred in  (Fig. 2d, g, j). Note, however, that these additional analyses do not ensure that the observed offline gain in general skill learning is not simply due to a fatigue/inhibition release (see "Discussion" section for further details).
In Experiment 1, because there is only one type of transition (deterministic), we cannot dissociate general skill from sequence learning within each block or rest period. However, in the ASRT tasks (Experiments 2 and 3), general skill learning can be estimated independently from any structural or sequence learning by considering only the random-low trials instead of all trials. This measure led to similar learning rates. We also investigated whether offline general skill learning was visible across days or weeks. In Experiment 1, offline change in general skill performance between sessions 12 h apart was significant (M_LongOffline = 28.23 ± 61.88 s, t(62) = 3.59, p < 10⁻³, d = 0.46). In Experiment 3, offline change in general skill performance between sessions a week apart was not significant (M_LongOffline = 5.00 ± 17.25 s, t(24) = 1.42, p = 0.17, d = 0.30).
Significance is noted by a single asterisk (*) for p value <0.05 and four asterisks (****) for p value <0.0001. In violin plots, higher values mean greater learning. Error bars represent standard error.

DISCUSSION
Our brains can learn new skills very quickly. But the short-scale dynamic of learning, and in particular, whether the new skill can be learned during practice or short rest periods, has only recently started to be investigated [8][9][10] . Here we used 3 different experiments (1 with SRT and 2 with ASRT tasks) and a total of 506 behavioral sessions to characterize the online and offline contributions to 3 types of learning, namely, general skill learning, statistical learning, and high-order rule learning. Our results revealed that the short-scale dynamics of different types of learning mirror each other, building up either during practice or during the following rest periods. Specifically, statistical learning is acquired during practice periods, while high-order rule learning is acquired during break periods.
Statistical learning refers to the process of extracting probabilistic structure from the environment 28,29 . In our ASRT tasks, statistical learning is evidenced by shorter RTs during triplets that appear frequently (random-high trials) compared to triplets that appear less frequently (random-low trials) 24 . Performance in statistical learning increases during practice and decreases during rest periods (Fig. 3). These results suggest that statistical learning benefits from evidence accumulation developing during practice and does not consolidate but decays during rest periods. This observation may explain why no evidence for offline consolidation of statistical learning was found during 12-h sleep or awake periods [30][31][32][33] .
Conversely, high-order rule learning 34 , evidenced by faster performance during pattern relative to random-high trials, specifically increases offline during rest periods (Fig. 4). This type of learning is much lower in magnitude than statistical learning and becomes significant only after many trials or sessions, as in the third experiment. Indeed, while the probabilistic learning in the ASRT task is based on acquiring the statistics of low-order, simple transitions, the high-order rule learning is, as indicated by its name, based on acquiring the deterministic rule governing high-order, complex transitions, i.e., every other trial. A potential explanation for the opposite results in these two learning types is that statistical knowledge of simple transitions can be acquired under attentional distraction coming from the task itself of mapping visual cues with response keys. In contrast, high-order rule learning could need more attentional resources and consequently occur only between practice periods. It has indeed been shown during sequence learning that simple transitions 33,35-37 , but not more complex structures 38 , could be learned under attentional distraction.
Another possible explanation lies in the deterministic vs. probabilistic nature of these two types of learning. While deterministic and probabilistic information may be considered as a continuum of the same process (a deterministic rule is mathematically an extreme case of statistical information with probabilities of 0 or 1), past research suggests that the two processes are linked to different brain regions 39 , influenced differently by the explicitness of the information 40 and better modeled by two distinct hypothesis spaces instead of one 41 . It is then possible that uncertain regularities (statistical learning) need evidence accumulation and can only be acquired online while deterministic regularities (rule learning) need a rest period to be consolidated, possibly because they are somehow rehearsed or replayed during rest. Future studies will have to dissociate whether this difference in dynamics between statistical and high-order rule learning is related to the low-order/high-order or the probabilistic/deterministic nature of the learning, or a mixture of both.
Our results also show that general skill learning seems to be acquired during rest periods (Fig. 2). This result holds whether the measure of general skill learning includes all trials or only random-low trials (Experiments 2 and 3), the latter excluding any predictable patterns from the stimulus stream. It thus suggests that the fast consolidation of procedural learning during breaks observed in previous research [8][9][10] is less dependent on the sequence learning itself and depends more on a mixture of improvement in sensorimotor transformation, dexterity, and familiarization with the task. Statistical and high-order rule learning are measured as a difference between two types of trials, which rules out that the offline gap in performance is due to a release of fatigue or a reactive inhibition effect 27 . In contrast, general skill learning is measured by a simple RT, which is very sensitive to fatigue, as depicted by the constant decrease in performance within blocks in the three experiments (Fig. 2a, e, h). To investigate whether the offline gap in general skill performance is not simply a release of fatigue, we tested the offline change in general skill performance after the first blocks of each session, during which there is no decrease in performance (Fig. 2d, g, j), and the offline gain was still present. It is then possible that offline improvements in general skills are not only related to fatigue release but also reflect consolidation processes. Nevertheless, it is also possible that, during the first blocks of learning, the within-block learning rate counteracts the within-block fatigue effect, yielding no observable fatigue effect. In the absence of a clear control for fatigue effects 9,25,26 , the design of the present study does not allow a firm conclusion on the offline/online dynamic of general skill learning.
In this study, we identified the short-scale temporal dynamics of two types of learning, namely, statistical learning and high-order rule learning, extracted from the same information stream. We revealed that they do not develop at the same time, with statistical learning developing online and high-order rule learning developing offline. These results suggest that such types of learning rely on separate neural mechanisms with their own dynamics. Our unprecedented dissection of the short-scale dynamics of subcomponents of learning challenges the classical view of memory acquisition and consolidation, which would be applied indifferently to all types of learning. We revealed, on the contrary, that statistical learning occurs only during practice and high-order rule learning occurs only during breaks.

METHODS
Participants
Two hundred and sixty-eight (268) healthy young volunteers participated in 3 studies (192 women, 76 men, mean age = 22.2 years) for a total of 506 reported behavioral sessions. All participants had normal or corrected-to-normal vision, and none of them reported a history of any neurological and/or psychiatric condition. Participants provided informed written consent to the procedure before enrollment, as approved by the institutional review board of the local research ethics committee. The three experiments were approved by the United Ethical Review Committee for Research in Psychology (EPKEB) in Hungary and by the research ethics committee of Eötvös Loránd University, Budapest, Hungary. The experiments were conducted in accordance with the Declaration of Helsinki. Participants received course credits for taking part in the experiment. Data from Experiment 2 were previously published 27,42 . The results of the present paper have not been tested or reported before. Figure 1 summarizes the design of the present study.

SRT task
During the SRT task 23 , four empty circles were horizontally arranged on the screen. Participants were instructed to respond to a stimulus (a dog's head) that appeared in one of the four open circles by pressing one of four corresponding keys on a computer keyboard (Z, C, B, or M on a QWERTY keyboard) as quickly and accurately as possible after the appearance of the stimulus. Participants used their left and right middle and index fingers to respond to the stimuli. The stimulus remained visible until participants pressed the correct key, at which time it disappeared. The following stimulus appeared 120 ms after the offset of the previous stimulus. The SRT task was programmed and displayed using the E-prime software (Psychology Software Tools, Inc.). The serial order of the four possible positions (coded as 1, 2, 3, and 4) in which target stimuli could appear was determined by a 12-element sequence (2-3-1-4-3-2-4-1-3-4-2-1) 22 . An experimental session was divided into blocks with either 60 trials corresponding to 5 repetitions of the 12-element sequence or 60 pseudorandom trials in which the visual cue no longer played out a deterministic pattern of positions.
Fig. 4 High-order rule learning occurs during rest periods. a Average high-order rule learning (RT difference between pattern and random-high trials) per block (black line) and per bin (green line) for Experiment 3 (long ASRT). For better visualization, a zoom-in for day 1 and day 8 is represented. b Average online and offline high-order rule learning across all blocks for Experiment 3 (long ASRT). c Online and offline high-order rule learning across all blocks and with a linear fit for Experiment 3. Significance is noted by a single asterisk (*) for p value <0.05. Note that higher values mean greater learning. Error bars represent standard error.

ASRT task
The visual display, response modality, timing, instructions, and program software for the ASRT task were similar to those of the SRT task. The serial order of the four possible positions (coded as 1, 2, 3, and 4) in which target stimuli could appear was determined by an eight-element sequence 24,30,43 . In this sequence, every second element appeared in the same order during the entire task, while the other elements' positions were randomly chosen (e.g., 2-r-1-r-3-r-4-r, where numbers refer to a predetermined location among the four locations and r refers to a location chosen randomly out of the four possible). A total of six unique sequences of predetermined elements were created and one of them was assigned to each subject in a random order 24 . An experimental session was divided into blocks starting with five random trials (warm-up) followed by the eight-element sequence repeated ten times 31,44 . Warm-up trials were discarded from the analyses.
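The block structure described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation (the task itself was run in E-prime): the function name `make_asrt_block`, its parameters, and the choice of Python's `random` module are assumptions made for the example.

```python
import random

def make_asrt_block(pattern, n_repetitions=10, n_warmup=5, rng=None):
    """Build one ASRT block as a list of (position, is_pattern) tuples.

    pattern: the four predetermined positions, e.g. [2, 1, 3, 4].
    Warm-up trials carry is_pattern=None so they can be discarded later.
    (Hypothetical helper; names and defaults are illustrative.)
    """
    if rng is None:
        rng = random.Random()
    positions = [1, 2, 3, 4]
    # Warm-up: random trials that precede the alternating sequence.
    trials = [(rng.choice(positions), None) for _ in range(n_warmup)]
    for _ in range(n_repetitions):
        for p in pattern:
            trials.append((p, True))                       # predetermined element
            trials.append((rng.choice(positions), False))  # random element
    return trials

# Example pattern 2-r-1-r-3-r-4-r from the text, with a fixed seed.
block = make_asrt_block([2, 1, 3, 4], rng=random.Random(0))
```

With the default arguments, a block contains 5 warm-up trials followed by 80 sequence trials (ten repetitions of the eight-element sequence), half of them predetermined and half random.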
Due to the alternating sequence structure, some patterns of three consecutive elements (henceforth referred to as triplets) occurred with a higher probability than other ones. Each trial was categorized as the last element of either a high- or a low-probability triplet. High-probability triplets could be formed either by predetermined elements or random ones. In the above sequence example (2-r-1-r-3-r-4-r), the probability that a triplet starting with the element "2" ended with the element "1" was 62.5%. Indeed, the item "2" could be either predetermined (50%) or random (50%). If it is predetermined, then the last element of the triplet has to be "1"; if it is random, the last element of the triplet could be any of the four locations. Thus, the item "1" had a 50% probability of occurring as the last predetermined element of the triplet plus a 12.5% chance of occurring as a random element. In contrast, triplets such as 1-x-2 or 4-x-3 occurred with a low probability (12.5%) because they could only occur when the third element of the triplet was random. Low-probability triplets forming repetitions (e.g., 222) or trills (e.g., 232) were discarded from analyses as participants often show pre-existing response tendencies to them 45,46 . Trials where participants pressed a wrong button were also discarded. Participants were not informed of any regularity. Each trial could be a pattern trial, a random-high trial, or a random-low trial. A pattern trial corresponded to a predetermined element ending a triplet (all pattern trials are high-probability triplets); a random-high trial corresponded to a random element ending a high-probability triplet; a random-low trial corresponded to a random element ending a low-probability triplet.
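The 62.5% and 12.5% figures, and the three trial categories, can be checked with a short calculation. The sketch below is illustrative only: `triplet_probability` and `trial_type` are hypothetical helper names, the pattern 2-1-3-4 is the example used in the text, and the exclusion of repetitions and trills is deliberately not implemented.

```python
from fractions import Fraction

def next_in_pattern(first, pattern):
    """Pattern element that follows `first` (the pattern is cyclic)."""
    return pattern[(pattern.index(first) + 1) % len(pattern)]

def triplet_probability(first, third, pattern, n_positions=4):
    """P(third element | first element) under the alternating structure.

    The first element is predetermined or random with equal probability,
    as stated in the text.
    """
    half = Fraction(1, 2)
    # First element predetermined: the third is the next pattern element.
    p_pattern = half if third == next_in_pattern(first, pattern) else Fraction(0)
    # First element random: the third is random too (uniform over positions).
    p_random = half * Fraction(1, n_positions)
    return p_pattern + p_random

def trial_type(first, third, third_is_pattern, pattern):
    """Label the last element of a triplet (trills/repetitions not handled)."""
    if third_is_pattern:
        return "pattern"
    high = third == next_in_pattern(first, pattern)
    return "random-high" if high else "random-low"

pattern = [2, 1, 3, 4]
p_high = triplet_probability(2, 1, pattern)  # triplet 2-x-1
p_low = triplet_probability(1, 2, pattern)   # triplet 1-x-2
```

Using exact fractions keeps the two cases of the argument visible: one half times one (predetermined) plus one half times one quarter (random) for a high-probability triplet, and only the second term for a low-probability triplet.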
This sequence structure allows the distinction between (i) general skill learning, measured by a decrease in RT for all trials, (ii) statistical learning, measured by the difference in RT between the random-high trials and the random-low trials (because they end two types of triplets that appear randomly, but random-high trials are more frequent than random-low trials), and (iii) high-order deterministic learning, measured by the difference in RT between pattern trials and random-high trials (because they end two types of triplets that are similar in terms of sequence but pattern trials, unlike random-high trials, are predictable) 24,47 .

Procedure: Experiment 1
Sixty-three participants took part in this experiment. They each performed two sessions separated by 12 h. Each session contained a total of 13 blocks of SRT task, with the 6th and the 12th block displaying random sequences. Behavioral performance during random blocks was discarded from the analyses (but is visible in Fig. 2a for illustration purposes). After each block, the average speed and accuracy for the most recent block were displayed to the participants, and they could have a short break before starting the next block by pressing a button. The average block duration across participants and blocks was 31.33 ± 5.11 s. The average break duration across participants and breaks was 24.26 ± 19.83 s.

Procedure: Experiment 2
One hundred and eighty participants took part in this experiment. They each performed one session of 45 blocks of ASRT task. After each block, the average speed and accuracy for the most recent block were displayed to the participants, and they could have a short break before starting the next block by pressing a button. After 15 blocks and 30 blocks, participants had a more extended break and filled out questionnaires. The average block duration across participants and blocks was 46.45 ± 3.34 s. The average short break duration across participants and blocks was 18.75 ± 10.7 s. The average break duration for the two longer breaks with questionnaires was 258.0 ± 99.75 s.

Procedure: Experiment 3
Twenty-five participants took part in this experiment. They each performed 8 sessions of 25 blocks of ASRT task. Each session was a week apart. After each block, the average speed and accuracy for the most recent block were displayed to the participants, and they could have a short break before starting the next block by pressing a button. The average block duration across participants and blocks was 41.79 ± 3.78 s. The average break duration across participants and breaks was 18.56 ± 3.31 s.

Learning measures and statistical analyses
General skill learning was defined as a decrease of RT for all trials across blocks. In ASRT tasks, general skill learning was also tested considering random-low trials only. Statistical and high-order rule learning were measurable only in ASRT experiments. Statistical learning was defined as an increase of RT difference between random-low and random-high trials (RT_random-low − RT_random-high) across blocks 48 . High-order rule learning was defined as an increase of RT difference between random-high and pattern trials (RT_random-high − RT_pattern). High-order rule learning takes a high number of trials or sessions in ASRT to become visible. Indeed, in the current study, it was only observable in the long ASRT task (Experiment 3, see "Results" section). To estimate general skill learning, one-way repeated-measure ANOVA on the average RT per block with block as a within-subject factor was used. Main effect of block is reported. To estimate statistical and high-order rule learning, two-way repeated-measure ANOVA on the average RT per block with block and triplets (random-low and random-high trials for statistical learning and random-high and pattern trials for high-order rule learning) as within-subject factors was used. The block × triplet interaction is reported. Greenhouse-Geisser correction was applied to the reported p values. Additionally, Spearman correlation between learning measures (block-average RT for general skill learning or block-average difference in RT between two types of triplet for statistical and high-order rule learning) and block position was used. To measure the online (over practice blocks) and offline (over rest periods) contribution to each type of learning, in both SRT and ASRT tasks, each block was binned into five bins. Each bin corresponds to 12 trials (one 12-element sequence) in the SRT task and 16 trials (two 8-element sequences) in the ASRT task.
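The binning and the two RT-difference measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `bin_learning_scores`, the trial labels, and the convention of marking discarded trials with NaN are all hypothetical, and NumPy is used only for convenience.

```python
import numpy as np

def bin_learning_scores(rt, trial_type, n_bins=5):
    """Per-bin learning scores for one ASRT block (warm-up already removed).

    rt: RTs for the block, with NaN for discarded trials (assumed convention).
    trial_type: labels "pattern", "random-high" or "random-low" per trial.
    Returns a dict of arrays with one value per bin.
    """
    rt = np.asarray(rt, dtype=float)
    trial_type = np.asarray(trial_type)
    # Split trial indices into consecutive bins (16 trials each for 80 trials).
    bins = np.array_split(np.arange(rt.size), n_bins)

    def mean_rt(idx, label=None):
        sel = rt[idx] if label is None else rt[idx][trial_type[idx] == label]
        sel = sel[~np.isnan(sel)]
        return sel.mean() if sel.size else np.nan

    general = np.array([mean_rt(i) for i in bins])
    low = np.array([mean_rt(i, "random-low") for i in bins])
    high = np.array([mean_rt(i, "random-high") for i in bins])
    pat = np.array([mean_rt(i, "pattern") for i in bins])
    return {
        "general": general,          # mean RT over all trials
        "statistical": low - high,   # RT_random-low - RT_random-high
        "high_order": high - pat,    # RT_random-high - RT_pattern
    }
```

Both difference scores are signed so that larger values mean more learning, matching the definitions in the text; the general skill measure is a plain mean RT, so a decrease indicates learning.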
Online learning was measured as the difference in learning between the last bin of a block and the first bin of the same block. Offline learning was measured as the difference in learning between the first bin of a block and the last bin of the previous block (Fig. 2b). For general skill learning, as learning is defined as a decrease in RT, online and offline measures were reversed so that learning appears positive on the violin plots (Fig. 2c, d, f, g, i, j). One-sample two-tailed t tests against zero were used to assess whether learning occurred during practice (online) or rest (offline) periods, and paired t tests were used to compare learning during practice and rest. Effect sizes were evaluated using Cohen's d measure.
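A minimal sketch of the online/offline decomposition is given below. It assumes the per-bin learning scores are already arranged as a blocks × bins array; the function names `online_offline` and `one_sample_t_and_d` are hypothetical, and the one-sample Cohen's d is taken here as the mean divided by the sample standard deviation, which is an assumption about the exact formula used.

```python
import numpy as np

def online_offline(scores):
    """Split learning into online and offline changes.

    scores: array of shape (n_blocks, n_bins) with one learning score per bin.
    Returns (online, offline): one online value per block and one offline
    value per rest period between consecutive blocks.
    """
    scores = np.asarray(scores, dtype=float)
    online = scores[:, -1] - scores[:, 0]      # last bin minus first bin, same block
    offline = scores[1:, 0] - scores[:-1, -1]  # first bin minus last bin of previous block
    return online, offline

def one_sample_t_and_d(x):
    """One-sample t statistic against zero and Cohen's d (mean / SD)."""
    x = np.asarray(x, dtype=float)
    d = x.mean() / x.std(ddof=1)
    t = d * np.sqrt(x.size)
    return t, d

# Toy example: three blocks of five bins each.
scores = [[0, 1, 2, 3, 4],
          [2, 3, 4, 5, 6],
          [5, 6, 7, 8, 9]]
online, offline = online_offline(scores)
```

For general skill learning, where the score is a raw RT, the signs of both outputs would be flipped so that improvement is positive, as described in the text.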

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
All data (https://github.com/romquentin/Learning_during_practice_and_rest) are available online. Further information and requests for resources should be directed