Cognitive control training with domain-general response inhibition does not change children’s brains or behavior

Cognitive control is required to organize thoughts and actions and is critical for the pursuit of long-term goals. Childhood cognitive control relates to other domains of cognitive functioning and predicts later-life success and well-being. In this study, we used a randomized controlled trial to test whether cognitive control can be improved through a pre-registered 8-week intervention targeting response inhibition in 235 children aged 6-13 years and whether this leads to changes in multiple behavioral and neural outcomes compared to a response speed training. We show long-lasting improvements in closely related measures of cognitive control at the 1-year follow-up; however, training had no impact on any behavioral outcomes (decision-making, academic achievement, mental health, fluid reasoning and creativity) or neural outcomes (task-dependent and intrinsic brain function and gray and white matter structure). Bayesian analyses provide strong evidence for the absence of training effects. We conclude that targeted training of response inhibition does little to change children's brains or their behavior.


Article
https://doi.org/10.1038/s41593-024-01672-w

There is broad consensus that these functions can be improved through training, albeit in a relatively narrow and often task-specific manner (that is, near transfer) 23,24. However, changes in other distally related domains of cognitive functioning and real-world outcomes (that is, far transfer) have been much less consistently observed 22,23,25-32. Although views differ on whether cognitive training can actually lead to far transfer, the quality of evidence has been consistently questioned 33,34. Given the likelihood of small effect sizes, criticisms have focused on underpowered samples and poorly specified training mechanisms 24,33,35. Furthermore, training regimes often lack core features minimally required for far transfer, such as continuously variable, diverse and complex input 18,36,37, and assessment of training-related outcomes focuses mostly on only short-term effects and a limited number of outcome measures 29. Finally, the frequent absence of active control groups prohibits drawing any inference on the reasons, let alone mechanisms, for any transfer effects.

Here we address whether cognitive control training transfers onto other domains of functioning. We do so in a highly powered sample of children, using best practice recommendations for training regimes in terms of diversity, complexity and variability of training input 33,36 and assessing a wide array of behavioral and neural outcome measures both short and long term. Unlike most cognitive control interventions, which focus on working memory training 22, in the present study we targeted 'response inhibition' as the primary mechanism of action. Inhibition involves a set of highly relevant and widely used processes, including response inhibition or stopping, response selection and contextual monitoring 38. As such, inhibition may offer a set of cognitive control processes that lend themselves well to training in terms of their domain-general nature as well as the specifically identified training mechanism 39-44. Using a randomized controlled trial, we assessed the impact of an 8-week cognitive control training with response inhibition as the active ingredient in our experimental group. We compared performance changes on a host of outcome measures with an active control group training response speed, before and after training as well as at a 1-year follow-up. Outcome measures were chosen based on their well-established relationship with cognitive control and response inhibition specifically and included social and intertemporal decision-making 4-6,45, academic achievement 7,8, fluid reasoning 46, mental health (that is, internalizing and externalizing symptoms) 9,47 as well as creativity 48. To understand the underlying neurocognitive basis of potential training effects, we also sampled a wide array of neural indices of brain function, structure and connectivity. In addition to whole-brain analyses, we focused on regions implicated in cognitive control, including the inferior frontal gyrus (IFG) 38,49 and cingulo-opercular and fronto-parietal networks (CONs and FPNs, respectively) 50. In addition to assessing the impact of the training regime as a whole, we sought to test two recent hypotheses concerning cognitive control training, namely (1) that far transfer effects emerge only over time 29 and (2) that near transfer effects mediate far transfer effects 51.

(Fig. 1 | Training metrics. a, Motivation, which decreased over the training weeks (F(6, 308.75) = 16.42, P < 0.001) and was similar between groups. b, Number of sessions trained. c, Change in the trained function across training weeks.)

Finally, we made use of the occurrence of a naturally occurring stressor, coronavirus disease 2019 (COVID-19), to test the commonly held view that cognitive control might buffer against the onset of mental health problems 47,52. Training duration was chosen to be 8 weeks, which was previously shown to be sufficient for far transfer 26,29. We developed a highly motivating gamified interface to train response inhibition through variations of the stop-signal task (Experimental Group) or response speed (Control Group). Both groups received identical training in terms of narrative, stimuli and intensity, and the only difference between the groups was how participants were instructed to respond to the stop stimuli (inhibit for the Experimental Group and respond for the Control Group). Training involved a high degree of variation of training contexts and mechanisms and further ensured adaptiveness of the training protocol (Supplementary Figs. 2 and 3) by means of trial-by-trial adaptation (using a staircase procedure) based on performance, such that trials were scaled appropriately to individual abilities for both groups. We refer to closely related domains as 'near transfer', which are outcome measures with a highly similar task structure to what was trained 53. Everything else we refer to as 'far transfer'.

Power calculations estimated that detecting even a small Group-by-Session interaction effect of f = 0.1 with a power of 0.95, at an alpha Bonferroni corrected for the present number of measures (19; corrected alpha = 0.0025), requires a minimum sample size of 119 participants. The present sample of 235 children is almost twice that and, therefore, amply powered. Leveraging such a large sample also allows us to establish evidence of the absence of the effects of cognitive control training by using Bayes factor (BF) hypothesis testing 54. All main hypotheses and analyses for this study were pre-registered: https://osf.io/bn75g/. Correction to control for false discovery rate (FDR) with multiple testing of pre-post training effects was done using the Benjamini-Hochberg procedure 55.
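The Benjamini-Hochberg correction referred to above can be sketched as follows (a generic illustration of the procedure, not the authors' analysis code; the function name is ours):

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg-adjusted P values controlling the FDR.

    P values are ranked ascending; each is multiplied by n/rank, and a
    running minimum taken from the largest rank downwards keeps the
    adjusted values monotone and capped at 1.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, four raw P values of 0.01, 0.04, 0.03 and 0.005 adjust to 0.02, 0.04, 0.04 and 0.02, respectively; an adjusted value below the chosen alpha is then reported as P FDRcorr.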

Associations between cognitive control and outcome measures
We first tested how cognitive control performance was associated with each of our outcome measures. To remove task-related variance specific to any single assessment of cognitive control, we obtained a single factor of cognitive control derived from multiple cognitive control measures (Methods). We observed significant associations between cognitive control performance and several of the outcome measures in the expected direction (Extended Data Fig. 1): delay of gratification (that is, percentage of delayed choices in the intertemporal choice task; t(226) = 2.44, P = 0.015); academic achievement (t(217) = 2.53, P = 0.012); fluid reasoning (that is, Wechsler Abbreviated Scale of Intelligence (WASI) scores; t(216) = 2.27, P = 0.024); externalizing symptoms (t(184) = −2.15, P = 0.032); as well as mean diffusivity of right fronto-striatal tracts (t(145) = −2.81, P = 0.005). Cognitive control performance was, thus, correlated with a host of other outcomes, as commonly reported in the literature 7-9,45.

Both groups completed a similar number of sessions (Experimental Group: n = 16.60 ± 8.35; Control Group: n = 16.99 ± 8.55); no significant difference was observed in the amount trained between the groups (t(205.33) = 0.33, P = 0.740; BF10 = 0.16; Fig. 1b). To assess whether each group improved on the trained cognitive function throughout the intervention, we examined changes over the training sessions in the stop-signal reaction time (SSRT; Experimental Group) and the 'go-signal' reaction time (Go RT; Control Group), respectively. For this, we looked at the slope of change in each trained cognitive function using a mixed model with training weeks added as a predictor. There was a main effect of Session whereby both groups improved on their trained cognitive functions over the training weeks (Experimental Group: F(1, 2292.60) = 121.30, P < 0.001, η² = 0.05; Control Group: F(1, 3197.5) = 185.57, P < 0.001, η² = 0.05; Fig. 1c). Thus, groups did not differ in training intensity or motivation and showed moderate improvements during training in the targeted processes.
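The slope-of-change analysis described here used a mixed model; a simplified two-stage stand-in (a least-squares slope per child, then a one-sample t statistic on those slopes) conveys the same idea. This is a hedged sketch with hypothetical data shapes, not the authors' model:

```python
import math
from statistics import mean, stdev

def ols_slope(weeks, scores):
    """Least-squares slope of score on training week for one child."""
    wbar, sbar = mean(weeks), mean(scores)
    num = sum((w - wbar) * (s - sbar) for w, s in zip(weeks, scores))
    den = sum((w - wbar) ** 2 for w in weeks)
    return num / den

def group_improvement(per_child_data):
    """Two-stage approximation of the mixed-model slope test: fit one
    slope per child, then test whether the mean slope differs from zero
    (one-sample t statistic). Returns (mean_slope, t).
    """
    slopes = [ols_slope(weeks, scores) for weeks, scores in per_child_data]
    m, sd = mean(slopes), stdev(slopes)
    t = m / (sd / math.sqrt(len(slopes)))
    return m, t
```

With SSRT scores (ms) per week, a negative mean slope and large negative t would indicate group-level improvement; the mixed model additionally pools information across children and handles unbalanced data.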

Short-term training-related changes
Near transfer. As a primary measure of near transfer, we looked at the probability of successful stopping and response times to 'go' stimuli. The latter is of interest both for indexing training success for the response speed group and for providing a measure of proactive slowing 56 for the Experimental Group. A mixed model revealed a significant interaction between Session and Group in the probability of successful stopping in the SSRT (F(1, 221.00) = 27.31, P FDRcorr < 0.001, η² = 0.11; Fig. 2a). Follow-up paired t-tests comparing pre-post training scores revealed that the probability of successful stopping increased in the Experimental Group (t(215) = −5.96, P < 0.001). However, no significant change was found in the Control Group (t(218) = 1.43, P = 0.92). We also observed a significant interaction between Session and Group in Go RT (F(1, 227.28) = 31.75, P FDRcorr < 0.001, η² = 0.12; Fig. 2b). Follow-up paired t-tests comparing pre-post training scores revealed that reaction times increased in the Experimental Group (t(228) = −5.02, P < 0.001) and decreased in the Control Group (t(228) = 2.94, P = 0.021).

Far transfer-behavioral indices. Cognitive control.
Training cognitive control was operationalized by targeting response inhibition. We assessed the impact of training response inhibition on other subprocesses associated with cognitive control (that is, inhibition as measured by tasks other than the SSRT, shifting and working memory). Given the potentially different impact of training on speed and accuracy 57, we performed factor analyses across all cognitive control tasks separately for error rates and reaction times (Methods). This yielded two factors for error rates (one jointly for inhibition and shifting and one for memory) and a single factor for reaction times. For error rates, there was a Session-by-Group interaction for the inhibition/shifting factor (F(1, 215.68) = 10.678, P FDRcorr = 0.006, η² = 0.05; Fig. 3a). Follow-up paired t-tests, however, revealed that neither group changed significantly from pre-training to post-training. For the memory factor, there was no Session-by-Group interaction (F(1, 212.72) = 0.090, P = 0.764, η² < 0.001, BF10 = 0.188; Fig. 3b). For the reaction time factor, there was a significant Session-by-Group interaction (F(1, 213.71) = 18.60, P FDRcorr < 0.001, η² = 0.08; Fig. 3c). Pre-post t-test comparisons revealed an increase from pre-training to post-training in the Experimental Group (t(213) = −2.94, P = 0.022) and a decrease in the Control Group (t(212) = 3.16, P = 0.011).

Decision-making. For the role of the proposer in the Dictator Game (DG), there was no significant Session-by-Group interaction for coins shared (F(1, 199.18) = 0.144, P = 0.705, η² < 0.001, BF10 = 0.201; Fig. 3d). For the role of the responder in the Ultimatum Game (UG), there was no significant Session-by-Group interaction for offers accepted (F(1, 196.49) = 2.36, P = 0.126, η² = 0.01, BF10 = 0.176; Fig. 3e). In the intertemporal choice task, there was no significant Session-by-Group interaction in the total percentage of delayed choices (F(1, 203.60) = 1.01, P = 0.317, η² = 0.004, BF10 = 0.150; Fig. 3f).
Cortical thickness. To assess potential training-related changes in cortical gray matter structure, we looked at the whole brain. There was no significant interaction between Session and Group for any voxel.

Long-term training-related changes
Near transfer. We also tested whether any training-related changes might persist or, indeed, emerge over time, as was asserted previously 29, by comparing performance on outcome measures between training groups 1 year after training. For the probability of successful stopping in the SSRT, there was a significant interaction between Session and Group (F(1, 227.16) = 8.68, P FDRcorr = 0.018, η² = 0.04; Fig. 5a). Follow-up paired t-tests revealed that the probability of successful stopping remained increased in the Experimental Group (t(217) = −4.38, P = 0.001) after 1 year; however, no significant change was found in the Control Group (t(218) = −0.202, P = 1.000). For reaction time to the 'go' signal, there was a significant interaction between Session and Group (F(1, 235.94) = 13.32, P FDRcorr < 0.003, η² = 0.05; Fig. 5b). Follow-up paired t-tests revealed that reaction times remained elevated in the Experimental Group (t(231) = −6.992, P < 0.001); however, no significant change was found in the Control Group (t(230) = −1.844, P = 0.399).

Mediation of far transfer by near transfer
A common explanation for the large heterogeneity in far transfer effects across training studies is that far transfer depends crucially on whether near transfer is achieved 51. We therefore examined whether changes in near transfer were in any way predictive of changes in far transfer. Our measure of near transfer was the probability of successful stopping. We found that near transfer was not predictive of performance change on any far transfer measure.

Training effect on mental health after COVID-19 lockdown
Much research has been dedicated to establishing that cognitive control might serve as a buffer against the onset of mental health problems 47,52. Although our present sample was not at risk, data collection took place during COVID-19, which presented considerable challenges to mental health due to school closures and lockdowns 59. We examined whether training cognitive control would buffer against any negative impact of COVID-19 measures on mental health. We studied apathy and mental health using the Apathy Evaluation Scale, clinical version (AES-C), and the Strengths and Difficulties Questionnaire (SDQ) for ages 4-17 years before and after the COVID-19 lockdown. We found that both groups were similar in terms of positive cases of COVID-19 as well as perceived stress (Supplementary Tables 5 and 6). Crucially, although we found a significant increase in apathy after the COVID-19 lockdown (F(1, 178.29) = 29.82, P < 0.001; Extended Data Fig. 2a), this was not buffered by response inhibition training (F(1, 178.78) = 0.014, P = 0.905, η² < 0.001, BF10 = 0.188; Extended Data Fig. 2a). There was also no buffering effect of training on strengths and difficulties scores after the COVID-19 lockdown (F(1, 154.32) = 3.05, P = 0.083, η² = 0.008, BF10 = 0.141; Extended Data Fig. 2b).

Controlling for socioeconomic status
To test for the robustness and generalizability of our effects, we re-ran all analyses of short-term and long-term near and far transfer effects while also controlling for socioeconomic status (SES).Controlling for SES did not change any of the outcomes.

Discussion
The critical role of cognitive control in healthy and productive development and positive later-life outcomes has attracted tremendous interest from researchers and policymakers seeking to understand how cognitive control development can be supported. However, consensus on whether this is possible has been difficult to reach. In this study, we addressed whether cognitive control can be improved by means of a targeted response inhibition training and whether such training has a lasting, wider impact on cognitive and neural functioning. We developed an 8-week intervention, which was administered to a highly powered sample of 235 6-13-year-old children in a pre-registered randomized controlled trial including an active control group training response speed. We found that our training led to specific improvements in the trained functions (that is, response inhibition and response speed), which lasted up to 1 year after training. We further found that response inhibition training led to more cautious responding on cognitive control tasks more generally, but it did not produce any other behavioral or neural changes (Tables 1 and 2). In sum, response inhibition training appears to do little to alter children's brains or their behavior in long-lasting ways.
Research on the effectiveness of cognitive control interventions has been riddled with contradictory findings 32,60,61. However, consensus exists that this is best arbitrated by high-quality evidence 33, namely through randomized controlled trials with an active control group 33,34 and clearly defined training mechanisms 24,33,35 implemented in a variable, dynamic and adaptive training schedule 18,36,37 across a large sample of participants and with a comprehensive set of outcome measures taken at multiple timepoints. The present study represents such an approach, following current best practices of the field 33,36 to interrogate whether a core facet of cognitive control (response inhibition) can be improved and whether this leads to changes in other domains of functioning. We found that each group improved throughout the intervention on its trained process and that training effects remained present up to 1 year after the end of training, suggesting that the training was highly effective at improving the targeted cognitive processes. We also found that the proactive slowing exhibited in the Experimental Group became manifest as general slowing on other cognitive control tasks. Although it has been shown that training response inhibition can increase proactive control 62, the absence of reduced errors on cognitive control tasks in the present study suggests that such slowing does not bestow any strategic advantage. The fact that the two training groups improved on the targeted functions strengthens the evidence of absent training effects on any far transfer measure or underpinning neurocognitive outcome. Bayesian analyses demonstrate evidence of the absence of transfer effects on any of the tested domains or brain mechanisms implicated in cognitive control.

Furthermore, the present study addresses two recent hypotheses for the large heterogeneity in far transfer effects. First, it has been argued that far transfer depends crucially on whether near transfer is achieved 51. We did not find evidence to support this claim in the current work. Similarly, it has also been argued that far transfer effects might emerge over time and can, therefore, be detected only by testing again at least 1 year after the end of an intervention 29. Again, we did not find any evidence for such effects. Finally, we were able to leverage the unique opportunity of COVID-19 as a large-scale and unintended stressor that occurred during the period of our study, allowing us to test a commonly held assumption, namely whether cognitive control training would buffer against the onset of mental health difficulties after a stressor 11,47. We did not find any evidence of such an effect, and, in fact, we found moderately strong evidence of the absence of an effect of appreciable magnitude. In sum, the present study provides evidence against the possibility of training cognitive control in targeted ways to improve associated domains of functioning, at least as instantiated through a response inhibition intervention.
A fundamental feature of virtually all cognitive control interventions is the attempt to bring about improvements by directly increasing the capacity of the targeted function (that is, extending the number of items held in working memory or accelerating the speed of inhibition or flexibility) 24. This approach is predicated on the assumption that cognitive control is a limited capacity or resource 63, with little regard for what might motivate its use. The present study demonstrates that such an approach does not impact children's behavior or underlying neural architecture, at least not through targeting response inhibition. Indeed, resource accounts of cognitive control, although popular for many years, are being debunked on both theoretical and empirical grounds 64 and replaced with theories that consider cognitive control as an inherently goal-oriented process 65,66. A growing body of empirical evidence and computational modeling has shown that cognitive control is assigned a value as a function of subjectively perceived effort and the likely reward or goal priority 65,67,68. Critically, these insights were recently leveraged successfully in the context of aiming to improve cognitive control. For instance, effort-contingent rewards introduced during cognitive control tasks, by means of objective assessments of effort, led to an increased preference for effort in new tasks, such as difficult arithmetic problems 69,70. In conjunction with the present finding that cognitive control cannot be changed through artificially inflating capacity, this raises the possibility that cognitive control could be improved in ways that lead to changes in other domains by targeting motivation and effort expenditure, something that has yet to be tested in developmental populations.
We note some limitations in the current work. Although the overall duration was longer than in other recent studies demonstrating far transfer 26,29, there is a possibility that the present training was insufficient in terms of dose or implementation. Furthermore, our sample came predominantly from above-average SES backgrounds. Although there is still some variability in SES, which, when accounted for, does not alter the results, we acknowledge that our findings may not generalize to other samples and, in fact, that such a training might be efficacious for children coming from lower SES backgrounds (although see ref. 71).
In conclusion, we followed best practice recommendations for designing cognitive trainings to test whether cognitive control can be improved in durable ways through training response inhibition and whether this leads to changes in associated domains of functioning, in a large sample of children and across a large number of outcome measures. Although the trained functions improved in both groups, and did so up to 1 year after training, and response inhibition training led to more cautious task responding generally, our training did not lead to changes in children's behavior or associated neural mechanisms. Given the considerable policy implications of how children can be supported in their development, these findings caution against further investment in seeking to improve response inhibition specifically, and cognitive control more generally, through trainings that canonically aim to boost these capacities wholesale.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41593-024-01672-w.

Mixed models were used to examine long-term training effects. Significant interaction effects between Session and Group were interpreted as the presence of training-related changes. Effect sizes were calculated for the interaction effect of Group and Session. The Benjamini-Hochberg procedure was then applied to the mixed-model analyses testing for training-related changes. For significant results, we report the adjusted P value after correction to control the FDR under multiple testing. For evidence of null effects, we report the BF in favor of the null model (the model without a Group-by-Session interaction) over the training model (the model with a Group-by-Session interaction) for each measure of interest. H0: the degree of change in the outcome measure between the two groups following training is the same; H1: the degree of change in the outcome measure between the two groups following training is different.

The University College London (UCL) ethics committee approved this study (protocol number: 12271/001). In accordance with this, written informed consent was obtained from parents, and assent was obtained from children after a description of the study was provided.
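One common way to obtain a BF comparing a null model (no Group-by-Session interaction) against an interaction model is Wagenmakers' BIC approximation. This is a hedged sketch of that approximation, not necessarily the computation used in this article (the authors may have used dedicated Bayesian software):

```python
import math

def bf01_from_bic(bic_null, bic_alt):
    """Approximate the Bayes factor in favor of the null model from the
    BIC of the null model (no Group-by-Session interaction) and of the
    alternative model (with the interaction), using
    BF01 ≈ exp((BIC_alt − BIC_null) / 2).
    Values > 1 favor the null; BF10 is the reciprocal.
    """
    return math.exp((bic_alt - bic_null) / 2.0)
```

For instance, if the interaction model's BIC exceeds the null model's by 4, BF01 ≈ e² ≈ 7.4, i.e., the data are roughly seven times more likely under the null.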

Study design
This study had four main phases. After an initial baseline data collection phase at pre-test, the 8-week computerized intervention was administered. This was followed by a post-test and, finally, a 1-year follow-up. Behavioral, questionnaire and neural data were collected at pre-test, post-test and 1-year follow-up to examine near transfer and far transfer changes independently. Due to disruptions to in-person testing during the COVID-19 pandemic, no magnetic resonance imaging (MRI) was obtained at the 1-year follow-up. Retention was 71.24% from pre-test to post-test and 99.40% from post-test to 1-year follow-up.

Training games.
Training was programmed on Gorilla Experiment Builder (https://gorilla.sc/), a platform for running behavioral research online.Training was presented in the form of a computerized web-based Treasure Game.The training was designed to last 8 weeks, with four recommended sessions per week, one taking place at school and three at home.Each session was programmed to take approximately 15 min.
Both groups received identical training in terms of narrative, stimuli and intensity (Supplementary Fig. 2). The only difference between the groups was how participants were instructed to respond to the stop stimuli (that is, inhibit for the Experimental Group and respond for the Control Group; further details are provided in the Supplementary Information). Once every week, questions regarding children's motivation were administered (Supplementary Information).
Experimental Group: response inhibition training. To train response inhibition, a stop-signal response task was used. Participants were instructed to press the spacebar on presentation of a 'go' signal. On stop trials, where a 'stop' signal appeared after the 'go' signal, participants were instructed to inhibit pressing the spacebar (see Supplementary Information Table 4 for specific descriptions of each training game and training mechanism). 'Go' and 'stop' signal stimuli and the inhibition mechanism varied according to the game being played. The stop-signal delay (SSD) was initially set at 200 ms. After successful inhibition, the SSD would increase by 50 ms, and, after failed inhibition, it would decrease by 50 ms 75,76. This ensured that the training was adaptive. Stop trials occurred on 26-47% of trials in each training session. To ensure adaptiveness across training sessions, the SSD of each subsequent session was taken from the final 'stop' trial of the preceding session on that specific training game.
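A one-up/one-down staircase of this kind (shown here in its conventional direction, in which the SSD lengthens after a successful stop so that stopping success converges near 50%) can be sketched as follows; the function name and the floor parameter are illustrative:

```python
def update_ssd(ssd, inhibited, step=50, floor=0):
    """One-up/one-down staircase for the stop-signal delay (SSD, ms).

    A longer SSD leaves less time to stop, so the task gets harder
    (SSD + step) after successful inhibition and easier (SSD - step)
    after failed inhibition, titrating stopping success toward ~50%.
    """
    ssd = ssd + step if inhibited else ssd - step
    return max(ssd, floor)  # never schedule a negative delay
```

Starting from the 200-ms initial value used in the training, a successful stop yields an SSD of 250 ms and a failed stop yields 150 ms.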
Control Group: response speed training. The response speed training was identical to the experimental condition in all aspects except that a response was required for all signals. Participants were instructed to press the spacebar as quickly as possible. To ensure that training was adaptive for this group, participants had to respond within a time window that was set based on a rolling average of the response times of the previous 10 trials plus two standard deviations. This ensured that the training was adaptive while minimizing the effect of outliers on the response threshold.
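A minimal sketch of this rolling deadline follows. The class name, the warm-up default and the behavior before 10 trials have accrued are our assumptions; the source specifies only the 10-trial rolling average plus two standard deviations:

```python
from collections import deque
from statistics import mean, stdev

class ResponseDeadline:
    """Adaptive response window for the speed training: the deadline is
    the rolling mean of the previous `window` reaction times plus two
    standard deviations, tracking ability while damping outliers.
    """

    def __init__(self, window=10, default_ms=1000.0):
        self.rts = deque(maxlen=window)  # keeps only the last `window` RTs
        self.default_ms = default_ms     # assumed fallback before data accrue

    def record(self, rt_ms):
        self.rts.append(rt_ms)

    def deadline(self):
        if len(self.rts) < 2:            # stdev needs at least two values
            return self.default_ms
        return mean(self.rts) + 2 * stdev(self.rts)
```

Because the deque caps its own length, each new trial automatically displaces the oldest one, giving the rolling window described in the text.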

Pre-post tasks
Three assessment timepoints took place onsite at the authors' laboratory: before the training (T0), after the training (T1) and at the 1-year follow-up (T2). Note that, due to the outbreak of the COVID-19 pandemic in March 2020, some participants completed one or more assessment timepoints online from home. The assessment battery included several child-friendly tasks measuring cognitive control and neural measurements as well as creativity, mental health and academic performance (Supplementary Fig. 1).

Cognitive control tasks
A total of nine cognitive control tasks were administered, assessing different functions (that is, inhibition, shifting and working memory). For all tasks, participants completed practice trials, in which they had to attain a criterion threshold for accuracy, before the main trials were administered. Additionally, comprehension questions were employed to ensure that participants understood the rules for each task (for example, 'What button should you press if you see a bear on the screen?'). Rules were re-explained if participants answered any of the questions incorrectly. The experimenter noted if the participant still failed to comprehend the task. All participants managed to pass these comprehension questions; therefore, no individual was excluded from the analysis. Tasks were presented using Presentation software (https://www.neurobs.com/, version 23). For remote testing during COVID-19, a subset of executive function tasks was administered online via Gorilla (https://gorilla.sc/) 77,78.

Inhibition tasks. SSRT task.
A measure of cognitive control was administered via a child-friendly version of the SSRT 79. Ten practice trials were administered before 80 trials of the main task. Each trial started with the presentation of a fixation cross for 1,250 ms. During the task, participants were asked to press the left arrow key when seeing the 'go' signal (that is, a honey pot) on the left side of the screen and the down arrow key when the signal appeared on the right side. On 25% of the trials (that is, 'stop' trials), a picture of bees was presented after the honey pot. This served as the 'stop' signal. The SSD started at 200 ms, increased by 50 ms after a successful 'stop' trial and decreased by 50 ms after an unsuccessful 'stop' trial. As a measure of inhibition, the mean SSRT (ms) was calculated using the integration method 80. Several studies have validated the SSRT as a measure of response inhibition 81, and it is correlated with self-report measures of impulsive behaviors in young adults 75.
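The integration method 80 estimates SSRT as the go-RT quantile matching the overall probability of responding on 'stop' trials, minus the mean SSD. A minimal sketch follows (implementations differ in details such as the handling of go omissions, which are ignored here):

```python
import math

def ssrt_integration(go_rts, p_respond_stop, mean_ssd):
    """Estimate SSRT (ms) with the integration method.

    The stop process is assumed to finish at the go-RT whose quantile
    equals p(respond | stop); SSRT is that quantile minus the mean SSD.
    """
    rts = sorted(go_rts)
    # nth fastest go RT, where n = p(respond | stop) * number of go trials
    idx = max(0, math.ceil(p_respond_stop * len(rts)) - 1)
    return rts[idx] - mean_ssd
```

For example, with 20 go RTs evenly spaced from 300 ms to 490 ms, a stop-respond rate of 0.5 and a mean SSD of 150 ms, the estimate is 390 − 150 = 240 ms.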
Flanker inhibition. The participants completed a child-friendly version of the Eriksen flanker inhibition task 82. Children were presented with a row of fish on the screen. They were required to focus on the fish in the center (named Chloe) and indicate the direction in which it was swimming (that is, a left key response was required when the fish was facing left and a down key response when the fish was facing right). Participants were told to ignore the direction in which the other fish swam and to indicate only the direction in which Chloe swam. On congruent trials, all fish faced the same direction. On incongruent trials, surrounding fish faced the opposite direction to Chloe. Fish were presented for 700 ms before they disappeared. Participants were given a maximum of 2,500 ms to respond from stimulus onset. A total of 20 congruent trials and 20 incongruent trials were administered. This task was chosen because it is child-friendly for ages 6 years and up and was validated in several studies 83,84. The differences in reaction times and error rates between incongruent and congruent trials were calculated separately.
Stroop. Participants completed a child-friendly version of the Stroop task 85. The task was introduced as the 'Farm Animal' game, in which participants were told to match animals to their homes (for example, dog to a kennel). They were presented with both auditory stimuli of an animal sound (for example, 'bark', 'meow' and 'croak' for dog, cat and frog, respectively) and visual stimuli of the animals. Crucially, participants were asked to match animals to where they live (for example, frog to a pond). They were told to listen carefully to an auditory cue indicating the animal type (for example, frog - 'ribbit') and not to pay attention to the visual cue of the animal presented on the screen. Trials lasted for 10,000 ms, within which participants had to make a response. Although audio stimuli were presented for 600 ms, visual stimuli were presented until participants made a response (maximum of 1,000 ms). A blank screen with a 'cross' was presented between trials for 10,000 ms (inter-trial interval (ITI)). On congruent trials, auditory and visual cues matched (for example, frog presented on screen and 'ribbit' tone played). On incongruent trials, auditory and visual cues did not match (for example, dog presented on screen and 'ribbit' tone played). Participants completed 72 trials in total, with 36 congruent and 36 incongruent trials. The differences in reaction times and error rates between incongruent and congruent trials were calculated separately.

Memory tasks. N-back.
Both the 1-back and 2-back tasks were administered to measure working memory 86. The task was adapted to be child-friendly and introduced as the 'Dino-Donut' game, where participants were told that dinosaurs were lining up to eat some donuts. For the 1-back task, participants were told to stop dinosaurs that tried to eat a donut twice in a row by pressing the spacebar if the same dinosaur appeared on consecutive trials. For the 2-back task, they were told that the dinosaurs became sneakier, and this time they should press the spacebar if the same dinosaur had appeared two trials prior. Stimuli were shown for 500 ms, followed by a 1,500-ms inter-stimulus interval (ISI). Responses had to be made before the onset of the next stimulus. Participants completed 80 trials in total, 40 for each N-back condition. As a measure of error rate, the false alarm rate was calculated for both the 1-back and 2-back tasks. Reaction times for correct responses were also calculated.
Corsi block-tapping task. Working memory span was assessed using the Corsi block-tapping task, which measures visuo-spatial working memory span, with higher values indicating a higher span 87. The task consisted of 'Freddy the frog' jumping between nine potential locations designed as lily pads. The participants followed the jumps by clicking on the lily pads in a forward sequence. Trials commenced with a countdown from 3 to 1 to alert participants to the start of a trial. The stimulus of the frog jumping was then shown for 600 ms for every jump, with the ISI fixed at 600 ms. Participants completed three practice trials with feedback, followed by 14 main trials. Initially, participants had to remember and click on two lily pads. The task employed an adaptive staircase design in which the working memory load (that is, the number of lily pads to remember) increased by one after two consecutive correct answers. The maximum working memory load attained was used as the working memory span measure.
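The adaptive staircase described above can be sketched as follows. This is an illustrative Python reconstruction under the stated rule (the load starts at two and increases by one after two consecutive correct trials; the score is the maximum load reached), not the task's actual implementation:

```python
# Sketch of the Corsi adaptive staircase; `responses` is a hypothetical
# list of per-trial outcomes (True = correct recall of the full sequence).
def corsi_span(responses, start_load=2, n_trials=14):
    load, streak, max_load = start_load, 0, start_load
    for trial in range(min(n_trials, len(responses))):
        if responses[trial]:
            streak += 1
            if streak == 2:      # two consecutive correct answers
                load += 1        # increase the working memory load by one
                streak = 0
        else:
            streak = 0           # an error resets the streak
        max_load = max(max_load, load)
    return max_load
```

For example, four correct trials, one error, then two correct trials would raise the load from 2 to 5 under this rule.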

Shifting tasks. Cognitive flexibility.
A child-friendly version of the cognitive flexibility task assessed participants' ability to switch rules across dimensions (using the sound cues 'animal' or 'size') 88. If the sound cue 'animal' was played, participants had to indicate whether the animal was a cat or a dog; if the sound cue 'size' was played, participants had to indicate whether the animal was big or small. Participants had 10 s to respond, during which the stimuli remained on the screen before the trial timed out. Responses made within 200 ms of stimulus onset were not recorded. The ITI was jittered and ranged from 1,000 ms to 1,200 ms. Stay trials were preceded by a trial with the same rule (for example, deciding on the type of animal twice in a row); on switch trials, the current trial was preceded by a trial in a different dimension (that is, participants first responded to the size of the animal and then to the type of animal presented). Participants completed 20 single-dimension trials in two blocks and, after a practice block, 40 mixed trials in one block, consisting of 28 stay trials and 12 switch trials. The differences in reaction times and error rates between switch and stay trials were calculated.
Flanker shifting. The participants completed a child-friendly version of the Eriksen flanker shifting task 88. Children were presented with a row of fish on the screen and were told that all the fish swim in the same direction. However, fish appeared in two colors: orange and purple. When orange fish were presented, participants were instructed to indicate the direction in which the fish swam (that is, left key response required when the fish faced left; down key response required when the fish faced right). When purple fish were presented, they were instructed to indicate the opposite of the direction in which the fish swam (that is, left key response required when the fish faced right; down key response required when the fish faced left). Fish were presented for 700 ms before they disappeared. Participants were given a maximum of 2,500 ms to respond from stimulus onset. Stay trials were defined as those where the rule of the previous trial was the same as that of the current trial (that is, a purple trial after a purple trial; an orange trial after an orange trial). Switch trials were defined as those where a rule change had occurred (that is, a purple trial after an orange trial; an orange trial after a purple trial). Based on this, there were 28 stay trials and 12 switch trials. The differences in reaction times and error rates between switch and stay trials were calculated.
Complex cognitive control tasks. AX-CPT. Reactive and proactive control were measured using a child-friendly version of the AX-CPT paradigm 89. The task was introduced as the 'Fruit Island' game. An 'A' or 'B' cue (that is, a dog or a cat) was presented in the middle of the screen for 500 ms, followed by an ISI of 750 ms and then a probe, 'X' or 'Y' (that is, an orange or an apple), during which participants had to make their response. Participants were instructed to press the left key whenever an 'X' followed an 'A' (that is, AX trials) and to press the down arrow key for all other cue-probe combinations. Importantly, they were instructed to respond only once the probe had been presented and were alerted if they responded before the probe appeared. Participants had a maximum of 6,000 ms to make a response, and responses were followed by an ITI of 1,500 ms. The proportions of the trial types were based on previous studies 89,90: 40% of trials were AX trials, and the other trial types (AY, BX and BY) each made up 20%. Trials were presented in random order. Ten practice trials with feedback were administered, followed by 60 main trials. The Proactive Behavioral Index (PBI) was calculated separately for error rates and reaction times 91.
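The PBI contrasts performance on AY and BX trials. A common formulation in the literature (for example, in Braver's work on proactive control) is PBI = (AY − BX)/(AY + BX), computed separately for error rates and reaction times; the sketch below uses this formulation with hypothetical values, as the exact computation is given in ref. 91:

```python
# Illustrative PBI computation; AY/BX summary values are hypothetical.
def pbi(ay, bx):
    """Proactive Behavioral Index; larger values indicate more proactive control."""
    return (ay - bx) / (ay + bx)

pbi_rt = pbi(ay=720.0, bx=600.0)   # mean reaction times on AY vs. BX trials
pbi_err = pbi(ay=0.30, bx=0.10)    # error rates on AY vs. BX trials
```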

Decision-making tasks
Participants were told that they would be playing a series of games where they could win monetary units (MUs) and exchange these for gifts at the end of the experiment. Participants were told that the more MUs they had at the end of all the games, the larger their gift would be. The reward was described in this abstract way to appeal to children of all ages and was previously found to be sufficiently motivating for children of this age and equally so across the age range 92,93.

DG. Participants were allocated six MUs, visually represented in the task as coins on a computer screen. In the offline sample, two boxes were presented, one for the child and one for their 'partner'. Children were told that they were playing with another child from a different school; in reality, there was no other participant. They were instructed to first click on the MU and then on the boxes to divide them, and they were informed that, once they had put an MU in a box, they could not change their decision. Counters at the side of the boxes kept track of the number of MUs in either box. During the task, the instructor explicitly informed the participant that they would turn away and not look at the screen. There was no response time limit. The DG measures pro-social decision-making as indicated by how many MUs a participant decides to give to another unknown child. In the online version, children determined their chosen distribution by moving a slider; in this sense, the online task required just one move to distribute the MUs. As in the offline version, children were told that they were playing with another child from another school whom they did not know, when, in reality, there was no other participant. Unlike in the offline sample, however, children could change their minds about their preferred distributions indefinitely and submit their final decision by pressing the spacebar on their computers. Parents were instructed to be present in the room during testing, engaged in an activity such as reading a book, and not to influence their children's participation.

UG.
The UG consisted of the responder role. Children could accept or reject a single offer of an unfair distribution (1/6) of MUs made by another unknown child in the study (in reality, a computer). If they rejected the offer, both the participant and the child who made the offer would receive zero MUs. For this game, there was, again, no response time limit.
Intertemporal choice task. Intertemporal decision-making was assessed using an intertemporal choice task, in which participants made choices between immediate and delayed reward options. This task measured, via participants' choices, the extent to which they discount rewards as a function of delay. Participants completed 18 trials (in a fixed order) where they were always presented with a choice between an immediate and a delayed option. The unit of delay was days, where every moon depicted indicated one additional day of waiting before the participant would receive their reward. The reward for the delayed option was always eight MUs, and the immediate reward option was two, four or six MUs. Participants' discounting was measured by calculating the percentage of delayed choices across trials.
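The discounting measure described above reduces to the percentage of trials on which the delayed option was chosen; a minimal sketch with hypothetical choice data (True = chose the delayed option):

```python
# Illustrative discounting measure: percentage of delayed choices.
def percent_delayed(choices):
    """choices: per-trial booleans; True means the delayed option was chosen."""
    return 100.0 * sum(choices) / len(choices)

choices = [True, False, True, True, False, True]  # 6 of the 18 trials, for brevity
pct = percent_delayed(choices)
```

A lower percentage indicates steeper discounting (a stronger preference for the smaller immediate reward).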

Academic performance
Academic performance scores were collected retrospectively from schools in the form of age-standardized English and Maths scores. Depending on the school, English tests included the Progress in Reading Assessment, Progress Test in English, Suffolk Reading Test and/or New Group Reading Test, and Maths tests included the Progress in Understanding Maths Assessment and/or Progress Test in Maths. As we did not have discipline-specific hypotheses, the main measure of overall academic performance was a composite age-standardized score computed for each participant as the average across all available English and Maths age-standardized scores for that participant; if a participant had scores for only one test or discipline, that score was used as the measure of overall academic performance.
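The composite rule described above (average all available age-standardized scores; fall back to the single available score) can be sketched as follows, with hypothetical score values:

```python
# Illustrative academic composite; test names abbreviate those listed above
# and the score values are hypothetical.
def academic_composite(scores):
    """scores: dict mapping test name -> age-standardized score (or None if missing)."""
    available = [s for s in scores.values() if s is not None]
    return sum(available) / len(available) if available else None

composite = academic_composite({"PiRA": 104.0, "PUMA": 98.0, "PTE": None})
```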

Creativity
Creativity was measured using the Torrance Test of Creative Thinking (TTCT) 94, the most widely used test of creativity [95][96][97]. The TTCT consists of verbal and figural versions; in the present study, we used TTCT-Figural form A. Participants were provided with a pencil, an eraser and a printed Torrance activity sheet. Following a protocol, participants were instructed to use 10 min to complete the given stimuli with unique answers and to come up with interesting titles that described their drawings. If participants finished in less than 10 min, they were encouraged to use the remaining time to add to their answers. The TTCT has high test-retest reliability and predicts later creative success 98.

Fluid intelligence
Fluid intelligence was measured using the WASI-II (ref. 99). The WASI consists of two parts: Matrix Reasoning and Vocabulary. WASI Matrix Reasoning measures non-verbal ability, which correlates well with fluid and visual intelligence, and the WASI Vocabulary subtest measures verbal ability, which correlates well with verbal IQ and crystallized intelligence. For Matrix Reasoning, participants were presented with 30 visually depicted incomplete matrices and asked to choose the one of five options that logically completed each matrix. For Vocabulary, participants were presented with 28 words, one at a time, and asked to verbally define or describe the word presented. The WASI-II has high reliability and validity 100 and provides a good estimate of intelligence.

SDQ.
The parent-report version of the SDQ 101 was used to measure internalizing and externalizing difficulties. The SDQ is a 25-item scale consisting of five subscales (emotional problems, conduct problems, peer relationship problems, prosocial behavior and hyperactivity/inattention), each of which includes five questions. Parents rate their child's behavior over the previous 6 months. Each question has the following response options: 0 = not true, 1 = somewhat true and 2 = certainly true. For each scale, the responses can be summed to provide a total score for that scale. In non-clinical samples, such as the one included in the present study, it has been recommended to combine the scales into two broader subscales representing 'internalizing' and 'externalizing' problems 102. The internalizing subscale is calculated by summing the emotional problems and peer relationship problems subscales, and the externalizing subscale is calculated by summing the hyperactivity/inattention and conduct problems subscales. We therefore used this approach in the present study. The SDQ has high validity and reliability 103.
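The broadband scoring described above can be sketched as follows; the item responses are hypothetical, and only the summation rule comes from the text:

```python
# Illustrative SDQ broadband scoring: each subscale sums five items scored 0-2;
# internalizing = emotional + peer problems, externalizing = hyperactivity +
# conduct problems. Item responses below are made up for illustration.
def subscale_sum(items):
    assert len(items) == 5 and all(r in (0, 1, 2) for r in items)
    return sum(items)

emotional = subscale_sum([1, 0, 2, 1, 0])
peer = subscale_sum([0, 0, 1, 0, 1])
hyperactivity = subscale_sum([2, 2, 1, 1, 0])
conduct = subscale_sum([0, 1, 0, 0, 0])

internalizing = emotional + peer
externalizing = hyperactivity + conduct
```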
Child and Adolescent Symptom Inventory-4R. The Child and Adolescent Symptom Inventory-4R (CASI-4R) 104 is a parent-report rating scale that evaluates behaviors related to the disorders included in the Diagnostic and Statistical Manual of Mental Disorders in young people aged 5-18 years. In the present study, the CASI-4R subscales relating to attention-deficit/hyperactivity disorder (ADHD), generalized anxiety disorder, major depressive episode, depressive disorder, conduct disorder, social phobia and separation anxiety were included. Parents were asked to rate their child's overall behavior. Each question has the following response options: 0 = never, 1 = sometimes, 2 = often and 3 = very often. Previous studies found that the CASI-4R has good test-retest reliability, validity and internal consistency 105.

Apathy Evaluation Scale, informant version
The Apathy Evaluation Scale, informant version (AES-I), was used to assess apathy 106 .The AES-I includes 18 items relating to cognitive, behavioral and emotional apathy.We asked parents to rate their child's behavior over the previous 4 weeks.Each question is rated on a four-point scale (not at all, slightly true, somewhat true and very true), with higher scores reflecting greater apathy.

MRI measures
MRI data were acquired with a standard whole-head coil on a 3.0-Tesla Siemens Prisma scanner at the Birkbeck-UCL Centre for Neuroimaging.To limit head motion, participants were asked to keep their heads as still as possible.Foam inserts were used between the head and the head coil to ensure a snug fit.Visual stimuli were projected onto a screen in the magnet bore that could be viewed via a mirror attached to the head coil.During the acquisition of the structural and diffusion tensor imaging (DTI) scan, participants watched cartoons without sound.

Pre-processing and statistical analysis
Outliers were removed for all measures: datapoints falling more than two standard deviations below or above the mean were excluded.
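The exclusion rule above can be sketched as follows; this illustration uses the sample standard deviation, which is an assumption, as the text does not specify the estimator:

```python
# Illustrative +/- 2 SD outlier exclusion (sample SD assumed).
from statistics import mean, stdev

def remove_outliers(xs, n_sd=2.0):
    """Keep datapoints within n_sd standard deviations of the mean."""
    m, sd = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) <= n_sd * sd]

cleaned = remove_outliers([10, 11, 12, 11, 10, 50])
```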

Cognitive control factors.
Outliers were removed from each cognitive control measure: datapoints falling more than two standard deviations below or above the mean were excluded. A confirmatory factor analysis (CFA) was then performed using 'lavaan' in RStudio to create latent factors of executive functions 107. For T0 data, multiple models were fit; however, most models failed to converge, and some displayed negative variances, suggesting that they were mis-specified. Only two models converged: a model with a single factor encompassing all tasks and a model with three subfactors of inhibition, shifting and memory. There was no significant difference in model fit (Δχ2(3) = 1.69, P = 0.638). The inhibition factor was extracted to examine correlations at T0 with the other domains. To examine training-related changes in executive functions, the factor analysis was conducted separately for error rate and reaction time data, because a factor solution could not be found when composite measures of error rates and reaction times were used. For the error rate factor specifically, inclusion of the flanker inhibition indices caused non-convergence, and these indices were excluded from the analysis. Based on previous literature, factor loadings were constrained across timepoints to allow for pre-post comparisons, establishing weak factorial invariance 26. Factor scores for each individual were extracted for further analysis, separately for error rates and reaction times, with larger values indicating larger error rates or longer reaction times.
Creativity. The TTCT responses were scored according to the Streamlined Scoring Guideline 108 with respect to five norm-based creativity measures: fluency, originality, abstractness of titles, elaboration and resistance to premature closure. A higher score in any of the five subcategories indicates more unique answers and higher levels of creativity. In the present study, all responses were scored by a single scorer, and a sum score of all five categories was used for the analyses. To establish consistency, the scorer scored a random sample of 10 responses twice, 2 weeks apart; 86% of the scores were consistent across the two scorings.
Mental health. A CFA was performed using 'lavaan' in RStudio to create latent factors of mental health 107. Based on previous literature, factor loadings were constrained across timepoints to allow for pre-post comparisons, establishing weak factorial invariance 26. Factors of externalizing and internalizing problems were created. Specifically, externalizing problems from the SDQ and CASI-ADHD problems loaded on the externalizing factor; internalizing problems from the SDQ, CASI-social phobia, CASI-separation anxiety and CASI-depression loaded on the internalizing factor. Factor scores for each individual were extracted for further analysis, where a larger value indicated greater mental health problems.

MRI measures.
Task-related functional MRI. Each individual's functional scans were realigned to correct for head motion, first to the first image and then to the mean image. The realigned scans were co-registered with anatomical T1-weighted images and spatially normalized to the standard Montreal Neurological Institute (MNI) space by resampling to a voxel size of 2 × 2 × 2 mm3. Normalized images were smoothed with an 8-mm Gaussian filter. Fixed statistical effects were calculated at the individual level by modeling each trial condition ('stop' successful, 'stop' unsuccessful, 'go' successful and 'go' unsuccessful) with a boxcar function convolved with the canonical hemodynamic response function. To reduce movement-related artifacts, six motion parameters were included as regressors, as well as an additional regressor to model images that were corrupted by head motion of more than 1.5 mm and were replaced by interpolations of adjacent images (<10% of a participant's data). To examine training-related changes from pre-test to post-test in the 'stop' versus 'go' trial conditions, the Sandwich Estimator Toolbox for Longitudinal and Repeated Measures Data version 2.1.0 was employed (SwE toolbox for SPM; Guillaume et al. 109). A repeated-measures ANOVA was conducted at the group level, with the 'stop' successful and 'go' successful conditions entered as fixed effects and a subject factor entered as a random effect. Family-wise error (FWE) corrections at P < 0.05 were applied to the data. Moreover, using the MarsBaR Toolbox 110 implemented in SPM12, we extracted functional activity from the right IFG selected from the probabilistic Harvard-Oxford atlas 111 (thresholded at 20%; center of mass: 51, 28, 8). Beta values for each ROI (that is, successful 'stop' trials versus successful 'go' trials) were extracted for further statistical analyses outside of SPM.
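The first-level GLM step described above (a condition boxcar convolved with a hemodynamic response function, with betas estimated per regressor) can be sketched in simplified form. Everything below is synthetic and uses a single-gamma HRF for brevity; it is not the SPM12 implementation:

```python
# Illustrative GLM sketch: 'stop' onsets -> boxcar -> HRF convolution -> OLS beta.
from math import exp, gamma

def hrf(t, shape=6.0):
    """Simple gamma-shaped hemodynamic response (peaks a few seconds post-onset)."""
    return t ** (shape - 1) * exp(-t) / gamma(shape) if t > 0 else 0.0

tr, n_scans = 2.0, 100
kernel = [hrf(i * tr) for i in range(16)]             # HRF sampled at the TR

onsets = {10, 30, 50, 70}                              # hypothetical 'stop' onsets (scans)
boxcar = [1.0 if i in onsets else 0.0 for i in range(n_scans)]

# Convolve the boxcar with the HRF kernel, truncated to the scan length.
regressor = [sum(boxcar[i - k] * kernel[k] for k in range(len(kernel)) if i - k >= 0)
             for i in range(n_scans)]

true_beta, intercept = 2.5, 0.5
y = [true_beta * x + intercept for x in regressor]     # noise-free synthetic BOLD

# Ordinary least squares for y = beta * x + c via the normal equations.
n = n_scans
sx, sy = sum(regressor), sum(y)
sxx = sum(x * x for x in regressor)
sxy = sum(x * yi for x, yi in zip(regressor, y))
beta = (n * sxy - sx * sy) / (n * sxx - sx * sx)
c = (sy - beta * sx) / n
```

With noise-free data the fitted beta recovers the generating weight, which is the quantity extracted per ROI for the 'stop' versus 'go' contrast.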
Cortical thickness. After converting the DICOM files to NIfTI using dcm2niix, structural MRI images were processed with FreeSurfer 112 (version 6.0.0, http://surfer.nmr.mgh.harvard.edu) to label and segment cortex and white matter. All scans were then visually inspected for quality and, if necessary, segmentation was manually corrected in FreeSurfer. Four independent inspectors conducted these checks, and one final inspector performed a final inspection of all scans. After corrections, scans were re-segmented using FreeSurfer. Scans of inadequate quality were excluded from the final analysis; based on this, data were available from 141 participants. After pre-processing, sulcal and gyral features across individual participants were aligned by morphing each participant's brain to an average spherical representation that accurately matches cortical thickness measurements across participants while minimizing metric distortion. A 10-mm Gaussian smoothing kernel was applied to the data to reduce measurement noise while preserving the capacity for anatomical localization 113,114. Cortical thickness data were analyzed using the SurfStat toolbox for MATLAB 115 (https://www.math.mcgill.ca/keith/surfstat). Findings from the surface-based analyses were controlled for multiple comparisons using random field theory 5,113,115, reducing the chance of reporting a family-wise error. We ran whole-brain models examining changes in cortical thickness after training by testing for a Session-by-Group interaction. Using the Desikan-Killiany atlas 116, cortical thickness was extracted from the right IFG (comprising the right pars triangularis, pars opercularis and pars orbitalis) to examine the specific interaction within this region.
Resting state. Processing of resting-state functional connectivity (RSFC) data was completed with the ABCD-HCP pipeline (https://github.com/DCAN-Labs/abcd-hcp-pipeline), which is modified from the original HCP pipelines 117. In brief, this pipeline consists of six stages. First, the PreFreeSurfer stage normalizes anatomical data. This normalization includes brain extraction, denoising and then bias field correction on anatomical T1-weighted and/or T2-weighted data. To improve output image quality, ANTs DenoiseImage attempts to remove scanner noise from T1 and T2 anatomical images by modeling scanner noise as a Rician distribution, and ANTs N4BiasFieldCorrection attempts to improve bias field correction. Second, the FreeSurfer stage constructs cortical surfaces from the normalized anatomical data. This stage also performs surface registration to a standard surface template, and surfaces are refined using the T2-weighted anatomical data. Third, the PostFreeSurfer stage transforms the volumes to a standard volume template space using ANTs nonlinear registration and the surfaces to the standard surface space via spherical registration. Fourth, the fMRIVolume stage performs processing of the functional data, including correction for functional distortions via reverse-phase encoding spin echo images, intensity normalization to a whole-brain-mode value of 1,000, within-run correction for head movement and registration to the standard template. Fifth, the fMRISurface stage maps the normalized functional volumes to the standard surface template. The BOLD functional MRI volumetric data were sampled to each participant's original mid-thickness left and right hemisphere surfaces, constrained by the gray matter ribbon. These surfaces were then combined with volumetric subcortical and cerebellar data into the CIFTI format using Connectome Workbench (https://www.humanconnectome.org/software/connectome-workbench), creating full-brain timecourses excluding non-gray matter tissue. The resting-state timecourses were then smoothed with a 2-mm full-width at half-maximum kernel applied to geodesic distances on surface data and Euclidean distances on volumetric data. Finally, the DCANBOLDproc stage performs further denoising steps to reduce variance unlikely to reflect neuronal activity. These denoising steps include a respiratory filter to improve framewise displacement estimates, temporal masks to flag motion-contaminated frames with a filtered framewise displacement greater than 0.3 mm, demeaning, detrending, interpolation across censored frames and a band-pass filter (0.008 Hz < f < 0.1 Hz).
After processing, time series of RSFC data were extracted using the Gordon-333 parcellation 118, which includes 333 parcels (ROIs) covering the whole cortical surface. These time series were further motion censored at a framewise displacement greater than 0.2 mm. Parcels were then grouped into the networks of interest (FPN and CON), and Pearson correlations across parcels within each network were run. We then computed the mean z-score across all correlations within each network, yielding an RSFC value (z-score) for each network of interest, participant and timepoint.

DTI. The data were initially visually inspected, and volumes with extreme artifacts or corruption were removed. Across the dataset, the average number of volumes removed was 0.27 (range = 0-5) at T0 and 0.97 (range = 0-10) at T1, accounting for 0.5% of the total number of volumes acquired. Data were then pre-processed using ExploreDTI (https://exploredti.com/). The data were corrected for head motion, eddy current distortions and EPI distortions, and the B-matrix was rotated 119. Remaining outliers due to head motion and cardiac pulsation were excluded using REKINDLE. The tensor model was fitted to the data using a nonlinear least squares fitting procedure. DTI scalar maps, including fractional anisotropy and mean diffusivity, were calculated and exported. A whole-brain tractography algorithm using Euler integration was applied with the following settings: step size = 0.5 mm, fractional anisotropy threshold ≥ 0.15 and angle threshold ≤ 35°. Whole-brain tractography was exported to TrackVis (https://trackvis.org/) to perform virtual in vivo dissections for the right hemisphere. The connections were dissected in regions corresponding to the putamen and the frontal lobes, providing measures for the fronto-putamen connections. All dissections were completed after ensuring intra-rater reliability, tested on 10 participants from the present study who were dissected twice by the same dissector. Reliability was tested using a two-way mixed intra-class correlation coefficient (ICC) 120. For all tracts, the ICC for single measures exceeded 0.90. For each tract, fractional anisotropy and mean diffusivity were calculated. These measures reflect the structural integrity of white matter connections and may indicate microstructural differences, such as myelination, axonal integrity and the compactness of fiber bundles 121. Fractional anisotropy is the degree of directionality of water motion within a particular voxel; mean diffusivity is the average diffusion of water motion within a voxel.
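Fractional anisotropy and mean diffusivity follow standard definitions from the three diffusion tensor eigenvalues; a sketch with synthetic eigenvalues (this is the textbook formula, not the ExploreDTI implementation):

```python
# Standard FA and MD from the tensor eigenvalues (l1, l2, l3).
from math import sqrt

def fa_md(l1, l2, l3):
    """MD = mean eigenvalue; FA = sqrt(3/2 * sum((li - MD)^2) / sum(li^2))."""
    md = (l1 + l2 + l3) / 3.0
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    return sqrt(1.5 * num / den), md

fa_iso, md_iso = fa_md(1.0, 1.0, 1.0)   # isotropic diffusion: FA = 0
fa_max, md_max = fa_md(1.0, 0.0, 0.0)   # fully anisotropic: FA = 1
```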
Training-related changes. Mixed models were used to examine training-related changes using the 'lme4' package in R (version 4.3.1). In this model, the main effects of training group and session were examined, as well as the Group-by-Session interaction. Age and gender were added to the model as covariates. Significant Session-by-Group interaction effects were interpreted as evidence of training-related changes and were followed up with post hoc paired t-tests. In a subset of available tasks, maintenance of training-related changes was examined between pre-test (T0) and the 1-year follow-up (T2). For evidence of null effects, we report the BF in favor of the null model (the model without a Group-by-Session interaction) over the training model (the model with a Group-by-Session interaction) for each measure of interest 122. The null model was specified as the model containing the main effects of Group and Session. We isolated this particular interaction as our training effect of interest, where BF10 < 1 suggested evidence for the null hypothesis (that is, no training-related changes).
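One common way to approximate such a model-comparison Bayes factor is the BIC approximation (Wagenmakers, 2007): BF01 ≈ exp((BIC1 − BIC0)/2), where model 0 omits and model 1 includes the Group-by-Session interaction. The sketch below illustrates this with hypothetical BIC values; the authors' exact BF computation (ref. 122) may differ:

```python
# Illustrative BIC-based Bayes factor approximation; BIC values are hypothetical.
from math import exp

def bf01_from_bic(bic_null, bic_interaction):
    """Evidence for the null (no-interaction) model over the training model."""
    return exp((bic_interaction - bic_null) / 2.0)

bf01 = bf01_from_bic(bic_null=1500.0, bic_interaction=1506.0)
```

Here the interaction model's higher BIC yields BF01 > 1, that is, evidence favoring the null model of no training-related change.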
Data imputation. For all measures (unless specified otherwise), multiple imputation by chained equations (MICE) was used to impute missing data (predictive mean matching; iterations = 20; n datasets = 100; Supplementary Fig. 4). A single imputed dataset was used, as this was necessary for conducting mixed models with post hoc tests and factor analysis. We ensured the replicability of these results by re-running the process multiple times and choosing a dataset at random. Missing data were imputed using the MICE package in R (50 datasets created, 50 maximum iterations), and quickpred was used to create the imputation model 123. Factors of executive function (at T0) and mental health factors were imputed using full information maximum likelihood (FIML) in 'lavaan'.

Statistics
Functional MRI analyses were FWE-corrected at the cluster level at P < 0.05, based on an uncorrected height threshold of P < 0.001. For structural MRI, findings from the surface-based analyses were controlled for multiple comparisons using random field theory, with a cluster-defining threshold of P < 0.01 and an FWE threshold of P < 0.05.

Table 1 | Short-term training effect for each measure with BF values

Table 2 | Long-term training effect for each measure with BF values
Methods

Participants
A total of 262 typically developing children were recruited for the study (6.03-13.31 years; mean age = 8.97 years; females = 52.84%) from schools within Greater London in the United Kingdom (data collection started in May 2019 and ended in May 2021). Sampling occurred by contacting over 2,000 schools in the Greater London area. Of those schools, 20 ended up participating, from a diverse range of boroughs.