Effects of multi-domain cognitive training on working memory retrieval in older adults: behavioral and ERP evidence from a Chinese community study

Working memory (WM) is a fundamental cognitive function that typically declines with age. Previous studies have shown that targeted WM training has the potential to improve WM performance in older adults. In the present study, we investigated whether a multi-domain cognitive training program that was not designed to specifically target WM could improve the behavioral performance and affect the neural activity during WM retrieval in healthy older adults. We assigned healthy older participants (70–78 years old) from a local community into a training group who completed a 3-month multi-domain cognitive training and a control group who only attended health education lectures during the same period. Behavioral and electroencephalography (EEG) data were recorded from participants while performing an untrained delayed match or non-match to category task and a control task at a pre-training baseline session and a post-training follow-up session. Behaviorally, we found that participants in the training group showed a trend toward greater WM performance gains than participants in the control group. Event-related potential (ERP) results suggest that the task-related modulation of P3 during WM retrieval was significantly enhanced at the follow-up session compared with the baseline session, and importantly, this enhancement of P3 modulation was only significant in the training group. Furthermore, no training-related effects were observed for the P2 or N2 component during WM retrieval. These results suggest that the multi-domain cognitive training program that was not designed to specifically target WM is a promising approach to improve WM performance in older adults, and that training-related gains in performance are likely mediated by an enhanced modulation of P3 which might reflect the process of WM updating.


Scientific Reports | (2021) 11:1207 | https://doi.org/10.1038/s41598-020-79784-z | www.nature.com/scientificreports/

[...] were excluded from participation. These participants were different from those in our previous manuscript 23. They were assigned to two groups, i.e., a multi-domain training group (N = 27) and a control group (N = 22), matched for age, gender and education. Of these participants, 23 of 27 in the training group (2 lost to follow-up due to illness, 2 declined the post-training test) and 16 of 22 in the control group (4 lost to follow-up due to illness, 2 declined the post-training test) completed cognitive assessments and EEG recording during a WM task (see below) at a pre-training (baseline) session and a post-training (follow-up) session (conducted within one month after the end of training). The demographic information of the 39 participants included in the final analysis can be found in Table 1. To examine possible impacts of the sample size difference between the training (N = 23) and control (N = 16) groups, we conducted additional statistical analyses on 16 randomly selected participants from the training group. Results were overall similar to those observed using all 23 participants in the training group. Therefore, in the following we report only the results based on all 23 participants in the training group.

Cognitive interventions and neuropsychological tests. The multi-domain training group received cognitive training twice a week over a period of 12 weeks, for a total of 24 training sessions. The intervention was delivered face-to-face by two psychiatrists to the 27 individuals in the training group. Participation in the training sessions dropped from 96% (26/27) at the first session to 63% (17/27) at the twenty-second session, and the mean participation over the 24 sessions was 76%. The cognitive training program covered the domains of memory (e.g., memorizing pictures and words, adapted from the word learning test and digit span test), reasoning (e.g., identifying patterns in a group of words, numbers, or pictures, adapted from Raven's Progressive Matrices), problem-solving strategies (e.g., forming strategies for different tasks) and behavioral exercises (e.g., handwriting and handcrafts). Each session covered one domain, with the domains of training counterbalanced across sessions. The training difficulty was fixed within each training session, and after each session participants were asked to give feedback on its difficulty level, perceived usefulness, and interestingness. This information was subsequently used to adjust the following sessions. Each training session lasted for 60 min. A lecture on common diseases in older adults was presented during the first 15 min of each session. Participants were then trained for 30 min, and the last 15 min were used to consolidate the newly practiced skills. Between training sessions, participants in the training group were encouraged to do physical exercise and to finish the homework assigned during the previous session (i.e., reading, calligraphy, painting). See Cheng et al. 23 and Feng et al. 43 [...]

Table 1. Group demographics (mean ± SD). a Independent-samples t-test. b Chi-square test. c The data from one participant were not available due to noncompliance with the assessment.

Working memory paradigm and stimuli. Participants performed a visual delayed match or non-match to category (DMC/DNMC) task, a classic experimental paradigm in the study of WM 44. As Fig. 1A illustrates, each trial began with a sample stimulus (S1) lasting for 1500 ms. After a retention period of 3500 ms, a probe stimulus (S2) was presented for 2000 ms. S1 included a single object in black and white. S2 included two objects in black and white or in color, with one object presented in the left visual hemifield and the other presented in the right visual hemifield. There were 54 different S1 stimuli, which were paired with 54 different S2 stimuli. When S2 was presented, participants were required to indicate which object in S2 (referred to as the target object hereafter) matched (DMC) or did not match (DNMC) the category of the object in S1 by pressing button '1' when the target object was in the left visual hemifield, and button '5' when the target object was in the right visual hemifield. The location of the target was matched with the location of the response button, as button '1' and button '5' were placed at the left and the right side in front of the participant, respectively. Such a task design could minimize possible conflicts between stimulus and response. In addition to the DMC and DNMC trials, a third type of trial, the control trial, was included in the experimental design. In the control trials, participants had to indicate whether the two objects in S2 were both in color (press button '1') or not (press button '5'). The inter-trial interval was 2000 ms. Each session included 9 control trials, followed by 9 DMC trials, and then 9 DNMC trials. The sequence of control, DMC and DNMC trials was fixed across participants; the goal of this ordering was to minimize possible impacts of the task instructions for the DMC/DNMC trials on the control trials.
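As a rough check on the trial structure described above, the following sketch tallies the trial timing and the fixed trial order. The durations, trial counts and the six task sessions per participant are taken from the text; the function and constant names are illustrative only, and the word "block" is used here for a task session to avoid confusion with the baseline and follow-up sessions.

```python
# Timing values (ms) and counts come from the task description;
# every name below is hypothetical and used only for illustration.
S1_MS, DELAY_MS, S2_MS, ITI_MS = 1500, 3500, 2000, 2000
TRIALS_PER_TYPE_PER_BLOCK = 9
BLOCK_ORDER = ("control", "DMC", "DNMC")  # fixed order across participants


def build_block():
    """Return the fixed trial-type sequence of one block (27 trials)."""
    return [task for task in BLOCK_ORDER
            for _ in range(TRIALS_PER_TYPE_PER_BLOCK)]


def trial_duration_ms():
    """Duration of one trial, including the inter-trial interval."""
    return S1_MS + DELAY_MS + S2_MS + ITI_MS


def trials_per_type(n_blocks):
    """Count the trials of each type after n_blocks blocks."""
    sequence = build_block() * n_blocks
    return {task: sequence.count(task) for task in BLOCK_ORDER}
```

With six blocks, `trials_per_type(6)` gives 54 trials per trial type, which agrees with the total stated in the text.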
After a short training session to get familiar with the task, each participant finished 6 sessions, with a 2-3 min break between successive sessions, resulting in a total of 54 trials for each trial type. The E-Prime 2.0 toolkit (Psychology Software Tools Inc., Sharpsburg, PA, USA) was used to implement the experimental design and to record behavioral responses.

Figure 1. (A) During each trial, a sample stimulus (S1) consisting of an object in black and white was first presented for 1500 ms. After a delay of 3500 ms, a probe stimulus (S2) consisting of two objects either in color or in black and white was presented for 2000 ms. In the control trial, participants had to indicate whether the two objects of S2 were both in color or not. In the delayed match or non-match to category (DMC/DNMC) trial, participants had to indicate the visual hemifield in which the category of the object matched (or did not match) that of the sample stimulus. Trials were separated by an inter-trial interval of 2000 ms. (B) Participants in both groups showed significant increases in accuracy at the follow-up session relative to the baseline session. Participants in the training group tended to show greater improvements than participants in the control group in the DMC trials. See main text for the details about statistics. Vertical bars indicate mean ± SEM.

EEG recording and preprocessing. Scalp EEG data were continuously recorded with a 64-channel Easy-Cap connected to two BrainAmp DC amplifiers (Brain Products GmbH, Gilching, Germany). The 60 recording [...]. The AFz was used as the recording ground and the tip of the nose was used as the recording reference. To monitor ocular activity, the horizontal and vertical electrooculograms (EOGs) were measured by electrodes placed at the outer canthi (LO1, LO2) and above and below the left eye (SO1, IO1). The digitization rate was 1000 Hz and the online anti-aliasing band-pass filter was 0.016-200 Hz.
Electrode impedance was kept below 10 kΩ throughout the recording. The Matlab-based EEGLAB 45 and ERPLAB 46 toolboxes were used for the offline preprocessing. The raw continuous EEG signal first went through a two-way Butterworth band-pass filter (0.1-40 Hz) with zero phase shift (roll-off slope: 12 dB/oct), followed by a Parks-McClellan notch filter to minimize the 50 Hz powerline noise. Independent component analysis based on the Infomax algorithm was then applied to correct ocular artifacts. After that, the continuous EEG data were down-sampled to 250 Hz and then segmented into epochs time-locked to S2 (−200 to 1000 ms relative to S2 onset). For the present study, only the retrieval stage of WM was of interest; we therefore analyzed only the ERPs elicited by the probe stimulus (S2). Epochs meeting one or more of the following criteria were marked as artifacts and excluded from the ERP analysis: (1) the maximal voltage difference in any EEG channel exceeded 150 μV within any of the moving windows (width: 200 ms; step: 50 ms) throughout the epoch, examined by a peak-to-peak (maximum minus minimum) function; (2) the absolute value of voltage at any time point throughout the whole epoch in any EEG channel exceeded 150 μV, examined by a simple voltage threshold function. After preprocessing, the percentages of epochs rejected for artifacts (mean ± SD across participants) were 1.54 ± 2.96% (baseline session) and 1.37 ± 2.18% (follow-up session) for the training group, and 4.22 ± 8.33% (baseline session) and 1.12 ± 1.50% (follow-up session) for the control group. Only epochs with behaviorally correct responses and without artifacts were included in the ERP analysis.

ERP analysis. ERPs elicited by S2 were obtained by averaging all epochs of the same trial type for each session and participant, with the 200 ms pre-stimulus interval as baseline.
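The epoch screening and averaging steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the ERPLAB implementation used in the study. The function names are hypothetical. Baseline correction is assumed to precede artifact screening, and the 50 ms step is rounded to whole samples (it is 12.5 samples at 250 Hz).

```python
import numpy as np


def window_starts(n_samples, width, step):
    """Start indices of the moving windows, plus a last window flush to the end."""
    if n_samples <= width:
        return [0]
    starts = list(range(0, n_samples - width + 1, step))
    if starts[-1] != n_samples - width:
        starts.append(n_samples - width)
    return starts


def max_peak_to_peak(epoch, sfreq, width_ms=200, step_ms=50):
    """Largest max-minus-min voltage in any channel within any moving window.

    epoch: array of shape (n_channels, n_samples), in microvolts.
    """
    width = max(1, int(round(width_ms * sfreq / 1000)))
    step = max(1, int(round(step_ms * sfreq / 1000)))  # rounded to whole samples
    worst = 0.0
    for start in window_starts(epoch.shape[1], width, step):
        window = epoch[:, start:start + width]
        worst = max(worst, float((window.max(axis=1) - window.min(axis=1)).max()))
    return worst


def is_artifact(epoch, sfreq, threshold_uv=150.0):
    """True if either rejection criterion from the text is met."""
    exceeds_p2p = max_peak_to_peak(epoch, sfreq) > threshold_uv  # criterion (1)
    exceeds_abs = float(np.abs(epoch).max()) > threshold_uv      # criterion (2)
    return bool(exceeds_p2p or exceeds_abs)


def average_erp(epochs, correct, sfreq, baseline_ms=200):
    """Average baseline-corrected epochs with correct responses and no artifacts.

    epochs: array (n_epochs, n_channels, n_samples); each epoch is assumed
    to start baseline_ms before S2 onset.
    correct: sequence of booleans, one per epoch.
    """
    n_base = int(round(baseline_ms * sfreq / 1000))
    # Assumption: subtract the pre-stimulus mean before screening.
    corrected = epochs - epochs[:, :, :n_base].mean(axis=2, keepdims=True)
    keep = [i for i, ok in enumerate(correct)
            if ok and not is_artifact(corrected[i], sfreq)]
    if not keep:
        raise ValueError("no usable epochs for this condition")
    return corrected[keep].mean(axis=0)
```

In this sketch, `average_erp` would be called once per participant, session and trial type, with `sfreq=250` after down-sampling.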
According to the literature, the P2 and N2 components are typically observed over the frontal-central region, and the P3 component is typically observed over the parietal region. We thus defined two regions of interest (ROIs), i.e., a frontal-central ROI (Fz, FC1, FCz, FC2, Cz) and a parietal ROI (CPz, P1, Pz, P2, POz). ERPs were then averaged across the electrodes within each ROI. Based on previous studies 38,47,48 as well as the ERPs observed in the present study, the amplitudes of the P2, N2 and P3 components were obtained by averaging the voltages within the windows of 200-250 ms, 300-350 ms and 450-550 ms, respectively. The individual peak latency for each ERP component was identified as the time point of the most negative (N2) or most positive (P2, P3) voltage within the same window as that used for measuring the mean amplitude. To obtain a reliable measurement of the individual peak latency, individual ERPs were low-pass filtered (0-15 Hz) to minimize the interference of high-frequency noise.
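The ROI averaging and the two ERP measures described above can be sketched as follows. The windows, electrode lists and peak polarities come from the text. The function names, the inclusive window boundaries, and the omission of the 15 Hz low-pass step before peak picking are assumptions made for this illustration.

```python
import numpy as np

# Measurement windows (ms after S2 onset), peak polarity, and ROIs from the text.
WINDOWS_MS = {"P2": (200, 250), "N2": (300, 350), "P3": (450, 550)}
POLARITY = {"P2": 1, "N2": -1, "P3": 1}
ROIS = {
    "frontal-central": ["Fz", "FC1", "FCz", "FC2", "Cz"],
    "parietal": ["CPz", "P1", "Pz", "P2", "POz"],
}


def roi_waveform(erp, channel_names, roi):
    """Average an ERP of shape (n_channels, n_samples) over the ROI electrodes."""
    idx = [channel_names.index(ch) for ch in ROIS[roi]]
    return erp[idx].mean(axis=0)


def window_mask(times_ms, component):
    """Boolean mask of samples inside the component window (bounds inclusive)."""
    lo, hi = WINDOWS_MS[component]
    return (times_ms >= lo) & (times_ms <= hi)


def mean_amplitude(waveform, times_ms, component):
    """Mean voltage within the component window."""
    return float(waveform[window_mask(times_ms, component)].mean())


def peak_latency(waveform, times_ms, component):
    """Latency (ms) of the most positive (P2, P3) or most negative (N2) point."""
    mask = window_mask(times_ms, component)
    signed = POLARITY[component] * waveform[mask]
    return float(times_ms[mask][np.argmax(signed)])
```

Note that "P2" names both an electrode in the parietal ROI and an ERP component; the two dictionaries keep these uses separate.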

Statistical analysis.
Behavioral and ERP data were analyzed using three-way mixed analysis of variance (ANOVA) with Session (baseline, follow-up) and Task (control, DMC, DNMC) as within-subject factors, and Group (control, training) as a between-subject factor. Because part of the reaction time data were lost due to hardware issues, we report only accuracy as the measure of behavioral performance in the present study. For ERP amplitude data, statistical analysis was performed in two steps. First, to examine whether these components were significantly modulated by the WM task and to get an overall impression of training-related effects, we conducted an initial 3-way ANOVA with Session (baseline, follow-up) and Task (control, DMC, DNMC) as within-subject factors, and Group (control, training) as a between-subject factor. Second, we calculated the amplitude differences between the DMC/DNMC and the control trials as the task modulation (DMC/DNMC trials minus control trials) for each component of interest, participant and session 41. This could help to isolate the ERP effects related to WM retrieval by minimizing ERP effects related to other processes (e.g., behavioral responses), and could also reduce possible impacts of baseline differences in ERP amplitudes between the two subject groups. A 3-way ANOVA was then conducted on these task modulations of ERP amplitudes with Session (baseline, follow-up) and Task (DMC, DNMC) as within-subject factors, and Group (control, training) as a between-subject factor.
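The task modulation defined in the second step is a simple difference score. A minimal sketch, assuming amplitudes are stored per trial type as arrays of shape (participants, sessions); the function name and data layout are hypothetical:

```python
import numpy as np


def task_modulation(amplitudes):
    """Return DMC-minus-control and DNMC-minus-control amplitude differences.

    amplitudes: dict mapping 'control', 'DMC' and 'DNMC' to arrays of shape
    (n_participants, n_sessions) holding the mean amplitude of one component.
    """
    control = np.asarray(amplitudes["control"], dtype=float)
    return {task: np.asarray(amplitudes[task], dtype=float) - control
            for task in ("DMC", "DNMC")}
```

The resulting two arrays per component correspond to the Task (DMC, DNMC) levels entered into the second ANOVA, with the session axis supplying the Session factor.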
For ANOVA, the Greenhouse-Geisser adjustment was applied to correct the degrees of freedom when the assumption of sphericity was violated; in this case, the uncorrected degrees of freedom and corrected p values are reported. Post hoc tests using the least significant difference (LSD) method were conducted to follow up significant main effects shown by the ANOVA when appropriate. Independent-samples and paired-samples t-tests (2-tailed) were conducted to follow up interactions when necessary. Statistical analysis was performed in SPSS 22.0. Tests were considered significant at p < 0.05. All results are presented as mean ± SEM (standard error of the mean) unless otherwise specified.
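The effect size reported alongside each F test in the Results is partial eta squared. It can be recovered from an F value and its degrees of freedom through the standard identity ηp² = F·df_effect / (F·df_effect + df_error). The helper below is an illustrative sketch with a hypothetical name, not part of the authors' SPSS workflow.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

For example, the main effect of Session on accuracy, F (1,37) = 64.787, gives a value of about 0.636, and the main effect of Task, F (2,74) = 10.801, gives about 0.226, in line with the reported effect sizes.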

Results
Behavioral results. Mean accuracy was analyzed for each trial type (control, DMC and DNMC), session (baseline, follow-up) and participant (Fig. 1B). We conducted a 3-way ANOVA on accuracy, with Session (baseline, follow-up) and Task (control, DMC and DNMC) as within-subject factors, and Group (control, training) as a between-subject factor. This ANOVA revealed main effects of Session (F (1,37) = 64.787, p < 0.001, ηp² = 0.636) and Task (F (2,74) = 10.801, p < 0.001, ηp² = 0.226), indicating significant task-related modulation and training-related gains in accuracy. A post hoc LSD test showed that the accuracy in the control trials was significantly higher than that in the DMC (p = 0.004) and DNMC (p < 0.001) trials, and that the accuracy in the DMC trials tended to be higher than that in the DNMC trials (p = 0.09). In addition, we observed a significant 2-way interaction of Session × Task (F (2,74) = 5.705, p = 0.005, ηp² = 0.134) and a marginally significant 3-way interaction of Session × Task × Group (F (2,74) = 2.758, p = 0.070, ηp² = 0.069), while the interaction of Session × Group (F (1,37) = 1.399, p = 0.245, ηp² = 0.036) was not significant. Such results indicated that the accuracy gains depended on the trial type and subject group. The main effect of Group was not observed (F (1,37) = 0.395, p = 0.533, ηp² = 0.011), suggesting that the overall accuracy was not significantly different between the two groups. To further characterize the 3-way interaction, we then performed 2-way ANOVAs with Session and Task as within-subject factors in each subject group. We observed the main effect of Session in both the training group (F (1,22) [...]

Furthermore, we directly compared the magnitude of training-related performance gains for each trial type between the two subject groups by calculating the differences in accuracy between sessions (follow-up minus baseline).
Here we did not correct for multiple comparisons, as these were planned comparisons to test pre-defined hypotheses. As Fig. 1B illustrates, participants in the training group showed a trend toward greater gains in accuracy than participants in the control group for the DMC and DNMC trials, but not for the control trials. Independent-samples t-tests suggested that the differences approached statistical significance for the DMC trials (t (37) [...]

ERP results. Figure 2 illustrates the grand-averaged ERP waveforms elicited by the probe stimulus (S2) over the frontal-central ROI and the parietal ROI in different sessions and subject groups. Figure 3 illustrates the grand-averaged topographical maps for the task modulations of the P2, N2 and P3 components.
P2 component. The initial 3-way ANOVA revealed no main effects or significant interactions (all ps > 0.1). For the task modulation of P2 amplitude, as illustrated in Fig. 4A, the 3-way ANOVA revealed a marginally significant 2-way interaction of Session × Task (F (1,37) = 3.615, p = 0.065, ηp² = 0.089). We thus performed additional 2-way ANOVAs with Session as a within-subject factor and Group as a between-subject factor for the DMC and the DNMC trials separately. However, no main effects or significant interactions were observed for either type of trial (all ps > 0.1). Such results suggest that P2 was not significantly modulated by the WM task or cognitive training.
For the peak latency of P2, the 3-way ANOVA revealed only a significant interaction of Session × Group (F (1,37) [...]

N2 component. The initial 3-way ANOVA revealed no main effects or significant interactions (all ps > 0.1). For the task modulation of N2 amplitude, as illustrated in Fig. 4B, the 3-way ANOVA revealed a marginally significant 2-way interaction of Task × Group (F (1,37) = 3.101, p = 0.086, ηp² = 0.077). We then performed additional 2-way ANOVAs with Session and Task as within-subject factors in each subject group. However, no main effects or significant interactions were observed in either subject group (all ps > 0.1). Such results suggest that N2 was not significantly modulated by the WM task or cognitive training.
For the peak latency of N2, the 3-way ANOVA revealed a main effect of Group (F (1,37) = 8.246, p = 0.007, ηp² = 0.182), suggesting that the training group had significantly longer N2 latency than the control group. No other main effects or significant interactions were observed (all ps > 0.2).

[...] Overall, such results indicate that although P3 was not significantly modulated by the WM task for participants in both groups at the baseline session, the task modulation of P3 (i.e., larger amplitudes in the DMC/DNMC trials than in the control trials) was regained after cognitive training for participants in the training group.
For the task modulation of P3 amplitude, as illustrated in Fig. 4C, the 3-way ANOVA revealed a main effect of Session (F (1,37)

Discussion
In the present study, we examined whether a previously established multi-domain cognitive training program that was not designed to specifically target WM could improve behavioral performance and ERP responses of WM in healthy older adults. We assigned healthy older participants from a local community to a training group who completed 3-month multi-domain cognitive training and a control group who only attended health education lectures during the same period. In both pre-training (baseline) and post-training (follow-up) sessions, behavioral and EEG data were recorded from participants while performing an untrained WM task. By comparing the performance between the baseline session and the follow-up session, we found that participants [...] For the ERP data, we observed a significantly enhanced task-related modulation of P3 amplitude after cognitive training in the training group, but not in the control group. Moreover, no significant training-related effects were observed for other ERP components, i.e., P2 and N2. Altogether, these results suggest that the multi-domain cognitive training program that was not designed to specifically target WM is a promising approach to improve WM performance in healthy older adults, and that the enhancement of P3 modulation, which has been associated with WM updating, might reflect the underlying neural substrate.

The behavioral data in the present study confirmed our hypothesis that the multi-domain cognitive training program can improve WM performance in healthy older adults. An interesting finding is that participants in both groups showed significant improvements in the follow-up session relative to the baseline session. This might indicate a practice effect induced by repeated testing in addition to the performance gains related to cognitive training 33. However, by directly comparing the net increase of accuracy between the two groups (Fig. 1B), we found that participants in the training group showed greater improvements than participants in the control group, although this group difference was only marginally significant for the DMC trials (p = 0.054). In contrast, for the DNMC trials, despite a trend toward greater improvements in the training group relative to the control group, there was no significant difference between the two groups (p = 0.312). Such results might be due to the relatively small sample size (see discussion below), but might also reflect differences between the DMC and DNMC trials. Specifically, DNMC is harder than DMC for adult humans 29, which was supported by our finding that participants showed a trend toward lower accuracy in DNMC trials than in DMC trials (p = 0.09). Moreover, as shown in Fig. 1B, for participants in the training group, the performance in the DMC and DNMC trials rose to levels similar to that in the control trials at the follow-up session, which might indicate that the performance in DMC/DNMC trials had reached a ceiling of improvement. This might partly explain why only the performance gains in the DMC trials, and not those in the DNMC trials, approached statistical significance when comparing the two subject groups.

Our ERP results suggest that the task modulation of P3 amplitude was enhanced for participants who received 3-month multi-domain cognitive training. Such enhanced P3 modulation, however, was not observed for participants in the control group, who only attended lectures about healthy living. Combined with the behavioral finding, our results suggest that P3 modulation might reflect the neural substrate of training effects on WM performance in older adults. Although the functional significance of P3 is still not precisely known, many studies have suggested that the P3 observed in WM tasks might reflect the updating of WM 40,42,49.
At the same time, P3 has also been shown to be sensitive to normal aging 50, and age-related reductions in P3 amplitude have been reported by many WM studies 51–53. Consistent with the literature, we found that P3 amplitude was not significantly modulated by the WM task (i.e., the main effect of Task was not observed) at the baseline session for participants in either group. At the follow-up session, participants in the training group showed significant P3 modulation (i.e., larger P3 amplitudes in the DMC/DNMC trials than in the control trials), while such modulation was still absent for participants in the control group. This pattern of results explained why participants in the training group showed significantly greater task modulation of P3 than participants in the control group (p < 0.001; Fig. 4C).
We did not observe significant training effects on the shorter-latency components, i.e., P2 or N2. This finding might indicate that the neural processes underlying P2 and N2 were relatively difficult to change through the multi-domain cognitive training applied in the present study 23. However, it does not imply that the earlier neural processes cannot be altered by specifically designed cognitive training. For example, Anne and colleagues have shown that visual discrimination training over a three- to five-week period could modify early visual processing during stimulus encoding, which could further predict the improvement in WM accuracy 54. Nevertheless, the present finding still contrasts with previous studies which showed that multi-domain cognitive training could increase the N2 amplitude in a task switching paradigm 32–34. This disparity might be explained by the different cognitive processes that can elicit the N2 effect. Specifically, it has been proposed that the N2 reflects control-related or mismatch-related processes 39. The control-related processes are most involved in cognitive tasks with fast sequences of stimuli, high conflict (e.g., the Stroop task) and time pressure, and the enhanced N2 has been interpreted as the neural correlate of the improved performance after receiving cognitive training 32,34. In contrast, in the present study, no fast stimulus sequence or response interference was induced in the DMC/DNMC trials, and thus the N2 should only reflect the discrimination of mismatches between the current stimulus and a mental template stored in WM 38,39,55. When combined with previous research, our findings might indicate that control-related processes are more likely to be modified through multi-domain cognitive training than the mismatch-related processes that are required by WM retrieval.
The cognitive intervention applied in the present study covered the domains of memory, reasoning, problem-solving strategies and behavioral exercises. The memory-related training materials and procedures were adopted from neuropsychological tests, e.g., the word learning test and digit span test. It has been suggested that such tests mainly involve short-term memory (STM), which is regarded as a subcomponent of the larger WM system 28. Specifically, STM refers to the simple temporary storage of information, while WM implies a combination of storage and manipulation 6. Thus, it might not be appropriate to consider the training effects in the present study as near transfer. However, considering that WM plays a fundamental role in general cognition, the training materials and procedures included in the present study should inevitably cover, at least to some extent, WM and its interaction with other cognitive processes. In this case, it might be difficult to make a quantitative distinction between near and far transfer. Nevertheless, the current results indicate that the improvement of WM and related neural processing after receiving multi-domain cognitive training might underlie the improvement of a variety of cognitive functions that has been demonstrated in our previous work 23.
The following limitations need to be considered when interpreting the current findings. Firstly, the sample size in the present study was relatively small, and cognitive assessors were not blinded to the intervention assignment, which may reduce statistical power and prevent the generalization of results to the general population. The sample size estimation was based on the consideration that ~ 20 participants for each group was commonly seen in previous WM studies that compared ERPs between different participant groups 48,56,57 . However, the relatively high drop-out rate warrants larger sample sizes in future studies. Furthermore, considering that the effects of the multi-domain cognitive training have been demonstrated in our previous study using the randomized controlled design 23 , the intervention assignment of participants into the two groups was not randomized in the present study, but was in order to match their age, gender and education levels. Nevertheless, larger sample sizes would also help to uncover the baseline differences in more detail. Here we observed significant or marginally significant baseline differences in ERPs (i.e., N2 latency, P3 amplitude and latency) between the training and control participants, although we tried to minimize possible impacts of such baseline differences by calculating the amplitude differences between the DMC/DNMC and the control trials. It can be speculated that the baseline differences in ERPs might be prevented by the random allocation of training and control participants instead of the matching procedure applied in the present study. Moreover, larger sample sizes might also enable the investigation of inter-individual differences in training-related performance gains and associated neural changes 58 , which would allow the development of more effective and targeted training protocols in the future. 
Lastly, the reaction time data were lost due to hardware issues, which restricted the understanding of possible training effects on behavioral measures, e.g., lower RT variability after cognitive training 32 . In spite of the above limitations, the present study, for the first time, demonstrated the positive effect of multi-domain cognitive training on WM performance and possible neural mechanisms using ERPs based on a Chinese community-living older participant sample. In sum, future studies with the randomized controlled design, double-blind assessments and relatively larger sample sizes are required to replicate and further extend the current findings.
In conclusion, the present study extends our understanding of the behavioral effects and neural responses of WM associated with a multi-domain cognitive training program that was not designed to specifically target WM. We showed that this 3-month multi-domain cognitive training had beneficial effects on the WM performance in healthy older adults, and that these training-related performance gains were likely mediated by an enhanced modulation of P3 which was believed to reflect WM updating.

Data availability
The datasets that support the findings of the present study are available from the corresponding author upon reasonable request.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.