Introduction

Humans differ in their ability to learn. Many studies have reported neural underpinnings of the difference within brain regions that increase activity during corresponding tasks (i.e., task-activated brain regions). Training-induced neural changes within these regions have been linked to learning performance. For example, working memory training has induced neural changes primarily within fronto-parietal network regions1,2,3,4, which are critical in the working memory process5,6,7 as well as performance8,9,10. Similarly, individuals with better performance improvement on working memory tasks have shown larger increases in brain activity1 and/or functional connectivity3 within the fronto-parietal network. Moreover, individual differences in subsequent task performance have been predicted from functional connectivity within task-activated brain regions11,12,13. Consequently, neural activity within task-activated brain regions covaries with as well as predicts learning performance. Together, these studies have led to the general consensus that individual learning performance is determined by activity or functional connections only within task-activated brain regions.

Theoretically, however, any brain region is embedded in brain-wide intrinsic connectivity networks rather than isolated from other regions6,14 and it dynamically interacts with other regions and networks15. For instance, the brain is organized as a scale-free network, in which a small number of nodes have broad access to most other nodes, with a small-world structure of short path length and high local clustering16. The architectures of connections between modules are important for optimal integration of modules17 and stability of the entire network18. A previous network analysis indicated that learning is characterized by dynamic reconfiguration of global networks of modules19. These findings suggest that learning is affected by the connectivity structure of pre-existing whole-brain networks. Therefore, we hypothesized that learning performance was determined by functional connections among intrinsic networks that include both task-activated and less-activated networks. However, no data were available to examine this hypothesis because previous studies have mainly investigated changes within task-activated regions.

To test our hypothesis, we calculated functional connectivity patterns among whole-brain networks in resting-state functional magnetic resonance imaging (fMRI) and examined whether the patterns predict individual learning performance in a short period of training (80–90 min). We used a 3-back working memory task because individual performances on this task are known to be correlated with resting-state functional connectivity20,21,22. According to a previous study23, obtaining a significant training effect for every subject is difficult due to the considerable inter-individual differences in training effects on working memory. For precise measurement of learning performance, we selected subjects who showed a monotonic increase in learning performance by fitting an inverse function (see below) to their learning curves. As an index for individual learning performance, we used a performance plateau estimated from the individual's learning curve. To analyze functional connectivity among intrinsic connectivity networks, we defined networks according to datasets in ‘BrainMap ICA’ that identified intrinsic connectivity networks by applying an independent component analysis (ICA) to a large-scale database of neuroimaging studies, named BrainMap6. Furthermore, we took advantage of the metadata of BrainMap ICA, enabling us to quantitatively evaluate each network's relevance to working memory tasks. As a result, consistent with the general consensus, higher functional connectivity within the left fronto-parietal network (the most robust ‘task-activated’ network) accounted for about half of the contribution to predicting higher performance plateaus. On the other hand, consistent with our hypothesis, about half of the contribution came from functional connections between networks classified as task-activated and less-activated according to the metadata of BrainMap ICA. Our results suggest that learning performance is determined by a larger repertoire of functional connections among intrinsic networks, rather than only task-activated networks.

Results

Learning of 3-back working memory task

Twenty-nine subjects participated in the behavioral training of a 3-back task. Outside the MRI scanner, the subjects performed the verbal 3-back task as illustrated in Figure 1a. One of nine consonant letters was presented in each trial. Subjects were asked to press a key on a keyboard when the presented letter was identical to the letter presented three trials back. A session consisted of four blocks, each of which included 15 trials. The subjects completed 25 sessions within a total duration of 80–90 min. Their performance (hit rate and false-alarm rate) was presented on a screen at the end of each session and the subjects were asked to improve their performance by increasing the hit rate and reducing the false-alarm rate. This procedure resulted in a series of 25 d-primes for each subject (see Methods). We applied a five-session moving average to the series of d-primes to obtain a smoothed learning curve. We fitted an inverse curve24 (y = a − b/x) to the smoothed curve, where y is the d-prime in the x-th session, a is the performance plateau and b is the learning speed. Figure 1b shows the learning curves of the 20 out of 29 subjects whose curves were significantly fitted by the inverse function (F(1, 19) > 4.46, p < 0.05; see Methods). For comparison, Supplementary Figure S1a shows the learning curves of the subjects who were not well fitted by the inverse curve. We applied a two-way analysis of variance (ANOVA) to the d-primes with training session (after the moving average) as a within-subjects factor and subject group (significant/not-significant fitting) as a between-subjects factor. We found a significant interaction between session and subject group (F(20, 540) = 5.31; p < 0.001; see Supplementary Fig. S1b), which suggests a significant difference in training effect between the groups. A post-hoc analysis using a paired t-test identified a significant difference between the first and last sessions in the significant-fitting group (t(19) = 7.58; p = 3.7 × 10⁻⁷; p < 0.001 corrected for two comparisons using the Bonferroni method) but not in the not-significant-fitting group (t(8) = 1.94; p = 0.09). Since we are interested in predicting learning performance, we conducted further analyses in the group for which a significant effect of training was identified (n = 20).
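The smoothing and curve-fitting steps can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the inverse function has the form y = a − b/x (plateau a, learning speed b), which is linear in 1/x and can therefore be fitted by ordinary least squares, and it assumes that the smoothed points are indexed x = 1, 2, ..., 21. The function names and the synthetic d-prime series are hypothetical.

```python
def moving_average(values, window=5):
    """Simple moving average; returns len(values) - window + 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def fit_inverse_curve(curve):
    """Least-squares fit of y = a - b / x for x = 1, 2, ..., len(curve).

    The model is linear in u = 1 / x (y = a + s * u with s = -b), so the
    closed-form solution of simple linear regression applies.
    Assumption: sessions on the smoothed curve are indexed from 1.
    """
    n = len(curve)
    u = [1.0 / x for x in range(1, n + 1)]
    mean_u = sum(u) / n
    mean_y = sum(curve) / n
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, curve))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    slope = s_uy / s_uu
    plateau = mean_y - slope * mean_u  # a: value approached as x grows
    speed = -slope                     # b: larger b means a slower approach
    return plateau, speed


# Synthetic 25-session d-prime series (illustrative values, not study data).
d_primes = [3.0 - 2.0 / x for x in range(1, 26)]
smoothed = moving_average(d_primes)          # 25 sessions -> 21 points
plateau, speed = fit_inverse_curve(smoothed)
```

The five-session window is what reduces the 25 sessions to 21 points, which matches the degrees of freedom F(1, 19) reported for a two-parameter fit.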

Figure 1

Estimation of individual performance plateau.

(a) Example of a session in the 3-back task. Subjects responded to the target stimulus, which was the letter identical to the one presented three trials back. Red and blue arrows indicate trials in which responses were detected. d-prime was calculated from the hit and false-alarm rates for each session (see Methods). (b) Learning curves (thin lines) of individual subjects (n = 20; coded by color) after smoothing with a five-session moving average. Learning curves significantly fitted with an inverse curve (bold line; y = a − b/x) are presented (F-test, p < 0.05; see Supplementary Fig. S1 for remaining subjects). Gray curves indicate three subjects who were excluded from further analysis due to excessive head motion during the resting-state fMRI scan (n = 3, see Methods). Note that the number of sessions was reduced from 25 to 21 due to the moving average.

Resting-state functional connectivity

All subjects underwent a resting-state fMRI scan for 5 min 4 s before or after the 3-back task (grouped into Rest-First or Task-First group, see below) on a different day. The data were preprocessed with a canonical resting-state fMRI analysis procedure (see Methods). To reduce spurious changes in functional connectivity caused by head motion, the data were checked with a method used for reducing motion-related artifacts in resting-state fMRI25. Excessive head motion was identified in 3 of the above 20 subjects (see Methods). Thus, these three subjects were excluded from further analysis, although their learning curves were well fitted by the inverse function (F(1,19) = 30.76, 10.27 and 4.46; p = 2.4 × 10⁻⁵, 0.0047 and 0.048, respectively).

To analyze the functional connectivity of whole-brain intrinsic connectivity networks, the results of BrainMap ICA were used to define the regions in each network (masks)6. In the previous study, a spatial ICA was applied to 8,637 activation maps reconstructed from the BrainMap database, resulting in 20 independent components. Of these, 18 components were used as the masks because two components were considered artifacts. We calculated a functional connectivity matrix of 171 connections for each subject. These connections consisted of functional connectivity between 153 combinations of the 18 network masks (18 × 17/2) and functional connectivity within 18 masks. Between-network functional connectivity was calculated as Pearson's correlation coefficient between blood-oxygen-level dependent (BOLD) signal time courses averaged across voxels within individual masks. Each correlation coefficient was transformed into Fisher Z values. Within-network functional connectivity was calculated as mean voxel-wise correlations within each of the masks as follows. The time series for a voxel was correlated with every other voxel within a mask and this calculation was repeated for every voxel in the mask. Then, the correlation coefficients were transformed into Fisher Z values. Finally, the Z values were averaged within the mask. Figure 2a shows the functional connectivity matrix averaged across subjects (n = 17).
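As a rough sketch of how such a matrix could be assembled, the following pure-Python example computes the between-network entries from mask-averaged time courses and the within-network entries from voxel-pair correlations, applying the Fisher Z transform (atanh) in both cases. The function names and the toy data are hypothetical, and the input layout (one list of voxel time series per mask) is an assumption made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mean_time_course(voxels):
    """Average the BOLD time courses of all voxels in one mask."""
    return [sum(col) / len(voxels) for col in zip(*voxels)]


def between_network_fc(voxels_a, voxels_b):
    """Fisher Z of the correlation between two mask-averaged time courses."""
    r = pearson(mean_time_course(voxels_a), mean_time_course(voxels_b))
    return math.atanh(r)


def within_network_fc(voxels):
    """Mean Fisher Z over voxel pairs within one mask.

    Averaging over unordered pairs equals averaging over every voxel
    against every other voxel, because correlation is symmetric.
    """
    zs = [math.atanh(pearson(voxels[i], voxels[j]))
          for i in range(len(voxels)) for j in range(i + 1, len(voxels))]
    return sum(zs) / len(zs)


def connectivity_matrix(masks):
    """masks: list of networks, each a list of voxel time series."""
    k = len(masks)
    fc = [[0.0] * k for _ in range(k)]
    for i in range(k):
        fc[i][i] = within_network_fc(masks[i])
        for j in range(i + 1, k):
            fc[i][j] = fc[j][i] = between_network_fc(masks[i], masks[j])
    return fc


# Toy data: 3 networks, 2 voxels each, 4 time points (not study data).
masks = [
    [[1.0, 2.0, 3.0, 5.0], [2.0, 1.0, 4.0, 6.0]],
    [[4.0, 3.0, 1.0, 2.0], [5.0, 2.0, 2.0, 1.0]],
    [[1.0, 3.0, 2.0, 4.0], [2.0, 5.0, 1.0, 3.0]],
]
fc = connectivity_matrix(masks)
```

With k networks the matrix has k(k − 1)/2 between-network and k within-network entries, which gives 153 + 18 = 171 connections for the 18 masks used here.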

Figure 2

Matrices for functional connectivity and selection count of 18 intrinsic connectivity networks.

Diagonal and non-diagonal elements show within- and between-network connectivity, respectively. (a) Functional connectivity matrix averaged across subjects (n = 17). Color bar indicates Z-transformed correlation coefficient. See Figure 4b for regions included in each network. (b) Selection count matrix. Red circles indicate connections having a significantly greater selection count than the chance level according to the binomial distribution (p < 0.05 after Bonferroni correction for number of connections, see Methods).

Prediction of performance plateau

A prediction model for the performance plateaus was defined as y = xw + e. Here, y is a performance plateau, x is a 1 × 172 vector, including 171 functional connectivities (Z-transformed correlation coefficients) and a bias term, w is a 172 × 1 weight vector and e is residual noise. A sparse linear regression26 was used to estimate weight values in the prediction model. This method calculates automatic relevance-determination parameters that indicate the contribution of each estimated weight to the prediction. Based on these parameter values, the weights that contribute very little to the prediction are set to zero. In this way, only the most relevant elements of the functional connectivity matrix (i.e., connections) are sparsely selected to predict the performance plateaus. Therefore, the sparse linear regression is beneficial for avoiding overfitting, even when the number of samples is small relative to the number of parameters. To reduce bias in selecting relevant connections26, each correlation value was divided by its standard deviation across the dataset, calculated separately for each connection.

Leave-one-subject-out cross-validation was used to estimate and validate the prediction model27,28. In each validation fold, one subject's data were used as test data while the remaining subjects' data were used as training data for the sparse linear regression. A weight vector estimated from the training data was used to predict the performance plateau from the functional connectivity matrix in the test data. The validation was conducted in 17 folds, which is equal to the number of subjects included in the analysis. Figure 3 shows the relationship between the predicted and actual performance plateaus for the individual subjects (n = 17). The coefficient of determination (R²) was 0.73 when it was calculated as the square of the correlation coefficient between observed and predicted performance plateaus. This R² value was statistically significant (p = 0.003) according to a permutation test in which the plateaus of individuals were randomly shuffled 10,000 times.
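The validation scheme can be illustrated with a small pure-Python sketch. Several parts are assumptions made for illustration only: the regressor is a deliberately simple stand-in (it keeps the single feature most correlated with the target in the training data) rather than the sparse linear regression used in the study; the permutation test reruns the whole cross-validation for each shuffle, which the text does not specify; and the p-value uses the common (count + 1)/(permutations + 1) convention. All function names and the synthetic data are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    denom = math.sqrt(va * vb)
    return cov / denom if denom > 0 else 0.0


def fit_single_feature(X, y):
    """Stand-in for sparse regression (NOT the study's method): keep the one
    feature most correlated with y and fit a straight line to it."""
    columns = list(zip(*X))
    best = max(range(len(columns)), key=lambda j: abs(pearson(columns[j], y)))
    xs = columns[best]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, y))
             / sum((a - mx) ** 2 for a in xs))
    return best, slope, my - slope * mx


def loso_predictions(X, y):
    """Leave-one-subject-out: fit on all but one subject, predict that one."""
    preds = []
    for i in range(len(y)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        j, slope, intercept = fit_single_feature(X_train, y_train)
        preds.append(slope * X[i][j] + intercept)
    return preds


def r_squared(observed, predicted):
    """R^2 as the squared correlation between observed and predicted values."""
    return pearson(observed, predicted) ** 2


def permutation_p_value(X, y, n_perm=1000, seed=0):
    """Shuffle the targets across subjects and repeat the cross-validation."""
    rng = random.Random(seed)
    observed = r_squared(y, loso_predictions(X, y))
    count = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if r_squared(shuffled, loso_predictions(X, shuffled)) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Synthetic data: 17 subjects, 5 features, target driven by feature 0.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(17)]
y = [2.0 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]
r2, p = permutation_p_value(X, y, n_perm=200)
```

Selecting the feature inside each fold, using training data only, keeps the held-out subject from influencing the model that predicts it.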

Figure 3

Scatter plot of predicted versus observed individual performance plateaus (n = 17).

Solid line is the regression line with a 95% confidence interval (gray area). Filled and open circles indicate subjects who first underwent resting-state fMRI (i.e., Rest First; n = 10) and those who first received training of the 3-back task (i.e., Task First; n = 7), respectively.

To ensure that the order of the task and the resting-state fMRI scan did not affect the prediction accuracy, we tested the difference in prediction accuracy between the two groups. After calculating the prediction error between the predicted and observed performance plateaus for each subject, we compared the errors in the Rest-First group (n = 10) with those in the Task-First group (n = 7). No significant difference was found between the two groups (Wilcoxon rank-sum test, p = 0.67). Moreover, R² was significant even when the above permutation test was performed separately for each group (Rest-First: R² = 0.62, p = 0.03; Task-First: R² = 0.87, p = 0.007).

Functional connections contributing to prediction

We investigated which connections contributed to the prediction. The sparse linear regression selected 16.24 ± 2.19 (mean ± SD) from 171 connections across 17 validation folds. We counted how many times each connection was selected (termed ‘selection count’29; Figure 2b). Under the null hypothesis that 16 connections were randomly selected from 171 connections, we tested the probability of the selection count by a binomial test. As shown by the red circles in Figure 2b, we found nine connections that were significantly frequently selected (p < 0.05, Bonferroni corrected for 171 connections). Mean selection count was 15.00 (SD: 2.60) across the nine connections (Supplementary Table S1) but only 0.87 (SD: 1.45) across the remaining connections. This indicates that the nine connections were consistently selected across validation folds.

To quantify the relative importance of the connections, we investigated the differences in contribution among the nine connections. In each validation fold, the estimated weights were multiplied by the normalized correlation coefficients (i.e., functional connectivities) at the corresponding connection and averaged across the 17 folds. ‘Contribution ratio’ was defined as a ratio of the product (weight × coefficient) at each connection to the summation of products over the 171 connections. Widths of the connection lines (edges) in Figure 4a indicate the contribution ratio of the nine connections. Red or blue edges indicate that the coefficients averaged across subjects (Figure 2a) had positive or negative functional connectivity, respectively. Note that the positive and negative functional connectivity had positive and negative averaged weights, respectively, at all nine connections. Therefore, their products (i.e., contribution ratios) were all positive, thus showing that coefficients greater in absolute value predict higher performance plateaus.
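A minimal sketch of the contribution ratio is given below. The text does not spell out which connectivity values enter the product in each fold, so this example assumes one vector of normalized connectivity values per fold; the function name and the numbers are hypothetical.

```python
def contribution_ratios(weights_per_fold, features_per_fold):
    """Contribution ratio of each connection.

    weights_per_fold[f][c]:  weight of connection c estimated in fold f.
    features_per_fold[f][c]: normalized connectivity of connection c paired
                             with fold f (an assumption, see lead-in).
    """
    n_folds = len(weights_per_fold)
    n_conn = len(weights_per_fold[0])
    # Product weight x coefficient, averaged across folds, per connection.
    mean_products = [
        sum(w[c] * x[c] for w, x in zip(weights_per_fold, features_per_fold))
        / n_folds
        for c in range(n_conn)
    ]
    total = sum(mean_products)
    return [p / total for p in mean_products]


# Toy example: 2 folds, 3 connections (not study data).
weights = [[0.5, 0.0, -0.2], [0.3, 0.0, -0.4]]
features = [[2.0, 1.0, -1.0], [2.0, 1.0, -1.0]]
ratios = contribution_ratios(weights, features)
```

In the toy example the third connection has a negative weight and negative connectivity, so its product is positive, mirroring the sign pattern described for the nine selected connections.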

Figure 4

Contribution of connections to prediction of performance plateau.

(a) Circle plot of the 18 intrinsic connectivity networks in order of relevance to working memory according to the metadata of BrainMap ICA. Contribution ratios (see text) of the nine connections that were consistently selected by a sparse linear regression model (red circles in Figure 2b) are presented as the thickness of connection lines (edges). Red and blue edges indicate positive and negative functional connectivity, respectively. (b) Network labels (left column) and their detailed definitions in BrainMap ICA (right column). Color bar indicates weight values that quantify how strongly each network is related to working memory function.

The largest contribution ratio (47.1%) was found for positive functional connectivity within the left fronto-parietal network. We found the second-largest contribution ratio (21.7%) for positive connectivity between a network including the supplementary motor and premotor areas and the frontal eye field (hereafter, we use ‘supplementary motor network’ for brevity) and a network including the primary sensorimotor cortices for hands (‘primary sensorimotor network (hand)’). In addition, a small contribution ratio (1.2%) was observed for positive connectivity between a network including the middle and inferior temporal gyri (‘lateral temporal network’) and a network including the middle frontal gyri and superior parietal lobules (‘middle frontal and parietal network’). The other six connections, all of negative functional connectivity, showed relatively lower contribution ratios (0.68–7.90%) and linked 10 networks that were widely distributed in brain areas including the frontal, parietal, temporal and occipital cortices and midbrain. The contribution ratios of all connections other than the above nine connections were nearly zero (see Supplementary Fig. S2).

Characterizing networks by the metadata

Finally, we examined how each network is related to working memory function according to BrainMap ICA6. That study used metadata classes of behavioral domains (cognitive processes) and paradigms (experimental tasks) in the BrainMap database and calculated weights that quantify how strongly each network is related to each class. These weights were normalized by Z-transformation across the classes to account for uneven sampling in the database. We averaged the weights of the networks across metadata classes related to working memory (see Supplementary Methods). In Figure 4, the weights are presented in color and networks are ranked according to the weights. Note that the negative weights result from the normalization and do not reflect negative values in the original data.

The largest weight was observed in the left fronto-parietal network, whose functional connectivity showed a primary contribution ratio. We found that five of the above six connections of negative functional connectivity were between networks that were highly (orange/yellow) and less (green/blue) relevant to working memory function. This means that stronger negative functional connectivity between the networks that are relevant and less relevant to working memory leads to higher performance plateaus (see Supplementary Fig. S3).

Discussion

Our current study tested the hypothesis that individual learning performance of cognitive function is determined by functional connections among pre-existing intrinsic networks that include both task-activated and less-activated brain regions. Consistent with the general consensus that individual difference in learning performance is attributed to functional connections within task-activated brain regions, connectivity within the left fronto-parietal network most strongly contributed to prediction of the individual performance plateau (about 47% of the contribution ratio). In addition, functional connectivity between the ‘middle frontal and parietal network’ and ‘lateral temporal network,’ both of which are known to be activated in working memory tasks, also contributed to the prediction and accounted for more than 1% of the contribution ratio. On the other hand, consistent with our hypothesis, two types of connectivity between task-activated and less-activated networks contributed to the prediction. First, positive functional connectivity between the ‘supplementary motor network’ and ‘primary sensorimotor network (hand)’ accounted for about 22% of the contribution ratio. Second, negative functional connections between task-activated and less-activated networks (six connections) represented in total more than 23% of the contribution ratio. Accordingly, functional connections within task-activated networks and those between task-activated and less-activated networks accounted for about 48% and 44%, respectively, of the contribution ratio. These results suggest that the connectivity between networks that play central roles in corresponding task execution and networks that have less relevance to the task greatly influences individual learning performance of cognitive function.

Functional connections within task-activated regions were linked with learning performance of working memory. First, connectivity within the left fronto-parietal network had the largest contribution to the prediction (47%). It has been shown that learning in working memory tasks modulates activity and structure in the fronto-parietal network1,2,3,4. Additionally, performance of working memory tasks has been correlated with task-evoked activity8,10 or functional connectivity8,9,10 within the fronto-parietal network. However, there has been no consensus on laterality: different studies have reported the importance of the right3,8, left2,10, or both hemispheres1,9. For intrinsic connectivity networks specifically, previous studies have suggested that the left fronto-parietal network is more important6,10, which is consistent with our results. Our findings widen our knowledge of the importance of the fronto-parietal network in cognitive learning by providing evidence that functional connectivity within the left fronto-parietal network contributes to predicting an individual's performance plateau of working memory. Second, the connection between ‘middle frontal and parietal network’ and ‘lateral temporal network’ was consistently selected by the sparse linear regression model, although its contribution ratio was small (~1%). These networks are known to be related to visual representations of letters30 and top-down attention31 underlying working memory function32,33.

Metadata from BrainMap ICA showed that the supplementary motor network (including supplementary motor and premotor areas as well as the frontal eye field) had the third-highest relevance to working memory tasks (Figure 4). In spite of the lower relevance of the primary sensorimotor network (hand) (11th out of 18 networks), connectivity between these networks represented more than 20% of the contribution ratio. One possible reason for this relatively large contribution may be that this connection contributes to appropriate responses. In our 3-back task, the subjects had to respond by a hand key press and thus appropriate response selection and motor output were necessary. Moreover, it has been shown that effective connectivity from the supplementary motor area to the primary motor cortex plays an important role in response inhibition34,35. Another possible reason may be that this connectivity helps recognition of the letter stimuli. Several studies reported activity in premotor36,37 and primary sensorimotor regions36 during visual recognition of letters and suggested that these regions contain sensorimotor representation of letters. Therefore, it might be possible that distributed letter representations in the sensorimotor as well as ‘lateral temporal network’ (i.e., visual representation of letters, see above) allow highly efficient processing of stimuli and contribute to a yet greater performance plateau of working memory.

We found that multiple negative connections between task-activated and less-activated networks accounted for more than 20% of the contribution ratio in total (0.68–7.90% for each; Figure 4). It is likely that these negative connections reflect background processing that is irrelevant to working memory tasks6. Many previous studies have linked stronger negative functional connections with higher cognitive functions21,22,38,39,40. For example, higher fluid intelligence has been correlated with stronger negative functional connectivity between lateral frontal cortex and the default mode network. A recent study has suggested that a developmental increase in negative connectivity between the medial prefrontal cortex and the amygdala reflects an increase in regulation of the amygdala by medial prefrontal cortex41. Regulation of a task-irrelevant network by the task-relevant network could lead to higher learning performance.

Training-induced neural changes in the default mode network were reported after intensive daily behavioral training over four weeks42. That study found, in resting state, an increase in functional connectivity within the default mode network but a decrease in connectivity between the default mode network and the lateral prefrontal/posterior parietal cortices. In addition, it has been suggested that these resting-state functional connectivities are correlated with performances of working memory tasks20,21,22. In our study, however, we observed no evidence that the default mode network contributes to learning performance on working memory tasks. A possible reason for this difference is that the previous studies recorded resting-state fMRI interleaved with task executions. Therefore, in those studies, changes in functional connectivity involving the default mode network may have reflected transient cognitive states such as concentration on tasks rather than individual traits such as learning ability. By contrast, we separated the resting-state fMRI scan from the working memory training by several days (see Methods) to exclude transient task-related effects, including concentration on the task, from the resting-state fMRI data.

Our study investigated neural substrates of performance plateaus, that is, the limits of learning capacity. These limits were associated with functional connectivity between the ‘lateral temporal network’ and ‘middle frontal and parietal network,’ connections within the left fronto-parietal network, connectivity between the ‘supplementary motor network’ and ‘primary sensorimotor network (hand),’ and multiple negative functional connections; these connections, in turn, correspond to encoding of visual information, attention to the visual target information, appropriate response selection (or sensorimotor representation of letters) and inhibition of task-irrelevant networks, respectively. Our method has a limitation in that the prediction of plateaus assumes a significant effect of training (Supplementary Fig. S1b). For subjects who exhibited significant learning effects well modeled by the inverse curve of learning time, our study suggests that limits to individual cognitive capacity after training are affected not only by task-activated networks but also by less-activated networks. Future research should investigate whether behavioral training or external stimulation technologies can overcome these individual learning-capacity limits.

Methods

Subjects

Twenty-nine healthy subjects (19–24 years old; 13 females; all right-handed) participated in the study and gave written informed consent. The experiments were conducted according to the Declaration of Helsinki and approved by the Ethics Committee at Advanced Telecommunication Research Institute International (www.atr.jp). Eighteen subjects first participated in the resting-state fMRI scan (mean 1.8 ± 1.6 d before the 3-back task), while eleven subjects first performed the 3-back task (mean 3.6 ± 2.8 d before the scan).

3-back task

As illustrated in Figure 1a, one of nine letters (‘B’, ‘C’, ‘D’, ‘F’, ‘G’, ‘H’, ‘J’, ‘K’ and ‘L’) was presented in a random sequence (0.5 s duration, 1.5 s inter-stimulus interval). The target stimuli appeared in 34.3 ± 1.2% (mean ± SD) of the trials in a session. A short break of 6.0 s was interleaved between blocks. The task was controlled using Matlab (MathWorks, Natick, MA, USA) and the Psychophysics Toolbox (Psychtoolbox-3; www.psychtoolbox.org). Subjects were allowed to take a break between sessions until they restarted the task by pressing any key. Subjects did not practice the task beforehand.

Analysis of behavioral data

We calculated d-prime as Z(hit rate) − Z(false-alarm rate) for each of the 25 sessions, where Z is the inverse of the cumulative Gaussian distribution, and thereby acquired a series of 25 d-primes for each subject. After the series (learning curves) were smoothed with a 5-session moving average, we fitted an inverse function24 (y = a − b/x) to the curves using the linear model function (lm) in the R statistical software (www.r-project.org). To test the significance of the fitting, we performed an F test for each subject. The sums of squares due to error (SSE) and regression (SSR) were calculated as

$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$

Here, $n$, $y_i$, $\bar{y}$ and $\hat{y}_i$ denote the number of data points (i.e., 21), the observed d-prime for session $i$ in the smoothed learning curve, the mean of the 21 d-primes and the predicted d-prime, respectively. We calculated the F-value as

$$F = \frac{\mathrm{SSR}/(m-1)}{\mathrm{SSE}/(n-m)},$$

where $m$ denotes the number of parameters (i.e., 2), and obtained the corresponding p-value.
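The behavioral measures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: SSE and SSR are taken to be the usual residual and regression sums of squares, the F-value is formed with (m − 1, n − m) degrees of freedom, and hit or false-alarm rates of exactly 0 or 1 (for which the inverse Gaussian is undefined) are not handled because the text does not say how they were treated. The function names are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """d' = Z(hit rate) - Z(false-alarm rate).

    Z is the inverse cumulative standard Gaussian. Rates of exactly 0 or 1
    raise an error here; any correction for them would be an assumption.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def f_statistic(observed, predicted, n_params=2):
    """F-value of a fit from observed and model-predicted values.

    Assumes SSE = sum (y_i - yhat_i)^2 and SSR = sum (yhat_i - ybar)^2,
    with (n_params - 1, n - n_params) degrees of freedom.
    """
    n = len(observed)
    mean_y = sum(observed) / n
    sse = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))
    ssr = sum((yhat - mean_y) ** 2 for yhat in predicted)
    return (ssr / (n_params - 1)) / (sse / (n - n_params))


# Toy values (not study data).
example_d = d_prime(0.8413447, 0.1586553)   # hit and false-alarm rates
example_f = f_statistic([1.0, 2.0, 3.0, 4.5], [1.0, 2.0, 3.0, 4.0])
```

With n = 21 smoothed points and m = 2 parameters, these degrees of freedom are (1, 19), in line with the F(1, 19) values reported in the Results.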

MRI data acquisition

Images were acquired with a 3T MRI scanner MAGNETOM Trio Tim (Siemens Medical Systems, Erlangen, Germany). Anatomical images were acquired for normalization to the standard brain for registration purposes. T1-weighted images were acquired first (TR = 100 ms, TE = 2.42 ms, flip angle = 60 degrees, matrix = 256 × 256, field of view = 256 mm, slice thickness = 10 mm, 10 slices, voxel size = 10 × 1 × 1 mm), followed by T2-weighted image acquisitions (TR = 6.0 s, TE = 57 ms, flip angle = 160 degrees, matrix = 256 × 256, field of view = 192 mm, slice thickness = 3.5 mm, 33 slices, voxel size = 0.75 × 0.75 × 3.5 mm). Functional images were acquired with an echo planar imaging sequence (TR = 2.0 s, TE = 30 ms, flip angle = 80 degrees, matrix = 64 × 64, field of view = 192 mm, slice thickness = 3.5 mm, 33 slices, voxel size = 3 × 3 × 3.5 mm) at rest for 5 min 4 s. During the resting-state scans, subjects were instructed to keep looking at a central fixation point, to keep still, to stay awake and not to think about specific things.

Resting-state fMRI data preprocessing

The data were processed with SPM8 (Wellcome Trust Centre for Neuroimaging, London, UK; www.fil.ion.ucl.ac.uk/spm/) on Matlab. The first two volumes were discarded to allow for T1 equilibration. The remaining data were preprocessed with slice timing correction, realignment and spatial smoothing with an isotropic Gaussian kernel of 8 mm full width at half maximum. To remove several sources of spurious variance, linear regression was performed with the following regressors and their temporal derivatives: (i) six motion parameters and the signals averaged over (ii) gray matter, (iii) white matter and (iv) cerebro-spinal fluid43. Furthermore, to reduce spurious changes in functional connectivity caused by head motion, the data were checked with a method for reducing motion-related artifacts25. Specifically, we calculated frame-wise displacement (FD) and DVARS (D: temporal derivative of time-courses, VARS: root mean square variance over voxels) and removed volumes with FD > 0.5 mm or DVARS > 0.5%, as proposed by the original article. Subjects were excluded from further analysis if the number of excluded volumes was more than 25 of the total 150 volumes.
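The volume-removal and subject-exclusion rules can be written compactly as below. This sketch assumes that FD (in mm) and DVARS (in %) have already been computed for each volume; how they are derived from the realignment parameters and the image data is not reproduced here. The function names are hypothetical.

```python
def flag_volumes(fd, dvars, fd_thresh=0.5, dvars_thresh=0.5):
    """Return True for each volume to remove: FD > 0.5 mm or DVARS > 0.5 %."""
    return [f > fd_thresh or d > dvars_thresh for f, d in zip(fd, dvars)]


def scrub(volumes, flags):
    """Keep only the volumes that were not flagged."""
    return [v for v, bad in zip(volumes, flags) if not bad]


def exclude_subject(flags, max_removed=25):
    """Exclude a subject when more than max_removed volumes are removed
    (25 of the 150 volumes in this study)."""
    return sum(flags) > max_removed


# Toy example with three volumes (not study data).
flags = flag_volumes(fd=[0.1, 0.6, 0.2], dvars=[0.2, 0.1, 0.7])
kept = scrub(["vol1", "vol2", "vol3"], flags)
```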

Network definition of functional connectivity

Network images were downloaded from the website of BrainMap ICA (www.brainmap.org/icns/). Each image of the network represents Z statistics corresponding to the power of a component voxel-by-voxel. Each image was thresholded at Z > 4 as processed in previous BrainMap ICA studies6,44.

Selection count

Statistical significance of the selection counts was tested by a binomial test. Since 16.24 ± 2.19 (mean ± SD) connections were selected in each of the 17 validation folds, we assumed a binomial distribution Bi(n, p), where n = 17 (number of validation folds) and p = 16/171 (probability of being selected from all of the connections).
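A minimal sketch of this test is shown below, assuming a one-sided (upper-tail) binomial probability and a Bonferroni threshold of 0.05/171; the function names are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Bi(n, p): chance of being selected in k or more folds."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_significant_count(n=17, p=16 / 171, alpha=0.05, n_tests=171):
    """Smallest selection count whose upper-tail probability falls below the
    Bonferroni-corrected threshold alpha / n_tests."""
    threshold = alpha / n_tests
    for k in range(n + 1):
        if binomial_tail(k, n, p) < threshold:
            return k
    return None


cutoff = min_significant_count()
p_for_15 = binomial_tail(15, 17, 16 / 171)  # tail probability for a count of 15
```

Calling `min_significant_count()` gives the selection count that a connection must reach across the 17 folds to be marked as significantly frequently selected under these assumptions.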