Complementary topology of maintenance and manipulation brain networks in working memory

Working memory (WM) is assumed to consist of a process that sustains memory representations in an active state (maintenance) and a process that operates on these activated representations (manipulation). We examined evidence for two distinct, concurrent cognitive functions supporting maintenance and manipulation abilities by testing brain activity as participants performed a WM alphabetization task. Maintenance was investigated by varying the number of letters held in WM and manipulation by varying the number of moves required to sort the list alphabetically. We found that both maintenance and manipulation demand had significant effects on behavior that were associated with different cortical regions: maintenance was associated with bilateral prefrontal and left parietal cortex, and manipulation with right parietal activity, a link that is consistent with the role of parietal cortex in symbolic computations. Both structural and functional architecture of these systems suggested that these cognitive functions are supported by two dissociable brain networks. Critically, maintenance and manipulation functional networks became increasingly segregated with increasing demand, an effect that was positively associated with individual WM ability. These results provide evidence that network segregation may act as a protective mechanism to enable successful performance under increasing WM demand.

(DRAT), which utilizes both forms of WM processing. Here maintenance was examined by assessing parametric changes in the number of letters held in WM (Set Size), and manipulation by assessing the number of discrete moves required to alphabetize the letters (Sorting Steps), both during the delay period. It was hypothesized that Set Size and Sorting Steps would have distinct effects on performance and elicit distinct parametric patterns of univariate activity. Based on neuroimaging evidence linking the maintenance of stimulus representations across a delay to bilateral prefrontal cortex (PFC) and the left inferior parietal sulcus (IPS) during both lexical 6,7 and visual working memory 8,9 , and abstract symbol manipulations to the right superior parietal lobule (SPL) 10-13 , a dissociation between these two sets of regions was expected for these two distinct but interrelated working memory processes. Lastly, to address the limited focus of previous maintenance-manipulation fMRI studies on univariate activity, we also examined network dynamics. While univariate fMRI studies emphasize significant clusters of regions predicting aspects of WM function, quantifying the dynamic relationship between subnetworks is difficult within this framework. Graph theoretical techniques help to describe the brain as a complex network composed of functionally separable subnetworks, which may modulate their segregation or integration across different levels of task demands. Furthermore, it is unclear how the basic relationship between the timeseries for two regions (e.g., the anti-correlation based on mean r values across nodes in a network) is related to more complex graph measures describing the organization of modular networks (e.g., segregation/integration). In the current study we used graph measures of network segregation and reconfiguration 14,15 to describe the dynamics of maintenance and manipulation networks as a function of maintenance or manipulation demands.
Changes in the relational complexity of a task have been associated with variations in the segregation of PFC regions 16 , as well as with more global alterations in the organization of whole-brain partitions 14,17,18 . Thus, the present study offers an intermediate approach between these local and global scales, defining widespread, task-related networks that represent concurrent maintenance and manipulation operations. Given that the goal of maintenance is to sustain information in the same state whereas the goal of manipulation is to alter this state, it was expected that a negative association would exist between the networks supporting these processes. Moreover, it was also expected that this segregation of processing would increase with task difficulty, insulating local processing regions from competing information from a separate subnetwork.
In sum, we hypothesized that Set Size and Sorting Steps would (1) have differential effects on WM performance, (2) be associated with univariate activations in different brain regions (e.g., bilateral PFC and IPS vs. right SPL), and (3) be supported by dissociable neural networks. We expected that testing these hypotheses would clarify the neural mechanisms underlying the two main types of cognitive operations mediating working memory function, maintenance and manipulation.

Results
Behavioral results. A comparison between the two types of WM processes was first examined behaviorally using behavioral responses to a Delayed Recognition Alphabetization Task (DRAT) during fMRI scanning (Fig. 1A). In this task Set Size is defined by the number of letters present within a stimulus array, while Sorting Steps are the minimum number of discrete changes required to transform the initial random letter array into the alphabetized array. The current task paradigm differs from classical maintenance/manipulation paradigms advanced by previous groups 2,4,5 in 2 critical ways. First, we consider the concurrent operation of multiple working memory processes; subjects were not cued to focus on a specific working memory process; instead, both processes were intrinsic to all trials. Second, the current paradigm quantifies not only the number of items to be held in working memory, but also the complexity of the manipulation operation to be performed during the delay. Set Size had four levels defined based on data from an initial screening session. Individual fitted accuracy functions, centered around each subject's individual Criterion, and determination of their Starting Set Size (i.e. Set Size value corresponding to the easiest condition) are shown in Fig. 1C. The number of Sorting Steps was estimated using the minimum number of sorting operations calculated from four sorting algorithms (See Methods), approximating a normal distribution within each Absolute Set Size (Fig. 1D). Figure 2 presents mean accuracy and RT data describing performance on this task. Based on likelihood ratio tests of the full model with a null model removing the relevant term (Table 1), both Relative Set Size (4 load levels titrated to individual performance) and Sorting Steps made significant and distinct contributions to both accuracy and RTs. 
Specifically, the binary logistic regression of accuracy revealed a significant effect of Relative Set Size (χ 2 = 80.07, p = 2.2e-14) and Sorting Steps (χ 2 = 22.14, p = 2.5e-6), as well as a significant Relative Set Size by Sorting Steps interaction (χ 2 = 12.35, p = 4e-4). The linear mixed effects regression applied to RT data revealed a similar pattern of findings, such that Relative Set Size (χ 2 = 45.73, p = 1.4e-11), Sorting Steps (χ 2 = 12.39, p = 4.3e-4), and their interaction (χ 2 = 10.66, p = 1.1e-3) all demonstrated significant effects. Effects of Gender and Starting Set Size were nonsignificant in both models (p > 0.05), which is not surprising given the inclusion of intercepts for subjects, as well as by-subject random slopes for the effects of Relative Set Size and Sorting Steps. These findings therefore support the approach of using these two measures to disentangle maintenance and manipulation WM mechanisms.
fMRI results. Univariate activity. Univariate analyses were used to identify regions where delay-period activity increased parametrically as a function of Relative Set Size or Sorting Steps. As shown by Fig. 3 and Table 2, models with concurrent parametric regressors show that Relative Set Size was associated with increased activity in bilateral PFC (including the middle and inferior frontal gyri, MFG and IFG), ventral parietal cortex (VPC), and the anterior cingulate cortex (ACC), whereas Sorting Steps were associated with activations in superior parietal lobule (SPL), ACC, the posterior cingulate cortex (PCC), the superior temporal gyrus (STG), and the hippocampus. Comparison of non-competing parametric maps at the single subject level confirmed that both maintenance and manipulation parameters elicited activity in overlapping middle-cingulate regions.
The strength of these unique effects is surprising, given the moderate collinearity between Set Size and Sorting Steps noted above. To investigate possible overlaps between the parametric effects of Set Size and Sorting Steps, a whole-brain conjunction analysis was performed at the subject level, using parametric fMRI models with either Set Size or Sorting Steps (but not both regressors). Significant overlapping voxels were observed only in mid-cingulate cortex and anterior SFG, indicating that these regions are sensitive to both maintenance and manipulation.

Figure 2. Mean values and standard error across subjects for accuracy (A) and RTs (B) across Relative Set Size, reflecting the number of items to be retained in WM across a 5 s delay (adjusted across subjects to 4 levels), and Sorting Steps, reflecting the number of sorting operations required to alphabetize a given letter array. Note: Statistical significance was determined by linear mixed-effects models using trial-wise information; means and error bars in this figure reflect subject-level information and are presented here for display only.

Furthermore, the model fit for each ROI was examined for explicit evidence of collinearity between the convolved parametric Set Size and Sorting Steps regressors. In order to test explicitly for the nature of the collinearity between these terms, the average Variance Inflation Factor (VIF) was calculated across runs, for each ROI.
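As a rough illustration of the collinearity check mentioned above, the following minimal sketch computes a VIF for a design with exactly two regressors, where the VIF reduces to 1 / (1 - r^2). The function names and the toy regressor values are assumptions made for illustration; the study used convolved parametric regressors averaged across runs, which are not reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def vif_two_regressors(x, y):
    """VIF when the design holds exactly two regressors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Toy (hypothetical) trial-wise values, not data from the study.
set_size = [1, 2, 3, 4]
sorting_steps = [2, 1, 4, 3]
print(round(vif_two_regressors(set_size, sorting_steps), 4))
```

A VIF of 1 indicates uncorrelated regressors, and larger values indicate that the variance of each parameter estimate is inflated by the shared variance between the two terms.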
Network analyses. The network-level analyses are organized into 3 stages: network identification and validation, basic network description, and segregation & reconfiguration analysis. These analyses began by identifying Maintenance and Manipulation networks, relying on both functional and structural information to define and validate the task-based connectivity approach. These networks were constructed with equal numbers of nodes, in order to ensure that the main network metrics (within- or between-network correlations, see below) were not biased by the number of regions contributing to that aggregate measure. First, parametric univariate activity (Fig. 4A) was masked with the 471-ROI Harvard-Oxford brain atlas in order to identify the top 5% of nodes (n = 23) for each parametric effect (Fig. 4B), as determined by the z-statistic from the parametric map within a given ROI/node; no overlapping nodes were found. To ensure an equal number of ROIs in the two networks, each ROI was ranked by its mean z-score in the parametric analyses and the top 5% of nodes were selected (a more liberal top-10% or top-20% threshold [n = 46, 92 nodes in each network] also revealed no overlap in networks). The Maintenance (blue) and Manipulation (blue) networks are visualized both as the nodes and as the connections between these nodes in Fig. 5A.
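The node-selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions: `top_nodes`, the ROI labels, and the z-values are hypothetical, and the fraction is truncated to a whole number of nodes so that 471 ROIs at 5% gives 23 nodes, matching the count reported in the text.

```python
def top_nodes(z_by_roi, fraction=0.05):
    """Return the set of ROI labels with the highest mean z-statistics."""
    n_keep = int(len(z_by_roi) * fraction)  # 471 ROIs * 0.05 -> 23 nodes
    ranked = sorted(z_by_roi, key=z_by_roi.get, reverse=True)
    return set(ranked[:n_keep])

# Hypothetical z-maps: one mean z-value per ROI for each parametric effect.
n_rois = 471
maint_z = {f"roi{i:03d}": float(i) for i in range(n_rois)}
manip_z = {f"roi{i:03d}": float(n_rois - i) for i in range(n_rois)}

maintenance = top_nodes(maint_z)
manipulation = top_nodes(manip_z)
overlap = maintenance & manipulation  # empty set means no shared nodes
print(len(maintenance), len(manipulation), len(overlap))
```

Selecting the same fraction from each map keeps the two networks equal in size, which is the property the text relies on when comparing within- and between-network averages.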
Structural network validation. Before analyzing functional within- and between-network connectivity, averaged across the putative task-related networks, patterns of structural connectivity between nodes were examined in order to test the validity of the task-based node definitions. If these networks form reliable task-based parcellations, structural network connectivity should be weaker between networks than within networks. Consistent with this idea, structural connection strength (measured using fractional anisotropy or FA) was weaker between networks than within networks. This pattern was present in both the Maintenance (t 28 = 20.5; p = 2.2e-18) and Manipulation (t 28 = 12.7; p = 3.5e-13) networks (Fig. 4C). This result suggests a structural basis for functional connectivity patterns within each task-related network, and points to a clear structural hurdle to between-network connections. While these effects may be at least partially due to greater mean distance between nodes (Maintenance network: 57.1 mm; Manipulation network: 73.7 mm; between-network: 82.4 mm), this difference is not incompatible with community membership (regions closer together are often more likely to form coherent neurocognitive networks). Thus, subsequent network analysis results are characterized in terms of two discrete networks, the "Maintenance network" and "Manipulation network". While we have demonstrated that this task-based community assignment has both functional and structural foundations, we do not assume that the same Maintenance and Manipulation networks operate for every particular WM paradigm and stimulus type.

Figure 4. Converting univariate information into multivariate topology. Thresholded parametric maps (Fig. 4), using average responses within all voxels in each ROI in the HOA471, were used in order to identify regions responsive to Set Size or Sorting Steps (A). (B) The top 5% nodes of each network were then assigned to either Maintenance or Manipulation networks, based on the parametric effect (z-score) within these nodes (p < 0.005). (C) Structural network connectivity is stronger within than between networks, helping to validate the task-based network parcellation. Independent of any functional information, nodes selected within the Maintenance or Manipulation networks showed greater connectivity than between the two putative task-related networks.
Figure 5. [...] (C) Both Maintenance and Manipulation networks became increasingly segregated with increasing Set Size, suggesting that the negative relationship between these two networks was behaviorally meaningful. Note: Statistical significance was determined by linear mixed-effects models, which may not be reflected in the averages and standard errors displayed here. Note: Z w : mean within-network correlation; Z b : mean between-network correlation.

Effects of Set Size and Sorting Steps on summary measures of functional network connectivity. Next we return to our two principal measures of Maintenance and Manipulation functions and examine the effects of increasing Set Size or Sorting Steps, respectively, on within- and between-network connections (Fig. 5A) in the same discrete Maintenance and Manipulation networks defined above. Here, two reliable patterns were found that helped explain how increasing computational complexity in the behavioral domain manifests as a more segregated cortical system in which local networks predominate over more global connectivity. As illustrated by Fig. 5B, a significant main effect of Network Connection Type on connectivity (i.e., mean correlation value, F 1,39 = 215.23, p < 0.001) was found, such that the mean correlation values were stronger within the Maintenance and Manipulation networks than between the networks. When difficulty was split by Set Size, within-network connectivity in both the Maintenance and Manipulation networks was consistently positive (one-sample t values collapsing across levels were 5.31 and 4.43, respectively, both p < 0.001), as may be expected for networks defined by their task-relatedness. Splitting these same networks by Sorting Steps elicited similar effects. Chi-squared tests accounting for subject-level differences in mean connectivity demonstrated no effect of difficulty on within-network connectivity in either the Maintenance or Manipulation network, whether difficulty was defined by Set Size (χ 2 = 0.3, χ 2 = 0.5, respectively for each network, both p > 0.1) or by increasing number of Sorting Steps (χ 2 = 0.6, χ 2 = 0.2, respectively, both p > 0.1). This result suggests that the connectivity between nodes within each network was consistent across all levels of difficulty, and that any difficulty-related changes are driven largely by between-network connections.
Interestingly, in contrast with the positive within-network connections (range for Maintenance network: r = 0.22-0.25, Manipulation network: r = 0.11-0.13), between-network correlations were consistently negative (mean r across levels = −0.04; one-sample t-test collapsing across levels: t 40 = −3.57, p = 4.62e-3). Furthermore, mean between-network connectivity declined with increasing Set Size (χ 2 = 3.81, p = 4.5e-2) and showed a marginal decline with increasing number of Sorting Steps (χ 2 = 3.51, p = 5.4e-2), indicating that the correlation between nodes in these two task networks tends to decline linearly with increasing complexity, consistent with a behaviorally meaningful relationship.
Network re-organization and its behavioral consequence. To examine system-level organization, two derived measures of overall network organization were calculated. Segregation, which describes the difference between within-and between-network correlations as a proportion of mean within-system correlation, is defined by the segregation coefficient, a node-level measure describing the degree to which local nodes become more connected to other local nodes within a network compared to nodes outside the local network (Fig. 5C). Within both the Maintenance and Manipulation networks, the segregation coefficient showed a clear linear increase with increasing Set Size (χ 2 = 4.53, χ 2 = 4.48, respectively, both p < 0.05), further supporting the idea that the global organization tended towards increasingly segregated network nodes. In contrast, increasing Sorting Steps did not elicit the same effect in either the Maintenance or Manipulation networks (χ 2 = 2.34, χ 2 = 0.1, respectively, both p > 0.1), suggesting that the segregation effect was driven by changes in Set Size. While not a planned comparison due to the fact that WM operations studied here reflect overlapping processes (and thus overlapping connectome information) we performed an explicit post hoc WM Operation (Set Size, Sorting Steps) × Difficulty (4 levels) ANOVA, to test whether the observed difficulty-related effect on segregation was indeed specific to Set Size. The lack of a significant WM Operation x Difficulty interaction (all χ 2 < 1.0, for either Maintenance or Manipulation networks), coupled with the upward trends in Fig. 5C suggests that Sorting operations, in addition to Set Size may drive difficulty-related increases in segregation in a larger sample.
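The segregation measure described above, the difference between within- and between-network correlations expressed as a proportion of the mean within-network correlation, can be written as (Z w − Z b) / Z w. The sketch below is a minimal network-level illustration under stated assumptions: the function names and the 4-node correlation matrix are hypothetical, and any transformation of the correlation values (e.g., Fisher z) is omitted because it is not specified in this section.

```python
from itertools import combinations

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

def segregation(corr, network, other):
    """Segregation of one network: (Zw - Zb) / Zw.

    corr    -- square correlation matrix as a list of lists
    network -- node indices belonging to the network of interest
    other   -- node indices belonging to the other network
    """
    z_within = mean(corr[i][j] for i, j in combinations(network, 2))
    z_between = mean(corr[i][j] for i in network for j in other)
    return (z_within - z_between) / z_within

# Toy (hypothetical) matrix: nodes 0-1 form one network, nodes 2-3 the other.
corr = [
    [1.0, 0.4, -0.1, -0.1],
    [0.4, 1.0, -0.1, -0.1],
    [-0.1, -0.1, 1.0, 0.2],
    [-0.1, -0.1, 0.2, 1.0],
]
maintenance, manipulation = [0, 1], [2, 3]
seg_maint = segregation(corr, maintenance, manipulation)
seg_manip = segregation(corr, manipulation, maintenance)
print(round(seg_maint, 3), round(seg_manip, 3))
```

With positive within-network and negative between-network correlations, as reported in the text, the coefficient exceeds 1, and it grows as between-network correlations become more negative.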
Lastly, network reconfiguration was analyzed using a summary statistic that describes the individual differences in network reconfiguration across the task conditions (Fig. 6A). Here, it was found that network reconfiguration was greater in connections between Maintenance and Manipulation networks than within either task network alone (t = 9.84, p = 4.1e-12; t = 11.10, p = 1.2e-13, respectively; see marginal rug plots in Fig. 6B). Furthermore, network reconfiguration in these between-network connections was predictive of subjects' individual Criterion for the WM task, which describes the idealized 82% level of behavioral performance (r 39 = 0.39, p = 0.012), while within-network reconfiguration was not (r 39 = 0.17 and 0.30 for Maintenance and Manipulation networks, respectively, both p > 0.05). Results were similar when splitting networks by the number of Sorting Steps, with a slight increase in the correlation between between-network reconfiguration and Criterion (r 39 = 0.41, p = 0.007); a Fisher r-to-z transformation found no difference between these two correlations for between-network connections (z = 0.15, p = 0.94), suggesting that the reconfiguration of between-network connections was critical for successful memory whether the difficulty of the task was driven by Set Size or Sorting Steps. The direction of these effects demonstrates that individuals with higher working memory capacity have greater changes in between-network functional connectivity in response to increasingly difficult task conditions, suggesting that network reconfiguration in working-memory related regions is adaptive to task demands.
Figure 6. Network Reconfiguration. (A) While the segregation measure above is descriptive of network behavior at discrete levels of difficulty, network reconfiguration describes the overall similarity between task conditions, i.e., between network states. Network reconfiguration represents a direct comparison between network states, and in this case represents an average of the correlation values between all functional connection matrices for a given subject. (B,C) Network reconfiguration was higher in between- than within-network connections, and between-network reconfiguration (but not within-network reconfiguration) was predictive of individual differences in working memory ability (i.e., Criterion), whether reconfiguration was assessed across Set Size (B) or Sorting Steps (C).
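The comparison between network states described for the reconfiguration analysis can be sketched as an average of pairwise correlations between condition-wise connectivity patterns. This is only an illustration under stated assumptions: the function names and toy values are hypothetical, each network state is represented as a flat list of connection strengths, and how this similarity value maps onto the reported reconfiguration score (for example, whether it is inverted) is not stated in this section.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def mean_state_similarity(states):
    """Average Pearson r over all pairs of condition-wise connection vectors."""
    pairs = list(combinations(states, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)

# Toy (hypothetical) between-network connection strengths for one subject,
# one vector per difficulty level.
states = [
    [-0.02, -0.05, 0.01, -0.03],
    [-0.03, -0.06, 0.00, -0.02],
    [-0.06, -0.04, -0.02, -0.05],
    [-0.08, -0.03, -0.04, -0.07],
]
print(round(mean_state_similarity(states), 3))
```

Values near 1 indicate that the connectivity pattern is nearly unchanged across difficulty levels, whereas lower values indicate that the pattern differs more between network states.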

Discussion
Going beyond previous fMRI studies on WM maintenance versus manipulation, the current study investigated these processes using a novel behavioral paradigm in which maintenance and manipulation are assessed simultaneously within the same trial by indexing maintenance in terms of Set Size (number of letters) and manipulation in terms of Sorting Steps (number of sorting operations to alphabetize a letter array). This paradigm improves upon standard maintenance/manipulation task paradigms by explicitly examining specific manipulation operations ongoing with a concurrent maintenance demand to retain items in memory. The study yielded three main findings. First, it was found that Set Size and Sorting Steps made significant and independent contributions to accuracy and RTs, supporting the distinction between maintenance and manipulation. Second, maintenance and manipulation recruited distinct frontal-parietal patterns of univariate activity: maintenance was associated with a bilateral fronto-parietal network, as typical in WM tasks, whereas manipulation was associated with greater activity in the right SPL, a region associated with symbolic computations. Third, summary measures of the functional connectivity between the Maintenance and Manipulation networks demonstrated a negative association which increased with task demand, suggesting the action of a protective mechanism against interference of the cognitive operations within the two networks. These three main findings are discussed below.
WM maintenance and manipulation are dissociable in behavior. The first goal of this study was to provide evidence that distinct, concurrent processes underlie basic working memory operations. To the authors' knowledge, this is the first study in which WM maintenance and manipulation have been investigated within the same trial. Moreover, it is also the first instance in which WM manipulation has been linked to a specific measure of the computation required by the task, namely the alphabetization of letters, which requires individuals to sort letters into an ordered array. Here, the number of sorting steps was quantified using established algorithms from the computer science literature (insertion sort, selection sort, etc.). As there is no known evidence of a specific sorting algorithm that humans use to alphabetize letter arrays (or number arrays), the most efficient solution among these algorithms was chosen for each trial. Although the Set Size and Sorting Steps measures were correlated, significant main effects for each measure suggested that each process makes an independent contribution to response performance, and that it was possible to disentangle their effects on WM behavior (accuracy and RTs) and brain activity. Nonetheless, their significant interaction also suggests a quantifiable cooperation between brain regions mediating these complementary cognitive processes.
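The idea of taking the most efficient solution across several sorting algorithms can be sketched as follows. This is a minimal illustration and not the authors' implementation: only two of the algorithms named in the text are included, and the step-counting conventions (one step per swap for selection sort, one step per relocated letter for insertion sort) are assumptions made here.

```python
def selection_sort_steps(letters):
    """Count swaps made by selection sort (assumed: one step per swap)."""
    items = list(letters)
    steps = 0
    for i in range(len(items)):
        # Index of the smallest remaining letter.
        j = min(range(i, len(items)), key=items.__getitem__)
        if j != i:
            items[i], items[j] = items[j], items[i]
            steps += 1
    return steps

def insertion_sort_steps(letters):
    """Count letters relocated by insertion sort (assumed: one step each)."""
    items = list(letters)
    steps = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i
        while j > 0 and items[j - 1] > key:
            items[j] = items[j - 1]  # shift larger letters to the right
            j -= 1
        if j != i:
            items[j] = key
            steps += 1
    return steps

def sorting_steps(letters):
    """Minimum step count across the algorithms considered in this sketch."""
    return min(selection_sort_steps(letters), insertion_sort_steps(letters))

print(sorting_steps("CDB"), sorting_steps("BCD"))
```

For the array C-D-B, insertion sort needs a single relocation (moving B to the front) whereas selection sort needs two swaps, so the minimum is taken from insertion sort; an already alphabetized array requires no steps.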

Distinct univariate brain activity for concurrent maintenance and manipulation operations.
Satisfying the second goal of this study, strong evidence was found for concurrent univariate brain activity tracking separate maintenance and manipulation operations during the WM task. Set Size was associated with activations in bilateral frontal and parietal regions, whereas Sorting Steps was associated with selective recruitment of a right SPL region, as well as activations in ACC, STG, and hippocampus. Below, we consider the two sets of regions associated with maintenance and manipulation. The finding that maintenance was associated with the activity of a bilateral dorsolateral fronto-parietal network is consistent with fMRI evidence linking these regions to WM capacity 21,22 . In previous studies, maintenance, as indexed by Set Size or load, has sometimes been linked not only to dorsolateral PFC (DLPFC), but also to more inferior ventrolateral PFC activity 7,23,24 , but this link has been challenged 25 . Also, several studies have linked DLPFC to manipulation, not to maintenance 2,26,27 . However, these studies investigated maintenance and manipulation in separate tasks. A problem with this general isolated approach to maintenance and manipulation is that, compared to maintenance tasks (e.g., holding letters in order), manipulation tasks (e.g., alphabetizing) involve not only greater manipulation (e.g., sorting) but also greater maintenance (e.g., holding both original and reorganized letter sequences), as well as interference that may arise between the two processes. Thus, the differential involvement of DLPFC in manipulation tasks could reflect increased maintenance demands, rather than specific manipulation operations.
The finding that Sorting Steps during alphabetization was associated with right SPL is intriguing because this is not a region typically associated with WM manipulation. However, the link between Sorting Steps and right SPL is consistent with the role of this region in symbol computation. For example, activations in right SPL have been reported in almost every neuroimaging study of numerosity, including tasks primarily involved in basic quantity processing, including precise number processing and numerical operations 11,12,28 . However, the role of SPL is not limited to number-based operations. There is also evidence of this region being similarly activated across tasks manipulating both numbers and letters, which may be the result of one or more underlying computational processes shared across domains of symbol manipulation 10,29 . Thus, although right SPL is more commonly associated with number processing, its engagement in this task is likely the result of a more general process involved in all symbol-based computation.
In addition to right SPL, the number of Sorting Steps was also associated with STG and hippocampus. The hippocampus is commonly associated with successful spatial WM 30 , and STG activity is often found during auditorily presented working memory tasks 31,32 . In this task, the activation in hippocampus may be associated with the mental rearrangement of the letters in space, and the STG with imagery of the letters while alphabetization was taking place. The involvement of this constellation of regions therefore suggests that symbol computation and rehearsal may be an intrinsic part of working memory manipulation.
WM Maintenance and Manipulation networks are negatively associated. The third goal of this study was to investigate whether maintenance and manipulation differ not just in univariate activity but also in network interactions, as measured using graph theory. It was found that nodes in the Maintenance and Manipulation networks were consistently segregated across task conditions, such that summary measures of between-network connectivity were consistently negative. In addition to this general negative correlation, nodes between the two networks showed a consistent linear decrease in connectivity with increasing number of items, and increasing segregation with increasing task difficulty. These results suggest that these two dissociable networks maintain segregated connectivity in order to dissociate the cognitive processes. The increasingly negative relationship with increasing difficulty suggests that these networks become more segregated to combat the interference of these processes as cognitive demand increases. Recent empirical work has begun to focus on how selective network properties change between increasingly complex task conditions 28,33 , and how such changes in the modular structure of functional brain networks relate to behavior.
While changes in modular structure in response to task difficulty have been observed now in a number of studies, one discrepancy is in the direction of the effect: both increases 34 and decreases 18,35 in modularity have been reported with increasing task complexity. The discrepancy in these findings may be related to the use of global network variables (e.g., global efficiency) and global network assignments (e.g., default mode network, salience network, etc.), both of which may conflate task-specific operations with operations or regions unrelated to the task at hand. In this context, the task-specific network approach used here first identified specific cortical nodes with relevance to the task, and then offered a clear mechanistic demonstration that the interaction between these systems is modulated by the task demands. This is supported by the increasing network segregation with task difficulty, suggesting that the maintenance of the letter arrays in working memory is increasingly protected from the interference generated by the manipulation of this information. Nonetheless, one result which unites these findings is that the degree to which individual subjects are able to make flexible adjustments in functional network structure is a strong predictor of behavioral performance 15,36 . In particular, individuals who showed greater dynamic reconfiguration across maintenance or manipulation levels had better working memory capacity (as estimated by the subject-level Criterion values). Furthermore, this effect was limited to the reconfiguration of between-network connections (Fig. 6), highlighting the key role of internetwork connectivity in mediating flexible behaviors. How such modular architecture supports the dynamic integration of many high-level cognitive functions remains far from understood, but the present results highlight the importance of task-related connectivity in WM maintenance and manipulation.

Conclusions
The current study presents evidence and arguments for two distinct cognitive functions supporting WM processing during short delays. We examined evidence for significant and independent contributions of Set Size and Sorting Steps in a WM alphabetization task, contributions reflecting Maintenance and Manipulation operations, respectively. These dissociable operations were mirrored in the univariate fMRI results, such that distinct patterns of bilateral fronto-parietal (Maintenance) and right-lateralized SPL (Manipulation) networks were activated. Lastly, we found that connectivity between these networks was increasingly segregated as difficulty increased, and that this effect was positively related to individual WM ability. This analysis therefore suggests the action of a protective mechanism against interference of the cognitive operations within dissociable components of the WM system.

Methods
Participants. Forty-four young adults aged 18 to 35 (mean 22.8 ± 4.6, 23 female, 21 male) participated in the study for monetary compensation; informed consent was obtained from all subjects under a protocol approved by the Duke Medical School IRB. All procedures and analyses were performed in accordance with IRB guidelines and regulations for experimental testing. Participants had no history of psychiatric or neurological disorders and were not using psychoactive drugs. These participants were enrolled in a 6-day TMS protocol, but only data from the Screening session (Day 1) and MR Imaging (Day 2) are reported here. Three individuals were excluded because of poor functional imaging quality (due to excessive movement or falling asleep during the scan), and hence 41 participants are included in the analyses.
Behavioral procedures. The study investigated a Delayed Recognition Alphabetization Task (Fig. 1). In this task, an array consisting of 3-9 consonant letters was presented for 3 seconds followed by a 5-second delay period during which participants mentally reorganized letters into alphabetical order. Vowels were excluded to prevent chunking. After the delay period, a letter and number were presented together for 4 seconds and the participants pressed one of three buttons to indicate if the probe letter (1) appeared in the position indicated by the number in the alphabetized list (Valid, 40% of trials), (2) was part of the original set but the number did not match the position in the alphabetized list (Invalid, 40% of trials), or (3) was not part of the original set (New, 20% of trials). These three types of trials occurred in random order. For all three conditions, the probe was never from the first half of the alphabetized array, and in the Invalid condition, to exclude obvious differences between correct and incorrect position, the number above the letter was always within 1 step of the letter's actual alphabetized position.
During the subject-specific titration on Day 1 (see the following paragraph for more information), the response phase was followed by a 5-second (mean) inter-trial interval (ITI). During practice (10 trials), participants were given feedback during this ITI on the accuracy of their previous trial response. Twenty-five trials were included in each of the 6 blocks with a brief, self-paced rest interval between blocks.
As part of the overall protocol, subjects participated in 6 experimental sessions, but only the first two are relevant to this study. In the first session, participants performed the DRAT outside the scanner, while seated at a computer terminal, in order to identify the range of Set Sizes optimal for each participant. The optimal Set Size was identified using a 2-down-1-up staircase procedure: when a trial was answered correctly, the Set Size was increased by 1, and when it was answered incorrectly, the Set Size was decreased by 2. Accuracy data for each Set Size were then fitted to a sigmoid function, with Criterion set at 82% accuracy. The two Set Sizes with sigmoid-fitted accuracy immediately greater than Criterion were defined as Very Easy and Easy levels, and the two Set Sizes with accuracy immediately below Criterion were defined as Medium and Hard levels. Thus, the four Set Size levels selected for an individual depended on his/her WM ability (e.g., 3-4-5-6 letters in one participant, 4-5-6-7 in another participant). This method balanced task demands across participants. To ensure that the psychometric function was not strongly influenced by noise for Set Sizes with a low number of trials, 50% accuracy was used for the largest Set Sizes if fewer than 10 trials were tested. To achieve more stable curve fits, peripheral anchors were added by including points for Set Sizes of 1 and 2 at 100% accuracy and Set Sizes 10 and 11 at 50% accuracy.
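The level-selection rule described above can be illustrated with a minimal Python sketch. The descending logistic form, its `midpoint` and `slope` parameters, and the function names are assumptions made for illustration only; the text specifies just a sigmoid fit, the 82% Criterion, and the 50% anchors at large Set Sizes.

```python
import math

def fitted_accuracy(set_size, midpoint, slope, floor=0.5):
    """Hypothetical descending sigmoid running from 1.0 down to `floor`."""
    return floor + (1.0 - floor) / (1.0 + math.exp(slope * (set_size - midpoint)))

def select_levels(acc_by_size, criterion=0.82):
    """Pick the two Set Sizes just above Criterion and the two just below it.

    Assumes fitted accuracy decreases monotonically with Set Size.
    """
    above = sorted(s for s, a in acc_by_size.items() if a > criterion)
    below = sorted(s for s, a in acc_by_size.items() if a <= criterion)
    if len(above) < 2 or len(below) < 2:
        raise ValueError("need at least two Set Sizes on each side of Criterion")
    very_easy, easy = above[-2:]   # the two largest sizes still above Criterion
    medium, hard = below[:2]       # the two smallest sizes at or below Criterion
    return {"Very Easy": very_easy, "Easy": easy, "Medium": medium, "Hard": hard}

# Illustrative (made-up) fit parameters for one participant.
acc = {s: fitted_accuracy(s, midpoint=5.5, slope=1.5) for s in range(3, 10)}
levels = select_levels(acc)
```

With these made-up parameters the fitted curve crosses Criterion between Set Sizes 5 and 6, so the participant would be assigned the 4-5-6-7 range.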
In the second session, participants performed the DRAT inside the scanner. Four blocks, each with 30 trials, were performed using the 4 difficulty levels defined from session 1 performance, with equal numbers of trials for each of the 4 difficulty levels, pseudorandomly chosen across the 4 blocks. Stimuli were back-projected onto a screen located at the foot of the MRI bed using an LCD projector. Subjects viewed the screen via a mirror system located in the head coil and the start of each run was electronically synchronized with the MRI acquisition computer. Trial-by-trial feedback was not given, but the overall accuracy was presented at the end of each block. Behavioral responses were recorded with a 4-key fiber-optic response box (Resonance Technology, Inc.). Scanner noise was reduced with ear plugs, and head motion was minimized with foam pads. When necessary, vision was corrected using MRI-compatible lenses that matched the distance prescription used by the participant. The total scan time, including breaks and structural scans, was approximately 1 h 40 min.

MRI scanning and data preprocessing. MRI was performed in a 3-T GE scanner at the Duke Brain Imaging and Analysis Center (BIAC). Structural MRI and DWI scans were followed by 4 fMRI runs of the DRAT. The anatomical MRI was acquired using a 3D T1-weighted echo-planar sequence (matrix = 256², TR = 12 ms, TE = 5 ms, FOV = 24 cm, slices = 68, slice thickness = 1.9 mm, sections = 248). In the fMRI runs, coplanar functional images were acquired using an inverse spiral sequence (64 × 64 matrix, time repetition …). Functional images were preprocessed using image processing tools, including FLIRT (FMRIB's Linear Image Registration Tool) and FEAT (FMRIB Expert Analysis Tool) from FMRIB's Software Library (FSL, http://fmrib.ox.ac.uk/fsl), in a publicly available pipeline developed by the Duke Brain Imaging and Analysis Center (https://wiki.biac.duke.edu/biac:analysis:resting_pipeline). Images were corrected for slice acquisition timing, motion, and linear trend; motion correction was performed using FSL's MCFLIRT, and the 6 motion parameters estimated in this step were then regressed out of each functional voxel using standard linear regression. Images were then temporally filtered with a high-pass filter using a 190 s cutoff, and normalized to the Montreal Neurological Institute (MNI) stereotaxic space. White matter and CSF signals were also removed from the data: WM/CSF masks were generated by FAST, and the corresponding signals were regressed from the functional data using the same method as for the motion parameters. Spatial filtering with a Gaussian kernel of full-width half-maximum (FWHM) of 6 mm was applied. Brain images were visualized using MRIcroGL (https://www.nitrc.org/projects/mricrogl/).

Experimental Design and Statistical Analyses. Behavioral Analyses. Accuracy and RTs of correct DRAT trials were analyzed in terms of Set Size and Sorting Steps using linear mixed effects models, as implemented by R and lme4. Set Size had four levels, Very Easy, Easy, Medium, and Hard, which were defined based on data from the first session. Individual fitted accuracy functions, centered around each subject's individual Criterion, and determination of their Starting Set Size (i.e., the Set Size value corresponding to the Very Easy condition) are shown in Fig. 1C. Across the sample of 41 participants, 12 had a Starting Set Size of 3; 19 had a Starting Set Size of 4; 9 had a Starting Set Size of 5; and 1 had a Starting Set Size of 6. In all future references, Relative Set Size refers to the individually titrated load of four Set Sizes for each subject (beginning with their Starting Set Size, then +1 item, +2 items, and +3 items) quantified across four discrete levels (1-4), whereas Absolute Set Size refers to the original number of letters in an array.
Sorting Steps is the minimum number of discrete changes required to transform the initial random letter array into the alphabetized array. The number of Sorting Steps was estimated using the minimum number of sorting operations calculated from four sorting algorithms 37: insertion, selection, merge insertion, and merge selection. Insertion consists of processing each letter one-by-one and inserting it into the correct alphabetized position. Selection consists of identifying the earliest letter in the alphabet and swapping it with the letter occupying the correct position. Merge insertion and merge selection are similar to insertion and selection, respectively, but they subdivide the letter array into two sub-arrays, sort within each of them, and then combine the results. Assuming that participants used the most efficient strategy, Sorting Steps was calculated as the minimum number of reordering steps from among the four algorithms on each trial. Given the logical complexities in orthogonalizing the Absolute Set Size and Sorting Steps factors, letters were selected at random, which yielded an approximately normal distribution of Sorting Steps within each Absolute Set Size (Fig. 1D).
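As a rough illustration of this step count, the Python sketch below computes the minimum over the insertion and selection strategies only. How a "step" is counted within each algorithm is an assumption here (one step per letter that has to be re-inserted, and one step per swap); the merge variants are omitted and the function names are hypothetical.

```python
def insertion_steps(letters):
    """Count letters that must be moved when inserting each letter in turn."""
    arr = list(letters)
    steps = 0
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]   # shift larger letters one place to the right
            j -= 1
        if j + 1 != i:            # the letter actually changed position
            steps += 1
        arr[j + 1] = key
    return steps

def selection_steps(letters):
    """Count swaps when repeatedly moving the earliest remaining letter into place."""
    arr = list(letters)
    steps = 0
    for i in range(len(arr)):
        m = min(range(i, len(arr)), key=arr.__getitem__)
        if m != i:
            arr[i], arr[m] = arr[m], arr[i]
            steps += 1
    return steps

def sorting_steps(letters):
    """Minimum step count across the strategies considered in this sketch."""
    return min(insertion_steps(letters), selection_steps(letters))
```

Under these counting assumptions, an already alphabetized array needs 0 steps, while a fully reversed three-letter array needs 2 insertions but only 1 swap, so its step count is 1.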
Absolute Set Size and Sorting Steps were moderately correlated (r = 0.51). The distribution of Sorting Steps approximated a normal distribution within each level of Absolute Set Size (all Shapiro-Wilk tests, W = 0.81-0.95, p > 0.05), though increasing Set Size was naturally associated with a wider distribution in the number of Sorting Steps for that level (Fig. 1D). To confirm that both Set Size and Sorting Steps had significant and independent effects on performance, linear (for RT) and logistic (for accuracy) regression analyses were conducted. In all subsequent analyses, Relative Set Size is used as the measure of Set Size to best standardize the level of difficulty across all subjects. RTs were analyzed only for correct trials using a linear model fitted by restricted maximum likelihood. Accuracy was analyzed using a binomial logistic model including all trials. Both models were linear mixed effects analyses performed with R (R Core Team, 2012) and lme4 38. Relative Set Size and Sorting Steps were entered as fixed effects, together with their interaction term (Set Size by Sorting Steps), which was tested in order to account for additional variance attributed to increasing Sorting Steps across the 4 levels of difficulty. As random effects, intercepts for subjects and by-subject random slopes for Relative Set Size and Sorting Steps were entered. Gender, age, and each subject's Starting Set Size were also included as covariates to account for the standardization of difficulty levels across subjects. P-values were obtained by likelihood ratio tests of the full model with the effect in question against the model without the effect in question.
Residuals showed no evidence of heteroscedasticity (Levene's test, F = 1.43, p = 0.25) or deviation from normality (Shapiro-Wilk test, W = 0.95, p = 0.69). There were no missing data, but participants failed to respond within the permitted 4-second time window on 1.6% of trials (79 out of 4920). These trials were excluded from all analyses.
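The likelihood ratio tests mentioned above compare the log-likelihoods of two nested models. The Python sketch below shows the arithmetic for the simplest case, assuming the two models differ by a single parameter (1 degree of freedom); the function name is hypothetical and the log-likelihood values would come from the fitted models, not from this code.

```python
import math

def likelihood_ratio_test(loglik_full, loglik_reduced):
    """Return (statistic, p-value) for nested models differing by one parameter.

    Assumes 1 degree of freedom, for which the chi-square survival function
    reduces to erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the full model should fit at least as well as the reduced model")
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For effects that remove more than one parameter, the general chi-square distribution with the matching degrees of freedom would be needed instead.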
fMRI analyses. A parametric approach was used to investigate how activity varied as a function of Relative Set Size and Sorting Steps. First-level voxel time-series analysis was carried out using general linear modeling (GLM) implemented in the FEAT toolbox of FSL. Fixed effects models were used to examine the parametric effects of Set Size and of the number of sorting operations necessary to alphabetize each trial; separate events were modeled for the array presentation (duration: 3 s), delay period (duration: 5 s), and response (duration: subject response time), each with an onset at the beginning of the event. Weighted regressors during the delay period were used to model the difficulty associated with different WM operations. The first regressor increased linearly with the array's Set Size to model the parametric increase in difficulty with increased letter load. The second weighted regressor reflected the minimum number of sorting steps needed on a given trial. Both of these parametric regressors were orthogonalized to the non-parametric delay-period regressor, which modeled the trial period when maintenance and manipulation are likely to operate concurrently. Incorrect and non-response trials were modeled identically, but separately, and were not considered in the results below. Subsequent to individual-level models, random-effects analysis was performed on the parameter estimates of the parametric regressors (p < 0.005, cluster correction: z > 2.0).

Cortical Parcellation. We used a consistent parcellation scheme across all subjects and all modalities (DWI, fMRI). Subjects' T1-weighted images were segmented using SPM12 (www.fil.ion.ucl.ac.uk/spm/software/spm12/), yielding a grey matter (GM) and white matter (WM) mask in the T1 native space for each subject. The entire GM was then parcellated into 471 regions of interest (ROIs), each representing a network node, by using a subparcellated version of the Harvard-Oxford Atlas (HOA) 39, defined originally in MNI space. The T1-weighted image was then nonlinearly normalized to the ICBM152 template in MNI space using FMRIB's Non-linear Image Registration Tool (FNIRT, FSL, www.fmrib.ox.ac.uk/fsl/). The inverse transformations were applied to the HOA in MNI space, resulting in native-T1-space GM parcellations for each subject. Then, T1-weighted images were coregistered to native diffusion space using the subjects' unweighted diffusion image as a target; this transformation matrix was then applied to the GM parcellations above, using FSL's FLIRT linear registration tool, resulting in a native-diffusion-space parcellation for each subject.
Structural connectivity. Structural connections between each pair of regions, based on diffusion tractography, were assessed with a standard DWI processing pipeline used previously in our group 40,41. DWI data were analyzed using the FSL (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) and MRtrix (http://mrtrix.org) software packages. Data were de-noised, corrected for eddy currents, and bias-field corrected using MRtrix and FSL. Constrained spherical deconvolution (CSD) was used to calculate the fiber orientation distribution (FOD) 42. This FOD was used along with the brain mask to generate whole-brain tractography, with seeding done at random within the mask. Relevant parameters regarding tract generation are as follows: seed = at random within mask; step-size = 0.2 mm; 10,000,000 tracts. After tracts were generated, they were filtered using SIFT (spherical-deconvolution informed filtering of tractograms) in order to improve the quantitative nature of the whole-brain streamline reconstructions used here 43. This algorithm determines whether a streamline should be removed based on information obtained from the FOD, and improves the selectivity of structural connectomes by using a cost function to eliminate false positive tracts. Tracts were SIFTed until 1 million tracts remained. Connectomes were then generated by using FLIRT to linearly register the HOA parcellations mentioned above to native diffusion space; the resulting connectomes describe the mean fractional anisotropy (FA) for all fiber tracts connecting any pair of regions.
Functional connectivity. Functional connection matrices representing task-related connection strengths were estimated using a correlational psychophysiological interaction (cPPI) analysis 40,41, which yields a whole-brain connectivity matrix describing task-related interactions between brain regions. Briefly, the model relies on the calculation of a PPI regressor for each region (or node), based on the product of that region's timecourse and a task regressor of interest, in order to generate a term reflecting the psychophysiological interaction between the seed region's activity and the specified experimental manipulation.
Network definition. Cortical parcellation was first performed using the methods detailed above; all code that underpins the subsequent network analysis is openly available and contributions from the research community are encouraged: www.github.com/ElectricDinoLab. Next, the convolved task regressors from the univariate model described above were used as the psychological regressors, which were originally coded as either a) the unmodulated (weight = 1) delay for each trial, b) the Set-Size-modulated delay regressor (range = 1-4), or c) the Sorting Operations-modulated delay regressor (range = 0-7); all regressors were mean-adjusted in FSL. Additional psychological regressors were modeled on the onsets for encoding (i.e., letter array) and response (i.e., cue) periods, but were not used in the connectivity analysis. The delay-period regressors were each multiplied with the timecourses of two regions, i and j. Partial correlations $\rho_{\mathrm{PPI}_i,\mathrm{PPI}_j \cdot z}$ were then computed by removing the variance z, which includes both the psychological regressor and the time courses for regions i and j, as well as constituent noise regressors including 6 motion parameters and noise regressors coding for the concurrent signal in white matter and CSF during each run. In order to compare equally reliable estimates of connectivity delineated by either Set Size or Sorting Steps, the distribution of Sorting Steps within each individual was interpolated from the 0-7 range to 4 levels (1-4), such that an equal number of trials was used to estimate connectivity values for each parameter. This cPPI analysis resulted in 8 separate output matrices, comprising connectivity delineated by Set Size (4 levels) or Sorting Steps (also 4 levels).
Scientific Reports | (2018) 8:17827 | DOI:10.1038/s41598-018-35887-2
Task-related connectivity was estimated from the resulting output matrices; negative connections were included in these analyses, as they may inform important, explicit interpretations about how networks may be segregated 44 . Graph metrics, including modularity (describing the modular organization of the whole-brain graph) and strength (describing a sum of the connectivity strengths for each node) were computed using the Brain Connectivity Toolbox as described previously 40 and, when appropriate, summed across all nodes within a task-related network.
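The partial-correlation step of the cPPI analysis described above can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the pipeline used in the study: the PPI terms are formed by directly multiplying each regional timecourse with a mean-centered psychological regressor, the nuisance set z is removed by ordinary least squares, and the function and argument names are hypothetical.

```python
import numpy as np

def cppi_partial_corr(ts_i, ts_j, psych, confounds):
    """Partial correlation between the PPI terms of regions i and j.

    ts_i, ts_j : 1-D regional timecourses
    psych      : 1-D mean-centered task regressor (e.g., a modulated delay regressor)
    confounds  : 2-D array (time x regressors), e.g., motion, white matter, CSF
    """
    ppi_i = ts_i * psych
    ppi_j = ts_j * psych
    # z: intercept, psychological regressor, both timecourses, and noise regressors
    z = np.column_stack([np.ones_like(psych), psych, ts_i, ts_j, confounds])
    # Residualize each PPI term with respect to z, then correlate the residuals
    res_i = ppi_i - z @ np.linalg.lstsq(z, ppi_i, rcond=None)[0]
    res_j = ppi_j - z @ np.linalg.lstsq(z, ppi_j, rcond=None)[0]
    return float(np.corrcoef(res_i, res_j)[0, 1])
```

Applying this function to every pair of nodes would fill one symmetric connectivity matrix per task regressor.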
Maintenance and Manipulation networks were defined by using both functional and structural information. First, parametric univariate activity from voxelwise maps was averaged within individual regions of interest (ROIs) within the 471-ROI Harvard-Oxford brain atlas. Each ROI was then ranked by its mean z-score for each parametric effect, and the top 5% of nodes were classified as either Maintenance or Manipulation network nodes. Both networks were constructed with equal numbers of nodes, in order to ensure that the main network metrics (within- or between-network correlations, see below) were not biased by the number of regions contributing to that aggregate measure. Lastly, structural connectivity information, i.e., the fractional anisotropy (FA, a measure denoting white matter organization) of each pairwise connection between all network nodes (5% of 471 = 23 Maintenance nodes, 23 Manipulation nodes), was assessed for both within- and between-network connection strength.
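The node-selection rule can be written as a short Python sketch. It assumes the input is a list holding one mean z-score per ROI for a given parametric effect; the function name is hypothetical, and how a node appearing in both rankings would be handled is not specified in the text and is not addressed here.

```python
def top_nodes(mean_z, fraction=0.05):
    """Return indices of the ROIs with the highest mean z-scores.

    The number of nodes kept is rounded down, so 5% of 471 ROIs gives 23 nodes.
    """
    n_keep = int(len(mean_z) * fraction)
    ranked = sorted(range(len(mean_z)), key=lambda i: mean_z[i], reverse=True)
    return ranked[:n_keep]
```

Calling the function once with the Set Size z-scores and once with the Sorting Steps z-scores would give two equally sized node lists.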
Network segregation and reorganization measures. Lastly, in order to summarize the system-wide behavior of the two task-related networks, two derived measures of overall network organization were calculated. First, a previously reported 14 measure of system segregation was used. This measure was calculated as the difference between the mean within-system correlation and the mean between-system correlation, expressed as a proportion of the mean within-system correlation:

$$\text{System segregation} = \frac{\bar{Z}_w - \bar{Z}_b}{\bar{Z}_w}$$

where $\bar{Z}_w$ is the mean of r-values between nodes of one partition, module, or system (similar to within-module degree, or WMD), and $\bar{Z}_b$ is the mean of r-values between nodes of separate partitions (similar to between-module degree, or BMD) 40. Accordingly, values greater than 0 reflect relatively lower between-system correlations in relation to within-system correlations (i.e., stronger segregation of systems), and values less than 0 reflect higher between-system correlations relative to within-system correlations (i.e., diminished segregation of systems).
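The segregation measure described above can be sketched in a few lines of Python. The sketch assumes a symmetric connectivity matrix stored as nested lists and two non-overlapping lists of node indices; pooling the within-system connections of both networks into a single mean is an assumption, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import mean

def system_segregation(conn, net_a, net_b):
    """(mean within-system r - mean between-system r) / mean within-system r."""
    within = [conn[i][j] for net in (net_a, net_b) for i, j in combinations(net, 2)]
    between = [conn[i][j] for i in net_a for j in net_b]
    z_w = mean(within)
    z_b = mean(between)
    return (z_w - z_b) / z_w

# Toy 4-node example: nodes 0-1 form one network, nodes 2-3 the other.
conn = [
    [0.0, 0.8, 0.1, 0.1],
    [0.8, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.6],
    [0.1, 0.1, 0.6, 0.0],
]
segregation = system_segregation(conn, [0, 1], [2, 3])
```

In this toy example the within-system mean is 0.7 and the between-system mean is 0.1, giving a segregation value of about 0.86.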
Second, a network reconfiguration measure was developed to describe the similarity in functional connectivity across the task conditions. While the segregation measure above is descriptive of network behavior at discrete levels of difficulty, network reconfiguration describes the overall similarity between task conditions, i.e., between network states. Network reconfiguration represents a direct comparison between network states, and in this case is based on the average of the correlation values between all functional connection matrices for a given subject:

$$\text{Reconfiguration} = 1 - \frac{2}{n(n-1)} \sum_{x<y} \rho_{x,y}$$

where n is the number of states (e.g., 4 in this case), and $\rho_{x,y}$ represents the Spearman's correlation between the functional connectivity profiles representing two brain states x and y (e.g., functional connectivity matrices representing Easy and Medium difficulty levels in this case). Thus, highly correlated matrices represent low reconfiguration (closer to 0), while weakly correlated matrices represent high reconfiguration across task conditions (closer to 1). Given the explicit hypotheses concerning segregation and integration of the putative Maintenance and Manipulation networks, reconfiguration was examined within three subsets of connections: a) connections within the Maintenance network, b) connections within the Manipulation network, and c) connections between both networks.
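A minimal Python sketch of the reconfiguration measure follows. It assumes the measure equals one minus the mean pairwise Spearman correlation between the upper triangles of the state-wise connectivity matrices, which is inferred from the description above (highly correlated states give values near 0); all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def rank(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(rank(x), rank(y))

def upper_triangle(matrix):
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def reconfiguration(states):
    """1 - mean Spearman correlation over all distinct pairs of states."""
    vectors = [upper_triangle(m) for m in states]
    rhos = [spearman(a, b) for a, b in combinations(vectors, 2)]
    return 1.0 - mean(rhos)
```

To restrict the measure to within-Maintenance, within-Manipulation, or between-network connections, the same function could be applied to vectors containing only the corresponding matrix entries.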