Temporal sequences of brain activity at rest are constrained by white matter structure and modulated by cognitive demands

A diverse set of white matter connections supports seamless transitions between cognitive states. However, it remains unclear how these connections guide the temporal progression of large-scale brain activity patterns in different cognitive states. Here, we analyze the brain’s trajectories across a set of single time point activity patterns from functional magnetic resonance imaging data acquired during the resting state and an n-back working memory task. We find that specific temporal sequences of brain activity are modulated by cognitive load, associated with age, and related to task performance. Using diffusion-weighted imaging acquired from the same subjects, we apply tools from network control theory to show that linear spread of activity along white matter connections constrains the probabilities of these sequences at rest, while stimulus-driven visual inputs explain the sequences observed during the n-back task. Overall, these results elucidate the structural underpinnings of cognitively and developmentally relevant spatiotemporal brain dynamics.

i. j. k. l. Sinusoidal signal X1 with additive Gaussian noise is plotted in blue. Signal X2, generated by subtracting X1 from a slower sine wave, plus Gaussian noise, is plotted in red. k-means clustering extracts three visually obvious coactivation states (partition shown in light red, green, and blue). (b) Plot of X1 amplitude (x-axis) vs. X2 amplitude (y-axis), yielding a Pearson's r = −0.70. (c) X1 and X2 concatenated into a data matrix, Xa. We concatenated Xa with two channels of Gaussian noise to generate a 4-column data matrix Xn in order to use the correlation distance metric. (d) Centroids, or "states," obtained through k-means clustering of Xn. Three states emerge; the first state consists of low-and high-amplitude activity in X1 and X2, respectively. The second state consists of low amplitude activity in X2 with near 0 activity in X1. The third state consists of high amplitude activity in X1 with low amplitude activity in X2. (e) The first 3 principal component (PC) loadings of Xa. PC1 captures States 1 and 2, but no single PC captures State 3. (f ) The cross-correlation structure of Xa, which does not reveal the states in panel (d). (g) Signal X1, generated by Gaussian noise plus random positive sinusoidal activations is plotted in blue. Signal X2, generated by Gaussian noise plus synchronized coactivation or coinactivation with X1, is plotted in red. k-means clustering extracts three visually obvious coactivation states (partition shown in light red, green, and blue). (h) Plot of X1 amplitude (x-axis) vs. X2 amplitude (y-axis), yielding a Pearson's r =0.018. (i) X1 and X2 concatenated into a data matrix X b . We concatenated X b with two channels of Gaussian noise to generate a 4-column data matrix Xm in order to use the correlation distance metric. (j) Centroids, or "states," obtained through k-means clustering of Xm. Three states emerge; the first consists of high-amplitude activity in X1 with low-amplitude activity in X2. The second consists of near 0 activity in X1 and X2. The third consists of high-amplitude activity in both X1 and X2.  . BOLD data exhibits clustering in regional activation space. We performed k-means clustering on rest and n-back data from the n = 879 sample studied in the main text at k=5 using the correlation distance in a 462-region activation space. Here we show silhouette scores for rest and n-back time points from 40 random subjects; however, results are consistent across random sets of 10 subjects, suggesting marked reliability. Silhouette scores range from -1 to 1 and are computed for each data point, with 1 indicating that a data point is closer to members of its assigned cluster than to members of the next closest cluster, 0 indicating equidistance between the assigned cluster and the closest cluster, and -1 indicating that the data point is closer to another cluster. (a) Silhouette values for data points from 462 independent, normally distributed channels. (b) Silhouette values for data from an independent phase randomized (IPR) null model applied separately to resting state and n-back BOLD data. This null model preserves regional autocorrelation while eliminating non-stationarities and reducing covariance. (c) Silhouette values for actual resting state and n-back BOLD data. (d) Distribution of silhouette values across all clusters for a null model preserving autocorrelation (red, µipr) and for actual data (blue, µ real ). A two-sample t-test confirms that silhouette values are larger in the real data, indicating that the data are clustered in regional activation space beyond what is expected from a signal with the same regional autocorrelations. (e) Cluster centroids at k = 5 using IPR resting state and n-back BOLD data. (f ) Spatial correlation between IPR null centroids in panel e (y-axis) and full sample centroids (x-axis).

SUPPLEMENTARY FIGURE 4.
Similarity between states in rest and n-back. (a) Spatial correlation between cluster centroids reveals anticorrelation between DMN-and DMN+, between DMN-and FPN+, and between VIS+ and VIS-. (b) Spatial correlation between centroids calculated separately for rest and n-back reveal high correspondence, consistent with the identification of recurrent activity patterns common to both scans. (c) Cluster centroids computed by including equal amounts of rest and n-back task data as input to the clustering algorithm. Cluster names based on maximum cosine similarity were identical to the full sample centroids. (d) Spatial correlation between the 6 minute rest and the 6 minute n-back task cluster centroids and the full sample cluster centroids. Correlation coefficients of r > 0.99 were found only on the diagonal, suggesting 1-to-1 correspondence between the two centroid sets. The observed off-diagonal anticorrelations are consist with those observed in the full sample, as shown in panel (a). (e) Group average state transition probabilities for rest (right) and n-back (left) using the 6 minute n-back task cluster partition reveals similar structure and high correlation with full sample state transition probabilities. (f ) The y-axis shows the percentage of subjects missing at least one state in their time series for rest (purple) and for n-back (yellow ), for values of k on the x-axis ranging from 2 to 18. There existed at least one subject with missing states for all k > 5, supporting the choice of k = 5 for the main text. SUPPLEMENTARY FIGURE 5. Key findings reproduced at k = 5 using the 234-node Lausanne parcellation. (a) Cluster centroids at k = 5 are similar to that of the 463-node Lausanne parcellation. (b) State fractional occupancies change with increasing cognitive load similarly compared to the 463-node parcellation analysis. (c-d) Correlation between structurebased transition energy prediction (x-axis) and empirically derived transition probability (y-axis) for resting state (left) and the 2-back condition of the n-back task (right), using inputs weighted evenly throughout the whole brain (c) or weighted positively towards the visual system (d). SUPPLEMENTARY FIGURE 6. Brain states and dynamics in an independent sample with higher sampling rate and no global signal regression. (a) Cluster centroids for clustering of rest and n-back task BOLD data from the Human Connectome Project (HCP) with volumes acquired 4 times as frequently as the PNC and no global signal regression. (b) Spatial correlation between centroids for HCP and PNC data sets, is high along the diagonal, allowing for unambiguous matching of brain states between the two samples.  Group average, mean functional connectivity within and between cognitive systems defined a priori 1 . We first computed average functional connectivity matrices by averaging functional connectivity matrices over rest and n-back scans for each subject. Next, we computed the average edge strength within and between regions in each a priori defined system to yield mean within-and between-system functional connectivity. Finally, we computed the group average mean within-and between-system functional connectivity across the n = 879 subjects from the main text, the values of which are displayed in black text overlaying each heatmap element. *, p < 0.05 for a one-sample t-test comparing the mean of the distribution of correlations across subjects to 0.   10. Spatial and topological properties of brain structure facilitate selective brain state stability. (a) Construction of null states preserving symmetry and spatial clustering of activity using sphere-based permutation 2 . We compare the minimum control energy required to maintain the brain in each state (persistence energy) relative to spatially permuted states. (b) We computed persistence energy for each state and its permuted variants in group average SC (orange) and found that the DMN+ state required less energy to persist than its permuted variants. We performed the same test in two null models: a null model preserving topology (blue) and a null model preserving both topology and spatial constraints (light blue). Orange *, pcorr < 0.05 after Bonferroni correction over each of the 5 states, separately for each null model. (Deg. Pres.), degree distribution-preserving null model 3 . Space Pres., degree sequence, edge weight and length distribution and relationship preserving null model 4 .

a.
b.
c. d. Supplementary Notes, Discussion, and Methods

Sample exclusion criteria
We excluded 722 of the initial 1601 subjects for the following reasons: medical problems that may impact brain function, incidental radiologic abnormalities in brain structure, poor or incomplete FreeSurfer reconstruction of T1 images 5 , high motion in rest or n-back fMRI scans, high signal-to-noise ratio or poor coverage in task-free or n-back task BOLD images, and failure to meet a rigorous manual and automated quality assurance protocol for DTI 6 . Notably, our goal in constructing a sample was to compare structure-function relationships between contexts across all subjects in our sample. This analysis required highly stringent inclusion criteria that only included subjects with high quality data for rest BOLD, n-back task BOLD, and DTI.

Functional Scan Types
During the resting-state scan, a fixation cross was displayed as images were acquired. Subjects were instructed to stay awake, keep their eyes open, fixate on the displayed crosshair, and remain still. Total resting state scan duration was 6.2 minutes. As previously described 7 , we used the fractal n-back task 8 to measure working memory function. The task was chosen because it is a reliable probe of the executive system and is not contaminated by lexical processing abilities that also evolve during adolescence 9,10 . The task involved the presentation of complex geometric figures (fractals) for 500 ms, followed by a fixed interstimulus interval of 2500 ms. This occurred under the following three conditions: 0-back, 1-back, and 2-back, inducing different levels of working memory load. In the 0-back condition, participants responded with a button press to a specified target fractal. For the 1-back condition, participants responded if the current fractal was identical to the previous one; in the 2-back condition, participants responded if the current fractal was identical to the item presented two trials previously. Each condition consisted of a 20-trial block (60 s); each level was repeated over three blocks. The target/foil ratio was 1:3 in all blocks, with 45 targets and 135 foils overall. Visual instructions (9 s) preceded each block, informing the participant of the upcoming condition. The task included a total of 72 s of rest, while a fixation crosshair was displayed, which was distributed equally in three blocks of 24 s at the beginning, middle, and end of the task. Total task duration was 693 s. To assess performance on the task, we used d , a composite measure that takes into account both correct responses and false positives to separate performance from response bias 11 .
Imaging data acquisition and preprocessing MRI data were acquired on a 3 Tesla Siemens Tim Trio whole-body scanner and 32-channel head coil at the Hospital of the University of Pennsylvania. High-resolution T1-weighted images were acquired for each subject. For diffusion tensor imaging (DTI), 64 independent diffusionweighted directions with a total of 7 b = 0 s/mm 2 acquisitions were obtained over two scanning sessions to enhance reliability in structural connectivity estimates 12 . All subjects underwent functional imaging (TR = 3000 ms; TE = 32 ms; flip angle = 90 degrees; FOV = 192 × 192 mm; matrix = 64×64; slices = 46; slice thickness = 3 mm; slice gap = 0 mm; effective voxel resolution = 3.0 × 3.0 × 3.0 mm) during the resting-state sequence and the n-back task sequence 12,13 . During resting-state and n-back task imaging sequences, subjects' heads were stabilized in the head coil using one foam pad over each ear and a third pad over the top of the head in order to minimize motion. Prior to any image acquisition, subjects were acclimated to the MRI environment via a mock scanning session in a decommissioned scanner. Mock scanning was accompanied by acoustic recordings of gradient coil noise produced by each scanning pulse sequence. During these sessions, feedback regarding head motion was provided using the MoTrack motion tracking system (Psychology Software Tools, Inc., Sharpsburg, PA).
Raw resting-state and n-back task fMRI BOLD data were preprocessed following the most stringent of current standards 14,15 using XCP engine 16 : (1) distortion correction using FSL's FUGUE utility, (2) removal of the first 4 volumes of each acquisition, (3) template registration using MCFLIRT 17 , (4) de-spiking using AFNI's 3DDESPIKE utility, (5) demeaning to remove linear or quadratic trends, (6) boundary-based registration to the individual high-resolution structural image, (7) 36parameter global confound regression including framewise motion estimates and signal from white matter and cerebrospinal fluid, and (8) first-order Butterworth filtering to retain signal in the 0.01 to 0.08 Hz range. Following these preprocessing steps, we parcellated the voxel-level data using the 463-node Lausanne atlas 18 . We excluded the brainstem, leaving 462 parcels. Our choice of parcellation scale was motivated by prior work showing that parcellations of this scale replicate voxelwise clustering results more than coarser scales with fewer parcels 19 . We excluded any subject with mean relative framewise displacement > 0.5 mm or maximum displacement > 6 mm during the n-back scan, and mean relative framewise displacement > 0.2 mm for the resting state scan.
All DTI datasets were subject to a rigorous manual quality assessment protocol that has been previously described 6 . The skull was removed by applying a mask registered to a standard fractional anisotropy map (FM-RIB58) to each subject's DTI image using an affine transformation. The FSL EDDY tool was used to correct for eddy currents and subject motion and rotate diffusion gradient vectors accordingly. Distortion correction was applied using FSL's FUGUE utility. DSI studio was then used to estimate the diffusion tensor and perform deterministic whole-brain fiber tracking with a modified FACT algorithm that used exactly 1,000,000 streamlines per subject excluding streamlines with length < 10 mm 20,21 . Lausanne 463-node atlas parcels were extended into white matter with a 4 mm dilation 20,21 and then registered to the first b = 0 volume using an affine transform 6,20 . For all analyses, edge weights in the structural network were defined by the average fractional anisotropy value for streamlines connecting each pair of parcels 22 .

Split-halves validation of clustering
To ensure that our final clustering solution for k = 5 was not influenced by outliers or adversely impacted by model overfitting, we split our sample into two equal partitions 500 times and performed k-means clustering separately on each half of the dataset. We then matched clusters by computing the cross-correlation between both sets of centroids, and then by reordering the clusters based on the maximum correlation value for each cluster. We plotted those maximum correlation values and found that most were > 0.99, suggesting a high degree of robustness and stability in brain states ( Supplementary Fig. 2d). We also computed the state transition probabilities and state persistence probabilities for each half separately for rest and n-back, and then computed the correlation between the transition or persistence probabilities between the two data set partitions. Similarly, we found very high correlation values (> 0.99) for state transition probabilities and state persistence probabilities for both rest and n-back (Fig 2e-f). These observations suggest that our estimates of brain dynamics are robust to outliers and consistent across different subsamples of our data.

Functional connectivity does not fully explain instantaneous coactivation
We posit that the alignment between a priori resting state functional networks 1 (RSNs) and our coactivation patterns cannot be fully explained by analyzing interregional correlations in BOLD signal across time, which is often called "functional connectivity." Here, we provide evidence that suggests that our coactivation patterns are consistent with but not trivially explained by functional connectivity. We also provide examples that explain why temporal correlation does not necessarily explain instantaneous coactivation in general, and demonstrate that k-means clustering is a useful tool for extracting coactivation patterns.
In general, time series data can contain (1) unexpected coactivation patterns in the presence of temporal corre-lation and (2) unexpected coactivation patterns in the absence of temporal correlation. Here, we provide two examples that illustrate how k-means clustering can resolve dissociation between temporal correlation and instantaneous coactivation in multi-dimensional time series data. First, we show two stationary signals that are anticorrelated across time ( Supplementary Fig. 1b, r = −0.70), but exhibit three unexpected coactivation patterns (rather than two simple "on-off" and "off-on" patterns) due to fluctuation between variable amplitudes (Supplementary Fig. 1a-b). In this example, one sinusoidal signal fluctuates between three distinct amplitudes and the other sinusoidal signal simply fluctuates between a peak and a trough (Supplementary Fig. 1a). The combination of these signals yields 3 distinct coactivation patterns, rather than the 2 patterns that would be expected from a pair of temporally anticorrelated signals ( Supplementary Fig. 1c-e).
Next, we show two signals that are uncorrelated across time, but still have multiple spontaneous coactivation patterns ( Supplementary Fig. 1a). Because these two signals sometimes activate together and at other times exhibit opposing activity, their temporal correlation is near 0 (r = 0.018), yet there are 3 unique coactivation patterns in the time series that k-means faithfully extracts. Positive temporal correlations could occur either through simultaneous activation or simultaneous inactivation, while temporal anticorrelation can result from opposing activity patterns. One could envision how adding in additional signals (i.e. dimensions or brain regions) could provide additional degrees of freedom to construct a range of coactivation patterns for a given temporal correlation structure. The sensitivity of the k-means approach to coactivation patterns in these two contexts supports its use in fMRI data for identifying previously unknown spontaneous interactions between neural networks, as well as the temporal organization of those networks.
When we apply k-means clustering to high-dimensional BOLD data from resting state and n-back scans (Fig.  2a), we identify activity patterns with both expected and unexpected features based on functional connectivity. In general, we saw that RSNs show coherent high or low amplitude activity in each state (Supplementary Fig.  8c). This finding reflects the strong positive correlations within RSNs (Supplementary Fig. 8a). We also see coactivation patterns consistent with patterns of functional connectivity between RSNs. For instance, the DMN+ state (Fig. 2a) shows spatial anticorrelation between the temporally anticorrelated dorsal attention network and default mode network 23 (Supplementary Fig. 8a, mean r = −0.10, one-sample t-test, df = 878, t = −80.45, p < 10 −15 ). However, we also identify coactivation patterns that do not trivially reflect functional connectivity. The DMN-state centroid (Fig. 2a) consists of low amplitude activity throughout the default mode network (DMN) with high amplitude activity throughout somatomotor and visual systems (Fig. 2c). When clustering on functional connectivity 1 , these three systems emerge as separate. Mean correlations within these systems are positive ( Supplementary Fig. 8a-b), but correlations between them are near zero (SOM and VIS) or weakly negative (DMN with SOM and VIS) (Supplementary Fig. 8c). Based solely on functional connectivity, one would not expect these three systems to strongly coactivate or oppose one another at the level of individual BOLD frames (TRs), yet our analysis suggests that there are many TRs with low DMN activity, high VIS activity, and high SOM activity ( Supplementary Fig. 8d-g). Moreover, VIS and SOM have two additional configurations of coactivation represented in the VIS+ and VIS-states (Fig. 2a-c). In the VIS-state, we see high amplitude SOM activity with low amplitude VIS activity, suggesting that at times VIS and SOM systems coactivate and at other times they oppose one another. In the VIS+ state we see low amplitude SOM activity with high amplitude VIS activity. Purely based on functional connectivity, one might inappropriately draw the conclusion that these two RSNs are independent given their mean correlation of zero, when in reality they have 3 distinct configurations at the level of individual time points. The behavior of VIS and SOM is most consistent with the scenario presented in Supplementary Fig. 1. Overall, these findings suggest that our analysis identifies recurrent activity patterns whose spatial organization reflects strong temporal correlations within RSNs, but also with coactivation between RSNs that cannot be trivially explained by temporal correlations between RSNs.

Silhouette analysis of clustering
In order to support the use of a discrete model of brain dynamics, we asked whether individual BOLD time points from rest and n-back scans exhibited clustering in a 462-region activation space. We used the silhouette scores of BOLD time points as a measure of clustering in this space. Silhouette scores range from -1 to 1 and are computed for each data point, with 1 indicating that a data point is closer to members of its assigned cluster than to members of the next closest cluster, 0 indicating equidistance between the assigned cluster and the closest cluster, and -1 indicating that the data point is closer to another cluster. We compared the silhouette scores for real BOLD data points from the PNC, data points from 462 independent random Gaussian distributions, and data points from independent phase randomized null time series 24 based on subject-specific BOLD data. This independent phase randomized null model (IPR) preserves the autocorrelation within each region, but destroys covariance between regions. When we compared the silhouette scores for clustering of real BOLD data to independent random Gaussian and autocorrelation-preserving null data, we found that the real data had higher mean silhouette scores than that of the autocorrelation-preserving null data ( Supplementary Fig. 3a, difference in mean silhouette score actual minus null = 0.047, two sample t-test, df = 27598, t = 115.03, p < 10 −15 ). Additionally, the centroids generated by clustering this null data had little obvious structure (Supplementary Fig. 3e) and showed little similarity to the original centroids ( Supplementary  Fig. 3f). These findings suggest that BOLD data exhibit non-trivial clustering in regional activation space.

Impact of scan composition on brain states and dynamics
To ensure that our results were not biased by the fact that there were a larger number of n-back volumes (225 per scan) than rest volumes (120 per scan), we used the partition generated by clustering both entire scans together to compute separate centroids for volumes in rest or n-back scans. This analysis revealed a mean spatial Pearson correlation of 0.96 between corresponding centroids ( Supplementary Fig. 4b). Next, we generated a new sample by concatenating the first 6 minutes of the n-back task data for each subject and the entire 6 minutes of the rest data for each subject. We ran this sample through the clustering algorithm at k = 5 and found that the cluster centroids ( Supplementary Fig. 4c) were highly similar to those computed from the full sample (mean Pearson r = 0.99; Supplementary Fig. 4d). We also computed transition probabilities using this cluster partition and identified highly similar group average transition matrix structure (rest, Pearson r = 0.997, nback, Pearson r = 0.989, Supplementary Fig. 4e), suggesting that the temporal order of state labels was largely unaffected by the scan composition of the sample. Moreover, these results suggest that n-back state transitions are internally consistent.
Finally, we show that the differences between rest and n-back in the proportion of subjects with any absent states ( Supplementary Fig. 4f) is attenuated relative to the full sample ( Supplementary Fig. 2e). This finding suggests that in the full sample, the n-back task data has better state representation due to better sampling, rather than poor classification of rest volumes. However, even with equal samples, the rest dataset still has more subjects with missing states, suggesting that there may be more variability in brain dynamics during rest. Importantly, there were still no subjects with absent states for rest or n-back at k < 5 ( Supplementary Fig. 4f), and there was at least 1 subject with a missing state in rest or n-back at k > 5 (though bars are very small in Supplementary Fig. 4f). Collectively, these results support the simultaneous generation of partitions for rest and n-back volumes and the choice of k = 5 for analysis in the main text.
Impact of sampling rate, global signal regression, and head motion on brain states and dynamics The BOLD data from the PNC was acquired at a sampling rate of one volume every 3 seconds 12 , which is relatively slow compared with other large data sets, including the Human Connectome Project 25 , which samples every 0.72 seconds. The standard preprocessing pipeline for this data set involves regression of head motion parameters, white matter confounds, cerebrospinal fluid confounds, and global signal from each voxel's time series 16,26 . It is controversial whether this procedure, known as "global signal regression," induces anticorrelation 27,28 .
Thus, we selected 100 unrelated subjects from the minimally preprocessed version of the Human Connectome Project (HCP) data set 29 and performed the following preprocessing steps on resting state and n-back working memory task scans: (1) head motion regression, (2) linear and quadratic detrending, (3) bandpass filtering to retain the 0.01 to 0.08 Hz range, and (4) parcellation according to the 462 region Lausanne atlas. We concatenated all 405 volumes from the working memory task with the first 405 volumes from the resting state over all 100 subjects. We chose to make the number of volumes from each scan equal so that the clustering algorithm would not be biased towards one scan or the other.
Next, we performed k-means clustering on this matrix and computed the centroids (Supplementary Fig. 6a). Every HCP centroid was maximally correlated with only one PNC centroid, and vice versa, allowing for unambiguous matching between the two sets of brain states (Supplementary Fig. 6b). The DMN+ and DMN-states were the most similar to PNC states, while VIS+ and VISexhibited slightly lower correlations (Supplementary Fig.  6b). DMN+ and DMN-states, as well as VIS+ and VISstates, exhibited strong anticorrelation with each other (Supplementary Fig. 6c). Nevertheless, the HCP offdiagonal elements of the transition probability matrices (i.e. transitions not persistence) for rest and 2-back block of the n-back task were correlated with PNC transition probabilities at r = 0.74 and r = 0.57, respectively (Supplementary Fig. 6d-e). This finding suggests that while there were differences in the spatial activity patterns of brain states, their dynamic progression through time was relatively similar. The unexplained variance between the two samples could be due to differences in age, with the PNC comprised of developing youths and the HCP comprised of healthy adults. Notably, task dynamics were less similar between the two groups, possibly reflecting stronger age-related changes in task dynamics relative to resting state. Moreover, the differences in transition probabilities between 2-back and rest were highly similar in HCP and in PNC ( Supplementary Fig. 6e), with increased transitions from DMN and FPN states into VIS states. Overall, these findings suggest that while global signal regression and sampling rate may impact the spatial activity patterns comprising brain states to some de-gree, it does not impact estimation of their dynamics or the presence of spatial anticorrelation in their activity patterns.
Finally, we tested whether the identification of recurrent coactivation patterns with k-means clustering was biased by the inclusion of single frames with submillimeter framewise displacement. We removed 76,339 frames associated with > 0.1 mm framewise displacement, and repeated the clustering on the remaining 226,916 frames using correlation distance and k = 5. The resulting centroids (Supplementary Fig. 7a) were nearly identical to the centroids found in Figure 2a (Supplementary Fig. 7b, all r > 0.99 for corresponding centroids). These findings suggest that high motion frames minimally impact the clustering process. Therefore, we included these frames so that we could have the largest continuous sample of sequential frames from which to compute transition probabilities and dwell times.
Assessing randomness, asymmetry, autocorrelation, and distance dependence of brain state sequences In addition to assessing the relationship between brain state transitions, cognitive demands, and behavior, we were also interested to assess important basic properties of the state transition probability matrix. First, we were interested to validate previous findings which suggest that the brain does not undergo every possible transition with equal probability 30 . For these analyses, we begin with a transition matrix ( Supplementary Fig.  9a-b) whose ij th element indicates the probability that state i occurs at time t and state j occurs at time t + t r , where t r is the repetition time (TR) of the scan (here, 3 seconds). The state transition probability matrix houses several pieces of important information. We refer to the diagonal entries in the transition probability matrix as the persistence probabilities, because they indicate the probability of remaining in a given state, and we refer to the off-diagonal entries in the transition probability matrix as the transition probabilities, because they indicate the probability of transitioning between two distinct states. Given this structure, we were interested to determine whether the brain dynamics that we observed could occur in a uniformly random distribution of states and state transitions. To test the randomness of persistence probabilities, we generated subject-level null state time series that preserved fractional occupancy but shuffled the temporal sequence of states; for example, if the state time series was given by the vector [1 1 2 2 3 3], then we would permute the order of the entries in that vector uniformly at random, yielding a distribution of vectors with the same proportion of each state, i.e. [1 2 1 3 3 2]. By comparing the observed persistence probabilities to the persistence probabilities in this null model, we can test whether the observed persistence probabilities would be expected based solely on fractional occupancy. We averaged together all subjects to generate a distribution of group average null persistence probabilities and compared them to the empirically observed group average persistence probabilities.
To test the randomness of transition probabilities, we generated null state time series that preserved only the states involved in transitions and reduced sequences of repeating states to a single state. We removed repeating states to control for the potentially independent effects of state persistence, which is equivalent to temporal autocorrelation, in estimating transition probabilities. For example, if the state time series was given by the vector V = [112233], then we would reduce that original vector to the new vector V t = [123], and subsequently permute the new vector uniformly at random. Specifically, for subject i = 1, .., ..N , we reduce the state sequence vector V i to a transition sequence vector V ti by eliminating repeating states, compute a transition matrix T i , and average across all subjects to generate a group average transition matrix T = 1 N N i=1 T i that excludes state persistence (i.e. diagonal entries of T i are equal to 0). Next, we shuffle V ti uniformly at random to generate V ni , compute a transition matrix T ni , and average across all subjects to generate a group average null transition matrix T n = 1 N N i=1 T ni that excludes state persistence. We generate a distribution of T n by independently shuffling V ti for each subject many times and averaging them across subjects. Finally, we compare each element of T to the corresponding element in a distribution of T n to compute a two-tailed, non-parametric p-value for each transition. These analyses demonstrated that almost all of the observed persistence and transition probabilities in both resting state and the n-back task were unexpected under these uniformly random null models (all p corr < 0.05 except for DMN-to DMN+ during n-back, Supplementary  Fig. 9a-b).
Next, we assessed the properties of these transition matrices, such as the matrix symmetry, which reflects whether transitions from state 1 → 2 occur as frequently as transitions from state 2 → 1, and so forth for every state pair. Specifically, we quantified the asymmetry ψ of a k-by-k transition matrix A as: In calculating this asymmetry score, we exclude the elements along the diagonal of A so as to only capture directional bias in transitions between pairs of states, without including the probability of persisting in each state. The values of this score range from 0 to 1, where 0 represents a matrix that is symmetric about the diagonal and 1 represents a matrix in which the upper triangle is −1× the lower triangle.
We were also interested in how much information about future states was contained within the current state. Drawing from information theoretic approaches to analysis of discrete signals, we computed the normalized auto mutual information (NMI) between lagged state time series to answer this question. Here, we asked whether the current state contains information about the subsequent state by computing NMI between the original state time series and a state time series lagged by one element. First, we created two copies of the state time series. Then, we removed the first element from one, X, and the last element from the other, Y, to generate two vectors of equal length such that X i = Y i+1 . We computed the NMI between X and Y as: where H(X) is the entropy of X and H(X|Y) is the conditional entropy of X given Y. Where k is the number of brain states, and In normalizing by H(X), we ensure that the NMI ranges from 0 to 1, with 0 representing two completely independent signals and 1 representing two identical signals. We anticipated that stimulus driven activity in the nback task that occurs independently of the current brain state would result in a reduction of directional, asymmetric state transitions relative to resting state and a reduction in the dependency of state transitions on the current state. Indeed, using the normalized measure of matrix skewness described above, we found that transition probabilities at rest were more asymmetric than during the n-back task (Supplementary Fig. 9d; paired t-test, µ nback−rest = −0.062, df = 878, t = −24.96, p < 10 −15 )s. Additionally, using the normalized automutual information metric described above, we found that the current brain state carried less information about the subsequent state during the n-back task relative to rest, even when controlling for autocorrelation ( Supplementary Fig. 9f: normalized auto mutual information, µ nback−rest = −0.028, df = 878, t = −20.88, p corr < 10 −15 ). Consistent with our hypothesis, we also found that the Euclidean distance between states was anticorrelated with the transition probabilities between states (Supplementary Fig. 9e). Interestingly, however, the effect was stronger for n-back than for rest (Supplementary Fig. 9e; paired t-test, µ nback−rest = −0.17, df = 878, t = −22.74, p < 10 −15 ), suggesting that the brain is more prone to transition between distant states while at rest.

Transition probabilities within task blocks
In addition to computing transition probabilities across the entire n-back task scan, we also computed transition probabilities within each task block. Because the instances of a specific task block (i.e. 0-back, 1-back, 2-back) are not continuous, we counted the number of each transition found within all instances of a given task block, and then we divided the counts by the total number of possible transitions within all instances of that task block ( Supplementary Fig. 13a-c). Similar to the analysis of transition probabilities in the entire n-back scan, we found that in the 2-back block, transitions from the DMN+ and DMN-states into the VIS+ state were increased relative to the 0-back block (Supplementary Fig.  13d). Interestingly, we saw that transitions from DMN+ and DMN-states in to FPN+ states increased from 0back to 2-back ( Supplementary Fig. 13d), although in the resting state these transitions were more frequent relative to the entire n-back scan (Fig. 4c). However, transitions from VIS+ to FPN+ did not differ between the two conditions, while transitions into DMN+ and DMN-states decreased ( Supplementary Fig. 13d). These findings suggest that increasing cognitive load biases the traversal of some trajectories in state space without affecting others.

Spatially embedded null models
To assay the specificity of brain state activity patterns themselves, we compared P e for actual brain states to P e for a distribution of null brain states. We generated null states using a recent method developed to find overlap between activation maps while accounting for spatial clustering of activity 2 . Following this method 2 , we projected node-level data to a cortical surface, inflated the surface to a sphere using FreeSurfer tools, applied a rotation to the sphere, collapsed it back to a cortical surface, and extracted node-level data by averaging over vertices belonging to each region. This process preserves the spatial grouping and relative locations of regions with similar activity while still changing their absolute locations. Importantly, reflected versions of the same rotation are applied for each hemisphere, thus also preserving the symmetry of the original activity pattern.
To assay the specificity of our findings to higher order topological features found in real structural brain networks, we compared P e estimates to a recently developed network null model 4 , which preserves several important spatial and topological features. This model exactly preserves the degree sequence and edge weight distribution, while approximately preserving the edge length distribution and edge length-weight relationship. We also compared our findings to a more commonly used topological null model 3 which preserves only the degree distribution, but not the degree sequence, of the network.

Comparing transition energies and transition probabilities using control theory
A main goal of the present work was to provide a mechanistic description for how the brain's large-scale white matter architecture constrains its progression through activation space. To accomplish this goal, we began with a simple model of linear, time-invariant dynamics along the white matter structural connectome estimated from diffusion tractography. We represent the volume-normalized, fractional anisotropy-weighted structural network as the graph G = (V, E), where V and E are the vertex and edge sets, respectively. Let A ij be the weight associated with the edge (i, j) ∈ E, and define the weighted adjacency matrix of G as A = [A ij ], where A ij = 0 whenever (i, j) / ∈ E. We associate a real value with each of the N brain regions to generate a vector describing the activity in each node at time t, and we define the map x : R ≥0 → R N to describe the dynamics of activity in network nodes over time. Here we employ a simplified noise-free linear continuous-time and time-invariant model of such dynamics: where x describes the activity (i.e. voltage, firing rate, BOLD signal) of brain regions over time. Thus, the vector x has length N , where N is the number of brain regions in the parcellation, and the value of x i describes the activity level of that region. The matrix A is symmetric, with the diagonal elements satisfying A ii = 0. Prior to calculating control energy, we divide A by ξ 0 (A) and subtract 1 from the diagonal elements of A, where ξ 0 (A) is the largest eigenvalue of A. This step makes the system marginally stable by ensuring that the maximum eigenvalue of the system is equal to 0. The input matrix B identifies the control input weights, which we set to the N × N identity matrix by default. For certain analyses (Fig. 5d, Supplementary Fig. 11b and d, Supplementary  Fig. 5d, Supplementary Fig. 12d), a set of brain regions K ⊂ V that belong to a particular cognitive system 1 was given increased weight, such that and c is a positive, real scalar value, which we set equal to 2 here. The input u : R ≥0 → R M denotes the control strategy.
To compute the minimum control energy required to drive the system from an initial activity pattern x o to a final activity pattern x f over some time T , we compute an invertible controllability Gramian W for controlling the network A from the set of network nodes K (in our case, every node in the network), where: where T is the time horizon, which specifies the time over which input to the system is allowed. After computing the controllability Gramian, we can solve for the minimum control energy E m by computing the quadratic product between the inverted controllability Gramian and the difference between x o and x T : In Fig. 5b, we computed the k × k transition energy matrix T e as the minimum energy required to transition between all possible pairs of the k clustered brain states, given the white matter connections represented in A. We refer to the on-diagonal elements of T e as persistence energies, because they quantify the energy required to maintain x o in the special case where x o = x T . We refer to the off-diagonal elements of T e as transition energies, because they quantify the energy required to move between all pairs of x o and x T where x o = x T . First, we sought to determine whether the brain states that we observed (Fig. 2a) were easier to maintain (1) compared to null states and (2) in real brain networks compared to null brain networks. Thus, we compared the persistence energy for the actual states to that of null brain states 2 with preserved symmetry and spatial clustering ( Supplementary Fig. 10a). We found that the persistence energy of the DMN+ state was significantly lower than that of its respective null states (Supplementary Fig. 10b: one-tailed non-parametric test, DMN+, p corr = 0.045, 1000 sphere-permuted null states). The DMN-state also required lower persistence energy than many of its respective null states, although this result was not significant after Bonferroni correction over all states (Fig. 5b, one-tailed non-parametric test, p corr = 0.05, 1000 sphere-permuted null states). Crucially, the DP and SLP null models did not exhibit selectively increased stability in DMN states, suggesting that DMN-driven states may arise in part due to complex features of white matter topology that allow for their stability.
Next, we hypothesized that the off-diagonal elements of the empirically observed state transition matrix at rest (Fig. 4a) would be anticorrelated with the off-diagonal elements of T e . This hypothesis was based on the notion that the brain would empirically prefer trajectories in state space requiring smaller magnitude control inputs to achieve. Note that we ignored the diagonal of T e , which captures whether persistence energies explain observed state dwell times, a question that we were not sufficiently powered to ask with only k = 5 states but that should be revisited in future studies.
The choice of the control horizon T is critical for the calculation of T e . When control inputs are uniformly distributed across the brain such that B K is the identity matrix, then as T approaches 0, the minimum control energy E m becomes proportional to the squared Euclidean norm of x T −x o . ||x T −x o || 2 is the state space distance between the initial and final state in the transition. At longer time horizons, both ||x T − x o || 2 and the topology of A determine E m . When B K is not the identity matrix, E m becomes proportional to ( However, because the units of the edge weights of A, obtained from deterministic tractography performed on diffusion-weighted imaging data, are not measured in activity per unit time, the value of T is arbitrary relative to A and the real time in which neural activity was measured through BOLD fMRI. Therefore, we chose T in a data-driven fashion by computing transition energies for a range of T values using a group representative structural A matrix, and computing the Spearman rank correlation between transition energies and the group average transition probability matrix from resting state fMRI data ( Supplementary Fig. 11a). We used the Spearman rank correlation rather than the Pearson correlation in order to reduce the effect of outliers on estimating the relationship between transition probability and transition energy. We found that the strongest negative correlation value was obtained for T = 5, but similarly strong negative correlations were found for the range T = [5,10] ( Supplementary Fig. 11a). Accordingly, we used T = 5 for the analyses presented in Fig. 5b-c, Fig. 6c, and Supplementary Fig. 10b. We also carried out the same analysis using a distribution of null networks with preserved degree sequence (see Fig. 5b), which revealed that there was no T value yielding a correlation between transition energies and observed resting state transition probabilities as strongly negative as when we used the real structural connectivity matrix ( Supplementary Fig. 11a).
In addition to controlling the brain with uniformly weighted inputs, we also asked whether transition energies obtained using a non-uniform distribution of inputs might better explain the observed transition probabilities. Specifically, we hypothesized that accounting for external visual input during the fractal n-back task might provide a more accurate estimation of the input energy needed to achieve each transition. We weighted the inputs towards one cognitive system 1 at a time while still allowing input into every brain region, and then recomputed the Spearman correlation between transition energies and transition probabilities for the resting state ( Supplementary Fig. 11b) and the 2-back condition of the n-back task (Supplementary Fig. 11d). This analysis revealed that accounting for visual input in computing transition energies improved our ability to explain the observed brain state transition probabilities during the 2-back condition ( Supplementary Fig. 11d VIS-weighted subpanel, orange trace is lower than teal trace), and abolished our ability to explain resting state transition probabilities ( Supplementary Fig. 11b VIS-weighted subpanel, blue trace is greater than 0 and dashed line is near 0). However, we did not find a clear role for brain structure in this relationship, as evidenced by the fact that transition energy did not explain transition probability any better than the weighted distance between states ( Supplementary Fig. 11d, orange trace does not dip below dashed line). Thus, we present results in Fig. 5d for . Nevertheless, this finding suggests a specificity of the constraints of state-space transition distance on the brain's empirically observed progression through state space. Resting state transition probabilities can be explained by unweighted state-space distance ( Supplementary Fig. 11a, dashed line, Spearman's r = −0.32) but not by visual system-weighted state-space distance ( Supplementary Fig. 11b, VISweighted dashed line, Spearman's r = −0.06); 2-back transition probabilities can be explained by unweighted state-space distance ( Supplementary Fig. 11c, dashed line, Spearman's r = −0.61), but are explained best by visual system-weighted state-space distance (Supplementary Fig. 11b, VIS-weighted dashed line, Spearman's r = −0.80). This result suggests that visual input allows the brain to deviate from the constraints of state-space distance found at rest, while an equal consideration of visual inputs alongside other inputs is key to explaining resting state transitions. Resolving the effect of structure on brain dynamics during a task may require full knowledge of all input sources, which could potentially be uncovered through a data-driven approach 31 .

Impact of parcellation and cluster number
The choice of parcellation scale may impact analyses involving tractography, where relative region sizes may bias estimates of connectivity. We chose a relatively fine-grained parcellation with 462 nodes because previous coactivation pattern analyses 19 at this scale produced cluster assignments similar to those obtained by clustering at the voxel level. However, we were interested to know whether we would obtain the same results using a coarser parcellation scale. Therefore, we repeated the clustering procedure and subsequent control theoretic analyses using the Lausanne 234 node parcellation 18 .
First, this analysis revealed brain states whose spatial maps were virtually identical to those generated using the 462 node parcellation ( Supplementary Fig. 5a). The fractional occupancy of these states also changed with cognitive load in a similar fashion ( Supplementary  Fig. 5a) compared to the results in Fig. 3d. Similar to the results presented in the main text, we found a negative relationship between transition energies and empirically observed brain state transition probabilities at rest (Supplementary Fig. 5c, Spearman's r = −0.62, p SLP < 0.001, p DP < 0.001) with a weak positive relationship between transition energies and transition probabilities from the 2-back condition ( Supplementary Fig.  5c, Spearman's r = 0.21, p SLP = 1, p DP = 0.85). We also found that discounting the weight of the visual system in calculating state-space transition distance allowed us to better explain 2-back transition probabilities (Supplementary Fig. 5d, Spearman's r = −0.78, p SLP = 1, p DP = 1), and reduced our ability to explain resting state transition probabilities ( Supplementary Fig. 5d, Spearman's r = −0.31, p SLP < 0.001, p DP < 0.001). These results suggest that one can quantify the constraints of white matter architecture on brain state transitions at rest at multiple scales of region definition, supporting the generalizability of our findings.
A limitation of k-means clustering is the need to specify an absolute number of clusters. While the data suggests that k = 5 is the optimal solution, one could certainly choose to analyze this data at multiple scales. Because of a recent study 32 in which k = 6 was identified as the optimal solution by the same heuristic, we reproduce our results at k = 6 in the interest of facilitating comparison between the two studies.
At k = 6, cluster centroid brain states interestingly become grouped into 3 anticorrelated pairs (DMN+ and DMN-, VIS+ and VIS-, FPN+ and SOM+), similar to Ref. 32 . The states were also highly similar to those at k = 5, with the DMN-state from k = 5 essentially "split" into the SOM+ and DMN-state at k = 6. The DMN-state at k = 6 state houses concurrent high amplitude activity in the dorsal attention network and visual system with low amplitude activity in the default mode network (Supplementary Fig. 12a). This appearance of "split" brain states is consistent with hierarchical state organization 30,33 , which becomes apparent at different clustering scales. As WM load increased from 0-back to 2-back, we saw a decrease in DMN+ fractional occupancies with a concurrent increase in FPN+ and VIS state fractional occupancies ( Supplementary Fig. 12b). Direct transitions between anticorrelated states were infrequent ( Supplementary Fig. 12c-d), with a drastic shift in transition probabilities towards VIS states and away from DMN states during 2-back relative to rest (Supplementary Fig. 12e). Similar to k = 5, at k = 6 we found that 2-back task performance was related to 2-back state transitions from the VIS-state into states with coherent frontoparietal and default mode activity ( Supplementary Fig. 12e). Specifically, we found that transitions from VIS-into DMN-and FPN+ were positively associated with performance, consistent with the high amplitude dorsal attention network activity and low amplitude DMN activity found in the DMN-state (Supplementary Fig. 12a). Additionally, transitions from the VIS-state to the SOM+ state were negatively associated with performance, consistent with the frontoparietal deactivation found in the SOM+ state ( Supplementary  Fig. 12a). We again found a strong negative correlation between transition energies and resting state transition probabilities ( Supplementary Fig. 12f, Spearman's r = −0.87, p SLP < 0.001, p DP < 0.001) that was specific to the topology of white matter, supporting the notion that the constraints of white matter on state-space progression generalizes across multiple scales of states. Similar to our results at k = 5, we found that discount-ing the weight of the visual system in calculating the state-space distance of transitions allowed us to better explain the state transitions observed during the 2-back task, during which visual stimuli are frequently delivered ( Supplementary Fig. 12g, Spearman's r = −0.75, p SLP = 0.4, p DP = 0.7). Overall, this analysis suggests that our main findings hold true at near optimal values of k, and that much can be learned from studying the states of the brain at different scales.