Local structure-function relationships in human brain networks across the lifespan

A growing number of studies have used stylized network models of communication to predict brain function from structure. Most have focused on a small set of models applied globally. Here, we compare a large number of models at both global and regional levels. We find that globally most predictors perform poorly. At the regional level, performance improves but heterogeneously, both in terms of variance explained and the optimal model. Next, we expose synergies among predictors by using pairs to jointly predict FC. Finally, we assess age-related differences in global and regional coupling across the human lifespan. We find global decreases in the magnitude of structure-function coupling with age. We find that these decreases are driven by reduced coupling in sensorimotor regions, while higher-order cognitive systems preserve local coupling with age. Our results describe patterns of structure-function coupling across the cortex and how this may change with age.

The authors explored the relationship between structural connectivity (SC) and resting-state functional connectivity (FC) from MRI images collected in human participants. Diffusion-based sparse SC network matrices were transformed using 40 different predictors(models)/parameter-combinations into fully weighted matrices. This allowed the authors to investigate whether different properties of SC predict global and local regional patterns of FC. Similar analyses were applied first in a young adult only dataset (HCP,n=95), and then to a lifespan dataset (NKI,n=542,. This work follows a collection of closely-related papers focused on this topic that some of the authors of the present work have been involved with. Here, the authors included a wide range of testing parameters across multiple models, which is a good practice when performing explorative analysis. The take home is that Euclidean distance-and mean passage time-based models explain the largest amount of variance in FC patterns, an observation which is consistent with prior reports. The regional analysis (e.g., Figure 2) provides interesting evidence that distinct SC-related properties may constrain patterns of FC across different functional networks, although its not clear what establishes these distinctions or why they vary. Additional variance is explained in a subset of locations when including multiple models (but see concerns below), suggesting that there remains an unexplained territory to better understand the complex relationships between structural and functional connectivity. The application of the SC-FC analysis to a broader range of age is novel, although considerable methodological and theoretical challenges constrain interpretation of these results. As a general point, I also felt that many of the analyses were unnecessarily complex and packaged in ways that hindered rather than facilitated understanding (e.g., the core-periphery analysis and associated results, more comments below). In sum, the work adds to prior observations and is largely consistent with previous papers on this topic but now explored in a larger parameter space (e.g., models, regions, age). As mentioned, interpretation of the more novel elements of the present submission are constrained by concerns which are elaborated upon below.
Detailed comments (no particular order): • How were the 10 class of predictors chosen? o It is unclear how and why the authors focused in on these specific classes of predictors and their subsets of parameters. There are a considerable number of available models, why are the ones included in this manuscript chosen to investigate SC-FC relationships? This should be discussed in the introduction, as it will further educate readers what these models represent, and why they are relevant to specific aspects of SC.
• Greater details and usage of null models should be included o The authors used spin test to general 1000 null models in their regional analysis and supplemental analysis (SC-FC relation with intelligence). A null model should also be used for the global analysis, showing whether the variance explained by the 40 models actually provide better results than a null.
• Clarify one-way ANOVA set-up o In the second section of the results, one-way ANOVAs were conducted to look at main effects of factors and regions. However, it is unclear whether that is based on data that averaged the subjects together (i.e., what is visualized in Figure 2a), or if a within-subject factor was used to account for subjects. If the ANOVAs were conducted on data where the dimension of subjects are flattened, then the authors should clarify that the "averaged over subject" was not only for visualization purposes only. o Is the main effect of regions' degrees of freedom incorrect? It says (39) even though there are 400 regions (pg. 3).
• Are delta R2 based on adjusted R2? o When the authors examined delta R2 based on coupling of predictors, it is not clear that if these were changes in adjusted R2 , where the number of predictors in a model is accounted for. If it isn't, the addition of predictors generally increases R2 , but it may not be an actual meaningful increase after accounting for the sheer occurrence of having more variables in the model. o Plotting a PCA biplot without the actual axis in its appropriate space makes it difficult to understand the PC space. The angles and distance of a variable with respect to the origin (0,0) has meaning in PC space, and should not be removed.
• Exploration of relationships should focus on using analyses that primarily reveals contributions of variables (e.g., PCA) o There are 40 predictors, and the authors tried to explore the relationship between these predictors in relation to the dependent variable (FC). It seems a stretch to force the coupling of predictor, which then results in a count matrix based on their R square changes across participants. Instead, the authors could focus on multivariate analyses such as PCA that conveniently included the information about similarity between predictors in its component space. The count matrix that shows the coupling of predictor forces a relationship that identifies 'pairs' of predictors, but in reality, there is no reason that a combination of 3 or more predictors might be even better suited for this problem. o Core-periphery analysis is unnecessary for understanding the relation between predictors in which their pair-wise relation is turned into a graph.
• A major take-away is that the SC-FC relationships alter with increasing age, however this conclusion assumes that structural and functional data are comparable in quality across the lifespan. o Head motion is strongly associated with resting-state function correlations (e.g., Power et al. 2012;Van Dijk et al., 2012;Ciric et al. 2017), which is critical to account for especially in studies that include participants with heightened motion (e.g., older adults). In the present study, the authors despiked frames that were above a frame displacement of 0.5mm in the lifespan dataset (NKI), which is inadequate in mitigating motion artifacts in the results (see Power et al 2014). Furthermore, no framewise correction was performed for the HCP dataset at all. A more stringent motion threshold should be employed in both datasets (HCP and NKI), followed by the reporting of the level of motion artifacts detected and removed. Lastly, the number of contaminated frames should also be controlled for in analysis across subjects. This is critical as motion artifact's relation to FC is also distance dependent (Euclidean distance is a major variable in the present manuscript). o In a similar way, head motion is also associated with SC, even after standard correction for head motion, eddy current, and field distortion (Baum et al. 2018). Head motions impacts the estimated SC in a way that is sensitive to edge consistency and length (which is associated with distance). Furthermore, as white matter degrades with age, and the quality of tractography suffers. This results in poorer estimates of white-matter that would alter the aging effect. o Given that the quality of both SC and FC is lower in older adults, it is fairly reasonable that their association is also weaker (e.g., increased noise in the data). Applying the current analyses in a lifespan dataset requires a much more rigorous approach that accounts for these known age-related artifacts and challenges in analysis and interpretation of both types of imaging data.
• Floor effect of age-related R2 differences. o The authors should address the possibility for a floor effect where regions with initially low R2 ( Figure  4d) have little room to exhibit age-related decreases in R2. A conjunction map of Fig. 4d and 4e would likely show areas of low R2 showing little decrease or even increases, whereas areas with the highest R2 to begin with has the greatest room to show 'decreases' with age. While it is likely that those areas truly exhibit greater age-related decreases in R2, the possibility of a floor effect or even regressing to the mean should be addressed.
Minor: • Age-related differences observed from a cross-sectional sample should be labeled as 'differences' instead of 'changes'; changes implies within-person changes, which requires longitudinal data. • Yellow labels or colors are not very visible/readable. Please consider using something like "brown" (e.g., Fig 1c, 2a, 2g) • Terms that are specifics to other types of data (e.g., 'expression' from genes, 'fingerprinting') creates more confusion than actually help the reader understand the results. • Due to how similar in setup Figure 1c and 2a are, their y-axis label should indicate that 2a is actually "Mean variance explained across subjects".

Dear Reviewers,
We wish to thank the editor and all reviewers for their thoughtful comments and suggestions. Our enclosed submission is significantly improved as a result. We provide, here, a short summary of those changes: 1. We compared performance of predictors with that of the raw SC matrix (R1), 2. We tested the effect of combining all predictors into a single multilinear model (R1), 3. Performed statistical testing at regional level for assessing lifespan/intelligence relationships (R1), 4. Included a broader discussion of methodology for linking brain structure and function (R2), 5. Fully replicated all of our results following increasingly stringent motion-correction strategies applied to both datasets (HCP and NKI) (R3), and 6. Addressed statistical concerns, including issues related to global structure-function relationships, the effect of including multiple predictors in the linear model, among others (R3).
Throughout this letter we refer to comments made by the reviewers in italics. Changes made to the text are shown in blue italics and indented. When appropriate, we highlight the section name where changes were implemented. In addition to this response letter, we include two versions of the revised manuscript: a "marked" version that tracks all of the changes made since the original submission and a "clean" version that includes all changes but without any highlights. This is a good point. Assessing the magnitude of structure-function coupling using the streamline matrix would help contextualize the results generated using the derived metrics.
To this end, we performed the following analysis. For each subject and for each brain region, we created a "mask" of its existing connections to other brain regions and extracted their corresponding structural functional connection weights. We then fit a linear model to predict functional connection weights using structural weights. This procedure resulted in a [400 x 95] matrix of whose elements corresponded to variance explained (R 2 values).
The R 2 values obtained from the streamline matrices and structural predictors were broadly similar to one another (mean similarity of r = 0.36 +/-0.08). However, there were some important differences.
To better understand how these two approaches compared to one another, we determined for each subject and region, whether its functional connections were better explained by streamline matrices or structural predictors. We found that, for each subject, structural predictors better explained functional connections for 67.5 +/-0.06 regions. Interestingly, the regional preferences exhibited non-random spatial structure. Regions whose functional connections were better explained by the structural predictors included those in control, default mode, dorsal attention, salience, ventral attention, somatomotor, and visual networks. In contrast, streamline matrices better predicted the functional connection weights of the temporo-parietal network.
Collectively, these results suggest that using structural measures to predict functional connection weights improve on predictions made using streamline counts, alone.
We note, however, that there are issues in directly comparing the performance of structural predictors and streamline matrices. Notably, the streamline matrices are sparse (only a fraction of connections exist) whereas the structural predictors are fully weighted (every node pair has some nonzero value).
For the sparse matrices, we cannot make predictions about functional connections for which there is no corresponding structural connection.

Fig. S5. Comparison of communication measures and structural connectivity in predicting FC.
In the main text we derived a series of communication measures from sparse structural connectivity (SC) data to predict regional FC patterns. Here, we compare the results from the communication measures with the results obtained from using the structural connections directly. In this analysis, we create a ``mask'' for each region and subject of its structural connections to other regions. We then extract the weights of those structural and functional connections and fit a linear model to explain the functional connection weights in terms of the structural weights. This results in an R 2 value for each region and subject. (a) Variance explained using communication measures averaged across subjects. (b) Variance explained using structural connections alone averaged across subjects. (c) Regional differences in variance explained. Warmer and cooler colors indicate regions whose FC is better predicted using the communication measures and structural connectivity, respectively. (d) Data from panel c but displayed by brain system.
We have now added Fig. S5 to the supplementary material and reference it in the Results section. The revised text reads: We also repeated this process using structural connection weights instead of derived predictors (see Fig. S5). We found that, while the two techniques generated similar regional patterns of explained variance ( That would also provide a point of reference for the regional differences shown in figure 3. We again agree with the reviewer that this is an interesting idea and a useful benchmark. We followed the reviewer's suggestion and used all 40 predictors to predict regional FC (see Fig. S8 below). Compared to the single-best predictor, using all predictors resulted in a three-fold increase in R 2 (3.2 +/-1.2 times greater) (Fig. S8e). Relative to the best two-predictor models, using all predictors still resulted in a greater than two-fold increase in R 2 (2.4 +/-0.6) (Fig. S8e).
Interestingly, the biggest increases in R 2 as a result of using all predictors included regions in default mode and salience/ventral attention networks. Of the top 10% of regions by increase in R 2 , nearly half came from those two networks, including regions in default mode that exhibited a greater than ten-fold increase (Fig. S8d).

Fig. S8. Predicting FC using all measures.
In the main text we predicted regional FC using metrics derived from SC matrices. Specifically, we focused on the variance explained by the best individual predictor (a) and the best pair of predictors (b). Here, we repeat this analysis but using all predictors to predict regional FC. (c) The variance explained at each region after using all predictors. (d) Ratio of variance explained using the best pair of predictors (panel b) and all predictors (panel c). Note that the peak increases fall within default mode (interparietal sulcus; posterior cingulate; precuneus) and salience/ventral attention (anterior cingulate). (e) Boxplot of regional variance explained. (f) Scatterplot of regional variance explained with the single best predictor versus the best pair of predictors (grey) and all predictors (orange). The black line is the identity line.
We now include this figure in the supplementary material. The revised text reads: This is an important point and one worth discussing. First, it is reassuring to note that the overall spatial pattern of the variance explained is similar between the two datasets (compare Fig. 2c with Fig. 4d, r=0.73), despite the notable differences in the magnitude of overall variance explained. One possible explanation for the difference in maximum magnitudes of these maps is the difference between cohort sizes (n=95 versus n=525). Since this map represents the maximum R 2 value for each node, it could be the case that in the larger NKI sample, more individual variation in structure-function prediction is captured in the sample. This might lead to specific nodes having a higher maximum R 2 based on a specific individual's differences that would more likely show up in the larger (and more heterogeneous regarding age and community representativeness) NKI sample.
Another possible explanation for the difference in magnitudes is differences in data quality and data preprocessing between the two cohorts. For NKI data, the functional correlations and structural streamline counts were measured relative to a volumetric parcellation in the subject anatomical space (3mm grid for functional scan, 2mm grid for DWI scan, 1mm grid for T1w scan). For HCP, structural streamline counts were measured relative to a volumetric parcellation in the subject anatomical space (0.7mm grid for T1w scan, 1.25 mm grid for DWI scan), but the functional data was provided by the HCP consortium in a standard high-resolution (2mm spacing between vertices) surface space known as fs_LR space. Thus, HCP functional correlations were measured using the same parcellation (Schaefer 400), but in the fs_LR space as opposed to the subject anatomical space. We could speculate that this higherresolution functional data in the standard space could be a possible explanation for the lower maximum R 2 values in the HCP cohort. It could be the case that the surface functional data, which is thought to better localize functional signals (Coalson et. al 2018, PNAS), is less influenced by the Euclidean distance between nodes (which is the top predictor in both datasets, but to a lesser extend in the HCP sample), which could possibly result in the lower maximum R 2 magnitudes. Additionally, a possibility is that the definition of nodes in fs_LR space is slightly different (owing to an alternative tool set used to fit the parcellations) node definitions for the functional and structural brain networks for the HCP data. However, ultimately, the similarity of the spatial pattern of maximum magnitudes reassures us that the effect of this mismatch likely insignificant to the main results of the paper.
We have updated the Discussion section to include this topic. The revised text reads: Interestingly, while the regional patterns of structure-function coupling in NKI and HCP datasets were correlated, the magnitude of coupling was noticeably stronger. While the origins of this difference remain unclear, possible explanations include differences in sample size (N = 95 in HCP versus N = 542 in NKI), heterogeneity, and community representativeness. Other possibilities include differences in data quality and preprocessing strategies. In the NKI dataset, functional and structural networks were estimated using a volumetric parcellation in subject anatomical space. In contrast, the HCP data was processed using surface-based analyses. Indeed, recent studies have identified notable differences between these two image processing pipelines (Coalson et al 2018). Future studies should investigate this possibility in greater detail, documenting the effect of different processing pipelines on local structure-function coupling.

The authors show a core-periphery structure in the graph of structural metrics. However, it is not made very clear what it means for a specific metric to be part of the core.
We apologize for any confusion and appreciate the opportunity to clarify. In this context, a "core" refers to pairs of predictors (metrics) that frequently appear together in two-term multilinear models. The same core predictors may sometimes be paired with peripheral predictors, but peripheral predictors are infrequently paired together. That is, the core is comprised of predictors that exhibit strong synergies in their ability to predict FC patterns; the periphery is comprised of predictors that exhibit relatively weak synergies.
We have updated the manuscript to better clarify this. The revised text reads: In this context, a "core" refers to pairs of predictors (metrics) that frequently appear together in two-term multilinear models. The same core predictors may sometimes be paired with peripheral predictors, but peripheral predictors are infrequently paired together. That is, the core is comprised of predictors that exhibit strong synergies in their ability to predict FC patterns; the periphery is comprised of predictors that exhibit relatively weak synergies.
5. In figure 2h, what threshold is used to determine whether regions are coloured or not?
We apologize for any confusion. In this figure, the colors of brain regions correspond to the fraction of subjects for which a given predictor was optimal, i.e. yielded the greatest R 2 value. The underlying values are continuous and range between zero -a given predictor was optimal for no subjects -up to one -a given predictor was optimal for all subjects. The colors, therefore, are also continuous, with gray representing a value of zero and brighter colors indicating increasing fractions of subjects. For the sake of visualization, we saturated the colormap at the fraction 3/10, so that if any regions above a value of 0.3 are assigned the brightest possible color.
We have updated the figure caption to indicate this more clearly. The revised caption reads: Note here that the colors projected onto the surface are continuous; grays correspond to regions where a predictor was optimal for few subjects while brighter colors correspond to regions where a predictor was optimal for many subjects. In all panels the colorscale was capped at a value of 0.3.
6. The results in figure 1D are interesting but hard to see. A different visualization (like in 1C) would make these findings more interpretable.
We appreciate the suggestion and have updated Figure 1D. See below. In this panel, the first row represents variance explained using only direct connections with path length of one (PL = 1). The next two rows show variance explained at path lengths two and three (PL = 2 and PL = 3).

Are the results in figure 4e and S6 thresholded for statistical significance? And if so, which correction for multiple comparisons was used?
This is an important point. We note that, in both cases, we did not test for significance at the region level, but at the level of brain systems (Figures 4e and S6 were accompanied by boxplots showing this). We agree, however, that it would be useful to also test for statistical significance at the regional level. In both cases, we adjusted our critical p-value to accommodate a 5% false discovery rate (Benjamini & Hochberg, 1995). In the case of the age correlation (shown in Figure 4e), we, found robust negative correlations involving regions in the somatomotor and visual systems (padj = 0.018). In the case of the intelligence correlation, no regions passed the multiple comparisons correction. See Fig. S11, below.
Fig. S11. Statistical thresholding of regional correlation maps. In the main text we computed the correlation of age and intelligence with the regional measures of structure-function coupling (R 2 ). In those analyses, we performed statistical testing at the level of brain systems. That is, we identified systems whose average correlation was strongermore positive or negativethan expected by chance.
Here, we perform statistical testing at a regional level, correcting for multiple comparisons by adjusting the critical value to accommodate a false discovery rate of q = 0.05, i.e. 5%. In panel a, we show results of the age and R 2 correlation (padj = 0.018). We note that no regions survive corrections for multiple comparisons in the case of correlation with intelligence.
We have now included a thresholded brain surface plot in the Supplementary Material ( Fig. S11) and have made edits to the text. The revised text reads: We include thresholded surface maps of the correlations between R 2 with both age and intelligence in the supplementary material (Fig. S11).
8. In the discussion section the authors suggest that: "These observations suggest that, decentralized patterns of interregional communication may degrade over the human lifespan, prompting a decoupling of functional connectivity from structure.". I wonder what is meant here exactly. Is it that with age, functional connectivity is shaped more strongly by direct (short) connections?
We appreciate the opportunity to clarify. One of the principal findings in our study is that the wholebrain correlation between FC and predictors decreases with age ( Fig. 4a). In parallel, we observed that the mean first passage time metric (MFPT) also becomes less frequent with increasing age. In the discussion, we speculated that these two observations might be related. That is, the decrease in coupling could be explained by the decreased use of MFPT as a communication policy.
We have updated the text to better clarify this assertation. The revised text reads: Based on these findings and along with the observation that the magnitude of global structurefunction coupling decreases with age, we speculate that, decentralized patterns of interregional communication may degrade over the human lifespan, prompting a decoupling of functional connectivity from structure.

In the discussion section I am missing a section on how the pattern of explained variance across the cortex compares to previous studies using similar approaches (e.g. ref 27, 28)
This is a good point and we regret the omission. We have updated the Discussion section to contextualize, more broadly, the links between structure and function from a local perspective. The new text reads: Here, we extend the matrix-based prediction approach from a global, whole-brain level to a the level of individual brain regions. We regret the omission of this detail and are happy to clarify. The reviewer is referring to the resolution parameter, gamma, that appears in the modularity equation. Changing the value of gamma changes the scale of the detected communities, i.e. the number and size of clusters.
Here, we use the uniform null model in our modularity equation. This null model is especially well-suited for analysis of correlation networks (see Bazzi et al 2016). Given this null model, our modularity matrix is defined as: = − , where is equal to 1 for all { , } and where is the resolution parameter.
Although, in principle, the resolution parameter could be varied to detect different-sized communities, we performed community detection with its value fixed at = 〈 〉 = 0.296 ≈ 0.3. That is, the resolution parameter was set to mean similarity across all pairs of regions. Our rationale for doing so was that we wanted to identify communities of regions whose mutual similarity to one another exceeded a "typical" value.
For completeness, we reanalyzed these data while varying the parameter and, in addition to the communities reported in the original submission, now show examples of communities at other values. The panel from the revised figure is shown below: In addition, we updated our description of the community detection procedure. It now reads: where Bij = Sij -Pij. In this expression, Pij is the expected weight of the connection between regions i and j and is a structural resolution parameter that tunes the number of size of detected communities. For simplicity, we set = <S> = 0.296 ≈ 0.3 and used this value for all pairs of brain regions. For completeness, however, we also tested other resolution parameter values, ranging from = 0 to = 1 in increments of 0.025. We show some of these communities in Fig. S3d.

Would it be possible to colour the clusters in the figure S3 according to the dominant structural metric in each cluster (making the left out regions grey). The current colour scheme is confusing.
We agree that this type of visualization is useful and have adopted it for communities shown in Figure  S3.

Could the age differences between participants explain the observed associations between SC-FC coupling and intelligence?
This is an interesting hypothesis and one that could be tested by regressing age from the intelligence measures and, using the residuals, computing their correlation with variance explained. We performed this procedure and found that the regional correlation pattern -a [400 x 1] vector -was almost identical to the correlation pattern obtained without regressing out age (r=0.986). These results suggest that differences in age are likely not drivers of the correlation between SC-FC coupling and intelligence.

Reviewer #2 (Remarks to the Author):
This ms investigates three extremely interesting and challenging problems I have only minor recommendations. I would be more detailed in reviewing the whole-brain literature, and more fair. There were really much more papers than the one that they cited and with more results that makes the decision between which is the best approach for establishing the link between FC and SC more challenging. Many aspects of modelling are extremelly usefull for adressing the same questions wuth the addition that they can be used to include also the time dimesion in, which is fully neglected by the present approach. So I would really be more balanced in the judging of whole-brain models and stress the complementarity (specially for gaing not only space but also time prediction information).
We thank the reviewer for their positive comments. We also agree that a more balanced discussion of the literature on modeling the link between FC and SC would broaden the appeal of our submission.
To this end, we have revised our Discussion section, adding two new paragraphs: 1. The first addresses the absence of temporal information in matrix-based predictors of FC. 2. The second discusses tradeoffs between different methods for studying structure-function relationships, emphasizing that they can and should be viewed as complementary to one another.
Additionally, in line with the reviewer's suggestion, we added additional references to the discussion of global structure-function coupling.
This new section reads: Additionally, matrix-based predictors fundamentally lack a temporal dimension. That is, they compress information about dynamical processes, e.g. diffusion, navigation, shortest paths routing, into matrix form, the precise temporal evolution of those processes is lost. In contrast, biophysical models document the temporal evolution of activity, generating spike trains, voltage traces, or hemodynamic signals from cells, populations, or regions (Deco 2008, Deco 2011, Ritter 2013. These temporal data provide an additional target for modeling studies; rather than simply matching the correlation structure of brain activity, can a model also replicate its time-varying features? Indeed, while recent work has begun to investigate time-varying structure-function coupling (Kong 2021, Liu 2021, Liegeois 2016, Fukushima 2018, future studies are necessary.
In summary, there exists a spectrum of methods for assessing and modeling structure-function coupling in empirical data (Honey 2010, Sanz Leon 2015, Sanz Leon 2013, Batista 2018, Sarwar 2021, Lynn 2019, Becker 2018, Rosenthal 2018 Overall comment: The authors explored the relationship between structural connectivity (SC) and resting-state functional connectivity (FC) from MRI images collected in human participants. Diffusion-based sparse SC network matrices were transformed using 40 different predictors(models)/parameter-combinations into fully weighted matrices. This allowed the authors to investigate whether different properties of SC predict global and local regional patterns of FC. Similar analyses were applied first in a young adult only dataset (HCP,n=95), and then to a lifespan dataset (NKI,n=542,. This work follows a collection of closely-related papers focused on this topic that some of the authors of the present work have been involved with. Here, the authors included a wide range of testing parameters across multiple models, which is a good practice when performing explorative analysis. The take home is that Euclidean distance-and mean passage time-based models explain the largest amount of variance in FC patterns, an observation which is consistent with prior reports. The regional analysis (e.g., Figure 2) provides interesting evidence that distinct SC-related properties may constrain patterns of FC across different functional networks, although its not clear what establishes these distinctions or why they vary. Additional variance is explained in a subset of locations when including multiple models (but see concerns below), suggesting that there remains an unexplained territory to better understand the complex relationships between structural and functional connectivity. The application of the SC-FC analysis to a broader range of age is novel, although considerable methodological and theoretical challenges constrain interpretation of these results. As a general point, I also felt that many of the analyses were unnecessarily complex and packaged in ways that hindered rather than facilitated understanding (e.g., the core-periphery analysis and associated results, more comments below). In sum, the work adds to prior observations and is largely consistent with previous papers on this topic but now explored in a larger parameter space (e.g., models, regions, age). As mentioned, interpretation of the more novel elements of the present submission are constrained by concerns which are elaborated upon below.
Detailed comments (no particular order): • This is an important point, and we appreciate the opportunity to elaborate. Our study builds on growing body of work in which sparse structure connectivity matrices are transformed into fully weighted matrices. The interregional weights in these matrices correspond to the capacity for two regions to communicate under some "communication policy." For instance, in a shortest paths matrix, the weights encode the smallest number of hops (or if the network is weighted, the minimum cost) of going from a source node to a target node. If shortest paths are the primary means by which brain regions communicate with one another, then might expect that the weights of the shortest path matrix be correlated with the weights of functional connections.
Shortest paths represent an example of a "centralized" communication policy in that using a shortest path for signally requires complete knowledge of a network's global topology -a particle (signal) moving from a source region to a target region needs to "know" which nodes are on the shortest path and which connections need to be followed in order to stay on that path.
In contrast, a number of studies have proposed "decentralized" communication policies. These include diffusion processes (random walks) and network navigation -where a particle moves from one node to another according to some greedy policy, e.g. choose the connected neighbor nearest the target in some metric space.
In selecting predictors, our aim was to sample this space and include predictors based on both centralized and decentralized processes and, when those predictors are parameterized, at a range of parameter values. In addition, we included other similarity-based predictors, e.g. matching index, cosine similarity, etc., and Euclidean distance. These measures, which do not fit neatly along the centralized/decentralized communication spectrum, were included because previous studies have identified topological similarity and the geometry of the space in which brains are embedded as key features that shape the correlation structure of their activity. We note that code for implementing these models is made public via our lab's github repository, allowing users the freedom to change parameter values or test additional predictors not included in the manuscript (https://github.com/brainnetworks/local_scfc).
We have updated the introduction of the manuscript to better to clarify why we selected the predictors we did. The revised text reads: This is an important point. To address this concern, we calculated the observed structure-function coupling magnitude (R 2 ) for each subject and predictor. We also obtained 1000 null values of the same measure using the spin test procedure described in the main text. Using these null values, we z-scored the observed R 2 measure.
This procedure results in 95 z-scores for each of the 40 predictors (see Fig S1). We evaluated these data in two ways. First, we transformed the z-scores into p-values and calculated for each predictor the fraction of subjects that exhibited a statistically significant correlation (corrected for multiple comparisons by fixing the false discovery rate at 5% and adjusting the critical p-value; p_adjusted < 0.0208). We found that 100% of subjects exhibited significant structure-function correlations in 21/40 predictors and that at least 90% in 30/40. Notably, fewer than 5% of subjects were statistically significant in three predictors: the binary flow graph at a Markov time of t = 10.0, binary communicability, and weighted path length with a weight-to-cost transformation of -4.
In addition, we identified predictors whose distribution of z-scores fully excluded a value of 0. The results using this method largely converge with those of the first: 75% of predictors fully exclude a value of zero. The remaining 25% include predictors identified using the first method and parametrically similar predictors along with some of the path transitivity measures.
In general, these results support the hypothesis that, although the variance explained by global structure-function coupling is weak relative to the local coupling, for most predictors and most subjects, the correlations exceed statistical significance.
We now include this figure (Fig. S1) in the Supplementary Material and reference this analysis in the main text. The revised text reads: We also tested the statistical significance of the variance explained using a spatially-constrained permutation model (Markello 2021) (see Fig S1).

Fig. S1. Statistical analysis of global structure-function coupling.
In the main text we reported structure-function coupling at the global (whole-brain) level. Here, we use spin tests to assess the statistical significance of the variance explained. Specifically, we calculated the observed structurefunction coupling magnitude (R 2 ) for each subject and predictor. We also obtained 1000 null values of the same measure using the spin test procedure described in the main text. Using these null values, we z-scored each observed R 2 measure. This procedure results in 95 z-scores for each of the 40 predictors. We evaluated these data in two ways. First, we transformed the z-scores into p-values and calculated for each predictor the fraction of subjects that exhibited a statistically significant correlation (corrected for multiple comparisons by fixing the false discovery rate at 5% and adjusting the critical p-value; padj = 0.0208). In addition, we identified predictors whose distribution of z-scores fully excluded a value of 0. In this figure, we show for each predictor the distribution of z-scores. Gray rectangles around predictors identify those whose distribution included a value of zero and were considered ``not significant''. The row of numbers below the boxplot indicates, for each predictor, the fraction of subjects that exhibited a statistically significant structure-function correlation. We appreciate the reviewer bringing this to our attention. When inspecting the reported ANOVA results, we noticed two errors in reporting. First, the test statistic for the ANOVA of region-averaged R 2 values was copied from the previous section and incorrectly listed as F(39) = 326.6. The correct test statistic is F(39) = 141.6. Second, when checking the ANOVA of subject-averaged R 2 values across regions, we realized that the test had been performed over the wrong dimension (predictors) when in fact we intended to perform it across regions.
We apologize for this error and thank the reviewer for discovering it. In the revised manuscript we have corrected both issues and now report the correct results and degrees of freedom. Additionally, we now clarify that the subject-averaged results are used not only for visualization purposes but also for statistical analysis.
The revised text reads: To visualize these results and for subsequent statistical analyses, we averaged over subjects and plotted the mean variance explained for each region and predictor (Fig 2a). As in the previous section, we found considerable variability across predictors (one-way ANOVA R 2 ; F(39) = 141.6; p < 10 -15 ) but also across regions (one-way ANOVA R 2 ; F(399) = 35.4; p < 10 -15 ), confirming that both regions and predictors differ from one another in terms of their mean variance explained.

• Are delta R2 based on adjusted R2? o When the authors examined delta R2 based on coupling of predictors, it is not clear that if these were changes in adjusted R2 , where the number of predictors in a model is accounted for. If it isn't, the addition of predictors generally increases R2 , but it may not be an actual meaningful increase after accounting for the sheer occurrence of having more variables in the model.
This is an important point. We repeated our analyses by calculating the adjusted R 2 for both the oneand two-term models. Specifically, we calculated the adjusted R 2 as: where = 399 is the number of samples (we exclude self-connections for all regions) and is the number of parameters and is equal to 2 and 3 for the one-and two-term models (the additional parameter is due to the intercept in both models).
In general, we find virtually no difference in the spatial pattern of ΔR 2 when we use the adjusted version compared to the unadjusted version (similarity with the unadjusted version is = 0.99, < 10 −15 ; see The revised text reads: In the supplementary material we repeat this analysis using the adjusted R 2 to confirm that changes in variance explained are not simply a consequence of the additional parameter in the linear regression model (Fig.S 6).

Fig. S7. Regional improvement in adjusted R 2 .
In the main text we reported the change in variance explained, ΔR 2 as a result of including two predictors. In general, we might expect increases in R 2 simply due to the inclusion of a second parameter. Here, we show an analogous plot using the adjusted R 2 measure, which takes into account the number predictors. Specifically, we calculated R 2 adjusted = 1 -(1 -R 2 )(N-1)/(N -p -1) for each region for both the one-and two-term models. Here, N = 399 is the number of samples (Nregions -1 because we exclude self-connections) and p is the number of predictors in the model and is equal to p = 2 and p = 3 for the one-and two-predictor models (the additional parameter is from the intercept). In general, we find excellent correspondence between ΔR 2 and ΔR 2 adjusted (r = 0.99, p < 10 -15 ). These observations suggest that the change in variance explained is not likely driven simply by an increase in the number of parameters.
• PCA space used in Figure 2 and 3  We apologize for the omission of details about the PCA plots. In line with the reviewer's suggestions, have done the following: 1. Updated the figure caption to include a more detailed description of the PCA analysis. 2. Updated the figure caption to indicate what edges represent. 3. Included in both PCA plots the axis.
We also note that this PCA analysis was performed largely as a means of embedding predictors in a lowdimensional space and, other than helping to reveal similarities between different predictors, we do not analyze the results further.
The revised figure captions read: 1. For Figure 2: (b) Spatial embedding of predictors. The coordinates were determined using the following procedure: 1) calculating the similarity

of regional correlations for every pair of predictors, 2) thresholding this matrix to retain the k=4 nearest neighbors for each predictor, and 3) performing a principal component analysis of the thresholded and symmetrized matrix.
Coordinates represent the first two principal components, PC1 and PC2. Broadly, predictors that are near/distant to one another in principal component space exhibit similar/dissimilar patterns of regional correlations. In this plot, edges exist between predictors if they are nearest neighbors.

For Figure 3:
In this plot, coordinates were determined by: 1) thresholding the count matrix to retain, for each predictor, its k = 4 nearest neighbors, and 2) performing a principal component analysis on the thresholded and symmetrized matrix. Here, the coordinates represent the first two principal components, PC1 and PC2. Predictors that are near/distant from one another in principal components space pair with similar/dissimilar sets of predictors when improving R 2 . Predictors are joined by an edge if they are considered nearest neighbors.

• Exploration of relationships should focus on using analyses that primarily reveals contributions of variables (e.g., PCA) o There are 40 predictors, and the authors tried to explore the relationship between these predictors in relation to the dependent variable (FC). It seems a stretch to force the coupling of predictor, which then results in a count matrix based on their R square changes across participants. Instead, the authors could focus on multivariate analyses such as PCA that conveniently included the information about similarity between predictors in its component space. The count matrix that shows the coupling of predictor forces a relationship that identifies 'pairs' of predictors, but in reality, there is no reason that a combination of 3 or more predictors might be even better suited for this problem. o Core-periphery analysis is unnecessary for understanding the relation between predictors in which their pair-wise relation is turned into a graph.
This is an important point, and we appreciate the opportunity to clarify decisions made as part of our analysis pipeline. In general, we agree with the reviewer that there are multiple strategies for exploring structure-function coupling. While we adopted an approach using single or pairs of predictors, deriving principal components from correlated predictors is another. Here, we explore this type of analysis and its utility for predicting function from structure. We describe the results of this analysis below and report the same findings in the Supplementary Material.
Specifically, we performed the following steps. For each subject, we vectorized and concatenated their 40 predictors (Fig. S9a) into a [NumEdges x 40] matrix and z-scored its columns (Fig. S9b). This matrix was used as input for the PCA algorithm, which generated 40 orthonormal components, the loading of each predictor onto each component, and the variance explained by those components (Fig. S9c). This procedure was performed separately for each subject.

Fig. S9. Predicting FC using all PCs.
In the main text we predicted regional patterns of FC using a series of predictors and later identified pairs of predictors that maximally improved these predictions. However, because predictors are correlated with one another, interpreting the results of this analysis may be difficult. Here, we use principal component analysis (PCA) to generate an orthonormal basis set from the predictors and use the resulting principal components (PCs) to predict FC. We performed PCA at the subject level using the 40 predictor matrices as input (a We identified for each subject the PC that yielded the greatest R 2 and averaged R 2 values across subjects. In panels, h and i we show R 2 values and the optimal PC projected onto cortical surface. We compared R 2 values reported in the main text with those generated using the PCA-based approach. We find that results from the main text outperform those generated using PCs (paired-sample t-test, p < 10 -15 ). We present this comparison in panel j. As in the main text, we also tested the effect of combining pairs of predictors in the same multilinear model. Again, we found that the two-predictor models reported in the main text outperform the PCA approach (paired sample t-test, p < 10 -15 ). A boxplot highlighting this comparison is shown in panel k. (l) Increases in R 2 as a results of including an additional PC as a predictor. Interestingly, the combinations of predictors that lead to the greatest improvement tended to involve PC1 pairing with other components. (k) A matrix plot of how frequently pairs of predictors were identified as the optimal model.
Next, we used PCs to generate predictions of regional FC. We followed the same analysis pipeline as the main text. Instead of fitting 40 linear models (one for each communication measure), we fit 5 models (one for each PC). For each region and each subject, we identified the PC with the greatest R 2 and then averaged this value across subjects, yielding a single [400 x 1] vector (Fig. S9h). We compared this vector against the one reported in the main text (Fig. 2c) and found excellent correspondence ( = 0.96). However, the values estimated using PCA were significantly less than those estimated using the predictors (paired sample t-test, < 10 −15 ; Fig. S9j), indicating that while the patterns are similar, the individual predictors actually outperform the PCA approach. We also find that for most regions (59.3%) PC1 is the predictor that explains the greatest amount of variance (Fig. S9i).
Next, we tested whether combinations (pairs) of PCs could lead to improvements in R 2 . In general, we found that the optimal combinations were PC1+PC2 and PC1+PC3 (Fig. S9k). As in the main text, the regions with the greatest increases in R 2 were concentrated in sensorimotor systems (Fig. S9l). However, as in the previous section, we find that the R 2 values of the multi-linear models with PCs were significantly less than those reported in the main text (paired sample t-test, < 10 −15 ; Fig. S9k).
In general, these results corroborate those reported in the main text. Specifically, they support the hypothesis that local structure-function coupling is strongest in sensorimotor systems and that combinations of predictors -this time pairs of PCs -yield improvements in variance explained.
We now include this figure (Fig. S9) and results results in the Supplementary Material. The revised text reads: In parallel, we repeated the above analysis using principal components derived from predictors rather than the predictors themselves. This procedure helps address concerns related to correlated predictors. In general, the results of this analysis converge with those reported here. See Fig. S9 for a summary of these results.
• A major take-away is that the SC-FC relationships alter with increasing age, however this conclusion assumes that structural and functional data are comparable in quality across the lifespan. o Head motion is strongly associated with resting-state function correlations (e.g., Power et al. 2012;Van Dijk et al., 2012;Ciric et al. 2017), which is critical to account for especially in studies that include participants with heightened motion (e.g., older adults). In the present study, the authors despiked frames that were above a frame displacement of 0.5mm in the lifespan dataset (NKI), which is inadequate in mitigating motion artifacts in the results (see Power et al 2014). Furthermore, no framewise correction was performed for the HCP dataset at all. A more stringent motion threshold should be employed in both datasets (HCP and NKI),

followed by the reporting of the level of motion artifacts detected and removed. Lastly, the number of contaminated frames should also be controlled for in analysis across subjects. This is critical as motion artifact's relation to FC is also distance dependent (Euclidean distance is a major variable in the present manuscript). o In a similar way, head motion is also associated with SC, even after standard correction for head motion, eddy current, and field distortion (Baum et al. 2018). Head motions impacts the estimated SC in a way that is sensitive to edge consistency and length (which is associated with distance). Furthermore, as white matter degrades with age, and the quality of tractography suffers. This results in poorer estimates of white-matter that would alter the aging effect. o Given that the quality of both SC and FC is lower in older adults, it is fairly reasonable that their association is also weaker (e.g., increased noise in the data). Applying the current analyses in a lifespan dataset requires a much more rigorous approach that accounts for these known age-related artifacts and challenges in analysis and interpretation of both types of imaging data.
We agree with the reviewer that the effects of in-scanner motion could be better addressed. To this end, we performed a series of additional analyses and have also updated our Discussion section to candidly discuss possible limitations.
First, we reanalyzed HCP data employing a more rigorous motion threshold. Specifically, following preprocessing (which was left unchanged), we: 1. censored any frames with relative motion > 0.15, 2. censored frames that were within 2 or fewer frames of any frame dropped in step 1, and 3. retained only contiguous sequences of frames of length 5. 4. In addition, we completely excluded any scans that contained fewer than 50% of usable frames following steps 1-3. Any subject dropped a scan based on this criterion was also removed from all subsequent analyses.
Based on these criteria, we retained 70/95 subjects. Of the remaining 70 subjects, they retained, on average, 90.3±0.97% of their frames. These retained frames corresponded to an average relative movement of 0.067±0.01. In contrast, the censored frames had average relative movement of 0.15±0.03 (Note: these values include frames that are sub-threshold in terms of motion but occur in proximity to high-motion frames or represent short sequences of low-motion frames).
Using these data, we calculated FC matrices for each subject. We then replicated the principal results of our study. Specifically, we found:  ). We found a strong correspondence (r = 0.997). These findings held at the subject level (calculating column-wise similarity; r = 0.999±1.3×10 -4 ). 2) Next, we performed an analogous procedure at the regional level, resulting in a [400 region x 40 predict x 70 subject] matrix of R 2 values. Again, we assessed the similarity of motioncorrected results with those in the main text by vectorizing each matrix, transforming them into vectors of length [1,120,000 x 1] and computing their Pearson correlation. As before, we found an excellent correspondence (r = 0.995). We also repeated this analysis at the subject level by transforming each of the [400 x 40] matrices into a [16,000 x 1] vector and computing, for each subject, the similarity between the originally reported and motion corrected R 2 values. Once again, we found a strong correspondence r = 0.9995±6.13×10 -4 . 3) Finally, we modeled FC using pairs of predictors, identified for each region the optimal pair (the one that explained the most variance), and summarized these results with a "count" matrix, as in the main text. We then computed the correlation of the motion corrected count matrix (its upper triangle) with the matrix reported in the original submission. We found a strong correlation of r = 0.997.
Collectively, these results suggest that none of the effects reported in the main text using the HCP data are obviously confounded by in-scanner motion.
We now report these results, including the motion statistics, in the main text and include a supplementary figure showing the scatterplots. The revised text reads: In the main text, we analyzed data that had been processed using the above procedure. We also performed extensive post-processing of these data to reduce the likelihood that in-scanner motion contributed to any reported effects (Power 2012, Power 2014, Ciric 2017, Parkes 2018. Specifically, we implemented the following steps for the HCP data: 1. Using the Movement_RelativeRMS.txt time series, we identified frames with motion greater than 0.15. These frames were immediately censored and not used in the estimation of FC. 2. We also censored any low-motion time points that were within two frames of any of the frames censored in step 1. 3. Following steps 1 and 2, we further censored any sequence of temporally contiguous low-motion frames that was shorter than five frames. 4. Lastly, we excluded a scan if more than 50% of its frames were flagged as high-motion following steps 1-3, i.e. were censored. If any scan from a given subject was removed, we removed that subject and all of their other scans from analysis.
Given these criteria, we retained 70/95 HCP subjects. Of these remaining subjects, we retained, on average, 90.3±0.97% of their frames. The frames that were retained had 0.07±0.01 relative motion. In contrast, the censored frames had relative motion of 0.15±0.03.
Using these data, we fully replicated the results reported in the main text. Specifically, we made predictions of whole-brain FC using the 40 predictors from the 70 low-motion subjects. This resulted in a [40 predictor x 70 subject] matrix of R 2 values, which we transformed into a [2800 x 1] vector. We did the same using the data from the main text and computed the similarity (Pearson correlation) between these two vectors. We found a strong correspondence (r = 0.997; p < 10 -15 13×10 -4 ). Finally, we modeled FC using pairs of predictors with multilinear models, identified for each region the optimal pair (the one that explained the most variance), and summarized these results with a count matrix, as in the main text. We then computed the correlation of the motion corrected count matrix (its upper triangle) with the matrix reported in the original submission. We found a strong correlation of r = 0.997 (p < 10 -15 ).
We also performed a similar procedure using the NKI dataset. Specifically, we excluded subjects from analysis whose temporal signal to noise ratio (TSNR; mean BOLD signal across time divided by temporal standard deviation; Krüger et al 2001) was in the bottom P% or whose DVARS (derivative of RMS variance over voxels; Power et al 2012) was in the top Q%. Of the remaining subjects, we regressed out both variables from the regional R 2 values and analyzed the residuals.
Depending upon the values of P and Q, different subsets of subjects were excluded. Here, we systematically varied these parameters and report: 1. The number of retained subjects, 2. The similarity (correlation) of the residual R 2 vs. age correlation pattern with the pattern reported in the main text, and 3. The mean difference in correlation magnitude. 4. The mean difference in absolute correlation magnitude. We show the results of this analysis in the below figure.
In general, we find broad correspondence between the results reported in the main text and the results at all combinations of parameters (see Fig. S13). Specifically, the similarity between correlation patterns never fell below a value of = 0.29. However, this pattern was obtained using an unreasonable combination of parameters in which only 8 total subjects were retained. For parameter combinations that retained at least half of the full dataset (262 subjects), the minimum similarity was = 0.83, with a mean±standard deviation of = 0.89 ± 0.02 and range of 0.83 to 0.92.
Over this same range the mean correlation magnitude decreased by −0.032 ± 0.005 while the mean absolute correlation increased by 0.035 ± 0.003, suggesting that negative correlations are getting increasingly more negative. Fig. S13. Effect on correlation of regional R 2 versus age of motion-correction applied to NKI dataset. In the main text we reported a correlation between regional R 2 and age using the NKI dataset.
Here we assess the effect of motion-correction on these results. Broadly, this strategy involved excluding data from subjects in the bottom P% based on their temporal signal to noise ratio (TSNR; mean BOLD signal divided by temporal standard deviation (Kruger et al 2001), or was in the top Q% by DVARS (derivative of RMS over voxels; Power et al 2012). Of the remaining individuals, we regressed out both variables from their regional R 2 values and computed their correlation with age. Depending on the values of P and Q different subsets of subjects were retained. Here, we systematically varied these parameters are report: (a) The similarity of the R 2 versus age correlation pattern with the pattern reported in the main text. (b) The mean difference in regional correlation coefficients ( Collectively, the results of these two analyses support those reported in the main text. We now include the above figure (Fig. S13) in the Supplementary Material and have updated the text as well.
• Floor effect of age-related R2 differences. o The authors should address the possibility for a floor effect where regions with initially low R2 ( Figure  4d) have little room to exhibit age-related decreases in R2. A conjunction map of Fig. 4d and 4e would likely show areas of low R2 showing little decrease or even increases, whereas areas with the highest R2 to begin with has the greatest room to show 'decreases' with age. While it is likely that those areas truly exhibit greater age-related decreases in R2, the possibility of a floor effect or even regressing to the mean should be addressed.
This is an important question. We address the possibility of a floor effect using null models to generate null distributions and estimate the theoretical lower limits of regional R 2 values. If a floor effect exists for some regions, wherein their observed R 2 values cannot decrease further, we expect them to be consistent with the null.
For each subject, we predicted their regional FC patterns using predictors derived from a randomized SC matrix (the order of its rows and columns were permuted). We then calculated and retained, for each region, its maximum R 2 across any of the predictors. We repeated this procedure 100 times for each subject, generating subject-specific null distributions. We compared the observed R 2 values with this null distribution (z-score) and identified regions whose z-statistic could not be distinguished statistically from the null distribution (p >= 0.05). We counted how frequently across subjects each of the 400 regions was part of this group, considering them to have already reached their floor R 2 .
We find that, on average, most regions exhibit R 2 values that are statistically greater than their theoretical floor (96.8±1.5%). The specificity of regions that are approaching their R 2 floor is also poor; the region with the greatest frequency is consistent with the null distribution in 14.7% of subjects (Fig.  S14a). Interestingly, and as the reviewer anticipated, the whole-brain pattern is similar to the correlation pattern reported in the main text ( = 0.51; < 10 −15 ; Fig. S14b), opening up the possibility that some of the age-related differences may be attributable to a floor effect.
However, disentangling those two effects -the floor effect from true age-related differences -is challenging. Accordingly, we have included the above figure and now discuss the floor effect as a limitation and area for future investigation.
Fig. S14. In the main text we reported a correlation between regional R 2 and age using the NKI dataset.
Here we assess the possibility that the spatial pattern of these correlations may be driven, in part, by a floor effect. To test for this effect, we use a permutation-based null model. Regions whose R 2 values cannot be statistically distinguished from the null distribution are especially susceptible to a floor effect. In more detail, this procedure entailed the following. For each subject, we predicted their regional FC patterns using predictors derived from a randomized SC matrix (the order of its rows and columns were permuted). We then calculated and retained, for each region, its maximum R 2 across any of the predictors. We repeated this procedure 100 times for each subject, generating subject-specific null distributions. We compared the observed R 2 values with this null distribution (z-score) and identified regions whose z-statistic could not be distinguished statistically from the null distribution (p >= 0.05). We counted how frequently across subjects each of the 400 regions was part of this group, considering them to have already reached their floor R 2 . We find that, on average, most regions exhibit R 2 values that are statistically greater than their theoretical floor (96.8±1.5%). The specificity of regions that are approaching their R 2 floor is also poor; the region with the greatest frequency is consistent with the null distribution in 14.7% of subjects. Interestingly, and as the reviewer anticipated, the whole-brain pattern is similar to the correlation pattern reported in the main text ( =0.51; p<10 -15 ), opening up the possibility that some of the age-related differences may be attributable to a floor effect.
(a) Fraction of subjects in which a region's R 2 is not distinguishable from the null distribution. We show these values projected onto the cortical surface. (b) Values from panel a with the regional R 2 versus age correlation coefficients. (c) Values from panel a grouped by system.
The revised text reads: A final limitation concerns the possibility that the reported correlations between R 2 and age could be attributed to a floor effect. That is, any regions that start with low R 2 early in life necessarily have less room to decrease over the course of the lifespan, whereas regions that start with high levels of R 2 have more room to decrease. Although we show that the fraction of regions that may be susceptible to this effect is small, the spatial pattern is correlated with the reported R 2 versus age correlation map (see Fig. S14}). Disentangling true age-related differences from floor effects is challenging and may require the use of an unbounded measure other than R 2 for assessing structure-function correspondence. Future work should investigate these potential confounds in greater detail. Minor: • Age-related differences observed from a cross-sectional sample should be labeled as 'differences' instead of 'changes'; changes implies within-person changes, which requires longitudinal data.
We have changed all references to age-related changes or lifespan changes to differences.
• Yellow labels or colors are not very visible/readable. Please consider using something like "brown" (e.g. , Fig 1c, 2a, 2g) We appreciate the reviewer bringing this to our attention. We've changed the color to brown.
• Terms that are specifics to other types of data (e.g., 'expression' from genes, 'fingerprinting') creates more confusion than actually help the reader understand the results.
• Due to how similar in setup Figure 1c and 2a are, their y-axis label should indicate that 2a is actually "Mean variance explained across subjects".
We have changed the axis label.

REVIEWER COMMENTS
Reviewer #1 (Remarks to the Author): I highly appreciate the extensive set of new analyses the authors performed in response to the reviewer comments as well as the increased clarity of the manuscript. I have a few minor concerns left: 1. The additional analyses of the floor effect as a driver of regional differences in the age-R2 correlations are an important complement to the paper. However, the results in figure 4G are currently described separately from the results in figure S14 while it seems to me that the associations that are described may be very similar. The floor effect may be a driver of the relationship between variance explained and its correlation with age. That is why I would suggest describing these findings in the same part of the results section.
2. All the figures shown in the manuscript should show results that indicate statistical significance. As they are presented now, the casual reader may be unaware that results are uncorrected. Unthresholded maps could be shown in the supplementary materials. For example, figure 4d-e should contain results after correlation for multiple comparisons, so it is clearer that some of these results are not statistically significant. Also in figure 4f, statistical significance should be indicated with asterisks. Figure 1c could be replaced by figure S1.
3. A related points is that the grey -colour gradient used in many figures results in a rather arbitrary and very stark threshold between the grey regions and the coloured ones. I initially mistook this threshold to reflect statistical significance and I think other readers might make the same mistake. That is why I would recommend using a different colour scale with a less stark transition.
4. I would suggest replacing figure 3D by figure S7. Since the results are nearly identical I do not see the added value of presenting both.
Reviewer #2 (Remarks to the Author): The authors have answers all my and others comments in an excellent way. I recommend publication as it is. Thanks the authors for the the hard work in the revision.
Reviewer #3 (Remarks to the Author): The authors have addressed multiple concerns raised in the previous review. Specifically, they have now clarified their choice of predictors in the introduction with supporting theory, included null models for global analysis, and explored how the 40 predictors perform using PCA in their supplemental analysis. However, there still remain issues regarding data-quality due to motion-related noise. While the authors have done additional analyses to respond to the previous comments about movement, this limitation has not been adequately addressed in the NKI dataset which is the basis of the conclusions regarding age-related differences in the reported observations. This limitation is elaborated below, along with a number of other more minor comments.
Motion correction: a. The authors performed motion correction using frame censoring on the HCP dataset, but not for the NKI dataset (which likely has greater motion contamination). The processing strategy in this latter dataset was to exclude subjects based on a threshold P% TSNR and Q% DVARS. While this latter procedure removes subjects that move more on average, the remaining subjects still retain motion contaminated frames which would bias the measurement of resting-state correlations. Given that the HCP data were frame censored, its not clear why this same approach was not conducted for the NKI dataset, especially given that the latter includes individuals that exhibit a higher number of large head movements during the scan session (i.e., older adults). Without adequate motion processing, the present conclusions regarding the lifespan variation are not supported. b. In relation to the above, the authors compared results of the motion-corrected data to the original uncorrected analysis by comparing the vectorized matrix of the original analysis (i.e., from the main text) and the motion-corrected analysis. While the patterns appear comparable, this is an indirect way of demonstrating the results are robust to attempts to minimize sources of motion-related noise. This is particularly important for the lifespan dataset analysis using the NKI dataset. The authors should report the outcome statistics using the motion-corrected processing, both for HCP and NKI.
Minor comments: • 2nd paragraph of pg. 3, the author states that "Across all factors, we found that the majority of variance explained can be attributed to one-step (direct) connections (Fig. 1c)". I think they meant to refer to Fig. 1d.
• In the first paragraph of pg. 12, "Interestingly, while the regional patterns of structure-function coupling in NKI and HCP datasets were correlated, the magnitude of coupling was noticeably stronger." Please clarify what is the magnitude of coupling noticeably stronger in.
• The authors should explain why age 'bins' are used in the lifespan analysis instead of using just age as a continuous variable.
• The authors clarified that their use of PCA in the main text was mainly to showcase the similarity between predictors in a lower dimensional space. Were the columns centered and/or normalized? Centering (and in some cases normalizing) is often the default for PCA functions, and could alter interpretation of the resultant output. Greater details should be provided to clarify what was done.
• In the paragraph before the Discussion section in pg. 8, "Heteromodal systems, like default mode and control networks, on the other hand, exhibit subtle reductions in coupling magnitude and, in some cases, even increase with age." It is a leap to infer potential increases of an entire system's effect when the mean r(age,R^2) is below 0. The authors should revise to point out certain "regions" within the default mode and control networks may exhibit these patterns.

Dear Reviewers,
We wish to thank the editor and all reviewers for their thoughtful comments and suggestions. Our enclosed submission is significantly improved as a result. We provide, here, a short summary of the largest changes: 1. We updated the post-hoc motion censoring strategy for the NKI dataset so that it is consistent with the technique applied to the HCP data. 2. We updated all main text figures so that the reported results come from the motion-censored data rather than reporting uncensored data and, treating it as a benchmark, verifying its robustness with censored data. 3. We updated figure colormaps ( Figure 2h) and added two additional supplementary figures; Figure S2 depicts the fraction of subjects for which regional structure-function coupling was statistically significant and Figure S10 shows age-frequency analysis without the binning procedure. 4. We have incorporated several panels from supplementary figures from the previous submission into the main text; those supplemental figures no longer appear. Specifically, we removed the figure showing thresholded correlation maps and the adjusted 2 analysis.
Throughout this letter we refer to comments made by the reviewers in italics. Changes made to the text are shown in blue italics and indented. When appropriate, we highlight the section name where changes were implemented. In addition to this response letter, we include two versions of the revised manuscript: a "marked" version that tracks all of the changes made since the original submission and a "clean" version that includes all changes but without any highlights. 1. The additional analyses of the floor effect as a driver of regional differences in the age-R2 correlations are an important complement to the paper. However, the results in figure 4G are currently described separately from the results in figure S14 while it seems to me that the associations that are described may be very similar. The floor effect may be a driver of the relationship between variance explained and its correlation with age. That is why I would suggest describing these findings in the same part of the results section.
We appreciate this point and have moved the description of the floor analysis to the same section where we describe results in Figure 4g. As a result, Figure S14 is now numbered Figure  S12. The revised text reads: "An important concern related to age differences in structure-function coupling is the possibility of a floor effect. Namely, that the R 2 of some regions cannot exhibit significant decreases because it is already near its theoretical floor value. To assess the likelihood of such an effect occurring, we estimated floor values for each region and subject using a permutation-based null model (100 repetitions; see Fig. S12 for methodological details) and compared them against the observed R 2 values. In general, we found that most regions (96.8 ± 1.5% were significantly greater than their theoretical floor) and that those regions nearing the floor did not overlap with the regions in which we reported strong age effects."  figure 4f, statistical significance should be indicated with asterisks. Figure 1c could be replaced by figure S1.

All
This is an important point. We have updated the following figures and panels to indicate statistical significance.
• Figure 2c and Figure 4d -these panels depict average 2 and were left unchanged, as they represent complicated transformations of statistical models (means of the maximum 2 values across subjects and predictors, respectively). However, we have added an additional figure depicting the fraction of subjects for which each region was determined to be statistically significant (false discovery rate fixed for each subject independently at = 0.05). Note that in almost every case, the models were statistically significant. It is straightforward to understand why this is. At the group level, the smallest 2 values in the HCP dataset are close to 0.04. When transformed back to a correlation coefficient, this yields a value of = 0.2. Given that the correlations were estimated using − 1 samples and the monotonic relationship of with -values, correlations of this magnitude will tend to be statistically significant. Fig. S2. Fraction of subjects with significant regional structure-function coupling. (a) Human Connectome Project dataset. (b) Nathan Kline Institute dataset. For both datasets, the false discovery rate was fixed at = 0.05 and a separate adjusted -value calculated for each subject.

3.
A related points is that the grey -colour gradient used in many figures results in a rather arbitrary and very stark threshold between the grey regions and the coloured ones. I initially mistook this threshold to reflect statistical significance and I think other readers might make the same mistake. That is why I would recommend using a different colour scale with a less stark transition.
We appreciate this suggestion and apologize for any confusion. We presume that the reviewer is referring primarily to the colormap used in Figure 2h. In these panels, the colormaps go from a gray value to some color, where the color corresponds to a particular family of predictors, e.g. the red map represents the communicability measures. We agree with the reviewer that these maps can be improved, both visually and in terms of clarity.
To this end, we updated the colormaps and also added text in the figure caption, noting explicitly that the colors denote categories or families of predictors and not statistical significance.
In choosing new colormaps, we considered, specifically, the perceptual contrast of maps we currently use. In general, a concern about some colormaps is that they are perceptually imbalanced, such that variation in data values is not well captured by variation in the colormap itself. A classic example of a problematic colormap is the former "jet" map, which used to be the default in MATLAB (see Kovesi, 2015 for a detailed explanation of the issues and this blogpost for Mathworks justification on why "jet" was replaced).
In examining our current colormaps using methods from Kovesi (2015), we found evidence of ranges of modest imbalance. These imbalances arose because of how we selected our base colors. Our current colormaps were generated using four ordered RGB triplets and interpolating between the extremes. Specifically, the four triplets were: "light gray" "dark gray" "color" "lighter version of same color" As an example, consider the "dark blue" colormap, which corresponds to the family of predictors associated with mean first passage times (mfpt-wei and mfpt-bin). In the top panel of the figure below, we show the four base colors along with the interpolated colormap. Following Kovesi (2015), we superimposed sinusoids of varying amplitude over the colormap. Intuitively, if the colormap is perceptually balanced then the variation should be evident.
Regions of the colormap where the sinusoidal behavior is not perceptually distinguishable correspond to perceptual discontinuities. Using this approach, we identified at least one region of perceptual discontinuity near the transition between "color" and the "lighter color" (a similar discontinuity was evident in the other colormaps). We highlight this in red in the below figure.
To resolve this issue, we created a new colormap that transitioned from gray to dark blue. We repeated the above analysis and found no clear evidence of discontinuities, suggesting that the new colormap represents an improvement. For each predictor (subpanels of Figure 2h), we generated a new colormap following the above procedure. We show, below, the updated panel: The revisions to the figure captions are shown below: In Figure 1: "Note that here and in all other figures, we adopt a consistent color scheme, wherein data associated with a given predictor, e.g. same measure but different parameters or weighted/binary versions of a measure, are depicted using variation of the same color. For example, communicability measures are red, matching index is yellow, and mean first passage time is dark blue." In Figure 2: "Note that the base colors for each surface plot correspond to different predictors and not varying levels of consistency across subjects or statistical significance." 4. I would suggest replacing figure 3D by figure S7. Since the results are nearly identical I do not see the added value of presenting both.
We agree and have done so. We have also removed the previously designated Figure S7 from the supplement. Finally, we have modified the text that had previously appeared in the caption for Figure S7 and created a new subsection in the Materials and Methods section, describing the procedure for calculating adjusted 2 . The added text in both sections reads: In the Results section: "Here, when we calculate 2 , we adjust the multilinear model's 2 to penalize for the addition of a second predictor (see Materials and Methods for details of the correction)." In the Materials and Methods section: "In the main text we reported the change in variance explained, Δ 2 as a result of including two predictors. In general, we might expect increases in 2 simply due to the inclusion of a second parameter. One strategy to account for this addition is to adjust the 2 measure based on the number of predictors. Specifically, we calculated 2 = 1 − (1− 2 )( −1) − −1 for each region for both the one-and two-term models.
Here, = 399 is the number of samples ( − 1 because we exclude selfconnections) and is the number of predictors in the model and is equal to = 2 and = 3 for the one-and two-predictor models (the additional parameter is from the intercept)." Figure S10. Correlation of predictor frequency with age as a continuous variable. In the main text we binned subjects into groups based on their age and calculated the correlation frequency of each predictor versus age. Here, we repeat this analysis but without the binning procedure. We found that only weighted and binary mean first passage time passed multiple comparison corrections ( = −0.32 and = −0.22; = 9.8 × 10 −13 and = 7.7 × 10 −7 ; false discovery rate fixed at = 0.05; adjusted critical value of = 0.0011).
The revised text reads: "We then grouped subjects into percentile-based age bins (10 bins in the main text; see Fig. S9 for reproducibility of results with different numbers of bins and Fig. S10 for results in which age was treated as a continuous variable and without the binning procedure), and found that …" 4. The authors clarified that their use of PCA in the main text was mainly to showcase the similarity between predictors in a lower dimensional space. Were the columns centered and/or normalized? Centering (and in some cases normalizing) is often the default for PCA functions, and could alter interpretation of the resultant output. Greater details should be provided to clarify what was done.
We apologize for the omission. Here, we did not center or z-score columns. We now note this.
The revised text reads: In Figure 2's caption: "Note that here the principal component analysis was carried out without zscoring or centering the columns of the correlation matrix." In Figure 3's caption: "As in Fig. 2b, we did not center or z-score columns as part of the principal component analysis."

5.
In the paragraph before the Discussion section in pg. 8, "Heteromodal systems, like default mode and control networks, on the other hand, exhibit subtle reductions in coupling magnitude and, in some cases, even increase with age." It is a leap to infer potential increases of an entire system's effect when the mean r(age,R^2) is below 0. The authors should revise to point out certain "regions" within the default mode and control networks may exhibit these patterns.
We have updated the text in that section. In now reads: "Collectively, these results suggest that the interrelationship of structural and functional connectivity covaries weakens with age. Notably, the areas that exhibit the greatest reductions fall within sensorimotor systems, which are among those with the strongest coupling to begin with, and include regions in striate and extrastriate cortex, as well as primary motor cortex and superior parietal lobule. Areas within heteromodal systems, like default mode and control networks, on the other hand, exhibit subtle reductions in coupling magnitude and, in some cases, even increase with age for instance, both dorsal and ventral prefrontal cortices along with temporal pole). Our findings point to heterogeneous differences in the complex relationship between the brain's physical wiring and its intrinsic functional organization."