Tract-specific statistics based on diffusion-weighted probabilistic tractography

Diffusion-weighted neuroimaging approaches provide rich evidence for estimating the structural integrity of white matter in vivo, but typically do not assess white matter integrity for connections between two specific regions of the brain. Here, we present a method for deriving tract-specific diffusion statistics, based upon predefined regions of interest. Our approach derives a population distribution using probabilistic tractography, based on the Nathan Kline Institute (NKI) Enhanced Rockland sample. We determine the most likely geometry of a path between two regions and express this as a spatial distribution. We then estimate the average orientation of streamlines traversing this path, at discrete distances along its trajectory, and the fraction of diffusion directed along this orientation for each participant. The resulting participant-wise metrics (tract-specific anisotropy; TSA) can then be used for statistical analysis on any comparable population. Based on this method, we report both negative and positive associations between age and TSA for two networks derived from published meta-analytic studies (the “default mode” and “what-where” networks), along with more moderate sex differences and age-by-sex interactions. The proposed method can be applied to any arbitrary set of brain regions, to estimate both the spatial trajectory and DWI-based anisotropy specific to those regions.

Neuroimage 43(1):20-28 Michielse S, Coupland N, Camicioli R, Carter R, Seres P, Sabino J, Malykhin N (2010) Selective effects of aging on brain white matter microstructure: a diffusion tensor imaging tractography study. Neuroimage 52(4):1190-1201 The application of a fibre specific anisotropy measurement to ageing also demonstrated higher sensitivity and specificity than traditional fractional anisotropy measurements in the following reference.
Rojkova K, Volle E, Urbanski M, Humbert F, Dell'Acqua F, Thiebaut de Schotten M. Atlasing the frontal lobe connections and their variability due to age and education: a spherical deconvolution tractography study. Brain Struct Funct. The methods regarding the tractography are missing a few details. How many fibres populations were modelled per voxels? 2 or 3? This is important to specify because it has an impact on the solving of the crossing. "Distance correction was applied", it is essential to specify that distance correction is a multiplication of the tract probability by the distance. It does not really solve the issue of distance and structural connectivity.
Some limitations should be acknowledged: -It is doubtful that this new approach can be applied to the whole brain due to the penalisation for multiple comparisons that increase exponentially with the increase of the number of region of interest.
-The measurement can be biased (as many other methods) by aligning to the stereotaxic space that, when not perfect, can lead to increased variability in the findings.
-Some indication on the computation time, which is believed to be rather long, should be mentioned.
Other than that, excellent work. Michel Thiebaut de Schotten Reviewer #2: Remarks to the Author: This work presents a method to analyze the integrity of white matter tracts in vivo from diffusion MRI images. This method is based on post-processing probabilistic tractography-issued tracts and then aggregate the tracts across different subjects to obtained a weighed mask that then is used to define a tract-specific anisotropy estimations. This technique is then used to determine sex and age effects on tracts connecting regions belonging to the default mode network (DMN) and the what-where network (WWN). The applications are well designed a presented. My review will mostly be centered on the technical aspects of diffusion MRI.
My first point is with respect to the innovative aspect of the method. Specifically, aggregating streamlines across subjects to build a population-specific weighted tract mask has been proposed more than 10 years ago by Hua et al (Neuroimage 2007), even to compute the "tract-specific anisotropy" proposed in this article. This technique has been then used by many different studies (see e.g., Jolles et al 2014, Subramaniam et al 2018 and proposed in several different variants. A succinct review of some of these proposals and studies is presented in Calamante (Magn Reson Mater Phy 30, 317-335, 2017). It should be noted that as the authors are basing their analysis on streamlines, the procedure is the roughly similar for deterministic or probabilistic tractography. Furthermore, the method proposed to select the most probable tract, is admittedly heuristic and composed of a multistep protocol where different parameters are hand-adjusted, which opens questions on the reproducibility of the analysis. In all the method to obtain the most representative tract and to aggregate the tract-specific anisotropy estimation is, without acknowledging it, a re-iteration of methods that have been proposed at-length in the past 10 years in the diffusion community.
My second point, and a less important one, is with respect to the introduction. There are statements in the introduction which are plainly wrong an example of this being: "for two identical fibres oriented along the anteroposterior (AP) axis, FA would be inversely proportional to the number of fibres crossing each along the (perpendicular) mediolateral (ML) axis". FA is a complex and diffuse aggregate measurement depending on several physiological factors, such as density of axons in a specific voxel, proportion of extra-cellular water, among other phenomena (see Wang et al Brain 2011 for an initial evaluation of this and Novikov et al 2018, for a review of this phenomenon).
In all, the results proposed as an example can be of value to the community. However, this article is presented as a novel method to compute tract-specific measurements which, at least in the current presentation does not appear to be different from a long tradition of proposed methods, and the presentation of diffusion MRI specifics in the introduction fails at portraying the authors as experts in the field.

Response to Reviewers General
Both reviewers have raised excellent points about previous advances in DWI tractography-based approaches, and particularly those that target tract specificity. We agree that our initial introduction to the current study lacked a thorough discussion of the preceding body of work, and have substantially amended both the Introduction and Discussion sections to address this shortcoming. Additionally, the term "tract" should also be better defined as the set of axons projecting (possibly bidirectionally) between two grey matter regions, which is the main point of differentiation between the current approach and others highlighted by both reviewers. However, we do not feel that the present study is merely an iteration of existing methods, at least not based on any studies we are aware of. A comparison of the specific studies highlighted by the reviewers is given below, and where appropriate has also been incorporated into our Introduction and Discussion section. We are grateful to the reviewers for suggestions that have, in our opinion, improved these sections.

Reviewer 1
1. As it read, the authors seem to introduce this work as the first attempt to measure tract specific anisotropy. However, this is not the case as other authors have been developing measures and software that are fibre specific. This is certainly true, and we thank reviewer for making us aware of these two studies in particular. We did not intend to represent the current study as the first such attempt at fibre-specific approaches, and agree with both reviewers that our introduction should be modified to properly summarise the existing literature on this topic. The relevant Introduction section (page 2) now includes this discussion as an appended paragraph: Additional fibre-specific DWI approaches have also been proposed, including qspace and q-ball imaging, spherical deconvolution, and CHARMED (Tuch, 2004;Tournier et al., 2004;Assaf and Basser, 2005). In particular, spherical deconvolution uses a spherical harmonic decomposition to estimate an orientation distribution function (ODF) from the observed diffusion signal (Tournier et al., 2004). This approach allows both the diffusion model and the number of distinct fibre populations within a voxel to be estimated from the observed data, and is the basis for voxel-wise estimation of apparent fibre density (AFD) for these distinct populations (Raffelt et al., 2012). Differences in oriented clusters of AFD (also referred to as fixels; Raffelt et al., 2015) have been shown for patients with motor neurone disease (Raffelt et al., 2012) and Alzheimer's disease (Mito et al., 2018). A related spherical deconvolution-based approach, called hindrance modulated orientational anisotropy (HMOA), uses the amplitude of specific lobes of the ODF as an estimate of white matter integrity for a specific fibre population (Dell'Acqua et al., 2012). HMOA of the postcommissural fornix has been shown to predict verbal memory performance in a healthy aging cohort (Christiansen et al., 2016).
Additionally, we have modified the Discussion in other to better acknowledge how these existing approaches may complement our the proposed methodology (page 12; changes underlined): It is possible that the increased specificity of the current approach permits a more fine-grained spatial and angular dissection of effects than does TBSS with FA, which uses a more coarse-grain white matter skeleton, and is not orientationspecific. If so, then the positive effects observed here may reflect a real age-related increase in white matter integrity for specific tracts. This possibility is supported by reports of fairly widespread increases in fMRI-based functional covariance with aging (Tomasi and Volkow, 2011), which have been proposed to reflect compensatory changes in response to degeneration or dysfunction of other brain regions. Given the conflicting evidence, however -particularly with the HMOA evidence from Rojkova et al. (2015), which is fibre-specific -these effects should be interpreted with caution. It will be important in future TSA studies to increase the number of ROIs, or query specific crossing tracts, in order to obtain a more complete picture of age-related effects across white matter. One promising avenue could be based on the so-called "tract-specific fractional anisotropy" approach (Mishra et al., 2014), in which uses the free water fraction to estimate an adjusted FA value in crossing-fibre regions. Alternatively, an integration of the current approach with existing fibre-specific spherical deconvolution-based methods, such as AFD (Raffelt et al., 2012) or HMOA (Dell'Acqua et al., 2012), could potentially be used to disambiguate the interpretation of TSA values in areas of dense crossing fibres.
2. Since age differences was targeted as a proof of concept in the present paper, the manuscript should include references to the pioneers in the field of age-related diffusion weighted imaging differences.
The application of a fibre specific anisotropy measurement to ageing also demonstrated higher sensitivity and specificity than traditional fractional anisotropy measurements in the following reference.
We agree that the discussion of age effects should be expanded. We have included a more expansive discussion of the relevant studies, only omitting those suggestions that were aimed at childhood and adolescence, since these don't overlap with the age range of our cohort. The relevant Discussion section now reads (page 11; changes underlined): For the WWN, age-related decreases in TSA occurred mainly in the body, but not the splenium, of the corpus callosum. This pattern is in agreement with several DWIbased studies of age-related connectivity changes. Burzynska et al. (2010) used TBSS to show an age-related reduction in FA (and increase in radial and mean diffusivity) in the genu and body of the corpus callosum, but not the splenium. Using a DTI approach, Bennett and Madden (2014) found decreased FA in older versus younger participants for both the genu and splenium, but with a more pronounced effect in the former. The same authors report an anterior-to-posterior gradient in age-related FA changes, with these being more pronounced in frontal white matter, consistent with the pattern found in the current study (Bennett et al., 2009). An even more pronounced pattern was reported by Michielse et al. (2010), who found an age-related decrease of FA in the genu, no relationship in the body, and a late (age 70-85) increase in the splenium. Bastin et al.(2008), also using tract shape modelling, found a significant decrease in FA for the genu, but not the splenium, in an elderly cohort (age 65 to 87), while in a cohort ranging from 30 to 80 years, Hsu et al. (2008) reported a similar age-related decrease in FA (and increase in MD) for the anterior, but not posterior corpus callosum. Interestingly, evidence from Hasan et al. (2009) suggests FA in the corpus callosum changes in a quadratic manner across the lifespan: increasing between age 7 and 20 years and decreasing between 20 and 60, which is in line with the present findings. Taken together, these findings suggest that, in adults, age-related changes in WM integrity of the corpus callosum may be more prominent anteriorly, and reduced or even reversed posteriorly.
Positive age associations were a more surprising finding, as numerous articles report negative age/FA associations (e.g., Kodiweera et al., 2016) and postmortem evidence of white matter loss and decrease in the proportion of small myelinated fibres with age (Tang et al., 1997;Aboitiz et al., 1997). Both positive and negative age/FA associations have been reported in at least one previous brain-wide TBSS study (Kochunov et al., 2007), however. For DMN in particular, we found that most positive associations occurred in regions proximal to ROIs, where the potential confound of crossing fibres is likely more pronounced (Jeurissen et al., 2012). The positive relationships in the WWN were especially prominent, and suggest a (paradoxical) increase in white matter integrity in these tracts. The majority of positive relationships in WWN were found in the middle of the superior longitudinal fasciculus (SLF). One TBSS study focusing on the SLF in a healthy cohort found no effect of age on FA (Madhavan et al., 2014), while another whole-brain study found SLF among tracts with negative age effects (Marstaller et al., 2015).
Similarly, Rojkova et al. (2015), using FA and the fibre-specific HMOA approach, found that both metrics decreased with age in the SLF bilaterally. On the other hand, increased FA in Alzheimer's disease patients has also been reported in SLF (Douaud et al., 2011). The authors of this study suggest that a relative sparing of crossing motor fibres may account for this effect, but this is inconsistent with our observed increase in TSA; on the contrary, our findings might be explained by a relative decrease in WM integrity in these crossing fibres.
3. The methods regarding the tractography are missing a few details. How many fibres populations were modelled per voxels? 2 or 3? This is important to specify because it has an impact on the solving of the crossing.
The default of N=3 populations was used in this study; this has now been added to our Preprocessing section (page 14).

"Distance correction was applied", it is essential to specify that distance correction is a multiplication of the tract probability by the distance. It does not really solve the issue of distance and structural connectivity.
We totally agree with this; the parameter was used in order to ensure that the resulting distributions were less highly skewed towards voxels near to the seed ROI. The relevant section has been updated to emphasize that this correction is not meant to actually solve the issue of distance bias (page 14; changes underlined): Probabilistic tractography was performed in MNI-152 space, using the probtrackx command (Behrens et al., 2007). For both networks, we generated 50,000 streamlines per seed voxel, with the other seed regions as target masks. We applied a step length of 0.5 mm, a curvature threshold of 0.2, a minimal path distance of 5 mm, and a fibre threshold of 0.01. A distance correction was also applied. Notably, this correction merely multiplies the streamline count at each voxel by its expected distance from the seed voxel. This is not an adequate correction for distance bias (which is unlikely to be linear), but does help reduce this bias in the middle of trajectories, for the purpose of tract determination (see next section). For each voxel, probtrackx recorded the number of streamlines that encountered that voxel, producing a separate count for streamlines terminating in each target ROI. Additionally, the voxel-wise mean orientation of streamlines between each seed/target pair was computed.

It is doubtful that this new approach can be applied to the whole brain due to the penalisation for multiple comparisons that increase exponentially with the increase of the number of region of interest.
While we hope that this method might eventually be refined in a way that accommodates whole brain analyses, in its present form we agree that this will have a high cost in terms of family-wise error. Notably, however, the approach (and number of tests) should scale quadratically with the number of ROIs, and an approach such as one-to-all, rather than all-to-all, could be feasibly applied. We also take hope from the use of FDR control in GWAS studies, where millions of tests are performed. A more thorough discussion of this complexity issue is now provided as a new paragraph in the Discussion (page 13): The relatively small networks used here (9 ROIs for DMN and 10 ROIs for WWN) required a total processing time of over 200 hours per participant, on CPU processors (see Supplementary Table 3). Notably, processing time will scale quadratically with the number of ROIs ( ( )), indicating that it may be infeasible to apply the TSA approach to the full set of possible ROIs (of comparable size) in the brain. On the other hand, the volume of grey matter in the brain is finite, and the number of ROIs comprising a whole-brain network depends critically on their granularity. Additionally, the bulk of processing time for this approach is attributable to the preprocessing steps (bedpostx and probtrackx), which were run using CPUs. GPU versions of these functions have recently been introduced, which can reduce processing time by a factor of 200 (Hernandez-Fernandez et al., 2019). This would reduce the required processing time for the present study to 10 hours per participant. A related limitation is that the number of multiple hypothesis tests (and associated family-wise error) also increases by ( ). However, the false discovery rate (FDR) approach we use to control family-wise error should be robust even to the high number of tests expected with a whole-brain TSA analysis (and is commonly used in genomic studies; see Korthauer et al. 2019). Ultimately, whether whole-brain TSA analysis is feasible in practice remains to be demonstrated.

The measurement can be biased (as many other methods) by aligning to the stereotaxic space that, when not perfect, can lead to increased variability in the findings.
This is indeed an important caveat, and is especially worrisome when applied to neurodegeneration, neurodevelopment, or lesions, where the structure of the brain and the associated normalization can be expected to change substantially. This may be especially relevant near ventricles, as is seen in Figure 4b. We have modified our discussion of this potential limitation (which is also an issue for many neuroimaging methods) in the Discussion (page 12; changes underlined): The NKI Rockland dataset was chosen due to its large size, age range, and the use of a single MRI scanner and protocol. To ensure the cohort was as representative of the general population as possible, and to enable the analysis of age over the lifespan, we chose to use close to the full age range (18-80), and to exclude participants with clinical diagnoses. As with most population templates, however, the choice of cohort is an important consideration when interpreting a derived result. The human brain is known to show systematic anatomical grey matter changes across the lifespan (Sowell et al., 2003;Giorgio et al., 2010), and this will almost certainly bias normalization in a way that may account for a portion of the TSA effects reported here. Indeed, variability of findings due to the choice of T1w templates has been shown for voxel-based morphometry (Shen et al., 2007). It will be important in future studies to assess the influence of this bias, use cohorts that are more targeted to a particular phenomenon under investigation, generate population-specific T1w templates for normalization (see, e.g., Whitwell et al., 2007), and compare the predictions of TSA to in vivo or post mortem analyses of white matter (e.g., as in Reveley et al., 2015).

This information has been added as a new Supplementary Table 3 (page 24):
A discussion of this has also added to the Discussion, highlighting the drastic performance improvements (approximately 200x) supplied by GPU implementations of the FDT tools (see Hernandez-Fernandez et al., 2019) (page 13; see above).

Reviewer 2
4. TSA statistics are mapped to the core trajectory by weighing based on the tract map (see Line 236). A "lambda" parameter is provided that can influence the spatial distribution of this weighting, although we use lambda=1 and this parameterization is completely optional.
Importantly, each of these parameters is simply a way to fine-tune the method to attain the heuristical objective: "we want a spatially constrained distribution that represents the most probable tract, based on the diffusion evidence, between two GM ROIs". It is unlikely, in our opinion, that changes to these parameters will significantly alter the observed results, other than to produce failures for a higher fraction of ROI pairs, or alter the spatial extent (or smoothness) of the resulting statistics. We also note that many existing and well-established neuroimaging methods use parameters in this way (e.g., TBSS, ProbtrackX, BET, Freesurfer, etc.).
In our opinion, using the default parameters specified in the manuscript will result in outcomes that are reproducible and comparable. We only recommend altering these parameters for a specific purpose; e.g., to broaden or shrink the spatial extent of the resulting TSA map.
This is indeed an important consideration, and we have added a new paragraph to our Discussion section (page 12-13): The estimation of tract trajectories and TSA values involves a heuristic approach, with numerous parameters involved at each step. This raises the potential for parameter adjustments to vary the results in ways that bias the resulting distributions and statistics. In general, however, these parameters were chosen in order to spatially constrain these estimates in reasonable ways; for example: to optimise a threshold such that a single trajectory is chosen from several alternatives, to constrain core polylines to realistic geometries, or to determine the spatial extent at which TSA values are used to compute distance-wise statistics. Of these parameters, it is only the latter where the researcher is able to exercise discretion, i.e., over the degree of spatial certainty used to map statistics at a given distance to the "core" tract trajectory (specified by the , and parameters). As a general policy, we recommend using the default parameter values for tract and TSA estimation, in the absence of a principled reason to adjust them 6. In all the method to obtain the most representative tract and to aggregate the tractspecific anisotropy estimation is, without acknowledging it, a re-iteration of methods that have been proposed at-length in the past 10 years in the diffusion community.
As argued above for the specific examples given, and based on our knowledge of the current literature, we respectfully disagree that it is a "re-iteration" of any existing method. While we accept that the anatomical properties generating an FA signal are more complex than simply oriented axonal compartments, the sentence above is intended to present a simple hypothetical situation that challenges the use of streamline counts to infer connectivity density. Two "identical fibres" will generate different streamline distributions if one is crossed by a perpendicular fibre and the other is not. Moreover, the relationship will be inversely proportional because the crossing fibre will reduce FA proportional to the fraction of diffusion occurring along its axis. This follows directly from the formula for FA (Basser & Pierpaoli, 1996): If diffusivity in the perpendicular direction ( ) is varied from 0 to 1, while holding constant, the relationship with FA is indeed inversely proportional: The issue is perhaps the wording of the sentence, which we have modified to reflect the complexities subsumed by the word "identical" (page 2, changes underlined): It is often desirable to relate voxel-wise DWI-based metrics such as FA to other phenotypical observations, such as behavioural or cognitive measures, or clinical status. Voxel-wise analyses can be highly confounded by the individual geometry of white matter tracts, and one way to address this issue is tract-based spatial statistics (TBSS), in which FA measures are projected onto a population-based FA "skeleton" with a high probability of being white matter in all participants (Smith et al., 2006(Smith et al., ,2007. The presence of crossing fibres, however, also has implications for the interpretation of FA (Jbabdi et al., 2010). As a hypothetical example: for two otherwise anatomically identical white matter fibres, the introduction of perpendicularly oriented axons to one will reduce its FA proportionally. Interpreting FA in terms of the underlying microstructure of white matter in a voxel is thus inherently ambiguous. This ambiguity can be improved if crossing fibres are explicitly modelled, for example using the Bayesian approach described above. Such a crossing fibre model has been proposed as an extension to the TBSS approach (Jbabdi et al., 2010).
We are happy to respond to any other perceived issues with statements made in the revised Introduction.