Abstract
Diffusion magnetic resonance imaging and tractography enable the estimation of anatomical connectivity in the human brain, in vivo. Yet, without ground-truth validation, different tractography algorithms can yield widely varying connectivity estimates. Although streamline pruning techniques mitigate this challenge, slow compute times preclude their use in big-data applications. We present ‘Regularized, Accelerated, Linear Fascicle Evaluation’ (ReAlLiFE), a GPU-based implementation of a state-of-the-art streamline pruning algorithm (LiFE), which achieves >100× speedups over previous CPU-based implementations. Leveraging these speedups, we overcome key limitations of LiFE’s algorithm to generate sparser and more accurate connectomes. We showcase ReAlLiFE’s ability to estimate connections with superlative test–retest reliability, while outperforming competing approaches. Moreover, we predicted inter-individual variations in multiple cognitive scores with ReAlLiFE connectome features. We propose ReAlLiFE as a timely tool, surpassing the state of the art, for accurate discovery of individualized brain connectomes at scale. Finally, our GPU-accelerated implementation of a popular non-negative least-squares optimization algorithm is widely applicable to many real-world problems.
Main
Intact anatomical connectivity among brain areas is critical to cognition^{1}. Accurate estimation of anatomical connections in vivo is essential not only for uncovering the neural underpinnings of human behavior, but also for understanding the genetic bases of neurological disorders^{2}.
Diffusion magnetic resonance imaging (dMRI), followed by tractography, enables the estimation of anatomical connectivity in the human brain, in vivo^{3}. dMRI measures the diffusion of water molecules in the brain’s white matter, and tractography algorithms then estimate axonal structures, post hoc, based on this restricted diffusion^{4}. However, dMRI and tractography algorithms are prone to challenges such as acquisition noise and redundant fiber geometries. As part of an international tractography challenge, a recent study^{5} compiled efforts by 20 teams that estimated a whole-brain connectome from a simulated dMRI scan, which was, in turn, generated with simulated ‘ground-truth’ fiber bundles. The diverse success rates across the different teams, in terms of their ability to match the ground truth, underscore the magnitude of this challenge. Because actual ground-truth connectivity in the brain is typically unavailable in vivo, direct validation of tractography in the human brain remains elusive.
Streamline pruning and evaluation algorithms represent a state-of-the-art, post-processing approach to address these challenges^{3,6,7}. Linear Fascicle Evaluation (LiFE) is a recent, state-of-the-art model that prunes out spurious fibers based on the quality of fit to the underlying diffusion signal^{3}. Yet, LiFE’s algorithm is implemented on central processing units (CPUs) and suffers from both speed and memory bottlenecks^{3}, which preclude its application for connectome evaluation at scale.
In this Brief Communication, we present an improvement to LiFE—Regularized, Accelerated, Linear Fascicle Evaluation, or ReAlLiFE—for rapid and accurate connectome evaluation at scale. We improve the LiFE algorithm by introducing an explicit regularization (sparsity) penalty into its objective function, and present a scalable graphics processing unit (GPU) implementation that routinely achieves orders-of-magnitude (>100×) speedups over CPU implementations while also estimating sparser and more consistent connectomes. Next, we show that ReAlLiFE performs at par with, and even outperforms, other state-of-the-art approaches (for example, SIFT2^{6} and COMMIT2^{8}) in terms of estimating streamlines with high test–retest reliability. Finally, we apply ReAlLiFE to identify structural connectivity correlates of behavior in a cohort of 200 participants drawn from the Human Connectome Project (HCP) database^{9}. We propose ReAlLiFE as an effective tool, surpassing the state of the art, for rapid and accurate connectome discovery at scale.
We introduced a preliminary version of the ReAlLiFE algorithm in an earlier study^{10} (Fig. 1a,b); this implementation achieved 50–100× speedups over CPU implementations of LiFE. In the present study, we optimize the algorithm further to achieve even greater speedups (>100×, up to 155×; Methods). We demonstrate these speedups with three different diffusion MRI datasets.
We tested for speedups, first, with a state-of-the-art diffusion MRI dataset (dataset H; N_{v} = 437,495, N_{θ} = 270) from the HCP database^{9}. We generated connectomes of seven different sizes, ranging from 50,000 to two million fibers (Methods). The streamlines in these connectomes were then pruned with the CPU implementation, as well as with our GPU implementation of LiFE (Methods). The GPU implementation produced substantial speedups, ranging from 62-fold (62×; 95% confidence interval (CI), [59.8, 63.4]) for a connectome with 50,000 fibers to 129-fold (129×; 95% CI, [128.8, 129]) for a connectome with 1.5 million fibers (Fig. 1c). We evaluated these speedups on two other, independently acquired datasets: a dMRI dataset acquired in-house (dataset I; N_{v} = 116,468, N_{θ} = 64; Methods) and a dataset used in the original LiFE study^{3} (dataset S; N_{v} = 247,969, N_{θ} = 96; Methods). Again, we observed maximum speedups of 124× (95% CI, [123.6, 124.4]; dataset I) and 155× (95% CI, [155.1, 155.2]; dataset S) for connectomes with 1.5 million fibers (Fig. 1c).
We also tested how speedups scaled with the number of voxels (N_{v}) and number of diffusion directions (N_{θ}). Overall, speedups scaled fastest with the number of diffusion directions, followed by the number of voxels (Supplementary Fig. 1). We also compared convergence times for ReAlLiFE with two other pruning algorithms, SIFT^{7} and COMMIT2^{8}. ReAlLiFE performed comparably to SIFT and COMMIT2, with a slight advantage for our approach, especially at larger connectome sizes (Supplementary Fig. 1 and Supplementary Information).
Incorporating a sparsity-inducing prior (L1-norm of the fiber weights; Methods) enabled ReAlLiFE to generate sparser and more accurate connectomes. Yet, previous studies have indicated that such a sparsity-inducing prior may increase the chances of false negatives (missed fibers), particularly when duplicate fibers occur in the connectome^{11}. We tested this possibility directly by constructing an artificial connectome composed entirely of near-identical, duplicate streamlines, and show that pruning with ReAlLiFE largely ameliorated this challenge (Supplementary Fig. 2). In addition, we show that ReAlLiFE reduced overfitting and produced more consistent connectomes, as compared to LiFE (Supplementary Figs. 3 and 4).
We next quantified the reliability of the ReAlLiFE algorithm, comparing it with that of LiFE, SIFT2^{6} and COMMIT2^{8}. We performed a test–retest reliability analysis^{12} using a sample of n = 5 participants drawn from the HCP database (Methods). For each participant, pairwise intra-hemispheric connectivity was computed, and its total variability was partitioned into between-participant (V_{b}) and within-participant (V_{w}) components (Methods).
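For intuition, one common form of this decomposition can be sketched as follows for a single connection (a minimal, hedged illustration only: the exact estimators used here are specified in the Methods, and all array sizes and values below are arbitrary stand-ins):

```python
import numpy as np

# Toy data: one connection's weight, measured for 5 participants in
# 2 sessions (test and retest). Values are random stand-ins.
rng = np.random.default_rng(4)
conn = rng.random((5, 2))              # (participants, sessions)

subj_mean = conn.mean(axis=1)
V_b = subj_mean.var(ddof=1)            # between-participant variability
V_w = conn.var(axis=1, ddof=1).mean()  # within-participant variability

# Reliable connections are those with V_b > V_w
# (the 'high-reliability' quadrant in Fig. 1e).
```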
Following pruning with LiFE, within- and between-participant variabilities were comparable (V_{b}, CI [0.411, 0.435]; V_{w}, CI [0.407, 0.432]; P = 0.119; effect size = 0.025; Fig. 1d), but, following ReAlLiFE pruning, within-participant variability was significantly lower than between-participant variability (V_{b}, CI [0.374, 0.401]; V_{w}, CI [0.337, 0.365]; P < 0.001, effect size = 0.217; Fig. 1d). Thus, ReAlLiFE pruning increased the test–retest reliability of the estimated connection weights. Moreover, between-participant variability following ReAlLiFE pruning was significantly lower than that following LiFE pruning (P < 0.001).
We further quantified the test–retest reliability with a reliability index (ϕ; Methods). A greater reliability index is a hallmark of efficient pruning. Compared with LiFE, pruning with ReAlLiFE yielded a significantly higher reliability index (Fig. 1g; LiFE: ϕ = 0.507, CI [0.503, 0.512], ReAlLiFE: ϕ = 0.533, CI [0.528, 0.538]; P < 0.001 across all connections). Pruning with SIFT2 yielded reliability indices comparable with those of ReAlLiFE (Fig. 1g; SIFT2, ϕ = 0.535, CI [0.530, 0.540]; P = 0.812), whereas pruning with COMMIT2 yielded a marginally lower reliability index than ReAlLiFE (Fig. 1g; ϕ = 0.526, CI [0.521, 0.531]; P = 0.004).
Next we quantified the improvement in reliability based on the proportion of connections in each quadrant of the (V_{b}, V_{w}) plot: a greater proportion in quadrant II (V_{b} > V_{w}; ‘high reliability’) than in quadrant IV (V_{w} > V_{b}; ‘low reliability’) indicates more efficient pruning. Compared to the unpruned connectome, streamline pruning with LiFE yielded fewer connections in the high-reliability (6.8%) than in the low-reliability quadrant (11.1%), a proportion not different from chance (P = 0.994, binomial test; Fig. 1e). On the other hand, pruning with ReAlLiFE yielded significantly more connections in the high-reliability (13.2%) than in the low-reliability quadrant (8.5%; P = 0.011; Fig. 1e). Pruning with SIFT2 and COMMIT2 yielded comparable proportions of connections in the high- and low-reliability quadrants (SIFT2, P = 0.399; COMMIT2, P = 0.390; Fig. 1e).
Finally, we identified connections exhibiting extreme values of test–retest reliability following ReAlLiFE pruning (Fig. 1f). The highest test–retest reliability was observed for long-range connections between the frontal and parietal lobes (Supplementary Table 3), which strongly overlap with established white-matter tracts, including the superior and inferior longitudinal fasciculi (SLF/ILF), the arcuate fasciculus (AF) and the inferior fronto-occipital fasciculus (IFOF) (Fig. 1h). Conversely, test–retest reliability was lowest for short-range connections, especially those connecting the middle temporal gyrus with adjacent temporo-occipital regions (Fig. 1i).
As a real-world application of ReAlLiFE, we asked whether streamline pruning with ReAlLiFE would enable the identification of structural connectivity correlates of behavior. For this, we predicted 60 behavioral test scores^{13} spanning three categories—cognition, emotion and personality—for 200 participants (HCP database^{9}; Supplementary Data File 1). Behavioral score prediction was performed with a support vector regression (SVR) model using recursive feature elimination (RFE)^{14} (Methods and Fig. 2a). Specifically, we compared predictions made with ReAlLiFE connection weights as features against those based on the number of fibers in the unpruned connectome.
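A pipeline of this general shape can be sketched with scikit-learn, which provides both SVR and RFE. This is a self-contained toy example on synthetic data: the feature counts, sample sizes, hyperparameters and cross-validation scheme below are illustrative assumptions, not the settings used in the Methods.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

# Synthetic stand-in: 200 'participants', 50 connection-weight features,
# a behavioral score driven by 5 of them (all names/sizes illustrative).
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 50))
score = X[:, :5] @ rng.random(5) + 0.5 * rng.standard_normal(200)

# Recursive feature elimination wrapped around a linear SVR, with
# cross-validated prediction of held-out scores.
model = RFE(SVR(kernel="linear"), n_features_to_select=10, step=0.2)
pred = cross_val_predict(model, X, score, cv=5)
r, p = pearsonr(score, pred)  # prediction accuracy, as in Fig. 2b
```

The observed-versus-predicted correlation `r` plays the role of the prediction accuracies reported in Fig. 2.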
Across a range of significance thresholds (α = 0.00001 to 0.05, uncorrected), the number of behavioral scores predicted significantly by connectome features based on the number of fibers (Fig. 2b, top, red filled circles) and by ReAlLiFE weights (Fig. 2b, top, purple filled circles) did not differ (P = 0.906; effect size = 0.011). However, predictions based on ReAlLiFE weights were more accurate than those based on the number of fibers, as evidenced by higher average correlations (Fig. 2b, bottom; effect size = 0.660). Similar trends were observed when each category of scores—cognition, emotion and personality—was predicted separately (Fig. 2c–f and Supplementary Fig. 5). Specifically, for cognition and personality scores, ReAlLiFE weights yielded consistently higher prediction accuracies (average r values) than the number of fibers, across the entire range of significance thresholds (Fig. 2c,e, lower panels).
Next we asked which set of connectome features—ReAlLiFE weights or unpruned connectome fibers—more robustly predicted behavioral scores. For this, we combined both sets of features and quantified the proportion of ReAlLiFE features selected by the RFE algorithm for the best prediction of each behavioral score. For over 95% (58/60) of score predictions, RFE favored a higher proportion of ReAlLiFE features, as compared to features from the unpruned connectome (P < 0.001, Wilcoxon signedrank test; Fig. 2g). Additional control analyses for these behavior predictions are reported in the Supplementary Information (Supplementary Fig. 5).
We also analyzed the anatomical features underlying these predictions, focusing on the ‘cognition’ scores. Briefly, reading ability correlated significantly with connectivity in the left frontal cortex, whereas picture vocabulary scores correlated significantly with connectivity within the right parietal cortex (Supplementary Fig. 5 and Supplementary Data File 2).
Our findings indicate that, in addition to yielding more accurate connectomes, ReAlLiFE connection weights enabled more accurate predictions across a range of behavioral scores, all acquired outside the scanner environment. dMRI-based structural connectivity, quantified following connectome evaluation, may provide a reliable neuroimaging-based biomarker for key cognitive traits.
The ReAlLiFE algorithm may be developed and improved on several key fronts to overcome its current limitations. First, the current version of the ReAlLiFE algorithm does not take advantage of parallel computation across multiple GPUs. Moreover, ReAlLiFE is presently not integrated with multi-CPU acceleration schemes^{10}, although our speedups exceed, by ~8.7×, the state-of-the-art numbers reported for these approaches. Combining these CPU-based schemes with our GPU implementation, or implementing parallel computations across multiple GPUs, may yield further speedups of the algorithm.
Second, ReAlLiFE’s optimization objective, including the sparsity-inducing prior, may be further improved. A key feature of regularized pruning with ReAlLiFE is the ability to generate connectomes at various, desired levels of sparsity using L1-norm-based regularization, a feature unavailable in the original LiFE algorithm. Yet, such stringent regularization increases the chances of false negatives (missed fibers)^{8,11}. Although we tested for this possibility (Supplementary Fig. 2), other kinds of regularization (for example, L2-norm-based) will need to be systematically evaluated to identify those that minimize false negatives in the connectome. Nonetheless, incorporating this penalty resulted in ReAlLiFE outperforming LiFE in terms of reducing overfitting, producing more consistent connectomes and increasing test–retest reliability. Although the test–retest reliability analyses were performed with only a limited number of participants (n = 5), ReAlLiFE connections with the highest test–retest reliability mapped onto established white-matter tracts^{15}. Incorporating additional features and constraints into the objective function (for example, based on brain anatomy, as in COMMIT2^{8}) may enable future improvements to the ReAlLiFE algorithm.
Third, a principled evaluation of the reasons underlying the differences between ReAlLiFE and other state-of-the-art algorithms (SIFT/SIFT2 and COMMIT2^{6,7,8}) remains to be carried out. ReAlLiFE pruning times were largely comparable with those of the other approaches, with a marginal advantage for larger connectome sizes. Moreover, ReAlLiFE differed from the other approaches in the proportion of high-reliability connections retained following pruning. The reasons for these differences need to be investigated carefully. Finally, ReAlLiFE will need to be compared directly against these competing approaches, in terms of their respective success rates in mapping structural connectivity to behavior. Such a principled comparison is essential for identifying robust structural connectivity bases of higher-order cognitive functions, such as attention, learning and decision-making^{15,16}.
More generally, the Subspace Barzilai–Borwein Non-Negative Least-Squares (SBB-NNLS) algorithm, at the heart of ReAlLiFE, is widely applicable to optimization problems in many real-world applications, including healthcare^{17}. Our GPU-accelerated implementation of the SBB-NNLS algorithm has the potential for wide application in diverse domains that go beyond connectome pruning.
Methods
All experiments were conducted according to protocols approved by the Institute Human Ethics Committee, Indian Institute of Science, Bangalore. Informed written consent was obtained from each participant before the study.
ReAlLiFE
Description of the LiFE algorithm
The LiFE algorithm models the diffusion signal from the reconstructed whole-brain connectome. LiFE’s optimization algorithm minimizes the error between the modeled and the measured diffusion signals, while eliminating fibers from the connectome that do not contribute to the underlying diffusion signal. The diffusion signal is typically measured along multiple, uniformly sampled gradient directions in space, N_{θ}. For each voxel and gradient direction, this signal is encoded into a vector \(\mathbf{b} \in R^{N_\theta N_{\mathrm{v}}}\), where N_{v} is the number of voxels in the data. In a given whole-brain connectome, fibers traverse multiple voxels and each voxel may contain many different fibers. The contribution of each fiber f, traversing a voxel v, measured along the gradient direction θ, is encoded into a matrix \(M \in R^{N_{\mathrm{v}} N_\theta \times N_{\mathrm{f}}}\), where N_{f} is the number of fibers in the connectome. The diffusion signal in each voxel is modeled as a weighted sum of the individual fibers traversing the voxel. This can be written as b = Mw, where \(\mathbf{w} \in R^{N_{\mathrm{f}}}\) signifies the contribution (or weight, w_{f}) of each fiber f to the diffusion signal b.
LiFE minimizes the error between the modeled and measured diffusion signals by assigning a non-negative weight to each fiber. This objective is posed as a non-negative least-squares optimization problem:

$$\min_{\mathbf{w}} \; O(\mathbf{w}) = \tfrac{1}{2}\left\| \mathbf{b} - M\mathbf{w} \right\|_2^2, \quad \mathbf{w} \ge 0 \qquad (1)$$
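For intuition, the non-negative least-squares problem in equation (1) can be solved for a toy ‘connectome’ with SciPy’s reference solver. In this hedged sketch (all matrix values are illustrative), the third candidate fiber duplicates the combined contribution of the other two, and the solver explains the signal with at most rank(M) = 2 fibers, pruning the rest:

```python
import numpy as np
from scipy.optimize import nnls

# Columns of M are candidate fibers' predicted signals; the third
# column duplicates the combined contribution of the first two.
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])   # measured signal, explained by two fibers

w, res = nnls(M, b)             # solves min ||b - Mw||_2  s.t.  w >= 0
assert res < 1e-8                        # the signal is fully explained...
assert np.count_nonzero(w > 1e-9) <= 2   # ...by at most rank(M) = 2 fibers
```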
Solving this problem imposes substantial memory demands owing to the large size of M, which is typically 2 GB for a connectome with N_{v} = 100,000, N_{f} = 1 million and N_{θ} = 64 (refs. ^{3,18}). A recent study^{18} overcame this limitation by adopting a more efficient, sparse tensorial representation of M. The demeaned diffusion signal \(M_{\mathrm{v}} \in R^{N_\theta \times N_{\mathrm{f}}}\) in each voxel v was represented, using sparse Tucker decomposition (STD), as \(M_{\mathrm{v}} = S_0(v)\, D\, \varPhi_{\mathrm{v}}\), where S_{0}(v) is the ‘baseline’ diffusion signal measured in the absence of a diffusion gradient, \(D \in R^{N_\theta \times N_{\mathrm{a}}}\) is a dictionary matrix quantifying the contribution of canonical diffusion ‘atoms’ (N_{a} atoms) toward each diffusion direction, and \(\varPhi_{\mathrm{v}} \in R^{N_{\mathrm{a}} \times N_{\mathrm{f}}}\) is a sparse, binary matrix whose columns indicate the contribution of each atom to each fiber, in that voxel. Collating Φ_{v} for all voxels v into a sparse 3D tensor Φ, the modeled (or predicted) diffusion signal may be written as

$$\hat{\mathbf{b}} = \varPhi \times_1 D \times_2 S_0 \times_3 \mathbf{w}^{\mathrm{T}} \qquad (2)$$
where w is the vector of all the individual fiber weights and ×_{k} represents a matrix product in mode k. M in equation (1) is now given by \({{M}} = {{\varPhi }} \times _{1}{{D}} \times _{2}{{S}}_{0}\).
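The decomposition can be checked numerically on a toy problem: building M voxel by voxel from S_{0}, D and Φ gives the same prediction Mw as computing the per-voxel products directly, without ever materializing M. All sizes and values below are arbitrary toy choices, and a dense Φ is used purely for readability:

```python
import numpy as np

rng = np.random.default_rng(0)
N_theta, N_a, N_v, N_f = 6, 4, 3, 5

D = rng.standard_normal((N_theta, N_a))   # dictionary of diffusion atoms
S0 = rng.uniform(0.5, 1.5, N_v)           # baseline (b = 0) signal per voxel
Phi = (rng.random((N_a, N_v, N_f)) < 0.3).astype(float)  # sparse binary tensor
w = rng.random(N_f)                       # fiber weights

# Dense M, built voxel by voxel: M_v = S0(v) * D @ Phi_v
M = np.vstack([S0[v] * D @ Phi[:, v, :] for v in range(N_v)])

# Predicted signal via the decomposition, without materializing M
b_hat = np.concatenate([S0[v] * D @ (Phi[:, v, :] @ w) for v in range(N_v)])

assert np.allclose(M @ w, b_hat)
```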
With this representation, the optimization problem is solved using an SBB-NNLS algorithm^{10}. Briefly, given w^{(0)} as an initial weight vector, the weight updates occur as follows:

$$\mathbf{w}^{(i+1)} = \max\left(\mathbf{0},\; \mathbf{w}^{(i)} - \alpha^{(i)}\, g\!\left(\mathbf{w}^{(i)}\right)\right) \qquad (3)$$
where the gradient term is given by

$$g(\mathbf{w}) = M^{\mathrm{T}}\left(M\mathbf{w} - \mathbf{b}\right) \qquad (4)$$
and α^{(i)}, the step value at each iteration, is given by

$$\alpha^{(i)} = \frac{\left\langle \tilde{g}^{(i)}, \tilde{g}^{(i)} \right\rangle}{\left\langle M\tilde{g}^{(i)}, M\tilde{g}^{(i)} \right\rangle} \qquad (5)$$

for the odd iterations and

$$\alpha^{(i)} = \frac{\left\langle M\tilde{g}^{(i)}, M\tilde{g}^{(i)} \right\rangle}{\left\langle M^{\mathrm{T}}M\tilde{g}^{(i)}, M^{\mathrm{T}}M\tilde{g}^{(i)} \right\rangle} \qquad (6)$$

for the even iterations. The tilde denotes the projection of the gradient into the positive space at each iteration and 〈a, b〉 denotes an inner product of the vectors a and b.
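A minimal CPU sketch of these updates, in NumPy, is shown below. This is a simplified, hedged illustration (a fixed iteration budget and a basic stationarity check stand in for the production stopping rules); the actual solver performs the same matrix products on the GPU.

```python
import numpy as np

def sbb_nnls(M, b, n_iter=500):
    """Minimize 0.5*||Mw - b||^2 subject to w >= 0, with alternating
    Barzilai-Borwein step sizes computed on the projected gradient."""
    w = np.zeros(M.shape[1])
    for i in range(n_iter):
        g = M.T @ (M @ w - b)                       # gradient M^T(Mw - b)
        g_t = np.where((w == 0) & (g > 0), 0.0, g)  # projected (tilde) gradient
        Mg = M @ g_t
        if not Mg.any():
            break                                   # stationary point reached
        if i % 2 == 0:                              # alternate the two BB steps
            alpha = (g_t @ g_t) / (Mg @ Mg)
        else:
            MtMg = M.T @ Mg
            alpha = (Mg @ Mg) / (MtMg @ MtMg)
        w = np.maximum(0.0, w - alpha * g)          # update and project
    return w

# Toy problem whose exact least-squares solution is non-negative
rng = np.random.default_rng(7)
M = rng.standard_normal((30, 8))
w_true = np.array([1.0, 0.5, 0.0, 2.0, 0.0, 0.3, 0.0, 1.5])
b = M @ w_true
w_hat = sbb_nnls(M, b)
```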
In its original form, the LiFE algorithm suffered from a few key drawbacks that needed to be overcome to enable connectome evaluation at scale. First, even a CPU implementation of the memory-efficient version of the LiFE algorithm suffered from computational bottlenecks, because every iteration of the algorithm requires a large number (O(N_{v}N_{θ}N_{f})) of multiplications, typically with matrices comprising 10^{13}–10^{14} elements. This requirement produced a considerable speed bottleneck when evaluating large connectomes. Second, LiFE (and related pruning algorithms) suffers from another key limitation: there is no explicit provision in LiFE, or in other popular pruning algorithms such as SIFT^{7} or SIFT2^{6}, for directly eliminating redundant fibers in the connectome.
GPU acceleration
To address the issue of speed, we sought to identify key bottlenecks in LiFE’s algorithm that could be optimized on GPUs. Briefly, the SBB-NNLS optimization algorithm requires several multiplications of the form Mx or M^{T}y, where x and y denote generic matrices used in various steps of the optimization. We sought to speed up these multiplications with efficient GPU-based computations. The detailed steps are presented as pseudocode in algorithms 1 and 2 in the Supplementary Information. Here we describe these steps briefly.
A key ingredient of our GPU acceleration approach is splitting the computation among voxels, with each CUDA block handling data associated with one voxel. For storage efficiency, the matrix M was stored in a sparse tensor (Coordinate list, COO format) with indices into the dictionary matrix D, as it was not feasible to use standard sparse matrix multiplication packages. Following STD of M (equation (2)), computing Mx requires computing linear combinations of columns from D while M^{T}y computation requires computation of inner products with columns of D (ref. ^{18}). The former has a high memory write bandwidth requirement, whereas the latter has a high memory read bandwidth requirement as well as a reduction operation. To address this issue we sorted the Φ tensor, stored in the COO format, along the voxel dimension, enabling faster pervoxel execution of both Mx and M^{T}y by reducing memory write and read requests, respectively.
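The effect of sorting the COO entries along the voxel dimension can be illustrated in NumPy. This is a toy sketch only: the real kernels distribute the per-voxel blocks across GPU threads, and the S_{0} scaling is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
N_a, N_v, N_f, N_theta, nnz = 5, 4, 6, 8, 30
atom = rng.integers(0, N_a, nnz)    # COO triples (atom, voxel, fiber)
voxel = rng.integers(0, N_v, nnz)
fiber = rng.integers(0, N_f, nnz)
D = rng.standard_normal((N_theta, N_a))
w = rng.random(N_f)

# Sort the COO entries along the voxel dimension, so that each block
# (one voxel) reads a contiguous slice of the tensor.
order = np.argsort(voxel, kind="stable")
atom, voxel, fiber = atom[order], voxel[order], fiber[order]

# y = Mw computed per voxel: accumulate fiber weights into atom
# coefficients, then a single multiply with the dictionary D.
y = np.zeros((N_v, N_theta))
bounds = np.searchsorted(voxel, np.arange(N_v + 1))
for v in range(N_v):
    s, e = bounds[v], bounds[v + 1]
    coef = np.zeros(N_a)
    np.add.at(coef, atom[s:e], w[fiber[s:e]])
    y[v] = D @ coef

# Reference: accumulate entry by entry, without exploiting the sort
y_ref = np.zeros((N_v, N_theta))
for a, v, f in zip(atom, voxel, fiber):
    y_ref[v] += w[f] * D[:, a]
assert np.allclose(y, y_ref)
```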
We fixed the block size to the warp size of the GPU, which corresponds to the number of threads processing each voxel. Each thread handled one or more diffusion directions, depending on the total number of diffusion directions. Data along the diffusion direction dimension were padded such that their size was a multiple of the warp size, to avoid branching in the kernel code run on the GPUs. This also permitted maximizing the usage of warp shuffle instructions and reducing shared memory usage. We used shared memory only for storing the final results. In each block, we read up to warp-size entries from the sparse tensor in parallel, to leverage memory coalescing advantages, and stored them in thread-local memory. The threads in a block then computed on different diffusion directions for the read entries sequentially. We used warp broadcast instructions to share data from thread-local memories to all threads in a block. In the case of the M^{T}y computation, we used warp shuffle instructions for computing inner products. This freed up resources, potentially allowing more blocks to be scheduled at any given time.
We implemented these algorithms (algorithms 1 and 2 in the Supplementary Information) in the CUDA language for use with NVIDIA GPUs. The results reported in the main text reflect speedups with the CUDA implementation. In addition, we implemented these same algorithms on AMD GPUs with the HIP (Heterogeneous-compute Interface for Portability) language, using the HIPIFY package (https://github.com/ROCmDeveloperTools/HIPIFY). We found that execution times for algorithms 1 and 2 on the NVIDIA GeForce GTX 1080 Ti GPU and the AMD Radeon RX 580 GPU were comparable, and several orders of magnitude faster than the corresponding CPU implementations in Matlab (Supplementary Fig. 1).
Regularized evaluation
To address the second issue of redundant fibers, we developed a regularized pruning algorithm, extending that of LiFE^{10}. We modified LiFE’s least-squares error-minimization objective function to incorporate a regularization term for the weights, such that an L1-norm penalty was added to the objective: \(O(\mathbf{w}) + \lambda \left\| \mathbf{w} \right\|_1, \; \mathbf{w} \ge 0\), where λ is the regularization constant. The gradient calculation in equation (3) now changes to \(g(\mathbf{w}) = M^{\mathrm{T}}(M\mathbf{w} - \mathbf{b}) + \lambda \mathbf{1}\), where 1 is a vector of all 1s. Similarly, we also implemented L2 regularization of the weights, by adding an L2-norm penalty to the objective function: \(O(\mathbf{w}) + \tfrac{\lambda}{2} \left\| \mathbf{w} \right\|_2^2, \; \mathbf{w} \ge 0\). The gradient calculation in equation (3) then changes to \(g(\mathbf{w}) = M^{\mathrm{T}}(M\mathbf{w} - \mathbf{b}) + \lambda \mathbf{w}\). We tested several values of the penalty λ, for both the L1 and L2 regularization (Supplementary Fig. 3).
The estimated fiber-weight vector w is already encouraged to be sparse in LiFE through the non-negativity constraint. In ReAlLiFE, additional sparsity is induced in the weight vector through explicit regularization.
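The sparsifying effect of the L1 penalty can be demonstrated on a toy problem. This hedged sketch uses plain projected gradient descent rather than SBB-NNLS, and the matrix sizes, true weights and λ values are illustrative assumptions only:

```python
import numpy as np

def l1_nnls(M, b, lam, n_iter=500):
    """Projected gradient descent for 0.5*||Mw - b||^2 + lam*||w||_1, w >= 0."""
    lr = 1.0 / np.linalg.norm(M, 2) ** 2   # step size = 1/L for the LS term
    w = np.zeros(M.shape[1])
    for _ in range(n_iter):
        g = M.T @ (M @ w - b) + lam        # '+ lam * 1' from the L1 term
        w = np.maximum(0.0, w - lr * g)    # project onto w >= 0
    return w

rng = np.random.default_rng(1)
M = rng.standard_normal((40, 10))
b = M @ np.array([2.0, 1.5, 0, 0, 0, 0, 0, 0, 0, 0])  # two true fibers

w_unreg = l1_nnls(M, b, lam=0.0)
w_reg = l1_nnls(M, b, lam=5.0)
# A stronger penalty yields a sparser, shrunken weight vector.
```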
Although a preliminary version of the method has been published previously^{10}, in the present study we advance the ReAlLiFE algorithm by leveraging the sparsity of w to further speed up the GPU implementation of Mx. Specifically, whenever an element of the weight vector is zero, the corresponding multiplication can be skipped entirely in each thread (step 5 in Supplementary algorithm 1). This yielded a substantial improvement in speedups for the present ReAlLiFE algorithm (>100–150× for the largest connectome sizes tested) over the previously published version of the algorithm^{10} (~50–100×).
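The zero-weight shortcut amounts to the following (NumPy sketch with illustrative sizes; in the GPU kernel, the test is applied per thread-local entry rather than per column):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((100, 50))
x = rng.random(50)
x[rng.random(50) < 0.8] = 0.0   # sparse weights after regularized pruning

# Skip columns whose weight is zero -- the same shortcut each GPU thread
# takes when its weight-vector element is zero.
nz = np.flatnonzero(x)
y_sparse = M[:, nz] @ x[nz]
assert np.allclose(y_sparse, M @ x)
```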
Diffusion MRI acquisition and preprocessing
Dataset I
Structural and dMRI scans were acquired on a Siemens Skyra 3T scanner with a 32-channel head coil, at the HealthCare Global Hospital, Bangalore. A T1-weighted MPRAGE structural scan was acquired before the diffusion scan (1-mm spatial resolution; echo time (TE) = 2.32 ms, repetition time (TR) = 2,300 ms, field of view (FoV) = 240 mm, flip angle = 8°, 256-voxel matrix size, parallel acquisition technique (PAT) with in-plane acceleration factor 2 (GRAPPA)). Diffusion scans were acquired along 64 non-collinear directions with a b value of 1,000 s mm^{−2} (2-mm isotropic voxels, TE = 90 ms, TR = 8,900 ms, FoV = 256 mm, 128-voxel matrix size, 68 transversal slices with interleaved slice acquisition; PAT with in-plane acceleration factor 2 (GRAPPA), phase encoding direction: A>>P). Two non-diffusion-weighted images (b = 0 s mm^{−2}) were acquired, one at the beginning and one at the end of each scan. Preprocessing of the dMRI images followed previously published protocols^{3}. Briefly, the T1 image was first manually aligned to the participant’s anterior commissure–posterior commissure (AC–PC) axis coordinates. Following this, scans were preprocessed to correct for head motion, and eddy-current-related distortions were corrected using a rigid-body alignment algorithm^{3}, followed by alignment to the AC–PC-aligned T1 image using the VISTA LAB (Stanford Vision and Imaging Science and Technology) diffusion MRI software package, as part of the Vistasoft suite 2017 (https://github.com/vistalab/vistasoft/). This dataset is available in a Figshare repository^{19}.
Dataset S
We acquired a publicly available, preprocessed dataset used in the evaluation of the LiFE algorithm^{3}. Data were acquired on a General Electric Discovery 750 (GE Healthcare) 3T scanner with a 32-channel head coil. Diffusion scans were acquired along 96 non-collinear directions with a b value of 2,000 s mm^{−2} (1.5-mm isotropic voxels, TE = 96.8 ms). Ten non-diffusion-weighted images (b = 0 s mm^{−2}) were acquired at the beginning of the scan. Preprocessing steps included correcting distortions arising from B0 field inhomogeneities, as well as participant head motion correction using a rigid-body alignment method^{3}. Further details regarding the acquisition and preprocessing steps are available in refs. ^{3} and ^{18}.
Dataset H
We acquired a publicly available dataset from the HCP database^{9}. Structural and dMRI scans were acquired on a customized Siemens Connectome Skyra 3T scanner with a 32-channel head coil. The T1-weighted MPRAGE structural scan was acquired at 0.7-mm spatial resolution, TE = 2.14 ms, TR = 2,400 ms, FoV = 224 × 224 mm^{2}, flip angle = 8°, PAT with in-plane acceleration factor 2. Diffusion scans were acquired along 270 non-collinear directions using multi-shell imaging with b values of 1,000, 2,000 and 3,000 s mm^{−2} (1.25-mm isotropic voxels, TE = 89.5 ms, TR = 5,520 ms, FoV = 210 × 180 mm^{2}, 164 × 144 matrix size, multiband factor = 3, phase encoding directions: L>>R and R>>L). A total of 18 non-diffusion-weighted images (b = 0 s mm^{−2}) were acquired, interspersed throughout the scan. Preprocessing steps included B0 intensity normalization, susceptibility-induced distortion correction, eddy current and participant head motion correction, and gradient nonlinearity correction. Finally, the diffusion images were registered to the structural (T1-weighted) image. For all our analyses, we used HCP’s minimally preprocessed data^{9}.
Dataset M
This dataset was provided as part of an international Tractography Challenge^{5}. Briefly, the authors used one dataset from the HCP database^{9} to manually delineate 25 known fiber bundles and their corresponding termini regions (regions of interest, or ROIs) in the human brain^{5}; these were termed ‘ground-truth’ bundles. Next, using these ground-truth bundles, diffusion MRI data were simulated corresponding to a b value of 1,000 s mm^{−2} and 32 gradient directions using the Fiberfox software. A T1-weighted image was also simulated. The authors provide two datasets: one artifact-free dataset with no noise, and one dataset with added artifacts such as head motion, susceptibility-induced distortion, eddy currents, spiking noise, ghosting and ringing artifacts, as well as Gaussian noise.
In addition to the dataset already provided by the authors, we simulated a second, independent dataset with the same parameters, using Fiberfox^{5} version 2018.09.99. Briefly, we generated one diffusion MRI dataset with b = 1,000 s mm^{−2} and 32 gradient directions (2-mm isotropic voxels) in the A>>P (anterior to posterior) phase encoding direction. For each dataset, we used a four-compartment model to simulate the (1) inter-axonal, (2) intra-axonal, (3) gray matter (GM) and (4) cerebrospinal fluid (CSF) tissue response profiles. Parameter values corresponding to each compartment model are listed in Supplementary Table 1. All other parameters were set to their respective default values. We also simulated an additional non-diffusion-weighted image (b = 0 s mm^{−2}) with the phase encoding direction reversed (P>>A). Individual masks used for each compartment, as well as the anatomical T1 image, were the same as those provided with the original dataset. Preprocessing of the simulated data involved denoising with MRtrix3^{20}, followed by motion correction and susceptibility-induced distortion correction^{20}. Finally, the diffusion MRI scan was aligned to the T1 image.
HCP datasets for studying structure–behavior relationships
For predicting behavioral scores using structural connectivity, we used minimally preprocessed diffusion MRI datasets from the HCP database^{9}. We utilized data from n = 200 participants, with an equal number of males and females (n = 100 each), for whom both dMRI and behavioral data were available (Supplementary Table 2; participant IDs). These participants included 60 who were matched for age, gender and handedness from a previous study^{21}. The remaining 140 participants were drawn in chronological order from the HCP database to ensure gender and age parity (n = 100 females: mean = 29.2 years, s.d. = 3.7 years; n = 100 males: mean = 28.5 years, s.d. = 3.9 years). We confirmed that there was no significant difference between the average ages of male and female participants (P = 0.223, two-sample t-test).
Tractography and generating whole-brain connectomes from dMRI data
Datasets I, S, H and M
For each of these datasets, we used a standard tractography pipeline available with MRtrix3^{20}, comprising the following steps. We first performed a five-tissue-type segmentation on the T1 image to separate the (1) cortical GM, (2) subcortical GM, (3) white matter, (4) CSF and (5) other pathological tissue. Next, a constrained spherical deconvolution (CSD) algorithm was employed to estimate the fiber orientation distribution (FOD) in each voxel^{20}, with the maximum harmonic order set to 8 (the default value). Finally, anatomically constrained probabilistic tractography was performed using dynamic seeding^{7}. The maximum fiber length cutoff and FOD amplitude threshold for datasets I, S and M were set to their default values (200 mm and 0.1, respectively). For dataset H, the maximum length cutoff and FOD amplitude threshold were set to 250 mm and 0.06, respectively. Following this, we constructed whole-brain connectomes with specific fiber counts, as indicated in the respective sections of the main text. For the behavioral score predictions with 200 participants' dMRI data from the HCP database, and for the test–retest reliability analyses with five participants' data from the HCP Retest database, dMRI preprocessing and connectome generation followed the same protocol as for dataset H.
Ensemble tractography (dataset S(ET))
Tractography algorithms include multiple parameter settings with many degrees of freedom. Ensemble tractography seeks to overcome biases associated with specific parameter choices by estimating several connectomes, one for each choice of parameter value, and subsequently combining them into an 'ensemble' connectome^{10}. For ensemble tractography, we used dataset S to generate five whole-brain connectomes, with the maximum radius of curvature of fibers set to one of five values (0.25 mm, 0.5 mm, 1 mm, 2 mm and 4 mm). Each connectome was generated by seeding fibers at the GM–white matter interface. Next, we combined these individual connectomes to form the ensemble connectome. We generated two such ensemble connectomes: a smaller, 0.8-million-fiber connectome for streamline pruning with LiFE and a larger, 1.6-million-fiber connectome for pruning with ReAlLiFE (for details, see the section Estimating the regularization parameter λ).
Quantifying ReAlLiFE performance as speedups and fits to data
Quantifying speedups
For each of the datasets I, S and H, and for every connectome size (Fig. 1c), we first pruned streamlines with the original LiFE optimization algorithm^{3} for 500 iterations. Next, we pruned streamlines from the same connectome with the GPU-accelerated version of LiFE (with no regularization), again for 500 iterations. The speedup of GPU-accelerated pruning over CPU pruning was computed as

\(\mathrm{Speedup} = \frac{t(\mathrm{LiFE}_{\mathrm{CPU}})}{t(\mathrm{LiFE}_{\mathrm{GPU}}) + t(\mathrm{Overhead}_{\mathrm{GPU}})}\)

where t(LiFE_{CPU}) is the time taken for 500 iterations of the LiFE algorithm on the CPU, t(LiFE_{GPU}) is the time taken for 500 iterations of the GPU-accelerated LiFE and t(Overhead_{GPU}) is the GPU overhead time, corresponding to the time taken for data transfer between CPU and GPU memory and for other preprocessing steps, such as sorting along the voxels dimension.
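For concreteness, the speedup computation reduces to a one-line function. This is an illustrative Python sketch with made-up timings; `gpu_speedup` is not a function from the ReAlLiFE code.

```python
def gpu_speedup(t_cpu, t_gpu, t_overhead):
    """Speedup = t(LiFE_CPU) / (t(LiFE_GPU) + t(Overhead_GPU)),
    where the overhead covers CPU-GPU transfers and preprocessing."""
    return t_cpu / (t_gpu + t_overhead)
```

For example, 1,000 s of CPU pruning against 8 s of GPU pruning plus 2 s of overhead corresponds to a 100× speedup.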
To calculate the speedup scaling factor, we fit a function of the form \(y = a + b\,\log(x)\) to the speedups, where y is the speedup, x is the parameter (N_{v}, N_{θ} or N_{f}), a is the intercept and b is the slope. We quantified the scaling factor (q) as the change in the fitted speedup from the lowest to the highest value of each of the parameters N_{v}, N_{θ} and N_{f}.
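The logarithmic fit and scaling factor can be sketched with an ordinary least-squares fit in log(x). The function names below are our own, and this is illustrative rather than the authors' code.

```python
import numpy as np

def fit_log_scaling(x, y):
    # least-squares fit of y = a + b*log(x); returns (intercept a, slope b)
    b, a = np.polyfit(np.log(np.asarray(x, float)), np.asarray(y, float), 1)
    return a, b

def scaling_factor(x, y):
    # change in fitted speedup from the lowest to the highest parameter value
    _, b = fit_log_scaling(x, y)
    return b * (np.log(max(x)) - np.log(min(x)))
```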
We compared the speedups of GPU-accelerated LiFE over a second state-of-the-art algorithm, SIFT^{7}. We tested for speedups for each of the three datasets (I, H and S) on a connectome of one million fibers (N_{f} = 10^{6}), using the respective default values for parameters N_{v} and N_{θ}. To ensure a fair comparison, we first pruned streamlines from each unpruned connectome with GPU-accelerated LiFE, systematically running the optimization algorithm from 50 to 500 iterations (in steps of 50 iterations). Next, we pruned the same (respective) unpruned connectome with SIFT, albeit to the exact same size as the corresponding LiFE-pruned connectome. In other words, the termination criterion for SIFT was specified to match the number of fibers in the LiFE-pruned connectome. We then defined the convergence time t_{c} as the iteration at which the change in the objective function O(t) over the last ten iterations was less than 0.1% of its initial value, that is, \(\left| O(t_{\mathrm{c}} + 10) - O(t_{\mathrm{c}}) \right| < \Delta \cdot O(t_0)\), where O(t_{0}) is the initial value of the objective function and Δ = 0.001. For each dataset, we then computed the speedup of GPU-accelerated LiFE over SIFT at convergence as the average of the speedups across the nearest two multiples of 50 iterations. Speedups were compared based on the total execution times of each algorithm from start to finish, including all overheads associated with loading data into memory.
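The convergence criterion can be applied to a recorded trace of the objective. This sketch assumes one objective value per iteration and uses the absolute change, with the 0.1% threshold from the text.

```python
def convergence_iteration(objective, delta=0.001):
    """Return the first iteration t_c at which the objective changes by less
    than delta * O(t0) over the following ten iterations (None if never)."""
    o0 = objective[0]
    for t in range(len(objective) - 10):
        if abs(objective[t + 10] - objective[t]) < delta * o0:
            return t
    return None
```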
We also compared the speedups of GPU-accelerated LiFE over yet another state-of-the-art algorithm, COMMIT2, which incorporates anatomically informed priors into its objective^{8}. As before, for each of the three datasets (I, H and S), we computed speedups for a connectome of one million fibers (N_{f} = 10^{6}), with default parameter values for N_{v} and N_{θ}. To enable a fair comparison, in this case we first pruned the connectome with COMMIT2 until convergence, with all default parameters. Next, we pruned the same (respective) unpruned connectome with GPU-accelerated LiFE to within 1% of the initial size of the corresponding COMMIT2-pruned connectome. We pruned the ReAlLiFE streamlines to the size of the COMMIT2-pruned connectome, rather than the converse (pruning the COMMIT2 streamlines to match the ReAlLiFE-pruned connectome at convergence), because the latter required over 500 iterations of the COMMIT2 algorithm. As with SIFT, speedups were compared based on the total execution times of each algorithm from start to finish, including all overheads associated with loading data into memory.
Finally, we tested a dockerized version of the LiFE algorithm (https://doi.org/10.25663/bl.app.104) to optimize a one-million-fiber connectome for each of the three datasets (500 iterations). We found that the run times were comparable to those of CPU-LiFE, as reported in Supplementary Fig. 1 (dataset H: CPU-LiFE 17.9 h, dockerized LiFE 23.1 h; dataset I: CPU-LiFE 3.0 h, dockerized LiFE 3.3 h; dataset S: CPU-LiFE 8.2 h, dockerized LiFE 8.4 h).
Quantifying duplication of fiber weights
We tested ReAlLiFE's ability to prune away duplicate (redundant) fibers and compared its performance with that of the LiFE and SIFT2 algorithms^{6}. For this, we used two approaches. In the first approach, we used dataset M to generate a whole-brain connectome comprising 0.5 million fibers. Next, we used this connectome to simulate a noise-free diffusion signal^{5}. We then created two randomly 'jittered', near-identical versions of this connectome (C1 and C2) by perturbing the spatial coordinates of 10% of the nodes in each fiber by a random amount of up to ±0.01%. We combined these perturbed connectomes to create a single connectome comprising one million fibers. Finally, we pruned streamlines in the combined connectome with LiFE and ReAlLiFE, using the noise-free diffusion dataset simulated in the previous step. For pruning with SIFT2, we followed a slightly different approach. SIFT2, unlike LiFE or ReAlLiFE, does not prune out (eliminate) any fiber. To facilitate a fair comparison with LiFE and ReAlLiFE, we first pruned the combined connectome with SIFT^{7}, followed by pruning with SIFT2 to obtain the weights of the fibers retained by SIFT. We perturbed the fibers across the two connectomes before pooling them, rather than combining two connectomes with identical, duplicated fibers, because in the latter case each algorithm yielded identical weights across each pair of duplicate fibers: none of the algorithms could break ties across identical fibers.
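The 'jittering' step can be sketched as below. The uniform perturbation model and the function name are our assumptions, while the 10% node fraction and ±0.01% magnitude follow the text.

```python
import numpy as np

def jitter_fiber(coords, frac=0.10, scale=1e-4, rng=None):
    """Perturb a random 10% of a fiber's node coordinates by up to
    +/-0.01% (scale = 1e-4) of their values."""
    rng = np.random.default_rng(0) if rng is None else rng
    coords = np.asarray(coords, float).copy()
    n_perturb = max(1, int(frac * len(coords)))
    idx = rng.choice(len(coords), size=n_perturb, replace=False)
    coords[idx] *= 1.0 + rng.uniform(-scale, scale,
                                     size=(n_perturb, coords.shape[1]))
    return coords
```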
Following pruning, we computed a 'uniqueness' index quantifying the normalized difference in weights between copies of corresponding fibers as

\(\zeta_{\mathrm{uniq}} = \left\langle \frac{\left| w_i^{C1} - w_i^{C2} \right|}{w_i^{C1} + w_i^{C2}} \right\rangle_i\)

where \(w_i^{C1}\) and \(w_i^{C2}\) denote the weights assigned to the ith pair of corresponding fibers in connectomes C1 and C2, respectively. A higher value of ζ_{uniq} indicates a higher tendency to prune out redundant fibers and to retain only one copy of the two near-identical fibers.
In the second approach, we created a 'trimmed' version of the whole-brain connectome from dataset M (the same as estimated in the previous approach) by trimming out 5% of the nodes from each terminus of each fiber. We then combined the original whole-brain connectome with this trimmed copy, followed by pruning with LiFE, ReAlLiFE and SIFT2, as before.
Estimating the regularization parameter λ
We sought to compare the performance of LiFE with that of ReAlLiFE in terms of model fit. Because the L1-regularization penalty in ReAlLiFE yields systematically sparser connectomes than LiFE, to enable fair comparisons of the model fit we matched the summed weights (L1-norm) of the fibers following pruning with each approach (Supplementary Fig. 3). To permit this, we generated a larger initial connectome, with 2× the number of fibers, for pruning with ReAlLiFE as compared to LiFE, and sampled λ in the range [10^{−8}, 1] in logarithmically spaced steps (Supplementary Fig. 3). For datasets I and M, we compared the cross-validated root-mean-square error (r.m.s.e.) of a connectome generated with one million fibers and pruned with LiFE (unregularized) against that of a connectome generated with two million fibers and pruned with ReAlLiFE. For dataset S(ET), we generated two ensemble connectomes: one with 0.8 million fibers (for pruning with LiFE) and another with 1.6 million fibers (for pruning with ReAlLiFE). These ensemble connectomes were assembled from five smaller whole-brain connectomes generated with 160,000 and 320,000 fibers, respectively. We then chose the λ that matched the L1-norm of weights across both pruning approaches. For datasets I, M and S(ET), λ values of 0.006, 0.01 and 0.01, respectively, provided this match (Supplementary Fig. 3); unless otherwise specified, the same configurations and regularization parameter values were used for all subsequent analyses (for example, Fig. 2 and Supplementary Figs. 2–4).
Quantifying the model fit
For each dataset (datasets I, S(ET) and M), we evaluated the performance of the pruning algorithm in two ways: by testing for overfitting and for consistency. To test for overfitting, we first generated a whole-brain connectome (C1) with one diffusion dataset (D1; Supplementary Fig. 3). Next, we pruned C1 with LiFE and ReAlLiFE using dataset D1 as ground truth to obtain a predicted diffusion signal (P1; Supplementary Fig. 3). Finally, we computed the voxelwise, cross-validated r.m.s.e. between the predicted diffusion signal P1 and a second, independently acquired diffusion dataset (D2; Supplementary Fig. 3) from the same participant. For dataset M, D2 was independently simulated from the same underlying ground-truth connectome (see section Diffusion MRI acquisition and preprocessing). We then computed the distribution of voxelwise r.m.s.e.s following pruning with LiFE and ReAlLiFE, as well as their pairwise differences (ReAlLiFE − LiFE; Supplementary Fig. 3).
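The cross-validated error in this test is simply the per-voxel r.m.s.e. between the predicted signal P1 and the held-out dataset D2. A minimal numpy sketch, assuming signals are stored as voxels × gradient-direction arrays (not the LiFE code itself):

```python
import numpy as np

def voxelwise_rmse(predicted, held_out):
    # r.m.s.e. per voxel, averaged across gradient directions
    return np.sqrt(np.mean((np.asarray(predicted) - np.asarray(held_out)) ** 2,
                           axis=1))
```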
To test for consistency (Supplementary Fig. 4), we generated two whole-brain connectomes (C1 and C2) from two independently acquired diffusion datasets (D1 and D2, respectively), both from the same participant (Supplementary Fig. 4). Next, we pruned C1 (with LiFE or ReAlLiFE) using dataset D2 as ground truth and, similarly, pruned C2 using dataset D1 as ground truth (Supplementary Fig. 4). We then chose the fibers assigned the top 50th percentile of weights to predict the diffusion signals P1 and P2 (Supplementary Fig. 4). Finally, we computed the voxelwise r.m.s.e. between the predicted diffusion signals P1 and P2 (Supplementary Fig. 4). As before, we computed the distribution of voxelwise r.m.s.e.s following pruning with LiFE and ReAlLiFE, as well as their pairwise differences (ReAlLiFE − LiFE; Supplementary Fig. 4).
Test–retest reliability analysis, HCP retest data
Estimating structural connectivity features
For each participant in the HCP retest dataset (n = 5; blue indices in Supplementary Table 2), we estimated two whole-brain connectomes, one comprising one million fibers and a second comprising two million fibers (see 'Tractography and generating whole-brain connectomes from dMRI data'). This was done both for the original dataset (dataset 1) and the retest dataset (dataset 2). For each visit's data, we employed FreeSurfer's anatomical parcellation based on the Desikan–Killiany atlas, comprising 34 regions per hemisphere, to construct two 34 × 34 structural connectivity matrices (one per hemisphere)^{21}. We computed five kinds of structural connectivity matrix: (1) unpruned, (2) ReAlLiFE-pruned (λ = 0.01, two-million-fiber connectome), (3) LiFE-pruned (one-million-fiber connectome), (4) SIFT2-pruned (one-million-fiber connectome) and (5) COMMIT2-pruned (one-million-fiber connectome). For each connectivity matrix, the (i, j)th entry indicates the number of fibers, or the sum of fiber weights (after pruning with ReAlLiFE, LiFE, SIFT2 or COMMIT2), across all fibers connecting regions i and j. Because diffusion MRI does not provide information regarding the direction of connectivity, each matrix was symmetric about the diagonal. For these analyses, we considered only connections between pairs of intrahemispheric regions and ignored diagonal elements of the connectivity matrix, that is, connections that originate and terminate within the same ROI. As a result, the total number of connectivity features across both hemispheres was 1,122 (^{34}C_{2} connections per hemisphere × 2 hemispheres). To limit noisy estimates of test–retest reliability metrics, we chose only connections with the top 50th percentile of fibers (561 connections) for the test–retest reliability analysis.
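Assembling one such symmetric matrix from ROI pairs and fiber weights can be sketched as follows (unit weights give fiber counts; names are illustrative). The arithmetic behind the feature count is also shown.

```python
import numpy as np
from math import comb

def connectivity_matrix(roi_pairs, weights, n_rois=34):
    """Symmetric ROI x ROI matrix whose (i, j)th entry sums the weights of
    all fibers connecting regions i and j."""
    C = np.zeros((n_rois, n_rois))
    for (i, j), w in zip(roi_pairs, weights):
        C[i, j] += w
        C[j, i] += w
    return C

# 34C2 intrahemispheric connections per hemisphere, over two hemispheres
n_features = comb(34, 2) * 2
```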
Within-participant variability (V_{w})
To compute within-participant variability, for each participant and for each connection in the connectivity matrix, we computed a normalized difference index between the connectivity metrics of the first and second datasets, that is,

\(V_{\mathrm{w}}^{(k)}(i) = \frac{\left| c_1^{(k)}(i) - c_2^{(k)}(i) \right|}{c_1^{(k)}(i) + c_2^{(k)}(i)}\)

where k denotes the participant, i denotes the connection, i = 1, 2, ..., 561, and c_{1} and c_{2} denote the connectivity strengths based on datasets 1 and 2 (two visits), respectively. We then computed the average within-participant variability for each connection as

\(V_{\mathrm{w}}(i) = \left\langle V_{\mathrm{w}}^{(k)}(i) \right\rangle_k\)

where the angle brackets \(\left\langle \cdot \right\rangle _k\) denote the average across participants.
Between-participant variability (V_{b})
To compute between-participant variability, for each connection in the connectivity matrix, we first computed a normalized difference index between the connectivity metrics of every participant's dataset m and every other participant's dataset l (m ≠ l), that is,

\(V_{\mathrm{b}}^{(l,m)}(i) = \frac{\left| c_1^{(m)}(i) - c_2^{(l)}(i) \right|}{c_1^{(m)}(i) + c_2^{(l)}(i)}\)

where i denotes the connection, i = 1, 2, ..., 561, and c_{1} and c_{2} denote the connectivity strengths based on datasets 1 and 2, respectively. We then computed the average between-participant variability as

\(V_{\mathrm{b}}(i) = \left\langle V_{\mathrm{b}}^{(l,m)}(i) \right\rangle_{l,m}\)

where the angle brackets \(\left\langle \cdot \right\rangle _{l,\,m}\) denote the average across each pair of participants.
Finally, for each connection i, we computed the difference in variability between pruned and unpruned connectivity as

\(\Delta V(i) = V^{\mathrm{pruned}}(i) - V^{\mathrm{unpruned}}(i)\)

where V denotes the within- or between-participant variability, computed as above.
Reliability
Finally, for each connection i, we computed a reliability metric ϕ as

\(\phi(i) = \frac{V_{\mathrm{b}}(i)}{V_{\mathrm{b}}(i) + V_{\mathrm{w}}(i)}\)

That is, the reliability of connection i was taken as the ratio of the between-participant variability to the total variability, including the between- and within-participant components, following ref. ^{12}. Using this reliability metric, computed on the HCP retest data, we identified the connections with the highest and lowest reliability.
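Combining the normalized difference indices and their averages, the per-connection reliability can be computed in a few lines. An illustrative numpy sketch, assuming connectivity is stored as participants × connections arrays (one per visit):

```python
import numpy as np

def norm_diff(a, b):
    # normalized difference index between two connectivity strengths
    return np.abs(a - b) / (a + b)

def reliability(c1, c2):
    """phi = V_b / (V_b + V_w) per connection, with V_w averaged over
    participants and V_b over all ordered pairs of distinct participants."""
    n = c1.shape[0]
    v_w = norm_diff(c1, c2).mean(axis=0)
    v_b = np.mean([norm_diff(c1[m], c2[l])
                   for m in range(n) for l in range(n) if m != l], axis=0)
    return v_b / (v_b + v_w)
```

Identical connectivity across the two visits (V_w = 0) yields ϕ = 1, the most reliable case.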
Analysis of structure–behavior relationships
We tested whether structural connectivity could explain inter-individual variations in various cognitive scores, using data from n = 200 participants from the HCP database. Specifically, we performed support vector regression (SVR)-based prediction of cognitive scores with structural connectivity features estimated either from the unpruned connectome (number of fibers) or by pruning streamlines with ReAlLiFE or SIFT (connection weights).
Behavioral scores
For each of the n = 200 participants drawn from the HCP database, we predicted 60 behavioral scores (Supplementary Data 1) from dMRI connectivity^{9}. These 60 scores were chosen based on selection criteria employed in previous studies that sought to predict these scores from functional MRI connectivity^{13}. In addition to selecting the 58 scores employed in these studies, we included the in-scanner score related to performance accuracy in the shape-matching subtask of the emotion task (Emotion_Task_Shape_Acc, Supplementary Data 1) and replaced the overall accuracy in the relational task with accuracies in each of its subtasks (object matching/Relational_Task_Match_Acc and object relation/Relational_Task_Rel_Acc). Both state and trait scores were included, and age-adjusted scores were used wherever available. Importantly, behavioral score selection was agnostic to the results of our prediction analyses. Broadly, these behavioral scores fell into three major categories: cognition (n = 13), emotion (n = 23) or personality (n = 24)^{22}. These scores also included nine behavioral performance scores recorded during the task-functional MRI sessions: accuracy scores in all subtasks of each task, except for the working-memory task, for which we chose only the overall performance accuracy across all subtasks. Further details are provided in Supplementary Data 1.
Estimating structural connectivity features
For each participant in the HCP dataset (n = 200), we estimated a whole-brain connectome comprising one million fibers (see 'Tractography and generating whole-brain connectomes from dMRI data'). As with the test–retest reliability analysis, we used the FreeSurfer anatomical parcellation based on the Desikan–Killiany atlas, comprising 34 regions per hemisphere, to construct a 68 × 68 structural connectivity matrix^{21}. We computed two kinds of structural connectivity matrix, (1) unpruned and (2) ReAlLiFE-pruned (λ = 0.01), where the (i, j)th entry in each matrix indicates the number of fibers, or the sum of ReAlLiFE-pruned fiber weights, for fibers connecting regions i and j. We also carried out predictions with a combination of features (1) and (2) (Fig. 2g). Because diffusion MRI does not provide information regarding the direction of connectivity, each matrix was symmetric about the diagonal. In addition, for these analyses, we considered only connections between pairs of intrahemispheric regions and set all diagonal elements of the connectivity matrix (connections that originate and terminate within the same ROI) to zero. As a result, the total number of connectivity features across both hemispheres was 1,122 (^{34}C_{2} connections per hemisphere × 2 hemispheres).
Prediction model
For predicting behavioral scores using structural connectivity as features, we used a support vector machine (SVM)-based regression model with a linear kernel ('fitrsvm' function in MATLAB), along with recursive feature elimination (RFE)^{14}. Behavioral scores were standardized by z-scoring prior to model fitting. The 1,122 connection features were organized into a feature matrix of dimensions 200 × 1,122, where each row corresponds to a participant and each column to one of the 1,122 connections (features). We employed RFE to identify the features with the largest weight magnitudes in the linear prediction model that provided the highest generalized cross-validation accuracy. We briefly describe the algorithm below (a more detailed description is provided in ref. ^{14}).
In the first stage of the RFE, the data were divided into N folds. During each iteration i, N − 1 of these folds were used for training and the remaining fold was reserved for testing. In the second stage, the training data (N − 1 folds) were further divided into K folds. Each of these K folds was left out exactly once, and the SVR model was trained on the remaining K − 1 folds. In each of the K iterations, the estimated regression coefficients were used to predict the behavioral scores of the test fold reserved in the first stage, and the correlation between the predicted and observed scores of the test fold was computed. At the end of the K iterations, the β weights (regression coefficients) and correlation values were averaged across iterations to obtain robust estimates. Next, based on these average β weights, the bottom 10% of the features were discarded. This completed one pass of the second stage, which was repeated until all features were eliminated. Finally, we chose the set of features, say F_{i}, that yielded the maximum correlation between the observed and predicted scores.
The above procedure was repeated N times, leaving out one fold for testing each time. At the end of the N iterations, we averaged, across the N folds, the final set of estimated β weights (<F_{i}>_{i}). We repeated the entire RFE procedure (first and second stages) for 100 runs to robustly estimate the top features and predict behavioral scores. For these analyses, we chose N = 10 and K = 5.
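The elimination loop at the core of this procedure can be sketched in simplified form. This illustrative Python version substitutes plain least squares for the SVR and omits the nested N × K cross-validation, so it conveys the elimination logic only, not the authors' pipeline.

```python
import numpy as np

def rfe_linear(X, y, drop_frac=0.10):
    """Sketch of recursive feature elimination: fit a linear model, discard
    the bottom 10% of features by |beta|, repeat, and keep the feature set
    giving the best correlation between fitted and observed scores."""
    active = list(range(X.shape[1]))
    best_r, best_set = -np.inf, active[:]
    while active:
        beta, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        r = np.corrcoef(X[:, active] @ beta, y)[0, 1]
        if r > best_r:
            best_r, best_set = r, active[:]
        k = max(1, int(np.ceil(drop_frac * len(active))))
        keep = np.argsort(np.abs(beta))[k:]      # weakest k features dropped
        active = [active[j] for j in sorted(keep)]
    return best_set, best_r
```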
Comparing predictions based on the number of fibers and ReAlLiFE connection weights
We compared the predictions based on ReAlLiFE weights with those based on the number of fibers. We computed the number of significant predictions (based on the correlation between the observed and predicted scores) for each feature set (ReAlLiFE weights and number of fibers) across a range of significance levels, from α = 0.00001 to α = 0.05. For each value of α, we computed the number of scores for which the P value for the correlation between the observed and predicted scores was less than α. We also computed the average correlation coefficient across all significant scores at each α level. The number of significant scores and the average correlation coefficients, at each α level, for predictions based on each set of features, and for the different score categories, are plotted in Fig. 2.
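Counting significant predictions across α levels is straightforward. A small sketch (the p and r values in the test are hypothetical, not results from the paper):

```python
def summarize_predictions(p_values, r_values, alphas):
    """For each significance level alpha, count the significantly predicted
    scores (p < alpha) and average their correlation coefficients."""
    out = {}
    for a in alphas:
        sig = [r for p, r in zip(p_values, r_values) if p < a]
        out[a] = (len(sig), sum(sig) / len(sig) if sig else float('nan'))
    return out
```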
Univariate correlations between connectivity and behavioral scores
To understand the anatomical relevance of the ReAlLiFE connections predicting behavioral scores, we chose the five best-predicted scores in the 'cognition' category. For each score, we identified the top connections that contributed most to the prediction, based on the β weights assigned to each connection by the SVR-RFE prediction model. For each such connection, we then computed three additional connectivity metrics: (1) the number of fibers after pruning with ReAlLiFE, (2) streamline volume and (3) streamline length (voxels intersected). For each connection, we then computed univariate correlations between each of these connectivity metrics and the corresponding behavioral score. Univariate correlations (r and P values) are reported based on robust correlations. This procedure allowed us to identify which connectivity metrics, among those strongly predictive of behavioral scores, correlated most strongly with the respective behavioral score.
Control analyses
We performed three control analyses for behavioral score predictions, with ReAlLiFE connection weights as features.
Predicting five minimally correlated scores
To account for correlations among the 60 behavioral scores, we selected a subset of five minimally correlated scores, following the same procedure as in ref. ^{13}. We recapitulate their approach as follows. First, a pair of scores with an absolute correlation of less than 0.1 was chosen at random. Subsequently, three behavioral scores were selected, one at a time, such that each new score correlated minimally with the existing set of scores (r < 0.01). This procedure was repeated 100 times, resulting in 100 such sets of five minimally correlated scores. Finally, the subset of five scores with the smallest maximum absolute mutual correlation was selected. These scores corresponded to personality extroversion (HCP field: NEOFAC_E), emotion recognition (ER40HAP and ER40NOE), picture vocabulary (PicVocab) and processing speed (ProcSpeed); these five scores are not identical to those in ref. ^{13}, possibly because, unlike that study, we employed age-adjusted scores wherever available. In our analyses, these scores exhibited a nonsignificant maximum absolute correlation of r = 0.059 (P = 0.403).
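This selection procedure can be sketched as follows. The greedy 'least-correlated candidate' rule below is an illustrative stand-in for the r < 0.01 screening of added scores, and the function name is ours.

```python
import numpy as np

def pick_min_correlated(R, n_pick=5, thresh=0.1, n_tries=100, seed=0):
    """Seed with a random score pair whose |r| < thresh, greedily add the
    score least correlated with the current set, repeat n_tries times and
    keep the set with the smallest maximum absolute mutual correlation."""
    rng = np.random.default_rng(seed)
    n = R.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if abs(R[i, j]) < thresh]
    best, best_max = None, np.inf
    for _ in range(n_tries):
        chosen = list(pairs[rng.integers(len(pairs))])
        while len(chosen) < n_pick:
            cands = [k for k in range(n) if k not in chosen]
            chosen.append(min(cands,
                              key=lambda k: max(abs(R[k, c]) for c in chosen)))
        m = max(abs(R[i, j]) for i in chosen for j in chosen if i != j)
        if m < best_max:
            best, best_max = sorted(chosen), m
    return best, best_max
```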
Controlling for head-motion confounds
To account for confounding effects of head motion on behavioral score predictions, we repeated predictions with the SVR-RFE model after regressing out the effects of motion parameters from the behavioral scores, again following a procedure closely similar to that of ref. ^{13}. From each participant's dMRI scan, six head-motion parameters were extracted: three corresponding to rotations (about the x, y and z axes) and three corresponding to translations (along the x, y and z axes). We fit a multiple linear regression model, with the motion parameters as independent variables, to each behavioral score. Subsequently, the residual corresponding to each score was fit with the SVR-RFE model with ReAlLiFE weights as features. A poor fit to the residual would indicate that inter-participant variations in behavioral scores could not be explained by structural connectivity features over and above what could be explained with the motion parameters alone.
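The residualization step amounts to ordinary least-squares regression of each score on the six motion parameters. A numpy sketch (the function name is ours):

```python
import numpy as np

def regress_out_motion(motion, scores):
    """Residualize a behavioral score against the six head-motion
    parameters with ordinary least squares (intercept included)."""
    X = np.column_stack([np.ones(len(motion)), motion])
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return scores - X @ beta
```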
Permutation test for RFE predictions
Finally, we corrected for a potential bias of the RFE algorithm toward positive prediction accuracies (r values). Because, at each iteration, the RFE algorithm selects features with the highest generalized cross-validation accuracy, average prediction accuracies across iterations could be positively biased. To control for this bias, we performed a random permutation test in which we shuffled participant labels 100 times across each behavioral score and structural connectivity feature and generated a null distribution of prediction accuracies (r values). Because generating the null distribution for each behavioral score is computationally expensive, each r value in the null distribution (100 permutations) was computed by averaging across, at most, ten iterations of the RFE algorithm. The P value was computed as the proportion of observations in the null distribution that were greater than the actual prediction accuracy (actual r value; Supplementary Fig. 5).
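The permutation test can be sketched as follows, where `features_fit` is an illustrative stand-in for one run of the SVR-RFE pipeline returning a prediction accuracy r:

```python
import numpy as np

def permutation_null(scores, features_fit, n_perm=100, seed=0):
    """Build a null distribution of prediction accuracies by shuffling
    participant labels on the scores before refitting."""
    rng = np.random.default_rng(seed)
    return [features_fit(rng.permutation(scores)) for _ in range(n_perm)]

def permutation_p(observed_r, null_rs):
    # proportion of the null distribution exceeding the observed accuracy
    null_rs = np.asarray(null_rs)
    return float((null_rs > observed_r).mean())
```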
Statistical tests
Unless otherwise stated, all pairwise statistical comparisons were performed with the nonparametric Wilcoxon signed-rank test (for example, Fig. 1 and Supplementary Fig. 2). Voxelwise r.m.s.e.s after pruning with LiFE and ReAlLiFE (Supplementary Figs. 3 and 4) were compared using a Kolmogorov–Smirnov test. All correlations were computed as 'robust' correlations: we report values of the 'bend correlation' (Fig. 2b–f,h and Supplementary Fig. 5), an approach that accounts for univariate outliers in the data^{23}. Unless otherwise specified, corrections for multiple comparisons were carried out with the Benjamini–Hochberg approach, at a significance level of P = 0.05. To compare the execution times of the GPU implementation of LiFE with those of the CPU implementation as a function of the number of fibers N_{f} (Supplementary Fig. 1), we employed a two-way analysis of variance with the implementation (CPU/GPU) and the number of fibers as factors. Effect sizes were quantified as Cohen's d. Statistical significance values are reported as ***P < 0.001, **P < 0.01 and *P < 0.05.
Hardware and software specifications
All analyses described in this study were performed on a desktop computer with the following hardware and software specifications:
- CPU: eight cores; Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00 GHz
- GPU: one NVIDIA GeForce GTX 1080 Ti (or) AMD Radeon RX 580
- RAM: DDR4, 4 × 16 GB 1,866 MHz (total 64 GB)
- Hard disk space: 447 GB (64 GB configured as swap)
- Operating system: Ubuntu 16.04
- Software: MATLAB R2017b (64 bit); CUDA toolkit 9.0.
We performed all our benchmarking experiments with only one CPU core and one GPU. GPU code binaries were compiled using the 'nvcc' compiler with the 'ptx' flag. Because the original LiFE package was implemented in MATLAB, integrating the CUDA or HIP implementation with the ReAlLiFE package requires MATLAB support for GPU computation. MATLAB support is available for NVIDIA GPUs but is currently unavailable for AMD GPUs (https://www.mathworks.com/help/parallel-computing/gpu-support-by-release.html). The end-to-end version of the code, integrated with ReAlLiFE, is currently available for NVIDIA GPUs (Code availability section) and will be released for AMD GPUs as soon as MATLAB support for the latter hardware becomes available.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this Article.
Data availability
Dataset I has been deposited into a figshare repository^{19}. Dataset S is a part of the original LiFE algorithm and can be accessed from the associated repository^{18}. Dataset H and all HCP data can be accessed from the HCP database^{9}. Dataset M is available on the Zenodo repository^{5}. Source data are provided with this paper.
Code availability
The custom code for reproducing all figures is available on figshare^{19}. The current version of the ReAlLiFE algorithm is available on Code Ocean^{24}. In addition to streamline pruning, the code pipeline integrates with a standard atlas (HCP-MMP atlas^{9}) to label the brain regions connected by each streamline. The pipeline also integrates with the TractSeg tool^{25} for automated tract labeling.
References
1. Chanraud, S., Zahr, N., Sullivan, E. V. & Pfefferbaum, A. MR diffusion tensor imaging: a window into white matter integrity of the working brain. Neuropsychol. Rev. 20, 209–225 (2010).
2. Damoiseaux, J. S. et al. White matter tract integrity in aging and Alzheimer’s disease. Hum. Brain Mapp. 30, 1051–1059 (2009).
3. Pestilli, F., Yeatman, J. D., Rokem, A., Kay, K. N. & Wandell, B. A. Evaluation and statistical inference for human connectomes. Nat. Methods 11, 1058–1063 (2014).
4. Jeurissen, B., Descoteaux, M., Mori, S. & Leemans, A. Diffusion MRI fiber tractography of the brain. NMR Biomed. 32, e3785 (2019).
5. Maier-Hein, K. H. et al. The challenge of mapping the human connectome based on diffusion tractography. Nat. Commun. 8, 1349 (2017).
6. Smith, R. E., Tournier, J. D., Calamante, F. & Connelly, A. SIFT2: enabling dense quantitative assessment of brain white matter connectivity using streamlines tractography. Neuroimage 119, 338–351 (2015).
7. Smith, R. E., Tournier, J. D., Calamante, F. & Connelly, A. SIFT: spherical-deconvolution informed filtering of tractograms. Neuroimage 67, 298–312 (2013).
8. Schiavi, S. et al. A new method for accurate in vivo mapping of human brain connections using microstructural and anatomical information. Sci. Adv. 6, eaba8245 (2020).
9. Van Essen, D. C. et al. The Human Connectome Project: a data acquisition perspective. NeuroImage 62, 2222–2231 (2012).
10. Kumar, S., Sreenivasan, V., Talukdar, P., Pestilli, F. & Sridharan, D. ReAl-LiFE: accelerating the discovery of individualized brain connectomes on GPUs. In Proc. AAAI Conference on Artificial Intelligence Vol. 33, 630–638 (AAAI, 2019).
11. Daducci, A., Dal Palù, A., Descoteaux, M. & Thiran, J.-P. Microstructure informed tractography: pitfalls and open challenges. Front. Neurosci. 10, 247 (2016).
12. Jiang, C., Betzel, R., He, Y. & Zuo, X.-N. Toward reliable network neuroscience for mapping individual differences. Preprint at bioRxiv https://doi.org/10.1101/2021.05.06.442886 (2021).
13. Kong, R. et al. Spatial topography of individual-specific cortical networks predicts human cognition, personality, and emotion. Cereb. Cortex 29, 2533–2551 (2019).
14. De Martino, F. et al. Combining multivariate voxel selection and support vector machines for mapping and classification of fMRI spatial patterns. Neuroimage 43, 44–58 (2008).
15. de Schotten, M. T. et al. A lateralized brain network for visuospatial attention. Nat. Neurosci. 14, 1245–1246 (2011).
16. Brodt, S. et al. Fast track to the neocortex: a memory engram in the posterior parietal cortex. Science 362, 1045–1048 (2018).
17. Ho, J. C., Ghosh, J. & Sun, J. Marble: high-throughput phenotyping from electronic health records via sparse nonnegative tensor factorization. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 115–124 (ACM, 2014); https://doi.org/10.1145/2623330.2623658
18. Caiafa, C. F. & Pestilli, F. Multidimensional encoding of brain connectomes. Sci. Rep. 7, 11491 (2017).
19. Sreenivasan, V. et al. ReAl-LiFE. figshare https://doi.org/10.6084/m9.figshare.13491024.v1 (2021).
20. Tournier, J. D., Calamante, F. & Connelly, A. MRtrix: diffusion tractography in crossing fiber regions. Int. J. Imaging Syst. Technol. 22, 53–66 (2012).
21. Sreenivasan, V. & Sridharan, D. Subcortical connectivity correlates selectively with attention’s effects on spatial choice bias. Proc. Natl Acad. Sci. USA 116, 19711–19716 (2019).
22. Kashyap, R. et al. Individual-specific fMRI-subspaces improve functional connectivity prediction of behavior. Neuroimage 189, 804–812 (2019).
23. Pernet, C. R., Wilcox, R. & Rousselet, G. A. Robust correlation analyses: false positive and power validation using a new open source Matlab toolbox. Front. Psychol. 3, 606 (2013).
24. Sreenivasan, V. et al. ReAl-LiFE. Code Ocean https://doi.org/10.24433/CO.5578739.v1 (2022).
25. Wasserthal, J., Neher, P. & Maier-Hein, K. H. TractSeg—fast and accurate white matter tract segmentation. Neuroimage 183, 239–253 (2018).
Acknowledgements
We thank A. Nagaraj, R. Bhuthesh, A. Ashok Nayak, A. Panwar, S. Pandey and A. Basu for help and resources with porting ReAlLiFE to AMD GPUs. We also thank P. Gupta, S. Gupta and G. T. Anandan for their feedback on an earlier version of this manuscript. We also thank Health Care Global (HCG) Hospital for access to their MRI scanning facility. This research was supported by a Department of Biotechnology-Wellcome Trust India Alliance Intermediate fellowship (no. IA/I/15/2/502089), a Science and Engineering Research Board Early Career award (no. ECR/2016/000403), a Department of Biotechnology-Indian Institute of Science Partnership Program grant, an India-Trento Partnership Program grant (to D.S.), a Pratiksha Trust Intramural grant (to D.S. and P.T.), NSF OAC-1916518, NSF IIS-1912270, NSF IIS-1636893, NSF BCS-1734853, NIH NIMH R01MH126699, NIH NIBIB R01EB030896, NIH NIBIB R01EB029272, a Microsoft Investigator Fellowship, a gift from the Kavli Foundation (to F.P.), and the Ministry of Human Resource Development, Government of India (to V.S. and S.K.).
Author information
Authors and Affiliations
Contributions
D.S. and P.T. designed the research. V.S. performed the research. S.K. and F.P. contributed unpublished reagents/analytic tools. V.S. analyzed data, and V.S. and D.S. wrote the manuscript.
Corresponding authors
Ethics declarations
Competing interests
D.S. is a research consultant at Google. P.T. is a research scientist at Google. He is also the founder of Kenome, an enterprise AI company. V.S., S.K. and F.P. declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Xi-Nian Zuo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Results, Algorithms 1 and 2, Tables 1–3, Figs. 1–5 and References.
Supplementary Data 1
List of cognitive scores used for behavioral prediction.
Supplementary Data 2
Correlations between the best-predicted cognitive scores and fiber features for the top connections.
Source data
Source Data Fig. 1
Data to reproduce Fig. 1.
Source Data Fig. 2
Data to reproduce Fig. 2.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sreenivasan, V., Kumar, S., Pestilli, F. et al. GPU-accelerated connectome discovery at scale. Nat. Comput. Sci. 2, 298–306 (2022). https://doi.org/10.1038/s43588-022-00250-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s43588-022-00250-z
This article is cited by
Efficiently pruning brain connectomes
Nature Computational Science (2022)