Multidimensional encoding of brain connectomes

The ability to map brain networks in living individuals is fundamental in efforts to chart the relation between human behavior, health and disease. Advances in network neuroscience may benefit from developing new frameworks for mapping brain connectomes. We present a framework to encode structural brain connectomes and diffusion-weighted magnetic resonance imaging (dMRI) data using multidimensional arrays. The framework integrates the relation between connectome nodes, edges, white matter fascicles and diffusion data. We demonstrate the utility of the framework for in vivo white matter mapping and anatomical computing by evaluating 1,490 connectomes, thirteen tractography methods, and three data sets. The framework dramatically reduces storage requirements for connectome evaluation methods, with up to 40x compression factors. Evaluation of multiple, diverse datasets demonstrates the importance of spatial resolution in dMRI. We measured large increases in connectome resolution as a function of data spatial resolution (up to 52%). Moreover, we demonstrate that the framework allows performing anatomical manipulations on white matter tracts for statistical inference and for studying the geometrical organization of the white matter. Finally, we provide open-source software implementing the method and data to reproduce the results.


Multidimensional arrays notation and definitions
Multidimensional arrays (tensors) generalize vectors (1D arrays) and matrices (2D arrays) to arrays of three or more dimensions. Such arrays can be used to perform multidimensional factor analysis and decomposition and are of interest to many scientific disciplines 1,2.
Below, we introduce basic concepts and notation (we refer the reader to Supplementary Table 1). Vectors, matrices and tensors. Vectors and matrices are denoted using boldface lower- and upper-case letters, respectively. For example, x ∈ R^I and X ∈ R^{I×J} represent a vector and a matrix, respectively. A tensor X ∈ R^{I×J×K} is a 3D array of real numbers whose elements (i, j, k) are referred to as X(i, j, k) or x_{ijk}. The individual dimensions of a tensor are referred to as modes (1st mode, 2nd mode, and so on).
Tensor slices and mode-n vectors. Slices are used to address a tensor along a single dimension and are obtained by fixing the index of one dimension of the tensor while letting the other indices vary. For example, in a 3D tensor X ∈ R^{I×J×K}, we identify horizontal (i), lateral (j) and frontal (k) slices by holding fixed the corresponding index of each array dimension (see Supplementary Fig. 1a, top). Tensors can also be addressed in any dimension by means of mode-n vectors. These vectors are obtained by holding all indices fixed except one (Supplementary Fig. 1a, bottom).
Subtensors and tensor unfolding. A subset of indices in any mode identifies a volume, also referred to as a subtensor. For example, in Fig. 1d, we identify a volume by collecting slices in the 3rd mode. In addition, a tensor can be converted into a matrix by re-arranging its entries (unfolding). The mode-n unfolded matrix is denoted by X_(n) ∈ R^{I_n × Ī_n}, where Ī_n = ∏_{m≠n} I_m, and its entry at row i_n and column (i_1 − 1)I_2 ··· I_{n−1} I_{n+1} ··· I_N + ··· + (i_{N−1} − 1)I_N + i_N is equal to x_{i_1 i_2 ... i_N}. For example, mode-2 unfolding builds the matrix X_(2) whose columns are the mode-2 vectors of the tensor and whose rows are vectorized versions of the lateral slices, i.e. spanning the dimensions with indices i and k (see Supplementary Fig. 1b).
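The mode-n unfolding can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's MatLab implementation, and note that column-ordering conventions for unfolding vary across references:

```python
import numpy as np

# Small 3D tensor X with dimensions I x J x K (values chosen arbitrarily).
I, J, K = 2, 3, 4
X = np.arange(I * J * K, dtype=float).reshape(I, J, K)

def unfold(X, mode):
    """Mode-n unfolding: the mode-n vectors of X become the columns."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

X2 = unfold(X, 1)   # mode-2 unfolding (0-based mode index 1)
print(X2.shape)     # (3, 8): I_2 rows, I_1 * I_3 columns
```

Each column of X2 is a mode-2 vector X(i, :, k), matching the description of the mode-2 unfolding above.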
Tensor by matrix product. Generalizing matrix multiplication, a tensor can be multiplied by a matrix in a specific mode, provided their sizes match in that mode. Given a tensor X ∈ R^{I_1×I_2×···×I_N} and a matrix A ∈ R^{J×I_n}, the mode-n product is defined by: y_{i_1···i_{n−1} j i_{n+1}···i_N} = Σ_{i_n=1}^{I_n} x_{i_1···i_n···i_N} a_{j i_n}, with i_k = 1, 2, ..., I_k (k ≠ n) and j = 1, 2, ..., J. Supplementary Fig. 1c illustrates a 3D tensor-by-matrix product operation (2nd mode, Y = X ×_2 A).
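A direct NumPy sketch of the mode-n product (illustrative; `np.tensordot` performs the sum over i_n):

```python
import numpy as np

def mode_n_product(X, A, mode):
    """Compute Y = X x_n A for a (0-based) mode index: sums over I_n."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, mode)), 0, mode)

X = np.random.rand(2, 5, 4)      # I_1 x I_2 x I_3
A = np.random.rand(3, 5)         # J x I_2: sizes match in mode 2
Y = mode_n_product(X, A, 1)      # Y = X x_2 A, as in Supplementary Fig. 1c
print(Y.shape)                   # (2, 3, 4)
```

The replaced mode (here the 2nd) changes size from I_2 to J; the other modes are untouched.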
Tucker decomposition. Low-rank matrix approximation can be generalized to tensors by the Tucker decomposition 3. For example, X ∈ R^{I_1×I_2×I_3} can be approximated by

X ≈ G ×_1 A_1 ×_2 A_2 ×_3 A_3,

where ×_n is the mode-n tensor-by-matrix product, G ∈ R^{R_1×R_2×R_3} is the core tensor and A_n ∈ R^{I_n×R_n} are factor matrices. Such a decomposition guarantees data compression when the core tensor is much smaller than the original array, i.e. R_n ≪ I_n (see Supplementary Fig. 1d, top). Sparse decomposition. Tensors can also be approximated by sparse decompositions 4,5. In this case, compression can be achieved independently of the size of G, as long as the sparsity is sufficiently high (see Supplementary Fig. 1d, bottom).

Supplementary Fig. 1. (a) Tensor slices (top) and mode-n vectors (bottom). (b) Mode-2 unfolding, X_(2). (c) Tensor-by-matrix product (example of a product in mode-2, Y = X ×_2 A). (d) Top: the classical Tucker decomposition 3 represents a 3D tensor X ∈ R^{I_1×I_2×I_3} as the product of a core tensor (green) G ∈ R^{R_1×R_2×R_3} by factor matrices A_n ∈ R^{I_n×R_n} (red, yellow and light blue); data compression is achieved by considering very small (dense) core tensors G, meaning that R_n ≪ I_n. Bottom: the sparse Tucker decomposition 4, in which the core tensor G is large but sparse; data compression is achieved because of the sparsity of the core tensor. See Supplementary Table 1 for additional information about notation and mathematical definitions.

Supplementary Table 1. Mathematical notation and definitions for multidimensional arrays (tensors) and their decomposition.
X, X, x, x | A tensor, a matrix, a vector and a scalar
x_{ijk}, x_{ij}, x_i | Entries of a tensor, a matrix and a vector
X(:, j, k), X(i, :, k), X(i, j, :) | Mode-1, mode-2 and mode-3 vectors, obtained by fixing all but one index
X(i, :, :), X(:, j, :), X(:, :, k) | Horizontal, lateral and frontal slices, obtained by fixing all but two indices
X_(n) ∈ R^{I_n × I_1 I_2 ··· I_{n−1} I_{n+1} ··· I_N} | Mode-n unfolding of the tensor X ∈ R^{I_1×I_2×···×I_N}, whose entry at row i_n and column (i_1 − 1)I_2 ··· I_{n−1} I_{n+1} ··· I_N + ··· + (i_{N−1} − 1)I_N + i_N equals x_{i_1 i_2 ··· i_N}
X = G ×_1 A_1 ×_2 A_2 ×_3 A_3 | Tucker decomposition: a 3D tensor X ∈ R^{I_1×I_2×I_3} represented as the product of a core array G ∈ R^{R_1×R_2×R_3} by factor matrices A_n ∈ R^{I_n×R_n}
x = vec(X) ∈ R^{I_1 I_2 ··· I_N} | Vectorization of the tensor X ∈ R^{I_1×I_2×···×I_N}, with the entry at position i_1 + Σ_{k=2}^{N} (i_k − 1) I_1 I_2 ··· I_{k−1} equal to x_{i_1 i_2 ··· i_N}
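The Tucker form in Supplementary Table 1 can be checked numerically. Below is a minimal NumPy sketch with illustrative sizes, showing the storage saving obtained when R_n ≪ I_n:

```python
import numpy as np

def mode_n_product(X, A, mode):
    # Y = X x_n A (0-based mode index)
    return np.moveaxis(np.tensordot(A, X, axes=(1, mode)), 0, mode)

# X = G x_1 A_1 x_2 A_2 x_3 A_3 with a small (R_1, R_2, R_3) core.
I, R = (30, 40, 50), (3, 4, 5)
G = np.random.rand(*R)                                   # core tensor
A = [np.random.rand(In, Rn) for In, Rn in zip(I, R)]     # factor matrices
X = G
for n, An in enumerate(A):
    X = mode_n_product(X, An, n)

dense = np.prod(I)                                       # entries of the full array
tucker = np.prod(R) + sum(In * Rn for In, Rn in zip(I, R))
print(X.shape, round(dense / tucker, 1))                 # ~100x fewer stored values
```

The reconstructed X is a full I_1 × I_2 × I_3 array, yet only the core and factors need to be stored.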

Mathematical Modeling of dMRI signals
Measured dMRI signals depend on the combination of multiple cellular compartments within the brain tissue 6 (e.g., neurons, astrocytes and oligodendrocytes). The dMRI signal is generally modeled as the linear combination of two components. One component describes the directional diffusion signal and is presumably related primarily to the direction of the neuronal axons wrapped in myelin sheaths (white matter). This signal is often referred to as anisotropic diffusion. The other component describes isotropic diffusion (non-directional) and is presumably related to the combination of signals originating from the rest of the cellular bodies within the brain tissue. Below we introduce the equations we used to model the dMRI signal in relation to these compartments.
dMRI measurements are collected with and without a diffusion-sensitization magnetic gradient. This gradient makes the dMR image intensity vary depending on water diffusion along a single direction. Generally, multiple dMR images are collected for each brain location by varying the diffusion-sensitization gradient (i.e., by sequentially orienting the gradient along several directions). The measured signal depends on a combination of parameters, such as the diffusion gradient strength and duration. Below, we denote the diffusion-sensitization gradient strength with the scalar b and its direction with the unit-norm vector θ ∈ R^3.
For a given sensitization strength b and diffusion direction θ, the measured dMRI signal at each location within a brain (voxel v) can be computed using the following equation 7,8:

S(θ, v) = S_0(v) [ w_0 e^{−b A_0} + Σ_f w_f e^{−b θ^T Q_{f,v} θ} ],

where f is the index of the candidate white-matter fascicles in the voxel, w_0 and w_f are the weights of the isotropic and fascicle compartments, respectively, S_0(v) is the non-diffusion-weighted signal in voxel v and A_0 is the isotropic apparent diffusion (diffusion in all directions). The value θ^T Q_{f,v} θ > 0 gives the apparent diffusion at direction θ generated by fascicle f. Q_{f,v} ∈ R^{3×3} is a symmetric, positive-definite matrix called the diffusion tensor 9. The diffusion tensor Q_{f,v} allows a compact representation of the diffusion signal measured with dMRI. Usually, Q_{f,v} is represented by a 3D ellipsoid, as shown in Supplementary Fig. 2a, which can be mathematically defined by the equation

Q_{f,v} = s_a u_1 u_1^T + s_{r1} u_2 u_2^T + s_{r2} u_3 u_3^T,

where u_n ∈ R^{3×1} are the unit-norm orthogonal vectors that correspond to the semi-axes of the diffusion tensor ellipsoid, and s_a, s_{r1} and s_{r2} define the axial and radial diffusivities of the tensor, respectively. In the simplest version of the model, s_a = 1 and s_{r1} = s_{r2} = 0, which means that diffusion is restricted to the main axis direction (the "stick" model) 8,10.
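A minimal numeric sketch of this voxel model under the "stick" assumption. The values of b, A_0 and the compartment weights below are illustrative assumptions, not fitted to data:

```python
import numpy as np

# Sketch of the voxel-level signal model; symbols follow the text,
# numeric values (b, S0, A0, w0, wf) are illustrative only.
b, S0, A0 = 2000.0, 1.0, 0.001
v = np.array([1.0, 0.0, 0.0])     # fascicle direction (unit norm)
Q = np.outer(v, v)                # "stick" tensor: s_a = 1, s_r1 = s_r2 = 0
w0, wf = 0.5, 0.5                 # isotropic and fascicle weights

def predicted_signal(theta):
    iso = w0 * np.exp(-b * A0)
    aniso = wf * np.exp(-b * (theta @ Q @ theta))
    return S0 * (iso + aniso)

theta_par = v                              # gradient parallel to the stick
theta_perp = np.array([0.0, 1.0, 0.0])     # gradient perpendicular
print(predicted_signal(theta_par) < predicted_signal(theta_perp))  # True
```

Diffusion along the fascicle attenuates the signal, so the predicted signal is smallest when the gradient is parallel to the stick direction.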

Tensor decomposition of the Linear Fascicle Evaluation model
The LiFE 10 method predicts the demeaned diffusion signal y ∈ R^{N_θ N_v} in all voxels (v = 1, 2, ..., N_v) and gradient directions (θ_1, θ_2, ..., θ_{N_θ}) using the following equation (see Methods for its derivation):

y ≈ M w.

Each column of the matrix M (Methods, equation 4) contains the diffusion signal contribution from a single fascicle at all voxels and gradient directions. The vector w ∈ R^{N_f} contains the weights associated with each fascicle's contribution. The vector y and the matrix M are composed of a vertical concatenation of N_v block vectors y_v ∈ R^{N_θ} and matrices M_v ∈ R^{N_θ×N_f}, where each block corresponds to a particular voxel v (see Supplementary Fig. 2c). Thus, in each voxel we have the following linear model:

y_v ≈ M_v w = D Φ_v w,

where the matrix D ∈ R^{N_θ×N_a} is a dictionary of diffusion predictions whose columns (atoms) correspond to precomputed fascicle orientations, and Φ_v ∈ R^{N_a×N_f} is a sparse matrix whose non-zero entries Φ_v(a, f) indicate the orientation of fascicle f in voxel v, approximated by atom a. The dictionary atoms were created by uniformly sampling the azimuth (α) and elevation

(β) of an idealized unit-norm sphere representing the space of putative fascicle directions (Supplementary Fig. 2f). More specifically, the entries of the dictionary were computed as follows:

D(θ, a) = e^{−b θ^T Q_a θ},

where Q_a is a diffusion tensor that approximates Q_{f,v}. At voxels where a fascicle is straight enough, its diffusion signal is approximated by addressing only one atom (one orientation); for a fascicle that is curved at a voxel, the diffusion signal is approximated by a linear combination of a few atoms of the dictionary. By inserting equation (16) into equation (15), and rearranging the approximated full matrix M̂ of the original LiFE model 10 into a tensor M̂ ∈ R^{N_θ×N_v×N_f} (stacking the block matrices M_v = D Φ_v as lateral slices^a), we demonstrate that the following decomposition holds:

M̂ = Φ ×_1 D.

Finally, using equations (4) and (19), the full LiFE_T model can be written as (Fig. 2a and Supplementary Fig. 2e):

Y ≈ Φ ×_1 D ×_3 w^T.

This model is an example of an application of the sparse Tucker decomposition (see Methods, 4). In the article we refer to this model as LiFE_T (see Fig. 2a).
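The decomposed prediction Y ≈ Φ ×_1 D ×_3 w^T can be checked against the unfolded matrix form on a toy problem. This is a NumPy sketch with random entries and a dense stand-in for Φ; in practice Φ is stored as a sparse array:

```python
import numpy as np

# Tiny dense stand-in for the sparse array Phi (Na x Nv x Nf).
# D is the dictionary (Ntheta x Na), w the fascicle weights.
Na, Nv, Nf, Ntheta = 6, 4, 3, 8
rng = np.random.default_rng(0)
D = rng.random((Ntheta, Na))
w = rng.random(Nf)
Phi = np.zeros((Na, Nv, Nf))
Phi[rng.integers(0, Na, Nf), rng.integers(0, Nv, Nf), np.arange(Nf)] = 1.0

# Y = Phi x_1 D x_3 w^T, computed without materializing the big matrix M.
Y = np.einsum('ta,avf,f->tv', D, Phi, w)

# Same prediction via the unfolded matrix M = (I kron D) Phi_(3)^T.
M = np.kron(np.eye(Nv), D) @ Phi.reshape(Na * Nv, Nf, order='F')
print(np.allclose(M @ w, Y.reshape(-1, order='F')))  # True
```

The Fortran-order reshape plays the role of the mode-3 unfolding (transposed), with the atom index running fastest within each voxel block.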

Comparison of the weights estimated by LiFE M and LiFE T
In equation (13), we introduced a measure that quantifies the difference between the two weight vectors w and ŵ. Weight vectors are sparse, and their entries indicate which fascicles in the connectome contribute (non-zero weight) or do not contribute (zero weight) to predicting the diffusion signal. Hereafter, we perform additional, detailed analyses of the difference between these vectors. The observed error between the vectors w and ŵ can, in theory, be the result of: (1) each model assigning non-zero weights to a different subset of fascicles; (2) non-zero weights being assigned to the same subset of fascicles, but with magnitudes that differ between the two models; or (3) a combination of (1) and (2). The distinction is important: case (1) would indicate that LiFE_M and LiFE_T select very different fascicles, and hence a potential bias in LiFE_T, whereas case (2) would indicate that they select the same fascicles. We explicitly quantified which of these three cases contributed to the observed error in the weights. To do this, we defined two subsets of weight-fascicle indices: those that have non-zero values in both models, the common-fascicle indices (Supplementary Fig. 2j, orange), and those that have non-zero values in one model but not in the other, the different-fascicle indices (Supplementary Fig. 2j, green and blue).
We define the vectors of common-fascicle weights as w_c and ŵ_c, and of different-fascicle weights as w_d and ŵ_d (see Supplementary Fig. 2j). We demonstrate that the squared error of the weights (Methods, equation 13) decomposes as:

e_w^2 = e_{w_c}^2 + e_{w_d}^2,

where e_{w_c}^2 = ‖w_c − ŵ_c‖^2 and e_{w_d}^2 = ‖w_d − ŵ_d‖^2 are the squared errors associated with the common- and different-fascicles, respectively.
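The decomposition of the squared weights error can be illustrated on toy weight vectors (hypothetical values):

```python
import numpy as np

# Hypothetical sparse weight vectors from the two models.
w     = np.array([0.0, 0.2, 0.5, 0.0, 0.1])
w_hat = np.array([0.0, 0.3, 0.4, 0.2, 0.0])

common    = (w != 0) & (w_hat != 0)      # fascicles kept by both models
different = (w != 0) ^ (w_hat != 0)      # fascicles kept by only one model

e2_total  = np.sum((w - w_hat) ** 2)
e2_common = np.sum((w[common] - w_hat[common]) ** 2)
e2_diff   = np.sum((w[different] - w_hat[different]) ** 2)
print(np.isclose(e2_total, e2_common + e2_diff))  # the decomposition holds
```

The decomposition holds because every index belongs to exactly one of the two subsets (indices that are zero in both models contribute nothing).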

Encoding a connectome into LiFE T
Encoding the information of a brain connectome into the LiFE_T model involves computing the dictionary matrix D and the sparse tensor Φ.
The dictionary matrix D needs to be computed first by fixing the total number of atoms N_a to be used, i.e. by setting the minimum incremental unit ∆ = π/L for the spherical coordinates, which defines a regular grid (see Supplementary Fig. 2f). Thus, by increasing the parameter L we increase the resolution, which has two main consequences: 1) it reduces the

^a Note that the matrix M̂ is the transposed mode-3 unfolded matrix of the tensor M̂, i.e. M̂ = (M̂_(3))^T. We use this unfolding operation to compute the LiFE_T model error e_M; see Fig. 2c, top panel.

approximation error (see section 2.5 and Supplementary Fig. 2g-i), and 2) it increases the size of the model (see section 2.6). We demonstrated that using L = 360 yields a very accurate approximation together with a substantial reduction in storage requirements.
By computing the sparse tensor Φ we encode the fascicles of a connectome into our decomposed model. More specifically, the sparse tensor is computed slice by slice (mode-3), i.e. by looking at one fascicle of the connectome at a time. Each fascicle f is composed of a set of connected nodes; for each node we need: 1) to identify the voxel index v in which the node is located, and 2) to find the atom index a whose orientation best matches the spatial orientation of the fascicle at that node. Finally, for each fascicle node we set a non-zero entry within the sparse tensor, Φ(a, v, f) = 1. After encoding all the fascicle nodes into the sparse tensor Φ, we impose the constraint stated in equation (18) by normalizing the entries of each fascicle slice Φ(:, :, f).
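The two per-node steps can be sketched as follows. This is a simplified NumPy sketch: the helper name, the toy three-atom dictionary and the midpoint/nearest-atom rules are illustrative assumptions, not the released implementation:

```python
import numpy as np

# Hypothetical helper: map each fascicle segment to a voxel index and to the
# nearest dictionary atom (atoms stored as unit column vectors, 3 x Na).
def encode_fascicle(nodes, atoms, grid_shape, voxel_size=1.0):
    entries = []
    for p0, p1 in zip(nodes[:-1], nodes[1:]):
        seg = (p1 - p0) / np.linalg.norm(p1 - p0)    # local orientation
        vox = np.floor((p0 + p1) / 2 / voxel_size).astype(int)
        v = np.ravel_multi_index(tuple(vox), grid_shape)
        a = np.argmax(np.abs(atoms.T @ seg))         # sign-invariant match
        entries.append((a, v))                        # non-zero entry of Phi
    return entries

atoms = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1.0]]).T   # toy dictionary
nodes = np.array([[0.2, 0.5, 0.5], [0.9, 0.5, 0.5], [1.6, 0.5, 0.5]])
entries = encode_fascicle(nodes, atoms, grid_shape=(4, 4, 4))
print(entries)
```

Each returned (a, v) pair marks a non-zero entry Φ(a, v, f) for the encoded fascicle f; normalization would follow as described above.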

Comparison of the multidimensional encoding framework with the standard fascicle representation
It is standard to represent fascicles as an ordered series of x, y, z coordinates. Each coordinate represents one fascicle node, and each fascicle is composed of multiple nodes. Nodes are spaced anywhere between 0.1 and 1 mm apart. Our multidimensional encoding framework captures fascicles in a manner that is similar to, but distinct from, the standard method. Below, we discuss the properties of the multidimensional encoding framework and its advantages and potential disadvantages compared to the standard representation.
• Storage: Supplementary Fig. 2n compares the computer memory required by MatLab to store the fascicles using our framework and the standard method. We plot: (i) the sparse core tensor Φ and (ii) the vectors of 3D points (fiber group). The sparse core tensor requires slightly more memory than the standard format. This is because, whereas each fascicle node is represented in the standard format by three double-precision floating-point numbers, the array encoding in MatLab uses four double-precision floating-point numbers, due to the sparse-array MatLab implementation (Tensor Toolbox 11,12). However, as we discuss below, our tensor encoding allows for very efficient computations.
• Efficient signal prediction model: Our first application in the manuscript shows that the encoding framework allows discretizing the orientations of the diffusion-prediction kernels by using a dictionary of precomputed diffusion tensors. This discretization saves memory when predicting the signal, as compared to the original, matrix-based LiFE model (LiFE_M) 10.
• Efficient computation: An advantage of our encoding method is that it represents, in a compact form, the integration of the dMRI signal, the streamlines and the LiFE model. By virtue of this integration, the sparse tensor representation makes forward-model calculations (based either on the fascicles, the data or the model) more straightforward, by allowing access to streamlined tensor operations such as direct indexing. This is illustrated in the third and fourth applications in the manuscript.
• Sub-voxel information: The encoding framework loses some information about the position of the fascicles within a voxel. Indeed, when mapping fascicle nodes to locations in the array Φ, nodes are assumed to be centered in each voxel. This approach simplifies most mapping calculations. In practice, the effect of this assumption depends on multiple properties of the data, such as its resolution and SNR. Future advances in measurement that improve data SNR or resolution are likely to mitigate the effects of this assumption.
• Fascicle node order: The sparse tensor representation loses the information about the order of each node within a fascicle. However, when needed, this information can be conveniently stored in separate vectors recording node order. In addition, the Φ array could in principle be extended with one additional dimension to map the node order.

Fitting the LiFE T model
Once the LiFE_T model has been built, the final step to validate a connectome requires finding the non-negative weights that least-squares fit the measured diffusion data. This is a convex optimization problem that can be solved using a variety of non-negative least squares (NNLS) optimization algorithms. We used an NNLS algorithm based on first-order methods, specially designed for large-scale problems 13. Hereafter, we show how to exploit the decomposed LiFE_T model in the optimization.

The gradient of the original objective function for the LiFE model can be written as follows:

∇f(w) = M^T (M w − y),

where M ∈ R^{N_θ N_v×N_f} is the original LiFE model matrix, w ∈ R^{N_f} the fascicle weights and y ∈ R^{N_θ N_v} the demeaned diffusion signal. Because the decomposed version does not explicitly store M, below we describe how to perform the two basic operations (y = Mw and w = M^T y) using the sparse decomposition.

Computing y = Mw
The product Mw can be computed using a tensor-by-vector product:

Y = M̂ ×_3 w^T,

where the result Y ∈ R^{N_θ×N_v} is a matrix version of the vector y. Using the LiFE_T model, the product is written as follows:

Y = Φ ×_1 D ×_3 w^T.

In Algorithm 1, we present the steps for computing y = Mw in an efficient way. The product w = M^T y can be computed using LiFE_T in the following way:

w = Φ_(3) (I ⊗ D)^T y,

where ⊗ is the Kronecker product and I is the (N_v × N_v) identity matrix. This equation can also be written as follows 4:

w = Φ_(3) vec(D^T Y),

where vec() stands for the vectorization operation, i.e. converting a matrix to a vector by stacking its columns. Because the matrix Φ_(3) is very sparse, we avoid computing the large and dense matrix D^T Y and compute instead only the blocks that are multiplied by the non-zero entries of Φ_(3). This maintains efficient memory usage and limits the necessary number of CPU cycles. In Algorithm 2, we present the steps for computing w = M^T y in an efficient way.
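Algorithms 1 and 2 can be sketched in NumPy with Φ stored as coordinate lists (an illustrative sketch, not the released MatLab code); the final line checks the two routines against each other through the adjoint identity ⟨Mw, y′⟩ = ⟨w, Mᵀy′⟩:

```python
import numpy as np

# Both operations touch only the non-zero entries of Phi, stored here as
# coordinate lists (a_idx, v_idx, f_idx, vals) with illustrative sizes.
rng = np.random.default_rng(1)
Ntheta, Na, Nv, Nf, nnz = 8, 10, 6, 4, 12
D = rng.random((Ntheta, Na))
a_idx = rng.integers(0, Na, nnz)
v_idx = rng.integers(0, Nv, nnz)
f_idx = rng.integers(0, Nf, nnz)
vals  = rng.random(nnz)

def y_eq_Mw(w):                          # sketch of Algorithm 1
    Y = np.zeros((Ntheta, Nv))
    for a, v, f, x in zip(a_idx, v_idx, f_idx, vals):
        Y[:, v] += x * w[f] * D[:, a]    # add one scaled dictionary atom
    return Y

def w_eq_Mty(Y):                         # sketch of Algorithm 2
    w = np.zeros(Nf)
    for a, v, f, x in zip(a_idx, v_idx, f_idx, vals):
        w[f] += x * (D[:, a] @ Y[:, v])  # only the needed blocks of D^T Y
    return w

w0 = rng.random(Nf)
Yp = rng.random((Ntheta, Nv))
print(np.isclose(np.sum(y_eq_Mw(w0) * Yp), w0 @ w_eq_Mty(Yp)))  # True
```

The adjoint check passes for any weights and signal, since each non-zero entry of Φ contributes the same term to both sides.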

Analysis of the LiFE T model accuracy
We analyze the error introduced by approximating the diffusion signal of a fascicle f by its closest dictionary atom a:

e_f(θ) = |O_f(θ) − D(θ, a)|, (29)

where O_f(θ) is the diffusion signal as defined in Methods, equation (6) (we avoid making reference to the voxel v for clarity), and D(θ, a), defined in equation (17), is the diffusion signal of atom a at gradient direction θ = [θ_x, θ_y, θ_z]^T. By defining v and v_a ∈ R^3 as the unit-norm vectors pointing in the directions of the fascicle f and of its closest dictionary atom a, respectively (see Supplementary Fig. 2f), and using the "stick" model diffusion tensor, i.e. s_a = 1 and s_{r1} = s_{r2} = 0 in equation (14), the diffusion tensors of the associated fascicle f and of its approximation are:

Q_f = v v^T and Q_a = v_a v_a^T. (30)

Using equations (6), (17) and (30) in equation (29), we arrive at an expression for the error in terms of the difference ∆_f between the fascicle direction and its dictionary approximation (equation 31). For a sufficiently small error vector, ∆_f can be approximated to first order (equation 33). By using the facts that |θ^T v| ≤ 1 and ‖θ‖_1 ≤ √3 ‖θ‖, we obtain a bound on ∆_f (equation 34). Finally, using equation (34) in equation (31), we obtain an upper bound for the error in modeling the diffusion signal of one fascicle f, at one gradient direction, in a voxel (equation 36), and using equation (36) in equation (12) we obtain the following upper bound of the model error:

e_M ≤ C / L, (37)

where C is a constant (independent of the discretization parameter L). Equation (37) states that the achievable relative error is inversely proportional to the discretization parameter L, which allows us to make the model as accurate as we want by simply increasing the dictionary resolution, i.e. increasing L.
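The 1/L scaling of the discretization error can be checked numerically. In this sketch, random unit vectors are snapped to an (azimuth, elevation) grid with step π/L, and the worst sampled angular error is compared across two values of L:

```python
import numpy as np

# Random unit vectors on the sphere.
rng = np.random.default_rng(2)
V = rng.normal(size=(3, 20000))
V /= np.linalg.norm(V, axis=0)

def max_snap_error(L):
    """Worst sampled distance between a direction and its grid approximation."""
    step = np.pi / L
    az = np.arctan2(V[1], V[0])
    el = np.arccos(V[2])
    az_s, el_s = np.round(az / step) * step, np.round(el / step) * step
    Vs = np.stack([np.cos(az_s) * np.sin(el_s),
                   np.sin(az_s) * np.sin(el_s),
                   np.cos(el_s)])
    return np.max(np.linalg.norm(V - Vs, axis=0))

e90, e360 = max_snap_error(90), max_snap_error(360)
print(e360 < e90)   # True: quadrupling L shrinks the worst-case error
```

The measured worst-case error shrinks roughly fourfold when L goes from 90 to 360, consistent with the 1/L bound.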

Finally, we note that this discretization of the diffusion-orientation predictions on the sphere (Supplementary Fig. 2f) is non-uniform, which in turn leads to a denser discretization around the poles of the sphere. This is the result of using a uniform sampling along the azimuth and elevation dimensions. This choice improves efficiency in mapping fascicle nodes to dictionary (D) and sparse array (Φ) entries, because we can use an analytic solution to compute the spherical coordinates, map the orientation of each fiber at each node onto the sphere, and identify the corresponding columns in D and entries in Φ.
Yet, because of the non-uniform sampling on the sphere, it is in principle possible that the LiFE_T model could be biased toward supporting fascicles more closely aligned with the poles of the sphere in D, since the dictionary resolution is highest at orientations closer to the poles. We performed an explicit test of whether the discretization biases the model fit. We compared our original results, both r.m.s. error and fascicle weights (w), with new ones obtained by rotating the dMRI image and the fascicles by 90 degrees (around the x axis) without rotating the dictionary. This manipulation changes the position of the higher-density sampling poles relative to the fascicles and the dMRI data. If the sampling density affected the model fit, we would expect either the r.m.s. error or w to differ. The test was performed on 8 subjects (HCP3T and STN) using probabilistic tractography (L_max = 10) and dictionaries of fixed density (L = 360, corresponding to about N_a = 129,241 dictionary entries). The results of this test are presented in Supplementary Fig. 2o and demonstrate a very small effect of the rotation, with less than 0.5% change in r.m.s. error and less than 0.05% change in the assigned weights.

Analysis of the LiFE T model compression factor
Here, we analytically derive the storage requirements of the matrix M in LiFE (Supplementary Fig. 2c; 10) and of its approximation M̂ through the decomposed LiFE_T model (Supplementary Fig. 2d). As in the previous section, we simplify the analysis by assuming that all fascicles have the same number of nodes N_n and that there is no more than one node per fascicle per voxel. Under these ideal assumptions, the amount of memory necessary to store each fascicle f in a sparse matrix M is 3 N_θ N_n, since in a sparse matrix structure three numbers are required for each entry, i.e. the row and column indices plus the entry value. Thus, the storage cost of M is:

S(M) = 3 N_θ N_n N_f.

Conversely, storing fascicles in the LiFE_T model requires 4 N_n values per fascicle plus the dictionary matrix (i.e. the set of the non-zero entries and their locations within the tensor Φ, together with the matrix D). Thus, the amount of memory required by the LiFE_T model is:

S(M̂) = 4 N_n N_f + N_θ N_a,

where N_θ N_a is the storage associated with the dictionary matrix D ∈ R^{N_θ×N_a}. The compression factor can be straightforwardly computed as:

CF = S(M)/S(M̂) = 3 N_θ N_n N_f / (4 N_n N_f + N_θ N_a). (40)

Given that usually 4 N_n N_f ≫ N_θ N_a, the compression factor can be estimated^b as:

CF ≈ (3/4) N_θ. (41)

Equation (41) states that the compression factor is proportional to the number of directions N_θ, which represents a substantial reduction in memory requirements for modern datasets.

^b We note that our experimental results in Fig. 2d showed a compression factor slightly lower than this theoretical estimate, because the sparse matrix format implemented in Matlab 14 is relatively more efficient than the sparse array format used in the Matlab Tensor Toolbox 12.
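Plugging in illustrative values (the acquisition and connectome sizes below are assumptions for the sake of the example) shows how close equation (40) gets to the (3/4)·N_θ estimate:

```python
# Worked example of the compression-factor estimate CF ~ (3/4) * Ntheta.
# All sizes below are illustrative assumptions, not measured values.
Ntheta = 96        # gradient directions
Nf     = 500_000   # fascicles in a candidate connectome
Nn     = 100       # nodes per fascicle (simplifying assumption)
Na     = 129_241   # dictionary atoms for L = 360

storage_M    = 3 * Ntheta * Nn * Nf          # sparse matrix M
storage_life = 4 * Nn * Nf + Ntheta * Na     # sparse Phi plus dictionary D
cf = storage_M / storage_life
print(round(cf, 1), 0.75 * Ntheta)           # 67.8 vs the 72.0 upper estimate
```

The dictionary term N_θ N_a keeps the exact factor slightly below the (3/4)·N_θ estimate, mirroring the gap discussed in footnote b.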

Supplementary Fig. 2 (caption, continued). (e) The LiFE_T model: the predicted signal is written as a sparse Tucker decomposition, using the sparse core tensor Φ ∈ R^{N_a×N_v×N_f} multiplied in mode-1 by the dictionary matrix D ∈ R^{N_θ×N_a} and in mode-3 by the vector of weights w ∈ R^{N_f}. (f) Error introduced by the discretization. Left: a regular grid on the sphere is obtained by discretizing the space in spherical coordinates (α, β). Right: the maximum distance between the fascicle orientation vector v and its approximation v_a is inversely proportional to the parameter L. (g-i) The function e_M ≈ C/L was fitted to the data, resulting in an estimated value C = 28.22 and a fitting error of 3.03% (relative root squared error); for L ≥ 360, the model error is smaller than 0.1%. (j) By defining common (orange) and different (green and blue) subsets of indices within the vectors w and ŵ, the squared weights error is decomposed as the sum of the squared errors associated with the common and different fascicles (see equation 21). (k) The squared errors for common fascicles, e²_{w_c} (top), and different fascicles, e²_{w_d} (bottom), computed for the STN, STN150 and HCP3T datasets. While the total squared error e²_w stays below 0.2%, it is larger in the deterministic case than in the probabilistic case. However, the error associated with the different fascicles is very small compared to that of the common fascicles, which means that both models essentially use the same subset of fascicles. (l, m) Model size (GB) scales linearly with the number of directions N_θ and the number of fascicles N_f, but it increases much faster in the LiFE model than in the LiFE_T model. (n) Comparison of the memory used in Matlab to store a connectome as a fiber group (points in 3D space) and as a sparse core tensor Φ, computed on 8 subjects (HCP3T and STN) using probabilistic tractography (L_max = 10).
Even though the core tensor requires moderately more memory, our decomposed tensor model provides the information on the relation between fascicles and diffusion signal. (o) To test for a bias due to the non-uniform distribution of dictionary elements, we computed the relative error on the weights, e_w, and the relative difference in r.m.s. values (defined as |e_rms − ê_rms| / e_rms), comparing the original results with those obtained after rotating the diffusion data and the fascicles by 90 degrees. These results show that the difference is very small (less than 0.5% and 0.05% for the weights and the r.m.s. error, respectively).

Major human tracts from LiFE-optimized connectomes (fascicles with nonzero weights) using probabilistic tractography (L_max = 10), for two subjects in the STN dataset and two subjects in the HCP3T dataset. Results obtained by repeating the tractography and optimization ten times are shown in different columns. Connectomes are anatomically discriminable across subjects and datasets (rows) while preserving the anatomy across repetitions of the LiFE evaluation (columns).
Fig. 4. The strength of evidence S is a measure of the distance between the distributions of r.m.s. errors in voxels for the model with and without a virtual lesion (see 10 for details). Similarly to the Earth Mover's Distance measure (Fig. 4d), the strength of evidence is positive for the major tracts, which replicates the statistical evidence for major tracts reported in 10.