Capturing functional connectomics using Riemannian partial least squares

For neurological disorders and diseases, functional and anatomical connectomes of the human brain can be used to better inform targeted interventions and treatment strategies. Functional magnetic resonance imaging (fMRI) is a non-invasive neuroimaging technique that captures spatio-temporal brain function through change in blood-oxygen-level-dependent (BOLD) signals over time. FMRI can be used to study the functional connectome through the functional connectivity matrix; that is, Pearson’s correlation matrix between time series from the regions of interest of an fMRI image. One approach to analysing functional connectivity is using partial least squares (PLS), a multivariate regression technique designed for high-dimensional predictor data. However, analysing functional connectivity with PLS ignores a key property of the functional connectivity matrix; namely, these matrices are positive definite. To account for this, we introduce a generalisation of PLS to Riemannian manifolds, called R-PLS, and apply it to symmetric positive definite matrices with the affine invariant geometry. We apply R-PLS to two functional imaging datasets: COBRE, which investigates functional differences between schizophrenic patients and healthy controls, and; ABIDE, which compares people with autism spectrum disorder and neurotypical controls. Using the variable importance in the projection statistic on the results of R-PLS, we identify key functional connections in each dataset that are well represented in the literature. Given the generality of R-PLS, this method has the potential to investigate new functional connectomes in the brain, and with future application to structural data can open up further avenues of research in multi-modal imaging analysis.


Introduction
The functional and anatomical connections of the human brain form complex networks that link the infrastructure of our minds.
Understanding these connectomes has the potential to provide insight into the effect of neurological diseases which can be used to better inform targeted interventions and treatment strategies 1,2 .In particular, the functional connectome can shed new light onto neurological conditions such as schizophrenia and autism spectrum disorder (ASD), two conditions that alter brain function from healthy, neurotypical controls 3,4 .
A popular approach used to investigate brain function is functional magnetic resonance imaging (fMRI), a non-invasive neuroimaging technique that measures blood flow through the brain over time 5 .An fMRI image is a complex spatio-temporal picture of the brain with voxels (volumetric pixels) describing the spatial location and a time series for each voxel describing the blood flow over time.To reduce the spatial complexity, voxels can be collated into user-specified regions of interest (ROIs).Functional connectomes can then be investigated through the Pearson correlation matrix between ROIs, known as the functional connectivity matrix.
One approach to investigating functional connectivity is using the partial least squares (PLS) regression method.Introduced by Wold (1975)  6 for use in chemometrics, PLS is an extension of multivariate multiple regression to high-dimensional data that predicts the response data from a set of lower-dimensional latent variables constructed from the predictor data.Popularised for fMRI by McIntosh et.al. (1996)  7 , PLS has been used to explore the relationships between fMRI data and either behavioural data, experimental designs, or seed region activation 8 .However, standard PLS ignores the structure of functional connectivity data -functional connectivity matrices are correlation matrices and hence positive definite.
The space of R × R symmetric positive definite matrices -which includes functional connectivity matrices -forms a convex cone in R(R + 1)/2-dimensional Euclidean space.However, when considered with the affine invariant geometry 9 , the space of symmetric positive definite matrices becomes a complete Riemannian manifold with non-positive curvature.By considering this non-linear geometry on symmetric positive definite matrices we can glean interesting new insights into functional connectivity (see Pennec et.al. (2019)  10 and citations therein).
Here we propose an extension of the PLS model to allow Riemannian manifold response and predictor data, which we call Riemannian partial least squares (R-PLS).The R-PLS model then allows us to predict from functional connectivity data while accounting for the intricate relationships enforced by the positive definite criteria.To fit the R-PLS model, we propose the arXiv:2306.17371v1[stat.ME] 30 Jun 2023 tangent non-linear iterative partial least squares (tNIPALS) algorithm, which is related to previously proposed applications of PLS for functional connectivity data in the literature [11][12][13][14] .We determine the optimal number of latent variables using cross validation.To aid in interpretability of the high-dimensional functional connectivity data, we determine significant functional connections identified by R-PLS using permutation tests on the variable importance in the projection (VIP) statistic 15 , a popular measure of variable importance from standard PLS.
We apply R-PLS to two datasets and two different ROI atlases to demonstrate its versatility.First is the COBRE dataset 16 which investigates differences in functional connectivity between health controls and patients with schizophrenia; we consider the fMRI in the COBRE dataset in the functional multi-subject dictionary learning (MSDL) atlas 17 .Second is the ABIDE dataset 18 which investigates differences in functional connectivity between typical health controls and subjects with ASD; we consider the ABIDE data in the automated anatomic labelling (AAL) atlas 19 .

COBRE
Ten-fold cross validation showed that K = 2 latent variables was the most parsimonious, within one standard error of the minimum root mean square error (RMSE) (K = 3).When compared with Euclidean PLS using raw and Fisher-transformed correlations, R-PLS outperformed both methods across all metrics except for specificity in group prediction (Table 1) .However, all three methods produced similar results for every metric.
A permutation test of the VIP statistic (Equation 5) with 200 permutations found 45 significant functional connections between ROIs as being predictive of age and subject group (Figure 1).To aid interpretability, we have reduced the 39 ROIs of the MSDL atlas into the 17 resting state networks associated to the atlas 20 by taking the mean coefficient value within the ROIs of each network, as suggested by Wong et.al. (2018) 11 .
An increase in subject age tended towards a decrease of within-network connectivity (as measured by a mean decrease in functional connectivity within-networks) with particular emphasis on the auditory network, cingulate insula, and left and right ventral attention networks (Figure 1 (left)).Further, increased age was associated with an increase in between-network connectivity with focus on connectivity involving the cingulate insula and the motor network.
For subjects in the schizophrenic group, the basal ganglia exhibited both increased and decreased connectivity with other networks (Figure 1 (right)).In particular, there was a decrease in connectivity between the basal ganglia and the cerebellum and salience networks, whereas we observed an increase in connectivity between the basal ganglia and auditory and language networks for the schizophrenic group.We also note that the default mode network was highly discriminatory for the schizophrenic group showing increased within-network connectivity and both increased and decreased between-network connectivity.

ABIDE
Ten-fold cross validation found K = 3 latent variables was the most parsimonious, within one standard error of the minimum RMSE (K = 6).When compared with Euclidean PLS using the raw and Fisher-transformed correlations, R-PLS outperformed both methods across all metrics except for specificity in group classification (Table 1).In particular, the R 2 value for R-PLS was substantially larger than the Euclidean methods.
A permutation test of the VIP statistic (Equation 5) with 200 permutations found 208 significant functional connections between ROIs as being predictive of age, subject group, sex and eye status (Figure 2).We aid interpretability by associating the 116 ROIs of the AAL atlas to the seven resting-state networks suggested by Parente and Colosimo (2020) 21 and an eighth containing the cerebellum and vermis, which we call the cerebellum network.
In the ABIDE dataset, increased age was associated to both increased and decreased functional connectivity within restingstate networks (Figure 2 (a)).Although we observed increased between-network connectivity for the thalamus and occipital networks, the cerebellum and default mode network exhibited decreased between-network connectivity with age.
For subjects with ASD we observed increased within-network connectivity with the exception of the limbic network and the thalamus (Figure 2 (b)).We also observed decreased between-network connectivity particularly for the cerebellum and the limbic networks.We observed the same connectivity patterns for subject sex (Figure 2 (c)) For subjects with their eyes closed, our model suggests there was decreased within-network connectivity (Figure 2 (d)).With the exception of the default mode network and the limbic network, we saw decreased between-network connectivity with particular emphasis on the occipital network.

Discussion
The R-PLS model has identified many functional connections associated to age, ASD, schizophrenia, sex, and eye status that are well represented in the literature.In both the COBRE and ABIDE datasets, we identified the reduction of within-network connectivity with age that has been previously observed [22][23][24] , with exceptions in the temporo-parietal, fronto-parietal, limbic and thalamus networks in the ABIDE dataset and the salience network in the COBRE dataset, which all show an increase in connectivity with age.Further, both datasets exhibit the decreased connectivity with the default mode network, consistent with existing literature 25,26 .
For subjects with ASD, the decreased connectivity with the cerebellum 27 and the limbic 28 networks have been previously observed.However, the decreased between-network connectivity suggested by R-PLS is in contradiction with existing literature 11,29 ; in particular, Wong et.al. (2018) 11 showed an increase in between-network connectivity associated to ASD on the full ABIDE dataset using logistic regression.Also, observe that the connectivity for subject sex is highly correlated with the connectivity for the ASD group.Although interactions between subject sex and ASD have been identified 30 , we believe this highlights a possible limitation of R-PLS and requires further investigation in future research.
The role of the basal ganglia in schizophrenic patients has been previously observed, particularly the decrease in connectivity between the salience network and the basal ganglia 31,32 and the decreased connectivity between the cerebellum and basal ganglia 33 .Further, the connectivity patterns involving the default mode network have been previously reported in schizophrenic patients [34][35][36][37][38] .
The results for eye status during scan are also well represented in the literature.The decreased within-network connectivity for the default mode network for patients with closed eyes has been previously reported by Yan et.al. (2009) 39 , and the increased between-network connectivity for the default mode network has recently been discussed by Han et.al. (2023) 40 .Further, the observed decrease in connectivity for the occipital network agrees with Agcaoglu et.al. (2019) 41 .
The use of the VIP statistic to identify significant connections in functional connectivity has not been previously studied.We have demonstrated that this statistic can identify many functional connections that have been addressed previously in the literature, but it is not without its limitations.First, with our focus on generalising partial least squares to Riemannian manifolds, the VIP statistic does not take into account the Riemannian geometry we are considering.This is mitigated by the tangent space approximation we are performing, which directly accounts for the geometry of the data, but further research could help better generalise the VIP statistic for R-PLS.Further, the VIP statistic associates the effects of a single predictor on the full multivariate response.In situations like we consider here, this makes it difficult to determine which functional connections are associated to which outcome variable.For example, the connectivity within the default mode network is deemed significant by the VIP statistic in the ABIDE dataset, but it is unclear whether this connectivity is significance for every outcome variable or a subset of them.Work has been done to generalise the VIP statistic when the outcome variable is multivariate 42 , but further research is needed to investigate this generalisation.
These results suggest that R-PLS can provide insight into the functional connectome and how it relates to subject phenotype data.Further, due to the specification and generality of the R-PLS model, this method is readily applicable to other imaging modalities, and in particular to multimodal imaging studies.The application of R-PLS to multimodal imaging studies is an area of future research that may help to us to understand the functional networks that make up the human connectome.

Data
The International Neuroimaging Data-Sharing Initiative (INDI) is an initiative set to encourage free open access to neuroimaging datasets from around the world.We consider two datasets that are accessible as a part of the INDI.

COBRE
The Center for Biomedical Research Excellence (COBRE) have contributed structural and functional MRI images to the INDI that compare schizophrenic patients with healthy controls 16 .The data were collected with single-shot full k-space echo-planar imaging with a TR of 2000 milliseconds, matrix size of 64 × 64 and 32 slices (giving a voxel size of 3 × 3 × 4 mm 3 ).These data were downloaded using the PYTHON package nilearn v 0.6.2, and contains 146 subjects (Control = 74), each with phenotype information on subject group and age; further information is available in Table S1 of the supplementary material.
The fMRI data were preprocessed using NIAK 0.17 under CentOS version 6.3 with Octave version 4.0.2 and the Minc toolkit version 0.3.18 43.The data were subjected to nuisance regression where we removed six motion parameters, the frame-wise displacement, five slow-drift parameters, average parameters for white matter, lateral ventricles, and global signal, as well as 5 estimates for component based noise correction 44 .
For the COBRE dataset, we consider each fMRI in the MSDL atlas, a functional ROI decomposition of 39 nodes across 17 resting state networks 20 .Time series were extracted for each ROI by taking the mean time series across the voxels in each region.

ABIDE
The Autism Brain Imaging Data Exchange (ABIDE) is part of the Preprocessed Connectomes Project in INDI 18 .The ABIDE data is a collection of preprocessed fMRI images from 16 international imaging sites with 539 individuals diagnosed with ASD and 573 neurotypical controls (NTC).The ABIDE initiative provides data preprocessed under four separate standard pipelines, as well as options for band-pass filtering and global signal regression.
Here we consider the 172 subjects (NTC = 98) of the New York University imaging site.We restrict to this site to reduce inter-site variation in imaging and because it is the largest individual imaging site.The data were collected with a 3 Tesla Allegra MRI using echo-planar imaging with a TR of 2000 milliseconds, matrix size of 64 × 64 and 33 slices (giving a voxel size of 3 × 3 × 4mm 3 ).The fMRI data were downloaded using the PYTHON package nilearn v 0.6.2preprocessed using the NIAK 0.7.1 pipeline 43 .The data were subjected to: motion realignment; non-uniformity correction using the median volume; motion scrubbing; nuisance regression which removed the first principal component of 6 motion parameters, their squares, mean white matter and cerebrospinal fluid signals, and low frequency drifts measured by a discrete cosine basis with a 0.01 Hz high-pass cut-off; band-pass filtering and; global signal regression.We consider the subjects preprocessed fMRI as well as subject group, age, sex, and eye status during scan (open or closed); further information is available in Table S2 of the supplementary material.
For the ABIDE dataset, we consider each fMRI in the AAL atlas 19 , an anatomical atlas of 116 nodes across the brain.Time series were extracted by taking the mean time series across the voxels in each ROI.

Partial least squares in Euclidean space
PLS is a predictive modelling technique that predicts a response matrix Y n×q from a set of predictors X n×p .Originally introduced in the chemometrics literature by Wold (1975)  6 , PLS has found application in bioinformatics 45 , social sciences 46 , and neuroimaging 8,47,48 ; see Rosipal and Krämer (2006)  49 and citations therein for further examples.As an extension of multivariate multiple regression, PLS has been shown to have better predictive accuracy than multivariate multiple regression when the standard regression assumptions are met 50 .A further advantage of PLS is that it is effective when q > n or p > n since it performs prediction from lower dimensional latent variables, that is, PLS constructs a new set of predictor variables from X to predict Y 50 .
Let X n×p and Y n×q be predictor and response matrices respectively.Suppose X and Y are column centred, that is, suppose the means of each column of X and Y are 0. PLS proposes the existence of L ≤ min{p, n} latent variables such that X and Y decompose into a set of scores matrices T n×L and U n×L , and loadings matrices P p×L and Q q×L with where E n×p and F n×q are error matrices, assumed to be a small as possible 51 , and the superscript T denotes the matrix transpose.Further, PLS assumes that there is a diagonal matrix B L×L with where H is a matrix of residuals.Equations 1 and 2 are called the outer relationships while Equation 3 defines the inner relationship that connects X and Y .Combining the inner relationship and the outer relationship for Y gives which highlights that Y is a regression on the latent scores T .Further, notice that the error in Y is given by HQ T + F, that is, error in Y is a combination of error inherent to the response data (F) and error from the estimation of the inner relationship (HQ T ).The inclusion of the residual matrix H can complicate discussion of the PLS method, so it is common to consider the estimated inner relationship Û ≈ T B instead 51,52 .
Estimation of the PLS model (Equations 1-3) is commonly done through the non-linear iterative partial least squares (NIPALS) algorithm (Algorithm S1 in the supplementary material).The inputs for the NIPALS algorithm are the data matrices X and Y and the pre-specified number of latent variables K; noting that the true number of latent variables L is unknown, the value K can be chosen with methods such as cross validation.The NIPALS algorithm outputs estimates of the scores, loadings, and regression coefficients as well as matrices W p×K and C q×K known as the weights.The weight matrices W and C are linear transformations of P and Q that more efficiently fit the PLS model and are defined within the NIPALS algorithm; see the supplementary material for further information.Using the results of the NIPALS algorithm and Equations 1-3, we can write is the matrix of regression coefficients.Using β PLS we see that PLS is a linear regression technique similar to ordinary least squares and ridge regression.

4/15
The VIP statistic To determine significant predictors of the response variables in the PLS model, we use the VIP statistic 15 .Suppose there are p predictor variables, q response variables, and K latent variables extracted using NIPALS.Following Tennenhaus (1998) 53 , the VIP statistic for the j th predictor variable is where t k is the k th column of the score matrix T , w jk is the k th weight for the j th predictor, Rd(Y ,t k ) = 1 q q ∑ i=1 cor(Y i ,t k ) 2 , and Rd(Y ,t k ).The coefficient cor(Y i ,t k ) 2 is the squared correlation between the i th response variable and the k th score.The denominator Rd(Y , T ) in Equation 5measures the proportion of variance in Y explained by T , and the numerator Rd(Y ,t k )(w jk ) 2 measures the proportion of variance in Y described by the k th latent variable that is explained by the j th predictor 54 .Thus the VIP statistic measures the influence of each predictor on the explained variation in the model 55 .
Commonly, the "greater than one" rule is used to find predictors significantly associated with the response.However, this rule is motivated by the mathematical properties of VIP j rather than statistical properties 54 .Thus, we use a permutation test to determine significance of VIP j .This is an alternative to Afanador et.al. (2013) 56 who used 95% jackknife confidence intervals to determine significance of VIP.
Specifically, for each predictor variable j we permute the values H times.For each permutation h = 1, 2, . . ., H we refit the PLS model and calculate VIP j,h .The P-value for the j th VIP score is then For our data, the predictors are functional connectivity matrices.Thus, we know a priori that the diagonal elements are uninformative since they are identically one.Hence, if predictor j describes a diagonal element we set P-value j = 1 for all i.To account for the multiple comparisons problem, we adjust all P-values using the false discovery rate 57 and determine significance at a significance level of α = 0.05.

Riemannian manifolds
Intuitively speaking, a Riemannian manifold M is a space where we can perform calculus, measure distances, and measure angles between tangent vectors.More specifically, a smooth d-dimensional manifold M is a connected, Hausdorff, second countable topological space that is covered by a set of coordinate charts {(U i , ϕ i : U i → R d )} i∈I , defined by some indexing set I, such that every point in M belongs to a U i for some i ∈ I and the intersection maps ϕ i • ϕ −1 j are smooth as maps R d → R d for every i, j ∈ I.These coordinate charts make the space M "locally Euclidean" in the sense that every point has a neighbourhood that looks like Euclidean space.Since concepts from differential calculus are local in nature, the construction of a smooth manifold allows us to perform calculus on these more general spaces.
An important concept in the study of manifolds is the tangent bundle T M = a∈M T a M, where T a M is the tangent space at a.The space T a M is defined as the set of equivalence classes of curves through a such that γ 1 and γ 2 are equivalent if γ 1 (0) = γ 2 (0), where the prime denotes the derivative.Then T a M is a vector space that generalises the notion of vectors tangent to a surface to arbitrary smooth manifolds.
A Riemannian manifold is a manifold M together with a smooth map g : is an inner product for every a ∈ M. The Riemannian metric g allows us to measure angles between tangent vectors and measure distances between points on the manifold M. Further, g is used to define geodesics (locally length minimising curves) γ : [t 0 ,t 1 ] → M between two points a, b ∈ M. We only consider complete Riemannian manifolds here, which are spaces where every geodesic γ has domain R.
Through geodesics we get the concepts of the Riemannian exponential and logarithm maps which allow us to smoothly move between the manifold and the tangent space.The Riemannian exponential at a point a ∈ M is a map Exp a : T a M → M defined by Exp(a, •)(γ) = Exp a (γ) = γ(1), where γ is a geodesic such that γ(0) = a.The Riemannian exponential is a smooth map that is locally diffeomorphic and hence has a local inverse denoted Log(a, •) = Log a : M → T a M defined by Log a (b) = γ (0) where γ(t) is a geodesic from a to b.For a point b ∈ M close to a, we think of Log a (b) as the shortest initial velocity vector based at a pointing in the direction of b.Further information on Riemannian manifolds can be found in the books by Lee (2011, 2012, 2018) [58][59][60] or do Carmo (1992) 61 .An accessible introduction for medical imaging can be found in the book edited by Pennec et.al. (2019) 10 .

Fréchet mean
To capture the centre of data on a manifold we consider the Fréchet (or intrinsic) mean of data X 1 , X 2 , . . ., X n ∈ M. First, consider the Riemannian distance between two close points X 1 , X 2 ∈ M defined by where • is the norm in T X 1 M induced by the Riemannian metric.By generalising the sum of squared distances definition of the arithmetic mean, the Fréchet mean 62 is given by We solve for µ X using gradient decent 10 ; see Algorithm S2 in the supplementary material for further information.

The affine invariant geometry for symmetric positive definite matrices
Let GL R R be the set of R × R real invertible matrices.The set of symmetric positive definite matrices is defined by where superscript T denotes matrix transpose.The set S + R is a smooth manifold, which can be easily seen by embedding S + R into R R(R+1)/2 as a convex cone.This construction shows that the tangent space at each A ∈ S + R is given by the set of symmetric R × R matrices.
However, S + R has an interesting intrinsic geometry known as the affine-invariant geometry 9 .Under the affine invariant geometry S + R becomes a complete Hadamard manifold -a Riemannian manifold of non-positive curvature where Exp A is a diffeomorphism for every A ∈ S + R .The affine-invariant metric g is defined by where A ∈ S + R , U,V ∈ T A S + R , and Tr denotes the trace operator.Using g, we can calculate the Riemannian distance between A, B ∈ S + R as where σ r A −1/2 BA −1/2 are the eigenvalues of A −1/2 BA −1/2 , r = 1, 2, . . ., R. Further, letting A, B ∈ S + R and U ∈ T A S + R , we get where Exp and Log are the matrix exponential and logarithm respectively.The Riemannian distance, exponential, and logarithm are essential in the definition and fitting of the R-PLS model defined below.

6/15
where e i ∈ T Exp µ X (∑ L l=1 t il p l ) M and f i ∈ T Exp µ Y (∑ L l=1 u il q l ) M are error vectors with e i , f i small.Equations 7 and 8 are the outer relationships for Riemannian data, and Equation 9 is the inner relationship connecting our response and predictor.Note that, since the Riemannian exponential map on Euclidean space is vector addition, if M = R p and N = R q the R-PLS model (Equations 7-9) reduce to the standard PLS model (Equations 1-3).
One approach to fitting R-PLS is by directly generalising NIPALS (Algorithm S1) to Riemannian manifolds (see, for example, Ryan (2023) 42 ), but this becomes computationally intensive and fails to converge for sample sizes above 20.Instead, we propose a tangent space approximation to fitting R-PLS when our data is close to the Fréchet mean, similar to methods such as Riemannian canonical correlations analysis 63 and principal geodesic analysis 64 .
The tNIPALS algorithm (Algorithm 1) works by first linearising the manifold data in a neighbourhood of the Fréchet mean using the Riemannian logarithm (see supplementary material for further information), and then applying the Euclidean NIPALS algorithm to the linearised data which is now vector-valued.Thus, tNIPALS provides a combination of the simplicity and efficiency of Euclidean NIPALS with the geometry of the Riemannian manifold.
The tNIPALS algorithm provides a more general approach to Wong et.al.'s (2018) 11 method for constructing predictors from functional connectivity matrices to predict ASD using PLS and logistic regression by considering a Euclidean response and symmetric positive definite predictor.Similarly, the methods of Zhang and Liu (2018) 13 and Chu et.al. (2020) 12 are also generalised by tNIPALS.The tNIPALS algorithm for R-PLS is also closely related to the PLS method for symmetric positive definite matrices offered by Perez and Gonzalez-Farias (2013) 14 .

Model fitting
For each dataset we predict the phenotype information (age, group, sex, eye status) from the functional connectivity data using the R-PLS model.To deal with low-rank functional connectivity matrices in the ABIDE dataset (which are not positive definite), we consider regularised functional connectivity matrices F = F + I following Venkatesh et.al. (2020) 65 , where I is the 116 × 116 identity matrix.For comparison, we also fit the standard PLS model using the upper triangle of the functional connectivity matrices as the predictors (raw correlations), as well as their Fisher transformed values (Fisher correlations).
We determine the optimal number of latent variables through ten-fold cross validation using the "within one standard error" rule 66 when minimising the root mean square error.Due to the interest in the COBRE and ABIDE datasets in investigating the differences between healthy controls and patients, we also present the group classification metrics of accuracy, sensitivity, and specificity.
To investigate the functional connectomes associated to each phenotype variable, we consider the regression coefficient matrix β PLS (Equation 4) where the i th column represents the effect of the functional connectivity matrix on the i th response variable.We visualise the columns of the matrix β PLS as symmetric matrices in the tangent space of the Fréchet mean for each dataset.as well as the cerebellum network comprising the cerebellum and vermis.(a) shows the coefficients that predict age, (b) shows the coefficients that predict group, (c) shows the coefficients that predict sex, and (d) shows the coefficients that predict eye status.The darker outlined boxes show the top 25% influential regions as measured by the absolute coefficient value within and between each network.

Figure 2 :
Figure 2: Significant regression coefficients as measured by VIP for the R-PLS model on the ABIDE dataset with K = 3 latent variables, visualised as symmetric matrices.The regression coefficients have been averaged over the 7 resting state networks identified by Parente and Colosimo (2020)21 as well as the cerebellum network comprising the cerebellum and vermis.(a) shows the coefficients that predict age, (b) shows the coefficients that predict group, (c) shows the coefficients that predict sex, and (d) shows the coefficients that predict eye status.The darker outlined boxes show the top 25% influential regions as measured by the absolute coefficient value within and between each network.

Table 1 .
Mean (SE) 10-fold cross validation results for R-PLS on the COBRE and ABIDE datasets, and Euclidean PLS using the raw and Fisher transformed correlations.The value K represents the optimal number of latent variables for each model using the within one standard error rule.The full model metrics are the multivariate R 2 and RMSE.The group classification metrics look at the classification for subject group only.R-PLS is the best model for both datasets over all model metrics, except for specificity (bold values).Tangent non-linear iterative partial least squares.Input:Data X 1 , X 2 , ..., X n , Y 1 ,Y 2 , ...,Y n , Desired number of components K. Output: PLS weights {w k } K k=1 , {c k } K k=1 , Scores {t k } K k=1 , {u k } K k=1 , Loadings {p k } K k=1 ,and Regression coefficients{ β1k } K k=1 . 1 Calculate Fréchet means µ X , µ Y (Algorithm S2 * );2 Linearise the data by;3 x i ← Log µ X X i ; 4 y i ← Log µ Y Y i ; 5 Map x i , y i to Euclidean space via coordinates φ on T µ X M and ψ on T µ Y M; 6 Perform NIPALS (Algorithm S1 * ) on {(x i , y i )} to get weights {w k } K k=1 , {c k } K k=1 , scores {t k } K k=1 , {u k } K k=1 , loadings {p k } K k=1, and regression coefficients { β1k } K k=1 ; 7 Map w k , c k and p k back to their appropriate tangent spaces using φ −1 and ψ −1 .