Iterative consensus spectral clustering improves detection of subject and group level brain functional modules

Specialized processing in the brain is performed by multiple groups of brain regions organized as functional modules. Although in vivo studies of brain functional modules involve multiple functional Magnetic Resonance Imaging (fMRI) scans, the methods used to derive functional modules from functional networks of the brain ignore individual differences in the functional architecture and use incomplete functional connectivity information. To correct this, we propose an Iterative Consensus Spectral Clustering (ICSC) algorithm that detects the most representative modules from individual dense weighted connectivity matrices derived from multiple scans. The ICSC algorithm derives group-level modules from the modules of multiple individuals by iteratively minimizing the consensus-cost between the two. Using resting-state fMRI scans of 589 subjects from the Human Connectome Project, we demonstrate that the ICSC algorithm can be used to derive biologically plausible group-level (for multiple subjects) and subject-level (for multiple scans of one subject) brain modules. We employed a multipronged strategy to show the validity of the modularizations obtained from the ICSC algorithm. We show heterogeneous variability in the modular structure across subjects, with modules involved in visual and motor processing being highly stable across subjects. Conversely, we found lower variability across scans of the same subject. We compared the performance of our algorithm with existing functional brain modularization methods and show that our method detects group-level modules that are more representative of the modules of multiple individuals. Finally, experiments on synthetic images quantitatively demonstrate that the ICSC algorithm detects group-level and subject-level modules accurately under varied conditions. Therefore, besides identifying functional modules for a population of subjects, the proposed method can be used for applications in personalized neuroscience.
The ICSC implementation is available at https://github.com/SCSE-Biomedical-Computing-Group/ICSC.


Modules for multiple folds of HCP dataset
To assess the reproducibility of the ICSC modularization on subsets of the dataset, we divided the HCP dataset into three folds such that the average age of the subjects and the gender ratio were maintained in each fold. We performed 20 independent runs on the subject data in each fold and selected the run with the optimal consensus-cost. We then computed the adjusted mutual information (AMI) and the normalized mutual information (NMI) between the group-level modularizations obtained with the different folds and with the complete dataset (Fig. S2). The modularizations obtained from the different folds were very similar not only to each other (AMI = 0.87, NMI = 0.91) but also to the one obtained with the whole dataset (AMI = 0.92, NMI = 0.94).
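As an illustration of how two modularizations can be compared, the following minimal sketch computes the NMI between two label vectors using only the Python standard library. The normalization by the arithmetic mean of the two entropies is an assumption, since the text does not state which variant was used, and the function names are illustrative rather than taken from the ICSC implementation. AMI additionally subtracts the mutual information expected by chance and is not reproduced here.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label vectors of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the entropies (assumed variant)."""
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both partitions place every node in a single module
    return mutual_information(a, b) / ((h_a + h_b) / 2.0)


# Toy label vectors (hypothetical, not taken from the HCP data).
same_up_to_relabeling = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because NMI depends only on how nodes are grouped, relabeling the modules leaves the score unchanged, which is why it is suitable for comparing modularizations obtained from different folds.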
Choosing $L_{\max}$ for ICSC
The ICSC algorithm expects the user to know the range of the expected number of modules, given by $(L_{\min}, L_{\max})$. While the lower limit $L_{\min}$ can be fixed based on prior scientific knowledge, the choice of $L_{\max}$ involves a trade-off between running time and optimization of the associated quality function. Values of $L_{\max}$ that are smaller than optimal limit the options available for individual-level modularizations and generate sub-optimal group-level partitions, whereas values that are larger than optimal lead to more $s_k(l)$ being evaluated and a longer time to converge.
We demonstrate this by performing 10 independent runs for each $L_{\max} \in \{20, 25, 30, 35, 40\}$ and observing the consensus-cost and the distribution of the number of individual-level modules $\{L_k\}_{k=1}^{K}$. For $L_{\max} \in \{20, 25\}$, we found a sharp drop in the $\{L_k\}$ histogram, indicating that several individual-level modularizations took sub-optimal values of $L_k$. For $L_{\max} \in \{30, 35, 40\}$, we observed a smooth distribution of $\{L_k\}$. The consensus-cost was found to be optimal for $L_{\max} = 30$, with the average number of iterations ranging from 7.0 ± 2.9 for $L_{\max} = 20$ to 32.4 ± 20 for $L_{\max} = 40$.
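One simple way to look for the truncation described above is to tabulate the individual-level module counts and check how many of them sit exactly at the upper limit. The sketch below is only an illustration: the function name, the use of the fraction at the cap as a diagnostic, and the toy values are assumptions and are not part of the ICSC implementation.

```python
from collections import Counter


def module_count_histogram(module_counts, l_max):
    """Return the histogram of L_k and the fraction of individuals with L_k == l_max."""
    if not module_counts:
        raise ValueError("module_counts must not be empty")
    hist = Counter(module_counts)
    fraction_at_cap = hist.get(l_max, 0) / len(module_counts)
    return dict(sorted(hist.items())), fraction_at_cap


# Hypothetical module counts for six individuals, evaluated with an upper limit of 20.
hist, fraction_at_cap = module_count_histogram([18, 19, 20, 20, 20, 20], l_max=20)
```

A large fraction of individuals at the cap suggests that the histogram is cut off by the upper limit rather than tapering off on its own, which corresponds to the sharp drop reported for the two smallest limits.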

Intra-subject variability of functional modules
Studies have shown variability in regional functions across multiple scans of the same individual, which is attributable to noise and to different states of mind during fMRI acquisition. We measured the variability of the modular memberships of brain regions across multiple scans of the same subject using the nodal purity. We found a narrow spread of nodal purity values (average 0.501 ± 0.043) across the subjects. The intra-subject nodal purities corresponding to different anatomical locations in the brain are shown in Fig. S4, where the sizes of the nodes correspond to the values of purity. All nodes had similar purity, in contrast to the inter-subject nodal purity, where nodes showed pronounced differences. Intra-subject purity scores for the different ROIs, along with their anatomical coordinates, are provided in the file 'Supplement nodal purity.csv'.

Using the ICSC algorithm on synthetic data
To quantitatively assess the ICSC algorithm, we generated synthetic data with a known ground-truth modular structure mimicking human brain functional connectivity. To evaluate the performance in detecting group-level modularizations, we generated data for multiple individuals with varying amounts of noise and inter-subject variability in modular assignment. We found that the ICSC algorithm can detect group-level modularizations close to the ground truth even at low SNR values (Fig. S5(a)) and high inter-subject variability (Fig. S5(b)).
For an individual with multiple scans, we assumed that the functions of brain regions are invariant across the scans, and therefore only the number of scans per individual is varied. We found that the individual-level modularization becomes closer to the ground truth as the number of scans per individual increases (Fig. S5(c)).
Figure S4: The intra-subject purity corresponding to different anatomical locations of brain ROIs. The sizes of the nodes correspond to the values of purity and the colors denote the functional modules.
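To make the group-level synthetic setup more concrete, the sketch below generates one connectivity matrix per individual from a planted modular structure. It is a deliberately simplified stand-in and not the generator used in this work: the within- and between-module weights (`w_in`, `w_out`), the Gaussian noise model, the random reassignment of nodes with probability `flip_prob`, and all function names are assumptions made for illustration.

```python
import random


def synthetic_subject(truth, n_modules, flip_prob, noise_sd, rng, w_in=0.8, w_out=0.2):
    """Return (labels, connectivity) for one synthetic individual.

    Each node keeps its ground-truth module, except that with probability
    flip_prob it is reassigned to a uniformly random module (assumed model
    of inter-subject variability). Edge weights are w_in within a module
    and w_out between modules, plus Gaussian noise with standard deviation
    noise_sd (assumed noise model).
    """
    labels = [rng.randrange(n_modules) if rng.random() < flip_prob else m for m in truth]
    n = len(labels)
    conn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            base = w_in if labels[i] == labels[j] else w_out
            conn[i][j] = conn[j][i] = base + rng.gauss(0.0, noise_sd)
    return labels, conn


rng = random.Random(0)                  # fixed seed for reproducibility
truth = [i % 4 for i in range(40)]      # 40 nodes in 4 ground-truth modules
subjects = [
    synthetic_subject(truth, n_modules=4, flip_prob=0.1, noise_sd=0.1, rng=rng)
    for _ in range(5)
]
labels, conn = subjects[0]
```

Raising `noise_sd` lowers the SNR and raising `flip_prob` increases the inter-subject variability, which are the two conditions varied in the group-level experiment; the recovered group-level modularization can then be scored against `truth`.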

Group-level modules from averaged and thresholded connectivity matrices
Previous studies in the area have used a thresholded and averaged group matrix to detect group-level modules. Thresholding removes weak edges and makes the graph sparser, which reduces the computational complexity but discards vital information related to the functional modules. We studied the group-level modularization of the HCP resting-state data by applying module detection approaches to an averaged and thresholded group-level matrix. For thresholding, we used percolation analysis [1], which iteratively removes weak edges as long as the network remains connected. Percolation analysis has previously been applied to threshold networks for modularization in human studies [2]. The percolation threshold is the point at which nodes start to become disconnected from the largest component. For our data, this was around 90%. For the sake of completeness, we computed the group-level modules at two other thresholds (10% and 40%) besides the percolation threshold (Fig. S6(a)).
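The percolation threshold described above can be sketched as follows. Instead of removing weak edges one at a time and re-checking connectedness, this sketch adds edges from strongest to weakest with a union-find structure and returns the weight of the edge that first connects all nodes; this yields the same threshold, but it is a swapped-in shortcut and not necessarily the procedure of [1]. The function names and the toy matrix are assumptions, and every off-diagonal entry is treated as an edge.

```python
def percolation_threshold(weights):
    """Largest threshold t such that keeping edges with weight >= t leaves the graph connected."""
    n = len(weights)
    if n < 2:
        raise ValueError("need at least two nodes")
    parent = list(range(n))

    def find(x):
        # Find the component representative, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All node pairs, strongest edge first.
    edges = sorted(
        ((weights[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    components = n
    for w, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1
            if components == 1:
                return w  # weakest edge still needed to keep the graph connected
    raise ValueError("graph is not connected")


def apply_threshold(weights, t):
    """Zero out every entry with weight below t."""
    return [[v if v >= t else 0.0 for v in row] for row in weights]


# Toy symmetric connectivity matrix for four nodes (hypothetical values).
toy = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.6, 0.3],
    [0.2, 0.6, 0.0, 0.8],
    [0.1, 0.3, 0.8, 0.0],
]
t = percolation_threshold(toy)
sparse = apply_threshold(toy, t)
```

In the toy matrix, the three strongest edges form a path through all four nodes, so every weaker edge can be removed without disconnecting the graph.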
For the Louvain algorithm, we also varied the resolution parameter γ between 0.3 and 1.3 and chose the modularization with the highest modularity. For low thresholds, we observed isolated nodes in the network. Across the different thresholds, we observed that the Infomap and Louvain algorithms yield only a few modules, while Asymptotical Surprise yields two large modules (module 1 containing > 100 nodes and module 2 containing > 50 nodes) and multiple smaller modules composed of 7 nodes or fewer.
Figure S6: The group-level modular structure obtained with different module detection algorithms at different thresholds of connectivity. The group matrix was obtained from an average of thresholded subject connectivity matrices, where thresholding was performed by percolation analysis. Isolated nodes that result from thresholding are not shown.