Introduction

The precise relationship between brain structural and functional connectivities has puzzled leading researchers in network neuroscience1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16. Despite numerous efforts, an accurate prediction of the structural connectivity from its functional counterpart remains still a distant goal. Beyond link-level network comparisons between the two modalities, a modular-level comparison has shown bi-directional advantages in the understanding of brain structure and function17,18,19,20,21,22,23,24,25,26,27,28,29. Although non-invasive magnetic resonance imaging (MRI) techniques have a limitation in the network resolution that they can resolve30, arriving at about one millimeter resolution, multi-scale approaches between brain structural and functional modules have been studied to assess organizational aspects in healthy population13,26,31,32,33, in relation to cognition34,35, during development36 and aging37, or in some pathological conditions38,39.

When evaluating the correspondence at the module-level (also known as aggregate level) between structure and function, there are two complementary strategies in relation to the chosen network resolution. One is the optimal strategy, in which a fixed level of network representation is established, for example, maximizing a certain network metric40,41,42,43,44,45. Therefore, this optimal solution depends on the chosen metric which we have optimized to cut the hierarchical tree at a given level. The alternative approach explores the multi-scale nature of the data, using a nested hierarchy of network resolutions as potential features or input variables to be synergistically integrated to address a specific problem35,37,39,46. Therefore, for the multi-scale strategy, brain networks at different resolutions are combined, and the problem in question is solved without setting any resolution level in the hierarchical tree.

While progress has been made in assessing the module-level correspondence between structural and functional connectivities in the human brain, many questions remain unanswered. Among these inquiries, it is important to assess the extent to which the different modules within structural connectivity favor the emergence of functional modules. Moreover, it is crucial to investigate the presence of dynamic processes that influence the relationship between structural and functional modules. Identifying the scale at which the correspondence between modules in both representations reaches a maximum is also of great relevance. Understanding the insights of the pathological brain and the extent to which the interplay between structure and function is altered within this context are pressing matters. Additionally, elucidating the underlying mechanism responsible for such alterations is crucial.

In the pursuit of addressing these queries, there are brain data repositories like the Human Connectome Project47, which offers functional connectivity matrices and diffusion data, yet lacks diverse structural connectivity matrices at varying resolutions. Another repository, MICA-MICs48, more closely resembles our database, providing 50 structural and functional connectivity matrices across several resolutions. The latest database, utilizing data from the UK Biobank49, contains 40,000 connectomes from healthy individuals. However, its access is somewhat restricted, as it is part of the UK Biobank database and requires payment for access. Our dataset’s distinctiveness lies in its inclusion of multiscale structural-functional modules (γ-parametrized) and associated gene transcriptomic profiles.

This study aims to address this need by providing derivative functional and structural connectivity matrices at different scales and code to the scientific community, enabling progress in addressing these questions, which constitute a central focus of research in our laboratory. Specifically, in this article, we initially introduce a modeling approach to investigate the interplay between structure and function by parameterizing the degree of overlap between the two classes of connectivity. Secondly, building upon the methodology described in24, we extend it to examine the module-level correspondence between the two connectivity matrices. Finally, we have extended the study on the relationship between brain structure and function by sharing also neuro-genetic code and data, delving into the underlying biological and molecular foundations of brain-related disorders, similarly to previous work50,51,52,53,54,55,56,57,58,59,60. We aimed to bridge the gap between transcriptomics and connectomics, particularly for those skilled in network analysis but not in data handling. This objective has been our primary drive.

Results

Structural and functional connectivity matrices at various resolutions were built making use of brain images from the open dataset “Max Planck Institut Leipzig Mind-Brain-Body Dataset” – LEMON61,62, well-known for having high-quality multimodal acquisitions and preprocessed MRI sequences. Rather than using different subjects across lifespan, we specifically selected 136 young participants, aged between 20 and 30 years old, who had both the preprocessed rs-fMRI sequence and DWI sequence available. We next processed the raw images following standard neuroimaging pipelines to obtain structural connectivity (SC) and functional connectivity (FC) matrices (Fig. 1A). Subsequently, we built population connectivity matrices by choosing, for each link in the matrix, the median value across all the equivalent links in the individual connectivity matrices. Finally, we modeled the amount of interplay between structure and function connectivities using a single fusion parameter, γ. When γ equals 0 or 1, we have purely structural or functional connectivity, respectively. In intermediate situations, the amount of overlapping connectivity is modulated by γ. Furthermore, and similar to the methodology described in24, for each value of the fusion parameter γ, we applied hierarchical agglomerative clustering to the resulting γ-fused structure-function matrix, γSFC (Fig. 1B).

Fig. 1
figure 1

Methodological sketch and pipeline. (A) Participants’ raw MRI data from N = 136 healthy volunteers of the MPI-LEMON dataset were used. From preprocessed rs-fMRI images, we generated an initial brain parcellation using gray-matter masks and the open repository pyClusterROI. Subsequently, we employed our own publicly available code on GitHub to process the DWI images and extract the SC matrices. All post-processed data at the participant level, including the time series BOLD signal corresponding to the different iPAs, and the SC and FC matrices are available at https://zenodo.org/record/8158914. (B) We calculated population SCp and FCp matrices, and subsequently generated γ-fused matrices as described in the methods section. The γSFC matrices were used to construct dendrogram trees for different γ parameters. Additionally, we employed abagen to generate transcriptomic expression matrices for various tree-based parcellations. The transcriptomic matrices and dendrogram trees are also accessible on Figshare. The entire pipeline and project codes can be found on GitHub at https://github.com/compneurobilbao/bha2.

Optimal γ-fused structure-function modular organization

The different trees of the γSFC matrices describe distinct scales, enabling the construction of networks at different resolutions. The determination of the optimal partition by cutting the tree at M* modules depends on the specific metric we seek to maximize. In order to showcase the versatility of our data with various metrics, we specifically concentrate on a metric we have previously defined24 and named cross-modularity χ, defined as the triple product of the modularity of the functional matrix, the modularity of the structural matrix, and the similarity between structural and functional modular configuration.

Here, we aimed to maximize χ to select the optimal value of γ for a given initial parcellation atlas (iPA). Figure S1 shows box-plots of different values of the cross-modularity χ across different dendrogram levels and for different iPAs with different size, ie., varying the number of micro-regions ranging from 183 to 2165. For further analyses, we selected the iPA with 2165 micro-regions since it exhibited a higher mean cross-modularity value together with a lower variability within a wide range of scales, particularly within the first 120 levels where the maximum χ is located. Beyond this point, there was a decreasing trend. The optimal brain partition corresponds to the iPA that maximizes χ by reducing the dendrogram to M* modules and setting the fusion parameter to γ*. In our case, the optimal value γ* was determined to be 0.7, and this occurred at the level of 28 modules. However, two modules were considered invalid as they consisted of only one and two micro-regions, respectively, making it impractical to analyze the community structure within them. Henceforth, any subsequent analysis referring to the optimal brain partition will be based on the iPA that maximizes χ by reducing the dendrogram to M* = 26 modules and setting the fusion parameter to \({\gamma }^{\ast }=0.7\).

Multi-scale γ-fused structure-function modular organization

As an illustrative example, and to gain a deeper understanding of how the multi-scale organization is influenced by γ, we investigated the network strength of the γSFC matrix in two network representations. At the finest spatial resolution, corresponding to the lowest level in the hierarchical tree consisting of 2165 micro-regions (represented as brain maps in Fig. 2A), and for each of the macro-regions, including the frontal lobe, parietal lobe, occipital lobe, temporal lobe, insula, and a collection of subcortical areas (Fig. 2B). From both analyses, it is evident that increasing γ, transitioning from structure (γ = 0) to function (γ = 1), results in higher strengths shifting from subcortical areas to cortical regions. This pattern is observed consistently across all macro-regions, where the strength progressively increases with γ, peaking around \(\gamma \approx 0.7\).

Fig. 2
figure 2

Graph-node strength is modulated by γ, the amount of interplay between structural connectivity (SC) and functional connectivity (FC). (A) Brain maps of graph-node (normalized) strength. It is observed that for γ = 0 (pure SC), high strengths are predominantly localized in subcortical regions. However, as γ varies, there is a decrease in strength within these regions and an increase in cortical regions. (B) Box plots for the distribution of graph-node strength across different macro-regions: Frontal lobe, Parietal lobe, Temporal lobe, Occipital lobe, a collection of Subcortical areas, and Insula. Notably, for each macro-region, the strength shows an increasing trend from γ = 0, reaching a peak at \(\gamma \approx 0.7\), followed by a subsequent decrease.

Beyond the two selected levels of resolution for calculating strength, a hierarchical tree organization enables obtaining different metrics at different module-levels. For instance, we first colored the M* = 26 distinct modules of the optimal brain partition for ease of visualization (Figure S2, top panel). Some tree metrics we computed were the module size (MS), height (H), and multi-scale index (MSI). Very briefly, MS represents the count of micro-regions encompassed within a module. MSI quantifies the number of levels within the tree structure where a module remains intact without further division. H denotes the specific level at which a given micro-region separates from its parent module. Therefore, while H is defined for each micro-region, MS and MSI are defined at the module-level. We also defined the module height (MH), by averaging over all individual H values within the given module (Figure S2, full dendrogram and dendrogram-inset). We represented these metrics on brain plots, where we observed distinct patterns for each module (Figure S2, brain plots).

We conducted further analyses to examine the correlations among these metrics and the intra-module strength, as a proxy for module segregation (Figure S3). From the different tree measures, both MS and MH showed a high correlation (r = 0.95, p < 0.001) with module segregation, as well as between themselves (r = −0.84, p < 0.001). Interestingly, the first module M1 was an outlier in the metric of MSI. Located at the border between the Precuneus, Isthmus Cingulate, and Posterior Cingulate cortices (Fig. 3), M1 showed a remarkable resilience in remaining intact without splitting across multiple levels in the tree, possibly indicating a multi-scale functional role.

Fig. 3
figure 3

Brain localization, Functional description and Anatomical description of the optimal brain partition. Left: The different M1 to M26 modules belonging to the optimal brain partition are depicted within brain representations for localization purposes. Right: Additional information is provided regarding the specific location of each module, incorporating anatomical details and functional representations derived from the overlapping analysis among different modules and established Desikan-Killiany and Resting-State Network (RSN) atlases.

Anatomical and functional description of a given brain partition

Our methodology for defining a given partition defined by specific values of M and γ is generally data-driven. At some point, it may be of interest to characterize the anatomical and functional properties of a specific partition. As an illustrative example, we made use of the optimal partition with M* = 26 modules and \({\gamma }^{\ast }=0.7\) and determined the spatial locations of these 26 modules within the brain (Fig. 3, brain-glasses). We next obtained their functional and anatomical characterization by measuring respectively the amount of overlap within known Resting State Networks (RSNs)63 and regions from the Desikan-Killiany atlas64, the latter consisting of 34 cortical regions (in right and left hemispheres), 8 subcortical regions segmented from Freesurfer, and the cerebellum (Fig. 3, heatmaps). Several of our modules exhibited strong overlapping within the RSNs. For instance, module M6 encompassed the Somato-motor Network (SMN) while also integrated parts of the Dorsal and Ventral Attention Networks (DAN, VAN). Module M18 showed an overlap with a portion of the Default Mode Network (DMN), and module M14 did it with the Visual Network (VIS). Moreover, certain modules displayed high overlapping with specific anatomical regions. For example, module M20 included the medial and lateral Orbitofrontal cortices, while module M26 integrated the Basal Ganglia and Thalamus.

Neurobiological relevance of a given brain partition in brain-related disorders by making use of neurogenetics data

Essential for a comprehensive biological interpretation is characterize the modules of a given partition based on their participation in major brain-related disorders. To accomplish this, we focus again on the optimal partition and made use of publicly available transcriptomic data from the Allen Human Brain Atlas (AHBA)50, and constructed transcriptomic expression matrices at module-level (Fig. 1B). In particular, we examined the transcriptomic expression within each module for genes associated with 40 different brain-related disorders, using a list of disorders and their corresponding genes extracted from the available data in65. For each disorder, the average expression across samples within a module was computed for the relevant genes, and z-score values were then calculated based on the optimal partition (Fig. 4, heatmap). Modules with a z-score > 2, indicating high expression levels, or z-score < −2, indicating low expression levels, were marked with an asterisk. As the disorders were categorized into 7 disease groups (Psychiatric disorders, Substance abuse, Movement disorders, Neurodegenerative diseases, Tumor conditions, Developmental disorders, and Others), we also generated a z-score map representing the average expression of all genes associated with each disease group (Fig. 4, brain-glasses). Modules M3 and M26 exhibited high relevance across multiple disorders. M26 represented a subcortical circuit encompassing the Basal Ganglia and Thalamus, while M3 represented a Thalamo-cortical circuit comprising a portion of the right Thalamus, a small segment of the right Putamen and Pallidum, and the right Caudal Middle Frontal, Superior Frontal, and Precentral gyri. M14 was relevant for 7 disorders and included the Occipital pole and the Medial Visual Cortex. Lastly, M8 displayed relevance in 6 disorders and represented a circuit connecting the Amygdala and Hippocampus with the Precuneus via the Parahippocampal and Isthmus Cingulate cortices.

Fig. 4
figure 4

Neurogenetic module-level data for disease-related transcriptomic characterization of the optimal brain partition. Left: Within each module of the optimal brain partition (M1 to M26, x-axis), we depict the median transcriptomic expression values across different sets of genes associated with various brain-related disorders. Modules with significantly high/low expression values compared to the rest are denoted by a white asterisk (*). Notice that only specific modules among the 26 exhibited significant expression values. Notably, modules encompassing the basal ganglia and thalamus (M3 and M26) demonstrated significance across all disease groups. Right: Brain maps of median transcriptomic expression, averaged over different maps belonging to the same group of brain disorders. Left, Right: Z-scores were calculated across all the modules to indicate a relative high/low gene expression levels across modules.

Discussion

In the words of Karl Friston, how rich functionality emerges from the invariant structural architecture of the brain remains a major mystery in neuroscience7. Previous research in network neuroscience has provided evidence for the emergence of hierarchical network modularity, facilitating the separation of neural computations at various spatial scales13,18,21,26,31,32,33,34,36,38. However, the specific role played by structure in functional organization, as well as the reciprocal influence, remain elusive. A key contributing factor to this gap in knowledge is the continuous interaction that takes place across multiple temporal and spatial scales30. Consequently, when employing a particular imaging modality for measurement, obtaining a comprehensive understanding of the intricate interplay between different scales becomes challenging.

For many years, our laboratory has been investigating the interplay between structure and function at the module-level. In our initial work 8 years ago24, we demonstrated excellent correspondence among modules, but not at the level of links in connectivity matrices. Since then, we have expanded our research to include studies on healthy brains35,42,44 and pathological brains37,39,40,41,43,45,46, exploring the multi-scale structural-functional correspondence at the module-level. Through collaborations with other groups, we have identified a lack of clear open data and repositories that facilitate this line of research. Many research groups in the field of physics and mathematics, specializing in complex network methodology, lack access to preprocessed brain networks derived from raw neuroimaging data. Motivated by this gap, our study aims to address this issue by employing state-of-the-art neuroimaging analyses, offering structural and functional networks at different scales, with both networks built using the same set of network nodes. Not only do we provide the final connectivity matrices, but also the code to generate them. Additionally, we introduce hybrid structure-functional networks through the parameterization of their overlap using the parameter γ. Furthermore, we have also released pre-processed neurogenetic data and code, to facilitate the connection between structural-functional correspondence across different scales and underlying biological mechanisms, opening up a promising avenue for studying brain disorders.

The structure of the released data, code and analyses are as follows:

Our choice to build brain partitions was based on the pyClusterROI package67 rather than using other existing parcellations68,69. This decision was driven by two primary factors. Firstly, it allows for a gradual increase in the number of ROIs across different scales. Notably, in our cross-modularity analysis (see Figure S1), we observed higher values for parcellations with more than 1000 ROIs. This suggests that, particularly for the metric of cross-modularity, there could be more optimal parcellations beyond 1000 ROIs (which is the maximum resolution provided by other parcellations eg. Schaefer’s69). This possibility might extend to other metrics, prompting us to retain this feature in our code. Secondly, and crucially for us, our choice included subcortical ROIs alongside cortical ones. This is particularly important in the study of brain pathologies, as many diseases impact subcortical regions in their early stages, and later on, in more severe stages.

In this manuscript, which aims to be a practical guide for studies using functional and structural connectivity matrices at different resolution scales, and their neurogenetic associations in relation to some brain pathologies, we have performed specific analyzes that resulting in four different figures. Firstly, Fig. 1 presents a methodological overview of the various datasets and key steps involved in our analyses. Figure 2 aims to demonstrate the continuous parametric effect of γ on different network metrics, and for that, we have depicted the distributions of node-graph strength at two different spatial resolutions. One resolution encompassed 2165 distinct microregions, while the other aggregated them into six distinct macro-regions covering the entire brain: Frontal lobe, Parietal lobe, Temporal lobe, Occipital lobe, a collection of Subcortical areas, and Insula. Figure 3 depicts a widely accepted strategy employed in neuroimaging studies for the comprehensive characterization of a specific brain partition, encompassing both functional and anatomical labels for each brain region. Finally, Fig. 4 shows how gene transcriptomic imaging data can be used to characterize a given partition (in our example, the same partition chosen for Fig. 3) in relation to a set of genes with significant associations in 40 different brain disorders. These disorders are grouped into various categories such as psychiatric disorders, substance abuse, movement disorders, neurodegenerative diseases, tumor conditions, or developmental disorders.

To summarize, with this study we aim to provide valuable resources in the form of dataset-derivative functional and structural connectivity matrices at different scales and code to facilitate the exploration of the intricate interplay between brain structure and function across diverse spatial scales, including the aggregated level. It is our intention to contribute to unraveling the enigmatic nature of this structure-function relationship, which, despite significant progress, still harbors substantial gaps in our understanding. We encourage other researchers with an interest in this area to make use of the resources presented here to further contribute to the elucidation of this fascinating phenomenon.

Material and Methods

Raw neuroimaging dataset

The Max Planck Institut Leipzig Mind-Brain-Body Dataset, commonly referred to as LEMON61,62, is a comprehensive multimodal dataset encompassing MRI sequences, EEG, ECG, and behavioral scores. In this study, we focused on a subgroup of N = 136 healthy individuals within the age range of 20 to 30 years, with a male population comprising 98 individuals (72%). The selection of young individuals stems from existing research indicating age’s influence on both structural and functional network connectivity37. For more considerations about sex-bias affecting the analyses, see subsection sex bias considerations below. For the precise selection of the IDs individuals, see https://zenodo.org/record/8158914/files/participants.tsv. Specifically, for this study we made use of three different MRI sequences: T1, rs-fMRI, and DWI, with the aim of extracting functional and structural connectivity matrices. For a comprehensive description of the sequences employed, please refer to Table 7 in61.

Preprocessed neuroimaging dataset

LEMON also provides access to preprocessed data. In our study, we made use of the brain-extracted T1 and rs-fMRI images as input data for further analysis. For preprocessing DWI images, we made used of a custom pipeline utilizing the the brain-extracted T1 images, MRtrix3 (\(v3.0\_RC3\)), FSL (v6.0.1), and ANTs (v2.3.1). Our DWI pipeline involved cleaning the raw images using DWIdenoise and DWIpreproc tools, which included correction for eddy current distortion and susceptibility-induced distortion using the topup tool. Next, we performed whole-brain probabilistic tractography using the iFOD2 tracking algorithm in MRtrix3. The tractography was initiated from seeds located at the gray matter-white matter interface, with a selection of 3 million fibers based on an angle threshold of <45° and a length threshold of <200 mm, following the recommendations in the MRtrix3 documentation. The tool SIFT270 was used to correct for the effect of crossing and disconnection of fibers on the quantification of the total number of streamlines.

Data-driven anatomical and functional solutions for different initial brain partitions

Following a methodology similar to24, and utilizing the preprocessed rs-fMRI images, we performed an unsupervised voxel-level clustering analysis to delineate a specific number of micro-regions following the strategy proposed in67. These micro-regions define the highest spatial resolution scale employed for subsequent analyses. In contrast to24, where all brain voxels were included into the same clustering process, our methodology here involved six separate clustering procedures, each one focused solely on the voxels contained within distinct macro-regions, namely the frontal lobe, parietal lobe, occipital lobe, temporal lobe, insula, and subcortical structures (pooling together the thalamus, basal ganglia, amygdala, and hippocampus). Thus, while a functional partition was applied, the resulting regions were anatomically constrained by the defined macro-regions. By aggregating all micro-regions obtained from the six different macro-regions, we generated an initial Parcellation Atlas (iPA). We followed the same procedure to generate 9 different iPAs, each consisting of a different number of micro-regions, specifically 183, 391, 568, 729, 964, 1242, 1584, 1795, and 2165.

Brain functional and structural connectivity matrices, and their fusion through the parameter γ

FC matrices were generated by computing the Pearson correlation between pairs of regions fMRI time-series, each one obtained by averaging voxel-wise fMRI time-series within each micro-region within a specific iPA. For SC matrices, we employed the tck2connectome tool to perform fiber counting of the streamlines connecting pairs of micro-regions in the iPA. Prior to constructing SC, a nonlinear registration of the iPA to the b0 image of the DWI sequence was performed.

From the individual SC and FC matrices, we computed population matrices SCp and FCp by calculating for each link the median value across all equivalent links in the individual matrices. Next, and to match the sparse nature of SCp, a threshold was applied to FCp, resulting in comparable link density. Subsequently, we binarized the two matrices and fused them into a single matrix as follows:

$$\gamma {\rm{S}}{\rm{F}}{\rm{C}}\equiv \gamma \times {\rm{F}}{{\rm{C}}}_{{\rm{p}}}+(1-\gamma )\times {\rm{S}}{{\rm{C}}}_{{\rm{p}}},$$

where γ is a real number between 0 and 1. This strategy allows us to continuously parameterize the level of interplay between SCp and FCp, enabling the recovery of each connectivity class when γ = 0 or γ = 1, while obtaining hybrid connectivities for intermediate values.

Multi-scale hierarchical representation at module-level

For each of the 9 distinct iPAs, a hierarchical agglomerative clustering (HAC) was employed to extract different nested modules, allowing to build different network resolutions at different scales (each one determined by a cutting tree number of modules M). More specifically, HAC was performed using a weighted method on the connectivity patterns obtained from the matrix γSFC, here defined as the correlation distance between pairs of micro-regions in the γSFC matrix.

Within the context of tree or dendrogram partitions, different multi-scale metrics can be established. These metrics can be defined either at the module-level or at a more granular level (with the highest spatial resolution determined by the micro-region level). In this particular study, we have defined the module size (MS), representing the count of micro-regions encompassed within a module. Additionally, we have introduced the multi-scale index (MSI), which quantifies the number of levels within the tree structure where a module remains intact without further division. In relation to micro-regions, we have selected the height (H) metric, which denotes the specific level at which a given micro-region separates from its parent module.

The optimal brain partition based on cross-modularity maximization

After obtaining a hierarchical partitioning of the connectivity matrix, clustering micro-regions into distinct modules M, the determination of the optimal level for cutting the tree depends on the specific metric being optimized. In this study, following the methodology introduced in24, the metric chosen for maximization is the cross-modularity χ, defined as:

$$\chi \equiv {\left({Q}_{F}\times {Q}_{S}\times {T}_{FS}\right)}^{\frac{1}{3}}$$

This metric simultaneously accounts for three different qualities: the modularity of the functional matrix (QF), the modularity of the structural matrix (QS), and their similarity (TFS). The latter is defined as the DICE similarity between a functional module and a structural module, and averaging across all modules in the given partition. By varying M along the tree, we can calculate χ for each configuration M and select the optimal partition M* where χ is maximized. Additionally, unlike the approach in24, we here have introduced a second parameter γ* in the maximization of χ, which controls the amount of structure-function interplay.

Transcriptomic data at module-level to assess brain-related disorders

In addition to multi-scale brain partitions and structure-function analysis, we processed the transcriptomic open data from the Allen Human Brain Atlas (AHBA)50 by using the abagen tool57, with the default settings, which follow the recommendations of53. Notably, this included setting the ibf_threshold to 0.5, ensuring inclusion of genes where sample signal exceeds 50% of background noise, resulting in a comprehensive list of 15,633 genes. We did not apply the abagen.keep_stable_genes function; therefore, our gene list was not refined using differential stability criteria. Abagen allows us the generation of brain transcriptome module-level values for a specific parcellation, that in our case, we chose the optimal brain partition. Moreover, we examined the transcriptomic expression at module-level in relation to genes associated with 40 brain disorders introduced in65. To do this, we calculated the mean transcriptomic expression of all genes related to each disorder within each module. Subsequently, we performed a z-score analysis for each disorder to identify modules with low transcriptomic expression (z < −2) and modules with high transcriptomic expression (z > 2). The average expression of the different genes was also grouped into 7 different disease cateogories: Psychiatric disorders, Substance abuse, Movement disorders, Neurodegenerative diseases, Tumor conditions, or Developmental disorders.

Sex bias considerations

Our selection of young individuals led to choosing 136 young participants from the LEMON database, with an unexpected male predominance (72%). The manuscript’s repository (https://github.com/compneurobilbao/bha2) includes instructions for accessing the sex data of participants (https://zenodo.org/record/8158914/files/participants.tsv), so if a new researcher creates a partition using a sex-balanced cohort, this limitation could be effectively addressed.