Introduction

Understanding the brain’s organization is one of the key challenges in human neuroscience1 and is critical for clinical translation2. Parcellation of the brain into functionally and structurally distinct regions has seen impressive advances in recent years3, and has grown the field of network neuroscience4,5. Through a range of techniques such as clustering6,7,8,9, multivariate decomposition10,11, gradient based connectivity1,12,13,14,15,16, and multimodal neuroimaging1, parcellations have enabled fundamental insights into the brain’s topological organization and network properties17. In turn, these properties have allowed researchers to investigate brain-behavioral associations with developmental18,19, cognitive20,21, and clinical phenotypes22,23,24.

More recently, researchers interested in understanding brain organization are presented with a variety of brain atlases that can be used to define nodes of network-based analyses25. While this variety is a boon to researchers, the use of different parcellations across studies makes assessing reproducibility of brain-behavior relationships difficult (e.g. comparing across parcellations with different organizations and numbers of nodes5). Amalgamating multiple brain parcellations into a single, standardized, curated list would offer researchers a valuable resource for evaluating replication of neuroimaging studies.

Some efforts to consolidate these atlases are already underway. For example, Nilearn is a popular Python package that provides machine-learning and informatics tools for neuroimaging26. Nilearn provides several single-line functions to ‘fetch’ both atlases and datasets. Nilearn includes twelve anatomically and functionally defined atlases, such as the Harvard-Oxford27 and Automated Anatomical Labeling (AAL)28 parcellations. Although a promising prototype, Nilearn’s current atlas collection represents a limited range of the available atlases, and the more recent gradient-based, surface-based, and multimodal parcellations have yet to be included in any central repository. More importantly, existing atlas repositories have not attempted to systematically standardize their collections following a single specification. Without well-established standards, the investigator is faced with limited information about each atlas, so connecting neuroscientific findings to the organization of the atlas becomes more difficult. Moreover, if the investigator requires a comparison across atlases, some form of metadata must be available that describes the similarities and differences between them. Neuroparc mitigates these issues by providing: (1) a detailed atlas specification which will enable researchers to both easily understand existing atlases and generate new atlases compliant with this specification, (2) a repository of the most commonly used atlases in neuroimaging, all stored in that specification, and (3) a set of functions for transforming, comparing, and visualizing these atlases. The Neuroparc package presented here includes 46 different adult human brain parcellations, both surface-based and volume-based. Here, we provide an overview of the relationship between these parcellations via comparison of the spatial similarity between atlases, as measured by Dice coefficient and adjusted mutual information.
To facilitate replication and extension of this work, all the data and code are available from the registered OSF repository29 or the GitHub repository https://github.com/neurodata/neuroparc.

Results

Atlases

Using the Python scripts provided in Neuroparc, 46 atlases were resampled to 1 mm³, 2 mm³, or 4 mm³ voxels and registered to the Montreal Neurological Institute 152 Nonlinear 6th generation reference brain (MNI). Each atlas has an accompanying JSON file containing the relevant metadata described in Methods. Of the 46 atlases, 17 lost at least one ROI during the resampling and registration process, which is recorded in the JSON files. This was more common in atlases containing a greater number of ROIs, because their average ROI size is smaller: the smaller the ROI, the greater the chance that it is overwritten by surrounding larger ROIs when down-sampled or registered. Visualizations of a variety of atlases at 1 mm³ resolution are shown in Fig. 1. The number of ROIs present at each voxel size is recorded in Table 1.

Fig. 1

A comparison of the regions present in the major atlases available in Neuroparc. These visualizations were made using MIPAV tri-planar views on the same slice numbers. Each atlas shows a cross-section in each of the canonical orthogonal planes (H = Horizontal, S = Sagittal, C = Coronal). For most atlases, the slice numbers were (90, 108, 90). There are a few exceptions for visualization purposes: JHU: (90, 108, 109), Slab907: (95, 104, 95), Slab1068: (93, 105, 93)1,27,28,32,35,36,38,39,52,53,54.

Table 1 This table contains the atlases included in Neuroparc and the number of ROIs per voxel size, showing the number of ROIs lost during resampling and registration.

Dice coefficient

A Python script provided in Neuroparc, using functions from both AFNI30 and FSL31, was used to calculate the Dice coefficients between pairs of atlases. See the Dice coefficient section in Methods for more information about the calculation. Dice Score Maps, such as Fig. 2, for each pair of atlases can be found on the Neuroparc OSF registered repository29. The purpose of these maps is both to reinforce the differences between the parcellations and what they represent, and to serve as a comparative metric for ROIs across atlases. Access to these values allows for easy cross-parcellation analysis, a useful tool for anyone using Neuroparc. On inspection, each Dice Coefficient Map was accurate in its representation of the overlap present between ROIs of two different atlases.

Fig. 2

Dice Score Map between the Yeo-17 Networks atlas and the 300-parcel Schaefer atlas. In a Dice Score Map, the larger the Dice score, the larger the percentage of overlap. Due to Schaefer’s larger quantity of ROIs, several different ROIs overlap a single Yeo-17 ROI. In this Dice Map, the 0 ROI for Yeo-17 represents the background of the image, or the empty space in the image. That this ROI has nonzero Dice values indicates that the two atlases do not cover the same brain volume.

Adjusted mutual information

Using another Python script, provided in the Neuroparc GitHub repository, the Adjusted Mutual Information (AMI) between atlases was calculated. See the Adjusted mutual information section in Methods for more information about the calculation. The results, displayed in Fig. 3, affirm the need for access to multiple different parcellation methods during data analysis. Each atlas was created using different reference data and designed to track specific structure or functionality present in the brain. If atlases were similar to the point of being interchangeable, this wide variety of AMI scores would not exist, and neither would the reason for having a repository of atlases. The AMI among the Schaefer atlas set32, DS atlas set33, and Slab34 atlases was consistently greater than 0.8, an expected result given that each set was created using the same methodology with different parameters and tolerances.

Fig. 3

The adjusted mutual information between atlases contained within Neuroparc. Atlases that were generated from the same algorithm using different parameters, such as Yeo, Slab, Schaefer, and DS have an expected high amount of mutual information.

Both the JHU35 and Princeton36 atlases displayed a consistently low (<0.3) AMI value for the majority of other atlases. This can be explained by the fact that both atlases only relate to anatomical sub-structures, such as the visual cortex for JHU and hippocampal region for Princeton. Their limited coverage of the brain results in less mutual information with the other surface-based or volume-based atlases.

Discussion

Why use Neuroparc

The purpose of the Neuroparc atlas collection and metadata formatting method is twofold: (1) to provide a repository of standardized parcellations that can be used interchangeably without any additional effort, and (2) to document all relevant information about each parcellation for easy use in research. Neuroparc succeeds in both of these aspects, and also enables a new level of comparison between atlases. Using the formatting method proposed in this paper, any user of the repository can find where each atlas came from, how many different ROIs exist in the atlas, the location and size of each ROI, how the segmentation of an atlas compares to others, and whether there is a significant correlation between areas covered by ROIs from different atlases. The formatting method also allows for continual improvement and refinement of atlas metadata, discussed below in Future development. By standardizing the atlases, researchers can easily analyze MRI data in MNI space using a variety of atlases without additional processing. Metrics provided by Neuroparc, such as adjusted mutual information and the Dice coefficient, also inform users as to how the atlases are related.

Potential issues

The method used to generate the atlases for Neuroparc can result in the loss of ROIs due to down-sampling and registration. The chance of this occurring is inversely related to the average size of ROIs in a given atlas. The ROIs that are lost are still cataloged in the corresponding JSON file, with a value of “null” given for the center coordinates and size. While there are ways to attempt to prevent this loss of information, excessive manipulation of a given atlas depending on the voxel size potentially compromises any conclusions derived using said atlas. As such, the parcellations in Neuroparc do not incorporate these additional methods, and it is up to the user to decide how best to resample the corresponding 1 mm voxel parcellation to fit their unique needs.
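The loss of small ROIs described above can be illustrated with a toy example. This is only a sketch and not the Neuroparc implementation: `downsample_nearest` and `lost_rois` are hypothetical helpers, the "atlas" is a one-dimensional list, and keeping every second voxel is an assumed stand-in for nearest-neighbour resampling. The value 0 is treated as background.

```python
def downsample_nearest(labels, factor):
    """Toy 1-D nearest-neighbour down-sampling: keep every `factor`-th voxel."""
    return labels[::factor]


def lost_rois(original, resampled, background=0):
    """Return the sorted ROI values present in `original` but absent from `resampled`."""
    before = set(original) - {background}
    after = set(resampled) - {background}
    return sorted(before - after)


# A 1-D "atlas" at fine resolution in which ROI 2 occupies a single voxel.
fine = [1, 1, 1, 2, 3, 3, 3, 3]
coarse = downsample_nearest(fine, 2)  # keeps the voxels at indices 0, 2, 4, 6
missing = lost_rois(fine, coarse)     # ROI values that no longer appear
print(coarse, missing)
```

In this toy case the single-voxel ROI falls between the retained samples and disappears, while the larger ROIs survive. An ROI flagged in this way is the kind that would be recorded with “null” center and size.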

Future development

With the current iteration of Neuroparc there are several routes for improvement. The most apparent is the expansion of the atlas collection. Our proposed methodology for standardizing new atlases and tracking metadata makes this task a simple one. With the emphasis on clear and concise information, approval of any new set of atlases is a quick and simple process. It is also possible to standardize all atlases to spaces other than MNI, so that atlases could be offered in both different voxel sizes and different standardized spaces. Another route for growth of Neuroparc is the anatomical labeling of atlases whose ROIs do not have clearly defined anatomical boundaries. The anatomical labels currently in Neuroparc are taken from the published work where they were first made. To preserve the rationale of the original authors, very little was done to the labels provided, mainly rewording for clarity and to follow the largest-structure-to-smallest-structure convention. Due to the subjective nature of labeling the atlases, an agreed-upon anatomical labeling reference would first have to be made. From there, anatomical labeling could be assigned pragmatically.

The methodology proposed in this paper, as well as the atlas repository in Neuroparc, attempts to address the lack of standardization and centralization for brain atlases. We believe that Neuroparc represents a first step towards solving this issue. We call on other researchers to utilize the resources contained within and encourage everyone to contribute.

Methods

Data compilation

The atlases contained in Neuroparc were collected from a variety of locations. As previously noted, there is no current standard for atlas storage, so all gathered datasets were converted into a single format. Collected atlases were resampled using AFNI’s 3dresample30 to 1 mm³, 2 mm³, or 4 mm³ voxel resolution and then registered to a reference T1-weighted image described below. The sources and additional information for each of the atlases can be found in the README file in the GitHub repository.

Reference brain

To allow direct comparison between different atlases, a standard reference brain must be used for all involved atlases. Within Neuroparc, a single reference brain is provided at multiple resolutions, yielding a consistent coordinate space. Neuroparc uses the Montreal Neurological Institute 152 Nonlinear 6th generation reference brain, abbreviated MNI152NLin6 in the file naming structure37. While there are symmetric and asymmetric versions of the MNI152NLin6 T1-weighted image, Neuroparc atlases are registered to the symmetric version, which is used by both FSL 5.031 and new versions of SPM. However, code provided in Neuroparc allows any atlas to be registered to any reference brain the user chooses.

This image, a T1-weighted MRI, is stored in GNU-zipped NIfTI file format and is available in Neuroparc at three resolutions (1 mm³, 2 mm³, and 4 mm³) for easy use when registering. The naming convention for these files clearly displays their source and resolution as: MNI152NLin6_res-<resolution>_T1w.nii.gz. For example, the format of the resolution input would be “1 × 1 × 1” for the 1 mm³ resolution.

Atlas images and processing

The atlas images compiled in Neuroparc were stored in the form of GNU-zipped NIfTI files containing the parcellated atlas. In these files, each region of interest (ROI) within the parcellated image is denoted by a unique integer ranging from 1 to n, where n is the total number of ROIs. Atlases were resampled to the desired voxel resolution through the use of AFNI’s 3dresample30 and then registered to the MNI image of the same resolution using FSL’s flirt function31. The naming convention for the resulting atlas was: <atlas_name>_space-MNI152NLin6_res-<resolution>.nii.gz. The atlas_name field is unique for each atlas image, ideally no more than two words long without a space in between (e.g. Yeo-1738, Princetonvisual36, HarvardOxford27).
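The two processing steps above can be sketched as command lines assembled in Python. This is an illustration under stated assumptions, not the script shipped with Neuroparc: the helper names, the input file name, and the choice of options are hypothetical. In particular, nearest-neighbour interpolation (`-rmode NN` for 3dresample, `-interp nearestneighbour` for flirt) is assumed here because it keeps integer ROI labels intact; the source does not state which options were used.

```python
def resample_command(atlas_in, atlas_out, voxel_mm):
    """Build a 3dresample call that resamples a label image to isotropic voxels."""
    size = str(voxel_mm)
    return [
        "3dresample",
        "-dxyz", size, size, size,  # target voxel size in mm along x, y, z
        "-rmode", "NN",             # nearest neighbour (assumed choice for labels)
        "-inset", atlas_in,
        "-prefix", atlas_out,
    ]


def register_command(atlas_in, reference, atlas_out):
    """Build a flirt call that registers the resampled atlas to the reference."""
    return [
        "flirt",
        "-in", atlas_in,
        "-ref", reference,
        "-out", atlas_out,
        "-interp", "nearestneighbour",  # assumed choice, avoids blending labels
    ]


# Hypothetical file names; the "2x2x2" resolution token is an assumption.
resample = resample_command("Yeo-17_native.nii.gz", "Yeo-17_resampled.nii.gz", 2)
register = register_command(
    "Yeo-17_resampled.nii.gz",
    "MNI152NLin6_res-2x2x2_T1w.nii.gz",
    "Yeo-17_space-MNI152NLin6_res-2x2x2.nii.gz",
)
print(" ".join(resample))
print(" ".join(register))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where AFNI and FSL are installed; building the commands as lists avoids shell-quoting problems with unusual file names.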

Atlas metadata

Using a Python script in Neuroparc, a JSON file containing relevant metadata was generated for each of the atlases. This file was split into two sections: region-wide and atlas-wide information. The naming convention of the JSON file follows that of the atlas image, with an atlas with 1 mm³ voxel size being named <atlas_name>_space-MNI152NLin6_res-1 × 1 × 1.json.
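The naming conventions for the reference image, the atlas image, and the JSON file can be summarized in a few lines of code. The helpers below are hypothetical and only mirror the patterns given in the text. One assumption is made explicit: the resolution token is written with an ASCII "x" (for example "1x1x1") in place of the typeset “1 × 1 × 1”.

```python
def resolution_token(voxel_mm):
    """Return the isotropic resolution token, e.g. 2 -> '2x2x2' (ASCII 'x' assumed)."""
    return "x".join([str(voxel_mm)] * 3)


def reference_filename(voxel_mm):
    """File name of the reference T1-weighted image at the given resolution."""
    return f"MNI152NLin6_res-{resolution_token(voxel_mm)}_T1w.nii.gz"


def atlas_filename(atlas_name, voxel_mm, extension=".nii.gz"):
    """File name of an atlas image, or of its JSON file with extension='.json'."""
    if " " in atlas_name:
        raise ValueError("atlas_name must not contain spaces")
    return f"{atlas_name}_space-MNI152NLin6_res-{resolution_token(voxel_mm)}{extension}"


print(reference_filename(1))
print(atlas_filename("Yeo-17", 1))
print(atlas_filename("Yeo-17", 1, extension=".json"))
```

Deriving the image and JSON names from the same function keeps the two files paired, which is the point of having the JSON name follow that of the atlas image.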

The term “region-wide” refers to information unique to each region of interest (ROI) in said atlas. This information includes the voxel value for that ROI in the atlas, an anatomical label (if possible), the coordinates of the center of the ROI, and the number of voxels that make up that region. The center and size can be calculated using provided code in the scripts directory of Neuroparc.
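As a rough illustration of the per-region quantities, the sketch below computes a center and a voxel count for every ROI in a small 3-D label volume. It is not the code from the Neuroparc scripts directory: the function name is hypothetical, the center is assumed to be the mean voxel index along each axis (in voxel units, not millimetres), and 0 is treated as background.

```python
from collections import defaultdict


def roi_center_and_size(volume, background=0):
    """Return {roi_value: {'center': [x, y, z], 'size': n_voxels}} for a nested-list volume."""
    sums = defaultdict(lambda: [0, 0, 0])  # running sum of voxel indices per ROI
    sizes = defaultdict(int)               # voxel count per ROI
    for x, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for z, value in enumerate(row):
                if value == background:
                    continue
                sizes[value] += 1
                sums[value][0] += x
                sums[value][1] += y
                sums[value][2] += z
    return {
        value: {"center": [s / sizes[value] for s in sums[value]], "size": sizes[value]}
        for value in sizes
    }


# A 2 x 2 x 2 toy volume containing two ROIs and some background voxels.
toy = [
    [[0, 1], [1, 1]],
    [[2, 2], [0, 0]],
]
regions = roi_center_and_size(toy)
print(regions)
```

A volume loaded as an array can be passed in after conversion to nested lists (for instance with a `tolist()` call); for full-resolution atlases a vectorized implementation would be far faster than these explicit loops.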

Although the label must be specified, this information is not relevant for all atlases. For example, atlases that are generated using algorithmic means, like Slab34, have ROIs that do not strongly correlate to individual anatomical regions. In that case, NULL should be used for the labels of the regions. For ROIs that do have anatomical significance, the naming should follow a hierarchical format in order of largest region to smallest with an underscore in between each name. Modifiers, such as “Superior” or “Medial”, can be placed before the anatomical region. An example of this is in the Desikan atlas39, which contains the region with label “L_rostral_anterior_cingulate_cortex”. The main purpose of labeling is to clearly convey the location of an ROI and any anatomical significance it may have. Avoiding unique abbreviations or terminology that is not widely used makes the atlases easier to use for individuals new to MRI analysis. Figure 4 shows an example JSON file.

Fig. 4

An example JSON file rubric for storing atlas metadata. Metadata stored in brackets (“color”, “description”, etc.) is optional but encouraged.

Optional fields in the region-wide data include description and color. Description can be used to provide more information than the region label if necessary. An example of this use is in the Yeo-7 Networks atlas38. The label for this atlas is in the form ‘7Networks_2’, but the description for that label is the corresponding functional network, ‘Somatomotor’ in this case. The color field must be given in the form [R, G, B] and is only used if the user wants to specify the colors of the regions upon visualization.

Brain-wide data must include the name, description, native coordinate space, and source of the atlas. The name field allows for more elaboration than in the name of the file. The description is more flexible, allowing the creator of an atlas to briefly describe important information for users of their atlas. The intended use case or the method of generation are examples of information provided in this field. Since all atlases in Neuroparc are stored in the same coordinate space, the coordinate space used during the creation of the atlas must be specified.

Finally, the publication detailing the atlas should be included in the source field so users can have a more full understanding of the atlas being used. Optional fields for brain-wide data can all be calculated, including the number of regions, the average volume per region, whether the segmented regions are hierarchical, and if the atlas is symmetrical.
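To make the two-part layout concrete, the following sketch assembles a metadata dictionary and serializes it to JSON. All key names and the top-level structure are hypothetical placeholders chosen for illustration; the authoritative field names are those in the Neuroparc atlas specification. The numeric values are toy numbers and are not taken from any atlas.

```python
import json


def build_atlas_metadata(name, description, native_space, source, rois):
    """Assemble atlas-wide and region-wide metadata (key names are illustrative only)."""
    regions = {}
    for value, roi in rois.items():
        entry = {
            "label": roi.get("label"),    # None becomes JSON null (no anatomical label)
            "center": roi.get("center"),  # None marks an ROI lost during resampling
            "size": roi.get("size"),
        }
        for optional in ("description", "color"):  # optional region-wide fields
            if optional in roi:
                entry[optional] = roi[optional]
        regions[str(value)] = entry
    return {
        "atlas": {
            "name": name,
            "description": description,
            "native_coordinate_space": native_space,
            "source": source,
        },
        "rois": regions,
    }


metadata = build_atlas_metadata(
    name="Toy networks atlas",
    description="Illustrative example only",
    native_space="placeholder space name",
    source="placeholder citation",
    rois={
        2: {"label": "7Networks_2", "center": [10.0, 20.0, 30.0], "size": 120,
            "description": "Somatomotor", "color": [70, 130, 180]},
        3: {"label": None, "center": None, "size": None},  # lost ROI, no label
    },
)
text = json.dumps(metadata, indent=2)
print(text)
```

Using `None` in Python maps directly onto JSON `null`, which matches the convention of recording null values for regions without a label or without a recoverable center and size.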

The full description and format of the atlas specification is available within Neuroparc at https://github.com/neurodata/neuroparc/tree/master/atlases/atlas_spec.md.

Dice coefficient

Because each atlas has been registered to MNI space, the atlases can be compared directly, and we calculated the Dice coefficient between them. The Dice coefficient is a measure of similarity between two sets40. Specifically, it measures a coincidence index (CI) between two sets, normalized by the size of the sets. Let h be the number of points overlapping in the sets A and B, and let a and b be the sizes of the corresponding sets. If the two sets are labelled regions in segmented images, then the Dice coefficient between any pair of regions between the images is given by

$${CI}_{ij}=\frac{2{h}_{ij}}{{a}_{i}+{b}_{j}}$$
(1)

where i is the region in image 1 and j is the region in image 2. The result is a similarity matrix, as shown in Fig. 2. Since this map visualizes similarity between two regions in two atlases, the information provided by the Dice map can be used to quantify which regions in a given atlas are most similar to regions in another atlas. This method has proven valuable for performing inference with parcellations lacking anatomical annotation, as it allows conclusions realized at the parcel level to be inferred at the anatomical level41.
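Equation (1) can be evaluated for all region pairs at once by counting voxels. The sketch below is a minimal illustration and not the AFNI/FSL-based script provided in Neuroparc; the function name is hypothetical, and the two atlases are assumed to be given as flattened, equal-length sequences of integer labels on the same voxel grid.

```python
from collections import Counter


def dice_matrix(labels_1, labels_2):
    """Return {(i, j): CI_ij} for every pair of regions sharing at least one voxel."""
    if len(labels_1) != len(labels_2):
        raise ValueError("both atlases must be sampled on the same voxel grid")
    a = Counter(labels_1)                # a_i: voxels in region i of image 1
    b = Counter(labels_2)                # b_j: voxels in region j of image 2
    h = Counter(zip(labels_1, labels_2)) # h_ij: voxels labelled i and j together
    return {(i, j): 2 * h_ij / (a[i] + b[j]) for (i, j), h_ij in h.items()}


# Two toy six-voxel "atlases"; label 0 plays the role of background.
atlas_1 = [0, 1, 1, 2, 2, 2]
atlas_2 = [0, 1, 1, 1, 2, 2]
dice = dice_matrix(atlas_1, atlas_2)
for pair in sorted(dice):
    print(pair, round(dice[pair], 3))
```

Pairs that do not appear in the returned dictionary have no overlapping voxels, so their Dice coefficient is 0. Keeping the background label in the computation reproduces the behaviour seen in the Dice maps, where differences in brain coverage show up as overlap with the 0 ROI.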

Adjusted mutual information

Adjusted mutual information is another measure of the similarity of two labelled sets, quantifying how well a particular point can be identified as belonging to a region given another region. It differs from the Dice coefficient in that it tends to be more sensitive to region size and position relative to other measures42.

Similar to the Dice coefficient, adjusted mutual information is not dependent on a region’s label43. Volumes that share many points are likely to have a higher mutual information score, all else being equal44.

To ensure that all atlas comparisons are on the same scale, Neuroparc computes the adjusted mutual information score. Let H(·) denote entropy, N be the total number of elements (voxels), \(R_A\) and \(R_B\) be the numbers of regions in sets A and B, and \(E(MI_{AB})\) denote the expected mutual information between A and B. Here, \(P_A(i)\) is the probability that a point chosen randomly from the set A will belong to region i45.

$$H(A)=-\sum_{i=1}^{R_A}P_A(i)\log(P_A(i))$$
(2)
$$MI_{AB}=\sum_{i=1}^{R_A}\sum_{j=1}^{R_B}P_{A,B}(i,j)\cdot \log\frac{P_{A,B}(i,j)}{P_A(i)\cdot P_B(j)}$$
(3)

where \(P_{A,B}(i,j)\) is the probability that a voxel will belong to region i in set A and region j in set B.

$$E(MI_{AB})=\sum_{i=1}^{R_A}\sum_{j=1}^{R_B}\,\sum_{n_{ij}=(a_i+b_j-N)^{+}}^{\min(a_i,b_j)}\frac{n_{ij}}{N}\log\left(\frac{N\cdot n_{ij}}{a_i b_j}\right)\frac{a_i!\,b_j!\,(N-a_i)!\,(N-b_j)!}{N!\,n_{ij}!\,(a_i-n_{ij})!\,(b_j-n_{ij})!\,(N-a_i-b_j+n_{ij})!}$$
(4)

where \(a_i\) is the number of voxels in region i of set A, \(b_j\) is the number of voxels in region j of set B, and \((a_i+b_j-N)^{+}=\max(1,a_i+b_j-N)\).

$$AM{I}_{AB}=\frac{M{I}_{AB}-E(M{I}_{AB})}{mean(H(A),H(B))-E(M{I}_{AB})}$$
(5)

as provided in46.
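The quantities defined in this section (entropy, mutual information, expected mutual information, and the adjusted score) can be traced step by step in plain Python. This is a sketch for small examples only, with hypothetical function names; it is not the Neuroparc script, and its triple loop is far too slow for full-resolution atlases. It uses the complete hypergeometric weight, including the \((b_j-n_{ij})!\) factor, and evaluates the factorials through `math.lgamma` to avoid overflow.

```python
import math
from collections import Counter


def entropy(sizes, n):
    """H = -sum_i P(i) log P(i), with P(i) = region size / total voxels."""
    return -sum((s / n) * math.log(s / n) for s in sizes if s > 0)


def mutual_information(labels_a, labels_b):
    """MI between two labellings of the same voxels."""
    n = len(labels_a)
    a, b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return sum((nij / n) * math.log(n * nij / (a[i] * b[j]))
               for (i, j), nij in joint.items())


def expected_mutual_information(sizes_a, sizes_b, n):
    """E(MI) for fixed region sizes, summed over all possible overlaps n_ij."""
    def log_fact(k):
        return math.lgamma(k + 1)  # log(k!)
    emi = 0.0
    for ai in sizes_a:
        for bj in sizes_b:
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_weight = (log_fact(ai) + log_fact(bj)
                              + log_fact(n - ai) + log_fact(n - bj)
                              - log_fact(n) - log_fact(nij)
                              - log_fact(ai - nij) - log_fact(bj - nij)
                              - log_fact(n - ai - bj + nij))
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_weight)
    return emi


def adjusted_mutual_information(labels_a, labels_b):
    """AMI = (MI - E(MI)) / (mean(H(A), H(B)) - E(MI))."""
    n = len(labels_a)
    sizes_a = list(Counter(labels_a).values())
    sizes_b = list(Counter(labels_b).values())
    mi = mutual_information(labels_a, labels_b)
    emi = expected_mutual_information(sizes_a, sizes_b, n)
    denominator = 0.5 * (entropy(sizes_a, n) + entropy(sizes_b, n)) - emi
    if denominator == 0:  # degenerate case, e.g. a single region in both sets
        return 1.0
    return (mi - emi) / denominator


same = adjusted_mutual_information([1, 1, 2, 2, 3, 3], [5, 5, 7, 7, 9, 9])
crossed = adjusted_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

The first example relabels the same partition, which shows that the score does not depend on the region labels themselves. For real atlases a library routine such as scikit-learn's `adjusted_mutual_info_score` is the more practical choice; whether the Neuroparc script relies on it is not stated in this section.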

Figure 3 shows the adjusted mutual information between all pairs of atlases. The information provided for this score is atlas-wide, while the Dice score was computed per region to generate a map. The similarity between groups of atlases, such as the various Schaefer atlases, the Yeo liberal atlases, and the DS atlases, is immediately apparent. Recent work highlighting the complementary information provided by disparate parcellations stresses the importance of the availability and ease-of-use of a collection of parcellations from heterogeneous sources41,47,48,49.