CerebrA, registration and manual label correction of Mindboggle-101 atlas for MNI-ICBM152 template

Accurate anatomical atlases are recognized as important tools in brain-imaging research. They are widely used to estimate disease-specific changes and therefore, are of great relevance in extracting regional information on volumetric variations in clinical cohorts in comparison to healthy populations. The use of high spatial resolution magnetic resonance imaging and the improvement in data preprocessing methods have enabled the study of structural volume changes on a wide range of disorders, particularly in neurodegenerative diseases where different brain morphometry analyses are being broadly used in an effort to improve diagnostic biomarkers. In the present dataset, we introduce the Cerebrum Atlas (CerebrA) along with the MNI-ICBM2009c average template. MNI-ICBM2009c is the most recent version of the MNI-ICBM152 brain average, providing a higher level of anatomical details. Cerebra is based on an accurate non-linear registration of cortical and subcortical labelling from Mindboggle 101 to the symmetric MNI-ICBM2009c atlas, followed by manual editing.


Background & Summary
Brain atlases are widely recognized as important tools in research for the analysis of neuroimages. High spatial resolution magnetic resonance imaging (MRI) and improved data preprocessing have enabled the study of structural volume changes in a wide range of disorders. Anatomical atlases are central to the understanding of the brain anatomy and are the best resources to bring prior knowledge about anatomy into any computer vision methodology involved in various types of brain imaging analyses. Anatomical atlases can also be used to investigate potential disease-specific changes that occur in clinical cohorts compared with healthy populations, by providing information on region locations for various regions of interest. Analysis of fMRI data also routinely involves registration to a template and extraction of the average signal within various regions of interest within the corresponding anatomical atlases 1 .
The MNI-ICBM152 brain template 2 , from the Montreal Neurological Institute (MNI) is a crucial tool in neuroimage analysis. This multi-contrast atlas including T1w, T2w and PDw contrasts, was built recruiting brain scans from 152 young adults at 1.5 T. The 2009 edition uses group-wise non-linear registration for better alignment of cortical structures between subjects. The MNI-ICBM152 non-linear model has many advantages. It was created from a large number of subjects; hence it represents the average anatomy of the population and is not biased unlike single-subject models. In addition, the left-right symmetric version enables interpretation of asymmetries that might be found in an analysis.
Mindboggle-101 is the largest, publicly available set of manually labelled human brain images created from 101 human scans, labelled according to a surface-based cortical labelling protocol (DKT-Desikan-Killiany-Tourville labelling protocol) 3,4 . For the creation of the Mindboggle-101 dataset, developed to serve as brain atlas for use in labelling other brains, 101 T1-weighted (T1w) brain MRI images were selected and segmented based on a modification of the DKT cortical parcellation atlas 4 . These labels were then manually edited in agreement with the DKT protocol. Labelling was performed on the surface, yet, topographical landmarks visible in the folded surface were used to infer label boundaries. In addition, Mindboggle used non-cortical labels that were converted from Neuromorphometrics BrainCOLOR subcortex labels 4 .

MNI-ICBM152 template.
This section summarizes the details on generation of the nonlinear MNI-ICBM2009c average template. Further methodological details can be found in the original paper by Fonov et al. 2 . Within the ICBM project, MRI data from 152 young normal adults (18.5-43.5 years) were acquired on a Philips 1.5 T Gyroscan (Best, Netherlands) scanner at the Montreal Neurological Institute. The T1w data were acquired with a spoiled gradient echo sequence (sagittal acquisition, 140 contiguous 1 mm thick slices, TR = 18 ms, TE = 10 ms, flip angle 30°, rectangular FOV of 256 mm SI and 204 mm AP). The Ethics Committee of the Montreal Neurological Institute approved the study, and informed consent was obtained from all participants 2 .
The following preprocessing steps were applied to all MRI scans prior to building the atlas: (1) N3 non-uniformity correction 5 ; (2) linear normalization of each scan's intensity to the range [0-100] by a single linear histogram scaling 6 ; (3) automatic linear (nine parameters) registration to the ICBM 152 stereotaxic space 7 ; and (4) brain mask creation 8 . Only the voxels within the brain volume after linear mapping into stereotaxic space were used for the nonlinear registration procedure described. The template described is generated through a hierarchical nonlinear registration procedure, with diminishing step sizes in each iteration until convergence and relies on the nonlinear registration using Automatic Nonlinear Image Matching and Anatomical Labelling (ANIMAL) 9 . The nonlinear versions of MNI-ICBM2009 (http://nist.mni.mcgill.ca/?p=904) have many advantages over widely used previous versions (i.e. MNI-ICBM non-linear 6 th generation; http://nist.mni.mcgill.ca/?p=858, https://fsl. fmrib.ox.ac.uk/fsl/fslwiki/Atlases). Besides, anatomical variability still remains after linear transformation to stereotaxic space, therefore sulci and gyri remain blurred in previous versions 10 (Fig. 1).

Fig. 1
Comparison between the two versions of MNI-ICBM152 template. Rows 1, 3, and 5 show sagittal, coronal, and axial slices of the MNI-ICBM2009c template, respectively. Rows 2, 4, and 6 show sagittal, coronal, and axial slices of non-linear 6th generation template, respectively. Note the improved tissue contrast and cortical definition of the new template compared to the older 6 th generation version.
www.nature.com/scientificdata www.nature.com/scientificdata/ Mindboggle-101. This section summarizes the generation of the original Mindboggle-101 atlas and additional methodological details can be found in the original paper by Klein and Tourville 2012 3 . The authors started with publicly accessible T1-w MRI scans selected from 101 healthy participants. Scanner acquisition and demographic information can be found in Klein 2012 3 and are also available on the http://mindboggle.info/data website. The data sets that comprise the Mindboggle-101 include the 20 test-retest subjects from the "Open Access Series of Imaging Studies" data 11 , the 21 test-retest subjects from the "Multi-Modal ReproducibilityResource" 12 , with two additional subjects run under the same protocol in 3 T and 7 T scanners, 20 subjects from the "Nathan Kline Institute Test-Retest" set, 22 subjects from the "Nathan Kline Institute/Rockland Sample", the 12 "Human Language Network" subjects 13 , the Colin Holmes 27 template 14 , two identical twins, and one brain imaging colleague.
T1-w MRI volumes were preprocessed and segmented and then, cortical surfaces were generated using FreeSurfer's standard recon-all image processing pipeline 15,16 . FreeSurfer then automatically labelled the cortical surface using its DKT cortical parcellation atlas 4,17 . Vertices along the cortical surface are assigned a given label based on local surface curvature and average convexity, prior label probabilities, and neighbouring vertex labels. FreeSurfer automatically labelled the cortical surface using its DKT cortical parcellation atlas for 54 of the brains in the Mindboggle-101 data set. The region definitions of the labelling protocol represented by the DKT atlas are described by Desikan et al. 4 . These labels were then manually edited in agreement with the DKT protocol with 31 cortical regions per hemisphere as described by Klein and Tourville 3 . Then, the first 40 brains that labelled were selected to train a new FreeSurfer cortical parcellation atlas representing the DKT protocol (see http:// surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/GcaFormat; S' egonne et al. 17 ; Desikan et al. 4 for details regarding the algorithm that generates the atlas and how it is implemented). The resulting atlas was named "DKT40 classifier atlas" which then automatically generated the initial set of cortical labels for the remaining 47 brains in the data. Finally, Mindboggle data includes non-cortical labels that were converted from the Neuromorphometrics BrainCOLOR subcortex labels (i.e., http://Neuromorphometrics.com/). Details on the original labels included in Mindboggle-101 can be found in https://mindboggle.readthedocs.io/en/latest/labels.html. OASIS-30_Atropos_ template and OASIS-TRT-20_jointfusion_DKT31_CMA_labels_in_OASIS-30_v2 were the template and atlas files that were used for registration to the MNI-ICBM2009 template and manual correction. atlas registration and manual label editing. The Mindboggle-101 average template was first linearly and then non-linearly registered to the symmetric version of MNI-ICBM (MNI-ICBM2009c) template. In both registrations, the Mindboggle-101 template was used as the source image, and the MNI-ICBM2009c template was used as the target image. The linear registration was performed with 9 parameters (-lsq9, 3 for translation, 3 for rotation, and 3 for scaling), using bestlinreg_s2 pipeline from the MINCTools 18 . The resulting image was then nonlinearly registered to MNI-ICBM2009c template using the ANTs diffeomorphic registration pipeline 19 , providing both source and target masks. Using the obtained nonlinear transformation, the Mindboggle-101 atlas labels were also resampled and registered to MNI-ICBM2009c template, using the label resampling option from itk_resample tool (i.e.,-label, applying a nearest neighbor interpolator for discrete labels). The quality of the registration was visually assessed by overlaying the registered Mindboggle-101 and MNI-ICBM2009c templates as well as the registered Mindboggle-101 atlas to ensure accurate transformation of the labels to the MNI-ICBM2009 template. Any remaining inaccuracies were manually corrected on the right hemisphere using the interactive software package Display, part of the MINC Tool Kit, developed at the McConnell Brain Imaging Center of the Montreal Neurological Institute (https://www.mcgill.ca/bic/software/visualization/display, https://github.com/ BIC-MNI). The program allows simultaneous viewing and segmentation in the coronal, sagittal and axial planes, as well as intensity thresholding, label filling, dilation, and erosion. The corrections were performed by A.M., a neurologist with 10 years of experience in reading and assessment of MRIs. Prior to performing these corrections, A.M. received additional training on performing manual segmentations using Display by an anatomist expert both in using Display and MRI segmentation. These corrections mainly involved improvements of boundaries between neighboring regions, addition of missing voxels in some structures (detailed in Table 1) and improving the continuity of voxels within each region. Afterwards, labels were flipped onto the left hemisphere and then visual inspection on each structure was performed. In detail, thickness and boundaries of all 51 cortical and subcortical labels from each hemisphere were improved using intensity thresholds with manual painting using MNI Display. Details on the significant edits that were made for particular structures are provided in Table 1.

Data records
CerebrA probabilistic atlas, including the corresponding T1w template, as well as segmentations of labels are available at G-Node (https://doi.org/10.12751/g-node.be5e62) 20 , TemplateFlow (https://github.com/templateflow/tpl-MNI152NLin2009cSym) and on http://nist.mni.mcgill.ca/?p=904. All imaging data are in compressed MINC 21,22 and NIfTI formats. The registration and resampling scripts, the obtained transformations and the final Mindboggle-101 atlas labels registered to the MNI-ICBM2009 template are also available at https://doi. org/10.12751/g-node.be5e62. We invite contributions by other researchers, in terms of alternative opinions on labeling of included structures.

technical Validation
Comparison between atlases. Dice Kappa similarity index was used to assess the degree of agreement between the CerebrA labels and the original Mindboggle-101 labels, after registration to the MNI-ICBM2009 template. The label agreement before and after manual correction is included to demonstrate the improvement achieved by manual correction. Dice Kappa measures the proportion of the number of voxels that are common between the two masks, over the total number of voxels within the masks, and is defined as: where κ denotes the Dice Kappa coefficient, and V 1 and V 2 denote the two volumes under comparison. A Dice Kappa of 1 implies perfect agreement, whereas a Dice Kappa of 0 implies no overlap between the two masks. When comparing CerebrA to original labels from Mindboggle-101 (Fig. 2) registered to ICBM152, the average Dice Kappa value was κ = 0.73 ± 0.18 (Table 1). The structures with relatively lower Dice Kappa (κ < 0.6) corresponded to the structures that needed the most correction such as the optic chiasm, inferior lateral ventricles, fourth ventricle and cerebellar vermis. The optic chiasm label was barely found in the original Mindboggle-101 registered to ICBM152 and most of it was misaligned with regards to the actual structure. To ensure that this inaccuracy was not caused by the nonlinear registration process, we further inspected the original Mindboggle-101 template and label atlas and found similar issues. For CerebrA, the optic chiasm label was redefined trying to achieve continuity amongst optic chiasma itself and optic tracts (Fig. 3, panel a). Then, the inferior lateral ventricles and fourth ventricle boundaries were improved using a threshold to differentiate CSF from parenchyma (Fig. 3,  panels b and c). And finally, cerebellar vermis labels were redefined for right and left side (Fig. 3, panels d-f).
Another significant change in CerebrA from the original warped labels was the brainstem label definition. The brainstem area was manually redefined for the right side and then flipped in the same procedure as all the labels considering the symmetrical feature of the ICBM152 2009c 2 template. In addition, boundaries between brainstem and fourth ventricle were carefully defined using the CSF intensity threshold, cerebellar white matter labels within the brainstem area were removed and rostral brainstem delimitation was improved (Fig. 4).
Inter-rater and intra-rater variability assessments. To assess intra-rater variability, the rater repeated the process of manual correction for 10 randomly selected regions, and the results were compared against the previously corrected masks. The mean Dice-Kappa values between the two masks were 0.88 ± 0.03. To assess inter-rater variability, the same 10 selected regions were rated a second time by another independent rater and these results were also compared against the previously corrected masks, yielding a mean inter-rater Dice-Kappa of 0.83 ± 0.05.

Volumes of cortical and subcortical structures.
Region volumes were calculated for all cortical and subcortical structures before and after performing the manual correction by summing up the number of voxels within each label (in CCs). These volumes were then log-transformed to achieve normal distribution to enable www.nature.com/scientificdata www.nature.com/scientificdata/ comparisons between the two sets of volumes. Figure 5a shows the correlation plot between the log-transformed CerebrA and Mindboggle volumes. Although the volumes were strongly and significantly correlated (R = 0.9657, P value < 0.001), overall volumes estimated with CerebrA were larger than those estimated with Mindboggle-101 (Fig. 5a). The volumes estimated per structure using Mindboggle and CerebrA segmentation are listed in Table 2. overlap with CSF. Using the CSF mask of the MNI-ICBM152 template, the number of voxels within each label that overlapped with the CSF was calculated to assess which template had a lower overlap with CSF. The four ventricular regions (i.e. lateral ventricles, inferior lateral ventricles, 3 rd and 4 th ventricles) were excluded from this analysis. Figure 5b shows the proportion of overlap of each of the labels with the CSF for each atlas; i.e. the number of voxels in the specific mask overlapping with the CSF mask divided by the total number of voxels inside the specific mask. Overall, Mindboggle-101 showed greater overlap of cortical and subcortical structures with CSF (Fig. 5b). Cerebellar vermal lobule regions (630 and 631) from Mindboggle-101 had the highest degree of overlap (20%) with the CSF, due to a combination of misalignment and over-segmentation errors (see Fig. 2c). www.nature.com/scientificdata www.nature.com/scientificdata/ Fig. 4 Comparison between warped Mindboggle-101 labels and CerebrA labels for the brainstem. The column on the left (purple) represents the original labels from Mindboggle-101 and the right column (coral) represents CerebrA's right sided corresponding labels.