Background & Summary

To study large morphological brain changes associated with ageing and disease, magnetic resonance imaging (MRI) data of the brain are acquired and then analysed, often by segmenting tissue types (i.e. grey matter, white matter, cerebrospinal fluid) and parcellating the brain into regions defined by gyri and sulci. Most publicly available gyral volumes have been derived either automatically or semi-automatically. The latter group typically stems from automated parcellation (or segmentation) followed by manual changes that correct for errors, often due to inaccuracies in the segmentation and/or the parcellation scheme1,2. Because of the manual aspect, they are usually classified as ‘manually-generated’. These datasets are considered the ‘gold standard’ and exist for various brain regions, namely cortical2,3, subcortical4, abnormal (such as tumours or lesions5), and the whole brain (LPBA40 (ref. 6); NeuAtlas, Neuromorphometrics, Inc.), although they are not always available to the public4. In theory, however, manually-generated data should refer only to volumes that are hand-drawn from beginning to end and are therefore free of software-related bias. Because the manual process is very tedious and time-consuming, such datasets are very rare (e.g., MNI-HISUB25 (ref. 7)).

Irrespective of the nature of segmentation, available datasets rarely provide population demographics, details on how the regions have been drawn or obtained, or whether anatomical variations are considered1. Few attempts have been made retrospectively to address these issues2. This situation is challenging for end-users following such tools’ parcellation protocols, particularly when it comes to anatomical variability, as it leaves considerable room for assumptions, misinterpretations, and inconsistent parcellations, all of which are undesirable. Previous work suggests that automated image analysis tools reliant on these protocols go on to produce differing representations of the same, identically named gyrus, rendering interpretation and/or comparisons impossible1,8,9.

For more clarity on these essential aspects, we created a new parcellation protocol10, based on two anatomical atlases11,12. The protocol describes 3 gyri, namely the superior frontal gyrus (SFG), the supramarginal gyrus (SMG), and the cingulate gyrus (CG), and provides methodical step-by-step instructions for identifying each of their borders. Ultimately, all gyri should be defined in a manner that allows for morphological variability to be accounted for, making automated parcellation tools more reliable. Here we focused on these 3 gyri as they are known to exhibit structural changes in ageing13 and dementia14–17 populations, and to differ significantly between sexes13,18.

We provide parcellation of 10 subjects’ left and right SFG, SMG, and CG, which should prove useful to the brain imaging community given how rare manually-segmented datasets are. Although the sample size is small, these data are a good example of the morphological changes that an upper middle-aged healthy population undergoes, and a valuable ground truth for studies investigating the effects of ageing and/or disease on cohorts of similar age. We include complete population demographics and a detailed protocol that accounts for anatomical variability such as interruptions, connections and branching. This 60-gyrus dataset could also be used as a reference (rather than absolute truth) to assess inter-package parcellation differences or to validate a novel or improved parcellation tool, independently of the borders defined in the protocol, because the manually parcellated data allow checking for the anatomical variability we describe.


Dataset Methods


Ten healthy right-handed non-smoking subjects (5 male, 5 female, age range 55–64 years old), not on any medication, were randomly selected from a larger NIH-funded study (NIH grant R01 EB004155) involving 80 healthy subjects. MRI data were collected at the Western General Hospital (Edinburgh, UK) and structural scans were examined by a fully-qualified radiologist, confirming all subjects were in good health.

Data acquisition

The scans and cognitive tests were acquired and administered in 2008–2012 (data summarized in Table 1), prior to the development of community reporting standards; however, all data were systematically collected and reported. The local ethics committee approved the study and informed consent was obtained from each participant. For each of the 10 subjects in this dataset, 4 MRI volumes were obtained: coronal high resolution 3D T1-weighted (T1w), axial T2-weighted (T2w), T2*-weighted and T2 FLAIR. All scans were acquired on a 1.5 T MRI scanner (General Electric, Milwaukee, WI, USA) at the Brain Research Imaging Centre in Edinburgh (UK). Further details can be found in Tables 2 and 3 of the Data Records section.

Table 1 Demographics and cognitive scores of the 10 subjects used for this study, with scores reported in the order in which the tests were administered.
Table 2 A summary of the T1w MRI parameters.
Table 3 A summary of the T2w MRI parameters.

A medical questionnaire and a battery of cognitive subtests from the 4th edition of the Wechsler Adult Intelligence Scale (WAIS-IV19) were administered to each healthy volunteer. Checks were made to ensure that they scored within the normal range (Table 1).

The general practitioner (GP) of every volunteer was contacted twice throughout the study: once to inform them of the subject’s participation and of the study’s details (along with the study’s information sheet), and once more to inform them of the scan’s outcome.

Data preparation

Given the limited contrast between grey matter (GM), white matter (WM) and cerebrospinal fluid (CSF) at 1.5 T, we adopted a multispectral method to enhance the raw subject volumes prior to parcellation (Fig. 1). The method improves anatomical accuracy without altering the intensities in the neighbourhood of the tissue boundaries. To achieve this, we combined the T1-weighted and T2-weighted volumes as detailed in the steps below:

  1. convert all volumes from DICOM to ANALYZE 7.5 format (.hdr and .img files) using in-house software, as we had initially intended to parcellate in Analyze 12 (AnalyzeDirect, Inc.)

  2. convert the coronal T1w volume to an axial T1w volume

  3. flip the T1w volume, using Analyze 12, along the y- and z-axes for neurological orientation

  4. register the T1w volume to the T2w volume using FLIRT20–22. The T1w and T2w volumes are now in radiological convention

  5. flip the T1w and T2w volumes along the y-axis for correct neurological orientation using Analyze 12

  6. apply bias field correction to the co-registered T1w volume in 3D Slicer (version 4.3.1; ref. 23), using the ‘N4ITK MRI bias correction’ module and default N4 parameters. The T1w volume is saved again in radiological convention

  7. flip the T1w and T2w volumes again along the y-axis for correct neurological orientation using Analyze 12

  8. exclude the least occurring intensities in the T1w volume, as shown in the histogram, using Analyze 12

  9. subtract the T2w volume from the T1w volume using Analyze 12’s image calculator

Figure 1: Contrast enhancement using T1w-T2w volume difference (subject 2).

The raw T1w volume (a) had low GM-WM contrast (yellow stars) as well as low GM-CSF contrast (red arrows), particularly visible at the sharp bends in the cortical surface and the small CSF spaces. By registering the subject’s T1w volume to the T2w volume (b), correcting the bias field, and subtracting the T2w volume, we generated a difference volume (c) with enhanced contrast (blue stars and yellow arrows), allowing for simpler and more accurate manual parcellation.
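The last two preparation steps (intensity exclusion and T1w-T2w subtraction) can be illustrated with a minimal sketch. This is not the Analyze 12 procedure itself: the function names, the flat-list representation of a volume, and the `min_fraction` cut-off are assumptions made for illustration only, and the intensities are toy values.

```python
from collections import Counter


def exclude_rare_intensities(volume, min_fraction=0.001):
    """Zero out voxels whose intensity occurs in fewer than
    min_fraction of all voxels.

    `volume` is a flat list of integer intensities. `min_fraction` is a
    hypothetical cut-off, not a value reported in the paper.
    """
    counts = Counter(volume)
    min_count = min_fraction * len(volume)
    return [v if counts[v] >= min_count else 0 for v in volume]


def difference_volume(t1w, t2w):
    """Voxelwise T1w - T2w for two co-registered volumes of equal size."""
    if len(t1w) != len(t2w):
        raise ValueError("volumes must be co-registered and of equal size")
    return [a - b for a, b in zip(t1w, t2w)]


# Toy data: 10 voxels (WM-like, GM-like, CSF-like) plus one outlier (999).
t1w = [100, 100, 100, 60, 60, 60, 20, 20, 20, 999]
t2w = [40, 40, 40, 50, 50, 50, 90, 90, 90, 90]

cleaned = exclude_rare_intensities(t1w, min_fraction=0.2)
diff = difference_volume(cleaned, t2w)
print(diff)
```

In the toy data, tissue that is bright on T1w and dark on T2w moves further from tissue with the opposite pattern after subtraction, which is the contrast gain the difference volume relies on.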

Parcellation Method

The variability of folding patterns is very large, making it a challenge to accurately incorporate them into gyral definitions. The number of folds in a gyrus may increase or decrease, and sulci may experience a combination of branching, connections, interruptions, and absences11. Given that sulci are the landmarks most commonly used for defining gyral borders, their misrepresentation can significantly skew gyral representations, producing false over- or under-estimations of them. It therefore becomes crucial to define parcellation protocols in a manner that is clear and flexible enough to incorporate and reflect all recognised variability, but with consistent reproducibility. Existing protocols fail to do so as they either omit some forms of variability reported in the literature or fail to clarify how a particular form (e.g., sulcal absence, sulcal discontinuity, double sulcal occurrence, etc.) shall be addressed1.

Protocol details

Gyral parcellation is most often equivalent to the parcellation of GM, which is what we endorse in the present parcellation protocol10. Cortical GM is bound by CSF externally and WM internally. To define these two borders, or surfaces, we instruct the user to create two separate paired masks (or borders), one for the GM’s outer border and one for its inner border.

We consulted 2 brain atlases to devise a comprehensive protocol for the SFG, SMG and CG. The first atlas, by Duvernoy12, indicates the general location of each gyrus, in various views and throughout the brain; however, sulcal variability details such as interruptions, connections, and branching are missing. The second, by Ono et al.11, thoroughly describes sulcal patterns and variability, but rarely in relation to adjacent gyri. Despite their variability, sulci are the main gyral delimiters, necessitating a clear understanding of them and of the consequent gyral variations. By combining the valuable details from both anatomical sources (gyral location from the first and the patterns and variability of their delimiters from the second), we developed a single, accurate, consistent and detailed protocol for the three gyri. For each gyrus we first specify the view (axial, sagittal, or coronal) in which it is to be identified and drawn, while naming all gyral borders, mainly sulci. We then provide detailed, step-by-step instructions on drawing the gyrus from start to end, along the direction in which it propagates, in addition to information on the known variations that may be encountered and how to address them. We occasionally resort to a notch or artificial line rather than a sulcus to mark a clear start or end to the segmentation, for the sake of consistency and reproducibility. Illustrations accompany the instructions for clarification purposes.

Because of the gyral folding pattern (frequency and sharpness of turns), software can fail to accurately outline the anatomy (e.g., sulcus, gyrus, or surface). Furthermore, with cortical thickness ranging from 1 mm to ~5 mm24,25, it becomes more difficult to identify the grey and white matter boundaries due to partial volume effects. We therefore used multiple segments to represent these boundaries, while accurately following each rise and fall in the cortical surface.

Parcellation details

For each of the 10 subjects, we first loaded the enhanced difference volume (T1w-T2w), of voxel size 1 × 1 × 2 mm, in MRIcron (version 22DEC2015). We then manually traced the paired masks representing outer and inner GM borders for the SFG, SMG and CG in both the right and left hemispheres following the pre-defined protocol10. Importantly, this was done for every discontinuity and every sharp change in curvature. As a result, many segments were required to outline a single gyrus. The segments were merged to form a single volume for each gyrus. The workflow in Fig. 2 illustrates how the derived data were made. We identified a total of 66 sulcal (CS and SFS) discontinuities in the axial plane at the SFG, 24 gyral discontinuities in the sagittal plane at the SMG, and 103 sulcal (CS) discontinuities in the sagittal plane at the CG, all of which are a result of the folding nature of cortical gyri (Table 4). We also identified 6 double sulcal (CS) occurrences, and therefore 6 double gyral (CG) occurrences. These anatomical variations (sulcal discontinuities and double sulcal occurrences) will have influenced our gyral parcellations.

Figure 2: Workflow depicting the stages followed to create the derivative masks.

From the source T1w and T2w data, difference volumes were generated and gyral borders were outlined using multiple segments. The segments were then combined into a single 4D and a single 3D file for each gyrus (top). For comparison purposes, a multispectral segmentation of grey and white matter tissue was also performed using SPM12 (bottom).
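As a rough illustration of how many hand-drawn segments can be combined into one 3D mask and one 4D stack per gyrus, the sketch below represents each segment as a set of voxel coordinates. It is not the Matlab code released with the data; the function names and the set-based representation are assumptions chosen to keep the example self-contained.

```python
def merge_segments(segments):
    """Combine per-segment masks into one 3D mask (their union).

    Each segment is a set of (x, y, z) voxel coordinates.
    """
    merged = set()
    for segment in segments:
        merged |= segment
    return merged


def stack_segments(segments):
    """Keep segments separate, analogous to a 4D file in which the
    fourth dimension indexes the segment number."""
    return [frozenset(segment) for segment in segments]


# Toy example: two adjoining segments that share one voxel.
segment_a = {(0, 0, 0), (1, 0, 0)}
segment_b = {(1, 0, 0), (2, 0, 0)}

mask_3d = merge_segments([segment_a, segment_b])
mask_4d = stack_segments([segment_a, segment_b])
print(sorted(mask_3d), len(mask_4d))
```

Keeping the stacked form alongside the union preserves which voxels belong to which segment, which matters when a metric is computed per paired segment rather than per gyrus.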

Table 4 A summary of anatomical variations observed in our dataset.

A standard approach for validating a parcellation protocol is to obtain equivalent manual segmentations from several experts. We instead sought to validate the consistency of our manual parcellation, i.e., the anatomical landmarks defining the gyral borders, which are known to vary across hemispheres and subjects. This alternative method is very similar to that of Klein and Tourville2, where the authors first automatically parcellated the brains using their older protocol26, then followed it with manual corrections based on the anatomical variability they detected in each of the subjects, which the automated method had failed to identify. Here, after reviewing the 2 anatomical brain atlases11,12 and writing the protocol, the first author (SM) manually segmented and parcellated all 10 subjects’ regions of interest. They were then revised for landmark consistency and accuracy by 2 experts, CP and GM, who were blinded to the protocol. When inconsistency was found across subjects, the protocol was amended, and the regions were redrawn and reviewed again. Consecutive revisions by the 3 authors and protocol updates continued until an agreement was reached on the dataset’s anatomical accuracy and consistency as well as the comprehensiveness of the protocol with regard to variability.

Because we did not seek to validate the tissue segmentation itself (grey matter-cerebrospinal fluid and grey matter-white matter borders), but the consistency of the parcellation scheme, the signal intensity (after enhancement) and spatial resolution of 1.5 T MRI are adequate for this task.

Code Availability

The Matlab (R2016a) code used to generate the combined segments as derivative files can be found alongside the data (Data Citation 1).

To validate the manual parcellation of the gyri of interest, we computed the mean thickness of each (GMth), using all paired segments, and compared them to FreeSurfer version 5.1’s outputs. This was done using the Masks2Metrics (M2M) software version 1.0 (refs 27,28), freely available to all users under the GNU General Public License. The latest version of the software is available on GitHub.

Data Records

All the data used and created by this study are available in the Edinburgh DataShare repository (Data Citation 1). Data are organized following the Brain Imaging Data Structure (BIDS29), with the T1- and T2-weighted volumes as source and the parcellation volumes as derivatives. Tables 2 and 3 summarize the MRI parameters and Table 5 summarizes the additional ROI details. The number of paired segments ranged from 24 to 59 for the SFG, 3 to 16 for the SMG, and 16 to 54 for the CG. This number tended to increase with the degree of cortical folding as well as with cortical variability such as discontinuities and double gyrus occurrences.

Table 5 The number of paired WM and GM segments varied across subjects and hemispheres depending on cortical variability and the degree of folding.

The grey matter thickness, grey matter volume, and white matter surface area information for each parcel is available in Data Citation 2 (SuperiorFrontalGyrus.tsv, SupraMarginalGyrus.tsv, and CingulateGyrus.tsv). Columns 2–10 contain FreeSurfer-derived metrics, columns 11–19 contain Masks2Metrics-derived metrics, and columns 20–22 contain mean modified Hausdorff distance (MMHD) metrics. Further metrics details can be found in the Technical Validation section.
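The column layout described above can be read with a few lines of Python. This is only a sketch: it assumes each .tsv file has a single header row and that the column positions match the 1-indexed ranges given in the text; the header names in the toy example (`c1` to `c22`) are placeholders, not the real column names.

```python
import csv
import io


def split_metric_columns(row):
    """Split one data row into the three documented column groups.

    Columns are 1-indexed in the text, so columns 2-10 map to row[1:10],
    columns 11-19 to row[10:19], and columns 20-22 to row[19:22].
    """
    return {
        "freesurfer": row[1:10],
        "m2m": row[10:19],
        "mmhd": row[19:22],
    }


def read_metrics(tsv_text):
    """Parse tab-separated text with one header row (an assumption)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    return header, [split_metric_columns(row) for row in reader]


# Toy table: placeholder header names and one row holding the values 1..22.
toy_header = "\t".join(f"c{i}" for i in range(1, 23))
toy_row = "\t".join(str(i) for i in range(1, 23))
header, rows = read_metrics(toy_header + "\n" + toy_row + "\n")
print(rows[0]["mmhd"])
```

For a real file, the same function could be fed the file contents, for example `read_metrics(open("CingulateGyrus.tsv").read())`, provided the header assumption holds.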

Technical Validation

As mentioned in the ‘Parcellation details’ section, we validated the protocol using a blinded review of the data, checking between-subject consistency whilst still considering all known forms of cortical variability. A scanner strength of 1.5 T was sufficient as we did not seek to validate the grey matter-cerebrospinal fluid and grey matter-white matter border segmentation itself, but the consistency in parcellation and border identification despite all sorts of cortical variability, across hemispheres and subjects.

To quantitatively assess the data parcels, we compared the average thickness, volume and surface area of each gyrus/sulcus from our parcellation to those of FreeSurfer (version freesurfer-Linux-centos4_x86_64-stable-pub-v5.1.0). Although FreeSurfer does not account for anatomical variations as much as our protocol does1, it still provides a valuable comparison, as variations should not be substantial, in particular for average thickness30. We also computed the MMHD which, like FreeSurfer’s thickness, is computed by averaging the shortest distance from each voxel on one segment to the other segment, in both directions. M2M, on the other hand, measures the perpendicular from one segment to the other, in both directions. We computed this distance for the inner and outer GM masks and compared it to those of the other two methods (Table 6).
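The shortest-distance averaging described above can be sketched as follows. The code is an illustration under stated assumptions, not the authors' implementation: segments are given as lists of point coordinates, and the two directed averages are themselves averaged, which is one reading of "in both directions" (the classic modified Hausdorff distance instead takes the larger of the two directed averages). The function names are hypothetical.

```python
import math


def directed_mean_min_distance(a, b):
    """Average, over points in `a`, of the distance to the nearest point in `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def mean_modified_hausdorff(a, b):
    """Average of the two directed mean shortest distances (assumed form)."""
    return 0.5 * (
        directed_mean_min_distance(a, b) + directed_mean_min_distance(b, a)
    )


# Toy 2D example: an inner and an outer border 3 units apart.
inner = [(0, 0), (1, 0), (2, 0)]
outer = [(0, 3), (1, 3), (2, 3)]
thickness = mean_modified_hausdorff(inner, outer)

# Asymmetric case: the two directed averages are 3.0 and 4.0.
asym = mean_modified_hausdorff([(0, 0)], [(0, 3), (4, 3)])
print(thickness, asym)
```

Because each point is matched to its nearest neighbour rather than along the border normal, this measure can never exceed a perpendicular measurement taken between the same two borders when the perpendicular hits the opposite border.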

Table 6 Gyral metrics, averaged for the left and right hemispheres, as measured by M2M, FreeSurfer (FS), and mean modified Hausdorff distance (MMHD).

We ran FreeSurfer using default settings to process each subject’s T1w NIfTI volume, then limited our investigations to the output of the Desikan-Killiany protocol26 (specifically the SFG, CG and SMG parcellations) and corresponding measurements. FreeSurfer developers recommend that manual checks and corrections typically follow automated parcellation; however, they were omitted in our case for two main reasons: (1) so as not to introduce human error/bias, and (2) the corrections would not have a drastic effect on ROI average thickness, while equalizing volume and surface area because of our edits (detailed under ‘Data Preparation’). For the SFG and SMG, metrics for the corresponding FreeSurfer labels were used, ‘superiorfrontal’ (label 80) and ‘supramarginal’ (label 83) respectively, while for the CG, 4 FreeSurfer labels (‘rostralanteriorcingulate’, ‘caudalanteriorcingulate’, ‘posteriorcingulate’ and ‘isthmuscingulate’, or labels 78, 55, 75 and 62) were used to ensure a ‘like-for-like’ comparison.

Generally, gyrus thicknesses, as derived by the 3 methods, are in agreement with lateral (3.5 mm), medial (2.7 mm) and overall (2.5 mm) cortical thicknesses measured in post-mortem brains25 (Fig. 3). A percentile bootstrap on median differences showed no difference for the SFG between M2M and FreeSurfer (median difference 0.02 [−0.08 0.13] p = 0.68), and lower M2M estimates for the SMG (median difference 0.18 [0.09 0.3] p = 0.001) and CG (0.51 [0.36 0.59] p = 0.001). These last two differences are mainly due to the large variability in the ROIs’ bordering landmarks. Furthermore, the CG consisted of a large number of short segments implying that fewer perpendicular M2M thickness measurements were nonzero compared to the corresponding shorter-distance measurements of FreeSurfer. Because both the MMHD and FreeSurfer seek the shortest distance, their results should be comparable. FreeSurfer measurements are however made in 3D while our approaches rely on 2D leading to an overestimation (SFG median difference 0.18 [0.06 0.27] p = 0.006; SMG median difference 0.31 [0.13 0.45] p = 0.001; CG median difference 0.28 [0.12 0.52] p = 0.002). Together these results indicate that accounting for variability by creating segments is essential so as to not over-estimate thickness.
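A percentile bootstrap on median differences, as used for the comparisons above, can be sketched in a few lines. This is a simplified illustration, not the released Matlab analysis: it assumes the two methods are compared on paired measurements, uses 2000 resamples and a 95% interval by default, and omits the p-value. The thickness values are made-up toy numbers, not data from the study.

```python
import random
import statistics


def bootstrap_median_difference(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Median of paired differences x - y with a percentile bootstrap CI.

    Pairing, n_boot, alpha and seed are illustrative assumptions.
    """
    diffs = [a - b for a, b in zip(x, y)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the differences with replacement and record each median.
    boot = sorted(
        statistics.median(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lower = boot[int((alpha / 2) * n_boot)]
    upper = boot[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(diffs), (lower, upper)


# Toy thickness values (mm) for two hypothetical methods on 10 parcels.
method_a = [2.8, 3.0, 3.1, 2.9, 2.7, 3.3, 2.6, 3.0, 3.2, 3.1]
method_b = [2.6, 2.7, 2.9, 2.8, 2.4, 3.0, 2.5, 2.6, 3.0, 2.8]

estimate, (ci_low, ci_high) = bootstrap_median_difference(method_a, method_b)
print(estimate, ci_low, ci_high)
```

An interval that excludes zero, as in this toy case where every difference is positive, is what supports a statement such as "lower estimates for one method"; an interval spanning zero corresponds to "no difference".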

Figure 3: Cortical thickness measurements as calculated by Masks2Metrics, FreeSurfer and the mean modified Hausdorff distance (MMHD), along with corresponding non-parametric density estimates of the thickness (the thick lines represent the median).

The regions measured by the tools, from left to right, are the SFG (a), SMG (b), and CG (c).

Since our data account for anatomical variations such as a double CG, which we encountered in 6 of the 10 subjects’ hemispheres, significantly larger mean CG GM volume (Fig. 4c) and in turn mean WM surface area (Fig. 4f) measurements are observed in the manually-derived parcels compared to their corresponding FreeSurfer counterparts (median volume difference 6304.2 [3627.08 8806.70] p = 0.002; median surface area difference 2193.34 [1276.27 3179.56] p = 0.002). In the event of a double CG, the SFG on the medial surface ‘loses’ its inferior-most fold to the CG in our protocol (Fig. 5a), compared to FreeSurfer (Fig. 5b–d). This explains why our manually-derived SFG WM surface areas are smaller than their corresponding FreeSurfer-derived ones (Fig. 4d; median difference 1379.99 [529.01 2468.41] p = 0.002), and to a lesser extent also GM volumes, although not significantly (Fig. 4a; median volume difference 632.61 [−3105.74 3680.1] p = 0.63). The double cingulate occurrences in our cohort also explain the wider distributions for both the CG (Fig. 4c,f) and SFG (Fig. 4a,d).

Figure 4: Volume (GMvol) and surface area (WMsa) measurements computed by M2M and FreeSurfer, along with their corresponding non-parametric density estimates (thick lines represent the median).

SFG metrics (a,d) are most similar between the two techniques, although a larger metric distribution is seen at both the SFG (a,d) and CG (c,f), mainly due to cortical variability in the cingulate sulcus which is not always accounted for by FreeSurfer. The greatest disagreement between the methods is seen at the SMG where we observed smaller parcel volumes (b) and surface areas (e) with M2M than with FreeSurfer.

Figure 5: A demonstration of protocol differences stemming from cortical variability, as seen in several subjects.

With our protocol, the double cingulate sulcus scenario for subject 5’s left hemisphere, shown in the T1w-T2w difference volume, contributes to a CG with two folds (a), whereas with FreeSurfer’s protocol, the superior fold is mostly (inside the yellow box) a part of the SFG (b). Parts of the upper CG fold are similarly omitted by FreeSurfer in the left hemispheres of subjects 1 (c) and 6 (d).

The greatest disagreement between the two methods was evident when calculating SMG metrics. This is most likely due to the inferior border of the SMG in our protocol being more superior than that of FreeSurfer, lending to smaller manually-segmented parcels, and therefore smaller parcel volumes (Fig. 4b) and inner mask surface areas (Fig. 4e) with M2M than with FreeSurfer (median volume difference 6588.74 [4165.12 7873.76] p = 0.002; median surface area difference 2598.92 [2012.17 2954.39] p = 0.002).

Not only do parcellation schemes (and tools) have direct implications on the morphometrics of the regions they outline, but also on any concomitant analyses. To demonstrate this, we conducted a regression analysis on the three CG metrics (thickness, volume and surface area), as measured by M2M and FreeSurfer, with respect to the National Adult Reading Test (NART) score. The NART is a WAIS-IV subtest (reported as NART50 in Table 1 and available with the MRI data).

The regression coefficients did not differ between our parcellation and FreeSurfer’s for CG thickness (highest density interval (HDI) of the difference: [−0.001 0.0006], Fig. 6a), but were statistically different for both CG volume (HDI: [−0.0016 −0.0021], Fig. 6b) and surface area (HDI: [−0.0025 −0.0002], Fig. 6c), with no association using either parcellation scheme.

Figure 6: Regression analysis on CG metrics.

CG thickness (a), volume (b), and surface area (c) as measured by M2M and FreeSurfer with respect to NART scores.

It is understandable that when dealing with large datasets it is not possible to manually segment the entire ground truth. However, from what we have observed with our small cohort, cortical variability is not rare. It is therefore crucial to be aware of the variability details of any regions of interest, and when working with automated tools to check for variability considerations to best assess the implications this may have on the results, if any. The data presented here provide in that sense a good testing ground for automated MRI parcellation.

Usage Notes

All previously described data are freely available at the Edinburgh DataShare repository (Data Citation 1) under the CC BY license.

Masks2Metrics is a tool that is freely available on GitHub under the GNU General Public License (the archived version used for the results presented is available at the Edinburgh DataShare repository27). The tool has also been published in the Journal of Open Source Software (JOSS)28.

Results for the Technical Validation section were derived by running our Matlab code (Matlab_code_to_derive_stats_and_figs.m, Data Citation 2) which uses the parcel metrics of Data Citation 2 (SuperiorFrontalGyrus.tsv, SupraMarginalGyrus.tsv, and CingulateGyrus.tsv).

Additional information

How to cite this article: Mikhael, S.S. et al. Manually-parcellated gyral data accounting for all known anatomical variability. Sci. Data. 6:190001 (2019).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.