Introduction

Neurodevelopmental disorders (NDD) are highly prevalent among children aged 3 to 17 in the United States, affecting approximately 17% of this population1. Recent advancements in genetic technologies have led to the identification of genetic causes in 15–53% of NDD cases2, but comprehensive phenotypic evaluation, including laboratory tests and neuroimaging, remains crucial for precise genetic diagnosis. Ultimately, these evaluations provide insights into the underlying mechanisms and potential biomarkers associated with these disorders.

Since the advent of high-resolution, three-dimensional (3D) structural magnetic resonance imaging (MRI) of the human brain, brain morphometric analysis has been widely applied in various neurodevelopmental diseases3,4,5,6 and the features, such as regional volume and thickness, have been developed as biomarkers of disease states or treatment responses7,8. Additionally, repeated MRI scans can typically be made on the same individual, due to their non-invasive and radiation-free characteristics, enabling the visualization of longitudinal changes in normal brain development and the distinguishing of atypical trajectories in pediatric patients with neurodevelopmental diseases9,10.

To conduct brain morphometric analysis effectively in both research and clinical settings, precise segmentation of T1-weighted brain MRI into anatomical regions is an essential component of quantitative analysis11. Given that manual delineation of structural parameters is both labor-intensive and prone to inter-rater variability, standardized and automated processing approaches such as mni_autoreg12, SPM13, Freesurfer14,15, and FSL16,17 are widely used to label novel target images in adults. Nevertheless, the pediatric brain is substantially distinct from its adult counterpart. Consequently, these templates, predominantly derived from adult data, may not be ideally suited for pediatric applications11.

Owing to these challenges, new methods are continually being developed to address the constraints inherent to existing template-based approaches. Recently, deep learning has emerged as a promising methodology for precise brain segmentation18. Deep learning encompasses neural networks exceeding five layers, which facilitate the extraction of hierarchical features directly from raw images. Given their capability for autonomous learning, these networks demonstrate remarkable outcomes and broad generalizability when trained on extensive datasets11,18,19,20. In the field of neonate and infant brain segmentation, including whole brains or lesions, convolutional neural networks (CNNs) are the most commonly used11. In this context, a state-of-the-art deep-learning based segmentation (DLS) methodology, named VUNO Med-Deep Brain, has been introduced in adult cohorts with Alzheimer’s disease21.

In a previous study conducted by our team, we utilized the Freesurfer software coupled with manual adjustments to execute brain morphometric analyses. This yielded the first comprehensive findings on developmental brain alterations and volumetric disparities in regional structures among epilepsy patients harboring an SCN1A gene mutation22. The SCN1A gene (MIM#182389), responsible for encoding the alpha 1 subunit of the voltage-gated sodium channel, has mutations that can lead to an array of neurodevelopmental disorders, including epilepsy23.

To ascertain the efficacy of the newly devised deep-learning based morphometric analysis for pediatric cohorts, we compared the structural parameters derived from the DLS method for healthy children aged under 11 with those obtained from the Freesurfer software supplemented by manual corrections. For a more granular assessment of the age-specific accuracy of the DLS approach, we performed subgroup analyses focusing on three distinct age groups: under two years, between two to six years, and older than six years. Furthermore, to gauge the robustness of the DLS technique, we juxtaposed regional volumes between pediatric patients (below 11 years) with an SCN1A mutation and their healthy counterparts.

Results

Baseline demographics of patients and control group

Twenty-one patients with an SCN1A mutation and 42 healthy controls, previously selected and analyzed by our center, were re-analyzed in this study (age range, 2.0–10.5 years). To delineate the age-specific performance of the DLS method, we classified the control group into three age subgroups: age ≤ 2 years (n = 12, 28.6%), 2 < Age ≤ 6 years (n = 18, 42.9%), and 6 < Age ≤ 10 years (n = 12, 28.6%). The baseline demographics are shown in Table 1 of a previous study22.

Table 1 Whole brain volume analysis between two methods, DLS and Freesurfer with manual correction methods in the control group.

Brain volume analysis in control group measured by two different methods

Whole-brain volume analysis

The volumes of total brain, total gray matter, cortical gray matter, and total white matter measured by the DLS method were not different from those measured by Freesurfer with manual correction. The volume of subcortical gray matter measured by DLS was significantly smaller, and the volume of the total cerebellum measured by DLS was significantly larger, than that measured by Freesurfer with manual correction (Table 1 and Fig. 1).

Figure 1

Whole brain volume analysis in 42 healthy controls by the DLS method and Freesurfer with manual correction. (A) Whole brain volume of overall healthy controls. (B) Whole brain volume of healthy controls with age ≤ 2 years. (C) Whole brain volume of healthy controls with 2 < age ≤ 6 years. (D) Whole brain volume of healthy controls with age > 6 years. The blue dots show the volume measured by Freesurfer with manual correction; the red dots show the volume measured by the DLS method. **p-values < 0.001.

After subgroup analysis according to age, the volume differences of the subcortical and cerebellar gray matter between the two methods were revealed to be consistent in all three subgroups.

Cortical parcellated volume analysis

Among the 68 parcellated areas measured by the two methods, volume differences were found in only 7 areas (Table 2). The volumes of the right caudal middle frontal, right frontal pole, left inferior parietal, left fusiform, right lateral occipital, and right insular cortex were significantly smaller, whereas the left parahippocampal volume was significantly larger, when measured by DLS than when measured by Freesurfer with manual correction. In the subgroup analysis, the two methods showed a significant volume difference in the right insular cortex in the ≤ 2 years age group.

Table 2 Cortical volume analysis between the two methods, DLS and Freesurfer with manual correction methods in control group.

Subcortical volume analysis

The volumes of both thalami, both putamina, and the left caudate differed significantly between the DLS method and Freesurfer with manual correction (Table 3 and Fig. 2). In the ≤ 2 years age group, both thalami were significantly larger and the right putamen was significantly smaller when measured by the DLS method than when measured by Freesurfer with manual correction. These differences were also observed in the right thalamus in the 2 < Age ≤ 6 years group and in the right putamen in the 6 < Age ≤ 10 years group.

Table 3 Subcortical volume analysis between the two methods, DLS and Freesurfer with manual correction methods in control group.
Figure 2

Subcortical volume analysis in 42 healthy controls by the DLS method and Freesurfer with manual correction. (A) Whole brain volume of overall healthy controls. (B) Whole brain volume of healthy controls with age ≤ 2 years. (C) Whole brain volume of healthy controls with 2 < age ≤ 6 years. (D) Whole brain volume of healthy controls with age > 6 years. The blue dots show the volume measured by Freesurfer with manual correction; the red dots show the volume measured by the DLS method. **p-values < 0.001.

Group comparison of each volume between patients with a SCN1A mutation and healthy control measured by DLS

Whole-brain volume analysis

After adjusting for sex, age, and ICV in each group, the volumes of the total brain, total gray matter, cortical gray matter, subcortical gray matter, and total white matter were significantly smaller in the patient group than in the controls (Table 4 and Fig. 3A), consistent with a previous study22.

Table 4 Group comparison of whole brain, subcortical and cortical parcellated volumes by deep-learning based segmentation (DLS) method.
Figure 3

Group comparison of whole brain (A) and subcortical (B) parcellated volumes by the deep-learning based segmentation (DLS) method. The blue dots show the volumes of healthy controls; the red dots show the volumes of SCN1A patients. **p-values < 0.001.

Cortical parcellated volume analysis

The 34 cortical parcellated regions per hemisphere were compared between the two groups, healthy controls and patients with an SCN1A mutation (Table 4 and Supplementary Table 1). Compared with healthy controls, patients showed significantly decreased volumes in the bilateral lateral orbitofrontal, precentral, and inferior parietal cortices, and in the right isthmus cingulate, right middle temporal, left Banks of the STS, left parahippocampal, and right insular cortex.

Subcortical volume analysis

A subcortical volume analysis was performed to compare the volumes of the subcortical structures (thalamus, caudate, putamen, pallidum, accumbens area) between the patients and the controls (Table 4, Supplementary Table 1 and Fig. 3B). The patients showed significantly smaller volumes of the bilateral thalami, putamen, and caudate, and of the right pallidum, than the healthy controls.

Segmentation performance of the DLS method

To assess the performance of our segmentation algorithm, we computed the Dice similarity coefficient (DSC), along with precision and recall, across whole, cortical, and subcortical brain regions (Table 5). Cortical gray matter showed relatively low performance compared with other whole brain areas. In the cortical and subcortical analyses, specific areas such as the frontal pole, entorhinal cortex, temporal pole, and nucleus accumbens exhibited suboptimal performance, with values falling below 0.7.

Table 5 Mean dice score coefficient (DSC), precision, and recall values between two methods, DLS and Freesurfer with manual correction methods in control, SCN1A patients, and total patients.

Discussion

In pediatric populations, accurate brain segmentations of MR imaging can help find a diagnostic biomarker for a specific neurological disease, define a clinical course, or identify the underlying developmental patho-mechanism of various neurodevelopmental disorders24. However, accurate segmentation of pediatric brain MR imaging is challenging due to the reduced tissue contrast, increased noise, several partial volume effects, and ongoing white matter myelination25,26. This study aimed to demonstrate the performance of the DLS method for brain segmentation of MRI in a pediatric population aged under 11 years.

To investigate the accuracy of the DLS method in measuring regional brain volume, the volume of whole brain (Table 1 and Fig. 1), parcellated cortical volumes (Table 2), and segmented subcortical volumes (Table 3 and Fig. 2) were measured by a DLS method in healthy controls and compared to previously measured volumes using Freesurfer with a manual editing method.

Importantly, using the DLS method, the volumes of the total brain, total cerebral gray matter, cortical gray matter, and total white matter were consistent with those measured by Freesurfer with manual editing (Table 1 and Fig. 1). In particular, the DLS method successfully delineated the gray and white matter and parcellated the total cortical gray matter (Table 1 and Fig. 1) as Freesurfer did (Table 2), indicating good performance of the DLS method for cortical volume analysis.

The volumes of only seven areas, namely the right caudal middle frontal, right frontal pole, left inferior parietal, left fusiform, left lateral occipital, right insular, and left parahippocampal cortex, measured by the DLS method were discordant with those measured by Freesurfer with manual correction. After a subgroup analysis, the differences were resolved in the Age > 2 years group.

Since CNNs were first introduced in 198927, interest in their ability to segment the neonate and infant brain has grown through two large-scale competitions using standardized open data sets: the Neonatal Brains Segmentation Challenge and the 2017 iSeg 6-month Infant Brain Magnetic Resonance Imaging Segmentation Challenge. These challenges concluded that the CNN approach, using deep-learning methods, could solve neonate and 6-month-old brain tissue segmentation with a respectable Dice similarity coefficient of 72.5–73.5%6,28,29.

Recently, the UNC/UMN Baby Connectome Project (https://iseg2017.web.unc.edu/baby-connectome-project/) has been characterizing brain and behavioral development in typically developing infants across the first 5 years of life by analyzing structural segmentation and functional connectivity30. These data suggest that CNNs can be applied to the segmentation of young children's brains and that they are particularly effective in whole brain and cortical gray matter volume analysis. In addition, the recently released deep learning-based, infant-dedicated cortical surface reconstruction pipeline, iBEAT V2.0, has successfully processed images from various imaging protocols and scanners31.

However, there were volume differences between Freesurfer with manual editing and the new DLS method in the cerebellum and subcortical gray matter, including both thalami and the right putamen. In the subgroup analysis, these differences were evident mainly in children under two years of age and, as in the cortical parcellated volume analysis, resolved in the older age groups. Supplementary Fig. 1 provides several illustrative examples, highlighting enhancements in achieving consistent and accurate segmentation in specific brain areas using deep learning segmentation (DLS) methods. Notably, the examples include incorrect segmentation from subcortical gray matter (GM) to white matter (WM) (Supplementary Fig. 1A), a noisy boundary in cerebellar segmentation (Supplementary Fig. 1B), and smaller segmented volumes in the putamen and pallidum areas (Supplementary Fig. 1C and D), when measured by Freesurfer with manual editing. These findings imply that the DLS approach could be instrumental in addressing the challenges of precise segmentation of complex structures, which is a limitation observed with Freesurfer.

Traditionally, atlas-based methods32,33,34, which match intensity information between an atlas and target images, and pattern recognition methods35,36, which classify tissues based on a set of local intensity features, are the classical approaches to automated segmentation11. Unfortunately, these methods have been shown to provide inaccurate segmentation for pediatric brain37,38 due to an inappropriate template, customized to adult brains with low intensity contrasts and high shape variability of each regional structure, including the thalamus, hippocampus, parahippocampal areas, and insular cortex. For these reasons, volume differences measured between these two methods can be observed in these areas. Additionally, small sample sizes with large dispersions also contribute to the volume differences measured between the two methods.

To investigate the ability of the DLS method to identify brain morphometric abnormalities in patients with an SCN1A mutation, we compared the volumes between patients and healthy controls using the DLS method (Table 4). In whole brain analysis, the volumes of total brain, total gray matter, cortical gray matter, subcortical gray matter, and total white matter were significantly decreased in patients with an SCN1A mutation and related epilepsy compared with those of healthy controls, and these results were consistent with our previous study22. In addition to the cortical and subcortical areas that showed reduced volume in patients using Freesurfer software with manual correction22, both thalami, the right pallidum, right lateral orbitofrontal, right paracentral, right inferior parietal, left Banks of the STS, left parahippocampal, and right insular cortex were significantly smaller in patients with SCN1A mutation-related epilepsy than in healthy controls in this study.

The banks of the STS and the parahippocampal cortex are subnetworks of the default mode network (DMN), and the thalamus and insular cortex are areas correlated with the DMN in resting-state functional MRI studies39,40. These structural alterations of the DMN and DMN-associated areas in patients with an SCN1A mutation are consistent with the results of other studies22,41.

For cortical parcellation, FreeSurfer generates a white matter (WM) surface and a pial surface for each hemisphere. The WM surface is generated from a segmentation mask, and a copy of the WM surface is deformed towards the cerebrospinal fluid (CSF)/gray matter (GM) boundary to eventually form a pial surface. The cortical surface is mapped to a spherical atlas, and the probabilities for each cortical region are calculated with Bayesian estimations. As the parcellation pipeline includes intensive computation such as registration and surface reconstruction, the whole processing time is around 7 h, depending on the computation environment.

Since DeepBrain uses a fully trained deep-learning model for the parcellation pipeline, replacing manually designed algorithms such as cortical surface generation, the computation time can be significantly reduced. The total computation time for whole brain parcellation was less than 1 min, which is far shorter than that of traditional methods such as FreeSurfer.

Despite several limitations, namely lack of validation using standardized open datasets or other datasets from multiple resources with dice coefficient, and a small number of patients, the deep-learning-based method with CNN could provide a key solution for accurate pediatric brain segmentation. Importantly, accurate and quick segmentation by CNN may identify the normal trajectories of developing brains, leading ultimately to the early detection and treatment of various neurodevelopmental diseases.

In a pediatric population, this new, fully automated DLS method is compatible with the classic volumetric analysis using Freesurfer software and manual correction, and it can also detect segmental brain volume changes in children with an SCN1A mutation. Further validation using larger population data may confirm that this fully automated DLS method is a good and easy tool for accurate brain segmentation in pediatric populations.

Methods

Subjects

The 21 patients with epilepsy and an SCN1A mutation, who were under 11 years of age, and the 42 healthy controls, who had participated in our previous study, were reinvestigated by the DLS method22. Patients with epilepsy and an SCN1A mutation were recruited from the pediatric neurology clinics of three medical centers in Korea: Asan Medical Center, Samsung Medical Center, and Seoul National University Children’s Hospital. We included patients who satisfied the following inclusion criteria: (i) epilepsy diagnosed by a pediatric neurologist; (ii) a genetically confirmed SCN1A mutation; and (iii) a normal brain MRI. For each patient, two healthy control subjects, matched in age and sex and without known neurologic deficits, were recruited.

MRI acquisition

MRI scans were obtained on a Philips Achieva 3.0 T scanner (Philips Healthcare, Eindhoven, The Netherlands) (n = 114) and a Siemens MAGNETOM Verio 3.0 T scanner (Siemens AG, Erlangen, Germany) (n = 6). On the Philips Achieva scanner, three-dimensional whole brain T1-weighted sagittal images of the entire brain were acquired with one of two parameter sets: (i) echo time (TE) = 4.6 ms, repetition time (TR) = 9.8 ms, flip angle (FA) = 8.08, field of view (FOV) = 224 × 224 mm, matrix = 256 × 256, slice thickness = 1 mm, in-plane resolution 1.0 mm × 1.0 mm; or (ii) TE = 5.1 ms, TR = 25 ms, FA = 30, FOV = 220 × 220 mm, matrix = 512 × 512, slice thickness = 1 mm, in-plane resolution 1.0 mm × 1.0 mm. On the MAGNETOM Verio scanner, sagittal images of the entire brain were obtained with TE = 1.9 ms, TR = 1500 ms, FA = 9.0, FOV = 220 × 220 mm, matrix = 256 × 256, slice thickness = 1 mm, in-plane resolution 1.0 mm × 1.0 mm. Prior to data processing, all raw T1-weighted images were visually inspected for common MR imaging artifacts.

Image analysis

We analyzed T1-weighted MR images using two automated segmentation software packages, VUNO Med-DeepBrain (version 1.0.1, VUNO Inc., Seoul, South Korea)42,43 and FreeSurfer (version 5.3.0, https://surfer.nmr.mgh.harvard.edu)42,43. FreeSurfer is a publicly available software package for brain analysis that provides automated brain segmentation and cortical parcellation. It uses a probabilistic atlas generated from manually segmented brain MR images to train a Bayesian segmentation algorithm.

VUNO Med-DeepBrain is based on a DLS system, unlike the atlas-based segmentation methods used in FreeSurfer and NeuroQuant. The DLS system provides quantitative information, including the volumes of 104 regions and the cortical thickness of 68 cortical regions (34 for each hemisphere) from the T1-weighted brain MRI, and white matter hyperintensity (WMH) regions from the T2-weighted brain MRI. The DLS system uses an in-house segmentation model built on convolutional neural networks (CNNs) with dilated convolution layers instead of max-pooling, to minimize feature loss in small brain regions during spatial dimension reduction. The model was originally trained with adult MR images, but we fine-tuned it with additional pediatric brain images obtained from the OpenfMRI dataset44,45. The fine-tuning dataset consists of 249 images (female: 156, male: 143, age: 5.01–19.22, σ = 4.33). The input is a conformed T1-weighted brain MRI, and the outputs are segmentation masks of the brain regions and the associated volumes of 104 regions in total. The CNNs were trained using the ADAM optimizer and a generalized Dice loss function. The DLS system also includes a 3D CNN model that provides the intracranial volume segmentation, which was used to normalize the volume measurements for statistical analysis.
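The generalized Dice loss mentioned above can be illustrated with a minimal sketch. It assumes the commonly used formulation in which each class is weighted by the inverse square of its reference volume, so that small structures are not dominated by large ones; the source does not state which exact variant was used, and the function name, the flat per-class list layout, and the `eps` constant are illustrative assumptions rather than the in-house implementation.

```python
def generalized_dice_loss(probs, onehot, eps=1e-7):
    """Generalized Dice loss for one image (illustrative sketch).

    probs  : list of per-class lists of predicted probabilities, [n_classes][n_voxels]
    onehot : list of per-class lists of 0/1 reference labels, same shape
    Assumes every class is present in the reference; an absent class would
    receive a very large weight (1 / eps).
    """
    numerator = 0.0
    denominator = 0.0
    for p_class, r_class in zip(probs, onehot):
        ref_volume = sum(r_class)
        # Inverse squared reference volume: small regions get larger weights.
        weight = 1.0 / (ref_volume ** 2 + eps)
        numerator += weight * sum(p * r for p, r in zip(p_class, r_class))
        denominator += weight * sum(p + r for p, r in zip(p_class, r_class))
    return 1.0 - 2.0 * numerator / (denominator + eps)


# Hypothetical two-class example with four voxels.
reference = [[1, 1, 1, 0], [0, 0, 0, 1]]
perfect = [[1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
inverted = [[0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 0.0]]
print(generalized_dice_loss(perfect, reference))
print(generalized_dice_loss(inverted, reference))
```

With this weighting, a perfect prediction gives a loss close to 0 and a prediction with no overlap gives a loss of 1, regardless of how unequal the class volumes are.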

The total processing time, including image preprocessing and normalized volume retrieval, was about 1 min on the minimum computational requirement setting (CPU: 16 GB, GPU: RTX2080Ti 11 GB). Although the model was trained with both 3 T and 1.5 T images, the use of 3 T MR images is recommended for more accurate segmentation.

Statistical analysis

We compared the absolute volume differences of healthy controls measured by two methods using the paired t-test. For multiple comparisons, a Bonferroni correction was applied, and p < 0.001 was considered significant for the whole brain, the 10 subcortical, and 34 cortical volumes.
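The comparison described above can be sketched as a paired t-test followed by a Bonferroni-corrected threshold. This is a minimal standard-library illustration, not the statistical software used in the study. The volumes are made-up numbers, and the family-wise alpha of 0.05 and the count of 50 comparisons are assumptions chosen only because they reproduce the stated threshold of 0.001; the source does not report how the threshold was derived.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1


def two_sided_p(t, df, steps=20000):
    """Two-sided p-value under Student's t, via Simpson's rule (steps must be even)."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += pdf(i * h) * (4 if i % 2 else 2)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical volumes (cm^3) of one structure in five subjects.
dls_volumes = [6.1, 5.8, 6.4, 6.0, 5.9]
freesurfer_volumes = [5.7, 5.6, 6.1, 5.5, 5.8]

t, df = paired_t_statistic(dls_volumes, freesurfer_volumes)
p = two_sided_p(t, df)
threshold = bonferroni_threshold(0.05, 50)  # assumed alpha and number of tests
print(t, df, p, p < threshold)
```

In practice a statistics library would replace the hand-written p-value integration; it is written out here only to keep the sketch self-contained.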

The adjusted volumes (\(\mathrm{Volume}_{adj}\)) were used for comparisons between the control and patient groups by the DLS method, in order to adjust for total intracranial volume (ICV). \(\mathrm{Volume}_{adj}\) denotes the linearly adjusted volume, calculated as \(\mathrm{Volume}-\beta (\mathrm{ICV}-\mathrm{ICV}_{mean})\), where \(\mathrm{ICV}_{mean}\) is the mean ICV of each group and the parameter \(\beta\) was fixed to minimize the covariance between \(\mathrm{Volume}_{adj}\) and ICV in each group46.
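A short sketch may make the ICV adjustment concrete. It assumes that "minimizing the covariance" means driving the covariance between the adjusted volume and ICV to zero, which happens when β equals the least-squares slope of volume on ICV, cov(Volume, ICV) / var(ICV). The function name and the example numbers are hypothetical.

```python
def icv_adjust(volumes, icvs):
    """Return (adjusted volumes, beta) for one group.

    beta is the least-squares slope of volume on ICV, the value at which
    the adjusted volumes have zero covariance with ICV (assumed reading
    of the method described in the text).
    """
    n = len(volumes)
    volume_mean = sum(volumes) / n
    icv_mean = sum(icvs) / n
    cov = sum((v - volume_mean) * (i - icv_mean) for v, i in zip(volumes, icvs))
    var = sum((i - icv_mean) ** 2 for i in icvs)
    beta = cov / var
    adjusted = [v - beta * (i - icv_mean) for v, i in zip(volumes, icvs)]
    return adjusted, beta


# Hypothetical regional volumes and ICVs (cm^3) for one group.
icvs = [1000.0, 1100.0, 1200.0, 1300.0]
volumes = [7.1, 7.4, 8.2, 8.3]

adjusted, beta = icv_adjust(volumes, icvs)
print(beta)
print(adjusted)
```

Because the adjustment is centered on the group mean ICV, the mean of the adjusted volumes equals the mean of the raw volumes; only the ICV-related spread is removed.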

We fitted linear models with the intercept term and the group indicator (healthy control vs. SCN1A patients) as covariates and used generalized least squares to estimate regression coefficients. The errors were allowed to be correlated with unconstrained parameters. After the Bonferroni correction for multiple comparisons, p < 0.001 was considered significant for the whole brain, the 10 subcortical, and 34 cortical volumes.

To determine the spatial overlap of the structures, we computed the Dice score coefficient (DSC), precision, and recall between the manual and automated segmentation methods. The Dice score is defined as twice the overlapping area of the predicted mask and the ground truth, divided by the sum of the numbers of foreground pixels (i.e., pixel value 1) in the predicted mask and in the ground truth. The value of DSC ranges from 0, indicating no spatial overlap between structures, to 1, indicating complete overlap47.

Other metrics used require pixel-wise evaluation of the ground truth and the predicted masks. True positive (TP) is the total number of positive pixels (belonging to the structure) in the ground truth that are correctly predicted as positive; true negative (TN) is the total number of negative pixels (belonging to the background) in the ground truth that are correctly predicted as negative; false positive (FP) is the total number of negative pixels that are falsely predicted as positive; and false negative (FN) is the total number of positive pixels that are falsely predicted as negative. Precision is the ratio of the number of correctly predicted foreground pixels to the total number of pixels predicted as foreground, TP/(TP + FP). Recall is the ratio of the number of correctly predicted foreground pixels to the total number of actual foreground pixels, TP/(TP + FN).
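The three overlap metrics can be computed directly from the pixel-wise counts. The sketch below works on flattened binary masks (1 = foreground, 0 = background). The function name is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not one stated in the source.

```python
def overlap_metrics(predicted, reference):
    """Return (dice, precision, recall) for two flattened binary masks."""
    tp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(predicted, reference) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(predicted, reference) if p == 0 and r == 1)

    def ratio(numerator, denominator):
        # Assumed convention: report 0.0 when the metric is undefined.
        return numerator / denominator if denominator else 0.0

    dice = ratio(2 * tp, 2 * tp + fp + fn)   # 2|A ∩ B| / (|A| + |B|)
    precision = ratio(tp, tp + fp)           # TP / (TP + FP)
    recall = ratio(tp, tp + fn)              # TP / (TP + FN)
    return dice, precision, recall


# Hypothetical six-voxel masks: 2 true positives, 1 false positive, 1 false negative.
predicted_mask = [1, 1, 1, 0, 0, 0]
reference_mask = [1, 1, 0, 1, 0, 0]
print(overlap_metrics(predicted_mask, reference_mask))
```

True negatives do not enter any of the three metrics, which is why they stay informative for small structures surrounded by a large background.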

Ethical approval and consent to participate

The study protocol was reviewed and approved by the Institutional Review Board of the University of Ulsan College of Medicine (No. 2014-0405), and informed consent was waived because of the retrospective nature of the study. The study was conducted in accordance with the Declaration of Helsinki.