Predicting 2-year neurodevelopmental outcomes in preterm infants using multimodal structural brain magnetic resonance imaging with local connectivity

The neurodevelopmental outcomes of preterm infants can be stratified based on the level of prematurity. We explored brain structural networks in extremely preterm (EP; < 28 weeks of gestation) and very-to-late (V-LP; ≥ 28 and < 37 weeks of gestation) preterm infants at term-equivalent age to predict 2-year neurodevelopmental outcomes. Using MRI and diffusion MRI on 62 EP and 131 V-LP infants, we built a multimodal feature set for volumetric and structural network analysis. We employed linear and nonlinear machine learning models to predict the Bayley Scales of Infant and Toddler Development, Third Edition (BSID-III) scores, assessing predictive accuracy and feature importance. Our findings revealed that models incorporating local connectivity features demonstrated high predictive performance for BSID-III subsets in preterm infants. Specifically, for cognitive scores in preterm (variance explained, 17%) and V-LP infants (variance explained, 17%), and for motor scores in EP infants (variance explained, 15%), models with local connectivity features outperformed others. Additionally, a model using only local connectivity features effectively predicted language scores in preterm infants (variance explained, 15%). This study underscores the value of multimodal feature sets, particularly local connectivity, in predicting neurodevelopmental outcomes, highlighting the utility of machine learning in understanding microstructural changes and their implications for early intervention.


Demographics and clinical characteristics
The neonatal intensive care unit neonatal and maternal data, clinical information derived during follow-up, and Bayley Scales of Infant and Toddler Development, Third Edition (BSID-III) subscale results are presented in Table 1.The 193 preterm infants who participated in the study were divided into EP (n = 62) and VLP (n = 131) groups.

Local connectivity features with net effect on BSID-III subtests
Table 2 shows GNA results for partial relationship to BSID-III subtests score with WM integrity indices, and factors that directly affect BSID-III scores include positive (+) or negative (−) correlations (Table 2):

Machine learning performance of multimodal feature sets
Using multimodal feature sets, we determined the best performing linear and nonlinear models (Table 3; Fig. 1).Regarding the cognitive scores, the preterm (root mean squared error [RMSE], 13.352; variance explained, 17% on ElasticNet) and V-LP (RMSE, 11.205; variance explained, 17% on ElasticNet) groups exhibited the highest predictive performance for models that included local connectivity features.In the EP group, the RF model, including volumetric and global network feature sets, demonstrated the highest predictive performance (RMSE, 15.402; variance explained, 13%).The subgroup that included local connectivity features for motor scores demonstrated the highest predictive performance (EP group: RMSE, 11.363; variance explained, 15% on XGBoost; V-LP group: RMSE, 13.698; variance explained, 10% on RF).Regarding language scores, the preterm group demonstrated a high-performing prediction that included only local connectivity features (Preterm: RMSE, 11.792; variance explained, 15% on XGBoost).However, the other models exhibited relatively low-performance predictions (EP: RMSE, 11.674; variance explained, 3% on XGBoost; V-LP: RMSE, 12.425; variance explained, 6% on ElasticNet).

Feature importance within the best performing model in each BSID-III subset
The top ten features that were important for prediction were selected, and the quota for each feature set is shown in Fig. 2. Each BSID-III subset showed a different feature importance, depending on the group.In terms of cognitive and language scores, local connectivity and volume predictors had the highest proportions, whereas in motor scores, clinical variables and volume predictors (e.g., cerebellum) had the highest proportions.
We presented the brain lobe distribution of the local connectivity predictors (Fig. 3) and frequencies of the top 10 predictors from the best-performing models for all BSID-III subsets (Table 4).The feature importance frequency of the brain regions in the local connectivity was ranked by the left superior temporal gyrus (STG), thalamus, and inferior frontal gyrus (opercular), and the remaining regions were counted once or twice.

Discussion
To the best of our knowledge, the present study is the first to apply linear and nonlinear machine learning methods to predict 2-year neurodevelopmental outcomes in preterm infants, utilizing a comprehensive set of multi-modal features.The predictive performance demonstrated notable improvement when considering multimodal feature sets compared to single-feature sets, with a primary contribution to performance enhancement observed from local connectivity sets.Feature importance in the best-performing models differed, depending on neurodevelopmental subsets, and was primarily ranked in the left STG and thalamus.
Differences in core WM developmental patterns in the EP and V-LP groups suggest that the affected brain regions may differ depending on the degree of prematurity at birth 12 .Preterm groups in this study might not have shared a common etiology, as indicated by differences in WM development patterns, depending on GA. Between 24 and 28 weeks of gestation, thalamocortical afferent axons developed in the frontal, temporal, and occipital areas, and initial synaptic connections and spatial reorganization in the frontal and occipital regions occurred under the influence of sensory-sensitive cortical development 45,46 .In contrast, after 28 weeks of gestation, myelination became prominent, along with astrocyte and oligodendrocyte production, and the sensory-driven  development of long-range association fibers of the thalamocortical, somatosensory, visual, and auditory processes occurred 47,48 .
Although the majority of preterm infants performed within the normal range of general cognitive functions [49][50][51][52] , previous studies have shown that prediction performance is poor for 2-year-olds with EP who perform at levels 1-2 standard deviation below the expected cognitive function 49,53,54 .Similarly, the significant difference in cognitive scores between EP (93.92 ± 13.91) and V-LP (102.53 ± 13.83) groups may cause performance differences in predicting cognitive outcomes.This suggests that the influence of prematurity level and latent variables early in the EP may cause simultaneous changes in local connectivity and increase collinearity between variables.In this case, global connectivity variables that have been proven in studies comparing pre-and full-term infants at term-equivalent ages may be effective for prediction 21,22,55 .
The emergence of cognitive function is largely influenced by the development of specific subnetworks rather than the development of the whole brain 56 .Differential brain regions identified in this study that predicted cognitive outcomes included the left cuneus, lingual, superior occipital gyrus, and putamen, which are consistent with the regions identified in previous studies 57,58 .They may be responsible for a series of cognitive processes in early brain development, such as the primary processing of visual, and somatosensory information for cognitive 59,60 and sensory association for higher-order cognitive developments 61 .
Predictors with significant partial correlations with motor outcomes were identified in the thalamus, cerebellum, and frontotemporal regions.The identified predictors shared key biomarkers found in previous neurodevelopmental prediction studies.The thalamus has been identified as an important feature of the preterm subgroup, suggesting that the relationship between the thalamus and motor outcomes may be stratified according to GA 62 .Thalamic development is linearly related to the degree of prematurity 12,63 , which could weaken the thalamocortical connections and lead to the disruption of connections within key brain structures in preterm infants 26,55 .Moreover, a study by Kline et al. showed that the thalamic volume was associated with motor outcomes at 2 years of age 64 , and further correlated with motor function at 7 and 11 years of age 65,66 , suggesting that early thalamic development may have effects that persist throughout childhood and adulthood.Additionally, the feature importance in the left insula and frontal and temporal brain regions may reflect the involvement of the high-level cerebellothalamic pathway in motor development 67 .The left superior frontal gyrus contains part of the premotor cortex and is an important predictor of motor outcomes 33 .Moreover, the insula is partially involved in controlling sustained intentional movements 68 , and the temporal pole is known to play a role in controlling visuomotor movements 69 , suggesting its potential as a key marker for later functional development.www.nature.com/scientificreports/ The preterm subgroup exhibited the lowest predictive performance and unreliable feature importance in predicting language scores.These results may be due to the lack of a cohort; therefore, the relationships between multivariate variables and language scores were no longer stratified.Moreover, language performance is more sensitive to potential environmental factors that cannot be completely explained by imaging 70 , and our study may not have considered complex clinical factors derived from the EP group.Nevertheless, the left STG identified in the preterm group was closely related to the language score, again highlighting the importance of this variable as single predictors.A previous study that examined the relationship between language ability at 2 years of age, and local connectivity in preterm infants showed that the left STG was negatively correlated with language scores, suggesting that the left STG is a key region for language development and has microstructural vulnerability 71 .
Although this study attempted to follow the quality assessment criteria provided in recent reviews on predicting pediatric development (Supplementary Table 1) 72 , the most prominent limitation of the current study is the limited amount of data.Generalization to other datasets might be limited due to the lack of external crossvalidation.Also, the implementation of a more complex predictive model such as a non-linear SVM or artificial neural network was not possible because of the limited amount of data and poor interpretability of such models.Therefore, the complex relationships between multiple features and predictive value may not be fully represented in the model.A future collaboration between multiple sites can overcome the hurdle of a small sample size.Dependencies among global network metrics exhibiting similar patterns have been identified (Supplementary Fig. 1), which may potentially impact the model's predictive power and should be interpreted with caution 73 .Future research may perform analyses that limit collinearity among metrics while accommodating the inherent biological complexity within each network metric and measuring the uniqueness of network structures 74,75 .Quantification of the structural connectome should be performed with the cerebellum because structural connections within the cerebellum may be important for WM connections around the thalamus.The importance of the local connectivity was primarily identified in the left hemisphere.Given that early brain lateralization is a multivariate trait influenced by a variety of factors, such as stress and the external environment [76][77][78] .Future studies could utilize lateralization indices in predictive models.
In conclusion, we found that the prediction performance and feature importance differed, depending on the preterm group for the BSID-III.Additionally, the STG and thalamus are important markers for predicting motor and language development.Machine learning approaches that leverage brain connectivity can improve individual risk stratification by improving our understanding of how alterations in the brain microstructure affect neurodevelopment.

Study populations
The participants of the present study included preterm infants born at < 37 weeks GA who were admitted to the neonatal intensive care unit of the Hanyang University Hospital and participated in a follow-up project at the Hanyang Inclusive Clinic for Developmental Disorders between 2017 and 2021.Of 218 eligible preterm infants using the BSID-III 79 , eight preterm infants with brain injury, one with hypothermia, and one with metabolic abnormalities were excluded from image processing.Fifteen of 208 participants were excluded from the WM analysis because of motion artifacts and poor image quality.Similarly, 38 patients were excluded from the volumetric analysis.A total of 193 participants in a WM analysis and 170 in a volumetric analysis were recruited with suitable MRI data obtained at near-term age (postmenstrual age, 35-44 weeks) without congenital brain abnormalities, congenital infections, cystic periventricular leukomalacia, diffuse ventriculomegaly, evidence of genetic disorders, focal abnormalities, intraventricular hemorrhage (IVH, grades II-IV), or punctate WM injury.
The Institutional Review Board of the Hanyang University Hospital approved the study protocol, and informed consent was obtained from infant's parents prior to participation in present study.All procedures were performed in compliance with the principles of the Declaration of Helsinki.
All the preterm infants were assessed at corrected ages between 18 and 24 months by certified examiners for cognitive, motor, and socioemotional development using the BSID-III, with subtests scaled based on age at test.For statistical analysis, preterm groups were divided based on GA 28 weeks (28 < EP; 28 ≥ V-LP).

Clinical data collection
Detailed information on clinical data was gathered through a systematic and prospective chart review.Clinical variables adopted in this study have useful biological premises and potential associations with later neurodevelopment, and variables that have been validated with improved variance explained in previous papers predicting 2-year neurodevelopmental scores in preterm infants were selected 80 .The present study followed nine predefined clinical factors (GA, postmenstrual age, male sex, maternal education, small gestational age [SGA], IVH, 5-min Apgar score, and bronchopulmonary dysplasia) from the PENUT dataset that are expected to be associated with long-term outcomes 80 .Maternal education level was based on the years of schooling and categorized by educational level.

MRI acquisition
Individual preterm infants were scanned at near-term age (postmenstrual age [PMA], 35-44 weeks) using wholebody 3 T magnetic resonance imaging (MRI) scanner (Philips, Achieva, 16-channel phase-array head coil, Best, Netherlands) during natural sleep, using a blanket to preserve body temperature.An experienced pediatrician monitored the pulse oximeter during the MRI to determine the heart and respiratory rates of each infant.Singleshot spin-echo three-dimensional echo planar images were obtained using diffusion tensor imaging (DTI).Parameters of the DTI were b-value = 800 s/mm 2 , echo time = 75 ms, repetition time = 4,800 ms, flip angle = 90°, field of view = 120 × 120 mm, number of electrostatic gradient directions = 32, voxel sizes = 1.56 × 1.56 mm 2 , slice thickness = 2 mm, number of averages = 2, total acquisition time = 6 min 17 s, and water-fat shift = 4.68 Hz/ pixel.The slices were axially parallel to the anterior-posterior commissure line with a 40-50 slices covering the entire hemisphere and brainstem.Estimates of motion artifacts of diffusion-weighted images were calculated for individual participants, including absolute and relative volume-to-volume motion and percentage of outliers using the EDDY QC tool 81 in the FMRIB Software Library (FSL, https:// fsl.fmrib.ox.ac.uk/ fsl/ fslwi ki) 82 .Additionally, structural T2-weighted images were acquired for volumetric analysis and to exclude white matter (WM) abnormalities.The parameters for T2-weighted image were echo time = 90 ms, repetition time = 4,800 ms, flip angle = 90°, field of view = 180 × 180 mm 2 , voxel sizes = 0.5 × 0.5 mm 2 , slice thickness = 3 mm, number of averages = 1, total acquisition time = 6 min 30 s, and water-fat shift = 4.68 Hz/pixel.

Image preprocessing
Imaging data were preprocessed using an eddy correction tool for eddy current distortions, and motion artifacts 83 .A nondiffusion-weighted image (b0 image) was extracted from the raw image, including the skull and nonbrain tissues.To remove the effect of low-frequency intensity inhomogeneity on the b0 diffusion data, the bias field estimated using N4 bias field correction in advanced normalization tools (ANTs) 84 .Subsequently, the principal eigenvalues of the diffusion tensor model were computed by simple least-squares fitting of the diffusion-weighted volume.Fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) were calculated using tensor-eigenvalues.Quality control of the preprocessed images was performed via visual inspection by two independent reviewers.All procedures were performed using FSL.

Network construction
A 12-parameter affine transformation and the nonlinear symmetric normalization algorithm from ANTs were employed to transform the individual b0 images into T2-weighted images from the University of North Carolina (UNC) neonate atlas 85 , and vice versa.Inverse transformations were used to warp the automated anatomical labeling atlas from the UNC space to the native space.Discrete labeling values were preserved using nearest-neighbor www.nature.com/scientificreports/interpolation, which is a family of sinc-based methods.Using this procedure, we obtained 90 brain regions (each representing a node in the network) for the underlying structural network of each participant (Supplementary Table 2).Whole-brain fiber tracking using probabilistic tractography was performed for each neonate using FSL.First, to prepare for probabilistic tracking, BEDPOSTX 86 was used to model the direction of the crossing fiber, and partial volume effects were corrected for thick slices (BEDPOSTX arguments: fiber 3 and Rician for uniform noise levels).Probabilistic tractography was then performed on individual diffusion images using PROBTRACKX 87 .The network matrix, assigned through the connectivity probabilities between brain regions i and j, was calculated as the total proportion of fibers sampled from all voxels in brain region i reaching all voxels in brain region j (PROBRACKX arguments are sample tracts per seed voxel: 5,000; step length: 0.5 mm; curvature threshold: 0.2; and fractional anisotropic volumes: 0.01).
Probabilistic tractography depends on the seeding point; therefore, the probability from i to j can differ from the probability from j to i. Therefore, we defined the unidirectional connection probability P ij between regions i and j and created a 90 × 90 symmetric matrix after averaging these two probabilities.Moreover, we performed pair-wise Pearson correlation for all 4,005 connections with nonzero probability values for all the participants and set r = 0.7 as the threshold to remove spurious connections with a small probability of connection.The weighted (W) network edges were calculated as W ij = P ij .

Global and local network analysis
Prior to brain network quantification, a sparsity threshold of 0.25 (i.e., which is the ratio of the number of actual edges to the maximum possible number of edges in a structural network) 85 , was applied to individual networks to remove the weakest connections subject to experimental noise 88 .The specific threshold selection procedure followed that of our previous network study 23 .Global and local network properties were analyzed using the Brain Connectivity Toolbox 89 and GRETNA software (http:// www.nitrc.org/ proje cts/ gretna/) 90 .
Graph metrics were used to quantify brain global (global efficiency, E glob ; local efficiency, E loc ; modularity, Q; small-worldness, S; normalized clustering coefficient, C p ; normalized shortest path length, L p ) 89 and local (betweenness centrality, BC; degree centrality, DC; nodal clustering coefficient, NC p ; nodal shortest path length, NL p ; nodal efficiency, L e ; nodal local efficiency, NL e ) 91,92 connectivity.Global metrics were computed for 1,000 random networks with conserved number of nodes, number of edges, and degree distribution at predefined sparsity thresholds 23 .Local network metrics were used as indicators of neonatal and children brain development, and employed to elucidate clinical implications 23,[93][94][95][96] .Details of the described graph-theoretical measures can be found in supplementary text 1.

WM integrity analysis
We aligned the Johns Hopkins University (JHU) neonatal, probabilistic, WM pathway atlas to the FA images of individual diffusion spaces using a nonlinear symmetric normalization algorithm in the ANTs to compute the mean FA, MD, AD, and RD values for specific WM pathways 97 .This atlas provides 27 major WM pathways in a population-averaged neonatal template.

Volumetric analysis
To extract neonatal volumetric features, we used the morphologically adaptive neonatal tissue segmentation toolbox (MANTiS) to segment and measure neonatal brain tissue from T2-weighted images 98 .Additionally, MANTiS extends the existing approach to tissue classification implemented in Statistical Parametric Mapping software to neonates, combining template adaptation and topological filtering and segmenting the neonatal brain into eight tissue classes: gray matter, WM, deep gray matter, hippocampus, amygdala, cerebellum, and brainstem.These volumetric values were corrected by dividing them by the total brain volume without cerebrospinal fluid.

Local connectivity feature
Local connectivity features are inherently complex and high-dimensional and are difficult to describe linearly.This may have caused overfitting because there were more variables than the sample size.GNA is a useful technique for determining conditional effects between a set of observed variables.Additionally, GNA can identify collinear variables and provide estimates of the most transparent relationships among variables through nonlinear relationship modeling.Recently, this methodological approach has provided support for follow-up studies, and has the potential to improve clinical care by identifying variables independently associated with clinical and maternal characteristics and neurodevelopmental outcomes in preterm infants 80 .
We employed GNA to identify the net effect of individual variables on each BSID-III subtest.Four WM integrity indices for 27 Johns Hopkins University pathway atlas and 6 local network properties for 90 brain regions were used to determine whether each of the candidate predictors showed a partial effect, even after considering the clinical characteristics.This analysis uses the method described by Williams and Rast to identify significant correlations with a single variable by forming a matrix of precision for relationships between variables, considering relationships with all other variables 99 .A precision matrix was constructed using the maximum likelihood estimation method, and the Fisher Z-transformation (95% confidence intervals) was performed to establish a network with significant relationships between variables 99 .connectivity features (feature set D, n = 5).The prediction model uses 15 combinations of prediction sets for all possible combinations.Linear (ElasticNet) and nonlinear regression (RF; XGBoost) analyses for predicting cognitive, language, and motor scores were performed using the presented combination of feature sets.However, GA and postmenstrual age were included in all feature set combinations.All of these regressors were implemented using Python's scikit learning library (https:// github.com/ scikit-learn/ scikit-learn) 100,101 except for XGBoost 102 .A randomized hyperparameter optimization was performed with fivefold cross-validation to identify high-performance models with reduced computational costs.The predictive power of the regression models was evaluated by calculating the RMSE on the held-out test set.Randomly selected 30% of the data were used as the held-out test set.We also plotted actual BSID-III scores against predicted scores for all feature set combinations of all the regression models.

Feature importance
In ElasticNet, feature importance was calculated by penalizing the coefficients in the form of absolute values through the combination of L1 and L2 regulation.In the RF, feature importance was by evaluating the extent to which each feature reduced impurity.Additionally, XGBoost computes feature importance by averaging the information gains from all decision trees into which a particular predictor is split.

Figure 1 .
Figure 1.Scatter plot for best predictive model within BSID-III subset.Feature set (A) Local connectivity features; (B) Clinical characteristic features; (C) Volumetric features; (D) Global connectivity features.Abbreviations: EP, extremely preterm; V-LP, very-to-late preterm; BSID-III, Bayley Scales of Infant and Toddler Development, Third Edition.

Figure 2 .
Figure 2. (A) Number of importance features in the best performing model in each BSID-III subset frequencies range between 1 and 10. (B) Feature importance for the best performing model in all preterm groups.For all brain region abbreviations, see table S2 in the supplementary materials.EP, extremely preterm; V-LP, veryto-late preterm; Freq, Frequencies; BC, betweenness centrality; NL p , nodal shortest path length; DC, degree centrality; N e , local efficiency; GE, global efficiency; BSID-III, Bayley Scales of Infant and Toddler Development, Third Edition.

Table 1 .
Clinical and maternal characteristics of preterm infants.Data are presented as the mean ± SD or number (%).EP, extremely preterm; V-LP, very-to-late preterm; SD, standard deviation; N/A, not applicable; IVH, interventricular hemorrhage; BPD, bronchopulmonary dysplasia; BSID-III, Bayley Scales of Infant and Toddler Development, Third Edition.

Table 4 .
Frequencies of feature importance within nine best performing models.For all brain region abbreviations, see table S2 in the supplementary materials.EP, extremely preterm; V-LP, very-to-late preterm; BSID-III, Bayley Scales of Infant and Toddler Development, Third Edition.