Language function following preterm birth: prediction using machine learning

Valavani, Evdoxia; Blesa, Manuel; Galdi, Paola; Sullivan, Gemma; Dean, Bethan; Cruickshank, Hilary; Sitko-Rudnicka, Magdalena; Bastin, Mark E.; Chin, Richard F. M.; MacIntyre, Donald J.; Fletcher-Watson, Sue; Boardman, James P.; Tsanas, Athanasios

doi:10.1038/s41390-021-01779-x

Download PDF

Clinical Research Article
Open access
Published: 11 October 2021

Language function following preterm birth: prediction using machine learning

Evdoxia Valavani¹,
Manuel Blesa²,
Paola Galdi²,
Gemma Sullivan²,
Bethan Dean²,
Hilary Cruickshank³,
Magdalena Sitko-Rudnicka⁴,
Mark E. Bastin⁵,
Richard F. M. Chin^6,7,
Donald J. MacIntyre⁸,
Sue Fletcher-Watson⁹,
James P. Boardman^2,5 &
…
Athanasios Tsanas¹

Pediatric Research volume 92, pages 480–489 (2022)Cite this article

3056 Accesses
8 Citations
7 Altmetric
Metrics details

Abstract

Background

Preterm birth can lead to impaired language development. This study aimed to predict language outcomes at 2 years corrected gestational age (CGA) for children born preterm.

Methods

We analysed data from 89 preterm neonates (median GA 29 weeks) who underwent diffusion MRI (dMRI) at term-equivalent age and language assessment at 2 years CGA using the Bayley-III. Feature selection and a random forests classifier were used to differentiate typical versus delayed (Bayley-III language composite score <85) language development.

Results

The model achieved balanced accuracy: 91%, sensitivity: 86%, and specificity: 96%. The probability of language delay at 2 years CGA is increased with: increasing values of peak width of skeletonized fractional anisotropy (PSFA), radial diffusivity (PSRD), and axial diffusivity (PSAD) derived from dMRI; among twins; and after an incomplete course of, or no exposure to, antenatal corticosteroids. Female sex and breastfeeding during the neonatal period reduced the risk of language delay.

Conclusions

The combination of perinatal clinical information and MRI features leads to accurate prediction of preterm infants who are likely to develop language deficits in early childhood. This model could potentially enable stratification of preterm children at risk of language dysfunction who may benefit from targeted early interventions.

Impact

A combination of clinical perinatal factors and neonatal DTI measures of white matter microstructure leads to accurate prediction of language outcome at 2 years corrected gestational age following preterm birth.
A model that comprises clinical and MRI features that has potential to be scalable across centres. It offers a basis for enhancing the power and generalizability of diagnostic and prognostic studies of neurodevelopmental disorders associated with language impairment.
Early identification of infants who are at risk of language delay, facilitating targeted early interventions and support services, which could improve the quality of life for children born preterm.

Language performance and brain volumes, asymmetry, and cortical thickness in children born extremely preterm

Article Open access 03 November 2023

Hedvig Kvanta, Jenny Bolk, … Ulrika Ådén

Extremely preterm children exhibit altered cortical thickness in language areas

Article Open access 02 July 2020

Maria E. Barnes-Davis, Brady J. Williamson, … Darren S. Kadis

MRI based radiomics enhances prediction of neurodevelopmental outcome in very preterm neonates

Article Open access 13 July 2022

Matthias W. Wagner, Delvin So, … Steven P. Miller

Introduction

An estimated 15 million infants are born preterm (before 37 weeks of gestation) annually worldwide.¹ Although advances in neonatal intensive care have led to a decrease in infant mortality rates over time, survivors of preterm birth are at increased risk of long-term neurocognitive impairment.² Preterm birth may lead to language deficits that persist into school age³ and are associated with a range of negative sequelae across the life span, including poor academic performance, poor social, emotional and behavioural functioning, and unemployment.^4,5 Neurodevelopmental trajectories are amenable to early intervention, which presents a window of opportunity to have a profound, long-lasting effect on later life.⁶ Therefore, there is a clear unmet clinical need for early identification of those children who are at high risk of poor language development.

Multiple outcome studies have demonstrated associations between prenatal, neonatal, and postnatal factors and early neurodevelopmental outcomes for preterm infants.^7,8 In addition, preterm birth is closely associated with generalized microstructural changes in cerebral white matter, inferred from diffusion tensor imaging (DTI) (fractional anisotropy [FA], mean, axial, and radial diffusivities [MD, AD, RD]), and alterations in these have been linked to language delay.⁹ However, it is rare for research to combine data from different modalities for the development of prediction models for neurodevelopmental outcomes.

Nonetheless, a few studies have built and validated tools for prediction of the composite outcome of neurodevelopmental impairment at 2 years corrected gestational age (CGA) for children born preterm. Tyson et al.¹⁰ investigated the clinical and demographic characteristics of a cohort of infants born before 26 weeks of gestation and found that the risk of adverse neurodevelopmental outcome at 18–22 months CGA was predicted using gestational age (GA), sex, exposure to antenatal corticosteroids, multiple birth, and birth weight. Ambalavanan et al.¹¹ reported that neurodevelopmental impairment at 18–22 months CGA was predicted by combining sex, respiratory illness severity, and enlarged ventricular size, periventricular leukomalacia, or porencephalic cyst on cranial ultrasound. Vesoulis et al.¹² developed a tool for prediction of risk of neurodevelopmental impairment at 18–24 months CGA. This tool comprised of ventilator days, mode of delivery, exposure to antenatal corticosteroids, retinopathy of prematurity (ROP) requiring surgery, and magnetic resonance imaging (MRI) findings (cerebellar haemorrhage size, cerebellar haemorrhage laterality, intraventricular haemorrhage grade, white matter injury).

However, deficits in different developmental domains require different therapies and targeted support strategies. Thus, tools for stratification of children at high risk of impairment in specific developmental domains would be valuable. Recently, Vassar et al.¹³ evaluated the predictive value of structural MRI and DTI variables for classification of very preterm infants at high versus low risk of language delay. They developed a model for prediction of language delay that included DTI variables in three brain regions and achieved 89% sensitivity and 86% specificity. Ball et al.¹⁴ revealed that distinct patterns of brain structure and microstructure following preterm birth are linked to specific clinical and environmental factors, and these patterns correlate with neurodevelopmental outcome at 18–24 months CGA. Language outcome was associated with specific neuroanatomic variation, which was linked to age at scan, need for continuous positive airway pressure, birth weight, GA at birth, parenteral nutrition, surfactant administration, and mechanical ventilation.

In view of this evidence, we hypothesized that a combination of clinical, environmental, and imaging factors derived from DTI that capture generalized white matter dysmaturation would potentially enhance the prediction of language outcomes at 2 years CGA following preterm birth. Blesa et al.¹⁵ demonstrated that histogram-based variables derived from DTI (peak width of skeletonized [PS] FA, MD, RD, and AD), which represent generalized water content and myelination, can be used as biomarkers of microstructural white matter alterations associated with preterm birth. The advantage of the histogram-based framework is that it is fully automated, captures generalized white matter dysmaturation that characterizes the encephalopathy of prematurity, is computationally inexpensive compared with tract-specific approaches, and has high inter-scanner reproducibility.¹⁶

A prediction tool that combines clinical data and imaging biomarkers for early language development is lacking, and yet timely identification of future language deficits has clinical and research implications, because it could stratify infants at most need for early interventions. Here we aimed to develop a machine learning model that accurately predicts typical versus delayed language outcomes at 2 years CGA using a parsimonious feature set derived from clinical, demographic, and histogram-based variables computed from neonatal brain DTI.

Methods

Participants

Participants were selected from a longitudinal cohort of preterm neonates born at ≤33 weeks of gestation at the Royal Infirmary of Edinburgh between February 2012 and August 2015.¹⁷ Selection from the larger cohort was based on availability of diffusion MRI (dMRI) scans at term-equivalent age and 2-year language outcome. Ethical approval was obtained from the UK National Research Ethics Service (NRES), South East Scotland Research Ethics Committee (NRES numbers 11/55/0061 and 13/SS/0143). Written informed consent from parents/carers was obtained for all neonates. Exclusion criteria for the study were congenital anomalies, chromosomal abnormalities, congenital infections or major overt parenchymal lesions (cystic periventricular leukomalacia, haemorrhagic parenchymal infarction), and post-haemorrhagic ventricular dilatation. Infants with a contraindication to MRI at 3 Tesla were also excluded.

Clinical and demographic features

The selection of clinical and demographic features included in models was guided by extant literature linking biological and environmental exposures with neurocognitive development in preterm infants. Specifically, we studied the contribution towards prediction of language outcome at 2 years CGA of the following features: sex,^10,11,18,19 GA (based on first trimester ultrasound),^10,18 birth weight,^10,20 maternal age,²¹ primiparity,¹⁹ twin status,^10,20 maternal body mass index (BMI),²² medical history of maternal depression,²³ administration of a complete course of antenatal corticosteroids for foetal lung maturation (defined as two doses 24 h apart), any antenatal corticosteroid exposure,^10,12,19,20 administration of antenatal magnesium sulfate (MgSO₄) for neuroprotection,²⁴ mode of delivery (spontaneous vaginal delivery or caesarean section),¹⁹ total days requiring intubation while in the neonatal intensive care unit (NICU),^11,12,18 bronchopulmonary dysplasia (defined as oxygen requirement at ≥36 weeks CGA),^19,20,25,26 late-onset sepsis (defined as blood stream infection occurring ≥72 h postnatally with (a) bacterial pathogen isolated from blood culture or (b) blood culture growing coagulase-negative staphylococcus, along with one or more signs of generalized infection, and treatment with intravenous antibiotics for ≥5 days),²⁰ necrotizing enterocolitis (NEC, defined as stages two or three according to the modified Bell’s staging for NEC²⁷),^25,28 ROP treated with laser therapy,^12,29 and type of infant feeding at discharge from the neonatal unit (dichotomized as exclusive maternal breast milk versus exclusive formula or mixed feeding).³⁰ All infants had placental histopathology performed and histological chorioamnionitis was defined using an established system.³¹ Maternal level of education (dichotomized as secondary school or below versus college, university or postgraduate studies)^18,19,20 and socioeconomic status of the family, operationalized as Scottish Index of Multiple Deprivation 2016 (SIMD16) quintile, where 1 indicates the most deprived and 5 indicates the least deprived (https://www2.gov.scot/Topics/Statistics/SIMD), were also included.

Image acquisition

Infants underwent a brain MRI scan at term-equivalent age (38–42 weeks GA) without sedation, during natural sleep after having been fed and swaddled. Vital signs were monitored throughout the scan, and hearing protection was provided for all neonates (MiniMuffs, Natus). All scans were supervised by a physician and a paediatric nurse trained in neonatal resuscitation.

A Siemens MAGNETOM Verio 3-Tesla MRI clinical scanner (Siemens Healthcare Gmbh, Erlangen, Germany) and 12-channel phased-array head coil were used to acquire dMRI data consisting of 11 T2-weighted and 64 diffusion-weighted (b = 750 s/mm²) single-shot, spin-echo, echo planar imaging volumes collected in the axial plane with 2 mm isotropic voxels (repetition time = 7300 ms, echo time = 06 ms, field of view = 256 mm, acquired matrix = 128 × 128, 50 contiguous interleaved slices with 2 mm thickness, acquisition time=9 min 29 s).

Image analysis

For each participant, the dMRI was denoised using a Marchenko-Pastur-PCA-based algorithm;^32,33 eddy current and head movement were corrected using outlier replacement^34,35,36 and bias field inhomogeneity correction was performed by calculating the bias field of the mean b0 volume and applying the correction to all the volumes.³⁷ For each participant, PSFA, PSMD, PSRD, and PSAD were calculated using age-optimized methods described by Blesa et al.¹⁵ In summary, image data were registered to the Edinburgh Neonatal Atlas₅₀¹⁵ using a tensor registration,³⁸ and their DTI maps were calculated. Subsequently, the individual FA maps were projected into the template skeleton and multiplied by the atlas custom mask. Finally, the peak width of the histogram values within the skeletonized maps was calculated as the difference between the 95th and 5th percentiles.¹⁶ Figure 1 illustrates a summary of the process described. The code necessary to calculate histogram-based metrics can be found at https://git.ecdf.ed.ac.uk/jbrl/psmd. Figure 2 shows scatterplots of the values of the PS DTI metrics for all participants.

**Fig. 1: Scheme of the steps necessary for the calculation of the peak width of skeletonized DTI metrics.**

**Fig. 2: Scatterplots of the PSFA, PSMD, PSAD, and PSRD values for all participants.**

Language outcome

All children took part in a developmental assessment with a trained clinician at 2 years CGA (median age 24.13, range 23.1–28.27 months) using the Bayley Scales of Infant and Toddler Development, Third Edition (Bayley-III).³⁹ We used the Bayley-III language composite score (mean 100, SD 15) as the response variable. The clinical cut-off of 85 (i.e. 1 SD below the mean) was used in order to assign children into two distinct groups, thus creating a binary outcome; children whose score was <85 were considered to have moderate-to-severe language impairment, while scores ≥85 were considered as normal range or higher.⁴⁰

Data analysis

We compared three feature selection algorithms: (a) Boruta,⁴¹ (b) ReliefF expRank,^42,43 and (c) random forests (RF) variable importance.⁴⁴ The Boruta algorithm is a wrapper feature selection technique built around the RF learner, which uses Z score as the importance measure. In other words, it measures the importance of each feature by dividing the average loss of accuracy among all trees by the standard deviation of the accuracy loss. The basic idea of the ReliefF algorithm is to assign a ‘weight’ value to all features of a data set based on how well their values distinguish between the instances that are near to each other and thus how useful they are in predicting the response variable. The important features will have a large weight, while the redundant ones will have a low weight. In RF variable importance, variable importance is computed using the mean decrease in Gini index. We can measure the total amount that the Gini index is decreased by splits over a given feature, averaged over all trees. A large value indicates an important feature. In all cases, we obtain a feature ranking indicating in descending order their contribution towards prediction of the response variable. The final feature subset for each feature selection algorithm was selected using leave-one-out cross-validation (LOOCV), using only the training data set in each cross-validation iteration and following the process described by Tsanas et al.⁴⁵ Subsequently, the selected feature subset was presented into a RF classifier⁴⁶ in order to predict the binarized language composite score. Partial dependence plots (PDP)⁴⁷ were constructed in order to assess how the selected features influence the prediction of the RF classifier. To quantify the strength of the association between the selected features, we used correlation analysis (the Spearman’s rank correlation coefficient was used to quantify the strength of the association between two continuous features, the phi coefficient was used to quantify the association between two binary features, and the point-biserial correlation coefficient was used to quantify the strength of the association between a continuous and a binary feature).

The data set is imbalanced since only 16% of the study group had a language composite score <85. To overcome the class imbalance problem in the data set, we explored different data balancing techniques: under-sampling of the majority class, over-sampling of the minority class, and the synthetic minority over-sampling technique (SMOTE),⁴⁸ which has been previously used in similar unbalanced applications in the healthcare domain.^{49,50,51,52,53} We found that SMOTE yields the best results, which are presented in the paper. SMOTE is a training data enrichment method, where the minority class is over-sampled by creating new synthetic samples, to create a balanced data set. For each minority class sample, the k minority class nearest neighbours were identified (using the suggestion of Chawla et al. with k = 5) and synthetic samples were introduced along the line segments joining any or all of the k minority class nearest neighbours. Model validation was implemented using LOOCV. LOOCV involves holding out a single observation to be used as the test set, while the learner is trained using the remaining n − 1 observations (n is the total number of observations). The process is repeated n times and each time a different observation from the original data set is used as the test set. The result is n estimates of the test error. The final test error rate is the average of these n test error estimates. The accuracy of the model was assessed by constructing a confusion matrix, which is a contingency table of the observed and predicted classes. Missing data for both numeric and categorical features were imputed using multiple imputation by chained equations (five imputed data sets were created in each LOOCV iteration),^54,55 based only on the information in the training set independently within each LOOCV iteration. Data analysis was conducted in R. The R packages used were: tidyverse, dplyr, caret, randomForest, CORElearn, Boruta, mice, ggplot2, DMwR, Hmisc, RGraphics, grid, gridExtra, and gridGraphics.

Results

Two-year language data and dMRI of the brain at term-equivalent age were available from 89 children; demographic and clinical characteristics of the study population are presented in Table 1. At median age 24.13 months (range 23.24–28.27 months), 14 children had a language composite score <85. The percentage of missing values in the data set was 0.2% (1 participant had missing histological chorioamnionitis data, 2 participants had missing SIMD16, and 3 participants had missing maternal BMI).

Table 1 Demographic and clinical characteristics of the study group.

Full size table

Figure 3 illustrates the out-of-sample performance of the RF classifier (trained on approximately 150 samples in each LOOCV iteration) as a function of the number of features selected by the different feature selection algorithms. These data show that feeding a subset of eight features selected by the Boruta feature selection algorithm (a wrapper feature selection technique built around the RF learner) to the RF classifier gives the highest balanced accuracy. The selected feature subset comprises PSFA, twin status (yes or no), antenatal steroid exposure (complete or incomplete course), any antenatal steroid exposure (yes or no), sex (male or female), PSRD, PSAD, and feeding at discharge from the NICU (exclusive maternal breast milk versus exclusive formula or mixed feeding). Figure 4 shows the importance attributed to each feature by each of the feature selection algorithms. PSFA, twin status, the course of antenatal steroid exposure, any antenatal steroid exposure, sex, PSRD, PSAD, and feeding are the jointly most predictive features towards the prediction of the binarized language outcome. PDP were used to visualize relationships between the selected features and the response based on our model (see Fig. 5). The PDP provide insight into the effect of changing one or two features in terms of the model’s prediction (binary response variable, indicating whether language composite score <85). Regarding the histogram-based variables derived from DTI, the PDP show that the predicted language impairment probability rises with increasing PSFA, PSRD, and PSAD values. PSRD and PSAD are presented in the same plot because they are highly correlated as illustrated in the correlogram and correlation matrix in Fig. 6. Language composite score <85 at 2 years CGA is more likely following a twin pregnancy, an incomplete course of antenatal corticosteroids, or no exposure to antenatal steroids. Female sex and feeding with exclusive breast milk reduce the risk of future language delay.

**Fig. 5: Partial dependence plots for the eight features selected by Boruta and used in the random forests classifier.**

**Fig. 6: Correlogram and correlation matrix of the eight most important features selected by the Boruta algorithm.**

Table 2 shows the confusion matrix of the out-of-sample classification performance of the RF classifier when mapping the selected feature subset (i.e., PSFA, twin status, antenatal corticosteroid exposure, sex, PSRD, PSAD, and feeding at discharge) to the binarized language composite score. Our model achieved balanced accuracy: 91%, sensitivity: 86%, and specificity: 96%.

Table 2 Confusion matrix summarizing the out-of-sample findings using LOOCV.

Full size table

Finally, we repeated the analysis to investigate separately the performance of the model when presented only with either clinical or MRI features, which led to reduced model performance. As shown in Table 3, the model that comprises clinical and MRI features outperformed the models using only clinical or MRI features. The combination of clinical and DTI features enhances the prediction of language outcomes at 2 years CGA following preterm birth.

Table 3 Model performance using (a) only clinical features, (b) only MRI features, and (c) the combination of clinical and MRI features.

Full size table

Discussion

We developed a parsimonious machine learning model that accurately identifies preterm infants who are likely to develop language impairment in early childhood. We explored the predictive value of 24 clinical, demographic, and brain imaging features and found that a robust subset of eight clinical characteristics and imaging biomarkers best predicts a language composite score <85 on the Bayley-III: PSFA, PSRD, PSAD, twin status, administration of an incomplete course of antenatal corticosteroids, no exposure to antenatal corticosteroids, male sex, and feeding with exclusive formula milk or mixed formula and breast milk. Overall, we demonstrated out-of-sample balanced accuracy: 91%, sensitivity: 86%, and specificity: 96%.

Feature selection was conducted by comparing three feature selection algorithms: (a) Boruta, (b) ReliefF expRank, and (c) RF variable importance. Feature selection methods can be broadly considered into three main categories: filter, wrapper, embedded methods. Filter feature selection methods work independently of a statistical learner relying on the general statistical properties of the data and thus select a feature subset that is not tuned or optimized towards a specific learning algorithm. Wrapper methods take a particular machine learning method into account in order to choose the best subset of the original features. They evaluate multiple models by training and testing in the feature space, thus optimizing the performance of the particular machine learning model that was used. Embedded methods choose the subset of features while the learning model is being constructed. This means that the resulting feature subset is specific to a particular learning algorithm. We chose to use a feature selection algorithm from each main category for our exploration; ReliefF is a filter technique, Boruta is wrapper feature selection technique built around the RF learner, and RF variable importance is an embedded method. The use of ReliefF and the RF importance have been extensively used and validated in many different applications and we have previously conducted a thorough empirical study⁵⁶ where they performed very competitively against many established feature selection approaches. In general, we would expect a wrapper or embedded method to perform better for a particular choice of a classifier, although it might not necessarily generalize very well with the choice of different classifiers.

Our findings suggest that PSFA, PSRD, and PSAD, which detect generalized white matter microstructural alterations in preterm infants compared to infants born at term,¹⁵ are predictive of impaired language development at 2 years CGA. We explored the predictive value of whole-brain measures of PS DTI metrics, instead of tract-specific segmentations, because preterm brain dysmaturation is a substantially generalized process,⁵⁷ and language development draws on broad cognitive capacities. We have found that the probability of language delay is higher with increased PSFA, PSRD, and PSAD. These features are consistent with delayed myelination, less coherent white matter organization, and altered axonal integrity in the preterm brain.^15,58 Previous research has also shown that abnormalities in brain structure following preterm birth are correlated with long-term neurodevelopmental outcome.⁵⁹

The data show that twin status is associated with increased risk of impaired language development. This finding is consistent with studies in the extant literature which have found that multiple pregnancy is associated with neurodevelopmental impairment^10,20,60 and language delay⁶¹ at 2 years CGA. Language delay in twins can be attributed to postnatal environmental factors;^62,63 twins receive a less focussed and less elaborated communicative interchange with their parents than do singletons. Thorpe et al.⁶² compared families with twins to families with pairs of closely spaced singletons. This study found that language delay in twins compared to singletons may be explained by patterns of parent–child interaction and communication. Antenatal corticosteroid administration is associated with lower risk of language deficits, which has been previously proved by research.^10,12 Our findings suggest that male sex is a risk factor for language impairment in early childhood, consistent with previous studies that have associated male sex with poorer neurodevelopmental outcome following preterm birth.^10,11,18,19

Moreover, previous work has shown that exclusive breast milk feeding in the weeks following preterm birth can enhance brain development,³⁰ and in the general population breast milk intake in infancy is associated with improved performance on intelligence tests.⁶⁴ In line with this, we found that exclusive breastfeeding is associated with improved language outcomes compared to formula feeding or mixed breast and formula feeding. It is surprising that GA at birth was not included in the final feature set. However, its influence on long-term outcome may be captured by PSRD and PSAD, which are strongly correlated with GA at birth.¹⁵

This study is the first to investigate the use of PS DTI metrics as predictors for language development in the preterm population. The advantage of using these image biomarkers is that their calculation is fully automated, computationally inexpensive, and has high inter-scanner reproducibility,¹⁶ meaning that they can be easily obtained for preterm neonates who undergo a dMRI scan at term-equivalent age and can be used for multi-centre studies. Thus, our model comprises features that can be easily obtained for future clinical application.

Hitherto, few studies have focussed on developing and validating prediction models for early neurodevelopmental outcomes for children born preterm. Most tools predict the composite outcome of neurodevelopmental impairment.^10,11,12 However, deficits in different developmental domains require different interventions. Therefore, tools for timely identification of children at risk of impairment in specific developmental domains are valuable. The developed model predicts language deficits at 2 years CGA. Recently, a model was developed for classification of very preterm infants at high versus low risk for language delay, which achieved 89% sensitivity and 86% specificity.¹³ That model included DTI variables in three brain regions: MD of right sagittal stratum and right inferior occipital gyrus and AD of right lingual gyrus. However, whole-brain calculation of DTI variables is computationally expensive; hence, we investigated the predictive value of histogram-based variables derived from DTI. We have shown that combining DTI metrics with perinatal factors, along with the use of advanced machine learning techniques, can further improve identification of children at risk of language impairment.

The main strength of our study is that we had a longitudinal cohort of preterm infants that is deeply phenotyped with brain imaging and biological information that enabled us to investigate a large number of clinical, demographic, social, and DTI variables. We acknowledge some limitations in our study. The sample size is relatively small, and this is a single-centre study, so despite our best efforts with standard model validation techniques to assess model generalization we would need to further validate findings in a different cohort. Nonetheless, the study population was fairly representative of NICU populations in terms of comorbidities that have been associated with long-term neurodevelopmental outcomes. In addition, cortical grey matter was not assessed in this study. We focussed on alterations in white matter microstructure, since it is the most consistently abnormal finding in preterm infants, by measuring a functionally tractable property using a tool that is readily applied to clinical image data. Future studies could aim to validate our model in additional external cohorts and also apply machine learning techniques for prediction of motor, cognitive, and social–emotional outcomes for children born preterm.

Conclusion

A combination of clinical perinatal factors and neonatal DTI measures of white matter microstructure best predict language impairment at 2 years after preterm birth. This model has the potential to enable clinicians identify infants who are at risk of language delay, thus facilitating targeted early intervention and support services. The model comprises clinical and MRI features that have potential to be scalable across centres, so it offers a basis for enhancing the power and generalizability of diagnostic and prognostic studies of neurodevelopmental disorders associated with language impairment.

References

Chawanpaiboon, S. et al. Global, regional, and national estimates of levels of preterm birth in 2014: a systematic review and modelling analysis. Lancet Glob. Health 7, e37–e46 (2019).
Article PubMed Google Scholar
Pierrat, V. et al. Neurodevelopmental outcome at 2 years for preterm children born at 22 to 34 weeks’ gestation in France in 2011: EPIPAGE-2 cohort study. BMJ 358, j3448 (2017).
Article PubMed PubMed Central Google Scholar
van Noort-van der Spek, I. L., Franken, M.-C. J. P. & Weisglas-Kuperus, N. Language functions in preterm-born children: a systematic review and meta-analysis. Pediatrics 129, 745–754 (2012).
Article PubMed Google Scholar
Law, J., Rush, R., Schoon, I. & Parsons, S. Modeling developmental language difficulties from school entry into adulthood: literacy, mental health, and employment outcomes. J. Speech Lang. Hear. Res. 52, 1401–1416 (2009).
Article PubMed Google Scholar
Conti-Ramsden, G., Mok, P. L. H., Pickles, A. & Durkin, K. Adolescents with a history of specific language impairment (SLI): strengths and difficulties in social, emotional and behavioral functioning. Res. Dev. Disabil. 34, 4161–4169 (2013).
Article PubMed PubMed Central Google Scholar
Spittle, A., Orton, J., Anderson, P. J., Boyd, R. & Doyle, L. W. Early developmental intervention programmes provided post hospital discharge to prevent motor and cognitive impairment in preterm infants. Cochrane Database Syst. Rev. CD005495 (2015).
Linsell, L., Malouf, R., Morris, J., Kurinczuk, J. J. & Marlow, N. Prognostic factors for poor cognitive development in children born very preterm or with very low birth weight: a systematic review. JAMA Pediatr. 169, 1162–1172 (2015).
Article PubMed PubMed Central Google Scholar
Linsell, L., Malouf, R., Morris, J., Kurinczuk, J. J. & Marlow, N. Prognostic factors for cerebral palsy and motor impairment in children born very preterm or very low birthweight: a systematic review. Dev. Med. Child Neurol. 58, 554–569 (2016).
Article PubMed PubMed Central Google Scholar
Feldman, H. M., Lee, E. S., Yeatman, J. D. & Yeom, K. W. Language and reading skills in school-aged children and adolescents born preterm are associated with white matter properties on diffusion tensor imaging. Neuropsychologia 50, 3348–3362 (2012).
Article PubMed PubMed Central Google Scholar
Tyson, J. E. et al. Intensive care for extreme prematurity–moving beyond gestational age. N. Engl. J. Med. 358, 1672–1681 (2008).
Article CAS PubMed PubMed Central Google Scholar
Ambalavanan, N. et al. Outcome trajectories in extremely preterm infants. Pediatrics 130, e115–e125 (2012).
Article PubMed PubMed Central Google Scholar
Vesoulis, Z. A., El Ters, N. M., Herco, M., Whitehead, H. V & Mathur, A. M. A web-based calculator for the prediction of severe neurodevelopmental impairment in preterm infants using clinical and imaging characteristics. Children 5, 151 (2018).
Vassar, R. et al. Neonatal brain microstructure and machine-learning-based prediction of early language development in children born very preterm. Pediatr. Neurol. 108, 86–92 (2020).
Ball, G. et al. Multimodal image analysis of clinical influences on preterm brain development. Ann. Neurol. 82, 233–246 (2017).
Article PubMed PubMed Central Google Scholar
Blesa, M. et al. Peak width of skeletonized water diffusion MRI in the neonatal brain. Front. Neurol. 11, 235 (2020).
Article PubMed PubMed Central Google Scholar
Baykara, E. et al. A novel imaging marker for small vessel disease based on skeletonization of white matter tracts and diffusion histograms. Ann. Neurol. 80, 581–592 (2016).
Article PubMed Google Scholar
Boardman, J. P. et al. Impact of preterm birth on brain development and long-term outcome: protocol for a cohort study in Scotland. BMJ Open 10, e035854 (2020).
Article PubMed PubMed Central Google Scholar
Charkaluk, M. L. et al. Neurodevelopment of children born very preterm and free of severe disabilities: the Nord-Pas de Calais Epipage cohort study. Acta Paediatr. 99, 684–689 (2010).
Article CAS PubMed Google Scholar
Wood, N. S. et al. The EPICure study: associations and antecedents of neurological and developmental disability at 30 months of age following extremely preterm birth. Arch. Dis. Child. Fetal Neonatal Ed. 90, F134–F140 (2005).
Article CAS PubMed PubMed Central Google Scholar
Vohr, B. R., Wright, L. L., Poole, W. K. & McDonald, S. A. Neurodevelopmental outcomes of extremely low birth weight infants <32 weeks’ gestation between 1993 and 1998. Pediatrics 116, 635–643 (2005).
Article PubMed Google Scholar
Tseng, K.-T. et al. The impact of advanced maternal age on the outcomes of very low birth weight preterm infants. Medicine 98, e14336 (2019).
Article PubMed PubMed Central Google Scholar
Reynolds, L. C., Inder, T. E., Neil, J. J., Pineda, R. G. & Rogers, C. E. Maternal obesity and increased risk for autism and developmental delay among very preterm infants. J. Perinatol. 34, 688–692 (2014).
Article CAS PubMed PubMed Central Google Scholar
Bozkurt, O. et al. Does maternal psychological distress affect neurodevelopmental outcomes of preterm infants at a gestational age of ≤32weeks. Early Hum. Dev. 104, 27–31 (2017).
Article PubMed Google Scholar
Marret, S. et al. [Effect of magnesium sulphate on mortality and neurologic morbidity of the very-preterm newborn (of less than 33 weeks) with two-year neurological outcome: results of the prospective PREMAG trial]. Gynecol. Obstet. Fertil. 36, 278–288 (2008).
Article CAS PubMed Google Scholar
Synnes, A. et al. Determinants of developmental outcomes in a very preterm Canadian cohort. Arch. Dis. Child. Fetal Neonatal Ed. 102, F235–F234 (2017).
Article PubMed Google Scholar
Twilhaar, E. S. et al. Cognitive outcomes of children born extremely or very preterm since the 1990s and associated risk factors: a meta-analysis and meta-regression. JAMA Pediatr. 172, 361–367 (2018).
Article PubMed PubMed Central Google Scholar
Bell, M. J. et al. Neonatal necrotizing enterocolitis. Therapeutic decisions based upon clinical staging. Ann. Surg. 187, 1–7 (1978).
Article CAS PubMed PubMed Central Google Scholar
van Vliet, E. O. G., de Kieviet, J. F., Oosterlaan, J. & van Elburg, R. M. Perinatal infections and neurodevelopmental outcome in very preterm and very low-birth-weight infants: a meta-analysis. JAMA Pediatr. 167, 662–668 (2013).
Article PubMed Google Scholar
Schmidt, B., Davis, P. G., Asztalos, E. V., Solimano, A. & Roberts, R. S. Association between severe retinopathy of prematurity and nonvisual disabilities at age 5 years. JAMA 311, 523 (2014).
Article CAS PubMed Google Scholar
Blesa, M. et al. Early breast milk exposure modifies brain connectivity in preterm infants. Neuroimage 184, 431–439 (2019).
Article PubMed Google Scholar
Anblagan, D. et al. Association between preterm brain injury and exposure to chorioamnionitis during fetal life. Sci. Rep. 6, 37932 (2016).
Article CAS PubMed PubMed Central Google Scholar
Veraart, J., Fieremans, E. & Novikov, D. S. Diffusion MRI noise mapping using random matrix theory. Magn. Reson. Med. 76, 1582–1593 (2016).
Article CAS PubMed Google Scholar
Tournier, J.-D. et al. MRtrix3: a fast, flexible and open software framework for medical image processing and visualisation. Neuroimage 202, 116137 (2019).
Article PubMed Google Scholar
Smith, S. M. et al. Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23, S208–S219 (2004).
Article PubMed Google Scholar
Andersson, J. L. R. & Sotiropoulos, S. N. An integrated approach to correction for off-resonance effects and subject movement in diffusion MR imaging. Neuroimage 125, 1063–1078 (2016).
Article PubMed Google Scholar
Andersson, J. L. R., Graham, M. S., Zsoldos, E. & Sotiropoulos, S. N. Incorporating outlier detection and replacement into a non-parametric framework for movement and distortion correction of diffusion MR images. Neuroimage 141, 556–572 (2016).
Article PubMed Google Scholar
Tustison, N. J. et al. N4ITK: improved N3 bias correction. IEEE Trans. Med. Imaging 29, 1310–1320 (2010).
Article PubMed PubMed Central Google Scholar
Zhang, H., Yushkevich, P. A., Alexander, D. C. & Gee, J. C. Deformable registration of diffusion tensor MR images with explicit orientation optimization. Med. Image Anal. 10, 764–785 (2006).
Article PubMed Google Scholar
Albers, C. A. & Grieve, A. J. Test Review: Bayley, N. (2006). Bayley Scales of Infant and Toddler Development–Third Edition. San Antonio, TX: Harcourt Assessment. J. Psychoeduc. Assess. 25, 180–190 (2007).
Article Google Scholar
Johnson, S., Moore, T. & Marlow, N. Using the Bayley-III to assess neurodevelopmental delay: which cut-off should be used? Pediatr. Res. 75, 670–674 (2014).
Article PubMed Google Scholar
Kursa, M. B. & Rudnicki, W. R. Feature selection with the Boruta package. J. Stat. Softw. 36, 1–13 (2010).
Article Google Scholar
Kira, K. & Rendell, L. A. The Feature Selection Problem: Traditional Methods and a New Algorithm (AAAI Press, 1992).
Kononenko, I. In Machine Learning: ECML-94. ECML 1994. Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence), Vol. 784 (eds Bergadano, F. & De Raedt, L.) 171–182 (Springer, 1994).
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning Data Mining, Inference, and Prediction (Springer, 2009).
Tsanas, A., Little, M. A., McSharry, P. E., Spielman, J. & Ramig, L. O. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Trans. Biomed. Eng. 59, 1264–1271 (2012).
Article PubMed Google Scholar
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Article Google Scholar
Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
Article Google Scholar
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).
Article Google Scholar
Dessie, E. Y., Tsai, J. J. P., Chang, J.-G. & Ng, K.-L. A novel miRNA-based classification model of risks and stages for clear cell renal cell carcinoma patients. BMC Bioinformatics 22, 270 (2021).
Article CAS PubMed PubMed Central Google Scholar
Park, K. H., Batbaatar, E., Piao, Y., Theera-Umpon, N. & Ryu, K. H. Deep learning feature extraction approach for hematopoietic cancer subtype classification. Int. J. Environ. Res. Public Health 18, 1–24 (2021).
Google Scholar
Lee, Y. W., Choi, J. W. & Shin, E. H. Machine learning model for predicting malaria using clinical information. Comput. Biol. Med. 129, 104151 (2021).
Ivanović, M. D. et al. Predicting defibrillation success in out-of-hospital cardiac arrested patients: Moving beyond feature design. Artif. Intell. Med. 110, 101963 (2020).
Nguyen, Q. D. N., Liu, A. B. & Lin, C. W. Development of a neurodegenerative disease gait classification algorithm using multiscale sample entropy and machine learning classifiers. Entropy 22, 1340 (2020).
Article Google Scholar
Raghunathan, T. E., Lepkowski, J., Hoewyk, J., Van & Solenberger, P. A multivariate technique for multiply imputing missing values using a sequence of regression models. Surv. Methodol. 27, 85–95 (2001).
Van Buuren, S. Multiple imputation of discrete and continuous data by fully conditional specification. Stat. Methods Med. Res. 16, 219–242 (2007).
Article PubMed Google Scholar
Tsanas, A. Accurate Telemonitoring of Parkinson’s Disease Symptom Severity using Nonlinear Speech Signal Processing and Statistical Machine Learning. PhD thesis, Oxford Univ. (2012).
Telford, E. J. et al. A latent measure explains substantial variance in white matter microstructure across the newborn human brain. Brain Struct. Funct. 222, 4023–4033 (2017).
Article PubMed PubMed Central Google Scholar
Boardman, J. P. & Counsell, S. J. Invited Review: Factors associated with atypical brain development in preterm infants: insights from magnetic resonance imaging. Neuropathol. Appl. Neurobiol. 46, 413–421 (2020).
Article CAS PubMed Google Scholar
Batalle, D., Edwards, A. D. & O’Muircheartaigh, J. Annual Research Review: Not just a small adult brain: understanding later neurodevelopment through imaging the neonatal brain. J. Child Psychol. Psychiatry Allied Discip. 59, 350–371 (2018).
Article Google Scholar
Wadhawan, R. et al. Twin gestation and neurodevelopmental outcome in extremely low birth weight infants. Pediatrics 123, e220–e227 (2009).
Article PubMed Google Scholar
Adams-Chapman, I., Bann, C. M., Vaucher, Y. E., Stoll, B. J. & Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network. Association between feeding difficulties and language delay in preterm infants using Bayley Scales of Infant Development-Third Edition. J. Pediatr. 163, 680.e3–685.e3 (2013).
Thorpe, K., Rutter, M. & Greenwood, R. Twins as a natural experiment to study the causes of mild language delay: II: Family interaction risk factors. J. Child Psychol. Psychiatry 44, 342–355 (2003).
Article PubMed Google Scholar
Thorpe, K. Twin children’s language development. Early Hum. Dev. 82, 387–395 (2006).
Article PubMed Google Scholar
Horta, B. L., Loret De Mola, C. & Victora, C. G. Breastfeeding and intelligence: a systematic review and meta-analysis. Acta Paediatr. 104, 14–19 (2015).
Article PubMed Google Scholar

Download references

Funding

This work was supported by Theirworld (www.theirworld.org). The work was undertaken in the MRC Centre for Reproductive Health, which was funded by MRC Centre Grant (MRC G1002033). The study was also supported by Health Data Research UK, which receives its funding from HDR UK Ltd (HDR-5012) funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation, and the Welcome Trust. The funders had no role in the study and the decision to submit this work to be considered for publication.

Author information

Authors and Affiliations

Usher Institute, Medical School, University of Edinburgh, Edinburgh, UK
Evdoxia Valavani & Athanasios Tsanas
MRC Centre for Reproductive Health, University of Edinburgh, Edinburgh, UK
Manuel Blesa, Paola Galdi, Gemma Sullivan, Bethan Dean & James P. Boardman
NHS Lothian-Neonatal Physiotherapy, Royal Infirmary of Edinburgh, Edinburgh, UK
Hilary Cruickshank
NHS Lothian-Neonatology, Royal Infirmary of Edinburgh, Edinburgh, UK
Magdalena Sitko-Rudnicka
Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK
Mark E. Bastin & James P. Boardman
Muir Maxwell Epilepsy Centre, Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK
Richard F. M. Chin
Royal Hospital for Sick Children, Edinburgh, UK
Richard F. M. Chin
Division of Psychiatry, Deanery of Clinical Sciences, Royal Edinburgh Hospital, University of Edinburgh, Edinburgh, UK
Donald J. MacIntyre
Salvesen Mindroom Research Centre, University of Edinburgh, Edinburgh, UK
Sue Fletcher-Watson

Authors

Evdoxia Valavani
View author publications
You can also search for this author in PubMed Google Scholar
Manuel Blesa
View author publications
You can also search for this author in PubMed Google Scholar
Paola Galdi
View author publications
You can also search for this author in PubMed Google Scholar
Gemma Sullivan
View author publications
You can also search for this author in PubMed Google Scholar
Bethan Dean
View author publications
You can also search for this author in PubMed Google Scholar
Hilary Cruickshank
View author publications
You can also search for this author in PubMed Google Scholar
Magdalena Sitko-Rudnicka
View author publications
You can also search for this author in PubMed Google Scholar
Mark E. Bastin
View author publications
You can also search for this author in PubMed Google Scholar
Richard F. M. Chin
View author publications
You can also search for this author in PubMed Google Scholar
Donald J. MacIntyre
View author publications
You can also search for this author in PubMed Google Scholar
Sue Fletcher-Watson
View author publications
You can also search for this author in PubMed Google Scholar
James P. Boardman
View author publications
You can also search for this author in PubMed Google Scholar
Athanasios Tsanas
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All authors have made substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data. All authors have drafted the article or revised it critically for important intellectual content and have approved the final version to be considered for publication.

Corresponding author

Correspondence to Evdoxia Valavani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

Ethical approval was obtained from the UK National Research Ethics Service (NRES), South East Scotland Research Ethics Committee (NRES numbers 11/55/0061 and 13/SS/0143). Written informed consent from parents/carers was obtained for all neonates.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary abbreviations

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Valavani, E., Blesa, M., Galdi, P. et al. Language function following preterm birth: prediction using machine learning. Pediatr Res 92, 480–489 (2022). https://doi.org/10.1038/s41390-021-01779-x

Download citation

Received: 20 April 2021
Revised: 04 August 2021
Accepted: 12 September 2021
Published: 11 October 2021
Issue Date: August 2022
DOI: https://doi.org/10.1038/s41390-021-01779-x

This article is cited by

Predicting mental and psychomotor delay in very pre-term infants using machine learning
- Gözde M. Demirci
- Phyllis M. Kittler
- Chia-Ling Tsai
Pediatric Research (2024)
Machine learning for understanding and predicting neurodevelopmental outcomes in premature infants: a systematic review
- Stephanie Baker
- Yogavijayan Kandasamy
Pediatric Research (2023)
Intelligent wearable allows out-of-the-lab tracking of developing motor abilities in infants
- Manu Airaksinen
- Anastasia Gallen
- Sampsa Vanhatalo
Communications Medicine (2022)
Früherkennung primärer Sprachentwicklungsstörungen – zunehmende Relevanz durch Änderung der Diagnosekriterien?
- Christiane Kiese-Himmel
Bundesgesundheitsblatt - Gesundheitsforschung - Gesundheitsschutz (2022)

Abstract

Background

Methods

Results

Conclusions

Impact

Similar content being viewed by others

Introduction

Methods

Participants

Clinical and demographic features

Image acquisition

Image analysis

Language outcome

Data analysis

Results

Discussion

Conclusion

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Ethics approval and consent to participate

Additional information

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links