Introduction

Preterm infants have long been recognized as a population at high risk for mortality and adverse functional outcomes, including cerebral palsy and intellectual impairment.1 As mortality rates for preterm neonates decline and more survive to childhood,2,3 attention has increasingly turned towards measuring longer-term morbidities and related functional impairments during childhood and young-adulthood, as well as identifying risk factors related to these complications.4,5 While child-specific characteristics, such as gestational age, birth weight, and sex, are well established as predictors of adverse neurodevelopmental outcomes,6,7,8 recent work has identified additional factors, including bronchopulmonary dysplasia and family socioeconomic status, that are correlated with relevant outcomes, such as poor neuromotor performance and low intelligence quotient at school age.9

In clinical settings, the assessment of prognosis can vary widely across neonatologists,10 making a valid and reliable predictive model for long-term outcomes a highly sought-after clinical tool. Moreover, predicting outcomes is vital when making decisions regarding which therapeutic interventions to apply, when providing critical data to parents for informed decision-making, and when matching infants with outpatient services to best meet their needs. In addition, prediction models are useful in evaluating Neonatal Intensive Care Unit (NICU) performance and allowing for between-center comparisons with proper adjustment for the severity of cases being treated.11

Numerous prediction tools have been developed to quantify the risk of death for preterm neonates in the NICU setting, including the Score for Neonatal Acute Physiology (SNAP) and the Clinical Risk Index for Babies (CRIB).12 The National Institute of Child Health and Human Development (NICHD) risk calculator, predicting survival with and without neurosensory impairment, is widely used to counsel families in the setting of threatened delivery at the edges of viability.13 Furthermore, there are numerous other models that use clinical data from the NICU stay to predict risk for poor functional outcomes in infancy and school age.14,15 While several studies have categorized and evaluated the risk prediction models developed and validated in recent decades for mortality,12,16 no studies have compared and contrasted risk prediction models for non-mortality outcomes. Recently, Linsell et al.17 published a systematic review of risk factor models for neurodevelopmental outcomes in children born very preterm or very low birth weight (VLBW). However, this review focused primarily on overall trends in model development and validation rather than a detailed consideration of individual models.

In this article, we conduct an in-depth, narrative review of the current risk models available for predicting the functional outcomes of preterm neonates, evaluating their relative strengths and weaknesses in variable and outcome selection, and considering how risk model development and validation can be improved in the future. To this end, we first provide an overview of the different risk models developed since 1990. We then frame our review of these models in terms of the outcomes predicted, the range of predictors considered, and the statistical methods used both to select the variables included in the final model and to assess the model's predictive performance. Finally, we consider the ethical implications of integrating risk stratification into standard clinical care for preterm neonates.

Methods

We conducted a manual search for relevant literature via PubMed, entering combinations of key terms synonymous with “prediction tool,” “preterm,” and “functional outcome” and reading the abstracts of resulting studies (Table 1). Studies with abstracts that appeared related to our review were then read in full to identify prediction models that were eligible for inclusion. Reference lists of included studies were also reviewed, as were articles that later cited these original studies. Prediction tools were defined as multivariable risk factor analyses (>2 variables) aiming to predict the probability of developing functional outcomes beyond 6 months corrected age. Models that solely investigated associations between individual risk factors and outcomes were excluded, as were models that were not evaluated for predictive ability in terms of either a validation study or an assessment of overall performance, discrimination, or calibration. Measures accepted as evaluating a model’s overall performance were R², adjusted R², and the Brier score. A receiver operating characteristic (ROC) curve or a C-index was accepted as an evaluation of discrimination, and the Hosmer–Lemeshow test as an evaluation of calibration.18 Preterm neonates were defined as those born at <37 completed weeks of gestation. Models with VLBW neonates <1500 g were also included, since in the past birth weight served as a substitute for measuring prematurity when gestational age could not be accurately determined.
Models were excluded if they used a cohort entirely composed of infants born prior to 1 January 1990; those born after 1990 were likely to have had surfactant therapy available in the event of respiratory distress syndrome, which significantly reduced the morbidity and mortality rates among preterm neonates nationwide.19,20 Models were also excluded if they limited their prediction to the outcome of survival, if they incorporated variables measured after initial NICU discharge, or if they included subjects who were not necessarily transferred to a NICU for further care following delivery. Finally, we excluded tools that only predicted outcomes to an age of <6 months corrected age, as well as case reports, narrative reviews, and tools reported in languages other than English.
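To make two of the performance measures named above concrete, the following minimal sketch computes a Brier score (overall performance) and a C-index (discrimination) for a binary outcome. The data and the function names `brier_score` and `c_statistic` are hypothetical and chosen only for illustration; they are not drawn from any of the reviewed studies.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_statistic(y_true, y_prob):
    """Share of case/non-case pairs in which the case has the higher predicted risk.

    Ties count as one half. For a binary outcome this equals the area under
    the ROC curve.
    """
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical data: 1 = impairment at follow-up, 0 = no impairment.
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
predicted = [0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.35, 0.90]

print(f"Brier score: {brier_score(outcomes, predicted):.3f}")
print(f"C-index:     {c_statistic(outcomes, predicted):.3f}")
```

A lower Brier score indicates better overall performance, whereas a C-index of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of cases above non-cases.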

Table 1 Search terms used in literature review of studies with risk prediction models for functional outcomes of preterm neonates.

Overview of risk prediction models

Table 2 lists all 32 studies with risk prediction models that meet the inclusion and exclusion criteria.13,14,15,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49 From these, a total of 43 distinct models were reported.

Table 2 Summary of studies with risk prediction models predicting functional outcomes of preterm neonates.

From mortality to neurodevelopmental impairment

Since 1990, several mortality prediction tools have been evaluated with regard to their ability to predict the likelihood of neurodevelopmental impairment (NDI) among neonates surviving to NICU discharge. One such model is the CRIB, which incorporates six physiologic variables collected within the first 12 h of the preterm infant’s life: birth weight, gestational age, presence of congenital malformations, maximum base excess, and minimum and maximum FiO2 requirement.50 Fowlie et al.24 evaluated how CRIB models obtained at differing time periods over the first 7 days of life predicted severe disability among a group of infants born <31 weeks gestational age or VLBW. In another study, Fowlie et al.25 incorporated cranial ultrasound findings on day of life 3 along with CRIB scores between 48 and 72 h of life into their prediction model. Subsequent studies analyzed the CRIB in its original 12-h form and, with only one exception,23 determined that it was not a useful tool for predicting long-term NDI or other morbidities.26,27,28,29 A second example is the SNAP score.51 SNAP uses 28 physiologic parameters collected over the first 24 h of life to predict survival to NICU discharge, and was modified to predict NDI at 1 year and 2–3 years of age. A subsequent assessment of both the SNAP and the SNAP with Perinatal Extension42 showed a poor predictive value for morbidity at 4 years of age for children born VLBW and/or with gestational age ≤31 weeks.28 Finally, the Neonatal Therapeutic Intervention Scoring System, a comprehensive exam-based prediction tool for mortality,52 was found to have a poor predictive value for adverse outcomes at 4 years of age in children born very preterm or VLBW.28

Shortened forms of the early physiology-based scoring systems were developed and assessed for their ability to predict outcomes in childhood. Application of the CRIB-II on a small cohort (n = 107) of infants born <1250 g predicted significant NDI at 3 years of age.39 However, a subsequent evaluation in a much larger cohort (n = 1328) of preterm infants <29 weeks gestational age concluded that the CRIB-II did no better than gestational age or birth weight alone in predicting moderate to severe functional disability at 2–3 years of age.40 Studies have supported an association between the SNAP-II and SNAPPE-II scores and neurodevelopmental outcomes and small head circumference at 24 months corrected age. High SNAP-II scores were shown to correlate with adverse neurological, cognitive, and behavioral outcomes up to 10 years of age within a large cohort (n = 874) of children born very preterm.43

Antenatal risk factors

Several groups have used data from the NICHD’s Neonatal Research Network (NRN) to design and test various risk prediction models for extremely low birth weight (ELBW) newborns. One of the most widely used risk prediction tools developed from this cohort was by Tyson et al.,13 using data from ELBW infants in NRN centers between 1998 and 2003. The model includes five variables available prior to delivery—gestational age, estimated birth weight, sex, plurality, and antenatal corticosteroid exposure—to predict risk for death or profound NDI for infants born between 22 and 25 completed weeks gestation. The model has been incorporated into an online calculator that facilitates the counseling of families facing delivery at the margins of viability. It has also been validated by two separate studies—Lee et al.53 only evaluated the model for predicting death before discharge, but Marrs et al.38 evaluated the model for risk of death or NDI as well.

Postnatal morbidity

A large cohort study (n = 910) from Schmidt et al.15,32 used data from ELBW neonates 500–999 g enrolled in the international Trial of Indomethacin Prophylaxis in Preterms (TIPP). They found that the presence of three morbidities at 36 weeks post-menstrual age—bronchopulmonary dysplasia, serious brain injury, and severe retinopathy of prematurity—had a significant and additive effect on the risk for death or poor neurologic outcome at 18 months corrected age. They developed a model from this relationship that has been corroborated in two studies with smaller samples and by Schmidt et al.15 in a separate, large cohort in which the definition of poor outcome was expanded from solely NDI to “poor general health.”33,34

Letting the machines decide

Ambalavanan et al.14,35 have recently done innovative work in creating several risk prediction models.45 Along with studies developing risk prediction tools with data from the NRN and the TIPP to predict the outcomes of death and NDI or solely NDI, the group made the only risk prediction tool for the outcome of rehospitalization, both general and specifically for respiratory complications, using a combination of physiologic and socioeconomic variables incorporated into a decision tree approach. They have also been the only group to create neural network-trained models, using the same small cohort to predict major handicap, low mental development index (MDI), or low psychomotor development index (PDI). The advantage of neural networks, algorithms that can “learn” mathematical relationships between a series of independent variables and a set of outcomes, is that the model itself can elucidate complex or nonlinear relationships without their having to be specified a priori (as is typically required when using multiple regression models). Despite the use of innovative approaches, however, none of these models differed meaningfully in predictive strength from those in other studies, nor did any achieve high predictive efficacy.31

Limitations of prior approaches

The above literature review highlights the substantial interest in developing a clinically useful risk prediction model and the limits of efforts to date. Notwithstanding their differing inclusion and exclusion criteria, existing risk prediction models are relatively similar in terms of variables selected, outcomes analyzed, and statistical strategies employed. With few exceptions, the limitations of existing risk prediction models are especially apparent in their reliance on solely biologic variables and traditional analytic methods ill-equipped to handle the statistical complexity necessary for risk modeling.

Conceptual considerations

Identifying important outcomes

The majority of risk prediction models defined NDI as their primary outcome of interest. Making a determination of impairment often relies on standardized measures of cognition in concert with neurosensory deficits. Yet, researchers often define NDI in different ways, making between-study comparisons difficult. NDI is a construct relating to global abilities encompassing cognition, language, motor function, and vision and hearing. While the tools used to identify NDI are often also used to make diagnoses of developmental delay, NDI is not a clinical term or diagnosis in and of itself. Many of the remaining studies also predicted functional outcomes, such as academic performance, executive function, language ability, and autism spectrum disorder (ASD). These outcomes may be more meaningful to parents and providers than NDI.54

To date, only four studies have considered outcomes unrelated to neurodevelopment, such as impaired pulmonary function, “poor general health,” and rehospitalization rates.15,28,45,49 While the emphasis on NDI is unsurprising given the high-risk population, moderate to severe NDI only affects a minority of the preterm population.55,56 Studies have revealed numerous additional adverse outcomes that preterm individuals are more likely to experience compared to their full-term counterparts, such as impaired respiratory, cardiovascular, and metabolic function.57,58,59,60,61,62,63,64,65,66 Neurodevelopment has been linked to chronic health problems in later childhood.67 Limiting risk prediction to moderate to severe NDI therefore ignores other, more common complications that preterm infants are likely to face that have an impact on neurodevelopment. This represents a missed opportunity for researchers to better understand what variables influence the likelihood that these problems occur.

The impact of developmental disability on the child and family is completely absent from current risk models. Health-related quality of life (HRQL), which distinguishes itself as a personal rather than third-party valuation of a patient’s physical and emotional well-being, is being increasingly appreciated as an important metric necessary to fully understand the impact of prematurity.68 In a French national survey, the majority of neonatologists, obstetricians, and pediatric neurologists stated that predicting HRQL in the long term for preterm infants would be beneficial for consulting parents about what additional responsibilities they can anticipate in caring for their child.69 The trajectory of HRQL from childhood to young-adulthood appears to improve in both VLBW and extremely low gestational age populations.70 Prediction modeling might aid in determining which factors could positively or negatively impact HRQL in this vulnerable population.

Finally, we must consider the age at which outcomes are being predicted. It is evident that lower gestational age is associated with higher rates of NDI and with lower academic achievement in adolescence.71,72 However, the vast majority of risk prediction models assessed outcomes at the age of 3 years or less, with only three studies doing so at 10 years of age or above. Although early childhood outcomes may give clues about later development, many problems do not manifest until later in childhood, such as learning disabilities and certain psychiatric disorders. Developmental disability severity can fluctuate throughout childhood, with catch-up occurring in early preterm children and worsening delay in some moderate and late preterm children.73,74 Although cohorts of preterm infants are not usually followed for more than several years, likely due to lack of resources and expense, recent studies have used data from national registries to link neonatal clinical data to sampled adults, providing evidence of increased rates of adverse neurodevelopmental, behavioral, and educational outcomes among adults born preterm.75,76 Opportunities are therefore available to use long-term data to extend risk prediction models beyond the first few years of life.

Variable selection

Most of the risk models reviewed relied primarily on physiologic and clinical measures obtained during the NICU stay. While an emphasis on biologic risk factors is clearly reasonable given the known associations between perinatal morbidities and long-term outcomes, there is strong evidence in the literature suggesting associations between sociodemographic factors like parental race, education, and age, and outcomes such as cognitive impairment, cerebral palsy, and mental health disorders in children born preterm. More specific socioeconomic variables such as lower parental education, maternal income, insurance status, foreign country of birth by a parent, and socioeconomic status as defined by the Elley-Irving Socioeconomic Index have been repeatedly correlated with reduced mental development index, psychomotor development index, intelligence quotient, and social competence throughout childhood.71,72,77,78,79,80,81,82 The geographic area in which preterm neonates are raised could also have a profound influence on their development. Neighborhood poverty rate, high school dropout rate, and place of residence (metropolitan vs. non-metropolitan) have all been correlated with academic skills and rate of mental health disorders among low birth weight children.83,84

Only 12 of the 43 models reviewed included socioeconomic variables. This may be due, at least in part, to the difficulty in obtaining social, economic, and demographic data; these variables are often not collected upon hospital admission. Additionally, socioeconomic information is often poorly, inaccurately, and variably recorded or is largely missing.85 Some risk prediction models collected socioeconomic variables at the follow-up visit when outcomes were assessed. This is an imperfect method given that factors such as household setting and family income may change substantially in the years following NICU discharge and affect children’s health.86,87

In some models, socioeconomic variables were not included because they did not significantly improve the model’s predictive ability.45 Testing the effects of social factors on infant and child outcomes requires samples that are socially and economically diverse. Even large, diverse study populations may become more homogeneous over time, as subjects of lower socioeconomic status and non-white race are more likely to drop out of studies dependent on long-term follow-up.41 Moreover, treating socioeconomic variables as statistically independent rather than interrelated factors might understate the impact of contextual information on neurodevelopmental outcomes.

Statistical considerations

Model development

Of the 32 papers included in the review, 12 reported on de novo risk prediction tools. The other 20 studies either evaluated a previous model or adjusted a prior model by changing the times at which data were collected or by adding additional variables. The approach to prediction tool development was almost uniform among the studies, with nine of the models solely using regression techniques to select variables. Ambalavanan et al. deviated from this method in three separate studies: two using classification tree analysis,35,45 and one using a four-layer back-propagation neural network.31

Each new model, with the exception of the neural network-based model by Ambalavanan et al.,31 depended on an approach in which individual variables were selected and treated as independent of one another as they were analyzed in their ability to predict the outcome of interest. Yet, variables may, in fact, not act independently. While parsing the roles of potential interrelationships may be computationally onerous and treating them independently may lead to a more parsimonious model, this may be at the expense of accuracy. Alternative computational approaches are needed to account for the differential likelihoods of certain outcomes on the causal pathway from preterm birth to later childhood outcome. Nonlinear statistical tools should be further utilized in risk prediction model development to examine the relationships between variables and outcomes of interest. Machine learning, for instance, is a method of inputting a group of variables and generating a predictive model without assuming that the factors are independent of one another or that specific factors will contribute the most to the model.88 Different forms of machine learning have already been employed in NICUs to extract the most important variables for predicting outcomes such as days to discharge.89
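As a hedged illustration of what non-independence means in a regression setting, consider two hypothetical binary risk factors, x_1 (for example, bronchopulmonary dysplasia) and x_2 (for example, low family socioeconomic status). The symbols and the choice of factors are assumptions made for illustration and are not taken from any reviewed model. A main-effects logistic model assumes that each factor shifts the log-odds of the outcome Y by a fixed amount regardless of the other, whereas adding a product term lets the effect of one factor depend on the level of the other:

```latex
\begin{align}
\operatorname{logit} P(Y = 1 \mid x_1, x_2)
  &= \beta_0 + \beta_1 x_1 + \beta_2 x_2
  && \text{(main effects only)} \\
\operatorname{logit} P(Y = 1 \mid x_1, x_2)
  &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2
  && \text{(with interaction)}
\end{align}
```

When \beta_{12} \neq 0, the main-effects model is misspecified. In a regression framework such a term has to be written down in advance, whereas tree-based and neural network methods can in principle capture it without prior specification.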

The non-independence of risk factors is also complicated by the role of time in models of human health and development. The life-course framework describes how an accumulation or “chains” of risk experienced over time and at certain critical periods impact later health outcomes.90 In the context of preterm birth, the risk of being born early is not uniform across populations and depends on a given set of maternal risks. In turn, the degree of prematurity imparts differential risk for developing complications such as bronchopulmonary dysplasia, necrotizing enterocolitis, or retinopathy of prematurity. These morbidities then increase risks for further medical and developmental impairment. These time-varying probabilities can be modeled and incorporated into prediction tools to capture more accurately the longitudinal and varying relationships between exposures and outcomes and thereby improve estimates of risk.91,92,93

A final methodological concern regarding model development is whether and how the competing risk of death is considered when the outcome being predicted is non-terminal. Consider, for example, the task of developing a model for the risk of NDI at 10 years of age. How one handles death can have a dramatic effect on the model, especially since mortality is relatively high among preterm infants. Moreover, if death is treated simply as a censoring mechanism, as is often done in time-to-event analyses such as those based on the Cox model, then the overall risk of NDI will be artificially inflated; those children who die before being diagnosed with NDI will be treated as though they remained at risk even though they cannot possibly be subsequently diagnosed with NDI. While an alternative to this would be to use a composite outcome of time to the first of NDI or death, doing so may result in a model that is unable to predict either event well. Instead, one promising avenue is to frame the development of a prediction model for NDI within the semi-competing risks paradigm.94,95 Briefly, semi-competing risks refer to settings where one event is a competing risk for the other, but not vice versa. This is distinct from standard competing risks, where each event competes with the other (e.g., death due to one cause or another). To the best of our knowledge, however, semi-competing risks have not been applied to the study of long-term outcomes among preterm infants.
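The consequence of censoring deaths can be sketched numerically. The example below is a simplified illustration under stated assumptions: it uses the standard competing-risks cumulative incidence (Aalen-Johansen) estimator rather than the semi-competing risks models cited above, and the ten follow-up records, the event coding, and the function name `naive_and_cif` are all hypothetical.

```python
def naive_and_cif(times, events, horizon):
    """Return two estimates of the risk of the event of interest by `horizon`.

    events: 0 = censored, 1 = event of interest (e.g., NDI diagnosis),
            2 = competing event (death before the event of interest).
    First value:  1 - Kaplan-Meier, with deaths treated as censoring.
    Second value: Aalen-Johansen cumulative incidence, which accounts for death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    km_naive = 1.0   # Kaplan-Meier survival when deaths are censored
    surv_all = 1.0   # probability of being alive and free of the event
    cif = 0.0        # cumulative incidence of the event of interest
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_event = d_death = n_censored = 0
        while i < len(data) and data[i][0] == t:
            code = data[i][1]
            if code == 1:
                d_event += 1
            elif code == 2:
                d_death += 1
            else:
                n_censored += 1
            i += 1
        cif += surv_all * d_event / n_at_risk
        km_naive *= 1.0 - d_event / n_at_risk
        surv_all *= 1.0 - (d_event + d_death) / n_at_risk
        n_at_risk -= d_event + d_death + n_censored
    return 1.0 - km_naive, cif


# Hypothetical follow-up (years) for ten children.
times = [1, 1, 2, 2, 3, 4, 5, 6, 8, 10]
events = [2, 2, 1, 2, 1, 0, 1, 2, 1, 0]

naive, cif = naive_and_cif(times, events, horizon=10)
print(f"1 - KM with deaths censored: {naive:.3f}")
print(f"Cumulative incidence:        {cif:.3f}")
```

In this toy data set four of the ten children are diagnosed, yet the estimate that censors deaths is considerably larger than the cumulative incidence, because children who died are implicitly assumed to have had the same subsequent risk of diagnosis as those who survived.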

Model evaluation

Waljee et al.18 provide a summary of methods for assessing the performance of a predictive model, categorizing them into three types: overall model performance, which focuses on the extent of variation in risk explained by the model; calibration, which assesses differences between observed and predicted event rates; and discrimination, which assesses the ability to distinguish between patients who do and do not experience the outcome of interest. The majority of studies in our review assessed their models with ROC curve analysis, a method of assessing discrimination. Although ROC-based assessments are widely used, there is some debate about them, specifically regarding their lack of sensitivity in detecting differences between good predictive models.96 Although several novel performance measures for comparing discrimination among models have been proposed, none have been employed in the context of comparing risk prediction tools for preterm neonates.97,98
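Calibration, as defined above, compares observed with predicted event rates. A minimal sketch of the Hosmer–Lemeshow statistic follows, assuming predicted probabilities strictly between 0 and 1 and risk-ordered groups of equal size; the data and the function name `hosmer_lemeshow` are hypothetical.

```python
def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups.

    Returns the statistic and a table of (group size, observed events,
    expected events) for each group.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        if not 0.0 < mean_p < 1.0:
            raise ValueError("mean predicted risk in a group must lie in (0, 1)")
        # Squared observed-minus-expected difference, scaled by binomial variance.
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
        table.append((size, observed, expected))
    return stat, table


# Hypothetical data: three risk strata of four infants each.
predicted = [0.1] * 4 + [0.5] * 4 + [0.9] * 4
outcomes = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]

stat, table = hosmer_lemeshow(outcomes, predicted, groups=3)
print(f"Hosmer-Lemeshow statistic: {stat:.2f}")
for size, observed, expected in table:
    print(f"n={size}  observed={observed}  expected={expected:.1f}")
```

The statistic is conventionally compared against a chi-square distribution with (groups − 2) degrees of freedom when computed on the development data; a large value suggests that predicted and observed event rates disagree. The grouped table is often more informative than the test itself, since it shows where along the risk spectrum the disagreement arises.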

Few studies employed analyses other than ROC. Only six in our review assessed overall performance with R² or partial R², and five evaluated calibration using the Hosmer–Lemeshow test. Another four studies performed internal validation with either an internal validation set or bootstrapping techniques.99 There were nine studies meeting inclusion criteria solely because they had models that were externally validated via other studies. Schmidt et al.32 reported odds ratios for their 3-morbidity model, which are not a reliable measure of the strength of a risk prediction tool.100 Future risk model assessments for preterm neonates should at minimum include an ROC curve analysis, although assessments of overall performance and calibration would also be helpful. Validation with a different sample from the development set is also advised, ideally with a population outside the original cohort.18
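One common form of bootstrap internal validation is optimism correction: the model is refitted on resampled data, and the average drop in discrimination when each refitted model is applied back to the original sample is subtracted from the apparent performance. The sketch below is illustrative only and does not reproduce the procedure of any reviewed study. The simulated data, the plain gradient-descent logistic fit, and all function names are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, steps=150, lr=0.5):
    """Fit logistic regression by gradient descent; returns [intercept, b1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(b * v for b, v in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [b - lr * g / n for b, g in zip(w, grad)]
    return w


def predict(w, X):
    return [sigmoid(w[0] + sum(b * v for b, v in zip(w[1:], xi))) for xi in X]


def auc(y, p):
    """Area under the ROC curve via pairwise concordance (ties count one half)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in cases for b in controls)
    return wins / (len(cases) * len(controls))


def optimism_corrected_auc(X, y, n_boot=50, seed=0):
    """Return (apparent AUC, bootstrap optimism-corrected AUC)."""
    rng = random.Random(seed)
    apparent = auc(y, predict(fit_logistic(X, y), X))
    optimism = []
    n = len(y)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:      # skip resamples containing a single outcome class
            continue
        Xb = [X[i] for i in idx]
        wb = fit_logistic(Xb, yb)
        # Performance on the resample minus performance on the original data.
        optimism.append(auc(yb, predict(wb, Xb)) - auc(y, predict(wb, X)))
    return apparent, apparent - sum(optimism) / len(optimism)


# Simulated cohort: three predictors, of which only the first is informative.
rng = random.Random(1)
X, y = [], []
for _ in range(60):
    x = [rng.gauss(0, 1) for _ in range(3)]
    X.append(x)
    y.append(1 if x[0] + rng.gauss(0, 1) > 0 else 0)

apparent, corrected = optimism_corrected_auc(X, y)
print(f"Apparent AUC:           {apparent:.3f}")
print(f"Optimism-corrected AUC: {corrected:.3f}")
```

The corrected value estimates how the model would discriminate in new patients from the same population. It does not replace external validation in a different cohort.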

Conclusion

Risk assessment and outcomes prediction are valuable tools in medical decision-making. Fortunately, infants born prematurely enjoy an ever-increasing likelihood of survival. Research over the past several decades has highlighted the many influences, physiologic and psychosocial, affecting neurodevelopment, HRQL, and health services utilization. Yet, the wealth of knowledge gained from longitudinal studies of growth and development is not reflected in current risk prediction models. Moreover, some of the most well-known and widely used tools today, such as Tyson et al.’s13 five-factor model, were developed nearly two decades ago. As advances in neonatal intensive care progressively reduce the risk of certain outcomes, it is clear that these older models require updating if they are to be of continued clinical use. It should be recognized that there are potential ethical ramifications to incorporating more psychosocial factors and outcomes into risk prediction models, such as crossing the line from risk stratification to “profiling” patients and offering different treatment decisions based on race or class.101 However, physician predictions without the aid of prediction tools are highly inconsistent during counseling at the margins of viability, and further research is needed regarding the level of influence that physicians actually have on caregiver decision-making during counseling, as well as the extent to which risk prediction tools would change their approach to counseling.10 In addition, despite recent innovation in statistical approaches to risk modeling, such as machine learning, most prediction tools rely on standard regression techniques.
Insofar as risk prediction models will continue to be developed for preterm neonatal care, making use of the clinical data available in most modern electronic health records and taking into consideration the analytic challenges related to unequal prior probabilities of exposures, non-independence of variables, and semi-competing risks can only strengthen our approach to predicting outcomes. We therefore recommend taking a broader view of risk and incorporating these concepts to create stronger risk prediction tools that can ultimately benefit the long-term care of preterm neonates.