Introduction

Recent years have seen marked improvements in survival and outcomes for infants born at less than 30 weeks gestational age.1 Despite this positive trend, children born very preterm (VPT) remain at high risk for long-term physical and mental health problems, as well as developmental delay. In longitudinal follow-up studies, VPT children show deficits or delays in cognitive, motor, and language development and are at increased risk for disorders such as cerebral palsy (CP) and autism spectrum disorder (ASD).2,3,4,5,6 However, outcomes vary considerably among children in this group, with many showing few to no long-term impairments.

Most studies examining neurodevelopmental outcomes in VPT children report rates of impairment for individual outcomes separately. That is, they report the prevalence of neurocognitive impairments (e.g., intelligence quotient [IQ] < 70) or medical diagnoses (e.g., CP) as discrete outcomes distinct from each other. This approach treats the discrete outcomes as independent of one another, when in fact it is likely that some outcomes will co-occur. An alternative to the individual variable-based approach is to integrate across multiple measures to identify subgroups of children with similar patterns of behavior or impairment. Given the great diversity of outcomes for VPT children, a comprehensive, “person-” or “child-centered” approach might identify subgroups of children who experience greater, fewer, or different types of neurodevelopmental impairments across multiple domains and provide a more nuanced description of these children.

One methodology for identifying subgroups of children is latent profile analysis (LPA). Conceptually, LPA is a statistical tool that captures similarities and differences between individuals rather than modeling relationships among variables at the group level. LPA identifies subgroups of individuals who are similar to one another, but different from individuals in other subgroups, based on patterns of performance across multiple variables. Applying LPA to outcome data in VPT children provides a comprehensive picture of children who are at varying levels of risk across multiple domains that could enable us to develop more targeted prevention and intervention strategies.

LPA has been applied to developmental outcomes in middle childhood among children born preterm, including studies with extremely7,8 and moderately preterm9 samples. LPA successfully identified subgroups of children who ranged from average or above average to severely impaired on standardized cognitive or behavioral outcomes such as IQ, attention, executive function, and internalizing and externalizing problems.7,8,9 In the current investigation, we similarly apply LPA as a method for summarizing outcomes for VPT children. However, our study is novel in that we considered younger children and a wider range of neurodevelopmental outcomes. Whereas previous studies have included either cognitive7,9 or behavioral outcomes8 in their latent profile models, our goal was to investigate patterns of developmental outcome measures across different domains of functioning. Thus, we included cognitive, behavioral, language, and motor outcomes, as well as CP diagnosis and ASD risk at 24 months corrected age. We hypothesized that we would observe profiles that ranged from above average or average to severely impaired neurodevelopment with or without abnormal behavior.

Methods

Study population

The Neonatal Neurobehavior and Outcomes in Very Preterm Infants (NOVI) study enrolled infants born <30 weeks postmenstrual age (PMA) from nine NICUs affiliated with six universities from April 2014 to June 2016. Inclusion criteria included: (1) birth <30 weeks PMA; (2) parental ability to read and speak English or Spanish; and (3) residence within 3 h of the NICU and follow-up clinic. Infants were excluded for major congenital anomalies, maternal age <18 years, cognitive impairment, or death.

Parents of eligible infants were invited to participate in the study at 31–32 weeks PMA or when survival to discharge was determined to be likely by the attending neonatologist. Researchers explained study procedures and obtained informed consent in accordance with the requirements of each institution’s review board. Children were included in this analysis if they were enrolled in NOVI at birth and were seen at the 24-month follow-up visit (mean corrected age = 25.3 months).

Measures

Bayley Scales of Infant and Toddler Development, 3rd edition (Bayley-III)

The Bayley-III10 is a widely used developmental assessment tool that captures cognitive, language, and motor domains. The language scale contains receptive and expressive language subtests, while the motor scale contains gross and fine motor subtests. In this investigation, we used five scaled scores (i.e., cognitive, receptive language, expressive language, gross motor, fine motor) that are derived from raw scores and have a mean of 10 and a standard deviation (SD) of 3. The Bayley-III has high reliability in premature infants11 and has been used in prior studies with similar samples.12

Child Behavior Checklist 1 ½–5 years (CBCL)

The CBCL13 is a widely used parent-report measure of child behavior problems in which parents rate 99 specific child behaviors as 0 (“Not True”), 1 (“Somewhat or Sometimes True”), or 2 (“Very True or Often True”). Individual items are summarized into 7 symptom subscales: Emotionally Reactive, Anxious/Depressed, Somatic Complaints, Withdrawn, Sleep Problems, Attention Problems, and Aggressive Behaviors. Raw subscale scores were used in the analyses.

CP diagnosis

A standardized neuromotor examination was performed along with completion of the Gross Motor Function Classification System (GMFCS). Child diagnosis of CP was determined based on the GMFCS and/or abnormal neurological exam findings.

Modified Checklist for Autism in Toddlers, Revised, with Follow-Up (MCHAT-R/F)

The MCHAT-R/F14 is a screening instrument for early signs of ASD risk for children between 16 and 30 months. It consists of 20 yes/no questions that ask about the child’s social, communicative, and play behaviors (e.g., “Does your child try to attract your attention to his/her own activity?”) and other behaviors associated with ASD (“Does your child ever seem oversensitive to noise?”). Responses are summed and used to classify children as low risk (total score 0–2; requires no further evaluation), medium risk (total score 3–7; requires administration of MCHAT-Follow-Up interview to clarify responses and reduce likelihood of false-positive screen results), and high risk (total score 8–20; warrants immediate referral for evaluation and intervention). As the MCHAT-R/F was designed for high sensitivity, there is a high false-positive rate.14 Although many children who screen positive for ASD on the MCHAT-R/F will not be formally diagnosed with ASD, they are at heightened risk for other developmental delays. In this study, a positive screen for ASD was defined as an MCHAT-R/F score of 3 or higher after follow-up interview.
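The scoring bands described above can be written as a small function. This is an illustrative sketch only, not the study's scoring code; the function names `mchat_risk_band` and `is_positive_screen` are hypothetical, and the positive-screen rule simply restates the study definition (a score of 3 or higher after the follow-up interview).

```python
def mchat_risk_band(total_score: int) -> str:
    """Map an MCHAT-R/F total score (0-20) to the risk band described in the text."""
    if not 0 <= total_score <= 20:
        raise ValueError("MCHAT-R/F total score must be between 0 and 20")
    if total_score <= 2:
        return "low"     # no further evaluation required
    if total_score <= 7:
        return "medium"  # administer the Follow-Up interview
    return "high"        # immediate referral for evaluation and intervention


def is_positive_screen(score_after_follow_up: int) -> bool:
    """Study definition: a score of 3 or higher after the follow-up interview."""
    return score_after_follow_up >= 3
```

For example, a total score of 5 falls in the medium band and would count as a positive screen only if the score remained at 3 or higher after the follow-up interview.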

Statistical analyses

LPA classifies individuals into mutually exclusive groups based on patterns of responses to observed indicators. These groups are latent because they are not directly observed. LPA uses maximum likelihood estimation, a probability-based method for determining the parameters of a model such that they maximize the likelihood of the model producing the data that are observed. The best number of latent profiles can be determined from model fit statistics as well as the sizes and interpretability of the groups. We used LPA to classify infants into mutually exclusive groups based on 14 outcome variables: five Bayley-III scaled scores, seven CBCL syndrome scores, CP diagnosis, and ASD positive screen (M-CHAT ≥ 3). We used Bayley scaled scores and CBCL syndrome scores instead of the more global summary scores for these measures (i.e., Bayley cognitive, motor, and language composite scores; CBCL internalizing and externalizing composite scores) because we were interested in studying more fine-grained characteristics of infants. Additionally, LPA models with more indicators generally perform better (e.g., better convergence, less parameter bias) compared to models with fewer indicators.15
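As a sketch of the model behind this description, the likelihood that LPA maximizes can be written in generic mixture-model notation. The symbols below do not appear in the original and are introduced only for illustration: N children, K profiles, J = 14 indicators, profile proportions \pi_k, and indicator-specific densities f_j. We assume here that indicators are conditionally independent within a profile, with a normal density for continuous scores and a Bernoulli probability for the binary CP and ASD indicators; the exact parameterization used by the software may differ.

```latex
L(\theta) \;=\; \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} f_j\!\left(y_{ij} \mid \theta_{jk}\right),
\qquad \sum_{k=1}^{K} \pi_k = 1,

P(c_i = k \mid y_i) \;=\;
\frac{\pi_k \prod_{j=1}^{J} f_j\!\left(y_{ij} \mid \theta_{jk}\right)}
     {\sum_{m=1}^{K} \pi_m \prod_{j=1}^{J} f_j\!\left(y_{ij} \mid \theta_{jm}\right)}
```

The second expression is the posterior probability that child i belongs to profile k; each child is typically assigned to the profile with the highest posterior probability.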

LPA models with different numbers of latent profiles were fitted. To determine the best-fitting model, we applied the following criteria. First, the majority of solutions had to meet statistical convergence criteria. Second, we evaluated which model had the lowest Bayesian Information Criterion (BIC) adjusted for sample size. The BIC is a numerical index of how well a model fits the underlying data; it balances goodness of fit with model parsimony. Third, we evaluated which model had the highest entropy and highest average class probabilities, both of which index the degree of classification accuracy. Fourth, we conducted Lo–Mendell–Rubin (LMR) and Bootstrapped Likelihood Ratio tests (BLRT), which compare the fit of a model with k profiles to a model with k−1 profiles. A significant LMR or BLRT test indicates that a model with k profiles fits significantly better than a model with k−1 profiles. Finally, we ensured that the smallest profile included at least 5% of the sample.
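Two of the fit indices above have simple closed forms. The sketch below uses commonly cited definitions of the sample-size adjusted BIC and of relative entropy; it is an illustration under those assumptions, not the study's code, and the function names are hypothetical.

```python
import math


def sample_size_adjusted_bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Sample-size adjusted BIC: -2*logL + p*ln((n + 2) / 24). Lower is better."""
    return -2.0 * log_likelihood + n_params * math.log((n_obs + 2) / 24.0)


def relative_entropy(posteriors: list[list[float]]) -> float:
    """Relative entropy in [0, 1]; values near 1 indicate clearer classification.

    posteriors[i][k] is the posterior probability that individual i belongs
    to profile k (each row is assumed to sum to 1).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("entropy is undefined for a single profile")
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n * math.log(k))
```

With perfectly separated profiles (every posterior probability 0 or 1) the entropy is 1; when every child is equally likely to belong to every profile it is 0.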

All LPA models were run in Mplus 7.4. Additionally, all LPA models accounted for clustering of children within families and allowed for unequal variances for the outcome variables across different profiles. This specification allowed for the possibility that Bayley or CBCL scores might be more or less variable in certain groups.

Using the best-fitting LPA solution, we described the mean Bayley and CBCL scores, as well as prevalence of CP and positive ASD screens, in each profile. To contextualize Bayley and CBCL scores, we describe group means as they compare to norm-referenced scores (e.g., ≤1 SD or ≤2 SD below the mean for Bayley; T-score ≥65 or ≥70 for CBCL). We also compared group means (Bayley, CBCL) and proportions (CP, ASD screen positives) across the groups using one-way ANOVA and chi-squared tests, respectively. We followed up significant omnibus tests (e.g., F-test from ANOVA) with post hoc comparisons to determine which profiles were statistically different from one another.
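The omnibus comparison of group means can be illustrated with a plain-Python one-way ANOVA F statistic. This is a minimal sketch with a hypothetical function name; it computes only the F statistic and degrees of freedom, and a p-value would additionally require the F distribution (for example, from a statistics library).

```python
def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their group mean
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_between += len(g) * (mean_g - grand_mean) ** 2
        ss_within += sum((x - mean_g) ** 2 for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

In this setting, each inner list would hold one outcome (e.g., a Bayley scaled score) for the children assigned to one latent profile.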

Results

Description of sample at birth and follow-up

Of the 704 children enrolled in the study, 587 (83%) were seen for follow-up at 2 years. Those lost to follow-up were more likely to be male and to have had a serious brain injury at birth (Table 1). Descriptive statistics for 2-year outcome data in the full sample are shown in Table 2. For Bayley-III subscales, mean scores ranged from 7.89 (receptive communication) to 9.41 (fine motor). Between 12% (fine motor) and 28% (receptive communication) of children had scores 1 SD or more below the standardized mean. For CBCL, mean number of endorsed symptoms ranged from 1.60 (withdrawn) to 9.04 (aggressive behaviors). Between 4% (anxious/withdrawn) and 16% (attention problems) of children met criteria for borderline elevated behavior problems (T ≥ 65). Of 553 children assessed, 86 (16%) had a CP diagnosis. Of 585 children completing the MCHAT-R/F, 91 (16%) screened positive for ASD risk. Of 551 children with both sources of data, 36 (6.5%) had both a CP diagnosis and a positive ASD screen, 51 (9.3%) had only a CP diagnosis, 49 (8.9%) had only a positive ASD screen, and 415 (75%) had neither.

Table 1 Demographic and medical characteristics of sample.
Table 2 Means and percentages of outcome variables in full sample.

LPA analysis

We fitted LPA models with 1 to 5 profiles and compared their fit statistics (Table 3). The majority of solutions for the 5-profile model failed to converge; thus, this model was not considered further. The sample-size adjusted BIC decreased as the number of profiles increased, suggesting improved fit with more profiles. Model entropy and average class probabilities were highest for the 4-profile solution. Both LMR and BLRT suggested that the model with four profiles fit significantly better than the model with three profiles. The size of each latent profile was also sufficient (>5%) for the 4-profile model. Thus, the 4-profile model was determined to have the best fit to the data. We next describe and compare the four profiles in terms of their mean scores on the Bayley and CBCL and prevalence of CP diagnosis and ASD-positive screens (Table 4 and Fig. 1). Omnibus testing revealed significant differences between the latent profiles on all outcome variables (all p < 0.0001). Therefore, below we report the results of post hoc pairwise comparisons (e.g., profile 1 vs. 2).

Table 3 Model fit statistics for LPA models.
Table 4 Means and percentages of outcome variables by latent profile.
Fig. 1: Means of Bayley and CBCL scores (left y-axis) and risk for CP and ASD (right y-axis) by latent profile.

Bayley outcomes are subscale scores (M = 10, SD = 3). CBCL outcomes are syndrome scale raw scores. Children in profile 1 (black; 31%) had optimal outcomes, profile 2 (blue; 41%) had typical outcomes, profile 3 (purple; 11%) had low Bayley scores and high CBCL scores, and profile 4 (red; 16%) had the lowest Bayley scores. Color shading indicates scores within normal limits (green), low or borderline elevated scores (yellow), and very low or clinically elevated scores (orange). For CBCL, while raw scores were used to estimate models, interpretive background shading is based on corresponding T-scores. COG Bayley cognitive, EC Bayley expressive communication, RC Bayley receptive communication, FM Bayley fine motor, GM Bayley gross motor, Emot CBCL emotionally reactive, Anx/Dep CBCL anxious/depressed, Somat CBCL somatic complaints, Withdrawn CBCL withdrawn, Sleep CBCL sleep problems, Attention CBCL attention problems, Aggressive CBCL aggressive behaviors, WNL within normal limits.

Profile 1 included 184 (31.3%) children. This group had the highest mean scores on Bayley cognitive, expressive communication, and receptive communication subscales, and the lowest scores (i.e., fewest behavior problems) on all CBCL subscales (all p < 0.002). Rates of CP (8%) and a positive screen for ASD risk (1%) were both low in this group. Profile 2 included the largest proportion of children (N = 243; 41.4%). Children in this group had the second highest scores, after profile 1, for Bayley cognitive, expressive communication, and receptive communication subscales (all p < 0.002). Mean Bayley scores for fine and gross motor subscales were not statistically different from those in profile 1 (all p > 0.05) but were significantly higher than in profiles 3 and 4 (all p < 0.0001). Children in profile 2 had significantly higher mean CBCL symptoms compared to children in profiles 1 and 4 (p < 0.0001), except for somatic complaints and withdrawn symptoms, which were similar (p = 0.68) or lower (p < 0.001) in profile 2 compared to profile 4, respectively. Rates of CP (6%) and a positive screen for ASD risk (5%) were low in profile 2 and were not significantly different from rates in profile 1 (all p > 0.05).

Profile 3 consisted of 65 children (11.1%). Children in this profile had low Bayley scores, with mean scores more than one SD below the population mean for four of the five subscales. Mean CBCL symptoms were highest in this group as compared to all others (all p < 0.0001). Rates of CP (25%) and a positive screen for ASD risk (44%) were significantly higher in profile 3 than in profiles 1 and 2 (all p < 0.0004). Profile 4 consisted of 95 children (16.2%). Children in this profile had the lowest Bayley scores for cognitive, fine motor, and gross motor subscales compared to all other profiles (all p < 0.001), whereas mean scores for expressive and receptive communication subscales were equally low in profiles 3 and 4 (all p > 0.05). Mean scores for all Bayley subscales were more than one SD below the population mean. However, CBCL symptoms were close to or below the sample mean for all subscales. Mean scores for most CBCL subscales (e.g., emotionally reactive, anxious/depressed, sleep problems, attention problems, aggressive behaviors) were lower in profile 4 than in profile 2 (all p < 0.001), though not as low as in profile 1 (all p < 0.003). Finally, compared with profile 3, the rate of a positive screen for ASD risk in profile 4 (51%) was similar (p = 0.21) and the rate of CP (45%) was higher (p = 0.006).

Finally, we examined rates of low Bayley scores (>1 SD and >2 SD below population mean) and high CBCL scores (T ≥ 65 and T ≥ 70) in each profile (Table 5). Similar to our comparison of mean scores, we found lowest rates of low Bayley scores in profiles 1 and 2, and highest rates in profiles 3 and 4. In profile 4, 52–82% of children had Bayley scores >1 SD and 20–42% had Bayley scores >2 SD below the mean. Additionally, we found that rates of high CBCL scores were notably higher in profile 3 compared to all other profiles, with 28–83% scoring in the T ≥ 65 range, and 9–63% scoring in the T ≥ 70 range.
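The cutoffs used in this comparison can be made concrete with a short sketch. It assumes the strict reading of ">1 SD below" and ">2 SD below" the normative mean (scaled scores below 7 and below 4, given M = 10 and SD = 3) and the T-score cutoffs of 65 and 70 named in the text; the function names and band labels are hypothetical.

```python
BAYLEY_MEAN = 10.0  # normative mean of Bayley-III scaled scores
BAYLEY_SD = 3.0     # normative standard deviation


def bayley_band(scaled_score: float) -> str:
    """Classify a Bayley-III scaled score relative to the normative mean."""
    if scaled_score < BAYLEY_MEAN - 2 * BAYLEY_SD:  # below 4
        return ">2 SD below"
    if scaled_score < BAYLEY_MEAN - BAYLEY_SD:      # below 7
        return ">1 SD below"
    return "within 1 SD or above"


def cbcl_band(t_score: float) -> str:
    """Classify a CBCL T-score using the 65 and 70 cutoffs from the text."""
    if t_score >= 70:
        return "T >= 70"
    if t_score >= 65:
        return "T >= 65"
    return "T < 65"
```

Note that a child counted in the ">2 SD below" band would also be counted among those ">1 SD below" when prevalence is tabulated cumulatively, as the percentage ranges in the text suggest.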

Table 5 Prevalence of clinically relevant scores by latent profile.

Because infants born extremely preterm (EPT; <28 weeks gestational age) are at highest risk for poor outcomes, we examined whether there were differences in outcome domains and profile membership for this group (N = 354, 60% of sample). We found that infants born EPT were less likely to be classified in profile 1 (25% vs. 41%), and more likely to be classified in profiles 3 (13% vs. 8%) and 4 (19% vs. 11%), all p < 0.05. Infants born EPT were more likely to have Bayley scores >1 SD below the population mean for all subscales and were more likely to have Bayley fine and gross motor scores >2 SD below the population mean (all p < 0.05). They were also more likely to have elevated (T ≥ 65 range) CBCL scores on the withdrawn (13% vs. 5%) and attention problems subscales (19% vs. 12%), all p < 0.05. Rates of CP were higher in infants born EPT (20% vs. 9%), p < 0.001, although ASD risk was only marginally higher (18% vs. 12%), p = 0.06.

Discussion

We found evidence for four discrete neurodevelopmental profiles indicating distinct combinations of developmental and behavioral outcomes at 24 months adjusted age in a sample of VPT children. Two of the profiles (profiles 1 and 2) included 72.7% of the sample with most having Bayley scores within the normal range (i.e., within 1 SD of the population mean). The other two profiles (profiles 3 and 4) included the remaining 27.3% of the sample with most having Bayley scores outside of the normal range (>1 SD below the population mean). Children in profile 1 were distinguished by having both higher cognitive and language scores and lower behavior problem scores than children in the other three profiles. Children in profile 2 had slightly lower Bayley scores and slightly higher CBCL problem scores compared to profile 1, but most scores were within normal limits. Children in profiles 1 and 2 were less likely to have a CP diagnosis or a positive ASD screen compared to children in profiles 3 and 4.

The two profiles (profiles 3 and 4) with low Bayley scores were remarkable in that children in profile 3 had higher Bayley cognitive and motor scores than children in profile 4 but had higher behavior problem scores than children in any other profile. Interestingly, behavior problem scores were similarly low in children with the lowest Bayley scores (profile 4) and children with Bayley scores in the normal range (profiles 1 and 2). Profile 4 children had the highest rates of CP, and rates of ASD risk were similarly high in profiles 3 and 4 relative to profiles 1 and 2. Therefore, although profiles 3 and 4 showed similar neurobehavioral abnormalities, they differed substantially in their behavior problems.

It is noteworthy that among a reasonably sized cohort of infants born <30 weeks gestational age, 73% fell within normal limits in both neurodevelopmental and behavioral domains. The Extremely Low Gestational Age Newborns (ELGAN) study found that 78% of children had Bayley Mental Development Index (MDI) scores ≥70 at 24 months.16 However, it is difficult to compare these findings to the current study because the ELGAN study used the Bayley-II17 and reported composite rather than subscale scores. The corresponding rate of 27% that we found to score below normal limits is somewhat lower than has been reported in contemporary cohorts of extremely preterm infants18,19 but is higher or on par with studies examining VPT infants evaluated using the Bayley-III.20,21 Our cohort was recruited from nine NICUs from various regions in the United States, as opposed to previous papers that report results from single sites in the United States or from countries outside of the United States. Additionally, our sample was recruited from 2014 to 2016, as opposed to previous studies that recruited participants in the late 1990s or early 2000s. Therefore, differences among studies could reflect differences in cohort characteristics as well as secular changes including improvements in the care and management of VPT infants over time.

Although behavior problems were not as prevalent as neurodevelopmental problems, we did observe marked borderline and elevated behavior problems in 11% of the cohort (profile 3). Interestingly, the most behavioral problems were found in one of the two profiles with low neurodevelopmental scores (profile 3), yet the group of infants with the lowest neurodevelopmental scores (profile 4) did not have elevated behavioral problems. Our observation of few children with clinically significant behavior problem scores is consistent with prior studies using dimensional measures of symptomology in premature infants22 and suggests that the behavioral difficulties in this group may be better characterized as a “low severity, high prevalence” pattern23 that could nonetheless culminate in impaired functioning, especially as children enter formal schooling. A benefit of using the CBCL is that it enables us to investigate which specific behavioral domain(s) are likely to be problematic for VPT children. In this study, we observed the greatest behavioral difficulties on the emotionally reactive, withdrawn, attention problems, and aggressive behaviors subscales. These findings are consistent with other studies that report greater CBCL externalizing problems as compared to internalizing problems12 and especially elevated levels of attention problems in VPT children at later follow-up.12,24

As expected, children in the low neurodevelopmental profiles had the highest rates of CP.25 Interestingly, ASD risk was also concentrated in the two low neurodevelopmental groups (profiles 3 and 4). The latter findings need to be interpreted with caution as the number of positive cases is small and the MCHAT is a risk assessment not a diagnostic tool.14 The M-CHAT has been shown to have high misclassification rates in VPT infants,26 despite ASD being more prevalent in this group.22 It is also possible that the positive screens on the M-CHAT in this sample are identifying sensory and social communication issues that present increasing challenges for VPT infants whether they are associated with a later ASD diagnosis or not.27 Our findings could lead to a more systematic understanding of which preterm infants are most likely to develop ASD as well as other sensory and social communication disorders, namely those with a low neurodevelopmental profile with or without comorbid behavior problems.

Our findings are both similar to and different from those of previous studies that have investigated neurodevelopmental profiles in preterm children using LPA methods. Interestingly, all previous studies have also identified four distinct profiles,7,8,9 regardless of the study’s specific measures or sample characteristics. The two studies that investigated profiles of cognitive functioning following preterm birth describe groups that similarly spanned the full spectrum of performance from typical performance to severe impairment, with the two most impaired groups comprising approximately one quarter of the entire sample.7,9 The single study examining behavioral outcomes in preterm children found a typical group, a group with subclinical elevation in emotional, attentional, and peer problems, a group elevated in all domains except peer problems, and a group with clinically elevated problems across all domains.8 The group with the most behavioral problems was also small, comprising 8% of the preterm sample. These findings differ somewhat from our results because only one of our four profiles had markedly elevated behavior problems (profile 3). These differences could be due to the different developmental stages of children in the two samples, as the previous study examined children between 7 and 8 years of age. The previous study also included both preterm and term-born children in its estimation of latent profiles, whereas the current study included only VPT children. Finally, the current study included cognitive, motor, language, and behavioral variables in the same model, rather than just behavioral variables, which would likely contribute to different LPA solutions.

The current study illustrates the need to move beyond individual variable analysis (e.g., Bayley or CBCL alone) and towards novel approaches for studying different profiles of risk. Child-centered, rather than variable-centered, approaches are one strategy for identifying subgroups of children with similar profiles of risk. Our results add to a growing literature that has identified distinct subgroups of preterm children who are likely to require targeted follow-up and intervention services. Such an approach is particularly relevant given that the American Academy of Pediatrics28,29 promotes universal screening for a wide range of neurodevelopmental and behavioral conditions that impact children’s long-term developmental and achievement potential, as well as the provision of early intervention referral for high-risk children, even in the absence of a specific diagnosis. Rather than focusing on single outcomes that may portend risk for future adaptive functioning, the current approach allows for a more comprehensive assessment and identification of children with multiple, perhaps more subtle deficits across multiple developmental domains that may culminate in day-to-day difficulties for children.23 The differentiation in patterns of deficits reported in the current study (e.g., cognitive, motor, and language deficits with or without co-occurring behavior problems) demonstrates how comprehensive developmental assessments could lead to precision medicine approaches in intervention development. In turn, provision of interventions that target specific co-occurring difficulties might yield greater impact than interventions that target individual domains separately. The LPA approach modeled here is powerful as it provides an efficient and illustrative summary measure of the “whole child”, rather than a piecemeal approach that requires clinicians to independently synthesize information from multiple sources (e.g., medical diagnoses, results of developmental assessments).
Our understanding of the epidemiology of prematurity would also be enhanced by reporting prevalence rates for different profiles of developmental outcome in these children. This added information could have public health benefits, as it would broaden the scope of resources and interventions necessary to better address the needs of VPT infants and their families.

To better understand the utility of the profiles described here, future research should investigate how well they predict long-term adaptive functioning, including mental and physical health outcomes as well as social and academic competence. It would be valuable to determine whether these profiles are more predictive than individual measures, given some evidence that broadband indicators such as Bayley scores predict later outcomes poorly in high-risk samples.30 If these profiles are not more predictive than individual broadband indicators, this could suggest that more nuanced measures tapping discrete facets of developmental and behavioral domains are needed. A more comprehensive assessment that additionally incorporated relevant child biomarkers (e.g., cortisol, epigenetics, heart rate variability) might also increase the predictive value of these profiles. Beyond improvements to the profiles, it will be important to identify the early life factors (e.g., medical complications, sociodemographic factors, adversity/stress) that predict membership in these groups. Finally, it would be worthwhile to investigate whether similar profiles would replicate in other at-risk groups beyond VPT children.

Strengths of this study include our use of a diverse sample of high-risk neonates followed longitudinally with relatively low attrition. However, we were limited in the types of assessments that were available at the 24-month follow-up. For example, only parental report of child behavior problems was available, as opposed to objectively determined behavioral difficulties. Additionally, although we used Bayley subscales rather than composite scores, the Bayley is still considered a broadband measure of developmental status, as opposed to more specific measures of distinct neurocognitive abilities (e.g., attention, memory, processing speed). Similarly, we used symptom subscales from the CBCL rather than overall summary scores such as externalizing, internalizing, and total behavior problem scores. We used Bayley and CBCL subscales because we wanted to provide a more nuanced understanding of the characteristics of these children. We recognize that conducting these analyses with a different set of outcome measures could alter the number and meaning of extracted latent profiles.

In sum, this study discovered four distinct profiles of VPT children at 24-month follow-up who differed in their cognitive, behavioral, motor, and language development, as well as in prevalence of CP and ASD risk. The profiles provide a “whole child” snapshot that enables us to describe child outcome across multiple domains. Child-centered analysis techniques such as LPA could facilitate the development of more targeted intervention strategies and provide caregivers and practitioners with a more comprehensive understanding of child behavior.