Introduction

Retinopathy of prematurity (ROP) is a disorganized growth of developing retinal blood vessels that affects approximately 14,000 premature infants in the United States (U.S.) per year [1]. ROP is characterized by an initial delay in the first phase of retinal vascularization, followed by pathologic vasoproliferation and intravitreal angiogenesis [2]. Of the estimated 1500 infants in the U.S. whose ROP is severe enough to require medical or surgical treatment, one-third become legally blind (defined as visual acuity of 20/200 or worse) as a result of the disease [3,4,5].

Current American Academy of Pediatrics clinical guidelines for ROP recommend screening all infants with a birth weight ≤1500 g or a gestational age ≤30 weeks [6]. Screening should begin at 4 weeks chronological age or 31 weeks postmenstrual age, whichever is later. Follow-up examinations are then repeated to detect ROP that progresses to later stages, and different strategies may be needed for infants born at 22–23 weeks’ gestational age, for whom the guidelines are extrapolated [6]. Examinations are performed by an ophthalmologist with expertise in ROP, and the need for further examinations is based on the retinal findings. While many countries follow these guidelines, regional factors can strongly influence the incidence of ROP; screening criteria in other countries range from <1000 to 2500 g birth weight and from <30 to 37 weeks gestational age [7].
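The two screening rules summarized above (eligibility by birth weight or gestational age, and timing of the first examination) can be expressed as a minimal Python sketch. It assumes gestational age is recorded in completed weeks and that postmenstrual age equals gestational age at birth plus chronological age; the names `Infant`, `meets_screening_criteria`, and `first_exam_chronological_age_wk` are hypothetical, and the sketch is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Infant:
    """Hypothetical record; gestational age in completed weeks at birth."""
    birth_weight_g: float
    gestational_age_wk: float


def meets_screening_criteria(infant: Infant) -> bool:
    # Screen if birth weight <=1500 g OR gestational age <=30 weeks.
    return infant.birth_weight_g <= 1500 or infant.gestational_age_wk <= 30


def first_exam_chronological_age_wk(infant: Infant) -> float:
    # Assumption: postmenstrual age = gestational age at birth + chronological age,
    # so 31 weeks postmenstrual age is reached (31 - GA) weeks after birth.
    weeks_until_31_pma = 31 - infant.gestational_age_wk
    # First exam at 4 weeks chronological age or 31 weeks PMA, whichever is later.
    return max(4.0, weeks_until_31_pma)


if __name__ == "__main__":
    for infant in (Infant(1200, 26), Infant(1000, 29), Infant(1600, 32)):
        if meets_screening_criteria(infant):
            print(infant, "-> first exam at", first_exam_chronological_age_wk(infant), "weeks of age")
        else:
            print(infant, "-> outside screening criteria")
```

For an infant born at 26 weeks, 31 weeks postmenstrual age falls 5 weeks after birth, later than the 4-week mark, so the first examination is at 5 weeks of age; for an infant born at 29 weeks, the 4-week rule governs.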

Retinal examinations are screening tests that can identify infants with ROP. If left untreated, ROP can lead to retinal detachment and blindness [8]; however, because only a small number of infants develop ROP severe enough to require treatment under current guidelines [9], strong predictive models that quantify risk could identify a target population of infants who are far more likely to benefit from screening [8]. Previous models, including Weight, Insulin-like growth factor I, Neonatal, Retinopathy of Prematurity (WINROP; n = 79 infants) [10]; Colorado Retinopathy of Prematurity (CO-ROP; n = 499 infants) [11, 12]; and Children’s Hospital of Philadelphia Retinopathy of Prematurity (CHOP ROP; n = 524 infants) [13], used relatively small cohorts to develop algorithms for ROP screening and were later validated in larger cohorts. However, developing a model in a larger at-risk population could lead to more informed decisions about exposing infants to procedures that may be costly or cause discomfort [14,15,16].

We used demographic, clinical intervention, and outcomes data obtained from the medical records of a large multicenter cohort of hospitalized preterm infants who met current screening guidelines for ROP to identify factors associated with treatment for ROP. We also assessed whether this information could be used to identify subgroups of infants with a quantifiably low risk of requiring ROP treatment.

Methods

Study design and setting

We identified infants discharged from 2006 to 2015 from neonatal intensive care units (NICUs) across the U.S. managed by the Pediatrix Medical Group who had a birth weight ≤1500 g or gestational age ≤30 weeks and had undergone an examination for ROP in accordance with current screening guidelines [6]. Infants were excluded if they did not survive to discharge, were transferred prior to discharge, or had major congenital anomalies.

Data source

We obtained the data from the Pediatrix Clinical Data Warehouse, which is an electronic health record capturing information that is prospectively collected with computer-assisted tools from physicians’ daily notes, procedure notes, laboratory results, and admission and discharge summaries. Data were collected daily from an infant’s admission until death or discharge. These data included maternal history and demographics, medications, culture results, laboratory results, diagnoses, and other aspects of clinical care.

Definitions

We defined treated ROP as receipt of laser therapy, cryotherapy, surgical therapy (such as vitrectomy or scleral buckle), or bevacizumab. We recorded weight from day 0 through day 28 of life and normalized these values by assigning Z-scores using standard growth curves [17]; a Z-score of 0 represents average weight for postmenstrual age. We calculated the change in weight Z-score by subtracting the Z-score on day 0 from the Z-score on day 28 and divided these values into four categories (<−2, between −2 and −1, between −1 and 0, and ≥0). A negative change in weight Z-score indicates that the infant’s weight percentile decreased between day 0 and day 28. We defined necrotizing enterocolitis (NEC) as any new episode of NEC of modified Bell’s stage IIA or greater [18]. We defined bacteremia as any positive blood culture obtained before postnatal day 28 that grew an organism not typically considered a contaminant. Intraventricular hemorrhage (IVH) was defined as grade III or IV IVH before postnatal day 28. The number of infants screened per treated infant was calculated as the inverse of the incidence of treated ROP.
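The growth and incidence definitions above reduce to a few lines of arithmetic. This minimal Python sketch takes day-0 and day-28 weight Z-scores as inputs (computing them from the reference growth curves [17] is not shown) and assumes half-open category boundaries, since the text does not say into which category a value of exactly −2 or −1 falls. The function names are hypothetical.

```python
def weight_z_change(z_day0: float, z_day28: float) -> float:
    """Change in weight Z-score: day-28 Z-score minus day-0 Z-score."""
    return z_day28 - z_day0


def z_change_category(delta: float) -> str:
    """Assign the change to one of four categories.

    Assumption: boundaries are half-open, i.e. [-2, -1) and [-1, 0).
    """
    if delta < -2:
        return "<-2"
    if delta < -1:
        return "-2 to -1"
    if delta < 0:
        return "-1 to 0"
    return ">=0"


def number_screened_per_treated(n_treated: int, n_screened: int) -> float:
    """Inverse of the incidence of treated ROP among screened infants."""
    incidence = n_treated / n_screened
    return 1 / incidence


if __name__ == "__main__":
    # An infant falling from Z = 0.2 at birth to Z = -1.1 on day 28 drops 1.3 SD.
    delta = weight_z_change(0.2, -1.1)
    print(round(delta, 2), z_change_category(delta))
    # Overall cohort counts reported in the Results: 2306 treated of 75,821.
    print(round(number_screened_per_treated(2306, 75821), 1))
```

With the Results counts (2306 treated of 75,821), roughly 33 infants are screened per treated infant. Read in reverse, the range of 152 to 1064 reported in the Discussion corresponds to an incidence of treated ROP of about 0.66% to 0.094% within those subgroups.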

Statistical methods

We used frequencies (with percentages) and medians (with 5th and 95th percentiles) to describe categorical and continuous study variables, respectively. The primary outcome was treated ROP. We compared the distribution of characteristics between infants with and without treated ROP using the Chi-squared test. We chose risk factors present before postnatal day 28, the age at which many infants are evaluated for the need for ROP screening. Risk factors evaluated on postnatal day 28 included history of bacteremia, NEC, and IVH before postnatal day 28; respiratory support received on postnatal day 28; and level of supplemental oxygen received on postnatal day 28. Other risk factors included birth weight, sex, race, small for gestational age (SGA) status, exposure to antenatal steroids, and change in weight Z-score. We did not include gestational age in the model because of its collinearity with birth weight. We performed univariable logistic regression analyses to determine which risk factors were most strongly associated with treated ROP. For each risk factor, we calculated odds ratios with 95% confidence intervals (CIs), sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios. We generated receiver operating characteristic (ROC) curves to calculate the area under the curve.
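For a single binary risk factor, each measure listed above can be computed from a 2×2 table of risk factor status against treated ROP. The minimal Python sketch below does this; the cell counts in the example are invented for illustration and are not taken from the study. For one binary predictor, the odds ratio from univariable logistic regression equals the cross-product ratio of the table, and its Wald 95% CI on the log-odds scale matches the Woolf interval computed here.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one binary risk factor versus treated ROP."""
    tp: int  # risk factor present, treated ROP
    fp: int  # risk factor present, no treated ROP
    fn: int  # risk factor absent, treated ROP
    tn: int  # risk factor absent, no treated ROP


def diagnostic_metrics(t: TwoByTwo) -> dict:
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
    }


def odds_ratio_ci(t: TwoByTwo, z: float = 1.96) -> tuple:
    """Odds ratio with Woolf (log-scale Wald) confidence interval; all cells must be > 0."""
    log_or = math.log((t.tp * t.tn) / (t.fp * t.fn))
    se = math.sqrt(1 / t.tp + 1 / t.fp + 1 / t.fn + 1 / t.tn)
    return (math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se))


if __name__ == "__main__":
    table = TwoByTwo(tp=60, fp=540, fn=40, tn=1360)  # hypothetical counts
    for name, value in diagnostic_metrics(table).items():
        print(f"{name}: {value:.3f}")
    odds_ratio, lower, upper = odds_ratio_ci(table)
    print(f"OR {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```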

We performed stepwise logistic regression, entering covariates in order of significance and applying backward elimination with a threshold p-value of 0.2 for inclusion in the final multivariable model. To evaluate model performance, we applied the model to a cohort of infants discharged from Pediatrix Medical Group NICUs in 2016 who met the same inclusion criteria. Infants with a predicted probability of treated ROP greater than 1% were classified as requiring screening.
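The backward-elimination step described above can be sketched in Python as a loop that refits the model and drops the least significant covariate until every remaining p-value falls below 0.2. The sketch is generic and rests on stated assumptions: `fit_pvalues` is a hypothetical caller-supplied function that fits a logistic model on the given covariates and returns one p-value per covariate (for example, from the `pvalues` attribute of a fitted statsmodels `Logit` result). It treats each covariate as a single term, whereas multi-level factors such as race are usually tested jointly, it omits the forward, order-of-significance entry step, and it is not a reproduction of the authors' Stata analysis.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: given covariate names, fit the model and return a p-value per covariate.
PValueFn = Callable[[Sequence[str]], Dict[str, float]]


def backward_eliminate(covariates: Sequence[str], fit_pvalues: PValueFn,
                       threshold: float = 0.2) -> List[str]:
    """Repeatedly drop the covariate with the largest p-value while that p-value >= threshold."""
    kept = list(covariates)
    while kept:
        pvalues = fit_pvalues(kept)
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] < threshold:
            break  # every remaining covariate meets the retention threshold
        kept.remove(worst)  # refit without the least significant covariate
    return kept


if __name__ == "__main__":
    # Invented, fixed p-values standing in for real model refits.
    fake = {"birth_weight": 0.001, "sex": 0.04, "race": 0.35, "nec": 0.15, "noise": 0.80}
    print(backward_eliminate(list(fake), lambda cols: {c: fake[c] for c in cols}))
```

In practice the p-values change after each refit, which is why the loop calls `fit_pvalues` again rather than ranking covariates once.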

Statistical analyses were performed using Stata 15 (StataCorp, College Station, TX). All statistical tests were two-sided with a significance level of 0.05. Permission to conduct this study was provided by the Duke University Institutional Review Board (Durham, NC).

Results

Among 75,821 infants from 281 NICUs meeting the inclusion criteria (Fig. 1), 3% (2306/75,821) received treatment for ROP (Table 1). The median birth weight was 695 g (5th, 95th percentile: 475, 1080) for infants with treated ROP and 1140 g (640, 1565) for infants without treated ROP. The median gestational age was 25 weeks (23, 28) for infants with treated ROP and 29 weeks (24, 32) for infants without treated ROP. The median weight change from postnatal day 0 to day 28 was 248 g (60, 519) in infants with treated ROP and 390 g (139, 687) in infants without treated ROP. In unadjusted analyses, all risk factors differed significantly between infants with and without treated ROP.

Fig. 1 Study flow diagram. ROP, retinopathy of prematurity

Table 1 Demographics of infants screened for retinopathy of prematurity from 2006–15

Infants meeting several combinations of risk factors were at low risk of treatment for ROP (Table 2). The association between more severe clinical status on postnatal day 28 and the incidence of treated ROP was stronger at lower birth weights (Table 2). Risk factors in the final multivariable regression model included birth weight, any ventilator support on postnatal day 28, SGA status, sex, IVH, bacteremia, fraction of inspired oxygen (FiO2) on postnatal day 28, antenatal steroids, change in weight Z-score of ≥0, race, and NEC (Table 3). The area under the curve for the final multivariable regression model was 0.90 (95% CI: 0.89–0.90).
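The area under the ROC curve reported above equals the probability that a randomly chosen infant with treated ROP receives a higher predicted risk than a randomly chosen infant without it, with ties counted as one half. The minimal Python sketch below computes the AUC directly from that definition; the scores are invented for illustration, and because the pairwise loop is quadratic in cohort size, a rank-based method or a library routine such as scikit-learn's `roc_auc_score` would be more practical for a cohort of this size.

```python
from typing import Sequence


def auc_concordance(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (treated, untreated) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one infant with and one without the outcome")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical predicted probabilities of treated ROP and observed outcomes (1 = treated).
    predicted = [0.40, 0.22, 0.05, 0.03, 0.01, 0.004]
    treated = [1, 0, 1, 0, 0, 0]
    print(round(auc_concordance(predicted, treated), 3))
```

Comparing this value between a birth-weight-only model and the full model is one way to quantify how much the additional risk factors contribute.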

Table 2 Incidence of treated ROP among infants ≤1500 g birth weight or ≤30 weeks gestational age grouped by birth weight and common clinical factors
Table 3 Multivariable logistic regression model to predict treated retinopathy of prematurity

Applying our final multivariable regression model to a cohort of 6127 infants discharged in 2016 with non-missing data for model covariates and outcome, we found that the model identified nearly all infants who received treatment for ROP. Using a probability cut-point of >1%, the sensitivity of the model for predicting treated ROP was 97.9% (95% CI 92.5–99.7%) and the specificity was 63.3% (62.0–64.5%). The positive predictive value was 4.0% (3.2–4.9%) and the negative predictive value was 99.9% (99.8–100.0%). Only 2 of 94 infants with treated ROP fell below the 1% cut-point and would not have been screened under the model; both were born at 28 weeks’ gestational age, had birth weights of 925 g and 1180 g, and were not receiving mechanical ventilation on postnatal day 28. Neither infant had IVH, NEC, or bacteremia.

Discussion

In our cohort of more than 75,000 preterm infants, several factors were associated with treated ROP, including low birth weight, ventilator support on postnatal day 28, SGA status, male sex, history of IVH, history of bacteremia, increased FiO2 on postnatal day 28, lack of antenatal steroid exposure, white race, an increase in weight Z-score between postnatal days 0 and 28, and history of NEC. The associations between these factors and treated ROP are consistent with previous reports in this population. We also identified several groups of infants, including those with higher birth weights and a favorable clinical status on postnatal day 28, who were at low risk (<1%) of receiving ROP treatment.

Depending on the combination of risk factors used as a cutoff, the number of infants screened per treated infant in our cohort ranged from 152 to 1064. Because the consequences of missed ROP can be devastating, it is generally accepted that a large number of infants must be screened to identify each case; the same principle underlies other widely accepted screening tests. For example, the Newborn Metabolic Screening test is performed on more than 95% of newborns in the U.S. annually [19], and it accepts a high number of infants screened per case detected (1333) because of the severe consequences of the target diseases [19].

The benefits of detecting as many cases of ROP as possible must be weighed against the risks of over-screening, which include infant discomfort, excessive cost, and limited availability of clinicians trained to perform the standard screening examinations [20]. Other recent publications have advocated a risk factor-based approach to ROP screening for infants who do not meet traditional criteria, particularly in less developed countries [21, 22]. A recent study of 1380 infants found that limiting screening to infants <1250 g or <30 weeks, plus infants 30–32 weeks or 1250–1500 g with one or more risk factors, resulted in 29% fewer examinations with no cases missed [23]. Another study reviewed 259 cases of ROP in infants >1250 g and found that the infants who developed stage 3 ROP had at least two other risk factors [24]. These smaller studies support applying more selective, risk-based criteria for ROP screening.

Several studies have evaluated predictive factors for the development of ROP. Poor postnatal weight gain [25] and low serum insulin-like growth factor 1 levels were found to be associated with ROP, and the WINROP screening algorithm was subsequently developed and validated for its ability to predict treatment of ROP [10, 26]. Another recent study evaluated the CO-ROP screening test using data from 6351 premature infants. The CO-ROP test uses birth weight, gestational age, and weight gain in the first month of life as its risk factors for ROP; it had a sensitivity of 96.9% (95% CI, 95.4–97.9%) and a specificity of 40.9% (95% CI, 39.3–42.5%) [11]. The CHOP ROP model uses the same risk factors as the CO-ROP test; in a validation study of 7483 infants, it had a sensitivity of 98.5% (95% CI, 96.9–99.3%) for detecting type 1 ROP [27]. In contrast, a prospective cohort study of 487 very low birth weight infants in a Turkish NICU found that weight gain in the first 4 weeks of life was not a significant predictor of ROP [28]; the authors concluded that weight gain is a surrogate marker for other clinical comorbidities rather than an independent risk factor for ROP. Finally, a recent Swiss study of 6719 very low birth weight infants found that the authors’ model, which did not include weight gain, reduced the number of infants requiring screening by 13%, with a sensitivity of 95% and specificity of 88% for treated ROP [29]. The prevalence of treated ROP in that cohort (1%) was lower than in ours (3%).

Because absolute growth in grams per day depends on birth weight, we examined growth as the change in weight Z-score. Compared with a change in Z-score of <−2, changes between −2 and −1 and between −1 and 0 did not predict treated ROP, whereas a change of ≥0 was associated with treated ROP. We speculate that infants whose weight Z-score increased, reflecting faster than expected growth, may have been more clinically ill, gaining weight through fluid retention and requiring more respiratory support. Importantly, we did not find that a decrease in Z-score was predictive of treated ROP. Small for gestational age status appeared protective in our model. Although this finding seems counterintuitive, it likely reflects the screening criteria: because current guidelines recommend screening infants ≤1500 g birth weight or ≤30 weeks gestational age, our data include some infants at the upper end of the weight range whose gestational age is relatively advanced. The median gestational age of SGA infants was higher than that of non-SGA infants, and because birth weight was already in the model, SGA status effectively marked infants who were more mature for their birth weight; being SGA therefore appeared relatively protective.

Based on the findings of our study and others, limiting routine screening to infants <1250 g, with selective screening of infants up to 1500 g who have specific risk factors, would substantially decrease the number of ROP examinations compared with the current American Academy of Pediatrics guidelines. When we tested our model on infants discharged in 2016, only 2/94 (2.1%) infants with treated ROP were not identified; however, both of these infants had a birth weight <1250 g and would have been identified for routine screening with a birth weight cutoff of <1250 g. Of the 6127 infants screened in 2016 under the current guidelines, 3819 (62%) would have been spared screening based on our model and our chosen threshold probability of 1%. Although the number of infants spared depends on the threshold chosen, reduced screening can lower the economic burden of ROP examinations, as well as the pain and discomfort of the examination for infants. Nevertheless, changes in medical practice should be undertaken only after all stakeholders (including families and patient advocates) have been involved. After such changes, outcomes should be prospectively monitored at both the institutional and multicenter levels. NICUs must balance the benefits of more flexible screening guidelines against the risk of more undetected cases of ROP, which can lead to blindness.
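The 2016 validation figures can be cross-checked by rebuilding the 2×2 table from counts given in the text: 6127 infants, 94 with treated ROP, 2 treated infants who did not exceed the 1% cut-point, and 3819 infants who would have been spared screening. This minimal Python sketch assumes that the 3819 spared infants include the 2 missed cases; under that assumption, the derived sensitivity, specificity, and predictive values match those reported in the Results. The function names are hypothetical.

```python
def validation_table(n_total: int, n_treated: int, n_missed: int, n_spared: int) -> dict:
    """Rebuild the 2x2 table for a probability cut-point from summary counts."""
    fn = n_missed                    # treated ROP, predicted risk not above the cut-point
    tn = n_spared - fn               # no treated ROP, not above the cut-point (spared screening)
    tp = n_treated - fn              # treated ROP, flagged for screening
    fp = (n_total - n_spared) - tp   # no treated ROP, flagged for screening
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def percent(numerator: int, denominator: int) -> float:
    return round(100 * numerator / denominator, 1)


# Counts reported for the 2016 validation cohort.
cells = validation_table(n_total=6127, n_treated=94, n_missed=2, n_spared=3819)
summary = {
    "sensitivity": percent(cells["tp"], cells["tp"] + cells["fn"]),
    "specificity": percent(cells["tn"], cells["tn"] + cells["fp"]),
    "ppv": percent(cells["tp"], cells["tp"] + cells["fp"]),
    "npv": percent(cells["tn"], cells["tn"] + cells["fn"]),
    "missed_of_treated": percent(cells["fn"], cells["tp"] + cells["fn"]),
    "spared_of_screened": percent(3819, 6127),
}

if __name__ == "__main__":
    print(cells)
    print(summary)
```

The rebuilt table has 92 true positives, 2216 false positives, 2 false negatives, and 3817 true negatives; the 2 missed infants are 2.1% of treated cases, and the 3819 spared infants are about 62% of the 2016 cohort.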

Our study has several strengths, including the large sample size and the use of a database that includes infants from centers in >30 U.S. states; the Pediatrix Clinical Data Warehouse includes approximately 25% of all infants admitted to neonatal intensive care units in the U.S. Several predictive models gather risk factors, such as weight gain, longitudinally. Rather than relying on the entire clinical course for risk stratification, using a snapshot of the infant on postnatal day 28 decreases the need for labor-intensive longitudinal monitoring to estimate risk. Given the large number of infants in our analysis and the supportive findings of other published studies, we believe that our results are representative of infants screened for ROP across the U.S. However, our study was limited by potential inter-center variability in documentation and in clinical care (e.g., oxygen limits and thresholds for ROP treatment), because Pediatrix NICUs do not all operate under identical clinical protocols, and particular characteristics of Pediatrix NICUs could limit generalizability to other settings. Infants who were transferred from neonatal intensive care units to other hospitals were not included in this study because of the potential for missing data and an inaccurate final diagnosis of ROP. This group included both acute and convalescent transfers; therefore, the impact of these missing data is unknown. We also excluded infants with major congenital anomalies, since they may have additional diagnosis-specific risk factors that place them at increased risk for ROP. Finally, changes in neonatal intensive care over the study period and in the future may affect the validity of our model.

This model will require further external validation, particularly in NICUs not affiliated with Pediatrix. Our model is more complex than current screening guidelines and other published models; however, other risk calculators, such as the Kaiser sepsis risk calculator, have been implemented in well-baby and intensive care nurseries across the U.S. [30, 31]. Incorporating such calculators into the medical record can ease adoption of new practices. Alternatively, although many factors were associated with treated ROP, most of the model variability (~87%) was explained by birth weight, with each additional factor yielding a progressively smaller increase in area under the curve. This finding suggests that a more parsimonious version of our model may have acceptable predictive accuracy.

In summary, our results suggest that infants >1250 g without other risk factors may be at very low risk of developing ROP that requires treatment. Under current screening guidelines, a substantial number of infants in this group must undergo screening examinations that are likely to be negative in order to identify the few with ROP. A risk factor-based approach would substantially decrease the number of retinal examinations while missing few cases of treated ROP. Further studies are needed to determine whether the small risk of missed ROP, with its potential for blindness, is acceptable when the model is applied to other populations.