Introduction

Retinopathy of prematurity (ROP) is a severe complication of preterm birth and may lead to severe visual impairment or even blindness. It is a two-stage developmental vascular proliferative disorder resulting from a mismatch between oxygen demand and oxygen supply within the retina.1,2 By now, a large variety of risk factors for the development of ROP have been described, but it is not clear which risk factors are truly independent and which are merely associations.3

In developed countries, the incidence of ROP in preterm infants below 28 weeks of gestational age (GA) ranges widely between 25 and 91%, with treatment rates varying between 2 and 30%.4 The criteria for ROP screening differ between countries and, in developed countries, usually include patients below 31–32 weeks of GA and a birth weight (BW) below 1250–1500 g. All recommendations extend ROP screening to patients above the given limits for GA and BW if additional risk factors are present. These factors are not clearly defined, but include prolonged oxygen therapy (Argentina5), respiratory problems or sepsis (Brazil6), “infants believed to be at high risk for ROP” (Canada7), or “unstable clinical course with cardiorespiratory support and are believed to be at high risk of ROP” (USA8).

However, these criteria seem to miss patients potentially requiring treatment in developing countries.9,10 Screening inclusion criteria for ROP need to reach a high sensitivity to avoid missing patients who could suffer severe consequences such as blindness. When prevalence is low, high sensitivity leads to low specificity, with many patients screened who ultimately do not require treatment. The lower the prevalence of infants requiring ROP treatment, the more crucial it is to optimize screening criteria in order to reduce workload in the neonatal intensive care units, financial expenses, and, most importantly, avoidable stress for preterm infants. This is an issue in particular in Switzerland, with its very low prevalence of ROP treatment of 4.3% in preterm infants below 28 weeks compared to other countries like Canada (10.4%), the United Kingdom (8.5%), or Australia and New Zealand (7.3%).4,11

The aim of this study was to assess if the screening criteria for ROP can be optimized in very low birth weight (VLBW, birth weight below 1500 g) infants in Switzerland. All VLBW infants live-born between 2006 and 2015 in Switzerland were included in this study.

Patients and methods

This study is based on a retrospective analysis of patients who were born between 2006 and 2015 and were registered at the national registry of very preterm infants in Switzerland (SwissNeoNet, SNN) of the Swiss Society of Neonatology. The network prospectively collects perinatal and follow-up data of live-born infants with a GA between 22 0/7 weeks and <32 0/7 weeks or a birth weight of <1501 g. All 9 Swiss perinatal centers, 5 step-down units, and 16 neuro-/developmental pediatric units participated. Routine comparison with the Swiss Federal Statistical Office reveals 96% population coverage. This study includes patients with a GA below 32 0/7 weeks and focuses on data from the perinatal dataset plus information on any ROP treatment after primary discharge home, taken from the follow-up assessment at 18 to 24 months. Infants with a GA of 32 0/7 weeks or more and a birth weight below 1501 g were excluded because a cohort defined by both GA and birth weight cutoffs cannot include GA as the main predictor. None of these infants were treated for ROP. We excluded all infants who died before reaching 5 weeks of age, as they would never have received ROP screening.

Screening criteria for ROP differed slightly between the individual centers. ROP screening criteria included a birth weight of <1500 g in eight of nine Swiss hospitals with tertiary neonatal care. The remaining center screened only patients with a birth weight below 1250 g. In seven hospitals preterm infants with a GA of <32 0/7 weeks were screened, while in the remaining two centers the cutoff was at 30 0/7 weeks. One center started screening at a postnatal age of more than 7 weeks; all other centers started ROP screening between 4 and 5 weeks. Administration of supplemental oxygen for ≥3 days or unstable clinical course was defined as additional screening criterion in most hospitals. The final model contained the following five risk factors: GA, birth weight z-score, continuous positive airway pressure (CPAP) >3 days, multiple birth, and surfactant. The parameters rejected were: male sex, cesarean section, any antenatal steroids, delivery room intubation and mean weight gain per week hospital stay, any mechanical ventilation, and supplemental oxygen >3 days. The date of first ROP screening differed between 6 and 8 weeks postnatally. ROP was assessed by indirect ophthalmoscopy by the attending ophthalmologists. In Switzerland, all ROP treatments are performed by one of five ophthalmological clinics.

Definitions

Stages of ROP were categorized according to the international classification of ROP.12 Severe ROP was defined as ROP stages 3–5. Data on retinal zones and development of plus disease were not available from the database. ROP was stratified as treated ROP when any form of therapy (laser coagulation or intravitreal anti-vascular endothelial growth factor (VEGF)) was performed according to current recommendations.

Patients were classified as small for gestational age (SGA) when the birth weight was below the 10th percentile based on the growth curves by Voigt et al.13 Length of support with supplemental oxygen, CPAP, and mechanical ventilation was measured as the number of days during hospitalization with more than 12 h of the respective form of respiratory support. We defined bronchopulmonary dysplasia as an oxygen requirement at 36 weeks postmenstrual age according to the NICHD consensus conference paper,14 necrotizing enterocolitis (NEC) as clinical signs (abdominal distension, bilious aspirates, and/or bloody stools) confirmed by radiographically visible intramural gas or at laparotomy (Bell stages 2 and 3),15 and antenatal steroid use as any administration prior to birth, regardless of the time interval. Growth per week was calculated as weight at discharge or death minus weight at birth, divided by the number of weeks from birth until first discharge home or until death. Sepsis was defined as clear clinical evidence of infection plus at least one positive blood culture (including coagulase-negative staphylococci and fungal pathogens). Severe intraventricular hemorrhage (sIVH) was based on the most severe ultrasound result during hospital stay reaching grade 3 to 4 of the classification defined by Papile et al.16
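The growth-per-week definition above can be written as a short calculation. The following is a minimal sketch; the function name, argument names, and the example infant are illustrative assumptions and are not taken from the SNN database.

```python
def weekly_weight_gain(birth_weight_g, final_weight_g, days_in_hospital):
    """Mean weight gain in g/week from birth until first discharge home or death.

    final_weight_g is the weight at discharge or death, as in the definition above.
    """
    if days_in_hospital <= 0:
        raise ValueError("days_in_hospital must be positive")
    weeks = days_in_hospital / 7.0
    return (final_weight_g - birth_weight_g) / weeks


# Hypothetical infant (not study data): 900 g at birth, 2300 g at discharge after 70 days.
example_gain = weekly_weight_gain(900, 2300, 70)  # 1400 g gained over 10 weeks
```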

Ethics

Data collection and evaluation for this study were approved by the Swiss Federal Commission for Privacy Protection in Medical Research and the Swiss ethical review boards (KEK-ZH-Nr. 2014-0551 and KEK-ZH-Nr. 2014-0552). The patients’ representatives were informed about the use of data for research.

Statistical methods

Based on risk factors of patients born between 2006 and 2012 (training set), a logistic regression model for the outcome ROP intervention was built using a stepwise elimination process while ensuring that variable elimination did not significantly change the model. Elimination of a variable was accepted only if the model with the variable was not significantly different from the model without the variable according to the Wald test (p < 0.10).17 The final model contained the following seven risk factors: GA, days of supplemental oxygen, days of CPAP, days of mechanical ventilation, birth weight z-score, surfactant, and multiple birth. The parameters rejected were: male sex, cesarean section, any antenatal steroids, delivery room intubation, and mean weight gain per week of hospital stay. To test the validity of this model for predicting preterm infants requiring ROP intervention, it was applied to patients born between 2013 and 2015 (validation set). Sensitivity and specificity were analyzed as well as receiver operating characteristic (ROC) curves and predictive c-statistics. Model building and validation were based on fivefold imputed data using multivariate imputation by chained equations based on random forest.
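For readers less familiar with the validation metrics named above, the following minimal sketch shows how a c-statistic (area under the ROC curve) and a sensitivity/specificity pair at a given cutoff can be obtained from predicted probabilities. It is not the authors' SPSS or R code; the function names and the predicted probabilities are made-up assumptions for illustration only.

```python
def c_statistic(scores_treated, scores_untreated):
    """Probability that a treated infant has a higher predicted risk than an
    untreated infant (ties count one half); equals the area under the ROC curve."""
    concordant = 0.0
    for st in scores_treated:
        for su in scores_untreated:
            if st > su:
                concordant += 1.0
            elif st == su:
                concordant += 0.5
    return concordant / (len(scores_treated) * len(scores_untreated))


def sensitivity_specificity(scores_treated, scores_untreated, cutoff):
    """Infants with predicted risk >= cutoff are flagged for screening."""
    true_pos = sum(s >= cutoff for s in scores_treated)
    true_neg = sum(s < cutoff for s in scores_untreated)
    return true_pos / len(scores_treated), true_neg / len(scores_untreated)


# Made-up predicted probabilities, not study data.
treated = [0.30, 0.12, 0.08, 0.02]
untreated = [0.05, 0.02, 0.01, 0.01, 0.004]

auc = c_statistic(treated, untreated)
sens, spec = sensitivity_specificity(treated, untreated, cutoff=0.05)
```

Lowering the cutoff flags more infants for screening, which raises sensitivity at the cost of specificity; the c-statistic summarizes this trade-off over all cutoffs.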

Three of the included parameters are not available at birth: duration of supplemental oxygen, CPAP, and mechanical ventilation. We included a sensitivity analysis using these parameters as categorical data, that is, whether or not patients had a longer-than-median duration of respiratory support (supplemental oxygen >3 days, CPAP >3 days, mechanical ventilation >0 days).
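The dichotomization used in the sensitivity analysis can be sketched as follows. The thresholds are those stated above; the class, field, and key names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class RespiratorySupport:
    """Durations in days; field names are illustrative, not SNN variable names."""
    oxygen_days: int
    cpap_days: int
    ventilation_days: int


def dichotomize(support):
    """Convert durations to the categorical predictors of the sensitivity analysis."""
    return {
        "oxygen_gt_3_days": support.oxygen_days > 3,
        "cpap_gt_3_days": support.cpap_days > 3,
        "any_ventilation": support.ventilation_days > 0,
    }


# Hypothetical infant: 3 days of oxygen, 10 days of CPAP, never ventilated.
flags = dichotomize(RespiratorySupport(oxygen_days=3, cpap_days=10, ventilation_days=0))
```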

The Cochran–Armitage test for trend was used to test the decrease of mortality over time. Results with a p value of <0.05 were considered significant. Analyses were performed with SPSS Version 21 (IBM Corporation, Armonk, NY, USA) and R 3.4 (r-project for statistical computing; www.r-project.org).

The following prediction model formula was applied:

$$p\left(y = 1 \mid x_1, \ldots, x_p\right) = \frac{\exp\left(b_0 + \sum_{k=1}^{p} b_k x_k\right)}{1 + \exp\left(b_0 + \sum_{k=1}^{p} b_k x_k\right)},$$

where b0 is the intercept, bk is the maximum likelihood estimate of the coefficient for the kth chosen risk factor, and xk is the value of that risk factor for the individual patient.
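As an illustration of how the formula above turns risk factor values into a predicted probability, consider the following minimal sketch. The intercept and coefficients are arbitrary placeholders and are not the estimates of the study model; the function and variable names are likewise assumptions.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Logistic prediction: exp(z) / (1 + exp(z)) with z = b0 + sum(bk * xk).

    Written as 1 / (1 + exp(-z)), which is algebraically identical.
    """
    z = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder coefficients for illustration only (NOT the fitted study model).
b0 = 10.0
b = {"ga_weeks": -0.5, "surfactant": 0.8}

# Hypothetical infant: GA 26 weeks, surfactant given (coded 1).
p_example = predicted_probability(b0, b, {"ga_weeks": 26.0, "surfactant": 1.0})
```

With these placeholder values the linear predictor is 10 − 13 + 0.8 = −2.2, giving a predicted probability of roughly 0.10.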

Results

Record completeness of all infants according to comparison with Swiss Federal Statistical Office was 96%. Of the 7817 patients registered in the SNN database, information on ROP treatment was missing for 1116 patients of whom we retrospectively collected and completed 942 cases, leading to a data completeness of 97.8%. After exclusion of patients who died in the delivery room or during the first 5 weeks of life in hospital, 6719 infants remained for analyses.
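The stated data completeness follows directly from the counts given above: of the 1116 cases with missing ROP treatment information, 942 were completed retrospectively, so that

$$1116 - 942 = 174 \ \text{cases remained incomplete}, \qquad 1 - \frac{174}{7817} \approx 0.978 = 97.8\%.$$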

Characteristics of included patients eligible for ROP screening are shown in Table 1. Analyses showed that the majority of patients requiring ROP therapy were born with a GA below 28 weeks. ROP was treated in only 13, 6, and 3 patients at a GA of ≥27 0/7, ≥28 0/7, and ≥29 0/7 weeks, respectively (Fig. 1). Patient characteristics of these patients are shown in Table 2. The mortality at 5 weeks of age, after which ROP screening usually begins, significantly declined between 2006 and 2015 (p = 0.03) raising the proportion of infants at risk for developing ROP over the years.

Table 1 Patient characteristics stratified by gestational age
Fig. 1 Gestational age and birth weight of the study population. Red dots show patients with treated ROP

Table 2 Parameters of patients with a gestational age above 26 weeks needing ROP treatment

Infants were treated for ROP at a postmenstrual age of 38.17 ± 3.46 weeks (range 32.85–56.42 weeks). Only two patients were treated at a postmenstrual age below 34 0/7 weeks, and in three patients ROP therapy was performed after 44 weeks. The chronological age at ROP treatment was 88 ± 24 days (range 41–215 days), with one patient being treated before 60 days of life. The combination of postmenstrual and chronological age showed that all patients would have been identified prior to ROP treatment if screening had begun at 60 days of life or at a postmenstrual age of 37.42 weeks, whichever was reached first.
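The "whichever is reached first" rule described above can be made explicit with a small calculation. This is a sketch of the retrospective finding, not a screening recommendation; the function name and the example infants are assumptions, and the age reached after birth is computed as GA at birth plus chronological age.

```python
def first_screening_day(ga_weeks_at_birth, max_chronological_days=60.0, max_age_weeks=37.42):
    """Day of life at which screening would start under the retrospective limits:
    60 days of life or an age (GA at birth + chronological age) of 37.42 weeks,
    whichever is reached first."""
    days_until_age_limit = (max_age_weeks - ga_weeks_at_birth) * 7.0
    return max(0.0, min(max_chronological_days, days_until_age_limit))


# Hypothetical infants, not study data.
day_ga24 = first_screening_day(24.0)  # the 60-day limit is reached first
day_ga31 = first_screening_day(31.0)  # the 37.42-week limit is reached first
```

For an infant born at 24 weeks, 37.42 weeks would only be reached after about 94 days, so the 60-day limit applies; for an infant born at 31 weeks, the 37.42-week limit is reached after about 45 days.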

GA-stratified analysis revealed that missing data were randomly distributed for most of the parameters. Antenatal steroids had a higher rate of missing data at 23 weeks GA (17.2%) and 24 weeks (5.4%), and ROP intervention had a higher rate of missing data in infants with a GA of 31 weeks (5.3%). Logistic regression analysis was based on fivefold imputed data to compensate for missing values in antenatal steroids (2.7%), ROP intervention (2.6%), and to a minor degree in other parameters. Without data imputation, all cases with missing information for either the primary outcome (ROP intervention) or one of the predictors would have been eliminated from the model prior to model building. This would have led to a higher rate of exclusion of data at either end of the GA range.

The training set (patients born between 2006 and 2012) had a patient population of 4522 preterm infants <32 weeks GA, including 56 patients with treated ROP. The validation set (patients born between 2013 and 2015) included 2197 patients with 20 ROP therapies. Table 3 shows values which resulted from analyses of our patient characteristics and were included in the final model.

Table 3 Patient characteristics used in the final prediction model

Depending on the cutoff point, the logistic regression model allows predicting the number of false-negative cases, that is, cases in which an ROP intervention was performed during 2013–2015 but which were not detected by the model. The number of infants needed to test, true positives, false negatives, sensitivity, and specificity for each cutoff value are displayed in Table 4.

Table 4 Cutoff values for predicted ROP intervention probability

Results show that all patients needed to be screened to reach a sensitivity of 100%. However, to reach a sensitivity of 95.0% (one patient false negative) and a specificity of 87.6%, the model predicted a reduction in the number of screened patients to 13.2%. This reflects a high predictive c-statistics value of 0.916 (Fig. 2). The undetected patient was born at 30 + 1 weeks with congenital nephroblastoma and therefore probably with a different pathophysiology than the average patient requiring ROP treatment towards which the prediction model was fit.
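The relationship between the figures reported above can be checked with simple arithmetic. The sketch below reconstructs approximate cell counts from the reported validation set size (2197 infants, 20 treated), the screened fraction (13.2%), and the single false negative; the counts are back-calculated assumptions and are not taken from Table 4.

```python
n_validation = 2197       # infants in the validation set (2013-2015)
n_treated = 20            # infants with ROP treatment in the validation set
fraction_screened = 0.132
false_negatives = 1       # the one treated infant not flagged by the model

n_untreated = n_validation - n_treated                  # 2177
n_screened = round(fraction_screened * n_validation)    # about 290 infants
true_positives = n_treated - false_negatives            # 19
false_positives = n_screened - true_positives           # about 271
true_negatives = n_untreated - false_positives          # about 1906

sensitivity = true_positives / n_treated
specificity = true_negatives / n_untreated
```

Under these assumptions the sensitivity is 19/20 = 95.0% and the specificity is about 1906/2177 ≈ 87.6%, in line with the reported values.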

Fig. 2 ROC curve for prediction of the need for ROP treatment (c-statistic): AUC = 0.916

The analysis, using supplemental oxygen, duration of CPAP, and duration of mechanical ventilation as categorical variables, showed almost identical results. The ROC showed an area under the curve of 0.909. Only 16.7% of patients needed to be screened for a sensitivity of 95% and a specificity of 84.0% (details are presented in the Supplemental material).

Discussion

This study investigated ways to improve ROP screening criteria in Switzerland. Based on the low prevalence of ROP therapy during a period of ten years of 1.2% in children born below 32 weeks GA and <0.1% in infants born between 28 and 32 weeks, a logistic regression model was developed using known risk factors for ROP. The prediction model showed that all patients would have to be screened to guarantee a sensitivity of 100%. However, only 13% of patients needed to be screened for a sensitivity of 95.0%, missing one patient requiring ROP treatment.

All screening programs aim for a high sensitivity, since no patients with the specific condition should be missed. This is particularly true for ROP, where an unidentified infant can suffer dramatic consequences such as severe visual impairment or even blindness, underlining the importance of achieving 100% sensitivity. On the other hand, high sensitivity in turn leads to a low specificity. This exposes a high number of patients to minor consequences, which in ROP screening means exposure to pain, stress, increased work load, and higher financial expenses. Therefore, screening criteria should identify patients at risk for ROP and limit screening to these patients as far as safely possible. Furthermore, the latest safe time to begin screening should be identified.

Screening criteria for ROP usually include birth weight and GA. In developed countries most recommendations state that patients with birth weight of <1250–1500 g or a GA of <30–32 weeks should be screened for ROP.18

In developing countries, recommendations frequently include patients with higher GA or higher birth weight.18 Several studies have shown that the incidence of ROP is higher in these countries and that application of criteria as defined by the American Academy of Pediatrics (AAP)8 or the Royal College of Paediatrics and Child Health19 misses patients needing ROP treatment.9,10 This discrepancy confirms the absence of uniform screening criteria. Instead, we suggest that criteria could be improved if based on local risk factors and local incidence of ROP.

Our prediction model including data imputation showed that only 13% of patients needed to be screened for ROP to reach a sensitivity of 95.0%, missing one patient. This raises the question of whether this patient had additional risk factors and should have been screened under the loosely defined criteria mentioned above. These criteria could not be included in the prediction model since databases rely on variables with clear definitions and therefore cannot include parameters such as “believed to be at risk.”

The only patient missed with this prediction model suffered from a nephroblastoma. Nephroblastoma is a well-vascularized tumor which is dependent on vasculogenesis for growth and which frequently expresses high levels of vascular growth factors, such as VEGF.20 It is well known that VEGF plays a major role in the development of ROP in the second phase of relative hypoxia of the retina.2 Thus, it is very likely that the development of ROP in this patient was intensified by the growth factors produced by the tumor and not only by the usual pathophysiologic mechanisms occurring during ROP development in preterm infants. Therefore, this patient had additional specific risk factors that make ROP screening necessary, independent of GA or birth weight.

Altogether, these considerations indicate that there is a large potential to reduce ROP screening in Switzerland. Our model uses data that are only available at patient discharge (duration of supplemental oxygen, CPAP, and mechanical ventilation) and is therefore not fit for direct prospective implementation at screening age. To assess whether a similar model can be achieved with variables that are available before the first screening is performed, we included a sensitivity analysis using modes of respiratory support as categorical variables. The results were almost identical to the first analysis, supporting the large potential to reduce ROP screening. However, a prospective study first needs to be performed based on these screening criteria, augmented by a list of diagnoses for which ROP screening should be performed (independent of GA or birth weight in very preterm infants). This should enable the selection of new screening criteria matching the local situation.

Several studies have assessed prediction models for ROP development. A recent publication summarized the publications and methods which have been evaluated.21 Of these, the WINROP algorithm has been studied most extensively. WINROP was developed in Sweden and is based on postnatal weight gain, either alone22 or in combination with serum levels of insulin-like growth factor-1.23 The algorithm has mostly been tested retrospectively and was used in different populations. While prediction has been shown to be excellent in one Swedish publication,24 application of the algorithm did not reach a sensitivity of 100% in all other populations, with values mostly around 90%.25,26,27,28 Furthermore, these retrospective analyses included relatively small sample sizes of approximately 600 patients. This suggests that screening based solely on WINROP would fail to detect several patients developing severe ROP.

An extensive analysis of all Danish patients treated for ROP in the years 2002–2006 compared screening criteria combining GA at delivery and birth weight limits with new risk-based criteria with regard to their effectiveness.3 Results showed that a reduction of 17.4% of screened patients allowed the detection of all patients treated for ROP in the observation period and might lead to one missed treatment-demanding ROP every 11 years and one case of blindness every 18 years. This implies that a potential for reduced ROP screening exists in Denmark. However, it also mirrors the difficulties of optimizing screening criteria, with possibilities to avoid screening in a large group of preterm infants on the one hand, but an increasing risk of missing patients requiring ROP treatment on the other hand.

Besides the question of who needs to be screened for ROP, it is also a matter of debate when screening should be started. The AAP recommends that the first ROP screening should be performed at a chronological age of 4 weeks, but not before a postmenstrual age of 31 weeks is reached.8 These limits are considerably earlier than the results of our study, where all patients would have been detected prior to therapy with limits of a chronological age of 60 days or a postmenstrual age of 37 weeks. This suggests that the development of ROP differs between populations not only in terms of incidence but also in terms of chronologic progression. Starting ROP screening in Switzerland at a later chronologic and/or postmenstrual age than recommended by the AAP seems possible.

Our study has several strengths and limitations. Of note, different screening criteria were used across the neonatal centers; in particular, some centers did not screen patients between 30 and 32 weeks GA. However, these units had reduced screening because they had documented long periods without ROP in infants above these limits. No additional, previously missed cases of ROP were found at 2 years of age. Our study was based on a whole population in which a low incidence of treated ROP was reported. This resulted in only 76 patients with treated ROP, which made the development and validation of a prediction model and the definition of new screening criteria difficult. Our analyses, including the sensitivity analysis, indicate that the results of this retrospective study are robust. In order to further increase the validity of our results, it would have been desirable to include more precise descriptions of ROP in terms of zone and plus disease, as well as type I and II ROP, in the analyses.29 However, these data were not available from the SNN database.

A strength of this study is the completeness of data. Almost 7000 preterm infants below 32 weeks were evaluated over a period of 10 years. Concerning ROP, we reached data completeness of more than 97% of patients. Missing data were estimated using multivariate imputation by chained equations. Furthermore, and in contrast to many other databases, the SNN database is not formed by a collaboration of certain hospitals, which might lead to a selection bias, but represents a whole population. Data of 96% of all Swiss VLBW infants born during the observation period were collected, making results representative for Switzerland. GA-stratified comparison with the birth registry of the Federal Statistical Office revealed that the missing 4% of the population predominantly concern infants who died at extremely low GA in peripheral low-level neonatology units.

Conclusion

Results of this study show that it may be possible to greatly reduce the number of infants requiring ROP screening, thereby reducing their burden and saving health care costs and resources. A prospective test of the identified model is needed before it can be applied as a general guideline.