Introduction

Retinopathy of prematurity (ROP) affects the immature retina of premature infants and is a leading cause of lifelong visual disability worldwide [1]. Infants at risk for ROP are identified using recommended birth weight (BW) and gestational age at birth (GA) criteria. These BW and GA thresholds are set high to ensure that all infants requiring treatment are examined [2,3,4]. Although currently recommended guidelines have high sensitivity for the detection of ROP, their specificity is low: fewer than 10% of examined infants receive treatment, and only about half of examined infants develop any ROP [1, 5, 6].

Because of the low specificity of current screening guidelines, various statistical prediction models have been developed to identify infants at high risk for ROP, allowing resources to be directed to those at high risk while sparing those at low risk [7].

DIGIROP (https://www.digirop.com) [8] is a recently published online prediction model that uses only birth characteristics (GA, BW and sex) to estimate the risk of treatment for sight-threatening ROP in infants born at 24–30 weeks’ GA. In the original cohort, which included 7609 Swedish infants with a mean GA of 28.1 weeks and a mean BW of 1119 g, this algorithm obtained a sensitivity, specificity, positive predictive value and negative predictive value that were numerically non-inferior to those obtained from previously reported algorithms (Children’s Hospital of Philadelphia-ROP (CHOP-ROP), Omaha-ROP (OMA-ROP), weight, insulin-like growth factor 1, neonatal, ROP (WINROP) and Colorado-ROP (CO-ROP)) [8].

The primary aim of this study was to evaluate the performance of the online prediction model DIGIROP in detecting treatment-requiring retinopathy of prematurity (TR-ROP) in a Portuguese cohort.

Materials and methods

Study design

This was a retrospective cohort study of all preterm infants who underwent ROP screening from April 2012 to May 2019 in two neonatal intensive care units at two institutions in Portugal. The study was approved by the Ethical Review Boards of Centro Hospitalar de Lisboa Ocidental and Hospital Beatriz Ângelo, by the National Data Protection Authority (Comissão Nacional de Proteção de Dados) and by the Ethics Committee of the Nova Medical School. It was carried out in compliance with the tenets of the Declaration of Helsinki, in its latest version (Fortaleza, Brazil, 2013).

Methods

Following the current Portuguese screening guidelines, all infants with GA ≤ 32 weeks, BW ≤ 1500 g or considered by a neonatologist to be at higher risk of ROP were eligible for inclusion. Infants without a known ROP outcome and those with any ocular disease other than ROP were excluded. GA was estimated by foetal ultrasonography.

For ROP screening, an initial fundus examination using indirect ophthalmoscopy was performed at 32 weeks of postmenstrual age (PMA) or at 4 weeks of chronological age, whichever came later. The diagnosis of ROP and the indication for treatment followed the International Classification of ROP Revisited [9] and the Early Treatment for ROP Study [10], respectively. The term severe ROP included both ‘Type 1 ROP’ (zone I any stage with plus disease, zone I stage 3 without plus disease, or zone II stage 2 or 3 with plus disease) and ‘Type 2 ROP’ (zone I stage 1 or 2 without plus disease, or zone II stage 3 without plus disease), both as defined by the Early Treatment for ROP Study [10]. All infants diagnosed with Type 1 ROP had digital retinal images recorded and were subsequently treated.

Data collected from the clinical records included birth characteristics (sex, BW and GA), dates of retinal examinations, ROP stage and zone at every examination, and any treatments performed for ROP.

DIGIROP model

The online prediction model DIGIROP (https://www.digirop.com) provides a risk estimate with a 95% confidence interval (CI) for sight-threatening ROP requiring treatment, using only birth characteristics, for infants with a GA of 24 to 30 weeks.

GA, BW and sex were retrospectively entered into the online application, and the risk estimate with its 95% CI was recorded.

Statistical analysis

The algorithm was applied to all infants with a GA of 24–30 weeks. The optimal cut-off point to achieve 100% sensitivity was calculated. Receiver operating characteristic (ROC) curves, with calculation of the area under the curve (AUC) and its 95% CI, were used to evaluate the performance of the algorithm in identifying TR-ROP. Data analyses were performed using SPSS v23 (IBM®, USA).
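The two quantities named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: it assumes that an infant is flagged for screening when the DIGIROP risk is greater than or equal to the cut-off, so the highest cut-off that keeps 100% sensitivity is the lowest predicted risk among TR-ROP infants, and it computes the AUC in its Mann-Whitney form, as the probability that a randomly chosen TR-ROP infant receives a higher risk than a randomly chosen non-TR-ROP infant. The function names and the risk values are hypothetical.

```python
from typing import Sequence


def cutoff_for_full_sensitivity(risks: Sequence[float], treated: Sequence[bool]) -> float:
    """Highest risk threshold that still flags every TR-ROP infant.

    Assumption: infants with risk >= cut-off are screened, so the threshold
    is the minimum predicted risk among infants who required treatment.
    """
    case_risks = [r for r, t in zip(risks, treated) if t]
    if not case_risks:
        raise ValueError("at least one TR-ROP infant is needed")
    return min(case_risks)


def roc_auc(risks: Sequence[float], treated: Sequence[bool]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Over all (TR-ROP, non-TR-ROP) pairs, counts how often the TR-ROP infant
    has the higher risk; ties count as one half.
    """
    cases = [r for r, t in zip(risks, treated) if t]
    controls = [r for r, t in zip(risks, treated) if not t]
    if not cases or not controls:
        raise ValueError("both TR-ROP and non-TR-ROP infants are needed")
    score = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical risks for eight infants (not study data).
risks = [0.30, 0.05, 0.002, 0.10, 0.001, 0.004, 0.20, 0.0005]
treated = [True, True, True, False, False, False, False, False]

cutoff = cutoff_for_full_sensitivity(risks, treated)
auc = roc_auc(risks, treated)
print(cutoff)         # 0.002
print(round(auc, 3))  # 0.667
```

The pairwise loop is quadratic, which is unproblematic for a cohort of a few hundred infants; a rank-based formula gives the same value more efficiently for larger samples.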

Results

Baseline characteristics

Of the 431 infants who underwent ROP screening and had information on ROP outcome, 174 were excluded because their GA was outside the 24–30-week range covered by the DIGIROP algorithm. Of the 257 infants eligible for DIGIROP analysis, median GA was 29 weeks (range 24–30), median BW was 1060 g (range 408–2080) and 8.9% (n = 23) developed TR-ROP.

Among the 174 infants excluded from DIGIROP, median GA was 31 weeks (range 23–36), median BW was 1446 g (range 660–2670) and 1.3% (n = 2) developed TR-ROP (Table 1). Common risk factors for the development of ROP in both groups are listed in Table 2.

Table 1 Gestational age (GA), birth weight (BW) and gender.
Table 2 Risk factors for retinopathy of prematurity.

Algorithm outcome

After the DIGIROP model was applied, the highest risk estimate for sight-threatening TR-ROP in the TR-ROP group was 0.5454 (95% CI 0.4343–0.6616), with a median risk of 0.0938 (range 0.0016–0.5404). In the non-TR-ROP group, the highest risk estimate given by the algorithm was 0.3655 (95% CI 0.2899–0.4519), with a median risk of 0.0039 (range 0.0001–0.3655).

The optimal cut-off point to achieve 100% sensitivity was 0.0016. Had the model been applied, the number of infants requiring screening for ROP would have decreased from 257 to 187 (−27.2%).
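The reported reduction follows from counting how many infants fall at or above the cut-off and comparing that count with the number originally screened. The sketch below assumes that infants whose risk equals the cut-off are still screened; the helper names are hypothetical. The percentage is computed from the published counts (257 and 187), while the counting helper is shown on made-up risks only.

```python
from typing import Sequence


def screened_count(risks: Sequence[float], cutoff: float) -> int:
    """Number of infants who would still be examined (assumed rule: risk >= cut-off)."""
    return sum(1 for r in risks if r >= cutoff)


def percent_change(before: int, after: int) -> float:
    """Relative change in the number of screened infants, in percent."""
    return 100.0 * (after - before) / before


# Published counts: 257 eligible infants, 187 at or above the 0.0016 cut-off.
change = percent_change(257, 187)
print(f"{change:.1f}%")  # -27.2%

# Hypothetical risks for five infants (not study data).
print(screened_count([0.0001, 0.0016, 0.02, 0.0009, 0.3], 0.0016))  # 3
```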

AUC for TR-ROP was 0.70 (95% CI 0.57–0.83) with a sensitivity of 0.52 (95% CI 0.31–0.73) and a specificity of 0.86 (95% CI 0.76–0.93).
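The section does not state how the 95% CIs for sensitivity and specificity were derived. One common choice for a proportion is the exact (Clopper-Pearson) binomial interval; the sketch below computes it with the standard library by bisection on the binomial distribution function. It is an assumption, not the study's documented method. If the sensitivity of 0.52 was computed over all 23 TR-ROP infants, it corresponds to 12 correctly flagged infants (12/23 ≈ 0.52), and those counts can be passed to the function to compare with the reported interval.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for k successes out of n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")

    def bisect(f, target: float, increasing: bool) -> float:
        # Solve f(p) = target on [0, 1] for a monotone function f.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    half = alpha / 2
    # Lower bound: P(X >= k | p) = alpha/2; this tail probability rises with p.
    lower = 0.0 if k == 0 else bisect(lambda p: 1 - binom_cdf(k - 1, n, p), half, True)
    # Upper bound: P(X <= k | p) = alpha/2; this probability falls as p rises.
    upper = 1.0 if k == n else bisect(lambda p: binom_cdf(k, n, p), half, False)
    return lower, upper


# Sensitivity as 12 of 23 TR-ROP infants flagged (inferred from 0.52 and n = 23).
lower, upper = clopper_pearson(12, 23)
print(round(12 / 23, 2), round(lower, 2), round(upper, 2))
```

Exact intervals are conservative (at least nominal coverage), which matters when, as here, the number of TR-ROP infants is small; score-based intervals such as Wilson's are narrower alternatives.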

Discussion

Failure to detect TR-ROP can lead to irreversible blindness. Therefore, screening criteria must have high sensitivity. Because only 4.3–30.4% of infants develop TR-ROP [11], many infants who do not require treatment are screened. Screening is not free of complications: there is evidence that examinations are stressful and painful for infants, and pharmacological mydriasis may cause serious adverse events. Moreover, ROP screening requires a skilled workforce available 52 weeks a year, which may not always be feasible [12, 13].

Given this context, several attempts have been made to improve screening by stratifying risk and supporting decision making, thus reducing unnecessary examinations and improving resource allocation.

Most of the proposed prediction algorithms use statistical modelling approaches that include BW, GA and other postnatal risk factors, such as oxygen exposure, sepsis or postnatal weight gain, which need to be collected over time [7]. For instance, weight in the WINROP model is entered up to 36 weeks’ PMA [14, 15], whilst for the Growth and Retinopathy of Prematurity (G-ROP) modified screening criteria it is measured up to postnatal day 39 [16]. The need for continuous data input can hamper the use of these models, particularly when babies are transferred between neonatal units.

Recently, Pivodic et al. [8] developed and validated an individual risk prediction model (DIGIROP) for identifying TR-ROP using solely birth characteristics (BW, GA and sex), without the need for any postnatal factors. The prediction model was developed in 7286 Swedish infants and had an AUC of 0.90. After external validation in new cohorts from Sweden (n = 323), the United States (n = 1535) and Europe (n = 354), the AUC remained high (0.94, 0.87 and 0.90, respectively) [8]. In addition, DIGIROP obtained a sensitivity and specificity that were numerically non-inferior to those obtained from previously reported algorithms requiring more complex postnatal data (CHOP-ROP, OMA-ROP, WINROP and CO-ROP) [8].

Our goal was to determine the ability of the DIGIROP model in detecting TR-ROP in a Portuguese cohort.

In our cohort, the highest risk obtained in the TR-ROP group was 0.5404 vs. 0.3655 in the non-TR-ROP group, with a median risk of 0.1174 in TR-ROP vs. 0.0039 in non-TR-ROP. The AUC for TR-ROP was numerically lower than the one reported by the DIGIROP group (0.70 vs. 0.90), indicating only moderate accuracy. No information on race/ethnicity was available for the development cohort. However, the AUC by race/ethnicity reported for the US cohort varied from 0.79 for Hispanic infants to 0.90 for Black infants [17]. Differences in race/ethnicity might therefore partially explain our results.

In our cohort, the optimal cut-off point to achieve 100% sensitivity was 0.0016. Although this was not discussed in detail in the original publication, the DIGIROP cut-off values chosen to match the sensitivity obtained with the published cut-off values of the CHOP-ROP, OMA-ROP, WINROP and CO-ROP models ranged from 0.0076 to 0.02 [7]. The apparently different performance of DIGIROP in our cohort might be due to differences between patient populations.

The Total SWEDROP Cohort, Validation US Group and Validation European Group, in which DIGIROP was initially validated, had a median GA of 28.6, 27.9 and 28.1 weeks and a median BW of 1110, 980 and 990 g, respectively. The present cohort had a slightly different BW and GA (median GA 29 weeks and median BW 1060 g).

Prediction models cannot be used in new settings without validation studies to prove that the model is generalisable and performs well with other patients and settings [18]. Since no prediction model performs well universally, research on developing and validating robust ROP prediction models for clinical use continues to be of interest. Our findings show that even algorithms that were validated in large cohorts (>7000 infants) can perform differently when applied in a different setting.

This is not the first time that an algorithm has underperformed when tested in a new cohort. ROP risk profiles can vary widely from country to country, and even from city to city and unit to unit [19]. The WINROP model demonstrated 100% sensitivity (95% CI, 90–100%) for severe ROP in its original cohort, but subsequent validation studies demonstrated consistently lower sensitivities, ranging from 98.6% (95% CI, 95–99.8%) in a multicentre US and Canadian cohort to 62% (95% CI, 55.9–69.9%) in Spain [20,21,22,23,24,25,26].

One of the proposed explanations for this phenomenon is that the mechanism for ROP development and progression may differ between high-income countries and low-income countries. In highly developed neonatal care settings, low BW and early GA are significant risk factors for ROP, and the addition of slow weight gain, as a surrogate for low serum IGF-1, helps to increase the sensitivity for identifying ROP. In countries with developing neonatal care systems, infants with significantly higher BW and GA can develop severe ROP, primarily due to excessive supplemental oxygen administration [6]. Hyperoxia inhibits VEGF activity and destroys immature retinal vasculature, causing a true ‘oxygen-induced retinopathy’ type of ROP [7]. Moreover, older GA infants have higher levels of endogenous IGF-1 production, suggesting that low serum IGF-1 is not a factor in limiting VEGF activity, and slow weight gain is not a predictive factor for this variant of ROP [7].

Portugal has well-developed neonatal care, with a survival threshold of 25 weeks [27]. According to the 2013 Portuguese Very Low Birth Weight Infant Registry, the overall survival rate was 89%. By GA interval, survival rates were 95.3% for 28–31 weeks, 77.6% for 25–27 weeks, and 42.1% and 16.7% for 23 and 24 weeks, respectively [27]. In 2015, the early and late neonatal mortality rate at or after 22 weeks of gestation was 2.1 per 1000 live births in Portugal, compared with 1.5 per 1000 live births in Sweden [28]. In comparison, the total preterm-related infant mortality rate in the United States was 2.01 per 1000 live births in 2016 [29].

In Portugal, all infants under 32 weeks are screened. Because of the GA limits of DIGIROP, 174 infants could not be entered into the algorithm and, of these, 2 (1.3%) developed TR-ROP. In the recently developed G-ROP model [16], validated in 7483 infants in North America (with BW < 1501 g, GA < 32 weeks or a poor postnatal course), 1440 infants (19.2%) had a GA of 31 weeks or more. This ability to include all screened infants is an advantage over DIGIROP. Most infants with an older GA will not develop TR-ROP, but some do require treatment, so it is imperative to develop prediction models that correctly identify this small subset of high-risk infants. Pivodic et al. [8] did not compare DIGIROP with G-ROP in their article.

Another possible explanation for the different performance of DIGIROP in our cohort is that, although relying solely on birth characteristics makes the model appealing and easy to use, the exclusion of postnatal factors might have caused some cases of TR-ROP to go unidentified. It is widely recognised that postnatal factors play an important role in the development of severe ROP [7].

There are some limitations to consider in this study, namely its retrospective nature, the inclusion of only two institutions from the same geographical area and the relatively small sample size. Also, because data collection spanned 2012–2019, we cannot fully account for changes in medical practice during this period. However, any prediction model must be validated in novel populations, and testing this new algorithm’s applicability in the Portuguese population is important.

In conclusion, in our cohort, the optimal cut-off point to achieve 100% sensitivity was 0.0016, which would have led to a 27.2% decrease in the number of screened infants. It is essential that algorithms continue to be tested in different populations, especially in cohorts that include both younger and older GA infants.

Summary

What was known before

  • Retinopathy of prematurity (ROP) is a leading cause of preventable blindness worldwide.

  • Prediction models can potentially help to stratify risk/decision, while reducing unnecessary examinations and improving resource allocation.

  • Prediction models cannot be used in new settings without validation studies.

What this study adds

  • First application of DIGIROP since its original publication

  • The optimal cut-off point to achieve 100% sensitivity in our cohort was 0.0016 vs. 0.0076–0.02 in the original publication. AUC for TR-ROP revealed moderate accuracy.

  • Had the model been applied, the number of infants requiring screening for ROP would have decreased from 257 to 187 (−27.2%).

  • It is essential that algorithms continue to be tested in different populations, especially in cohorts that include both younger and older GA infants.