Manuscript citation: Pivodic A, Hård AL, Löfqvist C, Smith LEH, Wu C, Bründer MC, Lagrèze WA, Stahl A, Holmström G, Albertsson-Wikland K, Johansson H, Nilsson S, Hellström A. Individual risk prediction for sight-threatening retinopathy of prematurity using birth characteristics. JAMA Ophthalmol. 2020;138:21–9.
Type of investigation
Retrospective cohort study.
Can an accurate prediction model be constructed for retinopathy of prematurity needing treatment using only birth characteristics of preterm infants 24-30 weeksʼ gestation?
Design: Retrospective cohort study using data from the Swedish National Patient Registry of infants screened for retinopathy of prematurity (ROP) from 1 January 2007 to 7 August 2018. The purpose was to develop and validate an individualized predictive model to estimate the risk of treatment for sight-threatening ROP using birth characteristics. (ROP treatment was laser surgery in most cases, anti-vascular endothelial growth factor antibody in some cases, and a combination of both treatments in a few cases; Hellström A, personal communication.)
Outcome: The study outcome was ROP treatment (dichotomous variable based on the International Classification of ROP and Early Treatment for Retinopathy of Prematurity (ETROP) criteria for treatment).
Patients: The target population was 9135 extremely premature infants born 2007–2018 from the Swedish National Registry for Retinopathy of Prematurity (SWEDROP). Excluded were 1388 infants >31 weeks at birth and 138 infants with missing data, leaving a total of 7609 infants. From those, two internal groups were constructed: the model development group and the temporal validation group. These groups were further subdivided by gestational age (GA), <24 weeks and ≥24 weeks, the latter being the focus of the development model. Hence, the internal model development group comprised 6947 infants born 2007–2017, while the internal temporal validation group contained 308 infants born 2017–2018. The same gestational age cohorts were used for the two external geographical validation models, which included 1485 infants born in the United States from 2005 to 2010 and 329 European infants born 2011–2017.
Intervention/exposure: prematurity, born 24–30 weeks gestation.
Statistical analysis: Poisson regression for time-varying data was utilized with ROP treatment as the outcome and birth characteristics as predictors.
Follow-up: Twenty postnatal weeks (graphs show up to 28 postnatal weeks).
The overall incidence of ROP needing treatment from SWEDROP was 5.8% (442 of 7609). The incidence of ROP treatment was 40.1% (142/354) for infants with GA <24 weeks, 10.2% (287/2806) among those with GA 24 weeks to <28 weeks, and 0.3% (13/4449) among those with GA of at least 28 weeks. The development model for GA 24 to 30 weeks included: piecewise linear current postnatal age (break points, 8 and 12 weeks), piecewise linear continuous GA given in weeks and days (break point, 27 weeks), sex, piecewise linear birth weight standard deviation score (BWSDS; break point, −1 SDS), postnatal age × piecewise linear GA interaction, sex × GA interaction, and postnatal age × piecewise linear BWSDS interaction.
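The "piecewise linear" terms listed above can be illustrated with a hinge basis, a common way to let a slope change at a break point. This is a minimal sketch, not the authors' parameterization: the function name `piecewise_linear_terms` is hypothetical, and only the break points (8 and 12 postnatal weeks) come from the text.

```python
def piecewise_linear_terms(x, knots):
    """Hinge basis: [x, max(0, x - k1), max(0, x - k2), ...].

    In a regression, the coefficient on each hinge term is the change
    in slope that starts at that knot. Hypothetical helper, for
    illustration only.
    """
    return [x] + [max(0.0, x - k) for k in knots]


# Postnatal age in weeks, with break points at 8 and 12 weeks (from the text).
for age in (6, 10, 14):
    print(age, piecewise_linear_terms(age, knots=(8, 12)))
```

Before week 8 only the first term is non-zero; between weeks 8 and 12 the second term switches on; after week 12 the third does as well, which is how a single model can describe a risk that rises and then falls.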
All predictive models were assessed for discrimination and calibration. Discrimination is the ability of a model to differentiate between those who do and do not have the outcome (here, ROP needing treatment) and is measured by the area under the receiver operating characteristic curve (AUC); calibration is the agreement between predictions from the model and observed outcomes, usually shown as departure from a line on a graph.
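The two performance measures can be made concrete with a small sketch. The data below are invented for illustration and are not from Pivodic et al.; the function names are hypothetical. AUC is computed as the proportion of case/non-case pairs in which the case received the higher predicted risk, and calibration is summarized crudely as mean predicted risk minus observed event rate (often called calibration-in-the-large).

```python
def auc(scores, labels):
    """Share of (case, non-case) pairs where the case scores higher; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(scores, labels):
    """Mean predicted risk minus observed event rate; 0 means agreement on average."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


# Invented predicted risks and outcomes (1 = treated for ROP), illustration only.
risk = [0.02, 0.05, 0.10, 0.30, 0.60, 0.80]
treated = [0, 0, 0, 1, 0, 1]

print("AUC:", auc(risk, treated))
print("Calibration-in-the-large:", calibration_in_the_large(risk, treated))
```

A model can discriminate well (high AUC) and still be poorly calibrated, which is why both measures are reported.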
Each model showed high predictive ability: AUC of 0.90 (95%CI, 0.89–0.92) for internal model development, 0.94 (95% CI 0.90–0.98) for internal temporal validation; external, geographical validation was 0.87 (95% CI 0.84–0.89) for U.S. cohort, and 0.90 (95% CI 0.85–0.95) for European cohort. The sensitivity of the final model was 99.0%. Risk of infants needing ROP treatment increased between postnatal weeks 8 to 12 and decreased thereafter. Calibration plots were reported in the supplement; there was only a slight departure from the calibration line.
The authors conclude the predictive model based on birth characteristics data is generalizable and enables individualized, early risk prediction for infants born at 24–30 weeks gestation needing ROP treatment.
The primary purpose of the Pivodic et al. model was to predict ROP needing treatment in premature infants using only birth characteristic data. Prognosis Research Strategy (PROGRESS) 3 was the guide for the statistical plan and model development strategy [1, 2] in this high-quality work. However, by completing PROBAST, a newly developed instrument for assessing statistical models, we found some deficiencies and uncovered potential bias in the predictive model.
The PROBAST instrument (Prediction model study Risk of Bias Assessment Tool) was designed specifically to assess statistical model studies making individualized predictions. The instrument includes an explanation and elaboration, along with a template for conducting an assessment. The explanation section defines risk of bias as “shortcomings in the study design, conduct or analysis, which leads to systematically distorted estimates of a model’s predictive performance or to an inadequate model to address the research question.” The template includes 20 signaling questions exploring four study domains: participants, predictors, outcome, and analysis. Each question and domain is scored as low, high, or unclear in terms of bias. Within the analysis domain, model predictive performance is evaluated using calibration, discrimination, or classification measures. PROBAST is available at http://development.probast.org/.
Our assessment included the development and validation models; however, it was limited to models assessing risk of severe ROP needing treatment in infants born at 24–30 weeks gestation. Table 1 includes the results for the signaling questions, followed by the rationale for each domain rating.
Rationale for PROBAST ratings
Subjects for the developed model were from the Swedish National Patient Registry. There is no information regarding race/ethnicity in the development model, nor is there an assessment of the 138 infants who were excluded from the target population due to missing data. For example, AUC by race/ethnicity, as reported for the US cohort, varied from 0.79 for Hispanic infants to 0.90 for Black infants, indicating this variable contributes to the models’ ability to discriminate. Also, any missing data should be evaluated for randomness and imputed where possible or evaluated for differences from the analyzed cohort. Together, these may have introduced bias. Thus, there is “High concern” for the participant domain.
In the predictor domain, birth weight SDS is a composite score for expected reference weight, which is based on GA, sex, and birth weight for all healthy singletons born at GA of at least 24 weeks from 1990 to 1999 and is derived from infants registered in the Medical Birth Register of Sweden. This scoring system may not be widely used. Next, there is no evaluation of predictors for collinearity (predictors that are highly related to each other). Collinearity may be suspected in the Pivodic et al. model, which uses both gestational age and birth weight, along with their interaction, as these are highly associated. This potential collinearity may have resulted in inflated variance, unduly influencing the slope parameter estimates and rendering future predictions for individuals less accurate.
Further, limiting the model to birth characteristics (which was the stated goal of Pivodic et al.) may lead to overfitting (too few outcome events relative to the number of predictors) or underfitting (failing to include important predictors), and may not fully consider the true risk of severe ROP needing treatment. For example, Pivodic et al. state that postnatal age (not necessarily a birth characteristic) was the best predictive variable for the temporal risk of ROP treatment. Care received while in the NICU may also modify this risk. Incorporating variables such as oxygen saturation targets, type of infant feeding (breast milk versus formula), and supplementation with certain nutrients (vitamin A, omega-3 fatty acids, and vitamin E) may change the estimates of risk for developing severe ROP. Thus, the risk of bias was rated as “High” for the predictor domain.
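One standard screen for collinearity is the variance inflation factor (VIF). With only two predictors it reduces to 1 / (1 − r²), where r is their Pearson correlation. The sketch below uses invented gestational age and birth weight values (not study data) and hypothetical function names; it shows the kind of check that was not reported, not a reanalysis.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF for either predictor when the model has exactly two: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


# Invented values for illustration only (not from Pivodic et al.).
ga_weeks = [24, 25, 26, 27, 28, 29, 30]
birth_weight_g = [650, 760, 850, 1000, 1100, 1280, 1400]

print("r   =", round(pearson_r(ga_weeks, birth_weight_g), 3))
print("VIF =", round(vif_two_predictors(ga_weeks, birth_weight_g), 1))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a warning sign; the threshold is a convention rather than a formal test.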
Evaluations for ROP needing treatment may not be consistent across all locations; research shows observer bias between centers for ophthalmologists assessing acute ROP. Therefore, the risk of bias is “Unclear” in the outcome domain.
In the analysis domain, ROP needing treatment was a rare event, ~4.1% in infants born at 24–30 weeks’ gestation (300 of 7255). As sample size decreases, the number of infants with the event also decreases. For example, the temporal validation group contained 308 infants, or about 13 with ROP needing treatment. With smaller samples, the risk increases for selecting spurious predictors (overfitting) or failing to include important predictors (underfitting). Thus, the temporal group may not have had enough events for accurate internal validation.
As stated above, collinearity may contribute to risk of bias in all models. Collinearity may alter coefficients by inadvertently reversing their signs or producing inflated error terms, resulting in inaccurate risk estimates and over-optimistic p-values. A different type of regression analysis could help overcome any collinearity issue; for example, ridge regression, lasso, or elastic net procedures all help reduce model variance. Specifically, elastic net mathematically selects among all predictors, including possible interactions, for the “best” variables to predict an outcome. Thus, the arbitrary criterion used by Pivodic et al. of selecting predictors with univariable p-values <0.10 would not be needed.
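The arithmetic behind "about 13" events can be reproduced directly. This is a back-of-the-envelope sketch that assumes the temporal validation group shares the pooled event rate of 300 in 7255; the variable names are ours.

```python
# Counts taken from the text: 300 treated infants among 7255 born at 24-30 weeks.
events, infants = 300, 7255
rate = events / infants

# Size of the internal temporal validation group (from the text).
validation_n = 308

# Assumption: the validation group has the same event rate as the pooled cohort.
expected_events = rate * validation_n

print(f"event rate: {rate:.1%}")
print(f"expected events among {validation_n} infants: {expected_events:.1f}")
```

With so few expected events, performance estimates from the validation group carry wide uncertainty, consistent with the wider confidence interval reported for the temporal validation AUC.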
Internal temporal validation
Internal temporal validation of a development model is when the investigators use the same coefficients produced during the model development stage, with the same predictors, outcome definitions, and measurements, but sample from a later period. Infants born 2017–2018 from the Swedish National Patient Registry make up the temporal group. Results showed good AUC model performance; however, because data collection spanned 2007–2018 in the registry, it is important to recognize and account for changes in medical practice during this extended period. For example, recommendations for oxygen saturation targets used in preterm infants had changed.
External geographical validation
The intent of external model validation is to quantify the developed model’s predictive performance using a new participant dataset, typically collected by different investigators, that contains measures similar to those in the developed model. This can be data collected from different settings, an intentionally different population, or different locations.
Pivodic et al. selected two groups for external validation: a sample of infants born in the U.S. from 2005 to 2010 and a sample of infants born in Germany (European group) from 2011 to 2017. As with temporal validation, the coefficients from the developed model were used to predict outcomes for the two external cohorts. All validation assessments showed the model was performing well as measured by high calibration, discrimination, classification, and sensitivity.
In creating a predictive model for risk of treatment for severe ROP, it is important to have both high sensitivity (correctly identify those who require treatment; true positive rate) and high specificity (correctly identify those who would not need to be screened; true negative rate). However, positive predictive value (PPV) ranged from 9.2 to 21.8%; that is, of the infants who tested positive, at most 21.8% actually had ROP needing treatment. With no known biomarkers for ROP, routine retinal examinations are necessary to detect the small percentage of infants requiring treatment. Moreover, the increased survival of extremely premature infants leads to an increased number of infants needing screening. Unfortunately, the number of ROP-trained ophthalmologists is limited and expected to decrease. As such, predictive models may help to select premature infants who are not at risk for severe ROP (such as infants >27 weeks GA). However, specificity was markedly reduced in some of the Pivodic et al. models, ranging from 10.5 to 49.3%; thus, these may not be useful for identifying premature infants who are not at risk for ROP. For these reasons, the risk of bias was “High” in the analysis domain.
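Why PPV stays low even with near-perfect sensitivity follows from Bayes' rule: when the outcome is rare and specificity is modest, false positives outnumber true positives. The sketch below plugs in figures quoted in this commentary (sensitivity 99.0%, the upper specificity of 49.3%, and the pooled event rate of about 4.1%). Combining them is our assumption for illustration; the published PPVs were computed within each cohort at its own prevalence, so the result is not expected to reproduce them.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative inputs; pairing these three numbers is an assumption.
example = ppv(sensitivity=0.99, specificity=0.493, prevalence=0.041)
print(f"PPV at 4.1% prevalence: {example:.1%}")

# The same test characteristics at a higher prevalence give a higher PPV.
print(f"PPV at 10% prevalence:  {ppv(0.99, 0.493, 0.10):.1%}")
```

The comparison across prevalences is the design point: PPV is a property of the test and the population together, so it will differ between cohorts even when sensitivity and specificity are fixed.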
In conclusion, the evaluation using PROBAST showed risk of bias may be present, which would limit generalizability. As a result, we recommend the following:
Select samples for the development of models that more closely represent the target population in order to increase generalizability of the final model.
Conduct a thorough assessment for potential predictors, including evaluating multicollinearity.
Use larger samples for validation models (the incidence of severe ROP is trending up in recent studies).
Incorporate changes of neonatal care into all models; for example, recent changes in recommendations for oxygen saturation levels.
Consider more robust regression techniques to select predictors, such as elastic net, that reduce variance while considering all possible interactions. Alternatively, a Bayesian approach might be useful in that prior knowledge would be incorporated into the model building process.
The American Academy of Ophthalmology conducted a systematic study of predictive models and defined criteria for model development, which they assigned Level I, Level II, and Level III. Per this measure, the Pivodic et al. study appears to meet the criteria for a Level I rating (high-quality study). However, according to PROBAST these models may in fact be biased. Despite this potential shortcoming, Pivodic et al. demonstrated that the risk of ROP needing treatment peaked at 12 weeks postnatal age, regardless of gestational age at birth and unrelated to post-menstrual age. The rate of increase was 54% per week from postnatal week 8 through 12; afterwards it decreased 30% per week. This information may help with scheduling ROP screenings and in deploying strategies to prevent the progression of ROP.
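To give a sense of scale for these weekly rates, the sketch below compounds them multiplicatively relative to postnatal week 8. Multiplicative compounding is our assumption (it is the natural reading for a Poisson rate model), and the function name and its default arguments are hypothetical; only the 54% and 30% figures and the week 8 and 12 break points come from the text.

```python
def relative_risk(week, base_week=8, peak_week=12, up=1.54, down=0.70):
    """Risk relative to postnatal week 8, for week >= 8.

    Assumes the weekly changes compound: +54% per week up to week 12,
    then -30% per week afterwards. Illustrative only.
    """
    if week < base_week:
        raise ValueError("sketch is only defined from postnatal week 8 onward")
    if week <= peak_week:
        return up ** (week - base_week)
    return up ** (peak_week - base_week) * down ** (week - peak_week)


for wk in (8, 10, 12, 14, 16):
    print(f"week {wk}: {relative_risk(wk):.2f}x the week-8 risk")
```

Under this assumption the risk at the week-12 peak is between five and six times the week-8 risk, which is the practical argument for concentrating examinations around that window.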
EBM lesson: evaluation of a prediction model
The PROBAST instrument appears to provide substantial insight into the risk of bias when developing a prediction model for individuals. Constructing a model to predict an outcome is a complex statistical procedure. Therein lies a danger for using models to predict individual outcomes before fully understanding the potential risk of bias. While we acknowledge that Pivodic et al. employed high quality, rigorous tools for evaluating their models, by using the new PROBAST instrument, shortcomings were easily identified that could significantly alter risk estimates of ROP requiring treatment in premature infants.
Pivodic A, Hård AL, Löfqvist C, Smith LEH, Wu C, Bründer M, et al. Individual risk prediction for sight-threatening retinopathy of prematurity using birth characteristics. JAMA Ophthalmol. 2020;138:21–9.
Steyerberg EW, Moons KG, van der Windt DA, Hayden JA, Perel P, Schroter S, et al. PROGRESS Group. Prognosis Research Strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10:e1001381.
Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. 2019;170:W1–W33. https://www.equator-network.org/, accessed 02/18/2020.
Raghuveer TS, Zackula RE. Strategies to prevent severe retinopathy: a 2020 update and meta-analysis. NeoReviews. 2020, in press.
Darlow BA, Elder MJ, Horwood LJ, Donoghue DA, Henderson-Smart DJ. Australian and New Zealand Neonatal Network. Does observer bias contribute to variations in the rate of retinopathy of prematurity between centres? Clin Exp Ophthalmol. 2008;36(Jan-Feb):43–6.
Wong RK, Ventura CV, Espiritu MJ, Yonekawa Y, Henchoz L, Chiang MF, et al. Training fellows for retinopathy of prematurity care: a web-based survey. J AAPOS. 2012;16:177–81.
Hutchinson AK, Melia M, Yang MB, VanderVeen DK, Wilson LB, Lambert SR. Clinical models and algorithms for the prediction of Retinopathy of Prematurity. Ophthalmology. 2016;123:804–16.
The Journal Club is a collaboration between the American Academy of Pediatrics, Section on Neonatal-Perinatal Medicine, and the International Society for Evidence-Based Neonatology (EBNEO.org).
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zackula, R.E., Raghuveer, T.S. Prediction of severe retinopathy of prematurity in 24–30 weeks gestation infants using birth characteristics. J Perinatol (2020). https://doi.org/10.1038/s41372-020-00876-9