Introduction

Very preterm, very low birth weight infants are major contributors to the population burden of mortality and morbidity related to perinatal factors; hence, efforts to improve the quality of care for this patient group are an important area of health service endeavour.1 Data collection and analyses of processes and outcomes across neonatal units and networks are potentially informative and can be powerful tools to improve quality by benchmarking and feedback. Such data might also offer substantial potential for research.

There are, however, many practical challenges involved in utilising data from multiple countries to improve neonatal care. There are differences between and within countries in the methods used to record data, the regulatory approvals governing their use, consistency in the items recorded and their definitions, as well as in clinical practice.2 These difficulties can be major obstacles to realising the potential of such data. In Europe, the open platform EuroNeoNet was an initial attempt to record and utilise data from neonatal units.3 EuroNeoNet closed in 2015 and was followed by eNewborn, a platform integrating innovative information technology, original software, a revision of the EuroNeoNet data set, and international collaboration. The characteristics of the eNewborn platform have been described elsewhere.4

The aim of this paper is to report a pilot evaluation of eNewborn as a large, real-world data set. We used selected variables and an exemplar statistical analysis to provide an indication of the potential utility of the eNewborn database. Our specific objectives were to use the eNewborn database to (i) present summary statistics for early, neonatal and in-hospital mortality by gestational age (GA); (ii) analyse the relationships of delivery room care and Apgar scores at 1 and 5 min with mortality and bronchopulmonary dysplasia (BPD) as a first step towards the development of a future tool to predict risks of these outcomes in extremely and very low birth weight infants; and (iii) calculate the length of time that individual countries would need, on the basis of their neonatal admission rates, to achieve 80% power to detect a clinically relevant change in mortality and BPD following a hypothetical intervention.

Methods

Data source

The eNewborn database receives information on all liveborn infants born between 22 weeks 0 days and 31 weeks 6 days of gestation or with a birth weight ≤1500 g who are admitted to a participating neonatal unit. Data on babies who die in the delivery room are not included as information on these infants varies within and between countries. Over 200 neonatal units across Europe submit data to eNewborn. Belgium, Czech Republic, Portugal, Switzerland and the United Kingdom submit data extracted from an in-country database. Data are submitted by individual neonatal units from 5 additional countries: France 10 units, Germany 1 unit, Poland 1 unit, Spain 2 units, and Finland 1 unit. Data entry was either directly online or as an extract from local electronic medical records. The proportional contribution of each country is described in Appendix 1.

Definitions

We defined mortality as ‘early’ (days 0–6 from birth), ‘neonatal’ (days 0–27) and ‘in-hospital’ (before discharge from neonatal care) and BPD as any supplemental oxygen received at 36 weeks’ postmenstrual age. We defined delivery room care as ‘no resuscitation’, ‘basic resuscitation’ (supplemental oxygen and/or positive pressure breaths with a bag and face mask or continuous positive airway pressure) and ‘advanced resuscitation’ (endotracheal intubation and/or cardiac compression and/or administration of adrenaline/epinephrine). We categorised birth weight as extremely small for gestational age (ESGA; <3rd centile), small for gestational age (SGA; <10th centile) and appropriate for gestational age (AGA; ≥10th centile).5

Statistical analysis

We extracted 12 variables for each infant: (i) GA; (ii) birth weight; (iii) delivery room care (none, basic, advanced); (iv) Apgar at 1 min; (v) Apgar at 5 min; (vi) BPD (yes/no); (vii) death (early, neonatal, in-hospital, none); (viii) antenatal steroids (yes/no/partial); (ix) multiple birth (yes/no); (x) inborn/outborn; (xi) surfactant (yes/no); (xii) sex. For each variable, we tabulated the number of infants and the number of missing values. As birth weights <250 g are implausible for liveborn infants, we classified these as missing. Major birth defects were recorded.

We estimated early neonatal, neonatal and in-hospital mortality for boys and girls combined and separately. To investigate the relationship between delivery room care and 1 and 5 min Apgar scores, we first explored their correlations by GA. We examined the evolution of Apgar scores by fitting an independence model to determine whether an association between 1 and 5 min values could be demonstrated. In case of a significant association, we tested the symmetry of the changes in these variables, presented as a square table, by Bowker’s test.6 In case of a significant test, we used the off-diagonal elements of the table to judge the direction in which the changes occurred. After the independence model, we fitted a model of linear by linear association to examine the effect of antenatal steroids on the Apgar score at 1 min. A finding of increasing Apgar scores with increasing compliance with antenatal steroid administration would reject the hypothesis of no association.6
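The symmetry test on the square table of 1-min versus 5-min Apgar categories can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `chi2_sf` and `bowker_test` are hypothetical, the counts are invented for illustration and are not study data, and the p-value uses the closed-form chi-square tail for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) times a finite sum of h^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: erfc term plus a finite sum over half-integer powers of h
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) / math.gamma(1.5)  # h^(1/2) / Gamma(3/2)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * term
        term *= h / (k + 0.5)  # becomes h^(k+1/2) / Gamma(k+3/2)
    return total

def bowker_test(table):
    """Bowker's test of symmetry for a k x k table of paired categories."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            n_ij, n_ji = table[i][j], table[j][i]
            if n_ij + n_ji > 0:  # empty off-diagonal pairs carry no information
                stat += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
                df += 1
    p_value = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p_value

# Hypothetical counts: rows = 1-min Apgar (0-3, 4-6, 7-10), columns = 5-min Apgar
example = [[120, 290, 200],
           [10, 300, 1300],
           [2, 40, 2000]]
stat, df, p_value = bowker_test(example)
print(f"Bowker chi-square = {stat:.1f}, df = {df}, p = {p_value:.3g}")
```

When the test is significant, comparing the counts above the diagonal (improved category at 5 min) with those below it (deteriorated category) indicates the direction of change, as described in the text.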

We then conducted four exploratory logistic analyses. In the first analysis, we fitted ‘neonatal mortality’ as the endpoint against delivery room care category, GA, birth weight category, antenatal steroids, multiple birth and sex. To avoid problems of multicollinearity, we modelled birth weight category rather than birth weight. In the second analysis, we replaced delivery room care category with Apgar score at 1 min and in the third with Apgar score at 5 min. In the fourth, we fitted a similar model with the evolution of the Apgar score from minute 1 to minute 5, replacing delivery room care and the Apgar scores at 1 min and at 5 min of the previous models. Similarly, we carried out a second series of four logistic regressions, this time with BPD as the outcome and adding surfactant administration to the above explanatory variables. We had to deal with the hierarchical structure of the data, consisting of infants nested within neonatal intensive care units (NICUs) and NICUs nested within countries, which leads to correlations within the data. These correlations are contrary to the requirement of independence and may produce what is called extra-binomial variation: the variance of the dependent variable (here neonatal mortality and BPD) will be greater than expected under the assumption of a binomial distribution. This may result in underestimation of the standard errors and overestimation of the chi-square statistics. To correct for this problem, we divided the individual chi-squares by the ratio of the Pearson’s goodness-of-fit chi-square to its degrees of freedom and inflated the standard errors by the square root of that ratio, leaving the coefficient estimates unchanged.7,8
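A correction for extra-binomial variation of this kind can be sketched as follows. The sketch assumes the usual quasi-likelihood convention, in which the dispersion ratio is the Pearson chi-square divided by its degrees of freedom, Wald chi-squares are divided by that ratio and standard errors are multiplied by its square root. The function names and all numbers are hypothetical and are not taken from the study.

```python
import math

def pearson_dispersion(pearson_chi2, residual_df):
    """Dispersion ratio: Pearson goodness-of-fit chi-square over its df."""
    return pearson_chi2 / residual_df

def rescale_wald(estimate, std_error, dispersion):
    """Return the overdispersion-adjusted standard error and Wald chi-square.

    The coefficient estimate itself is left unchanged.
    """
    adj_se = std_error * math.sqrt(dispersion)
    adj_chi2 = (estimate / adj_se) ** 2  # equals the naive chi-square / dispersion
    return adj_se, adj_chi2

# Hypothetical model output: Pearson chi-square 1500 on 1000 df
phi = pearson_dispersion(1500.0, 1000.0)
adj_se, adj_chi2 = rescale_wald(estimate=0.40, std_error=0.10, dispersion=phi)
print(f"dispersion = {phi:.2f}, adjusted SE = {adj_se:.4f}, adjusted chi-square = {adj_chi2:.2f}")
```

With a dispersion ratio above 1, the adjusted standard errors are larger and the adjusted chi-squares smaller than their naive counterparts, which counteracts the bias described above.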

We checked model fit by the ‘c-statistic’.9 Since we were dealing with non-nested models (these are models in which no model is a subset of one of the other models), we carried out a model selection using the Bayesian Information Criterion (BIC).10 BIC is a function of the probability of a model, the number of parameters and the number of observations. It ranks the models studied according to their score, with the lower the BIC score, the better the fit.
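For readers unfamiliar with these two measures, the following Python sketch shows one way to compute them. It is an illustration under stated assumptions rather than the authors' implementation: the c-statistic is computed as the proportion of concordant (event, non-event) pairs with ties counted as one half, and the BIC uses the standard form k ln(n) − 2 ln(L). The function names and the toy data are hypothetical.

```python
import math

def c_statistic(y_true, y_score):
    """Proportion of (event, non-event) pairs in which the event has the
    higher predicted risk; tied predictions count as one half."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower values indicate the preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Toy data: outcome (1 = died) and predicted probability from some fitted model
outcome = [1, 0, 1, 0, 0]
predicted = [0.9, 0.2, 0.4, 0.5, 0.1]
print(f"c-statistic = {c_statistic(outcome, predicted):.3f}")

# Two hypothetical models fitted to the same 1000 observations
print(f"BIC, 5 parameters: {bic(-100.0, 5, 1000):.1f}")
print(f"BIC, 6 parameters: {bic(-100.0, 6, 1000):.1f}")
```

The pairwise loop is quadratic in the number of infants, so for a data set of this size a rank-based calculation or a statistical package would be preferable in practice.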

To assess the biases that might arise from missing data, we carried out bivariate (cross-tabulation) and multivariate (logistic regression) sensitivity analyses by creating a category for missing values for Apgar scores at 1 and 5 min and antenatal steroids and comparing their effect with that of non-missing values on in-hospital mortality and BPD.

To illustrate the benefits of bringing together data from different countries, we conducted a power analysis. We calculated how many years would be needed for each country to identify a statistically significant difference in mortality following a hypothetical intervention. To improve intelligibility, we reduced the 3-year period of observations to the average yearly number of admissions. We provide details of the calculations in Appendix 2.
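The logic of such a calculation can be sketched in Python. The exact method is given in Appendix 2 and is not reproduced here, so everything in this sketch is an assumption: a two-sided test at the 5% level, a before-and-after comparison of two equally sized groups, the normal-approximation sample size formula for two proportions, and invented baseline rates and admission numbers. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group to detect a change from rate p1 to rate p2
    (normal approximation, two-sided test, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

def years_needed(p1, p2, admissions_per_year, alpha=0.05, power=0.80):
    """Years of recruitment needed to fill both groups at a given yearly intake."""
    total = 2 * n_per_group(p1, p2, alpha, power)
    return math.ceil(total / admissions_per_year)

# Hypothetical example: mortality falling from 30% to 25%
for label, yearly in [("single country", 100), ("whole network", 1000)]:
    print(label, years_needed(0.30, 0.25, yearly), "years")
```

Because the required number of infants is fixed by the rates and the effect size, the time to reach it falls roughly in proportion to the yearly number of eligible admissions, which is the argument for pooling data across countries.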

Participation in the study was agreed on according to each country’s national regulations. Agreements were network based or unit based.

Results

We identified 39,529 infants in the eNewborn database over the 3-year period 2014–2016, born at or below a GA of 31 weeks and 6 days or with a birth weight ≤1500 g. Mortality and patient characteristics are shown in Table 1. The early neonatal, neonatal and in-hospital mortality rates were 3.90% (95% confidence interval (CI) 3.71, 4.09), 6.00% (95% CI 5.77, 6.24) and 7.57% (95% CI 7.31, 7.83), respectively. Of the 2373 babies who died in the neonatal period, approximately two-thirds died in the first postnatal week. Birth defects were recorded in 7.29% (n = 173) of the babies who died in the neonatal period, corresponding to 0.43% of the whole cohort; 0.18% of the cohort (n = 73) had a major congenital or chromosomal anomaly. This is a negligible contribution, and therefore we did not include congenital anomalies in the multivariate analysis.
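As a worked check of the interval arithmetic, the short Python sketch below computes a normal-approximation (Wald) confidence interval for a proportion. The method used for the reported intervals is not stated, so the choice of the Wald interval is an assumption; applied to the 2373 neonatal deaths among 39,529 infants, it gives values that agree with the reported neonatal mortality of 6.00% (95% CI 5.77, 6.24). The function name `wald_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def wald_ci(events, total, level=0.95):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = events / total
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width

# Neonatal deaths and cohort size as reported in the text
p, lower, upper = wald_ci(2373, 39529)
print(f"{100 * p:.2f}% (95% CI {100 * lower:.2f}, {100 * upper:.2f})")
# prints: 6.00% (95% CI 5.77, 6.24)
```

With a cohort of this size the Wald interval is very close to alternatives such as the Wilson interval; for the smaller GA-specific strata, the choice of method would matter more.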

Table 1 (a) Main endpoints of the eNewborn database (2014–2017); (b) characteristics of survivors to discharge and non-survivors.

We show early neonatal, neonatal and in-hospital mortality rates and 95% CIs by GA for boys and girls combined in Fig. 1. The decline in mortality with increasing GA is exponential up to 26 weeks and stabilises afterwards.

Fig. 1

Early neonatal (black), neonatal (blue), and in-hospital mortality (red) rates; mean and 95% confidence intervals, by gestational age (in completed weeks), for boys and girls combined.

Neonatal mortality by sex, birth weight and GA is shown in Fig. 2. To avoid the undue effect of superimposition of cases, we plotted in the upper part of the graphs a random sample of boys and girls surviving the neonatal period, equal in size to the number of deceased newborns. In the lower part, we plotted the corresponding graphs for deceased boys and girls. For ease of interpretation, we added the 3rd, 10th, 90th and 97th Fenton centiles. The figure shows greater representation of deceased babies at the lower GAs and in the growth-restricted categories.

Fig. 2: Neonatal mortality by sex, birth weight, and gestational age.

Neonatal survivors and deaths. Upper panel left: random sample of 1021 survivors (girls); upper panel right: random sample of 1346 survivors (boys). Lower panel left: 1021 deaths (girls); lower panel right: 1346 deaths (boys). Dashed lines from bottom to top: 3rd, 10th, 90th and 97th Fenton centiles.

Table 2 describes the correlation between the three categories of delivery room management (none, basic and advanced) and Apgar score. In the smallest GA groups (22–24 weeks), we observed very weak (<0.20) correlations between Apgar scores at both 1 and 5 min and delivery room care. The correlation coefficient between Apgar score and delivery room care increased with increasing GA, from 0.13 at 22–24 weeks to 0.50 at >30 weeks at 1 min and from 0.09 to 0.30 at 5 min. There were strong correlations (0.60) between the two Apgar scores in the most immature GA groups, decreasing progressively to a moderate correlation of 0.44 in the GA group ≥30 weeks.

Table 2 Delivery room care (by category) by Apgar 1- and 5-min scores (by category) and gestational age (in completed weeks and by category), expressed in percentages and Spearman’s rank correlation coefficients.

For babies <25 weeks GA, 95% received advanced delivery room care if the 1 min Apgar score was between 0 and 3, 92% when the 1 min Apgar was 4–6 and 82% when the 1-min Apgar was 7–10. For GAs between 25–27 and 28–31 weeks, these percentages were 90%, 80% and 56% and 69%, 40% and 14%, respectively.

Table 3 shows a significant (p < 0.001) discrepancy between the observed values and the expected values obtained from an independence model for both antenatal steroids and 5-min Apgar score. There was a significant improvement in Apgar score between 1 and 5 min (Bowker’s test for symmetry: p < 0.0001). Overall, 59.6% of the 36,877 newborns had the same Apgar score at 1 and 5 min, 39.3% had an improved score and 1.1% a deteriorated score (Table 2). Of the 36,877 babies, 6251 (17.0%) had a 1-min Apgar score between 0 and 3. Of these, by 5 min, 19.8% had a similar Apgar score, 47.7% had a score of 4–6 and 32.5% a score of 7–10. Of the newborns with a 1-min Apgar score of 4–6, by 5 min 80.9% had an improved and 0.6% a deteriorated score. Of those with a 1-min Apgar score of 7–10, 1.9% had a worse score at 5 min.

Table 3 Apgar score at 1 vs 5 min and administration of antenatal steroids.

Increasing compliance with administration of antenatal steroids was associated with higher Apgar scores (p < 0.001).

The logistic regression modelling showed increased neonatal mortality with advanced delivery room resuscitation, Apgar scores of 0–3 and 4–6, no improvement of very low Apgar scores, decreasing GA, ESGA and SGA status, male sex, multiple births and no antenatal steroids (Table 4). Babies with an incomplete course of antenatal steroids also fared worse than those with a complete course. The fit of the models was very good; all had a c-statistic between 0.84 and 0.86. According to the BIC, the ranking of the models was as follows: (1) delivery room care, (2) Apgar score at 5 min, (3) Apgar score at 1 min, and (4) evolution of the Apgar score.

Table 4 Neonatal mortality and bronchopulmonary dysplasia, expressed as odds ratios (OR) and 95% confidence intervals (95% CI), as a function of delivery room care, 1-min Apgar score, 5-min Apgar score or evolution of the Apgar score between 1 and 5 min, adjusted for gestational age, birth weight, antenatal steroids, multiple birth, sex and surfactant administration (BPD only).

In the BPD logistic regression modelling, we observed an adverse association with advanced delivery room resuscitation, Apgar scores of 0–3 and 4–6, decreasing GA, ESGA and SGA status, male sex, multiple births, surfactant administration and a complete antenatal steroid course (Table 4). The fit of the models was very good; all had a c-statistic between 0.84 and 0.85. According to the BIC, the ranking of the models was as follows: (1) Apgar score at 5 min, (2) delivery room care, (3) Apgar score at 1 min, and (4) evolution of the Apgar score.

The association between missing 1- and 5-min Apgar scores and neonatal mortality was very similar to that of the non-missing scores in both bivariate and multivariable analyses. Similarly, the effect of missing antenatal steroids was close to the ‘incomplete antenatal steroid course’ category. In the bivariate analysis, infants with a missing 1-min Apgar score had a 3.5% lower risk of BPD than the lowest 1-min Apgar score category, whereas in the multivariable analysis BPD risk was 7% higher. In the same analyses, with ‘missing antenatal steroids’ compared to non-missing antenatal steroids, we observed about 5% higher risk in the former in both the bivariate and multivariable analyses.

In Table 5, we display the number of years needed for each country or network to achieve 80% power to identify a statistically significant change in neonatal mortality and BPD in 23, 24 and 25 weeks’ GA infants. The data are presented for countries with complete data, using their ‘rates’ as present in the database and assuming these countries have similar therapeutic approaches and outcomes in terms of neonatal mortality and BPD.

Table 5 Number of years needed by a country, given the average yearly number of admissions of babies born at 23, 24 and 25 weeks’ gestation and all three gestational age bands combined, to achieve 80% power to detect a statistically significant change in neonatal mortality and BPD rates in response to a hypothetical intervention, aiming at a 5% change in these rates.

Discussion

In this first analysis of the eNewborn database, we provide baseline figures for mortality and BPD for almost 40,000 very preterm/very low birth weight babies across Europe. We show that in-hospital mortality is consistent with rates reported elsewhere,11,12 with the 50% GA-specific mortality rate situated at 23–24 weeks, also in keeping with other studies.13 However, the Epice Research Group study reported an in-hospital mortality of 13.6%, compared with 7.6% in our cohort. This might be partly due to the inclusion of all babies <32 weeks or <1500 g in eNewborn, with more mature, growth-restricted babies increasing the survival rate.

We showed a significant increase in Apgar scores between 1 and 5 min and an association between increasing compliance with antenatal steroid administration and increased 1-min Apgar score.14 We found that, of babies with a 1-min Apgar score between 0 and 3, almost 20% had no improvement by 5 min. We further observed (1) a higher mortality for Apgar scores of 0–3 and 4–6 at 1 min and for Apgar scores of 0–3 at 5 min and (2) lower mortality with improved Apgar scores at 5 min, indicating the importance of recovery in the first few minutes of life.

We also found that, for babies with a 1-min Apgar score of 7–10, indicating good condition at birth, advanced delivery room care was reported for >80% of babies at <25 weeks’ GA and over half at 25–27 weeks but was associated with high mortality and BPD. We identified, as anticipated, a predominant association between decreasing GA and adverse outcomes and significant associations with low Apgar scores, growth restriction, male sex and multiple birth. The odds of death were reduced across all GAs in infants who received either a complete or an incomplete course of antenatal steroids. However, improved survival was accompanied by a higher risk of BPD.

The strengths of our study are the large number of infants, with <10% missing values,15 and the number of countries contributing complete or near complete neonatal unit admission data (Appendix 1). Europe has small and large countries and therefore their individual data contributions are inevitably non-homogeneous. Differences in morbidity and mortality have been widely described between neonatal units and hospitals16,17,18,19,20 but less often between and within countries. We included several adjustments (GA, Apgar scores, AGA, sex, multiple birth, antenatal steroids) that impact on mortality in our logistic regressions. The c-statistics, reflecting the fit of the models, are very good (all between 0.84 and 0.86) and therefore not including country as a covariate in the analysis seems not to have been a major drawback. The selection of ‘Model Delivery Room Care’ as the best performing model in the context of neonatal mortality is due to the higher number of observations, its log-likelihood of −4993.7 being smaller than the −4297.7 of ‘Model 5 min Apgar score’. In the context of BPD, ‘Model 5 min Apgar score’ performed best (see Appendix 3: model fit and model selection). Although ‘Evolution Apgar score’ is an appealing model, it remains a second-choice model due to a lesser likelihood and an increased number of parameters (combinations).

Furthermore, we limited ourselves in this exploratory phase to main effects models (models without interactions) as these are easier to understand than models with interactions. The fit of our models was such that the effect of the interactions was minimal. However, if we want to assess prediction, we must take interactions into account; for example, Fig. 2 clearly shows that there is an interaction between GA and birth weight category on neonatal mortality. Identifying optimal weightings and validating them needs sophisticated analyses and large data sets, hence will be possible in the future when the network has grown.

We also show that bias due to missing data was marginal for BPD and absent for Apgar scores. Study weaknesses are that we had no means to quality assure or cross-validate data submitted directly by individual neonatal units, that population coverage was incomplete and that definitions of key outcome measures, such as BPD, were inconsistent.21 Further, we have no information on socioeconomic status and other maternal details, nor on risk variables such as the Clinical Risk Index of Babies score.

Despite limitations, we feel some broad observations are possible. As anticipated, the most immature babies had lower Apgar scores than babies of greater maturity. Reassuringly, we identified a positive correlation between antenatal steroid exposure and 1-min Apgar score. In the most immature groups, we identified a poor correlation between both 1- and 5-min Apgar scores and delivery room care. This suggests that, across the countries represented, neonatal teams may not necessarily be intervening with advanced resuscitation attempts in the most immature infants, reflecting current thinking in relation to the care of infants at the margins of viability. The high reported rate of advanced delivery room care in babies in good condition at birth, particularly in those of greater immaturity, suggests persisting resistance to adopting a gentler, less intrusive approach to immediate newborn care, with intervention only when clearly warranted.22 More research is needed to define best strategies and types of intervention for better outcomes, and this can only be achieved practically through international collaboration. In Sweden, wide regional variations are observed in mortality and in management. Proactive care did not increase the risk of neurodevelopmental impairment at 2.5 years.23 However, the definition of proactive care can differ between settings.

There is strong evidence that antenatal steroids reduce the risk of BPD, and our finding of greater odds of BPD following a complete course of antenatal steroids is likely to be a consequence of survival bias. However, the evidence base for antenatal steroids in women at risk of preterm birth has changed following a trial in low- and middle-income settings that identified harms, including greater risk of neonatal sepsis, itself a risk factor for death and BPD.24 Further, a reappraisal of the Cochrane review evidence has indicated that generalisability across all settings should not be assumed. Though the countries contributing to eNewborn are all high income, they nonetheless are likely to encompass considerable heterogeneity in respect of patient characteristics, as well as in care practices.25,26 In addition, many practices have changed since the randomised trials of antenatal steroids were performed, and it is not inconceivable that the balance of benefits to risks may also have altered. Current recommendations for delivery room interventions have changed over time to encourage less invasive techniques.27 Large population data sets such as eNewborn offer potential to re-evaluate treatments over time.

The eNewborn database is in its infancy and has as yet limited coverage. The Vermont Oxford Network started in 1988 and is far larger with a wide range of outputs but also suffers from incomplete population coverage.28 The use of the eNewborn database has thus far been limited to benchmarking. However, a unique aspect is the interactive navigation that has been described elsewhere.4 Data are provided on a collaborative basis and no financial contribution is required by participants. The sustainability of the network is based on external funding from grants, donations and commissions.

The eNewborn platform benefits from flexible data capture and can accommodate direct online recording as well as receive extracts from established databases with quality-assured data, such as the UK National Neonatal Research Database.4,29 Clinicians are rightly and increasingly required to audit care practices against accepted standards, conduct comparative evaluation of outcomes nationally and internationally and undertake quality improvement programmes. The UK Royal College of Paediatrics and Child Health has run a National Neonatal Audit Programme since 2007, and because data for the audit are obtained from the National Neonatal Research Database, clinical teams are not faced with the burdens of added data collection.29 More widespread country-wide participation could drive rigorous pan-European audit especially if overseen by a cross-national authority such as the newly established European Board of Neonatology as part of the European Society of Paediatric Research (https://www.espr.eu/). The introduction of neonatal care standards by the European Foundation for the Care of the Newborn Infant (https://www.efcni.org/) also now provides a potential focus for audit (https://newborn-health-standards.org/). International collaboration can also drive improvements in care by, for example, identifying variation, providing benchmarks and identifying potential areas for improvement such as delivery room resuscitation, as indicated by our findings.

Many newborn interventions have an inadequate evidence base, providing a clear impetus for comparative effectiveness studies. Most neonatal medications continue to be used off-license or off-label, because they have not been evaluated in relevant patient groups. These chronic issues call for sustained emphasis on clinical and translational research. However, we show that it would take a country such as Belgium 9 years and 17 years, respectively, to have a sufficient number of eligible babies to achieve 80% power to detect a statistically significant change in neonatal mortality and BPD following a hypothetical intervention in babies of 23–25 weeks’ GA, in contrast to 1 year for the eNewborn network as a whole. International collaboration offers the only real hope for conducting studies aiming to identify impacts of interventions that are unlikely to have arisen by chance.

In summary, we provide illustrative data describing potential uses of the eNewborn database. Neonatal critical care is a high-cost service that is justified both by the moral imperative to provide quality care for sick newborn infants and because of its life-long impact. The eNewborn platform offers opportunity for high-quality data capture across neonatal services in Europe and the means to drive audit, quality improvement and research.