Introduction

Apnoea of prematurity (AOP) is one of the most prevalent diagnoses in neonatal intensive care units, occurring in nearly all infants born before gestational week 28, in 85% of infants born before gestational week 30 and in 20% of infants born before gestational week 34.1 AOP derives from an immature respiratory control centre in the preterm infant and can lead to desaturation and reflexive bradycardia, which might result in the need for respiratory support with mechanical ventilation or non-invasive ventilation, such as continuous positive airway pressure or intermittent positive pressure ventilation.2 The hypoxia following apnoeic spells may damage the developing organs and might constitute a risk factor for several long-term morbidities in preterm infants, such as neurodevelopmental impairment with lifelong disability.2

Caffeine is indicated for treatment of AOP,3 prophylaxis for infants at risk for apnoea,4 and to prevent extubation failure in mechanically ventilated infants.5 It has been used to treat respiratory failure in preterm infants since the 1970s, when it was first recognised to have positive effects on respiration,6 and is the most commonly prescribed medication and respiratory stimulant in neonatal intensive care units.7,8 A large, randomised controlled trial (RCT) comparing caffeine with placebo in premature babies with birth weight <1250 g showed that caffeine administration reduced apnoea as well as bronchopulmonary dysplasia (BPD), cerebral palsy and cognitive delay at 18–21 months of age.9,10 Despite possible dose-related side effects of caffeine including tachycardia, tachypnoea and irritability, no serious short- or long-term harms of caffeine therapy were found in the trial.11 Caffeine is today considered to be one of the safest drugs in preterm infants, and it is likely that this has further increased its usage.11

The age at caffeine initiation decreased from a mean of 10 days after birth (median 4 days) in 1997 to a mean of 4 days after birth (median 1 day) in 2010.12 In 2000, 21% of infants treated with caffeine received therapy already on the day of birth, while in 2014, this had increased to 67%.13 The prophylactic use of caffeine, i.e. treatment to prevent AOP before it has occurred, has also increased.14

The rationale for an earlier initiation of caffeine is to prevent the need for mechanical ventilation and reduce the time on mechanical ventilation, by maintaining spontaneous ventilation through boosting the infant’s respiratory drive.15 This is assumed to also have positive long-term effects since mechanical ventilation and protracted oxygen therapy constitute risk factors for both BPD and poor neurodevelopmental outcomes.16,17

Pharmacodynamically, caffeine acts as a selective adenosine antagonist at the A2a receptors and a non-selective adenosine antagonist at A1 receptor.1,11 The effect of caffeine on respiration is likely a result of stimulation of the respiratory centre in the medulla and of the diaphragm leading to improved pulmonary compliance.11 Caffeine may have several other effects, like anti-inflammatory properties and increased diuresis facilitating breathing by removing excessive fluid in the lungs.11

In contrast to the several studies suggesting that caffeine improves outcomes with no serious side effects, the effect of early and high-dose caffeine treatment was recently studied in an RCT that showed a higher incidence of cerebellar haemorrhage in the group who received early high-dose caffeine (80 mg/kg over the first 36 h of age) compared to the group who received early low-dose caffeine.18

In summary, there is strong evidence that caffeine for preterm babies is beneficial for several important outcomes; however, the optimal timing of caffeine administration has not been systematically appraised and remains uncertain. Earlier caffeine administration might reduce respiratory morbidity by preventing apnoeic spells and lung inflammation; however, its effects in the first hours of life might increase the risk of overtreatment, including harms such as intracranial bleeding. This systematic review aims to assess whether early administration of caffeine is more effective and safer than late administration for reducing morbidity and mortality in preterm infants.

Methods

This systematic review was performed in accordance with the standard methods used by the Cochrane Neonatal Review Group.19 The method was published in a preregistered protocol before performing the search and the data collection.20 We included RCTs, including cluster RCTs, as well as prospective and retrospective cohort studies. We excluded studies with any other study design. We defined data comparing early versus late administration of caffeine within the intervention arm of RCTs studying caffeine versus placebo as non-randomised, i.e. as an observational study. We included studies on preterm infants born before gestational week 34 and admitted to neonatal intensive care units. The threshold between early and late administration was set to 24 h after birth for our primary analysis (pre-specified in the protocol).20 This was an arbitrarily chosen cut-off. However, we included studies using other definitions for early and late caffeine administration as well.

Primary outcomes were: (1) all-cause neonatal mortality (within 28 days of life); (2) respiratory function at school age calculated as forced expiratory volume in the first second (FEV1); (3) major neurodevelopmental disability (composite outcome defined as cerebral palsy, developmental delay (Bayley Mental Developmental Index21,22 or Griffiths Mental Development Scale23 assessment >2 standard deviations (SDs) below the mean), intellectual impairment (intelligence quotient (IQ) >2 SDs below the mean), blindness (vision <6/60 in both eyes) or sensorineural deafness requiring amplification).24

Secondary outcomes were: (1) mortality prior to first hospital discharge; (2) BPD defined as respiratory support or oxygen, or both, at 28 days of life;25 respiratory support or oxygen, or both, at 36 weeks’ postmenstrual age;26 or physiological definition;27 (3) intraventricular haemorrhage (IVH) on brain ultrasound in the first month of life: any grade; severe (grade 3–4);28 (4) cerebellar haemorrhage on brain ultrasound in the first month of life;29 (5) cystic periventricular leukomalacia on brain ultrasound in the first month of life; (6) necrotising enterocolitis (defined as Bell’s Stage II or greater);30 (7) retinopathy of prematurity (all stages and severe (stage 3 or greater));31 (8) surgery for patent ductus arteriosus; (9) duration of mechanical ventilation (days); (10) duration of hospital stay (days); and (11) brain magnetic resonance imaging abnormalities at term equivalent age, defined as white matter lesions (i.e. cavitations32 and punctate lesions,33 GM-IVH34 or cerebellar haemorrhage35).

We included relevant RCTs and cohort studies regardless of language, publication date or publication status (published, unpublished, in press or in progress). An extensive, systematic literature search was performed in the following databases: PubMed, Embase, The Cochrane Library, and CINAHL Complete (Appendix 1). We searched for ongoing clinical trials in The ISRCTN registry, The World Health Organisation International Clinical Trials Registry Platform and The US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov. Two authors (S.N.V., C.N.) independently screened all titles and abstracts obtained from the searches for potential relevance, followed by assessment of the full text of the potentially eligible studies. We assessed the studies in accordance with our inclusion criteria: study design, type of participants, and interventions. We used Covidence36 for title and abstract screening, full text screening, providing reasons for exclusion of irrelevant articles, and generating a PRISMA flow chart. Two authors (S.N.V., C.N.) independently performed data extraction and assessment of risk of bias. For the assessment of risk of bias in the included RCTs, we used Cochrane’s “Risk of Bias” 2.0 tool37 to formally assess the following domains: bias arising from the randomisation process, bias due to deviations from intended interventions, bias due to missing outcome data, bias in measurement of the outcome, bias in selection of the reported results, and overall risk of bias. For the included observational studies, we used the “Risk Of Bias In Non-randomised Studies - of Interventions” (ROBINS-I) tool38 to formally assess the risk of bias in the following domains: bias due to confounding, bias in selection of participants into the study, bias in classification of interventions, bias due to deviations from intended interventions, bias due to missing data, bias in measurement of outcomes, bias in the reported results, and the overall risk of bias.
In the confounding domain, we took into account the following confounding factors: antenatal steroids, gestational age, birth weight, sex, Apgar score, indication to start caffeine, and level of respiratory support at study entry. We resolved through discussion any disagreements regarding the selection of studies, data extraction and the risk of bias assessment. When necessary, a third review author was consulted (M.J., M.B.).

Results

The search generated 2051 references. After de-duplication, 2013 references remained for title and abstract screening, of which 1864 references were deemed irrelevant. We assessed the remaining 125 publications in full text of which 16 studies met the inclusion criteria. Two of the included studies were RCTs,39,40 two were prospective cohort studies41,42 and 12 were retrospective cohort studies12,13,14,43,44,45,46,47,48,49,50,51 (Table 1). Figure 1 presents the PRISMA flow chart. The search in registries for clinical trials identified two ongoing trials comparing early versus late caffeine.52,53 We contacted the principal investigators, but no data had been published. In addition, 21 studies awaiting classification (available as abstracts only) were identified.

Table 1 Study characteristics.
Fig. 1 PRISMA flow chart.

We assessed only one trial to have a low risk of bias (Amaro et al.39). We assessed all other studies to have a high or serious/critical risk of bias (Tables 2 and 3), which means that the results from these studies should not be relied upon, since they may be inaccurate. The major problems in most of the cohort studies were a “risk of bias due to confounding” and a “risk of bias due to selection of participants into the study” (see “Discussion”).

Table 2 Risk of Bias assessment with the RoB 2.0 tool for the included RCTs.
Table 3 Risk of Bias assessment with ROBINS-I tool for the included cohort studies.

Owing to the large clinical heterogeneity (i.e. that studies differed in inclusion criteria, time and dose of caffeine, outcome definition, indication for treatment and respiratory status at study start) and because all studies except one had a high or serious/critical risk of bias, we did not find it meaningful to pool the results in a meta-analysis, since the result from such a meta-analysis may be seriously misleading. Instead, we have narratively summarised the results from the studies. We used a p value <0.05 as cut-off for statistical significance, corresponding to a 95% confidence interval (CI) that does not cross the line of no effect. Study characteristics and outcome data from the included studies are presented in Tables 1 and 4, respectively. Some studies reported only p values and no point estimates or CIs for some or all outcomes. For those outcomes, we have estimated risk ratios (RRs) and CIs based on the data reported in the studies and all such estimates are marked with an asterisk (*).
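The estimation of RRs and CIs from reported event counts can be sketched as follows. This is a minimal illustration, assuming a Wald-type interval computed on the log scale; the method actually used for the asterisked estimates is not stated here, and the function names and event counts below are hypothetical and not taken from any included study.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(events_a, n_a, events_b, n_b, level=0.95):
    """Risk ratio of group A versus group B with a Wald CI on the log scale.

    Assumption: this is one common textbook method, not necessarily the one
    used in the review. It requires at least one event in each group.
    """
    if min(events_a, events_b) <= 0 or events_a > n_a or events_b > n_b:
        raise ValueError("need 0 < events <= n in both groups")
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial samples.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def crosses_no_effect(lower, upper):
    """True if the CI includes RR = 1, the line of no effect."""
    return lower <= 1.0 <= upper


# Hypothetical counts: 4/20 events with early and 10/20 with late caffeine.
rr, lo, hi = risk_ratio_ci(4, 20, 10, 20)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # RR 0.40 (95% CI 0.15-1.07)
print("statistically significant:", not crosses_no_effect(lo, hi))  # False
```

In this hypothetical example the point estimate suggests a lower risk, but the interval includes 1, so the difference would not be counted as statistically significant under the cut-off described above.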

Table 4 Outcome data.

Since Amaro et al.39 was the only included study assessed to have a low risk of bias, we will describe this study in more detail. The aim of the trial was to assess the effect of early caffeine on the age of first successful extubation in preterm infants. Preterm infants born at 23–30 weeks of gestation requiring mechanical ventilation in the first 5 postnatal days were randomised to receive either a 20 mg/kg loading dose followed by 5 mg/kg/day of caffeine (n = 41) or placebo (n = 42) until considered ready for extubation. The placebo group received a blinded loading dose of caffeine before extubation. The trial was stopped early because an interim analysis at 75% enrolment showed a trend towards higher mortality in one of the groups (the early caffeine group). Unblinded analysis revealed that there was no statistically significant difference in mortality between the early caffeine (22%) and control groups (12%; p = 0.22). The study did not find any statistically significant differences between the groups in any of the outcomes that they reported and that we had prespecified in this systematic review (Table 4).

In Saeidi and Maghrebi,40 40 very preterm infants (mean birth weight 1123 g) were randomised to receive caffeine either before or after 72 h of life, at the same dose as in Amaro’s study (20 mg/kg loading dose followed by 5 mg/kg/day). No statistically significant differences were found between early and late caffeine groups for mortality prior to hospital discharge (RR (95% CI) 0.75 (0.34–1.66)*) and any grade of IVH (RR (95% CI) 0.59 (0.24–1.42)*). The RR for BPD (defined as respiratory support or oxygen, or both, at 28 days of life) was 0.37 (95% CI: 0.14–0.98)*.

We used the “Grading of Recommendations Assessment, Development and Evaluation” (GRADE) approach to assess the quality of the evidence. None of the trials reported the primary outcomes of this review, i.e. all-cause neonatal mortality, respiratory function at school age or major neurodevelopmental disability. Quality of the evidence for the outcome mortality before discharge (reported in both trials) was very low due to risk of bias and imprecision (downgraded by two levels: two small trials; few events).

Outcomes reported in the 14 cohort studies are shown in Table 4.

Discussion

In this systematic review of the benefits and harms of early compared to late administration of caffeine to preterm infants, 2 RCTs and 14 cohort studies investigating the review question were found. However, it is not possible to draw firm conclusions based on the available evidence. This is because few studies evaluated long-term or patient-relevant outcomes, and all studies except one had a serious or critical risk of bias, which means that the results can be seriously misleading. The RCT with a low risk of bias had a small sample size and no long-term follow-up.

For the cohort studies, the serious or critical risk of bias does not necessarily mean that they are of low quality. Rather, there are inherent methodological problems related to investigating the effect of early versus late caffeine in non-randomised studies. All cohort studies had problems in the risk-of-bias domains “bias due to confounding” and “bias due to selection of participants into the study”. Bias due to confounding arises because infants who receive caffeine early differ in important prognostic factors compared to infants who receive caffeine late. For example, when caffeine is given to facilitate extubation, infants who receive caffeine early are ready to be extubated earlier (and thus likely healthier) than infants who receive caffeine late. Some studies did not adjust for any confounders, while most studies adjusted for factors like gestational age and birth weight. However, few studies adjusted for respiratory status of the infant and none adjusted for clinical indication for the administration of caffeine (for example, extubation, apnoea or respiratory failure). We therefore judged all studies to be of serious or critical risk of bias due to residual confounding.

Bias due to “selection of participants into the study” emerges when start of follow-up, specification of eligibility and treatment assignment do not coincide. This bias is sometimes referred to as “immortal time bias”.54 A classic example of immortal time bias is hormone replacement therapy for women during menopause. Several observational studies showed a clinically and statistically significantly lower risk of cardiovascular disease for women using hormone replacement therapy and a dose–response effect was observed so that the longer the use of intervention, the greater the reduction in cardiovascular risk. However, later RCTs comparing hormone replacement therapy with placebo revealed that the intervention in fact increased the risk of cardiovascular disease substantially. The reason behind the contradicting results between observational studies and the later RCTs was that the results from the observational studies were flawed owing to residual confounding and immortal time bias.55 Studies of early versus late caffeine administration in preterm infants have slightly different considerations for immortal time bias than hormone replacement therapy in menopausal women; however, the underlying principles are the same. If start of follow-up, specification of eligibility and treatment assignment do not coincide, part of the follow-up will include time when participants cannot have died, since they would not have been included in the respective group if they had died during that part of follow-up.54 In the case of this review, infants must have survived to the (later) time of caffeine administration to be included in the late group, and the infants who died before that time may have been included in the early group but could not have been included in the late group. 
Of note, some of the cohort studies included in this review tried to reduce immortal time bias by excluding from the analysis all infants who died before the cut-off for early versus late caffeine administration. For example, in one study early caffeine was defined as before 3 days of life and late caffeine as after 3 days of life, and all infants who died before 3 days of life were excluded from the analysis.12 In this way, immortal time bias before 3 days of life is diminished. However, the mean time of caffeine administration in the late group was day 11 of life, which means that immortal time bias still exists after 3 days of life. Further, if early caffeine affects mortality in either direction, this approach introduces a new kind of bias since participants may be excluded from the analysis due to outcomes of the intervention. We doubt that it is possible to avoid immortal time bias in observational studies comparing early versus late caffeine administration.
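The mechanism of immortal time bias described above can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are ours and not drawn from any included study: a constant daily hazard of death of 1% that is identical in both groups (so caffeine timing has no true effect), early caffeine given on day 0, late caffeine given on day 10, and infants who die before their planned late dose never entering the late group.

```python
import random


def simulate_mortality(n=100_000, daily_hazard=0.01, late_start_day=10,
                       horizon=28, seed=1):
    """Observed 28-day mortality by group when timing has NO true effect.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    counts = {"early": [0, 0], "late": [0, 0]}  # [deaths, infants analysed]
    for _ in range(n):
        # Day of death under a constant daily hazard; None if alive at horizon.
        death_day = next(
            (day for day in range(horizon) if rng.random() < daily_hazard),
            None,
        )
        # Planned timing is assigned independently of prognosis.
        group = rng.choice(["early", "late"])
        if (group == "late" and death_day is not None
                and death_day < late_start_day):
            # Died before caffeine could be given: never counted as "late".
            continue
        counts[group][1] += 1
        if death_day is not None:
            counts[group][0] += 1
    return {g: deaths / total for g, (deaths, total) in counts.items()}


mortality = simulate_mortality()
print({group: round(risk, 3) for group, risk in mortality.items()})
```

Under these assumptions the expected 28-day mortality is 1 - 0.99^28, about 24.5%, in the early group, but only 1 - 0.99^18, about 16.5%, in the late group, because the first 10 days of follow-up are "immortal" for every infant analysed as late. The apparent difference arises entirely from how the groups are formed.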

Apart from the high risk of bias in the included studies, we also found that few studies reported on the outcomes that are likely most relevant to patients and caregivers. For example, BPD is a medical measurement with likely limited direct relevance to affected individuals and their families, i.e. it is a so-called “surrogate outcome”. The rationale to use surrogate outcomes instead of outcomes directly relevant to patients is that they often require shorter trials with fewer participants to detect an effect.56 Surrogate outcomes are often assumed to transfer into patient relevant outcomes with further follow-up. However, throughout medical history, it has been shown that this assumption is often exaggerated or even false, so that an early apparently positive effect from an intervention on a surrogate outcome may have no or even harmful effect on the later patient relevant outcome.56 When we chose primary outcomes for this review, we deliberately focused on the outcomes we believe would matter most to patients and caregivers. Similarly, the choice of which outcomes to prioritise in future studies should be informed by patients and their families.

The results from this review highlight an important gap in the evidence base and call for additional research in this field. This is especially important since time trends show a rapid increase in the use of early caffeine administration to preterm infants during the past decades.12,13 The reasons for this trend are likely to be multifactorial and in line with a generally more aggressive approach to medical interventions in neonatal intensive care57 as well as in medicine at large.58 In addition, the price of caffeine increased by a factor of about 10 after the registration of the drug by a pharmaceutical company around 2010.59 The need for high-quality evidence in this area is further emphasised by a recent study showing an increased risk for cerebellar haemorrhage in the group who received early and high-dose (80 mg/kg over the first 36 h of age) caffeine compared to the group who received caffeine at a lower dose.18

The conclusion of our systematic review contradicts previous systematic reviews on the same topic.60,61 The systematic review from Pakvasa et al.60 published in 2018 concluded that “optimal timing of caffeine initiation suggests a significant treatment benefit of earlier initiation of caffeine, compared with later initiation of caffeine, in decreasing the risk of BPD”. The authors state that the “over-all quality of the evidence is low” but judged the risk of bias in the observational studies of BPD as “not serious”. They did not seem to use a standardised tool to measure the risk of bias in the included studies. They refer to the GRADE assessment, which should be used after the risk of bias assessment for the individual studies to judge the overall certainty of the evidence, with consideration of other factors, such as inconsistency, indirectness and imprecision. Further, the subgroup analysis from an RCT46 was considered in their meta-analysis as an RCT. However, there was no randomisation with regard to receiving early versus late caffeine in this trial, and therefore these data should be evaluated as observational. They also included other meta-analyses in their data summary, although some of the studies included in these meta-analyses were already included in their systematic review as individual studies. Similarly, we disagree with the analysis conducted in the systematic review from Kua and Lee.61 They judged most of the included studies as of high quality and with low risk of bias, concluding that early caffeine “may help decrease the burden of morbidities in preterm infants”.

This systematic review has several strengths. A comprehensive search with no language or date restrictions was conducted, and we succeeded in translating one eligible study from Mandarin to English.51 Further, two authors independently screened studies for eligibility, extracted data and performed a thorough assessment of risk of bias of the included studies. Potential limitations include our broad approach, i.e. that we, for example, included all studies regardless of clinical indication, which may have contributed to the large clinical heterogeneity of the included studies. Further, our choice of the definition of early caffeine (i.e. within 24 h) and outcomes could be discussed. We chose neonatal mortality as a primary outcome while most of the included studies that reported on mortality presented “mortality before discharge”, which may be an outcome equally relevant to patients as neonatal mortality. We did, however, include “mortality before discharge” as a secondary outcome.

Conclusion

No firm conclusions about the effectiveness or safety of early compared to late caffeine can be drawn based on the available evidence. Non-randomised studies comparing early versus late caffeine in preterm infants have inherent methodological problems that are difficult to overcome. It is unlikely that observational studies will contribute further relevant knowledge in this area. Therefore, RCTs are needed to answer the question of optimal timing for caffeine administration in neonatal care. Future trials should focus on outcomes relevant to patients and their families and include long-term follow-up.