Caffeine for apnea and prevention of neurodevelopmental impairment in preterm infants: systematic review and meta-analysis

This systematic review and meta-analysis evaluated the evidence for dose and effectiveness of caffeine in preterm infants. MEDLINE, EMBASE, CINAHL Plus, CENTRAL, and trial databases were searched to July 2022 for trials randomizing preterm infants to caffeine vs. placebo/no treatment, or low (≤10 mg·kg−1) vs. high dose (>10 mg·kg−1 caffeine citrate equivalent). Two researchers extracted data and assessed risk of bias using RoB; GRADE evaluation was completed by all authors. Meta-analysis of 15 studies (3530 infants) was performed in RevMan across four epochs: neonatal/infant (birth–1 year), early childhood (1–5 years), middle childhood (6–11 years) and adolescence (12–19 years). Caffeine reduced apnea (RR 0.59; 95% CI 0.46, 0.75; very low certainty) and bronchopulmonary dysplasia (RR 0.77; 95% CI 0.69, 0.86; moderate certainty), with higher doses more effective. Caffeine had no effect on neurocognitive impairment in early childhood but possible benefit on motor function in middle childhood (RR 0.72; 95% CI 0.57, 0.91; moderate certainty). The optimal dose remains unknown; further long-term studies are needed.


INTRODUCTION
Infants born preterm are physiologically and metabolically immature and have higher rates of morbidity and mortality, and poorer long-term neurodevelopmental outcomes, than those born at term [1]. Amongst other issues, they are at risk of apnea of prematurity [2] and intermittent hypoxemia [3], which result in a decrease in oxygen saturation and bradycardia and have been associated with increased risk of neurodevelopmental impairment [4,5]. Rates of apnea are correlated with the degree of prematurity, occurring most frequently in extremely preterm infants, though late preterm infants are also affected [2]. Late preterm infants also experience frequent episodes of intermittent hypoxemia [3] and poorer neurodevelopmental outcomes than term-born infants [6].
Methylxanthines are respiratory stimulants that have been used in preterm neonates for decades to both prevent and treat apnea of prematurity and to facilitate extubation [7]. Caffeine is a naturally occurring methylxanthine used extensively worldwide for hundreds of years for its central nervous system stimulant properties [7]. Caffeine and other methylxanthines, such as theophylline, have been used in the treatment of apnea in newborn infants since the 1970s [8]. The precise mechanism by which methylxanthines improve respiratory function continues to be debated, but caffeine is known to stimulate the respiratory center in the medulla by antagonizing adenosine A1 and A2A receptors, increasing sensitivity and response to carbon dioxide and PO2 and enhancing diaphragmatic function [9]. Caffeine is now used in preference to other methylxanthines due to its wider therapeutic window and longer duration of action in neonates, which allow for daily dosing and remove the need for therapeutic drug monitoring [10,11].
Despite this longstanding clinical use, there remain several evidence gaps, including indications for treatment, dosing regimen, the most appropriate patient population, and the short- and long-term effects of caffeine therapy [12]. The aim of this systematic review was to assess the effectiveness of caffeine in reducing the rate or occurrence of apnea and reducing long-term neurodevelopmental impairment in preterm infants (<37 weeks' post-menstrual age [PMA]). A secondary aim was to assess whether there is any difference in these outcomes between caffeine given at standard doses (≤10 mg·kg−1 caffeine citrate equivalent) and high doses (>10 mg·kg−1 caffeine citrate equivalent).

METHODS
This systematic review was guided by the Cochrane Handbook for Systematic Reviews of Interventions [13] and is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [14]. Prior to the literature search being conducted, the protocol was registered with the Prospective Register of Systematic Reviews (PROSPERO, CRD42020154678).
We included all randomized controlled trials (RCTs) in preterm infants (<37 weeks' PMA) of caffeine (at any dose and for any reason) vs. placebo or no treatment (comparison one), or high-dose caffeine (>10 mg·kg−1 caffeine citrate equivalent) vs. low-dose caffeine (≤10 mg·kg−1 caffeine citrate equivalent) (comparison two), which reported one or more prespecified outcomes. We included published studies and those published in abstract form if they included sufficient information to confirm eligibility and allow Grading of Recommendations Assessment, Development and Evaluation (GRADE) assessment [15]. We did not include observational or nonrandomized studies. No limit was placed on year of publication, and studies in any language were included and translated if an English abstract was available for the initial screening stage.
We reported outcomes across four developmental epochs: neonatal/infancy (<1 year of age), early childhood (ages 1–5 years), middle childhood (ages 6–11 years) and adolescence (ages 12–19 years). If longitudinal studies reported multiple assessments of an outcome within the epoch, the last reported assessment in each epoch was included in the analysis.
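The epoch rule above can be illustrated with a minimal Python sketch. It is not the authors' data-handling code: the `Assessment` record, the function names and the example ages are hypothetical, and treating each epoch as a half-open interval in years (so that an assessment at 5.5 years counts as early childhood) is an assumption, since the text gives only whole-year ranges.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Epoch boundaries in years, following the ranges given in the text.
# Assumption: each epoch is a half-open interval [lower, upper), so ages
# between the stated whole-year ranges fall into the earlier epoch.
EPOCHS = [
    ("neonatal/infant", 0.0, 1.0),
    ("early childhood", 1.0, 6.0),
    ("middle childhood", 6.0, 12.0),
    ("adolescence", 12.0, 20.0),
]


def epoch_for_age(age_years: float) -> Optional[str]:
    """Return the epoch name for an age in years, or None if out of range."""
    for name, lower, upper in EPOCHS:
        if lower <= age_years < upper:
            return name
    return None


@dataclass
class Assessment:
    """Hypothetical record of one outcome assessment in one study."""
    study: str
    outcome: str
    age_years: float


def last_assessment_per_epoch(
    assessments: Iterable[Assessment],
) -> Dict[Tuple[str, str, str], Assessment]:
    """Keep only the latest assessment per (study, outcome, epoch)."""
    latest: Dict[Tuple[str, str, str], Assessment] = {}
    for a in assessments:
        epoch = epoch_for_age(a.age_years)
        if epoch is None:
            continue  # outside the four prespecified epochs
        key = (a.study, a.outcome, epoch)
        if key not in latest or a.age_years > latest[key].age_years:
            latest[key] = a
    return latest
```

For a trial that assessed the same outcome at 21 months and again at 5 years, both assessments fall in early childhood and only the 5-year assessment would be retained.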
The primary outcome for the neonatal/infant epoch was apnea, defined as a pause in breathing of ≥20 s, or <20 s with bradycardia (heart rate <100 beats per minute [bpm]), cyanosis or pallor [16], or as per author definitions. For all other epochs, the primary outcome was neurocognitive impairment, defined by authors, using standardized tests appropriate for age.
Secondary outcomes for the neonatal/infant epoch included bronchopulmonary dysplasia (BPD), defined as ongoing requirement for oxygen or respiratory support at 36 weeks' PMA; intermittent hypoxemia, expressed as events per hour and defined as a fall in oxygen saturation (SpO2) of 10% or more from baseline, or as defined by authors; retinopathy of prematurity (ROP) Stage III or worse [17]; intraventricular hemorrhage (IVH) grade III or IV [18]; patent ductus arteriosus (PDA), defined as use of medical or surgical treatment for ductal closure; tachycardia, defined as mean heart rate ≥160 bpm or as per authors; duration of mechanical ventilation; duration of positive pressure support; growth velocity, including weight gain (g·kg−1·day−1), linear growth (cm·week−1) and head growth (cm·week−1) to 36 weeks' PMA (or as defined by authors); death; survival without neurosensory impairment (including, but not limited to, deafness, blindness and cerebral palsy); and time to establish full enteral feeds (as defined by authors).
For all other epochs, secondary outcomes included: motor impairment, defined by authors using standardized tests appropriate for age; hearing impairment, defined as requiring one or more hearing aids or worse, or as per authors; visual acuity less than 1 LogMAR, or as per authors; death; survival without neurosensory impairment, including, but not limited to, deafness, blindness, death and cerebral palsy; emotional-behavioral difficulties, as defined by authors; cerebral palsy; chronic lung disease, defined as physician-diagnosed asthma or ≥2 episodes of parent-reported wheeze, or as per authors; and height and weight expressed as Z-scores.

Search strategy
We searched PubMed, MEDLINE, Embase, the Cumulative Index to Nursing and Allied Health Literature (CINAHL Plus) and the Cochrane Central Register of Controlled Trials (CENTRAL) databases from inception to 11 July 2022 using relevant MeSH terms and keywords (caffeine and premature/prematurity/preterm/low birthweight and variations). The search was limited to studies involving humans, with no limit on year of publication or language. No limits on study type were applied at the initial search stage. We also searched the World Health Organization International Clinical Trials Registry Platform (ICTRP) (who.int/ictrp/search/en/), the US National Library of Medicine Clinical Trials Registry (clinicaltrials.gov), and the Australia and New Zealand Clinical Trials Registry (ANZCTR) (anzctr.org.au) for any additional trials meeting the inclusion criteria not located through the above searches. Where results of trials were not available in the public domain, we contacted the authors listed in the trial registration to confirm the status of the trial, and whether any results were available for inclusion. We hand-searched bibliographies of included studies, review papers and conference abstracts to identify any additional studies. Covidence (Covidence Systematic Review Software, Veritas Health Innovation, 2020) was used to manage search results and screen studies for inclusion.

Study selection
Two review authors independently screened all retrieved titles and abstracts to assess eligibility for inclusion. The full text of all potentially relevant studies was retrieved and assessed independently by two authors to determine eligibility. Any disagreements were resolved by mutual discussion and consultation with a third author if required. Summary characteristics of each study were extracted and tabulated.

Data extraction, bias, and quality assessment
Two authors independently extracted data from all included studies using a prespecified data form. Any discrepancies were resolved by mutual discussion and consulting a third author if required. Additional information was sought from study corresponding authors if information was unclear or not published.
Two review authors independently assessed the risk of bias (RoB) of all included trials using the Cochrane RoB tool [19] for the following domains: sequence generation (selection bias); allocation concealment (selection bias); blinding of participants and personnel (performance bias); blinding of outcome assessment (detection bias); incomplete outcome data (attrition bias); selective reporting (reporting bias); any other bias. Any disagreements were resolved by mutual discussion or consulting a third author if necessary. For one study, where EO, JA, and CM were investigators, an alternative independent colleague (AW) with no association to the study conducted the data extraction and RoB assessment in conjunction with SH.
Review Manager (RevMan version 5.4.1, The Cochrane Collaboration, 2020) was used to summarize and analyze the data. Meta-analysis using fixed effects was performed if data from >2 RCTs were available. Apnea was reported using different measures that precluded a single meta-analysis; therefore, apnea was analyzed both as a dichotomous and as a continuous variable. We calculated the risk ratio (RR) for dichotomous outcomes and mean difference (MD) for continuous outcomes, with confidence intervals (CI) of 95%. If data were reported as median and interquartile range, means and standard deviations were estimated [20]. Planned secondary analyses included subgroup analysis by indication for caffeine and gestation length. Statistical heterogeneity was defined as an I² > 50% and a low p-value for the chi-square test, and categorized according to GRADE guidelines [15]. Methodological causes of heterogeneity were explored via subgroup analysis and sensitivity analysis, excluding studies at high risk of bias.
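For readers unfamiliar with the calculations behind a fixed-effect forest plot, the following Python sketch pools risk ratios and computes I². It is an illustration only, not the RevMan implementation: the text does not state which fixed-effect method was selected, so inverse-variance weighting on the log scale is an assumption here, the function names are hypothetical, and trials with zero events in either arm would need a continuity correction that this sketch omits.

```python
import math
from statistics import NormalDist
from typing import Dict, Sequence, Tuple

# One trial: (events_treated, n_treated, events_control, n_control)
Trial = Tuple[int, int, int, int]


def log_rr_and_se(events_t: int, n_t: int, events_c: int, n_c: int) -> Tuple[float, float]:
    """Log risk ratio and its standard error for a single trial.

    Assumes at least one event in each arm (no continuity correction).
    """
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log_rr, se


def pool_fixed_effect(trials: Sequence[Trial], level: float = 0.95) -> Dict[str, float]:
    """Inverse-variance fixed-effect pooled RR with CI, Cochran's Q and I²."""
    effects = [log_rr_and_se(*t) for t in trials]
    weights = [1 / se ** 2 for _, se in effects]
    total_w = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1 / total_w)

    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)

    # Cochran's Q and the I² statistic (percentage, floored at zero).
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, effects))
    df = len(trials) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "rr": math.exp(pooled),
        "ci_low": math.exp(pooled - z * pooled_se),
        "ci_high": math.exp(pooled + z * pooled_se),
        "q": q,
        "i2": i2,
    }
```

Calling `pool_fixed_effect([(10, 100, 20, 100), (12, 110, 25, 105), (8, 90, 15, 95)])` on made-up counts returns the pooled RR, its 95% CI and I², which can then be compared against the I² > 50% threshold described above.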
Outcomes were classified by all authors according to their importance for decision-making using GRADE classifications (7–9 critical, 4–6 important but not critical, 1–3 less important) [15]. Certainty of the evidence was assessed using the GRADE framework [15] and agreed by all authors. Imprecision was assessed using optimal information size (OIS) assuming alpha 0.05 and beta 0.2 [21] and considered serious if the total number of participants was less than the OIS for the outcome, or very serious if total participants numbered less than half the OIS. For continuous outcomes we assumed alpha 0.05 and beta 0.2, and delta 0.33.
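The imprecision rule above can be sketched in Python as follows. This is a hedged illustration rather than the authors' calculation: it assumes that delta 0.33 is a standardized mean difference and that the OIS for a continuous outcome is the conventional two-group sample size, n per group = 2(z₁₋α/₂ + z₁₋β)²/δ²; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def ois_continuous(delta: float = 0.33, alpha: float = 0.05, beta: float = 0.2) -> int:
    """Total optimal information size for a two-group continuous outcome.

    Assumption: delta is a standardized mean difference and the usual
    normal-approximation sample size formula applies.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    per_group = math.ceil(2 * (z_alpha + z_beta) ** 2 / delta ** 2)
    return 2 * per_group


def imprecision_rating(total_participants: int, ois: int) -> str:
    """Apply the rule in the text: below OIS is serious, below half is very serious."""
    if total_participants < ois / 2:
        return "very serious"
    if total_participants < ois:
        return "serious"
    return "not serious"
```

Under these assumptions `ois_continuous()` gives a total of 290 participants, so a continuous outcome informed by 200 infants would be rated as having serious imprecision and one informed by 100 infants as very serious.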
Study characteristics and results were tabulated, and forest plots were generated for all comparisons where data were available.

Literature search and study selection
Our search identified 6509 studies (Fig. 1). Following the removal of 3542 duplicates, 2968 studies were screened and 2801 excluded. The full text of 159 papers was reviewed, resulting in the inclusion of 15 studies in the final review.
Early childhood: For the primary outcome of neurocognitive impairment, evidence of low certainty from one trial could not exclude clinical benefit or harm from receiving caffeine compared to placebo (RR 0.98, 95% CI 0.63, 1.51, 1518 children) (Table 3) [37].
There were no data for the primary outcome of neurocognitive impairment in adolescence.
Secondary analysis: There were insufficient data to undertake the planned subgroup analyses.
Other epochs: No trials of high-dose vs. low-dose caffeine reported on neurocognitive impairment.
Secondary outcomes: Moderate certainty evidence from four trials showed probable benefit for BPD with high-dose vs. low-dose caffeine (RR 0.71, 95% CI 0.55, 0.91, 586 infants, I² = 0%) (Table 4). Evidence of very low certainty from seven trials suggested that high-dose vs. low-dose caffeine may increase the rate of tachycardia (RR 2.29, 95% CI 1.41, 3.72, 839 infants, I² = 0%) (Table 4). The evidence was too uncertain to determine the effect of high-dose vs. low-dose caffeine on other neonatal outcomes (Table 4; Fig. 3). For the critical outcome of survival without neurosensory impairment in early childhood, low certainty evidence from one trial meant that a benefit of high-dose vs. low-dose caffeine could not be excluded (RR 0.92, 95% CI 0.82, 1.03, 236 children) (Table 4).
Secondary analysis: There were insufficient data to undertake the planned subgroup analyses.
a All trials were randomized on a 1:1 basis (1:1:1 and 1:1:1:1:1 for the multi-arm studies Scanlon [28] and Oliphant 2022 respectively).
b Outcomes are for the neonatal/infant epoch unless otherwise stated.
c Schmidt [33] reported outcomes within the early childhood epoch at both 18–21 months and 5 years of age. In accordance with the prespecified protocol, where an outcome was measured at both time points, the latest available data (at 5 years of age) were used in meta-analysis.

DISCUSSION
Currently, there is no high-certainty evidence for the use of caffeine in preterm neonates for any critical or important outcomes from birth to adolescence. However, in very preterm neonates, caffeine therapy probably reduces the rate of BPD and PDA; possibly increases survival without neurosensory impairment in early childhood and reduces cerebral palsy; and probably reduces the rate of neurocognitive impairment and motor impairment in middle childhood. Although caffeine is traditionally given for apnea of prematurity, the evidence supporting this benefit was of very low certainty, given the considerable heterogeneity in contributing studies, the RoB inherent in these studies and the relatively small number of infants for whom data are available.
In general, evidence for the relative effectiveness of high vs. low-dose caffeine is even less certain, but moderate certainty evidence indicates higher doses probably reduce the rate of BPD more than lower ones, and very low certainty evidence suggests higher doses may cause more tachycardia.
Quantifying the effect of caffeine on longer-term outcomes is limited by the available studies, with only two trials presenting any outcome data beyond the neonatal/infancy period (one in each comparison) and only one of those reporting significant follow-up assessments and results. As a result, meta-analysis was not possible in epochs beyond neonatal/infancy, and the certainty of the findings is limited. No information was available comparing the effects of high and low-dose caffeine on neurodevelopmental outcomes.
This review provides a current and comprehensive summary of the available literature on the use of caffeine in preterm infants and included 15 RCTs covering 3530 premature infants. In contrast to previous systematic reviews, we included all studies enrolling preterm infants (<37 weeks' PMA), rather than limiting the population to infants born at earlier gestational ages [39–41]. This was because moderate and late preterm infants may experience apnea of prematurity [2] and are known to have episodes of intermittent hypoxemia [3], and so may also benefit from caffeine therapy, though the evidence in this area remains uncertain. Previous systematic reviews have addressed a single question (either caffeine vs. placebo, or high vs. low-dose regimens), rather than considering both together as in this review, and have often included trials of other methylxanthines, which are no longer routinely used, in addition to caffeine. Furthermore, these older systematic reviews did not apply the explicit and comprehensive GRADE criteria to the assessment of the quality of the evidence, and so have perhaps overstated the certainty of the evidence underlying their recommendations [10,42]. Recently published Cochrane reviews present GRADE analysis for only a subset of outcomes [41,43], whereas in this review GRADE analysis was performed for all outcomes with available data.
The Cochrane Neonatal Group have recently published reviews of caffeine dosing regimens in preterm infants [41] and of methylxanthines vs. placebo/no treatment [43]. However, this latter Cochrane review includes a substantial number of trials that used other methylxanthines (aminophylline and theophylline) no longer routinely used in clinical practice and does not include some of the more recent trials of caffeine [24,32,34] included in this review. Both the Cochrane and other reviews of low-dose vs. high-dose caffeine therapy have concluded that higher doses of caffeine are [44] or may be [40,41] more effective in reducing the occurrence of extubation failure. Analysis of the evidence for the important outcome of BPD has resulted in different conclusions in different reviews: either that higher doses reduce the rate of BPD compared with lower doses [39,41,44] or that higher doses do not alter the rate of BPD [40]. In contrast to previously published reviews [39,41,44], we pre-defined high (>10 mg·kg−1·day−1 caffeine citrate equivalent) and low doses (≤10 mg·kg−1·day−1) of caffeine on the basis of maintenance dose, avoiding cross-over of doses included in the comparison groups and hence producing a more meaningful comparison. This may explain the differences in findings, as some other reviews have included trials where the only difference in dose was in the loading dose [39,40], or where both doses used would be considered low doses in current clinical practice [39]. We also included all trials where infants received caffeine, regardless of indication, as we wished to include caffeine given for the prevention of neurodevelopmental impairment, as well as solely for the prevention or treatment of apnea, or to assist in extubation.
As a systematic review, the robustness of the conclusions is limited by the quality and quantity of the included studies. The caffeine vs. placebo comparison identified and included a number of recent studies that have not previously been included in published meta-analyses [24,30,34], but some of these studies have domains with high risk of bias and there was a high degree of heterogeneity between studies, limiting the quality of the evidence. Furthermore, this comparison is dominated by a single study, which contributed over 2000 infants of the 2592 participants identified [45]. We had planned to undertake subgroup analysis to assess the effectiveness of caffeine based on the indication for use (prophylaxis, treatment of apnea or for extubation, late hypoxemia or established lung disease) and by gestation (extremely, very, moderately or late preterm) but were unable to undertake these analyses due to the lack of data broken down by these variables in the identified studies. The lack of data on the effectiveness of caffeine in these different subgroups remains an important evidence gap, and further research is needed to inform evidence-based decision-making in clinical care.
While caffeine is widely used in neonatal units, the evidence remains uncertain, and other reviews on the topic have called for further clinical trials in this area [40,44,46]. We join previous authors in this call for further research, and this systematic review indicates the evidence gaps where more information is required to guide clinical practice. In particular, there is a lack of data on long-term outcomes following different doses of caffeine in the neonatal period, and longer-term follow-up of infants in recent trials should be conducted to address this evidence gap. This is particularly important given the indications in this and other [39,40,44] meta-analyses that higher doses may be more effective in improving short-term outcomes, as use of higher doses in clinical practice should be preceded by evidence of the long-term safety of such doses. In addition to dose, more information is required on how the indication for treatment, infant gestation, duration of treatment/stopping and timing of initiation and discontinuation influence outcomes. It should also be ascertained whether caffeine should be used during mechanical ventilation, and whether the dose should be decreased with tachycardia or increased with gestational age.

CONCLUSION
Caffeine administered to preterm infants probably reduces BPD, PDA, and motor impairment, with higher doses probably conferring additional benefit in reducing BPD but possibly increasing the occurrence of tachycardia. However, most of the current evidence is of low certainty, and establishing the optimal dose requires more research, including long-term outcome assessment.

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram of study selection.

Fig. 2 Forest plots of the neonatal/infant primary outcome, and critical and selected important secondary outcomes. Panels: Caffeine vs placebo/no treatment; High vs low dose caffeine.
a Apnea results are presented as a dichotomous measure (for caffeine vs placebo comparison) or a continuous measure (for high vs low-dose comparison), based on how apnea was measured in the majority of studies in each comparison. The forest plot for the alternate measure for each comparison is presented in Fig. 3.
b Death before one year of age was also considered a critical outcome, but only 1 study reported this measure (in the low vs. high-dose comparison). These data are included in Fig. 3, with other secondary outcomes.

Table 2. Overall risk of bias of included studies.

Table 3. GRADE summary of findings for caffeine vs placebo comparison.
a For continuous outcomes values represent weighted mean.
b In the neonatal and infant epoch, the critical outcomes of death before one year of age, neurocognitive impairment, survival without neurosensory impairment and cerebral palsy and the important outcome of retinopathy of prematurity were not reported by any included studies.
c Two included studies (Fakoor [24], Iranpour [34]) were judged to have high overall risk of bias; two (Armanian [35], Erenberg [23]) were judged to have some concerns overall and one (Oliphant 2022) was judged to have a low overall risk of bias for this outcome.
d I² = 78%.
e OIS criteria not met (total population less than optimal information size [OIS], resulting in downgrading by one step).
f One included study (Liu [25]) was judged to have high overall risk of bias for this outcome, and the other (Murat 1981) was judged to have some concerns overall for this outcome.
g I² = 97%.
h Two included studies (Fakoor [24] & Iranpour [34]) were judged to have high overall risk of bias for this outcome and one (Armanian [35]) was judged to have some concerns overall for this outcome.
i OIS criteria not met (total population less than half of OIS, resulting in downgrading by two steps).
j Two included studies (Fakoor [24] & Iranpour [34]) were judged to have high overall risk of bias for this outcome; three (Armanian [35], Erenberg [23] & Liu [25]) were judged to have some concerns overall for this outcome; and two (Oliphant 2022 & Schmidt [33]) were judged to have low risk of bias overall for this outcome.
k Patient populations of the two included studies were substantially different: Bucher [22] included infants under 32 weeks' gestation (mean 30.3 weeks) while Oliphant 2022 included infants 34–36 weeks' gestation. Intermittent hypoxemia is known to vary by gestational age.
l One included study (Iranpour 2022) was judged to have high overall risk of bias for this outcome; one (Armanian [35]) was judged to have some concerns overall for this outcome; and one (Schmidt [33]) was judged to have low risk of bias overall for this outcome.
m One included study (Fakoor [24]) was judged to have high overall risk of bias for this outcome; and one (Liu [25]) was judged to have some concerns overall for this outcome.
n Both included studies (Fakoor [24] & Iranpour [34]) were judged to have high overall risk of bias for this outcome.
p One included study (Iranpour [34]) was judged to have high overall risk of bias for this outcome; three (Armanian [35], Bucher [22] & Liu [25]) were judged to have some concerns overall for this outcome; and one (Oliphant 2022) was judged to have low risk of bias overall for this outcome.
q Two included studies (Fakoor [24] & Iranpour [34]) were judged to have high overall risk of bias for this outcome; one (Armanian [35]) was judged to have some concerns overall for this outcome; and one (Schmidt [33]) was judged to have low risk of bias overall for this outcome.
r Results from a single study only.
s The only included study (Iranpour [34]) was judged to have high overall risk of bias for this outcome.
t In the early childhood epoch, the important outcome of asthma/wheeze was not reported by any included studies.
u In the middle childhood epoch, the critical outcomes of death before one year of age and survival without neurosensory impairment were not reported by any included studies.

Table 4. GRADE summary of findings for low vs high-dose caffeine comparison.
a For continuous outcomes values represent weighted mean.
b In the neonatal and infant epoch, the critical outcomes of neurocognitive impairment, survival without neurosensory impairment and cerebral palsy were not reported by any included studies.
c Although 2 studies reported apnea as an outcome, the apnea outcome did not occur in any participants in Oliphant 2022, and hence only a single study (Kori [26]) contributed data to this analysis.
d OIS criteria not met (total population less than half of OIS, resulting in downgrading by two steps).
e One included study (Scanlon [28]) was judged to have high overall risk of bias for this outcome, two (Steer [36] & Zhao [30]) were judged to have some concerns overall for this outcome and one (Mohammed [27]) was judged to have a low overall risk of bias for this outcome.
f g Data are from a single study (Steer [36]) with a high risk of incomplete outcome data.
h Results from a single study only.
i OIS criteria not met (total population less than OIS, resulting in downgrading by one step).
j Two included studies (Steer [36] & Zhao [30]) were judged to have some concerns overall for this outcome and two (Mohammed [27], Steer [29]) were judged to have a low overall risk of bias for this outcome.
k l One included study (Scanlon [28]) was judged to have high overall risk of bias for this outcome; two (Steer [36] & Zhao [30]) were judged to have some concerns overall for this outcome; and the remaining four studies (Kori [26], Mohammed [27], Oliphant 2022 & Steer [29]) were judged to have a low overall risk of bias for this outcome.
In the early childhood epoch, the critical outcomes of death, neurocognitive impairment, motor impairment, cerebral palsy, hearing impairment and visual impairment, and the important outcomes of emotional-behavioral difficulties, asthma/wheeze, growth (height) and growth (weight) were not reported by any included studies.