Introduction

Apnea of prematurity (AOP) affects at least 80% of preterm infants1 born before 32 weeks gestational age (GA). AOP may lead to hypoxemia requiring immediate resuscitation,2 necessitating continuous long-term monitoring of vital signs, caffeine therapy, and respiratory support over weeks to months. Prolonged hypoxemia due to AOP in the first few months of life is associated with late death and a considerably increased risk of neurodevelopmental impairment at 18 months of age.3 Thus AOP represents a considerable burden of disease and has a significant impact on resource allocation in neonatal intensive care units (NICUs) across the globe.4,5,6 Notably, cardiorespiratory stability is a crucial prerequisite for discharge from the NICU.7

Postmenstrual age (PMA) at cessation of AOP varies considerably with a range of about 35–43 weeks PMA.8 Predicting when infants are stable enough to withdraw from respiratory support and cardiorespiratory monitoring is extremely difficult but has critical influence on safety considerations, allocation of health care resources, and discharge planning.9,10

Assessing heart rate fluctuation (HRF) in neonates by time series analysis is useful to predict critical events such as neonatal sepsis in the short term.11,12 A recent study found that HRF assessed in the first week after birth has considerable prognostic value compared to validated risk scores to predict death and morbidities in preterm infants.13 We hypothesized that baseline HRF assessed over the first 5 days of life can be used to predict medium-term cardiorespiratory stability and hence guide decisions related to discharge from the NICU. We aimed to investigate whether interbeat interval (IBI) values, their distribution, and time series parameters including sample entropy (SampEn)14 and scaling exponent alpha (ScalExp)15,16 predict cardiorespiratory stability in terms of (i) total duration of respiratory support in the NICU (primary outcome) and (ii) PMA at cessation of AOP, at discontinuation of caffeine therapy, and at stopping of continuous electrocardiography (ECG) monitoring in the NICU, as well as mortality and length of stay (secondary outcomes).

Methods

Study design

We conducted a prospective, single-center, observational cohort study in the tertiary-level NICU of the University Children’s Hospital Basel UKBB, Basel, Switzerland. The study was approved by the Ethics Committee of Northwestern Switzerland and written informed consent of the parents was obtained prior to inclusion of study participants. Inclusion criteria were as follows: preterm infants <24 h of chronological age with GA <32 completed weeks and/or birth weight <1500 g. Exclusion criteria included major congenital malformation, asphyxia, surgical intervention during the first 5 days of life, or direction of treatment toward palliative care.

Measurements

Thoracic surface electromyography (sEMG) measurements for acquisition of HRF data and synchronized video recordings for sleep staging were performed as reported in recent validation studies.17,18,19 Briefly, daily sEMG measurements were conducted during the infant’s first 5 days of life, starting at 8:30 a.m. and lasting for 3 h. Raw signal was captured at a sampling frequency of 500 Hz using commercially available software (Polybench, Inbiolab BV, Roden, NL). Time-synchronized video recordings of the infants were obtained at 15 Hz (LifeCam, Microsoft Corporation, Redmond, WA), and video-based sleep stages (awake, active sleep, quiet sleep) were scored at 10-s intervals as recommended19,20 and as described previously.21 Extensive quality control of raw sEMG data was performed as described previously.17 Raw data passing quality control were extracted for calculation of IBI using Matlab (The MathWorks, Inc., Natick, Massachusetts).
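The final step of this pipeline, turning detected beats into IBIs, can be sketched as follows. This is a minimal illustration only: it assumes that beats have already been located as sample indices in the 500 Hz recording (the detection and quality control procedures are those of the cited validation studies and are not reproduced here), and the function name and example indices are hypothetical.

```python
SAMPLING_HZ = 500  # sampling frequency reported for the sEMG recordings

def interbeat_intervals_ms(beat_samples, fs=SAMPLING_HZ):
    """Convert beat positions (sample indices) into interbeat intervals in ms."""
    return [(b - a) * 1000 / fs for a, b in zip(beat_samples, beat_samples[1:])]

# Hypothetical beat positions for illustration, not study data
example_beats = [0, 250, 500, 760]
example_ibi = interbeat_intervals_ms(example_beats)
```

At a sampling frequency of 500 Hz, one sample corresponds to 2 ms, which sets the temporal resolution of each IBI.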

HRF characteristics considered to influence outcomes

We calculated the following HRF characteristics on normal-to-normal IBIs (NN) taking into consideration all intervals between two consecutive heart beats resulting from sinus node depolarization22: mean (IBIMean), SD (IBISDNN), coefficient of variation (IBICV), square root of the mean of squared successive differences (IBIRMSSD), and skewness (IBISkewness).22,23 SampEn was calculated according to the algorithm described by Richman and Moorman and in the citations therein.24 SampEn is a measure of randomness of a time series. It is the negative natural logarithm of the conditional probability that two sequences that match for m consecutive data points within a given tolerance r (usually r = 0.2 × standard deviation) also match at the next data point; lower values therefore indicate a more regular, predictable series. To estimate long-range correlations of IBIs, we calculated ScalExp derived from detrended fluctuation analysis (DFA) as reported previously.15,16,25,26 Briefly, each record was partitioned into windows of size L = 2^n (n = 2, 3, …, N, where N was the largest integer for which 2^N was smaller than one fourth of the record length), the local trends were removed from each window, and the root mean square fluctuations F(L) were computed. The ScalExp was obtained as the slope of the straight line fitted to at least 85% of the (L, F) data points on a log-log graph that gave the highest R². ScalExp describes the self-similarity (scaling) of the fluctuations of a biological signal in a time series across a range of sizes of time windows and thus reflects the long-range correlations (memory) of the signal.
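The characteristics described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Matlab code used in the study: all function names are hypothetical, the skewness is the moment-based (population) estimate, SampEn uses m = 2 and counts a match when the maximum pointwise difference is at most r, and the DFA integrates the mean-centered series (the standard first step of DFA) and fits the slope over all window sizes rather than selecting the best-fitting 85% of points.

```python
import math
import random

def time_domain(ibi):
    """Mean, SDNN, CV, RMSSD and skewness of an IBI series."""
    n = len(ibi)
    mean = sum(ibi) / n
    sdnn = math.sqrt(sum((v - mean) ** 2 for v in ibi) / (n - 1))
    diffs = [b - a for a, b in zip(ibi, ibi[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    m2 = sum((v - mean) ** 2 for v in ibi) / n
    m3 = sum((v - mean) ** 3 for v in ibi) / n
    return {"mean": mean, "sdnn": sdnn, "cv": sdnn / mean,
            "rmssd": rmssd, "skewness": m3 / m2 ** 1.5}

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn = -ln(A/B); B and A count template pairs matching for m and m + 1 points."""
    n = len(x)
    mean = sum(x) / n
    r = r_factor * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def matches(length):
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):  # j > i excludes self-matches
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # no matches: the entropy cannot be estimated
    return -math.log(a / b)

def fit_line(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((u - mx) * (v - my) for u, v in zip(xs, ys))
             / sum((u - mx) ** 2 for u in xs))
    return slope, my - slope * mx

def dfa_alpha(x):
    """Scaling exponent from DFA with linear detrending (needs more than 32 points)."""
    mean = sum(x) / len(x)
    profile, total = [], 0.0
    for v in x:  # integrate the mean-centered series
        total += v - mean
        profile.append(total)
    log_l, log_f = [], []
    size = 4  # L = 2^n, starting at n = 2
    while size < len(x) / 4:
        t = list(range(size))
        sq = []
        for start in range(0, len(profile) - size + 1, size):
            window = profile[start:start + size]
            slope, intercept = fit_line(t, window)  # local linear trend
            sq += [(w - (slope * k + intercept)) ** 2 for k, w in zip(t, window)]
        log_l.append(math.log(size))
        log_f.append(0.5 * math.log(sum(sq) / len(sq)))  # log of RMS fluctuation
        size *= 2
    return fit_line(log_l, log_f)[0]

# Synthetic series for illustration only, not study data
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
stats_demo = time_domain([400 + 20 * v for v in noise[:300]])
sampen_noise = sample_entropy(noise[:300])
sampen_regular = sample_entropy([1.0, 2.0] * 150)
alpha_noise = dfa_alpha(noise)
```

On these synthetic inputs, a strictly alternating series yields a SampEn of zero because every match of length m persists at length m + 1, whereas uncorrelated noise yields a clearly higher SampEn and a scaling exponent near 0.5.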

Demographic and clinical factors considered to influence outcomes

Details of factors considered to influence outcomes are listed in Table 1. Briefly, we considered GA, birth weight, birth weight z-score, sex, prenatal corticosteroids, early- and late-onset sepsis (EOS and LOS), necrotizing enterocolitis (grade ≥II according to Bell27), germinal matrix-intraventricular hemorrhage (IVH grade I–IV, documented according to Papile28), periventricular leukomalacia, hyperbilirubinemia, level of respiratory support at study, bronchopulmonary dysplasia, body temperature, weight loss over the first 5 days of life, time to last caffeine dose (hours) at the start of measurement, and the infant’s behavioral characteristics (positioning, sleep stage, extent of infant handling).

Table 1 HRF characteristics, demographic and clinical factors, and clinical outcomes of study participants (n = 76)

Clinical outcomes

Primary outcome was the total duration of respiratory support in the NICU, defined as a composite of endotracheal ventilation, nasal continuous positive airway pressure, and high-flow nasal cannula therapy with and without supplemental oxygen. Respiratory support was managed at the discretion of the attending neonatologist and withdrawn based on a set of criteria summarized in standard operating procedures (acceptable work of breathing (no grunting and no deep retractions to achieve stable blood gases), supplemental oxygen <30% to achieve preductal oxygen saturation of 87–95%, no manual stimulation for apnea >20 s, or bradycardia <100/min for at least 24 h in non-intubated infants). Secondary outcomes included PMA at cessation of AOP abstracted from nursing records (apnea was defined as cessation of breathing lasting for at least 20 s or one lasting for <20 s that is associated with bradycardia and/or cyanosis1), PMA at discontinuation of caffeine therapy (all infants received caffeine from day 1 of life based on clinical standard operating procedure; caffeine therapy was routinely stopped 72 h after last manual stimulation for AOP), PMA at stopping of routine ECG monitoring in the NICU (72 h after last bradycardia and ≥5 days after stopping of caffeine therapy), mortality prior to discharge, and PMA at discharge.

Statistical analysis

Aiming at a statistical power of 80% at the 5% significance level, we planned to recruit a total of n = 90 infants in order to analyze a minimum of n = 76 preterm infants (expected loss to follow-up 15% due to technical reasons, inherent mortality of extremely preterm infants, and withdrawal of parental consent), allowing for linear regression analysis of at least three continuous independent predictor variables of medium effect size (f² = 0.15).29,30
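The arithmetic behind these figures can be checked with a short sketch. It assumes that the recruitment target was obtained by simple inflation of the analyzable sample for the expected loss, which the text does not state explicitly, and it uses Cohen's definition f² = R²/(1 − R²) to translate the effect size. The power calculation itself is not reproduced, and the function names are hypothetical.

```python
import math

def recruitment_target(n_analyzable, expected_loss):
    """Smallest number to enroll so that n_analyzable remain after the expected loss."""
    return math.ceil(n_analyzable / (1 - expected_loss))

def r2_from_f2(f2):
    """Cohen's f^2 = R^2 / (1 - R^2), solved for R^2."""
    return f2 / (1 + f2)

n_recruit = recruitment_target(76, 0.15)  # analyzable sample and loss rate from the text
r2_medium = r2_from_f2(0.15)              # R^2 corresponding to a medium effect size
```

Under this assumption, 76/0.85 rounds up to 90 infants, and a medium effect size of f² = 0.15 corresponds to an R² of about 0.13.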

We extracted demographics, clinical factors, and clinical outcomes from medical records. HRF characteristics were calculated several months after extraction of these data, i.e., assessors of clinical outcomes were unaware of HRF characteristics. We performed linear regression analysis to assess associations of HRF characteristics and of demographic and clinical factors with outcomes. We first used univariable, multilevel modeling to explore associations of considered predictors with outcomes (p < 0.1 considered to indicate potential relevance of a predictor). We then built multivariable, multilevel linear regression models for each outcome followed by stepwise backward elimination of predictors (p < 0.05 considered statistically significant). We defined a best model depending on the coefficient of determination (R²). R² represents the proportion of variance in the outcome variable that can be explained by the predictors in the model, with 1 indicating that the model explains all the variance in the outcome. We compared models using likelihood ratio tests. Models were explored for interaction of predictors and model diagnostics included plotting of residuals against fitted values. Statistical analysis was performed using the Stata software (StataCorp. 2009. Stata Statistical Software: Release 11. College Station, TX: StataCorp LP).
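The comparison of nested models by R² and likelihood ratio test can be illustrated with a simplified sketch. Note the assumptions: this uses ordinary single-level least squares rather than the multilevel models fitted in Stata, it adds exactly one predictor (so the test statistic is referred to a chi-square distribution with 1 degree of freedom), it omits stepwise elimination, and the data are synthetic. Function and variable names are hypothetical.

```python
import math
import random

def ols_rss(rows, y):
    """Residual sum of squares of a least-squares fit (rows include the intercept column)."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        c[col], c[piv] = c[piv], c[col]
        for k in range(col + 1, p):
            f = a[k][col] / a[col][col]
            a[k] = [u - f * w for u, w in zip(a[k], a[col])]
            c[k] -= f * c[col]
    beta = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        beta[i] = (c[i] - sum(a[i][k] * beta[k] for k in range(i + 1, p))) / a[i][i]
    return sum((v - sum(b * u for b, u in zip(beta, r))) ** 2 for r, v in zip(rows, y))

def compare_nested(rows_small, rows_big, y):
    """R^2 of both models and the likelihood ratio test for one added predictor."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    rss0, rss1 = ols_rss(rows_small, y), ols_rss(rows_big, y)
    lr = max(n * math.log(rss0 / rss1), 0.0)  # 2 * (logL_big - logL_small) for Gaussian errors
    p_value = math.erfc(math.sqrt(lr / 2))    # chi-square survival function with 1 df
    return 1 - rss0 / tss, 1 - rss1 / tss, lr, p_value

# Synthetic data for illustration only, not study data
rng = random.Random(1)
ga = [rng.gauss(0, 1) for _ in range(76)]
sampen = [rng.gauss(0, 1) for _ in range(76)]
outcome = [-1.0 * g - 0.8 * s + rng.gauss(0, 1) for g, s in zip(ga, sampen)]
small = [[1.0, g] for g in ga]
big = [[1.0, g, s] for g, s in zip(ga, sampen)]
r2_small, r2_big, lr_stat, p_lr = compare_nested(small, big, outcome)
```

Because the models are nested, the larger model can never have a lower R²; the likelihood ratio test asks whether the gain exceeds what chance alone would produce.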

Results

Between January 2013 and September 2015, 330 sEMG measurements were performed in 90 infants. Acceptable raw data were obtained from 309/330 (94%) measurements in 76/90 (84%) preterm infants. Twenty-one measurements were not suitable for analysis due to truly irregular beat as observed on routine ECG monitors by the clinical care team (supraventricular extrasystole, n = 9), displacement of sEMG electrodes (n = 10), and 50 Hz interference (n = 2). Mean (range) GA was 30.2 (24.7–34.0) weeks and mean (range) birth weight was 1274 (420–1900) g. Figure 1 shows the flow of the study. Table 1 lists HRF characteristics, demographic and clinical factors, and clinical outcomes. Results of univariable, multilevel linear regression analyses are summarized in Table 2. Results of multivariable, multilevel linear regression analyses are shown in Table 3 and are summarized below. Figure 2 shows the values of SampEn and ScalExp calculated over the first 5 days of life.

Fig. 1

Study flow sheet. CV coefficient of variation, ECG electrocardiography, HRF heart rate fluctuation, IBI interbeat interval, PMA postmenstrual age, RMSSD root mean square of successive differences, SDNN standard deviation of normal-to-normal (NN) IBIs, sEMG surface electromyography, SampEn sample entropy, ScalExp scaling exponent alpha derived by detrended fluctuation analysis

Table 2 Results of univariable, multilevel regression analyses
Table 3 Multivariable, multilevel linear regression models for all outcomes
Fig. 2

Sample entropy and scaling exponent over the first 5 days of life. SampEn sample entropy, ScalExp scaling exponent alpha derived by detrended fluctuation analysis. SampEn (a) and ScalExp (b) did not significantly change over the first 5 days of life

Primary outcome

Multivariable, multilevel modeling established a significant negative association of duration of respiratory support in the NICU with SampEn after adjusting for sex, GA, and birth weight z-score (R² = 0.53, p < 0.001; see Table 3 and Fig. 3). Adding SampEn to a model including sex, GA, and birth weight z-score improved the predictive value of the model from 49% to 53% (p = 0.04, likelihood ratio test). ScalExp did not add predictive value to this model.

Fig. 3

Predictive effect of sample entropy on duration of respiratory support. Duration of respiratory support (h) (log-transformed) over sample entropy of interbeat interval. There was a significant negative association of sample entropy with duration of respiratory support. Each data point reflects average values over the first 5 days of life (one data point per infant)

Secondary outcomes

PMA at discontinuation of caffeine therapy was negatively associated with SampEn after adjusting for sex, GA, birth weight z-score, and EOS (R² = 0.35, p < 0.001; see Table 3 and Fig. 4). Adding SampEn to a model with these demographic and clinical factors increased the predictive value of the model by 6 percentage points, from 29% to 35% (p = 0.01, likelihood ratio test). ScalExp did not add predictive value to this model.

Fig. 4

Predictive effect of sample entropy on cessation of caffeine therapy. Postmenstrual age at cessation of caffeine therapy (weeks) over sample entropy of interbeat interval. There was a significant negative association of sample entropy with postmenstrual age at cessation of caffeine therapy. Each data point reflects average values over the first 5 days of life (one data point per infant)

After adjusting for relevant demographic and clinical factors, PMA at last apnea and PMA at cessation of ECG monitoring were not associated with HRF characteristics after birth. The best multivariable models predicting those outcomes included sex, GA, birth weight z-score, EOS, and IVH (for details, refer to Table 3). Mortality was not modeled owing to the low event rate (n = 1). Length of stay was associated with IBICV, GA, birth weight z-score, EOS, LOS, and IVH. Adding IBICV to the model did not significantly change the prognostic value of the model (R² = 0.50 vs. 0.49; p = 0.06, likelihood ratio test). There were no significant day-to-day differences of SampEn or ScalExp within the first 5 days of life.

Discussion

We found that SampEn calculated over the first few days of life significantly improves prediction of subsequent cardiorespiratory stability in preterm infants after adjusting for sex, degree of prematurity, intrauterine growth, and comorbidities, such as sepsis or IVH. Results were robust toward the influence of sleep stage, infant positioning, and extent of external handling. Distributive indices of IBI in the time domain (IBIMean, IBISDNN, IBICV, IBIRMSSD, IBISkewness) and ScalExp derived from DFA did not contribute to the predictive value of regression models after adjusting for relevant demographic and clinical factors.

To the best of our knowledge, this is the first study to show that low baseline SampEn of IBI calculated within the first 5 days of life is useful for predicting cardiorespiratory stability of preterm infants over a time course of weeks to months. Since this prediction was made very early in life, it may significantly contribute to planning of treatment and allocation of health care resources.

Comparison with previous literature

Low heart rate variability has long been established as an indicator of poor prognosis in adults after acute events such as myocardial infarction.31,32 Also, low SampEn has repeatedly been shown to be an early marker of incipient sepsis in preterm neonates hospitalized in the NICU.14,33,34 In a large (n = 3003) randomized trial, Moorman et al. reported that real-time display of heart rate characteristics to clinicians in the NICU resulted in reduced sepsis-related mortality in preterm infants.35 Similarly, HRF was shown to correlate with the degree of hypoxic encephalopathy in term neonates when assessed during the first 12–48 h of life and even had a predictive value for neurodevelopmental outcome at 2 years of age.36 Our findings indicate that baseline SampEn improves medium-term prediction of duration of respiratory support in the absence of incipient events and after adjusting for known predictors of outcome. Fairchild et al. demonstrated that altered HRF within 28 days after birth was associated with abnormal brain imaging in the NICU and neurological impairment at 1 year of age in extremely low-birth-weight infants.37 They hypothesized that abnormal HRF early in life might not only reflect inflammatory processes such as sepsis causing secondary brain injury but could also directly indicate brain injury or neurological dysfunction. Similarly, we cannot rule out that SampEn is at least partially associated with clinical outcomes as a surrogate measure of other, unmeasured factors.

Strengths and limitations of the study

A strength of this study is the prospective assessment of HRF with systematic data quality control in a population of infants at high risk of AOP in whom cardiorespiratory stability is difficult to predict and of clinical importance. We reached a very high success rate of sEMG measurements (94%) and further considered various factors, such as adjustment for sleep stages and comorbidities potentially influencing both HRF within the first days of life and study outcomes during prolonged NICU stay of the infants. A limitation of our study is the lack of real-time analysis of HRF at the bedside, i.e., the current set-up would not (yet) allow clinicians to incorporate SampEn into their decision making in the NICU. However, as shown previously,17 the data quality control procedure could potentially be fully automated and bring this set-up an important step closer to real-time analysis. Further, we only acquired data from five daily, standardized, 3-h recordings. Thus we cannot exclude that measurements over longer periods of time or other segments of the day may produce different results. We assessed HRF using sEMG instead of the typically used ECG traces. An advantage of this approach is that it allows simultaneous detection of motion artifacts. On the other hand, our results might not be directly comparable to findings from HRF analysis by ECG.

Interpretation and significance

Low SampEn represents a composite of regularity of heart rate and presence of spikes, e.g., transient decelerations, in a time series of heart beats.14 The pathophysiological mechanism causing changes in SampEn is unknown. In the context of incipient events, changes in SampEn have been attributed to the effect of inflammatory cytokines.34 Our observation that daily baseline measurements of SampEn early after birth predict medium-term cardiorespiratory stability is novel and suggests that SampEn represents an integrative marker of individual stability of the autonomic nervous system in preterm infants. As contemplated by Lake et al., high SampEn, representing some level of natural fluctuation, might be a general indicator of health.14

ScalExp did not add any predictive value to the models. From a methodological point of view, time series analysis can be very sensitive to loss of data.38,39 As reported previously, we performed extensive data quality control in order to reduce the effect of motion artifacts on time series analysis and remove poor quality raw data.17 In contrast to ScalExp, SampEn has been shown to be remarkably unaffected both analytically and experimentally by loss of up to 40% of raw data.14 This might explain why only SampEn conferred prognostic utility in the current study. Further, HRF has been shown to vary between different sleep stages.40 We considered adjusting for changes in sleep stage to be of particular importance as neonates do not have a circadian sleeping rhythm but follow a much shorter, internal clock-driven, ultradian sleep/wake rhythm. However, sleep stage and extent of infant handling during data acquisition did not influence outcomes. This could be an indication that HRF measures such as SampEn and ScalExp are robust across different behavioral states in the current study population.

Clinical relevance

We demonstrated feasibility of sEMG measurements for calculating HRF in a clinical NICU setting and found that the method could potentially be fully automated.17 Our findings suggest that SampEn is a promising physiological marker to monitor health and development over relatively long time periods. In terms of prognostic power, the effect size of SampEn in the models was moderate; e.g., SampEn increased the coefficient of determination from 0.49 to 0.53 in the model assessing duration of respiratory support. This suggests that GA at birth, intrauterine growth (birth weight z-score), and sex are the major determinants of outcome. Further, the model only explains slightly more than half of the amount of variation in the outcome. While this is substantial, there are obviously other, unmeasured factors influencing the outcome that are not included in the model. Prognostically, the first week of life clearly represents a critical period of time and it is encouraging to note that baseline SampEn obtained at such an early stage of life is predictive of medium-term outcomes. This is of clinical importance as planning of resource allocation (staffing, equipment) and of discharge home requires considerable time. Repetitive or even continuous measures of SampEn over a longer time period and/or closer to discharge might provide even more useful information. Yet, there was no additional predictive value of HRF metrics over clinical risk factors on secondary outcomes such as PMA at discharge or death, suggesting that the predictive value of HRF is outcome specific. Our cohort included infants from 24 weeks GA who were at high baseline risk of complications and prolonged NICU stay. However, most infants were relatively healthy: e.g., only 6 out of 76 infants had LOS, and none had NEC or bronchopulmonary dysplasia at 36 weeks PMA, which are relevant factors influencing survival and duration of respiratory support. The fact that study participants were rather healthy preterm infants is further reflected by the narrow window of PMA at cessation of caffeine and PMA at last apnea. It is thus possible that the rate of critical events and the range of continuous secondary outcome measures observed in the study population were too low to detect significant relationships with HRF metrics. Thus extrapolating our findings to infants with bronchopulmonary dysplasia or other complex issues prolonging hospital stay requires further study.

Conclusions

Baseline SampEn calculated over the first 5 days of life improves prediction of subsequent cardiorespiratory stability over weeks to months in preterm infants. Characterizing HRF in these infants confers promising prognostic utility independently of subacute events at an extremely early stage of hospitalization. Generalization of these results requires further study.