Introduction

Hospitalized preterm infants are exposed to significant numbers of painful or stressful procedures during neonatal intensive care unit (NICU) care.1,2 Repeated pain experiences have been widely associated with short- and long-term sequelae like hyperalgesia, altered brain architecture, or neurodevelopmental impairments.3,4,5 This pain burden increases with greater prematurity and many of these procedures are still not treated with analgesia.1,6

Some studies have pointed out the limitations of clinical behavior-based scales, which may mirror somatic subcortical and motor nervous system activation rather than cerebral activity.7,8,9 Indeed, Slater et al.10 reported that, in some cases, the pain-specific cortically evoked responses persisted after an acute nociceptive stimulus while the neonate showed no behavioral signs of pain. These results highlight the need for more objective pain assessment in preterm infants, especially when clinical conditions do not allow adequate behavioral assessments.11

Alternatives to these pain scales were developed based on the description of nociceptive pathways and neurophysiological responses to pain in newborn infants, such as hormonal responses, cerebral oximetry, galvanic skin responses (GSRs), or physiological responses.12 Their goal was to improve the accuracy of pain assessments by monitoring reliable and reproducible parameters. Among these methods, two can be routinely and easily applied at the bedside: (1) the GSRs, a reflection of sympathetic activity, which were reported to discriminate pain in neonates as early as 22 weeks of gestational age;13 and (2) heart rate variability (HRV), which measures the impact of nociception on parasympathetic activity in real time and was correlated with postoperative pain in newborn infants.14

While a study of 29 preterm infants found no correlation between the newborn infant parasympathetic evaluation (NIPE) index and validated pain scales, other studies have shown the ability of HRV analysis to detect pain responses after acute painful procedures in term or preterm neonates, and in children.15,16,17,18,19,20 However, the validity of neurophysiological methods must be established before recommending them for routine pain assessment in clinical settings.

Our study aimed to assess the ability of the NIPE index (NIPEi) to detect acute procedural pain and stress responses in preterm infants by concurrently comparing the NIPEi to a validated composite pain scale (PIPP-R) and the skin conductance responses (SCRs).

Methods

A prospective, observational study was conducted in the Department of Neonatal Medicine of the University Hospital of Brest, France. Patients were enrolled from January 2017 to January 2018.

Population

Following signed consents from both parents, hospitalized preterm infants with gestational ages ranging from 25 + 0 to 35 + 6 weeks were eligible for this study. Measurements were performed during routine care procedures categorized as stressful or painful according to a previously published classification.1 Stressful procedures were included to compare painful responses vs. non-painful responses. Exclusion criteria included brain injury (grade 3 or 4 intraventricular hemorrhage, white matter injury), genetic abnormality, severe congenital malformation, administration of neuromuscular blockade, and either anticholinergic or adrenergic antagonists administered during the 48 h prior to recording. A maximum of ten procedures could be recorded for any single patient.

Study design

Pain and stress responses were measured using (1) the NIPEi (MDMS, Loos, France), (2) the GSR Pain Monitor (Med-Storm, Oslo, Norway), and (3) the PIPP-R score. The NIPE monitor was connected to the cardiorespiratory monitor (Intellivue, MX series, Philips Medical Systems, Eindhoven, The Netherlands) 20 min before the start of the care procedure for signal averaging as required by the manufacturer. The three Pain Monitor leads were placed on palmar or plantar skin. Two analog cameras (Sony HDR-CX740 24.1; Sony Cyber-shot DSC-W830), one focused on the patient’s face and the other on physiological parameters (heart rate, SpO2), were used to perform PIPP-R scoring post hoc.

Measurements

Frequency analysis of HRV was previously validated as a method for pain assessment, with decreased parasympathetic responses to painful stimuli during heel lance procedures.17,21 The NIPE® monitor (MDMS, Loos, France) was designed using this principle, specifically for infants up to 2 years of age, including preterm infants.22,14 The instantaneous NIPEi is derived from an algorithm evaluating the short-term HRV in real time.22 This automated analysis of neonatal HRV in high frequencies (>0.15 Hz) reflects parasympathetic activity. It ranges from 0 to 100 and decreases with the intensity of pain or stress. The NIPEm represents the mean value over 20 min. The instantaneous NIPEi provides a value based on a 64-s moving window analysis, which is updated every 1 s.20 The NIPE monitor is connected to the cardiac monitor and continuously displays the NIPEm and the NIPEi. The NIPEi therefore seems more suitable for acute procedural pain analysis and was used to study acute pain/stress in preterm infants. NIPEi was recorded from 2 min before until 2 min after the procedure, and removed before any other procedure could interfere with the recorded responses.16,23 Because the NIPEi appeared, in our first data analysis, to change more slowly than the PIPP-R or the SCR, we decided to perform a post hoc analysis for the patients who had the NIPEi measured until 3 min after the procedure and before any other procedure (89% for the first recorded procedure and 84% for the whole sample of procedures).20 Data were directly exported from the NIPE monitor after recording.

The PIPP-R24,25 has been validated in term and preterm infants during acute painful procedures. This composite scale includes two physiological parameters (heart rate and oxygen saturation), three behavioral parameters (brow bulge, eye squeeze, and nasolabial furrow), and two contextual factors (GA, behavioral state). The PIPP-R was scored continuously over periods of 30 s starting from 2 min before until 2 min after the procedure. Two trained raters reviewed the video recordings independently and assigned PIPP-R scores; an expert validated these assessments for the first 36% of procedures. The calculated intraclass correlation coefficient was high (0.92), showing the inter-rater reliability of the PIPP-R analysis.

The Med-Storm Pain Monitor recorded the SCR using dedicated software in the “premature” mode. An increased number of peaks/s was previously reported to discriminate pain responses;26,27 it was measured in time windows of 30 s from 2 min before until 2 min after the procedure, in line with previous studies.26,27
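As an illustration of the windowing described above, the following minimal sketch converts a list of detected peak timestamps into a peaks/s value for each 30-s window from 2 min before to 2 min after the procedure. It is not the monitor's software: the function name `peaks_per_second`, the input format (peak times in seconds relative to the start of the procedure), and the half-open window convention are assumptions made for the example.

```python
from collections import Counter


def peaks_per_second(peak_times_s, start_s=-120, end_s=120, window_s=30):
    """Return {window_start_s: peaks/s} for consecutive windows in [start_s, end_s).

    peak_times_s: hypothetical list of SCR peak times, in seconds relative
    to the start of the procedure (negative = before the procedure).
    """
    counts = Counter()
    for t in peak_times_s:
        if start_s <= t < end_s:
            # Index of the window containing t, counted from the first window.
            idx = int((t - start_s) // window_s)
            counts[start_s + idx * window_s] += 1
    # Report every window, including those without any peak.
    return {w: counts[w] / window_s for w in range(start_s, end_s, window_s)}


# Hypothetical peak times; the last one falls outside the recording window.
example = peaks_per_second([-100.0, -95.0, 1.0, 2.5, 10.0, 29.9, 45.0, 130.0])
```

Each key is the start of a window, so `example[0]` is the peak rate over the first 30 s of the procedure.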

Statistical analysis

Power analysis suggested a sample size of 200 patients to estimate a correlation coefficient of 0.8 with an accuracy of 0.05 (half-amplitude of the 95% confidence interval) and α-error of 5%. Data were analyzed using the software SAS version 9.4. Correlations between the NIPEi, the PIPP-R score, and the number of peaks/s (SCR) were calculated using Pearson’s or Spearman’s correlation coefficients according to the normality of the distribution. In the case of repeated measurements, only the first pair of measurements for each patient was considered for the primary outcome. The secondary outcome integrated the type of stimulation (stressful or painful), the patient’s sex, and gestational age (<32 vs. 32–36 weeks). Changes in the different parameters were modeled using a generalized linear model for repeated data. Sensitivity and specificity analyses were carried out for the NIPEi and SCR.
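The order of magnitude of this sample size can be checked with a common large-sample approximation, in which the standard error of a correlation coefficient r is roughly (1 − r²)/√(n − 3). The sketch below is only a plausibility check under that assumption; the exact method used for the power analysis is not stated here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation_precision(r, half_width, alpha=0.05):
    """Approximate n so that the (1 - alpha) CI for r has the given half-width.

    Assumption: SE(r) ~ (1 - r**2) / sqrt(n - 3), a delta-method
    approximation derived from Fisher's z transformation.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return math.ceil(3 + (z * (1 - r ** 2) / half_width) ** 2)


# r = 0.8 and a half-width of 0.05 give a value close to 200 patients.
n_required = n_for_correlation_precision(0.8, 0.05)
```

Under this approximation the required sample is close to 200, in line with the figure quoted above.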

Ethics

The study protocol was approved by the local Ethics committee (Comité de Protection des Personnes (CPP) Ouest 6; No. ID-RCB: 2016-A00012-49) and by the French National Agency for Medicines and Health Products Safety. This observational study was registered at www.ClinicalTrials.gov (NCT02885051).

Results

Patients and recorded procedures characteristics

Two hundred and fifty-four procedures were recorded for the 90 preterm infants enrolled in this study (Fig. 1). The study was stopped at the end of the planned recruitment period, before the calculated sample size was reached. Because the observed correlation coefficients were much lower than those predicted in our hypothesis, the study was stopped for futility. Patient characteristics and types of procedures are listed in Table 1.

Fig. 1 Study progress.

Table 1 Characteristics of the patients at birth and procedures studied.

The mean GA was 30.9 weeks with a mean birth weight of 1548 g. Capillary and venipunctures represented 32.7% (n = 36/110) of all painful procedures and 65.5% of the skin-breaking procedures. Nursing care represented 45.8% (n = 66/144) of the stressful procedures (for details, see Supplementary Table S1).

Patterns of PIPP-R scores, NIPEi, and SCR over time

Graphical plots showed peak responses during the procedure for the PIPP-R, over the initial 30 s of the procedure for the SCR, and at 90 s after the procedure for the NIPEi (Fig. 2). The increase in PIPP-R was more pronounced during painful procedures, while the SCR responses were more prominent after stressful procedures. The NIPEi showed no difference between stressful and painful procedures. The peak responses showing the highest PIPP-R scores during the procedure (time = 0 s) were observed in males and in very preterm infants (<32 weeks GA).

Fig. 2 Patterns of the three parameters (PIPP-R, NIPEi, and skin conductance) over time.

Correlations between NIPEi, PIPP-R, and SCR

Analysis of the first recorded procedures for each patient showed no significant correlation coefficients (Pearson’s or Spearman’s) between the NIPEi and PIPP-R scores (baseline, response peaks, and delta) (Table 2). Negative correlation coefficients were expected between the observed decrease in NIPEi responses and the increase in PIPP-R or SCR.

Table 2 Correlation matrix between types of patients, procedures, and responses.

Repeated significant positive correlation coefficients were observed between the PIPP-R and SCR, especially in the 25–32 weeks GA group. Their best correlations (r > 0.80) were found for all painful procedures, particularly skin-breaking procedures. Analysis of all 254 procedures showed similar trends, but with lower correlation coefficients between PIPP-R and SCR (Table 2). No significant negative correlation coefficients were found between NIPEi and SCR in the whole sample of recorded procedures (n = 254). The patients who had NIPEi measurements until 3 min after the procedure (n = 80 patients, n = 213 procedures) did not show higher correlation coefficients.

Sensitivity/specificity analyses

Measurement accuracy parameters are reported in Table 3 and Fig. 3. Based on receiver-operating characteristic (ROC) curve analyses, the highest areas under the curves (AUCs) were obtained for skin-breaking procedures with cut-off values of NIPEi <46 and PIPP-R scores >6. With a PIPP-R cut-off value >10, sensitivity increased only mildly, but the negative predictive values (NPVs) were high (>80%). The specificity of the NIPEi was low (<60%).
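For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, and predictive values follow from a pair of cut-offs, with the PIPP-R treated as the reference standard. The default thresholds mirror those quoted above (NIPEi <46 flags pain, PIPP-R >6 defines pain); the function name and the data are hypothetical, and this is not the analysis code used in the study.

```python
def diagnostic_accuracy(nipei_values, pippr_scores, nipei_cutoff=46, pippr_cutoff=6):
    """Sensitivity, specificity, PPV and NPV of NIPEi against PIPP-R.

    A procedure is test-positive when NIPEi < nipei_cutoff and
    reference-positive when PIPP-R > pippr_cutoff.
    """
    tp = fp = tn = fn = 0
    for nipei, pippr in zip(nipei_values, pippr_scores):
        test_pos = nipei < nipei_cutoff
        ref_pos = pippr > pippr_cutoff
        if test_pos and ref_pos:
            tp += 1
        elif test_pos:
            fp += 1
        elif ref_pos:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined when a row or column of the 2x2 table is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented values for six procedures.
metrics = diagnostic_accuracy([40, 50, 45, 60, 30, 70], [8, 3, 4, 2, 9, 7])
```

A high NPV means that a NIPEi above the cut-off makes a PIPP-R score above the reference threshold unlikely. This value depends on how frequent such scores are in the sample, not only on the index itself.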

Table 3 Diagnostic accuracy of the NIPE Index for all procedures, painful procedures, and skin-breaking procedures.

Fig. 3 Receiver-operating characteristic (ROC) curves of NIPEi and skin conductance.

The best AUCs were obtained with the SCR, especially for skin-breaking procedures, and were associated with high specificity and high NPV. Modeling the relationship between PIPP-R and SCR (peaks/s) combined with NIPEi did not improve the ROC curves significantly (AUC = 0.70 and 0.69, respectively, for PIPP-R > 6 and >10).

Discussion

This is the first study that concurrently measures the parasympathetic (NIPEi) and sympathetic (SCR) autonomic nervous systems to assess the physiological responses to pain and compares these to a validated composite pain scale (PIPP-R). The NIPEi and the PIPP-R scale showed no significant correlations during painful and stressful procedures in preterm infants. The SCR (peaks/s) showed better correlations with the PIPP-R, especially during skin-breaking procedures.

Our findings are consistent with the study published by Cremillieux et al.15 In a sample of 29 preterm infants, they did not find any correlation between the NIPEi, the DAN, and the PIPP-R performed during an acute painful procedure, but no sensitivity analyses were performed. However, more recently, Walas et al.,20 in a sample of 33 infants, reported a significant association between the decrease of the NIPEi and validated pain scales starting from 1 min after noxious stimulation to within 3 min after an acute painful procedure. Their sensitivity analysis for the NIPEi found an area under the ROC curve of 0.79 for the severe pain group. Our secondary analyses, which reported high NPVs and promising areas under the ROC curve, are consistent with their results and with the findings of Chanques et al.28 in adult patients. The discrepancy between the lack of correlation between the NIPEi and pain scales and the significant results found in the sensitivity analyses needs to be studied further to confirm whether the NIPEi and SCR could help to identify the highest pain levels in preterm infants.

Our study found the NIPEi to be at its nadir 90 s after the procedure. These results are in line with the findings of Walas et al.,20 who reported a median nadir of 72 s for the severe pain group. However, the median nadir was at 111 and 157 s for the moderate and no/mild pain groups, respectively. When monitoring the NIPEi for pain assessment in preterm infants, longer periods exceeding 90 s could be more appropriate. Faye et al.14 observed 28 postoperative newborn infants with a mean GA of 37.8 weeks and showed a significant correlation between HRV analysis based on a time analysis of heart rates in the high-frequency domain (high-frequency variability index) and EDIN scores, which reflect prolonged pain.14

The low specificity of the NIPEi to acute pain could be partly explained by confounding factors that alter HRV. Postnatal neural maturation may have an influence on HRV responses. The HRV remains lower in preterm newborns compared to term newborns at the same corrected postnatal age, suggesting a delayed maturation of autonomic regulation.29 Mechanical ventilation could also influence the HRV response as reported by Padhye et al.,30 who observed decreases in the HRV responses in neonates from 23 to 38 weeks. No patient was mechanically ventilated in our study. A systematic review highlighted the inconsistency of HRV responses for the study of acute pain in infants younger than 1 year and insisted on further research taking into account the possible confounding factors.31

The SCR was significantly correlated with PIPP-R scores in preterm newborn infants during painful procedures, especially skin-breaking procedures. These results are consistent with previously published studies. Munsters et al.13 reported the ability of the GSR measurement (peaks/s) to discriminate sympathetic pain responses as early as 22 weeks of gestational age. This study compared the measurement of SCR to a behavioral pain scale during routine heel lancing in neonates from 22 to 27 weeks of GA. The hemodynamic changes did not influence SCR values, making it an independent tool for pain assessment.32 One pilot study reported that SCR can be observed in a pain-free state and was correlated with skin temperature, but it included only 11 infants and did not standardize study procedures.33 Several studies also showed that gestational maturation did not influence the SCR responses in preterm infants,13,27,34 but these responses were influenced by the postnatal age of term newborns.35,36

Strengths of our study include the observation of a wide range of procedures routinely performed in the NICU (n = 254), not limited to heel lances or venipunctures. No correlation was found between NIPEi and PIPP-R scores with either painful or stressful procedures. However, including stressful and painful procedures in the same sample could have underpowered the analysis. Our methodology also ensured the inter-rater reliability of the PIPP-R scoring, as shown by the high intraclass correlation coefficient between two independent observers. Despite the early termination of this study, our sample size (n = 90) is the largest published to date on this specific topic. The calculated sample size was not reached for futility reasons and should therefore not be considered a limitation. Another limitation is that some observed procedures were still in progress after the end of the recording, which could have interfered with the evolution of some parameters, in particular the NIPEi, whose maximum drop occurred 90 s after the beginning of the procedure. We recommend continued refinement of HRV as a pain assessment method for preterm infants to improve its specificity and positive predictive value.

No single method was identified as a gold standard, but SCR seemed to be more reliable than NIPEi for assessing pain and stress in preterm infants, probably with greater accuracy when combined with a validated pain scale. The NIPEi did not show sufficient diagnostic accuracy as a stand-alone method for acute pain assessment, although it may exclude the highest levels of pain, as suggested by the high NPVs. In clinical practice, it could be of interest to assess its validity in other neonatal populations with suppressed or absent behavioral responses to pain (e.g., following birth asphyxia, neonatal stroke, severe intraventricular hemorrhage, or use of neuromuscular blockade).

Conclusion

PIPP-R and NIPEi were not correlated during acute painful and stressful procedures in hospitalized preterm infants. SCR (peaks/s) was significantly correlated with PIPP-R scores, especially during painful and skin-breaking procedures. Secondary analysis showed that NIPEi and SCRs could help to exclude severe pain in preterm infants, but this needs to be studied further.