Introduction

More than 50,000 infants are born ≤28 weeks post menstrual age (PMA) each year in the US and ~40–55% develop bronchopulmonary dysplasia (BPD) [1]. Premature infants with BPD are challenging to treat, have a high mortality rate, and survivors suffer from multiple morbidities, including pulmonary hypertension, prolonged hospitalization, and neurodevelopmental impairment [2,3,4]. Approximately 14–28% of premature infants with BPD develop pulmonary hypertension, and there is increased risk of mortality among those with BPD who develop pulmonary hypertension during the neonatal admission, usually due to cardiorespiratory compromise and right-sided heart failure. These infants are often treated with multiple drugs in an attempt to reduce pulmonary vascular resistance [5]. However, the US Food and Drug Administration has not approved any drugs for infants with BPD and pulmonary hypertension, and to date, no drug has demonstrated efficacy.

The lack of a robust, reproducible non-invasive diagnostic marker for pulmonary hypertension is a major barrier to therapeutic trials in this population [6,7,8]. Cardiac catheterization is the gold standard for quantifying pulmonary hypertension in adults and children, but is associated with a high incidence of mortality and morbidities in infants, particularly those weighing less than 2 kg [9,10,11]. Echocardiography is the ideal non-invasive alternative but cannot be validated against the gold standard due to risks and limited feasibility of performing a high volume of cardiac catheterizations in the neonatal population. Typically, echocardiographic diagnosis of pulmonary hypertension is reliant on tricuspid regurgitation jet velocity (TRJV) to estimate pulmonary arterial pressure, but the measurement frequently fails due to the lack of a sufficient regurgitation jet [9, 12]. Without a sufficient tricuspid regurgitation jet, there is a reliance on a combination of qualitative and quantitative measures of pulmonary hypertension by reading cardiologists. Historically, there has been poor agreement in diagnostic interpretation among cardiologists relying on such combinations of qualitative and quantitative measures for diagnosis in different disease states [13, 14]. In the neonatal population, recent work has shown that assessment of septal geometry by end-systole eccentric index improves diagnostic consistency of pulmonary hypertension [15]. Recognizing the current limitations, echocardiograms are used clinically in the intensive care nursery for monitoring pulmonary hypertension progression and treatment effects [12, 16, 17]. Echocardiography has not been well-validated in premature infants and needs to be studied prior to using it in a therapeutic clinical trial for the identification and monitoring of pulmonary hypertension in premature infants with BPD.

Recognizing this critical need, our objective was to determine whether echocardiograms are sufficiently reliable to diagnose and monitor pulmonary hypertension in at-risk premature infants. We report our evaluation of the agreement of pulmonary hypertension diagnosis using clinically obtained echocardiograms read by pediatric cardiologists in premature infants at risk for BPD.

Methods

Study population

Infants were selected from two sites (University of North Carolina Medical Center and Duke University Medical Center) as a part of this retrospective cohort study. Inclusion criteria included: (1) Post-menstrual age at birth ≤28 weeks; (2) At least two echocardiograms performed per standard of care after day of life 14 and at least 1 week apart; and (3) Born between 2005 and 2015. Exclusion criteria included: (1) Known congenital heart disease (except patent ductus arteriosus, patent foramen ovale, or secundum atrial septal defect); (2) Known congenital diaphragmatic hernia; or (3) Known chromosome abnormality.

The study protocol was approved by The University of North Carolina at Chapel Hill Institutional Review Board and the Duke University Medical Center Institutional Review Board with a waiver of informed consent. Study coordinators at participating sites abstracted data from the medical record. Echocardiograms were transferred to Duke Clinical Research Institute (DCRI), stripped of all protected health information, and loaded onto a central server for reading. Echocardiographic measures were obtained from two-dimensional and Doppler echocardiograms using a calibrated off-line analysis system (Digisonics Digiview, Houston, TX, USA). Data quality was assessed by the DCRI study team through frequent structured data review meetings throughout the echocardiogram-reading period. Additionally, DCRI core laboratory sonographers checked all pediatric cardiologist echocardiogram report forms for appropriateness and completeness of measurement and interpretation reports.

Echocardiographic measurements

Three pediatric cardiologists (1 senior and 2 juniors) masked to any patient-specific clinical information each interpreted all study echocardiograms using a standardized imaging protocol with an additional random 10% re-read. The pediatric cardiologists were aware of the risk of pulmonary hypertension but were not aware of any clinical information regarding the patients, including diagnoses, clinical status, treatments, and timing of the echocardiograms. End-systolic eccentricity index was used to quantify septal geometry changes related to elevated pulmonary arterial pressure [18]. Right ventricular structure and function were assessed by the following qualitative and quantitative measurements: subjective composite scoring assessment (Table 1), direction of cardiac shunts, right ventricular fractional area change percent, and tricuspid annular plane systolic excursion measured according to the pediatric quantification guidelines of the American Society of Echocardiography [19]. Measurements were obtained in triplicate from three different beats when possible and averaged [20]. Blood pressure measurement was obtained from the echocardiogram reports; blood pressure measurement method was not recorded.

Table 1 Pulmonary hypertension composite score of echocardiographic measures

Definition of pulmonary hypertension

We defined pulmonary hypertension as follows:

1. Echocardiograms with a tricuspid regurgitation jet complete Doppler envelope:

  a. Estimated right ventricular systolic pressure/systolic blood pressure ratio ≥0.67 without pulmonary outflow tract obstruction was considered pulmonary hypertension [9]. Right ventricular systolic pressures were estimated from a continuous wave Doppler tracing of the peak tricuspid valve regurgitation velocity and converted to pressure using modified Bernoulli’s equation (4 × TRJV²) without consideration of central venous pressure.

  b. A peak tricuspid valve regurgitation jet velocity >3.0 m/s was also considered consistent with pulmonary hypertension.

2. Echocardiograms without a tricuspid regurgitation jet: A composite scoring system previously developed by the study team from prior data was used (Table 1) [21]. Validation of the scoring system was not possible in this study as only one infant had a cardiac catheterization.

  a. Using the composite scoring system, a subjective right ventricular pressure estimate was assigned to all echocardiograms, regardless of presence of tricuspid regurgitation jet. For each echocardiogram, the severity level was assigned based on the echocardiogram possessing at least two of the four criteria for that severity level (e.g. eccentricity index of 1.3 and moderately depressed right ventricular function would qualify for a subjective right ventricular pressure estimate of moderately elevated). The most severe level in which the 2 of 4 criteria were met was assigned. Moderately or severely elevated pressure indices were considered consistent with pulmonary hypertension.
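The decision rules in this definition can be sketched in a few lines of code. This is an illustration under stated assumptions, not the software used in the study: the function names are hypothetical, the grade labels "normal" and "mild" are placeholders for the lower rows of Table 1 (which is not reproduced here), and the 2-of-4 rule is read literally, so that a level counts only when at least two criteria are graded at exactly that level. The sketch also assumes that either route (jet-based or composite) is sufficient for a diagnosis.

```python
from collections import Counter
from typing import Optional, Sequence

# Ordered from least to most severe. "moderate" and "severe" follow the text;
# "normal" and "mild" are hypothetical labels for the lower grades.
SEVERITY_ORDER = ("normal", "mild", "moderate", "severe")


def rvsp_from_trjv(trjv_m_per_s: float) -> float:
    """Modified Bernoulli estimate in mmHg: 4 * v^2, with no central venous pressure added."""
    return 4.0 * trjv_m_per_s ** 2


def ph_by_trjv(trjv_m_per_s: float,
               systolic_bp_mmhg: Optional[float] = None,
               outflow_obstruction: bool = False) -> bool:
    """Jet-based criteria: velocity > 3.0 m/s, or pressure ratio >= 0.67."""
    if trjv_m_per_s > 3.0:
        return True
    # The ratio criterion applies only without pulmonary outflow tract obstruction.
    if systolic_bp_mmhg is None or outflow_obstruction:
        return False
    return rvsp_from_trjv(trjv_m_per_s) / systolic_bp_mmhg >= 0.67


def composite_level(criterion_grades: Sequence[Optional[str]]) -> Optional[str]:
    """Most severe level met by at least two of the four criteria (None = not assessable)."""
    counts = Counter(g for g in criterion_grades if g is not None)
    for level in reversed(SEVERITY_ORDER):
        if counts[level] >= 2:
            return level
    return None  # no level reached by two criteria: left undetermined in this sketch


def has_ph(trjv_m_per_s: Optional[float] = None,
           systolic_bp_mmhg: Optional[float] = None,
           outflow_obstruction: bool = False,
           criterion_grades: Sequence[Optional[str]] = ()) -> bool:
    """Assumed combination: jet-based criteria OR a moderate/severe composite level."""
    by_jet = (trjv_m_per_s is not None
              and ph_by_trjv(trjv_m_per_s, systolic_bp_mmhg, outflow_obstruction))
    by_score = composite_level(criterion_grades) in ("moderate", "severe")
    return by_jet or by_score
```

For example, a jet velocity of 2.8 m/s gives an estimated pressure of about 31 mmHg; against a systolic blood pressure of 45 mmHg the ratio is about 0.70, which meets the ≥0.67 threshold even though the velocity itself is below 3.0 m/s.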

Analysis

The primary outcome measure was inter-rater agreement of the diagnosis of pulmonary hypertension among the panel of pediatric cardiologists (dichotomous variable: present or absent). We report the proportion of infants diagnosed with pulmonary hypertension using echocardiography for each pediatric cardiologist. Inter-rater agreement was measured using modified Fleiss’ kappa coefficients with bootstrap 95% confidence intervals [22].
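As an illustration of the agreement statistic, the sketch below computes a multi-rater kappa with a percentile bootstrap confidence interval. The exact modification of Fleiss’ kappa is given in [22] and is not restated here, so the form used is an assumption: a free-marginal variant in which chance agreement is fixed at 1/k for k categories. The paired percent-agreement and kappa values reported in this paper are numerically consistent with that form, but this should not be taken as confirmation of the authors’ method. The function names, the synthetic ratings, and the choice to resample echocardiograms as the bootstrap unit are also assumptions.

```python
import random
from typing import List, Sequence, Tuple


def free_marginal_kappa(ratings: Sequence[Sequence[int]], n_categories: int = 2) -> float:
    """Multi-rater kappa with chance agreement fixed at 1 / n_categories.

    ratings[i] holds the category assigned to item i by each rater.
    """
    total = 0.0
    for item in ratings:
        n_raters = len(item)
        counts = {}
        for category in item:
            counts[category] = counts.get(category, 0) + 1
        # Proportion of rater pairs that agree on this item.
        agreeing = sum(c * (c - 1) for c in counts.values())
        total += agreeing / (n_raters * (n_raters - 1))
    p_observed = total / len(ratings)
    p_chance = 1.0 / n_categories
    return (p_observed - p_chance) / (1.0 - p_chance)


def bootstrap_ci(ratings: Sequence[Sequence[int]], n_boot: int = 500,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval, resampling items with replacement."""
    rng = random.Random(seed)
    n_items = len(ratings)
    estimates: List[float] = []
    for _ in range(n_boot):
        sample = [ratings[rng.randrange(n_items)] for _ in range(n_items)]
        estimates.append(free_marginal_kappa(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic data (not study data): three raters, 1000 items,
# unanimous on 819 items and split 2-to-1 on the remaining 181.
ratings = [[0, 0, 0]] * 700 + [[1, 1, 1]] * 119 + [[0, 0, 1]] * 100 + [[1, 1, 0]] * 81
kappa = free_marginal_kappa(ratings)
ci_low, ci_high = bootstrap_ci(ratings)
```

With three raters and a binary outcome, an item on which the raters split 2-to-1 contributes a pairwise agreement of 1/3, so unanimous agreement on 81.9% of items corresponds to a mean pairwise agreement of about 0.879 and a free-marginal kappa of about 0.759.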

Three secondary outcomes were analyzed. First, the intra-rater agreement of the diagnosis of pulmonary hypertension was assessed by comparing a random sample of 10% of the echocardiograms read twice, in random order. This intra-rater agreement of the diagnosis of pulmonary hypertension was measured by percent agreement and modified Fleiss’ kappa with 95% confidence intervals. Second, the performance of the consensus panel over time was measured by comparing percent agreement and modified Fleiss’ kappa calculated at three sequential study time points (study reading period divided into thirds). Third, we assessed the internal consistency of the pulmonary hypertension composite score using Cronbach’s alpha. Cronbach’s alpha if one item deleted was also calculated to assess for redundancy in composite score items. Additionally, percent agreement and modified Fleiss’ kappa were calculated for each component of the composite score to assess inter-rater agreement within components. These measures were calculated with the sample population stratified by clinical factors (weight > or ≤ 1500 g, day of life [DOL] > or ≤ 60, PMA > or ≤ 40 weeks, mechanically ventilated or not).
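The internal-consistency calculation can be sketched as follows, using the standard formula for Cronbach’s alpha, k/(k − 1) × (1 − sum of item variances / variance of the total score), and recomputing it with each item removed in turn. The function names and the small data set are hypothetical; how the study coded each composite score component numerically is not stated, so the integer grades below are an assumption.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """scores[i][j] is the grade of item j on observation i (needs >= 2 items)."""
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(scores: Sequence[Sequence[float]]) -> List[float]:
    """Alpha recomputed with each item dropped in turn (needs >= 3 items)."""
    k = len(scores[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != dropped] for row in scores])
        for dropped in range(k)
    ]


# Hypothetical grades for four items on six observations (not study data).
# The first three items track each other; the fourth does not.
scores = [
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [2, 2, 2, 1],
    [3, 3, 3, 0],
    [1, 1, 2, 1],
    [2, 2, 1, 0],
]
alpha_full = cronbach_alpha(scores)
alpha_deleted = alpha_if_item_deleted(scores)
```

In this toy data set, alpha rises when the fourth item is dropped, which is the pattern that flags an item as fitting poorly with the rest of the scale.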

Sample size

A sample size of at least 450 echocardiograms was deemed necessary to estimate Fleiss’ kappa with a precision ≤0.05, assuming a kappa of 0.7, prevalence of pulmonary hypertension of 0.3, a type I error rate of 0.05, and three raters [23].

Results

A total of 483 echocardiograms from 49 unique patients meeting inclusion criteria were selected from two institutions and read by all three pediatric cardiologists. Forty-nine echocardiograms were randomly selected to be read a second time by each reader. Infants with the most echocardiograms at each institution were selected to minimize the number of unique patients and to include the patients most likely to carry a clinical diagnosis of pulmonary hypertension. Included patients had a median of 9 echocardiograms (range 6–29), of which a median of four per infant were obtained prior to 36 weeks PMA. Overall, 188 of 483 (38.9%) echocardiograms were from infants <36 weeks PMA.

Patient characteristics are shown in Table 2. Forty-six of the 49 infants were diagnosed with BPD, defined as oxygen use at 36 weeks’ post-menstrual age, prior to hospital discharge [1]. Overall, 12 of 49 (24.5%) were treated with nitric oxide, 11 of 49 (22.5%) were treated with sildenafil, and 19 of 49 (38.8%) were treated with one of the two.

Table 2 Patient characteristics

The clinical echocardiograms generally did not have quantifiable measures of right ventricular structure or function. Pulmonary arterial pressure estimates based on TRJV were measured in only 6% of echocardiograms (88 of 1449 total studies interpreted). The median TRJV-derived pressure estimate was 40 mmHg (range 13–133). For echocardiograms with measurable TRJV, the kappa coefficient between the TRJV-based and composite score diagnoses of pulmonary hypertension was 0.6 (95% CI: 0.42–0.79). Eccentricity index (EI) was measured by all three readers in 344 echocardiograms. The intraclass correlation for EI among the readers was 0.59. Right ventricular fractional area change was reported in 73% of echocardiograms (1059 of 1449 studies). Agreement of fractional area change among readers was very low, with an intraclass correlation of 0.17. No patients had echocardiographic evidence of pulmonary vein stenosis.

Only one infant underwent a cardiac catheterization, which demonstrated the diagnosis of pulmonary hypertension. For this infant, there was an echocardiogram 7 days prior to the catheterization, and all reviewers reported no evidence of pulmonary hypertension. Additionally, there was another echocardiogram 14 days after the catheterization, and all reviewers reported evidence of pulmonary hypertension.

Agreement among three pediatric cardiologists on the echocardiographic definition of pulmonary hypertension

Of the 483 echocardiograms reviewed, 470 (97.3%) contained results sufficient to determine the presence or absence of pulmonary hypertension using the primary study definition (elevated TRJV or composite score) from all three pediatric cardiologists and were included in the analysis. Because TRJV was rarely measurable, the composite score defined most cases of pulmonary hypertension. The proportion of echocardiograms diagnosed with pulmonary hypertension by each pediatric cardiologist was 12.3%, 5.5%, and 18.5%, respectively. Percent agreement among the panel of cardiologists for echocardiogram-based diagnosis of pulmonary hypertension was 81.9% (95% confidence interval: 78.4–85.4%), demonstrating overall very good agreement (Table 3). The modified Fleiss’ kappa was 0.759 (95% confidence interval: 0.771–0.801), reflecting substantial agreement among the cardiologists [24]. Furthermore, agreement was calculated over time to assess changes in individual reading patterns. Percent agreement among the readers was consistent throughout the study (range 79.5–83.88%), and the modified Fleiss’ kappa ranged from 0.726 to 0.784.

Table 3 Agreement among three pediatric cardiologists on echocardiographic diagnosis of pulmonary hypertension

To assess intra-rater reliability, duplicates of ~10% (49/483) of the echocardiograms were randomly incorporated into the readers’ reading lists. Percent agreement between re-read echocardiograms was 92.4% overall (range 89.8–95.7%), and modified Fleiss’ kappa was 0.847 (95% confidence interval: 0.750–0.931), demonstrating high agreement of initial interpretation with re-read echocardiograms.

Additionally, agreement among the three readers was stratified by several clinical factors (Table 4). Agreement among the readers tended to be higher for echocardiograms obtained from infants who were younger (less than 60 days old), at PMA <40 weeks, or sicker (mechanically ventilated).

Table 4 Stratified agreement by clinical factors at time of echocardiogram

Agreement on each composite score component

Table 5 demonstrates the agreement among all three pediatric cardiologists for each component of the composite score definition for pulmonary hypertension. There was a high level of agreement for the direction of shunting at the level of inter-atrial connection (percent agreement 95.8%, modified Fleiss’ kappa 0.944). The remaining three components revealed wider variability but overall good agreement (percent agreement 76.7–82.9%, modified Fleiss’ kappa 0.743–0.785). For the composite score, there was a Cronbach’s alpha for all items of 0.640. Cronbach’s alpha increased to 0.712 with the shunting item deleted (Table 5).

Table 5 Agreement and internal consistency of the composite score definition of pulmonary hypertension

Discussion

Diagnosis of pulmonary hypertension in infants with BPD is challenging. Diagnostic cardiac catheterizations are not routinely performed in the intensive care nursery due to significant risk of complications. Thus, clinicians and researchers are left to diagnose and monitor disease course with non-invasively derived estimates of pulmonary arterial pressure. Echocardiography is currently the clinical modality for the diagnosis of pulmonary hypertension and for disease monitoring. In the current study, we present the first publication to evaluate the agreement of pulmonary hypertension diagnosis from clinically acquired echocardiograms in at-risk infants using an agreed definition for pulmonary hypertension. The main findings of our study are: (1) there was a high level of inter- and intra-rater agreement in the diagnosis of pulmonary hypertension among the panel of pediatric cardiologists masked to patient-specific clinical information; and (2) there was a high level of agreement among readers in the composite score definition of pulmonary hypertension despite intermediate levels of internal consistency.

These findings are an important step forward in refining a research- and clinically-deliverable diagnosis of pulmonary hypertension. An advantage of this study approach was its pre-specified definition of pulmonary hypertension. As seen in this study and others, TRJV without consideration of central venous pressure is frequently inadequate for estimation of pulmonary pressures in this population, possibly related to low intravascular volume, as diuretic use is common in these infants [9]. Recently published work raises additional concerns about TRJV’s poor correlation with invasively derived mean pulmonary artery pressure and suggests that other echocardiographically derived estimates of mean pulmonary artery pressure may perform better than maximum TRJV [25]. Our composite score definition of pulmonary hypertension had the advantage of clear categories, recognizing that evaluation of septal geometry may be the next most important feature of the echocardiogram in defining pulmonary hypertension in infants following TRJV [15, 26,27,28]. Additionally, the inter- and intra-rater reliability of the readers’ interpretations was high. In other pediatric disease states relying on subjective assessments, inter-observer agreements in the interpretation of echocardiograms have been poor [13, 14]. Finally, the consistency of pulmonary hypertension diagnosis across the study timeline suggests that there was no significant maturation bias.

Cronbach’s alpha is a measure of a scale’s internal consistency. Values below 0.7 indicate poor internal consistency, whereas values >0.9 indicate redundancy in scale components [29]. Cronbach’s alpha is also calculated for the scale with each component deleted. As Cronbach’s alpha increased from 0.640 on the full scale to 0.712 with the shunting item deleted, this measure suggests that the shunting item should be removed, possibly due to the overly common result of shunting direction (i.e. left to right). Further work should be undertaken to improve and validate the echocardiographic definition, as it is apparent that a consensus definition is needed clinically and for therapeutic research trials to proceed.

Agreement among the three cardiologists tended to be best with the younger, smaller infants, even when the same infants had multiple echocardiograms. There are a few plausible explanations for this finding. First, as the child in the intensive care nursery ages, there is a lower likelihood of the presence of shunt direction indicators such as a patent foramen ovale or patent ductus arteriosus. As these physiologic shunts close with age or intervention, there are fewer sources for reliable indicators of relative pressure differences between the left and right ventricles in the echocardiogram. Relatedly, acoustic windows may deteriorate as a child with BPD ages due to worsening lung condition. Second, agreement may worsen as the child ages due to cardiac changes from long-standing elevated pulmonary pressure. For instance, an older infant could have a large, thick right ventricle due to past elevation in pulmonary arterial pressure, but due to successful treatment the pressure is not elevated at the time of the exam. The discordance between the appearance of the right-sided structures and the current pulmonary pressure could be another source of disagreement. Finally, younger infants may have had a greater proportion of severe disease. Thus, the relationship between younger age and greater agreement among readers may reflect greater agreement on more severe cases of pulmonary hypertension.

There are several limitations to the current study. First, the study is retrospective in nature and relied on clinically-acquired echocardiograms across two institutions with slightly different institutional echocardiogram protocols. Although this limitation is important, it also suggests that standardized research echocardiograms performed according to a strict protocol may enjoy even higher agreement and better differentiation of pulmonary hypertension. Likewise, the high degree of agreement among the readers should be interpreted within the context of adherence to study definitions of measurement and interpretations detailed in this manuscript. Second, while the composite score performed well by agreement, it did not provide distinction between two structural and physiologic states in children with pulmonary hypertension: (1) current elevation of pulmonary arterial pressure vs (2) cardiac changes (i.e. dilated right ventricle, main pulmonary artery, etc.) associated with prolonged elevation of pulmonary arterial pressure but not current elevation of pulmonary arterial pressure. Due to the lack of cardiac catheterization data on these infants, there is no feasible gold standard test to validate these diagnoses or validate the scoring system proposed in this study. Third, the selection of patients into this study increased the pretest probability of the diagnosis of pulmonary hypertension among these infants. Thus, these results should be interpreted in the context of high-risk infants with BPD. With that stated, fewer than 20% of individual echocardiograms were determined to meet the study definition of pulmonary hypertension. Finally, we had intended to compare cardiac catheterization measurements to the echocardiograms. However, only one infant meeting inclusion criteria at either institution had a cardiac catheterization performed. This underscores the relative rarity of cardiac catheterizations in the intensive care nursery, presumably due to the high risk of a procedural adverse event in small infants.

Additionally, the echocardiograms came from only 49 patients and thus may not represent independent observations. Although echocardiograms from the same infant may be correlated, the study was designed to evaluate the agreement between different reviewers on each individual echocardiogram. The readers were not aware of which studies came from the same infant. Further, we acknowledge that correlation between observations may violate the assumption of independence used in the sample size calculation. We did not quantify the correlation of agreement or its impact on the confidence intervals because statistical research on appropriate methods in this area is limited. In this study, the potential correlation of interest is between the agreement on one echocardiogram and the agreement on another echocardiogram from the same infant. As the echocardiograms reviewed for each infant in this study represent routine medical care across a variety of clinical conditions (ventilatory status, PMA, pulmonary vasodilator medication exposure), no two echocardiograms within an infant were expected to be highly similar. Within infants, a median of 9 echocardiograms over a median 18 weeks of life were included. No infant had two echocardiograms from the same day.

Conclusion

Pediatric cardiologists, masked to patient-specific clinical information, can reliably agree on the diagnosis of pulmonary hypertension by echocardiography. Further work should focus on defining and staging pulmonary hypertension in infants using prospectively acquired research echocardiograms validated with cardiac catheterization.