As national healthcare expenditures have grown, increasing attention has been paid to value (i.e., quality of health outcomes per dollar spent) across medical fields, including neonatology [1, 2]. Identifying high-yield opportunities to reduce waste and improve value can be challenging. Past efforts to set priorities for neonatal care have relied on expert opinion. For example, in 2015 the Choosing Wisely Campaign, an initiative to advance a national dialogue on avoiding unnecessary medical tests, treatments, and procedures, published its top five list in newborn care based on expert consensus without objective cost and utilization data [3]. More objective methods of priority setting are needed to focus value improvement efforts on high-yield targets. Recent work has estimated the cost of individual test and treatment categories used in neonatal care; however, cost estimates alone are insufficient for prioritization [4].

Unwarranted practice variation has been identified as a key contributor to health care inefficiency and waste [5, 6]. Prior work has established significant inter-hospital variation beyond what is expected from differences in the patient population for many neonatal practices [7,8,9,10,11]. In some instances, unwarranted practice variation has been linked to variation in patient outcomes [12,13,14]. Much of this variation likely reflects the uncertainty of evidence and provider-specific preferences [15]. These highly variable care patterns represent opportunities for reducing unnecessary and wasteful care by establishing “best practices” for use either through comparative effectiveness research or quality improvement efforts [16]. Comparative effectiveness research is clinical and epidemiological research focused on determining what health interventions, or a combination of health interventions, achieve the best outcomes [17]. Practice variation can be a useful tool in priority setting as greater variation suggests a greater opportunity for change and optimization. While practice variation has been used more broadly in pediatrics to prioritize research [18, 19], it has not been used in efforts to develop a priority-setting framework for neonatal care.

Therefore, our objectives were to estimate the inter-hospital variability of clinician-driven tests and treatments (CTTs) among very low birth weight and very preterm infants during their birth hospitalization, and create an objective prioritization framework for value-based improvements in neonatal care by combining data on cost, use and inter-hospital practice variation.


Study design

We conducted a retrospective cohort study of very low birth weight (VLBW, birth weight <1500 g) and very preterm (VP, gestational age (GA) <32 weeks) infants admitted to neonatal intensive care units (NICUs) in United States (US) children’s hospitals affiliated with the Pediatric Health Information System (PHIS) database and discharged from 2012 to 2019, to estimate the cost and inter-hospital variability of CTTs ordered during hospitalization.

Data source

PHIS is an administrative database containing hospitalization data from 51 tertiary-care children’s hospitals, maintained by the Children’s Hospital Association (Lenexa, KS). The database contains data on demographics, diagnosis and procedure codes (using International Classification of Diseases, Ninth and Tenth Revision, Clinical Modification [ICD-9, ICD-10]), and daily resource utilization. Resources at each hospital are mapped to a common set of clinical transaction codes which are organized into imaging studies, clinical services, laboratory tests, pharmacy, supplies, and other (e.g., room) charges. IBM Watson Health (Ann Arbor, MI) manages the data warehouse function for the database. Data are subjected to reliability and validity checks and must pass a specified threshold of quality before being incorporated into the database. All personal health information is deidentified within the database. A protocol was reviewed by the Baylor College of Medicine Institutional Review Board and was not considered human subjects research.

Cohort identification

Subjects were identified as either VLBW or VP by discrete data for GA and birth weight (BW), or by diagnostic code if discrete data were unavailable. Subjects less than 22 weeks’ gestation or <400 g were excluded. Subjects admitted after 1 day of age were excluded since we were unable to measure resource use at the referring hospitals. Utilization and cost were only considered for days with a NICU bed charge to exclude costs incurred in other units (e.g., pediatric intensive care units). We excluded subjects with congenital anomalies that could significantly impact care costs using ICD-9/10 diagnosis codes. Two authors (BCK, JLS) reviewed all ICD-9/10 diagnosis codes among the potential cohort and independently assigned codes for exclusion. Disagreements were settled by consensus. Excluded diagnoses included critical congenital heart defects and congenital malformations of other organ systems (renal, lung, etc.). Diagnosis codes for a patent ductus arteriosus and atrial septal defects, common diagnoses among preterm infants, were not excluded. To account for potential data entry errors, we applied several other exclusions that have been previously described [20]. We also excluded any hospital with fewer than 100 patients meeting our inclusion and exclusion criteria during the study period to ensure each included hospital had a sample size large enough to accurately estimate median utilization rates for between-hospital comparisons.

Outcomes reported

Pharmaceutical, laboratory, and imaging billing were classified into clinically relevant CTT categories (e.g., chest radiographs, antibiotics) and costs were estimated, as previously described [20]. In brief, costs are estimated from hospital charges, regionally adjusted using the CMS wage/price index and adjusted to 2019 US dollars using the producer price index for inpatient services, which is considered the best available tool for inflation of inpatient hospital costs [21]. To account for variation in billed charges across PHIS hospitals, standardized costs were applied to all encounters using the median cost for each billing item across all hospitals [18]. Exposure to a CTT category was defined as at least one charge for any component of that CTT on at least one day during their NICU stay. We defined NICU hospitalizations that were in the top ten percent of CTT-related costs as Resource Intensive NICU stays. Physician billing and procedure costs are not available within the PHIS database and were therefore not included in the analysis. The costs of nutritional fortification for enteral feeding were also not estimated because they are not routinely billed separately from the daily room and board charges.
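The cost standardization and the top-ten-percent definition described above can be illustrated with a minimal sketch. The record layout, item names, dollar values, and function names below are hypothetical and are not part of the PHIS database or of the study code; the handling of ties at the ten percent cutoff is also an assumption.

```python
from math import ceil
from statistics import median

# Hypothetical billing records: (hospital, billing item, hospital-specific cost estimate in USD).
item_costs = [
    ("A", "chest_xray", 40.0),
    ("B", "chest_xray", 60.0),
    ("C", "chest_xray", 95.0),
    ("A", "cbc", 12.0),
    ("B", "cbc", 20.0),
]


def standardized_cost_table(records):
    """Return the median cost of each billing item across all hospitals."""
    by_item = {}
    for _hospital, item, cost in records:
        by_item.setdefault(item, []).append(cost)
    return {item: median(costs) for item, costs in by_item.items()}


def flag_resource_intensive(stay_costs, top_percent=10):
    """Return the IDs of stays in the top `top_percent` of total CTT-related cost.

    stay_costs maps a stay ID to its total standardized CTT-related cost.
    Assumption: the cutoff count is rounded up and ties are broken by sort order.
    """
    n_top = ceil(len(stay_costs) * top_percent / 100)
    ranked = sorted(stay_costs, key=stay_costs.get, reverse=True)
    return set(ranked[:n_top])
```

Applying one median-based cost per billing item to every encounter means that differences in cost between hospitals reflect how often an item is used rather than how each hospital prices it.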

Severity of illness

We assessed patient-level severity of illness using a length-of-stay-based relative weight approach. We adapted the Hospitalization Resource Intensity Score for Kids (H-RISK) by restricting the all-patient refined diagnosis-related groups (APR-DRGs; 3M Health Information Systems, Salt Lake City, UT) to only those admitted to PHIS NICUs and by basing our calculation of relative weights on NICU days [22]. These relative weights were calculated as the ratio of mean NICU LOS for each APR-DRG relative to the overall mean LOS among all infants admitted to NICUs included in PHIS. Mean values were Winsorized (i.e., extreme high and low outlying values were replaced with the 95th and 5th percentiles respectively) and NICU mortalities were excluded to minimize the influence of extreme outliers on relative weight calculations. Relative weights (the NICU-SOI score) were then applied to our cohort based on the APR-DRG assigned to each NICU stay. A NICU-SOI score greater than 1 means that the NICU stay was assigned an APR-DRG with a mean NICU LOS that is longer than the average NICU encounter for included NICUs. Increasing values of the NICU-SOI score indicate increasing mean NICU LOS for the assigned APR-DRG, suggestive of higher severity of illness.
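The relative weight calculation can be sketched as follows. This is an illustration only: the text does not state whether Winsorization was applied within each APR-DRG or across the whole population, so the sketch assumes it is applied to whichever set of LOS values is being averaged. The function names and the input layout are hypothetical.

```python
from statistics import mean, quantiles


def winsorize(values, lower_pct=5, upper_pct=95):
    """Replace values below the 5th or above the 95th percentile with those percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 percentile cut points
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


def relative_weights(los_by_drg):
    """Return the ratio of mean NICU LOS per APR-DRG to the overall mean NICU LOS.

    los_by_drg maps an APR-DRG label to the NICU LOS (days) of its surviving infants.
    Each group needs at least two values for the percentile calculation.
    """
    all_los = [days for group in los_by_drg.values() for days in group]
    overall_mean = mean(winsorize(all_los))
    return {
        drg: mean(winsorize(group)) / overall_mean
        for drg, group in los_by_drg.items()
    }
```

A weight above 1 then marks an APR-DRG whose mean NICU LOS is longer than the average NICU encounter, which matches the interpretation of the NICU-SOI score given above.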

Statistical analysis

We summarized categorical variables using frequencies and percentages, non-normally distributed continuous variables using medians and interquartile range (IQR), and normally distributed variables using mean and standard error (SE). Demographics in PHIS include sex, GA, birth weight, age at admission, admission source, race/ethnicity, insurance type, median household income, and disposition at hospital discharge (including mortality). We compared demographics and the NICU-SOI score between resource-intensive NICU stays and resource mild/moderate NICU stays using a chi-square test of association for categorical variables, a Wilcoxon rank-sum test for non-normally distributed continuous variables, and a two-sample t-test for normally distributed continuous variables. Multivariable generalized linear mixed models (GLMM) were used to identify demographic and clinical characteristics associated with overall CTT-related spending and the odds of having a resource-intensive NICU stay. Covariates in the model were the demographics listed above and our calculated NICU-SOI score. All GLMMs included a random hospital effect to account for the clustering of NICU patients at the same hospital.

We assessed hospital-to-hospital variation in total CTT-related costs by estimating the intraclass correlation coefficient (ICC) as the percentage of total variation attributable to hospital variation after adjusting for patient demographics and our NICU-SOI. Race/ethnicity was included as a variable because of its association with quality of care [23]. We identified high and low outlying hospitals for total adjusted per-patient CTT-related costs and specific billing group (pharmaceutical, laboratory, imaging) costs by comparing hospital medians to the cohort overall median and IQR. High- and low-cost outlier hospitals were defined as hospitals with a median adjusted per-patient CTT-related cost greater than the population third quartile or less than the population first quartile, respectively.
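As an illustration of these two definitions, the sketch below computes an ICC from variance components and labels hospitals as outliers by comparing each hospital median with the population quartiles. It assumes the usual variance-ratio form of the ICC for a model with a random hospital intercept, and it works on unadjusted costs for simplicity, whereas the study used adjusted costs. All names are hypothetical.

```python
from statistics import median, quantiles


def icc(hospital_variance, residual_variance):
    """Share of total variance attributable to between-hospital variation."""
    return hospital_variance / (hospital_variance + residual_variance)


def classify_hospitals(costs_by_hospital):
    """Label each hospital 'high', 'low', or 'typical' relative to the population IQR.

    costs_by_hospital maps a hospital ID to its per-patient CTT-related costs.
    """
    all_costs = [c for costs in costs_by_hospital.values() for c in costs]
    q1, _q2, q3 = quantiles(all_costs, n=4, method="inclusive")
    labels = {}
    for hospital, costs in costs_by_hospital.items():
        hospital_median = median(costs)
        if hospital_median > q3:
            labels[hospital] = "high"
        elif hospital_median < q1:
            labels[hospital] = "low"
        else:
            labels[hospital] = "typical"
    return labels
```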

Next, we created a variability index that estimates hospital-to-hospital variation in utilization and allows for direct comparisons across CTT categories. The variability index was derived to capture two types of inter-hospital variation: variation in the proportion of patients exposed to a given CTT category at least once (“exposure variability”), and variation in utilization of the CTT category among those exposed (“utilization variability”). Exposure variability was calculated using the adjusted percent exposure in the entire cohort (i.e., mean exposure) and the measured spread of adjusted hospital-specific exposures around the mean using standard distances. Exposure variability was estimated by calculating the standard deviation of those standard distances. This approach estimates variability in the proportion of infants exposed for each CTT category at each hospital beyond what would be expected to occur by chance when sampling from the overall population. Utilization variability was calculated using CTT-related costs, which act as a surrogate for utilization in this case because the PHIS database applies an average cost estimate to all patients across included hospitals. Higher costs on a given hospital day (and/or a greater number of hospital days with related costs) mean more utilization of that CTT category. Utilization variability was estimated by calculating the coefficient of variation (CV) of the adjusted hospital mean total costs among exposed patients. The variability index for a CTT category was then calculated as the standardized Euclidean distance of the exposure and utilization variabilities described above.
To minimize the impact of hospital-to-hospital variation in exposure due to differences in billing patterns (rather than variation in physician and hospital practice patterns), we excluded hospitals from an individual CTT category variability index calculation if their hospital exposure rate was more than four times or less than one-quarter of the overall population exposure rate for a given CTT category.
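One possible reading of the variability index is sketched below. Several details are assumptions, not the authors' stated method: the "standard distance" is taken to be a z-score of each hospital's exposure proportion against the binomial sampling error expected at the population rate, the sample standard deviation is used throughout, and each component is standardized by dividing by its standard deviation across CTT categories. Function names and input layouts are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def exposure_variability(hospitals, overall_rate):
    """SD of standardized distances of hospital exposure rates from the overall rate.

    hospitals is a list of (exposure_rate, n_patients) pairs. Hospitals whose rate
    is more than four times or less than one-quarter of the overall rate are dropped.
    """
    kept = [(p, n) for p, n in hospitals
            if overall_rate / 4 <= p <= overall_rate * 4]
    z_scores = [
        (p - overall_rate) / sqrt(overall_rate * (1 - overall_rate) / n)
        for p, n in kept
    ]
    return stdev(z_scores)


def utilization_variability(hospital_mean_costs):
    """Coefficient of variation of hospital mean costs among exposed patients."""
    return stdev(hospital_mean_costs) / mean(hospital_mean_costs)


def variability_index(categories):
    """Standardized Euclidean distance of (exposure, utilization) variability.

    categories maps a CTT category name to (exposure_var, utilization_var).
    """
    sd_exposure = stdev([e for e, _ in categories.values()])
    sd_utilization = stdev([u for _, u in categories.values()])
    return {
        name: sqrt((e / sd_exposure) ** 2 + (u / sd_utilization) ** 2)
        for name, (e, u) in categories.items()
    }
```

Dividing each component by its standard deviation puts exposure and utilization variability on a comparable scale before they are combined, so neither dominates the index because of its units.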

Lastly, we developed an overall prioritization score for each CTT category by calculating the standardized Euclidean distance (from the origin) based on three factors: total adjusted costs, the proportion of patients exposed, and variability. A flowsheet of the methodology to create the prioritization score is included in the online supplement (Supplemental Fig. 2). All components were standardized using standard deviations to mitigate the influence of any one component on the overall distance calculation. Larger prioritization score values indicate greater costs, higher volumes, or higher hospital-to-hospital variation (weighted equally). P-values less than 0.05 were considered statistically significant. All data management and analyses were conducted using SAS v9.4 (SAS Institute, Cary, NC).
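Written out, a prioritization score of this kind could take the following form. The notation is introduced here for illustration and is an assumption: for CTT category c, C_c is the total adjusted cost, E_c the proportion of patients exposed, V_c the variability index, and s_C, s_E, s_V are the standard deviations of each component across all CTT categories. Dividing by the standard deviation without centering is assumed because the distance is measured from the origin.

```latex
\[
P_c \;=\; \sqrt{\left(\frac{C_c}{s_C}\right)^{2}
          + \left(\frac{E_c}{s_E}\right)^{2}
          + \left(\frac{V_c}{s_V}\right)^{2}}
\]
```

Because each term is squared and summed with unit weight, a category can rank highly through any one of cost, exposure, or variability, which is consistent with the equal weighting described above.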


Cohort demographics and risk of resource-intensive NICU stay

We identified 26,098 subjects across 40 children’s hospitals contributing 1,373,883 total NICU days which met our inclusion and exclusion criteria. Ten hospitals were excluded for low patient volume, and one was excluded because it did not provide consistent data during the study period, with a gap in annual neonatal admissions recorded. A flow diagram with our inclusion and exclusion criteria is included (Supplemental Fig. 1). Patient demographics for the entire cohort are summarized in Table 1. On multivariable logistic regression analysis, decreasing GA, male sex, Black race, outborn admission, and a higher NICU-SOI score were all associated with significantly higher odds of a resource-intensive NICU stay, while mortality and self-pay insurance (compared to commercial insurance) were associated with lower odds of a resource-intensive NICU stay (Table 1). On sensitivity analysis, when our NICU-SOI score was excluded, the odds of a resource-intensive NICU stay increased as the birth weight category decreased (Supplemental Table 1). A secondary analysis using linear regression and total CTT-related costs showed similar results (Supplemental Table 2).

Table 1 Cohort demographics and adjusted odds ratio of a resource-intensive NICU stay.

Inter-hospital variation in total and billing group adjusted CTT-related costs

Inter-hospital variation in total and billing group adjusted CTT-related costs is reported in Fig. 1. The ICC, an estimate of the percentage of variation explained by inter-hospital variation (as opposed to patient-level severity of illness, defined by our model to include demographic variables and the LOS-based case mix index), was 27.7% for total CTT-related costs. Six hospitals (15%) were high-cost outliers in total adjusted CTT-related costs, defined as having a median cost greater than the third quartile of the overall population. The median per-patient adjusted CTT-related costs at those six hospitals were all more than twice the overall population median ($17,801 for the lowest of the high-cost outliers, compared to $7942 for the population median). Among the three billing groups, pharmaceutical adjusted CTT-related costs had the widest variation across hospitals. The six high-cost outlier hospitals among pharmaceutical adjusted CTT-related costs were the same six high-cost outliers in overall spending. Laboratory and imaging adjusted CTT-related costs had five and four outlier hospitals, respectively, but the overall differences in spending among those billing groups were smaller than that for pharmaceutical spending (Fig. 1b).

Fig. 1: Variability in median (IQR) adjusted per-patient total CTT-related costs (A) and billing group heat map (B) across included US Children’s Hospitals.

Hospital outliers were defined as hospitals with median adjusted costs above or below the interquartile range of the entire population. Within the billing group heat map (B), rows represent individual children’s hospitals, and columns are median adjusted per-patient cost estimates for each billing group and total CTT-related costs. ᵃ Intraclass correlation coefficient estimates the amount of total variation due to inter-hospital variation after accounting for patient demographics and illness severity.

Prioritization of CTTs

Adjusted total cost and descriptive measures of CTT variability are reported in Table 2, ranked by their prioritization score. Parenteral nutrition, chemistries, and anticoagulants were the costliest CTT categories, responsible for a combined total cost of $111,373,888 (40% of the cumulative cost of all CTT categories included). Chest radiographs were the costliest imaging CTT ($14,852,629) but ranked fifth in total cost across all billing groups. Exposure variability and utilization variability for each CTT category are shown on a scatter plot (Fig. 2). Based on our calculated inter-hospital variability index, which combines those two variability estimates, anticoagulants, glucose monitoring, and hematology laboratory tests were the most variable overall. The imaging CTT with the highest inter-hospital variability was abdominal radiographs, but they were ranked eighth overall based on the inter-hospital variability index.

Table 2 Prioritization of clinician-driven tests and treatments.
Fig. 2: Scatter plot of exposure and utilization variability for CTT categories.

ᵃ Exposure variability is the standard deviation of the standard distances of adjusted hospital exposure proportions from the mean population exposure proportion for each CTT category. ᵇ Utilization variability is the coefficient of variation of the adjusted hospital mean costs per exposed patient. ᶜ Utilization variability for anticoagulants exceeded the x-axis limit of the larger figure. Three other CTT categories are repeated in the inset for reference.

The components of our prioritization score (total cost, variability, population exposure) are represented by a bubble chart that plots the inter-hospital variability index by the total cost (Fig. 3). The top 3 CTT categories with the highest prioritization scores were parenteral nutrition, anticoagulants, and hematology, which together were responsible for 33% of the cumulative cost of all included CTT categories. Of the top 10 CTT categories for prioritization, three are pharmaceuticals, five are laboratory testing categories, and two are imaging tests. Combined, these ten categories accounted for 66% of the cumulative cost of all included CTT categories.

Fig. 3: Prioritization framework for value-driven comparative effectiveness research and quality improvement.

The size of each bubble represents the percentage of the total cohort exposed to each CTT category. ᵃ Adjusted for patient demographics (sex, gestational age, birth weight, age at admission, admission source, race/ethnicity, insurance type, median household income, and disposition at hospital discharge) and our NICU-SOI score. ᵇ Inter-hospital variability index is the standardized Euclidean distance of exposure and utilization variability. ᶜ Parenteral nutrition, anticoagulants, chemistries, and blood gases exceeded the y-axis limit for the larger figure.


We report the first value-based prioritization framework for comparative effectiveness research and quality improvement initiatives in the care of preterm infants. Combining estimates of cost, exposure, and inter-hospital variability into a single prioritization score, we ranked test and treatment categories to identify targets for further research with the highest potential for improving the value of neonatal care. Among a cohort of 26,098 infants across 8 years, the top 10 high priority CTT categories were responsible for $185,820,182 in costs (66% of all costs from included CTT categories), and include many commonly used tests and treatments in neonatal care, suggesting value-based improvements should focus on optimizing our approach to routine neonatal care. The use of parenteral nutrition was identified as the highest priority overall, followed by anticoagulants (including use for central line patency), and a number of commonly used laboratory test categories (hematology, glucose monitoring, chemistries, blood gases).

Estimates of variability have been used previously to prioritize value-driven efforts. Lee et al. measured cost variability to prioritize a value-driven outcomes program and found prematurity had the third-highest variation in direct costs among inpatient and outpatient diagnoses, highlighting the importance of focusing on variability in preterm infant care [24]. Keren et al. and Cameron et al. used variation in total cost to establish priorities for comparative effectiveness research among inpatient pediatric diagnoses and pediatric surgical diagnoses, respectively [18, 19]. These studies were not designed to identify specific drivers of cost variation within each population. Our study similarly uses a value-based priority-setting approach but is novel in identifying the key drivers of resource-related cost and variability within a specific neonatal population. This is in line with work done by the Providence St. Joseph Health system, which has used detailed data on practice variation to drive its value-oriented architecture program to improve outcomes and reduce costs within specific clinically relevant patient populations [25].

Practices identified by our prioritization score may represent potential opportunities for targeted deimplementation of routine tests and treatments. Deimplementation science is the process of identifying low-value services that can be safely eliminated or reduced in practice [26, 27]. While deimplementation may involve broadly eliminating wasteful practices, our high-priority targets require a more nuanced approach. For example, parenteral nutrition had the highest prioritization score overall, due to its high cost, frequent use, and wide variability. The optimal use of parenteral nutrition and feeding practices in preterm populations are areas of uncertainty and active study [28,29,30]. Prioritizing parenteral nutrition as a target for value-based practice improvement should focus on identifying specific opportunities for reduction, such as older stable preterm infants who may safely tolerate faster feeding advancement, rather than broad elimination strategies. A focus on targeted faster feeding advancement would also address anticoagulation, another category that ranked highly on our prioritization score, as it would have the potential to reduce central line days. Similar efforts would be needed for other high-priority CTT categories, such as commonly used laboratory tests and imaging studies, which may not be universally wasteful but are potentially overused.

While our prioritization framework focuses on cost and variability, another important factor when setting priorities is the reduction of unnecessary harm. Direct harms, such as side effects that result from medications, are easier to identify through traditional methods of study. However, indirect harms from unnecessary testing can also be significant but are generally more difficult to identify. Cascades of care, in which unnecessary testing leads to further wasteful and harmful downstream care pathways, have been described in other specialties [31,32,33]. In addition, invasive procedures have been associated with abnormalities on brain magnetic resonance imaging and lower IQ in preterm infants [34, 35]. While indirect harms from excess testing are less frequently discussed in neonatology, quality improvement efforts aimed at reducing unnecessary testing are in development and would directly address many of the top 10 CTT categories based on our prioritization score [36]. Future comparative effectiveness research and quality improvement efforts focused on optimizing testing patterns should consider the potential for indirect harm from excess testing.

Our study has several limitations. The PHIS database has a data quality control program to minimize the risk of data errors. We further minimized the risk of misclassification by applying exclusion criteria to improve data quality. However, using billing patterns to estimate variation in utilization may partly reflect differences in coding systems between hospitals. The PHIS database robustly combines different hospital billing definitions into a unified coding system and we further reduced miscoding risk by excluding extreme outlier hospitals, which may represent differences in billing practices that were unaccounted for. Cost-to-charge ratios are a commonly used tool to estimate cost, used by many large administrative databases like the National Inpatient Sample and the Kids Inpatient Database, and many economic evaluations [37,38,39,40]. Despite their prevalent use, they are not a precise estimate of cost [41]. The PHIS cost master index uses the mean of all hospital and department-specific estimated costs for each billable item to make dollar values directly comparable [15]. This adjustment should be considered when interpreting overall cost estimates. Our prioritization framework specifically and purposefully focused on potentially modifiable costs from clinician-ordered tests and treatments. Daily room costs have been shown to be the largest component of costs during birth hospitalization, and therefore the length of stay is a principal driver of cost [4]. Reducing the unnecessary length of stay is also critical to reducing costs and improving value in neonatal care.

There are also important limitations to the generalizability of these findings. The PHIS database is comprised of freestanding US children’s hospitals, which may not fully reflect practice patterns in other settings such as community birth hospitals where preterm infants often receive care. Higher-level neonatal units (commonly found within children’s hospitals) may spend more on patient care, even after adjusting for GA, outborn status, and patient mix [42]. This could bias our sample towards more costly care and may underestimate variability. Based on our specific inclusion criteria, our findings do not generalize to newborn populations with congenital anomalies, or to other NICU patient populations including late preterm and term infants. Similar analyses among those populations should be conducted to establish value-based improvement priorities for their care.


We established a value-based prioritization framework for comparative effectiveness research and quality improvement based on cost, inter-hospital variability, and degree of exposure to different tests and treatment categories among VP and VLBW infants cared for in US children’s hospital NICUs. We identified parenteral nutrition, anticoagulation, intravenous fluids, and frequently used laboratory and imaging modalities as top priorities for comparative effectiveness research and quality improvement efforts to increase the value of neonatal care.