Electronic health records (EHRs) are increasingly used for a variety of research purposes, including health services access and utilization, machine learning, and cost-effectiveness [1,2,3,4]. Procedure and diagnostic codes are often used as a surrogate for the clinical outcomes of interest because they are routinely collected, easily accessible, and cover a wide range of clinical conditions. For example, the most recent version of the International Classification of Diseases (ICD), 10th revision, provides over 70,000 procedure codes and over 69,000 diagnosis codes [5], and ICD codes have previously been used to investigate questions in neonatal and perinatal medicine [6,7,8,9].

However, ICD codes are collected primarily to facilitate billing and reimbursement, and the financial dynamics within the healthcare system may introduce systematic biases in code assignment [10, 11]. Though other research has shown that these codes often do not reflect clinical definitions of certain disease processes [10], it is unknown how sensitive or specific ICD codes are for many conditions in pediatric populations. The degree to which ICD codes reflect a patient’s true clinical state is increasingly important as “big data” efforts utilizing claims and EHR data expand into new areas of medicine and healthcare [12]. If ICD codes do not reflect the actual clinical state of a patient, statistical models and machine learning algorithms trained on billing code data could be of little clinical utility or, worse, actively misleading or harmful.

Bronchopulmonary dysplasia (BPD), defined most often as supplemental oxygen use at 36 weeks’ postmenstrual age (PMA), is one of the most common morbidities experienced by infants born at less than 28 weeks gestational age [13]. Infants with BPD are often born at lower gestational ages and lower birth weights and have more complicated neonatal courses. A diagnosis of BPD has been associated with higher healthcare utilization post discharge and increased respiratory morbidity into early childhood [14]. Recently, the utility of the current definition of BPD as a strong biomarker for later respiratory problems has been questioned [15, 16]. Understanding the etiology and risk factors for this condition remains an important goal in neonatal research. One challenge has been the continuing evolution of the definition of BPD since its original description by Northway in 1967 [17] to the more recent NIH Consensus definition in 2017 [18]. In addition, there is a lack of consistency and standardization of the clinical definition across clinical environments [11, 19, 20], rendering systematic studies across institutions and populations difficult.

An increasingly popular approach to the study of BPD, and of many other diseases, is to leverage the large amounts of data contained in EHRs and administrative claims databases. These databases are often created to facilitate patient care and billing but are now routinely used for research purposes. However, it is not yet known whether clinical BPD is captured with enough fidelity in these types of databases to allow for reliable and meaningful clinical conclusions. Therefore, in this study we assessed the extent to which ICD-9 and ICD-10 codes for BPD reflect the clinical definition in two distinct data sources, a single-center neonatal intensive care unit and a large insurance claims database, to evaluate their potential as a reliable marker for the disease.

Study design

Single institution, EHR data source

We identified infants at high risk of developing BPD (<29 weeks gestational age, admitted with a respiratory requirement) at a single institution in the Vermont Oxford Network (VON, [21]) from 2015 to 2018. VON identifiers were matched with internal medical record numbers of individual patients to enable chart review for each patient. Patients were excluded if they were transferred or discharged prior to 36 weeks’ PMA or died prior to 36 weeks’ PMA. A chart review for each patient was completed to determine oxygen and ventilator requirement at both 28 days after birth and 36 weeks’ PMA. Demographic characteristics were examined including gestational age, birth weight, and sex. Race and ethnicity were not available for the majority of the patients in the chart review.

Three definitions of BPD were used to assess the sensitivity and specificity of the ICD-10 codes: (1) the VON definition determined by the presence of a BPD diagnosis in the VON database, which was considered the gold standard BPD diagnosis, (2) oxygen support at 36 weeks’ PMA determined through chart review, and (3) either oxygen use at 36 weeks’ PMA or a pressure requirement (high flow nasal cannula, CPAP, or mechanical ventilation) in room air (FiO2 0.21) at 36 weeks’ PMA also determined through chart review in our single-center dataset.

We then determined whether each infant had an ICD-10 code related to BPD in their medical record. After review of the neonatal ICD-10 codes related to BPD, we selected P27.1 (BPD originating in the perinatal period), P27.9 (unspecified chronic respiratory disease originating in the perinatal period), P28.89 (other specified respiratory conditions of the newborn), and P28.9 (respiratory condition of the newborn, unspecified). These four codes fall within the ICD-10 chapter P00-P96 (certain conditions originating in the perinatal period), in the block covering respiratory and cardiovascular disorders specific to the perinatal period. An additional code, P28.5 (respiratory failure of the newborn), was not included in this dataset because every infant with that code also had one of the four previously selected codes.

Single-center database analysis

Two-by-two tables were then constructed for each definition of BPD and the presence or absence of the code P27.1 and then the presence or absence of any of the four ICD-10 codes to calculate sensitivity and specificity for the ICD codes.
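The two-by-two calculation described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R code; the names `TwoByTwo` and `tabulate` and the toy input lists are hypothetical and carry no study data. It cross-tabulates the presence of a code against one reference BPD definition and derives sensitivity, specificity, and the predictive values from the four cells.

```python
import math
from dataclasses import dataclass
from typing import Iterable


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of a two-by-two table: ICD code (test) versus reference BPD definition."""
    tp: int  # code present, reference definition met
    fp: int  # code present, reference definition not met
    fn: int  # code absent, reference definition met
    tn: int  # code absent, reference definition not met

    @property
    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def npv(self) -> float:
        return _ratio(self.tn, self.tn + self.fn)


def tabulate(has_code: Iterable[bool], has_bpd: Iterable[bool]) -> TwoByTwo:
    """Count the four cells from paired per-infant flags."""
    tp = fp = fn = tn = 0
    for code, bpd in zip(has_code, has_bpd):
        if code and bpd:
            tp += 1
        elif code:
            fp += 1
        elif bpd:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical toy data (six infants), for illustration only.
toy_has_code = [True, True, True, False, False, True]
toy_has_bpd = [True, True, False, False, True, False]
table = tabulate(toy_has_code, toy_has_bpd)
print(table.sensitivity, table.specificity, table.ppv, table.npv)
```

Repeating the tabulation once per reference definition (VON-coded, oxygen at 36 weeks’ PMA, oxygen or pressure support) and once per code set (P27.1 alone, any of the four codes) would yield one table per combination.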

In the single-site data, we calculated the sensitivity and specificity of the lung-specific ICD-10 codes for a diagnosis of BPD as defined by the three previously listed definitions. We also determined the day of life on which each infant first received an ICD-10 code for BPD. We excluded ICD-10 codes that were assigned before the infant’s date of birth (n = 23).

Insurance claims data source

The infants in this portion of the analysis were drawn from a national, deidentified administrative database of ~45 million individuals with a commercial insurance plan from a single commercial provider from January 2008 to February 2018. For intensive care patients (including the neonatal ICU), claims and billing are recorded daily and have both ICD diagnostic codes and provider billing codes attached to reflect the services the patient received on a given day of life. From this cohort, we identified a subpopulation of infants with BPD on the basis of ICD-9 and ICD-10 codes. For patients born before October 1, 2015 (the transition date from ICD-9 to ICD-10), the ICD-9 code 770.7 (chronic respiratory disease arising in the perinatal period) was used. Patients born after October 1, 2015, were identified using the ICD-10 code P27.1 (see above). Gestational age and birth weight were estimated using ICD-9 codes (765.00–765.29) and ICD-10 codes (P07.00–P07.39, P08.00–P08.22). In addition, we queried the date of birth as well as the date of the first appearance of the ICD-9 or ICD-10 BPD code [5].
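The era-dependent code selection described above can be sketched as follows. This is a hedged Python illustration rather than the authors' query: the function names are hypothetical, and treating a birth on October 1, 2015 itself as belonging to the ICD-10 era is an assumption, since the text only specifies births before and after that date.

```python
from datetime import date
from typing import Iterable

ICD10_TRANSITION = date(2015, 10, 1)  # transition date from ICD-9 to ICD-10
ICD9_BPD_CODE = "770.7"               # chronic respiratory disease arising in the perinatal period
ICD10_BPD_CODE = "P27.1"              # BPD originating in the perinatal period


def bpd_code_for_birth_date(date_of_birth: date) -> str:
    """Pick the BPD code that applies to an infant's birth date.

    Assumption: births on the transition date itself use the ICD-10 code.
    """
    if date_of_birth >= ICD10_TRANSITION:
        return ICD10_BPD_CODE
    return ICD9_BPD_CODE


def has_bpd_code(date_of_birth: date, codes: Iterable[str]) -> bool:
    """True if the infant's claims contain the BPD code for their era."""
    return bpd_code_for_birth_date(date_of_birth) in set(codes)


# Hypothetical example: an infant born in 2016 with two codes on their claims.
print(has_bpd_code(date(2016, 1, 5), ["P27.1", "P07.30"]))
```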

Large insurance claims analysis

In the large insurance database, we determined the day of life the first ICD-9 or ICD-10 code for BPD was assigned for each infant. We excluded all infants with a BPD code dating before their date of birth (n = 7 for ICD-9; n = 4 for ICD-10).
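The day-of-life calculation and the exclusion rule can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the function names are hypothetical, the record layout (a date of birth paired with the dates on which a BPD code appears) is an assumption, and counting the date of birth as day 0 is also an assumption, since the text does not state the convention.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple


def first_code_day_of_life(date_of_birth: date,
                           code_dates: Iterable[date]) -> Optional[int]:
    """Days from birth to the earliest BPD code (0 = date of birth).

    Returns None when the infant has no BPD code; a negative value means the
    earliest code predates the recorded date of birth.
    """
    dates = list(code_dates)
    if not dates:
        return None
    return (min(dates) - date_of_birth).days


def days_of_first_code(
    records: Iterable[Tuple[date, Iterable[date]]]
) -> Tuple[List[int], int]:
    """Return the day of first code per retained infant and the number excluded.

    Infants whose earliest code predates their date of birth are excluded.
    """
    days: List[int] = []
    n_excluded = 0
    for date_of_birth, code_dates in records:
        day = first_code_day_of_life(date_of_birth, code_dates)
        if day is None:
            continue  # no BPD code, not part of this analysis
        if day < 0:
            n_excluded += 1
            continue
        days.append(day)
    return days, n_excluded


# Hypothetical records, for illustration only.
toy_records = [
    (date(2016, 1, 1), [date(2016, 1, 1), date(2016, 2, 1)]),
    (date(2016, 1, 1), [date(2016, 1, 29)]),
    (date(2016, 1, 1), [date(2015, 12, 30)]),
    (date(2016, 1, 1), []),
]
print(days_of_first_code(toy_records))
```

The resulting list of days is what would be binned to produce a histogram of first code assignment by day of life.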

All analyses were conducted using the R statistical programming language [22]. The Harvard Medical School Institutional Review Board and Partners Healthcare System Institutional Review Board waived the requirement for approval, as they deemed this analysis of the database not to be human subjects research.


Single institution, EHR results

Patient characteristics

Data were collected on a total of 213 infants from 2015 to 2018, and 47 infants were excluded who were transferred, discharged, or died prior to 36 weeks’ PMA. In the final cohort of 166 infants, 52% of the sample was male, gestational age ranged from 22 completed weeks (1% of the total population) to 28 completed weeks (25% of the total population), and birth weight ranged from 380 to 1560 g with a mean birth weight of 880 g (Table 1).

Table 1 Patient characteristics from single-site data.

BPD definitions and prevalence

We determined the number of infants with each of the three BPD definitions: (1) presence of a VON-coded definition (single-center EHR) for BPD (n = 112), (2) clinically (single-center EHR) receiving oxygen at 36 weeks’ PMA (n = 82), and (3) clinically (single-center EHR) receiving positive pressure support (high flow nasal cannula, CPAP, or mechanical ventilation) in room air or receiving oxygen support at 36 weeks’ PMA (n = 95). There were 18 infants missing a VON-coded definition for BPD who clinically qualified as having BPD through chart review. Of infants with a clinical diagnosis of BPD, 36 were missing a code for the VON diagnosis. The prevalence of BPD according to the VON definition in this cohort was 67% (112/166; Table 1). The prevalence of BPD according to oxygen support at 36 weeks’ PMA was 49% (82/166). Finally, the prevalence of BPD according to oxygen support and/or positive pressure support was 57% (95/166).

Attributes of ICD-10 codes

Four ICD codes for BPD or chronic lung disease were identified (P27.1, P27.9, P28.89, P28.9). The sensitivity and specificity were evaluated according to all three BPD definitions, both for the ICD code P27.1 alone and for the presence of any of the four predetermined ICD codes. The sensitivity for the presence of any ICD-10 code according to the VON-coded definition (gold standard) was 93% and the specificity was 36%. These results are summarized in Table 2. The positive predictive value (PPV) and negative predictive value (NPV) for the ICD-10 code P27.1 were calculated with respect to the gold standard VON-coded definition. The PPV was 73% and the NPV was 48%.

Table 2 Sensitivity and specificity for three BPD definitions in single-site data.

Discordance among definitions

There were 13 infants without a clinical diagnosis of BPD who received a VON-coded definition for BPD. Four of these infants were receiving pressure support with either bubble CPAP or high flow nasal cannula. Of these 13 infants, 11 had a code for P27.1 and 4 infants had more than one respiratory code including P28.5, P27.9, or P28.89. One infant had a clinical definition of BPD but did not have a VON code. This infant received a P27.9 code (unspecified chronic respiratory disease in the perinatal period).

Date of code assignment

Finally, we examined the date of assignment of the ICD-10 code in relation to the infant’s date of birth. A total of 105 infants had an ICD code assigned on or after their date of birth. A histogram showing the day of life when the ICD-10 code for BPD (P27.1) was first assigned for this cohort is shown in Fig. 1. We found that 77/105 infants (73%) were assigned one of the four respiratory codes on their date of birth. Overall, 23/105 infants (22%) had an ICD-10 code assigned after reaching 36 weeks’ PMA.

Fig. 1: Day of life ICD code assigned. In the single-center dataset, this represents the number of infants with an ICD code indicating BPD in a given category of day of life.

Insurance claims database

We identified 7887 infants who received an ICD-9 or ICD-10 code indicating BPD. The distributions of gestational age, birth weight, and sex are given in Table 3, and a histogram showing when each infant first received a code for BPD is shown in Fig. 2. The detailed numbers can be found in the Supplementary Table. We found that the most common day for the first appearance of this code was the infant’s day of birth (ICD-9: 1480/5790, 25.6%; ICD-10: 614/2097, 29.3%). The distribution of the first assignment of this code was qualitatively similar across both ICD-9 and ICD-10.

Table 3 Patient demographics for infants in the insurance claims database who received an ICD-9 or ICD-10 code for BPD.

Fig. 2: Day of life ICD code assigned. In the insurance claims dataset, these panels represent the number of infants with an ICD code indicating BPD in a given category of day of life. The panels show the ICD-9 categories and ICD-10 categories.


EHRs contain a wealth of information that can be utilized for epidemiological and health services research. However, care should be taken when using EHR-based databases without understanding exactly how the data were extracted and how precisely the information can identify the patient populations of interest. In the current study, we aimed to determine the sensitivity and specificity of ICD-9 and ICD-10 codes in diagnosing BPD using two distinct sources of data. We found that neonatal respiratory ICD-10 codes had excellent sensitivity but poor specificity in identifying infants with a clinical diagnosis of BPD, regardless of the classification used. This likely reflects the lack of granularity in the ICD-10 codes for neonatal respiratory conditions, leading to early application and overutilization of certain respiratory codes.

The variability in the definitions for BPD also contributes to a lack of specificity for the ICD-10 codes. It is well documented in the literature that the way BPD is defined changes the prevalence of BPD in neonatal populations [23]. In our data, the prevalence of BPD ranged from 49 to 67% depending on which classification was used. Therefore, not only is it difficult to have an ICD code accurately reflect a diagnosis but it is made more challenging by the fact that the disease definition itself is highly variable. The ICD-10 codes and definitions are insufficient to cover the temporal spectrum of neonatal respiratory disorders in preterm infants. The vague codes for newborn respiratory distress neither cover specific time intervals nor do they discriminate between transient tachypnea of the newborn and surfactant deficiency. There is a gap between when these early, short-term problems end and when BPD begins. A 27-week preterm infant may have surfactant deficiency whose severity resolves within a few days, but remains on O2 or other respiratory support for weeks or months without an ICD code for defining that gap in time between RDS and BPD.

A consistent finding that emerged from both our single-center data and the national claims database was that the code for BPD was often assigned to the infant on the date of birth, despite this being technically impossible under any definition of BPD, and relatively fewer patients received a code for BPD either after 28 days of life or after 36 weeks’ corrected PMA. This could be a result of coding practices at specific institutions, such as entering codes post discharge or codes receiving the date of birth, or date of admission, as the date of entry by default. However, not all infants received a code on the day of birth, and some were assigned a code appropriately at 36 weeks’ corrected PMA. This occurred in both the single-center dataset and the large insurance database, suggesting that it may not be completely related to single-center coding practices. In these cases, it may be that the ICD code is not entered as a “primary diagnosis” and therefore not attached post-hoc to the entire NICU stay. We also observed an increase in first code assignment on day 28 of life. This is likely due to the fact that after day 28 of life, providers are no longer allowed to code for respiratory distress syndrome (ICD-9: 769, ICD-10: P22.0) and must select another code to justify reimbursement for patients still requiring oxygen support. All of the above trends appeared consistent regardless of whether ICD-9 or ICD-10 codes were used in the insurance data.

Utilizing ICD-9 and ICD-10 codes for any type of research analysis comes with challenges. A study by Quan et al. examined the accuracy and validity of both ICD-9 and ICD-10 codes to identify clinical conditions at the time of hospital discharge [24]. Overall, the study found that ICD-10 codes had varying sensitivity and specificity for several diagnoses, which is consistent with the conclusions from our study. Studies such as this reveal the limitations of utilizing coding and administrative data in outcomes research due to the variability in the accuracy of the codes. The clarity of the definition of the diagnosis is likely related to the accuracy of the ICD-9 or ICD-10 code for that diagnosis. A study by Reeves et al. in 2020 demonstrated that patients with sickle cell anemia could be accurately identified utilizing administrative claims data [25]. However, a definition of sickle cell anemia is much more straightforward and clearer than the changing definition of BPD. Interestingly, a study of a healthcare system in Montreal, QC, Canada, from 1983 to 1992 found that ICD-9 codes for BPD had high specificity and were likely sufficient for research purposes [26]. Changes in the definition of BPD since that study was conducted or differences between the Canadian and US Healthcare systems might explain this discrepancy, though fully understanding the drivers of these differences remains an interesting research direction.

While clinicians can enter ICD-9 or now ICD-10 codes for diagnoses accrued during the admission of a patient, medical coders also enter these ICD codes after a patient has been discharged from the hospital based on the narrative account written by the clinician at the time of discharge. There are many complicated factors related to documenting a patient’s care [27], and in the NICU this is compounded by the fact that a firm diagnosis for a disease process such as BPD is not consistently agreed upon [11]. This can make it challenging for medical coders to understand whether a diagnosis of BPD is fully justified. In essence, medical coders are confronted with the same question as the field: what exactly are the criteria for a BPD diagnosis?

The ultimate utility of an ICD code is inherently tied to the use case. Given their low specificity, the codes for BPD are likely insufficient for fine-grained epidemiological studies of this condition, since using them will include many infants who do not truly have the disease. However, given their high sensitivity, they may still be useful for the development of screening or “phenotyping” tools, since the codes identify nearly all of the infants who have the condition. The false positives could then be eliminated through further chart review or similar efforts. Mindful use of administrative and EHR data remains an important concern for future studies.
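To make the screening trade-off concrete, the sketch below applies Bayes' rule to convert sensitivity, specificity, and prevalence into predictive values. It is an illustration in Python, not part of the study's analysis: the function name is hypothetical, the first call plugs in the rounded single-center figures reported above for any of the four codes against the VON definition (93%, 36%, 67%), and the 10% prevalence in the second call is a purely hypothetical value chosen to show how the PPV would fall in a lower-risk population.

```python
from typing import Tuple


def predictive_values(sensitivity: float,
                      specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded values reported for any of the four codes versus the VON definition.
ppv_cohort, npv_cohort = predictive_values(0.93, 0.36, 0.67)

# Hypothetical lower-prevalence population (10%), for illustration only.
ppv_low, npv_low = predictive_values(0.93, 0.36, 0.10)

print(round(ppv_cohort, 2), round(npv_cohort, 2))
print(round(ppv_low, 2), round(npv_low, 2))
```

Under these inputs the implied PPV is roughly 0.75 in the high-risk cohort and roughly 0.14 at the hypothetical 10% prevalence, which is why a code-based screen would need a confirmatory step such as chart review. These derived figures are not the PPV and NPV reported for P27.1 alone.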

Finally, the value of the current BPD definition in identifying children at risk of chronic respiratory morbidity has recently been a topic of debate [15, 16]. ICD codes extracted from EHR and administrative databases therefore have poor specificity for detecting a diagnosis that itself, even when accurately labeled, may have poor predictive value for which newborns will continue to have respiratory difficulties. Thus, there is a need to identify better biomarkers and risk factors that will predict which infants are at high risk for ongoing respiratory problems. Determining these predictive risk factors and finding ways to readily identify them in the EHR and administrative databases should be a goal of epidemiologic research if we are to better understand the etiology and natural history of lung disease in preterm infants.

Strengths and limitations

There are several limitations to this study. First, the data from the single institution may not be generalizable to other institutions. ICD-10 codes are often applied in variable ways by both individual physicians and individual institutions. Billing and coding practices differ between institutions, states, and countries, and billing practices also change over time. These differences must be kept in mind when considering this study. However, we were able to examine ICD codes in a large insurance database, which may make our conclusions more generalizable across many institutions. We must also consider that the ICD codes in this insurance database are not validated, and again take care when generalizing these conclusions. Finally, our initial dataset was a small dataset of infants admitted to the neonatal intensive care unit with a respiratory requirement. This results in potential for selection bias in our population. However, as the outcome we examined was BPD, this population is more likely to develop BPD and to give us more information about the specificity of the ICD-10 codes.

The strengths of our study include a gold standard diagnosis of BPD for reference, with the VON dataset matched to our internal medical record numbers. This decreases the ambiguity of the BPD definition, which is a well-known challenge in neonatal research. However, the VON definition should also be interpreted with caution since these codes are entered into the database by a coder. There is potential for misclassification bias if codes are not entered correctly, resulting in a VON definition that may not be a 100% accurate gold standard. This potential limitation is supported by the fact that some infants were “underreported” and others “overreported” for BPD according to their VON definition once chart review was completed. Another strength is the granularity of the data, as chart review was performed on every individual patient included in the dataset to ensure accuracy of diagnoses regardless of the VON-coded definition.


There is great hope that large healthcare databases will facilitate the development of sophisticated epidemiological and clinical decision support tools. Often implicit in this hope is the assumption that healthcare data, such as ICD codes, accurately reflect the true clinical state of the patient. Based on a single-center database, we found that this may not be true for BPD in a neonatal population. Overall, we found that lung-specific ICD-10 codes had high sensitivity but poor specificity in identifying patients with a diagnosis of BPD. This remained true regardless of the definition used for BPD (VON code versus chart review). We conclude that ICD codes are an imprecise way to identify infants with this condition in this single center. Caution should be exercised in drawing conclusions based on associations with ICD codes used as a BPD proxy. Further investigation in national datasets should be done to understand the specificity on a national scale. Large healthcare databases are an important tool for researchers, but without a thorough understanding of how documentation differs from center to center, it may be difficult to draw conclusions. If we are to continue to use ICD codes for research purposes, a multidisciplinary approach to ensuring meaningful data capture should be developed.