Main

Bronchopulmonary dysplasia (BPD) is common in premature babies born before 29 wks of gestational age and weighing less than 2.2 lb (1). Each year, approximately 5,000–10,000 babies born in the United States receive a diagnosis of BPD. The rate of preterm birth (~12%) in the United States is higher than in other developed countries (2). The number of BPD cases has increased over recent decades, largely due to higher survival rates of premature infants. Very-low-birth-weight premature infants, born during the canalicular and saccular stages of lung development, are at the greatest risk of BPD due to disruptions in the normal program of lung alveolar and vascular development (3,4). More than 30% of premature infants born before 30 wks of gestational age develop BPD (3) and almost all infants (>97%) with birth weight <1,250 g are diagnosed with BPD (5).

A diagnosis of BPD is dependent upon the use of supplemental oxygen following preterm birth. In addition to premature birth, environmental factors such as oxidative stress, mechanical ventilation, and infection play a significant role in the pathogenesis of BPD. Higher than physiological levels of therapeutic oxygen (hyperoxia) induce stress through the production of reactive oxygen species. The lungs of premature infants are susceptible to injury from pre- and postnatal exposures. These exposures may cause lung damage and induce a deviation from the normal developmental path (4).

Variability in the incidence and severity of BPD among premature infants with similar risk factors suggests that genetic susceptibility plays a significant role (6,7). After controlling for covariates, genetic factors appear to account for a majority of the variance in liability for BPD in premature infants (7,8,9). Several previous studies have identified an association between candidate gene variants and BPD. Most of these studies have focused on genes for innate immunity, antioxidant defenses, mechanisms of vascular remodeling, and genes coding for surfactant proteins (5,10). While many significant findings have been reported, a majority of them have not been replicated in subsequent cohorts (5,6). Genome-wide gene expression microarray studies identified numerous genes in BPD lung tissue involved in different pathways, including DNA damage, regulation of cell cycle, B-cell development, inflammation, cell proliferation, and hedgehog signaling (11,12). More recently, genome-wide approaches to identify BPD susceptibility genes have been reported. SPOCK2 was identified as a susceptibility gene in a genome-wide association study (13). However, recent genome-wide association studies failed to replicate any of the previously identified genes, or to identify any new candidate susceptibility genes at genome-wide significance (14,15).

Genetic structural variations in the human genome range from single nucleotide polymorphisms to large chromosomal anomalies (16). Copy number variants (CNV) are large genomic structural changes resulting in an increase or decrease in the number of alleles for specific genes. CNV regions are associated with several human diseases (e.g., neurodevelopmental disorders, congenital abnormalities, and cancer) (17,18). For this reason, clinical DNA microarrays are widely used as a first-tier diagnostic test for CNV screening of patients who are suspected of having chromosomal aberrations (19), particularly in cases of congenital and/or newborn disease.

There is no prior report of CNV identification in BPD using DNA microarrays. We hypothesized that a genome-wide assessment of CNV in subjects with a diagnosis of BPD would help to identify specific loci that harbor genetic susceptibility factors. We conducted an IRB-approved retrospective analysis, using the clinical DNA microarray database from our institution, and identified recurrent CNV in BPD subjects. We interrogated genes present within the identified CNV regions for differential expression using a previously published data set describing genome-wide expression in developing human lung tissue (20). We further looked for pathways and processes represented by genes residing within CNV regions.

Results

Identification of Recurrent CNV Regions in BPD Subjects

We identified and considered genic CNV (within RefSeq transcription boundaries), rare CNV (<1% frequency in the normal population), and common polymorphic CNV (>1% frequency in the normal population). We identified 3 chromosomal regions (11q13.2, 16p13.3, and 22q11.23–q12.1) that contained a CNV in more than 1 of the 19 confirmed BPD subjects (Table 1). The Database of Genomic Variants showed that two of these three recurrent CNV loci (11q13.2 and 16p13.3) were rare (<1% frequency in healthy populations). The remaining CNV region, encompassing 22q11.23–q12.1, involved rare (22q11.23) and common (q12.1) CNV regions. We further interrogated CNV at all these recurrent regions, since recent studies have shown that polymorphic regions have important functional consequences and are associated with disease susceptibility (18,21).

Table 1 Summary of recurrent CNV regions identified in BPD subjects

Locus chr11q13.2. We identified recurrent deletions at chr11q13.2 in 3 of 19 BPD subjects (15.8%). These CNV regions ranged in size from 1.1 to 165.4 kb and included the genes GPR152, CABP4 (OMIM: 608965), AIP (OMIM: 605555), TMEM134, PITPNM1 (OMIM: 608794), CDK2AP2, C11orf72, CABP2 (OMIM: 607314), GSTP1 (OMIM: 134660), and NDUFV1 (OMIM: 161015) (Table 2 and Supplementary Figure S1 online). CNV at this locus were not found in any subject from either the full-term or preterm control groups. The frequency of this CNV differed significantly between the BPD and full-term control groups (P value = 0.028). There was a trend toward significance in the difference in frequency of CNV at this locus between the BPD and preterm control groups (P value = 0.084) (Table 1).

Table 2 Summary of recurrent CNV locus at chr11q13.2 identified in three BPD subjects and their clinical indications

Locus chr16p13.3. We identified recurrent deletions at chr16p13.3 in 4 of 19 BPD subjects (21.1%). These CNV regions ranged in size from 15.5 to 33.7 kb, and involved the hemoglobin locus. Genes within the locus affected by CNV included HBM (OMIM: 609639), HBA1 (OMIM: 141800), HBA2 (OMIM: 141850), HBQ1 (OMIM: 142240), and LUC7L (OMIM: 607782) (Table 3 and Supplementary Figure S2 online). CNV at this locus were observed in two full-term control subjects, but not in any of the preterm controls. There was a trend toward significance in the difference in frequency of CNV at this locus between BPD subjects and the full-term control group (P value = 0.074). The frequency of this CNV differed significantly between the BPD and preterm control groups (P value = 0.035) (Table 1).

Table 3 Summary of recurrent CNV locus at chr16p13.3 identified in four BPD subjects and their clinical indications

Locus chr22q11.23–q12.1. We identified deletions and duplication in two adjacent CNV regions at 22q11.23–q12.1 in three BPD subjects (15.8%), ranging in size from 114 to 208 kb. The gap between the two CNV regions is approximately 1.9 Mb. The CNV regions included the genes BCR (OMIM: 151410), ZDHHC8P1, CES5AP1, CRYBB2P1, IGLL3P, and LRP5L (Table 4 and Supplementary Figure S3 online). We did not observe any CNV at this locus in the full-term control group, but observed CNV at this locus in 2 of the 23 preterm control subjects. The frequency of CNV at this locus was significantly higher in BPD subjects than in the full-term control group (P = 0.028), but did not differ significantly from that in the preterm control group (P = 0.644) (Table 1).

Table 4 Summary of recurrent CNV locus at chr22q11.23–q12.1 identified in three BPD subjects and their clinical indications

Expression of CNV genes in human lung development. We assessed expression of genes within the CNV loci in normal human lung development using our previously published expression database (20). We interrogated 21 genes within the 3 recurrent CNV regions for differential expression. In total, 15 genes (71.4%) demonstrated significant changes (P < 0.05) in expression between the pseudoglandular and the canalicular stages of lung development, approximately at the time of preterm birth for subjects with the greatest risk of BPD (Figure 1). Eight genes within the 11q13.2 CNV locus displayed significant changes in expression: four genes (CABP4, CABP2, CDK2AP2, and C11orf72) showed increases in expression and four genes (TMEM134, AIP, GSTP1, and NDUFV1) showed decreases in expression. Five genes within the 16p13.3 CNV locus (HBM, HBA1, HBA2, HBQ1, and LUC7L) showed significant decreases in expression. Two genes (BCR and LRP5L) within the 22q11.23–q12.1 locus displayed significant changes in expression, with LRP5L increased and BCR decreased.

Figure 1

Expression level of genes during normal human lung development. Twenty-one genes from recurrent CNV regions were assessed for expression levels in our database of normal human lung development (20). Fifteen genes (71.4%) showed a change in expression between the pseudoglandular and canalicular stages (P < 0.05), coincident with the timing of preterm birth for subjects with the greatest risk of BPD. Shown here are data for 23 probesets representing 15 genes. Rows represent the probesets and columns represent samples ordered chronologically from left to right. Red indicates overexpression and green indicates underexpression. CNV, copy number variants.

Pathway Analysis

To test for biological processes represented by genes residing within CNV regions, we performed pathway analysis using the 21 genes present in the identified recurrent CNV regions. We identified three canonical pathways: aryl hydrocarbon receptor signaling, xenobiotic metabolism signaling, and glutathione-mediated detoxification (Table 5). We also identified development- and disease-related processes, which were significantly over-represented by these genes (see Supplementary Table S1 online).

Table 5 List of statistically significant (P < 0.05) canonical pathways identified by Ingenuity pathways analysis (IPA) software

Discussion

To the best of our knowledge, this is the first retrospective study using clinical DNA microarray data for the identification of CNV associated with BPD. Genomic structural variation is classically defined as deletion, duplication, and other genomic alterations larger than 1kb (22). CNV can be classified as genic CNV, rare CNV, and common polymorphic CNV. We studied all CNV larger than 1 kb in our analysis, and identified three CNV regions that are recurrent in multiple subjects with BPD. All three of these recurrent CNV occurred in BPD subjects at a frequency significantly greater than at least one control population, either preterm or full-term subjects lacking chronic lung disease. While a recent single-nucleotide polymorphism array-based study failed to identify any statistically significant CNV associated with BPD (23), it has been reported that the sensitivity and specificity for CNV detection from single-nucleotide polymorphism arrays are lower than that of array comparative genomic hybridization platforms (24). While this is a small study, our data suggest that genes within or adjacent to these loci warrant further study as genetic susceptibility factors for BPD.

The recurrent CNV we report here occur on three separate chromosomes and collectively encompass 21 genes. Only one of these genes, GSTP1 located at 11q13.2, has been previously reported for association with BPD (25). However, no association was observed in a subsequent study (26). The CNV at this locus have also been reported to be affected in different types of cancer (27,28,29). Interestingly, the LRP5 gene is located 838 kb downstream of the 11q13.2 locus, and has been shown to regulate development of lung microvessels and alveolar formation through the angiopoietin-Tie2 pathway (30). Lrp5, a Wnt co-receptor, is also a genetic driver of lung fibrosis in mice, and a marker of disease progression and severity in humans with idiopathic pulmonary fibrosis (31). We suggest that genes adjacent to the CNV reported here warrant further consideration for association with BPD.

The CNV at locus 16p13.3 directly involve the human alpha globin gene cluster region. Tryptases are serine proteases secreted by mast cells, which themselves are commonly associated with allergic responses and asthma (32). Additionally, mast cell gene expression and mast cell accumulation have recently been associated with BPD severity in humans, and in animal models of disease (11). The genes encoding mast cell-specific tryptases (e.g., TPSAB1, TPSB2, TPSD1, etc.) are clustered about 1.02–1.06 Mb downstream of the recurrent CNV on locus 16p13.3.

The recurrent CNV region at 22q11.23–q12.1 involves two loci separated by 1.9 Mb, and includes six genes. These CNV are not rare in the general population (>1% frequency in the normal population), and so this is considered a polymorphic CNV region. Recent studies have shown that polymorphic regions have important functional consequences and can be associated with disease susceptibility (18,21). For example, CNV at locus 8p23.1, a well-known polymorphic region containing the beta-defensin cluster, are associated with common multifactorial diseases such as Crohn’s disease and psoriasis (33,34). Interestingly, deletion of 22q11.23–q12.1 was previously reported in a premature baby who died 1 d after birth (35).

CNV can cause disease through a deletion or duplication, if they encompass dosage sensitive genes. Furthermore, CNV or other genomic structural variations that are located at a distance from dosage sensitive genes can affect expression through position effects or deletion of key regulatory elements (22). Dosage sensitive genes can cause disease alone or in combination with other genetic or environmental factors (22). In a previous study, 17% of the variation in gene expression was explained by CNV (36). We directly assessed expression of 21 genes located within recurrent CNV using a previously published data set describing global gene expression in lung tissue from subjects with BPD and appropriate controls (11), but found no significant changes.

Human lung development is divided into five different stages: embryonic, pseudoglandular, canalicular, saccular, and alveolar. Preterm infants born prior to or at the end of the canalicular stage, or at less than 29 wks of gestational age, are at the greatest risk of BPD. Using previously published normal human lung development expression data (20), we found that 15 of the 21 genes (71.4%) located within recurrent CNV loci displayed significant changes (P < 0.05) in expression between the pseudoglandular and the canalicular stages of lung development. This indicates that these genes might have a role in normal lung development.

BPD is a multifactorial disease and many factors contribute to its pathogenesis, including hyperoxia and excessive production of free radicals. Using Ingenuity pathways analysis tools, we identified 3 canonical pathways represented by the 21 genes present in the observed recurrent CNV regions: aryl hydrocarbon receptor signaling, xenobiotic metabolism signaling, and glutathione-mediated detoxification. Recently, it was shown that the aryl hydrocarbon receptor signaling pathway is dysregulated in the lungs of neonatal mice exposed to hyperoxia, a treatment that results in BPD-like pathology (37). Exposure to high concentrations of supplemental oxygen generates free radicals, which may cause tissue damage in preterm infants due to poorly developed antioxidant systems (38). Free radicals also induce lipid peroxidation. The glutathione-mediated detoxification pathway plays an important role in the protection of tissues from toxic effects of free radicals and lipid peroxidation.

There may in fact be a precedent for associations between CNV and lung development abnormalities such as those observed in BPD. Schloo et al. (39) reported on cases of Down syndrome with abnormal pulmonary development characterized by decreased alveolar complexity. These individuals had a common CNV involving duplication of an entire chromosome (trisomy 21). Deutsch et al. (40) also reported the identification of trisomy 21 in one case with characteristic alveolar abnormalities and no heart disease. It is important to note that trisomy 21 was not present in any of the subjects described in this study.

There are some limitations in this study worth noting. BPD diagnosis was obtained through review of the electronic medical records, which was restricted to the presence or absence of a clinical diagnosis of BPD, and did not capture the criteria used for defining BPD. The absence of quantitative criteria for determination of BPD diagnosis is a weakness that could introduce variability within the study design. However, it is worth noting that all subjects described in this study were from a single site. Also, since the clinical indications for CNV testing for all the subjects studied were heterogeneous (and did not include BPD), this may have introduced bias in the association of CNV with BPD. Importantly, there was no common indication for CNV testing in BPD subjects; the only consistent clinical problem in those subjects was a clinical diagnosis of BPD. Similarly, no consistent clinical indication for CNV testing was observed in either set of control subjects. We conclude that the novel CNV described in this study may contribute to the identification of subjects with BPD. Replication and validation of these studies are the focus of current efforts.

Materials and Methods

Clinical DNA Database and Microarray Platform

The study was approved by the Institutional Review Board at the University of Rochester and included a retrospective analysis of the clinical DNA microarray database of our institution. This database includes data from approximately 2,800 subjects tested between 2008 and 2013. All subjects included in our studies provided consent for future research purposes. The DNA microarray experiments were performed using the Agilent 4 × 44K (Agilent Technologies, Santa Clara, CA), ISCA v1.0, and v2.0 platforms. Commercially available pooled DNA (Male & Female; Promega Corporation, Madison, WI) was used as control DNA.

Subjects and Controls

We searched our deidentified DNA microarray database using specific terms of interest (e.g., BPD, prematurity, lung, and pulmonary) and identified a set of 177 subjects at risk for a clinical diagnosis of BPD and appropriate controls. Ascertainment of the presence or absence of a clinical diagnosis of BPD involved a review of the clinical problems list for each subject in the electronic medical record system. Among the 177 at-risk subjects, 19 premature infants received a clinical diagnosis of BPD. We identified two sets of control subjects: a group of infants born prematurely, but who had no evidence of lung disease (n = 23), and a group of full-term subjects who had no evidence of either prematurity or lung disease (n = 41) (Figure 2). The median gestational age for premature subjects diagnosed with BPD was 30.5 wks and the median gestational age for premature subjects used as controls was 33.5 wks. By two-tailed t-test, the difference in gestational age between BPD subjects and preterm control subjects was not statistically significant (P = 0.088).

Figure 2

Workflow for identification of CNV associated with bronchopulmonary dysplasia. The steps include (i) search of the DNA microarray database (n = 2,800) for identification of subjects at risk for a diagnosis of BPD, (ii) review of medical records of potential subjects (n = 177) for ascertainment of the presence or absence of a clinical diagnosis of BPD, (iii) re-analysis of raw DNA microarray data of BPD subjects and control groups, and finally (iv) identification of CNV, assessment of the human lung development gene expression database, and pathway analysis using genes present in the identified recurrent CNV regions. BPD, bronchopulmonary dysplasia; CNV, copy number variants.

Data Analysis

The raw DNA microarray results (TIFF image files) were processed using Feature Extraction software v10.10 (Agilent Technologies) and analyzed using Genomic Workbench software v7.0 (Agilent Technologies) with the manufacturer’s recommended analysis settings. To control for small variations, we used an extra aberration filter: the minimum number of probes present in an aberrant region was set at 3 (gain or loss), with the minimum absolute average log2 ratio set at 0.25. The Derivative Log2 Ratio Spread for all subjects was in the excellent range (≤0.20).
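
The aberration filter described above can be illustrated with a minimal sketch. This is not the Genomic Workbench implementation; the `Aberration` record, its field names, and the example calls are hypothetical and serve only to show how the two thresholds stated in the text (at least 3 probes, absolute average log2 ratio of at least 0.25) would be applied to candidate regions.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
MIN_PROBES = 3              # minimum probes in an aberrant region
MIN_ABS_LOG2_RATIO = 0.25   # minimum absolute average log2 ratio


@dataclass(frozen=True)
class Aberration:
    """Hypothetical record for one candidate aberrant region."""
    chrom: str
    start: int
    end: int
    n_probes: int
    mean_log2_ratio: float


def passes_filter(call: Aberration) -> bool:
    """True if the call meets both thresholds of the aberration filter."""
    return (call.n_probes >= MIN_PROBES
            and abs(call.mean_log2_ratio) >= MIN_ABS_LOG2_RATIO)


def direction(call: Aberration) -> str:
    """Label a call as a copy number gain or loss from the sign of its ratio."""
    return "gain" if call.mean_log2_ratio > 0 else "loss"


# Illustrative values only, not data from the study.
candidates = [
    Aberration("chr1", 1_000, 50_000, n_probes=5, mean_log2_ratio=-0.80),
    Aberration("chr2", 2_000, 9_000, n_probes=2, mean_log2_ratio=-0.90),
    Aberration("chr3", 3_000, 70_000, n_probes=6, mean_log2_ratio=0.10),
    Aberration("chr4", 4_000, 90_000, n_probes=4, mean_log2_ratio=0.55),
]
kept = [(c.chrom, direction(c)) for c in candidates if passes_filter(c)]
print(kept)
```

In this sketch, the second candidate is dropped for having too few probes and the third for having too small an average log2 ratio, which is the kind of small variation the filter is intended to exclude.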

Analysis reports for each case were generated by Genomic Workbench software and included chromosome coordinates (start and end) of CNV, cytoband, log ratio, and genes present within each CNV. The cytoband location and genes within the identified locus were verified using chromosomal coordinates of CNV in the UCSC genome browser (41) and human genome NCBI build 37 (hg19). CNV loci affected in more than one subject, and the minimal affected region across subjects, were determined manually. Differences in the frequency of CNV between BPD case and control groups at any given locus were assessed by Fisher’s exact test.
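
As an illustration of the frequency comparison, the following sketch computes a two-sided Fisher's exact test from the subject counts given in the Results (19 BPD, 41 full-term, and 23 preterm subjects). The text does not say which software or which two-sided convention was used, so the function below, which sums the probabilities of all tables no more likely than the observed one, is an assumption; the function and variable names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: cases with / without the CNV; c, d: controls with / without.
    Sums the probability of every table (margins fixed) that is no more
    likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(col1, row1)

    def weight(x: int) -> int:
        # Integer weight proportional to the hypergeometric probability,
        # which keeps the "no more likely than observed" comparison exact.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(row1 + row2, col1)


N_BPD, N_FULL_TERM, N_PRETERM = 19, 41, 23

# Subjects carrying a CNV at each locus: (BPD, full-term, preterm),
# as reported in the Results.
carriers = {
    "11q13.2": (3, 0, 0),
    "16p13.3": (4, 2, 0),
    "22q11.23-q12.1": (3, 0, 2),
}

p_values = {}
for locus, (bpd, full_term, preterm) in carriers.items():
    p_full = fisher_exact_two_sided(bpd, N_BPD - bpd,
                                    full_term, N_FULL_TERM - full_term)
    p_pre = fisher_exact_two_sided(bpd, N_BPD - bpd,
                                   preterm, N_PRETERM - preterm)
    p_values[locus] = (round(p_full, 3), round(p_pre, 3))
    print(f"{locus}: vs full-term P = {p_full:.3f}, vs preterm P = {p_pre:.3f}")
```

Under this convention, the counts reproduce the P values reported for each locus to three decimal places.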

Assessment of Existing Normal Lung Development Microarray Expression

For assessment of human lung development gene expression, we examined data from our previously published data set characterizing genome-wide expression in normal fetal lung tissue from 53 to 130 d of estimated gestational age (ref. (20) and GEO). For 21 genes of interest, we extracted time-specific summarized expression values, and unadjusted t-test P values for differential expression when comparing the pseudoglandular vs. canalicular stages. Data were linked across data sets according to the NCBI Gene Symbol.
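
The linkage by gene symbol can be sketched as follows. The row layout, the placeholder symbols, and the P values are all hypothetical, and the rule used here to call a gene significant (at least one of its probesets has an unadjusted P < 0.05) is an assumption, since the text does not state how multiple probesets per gene were combined.

```python
from collections import defaultdict

ALPHA = 0.05  # significance threshold stated in the text


def significant_genes(probeset_rows, genes_of_interest, alpha=ALPHA):
    """Return sorted gene symbols with at least one probeset P value < alpha.

    probeset_rows: iterable of (gene_symbol, probeset_id, p_value) tuples.
    The "any probeset" rule is an assumption made for this sketch.
    """
    wanted = set(genes_of_interest)
    p_by_gene = defaultdict(list)
    for symbol, _probeset_id, p_value in probeset_rows:
        if symbol in wanted:  # link expression rows to CNV genes by symbol
            p_by_gene[symbol].append(p_value)
    return sorted(g for g, ps in p_by_gene.items() if min(ps) < alpha)


# Placeholder symbols and P values, not data from the study.
rows = [
    ("GENE_A", "probe_1", 0.010),
    ("GENE_A", "probe_2", 0.200),
    ("GENE_B", "probe_3", 0.300),
    ("GENE_C", "probe_4", 0.049),
    ("GENE_D", "probe_5", 0.001),  # not in the list of interest
]
cnv_genes = ["GENE_A", "GENE_B", "GENE_C"]

hits = significant_genes(rows, cnv_genes)
print(hits, f"{100 * len(hits) / len(cnv_genes):.1f}% of genes of interest")
```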

Pathway Analysis

To test for pathways and biological processes represented by genes residing within recurrent CNV regions, we performed analysis using Ingenuity pathways analysis software (QIAGEN, Valencia, CA).

Statement of Financial Support

This study was funded by a Strong Children’s Research Center (SCRC) Research Grant, University of Rochester, Rochester, United States.

Disclosure

The authors have no conflicts of interest to report.