Main

Respiratory distress syndrome (RDS), an acute lung disease occurring in neonates with surfactant deficiency, remains a leading cause of morbidity and mortality in prematurely born infants. The advent of surfactant replacement therapy has dramatically reduced the severity and improved the outcome of RDS, leading to increased survival of neonates born at earlier gestational ages with quite immature lungs. Lung function in survivors is variable, with some infants emerging unscathed and others left with long-term pulmonary dysfunction. Collective improvements in obstetric and neonatal care during the past two decades have steadily increased survival of very-low-birth-weight infants (1). This prolonged survival has led to a gradual rise in the number of infants who develop bronchopulmonary dysplasia (BPD). Each year, between 5,000 and 10,000 prematurely born infants in the United States are diagnosed with BPD, a form of chronic lung disease characterized by disordered lung growth and associated with respiratory and neurodevelopmental morbidities (2,3,4). Moreover, surviving prematurely born infants with or without BPD are at increased risk for respiratory disease hospitalizations (5), indicating a long-term health disadvantage (6).

Recently, genetic variance has emerged as a significant risk factor for BPD development, accounting for as much as 82% of BPD risk among twins (7,8,9). Genetic variation in surfactant protein genes and in surfactant protein B (SP-B)-linked microsatellites has been identified as a susceptibility marker for BPD (10,11,12,13). These genetic variants may differentially affect innate immunity (14), inflammatory processes (15,16), and/or surfactant-related functions (17,18). Polymorphisms of the MBL, MMP-16, extracellular matrix receptor (dystroglycan), and VEGF genes are also associated with risk of BPD development among premature infants (19,20,21,22). These genetic variations may contribute to infection risk (19) and to processes that lead to derangements in lung alveolarization or capillary growth (20,22,23). Identifying genetic markers and understanding the mechanisms responsible for these variances could permit early identification of at-risk neonates and allow experimental interventions aimed at decreasing long-term pulmonary dysfunction. To date, however, only candidate gene approaches for a small number of polymorphisms have been undertaken.

In this study, we used high-throughput technology to study single-nucleotide polymorphism (SNP) associations in a large number of candidate genes (n = 601) that spanned all chromosomes in various subgroups of prematurely born infants (n = 1,091). We hypothesized that genetic differences in the selected candidate genes identify prematurely born children who are at higher risk for the development of neonatal pulmonary disease.

Results

Population Stratification

The initial study group of CAs and AAs was stratified according to ethnicity using genetic markers. A decision rule was applied in which individuals with admixture proportion values between 0.45 and 0.55 were excluded, those with values below 0.45 were placed into one group, and those with values above 0.55 were placed into the other group. Using this method, we identified 922 CAs and 169 AAs; 8 of the 1,099 subjects were excluded.
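The decision rule above can be sketched as a small function. This is an illustrative sketch only: the function name, the generic group labels, and the treatment of values exactly equal to 0.45 or 0.55 as excluded are assumptions, since the text does not state which ancestry corresponds to which side of the threshold or how boundary values were handled.

```python
def assign_ancestry_group(admixture_proportion, lower=0.45, upper=0.55):
    """Apply the admixture decision rule to one individual.

    Returns "group_1" for proportions below `lower`, "group_2" for
    proportions above `upper`, and None for excluded individuals.
    Boundary values (exactly 0.45 or 0.55) are treated as excluded,
    which is an assumption not stated in the text.
    """
    if not 0.0 <= admixture_proportion <= 1.0:
        raise ValueError("admixture proportion must lie in [0, 1]")
    if admixture_proportion < lower:
        return "group_1"
    if admixture_proportion > upper:
        return "group_2"
    return None  # ambiguous ancestry: excluded from analysis


# Hypothetical admixture proportions, not study data.
proportions = [0.02, 0.44, 0.45, 0.50, 0.55, 0.97]
groups = [assign_ancestry_group(p) for p in proportions]
```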

Admixture testing of the CAs to distinguish between Northern and Southern European descent placed 897 into one group and 25 into another. Association testing was performed with all CAs together and with stratified data using the admixture proportions as covariates. Because one of the CA groups was small (n = 25), results did not differ with or without this admixture covariate. Therefore, we report data without the admixture adjustment.

Association Testing

Association test analysis revealed significant findings after correction for multiple testing only in the comparison of AAs with and without BPD. A locus on chromosome 2 contained SNPs with a significantly higher frequency in AAs with BPD (AA_BPD). After correction for multiple testing, no significant SNPs were observed on other chromosomes (Figure 1). Upon enlargement of the significant chromosome 2 locus, two SNPs, rs3771150 and rs3771171 (~80 kb apart), were identified with uncorrected P values of 3.87 × 10⁻⁶ and 5.75 × 10⁻⁶, respectively (Figure 2). After correction for multiple testing (24), these SNPs were significant at the 0.05 false discovery rate (FDR) level.

Figure 1

Distribution of SNPs along chromosomes (CHRs) 1-23. Vertical axis indicates the negative log10 P value. The dotted lines represent thresholds (nominal levels) of P = 0.01 (at 2) and P = 0.001 (at 3). Horizontal axis indicates the CHR position in bp. CHR, chromosome; SNP, single-nucleotide polymorphism.

Figure 2

Distribution of SNPs along chromosome 2. (a) Dotted lines represent thresholds (nominal levels) of negative log10 P values: P = 0.01 (at 2) and P = 0.001 (at 3). The circle depicts the region containing the two significant SNPs. (b) Enlargement of the region containing the two significant SNPs. SNP, single-nucleotide polymorphism.

Complete data for determination of BPD were available for 115 of the 169 AA infants; of these, 17 were identified with BPD (6 females and 11 males; 13 had been diagnosed with RDS) and 98 with no BPD (52 females, 45 males, 1 missing; 35 had been diagnosed with RDS). Of the 17 AA_BPD infants, all but 5 had at least one copy of the risk allele (one individual had no data for marker rs3771171). Marker rs3771150 has alleles C and T, where T is the minor allele with frequency 0.1213 based on the entire AA sample (n = 169). The minor allele frequencies (MAFs) in affected and unaffected AA samples were 0.3824 and 0.0979, respectively (Table 1). The odds ratio (OR) for this marker was 5.7 with 95% confidence interval (CI) (2.5, 13.2), suggesting that the odds of developing BPD are almost six times as high for AA individuals with the T allele as for those with the C allele. Marker rs3771171 has alleles A and G, where G is the minor allele with frequency 0.1161. The MAFs in affected and unaffected AA samples were 0.375 and 0.0928, respectively (Table 1). The OR for this marker was 5.9 with 95% CI (2.4, 13.9), suggesting that the odds of developing BPD are almost six times as high for AA individuals with the G allele as for those with the A allele. The two SNPs, rs3771150 and rs3771171, are located within intron sequences of the interleukin 18 receptor accessory protein (IL-18RAP) and interleukin 18 receptor 1 (IL-18R1) genes, respectively (Figure 3a,b), and both genes are located in a region where other interleukin genes are found (Figure 3c). Moreover, SNPs genotyped in this study that lie within 50 kb upstream of one of the significant markers and 50 kb downstream of the other are shown in Table 2. All the SNPs upstream of rs3771171, with the exception of rs974389, rs13015714, and rs3755276, reside within the IL-1RL1 gene, which mediates IL-33 signaling and belongs to the same family as the IL-18R1 and IL-18RAP genes (25). These receptors share common signaling pathways, including that mediated via MyD88. The remaining SNPs (with the exception of rs11886793) reside within introns of either the IL-18R1 or the IL-18RAP gene.
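As a rough check on the odds ratio reported above for rs3771150, the allelic OR and a Woolf (log-based) 95% CI can be computed from a 2 × 2 table of allele counts. The counts used here (13 T and 21 C alleles in cases; 19 T and 175 C alleles in controls) are not given in the text. They are back-calculated from the reported MAFs of 0.3824 and 0.0979 under the assumption that 17 cases and 97 controls were genotyped for this marker, and the use of Woolf's interval is likewise an assumption about how the CI was obtained.

```python
import math


def allelic_odds_ratio(case_minor, case_major, control_minor, control_major,
                       z=1.959964):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    All four arguments are allele counts (two alleles per genotyped infant).
    `z` is the standard normal quantile for a two-sided 95% interval.
    """
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / case_minor + 1 / case_major
                          + 1 / control_minor + 1 / control_major)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Allele counts reconstructed from the reported MAFs (an assumption, see above).
or_value, ci_lower, ci_upper = allelic_odds_ratio(13, 21, 19, 175)
print(f"OR = {or_value:.1f}, 95% CI ({ci_lower:.1f}, {ci_upper:.1f})")
```

Under these assumed counts the result rounds to the values reported in the text for rs3771150, which suggests, but does not confirm, that an allelic test of this form underlies the published estimate.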

Table 1 Frequency of significant alleles in discovery and replication sets
Table 2 SNPs present upstream and downstream of the two significant SNPs (rs3771150 and rs3771171)
Figure 3

Location of each significant SNP within its respective gene ((a) IL-18R1, (b) IL-18RAP). Solid boxes: translated exons; open boxes: untranslated exons. (c) Chromosome location of IL-18RAP and IL-18R1, and adjacent genes. Asterisk, similar to NAD(P) dependent steroid dehydrogenase-like protein; filled circles, hypothetical protein; filled diamond, similar to AHPA9419. SNP, single-nucleotide polymorphism.

Table 3 Comparisons of BW and GA between cases and controls in the different pairs of study groups

To determine whether these SNPs were located in regions of linkage disequilibrium (LD), we performed a haplotypic analysis. Figure 4 displays the haplotype block structure for IL-18R1 and IL-18RAP, as depicted by Haploview (26). As seen in Figure 4, rs3771171 is located in a region of strong LD with other regions of the IL-18R1 gene, raising the possibility that this SNP, rather than being of pathogenic significance itself, is a marker for another polymorphism that confers altered risk. In contrast, rs3771150 is not in a region of high LD.

Figure 4

Haplotypic analysis (LD plots) of (a) IL-18R1 and (b) IL-18RAP genes obtained with Haploview and HapMap data from the Yoruban population (YRI), showing the region surrounding the SNPs of interest (pink rectangles). LD, linkage disequilibrium; SNP, single-nucleotide polymorphism.

GA and BW in AA_BPD and Controls

Significant differences (P < 0.001) in GA and BW were observed between cases and controls. To determine whether the significant SNP frequency differences described reflect GA or BW differences between cases and controls, we performed comparisons of groups of prematurely born infants of AA or CA descent with BPD or RDS (Table 3). Regardless of race, BPD and RDS cases had significantly lower BW and GA than their corresponding controls (Table 3). However, frequency comparisons of the two aforementioned significant SNPs between cases and controls in each set depicted in Table 3 showed significant differences only for the AA_BPD group.

Replication Study

The replication P values were 0.012 for rs3771150 and 0.07 for rs3771171 (Table 1). Power and minimum sample size calculations were performed by methods implemented in PAWE (27,28). For 80% power at the 0.05 significance level, under a genetic model-free method, the minimal sample sizes are 139 cases and 184 controls for rs3771150, and 180 cases and 238 controls for rs3771171. The replication study (82 cases and 102 controls) therefore did not reach the minimum number required for 80% power. However, the combined P values for both studies remained significant for both SNPs: 8.31 × 10⁻⁷ (rs3771150) and 6.33 × 10⁻⁶ (rs3771171).

Discussion

BPD is characterized by arrested lung development whereby alveolar septation and pulmonary vascularization are impaired, culminating in a reduction in alveolar number and abnormal pulmonary microvasculature (29,30). Both prenatal and postnatal factors, including infection and the requirement of mechanical ventilation, stimulate inflammatory processes that contribute to the BPD phenotype. Studies of genetic factor contributions have identified associations (10,11,12,13,19,20,22,23) of polymorphisms with BPD in genes involved in processes known to be deranged in BPD. Given the potential complexity of mechanisms and number of genes that may contribute to BPD, we used a high-throughput approach to study genetic associations of a large number of candidate genes and their impact on BPD development. The analysis revealed two SNPs corresponding to IL-18R1 and IL-18RAP that were associated with BPD development in AA infants. The products of these genes are necessary to effectively mediate interleukin-18 (IL-18) signal transduction and subsequent activation of NF-κB and MAPK8 (JNK) pathways in response to IL-18 (Figure 5). There is limited information in the literature about association with disease of IL-18R1, IL-18RAP, or SNPs present in these two genes. Thus, to the best of our knowledge, this is the first study that describes an association of rs3771150 and rs3771171 with disease in AAs.

Figure 5

IL-18-induced signaling via IL-18RAP and IL-18R1 results in secretion of cytokines, a number of which (IL-8, MCP-1/2/3, G-CSF, and IL-6) have been associated with bronchopulmonary dysplasia.

Although the role of these genes in BPD has not been previously investigated at the genetic level, we postulate that altered expression of several proinflammatory cytokines in response to IL-18 contributes to BPD. Regulation of cytokine gene expression and secretion is critical in the inflammatory response, and a variety of cytokines have been reported to be stimulated by IL-18 (31). Several of the IL-18-stimulated proinflammatory cytokines in Figure 5 have been found to be increased in BPD (31,32,33). However, further studies are needed to assess the plausibility of the scenario presented in Figure 5.

The IL-18R1 gene intron 2 SNP rs3771171 was found at similar frequencies in our AA subjects and the YRI population in HapMap. Interestingly, ancestral allele frequency is somewhat lower in other populations. The SNP in IL-18RAP, however, is present in markedly different frequencies in different populations. The ancestral allele is found in 92% of the YRI and our AA populations, but in other populations (Papua New Guinea) is as low as 27%. The Yoruban population is of interest as it is the HapMap population most closely related to AAs. Thus, the frequencies of these alleles in the AA group, although similar to the “parent” population, can differ in other populations, underscoring the importance of ethnicity considerations in genetic study associations.

The incidence of BPD development is inversely correlated with GA and BW (34). Although the incidence of BPD was somewhat higher in the CA group, this trend was not significant (CA: 22.4%; AA: 14.8%, P = 0.103, z-test). However, because BPD and control groups differed significantly in both BW and GA, it was necessary to ascertain whether the two significant SNPs were markers of prematurity or BPD. Comparison of four sets of groups of prematurely born CA or AA infants with lung disease (RDS or BPD) vs. their corresponding controls revealed significant BW and GA differences in each set (case, control) (Table 3). However, the frequency of each of the two SNPs did not differ between cases and controls except for the AA_BPD group. Therefore, it is unlikely that the two SNPs associated with risk in AA_BPD are markers of prematurity.

As with many genetic epidemiology studies using the candidate gene approach, replication with a distinct study population is necessary. When studying a relatively rare disease (such as BPD) in a single race (such as AA), collection of replication samples may take years; nevertheless, we were able to form a new collaboration to attempt to replicate the results. Because the absolute (but not the relative) differences in MAF of the two studied alleles differ between the discovery and the replication studies, it is possible that unrecognized confounding factors exist between the two study groups. Moreover, by limiting our SNP selection and not performing a genome-wide SNP analysis, we may have missed SNPs important for BPD. These results should therefore be tested by different investigators before being used to guide targeted interventions that may carry some risk. Furthermore, changes in the clinical definition of BPD over the period of sample collection, lack of knowledge of oxygen saturation limits, or other management strategies could potentially limit the application of these findings. However, the nearly significant association in the replication group suggests that changes in clinical management may not be a limiting factor.

While this article was in preparation, a report was published of association studies between IL-18 SNPs and BPD and between IL-18 SNPs and prematurity in CAs (35). The authors found no correlation between IL-18 SNPs and BPD or prematurity in CAs, an observation similar to our findings with the two IL-18 receptor molecules (Figure 5). These findings suggest that IL-18 and the cell-surface molecules with which it interacts (IL-18R1, IL-18RAP) do not contribute to the genetics of either BPD or prematurity in CAs. Within the AA population, however, the SNPs described in this report may contribute to the pathogenesis of BPD and warrant further investigation.

In summary, we identified two SNPs associated with BPD risk in AA. The involvement of IL-18R1 and IL-18RAP further supports a role of inflammation in BPD pathogenesis.

Methods

Study Group

Following institutional review board approval from the Human Subjects Protection Office at the Pennsylvania State University College of Medicine of a multisite protocol, DNA samples from prematurely born infants with or without RDS, and/or with or without BPD (n = 1,099), were prospectively collected from 1989 to 2008 and genotyped. Parents of subjects were approached if they had babies born prematurely or with neonatal lung disease (RDS and/or BPD), and informed consent was obtained. Patients treated with surfactant prophylactically were excluded, but patients who received surfactant therapy after diagnosis were included. Prenatal steroid therapy, gestational age (GA), birth weight (BW), the individual’s race admixture estimate, sex, maternal steroid treatment, and surfactant therapy were recorded. The GA and BW of each study group are shown in Table 3 . BPD was defined as a need for supplemental oxygen therapy at 28 days of life (34). This definition was chosen (as opposed to examining the diagnosis based on supplemental oxygen at 36 postmenstrual weeks) in order to focus on the dichotomous outcome of the diagnosis of BPD, as opposed to studying BPD severity. RDS was diagnosed by the neonatologist based on clinical criteria (grunting, retraction, and flaring) and verified by radiographic analysis (reticulogranular pattern). Eight samples were excluded based on the decision rule for admixture proportions, leaving 922 and 169 samples identified as CA and AA, respectively. Of these, 682 CAs and 115 AAs had complete data for determination of BPD and were used in this analysis. The replication study consisted of 82 AA cases and 102 AA controls with mean GA for cases and controls of 26.64 and 30.15 weeks, respectively.

DNA Preparation and Genotyping

Genomic DNA was extracted with phenol/chloroform and quantified by Nanodrop (Wilmington, DE). Samples with low DNA concentration (n = 30) underwent whole genome amplification (WGA) by the WGA-REPLI-g kit (Qiagen, Valencia, CA). A fraction of the samples with adequate DNA concentration (n = 61) were genotyped using genomic and WGA DNA. Of the initial 1,267 genotyped samples, only 32 failed, and of these only five were from WGA DNA. The Illumina high-throughput platform was used in two independent runs to genotype 57 samples prepared by different individuals and different protocols and concentrations ranging from 21 to 883 ng/μl. All but two of the test samples had gene call scores >0.99, a quality metric that indicates the reliability of the genotype called (maximum gene call score = 1.0).

Selection of Targeted Genes and SNPs

Candidate genes were chosen from expression profiles of animal models of neonatal lung disease and lung inflammation (36,37), from National Institutes of Health–sponsored Programs for Genomic Applications related to human lung disease development (http://innateimmunity.net, http://www.hopkins-genomics.org, http://pga.mbt.washington.edu), and by fulfilling at least one of the following criteria: (i) involvement in innate immune response, inflammation, and tissue repair, and (ii) biologic plausibility in relation to lung disease development. Thus the candidate genes include genes of cytokines, inflammation, growth factors, antioxidants, cell adhesion receptors and proteins, apoptosis-associated proteins, cytoskeletal and mobility proteins, ion channels and transport proteins, receptors, proteases, transcription factors and DNA-binding proteins, modulator-effectors of edema and water channels, coagulation and fibrinolysis, extracellular cell signaling and communication, protein turnover, and other miscellaneous genes. A current website (http://public.nhlbi.nih.gov/GeneticsGenomics/home/) may contain the information of the earlier websites we used, which are not currently maintained.

Both nonsynonymous and tag SNPs were selected from HapMap (http://www.hapmap.org) and studied for each target gene plus 5 kb upstream and downstream of each target gene sequence, based on the quality metric (gene call score ≥0.8, validation class ≥2). A tagging algorithm was then run using the selected SNPs with a minor allele frequency (MAF) of 0.2 for CAs or MAF = 0.4 for AAs, and all the selected SNPs were assayed in genomic DNA samples from both CA and AA groups. The final list (n = 6,926) of SNPs selected included the following: (i) tag SNPs (n = 4,974) (of these, 3,789 were from the Caucasian population (CEU) and 1,195 from the Yoruban population (YRI)); (ii) nonsynonymous SNPs (n = 1,392); and (iii) European stratification SNPs (n = 560) (38). Of the 6,926 SNPs, 450 (6.5%) failed primer manufacturing and 152 (2.3%) failed genotyping, and these were excluded.

Addressing Population Stratification

Our initial sample was a mixture of AAs and CAs. We used the method implemented in STRUCTURE (version 2.2) (39) on Ancestry Informative Markers designed to distinguish between CAs and AAs, and on Ancestry Informative Markers by Seldin et al. (38) to distinguish between Northern and Southern European descent. The replication set consisted entirely of infants whose parent-reported race was AA.

Association Testing and Group Comparison

We used the linear trend test as implemented in PLINK (version 1.05) (40,41,42) in the discovery and replication sets. The data sets analyzed included AA and CA groups using the admixture proportions identified by STRUCTURE as covariates. The latter association was tested using logistic regression. Welch’s t-test (43) was used to test for the difference in mean GA and mean BW between cases and controls.
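For readers unfamiliar with the linear trend test, the sketch below implements the Cochran-Armitage trend test on genotype counts with additive weights (0, 1, 2). It assumes that this is the test meant by "linear trend test"; it is not PLINK's code, and the genotype counts in the example are hypothetical, not study data.

```python
import math


def cochran_armitage_trend(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    `case_counts` and `control_counts` hold the numbers of individuals with
    0, 1, and 2 copies of the minor allele. Returns the 1-df chi-square
    statistic and its P value.
    """
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n_total = n_cases + n_controls
    column_totals = [r + s for r, s in zip(case_counts, control_counts)]

    sum_wr = sum(w * r for w, r in zip(weights, case_counts))
    sum_wn = sum(w * n for w, n in zip(weights, column_totals))
    sum_w2n = sum(w * w * n for w, n in zip(weights, column_totals))

    numerator = n_total * (n_total * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n_total * sum_w2n - sum_wn ** 2)
    if denominator == 0:
        return 0.0, 1.0  # no genotype variation: the test is uninformative

    chi_square = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical genotype counts (0, 1, 2 minor alleles), for illustration only.
chi_square, p_value = cochran_armitage_trend((6, 10, 4), (70, 25, 5))
```

The additive weights encode the assumption that each extra copy of the minor allele shifts risk by the same amount on the test's scale; other weightings would correspond to dominant or recessive models.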

Correction for Multiple Testing

To correct for multiple testing, we used the q value method v1.1 developed by Storey and Tibshirani (24) and applied it to each group’s set of P values. Instead of controlling the probability of one or more false positives in a family of tests (the family-wise error rate), the q value controls the expected proportion of false positives among all rejected hypotheses (the false discovery rate (FDR)) (44). The q value method takes a given set of P values and estimates, for each test, the minimum FDR that is incurred when calling that test significant (the q value of the test). An FDR of 0.05 was used as the significance level.
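The idea of a q value can be illustrated with a simplified calculation. The sketch below is not the Storey and Tibshirani software: it fixes the estimated proportion of true null hypotheses at 1, in which case the q values reduce to Benjamini-Hochberg adjusted P values and are therefore conservative relative to the published method. The function name and the example P values are hypothetical.

```python
def bh_adjusted_p_values(p_values):
    """Benjamini-Hochberg adjusted P values.

    Equivalent to q values when the proportion of true nulls (pi0) is
    taken to be 1. Each adjusted value is the smallest FDR at which the
    corresponding test would be called significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending P values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values, for illustration only.
example_p = [0.005, 0.01, 0.03, 0.04, 0.5]
example_q = bh_adjusted_p_values(example_p)
```

A test would then be declared significant when its adjusted value falls at or below the chosen FDR level, 0.05 in this study.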

Replication Study

We genotyped a second, distinct set of samples (n = 184) derived from AA cases (44 males; 38 females) and controls (52 males; 50 females). Genomic DNA was amplified with the TaqMan universal PCR master mix and SNP genotyping assays (c__27514233_10 for rs3771171 and c__25808669_10 for rs3771150) (Applied Biosystems, Foster City, CA). Results were monitored by the ABI PRISM 7900 sequence detection system (Applied Biosystems) and analyzed by allelic discrimination. No correction for multiple testing was performed for these two SNPs.

Computing Combined P Values for Discovery and Replication Sets

We used Fisher’s method as implemented in P values (45) to combine P values across discovery and replication sets.
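Fisher's method combines k independent P values through the statistic X = −2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a minimal stand-alone version, not the software cited above; the function name is hypothetical. It uses the closed-form upper tail available for even degrees of freedom and is applied to the discovery and replication P values reported in this article.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method."""
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    k = len(p_values)
    half_statistic = -sum(math.log(p) for p in p_values)  # X / 2
    # Chi-square upper tail with 2k df: exp(-x/2) * sum_{j<k} (x/2)^j / j!
    term = 1.0
    series = 1.0
    for j in range(1, k):
        term *= half_statistic / j
        series += term
    return math.exp(-half_statistic) * series


# Discovery and replication P values as reported in the Results.
combined_rs3771150 = fisher_combined_p([3.87e-6, 0.012])
combined_rs3771171 = fisher_combined_p([5.75e-6, 0.07])
print(f"{combined_rs3771150:.2e}, {combined_rs3771171:.2e}")
```

Both values agree to three significant figures with the combined P values reported in the Replication Study results.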

LD Analysis

We used Haploview (26) to examine linkage disequilibrium (LD) relationships for rs3771171 and rs3771150 in YRI. The SNP input information was obtained from HapMap, and the Yoruban population was chosen because of its closeness to AAs. Hedrick’s multiallelic association measure D′ was calculated for all pairs of SNPs by the CI method (46).
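For two biallelic SNPs, the multiallelic D′ reduces to the familiar normalized disequilibrium coefficient. The sketch below shows that two-allele case only, under the assumption that haplotype and allele frequencies are already known; it does not reproduce Haploview's haplotype estimation, its confidence intervals, or its block definitions. The function name and example frequencies are hypothetical.

```python
def pairwise_d_prime(p_ab, p_a, p_b):
    """|D'| for two biallelic loci.

    `p_ab` is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; `p_a` and `p_b` are the corresponding
    allele frequencies.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max


# Hypothetical frequencies: the two minor alleles always occur together.
complete_ld = pairwise_d_prime(p_ab=0.1, p_a=0.1, p_b=0.1)
# Hypothetical frequencies: the loci are statistically independent.
no_ld = pairwise_d_prime(p_ab=0.25, p_a=0.5, p_b=0.5)
```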

Statement of Financial Support

This work was supported by National Institutes of Health grants HL34788 (J.F.) and R01HL71113 and R01HL87166 (R.M.V.).