Main

The cause of respiratory distress syndrome (RDS) of the prematurely born infant is likely a combination of multifactorial environmental insults and multigenic inherited predispositions (1, 2). Deficiency of the lipoprotein complex called pulmonary surfactant can lead to RDS in the prematurely born infant. Surfactant protein deficiency (3–5) as well as genetic variants of the surfactant protein A (SP-A) and SP-B have been associated with RDS (6–12).

SP-B is essential for normal lung function as assessed by animal (13) and human (14, 15) studies. Lower than normal amounts of SP-B protein in SP-B heterozygous (+/−) mice, under normal conditions, have been associated with physiologic lung abnormalities such as decreased compliance and air trapping (16). The human SP-B gene has been localized on the short arm of chromosome 2 (17). It consists of 11 exons, and the precursor molecule of 42 kD (18) undergoes 5′ and 3′ posttranslational cleavages to give rise to the mature SP-B of 6 kD (19).

A number of SP-B single-nucleotide polymorphisms (SNPs) (15) and four SP-B–linked microsatellite markers (20) have been characterized. Three of the microsatellite markers [D2S388, D2S2232, (AAGG)n] are located at the 5′ end of the SP-B gene at approximately 130, 64, and 27 kb, respectively, according to the high-resolution TNG3 radiation hybrid panel analysis (20). However, according to the current version of the human genome map of chromosome 2 (http://www.ensembl.org/Homo_sapiens), the D2S388 and D2S2232 markers are at approximately 191 and 105 kb from the centromeric end of SP-B, respectively. The microsatellite marker GATA41E01 (or D2S1331) is located at the 3′ end of SP-B at approximately 1074 kb by the G3 medium-resolution radiation hybrid panel (20) and at the 5′ end of SP-B at approximately 697 kb by the current human genome map. Microsatellite marker loci are highly polymorphic as assessed by their high values of polymorphism information content (PIC) or heterozygosity index. For example, many polymorphic dinucleotide repeats have PIC values >0.70 (21), with an average heterozygosity index of about 0.75 (22). As a result, microsatellite markers are considered powerful informative markers in genetic studies of disease etiology. The higher level of resolution afforded by these multiallelic microsatellite markers can lead to better detection of disease subgroups (21, 22). In contrast, the average PIC value of a SNP is about 0.3 (22). In general, SNPs are not as informative as microsatellite markers in linkage studies. However, haplotype analysis of multiple linked SNPs can be more informative.
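The difference in informativeness between multiallelic and biallelic markers can be illustrated with a short calculation. The sketch below uses the usual definitions of expected heterozygosity and PIC; the allele frequencies are hypothetical values chosen only for illustration and are not taken from the study data.

```python
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity: H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - cross


# Hypothetical allele frequencies (assumptions, not study data).
snp = [0.5, 0.5]        # biallelic SNP with equally frequent alleles
microsat = [0.1] * 10   # microsatellite with ten equally frequent alleles

print(round(pic(snp), 3), round(heterozygosity(snp), 3))            # 0.375 0.5
print(round(pic(microsat), 3), round(heterozygosity(microsat), 3))  # 0.891 0.9
```

Under these definitions a biallelic marker cannot exceed a PIC of 0.375, whereas a marker with many alleles can approach 1.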

In this report, using biallelic and multiallelic family-based association test (FBAT) (23–25) and extended transmission disequilibrium test (ETDT)/transmission disequilibrium test (TDT) analyses (for single-marker analysis, FBAT and TDT are essentially the same test), we aimed to a) determine whether SP-B or SP-B–linked loci are linked to the RDS locus and b) identify susceptibility and/or protective alleles and haplotypes for RDS.

METHODS

Genomic DNA was extracted from blood samples, buccal swabs, or discarded tissues from subjects with and without RDS and their parents according to institutional guidelines. The protocol for the use of human samples in this study was approved by The Human Subjects Protection Office of The Pennsylvania State University College of Medicine. The diagnosis of RDS was made by clinical criteria (grunting, flaring, retraction, need for oxygen and continuous positive airway pressure or ventilatory support) and/or verified by the reticulogranular pattern on x-ray. The genomic DNA served as template for PCR in the genotype analysis.

The study group consisted of 132 families. These families consisted of one (n = 43) or two (n = 89) parents and had at least one affected child. For these families, genotype data of the loci under study (see below) for the parents and for at least one affected offspring were used for analyses in the present study. Table 1 shows the characteristics of the study group. These include the number of affected and the total number of children per family, zygosity, race, sex, ethnicity, and age.

Table 1 Characteristics of population used in the study (N = 399)

For each microsatellite or SP-B SNP marker locus, the number of alleles or haplotypes transmitted indicates how often the particular allele or haplotype was transmitted to the affected offspring by the heterozygous parent(s). Therefore, in the study, only parents heterozygous for the particular marker alleles or haplotypes were used. For each allele at loci with multiple alleles (or haplotypes), a new biallelic system was created by collapsing the alleles other than the one studied into a new allele category X. For example, to study the transmission of allele 2 of marker D2S388, the other alleles (1, 3, 4, 5, etc.) of marker D2S388 were collapsed into a new allele category X.
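The collapsing step and the resulting transmission counts can be sketched as follows. This is a minimal illustration, not the software used in the study: the record layout (parent genotype, allele transmitted to the affected child), the function names, and the example data are all hypothetical. The statistic shown is the standard McNemar-type biallelic TDT.

```python
def collapse(allele, target):
    """Keep the allele under study; map every other allele to category 'X'."""
    return allele if allele == target else "X"


def count_transmissions(records, target):
    """Count transmissions of `target` from heterozygous parents.

    Each record is (parent_genotype, transmitted_allele). Parents that are
    homozygous after collapsing (target/target or X/X) are uninformative
    and are skipped.
    """
    transmitted = not_transmitted = 0
    for (a1, a2), sent in records:
        if collapse(a1, target) == collapse(a2, target):
            continue
        if collapse(sent, target) == target:
            transmitted += 1
        else:
            not_transmitted += 1
    return transmitted, not_transmitted


def tdt_chi2(b, c):
    """McNemar-type TDT statistic (1 df): (b - c)^2 / (b + c)."""
    return (b - c) ** 2 / (b + c) if (b + c) else 0.0


# Hypothetical parent-to-affected-child transmissions at one marker.
records = [((2, 4), 2), ((2, 3), 2), ((1, 2), 1),
           ((3, 5), 3), ((2, 2), 2), ((2, 5), 2)]
b, c = count_transmissions(records, target=2)
print(b, c, tdt_chi2(b, c))  # 3 1 1.0
```

In this toy example the 3/5 and 2/2 parents drop out because, once the other alleles are collapsed into X, they are no longer heterozygous for allele 2.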

Genotype analysis.

Genomic DNA was used as template for genotype analysis of four SNP loci [−18 (A/C), 1013 (A/C), 1580 (C/T), and 9306 (A/G)] in human SP-B (15, 26) and of four SP-B–linked microsatellite marker loci (20). The location of the microsatellite markers in relation to the centromere (C) and telomere (T) as well as the SP-B gene SNPs are shown in Figure 1. Originally, the location of the microsatellite markers was determined by radiation hybrid panels (20). The G3 medium-resolution hybrid panel positioned D2S2232 more distally from the SP-B locus than D2S388, whereas the TNG3 high-resolution hybrid panel positioned D2S388 as the most distal marker (compared with D2S2232) from the SP-B locus (20). The GATA41E01 or D2S1331 microsatellite marker was placed at the 3′ end of SP-B by the G3 medium-resolution radiation hybrid panel analysis (20) but at the 5′ end of SP-B by the human genome map. The location of the (AAGG)n microsatellite marker is as described by Kala et al. (20). In Figure 1, we show the relative location of the four microsatellite markers according to the current map of human chromosome 2 (http://www.ensembl.org/Homo_sapiens). The genotyping and scoring of alleles for the microsatellites was done as described previously (15, 20). A fraction of the SNPs were genotyped using pyrosequencing (Pavlovic et al., in preparation). The PCR-based RFLP genotype method (15) provided the basis for the pyrosequencing protocol, which is a primer-based DNA sequencing method. Briefly, single-stranded templates containing the SNP under study are produced by polymerase chain reaction (PCR). Then a sequencing primer hybridizes to the single-stranded template and initiates nucleotide incorporation by DNA polymerase. Following incorporation of each nucleotide, a pyrophosphate group is released, which is converted to adenosine triphosphate (ATP).
The ATP in turn drives the conversion of luciferin to oxyluciferin, producing light that is detected by a charge-coupled device camera. Each light signal generates a pyrogram and is proportional to the number of nucleotides incorporated onto the DNA strand. Pyrograms were scored by pattern-recognition software that compares the predicted SNP pattern (histogram) to the observed pattern (pyrogram) (Pyrosequencing AB, Uppsala, Sweden).

Figure 1

Schematic presentation of SP-B SNPs and SP-B centromeric flanking microsatellite marker loci. The relative location of the marker loci used in the present analysis is shown from the telomere (T) to the centromere (C) according to the current map of human chromosome 2 (see “Methods”). The distance of the flanking microsatellite marker loci is noted in kilobase from the 3′ end of the SP-B gene (or in megabase from the telomere). SP-B has been located from 5′ to 3′ within the 85.80697–85.79705 region from the telomere. The relative location of the granulysin precursor (GNLY) is noted by an arrow.

Statistical analysis. The PedCheck program was used to determine the compatibility of genotypes at each marker locus within families before analysis (27). Marker loci with incompatible parental and offspring genotypes were treated as missing data in the families in question. Association between alleles of surfactant protein genes and RDS was tested using the FBAT (23, 25, 28). ETDT analysis was performed to assess linkage of a multiallelic locus to the disease locus (29, 30).

FBAT analysis was performed using the online program http://www.biostat.harvard.edu/fbat/fbat.htm. Both multiallelic FBAT analysis for marker loci with multiple alleles or haplotypes (23) and biallelic FBAT analysis for SP-B SNP markers and selected microsatellites (23, 25) were performed. Haplotypes formed by all contiguous marker loci (two to eight loci) were tested. All FBAT analyses assumed an additive model, with a minimum size (minisize) set to four, indicating that the test statistic was not computed when the number of informative families available was fewer than four. A significant p value and a positive Z statistic were indicative of a susceptibility marker allele for disease, whereas a significant p value and a negative Z statistic were indicative of a protective marker allele for disease. Results were not corrected for multiple comparisons because the eight marker loci under study may not be independent given the short distances of their location on chromosome 2 (15, 20).
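The decision rule described above (sign of Z combined with a significant p value, and no test below the minimum number of informative families) can be sketched in a few lines. This is an illustration only, not FBAT itself: the function names are hypothetical, the two-sided normal approximation for the p value is an assumption, and the 0.05 threshold is assumed as the significance level.

```python
import math


def two_sided_p(z):
    """Two-sided p value for a standard normal Z statistic (assumed here)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def classify(z, n_informative, alpha=0.05, min_size=4):
    """Label a marker allele or haplotype from its Z statistic.

    min_size mirrors the minimum number of informative families below
    which the test statistic is not computed.
    """
    if n_informative < min_size:
        return "not tested"
    if two_sided_p(z) >= alpha:
        return "not significant"
    return "susceptibility" if z > 0 else "protective"


# Hypothetical Z statistics and informative-family counts.
print(classify(2.5, 10))   # susceptibility
print(classify(-2.5, 10))  # protective
print(classify(1.0, 10))   # not significant
print(classify(3.0, 3))    # not tested
```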

Missing parental haplotypes were reconstructed by the FBAT program based on observed parental and offspring genotypes assuming no recombination among marker loci. For individuals with multiple possible haplotypes, all the possible haplotypes were incorporated into the test. Different haplotype pairs were given different weights according to the conditional probability of observing the haplotype pairs given that they were compatible with the observed unphased genotype. FBAT evaluates the distribution of test statistics using the conditional offspring genotype distribution under the null hypothesis (24).
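The idea of weighting every haplotype pair compatible with an unphased genotype can be sketched as below. This is not FBAT's algorithm: it is a simplified illustration that assumes known population haplotype frequencies and random pairing of haplotypes, and the frequencies, names, and two-locus genotype are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased
    multilocus genotype, given as one (allele, allele) tuple per locus."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = tuple(g[c] for g, c in zip(genotype, choice))
        h2 = tuple(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def pair_weights(genotype, hap_freq):
    """Conditional probability of each compatible pair given the genotype,
    assuming random pairing of haplotypes (an illustrative assumption)."""
    raw = {}
    for h1, h2 in compatible_pairs(genotype):
        f = hap_freq.get(h1, 0.0) * hap_freq.get(h2, 0.0)
        raw[(h1, h2)] = f if h1 == h2 else 2.0 * f
    total = sum(raw.values())
    return {pair: w / total for pair, w in raw.items()} if total else {}


# Hypothetical two-SNP haplotype frequencies and a double heterozygote.
hap_freq = {("A", "C"): 0.4, ("C", "T"): 0.3, ("A", "T"): 0.2, ("C", "C"): 0.1}
genotype = [("A", "C"), ("C", "T")]
for pair, weight in pair_weights(genotype, hap_freq).items():
    print(pair, round(weight, 3))
```

A double heterozygote has two possible phase resolutions; with these assumed frequencies the A-C/C-T pair receives most of the weight because both of its haplotypes are common.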

RESULTS

ETDT and TDT analyses.

ETDT analysis identified three microsatellite marker loci [D2S388, (AAGG)n, GATA41E01] to be linked to RDS. As indicated by the partial output for these marker loci (Table 2), the allele-wise TDT model is more powerful for marker loci D2S388 (p = 0.04) and (AAGG)n (p = 0.045) because the χ2 for goodness of fit of the allele-wise model is not significant (Table 3). For marker locus GATA41E01 (or D2S1331), the genotype-wise TDT model is more powerful (p = 0.048; goodness of fit: p = 0.03). Table 2 depicts log likelihoods under the null and the parsimonious (allele-wise) hypotheses and for the saturated (genotype-wise) model. L0 of ETDT output represents the log base e likelihood of observing the transmission pattern, assuming there is no linkage and no linkage disequilibrium (LD). L1 of ETDT output represents the log-likelihood of the observed data maximized with respect to all m-1 parameters; m is the number of alleles at the locus. Each one of them has a transmission probability that can be estimated by maximizing the log-likelihood. L2 of ETDT output represents the log-likelihood of the observed data maximized with respect to the m(m-1)/2 parameters. Each heterozygote parent has a transmission probability for transmitting a certain allele that can be estimated by maximizing the log-likelihood. Table 3 depicts the χ2 and the likelihood ratio for allele-wise and genotype-wise TDT and for the goodness of fit of the allele-wise model.

Table 2 Partial output of the ETDT analysis for the microsatellite markers: log likelihood under various hypotheses or model
Table 3 Partial output of the ETDT analysis for the microsatellite markers: χ2 for various TDTs or model

The likelihood ratio depicts the likelihood of the observed data under the alternative hypothesis relative to the likelihood of the observed data under the null hypothesis. For example, the first row for each marker in Table 3 represents the likelihood of the observed data under the allele-wise model, assuming there is linkage and/or LD relative to the likelihood of the observed data, assuming there is no linkage and no LD. The likelihood ratio was computed using the unlogged likelihood under the alternative hypothesis divided by the unlogged likelihood under the null hypothesis. For example, the first row for each marker in Table 3 was computed by unlogged L1/unlogged L0, where L1 represented the maximized log base e likelihood of the observed data under the allele-wise model, and L0 represented the log base e likelihood of the observed data, assuming there is no linkage and no LD. One allele at each of these marker loci contributes to RDS risk (Fig. 2). These comprise two susceptibility alleles, D2S388_4 (p = 0.04) and GATA41E01_1 (p = 0.01) and one protective allele (AAGG)n_8 (p = 0.03).
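The relationship between the three log-likelihoods, the likelihood ratios, and the χ2 statistics can be sketched as follows. The sketch assumes the usual likelihood-ratio construction (χ2 equal to twice the difference in log-likelihood, with degrees of freedom equal to the difference in the number of free parameters); the function name and the numeric inputs are hypothetical and are not values from Tables 2 and 3.

```python
import math


def etdt_summary(l0, l1, l2, m):
    """Summarize ETDT-style output from natural-log likelihoods.

    l0: null model (no linkage, no LD)
    l1: allele-wise model, m - 1 parameters
    l2: genotype-wise (saturated) model, m(m - 1)/2 parameters
    m:  number of alleles at the locus
    """
    df_allele = m - 1
    df_genotype = m * (m - 1) // 2
    return {
        "allele_wise": {"chi2": 2.0 * (l1 - l0), "df": df_allele,
                        "likelihood_ratio": math.exp(l1 - l0)},
        "genotype_wise": {"chi2": 2.0 * (l2 - l0), "df": df_genotype,
                          "likelihood_ratio": math.exp(l2 - l0)},
        # Goodness of fit of the allele-wise model against the saturated one.
        "goodness_of_fit": {"chi2": 2.0 * (l2 - l1),
                            "df": df_genotype - df_allele},
    }


# Hypothetical log-likelihoods for a five-allele marker.
summary = etdt_summary(l0=-50.0, l1=-45.0, l2=-43.0, m=5)
for name, values in summary.items():
    print(name, values)
```

Dividing the unlogged likelihoods is the same as exponentiating the difference of the log-likelihoods, which is why the ratio is computed as exp(L1 − L0) here.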

Figure 2

Protective and susceptibility alleles and haplotypes. In the center (shaded area), the relative position of the SP-B SNP marker loci and the microsatellite markers present in the SP-B centromeric flanking region are shown (not drawn to scale). The protective and susceptibility haplotypes and alleles determined by multi-FBAT (haplotypes) and ETDT/TDT (alleles) are shown, respectively, above and below the SP-B diagram of the marker loci (shaded). For example, two protective haplotypes have been identified by FBAT (−18_A/1013_C/1580_T/9306_G and −18_A/1013_A/1580_T/9306_A), and one protective microsatellite allele [(AAGG)n_8] has been identified by ETDT/TDT analysis. FBAT and ETDT/TDT also identified a susceptibility haplotype and two microsatellite alleles. Three susceptibility haplotypes identified by biallelic FBAT analysis are also shown.

Based on the number of families listed in Table 1, we performed a power calculation using the computer program TDT power calculator, version 1.2.1 (31). Assuming that the disease allele has a frequency of 0.15 (with penetrance of 0.2, 0.1, and 0.05 for DD, Dd, and dd genotypes, respectively), that D′ (a measure of LD between the marker and disease allele) is 0.9, and that the recombination fraction (between the marker and disease locus) is 0, we have 72.7% power to detect a significant linkage and/or association at the 0.05 significance level if the marker has a minor allele frequency (MAF) of 0.1. Keeping all parameters unchanged except for the marker MAF, we have, respectively, 82.5%, 64.1%, 49.2%, and 35.4% power to detect a significant linkage and/or association at the 0.05 significance level if the markers have MAF of 0.2, 0.3, 0.4, and 0.5.

FBAT analysis.

Multiallelic FBAT analysis of the SP-B four-SNP region [−18(A/C)/1013(A/C)/1580(C/T)/9306(A/G)] indicated that the SP-B locus is linked to RDS (the overall p value for this region is 0.007). One susceptibility and two protective haplotypes of this region appear to contribute to this linkage (Fig. 2). The protective haplotypes are −18_A/1013_C/1580_T/9306_G (p = 0.04) and −18_A/1013_A/1580_T/9306_A (p = 0.02), and the susceptibility haplotype is −18_A/1013_C/1580_T/9306_A (p = 0.04). Although no other region showed significant linkage as judged by multiallelic FBAT, biallelic FBAT analysis identified three susceptibility haplotypes with significant p values (p < 0.01) formed by 2–4 SNP and/or microsatellite marker loci that can potentially contribute to RDS risk (Fig. 2). These are 9306_A/GATA41E01_1 (p = 0.003), 9306_A/1580_C/GATA41E01_1 (p = 0.004), and 9306_A/1580_C/1013_A/GATA41E01_1 (p = 0.005). Interestingly, the GATA41E01 (or D2S1331) locus shown by ETDT to be linked to RDS is present in all three susceptibility haplotypes identified by the biallelic FBAT.

DISCUSSION

The cause of RDS of the prematurely born infant is likely a combination of multifactorial environmental insults and multigenic inherited predispositions (1, 2, 32). SP-B is essential for normal lung function (13–15), and SP-B polymorphisms from case-control studies of unrelated individuals have been shown to be associated with RDS (7–9, 11, 12). In the present study, we used FBAT and ETDT analyses to avoid problems of false associations that can occur with case-control studies and to determine linkage and association of SP-B SNP or SP-B–linked microsatellite marker loci with RDS. The findings indicated that the SP-B four-SNP region (multiallelic FBAT) and three microsatellite marker loci (ETDT) are linked to RDS. Two protective haplotypes and one susceptibility haplotype were identified for the SP-B four-SNP region, and of the significant microsatellite marker loci, two susceptibility alleles (D2S388_4, GATA41E01_1) and one protective allele [(AAGG)n_8] were identified. These findings indicate that the SP-B genetic locus and the SP-B centromeric flanking region are linked to RDS and that certain haplotypes and alleles contribute to susceptibility to or protection from RDS.

For single-marker analysis, FBAT and TDT are essentially the same. However, for haplotype analysis, FBAT may be more appropriate for nearby loci that are likely to be in LD. The haplotype FBAT test uses a weighted conditional approach and assumes no recombination (24), whereas the GeneHunter haplotype TDT uses the maximum likelihood approach based on an earlier test, the Lander-Green hidden Markov model, and assumes no LD (33). FBAT and TDT measure the transmission frequency of alleles from a heterozygous parent to the affected offspring, whereas the nontransmitted allele serves as control. In this way, problems of false associations that may arise in case-control association studies due to unrecognized bias of stratification of the study group, admixture, or heterogeneity (34) may be minimized. Multiallelic FBAT and ETDT analyses can be used for a composite test analysis of all the alleles of a multiallelic locus, such as the microsatellite marker loci (29) or for the haplotypes of several SNP loci, and can determine linkage. Together, the data from the FBAT and ETDT analyses can provide evidence of linkage and association of the loci under study with the disease locus in question.

It is currently unknown whether the SP-B–linked microsatellite marker alleles, shown in the present study to be associated with increased or decreased risk of RDS, identify SP-B alleles or alleles of other (yet unknown) genes associated with RDS. A search of the human genome map specifically in the region between the microsatellite markers determined in the present study to be associated with and linked to RDS identified several known and unknown genes. One of the known genes is the granulysin gene, a T-cell product, which is located on the centromeric side of (AAGG)n at a distance of about 36 kb from SP-B or at 85.833 Mb from the telomere. Granulysin is expressed in human cytolytic T lymphocytes and in natural killer cells and contributes to host defense via its antimicrobial properties (35–37). Interestingly, granulysin shares sequence similarity with a family of proteins known as saposins, of which SP-B is a family member. Granulysin has been shown to exhibit lytic activity against a wide range of microbes, inhibit growth of both Gram-positive and Gram-negative bacteria (38), and induce apoptosis in Jurkat cells (39). Recently, SP-B has been shown, in addition to its surfactant-related activity, to play a role in host defense (40–42). Therefore, the sequence similarity between SP-B and granulysin and their role in host defense raise the possibility that these molecules have similar or overlapping host defense activities. However, if and how granulysin contributes to the development of RDS remain to be determined. Several other genes were identified that were located between D2S388 and D2S1331 (DNA directed polymerase 1, mitochondrial inner membrane protein, ribosomal protein L35, and receptor expression enhancing protein 1) as well as between D2S388 and D2S2232 (sialyl transferase 9, atonal homologue 8). None of these genes appear to be expressed in the lung, and thus their role (if any) in the pathogenesis of RDS is unknown.
However, the significant microsatellite alleles identified in the present study may play a role in the regulation of RDS-associated alleles. It has been shown that microsatellites allow formation of Z-DNA (left-handed) conformation, which in turn may promote interaction with trans-acting factors. Such an interaction may be influenced by the length of the microsatellites (43–45). A role of microsatellites in the regulation of the human fetal globin gene has been suggested in a preliminary study (46).

Previous case-control studies have associated SP-B marker alleles and RDS. The frequency of an SP-B intron 4 size variant has been shown to be higher in RDS (7, 9), and the combined frequency of the SP-B intron 4 variant and an SP-A variant was significantly higher in RDS than either marker allele alone, indicating a synergistic effect (9, 11). Moreover, another case-control study identified SP-B as an interactive genetic determinant with SP-A for RDS (8). Although the intron 4 variant was not investigated in the present study, association of SP-B variants with RDS is consistent with the finding that the SP-B genetic locus is linked to RDS.

In summary, the present findings indicate linkage and association of the SP-B gene as well as of SP-B–linked loci upstream (on the centromeric side) of the SP-B locus to the RDS locus. Several alleles and/or haplotypes are identified to be associated with increased or decreased risk of RDS. The SP-B gene has been shown to be essential for life, and genetic variants may modulate risk of disease under certain environmental insults or stressors. The significant SP-B–linked marker loci may identify loci of unknown genes associated with RDS or SP-B variants linked to the microsatellite markers. Moreover, the marker loci studied here and shown to be linked to RDS may be useful in identifying RDS subgroups for which different therapies may be considered, and their underlying mechanisms may be investigated.