Introduction

Surfactant protein C (SP-C) is a component of pulmonary alveolar surfactant, a complex lipid–protein mixture lining the alveolar surface of the lung. Deficiency of surfactant due to a lack of differentiation of the complex leads to respiratory distress syndrome (RDS), which is the major cause of mortality and morbidity among premature infants.1 Many premature infants with RDS develop a chronic lung disease called bronchopulmonary dysplasia (BPD). Some very premature infants, who do not initially have RDS, may develop BPD, suggesting that factors other than the differentiation of alveolar epithelial cells play a role in the pathogenesis.2

SP-C is expressed as a 197 amino-acid proprotein. Mature SP-C is an extremely nonpolar peptide that contains two thioester-linked palmitoyl groups. The amino acids 9–34 of mature SP-C, which are mostly valines and leucines, form a stable α-helix capable of spanning the membrane bilayer.3 Unlike other surfactant proteins, SP-C has been considered to be lung-specific. The cell type-specific expression of SP-C and the conservation of its primary structure between species imply specific functions of importance.4 However, the functional roles of SP-C are still incompletely understood. Mice lacking SP-C apparently have no serious lung disease, but show abnormalities in surfactant stability and decreased viscoelasticity of the lung. It has been suggested that SP-C is important in stabilizing the phospholipid film, especially at reduced lung volumes.5

In addition to SP-C, surfactant protein B (SP-B) is a hydrophobic protein involved in reducing surface tension at the air–liquid interface of alveoli. SP-B has additional roles in maintaining the integrity of alveolar epithelial cells. Absence of SP-B expression results in defective intracellular processing of the surfactant and secretion of a complex that contains an excess of proSP-C and no mature peptide.6,7 On the other hand, surfactant protein A (SP-A) and surfactant protein D (SP-D) are hydrophilic C-type lectins involved in the surfactant metabolism and innate immunity of the lung.8

The genes encoding surfactant proteins are polymorphic, as they contain several variable sites in their coding and noncoding DNA sequences. The SP-A, -B and -D genes have been studied as candidates for multifactorial lung diseases. Certain SP-A alleles and an SNP in the SP-B gene have been shown to associate interactively with RDS in premature infants.9

The SP-C gene is short, spanning only 3500 base pairs on the short arm of chromosome 8 (8p21) near the gene coding for bone morphogenetic protein-1 (BMP-1).10,11,12,13 The extent of allelic variation and the distribution of different SP-C alleles in the general population have not been examined in detail. The first report of an association between a mutation in the SP-C gene and a disease was published by Nogee et al.14 They demonstrated a single base substitution in an interstitial lung disease patient's SP-C gene that resulted in a premature stop codon causing a large deletion in the carboxyl-terminal domain of the patient's SP-C precursor protein. Recently, Nogee15 reported several new SP-C gene mutations in 11 patients with chronic lung disease of unknown etiology.

Detection of single-nucleotide polymorphisms (SNPs) in the candidate genes of complex multifactorial diseases is a useful tool in molecular genetics of complex traits. SNPs occur frequently in the genome, and they can be used as markers of disease-causing mutations or other alterations in DNA. SNPs often have functional consequences, especially when they are located in the coding or regulatory regions of a gene.16 Among the most widely used techniques of screening for single-base changes are single-strand conformation polymorphism and denaturing gradient gel electrophoresis, which are based on the screening of a PCR-amplified DNA fragment for the presence of mutations or nucleotide mismatches. Conformation-sensitive gel electrophoresis (CSGE) has proved to be a simple and accurate method for detecting single-nucleotide changes even in repetitive and GC-rich templates.17 Under optimal conditions, the method is highly sensitive and specific.18

The aim of the present study was to examine the extent of common genetic variation in the SP-C gene in a homogeneous Caucasian population of Finnish origin, and to investigate the significance of SP-C gene variation in a high-risk population consisting of very premature infants. The exonic regions of the SP-C gene were analyzed for polymorphisms, and methods for genotyping three biallelic polymorphisms in the coding regions of the SP-C gene were developed. SP-C allele and genotype frequencies were determined from a population of consecutively born healthy newborn infants, and the linkage disequilibrium between the SP-C alleles was evaluated by haplotype analysis. Here, we report for the first time evidence linking the SP-C gene with perinatal disease.

Subjects and methods

Study population

The study population of 245 high-risk newborns consisted of infants born before 34 weeks of gestation (term 37–42 weeks) in Oulu University Hospital, Seinäjoki Central Hospital and Tampere University Hospital. Umbilical cord blood samples (n=116) were collected prospectively during the years 1997–2001, and buccal smear samples (n=129) were obtained from the retrospectively recruited premature infants born during 1987–1996. Umbilical cord blood specimens were additionally obtained from 158 healthy term infants born consecutively in Oulu University Hospital during 2 months in 1998. All infants were of Finnish origin. In addition, buccal smear samples available from the parents of the premature infants were collected retrospectively (n=69). Informed consent was obtained from the parents of the infants, and the study protocol was approved by the ethical committees of the participating centres.

The maternal and neonatal medical histories were evaluated for clinical data concerning gestational age, antenatal and postnatal glucocorticoid treatment, gender, birth order, etiology of preterm birth, mode of delivery and severity of the respiratory disorders of the premature infant. The diagnosis of RDS was made on the basis of the published clinical, radiographic and/or pathologic criteria.19 The diagnostic criterion of BPD used in the study was the requirement for supplemental oxygen at the postmenstrual age of 36.0 weeks.2

Table 1 presents the characteristics of the premature infants (n=245) and the full-term infants (n=158). There were altogether 183 cases with RDS (106 males, 77 females). The incidence of RDS among the premature infants was 74.7%. Altogether, 75 of the premature infants developed BPD (45 males and 30 females). Of them, seven had no RDS at birth.

Table 1 Clinical characteristics of the premature and the full-term infants

DNA sample preparation

Genomic DNA was extracted from frozen EDTA-anticoagulated blood samples, using the Puregene DNA Isolation kit (Gentra Systems, Minneapolis, MN, USA). DNA was then diluted to 50 ng/μl. Genomic DNA was extracted from buccal smears using Chelex 100 medium.

CSGE analysis

A total of 90 DNA samples from full-term infants were analyzed with CSGE. Six pairs of primers were designed to specifically amplify the six exons of the SP-C gene (Table 2), using the sequence J03890 (NCBI nucleotide database)11 as a reference sequence. The PCR for each CSGE reaction was performed in a 20 μl reaction volume containing 100 ng of genomic DNA, 0.06 mM dNTPs (Pharmacia, Peapack, NJ, USA), 0.2 μM of each primer, 1 × HotPrime incubation buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl), 1.5 mM MgCl2 and 1.75 U of HotPrime™ ‘Hot start’ Polymerase (Qbiogene, France). The amplifying conditions were as follows: an initial denaturation at 95°C for 15 min, followed by 37 cycles of 30 s at 95°C, 1 min at a primer-specific temperature (58–65°C), 1 min at 72°C and final extension at 72°C for 7 min, and a 10°C dwell cycle (Thermal cycler PTC-200; MJ Research, Waltham, MA, USA). For amplification of the fragment containing exon 3, the denaturation step of the PCR program was 1 min, and the elongation step 2 min, using 0.5 mM dNTPs and 7.5 mM MgCl2.

Table 2 Primers used in CSGE analysis of the SP-C gene

Heteroduplexes were generated by incubating the PCR products at 98°C for 5 min, followed by 30 min at 68°C and a 10°C dwell cycle. A suitable amount of the PCR product, usually 10 μl, was taken for CSGE analysis. CSGE was essentially performed as described.20

Nucleotide sequencing

The observed heteroduplexes were identified by direct sequencing (ABI PRISM™ 377 Sequencer and DYEnamic ET Terminator Cycle Sequencing Kit, Amersham Pharmacia Biotech), or cloning (pGEM®-T Easy Vector System, Promega Corporation, Madison, USA) and sequencing. If several different heteroduplexes were observed for any analyzed fragment, at least two of each differently migrating heteroduplex were sequenced. Also, several homoduplexes were sequenced from each CSGE analysis. The sequences were then compared to the published genomic SP-C gene sequences J03890 (ref. 11) and U02948 (ref. 21) (NCBI nucleotide database).

Genotyping of SP-C alleles

For the genotyping of polymorphisms in exons 1 (P14P), 4 (N138T) and 5 (N186S), genotyping methods based on PCR and RFLP were developed. The PCR for each genotyping was performed in a 10 μl reaction volume containing 50 ng of genomic DNA, 0.06 mM dNTPs (Pharmacia, Peapack, NJ, USA), 0.2 μM of each primer, 1 × HotPrime incubation buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl), 1.5 mM MgCl2 and 0.9 U of HotPrime™ ‘Hot start’ Polymerase (Qbiogene, France). The PCR cycles consisted of initial denaturation at 95°C for 15 min, followed by 37 cycles of 95°C for 30 s, 1 min at a primer-specific temperature, 72°C for 1 min and a final extension at 72°C for 7 min (Thermal cycler PTC-200; MJ Research, Waltham, MA, USA). An aliquot of the PCR product was digested with a specific enzyme, and the resulting fragments were visualized on 2.5% agarose gel. A summary of the genotyping procedure is presented in Table 3. Altogether, 472 DNA samples were genotyped, including 158 samples from full-term infants, 245 from premature infants and 69 samples from the parents of the premature infants.

Table 3 Genotyping of SP-C polymorphisms in exons 1, 4 and 5

Statistical analysis and haplotype analysis between N138T and N186S polymorphisms

The allele and genotype frequencies were calculated using SPSS for Windows versions 9 or 11 (SPSS Inc., Chicago, IL, USA). Comparisons of the allele and genotype frequencies were performed by the χ² test (two-sided) using 2 × 2 or 2 × 3 contingency tables, respectively. The observed genotype frequencies were tested for Hardy–Weinberg equilibrium by χ² analysis. Linkage disequilibrium between the N138T and N186S polymorphisms was determined by haplotype analysis, by genotyping the 69 DNA samples available from the parents of the premature infants. The significance of linkage disequilibrium was determined using χ² analysis by comparing the observed and expected haplotype distributions. The expected haplotype distribution was calculated on the basis of the observed allele frequencies. The strength of linkage disequilibrium was determined by calculating Lewontin's D′.22
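The Hardy–Weinberg check described above can be sketched in a few lines of Python. This is only an illustration and not the SPSS procedure used in the study: the function name `hardy_weinberg_chi2` and the genotype counts in the usage note are hypothetical, and the P-value relies on the closed form for a χ² distribution with one degree of freedom.

```python
import math


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions at a biallelic locus.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Returns (frequency of allele a, chi-square statistic, two-sided P).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequency by gene counting: each individual carries two alleles.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Pearson chi-square; classes with zero expectation are skipped
    # (this only happens when the locus is monomorphic).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Three classes minus one estimated allele frequency minus one = 1 df.
    # For 1 df the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, chi2, p_value
```

For example, `hardy_weinberg_chi2(20, 80, 58)` evaluates a hypothetical sample of 158 infants; a P-value above 0.05 would be read as no evidence of departure from equilibrium.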

Logistic regression analysis (SPSS for Windows version 11) was used to investigate whether SP-C alleles explained the risk of RDS in the presence of gender as a confounding factor. Variables in the analyses were included using forced entry, and the presence or absence of RDS was set as the dependent variable.

Results

Polymorphism of the SP-C gene

The primers for CSGE were designed to amplify the six exons of the SP-C gene, each PCR fragment containing one exon and at least 70 base pairs of the 5′ and 3′ flanking sequences of the exon. In the analysis with CSGE, heteroduplexes were observed frequently in all amplified fragments, except in the fragment containing exon 1, where only two homologous heteroduplexes were detected among the 90 samples analyzed. To define the nucleotide changes causing the heteroduplexes, several samples of each fragment were cloned and sequenced after analysis with CSGE. The sequences were then compared to the published genomic human SP-C sequences. Here, the numbering of the nucleotides refers to the sequence J03890.11

A new variation, a heterozygous substitution of A for G (g.657G>A), was detected in the last nucleotide of exon 1 in two of the 90 samples analyzed with CSGE. This substitution did not change the amino-acid composition, since both codons CCG and CCA encode proline. The frequency of the CCA variant was found to be very low among the Finnish population (Tables 5, 6 and 7). No variation was found in exon 2, which encodes the mature SP-C protein. Sequencing revealed only one polymorphic site in intron 1, 21 base pairs upstream from the exon 2 start site (g.1335C/T), which caused the heteroduplexes observed in CSGE analysis. The codon for amino acid 45 (22 in mature SP-C) was CTC (Leu) in all sequenced samples, which differs from the sequence U02948.21 Exon 3 was also found to be conserved, since no polymorphisms were detected in the coding sequence. However, several homozygous changes were detected in intron 3 (g.1993–1994insC, g.2006–2007insC, g.2009–2010insG). Since the fragment of exon 3 was rather large and the sensitivity of CSGE could therefore be decreased, multiple samples were cloned and sequenced to ensure full analysis of this fragment.

Table 5 SP-C allele and genotype frequencies (%) among the premature infants with RDS and their premature controls
Table 6 SP-C allele and genotype frequencies (%) among the premature infants with BPD and the controls with or without RDS
Table 7 SP-C allele and genotype frequencies (%) according to the degree of prematurity

An AAT/ACT (g.2294A/C) polymorphism that resulted in p.N138T variation was detected in exon 4. We found both of these alleles to occur frequently in the Finnish population (Tables 5, 6 and 7). Also, a TG-deletion polymorphism was observed in intron 4 (g.2328–2329delTG). An AAC/AGC (g.2772A/G) polymorphism, which results in p.N186S variation, was detected in exon 5. Both of these alleles occur frequently in the Finnish population (Tables 5, 6 and 7). In addition, a G/C polymorphism was detected in intron 4, eight base pairs upstream from the first nucleotide of exon 5 (g.2643G/C). A single base deletion polymorphism leading to EcoRI-RFLP 23 was also present in intron 5 (g.2987delA). Exon 6, where several variable sites have been detected, is entirely untranslated. We found polymorphisms in the nucleotides g.3130A/G, g.3180T/C and g.3181A/G. Figure 1 presents the exonic polymorphisms that were detected in the SP-C gene.

Figure 1 Genetic polymorphisms in the exonic regions of the SP-C gene. The black boxes indicate exons and the gray boxes indicate untranslated exons.

Linkage disequilibrium between N138T and N186S polymorphisms

The allele coding for 138 Thr was found to exist more often than expected with the allele 186 Ser, and the allele coding for 138 Asn with the allele 186 Asn, suggesting linkage disequilibrium between these loci. The strength of this linkage disequilibrium was estimated by haplotype analysis, by genotyping the DNA samples available from the parents of 36 premature infants. Altogether, 69 DNA samples of the parents were genotyped, and 70 haplotypes could be confirmed. The results are presented in Table 4. The linkage disequilibrium was very strong (Lewontin's D′=0.973) between these loci.
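As a sketch of how Lewontin's D′ is obtained from confirmed haplotypes, the function below normalises the raw disequilibrium coefficient D by its maximum attainable value given the allele frequencies. The function name `lewontin_d_prime` and the haplotype counts in the usage note are hypothetical; the actual counts are those of Table 4, which are not reproduced here.

```python
def lewontin_d_prime(n_ab, n_aB, n_Ab, n_AB):
    """Lewontin's D' for two biallelic loci from haplotype counts.

    Counts are given for the four haplotypes, e.g. with locus 1 alleles
    a/A (138 Asn / 138 Thr) and locus 2 alleles b/B (186 Asn / 186 Ser):
    n_ab = a-b, n_aB = a-B, n_Ab = A-b, n_AB = A-B.
    """
    n = n_ab + n_aB + n_Ab + n_AB
    if n == 0:
        raise ValueError("at least one haplotype is required")

    p_a = (n_ab + n_aB) / n   # frequency of allele a at locus 1
    p_b = (n_ab + n_Ab) / n   # frequency of allele b at locus 2
    p_ab = n_ab / n           # observed frequency of haplotype a-b

    # Raw disequilibrium: observed minus expected under independence.
    d = p_ab - p_a * p_b

    # Largest |D| compatible with the observed allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return d / d_max
```

With hypothetical counts such as `lewontin_d_prime(20, 1, 1, 48)` for 70 haplotypes, the value approaches 1 because the two "recombinant" haplotypes are almost absent, which is the pattern a D′ close to 1 expresses.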

Table 4 Linkage disequilibrium (LD) between the N138T and N186S polymorphisms

Association analysis of SP-C alleles with RDS and BPD among premature infants

The frequencies of polymorphisms in the exons 1 (P14P), 4 (N138T) and 5 (N186S) were determined in the population of premature infants with RDS (n=183) and their premature control infants (n=62) (Table 5). The frequencies of the alleles coding for 138 Asn and 186 Asn among the infants with RDS were 0.32 and 0.35, while the frequencies among the infants without RDS were 0.23 and 0.25 (P=0.071 and 0.040, respectively). Among the infants without RDS and with BPD (n=7), the 138 Asn and 186 Asn frequencies were 0.14 and 0.21, respectively.
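The 2 × 2 allele comparison can be illustrated as follows. The sketch assumes an uncorrected Pearson χ² test (the text does not state whether a continuity correction was applied), and the allele counts are reconstructed from the rounded frequencies quoted above (0.35 of 366 chromosomes and 0.25 of 124 chromosomes for 186 Asn), so they are approximations rather than the study's raw data. The function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi-square statistic, two-sided P for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")

    # Shortcut formula, equivalent to summing (O - E)^2 / E over four cells.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

    # Upper tail of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Approximate 186 Asn allele counts, reconstructed from rounded frequencies.
rds_total = 2 * 183                        # chromosomes, infants with RDS
ctrl_total = 2 * 62                        # chromosomes, premature controls
rds_asn = round(0.35 * rds_total)          # reconstructed, not raw data
ctrl_asn = round(0.25 * ctrl_total)        # reconstructed, not raw data

chi2, p_value = chi2_2x2(rds_asn, rds_total - rds_asn,
                         ctrl_asn, ctrl_total - ctrl_asn)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3f}")
```

Under these assumptions the result lies close to the reported P-value for 186 Asn, which suggests, but does not prove, that an uncorrected test on allele counts was used.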

Among the very premature infants with gestational age of less than 30 weeks, the frequencies of 138 Asn and 186 Asn were 0.33 and 0.35 for those with RDS (n=144) and 0.20 and 0.24 for the control infants (n=27) (P=0.059 and 0.095, respectively).

When the infants with RDS and the premature control infants were further divided according to the gender, the association of alleles 138 Asn and 186 Asn with RDS was observed to be stronger among male infants (P=0.018 and 0.045, respectively). Logistic regression analysis was used to test whether alleles coding for 138 Asn or 186 Asn explained the risk of RDS when gender was included in the analysis as a confounding factor. According to this analysis, both alleles were independent risk factors for RDS (for allele 138 Asn: P=0.022, OR 2.01, 95% CI 1.10–3.66 and for allele 186 Asn: P=0.036, OR 1.87, 95% CI 1.04–3.38), as well as haplotype 138 Asn–186 Asn (P=0.020, OR 2.05, 95% CI 1.12–3.74).
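The odds ratios, confidence intervals and P-values above are linked through the logistic regression coefficient β and its standard error: OR = exp(β) and the 95% CI is exp(β ± 1.96·SE). The sketch below back-calculates β, SE and a Wald P-value from a reported OR and CI as a consistency check. It assumes Wald-type intervals, which is an assumption about the SPSS output rather than something stated in the text, and the function name `wald_from_or_ci` is hypothetical.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover beta, SE, Wald z and two-sided P from an OR and its 95% CI.

    Assumes the interval was built as exp(beta +/- z_crit * SE).
    """
    if min(odds_ratio, ci_low, ci_high) <= 0 or ci_low >= ci_high:
        raise ValueError("need 0 < ci_low < ci_high and odds_ratio > 0")

    beta = math.log(odds_ratio)
    # The CI is symmetric on the log scale, so its width gives the SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value


# Values reported in the text for allele 138 Asn.
beta, se, z, p_value = wald_from_or_ci(2.01, 1.10, 3.66)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, P = {p_value:.3f}")
```

Because the published OR and CI are rounded to two decimals, the recovered P-value can differ slightly from the reported one; close agreement indicates only that the reported figures are internally consistent.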

The association of SP-C polymorphism with BPD was tested. The study population included 75 infants with BPD and 170 controls. The control group was categorized based on the diagnosis of RDS, and the two groups, controls with RDS and controls without RDS, were independently compared with the group of infants with BPD (Table 6). There were no significant differences in the SP-C allele or genotype frequencies between the infants with BPD and the controls with RDS. When the controls without RDS (n=55) were compared with the infants with BPD, moderate differences between the allele frequencies emerged. The frequency of 138 Asn was 0.25 and the frequency of 186 Asn 0.26 among the controls without RDS, compared to 0.33 and 0.35 among the infants with BPD (P=0.154 and 0.112, respectively). There were no differences in associations between SP-C alleles and BPD among male or female infants.

Low birth weight and very preterm birth are important risk factors for BPD and RDS. We separately compared the SP-C allele frequencies between the BPD infants and their controls among the extremely low-weight infants (birth weight <1000 g, 43 BPD and 44 controls) and the infants with birth weight ≥1000 g (32 BPD and 125 controls). There were no significant differences in the SP-C allele frequencies between these groups. The frequencies of the alleles coding for 14 Pro (CCG), 138 Asn and 186 Asn among the extremely low-weight infants were (BPD vs controls) 1.0 vs 0.96, 0.28 vs 0.31 and 0.32 vs 0.32, respectively. Among the infants with birth weight ≥1000 g, the frequencies were 1.0 vs 1.0, 0.27 vs 0.28 and 0.30 vs 0.31. We also compared the SP-C allele frequencies between the BPD infants and their controls separately in terms of the degree of prematurity. The SP-C allele frequencies in the infants with BPD and the controls were determined separately for the infants born at gestation <28 weeks (41 BPD and 46 controls) and those born at gestation ≥28 weeks (34 BPD and 124 controls). There were no significant differences in the SP-C allele frequencies between these groups. The frequencies of the alleles coding for 14 Pro (CCG), 138 Asn and 186 Asn among the extremely premature (<28 weeks) infants were (BPD vs controls) 1.0 vs 0.97, 0.38 vs 0.32 and 0.39 vs 0.33, respectively. Among the premature infants with gestation ≥28 weeks at birth, the frequencies were 1.0 vs 1.0, 0.27 vs 0.27 and 0.29 vs 0.31.

SP-C allele and genotype frequencies according to the degree of prematurity

The SP-C allele and genotype frequencies were determined in relation to the degree of prematurity (Table 7). When the SP-C allele and genotype frequencies were compared between the group of extremely preterm infants (gestational age <28 weeks, n=87) and the group of healthy full-term infants (gestational age 37–42 weeks, n=158), significant differences were observed in the frequencies of alleles coding for amino acid 138 (P=0.046). Further analysis revealed an association only among the female infants. When the subgroups of full-term female infants (n=83) and extremely preterm female infants with gestational age less than 28 weeks (n=42) were compared, the allele frequencies for 138 Asn and 186 Asn were higher in the group of preterm female infants (P=0.012 and 0.042, respectively). Also, the frequency of the haplotype 138 Asn–186 Asn was increased in this group (P=0.012).

All the populations studied were in Hardy–Weinberg equilibrium, as tested by χ² analysis.

Discussion

In the present study, the common genetic variation in the SP-C gene was defined using conformation-sensitive gel electrophoresis, a highly specific method for screening single-base changes in double-stranded DNA fragments. A comparison of the CSGE and genotyping results of 90 DNA samples analyzed with both of these methods showed the sensitivity of CSGE to be 90% or more. A new variation was described, and a previously established variation was found to be associated with RDS, a serious neonatal disease caused by lung immaturity. Unlike variants of the other surfactant proteins SP-A and SP-B, an SP-C variant was associated not only with RDS, but also with premature birth.

A new variation, a heterozygous substitution of A for G (g.657G>A), was detected in the last nucleotide of exon 1. Since this substitution occurred at the exon–intron boundary, where conserved nucleotides are presumed to play an important role, it may have an effect on RNA splicing. However, the frequency of the variant CCA was found to be low, and it should therefore be considered a rare variant rather than a common polymorphism. N138T and N186S variations were detected in exons 4 and 5, respectively.24 These two biallelic polymorphisms were frequent in the Finnish population, and are potentially useful tools in association studies. Excluding the differences listed above, the intronic regions seem to be in accordance with the sequence U02948 rather than the sequence J03890. We did not detect any of the mutations recently described by Nogee,15 which is consistent with their being not common polymorphisms, but mutations associated with interstitial lung diseases of varying etiologies.

Both of the polymorphic amino acids 138 and 186 are located in the C-terminal part of the proSP-C molecule. The C-terminal region has been shown to be crucial for the targeting and processing of proSP-C that is eventually secreted as SP-C.25,26 Furthermore, most of the mutations described in the SP-C gene have been located in this region. Interestingly, two different single-base substitutions associated with lung disease have been detected in the codon for amino acid 188,15,27 only two residues apart from amino acid 186, where the Asn/Ser polymorphism is present.

Antenatal glucocorticoid and surfactant therapy at birth diminish the incidence and severity of RDS considerably, and hence the number of neonatal deaths. Both therapies have a clear-cut influence on the endogenous surfactant system. Interestingly, neither of these synergistic therapies diminishes the incidence of BPD, which is a heterogeneous disease associated with prematurity and RDS. There are a number of pathogenic factors that potentially influence the susceptibility to BPD. Very low birth-weight infants, despite minimal or no respiratory distress shortly after birth, may develop BPD.28,29 Dominant SP-C gene alterations may play a role in the etiology of BPD. The mutations in the SP-C gene encoding proSP-C were associated with interstitial lung disease often manifesting in infancy.14,15 In addition, Thomas et al27 reported a single-base SP-C gene mutation (188L>Q) that segregated with the pulmonary fibrosis phenotype in one kindred. However, we did not detect any direct association between the common SP-C alleles and BPD, as studied in 75 cases and 170 premature controls. In the present study, seven of the 75 BPD cases did not have RDS at birth. In these seven cases, the frequencies of 138 Asn and 186 Asn (0.14 and 0.21, respectively) tended to be lower than in RDS.

Considering the role of SP-C in surfactant function, we propose that the SP-C alleles influence the susceptibility to RDS in premature infants. A significant difference was observed in the allele frequency of 186 Asn (P=0.040) between the infants with RDS (n=183) and the gestation controls (n=62), and a trend was observed in the allele frequency of 138 Asn (P=0.071). When the groups of very premature infants (gestational age less than 30 weeks) were compared, the differences in the 138 Asn and 186 Asn allele frequencies between the RDS cases and the controls appeared distinct, although the sample size of the controls was small (n=27) and the results were not statistically significant (P-values 0.059 and 0.095).

Apart from genetic factors, RDS is strongly influenced by environmental factors, most notably the length of gestation. In the present study, a trend towards gestational dependence of SP-C alleles was found. Among the group of extremely premature infants with gestational age of less than 28 weeks (n=87), the frequency of 138 Asn was increased compared to the infants with gestational age of 28–32 weeks (n=150) or the full-term infants (n=158, P=0.046).

Since RDS and BPD are highly complex diseases with multiple environmental and genetic factors affecting the susceptibility, the impact of one candidate gene on the outcome is not likely to be dominating. Association studies with multiple comparisons may on some occasions result in Type I error. However, statistical correction methods can easily decrease the power of the study, especially when the effect of the gene examined is known to be limited. Therefore, we want to point out that the P-values from the significance tests in this study were not corrected for multiple comparisons, and the possibility of Type I error cannot be ruled out.

Almost all extremely premature infants have RDS (90% in the present series) that is a consequence rather than a cause of premature birth. The association between the SP-C alleles, premature birth and RDS could have arisen by chance. However, these observations raise the possibility that the differentiation of the surfactant system required for the neonatal respiratory adaptation and the labor process share some regulatory pathways. SP-C and the other surfactant components are secreted into the amniotic fluid. Their concentration in the amniotic cavity increases as a function of the length of gestation, reducing the risk of RDS in an unborn fetus. The hydrophobic surfactant proteins (SP-C and SP-B), when added to cultured amnion cells, stimulate the production of prostaglandins required in the labor process.30 The present finding associating very premature birth with the SP-C allele 138 Asn, which potentially affects the secretion of SP-C from immature lung, is consistent with foetal involvement in the labor process.31 The present observation that the allelic association with RDS and very premature births was influenced by the foetal sex is appealing, since the risk of premature birth and the incidence of RDS among premature infants are both influenced by the foetal gender. The expression levels of SP-C in the foetal lung differ between males and females.32 The possibility that a foetal factor required for neonatal adaptation is additionally involved in the onset of the labor process remains to be substantiated by further genetic and experimental studies.