Introduction

Cystic fibrosis (CF), the most common lethal autosomal recessive disease, is caused by mutations in the CF transmembrane conductance regulator (CFTR) gene.1, 2 So far, more than 1900 mutations have been reported to the Cystic Fibrosis Genetic Analysis Consortium (www.genet.sickkids.on.ca/cftr). The most common mutation is p.Phe508del leading to an abnormally folded protein.

CFTR encodes a cyclic adenosine monophosphate-regulated chloride channel at the apical membrane of epithelial cells.3 Clinically, CF is characterized by elevated sweat chloride concentration, exocrine pancreatic insufficiency, male infertility and progressive obstructive lung disease, the latter being the primary cause of mortality.4, 5 The respiratory disease progression in CF is expressed by various pulmonary dysfunctions,6, 7, 8, 9, 10 such as ventilation inhomogeneities,8, 11 pulmonary hyperinflation,9, 12 bronchial obstruction, trapped gas and gas exchange disturbances,10 most of them occurring early in life,7, 12, 13 and progressing even in the absence of clinical signs and symptoms.11, 14, 15 Owing to the great phenotypic variability in lung disease even observed among patients carrying the same CFTR genotype, several association studies have been conducted to find modifying genetic factors and to study their influence on CF disease outcome.16 So far, five genes showed an association in at least two independent populations with more than 500 participants in total, namely MBL2, IFRD1, IL8 and TGFB1 that are involved in the immune response and EDNRA that might influence CF lung disease by altering smooth muscle tone in the airways and/or vasculature.17, 18, 19, 20, 21 IFRD1 was initially found in a genome-wide association study (GWAS) comprising 160 severe and 160 mild CF patients from the GMS (extremes of phenotype) cohort and replicated in the TSS cohort (family-based CF Twin and Sibling study cohort). Interestingly, IFRD1 did not achieve genome-wide significance in another GWA and linkage study including 1978 individuals from the GMS and CGS (Canadian Consortium for Genetic Studies) cohort. 
However, the same study identified two loci at 11p13 and 20q13.2 modifying disease severity, and these findings were replicated in a separate family-based study in the TSS cohort (n=557).22 This example shows that GWAS in huge populations like the North American Cystic Fibrosis Gene Modifier Consortium, comprising the GMS, CGS and TSS cohorts, is a powerful tool, but that there is a danger of false-negative results due to the stringent corrections for multiple comparisons in the statistical analysis and the heterogeneity of the cohorts. Therefore, being aware that large and small cohorts both have their pros and cons, and having a small but well-defined cohort at our disposal, we chose a candidate gene approach. Keeping in mind the danger of false-positive results, we evaluated the influence on lung disease of variants in 10 genes of the CF interactome, coding for proteins that either directly interact with CFTR or the epithelial sodium channel (ENaC), a counterplayer of CFTR, or are involved in CFTR trafficking. The genes interacting with CFTR are SNAP23 (synaptosomal-associated protein, 23 kDa), AnxA5 (Annexin A5) and PPP2R4, PPP2R1A and PPP2R5E, which encode three regulatory subunits of protein phosphatase 2A.23, 24, 25, 26, 27, 28, 29 The two genes interacting with ENaC are Nedd4L (neural precursor cell expressed, developmentally downregulated 4-like) and PRSS8 (prostasin).30, 31, 32

The last group consists of three genes that are involved in CFTR trafficking: AHSA1 (activator of heat shock 90-kDa protein ATPase homolog 1), CALR (calreticulin) and KRT19 (cytokeratin 19).33, 34, 35, 36

Most candidate gene and whole-genome association studies so far have focused on time point measurements of only one or two lung function parameters. To gain better insight into the influence of our single-nucleotide polymorphisms (SNPs) on CF lung disease, we used a new approach and correlated our genotype data with a wider range of lung function parameters measured over a time period to evaluate the influence of a SNP on the progression of the lung disease. We were interested not only in lung function parameters quantifying the degree of bronchial obstruction and hence airway patency, but also in parameters representing pulmonary hyperinflation, ventilation inhomogeneities, and reduction of forced expiratory flow and volume characteristics. In addition, we decided to investigate the influence of variants on longitudinal data of lung function parameters to get an idea of the progression of the lung disease over time. To achieve this, we used the clinically very carefully characterized patient cohort of the Children’s University Hospital in Bern documented over a time period of 13 years.8, 9, 10 Because these patients were treated at the same hospital, environmental influences in the context of medical treatment are reduced in this cohort, making it well suited to the study of modifying effects.

Materials and methods

Patients

We studied 95 p.Phe508del homozygous patients (53 male and 42 female; birth years 1975–2001; onset of chronic Pseudomonas aeruginosa infection by the age of three in 37.8% and after the age of three in 62.2%) attending the Department of Pediatrics, Inselspital, Bern, Switzerland.8 Patients included in this study fulfilled the following criteria: (1) being homozygous for p.Phe508del, (2) having characteristic phenotypic features including positive sweat test and documented onset of chronic P. aeruginosa infection, and (3) providing lung function data from age 6 to 20 years. To investigate the influence of modifier genes on different aspects of CF lung disease, six lung function parameters were evaluated: (1) functional residual capacity (FRCpleth; degree of pulmonary hyperinflation), (2) lung clearance index (LCI; degree of ventilation inhomogeneities), (3) effective specific airway resistance (sReff; degree of bronchial obstruction), (4) forced expiratory volume in 1 s (FEV1), (5) forced expiratory flow at 50% forced vital capacity (FEF50) and (6) volume of trapped gas (VTG). All participants gave their full informed consent and the study was approved by the local ethics committee.

Lung function testing

Whole-body plethysmography and multibreath nitrogen washout (MBNW) techniques provided data pertaining to FRCpleth, FRCMBNW, LCI and sReff, whereas FEV1 and FEF50 were assessed by spirometry. Measurement techniques have been described in detail previously.8, 9, 10, 37 All values were expressed by z-transformation in SD scores (SDS), based on gender- and age-specific regression equations.38, 39 Details regarding measurement techniques and clinical evidence of these target parameters are given in the online data supplement.
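The z-transformation mentioned above can be illustrated with a minimal sketch (not the authors' implementation). It assumes the usual definition of an SD score: the measured value minus the value predicted by the reference regression equation, divided by the residual standard deviation of that equation. The function name and the numbers are hypothetical; the actual gender- and age-specific equations are those of the cited references.

```python
def sd_score(measured: float, predicted: float, residual_sd: float) -> float:
    """Return the SD score (z-score) of a single lung function measurement.

    predicted   -- value expected from a reference regression equation (assumed given)
    residual_sd -- residual standard deviation of that reference equation
    """
    if residual_sd <= 0:
        raise ValueError("residual_sd must be positive")
    return (measured - predicted) / residual_sd


# Hypothetical numbers, for illustration only: a measured value below the predicted one
example_sds = sd_score(measured=2.1, predicted=2.5, residual_sd=0.25)
print(round(example_sds, 2))
```

Expressing all parameters on this common scale is what allows trajectories of different lung function parameters to be compared across ages and genders.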

Single-nucleotide polymorphisms

SNPs that potentially alter gene function were chosen from the NCBI Database according to the following criteria: (1) missense SNPs, (2) intronic SNPs in proximity of splice sites (−50, respectively, +15 bp), (3) synonymous SNPs located in exonic splicing enhancer (ESE) and silencer (ESS) motifs and (4) SNPs in the 5′- and 3′-untranslated region (UTR). SNPs with a minor allele frequency <10% were excluded owing to the small genotype counts leading to groups that are too small for our statistical model. For the complete list of SNPs including corresponding primers see online data Supplement Table S1.
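The minor allele frequency criterion can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure used in the study: genotypes are assumed to be biallelic and written as two-letter strings, and the function names and genotype counts are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Minor allele frequency from biallelic genotypes such as 'AA', 'AC', 'CC'."""
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    if len(allele_counts) < 2:
        return 0.0  # monomorphic sample: the minor allele is absent
    return min(allele_counts.values()) / total


def passes_maf_filter(genotypes, threshold=0.10):
    """True if the SNP is kept, i.e. its minor allele frequency is at least the threshold."""
    return minor_allele_frequency(genotypes) >= threshold


# Hypothetical SNP in 95 patients: 80 AA, 14 AC, 1 CC -> 16 C alleles out of 190
rare_snp = ["AA"] * 80 + ["AC"] * 14 + ["CC"] * 1
print(passes_maf_filter(rare_snp))
```

In this hypothetical case the SNP would be excluded, because 16/190 is below 10% and the CC group would contain a single patient.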

Genotyping

Genomic DNA was isolated from EDTA blood using the QIAamp DNA Blood Maxi Kit (Qiagen GmbH, Hilden, Germany), according to the supplier’s protocol. Genotyping was performed by high-resolution melting (HRM) using the LightCycler High Resolution Melting Master (Roche Diagnostics GmbH, Mannheim, Germany) on the LightCycler 480 (Roche Diagnostics GmbH), by single-strand conformation polymorphism/heteroduplex analysis according to Liechti-Gallati et al40 or by restriction fragment length polymorphism analysis as indicated in the online data supplement. The accuracy of the genotyping was verified in around 10% of the PCR products by Sanger sequencing (see online data supplement).

Statistical analysis

Linear mixed model analysis was performed to assess the influence of variants and haplotypes in the candidate genes in a recessive, codominant and dominant model on longitudinal measures of each lung function parameter using the PROC MIXED procedure of the SAS 9.2 software (SAS Institute, Cary, NC, USA). In addition, we investigated the cumulative effects of the SNPs that showed a significant influence individually. The lung function parameter of interest was included in the calculations as the dependent variable, and the SNP genotype, age at lung function measurement, birth year, test year and gender as explanatory variables. The genotype model with the lowest Akaike information criterion, that is, the one fitting our data best, was chosen. P-values of all genotype comparisons were obtained with a contrast statement. We performed a Bonferroni correction for 10 genes and considered associations with P≤0.005 to be statistically significant. In case of an association with one lung function parameter with P≤0.005, we also included associations of this variant with other lung function parameters with P≤0.01 in our evaluation.
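The three genotype models, the choice of model by the Akaike information criterion and the Bonferroni threshold can be sketched in a few lines. This is an illustration in Python and not the SAS code used in the study; the function names, the example log-likelihoods and parameter counts are hypothetical, and a nominal significance level of 0.05 is assumed because it is what the stated threshold of 0.005 for 10 genes implies.

```python
def encode_genotype(genotype: str, allele: str, model: str) -> int:
    """Code a biallelic genotype (e.g. 'AC') with respect to one allele."""
    copies = genotype.count(allele)  # 0, 1 or 2 copies of the allele
    if model == "dominant":          # one copy is enough to show the effect
        return int(copies >= 1)
    if model == "recessive":         # two copies are required
        return int(copies == 2)
    if model == "codominant":        # three genotype groups, to be treated as categories
        return copies
    raise ValueError(f"unknown model: {model}")


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike information criterion; smaller values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood


def best_model(fits: dict) -> str:
    """fits maps model name -> (log-likelihood, number of parameters)."""
    return min(fits, key=lambda name: aic(*fits[name]))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical fit results for one SNP and one lung function parameter
fits = {"recessive": (-120.0, 8), "codominant": (-118.5, 9), "dominant": (-121.0, 8)}
print(best_model(fits))
print(bonferroni_threshold(0.05, 10))
```

Correcting for the number of genes rather than the number of SNPs or lung function parameters is the choice described in the text; the sketch only reproduces the arithmetic of that choice.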

Graphical displays of all significant associations were done using R 2.9.0 (www.r-project.org). The corresponding codes are listed in the online data supplement.

Haplotypes were computed using the Arlequin 3.5 software.41

In silico tools

To analyze the influence of significant variants on transcription and translation, we used different in silico tools according to the location of the variants in the gene. To assess the pathogenicity of the missense SNP in KRT19, we used PolyPhen-2 and SIFT.42, 43 For intronic and synonymous SNPs, we used the Human Splicing Finder tool with default threshold levels and the SpliceAid 2 tool for alveolar cells.44, 45

Results

Genotyping

In total, we genotyped 72 SNPs in 10 candidate genes. Genotype and allele frequencies of all variants were in Hardy–Weinberg equilibrium in our population.
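A check for Hardy–Weinberg equilibrium can be sketched as below. The text does not state which test was applied, so the chi-square goodness-of-fit test with one degree of freedom shown here is an assumption, as are the function name and the genotype counts. For small genotype counts an exact test is often preferred.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) of observed genotype counts against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts that match Hardy-Weinberg proportions exactly
chi2, p_value = hwe_chi_square(25, 50, 25)
print(chi2, p_value)
```

A large P-value gives no evidence against equilibrium, whereas a marked deficit or excess of heterozygotes yields a large chi-square value and can point to genotyping problems.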

Seventy SNPs are known to the NCBI Database and two SNPs were found during our study (c.-131C>T in AnxA5 and c.48+24C>T in Nedd4L). Of these SNPs, 47 showed an allele frequency <10% and were therefore excluded. Twenty-five SNPs were included in the statistical analysis (Table 1). SNPs in linkage disequilibrium are displayed in Table 2.

Table 1 Variants showing a minor allele frequency ≥10%
Table 2 Variants in linkage disequilibrium

Influence of variants on lung function

We found six variants that presented with significant associations with at least one lung function parameter (Table 3 and Figure 1).

Table 3 Results of random coefficient models (only variants with significant associations are shown)
Figure 1

Results of random coefficient model: (a) For all three lung function parameters, genotype AA of PPP2R4:c.-185A>C has a significantly different slope and overall a worse pulmonary outcome. (b) Genotype CT of SNAP23:c.267-9T>C shows a significantly different intercept from genotype CC in FEF50 and a significantly steeper slope in FRCpleth. For VTG, genotype TT has a significantly steeper slope than genotype CT. (c) The slope of genotype GG is significantly different from genotype CG for KRT19:c.179G>C in association with sReff. (d) PPP2R1A:c.*465T>A genotype TT shows a significantly steeper slope of the curve than genotype AA for FRCpleth.

For PPP2R4:c.-185A>C, lung disease progression with age was significantly stronger for AA than for AC/CC, best reflected by sReff and FEV1, the former a measure of bronchial obstruction of larger and smaller airways, the latter an expression of loss of elastic recoil (Figure 1a). Moreover, LCI, a measure of ventilation inhomogeneities and hence an early outcome parameter, was already increased in both subgroups at age 6, and the SDS of genotype AA deteriorated significantly with age.

Calculated in a codominant model, SNAP23:c.267-9T>C showed significant associations with the lung function parameters FRCpleth, FEF50 and VTG (Figure 1b). Although genotype CT presented with the lowest progression in pulmonary hyperinflation, this subgroup already started with a z-score of 1.6 at age 6 years. Significantly higher progression of pulmonary hyperinflation with age was found for genotype CC. FEF50, an estimate of small airway dysfunction, showed the mirror image, with the most pronounced decline for genotype CC. For VTG there is a highly significant difference in the slope of genotype TT compared with CT, indicating that genotype TT results in faster VTG progression.

Three SNPs in KRT19, KRT19:c.90T>C, KRT19:c.179G>C and KRT19:c.471C>T, show association with sReff (P≤0.01, P≤0.005 and P≤0.01, respectively) in a codominant model. Because these three SNPs are in linkage disequilibrium (Table 2), only the graph for KRT19:c.179G>C (Figure 1c) is displayed, where genotype GG shows a significantly steeper slope and therefore more pronounced progression.

Finally, for FRCpleth, genotype AA in PPP2R1A:c.*465T>A presented with a much milder course than genotypes TT and AT when calculated in a codominant model (Figure 1d). It can be hypothesized that homozygosity for the mutant allele may confer a certain protection from the development of pulmonary hyperinflation.

Haplotype analysis did not result in significant associations with lung disease except for the two main haplotypes TGC and CCT in KRT19 (KRT19:c.90T>C, KRT19:c.179G>C and KRT19:c.471C>T), where homozygosity for haplotype TGC results in a significantly steeper slope (P≤0.01), represented by the graph of KRT19:c.179G>C (Figure 1c).

Predicted influence of variants on transcription and translation

For KRT19:c.179G>C, a missense SNP in exon 1 of KRT19, we used PolyPhen-2 and SIFT to evaluate its pathogenicity. PolyPhen-2 predicted this variant to be benign with a score of 0.000, and SIFT predicted it to be tolerated with a score of 1; therefore, this SNP does not seem to have any harmful effect. For SNAP23:c.267-9T>C, a SNP in an intronic region, and for KRT19:c.90T>C, KRT19:c.179G>C and KRT19:c.471C>T, Human Splicing Finder and SpliceAid 2 were used to assess the influence of the variants on splicing. SNAP23:c.267-9T>C does not seem to influence splice sites directly, but the consensus value (CV) of the T allele sequence compared with the C allele sequence is reduced by 23.8 points (on the 0–100 CV scale) for a branch point motif, leading to a loss of the splice site with high probability. For the three KRT19 SNPs, we looked for changes in ESSs and ESEs with the SpliceAid 2 software. The T allele of KRT19:c.90T>C leads to a change in the recognized sequence of the protein HuB, an RNA-binding protein that has an important role in neuronal differentiation. SNP KRT19:c.179G>C does not lead to any changes in ESS or ESE. KRT19:c.471C>T is the most likely of the three SNPs to influence splicing, because the C allele creates a sequence that is recognized by YB-1, a positive splicing factor (for more details see Supplement Table S2 in the online data supplement).

Discussion

Disease severity in CF varies greatly among patients, even when they carry the same CFTR genotype.16 We hypothesized that genes interacting with CFTR and the epithelial sodium channel ENaC might be potential modifiers of CF disease severity. Therefore, we assessed the impact of variants of these interactors on CF disease outcome. In contrast to most association studies performed so far, we used longitudinal lung function data of six lung function parameters. In total, we found one variant in PPP2R4, one variant in PPP2R1A, one variant in SNAP23 and three variants in KRT19 that presented with significant associations with at least one lung function parameter.

The pathogenicity of p.Phe508del CFTR is due to its inability to fold properly in the endoplasmic reticulum (ER), destining it for ubiquitination and degradation by the proteasome. Nevertheless, when it reaches the apical membrane, it shows activity. Interestingly, there is still a slight remaining CFTR function in CF patients homozygous for p.Phe508del, indicating that some CFTR proteins pass the ER quality control and reach the apical membrane.46 We think that variants in direct interactors like PP2A and SNAP23 might enhance the residual function of CFTR, while variants in KRT19, which is involved in trafficking of CFTR, might raise the amount of functional CFTR at the apical membrane and therefore modify CF disease.

PP2A and PP2C are the major phosphatases involved in CFTR deactivation. Phosphorylated CFTR is a substrate for PP2A dephosphorylation and it was shown that inhibition of PP2A activity delays the closure of the CFTR channel, while an overexpression of PP2A results in prolonged deactivation of the channel.27, 29 Thelin et al28 demonstrated that inhibition of PP2A in well-differentiated human bronchial epithelial cells increases the airway surface liquid (ASL) in a CFTR-dependent manner. In our study, we looked at PPP2R4, the protein phosphatase 2A activator (regulatory subunit 4) of PP2A, and, in accordance with Thelin et al, at the two regulatory subunits PPP2R1A and PPP2R5E. SNP PPP2R4:c.-185A>C genotype AA was found to be associated with a worse progression for the three lung function parameters FEV1, LCI and sReff (Figure 1a). PPP2R4:c.-185A>C is a variant in the 5′ UTR of PPP2R4. We hypothesize that the C allele might interfere with the translation machinery and therefore lead to reduced PP2A levels, resulting in delayed closure of the CFTR channel and elevated ASL. Variant PPP2R1A:c.*465T>A is located in the 3′ UTR of PPP2R1A and therefore could affect the abundance of the mRNA by influencing the stability of the transcript or the site selection for polyadenylation. In addition, it is known that there are several mechanisms of translational control by the 3′ UTR.47 It is therefore expected that variants in the 3′ UTR might influence translation and thus protein levels. Genotype AA in PPP2R1A:c.*465T>A shows a much milder progression than genotype TT for the lung function parameter FRCpleth, indicating that homozygosity for the mutant allele seems to interfere with translation, leading to lower levels of PP2A and therefore to a better residual function of CFTR (Figure 1d).

SNAP23:c.267-9T>C, an intronic variant located at position −9 relative to exon 6 in SNAP23, showed significant associations with the three lung function parameters FEF50, FRCpleth and VTG. SNAP23 physically interacts with CFTR by binding to its amino-terminal tail and inhibits CFTR chloride currents by influencing channel gating.23 Using the Human Splicing Finder tool, we found a change in a potential branch point. The CV (scale 0–100) for the sequence with the C allele is 80.91, whereas the CV for the sequence with the T allele is 57.07, leading to an abolished splice site. Therefore, one could hypothesize that the T allele leads to aberrant splicing and less functional protein, resulting in a better clinical phenotype due to the loss of the inhibitory effect of SNAP23 on CFTR channel gating. In line with this hypothesis, genotype TT shows a worse intercept for all three lung function parameters but slower progression for FEF50 and FRCpleth than genotype CC. We hypothesize that at a young age with relatively good lung function, the unrestricted interaction and regulation of SNAP23 and CFTR in the CC genotype regulates the phenotype in a positive way. With increasing progression of multifactorial lung disease, however, a decreased interaction might influence the residual CFTR function positively. Finally, heterozygous patients seem to have the best balance between full and reduced interaction, resulting in the least progression of lung disease (Figure 1b).
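The size of the change in the branch point consensus value can be checked by hand from the two values quoted above. The short sketch below only performs this arithmetic; the variable names are illustrative, and it makes no claim about how Human Splicing Finder itself reports variation.

```python
# Consensus values (scale 0-100) for the branch point motif, taken from the text
cv_c_allele = 80.91
cv_t_allele = 57.07

# Absolute difference on the consensus value scale
absolute_drop = cv_c_allele - cv_t_allele

# Reduction expressed relative to the C allele value, in percent
relative_drop = absolute_drop / cv_c_allele * 100

print(f"absolute drop: {absolute_drop:.2f} points")
print(f"relative drop: {relative_drop:.1f}%")
```

The absolute difference of about 23.8 corresponds to the figure given in the Results, whereas relative to the C allele value the reduction is roughly 29.5%.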

There are three variants in KRT19, KRT19:c.90T>C, KRT19:c.179G>C and KRT19:c.471C>T, that are in strong linkage disequilibrium, resulting in the two main haplotypes TGC and CCT (Table 2). KRT19:c.90T>C is a synonymous SNP, KRT19:c.179G>C a missense SNP in exon 1 and KRT19:c.471C>T a synonymous SNP in exon 2. According to PolyPhen-2 and SIFT, the missense SNP does not lead to pathogenic alterations in the protein; we therefore looked for changes in ESE and ESS motifs that might influence splicing of KRT19 and thus lead to changes in protein level. In particular, the change from the C to the T allele in SNP KRT19:c.471C>T leads to a loss of affinity for the positive splicing factor YB-1 and therefore probably to changes in exon inclusion/skipping, as could be shown for the c.6792C>G mutation in the neurofibromatosis type 1 gene (Supplement Table S2).48 KRT19 was shown to be a modulator of cellular differentiation processes at the apical membrane of epithelial cells.35 Sun36 (presented at the 23rd Annual North American Cystic Fibrosis Conference) illustrated that KRT19 affects CFTR function by stabilizing it at the plasma membrane and is able to increase its plasma membrane density by inhibiting CFTR endocytosis. It seems that homozygosity for haplotype TGC has a negative effect on protein levels and therefore leads to worse progression of sReff (Figure 1c).

In summary, regarding the interaction of these SNPs with lung function, it can be concluded that the C allele in PPP2R4:c.-185A>C may exert a certain stabilization of the CFTR moved to the cell membrane, an effect that is expressed by a significantly milder progression of airway mechanics. Heterozygosity for SNAP23:c.267-9T>C may help to prevent the development of pulmonary hyperinflation and hence small airway dysfunction, an effect that is also shown for PPP2R1A:c.*465T>A genotype AA. On the other hand, it seems that homozygosity for the wild-type allele of the three SNPs in KRT19, which modulates CFTR maturation, may lead to a faster development of pulmonary hyperinflation.

The results of our study show, in particular, how important it is to assess genotype–phenotype associations by a whole set of lung function parameters, and not only by spirometry, and to consider longitudinal data, because the SNPs might influence disease severity in different ways at different time points. We know that, owing to the small patient cohort, certain effects are probably under- or overestimated in our study; further studies are therefore needed to evaluate these first promising results. We intend to confirm our results in a second, larger cohort, but so far we have not been able to find a CF center with comparable longitudinal lung function data. Currently, we are evaluating our candidate genes functionally by searching for differences in expression by comparative quantitative proteomics, where we have already found KRT19 to be upregulated twofold in an F508del homozygous compared with a wild-type bronchial cell line (publication in preparation).