
Detection of Cystic Fibrosis Serological Biomarkers Using a T7 Phage Display Library

Scientific Reports volume 7, Article number: 17745 (2017)


Cystic fibrosis (CF) is an autosomal recessive disorder affecting the cystic fibrosis transmembrane conductance regulator (CFTR). CF is characterized by repeated lung infections leading to respiratory failure. Using a high-throughput method, we developed a T7 phage display cDNA library derived from mRNA isolated from bronchoalveolar lavage (BAL) cells and leukocytes of sarcoidosis patients. This library was biopanned to obtain 1070 potential antigens. A microarray platform was constructed and immunoscreened with sera from healthy (n = 49), lung cancer (LC) (n = 31) and CF (n = 31) subjects. We built 1,000 naïve Bayes models on the training sets. We selected the top 20 most frequently significant clones, ranked by Student's t-test, that discriminated CF from healthy controls and LC at a false discovery rate (FDR) < 0.01. The performance of the models was validated on an independent validation set. The mean area under the receiver operating characteristic (ROC) curve for the classifiers was 0.973, with a sensitivity of 0.999 and a specificity of 0.959. Finally, we identified CF specific clones that correlate highly with sweat chloride test, BMI, and FEV1% predicted values. For the first time, we show that CF specific serological biomarkers can be identified with high accuracy through immunoscreening of a T7 phage display library, which may have utility in the development of molecular therapy.


There is a tremendous need for reliable serum-based biomarkers in various diseases, including proliferative disorders such as cancer, inflammatory diseases and infections, as well as genetic disorders such as cystic fibrosis (CF).

Cystic fibrosis is an autosomal recessive disease caused by mutations in the gene encoding the cystic fibrosis transmembrane conductance regulator (CFTR)1. Currently, more than 1300 mutations in the CFTR gene are known to cause the CF phenotype. The CF phenotype is characterized by chronic bacterial airway infections, neutrophilic inflammation with mucus in airways, progressive bronchiectasis and advanced cystic fibrosis lung disease. Mutations in the CFTR gene affect the epithelial innate immune function in the lungs, resulting in exaggerated and ineffective airway inflammation that fails to eradicate pulmonary pathogens2. Bacterial infections in CF are characterized by organisms that have substantial genetic flexibility to evade phagocytic clearance and develop resistance to multiple antibiotics2. Repeated or chronic microbial infections are thought to be the major contributor to excessive inflammation leading to CF lung damage. In addition to chronic lung infections, CF subjects may exhibit exocrine pancreatic insufficiency, diabetes mellitus, and sexual organ dysfunction.

Circulating autoantibodies and autoantigens in CF sera have been widely reported, yet their significance is unknown3,4,5. Various proteins and protein degradation products have been explored as candidate biomarkers for clinical outcome, such as neutrophil elastase, IL-8 6, and degradation products of lung surfactant protein SP-A7,8,9,10. A variety of proteomic approaches have exploited antigenic biomarkers that could provide candidates for the diagnosis of infection, prognostic indicators or vaccine development. Pedersen et al. used antibodies from CF patients to probe a protein array of body fluids prepared by two-dimensional gel electrophoresis for antigenic biomarker detection in Pseudomonas aeruginosa 5. Others identified the outer membrane protein OprL as a seromarker for the initial diagnosis of Pseudomonas aeruginosa infection in CF patients11.

Recently, we developed a heterologous cDNA library derived from bronchoalveolar lavage (BAL) cells and total white blood cells (WBC) from sarcoidosis patients and combined it with cDNA libraries from cultured human monocytes and embryonic lung fibroblasts to build a complex sarcoidosis library (CSL)12,13. Because the CSL represents a segment of the human lung microbiome, we hypothesized that it contains potential antigens relevant to CF. To test this, we immunoscreened our microarray platform with sera from healthy controls, CF and lung cancer patients, using the power of antibody recognition present in human sera to discover potential serological biomarkers in CF.


Complex sarcoidosis library detects unique antigens in the CF sera

A panel of potential antigens was randomly selected from two highly enriched pools of T7 phage cDNA libraries through biopanning of the CSL library12. A microarray platform was constructed and immunoscreened with 111 sera (49 healthy controls, 31 with CF and 31 with adenocarcinoma of the lungs (LC)). The demographics of the study subjects are shown in Table 1. Among the CF patients, 15 (48%) were genotyped as F508del homozygotes, 9 (29%) were heterozygotes for F508del, and 7 (23%) had various mutations such as G542X or 2789+5G>A/S489X and others (Table 1). Following immunoreaction, the microarray data were pre-processed and then analyzed. We applied a Student's t-test on 1,000 training sets (FDR < 0.01) comparing CF with healthy control samples. A total of 599 clones appeared significant at least once. We calculated the frequency of each significant clone and ranked the top 20 clones according to their significance and frequency. Furthermore, we performed an unsupervised PCA for all 1070 clones with data from 111 study subject sera. As shown in Fig. 1a, several LC and healthy controls clustered together with the CF samples. To investigate whether the identified 20 highly significant CF clones can improve class separation of CF samples from LC and healthy controls, we constructed a PCA plot using only those clones (Fig. 1c). Using only the 20 highly significant CF clones improved the separation of CF samples from LC and healthy controls, with 49% of the variance explained along PC1.
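To make the "variance explained along PC1" figure concrete, the following is a minimal sketch of how that fraction can be computed from a samples-by-clones intensity matrix. It is not the authors' code: the function name, the power-iteration approach and the toy data are assumptions chosen to keep the example dependency-free, and a real analysis of 1070 clones would normally use a numerical library.

```python
import math

def explained_variance_pc1(rows, iterations=200):
    """Fraction of total variance captured by the first principal component.

    rows: list of samples, each a list of clone intensities (equal lengths).
    Power iteration on the covariance matrix; illustrative, not optimized.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total = sum(cov[i][i] for i in range(p))
    if total == 0:
        return 0.0
    # Non-uniform start vector; a start exactly orthogonal to PC1 would fail.
    v = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the top eigenvalue.
    top = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return top / total

# Toy data: two perfectly collinear "clones", so PC1 carries all the variance.
print(explained_variance_pc1([[1, 2], [2, 4], [3, 6]]))
```

A value near 0.49, as reported for the 20-clone panel, would mean that roughly half of the between-sample variation lies along a single axis.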

Table 1 Subjects demographics.
Figure 1

PCA and hierarchical clustering. (a) PCA score plots along PC1 and PC2 generated with 1070 clones for three groups: 1) healthy control samples (yellow circle), 2) CF samples (blue triangle) and 3) LC samples (green square). PC1 explained only 0.18 of the variance and PC2 only 0.12. (b) Hierarchical clustering applied to the healthy controls (black labels), CF patients (red labels) and LC (blue labels) with 1070 clones. (c) PCA score plots along PC1 and PC2 when applied to the 20 highly significant CF clones. PC1 explained 0.49 of the variance, whereas PC2 explained 0.09. As shown, the CF samples are well separated from the healthy controls and LC samples. (d) Hierarchical clustering using only the 20 highly significant CF clones. The green cluster includes LC and healthy control samples (no CF samples); the magenta cluster includes all the CF samples, a few healthy controls and two LC samples. This figure demonstrates better clustering with the 20 highly significant CF clones (panels c and d) than with all clones (panels a and b).

Next, we performed unsupervised hierarchical clustering with all 1070 clones on 111 samples. We observed that the magenta cluster has a mix of samples and lacks specific sub-clusters of CF samples (Fig. 1b). In contrast, when the clustering algorithm was performed using the 20 highly significant CF clones on all samples, we observed a distinct hierarchical linkage, clearly demarcating CF samples from others (healthy controls and LC) (Fig. 1d). Distinct expression features of 20 highly significant CF clones among study subjects are highlighted in a heatmap plot (Fig. 2).

Figure 2

Heatmap generated based on the 20 highly significant CF clones from the data of 111 study subjects (49 healthy controls, 31 with CF and 31 with LC). Each row represents a clone, while each column represents a study subject. As shown in Fig. 2, most CF samples clustered to the left side of the heat map plot, while the LC samples and healthy controls clustered to the right side of the plot indicating different expression profiles.

Next, we applied the classifier model and calculated the AUC values on accumulating numbers of clones (see Materials and Methods) on the test and validation sets. Figure 3a shows the AUC values for the test set. The lowest average AUC value for the test set was 0.956. Figure 3b graphically represents the performance of the classifier model when applied to the validation set, where the lowest average AUC value was 0.926. These results indicate that the classification model based on the accumulating number of significant clones performs very well on both the test and the validation sets. Finally, to assess whether the identified highly significant CF clones provide sound classification performance, we applied the naïve Bayes classification algorithm with the highly significant CF clones to distinguish CF samples from healthy controls and LC samples. At the optimal threshold (highest true positivity with lowest false positivity for each of the 1000 runs), we could reliably distinguish CF from healthy controls and LC samples with a mean specificity of 0.959 (95% CI, 0.11–0.15) and a mean sensitivity of 0.999 (95% CI, 0.18–0.21). The mean AUC under the ROC for the classifier was 0.973 (95% CI, 0.07–0.094) (Fig. 3c).
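For readers unfamiliar with the classifier, here is a minimal sketch of a naïve Bayes model for continuous clone intensities. The text does not state which likelihood was used, so the Gaussian form, the function names and the toy data below are all assumptions made for illustration.

```python
import math

def fit_gaussian_nb(samples, labels):
    """Estimate, per class, the prior plus each feature's mean and variance."""
    model = {}
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        n, p = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(p)]
        # A small floor keeps the variance positive for constant features.
        variances = [max(sum((r[j] - means[j]) ** 2 for r in rows) / n, 1e-9)
                     for j in range(p)]
        model[cls] = (n / len(samples), means, variances)
    return model

def log_posterior(model, x):
    """Unnormalized log posterior of each class, treating features as independent."""
    scores = {}
    for cls, (prior, means, variances) in model.items():
        s = math.log(prior)
        for xi, m, v in zip(x, means, variances):
            s += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        scores[cls] = s
    return scores

def predict(model, x):
    """Return the class with the highest log posterior."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)

# Hypothetical two-clone intensities for three CF and three non-CF sera.
X = [[2.0, 1.0], [2.2, 0.8], [1.8, 1.1], [0.1, -0.2], [-0.1, 0.1], [0.0, 0.0]]
y = ["CF", "CF", "CF", "other", "other", "other"]
model = fit_gaussian_nb(X, y)
print(predict(model, [2.1, 0.9]))
```

The difference between the two class log posteriors can serve as a continuous score when building a ROC curve.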

Figure 3

Classification performance of the naïve Bayes classifier. The classifier distinguishes CF from LC and healthy control samples. (a) Performance of the classifier on the test sets. Box plots indicate the AUC values (y-axis) when the classifier model was applied to the 1000 test sets. The x-axis shows accumulating sets of clones. The accumulation starts with the most frequent clone, and one clone is then added at a time up to 100 clones. (b) Performance of the classifier models on the validation set. As indicated, the classifier models built using the significant clones show high AUC values on the test sets as well as on the completely independent validation set. (c) The ROCs generated from the average of the 1000 runs of the classifier models when applied to the validation set (randomly selected healthy controls, CF and LC) using the 20 highly significant CF clones. The box plot shows the distribution of the sensitivities. The ROC curve demonstrates excellent classification performance, with an average AUC of 0.973 (95% CI: 0.07–0.094), a sensitivity of 0.99 (95% CI: 0.18–0.21) and a specificity of 0.959 (95% CI: 0.11–0.15). These results indicate excellent performance of the naïve Bayes classifier on the 20 highly significant CF clones.
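The AUC, sensitivity and specificity quoted above can be illustrated with a short sketch. The AUC is computed here as the probability that a CF sample scores higher than a non-CF sample. The "optimal threshold" is taken to be the cut-off maximizing sensitivity + specificity (Youden's index), which is an assumed reading of "highest true positivity with lowest false positivity" rather than a confirmed detail of the study. Function names and scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing sensitivity + specificity.

    A sample is called positive when its score is >= threshold.
    On ties, the lowest such threshold is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (t, sens, spec)
    return best

# Hypothetical classifier scores; label 1 = CF, 0 = healthy control or LC.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
print(best_threshold(scores, labels))
```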

Characterization of significant CF clones

Based on the results of the training and validation sets, we characterized the 20 best-performing clones through sequencing and identified which clones can predict sweat chloride test results, FEV1% predicted and body mass index (BMI). After obtaining the sequences of the clones, the Expasy program was used to translate the cDNA sequences to protein sequences12. Protein BLAST was applied to identify the highest homology to the identified peptides. Additionally, we compared these results with the corresponding nucleotide sequences using nucleotide BLAST and determined the predicted amino acids in frame with the phage T7 10B gene capsid protein. Among the 20 clones, four CF reactive antigens comprise relatively large peptides, while 16 CF antigens are coded by inserted gene fragments leading to out-of-frame peptides, thereby meeting the definition of mimotopes14 (Table 2). As CF sera reacted to these out-of-frame peptides, it is likely that these clones represent CF antigens that are produced as a result of altered reading frames or alternative splicing, as shown in previous studies14,15. Full-length peptides and genes of the top 20 CF clones are shown in Supplementary Table 1. Table 2 shows the 14 most significant CF antigens, gene names, sensitivity, specificity and FDR adjusted p-value. Figure 4a and b show the ROC curves for the 14 CF antigens. Finally, we sought to determine whether any of the biomarkers correlate with sweat chloride test, BMI and FEV1% predicted values. Sweat chloride test, PFT and BMI values for CF subjects are shown in Table 1. The sweat chloride test is commonly used as a screening tool for CF diagnosis16. We found the highest Spearman correlation (r = −0.54) between sweat chloride values and the clone p51_BP3_113 (GEM_5047) (Fig. 5a). By combining this clone with four additional clones, a higher correlation was reached (r = −0.72) (Fig. 5b). 
BMI is an important clinical measure among CF patients to predict exacerbation and decline of lung function testing17. We found the highest Spearman correlation (r = −0.31) between BMI and the P51_BP3_47 clone (dnaJ homolog) (Fig. 5c). By combining this clone with 4 other clones, a higher correlation with BMI was reached (r = −0.58) (Fig. 5d). Additionally, we found the highest correlation (r = −0.42) between FEV1% predicted and clone P197_BP4_926 (Fig. 5e). The correlation value improved (r = −0.6) once we added 4 other clones (Fig. 5f). Table 3 shows the correlation between sweat chloride values, BMI and FEV1% predicted values and significant clones. Seven out of 16 identified clones overlapped with the highly specific and sensitive CF clones shown in Table 2. In addition, we identified 6 other clones with significant correlation with sweat chloride test, BMI and FEV1% predicted values. Similar results were observed when we plotted other PFT values including FVC (data not shown).
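The Spearman coefficients reported above are Pearson correlations computed on ranks. The sketch below shows that calculation with tie handling. The function names and example values are hypothetical and do not reproduce study data.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

# Hypothetical clone intensities versus sweat chloride values for four patients.
print(spearman([1.2, 0.8, 0.5, 0.1], [60, 75, 90, 110]))
```

A negative coefficient, as in the reported results, means that higher antibody reactivity to the clone goes with lower clinical values.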

Table 2 Significant Cystic Fibrosis Clones.
Figure 4

Naïve Bayes classification performance for the top 14 clones. (a) ROCs for the top 6 significant clones that are increased (up-regulated) in CF sera compared to healthy control. (b) ROCs for the top 8 significant clones that are decreased (down-regulated) in CF compared to healthy controls. This figure demonstrates reasonable classification performance when the classification was applied just to one clone.

Figure 5

Pearson correlation of identified biomarkers with clinical values. Scatter plots depicted correlation of the sweat chloride values with one clone (a) and aggregated 5 clones (b). Scatter plots depicted the correlation for BMI predicted with one clone (c) and aggregated 5 clones (d). Scatter plots depicted correlation of FEV1% with one clone (e) and aggregated 5 clones (f). The correlation values and p values are shown in the top right of each plot. The names of the clones are shown at the bottom of each plot.

Table 3 Correlation of biomarkers with Sweat Chloride test, BMI and FEV1% predicted.


CF is characterized by a self-perpetuating cycle of airway obstruction, chronic bacterial infection, and vigorous inflammation that results in bronchiectasis, progressive obstructive lung disease, and marked shortening of life expectancy. Despite having identical cystic fibrosis transmembrane conductance regulator genotypes, individuals with F508del homozygous CF demonstrate significant variability in severity of pulmonary disease and infection. Non-invasive serological biomarkers that help monitor disease progression or evaluate response to therapy would be extremely valuable. Several groups have attempted to identify specific biomarkers to predict inflammation in CF using various biofluids such as sputum, BAL and serum10,18. Most of these methods led to the discovery of a series of markers or expression signatures but failed to be useful in clinical practice18. In view of this background, we applied a novel high-throughput technology to address this gap by constructing phage-protein microarrays in which peptides were derived from a unique sarcoidosis cDNA library and expressed as phage fusion proteins. Through immunoscreening and rigorous statistical analysis, we identified 20 highly significant CF clones as biomarkers that are able to discriminate between CF and healthy controls as well as lung cancer sera. One important issue in biomarker discovery is the validation of biomarkers and sample selection. To address this issue, we randomly assigned samples into 1000 training sets instead of using one training set. Then, we compared the healthy controls and CF samples for each pair of such random sets. The ranking of the top 20 clones was based on the significance and frequency of each clone (how many times each clone appears significant at FDR < 0.01).

Environmental stresses including cigarette smoking, hypoxia, and chronic inflammation have also been implicated in reduced CFTR function19,20. Additionally, subjects with smoking-related chronic obstructive pulmonary disease (COPD) can develop a similar clinical phenotype with recurrent respiratory infections, mucus inspissation and airway obstruction that is attributed to acquired CFTR deficiency21. In unsupervised HC, we observed a few false-positive classifications that might have been due to the selection of control groups (lung cancer and healthy controls), who had a significant smoking history. Interestingly, these subjects had more than 50 pack-years of smoking history. Because it is known that cigarette smoking affects bacterial clearance22, we speculate that long-term cigarette smoking in these subjects might have led to immunoreactivity on our microarray platform similar to that of CF individuals. Therefore, it is probable that if we chose a younger non-smoker group as control subjects, we would have no false-positive classifications.

Furthermore, we sequenced the top 20 discriminating antigens for CF and identified homologies in a public database. The length of the identified peptides for CF antigens ranged from 8 to 213 amino acids (AA). Among the 20 CF specific phage peptides, five out-of-frame peptides and one epitope were increased in sera of CF patients. One epitope (HLA-DR) was randomly selected three times (P51BP3_296, P51BP4_704 and P197_BP4_925), suggesting the importance of HLA-DR in the pathology of CF. Recently, studies have demonstrated that the transcript levels of HLA-DR and HLA-DQ are reduced in CF patients23. Another epitope was DnaJ homologue (Hdj)-1/heat shock protein (Hsp) 40, a protein chaperone, which along with its co-chaperone Hsp70 regulates protein folding and trafficking in the endoplasmic reticulum (ER) and facilitates degradation of misfolded proteins24. It has been shown that Hsp40 and Hsp70 facilitate CFTR assembly25. We found that the DnaJ homologue was increased in sera of CF patients and had a negative correlation with the BMI of CF subjects. Another epitope (Thioredoxin like protein) was decreased in CF patients. Studies have shown that excessive neutrophil elastase activity in the airways of pediatric and adult CF patients resulted in lung damage26,27,28. Adding exogenous thioredoxin or dihydrolipoic acid to the sputum of CF patients reduced the neutrophil elastase activity29. Another in-frame epitope with relevance to FEV1% predicted was Thymosin β-4 (TMSB4X). In vitro addition of Thymosin β-4 to the sputum of CF patients decreases the sputum cohesivity by depolymerizing actin30.

Among the 20 sequenced CF specific phage peptides we identified 16 antigens with relatively short out-of-frame peptides meeting the criteria for mimotopes (mimetic sequences of a true epitope)14. Although the significance of mimotopes is not clear, it has been shown that some out-of-frame peptides can be immunogenic and can activate MHC class I molecules31. Because mimotopes have shorter peptide sequences, they may have homology with diverse proteins. Prior studies using similar techniques have identified out-of-frame peptides14,15,32. We identified two sequenced peptides (narX and barA_4) with similarity to histidine kinases, which belong to a large family of membrane-spanning proteins found in many prokaryotes and some eukaryotes. This gene controls bacterial virulence, growth and biofilm formation in CF patients33. Similarly, the IgG response to the Burkholderia cepacia 80-kDa outer membrane protein has been shown to be significantly higher in patients with CF34. Interestingly, when we explored the correlation of biomarkers with sweat chloride values, we found a good correlation with the outer membrane porin35. Another significant biomarker detected is beta-lactamase. Several studies have shown an association between the development of resistance to beta-lactam antibiotics and high beta-lactamase production in CF patients36.

Among 16 mimotopes, we found eight with decreased expression in CF patients (Table 2). Interestingly, one out of eight CF antigens with higher specificity and sensitivity (P197_BP4_830), belongs to repressor transcriptional regulators37,38. One in-vitro study showed that Pseudomonas aeruginosa toxin regulates TetR family transcriptional regulator and hence regulates CFTR expression through transcriptional repression37. Interestingly, TetR is involved in the regulation of antibiotic resistance and controls the expression of membrane-associated proteins that are involved in antibiotic resistance39. Through immunoscreening, we identified decreased NADPH dehydrogenase subunit I. Similarly, studies have shown that mitochondrial complex I activity is reduced in cells with impaired cystic fibrosis transmembrane conductance regulator40. CFTR chloride channels belong to the superfamily of ABC transporter ATPases41. Interestingly, we identified reduced ABC transporter substrate binding protein expression in CF patients. The ABC transporters are widespread in prokaryotes and eukaryotes containing nucleotide-binding domains (NBD) and two transmembrane domains (TMDs). ATP hydrolysis on the NBD drives conformational changes in the TMD, resulting in alternating access from inside and outside of the cell for unidirectional transport across the lipid bilayer42.

To our knowledge, no previous study has used phage display technology to detect CF serum biomarkers. We detected novel antigens for CF using a heterologous library derived from sarcoidosis subjects. Lungs are highly exposed to numerous bacteria and our library is predominantly derived from sarcoidosis BAL cells and WBCs containing diverse immune cells, including macrophages that were exposed to various pathogens. Hence, we postulate that the CSL represents a segment of the lung microbiome containing diverse antigens including CF specific antigens, sarcoidosis and TB specific antigens12,13. Phage display technology and immunoscreening are useful not only for identifying diagnostic biomarkers, but may also enable us to develop a novel targeted therapy utilizing the peptide sequences (mimotopes) as vehicles to deliver specific drugs. For instance, among the highly significant clones, we found a peptide sequence homologous to histidine kinase (narX) with high specificity and sensitivity. Bacterial histidine kinases are promising targets for the development of antibacterial therapy. Efforts are currently being made to identify specific compounds that inhibit histidine kinase as antibacterial therapy43. Additionally, this technology might enable us to discover unknown epitopes targeting specific bacterial antigens leading to immunogenicity and antibody production in CF subjects, as well as providing us with a better understanding of host immune defenses in CF subjects. Furthermore, this microarray platform can be hybridized to detect IgA in sera or saliva of CF patients, which may have clinical value.

In summary, we have developed a novel T7 phage display library derived from BALs and leukocytes of patients with sarcoidosis that displays a significant segment of the potential antigens recognized by IgG antibodies in CF sera with high accuracy. Furthermore, we have identified a set of CF clones that highly correlate with clinical measures such as sweat chloride values, BMI and FEV1. Microarray immunoscreening has value in clinical practice for antibody detection because it is non-invasive and requires a minimal amount of blood. The identified sequences can be used to develop peptide/protein-coated magnetic nanoparticles for clinical testing or for applications in drug delivery44. The present study describes a novel approach to identify CF biomarkers. Further studies with a larger cohort of patients and/or longitudinal studies are needed to investigate the role of these antigens in CF, their mechanism of action and their utility in drug design and monitoring of therapy.

Materials and Methods


All chemicals were purchased from Sigma-Aldrich (St. Louis, MO) unless specified otherwise. LeukoLOCK filters and RNAlater were purchased from Life Technologies (Grand Island, NY). The RNeasy Midi kit was obtained from Qiagen (Valencia, CA). The T7 mouse monoclonal antibody was purchased from Novagen (San Diego, CA). Alexa Fluor 647 goat anti-human IgG and Alexa Fluor goat anti-mouse IgG antibodies were purchased from Life Technologies (Grand Island, NY).

Patient selection

This study was approved by the institutional review board at Wayne State University, the Detroit Medical Center and Cystic Fibrosis Center. Sera were collected from 3 groups: 1) healthy volunteers; 2) confirmed CF subjects; and 3) subjects with adenocarcinoma of the lungs. All study subjects signed a written informed consent. All methods were performed in accordance with the human investigation guidelines and regulations of the IRB (protocol number = 055208MP4E) at Wayne State University.

Pulmonary function tests were performed following ATS guidelines in a licensed laboratory in all patients unless contraindicated45. All spirometric studies were performed using a calibrated pneumotachograph and lung volumes were measured in a whole-body plethysmograph (Jaeger Spirometry and SensorMedics Vmax 22, VIASYS Respiratory Care, Inc; Yorba Linda, CA, USA). All CF subjects were ambulatory patients. Sweat chloride test values were obtained from the medical records.

Serum collection

Using standardized phlebotomy procedures blood samples were collected and stored at −80 °C12.

Construction and Biopanning of T7 phage display cDNA libraries

T7 phage display libraries from BAL, WBC, EL-1 and MRC5 were made to generate a complex sarcoid library (CSL)12. Differential biopanning for negative selection was performed using sera from healthy controls to remove the non-specific IgG, and sarcoidosis sera for positive enrichment12.

Microarray construction and immunoscreening

Informative phage clones were randomly picked and amplified after four rounds of biopanning, and their lysates were arrayed in quintuplicate onto nitrocellulose FAST slides (Grace Biolabs, OR) using the ProSys 5510TL robot (Cartesian Technologies, CA). The nitrocellulose slides were hybridized with sera and processed as described previously12.

Sequencing of phage cDNA clones

Individual phage clones were PCR amplified using T7 phage forward primer 5′ GTTCTATCCGCAACGTTATGG 3′ and reverse primer 5′ GGAGGAAAGTCGTTTTTTGGGG 3′ and sequenced by Genewiz (South Plainfield, NJ), using T7 phage sequence primer TGCTAAGGACAACGTTATCGG.
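Whether a displayed peptide is in-frame or out-of-frame depends on how the sequenced insert is read relative to the capsid gene. The sketch below translates a DNA insert in each of the three forward reading frames using the standard genetic code. It stands in for the translation tool used in the study, and the function name and example sequence are hypothetical.

```python
# Standard genetic code, enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(dna, frame=0):
    """Translate from offset `frame` (0, 1 or 2) up to the first stop codon."""
    dna = dna.upper()
    peptide = []
    for pos in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X marks ambiguous bases
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)

# Hypothetical insert: the three frames yield three different peptides.
insert = "ATGGCCAAATAA"
for frame in range(3):
    print(frame, translate(insert, frame))
```

A shift of one or two bases changes every downstream codon, which is why short out-of-frame peptides can bear no resemblance to the protein encoded by the source gene.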

Data acquisition and pre-processing

Following the immunoreaction, the microarrays were scanned in an Axon Laboratories 4100 scanner (Palo Alto, CA) using 532 and 647 nm lasers to produce a red (Alexa Fluor 647) and green (Alexa Fluor 532) composite image. Cy5 (red dye) labeled antihuman antibody was used to detect IgGs in human serum that were reactive to peptide clones, and a Cy3 (green dye) labeled antibody was used to detect the phage capsid protein12. Using the ImaGene 6.0 (Biodiscovery) image analysis software, the binding intensity of each peptide with IgGs in sera was expressed as the log2 (red/green) fluorescent intensity ratio. These data were pre-processed using the limma package in the R language environment46,47, and the normexp method was applied to correct the background48. Within-array normalization was performed using the LOESS method48,49. The scale method was applied to normalize between arrays48,49. The fold change of a clone was calculated as its intensity ratio in CF samples divided by its intensity ratio in healthy control samples.
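The per-spot quantity described above is simple to state in code. The sketch below computes the log2 (red/green) ratio for a spot, summarizes replicate spots of one clone, and derives a fold change between groups. Summarizing replicates by their median and computing the fold change from mean log2 ratios are assumptions made for illustration; background correction and LOESS or scale normalization, which the study performed with limma, are not reproduced here.

```python
import math
from statistics import fmean, median

def log2_ratio(red, green):
    """log2 of serum IgG signal (red) over capsid signal (green) for one spot."""
    return math.log2(red / green)

def clone_signal(spots):
    """Median log2 ratio over replicate spots given as (red, green) pairs."""
    return median(log2_ratio(r, g) for r, g in spots)

def fold_change(cf_log2, control_log2):
    """Fold change of CF over controls, computed from mean log2 ratios."""
    return 2 ** (fmean(cf_log2) - fmean(control_log2))

# Hypothetical quintuplicate spots for one clone on one array.
spots = [(400, 100), (800, 100), (200, 100), (400, 100), (400, 100)]
print(clone_signal(spots))
print(fold_change([2.0, 3.0], [1.0, 2.0]))
```

Dividing by the green channel corrects for how much phage was deposited in each spot, so that differences reflect antibody binding rather than spotting variation.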

Statistical analyses

To detect frequently differentially expressed antigens for CF we applied a two-tailed t-test. To evaluate the significant CF antigens identified with the t-test, we applied principal component analysis (PCA), agglomerative hierarchical clustering (HC), heatmap, and a naïve Bayes classifier. To avoid over-fitting the classifiers, we randomly split the CF and healthy control samples into: i) training, ii) test, and iii) validation sets. Out of the 31 CF samples, 21 samples were randomly assigned into training (10 samples) and test (11 samples) sets. We repeated this random assignment 1000 times to generate 1000 training and test sets. The remaining 10 CF samples were used as an independent validation set. The 1000 training and test sets for the healthy controls were randomly selected from 33 out of 49 samples (16 training and 17 test). Therefore, the validation set for healthy controls comprised 16 samples. The 31 LC samples were randomly split into test (15 samples) and validation (16 samples) sets. For CF-specific clone selection, we applied a t-test between the 1000 CF training sets and the 1000 healthy control training sets. To correct for multiple comparisons, we applied the false discovery rate (FDR) algorithm with a threshold of 0.01 FDR50. The frequency of each significant clone (FDR < 0.01) across all 1000 runs was calculated, and the clones were sorted by their frequency of occurrence. The top 20 clones were considered highly significant CF clones. We built a naïve Bayes classifier on each of the 1000 training sets and tested the classifier model on the 1000 test sets. Finally, the classifier model was validated on a completely independent validation set. The range of clones starts with the most frequent clone, followed by adding one clone at a time. We constructed the models on the training sets and applied them to the test sets as well as the validation set. 
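The resampling and clone-ranking steps above can be sketched as follows. The block shows a random training/test split, a Benjamini-Hochberg adjustment (assumed here to be the FDR algorithm meant in the text), and the count of how often each clone passes FDR < 0.01 across runs. All function names are hypothetical, and the per-run t-test p-values are taken as given inputs rather than computed.

```python
import random
from collections import Counter

def split_indices(n_samples, n_train, n_test, rng):
    """Shuffle sample indices and cut them into training and test sets."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_test]

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significance_frequency(pvalue_runs, fdr=0.01):
    """Count, per clone index, the runs whose adjusted p-value is below fdr."""
    counts = Counter()
    for pvalues in pvalue_runs:
        for clone, q in enumerate(benjamini_hochberg(pvalues)):
            if q < fdr:
                counts[clone] += 1
    return counts

# 21 CF samples split into 10 training and 11 test samples, as in the text.
train, test = split_indices(21, 10, 11, random.Random(0))
# Hypothetical p-values for three clones in two resampling runs.
runs = [[0.0001, 0.5, 0.9], [0.0002, 0.001, 0.8]]
print(significance_frequency(runs).most_common())
```

Ranking by frequency across many random splits favors clones whose significance does not depend on which particular samples happened to fall into the training set.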
Finally, we determined correlation of biomarkers with body mass index (BMI) and % predicted forced expiratory volume (FEV1) of CF patients. We calculated combinations of 5 clones from the top set of markers. For each combination, the aggregated vector was calculated from the mean of 5 clones and Pearson correlation between the aggregated vector and BMI and FEV1% predicted was determined.



References

  1. Stoltz, D. A., Meyerholz, D. K. & Welsh, M. J. Origins of cystic fibrosis lung disease. N Engl J Med 372, 351–362 (2015).
  2. Cohen, T. S. & Prince, A. Cystic fibrosis: a mucosal immunodeficiency syndrome. Nature medicine 18, 509–519 (2012).
  3. Carter, C. J. Pathogen and autoantigen homologous regions within the cystic fibrosis transmembrane conductance regulator (CFTR) protein suggest an autoimmune treatable component of cystic fibrosis. FEMS immunology and medical microbiology 62, 197–214 (2011).
  4. Budding, K., van de Graaf, E. A., Hoefnagel, T., Hack, C. E. & Otten, H. G. Anti-BPIFA1/SPLUNC1: a new autoantibody prevalent in patients with endstage cystic fibrosis. Journal of cystic fibrosis: official journal of the European Cystic Fibrosis Society 13, 281–288 (2014).
  5. Pedersen, S. K. et al. An immunoproteomic approach for identification of clinical biomarkers for monitoring disease: application to cystic fibrosis. Molecular & cellular proteomics: MCP 4, 1052–1060 (2005).
  6. Mayer-Hamblett, N. et al. Association between pulmonary function and sputum biomarkers in cystic fibrosis. American journal of respiratory and critical care medicine 175, 822–828 (2007).
  7. von Bredow, C., Birrer, P. & Griese, M. Surfactant protein A and other bronchoalveolar lavage fluid proteins are altered in cystic fibrosis. Eur Respir J 17, 716–722 (2001).
  8. Downey, D. G. et al. The relationship of clinical and inflammatory markers to outcome in stable patients with cystic fibrosis. Pediatric pulmonology 42, 216–220 (2007).
  9. Rowe, S. M. et al. Potential role of high-mobility group box 1 in cystic fibrosis airway disease. American journal of respiratory and critical care medicine 178, 822–831 (2008).
  10. Sagel, S. D., Chmiel, J. F. & Konstan, M. W. Sputum biomarkers of inflammation in cystic fibrosis lung disease. Proc Am Thorac Soc 4, 406–417 (2007).
  11. Rao, A. R., Laxova, A., Farrell, P. M. & Barbieri, J. T. Proteomic identification of OprL as a seromarker for initial diagnosis of Pseudomonas aeruginosa infection of patients with cystic fibrosis. Journal of clinical microbiology 47, 2483–2488 (2009).
  12. Talwar, H. et al. Development of a T7 Phage Display Library to Detect Sarcoidosis and Tuberculosis by a Panel of Novel Antigens. EBioMedicine 2, 341–350 (2015).
  13. Talwar, H., Talreja, J. & Samavati, L. T7 Phage Display Library a Promising Strategy to Detect Tuberculosis Specific Biomarkers. Mycobacterial diseases: tuberculosis & leprosy 6 (2016).
  14. Wang, X. et al. Autoantibody signatures in prostate cancer. N Engl J Med 353, 1224–1235 (2005).
  15. Lin, H. S. et al. Autoantibody approach for serum-based detection of head and neck cancer. Cancer epidemiology, biomarkers & prevention: a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology 16, 2396–2405 (2007).
  16. Gibson, L. E. & Cooke, R. E. A test for concentration of electrolytes in sweat in cystic fibrosis of the pancreas utilizing pilocarpine by iontophoresis. Pediatrics 23, 545–549 (1959).
  17. Sheikh, S., Zemel, B. S., Stallings, V. A., Rubenstein, R. C. & Kelly, A. Body composition and pulmonary function in cystic fibrosis. Frontiers in pediatrics 2, 33 (2014).
  18. Srivastava, M. et al. Serum proteomic signature for cystic fibrosis using an antibody microarray platform. Molecular genetics and metabolism 87, 303–310 (2006).
  19. Rab, A. et al. Cigarette smoke and CFTR: implications in the pathogenesis of COPD. American Journal of Physiology-Lung Cellular and Molecular Physiology 305, L530–L541 (2013).
  20. Cantin, A. M. Cystic Fibrosis Transmembrane Conductance Regulator. Implications in Cystic Fibrosis and Chronic Obstructive Pulmonary Disease. Annals of the American Thoracic Society 13(Suppl 2), S150–155 (2016).
  21. Solomon, G. M., Raju, S. V., Dransfield, M. T. & Rowe, S. M. Therapeutic Approaches to Acquired Cystic Fibrosis Transmembrane Conductance Regulator Dysfunction in Chronic Bronchitis. Annals of the American Thoracic Society 13, S169–S176 (2016).
  22. Lovewell, R. R., Patankar, Y. R. & Berwin, B. Mechanisms of phagocytosis and host clearance of Pseudomonas aeruginosa. American journal of physiology. Lung cellular and molecular physiology 306, L591–603 (2014).
  23. Hofer, T. P. et al. Decreased expression of HLA-DQ and HLA-DR on cells of the monocytic lineage in cystic fibrosis. Journal of molecular medicine 92, 1293–1304 (2014).
  24. Stolz, A. & Wolf, D. H. Endoplasmic reticulum associated protein degradation: a chaperone assisted journey to hell. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research 1803, 694–705 (2010).
  25. Meacham, G. C. et al. The Hdj-2/Hsc70 chaperone pair facilitates early steps in CFTR biogenesis. The EMBO Journal 18, 1492–1505 (1999).
  26. Sly, P. D. et al. Risk factors for bronchiectasis in children with cystic fibrosis. New England Journal of Medicine 368, 1963–1970 (2013).
  27. DeBoer, E. M. et al. Automated CT scan scores of bronchiectasis and air trapping in cystic fibrosis. CHEST Journal 145, 593–603 (2014).
  28. Liu, H., Lazarus, S. C., Caughey, G. H. & Fahy, J. V. Neutrophil elastase and elastase-rich cystic fibrosis sputum degranulate human eosinophils in vitro. American Journal of Physiology-Lung Cellular and Molecular Physiology 276, L28–L34 (1999).
  29. Lee, R. L. et al. Thioredoxin and dihydrolipoic acid inhibit elastase activity in cystic fibrosis sputum. American Journal of Physiology-Lung Cellular and Molecular Physiology 289, L875–L882 (2005).
  30. Rubin, B. K., Kater, A. P. & Goldstein, A. L. Thymosin β4 sequesters actin in cystic fibrosis sputum and decreases sputum cohesivity in vitro. CHEST Journal 130, 1433–1440 (2006).
  31. Schirmbeck, R. et al. Translation from cryptic reading frames of DNA vaccines generates an extended repertoire of immunogenic, MHC class I-restricted epitopes. J Immunol 174, 4647–4656 (2005).
  32. Chatterjee, M. et al. Diagnostic markers of ovarian cancer by high-throughput antigen cloning and detection on arrays. Cancer research 66, 1181–1190 (2006).
  33. Worthington, R. J., Richards, J. J. & Melander, C. Small molecule control of bacterial biofilms. Organic & biomolecular chemistry 10, 7457–7474 (2012).
  34. Lacy, D. E. et al. Serum IgG response to an outer membrane porin protein of Burkholderia cepacia in patients with cystic fibrosis. FEMS immunology and medical microbiology 17, 87–94 (1997).
  35. Aronoff, S. C. Outer membrane permeability in Pseudomonas cepacia: diminished porin content in a beta-lactam-resistant mutant and in resistant cystic fibrosis isolates. Antimicrobial agents and chemotherapy 32, 1636–1639 (1988).
  36. Ciofu, O. Pseudomonas aeruginosa chromosomal beta-lactamase in patients with cystic fibrosis and chronic lung infection. Mechanism of antibiotic resistance and target of the humoral immune response. APMIS. Supplementum, 1–47 (2003).
  37. MacEachran, D. P., Stanton, B. A. & O’Toole, G. A. Cif is negatively regulated by the TetR family repressor CifR. Infection and immunity 76, 3197–3206 (2008).
  38. Mahenthiralingam, E., Simpson, D. A. & Speert, D. P. Identification and characterization of a novel DNA marker associated with epidemic Burkholderia cepacia strains recovered from patients with cystic fibrosis. Journal of clinical microbiology 35, 808–816 (1997).
  39. Cuthbertson, L. & Nodwell, J. R. The TetR family of regulators. Microbiology and molecular biology reviews: MMBR 77, 440–475 (2013).
  40. Valdivieso, A. G. et al. The mitochondrial complex I activity is reduced in cells with impaired cystic fibrosis transmembrane conductance regulator (CFTR) function. PLoS One 7, e48059 (2012).
  41. Schneider, E. & Hunke, S. ATP-binding-cassette (ABC) transport systems: functional and structural aspects of the ATP-hydrolyzing subunits/domains. FEMS microbiology reviews 22, 1–20 (1998).
  42. Gadsby, D. C., Vergani, P. & Csanády, L. The ABC protein turned chloride channel whose failure causes cystic fibrosis. Nature 440, 477–483 (2006).
  43. Bem, A. E. et al. Bacterial histidine kinases as novel antibacterial drug targets. ACS chemical biology 10, 213–224 (2014).
  44. Rana, S., Bajaj, A., Mout, R. & Rotello, V. M. Monolayer coated gold nanoparticles for delivery applications. Advanced drug delivery reviews 64, 200–216 (2012).
  45. Raghu, G. et al. An official ATS/ERS/JRS/ALAT statement: idiopathic pulmonary fibrosis: evidence-based guidelines for diagnosis and management. American journal of respiratory and critical care medicine 183, 788–824 (2011).
  46. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47 (2015).
  47. R: A language and environment for statistical computing (R Foundation for Statistical Computing, Vienna, Austria, 2015).
  48. Ritchie, M. E. et al. A comparison of background correction methods for two-colour microarrays. Bioinformatics 23, 2700–2707 (2007).
  49. Yang, Y. H. et al. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic acids research 30, e15 (2002).
  50. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289–300 (1995).



Acknowledgements

We thank all patients and healthy volunteers for their participation in this study. This project was funded by NIH grant R21HL104481-01A1 awarded to L.S. and with the support of the Department of Medicine, Wayne State University. We would like to thank Drs. Michael A. Tainsky and Nancy Levin for providing lung cancer samples as well as healthy controls and for their invaluable assistance in completing this study. This work has been partially supported by the following grants: NIH R01 DK089167, NIH STTR R42GM087013, NSF DBI-0965741 (to Sorin Draghici), and by the Robert J. Sokol Endowment in Systems Biology.

Author information


  1. Department of Medicine, Division of Pulmonary, Critical Care and Sleep Medicine, Wayne State University School of Medicine and Detroit Medical Center, Detroit, MI, 48201, USA

    • Harvinder Talwar, Andreea Geamanu, Dana Kissner & Lobelia Samavati
  2. Department of Computer Science, Wayne State University, 540 E, Canfield, Detroit, MI, 48201, USA

    • Samer Najeeb Hanoudi & Sorin Draghici
  3. Department of Obstetrics and Gynecology, Wayne State University, 540 E, Canfield, Detroit, MI, 48201, USA

    • Sorin Draghici
  4. Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, 540 E, Canfield, Detroit, MI, 48201, USA

    • Lobelia Samavati




Contributions

Harvinder Talwar contributed to the sample processing and conducted the analysis. Samer Hanoudi performed the preprocessing and processing of the data and the statistical analysis. Andreea Geamanu enrolled the patients, obtained consents and collected the clinical data. Sorin Draghici supervised the data analysis and contributed to the writing of the manuscript. Dana Kissner provided access to patients with CF. Lobelia Samavati conceived and designed the study, participated in all areas of the research, including patient selection, and oversaw patient enrollment, data analysis and writing of the manuscript.

Competing Interests

The authors declare that they have no competing interests.

Corresponding author

Correspondence to Lobelia Samavati.
