Background & Summary

In this study, we compared by next generation RNA-sequencing (RNA-seq) the transcriptomic profile of human airway epithelial cells from cystic fibrosis (CF) patients and healthy donors (NCF). F508del, the most common variant of the CF transmembrane conductance regulator (CFTR) gene, is associated with a severe clinical phenotype that leads to chronic inflammation and infection of the airways by opportunistic pathogens, including Pseudomonas aeruginosa1. The continuous exposure to severe harmful stimuli places lungs at constant risk of injury and thereby, tissue repair is crucial for maintaining lung homeostasis2,3. CFTR plays a key role in regeneration of the airway epithelium, the repair of which is obviously insufficient to maintain lung functions in CF4,5,6,7,8,9,10. Knowledge of the molecular mechanisms regulating airway epithelial cell differentiation was mostly gained from lineage tracing studies in mouse models3. Less is known in human although application of single-cell RNA-seq on airway biopsies and primary HAEC cultures are rapidly filling up this gap11,12,13. The present work aims to identify gene expression changes in CF and NCF human airway epithelial cells (HAECs) undergoing repair. Some cultures of NCF and CF HAECs were also exposed to flagellin for 24 h to mimic Pseudomonas aeruginosa infection and processed for RNA-seq.

The tracheobronchial airway epithelium is pseudostratified and constituted of basal (BCs), secretory Club/Clara (SCs), ionocytes (ICs), mucin-producing goblet (GCs) and ciliated cells (CCs)3,11,12,14. It is well demonstrated that epithelium regeneration/repair is initiated by BC proliferation to repopulate the denuded injured area3. In parallel, subsets of progenitor cells (suprabasal cells, sBCs) cycle and/or progressively mature to intermediate - or early progenitor - cells leading to the generation of SCs. After wound closure, all cells exit the cell cycle, BCs return to their original state while SCs terminate their differentiation to GCs and CCs. Figure 1 illustrates the logFC changes in expression of markers of the different cell subtypes with time of repair after injury of CF and NCF HAEC primary cultures. We focused on the initial steps of repair by comparing the post-wounding conditions (24 h post-wounding pW, wound closure WC, usually reached 42 hours after injury, and 2-days post wound closure pWC) to the control non-wounded condition (NW). We monitored TP63, cytokeratin 5 (KRT5) and KRT14 for BCs (Fig. 1a), KRT4 and KRT13 for sBCs (Fig. 1b), SCGB1A1 and SCGB3A1 for SCs (Fig. 1c), MUC5B and SPDEF for GCs (Fig. 1d), FOXJ1, FOXI1 and CFTR for CCs and ICs (Fig. 1e). Globally, proliferation can be evaluated by the expression of MKI67 (Fig. 1b) and early differentiation by the expression of KRT8 (Fig. 1d), a marker which is not detected in BCs and sBCs. Note that FUT4, a marker of immature SCs is detected (Fig. 1c). The results indicate that the repair process is engaged after wounding in both CF and NCF cultures and that our RNA-seq allows monitoring gene expression during the initial steps before the generation of mature SCs. A schematic overview of the experimental conditions as well as the comparisons performed between conditions and groups are provided in Fig. 2. Table 1 indicates the number of gene changes for each time point after wounding relative to the NW conditions (top). Comparison of the number of gene changes between conditions (pW vs NW; WC vs pW; pWC vs WC) is also given (middle). We also performed comparison of gene changes between CF and NCF HAEC cultures for the different conditions (bottom). Again, up- and downregulated genes in CF HAECs are detected for all conditions, suggesting alterations in the switch between proliferation and differentiation for CF HAECs. Finally, flagellin stimulation at Time 0 (NW) and at WC further highlighted differences in the transcriptomic response of CF HAECs (Table 2).

Fig. 1
figure 1

Changes in gene expression (logFC) of markers of subpopulations of NCF (blue lines and dots) and CF (red lines and dots) HAECs at different times of wound repair as compared to their initial expression (values set at 0) in non-wounded conditions. (a) Expression levels of basal cell marker genes: TP63, KRT5 and KRT14. (b) Expression levels of suprabasal cell marker genes (KRT4 and KRT13) and of a marker of cell proliferation (MKI67). (c) Expression levels of Club cell marker genes (SCGB1A1 and SCGB3A1), including the marker of immature cells (FUT4). (d) Expression levels of goblet cell marker genes (MUC5B and SPDEF) and of KRT8, which is a marker of early cell differentiation. (e) Expression levels of ciliated cell and ionocyte marker genes (FOXJ1 and FOXI1, respectively), with both subpopulations expressing CFTR. Data are expressed as means; error bars were not drawn for clarity since no statistical differences were observed between NCF and CF cultures. pW: post wounding; WC: wound closure; pWC; post wound closure.

Fig. 2
figure 2

Experimental design and condition’s comparison. (a) Schematic illustration of the wound-induced repair process in HAECs. Well-differentiated airway epithelium 3D cultures from CF patients and NCF donors were used, corresponding the non-wounded (NW) condition. At time 0, a circular wound (W) was induced in the center of the culture but leaving intact the epithelium at the periphery. Twenty-four hours after wounding (pW), migrating and proliferating cells started to cover the denuded area. Wound closure (WC) was reached 42 hours after wounding. mRNA was isolated from two Transwells per patient/donor and for each condition, NW, pW, WC and 48 hours after wound closure (pWC; 90 hours after wounding). In parallel experiments, 2 NW and WC Transwells per patient/donor were treated with flagellin to mimic infection with Pseudomonas aeruginosa. (b) Illustration of the gene expression comparisons performed between different conditions after wounding (pW, WC, pWC) and the initial NW condition, exposed or not to flagellin (F). (c) Illustration of the gene expression comparisons performed for all conditions between CF and NCF cultures.

Table 1 Number of differently expressed genes with FDR (False Discovery Rate) 5% and the number of which have a fold-change 2 (FC 2) thresholds.
Table 2 Number of differently expressed genes with FDR (False Discovery Rate) 5% and the number of which have a fold-change 2 (FC 2) thresholds.

In summary, this study presents RNA-seq data from healthy and CF human HAECs undergoing repair after injury. We extracted gene expression of typical marker genes of the different cell subtypes that constitute the airway epithelium and report differences in the repair process between CF and NCF cultures. We believe that these data will be valuable for researchers studying airway epithelium regeneration in the context of the CF disease.


Cell cultures

Well-differentiated primary cultures of bronchial airway epithelial cells (MucilAir™ and MucilAir™-CF) on Transwell filters at the air-liquid interface for 45–60 days were purchased from Epithelix Sàrl (Plan-les-Ouates, Switzerland). All CF HAEC cultures were generated from 7 patients homozygous for the F508del CFTR variant. NCF cultures were generated from 7 subjects but one culture (subject 4) did not differentiate appropriately and was discarded. Detailed characteristics of the patients (age, sex, smoking status) are not available. The basal medium, which consisted of DMEM:F12 (3:1, Life Technologies, Zug, Switzerland) supplemented with 1.5% Ultroser G (Bioserpa, Cergy, France) and antibiotics, was refreshed every 2 days. Mechanical wounding was performed using an airbrush linked to a pressure regulator, as previously described15.

RNA extraction

Total RNA was extracted using Qiagen RNeasy Kit (Qiagen, Hombrechtikon, Switzerland), according to the manufacturer’s instructions. At 24 hours post-wound (pW) and at WC, the Transwell filters were cut off and undamaged cells at the periphery of the wound were discarded from the repairing cells using a sterile scalpel before lysis and RNA extraction. Two filters were pooled per condition. RNA-seq was performed by the iGE3 Genomic Platform at the Faculty of medicine, University of Geneva.

Differential gene expression analysis

Library size normalizations and differential gene expression calculations have been performed using the package edgeR ( The genes having a count above one count per million reads (cpm) in at least 3 samples were kept for the analysis. For each comparison, the latest condition was used as the ‘control’, i.e. genes with a positive fold-change are more expressed in the first condition compared to the ‘control’ condition. Genes with maximal expression value in any of the compared conditions lower than 1 RPKM (reads per kb per million read) were removed from the analysis before calling for differentially expressed genes. The differentially expressed gene tests were done with a general linear model with a negative binomial distribution. The differentially expressed genes p-values are corrected for multiple testing error with a 5% FDR (false discovery rate) and the correction used is Benjamini-Hochberg (BH). By default, the fold-change (FC) and the Benjamini-Hochberg corrected p-value thresholds were set to 2 and 0.01, respectively. Genes with higher Benjamini-Hochberg corrected p-value or lower FC were not considered as differentially expressed.

Data Records

The data can be accessed to NCBI Gene Expression Omnibus (GEO) with the accession number GSE12769616. The lists of differentially expressed genes with FDR 5% and FC 2 thresholds for the comparisons indicated in Tables 1 and 2 are available in figshare17. Datasets of original reads for all conditions (NCF and CF, before and after wounding) are available in the NCBI SRA repository18.

Technical Validation

RNA integrity assessment

Before sequencing, QuBit (Invitrogen) was used to assess RNA quality and quantity without prior purification of the samples.

RNA-seq data quality assessment

Single read of 50 bases, TruSeq stranded mRNA, was performed with a HiSeq 4000 from Illumina. The sequencing quality control was done with FastQC ( Quality scores of 32–40 were achieved (Fig. 3a), corresponding to 1/1000 and 1/10’000 chance of errors, respectively. The reads were mapped with STAR, an ultra-fast and universal RNA-seq aligner, which can do spliced alignments and read clipping:

Fig. 3
figure 3

Quality assessment FASTQ data. (a) The quality distribution along the reads plot is shown for one NCF (left, sample 54) and one CF (right, sample 26) sample. Box and whisker plots demonstrate the distribution of per base quality for each left and right read position read for each of the analyzed samples. Mean value is indicated by the dark line; the yellow box represents the interquartile range (25–75%) with the lower and upper whiskers representing the 10 and 90% points, respectively. (b) MDS (principal components analysis) plot indicating the similarity of the counts in the samples obtained from the first (black letters) and the second (red letters) series of NCF and CF cultures.

Only the reads that are mapped once to the genome are considered for the read allocation to genomic features. Ambiguous reads were removed using featureCounts:

Reads mapping is provided as a Supplementary Data File (Online-only Table 1). For polyA-enriched RNAseq, 70% or more reads uniquely assigned to a gene are considered really good, although this percentage may be affected by the nature of the different expressed genes.

Sequencing was performed on two different occasions with RNA samples collected at one-year interval times. Figure 3b shows the multi-dimensional scaling (MDS) plot (principal components analysis) of the samples, which gives an indication of the similarity of the counts in the earlier and former experiments (first and second series, respectively). No batch effect could be observed between the two sequenced data.