Transcriptomic profile of cystic fibrosis airway epithelial cells undergoing repair

Pathological remodeling of the airway epithelium is commonly observed in Cystic Fibrosis (CF). The different cell types that constitute the airway epithelium are regenerated upon injury to restore the integrity of the epithelium and maintain its barrier function. The molecular signature of tissue repair in CF airway epithelial cells has, however, not been well investigated in primary cultures. We therefore collected RNA-seq data from well-differentiated primary cultures of bronchial human airway epithelial cells (HAECs) of CF (F508del/F508del) and non-CF (NCF) origins before and after mechanical wounding, exposed or not to flagellin. We identified changes in expression over the course of repair for genes whose products are markers of the different cell types that constitute the airway epithelium (basal, suprabasal, intermediate, secretory, goblet and ciliated cells as well as ionocytes). Researchers in the CF field may benefit from this transcriptomic profile, which covers the initial steps of wound repair and reveals differences in this process between CF and NCF cultures.

www.nature.com/scientificdata
(Fig. 1c). The results indicate that the repair process is engaged after wounding in both CF and NCF cultures and that our RNA-seq data allow monitoring of gene expression during the initial steps before the generation of mature SCs. A schematic overview of the experimental conditions and of the comparisons performed between conditions and groups is provided in Fig. 2. Table 1 indicates the number of gene changes for each time point after wounding relative to the NW conditions (top). A comparison of the number of gene changes between conditions (pW vs NW; WC vs pW; pWC vs WC) is also given (middle). We also compared gene changes between CF and NCF HAEC cultures for the different conditions (bottom). Again, up- and downregulated genes in CF HAECs are detected for all conditions, suggesting alterations in the switch between proliferation and differentiation for CF HAECs. Finally, flagellin stimulation at Time 0 (NW) and at WC further highlighted differences in the transcriptomic response of CF HAECs (Table 2).
In summary, this study presents RNA-seq data from NCF and CF HAECs undergoing repair after injury. We extracted the expression of typical marker genes of the different cell subtypes that constitute the airway epithelium and report differences in the repair process between CF and NCF cultures. We believe that these data will be valuable for researchers studying airway epithelium regeneration in the context of CF.

Methods
Cell cultures. Well-differentiated primary cultures of bronchial airway epithelial cells (MucilAir™ and MucilAir™-CF), grown on Transwell filters at the air-liquid interface for 45-60 days, were purchased from Epithelix Sàrl (Plan-les-Ouates, Switzerland). All CF HAEC cultures were generated from 7 patients homozygous for the F508del CFTR variant. NCF cultures were generated from 7 subjects, but one culture (subject 4) did not differentiate appropriately and was discarded. Detailed characteristics of the patients (age, sex, smoking status) are not available. The basal medium, which consisted of DMEM:F12 (3:1, Life Technologies, Zug, Switzerland) supplemented with 1.5% Ultroser G (Bioserpa, Cergy, France) and antibiotics, was refreshed every 2 days. Mechanical wounding was performed using an airbrush linked to a pressure regulator, as previously described 15.
RNA extraction. Total RNA was extracted using the Qiagen RNeasy Kit (Qiagen, Hombrechtikon, Switzerland), according to the manufacturer's instructions. At 24 hours post-wound (pW) and at WC, the Transwell filters were cut off, and undamaged cells at the periphery of the wound were separated from the repairing cells with a sterile scalpel and discarded before lysis and RNA extraction. Two filters were pooled per condition. RNA-seq was performed by the iGE3 Genomic Platform at the Faculty of Medicine, University of Geneva.
Differential gene expression analysis. Library size normalization and differential gene expression calculations were performed using the edgeR package (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2796818/). Genes with a count above one count per million reads (cpm) in at least 3 samples were kept for the analysis. For each comparison, the latter condition was used as the 'control', i.e. genes with a positive fold-change are more highly expressed in the first condition than in the 'control' condition.
Genes whose maximal expression value in any of the compared conditions was lower than 1 RPKM (reads per kilobase per million mapped reads) were removed from the analysis before calling differentially expressed genes. Differential expression was tested with a generalized linear model based on a negative binomial distribution. P-values were corrected for multiple testing with the Benjamini-Hochberg (BH) procedure at a 5% false discovery rate (FDR). By default, the fold-change (FC) and BH-corrected p-value thresholds were set to 2 and 0.01, respectively. Genes with a higher BH-corrected p-value or a lower FC were not considered differentially expressed.
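The filtering and calling rules described above can be illustrated with a minimal Python sketch. It is not the edgeR pipeline used for this dataset: the function names (counts_per_million, filter_by_cpm, benjamini_hochberg, call_de) are hypothetical, the p-values are assumed to come from a separate negative binomial test, and only the cpm filter, the BH adjustment and the FC and adjusted p-value cut-offs are reproduced.

```python
import math

def counts_per_million(counts):
    """Convert one sample's raw counts (gene -> count) to counts per million."""
    library_size = sum(counts.values())
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}

def filter_by_cpm(samples, min_cpm=1.0, min_samples=3):
    """Keep genes with cpm above min_cpm in at least min_samples samples."""
    cpm_per_sample = [counts_per_million(s) for s in samples]
    return [gene for gene in samples[0]
            if sum(cpm[gene] > min_cpm for cpm in cpm_per_sample) >= min_samples]

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_de(log2_fc, pvalues, min_fc=2.0, max_padj=0.01):
    """Flag genes with |FC| >= min_fc and BH-adjusted p-value <= max_padj.

    A positive log2 fold-change means higher expression in the first
    condition than in the 'control' (second) condition.
    """
    padj = benjamini_hochberg(pvalues)
    cutoff = math.log2(min_fc)
    return [abs(lfc) >= cutoff and q <= max_padj
            for lfc, q in zip(log2_fc, padj)]
```

For example, call_de([1.5, 0.5, -2.0, 3.0], [0.001, 0.001, 0.002, 0.5]) flags the first and third genes only: the second fails the fold-change cut-off and the fourth fails the adjusted p-value cut-off.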

Data Records
The data can be accessed at the NCBI Gene Expression Omnibus (GEO) under accession number GSE127696 16. The lists of differentially expressed genes with FDR 5% and FC 2 thresholds for the comparisons indicated in Tables 1 and 2

Technical Validation
RNA integrity assessment. Before sequencing, Qubit (Invitrogen) was used to assess RNA quality and quantity without prior purification of the samples.
Only reads that mapped once to the genome were considered for the allocation of reads to genomic features. Ambiguous reads were removed using featureCounts: http://www.ncbi.nlm.nih.gov/pubmed/24227677. Read mapping is provided as a Supplementary Data File (Online-only Table 1). For polyA-enriched RNA-seq, 70% or more of reads uniquely assigned to a gene is considered very good, although this percentage may be affected by the nature of the expressed genes.
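As an illustration of the 70% criterion, the sketch below computes the fraction of reads assigned to a gene from per-status read counts such as those reported in a featureCounts summary (status names like Assigned and Unassigned_Ambiguity). The helper names and the example counts are hypothetical, and the choice of denominator (all reads listed in the summary) is an assumption; the values for this dataset are those in Online-only Table 1.

```python
def assigned_fraction(summary):
    """Fraction of reads assigned to a gene, given status -> read count."""
    total = sum(summary.values())
    if total == 0:
        raise ValueError("summary contains no reads")
    return summary.get("Assigned", 0) / total

def passes_assignment_check(summary, threshold=0.70):
    """True when the assigned fraction reaches the chosen threshold."""
    return assigned_fraction(summary) >= threshold

# Hypothetical counts for one sample, for illustration only.
example_summary = {
    "Assigned": 8_000_000,
    "Unassigned_Ambiguity": 500_000,
    "Unassigned_MultiMapping": 1_000_000,
    "Unassigned_NoFeatures": 500_000,
}
print(assigned_fraction(example_summary))        # 0.8
print(passes_assignment_check(example_summary))  # True
```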
Sequencing was performed on two separate occasions, with RNA samples collected one year apart. Figure 3b shows the multi-dimensional scaling (MDS) plot (principal components analysis) of the samples, which gives an indication of the similarity of the counts in the earlier and later experiments (first and second series, respectively). No batch effect was observed between the two sequencing series.
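An MDS plot is built from pairwise distances between samples. The text does not state which distance was used for Fig. 3b, so the sketch below is an assumption: it computes a "leading log-fold-change" distance, the root-mean-square of the largest log2 fold-changes between two samples, which is one common choice for RNA-seq data. The function names are hypothetical and the inputs are assumed to be log2 expression values listed in the same gene order for every sample.

```python
import math

def leading_logfc_distance(a, b, top=500):
    """Root-mean-square of the `top` largest squared log2 differences."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal length")
    squared = sorted(((x - y) ** 2 for x, y in zip(a, b)), reverse=True)
    leading = squared[:top]
    return math.sqrt(sum(leading) / len(leading))

def distance_matrix(samples, top=500):
    """Pairwise distances for a dict of sample name -> log2 expression list."""
    names = list(samples)
    return {(p, q): leading_logfc_distance(samples[p], samples[q], top)
            for p in names for q in names}
```

If the samples of the two series are interleaved rather than separated along the leading MDS dimensions, this is consistent with the absence of a batch effect.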