Transcriptome Analysis Reveals Distinct Gene Expression Profiles in Eosinophilic and Noneosinophilic Chronic Rhinosinusitis with Nasal Polyps

Chronic rhinosinusitis with nasal polyps (CRSwNP), one of the most prevalent chronic diseases, is characterized by persistent inflammation of sinonasal mucosa. However, the pathogenesis of CRSwNP remains unclear. Here, we performed next-generation RNA sequencing and a comprehensive bioinformatics analyses to characterize the transcriptome profiles, including mRNAs and long noncoding RNAs (lncRNAs), in patients with eosinophilic and noneosinophilic CRSwNP. A total of 1917 novel lncRNAs and 280 known lncRNAs were identified. We showed eosinophilic CRSwNP (ECRSwNP) and noneosinophilic CRSwNP (non-ECRSwNP) display distinct transcriptome profiles. We identified crucial pathways, including inflammatory, immune response and extracellular microenvironment, connected to the pathogenetic mechanism of CRSwNP. We also discovered key lncRNAs differentially expressed, including lncRNA XLOC_010280, which regulates CCL18 and eosinophilic inflammation. The qRT-PCR and in situ RNA hybridization results verified the key differentially expressed genes. The feature of distinct transcriptomes between ECRSwNP and non-ECRSwNP suggests the necessity to develop specific biomarkers and personalized therapeutic strategies. Our findings lay a solid foundation for subsequent functional studies of mRNAs and lncRNAs as diagnostic and therapeutic targets in CRSwNP by providing a candidate reservoir.


Patients and biopsy specimens
The study was approved by the Ethics Committee of Peking Union Medical College Hospital. All study participants provided written informed consent and were recruited at Peking Union Medical College Hospital. All experiments were performed in accordance with approved guidelines. A total of 35 patients with ECRSwNP, 31 patients with non-ECRSwNP and 34 control subjects were recruited for the study. The diagnosis of CRSwNP was based on standard criteria issued in the European Position Paper on Rhinosinusitis and Nasal Polyps 2012 guidelines(1). None of the subjects used systemic or topical corticosteroids or other immune-modulating drugs for at least 4 weeks before surgery.
All CRSwNP patients were Chinese with bilateral CRSwNP. The presence of CRS and bilateral NPs was confirmed by means of office endoscopy and computed tomographic imaging. They failed to respond to maximal medical therapy and subsequently underwent functional endoscopic sinus surgery. Nasal polyp tissues from the apex region of polyps were collected during functional endoscopic sinus surgery. Control subjects were patients undergoing endoscopic trans-sphenoid removal of nonfunctioning pituitary adenomas not having inflammatory sinonasal diseases. The sphenoid mucosal tissues of control subjects were collected during the surgery.
The atopic status was evaluated by the Phadiatop test (Phadia, Uppsala, Sweden) to detect IgE antibodies against various common inhalant allergens. Specific IgE values of 0.35 kIU/L or greater were considered positive. The diagnosis of asthma was based on history and pneumologist diagnosis. Subjects who had an immune deficient disease, antrochoanal polyps, fungal sinusitis, cystic fibrosis, primary ciliary dyskinesia, or gastroesophageal reflux disease were excluded from the study.
The CRSwNP patients were classified based on the tissue eosinophil number, counted in the lamina propria of the polyps in five random microscopic high power fields (HPFs, ×400 magnification). We defined ECRSwNP as subjects whose tissue eosinophil number per HPF was more than twenty(2), and non-ECRSwNP as not fulfilling the criteria.

Sampling of tissue specimens and histological analysis
The tissues were divided into three parts immediately after harvest from the surgery.
One part was fixed in 10% formalin, and later embedded in paraffin for hematoxylin and eosin (HE) staining. The other two parts were immediately snap-frozen in liquid nitrogen and stored at -80°C for whole transcriptome sequencing and quantitative Real-time PCR (qRT-PCR) experiments. Paraffin sections (4µm) were stained with HE dye. In order to determine the degree of eosinophil infiltration, two of the authors independently detected the number of eosinophils per HPF by manually counting five random HPF of lamina propria per specimen in the HE sections. We used a Leica DM3000 light microscope with an N Plan 40×/0.65 lens and an FN 22 eyepiece; this 'HPF' includes an area of 0.24mm 2 on the slide. In cases of disagreement (when 2 average numbers differed by >10%), a consensus was reached by our research team reviewing the specimen together.

Whole transcriptome library preparation and sequencing
Total RNA was isolated from 3 ECRSwNP, 3 non-ECRSwNP and 3 control subjects using Trizol (Invitrogen) according to the manufacturer's instructions. RNA purity was checked using the NanoPhotometer spectrophotometer (IMPLEN, CA, USA). RNA concentration was measured using Qubit RNA Assay Kit in Qubit 2.0 Flurometer (Life Technologies, CA, USA). RNA integrity was verified on an Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA). Only samples with RNA integrity number (RIN) over 7.0 were used for library construction. A total amount of 3 µg RNA per sample was used as input material for library construction. Ribosomal RNA (rRNA) was removed by Epicentre Ribo-zero rRNA Removal Kit (Epicentre, USA) according to the manufacturer's instructions. Subsequently, strand-specific sequencing libraries were generated with the dUTP method using the resulting RNA by NEBNext Ultra Directional RNA Library Prep Kit for Illumina (NEB, USA) following manufacturer's recommendations. The library quality and quantity was assessed using the Agilent Bioanalyzer 2100 system and qRT-PCR. RNA-seq was performed on an Illumina Hiseq 2000 platform and 100 bp paired-end reads were generated according to Illumina's protocol. The NCBI Gene Expression Omnibus accession number for the RNA-Seq data reported in this paper is GSE72713.

RNA-seq data analysis
(1) Quality control: The raw sequencing data (raw reads) were stored in the FASTQ format. Clean data (clean reads) were obtained by removing reads containing adapter, reads containing N (N means not knowing the base information) for more than 10% of all the bases and low quality reads marked by sequencing platform from raw data. All the down stream analyses were based on the clean data with high quality.
(3) Transcripts identification pipeline: For mRNA analyses, the RefSeq database (Build 37.3) was chosen as the annotation references. For lncRNA analyses, the GENCODE v19 database was chosen as the annotation references. To identify lncRNA transcripts from the assembled transcripts set, we ran the set through the following filters: (a) Expression level threshold. The transcripts were reconstructed in at least two different tissues or by both Scripture and Cufflinks in the same tissue with minimal read coverage ≥ 3. This step aimed to distinguish the lowly expressed lncRNAs from the various lowly expressed, unreliable fragments assembled from the RNA-seq data.
(b) Size selection. The transcripts were multiexonic and not smaller than 200 bases.
This step aimed to remove transcripts which were not consistent with the definition of lncRNA, or single-exonic, which might result from potential DNA contamination.
(c) Filter of known non-lncRNA annotations. We eliminated all transcripts that overlapped a transcript which included all known protein-coding genes, microRNAs, tRNAs, miscRNA, rRNAs and pseudogenes and retained all transcripts previously annotated as lncRNAs (GENCODE v19) or that have not been annotated thus far.
(d) Positive coding potential threshold. We scored the coding potential of novel transcripts using phylogenetic codon substitution frequency (PhyloCSF) (5). All transcripts with PhyloCSF scores >100 were discarded.
(e) Known protein domain filter. To eliminate protein-coding transcripts thoroughly, we translated every transcript in all three possible reading frames and employed Pfam Scan (v1.3) to detect any of the known protein domains cataloged in the Pfam database (release 27; used both Pfam A and Pfam B) (6). We excluded all transcripts with a Pfam hit. Finally, the remaining transcripts were considered reliably expressed lncRNAs.
(4) Differential Expression Analysis The read counts of each transcript were normalized to the length of the individual transcript and to the total mapped fragment counts in each sample and expressed as fragments per kilo-base of exon per million fragments mapped (FPKM) of both lncRNAs and mRNAs in each sample. The mRNA and lncRNA differential expression analyses for all pairwise comparisons: ECRSwNP versus CTRL, non-ECRSwNP versus CRTL, and ECRSwNP versus non-ECRSwNP using Cuffdiff (v2.1.1) (3). An adjusted P value <0.05 (Student's t-test with Benjamini-Hochberg false discovery rate (FDR) adjustment) was used as the cut-off for significantly differentially expressed genes.
(5) GO and KEGG enrichment analysis Differentially expressed genes were analyzed by enrichment analyses to detect over-represented functional terms present in the genomic background. Gene ontology (GO) analysis was performed using the GOseq R package(7), in which gene length bias was corrected. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed using KOBAS software(8).

Prediction of the function of lncRNAs
Most of the lncRNAs in current databases have not yet been functionally annotated.
Thus the prediction of their functions is based on the functional annotations of their related cis and trans target mRNAs. We defined potentially cis-regulated target genes as protein-coding genes within 100kb in genomic distance from the lncRNA, and potentially trans-regulated target genes as protein-coding genes coexpressed with the lncRNA with Pearson correlation coefficient (PCC) >0.95 or <-0.95 and beyond 100kb in genomic distance from the lncRNA or in different chromosomes. We analyzed the expression correlation of lncRNA:cis-mRNA pairs and lncRNA:trans-mRNA pairs by calculating PCC.

Quantitative Real-time PCR
Total RNA was isolated using Trizol (Invitrogen) following the manufacturer's instruction. The cDNA was synthesized using Superscript III (Invitrogen). All qRT-PCR primers (Table S8) were verified to produce specific PCR product and to react efficiently. All qRT-PCR reactions were performed on a Roche Lightcycler480 real-time PCR system using TOYOBO Thunderbird SYBR qRT-PCR Mix (TOYOBO, OSAKA JAPAN) with technical triplicates. Relative quantification of target genes was performed using the 2 -ΔΔCt method with GAPDH as a reference gene.