Introduction

Head and neck squamous cell carcinoma (HNSCC) is the sixth most commonly reported malignancy worldwide1. There are about 500,000 new HNSCC patients diagnosed worldwide each year, and the death rate of new cases is about 20%2,3. Although comprehensive treatment, including surgical resection, radiotherapy, and chemotherapy, continues to develop, the overall 5-year survival rate of HNSCC patients has not been significantly improved over the past few decades and has not exceeded 50%. Thus, HNSCC is considered a malignant tumor with low survival rate4. Therefore, understanding the molecular nature of HNSCC carcinogenesis is a pivotal step for meliorating early diagnosis, predicting prognosis, and developing effective therapeutics5.

In recent years, the regulatory network composed of long non-coding RNAs (lncRNAs), microRNAs (miRNAs) and messenger RNAs (mRNAs) has gained a great interest in the study of molecular biological mechanisms involved in the process of tumor occurrence and progression. lncRNA refers to a non-coding RNA transcript with a length greater than 200 nucleotides, which is located in the nucleus and cytoplasm of eukaryotic cells6. A large number of clinical and experimental studies have shown that some lncRNAs play important roles in the emergence and development of malignant tumors7.

miRNA is an endogenous single stranded RNA molecule of 22 nucleotides length that does not encode for a protein. miRNA inhibits the expression of a target gene by complementation binding of its seed region to the microRNA response elements (MREs) on the mRNA8. The regulation network between miRNA and target genes is involved in a variety of biological processes, including tumor occurrence and metastasis9,10. In recent years, research on the role of miRNAs and their implication in the regulation of initiation and development of HNSCC has advanced, and a large number of papers have been published. In 2016, Irani provided a comprehensive literature review of the role of miRNAs in head and neck cancer metastasis11. These studies have laid a good foundation for understanding the mechanism of lncRNA/miRNA/mRNA interactions.

In 2011, Salmena and colleagues presented the competing endogenous RNA (ceRNA) hypothesis which states that the pool of mRNAs, lncRNAs, and other non-coding RNAs share common MREs with miRNAs, which can act as natural miRNA “sponges” and inhibit miRNA function through competitive binding to MREs on the target mRNA12. This theory is based on the fact that lncRNAs can regulate expression of target regulatory genes not only by direct interaction, but also indirectly through competitive binding with miRNAs, as they contain MREs which can bind to the core seed sequence of miRNA. At present, the ceRNA regulation theory has been proven to be implicated in the development of cancer by some related studies. For example, Guo et al. found that lncRNA-BGL3 was a target of miR-17, miR-93, miR-20a, miR-20b, miR-106a and miR-106b, microRNAs that repress mRNA of phosphatase and tensin homolog (PTEN). Further experiments showed that lncRNA-BGL3 operated as a competitive endogenous RNA for binding these microRNAs to cross-regulate PTEN expression13. The lncRNA gene, HLUC, acts as a ceRNA of PRKACB in liver cancer by competing with miR-372, thereby reducing miR-372’s inhibitory effect on CREB and causing up-regulation of CREB14. With the development of bioinformatics technology, more researchers have adopted data analysis and mining methods to study ceRNA networks. In addition, some representative databases, such as miRTarBase15, miRDB16, TargetScan17 and StarBase18 provide data and useful resources for studying ceRNA networks. The Cancer Genome Atlas (TCGA) research team has published a comprehensive resource for multidimensional molecular spectroscopy that stores a large number of tumor samples. These datasets make it feasible to build a ceRNA regulatory network in cancer.

In recent years, numerous studies have confirmed that the lncRNA-miRNA-mRNA ceRNA regulatory network is implicated in the development of gastric, liver, breast, pancreatic and bladder cancer19,20,21,22,23. However, similar studies on HNSCC are limited, and there is still a lack of comprehensive analysis of lncRNAs and miRNAs related to HNSCC based on high-throughput sequencing and large-scale sample size. In this paper, we obtained RNA expression data and compared the expression profiles between 44 normal tissues and 502 tumor tissues of HNSCC. Following, we identified differentially expressed mRNAs, miRNAs and lncRNAs between the samples from HNSCC patients. Finally, 8 miRNAs, 16 mRNAs and 71 lncRNAs were selected to build the lncRNA-miRNA-mRNA ceRNA network. Based on the survival analysis of RNAs in the ceRNA network, we analyzed the lncRNAs that significantly affect the survival and prognosis of HNSCC patients.

Materials and Methods

Patients and TCGA data retrieval

The RNA sequence data of 546 samples with head and neck squamous cell carcinoma were retrieved from the TCGA data portal. The TCGA dataset (https://portal.gdc.cancer.gov/), being comprised of more than two petabytes of genomic data, is publically available, and this genomic information helps the cancer research community to improve the prevention, diagnosis, and treatment of cancer. This study is in accordance with the publication guidelines provided by TCGA (https://cancergenome.nih.gov/publications/publicationguidelines). Since the data comes from the TCGA database, no further approval was required from the Ethics Committee.

RNA sequence data processing

The RNA expression data (level 3) of 546 HNSCC samples were downloaded from the TCGA data portal (up to Aug 29, 2017). The RNA and miRNA sequence data from the 546 samples had been derived from the IlluminaHiSeq_RNASeq and the IlluminaHiSeq_miRNASeq sequencing platforms; all the data were freely available to download. The sequencing data of the 546 samples contained the corresponding RNA-seq and miRNA-seq data. 546 samples were divided into 2 cohorts: 502 tumor samples and 44 normal samples.

In this paper, we mainly used the program code written in Perl and R language to analyze and deal with RNA data.

Identification of differentially expressed mRNAs (DEmRNAs), miRNAs (DEmiRNAs) and long non-coding RNAs (DEIncRNAs)

We identified mRNAs and lncRNAs by using the Ensembl database (http://www.ensembl.org/index.html, version 89)24. In this study, lncRNAs and mRNAs that were not included in the database were excluded.

Before conducting differential expression analysis, we wrote the R language code to ensure that all unexpressed RNAs was filtered out. This was carried out by deleting all rows with a mean read of less than or equal to one. By using the edgeR25 software to further analyze the data, the differentially expressed mRNAs, lncRNAs and miRNAs were obtained. All P values use false discovery rate (FDR) to correct the statistical significance of the multiple test. Fold changes (log2 absolute) 2 and FDR adjusted to P < 0.01 were considered significant.

For the obtained differentially expressed mRNAs, lncRNAs, and miRNAs, we generated heat maps and volcano maps using the gplots and heatmap packages in the R platform.

Construction of a ceRNA regulatory network

The ceRNA control network of HNSCC was mainly established by the following steps.

First, the miRcode database26 was used to predict interactions between lncRNA with miRNAs. miRcode provides “whole transcriptome” human microRNA target predictions based on the comprehensive GENCODE gene annotation, including >10,000 long non-coding RNA genes26. miRNA targets can be predicted scientifically by entering the name of the relevant lncRNA and miRNA in the miRcode website platform (http://www.mircode.org/).

Next, we used miRTarBase15, miRDB16 and TargetScan17 databases to retrieve miRNA targeted mRNAs. miRTarBase is a database that has accumulated more than three hundred and sixty thousand miRNA-target interactions (MTIs), which are collected by manually surveying pertinent literature and mining the text systematically to filter research articles related to functional studies of miRNAs15. miRDB is an online database for miRNA target prediction and functional annotations. All the targets in miRDB were predicted by mirTarget, which was developed by analyzing thousands of miRNA-target interactions from high-throughput sequencing experiments16. TargetScan predicts biological targets of miRNAs by searching for the presence of conserved 8mer, 7mer, and 6mer sites that match the seed region of each miRNA17. In order to improve the validity of our search results, we only included miRNA-targeted mRNAs present in all three databases to construct the ceRNA network.

At last, we used Cytoscape 3.5.1 software to visually map the results. Cytoscape (http://www.cytoscape.org/) is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.

Survival analysis

We use the R survival package (https://CRAN.R-project.org/package=survival, Version: 2.41-3) for survival analysis for all RNAs in the ceRNA network. The univariate Cox proportional hazards regression27 was carried out to identify the lncRNAs, mRNAs and miRNAs whose expression correlated with overall survival. For the overall survival rates, the log-rank test was used to compare the significant differences in univariate analysis between subgroups. Unless otherwise stated, a P value of less than 0.05 is considered statistically significant.

Data availability

The datasets analysed during the current study are available in the TCGA repository, https://portal.gdc.cancer.gov/.

Results

Differentially expressed mRNAs (DEmRNAs) in HNSCC

This study investigated the expression levels of RNAs in 502 tumor tissues (cohort T) and 44 normal tissues (cohort N). A differentially expressed gene is defined as a gene whose log2 value of the differential expression multiple value (logFC) is greater than 2 and the corrected P value (FDR) is less than 0.01. According to this standard, 869 (43.51%) up-regulated mRNAs and 1128 (56.49%) down–regulated mRNAs were identified by using the edgeR package. The first 25 up-regulated mRNAs, the first 25 down-regulated mRNAs, and their corresponding logFC, P-value, and FDR values, generated by edgeR, are outlined in Table 1. A complete list of DEmRNAs is provided in appendix 1. In Fig. 1, we show the distribution of all the differentially expressed mRNAs on the two dimensions of -log (FDR) and logFC through a volcano map. All the mRNA expression levels were standardized to the sample mean.

Table 1 Differentially expressed mRNAs in HNSCC samples.
Figure 1
figure 1

Volcano map of DEmRNAs. Red spots represent up-regulated genes, and green spots represent down regulated genes.

From Table 1 and Fig. 1, we can see that the difference of down-regulated mRNAs is more significant than the up-regulated mRNAs, but the logFC value between them is relatively close.

Meanwhile, in order to better understand the mechanisms involved in the development of head and neck squamous cell carcinoma, we used the ClusterProfiler package in the R language for KEGG analysis and mapping, to further analyze the functional characteristics of differentially expressed RNA. The 23 significantly enriched KEGG pathways were listed in Table 2, and the first 12 pathways with the most significant p-values were shown in Fig. 2.

Table 2 The complete list of 23 significantly enriched KEGG pathways (pvalueCutoff = 0.05 and qvalueCutoff = 0.05).
Figure 2
figure 2

The first 12 pathways with the most significant p-values. The x-axis represents the number of DE mRNAs involved in the pathway.

Differentially expressed lncRNAs (DElncRNAs) in HNSCC

We identified 1041 lncRNAs to have statistically significant differential expression (logFC > 2 and FDR < 0.01)in HNSCC samples compared to normal tissues, including 755 up-regulated lncRNAs (72.5%) and 286 down-regulated lncRNAs (27.5%). The first 25 up-regulated lncRNAs and the first 25 down-regulated lncRNAs are outlined in Table 3. The distribution of all the differentially expressed lncRNA on -log (FDR) and logFC are depicted in the volcano map in Fig. 3.

Table 3 Differentially expressed lncRNAs in HNSCC samples.
Figure 3
figure 3

Volcano map of DElncRNAs. Red spots represent up-regulated genes, and green spots represent down regulated genes.

As demonstrated in Table 3 and Fig. 3, most of the differentially expressed lncRNAs are up-regulated (72.5%), but the statistical significance in the down-regulated lncRNAs is more prominent.

Differentially expressed miRNAs (DEmiRNAs) in HNSCC

To construct the lncRNA-miRNA-mRNA ceRNA regulatory network, we also compared miRNA differential expression profiles in tumor tissues and normal tissues. As a result, a total of 82 differentially expressed miRNAs were identified in samples from HNSCC patients, including 44 up-regulated miRNAs (53.6%) and 38 down-regulated miRNAs (46.4%). The first 25 up-regulated miRNAs and the first 25 down-regulated miRNAs are listed in Table, and the volcano map of the related differentially expressed miRNAs is shown in Fig. 4.

Figure 4
figure 4

Volcano map of DEmiRNAs. Red spots represent up-regulated genes, and green spots represent down regulated genes.

Table 4 and Fig. 4 show that, although the number of up-regulated miRNAs is slightly higher than the number of down-regulated miRNAs, the significance of the down-regulated miRNAs remains prominent.

Table 4 Differentially expressed miRNAs in HNSCC samples.

Construction of a ceRNA regulatory network in HNSCC

In order to better understand the role of differentially expressed lncRNAs in HNSCC and to further elucidate the interaction between these differentially expressed lncRNAs and differentially expressed miRNAs, we constructed a lncRNA-miRNA-mRNA related ceRNA regulatory network of HNSCC.

First, we used the 1041 differentially expressed lncRNAs retrieved from the miRcode database26, and applied the Perl program to identify 173 pairs of interacting lncRNAs and miRNAs. From the 82 retrieved DEmiRNAs, we predicted that 8 of them could interact with 71 differentially expressed lncRNAs. Following, we found that these 8 DEmiRNAs targeted 411 mRNAs in all three databases (miRTarBase, miRDB and TargetScan). The 411 targeted mRNAs were cross-checked with the 1997 DEmiRNAs retrieved from the miRcode database. The results showed that 16 mRNAs were involved in the construction of the ceRNA regulatory network (Fig. 5). Finally, we constructed the ceRNA regulatory network of head and neck Squamous cell carcinoma by incorporating 71 DElncRNAs, 16 DEmRNAs and 8 DEmiRNAs, as shown in Fig. 6.

Figure 5
figure 5

Venn diagram of mRNAs involved in ceRNA regulation network. As shown in the figure, the number of mRNA expressed in the red area is only the difference expression, rather than the target number. The blue area presents only the target number instead of the number of mRNA expressed differently, while the purple area in the middle represents the number of mRNA which is both the differential expression and the target.

Figure 6
figure 6

DElncRNA mediated ceRNA regulatory network in head and neck squamous cell carcinoma. The nodes highlighted in red indicate expression up-regulation, and the nodes highlighted in blue indicate expression down-regulation. IncRNAs, miRNAs and mRNAs are represented by diamonds, rounded rectangles, and ellipses, respectively.

Correlation analysis of survival and the expression of lncRNAs, miRNAs and mRNAs in the ceRNA network

To identify the potential lncRNAs, miRNAs and mRNAs with prognostic characteristics, the expression levels of 71 lncRNAs, 8 miRNAs and 16 mRNAs in the ceRNA network were profiled using the univariate Cox proportional hazards regression model. As a result, only one miRNA, one mRNA and thirteen lncRNAs were found to be significantly associated with overall survival (P < 0.05). Through the Kaplan-Meier curve analysis, we found that the expression levels of the STC2 mRNA and hsa-mir-206 miRNA were negatively correlated with the overall survival rate (P < 0.05; Fig. 7). In addition, 13 lncRNAs were closely related to survival. 8 differentially expressed IncRNA, including ABCA9-AS1, AL163952.1, AL356056.1, ANO1-AS2, AP002478.1, HOTTIP, LINC00052, and LINC00460, were negatively correlated with overall survival (P < 0.05). On the contrary, 5 differentially expressed IncRNA, including AL161645.1, HCG22, MIAT, MUC19, and ZFY-AS1, were positively correlated with overall survival time (P < 0.05). In Fig. 8, we show the Kaplan-Meier curves of 6 lncRNAs with the most significant P values. In the meantime, in order to better understand the relationship between the expression of these 13 lncRNAs and the patient’s survival time, we generated risk heat maps of these 13 lncRNAs in combination with clinical survival data (Fig. 9).

Figure 7
figure 7

Kaplan-Meier curve analysis of DEmRNA (STC2), DEmiRNA (has-mir-206) in HNSCC patients.

Figure 8
figure 8

Kaplan-Meier curve analysis of DElncRNAs and overall survival rate in HNSCC patients.

Figure 9
figure 9

A risk heat map constructed from 13 lncRNAs from 546 HNSCC patients that have a significant impact on survival. The value of risk rises gradually from left to right.

The constructed ceRNA network showed a possible interaction between DElncRNAs and mRNAs in head and neck squamous cell carcinoma. To confirm this finding, we performed regression analysis between the expression levels of 13 DElncRNAs significantly related to survival and all 16 DEmRNAs included in the network. It was shown that most lncRNA and mRNA in the network do not have direct linear correlation, except HOTTIP and HOXA10 (Fig. 10).

Figure 10
figure 10

Linear regression of HOTTIP vs HOXA11 expression level. Dashed lines represent 95% confidence interval.

Discussion

In recent years, lncRNA related research has attracted the attention of various cancer fields. Accumulative evidence shows that long-chain non-coding RNAs play a very important regulatory role in tumorigenesis and tumor progression. Although research on lncRNAs for head and neck squamous cell carcinoma is limited, a recent study has reported few differentially expressed lncRNAs such as HOTAIR, NEAT1, UCA1, and MALAT1, in oral cancers28. The HOX transcript antisense intergenic RNA (HOTAIR) is one of the most widely studied lncRNAs29,30. Different studies indicate that HOTAIR plays a significant role in the metastatic process and may be a predictor of poor patient prognosis when is highly expressed31,32. A meta-analysis, performed by Troiano et al., revealed that HOTAIR’s high expression was related to advanced tumor stage, lymph-node metastasis and poor overall survival, which demonstrated the potential prognostic role of HOTAIR in HNSCC30. Tang et al. first demonstrated that HOTAIR, NEAT-1, UCA1 expression levels were up-regulated, while MEG-3 expression levels were down-regulated in oral squamous cell carcinoma (OSCC) and its corresponding adjacent tissues33. They also showed that only HOTAIR could be detected in the patient’s saliva, especially in patients with lymph node metastases33. Li et al. found that HOTAIR expression in laryngeal squamous cell carcinoma (LSCC) was significantly higher than that in para-cancerous tissues, it was correlated with patients’ poor prognosis, and was an independent prognostic factor of LSCC34.

In the ceRNA network constructed in this paper, we found that the high expression of the lncRNA HOTAIR is closely related to 4 differentially expressed miRNAs (hsa-mir-301b is highly expressed, whilst hsa-mir-211, hsa-mir-206 and hsa-mir-375 are low expressed). The highly homologous miRNA to hsa-mir-301b, hsa-miR-301a-3p, has been proven to act as an oncogene by directly regulating the anti-oncogene Smad4, thereby playing a role in the emergence and development of laryngeal squamous cell carcinoma35. Koshizuka et al. showed that miR-1 and miR-206 were down-regulated in HNSCC clinical specimens, which is in agreement with our results. Furthermore, they found that two growth factor receptors, the epidermal growth factor receptor (EGFR) and the hepatocyte growth factor receptor (c-MET), were directly regulated by both miR-206 and miR-1 in HNSCC cells36. Bruce et al. has reported that metadherin was the direct target of miR-375 and their control mechanism may represent a novel oncogenic pathway that drives human head and neck cancer (HNC) progression, possibly through the PI(3) K pathway37. Finally, mir-211 has also been reported to promote head and neck carcinoma progression by targeting TGFβRII38.

Considering all the identified lncRNAs, HOTTIP’s abnormal expression had the most significant impact on the survival of patients (P = 6.241e-04). Meanwhile, the only miRNA and mRNA showing significant differences in their expression levels (P < 0.05) were hsa-mir-206 and STC2, respectively. HOTTIP has been proven to be correlated with the progression and prognosis of tongue squamous cell carcinoma, pancreatic cancer, osteosarcoma, and hepatocellular carcinoma39,40,41,42. miR-206 has been closely linked to the diagnosis, proliferation, and prognosis of multiple cancers, including HNSCC, by many research reports43,44,45,46. Yang et al. revealed that STC2 controls HNSCC metastasis via the PI3K/AKT/Snail signaling pathway and that targeted therapy against STC2 may be a novel strategy to effectively treat patients with metastatic HNSCC47. In 2014, Ren et al. demonstrated that miR-206 was expressed at markedly low levels in a cohort of gastric tumors in comparison with their matching normal tissues, and in high amounts in gastric cancer cell lines48. Furthermore, they found that the miR-206’s anti-metastatic effect was probably mediated through targeting the metastasis regulatory gene STC2 and other mRNAs, which were drastically down-regulated by the exogenous miR-206’s stable expression in SCG-7901 cells48. However, in our study, STC2 was obviously down-regulated by the low expression of mir-206 in HNSCC patients.

In addition to HOTTIP, we also found several lncRNAs in the ceRNA network that were closely linked to the survival of HNSCC patients. 13 lncRNAs had a significant impact on survival, and the expression of some lncRNAs showed a more obvious change trend as the risk increased. From the survival analysis, only LINC00460 had been previously described as a therapeutic target and a novel prognostic biomarker for the diagnosis and treatment of nasopharyngeal carcinoma49. The remaining five lncRNAs have been identified for the first time to be closely related to the prognosis of HNSCC, which can serve as potential targets for future clinical treatments.

Conclusion

To conclude, our study has identified differentially expressed mRNAs, lncRNAs, and miRNAs in HNSCC patients. Importantly, a ceRNA network has been constructed to propose a novel regulatory mechanism for the development of HNSCC. The lncRNAs identified in our constructed ceRNA network may have an important impact on the survival and prognosis of HNSCC patients.