A ceRNA-associated risk model predicts the poor prognosis for head and neck squamous cell carcinoma patients

Head and neck squamous cell carcinoma (HNSCC) is one of the most malignant cancers with poor prognosis worldwide. Emerging evidence indicates that competing endogenous RNAs (ceRNAs) are involved in various diseases, however, the regulatory mechanisms of ceRNAs underlying HNSCC remain unclear. In this study, we retrieved differentially expressed long non-coding RNAs (DElncRNAs), messenger RNAs (DEmRNAs) and microRANs (DEmiRNAs) from The Cancer Genome Atlas database and constructed a ceRNA-based risk model in HNSCC by integrated bioinformatics approaches. Functional enrichment analyses showed that DEmRNAs might be involved in extracellular matrix related biological processes, and protein–protein interaction network further selected out prognostic genes, including MYL1 and ACTN2. Importantly, co-expressed RNAs identified by weighted co-expression gene network analysis constructed the ceRNA networks. Moreover, AC114730.3, AC136375.3, LAT and RYR3 were highly correlated to overall survival of HNSCC by Kaplan–Meier method and univariate Cox regression analysis, which were subsequently implemented multivariate Cox regression analysis to build the risk model. Our study provides a deeper understanding of ceRNAs on the regulatory mechanisms, which will facilitate the expansion of the roles on the ceRNAs in the tumorigenesis, development and treatment of HNSCC.


Identification of DElncRNAs, DEmRNAs and DEmiRNAs. A schematic of the workflow of this work
is shown in Fig. 1. The clinical features of 502 HNSCC patients were demonstrated in Fig. 2. We collected level 3 HNSCC-related RNA-seq count data of 502 HNSCC samples and 44 normal samples and level 3 miRNA-seq count data of 525 HNSCC samples and 44 normal samples from the National Cancer Institute's Genomic Data Comments data portal. In total, 25,295 lncRNAs, 19,601 mRNAs and 1880 miRNAs were extracted to implement normalization and variance-stabilizing transformation by "DESeq2" package. Using "DESeq2" package to perform differentially expressed gene analysis (DEGA), we identified DElncRNAs, DEmRNAs and DEmiR-NAs between HNSCC samples and normal samples with the cut-off of |log2 (foldchange)| (|log2FC|) ≥ 1 and adjusted P value < 0.05. In total, 5749 DElncRNAs were screened out, including 2243 downregulated and 3506 upregulated lncRNAs in HNSCC samples. Of the 4790 DEmRNAs, 2418 were downregulated and 2372 were upregulated. Among 303 DEmiRNAs, there were 126 downregulated miRNAs and 177 upregulated miRNAs. The volcano plots displayed the distributions of DElncRNAs, DEmRNAs and DEmiRNAs (Fig. 3a) and the heat maps manifested the expression levels of these genes (Fig. 3b).

Construction of PPI networks and validation of key DEmRNAs.
To select robust HNSCC-specific mRNAs, we used downregulated and upregulated DEmRNAs with |log2FC| > 2 to perform PPI analysis, and chose the gene pairs with the highest confidence (the combined score > 0.9) for further analysis, separately. In total, downregulated PPI network included 343 nodes and 1188 edges (Fig. 4a), while upregulated included 256 nodes and 1032 edges (Fig. 4b). Subnetworks selected by MCODE algorithm were displayed and clustered together based on the same cluster number, whose nodes were colored in different shades of colors according to cluster scores. The most significantly correlated module of downregulated PPI network included 22 downregulated DEmRNAs (Fig. 4c), and upregulated module covered 23 upregulated DEmRNAs (Fig. 4d) www.nature.com/scientificreports/ advanced clinical stages when compared to normal samples. On the contrary, higher expressions of 3 upregulated mRNAs were observed in the advanced stages. Consistent with results from our analysis of the TCGA database, the subsequent expression analysis from Gene Expression Profiling and Interactive Analyses (GEPIA) database revealed that MYL1 and ACTN2 were decreasingly expressed in HNSCC patients and COL1A1, COL1A2 and COL3A1 were upregulated (Fig. 4g). Survival analysis showed that high expression of MYL1 (Log-Rank (LR) P = 0.0064), ACTN2 (LR P = 0.0039) and MYH8 (LR P = 0.007) presented worse OS than low expression (Fig. 4h). Finally, MYL1 and ACTN2 were confirmed their prognostic association with HNSCC in TCGA and GEPIA database, respectively, and were subsequently included in further analysis.
Identification of co-expression module by WGCNA. 19,601 mRNAs and 25,295 lncRNAs retrieved from TCGA RNA-seq data were used to establish co-expression networks, respectively. 489 HNSCC patients with complete clinical information were selected to be analyzed using WGCNA algorithm. Pearson correlation matrix among mRNAs were converted into an adjacency matrix that was strengthened by the best power (β = 4) with scale-free topology criterion of R 2 = 0.84 (Fig. 5a). Through minimizing interference from noise and spurious associations by topological overlap measure (TOM), mRNAs with identical expression profiles were grouped together into gene modules. Then, gene modules with 70% similarity were merged into one module by dynamic cutting algorithm (Fig. 5b). From this analysis, 12 gene modules were identified for downstream analysis. Module-trait relationship (MTR) analysis results showed that the magenta module was found to have the highest association with metastasis (P = 2E−13) and prognosis (P = 0.00002) in HNSCC patients (Fig. 5c). The module membership in the magenta module (including 307 mRNAs) suggested the correlation with gene significance (GS) was 0.73 (P = 2.4E−52) for metastasis (Fig. 5d) and 0.5 (P = 8E−21) for prognosis (Fig. 5e). Parallel processing and analysis were synchronously implemented among lncRNAs, of which the results were showed in Supplementary Figure 1. The yellow module was most significantly correlated with metastasis (P = 1E−14) and prognosis (P = 0.00003), including 743 lncRNAs. The metastatic correlation between the module membership of the yellow module and GS of 743 lncRNAs was 0.82 with P = 9.7E−182, and the prognostic correlation was 0.61 with P = 6.3E−77. Finally, the magenta module in mRNAs co-expression network and the yellow module in lncRNAs co-expression network were selected for further analysis.  (Fig. 6a). In the over-expressed ceRNA network, 55 upregulated DElncRNAs and 12 upregulated DEmRNAs were connected by 18 common miRNAs (5 downregulated DEmiRNAs) and 282 edges were involved (Fig. 6b). Among the 6 downregulated DEmRNAs and 12 upregulated DEmRNAs, we found that GABBR1 was involved in the neuroactive ligandreceptor interaction pathway and RYR3 was associated with ion channel activity. Establishment of HNSCC-specific prognostic model for lncRNA and mRNA signatures. Using the univariate regression analysis, we identified potential lncRNAs and mRNAs from ceRNA networks with prognostic values for HNSCC. Among them, 10 lncRNAs and 10 mRNAs were detected to be significantly associated with OS of HNSCC (Table 2). Meanwhile, we found that 6 lncRNAs (AC009148.1, AC073573.1, AC114730.3, AC136475.3, AL139130.1 and AL513320.1) and 6 mRNAs (CAPS, MUC19, RIC3, RYR3, LAT and PRRT2) were simultaneously identified to have prognostic values in LR test and univariate Cox regression analysis. Subsequently, in order to establish HNSCC-specific prognostic model, the multivariate Cox regression analysis were performed among the lncRNAs (AC114730.3 and AC136475.3) and mRNAs (LAT, RYR3) with stronger correlation to OS (P < 0.01). As shown in Fig. 7a, the prognostic model was established (P = 0.00055). High lncRNA expression of AC114730.3 and AC136375.3, and high mRNA expression of LAT and RYR3 contributed to better OS in HNSCC patients (Fig. 7b). Based on the median value of risk score for all patients, HNSCC patients were divided into high-and low-risk groups (Fig. 7c). In the high-risk group, the mortality rate was significantly higher (P = 0.0016) and the prognosis was worse when compared to the low-risk group (Fig. 7d). Furthermore, in Fig. 7e, the area under curve (AUC) of the prognostic model was 0.67 and 0.657 that www.nature.com/scientificreports/ were calculated by NNE and KM method, respectively, and which indicated the 2 lncRNAs and 2 mRNAs could be effective prognostic biomarkers in predicting OS for HNSCC patients.
To further confirm the metastatic and prognostic roles of the lncRNAs and mRNAs, we analyzed their expression levels between different groups. Firstly, the expression levels between HNSCC samples and normal samples were evaluated, and the results showed AC114730.3, AC136375.3 and LAT were significantly over-expressed in HNSCC patients, whereas RYR3 was downregulated (Fig. 7f). When compared to the low-risk group, all the four RNAs were significantly decreased expression in the high-risk group, demonstrating higher risk scores (Fig. 7g). Furthermore, we evaluated their expression levels when took the clinical stages into account (Fig. 7h). AC136475.3 and LAT obviously over-expressed in stage I, II, III, and IV HNSCC samples compared with normal samples, respectively. AC114730.3 expressed higher only in the stage IV than in normal samples. RYR3 showed lower expression level except stage I. It's surprising that gene expression levels in the high-risk group www.nature.com/scientificreports/ were lower than that in the low-risk group at different clinical stages, which suggested some of the patients with clinical early-stage (stage I or II) were supposed to belong to the high-risk group and were likely to have a poor prognosis (Fig. 7i). Moreover, the expressions of the RNAs (expect RYR3) between alive-and dead-group were also observed, which were consistent with the above expression patterns (Fig. 7j).   (Fig. 8a). RYR3 protein expression level was not evaluated because its information was not available in the database. Besides, cancerous and para-cancerous tissues from oral squamous cell carcinoma (OSCC) patients were collected from The Affiliated Hospital of Stomatology, Zhejiang University School of Medicine. Verification of gene expression levels and protein expression levels were demonstrated in Fig. 8b,c. Consistent with our bioinformatic analysis results, the protein expression patterns demonstrated that MYL1, ACTN2 and LAT were strongly over expressed in OSCC tissues than in normal oral mucosa tissues. IHC staining revealed the cytoplasmic membranous location of the MYL1, ACTN2 and LAT proteins in cancerous tissues. Atypia of cancerous cell was shown in the nests. The boundary between epithelial tissue and connective tissue becomes blurred in cancerous tissues, which was clearly well-structured in the normal oral mucosa tissues. All patients' information is provided in Tables 3 and 4.

Discussion
HNSCC is a common malignant cancer with high genetic diversities, and per year causes hundreds of thousands of physical dysfunctions and aesthetic problems, even deaths 1 . Especially, advanced HNSCC with high metastatic rate usually suggests poor overall survival because conventional therapeutic strategies have achieved unsatisfactory results 2 . In order to promote outcomes of this disease, novel therapeutic targets with higher specificity and efficacy are urgently needed, which are also expected to be potential predicting biomarkers for HNSCC. It's well documented that lncRNAs play chief roles in many biological processes by ceRNA mechanism, such as tumorigenesis and progression. In this study, we systematically analyzed the HNSCC-related RNA-seq and miRNA-seq data from TCGA database by integrated bioinformatics analyses including GO and KEGG pathway analyses, PPI network construction, WGCNA, ceRNA network establishment and survival analysis, and screened out the HNSCC-specific prognostic lncRNA and mRNA signatures to establish a novel ceRNA-based risk model. Finally, four mRNAs (MYL1, ACTN2, LAT and RYR3) and two lncRNAs (AC114730.3 and AC136375.3) were identified to associate with OS in HNSCC patients. MYL1 (myosin light chain 1) encodes an alkali light chain for myosin that is a hexametric ATPase cellular motor protein. It was well studied that MYL1 played a key role in congenital myopathy and knockdown of the gene could cause severe muscle-related disorders in zebrafish 29 . Recently, MYL1 has been identified as a potential biomarker in various cancers 30,31 . Sajnani et al. found that MYL1 was experimentally proved to be downregulated in buccal cancer and be involved in the "actin mediated cell contraction" biological progress 31 , which was consistent with our results. ACTN2 (actin alpha 2) is a member of the spectrin gene superfamily that includes varying groups of cytoskeletal proteins. In breast cancer patients, mutated ACTN2 was found to be related to invasive ductal carcinoma and suggested a worse OS than ductal carcinoma in situ 32 . Sun et al. and his colleagues reported ACTN2 was one of the hub genes selected by bioinformatics methods in PTEN mutation prostate cancer 33 . However, so far researches on the associations between aberrantly expressed ACTN2 and tumorigenesis and progression of HNSCC are barely reported. The current study firstly indicated negative MYL1 and ACTN2 expression contributed to a better OS of HNSCC patients. Perhaps the low expression of the genes was to inhibit some activities in the tumor cells, so patients with lower expression of these genes showed better prognosis.
With the development of bioinformatics analysis methodology, the single method to find a limited group of genes as therapeutic targets has been less convincing, so the comprehensive bioinformatics analyses are more recognized by researchers. The ceRNA theory points out that lncRNAs could function as ceRNAs and competitively bind to miRNAs by acting as "sponges" of miRNAs, which might derepress activity or expression of those miRNA-mediated target with similar binding sites 34 . This analysis may shed light on regulatory networks that have been underestimated by conventional protein-coding studies. Heretofore, some studies have investigated   36 . These studies provide a better understanding how the ceRNA contributes to improving the diagnostic and prognostic efficiency of HNSCC patients. However, rare studies have focused on the co-expression among ceRNAs, let alone establishing risk models. In the present study, we innovatively used WGCNA to identify coexpressed modules in HNSCC, and further constructed lncRNA-miRNA-mRNA ceRNA network based on the co-expressed genes in the most significant module. Moreover, LR test and univariate Cox regression analysis selected 12 RNAs (including 6 lncRNAs and 6 mRNAs) that were simultaneously having significant prognostic values in HNSCC patients. LAT, RYR3, AC114730.3 and AC136375.3 were ulteriorly implemented multivariate Cox regression analysis to build a HNSCC-specific prognostic model, since their stronger correlations to OS of HNSCC. The protein encoded by LAT (linker for activation of T cells) can recruit multiple downstream molecules after phosphorylation and activation, and is involved in immunology related pathways 37 . T cell is an important component of acquired immunity for hosts, which is able to be activated by phosphorylated LAT through TCR signaling pathway 38 . Wang et al. elucidated prognostic risk models containing 3 genes by comprehensively analyzing methylation data and RNA-seq data for clear cell renal cell carcinoma (ccRCC) from TCGA database, and concluded that hypomethylation and over-expression of LAT resulted in poor OS of ccRCC 39 . Nevertheless, no studies so far have reported any correlation between LAT and HNSCC. In our study, we found that LAT was highly expressed in HNSCC patients and correlated with tumor metastasis and prognosis, and those patients with higher expression of LAT surprisingly presented lower risk scores and better survival. According to our findings and previous literatures, we speculate that LAT, as a protective gene, takes a good effect on preventing tumor metastasis and deterioration in HNSCC. On the contrary, RYR3 (ryanodine receptor 3), one of the isoforms of RYRs, encodes protein to release calcium from intracellular storage 40 . The present study demonstrated that RYR3 was apparently downregulated in HNSCC tissues when compared to normal samples, and lower expression of RYR3 always led to higher risk scores and worse OS in HNSCC patients. However, none of the previous studies has been elucidated its biological role in HNSCC. Schmitt K et al. held that reduced expressed RYR2, another isoform of RYRs, might serve as risk factor for unfavorable prognosis and upcoming malignant conversion in head and neck cancer 41 . Using siRNA or miR-367 to knockdown the endogenous RYR3 apparently inhibited growth and migration of breast cancer cells, which led cell-cell contacts became more weakened because of their rounder morphology 42 . Therefore, tumor suppressor role of RYR3 on HNSCC still need to be further clarified.
Between the identified 2 lncRNAs in the risk model, AC114730.3 and AC136375.3 were remarkably overexpressed in HNSCC tissues, and HNSCC patients with higher expressions presented lower risk scores and better survival, which were similar to LAT expression pattern. Although no studies so far have reported any correlation between the 2 lncRNAs and cancer, our study firstly detected the upregulated AC114730.3 in ceRNA network competed with downregulated miR-375 and miR-338-3p to indirectly regulate downstream mRNAs. Dependent upon various cancer types, miR-375 tended to have dramatically suppressive or oncogenic effect on cancers [43][44][45] . MiR-375 could inhibit the growth and metastasis of ovarian cancer cells, of which the suppressive effect was reversed by MLK7-AS1 lncRNA that indirectly upregulated YAP1 expression 43 . The MLK7-AS1 regulatory role towards miR-375 was observed in gastric cancer as well 44 . However, lower miR-375 expression level was evaluated in malignant basal-like breast cancer than in luminal-like breast cancer, suggesting malignant development of tumor 45 . The current study found AC114730.3 was aberrantly expressed in HNSCC and functioned as a novel regulatory lncRNA of miR-375, which triggered our speculation that over-expressed AC114730.3 inhibited miR-375 expression, but might as similar to breast cancer, the miR-375 would tend to act as a tumorigenesis molecular in HNSCC. Besides, among the potential targeting mRNAs for miR-375, KCNAB3 was surprisingly upregulated in HNSCC, composing AC114730.3/miR-375/KCNAB3 regulatory axis. Thus, it is reasonable to understand why highly expressed AC114730.3 contributed to better OS in HNSCC patients. In addition, miR-338-3p was predicted to be another targeting miRNA of AC114730.3 and no researches have so far focused on its effect on HNSCC patients. Not coincidentally, such a miRNA deserves our attention since its reported association with non-small cell lung cancer (NSCLC) 46 , cervical cancer 47 and glioma 48 . Yang et al. found over-expressed LINC00525 suggested poor prognosis and could modulate miR-338-3p that endogenously targeted IRS2 in NSCLC 46 . Two circular RNA, HIPK3 and SMO, were reported to promote tumor proliferation and metastasis through sponging miR-338-3p in cervical cancer 47 and glioma 48 , respectively. Based on our findings and published literatures, however, we are weak in understanding and explaining the mechanisms of AC114730.3 and miR-338-3p about their prognostic effect on HNSCC. In brief, findings in the current study indicate that MYL1, ACTN2, LAT, AC114730.3 and AC136375.3 may play roles as HNSCC suppressors, while RYR3 may function as a HNSCC promoter.
Although ceRNA interactions have been described by multiple studies, the molecular conditions for optimal activity of ceRNA remain ambiguous 12 . The ceRNAs were required to compete with the entire target pool of miRNAs to bind targets. In order to win the "one versus multiple" battle, the abundance of ceRNA must to be increased to an aberrantly high level. A previous study has revealed up to 23-fold increase of ALDOA abundance did not lead to detectable miR-122 inhibition in hepatocytes 49 . Except for the abundance of ceRNA, other factors might also have profound effects on ceRNA crosstalk, including localization of ceRNA 13 , binding affinity of the shared MREs 13 , secondary structure of RNA 50 , RNA editing 51 and so on. Therefore, current ceRNA research is still in its infancy so that one of the exciting avenues for future work is to uncover the mystery of their widespread crosstalk. Some limitations must be noted despite the construction the risk model in clinical implication. Firstly, the ceRNA-based risk model must be verified by experimental approaches to further elucidate the precise Scientific Reports | (2021) 11:6374 | https://doi.org/10.1038/s41598-021-86048-x www.nature.com/scientificreports/ molecular mechanisms underlying HNSCC. Secondly, the predictive and therapeutic efficiency of HNSCCspecific prognostic signatures needs to be evaluated by large-scale studies.
In conclusion, four mRNAs and two lncRNAs were identified as HNSCC-specific prognostic signatures for HNSCC patients. Moreover, gene functions, co-expressed modules and ceRNA-based prognostic risk model were also elucidated, which facilitated the expansion of the current study on the roles of ceRNAs in the tumorigenesis, development and treatment of head and neck squamous cell carcinoma. Further experimental studies are required to biologically validate these findings.

Materials and methods
Data acquisition. HNSCC-related RNA-sequencing (RNA-seq) data and miRNA-sequencing (miRNAseq) data that were derived from the IlluminaHiSeq_RNASeq and IlluminaHiSeq_miRNASeq sequencing platforms, were retrieved from TCGA database (https ://www.cance r.gov/tcga), respectively. The RNA and miRNA expression profiles (level 3) were free to downloaded from the National Cancer Institute's Genomic Data Comments data portal (https ://porta l.gdc.cance r.gov). The RNA-seq data included 502 HNSCC samples and 44 normal samples, while the miRNA-seq data included 525 HNSCC samples and 44 normal samples. Among them, 44 normal samples of RNA-seq and miRNA-seq data were from normal tissues nearby tumorous tissues of 44 HNSCC patients. We also simultaneously downloaded detailed clinical and follow-up information of all HNSCC patients from TCGA database. Patients who met the following inclusion criteria were involved in TCGA-HNSCC cohort: (1) histologically diagnosed HNSCC; (2) no other malignancy except HNSCC; (3) patients with detailed clinical and follow-up information.
RNA-seq and miRNA-seq data preprocessing and differentially expressed gene analysis (DEGA). Subsequently, mRNAs and lncRNAs were annotated based on GENCODE Release 35 (GRCh38. p13) (https ://www.genco degen es.org/human /) and miRNAs were encoded according to miRbase v22 (http:// www.mirba se.org/index .shtml #openn ewwin dow). Level 3 RNA-seq and miRNA-seq count data were normalized by size factor 52 and transformed via variance-stabilizing transformation (VST) implemented using "DESeq2" package 53 . This transformation makes the expression values homoscedastic by fitting the dispersion to a negative binomial distribution. "DESeq2" package in Bioconductor project (version 3.10, http://www.bioco nduct or.org) was utilized to perform DEGA. Then, we screened out DElncRNAs, DEmRNAs and DEmiRNAs between HNSCC samples and normal samples. |Log2FC| ≥ 1 and adjusted P value < 0.05 were set as cut-off criteria. The volcano plots of lncRNA, mRNA and miRNA expression were constructed using the "ggplot2" package and the heat maps of DElncRNAs, DEmRNAs and DEmiRNAs were plotted using the "pheatmap" package in R software (version 3.6.0) 54 .
Functional enrichment analyses of DEmRNAs. We next implemented functional enrichment analyses so as to shed light on biological function of DEmRNAs. To do this, GO and KEGG pathway 55 analyses of all DEmRNAs was conducted using the "clusterProfiler" package 56 in R software. For this analysis, the statistical cutoff threshold was set at adjusted P value < 0.05. GO analysis included three categories: BP, CC, and MF.

PPI network construction and analysis of DEmRNAs.
In accordance with expression levels, DEmR-NAs were separated into two groups and then separately used to build PPI networks. DEmRNAs with cut-off of |log2FC| ≥ 2 and adjusted P value < 0.05 were uploaded to the Search Tool for the Retrieval of Interacting Genes (STRING, version 11, https ://strin g-db.org) database. The gene pairs with a combined score ≥ 0.9 (the highest confidence) were retrieved from the results and visualized in Cytoscape software (version 3.7.0). Next, the highly correlated module was screened out from whole PPI network using the MCODE plugin in Cytoscape software to perform Molecular Complex Detection algorithm. The most significantly correlated module of each group was selected for further investigation. Then, we performed functional pathway enrichment analysis among clustered DEmRNAs in the most significantly correlated module based on the "Wikipathways" database using CyTar-getLinker plugin in Cytoscape software.

Identification of key DEmRNAs and validation.
Another plugin in Cytoscape software, CytoNCA, was used to perform Centrality analyses that included Subgraph Centrality, Degree Centrality, Eigenvector Centrality, Information Centrality, Betweenness Centrality, Closeness Centrality and Network Centrlity 57 . Key DEmRNAs in each PPI network were identified by Centrality analyses. We selected the DEmRNAs with top 1% Centrality scores as key DEmRNAs. Next, we calculated expression levels of key DEmRNAs between HNSCC samples and normal samples, and further evaluated the relationship between HNSCC samples and normal samples in different stages. Moreover, GEPIA (http://gepia .cance r-pku.cn) database has been previously and publicly adopted to analyze gene expression levels of normal and tumorous samples from TCGA. Thus, we also performed the expression analysis between normal and HNSCC samples using log2(TPM + 1) in GEPIA database. Besides, survival analysis of key DEmRNAs was implemented using Log-Rank test. The P value, hazard ratio and the 95% confidence interval were calculated as well.
Weighted gene co-expression network analysis. In order to construct the co-expression network and identify co-expression gene modules, we performed WGCNA among lncRNAs and mRNAs retrieved from the TCGA RNA-seq data, respectively. After removing normal samples and HNSCC samples without clinical information, the expression matrix was firstly assembled with the gene symbols as column and HNSCC samples as the row names. Following this, "WGCNA" package in R software was adopted to perform WGCNA 58  www.nature.com/scientificreports/ The expression matrix was converted into an adjacency matrix that was further used to build an unsupervised co-expression relationship by using Pearson's correlation coefficients for all pair-wise genes. Based on scale-free topology criterion, we selected the best power (β = 4) as a soft-thresholding parameter to strengthen correlation adjacency matrix 59 . Secondly, the adjacency matrix was transformed into a topological matrix, and TOM was used to minimize interference from noise and spurious associations 60 . Any genes exhibiting identical expression profiles were grouped together into gene modules using hierarchical clustering. We set minimum module size as 30 genes. Finally, gene modules were identified from the system cluster tree by dynamic cutting algorithm and those with 70% similarity were merged into a module. Module eigengenes (MEs) were considered as the major component for each module, and together with the clinical information, they were used for MTR analysis. The MTR heat map was plotted to clearly visualize clinical trait related modules. Module significance (MS) was defined as the mean GS that was calculated based on the samples' clinical traits. Finally, the genes in the most significant module with strongest correlation to HNSCC clinical features were selected for further analysis.
Identification of HNSCC-specific prognostic lncRNA and mRNA signatures. In order to identify specific prognostic genes for HNSCC, we firstly performed KM curve analysis and LR test of DElncRNAs and DEmRNAs in ceRNA networks using "survival" package in R software. HNSCC patients were divided into high-and low-expression groups according to gene expression levels and LR P value < 0.05 was defined as the significant prognostic standard for lncRNAs and mRNAs. Then, univariate Cox proportional hazards regression analysis was implemented to evaluate the prognostic association of DElncRNAs and DEmRNAs in ceRNA networks in HNSCC patients by using "survival" package in R software as well. Statistical P value < 0.05 was set as the cut-off criteria. Next, multivariate Cox hazards regression model was constructed using overlapped DEl-ncRNAs and DEmRNAs in the above two methods with "survival" package in R software, which were strongly significantly associated with OS of HNSCC patients. The multivariate Cox hazards regression model was built to evaluate the HNSCC-specific prognostic lncRNA and mRNA signatures. The model was established based on the following equation: where "Coe" represented the regression coefficient of the lncRNA or mRNA retrieved from the multivariate Cox regression model and "Exp" referred to the expression level of the lncRNA or mRNA. Subsequently, HNSCC patients were divided into high-and low-risk groups based on the median risk score for all patients. A receiver operating characteristic (ROC) curve was plotted and AUC was calculated to access the risk prediction rate of risk model adopting "survivalROC" package in R software.
Patients and tissue sample collection. Human OSCC samples were collected at The Affiliated Hospital of Stomatology, Zhejiang University School of Medicine between January 2018 and December 2020, which were applied to validate the expression difference of the genes. There were 6 males and 4 females (average age, 59.6 years old; range 45-76 years old). Table 4 shown the details. The tissue samples were immediately fixed in 4% paraformaldehyde, embedded in paraffin, sectioned, and processed for the following steps. Each experimental sample was separated into two parts, one was used for IHC analysis and one for total RNA extraction. www.nature.com/scientificreports/ (1:250; Proteintech, 14221-1-AP) and LAT (1:100; Proteintech, 11326-1-AP) overnight. Then, washing in PBS, the sections were incubated with horseradish peroxidase (HRP)-labeled secondary antibodies for 1 h at room temperature. Diaminobenzidine (DAB) was used for visualizing immunolabeling.
Statistical analysis. The expression levels of lncRNAs and mRNAs in normal samples and HNSCC samples were firstly evaluated, and the association between expression levels and clinical stages were subsequently accessed. Besides, we also compared the genes' expression levels in high-risk group and low-group, and performed survival analysis by the LR test, of which the result was visualized by plotting KM curve. According to prognostic information of HNSCC patients, the comparison of expression in alive group and dead group was also analyzed. qPCR results were presented by using GraphPad Prism (version 7.0; GraphPad Software, Inc., USA). Student's t-test was used to compare two groups, and one-way ANOVA for more groups. All values are presented as the mean ± standard deviation and P < 0.05 was considered to indicate a statistically significant difference.

Data availability
The HNSCC RNA-seq and miRNA-seq data were deposited in the TCGA database. Besides, please contact author for data and material requests.