A novel microRNA signature predicts survival in liver hepatocellular carcinoma after hepatectomy

Liver hepatocellular carcinoma (LIHC) is the most common type of primary liver cancer. In the current study, genome-wide miRNA-Seq and mRNA profiles in 318 LIHC patients derived from The Cancer Genome Atlas (TCGA) were analysed to identify miRNA-based signatures for LIHC prognosis with survival analysis and a semi-supervised principal components (SPC) method. A seven-miRNA signature was confirmed for overall survival (OS) prediction by comparing miRNA profiles in paired primary tumour and solid tumour normal tissues. Thereafter, a linear prognostic model that consisted of seven miRNAs was established and used to divide patients into high- and low-risk groups according to prognostic scores. Subsequent Kaplan-Meier analysis revealed that the seven-miRNA signature correlated with a good predictive clinical outcome for 5-year survival in LIHC patients. Additionally, this miRNA-based prognostic model could also be used for OS prognosis of LIHC patients in early stages, which could guide the future therapy of those patients and promote the OS rate. Moreover, the seven-miRNA signature was an independent prognostic factor. In conclusion, this signature may serve as a prognostic biomarker and guide LIHC therapy, and it could even be used as an LIHC therapeutic target in the future.

With the advent of high-throughput sequencing technologies, miRNA profiling datasets have been emerging rapidly. The Cancer Genome Atlas (TCGA) has generated comprehensive, multi-dimensional maps of the key genomic changes in many types of cancer and has created a genomic data analysis pipeline that can effectively collect, select, and analyse human tissues for genomic alterations on a large scale (http://cancergenome.nih.gov/). In the current study, to identify miRNA-based signatures for LIHC prognosis, we analysed the genome-wide miRNA-Seq and mRNA profiles derived from TCGA with bioinformatics methods. First, differentially expressed miRNAs were confirmed by comparing miRNA profiles in paired primary tumour and solid tumour normal tissues to minimize the variations among different individuals 11 . Thereafter, a linear prognostic model consisting of seven miRNAs was established and used to divide patients into high-and low-risk groups according to prognostic scores. Subsequent Kaplan-Meier analysis revealed that the seven-miRNA signature correlated with a good predictive clinical outcome for 5-year survival in LIHC patients, indicating that this signature may guide LIHC therapy and that it could even be used as a therapeutic target in the future.

Materials and Methods
TCGA dataset. A total of 1047 miRNA expression profiles in level 3 patients and 24991 mRNAs in LIHC patients along with their corresponding clinical data were downloaded from the TCGA data portal (February 2015). Patients with follow-up longer than 30 days were selected for further analysis to exclude unrelated causes of death.

Identification of dysregulated miRNAs in LIHC.
The raw counts of miRNA expression data of 49 LIHC samples with their paired normal tissues were obtained from the TCGA dataset (Illumina HiSeq Systems). miR-NAs whose expression was = 0 in more than 50% of the samples data were removed and were then normalized by log2(X + 1). miRNAs with log2 fold change (log FC) < −1 or log FC > 1 (FDR adjusted P < 0.05) were considered to be differentially expressed miRNAs and were included for subsequent analysis.
Identification of miRNAs with prognostic score in LIHC. To identify common miRNAs related to OS within each of the subgroups stratified by the TMN stage, a semi-supervised method and univariate Cox regression analysis were used, and HR > 1 or HR < 1 with P < 0.05 was used as the cutoff 11,15 . The patient subclasses in each group of clinical characteristics represented non-overlapping sets.
Definition of prognostic risk model and ROC curve analysis. The TCGA data were randomly divided into a training and testing set. A semi-supervised principal components (SPC) method was used to select significant miRNAs and develop the linear miRNA signature prognostic model 11 . The prognostic score was calculated as follows: Prognostic score = (2.307831 × hsa-mir-187) + (hsa-mir-9-3 × 2.5538 12) + (hsa-mir-490 × 2.726262) + (hsa-mir-1258 × 2.217082) − (hsa-mir-3144 × 2.660158) + (hsa-mir-551a × 3.428998) + (hsa-mir-665 × 2.855964). The prognostic scores were then computed for all 318 patients. The optimal cutoff point of the prognostic score was obtained in ROC curve analysis for predicting the 5-year survival of the training set. Kaplan-Meier and log-rank methods were used for evaluating OS 16 . Time-dependent ROC curves were used to evaluate the predicted power of the miRNA-based signature model. Bioinformatic analysis of miRNA-target genes and pathways. The predicted target genes of the candidate miRNAs that existed at least in 4 datasets were obtained from the miRecords v4.0 database (www. mirecords.biolead.org). Differentially expressed miRNAs and differentially expressed genes (DEGs) function and pathway enrichment were analysed using multiple online databases, including DAVID database (https:// david.ncifcrf.gov/), which is a multifunctional website for gene annotation, visualization and integrated discovery and can therefore provide gene biological meaning for high-throughput gene functional analysis 17 . Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/) pathways are a computational representation of the biological system, including gene functions linked to genomic information with higher-order functional information 18 . KEGG pathway enrichment analysis was conducted with Cytoscape (version: 3.6.0). The results were considered statistically significant when the p-value < 0.05 after the FDR was corrected.

Identification of differentially expressed miRNAs in LIHC.
After filtering out unqualified cases with a follow-up of less than 30 days, a total of 318 LIHC patients (213 male/105 female) were retained for survival analysis, among which was a total of 49 patients with solid tumour normal tissues. These cases were randomly divided into a training set (n = 159) and a testing set (n = 159). There was no significant difference in clinical covariates observed between the two sets (Table 1). Differentially expressed analysis in 49 pairs of primary tumour and solid tumour normal tissues indicated that 156 miRNAs were differentially expressed (log FC > 1 or log FC < −1, P < 0.05 after FDR adjustment). Among these miRNAs, 37 miRNAs were up-regulated, and 119 miRNAs were down-regulated.
Establishment of the miRNA prognostic model. The common miRNAs associated with OS and tumourigenesis were characterized with the univariate Cox regression method 16 . Seven miRNAs were selected using the supervised principal component method in the training set. Thereafter, we developed a miRNA prognostic model for diagnosis in the LIHC cohort. According to ROC curve, the optimum cutoff point was calculated and used for classifying the samples into high-risk and low-risk groups (Fig. 1). This analysis showed the prognostic scores and miRNA expression distribution of all 318 patients and 49 pairs of solid tumour normal tissues, which were ranked according to the prognostic scores for the seven-miRNA signature. It indicated that the patients with higher prognostic scores showed a tendency towards expression of high-risk miRNAs, whereas patients with low prognostic scores showed a tendency towards non-protective miRNA expression.  Validation of the seven-miRNA signature in LIHC patients. Using the optimum cutoff value obtained from the training set, patients were assigned to high-risk and low-risk groups. The ability of the seven-miRNA signature to predict 5-year survival prognostication was examined in the testing set and in all LIHC patients.
The results indicated that patients in the high-risk group had poor OS, whereas patients in the low-risk group had good outcomes in the training set (P < 0.001, with the log-rank test method, Fig. 2A) and in the entire LIHC cohort (P = 0.0015, Fig. 2C). The AUC of the seven-miRNA signature prognostic model of the time-dependent ROC curve in the training set was 0.703 (Fig. 2B), whereas the AUC for all LIHC patients was 0.645 (Fig. 2D).
To further confirm the accuracy of the prognostic model, AUC was calculated with Kaplan-Meier methods in the early stage (including stage I and II) patients (n = 263). This calculation showed a similar result to that of the training set and the entire LIHC cohort (P = 0.0014, Fig. 2E), and the AUC of the ROC curve was 0.652 (Fig. 2F). Therefore, this model could be helpful for predicting the survival of STAD patients in stage I and II.
The seven-miRNA signature is an independent prognostic factor. To identify whether the seven-miRNA signature is an independent factor that is not affected by other factors, such as age, neoplasm histologic grade or TNM stage, Cox multivariate regression analysis was performed, and the result indicated that the seven-miRNA signature was indeed an independent prognostic factor ( Table 2; HR = 1.034 and P = 0.007). In silico analysis of target genes, pathways and DEGs. The targeted genes, which might be potentially regulated by the seven miRNAs and were predicted by more than 4 datasets were downloaded from miRecords and analysed. In addition, 276 differentially expressed genes (DEGs) were found. Twenty-nine of these DEGs were also found in the 2466 predicted genes (Fig. 3A) that were regulated by the seven miRNAs (Fig. 3B). KEGG enrichment results demonstrated that these genes were mainly involved in 8 KEGG pathways, which included the Ras signalling pathway, protein digestion and absorption, pathways in cancer, cAMP signalling pathway, focal adhesion, MAPK signalling pathway, GABAergic synapse and nicotine addiction (P < 0.05 after FDR adjustment, Fig. 3C). In these enriched pathways, a total of 18 genes were changed significantly (log FC > 1 or <−1, and P < 0.05 after FDR adjustment, Fig. 4A,B), proving the relevant pathways and mechanisms indirectly. Specifically, three genes, including GABRP, GRIN2B and COL9A1, were predicted by miR-187, miR-551a and miR-9-3, respectively (Fig. 4C).

Discussion
When LIHC occurs, it always accompanies chronic liver disease, such as cirrhosis. It may result in a worsening of symptoms, or it sometimes presents without symptoms at cancer detection 19 . Therefore, understanding the molecular mechanisms of LIHC is critical for LIHC diagnosis, prevention and therapy. In our study, 156 differentially expressed miRNAs were found between LIHC primary tumour and solid tumour normal tissues. A seven-miRNA signature (miR-187, miR-9-3, miR-490, miR-1258, miR-3144, miR-551a and miR-665) was identified and confirmed as a predictor of OS in the LIHC patients, especially patients in early stages (stage I and II), using survival analysis, ROC curve and SPC methods. Additionally, it was further confirmed that the seven-miRNA signature was an independent prognostic factor. Among these seven miRNAs, six were negatively associated with survival and one was positively associated. Six negative-relevant miRNAs were miR-187, miR-9-3, miR-490, miR-1258, miR-551a and miR-665; among these miRNAs, miR-187 has been found to be overexpressed in ovarian cancers 20 , can induce epithelial-mesenchymal transition (EMT) by targeting PTRF in non-small cell lung cancer 21 and promotes the growth and metastasis of gastric cancer by inhibiting FOXA2 22 . this result indicated that miR-187 possessed the potential to be a candidate for LIHC therapy. miR-9-3 was also confirmed to be a risk factor that can lead to poor 5-year survival. A previous report revealed that the upregulation of miR-9 expression can predicate advanced clinicopathological features and poor prognosis in LIHC patients, which was in accordance with the current results 23 . However, it has been reported that miR-9-3 acts as a tumour suppressor by targeting TAZ in LIHC 9 . This discrepancy among these studies may be caused by subtle variations in the patients, such as differences in race and living conditions. miR-490, another risk factor, has been reported to suppress tumourigenesis and progression in endometrial carcinoma by targeting TGFα 24 and to modulate cell growth and EMT of LIHC cells by targeting ERGIC3. We found that miR-490 was up-regulated, which meant that inhibition of this oncogenic miRNA might provide a potential approach for LIHC therapy. miR-1258 and miR-551a were two other risk factors identified in this study. However, it has been proven that miR-1258 is significantly down-regulated in breast cancer 25 and that miR-551a can suppress cell migration and invasion in gastric cancer by targeting PRL-3 26 , which is inconsistent with our current result, and this inconsistency might be due to different cancer types. It has been reported that miR-665 plays a key role in inflammatory bowel disease and tumour metastatic dissemination 27,28 , which means that miR-665 might be used as a potential target in LIHC therapy. As the only protective miRNA in the current study, miR-3144 can suppress target reporter activity markedly and can inhibit the expression of both synthetically exogenous and endogenous E6 oncogenes and ultimately inhibit cell growth and promote apoptosis in human papillomavirus 16 (HPV16)-positive cervical cancer cells 29 . As a promising common target of miRNAs for most high-risk HPVs, lowly expressed miR-3144 might play a key role in LIHC suppression. However, further work is still needed to validate its prognostic and target roles for clinical applications.
It is worth noting that both ROC curve and survival analyses of the early stage (stage I and II) LIHC cases proved that the seven-miRNA signature could predict their high risk and 5-year OS in early stages, which accounts for almost 60% (74/125) of all recurrent LIHC patients. Therefore, it may be beneficial for these cases to use this prognostic model to guide further medical interventions, as its results can potentially shorten exam intervals for recurrence monitoring and be used to adjust other therapy plans promptly. In silico analysis of target genes, KEGG pathways and DEGs showed that several meaningful pathways were involved in LIHC development and progression, such as the Ras signalling pathway. In all 2466 predicted genes, 29 genes were identified as DEGs in the current study. In all 217 enriched genes, 18 genes were proven to be differentially expressed. Three genes, including GABRP, GRIN2B and COL9A1, were predicted by miR-187, miR551a and miR-9-3, respectively. Specifically, GRIN2B was enriched in three pathways, including the Ras signalling pathway, cAMP signalling pathway, and nicotine addiction. While it was first identified in oesophageal squamous cell carcinoma 30 , GRIN2B has been subsequently confirmed in many other cancer types, such as oesophageal cancer, gastric cancer and breast cancer [31][32][33] . GRIN2B is epigenetically inactivated and supresses oesophageal cancer activity 33 . However, it has been reported that active GRIN2B has an important influence on the survival of breast cancer patients and that GRIN2B methylation is a common and important biologically relevant event in gastric cancer progression 31,32 . In the current study, GRIN2B was down-regulated significantly, which meant that miR-551a overexpression could supress GRIN2B activity and worsen the LIHC patient's condition. Antagonists targeting MiR-551a or a GRIN2B activator might be used as potential candidates in LIHC therapy in the future. GABRP, which was mainly enriched in GABAergic synapses and nicotine addiction, was progressively down-regulated with tumour-progression in breast cancer, which meant that it might be used as a prognostic marker 34 . Interestingly, it has also been reported that GABRP can stimulate basal-like breast cancer cell migration through the activation of ERK1/2 35 . The completely opposite results may be due to sample numbers or patients of different races. In our study, miR-187-targeted GABRP was significantly down-regulated and could be used as a prognostic marker for LIHC. To date, there have not been any reports about COL9A1 function in the progression of LIHC. We found that COL9A1 was mainly enriched in focal adhesions, which play an important role in LIHC 36,37 . In this study, miR-9-3-targeted COL9A1 was significantly up-regulated and could be useful as a prognostic marker for LIHC survival. All data were indirect proof of the usefulness of the seven-miRNA signature. In summary, a specific seven-miRNA signature prognostic model was identified using the TCGA dataset, and our validation cohort confirmed that the prognostic model may be a useful biomarker in LIHC patients. However, prospective, large-scale, multicentre studies are still needed for further validation before the current results can be applied in clinical settings.