MicroRNA signature for estimating the survival time in patients with bladder urothelial carcinoma

Bladder urothelial carcinoma (BLC) is one of the most common cancers in men, and its heterogeneity challenges the treatment to cure this disease. Recently, microRNAs (miRNAs) gained promising attention as biomarkers due to their potential roles in cancer biology. Identifying survival-associated miRNAs may help identify targets for therapeutic interventions in BLC. This work aims to identify a miRNA signature that could estimate the survival in patients with BLC. We developed a survival estimation method called BLC-SVR based on support vector regression incorporated with an optimal feature selection algorithm to select a robust set of miRNAs as a signature to estimate the survival in patients with BLC. BLC-SVR identified a miRNA signature consisting of 29 miRNAs and obtained a mean squared correlation coefficient and mean absolute error of 0.79 ± 0.02 and 0.52 ± 0.32 year between actual and estimated survival times, respectively. The prediction performance of BLC-SVR had a better estimation capability than other standard regression methods. In the identified miRNA signature, 14 miRNAs, hsa-miR-432-5p, hsa-let-7e-3p, hsa-miR-652-3p, hsa-miR-629-5p, and hsa-miR-203a-3p, hsa-miR-129-5p, hsa-miR-769-3p, hsa-miR-570-3p, hsa-miR-320c, hsa-miR-642a-5p, hsa-miR-496, hsa-miR-5480-3p, hsa-miR-221-5p, and hsa-miR-7-1-3p, were found to be good biomarkers for BLC diagnosis; and the six miRNAs, hsa-miR-652-5p, hsa-miR-193b-5p, hsa-miR-129-5p, hsa-miR-143-5p, hsa-miR-496, and hsa-miR-7-1-3p, were found to be good biomarkers of prognosis. Further bioinformatics analysis of this miRNA signature demonstrated its importance in various biological pathways and gene ontology annotation. The identified miRNA signature would further help in understanding of BLC diagnosis and prognosis in the development of novel miRNA-target based therapeutics in BLC.


Results
Identification of miRNA signature for estimating survival time. We retrieved 106 miRNA expression profiles of patients with BLC from The Cancer Genome Atlas (TCGA) database. Each miRNA profile consisted of 485 miRNAs which were the variables for survival estimation. BLC-SVR identified a set of miRNAs as a signature for estimating the survival time in patients with BLC. A robust miRNA signature was selected by performing 50 independent runs of BLC-SVR. The appearance score (ASC) for each miRNA signature of the prediction model was measured and scored according to their frequency among independent runs. The miRNA signature with a highest ASC accommodates the more frequent miRNAs among the independent runs of BLC-SVR. The average and highest ASCs obtained from 50 independent runs were 13.52 ± 1.60, and 17.27, The roles of top 10 ranked miRNAs in cancer. The literature validation on top ranked miRNAs revealed that these miRNAs possess different functions and active involvement in BLC progression (Table 3). For instance, the up-regulated hsa-miR-432-5p targets RNA-binding motif protein 5 and regulate apoptosis in bladder cancer cells 34 . Hsa-let-7 family is known to be differentially expressed in various cancers including BLC 35,36 . Hsa-miR-146b expression was upregulated in bladder cancer tissues when compared to the normal tissues 37 . A real-time quantitative polymerase chain reaction (RT-qPCR) study on BLC cell lines reported that hsa-miR-652-3p expression levels were upregulated and knockdown of this miRNA significantly affected cell proliferation, migration, and invasion in BLC 38 . A PCR based miRNA screening study revealed that hsa-miR-193 downregulated the expression of oncogenes, Cyclin D1 and EST1, and inhibited cell migration in human urothelial cells 39 . The hsa-miR-203a has been identified as a tumor suppressor and its overexpression inhibits cell proliferation, invasion and migration in BLC 40 . Hsa-miR-542 expression was downregulated and negatively correlated with the expression of surviving protein resulting in the inhibition of proliferation in BLC cells 41 .
The roles of three of the top 10 ranked miRNAs (hsa-miR-505-3p, hsa-miR-629-5p, and hsa-miR-128-3p) have not been previously reported in BLC. However, these miRNAs are implicated in other major cancers. For instance, hsa-miR-505-3p acts as a tumor suppressor in pancreatic cancer and hepatocellular carcinoma 42,43 . Hsa-miR-629-5p promotes tumor progression by targeting AKAP13 in prostate cancer 44 , and hsa-miR-128-3p acts as tumor suppressor in breast cancer by regulating the LIMK1/CFL1 signaling pathway 45 . Hence, their involvement in other major cancers suggests that their expression is biologically consistent and important in BLC. A summary of miRNAs and their regulation in BLC is shown in Table 3.

Diagnostic ability of the miRNAs.
To determine the diagnostic ability of the identified miRNA signature, receiver operating curve (ROC) analysis was performed using BLC tumor and normal samples. The ROC  www.nature.com/scientificreports/ normal samples, shown in Supplementary Fig. S4. The combination of top 10 ranked miRNAs showed better diagnostic ability. Additionally, expression differences of the top 10 ranked miRNAs between normal and tumor samples were analyzed using box-plot analysis. The analysis showed that eight miRNAs, hsa-miR-432-5p, hsa-let-7e-3p, hsa-miR-146b-5p, hsa-miR-652-3p, hsa-miR-629-5p, hsa-miR-193b-5p, hsa-miR-203a-3p, and hsa-miR-128-3p were significantly expressed (p < 0.05) between normal and tumor samples. The box plot analysis of the top 10 ranked miRNAs is shown in Fig. 3.
Prognostic ability of the miRNAs. The prognostic performance of miRNA signature was analyzed by Kaplan-Meir (KM) survival curves using CancerMIRNome 47 . Six miRNAs of the signature showed significant prognosis capability in overall survival analysis. These six miRNAs, hsa-miR-652-5p, hsa-miR-193b-5p, hsa-miR-129-5p, hsa-miR-143-5p, hsa-miR-496, and hsa-miR    www.nature.com/scientificreports/ employed. The miRNA signature is significantly involved in various biological pathways such as prion disease, fatty acid biosynthesis, fatty acid metabolism, ECM-receptor interaction, hippo signaling pathway, adherence junction, steroid biosynthesis, lysine degradation, TGF-beta signaling pathway, and proteoglycans in cancer. The number of targeted genes involved in KEGG pathways of the miRNA signature is shown in supplementary  Table S1. The miRNA signature enriched in KEGG pathways is shown in Supplementary Fig. S5. Next, the biological significance of the miRNA signature was analyzed in different stages of BLC using KEGG pathway analysis. The differentially expressed miRNAs of the signature were identified between stage II, III, and IV. There were nine miRNAs, including hsa-miR-496, hsa-miR-146b-5p, hsa-miR-652-3p, hsa-miR-26a-1-3p, hsa-miR-193b-5p, hsa-miR-642a-5p, hsa-miR-432-5p, hsa-miR-143-5p, and hsa-miR-505-3p which were differentially expressed between stage II&III, and two miRNAs, hsa-miR-143-5p and hsa-let-7e-3p, between stage III&IV. The significant biological pathways in stage II&III were thyroid hormone synthesis, oxytocin signaling pathway, ErbB signaling pathway, long-term depression, and hippo signaling pathway, to name a few. The significant pathways in stage III&IV were biosynthesis of unsaturated fatty acids, ErbB signaling pathway, GABAergic synapse, morphine addiction, and estrogen signaling pathway. There were some common pathways, including ErbB signaling pathway, thyroid hormone signaling, morphine addiction, non-small cell lung cancer, and estrogen signaling pathway in stage II&III and stage III&IV. However, some targeted pathways were different across cancer stages of BLC. The complete list of significant pathways across cancer stages are listed in Supplementary  Table S2. The bubble plots showing the KEGG pathways in BLC stages are shown in Supplementary Figs. S6&S7.
Next, GO annotations of the miRNA signature was employed in three categories, including biological process, molecular functions, and cellular components. The GO analysis showed that the miRNA signature involved several biological processes that the top five significant biological processes were DNA metabolic process, cellular protein metabolic process, membrane organization, RNA metabolic process, and nucleobase-containing compound catabolic process (Supplementary Table S3). The top five molecular functions were ion binding, nucleic acid binding transcription factor activity, protein binding transcription factor activity, enzyme binding, and enzyme regulatory activity. The top five cellular components were organelle, cytosol, nucleoplasm, protein complex, and focal adhesion. The details of GO annotations for the miRNA signature are listed in Supplementary Tables S3-S5. The GO enrichment analysis of the miRNA signature is depicted in Supplementary Figs. S8-S10.
Gene interaction network. The complex networks in which miRNAs engaged with other functional molecules can influence cell biological responses and human diseases 48 . Hence, a miRNA network analysis was employed for the top 10 ranked miRNAs with genes, long non-coding RNAs (lncRNAs), circular RNAs (ciR-NAs), and small molecules to explore the miRNA-target interactions using miRNet 2.0: a miRNA-centric network visual analytics platform 49 . The miRNA-gene target interaction network was built with experimentally validated gene target networks using the miRTarBase V8.0 50 . There were 1594 gene interactions with top 10 ranked miRNAs in a miRNA-gene target network. The miRNA-gene target network is shown in Fig. 5A.

Discussion
The critical role of miRNAs in cancer biology has opened up a new direction for oncology research. Numerous evidences have demonstrated the development of miRNA-based cancer progression, diagnosis, and therapeutics [51][52][53] . Bladder cancer is one of the common cancers and a heterogeneous disease with prognostic and therapeutic challenges. Identifying the survival related variants could help understand the cancer survival at various stages and may contribute to the therapeutic improvements in BLC. Due to cost and time consumption in experimental methods to predict the targets and identify biomarkers, computational methods are often used in miRNA biology and cancer prognosis predictions. Advances in machine learning methods have significant importance in developing fast and accurate models to aid in caner prognosis, diagnosis, and medical decisionmaking 54 . Recent developments on miRNA-disease associations revealed the importance of computational models in understanding the disease associated variants 55 . However, there are limited studies on identifying miRNA signatures to estimate the survival time in patients with BLC using machine learning techniques.
The machine learning methods often suffer from higher dimensionality issues 56 , especially in biomedical data. The used feature selection algorithms could work well in coping with the curse of dimensionality issue resulting www.nature.com/scientificreports/ from genomic data and select a robust signature for cancer prognosis 31,57 . To address the dimensionality issue, we used an optimal feature selection algorithm IBCGA to identify a small set of miRNAs from a large number of candidate miRNAs that are associated with survival time in patients with BLC. In our previous studies, optimized survival estimation methods were developed to estimate the survival time in patients with glioblastoma, lung adenocarcinoma, and ovarian cancers [28][29][30] . In this study, we developed an optimized SVR-based method BLC-SVR to identify a miRNA signature associated with survival and estimate the survival time in patients with BLC. The identified miRNA signature consisted of 29 miRNAs as a signature and obtained a R 2 and MAE of 0.81 and 0.51 year, between actual and predicted survival times, respectively. The estimation capability of BLC-SVR was compared with some standard regression methods and results showed its promising estimation performance. Further, the identified miRNAs of the signature were ranked based on their contribution to the estimation performance. The literature survey on top 10 ranked miRNAs demonstrated that seven of top 10 ranked miRNAs are actively involved in BLC progression except the three miRNAs hsa-miR-505-3p, hsa-miR-629-5p, and hsa-miR-128-3p. These three miRNAs are important contributors to the estimated performance. Therefore, hsa-miR-505-3p, hsa-miR-629-5p, and hsa-miR-128-3p may be novel targets for BLC and further studies are needed to validate their roles in BLC.
The functional analysis of miRNAs revealed the involvement of miRNAs in physiological process that are essential for disease mechanism. Biological relevance of the identified miRNA signature concluded that the miRNA signature was involved in several biological pathways, including biological processes, molecular functions, and cellular components. The top-3 KEGG pathways were prion diseases, fatty acid biosynthesis, and fatty acid metabolism. In human prion diseases, point mutations in the prion protein gene (PRNP), which encodes PrP, induce familial forms of human prion diseases 58 . Somatic missense mutation in the prion protein gene (PRNP) have been identified in patients with BLC 59 . Urinary retention as an early symptom was observed in patients with prion disease 60 . The prion proteins were detected in urine and involved in disease infection 61 . Fatty acid synthesis and fatty acid metabolism pathways are associated with various cancers including BLC 62 . A previous study showed that the change in the fatty acid composition may be an indicator of altered lipid metabolism occurring in vivo during human bladder tumorigenesis. The bladder cancer tissue showed a significant reduction in total n-6 polyunsaturated fatty acid (− 15.1%; P < 0.001) 63 .
The pathway analysis demonstrated that the group of miRNAs target specific pathways and target genes that might contribute to the cancer progression. To investigate the numbers of genes, lncRNAs, ciRNAs, and small molecules targeted by the top ranked miRNAs, miRNA-gene target interaction networks were constructed. A miRNA network showed some key molecules that were connected to top ranked miRNAs which might act as underlying drivers of survival in BLC. In conclusion, the identified miRNA signature would guide the understanding of the survival associated miRNAs and help develop miRNA target-based therapeutic strategies in BLC.

Material and methods
Dataset. The clinical characteristics. The miRNA expression profiles of 409 patients along with their survival times were retrieved from the TCGA database. All the data extraction methods were carried out in accordance with the TCGA guidelines and regulations. The patient selection criteria included patients with survival times and miRNA expression profiles. The miRNA expression was considered if the expression levels of mature miRNAs were presented in more than 70% of the samples. After the filtration process, there were 106 patients with miRNA expression profiles where each miRNA profile consisted of 485 miRNAs in the final dataset.
The clinical characteristics of the 106 patients with BLC are presented in Supplementary Fig. S11. The majority of BLC patients were male, and 69% were male and 31% female. The average age at diagnosis was 70.46 ± 9.43 and average height of the patients was 173.4 ± 11.14 cms. The total numbers of patients in stages 2, 3, and 4 were 14, 37, and 55, respectively. The range of survival times of patients were between 0.63 and 94.26 months.
BLC-SVR method. BLC-SVR was designed to identify a set of miRNAs as a signature that could estimate the survival time in patients with BLC. BLC-SVR method was developed based on SVR incorporated with the optimal feature selection algorithm IBCGA. Two main parts of BLC-SVR are feature selection and survival estimation. BLC-SVR adopted the optimization technique from our previous study 29 . Feature selection algorithm IBCGA . BLC-SVR utilized the optimal feature selection algorithm IBCGA to select a minimum number of features from a large number of candidate features (miRNAs) while maximizing the prediction performance 32 . The IBCGA uses an intelligent evolutionary algorithm to solve the large parameter combinatorial optimization problems 64 . Here, we used genetic algorithm (GA) terms GA-chromosomes and GA-genes for the feature representation. The chromosome of IBCGA comprises 485 GA-genes and three 4-bit GA-genes to encode parameters C, γ, and ν for ν-SVR. The encoded GA-chromosomes were designed as described in previous studies [29][30][31] . The best prediction model of BLC-SVR was generated from the 50 independent runs of IBCGA. The main steps in IBCGA are described as follows: where the detailed description can be refer to the work 32 : www.nature.com/scientificreports/ Step 1: (Initialization) Randomly generate an initial population of individuals.
Step 2: (Evaluation) Evaluate the fitness value of all individuals using the fitness function, which is to maximize the prediction accuracy (R 2 ) in terms of 10-CV.
Step 3: (Selection) Use a conventional method of tournament selection that selects a winner from two randomly selected individuals to generate a mating pool.
Step 4: (Crossover) Select two parents from the mating pool to perform an orthogonal array crossover operation.
Step 5: (Mutation) Apply a conventional mutation operator to the randomly selected individuals in the new population.
Step 6: (Termination test) If the stopping condition for obtaining the solution is satisfied, output the best individual as the solution. Otherwise, go to Step 2.
Step 7: (Inheritance) If r is less than a predefined number of features, randomly changes one bit in the binary GA-genes for each individual from 0 to 1, increase the number r by one and go to Step 2. Otherwise, stop the algorithm.
The applications of support vector machines (SVM) have diverse importance in biomedical sciences and precision medicine due to their capability in solving complications in predictions 65 . The SVM has two modules, support vector classifier (SVC) and support vector regression (SVR) 66 . The SVMs are used in various cancer diagnosis and prognosis predictions. The optimized SVMs were used to predict the cancer stage in breast cancer and hepatocellular carcinoma 26,27 , and estimation of survival in patients with lung adenocarcinoma, glioblastoma, neuroblastoma, and ovarian cancers [28][29][30][31] . The LibSVM package 67 was used to implement the BLC-SVR. The optimization technique of SVR can be written as follows: where 0 ≤ ν ≤ 1, ξ i ≥ 0, ξ * i ≥ 0, (x 1 , y 1 )…(x m , y m ) are the input data points, C is the regularization parameter, ε is an insensitive loss function, and b is a constant.
Performance measures. We used squared correlation coefficient (R 2 ) and mean absolute error (MAE) as the estimation measures to evaluate the prediction performance of BLC-SVR.
where y i and z i are the actual and predicted survival times of the ith miRNA, respectively, y and z are the corresponding means, and N is the total number of BLC patients in the validation set. The mean absolute error (MAE) is also used for the evaluation of prediction performance, defined as follows: Appearance score ASC. The robust signatures among the 50 independent runs of BLC-SVR has the highest ASC obtained using the following procedure 29 .
Step 1: Perform Ns independent runs of BLC-SVR for obtaining Ns miRNA signatures. There are m i features in the ith signatures, i = 1, …, Ns (in this study Ns = 50).
Step 2: The ASC of a miRNA signature is calculated as follows: (1) Calculate the appearance frequency f(miR)for each feature miR that appears in the Ns signatures.
(2) Calculate the score F i , i = 1, …, Ns. Where miR it is the tth feature in the ith signature: (3) Obtain the i-th feature set with the highest appearance score Fi as the robust signature.
Ridge regression, Lasso and elastic net. The estimation performance of BLC-SVR was compared with some standard regression methods, ridge regression, Lasso, and elastic net. Ridge regression is a penalized regression approach where the Euclidean norm was used as the penalty 68 . Lasso uses L1 regularization to identify features and regression coefficients by regularizing the coefficients to zero that lead to minimize the prediction error 69 . Elastic net is a combination of Lasso and ridge regression 70 . The minimum λ was chosen after 100 iterations of 10-CV for ridge, Lasso and elastic net. The prediction performance was evaluated in terms of the correlation coefficient and mean absolute error.