Identification and characterization of the lncRNA signature associated with overall survival in patients with neuroblastoma

Neuroblastoma (NB) is a commonly occurring cancer among infants and young children. Recently, long non-coding RNAs (lncRNAs) have been used as prognostic biomarkers for therapeutics and interventions in various cancers. Considering the poor survival of NB, the lncRNA-based therapeutic strategies must be improved. This work proposes an overall survival time estimator called SVR-NB to identify the lncRNA signature that is associated with the overall survival of patients with NB. SVR-NB is an optimized support vector regression (SVR)-based method that uses an inheritable bi-objective combinatorial genetic algorithm for feature selection. A dataset of 231 NB patients from Gene Expression Omnibus accession GSE62564, containing overall survival information and expression profiles of 783 lncRNAs, was used to design and evaluate SVR-NB. SVR-NB identified a signature of 35 lncRNAs and achieved a mean squared correlation coefficient of 0.85 and a mean absolute error of 0.56 years between the actual and estimated overall survival time using 10-fold cross-validation. Further, we ranked and characterized the 35 lncRNAs according to their contribution towards the estimation accuracy. Functional annotations and co-expression gene analysis of LOC440896, LINC00632, and IGF2-AS revealed the association of co-expressed genes in Kyoto Encyclopedia of Genes and Genomes pathways.


Neuroblastoma (NB) is the most common cancer in children, comprising 10% of all childhood cancers 1 . Most cases occur in very young children under the age of one year 2 ; hence, NB is commonly referred to as an embryonic tumour 3 and is responsible for approximately 11% of cancer deaths in children. Initially, the tumour originates in tissues of the sympathetic nervous system and is thus found as lesions in the adrenal glands, pelvis, abdomen, or chest 4 . The characteristics of these neoplasms are highly enigmatic because the tumours exhibit either spontaneous regression or rapid progression. The prospect of survival depends on the age at diagnosis, tumour stage, and genetic features. According to the International Neuroblastoma Staging System, NB is staged into five groups, stages 1 to 4 and 4S, based on metastasis formation and lymph node involvement 5,6 . The treatment of NB exhibits clinical diversity; hence, the treatment response is correlated with clinical and biological factors, including cancer risk group, age, and genetic abnormalities. Children with stage 1 and stage 2 neuroblastomas can be cured with surgery alone as a primary therapy 7 . Infants with stage 4 neuroblastomas exhibit better prognosis in response to treatment with chemotherapy and surgery 8 . In contrast, patients with high-risk NB exhibit poor event-free survival after chemotherapy, whereas improved event-free survival is observed in patients with advanced-stage NB after radiotherapy and chemotherapy followed by autologous bone marrow transplantation 9 . Despite treatment, only 40-50% of patients with NB exhibit long-term survival 10 . Due to the heterogeneous nature of NBs, the clinical behaviour and molecular mechanisms underlying tumour growth are largely unknown, and more efficacious therapeutics are necessary to control this cancer.
The most common genetic abnormality observed in NB is amplification of the MYCN gene in NB cells. MYCN-mediated oncogenic transformation is responsible for aggressive tumour formation and poor prognosis in NB 11 . Further, genetic abnormalities associated with NB include loss of heterozygosity at the distal short arm of chromosome 1, which is associated with clinical outcome 12,13 , hyperdiploid features 14 , and defects in the function of nerve growth factor (NGFR) 15,16 . Genome-wide studies have sought to identify protein biomarkers for improved NB therapies. For instance, pharmacodynamic biomarkers have been developed to evaluate the mechanism of PI3K/AKT/mTOR pathway signalling activity and MYCN protein expression in children with NB 17 . Expression of biomarkers, including X-linked inhibitor of apoptosis and vascular growth factors, regulates bone marrow metastasis in NB 18 . Genomic amplification of the MYCN oncogene is associated with NB tumour aggressiveness and poor prognosis in NB patients 19 . Germline mutations in the anaplastic lymphoma kinase gene are largely responsible for familial NB, and this germline mutation can serve as potential therapeutic target for NB 20 . Although advances in treatment conditions and therapeutics have improved patient prognosis, long-term survival of the high-risk group has not been considerably improved. Hence, the identification of potential targets associated with NB survival is urgently required.
Over the past several years, advancements in next-generation sequencing (NGS) and microarray technologies have increased the interest in non-coding RNAs (ncRNAs), including small non-coding RNAs, such as miRNAs, piRNAs, and snoRNAs, and long non-coding RNAs (lncRNAs), given their significant roles in specific diseases. In particular, the role of lncRNAs in evolution and genome function is a newly described phenomenon. LncRNAs are non-coding RNAs that are >200 nucleotides in length and have been implicated in pathological and biological processes through post-transcriptional regulation of mRNA processing and cis regulation 21 . Over the last decade, several studies have shown that lncRNAs play a significant role in several biological processes 22 . LncRNAs are highly stable and easily detectable in body fluids 23,24 . Several studies have revealed the significance of lncRNAs in various cancers. For instance, specific lncRNAs are up- or down-regulated in prostate cancer; lncRNAs, such as PCGEM-1, PCAT-1, and PCA3, play critical roles in prostate cancer 25,26 . The lncRNA HOTAIR is up-regulated, silences genes through interactions with LSD1 and PRC2 and is also involved in protein degradation via interaction with E3 ubiquitin ligases in various cancer types, including lung, ovarian, and pancreatic cancers 27-29 . LncRNAs also play important roles in NB. Specifically, ncRNA possess oncogenic properties, and its overexpression is correlated with poor prognosis in NB patients 30 . Overexpression of NDM29 in NB cell lines is associated with chemosensitivity 31 . Despite advances in RNA-sequencing technologies, the functions of several lncRNAs have not yet been validated. LncRNAs are emerging as crucial players in tumorigenesis by directly or indirectly acting as tumour suppressors 32 or oncogenes 33 .
Various approaches were developed to use lncRNAs as potential targets in cancer, such as post-transcriptional targeting of lncRNAs 34 , modulation of lncRNAs using genome-editing techniques 35 , and loss of lncRNA function by inhibition of RNA-protein interactions using RNA-binding small molecules 36 . The identification of lncRNA signature in the context of cancer provides an opportunity to explore lncRNAs as possible targets and improve our knowledge of lncRNAs association with the overall survival of NB.
Several researchers have attempted to predict NB patient survival. Oberthuer et al. predicted individual survival rates for NB patients using the automatic relevance determination (CASPAR) algorithm 37 . Wei et al. developed a survival predictor using an artificial neural network and identified 19 genes that predict clinical outcome in NB patients 38 . Gene-wide promoter methylation profiling and cox elastic net analysis were utilized to predict NB patient outcome, and the degree of methylation of retinoblastoma 1 (RB1) and teratocarcinoma-derived growth factor 1 (TDGF1) was associated with poor survival 39 . MicroRNA expression profiling and support vector machines (SVMs) were used to predict event-free survival in NB patients 40 . However, few studies exist that use lncRNAs for survival prediction in NB patients. Divya et al. utilized lncRNA expression profiles and reported that SNHG1 is highly expressed and significantly associated with poor survival in NB patients 41 . Another study by Divya et al. used lncRNA expression data from 493 NB patients and identified a 16-lncRNA prognostic signature that predicts event-free survival 42 . In addition, lncRNA expression profiling was also used in other cancer types for prediction purposes. The lncRNA signature was used to predict the overall survival in esophageal squamous cell carcinoma 43 . Six lncRNAs were identified which significantly correlate with the disease free survival in patients with colorectal cancer 44 50 . Genome-wide analysis study on 419 patients with glioblastoma identified six lncRNAs, AC005013.5, UBE2R2-AS1, ENTPD1-AS1, RP11-89C21.2, AC073115.6, and XLOC_004803 which distinguished the high and low risk groups 51 . In conclusion, utilization of lncRNA expression in cancer survival prediction could aid in the understanding of the molecular mechanisms underlying cancer progression and the identification of potential biomarkers.
Accordingly, this study proposed the SVR-NB method to identify the lncRNA signature that is strongly associated with overall survival in NB patients. Different from our previous studies 41,42 , SVR-NB was developed based on support vector regression (SVR) 52 and an inheritable bi-objective combinatorial genetic algorithm (IBCGA) 53 to select a small set of lncRNAs as a signature among a large number of lncRNAs. We retrieved RNA-seq data and overall survival information of NB patients from the database of gene expression omnibus (GEO) accession GSE62564. In clinical research, the time to death is an event of interest; hence, we exclusively focused on patients who died from NB. After the filtration process, 104 patients with 104 expression profiles consisting of 783 lncRNAs and corresponding overall survival information were obtained for further analysis. SVR-NB identified 35 out of 783 lncRNAs which are strongly correlated with overall survival in NB patients. SVR-NB using 10-fold cross-validation (10-CV) achieved a mean squared correlation coefficient of 0.85 ± 0.009 and a mean absolute error of 0.56 ± 0.09 years between actual and estimated overall survival times in NB patients. We analysed the roles of identified lncRNAs in different cancers. Furthermore, functional annotation and co-regulated gene expression analyses of top ranked lncRNAs were discussed. We hope that these findings will improve multimodal therapy and survival in patients with NB.

Results and Discussion
Overall survival estimation. We utilized SVR-NB to identify the lncRNA signature that correlated with the overall survival in NB patients. We utilized 104 lncRNA expression profiles of 783 lncRNAs and the corresponding overall survival data from 104 NB patients. SVR-NB used the feature selection algorithm IBCGA to identify a small set of lncRNAs as a signature that influence overall survival of NB patients.
SVR-NB achieved a best squared correlation coefficient of 0.89 and a mean absolute error of 0.49 years between the actual and estimated overall survival time using 10-CV from 30 independent runs (Table 1). SVR-NB obtained a mean squared correlation coefficient of 0.85 ± 0.009 and a mean absolute error of 0.56 ± 0.09 years in NB patients. We measure the feature frequency score (FFS) for each of 30 independent runs of SVR-NB to select one robust feature set with the highest FFS. The obtained signature of 35 lncRNAs has the highest FFS of 7.86 indicating that each lncRNA appears 7.86 times on average in the 30 runs. The FFS values of 30 runs are given in Supplementary Fig. S1.
We compared the SVR-NB method with three standard linear regression methods: ridge, LASSO and elastic net regression methods. Ridge regression used all the features and obtained a squared correlation coefficient of 0.62 and a mean absolute error of 0.87 years between the actual and estimated overall survival times. LASSO identified 41 features and achieved a squared correlation coefficient and a mean absolute error of 0.68 and 0.78 years, respectively. The elastic net method identified 44 features and obtained a squared correlation coefficient and a mean absolute error of 0.67 and 0.81 years, respectively, between the actual and estimated survival time. The SVR-NB estimation performance is better than that of these three standard regression methods. The correlation plots of SVR-NB, ridge, LASSO, and elastic net are presented in Fig. 1.
Additionally, we used the signature of 35 lncRNAs and a Naïve Bayes classifier 54 to classify the 352 NB patients into high-risk and low-risk groups. The Naïve Bayes classifier achieved a leave-one-out cross-validation accuracy, Matthews correlation coefficient, precision, recall and area under the ROC curve of 86.64%, 0.73, 0.86, 0.86, and 0.94, respectively. The prediction performance of the Naïve Bayes classifier was evaluated using a receiver operating characteristic (ROC) curve, as shown in Supplementary Fig. S2.
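To make the reported classification measures concrete, the following minimal Python sketch shows how accuracy, precision, recall and the Matthews correlation coefficient follow from a binary confusion matrix. The function name and the counts in the example are hypothetical and are not taken from this study.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "mcc": mcc}

# Hypothetical counts for a high-risk versus low-risk split.
example = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```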

SVR-NB validation.
We evaluated the performance of SVR-NB in an independent test cohort of 127 patients with NB who are still living. The independent test cohort exhibits a mean overall survival time of 39.22 ± 15.42 months, whereas the predicted mean overall survival time is higher, at 43.55 ± 17.58 months. For 73 of the 127 patients, the predicted mean overall survival time (50.96 ± 18.45) exceeds the actual mean overall survival time (32.77 ± 16.18). The squared correlation coefficient between the actual and predicted overall survival times was 0.31.
For the remaining 54 patients, whose predicted overall survival time is smaller than the actual overall survival time, the mean absolute error between the actual and predicted overall survival times is 1.19 years. This error is higher than the prediction error of 0.63 years obtained for the 104 NB patients using SVR-NB (FFS), which we attribute to the small sample size. We expect that SVR-NB would perform better with a larger training sample. The estimation of overall survival in the independent test cohort is presented in Fig. 2.
Ranking of the lncRNA signature. We ranked the lncRNAs of the identified signature using main effect difference (MED) analysis 55 . MED analysis reveals the contribution of each lncRNA in the signature towards the estimation accuracy of the overall survival time. LncRNAs with higher MED scores contribute more to the estimation accuracy of overall survival time, whereas lncRNAs with lower MED scores contribute less. The top 10 ranked lncRNAs based on the MED analysis are LOC440896, LOC729770, LINC00632, CXCR2P1, LOC643542, LOC387720, IGF2-AS, DUX4L3, HAS2-AS1, and LINC01606. We ranked all 35 lncRNAs, and their corresponding MED values are presented in Table 2. The top 10 lncRNAs and their chromosome locations are provided in Supplementary Table S1.
Significance of top ranked lncRNA in cancers. LOC440896. Uncharacterized LOC440896 (alias AL353608.3) is differentially expressed in various cancers. A genome-wide analysis of 79 small cell lung cancer patients reported that AL353608.3 is up-regulated in lung cell carcinoma compared with normal cells, with a log2-fold change of 3.2 56 . RNA-sequencing of cells derived from patients with juvenile idiopathic arthritis demonstrated that AL353608.3 was up-regulated in inflammatory cells with a log2-fold change of 5 compared with that of normal cells 57 . This lncRNA is actively involved in breast cancer cells, and expression of AL353608.3 is up-regulated in breast cancer cells compared with that of normal counterparts 58 . Additionally, AL353608.3 was down-regulated in blood platelets from patients with pancreatic adenocarcinoma with a log2-fold change of −4.2 compared with that in healthy samples 59 , and this lncRNA is also implicated in glioblastoma 59 .
LINC00632. Long intergenic non-protein coding RNA 632 (LINC00632) is implicated in several major cancers. For instance, LINC00632 expression was up-regulated in breast cancer cells with a log2-fold change of 5.2 compared with that in normal cells 58,60 . Up-regulation of LINC00632 was observed in prostate carcinoma cells with a log2-fold change of 4.8 compared with that in healthy cells 61 . Additionally, down-regulation of LINC00632 is significantly associated with different cancer types, such as non-small cell lung carcinoma 59 and medulloblastoma 62 , and down-regulation of LINC00632 is frequently observed in glioblastoma 63,64 . In addition to cancer tissues, LINC00632 is highly expressed in normal brain tissue with a mean RPKM of 3.06 ± 1.54 65 .
LOC643542. Uncharacterized LOC643542 is highly expressed in human normal tissues and 27 other tissue types, such as fat, kidney and brain, with mean RPKM values of 0.29 ± 0.17, 0.15 ± 0.11, and 0.07 ± 0.14, respectively 65 . Genome-wide association studies revealed the association of LOC643542 with major depressive disorder 66 . A meta-analysis of 1110 major depressive disorder cases reported that LOC643542 is localized in the brain region and exhibits a higher number of single-nucleotide polymorphisms 66 . Genome-wide association studies further confirm the association of LOC643542 with bipolar disorder 67 and hyperactivity disorder 68 .

IGF2-AS.
An RNA-sequencing study of breast carcinoma patients revealed that IGF2-AS is down-regulated in HER2 breast carcinoma cells with a log2-fold change of −4.1 compared with that in normal cells 58 . Down-regulation of IGF2-AS was also observed in amyotrophic lateral sclerosis 69 and Down syndrome (trisomy 21) 70 with log2-fold changes of −1.5 and −2.9, respectively, compared with those in normal cells. IGF2-AS expression was up-regulated in glioblastoma 71 with a log2-fold change of 3 and in childhood brain tumour ependymoma 62 with a log2-fold change of 1.3.
HAS2-AS1. HAS2-AS1 is frequently down-regulated in different cancer types. RNA sequencing of six tumour types revealed that HAS2-AS1 is down-regulated in various cancers 59 . HAS2-AS1 expression was down-regulated in breast carcinoma cells, pancreatic carcinoma, colorectal carcinoma, non-small cell lung carcinoma, and glioblastoma cells with log2-fold changes of −3.2, −2.6, −1.8, −1.2, and −1.2, respectively, compared with those in normal cells 59 . In contrast, HAS2-AS1 up-regulation was also observed in glioblastoma cells with a log2-fold change of 3.5 71 .
LINC01606. LINC01606 is implicated in various cancers. RNA-sequencing analysis revealed that LINC01606 is up-regulated in triple-negative breast cancer cells and HER2-positive breast carcinoma cells with log2-fold changes of 5.4 and 2.8, respectively 58 . Up-regulation of LINC01606 was also observed in oesophageal adenocarcinoma 72 . LINC01606 was down-regulated in pancreatic adenocarcinoma 59 and glioma 71 with log2-fold changes of −5 and −3.5, respectively. RNA-sequencing studies on different tumour types revealed down-regulation of LINC01606 in hepatobiliary carcinoma, non-small cell lung carcinoma, and colorectal carcinoma with log2-fold changes of −2.7, −2.4, and −2.1, respectively 59 .
Few studies have reported the involvement of the remaining four lncRNAs among the top 10 ranked lncRNAs (LOC729770, CXCR2P1, LOC387720, and DUX4L3) in NB and other cancers. Although these four lncRNAs have few experimental validations in NB, their contribution towards the overall survival estimation is high: they ranked second, fourth, sixth, and eighth, respectively. Hence, these four lncRNAs are potential biomarkers of NB survival time that remain to be validated. We summarize the top 10 ranked lncRNAs and their roles in cancers and disorders in Supplementary Table S2.
Although there is a limited number of experimental validations of lncRNAs in NB, we report some studies that support the association between the identified lncRNAs and cancer. A study using a real-time reverse transcriptase polymerase chain reaction assay (qPCR) and western blot analysis on NB cells revealed that MYCN expression was up-regulated and associated with the NB stage 73 . Northern blot analysis reported that IGF2-AS was up-regulated in Wilms' tumour samples compared with healthy samples 74 . A qPCR and Southern blot analysis on hepatocellular carcinoma cells revealed that IGF2-AS can significantly restrain the malignant cells and may act as a gene therapeutic target 75 . Up-regulation of HAS2-AS1 was observed in oral squamous cell carcinoma using qPCR and western blot analysis 76 . LNC00964 expression was found to be down-regulated in colorectal cancer using qPCR analysis 77 . The qPCR and western blot analyses revealed the up-regulation of MEG3 in pancreatic ductal carcinoma 78 , multiple myeloma 79 , and ovarian cancer 80 . Supplementary Fig. S3.

Expression difference in amplified MYCN
Additionally, we performed survival analysis of the top 10 ranked lncRNAs using Kaplan-Meier (KM) survival curves. We used the median expression of each lncRNA as a threshold to divide patients into a high-expression group and a low-expression group. The KM survival curves were plotted for the top 10 ranked lncRNAs. The overall survival KM plots for the two groups are shown in Supplementary Fig. S4.
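The median-split Kaplan-Meier procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the analysis pipeline used in this study: the function names are hypothetical, samples with expression exactly equal to the median are assigned to the low-expression group, and the toy inputs are invented.

```python
from statistics import median

def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct death time.

    events[i] is 1 if a death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all samples that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def split_by_median(expression, times, events):
    """Split samples into high/low groups at the median expression value."""
    cut = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cut]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cut]
    return high, low

# Toy data: expression values, survival times (months), event indicators.
high, low = split_by_median([5.1, 2.0, 7.3, 1.2], [10, 30, 8, 42], [1, 1, 1, 0])
high_curve = kaplan_meier(*zip(*high))
```

Each group's curve can then be plotted as a step function, and the two groups compared with a log-rank test if a significance value is needed.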
Six lncRNAs among the top 10 are differently expressed in various normal human tissues, such as lung, liver, ovary, brain, and other tissues. The expression levels of these six lncRNAs in different tissues are shown in Supplementary Fig. S5 using the human body map.
Functional annotations of LOC440896, IGF2-AS, and DUX4L3. We examined the functional annotations of the top 10 ranked lncRNAs using the Database for Annotation, Visualization and Integrated Discovery (DAVID) tool 81 . Each lncRNA is associated with specific functional annotations. For instance, among the top 10 ranked lncRNAs, LOC440896 is associated with the sequence feature of the putative uncharacterized protein FLJ45355. IGF2-AS is associated with the putative insulin-like growth factor 2 antisense gene protein and sequence variant. DUX4L3 is associated with compositionally biased regions (Ala-rich and Arg-rich) and a DNA-binding region. The lncRNA DUX4L3 is associated with various gene-ontology terms, including nitrogen compound metabolic process (GO:0006807), biosynthetic process (GO:0009058), regulation of biological process (GO:0050789), regulation of metabolic process (GO:0019222), cellular metabolic process (GO:0044237), and biological regulation (GO:0065007).
Furthermore, the UCSC_TFBS algorithm available from DAVID was used to identify protein interactions, including transcription factors with sets of target genes. Four of the top 10 ranked lncRNAs, namely CXCR2P1, HAS2-AS1, DUX4L3, and LOC440896, are involved in protein interactions and have functions related to transcription factors. We summarize the functional annotations associated with the top 10 ranked lncRNAs in Table 3.
Furthermore, we investigated the expression levels of these three lncRNAs in NB patients using integrated bioinformatics and wet-lab data analysis of NB data 83 , in which 88 human NB samples were analysed. Gene expression charts were generated using the gene expression activity chart plugin, which is available from the BioGPS gene annotation portal 84 . Expression charts for LOC440896, LINC00632 and IGF2-AS among 88 human NB samples are presented in Supplementary Fig. S6.

Conclusions
Recent advances in NGS data have attracted considerable attention to the significance of ncRNAs in cancer. LncRNAs are becoming a subject of interest in cancer research due to their critical role in multiple biological processes. Recent developments in computational biology and experimental techniques have identified thousands of lncRNAs in eukaryotes. However, only a few lncRNAs have been characterized and experimentally validated to confirm their disease association. Hence, developing computational models to identify the lncRNAs in cancer is an important task that would aid disease diagnosis and the understanding of the disease at the lncRNA level. Various computational prediction models have been developed to discover non-coding RNA and disease associations 85-90 . Chen et al. developed computational models to identify lncRNA and disease associations 91,92 . Identification of the lncRNA signature associated with overall survival in cancer patients using well-validated computational methods is helpful for therapeutic strategies. LncRNAs are implicated in tumorigenesis and exhibit diverse regulatory roles in cellular processes. Thus, the identification of an lncRNA signature would be important in terms of disease characterization and therapy. Therefore, we attempted to identify the lncRNA signature that is associated with the overall survival of NB patients, which could aid in NB therapeutics. Accordingly, we developed a survival time estimator called SVR-NB to estimate the overall survival time and identify the lncRNA signature that is associated with overall survival in NB patients. We incorporated the feature selection algorithm IBCGA into SVR to establish the optimized SVR model. SVR-NB identified a 35-lncRNA signature that is potentially correlated with the overall survival time of NB patients.
SVR-NB obtained a 10-CV squared correlation coefficient of 0.85 ± 0.009 and a mean absolute error of 0.56 ± 0.09 years between the actual and estimated overall survival times in NB patients. In addition, SVR-NB performed better than standard regression methods, including ridge, LASSO and elastic net. Although the estimation performance of SVR-NB is promising, it has some limitations due to the small sample size. The prediction error of SVR-NB on the independent test cohort was higher than that on the training dataset. Nonetheless, SVR-NB performance can be improved by increasing the number of samples.
We ranked the lncRNAs of the identified signature based on their contribution towards the survival estimation. Furthermore, we analysed the roles of the top ranked lncRNAs in cancer. Functional annotations and co-regulated gene expression of LOC440896, LINC00632 and IGF2-AS are discussed. The expression levels of these three lncRNAs in NB samples were presented using expression charts. Although some of the lncRNAs among the top 10 ranked list, such as LOC729770, CXCR2P1, LOC387720, and DUX4L3 are uncharacterized, and not involved in NB, our analysis suggests that these four lncRNAs might exhibit critical roles in NB patients' overall survival and are promising biomarkers of NB survival time for further validation.
The development of technologies for potential identification of lncRNAs and their role in cancer are important for NB diagnostics and therapeutics. Identified lncRNAs in this study could aid in the development of lncRNA-based targeted cancer therapies in NB patients.

Materials and Methods
Dataset. We retrieved the lncRNA expression dataset of 493 NB samples from GEO accession GSE62564. The details of preprocessing and normalization of the GSE62564 dataset are described in our previous work 41 . We applied filtration to the dataset, including elimination of duplicate entries, selection of samples from patients who died from NB, and retrieval of overall survival time by using the sample ID. We eliminated samples with an overall survival time of less than 30 days. In the lncRNA filtration process, we applied log intensity variation 93 to reduce the number of candidate features from 6260 to 783 lncRNAs. After the filtration process, the training dataset consisted of 104 patients with overall survival time and 104 expression profiles of 783 lncRNAs. Another dataset of 127 patients with NB who are alive, also from GEO accession GSE62564, was used as an independent test cohort.
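The sample-level filtration described above (removal of duplicates, restriction to patients who died from NB, and exclusion of survival times shorter than 30 days) can be sketched as follows. The record layout and the field names `sample_id`, `died_of_disease` and `os_days` are hypothetical and chosen only for illustration; they are not the column names of GSE62564.

```python
def filter_samples(records, min_days=30):
    """Keep the first record per sample ID, for deceased patients with OS >= min_days."""
    seen = set()
    kept = []
    for rec in records:
        sample_id = rec["sample_id"]
        if sample_id in seen:
            continue  # duplicate entry, keep only the first occurrence
        seen.add(sample_id)
        if rec["died_of_disease"] and rec["os_days"] >= min_days:
            kept.append(rec)
    return kept

# Invented records illustrating each filtering rule.
records = [
    {"sample_id": "S1", "died_of_disease": True, "os_days": 400},
    {"sample_id": "S1", "died_of_disease": True, "os_days": 400},  # duplicate
    {"sample_id": "S2", "died_of_disease": True, "os_days": 12},   # under 30 days
    {"sample_id": "S3", "died_of_disease": False, "os_days": 900}, # alive
    {"sample_id": "S4", "died_of_disease": True, "os_days": 30},
]
training = filter_samples(records)
```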

SVR-NB.
This study proposed an overall survival time estimator SVR-NB based on SVR using IBCGA to identify the set of lncRNAs in NB patients. The functionality of SVR-NB is two-fold: to estimate the overall survival time and to identify significant lncRNAs strongly associated with overall survival.
The support vector machine (SVM) algorithm 94 , is useful in solving bioinformatics problems 95,96 . SVR is another version of SVM for regression. SVR has been widely applied in many biomedical fields, such as pharmaceutical research 97 and cancer prognosis 98 . We have successfully applied an SVR incorporated with feature selection algorithm IBCGA for estimation of survival in patients with glioblastoma multiforme and lung adenocarcinoma 99,100 .
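SVR-NB is developed based on ν-SVR for the given data points $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in R^l$ is an NB patient input sample and $y_i \in R^k$ is a target label ($y_i$ is the overall survival time). The primal problem of ν-SVR, in its standard form, is described as follows.
$$\min_{w, b, \xi, \xi^*, \varepsilon} \; \frac{1}{2} w^T w + C \left( \nu \varepsilon + \frac{1}{n} \sum_{i=1}^{n} (\xi_i + \xi_i^*) \right)$$
subject to
$$(w^T \phi(x_i) + b) - y_i \le \varepsilon + \xi_i, \quad y_i - (w^T \phi(x_i) + b) \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \; i = 1, \ldots, n, \quad \varepsilon \ge 0,$$
where $w$ is the weight vector; $\phi$ maps the input into a higher-dimensional feature space; $\xi_i$ and $\xi_i^*$ are slack variables; and $b$ is a constant. Here, $0 \le \nu \le 1$, and $C$ is the regularization parameter. The ε-insensitive loss function ignores deviations smaller than $\varepsilon$: $|y - f(x)|_\varepsilon = \max(0, |y - f(x)| - \varepsilon)$. To avoid overfitting, we used 10-fold cross-validation (10-CV) to evaluate the performance of the model. Pearson's correlation coefficient (CC) was used as a fitness function and is formulated as follows:
$$CC = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are the actual and estimated overall survival times of the $i$-th patient, respectively, and $\bar{x}$ and $\bar{y}$ are their corresponding means. Here, $N$ is the total number of patients with NB. We used the squared correlation coefficient to evaluate the model performance.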
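The two performance measures used throughout this work, the squared Pearson correlation coefficient and the mean absolute error, can be computed as in the following minimal Python sketch. The function names and the toy survival times are illustrative assumptions, not values from this study.

```python
def squared_correlation(actual, estimated):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    if n != len(estimated) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_a == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (cov * cov) / (var_a * var_e)

def mean_absolute_error(actual, estimated):
    """Average absolute difference between actual and estimated values."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)

# Toy overall survival times in years (invented for illustration).
actual_os = [1.0, 2.0, 3.0, 4.0]
estimated_os = [1.5, 2.5, 2.5, 4.5]
mae = mean_absolute_error(actual_os, estimated_os)
scc = squared_correlation(actual_os, estimated_os)
```

In a 10-CV setting, the estimated values would be the out-of-fold predictions collected across the ten folds before the two measures are computed.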
Inheritable bi-objective combinatorial genetic algorithm. To select a minimal set of informative features from a large number of candidate features the inheritable bi-objective combinatorial genetic algorithm (IBCGA) is used. The IBCGA uses an intelligent evolutionary algorithm 101 that can efficiently solve large parameter optimization problems. In this study, we propose a method for the identification of informative lncRNAs associated with NB overall survival based on the IBCGA and ν-SVR by maximizing the estimation performance in terms of correlation coefficient (CC). In this work, the LibSVM package 102 was used for implementation of ν-SVR.
The encoded chromosomes and the customized IBCGA were designed as described in previous studies 99,100,103 . The chromosome of the IBCGA comprises 783 binary genes, one per candidate lncRNA, and three 4-bit genes for encoding γ, C, and ν for the ν-SVR. In this work, the parameter values are $r_{start} = 10$, $r_{end} = 50$, $N_{pop} = 50$, $P_c = 0.8$, $P_m = 0.05$, and $G_{max} = 60$ 53 .
The fitness function is the correlation coefficient defined above, in which $x_i$ and $y_i$ are the actual and estimated overall survival times of the $i$-th patient, respectively, and $N$ is the total number of NB patients. The steps of IBCGA are as follows.
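To illustrate the chromosome layout, the following Python sketch decodes a bit string into a list of selected features and the three ν-SVR parameters. The layout (feature genes first, then three 4-bit genes) follows the description above, but the mapping from each 4-bit value to a parameter value is a hypothetical choice made for this example; the actual parameter grids are defined in the cited IBCGA studies.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned integer (most significant first)."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

def decode_chromosome(chromosome, n_features=783):
    """Split a chromosome into selected feature indices and (gamma, C, nu)."""
    if len(chromosome) != n_features + 12:
        raise ValueError("expected n_features binary genes plus three 4-bit genes")
    selected = [i for i in range(n_features) if chromosome[i] == 1]
    g = bits_to_int(chromosome[n_features:n_features + 4])
    c = bits_to_int(chromosome[n_features + 4:n_features + 8])
    v = bits_to_int(chromosome[n_features + 8:n_features + 12])
    # Hypothetical parameter grids (assumptions, not from the paper):
    gamma = 2.0 ** (g - 15)   # 2^-15 ... 2^0
    cost = 2.0 ** (c - 5)     # 2^-5 ... 2^10
    nu = (v + 1) / 16.0       # 1/16 ... 1, keeps nu inside (0, 1]
    return selected, gamma, cost, nu

# Small example with 5 candidate features instead of 783.
toy = [1, 0, 1, 0, 0] + [1, 1, 1, 1] + [0, 1, 0, 1] + [0, 1, 1, 1]
selected, gamma, cost, nu = decode_chromosome(toy, n_features=5)
```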
Step 1: (Initialization) Randomly generate a population of N pop individuals.
Step 2: (Evaluation) Evaluate the fitness value of all individuals using the fitness function that is the squared correlation coefficient (SCC) in terms of 10-fold cross-validation (10-CV).
Step 3: (Selection) Use a tournament selection method that selects the winner from two randomly selected individuals to generate a mating pool.
Step 4: (Crossover) Select two parents from the mating pool to perform orthogonal array crossover operation.
Step 5: (Mutation) Apply a conventional mutation operator to the randomly selected individuals in the new population. Mutation is not applied to the best individuals to prevent the best fitness value from deterioration.
Step 6: (Termination test) If the stopping condition for obtaining the solution is satisfied, output the best individual as the solution. Otherwise, go to Step 3.
Step 7: (Inheritance) If $r < r_{end}$, randomly change one bit in the binary genes for each individual from 0 to 1; increase the number $r$ by one, and go to Step 3. Otherwise, stop the algorithm.
Step 8: (Output) Obtain a set of lncRNAs from the chromosome of the best individual.
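The control flow of the steps above can be sketched as a simplified genetic algorithm. This is not the IBCGA implementation used in this study: the orthogonal array crossover is replaced by a plain recombination that keeps shared features and samples the rest, the SVR parameter genes are omitted, every generation re-evaluates the population, and the fitness function is a toy stand-in for the 10-CV correlation coefficient. All names are hypothetical.

```python
import random

def _recombine(p1, p2, r, rng):
    """Child keeps features shared by both parents and fills up to r from the rest."""
    common = p1 & p2
    rest = sorted((p1 | p2) - common)
    return common | set(rng.sample(rest, r - len(common)))

def _mutate(ind, n_features, rng):
    """Swap one selected feature for one unselected feature (size stays r)."""
    outside = [i for i in range(n_features) if i not in ind]
    mutated = set(ind)
    mutated.remove(rng.choice(sorted(mutated)))
    mutated.add(rng.choice(outside))
    return mutated

def select_features(fitness, n_features, r_start, r_end,
                    n_pop=20, p_c=0.8, p_m=0.05, g_max=30, seed=0):
    """Return {r: (best fitness, sorted feature indices)} for r_start..r_end."""
    if not 1 <= r_start <= r_end < n_features:
        raise ValueError("need 1 <= r_start <= r_end < n_features")
    rng = random.Random(seed)
    # Step 1: random initial population with r_start selected features each.
    pop = [set(rng.sample(range(n_features), r_start)) for _ in range(n_pop)]
    best_by_r = {}
    for r in range(r_start, r_end + 1):
        for _ in range(g_max):
            scores = [fitness(ind) for ind in pop]           # Step 2: evaluation
            elite = set(pop[scores.index(max(scores))])
            pool = []                                        # Step 3: tournament
            for _ in range(n_pop):
                a, b = rng.randrange(n_pop), rng.randrange(n_pop)
                pool.append(pop[a] if scores[a] >= scores[b] else pop[b])
            children = [elite]                               # best is never mutated
            while len(children) < n_pop:
                p1, p2 = rng.choice(pool), rng.choice(pool)  # Step 4: crossover
                child = _recombine(p1, p2, r, rng) if rng.random() < p_c else set(p1)
                if rng.random() < p_m:                       # Step 5: mutation
                    child = _mutate(child, n_features, rng)
                children.append(child)
            pop = children
        best = max(pop, key=fitness)                         # Step 6: solution for r
        best_by_r[r] = (fitness(best), sorted(best))
        if r < r_end:                                        # Step 7: inheritance
            for ind in pop:
                ind.add(rng.choice([i for i in range(n_features) if i not in ind]))
    return best_by_r

# Toy fitness: reward overlap with a hidden informative set, penalize size.
TARGET = {2, 5, 7}

def toy_fitness(ind):
    return len(ind & TARGET) - 0.01 * len(ind)

result = select_features(toy_fitness, n_features=12, r_start=2, r_end=5)
```

Keeping the population across values of r is what the inheritance step refers to: solutions found with r features seed the search with r + 1 features instead of restarting from random individuals.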
Ridge, LASSO and Elastic net. We compared three standard regression methods with SVR-NB. Ridge regression is also called L2-penalized regression 104 . Ridge regression retains all the features to build prediction models. In ridge regression, the penalty term (λ) penalizes large coefficient values in the optimization function and thereby shrinks the coefficients of the predictors towards zero. Hence, ridge regression reduces the model complexity. The least absolute shrinkage and selection operator (LASSO) 105 was also employed to estimate the overall survival of NB patients. LASSO uses L1 regularization, in which some of the coefficients are regularized to exactly zero and thus excluded from the model 105 . Therefore, LASSO can help in the feature selection procedure. We chose the tuning parameter λ (minimum λ) after 100 iterations of 10-CV. We used the squared correlation coefficient and mean absolute error for the performance measurement. Elastic net 106 is an extension of the LASSO, in which the LASSO and ridge penalties are combined. In its standard form, the Elastic net estimate can be defined as follows:
$$\hat{\beta} = \arg\min_{\beta} \left( \lVert y - X\beta \rVert_2^2 + \lambda_2 \lVert \beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 \right)$$
where $y$ is the vector of overall survival times, $X$ is the lncRNA expression matrix, $\beta$ is the coefficient vector, and $\lambda_1$ and $\lambda_2$ weight the L1 and L2 penalties, respectively.
Feature frequency score (FFS). We measure the feature frequency score for each independent run as follows:
$$FFS_t = \frac{1}{n_t} \sum_{i=1}^{n_t} f(Z_i), \quad t = 1, \ldots, R$$
where $f(z)$ is the feature frequency of feature $z$, that is, the number of the $R$ lncRNA sets in which $z$ is present; $n_t$ is the number of features in the $t$-th signature; and $Z_i$ is the $i$-th lncRNA in the $t$-th solution.
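As a small worked illustration of the feature frequency score, the following Python sketch counts how often each feature appears across independent runs and averages those counts within each run's signature. The function name and the three toy signatures are assumptions made for this example.

```python
from collections import Counter

def feature_frequency_scores(signatures):
    """Return one FFS per run: the mean, over a run's features, of how many runs contain each."""
    unique = [set(sig) for sig in signatures]
    if any(not sig for sig in unique):
        raise ValueError("every signature must contain at least one feature")
    frequency = Counter(feature for sig in unique for feature in sig)
    return [sum(frequency[f] for f in sig) / len(sig) for sig in unique]

# Three invented runs; feature "A" appears in all of them.
runs = [["A", "B", "C"], ["A", "B", "D"], ["A", "E", "F"]]
scores = feature_frequency_scores(runs)
# Index of the run with the highest FFS (the first one wins ties).
best_run = max(range(len(runs)), key=scores.__getitem__)
```

The signature of the run with the highest score is the one whose features recur most often across runs, which is the sense in which it is considered robust.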

Data Availability
All the data used in this analysis can be found in the Gene Expression Omnibus (GEO) database under accession GSE62564.