Network-based integration of multi-omics data for clinical outcome prediction in neuroblastoma

Multi-omics data are increasingly being gathered for investigations of complex diseases such as cancer. However, high dimensionality, small sample size, and heterogeneity of different omics types pose huge challenges to integrated analysis. In this paper, we evaluate two network-based approaches for the integration of multi-omics data in an application to clinical outcome prediction of neuroblastoma. As a first step, we derive Patient Similarity Networks (PSN) for individual omics data by computing distances among patients from omics features. The fusion of different omics can be achieved in two ways: network-level fusion uses the Similarity Network Fusion algorithm to fuse the PSNs derived for individual omics types, and feature-level fusion fuses the network features obtained from individual PSNs. We demonstrate our methods on two high-risk neuroblastoma datasets from the SEQC project and the TARGET project. We propose Deep Neural Network and Machine Learning methods with Recursive Feature Elimination as predictors of the survival status of neuroblastoma patients. Our results indicate that network-level fusion outperforms feature-level fusion for the integration of different omics data, whereas feature-level fusion is more suitable for incorporating different feature types derived from the same omics type. We conclude that network-based methods handle heterogeneity and high dimensionality well in the integration of multi-omics data.


Materials and methods
Datasets. We used neuroblastoma multi-omics datasets from the TARGET project 22 and the SEQC project 23 to demonstrate applications of our methods. We confirm that all methods were performed in accordance with the relevant guidelines and regulations. Each dataset consists of samples gathered from two omics data types.
• SEQC dataset: The SEQC cohort 26 had a total of 498 neuroblastoma samples, including 176 high-risk and 322 low- and intermediate-risk samples. Microarray and RNA-seq datasets for the 498 neuroblastoma patients from the SEQC project were downloaded from the NCBI GEO database (https://www.ncbi.nlm.nih.gov/gds) with accession numbers GSE49710 and GSE62564, respectively. Both measure gene expression levels but use different omics technologies.
• TARGET dataset: The TARGET cohort 22 comprised 157 high-risk neuroblastoma samples, including gene expression data and DNA methylation data. The RNA-seq expression dataset from the TARGET project was downloaded from the project website (https://ocg.cancer.gov/programs/target/projects/neuroblastoma), and the DNA methylation dataset was downloaded from the NIH GDC portal (https://portal.gdc.cancer.gov/projects/TARGET-NBL). Gene expression data quantify the transcriptome of neuroblastoma patients, while DNA methylation data (methyl groups added to genes) signify epigenomic variations in those patients.

Patient similarity networks (PSN). A PSN is a graph that represents patients as nodes and similarities between patients as edges. It is denoted by $G^m = (V, A^m)$, where $V$ denotes the set of subjects and $A^m = [a^m_{uv}]$ denotes the affinity matrix (the similarity matrix), with $a^m_{uv}$ denoting the similarity of measurements of omics type $m$ between subjects $u \in V$ and $v \in V$. If $\phi^m_v$ denotes the omics-type-$m$ measurement of subject $v$, then

$$a^m_{uv} = \mathrm{sim}(\phi^m_u, \phi^m_v),$$

where sim is a similarity measure.
The similarity between patients was determined by Pearson's correlation coefficient computed over the features of the individual omics datasets:

$$\mathrm{sim}(\phi^m_u, \phi^m_v) = \frac{\sum_{i=1}^{N} (\phi^m_{u,i} - \bar{\phi}^m_u)(\phi^m_{v,i} - \bar{\phi}^m_v)}{\sqrt{\sum_{i=1}^{N} (\phi^m_{u,i} - \bar{\phi}^m_u)^2}\,\sqrt{\sum_{i=1}^{N} (\phi^m_{v,i} - \bar{\phi}^m_v)^2}},$$

where $N$ denotes the total number of features and $i$ refers to the $i$th feature of the omics-$m$ dataset.
These correlation values were normalized and rescaled to represent positive edge weights using the Weighted Correlation Network Analysis (WGCNA) algorithm 27. WGCNA enforces (at least asymptotic) scale-freeness of the PSN by making its nodal degree distribution follow a power law, which makes our analysis robust to noise and errors.
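To make the soft-thresholding step concrete, the sketch below (toy data; a simplified log-log R² criterion rather than WGCNA's exact truncated fit) raises the absolute correlations to a power β and keeps the smallest β whose degree distribution is approximately scale-free:

```python
import numpy as np

def soft_threshold(corr, beta):
    """Soft-threshold correlations into positive edge weights (WGCNA-style)."""
    return np.abs(corr) ** beta

def scale_free_fit(adjacency, n_bins=10):
    """R^2 of log10 p(k) vs log10 k: how well the degrees follow a power law."""
    k = adjacency.sum(axis=1) - np.diag(adjacency)   # weighted degree, no self-loop
    hist, edges = np.histogram(k, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    mask = hist > 0
    x, y = np.log10(centers[mask]), np.log10(hist[mask] / hist.sum())
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

rng = np.random.default_rng(0)
data = rng.normal(size=(60, 200))   # 60 patients x 200 omics features (made up)
corr = np.corrcoef(data)            # patient-by-patient correlations
# pick the smallest beta whose network is approximately scale-free
for beta in range(1, 21):
    A = soft_threshold(corr, beta)
    if scale_free_fit(A) >= 0.9:
        break
```

The resulting `A` serves as the positively weighted affinity matrix of the PSN.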
Network features. From a PSN, we computed two types of features: centrality features and modularity features. Centrality features assign high scores to the most important nodes of the network 28. We computed 12 centrality features for each node: weighted degree, closeness centrality, current-flow closeness centrality, current-flow betweenness centrality, eigenvector centrality 29, Katz centrality 30, HITS centrality 31 (authority and hub values), PageRank centrality 32, load centrality 33, local clustering coefficient, iterative weighted degree, and iterative local clustering coefficient.
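A subset of these centrality measures can be computed from a PSN's affinity matrix with NetworkX, as in the following sketch (toy data; only 5 of the 12 measures shown):

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(1)
corr = np.corrcoef(rng.normal(size=(30, 50)))   # toy patient-by-patient similarities
A = np.abs(corr)
np.fill_diagonal(A, 0.0)
G = nx.from_numpy_array(A)                       # weighted PSN

# a subset of the 12 centrality measures used in the paper
# (closeness treats edge weights as distances here; a real pipeline
# might first convert similarities to distances)
features = {
    "degree": dict(G.degree(weight="weight")),
    "closeness": nx.closeness_centrality(G, distance="weight"),
    "eigenvector": nx.eigenvector_centrality_numpy(G, weight="weight"),
    "pagerank": nx.pagerank(G, weight="weight"),
    "clustering": nx.clustering(G, weight="weight"),
}
# stack into an (n_patients, n_features) matrix for the classifier
X = np.array([[features[name][v] for name in features] for v in G.nodes])
```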
Modularity features were obtained by extracting network modules through clustering of the nodes. We used spectral clustering 34 and Stochastic Block Model (SBM) clustering 35 to find network modules, and the optimal number of modules was determined by the silhouette score. The membership of each node in a module was represented by a one-hot vector, and the sum of these vectors over all the modules was taken as the modularity feature vector of a given node. The centrality features and modularity features were concatenated to obtain the network features used as inputs to the classifiers.
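The module-detection step can be sketched with scikit-learn's spectral clustering, choosing the module count by silhouette score and encoding memberships as one-hot vectors (toy affinity matrix with two planted patient groups; the SBM clustering via graph-tool is omitted):

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# toy affinity with two planted patient groups
A = np.full((40, 40), 0.1)
A[:20, :20] = 0.9
A[20:, 20:] = 0.9
A += rng.uniform(0, 0.05, size=A.shape)
A = (A + A.T) / 2
np.fill_diagonal(A, 1.0)

best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 6):                     # pick the module count by silhouette score
    labels = SpectralClustering(n_clusters=k, affinity="precomputed",
                                random_state=0).fit_predict(A)
    score = silhouette_score(1 - A, labels, metric="precomputed")  # distance = 1 - similarity
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

modularity_features = np.eye(best_k)[best_labels]   # one-hot module membership
```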
Feature-level fusion. From the PSN obtained for omics dataset $m$, we extracted a network feature vector $x^m$ for each node, i.e. each subject. Using feature-level fusion, we combined the individual omics datasets to obtain multi-omics features: the centrality features were averaged and the modularity features were concatenated across the individual datasets,

$$x = \Big[\tfrac{1}{M}\textstyle\sum_{m=1}^{M} x^m_{\mathrm{cen}},\; x^1_{\mathrm{mod}}, \ldots, x^M_{\mathrm{mod}}\Big],$$

where $x^m_{\mathrm{cen}}$ and $x^m_{\mathrm{mod}}$ denote the centrality and modularity parts of $x^m$ and $M$ is the number of omics types.
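A minimal sketch of this fusion rule, assuming each omics type contributes a (centrality, modularity) feature pair per patient:

```python
import numpy as np

def feature_level_fusion(omics_features):
    """Fuse per-omics network features: average the centrality blocks,
    concatenate the modularity blocks.

    omics_features: list of (centrality, modularity) array pairs, one pair
    per omics type, each array with one row per patient."""
    centralities = [c for c, _ in omics_features]
    modularities = [m for _, m in omics_features]
    fused_centrality = np.mean(centralities, axis=0)         # same 12 measures per omics
    fused_modularity = np.concatenate(modularities, axis=1)  # module counts may differ
    return np.concatenate([fused_centrality, fused_modularity], axis=1)

rng = np.random.default_rng(3)
n = 50  # patients
omics = [(rng.normal(size=(n, 12)), rng.integers(0, 2, size=(n, 8))),
         (rng.normal(size=(n, 12)), rng.integers(0, 2, size=(n, 5)))]
X = feature_level_fusion(omics)   # shape (50, 12 + 8 + 5)
```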
Network-level fusion. In network-level fusion, the PSN for multi-omics data, $G$, is obtained by combining the PSNs $G^m$ of the individual omics data. We derive the similarity matrix of the multi-omics PSN by fusing those of the single-omics PSNs,

$$A = \mathrm{SNF}(A^1, A^2, \ldots, A^M),$$

which we achieved using the Similarity Network Fusion (SNF) algorithm 20.
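For illustration, here is a compact re-implementation of the SNF message-passing update (the experiments used the SNFtool R package; this simplified sketch omits SNF's exact diagonal handling and normalization details):

```python
import numpy as np

def _row_normalize(W):
    return W / W.sum(axis=1, keepdims=True)

def _local_kernel(W, k):
    """Keep each row's k strongest neighbours (the sparse S matrix in SNF)."""
    S = np.zeros_like(W)
    for i, row in enumerate(W):
        idx = np.argsort(row)[-k:]
        S[i, idx] = row[idx]
    return _row_normalize(S)

def snf_fuse(affinities, k=5, iterations=10):
    """Minimal sketch of Similarity Network Fusion for M patient networks:
    each view's full kernel is diffused through the other views via its own
    local (k-nearest-neighbour) kernel, then the views are averaged."""
    P = [_row_normalize(W) for W in affinities]
    S = [_local_kernel(W, k) for W in affinities]
    for _ in range(iterations):
        P_new = []
        for v in range(len(P)):
            others = [P[u] for u in range(len(P)) if u != v]
            P_avg = np.mean(others, axis=0)
            P_new.append(_row_normalize(S[v] @ P_avg @ S[v].T))
        P = P_new
    return np.mean(P, axis=0)   # fused multi-omics affinity matrix

rng = np.random.default_rng(4)
A1 = np.abs(np.corrcoef(rng.normal(size=(30, 40))))   # toy single-omics PSNs
A2 = np.abs(np.corrcoef(rng.normal(size=(30, 60))))
fused = snf_fuse([A1, A2])
```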
Deep neural networks (DNN). Let us consider an $(L+1)$-layer feedforward DNN for the prediction of clinical outcomes, with layers $l = 0, 1, \ldots, L$, where $l = 0$ and $l = L$ denote the input and output layers. Let the output, weights, and biases of layer $l$ be denoted $h_l$, $W_l$, and $b_l$. The input layer receives the features $x$ of each subject, so $h_0 = x$. For layers $l = 1, \ldots, L-1$,

$$h_l = f(W_l h_{l-1} + b_l),$$

where $f$ denotes the activation function of layer $l$.
The output $y$ of the softmax output layer $L$ is given by

$$y_k = \frac{\exp\big((W_L h_{L-1} + b_L)_k\big)}{\sum_j \exp\big((W_L h_{L-1} + b_L)_j\big)}.$$

The output class label $k^*$ is assigned to the class $k$ receiving the maximum output activation:

$$k^* = \arg\max_k y_k.$$

The network parameters are learned by minimizing the cross-entropy loss using a gradient descent approach.
In our experiments, we used an Adam optimizer to learn the weights and biases of the network.
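The forward pass and class assignment described above amount to a few lines of NumPy (random weights for illustration; training is omitted):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # stabilized softmax
    return e / e.sum(axis=1, keepdims=True)

def forward(x, weights, biases):
    """h_l = f(W_l h_{l-1} + b_l) for hidden layers, softmax at the output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return softmax(h @ weights[-1] + biases[-1])

rng = np.random.default_rng(5)
sizes = [10, 8, 4, 2]                            # input, two hidden layers, output
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

x = rng.normal(size=(16, 10))                    # batch of 16 subjects
y = forward(x, weights, biases)
k_star = y.argmax(axis=1)                        # k* = argmax_k y_k
```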

Relevance propagation.
To explore the utility of the network features extracted from PSNs and the interpretability of our DNN models, relevance propagation was applied. Relevance propagation is an approach to studying the relevance, or attribution, of each input feature to a neural network. According to a unified framework comparing existing approaches proposed by M. Ancona et al. (2018) 36, relevance propagation methods can be classified into perturbation-based and gradient-based methods. Perturbation-based methods compute the relevance of an input feature by simply removing, masking, or altering it and comparing the resulting output with the original one. While the theory behind such methods is straightforward, their drawbacks include: (1) slow running time, especially with a huge input feature set; and (2) unstable results when the number of features removed in each iteration varies, due to the non-linearity of the DNN. In contrast, gradient-based methods compute the relevance in a single forward and backward propagation through the DNN, which is stable and not time-consuming. Therefore, we adopted a gradient-based method to analyze our DNN model. Popular gradient-based methods include Gradient * Input 37, Integrated Gradients 38, Layer-wise Relevance Propagation (LRP) 39, and DeepLIFT 40. Notably, Integrated Gradients satisfies two desirable properties, sensitivity and implementation invariance, while the other methods break at least one of them. Sensitivity is satisfied if a feature is given a non-zero attribution whenever its input and baseline differ and generate different output values. Sensitivity can be readily violated by plain gradients when the gradient with respect to an input is zero regardless of alterations of that input, even though the input affects the prediction. Under such circumstances, relevant features might be assigned no attribution, which is the condition we attempt to avoid.
In addition, implementation invariance means that the attributions should be identical for two networks whose outputs are equal for all inputs, even though the two networks have disparate implementations. Consequently, we applied Integrated Gradients rather than the other approaches because it conforms to both sensitivity and implementation invariance.
Specifically, the integrated gradients along the $i$th dimension of an input $x$ relative to a baseline $x'$ can be formulated as 38

$$\mathrm{IntegratedGrads}_i(x) = (x_i - x'_i) \times \int_{0}^{1} \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i}\, d\alpha,$$

where $F$ denotes the function computed by the DNN and $\partial F(x)/\partial x_i$ denotes the gradient of $F(x)$ along the $i$th dimension. Moreover, several studies have revealed that removing insignificant input features gives rise to performance enhancement 41. Thus, after computing the attributions of the input features, we recursively removed the features one at a time according to their attribution ranks, and tracked how the performance changed.
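In practice the integral is approximated by a Riemann sum over the straight path from the baseline to the input. A sketch with an analytically differentiable toy model, F(x) = Σ x_i²:

```python
import numpy as np

def integrated_gradients(grad_F, x, baseline, steps=50):
    """Approximate (x_i - x'_i) * ∫ ∂F/∂x_i dα along the straight path
    from baseline to x with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps     # midpoints in (0, 1)
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_F(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# toy differentiable model: F(x) = sum(x^2), so grad F = 2x
F = lambda x: np.sum(x ** 2)
grad_F = lambda x: 2 * x

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)
attribution = integrated_gradients(grad_F, x, baseline)
# completeness property: attributions sum to F(x) - F(baseline)
```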
Recursive feature elimination (RFE). In addition to DNN, we used Recursive Feature Elimination (RFE) with other classifiers for comparison with the DNN. RFE is a feature selection method proposed by I. Guyon et al. (2002) 25, designed for identifying salient genes in microarray gene expression data. RFE uses an estimator to rank the features by a certain criterion (e.g. linear coefficients) and recursively removes the feature with the smallest ranking criterion until the desired feature subset is obtained. The iterative procedure of the RFE algorithm can be described as:
1. Train the estimator with the current feature set
2. Rank all the features according to the ranking criterion
3. Remove the feature with the smallest criterion
In this research, we applied four machine learning classifiers, i.e. SVM (with a linear kernel), Random Forests (RF), Logistic Regression (LR), and Decision Trees (DT), as estimators for RFE. The classifiers were iteratively trained on the network feature set with the RFE algorithm to select the most salient features. Our network feature set consists of centrality features and modularity features. We expect RFE to help discover the centrality features computed by the most suitable algorithms and the modularity features representing membership in the vital modules. Currently, the RFE algorithm implemented by Scikit-learn 42 only supports linear models as estimators.
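A sketch of RFE with scikit-learn on synthetic data (in general, any estimator exposing coefficients or feature importances can drive scikit-learn's RFE; two linear estimators are shown here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# synthetic stand-in for the network feature matrix
X, y = make_classification(n_samples=150, n_features=40, n_informative=5,
                           random_state=0)

# RFE drops the feature with the smallest ranking criterion at each step
for estimator in (LogisticRegression(max_iter=1000), SVC(kernel="linear")):
    rfe = RFE(estimator, n_features_to_select=10, step=1).fit(X, y)
    selected = np.where(rfe.support_)[0]   # indices of the surviving features
```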

Experiments
We analyzed two multi-omics neuroblastoma datasets: (i) microarray and RNA-seq expression datasets from 498 neuroblastoma samples from the SEQC project 26 and (ii) 157 neuroblastoma samples with RNA-seq expression and DNA methylation datasets from the TARGET project 22. The downloaded datasets were processed by removing missing or duplicate values using the Pandas and NumPy libraries in Python.
The clinical descriptor used as the label for training the DNN classifiers was the binary label 'death from disease'. After excluding the samples with missing descriptors, we performed binary classification on both datasets: 'death from disease' or not. Both datasets were evaluated using nested 3-fold cross-validation due to the relatively limited number of samples.
Data preprocessing. The Wilcoxon signed-rank test 43,44 was performed on the individual omics datasets to identify the most relevant input features. Correction for multiple testing based on the Benjamini-Hochberg procedure 45 was applied to control the false discovery rate, given the high-dimensional input features. Then the features most correlated with the clinical outcome were identified at a significance level (p-value) of 0.001. This effectively reduced the number of input features for each omics dataset. Since gene expression and DNA methylation data contain considerable noise and not all genes/features may be relevant to the disease, the Wilcoxon analysis allowed us to identify and eliminate irrelevant features early, making our models simpler and more accurate.
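A sketch of this screening step on synthetic data: since the two outcome groups are independent samples, the rank-sum variant of the Wilcoxon test is used here (an assumption; the paper names the signed-rank test), with a hand-rolled Benjamini-Hochberg procedure at α = 0.001:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(6)
n_pos, n_neg, n_features = 40, 60, 500
pos = rng.normal(size=(n_pos, n_features))   # e.g. 'death from disease' group
neg = rng.normal(size=(n_neg, n_features))
pos[:, :20] += 2.0                           # plant 20 truly relevant features

pvals = np.array([ranksums(pos[:, j], neg[:, j]).pvalue
                  for j in range(n_features)])

def benjamini_hochberg(pvals, alpha=0.001):
    """Return a boolean mask of discoveries at FDR level alpha."""
    order = np.argsort(pvals)
    m = len(pvals)
    thresh = alpha * np.arange(1, m + 1) / m     # BH step-up thresholds
    passed = pvals[order] <= thresh
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        cutoff = np.max(np.where(passed)[0])     # largest k with p_(k) <= alpha*k/m
        keep[order[:cutoff + 1]] = True
    return keep

selected = benjamini_hochberg(pvals, alpha=0.001)
```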
Building PSN and feature extraction. Distances between patients were obtained by computing Pearson's correlation coefficients among omics features, and thereby PSNs were built for each omics dataset. The correlation weights were normalized and rescaled to be positive using the WGCNA algorithm 27, making the PSNs behave as scale-free networks. We used the smallest beta value for the algorithm that achieved 90% of the truncated scale-free index. The WGCNA algorithm was implemented in-house in Python by applying its formula to rescale the edges of the PSN, trying different hyperparameters to test whether the resulting edges make the PSN scale-free. For network-level fusion, we combined the PSNs derived from individual omics datasets via the SNF algorithm 20, which was implemented using the SNFtool library in R. Network features of the PSNs were extracted using the NetworkX package in Python. Twelve centrality features and the modularity features were extracted as input features for the classifiers. The numbers of modules detected differed across omics datasets. To discover network modules, spectral clustering was applied using NetworkX and Stochastic Block Model (SBM) clustering was applied using the graph-tool package in Python. We extracted 204 modules for microarray and 16 modules for RNA-seq expression in the SEQC dataset, and 60 modules for RNA-seq expression and 34 modules for DNA methylation in the TARGET dataset. For the combined networks generated by network-level fusion, 109 and 44 modules were extracted for the SEQC and TARGET datasets, respectively. Before being fed into the neural network, the extracted features were normalized to zero mean and unit variance. To achieve feature-level fusion, we computed the means of the centrality features extracted from the PSNs of the individual omics data and concatenated the modularity features.
Training DNN and RFE models. We applied a feed-forward DNN for predicting clinical outcomes with features extracted from multi-omics PSNs, using the Tensorflow V1 framework (https://www.tensorflow.org/versions/r1.15/api_docs/python/tf). The weights and biases of the DNN were trained by minimizing the cross-entropy loss function with an Adam optimizer 24. Notably, the SEQC dataset is extremely imbalanced, with around 77% of the samples belonging to the majority class ("alive"), while the TARGET dataset does not suffer from an imbalance issue. To handle the data imbalance, we applied a weighted cross-entropy loss function 46 on the SEQC dataset, whereas, since the data were balanced, we used the standard softmax cross-entropy loss on the TARGET dataset. The rationale of a weighted cross-entropy function is that it assigns different weights to the majority and minority classes to naturally compensate for the imbalance. The weight for class $i$ is inversely proportional to the class size, $w_i \propto 1/n_i$, where $n_i$ denotes the number of samples belonging to class $i$.
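A minimal NumPy sketch of a weighted cross-entropy with inverse-frequency class weights (the exact normalization used here, w_i = N/(K·n_i), is one common choice and an assumption, not necessarily the paper's definition):

```python
import numpy as np

def class_weights(labels, n_classes):
    """Inverse-frequency weights: w_i = N / (K * n_i)."""
    counts = np.bincount(labels, minlength=n_classes)
    return len(labels) / (n_classes * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w_y * log p_y over the batch."""
    p = probs[np.arange(len(labels)), labels]
    return np.mean(-weights[labels] * np.log(p))

labels = np.array([0] * 77 + [1] * 23)   # ~77% majority class, as in SEQC
w = class_weights(labels, 2)             # minority class gets the larger weight
probs = np.full((100, 2), 0.5)           # an uninformative classifier
loss = weighted_cross_entropy(probs, labels, w)
```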
We used the rectified linear unit (ReLU) activation function and dropout in the hidden layers. We experimented with batch sizes of 8 and 32. An early stopping criterion was implemented to determine the convergence of learning and avoid overfitting.
Nested cross-validation (CV) 47 was employed for tuning the hyper-parameters and for model selection. Since the number of samples in our datasets is insufficient to set aside a standalone test set, typical cross-validation may lead to overfitting and data leakage; nested CV is designed to address these issues. The algorithm of nested CV is illustrated in Algorithm 1.
The nested cross-validation procedure is composed of an outer CV loop and an inner CV loop. The training fold of the outer CV is further split into the k folds of the inner CV. The inner CV loop is similar to typical CV and is used for tuning hyper-parameters such as hidden sizes, learning rates, batch size, etc. The average score of each hyper-parameter set is calculated across all inner CV folds to discover the best hyper-parameters. The outer CV loop is then used for model selection, where each model with the best hyper-parameters decided by the inner CV is tested and compared. This strategy ensures that the test data for the final evaluation are excluded from the hyper-parameter tuning procedure, which leads to a more robust evaluation.
After tuning the parameters and evaluating the models, we took the best-performing DNN model and applied Integrated Gradients 38 to it to compute saliency scores for the input features. The implementation of Integrated Gradients was imported from the DeepExplain framework 36 because DeepExplain supports Tensorflow V1, which we used to develop our DNN model. The input features were then ranked by their saliency scores and removed one by one to seek performance improvements.
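The nested-CV scheme described above can be sketched with scikit-learn, here with toy data and a Logistic Regression stand-in for the DNN (the parameter grid and fold counts are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=120, n_features=20, random_state=0)

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=3, shuffle=True, random_state=2)

# inner loop tunes hyper-parameters; outer loop gives an unbiased estimate
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                      cv=inner, scoring="f1")
scores = cross_val_score(search, X, y, cv=outer, scoring="f1")
```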
Besides DNN, RFE was also explored for predicting clinical outcomes with the PSN network features. We implemented RFE with four machine learning models (i.e. linear-kernel SVM, Random Forests, Logistic Regression, and Decision Trees). Network features were evaluated and ranked on the training set, and only the salient features were selected for clinical outcome prediction on the test set. The details of the RFE procedure are explained in Algorithm 2.
Comparing with existing methods. To further demonstrate the utility of our approach to integrating multi-omics data and extracting network features, we compared our models with several popular approaches, i.e. RGCCA 17, MOFA 16, and DIABLO 12. In our method, the high-dimensional and heterogeneous multi-omics data are converted into Patient Similarity Networks (PSN). Topological features are then extracted from the networks, thereby achieving dimensionality reduction. Two techniques, feature-level fusion and network-level fusion, are proposed to integrate the PSNs or topological features from different omics data. We therefore compared our method with other approaches that handle feature reduction and multi-omics integration in different ways. RGCCA and MOFA are unsupervised joint dimensionality reduction (JDR) techniques for discovering latent salient factors in omics features that can be readily fed into a downstream analysis. DIABLO is a supervised extension of sparse RGCCA that can be directly applied to a classification task.

Results
Since the dataset distribution is imbalanced, we also recorded the F1 score and ROC-AUC score, which work better on ill-distributed datasets. The results are reported as mean ± standard deviation over the different random splittings of the cross-validation. First, we performed single-omics analyses on both datasets to contrast the model performance with that after multi-omics integration. Then, for the multi-omics approaches, we recorded the results of both feature-level and network-level fusion. For network-level fusion, the PSNs of the individual omics datasets were fused with the SNF algorithm, and for feature-level fusion, the features of the single-omics PSNs were averaged or concatenated and fed into the DNN. To study the contributions of the centrality and modularity features, we also separated the two kinds of network features from the whole feature set and fed them into DNN models individually. These results are shown in Tables 1 and 2 where the feature type is centrality or modularity.
Moreover, relevance propagation was applied to the optimal models trained on all the datasets to compute the saliency scores of the input features. The input features were then removed one by one to track the performance variation, and the best performances achieved by the abridged feature sets are recorded in Tables 1 and 2 together with the feature dimensionality.
In the SEQC dataset, over 80% of the samples belong to the "alive" class, rendering the data distribution extremely imbalanced, while in the TARGET dataset the "alive" and "death" classes each take up around 50% of the samples. Under these circumstances, we decided to give priority to the F1 score when analysing the results on the SEQC dataset, since the F1 score balances precision and recall on the positive class and better measures imbalanced datasets, and to focus on accuracy for the TARGET dataset. To optimize the performance of the DNN, we applied grid search on the hyperparameters to discover the best results under specific configurations. The sizes of the hidden layers, the numbers of neurons in the layers, the batch size, and the learning rate were fixed by experimenting on the validation set. The highest F1 score (0.54±0.09) was achieved on the SEQC dataset when the DNN architecture was [8, 64, 4, 8], the learning rate 0.01, and the batch size 8. On the TARGET dataset, the best accuracy (65.1±4.7%) was obtained with the structure [4, 4, 4], learning rate 0.01, and batch size 32.
As seen from Tables 1 and 2, the experiments on the multi-omics datasets with fusion generally achieved higher F1 scores or accuracy than predictions based on single omics datasets. On the SEQC dataset, the highest accuracy (about 80%) and F1 score (around 0.54) were obtained with the feature-level fusion technique. Although the F1 score of the RNA-seq prediction was slightly better, its accuracy was not as good as that of feature-level fusion. On the TARGET dataset, the best accuracy (around 65.1%) was achieved by network-level fusion, outperforming the other techniques. This demonstrates the potential of our approach for integrating multi-omics datasets.
However, network-level and feature-level fusion behaved differently on the SEQC and TARGET datasets. Notably, the two subsets used for fusion in the SEQC dataset, i.e. RNA-seq and microarray, both belong to gene expression omics but leverage different technologies to measure the gene expression profile. In the TARGET dataset, in contrast, the RNA-seq and DNA methylation data belong to transcriptomics and epigenomics, respectively. We observed that feature-level fusion is preferred on the SEQC dataset, whereas network-level fusion performs better on the TARGET dataset. The underlying reason could be that, with the homogeneous subsets of the SEQC dataset, the features might be redundant, rendering the constructed PSNs unsuitable for integration by the SNF algorithm. In that case, combining the extracted topological features from the individual PSNs, by averaging the centrality features and concatenating the modularity features, is more suitable than network-level fusion; the machine learning algorithms can then select among the features by learning proper weights for them. In contrast, network-level fusion with SNF excels on the genuinely multi-omics subsets of the TARGET dataset. Consequently, we conclude that network-level fusion generally integrates multi-omics datasets better, while feature-level fusion is more suitable for combining two homogeneous datasets.
Moreover, on both the SEQC and TARGET datasets, models with only modularity features outperformed models with only centrality features. This illustrates that a sample's membership in the modules clustered in the PSN contributed more to clinical outcome prediction than the sample's importance in the whole PSN. Nevertheless, in most cases better performance was obtained when both centrality and modularity features were included.
Relevance propagation results. We applied Integrated Gradients as implemented in DeepExplain for computing the attributions of input features, because DeepExplain can be readily used with Tensorflow V1 models. An attribution vector was generated for each sample, forming an attribution matrix of size (n_sample, n_feature). To obtain a single saliency score for each input feature, we calculated the magnitude of its attribution values across all samples. The input features were then removed one at a time according to their saliency ranks.
In Tables 1 and 2, the Abridged feature type rows show the performance of the DNN after eliminating insignificant features. We found that, in the process of removing the input features one by one, the DNN performance did not drop until most features were eliminated. Specifically, for feature-level fusion on the SEQC dataset, when only 12 out of 233 features were preserved, the performance remained almost the same as that of the original models. For network-level fusion on the TARGET dataset, the performance was maintained until only 11 out of 57 features were left. From the perspective of the F1 score, eliminating the irrelevant features even enhanced the performance slightly on both datasets.
We then investigated these remaining features. The indices of the remaining features in the SEQC dataset are [25, 171, 125, 202, 51, 211, 118, 87, 14, 13, 15, 18], all of which represent memberships of modules clustered by the spectral or SBM algorithms. In the TARGET dataset, the indices are [35, 22, 31, 12, 46, 47, 33, 44, 5, 27, 14]. Two of the remaining features are centrality features, representing load centrality and the iterative local clustering coefficient, and the remaining nine are modularity features.

RFE performances.
In addition to DNN, we also explored the classification capabilities of several classical machine learning techniques with the network features. In the previous section, we compared the performance of DNN models with the original input feature set and the reduced feature set. In this section, we similarly compare the performances of several classical classifiers before and after Recursive Feature Elimination (RFE). RFE is a feature selection technique for linear classifiers and is explained in Algorithm 2. The feature set extracted after the proposed network-level or feature-level fusion is used for fitting the linear classifiers, and the results with and without RFE feature selection are presented in Tables 3 and 4. Table 3 shows the results on the SEQC dataset. From the perspective of the F1 score, feature-level fusion clearly outperformed network-level fusion to a great extent. The highest F1 scores, achieved with the Logistic Regression estimator, are comparable to the DNN's results at around 0.71, with 43 out of 234 features selected. In Table 4, presenting the results on the TARGET dataset, the accuracy of network-level fusion is superior to that of feature-level fusion, in accordance with the results of the DNN models. The best accuracy, around 70.1%, is also achieved by the Logistic Regression classifier.
As shown in Tables 3 and 4, RFE did not always enhance the accuracy and F1 score. In most cases, RFE generated results comparable to the baseline models after removing redundant features, while sometimes it even lowered the performance significantly (e.g. the Logistic Regression case of network-level fusion on the TARGET dataset). Therefore, we conclude that the feature saliency ranks given by the linear classifiers are not as reasonable as those given by relevance propagation of the DNN models.
Existing methods. Table 5 displays the comparison of performances obtained by our best model and other popular approaches on the SEQC dataset. The three approaches, i.e. RGCCA, MOFA, and DIABLO, all aim at reducing feature dimensionality, discovering latent factors, and integrating multi-omics data. Notably, RGCCA and MOFA are unsupervised methods, so we implemented a simple Logistic Regression model for the downstream classification analysis. DIABLO is a supervised method that provides its own function for evaluating performance. As shown, RGCCA is inclined to predict that all samples belong to an arbitrary label, which leads to unstable predictions with high standard deviation and a low F1 score. A potential reason for its poor performance could be that the selected components are difficult to determine when the input feature dimensionality is enormous. MOFA achieves a higher F1 score than our method, but its accuracy is over ten percentage points lower than ours. DIABLO also yields lower accuracy than ours and, unfortunately, does not report the F1 score, which makes it difficult to evaluate its performance on an imbalanced dataset. Overall, our method outperformed the other currently popular approaches on the SEQC dataset. Table 6 shows the comparison of results on the TARGET dataset.
The best performance of our method, given by the Logistic Regression model, significantly outperformed the three existing approaches in terms of both accuracy and F1 score. Therefore, we demonstrate that our method is effective in distilling the most important features for clinical outcome prediction in neuroblastoma.

Discussion and conclusion
We addressed two challenges for the integration and analysis of multi-omics data, heterogeneity and high dimensionality, by using network-based methods. The multi-omics data were used for building PSNs, where nodes represented patients and the nodal features of the PSNs were used to represent patients' features. This enables a huge reduction in feature dimensionality, from tens of thousands to tens. Multi-omics data are heterogeneous, but the PSNs built with different omics data are homogeneous and can be readily combined since they have similar configurations. Building PSNs from multi-omics data thus allows for both dimensionality reduction and conversion of different omics types into homogeneous networks. One limitation of this research might be that only two omics types are available in our datasets. In the future, we plan to test our approach on datasets with more omics types and evaluate the performance of their integration. We achieved about 79% and 70% accuracy on the SEQC and TARGET datasets, respectively, for clinical endpoint prediction in neuroblastoma, which is a significant improvement over the accuracies obtained with a single omics data type. This has important practical implications, as the survival rate of neuroblastoma is about 50%. Although we used only two omics types per dataset in our experiments, our methods generalize to any number of omics datasets. Our experiments showed that network-level fusion, where the integration of multi-omics datasets is achieved by fusing homogeneous networks, performed better than simply combining features of different PSNs. We used SNF for combining PSNs, but one may use other techniques such as tensor-based integration. However, when the two subsets belong to the same omics type, e.g. RNA-seq and microarray in the SEQC dataset, feature-level fusion, where the centrality features are averaged and the modularity features are concatenated, tends to generate better results.
The aim of designing the DNN relevance propagation and RFE experiments was to identify the most important network features in the input set. Among the centrality features, we discovered the algorithms most suitable for representing a node's importance in the network. Among the modularity features, we identified the essential modules that play key roles in clinical outcome prediction. Comparing the baseline feature set with the abridged feature set on the DNN, even though the performance is comparable, we realized that the majority of the extracted features are insignificant for clinical outcome prediction in neuroblastoma, while only a fraction of the network features is highly related to the task. In the future, we shall investigate how these salient features can be extracted and identified in one shot, and how they serve to predict clinical endpoints.
However, scrutinizing the RFE results, we found that the RFE strategy failed to enhance performance, and the selected features varied dramatically across different random splittings, preventing us from further exploring the selected features in a generalized way. Consequently, we believe that relevance propagation in DNN models is more rational than RFE for delving into the network features extracted from PSNs.
As the collection of multi-omics data becomes more affordable, novel approaches to omics data integration and analysis are needed. By comparing our approach with several existing methods, we demonstrated the potential of network-based approaches in an application to neuroblastoma clinical outcome prediction. One can further explore our methods on other cancers or complex diseases.