Network-based Biased Tree Ensembles (NetBiTE) for Drug Sensitivity Prediction and Drug Sensitivity Biomarker Identification in Cancer

Oskooei, Ali; Manica, Matteo; Mathis, Roland; Martínez, María Rodríguez

doi:10.1038/s41598-019-52093-w

Published: 04 November 2019

Scientific Reports volume 9, Article number: 15918 (2019)

Abstract

We present the Network-based Biased Tree Ensembles (NetBiTE) method for drug sensitivity prediction and drug sensitivity biomarker identification in cancer using a combination of prior knowledge and gene expression data. Our devised method consists of a biased tree ensemble that is built according to a probabilistic bias weight distribution. The bias weight distribution is obtained by assigning high weights to the drug targets and propagating the assigned weights over a protein-protein interaction network such as STRING. The propagation of weights defines neighborhoods of influence around the drug targets and as such simulates the spread of perturbations within the cell following drug administration. Using a synthetic dataset, we showcase how application of biased tree ensembles (BiTE) results in significant accuracy gains at a much lower computational cost compared to the unbiased random forests (RF) algorithm. We then apply NetBiTE to the Genomics of Drug Sensitivity in Cancer (GDSC) dataset and demonstrate that NetBiTE outperforms RF in predicting IC50 drug sensitivity only for drugs that target membrane receptor pathways (MRPs): RTK, EGFR and IGFR signaling pathways. Based on the NetBiTE results, we propose that for drugs that inhibit MRPs, the expression of target genes prior to drug administration is a biomarker for IC50 drug sensitivity following drug administration. We further verify and reinforce this proposition through control studies on PI3K/MTOR signaling pathway inhibitors, a drug category that does not target MRPs, and through assignment of dummy targets to MRP-inhibiting drugs and investigation of the resulting variation in NetBiTE accuracy.

Introduction

There is strong evidence that the tumor’s genetic and epigenetic makeup can influence the outcome of anti-cancer drug treatments^1,2,3,4,5, resulting in heterogeneity in patient clinical response to therapeutic drugs¹. This varied clinical response has led to the promise of personalized (or precision) medicine in cancer, where molecular biomarkers, e.g. gene expression, obtained from a patient’s tumor profiling may be used to design a personalized course of treatment. Targeted treatments have been shown to improve survival rates, for instance, in treating chronic myeloid leukemia (BCR–ABL) and malignant melanoma (BRAF)^6,7. Despite these success stories, variability in drug response still remains an open challenge and the link between genetic and epigenetic alterations and drug response is not appropriately characterized for a large number of cancer drugs^8,9. As large datasets emerge containing genetic profiles of tumors and their associated drug sensitivity, there is a need for computational methods that can effectively harness the available data and link genetic profiles with drug sensitivity through identification of important biomarkers^{10,11,12,13,14,15}.

The Sanger Institute’s Genomics of Drug Sensitivity in Cancer (GDSC) database is a vast resource of over 200 cancer compounds screened with over a thousand genetically profiled pan-cancer cell lines⁸. The dataset has been of particular interest for drug sensitivity prediction and biomarker identification efforts^{1,10,16,17,18,19,20}. These include a number of works employing quantitative, statistical and machine learning methods such as: cell line-similarity and drug-similarity based models²¹, multilevel mixed effect models using all drug-cell line combinations²², quantitative structure-activity relationship (QSAR) analysis using kernelized Bayesian matrix factorization²³, lasso and elastic net models for drug sensitivity prediction and target identification^10,24,25, collaborative filtering based methods for drug sensitivity prediction^26,27, as well as logic models for predictor identification²⁸.

In this work, we introduce a novel machine learning method that enables us to predict IC50 values and identify informative predictors for drug sensitivity using the GDSC dataset. Our approach is based on constructing a biased tree ensemble, where bias is elaborately designed to recapitulate the prior knowledge of drug targets and their high-confidence biomolecular interactions extracted from the STRING database of molecular interactions²⁹. Tree ensemble methods^30,31 such as the popular random forests (RF)³² algorithm consist of an aggregation of decision trees and are suitable for dealing with high-dimensionality³³ (i.e. small number of samples and large number of features) that is often encountered in biomolecular datasets. In addition, unlike regularized linear methods such as lasso and elastic net³⁴, regression trees can capture non-linear relationships. Furthermore, tree ensemble methods are robust and have few tuning parameters (number of trees, mtry, and tree depth) and as such are easy to train. Due to these favorable attributes, tree ensembles, and in particular the random forests algorithm, have been used extensively for the analysis of biomolecular data^{35,36,37,38,39,40}.

In this paper, we first introduce the Biased Tree Ensembles (BiTE) approach, where the classification and regression trees (CART)⁴¹ are constructed according to prior knowledge. Unlike random tree ensembles (i.e., RF) in which all features have an equal probability of being selected as split variables in a tree (Fig. 1A), in BiTE, features that are more important or informative according to the available prior knowledge are given a higher probability (Fig. 1B). We demonstrate that BiTE is a more transparent and interpretable algorithm compared to RF, as it is immediately clear which set of features contributed the most to the model performance. For instance, if a set of features results in BiTE’s loss of accuracy, it can be deduced that the features were uninformative predictors; conversely, an improved accuracy can be attributed to the set of features towards which we biased the model. In this manner, BiTE may be used to examine the predictive power of various features in a transparent and controllable manner.

Building upon BiTE, we propose the Network-based Biased Tree Ensembles (NetBiTE) algorithm, where two layers of prior knowledge – instead of one in BiTE – are fed into the model. First, drug target proteins are determined from drug databases and the literature and are assigned an initial bias weight. Second, the initial bias weights are propagated over STRING, a protein-protein interaction (PPI) network comprising the entire gene set. A number of network-based methods have been previously put forward that take advantage of PPI networks in combination with biomolecular profiles of cells, in order to identify subnetworks that represent a pathway or a functional complex^42,43. Network propagation (or diffusion) over PPI networks has been previously used to identify pathways, subnetworks or associations that represent a disease, a tumor type or a patient^44,45,46. Network propagation in essence defines a “neighborhood of influence” surrounding an entity of interest, for instance a mutated gene⁴⁶. NetBiTE utilizes network diffusion over the STRING PPI network in order to establish a neighborhood of influence surrounding the drug target proteins and construct tree ensembles that are biased towards this neighborhood. Even though several modified random forests or tree ensemble algorithms have been previously proposed^47,48,49,50, to the best of our knowledge, NetBiTE is the first algorithm in which multiple layers of prior knowledge are quantitatively and systematically combined and utilized in constructing biased tree ensembles.

In the following sections, we demonstrate that BiTE and NetBiTE outperform RF in predicting IC50 drug sensitivity using both a synthetic dataset and the GDSC dataset. In addition, we showcase how NetBiTE in conjunction with the GDSC dataset and the STRING PPI network can identify important biomarkers for drug sensitivity.

The organization of this paper is as follows. In section 2.1, we compare BiTE versus RF using a synthetic dataset. We showcase that BiTE can achieve superior performance and stability at a significantly lower computational cost. In section 2.2, we apply NetBiTE to the GDSC data for a panel of 50 cancer drugs and compare the predictive performance with that of RF. We demonstrate that NetBiTE achieves significant accuracy gains over RF for drugs that inhibit membrane receptor pathways (MRPs), suggesting that the expression of their reported target genes is an informative biomarker for drug sensitivity. We further investigate this hypothesis by studying all drugs within the GDSC database that target MRPs and by performing two control experiments. In section 3, we discuss the possible reasons behind our observations in the context of prior findings related to the role of oncogenes in cancer development as well as drug sensitivity and resistance.

Methods and Materials

Dataset

Throughout this work we made use of the gene expression and drug IC50 data publicly available as part of the Genomics of Drug Sensitivity in Cancer (GDSC) database. The Genomics of Drug Sensitivity in Cancer Project is part of a collaboration between The Cancer Genome Project at the Wellcome Trust Sanger Institute (UK) and the Center for Molecular Therapeutics, Massachusetts General Hospital Cancer Center (USA)^1,2.

The dataset includes screening results of more than a thousand genetically profiled human pan-cancer cell lines with a wide range of anti-cancer compounds (265 compounds as of the writing of this paper). The screened compounds include chemotherapeutic drugs as well as targeted therapeutics from various sources². We based our models on gene expression data as it has been shown to be more predictive of drug sensitivity in comparison to genomics (i.e., copy number variation and mutations) or epigenomics (i.e., methylation) data⁵¹.

The synthetic dataset used in the case study of Fig. 2 was created by randomly selecting a subset of 200 cell lines and 200 genes from the standardized RMA basal expression profiles within the GDSC database. Synthetic IC50 values were generated from two randomly selected “drug target” genes according to the following fifth-degree relationship:

$$IC{50}_{synthetic}=a{X}_{i}^{5}+b{X}_{j}^{5}+c{X}_{i}^{3}{X}_{j}^{2}+d{X}_{i}^{2}{X}_{j}^{3},\quad i,j\in \{n\in {\mathbb{N}}:n < 200\}$$

(1)
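As a minimal sketch of how Eq. 1 could be evaluated, the snippet below generates synthetic IC50 values from two randomly chosen target columns. The coefficients a, b, c and d are not given in the text, so the default values of 1.0 are an assumption, and the Gaussian matrix is only a stand-in for the standardized GDSC expression profiles.

```python
import random

def synthetic_ic50(expr, i, j, a=1.0, b=1.0, c=1.0, d=1.0):
    """Evaluate Eq. 1 for every cell line (row) of an expression matrix.

    The coefficient defaults are illustrative assumptions, not values
    taken from the paper.
    """
    values = []
    for row in expr:
        xi, xj = row[i], row[j]
        values.append(a * xi**5 + b * xj**5 + c * xi**3 * xj**2 + d * xi**2 * xj**3)
    return values

rng = random.Random(0)
n_cells, n_genes = 200, 200
# Stand-in for standardized expression values (random Gaussian, not GDSC data).
expr = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]
i, j = rng.sample(range(n_genes), 2)  # two distinct "drug target" genes
ic50 = synthetic_ic50(expr, i, j)
```

Because only columns i and j enter the formula, the remaining 198 genes act as uninformative features, which is what makes the dataset a useful test of biased split selection.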

A panel of 50 cancer compounds that had been screened with the highest number of cell lines (highest number of samples, n = 883) was selected for the initial characterization (Supplementary Fig. S1) and RF versus NetBiTE comparison studies (Fig. 3). The drugs in the studied panel belonged to twelve different categories of drugs (Fig. 3C).

Studies on individual drug categories were performed for four drug categories targeting four different signaling pathways. The four drug categories, the number of cell lines screened (n) within each category, and the number of genes within the dataset (m) are: RTK signaling pathway inhibitors (RSPIs; 21 drugs, n = 310, m = 15698), EGFR signaling pathway inhibitors (ESPIs; 7 drugs, n = 309, m = 15698), PI3K/MTOR signaling pathway inhibitors (PMSPIs; 21 drugs, n = 238, m = 15698) and IGFR signaling pathway inhibitors (ISPIs; 4 drugs, n = 820, m = 15698). All RMA basal expression profiles were standardized and IC50 drug response data were scaled between zero and one according to the following equation,

$$IC{50}_{n}=\frac{IC50-IC{50}_{min}}{IC{50}_{max}-IC{50}_{min}}$$

(2)

where IC50_max and IC50_min are the maximum and minimum IC50s of a given drug across all tested cell lines.
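The two preprocessing steps can be sketched as follows. The min-max function follows Eq. 2 directly; the text does not spell out how the expression profiles were standardized, so the z-score used here (zero mean, unit population standard deviation per gene) is an assumption.

```python
from statistics import mean, pstdev

def min_max_scale(ic50_values):
    """Scale the IC50 values of one drug to [0, 1] as in Eq. 2."""
    lo, hi = min(ic50_values), max(ic50_values)
    if hi == lo:
        raise ValueError("all IC50 values are identical; Eq. 2 is undefined")
    return [(v - lo) / (hi - lo) for v in ic50_values]

def standardize(expression_values):
    """Z-score one gene across cell lines (assumed form of standardization)."""
    mu, sigma = mean(expression_values), pstdev(expression_values)
    if sigma == 0:
        raise ValueError("constant expression profile cannot be standardized")
    return [(v - mu) / sigma for v in expression_values]

scaled = min_max_scale([2.0, 4.0, 6.0])
z = standardize([1.0, 2.0, 3.0])
```

Note that the scaling is applied per drug, so IC50_min and IC50_max are taken over the cell lines tested with that drug only.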

Methods

We have developed a new methodology for drug sensitivity prediction using a combination of gene expression data and prior knowledge. Our Biased Tree Ensemble (BiTE) algorithm consists of an ensemble of CART trees that are constructed according to a bias distribution that represents prior knowledge about the drug targets and their interactions and associations with other genes within the dataset. In the popular random forests algorithm (RF), a subset (d < m) of the total number of features is randomly selected and used to determine the next split variable that minimizes the variance (for regression) or Gini impurity (for classification)³². For datasets with a large number of features, such as gene expression data, the optimization process can become computationally demanding³⁴.

Unlike the RF algorithm, in which all features have an equal chance of being selected, in our method we assign a weight to each feature (i.e., gene) that controls the probability of the feature being selected as a split variable. The variable selection scheme in BiTE biases the selection of split variables towards a specified subset of variables (e.g. drug targets) that are known to be relevant or informative from prior knowledge. In addition to significantly reducing the computational cost, injection of informative prior knowledge renders the model more transparent and interpretable. Hence, BiTE can be used to verify the importance of a set of features by assigning a higher selection weight to the said features and investigating the variation in the model accuracy as compared to a model with uniform weights. Improvement or deterioration of the model performance, compared to the unbiased model, would indicate the predictive power of the studied feature set or lack thereof. Using this unique feature of BiTE, one can, for instance, identify biomarkers for cancer drug sensitivity, as we demonstrate in this work.
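To illustrate the effect of biased split-variable selection, the sketch below draws mtry distinct candidate features with probability proportional to their weights. This proportional sampling without replacement is an illustrative assumption about how the weights act; it is not claimed to reproduce the internals of the implementation used in this work. The target indices are hypothetical.

```python
import random

def draw_split_candidates(weights, mtry, rng):
    """Draw `mtry` distinct feature indices, proportionally to their weights."""
    if mtry > len(weights):
        raise ValueError("mtry cannot exceed the number of features")
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(mtry):
        pick = rng.choices(remaining, weights=[weights[k] for k in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # sample without replacement
    return chosen

rng = random.Random(0)
n_genes = 200
targets = {3, 17}  # hypothetical indices of the two "drug target" genes
weights = [0.6 if g in targets else 1e-5 for g in range(n_genes)]

# Count how often both candidates at a node are exactly the two targets.
hits = sum(
    set(draw_split_candidates(weights, 2, rng)) == targets for _ in range(1000)
)
```

With these weights, almost every draw of two candidates returns the two targets, whereas uniform weights would pick both targets only rarely. This is why a biased ensemble can reach its best accuracy with a very small mtry.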

To construct the bias weight distribution to be ingested into the algorithm, we assign a high bias weight (e.g., W = 0.6 or W = 1, where W = 0 means never select and W = 1 means always select) to the reported drug target genes, while assigning a very small positive weight (ε = 1e-5) to all other genes (Figs 2B and 3A). For drug sensitivity prediction studies, we implement an additional step of propagating the initial weights distribution (W₀) over the STRING protein-protein-interaction (PPI) network. The STRING network contains protein-protein interaction information from various sources, including high-throughput lab experiments, computational predictions, automated text-mining of public text collections and information from additional databases²⁹. By propagating the initial weights over STRING, we smoothen the initial weights over a prior knowledge-based PPI network and thereby, introduce an additional layer of prior knowledge to our algorithm’s weighting scheme. Our Network-based Biased Tree Ensembles (NetBiTE) algorithm is a result of combining BiTE with network propagation of bias weights.
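The initial weight vector W₀ described above could be assembled as in the following sketch. The gene list and the target set are placeholders chosen for illustration and do not correspond to a specific drug in the study.

```python
def initial_weights(genes, targets, w_target=1.0, eps=1e-5):
    """Return W0: `w_target` for reported targets, `eps` for all other genes."""
    target_set = set(targets)
    missing = target_set - set(genes)
    if missing:
        raise KeyError(f"targets not present in the gene set: {sorted(missing)}")
    return [w_target if gene in target_set else eps for gene in genes]

# Placeholder gene set and hypothetical target list.
genes = ["EGFR", "ERBB2", "KIT", "TP53"]
w0 = initial_weights(genes, targets=["EGFR", "ERBB2"])
```

Keeping ε strictly positive means that non-target genes remain selectable, so the ensemble is biased towards the targets rather than restricted to them.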

The network smoothening of the weights can be described as a random walk and propagation of the initial drug target weights throughout the network (Fig. 3A). Let us denote the initial weights as W₀ and the STRING network as S = (P, E, A), where P are the protein vertices of the network, E are the edges between the proteins and A is the weighted adjacency matrix. The adjacency matrix weights indicate the level of confidence that a certain interaction exists. The smoothened weights are determined from an iterative solution of the propagation function^44,45,52:

$${W}_{t+1}=\alpha {W}_{t}A^{\prime} +(1-\alpha )\,{W}_{0},\,A^{\prime} ={D}^{-\frac{1}{2}}A{D}^{-\frac{1}{2}},$$

(3)

where A′ is the normalized adjacency matrix – an adjacency matrix where the weight for each edge is normalized by the degrees of its end points – resulting in the probability of an edge existing between two nodes in a random walk over the network. D is a diagonal matrix with the row sums of the adjacency matrix on the diagonal. The diffusion tuning parameter, α (0 < α < 1), defines the distance that the prior knowledge weights can diffuse through the network. The higher α is, the more smoothened the resulting weights distribution will be. The optimal value of α depends on the network and is reported to be 0.7 for the STRING network⁴⁴. We further investigated the role of α and confirmed that for our drug sensitivity prediction studies α = 0.7 indeed results in the most desirable model performance (see Supplementary Section S2 and Fig. S2). Adopting a convergence rule of e = (W_{t+1} − W_t) < 1e-6, we solved Eq. 3 iteratively for each drug and used the resultant bias weights distribution, W_s, in building a biased tree ensemble for IC50 prediction.
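The iterative solution of Eq. 3 can be sketched on a toy network as follows. The four-node path graph and its confidence scores are invented for illustration, and measuring convergence by the largest absolute change between iterations is an assumption about how the convergence rule is applied to a weight vector.

```python
import math

def normalize_adjacency(A):
    """Return A' = D^{-1/2} A D^{-1/2}, with D the diagonal of row sums of A."""
    degree = [sum(row) for row in A]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    n = len(A)
    return [[inv_sqrt[r] * A[r][c] * inv_sqrt[c] for c in range(n)] for r in range(n)]

def propagate(w0, A, alpha=0.7, tol=1e-6, max_iter=1000):
    """Iterate W_{t+1} = alpha * W_t A' + (1 - alpha) * W_0 until convergence."""
    A_norm = normalize_adjacency(A)
    n = len(w0)
    w = list(w0)
    for _ in range(max_iter):
        # Row vector times matrix: (W_t A')_c = sum_r W_t[r] * A'[r][c].
        w_next = [
            alpha * sum(w[r] * A_norm[r][c] for r in range(n)) + (1 - alpha) * w0[c]
            for c in range(n)
        ]
        # Assumed convergence measure: maximum absolute change per gene.
        if max(abs(x - y) for x, y in zip(w_next, w)) < tol:
            return w_next
        w = w_next
    raise RuntimeError("propagation did not converge")

# Toy path graph 0 - 1 - 2 - 3 with made-up interaction confidences.
A = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.8, 0.0],
    [0.0, 0.8, 0.0, 0.7],
    [0.0, 0.0, 0.7, 0.0],
]
w0 = [1.0, 1e-5, 1e-5, 1e-5]  # node 0 plays the role of the drug target
w_s = propagate(w0, A, alpha=0.7)
```

In this toy example the target's weight drops below one while its neighbors receive weights that decay with distance from the target, which is the neighborhood-of-influence behavior described in the text.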

The propagation of the initial drug target weights over STRING simulates the propagation of perturbations within the cell following the drug administration. The biological linkage between the drug targets and their associated effectors is simulated via a random walk-based propagation of initial target weights over the PPI network, resulting in the spreading of the weights to the neighborhood of influence of a drug target, i.e. the set of high-confidence neighbors of the drug target.
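As a cross-check on the iterative scheme, the fixed point of Eq. 3 can also be written in closed form. Setting W_{t+1} = W_t = W_s in Eq. 3 and rearranging gives:

$${W}_{s}=\alpha {W}_{s}A^{\prime} +(1-\alpha )\,{W}_{0}\;\Rightarrow \;{W}_{s}\,(I-\alpha A^{\prime} )=(1-\alpha )\,{W}_{0}\;\Rightarrow \;{W}_{s}=(1-\alpha )\,{W}_{0}\,{(I-\alpha A^{\prime} )}^{-1}$$

Assuming a symmetric adjacency matrix with non-negative edge weights, the eigenvalues of A′ lie in [−1, 1], so for 0 < α < 1 the matrix I − αA′ is invertible and the iteration converges to this unique solution. For a network with many thousands of nodes, iterating Eq. 3 is generally cheaper than forming the inverse explicitly.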

In all IC50 prediction studies with NetBiTE and RF, we used a number of trees of n_tree = 500 and a target partition size (TPS) of TPS = 1, as suggested by our tuning parameter analysis (see Supplementary Section S1). In addition, mtry was set to the number of reported targets for each drug. To evaluate the model performance, thirty to forty-fold cross validation (30 ≤ k ≤ 40) was used, based on the size of the dataset. Data samples were divided into k parts and at each step of the computation, one fraction of the data was left out as a test sample for model evaluation while the model was trained on the remaining data samples. This process was repeated until all subsets of the dataset were used once as test samples. The model performance was evaluated by determining the Pearson correlation, or the square root of the coefficient of determination ($\rho =\sqrt{{R}^{2}}$) between the actual IC50 values and the predicted ones^51,53. In IC50 prediction experiments, each complete round of computation was repeated 10 times and the mean of 10 trials was used as the representative model prediction. The repetition and averaging were used to minimize fluctuations inherent in all algorithms based on random sampling, e.g. RF and, to a lesser degree, NetBiTE (see Fig. 2 for comparison).
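The evaluation protocol can be sketched as repeated k-fold cross-validation scored with the Pearson correlation. Two points below are assumptions rather than details stated in the text: the correlation is computed over the pooled out-of-fold predictions, and the model is passed in as a hypothetical `fit_predict` callable. The one-feature least-squares model is only a stand-in so that the sketch runs without a tree ensemble library.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def k_fold_indices(n, k, rng):
    """Shuffle sample indices and split them into k disjoint test folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_validated_rho(X, y, fit_predict, k, rng):
    """Collect out-of-fold predictions and score them against the true values."""
    preds = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k, rng):
        held_out = set(test_idx)
        train_idx = [t for t in range(len(y)) if t not in held_out]
        y_hat = fit_predict(
            [X[t] for t in train_idx], [y[t] for t in train_idx],
            [X[t] for t in test_idx],
        )
        for t, p in zip(test_idx, y_hat):
            preds[t] = p
    return pearson(y, preds)

def repeated_cv(X, y, fit_predict, k=30, repeats=10, seed=0):
    """Average the cross-validated correlation over several shuffled repeats."""
    rng = random.Random(seed)
    return mean(cross_validated_rho(X, y, fit_predict, k, rng) for _ in range(repeats))

def one_feature_ols(train_X, train_y, test_X):
    """Stand-in model (hypothetical): least-squares line on the first feature."""
    xs = [row[0] for row in train_X]
    mx, my = mean(xs), mean(train_y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, train_y)) / sum(
        (a - mx) ** 2 for a in xs
    )
    return [my + slope * (row[0] - mx) for row in test_X]

data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0)] for _ in range(90)]
y = [2.0 * row[0] + data_rng.gauss(0.0, 0.1) for row in X]
rho = repeated_cv(X, y, one_feature_ols, k=30, repeats=10)
```

Swapping `one_feature_ols` for a function that trains a biased or unbiased tree ensemble would give the ρ values compared throughout the Results.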

For computations performed in this work, we used the Ranger random forests implementation in C++⁵⁴ as well as the Scikit-learn random forests implementation⁵⁵. The Ranger implementation accepts a split weights vector as an input and enables biased selection of split variables, which is necessary for NetBiTE computations. Data preprocessing, analysis and postprocessing were performed entirely in Python 3.5 and the plots and graphs were generated using the matplotlib Python library⁵⁶.

Results

Comparing BiTE with RF using a synthetic dataset

We performed a case study with a small synthetic dataset (200 × 200) to highlight the key differences and advantages of BiTE over RF in the context of drug sensitivity prediction. We created a synthetic dataset by extracting a subset of the GDSC RMA-normalized basal expression profiles; synthetic IC50 values were generated using a nonlinear drug sensitivity function based on two randomly selected targets (Fig. 2A).

Using the synthetic dataset, we compared the performances of four algorithms: (i) standard random forests (RF); (ii) BiTE with a weight of W = 0.6 and a small positive weight (ε = 1e-5) given to the two artificial drug targets and all other genes respectively – where W = 0 means a feature is never selected and W = 1 means it is always selected (see Fig. 2B); (iii) XGBoost⁵⁷ tree boosting algorithm; and (iv) linear regression (LR) over the two target genes. The performances of all four methods are compared in Fig. 2C–F.

In the case of RF, BiTE and XGBoost, we studied the model performance for various n_tree ranging from 10 to 5000 trees. In addition, in the case of BiTE and RF, we varied mtry for each n_tree from 2 (number of targets) to 200 (the total number of genes). As expected, for all investigated n_tree, RF achieves the highest performance when mtry reaches its maximum value of 200, as the model can test all available genes and pick the most informative. It is noteworthy that BiTE achieves the same maximal performance (ρ ≈ 0.9) and stability with a minimal mtry (mtry = 2) even at tree numbers as low as n_tree = 10, which represents a significant saving in computational cost. XGBoost significantly outperforms RF at low n_tree and mtry but is consistently outperformed by BiTE at both low and high numbers of trees, highlighting the importance of the informative prior knowledge accessible to BiTE.

In all four subplots (Fig. 2C–F), we have compared RF, XGBoost and BiTE with a linear regression model (the black dotted lines). As we have adopted a nonlinear drug sensitivity function, we do not expect LR to perform well; however, it is insightful to include LR performance as a baseline. A conclusion that can be immediately drawn from the plots is that for very low mtry or n_tree, RF can underperform LR, even though the former is an inherently nonlinear model that should perform better in capturing a nonlinear relationship.

It is noteworthy that the results in Fig. 2 have been obtained using an ideal dataset with well-defined targets and IC50 function, resulting in an outstanding predictive performance using BiTE. Such significant improvement in model performance is not to be expected when using real IC50 data such as the GDSC data, where the informative predictors and the dependency of the IC50 values on these predictors is only partially known. However, if high quality prior knowledge is available, we do expect to see a clear improvement in model performance, stability and computational running times when compared to RF and XGBoost, as shown in Fig. 2. In the coming sections, we will further investigate whether our findings with the synthetic data can be generalized to real drug sensitivity data.

Drug sensitivity prediction using network-based NetBiTE

We applied our network-based Biased Tree Ensembles (NetBiTE) to the GDSC data for a panel of 50 cancer drugs. Unlike BiTE (section 2.1), where the bias weights are assigned solely based on our knowledge of drug targets (the first layer of prior knowledge), NetBiTE propagates the initial weight distribution over a network of molecular interactions (second layer of prior knowledge). Weight propagation results in a smoothened weight distribution where neighbors of known drug targets are also assigned a high weight (Fig. 3A).

As shown in Fig. 3A, for each drug in the studied panel of 50 drugs, a weight of one was given to each drug target and the resulting weight vector (W₀) was propagated over the STRING PPI network²⁹. As described in detail in section 2.2, the smoothening propagates the initial weights over the network resulting in lower weights for the targets (W_s,t < 1) and positive weights for all other genes within the network (W_s,I > 0). The smoothened weight distribution (W_s) is then fed to BiTE and the IC50 values for each drug are estimated.

In Fig. 3B, we study the effect of the tuning parameter, α, that controls the diffusion depth of prior knowledge within the network, i.e. how much the known targets influence their neighbors. We compare NetBiTE’s predictive accuracy with the optimal α = 0.7, as reported in the literature⁴⁴, and a much smaller value of α = 0.02. As shown in the plot, NetBiTE achieved a much higher accuracy with α = 0.7 (ρ ≈ 0.5) than with α = 0.02 (ρ ≈ 0.26). Hence, we adopted α = 0.7 in the remaining NetBiTE models. Figure 3C shows the histogram for the drug categories within the panel of 50 studied drugs. Drugs targeting the PI3K/MTOR pathway, “Other” category, “Other Kinases” and drugs targeting the RTK signaling pathway are most represented. The varying representation in Fig. 3C is a result of the non-uniformity in the representation of various drug categories within the GDSC database.

The comparison between the predictive performance of NetBiTE and RF applied to the panel of 50 drugs is shown in Fig. 3D. At first glance the NetBiTE algorithm does not appear to be more accurate than RF. A closer inspection, however, points out a high variation in accuracy gain with NetBiTE across different drug categories. As shown in Fig. 3E, within the studied panel of 50 drugs, the majority of drugs that target RTK or EGFR signaling pathways show a noticeable improvement in prediction accuracy using NetBiTE while the other drug categories are either worsened or primarily unaffected by the use of NetBiTE. NetBiTE IC50 predictions for RTK and EGFR signaling pathway inhibitors experience the most frequent (60% and 50% of the drugs respectively) and significant improvements in accuracy with RTK inhibitors exhibiting a 25% improvement (Δρ = 0.12) and EGFR inhibitors, a 30% improvement (Δρ = 0.14). It is noteworthy that both RTK signaling and EGFR signaling pathways are membrane receptor pathways, suggesting a trend that we will investigate in the coming sections. The other drug categories experienced infrequent or insignificant improvements that do not reflect a statistically significant trend. Our preliminary observation is that drugs that target membrane receptor proteins achieve higher IC50 prediction accuracy with NetBiTE than with RF, suggesting that the expression of their target genes is an informative biomarker for IC50 drug sensitivity.

To further investigate this hypothesis, we extracted all membrane receptor pathways inhibitors (MRPIs) within the GDSC database. The extracted drugs were inhibitors of three pathways: RTK signaling (21 drugs), EGFR signaling (7 drugs) and IGFR signaling (4 drugs) pathways. We then applied NetBiTE to the data for these three drug categories.

Figure 4A,B show the NetBiTE and RF results for RTK signaling pathway inhibitors (RSPIs). As demonstrated in Fig. 4A, NetBiTE is superior to RF particularly at lower tree numbers, as expected from the case study in 2.1. The difference between the prediction accuracies of NetBiTE and RF for 10 trees is statistically significant (p-value of 0.04 in a two-sided t-test). In Fig. 4B the difference in accuracies of NetBiTE and RF (Δρ = ρ_NB − ρ_RF) is plotted for each RSPI drug. As shown, the majority of RSPI drugs fall in the positive region, meaning their IC50 prediction using NetBiTE was more accurate than the RF prediction. 70% of RSPI drugs show improvement with NetBiTE for all numbers of trees, with a maximum improvement of 0.3 for Linifanib. These results are in line with our earlier observations in Fig. 3E, where 60% of the RSPI drugs, in a panel of 50, showed improvement in predictive performance using NetBiTE.
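The per-drug accuracy gain Δρ = ρ_NB − ρ_RF and the share of improved drugs can be computed as in the short sketch below. The drug names and correlation values are hypothetical placeholders, not results from this study.

```python
def accuracy_gains(rho_netbite, rho_rf):
    """Return delta-rho = rho_NB - rho_RF for every drug."""
    return {drug: rho_netbite[drug] - rho_rf[drug] for drug in rho_netbite}

def fraction_improved(gains):
    """Share of drugs with a strictly positive accuracy gain."""
    return sum(1 for g in gains.values() if g > 0) / len(gains)

# Hypothetical correlations for four placeholder drugs.
rho_netbite = {"drug_A": 0.55, "drug_B": 0.40, "drug_C": 0.62, "drug_D": 0.30}
rho_rf = {"drug_A": 0.45, "drug_B": 0.42, "drug_C": 0.50, "drug_D": 0.25}

gains = accuracy_gains(rho_netbite, rho_rf)
improved = fraction_improved(gains)
```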

Figure 4C,D show the results for EGFR signaling pathway inhibitors (ESPIs). As shown in Fig. 4C, similarly to RSPI drugs, ESPI drugs show a noticeable improvement in prediction accuracy using NetBiTE. Again, the improvement is more significant at lower tree numbers. Individual improvements in accuracy (Δρ = ρ_NB − ρ_RF) are shown in Fig. 4D. Of the seven ESPI drugs, five show improvement with NetBiTE and four show consistent improvement across all n_tree.

The last category of MRPI drugs in the GDSC dataset comprises the drugs targeting the IGFR signaling pathway. Although there were only four drugs in this category, for the sake of thoroughness, we applied NetBiTE to these four drugs and compared the results with RF. The comparison between NetBiTE and RF performances for IGFR signaling pathway inhibitors (ISPIs) is shown in Fig. 4E. The results are similar to those shown before: accuracy improvement that is particularly significant at lower tree numbers. Individual gains in predictive performance (Δρ = ρ_NB − ρ_RF) are shown in Fig. 4F. Three out of four drugs show a mean positive accuracy improvement across all tree numbers. However, due to the small number of drugs in this category we could not perform a meaningful statistical test to compare the two methods. The results, however, do appear to be in line with the two other MRPI categories of drugs.

By investigating the targets for the groups of drugs that show improved or worsened accuracy (Δρ > 0 or Δρ < 0) with NetBiTE, we observed that certain genes tend to appear more frequently in the targets of either the improved or worsened groups. As shown in Supplementary Fig. S3, in the case of RTK signaling pathway inhibitors (RSPIs), KIT, FLT3 and PDGFRB were the most frequent targets in the improved-accuracy group while KDR, TGFBR1 and FGFR1 were the most frequent targets in the worsened group of RSPI drugs. For ESPI drugs, ERBB2 was a target exclusively in the improved group and in the case of ISPI drugs, INSRR was a target exclusively in the drugs with improved accuracy.

The results in Figs 3 and 4 consistently indicate that membrane receptor pathway inhibitors are more accurately predicted with NetBiTE. We hypothesize that the target genes for these drugs are more informative biomarkers of drug sensitivity than the targets of other drug categories. To further verify this hypothesis, we performed two control experiments: In one experiment, we compared NetBiTE and RF for cancer drugs that target a pathway that is not a membrane receptor pathway. We chose PI3K/MTOR signaling pathway inhibitors (PMSPIs), as PI3K/MTOR signaling pathway is a non-membrane tyrosine kinase pathway with a central role in cancer development⁵⁸. Moreover, PMSPIs are well represented in the GDSC database with 21 screened compounds. The results for this control experiment are shown in Fig. 5A,B. As shown in Fig. 5A, NetBiTE performs poorly across all n_tree when compared to RF. Individual accuracy gains (Δρ = ρ_NB − ρ_RF) shown in Fig. 5B, are predominantly negative with only three of the 21 drugs showing a slight positive improvement (Δρ < 0.05) in accuracy with NetBiTE. The observed results suggest that for PMSPI drugs, biasing the NetBiTE model towards the reported drug targets did not result in any accuracy gain, which in turn suggests that the expression of drug target genes for this category of drugs is not an informative biomarker for drug sensitivity.

In a second control experiment we selected two RSPI drugs: one that shows significant improvement in IC50 prediction accuracy with NetBiTE (Linifanib) and one that shows significantly worsened prediction accuracy with NetBiTE (PD173074). We then replaced the target genes for these two drugs with an equal number of randomly selected genes and applied NetBiTE to these new “dummy” drug targets. The results for this experiment are shown in Fig. 5C,D. Figure 5C shows that the use of dummy drug targets results in a reduced model accuracy that falls even below the RF accuracy for Linifanib. The results suggest that the randomly selected drug targets were uninformative, resulting in a poor predictive performance less than half of the prediction accuracy of NetBiTE with correct targets (ρ = 0.3 and ρ = 0.7 respectively). On the other hand, in Fig. 5D, the randomly assigned targets for PD173074 resulted in a prediction accuracy similar to that of RF. The results suggest that for PD173074, the drug targets reported in the literature were uninformative predictors for drug sensitivity and, as such, resulted in an inferior predictive performance when compared to randomly selected features.
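The dummy-target substitution can be sketched as follows. Whether the reported targets were excluded from the random draw is not stated in the text, so excluding them here is an assumption, and the gene identifiers are placeholders.

```python
import random

def dummy_targets(genes, true_targets, rng):
    """Pick as many random non-target genes as there are reported targets."""
    excluded = set(true_targets)
    pool = [g for g in genes if g not in excluded]  # assumed: true targets excluded
    return rng.sample(pool, len(excluded))

rng = random.Random(0)
genes = [f"GENE_{k:03d}" for k in range(100)]        # placeholder gene set
reported = ["GENE_004", "GENE_017", "GENE_042"]      # hypothetical reported targets
dummies = dummy_targets(genes, reported, rng)
```

The dummy list then replaces the reported targets when the initial weights are assigned, so that any change in accuracy can be attributed to the identity of the targets rather than to their number.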

Generally speaking, these results suggest that for those drugs on which NetBiTE performs poorly in predicting IC50 (i.e. non-responsive drugs), the expression of the drug target genes reported in the literature is not an informative biomarker for IC50 drug sensitivity prediction. In the following section, we attempt to shed light on the reasons behind our observations in the context of cancer biology and the mechanisms of drug sensitivity and resistance. Our preliminary explanation is that, for drugs on which NetBiTE performs poorly, the mechanism of action may involve a complex biomolecular cascade that is independent of the expression of the target genes prior to drug administration. As a result, for these drugs there is no association between target gene expression and IC50 drug sensitivity that machine learning techniques such as NetBiTE can identify.

Conversely, for drugs on which NetBiTE achieves high accuracy gains over RF in predicting IC50 (i.e. responsive drugs), the expression of target genes prior to drug administration appears to be a strong biomarker for IC50 drug sensitivity following drug administration. The results in Fig. 5 further confirm that the observed accuracy improvements with NetBiTE for MRPI drugs are due to the selection of highly informative genes and are not a result of random events, as substitution of the known drug targets by random dummy targets resulted in the loss of NetBiTE accuracy gains.

Discussion

We observed from the NetBiTE results that the expression of membrane receptor pathway genes in pan-cancer cell lines is predictive of IC50 drug sensitivity for compounds that inhibit those pathways. Interestingly, in a recent study of drug sensitivity in colorectal cancer for 16 cancer compounds, Schütte et al.⁵⁹ demonstrated experimentally that sensitivity to EGFR inhibitors may be predicted using molecular biomarkers, whereas they were unable to establish such a link for mTOR inhibitors. For instance, they observed that patient-derived organoid models with ERBB2 amplification were sensitive to drugs targeting ERBB2 (AZD8931 and Afatinib) but less so to drugs targeting EGFR only (Gefitinib)⁵⁹. In addition, it has been shown that ERBB2 amplification in breast cancer is a biomarker for sensitivity to Lapatinib, an ERBB2 inhibitor^1,10,60. This observation is in agreement with our identification of ERBB2 expression as a biomarker for drug sensitivity prediction (see Supplementary Fig. S3). Furthermore, RTK signaling enrichment and FLT3 mutations are known biomarkers for sensitivity to FLT3 inhibitors^51,61. In our studies, FLT3 was a frequent target of drugs that achieved enhanced accuracy with NetBiTE (see Supplementary Fig. S3). Quizartinib, a novel second-generation FLT3 inhibitor with enhanced FLT3 specificity, has shown favorable clinical outcomes in treating acute myeloid leukemia^61,62. Interestingly, Quizartinib is a top responder to NetBiTE (see Fig. 4B), highlighting the agreement of our findings with the literature where available.

Our findings regarding MRP gene expression may be explained by the central role these pathways play in cancer development, survival and promotion of drug resistance. Growth factor signals are transmitted across the cell membrane, via their specific membrane receptors, to cytoplasmic signaling effectors that control many critical functions of cancer cells⁶³. In addition, various targeted therapy and drug resistance studies have pointed out that cancer cells appear to rely heavily on one or a few oncogenes or a single oncogenic pathway for survival and proliferation⁶⁴. This phenomenon is widely recognized as oncogene addiction in cancer^7,64,65. Membrane receptor proteins have been shown to act as proto-oncogenes in various cancer types^66,67,68. For instance, hyperactivation of receptor tyrosine kinases (RTKs, two-thirds of known TKs)⁶⁹ has been shown to be implicated in cancer, even in the absence of extracellular activating ligands, via overexpression of RTKs⁷⁰, activating mutations⁷¹ or autocrine stimulation^58,72.

In addition, in many cancer cells where membrane receptor pathways are not the key oncogenic pathway, they appear to play a central role in promoting drug resistance by compensating for the inhibited pathway^73,74. For instance, in prostate cancer, the inhibition of either of the key oncogenic pathways, PI3K and AR, results in a feedback upregulation of the other pathway through induction of EGFR family RTK signaling⁷.

One may ask why the sensitivity to other drug categories, such as PI3K/MTOR inhibitors, is not well predicted through their reported targets. We speculate that drug categories that are non-responsive to NetBiTE trigger complex cascade effects⁵⁹ that cannot be captured using only the expression of drug target genes, rendering drug sensitivity biomarker identification challenging. For instance, for non-membrane tyrosine kinases (NRTKs), it has been suggested that their involvement in cancer may be a result not only of overexpression but also of mutations or translocations⁷². We therefore speculate that a more intricate epigenetic mechanism may be involved in the inhibitory mechanism of the drugs that are non-responsive to NetBiTE.

Conclusions

We have presented a new approach for drug sensitivity prediction and biomarker identification, the Network-based Biased Tree Ensembles (NetBiTE) algorithm, which integrates prior knowledge with the genetic profiles of cancer cells in order to make predictions. Prior knowledge comprises the known drug targets and a protein-protein interaction (PPI) network such as the one provided by the STRING database. Using a synthetic dataset, we demonstrated that by identifying the most informative features and biasing split variable selection towards these features, Biased Tree Ensembles (BiTE) can significantly outperform the random forests (RF) algorithm given the same tuning parameters (n_tree and mtry). In addition, we demonstrated that BiTE may be exploited for the identification of biomarkers for cancer drug sensitivity. Next, we introduced NetBiTE, which combines BiTE with the propagation of drug target weights over the STRING PPI network. We used NetBiTE to study a panel of 50 cancer compounds tested on 883 pan-cancer cell lines within the GDSC dataset. The propagation of bias weights over the STRING PPI network simulates the underlying biological interactions between the target proteins and other key effectors following the drug perturbation. We showed that NetBiTE outperforms RF for the majority of drugs that inhibit membrane receptor pathway proteins, suggesting that the expression of membrane receptor pathway genes prior to drug administration is a biomarker for sensitivity to these compounds. To verify this observation, we performed two control studies. (1) Using our model, we studied drugs that do not act on membrane receptor proteins. In this case, we chose drugs that block the PI3K/MTOR pathway because of the prominent role of this pathway in cancer biology and the abundance of compounds within the GDSC database that target it. NetBiTE did not improve accuracy when applied to drugs in this category, suggesting that, unlike for the MRPI drugs, the expression of target genes does not provide NetBiTE with any critical information that would improve the model accuracy. (2) We performed a second control study in which we replaced the targets of two RTK signaling pathway inhibitors (Linifanib and PD173074), one responsive and one non-responsive to NetBiTE, with randomly assigned targets. Random assignment of targets resulted in a significant worsening of the model performance in the case of Linifanib (the responsive drug) and in no improvement at all compared to RF in the case of PD173074 (the non-responsive drug). The results further confirmed that the performance improvement with NetBiTE is not a random occurrence, but is indeed the result of injecting informative prior knowledge into the tree ensemble algorithm. We envision that NetBiTE (and BiTE) can play a key role as a readily implementable tool for testing the relevance of prior knowledge and for identifying critically informative features in various types of data, in particular in the context of personalized medicine for cancer, where identification of correct biomarkers is critical for the prediction of treatment outcomes. Prediction of the response to anticancer drug combinations⁷⁵ would be a valuable future extension of this work.
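
The two ingredients summarized above, propagation of target weights over a network and biased selection of split variables, can be sketched as follows. This is a toy sketch under stated assumptions and not the implementation used in the study: the propagation rule F ← αW′F + (1 − α)Y with W′ = D^(−1/2) W D^(−1/2) is assumed here for illustration, and the value of α, the four-gene chain network and all function names are hypothetical.

```python
import math
import random

def propagate(adj, initial, alpha=0.7, n_iter=100):
    """Iterate F <- alpha * W' F + (1 - alpha) * Y on a small dense network.

    W' is the symmetrically normalized adjacency D^-1/2 W D^-1/2
    (an assumed normalization, chosen for illustration).
    """
    n = len(initial)
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]
    f = list(initial)
    for _ in range(n_iter):
        f = [alpha * sum(norm[i][j] * f[j] for j in range(n))
             + (1 - alpha) * initial[i] for i in range(n)]
    return f

def sample_split_candidates(weights, mtry, rng):
    """Draw mtry distinct feature indices with probability proportional to weight."""
    pool = list(range(len(weights)))
    remaining = list(weights)
    chosen = []
    for _ in range(mtry):
        pos = rng.choices(range(len(pool)), weights=remaining, k=1)[0]
        chosen.append(pool.pop(pos))
        remaining.pop(pos)
    return chosen

# Toy chain network 0 - 1 - 2 - 3 in which gene 0 is the only drug target.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
initial = [1.0, 0.0, 0.0, 0.0]

weights = propagate(adj, initial)
candidates = sample_split_candidates(weights, mtry=2, rng=random.Random(0))
print([round(w, 3) for w in weights], candidates)
```

In this toy network the propagated weight decays with distance from the target, so genes close to the target are drawn as split candidates more often than distant ones, while every gene keeps a nonzero chance of being selected.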

References

1. Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570 (2012).
2. Yang, W. et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41, D955–D961 (2012).
3. Qu, J., Chen, X., Sun, Y.-Z., Li, J.-Q. & Ming, Z. Inferring potential small molecule–miRNA association based on triple layer heterogeneous network. J. Cheminformatics 10, 30 (2018).
4. Chen, X., Guan, N.-N., Sun, Y.-Z., Li, J.-Q. & Qu, J. MicroRNA-small molecule association identification: from experimental results to computational models. Brief. Bioinform., https://doi.org/10.1093/bib/bby098 (2018).
5. Wang, C.-C., Chen, X., Yin, J. & Qu, J. An integrated framework for the identification of potential miRNA-disease association based on novel negative samples extraction strategy. RNA Biol. 16, 257–269 (2019).
6. Geeleher, P., Cox, N. J. & Huang, R. S. Cancer biomarker discovery is improved by accounting for variability in general levels of drug sensitivity in pre-clinical models. Genome Biol. 17, 190 (2016).
7. Pagliarini, R., Shao, W. & Sellers, W. R. Oncogene addiction: pathways of therapeutic response, resistance, and road maps toward a cure. EMBO Rep. 16, 280–296 (2015).
8. Yang, W. et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41, D955–D961 (2013).
9. Macaluso, M., Paggi, M. G. & Giordano, A. Genetic and epigenetic alterations as hallmarks of the intricate road to cancer. Oncogene 22, 6472 (2003).
10. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603 (2012).
11. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61 (2012).
12. Heiser, L. M. et al. Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc. Natl. Acad. Sci. 109, 2724 (2012).
13. The International Cancer Genome Consortium. International network of cancer genome projects. Nature 464, 993 (2010).
14. Lamb, J. et al. The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 313, 1929 (2006).
15. Shoemaker, R. H. The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer 6, 813 (2006).
16. McDermott, U. et al. Identification of genotype-correlated sensitivity to selective kinase inhibitors by using high-throughput tumor cell line profiling. Proc. Natl. Acad. Sci. 104, 19936 (2007).
17. Haverty, P. M. et al. Reproducible pharmacogenomic profiling of cancer cell line panels. Nature 533, 333 (2016).
18. Seashore-Ludlow, B. et al. Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset. Cancer Discov. 5, 1210 (2015).
19. Basu, A. et al. An Interactive Resource to Identify Cancer Genetic and Lineage Dependencies Targeted by Small Molecules. Cell 154, 1151–1161 (2013).
20. McDermott, U., Sharma, S. V. & Settleman, J. High-Throughput Lung Cancer Cell Line Screening for Genotype-Correlated Sensitivity to an EGFR Kinase Inhibitor. In Methods in Enzymology 438, 331–341 (Academic Press, 2008).
21. Sheng, J., Li, F. & Wong, S. T. C. Optimal Drug Prediction From Personal Genomics Profiles. IEEE J. Biomed. Health Inform. 19, 1264–1270 (2015).
22. Vis, D. J. et al. Multilevel models improve precision and speed of IC50 estimates. Pharmacogenomics 17, 691–700 (2016).
23. Ammad-ud-din, M. et al. Integrative and Personalized QSAR Analysis in Cancer by Kernelized Bayesian Matrix Factorization. J. Chem. Inf. Model. 54, 2347–2359 (2014).
24. Park, H., Imoto, S. & Miyano, S. Recursive Random Lasso (RRLasso) for Identifying Anti-Cancer Drug Targets. PLoS One 10, e0141869 (2015).
25. Covell, D. G. Data Mining Approaches for Genomic Biomarker Development: Applications Using Drug Screening Data from the Cancer Genome Project and the Cancer Cell Line Encyclopedia. PLoS One 10, e0127433 (2015).
26. Liu, H., Zhao, Y., Zhang, L. & Chen, X. Anti-cancer Drug Response Prediction Using Neighbor-Based Collaborative Filtering with Global Effect Removal. Mol. Ther. Nucleic Acids 13, 303–311 (2018).
27. Zhang, L., Chen, X., Guan, N.-N., Liu, H. & Li, J.-Q. A Hybrid Interpolation Weighted Collaborative Filtering Method for Anti-cancer Drug Response Prediction. Front. Pharmacol. 9, 1017 (2018).
28. Knijnenburg, T. A. et al. Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy. Sci. Rep. 6, 36812 (2016).
29. Szklarczyk, D. et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).
30. Yang, P., Hwa Yang, Y., B Zhou, B. & Zomaya, Y. A. A review of ensemble methods in bioinformatics. Curr. Bioinforma. 5, 296–308 (2010).
31. Lavanya, D. & Rani, K. U. Ensemble decision tree classifier for breast cancer data. Int. J. Inf. Technol. Converg. Serv. 2, 17 (2012).
32. Breiman, L. Random Forests. Mach. Learn. 45 (2001).
33. Caruana, R., Karampatziakis, N. & Yessenalina, A. An empirical evaluation of supervised learning in high dimensions. In 96–103 (ACM, 2008).
34. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. (Springer New York, 2009).
35. Chen, X. & Ishwaran, H. Random forests for genomic data analysis. Genomics 99, 323–329 (2012).
36. Moon, H. et al. Ensemble methods for classification of patients for personalized medicine with high-dimensional data. Artif. Intell. Med. 41, 197–207 (2007).
37. Pang, H. et al. Pathway analysis using random forests classification and regression. Bioinformatics 22, 2028–2036 (2006).
38. Fan, Y. et al. Applying random forests to identify biomarker panels in serum 2D-DIGE data for the detection and staging of prostate cancer. J. Proteome Res. 10, 1361–1373 (2011).
39. Ye, Y., Wu, Q., Huang, J. Z., Ng, M. K. & Li, X. Stratified sampling for feature subspace selection in random forests for high dimensional data. Pattern Recognit. 46, 769–787 (2013).
40. Díaz-Uriarte, R. & Alvarez de Andrés, S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7, 3 (2006).
41. Steinberg, D. & Colla, P. CART: classification and regression trees. Top Ten Algorithms Data Min. 9, 179 (2009).
42. Chuang, H.-Y., Lee, E., Liu, Y.-T., Lee, D. & Ideker, T. Network-based classification of breast cancer metastasis. Mol. Syst. Biol. 3, 140 (2007).
43. Calvano, S. E. et al. A network-based analysis of systemic inflammation in humans. Nature 437, 1032 (2005).
44. Hofree, M., Shen, J. P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nat. Methods 10, 1108 (2013).
45. Vanunu, O., Magger, O., Ruppin, E., Shlomi, T. & Sharan, R. Associating Genes and Protein Complexes with Disease via Network Propagation. PLoS Comput. Biol. 6, e1000641 (2010).
46. Vandin, F., Upfal, E. & Raphael, B. J. Algorithms for Detecting Significantly Mutated Pathways in Cancer. J. Comput. Biol. 18, 507–522 (2011).
47. Zhang, C., Li, Y., Yu, Z. & Tian, F. A weighted random forest approach to improve predictive performance for power system transient stability assessment. In 2016 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC) 1259–1263, https://doi.org/10.1109/APPEEC.2016.7779695 (2016).
48. Xu, B., Huang, J. Z., Williams, G. & Ye, Y. Hybrid weighted random forests for classifying very high-dimensional data. Int. J. Data Warehous. Min. 8, 44–63 (2012).
49. Amaratunga, D., Cabrera, J. & Lee, Y.-S. Enriched random forests. Bioinformatics 24, 2010–2014 (2008).
50. Ye, Y., Li, H., Deng, X. & Huang, J. Z. Feature weighting random forest for detection of hidden web search interfaces. Int. J. Comput. Linguist. Chin. Lang. Process. 13, 387–404 (2008).
51. Menden, M. P. In silico models of drug response in cancer cell lines based on various molecular descriptors. (University of Cambridge, 2016).
52. Zhou, D., Bousquet, O., Lal, T. N., Weston, J. & Schölkopf, B. Learning with local and global consistency. In 321–328 (2004).
53. Menden, M. P. et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One 8, e61318 (2013).
54. Wright, M. N. & Ziegler, A. ranger: A fast implementation of random forests for high dimensional data in C++ and R. ArXiv Prepr. ArXiv150804409 (2015).
55. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
56. Hunter, J. D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 9, 90–95 (2007).
57. Chen, T. & Guestrin, C. Xgboost: A scalable tree boosting system. In 785–794 (ACM, 2016).
58. Paul, M. K. & Mukhopadhyay, A. K. Tyrosine kinase–role and significance in cancer. Int. J. Med. Sci. 1, 101 (2004).
59. Schütte, M. et al. Molecular dissection of colorectal cancer in pre-clinical models identifies biomarkers predicting sensitivity to EGFR inhibitors. Nat. Commun. 8, 14262 (2017).
60. Konecny, G. E. et al. Activity of the dual kinase inhibitor lapatinib (GW572016) against HER-2-overexpressing and trastuzumab-treated breast cancer cells. Cancer Res. 66, 1630–1639 (2006).
61. Wander, S. A., Levis, M. J. & Fathi, A. T. The evolving role of FLT3 inhibitors in acute myeloid leukemia: quizartinib and beyond. Ther. Adv. Hematol. 5, 65–77 (2014).
62. Yamaura, T. et al. A novel irreversible FLT3 inhibitor, FF-10101, shows excellent efficacy against AML cells with FLT3 mutations. Blood 131, 426 (2018).
63. Bianco, R., Melisi, D., Ciardiello, F. & Tortora, G. Key cancer cell signal transduction pathways as therapeutic targets. Eur. J. Cancer 42, 290–294 (2006).
64. Weinstein, I. B. & Joe, A. K. Mechanisms of disease: oncogene addiction—a rationale for molecular targeting in cancer therapy. Nat. Rev. Clin. Oncol. 3, 448 (2006).
65. Weinstein, I. B. & Joe, A. Oncogene Addiction. Cancer Res. 68, 3077 (2008).
66. Yarden, Y. et al. Human proto-oncogene c-kit: a new cell surface receptor tyrosine kinase for an unidentified ligand. EMBO J. 6, 3341–3351 (1987).
67. Naoe, T. & Kiyoi, H. Oncogenic protein tyrosine kinases. Cell. Mol. Life Sci. CMLS 61, 2932–2938 (2004).
68. Pollak, M. Insulin and insulin-like growth factor signalling in neoplasia. Nat. Rev. Cancer 8, 915 (2008).
69. Gschwind, A., Fischer, O. M. & Ullrich, A. The discovery of receptor tyrosine kinases: targets for cancer therapy. Nat. Rev. Cancer 4, 361 (2004).
70. Wang, R., Kobayashi, R. & Bishop, J. M. Cellular adherence elicits ligand-independent activation of the Met cell-surface receptor. Proc. Natl. Acad. Sci. 93, 8425–8430 (1996).
71. Weiner, D. B., Liu, J., Cohen, J. A., Williams, W. V. & Greene, M. I. A point mutation in the neu oncogene mimics ligand induction of receptor aggregation. Nature 339, 230 (1989).
72. Sierra, J. R., Cepero, V. & Giordano, S. Molecular mechanisms of acquired resistance to tyrosine kinase targeted therapy. Mol. Cancer 9, 75 (2010).
73. Pillay, V. et al. The Plasticity of Oncogene Addiction: Implications for Targeted Therapies Directed to Receptor Tyrosine Kinases. Neoplasia 11, 448–IN2 (2009).
74. Jones, H. E. et al. Insulin-like growth factor-I receptor signalling and acquired resistance to gefitinib (ZD1839; Iressa) in human breast and prostate cancer cells. Endocr. Relat. Cancer 11, 793–814 (2004).
75. Chen, X. et al. NLLSS: Predicting Synergistic Drug Combinations Based on Semi-supervised Learning. PLoS Comput. Biol. 12, e1004975 (2016).

Acknowledgements

The authors would like to thank Drs Costas Bekas and Maria Gabrani for their continuous support and useful discussions. The project leading to this publication has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 668858.

Author information

Authors and Affiliations

IBM Research – Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland
Ali Oskooei, Matteo Manica, Roland Mathis & María Rodríguez Martínez
Institute für Molekulare Systembiologie, Auguste-Piccard-Hof 1, 8093, Zürich, Switzerland
Matteo Manica

Authors

Ali Oskooei
Matteo Manica
Roland Mathis
María Rodríguez Martínez

Contributions

A.O. contributed to the conception, designed and performed experiments and wrote the manuscript. M.M. and R.M. contributed to the conception and design of experiments. M.R.M. contributed to the conception and design of experiments and wrote the manuscript.

Corresponding author

Correspondence to María Rodríguez Martínez.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Document

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Oskooei, A., Manica, M., Mathis, R. et al. Network-based Biased Tree Ensembles (NetBiTE) for Drug Sensitivity Prediction and Drug Sensitivity Biomarker Identification in Cancer. Sci Rep 9, 15918 (2019). https://doi.org/10.1038/s41598-019-52093-w

Received: 14 May 2019
Accepted: 07 October 2019
Published: 04 November 2019
DOI: https://doi.org/10.1038/s41598-019-52093-w
