A novel heterogeneous network-based method for drug response prediction in cancer cell lines

Zhang, Fei; Wang, Minghui; Xi, Jianing; Yang, Jianghong; Li, Ao

doi:10.1038/s41598-018-21622-4


Article
Open access
Published: 20 February 2018

Fei Zhang^1,na1, Minghui Wang^1,2,na1, Jianing Xi^2 (ORCID: orcid.org/0000-0001-6785-5618), Jianghong Yang^2 & Ao Li^1,2

Scientific Reports volume 8, Article number: 3355 (2018)

Abstract

An enduring challenge in personalized medicine is selecting a suitable drug for each individual patient. Here we focus on predicting drug responses from a combination of genomic, chemical structure, and target information. Recent large-scale studies such as GDSC provide an unprecedented opportunity to infer potential relationships between cell lines and drugs. Existing approaches rely primarily on regression, classification or multiple kernel learning to predict drug responses. Integrative approaches, however, suggest that drug-target and protein-protein interaction information could improve drug response prediction. In this study, we propose a novel heterogeneous network-based method, named HNMDRP, that predicts cell line-drug associations by incorporating the heterogeneous relationships among cell lines, drugs and targets. Compared with previous studies, HNMDRP makes better use of this heterogeneous information to predict drug responses. We verify the validity of our method in two ways: by ROC curve analysis, and by predicting novel cell line-drug sensitive associations that are supported by literature evidence. The method can therefore suggest potential sensitive associations between cell lines and drugs. Matlab and R code for HNMDRP is available at https://github.com/USTC-HIlab/HNMDRP.

Introduction

Over the past 20 years, significant improvements in genomic profiling technologies have made personalized medicine a major trend in medical science^1,2. Conventional drug discovery and development is symptom-oriented. Personalized treatment instead exploits tumor-specific responses and vulnerabilities to overcome the high cost and limitations of clinical experiments. The major challenge in personalized prevention and treatment is identifying biomarkers, which is critical to understanding the pathogenesis of a given complex disease³. However, researchers must weigh the time and cost of evaluating predictive biomarkers in human or animal models. It is not feasible to test the clinical efficacy and toxicity of hundreds of drugs in large populations of cancer patients. High-throughput drug screening technologies make large-scale experiments on human cancer cell lines possible. For instance, two recent consortia, GDSC⁴ (Genomics of Drug Sensitivity in Cancer) and CCLE⁵ (Cancer Cell Line Encyclopedia), have profiled around 1500 cancer cell lines and screened them against 280 drugs. Both studies provide genome-wide data for multiple types of cancer cell lines, together with sensitivity data for established anticancer drugs tested against these cell lines.

Drug response prediction is a burgeoning field of interest for improving understanding of disease and advancing personalized medicine⁶. Many prediction methods have been developed to facilitate and speed up drug discovery⁷ and repositioning. For example, Gupta et al. use a genomic feature-based model to predict anticancer drug responses and achieve good results on the above datasets⁸. Dong et al. propose an SVM classification model that predicts drug sensitivity from gene expression profiles in the CCLE dataset, with good performance for several drugs⁹. Geeleher et al. apply a ridge regression model to the same dataset and obtain comparable performance¹⁰. These approaches emphasize the use of cell line genomic information in drug response prediction.

Many studies have also turned to the heterogeneous relationships among cell line genomic alterations, cell line-drug sensitivity and drug chemical structure. For instance, Liu et al. develop a systematic algorithm that predicts anticancer drug response by combining cell line genomic features with compound structure features^11,12. Menden et al. propose a machine learning model that predicts cell line-drug sensitivities from both the cell line's genomic features and the drug's chemical structure properties¹³. Ammad-Ud-Din et al. propose a kernelized Bayesian matrix factorization method (KBMF) that predicts drug response by integrating the same types of data: cell line genomic and drug chemical properties¹⁴. On the same principle, Wang et al. propose a kernel function that correlates the heterogeneous pharmacogenomic information of cell lines and drugs, and then use an SVM classifier to infer cell line-drug associations¹⁵. Zhang et al. construct a dual-layer network of cell lines and drugs and use a weighted model that incorporates cell line and drug similarities to predict anticancer drug response¹⁶.

Although the works above have achieved promising results, other information can also help predict cell line-drug associations. Drug-target and protein-protein interaction (PPI) information are often incorporated in drug discovery, as previous studies have demonstrated^17,18,19,20. Recently, Stanfield et al. construct a heterogeneous network to compute network profiles for cell lines and drugs. They then perform a random walk with restart to predict links between cell lines and drugs from these profiles²¹. The authors show that integrating cell line mutation data and drug responses with a PPI network significantly improves prediction performance. Despite its effectiveness, this method does not include drug-target interactions in the heterogeneous network used to compute network profiles, which may limit its prediction performance.

Inspired by this method, we combine genomic and compound information with drug-target and PPI information to predict drug responses. We present a novel heterogeneous network-based method for drug response prediction, named HNMDRP. It predicts cell line-drug associations by incorporating cell line genomic profiles, drug chemical structures, drug-target interactions and PPI information. We first construct the heterogeneous network model²² with similarity measures, computing Pearson correlation coefficients between cell line genomic profiles, between drug chemical structures and between target genes. We then run an information flow-based algorithm²³ on this network to obtain a score for every cell line-drug pair, and this score is the predicted drug response. To validate the contribution of drug-target and PPI information in our cell line-drug-target heterogeneous network, we use leave-one-out cross validation (LOOCV) to compare our method with existing state-of-the-art methods: Zhang's method¹⁶, Stanfield's method²¹, DLNDRP²⁴ and SVMDRP. The results show that our method achieves the best AUC values for most drugs. Our method also retrieves the largest number of true cell line-drug sensitive associations among the top-ranked predictions. We then use HNMDRP to identify several novel potential sensitive associations among its high-ranking predictions, and these associations are supported by the literature. Together, these results provide evidence of the good performance of HNMDRP and of its potential value for future biological experiments.

Results

Evaluation of prediction performance of HNMDRP

We apply leave-one-out cross validation²⁵ (LOOCV) to evaluate how well HNMDRP predicts drug responses. Consistent with previous studies^26,27,28, each step of the LOOCV experiment²⁶ holds out one known sensitive association between a cell line and a drug as test data by setting its value to 0 in the matrix A_cd. All remaining associations serve as training data for model learning, and only the prediction score of the held-out pair is recorded. The process repeats until every sensitive cell line-drug association has been used as test data once. For each drug, only the cell lines with known associations (sensitive or resistant) are then ranked in descending order of their LOOCV prediction scores. We use receiver operating characteristic (ROC) curves to show the predictive performance of HNMDRP and the other methods, plotting the true positive rate against the false positive rate at different cutoffs²². Here, the true positive rate (TPR) is the percentage of sensitive cases correctly labeled as positive, and the false positive rate (FPR) is the percentage of resistant cases incorrectly labeled as positive.

We also compare the predictive performance of our method when each type of information is removed in turn: drug 1-D and 2-D structure information, PPI information, gene-gene correlation information and target similarity network information. The results (Supplementary Figure S4) show that all types of information contribute to drug response prediction, and that PPI and gene-gene correlation information play a larger role than the others. The computational complexity is mainly determined by equations (4) and (5), whose costs are O(nm⁵l⁴) and O(n³m⁵l²), respectively. The numbers of cell lines (n) and drugs (m) are smaller than the number of target genes (l), so the target gene nodes (l) contribute most to the computational complexity. Accordingly, the overall complexity of our model is O(n³m⁵l²).
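The LOOCV protocol and the per-drug ROC analysis can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' Matlab/R code. Three choices are ours. First, `predict` is a hypothetical callable that stands in for one full HNMDRP run on a training matrix. Second, resistant pairs are already 0 in A_cd, so a single run on the unmasked matrix scores them. Third, the AUC is computed as the probability that a held-out sensitive pair outranks a resistant pair, with ties counted as one half, which equals the area under the ROC curve.

```python
import numpy as np


def rank_auc(pos_scores, neg_scores):
    """AUC = P(sensitive score > resistant score), counting ties as 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def loocv_auc_per_drug(A_cd, resistant, predict):
    """Leave-one-out over every known sensitive pair, then one AUC per drug.

    A_cd      : n x m array, 1 = sensitive, 0 = resistant or unknown
    resistant : n x m boolean array marking known resistant pairs
    predict   : hypothetical callable, training matrix -> n x m score matrix
    """
    A_cd = np.asarray(A_cd, dtype=float)
    resistant = np.asarray(resistant, dtype=bool)
    # ASSUMPTION: resistant pairs are already 0, so one unmasked run scores them
    full_scores = predict(A_cd)
    held_out = {j: [] for j in range(A_cd.shape[1])}
    for i, j in zip(*np.nonzero(A_cd == 1)):
        A_train = A_cd.copy()
        A_train[i, j] = 0.0                          # hide this sensitive association
        held_out[j].append(predict(A_train)[i, j])   # keep only the held-out score
    aucs = {}
    for j, pos in held_out.items():
        neg = full_scores[resistant[:, j], j]
        if pos and neg.size:                         # an AUC needs both classes
            aucs[j] = rank_auc(pos, neg)
    return aucs
```

The loop calls the model once per known sensitive pair, so with the 17316 sensitive associations reported in Materials and Methods the cost of LOOCV is dominated by repeated propagation runs.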

Comparison with existing methods

To assess how effectively our method predicts drug responses, we compare HNMDRP with four state-of-the-art methods: Zhang's method, Stanfield's method, DLNDRP and SVMDRP. Zhang et al. propose a computational framework based on a dual-layer integrated cell line-drug network to predict tumor drug responses. Stanfield's method works on network profiles computed from a large heterogeneous network and aims at accurate, reproducible classification of drug sensitivity and resistance. DLNDRP performs heterogeneous graph-based inference on a two-layer network that contains only cell line nodes and drug nodes. SVMDRP is trained on cell line gene expression and drug sensitivity data to predict drug response. Fig. 2 and Table 1 compare the five methods. Fig. 2 shows that our method achieves better results than both Stanfield's method and Zhang's method. Table 1 shows that the average AUC value of HNMDRP is 5.6% higher than that of DLNDRP and 14.26% higher than that of SVMDRP. The results for the remaining drugs are listed in Supplementary Table S3. The highest AUC value, 93.8%, is obtained for drug SNX-2112, which has also achieved good results with a liquid chromatography method²⁷. These results indicate that HNMDRP predicts drug responses more accurately than the other state-of-the-art methods investigated here.

Table 1 The results of leave-one-out cross validation: AUC values for several drugs.


Tissue specificity of cell line types

Drug responses may differ considerably across tissue types, so we test whether HNMDRP performs well when different cell line tissue types are considered. Fig. 3A shows the 19 tissue types of cancer cell lines in the GDSC dataset and their distribution. The major tissue types are leukemia (acute myeloid leukemia and chronic lymphocytic leukemia), urogenital system (bladder cancer) and lung NSCLC (non-small cell lung carcinoma). They account for 8.3% (80), 10.4% (100) and 11.3% (109) of all 962 cancer cell lines, respectively. To compare predictive performance across tissue types, we examine drug response prediction on these three types. In Fig. 3B, each bar represents the area under the ROC curve for one tissue type. The average AUC values over leukemia, urogenital system and lung NSCLC are 0.6787 for HNMDRP, 0.5053 for Zhang's method, 0.5534 for Stanfield's method, 0.5265 for DLNDRP and 0.5324 for SVMDRP. These results indicate that HNMDRP also performs consistently across diverse tissue types. The AUC values of the remaining tissue types are listed in Supplementary Table S4. We also train the model using only cell lines of a specific tissue type and predict drug responses within that type, and our method again achieves the best performance (Supplementary Figure S4).

Case studies

False-positive predictions are a known concern in bioinformatics studies²⁸. HNMDRP predicts known cell line-drug associations better than the existing methods, but we also need to check how well each of the five methods retrieves true positive (sensitive) associations. In addition to the ROC curves, we therefore compare the numbers of correctly retrieved cell line-drug sensitive associations at different percentiles²⁹. Fig. 4 uses drug GSK2126458 as an example; this drug has 94 positive (sensitive) and 808 negative (resistant) associations. For each percentile p% (1%, 2%, 5%, 10% and 100%), we count the true positives among the top p% of the 962 cell lines ranked by prediction score. HNMDRP makes few true positive predictions at the 1% and 2% percentiles, but significantly more at higher percentiles. These results indicate that HNMDRP ranks most of the known cell line-drug sensitive associations highly, and that it assigns very high ranks to several unknown associations.
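The percentile count for a single drug can be sketched as follows. It assumes that "top p%" means the ceiling of p% of the ranked cell lines. For 962 cell lines that is 10 cell lines at 1% and 20 at 2%. The function name is hypothetical.

```python
import numpy as np


def true_positives_at_percentiles(scores, sensitive, percentiles=(1, 2, 5, 10, 100)):
    """Count known sensitive cell lines among the top p% of cell lines for one drug."""
    scores = np.asarray(scores, dtype=float)      # one prediction score per cell line
    sensitive = np.asarray(sensitive, dtype=bool)  # True for known sensitive cell lines
    order = np.argsort(-scores, kind="stable")     # highest score first
    counts = {}
    for p in percentiles:
        k = int(np.ceil(len(scores) * p / 100))    # ASSUMPTION: round the cutoff up
        counts[p] = int(sensitive[order[:k]].sum())
    return counts
```

Stable sorting keeps tied cell lines in their original order, so the counts are reproducible when several cell lines share a score.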

Computational predictions usually need experimental verification, which is difficult and limited in practice. Wang et al.¹⁵ identified novel sensitive associations from the prediction scores of cell line-drug pairs whose associations are unknown in the database. Following the same approach, we further test the ability of HNMDRP to predict potential cell line-drug associations. We examine the top 20 ranked candidate predictions among all cell line-drug pairs with unknown associations in the GDSC dataset. As Table 2 shows, literature evidence supports these pairs as novel potential sensitive associations. For instance, cell line MHH-CALL-2 is a B cell leukemia line, and Lucas et al. report that MS-275 is a promising treatment for this cell line³⁰. This pair ranks 4th in our predictions. Gobin et al. suggest NVP-BEZ235 as a potential therapeutic strategy for the chondrosarcoma cell line CHSA0011³¹, which ranks 10th among all cell lines. For drug Belinostat and cell line AMO-1, published work³² supports Belinostat as a potential treatment in clinical trials. The remaining novel sensitive predictions in Table 2, and their literature evidence, indicate that HNMDRP can uncover novel sensitive associations between cancer cell lines and drugs and provide a foundation for future experimental verification. These results further support the importance of drug-target and PPI information for drug response prediction.
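The candidate search can be sketched as below. It assumes that candidates are ranked per drug among all cell lines, as the per-drug ranks quoted above suggest (e.g. "ranks 10th among all cell lines"). The function name and the `unknown` mask are hypothetical.

```python
import numpy as np


def top_ranked_unknown(scores, unknown, k=20):
    """List (cell_line, drug, rank, score) for unknown pairs ranked in the top k of their drug."""
    scores = np.asarray(scores, dtype=float)       # n x m final A_cd scores
    unknown = np.asarray(unknown, dtype=bool)      # True where GDSC has no label
    hits = []
    for j in range(scores.shape[1]):
        # ASSUMPTION: ranks are computed per drug over all cell lines
        order = np.argsort(-scores[:, j], kind="stable")   # best cell line first
        for rank, i in enumerate(order[:k], start=1):
            if unknown[i, j]:
                hits.append((int(i), j, rank, float(scores[i, j])))
    return hits
```

Known sensitive or resistant pairs still occupy their rank positions, so a reported rank reflects where the unknown pair sits among all cell lines for that drug.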

Table 2 The top 20 cell line-drug pairs (unknown in GDSC) predicted by HNMDRP that are supported by literature evidence as novel sensitive associations.


Discussion and Conclusion

In this work, we propose a novel heterogeneous network-based method (HNMDRP) that predicts the responses of cancer cell lines to multiple drugs from experimentally measured IC₅₀ values³³ in the GDSC study⁴. We construct five sub-networks: (1) a cell line similarity network, obtained by calculating Pcc values between cell line gene expression profiles; (2) a drug similarity network, obtained by calculating Pcc values between drug chemical structure features; (3) a target similarity network, obtained by merging PPI information with correlation coefficients³⁴ computed from gene expression profiles; (4) a cell line-drug association network, obtained from log-normalized IC₅₀ values in the GDSC study; and (5) a drug-target interaction network, obtained from known compound molecular activities. A comprehensive heterogeneous network is then built from these sub-networks. Our main contribution is to integrate cell line gene expression profiles, drug chemical structure features, drug-target interactions and PPIs simultaneously. We demonstrate that known drug-target interactions and PPIs help improve drug response prediction. The method's validity rests on two results: it predicts known cell line-drug associations effectively, and its predictions of unknown associations have literature support. Another advantage of our method is its use of correlations among cell lines, drugs and targets. Because the model operates on pairwise similarities rather than raw features, the high dimensionality of cell line gene expression profiles and drug chemical structure features does not seriously affect the prediction results.

In practice, the key question is usually whether a specific cancer cell line is sensitive or resistant to a therapeutic drug, not the exact response value. Previous work^16,35,36 learns the exact response value; we instead treat drug response as a binary classification problem (sensitive or resistant)⁹. For most drugs, HNMDRP obtains the best ROC curves, and we compute the AUC values from these curves. Overall, HNMDRP achieves slightly better performance than existing state-of-the-art methods in predicting drug responses.

Although our method achieves encouraging results, it has the following limitations, which we will address in future work. First, the cell line similarity network relies only on genome-wide gene expression profiles. It does not integrate somatic mutation or copy number variation^36,37, which could influence the prediction performance of our heterogeneous network method²². Second, the drug similarity network relies on 1-D and 2-D structural properties. These may provide sufficient features to represent a drug, but they omit 3-D structural features, which may play a crucial role for certain drugs. Third, the target similarity network relies only on correlational relationships and PPIs³⁴. Target sequence information could also be used to characterize the similarity among targets, and previous work indicates that sequence information is predictive of drug response¹⁵. Effectively incorporating these additional data resources into our model may therefore further improve predictive performance. As more data and theoretical support become available, we hope our method will achieve even better prediction results and help advance drug discovery.

Materials and Methods

We use the GDSC study⁴ as the benchmark dataset, downloaded from the Wellcome Trust Sanger Institute website (http://www.cancerrxgene.org/). The dataset covers 1001 cancer cell lines and 265 tested drugs. It also provides gene expression profiles, which represent cell line genomic information, and continuous IC₅₀ values³³, which measure drug response. After data preprocessing, we retain 189 drugs that have both chemical structure features and drug response data, and 962 cell lines that have both genomic profiles and drug response data. We also extract the interactions between the 189 drugs and 243 target genes from the GDSC dataset. To incorporate PPIs into the target similarity network, we download 4,850,628 PPIs from the STRING³⁸ database. From these we extract 396,419 PPIs among 3040 available genes associated with the target genes³⁹. The methods for calculating similarities and connections are briefly described in the following sections.

Cell line similarity network

To construct the cell line similarity network, we first extract the baseline gene expression profiles of the cancer cell lines from the GDSC genomic data. This yields 962 cell lines with 16383-dimensional gene expression profiles (Fig. 1B, left panel). Following a previous study¹⁶, we calculate the Pearson correlation coefficient⁴⁰ (Pcc) of each cell line pair from their gene expression profiles. Finally, as shown in Fig. 1C, the matrix SIM_cc built from the Pcc values of all cell line pairs represents the cell line-cell line similarity network.
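The similarity construction can be sketched in NumPy as below. It combines the Pcc computation with the filtering criterion given in the HNMDRP section: the absolute Pcc must rank in the top 50% of pairs and the p-value must be below 0.01. The same function would apply to the PaDEL drug features to give SIM_dd. Two assumptions are labeled in the code. The p-value uses a normal approximation to the t-distribution of the correlation test statistic, which is accurate for large feature counts such as 16383 genes but is not an exact t-test. The diagonal is left at 0 because the text does not specify it.

```python
import math
import numpy as np


def pcc_similarity(features, keep_fraction=0.5, alpha=0.01):
    """Filtered Pearson similarity matrix; rows of `features` are nodes."""
    X = np.asarray(features, dtype=float)
    n, d = X.shape
    r = np.corrcoef(X)                                # n x n Pearson correlations
    iu = np.triu_indices(n, k=1)                      # each unordered pair once
    pair_r = r[iu]
    # t statistic for H0: rho = 0 with d - 2 degrees of freedom
    rc = np.clip(pair_r, -0.999999, 0.999999)         # avoid division by zero at |r| = 1
    t = rc * np.sqrt((d - 2) / (1.0 - rc ** 2))
    # ASSUMPTION: two-sided p-value from a normal approximation to t_{d-2}
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])
    cutoff = np.quantile(np.abs(pair_r), 1.0 - keep_fraction)   # top 50% of |Pcc|
    keep = (np.abs(pair_r) >= cutoff) & (p < alpha)
    S = np.zeros((n, n))
    S[iu] = np.where(keep, pair_r, 0.0)               # kept pairs use the Pcc value itself
    return S + S.T                                    # symmetric; diagonal stays 0 (assumption)
```

Pairs that fail either test are set to 0, and pairs that pass keep their signed Pcc value as the similarity score.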

Cell line-drug association network

The log-normalized IC₅₀ values from the GDSC database summarize the initial cell line-drug associations. We use the thresholds provided by Iorio et al.⁴¹ to classify these continuous IC₅₀ values as sensitive or resistant (Fig. 1A). Each drug has its own threshold: IC₅₀ values above it are defined as resistant, and the others as sensitive. In total, we obtain 17316 sensitive, 129815 resistant and 34687 unknown associations among 962 cell lines and 189 drugs. As shown in Fig. 1C, the matrix A_cd represents the association network between the 962 cell lines and 189 drugs for further analysis.
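The per-drug thresholding can be sketched as follows. The inputs `log_ic50` (with NaN marking untested pairs) and `thresholds` (one value per drug, standing in for the Iorio et al. thresholds) are hypothetical. The counts above are consistent: 17316 + 129815 + 34687 = 181818 = 962 × 189, so every cell line-drug pair falls into exactly one class.

```python
import numpy as np


def binarize_ic50(log_ic50, thresholds):
    """Return (A_cd, labels) with labels 1 = sensitive, -1 = resistant, 0 = unknown."""
    X = np.asarray(log_ic50, dtype=float)        # n cell lines x m drugs, NaN = untested
    thr = np.asarray(thresholds, dtype=float)    # one threshold per drug (broadcast over rows)
    known = ~np.isnan(X)
    # replace NaN by -inf so untested pairs never count as resistant
    resistant = known & (np.nan_to_num(X, nan=-np.inf) > thr)   # above the drug's threshold
    sensitive = known & ~resistant                               # at or below the threshold
    labels = np.where(sensitive, 1, np.where(resistant, -1, 0))
    A_cd = sensitive.astype(float)               # resistant and unknown both become 0
    return A_cd, labels
```

The three-way `labels` array keeps the resistant/unknown distinction needed for evaluation, while `A_cd` is the binary input matrix used by the propagation step.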

Drug similarity network

To construct the drug-drug similarity network, we first download the chemical structures of all 189 drugs from PubChem⁴² (https://www.ncbi.nlm.nih.gov/pccompound). We then extract the 1-D and 2-D structural properties of the 189 drugs (listed in Supplementary Table S1) using the PaDEL software⁴³ with default settings (Fig. 1B, middle panel). The 1-D features are compositional molecular properties such as atom count, bond count and molecular weight. The 2-D features are quantitative properties of molecular topology, e.g., Kappa shape indices⁴⁴, Randic⁴⁵ and Wiener indices⁴⁶. Finally, following Zhang et al.¹⁶, we calculate the Pcc value of each drug pair from these features. As shown in Fig. 1C, the matrix SIM_dd built from the Pcc values of all drug pairs represents the drug-drug similarity network.

Drug-target interaction network

We collect target information from the GDSC⁴ database. First, we extract the drug-target interactions between the 189 drugs and the 243 target genes that also appear in the KEGG⁴⁷ drug database. We then extract 3040 available genes associated with the target genes³⁹ from the STRING database. Finally, as shown in Fig. 1C, the corresponding matrix A_dt represents the drug-target network between the 189 drugs and 3040 genes.
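A minimal sketch of building A_dt from a list of (drug, gene) interaction pairs is shown below. The function name and the toy identifiers in the usage line are hypothetical. Pairs whose drug or gene falls outside the 189-drug or 3040-gene sets are skipped, which is our assumption about preprocessing.

```python
import numpy as np


def drug_target_matrix(pairs, drugs, genes):
    """Binary m x l matrix with A_dt[i, j] = 1 if drug i targets gene j."""
    d_idx = {d: i for i, d in enumerate(drugs)}
    g_idx = {g: j for j, g in enumerate(genes)}
    A_dt = np.zeros((len(drugs), len(genes)))
    for drug, gene in pairs:
        if drug in d_idx and gene in g_idx:     # ASSUMPTION: ignore pairs outside the sets
            A_dt[d_idx[drug], g_idx[gene]] = 1.0
    return A_dt


# Toy usage with placeholder identifiers (not real GDSC entries)
print(drug_target_matrix([("drugA", "G1"), ("drugB", "G3"), ("drugB", "G9")],
                         ["drugA", "drugB"], ["G1", "G2", "G3"]))
```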

Target similarity network

To construct the target-target similarity network, we generate two gene-gene relationship matrices, W_ppi and W_corr (Fig. 1B, right panel). First, we apply a 0.4 confidence cut-off^48,49 to extract 396419 PPIs among the available genes from the STRING database³⁸. Following previous works^50,51, we store the confidence scores of these PPIs in the matrix W_ppi and normalize it as follows:

$$\overline{W}_{ppi}(i,j)=\frac{W_{ppi}(i,j)}{\sqrt{D_{ppi}(i,i)\,D_{ppi}(j,j)}}$$

(1)

where ${D}_{ppi}(i,i)$ is the sum of row i of ${W}_{ppi}$, and $\overline{W}_{ppi}$ is the normalized matrix of PPI weights among the available genes. We then extract the gene expression profiles of these genes from the GDSC database and, following a previous study³⁹, calculate their pairwise Pcc values. The matrix W_corr holds these values as the weights of the correlational relationships among the available genes³⁴. Finally, to treat the two weighted matrices (W_corr and $\overline{W}_{ppi}$) fairly, we combine them as follows⁵²:

$$SIM_{tt}(i,j)=1-\left(1-W_{corr}(i,j)\right)\left(1-\overline{W}_{ppi}(i,j)\right)$$

(2)

As shown in Fig. 1C, the matrix SIM_tt represents the target similarity network, which merges the correlational relationships (W_corr) with the PPI information ($\overline{W}_{ppi}$).
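Equations (1) and (2) can be sketched with NumPy as below. Two choices here are assumptions. The product in Eq. (2) is taken element-wise, as a noisy-OR style merge. Genes with no PPI partners (zero row sum) receive a normalized weight of 0 instead of triggering a division by zero.

```python
import numpy as np


def normalize_ppi(W_ppi):
    """Eq. (1): W(i, j) / sqrt(D(i, i) * D(j, j)), with D(i, i) the row sums of W."""
    W = np.asarray(W_ppi, dtype=float)
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    np.divide(1.0, np.sqrt(d), out=inv_sqrt, where=d > 0)   # ASSUMPTION: isolated genes stay 0
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def target_similarity(W_corr, W_ppi):
    """Eq. (2), element-wise: SIM_tt = 1 - (1 - W_corr) * (1 - normalized W_ppi)."""
    W_norm = normalize_ppi(W_ppi)
    return 1.0 - (1.0 - np.asarray(W_corr, dtype=float)) * (1.0 - W_norm)
```

With this merge, a gene pair is similar if either source supports it. When W_corr is all zeros, SIM_tt reduces to the normalized PPI matrix.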

HNMDRP

HNMDRP predicts cell line-drug associations from heterogeneous information: cell line gene expression profiles, drug chemical structure features, drug-target interactions and PPIs. Fig. 1 summarizes the overall workflow of our method. The Pcc⁴⁰ is a widely used measure of correlational relationships³⁴ and is defined as:

$$Pcc=\frac{\sum_{k}(X_k-\bar{X})(Y_k-\bar{Y})}{\sqrt{\sum_{k}(X_k-\bar{X})^{2}\sum_{k}(Y_k-\bar{Y})^{2}}}$$

(3)

where X and Y are the feature vectors of two nodes, and $\bar{X}$ and $\bar{Y}$ are their respective mean values. Taking the cell line similarity network as an example, we calculate the Pcc value and its p-value (t-test) between each cell line and every other cell line. Following the procedure and criteria of a previous study³⁹, a cell line pair counts as correlated if its absolute Pcc value ranks in the top 50% of all cell line pairs and its p-value is below 0.01. The Pcc value of each such pair is then used as its similarity score. The same procedure gives the drug similarity network among the 189 drugs. We then use these similarity measures to build a heterogeneous network model that incorporates the relationships among cell line gene expression, drug chemical properties, drug-target interactions and PPIs simultaneously. This comprehensive network H(C, D, T, E) consists of five sub-networks: the cell line-cell line similarity network, the drug-drug similarity network, the target-target similarity network, the cell line-drug association network and the drug-target interaction network. Three types of nodes connect these networks: cancer cell line nodes, drug nodes and target gene nodes. Let CC = {c₁, c₂, c₃…c_n} denote the n cancer cell line nodes and DD = {d₁, d₂, d₃…d_m} denote the m drug nodes. These two node types define the similarity matrices SIM_cc and SIM_dd. Within each intra-network, the element SIM(i, j) in row i and column j is the Pcc value between node i and node j. TT = {${t}_{1},{t}_{2},{t}_{3}\ldots {t}_{l}$} denotes the l target gene nodes, and each element of SIM_tt combines PPI and correlational relationships. We define the edge weights between nodes as CD = {$c{d}_{ij}$|i = 1, 2, 3…n, j = 1, 2, 3…m} and DT = {dt_ij|i = 1, 2, 3…m, j = 1, 2, 3…l}. The matrix A_cd(i, j) is the bipartite association network between cell lines and drugs.
For instance, the edge $c{d}_{ij}$ is set to 1 if cell line i is sensitive to drug j; resistant and unknown pairs are set to 0. The matrix ${A}_{dt}(i,j)$ is also a bipartite graph, built from the molecular activities between drugs and target genes: the edge dt_ij is set to 1 if drug i has therapeutic target j, and to 0 otherwise. Finally, as Fig. 1C shows, these five similarity and interaction networks form one comprehensive heterogeneous network. We then run an information flow-based algorithm²³ on this integrated network as follows:

$${A}_{cd}^{k+1}=\alpha {A}_{cd}^{k}\times (SI{M}_{dd}\times {A}_{dt}^{k}\times SI{M}_{tt}\times {A}_{dt}^{k\,T})+(1-\alpha ){A}_{cd}^{0}$$

(4)

$${A}_{dt}^{k+1}=\alpha ({A}_{cd}^{k\,T}\times SI{M}_{cc}\times {A}_{cd}^{k}\times SI{M}_{dd})\times {A}_{dt}^{k}+(1-\alpha ){A}_{dt}^{0}$$

(5)

where ${A}_{cd}^{0}$ and ${A}_{dt}^{0}$ are the initial cell line-drug associations and drug-target interactions; SIM_cc, SIM_dd and SIM_tt are the similarity networks among cell lines, drugs and target genes, respectively; and α is a decay factor between 0 and 1. The two equations can be viewed as a propagation algorithm that iterates across the comprehensive network²³. The iteration stops once the summed difference between ${A}_{cd}^{k+1}$ and ${A}_{cd}^{k}$ falls below a threshold of 10⁻⁴, as in previous work²⁴, and ${A}_{cd}^{k+1}$ is then taken as the final drug response prediction score. Because different data resources are merged, the matrices must be normalized to ensure that the algorithm converges²³. The normalization is defined as follows:

$$Norm({v}_{i},{v}_{j})=\frac{W({v}_{i},{v}_{j})}{\sqrt{{\sum }_{k=1}^{m}\,W({v}_{i},{v}_{k}){\sum }_{k=1}^{n}\,W({v}_{k},{v}_{j})}}$$

(6)

where W(v_i, v_j) is either $(SI{M}_{dd}\times {A}_{dt}^{k}\times SI{M}_{tt}\times {A}_{dt}^{kT})$ or $({A}_{cd}^{kT}\times SI{M}_{cc}\times {A}_{cd}^{k}\times SI{M}_{dd})$ at each iteration, and Norm(v_i, v_j) is the corresponding normalized matrix.
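The iteration in Eqs. (4) to (6) can be sketched with NumPy as below. This is a minimal sketch under stated assumptions, not the authors' released code. We read the leading factor in Eq. (5) as the transpose of A_cd. That is the only reading under which the product is defined: it gives an m x m matrix that can multiply the m x l matrix A_dt. The "summed difference" stopping rule is taken as the sum of absolute changes in A_cd. Pcc similarities can be negative, so non-positive row-sum by column-sum products are clipped and the corresponding normalized weights are set to 0. The defaults for `alpha` and `max_iter` are placeholders, not values from the paper.

```python
import numpy as np


def sym_normalize(W):
    """Eq. (6): W(i, j) / sqrt(row_sum(i) * col_sum(j)); non-positive products map to 0."""
    prod = np.outer(W.sum(axis=1), W.sum(axis=0))
    denom = np.sqrt(np.clip(prod, 0.0, None))       # ASSUMPTION: negative products clipped to 0
    out = np.zeros_like(W)
    np.divide(W, denom, out=out, where=denom > 0)
    return out


def hnmdrp_propagate(A_cd0, A_dt0, SIM_cc, SIM_dd, SIM_tt,
                     alpha=0.5, tol=1e-4, max_iter=100):
    """Iterate Eqs. (4) and (5); alpha and max_iter are placeholder values."""
    A_cd0 = np.asarray(A_cd0, dtype=float)   # n x m initial associations
    A_dt0 = np.asarray(A_dt0, dtype=float)   # m x l initial drug-target links
    SIM_cc = np.asarray(SIM_cc, dtype=float)  # n x n
    SIM_dd = np.asarray(SIM_dd, dtype=float)  # m x m
    SIM_tt = np.asarray(SIM_tt, dtype=float)  # l x l
    A_cd, A_dt = A_cd0.copy(), A_dt0.copy()
    for _ in range(max_iter):
        # Eq. (4): drug-side operator from drug and target similarity (m x m)
        W_d = sym_normalize(SIM_dd @ A_dt @ SIM_tt @ A_dt.T)
        # Eq. (5), read with A_cd^T so the product is m x m
        W_c = sym_normalize(A_cd.T @ SIM_cc @ A_cd @ SIM_dd)
        A_cd_new = alpha * A_cd @ W_d + (1 - alpha) * A_cd0
        A_dt_new = alpha * W_c @ A_dt + (1 - alpha) * A_dt0
        converged = np.abs(A_cd_new - A_cd).sum() < tol   # ASSUMPTION: summed absolute change
        A_cd, A_dt = A_cd_new, A_dt_new
        if converged:
            break
    return A_cd, A_dt
```

Both updates use the matrices from iteration k, matching the superscripts in Eqs. (4) and (5). Setting `alpha=0` returns the initial matrices unchanged, which is a quick sanity check on the restart term.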

References

Eisenstein, M. Personalized medicine: Special treatment. Nature 513, S8–S9 (2014).
Article CAS PubMed Google Scholar
Mirnezami, R., Nicholson, J. & Darzi, A. Preparing for precision medicine. New England Journal of Medicine 366, 489–491 (2012).
Article PubMed Google Scholar
Cui, J. et al. An integrated transcriptomic and computational analysis for biomarker identification in gastric cancer. Nucleic acids research 39, 1197–1207 (2010).
Article PubMed PubMed Central Google Scholar
Yang, W. et al. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic acids research 41, D955–D961 (2013).
Article CAS PubMed Google Scholar
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity. Nature 483, 603 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Gottlieb, A., Stein, G. Y., Ruppin, E. & Sharan, R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Molecular systems biology 7, 496 (2011).
Article PubMed PubMed Central Google Scholar
Wang, L. et al. RFDT: A Rotation Forest-based Predictor for Predicting Drug-Target Interactions using Drug Structure and Protein Sequence Information. Current protein & peptide science (2016).
Gupta, S. et al. Prioritization of anticancer drugs against a cancer using genomic features of cancer cells: A step towards personalized medicine. Scientific reports 6 (2016).
Dong, Z. et al. Anticancer drug sensitivity prediction in cell lines from baseline gene expression through recursive feature selection. BMC cancer 15, 489 (2015).
Geeleher, P., Cox, N. J. & Huang, R. S. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome biology 15, R47 (2014).
Liu, X. et al. A systematic study on drug-response associated genes using baseline gene expressions of the Cancer Cell Line Encyclopedia. Scientific reports 6 (2016).
Chen, X. et al. NLLSS: predicting synergistic drug combinations based on semi-supervised learning. PLoS computational biology 12, e1004975 (2016).
Menden, M. P. et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS one 8, e61318 (2013).
Ammad-Ud-Din, M. et al. Integrative and personalized QSAR analysis in cancer by kernelized Bayesian matrix factorization. Journal of chemical information and modeling 54, 2347–2359 (2014).
Wang, Y., Fang, J. & Chen, S. Inferences of drug responses in cancer cells from cancer genomic features and compound chemical and therapeutic properties. Scientific Reports 6 (2016).
Zhang, N. et al. Predicting anticancer drug responses using a dual-layer integrated cell line-drug network model. PLoS Comput Biol 11, e1004498 (2015).
Drews, J. Drug discovery: a historical perspective. Science 287, 1960–1964 (2000).
Schreiber, S. L. Target-oriented and diversity-oriented organic synthesis in drug discovery. Science 287, 1964–1969 (2000).
Chen, X. et al. Drug–target interaction prediction: databases, web servers and computational models. Briefings in bioinformatics 17, 696–712 (2015).
Huang, Y.-A., You, Z.-H. & Chen, X. A systematic prediction of drug-target interactions using molecular fingerprints and protein sequences. Current protein & peptide science (2016).
Stanfield, Z., Coşkun, M. & Koyutürk, M. Drug Response Prediction as a Link Prediction Problem. Scientific reports 7, 40321 (2017).
Chen, X., Liu, M.-X. & Yan, G.-Y. Drug–target interaction prediction by random walk on the heterogeneous network. Molecular BioSystems 8, 1970–1978 (2012).
Wang, W., Yang, S., Zhang, X. & Li, J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 30, 2923–2930 (2014).
Wang, W., Yang, S. & Li, J. In Pacific Symposium on Biocomputing, 53 (NIH Public Access).
Kohavi, R. In IJCAI, 1137–1145 (Stanford, CA).
Sun, D., Li, A., Feng, H. & Wang, M. NTSMDA: prediction of miRNA–disease associations by integrating network topological similarity. Molecular BioSystems 12, 2224–2232 (2016).
Zhai, Q.-Q. et al. Determination of SNX-2112, a selective Hsp90 inhibitor, in plasma samples by high-performance liquid chromatography and its application to pharmacokinetics in rats. Journal of pharmaceutical and biomedical analysis 53, 1048–1052 (2010).
Elmore, J. G. et al. Ten-year risk of false positive screening mammograms and clinical breast examinations. New England Journal of Medicine 338, 1089–1096 (1998).
Xu, X. & Wang, M. Inferring Disease Associated Phosphorylation Sites via Random Walk on Multi-Layer Heterogeneous Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 13, 836–844 (2016).
Lucas, D. et al. The histone deacetylase inhibitor MS-275 induces caspase-dependent apoptosis in B-cell chronic lymphocytic leukemia cells. Leukemia 18, 1207 (2004).
Gobin, B. et al. NVP-BEZ235, a dual PI3K/mTOR inhibitor, inhibits osteosarcoma cell proliferation and tumor development in vivo with an improved survival rate. Cancer letters 344, 291–298 (2014).
Gimsing, P. et al. A phase I clinical trial of the histone deacetylase inhibitor belinostat in patients with advanced hematological neoplasia. European journal of haematology 81, 170–176 (2008).
Sebaugh, J. Guidelines for accurate EC50/IC50 estimation. Pharmaceutical statistics 10, 128–134 (2011).
Liao, Q. et al. Large-scale prediction of long non-coding RNA functions in a coding–non-coding gene co-expression network. Nucleic acids research 39, 3864–3878 (2011).
Venkatesan, K. et al. (AACR, 2010).
Costello, J. C. et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nature biotechnology 32, 1202 (2014).
Shen, L. et al. Drug sensitivity prediction by CpG island methylation profile in the NCI-60 cancer cell line panel. Cancer research 67, 11335–11343 (2007).
Szklarczyk, D. et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic acids research 39, D561–D568 (2011).
Peng, C. & Li, A. A heterogeneous network based method for identifying GBM-related genes by integrating multi-dimensional data. IEEE/ACM Transactions on Computational Biology and Bioinformatics (2016).
Ahlgren, P., Jarneving, B. & Rousseau, R. Requirements for a cocitation similarity measure, with special reference to Pearson’s correlation coefficient. Journal of the American Society for Information Science and Technology 54, 550–560 (2003).
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
Bolton, E. E., Wang, Y., Thiessen, P. A. & Bryant, S. H. PubChem: integrated platform of small molecules and biological activities. Annual reports in computational chemistry 4, 217–241 (2008).
Yap, C. W. PaDEL‐descriptor: An open source software to calculate molecular descriptors and fingerprints. Journal of computational chemistry 32, 1466–1474 (2011).
Hall, L. H. & Kier, L. B. The molecular connectivity chi indexes and kappa shape indexes in structure‐property modeling. Reviews in Computational Chemistry, Volume 2, 367–422 (2007).
Randić, M. Novel graph theoretical approach to heteroatoms in quantitative structure—activity relationships. Chemometrics and Intelligent Laboratory Systems 10, 213–227 (1991).
Bonchev, D. The overall Wiener index a new tool for characterization of molecular topology. Journal of chemical information and computer sciences 41, 582–592 (2001).
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 28, 27–30 (2000).
Butland, G. et al. eSGA: E. coli synthetic genetic array analysis. Nature methods 5, 789–795 (2008).
Jafari, M., Nickchi, P., Safari, A., Tazehkand, S. J. & Mirzaie, M. IMAN: Interlog protein network reconstruction, Matching and ANalysis. bioRxiv 069104 (2016).
Von Mering, C. et al. STRING: known and predicted protein–protein associations, integrated and transferred across organisms. Nucleic acids research 33, D433–D437 (2005).
Franceschini, A. et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic acids research 41, D808–D815 (2012).
Guo, X. et al. Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks. Nucleic acids research 41, e35–e35 (2013).
Huang, X.-F. et al. Aurora kinase inhibitory VX-680 increases Bax/Bcl-2 ratio and induces apoptosis in Aurora-A-high acute myeloid leukemia. Blood 111, 2854–2865 (2008).
Galanis, E. et al. Phase II trial of vorinostat in recurrent glioblastoma multiforme: a north central cancer treatment group study. Journal of clinical oncology 27, 2052–2058 (2009).
Iseki, H. et al. Cyclin-dependent kinase inhibitors block proliferation of human gastric cancer cells. Surgery 122, 187–195 (1997).


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61471331, 61571414 and 31100955) and the University of Science and Technology of China (USTC). We appreciate the valuable suggestions from the anonymous reviewers. We also thank Dongdong Sun for many helpful discussions and suggestions.

Author information

Fei Zhang and Minghui Wang contributed equally to this work.

Authors and Affiliations

School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
Fei Zhang, Minghui Wang & Ao Li
Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230027, China
Minghui Wang, Jianing Xi, Jianghong Yang & Ao Li

Contributions

F.Z. and M.W. wrote the main manuscript text and prepared all Tables and Figures. A.L., J.Y. and J.X. provided valuable suggestions and guidance. All authors reviewed the manuscript.

Corresponding author

Correspondence to Minghui Wang.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary information

Dataset 1

Dataset 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Zhang, F., Wang, M., Xi, J. et al. A novel heterogeneous network-based method for drug response prediction in cancer cell lines. Sci Rep 8, 3355 (2018). https://doi.org/10.1038/s41598-018-21622-4


Received: 04 October 2017
Accepted: 06 February 2018
Published: 20 February 2018
DOI: https://doi.org/10.1038/s41598-018-21622-4
