Combining genomic and network characteristics for extended capability in predicting synergistic drugs for cancer

The identification of synergistic chemotherapeutic agents from a large pool of candidates is highly challenging. Here, we present a Ranking-system of Anti-Cancer Synergy (RACS) that combines features of targeting networks and transcriptomic profiles, and validate it on three types of cancer. Using data on human B-cell lymphoma from the Dialogue for Reverse Engineering Assessments and Methods consortium, we show a probability concordance of 0.78, compared with 0.61 for the previous best algorithm. We confirm 63.6% of our breast cancer predictions through experiment and literature, including four strong synergistic pairs. Further in vivo screening in a zebrafish MCF7 xenograft model confirms one prediction with strong synergy and low toxicity. Validation using A549 lung cancer cells shows similar results. Thus, RACS can significantly improve drug synergy prediction and markedly reduce the experimental prescreening of existing drugs for repurposing to cancer treatment, although the molecular mechanism underlying particular interactions remains unknown.


Supplementary Tables
Supplementary Table 1: 14 features initially selected to differentiate synergistic pairs from unlabeled combinations.

TC
The similarity between the therapeutic categories (ATC codes) of the two agents. TC was chosen to examine whether the two paired drugs in known synergistic combinations tend to share similar therapeutic effects.

AMW
The average molecular weight of the two agents. AMW was selected to see whether the drug components in synergistic combinations tend to have small or large molecular weights.

VMW
The variance of the molecular weights of the two agents. VMW was selected to see whether the drug components in synergistic combinations tend to have similar molecular weights.

MI
The similarity between the biological processes (BPs) regulated by the targets of the two agents; a large value often implies high functional similarity. The paired drugs in synergistic combinations were found to regulate different cancer-related BPs, as negative MI values were observed for the known synergistic combinations.

DCI
The difference between the relative change in network information-transmitting efficiency induced by the combined agent pair and the sum of the changes induced by the individual agents. As indicated by DCI, synergistic combinations tend to reduce the information-transmitting efficiency of the cancer network more than the sum of the individual drugs does.

Dis
The average distance between the target proteins of the two agents in the protein-protein interaction (PPI) network. The drug components in synergistic combinations tend to have shorter distances between their target proteins in the PPI network.

 7
Eff.D The evaluation of the efficacy of drug pairs considering both therapeutic effects and additional effects, calculated with degree of the drug target in the network. This feature was designed based on the assumption that good combinations are expected to generate maximum therapeutic effects and minimum additional effects.
The evaluation of the efficacy of drug pairs considering both therapeutic effects and additional effects, calculated with betweenness of the drug targets in the network. This feature was designed based on the assumption that good combinations are expected to generate maximum therapeutic effects and minimum additional effects. Eff.E The evaluation of the efficacy of drug pairs considering both  therapeutic effects and additional effects, calculated with eigenvector centrality of the drug target in the network. This feature was designed based on the assumption that good combinations are expected to generate maximum therapeutic effects and minimum additional effects. 10 NNO The overlap rate between the network neighbors of the target sets of the two agents. This feature was used to see whether the paired drugs tend to share more interaction partners in the protein-protein interaction network.

MP.I
The proportion of identical pathways regulated by the targets of the two agents.

MP.C
The proportion of cross-talking pathways regulated by the targets of the two agents.

MP.Int
The proportion of interacting pathways regulated by the targets of the two agents.

MP.U
The proportion of unrelated pathways regulated by the targets of the two agents. We found that the paired drugs in synergistic combinations tend to regulate unrelated cancer-related pathways.

DEG_Overlap
The overlap rate of the differentially expressed genes of the two agents.

The overlap rate of the network neighbors of the differentially expressed genes of the two agents.

DEG_BP
The similarity between the biological processes of the differentially expressed genes of the two agents.

Pathway_Coverage
The coverage rate of the breast cancer pathway by the differentially expressed genes of the two agents.

DEG_Dis
The average shortest distance between the two differentially expressed gene sets of the two agents in the PPI network.

Topotecan   Lapatinib   *
"*" marks the 26 positive samples finally selected to build the prediction model.
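Several features in the tables above reduce to simple arithmetic, set and graph computations. The sketch below illustrates AMW, VMW, a Dis-style average shortest distance and an overlap rate usable for NNO or DEG_Overlap, assuming an unweighted PPI network stored as an adjacency dictionary. The exact formulas are assumptions, not the definitions used to build RACS: population variance for VMW, the Jaccard index for "overlap rate", and averaging over all reachable target pairs for Dis. All function names and the toy network are hypothetical.

```python
from collections import deque
from statistics import mean, pvariance


def amw(mw_a: float, mw_b: float) -> float:
    """AMW: average molecular weight of the two agents."""
    return mean([mw_a, mw_b])


def vmw(mw_a: float, mw_b: float) -> float:
    """VMW: variance of the two molecular weights (population variance assumed)."""
    return pvariance([mw_a, mw_b])


def overlap_rate(set_a: set, set_b: set) -> float:
    """Overlap rate, assumed here to be the Jaccard index |A & B| / |A | B|."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def neighbors_of(graph: dict, nodes: set) -> set:
    """All direct network neighbors of a node set (input for an NNO-style feature)."""
    return {nbr for node in nodes for nbr in graph.get(node, ())}


def bfs_distances(graph: dict, source: str) -> dict:
    """Shortest-path lengths (number of edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def avg_target_distance(graph: dict, targets_a: set, targets_b: set) -> float:
    """Dis-style feature: mean shortest distance over all reachable target pairs."""
    lengths = []
    for a in targets_a:
        dist = bfs_distances(graph, a)
        lengths.extend(dist[b] for b in targets_b if b in dist)
    return mean(lengths) if lengths else float("inf")


if __name__ == "__main__":
    # Toy PPI network: a simple path P1 - P2 - P3 - P4 (hypothetical proteins).
    ppi = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2", "P4"}, "P4": {"P3"}}
    print(amw(300.0, 500.0), vmw(300.0, 500.0))
    print(avg_target_distance(ppi, {"P1"}, {"P3", "P4"}))
    print(overlap_rate(neighbors_of(ppi, {"P1"}), neighbors_of(ppi, {"P3"})))
```

Breadth-first search is enough here because PPI edges are treated as unweighted; a weighted network would need Dijkstra's algorithm instead.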

Supplementary Note 3: Robustness of RACS
The robustness of RACS depends on the background PPI network and the cancer network, because the discriminating features used in the preliminary ranking model are calculated from these networks. To test how perturbing the networks affects RACS, protein nodes in the background PPI network or in the cancer network were randomly removed at different percentages, three times per percentage, and the influence on the values of the discriminating features was examined for each network. All the features remained discriminating (|Z-score| > 3) until more than 10% of the nodes were randomly removed (Supplementary Fig. 8). Next, these perturbed features were incorporated into RACS to obtain ranking lists on the dataset used in the "Evaluating performance of the preliminary ranking model" part. The normalized Discounted Cumulative Gain (nDCG: http://en.wikipedia.org/wiki/NDCG) was used to measure the similarity between the perturbed rankings and the original one, with one query set containing 10 positive samples. The performance of the preliminary ranking model remained acceptable (nDCG ≈ 0.6) until more than 20% of the nodes were removed from either network (Supplementary Fig. 9). This is expected: randomly removing a small number of nodes changes the network little because most nodes have few connections, whereas removing a large number of nodes makes the network substantially different.
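The perturbation and the nDCG comparison can be sketched as follows. This is a minimal illustration under stated assumptions: networks are adjacency dictionaries, nodes are removed uniformly at random, and nDCG uses binary relevance (1 for each positive sample in the query set) with the common log2 rank discount, so comparing the nDCG of a perturbed ranking with that of the original ranking indicates how much the perturbation degrades the placement of the positives. The relevance scheme actually used for Supplementary Fig. 9 may differ, and the function names are hypothetical.

```python
import math
import random


def remove_random_nodes(graph: dict, fraction: float, seed=None) -> dict:
    """Copy of an adjacency-dict graph with a random fraction of nodes (and their edges) removed."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    removed = set(rng.sample(nodes, round(fraction * len(nodes))))
    # Drop the removed nodes and every edge that points to them.
    return {v: {u for u in nbrs if u not in removed}
            for v, nbrs in graph.items() if v not in removed}


def dcg(gains: list) -> float:
    """Discounted cumulative gain: sum of gain_i / log2(i + 1) over ranks i = 1, 2, ..."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking: list, positives: set) -> float:
    """nDCG of a ranked list of drug pairs, with binary relevance for the positive pairs."""
    gains = [1.0 if pair in positives else 0.0 for pair in ranking]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

In the design described above, each removal fraction would be repeated three times with different seeds, the features recomputed on the perturbed network, and the resulting ranking scored with `ndcg`.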

Supplementary Note 4: Contribution of individual features
The contribution of the individual features was further evaluated using different query sets of the model on ER-positive breast cancer data (Supplementary Fig. 10). In general, RACS performed better with each of the 7 significant features alone than with the insignificant features (Supplementary Fig. 11). RACS obtained a PC-index of 0.69 when the 7 significant features were combined, whereas its PC-index decreased to 0.49 when all 14 features were used (Supplementary Fig. 11). Among the 7 models using only 6 of the significant features, the one with MI removed performed best, but none of them surpassed the RACS model with all 7 significant features (Supplementary Fig. 11). In addition, model performance worsened when one of the 7 insignificant features was added to the significant features (Supplementary Fig. 11).
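The feature-subset comparisons described above can be enumerated as below. The sketch only builds the feature list for each model variant (single-feature models, the 7-feature and 14-feature models, the 7 leave-one-out models and the 7 add-one models); training RACS and computing the PC-index for each variant are not shown. It assumes the caller supplies the significant/insignificant split, and the names S1 to S7 and N1 to N7 in the usage example are placeholders, since this note names only MI among the significant features.

```python
def ablation_configs(significant: list, insignificant: list) -> dict:
    """Feature subsets for each model variant compared in the feature-contribution analysis."""
    configs = {}
    # Single-feature models, one per feature.
    for feature in significant + insignificant:
        configs[f"single:{feature}"] = [feature]
    # The two full models.
    configs["all_significant"] = list(significant)
    configs["all_features"] = significant + insignificant
    # Leave-one-out: drop each significant feature in turn.
    for feature in significant:
        configs[f"drop:{feature}"] = [f for f in significant if f != feature]
    # Add-one: append each insignificant feature to the significant set.
    for feature in insignificant:
        configs[f"add:{feature}"] = significant + [feature]
    return configs


if __name__ == "__main__":
    # Placeholder names only; the actual significant/insignificant split is not given here.
    sig = [f"S{i}" for i in range(1, 8)]
    insig = [f"N{i}" for i in range(1, 8)]
    for name, features in ablation_configs(sig, insig).items():
        print(name, features)
```

With 7 + 7 features this yields 30 variants: 14 single-feature models, 2 full models, 7 leave-one-out models and 7 add-one models.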

Supplementary Note 6: Performance of RACS when sub-selecting the drug targets
The RACS model combines knowledge derived from drug target proteins with the gene expression profiles of drug response. Target information has accumulated for many compounds, but the targets of many others are still incompletely known. Because such incompleteness may bias the prediction accuracy of RACS, we further evaluated its performance after sub-selecting the target proteins.
The 118 collected drugs target 241 human proteins in total. Most of these drugs have only a few targets (1 to 3), while some have many; for example, Gleevec has 11 targets and arsenic trioxide has 13. Because drugs with 2 or 3 targets account for most of the multi-target drugs (Supplementary Fig. 12), these drugs were chosen to show how RACS performs when part of the target set is missing. Pairing each such drug with the remaining 117 drugs gives 117 drug pairs, and the concordance between the predicted rankings of these 117 pairs before and after sub-selecting the targets was measured with the C-index. For drugs with 2 targets, each of the two targets was removed in turn, one per simulation. For most of these drugs, the ranking obtained after removing one target was highly concordant with the original ranking (C-index > 0.7; Supplementary Fig. 12). For drugs with 3 targets, each of the three single targets and each of the three 2-target combinations was removed in turn. Similarly high concordance was observed for most of these drugs when one target was removed, whereas concordance was somewhat lower when two targets were removed (Supplementary Fig. 12).
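The target sub-selection scheme and the concordance measure can be sketched as follows. The C-index is computed here as the fraction of comparable pairs of drug pairs that the original and the perturbed prediction scores order the same way (pairs tied in the original are skipped, and ties in the perturbed scores count as half), which is one common definition; the variant behind Supplementary Fig. 12 is not specified in this note, and the function names are hypothetical.

```python
from itertools import combinations


def target_subsets_to_remove(targets: list) -> list:
    """Targets removed per simulation: each single target, plus each pair for 3-target drugs."""
    if len(targets) == 2:
        sizes = [1]
    elif len(targets) == 3:
        sizes = [1, 2]
    else:
        raise ValueError("the scheme above covers drugs with 2 or 3 targets")
    return [combo for k in sizes for combo in combinations(targets, k)]


def c_index(reference: dict, perturbed: dict) -> float:
    """Concordance between two score assignments over the same drug pairs."""
    concordant = 0.0
    comparable = 0
    for x, y in combinations(sorted(reference), 2):
        d_ref = reference[x] - reference[y]
        if d_ref == 0:
            continue  # tied in the reference ranking: not comparable
        comparable += 1
        d_new = perturbed[x] - perturbed[y]
        if d_new == 0:
            concordant += 0.5
        elif (d_ref > 0) == (d_new > 0):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")
```

For a 3-target drug, `target_subsets_to_remove` returns six subsets (three singles and three pairs), matching the six simulations described above; each one would give a perturbed score for the 117 drug pairs to compare against the original scores with `c_index`.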

Supplementary Note 7: The construction of "Targeting pathway of experimentally tested drug combinations in ER positive breast cancer cell MCF7"
First, pathways closely related to breast cancer were collected, namely the Estrogen Signaling pathway, ERBB pathway, PI3K/Akt/mTOR Signaling Pathway, p53 Signaling Pathway, Ras Signaling Pathway, Notch Signaling Pathway, Wnt Signaling pathway and NFkB Pathway. These pathways were merged into one large pathway through their overlapping elements. Next, this pathway was tailored to the MCF7 cell line using the gene expression profiles and mutation information of MCF7 in CCLE 2. Drug targets were then mapped onto the pathway and, for clarity of display, pathways not targeted by any drug were removed to form the final "Targeting pathway of experimentally tested drug combinations in ER positive breast cancer cell MCF7".
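The compilation step can be illustrated with pathways represented as sets of gene-gene edges. In this sketch, merging via overlapping elements is modeled as the union of edge sets (shared genes become the junctions), and "pathways not targeted" is interpreted as source pathways that contain none of the mapped drug targets; the MCF7-specific filtering with CCLE expression and mutation data is omitted. The function names, the toy edges and the drug labels in the example are hypothetical placeholders.

```python
Edge = tuple[str, str]


def genes_of(edges: set) -> set:
    """All genes appearing in a set of edges."""
    return {gene for edge in edges for gene in edge}


def build_targeting_pathway(pathways: dict, drug_targets: dict):
    """Merge pathways, map drug targets, and keep only the targeted pathways for display."""
    # 1. Compile all pathways into one network; shared genes link them together.
    merged: set = set().union(*pathways.values())
    # 2. Map each drug's targets onto the compiled network.
    network_genes = genes_of(merged)
    mapped = {drug: set(t) & network_genes for drug, t in drug_targets.items()}
    hit = set().union(*mapped.values())
    # 3. For display, drop source pathways that contain no mapped target.
    kept = [name for name, edges in pathways.items() if genes_of(edges) & hit]
    display = set().union(*(pathways[name] for name in kept))
    return display, mapped, kept


if __name__ == "__main__":
    toy_pathways = {
        "ERBB": {("ERBB2", "PIK3CA"), ("ERBB2", "GRB2")},
        "PI3K": {("PIK3CA", "AKT1"), ("AKT1", "MTOR")},
        "Notch": {("NOTCH1", "HES1")},
    }
    toy_targets = {"drug_A": {"ERBB2"}, "drug_B": {"MTOR"}}
    display, mapped, kept = build_targeting_pathway(toy_pathways, toy_targets)
    print(kept, sorted(display))
```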
Dexamethasone and Imatinib were purchased from Selleckchem (Houston, TX, USA). The purity of each drug was above 98%.
Cell line: The NSCLC cell line A549 was obtained from the Shanghai Cell Bank, Chinese Academy of Sciences. Cells were grown in F12K medium supplemented with 10% fetal serum, 100 μg/ml penicillin and 100 μg/ml streptomycin at 37 °C under 5% CO2.
MTT assay: Each well of a 96-well plate was seeded with 100 μl of cell suspension containing 5000 cells, leaving 100 μl of volume for the test drugs. Drugs were added the day after seeding, and the plates were cultured for another 48 hours. Cytotoxicity was assayed by MTT.
For each drug, the dataset was divided into a control group and a drug-treated group, each with at least 3 samples. A t-test was performed on the expression values of each gene between the two groups, and only genes with P < 0.05 were taken as differentially expressed genes (DEGs).
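The DEG selection rule can be sketched in plain Python. SciPy's `scipy.stats.ttest_ind` would be the usual choice; to keep the example dependency-free, this sketch computes a two-sided Student's t-test (equal variances assumed, since the text does not state which variant was used) via the regularized incomplete beta function, evaluated with the standard continued-fraction method. The input layout (a mapping from gene to control and treated values) and all function names are hypothetical.

```python
import math
from statistics import mean, variance


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t distribution: I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def student_t_test(group_a: list, group_b: list) -> float:
    """Two-sample Student's t-test with pooled variance; returns the two-sided p-value."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    diff = mean(group_a) - mean(group_b)
    if se == 0:
        return 1.0 if diff == 0 else 0.0
    return t_two_sided_p(diff / se, df)


def find_degs(expression: dict, alpha: float = 0.05, min_samples: int = 3) -> set:
    """Genes whose control vs. treated expression differs with p < alpha."""
    degs = set()
    for gene, (control, treated) in expression.items():
        if len(control) < min_samples or len(treated) < min_samples:
            raise ValueError(f"{gene}: each group needs at least {min_samples} samples")
        if student_t_test(control, treated) < alpha:
            degs.add(gene)
    return degs
```

Swapping in Welch's test would only change how `student_t_test` computes the standard error and degrees of freedom; the selection threshold stays the same.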

2-step significance test
Take the feature DEG_Overlap as an example. There are 9 positive samples and 46 unlabeled samples. For each drug pair, a set of DEGs is identified for each drug, and DEG_Overlap is derived from the two sets.
Step-1 tests whether the two DEG sets of a drug pair overlap more than expected at random, yielding one p-value per drug pair. Step-2 tests whether the drug pairs with a DEG_Overlap p-value < 0.05 are enriched in the positive sample set.
Step-1: To test whether the two DEG sets overlap only at random, the actual DEG_Overlap is compared with the overlap rate of randomly selected gene sets for the same drug pair. For each drug of a pair, a gene set of the same size as its DEG set was randomly chosen from the whole gene set of the Affymetrix Microarray HGU133-A (11,961 genes), and DEG_Overlap was calculated between the two simulated gene sets; this was repeated 10,000 times. The p-value is the probability of obtaining a simulated DEG_Overlap larger than the actual DEG_Overlap of the drug pair. The test is one-sided because synergistic drug pairs have been reported to share more DEGs than non-synergistic pairs 21.
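Step-1 can be sketched as a Monte Carlo test. Assumptions: DEG_Overlap is taken to be the Jaccard index (the exact "overlap rate" formula is not given in this section), random gene sets are drawn without replacement from the 11,961-gene universe with the same sizes as the two DEG sets, and the p-value is the fraction of simulations whose overlap exceeds the observed one, as described above. Function names are hypothetical.

```python
import random


def overlap_rate(a: set, b: set) -> float:
    """Overlap rate, assumed here to be the Jaccard index |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def step1_pvalue(deg_a: set, deg_b: set, universe_size: int = 11961,
                 n_sim: int = 10000, seed=None) -> float:
    """Fraction of size-matched random gene-set pairs whose overlap exceeds the observed one."""
    rng = random.Random(seed)
    actual = overlap_rate(deg_a, deg_b)
    # Genes are represented by integer indices; the overlap rate depends only on
    # the set sizes and their intersection, not on the gene identities.
    universe = range(universe_size)
    exceed = 0
    for _ in range(n_sim):
        sim_a = set(rng.sample(universe, len(deg_a)))
        sim_b = set(rng.sample(universe, len(deg_b)))
        if overlap_rate(sim_a, sim_b) > actual:
            exceed += 1
    return exceed / n_sim
```

Because the p-value is a plain proportion, it can be exactly 0 when no simulation exceeds the observed overlap; with 10,000 simulations this means p < 1/10,000.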
Step-2: A hypergeometric test was used to assess whether the drug pairs with p-value < 0.05 are enriched in the positive sample set compared with the unlabeled sample set. The Step-1 p-value calculated for each pair is shown in Supplementary Fig. 21. For both DEG_Overlap and Pathway_Coverage, drug pairs with statistically significant p-values (p < 0.05) were enriched in the positive samples (Supplementary Table 3).
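Step-2 can be written as an upper-tail hypergeometric probability: among all 55 drug pairs (9 positive plus 46 unlabeled), K pairs have a Step-1 p-value < 0.05, and the test asks how likely it is to see at least k of them among the 9 positives by chance. This minimal sketch assumes a one-sided (enrichment) test; `scipy.stats.hypergeom.sf(k - 1, 55, K, 9)` should give the same quantity (SciPy's argument order is total population, number of successes, number of draws). The function name is hypothetical, and the counts in the usage line are made up, not the values behind Supplementary Table 3.

```python
from math import comb


def hypergeom_enrichment_p(n_total: int, n_significant: int,
                           n_positive: int, k_sig_positive: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(n_significant, n_positive)
    # Exact integer numerator; comb() returns 0 for impossible configurations.
    numerator = sum(comb(n_significant, i) * comb(n_total - n_significant, n_positive - i)
                    for i in range(k_sig_positive, upper + 1))
    return numerator / comb(n_total, n_positive)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 55 pairs significant, 6 of them among the 9 positives.
    print(hypergeom_enrichment_p(55, 12, 9, 6))
```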