Functional Strati�cation of Cancer Drugs through Integrated Network Similarity

Heterogeneity across tumors is the main obstacle in developing treatment strategies. Drug molecules not only perturb their immediate protein targets but also modulate multiple signaling pathways. In this study, we explored the networks modulated by several drug molecules across multiple cancer cell lines by integrating the drug targets with transcriptomic and phosphoproteomic data. As a result, we obtained 236 reconstructed networks covering �ve cell lines and 70 drugs. A rigorous topological and pathway analysis showed that chemically and functionally different drugs may modulate overlapping networks. Additionally, we revealed a set of tumor-specic hidden pathways with the help of drug network models that are not detectable from the initial data. The difference in the target selectivity of the drugs leads to disjoint networks despite sharing the exact mechanism of action, e.g., HDAC inhibitors. We also used the reconstructed network models to study potential drug combinations based on the topological separation, found literature evidence for a set of drug pairs. Overall, the network-level exploration of the drug perturbations may potentially help optimize treatment strategies and suggest new drug combinations.


Introduction
Transformation of normal cells to tumor cells is a multi-stage process where multiple signaling pathways and biomolecular connections alter [1][2][3][4][5][6][7] .However, response to the drug treatment is highly dependent on cellular and physiological factors in cancer, and many drugs have multiple targets 8 .The molecular heterogeneity across the tumor types may result in different signaling alterations for the same drug [9][10][11] .
Moreover, drugs perturb simultaneously a set of protein networks besides their immediate targets.Therefore, network-based approaches may address several unknowns about the drug e cacy and mechanisms of action across various cancer types [12][13][14] .Integrative multi-omic approaches can provide a realistic view of network-level alterations toward developing better treatment strategies against the complexity and heterogeneity of cancer [15][16][17][18][19] .
Different network-based strategies were previously used to investigate protein-drug and protein-protein interactions.The topological separation of disease modules within the human interactome was studied to elucidate disease-disease interactions 12 .In a similar line, the network proximity of drug targets in the human interactome was considered to predict drug-disease interactions for drug repurposing 13 .Cheng et al. measured the closest distances between drug targets and disease genes and used patient-speci c data to investigate the effects of drugs on different diseases 14 .Ritz et al. developed PathLinker to reconstruct human signaling pathways, which nds multiple shortest paths from receptors to transcriptional regulators in a reference protein interactome 20 .Wu et al. used BioNetGen software 21 to model a detailed rule-based biochemical network of VEGF-mediated eNOS signaling pathway to interpret the effects of an angiogenic inhibitor Thrombospondin-1 (TSP1) 22 .Halasz et al. studied signal transduction networks in colorectal cancer by integrated network reconstruction using Bayesian mechanistic modeling algorithm 23 .Naldi et al. combined weighted shortest paths and random walk methods 24 , and Buffard et al. used this method to identify signi cant pathways from phosphoproteomic data from cancer cells 25 .Scoring the topological proximity between disease proteins in the interactome was used to prioritize disease genes, infer new drug targets, identify the e cacy of drugs, and predict phenotypes [26][27][28][29][30][31] .One challenge in network-based approaches is the incompleteness of the human interactome.It has many false negatives 32,33 and is biased to well-studied proteins.Previous works have considered incomplete interactomes, and different link prediction approaches are combined with networkbased studies [34][35][36][37] .
Many cancer drugs eventually lead to resistance and cause adverse effects when applied continuously and with high doses 38,39 .The combination of drugs with different mechanisms of action is an alternative approach to improve the treatment strategies and repurpose drugs.Therefore, combinatorial drug treatment approaches are rigorously studied to eliminate or reduce resistance, recurrence, and possible side effects [40][41][42][43][44][45] .Effective drug combinations can be predicted with the help of network topology-based analysis of drug-disease interactions and inference of affected pathways 46 .
All these reviewed approaches have the overarching aim of better understanding molecular alterations in diseases, how drugs act within the cell and nding the best treatment options.In this study, we further elaborate on perturbed networks to mechanistically understand the similarities and differences of drugs.
Conceptually we (i) strati ed the drugs at the network level beyond their immediate targets, (ii) evaluated the alterations of drug modulation in different cell lines (iii) suggested potential drug combinations based on topological separation of the networks.For this purpose, we used an integrative approach based on reverse engineering principles that combine a link prediction strategy to modify the underlying interactome and the solution of the prize-collecting Steiner forest problem to reconstruct drug and cell line-speci c networks.All-pair comparison of the reconstructed networks shows that chemically and functionally different drugs may modulate shared pathways.We next considered these reconstructed networks for coherence with the available drug response data and possible drug combination prediction.Thus, these network models are rich resources for examining different aspects of drug actions at the pathway level in different cancer types.

Overview of the Method
We used transcriptomic and phosphoproteomic data of ve cancer cell lines treated with 89 perturbagens and the associated control treatment (Connectivity Map Project -CMap) to understand the drugs' signaling level differences and commonalities systematically.We obtained the upstream regulators -the set of transcription factors -of the signi cantly expressed genes for each cell line-drug pair from transcriptomic data.Additionally, we retrieved the targets of each drug from CMAP Drug Repurposing tool 47 , which combines the data from DrugBank, PubChem, and other drug-related databases.Finally, we merged the set of transcription factors, phosphoprotein hits, and drug targets to obtain the list of seed proteins of each cell line-drug pair for the network modeling.In Figure 1, we conceptually illustrated our integrative approach.
Although we started with 89 drugs and ve cell lines, the number of proteins in the seed list is very low for some drugs, making the network modeling not feasible.We have also tested less stringent thresholds to enlarge the size of the seed list.However, we still could not overcome this issue for some drugs (i.e., decitabine, ginkgetin in A549 and YAPC, vemurafenib, and tacrolimus in MCF7 and PC3 cells).Therefore, we continued with 70 drug treatments on ve cell lines.We calculated Tanimoto similarities and MACCS key distances of these 70 drugs (Supplementary Figure 1, Supplementary Note 1) to understand their chemical similarities and elucidated their mechanism of action (Supplementary Figure 2) to later use as a reference for the analysis.
To discover the hidden mechanisms underlying the effects of drugs, we integrated the seed proteins list (drug targets, transcription factors, and phosphoproteins) with the human interactome using Omics Integrator 48 software.The edge weights in the human interactome (iRefWeb v13.0) represent the con dence of the interactions and are calculated by the MI-Score function.We additionally re ned the human interactome with several approaches to decrease the impact of false positive and false negative interactions.Additionally, the link prediction followed by cellular localization lters enriched the interactome with the missing interactions (see Methods).Omics Integrator solves the prize-collecting Steiner Forest (PCSF) problem to reconstruct optimal subnetworks by integrating the list of seed proteins (terminal set) and a reference interactome.The performance of the network modeling approaches is highly dependent on the interactome, parameter tuning, and inclusion of known biological insights.Omics Integrator has a better performance compared to all pairs shortest paths, page rank, and heat diffusion approaches when rigorous parameter tuning is applied and multiple suboptimal solutions are merged.This modi cation increases the precision and recall of PCSF 49 .Additionally, PCSF can reconstruct a network from a single list of seed nodes whereas other powerful approaches such as PathLinker 20 and ResponseNet 50 require two sets of inputs.Because of its good performance among other methods requiring only a single seed list, Omics Integrator was selected as the main tool for network reconstruction.In this study, we merged the outputs of multiple parameter sets and re ned the underlying interactome to reconstruct better networks.The terminal set size varies between 6 and 214 across the cell line-drug pairs, and the representation of phosphoproteins is relatively limited (max of 20 hits).Proteins have weights re ecting their importance in the terminal set based on their type.
Transcription factors have the mean of their target gene expressions, phosphoproteins have the fold change compared to the control as the weight, and drug targets have a uniform weight inferred from the overall weight distribution of all proteins.
Overall, we constructed 236 subnetworks from the combination of 70 drugs and ve cell lines.However, not all cell lines have the same number of subnetworks.Out of 236 networks, cell line A375 has 70, A549 has 46, MCF7 has 43, PC3 has 59, and YAPC has 18 drug-speci c subnetworks (Supplementary Figure 3A).The topological characteristics of the drug networks can give clues about the perturbation or the connectivity of the modulated proteins.We summarized the topological properties of networks for each cell line (Supplementary Figure 4).We found that the nal network size is dependent on the number of altered proteins/genes from multi-omic data rather than the centrality of the drug targets in the interactome.All topological analyses are present in the Supplementary Material (Supplementary Note 2, Supplementary Figure 5).
We initially quanti ed the node level overlap between network pairs where extensive overlap means similar network neighborhood.98.4% of network pairs share at least one protein.Therefore, a comparison solely based on the overlapping nodes and node frequencies does not allow us to understand the comprehensive modulation of drugs (Supplementary Figure 3 and Supplementary Note 2).
The reconstructed networks preserve more detailed information about the drug effects beyond the common proteins and their association with cancer pathways.If two drug networks highly overlap, these drugs' action and phenotypic outcomes may be potentially similar.We applied the network-based separation approach, developed by Menche et al. 51 , to reveal disease-disease relations on each drug network pair within and across cell lines to explore the network level separation.The separation score represents how close two networks are to each other, a topology-based comparison calculating the average shortest distances between the nodes in each network in the reference interactome (see Methods).The topological overlap of two drug networks in a cell line re ects their similarities at the pathway level.We found several overlapping subnetworks for drug pairs that do not have common target proteins or chemical structures, such as BIX-01338, entinostat, etoposide.The similarity between two networks represented by the separation scores is found statistically signi cant based on a hypergeometric test (Supplementary Data 1).
We also compared the application of the separation score method on reconstructed networks against the direct application on the seed proteins.Direct application of the method resulted in a distribution of separation scores mainly within the negative range.Because of the molecular heterogeneity, it is rarely expected for any drugs to modulate the same pathways in all cancer types.This situation implies false positives across cancer types.The difference between the two applications stems mainly from the intermediate (Steiner) nodes found via the network reconstruction method.These intermediate proteins can potentially reveal the off-target effects of drugs (Supplementary Figure 6 and Supplementary Note 3).
Moreover, the effect of link prediction applied on the interactomes used in network reconstruction is investigated to reveal whether the predicted edges cause any misleading results or not.We observed that predicted edges constitute very low percentages of total edge count in the networks, and they do not cause any spurious proteins to be included in the networks.On the other hand, the predicted edges can be considered as the noise introduced to the reference interactome.The very low level contribution of link prediction to the reconstructed networks show the robustness to this noise (Supplementary Note 4, Supplementary Table 1 and Supplementary Data 2).

Chemically and Functionally Different Drugs May Modulate Overlapping Networks
Transforming the networks into a matrix of separation scores (Supplementary Data 3) and clustering represents the overlaps of the networks as drug modules.We observed that the network overlap of some drugs is very high in speci c cell lines such as A375, MCF7, and PC3, although their chemical similarity is limited.All-pair separation scores of drug modules in the A375 (skin melanoma) cell line are illustrated in Figure 2A with corresponding MoA similarities (T1=same MoA, T2=different MoA).We noticed that the chemically and functionally different drugs might modulate similar networks.Separation scores of network pairs in other cell lines can be found in Supplementary Figure 7.For example, we observed that tacedinaline and geldanamycin have overlapping networks in the A375 cell line, suggesting similar signaling output despite their different targets or mechanism of action (separation-score = -0.61, Figure 2B).
Tacedinaline is a selective HDAC1 inhibitor, while geldanamycin is a heat-shock protein (HSP) inhibitor targeting HSP90AB1 and HSP90AA1 proteins.Effects of HDAC inhibitor and HSP90 inhibitor drugs on HDACs and HSPs and the potential interplay between HDACs (especially HDAC6) and HSP90 has been highlighted in several studies [52][53][54][55][56][57][58] .In our analysis, two drugs share several downstream transcription factors (i.e., HIF1A, HSF1, CEBPA, CEBPB, AR, FOXO1/3) in the reconstructed networks.We then linked these drug targets to cancer phenotypes using CancerGeneNet 59 which calculates the shortest paths between genes and phenotypes.We found that the downstream transcription factors of HDAC1 and HSP90s are shared on the path toward cell death and differentiation phenotypes (Figure 2B sub-panel).
The intersection of tacedinaline and geldanamycin networks in A375 is enriched in several pathways such as MAPK, AMPK, and TGF-beta signaling pathways.Moreover, transcriptomic data of the two drugs are highly correlated.Despite the high overlap, they also have some disjoint pathways enriched in each speci c network.Tacedinaline affects chemokine, neurotrophin, ErbB, TNF, FoxO, PPAR, prolactin, and thyroid hormone signaling pathways that are not present in the geldanamycin network (Figure 2C).
To systematically evaluate the performance of this method, we classi ed the drug pairs having the same MoA and similar resulting networks as true positives (TP) and drug pairs having different MoA and overlapping networks as false positives (FP).We de ned drug pairs having the same MoA and separated networks as false negatives (FN) and drug pairs having different MoA and separated networks as true negatives (TN).We plotted the ROC curve and Precision-Recall curve using different threshold values (Supplementary Figure 8 and Supplementary Data 4).Because the data is imbalanced (200 positive cases -pairs with same MoA, 6017 negative cases -pairs with different MoA), the best performance is found based on Matthews Correlation Coe cient (MCC).The best performance is achieved when the separation score threshold is selected as -0.45.(precision=0.138,recall=0.400,accuracy=0.900,sensitivity=0.400,speci city=0.917,TPR= 0.400, FPR=0.083,F1-score=0.205,MCC=0.192).Same MoA and overlapping networks are expected, but the interesting cases are the drug pairs with different MoA and overlapping networks.We, therefore, further studied the literature to nd evidence about mechanistic similarities among FP drug pairs.This evidence strengthens our claim that even drugs without the same MoA can perturb similar pathways (Supplementary Table 2).Despite being an FP based on the ground truth, the synergistic behavior of decitabine and entinostat was previously shown in the activation of FoxO1 60 .
There are network pairs following the ground truth, i.e., drugs having the same MoA and overlapping networks.One example of drugs with the same MoA is sirolimus and OSI-027, which are selective mTOR inhibitors.Phase I clinical trials of OSI-027 were completed in 2013 for the investigation on patients with advanced solid tumors or lymphoma.It has activity on pancreatic ductal adenocarcinoma as an inhibitor of cell proliferation 61 .The networks of sirolimus and OSI-027 in A375 cells highly overlap (ss = -0.562).They have many common proteins and shared signaling pathways such as MAPK signaling, PI3K-Akt signaling, and cGMP-PKG signaling pathways.However, the OSI-027 network has speci cally an enrichment of Hippo signaling, Jak-STAT signaling pathways, while sirolimus differs from OSI-027 with the enrichment of Wnt signaling, ErbB signaling pathways, and some immunity-related pathways such as Toll-like receptor signaling pathway (Supplementary Figure 9A-B).
Selumetinib and PD-0325901(Mirdametinib) are MEK inhibitors targeting MAP2K1.We showed a consistent similarity between two drugs for the same cancer types.The separation scores between two drugs reach up to -0.64 in YAPC (pancreatic cancer) (Supplementary Figure 9C-D).This similarity is a supportive result for the clinical studies.However, the network perturbation of these drugs may vary in different cancer types.For example, the separation score between PD-0325901 in the PC3 cell line and selumetinib in the YAPC cell line is 0.48.While the affected pathways are FoxO, VEGF, ErbB cAMP, Rap1, Ras, GnRH signaling pathways in PD-0325901 treated PC3, Jak-STAT, TGF-beta, PI3K-Akt, TNF, and Wnt signaling pathways are enriched in selumetinib treated YAPC (Supplementary Figure 9E-F).
Target selectivity of the drug pairs determines the network separation, despite having the same MoA.
The analysis of drug modulation in a cell line-speci c manner helped determine a subset of drugs acting similarly in A375, MCF7, and PC3 (Supplementary Data 5).Afterward, we measured the separation of the drugs across different cell lines to discover if the same subset of drugs similarly acts in various tumor types.We compared the separation scores of drug pairs from 236 networks (Figure 3A) by dividing them into the same cell line and the different cell line groups.Next, we checked if two similar drugs in the same cell line act similarly in different cancer types.The results show that the network level overlap between drug pairs in the same cell line is preserved across different cell lines, but their separation is relatively higher.For example, resveratrol and geldanamycin are highly overlapping in the A375, MCF7, and PC3 cell lines (in A375, separation score=-0.60; in MCF7, separation score=-0.47; in PC3, separation score=-0.54),but their overlap is lower across cell lines such that separation score between A375-resveratrol and MCF7geldanamycin is -0.40 and separation score between A375-resveratrol and PC3-geldanamycin is -0.43.
Another aspect of this difference is the target selectivity of the drugs.Two examples of this phenomenon are the HDAC inhibitors and JAK inhibitors.RGFP-966 is a slow-on/slow-off, competitive tight-binding selective HDAC3 inhibitor.Tacedinaline selectively targets HDAC1.Other HDAC inhibitors, Belinostat, Entinostat, Trichostatin-a, and Vorinostat, inhibit several HDACs.The network separations of these HDAC inhibitors across all cell lines vary based on the target selectivity of these drugs.We observed both overlapping and non-overlapping networks depending on the cell types.Belinostat, Trichostatin-a, and Vorinostat networks usually have some part overlapping in all cell types, as these three drugs are in the same structural group, and it is expected for them to behave similarly.Tacedinaline treatment activates similar mechanisms on A375, MCF7, and PC3 cells as separation scores between these are negative, but the network of Tacedinaline on YAPC cells is more distant than other cell networks.The separation score of Tacedinaline networks on A375 and YAPC cells is 0.38.YAPC-Tacedinaline network is a small network.It is not signi cantly enriched for any pathways other than Osteoclast differentiation which is recently shown to be induced in pancreatic cancer-derived exosomes 62 .In contrast, the A375-Tacedinaline network is enriched for several critical signaling pathways such as MAPK, ErbB, FoxO, TGF-beta, and TNF signaling pathways (Figure 3B-C).
However, RGFP-966 treatment on A375, MCF7, and PC3 results in distant networks both from each other and from the networks of other HDAC inhibitors.All pairwise separation scores of RGFP-966 networks are positive (MCF7-PC3:0.42,A375-PC3:0.28, and A375-MCF7:0.45),and the number of shared proteins is minimal.Selectivity of RGFP-966 on HDAC3 among all HDAC proteins results in the induction of various pathways in different cell types.When RGFP-966 is compared to Belinostat, Entinostat, Trichostatin-a, and Vorinostat, the most distant networks have separation scores around 0.4, and these networks generally belong to A375 and PC3 cells.For example, the PC3-RGFP-966 network is separated from A375-Trichostatin-a with a score of 0.46.Two networks are not commonly enriched in any pathway, while PC3-RGFP-966 is only enriched in cell cycle, but A375-Trichostatin-a is enriched in several pathways such as estrogen, neurotrophin, and TNF signaling pathways (Supplementary Figure 10A-B).As Tacedinaline is the other selective HDAC inhibitor, RGFP-966 and Tacedinaline treatments cause more distant PPI networks.RGFP-966 in PC3 and Tacedinaline in A375 have the most different networks with a separation score of 0.62.They only have three proteins in common, and there are no commonly affected pathways (Supplementary Figure 10C-D).
Among three JAK inhibitors, ruxolitinib, tofacitinib, and TG-101348 (fedratinib), only TG-101348 is known for its selectivity to JAK2 protein, and the rest targets JAK1, JAK2, JAK3, and TYK2 proteins with varying a nities [63][64][65][66][67][68] .According to our results, the same drug may modulate different subnetworks in different cell lines, despite a marginal overlap.Within the same cell lines, tofacitinib and ruxolitinib networks have the most distant network pairs with separation scores of 0.68 in A375 and 0.46 in PC3.In A375, they only share the target proteins, JAK1 and JAK2.The tofacitinib network differs from ruxolitinib with the enrichment of Jak-STAT, ErbB, PI3K-Akt signaling pathways.In PC3, they share two more proteins other than JAK1/2 while the ruxolitinib network is enriched in the AMPK signaling pathway, and the tofacitinib network is only enriched in the Jak-STAT signaling pathway.A relatively low separation score (-0.147) is obtained from networks of TG-101348-and tofacitinib-treated A549 cells.Two networks share 23 proteins besides the target proteins, and commonly induced pathways include Jak-STAT and Notch signaling pathways.However, TG-101348 and tofacitinib networks have a separation score of 0.4 in A375, and the Jak-STAT signaling pathway is commonly enriched while the A375-tofacitinib network differs from TG-101348 with ErbB, mTOR, PI3K-Akt signaling pathways (Supplementary Figure 11).

Topologically separated drug networks may guide the use of drug combinations
The most effective strategy for nding drug combinations is to modulate the orthogonal sets of proteins or pathways simultaneously.The modulated pathways should be close enough to in uence the disease module commonly.That can be described as targeting a disease through multiple signaling pathways 11,69 .Therefore, networks play an essential role in identifying drug combinations.Cheng et al.
showed the e cacy of drug combinations using drug targets and disease proteins by calculating their separation mapped on the interactome 46 .Given two drugs, if each drug module overlaps with a disease module and these two drug modules are topologically separated, this is de ned as complementary exposure 46 .They demonstrated six classes of drug combination-disease interactions from FDA-approved or experimentally validated pairs and found one class correlates with therapeutic effects (complementary exposure).In our study, we found the intersection between our network models and the drug combinations provided in Cheng et al.Our reconstructed drug networks include many hidden proteins and modulated omic hits besides the drug targets.Therefore, the coverage of the networks is higher than the drug modules described in previous studies.In our analysis, we considered the disease module as the set of cancer driver genes (obtained from Cancer Genome Interpreter 70 ).We de ned a rule where if two drug networks have at least one cancer driver protein in common and each has a topologically disjoint region, then the combination of these drugs can be effective.(Figure 4A).We only have four drug pairs intersecting with Cheng et al., but when we consider them per cell type, we could analyze seven drug pairs in total.
The separation scores of seven cases vary between -0.09 and 0.675.The network pairs in this analysis share a limited number of nodes (min three, max 16 proteins).KEGG pathway enrichment of these drug networks shows that each drug modulates a different region of the interactome so that different signaling pathways are perturbed, and their combination may be effective (Supplementary Figure 12-13).
Considering these seven cases and the concept of complementary exposure as a guideline, we rated other drug pairs in our dataset with similar criteria.If two networks are separated (ss>0.0)and share less than two common enriched pathways and the networks are not in small size (number of nodes > 40), and at least one is a large network (number of nodes > 100), then these drug pairs have potential to be used in combination.Given these criteria, we found 1,441 potential drug combinations out of 6,217 possible cell line-drug network pairs.Some of these new drug pairs were previously supported in the literature for their use in combination therapy.The top-ranking drug pair was I-BET-762 and lenalidomide in the A375 cell line.Two networks are separated with a score of 0.693, and they do not share any modulated signaling pathways.Lenalidomide network is enriched in several essential pathways such as PI3K-Akt, ErbB, MAPK, Ras, Rap1, VEGF, and FoxO signaling pathways, while the I-BET-762 network is enriched in Jak-STAT signaling and transcriptional misregulation in cancer pathways (Figure 4B).Three proteins are shared between these networks.Among them, "BRAF" is a cancer driver protein.These networks also contain different disease-related proteins (Figure 4C).The combination of lenalidomide with CPI-203, a primary amide analog of (+)-JQ1 and having the same mechanism of action as I-BET-762, is shown to synergistically induce cell death in bortezomib-resistant mantle cell lymphoma 71 .Similar activities of BET inhibitors and lenalidomide are reported in different studies of multiple myeloma 72,73 .
The separation between networks of a drug across cell lines gives clues about their sensitivity levels.
Next, we associated the network models with the drug sensitivity of the cell lines.For this purpose, we collected drug sensitivity scores of cell lines deposited in the Genomics of Drug Sensitivity in Cancer (GDSC) Database 80 .The intersection of GDSC and CMap datasets results in ve drugs that at least one cell line is signi cantly resistant or sensitive to.A375 is sensitive to CHIR-99021 and PD-0325901.CHIR-99021 has reconstructed networks in three cell lines (A375, MCF7, and PC3).PD-0325901 has reconstructed networks in four cell lines (A375, A549, PC3, and YAPC).Trichostatin-a has networks in two cell lines (MCF7 and A375), and MCF7 is signi cantly sensitive to Trichostatin-a.Thus, we can compare the networks to better understand the changes in sensitivity of cell lines to these drugs.PC3 is resistant to Staurosporine and YAPC is resistant to Dinaciclib.Since there may be several processes underlying the resistance of cells and we can not directly infer these mechanisms from our reconstructed networks, we focused only on the drugs that cell lines are sensitive to (CHIR-99021, PD-0325901, and Trichostatin-a).
If a drug has dissimilar network models across different cell lines, sensitivity to that drug may also vary (Figure 5A).Since Trichostatin-a(TSA) has highly overlapping networks across three cell lines, we could not observe this pattern for sensitive MCF7 across A375 and PC3.MCF7-TSA network differs from others with active TGF-beta signaling and cell cycle pathways, while A375 is differently enriched in MAPK signaling, neurotrophin signaling, estrogen signaling, TNF signaling pathways, and PC3 differs with Jak-STAT signaling and AMPK signaling pathways.TSA targets class I and class II HDACs.Treatment with HDAC inhibitors is reported to restore TGF-beta signaling in breast cancer 81 , so it is expected to observe that the TGF-beta signaling pathway is enriched in MCF7 cells (Figure 5B).The higher difference between z-scores implies the more separated networks in CHIR-99021 and PD-0325901 modulation.Therefore, analyzing the networks of these drugs and exploring enriched pathways may give clues about the resistance to these drugs.PD-0325901, to which A375 cells are sensitive, is a selective MAP2K1 (MEK1) inhibitor directly related to cell proliferation.Given the MEK is in the middle of the RAS/RAF/MEK/ERK pathway, it is on the upstream of several cellular mechanisms for cell proliferation and cell survival.In networks of two cell lines whose z-scores are close to the resistance threshold, YAPC and PC3, we observed several modulated pathways as expected.While the neurotrophin signaling pathway is enriched in all cells, PC3 cells differ with several pathways such as cGMP-PKG signaling, sphingolipid signaling, Rap1, and Ras signaling, VEGF signaling, Oxytocin signaling, FoxO signaling, and Fc epsilon RI signaling.Sphingolipid signaling is shown to be involved in the resistance of prostate cancer cell lines to antineoplastic drug treatment (z-score of PC3 is 1.95, very close to the resistance threshold) 82 (Figure 5C).
We also analyzed all possible relationships between z-score differences and separation scores.We could nd z-scores of 30 drugs and calculated pairwise z-score differences of those with cell line-drug networks and their associated separation scores.After ltering for pairs with one negative z-score and one positive z-score, and separation scores higher than -0.45, we performed a regression analysis and observed a moderately positive correlation such that separation score increases with increasing delta z-scores (R=.29, p-value=0.011)(Figure 5D).

Discussion
Modulation of drug targets is not local when the complex interactions between molecules are considered within the cell.The impact can diffuse to distant parts or modulate several pathways simultaneously.Each drug has its speci c modulated network, which can be used to detect the similarity between drugs.
We, therefore, reconstructed the networks of cancer drugs by integrating transcriptomic, phosphoproteomic, and drug targets with protein-protein interactions using Omics Integrator.We modi ed the underlying reference interactome with the gene expression data and the predicted interactions to overcome its incompleteness and make it cell line-speci c.The resulting drug networks cover the hidden proteins besides the initial data to explore connected mechanisms.We hypothesize that the topological overlap between each drug network pair on the interactome can be a solid basis for identifying their pharmacodynamic similarity.Our topological and pathway-level analysis of 236 reconstructed networks from ve cell lines and 70 drugs demonstrated that chemically and functionally different drugs may modulate overlapping networks, which conventional comparisons cannot reveal.For example, tacedinaline and geldanamycin target different proteins, but their networks in the A375 cell line share a module consisting of common transcription factors leading to cell death and differentiation phenotypes.
Another example is the drugs targeting the same proteins and modulating separated networks i.e., sirolimus and OSI-027.Sirolimus network is enriched in Wnt, cAMP, ErbB, and FoxO signaling pathways, while OSI-027 network is differently enriched in Jak-STAT and Hippo signaling pathways in A375 cells.Besides exploring the different drugs in the same tumor type, we can also compare the network-level impact of a drug across multiple cell lines.Tacedinaline, one of the HDAC inhibitors, modulates MAPK, ErbB, FoxO, TGF-beta pathways in the A375 cell line in the reconstructed network while these pathways are absent in the network of YAPC cell line.Additionally, the target selectivity of the drugs is a strong signature of the separation of the networks.We, therefore, speci cally investigated the networks of HDAC inhibitors and JAK inhibitors.As an example, RGFP is the selective member of HDAC inhibitors.Networks of drugs, such as vorinostat, belinostat, trichostatin-a, targeting the same HDAC proteins have overlapping regions in almost all cancer cells.On the other hand, the network of RGFP-966, which targets HDAC3 selectively, is topologically distant from other HDAC inhibitor networks in different cancer types.
As evidenced by Menche et al. 51 that the overlap of the disease modules can quantify the molecular similarity of the diseases, we applied this quanti cation to interpret the drug similarities and eventually the effects of potential drug combinations.Treatment strategies combining drugs are used to decrease adverse effects and to prevent resistance to drugs [83][84][85][86] .Therefore, we compared our drug network models with experimentally validated drug combinations.Recently, the DREAM AstraZeneca-Sanger drug combination prediction challenge provided a large combinatorial cell line screening dataset.The challenge was open to methods to predict synergistic drug combinations 87 , i.e., machine learning or network-based approaches [88][89][90][91] .In this study, we found that the experimentally synergistic drug pairs modulate topologically separated networks.Using their separation scores, common pathway enrichments, and network sizes, we rated cell line-drug network pairs and found 1,441 drug pairs out of 6,217 possible pairs.These predictions can be further used to experimentally study their synergy on speci c conditions and provide guidance to nding alternatives to currently used drug pairs.In the future, we will integrate our network-level results with learning-based approaches to better understand the pharmacodynamics of drugs.
Another use of drug network models is to understand the drug resistance mechanisms.To reveal the differences between sensitive and resistant cells to a given drug, we compared the networks of which one is signi cantly sensitive, and the other is not to the given drug.We found that the networks of cell lines having different responses to the same drug have limited or no overlap.Therefore, pathway-level comparison and the topological analysis of drug networks provided the potential to guide us better understand drug response.
In summary, we leveraged networks to nd pharmacodynamic similarities of drugs within and across multiple cancer cell lines.Further, these reconstructed networks are used for understanding drug response and combinations.We believe that network-based approaches are fundamental in interpreting the drug actions in tumors and integrating multi-omic data.This study is directly devoted to this task.

Data
The 'omic' data is obtained from ConnectivityMap (CMap), which contains 1.3 M L1000 data for nine different core cell types, treated with 27,927 perturbagens.Within CMap, the Touchstone V1.1 dataset contains 8,388 perturbagens (well-annotated genetic and small-molecular perturbagens) for nine cell lines for three time points and three replicates; and P100 dataset contains proteomic data for six different core cell types, treated with 90 perturbagens 92,93 .In addition to transcriptional data (L1000), proteomic and phosphoproteomic data for a subset of these perturbagens have been recently released (P100, released on 4/13/18).P100 dataset was used in our study, which contains already ltered transcriptional and phosphoproteomic data for the same set.
The P100 dataset represents three different read-outs (phosphosignaling (P100), chromatin modi cations (GCP), and transcriptional changes(L1000)) of the same 90 small-molecule perturbations in six cell lines.The duration of treatment was 3 hours for P100, 24 hours for GCP, and 6 hours for L1000.These data are available at multiple levels of processing: level 1 is uorescence intensity (for L1000) or mass spectrometry extracted ion chromatogram traces (for P100, GCP); level 2 is gene expression or proteomic values without normalization; level 3 is normalized; and level 4 is differential (i.e., each sample is compared to all other samples on a plate).We used normalized (level3) data of L1000 and P100 for ve cancer cell lines in our studies.L1000 assay measures transcription levels of 978 genes (referred to as Landmark genes) and 80 control transcripts directly.It then infers the expression levels of the remaining 11,350 genes via Ordinary Least Square (OLS) regression.From 11,350 inferred genes, 9,196 genes are considered well inferred and called Best Inferred Genes (BING).We used landmark genes in the terminal set preparation procedure.Phosphoproteomic data generated by the P100 assay consists of 96 phosphosites and is a reduced representation of phosphoproteomics, targeting common signaling pathways.
The drug-target interactions are retrieved mainly from CLUE Drug Repurposing tool 47 , which curates information from several databases such as DrugBank & PubChem (https://clue.io/repurposing-app).
The reconstructed networks have been oriented through these targets for each drug.
We also have the edge-weighted protein-protein interaction network for the network modeling part retrieved from iRefWeb, v13.0 reference human interactome, which has 15,404 nodes (proteins) and 175,820 weighted edges (protein interactions) without self-loops.The weights of edges represent how real an interaction is based on the MIScore function.

Calculation of Tanimoto and MACCS key distance similarities
For 70 drugs listed with network reconstruction studies, SMILES signatures are collected from chemical databases such as PubChem and DrugBank.All pairwise Tanimoto and MACCS key similarities are calculated with python using the open-source RDkit 94 module.Hierarchical clustering on similarity matrices is performed with the Morpheus tool of BROAD institute with the parameters as the metric is 'euclidean distance' and linkage method is 'average' (https://software.broadinstitute.org/morpheus/).

Network Modeling Approach
For network reconstruction purposes, a solution to the prize-collecting Steiner tree problem was searched through Omics Integrator software 19 to nd an optimum tree.Nodes obtained from the experiments (terminal nodes) and nodes not detected in experiments and obtained by the algorithm (Steiner nodes) were determined by this process.Let G = (V, E, c, p) be an undirected graph, with the vertices/nodes V associated with non-negative prizes p(v), and with the edges E associated with non-negative costs c(e).The Prize-Collecting Steiner Tree problem (PCST) consists of nding a connected subgraph T = (V', E' ) of G, that minimize the weight of T, which is the sum of its edge costs plus the sum of the penalties of the vertices of G outside of the solution.
Omics Integrator is a software package that applies the prize-collecting Steiner forest (PCSF) approach to construct the most biologically relevant protein-protein interaction network.This tool is e cient in the integration of different omics data using interactome data.
There are two distinct tools within the Omics Integrator package: Garnet and Forest.To integrate different experimental results derived from mRNA, proteins, metabolites, etc., measurements, Garnet and Forest complement each other and use omics data either gathered from experiments or derived from several databases.In this study, only the Forest tool was used.
Forest tool constructs the interaction network by using user-de ned omic data that is the list of proteins/genes together with their importance(prizes), and by using interactome data together with their signi cance levels.Each protein/gene given as input to the Forest tool is de ned as 'terminal'.If necessary, Forest can add extra nodes from the interactome data called 'Steiner' nodes.When constructing the network, the algorithm optimizes the score by calculating the sum of prizes of nodes not included plus the costs of edges included.The algorithm seeks to minimize the score to nd the most optimum and biologically relevant protein-protein interaction network.There is always a possibility to include 'hub' nodes, highly-connected nodes.Forest uses a generalized prize function to decide whether these 'hub' nodes are, in fact, essential and should be included in the network or not.This generalized prize function assigns negative weights to nodes according to the number of connections they have in the interactome.The function is: where β and µ are scaling parameters and degree(v) is the number of connections of node v in the interactome.β is used to calibrate the effect of terminal nodes, and µ is used to calibrate the effect of hub nodes.µ is 0 in the default parameters where hub correction is disabled; if it is increased, the algorithm tries to exclude these hub nodes.β enables the algorithm to include terminal nodes, and increasing β facilitates more terminal nodes to be included in the nal network.
The forest tool has six PCSF parameters; however, ω, β and D are the ones that at least need to be de ned by the user.D is the depth parameter which is the maximum path length from v 0 to terminal nodes.The other three optional parameters are µ, g (reinforcement parameter, default is 1e-3), and garnetBeta (scales the Garnet output prizes relative to the provided protein prizes, default is 0.01).
Transcriptional and phosphoproteomic datasets (P100) were used to model protein-protein interaction networks.Gene expression data allows detecting transcription factors while phosphoproteomic data allows differentiating active and inactive proteins.By integrating these two data types and drug target data, multilayer networks were constructed.For parameter tuning purposes, we run the algorithm for each combination of D=10, µ=[0.00,0.005, 0.01, 0.015, 0.02, 0.025, …, 1.0], β= [2, 3, 4, 5, 10] and ω= [1, 2, 3].After collecting all the networks, we chose the ones with the highest number of terminal nodes and the minimum number of 'hub' proteins (degree>100) and then merged all of these networks to get the nal cell line-drug network.

Preparation of Terminal as an Input for Network Reconstruction Method
All analyses were performed with the python scipy 95 module.L1000 data includes uorescence values of several replicas for transcriptional measurements of both drug-treated samples and DMSO treated samples as control.These two conditions for each landmark (genes directly measured by L1000 assay) gene were compared with one way-Anova method after verifying the assumptions and p-values for each landmark gene are collected.The p-values for each gene as drug-treated condition compared against control condition is generated, genes that have lower p-value than the selected threshold were collected and labeled as 'signi cantly transcribed genes'.These 'signi cantly transcribed genes' are listed for each cell line-drug pair.Also, log2 fold changes of average transcription values of each gene compared to the control condition are calculated and stored to be later used in the prize designation of terminals.
From L1000 data, by using the 'signi cantly transcribed genes' list and transcription factor regulatory network 96 , we found out transcription factors that are regulating any of the genes in our list.If a transcription factor regulates at least 3 of the genes in the signi cant gene list, we used it as a terminal, and mean log2 fold changes of interactors of these transcription factors are used as their prizes.From P100 data, 'signi cantly phosphorylated proteins' are collected using the phosphosites passing the pvalue threshold(p<0.05).
Lastly, these two lists were joined together, prioritizing those coming from proteomic data.Also, the targets of the drug of interest were appended to the terminal set with a uniform weight inferred from the overall weight distribution of all proteins.
Application of link prediction strategy on the interactome common between both networks (A and B), the distance d AB of that protein is 0.
For our comparison purposes, this separation scoring method is used.Several separation score matrices were prepared to be able to compare drug networks on different levels.As there are different cell lines, there are matrices for each cell line that effects of drug treatments were compared within a cell type.Separation score matrices for each drug were also prepared to compare the drug's effect based on the cell type.Finally, a separation score matrix containing scores of pairwise network comparisons of all cell line-drug pair combinations was prepared to have a broader idea.All matrices are subjected to hierarchical clustering using python scipy.cluster.hierarchymodule.

Pathway-based Comparison
Reconstructed networks are subjected to functional analyses based on their KEGG pathway enrichments.
Pathway enrichments are calculated using DAVID source code 98 .All signi cantly enriched pathways (p<.05) are collected for each network and signaling pathways are ltered to be later used for comparison purposes.

Calculation of topological features of the reconstructed networks
The number of nodes, number of edges, average degree, average shortest path lengths, density, and diameter of the networks are calculated with python using Networkx 99 module.

Hypergeometric Tests
Hypergeometric tests are used to analyze the signi cance of the network overlaps.For this purpose, hypergeom method of python scipy.statsmodule is used.The parameters of this method are x, M, n, N that are de ned as the number of nodes common in both network, number of nodes in the larger network, number of nodes in the smaller network and number of total nodes found in the human interactome, respectively.

Declarations
Data Availability datasets used in this work are publicly available from the following sources: The CMap data was downloaded using accession code GSE101406.Installation details and documentation for Omics Integrator software can be found on this link: https://github.com/fraenkel-lab/OmicsIntegrator.We used the PPI network from https://github.com/fraenkel-lab/OmicsIntegrator(IRefIndex Version 13.0), and drugtarget data is curated from Cmap Drug Repurposing Tool (https://clue.io/repurposing-app ).The script for parameter tuning performed for network reconstruction can be downloaded from https://github.com/gungorbudak/forest-tuner.The separation score method is adapted from the source

Figures
Figures

Figure 1 Overview
Figure 1

Figure 2 Cell
Figure 2

Figure 3 Network
Figure 3

Figure 4 A
Figure 4