ProTargetMiner: A proteome signature library of anticancer molecules for functional discovery

We present a publicly available, expandable proteome signature library of anticancer molecules in A549 adenocarcinoma cells. Based on 287 proteomes affected by 56 drugs, the main dataset contains 7,328 proteins and 1,307,859 refined protein-drug pairs. By employing the specificity concept in partial least square modeling, deconvolution of drug targets and mechanistic proteins is achieved for most compounds, including some kinase inhibitors. We built the first protein co-regulation database that takes into account both protein expression and degradation. A surprising number of strong anti-correlations is found, underscoring the importance of protein repression in cell regulation. Our analysis uncovered a group of proteins with extremely steady expression which are likely essential for core cellular functions. These findings bring about deeper understanding of cell mechanics. Extension of the dataset to novel compounds will facilitate drug design. The introduced specificity concept and modeling scheme are beneficial in other analysis types as well. Statement of Significance ProTargetMiner is the first of its kind library of proteome responses of human cancer cells to anticancer molecules. This expandable resource facilitates the deconvolution of drug targets, action mechanisms, and cellular effects. It reveals death modalities, uncovers protein co-regulation and anti-correlation networks and defines the “untouchable” proteome essential for core cellular functionalities.


Introduction
Deciphering the targets, action mechanisms, resistance factors, and the modalities of cancer cell death for novel compounds, especially for those derived from phenotypic screenings, are all challenging tasks in drug discovery. These tasks can be addressed by connecting the affected cellular phenotypes to small molecules by socalled connectivity maps (1)(2)(3). Such an approach explores the similarities of the cell response signatures produced by the compound of interest to other responses in the database. However, majority of studies published so far are based on gene expression (mRNA) profiles. Since proteins are the targets of most drugs, proteome responses can be more specific to drug action. One recent effort has focused on building a connectivity map based on phosphoproteomic and chromatin signatures, measuring the abundances of 100 phosphopeptides and 59 histone modifications for treatments with 90 drugs (4).
There is no reason why protein abundances cannot serve a basis for connectivity maps; after all, biological systems are defined by their proteome status. Also, protein abundances are in general as much determined by expression as by degradation (5), both reflected uniquely in proteomics data. In several studies, no strong correlation between mRNA levels and protein concentrations has been found even at the steady state (6). In dynamic situations where degradation processes play an important role, such as programmed cell death (7), the relationship between the transcriptome and proteome should become even less direct.
Here we use chemical proteomics to study the relationship between the anticancer drug molecules and the dying cell phenotype induced by these molecules.
Chemical proteomics has traditionally been defined as the use of small molecules (which are considered known entities) in studying the unknown functions of proteins (8).
Recently, chemical proteomics began to designate also the opposite approach, in which proteome analysis is applied to studying functions of small molecules (9).
We have previously shown that, when sensitive cell lines are treated with the compound of interest, drug targets are consistently found among the most regulated proteins and mapping these proteins on protein networks can reveal the drug action mechanism. This observation served as a basis for a new chemical proteomics method called Functional Identification of Target by Expression Proteomics (FITExP) (10). In many cases, the affected target and other mechanism-related proteins are found upregulated. This can be explained by a feedback effect when inhibition of certain proteins activates their (over)production. The alternative effect, involving target protein down-regulation, can be caused by protein translocation followed by proteolysis (11).
To increase the specificity of analysis for a given compound, a panel of known drugs is added to the experiment. The specificity parameter reflects the protein regulation in the treatment by a given compound versus all other compounds. FITExP could successfully identify the targets of several chemotherapeutics (10), and probe the targets and mechanisms of metallodrugs (12) and even toxic nanoparticles (13). We have also shown that combining the proteomic data from treated matrix-attached cells with matrix-detached cells can improve the deconvolution of drug targets and action mechanisms, and identify proteins involved in cell life and death decisions (14).
Achieving a high level of specificity in analysis usually requires the use of several compounds and cell lines. We hypothesized that the equivalent increase in specificity can be obtained with a single cell line when a multitude of "contrasting" drugs is used.
While specifically regulated proteins help to identify the drug targets and the action mechanisms, mapping the whole proteome, and even the set of ≥ 1000 most abundant proteins, may help to determine the cell death modality (15). Assuming that the hypothesis on the molecule-specific nature of the obtained dataset holds, interrogating a cell line with an extensive (>50 molecules) drug panel was expected to probe most cell death pathways. Each pathway can be represented by a cell death trajectory in a space encompassing all possible cell states. Therefore, we could hope to determine the number of orthogonal death modalities -the subject that has generated a significant debate and even controversy in the cell death community (16).
We define a death trajectory caused by a toxic agent, to be a track in the proteome space, passing from a normal living state that the untreated cells occupy and explore during cell division, circadian and metabolic cycles, to a particular death state.
The proteome space and death trajectories are schematically shown in Fig. 1. The first crossing beyond the normal state defines the molecular action mechanism of the toxic agent.
We selected A549 human adenocarcinoma cells originating from lung cancer as a model system, because it is well covered in the literature, and it showed the highest sensitivity to the compounds among other tested cell lines (MCF-7 and RKO) in a pilot experiment. Viability measurements were performed for 118 clinical molecules cherrypicked for cancer from Selleckchem FDA-approved drug library and several experimental compounds with unknown targets. A collection of 56 compounds were chosen to treat the cells at LC50 concentrations for 48 h (selection criterion was the induction of 50% cytotoxicity in 48h at concentrations below 50 µM). With the biological effect (cell death) being of the same magnitude, the differences in the proteome states could be attributable to the differences in targets, action mechanisms as well as modalities of cell death and survival. The selected compounds belong, according to available information, to 18 different classes with versatile targets and mechanisms, spanning 112 known drug targets curated from DrugBank (https://www.drugbank.ca/) in September 2018. These drugs and their known targets are listed in Supplementary   Table S1. As standard drugs used for quality control, methotrexate, paclitaxel, and camptothecin were chosen and included in each TMT10 multiplexed proteomics experiment (labeling information for each experiment is given in Supplementary Table   S2).
The overview of the project's objectives and the workflow is given in Fig. 1.
Besides mapping the orthogonal death pathways, the obtained dataset allowed us to deconvolve compound targets and action mechanisms. We could also probe protein coregulation and anti-correlation networks, the untouchable proteome that is (almost) unchanged in all perturbations, as well as the variable proteome that exhibits large drug-to-drug variations. Here we present this dataset, called ProTargetMiner, as well as our most significant findings so far from its interrogation.  Interrogating a cell line with an   extensive drug panel will probe most cell death pathways, each represented by a different cell death trajectory, which will allow us to build a global cell death map (multidimensional "Death globe"), for clustering of compounds. The extensive proteome dataset will also provide specific information on drug targets, mechanistic proteins, and other specifically regulated proteins. The proteome perturbation data will provide novel protein co-regulation and anti-correlation links, as well as define untouchable and variable proteomes. B, Workflow: determination of LC50 values for the compound library; cell treatment with 56 compounds and sample preparation for shotgun proteomics; sample multiplexing including control and standard treatments (methotrexate, paclitaxel and camptothecin); lysis, digestion and labeling with TMT-10plex; combining the 10-plexed samples and fractionation of the pooled sample to increase the proteome coverage; analysis of individual fractions by LC-MS/MS; protein identification and quantification; data post-processing.

Overview of the Proteomics data
Overall, for the original dataset, 287 cell lysates were prepared and 229 LC-MS/MS analyses were performed. In total, 144,075 peptides attributed to 7,328 proteins were quantified in all experiments, with at least 2 unique peptides per protein. After selecting only proteins quantified in at least one replicate in each treatment, the list was reduced to 4,557 proteins (Supplementary Table S3) that were used in all subsequent analyses. The final resource therefore, contains 1,307,859 (287x4557) filtered high quality drug-protein data points.

Number of independent dimensions
Using factor analysis of the whole dataset, we identified at least 11 independent dimensions. Supplementary Table S4 lists the contribution of all proteins to each dimension. The most contributing proteins to the first dimension were cyclin-dependent kinase inhibitor 1 (CDKN1A) and PCNA-associated factor (PCLAF), with opposite signs.
While CDKN1A is involved in p53 mediated inhibition of cellular proliferation by blocking cell cycle progression, PCLAF is a cell cycle-regulated substrate that acts as a regulator of DNA repair during replication. The top 30 proteins of the first three dimensions map best to "p53 signaling pathway and cell cycle", "focal adhesion and angiogenesis" and "chromatin assembly and fatty acid metabolism", respectively. Some of the dimensions clearly correspond to classical cell death modalities, such as p53-dependent apoptosis (1 st dimension), autophagy (4 th dimension), and macromitophagy (6 th dimension), while other dimensions were harder to ascribe to known modalities. Associating top proteins defining these dimensions with cell death modalities is a subject of future research.

Drugs with similar mechanisms induce similar proteome changes
We employed a nonlinear dimension reduction method t-SNE that is widely used for projection of molecular signatures in transcriptomics, to reduce the proteomic space to three dimensions (17). As a result, we obtained the "Death globe", on which all druginduced proteome signatures are mapped as points. We used the proximity of these points in the 3D t-SNE projection to evaluate the similarity of the drug-induced signatures. As expected, drugs with similar mechanisms (e.g., tubulin depolymerization inhibitors paclitaxel and docetaxel, pyrimidine analogues 5-fluorouracil, floxuridine and carmofur, thioredoxin reductase inhibitors auranofin, TRi-1 and TRi-2 (18) and topoisomerase inhibitors camptothecin, topotecan and irinotecan) were proximate on the t-SNE plot, confirming that the Death globe approach can be used for evaluating the similarities between drug action mechanisms. Similar results were obtained with a 2D t-SNE plot ("Death map", Fig. 2A). We also employed a conventional correlation-based hierarchical clustering analysis to confirm the t-SNE findings (Fig. 2B).
We found tomatine to be a gross outlier in the t-SNE and PCA plots and thus excluded it from the subsequent analyses. Tomatine is likely to act via proteasome inhibition (19), along with unspecific membrane damage (20). These mechanisms may explain the extraordinary changes induced by tomatine in the cell proteome.

ProTargetMiner enables functional discovery
Protein regulation is usually defined as a ratio of the protein abundances in the cells incubated with a drug and vehicle control. However, regulation is not a parameter specific enough in drug target deconvolution because of the proteins involved in generic cell responses to toxic agents (e.g., death or survival pathways). Instead, we introduced "specific regulation", which was defined as the ratio of the regulation to a particular drug to the median regulation in all other drugs of the panel (10). Here, we also employ a sophisticated supervised classification method called orthogonal partial least square discriminant analysis (OPLS-DA), for the discovery of specifically regulated proteins (21). With this approach, different models can be easily built, contrasting a given compound, or a group of compounds, against either all other molecules or a selected subset. In these models, the proteins specifically up-or down-regulated in response to a given treatment are found on the opposite sides of the loading plot, with each protein represented by a dot (Fig. 3A). The position on the y-axis reflects the strength of the orthogonal components, and best target candidates are thus located near the x-axis extremities.
As representative examples of drug target deconvolution, OPLS models for methotrexate, paclitaxel, and vincristine are shown in Fig. 3B-D, respectively. The methotrexate target DHFR is convincingly identified. Tubulins are found to be most specifically up-regulated proteins for paclitaxel and down-regulated for vincristine.
These two drugs affect tubulin depolymerization and polymerization, respectively.
Network analysis of the specifically regulated proteins on either side of the model highlights the compounds' mode of action (right panel in Fig. 3).
ProTargetMiner can also reveal a compound effect on protein complexes and organelles. The proteasome inhibitor Bortezomib demonstrates enrichment of the proteasome subunits among specifically up-regulated proteins (Fig. 3E). The sorafenib model shows the specific down-regulation of NADH dehydrogenases and mitochondrial ribosomal proteins (Fig. 3F). This latter finding is in line with the earlier report for human neuroblastoma cells (22); thus the ProTargetMiner results can be cautiously generalized to other cells. 1 As another example, pyrimidine analogues floxuridine and carmofur downregulated ribosomal proteins similar to 5-fluorouracil (23) (Supplementary Fig. S1A). A similar effect on ribosome was also observed for the alkylating agent oxaliplatin ( Supplementary Fig. S1A). A very recent finding shows that oxaliplatin kills cells by inducing ribosome biogenesis stress, unlike its platinum analogues (24).

ProTargetMiner analysis of kinase inhibitors
One could suspect that ProTargetMiner would be less sensitive to kinase inhibitors because this analysis doesn't include assessment of the phosphorylation sites, while there are fewer reasons to expect for the kinase abundance to be affected by its inhibition than for metabolite-processing proteins or generally other targets. To Biochemical pathways affected by compounds are related not only to death pathways but also to cell survival (14). Therefore, the specifically regulated proteins could be potentially linked to drug resistance. For example, EGFR was specifically upregulated in the sorafenib model ( Supplementary Fig. S2C) and is known to be involved in resistance to this drug (26). Another kinase upregulated in response to sorafenib (and regorafenib) was AXL ( Supplementary Fig. S2C-D). AXL is a receptor tyrosine kinase regulating many aspects of cell proliferation and survival, and its overexpression induces resistance to EGFR targeted therapies (27). To investigate if AXL inhibition can rescue A549 cells from sorafenib and regorafenib, we combined these molecules with TP0903, a specific AXL inhibitor in non-cytotoxic concentrations (<100nM). The combination treatment significantly sensitized the cells to sorafenib and regorafenib in 24 h and 48 h ( Supplementary Fig. S2D).

The degree of drug-induced proteome changes
The degree of proteome perturbation can vary likely due to a difference in the number of affected targets and pathways. To test this hypothesis, the drugs were ranked by the overall deviation of their molecular signatures compared to the untreated state ( Fig. 4). Bortezomib induced the highest proteome variation, while tubulin inhibitors gave the least proteome perturbation potentially indicating the lack of offtarget effects. As an alternative deviation assessment, the number of up-or downregulated proteins was calculated for each compound at a 1.5 fold cutoff ( Supplementary Fig. S3A). For proteasome inhibitors b-AP15 and especially for bortezomib, the number of significantly up-regulated proteins was much higher than down-regulated proteins (up/down ratio of 17.8 for bortezomib compared to the average of 2.9 for all other drugs). The variation induced by bortezomib was much larger than by b-AP15, likely indicating that bortezomib is affecting more targets/pathways. As another example, the proteome variation was larger for auranofin compared to other thioredoxin reductase inhibitors TRi-1 and TRi-2, confirming the recent finding that auranofin has off-targets (18).

Complex-specific effects of compounds
To investigate the drug effects on selected protein complexes, we plotted the

A protein correlation database
As protein synthesis is a resource demanding cell function, protein expression is spatiotemporally controlled. Coordinated expression seems optimal for a set of genes involved in the same protein complex or biological pathway. Multiple studies have shown that co-expression can indicate functional relationships. Therefore, coexpression has been extensively exploited for deduction of gene function (1) through association analysis (30). However, co-regulation databases are mainly available based on transcripts expression (30,31), while proteomics can capture co-regulation more accurately (32). Yet to the best of our knowledge, no large-scale proteome co-regulation database has been made public so far.
ProTargetMiner provides an opportunity to build such a database of protein coregulation. We analyzed the pairwise correlations of proteins in a 4212 x 4212 matrix  Table   S8) mapped to "RNA binding" (24 proteins, p<0.017) and "catalytic activity" (52 proteins, p<0.017).
We also calculated the number of total co-regulating (and anti-correlating) partners for each protein, to reveal hub proteins (Supplementary Fig. S4B-C).
Mitochondrial trifunctional enzyme subunit beta (HADHB) and alpha (HADHA) with 276 and 271 co-regulation partners were on top.
For network visualization and pathway analysis, we mapped the proteins to StringDB entities and generated a set of external payload data, which can be uploaded to the StringDB or its plugin in Cytoscape using personal configurations (nodes as well as positive and negative edges are available in Supplementary Table S9). This way, known interactions and ProTargetMiner co-regulations can be visualized on the same plot.
The anti-correlation of protein abundances is often ignored, even though negative correlations are less likely than positive correlations to arise from technically induced artifacts (33), and can reveal opposing biological processes. For example, anti-correlations can reflect active regulatory transcriptional repression, activation or even canceling of such events (34). Therefore we investigated this aspect in more detail.
Since TP53-inducible glycolysis and apoptosis regulator (TIGAR) was among the top anti-correlating proteins with many anti-correlating protein pairs, we chose to compare its expression pattern with its most anti-correlating pair NCAPG (r=-0.84) (Fig.   5B). TIGAR and NCAPG, each one together with 5 most correlating proteins, were projected to protein networks (Fig. 5C). The TIGAR group mapped to p53 signaling pathway (3 proteins, p<0.002), while the NCAPG group mapped to condensin complex (5 proteins, p< 2.6E-13) and cell division (6 proteins, p<2.98E-07). This finding is in line with p53 mediated inhibition of entry into mitosis when DNA synthesis is blocked, and with TIGAR's role in p53-mediated protection from the accumulation of genomic damage (35). CMBL strongly co-regulated with TIGAR (r=0.89), but was not part of the p53 signaling pathway in String. However, a microarray screen found CMBL to be a p53-inducible protein (36), confirming the predictive power of protein co-regulations.
Analyzing in detail the TIGAR vs NCAPG dependence (Fig. 5D), we noticed that while the overall correlation coefficient was significant (r=-0.84), the slope -0.75 was far from -1, indicating an underlying complexity. A closer inspection revealed that, when TIGAR had an above-average regulation, its anti-correlation with NCAPG (r=-0.75, plot 1 on Fig. 5D) was stronger than when it was relatively down-regulated (r=-0.52, plot 2). Also, with TIGAR up-regulated, its slope with NCAPG was close to -1 (-0.97), as in perfect association. However, when TIGAR was down-regulated, the slope declined as well (-0.56). In contrast, when NCAPG was up-regulated, its anti-correlation with TIGAR was weak (r=-0.23, plot 3), but when NCAPG was down-regulated, it anti-correlated with TIGAR very strongly (r=-0.80, slope -1.01, plot 4). These findings showed that the anticorrelation can be bimodal, which led us to consider two pairs of correlation coefficients for every protein pair (Fig. 5E), with correlations calculated separately for above and below the median abundance of protein A. When these correlations were heat-mapped on a 2D plot with a cutoff of |r|>0.54 (Fig. 5F), five dense areas unexpectedly emerged (circles 1-5 in Fig. 5G) (the correlation data can be found in Supplementary Table S10). At first glance, the spots 2-4 look too symmetric in respect to the main diagonal, which raises a suspicion of an artifact. Indeed, if the spot produced by a single group of proteins had an elliptic rather than circular symmetry, then two symmetric Gaussians would fit to it instead of one. However, scrambled protein abundance data gave a single central spot with a circular rather than elliptic symmetry (Fig. 5H). Moreover, the spot 1 doesn't have a symmetric counterpart. Therefore, entities 2-4 appear to be real. Entity 1 ("highly co-regulated") composes 12% of the total number of (anti-) correlating protein pairs, encompassing pairs with a high co-regulation (average r=0.52).
The top 300 protein pairs (r≥0.90) mapped to ribosome, MCM and condensin complexes, chaperonin-containing T-complex, NADH dehydrogenases, pyruvate dehydrogenase, tubulin superfamily as well as some integrins and spectrins. This component is centered above the diagonal, meaning that co-regulation is higher when the proteins are down-regulated than when they are upregulated. One explanation for the asymmetry is that down-regulated abundances can reach zero, whereas upregulations cannot reach infinity. However, the asymmetry is so pronounced that a biological reason is a strong possibility. Indeed, upregulation of proteins, being caused by protein overexpression, has no natural stopper, while down-regulation is to a large extent driven by protein degradation, which stops or slows down when only strongly bound stoichiometric complexes are left in the system. Thus degradation can provide better correlation between the abundances of the constituent proteins in these complexes.
Entities 2-5 are all represented by Gaussians with very similar widths. Positive and negative components comprise 55% and 33% of total protein pairs, respectively. Entity 2 ("co-down-regulated") corresponds to protein pairs which have better coregulation when down-regulated (average r=0.5) than when upregulated (average r=0.3).
Entity 3 ("co-upregulated" proteins) is centered below the diagonal and encompasses proteins correlating better when upregulated (average r=0.5) than when 2 0 down-regulated (average r=0.28). Entities 2 and 3 account for 23% and 32% of correlating pairs, respectively. Entity 4 ("anti-correlating") has a higher anti-correlation when protein A is downregulated (average r=-0.29 vs r=-0.46), while the above-diagonal entity 5 ("negatively regulated") shows higher anti-correlation when protein A is upregulated (average r=-0.45 vs r=-0.29). The frequency of protein pairs in these two entities are also almost equal (16% and 17% for entity 4 and 5, respectively).
Highly anti-correlating pairs of a protein can provide useful information. For example, the upregulated TRIM28 strongly (r<-0.61) anti-correlates with 34 proteins.
These proteins map to focal adhesion (7 proteins, p<8.1E-06), and a recent study has indeed shown the direct involvement of TRIM28 in cell adhesion (38). Furthermore, the 34 anti-correlating proteins map on ErbB signaling pathway (4 proteins, p<0.001), in line with TRIM28 directly interacting with ErbB4 and suppressing its transcriptional activity (39). Thus, the anti-correlation data can be used in parallel with co-regulation data for deciphering protein function in a given context.

"Untouchable" proteome reflects essential cell functions
There has been an extensive debate on which house-keeping proteins (HKPs) can be used as steady-level controls in molecular biology experiments (40). Recent research has shown that classic HKPs may actually change their expression under specific conditions (41). For example in our dataset, GAPDH, a popular HKP, exhibits stable levels for most treatments, but could not be a good control for studies with bortezomib and few other compounds (Fig. 5J). To identify the best steady-level controls, we investigated the protein expression variations by calculating standard deviations across all perturbations ( Fig. 5K and Supplementary Table S11). The typical expression profile of the most steadily expressed and most variable proteins are depicted in Fig. 5L. The 100 most stable proteins belonged to proteasome (9 proteins, p<2E-10), spliceosome (8 proteins, p<3E-05), and mRNA surveillance pathway (6 proteins, p<0.001). On the other hand, the 30 most variable proteins mapped to cell cycle (5 proteins, p<0.001), FoxO signaling pathway (4 proteins, p<0.004) and p53 signaling pathway (3 proteins, p<0.012).

Increasing the analysis sensitivity by ignoring proximate compounds
Molecules proximate to a compound of interest on the 2/3D maps and/or in hierarchical clusters represent a potential problem for OPLS-DA analysis, as they may share common targets or action mechanism. Contrasting these proximate compounds along with all other molecules against the compound of interest would downplay the common mechanism-related proteins. Therefore, removing proximate compounds from OPLS-DA model would increase the analysis specificity. Indeed, in the model built for

Miniaturizing the ProTargetMiner methodology
In drug development, detailed characterization of compound-induced effects in the most relevant biological setting and cell type is desirable. Thus it would be advantageous to build a dataset similar to ProTargetMiner but with a customized drug panel and cell type. This however, will be both time-consuming and expensive.
Miniaturization of the experiment requires determination of the minimal drug panel size for deducing the target and action mechanism. To address this issue, PLS-DA models were built in R by including different numbers of contrasting compounds in the model (n = 1-54, 50 molecule combinations randomized for each n). The mean rankings of the known drug targets for representative compounds camptothecin, methotrexate, OSW-1 and paclitaxel were determined for each number of contrasting drugs n. As expected, with higher n the deconvolution process was more successful for drug targets but not for random proteins (Fig. 6). Encouragingly, already 8-10 molecules in the drug panel were in most cases enough for target rankings below 10. NDUFV2 and CARS2 were randomly chosen.

Deep proteome validation analysis
A repeated, deep proteome analysis was performed to confirm the consistency of the deduced drug targets and action mechanisms. To provide maximum diversity of the proteome responses, each of the 9 drugs used in this experiment was chosen from a separate cluster in Fig. 2. In total, 94,061 peptides were quantified belonging to 8,898 proteins, of which 8,308 proteins were defined by at least two peptides. After removing the missing values, 7,398 proteins were used for analysis (Supplementary Table S12).
Interestingly, PCA showed 10 principal components, explaining from 28% of the data for the 1 st component to 2.7% for the last component (Fig. 7A-B). This result confirmed the orthogonality of the death trajectories induced by the chosen drugs.
The OPLS-DA models built for bortezomib and vincristine (Fig. 7C-D) yielded nearly identical target/mechanistic proteins as the models built using the main dataset ( Fig. 3). In an OPLS model for dasatinib, 10 known targets (among 23 known targets in DrugBank derived from different studies) were confirmed (Fig. 7E). Importantly, at least five kinases were among the most specifically changed proteins, with MAPK14 being the most specifically down-regulated protein. These findings show that ProTargetMiner methodology can be successfully miniaturized and that the scheme can be applicable to kinase inhibitors in the state-of-the-art proteomics depth. showed that these proteins are most specific to nutlin. Data are represented as mean±SD.

Specificity helps to identify subtle but biologically significant changes
The case of nutlin illustrates the importance of using the specificity parameter as opposed to conventional differential regulation in determining drug mechanism. Nutlin has been found to slow down the DNA repair, probably through MDM2 mediated stalling of double strand break repair (42). While the 30 most down-regulated proteins by nutlin mapped to "cell cycle" (20 proteins, p<2E-15), the 30 most specifically down-regulated proteins extracted from the OPLS-DA model (Fig. 7F) gave "DNA repair" (9 proteins, p<2E-5). The regulation of these proteins across the drug panel shows a dip for nultin (Fig. 7G). The top DNA repair related protein APEX1 ranked 10 th among the most specifically down-regulated proteins out of the total 7398 proteins in the OPLS model, while it was the 520 th most down-regulated protein, exhibiting only a -41% abundance change. As the majority of the DNA repair proteins were only marginally regulated by nutlin, these proteins would be difficult if not impossible to attribute to the action mechanism without contrasting nutlin against other compounds. Also, both p53 and its negative regulator MDM2, the known nutlin target (43), exhibited higher specific response to nultin than to other compounds (Fig. 7H). These results unravel a potential approach to study subtle but biologically meaningful changes.

Discussion
We generated the first installment of a proteomics signature library for anticancer molecules, and proposed a modeling scheme that could be employed for deconvolution of drug targets, action mechanism, resistance factors, overall cellular effects and more.
We consider ProTargetMiner as a complement for preceding drug deconvolution databases, such as CMap (2,3). The OPLS-DA modeling used in ProTargetMiner can be hypothetically applied to transcriptomics data or to non-anti-cancer treatments, as long as the biological endpoints for all used compounds are similar, as in ProTargetMiner. We limited our discussion to cases in which a single drug is contrasted against many others, but the same approach can also be applied to characterize features shared among a selected class of compounds, or between any combinations of drugs. Such a methodology can also be applied to a panel of cell lines, e.g., inquiring which proteins specifically respond to a compound in a given cell line but not in other cell types.
The 11 dimensions discovered in the ProTargetMiner dataset by factor analysis represent orthogonal pathways, theoretical constructs that do not necessarily resemble the classical pathways known to biologists from textbooks (44). The 2012 report of the Cell Death Nomenclature Committee recognized 13 distinct death modalities, while the 2018 updated report recognized 14 death modalities. These numbers are not far from the dimensionality we discovered. The orthogonal pathways uncovered here deserve a detailed bioinformatics analysis, as they may hide novel death modalities.
Our analysis showed clearly that performing simple differential expression and focusing on the top hits obscures subtle but specific compound-induced regulations.
These levels of regulation are only accessible to high precision technologies. In proteomics, precision comes not only from the instrumental parameters, but also from post-processing such as factor analysis (45). We showed that specific regulations as small as 15% can be reliably assessed as the most characteristic changes for a given compound. Such level of precision would have been impossible several years ago, when the standard practice in proteomics recommended disregarding the regulations less than a factor of 1.5 or even 2. Such subtle but specific regulations could be approached here using the specificity concept through building OPLS-DA models that automatically take into account reproducibility, missing values and the change magnitude.
ProTargetMiner can be used for a variety of exploratory studies, only a few of which we have so far performed. Among yet to be explored opportunities is the discovery of novel biological pathways or gene function elucidation at the protein level.
These studies can be done based on the proteomics co-regulation database. Our analysis also revealed a host of anti-correlating protein pairs, which can be biologically meaningful. An unexpected discovery was the number of anti-correlations (spots 4 & 5 on Fig. 5F), reaching 3/5 of that of positive correlations (spots 2&3). Being largely ignored currently in bioinformatics analysis, these anti-correlations represent an untapped wealth of important biological information. Furthermore, the dataset allows one to identify proteins that can serve as HKPs in molecular biology experiments.
This resource is expandable in various dimensions, e.g., by incorporating more perturbations, time points, and profiling more cell lines. Although expansion of the compound library seems desirable, one must consider that for a comprehensive database, enough perturbations must be done to saturate all possible cellular states. Summarizing, ProTargetMiner is a novel tool with significant potential in drug discovery, which can be useful to broad community of cancer researchers.

Acknowledgements:
We would like to acknowledge Marie Ståhlberg and Carina Palmberg for their assistance in mass spectrometry analyses. We are also grateful to Elias S.J. Arnér for providing TRi-1 and TRi-2 compounds, Stig Linder for b-AP15 and Kaori Sakurai for OSW-1 and tomatine.

Compounds
The initial drug library consisted of 118 molecules cherry-picked for cancer from Selleckchem FDA-approved drug library. AXL inhibitor TP-0903 was bought from Selleckchem (Cat#S7846) and auranofin from Sigma (Cat#A6733).

LC-MS analysis
Samples were loaded with buffer A (0.1% FA in water) onto a 50 cm EASY-Spray column (75 µm internal diameter, packed with PepMap C18, 2 µm beads, 100 Å pore size; Cat#ES803) connected to the EASY-nLC 1000 (Thermo; Cat#LC120) and eluted with a buffer B (98% ACN, 0.1% FA, 2% H 2 O) gradient from 2% to 35% of at a flow rate of 250 nL/min. Mass spectra were acquired with an Orbitrap Q Exactive Plus mass spectrometer (Thermo; Cat# IQLAAEGAAPFALGMBDK) in the data-dependent mode at a nominal resolution of 30,000, in the m/z range from 375 to 1400. Peptide fragmentation was performed via higher-energy collision dissociation (HCD) with energy set at 35 NCE.
For deep proteomics validation set, samples were loaded with buffer A (0.1% FA in water) onto a 50 cm EASY-Spray column (75 µm internal diameter, packed with PepMap C18, 2 µm beads, 100 Å pore size) connected to a nanoflow Dionex UltiMate 3000 UPLC system (Thermo) and eluted in an increasing organic solvent gradient from 2% to 26% (B: 98% ACN, 0.1% FA, 2% H 2 O) at a flow rate of 300 nL/min. Mass spectra were acquired with a Q Exactive HF mass spectrometer (Thermo; Cat#IQLAAEGAAPFALGMBFZ) in data-dependent mode at a nominal resolution of 60,000 (@200 m/z), in the mass range from 350 to 1500 m/z. Peptide fragmentation was performed via higher-energy collision dissociation (HCD) with energy set at 33 NCE.

Protein identification and quantification
The raw data from LC-MS were analyzed by MaxQuant, version 1.5.6.5 (RRID:SCR_014485) (47). The Andromeda engine searched MS/MS data against Uniprot complete proteome database (human, version UP000005640_9606, 92957 entries). Cysteine carbamidomethylation was used as a fixed modification, while methionine oxidation was selected as a variable modification. Trypsin/P was selected as enzyme specificity. No more than two missed cleavages were allowed. A 1% false discovery rate was used as a filter at both protein and peptide levels. For all other parameters, the default settings were used.

Statistical analysis
After removing all the contaminants, only proteins with at least two peptides were included in the final dataset. Protein abundances were normalized by the total protein abundance in each sample. Data were processed by Excel, R, Python, and SIMCA (Version 15, UMetrics, Sweden; RRID:SCR_014688).

Protein correlation analysis
Relative protein abundances in log2-scale were reported by Diffacto (45) and subject to batch effect removal using the Limma package (48). We excluded 345 proteins that were quantified in less than 30 individual LC-MS/MS runs, and averaged the protein abundances from replicate experiments for each individual drug treatment. Afterwards, a 4212-by-4212 correlation matrix was calculated. Two-tail p-values associated with the Pearson's correlation coefficients were calculated based on t-distribution, and were subjected to the Benjamin-Hochberg procedure for controlling false discovery rate (FDR). A cutoff of FDR-adjusted p-value < 0.001 was applied to select strongly correlating or anti-correlation protein pairs. Considering the correction of in total 17,740,944 (4212x4212) p-values, the FDR cutoff is rather stringent Network mapping STRING version 10.5 (http://string-db.org) tool was employed for protein network analysis (49). Medium confidence threshold (0.4) was used to define protein-protein interactions. Enrichment analysis with the whole genome as a background dataset was used to identify the enriched gene ontology terms and pathways.