Targets of drugs are generally, and targets of drugs having side effects are specifically good spreaders of human interactome perturbations

Network-based methods are playing an increasingly important role in drug design. Our main question in this paper was whether the efficiency of drug target proteins to spread perturbations in the human interactome is larger if the binding drugs have side effects, as compared to those which have no reported side effects. Our results showed that in general, drug targets were better spreaders of perturbations than non-target proteins, and in particular, targets of drugs with side effects were also better spreaders of perturbations than targets of drugs having no reported side effects in human protein-protein interaction networks. Colorectal cancer-related proteins were good spreaders and had a high centrality, while type 2 diabetes-related proteins showed an average spreading efficiency and had an average centrality in the human interactome. Moreover, the interactome-distance between drug targets and disease-related proteins was higher in diabetes than in colorectal cancer. Our results may help a better understanding of the network position and dynamics of drug targets and disease-related proteins, and may contribute to develop additional, network-based tests to increase the potential safety of drug candidates.


Results
Targets of drugs with side effects spread perturbations better in the human interactome than targets of drugs without side effects. The initial working hypothesis of our research was that drugs having protein targets that better propagate changes in the human interactome may have a higher probability of causing side effects. This hypothesis is in agreement with earlier findings showing that the interactome neighbourhood contributed to drug side-effect similarity 20 . In order to test our hypothesis, we compared the propagation of perturbations started from drug targets with and without known side effect, as well as that of non-target proteins in the human protein-protein interaction network using the Turbine network dynamics software package developed earlier in our group 35 .
To compare the spreading efficiency of drug target proteins with and without side effects we ran a series of perturbation simulations on the human interactome using the Turbine programme 35 . We assembled a human interactome containing 12,439 proteins and 174,666 edges using the STRING database 46 , out of which 1,726 were target proteins of 3,626 human drugs obtained from the DrugBank database 47 and a total of 99,423 drug-side effect pairs from the SIDER database 2 were analysed as described in Methods in detail. Simulations were based on the communicating vessels network dynamics model tested earlier 35 , where changes from one protein to its neighbours 'flow' in proportion with the energy differences between the 'source' and the 'target' proteins. We examined a total of 495 target proteins of 597 drugs (Suppl. Table 1), which were reported to have side effects according to the SIDER database 2 . As control groups, we have also examined the 1,231 target proteins of the remaining 3,029 drugs having no reported side effects in the SIDER database 2 , as well as the remaining 10,713 proteins in our human interactome, which were not listed as drug targets in DrugBank 47 . For each selected protein target we calculated the silencing time, which is the number of time steps in the simulation needed for the initial perturbation to disappear completely due to dissipation. Small silencing time values were shown to be an efficient measure of large spreading efficiency of network nodes earlier 35 , since in this case the initial perturbation efficiently spreads in the network and it becomes dissipated fast. Figure. 1 shows the cumulative distribution of the normalized number of proteins having an increasing silencing time (thus decreasing perturbation efficiency). Targets of drugs with side effects had a significantly larger proportion of small silencing times (i.e. large spreading efficiency) than targets of drugs having no side effects (Mann-Whitney-Wilcoxon test, p = 1.677e-5). Similarly, the proportion of targets of drugs without side effects having a small silencing time (i.e. large spreading efficiency) was significantly larger than that of human interactome proteins, which have not been reported as drug targets in DrugBank 47 (Mann-Whitney-Wilcoxon test, p = 2.2e-16). Thus targets of drugs with side effects were found to be better spreaders of perturbations than targets of drugs having no reported side effects. Importantly, drug targets were also better spreaders of perturbations than non-target proteins.
Simulations shown on Fig. 1 were run with a starting energy of 1,000 units and a dissipation value of 5 units. Being curious whether our result is robust for the variations of simulation parameters, we repeated these simulations using a starting energy of 10,000 and a dissipation of 1 or 5 units. Under these conditions we obtained very similar results (Suppl. Figs. 1 and 2) to those shown on Fig. 1. When we split the starting energy of 1,000 units equally among targets of multi-target drugs instead of examining each target protein alone as the source of perturbations, we were able to reproduce the same pattern (Suppl. Fig. 3) as that of Fig. 1. Furthermore, to test the robustness of the results against the choice of protein-protein interaction network, we randomly deleted 50% of the 12,439 proteins in our human interactome. Examining the spreading efficiency in the giant component of this truncated interactome we obtained very similar results (Suppl. Fig. 4) to those shown in Fig. 1 Next we were curious whether the larger spreading efficiency of drug targets with side effects, as compared to drug targets without side effects or proteins having no reported drugs bound to them, is also shown by examining perturbation reach values. Perturbation reach values show the number of proteins, which received the perturbation from the initial perturbation source protein until the perturbation was dissipated from the system. Small perturbation reach values were shown to characterize small spreading efficiency in earlier studies 35 , since in this case the original perturbation reached only a small number of proteins before it became dissipated. Targets of drugs with side effects had a significantly smaller proportion of small perturbation reach values (i.e. small spreading efficiency) than that of targets of drugs having no side effects (Mann-Whitney-Wilcoxon test, p = 1.663e-5; Suppl. Fig. 5). Similarly, the proportion of targets of drugs without side effects having a small perturbation reach value (i.e. small spreading efficiency) was significantly smaller than that of human interactome proteins, which have not been reported as drug targets in DrugBank 47 (Mann-Whitney-Wilcoxon test, p = 2.2e-16; Suppl. Fig. 5). Using a starting energy of 10,000 but a dissipation of 1 instead of 5 units, or splitting this starting energy equally among targets of multi-target drugs, we obtained very similar results (Suppl. Figs. 6 and 7). These studies confirmed that drug targets are better spreaders of perturbations than non-target proteins, and also that targets of drugs with side effects are better spreaders of perturbations than targets of drugs having no reported side effects.
A qualitatively similar picture emerged, when we examined the spreading efficiency of target proteins of drugs against two diseases, colorectal cancer and type 2 diabetes (Suppl. Tables 2-6). We chose these two diseases, because they represent very well the target groups of different drug design strategies 4 , and they had been the subjects of several former network-related studies [37][38][39][40][41][42][43][44][45] . Drug targets of both diseases were found to be better spreaders of perturbations than non-target proteins (Suppl. Fig. 8; p = 3.367e-5 and p = 5.88e-5 for colorectal cancer and diabetes, respectively). There was a tendency showing that targets of drugs with side effects were better spreaders of perturbations than targets of drugs having no reported side effects both in colorectal cancer and in diabetes. However, due to the low number of identified drug targets having side effects (3 and 25, respectively), these latter differences were not statistically significant (p = 1 and p = 0.2593, respectively). shows the cumulative distribution of the normalized number of proteins with given silencing times, which are drug targets with known side effects (blue dashed line), which are drug targets without known side effects (red solid line) and which are not drug targets (green dotted line). The number of proteins was normalized by dividing the number of proteins in each silencing time range by the total number of proteins allowing a better comparison. The total number of drug targets with and without side effects and nontarget proteins was 495, 1,231 and 10,713, respectively. The human interactome containing 12,439 proteins and 174,666 edges was built from the STRING database 46 , 1,726 human drug targets were obtained from the DrugBank database 47 and 99,423 drug-side effect pairs were taken from the SIDER database 2 . Silencing times were calculated separately for every protein/drug target with the Turbine program 35 as described in the Methods section using a starting energy of 1,000 and a dissipation value of 5 units. Statistical analysis was performed using the Mann-Whitney (Wilcoxon rank sum) test function of the R package 56 . There was a statistically significant difference (p = 1.677e-5) between the silencing times of drug targets with known side effects and the silencing times of drug targets without reported side effects. The difference between the silencing times of drug targets and non-target proteins was also statistically significant (p = 2.2e-16).
Scientific RepoRts | 5:10182 | DOi: 10.1038/srep10182 Colorectal cancer-related proteins are good spreaders of perturbations and have a high centrality, while type-2 diabetes-related proteins show an average spreading efficiency and average centrality. Very importantly, a rather interesting difference emerged, when we examined the spreading efficiency of proteins related to colorectal cancer and diabetes. Mutated genes and their corresponding proteins in colorectal cancer and in type-2 diabetes were obtained from the Cancer Gene Census database 48 (Suppl. Table 7) and from the article of Parchwani et al. 49 (Suppl. Table 8), respectively. In case of colorectal cancer, disease-associated proteins were found to be significantly better spreaders than the residual proteins of the human interactome. On the contrary, diabetes-related proteins showed indistinguishable spreading properties to the rest of human proteins, which were not associated with the onset of diabetes (Fig. 2). To test the robustness of the results against the choice of protein-protein interaction network, we randomly deleted 50% of the 12,439 proteins in our human interactome. Here again, colorectal cancer-associated proteins were found to be significantly better spreaders than the residual proteins of the human interactome (data not shown; p = 0.00021 in Mann-Whitney test) and spreading efficiency of diabetes-related proteins showed no significant difference as compared to the rest of human proteins (data not shown; p = 0.095 in Mann-Whitney test).
These findings are in agreement with earlier results showing that cancer-associated proteins are enriched in proteins having a high centrality in the human interactome 37,38,40,[42][43][44][45] . Indeed, in our human interactome, cancer-related proteins had a significantly higher degree, closeness and betweenness centralities than diabetes-related proteins, having a 9.6-, 1.2-and 54-fold increase, respectively (Table 1). In agreement with their similar silencing time values (Suppl. Fig. 8), drug targets without or with side effects showed no significant centrality differences in the human interactome (Suppl. Table 9).
The interactome distance between drug targets and disease-related proteins is higher in diabetes than in colorectal cancer. Encouraged by the results showing an increased centrality of cancer-related, but not of diabetes-related proteins in the human interactome, we examined the interactome geodesic distance (i.e. shortest path) between drug targets and disease related proteins in both diseases using the neighbourhood matrices of related proteins. Our data show that the geodesic distance  46 . Colorectal cancer-and type 2 diabetesrelated proteins were obtained from the Cancer Gene Census database 48 and from the article of Parchwani et al. 49 , respectively. Silencing times were calculated separately for every protein with the Turbine program 35 as described in the Methods section using a starting energy of 1,000 and a dissipation value of 5 units. Statistical analysis was performed using the Mann-Whitney (Wilcoxon rank sum) test function of the R package 56 . There was a statistically significant difference between the silencing times of disease-related and non-related proteins in case of colorectal cancer (p = 2.329e-9) and but there was none in case of type 2 diabetes (p = 0.8343).
in the human interactome between drug targets and disease-related proteins is significantly larger in case of type-2 diabetes than in colorectal cancer (targets without side effects: p = 1.062e-5; targets with side effects: p = 5.441e-3). (Table 2; Suppl. Tables 10-13 and Suppl. Fig. 9) This finding is supported by the visual representation of the human sub-interactome of drug target and disease-related proteins of these two diseases (Suppl. Fig. 10), where drug targets and disease-related proteins of colorectal cancer  Table 2. Average network distance of drug targets without and with known side effects used in the treatment of colorectal cancer and type 2 diabetes from the disease-associated proteins. * This value is significantly greater than the average network distance of drug targets without known side effects in colorectal cancer (p = 1.062e-05). Statistical analysis was performed using the Welch (Student's) two sample t-test function of the R package 56 . ** This value is significantly greater than the average network distance of drug targets with known side effects in colorectal cancer (p = 0.005441). Statistical analysis was performed using the Welch (Student's) two sample t-test function of the R package 56 . The table shows the arithmetic mean of the average network distance between drug targets (with and without known side effects used in the treatment of colorectal cancer and type 2 diabetes) and the proteins related to the respective disease (results were very similar, if instead of arithmetic means we used the medians; data not shown). The total number of colorectal cancer-and diabetes-related proteins in the human interactome were 18 and 14, respectively. Average network distances were calculated as shortest paths using the Pajek programme 58 49 , respectively. We used the mean values and the t-test because of the near-normal distribution of the average network distances.
are intertwined, while these two groups of proteins remain rather separated in type-2 diabetes. This observation is further substantiated by the fact, that only 1 of the 18 colorectal cancer-related proteins (6%) is not connected to the giant component of the sub-interactome, while 10 of the 14 diabetes-related proteins (71%) are missing from the same giant component (Suppl. Fig. 10).

Discussion
The most important finding of our study is that 1.) drug targets are better spreaders of perturbations in the human interactome than non-target proteins in general; and in particular, 2.) targets of drugs with side effects are also better spreaders of perturbations than targets of drugs having no reported side effects (Fig. 1). These findings were robust, since they could be reproduced when we used different perturbation parameters (Suppl. Figs. 1,2 and 3), different measures of perturbation spread (Suppl. Figs. 5, 6 and 7), and reduced the size (coverage) of the human interactome to half of the original (Suppl. Fig. 4). These results are in agreement with those of a previous study showing that the interactome neighbourhood contributed to side-effect similarity 20 . Importantly, colorectal cancer-related proteins are good spreaders of perturbations and had a high centrality, while type-2 diabetes-related proteins showed an average spreading efficiency and had an average centrality in the human interactome ( Fig. 2 and Table 1). These findings are in agreement with earlier results showing that cancer-associated proteins are enriched in hubs, bottlenecks and bridges all having a high centrality in the human interactome 37,38,40,[42][43][44][45] .
Furthermore, the interactome-distance between drug targets and disease-related proteins was higher in diabetes than in colorectal cancer ( Table 2; Suppl. Tables 10-13 and Suppl. Fig. 9). This finding is in agreement with both the results of previous studies and intuitive insights on the classification of drug target strategies 4 . Most drug targets are 3 or 4 steps away in the human interactome from proteins involved in the same disease 50 . Moreover, cancer-related and metabolic disease-related proteins were shown to have an average network distance to the related drug targets of 2.3 and ~5 network edges, which are smaller and higher than the most abundant distance values, respectively, forming the two extremes of the distance-spectrum 50 . The former value is in the range we found in our study ( Table 2). The latter value of a disease group containing diabetes is much larger than that related to cancer, which is again in agreement with our findings. As a general trend, rapidly proliferating cells, like those in cancer, are attacked at their central proteins, while differentiated cells, such as those involved in type-2 diabetes, are attacked at the neighbours of central proteins 4 . These assumptions are also in agreement with a smaller network distance of centrally positioned cancer-related proteins from centrally positioned cancer drug targets than the distance between the more peripheral diabetes-related proteins and drug targets.
Analysis of perturbation spread in molecular networks may be used to develop additional, network-based tests to increase the potential safety of drug candidates. Assessment of perturbation spread in weighted networks (where the edges are weighted according to the abundance of their end-node proteins of relevant tissues, e.g. the endothelial cell in colorectal cancer, as well as hepatocyte and myocyte in diabetes, as described in our earlier study for the yeast interactome 51 ), directed networks (such as signalling networks 4,52 ), or networks considering the subcellular localization of participating proteins 53 , as well as using quantitative measures of side-effect severity and abundance may provide additional information and will be subjects of later studies.
In summary, our results contributed to a better understanding of the network position and dynamics of disease-related and drug target proteins. The findings may help the future development of novel, network-based pharmacovigilance methods increasing the potential safety of drug candidates.

Methods
Construction of the human protein-protein interaction network. In this paper, we examined the propagation of perturbations in the human protein-protein interaction network (interactome). The choice of this type of network was driven by the fact that it contains the most proteins and the greatest number of connections (as opposed to signalling networks or regulatory networks). Human interactome data were downloaded from the STRING database 46 on 8 February, 2013. STRING contains interaction data based on a vast number of data collection principles. We have only used manually collected ('database' column) or experimental ('experiments' column) data having higher reliability than e.g. predicted data. Only human protein-protein interactions were included in the interactome. In order to facilitate the comparison with drug targets, the STRING Ensemble Protein ID (ENSP) protein codes were translated to UniProt ID 54 using the UniProt translator. From the original 13,484 ENSP IDs we managed to translate 12,493 to UniProt IDs, but only 12,439 proteins were connected to other proteins. The database contained a total of 377,920 human protein-protein interactions, out of which 350,528 remained after translating the protein IDs to UniProt IDs using the UniProt translator, which were further reduced to 174,666 after eliminating multiple links and loops (self-links). The original STRING database also contained edge weights indicating the reliability of data. Since we only worked with manually collected and experimental data, our interactome contained no edge weights.

Measurement of the propagation of perturbations in the human interactome.
The propagation of perturbations in the human interactome was measured with the network perturbation analysis software for simulating network dynamics called Turbine 35 . For the simulation experiments we chose the software's communicating vessels model 35 , where changes from one protein to its neighbours 'flow' in proportion with the energy differences between the 'source' and the 'target' proteins. The communicating vessels model 35 contains a starting energy (E) and a dissipation parameter (D), where the starting energy is distributed equally among the proteins of the human interactome specified at the individual simulations, while in each step of the simulation the program subtracts D units of energy from each protein of the interactome. In most simulations E and D were set to 1000 and 5 units, respectively. Having these starting energy and dissipation parameters it was possible to trace the propagation of perturbations in the network rather easily. However, all the key simulations were also examined using different E and D values to examine the robustness of the results. To characterise the propagation efficiency of the starting node(s), the measure of silencing time 35 was used, which is the time elapsed from the start of the simulation until the energy of all nodes reaches the minimum threshold of less than 1 unit. We also calculated perturbation reach values 35 , which show the number of proteins receiving the perturbation from the initial perturbation source protein until the perturbation was dissipated from the system. Characterisation of drug side effects. Drug side effects were collected from the SIDER database 2 .
This database contains information about drug side effects and their frequencies from public documentation and package inserts, with the help of drug labels and terms from MedDRA (Medical Dictionary for Regulatory Activities). SIDER data were downloaded from the version of 17 October, 2012. This version of the SIDER database 2 contained 996 drugs, 4,192 unique side effects and 215,850 drug-side effect pairs. After eliminating the duplicates, 99,423 drug-side effect pairs remained. In order to be able to compare data, we converted drug IDs in the SIDER database 2 into IDs of the DrugBank database 47 by matching the drug names.
Characterisation of drug targets. We collected drug targets from the DrugBank database 47 version last updated on 10 February, 2013. The XML version of the database was used, including the drug names, indications and target list. The proteins in the target list were identified by their UniProt IDs 54 with the help of the external reference table available in the database. From the drug target list only those drugs that targeted human proteins were selected. From the original 6,718 drugs 3,926 such drugs were found, of which 3,626 had target proteins contained in our human interactome.
After comparison with the drug-side effect data from the SIDER database 2 , we found that 597 drugs (with a total of 495 target proteins) had known side effects, while the remaining 3,029 drugs (with 1,231 target proteins) had no reported side effects to date.
Protein and drug target data related to the two examined diseases: colorectal cancer and type 2 diabetes. Genes involved in colorectal cancer were collected from the Cancer Gene Census 48 database, by selecting those proteins in the entire database that contained the word 'colorectal' in their 'Tumour Types' column. Genes related to type 2 diabetes were obtained from the article of Parchwani et al. 49 . The 18 genes involved in colorectal cancer and the 46 genes related to type 2 diabetes were then mapped to proteins marked by UniProt ID 54 with the help of the Protein Identifier Cross-Reference (PICR) 55 application. See Suppl. Tables 7 and 8 for the genes and their respective proteins involved in the two diseases. From these proteins, all 18 colorectal cancer-related but only 14 type 2 diabetes-related were contained in our interactome. Drugs used in treatment of colorectal cancer and diabetes and their drug targets were collected based on the drug indications in the DrugBank database 47 . See Suppl. Table 2 for the relevant keywords used. We found 11 drugs against colorectal cancer and 36 against type 2 diabetes, which all had valid targets. Drugs against colorectal cancer and type 2 diabetes had 33 and 42 target proteins, respectively, out of which 27 and 39, respectively, were contained in our human interactome.
Other methods. A number of Bash shell scripts were written to automate the network simulation experiments with Turbine. Statistical analysis of the results was performed with the R software package 56 . The Pajek software 57 was used to measure geodesic distances and centralities in the human interactome, the Cytoscape software 58 was used to create images of the human interactome and the Inkscape software 59 was used to create some other images.