Introduction

Many biological functions in living organisms are accomplished by proteins. In fact, proteins are the smallest operating units in cells, and understanding cell organization and function depends on understanding their behavior. A protein rarely operates stand-alone; in other words, proteins operate in groups, which are called complexes1. A complex is made up of proteins that are all physically connected at the same time. It is notable that a protein complex should not be mistaken for a functional module. A functional module is defined by a high density of interactions within a group of proteins, where a group is said to have high density when the number of intragroup interactions exceeds the number of intergroup ones. One approach to studying these crucial molecules is the recognition of different complexes. A single cell of even a simple organism contains thousands of proteins, so there are millions of potential complexes among them. Although accurate experimental methods can determine the authenticity of proposed complexes, such experimental verification is neither feasible nor reasonable for this enormous number of candidate complexes2. Computational approaches therefore seem to be a suitable alternative for detecting these complexes3. Extracting protein complexes from protein interaction networks is one such computational approach. In recent decades, many powerful experimental methods have been proposed to extract large numbers of protein-protein interactions (PPIs)4. Tandem Affinity Purification with mass spectrometry (TAP-MS)5, Yeast Two-Hybrid (Y2H)6, Co-immunoprecipitation (Co-IP) and Protein-Fragment Complementation Assay (PCA)7 are examples of these high-throughput techniques. Such a collection of PPIs is usually known as a PPI network. A PPI network can be modeled as an undirected graph, where nodes denote proteins and edges represent the interactions between them.
In such networks, complexes are considered as dense subgraphs, because it is reasonable to assume that the number of interactions among the members of a complex is usually larger than the number of interactions between its members and the rest of the network8, 9. Under this assumption, the problem of detecting protein complexes reduces to the traditional graph clustering problem.

We know that several limitations of the experimental methods introduce considerable noise (false positives and false negatives) into the production of PPI networks8. Although there is no definitive solution for reducing this noise, some methods have made suggestions; for example, Chua et al.10 and Brun et al.11 have proposed two algorithms that use FS-weights12 and the CD-distance, respectively. In addition, some proteins are multifunctional and serve in more than one complex simultaneously; therefore, complexes may overlap13. It is also worth mentioning that not all protein-protein interactions have the same reliability and time of occurrence14. The reliability of an interaction is often represented by the weight of the corresponding edge in the PPI network15, while the time is often ignored. Clustering methods should therefore be capable of handling the noise, weights and occurrence times of interactions in a PPI network.

Due to the variety of available complex detection algorithms, no single categorization covers all of them. The underlying idea and the type of information used by an algorithm are usually the main axes along which they are classified. Accordingly, all available complex detection algorithms can be divided into two main categories. The first category consists of algorithms that use no biological information except the PPI network itself. The second category covers algorithms that use various kinds of biological information in addition to PPI networks, in order to make better decisions. In both categories, all algorithms are variants of graph clustering. The algorithms in the first category are classified into five main groups16, 17. (1) Local neighborhood Density search (LD): every cluster is initialized with a single node or a group of nodes that constitutes a dense subgraph. In every step of the algorithm, one or more nodes can be joined to a cluster or discarded. The ability to propose overlapping clusters is one of the advantages of this strategy. For example, MCODE18 and ClusterONE13 belong to this category. (2) Cost-based Local search (CL): the methods in this category decompose the graph into parts in every step, guided by a cost function towards a better partitioning. Such methods are often unable to produce overlapping clusters, which is a significant disadvantage19; the importance of producing overlapping clusters stems from the large number of proteins that belong to several protein complexes simultaneously20. The core of the RNSC algorithm (without the filtering step) is a well-known example of such methods21. (3) Flow Simulation (FS): the main idea of the methods in this category is the behavior of a fluid in channels and the spread of information over a network.
The MCL4 and RRW22 methods are the best examples of this category; both implement their approach using random walks. (4) Clique finding methods (CF): algorithms in this category predict clusters by merging, mixing or deleting different types of cliques or k-cores. CMC8 and CFinder19 are examples of this category. (5) Other traditional graph clustering methods: a few methods do not share a prominent common idea and are unique in their characteristics, so they are placed in a separate category. AP is an example23; its idea is the same as the popular k-center clustering algorithm, implemented on a weighted graph instead of a multi-dimensional vector space23. The algorithms in the second category can also be classified, based on the considerable diversity of the types of biological information they use24. We focused only on the first category, for two main reasons. First, our knowledge and technology are not sufficient for extracting all biological information, so our view of biological rules is limited and incomplete; the significant existing noise and information defects may bias both the extracted biological information and the algorithms' results. Second, an improvement in the algorithms of the first category automatically improves the results of the algorithms in the second category.

Complex detection methods that use PPI networks have limited accuracy, and the large amount of noise (false positive and false negative interactions) is responsible for this. Previously, biologists generally concurred that the number of connections of a vertex in a PPI network is closely related to its biological importance, so hubs were considered more likely to be lethal genes25; later it was found that this correlation might not be completely true26. On the other hand, Han et al. proposed a binary hub classification which divides hubs into two groups, 'party hubs' and 'date hubs'27. A date hub is a vertex that has many connections with other vertices, but at different times; such vertices emerge as hubs when all occurred interactions are viewed as a static snapshot, as in PPI networks. Party hubs, in contrast, are high-degree vertices that interact with most of their partners simultaneously, inside modules28. Similarly, Liu et al. classified hubs into two types, 'module hubs' and 'inter-module hubs'14: the hubs inside a module are module hubs, and the hubs that connect modules to each other are inter-module hubs. Comparing these two classifications, it seems that inter-module hubs correspond to date hubs and module hubs to party hubs. As a result, module hubs are important biological hubs whose presence is crucial in clusters, while inter-module hubs are unessential or even fake hubs and, if necessary, can be ignored. A more in-depth analysis has been provided by Batada et al.28, 29.

Thus, by eliminating inter-module hubs we probably obtain not only a better-separated network with less noise28, but also an indirect way of accounting for the different occurrence times of protein interactions. Treating hubs specially has recently received much attention. For example, Liu et al. and Yong et al. have considered the biological properties of hubs and have tried to detect protein complexes by removing all hubs in the network30, 31. Since these methods fall into the second category, we were not able to compare them with the methods in the first category.

Here we propose a new method for detecting protein complexes from PPI networks, which belongs to the first category ('LD'). Its main idea is to reduce noise in the network by removing hubs. In this approach, some of the hubs, including both module hubs and inter-module hubs, are removed at the beginning. In fact, our study shows that in denser PPI networks many high-degree hubs are inter-module hubs, while in sparser networks these hubs become module hubs. A greedy growth process is then used to create primary clusters from different single nodes. After that, some of the eliminated hubs are added back to the primary clusters based on the density of the PPI network and the modularity concept, which helps us add module hubs to appropriate primary clusters and filter out the inter-module hubs. Final clusters are obtained by merging highly overlapping primary clusters and filtering out the sparse ones. The experimental results demonstrate that our algorithm (IMHRC) outperforms other protein complex detection methods, in particular the state-of-the-art ClusterONE algorithm32.

Results

Before presenting the results of our study, we discuss the datasets, evaluation metrics and gold standards used to assess the results of complex detection algorithms. Then the results of the methods are presented.

Evaluation metrics

Comparing the outputs of complex detection algorithms with a predefined gold standard set is one of the common ways to assess their performance. The significant overlap between real complexes in the gold standard sets, and also between predicted complexes, makes this comparison difficult. Moreover, a real complex may match more than one predicted complex and vice versa, and the matching between predicted and real complexes is often partial. We therefore need standard criteria to quantify the amount of matching between the gold standard and the predicted complexes.

One of the common criteria in the literature is the geometric accuracy (Acc), introduced by Brohee and van Helden33. It is the geometric mean of the clustering-wise sensitivity (Sn) and the clustering-wise positive predictive value (PPV). Given n complexes in the gold standard as references and m predicted complexes, let \({t}_{ij}\) denote the number of common proteins between reference complex i and predicted complex j, and let \({N}_{i}\) denote the number of proteins in reference complex i. Sn, PPV and Acc are defined as follows:

$$Sn=\,\frac{{\sum }_{i=1}^{n}\mathop{{\max }}\limits_{j}\{{t}_{ij}\}}{{\sum }_{i=1}^{n}{N}_{i}}$$
(1)
$$PPV=\,\frac{{\sum }_{j=1}^{m}\mathop{{\max }}\limits_{i}\{{t}_{ij}\}}{{\sum }_{j=1}^{m}{\sum }_{i=1}^{n}{t}_{ij}}$$
(2)
$$Acc=\,\sqrt{Sn\times PPV}$$
(3)

Sn measures the fraction of proteins in the reference complexes that is covered by the predicted complexes. Since a giant component can inflate Sn, PPV is used as well: aggregating all proteins into one predicted complex inflates Sn, while assigning every protein to the correct predicted complex (identical to the reference complexes) maximizes PPV. The accuracy criterion (Acc) is used to balance the two measures. It should be noted that even Acc is not a perfect criterion for evaluating complex detection algorithms. Assume a perfect complex detection algorithm whose output is identical to the reference complex set. Sn attains its maximum value on this algorithm, but PPV does not: because of overlap, some proteins belong to more than one predicted complex, so the numerator of PPV is always smaller than its denominator. This means that although overlap is an intrinsic property of complexes, PPV cannot be maximized when overlap exists, and this is an obstacle.
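The three criteria of equations (1)-(3) can be sketched in a few lines; a minimal illustration, assuming reference and predicted complexes are given as sets of protein identifiers:

```python
# Sketch of Sn, PPV and Acc (equations 1-3). Complexes are sets of
# protein identifiers; both input lists must be non-empty.
import math

def accuracy(reference, predicted):
    """Return (Sn, PPV, Acc) for lists of reference and predicted sets."""
    # t[i][j]: number of proteins shared by reference i and predicted j
    t = [[len(r & p) for p in predicted] for r in reference]

    # Sn: each reference complex contributes its best-covered overlap
    sn = sum(max(row) for row in t) / sum(len(r) for r in reference)

    # PPV: each predicted cluster contributes its best match over all
    # references, normalized by the total overlap mass
    num = sum(max(t[i][j] for i in range(len(reference)))
              for j in range(len(predicted)))
    den = sum(t[i][j] for i in range(len(reference))
              for j in range(len(predicted)))
    ppv = num / den

    return sn, ppv, math.sqrt(sn * ppv)
```

For instance, one reference complex {a, b, c} matched by one predicted cluster {a, b, d} gives Sn = 2/3, PPV = 1 and Acc = sqrt(2/3), illustrating how partial matches are scored.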

Nepusz et al. used the Fraction and MMR criteria to overcome this issue13. If P denotes the set of predicted clusters and C the set of gold standard complexes, the Fraction criterion is defined as follows:

$${N}_{c}=|\{c|c\in C,\exists \,p\in P,O\,(p,c)\ge \omega \}|$$
(4)
$$Fraction=\frac{{N}_{c}}{|C|}$$
(5)

As defined later in equation (12), \(O(p,c)\), called the matching score, measures the extent of matching between a reference complex c and a predicted complex p. These criteria thus show the fraction of gold standard complexes that are matched by at least one predicted cluster. The threshold ω was set to 0.25, which guarantees that at least half of the proteins of a matched gold standard complex are covered by at least half of the proteins of the matching predicted cluster. To evaluate MMR, a weighted bipartite graph was constructed in which one part corresponds to the reference complexes and the other to the predicted complexes. The matching score between every member of one part and each member of the other was calculated by equation (12) and added as a weighted edge if its value was greater than 0.2. By running a maximum weighted bipartite matching algorithm, we obtained a one-to-one mapping between the members of the two parts with maximal total weight. The MMR criterion is this maximal match normalized, i.e. the total weight of the selected edges divided by the number of reference complexes. Nepusz et al. proposed the sum of the Accuracy, MMR and Fraction criteria for comparing the performance of complex detection algorithms13; they showed that ClusterONE dominates the other complex detection methods and introduced it as a state-of-the-art method. More recently, Feng et al. also introduced ClusterONE as the state-of-the-art complex detection method and proposed a new supervised learning method that achieves better performance32. Since biological information is used in the learning step of that algorithm, it belongs to the second category. We therefore compared our experimental results with ClusterONE and the other best complex detection methods in the first category.
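The Fraction and MMR computations above can be sketched as follows; this is a minimal illustration, assuming complexes are sets of protein identifiers and using a brute-force matching that is only practical for tiny inputs (a real implementation would use e.g. the Hungarian algorithm for the maximum-weight bipartite matching):

```python
# Sketch of Fraction (equations 4-5) and MMR, using the neighborhood
# affinity score of equation (12) as the matching score O(p, c).
from itertools import permutations

def overlap(a, b):
    # neighborhood affinity score, equation (12)
    return len(a & b) ** 2 / (len(a) * len(b))

def fraction(reference, predicted, omega=0.25):
    # share of references matched by at least one predicted cluster
    matched = sum(1 for c in reference
                  if any(overlap(p, c) >= omega for p in predicted))
    return matched / len(reference)

def mmr(reference, predicted, threshold=0.2):
    # weight matrix of the bipartite graph; edges at or below the
    # threshold are dropped (weight zero)
    w = [[overlap(c, p) for p in predicted] for c in reference]
    w = [[x if x > threshold else 0.0 for x in row] for row in w]
    # brute-force maximum-weight one-to-one matching; assumes
    # len(reference) <= len(predicted) and small inputs
    best = max(sum(w[i][j] for i, j in enumerate(perm))
               for perm in permutations(range(len(predicted)),
                                        len(reference)))
    return best / len(reference)
```

With references {a, b} and {c, d} against predictions {a, b} and {c, e}, both references pass the ω = 0.25 threshold (Fraction = 1), while the one-to-one matching totals 1 + 0.25 over two references (MMR = 0.625).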

Gold Standard set

To evaluate the results of the methods, two gold standards were used as benchmarks: the recent version of the MIPS catalog of protein complexes, and the Gene Ontology-based protein complex annotations from SGD. The MIPS catalog has a hierarchical structure, so complexes may be composed of several subcomplexes nested at most five hierarchy levels deep13. We extracted from all MIPS categories every complex consisting of at least three and at most 100 proteins. MIPS category 550 was excluded, because all of its complexes are predicted by computational methods. We also used the Saccharomyces Genome Database (SGD) as the source of the second gold standard set. SGD includes Gene Ontology (GO) annotations for all yeast (Saccharomyces cerevisiae) proteins; these GO terms provide biological information that can be used to produce reference complexes. This process has been introduced in refs 13 and 22, and we used the same approach to create the SGD gold standard set, again restricted to reference complexes of at least three and at most 100 proteins. In these experiments, the threshold for matching between a predicted complex and a reference complex was set to 0.25 based on equation (12).

Datasets

In our assessment, four experimental yeast PPI datasets were used: Gavin1, Collins15, Krogan Core and Krogan Extended34. All these datasets are weighted; the weights express the reliability of each interaction as a value between zero and one. The weights in the Gavin dataset are socio-affinity indices, which measure the affinity between proteins: the criterion counts how many times pairs of proteins are observed together as preys, or as a bait and a prey, in the data set, and then computes their log-odds35. All PPIs in the Gavin data set have a socio-affinity index larger than five1. The PPIs in the Collins data set were selected based on their purification enrichment score and comprise the top 9074 interactions, as suggested in the original paper15. We also used two versions of the Krogan dataset: all PPIs in the first version, referred to as Krogan Core, have weights larger than 0.273, while all PPIs in the second version, referred to as Krogan Extended, have weights larger than 0.101. In general, all settings and parameters for every dataset were set as proposed in the original papers. Moreover, self-interactions and isolated proteins were removed from all datasets. Other properties of these networks are shown in Table 1.

Table 1 Details of four PPI Network datasets used in the experiments.

Evaluation

To assess the robustness of IMHRC against other complex detection algorithms, we selected seven of the best algorithms on this topic. We tried to make a comprehensive comparison of the state-of-the-art complex detection methods that have been introduced in the last decade and whose source code or binary executables are accessible. Furthermore, these methods use only topological information and no biological information beyond the PPI network. The algorithms are: AP23, CFinder19, CMC8, MCL4, ClusterONE13, the core of RNSC21 and RRW22. The parameters of all these methods were set to the values recommended by their authors or by Nepusz et al. in ref. 13; in fact, Nepusz et al. calculated the best setting for every algorithm on each dataset. In the evaluation, the best setting for the IMHRC algorithm was used as well.

Determination of β and γ

To implement the idea of removing hubs and putting them back, we used the parameters β and γ. To specify their values, we computed the results of IMHRC for every β and γ in the range \(0\le {\rm{\beta }},\gamma \le 0.2\), varying each value in steps of 0.001. The resulting surfaces are shown in Figs 1 and 2 and Supplementary Figures 1 to 6.
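The parameter sweep described above amounts to an exhaustive grid search. A minimal sketch, in which the `evaluate` callable is a hypothetical stand-in for running IMHRC with a given (β, γ) and scoring its output against a gold standard:

```python
# Sketch of the exhaustive (beta, gamma) grid search. `evaluate` is a
# hypothetical placeholder: it would run IMHRC and return a score.
def grid_search(evaluate, lo=0.0, hi=0.2, step=0.001):
    """Evaluate every (beta, gamma) pair on the grid and return the
    best-scoring pair together with its score."""
    best_beta, best_gamma, best_score = None, None, float('-inf')
    n = int(round((hi - lo) / step))
    for i in range(n + 1):
        for j in range(n + 1):
            beta, gamma = lo + i * step, lo + j * step
            score = evaluate(beta, gamma)
            if score > best_score:
                best_beta, best_gamma, best_score = beta, gamma, score
    return best_beta, best_gamma, best_score
```

With the 0.001 step of the text this makes 201 × 201 = 40,401 runs per dataset, which explains why the sweep is computationally heavy.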

Figure 1
figure 1

Performance of IMHRC for different values of β and γ on the Collins dataset and the SGD gold standard. The β and γ axes indicate the fraction of hubs that have been removed and put back, respectively, and the T axis specifies the performance of the method. (a) Back view of the surface. (b) Front view of the surface.

Figure 2
figure 2

Performance of IMHRC for different values of β and γ on the Gavin dataset and the SGD gold standard. The β and γ axes indicate the fraction of hubs that have been removed and put back, respectively, and the T axis specifies the performance of the method. (a) Back view of the surface. (b) Front view of the surface.

We searched for the values of β and γ that yield the best performance. Table 2 shows these values for all datasets. The experimental results demonstrate that the best values of β and γ depend on the density of the dataset and the type of gold standard. It seems that high values of β and low values of γ are appropriate when the network is dense, while for sparser networks low values should be chosen for both.

Table 2 The threshold of β and γ in IMHRC on all datasets.

Performance

Table 3 shows the values and settings of all methods. Tables 4 and 5 show the details and the overall performance of the methods based on Accuracy (Acc), Fraction (Frac) and Maximum Matching Ratio (MMR). The variation in the number of real complexes across datasets is interesting: a real complex was retained in the gold standard set with respect to a dataset if at least half of its proteins belonged to that dataset. The sizes of the gold standards in Tables 4 and 5 clearly show that the Krogan datasets are more comprehensive than the Collins and Gavin datasets, while the numbers of proteins and interactions show that the Gavin and Collins datasets are denser than the other two (Table 1).
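The retention rule above is simple enough to state as code; a minimal sketch, assuming a gold standard given as a list of protein sets and a dataset given as the set of proteins it contains:

```python
# Sketch of the gold-standard retention rule: a reference complex is
# kept for a dataset only if at least half of its proteins occur there.
def retained(gold_standard, dataset_proteins):
    return [c for c in gold_standard
            if len(c & dataset_proteins) >= len(c) / 2]
```

For example, a four-protein complex with two members in the dataset is kept, while a two-protein complex with no members present is dropped.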

Table 3 The applied clustering algorithms’ settings in different datasets.
Table 4 Experimental results and performance comparison of all methods used in this paper on the SGD gold standard.
Table 5 Experimental results and performance comparison of all methods used in this paper on the MIPS gold standard.

Similarly, the number of predicted complexes often increased when the methods were run on sparser datasets; this was most evident for CMC, IMHRC, ClusterONE and MCL, in that order, although it was not true for CFinder. In contrast, there was no clear pattern of an increasing number of matched clusters from denser to sparser datasets, except for ClusterONE and IMHRC. In fact, ClusterONE and IMHRC were the only two methods whose number of matched predicted clusters increased when the number of predicted clusters increased. In addition, the number of matched predicted clusters produced by IMHRC was always higher than for the other methods. The Fraction criterion clearly shows which methods are more powerful at recognizing real complexes: based on Tables 4 and 5, IMHRC, ClusterONE and CMC have the first, second and third best performance on the Fraction criterion, respectively.

It is notable that the number of matched clusters in Tables 4 and 5 is the cardinality of a maximal one-to-one matching between real complexes and predicted clusters based on the MMR criterion, whereas Fraction counts how many real complexes are recognized by at least one predicted cluster; counting matched clusters is therefore stricter than the Fraction criterion. The size and quality of the predicted clusters are further important issues, measured by Acc and MMR. Obviously, a predicted cluster is more valuable if it shares many proteins with the real complexes; the Sn criterion quantifies this amount of matching. As is evident in Tables 4 and 5, CFinder, ClusterONE, IMHRC and MCL have the highest Sn values. However, if a method produces a giant component among its predicted clusters, the value of Sn is not completely trustworthy, and the results in Tables 4 and 5 show that CFinder, and to some extent ClusterONE, behave this way. As mentioned previously, the PPV criterion is a way of resolving this defect, and the significant difference between the Sn and PPV values of CFinder supports this claim. Hence, we used Acc to compare the performance of the methods: ClusterONE, IMHRC and MCL are the first, second and third best algorithms in terms of Acc, respectively. MMR was the last criterion for comparing the methods; it clearly indicates how well a method detects real complexes in terms of both quality and quantity. Again, Tables 4 and 5 clearly identify IMHRC, CMC and ClusterONE as the first, second and third best methods in terms of the MMR criterion, so these algorithms are more accurate at distinguishing and fitting predicted clusters to real complexes.
For example, we investigated one of the real complexes in MIPS, based on the Krogan Extended dataset, whose proteins are APC5, CDC23, CDC26, CDC27, APC1, APC4, APC9, APC2, CDC16, DOC1 and APC11. Matching scores of 0.909, 0.736, 0.699, 0.649, 0.556, 0.545, 0.545 and 0.545 were achieved by the IMHRC, CFinder, MCL, ClusterONE, AP, RRW, RNSC and CMC algorithms, respectively. Figure 3 depicts the clusters obtained by these algorithms that are matched with the real complex. Finally, as Figs 4 and 5 demonstrate, IMHRC dominates all other methods on all datasets except in one case: the performance of IMHRC is lower than that of ClusterONE when MIPS is the gold standard and Collins is the dataset.

Figure 3
figure 3

This figure shows the results of all clustering methods on the detection of a real complex based on the Krogan Extended dataset. The yellow nodes denote the real complex and the blue nodes are other proteins; the halos represent the results of the algorithms. (a) The red halo shows the result of IMHRC. (b) The red and blue halos show the results of MCL and ClusterONE, respectively. (c) The blue halo shows the results of RRW, RNSC and CMC. (d) The yellow and violet halos show the results of CFinder and AP, respectively.

Figure 4
figure 4

Modules obtained by different methods. Comparison of the total performance of all methods used in the evaluation on all datasets and using the SGD gold standard.

Figure 5
figure 5

Modules obtained by different methods. Comparison of the total performance of all methods used in the evaluation on all datasets and using the MIPS gold standard.

Discussion

Protein complexes are fundamental operating units in cells; therefore, understanding the characteristics and behavior of cells depends on analyzing proteins and their complexes. Many computational methods have been proposed to detect protein complexes from PPI networks.

In this paper, we propose a new complex detection algorithm which recognizes real complexes in PPI networks by removing inter-module hubs. Hub removal is one of the fundamental parts of this algorithm. In fact, we observed that the noise in PPI networks and the different occurrence times of protein interactions are two basic challenges for detecting real complexes. Our survey shows that removing some of the hubs and putting part of them back is a good way to overcome these two problems. Module hubs are fundamental units in the structure of complexes and perform many tasks, such as "RNA metabolic process" or "nuclear organization and biogenesis"; their presence in complexes is therefore required. The roles of inter-module hubs in the duties of complexes are less important: they have mediator roles such as "signal transduction"14. Our study shows that many high-degree nodes in dense PPI networks are inter-module hubs, while in sparse PPI networks these nodes are module hubs. This agrees with the research on hubs by Han et al.27 and with the results of Liu et al. on the DIP dataset, which is a genuinely sparse network14.

We also created a mechanism capable of taking the weights of protein interactions into account and of handling the overlap property of complexes. To assess the effect of removing hubs and the robustness of this mechanism, we performed a detailed evaluation, comparing our method with seven state-of-the-art techniques on four popular datasets. The results showed that our method not only had the highest number of matches in all cases, but also that the quality of these matches was better than that of the other methods. Therefore, our method can predict real complexes with more accuracy and precision.

The evaluation in our comparison was based on three criteria commonly used in the literature, but these criteria still have defects that prevent a flawless assessment; the effect of a giant component and of the number of predicted complexes are examples of such imperfections. Therefore, one of our future goals is to design a combination of criteria with fewer defects. In addition, we will try to redesign our complex detection mechanisms to detect real complexes with more accuracy and precision.

Application

To obtain a more rigorous analysis, we subjected the other algorithms to the same assessment as IMHRC. For this purpose, we removed β percent of the vertices of the network according to step 1 of the IMHRC Algorithm section, then ran all algorithms on the new network, and afterwards put γ percent of the eliminated hubs back according to the repairing phase of step 3 of the IMHRC Algorithm section. Because the computations were very long, we ran them on one gold standard (SGD) and two datasets, one denser (Collins) and one sparser (Krogan Core). In this way, we were able to understand how removing hubs and putting them back affects performance. The results are depicted in Figs 1 and 2 and Supplementary Figures 1 to 20.

As shown in Figs 1 and 2 and Supplementary Figures 1 to 20, this idea can improve the performance of all the algorithms except CFinder on the Collins dataset. We can partition the results into three groups: the first group includes the algorithms with significant improvement, the second group those with satisfactory improvement, and the third group those with only partial improvement. According to this classification, with SGD as the gold standard, we placed IMHRC, ClusterONE and RNSC on the Collins dataset, and MCL and CFinder on the Krogan Core dataset, in the first group. The second group included IMHRC on the Gavin dataset, AP, CMC and RRW on the Collins dataset, and AP, RNSC, CMC and RRW on the Krogan Core dataset, again with SGD as the gold standard. All remaining cases were placed in the third group. The results of IMHRC show that the improvement is only partial when MIPS is the gold standard. Analyzing the results based on the SGD gold standard shows that MCL, CMC and RRW on the Collins dataset perform best when many hubs are removed and not put back, whereas MCL, CFinder and RRW on the Krogan Core dataset perform best when many hubs are removed and then put back. IMHRC, ClusterONE and RNSC on the Collins dataset perform best when about half of the eliminated hubs are put back. In some cases, only a few hubs need to be removed to reach the best performance; AP and RNSC on the Krogan Core dataset are two such cases, and for them the eliminated hubs also need to be put back, whereas this was not necessary for IMHRC on Gavin, AP on Collins and CMC on Krogan Core. Table 6 shows the performance of all algorithms after removing hubs and putting them back.

Table 6 Influence of removing and putting hubs back in all methods used in this paper.

Methods

Terminologies

As mentioned, a PPI network is mathematically modeled as an undirected weighted graph \(G=(V,E,W)\), where V is the set of nodes, \(E=\{{e}_{ij}:i,j\in V\}\) is the set of edges and \(W:E\to { {\mathcal R} }^{+}\) is a function that assigns a weight (a positive value between 0 and 1) to every edge: nodes denote proteins, edges denote interactions between proteins, and weights denote the credibility of the interactions. In this model, every \({C}_{k}=({V}_{k},{E}_{k},{W}_{k})\), where \({V}_{k}\subseteq V\), \({E}_{k}\subseteq E\) and \({W}_{k}\subseteq W\), is the k-th cluster or subgraph distinguished by a graph clustering algorithm. For any protein \(v\in V\), \(N(v)=\{a|va\in E\}\) is the set of neighbors of \(v\) and \(deg(v)=|N(v)|\) is the degree of \(v\). Let \(w(i,j)\) indicate the weight of \({e}_{ij}\) and let \(A=[{a}_{ij}]\) be the weighted adjacency matrix of \(G\), defined by:

$${a}_{ij}=\begin{cases}w(i,j), & \text{if }(i,j)\in E\\ 0, & \text{otherwise}\end{cases}$$
(6)

Also, the weighted degree of node i is defined as:

$$de{g}_{w}(i)=\sum _{j\in N(i)}{a}_{ij}$$
(7)

We define the weighted degree of a predicted cluster \({C}_{k}\) as:

$$de{g}_{w}({C}_{k})=de{g}_{w}^{in}({C}_{k})+de{g}_{w}^{out}({C}_{k})$$
(8)

In which \(de{g}_{w}^{in}({C}_{k})\) and \(de{g}_{w}^{out}({C}_{k})\) are the inner weighted degree and the outer weighted degree of cluster \({C}_{k}\), respectively, defined as follows:

$$de{g}_{w}^{in}({C}_{k})=\frac{1}{2}\sum _{i\in {V}_{k}}\sum _{j\in N(i){\cap }^{}{V}_{k}}{a}_{ij}$$
(9)
$$de{g}_{w}^{out}({C}_{k})=\sum _{i\in {V}_{k}}\sum _{j\in N(i)-{V}_{k}}{a}_{ij}$$
(10)

Also, the density of \({C}_{k}\) is defined by ref. 36 as:

$$de{n}_{w}({C}_{k})=\frac{2\times de{g}_{w}^{in}({C}_{k})}{(|{V}_{k}|)(|{V}_{k}|-1)}$$
(11)

In addition, the extent of overlap between two clusters A and B was quantified by the neighborhood affinity score, which is defined as3:

$$O\,(A,B)=\frac{{|A{\cap }^{}B|}^{2}}{|A|\times |B|}$$
(12)
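The quantities in equations (6) to (12) can be sketched in a few lines of Python. The toy weighted adjacency dictionary below is purely illustrative, not real PPI data, and the function names are hypothetical:

```python
# Toy weighted adjacency structure (hypothetical values, not real PPI data).
A = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"a": 0.9, "c": 0.7, "d": 0.2},
    "c": {"a": 0.8, "b": 0.7},
    "d": {"b": 0.2, "e": 0.6},
    "e": {"d": 0.6},
}

def deg_w(i):
    """Weighted degree of node i, equation (7)."""
    return sum(A[i].values())

def deg_w_in(C):
    """Inner weighted degree of cluster C, equation (9); each inner edge counted once."""
    return 0.5 * sum(A[i][j] for i in C for j in A[i] if j in C)

def deg_w_out(C):
    """Outer weighted degree of cluster C, equation (10)."""
    return sum(A[i][j] for i in C for j in A[i] if j not in C)

def den_w(C):
    """Weighted density of cluster C, equation (11)."""
    n = len(C)
    return 2 * deg_w_in(C) / (n * (n - 1)) if n > 1 else 0.0

def overlap(X, Y):
    """Neighborhood affinity score between clusters X and Y, equation (12)."""
    return len(X & Y) ** 2 / (len(X) * len(Y))

C = {"a", "b", "c"}
print(round(deg_w("b"), 2))    # 1.8
print(round(deg_w_in(C), 2))   # 2.4
print(round(deg_w_out(C), 2))  # 0.2
print(round(den_w(C), 2))      # 0.8
print(round(overlap(C, {"b", "c", "d"}), 3))  # 0.444
```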

IMHRC Algorithm

In our approach, we detect complexes in four steps. These steps were designed to reduce noise in the network, process weighted graphs, account for the overlapping nature of complexes and implicitly account for the times at which interactions occur.

Step 1: In this step, the β percent of vertices with the highest degree (hubs) were removed from the PPI network. The intuition behind hub removal is based on two effects. First, if the occurrence probabilities of interactions are independent and identically distributed (i.i.d.), then the higher a node's degree is, the more likely it is to carry false positive interactions. Therefore, removing high-degree nodes (hubs) can eliminate many false positive interactions (Fig. 6). The second effect is to implicitly add time asynchronization to the network. Biological interactions occur at different times14, so many interactions may have different occurrence times; yet the weighted graph constructed from a PPI network is static and cannot distinguish between them31. In this situation, a vertex with low or normal degree that is shared between two or more complexes can turn into a high-degree vertex, known as an inter-module hub. In fact, a dense subgraph can arise from integrating a number of smaller and sparser subgraphs with a common vertex (Fig. 7)28. Our study shows that many methods wrongly recognize such a dense subgraph as one big complex. Although this is not a comprehensive rule for all hubs (e.g. module hubs) in the original graph, a significant proportion of hubs in the network behave as described; in other words, there is no specific threshold that separates module hubs from inter-module hubs. By removing hubs, we expected not only to reduce the effects of ignoring the occurrence times of biological interactions, but also to obtain a graph with less noise and less complexity. The result is a sparser graph in which dense subgraphs are more obvious and complexes can be detected more easily. To implement this step, we constructed a new graph by removing hubs: we sorted the vertices in a priority queue based on their degrees and selected the β percent at the head of the queue for elimination.
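As a minimal sketch of this selection, the snippet below removes the top β fraction of highest-degree nodes from an unweighted adjacency dictionary using a degree-based priority queue. The `remove_hubs` helper and the node names are hypothetical illustrations, not the authors' implementation:

```python
import heapq

def remove_hubs(adj, beta):
    """Remove the top beta fraction of highest-degree nodes (hubs)."""
    n_remove = int(beta * len(adj))
    # Max-degree priority queue (degrees negated for heapq's min-heap).
    pq = [(-len(neigh), v) for v, neigh in adj.items()]
    heapq.heapify(pq)
    hubs = [heapq.heappop(pq)[1] for _ in range(n_remove)]
    # Rebuild the graph without the removed hubs and their edges.
    kept = {v: {u for u in neigh if u not in hubs}
            for v, neigh in adj.items() if v not in hubs}
    return kept, hubs

# Toy graph: one inter-module hub connecting two otherwise separate pairs.
adj = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub", "d"},
    "d": {"hub", "c"},
}
pruned, removed = remove_hubs(adj, 0.2)   # 20% of 5 nodes -> 1 hub
print(removed)   # ['hub']
print(pruned)    # two well-separated pairs remain: {a,b} and {c,d}
```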

Figure 6

Removing hubs decreases noise and makes the graph sparser. The red nodes in (a) denote inter-module hubs and the green nodes denote module hubs. When the inter-module hubs are removed from the network, some noise is eliminated and a sparser, well-separated graph is obtained (b).

Figure 7

Effect of eliminating the occurrence times of different interactions. The probability of creating spurious dense subgraphs and hub vertices increases when time is not considered. The orange node illustrates this situation: a normal node that is a member of two dense complexes turns into a hub and a member of one larger subgraph in the PPI network.

Step 2: In this step we tried to identify primary clusters. A common assumption in many methods is that complexes correspond to dense subgraphs24, and we used this idea to detect the primary clusters as well. These clusters may overlap. In our definition, density combines the structure and the weights of edges: a dense subgraph is not only well separated from the rest of the network, but its inner edges also carry more weight than its outer edges. To detect such subgraphs, we used a quality function called the modularity function, and the main idea in this step is to maximize it in a local, greedy fashion. The value of the modularity function for a subgraph \({C}_{k}\) is calculated by equation (13).

$$Q({C}_{k})=\frac{de{g}_{w}^{in}({C}_{k})}{de{g}_{w}^{in}({C}_{k})+de{g}_{w}^{out}({C}_{k})+p|{C}_{k}|}$$
(13)

In this formula, "p" is a control parameter by which we could model uncertainty. Because of the limitations of experimental methods, not all interactions have been discovered, so "p" can be seen as a proxy for these undiscovered interactions, implicitly taking them into account in the calculation of the function. On the other hand, "p" is also a way to account for noise: it helped us control how sensitively the density responds to adding or removing a single node. If there is a significant amount of noise in the network, the density changes dramatically when one node is added to or removed from a sparse subgraph; hence, the smaller the subgraph, the more influential the role of "p" when a node is added or removed. To implement this approach, we proceeded as in the ClusterONE algorithm13. First, we selected the node of highest degree that did not yet belong to any cluster. This node, called the "seed", creates a new cluster. Next, we tried to maximize the value of the modularity function of the new cluster by an iterative, greedy procedure in which the best decision is made at every step: either adding an external boundary node to the current cluster or deleting an internal boundary node from it. Every external node that is a neighbor of at least one node of a cluster is called an "external boundary node" of that cluster, and every internal node of a cluster that is a neighbor of at least one of the cluster's external boundary nodes is called an "internal boundary node" of that cluster. After the modularity function of a growing cluster reached a maximal value, the cluster was introduced as a new primary cluster, and the process was repeated for the remaining nodes. This greedy process is explained in the five following steps. Let \({u}_{0}\) represent the initial seed:

  1. Let \({C}_{{k}_{0}}={u}_{0}\) and the step number \(t=0\).

  2. Calculate the value of the modularity function for \({C}_{{k}_{t}}\) and set \({C}_{{k}_{t+1}}={C}_{{k}_{t}}\).

  3. For every external boundary node u of \({C}_{{k}_{t}}\), calculate the modularity function for \({C}_{k}^{\prime} ={C}_{{k}_{t}}\cup \{u\}\). If \(Q({C}_{k}^{\prime}) > Q({C}_{{k}_{t+1}})\), let \({C}_{{k}_{t+1}}={C}_{k}^{\prime}\).

  4. For every internal boundary node u of \({C}_{{k}_{t}}\), calculate the modularity function for \({C}_{k}^{\prime\prime}={C}_{{k}_{t}}-\{u\}\). If \(Q({C}_{k}^{\prime\prime}) > Q({C}_{{k}_{t+1}})\), let \({C}_{{k}_{t+1}}={C}_{k}^{\prime\prime}\).

  5. If \({C}_{{k}_{t+1}}\ne {C}_{{k}_{t}}\), let \(t=t+1\) and return to step 2. Otherwise, the maximal value of the modularity function for \({C}_{{k}_{t}}\) is reached and \({C}_{{k}_{t}}\) is recognized as a new primary cluster.

It should be noted that, like any other node, the initial seed could be eliminated from the cluster during the growth process. In addition, every node had only one chance to be the seed of a new cluster, so an eliminated seed could no longer be considered as a seed, although it could still be added to another cluster during that cluster's growth.
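Under the definitions above, the growth procedure can be sketched in Python. Here `Q` implements equation (13) and `grow_cluster` follows steps 1 to 5; the adjacency dictionary and the value of p are purely illustrative:

```python
def Q(C, adj, p=0.0):
    """Modularity of cluster C, equation (13)."""
    w_in = 0.5 * sum(w for i in C for j, w in adj[i].items() if j in C)
    w_out = sum(w for i in C for j, w in adj[i].items() if j not in C)
    denom = w_in + w_out + p * len(C)
    return w_in / denom if denom > 0 else 0.0

def grow_cluster(seed, adj, p=0.0):
    """Greedy growth from a seed: add external boundary nodes or remove
    internal boundary nodes while Q increases (steps 1-5)."""
    C = {seed}
    while True:
        best, best_q = None, Q(C, adj, p)
        boundary = {j for i in C for j in adj[i]} - C    # external boundary
        for u in boundary:                               # try additions
            q = Q(C | {u}, adj, p)
            if q > best_q:
                best, best_q = C | {u}, q
        for u in C:                                      # try removals
            if len(C) > 1 and any(j in boundary for j in adj[u]):
                q = Q(C - {u}, adj, p)
                if q > best_q:
                    best, best_q = C - {u}, q
        if best is None:       # no improving move: maximal Q reached
            return C
        C = best

# Toy weighted graph: a dense triangle {a, b, c} with a weak pendant d.
adj = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"a": 0.9, "c": 0.7, "d": 0.1},
    "c": {"a": 0.8, "b": 0.7},
    "d": {"b": 0.1},
}
print(sorted(grow_cluster("a", adj, p=0.5)))  # ['a', 'b', 'c']
```

With p = 0.5 the penalty term keeps the weakly attached node d out of the cluster, so the growth recovers the dense triangle.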

Step 3: After introducing primary clusters, in this step we repaired clusters and merged some of them. In the repairing phase, γ percent of the eliminated hubs were considered, checking whether adding them to the primary clusters increased the modularity function. The aim was to add module hubs back to the primary clusters while filtering out inter-module hubs. These candidate hubs were those of the initially eliminated hubs with the lowest degrees: if the eliminated hubs are placed in an ascending priority queue by degree, in the form \(({x}_{1},{x}_{2},{x}_{3},\ldots ,{x}_{n})\), the γ percent comprises the nodes \(({x}_{1},\ldots ,{x}_{\lfloor \gamma n\rfloor })\). In the repairing phase, an iterative process was run on all primary clusters: \({x}_{i}\) was added to a primary cluster; if the modularity function of the new cluster increased, the change was preserved and the process was repeated on the updated cluster with \({x}_{i+1}\); otherwise, the change was discarded and the process was repeated on the unchanged cluster with \({x}_{i+1}\). Clearly, module hubs have a better chance than inter-module hubs of being added to the primary clusters: an inter-module hub usually has more outer edges than inner edges, while the opposite usually holds for module hubs, so adding a module hub usually increases modularity whereas adding an inter-module hub usually does not. Note that the index i ran from 1 to \(\lfloor \gamma n\rfloor \) for every primary cluster, and the threshold γ was chosen between 0 and 10 percent. Our study shows that the β − γ percent of eliminated hubs with the highest degrees are inter-module hubs, so deleting this group of hubs not only reduces the complexity of the network but also removes a significant amount of noise.
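The repairing loop can be sketched as follows. The `repair` helper, the toy unweighted graph (in which "h" plays the role of a module hub of {a, b, c} and "x" a peripheral node) and the simplified unweighted `Q` (equation (13) with p = 0) are all hypothetical illustrations:

```python
# Toy unweighted graph; "h" behaves as a module hub of the triangle {a, b, c}.
adj = {
    "a": {"b", "c", "h"},
    "b": {"a", "c", "h"},
    "c": {"a", "b"},
    "h": {"a", "b", "x"},
    "x": {"h"},
}

def Q(C):
    """Simplified unweighted stand-in for equation (13) with p = 0."""
    inner = 0.5 * sum(1 for i in C for j in adj[i] if j in C)
    outer = sum(1 for i in C for j in adj[i] if j not in C)
    return inner / (inner + outer) if inner + outer else 0.0

def repair(clusters, hubs_by_degree_asc, gamma, Q):
    """Try to re-insert the gamma fraction of lowest-degree eliminated hubs
    into every primary cluster, keeping an addition only when Q improves."""
    candidates = hubs_by_degree_asc[:int(gamma * len(hubs_by_degree_asc))]
    repaired = []
    for C in clusters:
        C = set(C)
        for x in candidates:      # x_1 ... x_{floor(gamma * n)}, ascending degree
            if Q(C | {x}) > Q(C):
                C = C | {x}
        repaired.append(C)
    return repaired

result = repair([{"a", "b", "c"}], ["h", "x"], 0.5, Q)
print(sorted(result[0]))  # ['a', 'b', 'c', 'h'] -- the module hub is restored
```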

After the repairing phase, clusters that overlapped significantly were merged. To implement this process, we created a new graph called the "overlap graph", in which every cluster is represented by a node and the amount of overlap between two clusters is represented by a weighted edge; an edge is created only if the overlap value, calculated according to equation (12), exceeds the overlap threshold (max-overlap). Based on the overlap graph, every connected pair of nodes was placed in a priority queue sorted by overlap value. Then, in a finite number of steps, one pair was popped from the head of the queue; if its overlap value exceeded the overlap threshold (max-overlap), the pair was merged, the queue was updated with the new cluster and the two old clusters were deleted. The process terminated when no pairs remained to be merged. This process demonstrates a fundamental difference between IMHRC and ClusterONE: ClusterONE partitions the primary clusters into several groups, placing a cluster in a group if its overlap value with at least one member of that group exceeds the overlap threshold (0.8 by default), and then merges the members of each group without any updating phase.
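A sketch of this merging loop with an updating step after every merge; the `merge_clusters` helper and the threshold value are illustrative, and for simplicity the candidate-pair queue is rebuilt from scratch each round rather than updated incrementally:

```python
def overlap(X, Y):
    """Neighborhood affinity score, equation (12)."""
    return len(X & Y) ** 2 / (len(X) * len(Y))

def merge_clusters(clusters, max_overlap):
    """Repeatedly merge the most-overlapping pair above the threshold,
    then re-derive candidate pairs from the updated cluster list."""
    clusters = [set(c) for c in clusters]
    while True:
        pairs = [(-overlap(a, b), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters)
                 if i < j and overlap(a, b) >= max_overlap]
        if not pairs:           # nothing left to merge
            return clusters
        _, i, j = min(pairs)    # pair with the largest overlap
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

result = merge_clusters([{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y", "z"}], 0.4)
print([sorted(c) for c in result])  # [['x', 'y', 'z'], ['a', 'b', 'c', 'd']]
```

Here the first two clusters overlap with score 4/9 ≈ 0.44, which exceeds the (illustrative) 0.4 threshold, so they are merged; the third cluster shares no members and stays separate.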

Step 4: In this step, all remaining clusters containing fewer than three members were discarded, which is common practice in the literature. Finally, clusters with density below 0.3 were discarded, where density was calculated according to equation (11).
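The final filtering can be sketched with the size and density thresholds from the text; the `finalize` helper and the toy weighted graph are illustrative:

```python
# Toy weighted graph: a dense triangle and a weakly connected pair.
adj = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"a": 0.9, "c": 0.7},
    "c": {"a": 0.8, "b": 0.7},
    "d": {"e": 0.1},
    "e": {"d": 0.1},
}

def den_w(C):
    """Weighted density of cluster C, equation (11)."""
    inner = 0.5 * sum(w for i in C for j, w in adj[i].items() if j in C)
    n = len(C)
    return 2 * inner / (n * (n - 1)) if n > 1 else 0.0

def finalize(clusters, min_size=3, min_density=0.3):
    """Step 4: drop clusters with fewer than three members or density < 0.3."""
    return [C for C in clusters
            if len(C) >= min_size and den_w(C) >= min_density]

kept = finalize([{"a", "b", "c"}, {"d", "e"}])
print([sorted(c) for c in kept])  # [['a', 'b', 'c']] -- the pair is too small
```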