Identifying protein complexes in PPI network using non-cooperative sequential game

Identifying protein complexes from protein-protein interaction (PPI) network is an important and challenging task in computational biology as it helps in better understanding of cellular mechanisms in various organisms. In this paper we propose a noncooperative sequential game based model for protein complex detection from PPI network. The key hypothesis is that protein complex formation is driven by mechanism that eventually optimizes the number of interactions within the complex leading to dense subgraph. The hypothesis is drawn from the observed network property named small world. The proposed multi-player game model translates the hypothesis into the game strategies. The Nash equilibrium of the game corresponds to a network partition where each protein either belong to a complex or form a singleton cluster. We further propose an algorithm to find the Nash equilibrium of the sequential game. The exhaustive experiment on synthetic benchmark and real life yeast networks evaluates the structural as well as biological significance of the network partitions.

neighborhood expansion mechanism based approach, ClusterONE, for detecting overlapping protein complexes is proposed in ref. 2. ClusterONE uses a greedy search procedure to find highly cohesive groups of vertices. The search process is performed repeatedly considering different seeds to form multiple, possibly overlapping groups. To obtain the resulting complexes, the overlapping groups are merged based on their overlap scores.
Other popular and recent developed methods for protein complex identification includes Affinity Propagation 13 , multi-objective based framework 14,15 , PPSampler2 16 and PEWCC 17 . Affinity Propagation 13 groups data points based on their similarity. All data points are initially considered as potential "exemplars". The algorithm first finds sub-paths that allow easy message exchanges between nodes. In subsequent steps, message exchange between the nodes are continued until a high quality cluster is formed. The multi-objective based framework 14 optimizes two objective functions, one is based on density of a module and another on the Gene Ontology based semantic similarity among proteins, resulting in dense and functionally homogeneous protein complexes. PPSampler2 16 is a modification of its previous version PPSampler 18 which partitions proteins into clusters based on a scoring function generated from the topological properties of the PPI network. A novel algorithm called PEWCC is proposed in Zaki et al., where reliability of the protein interaction data is assessed before partitioning the interaction network. PEWCC performs weighted clustering to partition the refined interaction networks into maximal cliques which serve as protein complexes.
A common limitation of the above mentioned approaches is that they ignore the self-organizing nature of the protein complexes, primarily triggered by the biochemical factors including hydrophilic energy, electrostatic forces between the proteins. Game theory, a natural choice of framework to model the self-organizing nature, could focus on the underlying factors that drive the protein complex formation. Bohl et al. 19 , reviews game theoretical concepts in cell biology and molecular biology focusing on the subcellular level by considering viruses, genes, and molecules as players. The existing works mostly uses two player simultaneous game for protein complex identification.
In this paper, we model the problem of protein complex detection as a multi-player sequential move non-cooperative game that models the proteins as rational players. The underlying hypothesis for the model are: (i) proteins always act in selfish manner: even when the proteins forms complexes it only optimizes its own objective(s) (ii) protein complexes are dense substructures of low diameters: most of the real life networks tend to demonstrate small world property, and (iii) there might exist some proteins which do not belong to any complex: this could be due to the physio-chemical properties. We formulate a Partial Dense vertex Cover (PDC) game for protein complex identification. The strategy set of a player is determined by a preference relation which is a transitive, reflexive function over the set of all possible vertex covers. The Nash equilibrium of the game is the minimum partial dense vertex cover. We propose an algorithm, NashPDC, to find the Nash equilibrium of the PDC game. The resulting partition generated by NashPDC represents the protein complexes.
A thorough experiment is carried out by comparing the performance of the proposed method with that of the other well-known methods such as clusterONE 2 , MCODE 11 , RNSC 12 , MCL 20 , PPSampler2 16 and PEWCC 17 . The biological relevance of the identified complexes are assessed by gene ontology and pathway based analysis. The experiment uses synthetic benchmark network and real life yeast PPI network for evaluation. The experimental results show a significant improvement achieved by the proposed approach over the other methods.

Materials and Methods
We represent a PPI network using an undirected graph G = (V, E) where a node v ∈ V represents a protein and an edge e ∈ E represents the interaction between two proteins. In this section, we discuss the proposed method which is based on the following basic concepts of graph theory and non-cooperative game theory.
Given an unweighted graph G = (V, E), G(S) denote the subgraph induced by S on G where S ⊆ V. The set of edges of G(S) is denoted by E(G(S)). For any vertex v ∈ V, the p th order open neighborhood, N p (v), is the set of vertices connected to v by a path of length less than or equal to p. The closed p th order neighborhood of vertex v is The degree density, α(G), is defined as the ratio of the minimum degree of G to the maximum possible degree, i.e. α = The edge density, δ(G), is defined as the ratio of the number of edges in G to the total number of possible edges, i.e. δ =  also termed as the minimum eccentricity of G is defined as the maximum shortest path between any two pairs of nodes, i.e. R(G) = max u,v∈V |SP(u, v)| where SP(u, v) denotes the shortest path between u, v. The local transitivity of a vertex v, denoted by t(v) is defined as the ratio of the triangles connected to the vertex and the triples centered on the vertex, i.e. = . Further, the term cover indicate a set of vertices and a graph partition is defined as a set {C 1 , C 2 , …, C K } (K is a positive integer and K ≤ |V|) such that -(i) ∀ k ∈ {1, In non-cooperative game theory, strategic form game models the interaction between a finite set of N rational players. In strategic form game a player's decision problem is to choose a strategy that will counter best the strategies adopted by the other players. Each player is faced with this problem and the players can be thought of as simultaneously choosing their strategies from the respective strategy sets. A strategic form game is modeled by a three tuple  ∈ ∈ N S ( , ( ) , ( ) ) • N is a finite set of rational players • S i denote the strategies or actions of player i while S = x i∈N S i is called the set of action profiles (or strategy profiles) and , implies ), total (for all elements a, b either  a b or  b a) binary relation on the set of action profiles. We write  a b if Scientific RepoRts | 7: 8410 | DOI:10.1038/s41598-017-08760-x  a b but not  b a. Intuitively,  a b means that strategy b is preferable to a. The preference relation may also be defined based on the outcomes of the strategies.
It is assumed that in the strategic form game the set of strategies and the players' preference relations are known to all the players. The only uncertainty concerns the actions chosen by the players. For i ∈ N let s −i ∈ S −i where S −i denote the action profile of all the players in N\i. The best responses of player i given the actions of other players, s −i , is defined as In other words, in Nash equilibrium no player i has a profitable deviation from s * .
The framework. The proposed framework is based on the concept of partial dense vertex cover, defined as below.

Definition 1.
Given an unweighted graph G = (V, E), a partial dense vertex cover C(G) is a collection of subsets of V, {C 1 , … C k } with the properties: does not contain any induced subgraph satisfying the other conditions.
Here 0 < λ ≤ γ ≤ 1 and p is a positive integer. A minimum partial dense vertex cover is a partial dense vertex cover with the minimum value of k. In this paper, the terms cover and coalition have been used interchangeably.
The proposed method models the protein-complex detection method as a partial dense vertex cover (PDC) • Each player i from the set of rational players N represents a node i ∈ V in G.
• The strategy set S i of a player i ∈ N is to -(i) propose a coalition, by sending out joining requests (ii) accept a joining request (iii) reject a joining request and (iv) leave a coalition. Once a player accepts and joins a coalition he is not allowed to propose a new coalition. A player if rejects a coalition joining request or leaves a coalition can subsequently propose a new coalition. A proposer, if do not receive satisfactory response may choose to leave the coalition. •  i is the preference relation of a player i over the set of possible partitions. The preference relation determines the quality of the partial dense vertex cover. A player prefers to belong to a minimum partial dense vertex cover or to stay alone than to belong to any random partition.
Given the preference relation  i , the Nash equilibrium of the PDC is a partition where no player can gain from unilateral deviation. For a PDC game, a partition p * is in Nash equilibrium, if for all other partitions p and for every player i,  ⁎ p p i when all the players in N\i plays their best response strategy. In other words, a minimum partial dense vertex cover is a Nash equilibrium of PDC game and vice versa. There may exist multiple Nash equilibriums. We eliminate the two trivial partitions i) where a single cover or a grand coalition forms and ii) where all the players form singleton coalitions.
The following example illustrates the above framework.
Example 1. Figure 1 shows a graph G with 12 nodes. We induce the PDC game on the graph G so that every node of G is mapped to a player in PDC. For λ = 0.6, γ = 0.65 and p = 2 the preference relation of the players are given as below. The partition {{1, 2, 3, 4, 5, 6}, {7, 8, 9, 10, 11, 12}} is in Nash equilibrium, as no player can gain from unilateral deviation.
We propose an algorithm, Nash equilibrium based Partial Dense vertex Cover Detection (NashPDC) to find the Nash equilibrium of the game PDC. The basic working principal of the Algorithm 1 is described below.
• The players are ranked based on their geometric mean of degree and local transitivity (in descending order).
This forms the rule of order. The first player in the rule of order is the one with the maximum links and maximum local transitivity. This is one of the ways to prioritize proteins. However, other approaches may be used. • The first player i of rule of order proposes a coalition C which is the p th order bounded neighborhood of i.
• Each player j in C accepts the coalition joining request only if that is his best response strategy, otherwise rejects the invitation. Once all the players either accept or reject the coalition joining request, a stable coalition C forms. Subsequently, all the players in C quit the game. • The game play is continued by repeating the above steps with the players who do not belong to any coalition until no more stable coalition can be formed.
Complexity. At the worst case, every node sends a coalition request. Selecting a player from the rule of order takes O(logN) time. For each proposed coalition at most d max p number of nodes are processed where d max is the maximum degree in G. Thus the worst case complexity of the algorithm 1 is

Results
In this section, we evaluate the performance of our proposed method with that of some state-of-the art techniques. The evaluation is conducted on both synthetic benchmark and real life yeast PPI network data sets.

Dataset description.
To test the correctness of the proposed method we first use the synthetic benchmark networks generated using the popular Girvan Newman (GN) model 21  vertices each. The inter-module edges of GN model is controlled by the mixing parameter μ. In other words, the mixing parameter μ controls how well the modules are defined by controlling the number of edges with which a module is connected to the rest of the graph. When the value of μ is strictly less than 0.5, the expected number of links joining a node to others in different groups is less than those in the same group. This yields well defined groups (modules). As the value of μ drops to 0.5 or below, it becomes difficult to identify a module boundary from the rest of the network. Irrespective of the size and sparsity of the networks, GN benchmark dataset is a popular choice to test the correctness of different module detection algorithms and compare their performances. By varying the value of μ we generate networks with well defined modules as well as the networks with modules that are not so clearly distinguishable from the rest of the network. The later is a case similar to real life scenarios where all modules are not necessarily well defined. The code to generate the GN networks are freely available at the site https://sites.google.com/site/santofortunato/inthepress2.
Further we use real life yeast PPI networks -two experimental yeast PPI datasets (Gavin et al. 22 , Krogan et al. 23 ), yeast PPI interaction derived from DIP 24 and a manually curated high-quality yeast dataset from MIPS 25 . The Krogan dataset has two variants, namely, the core data set which is referred to as Krogan core and the extended data set which is referred to as Krogan extended. The key topological properties of the PPI networks built from these dataset are given in Table 1. For comparing the resulting protein complexes with benchmark we downloaded gold standard data from the site http://yeast-complexes.russelllab.org/. It consists of 491 experimentally verified yeast protein complexes. We consider this as benchmark and compute the extent of overlap with the resulting clusters.
Validation on synthetic benchmark. We first test the performance of the proposed method on synthetic benchmark networks generated using GN model. The inter-module edges of GN model is controlled by the mixing parameter μ. We consider μ = {0.1, 0.2, …, 0.7}. To simulate the real life modules which are not very clearly distinguishable from the rest of the network, we vary the value of μ. To avoid any bias in the results obtained by conducting the experiments on a single GN synthetic network, we generate 100 instances of every configuration of GN model. The results obtained for each network configuration are then averaged and reported.
To quantitatively measure the correspondence between the ground-truth modules and the identified modules we use the metric Normalized Mutual Information (NMI). NMI, defined as below, measures the similarity between two partitions based on entropy 26,27 . For two given partitions π a ,π b , NMI is:   where n i a is the number of nodes in cluster π ∈ C i a a and n ij ab is the number of common nodes between cluster π ∈ C i a a and π ∈ C i b b . The value of NMI ranges between 0 to 1. Higher the value, the more similar the two partitions are with the maximum value 1 indicating two identical partitions. Figure 2 shows the NMI values for different values of μ. As shown in the figure, the proposed method correctly identifies all the modules for the values of μ upto 0.4. For higher values of μ as the inter module edge count becomes equal or more than the intra module edge count, identifying a module, thus, becomes difficult. As a result the NMI falls sharply when μ ≥ 0.5. The experiment shows the accuracy of the proposed method in identifying well defined modules. Validation on null model. To verify the performance of the algorithm on the null model, we test our performance on random Erdös-Réyni (E-R) graphs 28 . The nodes being randomly connectedly, the E-R network do not exhibit any modular structure. A good dense cluster detection algorithm should not identify any significant structure from E-R network. We run the proposed method on the E-R graphs. The proposed method do not identify any significant module. The experiment shows that the proposed method works correctly on null model.

Validation on noisy synthetic data.
Presence of noise is a very common phenomenon when dealing with complex real life data sets. Noisy data occur due to many reasons including but not limited to erroneous measurements, sampling bias. The study of stability of a method against noise is thus essential. We study the stability of the proposed method when the endpoints of the edges of a graph is rewired to a random vertex with given rewiring probability (ρ). The initial graphs are generated using GN model. We measure the performance by computing the NMI between the known partition of the initial graph and the partition obtained from the perturbed graph for various values of ρ. Figure 3 shows the NMI values for different values of ρ. As shown in the figure, the proposed method correctly identifies all the modules for the values of ρ upto 0.05. When ρ is above 0.25 the value of NMI drops sharply. The experiment shows that the proposed method correctly identifies the modules when a certain level of random noise present in the data and is thus robust against random noise.
Validation using connectivity density. To validate the identified modules from topological perspective, we use the connectivity density measure 29 . The connectivity density of a module M is defined as follows: where d(n i ) represents degree of node n i within the module M, N represents total number of connections. It simply denotes the ratio of total degrees of nodes within the module to the total number of connections. The experiment studies the relative change in the connectivity density when an identified protein complex is shifted a little. A shifted (replaced) module is obtained by randomly replacing a small portion of the proteins in a module with the same number of proteins outside of that module in such a way that the replacement proteins are connected with the proteins in the module but do not belong to it. For an identified protein module, a little shift is expected to decrease the connectivity density, which should not be expected from a random module. In addition, shift in random modules may results either increase or decrease in connectivity density. The experiment is carried out on the real life yeast PPI networks. Figure 4(a,b) shows the results averaged over 300 randomization experiments. Figure 4(a) is a scatter plot of the densities of identified modules and replaced modules. As shown in Fig. 4(a) for most of the modules, 20% component replacement causes the density to decrease by a significant amount dropping below the original density. With the higher rate of replacements (30% and 40%), the density decrease is even higher. Figure 4(b) shows the summary statistics of the changes in the replaced module density. As shown in the figure, on an average the replaced module density is nearly 50% of the original module density for 20% component replacement and even lower for higher replacements. The observation suggests that the identified modules are indeed densely connected local subgraphs, and thus are good candidates for functional modules in the yeast protein network. To know whether the obtained results is trivial and can be easily expected for any random modules we have performed an analysis. For this, we have repeated the same experiment for randomly selected modules from the PPI network. Figure 4(c,d) shows the scatter plots of the densities of random as well as replaced modules. It is observed that there is no distinct pattern which distinguishes the densities of modules with 20, 30 and 40% replacement.
Evaluating structural quality using ground truth. In this experiment the performance of the proposed method is compared with the base line methods by using metrics that measure the correspondence between the predicted complexes and the reference complexes. The metrics used for evaluation are: (i) Jaccard index based performance metric, matching score(D), (ii) sensitivity(Sn), positive predictive value(PPV), and accuracy(Acc) 2, 30 .
Let P = {P 1 , P 2 , … P n } and R = {R 1 , R 2 , … R m } are the sets of predicted complexes and reference complexes, respectively. The matching score is the geometric mean of S and T, defined as: where S is the average predicted complex-wise overlap defined as = ∑ = S Ov n i n i  PPV max Since Sn is maximum when every protein is assigned to the same cluster, while the PPV is maximum when every protein is in its own cluster, it is necessary to balance the two measures. The Geometric Accuracy (Acc) represents a trade-off between sensitivity and the positive predictive value and is defined as: The advantage of taking the geometric mean is that it yields a low score when either the Sn or the PPV metric is low. High accuracy value thus indicates a high performance in terms of both Sn and PPV.
The results of evaluation are shown in Tables 2 and 3. In Table 2 the performance of the proposed method is compared with other five baseline methods, viz., MCODE, MCL, ClusterONE, RNSC, PPSampler2 and PEWCC in terms of the matching score D. As shown in the table, unlike other methods, the proposed method performs consistently well on all the data sets. The proposed method outperforms MCODE and MCL in most of the data sets. ClusterONE method attains higher D value for DIP, GAVIN and Krogan_Cr data sets while the same attains lower D value for other data sets. On the contrary, RNSC method attains higher D value for MIPS and Krogan_Ex data sets while the same attains lower D value for other data sets. Table 3 shows the performance of the methods in terms of the metrics Sn, PPV, Acc. It can be noticed from the table that the proposed method performs consistently well in each of the datasets. In terms of Sn the proposed method outperforms MCODE and clusterONE in all the data sets. However, RNSC and MCL exhibit higher accuracy (Acc) than the proposed method on MIPS and KROGAN-extended data sets. It is also important to note that only the proposed method consistently gives good Sn value on all the data sets. As shown in the result no other method attain consistently high Sn on all the data sets.
We further analyze the the complexes identified by the proposed method on Gavin dataset, and compare the same with that identified by clusterONE method as clusterONE attains second highest accuracy score after the proposed method. Among the experimentally verified benchmark complexes in Gavin data set, 10.62% complexes (52 complexes out of 490 complexes) are captured with more than 80% coverage by the proposed method. On the contrary clusterONE covers 8.16% benchmark complexes (40 out of 490) with more than 80% coverage. Figure 5 shows the layout of the three benchmark complexes: 'Small subunit processom' , 'complex-435' and 'complex-410' . Green and red nodes represent captured and non-captured nodes, respectively, by a particular method. From Fig. 5(a) it can be noticed that, for 'Small subunit processom' complex, the proposed method covers 89.66% proteins, while for clusterONE the percentage of coverage is 68.97%. Similarly, for the other two complexes, the coverage attained by the proposed method is much higher than that of clusterONE.
To further know the performance of the proposed method in large PPI data, we have utilized WI-PHI PPI database of yeast 31 , which consists 6,000 proteins and 50,000 PPIs. We run the proposed method on this data and compare the predicted clusters with the experimentally verified complexes. The results consists of 17 predicted clusters with minimum and maximum size eight and 39, respectively. The resulting sensitivity (S n = 0.2384), positive predictive value (PPV = 0.4619) and accuracy (Acc = 0.3318) reveals that proposed method performed well in large PPI data.  Table 3. Comparisons of results with respect to sensitivity, specificity and accuracy. To investigate whether the proposed method can detect sparse, low dense protein complexes we set the input parameter α (degree density) and δ(edge density) to lower values (α = 0.001, delta = 0.0008) and run the algorithm. We collected the resulting low dense clusters and compare these with benchmark protein complexes having lower density. Figure 6 shows two identified clusters which match with five benchmark low dense complexes such as: m-AAA protease complex, complex-329, Scs2/Opi1 complex, complex-319 and Cytochrome bc1 complex. It can also be noticed from the Fig. 6 that the unmatched proteins of the identified clusters exhibits high interactions among them and need to be explored further.
Evaluating biological significance. Functional similarity of the identified complexes. It has been observed that proteins within a complex are functionally similar 14,32,33 . For understanding the biological roles and functions of proteins, functional similarity is a more informative measure compared to the structural and sequence similarity. The semantic similarity between Gene ontology (GO) terms is used to measure the functional similarity between proteins 34 . Here we use Relevance measure proposed in ref. 35 to compute functional similarity between the identified modules. Figure 7 shows the distribution of functional similarity scores for the identified modules. In Fig. 7(a) the left pane shows the fraction of identified modules having similarity score above certain value x, while the right pane shows the distribution of similarity score. As shown in the figure for Gavin data set more than 75% modules identified by the proposed method have semantic similarity score higher than 0.5 while for the MCODE, MCL, clusterONE and RNSC it is 53.66%, 39.02%, 61.34% and 48.78%. Similarly, as shown in Fig. 7(b) a large fraction of modules identified by the proposed method on the Krogan_Extended data set has similarity score above 0.5.
Gene ontology based analysis of the identified complexes. We have performed a GO and pathway based study to biologically validate the identified protein complexes. GO represents an important resource to describe the functional characteristics of genes in a module. In Table 4 we provide the most significant GO terms, GO-id of biological process (BP) annotation of top 15 identified protein complexes in Gavin dataset. In Table 4 last column provides the p-value of each annotated GO-term. The p-value is computed by comparing the GO terms shared by the genes in the module to the background distribution of annotation. We have utilized a widely used webserver David functional annotation tool (https://david.ncifcrf.gov/) to perform the gene enrichment analysis. Here, the obtained p-values are calculated by using Fisher's exact test. The obtained p-values are also subsequently adjusted for multiple comparisons using a Bonferroni correction based on the number of genes in the modules. The p-value of a gene module signifies the probability of observing at least x number of genes out of total n genes in the module annotated to a particular GO terms, given that the proportion of genes in the whole genome are annotated with that GO terms. So a p-value of a module closer to zero signifies that it is less likely to observe the annotation of a particular GO term to a group of genes occurs by chance. For comparison purpose we have computed the p-values of all the protein complexes identified by MCODE, MCL, clusterONE, RNSC and PPSampler2 algorithm. The distribution of p-values is given in Fig. 8. The left pane of the Fig. 8 shows the fraction of modules having p-value below certain value x while the right pane shows the distribution of p-values. As shown in the figure for any given p-value, the fraction of predicted complexes identified by the proposed technique is larger than the other methods.
Evaluation using aggregated ranking. The above experiments measure the performances based on various metrics and the results show that the different protein complex identification methods rank differently for different metrics. Thus it is difficult to comment on the best performing algorithm. In this section, we evaluate the performances of the methods using an aggregated ranking procedure. We use three topological measures namely density, betweenness centrality and clustering coefficient and a biological relevance measure namely GO based semantic similarity for ranking each complex. Subsequently, a rank aggregation scheme 36 is used to aggregate the individual ranks. A comparison between the predicted complexes identified by the proposed method and that   equilibrium of the game corresponds to a minimum partial dense vertex cover of the given network.We carry out a thorough experiment for validation of topological and biological relevance of the identified complexes. The data sets used in the experiment are synthetic benchmarks and real life yeast PPI data. The experiments on synthetic benchmark network and null-models proves the accuracy of the proposed method. We further test robustness of the proposed model on perturbed synthetic benchmarks where edges are rewired with certain probability. The results prove that the developed model accurately identifies the network module even when the rewiring probability is moderately high. The tests on real PPI networks also shows promising results. The validation through  Table 5. Table shows the comparison of top rank five predicted complexes of proposed and clusterONE method. Second column represents benchmark complexes detected by the methods. Third column represents proportion of proteins that are involved in some benchmark complexes. connectivity density proves that the identified modules on the real data are dense indeed and good candidates for protein complexes. The partitions are further validated using matching score, sensitivity, positive predictive value and accuracy metrics against experimentally annotated data. Although the proposed method does not outperformed all other state-of-the-art with respect to the performance metrics, but it achieves a consistently good score for all of the datasets. It is noticeable that, although RNSC and MCL achieve higher accuracy than the proposed method in MIPS and KROGAN data, they are not performing well while comparing with respect to the functional similarity score. The experiment shows that the proposed method attains high functional similarity score for most of the identified modules. Additionally, from the experiment we find a common drawback with the existing approaches is that they find a few well-matched clusters and a high number of poorly matched clusters. The proposed approach overcome this issue and finds the clusters whose matching score is far less skewed compared to the existing approaches. Finally we measure the performance using an aggregated ranking approach which proves the superiority of the proposed method over the base-line methods. It is noticeable that the proposed approach is also capable to identify sparse, low dense protein complexes. Setting the input parameters α and β to a low value, the proposed method results low dense clusters which may be treated as possible candidates for sparse, low dense protein complexes. Applying game theory for sub-cellular events like protein-protein interactions is a rather new approach of research. For simplicity, the proposed method is designed to detect the disjoint complexes. However, with the use of appropriate data structure and by removing the restriction that if a protein joins a coalition, then the protein is removed from the game, the proposed method can be extended to detect the overlapping complexes. In future we would like to study the weighted PPI networks for the analysis of protein complexes and functional modules.