Graph theoretic network analysis reveals protein pathways underlying cell death following neurotropic viral infection

Complex protein networks underlie any cellular function. Certain proteins play a pivotal role in many network configurations, disruption of whose expression proves fatal to the cell. An efficient method to tease out such key proteins in a network is still unavailable. Here, we used graph-theoretic measures on protein-protein interaction data (interactome) to extract biophysically relevant information about individual protein regulation and network properties such as formation of function specific modules (sub-networks) of proteins. We took 5 major proteins that are involved in neuronal apoptosis post Chandipura Virus (CHPV) infection as seed proteins in a database to create a meta-network of immediately interacting proteins (1st order network). Graph theoretic measures were employed to rank the proteins in terms of their connectivity and the degree upto which they can be organized into smaller modules (hubs). We repeated the analysis on 2nd order interactome that includes proteins connected directly with proteins of 1st order. FADD and Casp-3 were connected maximally to other proteins in both analyses, thus indicating their importance in neuronal apoptosis. Thus, our analysis provides a blueprint for the detection and validation of protein networks disrupted by viral infections.


Results
From our results in an earlier study 17 we concluded that CHPV induces neuronal apoptosis through Fas-mediated extrinsic apoptotic pathway with the involvement of the following five proteins: Fas, Fas-associated Death Domain (FADD), Caspase-8 (Casp-8), cleaved Caspase-3 (Casp-3), and X-linked inhibitor of apoptosis (XIAP) (Fig. 1, see also Methods section for more details). These 5 proteins were inserted as inputs to STRING 9.1 online database (http://string-db.org/) for extraction of the 1 st and 2 nd order interactomes. The 1 st order interactome contained 26 proteins while the 2 nd order contained 71 proteins (Fig. 1b,c). The names of each protein from 1 st and 2 nd order interactomes are presented in Table 1.
The MATLAB-based Visual Connectome Toolbox 21 was used for graph-theoretic analysis of 1 st and 2 nd order interactome data. We computed the degrees of freedom (degree centrality) for each protein in the 1 st & 2 nd order networks. Subsequently, we arranged them in a descending order ( Table 2). From Table 2, we observed that in both 1 st and 2 nd order networks FADD and Casp-3 are the common members among the top 5 proteins having highest degree centrality values. Mutual cross-validation of results from 1 st and 2 nd order network analysis confirms that FADD and Casp-3 are dominant players in apoptotic pathway underlying CHPV infection in neurons. Modularity determines how well a network can be divided into subgroups (hubs). Generally the modularity score ranges between [− 0.5, 1) with more modular networks having a positive score. A more randomly assigned network will have a modularity score of approximately zero. We computed modularity scores of both 1 st and 2 nd order networks sets. The modularity score of the 1 st order network was 0.36 while the 2 nd order was 0.41. Theoretically, due to the random partitioning of nodes into modules to initiate the graph theoretic algorithm, the results may vary trial to trial unless the modular structure is significantly unambiguous. In our data set the modularity score remained unchanged in all 50 repetitions of the analysis. Additionally, we evaluated the significance of the estimated modularity score by comparing with the modularity scores of a random network with an identical number of nodes. We start with an adjacency matrix with all values set to zero for a given number of nodes. Then we randomly assigned a value 1 in upper diagonal matrix locations. Finally, symmetric locations in lower diagonal positions are assigned values 1 to design the adjacency matrix for which network metrics are computed. Diagonal elements were always assigned a value 0 to avoid self-connections. The mean modularity score of a random network (50 repetitions) with 26 nodes was 0.13, whereas for a random network with 71 nodes, the score was 0.09. In both cases the estimated modularity values of the empirical networks were statistically significant at Bonferroni corrected p < 0.05 Scientific RepoRts | 5:14438 | DOi: 10.1038/srep14438 (χ 2 = 20.67, df = 1 for 1 st order and χ 2 = 58.40, df = 1 for 2 nd order interactome). In case of the 1 st order network, our analysis indicated the presence of 4 modules whereas in case of 2 nd order network 12 modules were identified.
Using the Ci scores from Table 1 and Fig. 2, we color coded each module in Fig. 1(b,c). Modules 2 and 4 of the 1 st order interactome and module numbers 3 and 2 of 2 nd order interactome, respectively were presented in identical colors because they have multiple common members. The common members of module number 2 from 1st order and 3 from 2 nd order are Casp-8, Tnfrs10b, Cflar, Fas, FADD, TRADD. Module 4 from 1 st order and 2 from 2 nd order has Casp-9, Casp-7, XIAP, Apaf-1 and Diablo. Extraction of a consistent network structure from the analysis of 1 st order and 2 nd order interactomes provides confidence about the biological relevance of the key modules. Table 3 lists the UniProt IDs of all proteins identified in the 1 st and 2 nd order interactomes.

Discussion
In this report we propose an analysis framework to compute the modular structure of a complex protein-protein interaction network (interactome). The choice of seed proteins: Fas, Fas associated Death Domain (FADD), Caspase-8, Caspase-3 and X-linked Inhibitor of Apoptosis Protein (XIAP) for the construction of the interactome was guided from our previous experimental findings 17 . These were apoptotic proteins over-expressed in mouse neurons following Chandipura Virus infection. We used the STRING 9.1 database to compute the first order interactome. There are currently several bioinformatics toolboxes available, each with their own set of unique controls. We chose STRING 9.1 because it was the only method to the best of our knowledge that allowed us to prune networks based on a statistical confidence level. However, it is pertinent to note that the database used to extract the interactome will immensely influence the estimation of any functional modular structure. A study comparing the interactomes extracted from several data sets may potentially help future research in terms of data interpretation. Next, we computed the graph theory metrics: Modularity Score (Q), Community Structure (Ci) and Degree Centrality (Z) to infer further about the protein-protein interactions underlying apoptosis. To establish the predictive validity of our analysis, we constructed a second order interactome based on secondary interacting partners of the seed proteins using the STRING 9.1 database (at 95% confidence) and re-calculated the graph theory metrics. The consistent presence of key protein assemblies in the first order and second order interactomes provides confidence regarding the robustness of our approach. Finally, we compared the closeness of modularity and degree centrality computed in empirical networks with that obtained for simulated random networks. Since, no modular structures are expected   Table 1. Protein names, community structure value (Ci) score of the 1 st and 2 nd order interactome. The protein names for the table were arranged according to the chronology in which they have been queried from the STRING 9.1 database. The first 5 are the proteins whose expressions were monitored empirically the next 21 were the primary interacting partners. The next 45 secondary interacting partners were added to the list.
in a random network, this addressed the issue of face-validity, that is, whether the method is successful in extracting meaningful information and helped us control false positives. Modularity score (Q) of a network ranges between [− 0.5, 1) with negative Q scores signifying random interactions within the network. As the within group interactions increase, the network starts to become more modular and the Q value shifts more towards the positive side nearing to 1. For every network there exists an optimal Q value beyond which the modularity score cannot be enhanced even if we increase the number of modules. In our case we have determined the Q values of 1 st order and 2 nd order ineractome are 0.3911 and 0.4716, respectively. These scores were stable across 50 independent runs. The community structure also remained unchanged. These two findings give us the confidence to state that protein-protein interactions are indeed highly modular due to their inherent biological properties. Hence it is pertinent that the interactive nodes of both the networks have been classified into a maximum number of possible modules. Next we focus on each module to decipher their biological significance.
In the first order interactome, 4 interactive modules were identified (Fig. 1b). We could clearly characterize that all proteins segregated in separate modules on the basis of their functional role in the apoptotic process. Module 2 and 3 consist of all the proteins which are mostly known as death domain (DD) and death-inducing signalling complex (DISC). Proteins like FADD, TRADD, Cflar RIPK1, Daxx, Bcap31, Tnfrs1a & 10b have been reported to contribute the DD 22,23 while Caspase 8 and FADD forms the DISC 24,25 . Other members like Fas (Module 2) and TNF (Module 3) are commonly known as the initiators of the death process. Module 3 consists of proteins that are co-stimulators of tumor necrosis factor (TNF) induced cell death whereas Module 2 consists of proteins that contribute to both Fas and TNF pathways. Module 4 is a heterogeneous group that consists of both apoptotic activators and inhibitors that belong to caspase group. XIAP has been previously reported both in our previous report and other researchers to be a Casp3 antagonist 17 while Birc2 & Birc3 are well known to be in association with TNF to combat the apoptosis signalling 26,27 . Surprisingly, TNF and Birc2 and Birc3 were not in the same module in our 1 st order interactome. Other apoptotic activators of module 4 are Apaf-1 and Diablo along with the caspases like Casp 9 & 7. Overall this module represents proteins that are affecting the intermediate phase of apoptosis before the appearance of the final executioner of the apoptotic pathways. Module 1 is a classical cluster consisting of the close interactors of Casp3, the final executioner of the apoptotic pathway. This module consists of some of the targets of Casp3 which gets cleaved in order to bring about various changes in the cellular environment and to help in completion of the apoptotic process. Both Dffa and Dffb are cleaved by Casp3 to effect the DNA fragmentation 28 while Gsn cleavage brings about morphological changes to the cell during apoptosis 29 . Ngf 30 has been previously reported to be closely associated with Casp3. Akt-1 activation in response to cytokine receptor signalling has been associated with anti-apoptotic processes 31 . In our analysis we observed that although Akt-1 is linked with other modules, its association with Caspase-3 is strong, and as a result Akt-1 has been grouped in Module 1. However, the scenario changes drastically once we enhance the network including the primary interactors of each of the proteins in the 1 st order interactome model to develop the 2 nd order interactome.
The 2 nd order interactome segregated into 12 modules, among which 7 were larger groups, each containing 6 or more members while the rest were smaller groups with single nodes (Fig. 1c). The module configurations of the 2 nd order interactome clearly indicate that most members of module 3 and 2 are also present in module 2 & 4 of the 1 st order interactome, respectively. Module 3 in 2 nd order interactome consists of Casp-8 and FADD, key players of the DD and DISC processes. Module 2 is now an integrated assembly formed from nodes of module 1 and 4 of 1 st order interactome and consists of proteins taking part in the intermediate stage and the final execution of apoptosis. A closer look at the 2 nd order interactome reveals the 4 major groups apart from 2 and 3. Modules 1, 8 and 9 have been built around few of the major anti-apoptotic proteins of the 1 st order interactome for example Akt1, Birc 2, Birc3, Traf2 and TNF. We observed that in 1 st order interactome TNF and Traf2 were included within module 3 whereas in 2 nd order interactome TNF and Traf2 were placed in modules 8 and 9 respectively. TNF has been earlier reported to be involved in activation of apoptotic pathways 32,33 . But from our analysis we propose TNF may have some anti-apoptotic function based on its interactions with cytokines IL-1a and IL-6, that have been reported to be involved in cell survival 34,35 . The modules 8, 10 and 11 being influenced by the anti-apoptotic proteins form a significant part of this network that was not so prominent in the 1 st order interactome. Other modules such as 4, 5, 6, 7 and 12 although consisting of fewer members in the context of our study, have the potential to embark into larger modules if an even bigger network is considered. This is simply because these modules consist of very important proteins that have been known to play pivotal roles in apoptosis.
Degree centrality is simply defined as the interaction score of a particular node within a network. The more interactions a node has within a group of nodes which are mutually interacting among each other, the higher its chance will be to form a module. Hence the community structure formation largely depends upon degree centrality of the nodes within a complex network. Casp3 and FADD were ranked among top 5 proteins when nodes of 1 st and 2 nd order interactomes were sorted in terms of degree centrality (Table 2). This signifies the pivotal role played by these two proteins in apoptosis and also gives us confidence to interpret the biological significance of modules from graph-theoretic measures. In Table 2 we see an interesting pattern. Nodes in the 1 st order interactome that have positive degree centrality scores remained to be in the positive side in the 2 nd order interactome. However, degree centrality of nodes that had 0 or negative values in the 1 st order connectome either enhanced or got depreciated in 2 nd order. In order to explain this pattern we have to carefully analyze both the interactome models. Nodes having positive scores in the 1 st order connectome interact not only maximally within their modules but also with other nodes in different modules. Hence, with the increase in number of interacting partners in the 2 nd order interactome, the overall connectivity is enhanced for the constituent nodes. For example, Akt1 in the 1 st order interactome interacts with several nodes of different modules but not consistently within one module. However, in the 2 nd order interactome, the degree centrality of Akt1 increased and creation of a separate module involving Akt1 was observed 36,37 . Other nodes like TNF and Traf2 that were in one module in 1 st order, increased their interactive partners and gained entry to bigger modules in 2 nd order interactome. Nodes that have a predominant role to play in apoptosis maintained their modules and their degree centrality scores across both 1 st and 2 nd order interactome models.
In conclusion, we have outlined a robust method for studying the interactome underlying apoptosis following CHPV infection. This method may be used to study other metabolic pathways in order to yield important information about the strategic proteins of a specific network and the functionally important modules within the network. In the future, therapeutic targeting of particular proteins in case of various disease conditions needs to be investigated.

Methods
Empirical data. In an earlier study 17 , samples of Chandipura Virus was inoculated into Balb/c mouse intraperitoneally (i.p.) post-natal 10 days, at a plaque forming unit (pfu/ml) of 3 × 10 5 . The animals developed CHPV related symptoms of hind limb paralysis, high grade fever and severe weight loss, within 72-96 hours post infection leading to death. From immunoblotting and immunostaining analyses performed on the extracted brain tissue, we found over-expression of 6 proteins of the extrinsic apoptosis pathway: Fas, FADD (Fas-associated Death Domain), Caspase-8, Caspase-3 and XIAP (Poly ADP Ribose Polymerase-1). Our results were further validated using RNAi studies, ELISA assays and flow-cytometric analyses 17 . Table 3 enlists the protein names with their corresponding Uniprot IDs.

Generation of meta-network. STRING (Search Tool for the Retrieval of Interacting Genes/Proteins)
is an open datasource providing information about protein-protein interactions based on experimental data, computational prediction methods and public database 38 . STRING 9.1 database contains information about more than 5.4 million proteins and > 1100 organisms 39 . The database has two modes of applications: Protein-mode (for protein interactions) and COG-mode (for gene interactions). STRING imports protein association information from databases of physical interaction and curated biological pathway knowledge (MINT, HPRD, BIND, DIP, BioGRID, KEGG, Reactome, IntAct, EcoCyc, NCI-Nature Pathway Interaction Database, GO). Protein/genes are queried to the STRING database which as an output that represents the associations in the form of a graph network with nodes (proteins/genes) and edges (interactions). The edges are weighted, integrated and a confidence score is assigned to each of them based upon the evidence of the association obtained from experimental data, computational prediction and public data collection methods. Based on these edges are assigned various shades of color (blue) 38 . The prediction methods generally used in determining the interactions are: Neighbourhood. This method of prediction utilizes the theory that protein interactions validated in case of one or more species is predicted to carry more weightage and confidence score.
Gene Fusion. Proteins fused in one genome are likely to be functionally linked and hence carry stronger association.
Co-occurrence. Occurrence of two proteins within the same metabolic pathway is predicted to functionally linked with each other. Hence their co-occurrence strengthens their confidence score.

Co-expression.
Simultaneous expression of two proteins is also predicted to have strong interaction between them.

Generation of 1 st order interactome
The 5 proteins identified through molecular analyses were queries in the STRING 9.1 that produced 26 interacting partners as an output from the Mus musculus database. The STRING 9.1 software defines significance of the interactions between various queried proteins in terms of confidence score. This confidence score is an empirical score defined by the number of citations and experimental evidence for a particular interaction. The highest (0.95) confidence score in the database, that defines the significance of interactions between various queried protein was chosen to extract interactomes in this study. Furthermore, we limited the number of interacting partner to 1000 in the provision for maximum interacting partners using active prediction methods as neighbourhood, gene fusion, co-occurence and co-expression.

Generation of 2 nd order network
In order to investigate the structure of an even larger network we queried for interacting partners of all the 26 proteins obtained from the previous analysis. The 2 nd order connectome in Fig. 1b was generated from STRING 9.1 database using same confidence score (0.95) as for the 1 st order connectome and limiting to 1000 interacting partners Graph theoretic analysis. The adjacency matrices for graph theoretic analysis were created from 1 st and 2 nd order interactomes. Visual Connectome analysis tool box in MATLAB was used to compute the modularity score and the degree centrality of all the nodes 40 .
Degree Centrality. Degree centrality is the property that defines the connectivity of particular node with other nodes of the same network. This means the higher number of connections of a particular node with other nodes in a network, higher is its degree centrality. The node with the highest degree centrality is the one through which maximum edges pass.
Degree centrality of a vertex v, for a given graph G = (V,E) with |V| vertices and |E| edges is defined as Modularity. Modularity score is used to measure the community structure within a network. The value of modularity ranges between [− 0.5, 1) with 0 and negative values meaning a network with randomly assigned edges to positive values indicating highly communal structure. In a given graph G (V, E) which can be partitioned into two membership variables s. If a node v falls into community 1 then s v = 1 or else s v = − 1. An adjacency matrix may be denoted by A, which says A vw = 1 means there is a connection between nodes v and w and A vw = 0 when there are no interactions. Modularity (Q) is then defined as the fraction of edges that fall within community 1 or 2, minus the expected number of edges within communities 1 and 2 for a random graph with the same node degree distribution as the given graph. The expected number of edges will be calculated using the concept of Configuration Models 41 . The configuration model is a randomized representation of a particular graph. Given a network with n nodes, where each node v has a node degree k v , the configuration model intercepts each edge into two halves, and then each half edge is defined as a stub, that is rewired randomly with any other stub in the network even allowing self loops. Hence even though the node degree distribution of the graph remains intact, the configuration model results in a completely random network. Let the total number of stubs be Modularity score is calculated as The above equation is valid for two-community structure and can be generalized into c-community structure.  B vw is also referred to as the modularity matrix which will be having elements whose rows and columns sum upto 0, so that it always has an eigen vector (1, 1, 1..) with eigen value 0 42 . The algorithm that we used, initially divided the network into two communities and in further iterations the community structure is subdivided. For a group g of size n g we can express the contribution to modularity as Certainly (8) and (11) are similar and therefore spectral approach 42 was applied to the generalized modularity matrix to maximize the values of Δ Q. ∑ ∈ B k g vw for a complete network happens to be a symmetric matrix and thus (11) turns to nothing but (8). Once Δ Q is almost 0 for an indivisible network, then further subdividing beyond this point will not contribute to the increase in modularity value Q. This can be used to terminate community structure division.
The algorithm ran with the following theory: The modularity matrix, (9) was constructed for both interactomes and found the most positive eigenvalue and the corresponding eigenvector in each case. The algorithm divided the network into two parts depending upon the signs of the elements of the corresponding vectors, and then subdividing using the generalized modularity matrix (12). In the process Δ Q comes to 0 or negative at any stage of subdivision the algorithm left subgraph undivided. Hence, the algorithm would end at a certain point when the optimal network has been estimated. In order to fine tune this method of community structure optimization further, the Visual Connectome toolbox 21 that we employed uses Kernighan-Lin algorithm 43 .