Identification of key regulators in prostate cancer from gene expression datasets of patients

Identification of key regulators and regulatory pathways is an important step in the discovery of genes involved in cancer. Here, we propose a method to identify key regulators in prostate cancer (PCa) from a network constructed from gene expression datasets of PCa patients. Overexpressed genes were identified using BioXpress, having a mutational status according to COSMIC, followed by the construction of PCa Interactome network using the curated genes. The topological parameters of the network exhibited power law nature indicating hierarchical scale-free properties and five levels of organization. Highest degree hubs (k ≥ 65) were selected from the PCa network, traced, and 19 of them was identified as novel key regulators, as they participated at all network levels serving as backbone. Of the 19 hubs, some have been reported in literature to be associated with PCa and other cancers. Based on participation coefficient values most of these are connector or kinless hubs suggesting significant roles in modular linkage. The observation of non-monotonicity in the rich club formation suggested the importance of intermediate hubs in network integration, and they may play crucial roles in network stabilization. The network was self-organized as evident from fractal nature in topological parameters of it and lacked a central control mechanism.

Prostate is a gland of the male reproductive system which secretes seminal fluid in human adult 1 . According to World Cancer Report 2014, the cancer of prostate or Prostate cancer (PCa) in man is second most common cancer, after lung cancer, and is responsible for a fifth of cancer deaths in males worldwide 2 . PCa, based on the type of origin in prostate, can be classified into five types: (i) acinar adenocarcinoma, (ii) ductal adenocarcinoma, (iii) transitional cell (or urothelial) cancer, (iv) squamous cell cancer and (v) small cell prostate cancer, with adenocarcinomas being the most common, even though metastasis is much quicker in other types 3,4 .
In recent years, gene expression studies using high-throughput techniques namely next generation sequencing, microarray and proteomics have led to the identification of new genes and pathways in PCa. The identification of novel key regulators is important as the current therapeutic modalities against PCa, including the use of antiandrogens and blocking androgen synthetic pathway 5 and using Luteinizing hormone-releasing hormone (LHRH) agonists and antagonists along with cytotoxic anticancer drugs, cause notable side effects 6,7 . Moreover, PCa diagnosis, which is largely dependent on the Prostate specific antigen (PSA) and Digital rectal examination (DRE), has its own limitations 8,9 . PSA is also elevated in benign prostatic hyperplasia and other noncancerous conditions 9 . This necessitates the discovery of more reliable biomarkers for better and early diagnosis, as well as identification of new targets other than the genes involved in androgen metabolism for the discovery and development of new and more potent drugs which have less toxicity and lesser side effects.
Genes are regulated in a coordinated way and the expression of one gene usually depends on the presence or absence of another gene (gene interaction). Network theory, which studies the relations between discrete objects through graphs as their representations, can be used to study complex gene regulatory networks which can have different types (random, scale-free, small world and hierarchical networks). The development of algorithms to study of these networks can provide an important tool to find/identify disease-associated genes in complex diseases such www.nature.com/scientificreports www.nature.com/scientificreports/ Method for detection of levels of organization. Considering the size of the network and its sensitivity, Louvain method of modularity (Q) maximization was used for community detection 17 . The first level of organization was established by the interaction of communities constructed from primary PPI network. The sub-communities constructed from all communities in the first level of organization constituted second level of organization. In the same way, successive levels were constructed until the level of motifs, thereby each smaller community had a minimum of one triangular motif defined by sub-graph G (3,3). Since the triangular motifs are overrepresented in PPI network and serve as controlling unit in a network 18 , we used motif G (3,3) as qualifying criteria for a community/subcommunity as a constituting member at a certain level of organization. Further, each community or smaller community landed up to different level of organization.
Topological analyses of the networks. Cytoscape plugins, NetworkAnalyzer 19 and CytoNCA 20 were used to analyse the topological properties of the network for centralities, degree distribution, clustering coefficients and neighbourhood connectivity. The highest degree nodes were identified as hubs of the PCa network. Top 103 hub proteins having degree k ≥ 65 were considered for tracing the key regulators of the network. Other topological parameters, viz., Rich club coefficients (Φ), Participation coefficients (P i ) and Within-module degree (Z i score) were calculated using Igraph package "brainGraph" (https://github.com/cwatson/brainGraph) in R. Another parameter subgraph centrality was also calculated using Igraph functions.
Degree (k). In the analysis of network, degree k indicates the total number of links established by a node in a network and is used to measure the local significance of a node in regulating the network. In a graph represented by G = (N, E), where N denotes nodes and E the edges, the degree of i th node (k i ) is expressed as = ∑ k A i ij N ij , where A ij denotes the adjacency matrix elements of the graph.
Probability of degree distribution (P(k)). It is the probability of a random node to have a degree k out of the total number of nodes in the network and is represented as fraction of nodes having degree (k), as shown in Eq. (1), where N k is the total number of nodes with degree k and N, total nodes in the network.
k P(k) of random and small-world networks follow Poison distribution in degree distribution against degree, but most real-world networks, scale-free and hierarchical networks follow power law distribution P(k) ~k -γ , where, 4 ≥ γ ≥ 2 . In hierarchical networks, γ ~2.26 (mean-field value) indicating a modular organization at different topological levels 21 . Therefore, P(k) pattern defines the characteristic topology of a network.
Clustering coefficients C(k). The strength of internal connectivity among the nodes neighbourhoods which quantifies the inherent clustering tendency of the nodes in the network is characterised by the Clustering coefficient C(k), which is the ratio between the number of triangular motifs formed by a node with its nearest neighbours and the maximum possible number of triangular motifs in the network. For any node i having degree k i in an undirected graph, C(k) can be expressed as Eq. (2), where m i is the total number of edges among its nearest neighbours. In scale-free networks C(k) ~ constant, but it exhibits power law in hierarchical network against degree, C(k) ~ k −α , with α ~ 1 21 .
Neighbourhood connectivity C N (k). The node neighbourhood connectivity is the average connectivity established by the nearest-neighbours of a node with degree k, represented by C N (k) can be expressed as shown in Eq.
(3), where, P(q|k) is conditional probability of the links of a node with k connections to another node having q connections.

N q
In hierarchical network topology, C N (k) exhibit power law against degree k, that is, C N (k) ~ k β , where, β ~ 0.5 22 . Further, the positivity or negativity of the exponent β can be defined as, respectively, the assortivity or disassortivity nature of a network topology 23 .
Centrality measures. A node's global functional significance in regulating a network through information processing is estimated by the basic Centrality measures -Closeness centrality C C , Betweenness centrality C B and Eigenvector centrality C E 24 . Another centrality measure, Subgraph centrality C S is also used to describe the participation of nodes in other subgraphs in the network 25 . These centrality measures collectively determine the cost effectiveness and efficiency of information processing in a network.
The closeness centrality C C represents the total geodesic distance from a given node to all its other connected nodes. It represents the speed of spreading of information in a network from a node to other connected nodes 26 . C C of a node i in a network is calculated by the division of total number of nodes in the network, n by sum of geodesic path lengths between nodes i and j which is represented by d ij in Eq. (4). www.nature.com/scientificreports www.nature.com/scientificreports/ Betweenness Centrality C B is the measure of a node which is the share of all shortest-path traffic from all possible routes through nodes i to j. Thus, it characterizes a node's ability to benefit extraction from the information flow in the network 27 and its controlling ability of signal processing over other nodes in the network 28 . If d ij (v) denotes the number of geodesic paths from node i to node j passing through node v, then C B (v) of node v can be obtained by Eq. (5).
If M denotes the number of node pairs, excluding v, then normalized betweenness centrality is given by the Eq. (6).
Eigenvector centrality C E is proportional to the sum of the centrality of all neighbours of a node and it reflects the intensity of these most prominent nodes influencing the signal processing in the network 29 . If nearest neighbours of node i in the network is denoted by nn(i) with eigenvalue λ and eigenvector v i of eigen-value equations, where, A is the network adjacency matrix, C E can be shown by the Eq. (7), E j nn i j ( ) C E score corresponds to maximum positive eigenvalue, λ max , of the principal eigenvector of A 29 . Since a node's C E function depends on the centralities of its neighbours, it varies across different networks association of high C E nodes; within closely connected locality of such nodes reduces the chances of isolation of nodes 29 . Thus, C E becomes a powerful indicator of information transmission power of a node in the network.
The subgraph centrality C S of a node calculates the number of subgraphs the node participates in a network. It can be calculated using eigenvalues and eigenvectors of adjacency matrix of the graph, as shown in Eq. (8), where λ j is the j th eigenvalue and v j (i), the i th element of the associated eigenvector. The weightages are higher for smaller graphs. Higher subgraph centrality of a node corresponds to better efficiency of information transmission and increase in essentiality of the node in the network 25 .
Within-module degree and Participation coefficients of the hubs. In complex networks the characterization of hubs as high degree nodes with higher centrality values is incomplete without exploring the role of nodes at the modular levels 30 . The role of nodes at the modular level is determined through the participation of nodes in establishing links between the nodes within the module as well as outside the module and calculating the modular degree of the nodes. Within-module degree or Z-score, Z i , signifies the connections of a node i in the modules and categorizes a node as modular hub-node with high (Z i ≥ 2.5) signifying more intra-module connectivity of the node than inter-module, whereas, lower Z values, Z i < 2.5, categorizes as non-hub nodes with less intra-module connectivity 30 . The Z-score can be calculated as shown in Eq. (9), where k i represents the number of links of node i to other nodes in its modules S i and k s i , the average of degree (k) over all nodes in S i ; σ k s i , is the standard deviation of k in S i .
The participation coefficient, P i determines the participation of the node i in linking the nodes inside and outside its module 30 . P i values lie in the range of 0−1 with higher values corresponding to the participation of nodes in establishing links outside the modules with homogeneous distribution of its links among all modules, and if k is is taken to represent the number of links of node i to nodes in modules s and k i , the total degree of node i, P i can be calculated as in Eq. (10), where, N M is the number of modules in the network.
Rich-club analysis. Identification of hubs in a network generally is done through general centrality measures, especially higher degree nodes are commonly considered as hubs and existence of high degree nodes in a network correlate with the local regulatory roles of these high degree hubs in the network 31 . This phenomenon of formation of rich club connection between high degree hubs exhibit the robustness of the network and the www.nature.com/scientificreports www.nature.com/scientificreports/ resilience when the hubs are targeted 32 . The existence of rich club phenomenon among hubs is investigated by calculating the Rich-club coefficients Φ(k) across the degree range 32 . Φ(k) is equivalent to the clustering coefficient among a subgroup of nodes with degrees ≥k. In order to remove the random interconnection probability factor, normalization of the rich club coefficients can be done by the Eq. (11), where Φ rand (k) is the rich-club coefficient of random networks with similar size and degree sequence and Φ norm (k) > 1 indicating rich-club formation. This rich club phenomenon is associated with the assortivity nature of the networks and is important to understand the roles played by these hubs' roles in the network integration and efficient transmission of signals 33 .
norm rand Tracking the key regulators in the networks. The most influential genes in the PCa network were identified first through calculating the centrality measures. Since, higher degree nodes have higher centrality values, top 103 highest degree nodes (Degree k ≥ 65) were considered among the hub nodes of the network for tracing the key regulators which may play important role in regulating the network. Then tracing of nodes from the primary network up to motif level G(3, 3) was done on the basis of representation of the respective nodes (proteins) across the sub modules obtained from Louvain method of community detection/clustering 17 . Finally, the hub-nodes (proteins) which were represented at the modules at every hierarchical level were considered as key regulators of the PCa network.
Functional association analysis of modules. The modules at all levels of hierarchy were analysed for their functional annotations with DAVID functional annotation tool 34 . The functions and pathways with corrected p < 0.05 were considered statistically significant.

PPI network in PCa follows hierarchical scale-free topology composed of modules at five levels of hierarchy.
From the interactome network of 3,871 PCa genes, the physical interacting PPI network of 2,960 proteins with 2,960 nodes and 20,372 edges was constructed as the primary network ( Fig. 1). Analysis of this primary PCa network showed that the network followed power law distributions for probability of node degree distribution, P(k), clustering coefficient C(k) and neighbourhood connectivity distribution C N (k) against degree (k) with negative exponents 22 (Eq. 12) (Fig. 2). This power law feature indicates that the network exhibited hierarchical-scale free behaviour with systems level organization of modules/communities. Further, community finding using Louvain modularity optimization method 17 led to the detection of communities and sub-communities at various levels of organization (Fig. 3A). Thus, a total of 436 communities and smaller communities were detected, out of which 38 reached up to level V, the level of motif G (3,3). Communities at the first hierarchical level also showed power law distribution for P(k), C(k) and C N (k) against degree distribution with negative exponents indicating further systems level organization of modules (Eq. 12) except in case of communities C8, C10 and C15 where the C N (k) exhibit power law against degree k with positive exponents (β ~ 0.05, 0.13, 0.14 respectively) (Fig. 2). This indicates assortivity nature in the modules indicating Figure 2. Topological properties of PCa and the modules/communities at the first hierarchical level. Degree distribution probability (P(k)), clustering coefficient (C(k)), neighbourhood connectivity (C N (K)) as function of degree (k) and centrality measurement closeness (C C (k)), betweenness centrality (C B (k)), eigenvector centrality ((C E (k)), subgraph centrality (C S ) as a function of degree. (2019) 9:16420 | https://doi.org/10.1038/s41598-019-52896-x www.nature.com/scientificreports www.nature.com/scientificreports/ the possibility of rich-club formation in these modules, where, hubs play significant role in maintaining network properties and stability 22 . used to assess the importance of the nodes in information processing in a network. Betweenness centrality C B , Closeness centrality C C , Eigenvector centrality C E and Subgraph centrality C S are various topological properties which can determine the efficiency of signal transmission in a network 25 . In PCa network and modules at the first hierarchical level, these parameters also exhibited power law as a function of degree (k) with positive exponents where the centralities tend to increase with higher degree nodes (Eq. 13) (Fig. 2). This behaviour revealed the increase in efficiency of signal processing with higher degree nodes in the network showing the importance of these nodes in controlling the flow of information, thereby regulating and stabilizing the network. Hence, hub proteins had a significant influence in regulating the network and might be playing an important role in PCa. In order to identify the most influential key regulator proteins in the network, top 103 hub-proteins having degree (k) ≥ 65 were considered for identification of the key regulators through their representation at every topological level (Supplementary Table 1 2) were found to be the backbone of the network. These key regulators along with their partners forming the motifs (Fig. 3B), might be playing the most important roles in regulating and maintaining the stability (network integrity, optimization of signal processing, dynamics etc.) of the network. www.nature.com/scientificreports www.nature.com/scientificreports/ Modules of the network were associated with specific functions. Community detection of the network using Louvain modularity optimization method leads to clustering of the primary PCa network up to the level of motifs (Fig. 3A). This clustering showed that Modularity (Q) of the networks exhibited an increasing pattern with topological levels with highest average Modularity (Q = 0.5527) seen at the first hierarchical level, and lowest (Q = 0.0013) at the level V, the motif level 35,36 .
In complex PPI networks the modules have biological meanings relating to functions and gene ontology analyses have revealed enrichment of certain known functions and pathways in the modules 37 . Our primary PCa-network was composed of 14 modules deduced from the community detection and their mean clustering coefficients C(k) ~ 0.094−0.392 (Table 3). Among these, modules C12 and C13 which were the largest and had the highest mean clustering coefficients C(k) = 0.392 and 0.218 respectively, showing a functional homogeneity in the modules. These modules were analysed for their functional annotations with DAVID functional annotation tool 34 to reveal association with different functions (Table 3).
Hubs in the PCa network coordinate the modules acting as modular hubs. In complex hierarchical networks, the modularity of sub-communities and the roles played by the nodes in the modules is defined with the nodes Within-module Z score, Z i along with their Participation coefficients P i 30 . Z i gives the degree of the nodes within their modules, and P i describes the influence of a node inside the module, as well outside it, in terms of signal processing as well as maintaining network stabilization. Hence, Z i and P i were calculated for each node in the modules using Eqs (9), (10), respectively. Accordingly, within-module Z score, the nodes are classified as follows: ( In the PCa PPI network™study, many hub-proteins were acting as modular hubs, helping in establishing connection between the modules at different hierarchical levels. For example, CUL7 and RANBP2 were among important key regulator protein hubs in PCa which also acted as modular kinless and connector hubs of module C3 and C5 at the first hierarchical level (Fig. 4A,B). P53, E2F1 and c−MYC acted as kinless global hubs of module C9 connecting with all the modules and other proteins in the network. NOP56, FBL, RNF2 and NPM1 also acted as connector modular hubs of C12 module connecting other modules at the same level (Fig. 4A,C).
PCa network exhibited non-monotonicity in rich-club formation across the hierarchy. Identification of rich club nodes is another common feature to study the influence of hubs in the network forming a strong connection among them which is done by calculating normalized rich club coefficient Φ norm across the degree range k (Eq. 11). Normalized rich-club coefficient Φ norm > 1 indicates the existence of rich club among the nodes which play key role in network integration, increasing its stability and improving the efficiency of transmission of information among hub proteins. Since, PCa network is hierarchical and shows disassortativity in nature with node neighbourhood connectivity C N (k) following power law distribution against degree (k) with negative value of exponent β (Eq. 13), rich club formation among the hub proteins is quite unlikely 32,38 . Although rich club formation is not exhibited among high degree hub proteins, the moderate intermediate degree protein with degree 19 ≤ k ≤ 107 showed higher rich club coefficients than the hubs in PCa network (Fig. 5). In the PCa network across the hierarchy, different patterns of rich club coefficients were exhibited among the modules (Fig. 5), showing the phenomenon of non-monotonic behaviour at different hierarchical levels. With respect to modules C12 and C13 at first hierarchical level, they exhibit rich club formation between the high degree nodes but the pattern changes moving at the lower levels. However, in the modules C8, C10 and C15, the topological properties of these modules exhibit assortativity nature due to (i) the node neighbourhood connectivity C N (k) in these modules follow power law with positive β exponents, (ii) Φ increases monotonically with degree k, and (iii) Φ norm approximately increases with degree k with values of Φ norm > 1 (Fig. 6), indicating the possibility of rich club formation among the high degree nodes (Fig. 6A). Considering the nodes with degrees whose Φ norm is larger than one, the approximate range of degrees of nodes forming rich-club in these three modules are 61≥k≥14 (C8), 52≥k≥6 (C10), 37≥k≥6 (C15), and clearly show rich-club formations in the respective network modules (red coloured nodes in the respective modules in Fig. 6).
The key regulator identified in this study and their key functions in disease conditions.

Discussion
The real-world complex networks generally have hierarchically organized community structure, which is evident from fractal studies and scaling behaviour of these networks 21 . Even though there is no specific definition of communities or modules in a network, each community/module is established by densely interconnected nodes forming clusters around the hub nodes which generally have their own local properties and organization 35 . The hubs have highest interactions in the network due to their high degree, constitute both intra and inter communities' interactions in the network in a hierarchical manner, and thus play a central role in information processing in the network 31 . The primary PPI PCa network constructed in this study for tracking the hubs up to the level of motifs led to the identification of 19 key regulators (hubs) from 3,871 genes found to be significantly overexpressed in human prostate adenocarcinomas. There have been limited community finding methods in complex networks, among which the Newman and Girvan leading eigenvector algorithm 35,36 , is commonly used. However, in comparatively large complex networks, Louvain method, which is based on modularity, Q maximization/ optimization 17 , is the most suitable, sensitive and comparatively faster. In our study, considering the size of the network and its sensitivity, we used Louvain method for community detection and while giving equal importance to the hubs, motifs and modules of the network, we identified the novel key regulators. 11 key regulators (RPL11, RPL15, RPL19, RPL23A, RPL3, RPL5, RPL6, RPLP0, RPS11, RPS8 and RPSA) belong to the family of ribosomal proteins (RPs) which are involved in ribosomal biosynthesis and other eight predicted regulators (HSPA5, NOP2, RANBP2, SNU13, CUL7, CCT4, ASHA1 and EIF3A) have important functions reported to be associated with various other cancers. Moreover, at the level of motifs these key regulators interact with other proteins which may also be playing important roles in PCa and establishing themselves to be the candidate disease-genes along with key PCa regulators (Fig. 3B).
The emergence of 11 RPs as key regulators in PCa is an important finding in this study. It could be due to the crucial role of RPs in cell growth and proliferation propagated through protein synthesis. In cancers, ribosomal biosynthesis increases to meet the requirement of rapidly growing/proliferating cells 39 . Some RPs take part in extra-ribosomal functions involved in tumorigenesis, immune cell signalling, and development and regulating diseases through translocation across the nuclear pore complex 40 . RPs have been associated with tumorigenesis either as oncoproteins or tumour suppressors, with differential roles being reported in different cancers. During ribosomal or nucleolar stress such as hypoxia, lack of nutrient, starvation, deregulation of genes etc., RPs modulate the p53-mediated apoptosis. The association of RPs with cancers as discussed in Table 2 suggests a potential unexplored function of these proteins in PCa, both as therapeutic target and predictive biomarker. An understanding of the functions and the pathways of key RPs, for example their role in stabilizing p53 during ribosomal stress and role in cell growth/proliferation in PCa patients is of immense significance as it provides new insights into the control and prevention of PCa.
Besides, other non-ribosomal predicted key regulators identified in this study, SNU13, CCT4, AHSA1, CUL7, EIF3A, HSPA5, NOP2 and RANBP2, are also vital in cell physiology and are equally important for their involvement in cell growth and proliferation in one way or another. The NHP2−likeprotein1(SNU13) identified in this study as another key regulator, is a component of the spliceosome complex 41 which interacts with several RPs and strengthens the role of RPs in cancers. CCT4, Chaperonin containing TCP1 subunit 4, is a chaperone which when mutated is associated with hereditary sensory neuropathy 42 .
AHSA1, theActivatorofHSP90ATPaseActivity1, is a positive regulator of the heat shock protein 90 (HSP90) and when activated HSP90 forms a complex with HSP70 which helps in either binding of the tumour suppressor p53 to DNA, or its degradation by ubiquitination 43 . In cancers, activated HSP90 stabilizes the mutated p53 which decreases its DNA binding activity and degradation through binding with its inhibitor MDM2, thus promoting tumour progression 44 . The activation and transportation of steroid hormones (androgen receptor, AR and oestrogen receptor, ER) to the nucleus is also mediated by HSP90 45 ; thus, AHSA1 activation of HSP90 may influence  www.nature.com/scientificreports www.nature.com/scientificreports/ the androgen metabolism in PCa. Moreover, AHSA1 is a regulator of the cell growth, apoptosis, migration and invasion through Wnt/β−catenin signaling pathway 46 , which suggests its role as a candidate-disease gene in PCa.
CUL7, Culin7, is a component of an E3ubiquitin−proteinligase complex and interacts with p53, CUL9 and FBXW8, and is reported to be an antiapoptotic oncogene 47 . CUL7 has been associated with various cancer types, but its promotion of epithelial-mesenchymal transition in metastasis and its regulation of ERK−SNAI2 signalling affecting the expression of cell adhesion proteins, E−cadherins, fibronectin, N−cadherin and vimentin in cancer is well studied 48 . CUL7 inhibits apoptosis in lung cancer through inhibition of p53 which regulates c−MYC cell cycle progression 47 . CUL7 regulates cell cycle progression through CyclinA overexpression and affects the cell migration, which is a hallmark of cancer, influencing microtubule dynamics in breast cancer 49 . Therefore, the targeted knockdown and silencing of CUL7 has led to a decrease in cell proliferation, weaker −tubulin accumulation in microtubules, promoting their stability and decreasing cell migration (in breast, liver and lung carcinoma cells) and has been suggested as a potential therapeutic target in various cancers [47][48][49] .
The Eukaryotic translation initiationfactor 3 subunit A(EIF3A) forms 43SPre−initiation complex(43SPIC) with other initiation factors and 40Sribosome and initiates the protein synthesis process. This translates mainly genes involved in cell proliferation, cell differentiation, apoptosis etc. and exerts transcriptional activation/repression through forming different forms of stem loop binding with the mRNAs 50 . Dysregulation of translation initiation and the role of EIF3 has been studied in cancers and involvement of EIF3 complex in regulation of mTOR pathway 51 , makes it an interesting protein to study for its regulatory role in PCa.
The Heat shock protein family A (HSP70) member 5(HSPA5) or glucose−regulated protein 78kDa (GRP78), is a chaperone localized in endoplasmic reticulum (ER) and involved in folding and assembly of proteins and plays an active role in unfolded protein response in ER stress, promoting cell survival which is a common process of escaping cell death in cancers 52,53 . Due to this activity, HSPA5 is an emerging therapeutic drug target for cancer. NOP2(p120) is a putative RNA methyl transferase protein and its expression is detectable in proliferating normal and tumour cells, but undetectable in non-proliferating normal cells 54 . Its role in regulating cell cycle progression from G1 to S phase and transformation of normal fibroblast cells 55,56 makes NOP2 an interesting protein which can be used as biomarker for cell transformation. Ran binding protein 2 (RANBP2) is another key regulator identified in this study which is involved in the SUMOylation of TopioisomeraseII− before the onset of anaphase, helping in separation of chromatids from the centromere and its under-expression, mutation or deficiency has been observed in various cancers specially lung cancer and myelomocytic leukemia acting as tumor suppressor genes 57 . Since SUMOylation plays an important role in tumour progression 58 , the p150/importin/RANBP2 pathway may also play a significant role in PCa progression.
In PCa, p53 and AR are the most mutated genes reported according to COSMIC 14 . The protein-protein interactome of GeneMANIA 15 showed that out of the 19 key regulators identified in this study, 12 (CUL7, HSPA5, CCT4, RPL19, RPL11, RPL3, RPL6, RPLP0, RANBP2, RPS8, RPL23A and RPL15) interact directly with p53 and other key regulators through them (Fig. 3C). Association of mutation in the Androgen Receptor gene (AR) which causes the mutated receptor to remain in activated state and continue to maintain androgen receptor mediated downstream signalling even in lower level of circulating androgens leading to discovery of androgen independency in prostate cancer 59 . A recent report suggests several mutations in the AR gene in different metastatic castration-resistance (CRPC) patients in prostate cancer suggesting AR mutants as a good biomarker candidate 60 . β−catenin (CTNNB1) and GSK−3β are other co-regulators of Androgen receptor and phosphorylation of AR by GSK−3β which inhibit AR driven transcription, but in prostate cancer, the increase in the activity of AKT suppression of GSK−3β due to phosphorylation helps in PCa progression 61 . In the PCa, loss of tumour suppressor PTEN gene also releases the inhibitory effect on AR increasing its trans localization to nucleus and transcriptional activity 62 . Therefore, the interaction of the key regulators on AR acted indirectly through p53 and β-catenin(CTNNB1) (Fig. 3C), where the 12 key regulators interact with p53 which regulates GSK−3 and PTEN which are the upstream regulators of AR. In addition, key regulators, RPSA and HSPA5 interact with AR indirectly through β−catenin (CTNNB1) and AKT1 suggesting an important role of the reported key regulators in regulating the functions mediated through p53 and AR in PCa. The findings reiterate the putative roles of these hubs in PCa manifestation and progression. This study may prove fundamental in characterizing the potential therapeutic targets and biomarkers for sensitive intervention and diagnosis of PCa. It is to be noted that in this study the PCa PPI network followed hierarchical scale free topology. Along with the conventional centrality measures, C B , C C , C E and C S , probability degree distribution P(k), clustering coefficient C(k) and node neighbourhood connectivity distribution C N (k) are used to characterize a network whether one is scale-free, random, small-network or hierarchical network 21 . PCa PPI network followed power law distributions for probability of node degree distribution, P(k), clustering coefficient, C(k), and neighbourhood connectivity distribution against degree k with negative exponents 21 (Eq. 12) (Fig. 2), indicating the network falls in hierarchical-scale free behaviour which can exhibit systems level organization of modules/communities.
Since, node neighbourhood connectivity distribution C N (k) as a function of degree k obeyed power law with negative exponent β, it showed its disassortative nature indicating that there is no signature of rich club formation www.nature.com/scientificreports www.nature.com/scientificreports/ among high degree nodes in the network 32 . Degree centrality is the most commonly used centrality measure used to define the hubs which are the high degree nodes in the network. This disassortivity may be due to the sparse distribution of the hubs among the modules playing key roles in coordinating specific function within each module as well as establishing the connections among the modules 32 . Furthermore, we used Louvain modularity optimization method 17 to detect, find communities and sub-communities and their organization at various levels of organization (Fig. 3A). The communities/sub-communities at various hierarchically organized levels also exhibited hierarchical scale-free topology, as was the case in the primary PCa network (Fig. 2). This hierarchical organization shows the systematic coordinating role of the emerged modules/communities and hubs in regulating and maintaining the properties of the network 10 . In such type of networks, the centrality-lethality rule 31 is not obeyed which indicates that disturbing the hub/hubs in the network will not cause the whole network collapse.
Another important feature we found in PCa network is the observation of the non-monotonic behaviour in the rich club formation in the PCa PPI network and across its hierarchy (Fig. 5). The intermediate degree nodes (19 ≤ k ≤ 107) in PCa network showed normalised rich club coefficients (Φ norm > 1) greater than the highest degree hubs, indicating an important role of these intermediate degree nodes (even AR also falls in this category) in regulating the network organization and maintaining stability through establishing key links between the low degree nodes and high degree hubs. Hence, this category of nodes could perform key roles specially in integrating various types of nodes in the network to optimize topological properties of the network. Formation of rich club among the high degree nodes in the communities C8, C10 and C15 (Fig. 6A) indicating an increase in sensitivity of these hubs on being targeted hence take significant roles in regulating their respective modular functions, i.e., endocytosis, proteosome and DNA repair mechanisms (Table 3). These high degree hubs in these modules fall among the intermediate degree nodes in the primary PCa PPI network (Fig. 6B). Thus the varying pattern of rich club signatures across the hierarchy may possibly relate to the change in popularity of the proteins at different levels of organization, and hence hub-proteins preserve their level-dependent influence across the hierarchy 10 . Such behaviour in the PPI networks can be correlated to their weaker resilience and instability at sub-system/modular level which may be critical for certain functional modules due to malfunctions in the key regulator hub-proteins.
The Centrality measures are used to assess the importance of the nodes in information processing in the network. Betweenness centrality C B , closeness centrality C C , eigenvector centrality C E and subgraph centrality C S are the topological properties which can determine efficiency of signal transmission in a network 25 . The behaviour of these parameters exhibiting power law as a function of degree k with positive exponents, where the centralities tend to increase with higher degree nodes (Eq. 13) (Fig. 2), reveals the increase in efficiency of signal processing with higher degree nodes in PCa network, showing the importance of hubs in controlling the flow of information, thereby regulating and stabilizing the network organization. Therefore, hub-proteins have a significant influence in regulating the network although they do not control the whole network completely, thereby increasing the risk of being targeted in the network. Hence, the certain hubs might be acting as key regulators in PCa and the 19 predicted key regulators might serve as a backbone of the network.
Community detection of the network using Louvain modularity optimization method led to clustering of the primary PCa network up to the level of motifs (Fig. 3A). This clustering showed that modularity, Q, of the networks exhibit an increasing pattern with the topological levels with highest average modularity (Q = 0.5527) seen at the first hierarchical level of PCa network and lowest (Q = 0.0013) at level V, that is, at the motif level 35,36 . In complex PPI network the modules have biological meanings and gene ontology analyses have revealed enrichment of certain known functions and pathways in the modules 37 . The functional homogeneity in the modules of PCa network has been correlated to their mean clustering coefficients as modules with higher mean clustering coefficients have better chance to be associated with specific functions 63,64 . Moreover, in disease interactome, the disease modules which are unique modules representing the interaction between disease genes and their neighbourhood, overlaps with the topological modules derived from the network and functional modules associated with functions and are interrelated 65 . Primary PCa network is composed of 14 modules deduced from the community detection method with their mean clustering coefficients C(k) ~ 0.094−0.392 (Table 3). Among them modules C12 and C13 which were the largest had the highest mean clustering coefficients, C(k) = 0.392 & 0.218, respectively, showing a functional homogeneity in these modules. These modules have been analysed for their functional annotations with DAVID functional annotation tool 34 which revealed association with different functions (Table 3). Modules C12 and C13 are represented with ribosomal biosynthesis and transcriptional regulation, respectively. This suggests a bigger role of RPs in PCa which is also evident from the representation of various RPs (RPL3, 5, 6, 11, 15, 19 etc.) as key regulators in PCa network. Transcriptional regulation is the most important level of gene regulation which is accomplished mainly through interaction of transcription factors along with their cofactors to the promoter regions of many genes. The tumour suppressor transcription factor (TF) p53 gene-the most mutated among all PCa-is one of the hub proteins represented in this community. Another important TF, c−MYC-an oncogene acting as a regulator of the cell cycle progression and cell division-is also represented in this community. Moreover, reports on regulations of p53 with the key ribosomal proteins (RPL5, RPL6, RPL11 etc.) and c−MYC key regulator CUL7 through p53 in several cancers suggest a critical association of transcriptional regulation in PCa.
Since the study of complex hierarchical networks is incomplete without understanding the modularity of sub-communities and the roles played by the nodes in the modules, our study applied the approach to characterize the nodes in PCa network through defining their within-module Z score Z i with their participation coefficients P i 30 . In the PCa network many hub proteins act as modular kinless hubs or connector modular hubs maintaining the links within the modules as well as connecting other modules at the same level ( Fig. 4A-C). This shows the importance of the hub-proteins in the hierarchical organization of the network exhibiting their involvement in establishing links among the nodes in each module as well as among the modules in the network which are associated with specific functions.

Conclusions
This paper introduces a new method for finding key regulators in prostate adenocarcinomas using biological networks constructed from high throughput datasets of Prostate cancer patients. The network theoretical approach used here placed equal emphasis on the hubs, motifs and modules of the network to identify key regulators/regulatory pathways, not restricting only to overrepresented motifs or hubs. It established a relationship between hubs, modules and motifs. The network used all genes associated with the disease, rather than using manually curated datasets. Highest degree hubs (k ≥ 65) were identified, out of which 19 were novel key regulators. The network, as evident from fractal nature in topological parameters, was a self-organized network and lacked a central control mechanism. Identification of novel key regulators in prostate cancer, particularly ribosomal proteins add new dimension to the understanding of PCa and its treatment and predicting key disease genes/pathways within network theoretical framework. This method can be used to any networks constructed from patients' datasets which follow hierarchical topology.