Identification of key regulators in Sarcoidosis through multidimensional systems biological approach

Sarcoidosis is a multi-organ disorder where immunology, genetic and environmental factors play a key role in causing Sarcoidosis, but its molecular mechanism remains unclear. Identification of its genetics profiling that regulates the Sarcoidosis network will be one of the main challenges to understand its aetiology. We have identified differentially expressed genes (DEGs) by analyzing the gene expression profiling of Sarcoidosis and compared it with healthy control. Gene set enrichment analysis showed that these DEGs were mainly enriched in the inflammatory response, immune system, and pathways in cancer. Sarcoidosis protein interaction network was constructed by a total of 877 DEGs (up-down) and calculated its network topological properties, which follow hierarchical scale-free fractal nature up to six levels of the organization. We identified a large number of leading hubs that contain six key regulators (KRs) including ICOS, CTLA4, FLT3LG, CD33, GPR29 and ITGA4 are deeply rooted in the network from top to bottom, considering a backbone of the network. We identified the transcriptional factors (TFs) which are closely interacted with KRs. These genes and their TFs regulating the Sarcoidosis network are expected to be the main target for the therapeutic approaches and potential biomarkers. However, experimental validations of KRs needed to confirm their efficacy.


Results
Gene expression profiling of sarcoidosis through microarray data. This study provides information on the structure of correlation-based tuning between genes in multiple microarray datasets by comparing analysis across datasets that is relevant in understanding gene functions. Each series has a different number of differentially expression genes, as presented in Table 1. Based on the differential expression analysis of six GSE series, a total of 1,182 DEGs were identified, of which 263 were up-regulated and 919 were down-regulated genes, respectively (Table S1).
Gene ontology and pathway analysis of DEGs. The biological function and pathways enrichment was analyzed for a total of 172 up and 705 down-regulated genes. We found that the DEGs were significantly enriched in many biological, cellular, and molecular functions as well as some pathways. The modified Fisher exact p-value (EASE score) ≤ 0.05 is considered strongly enriched. The top 10 enriched biological functions are presented in Table 2. By analyzing the BP, we found that the up-regulated DEGs from the SARC's PPI network were enriched in positive regulation of gene expression, positive regulation of protein kinase activity, osteoblast differentiation, inflammatory response, and single organismal cell-cell adhesion. At the same time, the downregulated DEGs were significantly involved in the immune response, inflammatory response, signal transduction, adaptive immune response, and innate immune response. The up-regulated DEGs were correlated with the SARC network: hierarchical scale-free features. The primary SARC PPI network was constructed by up and down-regulated genes that contain 877 nodes and 10,546 edges; the remaining genes have not shown their interaction and were excluded from the network. The network's topological properties follow hierarchical characteristics 13 and scale-free behavior in these parameters because of the power-law nature 14,15 . The probability of node degree distributions (P), clustering coefficient (C), and neighborhood connectivity (CN) against degree k exhibit fractal nature or power-law (Figs. 1a, 2a first rows against level 0). The power-law fits on the data distributions was performed and validated by following the standard statistical fitting procedure given by Clauset et al. 16 , where, all the statistical p-value for all datasets was calculated against 2500 random sampling are found to be > 0.1 (greater than one), and the goodness of fits are found to be ≤ 0.33 (less than and equal to) which is the threshold value predicted. These distributions are done on a log-log plot through a straight line 15 .
The negative value of P(k) and C(k) indicates that the primary SARC network follows a hierarchical scale-free fractal network. The positive value of C N (k) indicates the nature of assortativity that regulates the primary SARC network by identifying a large cluster of degree-nodes (rich club formation).
Similarly, the network centrality parameters: closeness (C C ), betweenness (C B ), and eigenvector (C E ) centralities also show fractal behavior. The positive values of these centrality parameters indicate that the leading hubs in the SARC network play a strong regulatory role.
(1) munities that were further broken down into sub-community and sub-sub-community up to sixth level. The modular structure and its arrangement were carried out by the standard community finding techniques of Newman and Girvan 17 at different organizational levels (Fig. 3). Using this approach, we found that our network is organized hierarchically through six different levels. The corresponding Hamiltonian Energy (HE) is decreased from top to bottom in a network organization against the different organizational levels (Fig. 4a). The leading hubs (nodes) are essential regulators depending on the changes in the activities of proteins/ genes and their regulating mechanism. All of the leading hubs are not a key regulator for disease progression, but only those hubs that regulate the network from top to bottom where the network cannot be further divided into sub-community and form motif have been considered to be important leading hubs. We termed them as "Key Regulators (KRs)" because; they were deeply rooted hub genes which can reach motif level (fundamental regulating unit) through different levels of the organization via communities or sub-communities from primary network to motif level. These key regulators are treated as the backbone to maintaining the network's stability, as they capacitate the network to tackle any unacceptable changes in it.
We identified six key regulators, namely ICOS, CTLA4, GPR29, FLT3LG, CD33, and ITGA4, which are the SARC network's key regulators or organizers. These key regulators were separated from each other after level 2, ICOS-CTLA4-GPR29 moved into the same sub-communities, and FLT3LG-CD33-ITGA4 moved into another sub-community and then moved separately till the sixth level (motif). ICOS-CTLA4-GPR29 and FLT3LG-CD33-ITGA4 are forming a triangular motif (Fig. 6a). ANPEP-IL2RA and FOXN2-TNFR3F25 reached the sixth level but because they don't form motif they could not be considered as key regulators (Fig. S1).
Then, the top 100 hubs were ranked by the number of degrees. Surprisingly, none of these KRs genes fall into the top 10 leading hubs categories. However, two key regulators, CTLA4 and CD33 were among the top 100 high [a] Figure 1. SARC PPI network and sub-networks followed hierarchical scale-free topologies. (a) The behaviors of degree distributions P(k), neighborhood connectivity C N (k), clustering co-efficient (k), closeness C C (k), betweenness C B (k) and eigenvector C E (k) measurements as a function with degree k for an original primary network (level 0) and FLT3LG-CD33-ITGA4 motif knockout networks at various levels of organization (level 1-4). (b) the changes in the exponent values of the six topological properties of the FLT3LG-CD33-ITGA4 motif knockout network [colors corresponding to the ones used in the topological properties plots, i.e., violet for P(k), orange for C N (k), blue for C(k), green for C C (k), maroon for C B (k) and cyan for C E (k)] compared with the topological properties' exponents of the corresponding original networks (black) at various levels of the organization. γ, α, β, δ, µ and τ are the exponents of the degree distribution, neighborhood connectivity, clustering coefficient, closeness centrality, betweenness centrality and eigenvector centrality, respectively. www.nature.com/scientificreports/ degree hubs (Fig. 4c). It means that KRs don't always need to be the network's large leading hubs; rather, they can randomly change their popularity at various levels of an organization (Fig. 3). All the key regulators maintained low popularity or profile, but essential regulators in the SARC network; they regulate the motif level of organization. Few more genes, namely, ANPEP, IL2RA, FOXN2 and TNFR3F25 supported the network reached till the sixth level. IL2RA was among the top 100 high degree hub genes. These key regulators act as signal propagators from top to bottom and vice versa to maintain the stability of the networks, whenever the network is under external stress and inherent properties. According to the highest degree, the top 10 leading hubs are IL6, PTPRC, ITGAM, CD86, CTLA4, CCR5, ITGB2, ITGAX, LCP2 and SELL. Functional pathways enrichment analysis suggested that the top 10 leading hub and key regulators are mainly enriched in the Hematopoietic cell lineage, Cell adhesion molecules (CAMs), Pathways in cancer, Intestinal immune network for IgA production, Tuberculosis, Transcriptional misregulation in cancer, Rheumatoid arthritis, T cell receptor signaling pathway, Chemokine signaling pathway and Cytokinecytokine receptor interaction (Fig. 5).
We have computed the Probability P y x l of key regulators to understand the regulating ability of each of the six key regulators: where, x = number of edges x [l] at level l. E [l] = total number of edges of the network or modules or sub-modules.
The computed Probability P y x l of all the key regulators shows an increase in P y values from top to bottom, which increases the level l. This means the regulatory role of each fundamental regulator becomes more powerful at deeper levels of the organization and active workers at the grassroots level (Fig. 6b). [a]  Figure 2. SARC PPI network and sub-networks followed hierarchical scale-free topologies. (a) The behaviors of degree distributions P(k), neighborhood connectivity C N (k), clustering co-efficient (k), closeness C C (k), betweenness C B (k) and eigenvector C E (k) measurements as a function with degree k for an original primary network (level 0) and ICOS-CTLA4-GPR29 motif knockout networks at various levels of organization (level 1-4). (b) the changes in the exponent values of the six topological properties of the ICOS-CTLA4-GPR29 motif knockout network [colors corresponding to the ones used in the topological properties plots, i.e., violet for P(k), orange for C N (k), blue for C(k), green for C C (k), maroon for C B (k) and cyan for C E (k)] compared with the topological properties' exponents of the corresponding original networks (black) at various levels of the organization. γ, α, β, δ, µ and τ are the exponents of the degree distribution, neighborhood connectivity, clustering coefficient, closeness centrality, betweenness centrality and eigenvector centrality, respectively.
significance of the key regulators in a SARC network, changes in the topological properties of the network are finally studied by removing key regulators from the network. It demonstrates the importance of the key regulators in the SARC Network. The knockout experiment was carried out separately for both the motifs; FLT3LG-CD33-ITGA4 and ICOS-CTLA4-GPR29 are triangular motifs. In both cases, a considerable change in the topological properties of the network has been observed, but somehow the network was reorganized itself and has tolerance against network error. In all the key regulators or motif knockout network, the decrease in the exponent of P(k) γ indicates that the network self-reorganizes to stabilize and save the network properties from the breakdown. The increase in exponent of C(k) α indicates community compactness increases to save the communities from breakdown. In the deeper levels of the organization, the positive exponent value of C N (k) β becomes negative, which indicates that the network is most tolerant and dis-assortative in nature. It is observed that the exponent value of C B (k) μ in the network first increases then decreases because of the removal of key regulators but again, the value of μ increases, which indicates the decreasing importance of the regulatory roles of the remaining hubs but reorganize themselves to prevent the network breakdown. The increase in exponent of C C (k) δ indicates that information processing in the network becomes faster when key regulators are removed, and hence reorganize the perturbed network and save it from breakdown. Further, the decreases in the exponent value of eigenvector centrality τ   www.nature.com/scientificreports/ indicate that transmission of information is diminished because the key regulators are removed (Figs. 1a, 2a). In all the key regulators or motif knockout experiments, the values of the exponent for all the topological properties show drastic changes in deeper levels of the organization, but we did not get a breakdown of the network and maintains the hierarchical features of its organization after removing the key regulators or motif (Figs. 1b, 2b). The change in γ etc., for Figs. 1b and 2b gives an overall picture of how important these two motifs. While ICOS-CTLA4-GPR29 motif knockout has greater impact on destroying scale free and assortative nature of the network at lower levels, on the other hand FLT3LG-CD33-ITGA4 motif has a little or no effect on the integrity of the network as compared to ICOS-CTLA4-GPR29 motif.
Energy distribution in the network: calculation of Hamiltonian energy. The Hamiltonian Energy calculations for a network within CPM's formalism analyze competitive contributions from the organization of nodes and edges, and this energy is used to organize or reorganize the network at different levels. This technique can also amplify the important changes in the organization of the network as it goes down to different levels of the organization, capturing the importance of hubs in the network and also at the modular level. Hamiltonian Energy formalism, therefore, proves to be a powerful technique for considering differences in the organization of a network 18 .
θ is the change in Hamiltonian functions due to removal of key regulators at level θ, where HE [L0] θ and HE [R] θ are the Hamiltonian functions for original and removed SARC networks respectively and corresponding community/sub-communities, then we obtain, where HE θ = HE [R] θ . This demonstrates that removal of KRs causes slight destruction of wiring or rewiring energy that is propagated at all levels of the organization of the SARC network. The relative energy of every key regulators can have at various levels of network organization is shown in Fig. 7.  IL6  PTPRC  ITGAM  CD86  CTLA4  CCR5  ITGB2  ITGAX  LCP2  SELL  TLR8  TYROBP  IFNG  CCR7  CXCR4  SPI1  IL1B  CCL5  CXCL10  VCAM1  CD28  MMP9  EGFR  STAT1  CCL4  LCK  CCL2  IL10RA  CD2  CCR2  PLEK  IL15  CXCR3  IRF8  FCGR2B  CD69  CSF1R  ZAP70  CXCL9   IL7  IL2RB  CD68  CD4  GZMB  TLR3  ITGAL  LILRB2  BTK  FGR  PRF1  FCGR1A  CD48  CYBB  HCK  IKZF1  IL2RA  FYN  CCL3  LYN  RAC2  CD53  IRF4  CD27  IL2RG  TLR6  FCGR3A  CXCL2  IGF1  CD3E [c] Hamiltonian Energy (HE) www.nature.com/scientificreports/ The Hamiltonian Energy was calculated for hubs with all possible communities in the network at each level. We find that the distribution of energy in the primary SARC network is highest and starts to decrease as the organizational levels increase. The decrease in Hamiltonian Energy indicates the dominance of the interacting edges over the network size, indicating fast processing of information.
Next, in the KRs knockout experiments, we calculated Hamiltonian Energy from the network and communities or sub-communities in terms of understanding the change in energy distributions within the SARC network. Due to KRs knockout, a minor decrease in the Hamiltonian energy is observed at each level (Fig. 7). This means that the elimination of KRs causes a significant loss of wiring or rewiring energy that is propagated across the level of network organization. However, the network does not collapse and maintains the hierarchical features of its organization. This indicates that the network is sensitive to perturbation but tries to maintain its network organization and properties, which are elegantly robust.
Compactness of network: LCP-DP approach. The LCP architecture not only assists the quick transfer of data through the different network community but also through local processing too. Using LCP approach, we analyzed the SARC network to check its self-organization behavior at different levels of network organization. The LCP-corr of all the communities or sub-communities was measured at different levels presented in Fig. 4b. At each level, the average values of LCP-corr are greater than 0.853 (zero LCP-corr communities aren't taken on www.nature.com/scientificreports/ average) and these values do not change with the error bar. This means that the network maintains self-organization and compactness and has effective data processing. It serves as a strong dynamic and heterogeneous LCP networks which help in network evolution and reorganization.
miRNA key regulators network. ENCORI was used for screening the key regulator's targeted miRNAs.
Seven databases were predicted to identify the miRNAs as the targeted miRNAs of the key regulators. Further, Cytoscape (V 3.6.1) was used to draw the network of the miRNA-key regulator. The resulting network of interactions contains six key regulators and 77 miRNAs, as presented in Fig. 8a. In the Supplementary file, the respective miRNAs targeting key regulators are presented in Table S2.
TF-key regulators regulatory network. NetworkAnalyst has also enriched TF-gene interactions. ChEA databases were used to predict the TF-KRs interactions. The resulting interaction network consists of 6 key regulators and 65 transcription factors. Furthermore, it has been found that various transcription factors regulator which regulate more than two KRs;, among them, five transcription factors were identified with the highest interaction degree ≥ 3 in the TF-Key regulator's regulatory network (Table 3). This implies that these transcription factors have strong connections with these key regulators (Fig. 8b). In the Supplementary file, detailed information of transcription factors of key regulators are presented in Table S3.   www.nature.com/scientificreports/

Discussion
Although some progress in the study of SARC has been made, the exact molecular mechanisms of occurrence and development in SARC are still unclear. Therefore, studying the mechanism to identify the molecular targets for diagnosis and treatment is crucial. In recent decades, the quest for DEGs has been accelerated and its differential expression widely spread. In this study, the raw gene expression data of six GSE series were obtained from the GEO dataset and a total of 1,126 DEGs were identified, including 270 up-regulated and 856 down-regulated genes that surpassed the cut-off criteria of p-values and fold change. The KEGG pathways results indicate that the down-regulated DEGs were mainly linked with cytokine-cytokine receptor interaction, Tuberculosis, Osteoclast differentiation, pathways in cancer and Human T lymphotropic virus type I infection. In comparison, the up-regulated DEGs were not enriched. These findings also provide helpful evidence for the study of molecular interactions in SARC progression. Indeed, several research studies have indicated that tuberculosis and pathway in cancer are highly associated with the growth and development of SARC. Many studies have been reported a strong association between a history of tuberculosis patients with a higher risk for lung cancer and related mortality. The association between Tuberculosis and the risk of lung cancer in a high-income country was identified in a prospective Korean cohort research study 19 . In patients with a history of lung disease, oxidative stress and local chronic inflammation are mechanisms that increase the risk of lung cancer. Fibrosis is important in the maintenance of inflammation [20][21][22] . A correlation between SARC and lung cancer has been identified in similar studies [23][24][25][26] . In patients with SARC, immunologic defects can result from a lack of immune response against tumors or oncogenic viruses. In comparison, chronic inflammation associated with SARC can contribute to the development of cancer 27 . However, the correlation between Osteoclast differentiation and SARC remains unclear.
Furthermore, the SARC network was constructed from up and down-regulated genes (DEGs) that gave a network with 877 nodes and 10,546 edges. The constructed network showed hierarchical scale-free behavior, and it means that the network has system-level organizations that involve interconnected communities or sub-communities. Since the nature of the network is hierarchical, each gene activity does not have much importance, but its synchronization shows different significant functional regulations of the network. In the process, individual gene activities assume less significance. In our study, 6 genes out of 877 genes in the network, namely ICOS ↑ , CTLA4 ↓ , GPR29/CCR6 ↓ , FLT3LG ↓ , CD33 ↓ , and ITGA4 ↓ were the most influential key regulators of the SARC network. These key regulators act as the backbone of network activities and its regulations which could be the most probable target gene of disease. Earlier it has been identified that ICOS, CTLA4 and CCR6 polymorphism is related to autoimmune disease risk in patients with Sarcoidosis [28][29][30] . These key regulators are found to reach the same community and formed a triangular motif till the last level. These genes are also involved in several other diseases which is life-threatening including various type of cancer, Acute Leukemia, Acute Promyelocytic Leukemia, Common variable immune deficiency (CVID), Autoimmune lymphoproliferative syndrome type V, Multiple sclerosis, celiac Disease, Immune system Disease, Crohn's disease and Alzheimer, etc. presented (Fig. 9).
Our study reported that the gene ICOS is the up-regulated gene in SARC patients compared to healthy controls, as determined by a SARC network analysis. ICOS (Inducible T Cell Co-Stimulator) is a co-stimulatory molecule that belongs to the CTLA4 and CD28 cell surface receptor family. Although CD28 is expressed on T cells constitutively to emerge signal for resting T cells to fully activated, ICOS is only up-regulated after activation of cells. A positive signal is provided by this molecule to increase the proliferation of T cells. Studies have been shown that the blocking of ICOS results in the inhibition of immune responses for the T helper type-1, T helper type-2 and T helper type-17 31 . Moreover, recent research has shown that in ICOS-deficient patients, impaired function is observed in CD4 + and CD8 + T cells.
Our finding suggested that the five genes were down-regulated, in which CTLA4 (cytotoxic T lymphocyte antigen 4) is a member of immunoglobulin's superfamily, which can inhibit T-cell activation, proliferation and lead to the incidence of peripheral immune tolerance. CTLA4 is a cell surface receptor related to CD28, which binds to CD80 and CD86 ligands. CTLA4 binding to CD86 and CD80 delivers a negative signal to activate T cells by making CD86 and CD80 less accessible to CD28 32 .
The trans-membrane protein CD33 (Siglec-3) is a sialic acid-binding immunoglobulin like lectin and is expressed in hematopoietic and immune cells. CD33 recognizes glycolipid and glycoprotein. Sialic acid residues have one or more immune-receptor tyrosine based inhibition motif and mediate cell-cell interactions that restrict or inhibit immune responses. The function of CD33 has been involved in many processes such as immune cell growth, immune or malignant cell in adhesion processes, and inhibition of cytokine release by monocytes and endocytosis. However, no studies on CD33 with respect to SARC have been performed. In this study, we found that only one potential miRNA hsa-miR-335-5p that CD33 might target. www.nature.com/scientificreports/ The GPR29 gene that encodes the protein CCR6 (C-C chemokine receptor type 6) is expressed predominantly in dendritic cells (DC) and memory T cells which is a B cell maturation and differentiation. It is involved in recruiting and migrating DCs and T cells during immunological responses. CCR6 only binds CCL20 and β-defensins.
The ITGA4 gene encodes a member of the protein family of integrin alpha chains. The ITGA4 integrin family mediates cell-cell adhesions that are particularly important for immune function. Alpha 4 integrin's are involved in the surveillance, haematopoiesis, inflammation and pathogenesis of cardiovascular diseases. Up-regulation of ITGA4 has been reported in various malignancies in different studies, such as breast cancer, neuroblastoma and melanoma and immune disorders such as Crohn's disease and multiple sclerosis. Down-regulation of ITGA4 and its ligands or inhibition of ITGA4 ligand complex formation was considered a possible therapeutic approach. However, no studies on ITGA4 with respect to SARC have been performed.
FLT3LG is a protein-coding gene. DCs provide the key association between innate and adaptive immunity by recognizing pathogens and priming immune responses specific to the pathogen. FLT3LG regulates the production of DCs and is especially essential for the positive classical DCs of plasmacytoid DCs and CD8 and their CD103 positive tissue counterparts. However, there is no report on the correlation between FLT3LG and SARC. We also found that 6 potential miRNA (hsa-miR-381-3p, 493-5p, 522-3p, 300, 1287-5p, 3150a-3p) that FLT3LG might targeted. SARC is closely related to the immune response. Excessive activation of the immune response to unknown inhaled antigens is considered to be one of the pathogenesis of SARC 33 . Most of the DEGs related to SARC, which we obtained are also related to the immune response. This study believes that the complex relationship of these immune-related DEGs may lead to excessive immune responses.
The network shows fractal nature because of its topological properties, which follow a power-law distribution. It indicates that the network is self-organization and stable. Therefore, the network has a significance of hierarchical properties, and it has no central control system. The KRs knockout experiments show the slight changes in topological properties of the network. However, we did not get a network breakdown, and the network keeps functionally reorganizing itself to stabilize the removal of these key regulators, which is evidence of self-organization. The SARC networks' self-organizing behaviors were also examined by the LCP approach, which leads us to conclude that the network maintains self-organization and is compact with efficient processing of information.
The function of genes is regulated at both transcriptional and post-transcriptional levels. Therefore, we studied the miRNAs-KRs and TFs-KRs networks to provide deeper insights into the regulatory behavior of the  www.nature.com/scientificreports/ identified key regulators. TFs drive gene transcription which may be in a coordinated fashion through genes with associated functions. On the other hand, miRNA are especially powerful regulators of transcript levels at the post-transcriptional level, while it should be observed that there are other less potent and less well-defined categories of non-coding RNAs that also affect transcript levels post-transcriptionally. Thus, we used miRNA and TF targets to identify their targets among the key regulators involved in SARC. In this study we identify some TFs with highest connection with key regulators. RUNX1 is involved in immune response, angiogenesis, embryonic development, hematopoiesis and tumorigenesis 34 . PPARD is a receptor of nuclear hormones which regulates a range of biological processes. It has been suggested that this gene plays a role in the development of many chronic diseases including atherosclerosis, obesity, cancer and diabetes 35 . STAT3 is a transcription factor of cellular signal involved in the regulation of several cellular processes such as cell proliferation, cell differentiation and angiogenesis in normal cells. Diseases like immunodeficiency autoimmunity and cancer are associated with mutations in human STAT3 36 . The PAX3 gene encodes a member of the transcription factors of the paired box or PAX family. During the formation of the skeletal muscle, neural crest derivatives and central nervous system, this protein is expressed and regulates the expression of target genes that impact on differentiation, proliferation, survival and motility in these lineages. PAX3 is also involved in many type of cancers 37 . SMAD4 belongs to the family of signal transduction proteins which are phosphorylated and activated by trans-membrane serine threonine receptor kinases in response to TGF-β signaling through many pathways. The function of SMAD4 as a tumor suppressor and inhibits the proliferation of epithelial cells 38 . Our finding showed that these transcription factors formed a linked regulatory network with KRs; therefore, our result signified that the dynamic changes in these transcription factors activities appear in SARC which may play a significant role in regulating the gene function and expression of KRs associated with the appearance and development of SARC. Therefore, according to this study, the identified few key regulators may act as therapeutic targets for SARC in the future. There are some limitations, such as sample size is limited. In addition, we may not further investigate how KRs-miRNAs networks effects the diagnosis and treatment of SARC in details because of the lack of experimental studies and validations. Despite these limitations, this analysis may provide more accurate results based on the integrated bioinformatics analysis compared to single dataset studies.

Conclusion
In this study, we performed an integrated analysis based on six microarray gene expression profiles of Sarcoidosis and healthy control to identify DEGs and their associated biological function, and pathways enrichment analysis was performed. The protein interaction network was constructed and analyzed its topological properties and uncovered novel key regulators for Sarcoidosis. Moreover, we constructed miRNA-KRs and TF-KRs network, to provide deeper insight into the regulatory behavior. Our result demonstrated the importance of key regulators and found them to reach the same community and form a triangular motif. All of the genes are known to be involved in immune response and its metabolism. Therefore, these genes and factors are also likely to play a significant role in SARC, considering the preventive impact of immune response on the appearance of this disease. However, the sample size is limited; further studies are also needed to validate the expression and function of the identified key regulators in Sarcoidosis.

Methodology
Sarcoidosis associated microarray datasets selection. The NCBI-GEO 39 dataset is an accessible database that contains gene profiles. Six microarray datasets GSE16538 40 , GSE18781 41 , GSE19314 42 , GSE19976 6 , GSE37912 43 and GSE75023 44 were downloaded from GEO datasets 45 . In our study, the datasets were selected based on inclusion and exclusion criteria that are (i) Sarcoidosis patient and healthy control studies of humans. All the datasets and references, which confirmed to the criteria as mentioned above, were manually screened. No ethical approval was required as this study is purely based on bioinformatic analysis. GEO2R 45 is an online program that allows the comparison and evaluation under the same experimental conditions of two distinct groups of samples. In this study, the selected SARC and healthy control datasets were pre-processed using GEO2R for background correction and normalization. This is based on limma R package 46 . Subsequently, the results of the finding were downloaded in the format of MS Excel, and genes that followed the |logFC (fold change)| ≥ 1 and P-value < 0.05 primary cut-off criteria were considered as DEGs (including regulated genes Up and down). The probes ID without gene annotation or more than one gene annotation were filtered out; the average value of multiple probes corresponding to the same. The probe IDs were converted to gene symbols using Synergizer online server 47 50 with an interaction score > 0.40 as the threshold. Through STRING, protein-protein interactions can be investigated and analyzed, the interactions being functional as well as physical associations. These associations are obtained from text-mining, experiment, co-expression analysis, other databases, gene fusion, neighborhood and co-recurrence. Subsequently, in the Cytoscape software (V 3.6.1) 51 the SARC PPI network was visualized and analyzed.

Identification of differentially expressed genes.
Characterization of networks topological properties. The Structural properties of complex networks were described through topological parameter behaviors. The SARC network's topological properties were computed using the Network Analyzer 52 and CytoNCA 53 plugin in Cytoscape. The topological properties analyzed in this study are defined below: Probability of degree distribution. In a PPI network, the degree k represents the number of links the node connects with other nodes. If G = (N, E) describes a graph of a network, where N and E represent the node and edges respectively. The network's degree distribution probability (P(k)) is measured by, where nk = Number of nodes having degree k and N = Total number of nodes in the network. P(k) of small world and random network follows Poisson distribution while, for real world, scale free and hierarchical network obeys power-law P(k) ~ k −γ , where, γ is the exponent of degree distribution 54,55 . In hierarchical networks the value of γ becomes close to γ*2.26 (mean-field value) which indicates the importance of community with hubs in the network 13,14 .
Clustering coefficients. In a PPI network, the clustering coefficient (C(k)) describes how strongly node neighborhoods are internally connected. This is the ratio of the number of its closest neighborhood edges e i to the total likely number of edges of degree k i . Clustering coefficient (C(ki)) of i th node for an undirected network can be measured by, where e i = Total number of connected pairs among all closest neighbors of the node i, k i = degree of the node i.
The average clustering coefficient (C(k)) characterizes the entire organization of clusters in the network. Similarly (C(k)), P(k) probably depends on network size. In scale-free networks C(k) ~ constant, but it obeys power-law in hierarchical network with degree, C(k) ~ k −α , with α ~ 1, where, α is the exponent of Clustering coefficient 13,15 .
Neighborhood connectivity. The average connectivity of a node's closest neighbors in a network represents the node's neighborhood connectivity in the network 56 . The neighborhood connectivity is measured by, where P q k = conditional probability that a connection belonging to a node with connectivity k to another node having q connectivity.
In scale free network, C N (k) ~ constant, while the hierarchical network obeys power-law in degree k, C N (k) ~ k −β with β ~ 0.5 57 where, β is the exponent of neighborhood connectivity. Furthermore, positive and negative signs in β could be an indication of assortivity & dis-assortivity in network topology respectively 58 .
Betweenness centrality. Betweenness centrality of a node in a PPI network represents the prominence of information flow through one node to another node through the shortest path 59,60 . The geodesic paths are shown from node i to node j by ' dij(v)' which passes through node 'v' and ' dij' . The Betweenness centrality of a node v can be measured by, Closeness centrality. Closeness centrality represents how quickly information is circulated in the network from one node to another node 61 . The Closeness centrality of the node i is described as the reciprocal average length of the geodesic paths between the node and all other nodes connected to it in the network and it is measured by, where dij = length of the geodesic path between nodes i and j, n = total number of nodes in the network connected to node i. www.nature.com/scientificreports/ Eigenvector centrality. In a PPI network, Eigenvector centrality of a node i (C E (i)) in a network is proportional to the sum of closest neighbor centralities 62 , and it is measured by, where nn(i) = closest-neighbors of nodes i in the network. λ = Eigen value of the eigenvector. v i = ' Av i = λv i ' where A is the adjacency matrix of the network. The principal eigenvector of A, which corresponds to the maximum positive eigenvalue πmax, represents a centrality score of its eigenvector. Because the eigenvector centrality function of the node varies smoothly across the network and depends on its neighbors, node with high eigen-vector centrality is embedded in the locality of nodes of high eigen-vector centralities, and chance of having isolated nodes in and around the locality is very low 63 . Thus, the centrality of the eigenvector can be used as an indicator of the spreading power of the node in the network.
Community detection: leading Eigen vector approach. Detecting and characterizing the modular structure and its properties in the hierarchical network are important in identifying network behavior predictions at different levels of hierarchy, as well as accessing the network's organizing principle in the study. In this study, the Leading Eigen Vector (LEV) approach 64,65 was used in R from the package 'igraph' (http:// igraph. sf. net) 66 to detect the community or modules. The LEV approach is the most effective approach for community detection as it calculates the Eigenvalue for each link, which illustrates the importance of each link, not nodes. We used this approach to detect modules from the primary network, sub-modules from modules at each level of organization, and so on until the motifs level is reached (i.e., 3 nodes and 3 edges), which is the last level of network organization after which the network cannot be further broken. Identifying any sub-module as a community was based on the criterion that it should be found to contain at least one triangular motif (defined by G (3,3). All the communities, sub-community and sub-sub-community are classified as level-1, level-2 and so on.
Genes tracing across the networks. In a network, all hubs are important regulators and only those genes which regulate the network from up to down (top to motif level) were considered as the most important and persuasive genes. These genes are termed as 'Key Regulators' of the network. To identify these key regulators in the SARC network was done through gene tracing. This gene tracing was conducted up to the level of the motif in different communities or sub-communities obtained from Newman and Girvan's method of community detection or clustering 65 . Through tracing, the most important and persuasive genes within the network were identified that regulates the network.

Key regulators knock out experiment.
To understand the change in the network organization was observed through the knockout experiment in the absence of these important nodes. We consecutively removed the identified key regulators from the constructed primary SARC network, after that, we measured different topological properties of the reorganized or modified network to study the regulating abilities of these key genes by measuring the degree of structural change due to their absence. Each time we measured the topological properties using Network Analyzer, while in Cytoscape, we used another CytoNCA 53 plugin for topological properties for Eigenvalue calculation.
Energy distribution in the network: calculation of Hamiltonian energy. At each level of the network, by following the formalism of the Constant Potts Model (CPM), the Hamiltonian energy (HE) is used as a technique to organize a network at a certain level. HE gives the energy distribution at the global level as well as at the modular level of the network 67,68 . HE of a network and community or sub-community can be calculated by, where e c = Number of edges in a community c. n c = Number of nodes in a community c and γ = the resolution parameter acting as edge density threshold which is set to be 0.5.
Further, in the KRs knockout experiment, after removing key regulators from the network, we calculated the HE of network and communities or sub-communities at each level. The difference in HE of the primary SARC network and the key regulators removed SARC network calculated the perturbation caused by the key regulators.
here L = level in the network, θ = key regulators removed network.

Compactness of network: local-community-paradigm (LCP) approach. The LCP Decomposition
Plot (LCP-DP) is an approach to represent the topological properties of a network in 2D (two dimensional) space of common neighbour's (CN) index of interacting nodes and local community links (LCL) of each pair of interacting nodes in the network, and it provides number, information of size, and firmness of communities in a www.nature.com/scientificreports/ network. This can further be used as a measure of self-organization in the network 69 . The LCP correlation (LCPcorr) is the Pearson correlation coefficient between the variables LCL and CN and it is measures as; where cov (CN, LCL) = the covariance between LCL and CN, σ CN and σ LCL = standard deviation of LCL and CN.

miRNA-key regulators network construction. The Encyclopedia of RNA Interactomes (ENCORI) is
an accessible web-based tool that focuses mainly on interactions with miRNA targets 70 . ENCORI uses seven developed miRNA target prediction databases, including TargetScan, miRanda, PITA, PicTar, microT, RNA22 and miRmap. In this study, the targeted miRNAs of key regulators were considered miRNAs. Subsequently, this was visualized in Cytoscape software and analyzed the co-expression network of key regulators and their targeted miRNAs 51 .
TF-key regulators regulatory network. Network Analyst is a comprehensive, accessible web-based tool for network visual analytics of gene expression profiles, statistical meta-analysis and data interpretation 71 . The integrative study of TF-gene interactions for input genes can be supported, and TF's effect on the functional pathways and expression of the key regulators can be assessed. In this study, TF-KRs interaction was predicted using the ChEA database and Cytoscape software was constructed and visualized the TF-KRs regulatory network.