Investigation of MicroRNA and transcription factor mediated regulatory network for silicosis using systems biology approach

Silicosis is a major health issue among workers exposed to crystalline silica. Genetic susceptibility has been implicated in silicosis. The present research demonstrates key regulatory targets and propagated network of gene/miRNA/transcription factor (TF) with interactions responsible for silicosis by integrating publicly available microarray data using a systems biology approach. Array quality is assessed with the Quality Metrics package of Bioconductor, limma package, and the network is constructed using Cytoscape. We observed and enlist 235 differentially expressed genes (DEGs) having up-regulation expression (85 nos) and down-regulation expression (150 nos.) in silicosis; and 24 TFs for the regulation of these DEGs entangled with thousands of miRNAs. Functional enrichment analysis of the DEGs enlighten that, the maximum number of DEGs are responsible for biological process viz, Rab proteins signal transduction (11 nos.) and Cellular Senescence (20 nos.), whereas IL-17 signaling pathway (16 nos.) and Signalling by Nuclear Receptors (14 nos.) etc. are Biological Pathway involving more DEGs. From the identified 1100 high target microRNA (miRNA)s involved in silicosis, 1055 miRNAs are found to relate with down-regulated genes and 847 miRNAs with up-regulated genes. The CDK19 gene (Up-regulated) is associated with 617 miRNAs whereas down-regulated gene ARID5B is regulated by as high as 747 high target miRNAs. In Prediction of Small-molecule signatures, maximum scoring small-molecule combinations for the DEGs have shown that CGP-60774 (with 20 combinations), alvocidib (with 15 combinations) and with AZD-7762 (24 combinations) with few other drugs having the high probability of success.

exposure to silica dust by job rotation and use of personal protective equipment, etc. to control the issue enhanced with early diagnosisand prediction 9,13 . Bandyopadhyay et al. 14 stated that diagnostic challenges arise as silicosis shows resemblance in radiological and clinical overlap with pulmonary tuberculosis and neoplastic lesions. Therefore, it is necessary to investigate the complex molecular mechanisms that underlie the disease. Identification of differentially expressed genes (DEGs) in response to silica exposure and detail examination of the DEGs using the suitable statistical and computational approach may provide valuable information about the molecular mechanism(s) underlying the toxicity of crystalline silica 15 . The DEGs (upregulation/ downregulation) and/or their products, regulators, mostly transcription factors (TFs) and micro RNA (miRNA) following appropriate validation to recognize as a suitable biomarker(s) for silicosis is under investigation. A thorough investigation of possible molecular targets and mechanisms can make a way to achieve successful treatment options as well as prevention of potential adverse effects of silica exposure. The behavior of genes and genetic components due to silica exposure and toxicity are still rarely explained and are a vast area of scientific research 16 . Zhang et al. 17 . attempted to identify genome-wide aberrant DNA methylation profiling in lung tissues from silicosis patients using Illumina Human Methylation 450 K bead chip arrays. In the past, microarray-based transcriptomics studies have been successfully employed to gain insights into the molecular mechanisms underlying the toxicity of chemicals 7 as well as to identify molecular markers for their toxicities 18 . Advances in high throughput gene expression profiling, such as transcriptomics, microarray analysis can enlighten in a better understanding of the effects of toxic agents in biological systems.
Wang et al. 19 reported phagocytosis of SiO 2 into the lung causes inflammatory on the cascade resulting in fibroblast proliferation and migration followed by fibrosis associated with monocyte chemotactic protein 1 and sustaine increase in p53 and PUMA protein levels. Wang et al. 19 interpreted involvement of MAPK and PI3K pathways in the SiO 2 − induced alteration of p53 and PUMA expression and to have possibility of link between SiO 2 -induced p53/PUMA expression in fibroblasts and cell migration on the basis of miRNAs that can interference with p53 and PUMA and prevent SiO 2 -induced fibroblast activation as well as migration. The study provide insight into the potential use of p53/PUMA in the development of novel therapeutic strategies for silicosis treatment. Therefore, in the current study we target to investigate the molecular mechanism underlying lung cell differentiation by identifying DEGs involved in silicosis, their expression (up/down-regulation), produce the possible network and gene ontology term analysis namely biological pathway and biological process for differentially expressed genes. We have also considered for identification of transcription factors (TFs) and microRNA (miRNA) associated with these DEGs for the regulation of the disease gene. A common network construction combining the relevant of TFs, miRNA, gene network, and small molecules for regulation of the disease progress is targeted.

Methodology
Microarray datasets. In this study, we have considered the work "Mechanisms of crystalline silica-induced pulmonary toxicity revealed by global gene expression profiling" and Dataset GSE30180 from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) 16 . Datasets GSE30180 contains information regarding clinical tissues of human lung epithelial cells (A549 cells) considering 10 samples (5 control and 5 crystalline silica exposed to 800 ug/ml for 6 h using differential gene expression profile induced by silica conducted by National Institute for Occupational Safety and Health USA 16 ). We have considered the available maximum concentration (800 ug/ml) from the study considering general worker's shift exposure time period to have an idea at extreme conditions. Quality Control assessment of data quality is a major concern in microarray analysis. arrayQualityMetrics 20 is from Bioconductor package that provides a report with diagnostic plots for one or two colour microarray data are used.
We have used arrayQualityMetrics for data quality assessment of microarray data and Quality Control is performed to identify potential low-quality arrays. The removal of low-quality arrays is desirable to avoid negative impact in downstream analysis procedures, by introducing invalid information and ultimately impairing statistical and biological significance. Array quality is assessed through the computation of commonly used statistical measures arrayQualityMetrics, an R package for quality control and quality assessment analysis that supports different types of microarrays in R. Here, we use an additional QC/QA that provides an HTML report with interactive plots. The instensity distribution of the arrarys of each box correspeonds to one array which indicated the over all quatity of each array with other corresponds are goods ( Figure S1, S2).

Screening of differentially expressed genes.
To screen DEGs between control and crystalline silica exposed cell, differential expression analysis is conducted using Bioconductor. Bioconductor operates in R (a statistical computing environment) and is applied for genomic data analysis and comprehension. The normalized data is analysed for identification of DEGs by limma package 3.26.8 in R (following adjust = "fdr" sort by B", number 250). The robust MultiArray average method 21 is applied to perform background correction and data normalization using default parameters in the limma package 22 . Subsequently, a differential analysis between silica exposed and the no silica exposed is performed using the limma package 22 , a modified version of the standard t-test incorporating the Benjamini-Hochberg (BH) multiple hypotheses correction technique 23 . DEGs are defined as the false discovery rate (FDR) set as the cut-off parameters to screen out 250 significant increases or decreases in gene expression levels 24 . Identification of transcription factor. IRegulon plugin 25 in Cytoscape (version 3.8.0) 26 is used to detect transcription factors using motif2TF and their optimal sets of direct targets for a set of genes Chip-seq data. The minimum identity between orthologous genes is 0.05%, while the maximum FDR on motif similarity is 0.001. The normalized enrichment score (NES) > 5 is considered as a threshold value for the selection of potential rela-Identification of miRNA (microRNA) for regulation Silicosis. To identify targets, regulators and interactions of the molecular factors included in the deferential gene expressed network, we search for gene-miRNA target and cross-validation using microRNA Data Integration Portal (mirDIP) 27 in 30 database sources such as BCmicrO, BiTargeting, CoMeTa, Cupid, DIANA, ElMMo3, GenMir + + , MAMI, MBStar, MirAncesTar, MirMAP, MirSNP, MirTar, Mirza-G, MultiMiTar, PACCMIT, PITA, PicTar, RNA22, RNAhybrid, RepTar, Tar-getRank, TargetScan, TargetSpy, miRDB, miRTar2GO, miRcode, microrna.org, mirCoX and miRbase for regulating the expressions using both unidirectional and bidirectional search method. All the database considered in the present study is from authenticated and publically available sites 27 .

Functional enrichments analysis.
ClueGO and CluePedia 28 plug-in of Cytoscape is used for functional enrichment analysis. ClueGO plug-in translates functionally grouped Gene Ontology (GO) and pathway annotation networks with a hypergeometric test along with the kappa coefficient 28 of pathways as well as functional correlations among pathways. ClueGO provides enrichment scores for selected gene sets against the user-provided gene list. CluePedia 29 finds new markers that are potentially related with pathways and extend ClueGO functionality with other biological data deriving screened results. Default parameters are used, and only GO terms with P < 0.05 are selected in ClueGO with a Benjamini-Hochberg correction and a kappa score of 0.5 (medium).
Network construction and analysis of clusters. Finally, regulatory networks are constructed for silicosis by merging selected DEGs and TFs-DEGs pairs using Cytoscape. Thereafter, the Molecular Complex Detection (MCODE) plug-in 23 is used to screen Clusters of hub genes from the network with degree cut-off = 10, haircut on, node score cut-off = 0.2, k-core = 2, and max. depth = 100.
Prediction of small-molecule signatures considering DEGs of silicosis. L1000CDS 2 web tool is used for prediction of the potential small-molecule signature that matches user input signature genes expressions based on characteristic direction method in the underlying dataset. It is an ultra-fast LINCS L1000 Characteristic Direction Search Engine for prediction the potential small-molecule signature 30 and DEGs are pasted into up/down text box and the top 50 signatures are considered.

Results and discussion
Identification of differentially expressed genes. The considered dataset is applied for significant DEGs identification by Bioconductor and 235 DEGs are found to be significantly associated with silicosis disease on the basis of the considered criteria. Table 1a, b enlist total of 235 DEGs, where down-regulated genes and up-regulated genes are 150 and 85 nos, respectively. On the basis of important pathway analysis and GO term Sellamathu et al. 16 identified 60 DEGs for crystalline silica exposure in their study.

Identification of transcription factors (TFs) for differentially expressed genes.
Adopting the iRegulon plugin in Cytoscape, 24 TFs are identified from publicly available database signatures/genesets (Gen-eSigDB, Ganesh clusters or/and MSigDB) that are significantly associated with the DEGs involved in the silicosis disease. It is found that 20 TFs influence both up and down-expressed genes whereas four TFs are solely involved in controlling only the down-regulated genes ( Table 2).

Identification of miRNA for DEGs from database-driven expansion of the network. Using
CyTargetLinker 31 authors observed initially only 2716 miRNAs responsible for silicosis affected gene regulation from databases as mentioned in the methodology. Therefore, the authors further explored a few other databases as mentioned in the methodology (public databases around 30 nos). We applied mirDIP and observed a total of 2586 miRNAs associated with targeted DEGs responsible for silicosis and categorized as "very high" (top 1%) and "high" (top 5% (excluding top 1%)) 27 . A total of 1105 miRNA is categorized as very high where 846 miR-NAs influence upregulated genes, 1055 miRNAs regulate down-regulated genes and 801 miRNAs regulates both (Supplementary file 1). For example, among identified miRNA, miRNA-29 influences epithelial-mesenchymal transition 32 , and miRNA-489 can repress its target genes MyD88 and Smad3 responsible for silicosis 33 . Chen et al. 34 reported the involvement of IL-10-producing B cells in the development of silica-induced lung inflammation and fibrosis of mice and many such in the list. Yang et al. 35 observed significant genetic heterogeneity involved in the origin and development of silicosis from research data and recommended relevant miRNAs as biomarkers having a role in regulating pulmonary fibrosis. The research group reported differential miRNAs in leukocytes as up-regulated (18 nos.) and down-regulated (20 nos.) during silicosis, compared with the control group miR-19a in peripheral blood leukocyte 35 . Faxuanet al. 36   Regulatory network construction. We identified targets, regulators and integrators of the molecular factors included in the DEG Network. Gene expressed network is a node and edge interaction between genegene or gene-miRNA and regulatory transcription factors. Precisely, we searched for gene regulatory interaction network as a) TF-target gene interactions and b) miRNA-gene interactions. We merged all the extracted interactions with the gene expressed Network using the same annotation principles as above.
TFs-DEGs network analysis. The interaction network is derived using plugin Cytoscape with 147 Nodes and 769 edges (Fig. 1). The blue nodes indicate here are the genes namely, RELA, JUND, and CEBPB which act simultaneously as TF, so function as both regulator and regulated gene. The most interacting TFs are CHD1, CEBPG, FOXF1, CREB3, RELA, and FOXP2, etc. as given in Table 2. The TF CHD1 is a gene similar to various other genes that act as TF and can regulate the activity of certain genes providing instructions to make epithelialcadherin or E-cadherin (a protein) having influence in cell adhesion, the transmission of chemical signals within cells, control of cell maturation and movement 38 . ATF4 TF is known to regulate memory, metabolism, and adaptation of cells to stress factors such as anoxic insult, endoplasmic reticulum stress, and oxidative stress 39 . In normal bronchial epithelial cells, CEBPG TF correlates with antioxidant and DNA to repair genes which is absent for individuals with bronchogenic carcinoma 40 . Expression of KAT2A(Tyr645Ala), reduces gene expression and inhibits tumor cell proliferation as well as tumor growth 41 . CREB3 influences leukocyte migration, tumor suppression, and endoplasmic reticulum stress-associated protein degradation 42,43 . PAX3 plays critical roles during fetal development as well as neural crest 44,45 . Mutations in paired boxgene 3 are associated with Waardenburg syndrome, craniofacial-deafness-handsyndrome, and alveolar rhabdomyosarcoma 46 . FOXF1 promote slung regeneration after partial pneumonectomy and involved in murine vasculogenesis, lung, and foregut development. Mice with reduced levels of pulmonary FOXF1 may face death due to pulmonary hemorrhages with deficient alveolarization and vasculogenesis 47,48 . POLR2A geneis associated with poor overall and disease-free survival of patients with early-stage non-small cell lung cancer 49 . FOXP2 is responsible for defective postnatal lung alveolarization resulting in postnatal lethality in mice. T1 alpha, a lungalveolar epithelial type 1 cell-restricted gene crucial for lung developmentand function, is a direct target of FOXP2 and FOXP0. Both FOXP2 and FOXP1 are crucial regulators of lung and oesophageal development 50 . E2F1significantly influences S phase progression and apoptosis as well FOXM1 expression, cell survival, epirubicin resistance 51 and its low level in primary lung adenocarcinoma may lead to cancer 52 . In small cell lung cancer, activation of oncogene EZH2 is often triggered by genomic deregulation of the E2F/Rb pathway 53 . Alveolar macrophages from RELA deficient animals are significantly less capable to involve in the canonical NF-κB pathway (a prototypical immune transcription pathway) and stimulate epithelial cells 54 .
miRNA regulatory network analysis. In present study, from the dataset GEO30180 we enlisted significant DEGs and investigated associated miRNAs using CyTargetLinker and in-silico validation for miRNA through cross validation with the multiple database mirDIP is carrird out for creation of Gene-TFs-miRNAs regulatory network. We have chosen the miRNA and mRNA from microRNA Data Integration Portal (mirDIP) web tools. In present manuscript, authors tried to identify the significant DEs, regulatory TFs and relevant miRNAs and predict the small molecules for drug combination of silicosis. Interaction network for genes and miRNAs is derived with 2461 Nodes and 13,343 edges as given in Fig. 2 using a plug-in CyTranslinker. An enlarged part is showing GPRY gene with its regulating miRNAs as an example. A grand total of 1100 number of very high target miRNA is identified from the considered database mirDIP using score class "very high". For many upregulated genes, miRNA (numbers are given in parenthesis) is associated in several hundred; for example, CDK19 (617), OGT (552), LGR4 (426), MAT2A (409),NREP (408), TIA1 (387), and TMEM2 (202).

Figure 1.
The gene-transcription factor regulation network. Notes: a round node represents a gene (yellow node) with differencial expresstion, a triangle nodes (green node) represents the TFs, and a rectangular rectangular nodes (pink node) represent the TFs that plays a role as regulator and regulated. www.nature.com/scientificreports/ Integrating TF and miRNA regulatory networks. TFs induce micro RNAs (miRNAs) transcription and miRNAs influence mRNA translation as well as transcript degradation for the regulation of gene expression and all these results in complex relationships feedback or feed-forward loops [54][55][56] . Furthermore, miRNAs and TFs are capable to alter each other's expression which results in difficulties for ascertaining the effect either one has on the target gene (TG) expression. The Integrated network (Fig. 3) consists of a total of 1396 nodes and edges 17,248 is constructed using Cytoscape from 235 (85 + 150) DE TGs, 24 regulating TFs and 1100 (846 up regulators + 1055 down regulators; 801 are common for both) very high target miRNA. In the network analysis, 14 clusters are achieved. The highest score of the cluster 1 is found to be 5.22 as given in Table 3. The present Gene-TFs-miRNAs regulatory network to find out the potential regulatory and regulated nodes provide detail insight for the gene associations, regulations by relevant TFs, and controlling the miRNAs.
Extraction of the cluster network. Using MCODE we extracted total 14 clusters according to the score computed along with nodes and edges, in which only six possesses Gene-TFs-miRNA interactions. In cluster 1 with maximum score, there are mainly Gene-Gene-miRNA interactions. However, gene FOXQ1 is observed in the clusters which is also identified as TF in literature. Few clusters are found to have only gene-gene or gene-miRNA interaction over Gene-TFs or Gene-TFs-miRNA interaction and is given in Supplementary Table S1.
From Fig. 4 in cluster 3 (Nodes-312, Edges-621, Score = 3.968), for example, TF JUND is associated with various genes as well as numerous miRNAs altogether. A similar type of network is observed for TFs RELA and PEBPB. In present study, considering the data with crystalline silica exposure of 0 and 800 µg/mL for 6 h exposure, 235 DEGs are enlisted on the basis of FDR set as the cut-off parameters to screen out 250 significant increases or decreases in gene expression levels. Previous study by Sellamatu et al. 16 considered a set of exposure (0, 15, 30, 120, 240 µg/cm 2 for 0-6 h) for their study and on the basis of potential pathway and gene ontology term identified significantly expressed 60 DEGs. Current study emphasise on identification and enlist of the biological pathway and process individually where identified DEGs are involved identifying potential regulating TFs and target miRNA according to the DEGs. www.nature.com/scientificreports/ Functional enrichments analysis. In present study, biological processes and pathways involoving the 235 DEGs with silica exposure are evaluated and significant biological process with DEGs involved are given in Table 3. A maximum of 11 genes is responsible for Rab proteins signal transduction followed by 9 genes for DNA integrity checkpoint (Fig. 5a). Various researchers reported that overexpression of Rab GTPases have a striking relationship with carcinogenesis and dysregulation of Rab proteins can be linked to the progression of already existent tumors contributing to their malignancy 57,58 . For Cluster 1, the biological process identified and the genes involved are biological adhesion (TNS3, CD274, and CDH1), biological phase (TOP2A), biological regulation (TOP2A, CD274), cellular process (TN53, IDH1, TOP2A, UBAP1, CD274, and CDH1), immune system process (TN53, CD274), metabolic process (IDH1, and UBAP1) and response to a stimulus (CD274) (Fig. 5b). The biological pathways observed in the study is shown in Fig. 6a,b. Major pathways with higher involvement of genes (numbers given in parenthesis) are Cytokines and Inflammatory Response (6), Rheumatoid arthritis (11) Prediction Small molecule signatures. Table 4 displays the Rank (based on the synergy score), the perturbed molecule (names of the chemical perturbations), Dose, Cell-line, time, the direction of regulation, and the GEO ID from which thesignature is extracted. We next sought to predict the drug combination with the ranked small molecule drug signature list (Table 5). From L1000CDS2 search engine forgene-set search, for differentially expressed genes provided nearly fifty significant combinations. These help infinding reverse www.nature.com/scientificreports/ and mimic combinations of an input gene expression signature for controlling Silicosis. The maximum scoring small-molecule combinations are CGP-60774 (total 20 combinations) followed by alvocidib (total 15 combinations) and with AZD-7762 (total 24 combinations) as can be seen from Table 5 with a few other drugs having a high probability of success.

Conclusions
Our study targets to provide detailed guidance for future fundamental researches along with some key genes, TFs and miRNAs, which are potential biomarkers for silicosis. The approach used in the manuscript allows us to incorporate various data sources into an integrated network, analysis of network parameters in order to find key network elements. Using various data sources, we found the relationships between different molecular components to support our comprehension of how silicosis progresses. In this study, 235 differentially expressed genes (150 down-expressed and 85 up-expressed) are identified as affected by exposure to crystalline silica. These genes are regulated by 24 TFs and 1100 very high target miRNAs. The network between DEGs, TFs and miRNAs are constructed using the various plug-in of Bioconductor, Cytoscape and MCODE to find the associateship of their various aspects. Total of 14 clusters of the network is achieved where only six clusters there is Gene-TFs-miRNA interaction and other eight clusters posess DEGs-miRNAs interactions or DEGs-DEGs-miRNAs interactions The most targeted genes and TFs as well as miRNAs are observed in cluster 3 to cluster 5 in the www.nature.com/scientificreports/