Integrative Bioinformatics approaches to therapeutic gene target selection in various cancers for Nitroglycerin

Integrative bioinformatics analysis helps to explore the mechanisms of Nitroglycerin activity in different types of cancer and to predict the target genes through which Nitroglycerin affects cancers. Many publicly available databases and tools were used in this study. The first step was the identification of interconnected genes. Using PubChem and SwissTargetPrediction, the direct target genes (activator, inhibitor, agonist and suppressor) of Nitroglycerin were identified. A PPI network was constructed to identify the types of cancer affected by the 12 direct target genes and to determine the closeness coefficient of those genes. Pathway analysis was performed with the CluePedia app to ascertain the biomolecular functions of the direct target genes. Mutation analysis revealed the mutated genes and the types of cancer they affect. Whereas the PPI network revealed the types of cancer affected by all 12 target genes, this step revealed the types of cancer affected by the mutated genes only. Only the mutated genes were chosen for further study. These were submitted to STRING for network (NW) analysis, which revealed the genes interconnected with the mutated genes. The second step was to predict and identify upregulated and downregulated genes. Data sets for the cancers identified above were obtained from the GEO database, and DEG analysis was performed on them to predict upregulated and downregulated genes. Comparing the interconnected genes from step 1 with the upregulated and downregulated genes from step 2 revealed the co-expressed genes among the interconnected genes. NW analysis using STRING was performed on the co-expressed genes to ascertain their closeness coefficient, and Gene Ontology analysis was performed to ascertain their functions. Pathway analysis was performed on the co-expressed genes to identify the types of cancer they influence. The four types of cancer identified in the mutation analysis in step 1 were the same as those identified in this pathway analysis, which further corroborates them. Survival analysis was carried out on the co-expressed genes using SurvExpress. Biomarkers for Nitroglycerin were identified through survival analysis for four types of cancer: Bladder cancer, Endometrial cancer, Melanoma and Non-small cell lung cancer.


Materials and methodology
The free version of the SmartDraw flowchart creator [https://www.smartdraw.com] was used to produce an error-free flowchart image. SmartDraw's flowchart maker includes templates, tools, and symbols that make flowcharts easy and fast to draw, and its templates can be copied to MS Office and Google apps. The flowchart was exported as a Joint Photographic Experts Group (.jpeg) image for better visualization. The flowchart is given in Fig. 1; it explains the overall methodology and gives details of the databases and tools used.
Identification of direct target genes. Target identification is the process of identifying the direct and indirect molecular targets of a compound, such as proteins or nucleic acids (macromolecules). In bioinformatics, target identification supports the assessment of the efficacy of a pharmaceutical or natural drug. In our study, direct target genes were identified using an integrative bioinformatics approach (drug-based direct targets). A total of 12 genes were identified as direct drug targets of Nitroglycerin: 10 from the PubChem database [https://pubchem.ncbi.nlm.nih.gov/] and 2 from SwissTargetPrediction [http://www.swisstargetprediction.ch/]. Applying the 'Drug-Gene Interactions' option to the PubChem results for Nitroglycerin yielded the 10 target genes. SwissTargetPrediction estimates the most probable macromolecular targets of a small bioactive molecule from a combination of two-dimensional and three-dimensional similarity with a library of 370,000 known active compounds on more than 3,000 proteins from different species 14 . The SwissTargetPrediction genes were found in Homo sapiens after submitting the SMILES format of Nitroglycerin.
Network and pathway analysis. Protein-protein interaction (PPI) networks function as regulatory nodes in many cell-signalling networks associated with the "hallmarks" of cancer. A number of PPIs that are closely linked to cell signalling and cell survival have been identified and validated as cancer biomarkers, and they have become a subject of interest in academic and industrial drug discovery programs 15 . STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) [https://string-db.org/] is an online tool that was used to construct and analyse the protein-protein interaction network of Nitroglycerin, with no more than 20 interactors in the first and second shells set as the cutoff. Cytoscape is a free software platform for visualizing molecular interaction networks and biological pathways and for combining them with annotations, gene expression profiles, and other state data. CluePedia, a Cytoscape plugin 16 , was used to identify crucial modules for further analysis.
www.nature.com/scientificreports/
... mutation in different types of cancer influenced by 12 target genes individually. Identification and visualization of the 12 direct target genes and their associated cancer types revealed that 3 genes affected 4 types of cancer, namely Bladder cancer, Endometrial cancer, Melanoma and Non-small cell lung cancer. The three Nitroglycerin-associated target genes are EGFR, HRAS and MAPK3. Both the mutations and the genomic alteration frequency within the selected cancers were ascertained using cBioPortal.
Prediction of interconnected genes for three mutated genes. The 3 mutated target genes (EGFR, HRAS and MAPK3) found in the mutation analysis were used as input to the PPI network. Using interactions with a high confidence score, genes associated (interconnected) with the three mutated target genes of Nitroglycerin were obtained from the STRING database (version 11). The interconnected genes were identified step by step: maximal groups (cliques) were extracted from the PPI network, and each clique and its hub genes were annotated with key pathways. In total, 39 interconnected genes were obtained. Two of the 39 were found to be duplicates, and hence 37 genes were chosen for further study.
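The clique extraction step mentioned above can be illustrated with a short sketch. The source does not say which algorithm or tool was used, so the basic Bron-Kerbosch procedure below is a substituted, standard technique, and the function name `maximal_cliques` and the toy network (nodes A to D) are hypothetical, not STRING output.

```python
def maximal_cliques(graph):
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch).

    graph: dict mapping each node to the set of its neighbours.
    Returns a sorted list of cliques, each given as a sorted list of nodes.
    """
    cliques = []

    def expand(current, candidates, excluded):
        # A clique is maximal when no node can extend it (no candidates)
        # and it is not contained in an already reported clique (no excluded).
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & graph[node],
                   excluded & graph[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Hypothetical toy network: a triangle A-B-C plus one extra edge C-D.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
cliques = maximal_cliques(toy_graph)
print(cliques)
```

In a real analysis the dictionary would be built from the high-confidence STRING edges, and each reported clique would then be annotated with pathways as described in the text.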
Microarray data information and DEG analysis. ... NCBI 24,25 . These four datasets (one for each of the four types of cancer) were chosen for DEG analysis. Differential Expression of Genes (DEG) analysis was used to compare gene expression between normal and diseased samples. Upregulated and downregulated genes in each cancer were defined using the GEO2R tool (an interactive web tool for comparing groups of samples in a GEO series) 26 . Cancer type (Bladder cancer, Endometrial cancer, Melanoma and Non-small cell lung cancer) and analysis type ('cancer vs. normal' analysis) were selected as the filters. The cutoff criteria were an adjusted p value < 0.05 together with logFC (log fold change) ≥ 1 for upregulated genes or ≤ -1 for downregulated genes 27 .
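The cutoff criteria above amount to a simple classification rule, sketched below in Python. This is a minimal illustration, assuming each gene comes with a logFC value and an adjusted p value; the function name `classify_deg` and the example rows are hypothetical and are not GEO2R output.

```python
def classify_deg(log_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene with the cutoffs stated in the text.

    'up'   : adjusted p < 0.05 and logFC >= 1
    'down' : adjusted p < 0.05 and logFC <= -1
    'ns'   : everything else (not called differentially expressed)
    """
    if adj_p < p_cutoff:
        if log_fc >= fc_cutoff:
            return "up"
        if log_fc <= -fc_cutoff:
            return "down"
    return "ns"


# Hypothetical rows (gene, logFC, adjusted p value), for illustration only.
example_rows = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", -1.4, 0.020),
    ("GENE_C", 1.8, 0.200),
    ("GENE_D", 0.4, 0.001),
]
labels = {gene: classify_deg(fc, p) for gene, fc, p in example_rows}
print(labels)
```

Note that both conditions must hold: a large fold change with a non-significant adjusted p value (GENE_C) and a significant but small change (GENE_D) are both left unlabelled.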
Finding co-expressed genes of target genes. A manual comparison of 37 Interconnected Genes found after Mutation Analysis (step 1 as per flowchart) with Upregulated/Downregulated genes found from DEG Analysis (Step 2 as per flowchart) was done to ascertain 16 co-expressed genes which in turn were used for further analysis.
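The comparison described above was done manually in the study; the same step could be scripted as a set lookup. The sketch below is one possible way to do it, with a hypothetical function name (`find_co_expressed`) and placeholder gene identifiers rather than the study's actual lists.

```python
def find_co_expressed(interconnected, upregulated, downregulated):
    """Return {gene: 'up' | 'down'} for interconnected genes that are also DEGs."""
    up = set(upregulated)
    down = set(downregulated)
    matches = {}
    for gene in interconnected:
        if gene in up:
            matches[gene] = "up"
        elif gene in down:
            matches[gene] = "down"
    return matches


# Placeholder identifiers, for illustration only.
interconnected_example = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
up_example = ["GENE_B", "GENE_X"]
down_example = ["GENE_D", "GENE_Y"]

result = find_co_expressed(interconnected_example, up_example, down_example)
print(result)
```

Running the function once per cancer dataset would give the per-cancer co-expressed genes that feed the later network and survival analyses.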
Network analysis (linkage) and validation of co-expressed genes. Network analysis in STRING was used to:
• Predict linkage (relationship) between co-expressed genes
• Find the degree of closeness, particularly high closeness, between co-expressed genes
• Predict high betweenness of the co-expressed genes
Cytoscape Network Analyser was used to validate the network analysis.
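The closeness and betweenness measures named above can be computed directly from an edge list. The following is a minimal pure-Python sketch, assuming an undirected, unweighted, connected network; the function names and the toy three-node path are hypothetical, and the exact normalisation used by STRING or Cytoscape may differ.

```python
from collections import deque


def closeness(graph):
    """Closeness of each node: (reachable nodes) / (sum of shortest-path lengths)."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def betweenness(graph):
    """Unnormalised shortest-path betweenness (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair is counted from both endpoints in an undirected graph.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical toy network: a path A - B - C.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
close = closeness(toy)
between = betweenness(toy)
print(close, between)
```

In the toy path, the middle node lies on the only shortest path between the two ends, so it has the highest closeness and the only non-zero betweenness.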
GO and pathway enrichment analysis of co-expressed genes. The functions and pathway enrichment of candidate DEGs were analysed using the DAVID tool, version 6.8 [http://david.ncifcrf.gov/]. Gene Ontology (GO) is a bioinformatics resource that provides information about gene product function. DAVID provides a comprehensive set of functional annotation tools for investigating large lists of genes and analysing their biological roles. It was used to perform GO and KEGG pathway enrichment analyses of the differentially expressed genes. Using the GO study, the functions of possible co-expressed genes of Nitroglycerin in the four cancers were identified 28 .
Survival analysis and validation. Survival analysis is used to analyse the probability distribution of survival of biological organisms. Survival analysis was performed using SurvExpress tool (An Online Biomarker Validation Tool and Database for Cancer Gene Expression Data Using Survival Analysis) 29 . SurvExpress is a large, versatile, and fast tool available freely on the Net. The input for SurvExpress is a list of co-expressed genes.
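For readers unfamiliar with how a survival curve is estimated, the standard Kaplan-Meier (product-limit) estimator is sketched below. SurvExpress's internal implementation is not described in the source, so this is a generic illustration; the function name `kaplan_meier` and the follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : follow-up time of each subject
    events : 1 if death was observed at that time, 0 if the subject was censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored subjects both leave the risk set
    return curve


# Hypothetical follow-up data (days), for illustration only.
example_times = [5, 8, 8, 12, 20]
example_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(example_times, example_events)
print(km_curve)
```

Censored subjects contribute to the number at risk up to their last follow-up but never produce a step in the curve, which is why the curve only changes at observed event times.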

Results
Drug target identification. The genes that interacted with Nitroglycerin drug were retrieved and identified from PubChem database 30 and SwissTargetPrediction database (Table 1).
Depending on the degree of interaction, the identified genes were classified as functionally known Activators (GUCY1A3, GUCY1B3, GUCY1A2, GUCY1B2 and GSR), Inhibitor (HIF1A), and Agonist (NPR1). Apart from these, functionally unknown genes were also identified, for example GSTM1, MAPK3, EGFR, HRAS and DAO. Although GSTM1 was identified as a functionally unknown gene, it has been reported as a Suppressor 31 . Similarly, the functionally unknown gene EGFR is activated by autocrine or paracrine growth factors in some tumors 32 . The functionally unknown EGFR gene is also responsible ...
Analysis of the connection between target genes and cancer. STRING was used to develop the PPI network and signalling pathways of the genes associated with Nitroglycerin 33 (Fig. 2). The interaction analysis shows 11 nodes, 15 edges, an average node degree of 2.73, an average local clustering coefficient of 0.773 and a PPI enrichment p value of 0.000301. The predicted networks throughout this work have significantly more interactions than expected, as per the reference value given in the STRING database (PPI enrichment p value = 1.0e−16). The projected target genes were interlinked with cancers such as Glioma, Bladder cancer, Endometrial cancer, Melanoma, Non-small cell lung cancer and Renal cell carcinoma. Choline metabolism, PD-L1 expression and PD-1 checkpoint pathway in cancer, and Central carbon metabolism in cancer, all interlinked with the projected target genes, were retrieved from the KEGG pathway analysis through Cytoscape (Fig. 3). The findings revealed that three Nitroglycerin genes (EGFR, HRAS and MAPK3) were common to 4 types of cancer, viz. Bladder cancer, Endometrial cancer, Melanoma and Non-small cell lung cancer. This association confirmed a positive correlation between the 3 mutated genes of Nitroglycerin and the aforementioned four cancers.
Incidentally a positive correlation between 3 target genes of Nitroglycerin and four cancers was an unexpected outcome 15 .
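As a consistency check on the STRING statistics reported above for the network in Fig. 2, the average node degree of an undirected network with N nodes and E edges is 2E/N, because each edge contributes to the degree of two nodes. With the reported 11 nodes and 15 edges this reproduces the reported value (the symbols N, E and ⟨k⟩ are notation introduced here for illustration):

```latex
\langle k \rangle = \frac{2E}{N} = \frac{2 \times 15}{11} \approx 2.73
```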

Analysis of genetic alteration in cancers.
Mutational analysis was done to identify the genomic changes of the 12 genes in various cancers. From these genomic changes, the genes with prominent expression in cancers, and the cancers associated with them, were identified. The genes with prominent expression are EGFR, HRAS and MAPK3, and the cancers associated with these 3 genes are Bladder cancer, Endometrial cancer, Melanoma and Non-small cell lung cancer.
Mutational analysis in 4 cancers. cBioPortal was used to investigate the genomic changes of three Nitroglycerin genes (EGFR, HRAS and MAPK3) associated with respective cancers (Tables 2, 3). OncoPrint was used to show the most important alteration frequency of genes (Fig. 4).

Prediction of interconnected genes for 3 mutated genes.
Genes interconnected (network associated) with the three target genes of Nitroglycerin were identified with the help of the STRING database. A total of 39 associated genes (Fig. 6) were identified from the protein-protein interaction analysis. The thirty-nine associated genes are AKT1, BRAF, CBL, CDC42, DUSP26, EGF, EREG, ERRFI1, GAB2, GRB2, HRASLS2, IL6, JAK1, ...
Table 1. PubChem and SwissTargetPrediction for direct target genes of Nitroglycerin. This shows direct target gene interactions with respect to Drug, Gene Name, Gene ID, Interaction claim source and Interaction type. Gene names are official gene symbols, which are unique identifiers. Interaction claim source is the chemical compound database from which the interaction was taken. Interaction type is the function (for example, inhibitor) exerted on the target.
DEG analysis. DEG analysis was done on four cancer datasets by comparing cancer samples with normal tissues in the GEO2R tool (Fig. 7). The four cancer datasets are GSE7476 (Analysis of clinical bladder cancer classification according to microarray expression profiles), GSE17025 (Gene Expression Analysis of Stage I Endometrial Cancers), GSE35389 (Expression data from normal melanocytes, melanoma cells and their exosomes) and GSE32989 (Expression profiling of lung cancer cell lines). A volcano plot was constructed for the DEG analysis. The volcano plot shows the relationship between the p values of a statistical test and the magnitude of the fold change for control versus cancer. The magnitude of the fold change denotes the extent to which genes were upregulated or downregulated. In the volcano plot, genes with an adjusted p value < 0.05 and logFC ≥ 1 were taken as upregulated, and genes with an adjusted p value < 0.05 and logFC ≤ -1 were taken as downregulated.
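The volcano plot coordinates described above can be sketched as follows. This is a minimal Python illustration, assuming the conventional axes (logFC on the x-axis and the negative base-10 logarithm of the adjusted p value on the y-axis); the function name `volcano_point` and the example gene values are hypothetical, and GEO2R's own plotting code is not shown in the source.

```python
import math


def volcano_point(log_fc, adj_p):
    """Map one gene to volcano-plot coordinates: x = logFC, y = -log10(adjusted p)."""
    if not 0 < adj_p <= 1:
        raise ValueError("adjusted p value must be in (0, 1]")
    return log_fc, -math.log10(adj_p)


# Height of the horizontal significance line for an adjusted p value of 0.05.
P_LINE = -math.log10(0.05)

# Hypothetical gene with logFC = 1.5 and adjusted p = 0.001.
x, y = volcano_point(1.5, 0.001)

# Under the stated cutoffs, upregulated genes fall to the right of x = 1
# and above the significance line; downregulated genes fall left of x = -1.
in_upregulated_region = x >= 1 and y > P_LINE
print(x, y, in_upregulated_region)
```

The logarithmic y-axis is what makes small p values appear high on the plot, so the most significant and most strongly changed genes end up in the upper left and upper right corners.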
Co-expressed genes identification. Target genes and co-expressed genes have similar gene expression patterns and gene regulation 35,36 . A suitable approach to arriving at target biomarkers for Nitroglycerin is therefore to find co-expressed genes (genes with similar expression patterns).
By comparing the DEGs with the genes interconnected with the three target genes of Nitroglycerin, sixteen genes were found to be co-expressed (possible Nitroglycerin therapeutic targets) in the four types of cancer (Table 4). The results reveal that Bladder cancer has five co-expressed genes: ERRFI1, IL6, PIK3R1 and SPRY2, which were upregulated, and YWHAZ, which was downregulated. Endometrial cancer has twelve genes: EGFR, ERRFI1, IL6, JAK2, PLXNC1 and RGL3, which were upregulated, and CBL, CDC42, PIK3R3, STAT3, UBE2D2 and YWHAZ, which were downregulated. Melanoma has PLXNC1 as an upregulated gene and TGFA as a downregulated gene. Non-small cell lung cancer has GAB2 and PIK3R1 as two upregulated genes.
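The per-cancer lists above overlap (for example, IL6 appears in both Bladder and Endometrial cancer), so the total of sixteen refers to distinct genes. The short Python tally below uses the gene lists exactly as given in the text; the variable names are illustrative.

```python
# Co-expressed genes per cancer, as listed in the text.
co_expressed_by_cancer = {
    "Bladder cancer": {
        "up": ["ERRFI1", "IL6", "PIK3R1", "SPRY2"],
        "down": ["YWHAZ"],
    },
    "Endometrial cancer": {
        "up": ["EGFR", "ERRFI1", "IL6", "JAK2", "PLXNC1", "RGL3"],
        "down": ["CBL", "CDC42", "PIK3R3", "STAT3", "UBE2D2", "YWHAZ"],
    },
    "Melanoma": {"up": ["PLXNC1"], "down": ["TGFA"]},
    "Non-small cell lung cancer": {"up": ["GAB2", "PIK3R1"], "down": []},
}

counts = {}
unique_genes = set()
for cancer, groups in co_expressed_by_cancer.items():
    genes = groups["up"] + groups["down"]
    counts[cancer] = len(genes)       # genes reported for this cancer
    unique_genes.update(genes)        # distinct genes across all cancers

print(counts)
print(len(unique_genes), sorted(unique_genes))
```

The per-cancer counts add up to more than the number of distinct genes because several genes are shared between cancers.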
Network analysis (linkage) and validation of co-expressed genes. Co-expressed genes linkage analysis of Bladder cancer, Endometrial cancer, Melanoma and Non-small cell lung cancer revealed nine poten- ...
Figure 1 shows that a group of genes EGFR, HRAS, MAPK3 and HIF1A were directly and indirectly connected with one another. Therefore, these genes were functionally linked and related. Another group of genes, GUCY1A3, GUCY1A2, NPR1 and GUCY1B3, were functionally interconnected with one another.
GO and pathway analysis of co-expressed genes. KEGG pathway analysis and Gene Ontology analysis using the DAVID database were performed for the four types of cancer targets. KEGG pathway analysis showed that the sixteen co-expressed genes participate in pathways such as the Fc epsilon RI signaling pathway, Hepatitis B, Measles, Axon guidance, the ErbB signaling pathway and the PI3K-Akt signaling pathway (Table 5). GO analysis comprises three functional groups: (1) biological processes, (2) cellular components, and (3) molecular functions. In biological processes, the majority of genes are involved in negative regulation of apoptotic process and phosphatidylinositol-mediated signaling. In cellular components, many of the genes are present in the cytosol and cytoplasm. In molecular functions, the majority of genes are involved in protein binding and protein kinase binding. Table 5 shows the pathways, process, location, function, disease and expression level for each individual gene.

Survival analysis (Kaplan-Meier plot and ROC curve).
Survival analysis was performed using the SurvExpress tool for each of the sixteen co-expressed genes in the four cancer datasets, viz. Bladder cancer, Endometrial cancer, Melanoma and Non-small cell lung cancer. This analysis helps to identify co-expressed genes associated with a high risk of death and low survival (Fig. 9). The Kaplan-Meier plot showed the Concordance Index (CI), the p value for the survival curve and the Hazard Ratio for the risk groups. Higher CI values are associated with better prediction of the survival curve. Survival risk curves are shown in green and red for low and high risk, respectively. The x-axis represents the time (in days) of the study. A Hazard Ratio value (≥ 1) indicates a high risk rate that leads to low ... Table 6. Overall analysis revealed that the RGL3 gene has high risk according to its Hazard Ratio value, but the survival ROC curve classification gave a less accurate prediction score. Finally, SPRY2 (Bladder), CBL (Endometrial), TGFA (Melanoma) and GAB2 (Non-small cell lung) have a high risk rate and a low survival rate. These four genes are valid therapeutic targets (biomarkers) for Nitroglycerin among the co-expressed genes.
Figure 4. Genomic alteration of EGFR, HRAS and MAPK3 in all four cancer types. Green color denotes "missense mutation" of known significance, Light Green color denotes "missense mutation" of unknown significance, Yellow color denotes "Splice mutation", Grey color denotes "Truncating mutation" of unknown significance, Violet color denotes "fusion", Red color denotes "amplification" and Blue color denotes "deep deletion" of unknown significance.
Figure 8. Linkage analysis of co-expressed genes.
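The Concordance Index referred to in the survival analysis above measures how well a risk score orders patients by survival time. The sketch below implements the commonly used Harrell's C definition as an illustration; how SurvExpress computes its CI is not described in the source, and the function name and example data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk score.

    A pair (i, j) is comparable when subject i had an observed event (events[i] == 1)
    strictly before subject j's follow-up time. The pair is concordant when the
    subject who died earlier has the higher risk score; ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical follow-up times (days), event flags and risk scores.
ci_times = [2, 4, 6, 8]
ci_events = [1, 1, 0, 1]
perfect_risk = [0.9, 0.7, 0.5, 0.1]   # higher risk for earlier deaths
mixed_risk = [0.9, 0.1, 0.5, 0.7]

ci_perfect = concordance_index(ci_times, ci_events, perfect_risk)
ci_mixed = concordance_index(ci_times, ci_events, mixed_risk)
print(ci_perfect, ci_mixed)
```

A value of 1 means the risk score ranks every comparable pair correctly, 0.5 corresponds to random ordering, and values closer to 1 therefore indicate better prediction, in line with the statement in the text.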

Discussion
Cancer is the second deadliest disease in the world. Hence, finding novel drugs is very important for reducing the risk of death and increasing the survival rate. However, finding a novel drug through an experimental approach to target identification is time consuming and can take 12 years or more. By contrast, a computational procedure for repurposing or finding a novel drug takes much less time and costs less 37 .
In the mutational analysis of the four cancer sample studies (bladder cancer, endometrial cancer, melanoma and non-small cell lung cancer), three genes whose frequency of genetic alterations was measured were found to be mutated: EGFR, HRAS and MAPK3. Protein-level changes within each gene are responsible for the gene mutation. Protein mutations in two genes, EGFR and HRAS, were found to be responsible for mutations in multiple cancers, whereas protein mutations in one gene, MAPK3, were found to be unique to each type of cancer. The mutated proteins responsible for multiple cancers are F359L, P733S, S306L, G13D, G13R and Q61K. We found that an intragenic mutation of the EGFR gene (Gene-EGFR, chr-7 and type-Fusion) occurred in three types of cancer, viz. Endometrial Cancer ( ... . These proteins within the 3 genes discussed above can be taken forward for therapeutic analysis. It is worth pointing out that, since proteins within EGFR and HRAS affect multiple cancers whereas proteins within MAPK3 are unique to each type of cancer, the cost-benefit payoff would be higher for EGFR and HRAS than for MAPK3.
In the present study, we investigated the DEGs among the four cancer data sets (cancer vs. normal). We examined a total of 20,529 DEGs, 2279 upregulated and 758 downregulated genes for Bladder cancer; 3238 upregulated and 2553 downregulated genes for Endometrial cancer; 52 upregulated and 69 downregulated genes for Melanoma; and 875 upregulated and 253 downregulated genes for Non-small cell lung cancer. While Nitroglycerin is commonly used for the treatment of CVD patients and a number of studies have generally shown Nitroglycerin to be an antitumor agent, our research is at the level of individual genes. It delves further into different gene targets, namely Direct Target Genes, Mutated Genes, Interconnected Genes, Co-expressed Genes and, finally, Biomarkers of Nitroglycerin for the four cancers.
Linkage analysis indicated that the co-expressed genes have similar patterns of gene expression and gene regulation. GO and pathway analysis results confirmed that the co-expressed genes play a major role in many biological functions, such as protein kinase binding, protein binding, glycoprotein binding, and molecular adaptor activity of transmembrane receptor protein tyrosine kinase. All these functions were disrupted by the co-expressed genes in the respective cancers. DEG analysis further revealed that upregulation of SPRY2 (Bladder) and GAB2 (Non-small cell lung) and downregulation of CBL (Endometrial) and TGFA (Melanoma) were associated with a low survival rate and a high risk of death, as measured by survival probability and AUC score. This is corroborated by reports on suppression of SPRY2 38 that revealed distinct tumor-suppressive roles in different cancer contexts 39-41 . Further corroboration comes from the finding that downregulation of SPRY2 caused significantly reduced cell proliferation/cell death 42 . In addition, the SPRY2 promoter plays an important role in ERK signaling and in the inhibition of several human cancers 43,44 .
Studies reveal a potential clinical impact of the CBL gene on cancer immunotherapy. In our study, CBL was identified as having high closeness among the 16 co-expressed genes in the linkage analysis and was predicted to be downregulated by the DEG analysis. Our study is consistent with an existing report that CBL positively regulates signal transduction, that is, it increases regulation (activation/upregulation), which in turn leads to a reduction of complications in Endometrial cancer 45 . On the basis of our findings, we suggest CBL as an apt target for Nitroglycerin and for novel drug design against Endometrial cancer. As regards TGFA, our study is consistent with the report that overexpression of the gene leads to non-progression of cancer 46 . Further corroboration comes from a study of TGFA expression in esophageal cancer 47 . For GAB2, corroboration comes from the observation that its suppression reduces lymph node metastases and invasive cancer. GAB2 also seems to collaborate with other oncogenes linked to the progression of breast cancer, including the SRC family. Standard chemotherapy employs GAB2 as a potential gene target in the treatment of GAB2-driven ovarian cancer. GAB2 is involved in signaling the growth of malignant tumors 48 .
Table 6. Survival probability and reoccurrence score. Bold indicates the genes that have a high risk rate and a low survival rate. These genes are identified as Biomarkers for Nitroglycerin in this study.

Conclusion
Although the identification of drug-gene interactions is significant in drug discovery, the cost of experimental approaches is enormous, and they are extremely time consuming and challenging. To offset these costs and delays, several computational practices, including the pharmacology of drugs and the evaluation of drug-target interactions, are leading to the discovery of potential Biomarkers for a drug. Our analyses, based on such computational procedures, connect diseases with drug-associated gene sets at minimal cost and in a short time. Integrative Bioinformatics Analysis is a computational procedure that improves understanding of the mechanisms of drugs in cancer treatment and can be considered versatile in explaining drug-disease interactions. Integrative Bioinformatics analysis helps to identify Biomarkers of Nitroglycerin in a few days, which otherwise would have taken a few months or sometimes even a few years. After survival analysis, we concluded that four genes (SPRY2 for Bladder cancer, CBL for Endometrial cancer, TGFA for Melanoma and GAB2 for Non-small cell lung cancer) were the Biomarkers for Nitroglycerin. The results of our research can now be used in experimental procedures to gain insight into the role of the identified Biomarkers in cancer treatment. The identified Biomarkers can also be used in further computational procedures.