Systems-level differential gene expression analysis reveals new genetic variants of oral cancer

Oral cancer (OC) ranked as eleventh malignancy worldwide, with the increasing incidence among young patients. Limited understanding of complications in cancer progression, its development system, and their interactions are major restrictions towards the progress of optimal and effective treatment strategies. The system-level approach has been designed to explore genetic complexity of the disease and to identify novel oral cancer related genes to detect genomic alterations at molecular level, through cDNA differential analysis. We analyzed 21 oral cancer-related cDNA datasets and listed 30 differentially expressed genes (DEGs). Among 30, we found 6 significant DEGs including CYP1A1, CYP1B1, ADCY2, C7, SERPINB5, and ANAPC13 and studied their functional role in OC. Our genomic and interactive analysis showed significant enrichment of xenobiotics metabolism, p53 signaling pathway and microRNA pathways, towards OC progression and development. We used human proteomic data for post-translational modifications to interpret disease mutations and inter-individual genetic variations. The mutational analysis revealed the sequence predicted disordered region of 14%, 12.5%, 10.5% for ADCY2, CYP1B1, and C7 respectively. The MiRNA target prediction showed functional molecular annotation including specific miRNA-targets hsa-miR-4282, hsa-miR-2052, hsa-miR-216a-3p, for CYP1B1, C7, and ADCY2 respectively associated with oral cancer. We constructed the system level network and found important gene signatures. The drug-gene interaction of OC source genes with seven FDA approved OC drugs help to design or identify new drug target or establishing novel biomedical linkages regarding disease pathophysiology. This investigation demonstrates the importance of system genetics for identifying 6 OC genes (CYP1A1, CYP1B1, ADCY2, C7, SERPINB5, and ANAPC13) as potential drugs targets. Our integrative network-based system-level approach would help to find the genetic variants of OC that can accelerate drug discovery outcomes to develop a better understanding regarding treatment strategies for many cancer types.

Oral Cancer constitutes approximately 90% among all Head and Neck Cancer (HNC) sub-types 1 . However, it is more prominent in urban areas of South Asia with a ratio of 15-40% among all cancer types 2 . In Pakistan, it ranked as 2nd most prevalent cancer-type, with increasing incidence in the past few years 3,4 . The complexity of genetic mechanisms in cancer has been revealed through recent investigations. Many biological systems seem to involved in the development and progression of the cancer. But, the complications in system-interactions are limitedly understood which is a major restriction in developing effective treatments 5 . The gene expression studies may help to investigate the differential expression of genes in different biological states, cell cycle stages, subjects or tissues. This gene expression analysis is an important pinpoint for investigating biological processes and their functional disorders. cDNA microarrays were used to monitor and reveal the expression level for thousands of the genes which are differentially expressed in tumors simultaneously 5 . This technique can exploit this valuable information regarding gene expression analysis. This rapidly progressing technique provides comprehensive data for gene expression profiling of thousands of genes to the investigators in one experiment. Many studies demonstrated that this technique is useful to identify novel genes for cancer and its molecular level classification in human 6,7 . Thus, this novel technique may help us to identify new potential targets for drug development for optimal and effective disease therapies. It may also establish an important link between clinical medicine and gene sequences for humans 8  www.nature.com/scientificreports/ obtained the same delta value of 0.00515 as we used the LOOCV approach (during raw cross-validation and afterward during modified cross-validation). The substantial codes (0.1, 0.01, 0.001, and 0.05) with residuals of limited deviance suggested the consistency of the differential analysis. cDNA datasets were also analyzed for some necessary factors like RNA quality, sequence biases or RNA degradation. In genomic analysis, the use of low-quality RNA samples in the sequencing of the entire genome is inefficient. It is not clear if transcript deg- Histogram (smoothed histograms) shows density estimate of the data. Typically, the distributions of the arrays have similar shapes and ranges. Arrays whose distributions are very different from the others are considered for possible problems. High levels of background shifted an array's distribution to the right. Lack of signal diminishes its right tail. A bulge at the upper end of the intensity range indicated the signal saturation. www.nature.com/scientificreports/ radation occurs reliably in low-quality RNA samples, in which case the effects of degradation can be reversed by data normalization or whether different RNA samples can be degraded at different rates, which could bias expression measurements. So, for differential expression analysis, we assessed the RNA quality. To verify the dataset reliability for identification of variation at the transcriptional level in original samples. The normalization process was used to standardize sample handling techniques and to assess optimal RNA variability threshold by using discrimination measures for statistical and algorithmic analysis. All the probe sets have their individual probes aligned at 5′-end of the target RNA molecule. The competitive binding of a particular probe to its target has been observed to depend upon a 3′/5′ intensity gradient. Due to the poor quality of RNA, a reduced quantity of RNA is hybridized to the array. The low hybridization leads to a decrease in the total signal output level. But if the degree of saturation level increases the 3′/5′ intensity gradient decreases. The 3′-end of the target gene contains a probe set that corresponds to the transcripts. The statistical and function summary for each batch-array is produced by ' AffyRNAdeg' to measure RNA degradation level and its significance (Fig. 2).

Disease-gene curation for differentially expressed genes (DEGs). From 21 datasets, we found 30
DEGs and David tool was used to retrieve their gene symbol and biological annotation. We selected the most significant ranked genes from the list of differentially expressed genes. For disease-gene curation, these genes were text mined using CTD (Comparative Toxicogenomics Database), PubMed, OMIM, MeSH, and PMC databases to filter disease-specific genes (Supplementary Table 1). We observed CYP1B1, CYP1A1, C7, ADCY2, SERPINB5, and ANAPC13 are the most curated terms in the databases. These shortlisted genes were further analyzed by mapping at (p < 0.00005) through Cancer Genetics and OMIM databases and observed their role in carcinogenesis (Fig. 3).
Enrichment and cluster analysis of DEGs. These genes showed enrichment substantially linked with hydroxylase, P450 pathway, steroid metabolic process, monooxygenase activity, cellular response to organic cyclic compound, and aromatase activity ( Table 3). The dysregulation of these genes causes genetic heterogeneity, autosomal recessive disorder, head and neck disorders and other clinical phenotypes (Fig. 4). The function of the gene, its regulation, subtypes, and cellular processes play a key role in understanding its biology. The functional enrichment analysis showed that shortlisted CYP1A1, CYP1B1 genes are known to www.nature.com/scientificreports/  www.nature.com/scientificreports/ involved in xenobiotic metabolic and energy pathways. C7 is involved in immune response whereas SERPINB5 is known for protein metabolism. While ADCY2 plays an important role in cell communication and signal transduction. ANAPC13, have a potential role in class-I MHC-mediated antigens and cell-cycle progression at early tumor stages. The membrane attack complex in the extracellular region for C7, Expression of CYP1A1, SERPINB5, CYP1B1 is found in the endoplasmic reticulum. CYP1A1 is also found in nucleus and microsomes. ANAPC13 is a well-known anaphase-promoting complex. Differential expression of ADCY2, SERPINB5, CYP1B1 is also found in the cytoplasm. Cluster analysis of selected DEGs helps us in the recognition of functional annotation and significance. The results were observed with the Euclidean distance (Fig. 5). The genetic expression of sample cells is distinguished as cases and control indicating the obvious differences between two of these groups. The analysis showed the down and up-regulated genes based on the p value and fold changes ( Table 4).
Mutation analysis. ADCY2 has eight post-translational modification (PTM) sites with 234 recurrent cancer mutations at the chromosome no. 5 positive-strand encoding 1091 protein residues representing 14.02% of the predicted disordered region. The mutation visualization plot shows ADCY2 isoform ADCY2 Q474R direct network-rewiring mutation impact at the position 474, with reference amino acid residue Q and mutated amino acid residue R in the protein. The affected-site at position 472 with S amino acid residue-site enriched with a phosphorylation-type mutation affecting PTMs. Another, ADCY2 isoform ADCY2 S655R, reveals the mutation for this protein at position 655 including amino acid residue S comparison with the mutated amino acid residue R. In position 659, S-amino acid residue site enriched with phosphorylation-type mutation, this shows distal-mutation PTM impact with the affected-site. CYP1B1 showed a 10.5% predicted disordered sequence region with 100 mutations observed at chromosome no 2 on the negative strand with 543 protein residues and 3 PTM sites. CYP1B1 I87S, CYP1B1 Q479H, CYP1B1 T510I isoforms were revealed for CYP1B1 mutational enrichment, at the positions 87,479,510 respectively. The reference amino acid residue for these isoforms was I, Q, T along with mutant amino acid residues S, H, I respectively for each isoform. Similarly, the mutational analysis of C7 showed that 12.57% of the sequence predicted for disease-pathophysiology. Total 244 number of mutations were found on the positive strand of chromosome no. 5 for C7. The number of PTM sites for C7 was eight with 843 protein residues. So, 10 isoforms for C7 were found, among them C7 Q29R and C7 G41D were in distal-mutational PTM impact (Fig. 6). The reference amino acid residues for C7 Q29R, C7 G41D, and C7 T756I were Q, G, T, and mutant amino acid residues reported were I, R, and D, respectively (Supplementary Table 2).

Protein-protein interaction analysis.
We retrieved the related nodes and edges of all oral cancer associated DEGs from the HAPPI database to construct the integrated PPI-network (Fig. 7). This interaction analysis helped us to observe the potential functional interaction among OC related DEGs and other associated genes contributing to the disease phenotype. The seeder or source OC associated DEGs CYP1A1_HUMAN, CYP1B1_ HUMAN, ADCY2_HUMAN, C7_HUMAN, SERPINB5_HUMAN, ANAPC13_HUMAN interact with the target genes including BRAC1_HUMAN, CO6_HUMAN, ISG15_HUMAN, S1PR3_HUMAN, and other essential proteins. The network topology shows a significant relationship between seeder and target genes. The identi-  Pathway modelling of OC associated DEGs. Integrated pathways were modeled to observe the possible role of DEGs in pathophysiological mechanisms. Ras, p53, MEK, SOS, Rb, Bax, PTEN, and Raf are important interacting genes associated with the pathophysiological mechanism of oral cancer. We found that p53 signaling, microRNA signaling, salivary secretion, human papillomavirus, cell cycle, alcoholism, and xenobiotic metabolism-related pathways are interconnected in the progression and development of the disease (Fig. 8).
Toxicogenomic analysis. The toxicogenomic analysis enabled us to explore chemical genotype-phenotype exposomic information that may lead to disease progression. OC associated DEGs were curated in terms of their activity and expression with different environmental chemicals. The data revealed the activity and expression of DEGs, which either increase or decrease the expression, increase or decrease towards gene activity at different cellular events. It may also affect the cotreatment expression leading to disease occurrence. It was revealed that the same chemical exposure may show different reactivity for different genes. In this case, benzo(a)pyrene increases the expression of CYP1A1 but it affects the reactivity of ADCY2. Methylcholanthrene, albendazole,  www.nature.com/scientificreports/  www.nature.com/scientificreports/ primaquine increases CYP1A1 activity. Resveratrol and tetrachlorodibenzodioxin were found to affect the binding, decrease the reaction, and increase the CYP1A1 and CYP1B1 activity. Similarly, acetaminophen was found to increase the expression of C7, whereas alpha-cobra toxin may account for decrease reaction. Arsenic may affect C7 expression by increasing its abundance. While nickel was found to decrease C7 expression. However, decitabine was found to affect the cotreatment of SERPINB5 and decrease the gene reactivity with trichostatin A. It was observed that fonofos, methapyrilene, parathion increases and affects the ADCY2 reactivity at different cellular events while valproic acid, okadaic acid, doxorubicin, and bisphenol A decreases the ANAPC13 expression (Fig. 9).
De novo prediction of regulatory motifs. Oral cancer-associated DEGs were used for de novo analysis to predict the regulatory motifs. The transcriptional factors include ARNT, AHR, CEBPA, CTCF, HNF1B, ELK4, TCF3, and NR2E3. The conservation cutoff is 0.40 with a matrix-score threshold of 85% were set as default parameters. The parameter settings standardize to analyze oPOSSUM-tool showed how the transcriptional factor controls its related targets (Table 5).

Drug-gene interaction analysis.
The toxicogenomic approach was used to investigate the drug-genes interaction to explore available treatments. The genes that interact with anticancer drugs are docetaxel, hydroxyurea, bleomycin, daunorubicin, lansoprazole, doxorubicin, liothyronine sodium, risperidone using DGIdb database. We identified sixteen proteins as potential alternative drug targets including CXCL1, FBXO32, PTTG1, CCNB1, ADCY2, NMU, ANAPC13 and others (Fig. 10). The dysregulation of these proteins may affect the normal expression level and could be a potential part of therapeutic strategies.

Discussion
This study focused on the genetic expression and functional enrichment of genetic variants of oral cancer. The six most significantly OC associated genes (CYP1A1, CYP1B1, SERPINB5, ANAPC13, ADCY2, C7) found through a differential analysis were consider as seeder or source genes. This analysis provides us a list of new genes aberrantly expressed in oral cancer including SERPINB5, ANAPC13, ADCY2. We have investigated differential expression between cases and controls of cDNA datasets at cellular level in oral tissues and found the possible association of these genes in oral cancer. We can get more information about the mechanism of human genetic Figure 9. Toxicogenomic analysis of differentially expressed genes carried out by a comparative toxicogenomic database (CTD) helps to study the chemical-genome to phenome relationships. www.nature.com/scientificreports/  Figure 10. The drug-gene network was constructed between the FDA approved drugs and target genes. A broken line indicates the interaction between known drugs while solid line represents the novel drug targets. Anticancer drugs were retrieved from drug B+ ank. www.nature.com/scientificreports/ disorder through microarray studies. The expression profiling of these important genes shows obvious differences between cases and controls. Some DEGs were found upregulated while other downregulated in this analysis. These genes are abnormally expressed to affect physiological functions including cellular signaling, replication, mitotic division, and programmed cell death. We have observed that our differentially expressed genes are associated with the cancer pathways including biological oxidations, metabolism, adenylate cyclase-activating pathway, xenobiotics, G alpha signaling events, transcriptional targets of TAp63 isoforms, p53, and IFN-gamma pathway revealed the biological significance of these genes specifically for oral cancer progression. CDC20_HUMAN, HDAC1_HUMAN, CXL10_HUMAN interacting with source genes are potential drug targets [13][14][15] . The inherent mutations are reported for genes that encode drug-metabolizing enzymes. Such somatic gene mutations are induced chemically that play a vital role in cell differentiation and growth 16 . The sequencing investigations not only characterizes the genomics but also revealed thousands of SNVs (single nucleotide variants), the alterations in copy number along with many types of genetic variations. Such genomic to phenomic association identification, their molecular-level mechanisms, disease-related variants along with cancer-derived mutations are the current challenges in the biomedical research 17,18 . Deciphering inter-individual genetic variation is the latest trend in personal genomic era investigations. The interpretation of genomic to proteomic information may integrate the impact of mutations on cellular system-level investigations in the future with a higher magnitude 19 .
Human proteomic data analysis uses PTMs to interpret disease mutations and inter-individual genetic variations. PTMs being important regulators of protein function and signaling pathways facilitate the missense mutational analysis investigations 19 .
We have observed that CYP1A1 belongs to potential and well-preserved phase-I xenobiotic metabolizing gene family which is involved in the activation of procarcinogens. The CYP1A1 enzyme is highly associated with increased risk of tumors in the oral cavity, bronchial and laryngeal regions in smokers 4 . Similarly, the association of CYP1B1 has found in many cancer types 20 . Many investigations are reported about the substantial link of HNC with CYP1A1 and CYP1B1 16,21 .
SERPINB5 belongs to serpin encoding serine protease which plays a vital role in tumor metastasis 22,23 . It is a tumor suppressor in epithelial cells and can suppress cancerous cell invasion and their metastasizing in surrounding tissues 24 . The paradoxical expression of SERPINB5 has been observed in various types of tumor [25][26][27][28] . We have seen a highly significant association between SERPINB5 expression and oral carcinoma 29 . ADCY2 is a membrane-associated enzyme which converts adenosine-5′-triphosphate (ATP) into 3′,5′-adenosine monophosphate (cyclic AMP/cAMP) and pyrophosphate 30 involving in the regulation of cAMP synthesis 31 . This gene catalyzes the signaling molecule cAMP through G-protein beta as well as gamma subunit signaling 32,33 . Therefore, changes in expression patterns of the gene are mediated through down-streaming of the signaling cascades muscarinic acetylcholine receptors which increases IL6 production. The high regulation of the gene is observed in G-proteins, calcium, calmodulin, pyrophosphate, and post-translational modifications. The signaling pathways include RET signaling, Oocyte meiosis, calcium, and chemokine signaling pathways [34][35][36] . Aberrant methylation of ADCY2 is observed in colorectal, prostate cancer [35][36][37] and urinary bladder cancer [38][39][40] . It has been studied that ANAPC13 41 is a large-sized ubiquitin ligase that controls the cell cycle progression 42 and involved at early steps of malignancy in tumor cells 43 . Similarly, C7 (complement component-7), the terminal component for complement cascade and as a cytolytic effector for complement system, lyses transformed malignant cells [44][45][46][47][48][49][50][51] . The integration of chemical-gene interaction revealed different environmental chemical exposure to disease progression 52,53 . This analysis helped to reveal the mechanism of action between the chemical and the related gene products and their effect on human disease influence by environmental exposure [53][54][55][56] .
The PPI network predicted the important association of these genes with disease. These genes have a potential role in xenobiotic metabolism, tumor progression, suppression, cell cycle, HPV (human papilloma virus), alcoholism, and microRNA signaling pathway 30,40,48,49,51,[57][58][59][60] . The transcriptomic analysis showed expressive transcription factors like JUND, FOXO, STAT1. We found the role of these genes in metabolism of xenobiotics, p53 signaling, salivary secretion, class-I MHC mediated antigens and microRNA cancer pathways. The miRNAs regulate post-transcriptional and translational events and expressional dysregulation in these molecules leads towards the progression of many diseases [61][62][63] . Therefore, the reliable miRNA target prediction is crucial for the functional annotation of miRNAs 64,65 .
Recent reports proved that the drug-gene network enables us to understand not only the disease pathophysiology but also important in drug designing or new drug target identification or establishing novel biomedical linkages. More importantly, this network proposed many testable assumptions with the potential of great success, though the real achievement can only be justified by experimental studies.

Conclusion
Our simulation-based systems-level hypothesis is comprehensive and effective to sort out the disease-specific genetic variants from cDNA datasets repositories. Therefore, this approach will support understanding the genetic basis of complex phenotypes including cellular replication, protein signaling, mitotic division, and programmed cell death. Based on genomic to phenomic investigations, we have found new genes including ADCY2, SER-PINB5, and ANAPC13 linked with oral cancer that could be potential diagnostic or drug targets. These source genes are clearly interacting with other essential genes affecting cell cycle and apoptosis causing carcinogenesis. These findings can provide a valuable framework for developing new therapeutic strategies against oral cancer.

Methods
Accession to cDNA datasets. 5.2 We downloaded cDNA datasets related to oral cancer from the Gene Expression Omnibus database (GEO) NCBI. The comprehensive framework has been illustrated in Fig. 11 using tools, online servers, and software ( www.nature.com/scientificreports/ Normalization and differential analysis. These datasets were analyzed in identifiable format to easily access pheno-data files and missing values were imputed 66 . R software version 3.3.3 and Bioconductor packages were used in computational analysis. The normalization and quality control analysis was performed to preprocess the information available by ArrayQuality [67][68][69] . The background and normalization were aligned by using Robust Multi-Array Analysis (RMA) to detect the PM (perfect matches) and the MM (mismatches) to impute the values for statistical analysis 70 . RMA is the widely used preprocessing algorithm applied for background correction to remove local artifacts 67 .
where PM indicates a perfect match, Background by BG and non-specific binding (S); ijk is the signal for probe j of probe set k on array i.
The perfect match involves the combined signals of background (BG) and expression (E). The "ArrayQualityMetrics" software was used to evaluate the quality of dataset that is normalized to each genke's median expression level 67,68,71 . The gene-gene covariance matrix of each data set was calculated across all arrays while ignoring the missing values. The transformation formula is: where F1 and F2 represents distribution functions of the actual and reference chips.  www.nature.com/scientificreports/ To get a description of intensities, we used the RMA-algorithm to measure averages between probes in a sample set. During this analysis RNA quality was evaluated in samples. The RNA degradation analysis was performed by using AffyRNAdeg, summary AffyRNAdeg, and plot AffyRNAdeg packages 72 . The DEGs were observed using the LIMMA package, that process the information based on modified statistics which is proportional to sample variance offsets. The LIMMA package measured the duplicate spots and quality weights. The statistical analysis was performed to categorize the genes based on the significant cutoffs values logFC greater than 1, FDR less than 0.05, AEL ≥ 40% and p value ≤ 0.05 73 . K-fold cross-validation. We used K-Fold Cross-validation and Bootstrap test to estimate accuracy in the differential analysis 74 and this approach has the advantage that all the samples in the dataset can ultimately be used for both training and research. This technique is usually easier to calculate estimated average error and has been used to validate the shortlisted differentially expressed genes using the 'Boot' package of Bioconductor. Boots trapping is used effectively in molecular analysis to correct biases 75 . In such cases, we applied the generalized linear Gaussian models and used the ' cv.glm' method to test the k-fold cross-validation. It estimates the true error as the average rate error: Leave-one-out-cross-validation (LOOCV) continued to trail the Gaussian rule. The LOOCV approach is instinctively termed as the test set is left out and the rest of the data is used as the training-set 75 . We used N − 1 subsets for training and the rest for testing. Increasing the number of folds would make the bias of the true error rate estimator low and valid 75,76 .
The true error is assessed as the average error rate on test cases:

Disease-gene curation of differentially expressed genes (DEGs). The text mining is important in
biomedical research to extract useful information 77 . This analysis is designed to identify the most significant DEGs, all 30 genes from 21 datasets were curated from the DAVID database to retrieve their gene symbol, gene name, Uniprot_ID. These genes were curated using Comparative Toxicogenomics Database (CTD), Online Mendelian Inheritance in Man (OMIM), PubMed and MeSH databases to observe their role in oral cancer. This screening further shortlisted the significant DEGs 78 .
Enrichment and cluster analysis. The biological functions of the genes help us to understand the cellular level signaling network. We performed enrichment analysis using the DAVID tool [78][79][80] . FunRich tool was used to observe the biological functions of oral cancer-related DEGs at molecular level 81 . The list of DEGs were analyzed for their p-and FDR values 79 . For cluster analysis, gene expression values of cases and controls of each dataset were studied to observe genetic variations and expression profiling using One Matrix CIMminer tool 82,83 . Mutation analysis. Mutations resulting from cancer and the inherited-disease process can be understood to decode the genetic variation by associations of genotype-phenotype. The human genome contains thousands of SNVs (single nucleotide variants) and many are known for the progression of the disease. Approximately 21% of amino acid substitutions are known to be associated with disease-progression in correspondence with missense single nucleotide variants located at PTM protein sites (post-translation modifications). The chemical modification of the amino acid thus basically extends the functionality of the associated protein 19 . Mutation of differentially expressed genes were analyzed using online ActiveDriverDB database 19 . The needle plot mutations analysis provides a visual overview of the position, frequency, and functional significance of all identified mutations in our DEGs. PTM sites with all mutations and the predicted disordered region of protein sequences were observed. Placing the pins corresponds to the position along the sequence of the genes and protein, whereas the related mutation effect and PTM are explained in the figure legend.
Protein-protein interaction. The biological functions are mainly carried out by protein-protein interactions 30,84 . The interaction of proteins reveals that each protein interacts with one or more genes related to their molecular functions 85 . The biological networks indicate altered activity in normal or disease conditions. This gene-network aims to identify potentially OC associated gene signatures whose dysfunction directly contributes to disease phenotype are functionally associated. The gene signatures related to each source protein was measured. Human Annotated and Predicted Protein Interaction (HAPPI) and String databases were used to analyze gene-gene/protein-protein interactions of microarray dataset DEGs 86 . This database annotates and mine comprehensive physical as well as genetic mapping and includes experimentally validated data to simulate biological networks. We have mentioned the threshold for PPI network from HAPPI database. We used highconfidence interactions in our network (the five stars are equivalent to high score (0.90-1). The role and association of these source and target genes in oral cancer were evaluated from Cancer Genetics Web, National Cancer and OMIM database. The molecular networks were visualized by Cytoscape software (version 3.6.0) 87 . The www.nature.com/scientificreports/ Cytoscape Network Analyzer calculates topological properties of networks. The degree of annotation between the gene and disease is categorized by nodes in the network.
Pathway analysis of oral cancer linked genes. Reactomic analysis enable us to explore all metabolic networks of DEGs regarding their molecular mechanism. We analyzed these pathways to inter-connect DEGs to show the pathological mechanism of oral cancer. The KEGG and Wiki pathways databases were used to map target genes 88,89 . PathVisio tool was used to reconstruct the pathway model for understanding system-level analysis 90 .
Toxicogenomic analysis. The toxicogenomic analysis is carried out by a comparative toxicogenomic database (CTD) to retrieve exposome data. The exposome data helps investigate chemical-genome to phenome relationships to interpret the functional pathway cellular signaling-mechanism towards disease progression influenced by environmental exposures. It provides information regarding chemical-gene/protein and disease interactions which may reveal the particular gene-activity or expression regarding gene-disease connections. The curation of environment-disease exposure helps to analyze the available toxicogenomic information 55 .
Prediction of regulatory motifs. Cancer has a complex mechanism that can be explored by understanding the biological functions at transcription and post-transcriptional level. oPOSSUM version 3.0 was used to analyze promoter region target motifs like transcription factor binding sites (TFBS) or the overexpression of target matrices 91,92 . This information helps to understand the functional role of gene targets and eventually gene ontology 93 .
Prediction of oral cancer-associated miRNA targets. Numerous genes are involved in the biological signaling cascade. These cascades are influenced through small noncoding RNAs as post-transcriptional regulators, known as microRNAs (miRNAs). The function and expression of miRNA play a significant role in understanding gene etiology 5,93 . miRNA target prediction helps to explore the functional and molecular annotation of disease-specific DEG's 5,94 . Therefore, oral cancer associated DEG's miRNA targets were predicted by miRDB, an online database for functional microRNA target prediction. The target prediction data involves specie specific 3′-UTR sequences, 3′-UTR region length, miRNA seed binding-sites, miRNA-candidate target pairs along with target prediction scores, miRNA-target sequences, and other important description 95,96 . The MiRNA target predictive score is ranked and > 80 was considered as a reliable score 95,96 .
Drug-gene interaction analysis. In our study, the drug-gene network analysis was performed to correlate our shortlisted DEGs with FDA approved commercially available anti-cancer drugs. CTD database was used to investigate the relationship between the chemical and disease at default parameters. In this analysis, DEGs were directly linked to anticancer drugs. All drugs, used in this interaction, were verified through the Drug Bank database to check approval status by the FDA.

Data availability
All the other data that support the findings of this study are available from the corresponding author upon request.