Predicting the molecular mechanism-driven progression of breast cancer through comprehensive network pharmacology and molecular docking approach

Identification of key regulators is a critical step toward discovering biomarker that participate in BC. A gene expression dataset of breast cancer patients was used to construct a network identifying key regulators in breast cancer. Overexpressed genes were identified with BioXpress, and then curated genes were used to construct the BC interactome network. As a result of selecting the genes with the highest degree from the BC network and tracing them, three of them were identified as novel key regulators, since they were involved at all network levels, thus serving as the backbone. There is some evidence in the literature that these genes are associated with BC. In order to treat BC, drugs that can simultaneously interact with multiple targets are promising. When compared with single-target drugs, multi-target drugs have higher efficacy, improved safety profile, and are easier to administer. The haplotype and LD studies of the FN1 gene revealed that the identified variations rs6707530 and rs1250248 may both cause TB, and endometriosis respectively. Interethnic differences in SNP and haplotype frequencies might explain the unpredictability in association studies and may contribute to predicting the pharmacokinetics and pharmacodynamics of drugs using FN1.

list (5557 genes, table S1) of the target was submitted in the STRING to generate the network.The output file showed the interaction between the targets obtained from BioExpress database.A schematic diagram provides a summary of all methodological techniques in a schematic diagram (Fig. 1).STRING has a default feature that does not carry duplicate genes; those genes are not officially mention.STRING was generated the network 532 out of 5556 genes (Fig. 2).STRING was generated the network 532 out of 5556 genes that was take up as input in Cytoscape to generate the node file.The results obtained were summited to Cytoscape and the node file Identification of hub genes associated BC.Hub genes tend to exert core functions in a closely related network.Of the 532 genes within the module, 30 most relevant genes were selected as the candidate hub genes (FN1, MMP9, GFAP, PPARG, FOXM1, PVALB, CAV1, AQP4, ASPM, DPP6, LEP, GATA4, PBK, APOB, CES1, ACAN, IBSP, CAV3, S100B, CAPSL, MYH11, DMD, LPL, WT1, EPO, SLC2A4, PRSS1, TK1, NEURL1 and SLC6A4).Three genes (FN1, FOXM1, and PPARG ) out of 30 genes were significantly associated with BC.The prognosis results of these three genes adjusted for stage and grade also indicated significance (Fig. 5).

Study of LD and haplotype. Genotype data of CHB (Han Chinese) were retrieved from the International
Hapmap Project for studies of the LD and haplotypes to examine the genetic parameter of the identified FN1 biomarker.Data about the susceptibility of many diseases due to genetic variants has been evaluated as the information helps to find the vital biomarkers for functional associations with different diseases and drug resistance  www.nature.com/scientificreports/due to SNPs.The LD and haplotype study revealed significant blocks in the FN1 gene, having nonrandom associations as represented in Fig. 6, Table S3.Two SNPs (rs1250248, and rs6707530) were identified and found to role in different diseases and drug resistance.These 2 SNPs (rs1250248, and rs6707530) with minor allele frequencies and r2 ≥ 0.8 showed high correlation between the loci.Linkage disequilibrium has been reported between the common polymorphism found on FN1 at positions 215,360,440 and 215,436,073.The analysis revealed that of the nsSNPs identified, only two nsSNPs occurred and were linked in Han Chinese individuals.The results also indicated that only MAF (minor allele frequency) values of 0.263, and 0.233 showed relatively strong linkage disequilibrium.The genotype of rs6707530 with the FN1 gene increased the risk of occurrence of TB 20 .The genetic variant of FN1 rs1250248 was also found to be significantly associated with endometriosis.This study demonstrates a potential connection between breast cancer, TB and endometriosis, and in this paper reported genetic variants of FN1 were identified as chemoresistance.These results provide an evidence of alteration leading to chemoresistance in several diseases.LD and haplotype information would be beneficial in drug development and in understanding the genetic associations of FN1 with adverse drug effects.

Biomarker guided drug repositioning and validation.
To support identification of a drug for targeted therapy, the analysis of drug sensitivity of identified biomarkers by two approaches i.e.LD and GSCA was carried out.Further that the results of LD and haplotype study revealed the relationship between FN1 (key gene in breast cancer) with endometriosis, therefore FDA approved drugs used in breast cancer and endometriosis were selected  www.nature.com/scientificreports/for validation of the biomarkers using docking procedure by Autodock vina.In addition GSCA online server results obtained as output was used to study interaction of drugs with three selected genes (PPARG, FOXM1 and FN1).The results of GSCA showed PPARG and FN1 are sensitive towards 23 drugs (AR-42, AT-7519, BMS345541, BX-912, CUDC-101, FK866, I-BET-762, Ispinesib, Mesylate, JW-7-24-1, KIN001-102, Methotrexate, NG-25, PHA-793887, PI-103, PIK-93, Phenformin, QL-X-138, THZ-2-102-1, TPCA-1, TubastatinA, Vorinostat, WZ3105, XMD13-2, GSK690693, and NPK76-II-72-1) whereas FOXM1 was least sensitive towards them but had sensitivity towards RDEA119, Trametinib and selumetinib drugs.The forty-four drugs obtained from the GSCA and LD analysis were subjected for docking studies against all the three proteins related to the biomarkers for validation.The results showed that the drugs THZ-2-102-1, navitoclax, ng25, and trametinib have good binding affinity with all the three proteins whereas vorinostat had least binding affinity with all the three targets which was found to be inconsistent to the GSCA result.This inconsistency in results may be to the difference in the inputs made as the GSCA uses gene information whereas docking is study of protein ligand interaction.Since docking is robust technique we relied on docking results for further study ( .0,-8.9, -6.9, -6.8, -6.5, -6.5.However, it was observed that five drugs viz.Tucatinib, THZ-2-102-1, Olaparib, Navitoclax, ng25, Trametinib, Methotrexate, Abemaciclib showed good binding affinity against all three target protein.

Discussion
This study demonstrates how predicted biomarkers play an integral role in the progression of BC with the support of literature and tried to identify the potential small molecules that can be developed for its treatment.To study the key genes involved in the breast cancer and their association with other disease network biology based approach was taken up in addition to the use of other in-silico tools.The network-analyzed approach with breast cancer datasets form Bioexpress database and an independent validation set with 5557 genes was used to generate the genes network.A node file of 532 genes was created out of 5557 genes by using the network analyzer.The module preservation analysis invalidating set revealed that the identified modules were reliable, as all of their summary statistics were above 30.A venn diagram was generated using the Venn Diagram tool based on the results obtained and it could be concluded that three genes (FN1, FOXM1, and PPARG ) are closely related to the target disease.The literature study revealed that the expression of these biomarkers along with the protein related is enhanced in BC.
The involvement of identified overexpresses biomarkers in related biological processes (BPs), molecular functions (MFs), and cellular components (CCs) was studied using network based and integrated statistical methods.It was felt that biological processes, cellular components, and molecular function like protease, serine protease, hormone, motor protein, protease inhibitor, and cell division process would make a good target for the design and development of anti-BC agents.
The expression level of FN1 has been correlated to an advanced stage of breast cancer and poor clinical outcomes.FN1 were identified for the first time as cancer stromal key genes associated with breast cancer invasion and metastasis 21 .The FN1 gene encodes fibronectin 1, an important extracellular matrix glycoprotein that plays a pivotal role in occurrence and development of various tumors.It binds to several members of the family of integrin receptors 22 thereby activating the breast cancer's PI3K/Akt pathway.Furthermore, FN1 has been shown to promote cell proliferation and migration in cancers such as esophageal, oral, nasopharyngeal, colorectal, ovarian, renal, and thyroid 23,24 .
The Forkhead Box M1 encoded by FOXM1 gene is a transcriptional activator involved in cell proliferation and encodes protein which is phosphorylated in M phase and as a result regulates the expression of several cell cycle genes, such as cyclin B1 and cyclin D1 25 .The FOXM1 gene has been reported to overexpress in aggressive, therapy-resistant variants of hormone receptor-positive and triple-negative breast tumors and correlated with a poor prognosis in a variety of cancers, such as breast cancer and colorectal cancer [25][26][27] .
The Peroxisome Proliferator-Activated Receptor-Gamma (PPARG ) is a ligand-activated nuclear hormone receptor that regulates lipid metabolism and insulin sensitivity 28 .By synthesizing high amounts of lipids, ERBB2positive breast cancer cells generate palmitate-induced lipotoxicity 41 .PPAR is a nuclear hormone transcription factor that regulates the expression of several genes involved in adipogenesis, energy metabolism, proliferation, and growth of tumours.PPARG is the predominant subtype of its family that is expressed in the mammary gland and in primary and metastatic breast cancer, according to references [28][29][30][31][32][33][34][35] .
PPARG is indirectly involved in development of breast cancer through ERBB2 signaling 36  www.nature.com/scientificreports/ERBB2-positive breast cancer cells supporting its role in development of BC 36 .In numerous malignancies, including colorectal cancer, hepatocellular carcinoma, lung cancer, glioma, and leukaemia, the effects of PPAR activity in CSCs (Cancer stem cells) has been investigated 36 .All these genes are reported overexpressed along with their target proteins in cells.This study was aimed to find out specific inhibitor to control the overexpression of these proteins in the cancer cells.Lately FDA has been approving the single to multi-target drugs in cancer therapy and this move is look at a different scenario, where a new generation of anticancer drugs is capable to prevent more.There is major paradigm shift in drug design and discovery due to number of reasons 26,37 and one such approach that is gaining fast acceptability is development of multi-target drug.The advantages of multi-target drugs over their single-target counterparts include improved efficacy, a safer profile, simpler administration and patient compliance.Also polypharmacology-guided drug design has been an approach used by scientist for drug development in particularly for repurposing of drugs and multi-targeting.
Taking the concept of multitarget approach the docking of FDA approved anticancer (anti BC and antiendometriasis) agents and drugs identified in GSCA analysis was carried out against the three selected targets.The results obtained presented in Table 1.It could be observed that THZ-2-102-1 and tucatinib showed best binding affinities with all the three target receptors (3M7P, 3G73 and 4Y29) followed by navitoclax, ng25, trametinib, olaparib, and methotrexate contrary to this thiopeta had least binding score with all the target receptors.Interaction analyses of the docked poses of tucatinib 38 showed that the drug is a persuasive inhibitor of 3M7P, 3G73 and 4Y29.Tucatinib and THZ-2-102-1 could be further taken as lead for development of multitarget drug for BC treatment.Trametinib is already approved by the FDA in May 2013 for the treatment of metastatic melanomas, as well as lung cancer, renal cancer, thyroid cancer, cholangiocarcinoma, and breast cancer [39][40][41][42] .These reports further validated our results that the biomarker identified and drug for multiple target can be "the approach" for the disease.
It could be further mentioned that these drugs would be more beneficial in cases suffering from TB as the Linkage studies identified the FN1 gene as a susceptibility for TB 20 , endometriosis 39 .These drugs are not clinical approved against to our targets and may be expanded upon for wet lab studies.

Identification and selection of BC-associated genes. BioXpress is a differential expression database
for cancer where RNA-seq and miRNA-derived read counts have been evaluated for differential expression.The current version of BioXpress incorporates mRNA and miRNA-derived expression from TCGA and ICGC 16 .BioExpress database have filter feature like search type, cancer type, feature type, trend with significance cutoff to retrieve cancer dataset.To retrieve cancer dataset from the database, filters like cancer type, breast cancer, mRNA with adjusted p-value were used and the output file obtained was further used for the study.
Analysis of functional association.The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes 43 .DAVID's was used to analyze modules at all levels of the hierarchy 44 .The pathways and functions with a corrected p < 0.05 were deemed statistically significant.
Construction of protein-protein interaction (PPI) network.Out of 11,053 genes identified which were significantly overexpressed in BC patients from BioXpress 16 (FC > 1, adjusted p < 0.05), 5557 genes (Table S1) were used for an interactome network using the STRING app in Cytoscape 3.6.016 18,19.The network was extracted only for the physical interaction network, which represents the protein-protein interaction network of BC-associated genes.A protein-protein interaction network using 2960 nodes and 20,372 edges was constructed as a primary network.Venn diagram server was used to show and identify the relationships among lists of degree, betweenness, and closeness centrality of the node file. 45(https:// kmplot.com/ analy sis/) was queried for assessing the prognosis of these functionally enriched module genes in network approach.KM survival plots along with 95% confidence interval (95% CI), hazard ratio (HR), number-at-risk, and log-rank p-values of enriched module-genes were presented by splitting network approach into lower and higher expression groups, respectively.www.nature.com/scientificreports/Analysis of LD and haplotype.Linkage disequilibrium (LD) plays a key role in a wide range of mapping disease associations to demographic history estimation or trail-associated genes 46,47 .Haplotype blocks reveal information on the combination of alleles or a set of single nucleotide polymorphisms (SNPs) found on the same chromosome and aid investigators in localizing disease-causing genes and loci.The Haploview tool 48 was used to study LD, and haplotype block for Han Chinese (CHB) genotype data retrieved from the International Hapmap Project 49 .The genotype data was visualized and examined for linkages and generation of LD and haplotype blocks.www.nature.com/scientificreports/Biomarker guided drug repositioning.In order to identify potential anti BC agents two approaches were used; i.e.Gene Set Cancer Analysis (GSCA) and ii.Molecular docking.To visualize and analyze the correlation between the identified biomarkers and drugs, GSCA (http:// bioin fo.life.hust.edu.cn/ GSCA/#/) an online tool was used.On the basis of Spearman Correlation and FDR value (between drugs and targets) drugs were selected for further study from GSCA analysis.This approach may be helpful in improving the efficacy or safety related to drugs.Further that the LD and haplotype studies revealed that endometriosis is associated with our target key genes 39 .Therefore, FDA-approved drugs used in BC and endometriosis were selected for further studies.Molecular docking studies being a significant in-silico technique for validating the drug-target binding interaction 50,51 was used to validate the selected drugs for multitarget purposing.The 3D structure of target proteins (PBD ID-4Y29, 3G73, 3M7P) was downloaded from Protein Data Bank (PDB) (https:// www.rcsb.org/) 52 and used for docking studies.The structure of identified/ selected drugs from GSCA and FDA-approved drugs used in BC and endometriosis were download from the PubChem databases (https:// pubch em.ncbi.nlm.nih.gov/) 53 .The AutoDock-vina 26 software was used to examine the structural binding performance between receptor and drugs by computing affinity scores (kcal/mol).Discovery Studio Visualizer was used to visualize 3D protein-ligand interaction.

Conclusions
In this study, using network approach the key genes as breast cancer regulators were identified.The GO-terms (BPs, MFs and CCs) and key genes regulatory network analyses highlighted as pathogenetic process of BC progression.Our study also suggested two biomarkers-guided tucatinib and THZ-2-102-1 drugs which might be effective in inhibiting BC-causing proteins (FN1, FOXM1, and PPARG).The literature review and molecular docking analysis were used to validate our findings.The findings of our in-silico study may potentially lead to insights into the molecular mechanisms that drive BC progression and potential therapeutic agents. Vol

Figure 2 .Figure 3 .
Figure 2. Network of the up regulated gene in breast cancer.

Figure 4 .
Figure 4. a Plots illustrating the correlation between the biological processes (BPs), cellular components (CCs), molecular function (MF) and KEGG pathways with the genes identified by DAVID.

Figure 5 .
Figure 5. Kaplan-Meier curve of key regulators FN1, FOXM1 and PPARG.p values were calculated using the log rank test to evaluate the overall survival analysis between low expression (black) and high expression (red) of key regulator genes of patients.

Figure 6 .
Figure 6.LD structure of FN1 gene: (A, B) LD structure of maximum number of SNPs, (C, D, E) LD structure of FN1 with minimum block size.

Figure 7 .
Figure 7.The structure of anti-breast cancer inhibitors of our targets.

Table 1
. result obtained in terms of list of genes name with closenesscentrality value, degree value and betweennesscentrality value by using cytoscape.S.

Table 2 .
Molecular docking analysis results of ligand selected from GSCA and FDA approved drugs for endometriosis and breast cancer against the three targets FN1, FOXM1, and PPARG.