Mechanism of Astragalus membranaceus in the treatment of laryngeal cancer based on gene co-expression network and molecular docking

Astragalus membranaceus (HUANG QI, HQ) is a kind of traditional Chinese medicine. Researchers have widely concerned its antitumor effect. At present, there is still a lack of research on the treatment of laryngeal cancer with HQ. In this study, we integrated data from the weighted gene co-expression network of laryngeal cancer samples and the components and targets of HQ. A new method for dividing PPI network modules is proposed. Important targets of HQ treatment for laryngeal cancer were obtained through the screening of critical modules. These nodes performed differential expression analysis and survival analysis through external data sets. GSEA enrichment analysis reveals pathways for important targets participation. Finally, molecular docking screened active ingredients in HQ that could interact with important targets. Combined with the laryngeal cancer gene co expression network and HQ PPI network, we obtained the critical module related to laryngeal cancer. Among them, MMP1, MMP3, and MMP10 were chosen as important targets. External data sets demonstrate that their expression in tumor samples is significantly higher than in normal samples. The survival time of patients with high expression group was significantly shortened, which is a negative factor for prognosis. GSEA enrichment analysis found that they are mainly involved in tumor-related pathways such as ECM receptor interaction and Small cell lung cancer. The docking results show that the components that can well bind to important targets of HQ are quercetin, rutin, and Chlorogenic acid, which may be the primary mechanism of the anti-cancer effect of HQ. These findings provide a preliminary research basis for Chinese medicine treatment of laryngeal cancer and offer ideas to related drug design.

Identification of critical modules. We use the WGCNA package in R software to build a weighted coexpression network 24 . When R 2 > 0.9, the network meets the scale-free condition. The calculation result is shown in Fig. 1A, where SFT.R.sq is R 2 . When the power value is set to 5, the connectivity between genes in the network satisfies the scale-free network distribution. We use a dynamic cutting algorithm to obtain ten modules, of which the gray module is a set of genes that cannot be aggregated into other modules (Fig. 1B). Then, the correlation between each module and the phenotype (survival time, recurrence, age, and grade) was calculated (Fig. 1C). A heat map showed a significant positive correlation between the green module and recurrence (cor = 0.32, P < 0.001) (Fig. 1D). The number of genes in each module is shown in Fig. 1E. The GS and MM scatter plots of the green module also illustrate the significant correlation between the green module and laryngeal cancer. Therefore, the green module was selected as the critical module. Enrichment analysis of the critical module. We used GO enrichment analysis and KEGG pathway analysis to find the biological function of genes in the green module ( Fig. 2) 25,26 . In the biological process classification, genes in the module are significantly enriched in extracellular matrix organization, hemidesmosome assembly, cell adhesion, cellular response to zinc ion, negative regulation of growth, extracellular matrix disassembly, cell-matrix adhesion, cell adhesion mediated by integrin, cellular response to cadmium ion, leukocyte migration. In the cellular component classification, the genes in modules are significantly enriched in focal adhesion, cell surface, extracellular space, basement membrane, integrin complex, perinuclear region of cytoplasm, extracellular matrix, extracellular region, proteinaceous extracellular matrix, extracellular exosome. In the molecular function classification, the genes in modules are significantly enriched in integrin binding, L-ascorbic acid binding, protein binding, procollagen-lysine 5-dioxygenase activity, laminin binding, cadherin binding involved in cell-cell adhesion, zinc ion binding, protease binding, ion channel binding, glycoprotein binding. In the KEGG pathway classification, the genes in modules are significantly enriched in focal adhesion, ECM-receptor interaction, PI3K-Akt signaling pathway, regulation of actin cytoskeleton, mineral absorp-  Fig. S2. We take HQ-PPI network as the background. Gene interactions in the weighted gene co expression network of laryngeal cancer was mapped to the HQ-PPI network. Common nodes and interactions between gene modules and PPI network are preserved. Finally, we got laryngeal cancer-HQ-PPI network. The network consists of 64 nodes and 249 edges, which are composed of 9 modules (with the gray modules removed), as shown in Fig. 3A. Therefore, the cor- www.nature.com/scientificreports/ responding green module after the mapping is considered to be a critical module for HQ treatment of laryngeal cancer. Nodes in the green module are considered candidate important targets. By calculating the topology parameters of the network nodes, the degree value of each node is obtained. As shown in Fig. 3B, compared with the nodes of other modules, the nodes of the green module have a higher degree value and are in the core position of the network.  www.nature.com/scientificreports/ Differential expression analysis, survival analysis, and GSEA analysis of important targets. Based on the sample information of TCGA and GTEx projects in the GEPIA database, it was found that some candidate important targets had differential expression between normal and laryngeal cancer. The Wilcoxon rank test was performed on the normal group and the laryngeal cancer group (Fig. 4A). Among them, the expressions of MMP1, MMP3, and MMP10 in tumor tissues were higher than those in normal tissues (p < 0.05). After that, we used the Kaplan Meier-plotter website for survival analysis. The results are shown As their expression levels increase, the overall survival time decreases significantly. The above results suggest that MMP1, MMP3, and MMP10 are poor prognostic factors, and their high expression is significantly associated with shortened patient survival. Therefore, they are considered to be important targets closely related to laryngeal cancer, which may be the primary mechanism of HQ treatment of laryngeal cancer. To understand the pathways involved in these important targets (MMP1, MMP3, MMP10), a GSEA analysis was performed using the data set GSE27020. There were ten pathways enriched in the high expression group ( Table 1 and Fig. 5. The docking results of other compounds are shown in Table S3. As shown in the figure, among the top 10 ingredients, the best combination with MMP10 is isomucronulatol-7,2′-di-O-glucosiole, Jaranol, rhamnocitrin-3-O-glucoside, Hirsutrin, the best combination with MMP3 is

Discussion
In this study, we obtained ten modules based on 109 laryngeal cancer expression data in the GSE27020 dataset. The green module is the critical module we found. The results of enrichment of GO and KEGG also showed the mechanism of laryngeal cancer. Among them, cell adhesion mediated by integrin, leukocyte migration, ECMreceptor interaction, PI3K-Akt signaling pathway etc. have been proved to be closely related to the occurrence of cancer. Integrin mediates the interaction between cell and cell, cell and extracellular matrix, and participates in the information transmission inside and outside cell 27 . It plays an important role in the proliferation, differentiation, adhesion, migration and apoptosis of cancer cells. Some studies have shown that integrin is the key to cancer cell proliferation. When cancer cells float, integrin will change its function from adhesion to communication 28 . The dysfunction of ECM-receptor interaction may lead to the abnormal activation of signal transduction pathways, including Ras/ Raf/MAPK, Raf/JNK, Rho/RAC/Pak and PI3K/Akt/mTOR. So as to create a tumor microenvironment that can promote tumor survival, angiogenesis and invasion 29 . PI3K-Akt signaling pathway is the most common dysregulated pathway in many tumor cells. When it is activated abnormally, many tumor suppressor genes, including PIK3CA, PIK3R1, PTEN, Akt, TSC1, TSC2, LKB1, mTOR, will be affected 30 . The results of the enrichment analysis may be the main mechanism of laryngeal cancer.
The green module was mapped on the HQ PPI network, and finally, the potential critical targets MMP1, MMP3, and MMP10 for the treatment of laryngeal cancer were obtained. Gene expression data from the external www.nature.com/scientificreports/ data set showed that the expression of these three genes in the tumor group was significantly higher than in the normal group. Survival analysis showed that high expressions of MMP1, MMP3, and MMP10 would lead to a reduction in patient survival. They are detrimental to prognosis. GSEA analysis showed that the MMP1, MMP3, and MMP10 high-expression groups participated in tumor-related ECM receptor interaction, small cell lung cancer, and other pathways. MMP1, MMP3, and MMP10 belong to the matrix metalloproteinases (MMPs) family 31 . The MMPs family contains multiple types of proteases. They can break down most of the extracellular matrix components and some proteins, producing growth factors essential for tumorigenesis and progression. Therefore, the MMP family has become a biomarker for some tumors 32 . MMP1 expression increases in malignant tumor cells, which is helpful for tumor cell invasion and metastasis 33 . Studies have shown that the genetic characteristics of MMP1 may become an independent prognostic factor for laryngeal cancer. And the expression of MMP1 in most of the laryngeal cancer tissues with lymphatic metastasis was higher than that of the laryngeal cancer tissues without lymphatic metastasis. MMP1 also increased with the increase of infiltration 34 . MMP3 is one of the most active members in MMPs, and it is expressed in tumor cells such as gastrointestinal tumors and liver tumors 35 . Gobin Emily's research found that MMP3 and MMP10 were significantly up-regulated in at least ten cancer types 36 40 . It can also induce mitochondrial apoptosis in colon cancer cells through a caspasedependent mechanism 41 . In breast cancer, rutin inhibits P-GP and BCRP pumps non-selectively. It can effectively reverse the resistance of breast cancer cells and restore sensitivity to cyclophosphamide 42 . Chlorogenic acid inhibits tumorigenesis of breast cancer cells by causing mitochondrial dysfunction. It can increase the sensitivity of tumour cells to chemotherapy 43 . Besides, chlorogenic acid can also affect the expression of apoptosis-related genes in A549 human lung cancer cells 44 . In addition to quercetin has been reported to be associated with laryngeal cancer, rutin and chlorogenic acid have been reported to be associated with a variety of cancers. But whether they can treat laryngeal cancer has not been reported, and it may become a potential therapeutic drug.
Our method is different from the common PPI-based network module division method. In the past, network modules were divided based on scale-free topological characteristics to obtain the modules with the best connectivity. In this study, we integrated gene expression data for laryngeal cancer. Modules are divided by co-expressed www.nature.com/scientificreports/ genes in laryngeal cancer so that they have a better degree of connection in biological functions. After this, we performed a topology analysis on the mapped network and found that important targets are in the center of the network. Therefore, our method not only integrates the patient's disease information but also ensures the high connectivity of the important targets. As the number of patient samples in TCGA, GEO and other public databases increases, more potential targets and active ingredients can be discovered in the future. Besides, our research lacks the validation of animal or cell experiments. In future research, we will perform pharmacodynamic, and toxicity experiments on these candidate components, and in vivo and in vitro experiments such as RT-PCR and western blot will perform biological verification of potential critical genes.
In this study, we analyzed the gene expression data of 109 laryngeal cancer samples and screened to obtain three important targets (MMP1, MMP3, and MMP10). Their expression in tumor samples was significantly higher than in normal samples, and patients in the high expression group had significantly shorter survival times. The components that can be combined with critical targets in HQ are quercetin, rutin, and Chlorogenic acid. These components may be the primary mechanism of the anti-cancer effect of HQ. These findings provide a preliminary research basis for Chinese medicine treatment of laryngeal cancer and offer ideas to related drug design.

Methods
Data download and preprocessing. The laryngeal cancer GSE27020 data set was obtained from the GEO database and contained a total of 109 laryngeal cancer samples. The downloaded data needs to be preprocessed before analysis. The clinical information of the samples was extracted based on the Affymetrix Human Genome U133A Array platform, and the probe names were converted into gene names. Finally, a gene expression matrix with row names of gene names and column names of sample names was obtained for subsequent analysis.

Construction of weighted gene co-expression network.
In this study, we use WGCNA package in R software to construct the weighted gene coexpression network of laryngeal cancer. Before constructing network, we need to remove outliers. In this study, the goodsamplesgenes algorithm was used to remove the genes with too many missing values. Then, the hclust algorithm was used to cluster the tissue samples. The top 25% of genes with variance in expression were selected. The remaining samples were used to build a weighted gene co-expression network. The weighting coefficient β is selected according to the criteria of the scale-free network, and the correlation matrix is converted into an adjacency matrix, which is hierarchical clustering based on the value of topology matrix (TOM) value. The dis-TOM is used as the distance measurement of the modules. The minimum module size is set as 10. We use a dynamic hybrid cutting algorithm to identify the module and draw the gene tree 45 .
Then, we calculate the eigenvalue of the module, which is the first principal component obtained by principal component analysis (PCA) of all genes in the module. Screen the modules that are significantly related to clinical characteristics as critical modules. Calculate gene significance (GS) and module membership (MM) to measure the correlation between genes and clinical information 46 . Enrichment analysis of critical modules. In order to understand the biological functions of candidate genes, this study maps the genes of critical modules to the online website DAVID (https ://david -d.ncifc rf.gov/) for gene ontology (GO) enrichment analysis and KEGG pathway analysis. The items with P < 0.05 were statistically significant.
PPI network of potential targets of HQ in the treatment of laryngeal cancer. Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP, https ://tcmsp w.com/tcmsp .php) includes 499 Chinese herbal medicines registered in the Chinese Pharmacopoeia, 29,384 ingredients, 3,311 targets, and 837 related diseases 47 . From this, all components of HQ and corresponding targets can be obtained. String (version 11.0, https ://strin g-db.org) is a database integrating known and predicted proteinprotein interactions, and has the function of visualizing protein-protein interaction network 48 . The PPIs of the HQ target were obtained with medium confidence of 0.4 and then imported into Cytoscape to construct the PPI network of HQ target 49 .
If a node is a potential target of HQ, and it is also in a critical module of a weighted gene co-expression network, this node may be a potential target for HQ treatment of laryngeal cancer 50 . Therefore, we take the HQ PPI network as the background and map the gene modules in the weighted gene co-expression network to the HQ PPI network. The Uniprot database (https ://www.unipr ot.org/) is used to convert gene names and target names 41 . Therefore, the module based on the weighted gene co-expression network automatically divides the PPI network into multiple modules. The nodes in the critical module are candidate important targets.
Verification of important targets based on external data sets. GEPIA (https ://gepia .cance r-pku. cn/) is an interactive web server for integrated analysis of cancer expression profiling data 52 . It contains 33 malignant tumor RNA sequencing expression data from TCGA and GTEx. In this study, gene expression data from the TCGA and GTEx databases were used as external test sets to verify important targets obtained in the previous screening. Run the "BoxPlots" module to analyze whether important targets have been differentially expressed between laryngeal cancer and normal samples. Kaplan Meier-plotter (https ://kmplo t.com/analy sis/ index .php?p=servi ce) is a website for online survival analysis 53 . The survival analysis of important targets was carried out. Under different expression levels, whether the survival time of patients has a significant difference. Enrichment analysis of pathways associated with high or low expression of important targets was performed using GSEA 3.0 software 54 . They are grouped by the expression level of important targets in the GSE27020 data-Scientific RepoRtS | (2020) 10:11184 | https://doi.org/10.1038/s41598-020-68093-0 www.nature.com/scientificreports/ set. The data set of c2.cp.kegg.v7.0.symbols.gmt in the msigdb database of the GSEA website was selected as the reference gene set. The number of random combinations was set to 1,000.
Verification of important targets based on molecular docking. Download 3D structures of all molecules in HQ based on the PubChem database 55 . The 3D structure of the target was searched and downloaded from the PDB protein database (https ://www.rcsb.org/pdb/home/home.do) 56 . We prefer to select high-resolution crystal structures complexed with corresponding biologically active ligands and use Autodock software for docking 57 . All ligands are removed from the protein receptor complex, plus polar hydrogen atoms and charges. Finally, the compounds in HQ dock with protein crystals in a semi-flexible manner.