Genetic network and gene set enrichment analyses identify MND1 as potential diagnostic and therapeutic target gene for lung adenocarcinoma

This study aimed to characterize the key survival-specific genes for lung adenocarcinoma (LUAD) using machine-based learning approaches. Gene expression profiles were download from gene expression omnibus to analyze differentially expressed genes (DEGs) in LUAD tissues versus healthy lung tissue and to construct protein–protein interaction (PPI) networks. Using high-dimensional datasets of cancer specimens from clinical patients in the cancer genome atlas, gene set enrichment analysis was employed to assess the independent effect of meiotic nuclear divisions 1 (MND1) expression on survival status, and univariate and multivariate Cox regression analyses were applied to determine the associations of clinic-pathologic characteristics and MND1 expression with overall survival (OS). A set of 495 DEGs (145 upregulated and 350 downregulated) was detected, including 63 hub genes with ≥ 10 nodes in the PPI network. Among them, MND1 was participated in several important pathways by connecting with other genes via 17 nodes in lung cancer, and more frequently expressed in LUAD patients with advancing stage (OR = 1.68 for stage III vs. stage I). Univariate and multivariate Cox analyses demonstrated that the expression level of MND1 was significantly and negatively correlated with OS. Therefore, MND1 is a promising diagnostic and therapeutic target for LUAD.

www.nature.com/scientificreports/ Machine-based-learning approaches 4, [6][7][8][9][10][11][12] have been used to analyze both histological and molecular features of tumors for classification according to molecular patterns 9,12 and identification of biologically relevant, tumor-type-defining and clinically informative genetic modifications 11 in a variety of cancers. Nowadays, rapid development of "-omic" technologies, including gene chip analysis and next-generation sequencing, has been proven to generate vast volumes of molecular data for a tumor and publicly stored in databases for assessment of differentially expressed genes (DEGs) in a tumor without the need for such subjective diagnostics 13 . The Gene Expression Omnibus (GEO) Database was the first public database for high-throughput gene expression data as well as hybridization array, chip, microarray 14 , supporting MIAME-compliant data submissions. Another public database is the Cancer Genome Atlas (TCGA) 15 , which collects molecular data sets from exome sequencing, comparative genomic hybridization (CGH) arrays, DNA methylation arrays, RNA sequencing, and reverse protein phase arrays (RPPA) along with clinical information for cancers including LUAD. Thus, TCGA is a fundamental tool for the categorization and further study of the molecular pathogenesis for LUAD. Both the authoritative TCGA and GEO databases are publicly accessible through multiple platforms. Advances in computational tools have facilitated the use of machine-learning processes for analyzing histological data and molecular features, integrating both molecular analysis and visual inspection to enhance diagnostic power [9][10][11][12] . Therefore, machinebased learning approaches are the key development for switching from the original clinical diagnosis methods to a computer-based clinical diagnosis and categorization of tumors, and for optimizing treatment schemes to be ever-more personalized through characterization of an individual's tumor.
In the present study, we downloaded genomic data from the GEO database to detect significant DEGs in LUAD and further validated these DEGs with transcriptomic data and clinicopathological data from TCGA database to investigate correlations between the expression of DEGs, including meiotic nuclear division 1 (MND1), and survival. The information retrieved for the DEGs was applied to construct a protein-protein interaction (PPI) network and conduct an overall survival (OS) analysis. In total, 495 DEGs were identified, of which MND1 was significantly associated with OS and thus might be used as a prognostic biomarker and a molecular curative target for LUAD. The results of the present study provide valuable information for understanding the mechanisms underlying the pathogenesis of LUAD and for the identification of diagnostic and therapeutic targets for LUAD.

Results
Clinicopathological statistics of TCGA LUAD cases. The clinic-pathological information for a total of 486 clinical LUAD samples downloaded from TCGA are listed in Table 1. The available clinic-pathological features for each LAUD case included the patient's age at diagnosis (years), gender, carcinoma stage, and TNM grouping. Overall, 46% of the 486 LUAD cases were male and 54% were female, and their ages ranged from 33 to 88 years (Table 1). Notably, more than half of the LUAD cases (54.8%) were stage I at diagnosis, and only 5.2% were diagnosed at stage IV. The co-expression, genetic, and physical interactions among the 495 DEGs and predicted genes were characterized using STRING to construct the PPI networks. A group of 63 hub genes had a node degree ≥ 10 in the PPI networks of upregulated and downregulated DEGs (Fig. 1A), suggesting that these genes play important roles in the progression of LUAD. The top 10 nodes in the PPI network were CCNA2, TOP2A, CCNB1, CDC20, DLGAP5, MELK, PBK, RRM2, K1F11, and K1F2C. However, for most of these hub genes, either no clinicopathological features were available in TCGA database or the relationship with clinical data was nonsignificant. Meaningful clinical data were available for MND1, which has 17 nodes (Fig. 1B) connected with other genes and participates in several pathways related to lung cancer (Fig. 1C), suggesting that MND1 might be essential in the progression of LUAD. Therefore, MND1 was elected as the target gene for further analysis in the present study.
Expression of the MND1 gene. The differential scatter plot and paired difference analyses of MND1 expression showed a significant difference between 54 normal samples and 497 LUAD samples from TCGA database ( Fig. 2A,B, p < 0.001). Compared with its expression in normal samples, MND1 was significantly upregulated in LUAD samples.

MND1 expression and clinicopathological features. The associations between MND1 expression
level and clinic-pathological data from TCGA were determined by logistic regression analysis (Table 2). When entered as a categorical dependent variable according to a median expression value of 1.83, MND1 expression was negatively related to prognostic clinic-pathologic features. The expression of MND1 increased significantly with advancing LUAD stage (OR = 1.68 for stage III vs. stage I, p = 0.046), suggesting that in patients with the elevated MND1 expression, LUAD was inclined to be diagnosed at a late stage. Age (continuous; OR = 0.37, p < 0.05) also was positively correlated with MND1 expression. No significant association of MND1 expression with distant metastasis (positive vs. negative; OR = 1.73, p = 0.209), lymph node metastasis (positive vs. negative; OR = 1.50, p = 0.098) and gender (male vs. female; OR = 1.34, p = 0.109; Table 2) was observed. The associations of MND1 expression level with survival rate and clinical stage are shown in Fig. 2C,D. LUAD samples were classified into high and low expression sets, and Kaplan-Meier curves based on the MND1 expression level were constructed (Fig. 2C). Log-rank test revealed that higher MND1 expression was significantly associated with poorer OS among LAUD patients compared with lower MND1 expression (p = 0.008). As shown in Fig. 2D, the expression level of MND1 in LUAD subjects with advanced stage disease (stage II-IV) was obviously higher than that in subjects with stage I disease (stage I vs stage II-IV, p = 0.009).
In the univariate Cox regression model, categorical MND1 expression was significantly correlated with OS (HR = 1.52, 95%CI 1.18-1.97, p = 0.001, Table 4), as seen in the Forest plots obtained using the Survminer package in R (Fig. 3). However, clinicopathologic variables including age, gender, stage, and TMN classification were not significantly associated with OS. The results indicated that elevated expression of MND1 is independently correlated with OS in LUAD cases.  Fig. 4 and Tables 5 and 6. The most differentially over-represented signaling pathways in LUAD patients with the elevated MND1 expression included the p53 signaling pathway, pancreatic cancer, small cell lung cancer, bladder cancer, melanoma, and colorectal cancer (Table 5). Signaling pathways related to aldosterone-regulated sodium reabsorption, vascular smooth muscle contraction, peroxisome proliferator-activated receptor (PPAR) signaling pathway, complement and coagulation cascades, drug  www.nature.com/scientificreports/ www.nature.com/scientificreports/ metabolism cytochrome p450, calcium signaling, and gonadotropin-releasing hormone (GNRH) signaling were differentially enriched in LUAD patients with low MND1 expression (Fig. 4, Table 6).

Discussion
LUAD is a highly complex and devastating disease for which the current therapeutic strategies are limited in number and largely ineffective. Consequently, considerable efforts have been made to develop novel and effective molecular-targeted therapeutic schemes. However, the molecular mechanisms underlying the progression and The p53 signaling pathway and pathways associated with pancreatic carcinoma, small cell lung carcinoma, bladder carcinoma, melanoma, and colorectal carcinoma were differentially over-represented in cases with the elevated MND1 expression, whereas the pathways for aldosterone-regulated sodium reabsorption, vascular smooth muscle contraction, PPAR signaling, complement and coagulation cascades, drug metabolism, cytochrome p450, calcium signaling and GNRH signaling were differentially over-represented in cases with low MND1 expression. www.nature.com/scientificreports/ metastasis of LUAD as well as relevant biomarkers have yet to be determined. In the present study, a set of 495 DEGs was discovered from the GEO data and TCGA data, including 145 upregulated and 350 downregulated genes. Among them, the overlapping genes included MND1, and members of the p53 signaling pathway were significantly associated with LUAD, consistent with the results of Gao and Wang 16 and Zhang et al. 17 , who reported several potential roles of MND1, cyclin family members and p53 signaling pathway proteins in LUAD 16,17 . Moreover, these significantly overlapping genes were key hub genes in the PPI network, with significant roles in the prognosis of LUAD, consistent with previous reports [16][17][18] . Among them, MND1 with 17 nodes (Fig. 1B) was predicted to connect with other important genes and pathways related to lung cancer, supporting the results of Dastsooz et al. 19 and Zhang et al. 17 , who demonstrated the key role of MND1 expression and function in the occurrence and progress of carcinomas including LUAD. DNA damage plays a causal role in numerous human pathologies including cancer, premature aging, and chronic inflammatory conditions. DNA repair pathway deficiencies have profound consequences on the signals of the immune system, eventually leading to malignant cancer 20,21 . In the present study, we discovered the association of hub gene MND1 with the occurrence and progression of LUAD. There have been few reports of MND1 expression in lung cancer 17 . MND1 is an intracellular protein that is expressed in the membrane of immune cells and cells of the thymus where it is essential in meiotic homologous chromosome pairing, synapsis and intragenic recombination during meiosis. The differential regulation of MND1 in LAUD patients and healthy controls is likely associated with the meiosis-specific HOP2-MND1 that form an extremely conserved and stable heterodimeric complex 22 essential for homologous recombination in higher eukaryotes [23][24][25] . Homologous recombination repairs damaged chromosomes and mediates pairing of homologous chromosomes, playing a crucial role in maintaining telomere 26 and genome stability 27 . Significantly, dysfunction in HR or its mediators and regulators can result in carcinoma-susceptible human diseases 28 . In the heterodimeric HOP2-MND1 complex, HOP2 acts as the major DNA-binding subunit, whereas MND1 is the important Rad51 interaction entity 22 that modulates ATP and DNA binding by RAD51 to stabilize the RAD51 presynaptic filament and duplex DNA capture for enhancement of synaptic complex constitution. Chi et al. 22 demonstrated stimulation of MND1 in both DMC1-and RAD51-mediated homologous strand assimilation, which is essential for the resolution of meiotic double-strand breaks. Furthermore, Bugreev et al. reported in vitro stimulation of HOP2-MND1 in the DNA strand exchange activities of RAD51 and DMC1 29 , leading to stabilization of the RAD51-single-stranded DNA nucleoprotein filament, the catalytic intermediate in recombination responses. The HOP2-MND1 complex that is predominantly expressed in human fibroblasts and cell lines 30 acts in combination with RAD51 in recombination events to cause telomere lengthening. Alternations of HOP2 were detected in early onset familial breast and ovarian cancer subjects 31,32 , and a single amino acid deletion (Glu201 del) was associated with XX ovarian dysgenesis that is featured by streak ovaries 33 . Disruptions of Hop2 and MND1 gene expression are essential for DMC1 defects in homologous recombination 34 , whereas RAD51 loss is a functioning biomarker of the DNA damage in response to an unfavorable prognostic impact in non-small cell lung carcinoma cases undergoing curative surgical resection 35 . In the present study, the increased expression of MND1 in LUAD patients was associated with advanced clinical stage, short OS time, and poor prognosis, supporting the results of Dastsooz et al. 17 .
Our study suggests that MND1 might be useful as a prognostic biomarker and treatment target for LUAD. Our findings need to be confirmed through further molecular validation studies as well as by clinical observations.
Weighted correlation network analysis (WGCNA) and gene set enrichment analysis (GSEA) have been widely used to identify classes of genes that are over-represented in a large set of genes and may have an association with disease phenotypes 36,37 . WGCNA is a co-expression network model for clustering analysis at the gene level, starting from the level of thousands of genes to determine the gene modules of clinical interest, and finally using the connectivity and gene importance within the modules to identify key genes in the disease pathway for further verification. Compared with GSEA, WGCNA provides more informative but nonsignificantly different results. However, the algorithm in WGCNA is more complicated and cumbersome for identifying modules corresponding to the biological approach. In the present study, we aimed to identify novel genes by performing clustering analysis of the biological functions of HUB genes rather than disease-related genes. Therefore, differential gene expression coupled with GSEA seems to more scientifically and accurately reflect the biological functions of genes. GSEA using TCGA data revealed that a set of important pathways including p53 signaling and pathways associated with malignancies such as pancreatic carcinoma, small cell lung carcinoma, bladder carcinoma, melanoma, and colorectal carcinoma were differentially over-represented in the elevated MND1 expression phenotype, whereas the pathways of aldosterone regulated sodium reabsorption, vascular smooth muscle contraction, PPAR signaling, complement and coagulation cascades, drug metabolism cytochrome P450, calcium signaling, and gonadotropin-releasing hormone (GnRH) signaling were differentially over-represented in the low MND1 expression phenotype. These data suggest that other genes identified by the GSEA as part of the MND1 protein network might play key roles in LUAD. Among the MND1-upregulated pathways, tumor suppressor p53 encoded by the homologous TP53 gene, has been previously proven to be involved in lung cancer, ranking first among all the genes detected in terms of its correlation with various types of human malignancies 38 . TP53 acts to slow down or monitor cell division 39 . Mutant p53 is a result of a TP53 gene alternation and acts as a tumor-promoting factor that functions essentially in the tumorigenesis of lung epithelial cells, resulting in cancer formation or cell transformation and elimination of normal TP53 gene functions 40 . Among the MND1-downregulated pathways, cytochrome p450 is a key enzyme in cancer formation and treatment 41 , serving important metabolic roles in a number of aspects of malignancy as a consequence of unusually broad substrate specificity. Cytochrome p450 is also a prominent player in the metabolism of anticancer therapy drugs to improve or diminish the drug efficacy, depending on whether the drug or its metabolites are effective. The cytochrome expression in lung carcinoma and surrounding tissues could be a crucial determiner of the efficacy of anticancer drugs 41,42 . In the present study, cytochrome p450 was inhibited by MND1, contributing to the longer OS of LUAD cases with low MND1 expression. Previous studies demonstrated correlations between TP53 mutation and poorer prognosis in non-small cell www.nature.com/scientificreports/ lung carcinoma 43 and epidermal growth factor receptor (EGFR)-mutated LUAD [44][45][46] . However, the molecular functions of co-regulations between the targetable driver MND1 and tumor suppressor TP53 and other pathways such as cytochromes in the prognostic outcomes of LUAD patients have yet to be clarified. Multiplex genomic profiling datasets for LUAD patients are now available for machine-based-learning approaches 9,12 and can be used to characterize both targetable driver alterations and tumor suppressor genes or pathways that are potentially significant for the design of therapeutic strategies and as predictive biomarkers for therapeutic efficacy in LUAD.
In conclusion, we discovered that MND1 expression is strongly negatively correlated with OS in LUAD patients. Notably, the effect of MND1 expression on the prognosis of LUAD patients is independent of clinicopathological features. Thus, MND1 can potentially serve as a prognostic biomarker for worse OS of LUAD patients and a target for the design of therapeutic schemes. Moreover, multiple pathways were upregulated or downregulated by MND1 expression during the occurrence and development of LUAD, among which the p53 signaling pathway might be the critical pathway through which MND1 modulated its effect on the OS of LUAD patients. However, our findings based on analyses of TCGA and GEO data in the present study need to be confirmed by analyses of biologically and functionally experimental data. The data regarding drug treatment for LUAD were also not available, limiting the analysis of clinical outcomes. Further in-depth studies are necessary to validate the results of the present study and reveal correlations between the targetable driver MND1 and the significant MND1-mediated pathways, which can then be applied to improve the therapeutic efficacy of treatments and prolong the OS of LUAD patients.

Methods
Data collection and process for differential expression analysis. Gene expression datasets GSE118370, GSE19188 and GSE40791 were downloaded from the GPL570 platform (https:// www. ncbi. nlm. nih. gov/) and merged using Perlscript (ActivePerl-5.26.3.2603); these included data from 121 normal tissues and 118 LUAD tissues. The data were pre-processed for background adjustment and normalization by batch rectification using sva package (version 3.32.1) in R language (version 3.6.0;). The DEGs between LUAD and healthy tissues were analyzed with eBayes-test method of limma package (version 3.40.2) in R language 47 . P values were adjusted using the eBayes test, and an adjusted p value (adj.p) < 0.05 and |log FC|> 2 were used as the cutoff thresholds.
The clinicopathological features and transcriptomic profiles of LUAD cases were downloaded from TCGA (https:// portal. gdc. cancer. gov/). Among 551 cases with transcriptome profiles, only LUAD tissues with full transcriptomic data and survival information were included, resulting in 486 clinical files for further analysis. Sixtyfive samples (54 normal samples and 11 samples with incomplete clinicopathological features) were excluded. The clinicopathological features included pathological stage, age, gender, OS (survival days and survival state), TMN grouping, lymph node status, and distant metastasis status. All samples were tested by Illumina HiSeq 200 RNA Sequencing v.2 analysis. Fragments per kilobase of transcript per million mapped reads (FPKM) was applied as the unit of gene expression for categorization of LUAD cases into high and low expression sets. Variables in these two sets and the prevalence of categorical variables were compared using the Wilcox test. LUAD samples were divided into stage I and stage II-IV, according to the clinical phase at diagnosis.

PPI network construction and hub gene analysis. The Search Tool for the Retrieval of Interacting
Genes (STRING, https:// string-db. org/) is an online database for retrieving interacting genes, including physical and functional correlations 48 . STRING 48 was used to understand the protein-protein interaction by submitting the set of proteins and the respective pathway involved in LUAD was identified by using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (https:// www. kegg. jp/ kegg/ pathw ay. html) 49 . A STRING search was performed to evaluate the interactive associations among DEGs using a confidence score > 0.7 as the cut-off criterion, and significantly differentially expressed genes for the prognosis of LUAD were elected as hub genes (p < 0.05). Cytoscape software (version 3.5.1) 50 was applied to construct PPI networks. The hub genes in the network were identified with cytoHubba (version 0.1) 51 in Cytoscape to characterize crucial factors in LUAD.
Gene set enrichment analysis (GSEA). GSEA is a computational tool that allows the use of a priori gene sets to determine significantly over-represented or down-represented gene groups between two biological phenotypes 52 . The expression level of MND1 with 17 nodes in the PPI network was used as a phenotype label to measure the association between a set of genes and a phenotype in TCGA dataset. GSEA was carried out first to generate a sorted list of all genes based on their association with MND1 expression and then to assess whether survival differed significantly between the high-and low-MND1 expression groups using Java 8 (gsea-3.0.jar vision). Each gene set was repeatedly permutated 1000 times for each analysis. The nominal p value (NOM-p) and normalized enrichment score (NES) were applied to rank the pathways over-represented in each of phenotypes. GSEA enrichment plots were drawn using ggplot2 package in R.
Statistical analysis. All data were statistically analyzed using R (v.3.6.0). The associations of clinic-pathological features, including age, gender, stage, and TMN grouping, with MND1 expression were estimated using the Wilcoxon signed-rank test and logistic regression. The associations of clinic-pathological characteristics with OS were identified using univariate Cox regression and multivariate Cox regression analyses. The multivariate Kaplan-Meier method in the R/Survminer package (version 0.4.4) was applied to generate the Kaplan-Meier survival plot. Multivariate Cox analysis was performed to determine the comparative effects of MND1 expression on survival among subgroups with different clinical parameters 53 : stage, lymph node status, distant metastasis status, age, and gender, using the median value of MND1 expression as the cut-off criteria. A hazard ratio (HR) based on the Cox PH model and the corresponding 95% confidence interval (95% CI) were estimated. www.nature.com/scientificreports/

Data availability
The datasets generated and analyzed during the present study are available from the corresponding author upon reasonable request. www.nature.com/scientificreports/