Network models of prostate cancer immune microenvironments identify ROMO1 as heterogeneity and prognostic marker

Prostate cancer (PCa) is the fifth leading cause of death from cancer in men worldwide. Its treatment remains challenging due to the heterogeneity of the tumor, mainly because of the lack of effective and targeted prognostic markers at the system biology level. First, the data were retrieved from TCGA dataset, and valid samples were obtained by consistent clustering and principal component analysis; next, key genes were analyzed for prognosis of PCa using WGCNA, MEGENA, and LASSO Cox regression model analysis, while key genes were screened based on disease-free survival significance. Finally, TIMER data were selected to explore the relationship between genes and tumor immune infiltration, and GSCAlite was used to explore the small-molecule targeted drugs that act with them. Here, we used tumor subtype analysis and an energetic co-expression network algorithm of WGCNA and MEGENA to identify a signal dominated by the ROMO1 to predict PCa prognosis. Cox regression analysis of ROMO1 was an independent influence, and the prognostic value of this biomarker was validated in the training set, the validated data itself, and external data, respectively. This biomarker correlates with tumor immune infiltration and has a high degree of infiltration, poor prognosis, and strong correlation with CD8+T cells. Gene function annotation and other analyses also implied a potential molecular mechanism for ROMO1. In conclusion, we putative ROMO1 as a portal key prognostic gene for the diagnosis and prognosis of PCa, which provides new insights into the diagnosis and treatment of PCa.

1. The gene correlation matrix was transformed into a scale-free network. 2. Unsupervised hierarchical clustering was performed using a dynamic tree chopping algorithm, and the clustered tree branches were defined as modules. 3. The correlation between each module and clinical traits was calculated, and the modules with higher correlations were selected.
Subsequently, the hub genes in the selected modules were ranked by intra-module connectivity and correlation with module trait genes, and finally, candidate genes were identified.
MEGENA is an innovative co-expression network analysis method that offers unique advantages over WGCNA for efficiently constructing large-scale co-expression planar filter networks and preserving gene interactions 53 . The R software package MEGENA (v. 1.3.7) is used to perform MEGENA, which consists of the following steps: (1) constructing a planar filtered network (PFN); firstly, calculating correlation coefficients based on gene expression profiles, and then filtering and clustering gene pairs using a parallel filtering method to obtain a fast planar filter network; (2) multi-scale clustering analysis; from the initial PFN of the connected components, multi-scale clustering of each parent cluster can obtain more sub-modules, followed by hierarchical clustering results; (3) downstream analysis, using multiscale hub analysis (MHA) to identify important hubs based on the network topology; (4) Finally, the correlation between clustering results and clinical information was analyzed by cluster-trait association analysis (CTA). To test whether selected modules in WGCN and multiscale hub genes in MEGCN were highly associated with tumorigenesis and malignant progression, we performed GO and KEGG pathway 54 analysis on all genes in the selected DAVID (https:// david. ncifc rf. gov/) database 55 for hub modules. P ≤ 0.05 was considered as the cut-off criterion for identifying enrichment. Similarly, an R package heatmap is used in heatmap figure. And two-way hierarchical clustering is used for clustering.
Validation and survival analysis of pivotal genes. In WGCNA, when hub modules were identified, we selected candidate hub genes by module connectivity (cor. gene module-Membership (MM) > 0.8) and clinical feature correlation (cor.geneTraitSignificance (GS) > 0.2), both by Pearson correlation absolute values were used to determine. At the same time, we chose the STRING database (https:// www. string-db. org/) 56 to identify the connectivity of candidate genes. Next, Cytoscape's plug-in CytoHubba was used to identify top50 genes as candidate genes 57 . The gene modules from the WGCNA and MEGENA analyses are saved separately. The userListEnrichment R function was used to calculate the overlap of genes between the WGCNA and MEGENA modules. The WGCNA and MEGENA modules that overlapped significantly and were significantly associated with PCa were retained. In this study, we used the Gene Expression Profiling Interactive Analysis (GEPIA) dataset (http:// gepia. cancer-pku. cn/) for prognostic analysis of PCa 58 . We analyzed the differential expression of key hub genes in PCa and their association with RFS using GEPIA. P-value and fold change were defined as 0.05 and group limit of 50% for two survival analyses. Also, normal distribution tests and differential analysis of hub genes in normal versus tumor tissue and normal versus different tumor subtypes were performed based on the Wilcoxon test. Finally, Human Protein Atlas Dataset (https:// www. prote inatl as. org/) 59  www.nature.com/scientificreports/ Development and validation of the hub gene prognostic model. To assess the prognostic value of hub genes, Cox regression analysis was used to evaluate the correlation between genes and survival status in a cohort of 494 PCas. Next, we chose the glmnet R language package for LASSO Cox regression modeling to narrow down the candidate genes and build prognostic models 60 . Ultimately, the screening retained three genes and their coefficients, and the penalty parameter λ was determined based on the minimum criterion. The algorithm obtained a more accurate model by constructing a penalty function. As a method for complex covariance data, variable selection can be achieved and parameter estimation to better address multiple covariances in regression analysis. The PCa expression data were centrally normalized using a scale function, and a risk score was calculated.
Patients with PCa OS were divided into low and high-risk subgroups based on median risk scores. Kaplan Meier analysis was performed to compare OS time assessment risk scores and overall survival between the two subgroups. To validate the accuracy and predictive power of the model, ROC curve analyses were performed using the survivor, survminer, and timeROC R packages for one year, three years, and five years, respectively, to calculate the area under The curve (AUC) and to compare the effect of the timeROC R package on classifier performance.

Relationship between key genes and tumor-infiltrating immune cells. The Tumor Immunity
Evaluation Resource (TIMER) (https:// cistr ome. shiny apps. io/ timer/) is a comprehensive resource for detecting immune cell infiltration in tumor tissue using RNA-Seq expression profiling data and contains immune infiltration in different cancer types 61 . Hub gene expression was correlated with six types of immune infiltration (B cells, CD4 + T cells, CD8 + T cells, neutrophils, macrophages, and dendritic cells) were correlated and assessed using the Gene module. The SCNA module was also used to explore the correlation between somatic cell copy number changes and the abundance of immune infiltrates.
Exploring the drug sensitivity of the hub gene. The GSCAlite (http:// bioin fo. life. hust. edu. cn/ web/ GSCAL ite/) database is a genomic cancer analysis platform 62 . The database can be used for genomic and immunogenomic analyses. It also enables researchers to combine clinical information and small molecule drugs to mine candidate biomarkers and valuable small molecule drugs for better experimental design and further clinical trials. The GSCA data contains 33 cancer types from TCGA and normal tissue data from GTEx, and over 750 small-molecule drugs from GDSC and CTRP for 10,000 genomic data. Spearman correlations represent gene expression associated with drugs. A positive correlation means that a gene with high expression is resistant to the drug and vice versa.
Statistical analysis. R 63 (version 3.6.2) and related software packages were an application for all the statistical analyses. P < 0.05 is a statistically significant difference.

Result
Combined consistency clustering and principal component analysis to obtain sample cohorts. The analytical process used in this study is shown in Fig. S1. To follow up on the molecular heterogeneity of PCa, we performed an unsupervised consensus analysis on all samples. In this study, we chose k = 5, which allowed us to divide all samples into five groups (Fig. S2). Among them, C1:59, C2:146, C3:129, C4:102, and C5:114. Moreover, we found that C1-C4 were tumor subgroups, and C5 were normal samples ( Fig. 1A-C). Next, to verify the robustness of this classification, we performed another principal component analysis based on the results of consistent clustering and observed the subgroup differentiation. PC1 and PC2 were selected as the main components for the analysis, and 260 samples with significant differences were obtained after excluding outlier samples as well as those with insignificant differences ( Table 1). As shown in Fig. 1D, there was an individual crossover between tumor subgroups, indicating good differentiation between subgroups. The clustering results were as follows: C1:41, C2:68, C3:72, C4:51, and C5:28, reflecting the impact of differences between tumor subgroups and normal samples on transcriptional profiles.

Identification of differentially expressed genes and GO functional annotation in PCa.
We performed differential gene expression analysis of the four tumor subgroups separately from normal samples. A total of 3521 differentially expressed genes were obtained. Of these, 581 up-regulated and 1432 down-regulated genes were obtained for C1 vs. C5, 231 up-regulated and 637 down-regulated genes were obtained for C2 vs. C5, 234 up-regulated and 684 down-regulated genes were obtained for C3 vs. C5, and 282 up-regulated and 2265 down-regulated genes were obtained for C4 vs. C5 ( Fig. 2A). Comparisons between tumor subgroups revealed 842 DEGs specific to C1, 47 DEGs specific to C2, 29 DEGs specific to C3, and 1045 DEGs specific to C4 (Fig. 2B). Three hundred ninety-six overlap genes were also identified, suggesting a degree of similarity in expression profiles between tumor subgroups but also a high degree of heterogeneity between tumor subgroups (Fig. 2C).
Functional enrichment analysis of four groups of differentially expressed genes. The researchers used the Metascape website to carry out a biological process analysis of the GO function of 3521 DEGs (Fig. S3). As shown in Fig. 3A, there was a strong interaction between the four groups of differentially expressed genes. Also, DEGs were found to be mainly enriched in functions such as leukocyte migration, inflammatory response, regulation of MAPK cascade, circulatory system processes cell component morphogenesis, and regulation of cell adhesion (Table 2). Moreover, Metascape enrichment analysis also showed that DEGs differed significantly between the four groups in the pathways regulating cell adhesion and leukocyte migration, demon- www.nature.com/scientificreports/ strating genetic and functional differences in tumor subtypes ( Fig. 3B-E). Naturally, to ensure that the differential analysis did not filter out key genes, we also performed GSEA analysis on all genes and found that they were mainly enriched in P53, DNA repair, Notch signaling pathway, etc. (Fig. S4).

Co-expression network analysis reveals key modules of PCa network.
We used the expression profiles and clinical information of 260 samples from PCa as input files for WGCNA by integrating them. We eliminated two outlier samples based on the clustering tree and then plotted the sample dendrograms and expression heat maps of the traits (Fig. 4A). After excluding the two outlier samples, the remaining 258 samples and 3521 differentially expressed genes were subjected to WGCNA analysis (Fig. S5). We chose a soft threshold power of 8 (scale-free R 2 = 0.85) by calculation, ensuring that the scale-free network was reasonable (Fig. 4B). We built the network by choosing the soft threshold. And the minimum number of genes in the modules was required to be 50. Subsequently, the initial modules were divided using a dynamic tree, and the genes were divided into modules based on the similarity of the characteristic genes in the gene clustering map, and ten gene modules were finally identified (Fig. 4C). One color indicates a gene module, while grey gene modules are considered invalid genes that cannot be assigned to a module. Next, we created a heat map of the correlation between gene modules and clinical information (Fig. 4D). We found that the blue module was found to have the highest correlation coefficient with consensus clustering (Pearson cor = 0.87, p = 1e−200) (Fig. 4E). This suggests www.nature.com/scientificreports/ that the genes in this module correlate with tumor typing as well as malignancy; meanwhile, we found a large correlation coefficient between the green module and tumorigenesis (Pearson cor = 0.57, p= 3e−08) (Fig. 4F).
In contrast, the absolute values of the correlation coefficients between genes and age, recurrence status, and gleason_score were smaller in the other modules, suggesting that the genes are less relevant to other clinical information. Finally, we calculated the connectivity of the genes within the blue (Fig. 5A) and green modules (Fig. 5B) and selected the top50 most closely connected genes for subsequent analysis. Meanwhile, the expression profiles of 3521 differentially expressed genes (DEGs) were applied to the multiscale embedded gene co-expression network (MEGENA) analysis. After parallelization, early termination, and pre-quality control steps, 10,333 gene pairs of PFNs were used to enter 1755 MCA for multiscale cluster identification. A total of 5 different scales were identified by multi-scale cluster analysis, and 226 significant modular clusters were constructed (Fig. S6), where the modular p-values were all less than 0.05.
Comparing the significant modules of WGCNA with the MEGENA modules, we found a degree of overlap between the key gene modules of WGCNA and those of MEGENA. We found a large degree of overlap between the blue module and green module of WGCNA and the C1_5 module (Fig. 6A) and C1_7 (Fig. 6B) module of MEGENA. We also annotated the genes of the four modules and found that the four groups of genes are mainly involved in the cell cycle, cell division, and transcriptional functions (Fig. S7). WGCNA and MEGENA can complement each other and help us to preserve the maximum range of gene modules specific to PCa. We found that the expression profiles of this class of overlapping genes differed greatly between tumor and normal adjacent tissue, and some of the genes also differed greatly between different subtypes of tumors (Fig. 7A, B). Further, the expression of NDUFB7, AURKAIP1, ROMO1, SCAND1, GADD45GIP1, and NDUFA13 was upregulated in the four tumor subtypes compared with normal adjacent tissue (Fig. 7C). Next, we annotated the GO_BP function of 29 overlapping genes based on the DAVID database ( Table 3) and found that these genes are mainly involved in cell division, cell cycle, mitosis, and positive regulation of ubiquitin-protein ligase activity (Fig. 7D).
Validation of key candidate genes. We first performed a disease-free survival analysis of key genes using the GEPIA database to explore the relationship between genes and phenotype and prognostic status. We selected median as a criterion to differentiate gene expression levels and found that TPX2, ROMO1, PLK1, UBE2C, and KIF4A were considered hub genes with the most significant p-values selected from the 29 core genes (Fig. 8A). www.nature.com/scientificreports/ We then perform a Wilcoxon test on the five hub genes screened, and we found significant differences (P < 0.05) in the expression of the five genes between tumor and normal adjacent tissue (Fig. 8B). Meanwhile, we further analyzed the expression differences of the five hub genes among different tumor subtypes, and we found that these genes also differed significantly among tumor subtypes of PCa (Fig. 8C). Moreover, we found that ROMO1 was highly expressed in both tumor and normal tissues after immunohistochemistry (Fig. 8D). Meanwhile, we incorporated PCa data from GETx that found significant upregulation of ROMO1 in tumor tissues from GEPIA database (Fig. S8).

Construction and validation of prognostic gene markers in TCGA .
The data for 497 tumors were obtained from The Cancer Genome Atlas (TCGA) dataset, and corresponding clinical information was incorporated into the prognostic training model. The expression data of five genes, TPX2, ROMO1, PLK1, UBE2C, and KIF4A, were pooled, and LASSO Cox was applied to construct a regression model. The model identified ROMO1, PLK1, and KIF4A5 based on the optimal value of λ to construct an OS prognostic model for PCa patients (Fig. 9A, B). The risk scores, patient survival status distributions, and gene expression profiles associated with the three genetic features in the training dataset are shown in Fig. 9C. OS was significantly lower in the high-risk group than in the low-risk group (P = 0.0021) (Fig. 9D). Finally, ROC analysis was performed, and the area under the ROC curve (AUC) for overall survival (OS) at 1, 3, and 5 years was 0.789 and 0.72 and 0.626, respectively (Fig. 9E). In a subsequent study, we also did a LASSO regression analysis on ROMO1 and found that ROMO1 could also be used as an independent prognostic model and that its low-risk group had a better survival outcome than the high-risk group (Fig. S9).

Alterations in the immune microenvironment may result from aberrant regulation of key genes in PCa.
To explore the relationship between hub genes expression and tumor-infiltrating immune cells, we used the TIMER tool to analyze the correlation between key gene expression and the level of immune infiltration. Our results showed that ROMO1 expression was strongly correlated with immunity to PCa and was found to be negatively correlated with B cells, CD8 + T cells, macrophages, neutrophils, dendritic cells and posi- www.nature.com/scientificreports/ tively correlated with CD4 + T cells; PLK1 and KLF4A expression was positively correlated with B cells, CD8 + T cells, macrophages, neutrophils, dendritic cells ( Fig. 10 A-C). In addition, we found that different mutant forms www.nature.com/scientificreports/ of ROMO1 correlated with immune infiltration of CD8 + T cells, macrophages, and neutrophils; different mutant forms of KLF4A correlated with immune levels of CD8 + T cells, macrophages, neutrophils, and dendritic cells; and we found that ROMO1, PLK1, and KLF4A were found to have higher levels of immune infiltration in dendritic cells (Fig. 10 D-F). This also reveals the influence of key genes on the immune microenvironment of PCa.
Drug sensitivity and resistance of key genes. To investigate the relationship between the expression of hub genes and drug interactions, we used the GSCA database to screen for drugs that interacted with it. The GSCA database integrated drug sensitivity and gene expression profile data for cancer cell lines from both the GDSC and CTRP databases into GSCALite, and then, using Spearman correlation analysis, the expression levels of genes were correlated with the IC50. FDR < 0.05 was considered significant (Supplementary Table 1. During the research, we found that ROMO1 expression was significantly positively correlated with nutlin-3, fluorouracil, etc. KIF4A expression was significantly positively correlated with PD318088 and selumetinib, and negatively correlated with ML239; PLK1 expression was significantly negatively correlated with STF-31 was significantly negatively correlated (Fig. 11). This reveals a diversity of expression levels of key genes and drug sensitivity.

Discussion
In recent years, PCa has become the most common and deadly solid cancer and genitourinary tumor in men worldwide 64 , with a diagnosis rate of 12% and a mortality rate of 9%, and its incidence rises with age 65 . The traditional clinical treatment options include resection, radiotherapy, chemotherapy, and endocrine therapy 66 . The tumor heterogeneity leads to limitations in conventional treatment and makes it difficult to manage and risk assess patients effectively 67,68 . Recent studies have demonstrated the imperative of identifying new effective biomarkers and promising immune-related therapeutic targets that can be used to guide cancer treatment in clinical practice 69 . In this study, we found a high correlation between ROMO1 expression and clinical features such as the occurrence of PCa, subtypes classified by consistent clustering. Also, ROMO1 correlated with the level of immune cell infiltration and immune pathways in PCa. Firstly, many factors can contribute to cancer development, such as dietary status, genetic mutations, epigenetic alterations, etc. We consequently identified 260 valid samples comprising a normal group and four tumor sub-types by consensus clustering and principal component analysis after outliers were excluded. Secondly, the limmaR package was selected to differentially analyze the four groups of tumor subtypes from the normal group, respectively. Finally, 3521 DEGs were identified and functionally annotated, and these differentially expressed genes were found to be mainly enriched in ah immune-related pathways. Next, WGCNA and MEGENA were applied to construct co-expression networks of differentially expressed genes to identify gene modules associated with tumorigenesis and heterogeneity, and 29 overlapping genes were identified by calculation. Then, the Lasso Cox regression model was constructed to identify ROMO1, PLK1, and KIF4A5 as the optimal core genes. Subsequently, when analyzing the relationship between the expression of hub genes and tumor-infiltrating immune cells, it was found that the expression of ROMO1, PLK1, and KIF4A were all associated with tumor-infiltrating immune cells, which in turn led to the alteration of the tumor microenvironment alterations, which in turn lead to tumor heterogeneity. Finally, we found that nutlin-3, fluorouracil, and others could be potential therapeutic  www.nature.com/scientificreports/ agents for ROMO1, and PD318088 and selumetinib could be potential therapeutic agents for PLK1. In conclusion, our study provides a new theoretical basis for the diagnosis and treatment of PCa. Polo-like kinase 1 (PLK1) is a member of a family of serine/threonine protein kinases that are widely found in eukaryotic cells 70 . Its specific functions are mainly in cell cycle processes, including controlling mitotic entry and the G2/M checkpoint, coordinating centrosomes and the cell cycle, regulating spindle assembly and chromosome segregation, performing multiple functions during mid-spindle and abscission, promoting DNA replication, participating in the cytoplasmic division and meiosis, and playing an important role in the initiation, maintenance, and completion of mitosis [71][72][73] . The pharmacological inhibition of PLK1 in triple-negative breast cancer has been reported to increase the anti-proliferative activity of drug-resistant cells, which in turn causes G2/M phase block and increases the phosphorylation of cell cycle proteins inducing apoptosis 74 . Also, in PCa, mitotic kinase polo-like kinase 1 (PLK1) is expressed at elevated levels and is associated with tumor grade 75 . In the present study, we also found a significant prognostic effect of PLK1, with expression in different subtypes of the prostate gland also differing significantly from normal samples, and found that PLK1 expression was also associated with tumor-infiltrating immune cells, further demonstrating the reliability of PLK1 as a biomarker for PCa.      76 . KIF4A has important roles in DNA repair, DNA replication, spindle organization, cytoplasmic division, and intracellular transport 77 . KIF4A has been previously reported to be aberrantly expressed in many cancers, revealing its function and role in different tumors [78][79][80] . KIF4A can promote PCa cell growth through AR and AR-V7-dependent signaling 78 . In the present study, KIF4A expression in different subtypes of www.nature.com/scientificreports/ the prostate was also significantly different from normal samples, and KIF4A expression was also found to be associated with tumor-infiltrating immune cells.
Studies have shown that ROMO1 is overexpressed in hepatocellular carcinoma, colorectal cancer, and glioma 81-83 but has not been reported in PCa. In the present study, we selected the Wilcoxon test and found that ROMO1 was highly expressed in tumor tissue and significantly different from normal tissue; we also found that the four identified tumor subtypes were significantly different. The expression of ROMO1 was also found to be associated with tumor-infiltrating immune cells, leading to changes in the tumor microenvironment and further increasing tumor heterogeneity; drug sensitivity analysis revealed that nutlin-3 and fluorouracil could be used as potential therapeutic agents for ROMO1.
In general, tumor pathogenesis involves many interacting signaling pathways, including tumor cell proliferation, cell immortalization, invasion, and migration, etc. 84 . The complexity of cancer can be reflected through the tumor microenvironment, while protein interactions can further increase heterogeneity between tumors 85,86 . In the present study, we have used weighted gene co-expression network analysis (WGCNA), which classifies the gene co-expression network of PCa into ten highly correlated signature modules. The modules were then correlated with specific clinical features to identify genes that are key to tumorigenesis and transformation, to help identify potential mechanisms involved, and to explore candidate biomarkers. However, WGCNA has the limitation of not being able to coexist at different levels of clustering within a single network, thus not reflecting the multi-scale hierarchical nature of complex networks. Multi-scale embedded gene co-expression network analysis (MEGENA), on the other hand, allows the construction and analysis of large-scale planar filtered co-expression networks to the greatest extent possible 53 . Parallelization of embedded network construction and the identification of multiscale clustering structures are two key components of MEGENA, which is an essential complement to existing co-expression network analysis methods by identifying multiscale modular systems and co-expression networks with varying degrees of sparse and tight connectivity. Here, by using WGCNA in conjunction with MEGENA to construct a gene co-expression network for PCa, we identified more meaningful clusters of coexpressed genes and identified key biomarkers associated with prostate carcinogenesis in transformation.
In this study, we combined various bioinformatic analysis methods, especially the introduction of weighted gene co-expression network analysis and multi-scale chimeric network analysis, to reveal that ROMO1 may serve as a new key prognostic marker for PCa. However, the article still has some limitations. Firstly, the role of ROMO1 has not been validated experimentally in vivo and in vitro, and secondly, the fact that the number of cancer samples and normal samples are not identical to each other has led to some preference in our data. We believe that if we had direct access to a larger sample of clinical sequencing data and sample information, we would obtain better and more accurate results.

Conclusions
In conclusion, our systematic analysis of the TCGA database supported by array-based and sequence-based PCa data has identified ROMO1, a key gene closely associated with the PCa tumor microenvironment, and the essential signaling pathways involved. For this reason, we propose that ROMO1 may serve as a potential biomarker and therapeutic target for PCa. It may provide new theoretical inferences for the diagnosis and prognosis of PCa.

Data availability
All data are available. Please contact us to access if it is needed.