Identification and validation of tumor microenvironment-related signature for predicting prognosis and immunotherapy response in patients with lung adenocarcinoma

Mounting evidence has found that tumor microenvironment (TME) plays an important role in the tumor progression of lung adenocarcinoma (LUAD). However, the roles of tumor microenvironment-related genes in immunotherapy and clinical outcomes remain unclear. In this study, 6 TME-related genes (PLK1, LDHA, FURIN, FSCN1, RAB27B, and MS4A1) were identified to construct the prognostic model. The established risk scores were able to predict outcomes at 1, 3, and 5 years with greater accuracy than previously known models. Moreover, the risk score was closely associated with immune cell infiltration and the immunoregulatory genes including T cell exhaustion markers. In conclusion, the TME risk score can function as an independent prognostic biomarker and a predictor for evaluating immunotherapy response in LUAD patients, which provides recommendations for improving patients’ response to immunotherapy and promoting personalized tumor immunotherapy in the future.

Statistical analysis.The Kruskal-Wallis test was used to compare more than two groups, while the Wilcoxon test was used to compare two groups.It was decided to evaluate the survival curves using the Kaplan-Meier log-rank test.The chi-square test was performed to examine the relationships between risk groupings and somatic mutational burden, and Spearman analysis was utilized to determine correlation coefficients.The CIBERSORT algorithm was used to further examine the results (p < 0.05).Statistics were regarded as significant when a two-sided p < 0.05 was reached.Utilizing the software R, all statistical analyses were completed (version 4.1.1).

Identification of DEGs between normal lung and LUAD tissues.
The flow chart describing our study is shown in Supplementary Fig. S1.From the data obtained from the TCGA-LUAD, we compared the levels of gene expression in 59 normal tissues with those in 504 malignant tissues.A total of 203 genes associated with TME were analyzed and compared.The batch effect was removed from the data of the TCGA database, and the data were normalized.There were 1251 TME-associated genes (FDR < 0.05 and |log2FC|> 1) differentiated in expression between LUAD and normal lung tissues (Table S2).Among the TME-related DEGs, 920 genes were found to be significantly up-regulated, while 331 genes were found to be significantly down-regulated.The top 50 genes in terms of differential expression that were either up-regulated or down-regulated were plotted as volcanoes (Fig. 1A).
NMF algorithm-based molecular subtypes of OS-related genes.Covariance and RSS were used to develop C1 and C2, and k = 2 was determined to be the optimal value for clustering based on its stability and performance (Fig. 1B-E).According to the findings of the Kaplan-Meier curve, samples that fell into the C1 cluster enjoyed longer OS and PFS than those samples that fell into the C2 cluster (Fig. 1F-H).There were significant differences between C1 and C2 in immune and stromal scores (Fig. 2A-G).The above results showed that OSrelated genes can be clustered, and these clusters were related to a variety of immune cell clusters.
Developing and validating a prognostic prediction model for OS-related genes.In the expression matrix of the whole TCGA-LUAD samples with 1251 genes, 203 prognostic TME-related genes were obtained by univariate Cox analysis (Table S3).We applied LASSO and Cox regression analyses to further narrow the range of prognostic genes, and finally, constructed a TME-related gene signature involving 6 genes in the training cohort.The results of the LASSO analysis were shown in Fig. 3A,B  database was used to explore protein expression levels in LUAD samples.We extracted IHC from the HPA database.The results showed PLK1, LDHA, FURIN, FSCN1, and RAB27B were up-regulated in lung cancer tissues, and MS4A1 was down-regulated in lung cancer tissues (Supplementary Fig. S2A,B).Furthermore, we also found that PLK1, LDHA, FURIN, FSCN1, and RAB27B were increased in NSCLC cancerous cell lines compared with that in normal human bronchial epithelium cell line BEAS-2B, MS4A1 was down-regulated in lung cancer cells lines (Supplementary Fig. S2C,D).Using the median risk score in the training cohort as the cutoff point, the patients were divided into low-risk and high-risk groups.KM survival analysis indicated that the low-risk group had better overall survival than the high-risk group (P < 0.001, Fig. 3C,D).There was a significant difference in OS between the two patient groups that had high and low expression of the six genes (Fig. 3E-J).show the expression patterns of the six genes in the TCGA-LUAD cohort, the distribution of sample survival status, and corresponding risk scores.Both internal validations with the TCGA-LUAD cohort and external validation with the GSE68571 cohort (Fig. 4D-F) demonstrated a stable and robust prognostic value for this risk prognostic feature (Fig. 4D-F).We further assess the predictive value of the TME-related gene signature in the external test cohort.The results from the above data (GSE13213/ GSE50081 and GSE10072 cohort) shared the same trend in survival, respectively (Supplementary Fig. S3A).

Construction of prognostic nomogram.
The area under the ROC curve was 0.741 for a 1-year overall survival rate, 0.722 for a 3-year overall survival rate, and 0.657 for a 5-year overall survival rate (Fig. 5A).We integrated gender, tumor grade, T, N, and M stages for AUC analysis of 1-year (Fig. 5B), 3-year (Fig. 5C), and 5-year (Fig. 5D) OS to further confirm that risk score can be a prognostic signal among numerous clinicopathological characteristics.This was done to further confirm that the risk score can be a prognostic signal among numerous clinicopathological characteristics.When compared to the AUC value the risk score and clinicopathological features were all less significant.There was a significant difference between the univariate and multivariate Cox regression results for overall survival, with the difference being significant (Fig. 5E,F).To evaluate the significance of model prediction, we developed a predictive nomogram consisting of risk score and clinicopathological features (Fig. 5G).Constructing the calibration curve allowed for the evaluation of the nomogram's predictive performance, which may result in an underestimate of the 5-year survival rate (Fig. 5H).www.nature.com/scientificreports/Functional analysis of OS-related genes.The median expression of the hub genes was chosen as the cut-off value, and all samples were divided into low-expression and high-expression groups based on where they fell on that scale.GSEA later uncovered a functional enrichment of genes.According to the findings of an analysis conducted by KEGG, a high level of MS4A1 expression is connected to certain signaling pathways, such as the cytokine-cytokine receptor interaction circuit and the chemokine signaling network.There was a correlation between the increased expression of LDHA and the cytosolic DNA sensing pathway, in addition to other signaling pathways.Two distinct signaling routes were found to be connected to the elevated levels of PLK1 expression.These pathways were the chemokine signaling pathway and the system that involves the interaction of cytokines with their respective receptors.The enhanced expression of RAB27B was found to be related to the activation of certain signaling pathways, including cytochrome P450-mediated xenobiotic metabolism and retinol metabolism (Fig. 6A-F).

Association of risk characteristics with clinicopathological variables.
To explore the correlation between risk and clinicopathological variables, we visualized a plot based on clinicopathological features (Fig. 7A-F).There was a difference in gender, stage, T stage, M stage, and N stage.The results were consistent with clinical practice.

Association of risk signature with TMB.
To study the connection between risk score and gene mutations, we first determined how gene changes were distributed between high-risk (HRG) and low-risk (LRG) groups, and then we visualized the top 20 gene waterfalls that had the most common somatic mutations (Fig. 8A,B).Significantly, mutated gene mutation profiles revealed that TP53 had a higher rate of somatic mutations gene (SMG) in the high-risk group (50% vs 37%), whereas KEAP1 had a higher rate of somatic mutation in the low-risk group (18% vs 16%).This difference was because TP53 is associated with a higher risk of developing cancer overall.We found that HRG had a higher tumor mutational burden (TMB) (Fig. 8C).The Risk scores (RS) and TMB of each sample are shown in Fig. 8D.According to the findings of the Kaplan-Meier analysis, overall survival of high and low levels of TMB does not no any difference ( Fig. 8E).Patients were categorized into the following four categories according to their TMB risk.The plot implied that TMB did not impair the risk score's prognostic ability (p < 0.001, Fig. 8F).The aforesaid results proved that risk scores can predict the effect of immunotherapy and were considered a prognostic indicator.
Risk signature in TIME context.We used 7 methods to investigate the connection between risk score and immune infiltration as well as stromal cells to gain a deeper understanding of the connection between risk score and TIME (Fig. 9A).The XCELL study found that the microenvironment score increased in direct proportion to the level of LDHA and MS4A1 expression (Fig. 9B,C).A strong upward trend in high-risk scores was seen when utilizing the ESTIMATE method, and there were significant differences between high-risk and low-risk groups in the ESTIMATE scores, immunological scores, and stromal scores (Fig. 9D).

Biological function and signal pathway enrichment analysis.
According to the findings of the GSVA investigation, the individuals who belonged to the high-risk group exhibited increased apical junction and epithelial-mesenchymal transition, in addition to the activated MAPK pathway and chemokine signaling pathway (Fig. 10A,B).

Sensitivity of immunotherapy and chemotherapy prediction.
We also try to predict the sensitivity of immunotherapy.47 genes related to checkpoint blockade were retrieved (Table S4).The plot suggested that many check-point blockade-related genes (TNFSF4, PDCD1LG2, LAG3, CTLA4, and CD86) were positively correlated with risk scores (Fig. 10C).According to the risk assessment system, LRG has a high IPS score (PD1negative, CTLA4-negative(Fig.10D).In addition to this, we evaluated the efficacy of LUAD in combination with standard chemotherapy drugs such as cisplatin, docetaxel, and methotrexate.The newly identified indicators, such as IPS, TIDE score, and predicted neoantigens, were used to evaluate the possibility of immunotherapy response in TCGA-LUAD samples.Accordingly, we found that the PD1 and PD-L1 were significantly elevated in the high-risk group, and LAG3 was significantly decreased in the high-risk group (Supplementary Fig. S3B).Through analysis, we found that the IC50 of the three chemotherapeutic drugs (cisplatin, gemcitabine, and gefitinib) showed significant differences in HRG/LRG.The drug sensitivities of cisplatin (Fig. 10E) and gemcitabine www.nature.com/scientificreports/(Fig. 10F) were higher in HRG than in LRG, whereas gefitinib (Fig. 10G) had higher drug sensitivities in LRG.These results suggest a potential link between the risk signature and chemotherapeutic drug sensitivity.

Discussion
Lung cancer is a malignant tumor of the respiratory system with a high incidence, and it has the highest mortality rate among both men and women, non-small cell lung cancer (NSCLC) accounts for 80% of all lung cancer pathology types, and half of NSCLC is LUAD 18 .Targeting TME could assist traditional therapies and improve clinical responses in numerous cancer types 19 .Growing evidence has shown that TME has a significant impact on the growth and development of cancerous cells, and affects the response to immunotherapy.In this research, samples of lung adenocarcinoma were interrogated with TME-related genes, among which high expression of PLK1, LDHA, FURIN, FSCN1, and RAB27B was correlated with poor OS, while high expression of MS4A1 was correlated with longer OS compared with the MS4A1 low expression.Researchers have demonstrated that PLK1 promotes the development of Kras/ Tp53-mutant lung adenocarcinoma through transcriptional activation of the receptor RET 20 .High expression of LDHA regulators contributes to the tumor microenvironment and predicts prognosis in lung adenocarcinoma 21 .FURIN correlated with immune infiltration serves as a potential biomarker in SARS-CoV-2 infection-related lung adenocarcinoma 22 .FSCN1-induced PTPRF-dependent tumor microenvironment inflammatory reprogramming promotes lung adenocarcinoma progression via regulating macrophagic glycolysis 23 .Rab27b Is a potential indicator for lymph node metastasis  and unfavorable prognosis in lung adenocarcinoma 24 .MS4A1 dysregulation in asbestos-related lung squamous cell carcinoma is due to CD20 stromal lymphocyte expression 25 .The role of MS4A1 in tumor progression is still in the initial stage of research and is worthy of further exploration.In line with previous research, our study found that high TME risk was identified as a poor factor for OS in TCGA and GEO datasets 16 .In the field of lung cancer, the genes in our six-gene signature have not been extensively studied.However, the six-gene signature has a significant role in predicting and diagnosing LUAD in our research, indicating it or each gene in it may be the potential specific directions for future research on LUAD.Recently, we notice that there is another prognostic model built by an immune gene in LUAD 17 .however, when we compared our risk model with other prognosis risk signatures, the C-index demonstrated the highest AUC of our model.These results indicate that the overall performance of our proposed model is superior to others.
To explore the biological function of these six TME-related genes, we performed enrichment analysis and found that in lung adenocarcinoma, TME was most related to the MAPK pathway and chemokine signaling pathway.Chemotherapy is the main adjuvant treatment for lung urothelial carcinoma, and immune checkpoint inhibitors are the most promising new treatment.The immune checkpoints PD-L1/CD274, CTLA4, and B7-H3/ CD276 were significantly correlated with the risk scores.The combination of CTLA4 antibody, tremelimumab, and durvalumab in the treatment of metastatic lung cancer is in progress 26 .Anti-CD3 antibody chemically coupled with anti-B7-H3 mAb antibody has been clinically approved 27 .
As to treatment strategies, the high-risk group has a higher expression of co-inhibitory factors and is more resistant to common chemotherapy including cisplatin, gemcitabine, and gefitinib.The risk assessment demonstrated that on the gene set level, the TME-related gene set indicates higher disease risk, and more potential to perform immune evasion and establish chemotherapy resistance.Although this research is mainly based on bioinformatics analysis, this is an innovative and cutting-edge study based on big data and could serve as assistance to clinical practice.

FigureFigure 1 .
Figure 1.Comparison of the two clusters 1 and 2 (C1 and C2) in the TCGA-LUAD cohort.(A) Volcano map of the TCGA database differentially expressed genes in LUAD.(B) NMF algorithm-based clustering of the consensus map.(C,D) The cophenetic correlation coefficient is a measure that can be used to determine how stable the cluster is that was produced using NMF.(E) RSS reflects the model's performance in terms of clustering.(F) There were statistically significant differences between C1 and C2 in terms of overall survival.(G) The progression-free survival (PFS) demonstrated a significant difference between stages C1 and C2 of the disease.(H) A plot of the data from the alluvial deposits shows what percentage of C1 and C2 are present in each of the several molecular subtypes.

Figure 2 .
Figure 2. Substantial variations in immune cells of the tumor microenvironment (TME).(A-G) Numerous TME stromal cells displayed considerable differences.

Figure 3 .
Figure 3. Construction and validation of the TME-related 6-gene risk score (RS).(A) Profiles of 203 candidate genes' LASSO coefficients.The tenfold cross-validation value is marked by a vertical line.(B) A lasso regression parameter adjustment using a ten-fold cross-validation.Vertical lines are formed from the best data when the minimal criterion and the one standard error criterion are both met.The vertical lines on the left side of the image show the four genes that have not yet been found.(C,D) A Kaplan-Meier curve analysis was performed on the TCGA and GEO databases to demonstrate the difference in overall survival between high-risk and lowrisk groups.(E-J) A Kaplan-Meier curve analysis illustrating the difference in overall survival between patients with high and low expression of PLK1, LDHA, FURIN, FSCN1, RAB27B, and MS4A1 in lung cancer.

Figure 4 .
Figure 4. Evaluation and validation of the prediction value of the six-gene risk signature by the GEO dataset.(A) Confirmation of the predictive value of risk scores applied to the TCGA cohort.(D) The prognostic risk ratings of the GEO cohort were checked and found to be accurate.(B) The survival rate and urination of TCGA cohort patients who were diagnosed with LUAD.(E) The survival rate and overall duration of patients diagnosed with LUAD who were part of the GEO cohort.(C) The distribution of polygenic model risk scores among the members of the TCGA cohort.(F) The distribution of polygenic model risk scores among individuals in the GEO cohort.

Figure 5 .
Figure 5. Validation of risk signatures prognostic effectiveness.(A-D) A ROC analysis was carried out so that the prognostic value of the prognostic characteristics could be determined.The area under the curve of the risk score, together with other clinical indicators used to predict overall survival at 1, 3, and 5 years (E) The results of running a Cox regression with just one variable to analyze overall survival.(F) The conclusions were drawn from running a multivariate Cox regression on overall survival data.(G) In LUAD patients, a nomogram was utilized to predict survival.(H) Curves for the 1-, 3-, and 5-year nomogram calibration.

Figure 6 .
Figure 6.GSEA results for samples that have high or low expression of 9 core genes.(A-F) An enriched gene set was compiled with KEGG using data from samples that had high PLK1, LDHA, FURIN, FSCN1, RAB27B, and MS4A1 expression.

Figure 7 .
Figure 7.The significance of predictive risk variables in clinical practice.(A) A heatmap illustrating the distribution of the clinical characteristics and risk ratings associated with each sample.Prevalence of clinically different subgroups based on LRG or HRG.(B) Gender, (C) Tumor stage, (D) Stage M, (E) Stage N and (F) Stage T.

Figure 8 .
Figure 8. Correlation between risk score and TMB.High-risk score (A) and low-risk score were used to create the oncoPrint (B).(C,D)Correlation between risk score and TMB.(E) The Kaplan-Meier curves for groups with high and low TMB.(F) Patient stratification using the Kaplan-Meier curve based on TMB and risk signature.

Figure 9 .
Figure 9. Tumor-infiltrating cells abundance estimation.The results of the Spearman correlation analysis demonstrated a greater link between patients in the (A) high-risk category and tumor-infiltrating immune cells.(B,C) The correlation between the score of the microenvironment and the expression of LDHA and MS4A1.(D) A comparison of the TME score for groups that are high risk with groups that are low risk.

Figure 10 .
Figure 10.Gene Enrichment pathways of GSVA.(A) A heatmap illustrating the link between risk ratings and representative pathway entries on the KEGG database.(B) A heatmap illustrating the relationship between immunological signatures and risk scores.(C) The correlation between the levels of gene expression for immune checkpoint blockage and risk ratings.estimating how well immunotherapy will work.(D) IPS distribution map estimations of the effect that immune checkpoint inhibitors have on risk scores (E-G) A sensitivity analysis of cisplatin, gemcitabine, and gefitinib was performed on patients who had either a high or low-risk score.

Table 1 .
A six-gene signature in lung adenocarcinoma microenvironment.