Pancreatic adenocarcinoma associated immune-gene signature as a novo risk factor for clinical prognosis prediction in hepatocellular carcinoma

Pancreatic adenocarcinoma (PAAD) has high mortality and a very poor prognosis. Both surgery and chemotherapy have a suboptimal therapeutic effect, and this caused a need to find new approaches such as immunotherapy. Therefore, it is essential to develop a new model to predict patient prognosis and facilitate early intervention. Our study screened out and validated the target molecules based on the TCGA-PAAD dataset. We established the risk signature using univariate and multivariate Cox regression analysis and used GSE62452 and GSE28735 to verify the accuracy and reliability of the model. Expanded application of PAAD-immune-related genes signature (-IRGS) on other datasets was conducted, and the corresponding nomograms were constructed. We also analyzed the correlation between immune-related cells/genes and potential treatments. Our research demonstrated that a high riskscore of PAAD-IRGS in patients with PAAD was correlated with poor overall survival, disease-specific survival and progression free interval. The same results were observed in patients with LIHC. The models constructed were confirmed to be accurate and reliable. We found various correlations between PAAD-IRGS and immune-related cells/genes, and the potential therapeutic agents. These findings indicate that PAAD-IRGS may be a promising indicator for prognosis and of the tumor-immune microenvironment status in PAAD.


Immuno-correlation analysis and drug prediction of PAAD-IRGS.
We conducted the PAAD-IRGS risk score correlation analysis with 24 immune-related cells 36 in PAAD using the spearman's test 37 . Subsequently, survival analysis of several significant immune-related cells was conducted to identify whether they were risk factors of PAAD using tumor immune estimation resource (TIMER), version 2.0 database [38][39][40] . Then we downloaded the immunophenoscore (IPS) 41 data from The Cancer Immunome Atlas (TCIA) database (https:// tcia. at/ patie nts), which supports results of comprehensive immunogenomic analysis of next generation sequencing data (NGS) based on TCGA 42 , for analyzing the correlation between PAAD-IRGS and immune response in PAAD patients.
Relationships between PAAD-IRGS risk score and three kinds of immunomodulators expression in PAAD based on TCGA were explored and visualized with heatmaps, as well as relevant drug prediction accordingly via tumor-immune system interaction database (TISIDB) 43 (http:// cis. hku. hk/ TISIDB/ index. php), integrating multiple heterogeneous data. We searched the website with the gene symbol S100P, S100A2 and MMP12 and download relevant information in the "drug" module. Circle map and annotations were performed accordingly.
Analysis of protein expression of the PAAD-IRGS. The human protein atlas (HPA) database 44 ,a spatial map of the human proteome (http:// www. prote inatl as. org/ human prote ome/ patho logy) was used to ascertain the physiological and pathological expression data of S100P, S100A2 and MMP12. As supplementary, we used UALCAN (http:// ualcan. path. uab. edu/ index. html) to conduct protein level analysis of S100P, S100A2 and MMP12 genes. It is a comprehensive and interactive public resource for cancer OMICS data analysis 45 , provided by the Clinical proteomic tumor analysis consortium (CPTAC) dataset 46 . Statistical analysis. All statistical analyses were performed with R (version 3.6.3). Normally distributed variables were analyzed using the t-test and one-way ANOVA test and non-normally distributed variables with nonparametric tests. Log-rank test and Cox regression were used for survival analysis, Pearson's correlation and spearman's rank correlation test for correlation analysis. P or P.adj < 0.05 was considered statistically significant. The correlations was defined as follows: 0.00-0.10 (negligible), 0.10-0.39 (weak), 0.40-0.69 (moderate), 0.70-0.89 (strong), 0.90-1.00 (very strong) 47 .

Results
The study design for this work is shown in Fig. 1.

DEGs & IRGs analysis. 178 PAAD patients with gene expression and prognostic information and 4
matched adjacent normal samples were included in the training cohort. 25,597 gene IDs were analyzed after removing null values, in which we obtained 539 differentially expressed genes that met the cut-off criterion of |log2(FC)|> 1 & P.adj < 0.05 in PAAD (236 genes up-regulated while 303 down-regulated) ( Fig. 2A). Through the intersection of 490 DEGs and 1744 IRGs, 49 differentially expressed IRGs in PAAD were screened out (Fig. 2B). Enrichment analysis. The KEGG pathways which were most associated with immunity involved in natural killer mediated cytotoxicity (P < 0.001), B cell receptor signaling pathway (P < 0.001) and chemokine signaling pathway (P < 0.05) (Fig. 2C). Specifically, regulation of the immune effector process, cell killing and humoral immune response of the biological process (BP) module (all P < 0.001) were observed to be associated with immunity. So was major histocompatibility complex (MHC) protein binding and cytokine receptor binding of molecular functional (MF) module (Fig. 2D). Gene overlap is highlighted in the volcano plot (Fig. 2E).
Construction and assessment of PAAD-IRGS. We further analyzed the genes identified above to identify the potential diagnostic and prognostic value of IRGs in PAAD. Based on LASSO regression analysis, four prognostic risk biomarkers were identified (high expression of S100P, S100A2, and MMP12 was associated with poor prognosis, while low expression of DEFA5 was associated with better prognosis) (Fig. 3A). S100P, S100A2 and MMP12 were expressed higher in tumor tissues, compared with normal tissues (P < 0.001), while  www.nature.com/scientificreports/ the opposite was true for DEFA5 expression (P < 0.05) (Fig. 3B). The area under curve (AUC) of S100P, S100A2, and MMP12 were 0.971, 0.968, and 0.981, indicating their excellent diagnostic value. However, DEFA5 was considered an inefficient biomarker for diagnosis (AUC = 0.438) (Fig. 3C). Subsequent univariate and multivariate COX regression analyses were conducted on the four genes, excluding DEFA5 (P = 0.164) (Fig. 3D). The model of PAAD-IRGS was finally comprised of S100P, S100A2 and MMP12. We plugged the corresponding regression coefficients into the equation as follows to complete the establishment of PAAD-IRGS: PAAD-IRGS = EXP(S100P) × 0.132 + EXP(S100A2) × 0.098 + EXP(MMP12) × 0.095. Furthermore, we performed PAAD-IRGS specialized differential expression analysis in different pathology stages. Using the Gene expression profiling interactive analysis (GEPIA) database 48 , a statistically significant difference in S100P expression was observed in different pathology stages of TCGA-PAAD (P < 0.001) (Fig. 4A). After a single gene correlation analysis of these three genes, we obtained 79 co-correlated genes (Fig. 4B). Based on further enrichment analysis, KEGG pathways seemingly involved in ECM-receptor interaction, regulation of actin cytoskeleton, p53 signaling pathway, focal adhesion and pancreatic cancer, and GO pathway focused on cell-membrane organization and connection (Fig. 4C).
The model showed a better diagnostic capability than individual genes with an AUC of 0.993 (95% confidence interval (CI) = 0.987-0.998) (Fig. 4D). By single gene survival analysis, we observed that patients in S100A2 highexpression group had a worse OS than patients in S100A2 low-expression group (hazard ratio (HR) = 1.62, 95% CI = 1.07-2.46, P = 0.023). However, there is no statistically significant difference between low and high expression groups of S100P or MMP12 (Fig. 4E). Patients in the PAAD-IRGS high-risk score group had a much worse OS than patients in low-risk score group (HR = 2.21, 95% CI = 1.45-3.39, P < 0.001) (Fig. 4F).

PAAD-IRGS related drugs.
TISIDB is a web portal for tumor and immune system interaction, which supports genomics, transcriptomics, and clinical data from TCGA and mechanism, and drug information from public databases. We can only obtain potential drugs associated with PAAD-IRGS, which is demonstrated in a  (Fig. 15). Currently, drugs targeting PAAD-IRGS (S100P, S100A2 and MMP12) remained in the experimental stage, and effective targeted drugs for pancreatic cancer are still in the blank.

Analysis of protein expression of the PAAD-IRGS.
We obtained the protein expression pattern of S100P and S100A2 in different cancers based on the HPA database. Expression of S100P in most pancreatic (83.3%) and liver (54.5%) cancers showed moderate to intense cytoplasmic and nuclear staining (Fig.S1A). Immunohistochemistry (IHC) results also confirmed that S100P was highly expressed in PAAD and LIHC than in corresponding normal tissues (Fig.S1B). Although the level of S100A2 protein expression was lower than that of S100P (Fig.S2A), we can still observe the moderate intensity of S100A2 in PAAD and LIHC than in corresponding normal tissues (Fig.S2B). The information on MMP12 in the HPA database was absent, we conducted further verification using the UALCAN database. To be consistent, the protein expression of S100P was higher in PAAD and LIHC than in corresponding normal tissues (P < 0.001) (Fig.S3A), as well as in MMP12 (P < 0.001) (Fig.S3B). Despite the data absent in LIHC, the protein expression of S100A2 was higher in PAAD than in normal tissues (P < 0.01) (Fig.S3C).

Discussions
Although pancreatic cancer is still one of the leading causes of cancer-related death worldwide, some improvements in patient outcomes have been made due to advancements in therapeutics 51 . Since there are no obvious clinical symptoms in the early stage, pancreatic cancer is usually advanced at diagnosis. Secondly, the high mortality of PAAD seems to be inextricably associated with its suppressed immune microenvironment and significant decrease of T cell infiltration levels in the tumor 52 . Although immunotherapy has revolutionized the cancer treatment model, PAAD patients rarely respond to these therapies due to poor activation and infiltration of T cells in the tumor-immunity microenvironment (TIME). Recent research has revealed potential www.nature.com/scientificreports/ epigenetic-transcriptional mechanisms by which tumor cells remodel their TIME and suggested EGFR inhibitors as potential immunotherapy sensitizers in PAAD 53 . Intra-tumoral IFN-γ-producing Th22 cells were reported to be associated with TNM staging and the worst outcomes in PAAD 54 . γδ T Cells were also considered to promote pancreatic oncogenesis by restraining αβ T Cells activation 55 . Each T cell subpopulations secretes different cytokines and chemokines that modulate the immune response in synergistic and opposite ways 56 . Additionally, expansion of immunosuppressive B cells induced by IL-1β might promote PAAD 57 , and many extracellular matrix (ECM) components, including collagen, growth factors, cytokines, chemokines, and cancer-associated fibroblast (CAF) play vital role in tumor progression 58 . All tumor-immunity components in the TIME interact continuously, constructing a complex stroma-tumor crosstalk network. Due to the complexity of tumor-immunity mechanisms, there is still no effective way to predict prognosis in clinical practice. Our study aimed to discover immune-related biomarkers and establish a robust model to predict prognosis in PAAD patients. The TCGA-PAAD dataset was used to screen for potentially immune-related DEGs , then analyzed for differential expression and intersection. GO and KEGG enrichment analyses were also performed to confirm that the mechanisms involved in these genes were focused on immune-related pathways (Fig. 2D). Furthermore, we narrowed down the results by Lasso regression analysis and obtained three key IRGs finally through the univariate and multivariate Cox regression analysis. The PAAD-IRGS comprised of S100P, S100A2 and MMP12 had an outstanding diagnostic value (Fig. 4D) and accurately predicted the prognosis for PAAD patients (Fig. 5A,B). We Table 6. The univariate and multivariate analysis for the DSS (TCGA-PAAD). DSS disease specific survival, CI confidence interval, NA reference group or could not be evaluated. *Compared with each group (log-rank test or Omnibus test for univariate, Cox regression analysis with adjusted hazard for multivariate). p < 0.05 means statistically significant (highlighted in bold). www.nature.com/scientificreports/ specially performed secondary enrichment analysis on PAAD-IRGS, revealing that this model was also associated with pathways of ECM and cell-membrane junction and immune-related pathways (Fig. 4C). Among the three genes, S100P, a 95-amino-acid protein belonging to the S100 family, was regarded as a promising diagnostic 59 and prognostic biomarker 60 for pancreatic cancer with a potential mechanism of regulating www.nature.com/scientificreports/ invasion into the lymphatic endothelial monolayer 61 , which is consistent with our results. S100A2, another member of the S100 family, was reported as a prognostic biomarker involved in immune infiltration and immunotherapy response prediction in pancreatic cancer 62 , which matches our findings. Turn to MMP12, as one of the members of the matrix metalloproteinases family, it encodes extracellular matrix participating in the (EMT) which was identified as a strictly programmed shift playing a crucial role in tumor invasion and metastasis 63 . MMP12 was also revealed to be a potential diagnostic biomarker for pancreatic carcinoma. Its up-regulation was associated with a poor prognosis 64 . These genes were verified to be closely correlated with different cancers, especially the diagnosis and prognosis of PAAD, a finding we also made in this work. Although many types of diagnostic or prognostic biomarkers, and even to some extent predictive models have been identified in recent studies 60,65-68 , we discovered three IRGs with high specificity. We integrated them to establish a novel prognostic model for PAAD. Compared with other models, our model had an extremely remarkable performance on both diagnosis and prognosis prediction in PAAD patients. In our study, patients in the PAAD-IRGS high-risk group had a significantly worse OS than those in the lowrisk group (Fig. 4F), indicating that the PAAD-IRGS score may be an independent risk factor when evaluating the prognosis of PAAD patients. Additionally, time-dependent ROC and DCA results (Fig. 5A,B) showed that PAAD-IRGS had a good performance in prediction prognosis. The nomogram integrating PAAD-IRGS and multiple clinicopathological variables showed better accuracy and reliability than any singular variable (Table 3). We not only evaluated and validated the PAAD-IRGS by using two datasets of pancreatic cancer from GEO, but also investigated its application to hepatocellular carcinoma. Hepatobiliary and pancreatic diseases are often classified into the same category since they are anatomically and functionally linked. Although the cholangiocarcinoma dataset of TCGA was discarded due to its small sample size, we found that the PAAD-IRGS had excellent diagnostic and prognostic value on LIHC patients. We also combined relevant clinicopathological variables with PAAD-IRGS to construct a comprehensive nomogram model, which showed good accuracy and robustness. Based on the results, we might speculate whether the three genes participated in the oncogenesis, progression and metastasis of LIHC and PAAD partially or collectively. However this needs further exploration. We also looked into using PAAD-IRGS to predict DSS and PFI in patients with PAAD. The results of PAAD-IRGS and the relevant prognostic model were encouraging. Unlike other biomarkers that only had diagnostic value, PAAD-IRGS had the dual capability to predict diagnosis and prognosis with high accuracy. Several multiple-genes prognostic model have been established and reported 67,68 . Compared with them, our model had outstanding general applicability with high accuracy and stability. As to the miRNA or lncRNA-related signatures 65,69-71 , our PAAD-IRGS was more stable and convinced; Compared with multiple-gene signatures 9,68 , necroptosis-related www.nature.com/scientificreports/ gene signature 72 and m6A-related gene signature 73,74 , which had been reported, our PAAD-IRGS was new and more versatile with outstanding performance. It can be well applied to prognostic prediction of multiple cancers with different prognostic parameters. Its good diagnostic ability for various cancer and its relationship with tumor-immunity would make it promising for further research. There were several limitations of this research to be concerned about. The limitation to this study worth noting include: there may be an effect on the result due to batch effect and differences in sample sizes that are difficult to eliminate completely. Secondly, although the prognostic value of the PAAD-IRGS was evaluated in multiple datasets, large-scale clinical research is still necessary for further validation. Thirdly, we conducted correlation analyses between PAAD-IRGS and immune-related cells/immunomodulators and disclosed some potential immune-related targets. However, the underlying mechanisms and pathways need further investigation and experiment validation.
In conclusion, our study established a novel prognostic model comprised of three genes with high specificity for predicting prognosis in patients with PAAD. This model demonstrated excellent performance in predicting

Data availability
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author/s. All data and original files in our work are freely available under a 'Creative Commons BY 4.0' license. All methods were carried out in accordance with relevant guidelines and regulations.