The novel immune-related genes predict the prognosis of patients with hepatocellular carcinoma

Hepatocellular carcinoma (HCC) is one of the main causes of cancer deaths globally. Immunotherapy is becoming increasingly important in the cure of advanced HCC. Thus it is essential to identify biomarkers for treatment response and prognosis prediction. We searched publicly available databases and retrieved 465 samples of genes from The Cancer Genome Atlas (TCGA) database and 115 tumor samples from Gene Expression Omnibus (GEO). Meanwhile, we used the ImmPort database to determine the immune-related genes as well. Weighted gene correlation network analysis, Cox regression analysis and least absolute shrinkage and selection operator (LASSO) analysis were used to identify the key immune related genes (IRGs) which are closely related to prognosis. Gene set enrichment analysis (GSEA) was implemented to explore the difference of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway between Immune high- and low-risk score groups. Finally, we made a prognostic nomogram including Immune-Risk score and other clinicopathologic factors. A total of 318 genes from prognosis related modules were identified through weighted gene co-expression network analysis (WGCNA). 46 genes were strongly linked to prognosis after univariate Cox analysis. We constructed a seven genes prognostic signature which showed powerful prediction ability in both training cohort and testing cohort. 16 significant KEGG pathways were identified between high- and low- risk score groups using GSEA analysis. This study identified and verified seven immune-related prognostic biomarkers for the patients with HCC, which have potential value for immune modulatory and therapeutic targets.

Hepatocellular carcinoma (HCC) is known as the third cause of cancer death worldwide and its incidence continues to rise 1 . HCC can be induced by various risk factors, such as hepatitis B/C virus infection, nonalcoholic steatohepatitis (NASH), alcoholism, and smoking 2 . The patients with HCC often accompanying cirrhosis and many molecular pathways deregulation 3 . Traditional treatment methods for HCC patients have shown poor clinical efficacy for a long time 4 . Curative approaches for HCC including: surgical resection、orthotopic liver transplantation and locoregional therapies. While most HCC patients were already at advanced status when received the diagnoses and were not amenable to curative resection or ablation. Thus palliative treatments such as: trans-arterial approaches and systemic therapies are particularly important for such patients 5 . Sorafenib has been the only first-line therapy for advanced HCC patients for more than 10 years, while the reduction in the risk of death was limited as the median time to progression was 5.5 months and median overall survival was 10.7 months [6][7][8] . Recently immunotherapy is becoming the new standard treatment for advanced HCC patients all around the world 9 . Some clinical studies have shown that Nivolumab therapy can provide demonstrable responses for a subset of advanced HCC patients 10 . According to the 2020 American Society of Clinical Oncology guideline, Atezolizumab + bevacizumab has been recommended as the new first-line treatment for most advanced HCC patients 11 . This management has shown superior efficacy including higher objective response rates and median survival compared with sorafenib based on a global, open-label, phase 3 trial 12 . Combination of immune checkpoint inhibitors and kinase inhibitors will soon become a cornerstone 13 . While some early-phase clinical trials indicated that the combination therapy may increase the toxicity of individual agents 14,15 . In addition, the response rate of immunotherapy at the present stage is limited, with the objective response rate generally failing to exceed 20%. Exploring strategies to maximize patient response 、striving to better predict and choose patients who are likely to respond are the development directions of HCC immunotherapy. Therefore, novel biomarkers for prediction of treatment response are critical to develop and optimize new management strategies 16  www.nature.com/scientificreports/ Gene microarrays and RNA-sequencing technology combined with bioinformatics analysis have identified several prognostic biomarkers for cancer diseases these years [17][18][19][20][21] . Some immune-related prognostic signatures showed strong prediction ability. For example, Huang et al. 22 constructed an immune related gene (IRG) prognostic classifier consisting of PSME1, CDC42, CMTM6, CXCR6, CD8B, HLA-DQB1, HLA-C, and TNFSF13  based on GEO data for melanoma patients. Similar IRGs prognostic signatures have been reported for colorectal  cancer 23 , lung adenocarcinoma 24 and gastric cancer 25 , as well.
In this study, we analyzed the HCC gene expression profiles in The Cancer Genome Atlas (TCGA, http:// www. cancer. gov/ tcga) database using weighted gene co-expression network analysis (WGCNA). Genes from the significant prognostic-related modules were further computed for Cox regression and Lasso regression and constructed an IRG prognostic signature consisting of seven genes (NR1D1, HDGF, LMBR1, PRDX1, NR6A1, EPO and DCK). In addition, we analyzed the IRG signature by intergrating other clinical information like tumor stage, grade, patient's age, gender and the IRG signature was confirmed as an independent prognostic indicator for HCC. The flow chart of this study was shown in (Fig. 1A). Our study established an immune-related signature for HCC patients and provided information of subsequent personalized diagnoses and treatment strategies of HCC.

Materials and methods
Data collection and immune related genes selection. The GSE76427 gene expression profiles were retrieved from the Gene Expression Omnibus (GEO: https:// www. ncbi. nlm. nih. gov/ geo/) database and the samples without follow-up information were excluded. The GEO expression file was normalized for further analysis. The HCC RNA-seq data was retrieved from TCGA database including 407 HCC samples and 58 normal samples. The TCGA dataset was normalized using the FPKM method 26 . To eliminate the batch effect of GEO and TCGA dataset, R package "sva" was applied. A total of 1811 IRGs (Table s1) related to human cancers were identified through the Immunology Databases and Analysis Portal (ImmPort) database (https:// www. immpo rt. org/ home 27 . The common immune-related genes among the three above datasets were chosen. Weighted gene co-expression network analysis. Weighted gene co-expression network analysis (WGCNA) was performed in TCGA dataset, and the power exponential weighting of gene correlation coefficients was used to reveal the relationships among different genes 28 . We also explored the relevance among the clinical phenotype with gene modules. Module Membership (MM) represented the Pearson correlation coefficient of the module's first principal component and module genes expression. Gene significance (GS) represented the level of linear correlation between clinical phenotype and module genes expression.
Construction and verification of the IRG signature. The HCC patients' clinical information was retrieved from TCGA database and the samples without overall survival (OS) and survival state information were excluded. A total of 371 samples were brought into survival analysis. Aiming to make the established prognostic signature have better generalization, the samples were divided into training dataset (186 samples) and test dataset (185 samples) randomly. In training dataset, univariate Cox regression was used with the survival R package (p < 0.01) to identify survival-related IRGS, the Least Absolute Shrinkage and Selection Operator (LASSO) regression was used to eliminate collinearity among IRGs 29 . Ultimately a prognostic model involving seven key IRGs was constructed by multivariate Cox proportional hatablezards regression analysis 30 . The immune risk score was served as predictors of prognostic status. And we calculated the immune risk score with the following formula: risk score = n i=1 exp i * βi. . We plotted the Kaplam-Meier survival curves to evaluate the model's prediction effect. All of the samples were categorized into high-risk group and low-risk group according to the median value of the training dataset. Timedependent receiver operating characteristic (ROC) curves were plotted to appraise the prediction performance of 1-, 3-, 5-year survival 31 . We also calculated the Area Under roc curve (AUC) of the training dataset, testing dataset and GSE dataset via time ROC R package.

Relationship between IRGs signature and clinicopathologic features. The univariate Cox analysis
determined the correlation between survival and clinicopathologic features while the indicators including: age, gender, TNM stage, tumor grade and Immune risk score. Then the independent prognostic indicators of HCC was identified by the multivariate Cox analysis. Finally we generated a prognostic nomogram using rms R package.

Relationship between IRGs signature and immune checkpoints expression. TIMER database
(https:// cistr ome. shiny apps. io/ timer/) is a public resource to analyze and visualize the abundance of tumorinfiltrating immune cells in a given cancer type 32 . In order to explore the effect of the IRGs signature in HCC immunotherapy, the "Correlation" module was used to calculate the relationship between the IRGs signature and another 6 immune checkpoints' expression including PDCD1, PDCD1LG2, CTLA4, CD247, HAVCR2, and IDO1. And Spearman's correlation > 0.3 was considered to have a significant correlation.

Results
Identification of prognosis-related modules by WGCNA. Weighted Gene Correlation Network Analysis (WGCNA) was performed on 1284 overlapping immune-related genes (Fig. 1B). Firstly, we removed one sample, TCGA-FV-A4ZP, as the height in the hierarchical clustering tree is greater than 20,000 ( Fig. 2A).
The soft-thresholding power to construct a gene regulatory network was established basing on a scale-free R 2 = 0.9 (Fig. 2B). Eight modules were identified using dynamic pruning method and similar modules has been merged (Fig. 2C). The highest correlation with survival status was shown in green module while the red and blue modules showed the highest correlation with the overall survival of HCC patients (Fig. 2D). Finally, we chose a total of 318 IRGs from the three modules for further analysis.
Construction of prognostic signature based on immune related genes. As the result of univariate Cox regression analysis, 46 survival related genes of the above three modules were identified with the cut-off value of P < 0.01 (Table s2). In order to eliminate collinearity between IRGs, Lasso regression analysis was performed (Fig. 3). In the end, a total of 7 genes (NR1D1, HDGF, LMBR1, PRDX1, NR6A1, EPO and DCK) were included in multivariate Cox regression to establish a prognostic signature. The hazard ratio and 95%CI of the seven IRGs are presented (Fig. 4). The Kaplan-Meier curves were plotted in the training dataset according to the immune risk score and the high-risk group showed a poor overall survival compared to low-risk group (Fig. 5A).
The time-dependent ROC curves revealed the prognostic signature had a superior predictive accuracy as the AUC was 0.8473 for 1-year, 0.7575 for 3-year, 0.6872 for 5-year in the training dataset (Fig. 5B).
Verification of the prognostic signature in testing dataset and GSE cohort. The clinical characteristics of patients from TCGA and GEO database are shown in Table 1. In the TCGA testing dataset, the Kaplan-Meier analysis revealed that the high-risk patients showed a worse survival (Fig. 5C). The time-dependent ROC curves are plotted (Fig. 5D). We also used an independent cohort (GSE76427) to verify the predictive ability of the seven-gene signature. The Kaplan-Meier curve in GSE cohort was plotted (Fig. 5E) and the ROC curve showed the signature still had a good accuracy as the AUC was 0.6887 for 1-year, 0.604 for 3-year and 0.6332 for 5-year (Fig. 5F). Moreover, the immune-risk signature was an independent prognostic indicator for overall survival (OS) in multivariate Cox analysis with p < 0.01 (Fig. 6). We also constructed a prognostic nomogram which integrating the clinical features with immune-risk score as a quantitative tool for predicting OS of HCC patients (C index = 0.714, 95% CI = 0.651-0.777, Fig. 7A). The calibration plots presented that the nomogram performed a moderate accuracy (Fig. 7B).
The IRGs signature's effect in HCC immunotherapy. We used the TIMER database to further explore the relationship between the IRGs signature and the expression of another 6 immune checkpoints including PDCD1, PDCD1LG2, CTLA4, CD247, HAVCR2, and IDO1. The expression scatterplots were plotted (Fig. 8).
According to the results, the expression of DCK was significantly associated with PDCD1LG2 (cor = 0.319, P < 0.05) and HACVR2 (cor = 0.382, P < 0.05). And the expression of EPO was significantly associated with PDCD1 (Spearman's correlation = 0.302, P < 0.05), which indicates that DCK and EPO might play a important role in the response to immunotherapy in HCC.

Discussion
HCC is one of the most deadly cancers all around the world, whose incidence and mortality rates are predicted to continue rising 34 . Liver resection, liver transplantation and radiofrequency ablation have been the potential curative treatments for HCC patients at a relatively early stage for a long time [35][36][37][38] . Transarterial chemoembolization and Tyrosine kinase inhibitor targeting agent Sorafenib have been demonstrated having survival benefits advanced HCC patients. While the improvement of overall survival is still limited. Sorafenib can only prolong the overall survival of advanced HCC patients by 3 months, and it also has significant side reaction. Immunotherapy is developing rapidly and has formed part of the standard treatment regimen for many cancer diseases 39 . The normal liver is a tolerogenic immune organ in order to prevent the abnormal immune response caused by continual pathogen exposure 40 . Chronic inflammatory, which can inhibit the antigen-specific immune surveillance, is an important process in the development of HCC 41 . Chronic inflammatory state also leads to the overexpression of immune checkpoint molecules 42 . Therefore, activate the immune reaction to HCC can be helpful to the treatment. Several studies showed that increasing the level of activated peripheral blood mononuclear cells and dendritic cells benefited in both early and advanced stage HCC [43][44][45] . The blockade of immune checkpoints is the most successful immunotherapeutic in various cancer diseases. Programmed cell death-1(PD-1) antibodies including nivolumab and pembrolizumab have already shown promising efficacy in advanced HCC patients 46 . However, the reliable predictive biomarkers for immunotherapy are still lacking.
High throughput sequencing technology, gene expression databases and bioinformatics analysis provided possibility for systematic profiling of immune signatures in solid cancer. Wang et al. 22  www.nature.com/scientificreports/ www.nature.com/scientificreports/ cancer. Another bioinformatics research which analyzed three circRNA datasets and one miRNA dataset found that UBE2L3 was the key gene in the pathogenesis of HCC 47 . In this study, we identified a seven-gene signature comprising NR1D1, HDGF, LMBR1, PRDX1, NR6A1, EPO and DCK that was closely related to the prognosis. NR6A1 and NR1D1 were members of the nuclear hormone receptor family 48 . NR6A1 was reported as a novel biomarker associated with migration and invasion of prostate cancer 49 . The overexpression of NR6A1 in prostate cancer cells could reduce G0/G1 phase cell cycle arrest and promoted tumor growth 50 . Hepatoma-derived growth factor (HDGF) played an important role in the development and progression in many solid cancers including HCC 51,52 . HDGF stimulated the differentiation of neutrophils in gastric cancer patients and relayed Hp-induced inflammatory signaling, which was involved in gastric carcinogenesis 53 . High HDGF levels in serum of non-small www.nature.com/scientificreports/ cell lung cancer patients also indicated bone metastasis and unfavorable prognosis 54 . LMBR1 was related to the preaxial polydactyly in human and mouse. Its role in the tumor progression is still unclear. While according to a gene expression study about gastrointestinal stromal tumor, LMBR1 might be a tumor progression promoter in GISTs by regulating the expression of nuclear BMI1 55 . PRDX1 was a member of peroxiredoxin family 56 , whose overexpression was associated with lymph metastasis, histopathological grading and the tumor size in head and neck squamous cell carcinoma patients 57 . Bajor et al. 58 also found that the downregulation of PRDX1 could significantly impaired the growth rate of breast cancer cells and was a potential target for breast cancer treatment. Erythropoietin (EPO) was a biomarker of predicting prognosis in patients with Acute-on-chronic liver failure (ACLF), and patients without bleeding showed a lower level of EPO 59 . Deoxycytidine kinase (DCK) was wellknown for its association with resistance to chemotherapeutic agents. Based on the research of Shang et al. 60 , the knock down of DCK inhibited proliferation and tumorigenicity of cervical cancer cells. PDCD1 encodes the Programmed cell death protein-1 (PD-1), which expresses in the activated T cells. The interaction between PD-1 and its ligand PD-L1 (encoded by PDCD1LG1) suppresses the activation of lymphocytes and cytokine production 61 . Inhibitors of PD-1 and PD-L1 have shown great clinical benefits in some types of cancer 62 . Our study analyzed the correlation between the expression of IRGs and immune check point and found that the expression of PDCD1/PDCD1LG2 was in positive correlation with the expression of DCK and EPO, which revealed that the overexpression of DCK and EPO may upregulate the expression of PD-1/PD-L1, and lead to a suppression of anti-tumor immune response. The expression of DCK was also in positive correlation with the expression of HAVCR2. HAVCR2, which also known as TIM3, plays an important role in inhibiting the Th1 response and the expression of cytokines, some preclinical researches have shown that vivo blockade of Tim-3 leading to the enhance of anti-tumor immunity and the inhibition of tumor growth 63 . The expression level of DCK was positively related to TIM3, indicating a potential correlation with the immunological tolerance in HCC. Thus, the DCK and EPO from IRGs may regulate the expression of some specific immune checkpoints and promote the tumor escape mechanisms in HCC. Further researches are needed to explore whether DCK and EPO could be the potential targets in immune checkpoint inhibitor therapy.
Gene set enrichment analysis (GSEA) indicated 16 significant differential KEGG pathways among high-and low-risk groups. Cell cycle is accomplished through a series of events including S phase, M phase, G1 and G2 phases 64 . The check point network controlled by ATM/ATR and CHK2/CHK1 can prolong cell-cycle progression in G1, G2 or S phases in respond to DNA damage [65][66][67][68] . However, mutations in checkpoint pathway provide the opportunity for the continued growth of genomic abnormalities cells, thereby increasing the chance of malignant transformation 69 . Glycosylphosphatidylinositol (GPI)-anchored proteins has been reported associated with a range of cancer 70 . Trink et al. found the first oncogenic GPI-T subunit, PIG-U, in bladder cancer. Another research showed that the level of cell surface GPI-anchored protein is high in breast cancer cells 71 . Nucleotide Excision Repair pathway plays a crucial role in cancer prevention 72 . And the defects of the global genome NER (GG-NER) sub pathway result in cancer predisposition 73 . Autophagy plays an important role in controlling protein and organelle quality and quantity 74 ,which can promote cancer progression by suppressing P53 expression and inhibiting cell death, senescence, and an anti-tumor immune response. In addition, the homologous recombination repair (HRR) pathway provides high-fidelity repair of double-strand DNA breaks. The deficiency in HRR is a target for a new selective therapy for high-grade ovarian cancer 75 .  www.nature.com/scientificreports/ Our study has following limitations. Firstly, the number of samples we retrieved from TCGA database and GSE76427 are relatively small, so that the prediction ability of our IRGs prognostic signature should be verified in a large prospective clinical cohort. Secondly, as the clinical information released in publicly available datasets is limited, the clinicopathological parameters that included in nomogram were not comprehensive. The overall survival of HCC patients are significantly impacted by some other clinical factors as well. Including the aetiology of HCC (HBV/HCV infection), the status of liver failure and portal hypertension, the treatment received by HCC patients and so on. Based on this limitation, the immune-related signatures may not be an absolute independent risk factor for the overall survival for HCC patients. Variables not included in our study may have potential relevance to the immune-related signatures, which need to be further explored in specific research. Thirdly, our study was a retrospective analysis based on the data from published studies rather than real-word treatment experience, therefore the clinical application value of the IRGs prognostic signature needed further evaluation. What's more, the IRGs prognosis signature in this study was established by pure bioinformatics analysis. Therefore, further experiments are needed to validate our results.
In conclusion, we successfully recognized a novel IRGs prognostic signature with high predictive value and accuracy for patients with HCC, which may contribute to the decision making of clinical management. Moreover, this study provides additional information for further research on HCC pathogenesis and targeted therapies. www.nature.com/scientificreports/

Data availability
The datasets supporting the conclusions of this article are available in the public databases TCGA database (http:// www. cancer. gov/ tcga) and Gene Expression Omnibus (https:// www. ncbi. nlm. nih. gov/ gds/) with the accession numbers: GSE76427.