29 immune-related genes pairs signature predict the prognosis of cervical cancer patients

To screen the key immune genes in the development of cervical cancer, construct immune related gene pairs (IRGPs), and evaluate their influence on the prognosis of cervical cancer. Tumor Genome Atlas (TCGA) database and geo database were downloaded as training set and validation set respectively, and immune related gene data were downloaded from immport. IRGPs model is established by machine learning, and the model is analyzed and evaluated. Using the Uclcan to analyze the immune genes expression in cervical cancer, and to further explore the association with the expression level and the clinical stage and prognosis of cervical cancer. According to the analysis of training set, we identified 29 IRGPs as key gene pairs and constructed the model. The AUC value of the model was greater than 0.9, and the model group survival rate was conspicuous different (P < 0.001). The reliability of the model was confirmed in the validation group. Our IRGPs play an important role in the occurrence and development of cervical cancer, and can be used as a prognostic marker and potential new target of cervical cancer.

Scientific RepoRtS | (2020) 10:14152 | https://doi.org/10.1038/s41598-020-70500-5 www.nature.com/scientificreports/ results of PD-1/PD-L1 inhibition in cervical cancer are also satisfactory. However, at present, immunoassay sites with a therapeutic effect are scarce, and the research on tumor immunotherapy is far from sufficient. In this study, we screened immune genes that are significantly related to the prognosis of cervical cancer, constructed an immune gene pair (IRGP) model based on these genes, and used it to verify the unique prognostic markers of cervical cancer.

Method
Data acquisition. Gene expression profile data of 255 patients with cervical squamous cell carcinoma were obtained from cancer and tumor gene map (TCGA https ://www.tcga.org), and gene expression profile data set gse44001 16 was obtained from gene expression compilation (GEO https ://www.ncbi.nlm.nih.gov/geo/) database, including 300 samples of cervical cancer patients.
Acquisition of sample immune gene expression. 2,498 immune related genes were downloaded from immport (https ://www.immpo rt.org/home), including antigen presenting cells, chemokines and their receptors, cytokines and their receptors, interferon, interleukin, etc. Using limma package in R (3.61), we compared the gene expression data of cervical cancer samples downloaded from TCGA database and geo database as training set and verification set with immune related genes, and extracted the expression amount of immune related genes in cervical cancer samples.

Construction of immune related gene pairs (IRGPs).
In the two groups of data processed in the previous step, the IRGP of the sample is calculated, and the relatively high change is selected according to the standard of media absolute deviation > 0. 5. IRGP values are calculated by comparing gene expression levels in specific samples or profiles in pairs. The immune related genes are matched to compare the IRGPs. If the first IRG is larger than the second IRG, the output of the IRGP is 1; otherwise, the output is 0. If the ratio of IRGP score of 0 or 1 in training set and verification set is higher than 80%, then remove the IRGP and retain the remaining IRGP as candidate IRGP for prognosis prediction. The logistic rank test was used to screen the prognosis IRGP (FDR < 0.01). Cox risk regression analysis and glment in R (3.61) were used to perform tenfold cross validation to analyze the candidate IRGP and obtain the IRGP index. We constructed the best 29 gene pairs as immune gene pair model. We use ROC to calculate the optimal cutoff value of IRGP index, and use it as the basis to distinguish high and low risk groups.

IRGPs model validation.
The single factor and multi factor Cox proportional risk analysis and survival analysis of TCGA and gse44001 cervical cancer samples were carried out with IRGPs model. Functional enrichment analysis of GSEA, go and KEGG. Gene set enrichment analysis(GSEA) enrichment analysis was carried out for each gene related to immune prognosis using the fgsea package in R (3.61) 18 .Cluster profiler 19 was used to enrich Gene ontology(GO)function and KEGG pathway. Significant enrichment criteria: the absolute value of NES is greater than 1, the nomp value is less than 0.05, and the fdrq value is less than 0.25.

Expression of immune gene in cervical cancer.
Ualcan 20 were used to analyze the expression of immune genes in cervical cancer.
Statistical analysis. Measured data were expressed as mean ± standard deviation (x ± s) and data were compared using t test. Kaplan Meier method was used for survival analysis. The receiver operating characteristic curve (ROC curve) and ROC analysis were completed by survivalROC(1.0.3). Single factor and multi factor analysis using Cox proportional risk regression model. P < 0.05 was statistically significant, P < 0. 01 as the difference has very significant statistical significance.
Ethical approval and consent to participate. This article does not contain any studies with patients or animals performed by any of the authors. www.nature.com/scientificreports/ (Table 1). 73 prognosis related IRGPs were screened by lasso Cox proportional risk regression analysis. After 1,000 iterations, we selected 29 optimal IRGPs to build the immune prognosis model (Table 2).

Expression of immune related genes and construction of
IRGPs model validation. The immune prognosis model was applied to the training set, and the patients in each training set were scored. According to ROC curve analysis, the optimal cutoff value for patients to be divided into high and low risk groups is 1.124 (Fig. 1A). After evaluating the model, we found that AUC value of model 1, 3 and 5 years is 0.913 (Fig. 1B), 0.913 (Fig. 1C) and 0.912 (Fig. 1D). The results show that our immune prognosis gene has a high reliability for the model. The training set was divided into high-risk group ( Fig. 2A) and high-risk group (Fig. 2B). The results showed that the overall survival rate (OS) of high-risk group was significantly lower than that of low-risk group. For TCGA training set data, single factor and multi factor Cox risk regression analysis showed that only IRGPs model showed significant prognostic effect in single factor Cox ( Fig. 2C), while age and IRGPs could be significant independent prognostic factors in multi factor Cox (Fig. 2D).
Applying this model to the validation set of gse44001 ( Fig. 3A) (Table 3), survival analysis showed that the OS of patients in the high-risk group was significantly lower than that in the low-risk group (Fig. 3B). In the single factor and multi factor Cox analysis, IRGPs model and tumor size were significantly correlated with prognosis ( Fig. 3C,D).

Infiltration of immune cells in cervical cancer samples.
Most studies believe that the occurrence and development of tumor are closely related to immune cells, so it is an ideal method to study the infiltration of immune cells in tumor. We used CIBERSORT to analyze the infiltration of 22 kinds of immune cells in patients with high and low risk groups. Figure 4A shows the expression of immune cells in different risk groups. Macrophage M0 (Fig. 4B), activated mast cells (Fig. 4C) were significantly overexpressed in the high-risk group, while stationary dendritic cells (Fig. 4D), stationary mast cells (Fig. 4E), activated CD4T cells (Fig. 4F), and cd8t cells (Fig. 4G) were overexpressed in the low-risk group.  (Fig. 5A,B). KEGG results showed that these IRGP were involved in cytokine cytokine receptor interaction, chemokine signaling, tumor necrosis factor signaling, MAPK signaling, NF kappa B signaling, natural killer cell-mediated cytotoxicity, viral proteins and cytokines, and Th1 and Th2 cell differentiation (Fig. 5C,D). The results of GSEA (Fig. 6A) showed that these IRGP were significantly enriched in trace ribonucleoprotein complex (Fig. 6B), neurotransmitter transporter activity (Fig. 6C), endopeptidase activity ( Fig. 6D), fibroblast growth factor receptor binding (Fig. 6E), hormone activity ( Fig. 6F), fibroblast cell proliferation (Fig. 6G), and growth factor receptor binding (Fig. 6H).

Expression of immune gene in cervical cancer.
We explored the expression of IRGP in cervical cancer using the ualcan model (Table 4). There were 6 low-level expression of IRGP in cervical cancer (Fig. 7) and 8 high-level expression of IRGP (Fig. 8). There were differences in the expression of 5 low expression IRGP and 8 high expression IRGP in different age groups (Fig. 9), and there were differences in the expression of 14 IRGP in different stages of cervical cancer (Fig. 10).

Discussion
Cervical cancer is one of the most common gynecological malignancies. HPV infection is considered to be the main cause of cervical cancer 21,22 although the incidence rate of cervical cancer has been significantly decreased due to the development and promotion of HPV vaccine 23 . But incidence rate of cervical cancer is still high in developing countries and China's low income countries 24 . At present, for cervical cancer patients without invasion and lymphatic metastasis, the effect of surgery combined with radiotherapy and chemotherapy is better. If metastasis and infiltration occur, the treatment effect of cervical cancer patients will become very unsatisfactory. In recent years, immunotherapy has performed well in a variety of cancers including cervical cancer [25][26][27] .
Blocking PD-L1 / PD-1 signaling pathway to attack tumor cells expressing PD-L1 is the current mainstream method 28 . Although the anticancer activity of PD-1 and PD-L1 inhibitors is exciting, such immunotherapy is not effective for all patients, and a meta-analysis shows that patients who receive PD-1 / PD-L1 inhibitors have a www.nature.com/scientificreports/ higher risk of rash, thyroid dysfunction, pruritus, pneumonia and colitis [29][30][31] . Therefore, it is of great significance for the detection and treatment of cervical cancer to predict and find more biomarkers that may be related to immune prognosis. At present, most of the prognostic genes need to be standardized to reduce the errors caused by sequencing platform and samples. In this study, the scores of IRGPs constructed by us are calculated from the gene expression data of the same sample, which can not only ignore the impact of different platforms, but also do not need to standardize and scale the data. This method has been used in many studies, including cancer molecular classification, with high reliability 32,33 .
In this study, we screened 29 pairs of IRGP to construct the immune prognosis model related to the overall survival rate of cervical cancer patients. The AUC values of the model in 1, 3 and 5 years were all greater than 0.9. According to these 29 pairs of IRGP, they were divided into high-risk group and low-risk group. In TCGA training group and GSE44001 verification group, the OS of high-risk group was significantly lower than that of low-risk group (P < 0.01). These 29 pairs of IRGP have a good effect on sample discrimination. We found that macrophage Mo and activated mast cells were significantly over expressed in high-risk group by immunocyte infiltration analysis of samples. The existing research shows that mast cells and macrophages play an important  www.nature.com/scientificreports/ role in cervical cancer, which can promote the development of cervical cancer by promoting lymphangiogenesis and angiogenesis [34][35][36] . However, in the low-risk group, the expression of static dendritic cells, static mast cells, activated CD4T cells and cd8t cells is high. Although the effect of CD4T cells on cervical cancer has not been agreed, the cd8t cells are closely related to the better prognosis of cervical cancer patients [37][38][39] , there is evidence that dendritic cells will decrease in patients with high HPV infection, which indicates that high expression of dendritic cells is beneficial to resist cervical cancer 40 , which is consistent with our results. The enrichment analysis of go and GSEA showed that these immune genes were mainly involved in the binding of cytokines and their receptors, the binding of chemokines and their receptors, the binding of growth factors and their receptors, the binding of epidermal growth factor receptors, the activity of metalloendopeptidase, the binding of fibroblast growth factors and their receptors, hormone activity, fibroblast proliferation, and the binding process of growth  45,46 . KEGG results showed that these immune genes were mainly enriched in chemokine signaling pathway, tumor necrosis factor signaling pathway, MAPK signaling pathway, NF kappa B signaling pathway, natural killer cell-mediated cytotoxicity, viral protein and cytokine, and Th1 and Th2 cell differentiation. Th1 and Th2 may be involved in the pathogenesis and growth of cervical cancer. Th1 may be the target of predicting chemotherapy response of advanced cervical cancer [47][48][49][50] , while other pathways are classical signal pathways related to cancer. Immune cytokines play an important role in cervical lesions. Torres et al. Found that IL-10 is highly expressed in the cervix of women with persistent HPV, which may be related to the persistence of HPV and the promotion of disease progression. Further research by their team showed that copy individuals of IL-4, IL-6, IL-10 and TGFB1 were significantly associated with cervical cancer, and could be used as biomarkers for susceptibility to the disease 51,52 .These 29 pairs of IRGP have 47 different immune genes, most of which are cytokines, antimicrobial agents and natural killer cells, which are involved in various stimulation reactions and play a key role. In cervical cancer, HPV can inhibit the apoptosis of cervical cancer cells by down regulating NOD1 53 . In our sample, we also found that the expression of NOD1 in tumor tissue is low and there are differences in different ages and stages (Figs. 7D, 9C, 10I). Sang Yeon Cho et al. Found that duox1 is highly expressed in cervical squamous cell carcinoma and can play a good prognostic role by increasing the amount of innate immune cells 54 . The analysis also showed that DUOX1 is highly expressed in tumor tissues and related to age and grade (Figs. 8B, 9G, 10D). Stc2 can promote the proliferation of cervical cancer cells and increase the resistance to cisplatin 55 , while high expression of DDL4 is usually associated with low pelvic lymph node metastasis and survival rate of cervical cancer 56 . Therefore, we believe that the IRGP constructed in this study plays an important role in the development and prognosis of cervical cancer. There are also some deficiencies in our research. Although we select data samples from two databases for analysis, and use more advanced methods to reduce the errors caused by platforms, samples, etc., this is still a retrospective analysis. If we can carry out a prospective study or obtain clinical samples and evaluate them with Western blot or immunohistochemistry, it will be more convincing.    www.nature.com/scientificreports/

Conclusion
We constructed an immune gene pair model which is closely related to the prognosis of cervical cancer patients. The model contains 29 IRGP and 47 immune-related genes. The biological functions of these 47 immune-related genes are closely related to the occurrence and development of cervical cancer. Therefore, we think that these IRGPs may be the target of predicting or diagnosing cervical cancer, and suggest that immunotherapy can improve the prognosis of cervical cancer patients by regulating these IRGPs.