An immune cell infiltration-related gene signature predicts prognosis for bladder cancer

To explore novel therapeutic targets, develop a gene signature and construct a prognostic nomogram of bladder cancer (BCa). Transcriptome data and clinical traits of BCa were downloaded from UCSC Xena database and Gene Expression Omnibus (GEO) database. We then used the method of Single sample Gene Set Enrichment analysis (ssGSEA) to calculate the infiltration abundances of 24 immune cells in eligible BCa samples. By weighted correlation network analysis (WGCNA), we identified turquoise module with strong and significant association with the infiltration abundance of immune cells which were associated with overall survival of BCa patients. Subsequently, we developed an immune cell infiltration-related gene signature based on the module genes (MGs) and immune-related genes (IRGs) from the Immunology Database and Analysis Portal (ImmPort). Then, we tested the prognostic power and performance of the signature in both discovery and external validation datasets. A nomogram integrated with signature and clinical features were ultimately constructed and tested. Five prognostic immune cell infiltration-related module genes (PIRMGs), namely FPR1, CIITA, KLRC1, TNFRSF6B, and WFIKKN1, were identified and used for gene signature development. And the signature showed independent and stable prognosis predictive power. Ultimately, a nomogram consisting of signature, age and tumor stage was constructed, and it showed good and stable predictive ability on prognosis. Our prognostic signature and nomogram provided prognostic indicators and potential immunotherapeutic targets for BCa. Further researches are needed to verify the clinical effectiveness of this nomogram and these biomarkers.


Identification of shared prognostic immune cells. As listed in
, five shared immune cells related to prognosis of BCa were identified, including cytotoxic cells, CD8+ T cells, T helper cells, T follicular helper cells (TFH), and Dendritic Cells (DC). The KM curves of the five common prognostic cells were presented in Supplementary Fig. 1a, which showed that high infiltration levels tumor samples had favorable prognosis.
The results of significant relationship between clinicopathological factors and prognostic cells were demonstrated in Supplementary Fig. 1b. The infiltration abundance of T helper cells was significantly lower in samples of elder patients, high grade, late stage (stage III and IV) and MIBC subtype. TFH also showed lower infiltration level in samples of late stage. The infiltration abundance of CD8 T cells, however, was significantly higher in MIBC, which may be influenced by the basis of small sample size of NMIBC. constructed the sample dendrogram and trait heatmap with no outlier sample detected. Then, under the softthresholding of 3, the gene cluster dendrogram was produced with the high similarity of feature genes into the same module. Next, we created a module-trait heatmap to demonstrate the correlation between the infiltration levels of the five prognostic immune cells and different modules. As showed in Fig. 1, turquoise module showed the highest correlation coefficient with CD8 T cells (cor = 0.54), TFH (cor = 0.43), cytotoxic cells (cor = 0.75) and DC (cor = 0.55) simultaneously. Additionally, this module had marginally significant and weak correlation with the survival time compared to other modules. Taking together, this module was regarded as the key module and genes within the module were the most relevant to tumor prognosis. We then extracted all genes in this module for next studies.
Identify IRMGs and perform enrichment analysis. As shown in Fig. 2a, a total of 259 shared genes of IRGs and MGs were identified and considered as immune-related module genes (IRMGs). The top 20 enriched terms by the online tool Metascape were listed in Fig. 2b. The highest-levels of enriched clusters in the four categories were cytokine-cytokine receptor interaction in KEGG pathway, lymphocyte activation in biological process of GO, Adaptive Immune System in Reactome Gene Sets, and PID IL12 2PATHWAY (IL12-mediated signaling events) in Canonical pathways.
PIRMGs identification, mutation analysis and TF regulatory network construction. 376 samples were randomly and equally divided into training and testing cohorts. We then performed univariate Cox regression analysis on the IRMGs in training cohort and identified 12 IRMGs related to the prognosis of BCa (Fig. 3a). Subsequently, to clarify the molecular characteristics of the PIRMGs, we used the online tool cBio-  www.nature.com/scientificreports/ portal to perform the gene alternations analysis and found that the most common types were amplification and mutation (Fig. 3b). We then distinguished the TFs from the turquoise module and constructed a regulatory network based on the 12 PIRMGs and 15 module-related TFs (Fig. 3c,d).
Develop a signature in the training cohort. The prognostic signature was developed with five PIRMGs, namely FPR1, CIITA, KLRC1, TNFRSF6B, and WFIKKN1 (

WFIKKN1.
Next, 188 BCa samples in the training cohort were divided into high-and low-risk groups according to the median risk score of 1.018. Figure 4a-c demonstrated the risk score distribution, survival status and the five PIRMGs expression patterns between two groups. To assess the prognostic value of the signature, we performed KM survival analysis and found that patients in high-risk group had worse prognosis compared to those in low-risk group (Fig. 4d). We then produced time-dependent ROC curves to estimate the performance of the

Signature validation in internal and external validation datasets.
Based on the risk score calculator proposed in the training cohort, the risk score of each BCa sample of both internal and external validation datasets were obtained and these samples were further divided into high-and low-risk groups according to the median risk score in training group.
In internal validation dataset, the results were similar to those of the training cohort ( Supplementary Fig. 2). Additionally, the results of the external validation datasets also suggested that BCa patients in high-risk group  www.nature.com/scientificreports/ also suffered from worse prognosis than those in low-risk group (Fig. 5). Moreover, the signature presented stable performance in all cohorts.
In conclusion, the five PIRMGs signature can divided BCa patients into two risk-level groups with significant differences in overall prognosis. Table 3, the signature showed independent prognostic prediction ability in three cohorts of TCGA dataset, simultaneously. Besides, age and stage were considered as two independent clinicopathological predictors of BCa.

Independent analysis of the signature and clinicopathological characteristics. As shown in
The results of clinical features relationship analysis demonstrated that patients of high grade, late stage, MIBC subtype, and older age had significant higher risk scores (Fig. 6a).
Stratified analysis was further carried out between the signature and age and stage. And the results showed that the signature had significant prognostic value and reliability for BCa patients with the same age and late stage (Fig. 6b-i).
In short, these results suggested the relationship between signature and progression of BCa.
Construction of a nomogram. Based on the five PIRMGs signature and age and stage, we constructed a nomogram to predict the 1-, 3-, and 5-year overall survival of BCa (Fig. 7a). Time-dependent ROC curves showed the adequate discrimination of the nomogram with an AUC of 0.774, 0.724, and 0.709 at 1-, 3-, and 5-year follow up (Fig. 7b). Additionally, we plotted the calibration plots to demonstrated the good predictive effect of this nomogram on overall survival (Fig. 7c).    Supplementary Fig. 3, the full model demonstrated superiority to the based model in 1-, 3-, and 5-year time point. Moreover, we compared the AUC value of the nomogram to those of other clinical variables by ROC curves (Supplementary Fig. 4). In a word, the signature combined with traditional clinical variables has potential clinical utility.

Discussion
In this study, we identified five PIRMGs (FPR1, CIITA, KLRC1, TNFRSF6B, and WFIKKN1) from the turquoise module which showed positive and significant correlations with several immune cells including CD8+ T cells. A prognostic signature was then constructed to divide patients of BCa into two distinct risk groups. Ultimately, a nomogram composing of signature, age and tumor stage was developed to calculate the total score of each BCa and return a quantitative survival probability. Two GEO datasets were obtained to validate the significant predictive power and good performance of the signature and nomogram, respectively. Although there have been several immune-related signatures proposed in literature, we firstly proposed a novel immune cell infiltration abundance-related prognostic signature of BCa through WGCNA.
Though correlation analysis between TFs of the turquoise module and PIRMGs, three key TFs (STAT4, IKZF1 and STAT1) were identified from the network. STAT4 and IKZF1 have been proved as essential factors in immune cell development and immune response 10 . STAT1 can affect the cellular survival and response to pathogens due to its critical roles in gene expression.
Cytotoxic cells are composed of CD8+ T cells, gamma delta T cells (Tγδ) and natural killer (NK) cells, participating in the innate immune system. The definite function of cytotoxic cells is eliminating intracellular pathogens and abnormal cells. Unlike CD8 T cells with strict major histocompatibility complex (MHC) restriction, both Tγδ and NK cells have the ability to recognize and kill infected cells in the absence of antibodies and www.nature.com/scientificreports/ MHC, and secret a large number of cytokines, allowing for a rapid immune response 11 . As a critical role in the adaptive immune system, T helper cells regulate the proliferation of B cells and participate in pathogen clearance, and autoimmunity through specific coordinate effector functions. There are two major subsets of T

helper cells (Th1 and Th2 cells) once the proliferating T helper cells develop into effector T cells. Th1 cells mainly lead to an activated cell-mediated response, while Th2 cells favor a predominantly humoral response 12 . TFH, a subset CD4+ T cells, are found in B cell follicles and germinal centers. TFH play an important role in regulating the selection and survival of B cells that can differentiate into plasma cells and memory B cells. More importantly,
TFH are believed to have the ability to decrease the repertoire of potentially autoimmune-causing mutated B cells within the germinal center. In general, the functions of, and mediation between, the cytokine, immune cells, and immune systems are critical for anti-tumor immunity, and profound understanding in these molecular interactions can promote the breakthroughs in tumor immunotherapies. WFIKKN1, encodes large extracellular multidomain proteins, were mainly explored in cell growth and metabolism 13 , while, the biological behavior of WFIKKN1 was poorly studied in tumors.
As reported in previous studies, the expression of peptide-loaded HLA class I molecule (HLA-E) in tumor cells can negatively mediate the anti-tumor activity of NK cell, by ligation of the NK inhibitory receptor CD94/NKG2A (KLRC1) 14 . This key molecular mechanism in tumor resistance to immune cells was further explored by Kamiya et al. via establishing NKG2Anull and NKG2A + NK cells. And the results showed that NKG2A downregulation was associated with the higher cytotoxicity of NK cell in decreasing HLA-E-expressing tumor cells 15 . The consistent result was also reported in the study by Chen et al. who further found that NKG2A + CD8+ T cells form the predominant subset of NKG2A+ cells in lung cancer tissue but not NK cells and NKG2A blockade could promote anti-tumor immunity by reducing dysfunctional CD8+ T cells 16 . Surprisingly, these findings were contrary to the results of our study in which high expression of NKG2A was associated with favorable prognosis. Further studies are required to explore the role and molecular mechanism of NKG2A in BCa.
The critical role of CIITA in immune response to tumor cells was well established in literature. Mortara et al. reported that MHC class II expression in breast cancer was dependent on CIITA. And they found that CIITAinduced MHC class II expression on tumor cells had the ability of triggering an adaptive and protective immunity by presenting tumor antigen to Th cells, antitumor polarization, and establishment of antitumor immune memory 17 . Accolla et al. proposed an innovative method for construction of optimal anti-tumor vaccine, based on the biological functions of CIITA-induced MHC class II. The reason why traditional tumor-specific MHC-I-bound peptides had limited anti-tumor efficacy, as they analyzed, was that Th cell was inadequate triggered to maintain the proliferation of all the immune effector cells. Considering this drawback, they conducted a in vivo in mice assay and found that CIITA-driven MHC class II expression in tumor cells revealed strong inhabitation of tumor growth 18 . The protective role of CIITA in tumor was also proved by Lee et al. 19 . Based on these findings, novel anti-tumor vaccination protocols will be established and translated in clinics to provide a favorable prognosis for patients with tumors.
FPR1 functions as a key part of the innate immune system mainly expressed in the phagocytic and blood leukocyte cells, and mediates the response to pathogens invasion. Jiang et al. reported that overexpression of FPR1 was associated with drug-resistant BCa and may deteriorate the overall condition of drug-resistant BCa 20 . The biological behaviors of FPR1 were also explored in ovarian cancer 21 , cervical cancer 22 , and lung cancer 23 as a risk factor related to poor prognosis, advanced stage and metastasis. However, Prevete et al. found that FPR1 acted as a tumor suppressor in human gastric cancer by counteract angiogenesis 24 .
TNFRSF6B belongs to the tumor necrosis factor receptor superfamily and acts as inhibiting Fas ligandinduced apoptosis 25 . Upregulated TNFRSF6B was identified in several human cancers including colon and lung cancers, and functioned as predictor of tumor invasion 26 . Tseng and colleagues found that overexpression of TNFRSF6B was also associated with the progression of chronic kidney disease 27 . Zekri et al. studied the differentially expressed genes in metastasis advanced Egyptian BCa and found that TNFRSF6B was downregulated in BCa samples, which was contrary to our findings 28 . This study, however, did not explore the molecular mechanism of TNFRSF6B in depth.
Together, we identified five PIRMGs, including FPR1, CIITA, KLRC1, TNFRSF6B, and WFIKKN1, which might play a vital role in tumorigenesis of BCa and serve as potential targets of immunotherapy. Besides, we developed a prognosis signature based on a series of analyses, which had good prognosis prediction ability in BCa. Ultimately, a stable prognostic predictive nomogram of signature, age, and stage, was developed to predict the 1-, 3-and 5-year survival probability of patient with BCa. Further researches are needed to verify the clinical effectiveness of this nomogram. Figure 8 presents the work flow of our study. Data acquisition. We downloaded the transcriptome data (log2 transformed RSEM normalized count) and clinical data of BCa from the TCGA Hub in the UCSC Xena database (https:// tcga. xenah ubs. net). GSE32548 and GSE32894 were downloaded from GEO database (https:// www. ncbi. nlm. nih. gov/ gds/). The exclusive criteria of BCa samples were: (1) BCa with follow up < 90 days and, (2) BCa with missing survival data. Finally, 376, 126 and 224 BCa samples in TCGA and GSE32548 and GSE32894 datasets were obtained, respectively. Samples of TCGA dataset were divided into training and testing cohorts. The training cohort was used for signature construction, and the testing cohort and entire TCGA dataset were used for internal validations. Tow GEO datasets were used for external validations. Ethics approval statement is not needed because the BCa samples were obtained from the public databases.  29 . 318 transcription factors (TFs) were downloaded from the Cancer database (http:// cistr ome. org) 30 for constructing a regulatory network.

Infiltration levels of 24 immune cells in BCa samples.
Firstly, we quantified the infiltration abundances of the 24 immune cells (reported in previous studies) in BCa samples by ssGSEA 31,32 . After the infiltration levels and the survival data of the BCa samples were obtained and merged, univariate Cox regression analysis www.nature.com/scientificreports/ and Kaplan-Meier survival analysis were performed to screen the shared prognostic immune cells. KM curves were produced to illustrate the results of survival analyses. Then, with an attempt to further clarify the possible association between the shared prognostic cells and tumor progression, we analyzed the relationship between clinicopathological factors (age, grade, stage and subtypes) and these cells.

Construction of a weighted co-expression network.
The input data files of WGCNA were: (1) transcriptome data of the 376 BCa samples of TCGA dataset and (2) the phenotype matrix which consisted of the survival data and the infiltration levels of the identified prognostic immune cells.
Firstly, the variability of gene expressions across the 376 samples was measured by a robust method called median absolute deviation (MAD). And the top 5000 MAD genes were identified for following analysis. A sample dendrogram with phenotype heatmap was constructed.
Subsequently, we calculated the best soft-thresholding, named β, by Soft Threshold function. A weighted adjacency matrix was then constructed based on the β value. And the adjacency matrix was further transformed into a topological overlapping matrix (TOM). Finally, genes with similar expressions were clustered into one module called co-expression module. p-values and correlation coefficients were calculated to identify the association between a co-expression module and the phenotype. Finally, the key module with significant correlation with phenotype was identified, and all genes within the module (module genes, MGs) were extracted for next studies.
Identification of immune-related module genes (IRMGs). IRMGs were considered as the shared genes of MGs and IRGs. A Venn plot by "VennDiagram" package in R was generated to show the analytic results 33 . To reveal the functional mechanism of the IRMGs, we then performed enrichment analysis by online tool Metascape (https:// metas cape. org/) which is designed to provide a comprehensive gene list annotation and analysis resource for researchers 34 .

Identification of prognostic IRMGs (PIRMGs).
The 376 BCa samples of TCGA dataset were randomly divided into training cohort and testing cohort in the ratio of 1:1. Then, we performed univariate Cox regression analysis in the training cohort to screen the IRMGs associated with prognosis of BCa, and IRMGs with p-values lower than 0.05 were considered as PIRMGs. The online tool cBioportal (http:// www. cbiop ortal. org/) was then used to analyze the alterations of the PIRMGs 35 .
Next, we identified TFs in the key module by "VennDiagram" package and analyzed their correlations with PIRMGs by Pearson correlation coefficient. We further selected significant correlations under the cutoff values: correlation score > 0.4 and p-value < 0.001. A regulatory network of the significant correlations was ultimately constructed to further explore the molecular mechanisms of these PIRMGs. Software Cytoscape 3.8.0 was employed to visualize the network 36 .

Development of a prognostic signature. A prognostic signature was developed by Least Absolute
Shrinkage and Selector Operation (LASSO) analysis and multivariate Cox regression analysis. Risk score of each BCa was calculated using the formula: Risk score = ExpGene 1 × CoefGene 1 + ExpGene 2 × CoefGene 2 + … ExpGene (n) × CoefGene (n) . In this equation, "ExpGene" represented gene expression and "CoefGene" was the regression coefficient.
Then, BCa samples in training cohort were divided into high-and low-risk groups depending on the median value of the risk score. KM survival curve by log-rank test and time-dependent receiver operating characteristic (ROC) curve were produced to assess the prognostic ability and performance of the signature. Further, the stability and reliability of the signature was verified in the internal and external validation group by the same methods.
Furthermore, the independent prognostic ability of signature was assumed by univariate and multivariate Cox regression analysis. Stratified analysis was also employed to assess the performance of signature in the same clinicopathological characteristics.

Development and evaluation of a nomogram.
With an attempt to predict the 1-, 3-and 5-year survival probability of each BCa patient, we constructed a nomogram based on the risk score and independent clinicopathological characteristics. The performance of the nomogram was validated using time-dependent ROC curves and calibration plots.
Additionally, to further asses the clinical utility of the signature for prognosis, we compared a full model including risk score and clinical variables (age, stage and grade) to a base model including only clinical variables through time-dependent ROC curves. AUC represents the performance of each model at specific time point.

Data availability
The data underlying this study are freely available from the TCGA Hub at Xena datasets (https:// tcga. xenah ubs. net) and the GEO database with accession number of GSE32548 and GSE32894 (http:// www. ncbi. nlm. nih. gov/ geo/).

Funding
This work was supported by Chongqing Science and Technology Commission (cstc2015shmszx120067).