A novel immune prognostic index for stratification of high-risk patients with early breast cancer

The prognostic value of current multigene assays for breast cancer is limited to hormone receptor-positive, human epidermal growth factor receptor 2-negative early breast cancer. Despite the prognostic significance of immune response-related genes in breast cancer, immune gene signatures have not been incorporated into most multigene assays. Here, using public gene expression microarray datasets, we classified breast cancer patients into three risk groups according to clinical risk and proliferation risk. We then developed the immune prognostic index based on expression of five immune response-related genes (TRAT1, IL2RB, CTLA4, IGHM and IL21R) and lymph node status to predict the risk of recurrence in the clinical and proliferation high-risk (CPH) group. The 10-year probability of disease-free survival (DFS) or distant metastasis-free survival (DMFS) of patients classified as high risk according to the immune prognostic index was significantly lower than those of patients classified as intermediate or low risk. Multivariate analysis revealed that the index is an independent prognostic factor for DFS or DMFS. Moreover, the C-index revealed that it is superior to clinicopathological variables for predicting prognosis. Its prognostic significance was also validated in independent datasets. The immune prognostic index identified low-risk patients among patients classified as CPH, regardless of the molecular subtype of breast cancer, and may overcome the limitations of current multigene assays.

The presence or high expression of immune-related genes is associated with favorable outcomes for patients with HR−/HER2+ or triple-negative breast cancer (TNBC, HR−/HER2−) [16][17][18][19][20][21][22] . Several molecular predictors of recurrence of HR− breast cancer based on prognostic immune gene signatures have been reported 16,17,20,23 . However, the prognostic or predictive significance of immune gene signatures with respect to HR+ breast cancer remains unclear. Of note, previous studies show that immune gene signatures can be prognostic for ER+ breast cancer. Schmidt et al. showed that the B cell metagene is independently associated with reduced risk of metastasis of lymph node-negative (LN−) breast cancer with high proliferative activity 24 . Similarly, we previously developed a novel prognostic model for LN− breast cancer based on the combination of proliferation-related genes and immunity-related genes 25 . Importantly, the study revealed that high proliferative activity is associated with an increased immune response in those with ER− or ER+ breast cancer, and that the positive prognostic value of immune response genes was true for LN− breast cancer, regardless of ER status. These results suggest that immune responses have a positive effect on the clinical outcome of fast-proliferating early breast cancer, regardless of ER status.
Despite the known prognostic or predictive significance of immune gene signatures in breast cancer, immune response-related genes have not been incorporated into commonly used commercial multigene assays. Recently, a multigene prognostic assay called Geneswell Breast Cancer Test (BCT), which includes the immune responserelated BTN3A2 (immunoglobulin superfamily related to T cell immune reaction), was developed and validated as prognostic for HR+/HER2− early breast cancer 26 . However, to date, there is no immune gene signature-based commercial assay that is prognostic for both HR+ and HR− breast cancer. Based on a previous study in which we showed a significant association between immune response-related genes and favorable outcomes in patients with highly proliferating tumors, regardless of ER status, we aimed to develop the immune gene-based prognostic model to further predict the risk of recurrence of high-risk early breast cancer, regardless of molecular subtype. Here, we first classified patients with four molecular subtypes into three risk groups according to clinical risk and proliferation risk. We then developed a novel prognostic model based on combined expression of five immune response-related genes and LN status and used it to predict the risk of recurrence in high-risk patients with early breast cancer.
Patients with each subtype of breast cancer were first classified into four risk groups according to clinical risk and proliferation risk: (1) clinical high-risk and proliferation high-risk; (2) clinical high-risk and proliferation low-risk; (3) clinical low-risk and proliferation high-risk; and (4) clinical low-risk and proliferation low-risk groups. The four risk groups (assigned according to the clinical risk and proliferation risk) for each molecular subtype of breast cancer were further grouped into three risk groups based on Cox analysis outcomes: a clinical and proliferation high-risk (CPH) group; a clinical and proliferation intermediate-risk (CPI) group; and a clinical and proliferation low-risk (CPL) group. Patients with the HR+/HER2− subtype were assigned to the three risk groups, whereas patients in the HR+/HER2+ ubtype and TNBC subtype were assigned to only two risk groups www.nature.com/scientificreports/ (CPH and CPI); all HR−/HER2+ patients were assigned to the CPH group (Fig. 1a). The probability of diseasefree survival (DFS) or distant metastasis-free survival (DMFS) of patients classified into the CPH group were significantly lower than those of patients classified into the CPI or CPL groups (Fig. 1b). Moreover, the association between 110 immune response related-genes and the clinical outcomes of the CPH, CPI, and CPL groups were assessed with respect to each molecular subtype. In all CPH patients within each molecular subtype, expression of immune response-related genes was associated with favorable clinical outcomes. However, no significant immune response-related genes were associated with a favorable clinical outcome in the CPI and CPL groups. Based on these results, the CPH group was used to develop the novel immune genebased prognostic model. 61.4% of patients in the CPH group were aged > 50 years and 286 (74.1%) patients had LN-tumors (Supplementary Table S1).
Prognostic value of the immune prognostic index in the clinical and proliferation high-risk group. The immune prognostic index was developed based on the expression of the top five immune response-related genes ( Table 2) in combination with LN status to predict a recurrence in the CPH group. To assess the prognostic value of the immune prognostic index in the CPH group, the 386 patients were stratified into three risk groups (low, intermediate, and high) using the optimal cutoff points determined by maximally selected statistics. Kaplan-Meier curves revealed a significant difference in DFS or DMFS between the groups categorized according to the immune prognostic index. The survival rates of patients in the high-risk (hazard ratio, 5.77; 95% confidence interval [CI], 3.40-9.80) and intermediate-risk (hazard ratio, 2.42; 95% CI, 1.52-3.86) groups were significantly lower than those of patients in the low-risk group (P < 0.001) (Fig. 2a). The 10-year DFS or DMFS rates for patients in the low-, intermediate-and high-risk groups were 73.4%, 51.3%, and 14.1%, respectively. Moreover, the immune prognostic index was significant for DFS or DMFS of patients with any of the four molecular subtypes of breast cancer (P = 0.007 for HR+/HER2−; P < 0.001 for HR+ HER2+ and TNBC; P = 0.002 for HR−/HER2+) (Fig. 2b).
Next, we analyzed the association between clinical variables, the immune prognostic index, and clinical outcomes. Univariate analysis revealed that the immune prognostic index (as a risk group or continuous variable) and LN status were significantly associated with risk of recurrence or distant metastasis (P < 0.001) ( Table 3). Importantly, the immune prognostic index retained its statistical significance in multivariate analysis, indicating that it was an independent prognostic factor in the CPH group ( Table 3).
The C-index was used to compare the prognostic performance of the immune prognostic index compared with that of clinical/gene variables used to develop prognostic model. As shown in Fig. 3a, the immune prognostic index had the highest C-index (0.75). These results illustrate that the immune prognostic index (based on the combination of expression of five prognostic immune genes and LN status) is superior to other clinical/gene variables used to predict recurrence or distant metastasis in those with high-risk early breast cancer.
Validation of the immune prognostic index using independent datasets. Next, we validated the prognostic significance of the immune prognostic index using independent datasets (GSE17705 and Molecular Taxonomy of Breast Cancer International Consortium [METABRIC] datasets). Similar to discovery dataset, a higher percentage of patients with LN− tumors than those with LN+ tumors was included in these validation datasets (Supplementary Table S1). Patients in the CPH group were stratified into two risk groups according to the optimal cutoff of the immune prognostic index. In the GSE17705 dataset, there was a significant difference in the DMFS of the intermediate-and high-risk groups. Patients classified as high risk had a significantly lower probability of DMFS than patients in the intermediate-risk group (P = 0.039) (Fig. 4a). Furthermore, multivariate analysis revealed that the immune prognostic index (as a continuous variable) is an independent prognostic factor (hazard ratio, 1.40; 95% CI, 1.21-1.61; P < 0.001) ( Table 4). The C-index was highest for the immune prognostic index (0.66), indicating that the index is better than clinical variables for predicting prognosis (Fig. 3b).
Similar results were observed for the METABRIC dataset. The 341 patients with stage I and II breast cancer were classified into low-or high-risk groups according to the immune prognostic index. However, because two immune response-related genes (IGHM and IL2RB) were not included in the METABRIC dataset, only three genes (TRAT1, IL21R, and CTLA4) were used to calculate the immune prognostic index. Survival analysis revealed that the immune prognostic index was prognostic for overall survival (OS) (P < 0.001) (Fig. 4b). It retained prognostic significance in multivariate analysis (P = 0.004 as a continuous variable; P < 0.001 as a risk group) ( Table 4). Subgroup analysis of the METABRIC dataset according to molecular subtype showed that the immune prognostic index was prognostic for HR+/HER2+, HR−/HER2+, and TNBC (P < 0.05), but only not prognostic for HR+/HER2− (P = 0.24) breast cancer ( Supplementary Fig. S2).
Although the immune prognostic index was developed to predict the risk of recurrence in patients with early breast cancer who were not treated with adjuvant chemotherapy, we also tested its ability to predict the prognosis in the datasets including chemotherapy-treated patients (in datasets GSE3494, GSE21653 and GSE42568). We found a significant difference in OS (P = 0.027 for GSE3494) or DFS (P = 0.007 for GSE21653 and 42568) between the two risk groups (Supplementary Fig. S3a). Indeed, the index (as continuous variable) was an independent prognostic factor (Supplementary Table S2). The C-index of the immune prognostic index (0.80) was higher than that of other clinical/gene variables ( Supplementary Fig. S3b). www.nature.com/scientificreports/ and multivariate analysis showed that the immune prognostic index was an independent prognostic factor, while two immune gene signatures were not significant (Supplementary Table S3). The immune prognostic index also had a higher C-index than those of other signatures in discovery and METABRIC dataset, respectively (Supplementary Fig. S4).

Discussion
Here, we developed and validated a novel immune gene-based prognostic index to predict recurrence or distant metastasis in high-risk patients with early breast cancer. First, we identified the most significant immune response-related genes associated with clinical outcome in a pre-specified CPH subgroup and then developed a novel immune prognostic index for this subgroup based on a combination of five immune response-related genes and LN status. When patients in the CPH group were classified into three risk groups according to the immune prognostic index, we found that the probability of DFS or DMFS of patients classified as intermediate or low risk were significantly higher than those of patients in the high-risk group. Multivariate analysis identified the immune prognostic index (as a continuous or categorical variable) as an independent prognostic factor. The prognostic value of the immune prognostic index was also validated in independent datasets. These results demonstrate that the immune gene-based model is prognostic for recurrence in the high-risk subgroup of all subtypes of breast cancer including HR+ and HR− breast cancer. Given that the prognostic significance of immune gene signatures was demonstrated mainly for HR-breast cancer, it is important to ascertain whether immune genes may be prognostic for highly proliferating HR+ breast cancers. The immune prognostic index identified low-risk or intermediate-risk patients within the high-risk group; these patients may not benefit from adjuvant chemotherapy. These results are in line with those of our previous study showing the positive association of immune response-related genes with clinical outcomes in highly proliferating tumors, regardless of ER status 25 . Moreover, our findings are supported by a recent study, showing that 17 immune gene signatures are prognostic for DMFS only in patients with ER-and highly proliferating breast cancers 20 . Importantly, comparative analysis of prognostic performance of our model and other immune gene signatures suggests that our immune prognostic index may be superior to other immune gene signatures including B cell response genes and genes related to immune/inflammatory cytokine regulation in predicting the prognosis of early breast cancer.
The five genes (TRAT1, IL21R, IGHM, CTLA4, and IL2RB) used in the immune prognostic index play roles in immune responses by regulating the function of T cells, B cells, natural killer (NK) cells, or interleukin signaling pathways. Here, we found that high expression of these immune response-related genes was associated with a favorable prognosis in the high-risk subgroup of patients with early breast cancer. These results are consistent with previous studies showing that IGHM (immunoglobulin heavy constant mu) gene expression correlates with a better prognosis for TNBC 28 , that immunostimulatory cytokine IL2 (interleukin 2) signaling through interaction with its receptor IL2RB (interleukin 2 receptor subunit beta) enhances the anti-tumor effects of NK cells 29 , and TRAT1 (T cell receptor associated transmembrane adaptor 1) is positivity associated with survival of melanoma patients 30 . By contrast, IL21 (interleukin 21) and IL21R (interleukin 21 receptor) play a role in promoting migration and invasion of breast cancer 31 . Another study shows that higher expression of IL21R in patients with primary HER2+ breast cancer is associated with positive effects of trastuzumab with respect to DFS, suggesting a possible role of IL21R as predictive marker for anti-HER2 32 . The study also showed that IL21R expression by CD8+ T cells is required for antitumor immune response of anti-ErbB2 antibodies against HER2+ tumors; also, IL21 signaling via IL21R may increase trastuzumab efficacy. Importantly, CTLA-4 (cytotoxic T lymphocyte antigen 4) suppresses activation of cytotoxic T cells, thereby contributing to the evasion of anti-tumor immune responses 33 . Higher expression of CTLA-4 mRNA levels is associated with advanced stage and axillary LN metastasis in those with breast cancer 34 . Given the known role of CTLA-4, we were surprised to find that its expression was associated with a favorable prognosis. This may be due to CTLA-4 expression by lymphocytes. Yu et al. showed that CTLA-4 expression by lymphocytes is associated with a favorable prognosis, whereas its expression by tumor cells is related to a poor prognosis 35 . However, as the prognostic significance and role of these five genes in breast cancer are largely unknown, it is notable that this study suggests that their expression is associated with a favorable prognosis in the high-risk subgroup with early breast cancer.
Current multigene assays are based on expression of proliferation-related genes, and their prognostic significance is limited to HR+/HER2− breast cancer. Proliferation-based gene signatures are strongly prognostic for ER+/HER2− breast cancer, but less so for other subtypes of breast cancer 15 . Our immune gene-based prognostic www.nature.com/scientificreports/ www.nature.com/scientificreports/ index may overcome the limitations of these current multigene assays and can be used alongside them to improve prognosis. However, there are some limitations. Owing to the lack of complete clinical information held in public microarray datasets, it was difficult to include a sufficient number of patients in the discovery and validation datasets. Also, there was a discrepancy between the discovery and validation datasets with respect to the optimal cutoff for classifying patients into risk groups. Moreover, only three genes (TRAT1, IL21R, and CTLA4) in the METABRIC dataset were used to calculate the immune prognostic index because two immune response-related genes (IGHM and IL2RB) were not included in this dataset. Finally, because the immune prognostic index was validated using public microarray datasets, its prognostic ability should be further validated by independent studies using samples derived from patient tissue. We developed a novel prognostic model based on a combination of five immune response-related genes and LN status and used it to predict the risk of recurrence in a CPH group of patients with early breast cancer. The immune prognostic index identified low-risk patients among clinically high-risk patients with highly proliferating tumors belonging to all subtypes of breast cancer. Moreover, the immune prognostic index was an independent prognostic factor, with performance superior to that of clinical variables used to predict the risk of recurrence. Its prognostic significance was also validated using independent datasets. Thus, the immune prognostic index may be used to provide additional prognostic information and to support current multigene assays used to identify low-risk patients who do not require adjuvant chemotherapy for early breast cancer, regardless of subtype.

Methods
Public microarray data mining and analysis. Affymetrix microarray datasets of breast cancer, including clinical information such as molecular subtype, clinicopathological variables, treatments, and patient survival information were downloaded from the GEO (http://www.ncbi.nlm.nih.gov/geo). Five GEO datasets (GSE4922, GSE6532, GSE7390, GSE11121, and GSE31519) based on the Affymetrix HG-U133A (GPL96 platform) were pooled to generate a discovery dataset (Supplementary Table S4). Among the 1327 patients from five GEO datasets, those with early breast cancer who were not treated with adjuvant chemotherapy (treated with adjuvant hormone therapy alone or no adjuvant therapy) were included in the discovery dataset. Breast cancer was classified into four molecular subtypes according to ER or progesterone receptor (PR) and HER2 status: HR+/HER2− (ER+ or PR+ /HER2−), HR+/HER2+ (ER+ or PR+/HER2+), HR−/HER2+ (ER−/PR−/HER2+) breast cancer, and TNBC (ER−/PR−/HER2−). If datasets did not contain information about ER/PR and HER2 status, expression levels of the corresponding gene encoding each marker was used as described previously 36 . The downloaded datasets were log 2 normalized. To reduce bias, genes with low variance were filtered out (interquartile range ≤ 0.5). Batch effects were removed using ComBat algorithm to reduce non-biological variations 37 . Table 3. Univariate and multivariate analyses of the immune prognostic index and clinicopathological variables in the discovery dataset. CI, confidence interval; LN, lymph node. P values < 0.05 are marked in bold. www.nature.com/scientificreports/ www.nature.com/scientificreports/ In addition, independent GEO datasets based on Affymetrix HG-U133A (GPL96 platform) or HG-U133_ Plus_2 (GPL570 platform), and Illumina bead-based METABRIC dataset were obtained; these were used as validation datasets (Supplementary Table S4). The METABRIC dataset was downloaded from the cBioPortal website (http://www.cbiop ortal .org).
Identification of candidate prognostic proliferation and immune response-related genes in breast cancer. Candidate genes prognostic for breast cancer were identified using Cox regression analysis and gene pathway analysis of the discovery dataset. The top most significant genes with adjusted P values < 0.01 were selected by Cox regression analysis; these were annotated using DAVID bioinformatics resources 38,39 and the gene annotation 'topGO' package in R (https ://bioco nduct or.org/packa ges/relea se/bioc/html/topGO .html). In addition, a list of candidate prognostic proliferation-related genes 18,25 and immune response-related genes 22,40 was compiled from previous studies. Based on the data analysis and literature search, 37 proliferation-related genes and 110 immune response-related genes were selected as candidate prognostic genes (Supplementary Table S5). Most of immune response-related genes displayed a higher expression in patients with favorable prognosis compared with those with poor prognosis. Higher expression of immune response-related genes showed a tendency for being associated with favorable prognosis (Supplementary Table S6), whereas high expression of proliferation-related genes was related to poor clinical outcome.

Stratification of patients according to clinical risk and proliferation risk. Clinical risk was
assessed based on histologic grade, tumor size, and LN status using a modified version of Adjuvant! Online, as described previously 8 . Patients with higher histologic grade, larger tumor size, and positive LN status were classified as clinical high risk. To assess proliferation risk, prognostic proliferation-related genes were selected from the 37 candidate genes in the discovery dataset using multivariate analysis. Multivariate analysis of the discovery dataset identified ten proliferation-related genes (BUB1B, RRM2, KIF18B, PTTG1, MELK, CDK1, FOXM1, TRIP13, RACGAP1, and KIFC1) as independent prognostic indicators of DFS or DMFS (Supplementary Fig. S5). Based on expression of these ten proliferation-related genes, patients were classified into high-or low-risk groups. If expression of five or more genes was greater than the median expression level of all ten genes, www.nature.com/scientificreports/ the patient was assigned to the proliferation high-risk group; otherwise, the patient was assigned to the proliferation low-risk group.
Development of a prognostic model based on expression of immune response-related genes and LN status. Lasso regression analysis was performed to identify the most significant immune responserelated genes significantly associated with DFS or DMFS. The top five immune response-related genes (TRAT1, IL21R, IGHM, CTLA4, and IL2RB) with the lowest lasso regression coefficient (< − 0.05) ( Table 2) were selected. Multivariate analysis of clinical variables (age, tumor size, histologic grade, and LN status) revealed that LN status was the most significant independent prognostic factor (data not shown). Therefore, expression of five immune response-related genes in combination with LN status was used to develop a prognostic score, referred to as the immune prognostic index, to predict a recurrence in the clinical and proliferation high-risk group. The coefficient values for each variable were calculated using Cox regression analysis, and the immune prognostic index was defined as a linear combination of these coefficients, which was used to predict the recurrence: www.nature.com/scientificreports/ where LN status is 0 (LN−) or 1 (LN+). A higher value for this index indicates a higher risk of recurrence or distant metastasis. The optimal cutoff for risk classification was determined using the maximally selected rank statistics in the 'survminer' R package 41 .
Statistical analyses. DFS was defined as the time from the date of surgery to the date of relapse, including locoregional recurrence and distant metastasis. DMFS was defined as the time from the date of surgery for the primary tumor to the date of distant metastasis. OS was defined as the time from the date of surgery to the date of death. Univariate and multivariate analyses using Cox's proportional hazard regression models were used to assess the association between clinical/gene variables and patient survival. All hazard ratios are reported with 95% CIs. The probability of DFS, DMFS, and OS was estimated using the Kaplan-Meier method and statistical differences in survival rates between groups were assessed using the log-rank test. Lasso regression analysis to select the prognostic immune response-related genes was done using the 'coxnet' package in R (https ://cran.r-proje ct.org/web/packa ges/glmne t/vigne ttes/Coxne t.pdf). Lasso regression analysis performs regularization and feature selection by penalizing the coefficients of the input variables using optimal λ as a tuning parameter 42 . The prognostic performance of the immune prognostic index was compared with that of clinical/gene variables or other immune gene signatures using Harrell's concordance index (C-index) 43 . R package 'survcomp' was used to calculate the C-index. P values < 0.05 were considered statistically significant. All statistical analyses were conducted using R 3.4.3 (http://r-proje ct.org).

Data availability
The datasets generated during and/or analysed during the current study are available in the GEO website (http:// www.ncbi.nlm.nih.gov/geo) and cBioPortal website (http://www.cbiop ortal .org).