Late recurrence of breast cancer is associated with pro-cancerous immune microenvironment in the primary tumor

The fact that 20–40% of all breast cancer (BC) patients develop recurrence when 5 year survival is 90% strongly suggests that late recurrence, i.e. more than 5 years after diagnosis, is the remaining challenge to decrease the absolute number of BC deaths. Better understanding late recurrence is an essential first step to address this issue. We hypothesized that primary tumors with a distinctive tumor immune microenvironment will develop late recurrence. Accordingly, we evaluated the relationship between the timing of cancer recurrence, clinical factors, gene expression profiles, and immune status utilizing two published large cohorts. 308 primary BCs in TCGA were analyzed and categorized as: recurrence ≤2 years (Early, n = 49), between 2–5 years (Mid, n = 54), recurrence >5 years (Late, n = 20), and no recurrence >5 years (Survivors, n = 185). 1,727 primary BCs in METABRIC were analyzed and categorized similarly: Early, n = 170; distant (D), n = 19; local (L), Mid, n = 213; D, n = 21; L, Late, n = 199; D, n = 57, L, and Survivors, n = 1048. Utilizing pre-ranked GSEA, we showed that primary tumors with Survivors were associated with anti-cancer signaling such as INF-α/-γ response and TNF-α signaling, compared with all recurrence groups in pre-ranked GSEA. Furtherrmore, we found that host defense immunity (leukocyte fraction, lymphocyte infiltration, and macrophage fractions) was decreased in primary tumors with Late recurrence compared with Survivors. Utilizing the CIBERSORT algorithm, we showed anti-cancer lymphocytes, memory CD4+ T cells and γδT cells, were significantly lower, and pro-cancerous regulatory T cells were significantly higher in Late tumors compared with Survivors. In agreement, cytolytic activity score that assesses immune cell cytolytic activity was significantly lower in Late compared with Survivors. We demonstrated that not only host defense immunity, but also pro-cancerous immune cells and immune cell cytolytic activity in primary BC was associated with late recurrence.

appropriately guide management 5 . Roughly 20-40% of estrogen receptor (ER) + BC patients eventually develop distant metastasis, and half of these events occur five or more years after diagnosis of the primary tumor 6 . This is in sharp contrast to ER-negative tumors, for which the recurrence rate peaks at around two years, but the rate diminishes after five years 7 . There have been attempts to utilize multi-parametric molecular assays, such as IHC4, OncotypeDX, EndoPredict, PAM50 risk of recurrence score, and Breast Cancer Index, to predict late recurrence in addition to early recurrence (relapse less than five years after initial treatment) 1,8 . However, many of these markers are not specifically tailored to predict late recurrence, as some are reportedly predictive of not only early but also late recurrence. While gene expression signatures that are retrospectively associated with late recurrences have recently been identified by comparing the gene expression profiles of primary tumors of early vs. late recurrences 6 , or using dormant cancer cells in experimental systems 9 , it remains to be determined whether these signatures can prospectively predict late recurrence.
Given the limitations described above, accurately risk-stratifying primary tumors as to their propensity for late recurrence remains a major clinical challenge in BC. Tumor infiltrating lymphocytes (TILs) are immune cells that have migrated to the tumor tissue and the local microenvironment 10 . The presence of TILs in tumor tissue is a result of the immune response generated by the patient against the malignancy. Recently, evidence has emerged demonstrating the importance of TILs in breast cancer as follows: the presence of TILs has been shown to correlate with a good prognosis and higher rates of pathological complete response to neoadjuvant chemotherapy 10 . Host factors are suggested to influence the timing of cancer recurrence since the processes and factors that have been implicated in dormancy include angiogenesis 11,12 , immune-surveillance [13][14][15] , and a wide variety of microenvironment cues such as extracellular matrix, growth factors and cytokines. Therefore, TILs may also be greatly involved in the timing of breast cancer recurrence.
We hypothesize that the host's immune status may be closely related to the timing of cancer recurrence. We examined the relationship between the timing of cancer recurrence and clinical factors, gene expression profiles, and immune status utilizing collected data from The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) primary BC cohorts.

Materials and Methods
Data acquisition. TCGA was supervised by the National Cancer Institute (NCI) and the National Human Genome Research Institute 16 . The gene expression levels (mRNA expression z-score from RNA-sequence) from Genomic Identification of Significant Targets in Cancer for TCGA cohort was downloaded through cBioportal (TCGA provisional dataset) 17,18 . The values of "progression free survival (PFI)" and "PFI time" were obtained from (Liu et al., 2018 dataset) 19 . We defined timing of cancer recurrence as Early; recurrence ≤2 years, Mid; recurrence between 2-5 years, Late; recurrence >5 years, and Survivors; no recurrence >5 years. In the TCGA BC cohort, out of 934 primary BC patients, 308 women, excluding 626 women without relapse but not followed for 5 years, were analyzed. Out of a total of 308 women with recurrence or follow up data in the TCGA BC cohort, one hundred and twenty-three (39.9%) BC patients developed recurrent tumors, 49 Early, 54 Mid, 20 Late, and 185 BC patients were Survivors. The Nottingham Grade was calculated based on tubule formation, nuclear pleomorphism, and mitotic count, which were obtained from the TIE database containing pathology reports of the TCGA BC cohort patients. The gene expression levels (mRNA expression z-score from microarray) from METABRIC cohort was downloaded through cBioportal (METABRIC Nature 2012 & Nat Commun 2016 dataset). The values of relapse status (distant and local) and their relapse time were used as obtained from (Rueda et al., 2019 dataset) 20 . Out of 1,904 primary BC patients in METABRIC, 1,727 primary BC were used for distant and local recurrence analysis except for 274 women without distant and local recurrence but not followed for 5 years and 1,410 primary BC were used for breast cancer specific death (BSD) analysis except for 494 women alive but not followed for 10 years. They were used to support the authenticity of the association between timing of cancer recurrence and gene expression and TILs 21,22  Statistical analyses of RNA expression and loneliness. The analysis followed a two-step process.
First, we calculated the fold changes of genes, corresponding to each timeframe of cancer recurrence (whole, Early, Mid, and Late), which provided a list of t-scores and corresponding p-values for each timeframe of cancer recurrence in relation to each of the gene expression values. Second, gene set enrichment analysis was performed in Gene Set Enrichment Analyses (GSEA) Pre-ranked using these collections of gene sets from the Hallmarks gene sets using software provided by the Broad Institute (http://software.broadinstitute.org/gsea/index.jsp). We only considered gene sets significantly enriched that met a threshold of normalized enrichment score (NES) >1.5 or <−1.5 and false discovery rate (FDR) q-value < 0.01. Immune characteristics analysis. We used a previously developed dataset 23 to examine the association between timing of cancer recurrence and immune characteristics (intratumoral immune states, antigen-specific T cell receptor (TCR) and B cell receptor (BCR) repertoires, and immune subtypes). These previously defined "intratumoral immune states" were characterized using scores of 160 immune expression signatures and cluster analysis to identify modules of immune signature sets. "Immune subtypes" were defined as follows: C1 (wound healing) had elevated expression of angiogenic genes, a high proliferation rate, and a Th2 cell bias to the adaptive immune infiltrate, which was related with luminal A BC. C2 (IFN-γ dominant) had the highest M1/M2 macrophage polarization, a strong CD8 signal and, together with C6, the greatest TCR diversity. C2 also showed a high proliferation rate, which may override an evolving type I immune response, and was comprised of highly mutated BC. C3 (inflammatory) was defined by elevated Th17 and Th1 genes, low to moderate tumor cell proliferation, and, along with C5, lower levels of aneuploidy and overall somatic copy number alterations than the other subtypes. C4 (lymphocyte depleted) displayed a more prominent macrophage signature with Th1 suppressed and a high M2 response. C5 (immunologically quiet) exhibited the lowest lymphocyte and highest macrophage responses, dominated by M2 macrophages. C6 (TGF-β dominant) displayed the highest TGF-β signature and a high lymphocytic infiltrate with an even distribution of type I and type II T cells.
To evaluate intra-tumor immune cell composition, the relative fraction of 22 immune cell types in tumor tissue was estimated using the CIBERSORT deconvolution algorithm 24 , as described before 25 . These 22 cell fractions were calculated via the online calculator (https://cibersort.stanford.edu/) as previously shown 25 . The immune cytolytic activity (CYT) was defined as the geometric mean of GZMA and PRF1 expression values in Transcripts Per Million (TPM). The gene expression data were obtained in RSEM format from the Genomic Data Common data and converted to TPM by a given gene's estimated fraction of transcripts and multiplying with 10^6 26,27 . CYT was calculated as previously described 25 . Statistical analysis. All statistical analyses were performed using R software (http:///www.r-project.org/) and Bioconductor (http://bioconductor.org/). The chi-square test or Fisher's exact test or the nonparametric Mann-Whitney U test and contingency analysis were used to assess baseline differences between binary variables. The Kruskal-Wallis test was used to assess the relationship between mRNA expression and timing of cancer recurrence. Correlations were calculated using Spearman's rank correlation coefficient. In the analysis of disease free survival (DFS), the Kaplan-Meier method was used to estimate survival rates, and differences between survival curves were evaluated by the log-rank test. Cox's proportional hazards model was used for the univariate and multivariate analysis of prognostic status. Two-sided P values < 0.05 was considered as statistically significant for all tests.

Results
Association between clinical features of the primary tumors and the timing of cancer recurrence. We studied the relationship between clinical features of the primary tumor and the timing of cancer recurrence in TCGA BC cohort (Table 1) and METABRIC cohort (Tables 2 and 3). Compared with Survivors without recurrence, the primary tumor which developed Early recurrence was significantly associated with a larger tumor size (p = 0.0061), lymph node metastasis (p = 0.037), higher Nottingham Grade (p < 0.0001), higher clinical stage (p < 0.0001), negative ER (p = 0.0085), and negative progesterone receptor (PgR) (p = 0.0023) in TCGA BC cohort (Table 1). In addition to all the above mentioned features, positive human epidermal growth receptor 2 (HER2) (p < 0.00001), low frequency of the hormone receptor (HR) + HER2− group (p < 0.00001), no treatment with adjuvant endocrine therapy (p = 0.045), and treatment with adjuvant chemotherapy (p < 0.00001) were associated with Early in distant metastasis analysis of METABRIC cohort. Compared to Survivors, Midterm recurrence was significantly associated with lymph node metastasis (p = 0.00086) and higher clinical stage (p = 0.00093) in TCGA. In METABRIC, Mid was significantly associated with older age (p = 0.0075) and postmenopausal status (p = 0.0077), as well as clinical features significantly associated with the Early group. Interestingly, there was no statistically significant difference in clinical features between Survivors and Late recurrence group in TCGA, whereas, Late was significantly associated with lymph node metastasis (p = 0.000029), positive ER (p = 0.014), high frequency of the HR + HER2− group (p = 0.0017), and treatment with adjuvant endocrine therapy (p = 0.014), compared to Survivors in distant metastasis analysis of METABRIC (Table 2). In the local recurrence analysis of METABRIC cohort, Late was significantly associated with age (p = 0.035), premenopausal status (p = 0.035), positive PgR (p = 0.049), and treatment with radiation therapy (p = 0.021), compared to Survivors. Interestingly, there was no statistically significant difference in clinical characteristics between Early and Late and Survivors in the local recurrence analysis (Table 3). In addition, although we verified the relationship between timing of BSD and clinical features in METABRIC cohort, the results were similar to those of the TCGA BC cohort and the distant metastasis analysis in METABRIC cohort (Table S1). These results indicate that primary tumors that develop Late recurrence, particularly, local recurrence, were not as clinically aggressive as Early and Mid recurrence, and had almost the same features as Survivors.

Tumor immune microenvironment differs by cancer recurrences timeframe.
To assess the tumor immune microenvironment, leukocyte fraction, lymphocyte infiltration, macrophage regulation, antigen-specific TCR and BCR, and previously defined "Immune Subtypes" 23 were compared among the primary tumors by the Primary BCs with cancer recurrence data were analyzed and categorized as follows: recurrence ≤2 years (Early), recurrence between 2-5 years (Mid), recurrence >5 years (Late), and no recurrence >5 years (Survivors). Left panels: In volcano plots, X-axes: log2 FC; Y-axes: −log 10 P-value from limma analysis. mRNAs with P-value < 0.05 and FC >log2(1.5) are marked in red, with P-value < 0.05 and FC <log2(1/1.5) in green, all others in black. Right panels: In pre-ranked GSEA, blue bar shows NES and red dots show -log10 FDR q-value. We only considered gene sets significantly enriched that met a threshold of NES >1.5 or <−1.5 and FDR q-value < 0.01. Abbreviations: BC, breast cancer; GESA, Gene Set Enrichment Analyses; METABRIC, Molecular Taxonomy of Breast Cancer International Consortium; FC, fold change; NES, normalized enrichment score; FDR, false discovery rate.  T and B cells), TGF-β response, IFN-γ response, and wound healing, which robustly reproduced co-clustering of these immune signature sets 23 . Interestingly, both leukocyte fraction and macrophage regulation were significantly lower only in the Late group, whereas lymphocyte infiltration was statistically significantly lower in all the tumors that recurred regardless of timing (Early, Mid, and Late), indicating that weak host defense cancer immunity correlated with recurrence, particularly in Late (Fig. 3A). Antigen-specific TCR and BCR repertoires are critical for the recognition of pathogens and malignant cells and may reflect a robust anti-tumor response comprising a large number of antigen specific adaptive immune cells that have undergone clonal expansion and effector differentiation 23 . We demonstrated the relationship between TCR and BCR repertoires and timing of cancer recurrence in Fig. 3B. Lower TCR diversity was associated with later recurrence (Mid and Late recurrence in Shannon Entropy and all recurrence in Richness), but there was no correlation between BCR repertoire and timing of cancer recurrence. The six resulting clusters "Immune Subtypes", C1-C6, were characterized using a distinct distribution of scores over the above five immune Low CYT in primary tumors was associated with late recurrence in the TCGA BC cohort. In order to verify that low CY T can serve as a predictive biomarker of Late recurrence, we examined the relationship between CYT and the whole cohort and earlier (Early + Mid) and Late recurrence (Fig. 6). Patients with low CYT were marginally associated with worse DFS (p = 0.057), which were tested by the Kaplan-Meier method and verified by the log-rank (Mantel-Cox) test. Next, we examined the relationship between low CYT and DFS by timing of cancer recurrence. CYT was not associated with DFS in Early, but it was significantly associated with worse DFS in Late (p = 0.025). The DFS Cox hazard analysis for timing of cancer recurrence is shown in Table S2. The results showed that low CYT score was a significantly worse prognostic parameter in Late (univariate analysis; hazard ratio (HR): 0.36, 95% confidence interval (CI): 0.14-0.91, p = 0.031, multivariate analysis; HR: 0.29, 95% CI: 0.11-0.76, p = 0.012), but not in Early (univariate analysis; HR: 0.8, 95%CI: 0.83-1.88, p = 0.28, multivariate analysis; HR: 0.7, 95% CI: 0.93-2.16, p = 0.1). Interestingly, in the Late group, clinical factors, such as tumor size, node metastasis, and clinical stage, were not correlated with prognosis. These results indicated that immune cell cytolytic activity was a relevant prognostic factor for late recurrence.

Discussion
As late recurrence in BC remains a challenge despite advances in overall BC survival, studies have focused on efforts to more accurately and reliably predict the risk of late BC recurrence. While prior studies have shown the importance of clinical factors 1-5 , subtypes 6,7 , and gene signatures 1,8 , the relationship between late recurrence and immune status has yet to be demonstrated. Accordingly, we showed that BC patient who develop recurrence earlier (Early and Mid) had primary tumors associated with more aggressive clinical characteristics such as larger tumor, more lymph node metastases, higher pathological grades, higher Stages, and negative ER and PgR, compared to Survivors; however, clinical characteristics of primary tumors with Late recurrence were almost the same as Survivors (Tables 1-3). In addition, we showed that a decrease in host defense immunity, activation of pro-cancerous immune cells and a decrease in immune cell cytolytic activity in BC were closely related to late recurrence by computational biologically analyzing two large primary BC cohorts. This study generated three interesting results with clinical implications. First, primary tumors of Survivors were associated with anti-cancer signaling such as INF-α/-γ response and TNF-α signaling, compared with the recurrence groups (Figs 1 and S1). In addition, in both distant and local recurrence analyses, Survivors correlated with TNF-α signaling via NFκβ compared to the Late group (Fig. 2). These results support the hypothesis that immune system status is implicated in the prevention of BC recurrence 28 . Furthermore, primary tumors with earlier recurrence (Early and Mid) were mainly associated with cell cycle related gene sets and MYC target gene sets involved in BC exacerbation and primary tumors with Late recurrence were associated with estrogen signaling, compared with Survivors, as described previously [1][2][3][4][5][6][7][8] (Figs 1 and S1). Interestingly, in local recurrence, estrogen response gene sets were found to be more predominant than those of distant metastasis. Second, host defense immunity (leukocyte fraction, lymphocyte infiltration, and macrophage fractions) was decreased in primary tumors with Late recurrence compared with Survivors. In addition, primary tumors with Late recurrence were significantly associated with low diversity of TCR and specific "Immune Subtypes", such as, C1 (wound healing) and C2 (IFN-γ dominant) (Fig. 3). To our knowledge, there has been no report that host defense immunity is involved in BC late recurrence. Finally, late recurrence was associated with activation of pro-cancerous immune cells and a decrease in cytolytic activity of immune cells in primary breast tumors. Utilizing the CIBERSORT algorithm, we showed that anti-cancer lymphocytes, memory CD4+ T cells and γδT cells, were significantly lower, and pro-cancerous www.nature.com/scientificreports www.nature.com/scientificreports/ regulatory T cells were significantly higher in Late tumors compared to Survivors (Fig. 4). In agreement, CYT score that assesses immune cell cytolytic activity was significantly lower in primary tumors with Late recurrence compared to Survivors and low CYT score in primary tumors was statistically significantly associated with worse DFS in the Late group (Figs 4 and 6). Interestingly, in local recurrence, there was no statistically significant www.nature.com/scientificreports www.nature.com/scientificreports/ difference between timing of cancer recurrence and Survivors (Fig. 5). It has been reported that BCs are infiltrated with diverse populations of immune system cells and these infiltrates appear to be associated with disease outcome 6 . For example, patients with gene signatures of Th1/CTL phenotype were shown to have favorable outcomes whereas Th2/B-cell related genes were more likely to occur in patients with HR−/HER2− disease 29 . In addition, www.nature.com/scientificreports www.nature.com/scientificreports/ some translational studies in patients with breast carcinoma have suggested that infiltration by pro-cancerous immune cells such as regulatory T cells might have a great response to chemotherapy and might affect the clinical outcome 10 . However, there were no reports as we have shown that pro-cancerous immune cells in tumor tissue may be involved in the timing and type of recurrence of breast cancer.
In general, late recurrence seems to be a reflection of a very slowly proliferation of BC cells dormant in distant sites 6 . The fact that dormant micrometastases stay in distant organs for many years suggests a long evolutionary process of these cells after their departure from the primary tumor. During this time, independent genetic and epigenetic traits may arise and drive the recurrences which will not be present in the original primary tumors 30 . However, we did not access the gene expression and distribution of immune cells in recurrence tumors by timing of cancer recurrence. The methods of assessing immune infiltrates in BC are quite varied and due to these differences individual studies are not comparable to each other. Liquid biopsy, which is a non-invasively conducted genetic test using genes extracted from body fluids such as blood and urine, has been developed as a way of providing relevant predictive information related to the tumor tissue as previously demonstrated 29,[31][32][33][34] . If tumor immune microenvironment can be monitored by liquid biopsy, it is expected to deepen the understanding of the authentic clinical and prognostic value of immune system cells in BC patients.
Although the study demonstrates promising results, it has limitations. First, this is a retrospective study utilizing publicly available datasets, thus it is prone to selection bias. Second, this study is based on the gene expression of the primary tumor in TCGA and METABRIC cohorts, and as it does not include any in vitro or in vivo experiments it also therefore does not delve deeply into the mechanism of our results to further understand the correlations reported.
In conclusion, we demonstrated the relationship between late recurrence and clinical factors, gene expression profiles, and immune status utilizing collected data from TCGA and METABRIC primary BC cohorts. Not only host defense immunity, but also pro-cancerous immune cells and cytolytic activity of immune cells were associated with Late recurrence in primary BC. Based on these reported results, we anticipate that further research can be conducted to establish a greater understanding of the role of immune cells in BC cancer recurrence.