## Introduction

The tumor microenvironment influences cancer initiation and progression1,2. In breast cancer, clinicopathological characteristics such as age, grade, stage, and molecular subtypes associate with prognosis and drive treatment decisions. High-throughput gene expression analyses led to a molecular classification of breast cancers3,4. The five clinically relevant molecular subtypes (Luminal A, Luminal B, Her2-enriched, Basal-like, and Normal-like) differ in incidence, survival, prognosis, and tumor biology. Such patient stratification has clinical and economic utility in breast cancer management5.

In addition to cancer cell biology, an inflammatory microenvironment influences initiation and progression6. The immune microenvironment surrounding cancer cells can recognize and inhibit tumor growth7 or promote progression8. It is crucial to characterize the quality and quantity of immune response at the tumor site, as it may help to pinpoint patients who could benefit from immunotherapies and will improve our understanding of the tumor–host biology.

In breast cancer, high immune infiltration has been associated with better clinical outcome9,10. In particular, high CD8+ T cell infiltration associates with better overall survival (OS) in estrogen receptor (ER)-negative patients11,12. In addition, high immune infiltration has been associated with an increased response to neo-adjuvant and adjuvant chemotherapy13.

Recently, we and others have demonstrated that transcriptomic data can be leveraged to dissect the tumor microenvironment14,15,16,17,18,19. Such methods have shown that elevated expression of leukocyte marker genes associates with a lower risk of breast cancer recurrence14,17,20,21. Notably, Ali et al. and Bense et al. recently reported through comprehensive studies how specific immune cell types influence breast cancer outcome14,22. In these studies, the authors assessed each predicted cell type individually and did not consider the immune microenvironment as a whole. More studies are needed to specify the role and the clinical relevance of the immune contexture in breast cancer.

In the present study, we discover clinically relevant immune clusters with gradual immune infiltration. In 15 breast cancer cohorts, spanning 6101 breast cancer samples, the group of patients with intermediate levels of tumor immune infiltration has a worse prognosis independently of known prognostic molecular and clinicopathological features. Through characterization of the immune composition of the clusters, we find a pro-tumorigenic immune infiltration associated with the poor prognosis group. Further phenotypical analyses show two mutually exclusive aggressive tumor phenotypes in breast cancers, one linked to epithelial–mesenchymal transition (EMT) and the other to proliferation. Both phenotypes are found in the poor prognosis cluster on an inactive/pro-tumorigenic immune microenvironment.

## Results

### Immune clusters in breast cancer

The expression of 760 genes in 95 formalin-fixed, paraffin-embedded (FFPE) tumor samples of the MicMa cohort was measured using the nCounter® PanCancer Immune Profiling array, an array designed to profile immune infiltration in solid tumors. Seventy-nine of these 95 samples have been previously profiled by Agilent whole-genome 4 × 44K oligo array23. We first compared the expression obtained with the two platforms using Pearson and Spearman correlations and found a high degree of positive correlation between the genes’ expression values (Supplementary Fig. 1A).
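
The platform comparison described above can be illustrated with a minimal sketch of the two correlation measures. The expression values below are hypothetical and the helper names (`pearson`, `ranks`, `spearman`) are ours for illustration; they are not taken from the analysis code of this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical log2 expression of one gene in five samples on each platform.
ncounter = [5.1, 6.3, 7.8, 9.0, 10.2]
agilent = [0.2, 0.9, 1.1, 2.5, 6.0]
print(round(pearson(ncounter, agilent), 3), round(spearman(ncounter, agilent), 3))
```

Because Spearman correlation depends only on ranks, it is insensitive to the different dynamic ranges of the two platforms, which is one reason to report both measures.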

In order to group patients according to their similarity in expression of the immune-related genes, we performed unsupervised hierarchical clustering of the correlation matrix (Fig. 1a: 95 MicMa-nCounter and Supplementary Fig. 1B; 104 MicMa-Agilent samples). Silhouette plot analysis from 3 to 10 clusters indicated that 3 clusters best captured the segmentation of both the nCounter and the Agilent datasets (Supplementary Fig. 1C, D). We therefore continued our analyses based on three clusters of patients. We compared the clustering obtained from FFPE tissue (MicMa-nCounter, 95 samples; correlation matrix obtained from the expression of 760 genes on the Immune Profiling array) to the clustering performed on fresh frozen tissue (MicMa-Agilent, 104 samples; correlation matrix obtained from the expression of the 509 genes on the Immune Profiling array found in all datasets used in this study). Seventy-nine samples overlapped between these two datasets. Despite the different platforms used to measure gene expression, as well as the incomplete overlap in gene lists and samples used to perform unsupervised clustering, the cluster assignments for the 79 overlapping samples were significantly similar (Supplementary Table 1; Fisher's exact test, p < 0.0001).
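
As a rough illustration of how a silhouette criterion can rank candidate numbers of clusters, the sketch below computes the mean silhouette width from a distance matrix and a set of cluster labels. The one-dimensional toy data, the absolute-difference distance, and the function name `mean_silhouette` are assumptions made for this example; the distance and linkage used in the actual analysis are not specified here.

```python
def mean_silhouette(dist, labels):
    """Mean silhouette width for a symmetric distance matrix and cluster labels."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

# Hypothetical samples on a line, forming three well-separated groups.
points = [0, 1, 10, 11, 20, 21]
dist = [[abs(p - q) for q in points] for p in points]

# Hypothetical cluster assignments obtained by cutting a dendrogram at k = 2 or 3.
candidate_labels = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
scores = {k: mean_silhouette(dist, labs) for k, labs in candidate_labels.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The number of clusters with the highest mean silhouette width is retained, which mirrors the logic of the selection among 3 to 10 clusters.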

To confirm that the clusters were associated with the tumor immune microenvironment (Fig. 1b), we used the algorithm Nanodissect to score for total lymphocyte and myelocyte infiltration17,24,25. Nanodissect scores were first validated in the MicMa cohort using the evaluation of immune infiltration of matched hematoxylin and eosin (H&E) sections analyzed by experienced pathologists (Fig. 1c and Supplementary Fig. 1E).

We found the three clusters significantly correlated with Nanodissect lymphocyte (Fig. 1b) and myelocyte (Supplementary Fig. 1F) scores. In addition, Chi-squared test showed significant association between clusters and immune infiltration assessed by experienced pathologists (p < 0.0001). We concluded that Clusters A–C reflect gradual immune infiltration and were therefore called immune clusters.

### Clusters reflect gradual immune infiltration

We validated the association between the clusters and lymphoid/myeloid infiltration using the expression data from nine other cohorts (Supplementary Table 2). As stated above, 509 of the 760 genes on the nCounter® PanCancer Immune Profiling array were found in all datasets studied; the expression of these 509 genes was used in the unsupervised clustering (Fig. 1d and Supplementary Fig. 2A for the clustering of the METABRIC and The Cancer Genome Atlas (TCGA) cohorts, respectively). In each cohort, the three clusters obtained were significantly associated with lymphoid and myeloid Nanodissect scores (Lymphoid score: METABRIC, Fig. 1e; TCGA, Supplementary Fig. 2B).

Lymphoid and myeloid infiltrations gradually increased from Cluster A (blue; low infiltration; cold tumors) to Cluster B (light blue; intermediate infiltration) and Cluster C (pink; high infiltration; hot tumors).

For an additional layer of validation, we used the pathological assessment of immune infiltration in the METABRIC cohort26, which was significantly associated with the Nanodissect scores (Fig. 1f and Supplementary Fig. 2C) and with the immune clusters (Chi-square test between immune clusters and pathological assessment of immune infiltration, p < 0.0001). We therefore concluded that unsupervised hierarchical clustering using genes of the PanCancer Immune Profiling array allows breast tumors to be grouped according to gradual levels of immune infiltration.
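
The Chi-square test of association between cluster membership and pathological assessment can be sketched as follows. The contingency counts are hypothetical, and the closed-form p value used here is valid only for an even number of degrees of freedom (as for a 3 × 3 table); a statistics library would normally be used instead.

```python
from math import exp

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof

def chi2_sf_even_dof(stat, dof):
    """Upper-tail chi-square probability; closed form for even dof only."""
    if dof % 2 != 0:
        raise ValueError("closed form holds only for even degrees of freedom")
    half = stat / 2
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return exp(-half) * total

# Hypothetical counts: rows = Clusters A, B, C; columns = low, moderate, high infiltration.
table = [[40, 8, 2], [10, 25, 5], [3, 12, 35]]
stat, dof = chi_square_independence(table)
p_value = chi2_sf_even_dof(stat, dof)
print(round(stat, 1), dof, p_value)
```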

### Immune clusters associate with prognosis

We examined the immune clusters with respect to survival using Kaplan–Meier analysis and log-rank tests. For the two largest cohorts, METABRIC (n = 1904) and TCGA (n = 981), we found Cluster B (with intermediate levels of immune infiltration) associated with worse prognosis (Supplementary Fig. 3A, B). This worse outcome for Cluster B cases was also observed when stratifying for ER-negative (Supplementary Fig. 3C, D) and ER-positive cases (Supplementary Fig. 3E, F) separately. To refine our observation, we plotted patient survival according to Cluster B (light blue) vs Clusters A and C (purple) and confirmed a clearly and significantly worse prognosis for patients in Cluster B (Fig. 2). We further validated this result in four additional cohorts with relevant survival data: TAI (n = 327), VDX (n = 344), STK (n = 159), and UPP (n = 251) (Supplementary Fig. 4). We concluded that immune clusters associate with prognosis both in ER-negative and ER-positive breast cancers.
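
For readers unfamiliar with the log-rank test, a minimal two-group version is sketched below. The follow-up times are hypothetical and the function name `logrank_test` is ours; the sketch assumes at least one observed event while both groups are still at risk, and a dedicated survival analysis package would normally be preferred.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (observed) or 0 (censored)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        # Numbers at risk just before t and events occurring at t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    stat = obs_minus_exp ** 2 / variance
    p_value = erfc(sqrt(stat / 2))  # chi-square upper tail, 1 degree of freedom
    return stat, p_value

# Hypothetical follow-up (arbitrary time units) for Cluster B vs Clusters A and C.
cluster_b_t = [2, 3, 4, 5, 6, 7, 8, 9]
cluster_b_e = [1, 1, 1, 1, 1, 1, 1, 1]
others_t = [20, 22, 25, 28, 30, 30, 30, 30]
others_e = [1, 1, 0, 0, 0, 0, 0, 0]
stat, p_value = logrank_test(cluster_b_t, cluster_b_e, others_t, others_e)
print(round(stat, 2), p_value)
```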

### Predicting immune clusters with binomial logistic regression

Motivated by the clinical relevance of the immune clusters, we aimed at developing a general method that could precisely and sensitively predict the assignment of patients to the worse prognosis group without having to rely on unsupervised clustering. We developed a model through training on 10 cohorts (4546 samples) and testing on 5 others (1555 samples). We used binomial logistic regression penalized by the lasso method to obtain a set of genes (Supplementary Data 1) that sensitively and specifically predict whether a sample is part of Cluster B or not, as assessed by receiver operating characteristic curve and area under the curve (AUC) analysis (Fig. 3a). Our model predicted the immune clusters with an AUC = 85.8% (82.8%–88.7%). We found that 96.3% of the samples assigned to Clusters A and C by clustering were predicted to be A and C by the model, while 68.8% of the samples assigned to Cluster B through clustering were found in Cluster B using the lasso method (Fig. 3b). It appeared that the lasso method decreased the number of samples in Cluster B (Fig. 3b). As unsupervised clustering is less reliable in small cohorts and because learning the cluster assignment from several cohorts will help to refine the phenotype underlying the immune clusters, we hypothesized that the lasso-derived classification would be a better prognostic factor than the clustering method. Indeed, by comparing the survival log-rank test p values, we found that the lasso classification generally improved the significant associations between the immune clusters and survival (Supplementary Table 3). The lasso model was validated in five additional cohorts: Fig. 3c–e for STAM (n = 856), MAINZ (n = 200), and UPSA (n = 289) and Supplementary Fig. 5A, B for CAL (n = 118) and PNC (n = 92).
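
To make the idea of lasso-penalized binomial logistic regression concrete, the sketch below fits such a model by proximal gradient descent on a tiny hypothetical dataset and evaluates a rank-based AUC. This is an illustration under our own assumptions (solver, learning rate, penalty values, and toy data); it is not the implementation or the gene set used in this study.

```python
from math import exp

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Minimize mean logistic loss + lam * sum(|w|); the intercept is unpenalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def auc(scores, labels):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: feature 0 is informative, feature 1 is mostly noise.
X = [[-1.0, 0.3], [-0.8, -0.3], [-0.5, 0.3], [-0.2, -0.3],
     [0.2, 0.3], [0.5, -0.3], [0.8, 0.3], [1.0, -0.3]]
y = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = Cluster B, 0 = Clusters A and C

w, b = fit_lasso_logistic(X, y, lam=0.05)
scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
print(w, round(auc(scores, y), 3))

# A strong penalty drives every coefficient to exactly zero.
w_strong, _ = fit_lasso_logistic(X, y, lam=1.0)
print(w_strong)
```

The L1 penalty sets uninformative coefficients to exactly zero, which is why the lasso yields a reduced gene list rather than a weight for every gene on the panel.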

As the binomial logistic regression only predicted two clusters (Cluster B vs Clusters A and C), we performed another round of binomial logistic regression to distinguish between Clusters A and C with high accuracy (Supplementary Fig. 5C, D). In conclusion, binomial logistic regression penalized by the lasso method refined Cluster B and provided a single sample predictor that could be applicable to every next patient in the clinic. In the subsequent analyses, we use the categories given by the lasso method, as they have a more significant association with survival.

### Immune clusters, an independent prognostic factor

We further investigated how the immune clusters were related to well-known clinicopathological features in breast cancer (size, age, grade, stage, lymph node involvement, and molecular subtypes (PAM50)). Cluster A (with low immune infiltration) was enriched in ER-positive and Luminal cases, while a higher proportion of ER-negative and Basal-like cases was found in Cluster C (with high immune infiltration) (Fig. 4a, b). ER-negative and ER-positive samples as well as the PAM50 subtypes were equally represented in the poor prognosis Cluster B (Fig. 4a, b).

We tested the prognostic impact of the immune clusters while accounting for other prognostic factors using multivariable Cox regression analysis. The variables available for each cohort (ER status, PAM50 subtypes, age, nodal status, size, and grade) were entered into each model. The odds ratios and p values associated with each variable in each model are shown in Supplementary Table 4. We found that immune clusters were an important factor to model survival, as shown by the significant p values associated with immune clusters in each cohort Cox model. Indeed, if we removed the immune clusters from the modeling, the Akaike Information Criterion (AIC) index increased (Supplementary Table 5), demonstrating the added value of immune clusters on top of all other variables for explaining breast cancer survival.

To further test the strength of the immune clusters as an important prognostic biomarker, we used a stepwise backward selection. From the initial Cox models containing all variables, we removed the weakest predictor variable only if this did not weaken the model (as monitored by the calculation of AIC index). This allowed us to find for each cohort the set of variables explaining survival best. For all cohorts, the immune clusters were kept in the best fitted minimal model, and in 9 out of 11 cohorts, the immune clusters were a significant prognostic variable (Table 1). To further emphasize and illustrate the clinical relevance of the immune clusters and their independence from the PAM50 molecular subtypes, we plotted for the METABRIC and TCGA cohorts the Kaplan–Meier survival curve for each PAM50 subtype (Supplementary Fig. 6).
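
The backward selection logic can be sketched generically. In this illustration, `aic_of` is a hypothetical callable that returns the AIC (2k minus twice the log-likelihood, with k fitted parameters) of a model fitted with a given variable list; the toy likelihood gains are invented for the example. The sketch drops, at each step, the variable whose removal gives the lowest AIC, which is one common variant and may differ from the exact rule used to pick the weakest predictor in this study.

```python
def backward_select(variables, aic_of):
    """Greedy backward elimination: drop a variable as long as AIC does not increase."""
    current = list(variables)
    current_aic = aic_of(current)
    while len(current) > 1:
        # Evaluate every model obtained by removing a single variable.
        candidates = [(aic_of([v for v in current if v != drop]), drop)
                      for drop in current]
        best_aic, drop = min(candidates)
        if best_aic > current_aic:
            break  # any further removal would weaken the model
        current.remove(drop)
        current_aic = best_aic
    return current, current_aic

# Hypothetical log-likelihood gain contributed by each variable.
gains = {"immune_cluster": 12.0, "grade": 6.0, "age": 0.4, "size": 0.2}

def toy_aic(variables):
    """Toy stand-in for fitting a Cox model and reading its AIC."""
    log_lik = -100.0 + sum(gains[v] for v in variables)
    return 2 * len(variables) - 2 * log_lik

selected, final_aic = backward_select(list(gains), toy_aic)
print(selected, final_aic)
```

In practice `aic_of` would refit the survival model for each candidate variable set, so the cost grows with the square of the number of variables.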

### Validation in a new RNA-seq dataset with risk of recurrence (ROR) scores

We generated a new dataset, EMIT0, which is a subset of the OSLO2 cohort study. The OSLO2-EMIT0 samples were assessed using the Food and Drug Administration-approved Prosigna risk of recurrence (ROR) scores. As recently demonstrated, ROR scores add significant prognostic information above standard clinicopathological features3,27. We assessed whether the immune clusters could add prognostic value to ROR scores. We found Cluster B composed of samples with intermediate ROR scores compared to Clusters A and C (Fig. 4c). This suggested that the poor prognosis associated with Cluster B was not likely to be explained by the information contained in the ROR scores. This observation was also true when assessing the ER-negative (Supplementary Fig. 7A) and ER-positive (Supplementary Fig. 7B) cases separately. For all cohorts, we calculated the ROR scores following Parker et al.3’s method, which is related to PAM50 subtyping3, and confirmed that Cluster B was composed of intermediate ROR scores as exemplified in the METABRIC cohort (Fig. 4d and Supplementary Fig. 7C, D).

Multivariable regression analysis confirmed that the immune clusters bring additional prognostic value to the ROR scores (Supplementary Table 6) as demonstrated by the significant p values for the immune clusters when modeling survival with ROR scores and immune clusters. Through computation of net reclassification improvement (NRI) and integrated discrimination improvement (IDI) indexes28, we emphasized the additional value of immune clusters to classify patients according to survival when taken together with ROR scores, as indicated by the positive NRI and IDI coefficients in all cohorts. Bootstrapping for confidence interval (CI) construction for NRI and IDI showed that, for several cohorts, the immune clusters significantly improved patient classification according to prognosis when added to the ROR scores (Supplementary Table 6). Using complementary statistical analyses, we demonstrate the clinical relevance of the immune clusters in breast cancer.
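
As an intuition for the two indexes, the sketch below computes a category-free NRI and the IDI for a binary outcome, comparing predicted event probabilities from a baseline model (for example, ROR scores alone) and an extended model (ROR scores plus immune clusters). This simplified binary version and its hypothetical probabilities are assumptions for illustration; survival analyses typically use time-to-event extensions of these indexes.

```python
def nri_idi(p_old, p_new, events):
    """Category-free NRI and IDI for predicted event probabilities (binary outcome)."""
    ev = [(o, n) for o, n, e in zip(p_old, p_new, events) if e == 1]
    ne = [(o, n) for o, n, e in zip(p_old, p_new, events) if e == 0]

    def frac_up(pairs):
        return sum(n > o for o, n in pairs) / len(pairs)

    def frac_down(pairs):
        return sum(n < o for o, n in pairs) / len(pairs)

    def mean(values):
        return sum(values) / len(values)

    # NRI: net upward movement for events plus net downward movement for non-events.
    nri = (frac_up(ev) - frac_down(ev)) + (frac_down(ne) - frac_up(ne))
    # IDI: gain in mean predicted risk for events minus gain for non-events.
    idi = (mean([n for _, n in ev]) - mean([o for o, _ in ev])) - (
        mean([n for _, n in ne]) - mean([o for o, _ in ne]))
    return nri, idi

# Hypothetical predicted risks for six patients (first three had an event).
events = [1, 1, 1, 0, 0, 0]
p_old = [0.5, 0.6, 0.4, 0.5, 0.4, 0.3]
p_new = [0.7, 0.8, 0.3, 0.3, 0.2, 0.4]
nri, idi = nri_idi(p_old, p_new, events)
print(round(nri, 3), round(idi, 3))
```

Positive values of both indexes indicate that the extended model moves predicted risks in the right direction more often than in the wrong one.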

### Immune clusters and response to neoadjuvant chemotherapy

We further assessed the association between the immune clusters and response to neoadjuvant chemotherapy, using gene expression data from studies in which patients were treated in the neoadjuvant setting (chemotherapy before surgery). The endpoint of these studies was pathological complete response (pCR), which means complete eradication of cancer cells at the end of the chemotherapeutic regimen before surgery (see Supplementary Table 2 for datasets used in this section). We used gene expression data from 8 studies (1377 samples) and assigned each sample to an immune cluster using the lasso method. As shown in Fig. 4e, we found the highest percentage of responders in Cluster C (59%), followed by Cluster A (30%), and the lowest percentage of responders in Cluster B (11%). Since Cluster B is also the smallest cluster in terms of patient numbers, we also calculated the percentage of responders within each cluster. Cluster C was composed on average of 42% of responders and 58% of patients with residual disease, whereas Cluster B had 18%/82% and Cluster A 13%/87% of responders/residual disease cases, respectively.
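
The two percentages reported above use different denominators: the share of all responders that falls in a cluster, and the response rate within a cluster. The short sketch below makes the distinction explicit using invented counts, which are not the counts of this study.

```python
# Hypothetical counts of responders (pCR) and residual disease cases per cluster.
counts = {
    "A": {"pCR": 20, "residual": 180},
    "B": {"pCR": 10, "residual": 40},
    "C": {"pCR": 70, "residual": 130},
}

total_pcr = sum(c["pCR"] for c in counts.values())

# Share of all responders found in each cluster (denominator: all responders).
share_of_responders = {k: 100 * c["pCR"] / total_pcr for k, c in counts.items()}

# Response rate within each cluster (denominator: all patients in that cluster).
pcr_rate = {k: 100 * c["pCR"] / (c["pCR"] + c["residual"]) for k, c in counts.items()}

print(share_of_responders)
print(pcr_rate)
```

With these invented counts, Cluster B holds the smallest share of responders mainly because it is the smallest cluster, while its within-cluster rate is not the lowest, which is why both views are worth reporting.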

As the pCR rate differs as a function of ER status29, we also calculated the percentage of responders in ER-positive and ER-negative cases independently and found the lowest rate of responders in Cluster B regardless of ER status (Supplementary Fig. 8A, B, respectively).

For each cohort with response to neoadjuvant chemotherapy, we assessed the distribution (chi-square p values, Supplementary Table 7) of the pCR and non-pCR cases across the immune clusters, taking into account all cases, or ER-positive and ER-negative cases independently. When considering the whole cohort, we found the distribution of the responders significantly different across immune clusters, with fewer responders in Cluster B and most responders in Cluster C. When splitting by ER status, the same tendency was observed, although it was not always significant.

These results demonstrate that patients in Cluster C have a higher probability of responding, which corroborates previous studies reporting a higher pCR rate for cases with high immune infiltration and/or proliferative phenotype29,30. Our results also highlight a low response rate in Cluster B, suggesting that such patients may be candidates for testing of new neoadjuvant therapeutic options.

### In silico dissection of the immune clusters

To assess whether the gradual immune infiltration in the clusters could explain the association with prognosis, we tested which of the immune clusters or total immune infiltration scores was more predictive of survival in a Cox multivariable regression analysis (Supplementary Table 8). Nanodissect lymphocyte scores were poorly associated with survival; we therefore hypothesized that specific immune cell-type mixtures, rather than the total number of immune cells in the tumor microenvironment, may explain the poor prognosis in Cluster B.

We estimated the proportions of 22 distinct immune cell types using the CIBERSORT algorithm19. We calculated per cohort and cluster the median infiltration of each immune cell type and performed unsupervised clustering of such cell-type-specific median infiltration scores (Fig. 5a). We found that the CIBERSORT inferred immune infiltration recapitulated the immune clusters. Cluster C cases were enriched, among other cell types, for macrophages M1, memory activated T cells, and follicular T helper cells (Fig. 5a), as also illustrated by the distribution of the CIBERSORT scores in the METABRIC and the TCGA cohorts (Fig. 5b and Supplementary Fig. 9). Cluster A had, as expected, very low levels of immune cells. In the poor response and prognosis Cluster B, higher levels of macrophages M2, resting mast cells, and resting memory T cells were found (Fig. 5a), as also illustrated by density plots for the METABRIC and TCGA cohorts (Fig. 5b and Supplementary Fig. 9).

Using generalized linear models, we specified the immune cell types distinguishing Cluster B from Clusters A and C and identified resting and pro-tumorigenic immune cell types explaining Cluster B (Fig. 5c). We also tested which immune cell types explained the differences between Cluster A and Cluster B (Supplementary Fig. 10A) and between Cluster B and Cluster C (Supplementary Fig. 10B). When comparing Cluster A to Cluster B, all immune cell types could explain Cluster B; indeed, Cluster A has no or low immune infiltration. When comparing Cluster B to C, we again found the pro-tumorigenic cell types macrophages M2 and resting mast cells explaining Cluster B. These results suggest that pro-tumorigenic immune infiltration in Cluster B may favor tumor growth. In conclusion, Cluster A is composed of immune-cold tumors, Cluster C contains immune-hot tumors, and cases in Cluster B have a pro-tumorigenic immune infiltration.

### Phenotypic analysis of the immune clusters

To further characterize the phenotype associated with the poor prognosis in Cluster B, we identified through differential gene expression analysis the genes significantly overexpressed in Cluster B. We found 909 genes upregulated in Cluster B when compared to Cluster A and Cluster C separately (Bonferroni-corrected p value < 0.0001; Supplementary Data 3). These genes were associated with stem cell biology and EMT, as shown by the gene set enrichment analysis (GSEA) using the H and C2 collection of the MsigDB31 (Fig. 6a).

To further characterize the relationship between the immune clusters and cancer cell phenotype, we used gene sets associated with EMT, stem cells, hypoxia, and proliferation. In total, 11 gene sets from the MsigDB and an additional EMT-related signature from Tan et al.32 were selected (Supplementary Data 3). We calculated per cluster and cohort an average gene set enrichment score using the GSVA method; this score reflects the activity of each pathway/gene set in an immune cluster33. Unsupervised clustering of averaged-gene-set scores clearly separated the immune Clusters A and C, while Cluster B was divided into two subgroups (Fig. 6b). These results suggested an association between immune clusters and the stem cell/EMT-related gene signatures.

### Two mutually exclusive phenotypes in breast cancer

Through unsupervised clustering of GSVA enrichment scores, we identified two mutually exclusive gene signatures in breast cancer: (i) one associated with proliferation and an embryonic stem cell-like phenotype and (ii) the other with EMT and a mammary stem cell phenotype.

A proliferative phenotype dominated Cluster C (Supplementary Fig. 11A); the same was observed when gene set scores were calculated for each METABRIC sample (Supplementary Fig. 11B). In Cluster B, the average gene set scores were high for either EMT or proliferation-related signatures (Supplementary Fig. 11C). At the sample level in METABRIC, we observed a similar pattern, with samples having one or the other state activated (Supplementary Fig. 11D). Cluster A showed low scores for both the EMT and proliferative states (Supplementary Fig. 11E, F).

To formally identify which gene set scores explained Cluster B, we tested how each gene set contributes to Cluster B vs Clusters A and C using generalized linear models. EMT signatures contributed positively to Cluster B, while proliferation and cell motility were associated with Clusters A and C (Fig. 6c). We also tested which gene set score explained Cluster B when compared separately to Cluster A (Supplementary Fig. 12A) or Cluster C (Supplementary Fig. 12A). In both cases, EMT scores were a significant explanatory variable of Cluster B. However, EMT signature scores alone were not of strong prognostic value according to Cox regression analysis (Supplementary Table 9). Overall, these results suggest mutual exclusivity between EMT and proliferation in breast cancers. They also suggest that the EMT or the proliferative phenotype results in poor prognosis only when accompanied by a certain immune contexture.

### Correlation between tumor phenotype and immune infiltration

As immune clusters were associated with both (i) immune cell types and (ii) gene set signatures, we formally assessed the relation between immune infiltration (CIBERSORT) and cancer cell characteristics (gene set scores). Figure 6d shows that the proliferation and EMT scores correlate significantly with different types of immune cells. Notably, high EMT scores are associated with macrophages M2, resting mast cells, and resting memory T cells, while high proliferation is correlated with a more active adaptive tumor microenvironment (macrophages M1, T helper cells, activated dendritic cells, and active memory T cells). These data suggest a continuum between the cancer cell phenotype and the composition of the tumor microenvironment.

### Heterogeneity in gene set scores within Cluster B

Cluster B was dominated by samples with pro-tumorigenic immune infiltration and high EMT signal; however, ~35% of Cluster B samples exhibited a proliferative phenotype instead. To explore this heterogeneity within Cluster B, we grouped samples according to the gene signature scores in an unsupervised manner into B1, dominated by the EMT phenotype, and B2, dominated by the proliferative phenotype (Fig. 6e).

In the METABRIC and TCGA, B2 cases with the proliferative phenotype had a worse outcome (Fig. 6f, g, also see Supplementary Fig. 13 in which survival probabilities of B1 and B2 are plotted with Cluster A and Cluster C). While we were able to identify a difference in survival between Cluster B1 and B2 in METABRIC and TCGA, for other smaller cohorts, it was difficult to conclude, as further splitting Cluster B resulted in small groups. To further assess whether the heterogeneity in gene set scores was accompanied by heterogeneity in immune contexture, we sought for differences in specific immune cell types between sub-clusters B1 and B2. Unsupervised clustering in Supplementary Fig. 14 showed that the two sub-clusters B1 and B2 both have a pro-tumorigenic/resting immune microenvironment.

Altogether, the two mutually exclusive states within Cluster B may be relevant with regard to prognosis; however, a unifying factor of Cluster B is the presence of a pro-tumorigenic/resting immune microenvironment.

## Discussion

The tumor microenvironment plays an important role in breast cancer pathogenesis. We provide a new immune-related subtype in breast cancer with relevance for prognosis and response to neoadjuvant chemotherapy in both ER-positive and ER-negative cases. The herein described immune clusters are dependent on both the abundance and composition of the immune infiltrate and are independent of other prognostic factors, including PAM50.

Through unsupervised clustering using the expression of genes included in the nCounter® PanCancer Immune Profiling Panel, we identified three clusters of patients in FFPE and fresh frozen breast tumors. These clusters (i) were associated with total levels of immune infiltration and with a specific immune microenvironment, (ii) provided independent prognostic information, and (iii) revealed two mutually exclusive breast cancer phenotypes.

As the immune clusters provided an independent prognostic value in breast cancer, we developed a simple method that refined and accurately predicted whether a sample falls in the poor prognosis cluster (Cluster B) or not. We tested our method successfully in 15 cohorts, spanning 6101 breast cancer samples. We demonstrate using different and complementary statistical approaches the strength of the immune cluster as a new prognostic biomarker.

Through phenotypical characterization of the immune clusters, we also identified two mutually exclusive states in breast cancers, one associated with EMT and the other with proliferation. A similar observation of two mutually exclusive states: proliferative and EMT, was recently reported in a pan-cancer genomic analysis of metastatic tumors34. Our study therefore suggests that such a mutual exclusion could be extended to primary breast tumors and possibly to other primary cancer types. The EMT process has often been associated with metastasis35; it has been also previously suggested that transcription factors such as TWIST1, which may drive the EMT process, need to be turned off for the cancer cell to proliferate36. Such a mechanism may explain why these two processes could not coexist in cancer cells.

Samples with the EMT or proliferative phenotype were found in the poor prognosis cluster (Cluster B). About 65% of the samples in Cluster B had an EMT-like phenotype. We further found that this dominating phenotype could help explain Cluster B when compared to Clusters A and C using generalized linear models. As opposed to that, 35% of the samples in Cluster B had a proliferative phenotype like most of the Cluster C samples. Proliferation in Cluster C associated with infiltration of active anti-tumorigenic immune cells, while proliferative samples in Cluster B had infiltration of immune cells less likely to eradicate cancer cells (macrophages M2, resting mast cells). This indicates that in breast cancer a proliferative phenotype associated with a non-adapted, pro-tumorigenic, resting immune microenvironment relates to an adverse outcome as indicated by the Kaplan–Meier analysis (Fig. 6f, g).

Many studies have suggested that EMT drives an aggressive tumor phenotype in breast cancer37,38. However, recent studies have questioned the role of EMT in tumorigenesis, progression, and metastasis39. Importantly, we show here that specific immune infiltration is associated with the EMT process in breast cancer. As a recent study also suggests40, we highlight that immune contexture is an important factor to consider when evaluating the role of EMT during cancer pathogenesis.

Using CIBERSORT19 to infer specific immune cell infiltration, we found the EMT state highly correlated with infiltration of resting mast cells, macrophages M2, natural killer (NK) cells, and resting memory T cells. It has been previously suggested that EMT could be associated with a pro-tumorigenic microenvironment in lung cancer41. In esophageal squamous cell carcinoma, M2 macrophages promote migration and invasion and enhance EMT42. In addition, mast cells have been associated with angiogenesis in breast cancer43. Based on several gene expression datasets, our current results demonstrate that the EMT process is accompanied by infiltration of pro-tumorigenic/resting immune cell types. The presence of antitumorigenic immune cells, like NK cells, has also been found to be highly correlated with EMT in melanoma44.

In Cluster C, a proliferative phenotype was found to be correlated with infiltration of activated dendritic cells, T helper cells, macrophages M1, and CD4 memory T cells. These cell types reflect an antitumoral microenvironment. Cluster C is dominated by both a highly proliferative phenotype and high infiltration of antitumoral cell types. One may argue that chemotherapies may successfully eradicate such proliferative tumors with the support of an antitumoral microenvironment, which would explain the better outcome of these patients and the higher rate of responders to neoadjuvant chemotherapy in Cluster C (Fig. 4e).

Low immune infiltration in Cluster A associated with neither the proliferative nor the EMT state, which may indicate a less aggressive tumor phenotype.

Previous studies have suggested that basal-like breast cancers display a high metastatic ability associated with mesenchymal features45. Sarrio et al.46 further showed that several markers of EMT were upregulated in basal-like breast cancers46. Our study shows using recent algorithms that the EMT phenotype is enriched in Cluster B. In breast cancer, a recent gene transcriptional profiling has identified an EMT gene expression signature associated with claudin-low and metaplastic breast cancers47. However, the claudin-low subtype in the METABRIC cohort did not correlate with Cluster B.

Our study suggests that targeting the primary pathways involved in EMT, such as transforming growth factor-β48, E-Cadherin49, the WNT/β-catenin pathway50, Notch51, hypoxia, or tumor necrosis factor-alpha52, is an interesting opportunity for therapeutic intervention in patients with the worse prognosis (Cluster B). More importantly, the macrophage re-education strategy, which proposes to remodel M2-type macrophages into an anti-tumor, “M1-like” mode53, could be beneficial for Cluster B patients.

In the era of modern immunotherapy, a few clinical trials using immune checkpoint inhibitors have been conducted in breast cancers, and combinations with immunogenic chemotherapy or radiation therapy have been planned. The results of the first clinical trials using monoclonal antibodies against immune checkpoints have recently been communicated and show some degree of response, especially in certain subpopulations54,55. Our study suggests that considering both the immune cell types infiltrating the tumor and the main state of the tumor (EMT or proliferative) will refine treatment decisions and improve response to these new treatment strategies.

## Methods

### Gene expression analysis from FFPE

Operable early breast cancer patients were included in the Oslo1 micrometastasis observational study between 1995 and 199856. Informed consent was obtained from all participants and the study was approved by the local ethical committee (S-97103). FFPE samples were collected for a subset of patients who also had fresh primary tumors collected for detailed molecular analyses, a cohort called MicMa. Only patients within the MicMa subset (n = 96) were included in the current analysis. FFPE tissue was first examined with H&E staining to determine the tumor area, and dissection was performed to include mainly tumor tissue. RNA purification was performed using the Roche® High Pure FFPET RNA Isolation Kit; one to five 10-μm FFPE slides were used for each tumor. A minimum of 100 ng of total RNA was used on the nCounter platform (Nanostring Technologies, Seattle, WA, USA) with the PanCancer Immune Profiling Panel57. Data were normalized using all housekeeping genes and log base 2 transformed.
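
As an illustration of the last step, the following minimal Python sketch scales each sample by the geometric mean of its housekeeping genes and then applies a log base 2 transformation. The text specifies only that all housekeeping genes were used; the geometric-mean scaling, the pseudocount, the function name `normalize_sample`, and the example counts are assumptions made for illustration and do not reproduce the exact procedure applied to the nCounter data.

```python
import math

def normalize_sample(counts, housekeeping, pseudocount=1.0):
    """Scale counts by the geometric mean of the housekeeping genes, then log2-transform.

    Assumption: geometric-mean scaling with a pseudocount; the text does not
    state the exact normalization formula.
    """
    hk_values = [counts[gene] + pseudocount for gene in housekeeping]
    geo_mean = math.exp(sum(math.log(v) for v in hk_values) / len(hk_values))
    return {gene: math.log2((c + pseudocount) / geo_mean) for gene, c in counts.items()}

# Hypothetical raw counts for one sample (not data from the study).
sample = {"CD8A": 31.0, "ACTB": 15.0, "GAPDH": 3.0}
normalized = normalize_sample(sample, housekeeping=["ACTB", "GAPDH"])
```

After this transformation, a value of 0 corresponds to a gene expressed at the average housekeeping level of the sample.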

### RNA-seq analysis of the OSLO2-EMIT0 cohort

The OSLO2 breast cancer cohort is a study collecting material from breast cancer patients with primary operable disease in several hospitals in south-eastern Norway. Inclusion of patients started in 2006 and is still ongoing. The study was approved by the Norwegian Regional Committee for Medical Research Ethics (approval number 1.2006.1607, amendment 1.2007.1125). Patients gave written consent for the use of material for research purposes. All experimental methods performed are in compliance with the Helsinki Declaration. Tumor tissue was cut into pieces and mixed before distribution to RNA extraction. RNA was isolated using the AllPrep DNA/RNA/miRNA Universal kit on the QIAcube instrument (Qiagen). Quality control was performed by Nanodrop ND-1000 (NanoDrop Technologies) and BioAnalyzer 2100 (Agilent) analysis. All RNA samples had an RNA Integrity Number (RIN) ≥ 6. Libraries were prepared with Illumina's TruSeq Stranded mRNA Library Prep Kit for the automated NeoPrep Library Prep System (Illumina). The starting amount was 120 ng of total RNA, and sequencing was performed on Illumina NextSeq500 sequencers (2 × 75 bp). Raw sequencing reads were demultiplexed and filtered using Bowtie2 against ribosomal, phiX174, and UCSC RepeatMasker sequences. The sequence data were processed as described previously58. Log-transformed FPKM RNA-seq gene expression data are available at GEO under accession GSE135298. Raw data are available at EGAS00001003631.

### Data collection and processing

Publicly available expression data from breast cancer cohorts were used in this study. Patient consent and ethical approval are described in the original articles in which the respective datasets were published (Supplementary Table 2). Expression data were obtained from Gene Expression Omnibus, the European Genome-phenome Archive, ArrayExpress, or TCGA data portals. For survival analyses, we selected studies with >100 samples and relevant survival data from patients with invasive breast tumors sampled at the time of surgical resection without neoadjuvant treatment. Survival data were of four types: relapse-free survival, distant metastasis-free survival, OS, or breast cancer-specific survival.

For analysis of response to neoadjuvant chemotherapy, we selected cohorts of patients treated with a chemotherapeutic regimen and for which gene expression had been profiled from the primary tumor prior to treatment. Pathological complete response (pCR) was assessed at the time of surgery at the end of treatment and refers to the total elimination of cancer cells at surgery.

Except for the METABRIC cohort, for which ER status has been extensively used and defined, we systematically inferred ER status from gene expression data by fitting a two-component Gaussian finite mixture model by maximum likelihood estimation with the R function optim, as previously described59. Classification into the PAM50 intrinsic molecular subtypes was performed based on gene expression data using the genefu package in R3.
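
The mixture-model step can be sketched in Python as follows. This sketch fits the two-component Gaussian mixture by expectation-maximization, which also converges to a (local) maximum of the likelihood, rather than by direct numerical optimization with optim as in the original analysis. The function names, starting values, and expression values are illustrative assumptions; they are not taken from the study.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weight of the high component, [mean_low, mean_high], [sd_low, sd_high]).
    """
    xs = sorted(values)
    half = len(xs) // 2
    # Start from the means of the lower and upper halves of the data.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    sds = [1.0, 1.0]
    weight = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to the high component.
        resp = []
        for x in xs:
            low = (1 - weight) * normal_pdf(x, means[0], sds[0])
            high = weight * normal_pdf(x, means[1], sds[1])
            resp.append(high / (low + high))
        # M-step: update the parameters from the weighted data.
        n_high = sum(resp)
        n_low = len(xs) - n_high
        means = [
            sum((1 - r) * x for r, x in zip(resp, xs)) / n_low,
            sum(r * x for r, x in zip(resp, xs)) / n_high,
        ]
        var_low = sum((1 - r) * (x - means[0]) ** 2 for r, x in zip(resp, xs)) / n_low
        var_high = sum(r * (x - means[1]) ** 2 for r, x in zip(resp, xs)) / n_high
        sds = [max(math.sqrt(var_low), 1e-3), max(math.sqrt(var_high), 1e-3)]
        weight = n_high / len(xs)
    return weight, means, sds

def is_er_positive(x, weight, means, sds):
    """Call a sample ER-positive if the high component is the more probable one."""
    low = (1 - weight) * normal_pdf(x, means[0], sds[0])
    high = weight * normal_pdf(x, means[1], sds[1])
    if low + high == 0.0:  # both densities underflow far from the data
        return x > (means[0] + means[1]) / 2
    return high / (low + high) > 0.5

# Hypothetical log2 expression values of the ER gene (not data from the study).
esr1 = [5.1, 5.4, 4.9, 5.6, 5.0, 9.8, 10.2, 10.5, 9.9, 10.1, 10.4]
weight, means, sds = fit_two_gaussians(esr1)
```

With clearly bimodal expression values, as in this toy example, the two components separate the low-expression and high-expression samples.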

### Gene set enrichment analysis

Gene set enrichment analysis was performed using the Molecular Signatures Database v4.0 (MSigDB31) H and C2 collections. Enrichment was assessed by hypergeometric testing.
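
For reference, the one-sided hypergeometric enrichment p value can be computed as in the short Python sketch below. The function name and the numbers in the example are hypothetical and serve only to show the calculation.

```python
from math import comb

def hypergeom_enrichment_p(universe, gene_set_size, query_size, overlap):
    """P(X >= overlap) when query_size genes are drawn without replacement
    from a universe that contains gene_set_size members of the gene set."""
    upper = min(gene_set_size, query_size)
    total = comb(universe, query_size)
    tail = sum(
        comb(gene_set_size, k) * comb(universe - gene_set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 genes measured, 5 in the gene set,
# 4 genes in the query list, 2 of which belong to the gene set.
p_value = hypergeom_enrichment_p(universe=20, gene_set_size=5, query_size=4, overlap=2)
```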

### Unsupervised clustering to obtain immune clusters

First, a correlation matrix was calculated to assess the dependence between samples, initially based on the expression of the 760 genes in the nCounter® PanCancer Immune Profiling Panel and later on the 509 genes present in all clustered datasets (training in Supplementary Table 2). Hierarchical clustering of the patients’ correlation matrix was performed using the R package pheatmap v1.0.12, with correlation as clustering distance and ward.D as linkage. Clusters were identified using the cutree function. To determine the optimal number of clusters for each cohort, we used silhouette analysis of k-means clustering with the cluster R package; for most of the cohorts assessed, three clusters gave a better partition than a larger number of clusters.
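
Two ingredients of this procedure, the correlation-based distance between samples and the silhouette width used to compare candidate partitions, can be sketched in Python as follows. The sketch does not perform the hierarchical or k-means clustering itself; it only scores a given assignment of samples to clusters. All function names and the toy expression profiles are assumptions introduced for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_distance(profiles):
    """Pairwise distance matrix between samples, defined as 1 - Pearson correlation."""
    n = len(profiles)
    return [[1 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]

def mean_silhouette(dist, labels):
    """Average silhouette width of a partition, given a distance matrix."""
    n = len(labels)
    widths = []
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist[i][j])
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            widths.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)                                  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())    # nearest other cluster
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / n

# Hypothetical expression profiles of four samples over four genes.
profiles = [[1, 2, 3, 4], [2, 3, 4, 6], [4, 3, 2, 1], [6, 4, 3, 2]]
dist = correlation_distance(profiles)
good_partition = mean_silhouette(dist, [0, 0, 1, 1])
poor_partition = mean_silhouette(dist, [0, 1, 0, 1])
```

A partition that groups correlated samples together yields a higher average silhouette width, which is the criterion used to compare different numbers of clusters.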

### Nanodissect analysis, lymphoid and myeloid scores

The algorithm Nanodissect (http://nano.princeton.edu) was used as previously described to predict lymphoid and myeloid infiltration24,25. The breast collection data (May 2013), which contain 17,940 genes measured on 622 arrays, were inspected for genes specifically expressed in lymphoid or myeloid cell types and not expressed in mammary gland or mammary epithelium. The genes with >65% probability of being positive lymphocyte- or myelocyte-specific standard genes, as opposed to mammary gland or epithelium genes, were used in downstream analysis. Nanodissect scores for lymphocyte or myelocyte infiltration reflect the average expression of the respective genes (Supplementary Data 4) in a sample.
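
The scoring step described above amounts to a probability filter followed by an average, as in this minimal Python sketch. The gene names, probabilities, and expression values are placeholders; the actual gene lists are those of Supplementary Data 4, and the handling of genes missing from a sample is an assumption.

```python
def select_marker_genes(probabilities, cutoff=0.65):
    """Keep genes whose predicted probability of cell-type specificity exceeds the cutoff."""
    return [gene for gene, p in probabilities.items() if p > cutoff]

def infiltration_score(expression, marker_genes):
    """Average expression of the marker genes measured in the sample."""
    measured = [expression[g] for g in marker_genes if g in expression]
    if not measured:
        raise ValueError("none of the marker genes is measured in this sample")
    return sum(measured) / len(measured)

# Placeholder probabilities and expression values (not from the study).
lymphoid_probabilities = {"GENE_A": 0.91, "GENE_B": 0.70, "GENE_C": 0.40}
sample_expression = {"GENE_A": 8.0, "GENE_B": 6.0, "GENE_C": 12.0}

lymphoid_genes = select_marker_genes(lymphoid_probabilities)
lymphoid_score = infiltration_score(sample_expression, lymphoid_genes)
```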

### CIBERSORT analysis

The algorithm CIBERSORT was used on normalized expression data to infer the absolute proportions of 22 types of infiltrating immune cells. CIBERSORT is a deconvolution algorithm that uses a set of reference gene expression values (547 genes) to predict 22 immune cell type proportions from bulk tumor sample expression data by support vector regression19. To assess the reliability of the deconvolution, CIBERSORT derives a p value for each sample. The CIBERSORT software package was obtained from the developers, and analysis was performed using the default signature matrix with 1000 permutations.

### Single-sample GSEA (GSVA)

Gene set analysis was carried out using the GSVA Bioconductor package v1.30.033. We curated gene sets for various epithelial–mesenchymal transition, stem cell, proliferation, and cell cycle-related pathways (Supplementary Data 3). For each sample, an enrichment score was computed for each gene set from the sample's gene expression profile.

### Binomial logistic regression to predict immune clusters

We used binomial logistic regression through the glmnet v2.0-16 R package60 to develop a method that assigns any given sample to the group with the worst prognosis, or not, without resorting to unsupervised clustering. This predictor is well suited to smaller cohorts and can assign a class to single samples. To perform the analysis, we mean-centered the datasets and set up a logistic regression using the binomial distribution to predict the categorical response with two possible outcomes: being in the bad prognosis group or not. This approach gave a signature of target genes that together captured the variation associated with the two categories (Supplementary Data 1).

Patients were divided into Cluster B or Clusters A and C groups according to the following index for patient i:

$${\rm{Index}}_i = \sum\limits_{g = 1}^{n} \beta_g X_{gi} \qquad (1)$$

where g is the target (gene), n is the number of targets, βg is the lasso coefficient for target gene g, and Xgi is the expression value of gene g in sample i. If the index for patient i is higher than the intercept (1.206538657), the sample is assigned to Cluster B.
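
The index and the assignment rule can be written as a short Python sketch. The threshold is the value given above; the coefficients, gene names, and expression values are hypothetical placeholders (the actual signature is in Supplementary Data 1), and the expression values are assumed to be mean-centered as described.

```python
THRESHOLD = 1.206538657  # intercept value reported in the text

def cluster_b_index(coefficients, expression):
    """Sum over signature genes of lasso coefficient times expression value."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())

def assign_cluster(coefficients, expression, threshold=THRESHOLD):
    """Assign a sample to Cluster B if its index exceeds the threshold."""
    index = cluster_b_index(coefficients, expression)
    return "Cluster B" if index > threshold else "Clusters A/C"

# Hypothetical coefficients and mean-centered expression values.
coefficients = {"GENE1": 0.5, "GENE2": -0.25}
sample_high = {"GENE1": 4.0, "GENE2": 2.0}  # index = 1.5
sample_low = {"GENE1": 2.0, "GENE2": 2.0}   # index = 0.5

assignment_high = assign_cluster(coefficients, sample_high)
assignment_low = assign_cluster(coefficients, sample_low)
```

Because the rule is a weighted sum followed by a comparison, it can be applied to a single sample without re-clustering the cohort.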

### Pathological assessment of immune infiltration

Vascular invasion, inflammatory cell infiltrate, and necrosis, including relation of tumor cells/tumor stroma, were evaluated on slides stained with H&E as previously described61. Using a simple microscope, subjective categorization of inflammatory cell infiltrate into the categories of “low,” “moderate,” “high,” and “severe” was performed based on the frequency of mononuclear inflammatory cell infiltration observed in the invasive tumor.

### ROR score calculation

ROR scores for each sample were calculated as described in ref. 3: ROR score = 0.05 × Basal + 0.12 × Her2-enriched − 0.34 × Luminal A + 0.23 × Luminal B, where Basal, Her2-enriched, Luminal A, and Luminal B are the correlations of each sample to the respective subtype centroids obtained using the genefu package in R.
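
The formula translates directly into code. In this Python sketch the weights are those given above, whereas the centroid correlations of the example sample are invented for illustration.

```python
ROR_WEIGHTS = {"Basal": 0.05, "Her2-enriched": 0.12, "Luminal A": -0.34, "Luminal B": 0.23}

def ror_score(centroid_correlations):
    """Weighted sum of a sample's correlations to the four subtype centroids."""
    return sum(weight * centroid_correlations[subtype] for subtype, weight in ROR_WEIGHTS.items())

# Hypothetical correlations of one sample to the subtype centroids.
example = {"Basal": 0.6, "Her2-enriched": 0.1, "Luminal A": -0.5, "Luminal B": -0.2}
example_ror = ror_score(example)
```

The negative weight on Luminal A means that a sample resembling the Luminal A centroid receives a lower risk score.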

### Statistical, survival, multivariable Cox regression analysis

All analyses were performed in R version 3.3.2. Unless otherwise stated, results were considered statistically significant if the p value was < 0.05. Kaplan–Meier estimation and log-rank tests were performed using the functions Surv, survfit, and survdiff (R package survival v2.42-3). Multivariable Cox regression analyses were used to test the independent prognostic value of the immune clusters using the R package survival and the coxph function. Mann–Whitney U or Kruskal–Wallis tests were used to assess statistical significance within boxplots.
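
To make the Kaplan–Meier estimator concrete, the following Python sketch computes the product-limit survival estimate from follow-up times and event indicators. It is a didactic re-implementation, not the survfit code used for the analyses, and the follow-up data are invented.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(time, survival)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve

# Hypothetical follow-up times (years) and event indicators.
km_curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

Censored patients contribute to the number at risk until their last follow-up but do not lower the survival estimate themselves.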

In the box-and-whisker plots, the line within each box represents the median. Upper and lower edges of each box represent 75th and 25th percentile, respectively. The whiskers represent the lowest datum still within [1.5 × (75th − 25th percentile)] of the lower quartile and the highest datum still within [1.5 × (75th − 25th percentile)] of the upper quartile.
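
The box and whisker positions defined above can be computed as in this Python sketch. Quartiles are obtained with the inclusive method of the standard library, which may differ slightly from the hinges used by R's boxplot for some sample sizes; the function name and the data are illustrative assumptions.

```python
import statistics

def box_whisker_summary(values, coef=1.5):
    """Median, quartiles, whisker ends, and points beyond the whiskers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - coef * iqr
    high_fence = q3 + coef * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),  # lowest datum within 1.5 x IQR of the lower quartile
        "upper_whisker": max(inside),  # highest datum within 1.5 x IQR of the upper quartile
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical scores with one extreme value.
summary = box_whisker_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
```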

To identify differentially expressed genes between clusters, we used a t test followed by Bonferroni correction of the p value. A strict corrected p value (p < 0.0001) was used to identify differentially expressed genes.
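
The correction and the selection threshold can be illustrated with a few lines of Python. The sketch starts from per-gene p values, so the t test itself is not shown; the gene names and p values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return {gene: min(1.0, p * n_tests) for gene, p in p_values.items()}

def differentially_expressed(p_values, alpha=0.0001):
    """Genes whose Bonferroni-corrected p value is below the strict threshold."""
    corrected = bonferroni(p_values)
    return sorted(gene for gene, p in corrected.items() if p < alpha)

# Hypothetical raw p values from per-gene t tests.
raw_p = {"GENE1": 1e-8, "GENE2": 1e-5, "GENE3": 0.2}
selected = differentially_expressed(raw_p)
```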

NRI and IDI were calculated using the survIDINRI v1.1-1 R package. To assess the 95% CI and p values for the IDI and NRI, a standard bootstrap method was used with resampling performed 500 times. NRI and IDI were assessed at the maximum follow-up time as presented in the Kaplan–Meier survival analysis to assess the improvement in performance of the survival model.

Forest plots were obtained using the forestplot v1.7.2 R package and show the hazard ratios and their 95% CI for the univariate and multivariate analyses. Boxes represent hazard ratios, with box size inversely proportional to the width of the CI; horizontal lines represent the 95% CI.

Correlation plots were generated with the corrplot v0.84 package and visualize Spearman correlations. Only correlations that remain significant after False Discovery Rate correction are shown, colored according to the direction of the rho values. The size of the dots is proportional to the rho value.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.