Introduction

Tumor-infiltrating lymphocytes (TILs) play an important but incompletely understood role in chemotherapy response and prognosis in breast cancer [1, 2]. For more aggressive types of breast cancer, such as HER2+ and triple-negative (TNBC) histopathologic subtypes, a higher TILs score in the primary tumor is associated with improved prognosis and response to therapy, whereas there is little or no association between TILs and outcomes for ER+ disease [3,4,5,6]. ER+ and Luminal disease seem to be less immunogenic, although there is some heterogeneity in immune gene expression [7,8,9].

Data from prior studies indicate that spatial heterogeneity in TILs is associated with breast cancer subtypes and outcomes [10,11,12,13,14]. Specifically, the location of immune cells in intratumoral stroma and/or their proximity to tumor cells [15] appear to correlate with prognosis in ER- tumors [13], TNBC [16], and ER+ tumors [14]. These profiling methods using histologic images showed the importance of studying the spatial heterogeneity of TILs but were limited by the small number of immunohistochemical markers used to estimate TIL abundance.

Larger scale immunoprofiling may elucidate the relationship between specific immune subpopulations and breast tumor subtype. Recent studies have leveraged bulk RNA sequencing to identify clinically-relevant immune phenotypes and evaluate how immune cells mediate chemotherapy response and immune checkpoint blockade [17,18,19,20,21,22,23]. In a gene expression analysis of ~11,000 breast tumors, Ali et al. investigated 22 subsets of immune cells and found that T regulatory cells (Tregs) and M0 and M2 macrophages were most strongly associated with poor outcome, regardless of ER status [23]. Although these studies in bulk tissue have uncovered prognostic immune cells and signatures, they lack resolution on intratumoral heterogeneity and the spatial context of immune cells across the tumor.

Recent studies of biomarkers of response to immunotherapy in melanoma incorporated high-plex immune expression data with spatial information, using the GeoMx® (NanoString) digital spatial profiling (DSP) technology [24,25,26]. They demonstrated that PD-L1 expression in CD68-positive cells was a predictive marker for progression-free survival, overall survival, and response to therapy [24]. Conversely, PD-L1 measured in tumor cells was not prognostic. A recent paper using the same high-plex platform found that elevated expression of HLA-DR in TNBC was associated with long-term disease-free survival. Specifically, HLA-DR protein expression in the epithelial compartment was a better discriminator of outcome than stromal expression of HLA-DR [27]. These findings underscore that both high-plex immune marker expression and tissue context may be important variables in immune-based prognostication.

We sought to evaluate differences in immune biomarker expression, while also considering tissue context, in a population-based study of breast cancer. Specifically, we sought to compare immune marker expression in epithelium-rich and CD45+ (immune cell) infiltrated “hot spots” within the tumor according to breast cancer subtype. We also compared methods for capturing immune response based on whole slides and tissue microarrays (TMA). The resulting data provide technical insights into approaches for immune profiling and point to important immune differences among breast cancer subtypes.

Materials and methods

Study population and samples

The Carolina Breast Cancer Study (CBCS) is a population-based study of African American and Non-African American (98% Caucasian, referred to as White) women from 44 counties of central and eastern North Carolina conducted in three phases (phase I: 1993–1996; phase II: 1996–2001; and phase III: 2008–2013); study details and sampling schema have been described previously [28,29,30,31,32]. Briefly, cases were women ages 20–74 years, diagnosed with a first primary invasive breast cancer, and identified via rapid case ascertainment. Black and younger women (age <50) were oversampled. Race was determined by self-report and categorized as White or Black. Tumor characteristics for cases (e.g., tumor size, grade, hormone receptor (HR) status, node status, and stage) were abstracted from medical records and pathology reports. For this paper, we utilized patient samples from phase III (CBCS3), which recruited women between 2008 and 2013. Patients who provided informed consent completed a baseline questionnaire regarding personal characteristics, including socioeconomics, insurance status, health behaviors, and health history, in addition to the collection of patient tumor tissue, blood, and medical records. IHC for subtype classification was performed in a central laboratory, and designations have been previously described and were defined as follows: Luminal A is ER ≥ 10% or PR ≥ 10% and Ki67 < 7%, Luminal B is ER ≥ 10% or PR ≥ 10% and Ki67 ≥ 7%, HER2+ is ER < 10% and HER2 = 3, and Basal-like is ER < 10%, PR < 10%, and any EGFR or Ki67 positive signal. For Luminal subtypes, if Ki67 was missing, grade was substituted: grade ≤ 2 for Luminal A and grade = 3 for Luminal B [33, 34]. The Ki67 cut-off for the samples we used in the CBCS was developed and reported in Allott et al. [34]. They identified optimal Ki67 thresholds by generating a receiver operating characteristic (ROC) curve among all Luminal tumors regardless of IHC-based HER2 status and applying the Youden method [35] to maximize the sum of the sensitivity and specificity for PAM50-defined Luminal tumors. They identified an optimal Ki67 threshold of 7.1% [34]. The study was approved by the Office of Human Research Ethics/Institutional Review Board at the University of North Carolina at Chapel Hill and was conducted in accordance with the U.S. Common Rule. Written informed consent and HIPAA authorization were obtained from each participant.
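The IHC subtype definitions given above can be written as a small decision rule. The following Python sketch is illustrative only: the function name, argument names, and return labels are hypothetical, and the order in which the rules are checked (hormone receptor status first, then HER2, then Basal-like) is an assumption, since the text does not state how overlapping criteria were resolved.

```python
def ihc_subtype(er_pct, pr_pct, her2_score, ki67_pct=None, grade=None,
                egfr_positive=False):
    """Assign an IHC subtype from the thresholds stated in the text.

    Assumption: hormone-receptor-positive tumors are classified as Luminal
    before the HER2+ and Basal-like rules are considered.
    Returns None when no rule applies (missing subtype call).
    """
    hr_positive = er_pct >= 10 or pr_pct >= 10
    if hr_positive:
        if ki67_pct is not None:
            # Ki67 < 7% is Luminal A; Ki67 >= 7% is Luminal B
            return "Luminal A" if ki67_pct < 7 else "Luminal B"
        if grade is not None:
            # Grade substitutes for missing Ki67: <= 2 is A, 3 is B
            return "Luminal A" if grade <= 2 else "Luminal B"
        return None
    # From here on, ER < 10% and PR < 10%
    if her2_score == 3:
        return "HER2+"
    ki67_signal = ki67_pct is not None and ki67_pct > 0
    if egfr_positive or ki67_signal:
        return "Basal-like"
    return None
```

For example, under these assumptions a tumor with ER 90%, PR 80%, HER2 score 0, and Ki67 5% would be called Luminal A, whereas a tumor with ER 0%, PR 0%, and HER2 score 3 would be called HER2+.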

Three Basal-like and three Luminal A samples were chosen for whole slide DSP. Samples had previously been cored for TMA based on H&E analysis of slides. H&E-stained TMAs were analyzed with the Aperio (Leica Biosystems, Wetzlar, Germany) GENIE algorithm to determine immune cell infiltration. Selected samples had 1% or more immune infiltrate. One unstained, TMA-cored, whole slide per sample was used for the DSP assay. Four TMA slides from CBCS3 were chosen for analysis, encompassing cores from the six tumors that were analyzed on whole slides (described above). These TMAs also included cores from 69 other tumors. Each TMA had 2–4 (with the vast majority having four) cores per patient sample and included 11–27 patient samples.

Digital spatial profiling (DSP)

DSP was performed using the NanoString (Seattle, WA) GeoMx® platform [36]. Immunofluorescence (IF) for pan-Cytokeratin (tumor), CD45 (leukocyte), CD68 (macrophage), and a DNA stain (SYTO 83) were used to visualize tissue compartments and regions of interest (ROIs). DSP analysis included 61 oligonucleotide-conjugated antibodies, including antibodies for negative control IgGs and housekeeping proteins. All IF antibodies and oligo-tagged antibodies were from Abcam (Cambridge, UK). After hybridization of the capture probes to FFPE slide-mounted tissues, the oligo tags were released from the ROIs via targeted ultraviolet radiation exposure and were then counted in a NanoString nCounter assay [36].

For whole slides, the Basal-like and Luminal A slides each had 12 ROIs selected from areas that were either epithelium-rich or immune hot spots based on pan-Cytokeratin and CD45 IF signal, respectively. No regions of uniform CD68 staining were identified, but CD68 staining was often diffusely evident in CD45-positive regions. ROI selection was performed with guidance from a board-certified pathologist. Each slide had four small ROIs (300 µm), four medium ROIs (500 µm), and four large ROIs (650 µm). For TMAs, we used four TMAs, representing a total of 75 patients each with 2–4 1 mm cores. The distribution of subtypes on these TMAs included 5 HER2-positive, 31 Luminal A, 21 Luminal B, 14 Basal-like, and 4 with missing IHC subtype calls [33]. The sample was roughly equally divided by race, with 37 Black patients and 38 non-Black patients. We again stained with immunofluorescent markers for CD45, CD68 and pan-Cytokeratin and selected ROIs with >70% tumor cellularity, resulting in a total of 346 ROIs selected, with 1–3 ROIs per core. ROI sizes ranged from 100 to 650 µm in diameter. ROIs were selected to contain 70% epithelial (pan-Cytokeratin positive) cell content in a 650 µm circular ROI. If that was not possible, a smaller circular ROI (300–100 µm) with 70% tumor cell content was selected.

Data normalization and visualization

RCC files of multiplex data for ROIs were loaded into the DSP app developed by NanoString. Within the app, sample ROIs were visualized and normalized using several normalization options, including positive control normalization (ERCC internal spike-in controls), negative control normalization (mouse and rabbit IgGs), housekeeping control normalization (Ribosomal S6 protein and Histone H3), and ROI/ultraviolet-light mask area normalization. Raw and normalized data were analyzed by unsupervised hierarchical clustering and visualized in a heatmap within the app or exported for further visualization (described below). For the final normalization, digital counts from barcodes corresponding to protein probes were normalized to internal spike-in controls (ERCC) to account for technical variation. Counts were then normalized to ultraviolet-light mask area. The same normalization method was used for TMA analysis.
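The final two-step normalization (ERCC spike-in, then mask area) can be sketched as follows. This is a minimal illustration under stated assumptions: the exact formulas applied by the DSP app are not given in the text, so the sketch assumes that each ROI is rescaled so its ERCC count matches the geometric mean ERCC count across ROIs, and that counts are then rescaled to a common reference area. The function and parameter names are hypothetical.

```python
import math

def normalize_rois(counts, ercc, area_um2, reference_area_um2=None):
    """Scale raw counts by an ERCC factor and an area factor per ROI.

    counts: dict mapping protein name -> list of raw counts (one per ROI)
    ercc: list of ERCC spike-in counts (one per ROI)
    area_um2: list of ultraviolet-light mask areas (one per ROI)
    Assumption: ERCC factor = geometric mean ERCC / ROI ERCC, and
    area factor = reference area / ROI area.
    """
    n = len(ercc)
    ercc_geomean = math.exp(sum(math.log(e) for e in ercc) / n)
    if reference_area_um2 is None:
        # Default reference: mean mask area across ROIs (an assumption)
        reference_area_um2 = sum(area_um2) / n
    factors = [
        (ercc_geomean / e) * (reference_area_um2 / a)
        for e, a in zip(ercc, area_um2)
    ]
    return {
        protein: [c * f for c, f in zip(values, factors)]
        for protein, values in counts.items()
    }
```

Under these assumptions, two ROIs that differ only in size and technical yield receive the same normalized value for a protein whose true density is equal in both.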

All raw and normalized data were visualized using relative log expression plots, or boxplots of log2-transformed protein expression (for all markers, including tumor proteins, immune proteins, and housekeeping proteins). Data were also analyzed by principal component analysis. Visualization of raw and normalized data was performed for all ROIs for both whole slides and TMAs. Visual assessment of normalized data for whole slides and TMAs was similar for all normalization methods, but ERCC and area normalization reduced the technical variation associated with varying ROI size. All data visualization was performed in R version 3.6.1 [37]. After normalization, protein expression values were log2-transformed and median-centered using Cluster 3.0 [38]. Centroid linkage hierarchical clustering of the log2-transformed, median-centered protein expression was also done in Cluster 3.0 and visualized using Java Tree View [39]. Heatmap annotations were added in Adobe Illustrator.
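The log2 transformation, median centering, and relative log expression (RLE) values described above can be illustrated with a short Python sketch. The analyses in this study were run in R and Cluster 3.0; the code below is only a conceptual equivalent, and it assumes that centering is done per protein across ROIs, which the text does not specify. Function names are hypothetical.

```python
import math
from statistics import median

def log2_median_center(expr):
    """Log2-transform and subtract each protein's median across ROIs.

    expr: dict mapping protein name -> list of normalized values per ROI.
    The centered values are also the relative log expression values.
    """
    centered = {}
    for protein, values in expr.items():
        logged = [math.log2(v) for v in values]
        protein_median = median(logged)
        centered[protein] = [x - protein_median for x in logged]
    return centered

def rle_medians(centered):
    """Median relative log expression per ROI, taken across proteins.

    Values far from zero flag ROIs with residual technical variation.
    """
    n_rois = len(next(iter(centered.values())))
    return [
        median(values[i] for values in centered.values())
        for i in range(n_rois)
    ]
```

In an RLE boxplot, each ROI's box summarizes its column of centered values; after adequate normalization the boxes are expected to be centered near zero with similar spread.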

Statistical analyses

Supervised protein expression analyses visualized in volcano plots were performed in R. The subtype comparisons were as follows: Basal versus Luminal A, Basal versus Luminal B, and Luminal A versus Luminal B. First, Wilcoxon p values were calculated for the difference in protein expression for each protein between subtypes. The p values were adjusted using the Benjamini-Hochberg false discovery rate (FDR-BH) procedure, with q < 0.05 marked as significant. The adjusted values were then −log10-transformed. Then, the fold change in protein expression was calculated by taking the average expression of each protein per subtype and subtracting one subtype from the other. A dot plot was made using the −log10 q values and fold change values. For the volcano plots, each data point is represented by a black dot with annotation for markers with q < 0.05 and fold change > 2. Dashed lines indicate the cut-off q < 0.05 and fold change < 2.
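The coordinates of a volcano plot follow directly from the steps above. The Python sketch below is a hedged illustration, not the R code used in this study: it takes per-protein Wilcoxon p values as given, applies the Benjamini-Hochberg adjustment, and pairs each −log10 q value with the difference in mean expression between subtypes. It assumes the subtype means are on the log2 scale; function and key names are hypothetical.

```python
import math

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values (q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def volcano_points(p_values, mean_a, mean_b, q_cutoff=0.05):
    """Return one dict per protein with the plotted coordinates.

    mean_a, mean_b: mean (assumed log2) expression per protein in the
    two subtypes; their difference is the x coordinate.
    """
    q_values = bh_adjust(p_values)
    return [
        {
            "difference": a - b,
            "neg_log10_q": -math.log10(q),
            "significant": q < q_cutoff,
        }
        for q, a, b in zip(q_values, mean_a, mean_b)
    ]
```

For four proteins with p values 0.01, 0.04, 0.03, and 0.20, the adjusted values are 0.04, 0.0533, 0.0533, and 0.20, so only the first would be marked significant at q < 0.05.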

The GeoMx® ImmunoOncology Assay does not provide single-cell co-localization of markers, but many of the immune cell types that were evaluated had more than one marker in the assay. This facilitated the development of cell type scores (using the average or median of normalized expression values of the markers of interest), similar to the approach often used in bulk sequencing or expression analysis [40,41,42,43]. Treg scores were calculated by taking the median of the log2-transformed values for CD4, CD25, and FOXP3 for each ROI [42]. For whole slides, the dataset was separated into immune hot spots and epithelium-enriched ROIs, and the Treg score was calculated for each ROI. For analysis of TMAs, the average of the log2-transformed values for each protein was first calculated from all ROIs available on a given sample. Treg scores were then calculated for each sample. In an exploratory analysis, we used a ROC curve to illustrate the potential ability of the Treg score to distinguish Basal-like from Luminal A tumors. ROC curve analysis was performed in R using the Epi package [44], and sensitivity was calculated. Treg score boxplots were visualized in R using ggplot2 [45], and Wilcoxon p values were calculated for subtype comparisons. For whole slides, immune score and p value were stratified by dataset (immune hot spot or epithelium-enriched region).
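The Treg score calculation can be expressed compactly. The Python sketch below mirrors the two cases described above (per-ROI scores for whole slides, per-sample scores for TMAs); it is a conceptual illustration rather than the R code used in this study, and the function names and input layout are hypothetical.

```python
import math
from statistics import mean, median

TREG_MARKERS = ("CD4", "CD25", "FOXP3")

def treg_score_roi(roi):
    """Whole slides: median of log2 marker values within one ROI.

    roi: dict mapping marker name -> normalized expression value.
    """
    return median([math.log2(roi[m]) for m in TREG_MARKERS])

def treg_score_sample(rois):
    """TMAs: average each marker's log2 value across the sample's ROIs,
    then take the median of the three per-marker averages.

    rois: list of dicts, one per ROI from the same sample.
    """
    per_marker_mean = [
        mean([math.log2(roi[m]) for roi in rois]) for m in TREG_MARKERS
    ]
    return median(per_marker_mean)
```

With three markers, the median is simply the middle value, so the score is robust to a single marker being unusually high or low in a given ROI or sample.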

Quality control assessment

Although immune marker expression patterns were similar in whole slides and TMAs, we formally assessed within-sample agreement in whole slides versus TMAs. We compared four of the six samples used for both whole slides and TMAs, as two samples had only one epithelium-enriched ROI in the whole slides. Images of nine epithelium-enriched ROIs measured on whole slides and on TMAs for a given sample (Sample A) are shown in Supplementary Fig. 1A, B, respectively. We examined the variability in CD45 expression between each ROI and saw that expression was similar on whole slide and TMA (Supplementary Fig. 1C). In Supplementary Fig. 1D, we show a density plot of standard deviation across all 44 stromal and immune markers (Table 1). The standard deviation of expression for each protein was calculated as follows: for each sample, the standard deviation of each protein was calculated across all the ROIs for the sample. Once the standard deviation was calculated for all proteins for all samples, we visualized the distribution of these standard deviations using ggplot2 density plots in R [37]. The distributions of standard deviation obtained from whole slides and TMAs were similar (Supplementary Fig. 1D).
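The per-sample, per-protein standard deviations that feed the density plot can be computed as in the following Python sketch. It is a hedged illustration: the input layout and function name are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import stdev

def protein_sds(samples):
    """Standard deviation of each protein across the ROIs of each sample.

    samples: dict mapping sample id -> dict mapping protein name ->
    list of expression values (one per ROI from that sample).
    Returns a list of (sample id, protein, standard deviation) tuples.
    Proteins with fewer than two ROIs are skipped.
    """
    results = []
    for sample_id, proteins in samples.items():
        for protein, values in proteins.items():
            if len(values) >= 2:
                results.append((sample_id, protein, stdev(values)))
    return results
```

The resulting list, computed once for whole slide ROIs and once for TMA ROIs, gives the two distributions that are compared in the density plot.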

Table 1 Complete list of 44 markers used in NanoString GeoMx® Digital Spatial Profiling (DSP).

Across all four samples, intraclass correlation coefficients (ICCs) were good to excellent among ROIs from a single sample in whole slides (0.837–0.949) and in TMAs (0.844–0.938). Concordance was also excellent between TMAs and slides (0.829–0.916). Our results suggest that efficient immunoprofiling of hundreds of samples using TMAs is feasible, owing to the similarity in expression profiles between whole tissue and TMAs when focusing on epithelium-enriched regions. For calculating the ICC within either whole slide ROIs per sample only or TMA ROIs per sample only, a one-way random effects model was chosen, with single unit measures and assaying agreement. For comparing the average protein expression for a sample in slides versus TMAs, a test-retest approach was used. The one-way random effects model was chosen, with average measures and assaying agreement.
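For reference, the one-way random effects ICC can be computed from the between-row and within-row mean squares. The Python sketch below returns both the single-measure and the average-measure forms. It is a minimal illustration that assumes rows are the targets being rated (for example, proteins) and columns are the repeated measurements (for example, ROIs), with the same number of measurements in every row; that layout is an assumption, not something stated in the text.

```python
def icc_oneway(ratings):
    """One-way random effects ICC for a complete n x k table.

    ratings: list of n rows, each a list of k measurements.
    Returns (single-measure ICC, average-measure ICC).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-row and within-row mean squares from one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average
```

For the small table [[1, 2], [2, 3], [3, 4]], the between-row mean square is 2.0 and the within-row mean square is 0.5, giving a single-measure ICC of 0.6 and an average-measure ICC of 0.75.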

For validation of ROI selection criteria, the expression of two B-cell lineage-specific antigens, CD19 and CD20, was compared. These markers are expressed on the surface of mature B cells and are often used together to measure human B-cell populations [46]. In addition, tumor-infiltrating B cells in breast cancer have been associated with improved clinical outcomes [20, 47], and may provide an interesting avenue of future biomarker study in the CBCS. CD19 expression was analyzed by IF (PA0843, Leica, RTU) on the TMAs used for DSP analysis. The log2-transformed total CD19-positive cell counts were compared to log2-transformed, normalized CD20 values from DSP. The correlation between the expression of the two separate B-cell markers analyzed with different methodologies is shown in Supplementary Fig. 1E (R squared = 0.3244, p < 0.0001). In addition to the CD19 vs. CD20 analysis, we confirmed correlations between the multiplexed protein quantitation and standard IHC for ER, PR, and HER2 (ER, r = 0.978; PR, r = 0.934; HER2, r = 0.993).
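The R squared value for the CD19 versus CD20 comparison is the squared Pearson correlation of the two log2-transformed measurements. A minimal Python sketch is shown below; the function name is hypothetical, and it assumes all inputs are strictly positive, since the handling of zero cell counts before the log2 transformation is not described in the text.

```python
import math

def log2_r_squared(x, y):
    """Squared Pearson correlation between log2(x) and log2(y).

    x, y: equal-length lists of strictly positive values, e.g. CD19-positive
    cell counts and normalized CD20 values for the same samples.
    """
    lx = [math.log2(v) for v in x]
    ly = [math.log2(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    return (sxy * sxy) / (sxx * syy)
```

An R squared of 0.3244 corresponds to a Pearson r of about 0.57 in absolute value, that is, a moderate correlation between the two B-cell markers.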

Results

Whole slide analysis of intratumoral immune marker heterogeneity

To measure differences in immune marker expression on whole slides, we selected six whole tumor slides with evidence of >1% infiltrating immune cells, including three invasive breast cancers of the Basal-like and three of the Luminal A subtype (as defined by central IHC) [33]. A board-certified pathologist (BCC) identified 12 ROIs per slide, including areas heavily infiltrated with CD45-positive cells (i.e., “immune hot spots”) and areas with little evidence of CD45-positive immune infiltration (“epithelium-enriched regions”). Figure 1 shows selected ROIs with CD45 (leukocytes), CD68 (macrophages), and pan-Cytokeratin (tumor cells) IF markers to visualize regions.

Fig. 1: Subtype immune marker heterogeneity is apparent in epithelium-enriched regions versus immune hot spots.

A Representative whole tumor slide stained with CD45 (red), CD68 (yellow), and pan-Cytokeratin (green), in addition to 61 oligo-conjugated antibodies for immune and tumor cell markers. B Regions of interest (ROIs) were selected based on cellularity, large (650 µm), medium (500 µm), and small (300 µm). C Heatmap of protein expression for whole slide dataset with immunohistochemistry (IHC) subtype and cellularity labeled. Protein class clusters are denoted by colored bars and branches, with dark blue denoting stromal proteins, light blue denoting T-cell and immune activation markers, and pink denoting immunosuppressive markers. D, E Volcano plots for Basal-like vs. Luminal A subtypes were run separately for (D) immune hot spot ROIs and (E) epithelium-enriched ROIs. Data points are represented as black dots with annotation for markers with q < 0.05 and fold change > 2. Dashed lines indicate the cut-off q < 0.05 and fold change < 2.

Using NanoString GeoMx® DSP, 44 targets for immune and stromal markers (Table 1), with an additional 11 tumor and proliferation markers, were measured. After normalization to control for ROI size, unsupervised hierarchical clustering was performed and heatmap visualization revealed two main clusters of protein expression, largely split by whether CD45/CD68 levels were high (indicating immune “hot spot”) or pan-Cytokeratin levels were high (indicating epithelium enrichment) (Fig. 1C).

To assess whether there were differences in immune profile by breast cancer subtype, we performed supervised analysis. Within immune hot spots, there were few significant differences between Basal-like and Luminal A tumors (Fig. 1D), whereas in epithelium-enriched regions, there were substantial differences in immune marker expression by subtype (Fig. 1E). In epithelium-enriched regions, suppressive immune markers and markers of proliferation were significantly more highly expressed in Basal-like tumors, whereas in immune hot spots, only a single anti-apoptotic marker (Bcl-2) was more highly expressed in Basal-like tumors.

Many of the immune cell types that were evaluated had multiple markers in the GeoMx® ImmunoOncology Assay, facilitating the development of cell type scores to estimate the abundance of immune cell populations in the tumor microenvironment. We assessed various immune cell scores based on the median expression across multiple, related immune markers. For example, the median expression of CD8 and Granzyme B (GZMB) was used to create a CD8+ T-cell score. Within immune hot spots, there were no significant differences in the value of scores for CD8+ T cells, B cells, macrophages, dendritic cells, and Tregs across tumor subtypes. However, within epithelium-enriched regions, Treg scores (based on CD4, CD25, and FOXP3 protein expression) were significantly higher in Basal-like samples compared to Luminal A tumors (Fig. 2A).

Fig. 2: Treg marker expression is higher in Basal-like tumors.

A T regulatory (Treg) signatures were higher in Basal-like tumors compared to Luminal A tumors only in epithelium-enriched (p = 0.02) areas but not in immune-high regions (p = 0.86). B Higher Treg signature expression in Basal-like tumors in tissue microarrays (TMAs) (p = 0.0078). C Receiver operating characteristic (ROC) analysis shows 92.9% sensitivity in Basal-like versus Luminal A classification based on Treg signature expression. D Significant differences in immune marker expression between Luminal A and Basal-like tumors in TMAs. Volcano plot of Basal-like versus Luminal A. Data points are represented as black dots with annotation for markers with q < 0.05 and fold change > 2. Dashed lines indicate the cut-off q < 0.05 and fold change < 2.

Scaling measurement of epithelium-enriched immune expression to TMAs

To confirm these patterns within a larger dataset, we extended the DSP technology to TMAs. Comparing 14 Basal-like to 31 Luminal A cases, we identified immune profiles associated with each subtype. Similar to the findings with the whole slides, the Treg multi-marker score was significantly higher in Basal-like versus Luminal A tumors (Fig. 2B). Furthermore, the Treg score discriminated between Basal-like and Luminal A cases with 92.9% sensitivity in a ROC analysis (Fig. 2C). We next performed supervised analysis of immune marker expression by subtype and found significant upregulation of several other immune markers in Basal-like versus Luminal A (Fig. 2D). While the emphasis of this study was on identifying markers for Basal-like immune response, we also explored qualitative differences in immune marker expression between Luminal A and Luminal B tumors and high risk of recurrence vs. low risk of recurrence (ROR-P) (Supplementary Fig. 2), though sample size and power were limited. We observed a tendency towards higher immune marker expression in Luminal B and high risk of recurrence tumors.

Discussion

Studying immune marker expression in both whole slides and TMAs, we found that epithelium-enriched regions show significant differences in immune marker expression by breast cancer subtype. Compared to Luminal A tumors, Basal-like breast cancers have higher expression of Treg markers, as well as a number of other immune and proliferation markers. In contrast, the ability to detect subtype-specific immune marker expression appeared to be obscured in areas with overall high immune marker expression. This suggests that in bulk tissue studies, some subtype-specific differences may be obscured due to the complex patterns of immune response in the whole section.

Our results corroborate a number of studies showing that Tregs are more highly infiltrated in triple-negative and Basal-like breast cancer [48,49,50,51,52,53,54]. Interestingly, we did not observe Treg differences in immune hot spots. This may provide an explanation for null findings and/or lack of focus on Tregs in previous bulk tissue studies of breast cancer, which have instead focused on CD8+ T cells and macrophages [8, 23, 55]. Treg differences may have been missed in those studies if hot spots and epithelium-rich sections were averaged. To contextualize our finding about the importance of Tregs in Basal-like cancers, we note that infiltrating Tregs, marked by their expression of Forkhead box protein P3 (FOXP3), have been reported to be prognostic in breast cancer [56]. Specifically, the presence of Tregs is a poor prognostic indicator in ER+ breast cancer, but a favorable prognostic factor in HER2+/ER− disease [56].

There were several strengths of this study, both technical and substantive. First, our data preserved spatial context and simultaneously evaluated multiple immune biomarkers. Interestingly, both context and multiplexing may be important; the combination of Treg markers as a score showed important subtype-specific differences, even though not all the individual markers for Tregs were significantly different. These data help validate a novel method and identify sampling strategies for future immunoprofiling of breast cancers. Second, we also technically confirmed correlations between the multiplexed protein quantitation and standard IHC for ER, PR, and HER2 and evaluated intratumoral heterogeneity to inform our sampling strategy. We were able to compare whole slides and TMAs from the same breast cancers and confirm that TMAs are appropriate for studying the tumor immune microenvironment. Our results also align with at least one other study of breast cancer [27, 57] in showing that epithelium-enriched regions carry important immune response information. Third, we used a well-annotated data source representing a diversity of patients. Finally, despite a limited sample size, we were able to show significant differences in Basal-like vs. Luminal breast cancer, implying that future studies with larger datasets will provide even greater understanding of differences.

There were some limitations of our analysis. We recognize that this approach does not provide evidence of marker co-localization, which is partially offset by using multiple markers [58,59,60,61,62,63] (e.g., Treg markers CD4, CD25, and FOXP3) as a proxy for a single cell type. We also were unable to compare RNA-based molecular subtypes because data were not available for 35% of cases on our TMAs. We also lacked power to assess differences by race and age. Future expanded analyses should evaluate differences in immune response by race and in association with breast cancer progression. Previous studies in The Cancer Genome Atlas did not find strong differences in the breast cancer immune microenvironment by race, but this research was also limited by a small number of Black patients [64]. We performed ROC curve analysis to test whether an immune score, such as the Treg score, has value in predicting subtype, but we acknowledge that our sample size was limited and therefore these analyses may not yield stable estimates. Finally, we acknowledge that we were unable to use ROI masking effectively. Previous studies have used masking to profile immune hot spots [24, 25]; however, this type of masking is impractical in breast tumors, where regions of immune infiltration tend to be segregated by stromal or epithelial components.

In summary, this work demonstrated that TMAs are a viable approach for immunoprofiling, providing resolution that would be missed in single-marker studies and in bulk tissue studies. This finding is particularly impactful for large cancer epidemiological studies, which often have scarce whole slide tissue and/or only retained TMAs with epithelium-enriched cores. This work represents a first step toward developing feasible immunoprofiling approaches that could be conducted on a large scale, and ultimately combined with genomic datasets, to improve breast cancer prognostics and identify new therapeutic opportunities.