Introduction

Over the past decade, there has been an increasing awareness of the key roles played by intratumor heterogeneity (ITH) and the tumor microenvironment (TME) in breast cancer1,2,3. Thus, there is a pressing need to gain a better understanding of the role played by molecular variation in the development and progression of tumors. Recently developed technologies that permit the detailed characterization of complex spatial relationships among tumor, immune, and stromal cells at single-cell resolution hold substantial potential for providing critical insight into the TME, which may help identify opportunities to improve clinical care. Multiplexed tissue imaging methods address these needs by building upon the extensive experience gained over many years by pathologists using immunohistochemistry (IHC). The routine assessment of estrogen receptor (ER) and progesterone receptor (PR) levels using IHC has established them as critical prognostic markers and strong predictors of response to endocrine therapy4,5. Similarly, human epidermal growth factor receptor 2 (HER2) expression helps identify patients who are more likely to respond to anti-HER2-targeted therapy. IHC is commonly used for HER2 protein expression analysis, and fluorescence in situ hybridization (FISH) serves as a complementary approach to confirm HER2 gene amplification. The detection of ER, PR, and HER2 by IHC has been instrumental in determining appropriate therapeutic approaches for breast cancer patients. However, more advanced quantification methods and single-cell analysis have the potential to further refine and personalize treatment strategies.

Previous studies have extensively documented the ITH of ER, PR, and HER2 expression using IHC6. However, a comprehensive characterization of the expression of these markers at the single-cell level has not yet been performed. In standard pathology practice, ER and PR IHC are scored at the whole tissue level, and the percentage of immunoreactive tumor nuclei is reported using a semiquantitative scoring system which categorizes samples as positive (≥10% of nuclei immunoreactive), low positive (≥1% to <10% of nuclei immunoreactive) or negative (<1% nuclei immunoreactive). Assessing HER2 expression involves a more complex scoring process that considers the intensity of immunoreactivity, the extent of membranous signal (partial or complete), and the proportion of positive cells. Along with semiquantitative scoring of HER2 expression (0, 1+, 2+, or 3+), HER2 FISH is utilized in most institutions to analyze equivocal samples (scored as 2+) following the guidelines set by the American Society of Clinical Oncology (ASCO)/College of American Pathologists (CAP)7. HER2 amplification, determined by FISH, is defined as a HER2/CEP17 ratio greater than 2.0 with an average HER2 copy number greater than 4.0 using a dual probe system or an average HER2 copy number of greater than 6.0 HER2 signals/cell using a single probe system. Around 15–20% of breast cancer cases are identified as HER2+ based on protein overexpression and/or gene amplification. Prior to the development of HER2-targeted therapies, HER2 positivity was associated with a poor prognosis8,9,10. Now, HER2 protein overexpression is the primary predictor of responsiveness to HER2-targeted therapies in breast cancer. However, despite the careful patient selection using IHC/FISH and the availability of contemporary HER2-targeted therapies, pathological complete response (pCR) is only observed in 30–56% of HER2+ patients receiving preoperative therapy11,12,13,14,15,16. 
Moreover, primary and acquired clinical resistance to these therapies has been increasingly reported17. Differences in pCR rates are partly associated with the hormone receptor (HR) status, where patients with HR+/HER2+ tumors are less likely to experience pCR compared to HR−/HER2+ tumors16,18. Importantly, even in tumors designated as HER2 3+ by IHC, not all cancer cells show high-level HER2 expression19,20, suggesting that HER2 heterogeneity may provide insights into therapeutic response.
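The scoring cut-offs quoted above can be written out as a short sketch. The function names are hypothetical, and the thresholds are only those stated in this section; this is an illustration of the categories, not a clinical implementation.

```python
def hormone_receptor_category(percent_positive_nuclei):
    """Categorize ER or PR IHC by the percent of immunoreactive tumor nuclei."""
    if percent_positive_nuclei >= 10:
        return "positive"
    if percent_positive_nuclei >= 1:
        return "low positive"
    return "negative"


def her2_fish_amplified(mean_her2_copies, her2_cep17_ratio=None):
    """Apply the FISH cut-offs quoted in the text.

    Dual probe: HER2/CEP17 ratio > 2.0 with a mean HER2 copy number > 4.0.
    Single probe (no ratio supplied): mean HER2 copy number > 6.0 signals/cell.
    """
    if her2_cep17_ratio is not None:
        return her2_cep17_ratio > 2.0 and mean_her2_copies > 4.0
    return mean_her2_copies > 6.0
```

For example, a tumor with 5% immunoreactive nuclei falls in the low positive category, and a dual-probe result with a ratio of 2.3 and 4.5 mean copies meets the amplification definition.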

HER2 ITH has been well documented in breast cancer21. HER2 overexpression and amplification can present a heterogeneous pattern, including HER2-positive and HER2-negative tumor cell subpopulations occurring within the same tumor20,22. Distinct patterns of cells with heterogeneous HER2 status include the “clustered” type, featuring two topographically distinct clones of tumor cells, one harboring HER2 amplification and the other with normal HER2 status; the “mosaic” type, displaying diffuse intermingling of cells with different HER2 statuses; and the “scattered” type, with isolated HER2-amplified cells in a HER2-negative tumor cell population23,24,25. According to the 2009 ASCO/CAP guidelines, HER2 genetic heterogeneity is defined as the presence of ≥5% to <50% of infiltrating tumor cells with a ratio ≥2.2 when using dual probes or ≥6 HER2 signals/cell using single probes26. Preclinical murine models of mixed HER2-expressing tumor cells have revealed that HER2 heterogeneity impacts response to anti-HER2 antibody therapy27,28. This may be explained in part because heterogeneity in HER2 expression may lead to variation in the cell cycle properties of tumors29. Clinically, the percentage of HER2-positive cells within the tumor, as well as IHC scores, correlate with response to anti-HER2 therapy11,20. Indeed, heterogeneous HER2 expression is correlated with a high risk of relapse and resistance to chemotherapy and Trastuzumab in patients with HER2-positive breast cancer27. In a clinical trial that enrolled confirmed HER2-positive patients, HER2 ITH was assessed by central pathology review and defined as either: (1) HER2 positivity by ISH in >5% and <50% of tumor cells (i.e., CAP guideline) or (2) an area of the tumor that tested HER2 negative in at least one of the six areas evaluated per tumor30. 
HER2 ITH was determined to be a strong predictor of resistance to a dual-HER2-targeted therapy regimen (T-DM1 plus Pertuzumab), with no patients with cancers classified as heterogeneous experiencing a pCR30. This effect was also evident in subgroup analysis by HR status30. These data further support hormone receptor status as a possible driver of ITH in HER2+ breast cancer31,32. The infiltration of TILs has been shown to be inversely correlated with HR expression33, suggesting varied immune activity in HR+/HER2+ versus HR−/HER2+ cancers as contributing to differential response to HER2-targeted therapy34.
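As a worked illustration of the 2009 ASCO/CAP dual-probe definition of genetic heterogeneity quoted above, the following sketch counts the fraction of cells at or above the ratio cut-off and checks whether it falls in the ≥5% to <50% window. The function name and the per-cell input format are assumptions made for illustration.

```python
def her2_genetic_heterogeneity(cell_ratios, ratio_cutoff=2.2):
    """Return (fraction of amplified cells, heterogeneous?) for one tumor.

    cell_ratios: per-cell HER2/CEP17 ratios from dual-probe ISH (hypothetical input).
    A tumor is called heterogeneous when >=5% and <50% of cells reach the cut-off.
    """
    amplified = sum(1 for ratio in cell_ratios if ratio >= ratio_cutoff)
    fraction = amplified / len(cell_ratios)
    return fraction, 0.05 <= fraction < 0.50
```

A tumor in which 10 of 100 scored cells reach the cut-off would be flagged as heterogeneous, whereas one with 60 of 100 amplified cells would not.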

Methods to assess HER2 heterogeneity at a single-cell level across large populations of tumor cells may provide important information beyond the data from routine clinical IHC. Here, we used cyclic immunofluorescence (CyCIF), a microscopy platform for multiplex tissue imaging, to evaluate HER2 expression in a cohort of HER2-enriched tumors. With CyCIF, iterative four-channel imaging is performed (with each cycle involving different antibodies directly conjugated to fluorophores) from a single section of a formalin-fixed paraffin-embedded (FFPE) tumor specimen allowing the acquisition of data on 60 or more different antigens35,36,37. Images are then registered and stitched to generate a composite representation that is used for visualization and analysis38,39. Because CyCIF permits imaging across an entire tissue section, it is an appropriate method for evaluating the tumor and immune heterogeneity present in tumors and biopsies35,36,37,40. Given the ability of CyCIF to enable single-cell imaging analysis, we hypothesized that CyCIF imaging would support a better understanding of breast ITH. As with most new technologies that utilize immunostaining, appropriate antibody validation is key to reliable performance. Therefore, in this study, we first evaluated multiple commercially available fluorophore-conjugated antibodies directed against proteins commonly used to characterize breast carcinomas, including ER, PR, HER2, androgen receptor (AR), and p53. After assembling a qualified panel of antibodies, we performed single-cell multiplexed tissue imaging and analysis and identified tumor cell clusters that were associated with distinct clinical features, including heterogeneous HER2 expression. Single marker expression of HER2 ITH correlated with clinical outcome as previously described. 
Further, by using multiple tumor and immune markers, we derived heterogeneity scores and demonstrated that high heterogeneity measured through single-cell analysis may have predictive value for patients with poorer clinical outcomes.

Results

Qualifying antibodies for CyCIF

Routine clinical testing of ER, PR, and HER2 is conducted in CLIA-certified laboratories that must document proficiency against pre-established criteria41,42. Recognizing the importance of having concordance between the results obtained from clinical testing and multiplexed tissue imaging, we first focused on testing the performance of multiple antibody clones against ER, PR, HER2, AR, and p53. To qualify these antibodies for use in CyCIF, we used a quantitative approach recently developed for assembling antibody panels for multiplexed tissue imaging assays (Fig. 1a)43. Single FFPE sections of human tissue were stained with 2 to 5 different commercially available, fluorophore-conjugated antibodies against the same protein target (Table 1), and the signal intensity from the different clones was compared. The performance of fluorophore-conjugated antibodies was evaluated against the clinical-grade antibodies used in practice by the Pathology Department at Brigham and Women’s Hospital (BWH)44.

Fig. 1: Overview of fluorescent CyCIF antibody qualification against antibodies used in the clinical laboratory.

To qualify breast cancer-related antibodies HER2, ER, PR, AR, and p53, the BC03 tissue microarray (TMA), which represents 16 breast tumors in duplicate, was used. Multiple CyCIF antibodies were compared to a single antibody commonly used in clinical practice as a reference. a Schematic representation of the different levels of fluorescent antibody validation using the CyCIF method, starting from tissue staining (lowest level of validation) towards patient-level (highest level) inter-assay comparison (i.e., direct comparison of each patient tissue to itself between assays). b–f Representative CyCIF images of antibodies tested by CyCIF on the BC03 TMA. Asterisks indicate clinical antibodies (*) and qualified CyCIF antibodies (**) for each target. g Representative CyCIF image of HER2 (TF-MA5-14509; sp3) and ER (CS98710) staining, showing the majority of tumor cells are ER+, and some showing strong, membrane staining for HER2. Left image is a full TMA core (36× mag.); the right image corresponds to the left image (74× mag.).

Table 1 Clinical and CyCIF antibodies used in the study.

Antibody testing was initially performed using a commercial tissue microarray (TMA; BC03), which included 32 samples, representing breast tumors from 16 patients arrayed in duplicate. CyCIF and corresponding clinical antibodies were applied to the same FFPE tissue following antigen retrieval using the standard CyCIF protocol as previously described35,36,37. Typically, CyCIF can accommodate unconjugated antibodies from different species (or isotypes) in the first cycle of staining, which are subsequently detected by indirect immunofluorescence using secondary antibodies conjugated to fluorophores. The clinical antibodies, which are often only available in unconjugated formulations, were therefore applied in the first cycle in unconjugated form. Fluorophore-conjugated CyCIF antibodies (i.e., primary antibodies conjugated directly to fluorophores) were used in subsequent cycles. Tables 1 and 2 detail the fluorophore-conjugated antibodies (referred to as “CyCIF antibodies”) used in the antibody qualification phase of this study. The clinical and CyCIF antibodies displayed expected staining patterns by CyCIF imaging when assessed by visual inspection (Fig. 1b–g), except for the clinical-grade AR antibody, which underperformed in the CyCIF assay compared to the CyCIF antibodies throughout the project (Fig. 1e).

Table 2 Antibody staining panels used for BC03 TMA.

The CyCIF antibodies were next assessed against the clinical antibodies at multiple levels of analysis (Fig. 1a), including at the pixel level (pixel-by-pixel comparison; Supplementary Fig. 1) and at the per-cell level (cell-to-cell comparison; Supplementary Fig. 2). After we had selected a single high-performing CyCIF antibody for each of the targets (ones that performed at least as well as the clinical-grade antibody in the pixel- and cell-level comparisons), we then assessed the signal intensity values acquired at the level of individual tissue cores (sample-to-sample level comparisons; Fig. 2, Supplementary Fig. 3). In addition, inter-assay comparisons of antibody performance between CyCIF and IHC (Fig. 3, Supplementary Fig. 4) and between CyCIF and HER2 FISH (in HER2-positive breast tumors) were performed to provide orthogonal qualification (Fig. 3).

Fig. 2: Core-to-core comparison of clinical and CyCIF antibodies against ER, PR, and HER2.

To qualify breast cancer-related antibodies, the BC03 TMA, representing 16 breast tumors in duplicate, was used. a–c CyCIF was performed using the qualified CyCIF antibody against a single antibody commonly used in clinical practice as a reference for ER (a), PR (b), and HER2 (c). The left graph depicts a single-cell dot-plot between the clinical clone on the x axis and the validated CyCIF antibody on the y axis. Each dot represents single-cell fluorescent intensity values from the two antibodies. Dashed lines indicate the gating cutoffs. The middle graph shows the corresponding mean log intensity of the core-to-core analysis of the clinical and CyCIF antibodies. The single-cell data were collected for individual TMA cores, with a binary gate applied to obtain the positive signal of each core (range from 0–1). The x and y axes represent the positive score calculated from either clinical or CyCIF antibodies, respectively. The right graph shows positivity scores (number of positive cells over total cells) for the clinical and CyCIF antibodies by TMA case. d, e Cross-assay comparison of the clinical and CyCIF antibodies analyzed by CyCIF compared to the clinical antibody analyzed by IHC using Aperio software for ER (d) and HER2 (e). Left, dot-plot representation of two different scores obtained from CyCIF and from IHC-Aperio. CyCIF of clinical (green dots) and CyCIF antibodies (blue dots) were used on the same section, while IHC was performed on a different section from the same TMA block. Each dot represents a single core from BC03 TMA. CyCIF scores are plotted on y axis as positive ratio of immunofluorescence, IHC scores on x axis are plotted as the percent of positive cells. Right graph, quantitative assessment of ER and HER2 IHC versus CyCIF staining. IHC scores by Aperio were used to stratify (0–24, 25–49, 50–74, 75–100) different TMA cores/cases, and the mean intensities of CyCIF antibody staining from each TMA core are shown using boxplot analysis. 
CyCIF antibodies: ER (CST 74244 S) and HER2 (ab225510).

Fig. 3: Inter-assay analysis of HER2 enriched TMAs (TMAs 226 and 227).

Following the selection of qualified ER, PR, and HER2 antibodies, two HER2-enriched TMAs, which included 567 tissue cores (representing 189 patients in triplicate), were used to further qualify CyCIF antibodies. a, b Percent of ER+ and HER2+ cells assessed through CyCIF (y axis) is compared to the score assigned by a clinical pathologist (x axis) for each TMA. c Cross-assay comparison of the HER2 clinical and CyCIF antibodies analyzed by CyCIF compared to the clinical antibody analyzed by IHC using Aperio software. Left, dot-plot represents two different scores obtained from CyCIF and one obtained from IHC-Aperio. CyCIF of clinical (green dots) and CyCIF (blue dots) antibodies were used on the same section, while IHC was done on a different section from the same TMA block. Each dot represents a single core from BC03 TMA. CyCIF scores are plotted on y axis as positive ratio of immunofluorescence, IHC scores on x axis are plotted as percent of positive cells. Right, quantitative assessment of HER2 IHC versus CyCIF staining. IHC scores by Aperio were used to stratify (0–24, 25–49, 50–74, 75–100) different TMA cores/cases, and the mean intensities of CyCIF antibody staining from each TMA core are shown using boxplot analysis. d Clinically annotated HER2 FISH scores against IF/CyCIF staining using the SP3 antibody (Pearson r = 0.71) and HER2 FISH scores against IF/CyCIF staining using the CyCIF antibody, ab225510 (Pearson r = 0.65). Individual patients are shown in different colors, in triplicate. The triplicate cores tend to cluster together, indicating minimal variation.

The pixel-level analysis involved computing fluorescence intensity values for each antibody at a single pixel resolution and then performing a pixel-to-pixel correlation between the antibodies of the same target. This analysis revealed strong concordance between most CyCIF antibodies and their corresponding clinical antibody. Random sampling of 5000 pixels from 32 samples revealed Pearson correlation coefficients generally ranging from 0.70 to 0.97 (Supplementary Fig. 1). As expected, the DNA/Hoechst signal was not correlated with the epitope-specific signal generated by the antibodies (Supplementary Fig. 1). The pixel-level data of fluorescent intensity also allowed us to evaluate the dynamic range for each antibody revealing that most antibodies could capture and discriminate both low- and high-expressing cells (Supplementary Fig. 1, box plots). While most CyCIF antibodies performed well, some had poor correlation to other antibodies, including the clinical antibody. For example, the HER2 CyCIF ab4 had suboptimal performance compared to the clinical antibody, as demonstrated by a narrow dynamic range and lower sensitivity (Supplementary Fig. 1C, D). DNA/Hoechst was used as a reference and showed a wide dynamic range, as expected.
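The pixel-to-pixel comparison described above can be sketched as a Pearson correlation computed on a random subset of co-registered pixels. This is a minimal illustration using only the Python standard library; the function names, the flat-list image representation, and the synthetic data are assumptions and do not reproduce the study pipeline.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sampled_pixel_correlation(channel_a, channel_b, n_pixels=5000, seed=0):
    """Correlate two co-registered channels on a random subset of pixels."""
    rng = random.Random(seed)
    n = min(n_pixels, len(channel_a))
    idx = rng.sample(range(len(channel_a)), n)
    return pearson_r([channel_a[i] for i in idx], [channel_b[i] for i in idx])


# Synthetic demonstration data (not from the study): channel_b tracks channel_a with noise.
rng = random.Random(1)
channel_a = [rng.uniform(0, 1000) for _ in range(20000)]
channel_b = [0.8 * a + rng.gauss(0, 50) for a in channel_a]
r = sampled_pixel_correlation(channel_a, channel_b)
```

Two antibodies that recognize the same target should give a coefficient close to 1 on such a sample, whereas an unrelated channel such as DNA/Hoechst would not be expected to.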

Multi-channel whole slide imaging data is typically segmented to identify single cells, and the staining intensity in each channel is computed on a per-cell basis38. Therefore, we next performed cell-to-cell comparisons of the signal acquired from the clinical antibody for each target to each of the CyCIF antibodies (Supplementary Fig. 2A–C, image on the left). Briefly, cells were segmented as described in the methods, and 5000 random cells were computationally isolated and analyzed from the 32 samples. Similar to the pixel-level comparisons, the cell-to-cell analysis revealed that the signal generated by most CyCIF and clinical antibodies was highly correlated (Supplementary Fig. 2, middle plot, intensity of each cell is plotted in log scale) and demonstrated a wide dynamic range indicating that these antibodies could detect both cells with low and high antigen expression (Supplementary Fig. 2A–E, boxplot on the right). The HER2 CyCIF ab4 that had not performed well in the pixel analysis similarly performed poorly in the cell-to-cell analysis with a narrow dynamic range and lower correlation coefficient with the clinical antibody compared to the correlation coefficient of other CyCIF antibodies versus the clinical antibody (Supplementary Fig. 2C).

Testing qualified CyCIF antibodies

Top performing CyCIF antibodies were identified based on the highest correlation with the clinical antibody and other CyCIF antibodies, highest performance in signal-to-noise ratio assessment, wide dynamic range, and best overall performance upon visual inspection (Table 1). The performance of the selected CyCIF antibodies was then tested again against the clinical antibodies. The BC03 TMA was stained with both the qualified CyCIF panel and clinical antibodies, and sample-level analysis was performed (core-to-core comparisons). The single-cell data were collected for individual TMA cores, and the mean log intensity of the signal for each antibody was used to calculate correlations. These analyses revealed concordance with R values of 0.91 for ER and 0.94 for HER2 between the clinical and CyCIF antibodies (Fig. 2a–c and Supplementary Fig. 3A, B, middle plot). Of note, the clinical PR antibody (PgR636) was less sensitive than the conjugated CyCIF PR antibody, resulting in a minor discrepancy in the correlation between cores, likely because the CyCIF antibody identified more PR+ cells. After binary gating using a 2-component Gaussian Mixture Model (GMM), there was excellent core-to-core correlation between the positivity ratio (the number of positive cells divided by total cells of each core; ranging from 0 to 1) for the ER, PR, HER2, and p53 antibodies (Fig. 2a–c, Supplementary Fig. 3, far right graphs). A poor correlation was observed, however, for the AR antibodies due to the poor performance of the clinical-grade AR antibody in the CyCIF assay (Supplementary Fig. 3A). This can be explained given that the clinical antibody was selected for clinical testing based on its performance in IHC, which uses a protocol that differs from the CyCIF protocol. Indeed, we confirmed that the clinical AR antibody performed as expected by IHC (Supplementary Fig. 3C) but failed to work well in CyCIF due to a high background signal (Supplementary Fig. 3d, ab1).
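The binary gating step can be illustrated with a small two-component Gaussian mixture fitted by expectation-maximization, followed by the positivity ratio for one core. This is a sketch under stated assumptions: the initialization, iteration count, function names, and synthetic intensities are illustrative choices, and the source does not specify the GMM implementation or settings that were actually used.

```python
import math
import random


def weighted_density(v, weight, mean, variance):
    """Weighted Gaussian density of one mixture component at value v."""
    return weight * math.exp(-((v - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def fit_two_component_gmm(values, n_iter=100):
    """Fit a 1D two-component Gaussian mixture by EM (needs at least two values)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Initialize the two means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (len(ordered) - half)]
    grand_mean = sum(values) / len(values)
    grand_var = max(sum((v - grand_mean) ** 2 for v in values) / len(values), 1e-6)
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to component 1.
        resp = []
        for v in values:
            d0 = weighted_density(v, weights[0], means[0], variances[0])
            d1 = weighted_density(v, weights[1], means[1], variances[1])
            resp.append(d1 / (d0 + d1) if d0 + d1 > 0 else 0.5)
        # M-step: re-estimate the weight, mean, and variance of each component.
        for k in (0, 1):
            r_k = [r if k == 1 else 1.0 - r for r in resp]
            n_k = max(sum(r_k), 1e-12)
            weights[k] = n_k / len(values)
            means[k] = sum(r * v for r, v in zip(r_k, values)) / n_k
            variances[k] = max(sum(r * (v - means[k]) ** 2 for r, v in zip(r_k, values)) / n_k, 1e-6)
    return weights, means, variances


def positivity_ratio(log_intensities):
    """Fraction of cells assigned to the higher-mean (positive) component."""
    weights, means, variances = fit_two_component_gmm(log_intensities)
    pos = 0 if means[0] > means[1] else 1
    neg = 1 - pos
    n_positive = sum(
        1 for v in log_intensities
        if weighted_density(v, weights[pos], means[pos], variances[pos])
        > weighted_density(v, weights[neg], means[neg], variances[neg])
    )
    return n_positive / len(log_intensities)


# Synthetic log intensities for one core (not study data): 700 negative, 300 positive cells.
rng = random.Random(0)
core = [rng.gauss(2.0, 0.3) for _ in range(700)] + [rng.gauss(5.0, 0.3) for _ in range(300)]
ratio = positivity_ratio(core)
```

When the two components have very different variances, a posterior comparison of this kind can misassign values in the far tails; gating on a single intensity threshold placed between the two component means is a common alternative.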

In the initial evaluation, clinical-grade antibodies had been used as unconjugated reagents in the CyCIF assay. In the subsequent validation step, we compared the performance of the CyCIF antibodies against the clinical-grade antibodies used in standard IHC (i.e., cross-assay comparison between CyCIF and IHC). For this comparison, CyCIF was performed on single FFPE sections from TMA BC03 using both the CyCIF and clinical antibodies, and IHC was performed in the BWH Pathology Department Laboratory using the clinical antibodies on a serial section from the same TMA (Supplementary Fig. 4A). The IHC using the clinical antibodies was scored in two different ways: (i) using Aperio digital pathology software (recorded as percent positive cells) and (ii) by microscopic inspection by two pathologists (according to a clinical scoring schema). The Aperio IHC score of the clinical antibody was then compared to the positive ratio of the two different antibodies (the CyCIF and the clinical antibodies) as measured by CyCIF (Fig. 2d, e; Supplementary Fig. 4B, C). The Aperio IHC scores (% positive cells) from the clinical antibodies are shown on the x-axis and are plotted in two ways: (i) against itself in the CyCIF assay (green dots) and (ii) against the CyCIF antibody (blue dots). The clinical antibody IHC scores (x-axis) by Aperio were used to stratify TMA cores/cases, and the mean intensities of CyCIF staining of both the clinical and CyCIF antibodies (y-axis) from each TMA core are shown using boxplot analysis (Fig. 2d, e and Supplementary Fig. 4B, C, boxplot). ER and HER2 scoring of the CyCIF data had a high correlation (clinical antibody vs. CyCIF antibody) with Aperio IHC scoring (Fig. 2d, e). As expected, the clinical AR antibody by IHC was not correlated to itself when used in the CyCIF assay (green dots) but the clinical IHC analysis demonstrated a high correlation to the CyCIF AR antibody (blue; r = 0.74; Supplementary Fig. 4B) supporting the use of the CyCIF AR antibody. We also found high correlation between the clinical-grade p53 antibody and a CyCIF p53 antibody on core-level analysis (Supplementary Fig. 4C). Across the study we found that ‘mean fluorescence intensity’ (rather than the positive ratio via CyCIF) correlates better with the Aperio IHC score. This may in part be because Aperio scoring reflects mean expression across cells in the tissue.
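The stratification used for the boxplot analysis can be sketched as follows: each core is assigned to one of the four Aperio percent-positive strata (0–24, 25–49, 50–74, 75–100), and the per-core CyCIF mean intensities are collected by stratum. The function names and the (percent, intensity) input format are assumptions made for illustration.

```python
from collections import defaultdict


def ihc_stratum(percent_positive):
    """Return the Aperio percent-positive stratum label for one core."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive < 25:
        return "0-24"
    if percent_positive < 50:
        return "25-49"
    if percent_positive < 75:
        return "50-74"
    return "75-100"


def group_cores_by_ihc_stratum(cores):
    """Group CyCIF mean intensities by IHC stratum.

    cores: iterable of (ihc_percent_positive, cycif_mean_intensity) pairs,
    one pair per TMA core (hypothetical input format).
    """
    grouped = defaultdict(list)
    for percent_positive, mean_intensity in cores:
        grouped[ihc_stratum(percent_positive)].append(mean_intensity)
    return dict(grouped)
```

Each list in the returned dictionary corresponds to the values that would be summarized by one box in such a plot.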

The correlation of CyCIF to semiquantitative scoring of the IHC by two pathologists was then assessed. Scores from two pathologists for ER and HER2 IHC were highly correlated with the Aperio IHC scoring (Supplementary Fig. 4D–H). Our analysis of the TMA cores revealed some discrepancies with the results available from the vendor of the TMA, which may be attributable to the fact that the vendor scoring was not performed on immediate serial sections and no information was provided regarding the antibodies that had been used by the vendor (Supplementary Fig. 4G, H).

The cross-assay comparison was then extended to include two HER2-enriched TMAs (TMA226 and 227) from a cohort of samples from patients who were diagnosed with their primary breast cancer between March 1995 and November 2005 and subsequently treated at the Dana-Farber/Brigham and Women’s Cancer Center (Table 3). The tissues were annotated with clinical data, including the results of HER2 FISH that was performed as part of clinical care (Table 3). TMA226 and 227 include 567 tissue cores from 189 tumors arrayed in triplicate45,46. CyCIF was performed on a single slide from each TMA, and serial sections were used for ER and HER2 IHC. The CyCIF images were analyzed to identify the percent of marker-positive cells out of the total keratin-positive tumor cell population. IHC was scored in two ways: (i) by a pathologist according to CAP guidelines for ER (none, weak, moderate, strong) and percent of positive cells; and HER2 (0, 1+, 2+, 3+) and (ii) using Aperio software as a percent of positive cells. The CyCIF and IHC pathology scores were highly correlated for ER and HER2 (Fig. 3a, b) as were the CyCIF and Aperio scores of HER2 (Fig. 3c). We found high correlation between HER2 copy number (as measured by HER2 FISH analysis) and the expression of HER2 protein as determined through CyCIF using both the clinical and CyCIF antibodies (Pearson r = 0.71 and 0.65, respectively; Fig. 3d). Individual cores from the TMA are plotted in Fig. 3d colored by patient. While we observed differences between cores from the same patient, they largely clustered together, indicating that each sample resembles the larger tissue. Taken together, these analyses identified fluorophore-conjugated CyCIF antibodies, which compare favorably to widely used clinical antibodies.
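The per-core CyCIF score described above, the percent of marker-positive cells among keratin-positive tumor cells, can be sketched as below. The dictionary-per-cell format with boolean gates and the function name are assumptions for illustration only.

```python
def percent_positive_tumor_cells(cells, marker):
    """Percent of keratin-positive cells that are also positive for `marker`.

    cells: list of dicts of boolean gates, e.g. {"keratin": True, "HER2": False}
    (hypothetical format). Returns None when the core has no keratin-positive cells.
    """
    tumor_cells = [cell for cell in cells if cell["keratin"]]
    if not tumor_cells:
        return None
    n_positive = sum(1 for cell in tumor_cells if cell[marker])
    return 100.0 * n_positive / len(tumor_cells)
```

Keratin-negative cells (for example, immune or stromal cells) are excluded from the denominator, so the score reflects the tumor compartment only.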

Table 3 Clinical annotation of TMAs 226, 227, 240.

A qualified antibody panel accurately assigned single cells based on clinical annotation

Having established a qualified CyCIF antibody panel (Supplementary Fig. 5; Table 4), we next characterized the ITH of breast tumors at a single-cell level. CyCIF was performed on the two HER2-enriched TMAs (TMA226 and 227) and an additional TMA that was enriched for triple-negative breast cancer samples (TMA240). Together, the TMAs included 834 total breast tumor cores from 278 unique patients, including HER2+ (regardless of HR status; n = 158, 57%), HR+/HER2− (n = 31, 11%) and HR−/HER2− (TNBC; n = 89, 32%) (Tables 3, 5). A total of 512,699 single cells were segmented, and fluorescence intensity values were computed on a per-cell basis (Table 6). While the full data set was used for analysis, the data from 50,000 randomly selected cells was used for visualization in the t-distributed stochastic neighbor embedding (t-SNE), which projects the integrated staining intensity for each cell onto two dimensions preserving the high-dimensional relationships between the markers (Supplementary Fig. 6). Single tumor cells (i.e., keratin positive) clustered according to the clinical annotation that was extracted from the clinical database of the corresponding tumor (HER2+ [regardless of HR status], HR+/HER2− and HR−/HER2−) and, as expected, the immune cells were randomly distributed (Supplementary Fig. 6A). Keratin positive (Supplementary Fig. 6B) tumor cells expressed combinations of ER, PR and HER2 as expected in partially overlapping patterns (Supplementary Fig. 6C). Ki67 was expressed in subsets of the HR+/HER2+, HR−/HER2+ and HR−/HER2− tumor cells. AR was co-expressed in a subset of HR+ tumors and in a subset of HR−/HER2− tumor cells. p53 was predominantly expressed in HR−/HER2− tumor cells (Supplementary Fig. 6D). Keratin negative cells were positive for CD45 and/or CD68 and a subset of those expressed PD-L1 (Supplementary Fig. 6E).

Table 4 Qualified antibody staining panel.
Table 5 Total number of cases by ER/PR/HER2 status.
Table 6 Total number of single cells analyzed per subtype of breast cancer.

A qualified CyCIF antibody panel reveals distinct clusters of cancer cells in HER2+ breast cancer

Given that the qualified antibody panel accurately assigned single cells based on clinical annotation, we performed a deeper analysis focusing on the two TMAs enriched with HER2+ tumors (567 tissue cores from 189 patients, a total of 201,601 single cells analyzed; Table 6). The tumor cells from the HER2 enriched cases were analyzed at the single-cell level, and single cells were clustered by their patterns of ER, PR, and HER2 expression (Fig. 4a). When the t-SNE was colored by a patient identifier (Fig. 4b), we observed a substantial degree of ITH for ER, PR and HER2 expression. In general, the tumors were enriched for HER2 expression as expected (Fig. 4c).

Fig. 4: Overview of single-cell data from HER2+ enriched breast tumors.

Following the selection of qualified ER, PR, and HER2 CyCIF antibodies, the expression of selected antibodies was evaluated at a single-cell level in 567 HER2+ invasive breast cancer samples from 189 patients, and t-Distributed Stochastic Neighbor Embedding (t-SNE) is shown as a distribution of all single cells. a Selected markers were used to plot single cells. b t-SNE in a is shown colored by patient ID. c Visualization of markers within t-SNE plots.

Clustering of all single cells (tumor and non-tumor) revealed 7 distinct clusters, including 4 tumor and 2 immune/stromal populations (Fig. 5a–d). Clusters 2, 4, 5, and 7 represented the tumor cells as defined by the expression of keratin. Among the 4 tumor populations, three were HER2+ and displayed high, moderate, and low levels of HER2 expression relative to each other: cluster 2 (HER2highERnegPRposARposPD-L1highKi67pos), cluster 5 (HER2modERnegPRposARposp53highPD-L1posKi67pos), and cluster 4 (HER2lowERposPRlowARpos). One cluster was HER2-negative (cluster 7 (HER2negERhighPRhighARhigh)). Volcano plot analysis revealed heterogeneous expression of markers across clusters (Fig. 5d). Clusters 3 and 6 represent an immune population as characterized by expression of the leukocyte marker CD45 and macrophage marker CD68, suggesting these are macrophages. Cluster 1 had heterogeneous expression of most proteins and, therefore, did not correspond to a distinct population of cells. The violin plot analysis (Fig. 5d) showed low expression of HER2 and moderate expression of keratin in this cluster; 69.89% of the cells had some keratin positivity, 36.14% had HER2 positivity, and 34.05% were double positive, overlapping with the single populations. Cluster 1 is therefore likely a tumor cell population that also contains non-tumor cells. The use of additional antibodies against other immune cells, endothelial, fibroblast, and other tumor markers would likely increase the ability to cluster additional cells into appropriate classes. Taken together, these analyses revealed the presence of substantial HER2 ITH in breast tumors at a single-cell level that may have implications for clinical care.

Fig. 5: CyCIF single-cell analysis of HER2+ breast cancer reveals tumor populations with heterogenous HER2 expression.

Following the selection of qualified ER, PR, HER2, AR, and p53 CyCIF antibodies, the expression of selected antibodies was evaluated at a single-cell level in 567 HER2+ invasive breast cancer samples, representing 189 patients. a Single-cell clustering is shown, and b median expression for each antigen across each cluster is shown. Relative expression of HER2 is designated as high, moderate (mod), low, and negative among clusters 2, 5, 4, and 7, respectively. Tumor clusters are defined as: cluster 2 (HER2highERnegPRposARposPD-L1highKi67pos), cluster 5 (HER2modERnegPRposARposp53highPD-L1posKi67pos), and cluster 4 (HER2lowERposPRlowARpos). One cluster was HER2-negative (cluster 7 (HER2negERhighPRhighARhigh)). Clusters 3 and 6 represent immune/stromal populations as characterized by the expression of the leukocyte marker CD45. Cluster 1 has heterogeneous expression of most proteins and, therefore, did not form a distinct population of cells. Area in a and b refers to the nuclear area of segmented cells. c The 7 cell clusters are visualized using t-SNE. d Volcano plots of expression of each marker by cluster.

CyCIF reveals ITH of HER2+ breast cancer

Tumors with high HER2 ITH have been shown to be more resistant to HER2-targeted therapy, and recent data from clinical trials have implicated HER2 ITH in determining clinical outcome30. We evaluated HER2 expression in individual cells in tissue samples from 77 unique patients (triplicate cores; n = 231, from the cohort that was clinically defined as HER2+ and had at least 500 cells pooled from the triplicate cores) and tested its association with recurrence data obtained from clinical records. The interpatient variation of HER2 expression, as measured by the coefficient of variation in single cells, revealed that higher heterogeneity in individual patients correlated with recurrence, as expected (Fig. 6a). Neither the mean intensity of HER2 nor that of Ki67, both measured by CyCIF, correlated with recurrence, indicating that single parameters of expression may not be sufficient for evaluating the tumor due to the complexity of tumor heterogeneity (Fig. 6b).
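As a minimal sketch of the per-patient measure described above, the coefficient of variation (C.V.) can be computed as the standard deviation of single-cell HER2 intensities divided by their mean, after pooling cells from the triplicate cores. The function names, the input layout (one list of intensities per core), and the choice of the population standard deviation are assumptions made for illustration; the source does not specify them.

```python
from statistics import mean, pstdev

MIN_CELLS = 500  # inclusion threshold stated in the text (cells pooled across cores)


def coefficient_of_variation(intensities):
    """C.V. = standard deviation / mean of single-cell intensities.

    Assumption: the population standard deviation is used; the source does
    not say whether the population or sample form was applied.
    """
    mu = mean(intensities)
    if mu == 0:
        raise ValueError("C.V. is undefined when the mean intensity is zero")
    return pstdev(intensities) / mu


def patient_cv(cores, min_cells=MIN_CELLS):
    """Pool single-cell intensities from a patient's cores and return the C.V.

    Returns None when the pooled cell count is below min_cells, mirroring
    the inclusion criterion described in the text.
    """
    pooled = [value for core in cores for value in core]
    if len(pooled) < min_cells:
        return None
    return coefficient_of_variation(pooled)
```

A patient whose cells all share one intensity has a C.V. of 0, and the value rises as single-cell intensities spread out relative to their mean.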

Fig. 6: HER2 Heterogeneity scores derived from clustering analysis reveal correlation to clinical outcome.

Following the selection of qualified ER, PR, HER2, AR, and p53 CyCIF antibodies, the expression of selected antibodies was evaluated at a single-cell level in 567 HER2+ invasive breast cancer samples, representing 189 patients. Tissues from HER2+ patients (n = 77) in which there were at least 500 cells pooled from the triplicate cores were used for ITH analysis. a HER2 expression was analyzed in single cells, and the coefficient of variation (C.V.) among patients was plotted (y axis) by recurrence status. b HER2 and Ki67 mean intensity expression measured by CyCIF. c Distribution of cells across all clusters (blue) and HER2 core number 113 (orange) and d representative tumor with low (HER2-5 and HER2-161) and high (HER2-164 and HER2-170) heterogeneity. e HER2 heterogeneity scores were generated by identifying cells from each tissue mapped to the entire t-SNE. A larger boundary corresponds with higher diversity. f Samples that have equal distribution of each cluster have high heterogeneity and are diamond-shaped in the boundary mapping. g GMM and t-SNE scores reveal an association with recurrence. h Patients treated with Trastuzumab were removed from the GMM and t-SNE score analysis.

It is unknown how the complexity of the tumor as a whole influences ITH and clinical outcome. Therefore, we sought to characterize the ITH of HER2+ tumors beyond the expression of HER2 using the same 77 patients with triplicate cores. To do this, we developed two metrics to score the ITH of cell types, which we derived from (i) the GMM clusters and (ii) t-SNE representations of the CyCIF data. The GMM score is a population-level estimation of the heterogeneity of cell-type composition. Clustering all single cells from the cohort into 7 different subpopulations using GMM (Fig. 6c, d) allowed us to determine the GMM score for each sample, which reflects how the cell-type composition of the sample differs from the cell-type composition of the entire cohort. In the HER2-enriched cohort, each of the 7 clusters contained 7–20% of the cells (Figs. 5c and 6d, blue bar). We observed that individual patient samples were enriched for cells derived from some clusters more than others. For example, the distribution of cells from core HER2-113 across the 7 GMM clusters is shown in Fig. 6c, orange bars, and reveals over-representation of Cluster 3 in that tumor, whereas other samples comprised a more even distribution of clusters and therefore had a more heterogeneous composition (e.g., HER2-164 and HER2-70; Fig. 6d). HER2-5 and HER2-161 are more enriched in a single cluster and are therefore less heterogeneous (Fig. 6d). To visualize the composition of the samples, we generated glyph plots (Fig. 6e) and calculated the Euclidean distance from the mean distribution of all samples (see methods) to generate GMM scores. A high GMM score represents low heterogeneity, while a low GMM score represents heterogeneous cell compositions (Supplementary Table 1). Next, we generated t-SNE scores, which are derived directly from the single-cell data in high-dimensional space (Fig. 6f).
Unlike the GMM score, the t-SNE score is not based on separating cells into different clusters; instead, it uses the distance and spread of each single cell in t-SNE space. A wider distribution of single cells for any given sample in t-SNE space represents tumors with high ITH, while tumors with low ITH have a more localized distribution (Fig. 6f).

To evaluate the potential clinical significance of ITH, we utilized the GMM and t-SNE scores along with recurrence data obtained from clinical records. The GMM and t-SNE scores revealed differences between patients who experienced recurrence versus those who did not (Fig. 6g). Because the HER2-enriched TMA cases are from patients who were diagnosed with their primary breast cancer between March 1995 and November 2005, and adjuvant Trastuzumab was not approved by the FDA until 2006, patients primarily received chemotherapy without anti-HER2 therapy (Table 7). To unify the analysis, we removed the small fraction of patients who did receive Trastuzumab (n = 10) and repeated the analysis with the remaining 67 patients (Table 7). The associations of the GMM and t-SNE scores with recurrence were similar to those observed in the full data set (Fig. 6h).

Table 7 Treatment of HER2+ patient cohort (TMAs 226 and 227).

We assessed additional associations with clinical data, including ER and PR status, clinical stage, age, and tumor grade, all extracted from the clinical data, as well as Ki67 expression derived from CyCIF analysis. In some cases, adjacent categories were combined when there were low numbers of patients for each category (Table 8A). Interestingly, none of these features were significantly associated with recurrence (Table 8B). We then fit two models that included the clinical features, one with the GMM score and the other with the t-SNE score. Both the GMM and t-SNE scores were significantly associated with time to recurrence; among the other clinical features examined, only clinical stage (III-IV vs I) was significantly associated in both Model 1 (GMM score; p = 0.03; Table 8C) and Model 2 (t-SNE score; p = 0.049; Table 8C). Taken together, this work suggests that high ITH, as measured through single-cell analysis, may be linked to poorer clinical outcomes.

Table 8 Association of clinical data.

Discussion

This study is the first to evaluate the performance of antibodies routinely used clinically to analyze breast cancers in a highly multiplexed imaging platform such as CyCIF that enables single-cell analysis across an entire tissue sample. We developed a panel of qualified antibodies against common breast cancer markers that show excellent concordance with clinical antibodies routinely used in CLIA-certified labs. We then used the qualified antibodies along with other cell state and immune markers to perform CyCIF. Using a HER2-enriched cohort of 567 tissue cores from 189 patients, we performed clustering analysis of 201,601 single cells. Clustering analysis allowed an unbiased approach to inform our understanding of how HER2 heterogeneity relates to other relevant cancer markers. Heterogeneous HER2 expression among individual patients correlated with recurrence. This has been previously reported using IHC analysis, but we report it here for the first time using single-cell analysis (Fig. 6a). Further, we identified 4 keratin-positive tumor cell clusters that varied by HER2 expression levels relative to each other (high, moderate, low, and negative). These clusters further varied with respect to other breast cancer-specific markers such as ER, PR, AR, and p53, as well as PD-L1. Importantly, we revealed that ITH correlates with clinical outcome.

Clustering of single cells from tumors using CyCIF revealed new classifications of HER2-heterogeneous breast tumors. Indeed, we revealed that clusters 2 and 5 had high to moderate expression of HER2, PR, AR, and PD-L1 but were negative for ER. Cluster 5 had high expression of p53, whereas cluster 2 was negative for p53. Cluster 4 consisted of a population of HER2low-expressing cells with low expression of ER, PR, and AR and heterogeneous expression of PD-L1. Cluster 7 represented a HER2negERpos population of tumor cells, which was also positive for PR and AR, and negative for p53 and PD-L1. ER+ tumors are generally associated with low tumor-infiltrating lymphocytes (TILs)47, and up-regulation of PD-L1 in the tumor has been shown to be driven by interferon-gamma production by CD8+ T cells48. Therefore, the HER2negERpos tumor Cluster 7 may represent an immunologically cold tumor environment, as indicated by the absence of PD-L1. ASCO/CAP acknowledges the spatial heterogeneity of HER2 staining as “clustered”, “mosaic” and “scattered”. These non-clonal patterns are more frequent in cases that are 2+. Our patient cohort had a limited sample size of HER2-low tumors, and therefore, we were not able to assess spatial heterogeneity among HER2-low tumors. However, a major advantage of the CyCIF technology is the ability to perform spatial analysis, and therefore, further investigation of spatial relationships is warranted in HER2-low tumors.

Two immune/stromal cell clusters were identified based on CD45 expression and lack of keratin expression. Cluster 3 is characterized by high expression of both CD45 and CD68, suggesting this cluster contains macrophages. Further work to interrogate the phenotype of tumor-associated macrophages may provide an opportunity for new therapeutic targeting49. Cluster 6 is less clear but also represents an immune population of cells, likely macrophages, based on its expression of CD68. Both Clusters 3 and 6 also express PD-L1, whereas Cluster 3 has a higher expression of Ki67. Notably, cluster 6 represented 20% of all cells analyzed, which was the highest proportion of total cells in the HER2-enriched cohort of breast tumors. Cluster 1 has heterogeneous expression of most proteins and, therefore, did not form a distinct population of cells; its cells are spread throughout the t-SNE space. This is likely because sufficient phenotype markers were not included in our antibody panel to accurately identify these cells.

To interrogate the relationship between ITH and clinical outcome, we derived GMM and t-SNE scores from the GMM clustering and t-SNE representation of the CyCIF data. The GMM score is based on the distribution of different cell populations, defined by GMM clustering, and provides a heterogeneity score based on cell-type composition, that is, the percentage of cells in each cluster that are present within individual tumors. A limitation of the GMM score is that it may not capture the subtle differences within any given population since it is categorical. For example, cells within the same cluster could be heterogeneous in marker expression, but the GMM score would not capture that. Alternatively, the t-SNE score is defined by the overall distribution in high-dimensional marker space (i.e., t-SNE space), so it should recapitulate more subtle differences between single cells. In most cases, the GMM and t-SNE scores were correlated (Fig. 6g), and we found that both GMM and t-SNE scores correlated with worse clinical outcomes in a historical patient population that was treated with chemotherapy largely without HER2-targeted therapy (Table 7). Importantly, other clinical features such as ER and PR status, age, and tumor grade, all extracted from the clinical data, as well as Ki67 expression derived from CyCIF analysis, did not associate with recurrence, and the clinical stage was only associated in the adjusted analyses (Table 8). This work reveals that single-cell imaging techniques have the ability to define ITH and predict clinical outcomes.

In the current study, HER2+ patients were treated prior to the routine use of Trastuzumab (or other HER2-targeted therapy) and received chemotherapy, hormone therapy, Trastuzumab, or a combination of therapies (Table 7). Future studies are warranted for breast cancer patients who receive (neo)adjuvant anti-HER2 therapy to determine the prognostic and potentially predictive utility of HER2 ITH and the ITH evaluation method developed here. The treatment of HER2+ breast cancer is rapidly evolving, which should be taken into consideration in future studies. In addition to anti-HER2 agents, new treatments for HER2+ disease have been tested in the clinic, such as Trastuzumab deruxtecan (T-DXd), a HER2 antibody-drug conjugate (ADC) composed of an anti-HER2 antibody, a cleavable tetrapeptide-based linker, and a topoisomerase I inhibitor payload, and have led to remarkable responses in previously treated HER2+ metastatic cancer. In addition, in the Phase 3 DESTINY-Breast04 study of patients with HER2-low metastatic breast cancer, T-DXd resulted in significantly longer progression-free and overall survival than the physician’s choice of chemotherapy50. Interestingly, T-DXd has recently been shown to work in clinically defined HER2 1–2+51 as well as 052 by IHC. The new concept of HER2-low expression level has not yet been defined by ASCO/CAP guidelines, although these patients have been shown to benefit from ADCs53. Here, we have described a rigorous approach for assessing ITH, which is likely to be valuable for HER2-low or heterogeneous tumors and will need to be tested in these patient cohorts. Single-cell multiplexed tissue imaging may provide an opportunity to interrogate heterogeneity with greater depth in relation to multiple markers and topographic representations and may potentially offer a new approach to assess the duration of clinical benefit in response to HER2-targeted therapies.

In the clinical setting, ER, PR, HER2 IHC, and/or FISH are routinely performed on breast tumor samples to inform therapeutic options for the patient. However, even after a tumor is characterized based on the expression of ER, PR, and HER2, clinical studies reveal that response to therapy can vary, in part due to ITH30. Our work here indicates that single-cell, multiplexed IF imaging may be a reliable approach to elucidate both HER2 and tumor ITH in research settings and provides a basis for testing multiplexed platforms for assessing ITH in breast tumors in clinical settings. However, additional studies are warranted. A limitation of this study is that we used TMAs instead of whole tissue sections to evaluate ITH, and it is increasingly apparent that whole slide imaging provides a more complete assessment of tumor features, with spatially correlated features resulting in a reduction in effective sample size40. However, this analysis of a large number of patients (including 567 HER2+ invasive breast cancer samples from 189 patients with triplicate sampling from each patient) is useful for providing initial insights into the workflows and approaches that can be used to study larger cohorts of whole slide images, as the technical capacity to do so becomes available54. Additional analysis on surgical specimens is warranted to investigate ITH at a whole tissue level; however, in the clinical setting, many tumors are sampled by core biopsies that often render limited material, and the statistical approaches needed to account for these small samples require further development. 
In addition, further work to understand the context of immune and stromal cells, including endothelial cells, fibroblasts, lymphocytes, and innate immune cells, may lend additional information on the complexity of the TME and response to therapy, and these efforts will be facilitated by the use of methodologies that permit deep phenotyping of cellular transcriptomes using emerging single-cell spatial transcriptomic methods.

Methods

Specimens, patients, and ethics

BC03 TMA

Commercial tissue samples were obtained from Reveal Biosciences (BC03), which includes 16 breast cancer tissues in duplicate with a paired normal tissue. Grading, TNM staging data, AR, ER, PR, HER2, p53, and Ki67 IHC data are available from the vendor.

DFCI/BWH TMAs

Breast cancer microarrays were constructed with tissues obtained from untreated, de-identified patients who provided written informed consent under Dana-Farber Cancer Institute IRB protocol 93-085. All tissues are from archival excisions or mastectomies, not core biopsies. All tissues are pretreatment (no prior chemotherapy) and were collected between 1998 and 2005. Archival formalin-fixed, paraffin-embedded (FFPE) breast cancers were collected, and the best blocks and best areas for coring were identified and selected by a breast pathologist (D.D.). Each tumor sample was represented by three tissue microarray cores that, when possible, were taken from different areas of the same tumor. Results of immunohistochemical studies for estrogen receptor (ER), progesterone receptor (PR), and HER2, and FISH assay results for HER2, were extracted from pathology reports. TMA construction was carried out in the Dana-Farber/Harvard Cancer Center Tissue Microarray Core Facility. Three 0.6 mm cores were taken from marked areas and placed into a recipient block using a manual arrayer (Beecher Instruments). FFPE tissue was sectioned at 5 µm.

Ethics

The study was conducted in accordance with ethical principles founded in the Declaration of Helsinki. All analysis was approved by the institutional review boards of Dana-Farber Cancer Institute and Harvard Medical School.

Reagents and antibodies

To determine the optimal antibody candidate for each biomarker in CyCIF, we compared multiple fluorophore-conjugated antibodies as shown in Tables 1 and 2. Each research (CyCIF) antibody was compared to a single antibody commonly used in clinical practice as a reference.

Data analyses

Analyses on CyCIF were performed at the level of pixels, cells, and tissue cores. In addition, inter-assay analyses were performed comparing: (1) CyCIF vs. IHC, the latter assessed both by digital pathology and by two independent pathologists; and (2) CyCIF vs. FISH for HER2. Following validation of these antibodies, the expression of ER, PR, HER2, AR, PD-L1, p53, and Ki67 was used to better understand ITH in breast cancer.

Single-cell analysis breast cancer cores

For single-cell analysis, a total of 589,343 cells from 278 breast carcinomas were included. In the DFCI TMAs, a total of 512,699 cells were analyzed as follows: HER2+, 201,601; HR+, 94,237; and TNBC, 216,861 (Table 6).

Tissue-based cyclic immunofluorescence

CyCIF (https://www.cycif.org/) was performed as described previously37 and as used by our group37,55,56. Briefly, 4–5 µm FFPE unstained slides were baked (30 min at 60 °C), and antigen retrieval was performed using Leica BOND RX with ER1 solution (Leica Biosystems #AR9961). A pre-staining cycle was then performed, in which the sample was blocked with secondary antibodies to reduce autofluorescence and non-specific antibody binding. All staining steps were done at 4 °C overnight. Staining was followed by bleaching with 25 mM NaOH with 4.5% H2O2 for 45 min with light exposure. Each successive CyCIF cycle included immunostaining the specimen with the testing antibodies, followed by nuclear staining with a DNA dye, four-channel imaging, and fluorophore bleaching. When all cycles were completed, the slide was stained with H&E to allow conventional histopathology review. Individual images were then stitched together into a high-dimensional representation for further segmentation and analyses. The RareCyte CyteFinder (RareCyte, Seattle, WA) was used for image capture. Ashlar (https://github.com/labsyspharm/ashlar) was used to stitch or merge images in each round of CyCIF. Because of the large size of the combined image, it was viewed using Omero (https://www.openmicroscopy.org/omero/). Single-cell segmentation of the stitched image used the watershed algorithm based on nuclear staining with Hoechst 33342 to generate a nuclear mask image; each nuclear mask was extended by 3 pixels to define a cell boundary35. Segmentation is based on nuclear stains; however, expanding the nuclear masks also captures cytoplasmic and membrane signals, which is relevant for non-nuclear staining such as HER2. The data presented here demonstrate that HER2 positivity from CyCIF is highly correlated with pathologists’ scores, indicating that this method of segmentation and quantification is representative.
Within the single-cell ROIs, gating a ‘positive’ or ‘negative’ status for each marker is conducted based on the local minimum implemented in a custom ImageJ/Matlab script.

Immunohistochemistry

All IHC was performed in the Brigham and Women’s clinical pathology (CLIA) laboratory. For IHC analyses, 4–5 µm sections were made from FFPE blocks. Unstained slides were deparaffinized and subjected to antigen retrieval, and immunostaining was subsequently performed with the tested clones (Table 1). All staining procedures were performed according to the manufacturers’ instructions in the presence of appropriate controls. Two pathologists evaluated the IHC expression of each given clone, according to the parameters recommended by the latest protocol from the College of American Pathologists7. In addition, IHC was also assessed by digital pathology (Aperio ImageScope by Leica Biosystems Inc.).

Calculation of Gaussian Mixture Model (GMM) score

All clusters were used to generate the GMM score, which was calculated from the distance between the cluster composition of each individual patient and the cluster composition of the whole cohort, i.e., how much the patient deviates from the cohort. The formula is:

$${GMM}\,{score}=1-{distance}\left({Cohort}[{cluster}\,{composition}],\,{Patient}[{cluster}\,{composition}]\right)$$
(1)

As an example:

Cluster      Whole cohort   Patient 1   Patient 2   Patient 3
Cluster 1    0.25           0.3         0.1         0.2
Cluster 2    0.25           0.2         0.3         0.2
Cluster 3    0.25           0.2         0           0.3
Cluster 4    0.25           0.3         0.6         0.3

In this case, patients 1 and 3 each have a GMM score of 0.9, while patient 2 has a GMM score of 0.54. The lower the score, the more heterogeneous.
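The worked example can be reproduced with a short sketch. It assumes that the distance in Eq. (1) is the Euclidean distance between composition vectors, as stated in the Results; the function name and data layout are illustrative and are not taken from the authors' code.

```python
import math


def gmm_score(cohort_composition, patient_composition):
    """Return 1 minus the Euclidean distance between two cluster compositions.

    Each argument is a sequence of cluster fractions (one entry per GMM
    cluster, summing to 1). Euclidean distance is assumed, following the
    description in the Results.
    """
    if len(cohort_composition) != len(patient_composition):
        raise ValueError("compositions must cover the same clusters")
    distance = math.sqrt(
        sum((c - p) ** 2 for c, p in zip(cohort_composition, patient_composition))
    )
    return 1.0 - distance


# Values from the worked example (four clusters).
cohort = [0.25, 0.25, 0.25, 0.25]
patients = {
    "Patient 1": [0.3, 0.2, 0.2, 0.3],
    "Patient 2": [0.1, 0.3, 0.0, 0.6],
    "Patient 3": [0.2, 0.2, 0.3, 0.3],
}
scores = {name: round(gmm_score(cohort, comp), 2) for name, comp in patients.items()}
print(scores)  # {'Patient 1': 0.9, 'Patient 2': 0.54, 'Patient 3': 0.9}
```

Under this assumption the sketch returns the values quoted in the example, which is consistent with the Euclidean distance being the one used.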

Calculation of t-distributed stochastic neighbor embedding (t-SNE) score

All clusters were used to generate the t-SNE score, which was computed using the Cyt package as described37. After generating the tSNE1/tSNE2 values for each single cell, the t-SNE score for each TMA core was calculated using the formula below:

$${tSNE}\,{score}=\sqrt{\sum {\left({tSNE}1-{mean}\left({{tSNE}1}_{{all}\,{cells}}\right)\right)}^{2}+\sum {\left({tSNE}2-{mean}\left({{tSNE}2}_{{all}\,{cells}}\right)\right)}^{2}}$$
(2)
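A minimal sketch of Eq. (2) follows, under one reading of the formula: the sums run over the cells of a single TMA core, and the means are taken over a reference set passed by the caller (assumed here to be all cells in the cohort). The function name and the coordinate layout are hypothetical, and the t-SNE embedding itself is taken as given.

```python
import math
from statistics import mean


def tsne_score(core_cells, all_cells):
    """Spread of one core's cells around the mean t-SNE position of all cells.

    core_cells and all_cells are sequences of (tSNE1, tSNE2) pairs.
    Assumption: the mean in Eq. (2) is taken over all_cells and the sum
    over core_cells; the source does not spell this out.
    """
    mean_1 = mean(x for x, _ in all_cells)
    mean_2 = mean(y for _, y in all_cells)
    squared = sum((x - mean_1) ** 2 + (y - mean_2) ** 2 for x, y in core_cells)
    return math.sqrt(squared)
```

Read literally, the sum is not divided by the number of cells, so under this interpretation a core with more cells would tend to receive a larger score; whether any normalization was applied is not stated in the text.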

Association of clinical data

Some levels of clinical stage and tumor grade were combined due to low numbers of patients in some groups. To test the association between GMM/t-SNE score and other features, the following methods were used:

  1. For ER, PR, and tumor grade, the Wilcoxon rank-sum test was used because these data have two categories.
  2. For clinical stage, the Kruskal-Wallis test was used.
  3. For age and CyCIF tumor Ki67 analysis, the Spearman correlation test was used.

Cox proportional hazards regression was used to fit two models: one with the GMM score and clinical features, and the other with the t-SNE score and clinical features. The hazard ratios and p values are shown.
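To illustrate the rank-based approach used for the continuous features (age and Ki67), the sketch below computes a Spearman correlation coefficient as the Pearson correlation of average ranks. It is written in plain Python for clarity, returns only the coefficient (no p value), and is not the authors' implementation; the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks enter the calculation, any monotonic relationship between a score and a feature yields a coefficient of 1 or -1 regardless of its shape, which is why a rank-based test suits features without an assumed linear relationship.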