Introduction

Human epidermal growth factor receptor 2 (HER2) status can be determined by immunohistochemistry (IHC) or in situ hybridization (ISH) assays for the evaluation of HER2 overexpression or gene amplification, respectively. Usually, the assessment begins with IHC, with equivocal results requiring a reflex ISH test. The 2018 ASCO/CAP (American Society of Clinical Oncology/College of American Pathologists) guideline recommends counting at least 20 non-overlapping cells in two separate areas of invasive cancer in the HER2 ISH test [1]. Although this is usually interpreted as counting a total of 40 cells (at least 20 cells per area), the supplementary data of the previous guideline explain that the minimum cell number is, in fact, a total of 20 cells in two separate areas of invasive cancer (at least 10 cells per area) [2].

HER2 testing variability remains an important issue, with the minimum number of cells needed to obtain an accurate result yet to be determined [3,4,5,6]. Moreover, genetic heterogeneity has been reported in almost all types of cancer, including breast cancer (BC), where it is responsible for discrepant HER2 results between IHC and ISH tests and contributes to changes in disease progression and response to targeted therapy [7, 8]. Several clinical studies have demonstrated that anti-HER2 targeted therapy improves progression-free survival and overall survival only in patients with HER2-positive BC, which represents about 15% of all BC cases [9,10,11,12,13,14,15]. For this reason, the accurate assessment of HER2 status is crucial to identify patients who are most likely to benefit from this targeted therapy.

In this study, we aim to evaluate the effect of counting an increasing number of invasive cancer cells on the result of ISH quantification (HER2/CEP17 ratio and average HER2 copy number per cell), as well as to compare two different approaches to measuring genetic heterogeneity (single cell and population based).

Materials and methods

Case selection

A cohort of 100 consecutive BC cases (primary and metastatic) with an equivocal HER2 result by IHC (score of 2+) was retrieved from the archives of Ipatimup Diagnostics from April to August 2019. The cases included formalin-fixed paraffin-embedded needle core biopsies (NCB) and surgical excision specimens (SES) referred to our institution (national reference center for HER2 ISH) for an evaluation of HER2 amplification with bright-field ISH. The HER2 IHC test was performed by the sending institution, and information regarding pre-analytical conditions and the antibody used was not available. This study was performed in accordance with the national law regulating the handling of biological specimens from tumor banks, under which the samples are available exclusively for research purposes in retrospective studies, and with the Declaration of Helsinki. Ethical approval and informed consent were not required for this study.

Bright field in situ hybridization

ISH was performed on 3-μm-thick sections from one block of each case with dual-hapten, dual-color ISH. The dual-probe assay (VENTANA HER2 Dual ISH DNA Probe Cocktail Assay (catalog number 760-6072); Ventana Medical Systems, Inc., Tucson, AZ, USA), which is Food and Drug Administration-approved, contains a HER2 locus-specific probe (black signal) and a control probe specific for the centromere of chromosome 17 (centromere enumeration probe, CEP17; red signal) that allows detection of HER2 amplification by light microscopy. The entire procedure was carried out on an automated staining system (Ventana BenchMark XT Staining System; Ventana Medical Systems, Inc., Tucson, AZ, USA) according to the manufacturer’s instructions. Appropriate positive and negative controls were used in every set of slides. Optimal staining consists of an absence of non-specific background staining, distinct nuclear morphology, and clear and specific signals within the nucleus.

ISH interpretation

The samples were quantified by a biomedical scientist (AC) with previous ISH training according to the 2018 ASCO/CAP guideline for HER2 amplification in BC [1]. The training, completed six months prior to this study, consisted of the parallel evaluation of 100 ISH tests with the pathologist, achieving a diagnostic result concordance higher than 95% (amplified and non-amplified status) [16].

Corresponding hematoxylin and eosin (HE) staining was used for the identification of the invasive component of the tumor, and, whenever available, the IHC slide was used to score in the area with the strongest intensity. The evaluation of the HE and IHC slides was performed by a pathologist (AP), who manually marked the tumor areas for ISH quantification. Only cells with a minimum of one copy each of HER2 and CEP17 were scored. The number of HER2 signals was estimated in clusters, except for doublets, which counted as a single signal. The evaluation of the samples included scanning the entire ISH slide prior to counting and then scoring 20 nuclei in each of five different areas, recording the numbers of HER2 and CEP17 signals in each cell in areas with a higher level of HER2 amplification. This approach allowed us to continuously add the results of individual cells, until 100 cells were reached, and to measure the effect on the HER2/CEP17 ratio and on the average HER2 copy number quantifications.

The 2018 BC guideline defines HER2 gene amplification as positive (classical group 1) when the HER2/CEP17 ratio is ≥2.0 and the average HER2 copy number is ≥4.0 signals per cell, and negative (classical group 5) when the HER2/CEP17 ratio is <2.0 and the average HER2 copy number is <4.0 signals per cell. Moreover, group 2 is defined as HER2/CEP17 ratio ≥2.0 and average HER2 copy number <4.0 signals per cell; group 3 as HER2/CEP17 ratio <2.0 and average HER2 copy number ≥6.0 signals per cell; and group 4 as HER2/CEP17 ratio <2.0 and average HER2 copy number ≥4.0 and <6.0 signals per cell. The final classification in groups 2 to 4 (non-classical groups) depends on the result of the IHC analysis: a case is considered positive if it has an IHC score of 3+ in any of these groups or a score of 2+ in group 3, and negative otherwise [1].
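The group definitions above can be written as a small decision function. The following is a minimal sketch that encodes only the thresholds as summarized in this section; the function names `ish_group` and `her2_positive` are hypothetical, and the sketch is not a substitute for the full guideline workup.

```python
def ish_group(ratio: float, copies: float) -> int:
    """Return the ISH group (1-5) from the HER2/CEP17 ratio and the
    average HER2 copy number per cell, using the thresholds in the text."""
    if ratio >= 2.0:
        # Ratio at or above 2.0: group 1 if copy number >= 4.0, else group 2.
        return 1 if copies >= 4.0 else 2
    if copies >= 6.0:
        return 3  # ratio < 2.0 and copy number >= 6.0
    if copies >= 4.0:
        return 4  # ratio < 2.0 and copy number >= 4.0 and < 6.0
    return 5      # ratio < 2.0 and copy number < 4.0


def her2_positive(group: int, ihc_score: str) -> bool:
    """Final HER2 status as summarized in the text.

    Groups 1 and 5 are positive and negative, respectively; groups 2 to 4
    are positive with IHC 3+, and group 3 is also positive with IHC 2+.
    """
    if group == 1:
        return True
    if group == 5:
        return False
    return ihc_score == "3+" or (group == 3 and ihc_score == "2+")


# Hypothetical example: ratio 1.6 with 6.4 HER2 copies per cell, IHC 2+.
group = ish_group(1.6, 6.4)
print(group, her2_positive(group, "2+"))
```

Because every case in this cohort has an IHC score of 2+, only groups 1 and 3 lead to a positive result under this rule.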

The results obtained by this approach were compared with the results from the original ISH report, performed by a pathologist (AP) counting 20 cells in two separate areas of invasive cancer (20 + 20), with an additional 20 cells if the HER2/CEP17 ratio was between 1.8 and 2.2. Cases with discordant results were reviewed during a joint microscopy session to search for reasons for disagreement. No additional testing was performed to resolve the discordances. Finally, genetic heterogeneity (GH) was documented, defined in the 2018 ASCO/CAP guideline as a discrete population of tumor cells with HER2 amplification. A case was considered positive if HER2 gene amplification represented at least 10% of the total tumor cell population [1]. We also evaluated single cell genetic heterogeneity, defined as HER2-amplified tumor cells representing between 10 and 90% of the total tumor cells without forming a discrete population of tumor cells.

Statistical analysis

Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS) version 26.0 for Windows. Pearson's correlation coefficient (PCC) was used for comparison of quantitative variables. The level of significance was set at P < 0.05.

Interobserver agreement rates regarding interpretation of the HER2 amplification assay were evaluated with kappa (k) statistics. Kappa values range from zero (chance agreement) to 1 (perfect agreement) and were considered satisfactory if greater than 0.81.

The coefficient of variation (CV), calculated as the standard deviation divided by the mean, was used to quantify the variability of the HER2/CEP17 ratio and HER2 copy number independently of the unit of measurement. The margin of error (ME) at the 95% confidence level was calculated by multiplying the critical value (1.96) by the standard error (SE). The standard error was calculated as the standard deviation divided by the square root of the number of cells analyzed. Curve estimation regression models were used to describe the behavior of the margins of error with an increasing number of invasive cancer cells.
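As an illustration of these definitions, the sketch below computes the CV and the 95% margin of error from a list of per-cell values. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the function names and the example signal counts are hypothetical.

```python
import math
import statistics

Z_95 = 1.96  # critical value for a 95% confidence level, as in the text


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample SD is an assumption)."""
    return statistics.stdev(values) / statistics.mean(values)


def margin_of_error(values, z=Z_95):
    """ME = z * SE, with SE = SD / sqrt(number of cells)."""
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return z * standard_error


def running_margin_of_error(values, start=2):
    """ME recomputed as each additional cell is added to the count."""
    return [margin_of_error(values[:n]) for n in range(start, len(values) + 1)]


# Hypothetical HER2 signal counts for eight cells (not study data).
her2_counts = [2, 4, 4, 6, 3, 5, 4, 4]
print(round(coefficient_of_variation(her2_counts), 3))
print(round(margin_of_error(her2_counts), 3))
```

Since the standard error shrinks with the square root of the number of cells, halving the margin of error requires roughly four times as many cells.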

Results

The cohort included 83 needle core biopsies and 17 surgical specimens, diagnosed in 98 women and in 2 men. The age of the patients ranged from 33 to 93 years old, with a median age at diagnosis of 60 years old. The histologic types of primary BC cases were invasive carcinoma of no special type (87), lobular carcinoma (7), and micropapillary carcinoma (2). The metastatic sites included regional lymph nodes (2), bone (1), and liver (1). Cohort characteristics are summarized in Table 1.

Table 1 Cohort characteristics.

Regarding ISH quantification measurements, we observed a high correlation of HER2/CEP17 ratio and of average of HER2 copy number between the observers (PCC = 0.959; p < 0.001 and PCC = 0.916; p < 0.001, respectively) (Fig. S1A, B). Importantly, we observed a low correlation of HER2/CEP17 ratio and of average of HER2 copy number with the coefficient of variation (CV) (PCC = 0.284; p = 0.004 and PCC = 0.465; p < 0.001, respectively) (Fig. S2A, B), showing that this measure of variability is independent from the unit of measurement. We also observed that a minimum of 56 or 30 invasive cancer cells per case are required to stabilize the CV for HER2/CEP17 ratio or for average of HER2 copy number (difference from the final CV less than 0.01), respectively (Figs. 1A and S3A). The average margin of error of HER2/CEP17 ratio was determined with increasing number of cells, reaching a value of 0.40, 0.29, 0.25, 0.22 and 0.20 when counting 20, 40, 60, 80, and 100 invasive cancer cells, respectively (Fig. 1B). In addition, the average margin of error of average of HER2 copy number was also determined with increasing number of cells, reaching a value of 0.53, 0.41, 0.34, 0.29, and 0.26 when counting 20, 40, 60, 80, and 100 invasive cancer cells, respectively (Fig. S3B). Curve estimation regression models showed that a minimum of 457 or 926 invasive cancer cells per case are needed to reach a margin of error below 0.1 for HER2/CEP17 ratio (power model, y = 1.48x^(−0.44); R² = 0.998; p < 0.001) or for average of HER2 copy number (power model, y = 2.02x^(−0.44); R² = 0.997; p < 0.001), respectively.
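The cell numbers quoted for a margin of error of 0.1 follow from inverting the fitted power models, which have the form y = a·x^b with b = −0.44. The sketch below performs that inversion with the rounded coefficients reported above; the function names are hypothetical, and small differences from the published values can arise because the coefficients are rounded.

```python
def margin_from_cells(n_cells: float, a: float, b: float) -> float:
    """Margin of error predicted by the power model y = a * x**b."""
    return a * n_cells ** b


def cells_for_margin(target: float, a: float, b: float) -> float:
    """Invert y = a * x**b to obtain the cell count x for a target margin y."""
    return (target / a) ** (1.0 / b)


# Rounded coefficients as reported in the text.
RATIO_MODEL = (1.48, -0.44)        # HER2/CEP17 ratio
COPY_NUMBER_MODEL = (2.02, -0.44)  # average HER2 copy number

for name, (a, b) in [("ratio", RATIO_MODEL), ("copy number", COPY_NUMBER_MODEL)]:
    print(name, round(cells_for_margin(0.1, a, b)))
```

Evaluating `margin_from_cells(20, 1.48, -0.44)` gives approximately 0.40, in line with the average margin of error observed for the HER2/CEP17 ratio when counting 20 cells.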

Fig. 1: Relationship between number of cells and the value of CV and ME of HER2/CEP17 ratio.

A: average difference of the coefficient of variation (CV) of HER2/CEP17 ratio according to the number of cells and the final CV; B: average margin of error of HER2/CEP17 ratio (confidence interval of 95%) according to the number of cells.

In the ISH test, HER2 positivity was identified in 15 cases by the pathologist (13 cases in group 1 and 2 cases in group 3) and in 14 cases by the biomedical scientist (12 cases in group 1 and 2 cases in group 3), with 97 concordant cases (k = 0.879). The discordances (positive versus negative) were observed between group 1 and groups 5 and 4 (2 and 1 cases, respectively). In addition, there were 4 more cases with discordant negative ISH groups (k = 0.786). These cases corresponded to discordances between group 5 and groups 2 and 4 (3 and 1 cases, respectively), all considered negative given the equivocal result by IHC (for details see Tables 2 and 3).
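The kappa value for positive versus negative status can be reproduced from the counts given above. The sketch below assumes a 2×2 agreement table reconstructed from the reported marginals (15 and 14 positive cases, 3 discordant cases out of 100), which gives 13 cases positive for both observers, 2 positive only for the pathologist, 1 positive only for the biomedical scientist, and 84 negative for both. This reconstruction is our own arithmetic, not a table taken from the study, and the function name is hypothetical.

```python
def cohen_kappa(both_pos: int, only_a_pos: int, only_b_pos: int, both_neg: int) -> float:
    """Cohen's kappa for two observers and a binary (positive/negative) result."""
    n = both_pos + only_a_pos + only_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = both_pos + only_a_pos  # positives called by observer A
    b_pos = both_pos + only_b_pos  # positives called by observer B
    # Agreement expected by chance from the marginal totals.
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / n**2
    return (observed - expected) / (1 - expected)


# Reconstructed table (assumption): pathologist = A, biomedical scientist = B.
kappa = cohen_kappa(both_pos=13, only_a_pos=2, only_b_pos=1, both_neg=84)
print(round(kappa, 3))
```

With these counts the observed agreement is 0.97 and the chance agreement is 0.752, which yields a kappa of about 0.879.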

Table 2 Distribution of cases in each ISH group.
Table 3 Cases with HER2 status discordance between observers.

Regarding the 3 discordant cases (positive/negative), only one case (case #17) had a HER2/CEP17 ratio and an average HER2 copy number quantification with a margin of error that did not cross the threshold between different ISH groups. The remaining two cases, as well as the cases with discordances between negative ISH groups, had HER2/CEP17 ratio or average HER2 copy number quantifications with margins of error that crossed the thresholds between different ISH groups. After a joint microscopy session, we identified the following reasons for discordant results: weak black signals (case #8), slight non-specific precipitation of black signals (case #17), and quantification of signals in non-invasive carcinoma (in situ carcinoma) (case #80) (Fig. 2). GH assessed by visual observation according to the 2018 ASCO/CAP guideline (discrete population of tumor cells with HER2 amplification) was not documented in any of the cases.

Fig. 2: ISH images from cases with HER2 status discordance between observers.

A: case #8 with weak black signals; B: case #17 with slight non-specific precipitation of black signals; C: case #80 with invasive carcinoma (right) with HER2 amplification and ductal carcinoma in situ (left) without HER2 amplification.

To objectively measure GH of different populations of tumor cells, we compared the HER2/CEP17 ratio and the average HER2 copy number of different areas in each case, analyzing the overlap of margins of error. We observed that in 20% and 39% of the cases, there was at least 1 area with a HER2/CEP17 ratio or average HER2 copy number, respectively, different from the remaining areas. When using a combined threshold of HER2/CEP17 ratio of 2.0 and average HER2 copy number of 4.0 and 6.0 (corresponding to different ISH groups), GH in different areas was observed in only 7% of the cases (Table 4). Of these, 3 cases had positive and negative ISH areas, with only one case (case #70) showing one different area with a margin of error that did not cross the combined threshold. This case was classified as positive (ISH group 3), which was concordant with the original report, although the negative area (ISH group 4) was initially overlooked (Fig. 3A). The other two cases (cases #34 and #66) had a slight non-specific precipitation of black signals, which caused the classification of just one area as positive in each case, with average HER2 copy number margins of error crossing the thresholds of negativity (Fig. 3B, C). Regarding the remaining 4 cases with different negative ISH areas, 3 cases had a combination of groups 5 and 2, and 1 case had a combination of groups 5 and 4. In addition, 2 of these heterogeneous cases (cases #64 and #66) were part of the 7 cases with discordances between the observers.

Table 4 Cases with genetic heterogeneity in different areas.
Fig. 3: ISH images from cases with genetic heterogeneity in different areas.

A: case #70 ISH group 3; B and C: case #34 and #66 with slight non-specific precipitation of black signals.

Single cell GH (10% < positive cells < 90%) was observed in 84%, 34%, and 8% of the cases when using the threshold of HER2/CEP17 ratio >2.0, and of HER2 copy number >4.0 or >6.0, respectively. However, when using a combined threshold of HER2/CEP17 ratio >2.0 and HER2 copy number >4.0, or HER2 copy number >6.0 regardless of ratio (corresponding to ISH groups 1 and 3, respectively), single cell GH was observed in 27% of the cases. We observed a high correlation between the number of positive cells observed in the cases and the number of positive cells expected if the measurements in each case were normally distributed (PCC = 0.973, p < 0.001; PCC = 0.988, p < 0.001; and PCC = 0.996, p < 0.001, using the threshold of HER2/CEP17 ratio of 2.0, and of average HER2 copy number of 4.0 and 6.0, respectively) (Fig. S4). Finally, we evaluated the relationship of single cell GH with the values of HER2/CEP17 ratio and of average HER2 copy number, observing that it reaches its maximum value near the thresholds of positivity (Figs. 4 and S5).
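One way to obtain an expected proportion of positive cells under a normal distribution is to evaluate the normal tail probability above the threshold, using the mean and standard deviation of the per-cell measurements. The sketch below illustrates that idea together with the 10 to 90% criterion for single cell GH; it is an assumption about how such an expectation could be computed, not a description of the exact procedure used in this study, and all names and example values are hypothetical.

```python
from statistics import NormalDist


def expected_positive_fraction(mean: float, sd: float, threshold: float) -> float:
    """Fraction of cells expected above the threshold if per-cell values
    were normally distributed with the given mean and SD (sd must be > 0)."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)


def observed_positive_fraction(values, threshold: float) -> float:
    """Fraction of cells whose measured value exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)


def is_single_cell_gh(fraction: float, low: float = 0.10, high: float = 0.90) -> bool:
    """Single cell GH as defined in the text: 10% < positive cells < 90%."""
    return low < fraction < high


# Hypothetical case with mean HER2 copy number 4.2 and SD 1.5.
fraction = expected_positive_fraction(mean=4.2, sd=1.5, threshold=4.0)
print(round(fraction, 2), is_single_cell_gh(fraction))
```

Under this model, a case whose mean lies exactly on the threshold is expected to have half of its cells classified as positive, which is why single cell GH peaks near the thresholds of positivity.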

Fig. 4: Relationship between heterogeneity and the value of HER2/CEP17 ratio and HER2 copy number.

A: relationship between the HER2/CEP17 ratio and the proportion of single cell genetic heterogeneity using the combined threshold; B: relationship between the HER2 copy number and the proportion of single cell genetic heterogeneity using the combined threshold.

Discussion

In recent years, we have been documenting the effect of the latest ASCO/CAP guidelines for HER2 evaluation in BC [17, 18]. In the era of precision medicine, it is important to provide not only the correct result but, whenever applicable, the most precise one. The ISH assay for HER2 in BC aims to quantify HER2 gene amplification, supplying a continuous result. Precision becomes even more relevant in cases in which the result is near the decision thresholds. As far as we know, there are no studies evaluating the exact precision of this assay in the context of BC. In this work, we show that about 60 invasive cancer cells are required to stabilize the coefficient of variation of the HER2/CEP17 ratio, a number that many laboratories already use empirically and for which we now provide the underlying mathematical reasoning. In addition, the precision obtained when following the 2018 ASCO/CAP guideline is very low, with margins of error of 0.40 for the HER2/CEP17 ratio and 0.53 for the average HER2 copy number when counting 20 invasive cancer cells (the minimum number of cells required). Even when increasing the number of invasive cells to 100 (which is rarely done), the margins of error do not fall below 0.20. This means that in cases with a HER2/CEP17 ratio between 1.8 and 2.2, when at least 40 cells should be evaluated, it is very likely that the final result has margins of error crossing the decision thresholds. As such, we continue to maintain that 20 cells should not be the minimum cell number recommended by current guidelines [19]. Although the number of cases could be higher, which is a limitation of this work, we propose that laboratories quantify their margins of error and include them in the HER2 ISH report.

The 2018 ASCO/CAP guideline already mentions the problem of cases with HER2 evaluations near decision thresholds, although the explanation provided by the authors may, in our opinion, be an unfortunate one, as it states that “there is a high likelihood that repeat testing will result in different results by chance alone” [1]. Instead of giving the impression that results can be random, it should be pointed out that any quantitative measurement in any field is subject to imprecision, and HER2 gene amplification is no exception to this rule. We have shown previously that the intra- and interobserver concordance rate of the HER2 ISH test increases with increasing cell count from 20 to 60 invasive cells [19]. We have also shown, along with others, that cases near the thresholds are precisely the cases that can have intra-observer as well as interobserver discordances, making a precise result even more clinically relevant [19, 20]. Our data suggest that about 450 or 930 invasive cells must be evaluated to reach margins of error of 0.1 (for HER2/CEP17 ratio and average HER2 copy number, respectively), which is impractical for manual assessment. To overcome this limitation, image analysis of ISH tests can assist this quantification, having the potential to evaluate thousands of cells and lower the margins of error to minimal values. Clinicians should be aware of the limitations of current measurements, integrating the results, preferably with margins of error, along with the traditional criteria to decide the best treatment for individual patients.

In our work, we observed a high correlation of measurements between both observers and a concordance rate above 95%, with only one discordant case with margins of error that did not cross the decision thresholds. Although the invasive cancer was marked on HE by the pathologist, we were able to trace a case in which the quantification was made on in situ carcinoma (mixed with invasive carcinoma), underlining the importance of training, experience, and supervision in the interpretation of the HER2 ISH test. The UK guidelines recommend that when training new professionals in the HER2 ISH test, at least 100 ISH tests should be evaluated in parallel with an experienced observer until a minimum concordance of 95% is reached [16].

The first definition of HER2 genetic heterogeneity (HER2-GH), published in 2009 as an extension of the 2007 ASCO/CAP guideline, described it as HER2 gene amplification in 5 to 50% of individual invasive cancer cells [8]. Afterwards, several studies reported the presence of HER2-GH in 5 to 40% of BC cases [21,22,23,24,25,26]. The definition was not based on clinical studies of prognosis or response to targeted therapy, representing the first step to unravel the clinical significance of HER2-GH. Later, it was shown that the presence of HER2-GH was associated with reduced disease-free survival and a poorer response to anti-HER2 targeted therapy [25, 26].

In this study, we show that single cell HER2-GH can be accurately estimated from the values of HER2/CEP17 ratio and average HER2 copy number, being higher near the thresholds of positivity. Previously, it was demonstrated that HER2-GH is more frequent in cases with low HER2 amplification and ratios near the thresholds, and that GH measured in individual cells is not informative of clonal heterogeneity within a tumor population [24, 25, 27]. The different prognosis, as well as response to targeted therapy, in BC cases with single cell HER2-GH can be explained by the closeness of HER2/CEP17 ratio and HER2 copy number values to the thresholds, representing nothing more than a mathematical artifact rather than the presence of different biological clones.

We acknowledge that the definition of single cell HER2-GH used in this study is different from the initial definition proposed in 2009. The upper limit of 50% was soon criticized, with the argument that if HER2-GH is defined as the presence of more than 5% of amplified cells, then 95% should be used as the upper limit. The application of the original definition could give rise to paradoxical cases, with difficult interpretation and unknown clinical significance [28, 29]. In this study, we chose to use the same proportion of positivity in both definitions (10–90%) to compare the effect of measuring HER2-GH using single cells or populations of cells, excluding other differences as a source of discrepancy in the results.

The 2013 ASCO/CAP guideline changed the definition of HER2-GH from individual cells to a discrete population of tumor cells (at least 10% of the total tumor cell population with HER2 amplification) [2]. After this change, the presence of HER2-GH has been reported in less than 2% of the cases, although there is still the need to determine the minimal proportion of amplified tumor cell population that achieves clinical response to HER2-targeted therapy [17, 18, 30, 31]. Interestingly, it was shown that in BC cases with HER2-GH up to 30%, both cases with clustered and scattered amplified cells showed an intermediate clinical behavior between homogeneous amplified and non-amplified BC [31]. Moreover, HER2-GH by IHC (score 3+ in >10 and <100%) was not considered in this study, which focused instead on ISH analysis in equivocal results (score of 2+). Importantly, HER2-GH can be missed when evaluating only core biopsies rather than excision specimens, although it is less frequently observed than for other breast biomarkers (estrogen and progesterone receptors) [32].

In this work, we clearly show that single cell HER2-GH is much higher than population HER2-GH in equivocal (score of 2+) BC cases. When using a HER2/CEP17 ratio higher than 2.0 or an average HER2 copy number higher than 4.0, we can document GH in about 85% or a third of the cases, respectively. Even when using a combined threshold, corresponding to different ISH groups, we still report single cell HER2-GH in about 30% of the cases, which is much higher than the population HER2-GH observed (less than 10% of the cases). Among the latter, only 3 cases had a mixture of positive and negative areas, and only one case (case #70) had quantifications with margins of error far from the thresholds, as well as without artifacts, supporting true HER2-GH. Although we could not find the negative area in that case after review, the quantification of the average HER2 copy number above 4.0 makes it unlikely that the measurement was made in an area other than invasive carcinoma, probably representing a minor component. Nonetheless, the impact of the presence of minor components (either HER2 amplified or not) needs to be further studied and clarified. Unfortunately, the absence of clinical data in this cohort (such as survival data or response to targeted therapy) made it impossible to correlate clinical outcome with HER2-GH.

Cases with population HER2-GH as defined by different negative ISH groups (5, 4, and 2) represent heterogeneous negative BC cases that might display different clinical behavior compared to homogeneous negative group 5 BC cases, a situation that, although rare in this study (less than 5%), has not been investigated in the literature. It has already been shown that BC cases from groups 4 and 2 have a worse clinical outcome compared to group 5 [33, 34]. Importantly, most BC cases (more than 90%) disclosed homogeneity in HER2/CEP17 ratio and in average HER2 copy number quantifications, with only 1% of the cases showing convincing population HER2-GH.

In conclusion, we show that margins of error in the HER2 ISH test are high, even when counting 100 cells, a limitation that could be overcome through quantification with image analysis. In addition, we show that population HER2-GH is a rare event, and that single cell HER2-GH is maximal in cases near the thresholds of positivity.