Main

Amplification of the HER2 gene and concomitant protein overexpression are present in between 10 and 20% of primary breast cancers.1, 2, 3 Identification of this subset of breast cancers has become a key component of the diagnostic workup of all new breast cancers, given the aggressive nature of these tumors and the role of HER2 status in predicting response to various treatment modalities. HER2 status has been shown to predict sensitivity to anthracycline-based chemotherapy regimens.4, 5, 6, 7 In addition, amplification of the HER2 gene and/or overexpression of the HER2 protein confers relative resistance to cytoxan-based regimens8 and tamoxifen-based therapies in the setting of estrogen receptor-positive breast cancers.9 Perhaps most importantly, breast cancers with HER2 alterations are targets for treatment with trastuzumab, a humanized monoclonal antibody, which has been shown to improve response rate and survival markedly when added to chemotherapy or as a monotherapy.10, 11, 12 Recent studies have demonstrated that adjuvant trastuzumab can reduce risk of recurrence by one-half, and mortality by one-third, in early stage breast cancer patients.13, 14 Other agents, targeting the HER2 gene product, have also demonstrated clinical utility15 and several more are in development.

HER2 testing has become an essential part of the clinical evaluation of all breast cancer patients in the United States, and accurate HER2 results are critical in identifying patients for whom this targeted therapy is appropriate. This is particularly important given the cardiotoxic side effects of trastuzumab seen in approximately 1.4% of patients receiving the drug as a single agent,10, 16, 17 and in even higher percentages of patients receiving trastuzumab concomitantly with paclitaxel (13%) or anthracyclines (27%),11 as well as the high cost of the drug.18, 19

Although a tight association between HER2 gene amplification and protein overexpression has been documented in breast cancers by western and northern blot analyses,20 Press et al21 have demonstrated that immunohistochemistry (IHC) on deparaffinized, formalin-fixed tissue can be quite variable in its ability to identify HER2-amplified tumors. The high level of discordance between HER2 protein expression by IHC and HER2 gene amplification by fluorescence in situ hybridization (FISH) has been documented in several studies. Discordance rates may be as high as 20% when HER2 testing is performed in low volume, local laboratories, whereas discordance is believed to be lower in high volume, central laboratories.22, 23 More recent studies continue to document significant levels of discordance between results of HER2 studies performed at local and central laboratories, eg, 18% for IHC and 12% for FISH,24 and a 21.8% false-positive rate and 8.9% false-negative rate for HER2 IHC (vs by FISH) at local laboratories.25

Addressing this issue of HER2 test accuracy, the American Society of Clinical Oncology (ASCO) and the College of American Pathologists (CAP) have recently released new guidelines for laboratory testing of HER2 status in breast cancer.26, 27 HER2 IHC scoring is reported as negative (0/1+), equivocal (2+), or positive (3+). HER2 FISH is reported as amplified (HER2/CEP17 ratio >2.2), equivocal (ratio 1.8–2.2), or negative (ratio <1.8). Among other things, these guidelines require validation of HER2 testing by all laboratories performing HER2 testing, which entails documenting 95% concordance rates between cases that are IHC 3+ and FISH amplified, and between cases that are IHC 0/1+ and FISH nonamplified.
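The reporting categories quoted above can be written as two small lookup functions. This is a minimal Python sketch for illustration only, not part of the guidelines; the function names `classify_ihc` and `classify_fish` are hypothetical, and treating a ratio of exactly 1.8 or 2.2 as equivocal is an assumption based on the ranges given in the text.

```python
def classify_ihc(score: int) -> str:
    """Map an IHC score (0, 1, 2 or 3) to its reporting category."""
    if score not in (0, 1, 2, 3):
        raise ValueError("IHC score must be 0, 1, 2 or 3")
    if score <= 1:
        return "negative"
    if score == 2:
        return "equivocal"
    return "positive"


def classify_fish(ratio: float) -> str:
    """Map a HER2/CEP17 ratio to its reporting category.

    Assumption: the boundary values 1.8 and 2.2 are counted as equivocal.
    """
    if ratio < 0:
        raise ValueError("ratio cannot be negative")
    if ratio > 2.2:
        return "amplified"
    if ratio >= 1.8:
        return "equivocal"
    return "negative"
```

Under these assumptions, a case with an IHC score of 3 and a ratio of 2.1 would be reported as IHC positive but FISH equivocal.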

A number of factors appear to improve concordance levels between HER2 assessment by IHC and FISH. Image analysis has been demonstrated to reduce interobserver variability among pathologists evaluating HER2 IHC, and also to produce better concordance with HER2 FISH.28, 29 We have previously demonstrated the value of an ongoing quality assurance program, entailing parallel testing by IHC on all FISH cases, which significantly improves concordance between the two methods.1 Vincent-Salomon et al30 have documented improved IHC and FISH concordance by ‘recalibrating’ the IHC methodology. Leong et al31 have shown that requiring 3+ positivity by IHC to include a circumferential ‘tram-track’ pattern from staining of apposing cell membranes in >25% of the tumor cells led to 100% concordance of IHC and FISH.

We have previously demonstrated that a significant decrease in false-positive (IHC3+/FISH−) results can be obtained through a modification of the FDA-approved scoring system for HER2 IHC by obtaining a normalized IHC score for the breast cancer.32 This score is obtained by subtracting the score representing the level of immunostaining on the non-neoplastic breast epithelium from the score representing the level of immunostaining on the tumor. However, this study only included 48 cases from a single institution that were initially fixed in alcoholic formalin and subsequently in neutral buffered formalin. The present study was designed to evaluate a normalized IHC scoring system on a large number of breast cancer cases from multiple institutions, and to compare this normalized scoring system with the widely used, FDA-approved scoring system, with specific attention to the achievement of the high levels of concordance of HER2 testing between IHC and FISH mandated by the new ASCO–CAP guidelines.

Materials and methods

Study Design

From January 2003 to December 2006, 16 141 breast tumor specimens were submitted to PhenoPath Laboratories (Seattle, WA, USA) for HER2 testing. Cases submitted for IHC testing with indeterminate results (2+ staining) were further tested by FISH, accounting for a disproportionately high fraction of 2+ cases in this study cohort. As part of an ongoing quality assurance program, cases submitted for primary FISH testing were tested for HER2 status by IHC. A total of 6604 tumors were tested in parallel by both methods. Tumor specimens were received from over 100 hospitals and cancer centers in 29 states. Specimens included sections from primary breast resections, needle core biopsies, and metastatic lesions. All tissues had been fixed in formalin, although the duration of fixation and the exact nature of the buffer in which the formalin was made were not recorded in most cases. All tissues were submitted as paraffin blocks or precut tissue sections.

Immunohistochemistry

Tissue sections were deparaffinized and rehydrated before incubating them in 0.01 M citrate buffer at pH 6.0 in a steamer for 40 min at more than 95°C. All immunohistochemical procedures were performed on a Dako Autostainer (Dako, Carpinteria, CA, USA). A polyclonal antibody to HER2 (A0485; Dako) was applied at a 1:200 dilution in phosphate-buffered saline (PBS) to sections and incubated for 40 min at room temperature. With intervening wash steps in PBS, slides were incubated for 30 min at room temperature in a rabbit-specific labeled polymer (EnVision™+; Dako), followed by 10 min at 37°C in a solution containing 3% hydrogen peroxide and 3,3′-diaminobenzidine. Slides were counterstained with hematoxylin.

Normalized Scoring Methodology

Immunostained slides were scored according to a modification of the scoring system approved by the FDA, as described previously.32 Only invasive carcinoma was scored among the neoplastic cells. For tumor cells, only membrane staining intensity and pattern were evaluated using the semi-quantitative scale of 0–3+. The non-neoplastic epithelium was scored on a 0–3+ scale using identical criteria. The normalized HER2 score subtracts the score on the benign cells from that on the tumor cells. If benign epithelium was not present in the section, the non-normalized score on the tumor was used. An example of this normalized IHC scoring system is shown in Figure 1.

Figure 1

Calculation of normalized scoring system involves determination of IHC scores of tumor (left) and non-neoplastic breast epithelium (right). (a) Tumor with IHC score of 3, adjacent non-neoplastic breast epithelium with IHC score of 0; normalized score=3−0 or 3. (b) Tumor with IHC score of 3, adjacent non-neoplastic breast epithelium with IHC score of 1; normalized score=3−1 or 2. (c) Tumor with IHC score of 2, adjacent non-neoplastic breast epithelium with IHC score of 1; normalized score=2−1 or 1.
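The subtraction rule described in this section, including the fallback when no benign epithelium is present, can be sketched in a few lines of Python. The function name `normalized_her2_score` is hypothetical. The text does not say what to do if benign epithelium stains more strongly than the tumor, so flooring the result at 0 is an assumption made here.

```python
from typing import Optional


def normalized_her2_score(tumor: int, benign: Optional[int] = None) -> int:
    """Subtract the benign-epithelium IHC score from the tumor IHC score.

    Both scores use the semi-quantitative 0-3 scale. If no benign
    epithelium is present (benign is None), the tumor score is returned
    unchanged, as described in the text.
    """
    for name, value in (("tumor", tumor), ("benign", benign)):
        if value is not None and value not in (0, 1, 2, 3):
            raise ValueError(f"{name} score must be 0, 1, 2 or 3")
    if benign is None:
        return tumor
    # Assumption: a negative difference is reported as 0.
    return max(tumor - benign, 0)
```

Applied to the three panels of Figure 1, `normalized_her2_score(3, 0)` gives 3, `normalized_her2_score(3, 1)` gives 2, and `normalized_her2_score(2, 1)` gives 1.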

Fluorescence In Situ Hybridization

Deparaffinized tissue sections were pretreated using a modification of the vendor's standard protocol, and then incubated with the FDA-approved Vysis PathVysion™ probe set, which includes a SpectrumGreen-conjugated probe to the α-satellite DNA located at the centromere of chromosome 17 (17p11.1–q11.1) and a SpectrumOrange-conjugated probe to the HER2 gene (Abbott Diagnostics, Chicago, IL, USA). Morphometric analysis was performed using a MetaSystems™ image analysis system, incorporating the Metafer software with extended focus/tile sampling methodology (MetaSystems™, Altlussheim, Germany). Manual counting was performed on all cases in which the presence of autofluorescence and/or artifact prevented the counting of sufficient numbers of cells. In addition, all cases with HER2/CEP17 ratios between 1.5 and 2.5 by morphometric analysis were scored manually by counting green and orange signals from at least 60 nonoverlapping cells.
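The manual re-count rule can be illustrated with a short Python sketch. It assumes, as is conventional but not stated in the text, that the HER2/CEP17 ratio is the total number of orange (HER2) signals divided by the total number of green (CEP17) signals over the counted cells. It also assumes that the 1.5 to 2.5 window is inclusive. The function names are hypothetical.

```python
from typing import Sequence

MIN_MANUAL_CELLS = 60  # minimum nonoverlapping cells for a manual count (from the text)


def her2_cep17_ratio(her2_signals: Sequence[int], cep17_signals: Sequence[int]) -> float:
    """Total HER2 signals divided by total CEP17 signals, one entry per cell."""
    if len(her2_signals) != len(cep17_signals):
        raise ValueError("each cell needs both a HER2 and a CEP17 count")
    total_cep17 = sum(cep17_signals)
    if total_cep17 == 0:
        raise ValueError("no CEP17 signals counted")
    return sum(her2_signals) / total_cep17


def needs_manual_count(ratio: float, low: float = 1.5, high: float = 2.5) -> bool:
    """True if a morphometric ratio falls in the window that triggers manual scoring."""
    return low <= ratio <= high
```

For example, 60 cells with 4 HER2 and 2 CEP17 signals each give a ratio of 2.0, which falls inside the window and would be re-counted manually.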

Data Analysis

HER2/CEP17 ratios obtained by FISH analysis were compared with the normalized and non-normalized IHC scores to determine respective concordance rates.

Results

Table 1 compares the non-normalized and normalized IHC scores with FISH amplification, which was used as the ‘gold standard.’ Among the 6604 tumors in which both IHC and FISH tests were performed, using a non-normalized IHC scoring system, 267/872 (30.6%) of the IHC 3+ cases proved to be nonamplified (false positive) by FISH, whereas using the normalized scoring system only 30/562 (5.3%) of IHC 3+ cases proved to be ‘false positive.’ For cases that were negative by IHC (0/1+), there was no significant difference in the number that were amplified by FISH between the non-normalized system, 9/1076 (0.8%), and the normalized system, 15/1076 (1.4%).

Table 1 Raw data of normalized and non-normalized IHC scores compared to FISH amplification used as ‘gold standard’

These results are shown graphically in Figure 2. Overall, using the normalized scoring system, 1904/1919 (99.2%) of those showing IHC results of 0 or 1+ proved to be nonamplified by FISH; 532/562 (94.7%) of those cases showing IHC results of 3+ proved to be amplified; and 529/4123 (12.8%) of those cases showing 2+ IHC results proved to be amplified. Among those cases that were IHC 3+ before normalization and 2+ after subtraction of staining on benign glands, 12% were amplified by FISH, which is no different from the overall percentage of IHC 2+ cases that were amplified. No IHC 3+ cases became IHC negative (0 or 1+) following normalization.

Figure 2

Graphic depiction of comparison between normalized and non-normalized IHC scores in relationship to FISH amplification. Percentage of cases showing FISH amplification is depicted on y axis.

Of the 15 cases that were IHC negative and amplified by FISH, 6 (40%) had HER2/CEP17 ratios of 2.1 or 2.2 (data not shown), values that are considered ‘equivocal’ using the new ASCO–CAP guidelines. Of the 30 cases that were IHC 3+ and FISH nonamplified, 7 had more than 4 HER2 gene copies, but the cells demonstrated polysomy of chromosome 17 and the HER2/CEP17 ratio was less than 2. These cases were therefore scored as ‘negative’ for amplification by FISH.

The concordance rates of IHC and FISH comparing the two scoring methods are presented in Figure 3. Using the normalized scoring method, the concordance rate between IHC 3+ and FISH amplification was 94.7%. Using the non-normalized scoring method, the concordance rate was only 69.4%. Concordance rates of IHC 0/1+ and FISH nonamplified were not significantly different between the two methods, 99.2 and 99.5%, respectively.

Figure 3

Overall concordance between HER2 IHC and FISH results, comparing normalized and non-normalized scoring systems. Concordance percentages are depicted on y axis.
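The percentages reported in this section follow directly from the case counts given in the text. As an illustrative check, the short Python sketch below recomputes them. The helper name `percent` is hypothetical, and the counts of amplified IHC 3+ cases are derived here by subtracting the stated false positives from the stated totals.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)


# IHC 3+ cases that were nonamplified by FISH (false positives)
fp_non_normalized = percent(267, 872)   # 30.6
fp_normalized = percent(30, 562)        # 5.3

# Concordance of IHC 3+ with FISH amplification (total minus false positives)
pos_concordance_non_normalized = percent(872 - 267, 872)  # 69.4
pos_concordance_normalized = percent(562 - 30, 562)       # 94.7

# Concordance of normalized IHC 0/1+ with FISH nonamplified
neg_concordance_normalized = percent(1904, 1919)          # 99.2
```

The last figure is also the negative predictive value of the normalized IHC score, since it is the fraction of IHC-negative cases that were nonamplified by FISH.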

Discussion

The accuracy of diagnostic assays for HER2 in breast cancer is extremely important as HER2 status is not only a prognostic marker but also predictive of response to chemotherapy, particularly to HER2-targeted therapy such as trastuzumab.10, 11, 12 The diagnostic tests most widely used are IHC and FISH, measuring protein overexpression and gene amplification, respectively. There is a wide reported variation in both the accuracy of, and concordance between, these two methods. In general, documented concordance rates have fallen well below the 95% threshold mandated by the new ASCO–CAP guidelines, with many studies demonstrating concordance rates (excluding 2+ cases) closer to 80–90%.3, 33, 34, 35, 36, 37, 38 The wide range of reported concordance rates between IHC and FISH assessment of HER2 status in breast cancer reflects, at least in part, the wide variation in methodology, instrumentation, and experience of the laboratories performing the testing.

The sensitivity and accuracy of HER2 testing by IHC are highly dependent upon both preanalytical factors, such as tissue fixation,39 and analytic factors, such as choice of anti-HER2 antibody employed in the IHC assay.21 Although the introduction of HercepTest™, an FDA-approved kit for IHC testing, was intended to bring a high level of accuracy and reproducibility to HER2 IHC testing, HercepTest™ has in fact been demonstrated in several studies to produce significant numbers of false positives (ie, cases demonstrated to be nonamplified by FISH).32, 36, 40, 41 Furthermore, the accuracy of HercepTest™ in identifying HER2 status in deparaffinized sections of a series of 117 well-characterized breast cancers was 88.9%.42 We have shown here that a normalized scoring method minimizes the number of false-positive IHC results, reducing the false-positive rate from 31 to 5%. Our improved HER2 accuracy is likely due to the normalization process reclassifying cases possessing a high level of immunostaining that is not a consequence of HER2 gene amplification leading to protein overexpression. In such cases, the high-level immunostaining could represent manifestations of preanalytical variables related to tissue fixation and/or processing. We do not believe that the high HER2 IHC accuracy reported here is attributable to our use of the Dako A0485 polyclonal antibody outside the HercepTest™ immunostaining kit and protocol, although this might be worth further investigation.

We have achieved an extremely high concordance rate between HER2 testing by IHC and FISH, despite the use of tissues from a wide range of hospitals and laboratories with nonstandardized fixation and tissue processing. The key feature contributing to this high level of concordance was the use of a normalized IHC scoring system, which dramatically reduced the incidence of IHC 3+ cases that proved to be nonamplified, thereby increasing the specificity of this assay. Importantly, the use of this normalized scoring method did not significantly alter the sensitivity of IHC. Cases that were IHC 0/1+ (negative) were amplified in only 0.8% of cases when using a non-normalized score and only 1.4% when using normalization. Of these cases, 6/15 had ratios of 2.1 or 2.2 and 12/14 had ratios less than 4. Therefore, according to the newly published guidelines, 40% would fall in the equivocal category and require repeat testing. The negative predictive value of IHC using the normalized scoring method was 99.2%.

Although attaining near-perfect correlation between assessment of HER2 status by IHC and FISH is a laudable goal, discordance between these two measurements may be a function of both biology and laboratory error. For example, Pauletti et al43 have demonstrated that at least 3% of breast cancers show protein overexpression in the absence of concomitant gene amplification, implying that such cancers manifest high levels of protein expression through a mechanism other than gene amplification. Several investigators have shown that polysomy of chromosome 17 can account for a small subset of breast cancers showing 3+ levels of HER2 immunostaining but no amplification by FISH when the HER2/chromosome 17 ratio is evaluated.44, 45, 46 In the present study, of the 30 cases that were IHC 3+ and FISH nonamplified, 8 had polysomy of chromosome 17 with HER2/CEP17 ratios that were less than 2. Therefore, a concordance rate of 95% or higher may well be biologically unattainable. Using a normalized IHC scoring system, we were nearly able to achieve this 95% concordance rate between positive IHC and FISH (94.7%).

The new ASCO–CAP guidelines mandate significant changes in HER2 testing in laboratories throughout the United States. As technical handling of tissue continues to be a significant factor in standardization of test quality, the new guidelines mandate fixation in 10% neutral buffered formalin for a minimum 6-h and maximum 48-h duration. Although optimal fixation is extremely important, the potentially adverse effect of fixation resulting in strong HER2 IHC immunostaining appears to be overcome through the use of this normalizing scoring method. Indeed, the specimens studied here were retrieved from over 100 hospitals from across the United States and represent a wide variation in tissue processing and fixation.

In summary, extremely high concordance between IHC and FISH assessment of HER2 status in breast cancer is achievable, but to attain this high level of concordance, modification of the FDA-approved IHC scoring system is required. If the published literature is a guide, it seems likely that many laboratories may need to revise their IHC scoring method along the lines suggested in this study to achieve the high level of concordance mandated by the ASCO–CAP guidelines.