INTRODUCTION

Progress in molecular biology has resulted in the identification and greater understanding of molecular markers that may have prognostic and predictive value for breast cancer patients. The human epidermal growth factor receptor-2 (HER2/neu/c-erbB-2) is one of the best characterized of such markers. The subset of patients with breast cancer demonstrating a HER2-positive status has aggressive tumors and a poor prognosis (1, 2, 3). There is mounting evidence that HER2 status may predict response to chemotherapy and hormonal therapy, although conclusive data are needed (4, 5). Most important, demonstration of high HER2 receptor overexpression or HER2 gene amplification is essential for treatment with the anti-HER2 monoclonal antibody therapy Herceptin, which has significant clinical benefits in patients with metastatic breast cancer (6, 7, 8). Clinical studies have also shown that the level of HER2 overexpression correlates with clinical benefit. Patients whose tumors have high HER2 receptor overexpression and/or amplification of the HER2 gene benefit most from Herceptin (7, 9, 10, 11). For these reasons, testing for HER2 status is important for the management of patients with breast cancer, and accurately assessing HER2 status is essential in deciding which patients will benefit from Herceptin therapy.

Currently, no single assay is globally accepted as the gold standard for HER2 testing. Factors that can lead to inaccuracies in HER2 testing results include preparation, fixation, and storage of the tissue sections; the antibody or probe used to detect HER2; scoring or result interpretation; lack of validation of methodologies; experience of personnel; and interobserver variability.

Given the wide variation in testing procedures it is difficult to suggest gold-standard HER2 testing guidelines. As a first step toward standardizing HER2 testing procedures at a local level, several countries have developed national guidelines for diagnostic centers to follow. These include published guidelines from Australia (12), Canada (13), France (14), Germany (15), Japan (16), and the United Kingdom (17) and unpublished guidelines from the Czech Republic, Finland, Sweden, and The Netherlands, which are used routinely. Recommendations by the American Society of Clinical Oncology (ASCO; 18) and by the College of American Pathologists (CAP; 19) have also been published. The national guidelines vary in the actual recommendations and the level of detail (Table 1). The guidelines are continuously evolving as we better understand the issues surrounding HER2 testing and more data on HER2 and response to Herceptin become available. This review summarizes the areas of agreement in the current national testing guidelines and highlights issues in HER2 testing that can lead to variable results. By giving an overview of the national testing guidelines, we hope that areas of agreement and disagreement can be identified and considered. This should lead to further improvements in the level of reproducibility and accuracy of HER2 testing and increase the proportion of patients accurately identified as eligible for anti-HER2 therapy.

TABLE 1 Areas in Which National Guidelines Make Recommendations Regarding How to Test for HER2 Status

When to Test

There is debate over when to assess HER2 status. Recent testing guidelines from ASCO (18) recommend evaluating HER2 status on every primary breast cancer either at diagnosis or at the time of recurrence. The German Pathology Advisory Board also supports early determination of HER2 status. Advocates of early testing cite the fact that HER2 positivity is an early event in breast cancer development and that the HER2 status of primary tumors appears to correlate with that of metastatic sites (20, 21, 22, 23, 24, 25). However, others feel that it is only relevant to test for HER2 status in patients with advanced disease because this is the setting for which Herceptin is licensed. Of note, national regulations on storing paraffin blocks vary widely: for example, although there is no legal requirement in Germany, pathologists in France and Canada are legally required to store blocks for 10 and 15 years, respectively. Because breast cancer can recur up to 20 years after the first diagnosis, blocks may no longer be available at recurrence. There is therefore a practical argument for determining and recording the HER2 status of samples at diagnosis for use when the disease recurs.

Early testing for HER2 status could be particularly relevant if there is conclusive evidence of the value of HER2 in predicting response to adjuvant therapies. Current data suggest that HER2 positivity predicts response to anthracycline-based therapy (26, 27, 28, 29). Most reports also suggest that a HER2-positive status predicts resistance to hormonal therapy (30, 31, 32, 33, 34). One study suggests that the aromatase inhibitor letrozole may be more effective than tamoxifen in patients whose tumors are either EGFR (HER1) or HER2 positive and estrogen receptor (ER) positive (35). If these limited studies are confirmed by large, prospective, randomized trials, there will be strong pressure to screen all patients presenting with breast cancer for HER2 status. Studies are also investigating the clinical benefit of adjuvant Herceptin, which may ultimately necessitate HER2 testing at the time of diagnosis.

Assay Method

There is no agreement on the best method for determining HER2 status. Assays such as immunohistochemistry (IHC) and enzyme-linked immunosorbent assay (ELISA) detect HER2 receptor overexpression. Fluorescence in situ hybridization (FISH), Southern blotting, chromogenic in situ hybridization (CISH), and polymerase chain reaction (PCR) measure the level of HER2 gene amplification. However, many of these assays are currently limited to research. The two most commonly used HER2 assays in clinical diagnostics, and recommended by all current national testing guidelines, are IHC and FISH. At present, neither CAP (19) nor ASCO (18) recommends the use of one over the other. CAP recommendations state that currently, “It is unclear whether FISH assays are superior to IHC, or whether FISH should be considered an adjunct or replacement.” Of note, CISH has superseded FISH in Finland as the recommended method for determining HER2 gene amplification and is performed only at the two national reference laboratories by highly trained personnel. However, CISH is not currently recommended for routine diagnostic use outside Finland.

The two most widely used techniques, IHC and FISH, can be conducted using a variety of antibodies (IHC) or probes (FISH), either as part of a kit or alone. Only the Danish testing guidelines specifically recommend one particular assay, the IHC-based HercepTest (DAKO). Whichever assay is used, it should be standardized by following written protocols and procedures and be regularly validated, internally and externally, through the implementation of quality control (QC) and quality assurance (QA) measures.

Tissue Processing

The handling and processing of tissue samples before the HER2 assay can affect the results. Thus, there is a need to standardize the steps and procedures involved, including the following:

  • Type of specimen

  • Time from excision to fixation

  • Specimen slicing before fixation

  • Duration of fixation

  • Type of fixative

  • Storage of paraffin-embedded specimens/tissue sections and slide preparation

Type of Specimen

It was originally thought that fresh (frozen) tissue samples would give the most accurate results. However, this is not the case for IHC and FISH. Currently, most HER2 testing is done when metastatic disease is diagnosed, using specimens taken at first diagnosis of the primary cancer and therefore likely to be stored for long periods. At the moment, surgically excised samples from lumpectomies or mastectomies are preferred, although if these are not available core biopsies can be used. Cell blocks from fine needle aspiration biopsy (FNAB) may also be useful, particularly in the metastatic setting (36, 37).

Time from Excision to Fixation

Time to fixation is an important issue. Tissue specimens should be fixed as soon as possible after removal, preferably within 1 hour, using standardized procedures and strictly adhering to fixation times.

Specimen Slicing before Fixation

The thickness of tissue slices before fixation can affect HER2 assay results by delaying the penetration of the fixative. Therefore, care should be taken to ensure that specimens are fixed optimally by slicing at 0.5–1.0 cm and fixing immediately.

Duration of Fixation

Optimal fixation is essential to ensure that the HER2 protein is preserved and then correctly identified by IHC. Optimal fixation times range from a minimum of 6 hours to a maximum of 48 hours (Rüschoff J, personal communication). Even tissue samples from core biopsies need a minimum period of fixation. Currently, there are no data on the reliability of quick fixation using a microwave in HER2 assays; therefore, this method is not recommended for routine clinical practice.
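The pre-analytic limits described so far (fixation preferably within 1 hour of excision, slices of 0.5–1.0 cm, and 6 to 48 hours of fixation) can be expressed as a simple checklist. The Python sketch below is illustrative only: the function name, the parameter names, and the decision to treat each limit as a hard bound are assumptions and are not taken from any national guideline.

```python
from typing import List


def preanalytic_warnings(minutes_to_fixation: float,
                         slice_thickness_cm: float,
                         fixation_hours: float) -> List[str]:
    """Return warnings for one specimen; an empty list means no limit was exceeded.

    The numeric limits come from the text above. Treating them as hard
    bounds is an assumption made for illustration.
    """
    warnings = []
    if minutes_to_fixation > 60:
        warnings.append("time from excision to fixation exceeds 1 hour")
    # Only thick slices are flagged, because the stated concern is
    # delayed penetration of the fixative.
    if slice_thickness_cm > 1.0:
        warnings.append("slice thicker than 1.0 cm")
    if not 6 <= fixation_hours <= 48:
        warnings.append("fixation time outside 6-48 hours")
    return warnings
```

For example, a specimen fixed 30 minutes after excision, sliced at 0.8 cm, and fixed for 24 hours yields no warnings, whereas one fixed for only 4 hours is flagged.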

Type of Fixative

Many different fixatives are used, including phosphate-buffered formalin, alcohol-formalin fixatives, and Bouin's solution. It has been reported that the type of fixative can affect the assessment of HER2 by IHC and introduce artifacts (38, 39). For example, formalin fixation is associated with some loss of HER2 overexpression, as determined by IHC, especially if fixation is for <24 hours or >48 hours. Alcohol-based fixatives, such as Z-5 and Pen-Fix, can generate false-positive cases when using IHC (39). In France, Bouin's solution is commonly used although it has the disadvantage of making retrospective FISH testing impossible. Alcohol-based fixatives can also hamper FISH testing. Of the national testing guidelines that specifically recommend fixatives (Australian [12], Czech Republic [unpublished data], Canadian [13], and United Kingdom [17]), all suggest using 10% phosphate-buffered or neutral-buffered formalin.

Storage of Paraffin-Embedded Specimens/Tissue Sections and Slide Preparation

After fixation and processing, tissue specimens are usually embedded in paraffin. Properly fixed and paraffin-embedded samples for HER2 testing will keep indefinitely before sectioning if stored at room temperature (20–25° C). There is anecdotal evidence that paraffin block sections should not be cut from the blocks and left at room temperature for a significant period of time before HER2 testing, as this may result in some loss of antigen in the section prepared. According to the recommendations from the manufacturer of the HercepTest, tissue sections mounted on slides and stored at room temperature (20–25° C) should be stained within 4–6 weeks of sectioning to maintain antigenicity. For FISH testing, the United Kingdom guidelines suggest that storing cut sections for more than 6–12 months should be avoided.

The thickness of the tissue sections can affect the visualization and interpretation of assay results. Therefore, sections should be cut at the standard thickness of 3–5 μm for IHC and 4–5 μm for FISH. A hematoxylin and eosin (H&E) section should be evaluated along with the IHC or FISH sections to ensure that there is adequate tumor tissue relative to normal tissue and to confirm the presence of an invasive component in the tumor.

Immunohistochemistry

Methodology

There are several factors that may contribute to the varying results observed with IHC assays:

  • Sensitivity and specificity of the antibodies

  • Use of antigen retrieval techniques

  • Antibody dilution

  • pH of buffer

  • Sensitivity and specificity of the detection system

Although there are a large number of anti-HER2 antibodies (each targeting different epitopes on the HER2 receptor), the most commonly used are the polyclonal antibody A0485, used alone or as part of the HercepTest, and the monoclonal antibodies CB11 (alone or as part of the Ventana PathWay kit) and TAB250. Several investigators have reported that the sensitivity and specificity of anti-HER2 monoclonal and polyclonal antibodies differ, although a number of these studies have also shown a high rate of concordance among the different anti-HER2 antibodies (39, 40, 41, 42, 43, 44).

It is important to be aware of the specificity and sensitivity of individual antibodies and to take into account that antigen retrieval increases the sensitivity of the antibody at the expense of specificity. The manufacturer's guidelines for the HercepTest specify that wet antigen retrieval, for example involving a water bath, should be used. However, antigen retrieval with a microwave is sometimes used with other antibodies although it can be more difficult to standardize. Whichever method is chosen, it should be closely monitored and standardized and should follow strict protocols (45). The United Kingdom guidelines suggest monitoring the antigen retrieval process by using normal breast epithelium for comparison. If excess antigen retrieval has occurred and normal epithelium stains positive, then it is suggested that the assay should be rejected and the sample retested. The increased sensitivity associated with antigen retrieval can be balanced by diluting the antibody for more optimal results. The pH of the buffer can affect the concentration of the antibody (46). Therefore, antibody dilution should be optimized for each laboratory and re-examined every time new reagents are used. The dilution should be calibrated using tissue arrays or cell lines as controls.

Most of the national guidelines suggest using one of the commonly used antibodies for IHC (Table 2). Some national guidelines, such as the Canadian and Swedish, recommend using at least two IHC antibodies with complementary specificity and sensitivity for HER2 testing. Only the Danish testing guidelines stipulate using a particular IHC assay, the HercepTest. Although some groups have reported that the HercepTest has good correlation with FISH as long as the manufacturer's protocol is scrupulously followed (47, 48), others have noted that the HercepTest can be associated with a high degree of false-positive results (38, 41, 49, 50), particularly when the score is determined to be IHC 2+. Thus, an internal validation of this kit is warranted before first use. Whichever antibody is used, the issues and pitfalls should be understood and considered when conducting the assay and interpreting results. It is essential to validate the assay first, and QC and QA measures should be undertaken regularly, internally and externally (Table 3).

TABLE 2 Commonly Used Anti-HER2 Antibodies Suggested for IHC by National HER2 Testing Guidelines
TABLE 3 Principles of Quality Control/Quality Assurance

Quality Control

The term quality control describes the internal validation procedures needed to guarantee the accuracy of a batch of HER2 test results. There should be standard operating procedures for IHC assays and the protocols should be calibrated, either by comparing results with those achieved using another technology, such as FISH, or by comparing them with an external control, such as cell lines calibrated by IHC or FISH. The use of controls of known HER2 levels alongside the assay procedure is mandatory, and most of the national testing guidelines stipulate inclusion of positive and negative controls (determined by IHC and FISH) as a minimum. Additional controls close to cutoff values are also recommended. Controls can be tissue arrays, tissue specimens of known immunoreactivity, or cell lines (51). As an example, SK-BR-3 is a high HER2-overexpressing cell line, MDA-175 overexpresses HER2 at an intermediate level, and the cell lines MDA-231 and MCF-7 do not overexpress HER2 and so can act as negative controls.

Quality Assurance

Quality assurance is the technical evaluation that compares the results from one laboratory with those from other laboratories (i.e., external controls). HER2 testing laboratories in the United Kingdom, Canada, France, and Germany are encouraged to join external technical QA programs, which can include confirmation of the percentage of positive and negative IHC results from one laboratory by retesting samples with another assay. For example, the Canadian guidelines recommend that 5% of samples should be retested by IHC at a designated reference laboratory and that if there are any discrepancies, the sample should be retested by FISH. Other QA initiatives include sending samples to a reference laboratory for confirmation by FISH. Of note, the concordance between high HER2 overexpression (IHC 3+) and FISH is around 90% (41, 52, 53, 54, 55, 56, 57). However, the concordance between IHC at the 2+ level and FISH can be as low as 25% (41, 54, 57), and most guidelines recommend retesting all IHC 2+ samples with FISH. Ring studies can also help promote training and experience among diagnostic personnel.
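As an illustration of how a laboratory might track the concordance figures quoted above, the following Python sketch computes, for a given IHC score, the fraction of cases that are also FISH amplified. The function name and the example data are hypothetical; the data are made up to reproduce the approximate 90% and 25% figures and are not drawn from any study.

```python
from typing import Iterable, Optional, Tuple


def fish_concordance(cases: Iterable[Tuple[str, bool]],
                     ihc_score: str) -> Optional[float]:
    """Fraction of cases with the given IHC score that are FISH amplified.

    Each case is a pair (IHC score, FISH amplified). Returns None when
    no case has the requested IHC score.
    """
    results = [amplified for score, amplified in cases if score == ihc_score]
    if not results:
        return None
    return sum(results) / len(results)


# Made-up example data for illustration only.
example_cases = ([("3+", True)] * 9 + [("3+", False)]
                 + [("2+", True)] + [("2+", False)] * 3)
```

With this example data, nine of ten IHC 3+ cases and one of four IHC 2+ cases are amplified by FISH.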

Result Assessment

Interpretation of stained samples can be subject to interobserver variability, which may affect the results (44, 58, 59, 60). Most national testing guidelines recommend the following when interpreting IHC results:

  • Score the percentage and intensity of cells showing complete membrane staining

  • Cytoplasmic staining should not be included when interpreting results

  • Assess staining in the invasive component but not in the in situ component

  • Normal epithelial cells should not stain. If staining is noted, the test should be rejected

  • Be aware of retraction artifacts, which may be falsely interpreted as positive

There are several methods of interpreting cells stained by IHC. The most common scoring system is recommended in the HercepTest manufacturer's protocol, but this has practical difficulties and is open to interpretation errors, particularly around the subjective IHC 2+/3+ cutoff point. However, the staining intensity can be compared with controls included with the HercepTest kit as an interpretation guide. DAKO, the manufacturer of the HercepTest, have recently modified their recommendations and propose FISH testing of all IHC 2+ tumors. The Finnish and Swedish guidelines advocate retesting all IHC 2+ and 3+ samples by in situ hybridization, diminishing the need to make a distinction between IHC 2+ and 3+ cases. However, this approach would increase costs. It should also be pointed out that the HercepTest scoring system may not be appropriate for use with other antibodies. Therefore, diagnostic personnel should conduct substantial validation studies to find the threshold of positivity for the system in their laboratory. Even so, there can still be a high level of interobserver variability, especially when distinguishing the 2+/3+ or equivalent cutoff. The Canadian guidelines recommend a cutoff for positivity of 10% of cells with moderate/strong complete membrane staining. Using this, a high level of concordance has been noted between IHC (CB11) and PCR (61). We believe that if there is any doubt over a score of 3+, it should be classified as equivocal and retested by another method, such as FISH.

Definitions of equivocal samples include the following:

  • Heterogeneous staining that may hinder interpretation

  • Cytoplasmic or retraction artifact obscuring the interpretation of true membrane staining

  • Weak staining in >30% of tumor cells

  • Staining of normal epithelium

  • Extensive incomplete membrane staining

There are several ways of improving the interpretation of IHC scores. The Australian and French guidelines suggest a regular audit of HER2-positive results in an unselected breast cancer population to check that these are within the reported limits of 15–25% (62, 63). There is also an association between certain histological types of tumor and HER2 status (64), and this can be used to question a HER2 result. The French and Canadian guidelines note that positivity of classic lobular carcinoma, mucinous and tubular carcinoma, or the absence of immunoreactivity in high-grade infiltrating ductal carcinomas should alert the pathologist to query the results. The Canadian guidelines also suggest questioning a negative HER2 result in high-grade, ER-negative clinically aggressive tumors or a negative result in Paget's disease or inflammatory carcinoma.
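The audit suggested by the Australian and French guidelines amounts to a simple proportion check. The Python sketch below is a minimal illustration under stated assumptions: the function name and return format are hypothetical, and the 15–25% range is treated as inclusive at both ends.

```python
def audit_positive_rate(n_positive: int, n_tested: int,
                        lower: float = 0.15, upper: float = 0.25) -> dict:
    """Compare a laboratory's HER2-positive rate with the reported range.

    The default limits are the 15-25% range cited in the text; treating
    the bounds as inclusive is an assumption.
    """
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    rate = n_positive / n_tested
    return {"rate": rate, "within_expected_range": lower <= rate <= upper}
```

For example, 40 positive results among 200 unselected cases gives a rate of 20%, which lies within the reported limits, whereas 70 of 200 (35%) would prompt a review of the assay and its interpretation.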

Fluorescence In Situ Hybridization

Probes

FISH has been shown to be a highly reproducible technique for HER2 testing, and prolonged storage of paraffin blocks does not appear to affect its sensitivity. Two FISH kits are commercially available: INFORM determines the absolute level of HER2 gene signals; PathVysion recognizes the potential for cells to be polysomic for chromosome 17 by calculating the ratio of HER2 gene to chromosome 17 centromere. It should be noted that the clinical significance of polysomy to anti-HER2 therapy is unknown. Both FISH assays take 2 days and require the use of expensive fluorescence microscopy. Most national testing guidelines favor the PathVysion assay. Personnel evaluating FISH results require extensive training and expertise to distinguish between normal and malignant cells and between intraductal and invasive tumor cells, and to spot artifacts.

Quality Control and Quality Assurance

QC and QA measures should be implemented for FISH assays, as discussed in the preceding section on IHC.

Result Assessment

FISH can quantify HER2 gene amplification. According to the manufacturers' protocols, HER2 gene amplification is scored either as an absolute value (INFORM; >4 considered positive) or as a ratio of HER2 gene copies to chromosome 17 copies (PathVysion; >2 copies of HER2 for each chromosome 17 considered positive). However, these cutoff values are arbitrary and may well be revised in the future. Interpretation of FISH results can be difficult when the ratio is between 1.8 and 2.2 for the PathVysion kit or when the score is between 4 and 6 with the INFORM kit. The current PathVysion scoring system did, however, prove clinically relevant when Herceptin pivotal trial data were re-analyzed and HER2 amplification was correlated with clinical outcome (9, 10).
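The cutoffs and difficult ranges described above can be summarized in a short Python sketch. It is illustrative only: the function name and the returned labels are hypothetical, and reporting the difficult ranges (1.8 to 2.2 for PathVysion, 4 to 6 for INFORM) as a separate "borderline" category is an assumption made here rather than a rule from either manufacturer's protocol.

```python
def classify_fish(kit: str, value: float) -> str:
    """Classify a FISH result using the cutoffs quoted in the text.

    kit   -- "INFORM" (value is the absolute HER2 score) or
             "PathVysion" (value is the HER2 / chromosome 17 ratio)
    value -- the score or ratio reported for the sample
    """
    if kit == "PathVysion":
        if 1.8 <= value <= 2.2:       # range described as hard to interpret
            return "borderline"
        return "amplified" if value > 2 else "not amplified"
    if kit == "INFORM":
        if 4 <= value <= 6:           # range described as hard to interpret
            return "borderline"
        return "amplified" if value > 4 else "not amplified"
    raise ValueError(f"unknown kit: {kit!r}")
```

Flagging borderline values separately, rather than forcing them into a positive or negative call, reflects the point that these cutoffs are arbitrary and may be revised.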

The number of cells needed to determine the level of HER2 gene amplification varies in the literature from 20 to 100 (65). However, the manufacturers' protocols recommend that for HER2 gene amplification, 60 nuclei should be counted and an average score taken. Cells from different areas of the sample should be counted. Diagnostic personnel must have enough experience to ensure that the cells counted are invasive cancer cells. With the PathVysion kit, only cells with one or more FISH signals of each color should be scored. The final result should then be calculated as the ratio of the average HER2 signal count to the average chromosome 17 signal count in 60 interphase tumor cells. A further issue with FISH is that excessive digestion can occur if the sample has been poorly fixed. Modifying the digestion time can give better visualization of the signal. One important point to note is that FISH is often considered to be the gold standard to which other HER2 assays are compared. However, the interobserver variability with FISH has not been determined and requires further examination.
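The ratio calculation for the PathVysion kit can be written out as a short worked example. The Python sketch below is a minimal illustration, assuming that each nucleus is recorded as a pair of signal counts and that nuclei lacking a signal of either color are excluded before averaging; the function name and the `min_nuclei` parameter are hypothetical.

```python
from typing import Sequence, Tuple


def pathvysion_ratio(nuclei: Sequence[Tuple[int, int]],
                     min_nuclei: int = 60) -> float:
    """Ratio of average HER2 signals to average chromosome 17 signals.

    nuclei     -- (HER2 signals, chromosome 17 signals) for each nucleus
    min_nuclei -- number of scorable nuclei required (60 in the text)
    """
    # Score only nuclei with at least one signal of each color.
    scorable = [(her2, chr17) for her2, chr17 in nuclei
                if her2 >= 1 and chr17 >= 1]
    if len(scorable) < min_nuclei:
        raise ValueError("too few scorable nuclei")
    mean_her2 = sum(her2 for her2, _ in scorable) / len(scorable)
    mean_chr17 = sum(chr17 for _, chr17 in scorable) / len(scorable)
    return mean_her2 / mean_chr17
```

For example, 60 nuclei that each show 6 HER2 signals and 2 chromosome 17 signals give a ratio of 3.0, which is above the cutoff of 2.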

What to Report

There is no consensus of national testing guidelines on what should be reported. The guidelines from Canada and the Czech Republic advocate full reporting of methodology (including controls, kit or antibody/probe) and scoring method. This is in line with recent recommendations from ASCO (18) and CAP (19). The ASCO recommendations state that because different laboratories use different assays, reporting should include “not only an estimate of HER2 levels but also a statement about the test's quality controls, the method, the specific kit or reagent, details of the scoring system and a statement regarding reproducibility, sensitivity and specificity of the assay, and a reference to the clinical validation of the assay or its correlation with a clinically validated c-erbB-2 test.” Such reporting may seem overly detailed, but it will help to improve the reproducibility of assays and decrease variability of results. When correlating results from different laboratories it may be important to know more than whether a HER2 result is positive or negative. As a minimum we recommend that the following should be reported:

  • Assay method

  • Antibody or probe

  • Scoring system (including number of cells analyzed for FISH)

  • Controls

  • Final score interpretation
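The minimum reporting items listed above could be captured in a structured record so that incomplete reports are easy to detect. The Python sketch below is one possible layout, not a prescribed format: the class name, field names, and the `missing_items` helper are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class HER2Report:
    assay_method: str                     # e.g. "IHC" or "FISH"
    antibody_or_probe: str
    scoring_system: str
    controls: str
    final_interpretation: str
    nuclei_counted: Optional[int] = None  # expected for FISH reports only

    def missing_items(self) -> List[str]:
        """Names of minimum reporting items that have been left empty."""
        missing = [f.name for f in fields(self)
                   if f.name != "nuclei_counted" and not getattr(self, f.name)]
        if self.assay_method.upper() == "FISH" and self.nuclei_counted is None:
            missing.append("nuclei_counted")
        return missing
```

A FISH report created without the number of cells analyzed would return `["nuclei_counted"]` from `missing_items()`, signaling that the scoring information is incomplete.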

Where to Test

Where to test is a contentious issue. Many of the national testing guidelines advocate centralized testing, at least to confirm equivocal IHC results. Centralized HER2 testing can be particularly useful for FISH, which needs extensive training and costly equipment. Centralized facilities assay greater numbers of samples per year, leading to greater experience and accuracy. Rigid quality control and validation add to the level of accuracy. But samples are generally fixed before being sent to the centralized facility and this cannot be controlled or standardized centrally. Local testing can be as accurate as centralized testing as long as there are education and training programs, standard and validated protocols, and quality control and assurance (48). Local laboratories can also provide results more quickly, with close contact between the pathologist and medical oncologist. However, a laboratory needs to test an adequate number of samples a year in order to have the technical experience to produce optimal results. The NSABP has specified that a laboratory must test at least 100 samples a month before patients tested at the laboratory can enter the NSABP B-31 Herceptin adjuvant trial (66). It is recommended in the United Kingdom (though not stated in the current guidelines) that laboratories should ideally test ≥250 samples a year by IHC (48).

Suggested Algorithm

HER2 positivity predicts a response to the anti-HER2 monoclonal antibody therapy Herceptin. Patients with strong HER2 overexpression (IHC 3+) or HER2 gene amplification benefit most from this therapy (7, 9, 10, 11). Testing algorithms should take this into consideration.

Most national testing guidelines suggest a similar testing algorithm. Tumor samples are initially tested by IHC. Samples with strong HER2 overexpression (IHC 3+) indicate eligibility for Herceptin therapy. IHC 2+ samples should be retested with another method, preferably FISH, to confirm results (Fig. 1). If FISH is used to determine HER2 status, amplification indicates eligibility for Herceptin therapy.
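The decision logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a clinical tool: the function name and return strings are hypothetical, and two branches are not stated in the text and are assumed here, namely that IHC 0 or 1+ indicates ineligibility and that a non-amplified FISH result indicates ineligibility.

```python
from typing import Optional


def herceptin_eligibility(ihc_score: Optional[str] = None,
                          fish_amplified: Optional[bool] = None) -> str:
    """Apply the IHC-first testing algorithm described in the text."""
    if ihc_score == "3+":
        return "eligible"
    if ihc_score in ("0", "1+"):
        return "not eligible"         # assumption: not stated in the text
    if ihc_score == "2+" or ihc_score is None:
        if fish_amplified is None:
            if ihc_score is None:
                raise ValueError("no test result supplied")
            return "retest by FISH"
        # Non-amplified FISH is assumed to mean not eligible.
        return "eligible" if fish_amplified else "not eligible"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")
```

For example, an IHC 2+ sample with no FISH result returns "retest by FISH", and the same sample with confirmed amplification returns "eligible".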

FIGURE 1 HER2 testing algorithm.

SUMMARY

The consistent take-home message from national testing guidelines is the need to standardize HER2 testing procedures and to validate tests against a reference to improve accuracy. Protocols should be defined, reproducible, and strictly adhered to. There should be mandatory controls, and accuracy and reliability checks should be part of routine clinical practice. These procedures plus continued education and training are essential to accurately identify patients with HER2-positive breast cancer.

We recommend IHC as the screening test of choice for HER2 status, with the caveat that pathologists and laboratory personnel should receive continuous education in all aspects of this assay. It is clear that to reach a satisfactory level of standardization and accuracy, laboratories must do a reasonable number of assays a year to ensure quality of results. If a commercial IHC kit is used then every detail of the manufacturer's protocol should be scrupulously followed, particularly with antigen retrieval. If an in-house IHC assay is used, it should be validated against another technology (FISH or PCR) and controlled frequently. Ambiguous cases should be retested by another method, preferably FISH.