Prognostic and predictive markers

Antibody-defined markers in breast cancer can be employed in two different ways: as prognostic markers (those that can independently forecast clinical outcome) and as predictive markers (those that can independently predict response to a particular therapy).

Part 1: Estrogen Receptors

Estrogen and progesterone receptors are weak prognostic markers of outcome1 and strong predictive markers of response to endocrine (for example, tamoxifen-based) therapy,2 and are the only immunohistochemistry (IHC)-based breast markers to have received the imprimatur of a consensus committee of the College of American Pathologists.3 Estrogen receptor (ER) expression has long been considered to be present in two-thirds of breast cancers,2 but more recent studies suggest that its incidence may be closer to 70%.4 ER status is strongly influenced by tumor grade and histology;5 as demonstrated by Nadji et al4 in a study of almost 6000 tumors, virtually all grade I tumors are ER positive, as are pure tubular, colloid, and classic lobular carcinomas.

Analysis of progesterone receptor (PR) expression is generally reported along with ER expression, and IHC determination of PR expression has now been clinically validated.6 It has further been conclusively demonstrated that PR status is independently associated with disease-free and overall survival; that is, patients with ER-positive/PR-positive tumors have a better prognosis than patients with ER-positive/PR-negative tumors, who in turn have a better prognosis than patients with ER-negative/PR-negative tumors.7

As with all IHC studies of therapeutic targets, accurate and perhaps quantitative assessment of the results is critical. Several major factors can dramatically affect the apparent ER and PR status of a breast cancer as determined by IHC, including tissue fixation, choice of anti-ER or anti-PR antibody, and determination of thresholds for reporting positive results. As documented by Rhodes et al,8 there is wide variation in the reporting of ER and PR status, and all of these factors contribute to it.

Choice of specimen and fixation

The first question to be addressed is the type of specimen on which hormone receptor studies are best performed, for example, needle core biopsies or resection specimens. Whereas the published literature suggests a concordance rate of 60–100% in breast samples of individual patients when needle core and resection specimens are compared, more recent studies employing the most up-to-date methodologies in larger numbers of cases show near 100% concordance between needle core and resection specimens.9 However, Mann and colleagues have demonstrated that as many as 9% of women may have false-negative ER IHC studies if their resection specimen, rather than their needle core biopsy, is utilized. These false-negative studies may be a consequence of inadequate fixation. Indeed, Goldstein et al10 have demonstrated that at least 6–8 h of formalin fixation time for breast biopsies is required to obtain reliable ER determination by IHC; pathologists therefore risk false-negative ER studies when tissues are under-fixed, as can happen with a specimen that arrives at the end of the day and is put immediately into a tissue processor. Given the lack of control some laboratories have over the duration of fixation, and given the lability of the ER antigen, it is still advisable to look for ‘built-in’ positive controls in the form of non-neoplastic breast epithelium when identifying an ER-negative case.

Choice of antibody

Although a number of anti-ER antibodies are available, the ideal antibody is one that is both robust and has been clinically validated; to date, there are only three such clones, 1D5,11 6F11,12 and SP1,13 all of which have been demonstrated to produce results that correlate with clinical outcome; all have also been demonstrated to be equal or superior to ligand-binding assays in this respect.11, 12, 13, 14 Published data further suggest that the SP1 rabbit monoclonal may be the most robust of these reagents and better than the 1D5 clone at identifying those patients most likely to respond to tamoxifen.13 Whereas earlier studies had suggested that the ER−/PR+ group of tumors corresponds to about 10% of all cases,15 more recent studies using more robust antibodies suggest that this group is probably composed largely of false-negative ER studies; with optimal immunohistochemical methods, the number of tumors in this subset is near zero, or zero.4

Threshold for positivity

The use of clinically validated, reproducible, and standardized cutoffs for scoring positive results is critical. While a wide range of arbitrary cutoffs (eg 5 or 10% of tumor cells) is employed by different laboratories, only one cutoff for both ER and PR immunostaining has been clinically validated as predicting response to tamoxifen-based therapy. In the landmark study of Harvey et al,12 a nine-point, semiquantitative ‘Allred’ score (ranging from 0–8) was applied to a series of almost 2000 patients and results were correlated with response to adjuvant endocrine therapy. Although there was a strong direct association between the level of ER expression, that is, the Allred score, and response to hormonal therapy, statistical analysis revealed that, by calibrating the definition of ER positivity to those with Allred scores greater than 2, the largest number of patients could be identified who benefited from adjuvant endocrine therapy. An Allred score of 3 or more (ie the definition of ER positivity) corresponds to as few as 1% of cells showing weak immunostaining signal. More recent studies have demonstrated the identical cutoff (ie 1% of cells with weak signal) for IHC analysis of PR.6 Ideally, therefore, the laboratory should employ an ER antibody using a cutoff score for positivity that has been clinically validated.
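The cutoff described above can be illustrated with a short sketch. The text states only that the total score ranges from 0 to 8 and that scores greater than 2 are positive; the proportion bins and the 0–3 intensity scale used below are those of the commonly published Allred scheme and are not spelled out in this article, and the exact handling of values that fall on a bin boundary is an assumption. The function names are hypothetical.

```python
def proportion_score(percent_positive: float) -> int:
    """Allred proportion score (0-5) from the percentage of stained tumor cells.

    Bins follow the commonly published scheme (none, <1%, 1-10%, 10-33%,
    33-66%, >66%); treatment of exact boundary values is an assumption.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive == 0:
        return 0
    if percent_positive < 1:
        return 1
    if percent_positive <= 10:
        return 2
    if percent_positive <= 100 / 3:
        return 3
    if percent_positive <= 200 / 3:
        return 4
    return 5


def allred_score(percent_positive: float, intensity: int) -> int:
    """Total score: proportion score plus intensity score (0 none to 3 strong)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    proportion = proportion_score(percent_positive)
    # Absent staining means both components are 0; any staining has intensity >= 1.
    if (proportion == 0) != (intensity == 0):
        raise ValueError("proportion and intensity must both be 0 or both be nonzero")
    return proportion + intensity


def is_receptor_positive(percent_positive: float, intensity: int) -> bool:
    """Positive when the total score is greater than 2, the cutoff cited in the text."""
    return allred_score(percent_positive, intensity) > 2
```

With these bins, 1% of cells staining weakly gives a proportion score of 2 and an intensity score of 1, a total of 3, which is the minimum positive result described in the text.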

The value of further quantification of ER and PR at this time is uncertain. It has been shown that it is feasible to quantify ER and/or PR signals using different proprietary instruments16, 17, 18 or even relatively simple microcomputer-based image analysis techniques.19, 20 Recent published studies showing a dichotomized, bimodal distribution of ER expression using the 1D5 monoclonal antibody have called into question the necessity of quantification, suggesting that ER is almost always either completely positive or completely negative.4, 21 Inasmuch as not all ER-positive patients respond to endocrine therapy, and because ER appears, for example, by ligand-binding assays to be a continuous rather than binary parameter in breast, it is not yet clear that one should discard the notion of quantification of ER and PR analysis by IHC.22 Dichotomization can result in loss of information (Figure 1). In the future, other techniques might prove more efficacious than IHC in quantifying ER,23 but if the question is one of whether a patient should be treated with endocrine therapy, a dichotomized result from IHC studies may be the most appropriate.24

Figure 1

Dichotomizing a continuous variable can result in loss of information. The grayscale image (a, top) has been ‘dichotomized’ by increasing the gamma so that only black and white remain (b, bottom). Has the same loss of information happened with ERs, in which a continuous variable has been dichotomized with IHC to be all positive or all negative?

The role of hormone receptor studies in ductal carcinoma in situ (DCIS) has also been demonstrated. NSABP Protocol B-24 showed that patients with hormone receptor-positive DCIS are likely to have a significant reduction in the risk of subsequent invasive disease when given anti-estrogen therapy, compared with patients with hormone receptor-negative DCIS.25

Part 2: Human Epidermal Receptor Protein-2

The human epidermal receptor protein-2 (c-erbB-2; HER2) oncogene protein is a transmembrane glycoprotein in the epidermal growth factor receptor family. It is expressed at low levels in a variety of normal epithelia, including breast duct epithelium, but amplification of the HER2 gene and concomitant protein overexpression are present in 10–20% of primary breast cancers (Figure 2). Determination of HER2 status in breast cancer is important because HER2 is both a prognostic and a predictive marker. HER2 overexpression and/or gene amplification is an independent prognostic marker of clinical outcome, in both node-negative and node-positive patients.26, 27, 28, 29 The major utility of HER2, however, is as a predictive marker. HER2 status has been shown to predict sensitivity to anthracycline-based chemotherapy regimens.30, 31, 32, 33 In addition, amplification of the HER2 gene and/or overexpression of the HER2 protein confers relative resistance to cytoxan-based regimens34 and to tamoxifen-based therapies in the setting of ER-positive breast cancers.35 Perhaps most importantly, breast cancers with HER2 alterations are targets for treatment with trastuzumab, a humanized monoclonal antibody, which has been shown to markedly improve response rate and survival when added to chemotherapy or used as a monotherapy.36, 37, 38 Recent studies have demonstrated that adjuvant trastuzumab can reduce the risk of recurrence by one half, and mortality by one third, in early-stage breast cancer patients.39, 40 Other agents targeting the HER2 gene product have also demonstrated clinical utility,41 and several more are in development. Trastuzumab is one of the first successful therapies to have been custom-designed to target a tumor-associated molecule.

Figure 2

Cartoon showing relationship of HER2 DNA (orange dots), mRNA (green arrows), and protein levels (red peripheral band) in normal breast epithelium (left) compared with HER2-positive breast cancer (right). Note that the vast majority of HER2-positive tumors show parallel marked increases of DNA, mRNA, and protein, but HER2 protein is present at low levels in normal breast epithelium.

HER2 testing has become an essential part of the clinical evaluation of all breast cancer patients in the United States, and accurate HER2 results are critical in identifying patients for whom this targeted therapy is appropriate. This is particularly important given the cardiotoxic side effects of trastuzumab, seen in approximately 1.4% of patients receiving the drug as a single agent36, 42, 43 and in even higher percentages of patients receiving trastuzumab concomitantly with paclitaxel (13%) or anthracyclines (27%),37 as well as the high cost of the drug.44, 45

HER2 IHC poses even greater challenges than does ER IHC, as both accurate and semi-quantitative assessment of the results of HER2 immunostaining is critical. Many of the same factors critical to accurate ER immunostaining apply to HER2 immunostaining, such as tissue fixation, choice of antibody, and determination of thresholds for reporting positive results.

Although a tight association between HER2 gene amplification and protein overexpression has been documented in breast cancers by Western and Northern blot analyses,26 Press et al46 have demonstrated that IHC on deparaffinized, formalin-fixed tissue can be quite variable in its ability to identify HER2-amplified tumors. The high level of discordance between HER2 protein expression by IHC and HER2 gene amplification by fluorescence in situ hybridization (FISH) has been documented in several studies. Discordance rates may be as high as 20% when HER2 testing is performed in low-volume, local laboratories, whereas discordance is believed to be lower in high-volume, central laboratories.47, 48 More recent studies continue to document significant levels of discordance between results of HER2 studies performed at local and central laboratories, for example, 18% for IHC and 12% for FISH,49 and a 21.8% false-positive rate and 8.9% false-negative rate for HER2 IHC (vs by FISH) at local laboratories.50

Addressing this issue of HER2 test accuracy, the American Society of Clinical Oncology (ASCO) and the College of American Pathologists (CAP) have recently released new guidelines for laboratory testing of HER2 status in breast cancer.51, 52 HER2 IHC scoring is reported as negative (0/1+), equivocal (2+), or positive (3+) (Figure 3). Among other things, these guidelines require validation of HER2 testing by all laboratories performing HER2 testing, which entails documenting 95% concordance rates between cases that are IHC 3+ and FISH-amplified, and between cases that are IHC 0/1+ and non-FISH-amplified. HER2 FISH is reported as amplified (HER2/CEP17 ratio >2.2), equivocal (ratio 1.8–2.2), or negative (ratio <1.8).
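The reporting categories just described amount to a small lookup and a pair of ratio thresholds, which the following sketch makes explicit. The category names and cut points are those given in the text; the function names are hypothetical, and the rule that an equivocal IHC result is referred to a second method is taken from the Figure 3 legend.

```python
IHC_CATEGORIES = {
    "0": "negative",
    "1+": "negative",
    "2+": "equivocal",
    "3+": "positive",
}


def classify_her2_ihc(score: str) -> str:
    """Map an IHC score (0, 1+, 2+, 3+) to its reporting category."""
    try:
        return IHC_CATEGORIES[score]
    except KeyError:
        raise ValueError(f"unrecognized IHC score: {score!r}") from None


def classify_her2_fish(ratio: float) -> str:
    """Map a HER2/CEP17 ratio to amplified (>2.2), equivocal (1.8-2.2), or negative (<1.8)."""
    if ratio < 0:
        raise ValueError("ratio cannot be negative")
    if ratio > 2.2:
        return "amplified"
    if ratio >= 1.8:
        return "equivocal"
    return "negative"


def needs_second_method(ihc_score: str) -> bool:
    """An equivocal (2+) IHC result is referred for testing by another modality such as FISH."""
    return classify_her2_ihc(ihc_score) == "equivocal"
```

For instance, a 2+ tumor with a HER2/CEP17 ratio of 2.5 would be equivocal by IHC and amplified by FISH.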

Figure 3

Examples of HER2 immunostaining: (a) 0 (negative); (b) 2+ (equivocal); (c) 3+ (positive). According to the ASCO–CAP guidelines, if the laboratory's assays have been properly validated, and assuming proper fixation, no further testing would be required for (a) or (c). The equivocal test (b) would require further testing by another modality, for example, FISH (original magnification ×200).

A number of factors appear to improve concordance levels between HER2 assessment by IHC and FISH. Image analysis has been demonstrated to reduce interobserver variability among pathologists evaluating HER2 IHC, and also to produce better concordance with HER2 FISH.53, 54 My laboratory has previously demonstrated the value of an ongoing quality assurance program, entailing parallel testing by IHC on all FISH cases, which significantly improves concordance between the two methods.55

A significant decrease in false-positive (IHC3+/FISH−) results can also be obtained through a modification of the Food and Drug Administration (FDA)-approved scoring system for HER2 IHC, in which a normalized IHC score is obtained for the breast cancer.56 This score is obtained by subtracting the score representing the level of immunostaining on the non-neoplastic breast epithelium from the score representing the level of immunostaining on the tumor. Our initial studies demonstrated this to be the case in tissues fixed in alcoholic formalin, but more recent studies have demonstrated that this ‘normalization’ technique can yield very high concordance between IHC and FISH results; indeed, concordance rates approaching those recommended by the ASCO–CAP panel cannot be obtained without using such a technique.57
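The subtraction itself is simple arithmetic, sketched below. Only the subtraction is taken from the text; flooring the result at zero when the normal epithelium stains more strongly than the tumor is an assumption made for this illustration, the function name is hypothetical, and how a normalized score maps onto a final reporting category is not specified here and is left out.

```python
IHC_LEVELS = {"0": 0, "1+": 1, "2+": 2, "3+": 3}


def normalized_her2_score(tumor_score: str, normal_epithelium_score: str) -> int:
    """Tumor IHC score minus the score of adjacent non-neoplastic breast epithelium.

    Assumption: negative differences are reported as 0.
    """
    for score in (tumor_score, normal_epithelium_score):
        if score not in IHC_LEVELS:
            raise ValueError(f"unrecognized IHC score: {score!r}")
    difference = IHC_LEVELS[tumor_score] - IHC_LEVELS[normal_epithelium_score]
    return max(difference, 0)
```

A tumor scored 3+ alongside normal epithelium scored 1+ would thus receive a normalized score of 2, whereas the same tumor alongside unstained normal epithelium would remain at 3.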

The accuracy of diagnostic assays for HER2 in breast cancer is extremely important, as HER2 status is not only a prognostic marker but also predictive of response to chemotherapy, particularly to HER2-targeted therapy such as trastuzumab.36, 37, 38 The diagnostic tests most widely used are IHC and FISH, measuring protein overexpression and gene amplification, respectively. There is a wide reported variation in both the accuracy of, and concordance between, these two methods. In general, documented concordance rates have fallen well below the 95% threshold mandated by the new ASCO–CAP guidelines, with many studies demonstrating concordance rates (excluding 2+ cases) closer to 80–90%.58, 59, 60, 61, 62, 63 The wide range of reported concordance rates between IHC and FISH assessment of HER2 status in breast cancer reflects, at least in part, the wide variation in methodology, instrumentation, and experience of the laboratories performing the testing.

The sensitivity and accuracy of HER2 testing by IHC is highly dependent upon both preanalytical factors, such as tissue fixation,64 and analytic factors, such as choice of anti-HER2 antibody employed in the IHC assay.46 Although the introduction of HercepTest™, an FDA-approved kit for IHC testing, was intended to introduce a high level of accuracy and reproducibility to HER2 IHC testing, in fact HercepTest has been demonstrated in several studies to produce significant numbers of false positives (ie cases demonstrated to be non-amplified by FISH).56, 61, 65, 66 Furthermore, the accuracy of HercepTest in identifying HER2 status in deparaffinized sections of a series of 117 well-characterized breast cancers was 88.9%.67

Although attaining near-perfect correlation between assessment of HER2 status by IHC and FISH is a laudable goal, discordance between these two measurements may be a function of both biology and laboratory error. For example, Pauletti et al68 have demonstrated that at least 3% of breast cancers show protein overexpression in the absence of concomitant gene amplification, implying that such cancers manifest high levels of protein expression through a mechanism other than gene amplification. Several investigators have shown that polysomy of chromosome 17 can account for a small subset of breast cancers showing 3+ levels of HER2 immunostaining but no amplification by FISH when the HER2/chromosome 17 ratio is evaluated.69, 70, 71

The new ASCO–CAP guidelines mandate significant changes in HER2 testing in laboratories throughout the United States. As technical handling of tissue continues to be a significant factor in standardization of test quality, the new guidelines mandate fixation in 10% neutral-buffered formalin for a minimum of 6 h and a maximum of 48 h. Extremely high concordance between IHC and FISH assessment of HER2 status in breast cancer is achievable, but many laboratories may need to revise their tissue fixation, IHC methodology, and scoring methods to reach the level of concordance mandated by the ASCO–CAP guidelines.

Choice of antibody

Although there is a wide range of antibodies available to the HER2 protein, the ability of these antibodies to detect overexpression is extremely variable.46 As of the time of writing this article, the United States FDA had approved two different antibodies for IHC assessment of HER2 expression: the HercepTest, which is based on the Dako A0485 polyclonal antibody, and the CB11 monoclonal antibody, available only in kit form for use on the Ventana autostainer. These two antibodies are also available outside of their kit formats, but without the imprimatur of FDA approval.

Scoring positive cases

Although the FDA-approved scoring system of 0, 1+, 2+, and 3+ (see table) was tailored for the HercepTest kit, and despite the fact that it applies only to the determination of eligibility for trastuzumab therapy, it has become the de facto scoring system for most antibodies and test formats. However, use of the FDA-approved scoring system does not automatically ensure accurate assessment of HER2 status, as was demonstrated in our study56 documenting the improved accuracy of HER2 IHC studies obtained with a subtraction scoring system in which the signal score of the non-neoplastic breast epithelium is subtracted from that of the tumor. Non-neoplastic breast epithelium expresses low levels of the HER2 gene product that may produce a 1+ or even 2+ signal, depending upon tissue fixation and processing parameters, and if this is ignored a significant number of false-positive breast cancer HER2 scores will result.56

FDA Scoring System for HER2

IHC validation

One important new requirement of the ASCO–CAP guidelines is validation of IHC by all laboratories. To be validated, an IHC assay must disagree with a ‘validated’ assay in no more than 5% of the samples it classifies as either positive or negative. If a laboratory cannot meet this standard, it should not be performing HER2 IHC and should send specimens to another laboratory with a validated assay. Cases scored as equivocal (ie 2+ by IHC) are not expected to be concordant but should be tested by another method (eg FISH).
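The validation arithmetic can be sketched as follows. The 5% limit and the exclusion of equivocal IHC cases are taken from the text; the data layout (pairs of laboratory and reference categories), the function names, and the choice to count any mismatch with the reference result as a disagreement are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Each pair is (laboratory IHC category, reference assay category),
# with categories "positive", "negative", or "equivocal".
ResultPair = Tuple[str, str]


def disagreement_rate(results: Iterable[ResultPair]) -> float:
    """Fraction of non-equivocal laboratory results that disagree with the reference."""
    evaluable = [(lab, ref) for lab, ref in results if lab != "equivocal"]
    if not evaluable:
        raise ValueError("no positive or negative laboratory results to evaluate")
    disagreements = sum(1 for lab, ref in evaluable if lab != ref)
    return disagreements / len(evaluable)


def meets_validation_standard(results: Iterable[ResultPair],
                              max_disagreement: float = 0.05) -> bool:
    """True when disagreement does not exceed the 5% limit cited in the text."""
    return disagreement_rate(results) <= max_disagreement
```

In a hypothetical validation set of 25 evaluable cases, a single discordant case (4%) would meet the standard, while 2 discordant cases out of 20 (10%) would not.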

Exclusion criteria

The ASCO–CAP guidelines specify certain preanalytical factors that can result in rejection of the specimen for IHC evaluation of HER2 status. These include tissues fixed in fixatives other than neutral-buffered formalin, needle core biopsies fixed in formalin for less than 1 h, excisional biopsies fixed in formalin for less than 6 h, and any specimen fixed in formalin longer than 48 h. However, these are not ‘absolute’ exclusion criteria, and it is possible to accept such specimens if the assay can be validated under those conditions. In contrast, an exclusion criterion that is ‘fatal’ is the presence of severe artifact (eg crush or edge effect).
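As a rough illustration, these criteria can be expressed as a specimen check that separates the ‘fatal’ criterion from those that can be overridden by validation. The thresholds are those listed above; the `Specimen` record, its field names, the specimen-type labels, and the decision to apply fixation-time limits only to formalin-fixed tissue are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List

ACCEPTED_FIXATIVE = "neutral-buffered formalin"
MIN_FIXATION_HOURS = {"needle_core": 1.0, "excision": 6.0}
MAX_FIXATION_HOURS = 48.0


@dataclass
class Specimen:
    fixative: str
    specimen_type: str        # "needle_core" or "excision" (hypothetical labels)
    fixation_hours: float
    severe_artifact: bool = False


def relative_exclusions(specimen: Specimen) -> List[str]:
    """Criteria that may be waived if the assay is validated under those conditions."""
    if specimen.specimen_type not in MIN_FIXATION_HOURS:
        raise ValueError(f"unknown specimen type: {specimen.specimen_type!r}")
    if specimen.fixative != ACCEPTED_FIXATIVE:
        return ["fixative other than neutral-buffered formalin"]
    reasons = []
    if specimen.fixation_hours < MIN_FIXATION_HOURS[specimen.specimen_type]:
        reasons.append("fixation time below minimum for specimen type")
    if specimen.fixation_hours > MAX_FIXATION_HOURS:
        reasons.append("fixation time longer than 48 h")
    return reasons


def is_rejected(specimen: Specimen, validated_for_conditions: bool = False) -> bool:
    """Severe artifact always rejects; other findings reject unless validated."""
    if specimen.severe_artifact:
        return True
    return bool(relative_exclusions(specimen)) and not validated_for_conditions
```

Under this sketch, an excisional biopsy fixed for 4 h would be rejected unless the laboratory had validated its assay for such short fixation, whereas a crushed specimen would be rejected regardless.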