Main

For decades, the value of estrogen receptor (ER) as a prognostic and predictive marker in breast cancer has been an unparalleled example of the impact of biomarker research on patient care.1, 2, 3 Its importance is such that recent discoveries of high error rates in clinical testing for ER, in both Canada and the United States, spurred an immediate reaction toward improved standardization in ER assessment4, 5, 6, 7 resulting in publication of the guidelines for tissue processing and analysis to optimize companion diagnostic testing of ER in breast cancer specimens. As a result, research into pre-analytical variables that may influence biomarker test results has expanded dramatically,8, 9 although somewhat less attention has been paid to analytical variables, specifically, those concerned with methods of ER detection and quantification/measurement.

Before the current immunohistochemistry (IHC)-based standard, ER expression was widely evaluated using the ligand-binding assay (LBA). This test incubated breast tissue lysate with radiolabeled estradiol and resulted in an absolute quantification (fmol/mg) of the ER.3 However, LBAs are limited by the large tissue requirement and their inability to provide contextual information including the capability to distinguish ER expression in benign vs malignant cells.10 Upon development of specific monoclonal antibodies,11, 12 the practical ease and cost effectiveness of IHC led to rapid implementation of a new clinical standard for in situ assessment of protein expression after demonstration of their prognostic and predictive value.13, 14 However, the advantages of this in situ detection method of ER were confounded by the introduction of the human eye as a measurement tool resulting in significant reader variability.15, 16

Over the past few decades, many platforms have endeavored to eliminate this intra- and inter-observer variability and achieve consistent evaluation of diagnostic specimens. Systems such as the CAS-200 (ref. 10) and ChromaVision's ACIS17 function on a principle of color deconvolution; for ER and other nuclear markers, this allows optical density measurements of positive target staining within a nuclear counterstain.18, 19 Recently, technology has allowed the development of more rapid and sophisticated methods of digital image analysis. One such platform, the Aperio ScanScope and Digital Image Analysis Suite, combines both high-resolution image capture and quantitative assessment, and is FDA-approved to assist pathologists in ER, PR, HER2, and Ki67 measurement in breast cancer.20, 21, 22 In spite of the FDA approval, adoption is still limited. A recent CAP survey (2014) shows that fewer than 25% of over 1100 labs surveyed use automated assessment for ER.

Despite these advances, any system relying on chromogenic immunostaining is subject to the inherent limitations of absorbance measurement, such as a low dynamic range and saturation of the signal intensity based on enzymatic visualization of the antibody. The most widely used chromogen is 3,3′-diaminobenzidine (DAB), a highly thermochemically stable polybenzimidazole that provides brown-colored staining.23 The chromogen deposition occurs through a redox reaction catalyzed by an enzyme that allows direct bright-field light microscopy assessment.24, 25, 26, 27 Fluorescent systems of visualization and measurement are not subject to the limitations of high density and saturation. Optical detection and quantification of fluorescent signal depends on excitation and photon emission of specific wavelengths, resulting in signal intensity directly proportional to the concentration of the target of interest.28 The dynamic range of common assays with fluorophores that emit in the visible region of the spectrum is two to three times the dynamic range of chromogenic stains. Multicolor detection by using fluorescent target labeling, which can be spectrally resolved, makes it possible to examine several markers at once.29, 30 Several methods of quantification of fluorescent staining have been described.31 Here we use the AQUA technology as it does not require feature-based image fractionation, but rather allows detection of biomarker expression within specific subcellular compartments, as defined by antibody-conjugated fluorophore labeling and colocalization of the target of interest with cytoplasmic or nuclear staining.32 The fluorescent intensity is measured and divided by the compartment area to yield a quantitative, continuous, and reproducible score for each field of view. This technology has previously been extensively validated in tissue microarrays (TMAs) as well as whole-tissue sections.33, 34

To assess the problem of user and methodological bias in quantification of ER expression in breast cancer, we chose a three-pronged experimental approach to compare both automated (Aperio) and visual (pathologist) scoring of chromogenic staining, as well as to evaluate both of these techniques against quantitative immunofluorescence (QIF)-based ER detection. Each method of staining and detection was performed with two common clinical ER antibody clones (1D5 and SP1).

Materials and methods

Patient Cohorts and TMA Construction

Two retrospective breast cancer cohorts were constructed consisting of tissue obtained from the Archives of the Pathology Department at Yale University (New Haven, CT) and used to create two representative TMAs, as previously described. Briefly, YTMA 49 consists of 621 patients diagnosed between 1962 and 1982. This cohort is completely annotated with clinicopathological and follow-up information. YTMA 128 contains 235 patients diagnosed between 2003 and 2008. Cohort characteristics are summarized in Supplementary Tables 1 and 2. For both cohorts, 0.6 mm cores were taken from each specimen and combined into randomized TMAs, which were cut into 5-μm sections and adhered to glass slides for immunostaining. An Index TMA consisting of cell lines with known concentration of ER and of patient samples with variable ER expression pattern (described previously in Welsh et al.35) was run alongside each experiment for standardization and reproducibility purposes and to determine the threshold of detection for ER positivity for the different staining and reading methods described here.

Immunostaining with SP1

To visualize ER expression with the rabbit monoclonal SP1 antibody (ThermoScientific, Waltham, MA), slides were baked at 60 °C for 30 min to remove excess paraffin. Deparaffinization was performed in xylenes for two periods of 20 min each, after which slides were transferred to 100% ethanol and rehydrated to water in grades of ethanol. Heat-induced antigen retrieval took place in a PT module (LabVision, Kalamazoo, MI), where slides were immersed in sodium citrate buffer (pH 6) for 20 min at 97 °C. Slides were then rinsed in distilled water, transferred to a solution of 0.75% H2O2 in methanol for 30 min at room temperature to block endogenous peroxidases, and rinsed again in distilled water. They were then transferred to a LabVision autostainer, where the remaining staining steps were performed at room temperature, with rinses in tris-buffered saline/0.05% Tween-20 (TBST) between stages. Nonspecific antigens were blocked for 30 min in 0.3% bovine serum albumin (BSA) diluted in TBST.

For chromogenic visualization, slides were incubated for 1 h with SP1 antibody (1:100) in BSA-TBST, and then with anti-rabbit EnVision (Dako, Carpinteria, CA) for 1 h. Signal was developed for 5 min in DAB solution (Dako; prepared according to the manufacturer's instructions), followed by counterstaining for 1 min with hematoxylin (Tacha’s automated hematoxylin, BioCare Medical, Concord, CA). Slides were removed from the autostainer and coverslipped with Prolong Gold mounting medium (Life Technologies).

For slides to be visualized with fluorescence, a cocktail of SP1 antibody (1:100) and mouse pan-cytokeratin (Dako; 1:100) in 0.3% BSA-TBST was added for 1 h. The slides were then incubated with a secondary antibody cocktail of goat anti-mouse AlexaFluor 546 (Life Technologies) diluted 1:100 in anti-rabbit EnVision (Dako) for 1 h. Signal was amplified with Cy5-tyramide (Perkin Elmer, Waltham, MA) for 10 min, and nuclear staining was accomplished with 10 μg/ml DAPI (Life Technologies) in BSA-TBST for 20 min. Slides were then removed from the autostainer and coverslipped using the Prolong Gold mounting medium (Life Technologies).

Immunostaining with 1D5

For ER visualization with the 1D5 antibody (Dako) and subsequent analysis with Aperio’s FDA-approved nuclear algorithm, slides were stained according to the clinical site protocol for 1D5 as described previously.22

ER 1D5 slides intended for fluorescent visualization were immunostained according to the same protocol as described for ER SP1. Slides were incubated in a primary antibody cocktail containing 1D5 (1:50) and pan-cytokeratin (rabbit polyclonal, Dako) at 1:100 in BSA-TBST for 30 min, followed by a secondary cocktail of goat anti-rabbit AlexaFluor 546 (1:100) in anti-mouse EnVision (Dako) for 30 min, as well as signal amplification with Cy5 and DAPI staining.

Aperio Nuclear Algorithm

For analysis with Aperio’s nuclear algorithms, chromogenic slides were scanned to create bright-field digital images using the ScanScope CS (Aperio, Vista, CA). All digital images were viewed in ImageScope and analysis performed in Spectrum, elements of the Aperio image review and analysis suite. Slide images were first segmented to obtain a single image for each TMA spot, after which the pen tool was used to circle (‘annotate’) tumor areas for each spot. This was refined by use of a negative pen tool to subtract stromal areas enclosed by tumor to ensure that analysis would be restricted to tumor only.

For ER 1D5 scoring with the FDA-approved nuclear algorithm on YTMA 128, TMA spot images were first annotated to exclude stroma and restrict analysis to tumor areas only. The algorithm was then run on each spot to generate both a markup image (showing scoring for individual nuclei) and a percent-positive nucleus score for each spot.

For ER SP1 scoring, the unlocked nuclear algorithm was modified to take into account a darker counterstain and improve color deconvolution, but was otherwise not altered from the settings of the FDA-approved nuclear algorithm. The nuclear algorithm input includes a section for red, green, and blue absorbance (OD) values for the hematoxylin counterstain in order to facilitate deconvolution from the nuclear stain, which has its own set of OD values. ImageScope’s Image Quality feature was used to measure the RGB OD values within negative control spots. These were then averaged for the slide, substituted for the defaults, and the resultant algorithm saved and used to generate ER scores as percent-positive nuclei in annotated spot images. The counterstain RGB values were determined separately for each slide stained with SP1 to account for subtle variations in hematoxylin counterstaining between slides.
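The per-slide counterstain calibration described above can be illustrated with a minimal sketch. It assumes the standard absorbance relation OD = -log10(I / I0) for 8-bit channel intensities with a white reference of 255; the exact computation performed by ImageScope is not specified here, and the function names and RGB readings below are hypothetical.

```python
import math

def optical_density(channel_value, white=255.0):
    """Convert an 8-bit channel intensity to optical density, OD = -log10(I / I0)."""
    # Clamp to 1 so that a pure-black pixel does not produce log(0).
    clamped = max(float(channel_value), 1.0)
    return -math.log10(clamped / white)

def mean_counterstain_od(negative_control_rgb):
    """Average the per-channel OD over several negative-control measurements."""
    n = len(negative_control_rgb)
    sums = [0.0, 0.0, 0.0]
    for rgb in negative_control_rgb:
        for i, value in enumerate(rgb):
            sums[i] += optical_density(value)
    return tuple(s / n for s in sums)

# Hypothetical mean RGB readings from three negative-control spots on one slide.
spots = [(120, 110, 180), (125, 112, 185), (118, 108, 178)]
od_r, od_g, od_b = mean_counterstain_od(spots)
```

The three averaged values would then stand in for the default counterstain OD inputs of the algorithm for that slide.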

Pathologist Scoring

YTMA 49 and YTMA 128 slides with ER staining visualized by DAB were submitted to three board-certified pathologists (Path1, Path2, and Path3), who estimated percent-positive nuclei using the digital images acquired by Aperio’s ScanScope CS. TMA spots denoted by a pathologist to contain no invasive breast cancer were excluded from further analysis in all three ER assessment methods, as were spots with diffuse cytoplasmic staining instead of specific nuclear signal.

Automated Quantitative Analysis

Immunofluorescence staining for both SP1 and 1D5 antibodies was quantified using automated quantitative analysis (AQUA) as previously described.32 Briefly, monochromatic images for each of the DAPI, Cy3, and Cy5 channels were captured for each TMA spot using an automated PM-2000 microscope platform (Genoptix/Novartis). The cytokeratin expression (Cy3) was used to binarize pixels to create an epithelial tumor mask. DAPI staining within this tumor mask was used to create a nuclear compartment, in which ER expression (Cy5) was measured as the sum of all pixel intensities, divided by the area of the nuclear compartment. Scores were then individually normalized according to exposure time, bit depth, and lamp hours to allow direct comparison between spots on the same slide.
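The compartment-based scoring principle can be sketched as follows. This is not the AQUA software itself: the fixed thresholds used to binarize the cytokeratin and DAPI channels are assumptions made for illustration, the normalization for exposure time, bit depth, and lamp hours is omitted, and the function name and pixel values are hypothetical.

```python
def nuclear_score(cytokeratin, dapi, target, ck_threshold, dapi_threshold):
    """Sum the target intensity inside the nuclear compartment and divide by its area.

    Each image is a list of rows of pixel intensities; all three share one shape.
    """
    total = 0.0
    area = 0
    for ck_row, dapi_row, target_row in zip(cytokeratin, dapi, target):
        for ck, nuc, sig in zip(ck_row, dapi_row, target_row):
            in_tumor_mask = ck >= ck_threshold                  # binarized cytokeratin
            in_nucleus = in_tumor_mask and nuc >= dapi_threshold  # DAPI within the mask
            if in_nucleus:
                total += sig
                area += 1
    if area == 0:
        return None  # no nuclear compartment found in this spot
    return total / area

# Hypothetical 2 x 3 pixel spot.
ck = [[200, 200, 10], [200, 5, 200]]
dapi = [[150, 20, 150], [160, 170, 10]]
er = [[80, 5, 90], [40, 70, 3]]
score = nuclear_score(ck, dapi, er, ck_threshold=100, dapi_threshold=100)  # 60.0
```

Only two pixels are both cytokeratin-positive and DAPI-positive in this toy spot, so the score is (80 + 40) / 2. Bright ER pixels outside the tumor mask, such as stromal nuclei, do not contribute.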

Statistical Analysis

Regression analysis to assess method and assay reproducibility was performed in Microsoft Excel 2010, and results were confirmed in the StatView software platform (SAS Institute, Cary, NC) by means of Pearson coefficients and ANOVA testing. Kaplan–Meier survival analysis was performed using StatView for each ER scoring method, and statistical significance was assessed using the log-rank test.
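For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate can be sketched in a few lines. This is an illustration only and not the StatView implementation; the log-rank test is omitted, and the function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] for each time with at least one event.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        observed = 0
        removed = 0
        # Group all patients who share this follow-up time.
        while i < len(order) and times[order[i]] == t:
            observed += events[order[i]]
            removed += 1
            i += 1
        if observed > 0:
            survival *= (at_risk - observed) / at_risk
            curve.append((t, survival))
        # Patients with an event and censored patients both leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical follow-up (years) for five patients; 0 marks a censored patient.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

For these five patients the estimated survival drops to 0.8, 0.6, and 0.3 at years 2, 3, and 5; the patient censored at year 8 does not produce a step.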

Results

Fluorescent and Chromogenic Assessment

To evaluate methods of ER visualization and measurement, immunostaining was performed on serial sections of two breast cancer TMA cohorts collected at Yale, as previously described.35 Figure 1 shows examples of low and high ER expression with both chromogenic and fluorescent detection methods on serial sections. Digital images of each slide were then captured for further analysis (Figure 2).

Figure 1

Examples of estrogen receptor staining in breast cancer tissue microarrays by both chromogenic and fluorescent methods. (a) Low and (b) high expression as visualized by 3,3′-diaminobenzidine; corresponding serial sections show (c) low and (d) high expression as seen via conjugation with Cy5-tyramide.

Figure 2

A demonstration of the components of fluorescent and chromogenic quantification as utilized by automated quantitative analysis (AQUA) and Aperio’s nuclear algorithm, respectively. (a) Simultaneous visualization of nuclei (blue, DAPI), pan-cytokeratin (green, AlexaFluor 546), and estrogen receptor (red, Cy5-tyramide) in a single tissue microarray spot. The AQUA program generates a tumor mask compartment from cytokeratin expression, further refines it into a nuclear compartment using DAPI positivity, and measures target signal intensity in the nuclear compartment. (b) Typical chromogenic staining for estrogen receptor in a strongly positive case, as visualized by diaminobenzidine (DAB) and counterstained with hematoxylin. The tumor areas are manually outlined (annotated; green line) by the user to exclude stromal nuclei. Aperio’s nuclear algorithm then uses morphological characteristics and the hematoxylin counterstain to identify nuclei. DAB intensity is then measured on a per-cell basis to determine positivity, and a markup image generated to illustrate results. Nuclei are binned into four categories to mimic pathologist intensity scoring: negative (blue=0), weak positive (yellow=1), positive (orange=2), and strong positive (red=3).

Fluorescent detection slides were scanned at × 20 to collect images from the DAPI, Cy3 (cytokeratin), and Cy5 (ER) channels (Figure 2a). These images were then analyzed with the AQUA software, which created an epithelial tumor mask from cytokeratin expression, and then used DAPI expression within this mask to form a nuclear compartment. ER signal was quantified as the sum of pixel intensities divided by the nuclear compartment area and normalized to generate a Nuclear AQUA Score for each patient.

Chromogenic detection slides were scanned using Aperio’s ScanScope CS digital image acquisition system, and board-certified pathologists scored percent-positive nuclei for each TMA spot using these digital images. The images were then manually annotated by a trained technician to exclude stromal areas, and were analyzed with Aperio’s nuclear algorithm. Nuclei are binned into four categories (negative nuclei or weak, medium, and strong positive nuclei), and a markup image is created to reflect scoring results (Figure 2b). Aperio’s nuclear algorithm quantifies the annotated tissue for percent-positive nuclei as well as staining intensity according to four predefined categories, resulting in a semiquantitative scoring system.
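The binning of nuclei into four intensity categories and the resulting percent-positive score can be sketched as below. The intensity thresholds are placeholders chosen for illustration and are not the settings of Aperio’s nuclear algorithm; the function names and per-nucleus values are likewise hypothetical.

```python
def bin_nucleus(dab_od, thresholds=(0.1, 0.3, 0.6)):
    """Assign a nucleus to category 0 (negative), 1 (weak), 2 (medium), or 3 (strong).

    The default thresholds are illustrative placeholders only.
    """
    weak, medium, strong = thresholds
    if dab_od >= strong:
        return 3
    if dab_od >= medium:
        return 2
    if dab_od >= weak:
        return 1
    return 0

def percent_positive(nuclei_od):
    """Percentage of nuclei in any positive category (1, 2, or 3)."""
    bins = [bin_nucleus(od) for od in nuclei_od]
    positive = sum(1 for b in bins if b > 0)
    return 100.0 * positive / len(bins)

# Hypothetical mean DAB optical densities for five nuclei in one annotated spot.
spot_nuclei = [0.02, 0.05, 0.15, 0.4, 0.7]
pct = percent_positive(spot_nuclei)  # 60.0
```

Because each nucleus is reduced to one of four categories, nuclei with quite different staining intensities can receive the same call, which is why the resulting score is semiquantitative rather than continuous.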

Antibody and User Variability

Our first step was to examine the relationship between ER 1D5 and ER SP1 scoring on YTMA 128 by all three methods of assessment (Figure 3). Whereas all methods show a correlation between the 1D5 and SP1 scores (Figure 3c), the relationship changes as a function of the method. Despite following the clinical site protocol precisely, we observed a titration-independent, light brown haze over the tissue stained with the 1D5 antibody that was not present with SP1. As we wished to omit antibody-specific variables confounding reading and interpretation of the slides, all further analyses were performed using the ER SP1 clone.

Figure 3

A comparison of antibody clones 1D5 and SP1 as they affect manual and automated assessment of estrogen receptor expression on YTMA 128. (a) Pathologist scoring of 1D5 vs pathologist scoring of SP1. (b) Aperio’s FDA-approved nuclear algorithm scoring of 1D5 vs scoring of SP1 by a modified version of Aperio’s nuclear algorithm. (c) Automated quantitative analysis (AQUA) scoring of 1D5 vs SP1 in the nuclear compartment.

To assess operator-based reproducibility, each assay analysis method was completed by two different operators, allowing assessment of the subjective component of each scoring method (Figure 4). The Pearson coefficients (R2) were above 0.9 for all methods; however, both automated scoring methods had higher reproducibility (R2>0.95) between different operators. The regression R2 between pathologists 1 and 2, as assessed by traditional visual scoring methods, was 0.92. The non-continuity of the scores can also be seen in Figure 4a. The regression between the Aperio scores for two users was 0.96, showing better performance than traditional scoring but still suggesting some element of subjectivity. When two different users completed the AQUA scoring, the regression was nearly perfect (0.995), suggesting minimal user variation.
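The inter-operator comparison reduces to a Pearson correlation between paired scores for the same spots. A minimal sketch follows, using hypothetical scores rather than study data; the analyses reported here were run in Excel and StatView, not with this code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical percent-positive scores from two operators for the same four spots.
operator_1 = [0, 10, 50, 90]
operator_2 = [5, 10, 45, 95]
r_squared = pearson_r(operator_1, operator_2) ** 2
```

For a simple linear regression with one predictor, the squared Pearson coefficient equals the regression R2, so the two quantities can be reported interchangeably in this setting.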

Figure 4

Interuser reproducibility for methods used to quantify estrogen receptor expression. (a) Pathologist scoring and (b) Aperio nuclear algorithm assessment of ER positivity were reported as percent-positive nuclei (chromogenic visualization), and (c) automated quantitative analysis (AQUA) quantification as the Nuclear AQUA Score (fluorescent visualization).

Assessment Methods Comparison

We then examined variability between methods using a linear regression analysis for continuous data (Figure 5). Although the pathologist data are not truly continuous, the estimations of percentage of positive nuclei were assumed to be continuous for the purposes of this analysis. The regression between either pathologist’s percent-positive nucleus scores and the score from Aperio’s nuclear algorithm showed a nonlinear relationship where the pathologist scores were consistently higher than those generated by the Aperio nuclear algorithm (Figure 5a). There were essentially no cases where the pathologist's estimate was below the Aperio score. A similar pattern was seen with AQUA scores. Although AQUA measures pixel intensity of the target of interest (ER in this study) as opposed to percent positivity, it has a similar relationship when compared with pathologist scoring (Figure 5b). The closest relationship between any two methods is clearly between the two types of automated scoring, despite the different detection techniques (Figure 5c). However, comparing the two automated scoring methods reveals the lower dynamic range and enzymatic saturation of the DAB signal as compared with fluorescent measurement.

Figure 5

Relationships between methods used to assess estrogen receptor. (a) Aperio’s nuclear algorithm vs pathologist scoring; (b) automated quantitative analysis (AQUA) vs pathologist scoring; and (c) AQUA vs Aperio’s nuclear algorithm.

Survival Analysis and Discordance

Although regressions help us examine the similarities and differences in ER quantification methods, they do not provide any case-specific information on patient classification into the ER-negative or ER-positive groups. Furthermore, comparison of tests is more valuable when the test comparison can be assessed as a function of patient outcome. To see how the three assessment methods compared on this basis, we looked at their determination of ER status for patients on YTMA 49, a large, historic cohort collected at Yale between 1962 and 1982. The 10-year disease-free survival Kaplan–Meier curves are very similar between all three methods (Figure 6); however, their differences can be seen in the summary tables (Tables 1 and 2). When the continuous scores are binarized to generate positive or negative output, only 19 of 233 total cases were discordant. There was only one case that was positive by pathologist and Aperio scoring, but negative by AQUA. In contrast, there were 10 cases that were positive by pathologist and AQUA, but negative by Aperio (Table 1). There were three cases that were positive by pathologist, and negative by the AQUA and Aperio methods; and finally, five cases were positive by AQUA, and negative by pathologist and Aperio scoring. The number of discordant cases is too small to evaluate which method better correlates with outcome.
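The binarization and discordance tally can be sketched as follows. The 1% cutoff for pathologist and Aperio scores and the AQUA threshold of 110 are the values reported for this cohort; treating a score exactly at the cutoff as positive is an assumption, and the function names and four example cases are hypothetical.

```python
from collections import Counter

def er_calls(pathologist_pct, aperio_pct, aqua_score,
             pct_cutoff=1.0, aqua_cutoff=110.0):
    """Return (pathologist, Aperio, AQUA) positivity calls for one case.

    Assumption: a score equal to the cutoff is called positive.
    """
    return (pathologist_pct >= pct_cutoff,
            aperio_pct >= pct_cutoff,
            aqua_score >= aqua_cutoff)

def discordance_summary(cases):
    """Count each call pattern and the number of cases on which the methods disagree."""
    patterns = Counter(er_calls(*case) for case in cases)
    concordant = patterns[(True, True, True)] + patterns[(False, False, False)]
    return patterns, len(cases) - concordant

# Hypothetical cases: (pathologist %, Aperio %, nuclear AQUA score).
cases = [(80, 75, 5000), (0, 0, 20), (5, 0, 300), (10, 2, 107)]
patterns, n_discordant = discordance_summary(cases)  # n_discordant == 2

# Discordance rate for the counts reported above: 19 of 233 cases.
reported_rate = round(100 * 19 / 233, 1)  # 8.2
```

The four discordant groups described above sum to 1 + 10 + 3 + 5 = 19 cases, which corresponds to 8.2% of the 233 evaluable cases.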

Figure 6

Kaplan–Meier survival analysis of breast cancer patients on YTMA 49 with estrogen receptor-negative (blue) and -positive (red) tissue, as measured by (a) pathologist, (b) Aperio’s nuclear algorithm, and (c) automated quantitative analysis (AQUA). The cutoff used for pathologist scoring and Aperio’s nuclear algorithm was 1% positive nuclei, as per the ASCO-CAP guidelines. The ER positivity threshold for AQUA was determined using an Index TMA with positive and negative cell lines stained alongside YTMA 49. The numbers of positive and negative cases in each group are summarized in Table 1.

Table 1 Summary of ER assessment method discordance on YTMA 49
Table 2 Hazard ratios for ER positivity in unselected breast cancer cohort YTMA 49 as diagnosed by different reading methods

These discordant cases were carefully reviewed by an independent pathologist, who was not involved in previous readings, to determine reasons for discordance (images not shown). In the one case positive by the pathologist and Aperio, but negative by AQUA, there was clear nuclear fluorescent staining visible by eye, but the nuclear AQUA score for that case was 107, just barely below the threshold of 110 (in a set of scores that ranged from 0 to 12 500). In contrast, for the five cases positive by AQUA and negative by pathologist and Aperio scoring, low but clearly positive fluorescent nuclear staining can be seen by eye, whereas by chromogenic detection no nuclear staining is detectable. This may be because of masking by the hematoxylin counterstain on these particular spots. Similarly, the 10 cases positive by pathologist and AQUA, but negative by Aperio, have clearly visible nuclear staining on both the fluorescent and chromogenic detection slides, but, for unknown reasons, the hematoxylin counterstain appears somewhat darker than most spots on the slide and was not detected by the Aperio algorithm. Finally, in the three cases that were positive by pathologist scoring and negative by the AQUA and Aperio algorithms, closer pathologist examination was unable to determine whether the cells considered positive contained extremely strong hematoxylin or were, in fact, positive for diaminobenzidine (spots appeared black).

In an effort to test the flexibility and performance of the Aperio nuclear algorithm, we attempted to further adjust the RGB values for the counterstain levels to see whether the algorithm would pick up the 10 false-negative cases. However, we were unable to find a set of values that would satisfy all cases. When settings were changed that allowed the algorithm to recognize these 10 cases as positive, the altered algorithm classified clearly negative nuclei as positive in other cases, or picked up far fewer nuclei than were actually present.

Discussion

The 2010 ASCO-CAP guidelines for ER assessment recommend image analysis to quantify percent-positive tumor cells,5 especially as it is difficult to reliably score to a 1% threshold without laboriously counting individual cells. Aside from assisting pathologists, automated analysis systems such as the Aperio ScanScope XT and its associated algorithms have also been shown to be useful in discovery of more complex relationships between biomarkers.36 Here we show that one method of automated chromogenic assessment shows good reproducibility and prognostic value, but, compared with fluorescence, is limited by the nature of chromogenic staining itself. Chromogenic staining requires a counterstain to provide context; however, this counterstain introduces inherent complications to objective scoring. It is well known that the quality and intensity of hematoxylin counterstaining varies among preparations, vendors and protocols, over the lifetime of the reagent, and also between cell and tissue types. The CAS-200 platform is an example of a system that required adjustments to account for counterstain differences between slides and batches.37 In the clinic, when a patient case has an obvious problem with the counterstain, the slide can be sent back and another stain requested. However, there is still a chance that even ‘acceptable’ counterstaining can mask low-level chromogenic staining, whether by eye or by automated color-deconvolution (or spectral unmixing) analysis, as occurred in five cases in this study.38 Previous unpublished work from our lab suggests that there are a number of cases where dark staining with hematoxylin, due either to tissue variation or pathologist preference, has obscured low-level ER expression to generate a false-negative test.

Fluorescent detection avoids the disadvantages and limitations of the hematoxylin counterstain, but has other limitations. Specifically, the absence of hematoxylin makes it challenging to generate the cellular context with a conventional IHC appearance. Whereas additional fluorophores can be used to visualize other tissue features, the image is still quite different from conventional IHC. QIF is also generally costlier than traditional IHC. Unfortunately, the cost analysis of automated ER evaluation in clinical lab settings is beyond the scope of this manuscript, and this information is not accessible to us. One could imagine though that routine ER assessment might be performed using regular DAB-based IHC as established, and just the cases that are negative by this assay could be sent out to laboratories that offer fluorescence-based assays, taking advantage of increased sensitivity of this assay for low expressing biomarkers. Other advantages of QIF consist of broader dynamic range, dynamic adjustment of exposure time, and decreased requirement for human interface for tumor selection.

Perhaps the greatest advantage of QIF lies in the potential to generate a standard curve that can be used to establish a defined, reproducible cutoff for every assay. This method also has the potential to enable more accurate quantification of biomarker expression.38 Recent studies have demonstrated that quantification by ELISA can provide more accurate assessment of patient outcome than qualitative IHC, and may even demonstrate a distinct benefit between negative, moderate, and strong ER positivity rather than just between positive and negative groups.39 This advantage extends beyond analysis of ER in breast cancer to more accurate quantification of biomarker expression levels in various cancer and tissue types.

Whereas this study of comparison of different methods of ER analysis was performed in a rigorous and tightly controlled manner, it is subject to a number of limitations. Evaluation of ER expression was performed on TMAs, which allows a high-throughput approach, but does not truly represent the clinical setting where biopsies or whole-tissue sections are routinely stained and evaluated for the biomarker in question. One can argue that discordances in ER assessment are because of the small amount of tumor represented in a 0.6-mm TMA core. This might be a valid argument regarding ER heterogeneity, as 0.6 mm might not always represent the ER status of whole-tissue sections. However, the different staining methods were performed on serial sections, reducing heterogeneity between methods to a minimum. In addition, it does not resolve the issue of false-negative reading due to variability in hematoxylin staining intensity. Moreover, the three methods of ER analysis were also compared on a number of whole-tissue sections (around 25 samples for this study). These data were not shown in the manuscript because they did not render additional information. The results of ER analysis on whole-tissue sections using the different methods of assessment did not show any discrepancies, probably because of the low number of cases. Another limitation of this study is that staining and analysis were performed within a single institution. Whereas this approach guarantees consistency for pre-analytical tissue processing and analytical procedures, these results would be more robust if more than one laboratory participated in the study.

In addition, this study does not reveal a significant difference between ER reading methods with regard to survival analysis. However, this observation might be because of the relatively small number of patients included in survival analysis. To determine the best prognostic and predictive value of these tests by Kaplan–Meier analysis, a larger number of patients would need to be analyzed with all three methods.

In summary, each of the methods of in situ protein detection in FFPE tissue samples has its strengths and weaknesses. Whereas conventional DAB-based IHC is a well-established and inexpensive procedure, reproducibility and sensitivity of the scoring are dependent on the counterstain and the reading method—by eye or automated. QIF on the other hand offers an automated and standardized approach to biomarker evaluation. Higher sensitivity of the assay and broader dynamic range facilitate more exact measurements of protein concentrations. Increased costs of QIF and the absence of hematoxylin generating the cellular context with a conventional IHC appearance need to be considered.

In theory, QIF can combine the best of both worlds—in situ evaluation of a biomarker and rigorous quantification. Our data here and in previous work by others and us suggest that patient care may be improved with quantitative assessment. Whereas the percentage of discordant cases in this study (8.2%) is relatively low, and in keeping with expected variability compared with other studies,40 a more objective estimate of ER positivity could benefit hundreds of thousands of women worldwide.