Main

For locally advanced breast cancers, neoadjuvant chemotherapy is often administered before surgical intervention. This method for treating cancer originally had two goals, improving the feasibility of breast-conserving surgery and enhancing overall survival.1, 2, 3, 4 Randomized trials have confirmed the success of achieving breast-conserving surgery.5, 6, 7, 8 Unfortunately, improvement in overall survival resulting from neoadjuvant chemotherapy has remained elusive,9, 10 although some studies demonstrate short-term survival benefit.11, 12

Patients who achieve a pathological complete response (pCR) to neoadjuvant chemotherapy have improved disease-free and overall survival13, 14 compared with patients who do not. Here, pCR is defined as the absence of residual invasive carcinoma15 with or without residual carcinoma in situ.16 Large multicenter trials have demonstrated pCR in only 15–20% of patients treated with standard neoadjuvant chemotherapy.1 This proportion increases to 26% when a regimen combining adriamycin (doxorubicin) and cyclophosphamide (cytoxan) followed by a taxane is used.17 Despite the usefulness of this regimen, its adverse effects include nausea, vomiting, low white blood cell counts, cardiotoxicity, fever, diarrhea, and peripheral neuropathy.18, 19

Biomarkers that predict response to therapy may eliminate unnecessary treatments or could help adapt treatment regimens to induce a better response. One predictor of response is molecular subtype. HER2-positive and triple-negative breast cancers are more likely to show pCR after neoadjuvant chemotherapy, compared with luminal subtypes.17, 20 Clinicopathologic features, including age, tumor size, grade, and ER status, have been combined into a nomogram to predict the likelihood of response. Attempts have also been made to define gene expression signatures that predict therapeutic response.21, 22, 23, 24, 25 Unfortunately, most of these models do not outperform clinical nomograms.26 Nonetheless, novel molecular biomarkers may provide an alternative approach to establishing determinants of chemoresistance.

Ki-67, a marker of cell proliferation, is expressed in all phases of the cell cycle except G0.27 This protein is localized to the nucleus28 and its expression is often quantified as the percentage of positive nuclei. This is usually a semiquantitative estimate made by pathologists, who count as many as 1000 nuclei to determine the proportion. The threshold for differentiating high and low Ki-67 reported in the literature varies widely, from 1% to 28.6%.29 Another potential source of variability in this measurement is the field of view (FOV) selected by the pathologist for determining Ki-67.30 It is controversial whether the pathologist should choose hotspot areas or simply count all areas and average. Likewise, no consensus has been achieved regarding whether a relationship exists between Ki-67 levels before neoadjuvant chemotherapy and improved response.31, 32 Some studies have determined that high expression of Ki-67 before therapy is significantly predictive of pCR,33, 34, 35, 36, 37 but others have concluded that Ki-67 was not an independent predictor of pCR.38, 39, 40 This variability in predictive value may be due to the absence of standardized, objective, and controlled methods for measurement.

The aim of the present study is to objectively measure Ki-67 on a cohort of biopsies obtained before neoadjuvant chemotherapy. We used an immunofluorescence-based quantitative approach, which enabled objective analysis of every FOV. We compare different objective scoring methods, including measurements defined by maximum score to simulate hotspot scoring and measurements defined by the average of all fields, to determine which is the most predictive for response to neoadjuvant chemotherapy.

MATERIALS AND METHODS

Patient Cohort

A cohort of 105 consecutive patients with invasive breast cancer who received neoadjuvant therapy was included, provided that pre-surgical biopsies were obtainable. Tissue was collected from the archives of the Department of Pathology at Yale University (New Haven, CT). All patients were diagnosed between 2002 and 2010, and patient age at diagnosis ranged from 28 to 76 years, with a median age at diagnosis of 49 years. All patients received neoadjuvant therapy, with 68.5% receiving a regimen containing four cycles of doxorubicin, administered with cytoxan, followed by four cycles of taxane. An additional 12.4% received trastuzumab. The distribution of treatment regimens, nuclear grade, tumor size, hormone receptor status, and HER2 status is described in Table 1. The tissue collected was used after written patient consent. The study was performed according to the Yale University Institutional Review Boards protocol number 9500008219.

Table 1 Patient characteristics overall and by pCR group

Whole Tissue Section Assay

Biopsy specimens were placed on slides as whole tissue sections with one to three cores on a slide. The Mib-1 mouse monoclonal antibody to Ki-6741 was used (Dako, Carpinteria, CA). This antibody had been previously validated by our research group.42, 43 Slides were deparaffinized by heating overnight at 60 °C and soaking in xylene, and were rehydrated in ethanol (twice in 100% ethanol for 1 min, twice in 95% ethanol for 1 min, once in 85% ethanol, and once in 75% ethanol). Antigen retrieval was performed in a PT module (LabVision, Fremont, CA) with citrate buffer (pH 6) at 97 °C for 20 min. Endogenous peroxidase activity was blocked with hydrogen peroxide in methanol at room temperature for 30 min. Non-specific antigens were blocked with incubation in 0.3% bovine serum albumin in Tris-buffered saline/Tween for 30 min. Slides were then incubated in a cocktail of the Ki-67 mouse monoclonal antibody (1:100 dilution) and a rabbit monoclonal cytokeratin antibody (1:100 dilution, Dako) for 1 h at room temperature. Next, slides were incubated in a cocktail of Alexa Fluor 546-conjugated goat anti-rabbit secondary antibody (Life Technologies/Invitrogen, Carlsbad, CA) diluted 1:100 in mouse EnVision reagent (Dako) for 1 h at room temperature. The EnVision reagent contains a mouse secondary antibody conjugated to many molecules of horseradish peroxidase (HRP). Slides were then incubated in Cyanine 5 (Cy5)-tyramide (PerkinElmer, Boston, MA) at 1:50 dilution for 10 min; the tyramide is activated by HRP, which amplifies the signal. Finally, Prolong Gold containing 4′,6-diamidino-2-phenylindole (DAPI) was used to identify nuclei. All staining was performed on a LabVision 720 Autostainer (Thermo Scientific).

Quantitative Immunofluorescence Using AQUA

Automated quantitative analysis (AQUA) is a method used to objectively and accurately measure protein expression within the tumor and subcellular compartments, as described previously.44, 45 Briefly, a set of monochromatic, high-resolution images was captured using a PM-2000 image workstation (HistoRx, Branford, CT). For each FOV, three images were captured at wavelengths matching the DAPI, Alexa 546, and Cy5 fluorophores to analyze nuclei, cytokeratin, and Ki-67, respectively. These images were analyzed by the AQUA software. A tumor mask was created to define the tumor area by dichotomizing the cytokeratin signal so that each pixel was ‘on’ or ‘off.’ A nuclear compartment and a Ki-67 compartment were similarly created from the DAPI signal and Cy5 signal, respectively, with the Ki-67 compartment entirely contained within the nuclear compartment.

An AQUA score for each FOV was then calculated as the signal intensity of Ki-67 in the nuclear compartment divided by the area of the nuclear compartment, and this score was normalized for exposure time, bit depth, and lamp hours. Percentage of positive nuclei was determined by dividing the area of the Ki-67 compartment by the area of the nuclear compartment and multiplying by 100. Whole tissue sections were included for further analysis when at least five FOVs were captured, in which the tumor area represented at least 2% of the total area of the tissue specimen. For each whole tissue section, the mean and maximum values among all FOVs were calculated for both AQUA score and percentage of positive nuclei.
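The per-FOV calculations and the per-section summary described above can be sketched as follows. This is a minimal illustration in plain Python, not the AQUA software itself: the `FieldOfView` record and its field names are hypothetical, the normalization for exposure time, bit depth, and lamp hours is omitted because its formula is not given here, and applying the 2% tumor-area criterion per FOV is an assumed reading of the inclusion rule.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FieldOfView:
    """Hypothetical per-FOV summary; the field names are illustrative only."""
    ki67_intensity_in_nuclei: float  # summed Cy5 signal inside the nuclear compartment
    nuclear_area: float              # pixel count of the DAPI-defined nuclear compartment
    ki67_area: float                 # pixel count of the Ki-67 compartment (within nuclei)
    tumor_fraction: float            # tumor-mask area / total area, between 0 and 1


def aqua_score(fov):
    """Ki-67 signal in the nuclear compartment per unit nuclear area (unnormalized)."""
    return fov.ki67_intensity_in_nuclei / fov.nuclear_area


def percent_positive(fov):
    """Area of the Ki-67 compartment as a percentage of the nuclear compartment."""
    return 100.0 * fov.ki67_area / fov.nuclear_area


def summarize_section(fovs, min_fovs=5, min_tumor_fraction=0.02):
    """Return average and maximum scores for one section, or None if it is excluded.

    Assumption: a FOV counts toward the minimum of five only when its tumor
    fraction is at least 2% and it contains a non-empty nuclear compartment.
    """
    usable = [f for f in fovs
              if f.tumor_fraction >= min_tumor_fraction and f.nuclear_area > 0]
    if len(usable) < min_fovs:
        return None
    scores = [aqua_score(f) for f in usable]
    percents = [percent_positive(f) for f in usable]
    return {
        "n_fovs": len(usable),
        "aqua_average": mean(scores),    # the 'average score'
        "aqua_maximum": max(scores),     # the 'maximum score' (hotspot surrogate)
        "pct_average": mean(percents),
        "pct_maximum": max(percents),
    }
```

Keeping the average and the maximum side by side in one summary makes it straightforward to compare the two scoring strategies on the same set of fields.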

Statistical Analysis

Pearson’s correlation coefficient (R) was used to assess correlation between quantitative measurements of Ki-67 expression. Fisher’s exact test and χ² testing were used to determine the significance of clinicopathologic factors (grade, hormone receptor status, HER2 status, and treatment) in predicting pCR. Logistic regression was used for univariate and multivariate analyses. All statistical analysis was performed using Statview software (SAS Institute, Cary, NC).
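To illustrate the categorical testing named above, the following is a minimal sketch of a two-sided Fisher's exact test for a 2×2 table in plain Python. It is not the Statview implementation used in the study; the function name is hypothetical, and the two-sided rule shown (summing all tables no more probable than the observed one) is one common convention that is assumed here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, factor present / absent and columns
    pCR / no pCR. Margins are held fixed and hypergeometric probabilities
    are summed over every table that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                  if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)
```

With made-up counts, `fisher_exact_two_sided(3, 1, 1, 3)` evaluates a table in which three of four patients with the factor and one of four without it responded.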

RESULTS

Factors Correlating with pCR

Initially, 105 slides were analyzed for Ki-67; however, only 94 had enough FOVs to be considered for further analysis. The number of FOVs analyzed ranged from 5 to 115, with a mean of 34.6 and median of 30. In this cohort, pCR correlated significantly with lack of nodal metastasis (P=0.0002), ER-negative status (P=0.011), and PgR-negative status (P=0.015). Among the 94 patients analyzed, pCR showed a trend toward association with HER2-positive status (P=0.094), and this association was significant when all 105 patients in the cohort were considered (P=0.024). There was no association between response and age at diagnosis, tumor size, nuclear grade, or treatment regimen (Table 1).

Ki-67 expression was objectively measured in terms of percent positive nuclei and AQUA score for each FOV. Percentage of positive nuclei measures the frequency of expression above the threshold of detection in each nucleus, whereas the AQUA score takes both frequency and intensity of expression into account. The two scoring methods were highly concordant, where a high AQUA score was nearly always represented by a high percentage of positive nuclei (Figure 1a). Some fields contained a low percentage of positive nuclei with high intensity (Figure 1b), whereas others displayed several positive nuclei with low intensity (Figure 1c). Analysis of each FOV provided an approach to quantification of intratumoral heterogeneity. Most biopsy sections contained regions with higher Ki-67 expression, known as ‘hotspots,’ for both scoring methods. Heat maps were created to visually depict this intratumoral heterogeneity (Figure 1d).

Figure 1

Representative Ki-67 staining and demonstration of tumor heterogeneity. Selected images are taken from different fields of view from the same biopsy. In these images, Ki-67 staining is in red color and cytokeratin is in green color. These include a field with intense Ki-67 staining with a high percentage of positive nuclei (a), a field with few positive nuclei but intense Ki-67 staining (b), and a field with a high percentage of positive nuclei but less intense staining (c). A heat map of automated quantitative analysis (AQUA) scores obtained for all fields from this biopsy specimen demonstrates heterogeneity among AQUA scores (d).

An AQUA score and the percentage of positive nuclei were determined for a tumor in two different ways. The mean score of all of the FOVs is henceforth labeled the ‘average score,’ and the score of the FOV with the highest score is the ‘maximum score.’ For Ki-67, both a high average and a high maximum AQUA score were significantly associated with pCR (Figures 2a and b). The same trend was observed when Ki-67 was measured using the percentage of positive nuclei (Figures 2c and d).

Figure 2

Comparison of methods for determining Ki-67 positivity. (a) Distribution of automated quantitative analysis (AQUA) scores calculated by averaging all fields within a single biopsy. (b) Distribution of AQUA scores from the field with the maximum AQUA score from each biopsy. (c) Distribution of percentage of nuclei positive for Ki-67 determined by calculating average percentage positive nuclei for each field within a biopsy. (d) Distribution of percent positive nuclei by determining the field with the maximum percentage of nuclei positive for Ki-67. (e) Regression between AQUA scores averaged for all fields and percent positive nuclei averaged for all fields. (f) Regression between maximum AQUA score for each biopsy and maximum percent positive nuclei for each biopsy.

Comparison between Scoring Methods for Ki-67

AQUA score and percent positive nuclei were positively correlated with an R² value approaching 0.9 (Figures 2e and f). This held true for both average score and maximum score, and in both instances the AQUA score could be approximated by multiplying the percent positive nuclei by 279. Regardless of whether AQUA score or percent positive nuclei were used, the relationship between high Ki-67 score and pCR was significant. The strength of this association was similar between these two scoring methods (Table 2).
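The comparison of the two scoring methods rests on a correlation coefficient and a proportionality factor. The sketch below shows how both could be computed in plain Python. The function names are hypothetical, and fitting a line through the origin is an assumption made for illustration; the text does not state how the factor of 279 was derived.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slope_through_origin(xs, ys):
    """Least-squares slope k for the model y = k * x (no intercept, assumed)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Made-up values for illustration only; not data from the study.
percent_positive = [2.0, 5.0, 11.0, 18.0, 30.0]
aqua_scores = [600.0, 1350.0, 3100.0, 5000.0, 8300.0]

r = pearson_r(percent_positive, aqua_scores)
r_squared = r ** 2
k = slope_through_origin(percent_positive, aqua_scores)
```

Here `r_squared` plays the role of the reported R² value and `k` the role of the multiplier relating percent positive nuclei to AQUA score.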

Table 2 Univariate logistic regression analysis of the prediction of pCR using different methods of Ki-67 scoring

Average score and maximum score were also compared as methods to evaluate Ki-67 expression as a predictive factor. Maximum score was more predictive of pCR than average score (OR: 3.546 vs 2.948 for AQUA scoring and 3.509 vs 2.712 for percent positive nuclei), although this difference was not statistically significant. High Ki-67 expression significantly correlated with pCR for both average and maximum score (Table 2). Despite the larger effect seen with maximum score, average score trended toward a better combination of sensitivity and specificity with an AUC of 0.769, compared with 0.732 when the maximum FOV was used (Figure 3).
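For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen responder has a higher Ki-67 score than a randomly chosen non-responder. The following plain-Python sketch computes it by direct pairwise comparison; the function name and the example values are hypothetical, and this is not the software used in the study.

```python
def roc_auc(scores, responded):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores    -- continuous Ki-67 scores, one per patient
    responded -- truthy for pCR, falsy for non-pCR, in the same order
    Ties between a responder and a non-responder count as half a win.
    """
    positives = [s for s, r in zip(scores, responded) if r]
    negatives = [s for s, r in zip(scores, responded) if not r]
    if not positives or not negatives:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: two responders and two non-responders.
example_auc = roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

Because this form uses the continuous score directly, no cut point between high and low Ki-67 is needed to compare the average and maximum scoring strategies.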

Figure 3

Receiver operating characteristic (ROC) curves for automated quantitative analysis (AQUA) of Ki-67. Sensitivity and specificity were assessed for (a) AQUA scores averaged across all fields of view and (b) AQUA scores obtained by analyzing the maximum field of view.

Average and maximum AQUA score were also analyzed by multivariable logistic regression analysis in a model that included tumor size, nuclear grade, nodal status, ER positivity, and HER2 positivity (Table 3). In this analysis, high Ki-67 expression was an independent predictor of pCR for both average score (P=0.0025) and maximum score (P=0.0239).
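As a rough illustration of what a multivariable logistic model estimates, the sketch below fits one by plain gradient descent and converts a coefficient to an odds ratio. It is a teaching example under stated assumptions, not the Statview procedure used in the study: the function names, learning rate, number of iterations, and the toy data (a hypothetical Ki-67 score scaled to 0–1 and a hypothetical binary covariate) are all invented for illustration.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, outcomes, learning_rate=0.1, epochs=5000):
    """Fit log-odds(pCR) = w0 + w1*x1 + ... by batch gradient descent.

    rows     -- list of covariate lists, one per patient
    outcomes -- list of 0/1 responses (1 = pCR)
    Returns [w0, w1, ...]; exp(wj) is the odds ratio per unit of covariate j,
    adjusted for the other covariates in the model.
    """
    n, k = len(rows), len(rows[0])
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        grads = [0.0] * (k + 1)
        for x, y in zip(rows, outcomes):
            linear = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            error = sigmoid(linear) - y
            grads[0] += error
            for j, v in enumerate(x):
                grads[j + 1] += error * v
        weights = [w - learning_rate * g / n for w, g in zip(weights, grads)]
    return weights


# Made-up data: [hypothetical scaled Ki-67 score, hypothetical binary covariate].
rows = [[0.1, 0], [0.2, 1], [0.4, 0], [0.5, 1],
        [0.7, 0], [0.8, 1], [0.9, 0], [0.3, 1]]
outcomes = [0, 0, 1, 0, 1, 1, 1, 0]

weights = fit_logistic(rows, outcomes)
ki67_odds_ratio = exp(weights[1])  # greater than 1 when higher scores favor pCR
```

A covariate is described as an independent predictor when its coefficient remains significantly different from zero after the other covariates are included; the significance test itself is not shown in this sketch.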

Table 3 Multivariate logistic regression analysis for using the maximum Ki-67 AQUA score as a predictive biomarker for pCR

Ki-67 Expression Stratified by Subtype

Maximum AQUA score was also used to investigate whether Ki-67 was more predictive of response to therapy among subpopulations of patients. When only ER-positive patients were considered, increased Ki-67 AQUA score was associated with pCR (Figure 4a), although this relationship did not quite reach significance (P=0.054). For ER-negative and HER2-positive patients, there was insufficient statistical power to establish a relationship between Ki-67 AQUA score and pCR (Figures 4b and c, respectively). However, HER2-negative patients demonstrated a significant association between Ki-67 and pCR (P=0.001; Figure 4d).

Figure 4

Distribution of Ki-67 automated quantitative analysis (AQUA) scores stratified by molecular subtype. AQUA scores for Ki-67 stratified based on (a) patients with ER-positive breast cancer and (b) patients with ER-negative breast cancer. AQUA scores for Ki-67 stratified based on patients with (c) HER2-positive breast cancer and (d) HER2-negative breast cancer.

DISCUSSION

High expression of Ki-67 is an independent predictor of pCR in this cohort of 94 evaluable breast cancer patients (of 105 treated with neoadjuvant chemotherapy). This association was significant in both univariate analysis and multivariate analysis with age, node status, hormone receptor status, and HER2 status taken into account. We examined four different approaches for the objective analysis of Ki-67, and the results were only marginally different when compared with each other and with outcome.

The AQUA method of quantitative immunofluorescence offers an objective way to analyze Ki-67 expression. This method is beneficial, because it is efficient and easily reproducible. Moreover, it avoids the inherent subjectivity of a pathologist’s judgment. Automated scoring of Ki-67 by other methods has previously reported similar results to visual assessment and has been shown to minimize interobserver variability.46, 47 Analysis of Ki-67 by AQUA also eliminates the necessity of determining a cut point for percent positive nuclei. No consensus has been achieved regarding the ideal cut point between high and low Ki-67 staining in breast cancer, although levels between 10% and 20% are most common.48 Arguably, the best approach is to avoid a cut point altogether, as continuous data provide maximal information content.

No significant difference was discerned between analyzing Ki-67 expression by AQUA or percent positive nuclei. The AQUA score considers not only the frequency of expression but also the intensity of expression, whereas determination of percent positive nuclei only represents the frequency. Despite differences in methodology, Ki-67 expression scores determined by the two methods were positively correlated, further minimizing the difference between these approaches in predicting outcome.

The choice of FOVs used for Ki-67 analysis is a longstanding controversy. Although some pathologists favor averaging the Ki-67 score over the entire sampled specimen,30, 47 others have suggested that the regions with the highest level of Ki-67 (the hotspots) should be selected for scoring.49, 50 The current recommendation by the International Ki-67 in Breast Cancer Working Group for determining Ki-67 expression is to count at least 500 invasive cancer cells and to include areas of high Ki-67 expression.30 The AQUA automated method allows analysis of the entire biopsy and thus the comparison between the average and the hotspot (maximum score) methods of data analysis. In this study, there was no significant difference between maximum score and averaging scores from all of the FOVs in terms of predicting pCR, although both methods determined this correlation to be significant. Conversely, the average score trended toward better sensitivity and specificity for determining whether a patient will respond to therapy, although the difference between the average and maximum score was once again not significant.

Although limited by the small cohort size, stratification of the cohort showed some interesting trends. The relationship between high Ki-67 expression and pCR was observed among HER2-negative patients but not HER2-positive patients, although there were fewer HER2-positive patients and thus less statistical power. As most of the HER2-positive patients in this cohort received trastuzumab therapy, Ki-67 expression may not correlate with response to this targeted therapy. Interestingly, Ki-67 expression trended toward being more predictive of pCR in ER-positive patients than in ER-negative patients. This result could be an artifact due to the higher number of ER-positive patients in the cohort and therefore higher statistical power. Another explanation could be the infrequency of high Ki-67 expression among ER-positive patients compared with ER-negative patients. Therefore, ER-positive patients with high Ki-67 expression may stand out compared with the rest of the subpopulation as candidates for anthracycline-based neoadjuvant treatment. In another study, high Ki-67 was indicative of improved relapse-free survival among exclusively ER-positive patients treated with adjuvant chemotherapy and endocrine therapy.51 As this cohort is not sufficiently powered for subtype analysis, all of these findings should be further validated in a larger study to draw any conclusions within subtypes of breast cancer.

In conclusion, automated quantitative analysis of Ki-67 determined that high expression of Ki-67 was predictive of pCR in univariate and multivariate analysis. This method is most importantly objective, and therefore avoids some of the pitfalls of subjective analysis of percent positive nuclei, including interobserver variability and determination of a cut point between high and low expression. This method also enables efficient analysis of the entire biopsy, removing the subjectivity of hotspot selection. Averaging all FOVs also trended toward a marginally more sensitive and specific assay than considering only the maximum FOV. Future larger studies will be needed to confirm these observations.