Comparison of visual and automated assessment of Ki-67 proliferative activity and their impact on outcome in primary operable invasive ductal breast cancer

Background: Immunohistochemistry of Ki-67 protein is widely used to assess tumour proliferation, and is an established prognostic factor in breast cancer. There is interest in automating the assessment of Ki-67 labelling index (LI) with possible benefits in handling increased workload, with improved accuracy and precision. Patients and methods: Visual and automated assessment of Ki-67 LI and survival were examined in patients with primary operable invasive ductal breast cancer. Tissue microarrays (n=379 patients) immunostained for Ki-67 were scored visually and automatically with the Slidepath Tissue IA system. Results: Visual and automated Ki-67 LI were in excellent agreement (ICCC=0.96, P<0.001). On univariate analysis, visual (P<0.001) and automated Ki67 LI (P<0.05) were associated with cancer-specific survival in patients with invasive ductal breast cancer overall and in patients who received endocrine therapy (Tamoxifen) (P<0.01 for visual and P<0.05 for automated scoring). Conclusion: Automated assessment of Ki-67 LI would appear to be comparable to visual Ki-67 LI. However, automated Ki-67 LI assessment was inferior in predicting cancer survival in patients with breast cancer, including patients who received Tamoxifen.

Breast cancer accounts for 22% of all female cancers (Parkin et al, 2001). More than 42 000 women in the UK are diagnosed with breast cancer each year and approximately 80% survive at least 5 years (Cancerstats, 2008).
Tumour progression is influenced by tumour cell proliferation, which can be estimated by measuring the expression of the nuclear antigen Ki-67. Ki-67 expression is tightly linked to the cell cycle (Scott et al, 1991; McCormick et al, 1993), and the antigen does not appear to be expressed during DNA repair. Ki-67 has been used to identify good and poor prognostic categories in invasive breast cancer (Fitzgibbons et al, 2000). Several recent studies have reported an association between higher Ki-67 proliferative activity and poorer recurrence-free (Goldhirsch et al, 2007; Viale et al, 2008; Jung et al, 2009) and cancer-specific survival (de Azambuja et al, 2007; Al Murri et al, 2008; Yerushalmi et al, 2010). Ki-67 proliferative activity has also been reported to be associated with the clinical response to chemotherapy (Goldhirsch et al, 2007; Viale et al, 2008; Dowsett et al, 2009; Jones et al, 2010).
Nuclear Ki-67 is usually estimated as the percentage of tumour cells positively stained by immunohistochemistry. Compared with other markers of proliferation, Ki-67 proliferative activity is accurate, consistent, and easy and economical to determine, which makes it an ideal diagnostic tool (Urruticoechea et al, 2005). Recently introduced image analysis techniques offer the potential for automated assessment and possibly increased precision, but this may prove difficult in heterogeneous tissues like breast carcinomas (Urruticoechea et al, 2005).
The aim of the present study was to assess whether automated scoring of Ki-67 proliferative activity was as accurate as visual scoring in terms of both precision and prognostic ability in primary operable invasive ductal breast cancer.

PATIENTS AND METHODS
Patients presenting with invasive breast cancer at Royal Infirmary, Western Infirmary or Stobhill Hospital, Glasgow, between 1995 and 1998 were studied (n = 379). Available clinico-pathological data included age, histological tumour type, grade, tumour size, lymph node status, oestrogen (ER) and progesterone (PR) status, type of surgery, and use of adjuvant treatment (chemotherapy, hormonal therapy and/or radiotherapy). Tumour proliferative activity was determined as Ki-67 labelling index (LI) in these patients.
Institutional Review Board approval for the use of human tissue in this study was given by the Research Ethics Committee of the North Glasgow University Hospitals NHS Trust.

Methods
Tissue microarray (TMA) construction
TMAs were used in the present study. In brief, a tumour-rich area of each specimen was identified and marked by a qualified pathologist (EM), and TMAs were constructed in triplicate, using 0.6 mm² cores, to account for intra-tumour disease heterogeneity (Tovey et al, 2006).
Immunohistochemistry
Ki-67 immunohistochemistry was performed by established protocols in the Department of Pathology, Glasgow Royal Infirmary, with appropriate positive and negative controls. Dako anti-Ki-67 (monoclonal mouse anti-human, Ki-67 antigen, clone MIB1, code M7240, DAKO, Glostrup, Denmark) was used at a dilution of 1:100 for 30 min for immunohistochemistry on a Bond Max automated slide stainer (Leica Microsystems, Wetzlar, Germany) according to the manufacturer's instructions, with Leica Envision detection system. Slides were lightly counterstained with haematoxylin, dehydrated and mounted with DPX.
Slide scanning and scoring
Stained slides were scanned using a Hamamatsu NanoZoomer (Hertfordshire, UK). Visualisation and automated cell counts were carried out using the Slidepath Tissue IA system version 3.0 (SlidePath, Dublin, Ireland), and visual counting of the percentage of positive cells was performed on a computer monitor.
Assessment of tumour proliferative activity (Ki-67)
The number of Ki-67-positive cells was counted both visually and automatically (Canna et al, 2008), and the percentage of positive invasive carcinoma cells, as a percentage of total tumour cells, was calculated in all three cores. As each tumour had triplicate cores, the mean count for each carcinoma was taken as a final score. A total of 65 cores were counted independently by two observers (BE and ZM) blinded to patient outcome and the other observer's score, giving an interclass correlation coefficient (ICCC) of 0.94, indicating excellent agreement. ZM subsequently scored all slides. The accuracy of scoring depends on individual cores containing a satisfactory sample of tumour cells, which was checked by a qualified pathologist (JJG).
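The scoring rule described above (percentage of positive tumour cells per core, averaged over the triplicate cores) can be sketched as follows. This is a minimal illustration only: the function names and the cell counts are hypothetical and are not taken from the study data or from the Slidepath software.

```python
from statistics import mean


def labelling_index(positive_cells: int, total_tumour_cells: int) -> float:
    """Ki-67 LI for one core: positive tumour cells as a percentage of all tumour cells."""
    if total_tumour_cells <= 0:
        raise ValueError("core contains no countable tumour cells")
    return 100.0 * positive_cells / total_tumour_cells


def tumour_score(cores) -> float:
    """Final score for one tumour: mean LI over its cores."""
    return mean(labelling_index(positive, total) for positive, total in cores)


# Hypothetical (positive, total) counts for the three cores of one tumour.
example_cores = [(30, 200), (24, 200), (36, 200)]
print(tumour_score(example_cores))
```

Averaging the per-core percentages, rather than pooling all cells into one ratio, gives each core equal weight regardless of how many tumour cells it contains; which of the two the authors intended is an assumption here.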
Image analysis of Ki-67 staining
For the automated determination of Ki-67 LI, digitised slides were accessed through the Slidepath Image Analysis system and evaluated using the program's nuclear scoring algorithm, which quantifies nuclear staining within individual cores and derives a counting score for each target area.
Nuclei stained with polymerised diaminobenzidine and/or haematoxylin are identified and separated by a thresholding and segmentation algorithm. Using the Slidepath software, specific cell populations within a heterogeneous sample can be selected for analysis according to cell nuclear area. The investigator is able to adjust the upper and lower limits of a range of acceptable nuclear areas, such that cells within the specified range are accepted for analysis, whereas those outside the range are rejected. In this way, the relatively large tumour cells can be selected over, for example, much smaller inflammatory cells. However, this system does not work perfectly and inevitably there is some error in the selection process. For example, some visual artefacts may be accepted as tumour nuclei, multiple small nuclei may be mistaken for a single large nucleus when located close together, or tumour nuclei that are obvious to the human observer may be mistakenly rejected. These errors may represent limitations to the utility of automated image analysis.
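The area-based selection described above, and one of its failure modes, can be illustrated with a short sketch. It does not reproduce the Slidepath algorithm: the function name, the area units and every value below are hypothetical.

```python
def select_nuclei_by_area(nuclei, min_area, max_area):
    """Keep only detected objects whose area lies inside [min_area, max_area]."""
    if min_area > max_area:
        raise ValueError("min_area must not exceed max_area")
    return [n for n in nuclei if min_area <= n["area"] <= max_area]


# Hypothetical segmented objects; "label" is the ground truth a human would assign.
detected = [
    {"label": "tumour nucleus", "area": 62.0},
    {"label": "tumour nucleus", "area": 48.5},
    {"label": "lymphocyte", "area": 14.0},
    {"label": "two touching lymphocytes", "area": 30.0},
    {"label": "large tumour nucleus", "area": 130.0},
]

accepted = select_nuclei_by_area(detected, min_area=25.0, max_area=100.0)
print([n["label"] for n in accepted])
```

In this toy example the single lymphocyte is correctly rejected, but the pair of touching lymphocytes is accepted because its combined area falls inside the range, and the unusually large tumour nucleus is rejected. These correspond to the two kinds of selection error noted in the text.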
Nuclear Ki-67 staining is classified as positive or negative based on observer-specified intensity thresholds. Pseudo-colours (red/blue) display these staining intensity measurements for individual nuclei, allowing thresholds to be chosen appropriately (Figure 1A-C).
Intensity thresholds were chosen for a sample of TMA cores from the whole cohort and once set they were used for analysis over the entire patient cohort without adjustment.
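As a rough sketch of this step, the classification amounts to applying one fixed intensity threshold to every selected nucleus in every core. The intensity scale, the threshold value and the per-nucleus measurements below are hypothetical; the actual Slidepath intensity measure is not described in the text.

```python
def ki67_li(nucleus_intensities, positive_threshold):
    """Percentage of nuclei whose staining intensity meets the fixed threshold."""
    if not nucleus_intensities:
        raise ValueError("no nuclei to score")
    positive = sum(1 for value in nucleus_intensities if value >= positive_threshold)
    return 100.0 * positive / len(nucleus_intensities)


# Hypothetical threshold, chosen once on a sample of cores and then left unchanged.
THRESHOLD = 0.35

# Hypothetical per-nucleus staining intensities for two cores.
cores = {
    "core_a": [0.10, 0.50, 0.20, 0.60],
    "core_b": [0.05, 0.40, 0.10, 0.15, 0.30],
}

scores = {name: ki67_li(values, THRESHOLD) for name, values in cores.items()}
print(scores)
```

Keeping the threshold constant across the cohort makes the scoring reproducible, but it also means that any core-to-core variation in staining or counterstaining intensity feeds directly into the labelling index.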
Statistical analysis
Several methods were used to examine the correlation between visual and automated Ki-67 LI in order to make comparisons with published studies easier: ICCC, Spearman's r and Pearson's r. Consistency between the observers was analysed using the κ statistic, with values of 0.40-0.59 considered to represent moderate agreement, 0.60-0.79 good agreement and >0.80 very good agreement (Landis and Koch, 1977). Univariate analysis with calculation of hazard ratios was performed using a Cox proportional hazards model. Deaths up to March 2010 were included in the analysis. Inter-relationships between the methods were assessed using contingency tables with the χ² test for trend as appropriate. Analysis was performed using SPSS software version 18 (SPSS Inc., Chicago, IL, USA).

RESULTS
Clinical and pathological characteristics of patients (n = 379) are shown in Table 1. Most were older than 50 years (69%), had a grade I or II carcinoma (54%) smaller than 2 cm (58%) with no axillary lymph node involvement (55%). A total of 225 patients (59%) had ER-positive tumours and 168 patients (44%) had PR-positive tumours. Three hundred patients (79%) had HER-2 negative tumours. In all, 184 (49%) patients received only endocrine therapy, 77 (20%) received only chemotherapy and 95 (25%) received both.
As there are no generally accepted prognostic thresholds for Ki-67 LI, survival analysis was undertaken by tertiles. Survival curves for visually assessed Ki-67 LI (Figure 2) indicated that the first and second tertiles were prognostically favourable and could be considered 'low', with the prognostically adverse third tertile taken as 'high'. This yielded a cutoff at 15% (<15% low, >15% high). A total of 272 (72%) patients had a carcinoma with a low Ki-67 proliferative activity using this criterion.
Table 2 compares visual and automated determinations of Ki-67 LI. Of the 272 cases with a low visual Ki-67 LI, 39 (14%) cases scored high by the automated method. Of the 107 cases with a high visual Ki-67 LI, 11 (10%) scored low by the automated method. As expected, visual and automated determinations of Ki-67 LI were strongly correlated (r = 0.87, r = 0.94, Figure 3) with good agreement of visual and automated Ki-67 status using the cutoff described (ICCC = 0.96, P<0.001). The κ value of 0.70 also reflected good agreement.
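The reported κ can be checked against the cross-tabulated counts given above. The sketch below assumes that the statistic is unweighted Cohen's κ and that the two concordant cells can be obtained by subtraction (272 - 39 low by both methods, 107 - 11 high by both methods); neither assumption is stated explicitly in the text, and the function name is illustrative.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table.

    Rows are categories from method A, columns are categories from method B.
    """
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Rows: visual low / high. Columns: automated low / high.
table = [
    [272 - 39, 39],
    [11, 107 - 11],
]

kappa = cohen_kappa(table)
print(round(kappa, 2))
```

Under these assumptions the observed agreement is 329/379 (about 0.87) and the chance-expected agreement is about 0.56, giving a κ that rounds to 0.70, consistent with the reported value.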
The minimum follow-up was 142 months; median follow-up of survivors was 165 months. During follow-up, 92 patients had recurred, 15 local, 57 distant and 4 both; 163 patients died, 81 of their cancer. Univariate relationships between survival and Ki-67 proliferative activity determined by visual and automated methods are shown in Table 3. Visual (P<0.01) but not automated (P = 0.557) measurements of Ki-67 status were predictive of recurrence-free survival, while both visual (P<0.001) and automated (P<0.05) measurements were predictive of cancer-specific survival (Figure 4A and B). Visual scoring achieved a higher level of statistical significance (P<0.001) than the automated score (P<0.05).
Cancer recurrence and cancer-specific survival on endocrine therapy (Tamoxifen) were of interest as possible indicators of the ability of different scoring methods to predict the response to Tamoxifen. Univariate survival analysis was therefore undertaken for patients with ER-positive tumours who received treatment with Tamoxifen. Recurrence-free and cancer-specific survival by Ki-67 status determined visually and automatically are shown for these patients in Table 4. Visually determined Ki-67 status predicted recurrence-free (P<0.01) and cancer-specific survival (P<0.001) in patients who received Tamoxifen, whereas the automated method (P<0.01) was only significantly associated with cancer-specific survival in patients who received Tamoxifen (Figure 5A and B).
Figure 2 The relationship between Ki-67 assessed using tertiles by visual counting method and cancer outcome in patients with invasive ductal breast cancer.

DISCUSSION
The results of the present study show that visually assessed Ki-67 proliferative activity was associated with cancer-specific survival in patients with operable ductal breast cancer overall and in patients treated with Tamoxifen. Although Ki-67 proliferative activity assessed by the automated system was in reasonably good agreement with visual assessment, its prognostic value with respect to recurrence-free and cancer-specific survival was not as high as that of visual assessment. These results confirm the clinical value of visually assessed Ki-67 status and suggest that more work is required before automated assessment can be unreservedly recommended for routine clinical laboratory measurement of Ki-67 LI in patients with invasive ductal breast cancer.
Visual methods are used widely in the clinical assessment of Ki-67 proliferative activity. Centralisation of laboratory services and increasing workloads may increase interest in image analysis for the assessment of Ki-67 LI. Image analysis may provide more detailed information and improve quality control (Faratian et al, 2009). Observer interpretations can vary, but a human observer may be better at recognising non-tumour or stromal areas in the sample than an image analysis algorithm (Konsti et al, 2011), which may explain why Ki-67 LI determined automatically was less predictive than visually determined Ki-67 status in the present study. Alternatively, disparities in prognostic effectiveness between visual and automated scoring in this instance may be due to the software errors in tumour cell selection described earlier.
From the literature it is clear that many cutoffs for Ki-67 LI have been used to predict cancer outcome. However, without standardisation of the methodology, such cutoffs have limited clinical value outwith specific centres. Indeed, this problem has been recognised by the International Ki-67 in Breast Cancer Working Group, which was unable to come to a consensus regarding the ideal cut point(s) that might be used in routine clinical practice (Dowsett et al, 2011). Nevertheless, it is of interest that Ki-67 LI cutoffs of between 10 and 20% have mostly been reported to be associated with cancer outcome (Stuart-Harris et al, 2008). In the present study, survival analysis was initially undertaken using tertiles (making no assumption of the correct prognostic cutoff point). Survival curves for these Ki-67 LI tertiles were examined (Figure 2) and indicated that the first and second tertiles had a similarly good outcome and therefore could be considered as 'low risk', and the third tertile had a poor outcome and therefore could be considered as 'high risk'. This analysis yielded a cutoff at 15% staining positivity, reassuringly in the middle of the 10-20% range.
In the present study, the main question was whether the determination of Ki-67 status by visual and automated methodologies could be regarded as equivalent. The main outcome measure was the parity (or lack thereof) of automated staining assessment, compared with visual assessment methods. Agreement between the methods, determined using standard statistical approaches, was good. However, the two methods yielded significantly different prognoses with respect to recurrence-free and cancer-specific survival using Cox regression analysis and receiver operating characteristic analysis.
In the present study, 14% of patients with low visual Ki-67 status were in the high Ki-67 group by automated assessment. Dowsett and co-workers pointed out that negative nuclei determine the overall population for calculating the proportion of Ki-67-positive cells, and that weak counterstaining can therefore result in an overestimation of the Ki-67 index. Thus, it is important to optimise the degree of counterstaining (Dowsett et al, 2011). It is possible that the counterstain used in this study was slightly weak, therefore resulting in an underestimation of negative nuclei and, hence, an overestimation of the Ki-67 index in some cases. Such a scenario might explain some of the observed discrepancies. Moreover, a few discrepant cases were associated with section damage, dye precipitates, imperfect (out-of-focus) scanning or cytoplasmic staining. So, although automated Ki-67 LI has some promise to replace the visual method in the routine clinical pathology laboratory, considerable care will be required to generate reliable clinical measurements.
Other studies have investigated the automated assessment of Ki-67 status in breast cancer (Fasanella et al, 2011; Konsti et al, 2011). Correlations between visual and automated assessment were similar (r = 0.94 for this study, 0.85 for Fasanella et al; κ = 0.70 for this study, 0.57 for Konsti et al), despite the use of different image analysis systems. Fasanella et al (2011) did not examine the relationships between Ki-67 proliferative activity and survival. In a more comparable study, Konsti et al (2011) reported that automated assessment of Ki-67 proliferative activity had prognostic value in 1334 breast cancer patients, but did not examine the survival relationships in the subgroup of patients who received Tamoxifen.
In conclusion, the present study shows good agreement between visual and automated assessment of Ki-67 proliferative activity in invasive breast cancer, and suggests that automated assessment of Ki-67 LI is comparable to visual Ki-67 LI. However, automated Ki-67 LI assessment was inferior in predicting cancer survival in patients with breast cancer, including patients who received Tamoxifen. Visually determined Ki-67 status was better. Therefore, although automated assessment of Ki-67 proliferative activity may have a role in clinical assessment of breast cancer, careful validation remains necessary.