Main

The nuclear proliferation marker Ki67 has potential utility in the clinical management of breast cancer.1, 2, 3, 4, 5 Oncologists frequently consider the Ki67 index when deciding whether to administer chemotherapy. The American Society of Clinical Oncology, however, has refrained from recommending the use of Ki67 in clinical practice, in large part because of the lack of standardization in Ki67 assessment.6 A recent, comprehensive review found that, although Ki67 has clinical validity when scored by central laboratories, analytical standardization is needed.7 Despite such reservations, a recent consensus of international breast cancer experts has recommended using Ki67 assessed by immunohistochemistry to guide clinical decisions regarding the use of adjuvant chemotherapy.8

In 2009, the International Ki67 in Breast Cancer Working Group of the Breast International Group and North American Breast Cancer Group formed to address the problem of inter-laboratory inconsistency in Ki67 assessment. It proposed guidelines for preanalytical and analytical conditions for Ki67 assessment, as well as for interpretation, scoring, and related data handling.1 In 2011–2012, the Working Group carried out an international reproducibility study focusing on how much of the variability among Ki67 scores from different laboratories was contributed by scoring and by staining procedures (‘Phase 1’). Findings from that study, in which eight highly experienced laboratories applied their own best practices to tissue microarray slides, revealed substantial variation in Ki67 scores among laboratories on both centrally and locally stained sections (intraclass correlations of 0.71 and 0.59, respectively).9

To determine whether standardization of Ki67 scoring methodology could result in high concordance in scores among laboratories, the Working Group in 2012–2013 undertook a second study (‘Phase 2’), in which laboratories were asked to calibrate to a specified Ki67 scoring method before scoring centrally stained tissue microarray slides. The findings from this second study are reported here.

Materials and methods

This study was approved by the British Columbia Cancer Agency Clinical Research Ethics Board (protocol H10-03420). All tissue microarray cases represented clinical cases of invasive breast carcinomas diagnosed in British Columbia during 2009–2010, described previously.9

Central staining used the MIB-1 clone mouse monoclonal antibody (Dako, Carpinteria, California). Antigen retrieval used Ventana CC1 (MIB-1 diluted 1:50, 32-min incubation), with UltraMap detection. Counterstaining used Gill 1 hematoxylin (Ventana) with an 8-min staining time.

Web-Based Calibration Exercise

Laboratories calibrated to a prescribed scoring method using a web-based tool. Nine ‘training’ and nine ‘test’ tissue microarray cases representing a wide range of Ki67 scores served as the calibration cases, chosen from one of the 50-core (1 mm) centrally MIB-1-stained breast cancer tissue microarrays used in Phase 1. Images of the cores were uploaded to a website designed by the Genetic Pathology Evaluation Centre at the University of British Columbia.10

The website interface allowed laboratories to score the web-image cores by clicking on individual cells with a computer mouse. The website software tracked which cells were scored, how they were scored (positive or negative), and the overall Ki67 score for each core. Images of hematoxylin and eosin-stained sections were also uploaded for reference.

The specific Ki67 scoring procedure prescribed for this study involved counting 250 cells at the top of the tissue microarray core and 250 cells at the bottom, with any level of brown nuclear staining in invasive cancer cells considered ‘positive’ (Supplementary Information S1). This formal counting method was chosen because it is simple and showed high intra-laboratory reproducibility in Phase 1.9 Both our Phase 1 study and studies by others found counting to give more consistent results than visual estimation.9, 11, 12 During the calibration exercise, refinements were made to the scoring instructions. For example, a ‘sampler’ of images was provided to demonstrate the lightest to darkest staining that should be considered ‘positive’ or ‘negative’. The scorers were also reminded not to score ductal carcinoma in situ, with example images provided (Supplementary Information S1).

Two laboratories (Laboratories E and H), which yielded the highest internal consistency in our Phase 1 study,9 served as reference laboratories for the calibration exercise. The average of their scores for each core served as the reference score. Seventeen non-reference laboratories scored the training and test cases following the instructions described above. A laboratory ‘passed’ the training or testing if it met preset performance criteria (see Statistical Analyses and Supplementary Information S2). If a non-reference laboratory did not ‘pass’ the training, it was asked to learn from its discrepancies with reference scoring by comparing its scored images with the scored reference images on the website, where the scorer could zoom in or out as needed (Figure 1). Laboratories were allowed multiple attempts on the training set and had to ‘pass’ the training to proceed to the test cases. On the test set, only one attempt was allowed, and reference images were not shown.

Figure 1

Example of feedback provided to the scorer in the calibration exercise. Red = scorer assessed nucleus as Ki67 positive; green = scorer assessed nucleus as Ki67 negative.

Scoring-on-Glass Experiment

Sixteen of the 19 laboratories (including the two calibration reference laboratories) participated in the ‘scoring-on-glass’ study after calibrating to the prescribed method. Eight countries were represented.

The laboratories scored 50 prestained (MIB-1, Dako, Carpinteria, California) 1-mm cores on a tissue microarray glass slide following scoring instructions similar to those used in the calibration exercise (Supplementary Information S3). The 50 cases resided on the Phase 1 tissue microarray block not used for selecting the calibration cases.

Three Ki67-stained sections of this tissue microarray (from the same ribbon of serial sections) were mounted on glass slides and distributed among the 16 laboratories. Laboratories were assigned to Groups based on which specific tissue microarray section they scored: Group 1 (7 laboratories), Group 2 (4 laboratories), Group 3 (5 laboratories). All laboratories that had participated in our prior Phase 1 study9 were placed into Group 1.

Laboratories used a web-based counter application while scoring the glass slide under a microscope. This counter captured each laboratory’s Ki67 score per core (including numbers of cells scored positive or negative), time spent, and any comments. Images of hematoxylin and eosin-stained sections from the cores were also made available on the website for reference.

Statistical Analyses

Calculation of Ki67 score

Ki67 score was defined as the percentage of invasive cancer cells positively stained in the examined region. For the statistical analyses, these scores were transformed to a logarithmic scale, by adding 0.1% (so that scores of zero remain defined) and applying a log base 2 transformation, to satisfy model assumptions of normality and constant variance.1 For example, a Ki67 score of 30% would be transformed to log2(30.1)=4.912.
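For illustration, the score definition and transformation amount to a few lines of R (a minimal sketch; the function names are ours, not part of any study software):

    # Ki67 score (% of counted invasive cells positive) and its log2 transform
    ki67_score <- function(n_positive, n_counted) 100 * n_positive / n_counted
    log2_ki67  <- function(score_pct) log2(score_pct + 0.1)  # +0.1% keeps zero scores defined
    log2_ki67(ki67_score(150, 500))  # 150/500 = 30% positive -> log2(30.1) = 4.912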

Web-based calibration exercise

Prespecified criteria for evaluating whether a laboratory passed the calibration ‘training’ and ‘testing’ reflected deviation of a laboratory’s scores from the reference scores (Supplementary Information S2); a computational sketch of both measures follows the list:

  • Root Mean Square Error <0.6 (log2-transformed data): the root mean square error of a laboratory’s scores relative to the reference scores.

  • Maximum Absolute Deviation <1.0 (log2-transformed data): the maximum absolute deviation of a laboratory’s scores from the reference scores.
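Both criteria can be computed in a few lines of R (a minimal sketch; lab and ref are hypothetical vectors holding a laboratory's untransformed scores and the corresponding reference scores for the same cores):

    # Passing criteria, evaluated on log2-transformed scores
    passes_calibration <- function(lab, ref) {
      d     <- log2(lab + 0.1) - log2(ref + 0.1)
      rmse  <- sqrt(mean(d^2))   # Root Mean Square Error
      madev <- max(abs(d))       # Maximum Absolute Deviation
      rmse < 0.6 && madev < 1.0
    }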

The paired t-test was used to compare the performance measures from the training (first attempt) and testing.

Scoring-on-glass experiment

‘Success’ for the scoring-on-glass experiment was prespecified as an observed intraclass correlation statistically significantly >0.70, the approximate intraclass correlation observed in our Phase 1 study,9 in which experts used their own scoring methods (see Supplementary Information S4 for discussion of statistical power). An intraclass correlation ranges from 0 to 1, with 1 representing perfect reproducibility.

Among laboratories scoring the exact same tissue microarray section, the intraclass correlation represents the proportion of the total variability attributable to biological variability among the 50 patient tumor spots represented on the tissue microarray section. This intraclass correlation will be referred to as the same-section intraclass correlation.

When Groups of laboratories scoring different sections of the same tissue microarray are compared, a slightly different intraclass correlation is computed, reflecting the proportion of variability attributable to biological differences between patient tumors. In this situation, the same patient tumor is represented on three different tissue microarray sections, which may introduce additional variability, for example, owing to intratumoral heterogeneity or assay variation. This second type of intraclass correlation will be referred to as the different-section intraclass correlation and is expected to be less than the same-section intraclass correlation.

The same-section and different-section intraclass correlations were calculated from a variance components model that quantified the contributions from different sources of variability (see Supplementary Information S4 for details). The sources of variability (variance components) considered in the model include patient tumor (νPatient), spot from the same patient tumor on different tissue microarray sections (νSpot), scoring laboratory (νLab), and residual error (νError). Residual error includes within-scorer variation and any other variation not explained by the other sources of variability.

Intraclass correlation estimates were computed as

same-section intraclass correlation = (νPatient + νSpot) / (νPatient + νLab + νSpot + νError)

different-section intraclass correlation = νPatient / (νPatient + νLab + νSpot + νError)

where each of νPatient, νLab, νSpot, and νError is replaced by its estimated value obtained by fitting the variance components model. Credible intervals for the variance components and these intraclass correlations were obtained using Markov Chain Monte Carlo routines for fitting generalized linear mixed models (MCMCglmm package in R; see Supplementary Information S4).
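A minimal sketch of this computation in R follows, assuming a data frame d with hypothetical columns log2_ki67, patient, section, and lab, and encoding the spot effect as the patient-by-section interaction (the MCMC settings shown are package defaults, not necessarily those used in the study):

    library(MCMCglmm)  # also loads coda, providing HPDinterval()

    m <- MCMCglmm(log2_ki67 ~ 1,
                  random  = ~ patient + patient:section + lab,
                  data    = d,
                  nitt = 13000, burnin = 3000, thin = 10, verbose = FALSE)

    v        <- m$VCV                  # posterior samples of the variance components
    total    <- rowSums(v)             # patient + spot + lab + residual ('units')
    icc_same <- (v[, "patient"] + v[, "patient:section"]) / total
    icc_diff <- v[, "patient"] / total
    mean(icc_same); HPDinterval(as.mcmc(icc_same))  # point estimate and 95% credible interval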

Secondary analyses examined concordance among laboratories for dichotomized Ki67 scores (‘low’ versus ‘high’) for several cutoffs commonly used in clinical practice or reported in the literature. To obtain a simple statistic for each cutoff, a kappa statistic was computed over all 16 laboratories.13
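One standard multi-rater agreement statistic for such dichotomized data is Fleiss’ kappa; a minimal sketch in R using the irr package follows (scores is a hypothetical 50 × 16 cases-by-laboratories matrix of Ki67 percentages, the 20% cutoff is purely illustrative, and the kappa variant actually used in the study follows its ref. 13):

    library(irr)
    kappam.fleiss(scores > 20)   # agreement on 'high' versus 'low' at a 20% cutoff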

Exploratory analyses compared performance in our Phase 1 study9 with performance in Phase 2.

All data analyses were performed using the R language (version 2.15.2). Sources of experimental variation in log2-transformed Ki67 scores were analyzed using random effects models as implemented in the R packages lme4 and MCMCglmm. Data were visualized using boxplots and spaghetti plots, as sketched below.
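For instance, both displays can be produced in base R from the hypothetical scores matrix introduced above (a sketch, not the study’s actual plotting code):

    boxplot(scores, xlab = "Laboratory", ylab = "Ki67 (%)")   # one box per laboratory
    matplot(t(scores), type = "l", lty = 1, col = "gray",     # spaghetti plot: one line
            xlab = "Laboratory", ylab = "Ki67 (%)")           # per case across laboratories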

Results

Web-Based Calibration Exercise

Twelve of the 17 non-reference laboratories participating in the calibration exercise met the prespecified passing criteria on their one allowed attempt at scoring the calibration test set (Supplementary Table S1). Failures to pass were all due to divergence from reference scoring on the lowest Ki67 case. Table 1 presents performance measure summary statistics for training (first attempt) and testing. Overall performance of the laboratories through the calibration exercise showed encouraging, but non-significant, trends of improvement from first training attempt to testing: mean Root Mean Square Error across laboratories decreased from 0.6 to 0.4 (P=0.07) and mean Maximum Absolute Deviation decreased from 1.6 to 0.9 (P=0.06) (paired t-test).

Table 1 Summary performance statistics across laboratories, calibration exercise

Scoring-on-Glass Experiment

Summary statistics for log2-transformed Ki67 scores across the 16 laboratories participating in the scoring-on-glass experiment are given in Table 2. Figure 2 displays the side-by-side boxplots of untransformed Ki67 scores across laboratories by Group. A similar plot in which laboratories are grouped by whether or not they passed the calibration test is provided in Supplementary Figure S1.

Table 2 Summary statistics of log2-transformed Ki67 scores (percentage of invasive cancer cells scored positive), scoring on glass tissue microarray slides (16 laboratories, 50 cases)
Figure 2

Ki67 scores (percentage of invasive cancer cells scored positive) of all 16 laboratories scoring on glass, by Group (all laboratories within a Group scored the same tissue microarray section). Laboratories are ordered by increasing median Ki67 value within each Group. Each Group represents a given tissue microarray section (boxplot shading: black for Group 1, medium gray for Group 2, and light gray for Group 3). The bottom and top of each box represent the first (Q1) and third (Q3) quartiles, and the bold line inside the box represents the median of the distribution. The two bars outside the box represent the lowest datum still within 1.5 × (Q3−Q1) of Q1 and the highest datum still within 1.5 × (Q3−Q1) of Q3 (ie, within 1.5 × the inter-quartile range). Any data beyond the two bars are outliers, represented by empty circles.

The same-section intraclass correlation estimate was 0.94 (95% credible interval: 0.90–0.97), and the different-section intraclass correlation estimate was 0.92 (95% credible interval: 0.88–0.96). Biological variation among different patients was the largest component of the total variation (νPatient=1.40, 95% credible interval: 0.91–2.12), followed by residual variation (νError=0.05, 95% credible interval: 0.04–0.05), then laboratory-to-laboratory variation (νLab=0.04, 95% credible interval: 0.02–0.09), and finally tissue spot within patient tumor (νSpot=0.02, 95% credible interval: 0.01–0.03). The overall mean level of Ki67 (μ) was 4.49 (log2-transformed data).
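As a consistency check, the reported intraclass correlations can be recovered (to rounding) from these point estimates:

    v_patient <- 1.40; v_spot <- 0.02; v_lab <- 0.04; v_error <- 0.05
    total <- v_patient + v_spot + v_lab + v_error
    (v_patient + v_spot) / total   # 0.94, matching the same-section estimate
    v_patient / total              # 0.93, close to the reported posterior estimate of 0.92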

Both the same-section and different-section intraclass correlations estimated in this study met the prespecified criterion for success, as both 95% credible intervals fell entirely above 0.70.

Figure 3 displays the variation in scores across laboratories for the 50 cases. Highlighted (gray lines) are the 26 cases for which at least one of the 16 laboratories reported a score in the clinically relevant range of 10%≤Ki67≤20%. In only one of those 26 cases did all laboratories report scores confined to that range.

Figure 3

Variability in Ki67 scores (percentage of invasive cancer cells scored positive) across the 16 laboratories for the 50 cases. Each line represents Ki67 scores for the same case. Lighter-colored lines represent data for the 26 cases for which at least one of the 16 laboratories reported a score in the range 10%≤Ki67≤20% (one case per line); the darker lines are the remaining cases. Within each Group, laboratories were looking at the exact same tissue microarray section; between Groups, the laboratories were looking at different sections derived from the same tissue microarray block.

In a secondary analysis, six well-known Ki67 cutoffs were applied to the scoring-on-glass data. Inter-laboratory concordance was found to be substantial for all the cutoffs and was best for the ≤5% cutoff, where observed agreement was nearly perfect (Table 3).

Table 3 Kappa values for commonly used Ki67 cutoffs when applied to our study data (16 laboratories scoring tissue microarray glass slides)

Exploratory comparison of the different-section intraclass correlation of 0.92 (95% credible interval: 0.88–0.96) estimated in this study with the intraclass correlation of 0.71 (95% credible interval: 0.47–0.78) reported for the centrally stained sections in our prior Phase 1 study9 suggests an improvement in reproducibility. Although differences in the two studies’ designs prevented a direct, formal statistical test for improvement in intraclass correlation between the studies, the intraclass correlation observed in Phase 2 was statistically significantly >0.70, with the lower bound of the credible interval substantially above 0.70. Recalculating the Phase 1 intraclass correlation using the same statistical approach used in this study, restricted to the 7 laboratories and 50 cases common to both studies, gave a similar impression: an intraclass correlation of 0.71 (95% credible interval: 0.54–0.86).

Laboratories spent a median of 5.6 min per core to score the requisite 500 cells (range of laboratory medians 3.0–12.3 min). Scoring all 50 cases required a median of 5.0 h (range 2.9–10.8 h).

Discussion

Substantial interest surrounds the use of immunohistochemistry for Ki67 to estimate cellular proliferation rates and thereby guide therapy for patients with breast cancer.1, 2, 3, 4, 5, 7, 8 However, for a tumor biomarker assay to be accepted for routine clinical application, it must have both analytical validity and high levels of evidence of clinical utility for a given use context.14 Several reports have demonstrated that immunohistochemistry assays for Ki67 have clinical validity in a number of use contexts in early-stage breast cancer, such as prognosis, as a surrogate end point for response in clinical trials, and perhaps as a predictive factor for benefit from adjuvant chemotherapy.1 However, the analytical validity of the Ki67 immunohistochemistry assay in diverse clinical laboratories remains poor.7 Thus, this tumor biomarker test cannot currently be recommended to guide treatment decisions in standard contemporary oncology practice.15

Our previous study (Phase 1) revealed substantial variation in Ki67 scoring when laboratories used their own scoring methods on centrally stained slides from the same tissue microarray block, indicating that absolute values and cutoffs for Ki67 cannot be transferred between laboratories without careful standardization of scoring methodology.9 In this second study, we eliminated variability due to preanalytical and staining issues by providing centrally stained tissue microarray slides, and asked laboratories to calibrate to a specified scoring method. We found that a standardized, practicable visual scoring method could deliver high inter-laboratory reproducibility in Ki67 scoring on centrally stained tissue microarray slides. The intraclass correlation when all laboratories scored the exact same tissue microarray section was 0.94 (95% credible interval: 0.90–0.97), and the intraclass correlation when laboratories scored different sections from the same tissue microarray block was 0.92 (95% credible interval: 0.88–0.96). The latter intraclass correlation (0.92) suggests an improvement over the intraclass correlation of 0.71 (95% credible interval: 0.54–0.86) calculated in Phase 1 (restricted to the 50 cases and 7 laboratories common to both studies).9 Because this study was not specifically designed to make this comparison, and no formal statistical test was performed, the comparison should be interpreted with some caution. Nonetheless, it appears to represent a decrease from 29% to 8% in the proportion of variation in Ki67 scores attributable to sources other than biological differences between patient tumors. Because this study used centrally stained slides, this decrease can plausibly be attributed to the standardized scoring methodology. If this improvement can be translated into a true clinically applicable context, Ki67 differences between individual patients may well reflect biological differences between tumors, which the assay is intended to detect.

The very light level of staining considered sufficient to be called ‘positive’ in our scoring instructions was also seen as a potential limitation by scorers. Although they found the provided visual examples helpful in both the calibration and the glass-slide scoring, at the very light end of the spectrum the decision to deem a cell positive or negative remained highly subjective. This ‘lightening’ of the ‘positive’ threshold may account for the overall mean of laboratory Ki67 means increasing from 25.7% in Phase 1 to 29.1% in Phase 2 among the Group 1 laboratories, which participated in both studies (Supplementary Table S2).

Finally, although all laboratories were given an opportunity to review their calibration images against the reference images shortly before they scored on glass, the interval between completion of the calibration and the subsequent scoring-on-glass varied among laboratories (range <1 month to 7 months). Nonetheless, good overall concordance was still observed, suggesting that the learning effect from the calibration was at least partially retained.

This calibration exercise demonstrated that laboratories could be ‘trained’ to follow a specific scoring pattern. We also learned that visual depictions of the levels of staining intensity considered ‘positive’ or ‘negative’ were considered helpful by the scoring pathologists. Nonetheless, clinically important discrepancies persisted among laboratories for some cases in the critical Ki67 range between 10 and 20%, in which the cutoff to distinguish high from low is often selected.1, 3 There were 26 cases for which at least one of the 16 laboratories reported a score in the range 10%≤Ki67≤20%. In only one of these 26 cases did all laboratories report scores confined to that range.

Our analyses of concordance after application of cutoffs suggest that the observed degree of inter-laboratory agreement is influenced by the cutoff chosen and argue for caution in dichotomizing Ki67 scores into ‘high Ki67’ and ‘low Ki67’. In fact, as discussed in our earlier guidelines,1 cutoffs for Ki67 should be context-dependent. Moreover, the establishment of such cutoffs will require additional data to permit examination of associations between Ki67, clinical outcome, and other clinical and pathological factors. As with other continuous variables in pathology, different laboratories use different cutoffs for the same context; but uniform cutoffs can be problematic as well if measurements differ substantially across laboratories. The impact on clinical decision-making varies according to context and, in some circumstances, may merit the reporting of confidence intervals or the definition of equivocal values that fall within a range of uncertainty.

Our study does not have immediate clinical application, as we chose to use a highly standardized specimen format (centrally stained tissue microarray cores) not typically used in clinical practice. However, we conclude that we were able to achieve satisfactory levels of scoring agreement in the highly stylized context in which this study was performed. Having done so, our logical next step will be to determine whether similar agreement is possible on core biopsies and, if this proves satisfactory, to proceed to a similar strategy with whole sections, in which the reader will have broader discretion regarding what portion of the tissue should be read. Finally, we plan to return to a local staining strategy, which we expect will re-introduce additional variability into the analyses. Our goal is to standardize each of these points of potential variability among the participants of this set of exercises. If we can maintain the satisfactory concordance observed in the current study, we will provide the methodological strategy and protocol to the general pathology community, in a manner similar to what has been accomplished by the joint American Society of Clinical Oncology and College of American Pathologists committees that established guidelines for HER2 and hormone receptor analyses.16, 17, 18, 19

In conclusion, Ki67 is a widely accessible biomarker of proliferation with strong evidence of clinical validity for prognosis and prediction in breast and other cancers.1 Its clinical utility has been limited by poor inter-observer reproducibility, with differences in scoring methodology a major contributor to this variability.9, 20 We demonstrate that improved inter-observer reproducibility in Ki67 scoring can be achieved on centrally stained tissue microarray slides after training observers in a standardized, practicable visual scoring method that can be performed relatively quickly. Our results suggest that strategies can be developed to at least partially overcome the inter-observer variation that has limited the utility of this marker. These encouraging results are nonetheless tempered by the recognition that achieving agreement among laboratories in clinical practice remains a significant challenge: the promising findings reported here still need to be translated to core biopsies and/or whole sections, with local staining, and their clinical validity confirmed. Therefore, we maintain that, unless and until the preanalytical and analytical features of immunohistochemistry for Ki67 can be standardized, this assay platform should not be used to drive patient-care decisions in clinical practice. If we are successful in our next set of studies, immunohistochemistry for Ki67 could be incorporated into the standard of care. If not, evaluation of proliferation for prognostic and predictive purposes in early breast cancer will need to be performed with other assay platforms that may have greater analytical validity but may be more difficult to apply in practices around the world.