Ki-67 as prognostic marker in early breast cancer: a meta-analysis of published studies involving 12 155 patients

The Ki-67 antigen is used to evaluate the proliferative activity of breast cancer (BC); however, Ki-67's role as a prognostic marker in BC is still undefined. In order to better define the prognostic value of Ki-67/MIB-1, we performed a meta-analysis of studies that evaluated the impact of Ki-67/MIB-1 on disease-free survival (DFS) and/or on overall survival (OS) in early BC. Sixty-eight studies were identified and 46 studies including 12 155 patients were evaluable for our meta-analysis; 38 studies were evaluable for the aggregation of results for DFS, and 35 studies for OS. Tumours were considered positive for Ki-67/MIB-1 expression according to the cut-off points defined by the authors of each study. Ki-67/MIB-1 positivity is associated with higher probability of relapse in all patients (HR=1.93 (95% confidence interval (CI): 1.74–2.14); P<0.001), in node-negative patients (HR=2.31 (95% CI: 1.83–2.92); P<0.001) and in node-positive patients (HR=1.59 (95% CI: 1.35–1.87); P<0.001). Furthermore, Ki-67/MIB-1 positivity is associated with worse survival in all patients (HR=1.95 (95% CI: 1.70–2.24); P<0.001), node-negative patients (HR=2.54 (95% CI: 1.65–3.91); P<0.001) and node-positive patients (HR=2.33 (95% CI: 1.83–2.95); P<0.001). Our meta-analysis suggests that Ki-67/MIB-1 positivity confers a higher risk of relapse and a worse survival in patients with early BC.

The crude incidence of breast cancer (BC) in Europe is 109.8 per 100 000 women per year, and the disease is responsible for 38.4 deaths per 100 000 women annually (Pestalozzi et al, 2005). Significant improvements in both disease-free survival (DFS) and overall survival (OS) have been obtained with the extensive use of adjuvant systemic therapies (EBCTCG, 2005). In the last few decades, proliferation markers have been extensively evaluated as prognostic tools in BC. However, the only prognostic factors utilised in clinical decision making are some histologic features (e.g. tumour size, histologic grade, nodal status and lymphovascular invasion), hormone receptor status, HER-2 status and age (Colozza et al, 2005; Hayes, 2005).
Ki-67 is present in all proliferating cells and there is great interest in its role as a marker of proliferation (Gerdes et al, 1983). The Ki-67 antibody reacts with a nuclear non-histone protein of 395 kDa present in all active phases of the cell cycle except the G0 phase (Cattoretti et al, 1992). MIB-1 is a monoclonal antibody against recombinant parts of the Ki-67 antigen; a good correlation exists between Ki-67 and MIB-1 (Cattoretti et al, 1992).
Recently, gene array techniques have revealed the Ki-67 gene's role in several 'proliferation signatures', showing that a set of genes with increased expression patterns is correlated with tumour cell proliferation rates, as assessed by the Ki-67 labelling index (Perou et al, 1999; Whitfield et al, 2006). Moreover, Ki-67 is one of the 21 prospectively selected genes of the Oncotype DX™ assay used to predict the risk of recurrence in a node-negative, tamoxifen-treated BC population enrolled in a National Surgical Adjuvant Breast and Bowel Project (NSABP) trial, as well as to predict the magnitude of chemotherapy benefit in women with node-negative, estrogen receptor (ER)-positive BC enrolled in the NSABP B20 trial (Paik et al, 2004, 2006). Despite the large number of published papers analyzing the prognostic role of Ki-67 in early BC, it is still not considered an established factor for use in clinical practice, probably because most of the studies are retrospective and because some uncertainty remains about how Ki-67 should be assessed (Eifel et al, 2001; Goldhirsch et al, 2003; Colozza et al, 2005; Urruticoechea et al, 2005). Because a more convincing demonstration of the prognostic role of Ki-67 in early BC would be of value for initiating further research on Ki-67 assessment methods, we performed this literature-based meta-analysis to better quantify the prognostic impact of Ki-67 expression.

Publication selection
For this meta-analysis, we selected studies evaluating the relationship between Ki-67/MIB-1 status and prognosis in early BC published until May 2006. To fulfill our selection criteria, the studies had to have been published as a full paper in English. Articles were identified by an electronic PubMed search using the following keywords: 'breast cancer', 'proliferative index', 'proliferative marker', 'survival' and 'prognostic'. We also screened references from the relevant literature, including all the identified studies and reviews. To avoid duplicate data, we identified articles that included the same cohort of patients by reviewing interstudy similarity in the country in which the study was performed, the investigators, the source of patients, the recruitment period and the inclusion criteria. When the authors reported the same patient population in several publications, only the most recent or most complete study was included in this analysis.

Data extraction
Information was carefully extracted from all publications by three authors (EA, GC and MP). The following data were collected from each study: publication date, first author's last name, antibody and cut-off used for assessing Ki-67 positivity, distribution of Ki-67 status, follow-up period, treatment, nodal status and data allowing us to estimate the impact of Ki-67 expression on DFS and/or OS.
We did not define a minimum number of patients or a minimum duration of median follow-up for a study to be included in our meta-analysis. The exclusion criteria are described below and were not driven by the results of the individual studies.

Statistical methods
Ki-67 was considered positive or negative according to the cut-off values provided by the authors. For the quantitative aggregation of the survival results, the impact of Ki-67 expression on prognosis was measured using the hazard ratio (HR). For each study, this HR was estimated by a method that depended on the results provided in the original publication. The most accurate method was to retrieve the estimated HR and its variance using two of the following parameters: the HR point estimate, the log-rank statistic or its P-value, and the O−E statistic (difference between numbers of observed and expected events) or its variance. If those data were not available, we looked for the total number of events, the number of patients at risk in each group and the log-rank statistic or its P-value, to estimate the HR. Finally, if the only useful data were in the form of graphical representations of the survival distributions, we extracted from them the survival rates at specified time points in order to reconstruct the HR estimate and its variance, with the assumption that the rate of patients censored was constant during the study follow-up (Parmar et al, 1998).
Three independent readers read the curves to reduce reading variability. If authors reported survival for three or more groups, we pooled the results to allow a comparison between two groups. Whenever possible, HR estimates were calculated for subgroups, such as node-negative, node-positive or untreated patients. Results were cross-checked against those from the original publication to ensure that they were not discrepant, in particular when the survival rates had to be read from the survival curves.
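The two simplest extraction routes described above can be sketched as follows. This is a minimal illustration of the approximations in Parmar et al (1998), not the code used for this meta-analysis; the function names, argument names and the numbers in the usage lines are assumptions introduced here for illustration only.

```python
import math
from statistics import NormalDist


def log_hr_from_ci(hr, lower, upper, level=0.95):
    """ln(HR) and its variance from a reported HR and its confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return math.log(hr), se ** 2


def log_hr_from_logrank(p_value, events, n_pos, n_neg, positive_worse=True):
    """ln(HR) and its variance from a two-sided log-rank P-value, the total
    number of events and the number of patients in each group."""
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    # Approximate variance V of the log-rank statistic (O - E).
    v = events * n_pos * n_neg / (n_pos + n_neg) ** 2
    o_minus_e = math.sqrt(v) * z
    if not positive_worse:
        # The P-value carries no direction; it must be read from the paper.
        o_minus_e = -o_minus_e
    return o_minus_e / v, 1.0 / v


# Hypothetical study: 100 events, 100 patients per group, log-rank P = 0.05.
log_hr, var = log_hr_from_logrank(0.05, events=100, n_pos=100, n_neg=100)
print(round(math.exp(log_hr), 2), round(var, 3))
```

Both routes return the pair (ln HR, variance) so that studies extracted by different methods can be pooled on the same scale.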
The individual HR estimates were combined into an overall HR using Peto's method (Yusuf et al, 1985). We carried out χ² tests of heterogeneity and, if the assumption of homogeneity of the individual HRs had to be rejected, we used a random-effect model in place of a fixed-effect model. By convention, an observed HR>1 implied a worse prognosis for the group with positive Ki-67 expression. The impact of Ki-67 on survival was considered statistically significant if the 95% confidence interval (CI) for the overall HR did not include 1. We used the authors' definitions of DFS and OS.
All the statistical calculations for our meta-analysis were performed on a personal computer.
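The pooling step can be illustrated with a short sketch. Under Peto's method the pooled ln HR is the sum of the (O−E) statistics divided by the sum of their variances, which is the same as an inverse-variance weighted mean of the individual ln HR values. The text does not state which random-effect estimator or which significance threshold for the heterogeneity test was used; the DerSimonian-Laird estimator and the 0.05 threshold below are assumptions made only for this example, as are all function names and input values.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square variable, via the series for the
    regularised lower incomplete gamma function (adequate for a sketch)."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    term = math.exp(-half + a * math.log(half) - math.lgamma(a + 1.0))
    if term == 0.0:  # underflow: far in one tail or the other
        return 0.0 if half > a else 1.0
    total, n = term, 1
    while term > 1e-15 * total:
        term *= half / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)


def pool_hazard_ratios(log_hrs, variances, alpha=0.05):
    """Fixed-effect pooling, switching to random effects if heterogeneous."""
    weights = [1.0 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    # Heterogeneity statistic Q, chi-square with k - 1 degrees of freedom.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    p_het = chi2_sf(q, df)
    tau2 = 0.0
    if p_het < alpha:
        # Assumed estimator: DerSimonian-Laird between-study variance.
        c = sum(weights) - sum(w * w for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c)
        weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = NormalDist().inv_cdf(0.975)
    lower, upper = math.exp(pooled - z * se), math.exp(pooled + z * se)
    return {"hr": math.exp(pooled), "lower": lower, "upper": upper,
            "q": q, "p_het": p_het, "tau2": tau2,
            "significant": lower > 1.0 or upper < 1.0}


# Hypothetical ln(HR) values and variances for three studies.
print(pool_hazard_ratios([math.log(2.0), math.log(1.6), math.log(2.4)],
                         [0.04, 0.05, 0.08]))
```

The `significant` flag mirrors the rule stated above: the pooled effect is significant when the 95% CI does not include 1.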

Characteristics of the studies
Of 68 studies published between 1989 and 2006, 46 provided sufficient information for HR extraction, including 38 studies evaluable for DFS and 35 for OS; some studies were evaluable for, or analysed, only one of these end points. Tables 1 and 2 list the evaluable studies with their main characteristics, and Table 3 presents the main results of this meta-analysis. The reasons for considering an article non-evaluable were: (a) no univariate analysis reported; (b) no possibility of calculating the HR using one of the methods mentioned above, either because the distribution of Ki-67 was not reported in the article or because Ki-67 was analysed in combination with other prognostic markers, rendering the analysis impossible; (c) overlapping data published in different journals; and (d) inclusion of metastatic BC patients. Table 4 lists all the studies considered non-evaluable for this meta-analysis but used in the sensitivity analysis.
The number of patients included in each study varied from 42 to 863, and the follow-up period varied from 23.6 months (mean) to 16.3 years (median). Different antibodies were used across the studies: anti-Ki-67 was used in 24 studies (52.1%), anti-MIB-1 in 24 studies (52.1%), both antibodies in five studies (Keshgegian and Cnaan, 1995; Veronese et al, 1995; Bevilacqua et al, 1996; Querzoli et al, 1996; Billgren et al, 2002), anti-Ki-S5 in two studies (Rudolph et al, 1999a; Esteva et al, 2004) and anti-Ki-S11 in one study (Rudolph et al, 1999b). The cut-off values used were those of the authors (range: 3.5–34%). Threshold definitions were mean or median values, the best cut-off value or an established arbitrary value.
The need to exclude some studies because they do not report the results required for aggregation is a well-known and important problem when conducting a meta-analysis, because the excluded studies often show a smaller effect than the studies published with full details and evaluable for the meta-analysis. To assess the impact of bias related to the unevaluable studies (which might lead to an overestimation of the effect), we performed an analysis on the overall patient population including both evaluable and unevaluable studies. For papers reporting only HR estimates obtained in multivariate analyses, we used this HR estimate together with its variance. For those with uncertainties related to the number of events, and hence to the variance of the HR estimate, we made a rough approximation of the variance. Finally, for the studies where no useful information could be retrieved from the publication, we considered that the HR estimate was 1 (i.e. no impact at all for Ki-67) and used a minimal variance compared to the included studies of the same size. Even in this sensitivity analysis, we still observed a significant adverse impact of Ki-67 positivity on DFS (HR 1.74, 95% CI 1.56–1.95; P<0.001; heterogeneity test P<0.001) and OS (HR 1.76, 95% CI 1.54–2.00; P<0.001; heterogeneity test P<0.001).
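The conservative imputation used in this sensitivity analysis can be sketched as follows. The example is a simplified illustration with hypothetical numbers: each unevaluable study is entered with HR = 1 (ln HR = 0) and, as an assumption made here for brevity, with the smallest variance among all evaluable studies rather than among studies of the same size. The function names are also introduced only for this example.

```python
import math


def pooled_log_hr(log_hrs, variances):
    """Inverse-variance weighted mean of ln(HR) and its variance."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * y for w, y in zip(weights, log_hrs)) / sum(weights)
    return estimate, 1.0 / sum(weights)


def add_null_studies(log_hrs, variances, n_unevaluable):
    """Append unevaluable studies as HR = 1 with a deliberately small
    variance, so that they pull the pooled estimate towards no effect."""
    null_variance = min(variances)  # assumption: smallest observed variance
    return (list(log_hrs) + [0.0] * n_unevaluable,
            list(variances) + [null_variance] * n_unevaluable)


# Hypothetical evaluable studies (not data from this meta-analysis).
log_hrs = [math.log(2.0), math.log(1.8), math.log(2.2)]
variances = [0.04, 0.05, 0.08]

est_eval, _ = pooled_log_hr(log_hrs, variances)
est_sens, var_sens = pooled_log_hr(*add_null_studies(log_hrs, variances, 2))
lower = math.exp(est_sens - 1.96 * math.sqrt(var_sens))
print(round(math.exp(est_eval), 2), round(math.exp(est_sens), 2), lower > 1.0)
```

Because the imputed studies carry a large weight and no effect, a pooled HR that remains above 1 after their inclusion is unlikely to be explained by the exclusion of unevaluable studies alone.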

DISCUSSION
The present meta-analysis confirms that high Ki-67 expression in patients with early BC confers a worse prognosis in the overall population and quantifies its univariate prognostic impact. This effect was also shown in subgroup analyses of node-negative, node-positive and untreated patients. This is the first meta-analysis of published studies to evaluate the association between Ki-67/MIB-1 expression and prognosis in early BC. Prognostic markers may be defined as those markers that are associated with some clinical outcome, typically a time-to-event outcome such as OS or DFS, independently of any treatment or intervention. The best setting in which to apply this concept is an untreated population, which helps identify a so-called pure prognostic marker. Prognostic markers may also be used to aid the decision-making process for adjuvant therapy; for example, they may be used as decision aids in determining whether a patient should receive adjuvant chemotherapy or how aggressive that therapy should be (McShane et al, 2005).
Ki-67 has been assayed in many studies as a prognostic and/or predictive marker in early BC. As a predictive marker, very few trials of primary systemic therapy, mostly retrospective and with conflicting results have been published (Colozza et al, 2005), and therefore we felt that the assessment of the predictive role of Ki-67 was out of scope for this meta-analysis.
Our meta-analysis was carried out using results published in the literature, and we therefore acknowledge some limitations of our approach, which is, however, much less expensive than a meta-analysis using individual patient data. The language selection could favour positive studies, following the assumption that they are more often published in English, whereas the negative ones tend to be published more often in local journals using the authors' native languages (Egger et al, 1997). However, we did not identify many papers published in a national language (Italian, Russian, Serbian, German) (Lelle, 1990; Topic et al, 2002; Kushlinskii et al, 2004; Costarelli et al, 2005). This may be called the 'Tower of Babel bias' and, in at least one of 36 consecutive meta-analyses, the exclusion of papers for linguistic reasons produced different results from those which would have been obtained if this exclusion criterion had not been used (Gregoire et al, 1995). Another possible source of confusion is the use of the same cohort of patients in different publications, although studies that were clearly based on the analysis of the same patient cohorts were excluded from this meta-analysis.
Some authors consider meta-analyses using individual data to be the gold standard evidence (Stewart and Parmar, 1993; Oxman et al, 1995). This approach is normally considered to be a new study that takes into account all performed studies on the topic, published or not, and that requires an individual data update by the investigators; it is thus much more time consuming, complex and costly. In a comparison between a meta-analysis based on individual patient data and one based on extracted data, the overall duration for the former was found to be 1–5 years, while for the latter it was only 1–5 months. Additionally, the overall cost of performing an individual patient data meta-analysis is $50 000 to $500 000, whereas for an extracted data study it is in the range of $5000 to $30 000 (Piedbois and Buyse, 2004). Therefore, a meta-analysis of the published literature is worthwhile, especially in a situation such as this one, where the resources to conduct a meta-analysis based on individual data are very unlikely to be found.
The method used for extrapolating HR might be a source of some variability in the HR estimates. When no other useful information was available, we extrapolated the HR from the survival curves using several time points during follow-up for reading the corresponding survival rates, assuming that censored observations were uniformly distributed. The estimation of survival rates based on the graphical representation of the survival curves was performed independently by three of the authors and we compared our HR estimate and its statistical significance with the results published in each individual trial. We did not identify any major contradiction between our results and the results available in the papers.
The adverse impact of Ki-67 positivity on both OS and DFS was observed in the overall population as well as in the subgroups of node-negative and node-positive patients. Significant heterogeneity was detected when considering the whole population and node-negative patients. It is not considered appropriate to define a single measure (i.e. HR associated with Ki-67 positivity in this case) from studies with inherent dissimilarities. The observed disparity among the conclusions of different studies, responsible for the observed heterogeneity, can be quantified by applying quality scores to the selected studies included in the meta-analysis. However, these scores do not always explain the observed results (Greenland, 1994). In this case, the methodological characteristics of each study must be taken into consideration. Cattoretti et al (1992) reported better success in staining Ki-67 in paraffin-embedded samples after the new antibodies anti-MIB-1 and anti-MIB-3 had been developed. Although several antibodies are now commercially available to stain Ki-67, anti-MIB-1 is the most frequently used in recent studies (Urruticoechea et al, 2005). In our meta-analysis, antibodies other than anti-MIB-1 and anti-Ki-67 were included, such as anti-Ki-S5 (Rudolph et al, 1999a; Esteva et al, 2004) and anti-Ki-S11 (Rudolph et al, 1999b), albeit representing only a minority of the cases. Moreover, Ki-67 expression is usually estimated as the percentage of tumour cells positively stained by the antibody, with nuclear staining being the most common criterion of positivity. The use of different antibodies and scoring protocols without a standard minimum number of cells to be counted may account for some of the differences between the studies.

[Table fragment; column headings not recoverable from the source]
Biesterfeld et al (1998)  103  Yes
Bukholm et al (2003)  147  No
Ceccarelli et al (2000)  217  Yes
Galiegue et al (2004)  117  No
Gasparini et al (1992)  164  Yes
Haerslev et al (1996)  487  Yes
Jalava et al (2000)  414  No
Kroger et al (2006)  157  No
Kronblad et al (2006)  377  Yes
Lampe et al (1998)  142  Yes
Liu et al (2000)  225  Yes
Michels et al (2003)  104  No
Molino et al (1997)  322  Yes
Rudas et al (1994)  184  No
Tsutsui et al (2005)  249  Yes
Yang et al (2003)  147  No

Ki-67 in early breast cancer E de Azambuja et al

Figure 1 Results of the meta-analysis with all evaluable studies for DFS. A hazard ratio (HR)>1 implies a worse DFS for the group with increased Ki-67. The size of each square is proportional to the number of patients included in each study. The centre of the lozenge gives the combined HR for the meta-analysis and its extremities the 95% CI.

Figure 2 Results of the meta-analysis with all evaluable studies for OS. A HR>1 implies a worse OS for the group with increased Ki-67. The size of each square is proportional to the number of patients included in each study. The centre of the lozenge gives the combined HR for the meta-analysis and its extremities the 95% CI.
In our meta-analysis, some studies used 10% as the cut-off (an arbitrary value), whereas others chose the mean, the median, the optimal cut-off value or other arbitrary values, and these differences might be responsible for the difficulty in determining a standard threshold for daily practice. However, some authors have suggested that the choice of the cut-off point for immunohistochemistry (IHC) may depend on the clinical objective: if Ki-67 is used to exclude patients with slowly proliferating tumours from chemotherapeutic protocols, a cut-off of 10% will help avoid overtreatment. In contrast, if Ki-67 is used to identify patients sensitive to chemotherapy protocols, it is preferable to set the cut-off at 25% (Spyratos et al, 2002). In the context of this meta-analysis, we may assume that increased Ki-67 leads to an increased risk of relapse and/or death and that a relative increase is estimated, although the baseline risk (the risk in the group considered Ki-67 negative) is not the same in all the studies.
A further limitation of our meta-analysis is that it assesses only the univariate prognostic value of Ki-67. So, we cannot infer from our meta-analysis that Ki-67 is an independent factor; the answer to that question should come from a prospective study (it is likely that a meta-analysis of individual data would not solve the question as the intersection of the sets of covariates available in the individual studies is most probably very small).
To better clarify the prognostic role of ER status, Sotiriou et al (2006) used gene array profiling to explore the implications of the joint distribution of ER status and gene expression grade index (GGI) to predict clinical outcome. They found that almost all ER-negative tumours were associated with high GGI scores (high grade), whereas ER-positive tumours were associated with a heterogeneous mixture of GGI values. This means that GGI adds prognostic information when the ER status is known, whereas the opposite is not true. Unfortunately, due to the lack of information in the published studies used in our study, an analysis of the impact of Ki-67 expression in the ER-negative and ER-positive subpopulations and by grade, which are well-known risk factors associated with worse outcome, was not possible. Table 5 summarises the main results of the recent gene signatures for prognosis/prediction in BC.
Despite years of research and hundreds of reports of tumour markers in oncology, the number of markers that have emerged as clinically useful is quite small. The REporting of tumour MARKer Studies (REMARK) guidelines were the major task of the NCI-EORTC First International Meeting on Cancer Diagnosis, representing a collaborative effort of statisticians, clinicians and laboratory scientists. The guidelines contain 20 recommendations derived from studies on tumour markers and covering study design, methods of statistical analysis, preplanned hypotheses, patient and specimen characteristics, and assay methods. The widespread use of published guidelines for analytical methods and the reporting of results would greatly facilitate the development of alternative analyses and meta-analyses (Alonzo, 2005; McShane et al, 2005).
Despite some limitations, this meta-analysis supports the prognostic role of Ki-67 in early BC, by showing a significant association between its expression and the risk of recurrence and death in all populations considered and for both outcomes, DFS and OS. Had the proposed REMARK guidelines been employed in all the studies selected for this meta-analysis and had all necessary information been available, our literature-based meta-analysis would better characterise the role of Ki-67 as prognostic marker.