INTRODUCTION

Histological grading is well accepted for evaluating the prognosis of patients with astrocytic tumors. It is important to differentiate low-grade tumors (astrocytoma) from high-grade ones (anaplastic astrocytoma and glioblastoma) because their clinical management differs. However, morphologic criteria are not always accurate prognostic indicators in individual cases. To refine classification, many complementary methods, such as proliferation markers, have been introduced. Ki-67 is one of the cell proliferation markers that has been widely studied, and MIB-1 is one of the most sensitive commercially available Ki-67 equivalent antibodies (1, 2, 3). MIB-1 reacts with an epitope of the Ki-67 protein in formalin-fixed, paraffin-embedded sections after microwave antigen retrieval (4). The MIB-1 labeling index is calculated as the percentage of MIB-1–positive nuclei. Previous reports showed that astrocytic tumors with low MIB-1 labeling indices had a better prognosis than those with high MIB-1 labeling indices, and that the MIB-1 labeling indices of astrocytomas were lower than those of anaplastic astrocytomas and glioblastomas (5, 6, 7, 8, 9, 10). Furthermore, cutoff values of the MIB-1 labeling index have been proposed for distinguishing the better prognostic group from the poorer one, or astrocytoma from anaplastic astrocytoma/glioblastoma. However, because the reported MIB-1 labeling indices overlapped between these two groups and the cutoff values varied substantially, their reproducibility has been doubted (11).

Variability in the MIB-1 labeling index can arise from several sources, notably the staining methods and the counting methods. Different counting methods may yield different MIB-1 labeling indices and cutoff values, and their interobserver variability and reproducibility may also differ. Studies of this topic are rare.

In this study, we evaluated the interobserver reproducibility of four manual counting methods for the MIB-1 labeling index. We also studied the interobserver agreement for an MIB-1 labeling index cutoff value of 11.0, which is used in our daily practice. The counting variability and the application of the MIB-1 labeling index to astrocytic tumors are discussed.

MATERIALS AND METHODS

A total of 60 astrocytic tumor specimens (20 astrocytomas, 20 anaplastic astrocytomas, and 20 glioblastomas) that were excised and diagnosed at the Taipei Veterans General Hospital between 1999 and 2001 were retrieved from the surgical pathology files of the Department of Pathology and Laboratory Medicine. The histological grading (World Health Organization, 1993) was recorded from the corresponding surgical pathology reports. Of the 60 patients, 32 were male and 28 were female. The mean age was 50.9 years (range, 5 mo to 91 y). Forty-one patients had cerebral hemispheric tumors, 5 patients had tumors of the thalamus or basal ganglia, 6 patients had cerebellar tumors, 5 patients had brainstem tumors, and 3 patients had spinal cord tumors. All specimens were recut from the paraffin-embedded tissue blocks for hematoxylin and eosin staining and immunohistochemical staining for Ki-67 (clone MIB-1, monoclonal, 1:75; DAKO, Glostrup, Denmark; Dako ChemMate Detection Kit, peroxidase; microwaved 3 times for 5 min each time). Positive and negative controls were included with each batch of sections to confirm the consistency of the analysis. Sections from one glioblastoma with a known average MIB-1 labeling index of 43.3 (standard deviation, 2.1) were used as positive controls. A batch of MIB-1 staining was considered acceptable when the labeling index of the control section was within the range of 39 to 47.6. Normal brain tissues from autopsy were used as negative controls.

For MIB-1 staining, distinct nuclear staining of the tumor cell was recorded as positive. The MIB-1 labeling index was defined as the percentage of immunoreactive tumor cells in the evaluated area. Vascular components and hematogenous cells were excluded. The evaluated areas also excluded necrotic, degenerated, and poorly preserved areas.

The MIB-1–stained sections were evaluated by three of the authors (CH, DH, and CY). All counts were performed at a magnification of 400× (field size = 0.1735 mm²). Five viable fields from the area of maximal labeling were chosen for counting. To compare the counting methods, all four were applied to each case (Fig. 1). In the first three methods, an ocular graticule consisting of 10 × 10 = 100 grids covering an area of 0.0552 mm² was used, and all the positive cells in the graticule were counted. The methods differed in how the total number of cells in the evaluated area was obtained. In Method 1, every cell in the graticule was counted to give the total number of cells. In Method 2, an area-fraction estimate was used: only the cells in the 1st and the 10th columns (20 grids in total) were counted, and this number times 5 was used to estimate the total number of cells in the graticule. In Method 3, the total number of cells was estimated by line intersection: the cells on the left and upper edges of the graticule were counted, and the total number of cells in the graticule was estimated as the number of cells on the left edge times the number of cells on the upper edge. The ocular graticule was not used in Method 4. All positive cells in the entire 400× field were counted, and an imaginary line of cells counted from the edge of the visual field to the center was regarded as the cells on a radius. The total number of cells was estimated by line intersection without a graticule, using the following formula: 3.1416 × radius², where radius is the number of cells counted on that line.
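The four ways of obtaining the denominator can be summarized in a short Python sketch. It is an illustration only, not software used in the study: the function names and the example counts are hypothetical, and the only relations taken from the text are the ×5 scaling for Method 2, the edge × edge product for Method 3, and 3.1416 × radius² for Method 4.

```python
def labeling_index(positive, total):
    """MIB-1 labeling index: percentage of positive tumor cells."""
    if total <= 0:
        raise ValueError("total cell count must be positive")
    return 100.0 * positive / total


def total_method1(cells_in_graticule):
    # Method 1: every cell in the 100-grid graticule is counted directly.
    return cells_in_graticule


def total_method2(cells_col1, cells_col10):
    # Method 2 (area fraction): columns 1 and 10 cover 20 of the 100 grids,
    # so their combined count is multiplied by 5.
    return 5 * (cells_col1 + cells_col10)


def total_method3(cells_left_edge, cells_upper_edge):
    # Method 3 (line intersection): left-edge count times upper-edge count.
    return cells_left_edge * cells_upper_edge


def total_method4(cells_on_radius):
    # Method 4 (line intersection, no graticule): 3.1416 x radius^2,
    # where the "radius" is a number of cells, not a length.
    return 3.1416 * cells_on_radius ** 2


# Hypothetical counts for one field (illustrative values, not study data).
positive_in_graticule = 68
print(labeling_index(positive_in_graticule, total_method1(232)))
print(labeling_index(positive_in_graticule, total_method2(23, 24)))
print(labeling_index(positive_in_graticule, total_method3(8, 7)))
```

Because the line-intersection estimates multiply or square very small counts, the estimated total can fall below the number of positive cells, in which case the computed index exceeds 100.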

FIGURE 1

MIB-1 counting in a 400× field. Method 1: directly count the positive cells and all tumor cells in 100 grids. Method 2: count all positive cells in 100 grids and the cells in the 1st and the 10th columns (shadowed area). Method 3: count all positive cells in 100 grids and the cells on the lines of left and upper edges of the graticule. Method 4: remove the ocular graticule, and then count all the positive cells in the field and an imaginary line of cells from the edge of the visual field to the center (radius; MIB-1 immunostain; magnification, 400×).

The data for the MIB-1 labeling index are presented as mean ± standard deviation (median [range]). The distribution of the MIB-1 labeling index deviated significantly from the normal distribution, as tested by the Shapiro-Wilk statistic. The log-transformed values of the MIB-1 labeling indices were normally distributed and were used in the subsequent statistical analyses. Reproducibility of the MIB-1 labeling index counting methods was analyzed by calculating the intraclass correlation coefficient and the within-subject coefficient of variation from analysis of variance (12, 13). An intraclass correlation coefficient of <0.40 indicated poor, one between 0.40 and 0.75 indicated fair to good, and one of >0.75 indicated excellent reproducibility (13). The lower the coefficient of variation (the closer to zero), the less the variation. Generally speaking, a coefficient of variation of <20% was desirable, and one of >30% was undesirable (14). The within-subject standard deviation from analysis of variance was also used to determine the 95% confidence interval (95% CI) of the cutoff value of 11.0, which we use in routine practice to interpret the MIB-1 labeling index of astrocytic tumors (15). This cutoff value could separate better and worse prognostic groups (5). The level of interobserver agreement using the cutoff value of 11.0 was quantified using the generalized κ and pairwise κ statistics (16). The pairwise κ statistic is the proportion of cases in which two observers agree, adjusted for the level of agreement that would be expected to occur solely by chance. The generalized κ summarizes the agreement across all observers. In brief, greater κ values reflect stronger agreement between the raters. For interpretation of the κ value, the following guidelines were used: 0.00 to 0.20, slight; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; and 0.81 to 1.00, almost perfect agreement (17).
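As a rough illustration of how an intraclass correlation coefficient and a within-subject coefficient of variation can be obtained from analysis of variance on log-transformed indices, consider the following Python sketch. Several points are assumptions rather than details given in the text: the one-way random-effects form of the intraclass correlation coefficient, the convention CV = exp(s_w) − 1 for log-scale data, the function names, and the example values.

```python
import math


def one_way_anova(log_values):
    """log_values: one row per case, each row holding the k observers' ln(LI).

    Returns (between-case mean square, within-case mean square, k).
    """
    n = len(log_values)
    k = len(log_values[0])
    grand_mean = sum(sum(row) for row in log_values) / (n * k)
    case_means = [sum(row) / k for row in log_values]
    ss_between = k * sum((m - grand_mean) ** 2 for m in case_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(log_values, case_means)
                    for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return ms_between, ms_within, k


def icc_one_way(ms_between, ms_within, k):
    # One-way random-effects ICC (assumed form; the text does not name one).
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def within_subject_cv(ms_within):
    # Within-subject SD on the natural-log scale, back-transformed to a CV.
    s_w = math.sqrt(ms_within)
    return math.exp(s_w) - 1.0


# Hypothetical labeling indices: 4 cases x 3 observers (not study data).
indices = [[2.1, 2.4, 2.0], [9.5, 10.2, 9.9], [25.0, 27.5, 24.1], [48.0, 51.3, 46.2]]
logs = [[math.log(x) for x in row] for row in indices]
msb, msw, k = one_way_anova(logs)
print("ICC:", icc_one_way(msb, msw, k))
print("within-subject CV:", within_subject_cv(msw))
```

With identical readings from all observers the within-case mean square is zero, so the coefficient is 1 and the coefficient of variation is 0; disagreement between observers raises the within-case mean square and moves both measures in the unfavorable direction.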

Patient outcome was assessed by review of the hospital charts. Survival times were calculated as the time from surgery to death or as the time to last follow-up appointment in surviving patients. Survival was analyzed by the Kaplan-Meier method and compared by the log-rank test. The average value of MIB-1 labeling index counted by three observers was used to correlate with the outcome of the patient. A P value of <.05 was considered significant.
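For readers unfamiliar with the product-limit calculation, a minimal Python sketch of the Kaplan-Meier estimate follows. The function name and the follow-up data are hypothetical, and the sketch omits the log-rank comparison; it only shows how deaths and censored observations enter the survival estimate.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient (surgery to death or last visit)
    events: 1 if the patient died at that time, 0 if censored (still alive)
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Patients censored at time t are still counted as at risk at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months (not study data).
for t, s in kaplan_meier([6, 12, 12, 18, 24], [1, 0, 1, 1, 0]):
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk until their last follow-up and then drop out without lowering the curve, which is why the time to last follow-up appointment is recorded for surviving patients.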

RESULTS

The 2-year survival rates of patients with astrocytoma, anaplastic astrocytoma, and glioblastoma were 88.9%, 39.3%, and 0%, respectively. The survival of patients with astrocytoma was significantly longer than that of patients with anaplastic astrocytoma or glioblastoma (P < .001). The difference in survival between patients with anaplastic astrocytoma and patients with glioblastoma did not reach statistical significance (P = .085).

The number of cells was 264 ± 155; 232 (40, 964) [mean ± standard deviation; median (range)] within the graticule (100 grids), 26 ± 16; 23 (2, 118) within a column (10 grids), 10 ± 6; 8 (1, 31) on the left or the upper edge of the graticule, and 11 ± 6; 10 (2, 30) on the radius of the field. The number of positive cells was 68 ± 77; 53 (1, 603) within the graticule and 144 ± 163; 115 (1, 1300) in the entire field. On average, counting one case took 25 minutes by Method 1, whereas it took 12, 7, and 10 minutes by Methods 2, 3, and 4, respectively. The estimation methods therefore saved counting time.

The MIB-1 labeling indices obtained by the three observers using the four counting methods are shown in Table 1. Using Method 1, the MIB-1 labeling indices of astrocytomas counted by the different observers were all <11.0 (range, 0.3 to 10.0), and those of anaplastic astrocytomas and glioblastomas were all >11.0 (range, 12.5 to 77.7). The distribution of MIB-1 labeling indices of astrocytomas did not overlap with that of anaplastic astrocytomas and glioblastomas. The mean MIB-1 labeling index of anaplastic astrocytomas was smaller than that of glioblastomas, but their distributions overlapped. Using Method 2, the MIB-1 labeling indices of astrocytomas ranged from 0.2 to 11.4, whereas those of anaplastic astrocytomas and glioblastomas ranged from 11.9 to 78.0. The two groups could also be completely separated, but the gap between the upper limit of the former (11.4) and the lower limit of the latter (11.9) became smaller. The cutoff value of 11.0 showed prognostic relevance. With Method 1 and Method 2, the survival of patients with an MIB-1 labeling index >11.0 was worse than that of patients with an MIB-1 labeling index ≤11.0 (P < .001). Using Method 3 and Method 4, the MIB-1 labeling index values of astrocytomas, anaplastic astrocytomas, and glioblastomas overlapped widely, and no cutoff value of the MIB-1 labeling index could be found. In addition, 50 (27.8%) of the 180 MIB-1 labeling index values obtained by Method 3, and 15 (8.3%) of the 180 obtained by Method 4, were >100.0, which is not a possible value for a percentage.

Table 1 MIB-1 Labeling Indices Distributions Using Various Counting Methods by Three Observers*

Interobserver reproducibility, expressed as the intraclass correlation coefficient for each counting method, is shown in Table 2. The intraclass correlation coefficients of Methods 1 and 2 were >0.90 in each grade of astrocytic tumor, indicating excellent reproducibility. The intraclass correlation coefficients of Methods 3 and 4 ranged from 0.04 to 0.77 and indicated poor reproducibility in anaplastic astrocytoma and glioblastoma (intraclass correlation coefficient <0.4).

Table 2 Intraclass Correlation Coefficient among Three Observers Using Various Counting Methods*

Table 3 shows the coefficient of variation of the MIB-1 labeling index for the different counting methods. The coefficients of variation of Method 1 (10.3%) and Method 2 (14.0%) were desirable (<20%), whereas those of Method 3 (112.8%) and Method 4 (92.0%) were undesirable (>30%). The 95% CI of the MIB-1 labeling index cutoff value of 11.0 varied widely among the counting methods. The intervals of Method 1 (9.1 to 13.3) and Method 2 (8.5 to 14.2) were narrower than those of Method 3 (2.5 to 48.3) and Method 4 (3.1 to 39.5). With Method 3 and Method 4, it was impossible to apply an MIB-1 labeling index cutoff value in practice, because the 95% CI was so wide that it covered almost one half or one third of the possible range of the MIB-1 labeling index (0 to 100).
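The text does not state how the interval around the cutoff was derived from the within-subject standard deviation. One reading that appears consistent with the reported figures is a symmetric interval on the natural-log scale, ln(11.0) ± 1.96 × s_w, with the coefficient of variation related to s_w by CV = exp(s_w) − 1. The Python sketch below works through that arithmetic; the function name and both relations are assumptions, not a description of the authors' calculation.

```python
import math


def cutoff_interval(cutoff, cv, z=1.96):
    """95% interval around a cutoff, back-transformed from the log scale.

    Assumes CV = exp(s_w) - 1, so s_w = ln(1 + CV), and an interval of
    ln(cutoff) +/- z * s_w on the natural-log scale.
    """
    s_w = math.log(1.0 + cv)
    half_width = z * s_w
    return cutoff * math.exp(-half_width), cutoff * math.exp(half_width)


# Coefficients of variation reported for Methods 1 to 4.
reported_cv = {1: 0.103, 2: 0.140, 3: 1.128, 4: 0.920}

for method, cv in reported_cv.items():
    lower, upper = cutoff_interval(11.0, cv)
    print(f"Method {method}: {lower:.1f} to {upper:.1f}")
```

Under these assumptions the sketch returns 9.1 to 13.3, 8.5 to 14.2, 2.5 to 48.3, and 3.1 to 39.5 for Methods 1 to 4, matching the intervals quoted above. The interval is multiplicative (11.0 divided and multiplied by the same factor), which is why it is not symmetric about 11.0 on the original scale.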

Table 3 Within-Subject Coefficient of Variation of MIB-1 Labeling Index and 95% Confidence Interval (CI) of an MIB-1 Labeling Index Cut-Off Value of 11.0 Using Various Counting Methods

Using 11.0 as the cutoff value to evaluate the MIB-1 labeling index of astrocytic tumors, Method 1 had complete interobserver agreement, whereas Methods 2, 3, and 4 did not. Disagreements occurred in 1 case (1.7%) by Method 2, in 8 cases (13.3%) by Method 3, and in 10 cases (16.7%) by Method 4. Generalized κ values revealed almost perfect agreement for Method 1 (1.00) and Method 2 (0.97), moderate agreement for Method 3 (0.60), and substantial agreement for Method 4 (0.70). The pairwise κ values of Method 1 and Method 2 indicated almost perfect agreement (0.96 to 1.00). Those of Method 3 revealed moderate to substantial agreement (0.44 to 0.70), whereas those of Method 4 showed substantial agreement (0.68 to 0.74; Table 4).
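The agreement statistics can be illustrated with a brief Python sketch that dichotomizes each reading at the cutoff and then computes a pairwise (Cohen) κ and a multi-rater κ. Taking the generalized κ to be Fleiss' κ is an assumption, as are the function names and the example ratings.

```python
def cohen_kappa(a, b):
    """Pairwise kappa for two observers' category labels."""
    n = len(a)
    categories = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_exp == 1.0:
        return 1.0  # both observers used a single, identical category
    return (p_obs - p_exp) / (1.0 - p_exp)


def fleiss_kappa(ratings):
    """Multi-rater kappa (Fleiss); ratings holds one list of labels per case."""
    n = len(ratings)
    m = len(ratings[0])
    categories = sorted({c for row in ratings for c in row})
    counts = [[row.count(c) for c in categories] for row in ratings]
    p_cat = [sum(r[j] for r in counts) / (n * m) for j in range(len(categories))]
    p_case = [(sum(x * x for x in r) - m) / (m * (m - 1)) for r in counts]
    p_bar = sum(p_case) / n
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return 1.0
    return (p_bar - p_exp) / (1.0 - p_exp)


def interpret(kappa):
    # Guideline bands quoted in Materials and Methods.
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def classify(index, cutoff=11.0):
    return "high" if index > cutoff else "low"


# Hypothetical indices from three observers for four cases (not study data).
readings = [[3.2, 4.0, 3.5], [10.8, 11.3, 10.5], [25.1, 27.0, 24.4], [40.2, 38.9, 41.5]]
labels = [[classify(x) for x in row] for row in readings]
k = fleiss_kappa(labels)
print(k, interpret(k))
```

As a plausibility check, a hypothetical arrangement of 60 cases rated by three observers in which a single case is discordant (19 unanimously low, 40 unanimously high, one split 2 to 1) gives a multi-rater κ of about 0.97 with this function.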

Table 4 Pairwise Interobserver Agreements According to the Kappa Statistic Using an MIB-1 Labeling Index Cut-Off Value of 11.0*

Regarding the intraclass correlation coefficient, coefficient of variation, and interobserver agreement, Method 1 was the most reproducible method of the 4 methods evaluated.

DISCUSSION

The MIB-1 labeling index is defined as the percentage of immunoreactive tumor cells in the evaluated area. It must therefore lie in the range from 0.0 to 100.0, and an MIB-1 labeling index value of <0.0 or >100.0 is an unacceptable error. In some cases, the total number of cells estimated by line intersection, such as from the cells on the edge of the graticule in Method 3 or the cells on the radius in Method 4, was less than the number of immunostained positive cells obtained by direct counting. This error may be due in part to the tumor cells not being evenly or regularly distributed, because an even or regular distribution is the basis for applying line-intersection estimates. It is hard to count a line of cells accurately and consistently because the cells are not evenly and regularly arranged in a straight line. In Figure 1, the number of cells on the left edge is not equal to that on the right edge, and the number of cells on the upper edge is not equal to that on the lower edge of the same graticule. Even a small shift of the selected field by parallel movement of the graticule changes the number of cells on the left edge greatly, whereas the number of positive cells in the graticule remains the same. Similarly, the number of cells on the radius in one direction differs from that in other directions in the same field.

Using an estimation method to evaluate the total number of cells seems easy and saves time and labor. However, it amplifies the counting variation, especially with line-intersection estimates such as Method 3 or Method 4. Suppose that a difference of only one cell occurs in counting the cells in the graticule (Method 1), in a column (Method 2), on the edge of the graticule (Method 3), or on the radius (Method 4). The resulting variation in the total number of cells differs among the methods. In our data, the average number of cells in the graticule was 264, that in a column was 26, that on the edge of the graticule was 10, and that on the radius of the field was 11. A difference of one cell in the number of cells on the edge of the graticule or on the radius therefore means a difference of almost 10%. After computation, either edge times edge or 3.1416 times radius squared, a 10% difference in the number of cells on the edge of the graticule or on the radius produces a difference of nearly 20% in the total number of cells. In Method 1, a difference of one cell in the number of cells in the graticule resulted in a difference of <1%, and because the cells were counted directly, no further amplification occurred in the total number of cells. In Method 2, a difference of one cell in the number of cells in a column resulted in a difference of 4%. After the sum of Column 1 and Column 10 was multiplied by 5, the difference in the total number of cells remained 4%, and no further amplification occurred.
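The arithmetic behind this argument can be checked with a few lines of Python. The sketch is illustrative only: the function names are hypothetical, the counts are the averages quoted above, Method 2 is modeled as a linear scaling of the per-column count as in the text, and for Method 3 both edge counts are assumed to be off by one cell.

```python
def relative_change(estimator, count, delta=1):
    """Relative change in the estimated total when the count is off by delta."""
    base = estimator(count)
    return (estimator(count + delta) - base) / base


def direct_count(n):        # Method 1: the count itself is the total
    return n


def area_fraction(c):       # Method 2: linear scaling of the counted cells
    return 5 * c


def edge_product(e):        # Method 3: edge x edge, both edges off by one cell
    return e * e


def radius_squared(r):      # Method 4: 3.1416 x radius^2
    return 3.1416 * r * r


cases = [("Method 1", direct_count, 264),
         ("Method 2", area_fraction, 26),
         ("Method 3", edge_product, 10),
         ("Method 4", radius_squared, 11)]

for name, estimator, mean_count in cases:
    change = relative_change(estimator, mean_count)
    print(f"{name}: one-cell miscount -> {100 * change:.1f}% change in total")
```

Linear estimators pass the relative counting error through unchanged (about 0.4% and 3.8%), whereas the quadratic estimators roughly double it (21% and 19%), which is the amplification described in this paragraph.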

Counting cells without assistive equipment, such as a graticule or grids, poses some problems. We found that the center of the field was hard to locate exactly without the assistance of a graticule. Cells were irregularly distributed in the field, and they were not arranged in a straight line. The cell size was not uniform within a section. The imaginary radius was not straight and its thickness could not be kept constant, so the number of cells on the radius varied. In addition, it was difficult to obtain an exact count of the positive cells without a graticule, especially when there were many positive cells in the field.

The MIB-1 labeling index is affected by both the staining and the counting methods. Staining variability can be reduced when standard reagents, procedures, and quality control are applied. Counting methods vary in several respects, including (1) the definition of the sample unit, such as a fixed number of fields or a given number of cells; (2) the method of sample size evaluation, such as the number of fields evaluated and the total number of cells counted; (3) the method of choosing counting fields, such as highest labeled versus random and continuous versus discontinuous; (4) whether the total number of cells is obtained by direct counting or by estimate (line intersection or area fraction); and (5) whether assistive equipment is used (5, 10, 11, 18). Instead of manual methods, computerized image analysis has been used (19, 20, 21, 22, 23). Compared with the manual method, automatic or semi-automatic methods need special equipment, such as digital cameras, computerized color image analyzers, or image analysis software. These systems, particularly if automated, hold the promise of improving the accuracy and reproducibility of quantitative immunohistochemistry (24). The images can be saved as files and remain available for review. However, the advantages of these systems depend mainly on the equipment and software, which may not be available in every surgical pathology laboratory, and further training is required to use them. An MIB-1 labeling index produced by visual estimation of the percentage of highlighted nuclei in the most densely labeled area has also been reported (25). Thus, the reproducibility of a counting method should be verified before the method is applied. Our data showed that the interobserver reproducibility of the MIB-1 labeling index differed markedly among counting methods. The methods using line intersection to estimate the total number of cells, such as Method 3 or Method 4, showed smaller intraclass correlation coefficients, greater coefficients of variation, and worse interobserver agreement. The interobserver variability could be reduced when a proper counting method was used.

MIB-1 has been employed as an operational marker of cell proliferation in various types of human tumors, including astrocytic tumors. A correlation between the MIB-1 labeling index and histological grade or survival has been documented (26, 27). Because of interlaboratory and interobserver variability, the reproducibility of the MIB-1 labeling index remains an open question. We think that the reproducibility of the counting method plays an important role in whether a significant relationship between the MIB-1 labeling index and histological grade or survival can be shown. A method with good reproducibility reduces the counting error between the cases studied, so that the real differences in MIB-1 labeling index between cases are revealed. With a method of poor precision, the differences in MIB-1 labeling index between cases are hidden by counting error, and the relationship between the MIB-1 labeling index and prognosis cannot be well evaluated. Therefore, a reasonable counting method with good reproducibility is very important. The reported cutoff values of the MIB-1 labeling index with prognostic significance ranged from 2.5 to 15% (5, 6, 7, 8, 9, 10). Because reference standards for the MIB-1 staining or counting method are not yet available, it is difficult to compare the results yielded by different methods. If the same reagents, procedure, quality control, and counting method are applied, the staining and counting variability will be controlled, and applying a precise MIB-1 labeling index cutoff value such as 8, 10, 11, 12, or 15% to predict the prognosis of astrocytic tumors is then reasonable (5, 6, 7, 8, 9). Although the true value of the MIB-1 labeling index cannot be obtained, an MIB-1 labeling index with good precision can work in daily practice. If the staining reagents, procedure, quality control, and counting method are not the same, the cutoff value should be reset.