Introduction

Endoscopic treatment for colorectal cancer confined to the mucosa (Tis) and invading the submucosa (T1) has increased over the last few decades [1, 2]. Because T1 colorectal cancer can metastasize to lymph nodes with an incidence ranging from 6.3% to 14.3% [3,4,5,6,7], strict histologic evaluation of endoscopically resected specimens is necessary for deciding further treatment strategies. Several histologic factors have been proposed for evaluating the risk of lymph node metastasis of T1 colorectal cancer, with positive lymphatic invasion, positive venous invasion, positive poorly differentiated clusters, high-grade tumor budding, and deep (≥1000 µm) submucosal invasion as the major risk factors of lymph node metastasis [4, 6,7,8,9].

Tumor budding is a well-known prognostic indicator in early and advanced colorectal cancer. Several studies, most of which used hematoxylin and eosin staining for evaluation, have reported the importance of tumor budding [7, 8, 10,11,12,13]. Although tumor budding can be evaluated by hematoxylin and eosin staining in some cases, a prominent inflammatory reaction can make the buds difficult to observe. Cytokeratin immunohistochemistry can also be used to visualize single cancer cells or buds and may allow for more accurate identification of tumor budding than hematoxylin and eosin staining. Although some studies have investigated the cut-off values between low- and high-grade tumor budding in advanced colorectal cancer using immunohistochemistry [14, 15], the optimal cut-off value for stratifying T1 colorectal cancer has not yet been determined.

In clinical practice, inter-observer reproducibility is another important issue to be considered for efficient prediction of lymph node metastasis. Previous studies compared inter-observer variability in budding counts on hematoxylin and eosin- and cytokeratin immunohistochemistry-stained specimens, and the degree of inter-observer agreement as calculated by kappa coefficients was consistently higher for cytokeratin immunohistochemistry than for hematoxylin and eosin staining [16, 17]. Although cytokeratin immunohistochemistry seems to be an ideal method for evaluating tumor budding, both in-depth histologic instructions and consensus formation between pathologists are needed for highly reproducible assessment. Most recently, Jepsen et al. reported that tumor budding evaluation with digital image analysis was more reproducible than the conventional method in pT1 and pT2 colorectal cancer [18]. In this context, computer-assisted budding assessment is a potential method for resolving these issues.

Here, we conducted a retrospective study to determine the optimal cut-off value between low- and high-grade tumor budding evaluated with cytokeratin immunohistochemistry for stratifying T1 colorectal cancer. In addition, we developed a computer-assisted semiautomatic evaluation method and compared the efficacy of each predictive factor between the manual and semiautomatic methods.

Materials and methods

Case recruitment and sample preparation

Five hundred and twenty consecutive T1 colorectal cancer cases from 517 patients, resected at The Cancer Institute Hospital between 2005 and 2012, were included in the case review. We included both endoscopically (n = 171) and surgically (n = 349) resected cases to avoid selection bias when determining the actual incidence of lymph node metastasis. Among the patients treated with endoscopic resection, 96 patients (56.1%) underwent additional intestinal resection with lymph node dissection, and 74 patients (43.9%) were followed up by periodic colonoscopy and/or abdominal computed tomography scanning. Serial 2-mm to 5-mm thick tissue sections of the whole lesion were cut from resected specimens fixed with 20% buffered formalin and embedded in paraffin, and then 3-μm thick sections were prepared for staining. Each section was stained with hematoxylin and eosin. After the initial assessment, histologic evaluation could not be completed in 57 cases because of the sample condition (piecemeal resection, etc.), and 463 cases were ultimately included in the study. The cases were split into a derivation dataset (n = 318, between 2005 and 2010) and a validation dataset (n = 145, 2011 and 2012). All the cases in this study were initially diagnosed by two pathologists (including N.Y.) and then later reviewed by another pathologist (M.T.). Any changes from the initial diagnosis are described in Supplementary Table 1. This study was performed in accordance with the ethics committee of The Cancer Institute Hospital, Japanese Foundation for Cancer Research (Number: 2012-1052).

Histologic assessment and special staining

One representative section from each case was selected for evaluation of the following factors: tumor depth, lymphatic invasion, venous invasion, poorly differentiated clusters, and tumor budding. Venous invasion was confirmed by hematoxylin and eosin with Victoria-blue staining, which highlights elastic fibers. The depth of submucosal invasion was defined according to the Japanese Society for Cancer of the Colon and Rectum criteria [19]. The poorly differentiated clusters were defined as cancer clusters comprising five or more cells and lacking glandular formation, and regarded as positive if the size was at least 0.24 mm² using a ×40 objective lens [9]. We evaluated tumor budding using cytokeratin immunohistochemistry (primary antibody: mouse monoclonal anti-cytokeratin AE1/AE3 antibody, clones: AE1 and AE3, in cocktail, Leica Biosystems, Newcastle, UK). We also evaluated the microsatellite instability status. As the majority of sporadic microsatellite instability-high colorectal cancers have lost the expression of MLH1 [20], we evaluated MLH1 expression by immunohistochemistry (primary antibody: mouse monoclonal anti-MLH1 antibody, clone: ES05, Leica Biosystems). The immunohistochemical staining procedures were performed by an auto-stainer (Leica Bond-III, Leica Biosystems). The definition of the tumor budding for manual evaluation was the same as in previous reports [7, 8, 12], which is currently described in the Japanese Society for Cancer of the Colon and Rectum criteria [19]. In brief, tumor budding is a cancer cell or a cluster comprising one to four cells in the invasive frontal region. Pathologists chose the microscopic field using a ×20 objective lens (equal to 0.95 mm²) containing the most frequent tumor budding, and counted the number of foci.

Computer-assisted semiautomatic tumor budding detection

Automatic detection of tumor budding foci was carried out on micrographs of cytokeratin immunohistochemistry-stained sections. Pathologists selected one field containing the most frequent tumor budding focus (0.95 mm²), the same area as used for the manual evaluation, and obtained a micrograph of each case. In addition, we considered a different field size (0.785 mm²), according to a previous report [21]. A binary image of each micrograph was generated, and the automatic budding selection was performed according to the size and circularity of the foci. We measured the size (µm²) of a total of 500 randomly selected cancer clusters (1–5 cells) and non-budding cytokeratin-positive fragments (0 cells) to determine thresholds for the automatic selection of budding foci. We also measured the circularity of a total of 100 cytokeratin-positive objects to determine the thresholds. After image processing, manual corrections were made to eliminate inappropriately selected foci, namely intra-glandular, intra-mucin, glandular, and cytokeratin-negative objects. We defined this evaluation workflow as the semiautomatic method. We used Image J software (National Institutes of Health, Bethesda, MD, USA) for image analysis [22]. The workflow from a cytokeratin image to an automatic counting result was executed by macro programming of Image J (Fig. 1a). A macro is a series of automated Image J commands that can be run from the macro editor of Image J. The programming codes are shown in Supplementary Table 2.
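The size and circularity filter described above can be illustrated with a minimal Python sketch. This is not the authors' Image J macro (which is given in Supplementary Table 2); it only shows the selection logic under stated assumptions. The thresholds are the values reported in the Results (size greater than 100 µm² and less than 480 µm², circularity 0.45–1.00), circularity is assumed to follow the usual definition 4π × area / perimeter², and the function names and the (area, perimeter) input format are hypothetical.

```python
import math

# Thresholds taken from the values reported in the text.
MIN_AREA_UM2 = 100.0      # objects must be larger than this (µm²)
MAX_AREA_UM2 = 480.0      # objects must be smaller than this (µm²)
MIN_CIRCULARITY = 0.45    # lower bound of the accepted circularity range


def circularity(area, perimeter):
    """Assumed shape descriptor: 4*pi*area / perimeter**2, capped at 1.0."""
    if perimeter <= 0:
        return 0.0
    return min(1.0, 4.0 * math.pi * area / perimeter ** 2)


def is_candidate_bud(area_um2, perimeter_um):
    """Return True when an object passes both the size and circularity filters."""
    if not (MIN_AREA_UM2 < area_um2 < MAX_AREA_UM2):
        return False
    return circularity(area_um2, perimeter_um) >= MIN_CIRCULARITY


def count_candidate_buds(objects):
    """Count (area, perimeter) pairs that pass the filters.

    The result is only the automatic count; manual correction of
    inappropriately selected foci would still follow.
    """
    return sum(1 for area, perim in objects if is_candidate_bud(area, perim))
```

A round object with a radius of 9 µm (area about 254 µm²) passes both filters, whereas an elongated object of similar area but a long perimeter is rejected by the circularity criterion.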

Fig. 1

Workflow of semiautomatic tumor budding selection with Image J software. a Manual and automatic procedures. b Original image of cytokeratin immunohistochemistry. c Binary image of cytokeratin immunohistochemistry. d Automatically selected budding foci (blue) by size from 100 µm² to 480 µm². e Automatically selected tumor budding foci by size and circularity of 0.45–1.00. Note that some of the foci in (d) are excluded in (e) by low circularity (magnified images shown in the bottom right corner), and some inappropriate foci remain (red objects indicated by red arrows [magnified image shown in the upper right corner]). f Size of cytokeratin-positive objects including 1–5 cancer cells. Small objects not fulfilling the criteria of tumor budding are regarded as 0 cells. Gray zone, from 100 µm² to 480 µm², was adopted as tumor budding. g Circularity of true tumor budding and other foci. Gray zone, from 0.45 to 1.00, was adopted as tumor budding. Scale bars are 100 µm

Inter-observer agreement study

Fifty consecutive cases, between December 2007 and August 2008, were selected to evaluate the degree of inter-observer reproducibility. Three pathologists (M.T., H.K., and T.M.) evaluated the cases in this study. Budding was graded as low or high according to the optimal cut-off value for cytokeratin immunohistochemistry determined in this study. Prior to evaluation, the pathologists discussed some adoption criteria for budding foci using 10 representative cases that were not part of the 50 consecutive cases. Similarly, the three pathologists and two non-pathologists (M.K.: researcher, and Y.T.: technician) who participated in this study discussed the criteria of the semiautomatic method for 10 representative cases.

Statistical analysis

The primary endpoint of this study was lymph node metastasis. In cases without surgical treatment, 5-year disease-free survival was regarded as no lymph node metastases. For the derivation dataset, cut-off values of low- or high-grade budding for both the manual and semiautomatic methods were determined by receiver operating characteristic (ROC) curves. The balanced error rate of the ROC curves was considered a candidate value, and the optimal cut-off value was determined based on the odds ratio for lymph node metastasis. The area under the ROC curve (AUC) was considered the predictive ability of the variables, which was ranked as follows: ≤0.70: low, 0.70–0.90: moderate, and ≥0.90: high. In the inter-observer agreement study, we used kappa statistics, which can be interpreted as follows: 0.41–0.60, moderate; 0.61–0.80, substantial; and >0.80, almost perfect, according to a previous study [8].
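As an illustration of the candidate cut-off step, the following Python sketch computes the balanced error rate at each possible threshold and the AUC of the budding count. It is a minimal sketch under assumptions, not the R code used in the study: the balanced error rate is assumed to be 1 − (sensitivity + specificity) / 2, the AUC is computed with the rank-based (Mann-Whitney) formulation, and all function names are hypothetical. The subsequent choice of the optimal value based on the odds ratio is not reproduced here.

```python
def sensitivity_specificity(counts, outcomes, cutoff):
    """Sensitivity and specificity when 'count >= cutoff' is called high grade."""
    pairs = list(zip(counts, outcomes))
    tp = sum(1 for c, y in pairs if c >= cutoff and y)
    fn = sum(1 for c, y in pairs if c < cutoff and y)
    tn = sum(1 for c, y in pairs if c < cutoff and not y)
    fp = sum(1 for c, y in pairs if c >= cutoff and not y)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def balanced_error_rate(counts, outcomes, cutoff):
    """Assumed definition: 1 - (sensitivity + specificity) / 2."""
    sens, spec = sensitivity_specificity(counts, outcomes, cutoff)
    return 1.0 - (sens + spec) / 2.0


def candidate_cutoff(counts, outcomes):
    """Observed count with the lowest balanced error rate (smallest on ties)."""
    cutoffs = sorted(set(counts))
    return min(cutoffs, key=lambda c: balanced_error_rate(counts, outcomes, c))


def auc(counts, outcomes):
    """AUC as the probability that a metastasis-positive case has a higher
    count than a negative case, with ties counted as one half."""
    pos = [c for c, y in zip(counts, outcomes) if y]
    neg = [c for c, y in zip(counts, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Here `counts` would hold the budding count per case and `outcomes` the lymph node metastasis status (True or False); both are placeholder inputs for illustration.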

Univariate and multivariate analyses were conducted for the derivation dataset. In the univariate analyses, the qualitative factors were analyzed with Fisher’s exact test, and the quantitative factors were analyzed using unpaired Student’s t-test with Welch’s correction. Multivariate logistic regression was used to calculate the odds ratios and 95% confidence intervals after simultaneously controlling for potential confounders. Selection of the variables was determined using a stepwise forward selection method, for which p < 0.10 was considered statistically significant. We analyzed two regression models with the manual and semiautomatic methods, and compared the models using the Akaike information criterion. For the validation dataset, we compared the two methods with calibration plots for predicted and observed frequencies of lymph node metastasis. Correlation of the frequencies was calculated by Pearson’s correlation coefficient. We developed a scoring system for the validation dataset depending on the odds ratios of the multivariate analysis, and calculated the scoring points using the approximate ratios. We conducted all analyses using R version 3.2.4 [23].

Results

Semiautomatic tumor budding selection

Representative figures show the workflow of automatic selection, including three inappropriately selected foci eliminated by manual correction (Fig. 1b, c, d and e). The size thresholds to select the budding/sprouting foci comprising one to four cells were >100 µm² and <480 µm², with an average size of 253.1 µm² (Fig. 1f). The sensitivity and specificity for selecting the foci were approximately 90% for both the upper and lower thresholds. Because there were two candidate thresholds of circularity, 0.35–1.00 and 0.45–1.00, we compared both thresholds using ROC curves. The AUCs were consistently better with manual correction than without it, and almost the same AUCs were observed between 0.35–1.00 and 0.45–1.00 (Supplementary Fig. 1a). The latter circularity range showed significantly fewer inappropriately selected foci (Supplementary Fig. 1b), and therefore we applied the threshold of 0.45–1.00 for the semiautomatic evaluation method (Fig. 1g).

Cut-off value of tumor budding on cytokeratin immunohistochemistry

Representative figures of tumor budding in the invasive front of T1 colorectal cancer are shown in Fig. 2. The average tumor budding counts on cytokeratin immunohistochemistry-stained sections using the manual and semiautomatic methods were 5.9 ± 6.4 and 9.1 ± 12.6 (±SD), respectively. The average number of manually corrected foci was 1.84 ± 2.82 (±SD) (Supplementary Fig. 1b). The AUCs of the manual and semiautomatic methods were 0.759 and 0.731, respectively (Fig. 3). The optimal cut-off values for high-grade budding using the manual and semiautomatic methods were ≥10 and ≥12, respectively. In addition, the same cut-off value, ≥12, was obtained using the semiautomatic method with a 0.785-mm² field size, and the average budding count was 8.4 ± 11.3 (±SD; Supplementary Fig. 1c). The lymph node metastasis rates for each number of buds are shown in Supplementary Fig. 2. These results indicate that both methods showed moderate accuracy for metastatic prediction, and the manual method was slightly better.

Fig. 2

Representative figures of the invasive front of submucosal invasive colorectal cancer stained with a hematoxylin and eosin and b cytokeratin immunohistochemistry. The tumor budding count was 5 in hematoxylin and eosin staining, and 14 in cytokeratin immunohistochemistry-stained sections. c Automatic selection of tumor budding, showing 13 identified clusters. Scale bars are 100 µm

Fig. 3

Receiver operating characteristic (ROC) curves for determining the cut-off values of high-grade tumor budding. Areas under the ROC curves analyzed by manual and semiautomatic methods are 0.759 and 0.731, respectively

Univariate and multivariate analyses of the risk factors for lymph node metastasis

Table 1 summarizes the results of the univariate analyses. Each of the following four factors had a significant influence (p < 0.01) on lymph node metastasis: positive lymphatic invasion, positive venous invasion, positive poorly differentiated clusters, and high-grade tumor budding assessed using either the manual or semiautomatic method. In multivariate analyses, high-grade tumor budding evaluated using either the manual or the semiautomatic method was an independent risk factor for lymph node metastasis (Table 2). Positive lymphatic invasion and positive venous invasion were also independent risk factors (Table 2). Positive poorly differentiated clusters were not statistically significant. The Akaike information criterion values were 158.0 and 163.7 for the manual and semiautomatic methods, respectively. These results indicate that both the manual and semiautomatic methods can be used as predictive indicators for lymph node metastasis, and that the regression model with the manual method fit slightly better.

Table 1 Univariate analysis of features for lymph node metastasis
Table 2 Multivariate analysis of the four parameters using a logistic regression model

Inter-observer agreement was significantly improved with the semiautomatic method

Among the three pathologists, the average kappa coefficients evaluated with the manual and semiautomatic methods were 0.463 and 0.781, respectively, and the difference was statistically significant (Fig. 4). Furthermore, among pathologists and non-pathologists, the coefficient with the semiautomatic method was 0.694, indicating substantial inter-observer agreement, which was significantly higher than that obtained with the manual method (Fig. 4). These results indicate that the semiautomatic method can provide reliable evaluative quality, not only by pathologists but also by non-pathologists.
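For readers unfamiliar with the statistic, the sketch below shows one way an average kappa coefficient for several observers could be computed in Python. The text does not state which kappa variant was used, so this is an assumption: Cohen's kappa is computed for every pair of observers grading the same cases as low or high, and the pairwise values are averaged. The function names are hypothetical.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two observers' labels over the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Proportion of cases on which the two observers agree.
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Agreement expected by chance from each observer's label frequencies.
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        return 1.0  # both observers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all pairs of observers (assumed averaging)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

With three observers, `ratings` would be a list of three equally long lists of "low" and "high" grades, giving three pairwise comparisons.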

Fig. 4

Average kappa coefficient in the inter-observer agreement study. Among three pathologists, the average kappa coefficient was significantly higher with the semiautomatic method than with the manual method. Similarly, among three pathologists and two non-pathologists, the coefficient was significantly higher with the semiautomatic method than with the manual method. *p < 0.05 by Student’s t-test

Excellent lymph node metastasis prediction for the validation dataset

In the validation dataset, the correlation coefficients between predicted and observed lymph node metastases assessed by the models with manual and semiautomatic methods were 0.908 and 0.929, respectively. Calibration plots for regression models using both methods showed excellent correlation between predicted and observed metastatic frequencies (Fig. 5). We determined the scoring points based on the results of multivariate analysis, as follows: 3 points for lymphatic invasion, 2 points for venous invasion, 3 points for high-grade tumor budding with the manual method, and 2 points for high-grade budding with the semiautomatic method. As a result, similar lymph node metastasis rates were obtained by both methods (Table 3). These results indicate that both manual and semiautomatic methods can effectively predict lymph node metastasis, and the model using the semiautomatic method was slightly better.
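The scoring points listed above can be written as a short Python sketch. The point values are those stated in the text; the dictionary keys, function name, and input format are hypothetical and chosen only for illustration. High-grade budding is assumed to have been determined beforehand using the cut-off values reported earlier (≥10 for the manual method and ≥12 for the semiautomatic method).

```python
# Points per risk factor, as stated in the text.
POINTS_MANUAL = {
    "lymphatic_invasion": 3,
    "venous_invasion": 2,
    "high_grade_budding": 3,
}
POINTS_SEMIAUTOMATIC = {
    "lymphatic_invasion": 3,
    "venous_invasion": 2,
    "high_grade_budding": 2,
}


def risk_score(findings, points):
    """Sum the points of every risk factor recorded as present in `findings`.

    `findings` maps factor names to booleans; missing factors count as absent.
    """
    return sum(p for factor, p in points.items() if findings.get(factor, False))
```

For example, a case with positive lymphatic invasion and high-grade budding, but no venous invasion, would score 6 points with the manual method and 5 points with the semiautomatic method.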

Fig. 5

Calibration plots for the validation dataset. Each dot represents lymph node metastasis frequency of cases with a different combination of risk factors. Both models with manual and semiautomatic methods showed excellent prediction for lymph node metastasis

Table 3 Predictive simulation for the validation dataset

Discussion

In the present study, we determined the optimal cut-off value for high-grade tumor budding for T1 colorectal cancer on cytokeratin immunohistochemistry-stained sections, and established a computer-assisted semiautomatic evaluation method with high reproducibility. Conventional histologic evaluation of tumor budding by hematoxylin and eosin staining is well investigated. The most widely accepted definition of tumor budding is a cell cluster comprising fewer than five cells [7, 8, 12]. The Japanese Society for Cancer of the Colon and Rectum guideline for T1 colorectal cancer defines high-grade budding (Grades 2 and 3) as 5 or more foci in the field of a ×20 objective lens [19]. According to this guideline, the grade of tumor budding should be described in routine pathology reports for T1 colorectal cancer, but the recommended evaluation method (e.g., hematoxylin and eosin staining only) is not described. On the other hand, the International Tumor Budding Consensus Conference group strongly recommended that tumor budding should be evaluated on hematoxylin and eosin, since the majority of the studies were based on hematoxylin and eosin staining [21]. Some studies have explored the importance of tumor budding evaluated by cytokeratin immunohistochemistry. Ohtsuki et al. reported that a significantly higher budding count was obtained by cytokeratin immunohistochemistry than by hematoxylin and eosin staining in advanced colorectal cancer, and encouraged the use of immunohistochemistry for more accurate prediction of disease-free survival [24]. In daily clinical practice, cytokeratin immunohistochemistry is performed for T1 colorectal cancer with obscure budding foci, especially in cases with prominent inflammation to avoid misidentifying some plump nuclei of fibroblasts or endothelial cells as budding [25]. 
Cytokeratin immunohistochemistry is helpful for distinguishing true tumor budding from other cells, but an inaccurately high budding count may lead to overestimation of the budding grade and unnecessary additional surgery. In this study, we determined the optimal cut-off value for cytokeratin immunohistochemistry by manual evaluation as ≥10, which was twice as high as the value for evaluation with hematoxylin and eosin staining. Our results clearly indicate that different cut-off values should be applied to cases evaluated with cytokeratin immunohistochemistry in T1 colorectal cancer.

Inter-observer agreement for tumor budding related to cytokeratin immunohistochemistry was described in previous studies. Kai et al. described the usefulness of immunohistochemistry for unskilled pathologists to improve the agreement compared with hematoxylin and eosin staining [17]. A recent report found that immunohistochemical budding evaluation is not an independent risk factor for lymph node metastasis of T1 colorectal cancer, and the cut-off value of high-grade tumor budding on cytokeratin immunohistochemistry-stained sections was 8 [26], which substantially differs from our result. This discrepancy may be due to differences among pathologists in the interpretation of cytokeratin-positive foci. These facts indicate that sufficient inter-observer consensus formation is necessary before clinical application of the manual methods. The novel semiautomatic method proposed in this study can provide a highly reproducible budding evaluation with only 10 cases for substantial consensus formation, not only by pathologists but also by non-pathologists. This method provides better reproducibility than conventional methods, requiring only cytokeratin immunohistochemistry and a computer with Image J software, which is universally available.

Our semiautomatic method requires only a single micrograph and a simple computer operation with manual corrections, which makes it a better tool for evaluating tumor budding than the conventional method. The method takes only a few minutes to perform, and can thus be readily used in daily diagnostic practice with the freely available Image J program and preinstalled macros. A fully automatic method, however, is preferred to completely avoid investigator error. In our semiautomatic method, the computer software selected some inappropriate objects, which resulted in false-positive clusters. In addition, a single cancer cell smaller than 100 µm² was neglected in the semiautomatic selection, which resulted in the elimination of true tumor budding and was considered a false-negative focus. Similarly, tumor clusters over 480 µm² in size were also neglected. Because of these misclassified objects, the predictive value might not be superior to that of manual evaluation. A previous study reported similar problems with false-negative and false-positive objects, which required manual adjustment for efficient prediction of lymph node metastasis [18]. We used freely available Image J software and adopted only two parameters, size and circularity, for selecting tumor budding foci; hence, the method is simple and has universal applicability. Further development of the computer software is needed for full automation.

In summary, a proper cut-off value should be applied for evaluating tumor budding of T1 colorectal cancer with cytokeratin immunohistochemistry. The semiautomatic method proposed in this study may be a useful alternative for highly reproducible tumor budding evaluation in clinical practice.