Introduction

The annual incidence of pancreatic ductal adenocarcinoma (PDAC) has increased worldwide, and PDAC is a major cause of cancer-related death in Europe and the United States1,2,3. In Japan, PDAC is the fifth and third leading cause of cancer-related death in men and women, respectively4. Despite improvements in diagnostics and therapeutics, the prognosis of PDAC remains dismal, with a 5-year overall survival (OS) rate of approximately 5%5,6. Because of the lack of effective screening methods and the aggressive biology of PDAC, it is typically diagnosed at an advanced stage when patients present with symptoms7. Only 15–20% of patients present with resectable disease, while 30% present with borderline resectable or locally advanced disease8. In addition, the 5-year OS rate is only 30%, even among those who underwent curative resection9. Thus, neoadjuvant therapy (NAC) is increasingly used to control local tumor spread and micrometastasis of PDAC. Recent studies have shown that NAC improves OS in patients with resectable and borderline resectable PDAC10,11,12,13,14. Furthermore, a Japanese phase III study demonstrated the significant survival benefits of gemcitabine and S1 after NAC for patients with resectable PDAC15. However, prognostic markers for PDAC resected after NAC are yet to be determined.

Pathology assessments of residual tumors and tumor regression may be useful to predict patient outcomes after post-neoadjuvant resections for PDAC. However, multiple systems are currently available to assess tumor regression, and each system has distinct criteria; thus, it is difficult to correlate grades between the systems. The most commonly used tumor regression grading systems worldwide are the College of American Pathologists (CAP) and Evans’ systems. Recently, new systems have been introduced by The University of Texas M.D. Anderson Cancer Center (MDA)12,16,17 and the Japan Pancreas Society (JPS) (Table 1)18. Both the Evans’ and JPS grading systems are specific for PDAC and are commonly used in Japan19,20. The CAP grading system is not specific for PDAC; it is also used for other cancers, including those of the colon, rectum, and bile duct, and is typically used in the United States21,22. The MDA system is similar to CAP but is three-tiered rather than four-tiered12,16. The Evans’ and JPS grading systems specify a percentage of tumor cell viability or destruction for each grade, whereas the CAP and MDA systems do not. Moreover, the JPS, CAP and MDA grading systems, but not Evans’, require estimating the tumor bed (considered to reflect treatment-related fibrosis secondary to tumor cell death) and then evaluating the proportion of residual tumor. Thus, it is difficult to compare the Evans’, JPS, CAP, and MDA grades (Table 1), except for complete responses (Evans’ IV, CAP 0, MDA 0, and JPS 4), which are easy to understand across all four grading systems.

Table 1 The five grading systems used to assess pancreatic tumor regression.

Another issue associated with the four systems is the ambiguity of some criteria. For instance, it is difficult to determine the viability of degenerative tumor cells. Further, it can be challenging to distinguish treatment-related necrosis (nonviable tumor cells) from tumor necrosis17,23. Furthermore, differentiating treatment-related fibrosis from cancer-associated fibrosis may be complex and subjective given that even treatment-naïve PDAC often exhibits prominent fibrosis—desmoplastic reaction and/or associated chronic pancreatitis23,24. Such difficulties in the interpretation may cause interobserver disagreement in the assessment of tumor regression.

Recently, we have reported the prognostic utility of measuring the largest area of residual tumor (ART) with a digital platform in pancreatic, gastric, lung, and rectal cancers25,26,27. ART is designed to be more objective by eliminating the process of estimating the original tumor area. Unfortunately, ART may not be practical because imaging software is needed to measure the residual tumor area. Therefore, a semi-quantitative grading system based on a number of microscopic fields equivalent to ART has been proposed25. Such a grading system appears to be more objective than the commonly used grading systems and can be applied in routine pathology practice. In the present study, we assessed and compared the reproducibility and prognostic performance of a modified ART grading system with those of the four grading systems using a multicenter cohort, in the hope of identifying the most clinically relevant grading system to assess tumor regression in post-neoadjuvant resections for PDAC.

Results

Clinical features of the study cohort

The study cohort consisted of 97 patients with PDAC (median age: 66 years, Supplementary Table 1). Prior to NAC, most (53%) cases were classified as borderline resectable, followed by resectable (31%), metastatic (9%), and locally advanced (7%). The neoadjuvant regimens were GS only in 55 patients (Supplementary Table 2) and GS with radiation in 42 patients (Supplementary Table 3). At the time of resection, most cases had stage I or II tumors with negative resection margins.

Histological changes following post-neoadjuvant treatment for PDAC

PDAC following NAC often showed the degeneration (Fig. 1A,B) and necrosis (Fig. 1C) of cancer cells. In this study, we defined non-viable tumor cells as those exhibiting pyknosis, karyorrhexis, karyolysis, or the disappearance of nuclei. If it was difficult to distinguish non-viable cells secondary to treatment from degenerative tumor cells secondary to cancer-related ischemic changes, we did not consider those as non-viable cells to avoid overestimating the treatment effects. As for the assessment of fibrosis, we simply evaluated a ratio of the residual tumor cells over the fibrous stroma for the CAP and JPS grading systems, as it was difficult to differentiate fibrosis secondary to NAC from pre-existing or cancer-related chronic pancreatitis. When a few tumor cells were scattered in the fibrous stroma, it was considered a moderate response based on the fraction of the residual tumor (Fig. 1D). Macrophage aggregates without cancer cells (Fig. 1E), vascular degeneration (Fig. 1F), and acellular mucous pools were also considered treatment effects, and we estimated the total tumor mass before NAC including the areas with those lesions in each case.

Figure 1

Histologic changes after neoadjuvant treatment for pancreatic ductal adenocarcinoma (PDAC). (A) Degenerative cancer cells and inflammatory cell infiltration. (B) Degenerative cancer cells in the fibrous tissue. (C) Necrotic cancer cells. (D) A few cancer cells in the fibrous tissue (major response). (E) Macrophage infiltration without cancer cells. (F) Degeneration of a vessel.

Concordance of tumor regression grading system for PDAC

Table 2 shows the agreement of assessments between the two observers using the five grading systems. The agreement was the highest with the MDA system (95.9%), compared to those with the Evans’, CAP, JPS and ART systems (58.8%, 72.2%, 55.2% and 76.3%, respectively). Among the Evans’, JPS and ART systems with five-tiered grading, ART showed the highest agreement. The interobserver concordance of the five grading systems was fair to substantial (kappa value: Evans’ 0.34, CAP 0.50, MDA 0.65, JPS 0.33 and ART 0.60), and the ART system had the highest value among the 3 five-tiered systems. For individual grades, agreements on Evans’ IIa (41.9%), CAP 1 (42.9%), MDA 1 (42.9%), JPS 2 (30%) and ART 2 (35.3%) were lower than those of the other grades.
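For readers who wish to reproduce this kind of agreement statistic from paired grades, the following is a minimal sketch of percent agreement and Cohen's kappa. It assumes unweighted kappa (the text does not state whether a weighted variant was used), and the function names and the toy grades are hypothetical illustrations, not the study data.

```python
from collections import Counter


def percent_agreement(grades_a, grades_b):
    """Percentage of cases in which both observers assigned the same grade."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two non-empty grade lists of equal length")
    same = sum(a == b for a, b in zip(grades_a, grades_b))
    return 100.0 * same / len(grades_a)


def cohen_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two observers (assumption: no weighting)."""
    n = len(grades_a)
    if n != len(grades_b) or n == 0:
        raise ValueError("need two non-empty grade lists of equal length")
    # Observed agreement: fraction of cases with identical grades.
    p_observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement: product of each observer's marginal grade frequencies.
    count_a, count_b = Counter(grades_a), Counter(grades_b)
    p_chance = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both observers used a single identical grade throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy example with made-up grades for four cases.
observer_1 = [0, 0, 1, 1]
observer_2 = [0, 1, 1, 1]
print(percent_agreement(observer_1, observer_2))  # 75.0
print(cohen_kappa(observer_1, observer_2))        # 0.5
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why a system with fewer tiers can show high percent agreement but only a moderately higher kappa.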

Table 2 Comparison of tumor regression grades between the two pathologists for the five systems.

Prognostic value of tumor regression grade for PDAC

The median follow-up of the entire study cohort was 20.7 months (range: 0.7–61.7 months). Complete response (Evans’ Grade IV, CAP Score 0, MDA 0, JPS Grade 4 and ART Grade 0) was seen in only one patient, who showed no recurrence or cancer-specific death. Upon stratifying the study cohort by tumor regression grades in each system, there was a trend toward correlation of lower regression grades with shorter OS and RFS (Supplementary Figures 1 and 2). However, the survival curves overlapped considerably in all systems, although the MDA and ART grading systems appeared to discriminate better than the other systems for both RFS and OS.

Figure 2

Assessment of the area of residual tumor (ART) scores. (A) score 4; (B) score 3; (C) score 2; (D) score 1. (E) Enlarged view of (D). Arrows indicate cancer cells. Cytokeratin 19 staining is shown in the inset. (F) There were two tumor foci at a distance ≥ 2 mm; thus, it was considered score 2. Black line, remnant tumor area; blue circle, estimated view with a ×4 objective lens.

Therefore, ROC analysis was performed to determine the best cut-off for high- vs. low-grade tumor regression to predict clinical outcomes in each grading system (Table 3). The analysis identified the cut-off between Grades I and IIa in Evans’, 2 and 3 in CAP, 1 and 2 in MDA, 1 and 2 in JPS and 3 and 4 in ART as the largest areas under the curves, confirming the optimal cut-off point of between 3 and 4 in ART, as proposed in a previous study25. Upon using the established cut-off level, univariate analysis showed that the high-grade regression group in the ART grading system had significantly longer OS and RFS than the low-grade regression group, while there was no significant difference in patient outcomes between low- and high-grade regression groups in the other grading systems (Figs. 3, 4 and Supplementary Table 4). However, high-grade regression based on the ART grading system did not remain as a predictor of favorable survival (P = 0.219 for OS and P = 0.253 for RFS) upon multivariate analysis. In this model, small vessel invasion and positive resection margin were associated with shorter OS (P = 0.040 and 0.015, respectively) and the male gender and adjuvant treatment with shorter RFS (P = 0.010 and 0.007, respectively). Furthermore, high-grade tumor regression in accordance with the ART grading system was associated with tumors located in the body and tail, preoperative diagnosis of metastatic disease, negative vascular invasion, negative perineural invasion, and lower pathologic stage in all patients (Table 4), chemotherapy-treated patients (Supplementary Table 2) and chemoradiotherapy-treated patients (Supplementary Table 3).

Figure 3

Overall survival after resection stratified by high- vs. low-grade regression. CAP, College of American Pathologists; MDA, the University of Texas M.D. Anderson Cancer Center; JPS, Japan Pancreas Society; ART, Area of Residual Tumor.

Figure 4

Recurrence free survival after resection stratified by high- vs. low-grade regression. CAP, College of American Pathologists; MDA, the University of Texas M.D. Anderson Cancer Center; JPS, Japan Pancreas Society; ART, Area of Residual Tumor.

Table 3 Adequate cut-off value to estimate clinical outcomes.
Table 4 Clinicopathological characteristics of high- and low-grade regression groups based on ART scores.

Discussion

Multiple previous studies have shown the efficacy of NAC for resectable, borderline resectable, and locally advanced PDAC28. Volume reduction by NAC has been reported to contribute to the increased number of curative resections with fewer complications and provide better clinical outcomes in PDAC. Pathological features of the tumor after NAC may serve as prognostic factors in these cases. For instance, marked fibrosis, perineural invasion, muscular vessel invasion, and tumor stage as determined by the American Joint Committee for Cancer have been associated with prognosis12,13,24,29. In addition, the extent of tumor regression after NAC has been reported as a predictor of clinical outcomes after resection20,30,31; thus, it is important to establish a pathology grading system to assess the extent of tumor regression that is clinically relevant and practical. Currently, there are multiple tumor regression grading systems available for post-neoadjuvant pancreatic resections, and few studies have compared the clinical relevance and practicality between those systems32. The present study is the first multicenter study to evaluate and compare the reproducibility and prognostic performance among multiple tumor regression grading systems. Of the grading systems evaluated in this study, ART, a new grading system that we had proposed, showed high interobserver concordance and a significant association with patient outcomes in the univariate analysis.

Marked tumor regression greater than Evans’ Grade IIb, in which > 50% of tumor cells are non-viable, has been reported to predict favorable outcomes after resection in patients with PDAC who had received preoperative chemoradiation therapy33. Similarly, CAP and MDA Grades 0 & 1 were associated with significantly more favorable patient outcomes than Grades 2 & 3 and Grade 2, respectively16,20. In the current study, high-grade tumor regression was associated with better prognosis in all grading systems evaluated, but only the ART grading system showed statistical significance, supporting its clinical relevance. This system was based on our prior study that used morphometry to measure ART and showed its significant association with patient outcomes25. We established a tumor regression grading system based on the number of 40× microscopic fields equivalent to ART, explored multiple cut-off values to identify the best cut-off, and confirmed the prognostic relevance of the ART grading system in this study, thereby translating our scientific evidence into pathological practice.

On a somber note, only one (1.0%) patient achieved complete response in this study, a lower rate than previously reported14,20. The lower response rate could be attributed to a study cohort comprising 80% borderline resectable or more advanced tumors, but it may also be explained by the difference in NAC regimens used. For instance, previous studies wherein most patients with PDAC were treated with chemoradiation reported complete response in 2.5–2.7% of patients14,20. In the current study, in which the vast majority of patients were treated with GS-based chemotherapies, major response was seen in 9.5% of patients who had also received radiation and in 5.5% of those who had received chemotherapy only (P = 0.141, data not shown). These results are consistent with the findings of a recent study in which preoperative chemoradiation therapies resulted in more prominent fibrosis and smaller ART than preoperative chemotherapies for rectal cancer26. Furthermore, in a study using preoperative FOLFIRINOX for PDAC, complete response was reported in 13% of the cohort patients34. Therefore, large-scale multi-cohort studies are warranted to assess the performance of various NAC regimens on tumor regression.

Reproducibility is a major problem in pathologic assessments for PDAC after NAC. Kalimuthu and colleagues have reported that the CAP, Evans’, and MDA tumor regression grading systems had suboptimal concordance among four gastrointestinal pathologists17. Similarly, the five systems evaluated in the current study showed fair to substantial concordance between the two observers. The concordance was particularly low with the Evans’ and JPS grading systems, which require estimating the degree of tumor cell degeneration, although we defined the viability of tumor cells as precisely as possible. Furthermore, using fibrosis as a surrogate for the pre-treatment tumor area to assess regression could be contentious given the complexity and subjectivity in differentiating fibrosis secondary to treatment from the fibrosis of chronic pancreatitis that is cancer-related and/or pre-existing17. While the MDA system, with its three-tiered grading that also involves the assessment of the tumor bed, achieved substantial concordance in this study, we believe that assessing tumor regression based on fibrosis remains controversial. Conversely, the ART grading system does not require the pathologist to estimate the tumor area before NAC and is much simpler than the other systems, leading to better reproducibility than most of the commonly used grading systems. We have now shown that the ART grading system has not only prognostic relevance but also good reproducibility; thus, it may be the most practical system for the assessment of tumor regression in post-neoadjuvant resections for PDAC.

The present study has several limitations. First of all, interobserver concordance was assessed by only two observers. More importantly, the study cohort was relatively small, and most patients were treated with GS-based chemotherapy with or without radiation. Given the efficacy of FOLFIRINOX34, an increasing number of patients are being treated with the regimen; thus, we have planned a validation study to evaluate the reproducibility and prognostic utility of the ART grading system by a larger number of observers in larger cohorts that include patients treated with FOLFIRINOX. In addition, multivariate analysis failed to confirm the prognostic significance of ART. It may be attributed in part to the small cohort size, but it also indicates the limitation of prognostic prediction based solely on a single pathological parameter. A comprehensive prediction model with multiple pathological and clinical variables may be more useful to accurately predict and stratify patient outcomes35.

In conclusion, the commonly used tumor regression grading systems showed no bearing on patient outcomes after post-neoadjuvant resections for PDAC with fair to substantial interobserver concordance, while the ART grading system that was designed to be simple and more objective has achieved good reproducibility and showed a prognostic value. Although additional studies are warranted to further evaluate the clinical utility of the ART grading system in larger cohorts treated with various neoadjuvant regimens, we believe that it has the potential to become the standard grading system to assess tumor regression in post-neoadjuvant resections for PDAC.

Materials and methods

Study cohort

The study cohort consisted of 97 patients with PDAC who had undergone post-neoadjuvant pancreatectomy at the National Cancer Center Hospital East, Juntendo University, Tokai University School of Medicine, or Tokyo Medical University between 2013 and 2017. All patients received gemcitabine and S-1 (GS)-based neoadjuvant chemotherapies with or without radiation (Supplementary Tables 1, 2, and 3). The resected pancreas was routinely fixed with formalin and sectioned every 5 mm vertical to the main pancreatic duct. All sections from the entire pancreatic specimen and lymph nodes were processed in paraffin-embedded tissue blocks (mean, 32; range, 11 to 66). All tissue blocks were sectioned, stained with hematoxylin and eosin, and microscopically evaluated. The pathological diagnosis for each case was assigned by a gastrointestinal pathologist in accordance with the 2019 WHO Classification of Tumours of the Digestive System36,37. Only patients with conventional PDAC were included in the study cohort, while those with invasive carcinomas arising in association with intraductal papillary mucinous neoplasm or mucinous cystic neoplasm, acinar cell carcinoma or neuroendocrine carcinomas were excluded.

This study was conducted in accordance with the principles embodied in the 2008 Declaration of Helsinki and was approved by the ethics committees of Tokyo Metropolitan Geriatric Hospital (permit #16-47), National Cancer Center Hospital East (#2017-358), Juntendo University (#19-056), Tokai University School of Medicine (#16R273), and Tokyo Medical University (#T2018-0001). Informed written consent to use the tissues was obtained from all patients.

Evaluation of tumor regression by Evans’, CAP, MDA, JPS and ART grading systems

Two observers (Y.M. and M.M.-K.) individually reviewed all histology sections from each case to assess the residual tumor using the following grading systems: (1) Evans’, which evaluates the fraction of necrotic cells among the residual cancer cells; (2) CAP, which evaluates the amount of residual tumor in correlation with fibrosis; (3) MDA, which is a modified CAP system with three-tiered grading16,17; (4) JPS, which combines the CAP and Evans’ systems; and (5) the ART grading system (Table 1). The ART system was proposed on the basis of our previous report on ART26. We microscopically evaluated the area of residual tumor (black line, Fig. 2A–D) in the slice (cross section) that had the most abundant residual tumor, and then graded tumor regression according to the number of microscopic fields (blue circle) with a 4× objective lens (UPlanSApo 4×, Olympus, Tokyo, Japan; estimated surface area of 23.75 mm² at 40× magnification) that collectively cover the largest ART as follows: Score 0, no remaining viable cancer cells; Score 1, ≤ 1 field; Score 2, > 1 and ≤ 2 fields; Score 3, > 2 and ≤ 3 fields; Score 4, > 3 fields (Table 1). When it was difficult to identify the cancer cells with a 4× objective lens, the specimen was evaluated at higher magnifications. After mapping the ART at higher magnification, we evaluated the ART score with a 4× objective lens. One case required cytokeratin 19 staining to identify the small number of remnant cancer cells (Fig. 2E, arrows and inset indicate cytokeratin 19-positive cancer cells). When multiple residual tumor foci were identified in sections made from the slice and were at least 2 mm apart, we evaluated the individual foci and summed the numbers of microscopic fields (Fig. 2F, score 2). When multiple small foci were present close to each other (within 2 mm), they were considered to form one single ART. Carcinoma in situ, acellular mucin (Fig. 2F), and lymph node metastasis were excluded from the assessment.
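The ART scoring rule described above can be written as a short sketch. The function and parameter names are hypothetical. The input is assumed to be the number of 4× objective fields needed to cover each residual tumor focus in the selected slice, with foci lying within 2 mm of each other already merged into one focus by the pathologist. The high- versus low-grade split uses the cut-off between scores 3 and 4 reported in the Results.

```python
def art_score(fields_per_focus):
    """Return the ART score (0-4) from 4x-objective field counts.

    fields_per_focus: one positive number per residual tumor focus; foci
    within 2 mm of each other are assumed to have been merged beforehand.
    An empty list means no remaining viable cancer cells (Score 0).
    """
    if not fields_per_focus:
        return 0
    if any(count <= 0 for count in fields_per_focus):
        raise ValueError("each focus must cover a positive number of fields")
    # Foci at least 2 mm apart are scored by summing their field counts.
    total_fields = sum(fields_per_focus)
    if total_fields <= 1:
        return 1
    if total_fields <= 2:
        return 2
    if total_fields <= 3:
        return 3
    return 4


def is_high_grade_regression(score):
    """High-grade regression = ART score 0-3 (cut-off between 3 and 4)."""
    return score <= 3


# Two separate foci, each covered by one field, sum to two fields.
print(art_score([1, 1]))  # 2
```

Because the score depends only on a field count at a fixed objective, it avoids any estimate of the pre-treatment tumor bed.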

When the two pathologists recorded the same grade, it was used as the final grade. If the two pathologists recorded different grades, a third pathologist (M.K.) reviewed the case to determine the majority grade as the final grade. All observers were surgical pathologists with expertise in the field of PDAC and were blinded to clinical information, the original pathology diagnosis, and the other reviewers’ grades. Interobserver concordance between the two observers (Y.M. and M.M.-K.) was compared among the five systems. Recurrence-free survival (RFS) and OS were correlated with the final grades in each system.
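The consensus rule above can be sketched as follows. The function name is hypothetical, and the handling of a case in which all three reviewers disagree is an assumption, since the text does not describe it; here such a case is returned as unresolved.

```python
from collections import Counter


def final_grade(grade_1, grade_2, grade_3=None):
    """Return the final grade for one case under the majority rule.

    grade_1, grade_2: grades from the two primary observers.
    grade_3: grade from the third pathologist, needed only on disagreement.
    Returns None when no two reviewers agree (assumption: left unresolved).
    """
    if grade_1 == grade_2:
        return grade_1
    if grade_3 is None:
        raise ValueError("observers disagree; a third review is required")
    grade, votes = Counter([grade_1, grade_2, grade_3]).most_common(1)[0]
    return grade if votes >= 2 else None


print(final_grade("IIa", "IIa"))         # IIa
print(final_grade("IIa", "IIb", "IIb"))  # IIb
```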

Statistical analysis

Concordance between the two observers was assessed using Cohen's kappa coefficient. Receiver operating characteristic (ROC) curves were analyzed to determine the best cut-off value for each grading system. OS and RFS were analyzed based on Kaplan–Meier survival estimates. Significant survival-related factors according to univariate analysis (P < 0.05) were entered in a multivariate Cox proportional-hazards model. Clinicopathological characteristics were analyzed using the chi-square test. P < 0.05 was considered to indicate significance in all analyses. Statistical analyses were performed using the StatView J version 5.0 software package (SAS Institute, Inc., Cary, NC, USA) and SPSS version 22 (IBM Corp., New York, NY, USA).
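One way to picture the ROC-based cut-off selection is sketched below. It is an illustration under stated assumptions, not the exact procedure run in StatView or SPSS: each candidate cut-off dichotomizes the ordinal grade, the area under the resulting single-point ROC curve is (sensitivity + specificity) / 2, and the cut-off with the largest area is kept. The outcome is treated as a binary event without censoring, higher grade values are assumed to indicate worse outcome (as for ART), and all names and toy data are hypothetical.

```python
def dichotomized_auc(grades, events, threshold):
    """AUC of the binary predictor 'grade >= threshold' for a binary event."""
    pairs = list(zip(grades, events))
    tp = sum(1 for g, e in pairs if e and g >= threshold)
    fn = sum(1 for g, e in pairs if e and g < threshold)
    tn = sum(1 for g, e in pairs if not e and g < threshold)
    fp = sum(1 for g, e in pairs if not e and g >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one non-event")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # A single operating point gives a two-segment ROC curve with this area.
    return (sensitivity + specificity) / 2


def best_cutoff(grades, events):
    """Return the threshold t (split: grade < t vs. grade >= t) with max AUC."""
    levels = sorted(set(grades))
    if len(levels) < 2:
        raise ValueError("need at least two distinct grades")
    candidates = levels[1:]  # one candidate split between adjacent grades
    return max(candidates, key=lambda t: dichotomized_auc(grades, events, t))


# Toy example with made-up data: events occur only at grade 4.
toy_grades = [1, 2, 3, 4, 4, 4]
toy_events = [0, 0, 0, 1, 1, 1]
print(best_cutoff(toy_grades, toy_events))  # 4, i.e. split between 3 and 4
```

For systems in which a higher grade means more regression (for example Evans’ or JPS), the grades would need to be reverse-coded before applying this sketch.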