Introduction

Neoadjuvant therapy is increasingly used for the treatment of patients with resectable, borderline resectable, and locally advanced pancreatic ductal adenocarcinoma (PDAC), as it may improve the margin-negative resection (R0) rate, disease-free survival (DFS), and overall survival (OS) [1,2,3,4,5]. Histologic examination of pancreatic cancer resection specimens following neoadjuvant therapy offers the opportunity to assess the effect of treatment at the tissue and cellular levels. This histologic assessment can serve two purposes. First, in the context of randomized controlled trials, comparison of the histologic tumor response allows evaluation of the effectiveness of different neoadjuvant treatment regimens. Second, for the individual patient, the histologic tumor response reflects the sensitivity or resistance of the patient’s cancer cells to the neoadjuvant treatment, which may in turn help with the selection of adjuvant treatment. For both purposes, tumor response scoring (TRS) systems must be reproducible and must correlate with patient outcome.

Several TRS systems for PDAC following neoadjuvant therapy have been proposed [6,7,8,9,10,11,12,13,14,15,16,17]. However, there is no international agreement on which system represents best practice. The difficulties in reaching consensus regarding the ideal TRS system are multifold. First, the extent of tissue sampling, which is key to compensating for intratumor heterogeneity of the tumor response (as detailed in Fig. 1), is often not specified but probably varies among published studies. Second, interobserver agreement is poor, which is probably related to the difficulty in applying the criteria that define the different categories in the TRS systems [17, 18]. For example, criteria for the recognition of histologic features such as “rare small groups of cancer cells” are subjective and may contribute to interobserver variability. Third, existing TRS systems differ in the number of distinct categories and thereby vary in discriminative prognostic power and in ease of application [19, 20]. In view of these unresolved issues, it is not surprising that comparative studies on the predictive performance of the proposed scoring systems are rare and provide limited evidence as to which TRS system performs best with regard to the trade-off between prognostic accuracy, interobserver agreement, and applicability in daily practice [21].

Fig. 1: The amount and distribution of cancer cells and fibrosis are often very heterogeneous across different tumor areas, in both untreated and treated patients.

Representative micrographs 1 and 2 show two distinct areas from the same pancreatic specimen that was treated with neoadjuvant chemotherapy. Representative micrographs 3 and 4 show two distinct areas from the same pancreatic specimen that was not treated with neoadjuvant chemotherapy.

In an attempt to identify, discuss, and overcome the challenges encountered in the assessment of pathologic tumor response following neoadjuvant therapy in resected pancreatic cancer, an international consensus meeting was organized. This article provides an overview of the considerations, outcomes, and consensus statements that originated from this meeting.

Materials and methods

The consensus meeting took place on November 22, 2019 at Schiphol Airport, Amsterdam, the Netherlands, and was attended by 18 of the 23 invited expert pancreatic pathologists from 9 countries across 4 continents: North America (United States of America), Europe (Italy, the Netherlands, Norway, Turkey, and the United Kingdom), Oceania (Australia), and Asia (Japan and Korea). Also invited and attending were a medical oncologist, two surgeons, a radiation oncologist, two experts in artificial intelligence, and five junior researchers. In total, 29 clinicians and researchers attended the meeting, all by invitation. Supplementary Table S1 lists all attendees, areas of expertise, and affiliations. A pre-meeting survey was conducted by S.v.R., A.F.S., C.S.V., L.A.A.B., and J.V., and the consensus meeting was organized and (co-)chaired by S.v.R., M.G.B., J.V., and L.A.A.B. During the meeting, the participants decided to form the International Study Group of Pancreatic Pathologists (ISGPP).

Pre-meeting survey

The digital pre-meeting survey was conducted among all invited pathologists (n = 23), using Google Forms, to obtain a baseline impression of the areas of agreement and disagreement. The survey consisted of 21 statements with additional comment sections, and one open question (Table 1). All statements were related to the assessment of tumor response following neoadjuvant chemo(radio)therapy in pancreatic cancer: (i) the clinical significance of TRS, (ii) histopathologic technique/sampling, and (iii) various TRS systems. Participants were asked to state whether they agreed or disagreed with the given statements and were given the opportunity to expand on their agree/disagree answer, or to state uncertainty, in the form of a supplemental free-text answer. When participants provided only the latter, the free-text answers were assigned to the agree/disagree/other categories. When the appropriate category was unclear, participants were contacted for clarification. A sub-analysis was performed to provide insight into differences in opinion between pathologists from different continents.

Table 1 Pre-meeting survey outcomes regarding the assessment of tumor response following neoadjuvant therapy in pancreatic cancer.

Proceedings of the meeting

The pre-meeting survey outcomes were presented at the beginning of the meeting and served as a starting point for further discussion. In addition, a summary of the current literature on TRS in PDAC (S.v.R.) and issues of controversy in the histopathologic assessment of tumor regression in PDAC were presented (C.S.V.). Eventually, following constructive discussion, consensus was reached on some topics. The topics on which there was disagreement were identified as requiring future study.

Post-meeting survey

Two months after the meeting, a post-meeting survey, prepared and conducted by S.v.R., A.F.S., M.G.B., L.A.A.B., H.W., C.S.V., and J.V., was sent to confirm consensus on several statements. The statements in this survey were constructed after the consensus meeting and aimed to represent the outcomes formulated by all participants at the end of the consensus meeting. Table 2 lists the post-meeting survey statements. Statements were considered consensus statements when ≥80% of respondents agreed.

Table 2 Consensus statements of the 2019 Amsterdam International Consensus Meeting on tumor response grading in pancreatic cancer specimens following neoadjuvant therapy.

Results

Pre-meeting survey

All 23 invited pathologists completed the pre-meeting survey. Eight of the 21 pre-meeting statements reached ≥80% agreement, seven reached ≥60% but <80% agreement, and six reached <60% agreement. Table 1 shows the statements and pre-meeting survey outcomes. Supplementary Table S2 provides information on the comparative analysis between continents.
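The agreement bands reported above can be expressed as a small helper. The following is a minimal Python sketch, not the analysis code used for the survey: the function name `agreement_band`, the band labels, and the example counts are hypothetical, and only the 80% and 60% cut-offs are taken from the text.

```python
def agreement_band(n_agree: int, n_respondents: int) -> str:
    """Assign a statement to an agreement band by share of agreeing respondents.

    The 80% and 60% cut-offs follow the text; the band labels are illustrative.
    Integer arithmetic avoids floating-point rounding at the boundaries.
    """
    if n_respondents <= 0:
        raise ValueError("n_respondents must be positive")
    if not 0 <= n_agree <= n_respondents:
        raise ValueError("n_agree must lie between 0 and n_respondents")
    if n_agree * 100 >= 80 * n_respondents:
        return "high"          # >=80% agreement
    if n_agree * 100 >= 60 * n_respondents:
        return "intermediate"  # >=60% but <80% agreement
    return "low"               # <60% agreement


# Hypothetical counts for 23 respondents (not taken from Table 1).
for n_agree in (19, 18, 14, 13):
    print(n_agree, agreement_band(n_agree, 23))
```

With 23 respondents, at least 19 agreeing answers are needed to reach the ≥80% band under this sketch, since 18/23 is approximately 78%.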

Open question: “What is the main goal and clinical relevance of TRS?”

Overall, respondents regard TRS as a tool that could be clinically relevant in three ways. First, TRS may help predict clinical outcome. Respondents expect the degree of tumor response to correlate with DFS and OS after surgery. Hence, TRS could be used to stratify patients in post-operative clinical trials, and to identify those patients who are most likely to benefit from adjuvant treatment due to a high likelihood of disease recurrence. Second, TRS is viewed as a tool that could potentially guide the choice of adjuvant (chemo)therapy. Little or no tumor response in the resection specimen would indicate that the tumor is resistant to the administered neoadjuvant agent(s). As such, TRS may guide clinicians to use different adjuvant therapies that target biological mechanisms other than those targeted by the neoadjuvant agent/regimen. Third, TRS is viewed as a potential objective parameter in studies comparing the effectiveness of different neoadjuvant regimens. More extensive tumor response in one treatment group could indicate superior treatment effect.

Post-meeting survey and statements

Twenty of the 23 invited pathologists (87%) completed the post-meeting survey. Each of the seven post-meeting statements scored ≥80% agreement. Table 2 shows the statements and the exact agreement rates. Here, we provide contextual descriptions of the consensus statements and their underlying arguments.

TRS is important because it provides information about the effect of neoadjuvant treatment that is not provided by other histopathology-based descriptors.

All pathologists of the ISGPP agreed that TRS is a clinically relevant parameter. Indeed, as the most direct measure of the efficacy of therapy, the tumor response score represents information that is only indirectly provided by other parameters. Furthermore, in addition to other “traditional” histopathologic parameters (such as tumor grade and margin status), TRS, regardless of the scoring system, is widely seen as a predictor of clinical outcome [10, 22,23,24,25,26,27]. However, current TRS systems do not allow prognostic stratification of the vast majority of patients, in whom neoadjuvant treatment results in less than (near) complete tumor regression.

TRS for resected PDAC following neoadjuvant therapy should assess residual (viable) tumor burden instead of tumor regression.

Most TRS systems are based on an evaluation of either the proportion of the cancer cells that remain viable following treatment or the proportion of tumor cells that have been destroyed by therapy. A problematic issue with both approaches is that the denominator, i.e., the tumor burden before therapy, is unknown. Moreover, it is unclear how the residual viable cancer cells should be assessed after therapy (for instance, by counting residual cancer cell numbers or by measuring the size of the foci of residual cancer) and how this can be done in daily practice. Comparison with the original tumor size measured on imaging is inadequate, because tumor size measurements based on gross pathology and radiology often yield divergent results, even when no treatment has been given. Some TRS systems require determination of the amount of residual viable cancer cells in relation to the treatment-induced fibrosis [28]. However, fibrosis arising for reasons other than neoadjuvant treatment (e.g., concurrent pancreatitis, obstructive changes of the surrounding parenchyma, and/or the extensive stromal reaction inherent to pancreatic cancer, i.e., desmoplasia) is likely to be histologically indistinguishable from fibrosis secondary to tumor regression (as detailed in Fig. 1). To circumvent the inherent problems related to estimating tumor regression, the ISGPP recommends the use of scoring systems that assess residual (viable) tumor burden only.

The College of American Pathologists (CAP) scoring system is considered the most adequate scoring system to date because it is based on the presence and amount of residual cancer cells instead of tumor regression.

Multiple TRS systems have been proposed in the past decades [4,5,6,7,8,9,10,11,12,13,14,15]. Of these, the CAP, MDACC (MD Anderson), and Evans systems have been used and investigated most frequently (Table 3) [4, 6, 8]. The CAP system is a four-tiered descriptive system based on the amount of residual cancer remaining after therapy, and is adopted from a modified Ryan scheme originally proposed for neoadjuvantly treated rectal cancer [29]. The Evans system is five-tiered and based on the percentage of destroyed tumor. The MDACC system is three-tiered and based on the percentage of the remaining tumor. The MDACC system originated from a modification to the CAP system that merges CAP grades 2 and 3 into one category. This was done because PDAC patients with a CAP grade 2 response and those with a CAP grade 3 response had a comparable prognosis [10, 22]. Although the MDACC system is easy to use and is predictive of patient survival and prognosis, the ISGPP prefers the CAP system because (a) it is based on the amount of residual cancer cells, instead of the measurement of percentage-based tumor regression, and (b) the MDACC system classifies the majority (more than 80%) of PDAC patients as grade 2 response (poor performer, >5% residual cancer). The ISGPP considered that further stratification within the MDACC grade 2 group would be more informative. Therefore, the majority of the ISGPP considers the CAP system the most informative to date (90% agreement).
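The relationship between the two systems described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the text specifies that CAP grades 2 and 3 are merged into a single MDACC category, whereas the one-to-one mapping of grades 0 and 1, the names `CAP_TO_MDACC` and `cap_to_mdacc`, and the example cohort are assumptions introduced here.

```python
# Assumed tier mapping: only the merging of CAP grades 2 and 3 is stated
# in the text; the one-to-one mapping of grades 0 and 1 is an assumption.
CAP_TO_MDACC = {0: 0, 1: 1, 2: 2, 3: 2}


def cap_to_mdacc(cap_grade: int) -> int:
    """Collapse a four-tiered CAP grade into a three-tiered MDACC grade."""
    try:
        return CAP_TO_MDACC[cap_grade]
    except KeyError:
        raise ValueError(f"CAP grade must be 0-3, got {cap_grade!r}") from None


# Hypothetical cohort of CAP grades, used only to show the loss of resolution.
cohort = [0, 1, 2, 2, 3, 3, 3, 2]
mdacc = [cap_to_mdacc(g) for g in cohort]
print(mdacc)                        # [0, 1, 2, 2, 2, 2, 2, 2]
print(mdacc.count(2) / len(mdacc))  # 0.75
```

In this made-up cohort, six of eight patients end up in the same MDACC category even though their CAP grades differ, which is the loss of stratification that motivates the preference for the four-tiered system.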

Table 3 Most commonly used scoring systems of tumor response in resected pancreatic cancer after neoadjuvant therapy.

The defining criteria of the categories in the CAP scoring system should be improved by replacing subjective terms including “minimal” or “extensive” with objective criteria to evaluate the extent of viable tumor.

Although the CAP system is endorsed by the ISGPP as the most informative scoring system to date, it lacks clear definitions of each grade in terms of microscopic findings. Ideally, the defining criteria of the categories in the CAP scoring system should be improved by replacing subjective terms such as “minimal” or “extensive” with objective criteria to evaluate the extent of viable tumor. For example, the definition of score 1, “single cells or rare small groups of cancer cells,” leaves room for interpretation, especially when the tumor response is heterogeneous throughout the tumor and results in multiple small residual cancer foci. Similar subjective descriptive criteria are also used to define CAP scores 2 and 3. Future studies are necessary to define each category of the current CAP TRS system better and more objectively. For example, strategies may be explored wherein terms such as “single cells or rare small groups of cancer cells” could be defined by a maximum diameter of viable tumor in millimeters or a maximum viable tumor area in square millimeters, the maximal number of microscopic foci of viable tumor, the absolute number of individual tumor cells in a given area, or the number of cells per high-power field.

The improved, consensus-based system should be validated retrospectively and prospectively.

Ideally, validation of a new scoring system should be based on a comparison with other currently available scoring systems in order to test its superiority. Both the inter- and intra-observer agreement and prognostic significance will require evaluation. For validation studies, international collaboration within a multidisciplinary context is highly desirable in order to demonstrate wide applicability.

Prospective studies should determine the extent of tissue sampling that is required to ensure adequate assessment of the residual cancer burden, taking into account the heterogeneity of tumor response.

The response of neoplastic cells to neoadjuvant treatment varies within a tumor, and the resulting patchy distribution of residual cancer cells is a major challenge when scoring tumor response. As such, the measured response to neoadjuvant therapy may differ, depending on the areas sampled and the extent of tumor sampling. While extensive tissue sampling is important to ensure adequate representation of the heterogeneity of treatment effect, the scoring of this heterogeneous, often patchy, process remains challenging (as detailed in Fig. 1). Pathologists may either report the poorest response seen in a part of the tumor or estimate the average of the responses observed throughout the entire specimen. The ISGPP hypothesizes that the second approach is more appropriate, as it is likely more representative of the entire tumor, although there is currently no evidence as to which approach correlates better with patient outcome. As such, new studies are needed to identify the extent of tissue sampling that ensures optimal assessment of the residual cancer burden. Uniform studies can only be achieved by establishing clear rules on the minimum requirements for sampling and reporting. Only then can sufficient evidence be obtained to compare various systems and to draw definitive conclusions. To report complete pathological response reliably (CAP 0, MDACC 0, or Evans 4), we expect that extensive, if not complete, sampling approaches are required. However, complete examination (e.g., embedding the entire pancreas, including all adjacent structures and tissues) is challenging in practice, and its benefits in relation to prognostic significance are unknown. Further studies are needed to identify the optimal balance between the costs of extensive sampling and the risk of missing clinically important foci of residual cancer.
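The difference between the two reporting conventions mentioned above (poorest response versus specimen-wide average) can be made concrete with a minimal Python sketch. It rests on several assumptions that are not part of the consensus: each sampled block is given its own CAP grade (0 to 3, with higher grades indicating poorer response), the arithmetic mean of these ordinal grades stands in for the pathologist's estimated average, and the function name `summarize_block_grades` and the example grades are hypothetical.

```python
from statistics import mean


def summarize_block_grades(block_grades: list[int]) -> dict[str, float]:
    """Summarize per-block CAP grades (0-3, higher = poorer response)."""
    if not block_grades:
        raise ValueError("at least one sampled block is required")
    if any(g not in (0, 1, 2, 3) for g in block_grades):
        raise ValueError("CAP grades must be 0, 1, 2, or 3")
    return {
        "poorest": max(block_grades),   # convention 1: worst area seen
        "average": mean(block_grades),  # convention 2: specimen-wide average
    }


# Hypothetical patchy tumor: mostly good response, one poorly responding block.
print(summarize_block_grades([1, 1, 1, 1, 3]))
# {'poorest': 3, 'average': 1.4}
```

In this example a single poorly responding block determines the reported grade under the first convention but shifts the average only modestly, and omitting that block from sampling would change the first result entirely. This is one way to see why the extent of sampling and the reporting convention both need to be specified.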

In future scientific publications, the extent of tissue sampling should be described in detail in the “Materials and methods” section.

While the extent of sampling is a key determinant of the accuracy of TRS, there is currently wide variation in practice. In many studies describing and/or comparing TRS systems, detailed explanations of the extent of sampling are lacking. This makes it impossible to compare the meaning of even complete tumor response in different studies. Hence, the extent of sampling should be included in the “Materials and methods” section of all scientific publications such that the comparability of data from different studies can be evaluated. Before a case can be classified as “complete remission,” total sampling of the pancreas is advised (with additional sections from the blocks with fibrous changes), and the prior biopsy diagnosis ought to be reviewed in consultation. Unless there is overtly abundant residual cancer that can be readily documented, extensive sampling of the pancreas is warranted to assess the amount and distribution of the residual cancer properly.

Areas of disagreement and future research

Some topics that were covered during the meeting did not reach agreement and remain open for further discussion and study. The optimal number of tiers within a scoring system remains a topic of debate. An ideal TRS system (1) has strong prognostic value, (2) allows clinically relevant patient stratification, and (3) is reproducible, reliable, and practical. When attempting to establish these three characteristics, the number of tiers in a TRS system plays a central role. The majority of participants agreed that distinguishing complete response from non-complete response (either substantial or no response) is reliable and easy to do, provided that total sampling and careful evaluation are performed. However, since complete response is observed in only a small minority of patients, a two-tiered system does not allow clinically relevant stratification in the majority of patients and is therefore of limited use. At the same time, as more tiers are added, prognostic systems generally face a trade-off between adding prognostic value in terms of more discriminatory power (more tiers) and maintaining simplicity so as to optimize interobserver agreement and applicability in daily practice (fewer tiers). Further research is needed to investigate whether additional tiers provide more relevant patient stratification without affecting interobserver agreement.

Information gathered by the pre-meeting survey and comparison of opinions between delegates from different continents highlight that there is significant divergence in practice, e.g., in terms of dissection method and the use of photo documentation. For example, 5/13 (38.5%) of European pathologists agreed that axial slicing is the dissection method of choice to accurately score response after neoadjuvant therapy in PDAC, versus 0/7 (0%) of American pathologists. These issues were not addressed during the consensus meeting because they are not exclusively relevant to the assessment of tumor regression but rather present a potential source of nonuniform reporting on any surgical pancreatic cancer specimen, including those from treatment-naïve patients.

During the meeting, the use of ancillary or novel techniques to improve the prognostic value of TRS was discussed. Immunohistochemistry and other markers deemed to relate to response to treatment deserve further investigation to test their relevance for clinical practice. Another potential area of exploration concerns machine learning (ML) and artificial intelligence (AI) strategies, the potential of which in TRS was widely acknowledged during the meeting. These strategies have become increasingly versatile in recent years, and some have been successfully implemented in pathology practice [30, 31]. As ML and AI utilize an algorithmic approach, interobserver variability might be reduced.

Discussion

The Amsterdam International Consensus Meeting obtained consensus among 23 expert pathologists from four continents on seven statements regarding TRS to assess the effect of neoadjuvant treatment in PDAC resection specimens. Objective criteria, adequate interobserver agreement, and standardized sampling and reporting are desirable for both clinical practice and clinical research. Objective definitions and easy-to-apply evaluation criteria are necessary for accurate and reproducible evaluation of the tumor response to treatment.

This consensus did not use a formal evidence-based approach nor did it provide a new TRS system. Rather, the consensus was intended as the start of a process to improve TRS in PDAC. In that spirit, a formal Delphi process was not used, since the authors preferred to have a face-to-face meeting to discuss current problems and dilemmas in this particular field.

Adequate scoring of tumor response after neoadjuvant therapy in resected pancreatic cancer aims to result in (1) more accurate assessment of treatment response and outcome prognostication, (2) a useful measure to guide adjuvant regimens, and (3) a valuable tool in comparative trials of neoadjuvant therapies. Given the complexity of the challenges with current TRS practices, the development of an improved, easy-to-apply, and objective TRS system and a consensus on the extent of tissue sampling are necessary. Once the improved system is established, retrospective and prospective validation is of paramount importance. The ISGPP, which was formed during the consensus meeting, aims to achieve these outcomes by facilitating international multidisciplinary collaborative research in alliance with the Neo-adjuvant Working Group of the Pancreatobiliary Pathology Society (PBPath.org).

International Study Group of Pancreatic Pathologists (ISGPP)

Boris V. Janssen1,2, Faik Tutucu1,3, Stijn van Roessel2, Volkan Adsay3, Olca Basturk4, Fiona Campbell5, Claudio Doglioni6, Irene Esposito7, Roger Feakins8, Noriyoshi Fukushima9, Anthony J. Gill10, Ralph H. Hruban11, Jeffrey Kaplan12, Seung-Mo Hong14, Alyssa Krasinskas15, Claudio Luchini16, Johan Offerhaus17, Arantza Fariña Sarasqueta1, Chanjuan Shi18, Aatur Singhi19, Eline C. Soer1, Elizabeth Thompson11, Marie-Louise F. Velthuysen22, Marc G. Besselink2, Lodewijk A. A. Brosens17,24, Huamin Wang25, Caroline S. Verbeke26, Joanne Verheij1