Main

The development of immune checkpoint inhibitors (ICIs) has heralded a new era in immuno-oncology (Couzin-Frankel, 2013). Therapies such as ipilimumab, pembrolizumab and nivolumab have shown impressive responses in metastatic melanoma and are FDA approved for this disease (Hodi et al, 2010; Robert et al, 2011; Hamid et al, 2013; Robert et al, 2014; Larkin et al, 2015; Schadendorf et al, 2015; Weber et al, 2015). Numerous trials with these agents alone or in combination are underway in melanoma and a variety of other tumour types.

Clinical trials of ipilimumab, the first ICI to be licensed for metastatic melanoma, highlighted idiosyncratic patterns of response. Patients may have a standard radiological response by RECIST during treatment, or they may progress on treatment and then show a delayed response, possibly several months after treatment completion. Such delayed responses may reflect either transient tumour growth followed by an effective immune response, or pseudo-progression, in which an inflammatory immune response increases apparent tumour size. Mixed responses and prolonged stable disease are also not uncommon. These patterns of response pose particular challenges in managing patients on these therapies. Furthermore, traditional radiological clinical end points may not adequately capture benefit from these agents: progression-free survival may be inaccurately assessed, leaving overall survival (OS) as the most accurate measure of efficacy.

The immune-related response criteria (irRC) were formulated in 2009 as an alternative to the response evaluation criteria in solid tumors (RECIST), to capture responses and benefit from ipilimumab more accurately (Wolchok et al, 2009). All lesions are considered, with overall tumour burden assessed at each scan rather than a defined set of target lesions. Progression on one scan requires confirmation on a follow-up scan, during which patients may remain on treatment. The irRC are being validated in clinical trials of ICI, where they are used alongside RECIST as a correlative or exploratory end point. The CHOI criteria were proposed in 2004 (Choi et al, 2004) to better capture responses in gastrointestinal stromal tumours treated with imatinib. Although decreases in lesion size can be seen with this agent, most patients have stable disease or tumour growth by RECIST in the initial phase of treatment. CHOI therefore also assesses tumour density, with a decrease in tumour density of >15% indicating a response to treatment.
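To make the CHOI rule concrete, the Python sketch below classifies overall response from summed lesion size and mean density. It is a simplified illustration, not the authors' implementation. The thresholds follow our reading of the formal criteria in Choi et al (2007): PR for a ≥10% decrease in size or a ≥15% decrease in density, and PD for a ≥10% increase in size without a qualifying density decrease, or for new lesions. Changes are measured from baseline, the intratumoral-nodule rule is omitted, and the function name and arguments are hypothetical.

```python
def choi_response(baseline_size_mm, current_size_mm, baseline_hu, current_hu,
                  new_lesions=False, all_lesions_gone=False):
    """Simplified CHOI overall response (hypothetical helper).

    Sizes: sum of longest diameters of target lesions (mm).
    Densities: mean CT attenuation of target lesions (Hounsfield units).
    """
    if baseline_size_mm <= 0 or baseline_hu <= 0:
        raise ValueError("baseline size and density must be positive")
    if new_lesions:
        return "PD"
    if all_lesions_gone:
        return "CR"
    size_change = (current_size_mm - baseline_size_mm) / baseline_size_mm
    density_change = (current_hu - baseline_hu) / baseline_hu
    # PR if EITHER size falls by >=10% OR density falls by >=15%
    if size_change <= -0.10 or density_change <= -0.15:
        return "PR"
    # PD if size grows by >=10% and the density criterion for PR is not met
    if size_change >= 0.10:
        return "PD"
    return "SD"


# A lesion that grows but loses a third of its density is a CHOI PR
print(choi_response(26, 48, 57, 38))  # PR
```

Because the size and density conditions are joined by "or", a lesion can be classed as responding by CHOI while it is progressing by size-only criteria.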

Patterns of response may differ between ICI agents. Anti-PD-1 and anti-CTLA-4 antibodies have distinct mechanisms of action (Das et al, 2015), and delayed responses or pseudo-progression appear to be less common with anti-PD-1 agents, at least in melanoma (Chiou and Burotto, 2015; Wolchok et al, 2015). Whether patterns of response to ICI differ between solid tumour types is unknown.

We aimed to explore the different patterns of response to pembrolizumab, an anti-PD-1 antibody, in patients with melanoma receiving this drug in a phase I study. One aim was to assess and contrast response using RECIST and irRC. We also sought to explore the role of the CHOI and modified CHOI (mCHOI) criteria in assessing response to this agent.

Materials and methods

This retrospective study was conducted at the Princess Margaret Cancer Centre, Toronto, Ontario, Canada, under an institutional research ethics board-approved protocol (14–7328) and in accordance with the Declaration of Helsinki. Patients enrolled in the Keynote 001 pembrolizumab phase I study in melanoma were identified, and their clinical characteristics and scans were reviewed. Bi-dimensional tumour diameter and density measurements were obtained at baseline and on subsequent serial CT scans performed at defined time points as per the trial protocol. Where possible, the scan performed before the baseline scan (pre-baseline scan) was also reviewed. Measurable lesions were defined as those measuring ⩾5 mm in longest diameter, or ⩾15 mm in short-axis diameter for lymph nodes (LNs). Lesion-specific response was determined by the change in the product of the longest perpendicular diameters and classified as complete (CR; complete disappearance, or ⩽10 mm short-axis diameter for LNs), partial (PR; ⩾50% reduction), progressive (PD; ⩾25% increase) or stable (SD; neither CR, PR nor PD). Patient tumour response was determined using irRC (Wolchok et al, 2009), RECIST 1.1 (Eisenhauer et al, 2009), CHOI (Choi et al, 2004, 2007) and mCHOI (Nathan et al, 2010) criteria. All radiological assessments were performed as previously described (Choi et al, 2004, 2007; Eisenhauer et al, 2009; Wolchok et al, 2009; Nathan et al, 2010; see Table 1). The number of lesions assessed is shown in Supplementary Table 1.
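The lesion-level rules above can be expressed as a short Python sketch. It assumes that change is measured against each lesion's own baseline POD (the text does not state whether baseline or nadir was the reference) and that the short-axis rule applies only to nodal lesions; the function names are hypothetical.

```python
def product_of_diameters(longest_mm, perpendicular_mm):
    """POD: longest diameter multiplied by its longest perpendicular (mm^2)."""
    return longest_mm * perpendicular_mm


def classify_lesion(baseline_pod, current_pod, is_lymph_node=False,
                    short_axis_mm=None):
    """Classify one lesion as CR, PR, SD or PD from the change in POD."""
    if current_pod == 0:
        return "CR"  # complete disappearance
    if is_lymph_node and short_axis_mm is not None and short_axis_mm <= 10:
        return "CR"  # node back to <=10 mm short axis
    change = (current_pod - baseline_pod) / baseline_pod
    if change <= -0.50:
        return "PR"  # >=50% reduction in POD
    if change >= 0.25:
        return "PD"  # >=25% increase in POD
    return "SD"      # neither CR, PR nor PD


baseline = product_of_diameters(26, 20)   # 520 mm^2
follow_up = product_of_diameters(48, 32)  # 1536 mm^2
print(classify_lesion(baseline, follow_up))  # PD
```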

Table 1 Comparison of radiological assessment criteria

Overall response was assigned according to the final CT scan assessment. The association between response by each radiological criterion and OS was then determined.

Statistical analysis

Categorical variables, such as classification of tumour burden, site of occurrence, gender, stage, LDH level, and BRAF and NRAS status, were summarised with counts and percentages. Continuous variables, such as tumour burden from pre-baseline to baseline and follow-up measures, were summarised with means, medians and/or ranges as appropriate. The χ2-test was used to compare the levels of categorical covariates of interest by overall response (CR vs PR/SD/PD). Student’s t-test was used to compare lesion size at baseline by overall response (CR vs PR/SD/PD). Time to event was defined as the interval from the baseline CT scan to the date of death, or to the date of last follow-up for patients alive at the end of the study. Overall survival estimates were obtained using the Kaplan–Meier product-limit method, and the log-rank test was used to assess the impact of overall response on OS. A Cox proportional hazards model was also used to assess the impact of increases in tumour size and density from baseline on OS. All P-values were two-sided, and P<0.05 was considered to indicate a statistically significant difference. Statistical analyses were performed using SAS Version 9.4 (SAS Institute Inc., Cary, NC, USA).
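For readers unfamiliar with the product-limit method, a minimal Python sketch of the Kaplan–Meier estimator follows. It is illustrative only (the analyses themselves were run in SAS 9.4). The follow-up data in the usage example are hypothetical, and deaths at a given time are counted before censored observations at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan–Meier product-limit estimate.

    times: months from baseline CT to death or last follow-up.
    events: 1 if the patient died, 0 if censored (alive at last follow-up).
    Returns a list of (time, survival) pairs at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations that share this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def survival_at(curve, t):
    """Step-function lookup, e.g. for 6-month or 1-year OS."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical cohort: follow-up in months, 1 = died, 0 = censored
curve = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1])
print(survival_at(curve, 6))
```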

Results

Patient characteristics and distribution of lesions at baseline

Thirty-seven patients treated with pembrolizumab in a phase I trial, with a total of 567 measurable lesions, were studied. The median age was 56 years, 54% of patients were male, 95% had stage M1c disease (all patients had visceral metastases, and five had a normal LDH level) and 84% were BRAF mutation negative; patient characteristics are summarised in Table 2.

Table 2 Characteristics of patients at baseline prior to commencing treatment

In total, 567 lesions were assessed at baseline across all 37 patients, with a median product of diameters (POD) of 154 mm² (range 15–20 976 mm²). These lesions were located at various sites, including visceral organs, skin, lymph nodes and muscle. The most common site was the lungs, with 163 of the 567 lesions (29%). Nodal disease accounted for 25% (140) of all lesions (Table 3).

Table 3 Sum of the total tumour size and distribution of metastases in all patients

Response in lesions according to site of metastases and lesion size

Overall, the largest proportion of lesions remained stable in bi-dimensional size over the entire assessment period, whereas 25% (140 of 562) showed CR, defined as complete disappearance of a lesion or, for lymph nodes, a short-axis diameter <10 mm (Figure 1A). CR was most common in lung metastases: 42% of lung lesions disappeared completely, compared with 18% CR at other sites (69 of 163 lung lesions vs 71 of 399 lesions at other sites, P<0.0001; Figure 1B).
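The lung versus other-site comparison can be checked from the reported counts with a Pearson χ2 test on the 2×2 table. The sketch below is a minimal standard-library version without continuity correction; whether the authors applied a correction is not stated, so treat it as a plausibility check rather than a reproduction of their analysis.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: lung vs other sites; columns: CR vs not CR (counts from the text)
chi2, p = chi_square_2x2(69, 163 - 69, 71, 399 - 71)
print(f"chi2 = {chi2:.1f}, P < 0.0001: {p < 0.0001}")
```

The statistic comes out at roughly 37 on 1 degree of freedom, consistent with the reported P<0.0001.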

Figure 1

Response as assessed in individual lesions and according to location. (A) Overview of response of individual lesions: the largest proportion of lesions remained stable (34%), with CR evident in 25%. (B) Response differed according to the location of metastatic disease, with lung lesions showing the greatest response.

Lesions that underwent CR were smaller at baseline (measured as POD) than those with PR/SD/PD: mean (s.d.) POD 568.8 (879.6) mm² vs 806.7 (1166.7) mm² (P=0.015).

Clinical benefit was associated with BRAF status (P<0.0001) but not with NRAS mutation status. The rate of CR or PR was higher when the LDH level was below 300 U l⁻¹ (P<0.0001).

Changes in lesion size pre-baseline, at baseline and on treatment

Overall, tumour burden increased from the pre-baseline to the baseline scan, as assessed by the change in the POD of all lesions, with a mean increase of 1467 mm² (range −621 to 7074 mm²). When POD was calculated per patient, most patients (84%, 31 of 37) showed an increase in tumour size from pre-baseline to baseline. There was no evidence of an association between this pre-baseline to baseline change (increase or decrease) and response at the first follow-up scan (assessable in 37 patients, P=0.19). However, this change was significantly associated with response at the second follow-up assessment (assessable in 30 patients, P=0.03). Of patients whose disease was increasing before the baseline scan, 56% (irRC) and 48% (RECIST) had a complete or partial response at the second follow-up, and 16% (irRC) and 28% (RECIST) had SD (P=0.03 for both).

Response according to irRC

When assessed using irRC, 59% of patients (22 of 37) had clinical benefit at the first assessment time point: CR (n=1), PR (n=14) or SD (n=7). Forty-one percent of patients (15 of 37) had unconfirmed PD at the first assessment (irRC require confirmation of PD by a repeat assessment at least 4 weeks later). Of these 15 patients, seven had PD confirmed on subsequent surveillance scans, two went on to show delayed benefit (SD or PR by irRC) and six had no further imaging assessment (Figure 2). One of the seven patients with confirmed PD at the second assessment showed an atypical delayed response on subsequent assessments, with a resultant PR by irRC.
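The irRC confirmation rule behind the unconfirmed-PD category can be sketched in Python. Assumptions, all ours: each scan is summarised as (weeks from treatment start, irRC total tumour burden in mm²); irPD is a ≥25% increase over the nadir burden; confirmation requires the next scan taken at least 4 weeks after the first irPD scan to still meet that threshold; and response categories other than PD are ignored. The function name is hypothetical.

```python
def irrc_progression_status(scans):
    """Return 'confirmed PD', 'unconfirmed PD' or 'no confirmed PD'.

    scans: list of (week, total_tumour_burden_mm2), baseline scan first.
    """
    nadir = scans[0][1]
    for i in range(1, len(scans)):
        week, burden = scans[i]
        if burden >= 1.25 * nadir:
            # Candidate irPD: find the next scan at least 4 weeks later
            confirmation = next(
                (later for later in scans[i + 1:] if later[0] - week >= 4),
                None,
            )
            if confirmation is None:
                return "unconfirmed PD"  # no confirmatory imaging available
            if confirmation[1] >= 1.25 * nadir:
                return "confirmed PD"
            # Not confirmed (e.g. pseudo-progression): keep following
        nadir = min(nadir, burden)
    return "no confirmed PD"


# Burden rises 30% at week 12, then falls below baseline by week 16
print(irrc_progression_status([(0, 1000), (12, 1300), (16, 900)]))
```

The last case mirrors the delayed-benefit pattern described above: an apparent progression that is not confirmed and is followed by shrinkage.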

Figure 2

Patterns of response as assessed by (A) irRC and (B) response evaluation criteria in solid tumors (RECIST) 1.1. Response patterns differed, with greater variation on the first scan as assessed by RECIST than by irRC.

Response according to RECIST 1.1 criteria

When assessed using RECIST, 43% of patients (16 of 37) had PD on the initial scan. Of these RECIST PD patients, irRC classified four as having SD at the initial assessment, two of whom went on to show benefit by irRC at the subsequent assessment (one irRC PR and one irRC SD). Thus, 5% (2 of 37) of treated patients who were initially classified as PD by RECIST went on to demonstrate treatment benefit (Figure 2).
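One mechanism behind such discordance is the handling of new lesions. Under RECIST 1.1, any new lesion defines PD, whereas irRC add the POD of new measurable lesions to the total tumour burden and call progression only for a ≥25% increase over the nadir. The simplified Python sketch below illustrates this with hypothetical measurements. It considers target/index lesions only, ignoring non-target disease and confirmation scans, and the function names are our own.

```python
def recist_overall(baseline_sld, nadir_sld, current_sld, new_lesion):
    """Simplified RECIST 1.1 on target lesions; SLD = sum of longest diameters (mm)."""
    if new_lesion:
        return "PD"  # any unequivocal new lesion is progression
    if current_sld == 0:
        return "CR"
    if current_sld >= 1.2 * nadir_sld and current_sld - nadir_sld >= 5:
        return "PD"  # >=20% and >=5 mm above the nadir
    if current_sld <= 0.7 * baseline_sld:
        return "PR"  # >=30% below baseline
    return "SD"


def irrc_overall(baseline_burden, nadir_burden, index_pods, new_pods):
    """Simplified irRC; burdens are sums of PODs (mm^2), new lesions included."""
    total = sum(index_pods) + sum(new_pods)
    if total == 0:
        return "irCR"
    if total >= 1.25 * nadir_burden:
        return "irPD"  # >=25% above the nadir
    if total <= 0.5 * baseline_burden:
        return "irPR"  # >=50% below baseline
    return "irSD"


# Two target lesions shrink, but a new 15 x 12 mm lesion appears
baseline_sld = 30 + 20                # longest diameters at baseline (mm)
baseline_burden = 30 * 25 + 20 * 15  # PODs at baseline = 1050 mm^2
print(recist_overall(baseline_sld, baseline_sld, 24 + 16, new_lesion=True))  # PD
print(irrc_overall(baseline_burden, baseline_burden,
                   [24 * 20, 16 * 12], [15 * 12]))                          # irSD
```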

Response according to CHOI and mCHOI criteria

Twenty-four patients (65%) met CHOI density and size criteria for clinical benefit (CR, PR or SD) at first follow-up. CHOI and mCHOI criteria showed benefit in 38% (14 of 37). Changes in tumour size and density at the first follow-up assessment were associated with OS: each 1000 mm² increase in tumour size from baseline increased the hazard of death by 26% (HR=1.26, 95% CI 1.12–1.42; P=0.0002), and each 20 HU increase in density increased the hazard of death by 15% (HR=1.15, 95% CI 1.045–1.260; P=0.004).
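Because the Cox model is linear in the log hazard, the per-1000 mm² and per-20 HU hazard ratios above can be rescaled to other increments as HR(Δ) = HR_unit^(Δ/unit). The Python sketch below shows the arithmetic. It assumes size and density entered the model as untransformed continuous covariates, which the text implies but does not state explicitly; confidence limits rescale the same way.

```python
import math


def rescale_hazard_ratio(hr_per_unit, unit, delta):
    """Hazard ratio for a covariate change of `delta`, given HR per `unit`.

    With a linear Cox covariate, beta = ln(HR_unit) / unit and
    HR(delta) = exp(beta * delta) = HR_unit ** (delta / unit).
    """
    beta = math.log(hr_per_unit) / unit
    return math.exp(beta * delta)


# Size: HR 1.26 per 1000 mm^2 -> HR for a 2000 mm^2 increase
print(round(rescale_hazard_ratio(1.26, 1000, 2000), 2))  # 1.59
# Density: HR 1.15 per 20 HU -> HR for a 10 HU increase
print(round(rescale_hazard_ratio(1.15, 20, 10), 2))      # 1.07
```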

Differences in response assessment between criteria

The contribution of density, as opposed to change in lesion size, was examined by comparing RECIST and CHOI responses at each time point where possible. Clinical benefit differed between these criteria, suggesting that density adds value in the assessment of response (Table 4; Supplementary Table 2).

Table 4 Comparison of numbers of patients deriving clinical benefit vs progressive disease as assessed by RECIST vs CHOI

For the largest data sets (baseline, first follow-up and second follow-up), the criteria differed in the number of patients judged to be deriving clinical benefit from treatment (Table 5). It was not possible to determine which criterion performed best in assessing response during treatment or overall. The amplitude of response by each criterion across all assessable patients is shown in Supplementary Figure 1. Figure 3 illustrates how the criteria differ in their assessment of one lesion on serial CT scans during treatment.

Table 5 Comparison of the numbers of patients deemed to be deriving clinical benefit vs progressive disease by each radiological criterion
Figure 3

Serial images from a 62-year-old woman with metastatic melanoma to the subcutaneous tissues treated with an anti-PD-1 antibody. (A) The baseline scan shows a subcutaneous deposit measuring 26 × 20 mm with a density of 57 HU. (B) The first follow-up scan, performed 13 weeks after the first infusion, showed that the lesion had progressed by RECIST and irRC (48 × 32 mm) but had decreased in density (to 38 HU). (C) On the second follow-up CT scan, the lesion measured 60 × 31 mm, while the density had decreased further (to 35 HU). (D) On the final assessment scan, the lesion measured 67 × 30 mm, with a density of 25 HU.
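The discordance in this case follows from simple arithmetic on the measurements in the legend. The Python sketch below computes percentage changes from baseline in longest diameter (the RECIST measure), POD (the irRC measure) and density (the additional CHOI measure). It treats this single lesion as if it were the whole tumour burden, which is a simplification.

```python
def percent_change(baseline, current):
    return 100 * (current - baseline) / baseline


# (longest diameter mm, perpendicular diameter mm, density HU) per scan
scans = {
    "baseline": (26, 20, 57),
    "first follow-up": (48, 32, 38),
    "second follow-up": (60, 31, 35),
    "final": (67, 30, 25),
}

base_long, base_perp, base_hu = scans["baseline"]
changes = {}
for label, (long_mm, perp_mm, hu) in scans.items():
    changes[label] = (
        percent_change(base_long, long_mm),                         # RECIST
        percent_change(base_long * base_perp, long_mm * perp_mm),   # irRC POD
        percent_change(base_hu, hu),                                # CHOI density
    )
    diameter, pod, density = changes[label]
    print(f"{label}: diameter {diameter:+.0f}%, POD {pod:+.0f}%, "
          f"density {density:+.0f}%")
```

At first follow-up the longest diameter has grown by about 85% and the POD has nearly tripled, well past the RECIST (≥20%) and irRC (≥25%) progression thresholds, while density has fallen by about a third, beyond the density decrease that CHOI counts as response.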

Associations between response (by each criterion) and OS

At the time of reporting, the median follow-up was 9.7 months (range 3–19). Responders (CR/PR/SD) as defined by any criterion had superior OS compared with patients with PD (log-rank test: irRC P<0.0001, RECIST P=0.0003, CHOI P=0.008 and mCHOI P=0.018).
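A minimal two-group log-rank test, of the kind used here to compare OS between responders and non-responders, can be written in a few lines of Python. This is an illustrative sketch (the published analyses used SAS): it sums observed-minus-expected deaths in one group with the hypergeometric variance at each death time and refers the result to a χ2 distribution with 1 degree of freedom. The example data are hypothetical.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 = death, 0 = censored.
    Returns (chi-square statistic, P value) with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in death_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = at_risk_a + at_risk_b
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        # Observed minus expected deaths in group A at this time
        o_minus_e += deaths_a - deaths * at_risk_a / n
        # Hypergeometric variance of the number of deaths in group A
        if n > 1:
            variance += (deaths * at_risk_a * at_risk_b * (n - deaths)
                         / (n * n * (n - 1)))
    if variance == 0:
        raise ValueError("no informative death times")
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical OS data in months (1 = died, 0 = censored)
responders = ([6, 9, 12, 14, 18, 19], [0, 1, 0, 0, 0, 0])
non_responders = ([2, 3, 4, 5, 7, 9], [1, 1, 1, 0, 1, 1])
chi2, p = log_rank(*responders, *non_responders)
print(f"log-rank chi2 = {chi2:.2f}, P = {p:.4f}")
```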

Using irRC, 6-month OS for responders vs non-responders was 100% vs 64% (95% CI 30–85%), and 1-year OS was 85% (95% CI 48–97%) vs 36% (95% CI 11–63%). By RECIST 1.1, 6-month and 1-year OS for responders vs non-responders were 100% vs 78% (95% CI 51–91%) and 100% vs 41% (95% CI 14–67%), respectively. Using CHOI, 6-month and 1-year OS were 95% (95% CI 71–99%) vs 77% (95% CI 44–92%) and 79% (95% CI 44–94%) vs 52% (95% CI 22–75%), respectively. Under mCHOI, 6-month and 1-year OS were 100% vs 82% (95% CI 59–93%) and 86% (95% CI 33–98%) vs 63% (95% CI 39–80%), respectively. Thus, response by each of the criteria assessed here was separately prognostic for OS.

Discussion

We compared four radiological criteria for assessing patterns of response in patients with metastatic melanoma treated with pembrolizumab. Response differed according to the location and size of metastases. The largest proportion of lesions remained stable over time on treatment rather than showing CR or PR. Nevertheless, benefit from treatment (SD, PR or CR) by any response criterion was significantly associated with OS, indicating a positive effect on overall disease control regardless of the magnitude of tumour shrinkage. Responses tended to be early, but 5% of patients had a delayed response. Interestingly, growth between the pre-baseline and baseline scans was associated with response on the second assessment scan, possibly indicating a longer time to response in patients whose disease was progressing before treatment.

Ongoing and published clinical trials evaluating anti-PD-1/PD-L1 agents have predominantly used RECIST to assess response (Chiou and Burotto, 2015), and some have also used irRC as a secondary response criterion. Where comparisons are possible, irRC does appear to capture responses otherwise missed by RECIST, although such cases are few (3–12%, depending on the study; Chiou and Burotto, 2015). The differential responses we observed according to the location and size of metastases have been reported in other studies. In 27 patients treated in the pembrolizumab Keynote 001 phase I study at a single centre, response was evaluated using POD (as in our study) for individual lesions and irRC alone for overall response (Lyle et al, 2014); lung lesions and smaller lesions (median POD 80 vs 246 mm², P<0.05) had the highest rates of CR. When most Keynote 001 melanoma patients were assessed together regardless of cohort (n=411), larger lesion size was the only independent factor associated with inferior response (Joseph et al, 2014). Tumeh et al (2015) reported, in 112 patients treated in Keynote 001, that liver metastases were associated with treatment failure and with lower CD8+ T-cell tumour infiltration, T-cell PD-1 expression and tumour PD-L1 expression; lung lesions were again associated with higher response rates. Given the differential responses of liver metastases and the consistently high response rates of lung metastases across these studies, tumour microenvironments are likely to differ between metastatic sites and between individual patients. These differences probably account for the heterogeneity of response and have implications for patient management (Herbst et al, 2014; Tumeh et al, 2014).

Comparing across radiological criteria in our study is difficult, as it was not possible to assess response by every criterion in every patient owing to patient dropout, as expected in a phase I study. Our study is also limited by its small size and retrospective nature. Nonetheless, the use of the CHOI criteria here is novel and identified responses not otherwise captured by RECIST alone. Change in tumour density may reflect changes in tumour vasculature and/or tumour necrosis that are not perceptible using RECIST 1.1 yet are predictive of benefit. This is particularly important when, as reported here, a large proportion of lesions remain stable by size criteria over time. Future prospective studies are needed to evaluate the role of CHOI in assessing response to ICI, particularly in cases of possible progression or heterogeneous response.

The use of ICI is likely to extend beyond metastatic melanoma and non-small-cell lung cancer to many other solid tumours. Although combinations of different ICIs, or of ICIs with targeted and anti-angiogenic agents, are being actively explored, the toxicities of such combinations are likely to be significant. Moreover, even with single-agent ICI, assessing response and determining which patients are benefiting from treatment is clearly problematic. Future prospective studies are needed to reach a consensus on the best radiological criterion, or combination of criteria, for managing patients on ICI.