Real-world outcomes versus clinical trial results of immunotherapy in stage IV non-small cell lung cancer (NSCLC) in the Netherlands

This study aims to assess how clinical outcomes of immunotherapy in real-world (effectiveness) correspond to outcomes in clinical trials (efficacy) and to look into factors that might explain an efficacy-effectiveness (EE) gap. All patients diagnosed with stage IV non-small cell lung cancer (NSCLC) in 2015–2018 in six large Dutch teaching hospitals (Santeon network) were identified and followed up from date of diagnosis until death or end of data collection. Progression-free survival (PFS) and overall survival (OS) from first-line (1L) pembrolizumab and second-line (2L) nivolumab were compared with clinical trial data by calculating hazard ratios (HRs). Of 1950 diagnosed patients, 1005 (52%) started any 1L treatment, of whom 83 received pembrolizumab. Nivolumab was started as 2L treatment in 141 patients. For both settings, PFS times were comparable between real-world and trials (HR 1.08 (95% CI 0.75–1.55), and HR 0.91 (95% CI 0.74–1.14), respectively). OS was significantly shorter in real-world for 1L pembrolizumab (HR 1.55; 95% CI 1.07–2.25). Receiving subsequent lines of treatment was less frequent in real-world compared to trials. There is no EE gap for PFS from immunotherapy in patients with stage IV NSCLC. However, there is a gap in OS for 1L pembrolizumab. Fewer patients proceeding to a subsequent line of treatment in real-world could partly explain this.


Study population.
Patients with stage IV NSCLC diagnosed between January 1st, 2015 and December 31st, 2018 in one of six Santeon Hospitals were selected. Staging was based on the 7th edition TNM classification for the years 2015-2016 and 8th edition for incidence years 2017-2018. Characteristics that were recorded for included patients were age (at year of diagnosis), gender, ECOG PS, histology, brain metastases, pre-existing autoimmune disease, and PD-L1 expression of the tumour. Also, the type of treatment (best supportive care (BSC), chemotherapy, targeted therapy or immunotherapy), line of treatment, hospital where the patient was treated, and (in case of immunotherapy) information on immune related adverse events (irAE) and palliative radiotherapy were collected.
Identification of systemic treatments per patient. First-line treatment was defined as the initial systemic treatment that was started after diagnosis. Second-line and further-line treatment were defined as systemic treatment given after completion of the preceding line, or after its discontinuation because of disease progression. After identifying all regimens, we grouped patients into four categories: chemotherapy, treatment with tyrosine kinase inhibitors (TKIs), immunotherapy, or best supportive care. For this study we focussed on immunotherapy and identified the most frequently used drugs (pembrolizumab and nivolumab) in first and second-line in our database. Second-line nivolumab for NSCLC was introduced in the Netherlands in March 2016. In 2017, first-line pembrolizumab for NSCLC with PD-L1 tumour proportion score (TPS) ≥ 50% was introduced.
Reference outcome. After identification of the most commonly used types of immunotherapy, corresponding reference outcomes were established from clinical trials. These clinical trials were identified by a literature search on PubMed for clinical trials used for approval of immunotherapy drugs. If multiple registration studies were published with different patient populations, the study with the most comparable patient population (based on the distribution of stage, PD-L1 expression, and histology) to our cohort was chosen. We searched for updated publications of the selected studies (if applicable) to use up-to-date data and to utilize as much as possible of their follow-up time for the comparison with real-world data.
Statistical analysis. All statistical analyses were conducted using R software package version 3.6.1.
To present an overview of baseline characteristics for all treated patients, frequencies (proportions) were calculated for categorical variables, and means (standard deviations) were provided for normally distributed continuous data.
For patients who received immunotherapy in first or second-line, overall survival (OS) was calculated as the time from the start date of treatment until the date of death from any cause. Patients still alive on January 1, 2020, the end of follow-up, were censored at that date. Progression-free survival (PFS) was calculated as the time from the start date of treatment until progression according to RECIST criteria, when noted in the individual patient files. If no progression was documented, the date of death was used. Survival curves were obtained for the treatment groups using the Kaplan-Meier method.
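The analyses in this study were run in R. Purely as an illustration of the Kaplan-Meier product-limit estimate with right-censoring described above, a minimal Python sketch follows. The function names and the follow-up data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time per patient (e.g. months from start of treatment)
    events -- 1 if the event (death or progression) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up in months; 0 marks a patient censored at end of follow-up.
times = [2, 3, 3, 5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
print("median:", median_survival(curve))
```

Censored patients contribute to the number at risk up to their censoring time but never trigger a drop in the curve, which is why patients still alive at the end of follow-up can be included without biasing the estimate downwards.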
The potential existence of an efficacy-effectiveness gap was assessed in two ways. First, a so-called efficacy-effectiveness (EE) factor was calculated by dividing the patient's individual median survival by the corresponding reference OS and PFS from clinical trials 13. This factor was used to estimate the presence of an efficacy-effectiveness gap and shows how the patient's individual survival is related to survival presented in the reference RCT. As an example, an EE factor of 0.70 shows that median survival is 30% shorter in clinical practice than in RCTs. The Wilcoxon signed rank test was used to analyse the distribution of the EE factors. The null hypothesis (median OS in real-world is similar to median OS reported in clinical trials) is rejected if the distribution is significantly different from test value 1.0. Second, we assessed hazard ratios (HRs) between real-world immunotherapy treatment regimens and groups in corresponding RCTs to compare OS and PFS. This was achieved by first digitizing the Kaplan-Meier (KM) curves for the immunotherapy arm from the included clinical trials with the R-package 'digitize'. The extracted data points for survival probability were used for reconstructing the KM curve with the algorithm as described by Guyot et al. 19. The coordinates of the KM curve from the published graph were read in with the R-coding script by Guyot et al., together with the information on numbers at risk and total number of events, to reconstruct the KM data. With this reconstructed individual patient data (IPD), KM curves and Cox HRs were estimated using the R routines 'survfit' and 'coxph'.
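To make the EE factor and its test against the value 1.0 concrete, here is a minimal Python sketch (the study itself used R). It assumes fully observed survival times and ignores censoring. It also uses the normal approximation to the Wilcoxon signed-rank statistic without tie or continuity correction, which is rough for small samples. The function names, the trial median of 30 months and the survival times are hypothetical illustrations, not study data.

```python
import math


def ee_factors(survival_times, trial_median):
    """EE factor per patient: observed survival divided by the trial median."""
    if trial_median <= 0:
        raise ValueError("trial_median must be positive")
    return [t / trial_median for t in survival_times]


def wilcoxon_signed_rank(values, null_value=1.0):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.

    Returns (W_plus, p_value). Values equal to null_value are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [v - null_value for v in values if v != null_value]
    n = len(diffs)
    if n == 0:
        raise ValueError("all values equal the null value")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value


# Hypothetical survival times in months and a hypothetical trial median of 30 months.
factors = ee_factors([3, 6, 9, 12, 15, 18, 21, 24, 27, 45], trial_median=30)
w_plus, p_value = wilcoxon_signed_rank(factors, null_value=1.0)
print(factors)
print("W+ =", w_plus, "p =", round(p_value, 4))
```

A factor below 1.0 means shorter survival than the trial reference. For example, a factor of 0.70 corresponds to survival that is 30% shorter.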

Scientific Reports
Univariate and multivariate analyses for potential prognostic factors for PFS and OS in real-world were performed using Cox proportional hazards regression models. The factors with p < 0.05 on univariate analysis were included in the multivariate analysis. In this analysis, missing values were imputed by single stochastic regression imputation (single run with all available characteristics in the model). Pearson's chi-square tests were performed to test if the proportion of the prognostic factor in real-world differed from reference RCT data, if applicable.
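The comparison of a proportion between the real-world cohort and a trial arm can be sketched as a Pearson chi-square test on a 2x2 table. The Python sketch below assumes no continuity correction, since the text does not say whether one was applied. The function name and the counts are hypothetical and do not come from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Rows are the two cohorts, columns are factor present / factor absent.
    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 20 of 100 real-world patients versus 40 of 100 trial patients.
statistic, p_value = chi_square_2x2(20, 80, 40, 60)
print(round(statistic, 3), round(p_value, 4))
```

Because only aggregate trial data are available, a test of this kind compares cohort-level proportions and cannot adjust for other patient characteristics.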
Ethical statement. All methods were carried out in accordance with relevant guidelines and regulations.
The study was approved by the Santeon institutional review board, and all clinical information was provided in a de-identified fashion and informed consent was waived (SDB219-008). The study was performed in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Results
Baseline characteristics. We identified 1950 patients diagnosed with stage IV NSCLC in the period 2015 to 2018. Figure 1 provides an overview of the different treatment patterns of all patients towards best supportive care (BSC). Of these patients, 922 (47%) did not receive active anti-tumour treatment in 1L because of their ECOG PS, comorbidities, or at request of the patient. Twenty-three patients (1%) were referred to other hospitals after diagnosis, and we had no information on their treatment. Of all diagnosed patients, 1005 (52%) received 1L treatment, of whom 92 (9%) received immunotherapy in first-line. Median OS for patients receiving chemotherapy, immunotherapy or TKI was 7.5 months, 15.6 months and 15.5 months, respectively (Appendix SFigure 1).
Of all treated patients, only 365 (36%) received subsequent treatment in one of the six hospitals, of whom 200 (55%) received immunotherapy. The most frequently received 1L immunotherapy was pembrolizumab (n = 83, 90% of all patients treated with immunotherapy in 1L) and in second-line nivolumab for patients with non-squamous tumour histology (n = 141, 71%). Based on these subgroups, two registration studies (Checkmate 057 and KEYNOTE-024 2,5) were identified for the comparison with real-world data. Table 1 shows patient characteristics of patients with 1L pembrolizumab or 2L nivolumab (both monotherapy). The median age of patients with 1L pembrolizumab was 66 years and almost all patients (96%) had an ECOG PS of 0 or 1. The median age of patients with 2L nivolumab was 64 years and 95% had an ECOG PS of 0 or 1.
EE factor analysis. Median OS was shorter for all patients who received 1L pembrolizumab in real-world practice compared to the clinical trial. Table 2 shows an EE factor of 0.45 (p < 0.001 for the difference from 1), which means that median survival is 55% shorter for patients treated in clinical practice relative to median survival from the registration clinical trial. There was no significant difference in median PFS for 1L pembrolizumab, with an EE factor of 0.85 (p = 0.86). For 2L nivolumab, the EE factor for median OS is 0.65 (p = 0.065). For PFS, we found an EE factor of 1.61 (p < 0.001) (Table 2). This indicates that median PFS is 61% longer for nivolumab in clinical practice, compared to median PFS from the RCT.
From the univariate Cox proportional hazards regression model, no significant associations were found with PFS or OS for any of the potential prognostic factors in real-world patients with 1L pembrolizumab, or with PFS in real-world patients with 2L nivolumab. Higher ECOG PS (0-1 vs. ≥ 2) at start of 2L nivolumab was associated with worse OS.
When looking at explanatory factors after starting 1L pembrolizumab in real-world, we found that PFS was influenced by the occurrence of irAEs. Having an irAE was associated with a lower hazard of progression (HR 0.48; 95% CI 0.26-0.89; p = 0.019). For 2L nivolumab, the occurrence of irAEs was also an explanatory factor for longer PFS and OS (HR 0.45; 95% CI 0.30-0.68; p < 0.001 and HR 0.44; 95% CI 0.27-0.70; p < 0.001, respectively).
In real-world, the proportion of patients with irAEs was significantly lower compared to clinical trials for both 1L pembrolizumab and 2L nivolumab (31% versus 77% with p < 0.001, and 28% versus 69% with p < 0.001, respectively). Furthermore, fewer patients received a subsequent line of treatment in real-world compared to patients in clinical trials after both 1L pembrolizumab and 2L nivolumab (17% versus 36% with p = 0.002, and 28% versus 42% with p = 0.014, respectively).

Discussion
This study showed that PFS of patients with stage IV NSCLC treated with immunotherapy is comparable between real-world and clinical trials. However, OS is significantly shorter for patients with 1L pembrolizumab in real-world (median 15.8 vs. 30 months; HR 1.55 (95% CI 1.07-2.25)). This finding is in line with our previous research 13, which showed that survival after first-line chemotherapy and targeted therapy is around one quarter shorter in real-world compared to clinical trial data. The present study extends this finding to first-line pembrolizumab monotherapy in patients with ≥ 50% PD-L1 expression. To our knowledge, this is the first study that calculated hazard ratios for PFS and OS of real-world versus clinical trials. This approach showed that stage IV NSCLC patients who are treated with immunotherapy in regular clinical practice have a comparable PFS time, but that this does not translate into an OS benefit similar to that demonstrated in the registration trial. This implies that something differs between regular practice and clinical trial participants after progression on immunotherapy. One explanation from our data could be that clinical trial participants received a next line of treatment more often than real-world patients (two times more often in the 1L setting, and 1.5 times in the 2L setting). Patients who received immunotherapy through participating in a clinical trial, for example, might have an above-average intrinsic willingness to search for further treatment options after treatment failure. Other possible explanations could be that clinics active in trial enrolment communicate more about remaining experimental treatments, or that patient characteristics other than performance status limit the tolerability of subsequent systemic treatment (in this case chemotherapy) after progression on immunotherapy to a larger extent in regular care patients.
The results on median OS and PFS as found in our study are largely in line with previous observational studies on 1L immunotherapy treatment with pembrolizumab in real-world. Velcheti et al. 20 evaluated real-world survival in patients with metastatic NSCLC with a PD-L1 expression of ≥ 50% and ECOG PS ≤ 1, and found a median PFS of 6.8 months and median OS of 19.1 months. Their PFS and OS are slightly shorter than found in the KEYNOTE-024 clinical trial, but real-world OS is longer than was found in our study, which could be explained by only including patients with ECOG PS 0 or 1, or by the higher percentage of patients with second-line systemic therapy as compared to our cohort (28% vs. 17%, respectively). A French study which included patients with advanced NSCLC and PD-L1 expression of ≥ 50% reported a median PFS of 10.1 months and a median OS of 15.2 months for patients treated with pembrolizumab in first-line, including patients with brain metastases and ECOG PS 2 21. As in the current study, their PFS is comparable with the median PFS from the KEYNOTE-024 trial of 10.3 months, whereas OS is shorter (comparable to our real-world findings). Tambo et al. 22 found a median PFS of 6.1 months, which is shorter than the median PFS from both the registration study and the current study. Shorter PFS may be explained by inclusion of patients with ECOG PS 2 (12%) and 3-4 (11%). Median OS was not reached in their study and therefore could not be compared.
The outcomes found for 2L nivolumab are also in line with previous studies with real-world data. Crinò et al. 23 found a median PFS for patients treated with nivolumab in second or further line of 4.2 months and a median OS of 7.9 months, which is comparable with the present findings (3.8 and 8.2 months, respectively). Another study in Dutch patients treated with second-line nivolumab reported a median PFS and OS of 2.6 and 10.0 months, respectively 24. Grossi et al. 25 reported results from the expanded access program in Italy on 1588 patients, including patients with ECOG PS 2 and aged ≥ 75 years, where median PFS was 3.0 months and median OS 11.3 months. The results of those three studies on real-world nivolumab in second-line suggest that results for PFS and OS in real-world and RCTs are indeed comparable. An interesting finding in our data is that median PFS with 2L nivolumab was significantly longer than in the clinical trial, while the overall hazard did not differ (HR 0.91). An explanation could be a difference in how the date of progression is determined between the two settings. To illustrate, in Checkmate 057 disease progression was assessed nine weeks after start of treatment and every six weeks thereafter, compared to every eight weeks in real-world. Because median PFS in the 2L setting is in the range of weeks, this relatively small difference in timing could noticeably shift the measured median. Besides this, our finding also highlights that comparing median PFS times only could lead to biased conclusions about relative effectiveness.
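The effect of the assessment schedule can be illustrated with a small Python sketch. It assumes that progression is only ever recorded at a scheduled scan, so clinical progression between scans and death are ignored, and that the first real-world scan takes place at week 8. Neither assumption is stated in the text. The function name and the true progression weeks are hypothetical.

```python
def detection_week(true_progression_week, first_scan_week, scan_interval_weeks):
    """Week of the first scheduled scan at or after the true progression week."""
    week = first_scan_week
    while week < true_progression_week:
        week += scan_interval_weeks
    return week


recorded = {}
for true_week in (7, 10, 16):
    # Trial schedule as described for Checkmate 057: week 9, then every 6 weeks.
    trial = detection_week(true_week, first_scan_week=9, scan_interval_weeks=6)
    # Assumed real-world schedule: every 8 weeks, starting at week 8.
    practice = detection_week(true_week, first_scan_week=8, scan_interval_weeks=8)
    recorded[true_week] = (trial, practice)
    print(f"true week {true_week}: trial scan week {trial}, practice scan week {practice}")
```

Under these assumptions the same true progression time is recorded at different weeks in the two settings, and the direction of the shift depends on where the true time falls relative to each schedule.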
Regarding prognostic factors, for patients treated with 2L nivolumab, the negative association between a higher (worse) ECOG PS and OS is in line with previous research. Crinò et al. 23 also found ECOG PS to be a poor prognostic factor. Additionally, the results from Schouten et al. 24 from the Dutch expanded access program and the study by Martin et al. 26 show similar results. A Dutch study on real-world effectiveness of ICIs in patients with stage IV NSCLC in first and second-line also shows that an ECOG PS ≥ 2 is a poor prognostic factor 14. Additionally, in a phase 3B/4 community-based study of nivolumab monotherapy in previously treated patients with advanced NSCLC including patients with poor performance status (Checkmate 153), a median overall survival of 4.0 months was found in patients with ECOG PS 2 27. In addition to the association between ECOG PS ≥ 2 and worse outcome for patients with 2L nivolumab in real-world, the proportion of patients with ECOG PS ≥ 2 in real-world is significantly higher compared to the clinical trial. This confirms the general view that trials select fitter patients, whereas patients with a higher ECOG PS also receive immunotherapy in clinical practice, which contributes to an EE gap.
Besides this, our data also confirms the relationship between irAEs and outcomes of immunotherapy. Similar to Lisberg et al. 28 who retrospectively analysed the relationship in patients that were treated with pembrolizumab in KEYNOTE-001, we found improved survival for patients with irAEs. Another study by Haratani et al. 29 also revealed that irAEs were positively associated with survival outcome in nivolumab treated patients. Ricciuti et al. 30 confirmed the positive relationship between the occurrence of irAEs and survival for patients with 2L nivolumab as well.
The main strengths of this study are its complete and precise data, nationwide multicentre approach, and that PFS and OS were compared both through medians and through a proportional hazards approach. The latter method better accounts for the many patients who were still alive at the time of the analyses and provides a unique opportunity to compare survival dynamics not captured by a single value for median survival.
Limitations of this study are that (patient) factors potentially responsible for an efficacy-effectiveness gap could only be analysed on the cohort level instead of patient level, because individual patient data from the respective clinical trials were not available. Had these been available, they could have been analysed in a multivariable Cox regression model, potentially leading to identification of additional factors associated with the EE gap. Better identification of these factors could lead to trial designs that better match real-world populations and eventually to smaller EE gaps. Another limitation is that the treatment patterns shown in Fig. 1 no longer match current practice (the introduction of the 1L immunochemotherapy combination limits the use of 2L immunotherapy). Nevertheless, this does not compromise the relative effectiveness assessments within specific treatments as conducted in our study.
Apart from the well-known complexity of how to translate progression free survival times to overall survival when survival data of the trial cohort is not mature yet 31,32, the present study shows that this translation can be even more complex later on when it comes to generalizability to real-world settings. Considering that cost-effectiveness assessments are often based on overall survival data from trials, our observed shorter overall survival time after 1L pembrolizumab in real-world compared to the clinical trial data could provide an argument to rework the initial cost-effectiveness assessment.

Conclusion
There is no efficacy-effectiveness gap for the outcome PFS for immunotherapy in patients with stage IV NSCLC. However, there is a gap in OS for first-line pembrolizumab. Fewer patients proceeding to a subsequent line of treatment in real-world could partly explain this. PFS and OS results from clinical trials can differ in generalizability to regular clinical practice.