Relative Efficacy of Checkpoint Inhibitors for Advanced NSCLC According to Programmed Death-Ligand-1 Expression: A Systematic Review and Network Meta-Analysis

Although currently available immune checkpoint inhibitors with similar but slightly different indications are recommended for patients with advanced non-small cell lung cancer (NSCLC), their effects by programmed death-ligand-1 (PD-L1) expression level are not yet known. This meta-analysis aims to assess the survival benefit and comparative efficacy of checkpoint inhibitors according to PD-L1 expression level: <1%, 1–49%, and ≥50%. We searched the MEDLINE, EMBASE, and Cochrane databases through December 2017. A fixed-effect Bayesian network meta-analysis (NMA) was performed to estimate hazard ratios (HRs) for overall survival (OS) with 95% credible intervals (CrIs). Seven trials including 3688 patients were selected from among the 673 screened studies. Checkpoint inhibitors improved OS over chemotherapy markedly more in the PD-L1 ≥ 50% subgroup than in the PD-L1 < 1% and PD-L1 1–49% subgroups. Atezolizumab, nivolumab, and nivolumab were the most effective agents for second- or later-line settings in the PD-L1 < 1%, PD-L1 1–49%, and PD-L1 ≥ 50% subgroups, respectively. PD-L1 expression ≥50% on tumor cells could be a reliable indicator for patient selection in view of cost-efficiency, and the checkpoint inhibitor found to be the best agent at each PD-L1 expression level could be cautiously recommended in the corresponding subgroup.

Data extraction. When a trial was reported in multiple sources, we extracted the data with the longest follow-up, including updated survival analyses from meeting abstracts. The following items were abstracted from each included study: trial name, year of publication, treatment details, line of treatment, PD-L1 diagnostic assay, clinical information on the study patients (age, never-smoker status, and histology), and the number of patients in each of the three PD-L1 expression subgroups. The HRs with corresponding 95% confidence intervals (CIs) for overall survival (OS) were extracted from the included articles.
All included trials reported HRs and 95% CIs for OS in patients with PD-L1 < 1%, PD-L1 ≥ 50%, or PD-L1 ≥ 1%. To calculate HRs and 95% CIs for the PD-L1 1–49% subgroup of each trial, we assumed that combining the log HR and its standard error for PD-L1 1–49% with the log HR and its standard error for PD-L1 ≥ 50% by fixed-effect meta-analysis using the inverse-variance method would yield the HR and its 95% CI for PD-L1 ≥ 1% 14 . Because we had extracted HRs and 95% CIs for PD-L1 ≥ 1% and PD-L1 ≥ 50%, the HR and 95% CI for PD-L1 1–49% could be back-calculated for each trial. To test this assumption, we extracted and combined the HRs reported for two mutually exclusive subgroups (e.g., male and female, non-squamous and squamous) in all included articles, and checked whether the combined HRs corresponded to the HRs reported for the entire population, since such subgroups relate to the entire population in the same way that PD-L1 1–49% and PD-L1 ≥ 50% relate to PD-L1 ≥ 1%. With this approach, the pooled HRs were nearly identical to the reported HRs for the overall population (error ≤ 0.01). Two authors (J.K. and J.H.L.) abstracted the data independently using a predefined data sheet, and two other authors (J.C. and M.H.L.) resolved discrepancies in the extracted data. Two reviewers (J.K. and J.C.) assessed the quality of the included studies using the Cochrane Collaboration risk-of-bias tool.
Data synthesis and analysis. Because the included trials are well-designed randomised trials that are similar in important respects, such as patient characteristics and outcome measurement, and because few trials constitute each edge of the network, a fixed-effect model was considered appropriate. An NMA using HRs for OS was conducted in the Bayesian framework using JAGS and the GeMTC package in R (https://drugis.org/software/r-packages/gemtc) 15,16 .
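The back-calculation of the PD-L1 1–49% hazard ratio described under Data extraction can be sketched as follows. This is a minimal illustration of the fixed-effect inverse-variance identity, not the authors' actual code: the function names are hypothetical, the standard error is assumed to be recoverable from a symmetric 95% CI on the log scale, and the input numbers are invented for demonstration rather than taken from any included trial.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, lower, upper):
    """Convert an HR and its 95% CI into (log HR, standard error)."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z95)


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (log HR, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * theta for w, (theta, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1 / total)


def back_calculate(pooled, known):
    """Recover the missing subgroup's (log HR, SE) from the pooled estimate
    and the one known subgroup, by inverting the pooling formula."""
    w_pooled = 1 / pooled[1] ** 2
    w_known = 1 / known[1] ** 2
    w_missing = w_pooled - w_known
    if w_missing <= 0:
        raise ValueError("known subgroup is more precise than the pooled estimate")
    theta = (w_pooled * pooled[0] - w_known * known[0]) / w_missing
    return theta, math.sqrt(1 / w_missing)


def to_hr_ci(theta, se):
    """Convert (log HR, SE) back to an HR with its 95% CI."""
    return math.exp(theta), math.exp(theta - Z95 * se), math.exp(theta + Z95 * se)


# Invented inputs for illustration only (not data from the included trials).
pd_l1_ge1 = log_hr_and_se(0.70, 0.60, 0.82)   # stands in for PD-L1 >= 1%
pd_l1_ge50 = log_hr_and_se(0.55, 0.42, 0.72)  # stands in for PD-L1 >= 50%

pd_l1_1_49 = back_calculate(pd_l1_ge1, pd_l1_ge50)
print("Back-calculated HR (95%% CI): %.2f (%.2f-%.2f)" % to_hr_ci(*pd_l1_1_49))
```

Pooling the back-calculated subgroup with the known subgroup through `pool_fixed_effect` reproduces the input for PD-L1 ≥ 1%, which mirrors the consistency check performed on mutually exclusive subgroups such as male and female.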
To estimate relative HRs for OS, a Markov Chain Monte Carlo simulation was performed with 5,000 adaptations and 20,000 iterations of each of the four automatically generated Markov chains. After all simulations were performed, the NMA calculated the probability that each treatment would be best by calculating the percentage of simulations in which a certain treatment ranked first. Non-informative priors were chosen for the between-studies standard deviation and the relative effects of treatment. Heterogeneity in the network was evaluated via the standard deviation within each pairwise meta-analysis.
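The "probability of being best" calculation described above can be sketched as follows. This is a generic illustration rather than the GeMTC implementation: the function name is hypothetical, the treatment labels are placeholders, and the normally distributed draws merely stand in for posterior samples that would in practice come from the Markov chains.

```python
import random
from collections import Counter


def probability_best(draws):
    """Share of iterations in which each treatment ranks first.

    draws maps a treatment name to its sampled log HRs against a common
    comparator, one value per iteration; a lower log HR means better OS.
    """
    names = list(draws)
    n_iter = len(draws[names[0]])
    if any(len(draws[name]) != n_iter for name in names):
        raise ValueError("all treatments need the same number of draws")
    wins = Counter()
    for i in range(n_iter):
        best = min(names, key=lambda name: draws[name][i])
        wins[best] += 1
    return {name: wins[name] / n_iter for name in names}


# Stand-in for posterior samples: 4 chains x 20,000 iterations per treatment.
# Means and standard deviations are invented for illustration only.
random.seed(1)
n_draws = 4 * 20_000
samples = {
    "treatment A": [random.gauss(-0.35, 0.10) for _ in range(n_draws)],
    "treatment B": [random.gauss(-0.30, 0.08) for _ in range(n_draws)],
    "chemotherapy": [0.0] * n_draws,  # reference arm, log HR fixed at 0
}

for name, prob in probability_best(samples).items():
    print(f"{name}: {prob:.3f}")
```

Because exactly one treatment ranks first in each iteration, the probabilities sum to one across treatments.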
To provide more practical information for clinical use and to reduce the heterogeneity between studies that used the same checkpoint inhibitor, we also conducted a subgroup NMA restricted to trials performed in second- or later-line settings.
Publication bias could not be assessed because of the small number of trials included in the pairwise comparisons. The quality of the included studies was good on average and is summarised in Supplementary Fig. S1. All trials had a high risk of bias for blinding of participants and personnel owing to their open-label designs. Random sequence generation and allocation concealment were reported appropriately in the Keynote 010, POPLAR, and OAK trials. The CheckMate 017, CheckMate 057, and OAK trials had an unclear risk of detection bias, which is judged by whether treatment outcomes are assessed by an independent third-party reviewer. Attrition, reporting, and other biases were not detected in any of the trials.
Scientific Reports | (2018) 8:11738 | DOI:10.1038/s41598-018-30277-0

Discussion
Several factors complicate clinical decision-making about the use of checkpoint inhibitors for advanced NSCLC. First, three different agents with similar mechanisms of action are available. Second, each agent has a similar but slightly different indication with respect to PD-L1 expression level. Third, although PD-L1 expression testing is approved as a companion or complementary diagnostic by the US Food and Drug Administration, questions about its clinical significance persist. Against this background, we evaluated the efficacy of checkpoint inhibitors by PD-L1 expression level and used Bayesian simulations to calculate the probability of each agent being the best treatment in second- or later-line settings. This network meta-analysis demonstrated that checkpoint inhibitors improved OS over chemotherapy in all three subgroups, with a markedly greater effect in the PD-L1 ≥ 50% subgroup than in the PD-L1 < 1% and PD-L1 1–49% subgroups. It also provided the rank order of the treatments for second- or later-line settings.
A trend towards a linear relationship between the PD-L1 expression level on tumor cells and the efficacy of checkpoint inhibitors has been reported in advanced NSCLC 1-3,5,6,9-13 . Based on this observation, the Keynote 024 trial, which compared first-line platinum-based chemotherapy with pembrolizumab, reported positive results by strictly selecting a predefined population with PD-L1 expression on at least 50% of tumor cells 4,17 . However, the Checkmate 026 trial, which compared first-line chemotherapy with nivolumab in patients with PD-L1 ≥ 1%, did not show a significant survival benefit of nivolumab even in those with PD-L1 ≥ 50% 8 . Various factors may explain why the results of Checkmate 026 differed from those of Keynote 024 in the PD-L1 ≥ 50% subgroup. The most plausible explanations are a lack of power to detect an actual benefit of nivolumab in Checkmate 026, owing to a non-predefined design, and an imbalance in the number of patients with PD-L1 ≥ 50% treated with nivolumab versus chemotherapy (88 vs 126) 18 . In our study, the PD-L1 ≥ 50% subgroup, which included 1284 patients, showed a substantially greater benefit from checkpoint inhibitors than the PD-L1 < 1% or PD-L1 1–49% subgroups, suggesting that PD-L1 expression ≥50% would be a reliable indicator for patient selection in view of cost-efficiency.
Each current checkpoint inhibitor has its own IHC assay platform, and differences between assay methods may lead to inappropriate interpretation of results and treatment decisions. For this reason, efforts have been made to evaluate the comparability of the various IHC assays 19-22 . Most studies reported that two PD-L1 IHC assays (Dako 22C3 and Dako 28-8) performed similarly for tumor cell staining of PD-L1, while SP142 showed less tumor cell staining than the others 20-22 . Additionally, the SP142 assay quantified PD-L1 expression on tumor-infiltrating immune cells as well as on tumor cells. Therefore, the results for atezolizumab in this meta-analysis should be interpreted cautiously.
In our study, the distribution of patients by PD-L1 expression in the trials 1,2,5,6 that recruited patients regardless of PD-L1 expression was 786 (43%) with PD-L1 < 1%, 700 (39%) with PD-L1 1–49%, and 323 (18%) with PD-L1 ≥ 50% (Table). These findings were comparable to those of a study that reported the prevalence of PD-L1 expression in patients screened for enrollment in three pembrolizumab trials: Keynote-001, -010, and -024 23 . In that study, 4784 patients were assessed for PD-L1 expression; 1596 (33%) had PD-L1 < 1% on tumor cells, 1832 (38%) had PD-L1 1–49%, and 1356 (28%) had PD-L1 ≥ 50%. From these statistics, it is possible to estimate the approximate distribution of patients with advanced NSCLC according to PD-L1 expression. The relatively even distribution of PD-L1 expression in advanced NSCLC, together with the similar clinical indications of the checkpoint inhibitors, could make clinical decisions difficult for physicians. Our study might help to resolve this issue by dividing PD-L1 expression into three mutually exclusive subgroups, demonstrating the relative efficacies of the checkpoint inhibitors, and suggesting the best agent for each subgroup.
The heterogeneity between first-line studies 4,8 and second- or later-line studies 1-3,5,6 may arise because first-line chemotherapy can affect cancer immunogenicity 24 and because trials performed in first-line settings allowed crossover from the chemotherapy arm to the checkpoint inhibitor arm at disease progression 4,8 , whereas second- or later-line trials did not allow crossover 1-3,5,6 . Indeed, preclinical data in a lung cancer mouse model demonstrated that the use of cyclophosphamide and oxaliplatin was associated with immune response stimulation, thereby producing a synergistic effect with checkpoint inhibitors 24 . Consistent with the preclinical data, the data we extracted from the original studies showed that, relative to chemotherapy, second- or later-line checkpoint inhibitors had a greater effect than first-line checkpoint inhibitors in the same PD-L1 expression subgroup. For this reason, we conducted a subgroup NMA restricted to second- or later-line trials to control the heterogeneity between studies of the same checkpoint inhibitor and to provide data that are more useful in clinical practice. Although our study investigated checkpoint inhibitors as single agents, a study of a first-line immune checkpoint inhibitor combined with cytotoxic chemotherapy for metastatic non-squamous NSCLC was recently published 25 . Pembrolizumab combination regimen, consisting of pemetrexed, a platinum-based drug, and
Figure 2. Forest plot of meta-analysis comparing checkpoint inhibitors vs chemotherapy for overall survival by PD-L1 expression. The size of the squares reflects the weight of the study in the meta-analysis. The effect size of each individual trial represents the extracted hazard ratio and 95% confidence interval, and the pooled effect size represents the combined hazard ratio and 95% credible interval from the meta-analysis. The combined effects were calculated with a Bayesian fixed-effect model. PD-L1: programmed death-ligand-1.
The effect size of each individual trial represents the extracted hazard ratio and 95% confidence interval, and the pooled effect size represents the combined hazard ratio and 95% credible interval from the network meta-analysis. The combined effects were calculated with a Bayesian fixed-effect model. PD-L1: programmed death-ligand-1.
Our study has several strengths. This meta-analysis was performed using the most recently updated survival analysis of each trial, with relatively sufficient follow-up, which supports the credibility of the data used. Moreover, we separated PD-L1 expression into three mutually exclusive subgroups, which could help clinical decision-making based on each patient's PD-L1 status. A previous study also analysed the relative effects of checkpoint inhibitors in second- or later-line settings for advanced NSCLC 26 . However, that study evaluated efficacy by PD-L1 expression level in an overlapping manner, such as ≥1%, ≥5%, ≥10%, and ≥50%. Our work differs in that the hazard ratios for overall survival in the 1–49% PD-L1 expression range were computed in a robust way, providing more practical evidence. We also included 3688 patients with measurable PD-L1 expression from seven randomised controlled trials, thereby securing adequate power to detect genuine differences. Nevertheless, our study has several limitations. First, the HRs and 95% CIs for the PD-L1 1–49% subgroup were calculated indirectly by the inverse-variance formula rather than extracted from the trials. Although we found this estimation to be a reasonable approximation, caution is needed when interpreting the results for the PD-L1 1–49% group. For example, a survival analysis of the Keynote 010 trial 27 presented at the 2016 ASCO annual meeting reported a pooled HR of 0.75 (95% CI, 0.62–0.91) for the PD-L1 1–49% subgroup, whereas our calculation gave an HR of 0.76 (95% CI, 0.64–0.89). The difference between the two values is small, particularly given that our data came from a longer follow-up period 10 , which suggests that our calculation is a reasonably robust estimate.
Second, as mentioned above, the VENTANA SP142 assay has slightly different properties compared to those of the other two tests, Dako 22C3 and 28-8. Third, other potential effect modifiers, such as previous radiotherapy history or imbalance in the number of patients between the checkpoint inhibitors and chemotherapy groups by PD-L1 expression subgroups were not considered. Despite these limitations, to the best of our knowledge, this NMA is the first study that performs a pooled analysis of seven checkpoint inhibitor trials with a focus on PD-L1 expression status.
In conclusion, in patients with advanced NSCLC, checkpoint inhibitors showed a markedly greater effect in the PD-L1 ≥ 50% subgroup than in the PD-L1 < 1% or PD-L1 1–49% subgroups. The subset NMA of the second- or later-line trials demonstrated the probability of each checkpoint inhibitor being the best treatment at each PD-L1 expression level. Based on our results, we cautiously recommend atezolizumab, nivolumab, and nivolumab in patients with PD-L1 < 1%, PD-L1 1–49%, and PD-L1 ≥ 50%, respectively.