Introduction

Several oncology societies have developed tools to quantify the magnitude of clinical benefit of drugs for the treatment of solid tumors. These include the American Society of Clinical Oncology Value Framework (ASCO-VF)1,2, the European Society for Medical Oncology Magnitude of Clinical Benefit Scale (ESMO-MCBS)3,4, the National Comprehensive Cancer Network (NCCN) Evidence Blocks5 and the ASCO Cancer Research Committee criteria (ASCO-CRC)6.

The magnitude of clinical benefit does not influence regulatory approval of cancer drugs. Approval by regulatory agencies such as the US Food and Drug Administration (FDA) requires substantial evidence of safety and efficacy from adequate and well-controlled trials irrespective of the magnitude of such benefit7. Advances in the understanding of the molecular basis of cancer have led to rapid development of new drugs and an increasing number of cancer drug approvals based on non-randomized trials8,9. Despite this, randomized controlled trials (RCTs) remain the gold standard to evaluate the benefits and risks of new cancer therapies10. However, RCTs have limitations, including the increasing use of intermediate endpoints which have not been validated as true surrogates for definitive outcomes11,12 and the choice of control group therapy, which may not always reflect the contemporary standard of care13,14,15,16.

While there is an extensive literature exploring the associations between the magnitude of clinical benefit and characteristics of new drugs as well as the clinical trial design supporting their approval9,17,18,19, much less is known about the influence of control group therapy on the output of clinical benefit frameworks. Knowledge of the impact of control group therapy could aid in the design of clinical trials (such as expected effect size and influence17 on quality of life (QoL) assessment), inform drug reimbursement decisions by payers and provide feedback to the developers of value frameworks. In this article, we quantify the proportion of RCTs meeting thresholds for substantial clinical benefit at the time of FDA marketing approval and assess the association between characteristics of control group therapy and magnitude of clinical benefit. We hypothesized that the difference in magnitude of clinical benefit would be greater in trials in which control group therapy has minimal or no overlap with the experimental group.

Methods

Data sources

We searched the Drugs@FDA database20,21 to identify applications for approvals of cancer drugs for solid tumors from January 1, 2012, to December 31, 2021. We excluded drugs approved for hematologic malignancies, for pediatric populations and non-therapeutic agents such as medical devices and diagnostic or contrast agents. Then, we excluded applications which were based exclusively on single-arm or non-randomized trials as well as non-inferiority or equivalence studies. Finally, we searched MEDLINE (host: PubMed)22 to identify primary publications of clinical trials supporting FDA approvals.

Data extraction

Four authors (C.M., J.C.T., A.B. and A.T.) extracted data using predesigned electronic forms. The following characteristics were collected for each application: approval date, approval indication, drug and brand name, cancer site, number of trials supporting approval (one vs. more than one trial), application number, whether or not approval was based on a subgroup analysis, submission type (initial vs. supplemental), type of approval (accelerated vs. regular approval) and regulatory pathways, including priority or standard review, breakthrough or non-breakthrough therapy designation, and orphan or non-orphan drug designation, as determined by the FDA20,21,23,24. We paid specific attention to drug class (chemotherapy, hormone therapy, immunotherapy, or targeted therapy) for both the experimental and control groups, defined by the authors. Control arm therapy was divided into two groups: (1) active treatment group, defined as a control arm comprising an active anticancer drug such as chemotherapy, hormone therapy, immunotherapy and/or targeted therapy, and (2) non-active treatment group, defined as a control arm comprising placebo and/or best supportive care alone. Subsequently, the active treatment group was further divided into (1) active treatment plus placebo and (2) active treatment without placebo. Among the active treatment plus placebo subgroup, matched placebo was defined as an RCT in which there was overlap between experimental and control arm active treatments, with the control arm containing a placebo while the experimental arm comprised an additional active experimental therapy (e.g. chemotherapy plus immunotherapy in the experimental arm vs. the same chemotherapy regimen plus placebo in the control arm)25. Finally, when possible, we assessed the quality of the control group therapy (optimal vs. suboptimal) and considered control group therapy suboptimal if prior RCT data showed that the control agent was inferior to an available alternative, based on methods reported previously13.
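The control arm categories described above can be sketched as a small decision rule. This is only an illustration and not the authors' extraction form: the `Arm` class, the drug labels and the proper-subset test used to operationalize "matched placebo" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    # Hypothetical representation of a trial arm: the set of active
    # anticancer drugs it contains and whether it includes a placebo.
    active_drugs: frozenset
    has_placebo: bool = False


def classify_control(experimental: Arm, control: Arm) -> str:
    """Assign a control arm to one of the categories used in the text."""
    # Placebo and/or best supportive care alone: no active anticancer drug.
    if not control.active_drugs:
        return "non-active"
    if not control.has_placebo:
        return "active without placebo"
    # Assumed operationalization of "matched placebo": every active control
    # drug is also given in the experimental arm, which adds at least one
    # further active drug (proper-subset test).
    if control.active_drugs < experimental.active_drugs:
        return "active plus placebo (matched)"
    return "active plus placebo (unmatched)"


# Hypothetical example mirroring the one in the text: chemotherapy plus
# immunotherapy vs. the same chemotherapy plus placebo.
experimental_arm = Arm(frozenset({"chemo_x", "immuno_y"}))
control_arm = Arm(frozenset({"chemo_x"}), has_placebo=True)
category = classify_control(experimental_arm, control_arm)
```

In this hypothetical case `category` is the matched placebo label, because the only active control drug is also part of the experimental regimen.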

We also collected data on whether a companion diagnostic test was available, as defined by the FDA26. In addition, the following characteristics were collected for each RCT: setting (curative vs. palliative), study design (open-label vs. blinded), phase (II vs. III), sample size, crossover, and primary efficacy endpoint (overall survival [OS] vs. intermediate endpoint as defined by the FDA27). For RCTs with co-primary endpoints, we identified the most definitive primary endpoint chosen by the FDA to support approval. Finally, toxicity data and, when available, QoL data were extracted from published articles. A drug was considered to have shown a QoL benefit if a statistically significant difference was reported between the experimental arm and baseline among RCTs based on a global score, a subscale, or a specific item from a validated patient-reported outcome instrument.

Data scoring

Three authors (C.M., J.C.T. and A.B.) scored each RCT with four different frameworks: ESMO-MCBS version 1.1 (ref. 4), ASCO-VF version 2 (ref. 2), NCCN Evidence Blocks5, and ASCO-CRC6. Discrepancies were resolved by a fourth author (A.T.). If more than one RCT supported a single application, each trial was evaluated separately and assigned a separate grade.

Substantial clinical benefit was defined as recommended in prior studies. The ASCO-CRC published targets for clinically meaningful benefit using a single cutoff in clinical trials for 4 cancer types (pancreatic cancer, lung cancer, triple-negative breast cancer, and colon cancer): OS improvements ranging from 2.5 to 6 months and progression-free survival (PFS) improvements ranging from 3 to 5 months. Consistent with prior studies18,28, we expanded this definition to RCTs of all solid tumors in the palliative setting28. For other scales, the following cutoffs were utilized: ASCO-VF threshold score ≥ 45 (applied in palliative and curative setting)29; NCCN Evidence Blocks threshold score ≥ 16 (applied in palliative and curative setting)18; and a grade of A or B for trials of curative intent and 4 or 5 for those of non-curative intent using ESMO-MCBS3,4.
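The three fixed cutoffs above can be written as simple predicates. This is a minimal sketch: the function names and input encodings are assumptions for illustration, and ASCO-CRC is deliberately left out because its targets are tumor-specific and are given in the text only as ranges.

```python
def esmo_mcbs_substantial(grade, curative: bool) -> bool:
    """ESMO-MCBS: grade A or B (curative intent) or 4 or 5 (non-curative intent)."""
    normalized = str(grade).strip().upper()
    return normalized in ({"A", "B"} if curative else {"4", "5"})


def asco_vf_substantial(score: float) -> bool:
    """ASCO-VF: score of 45 or more, in both palliative and curative settings."""
    return score >= 45


def nccn_substantial(total_score: int) -> bool:
    """NCCN Evidence Blocks: total score of 16 or more, in both settings."""
    return total_score >= 16


# Hypothetical trial scored with all three frameworks (values are invented).
example = {
    "ESMO-MCBS": esmo_mcbs_substantial(4, curative=False),
    "ASCO-VF": asco_vf_substantial(38.5),
    "NCCN": nccn_substantial(17),
}
```

A single trial can therefore meet the threshold on one scale and miss it on another, as the invented scores in `example` do.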

Statistical analysis

Data were reported descriptively as proportions, medians, and ranges. Associations between characteristics of control group therapy and substantial clinical benefit scores were explored using logistic regression as were associations between application and clinical trial characteristics and magnitude of clinical benefit. Multivariable analysis was planned only if there were sufficient data to fit a multivariable model adequately. Results of logistic regression were reported as odds ratios (ORs) and their respective 95% confidence intervals (CIs). Sensitivity analyses were performed excluding trials in the curative setting. Additionally, a post-hoc sensitivity analysis was performed to examine the role of QoL and toxicity on substantial benefit measured by ESMO-MCBS and ASCO-VF. In this analysis, we rescored trials without QoL and toxicity data and repeated the analyses described above. All analyses were conducted using SPSS Statistics, version 25 (IBM Corp, Armonk NY). Statistical tests were 2-sided, and statistical significance was defined as a 2-tailed P value < 0.05.
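For a single binary predictor, the odds ratio from univariable logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, and the Wald interval can be computed directly from the cell counts. The sketch below illustrates that calculation in Python rather than SPSS, which was the software actually used; the function name and the counts are hypothetical and are not study data.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Odds ratio, 95% Wald CI and 2-sided P value from a 2 x 2 table.

    a: exposed trials with substantial benefit
    b: exposed trials without substantial benefit
    c: unexposed trials with substantial benefit
    d: unexposed trials without substantial benefit
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return math.exp(log_or), lower, upper, p_value


# Hypothetical counts: 10 of 30 exposed trials and 30 of 45 unexposed trials
# meet the threshold, giving OR = (10 * 15) / (20 * 30) = 0.25.
odds_ratio, ci_lower, ci_upper, p_value = odds_ratio_wald(10, 20, 30, 15)
```

An interval that excludes 1 corresponds to P < 0.05 under this Wald approach. Sparse tables with zero cells need a different method, which is one reason small subgroups are hard to model.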

Use of experimental animals and/or human participants’ statement

Live animals and/or humans were not involved in this study.

Results

Study cohort

We identified 171 RCTs supporting the approval of 76 new cancer drugs for 164 solid tumor indications between January 1, 2012, and December 31, 2021. Among the 164 applications, approval was based on 1 RCT in 158 (96%), on 2 RCTs in 5 (3%) and on 3 RCTs in 1 (1%). Of the 171 RCTs included, one trial included two different cohorts (germline and non-germline BRCA mutation carriers)30 and two trials each supported approval of two different indications (one trial for pembrolizumab plus chemotherapy and pembrolizumab as a single agent31 and another trial for nivolumab plus ipilimumab and nivolumab as a single agent32). Consequently, a total of 174 data points were available for analysis (see Fig. 1).
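The counts in this paragraph can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those reported above.

```python
# Applications grouped by the number of supporting RCTs (from the text).
applications_by_rct_count = {1: 158, 2: 5, 3: 1}

total_applications = sum(applications_by_rct_count.values())
total_rcts = sum(n_rcts * n_apps for n_rcts, n_apps in applications_by_rct_count.items())

# One trial contributed a second cohort and two trials each contributed a
# second indication, so three data points are added to the RCT count.
extra_cohorts = 1
extra_indications = 2
data_points = total_rcts + extra_cohorts + extra_indications

# Share of applications in each group, rounded to whole percentages.
percentages = {
    n_rcts: round(100 * n_apps / total_applications)
    for n_rcts, n_apps in applications_by_rct_count.items()
}
```

The tallies reproduce the 164 applications, 171 RCTs and 174 data points summarized in Fig. 1.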

Figure 1

Summary of applications (n = 164), RCTs (n = 171) supporting the FDA application approvals and final data points (n = 174) analyzed in our study. RCTs Randomized Controlled Trials, FDA US Food and Drug Administration.

Tables 1 and 2 summarize the characteristics of included applications and trials supporting drug approval at the time of market authorization.

Table 1 Characteristics of applications.
Table 2 Characteristics of RCTs.

Value framework scores

The ESMO-MCBS version 1.1 scores could be applied to 172 of 174 trials (99%). Among these, 77 trials (45%) met the threshold for substantial clinical benefit. ASCO-VF version 2 scores were applied to 170 of 174 trials (98%). Of these, 79 trials (46%) met the ASCO-VF scores for substantial clinical benefit. NCCN Evidence Blocks were applied to 150 trials (86%) of which 108 (72%) met the threshold for high clinical benefit. Finally, ASCO-CRC criteria were applicable to 135 (76%) trials in the noncurative setting. Of these, 99 (73%) met the criteria for substantial clinical benefit. When we rescored trials without QoL and toxicity data using ESMO-MCBS and ASCO-VF, 49 (28%) and 68 (40%) trials met the threshold for substantial clinical benefit, respectively.

Association between control group therapy and clinical benefit

Of the 174 RCTs included in the analysis, 52 (30%) had non-active treatment such as placebo and/or best supportive care (BSC) in the control arm and 122 (70%) had an active treatment within the control arm such as chemotherapy, hormone therapy, immunotherapy and/or targeted therapy. Among RCTs with active treatment plus placebo in the control arm, 34 (28%) were matched placebo (see Fig. 2). In total, 17 (10%) trials used a suboptimal control group therapy.

Figure 2

Types of control group therapy. (A) Of 174 RCTs analyzed, 70% had an active therapy and 30% had non-active therapy in the control arm. Among RCTs with active therapy in the control arm, 28% were matched placebo (e.g., active therapy plus placebo in the control arm vs. the same active therapy plus an additional drug in the experimental arm). (B) Active therapy group was defined as control arm comprising an active anticancer drug such as chemotherapy, hormone therapy, immunotherapy and/or targeted therapy and non-active therapy group was defined as control arm comprising placebo and/or best supportive care alone. Other = Granulocyte–macrophage colony-stimulating factor. RCTs Randomized Controlled Trials.

Table 3 shows associations between characteristics of control group therapy and clinical benefit. In univariable analyses, there were non-significant associations between active therapy and higher clinical benefit scores with ESMO-MCBS and NCCN Evidence Blocks, but not with ASCO-VF and ASCO-CRC. RCTs with substantial overlap between experimental and control arms (e.g. a control arm comprising active treatment plus a matched placebo compared to the same therapy with an additional drug in the experimental arm) were associated with significantly lower odds of substantial benefit with ESMO-MCBS and ASCO-VF (OR 0.27, 95% CI 0.11–0.65; P = 0.003 and OR 0.30, 95% CI 0.13–0.73; P = 0.008, respectively) but not with NCCN Evidence Blocks or ASCO-CRC criteria (OR 0.74, 95% CI 0.28–1.97; P = 0.55 and OR 1.36, 95% CI 0.51–3.64; P = 0.54, respectively). Similar results were observed when excluding trials in the curative setting. In the post-hoc sensitivity analysis in which we rescored trials with the ESMO-MCBS and ASCO-VF scales without QoL and/or toxicity adjustment, the magnitude of effect was attenuated, and statistical significance was lost (OR 0.50, 95% CI 0.18–1.34; P = 0.17 and OR 0.49, 95% CI 0.20–1.17; P = 0.11, respectively). There was no significant difference between type of active therapy and clinical benefit, although there was a non-significant association with higher odds of substantial benefit with ESMO-MCBS in trials in which the control group therapy was chemotherapy, while for ASCO-CRC a non-significant association in the opposite direction was observed. Analysis of optimal versus suboptimal control group therapy was limited by the small number of RCTs categorized as having suboptimal control groups. There appeared to be a non-significant association with lower magnitude of clinical benefit for trials with optimal control groups using ESMO-MCBS and ASCO-VF. A non-significant effect in the opposite direction was observed for NCCN Evidence Blocks. No association was observed for ASCO-CRC. Multivariable analysis was attempted, but a model could not be fitted adequately.

Table 3 Association between characteristics of control group therapy and clinical benefit.

Association between characteristics of applications and clinical trials and clinical benefit

Table 4 shows associations between characteristics of applications and of clinical trials with magnitude of clinical benefit. As expected, based on prior work17,19, there were statistically significant associations between high ESMO-MCBS scores and immunotherapy trials, drugs approved with a companion diagnostic test, breakthrough therapy designation, open-label trials, and studies which allowed crossover. However, for the ASCO-VF, only drugs with a companion diagnostic test and priority review were associated with greater clinical benefit. For NCCN Evidence Blocks, drugs with a companion diagnostic test were also associated with substantial clinical benefit, while for ASCO-CRC a statistically significant association with meaningful clinical benefit was observed with the use of intermediate endpoints.

Table 4 Association between characteristics of applications and clinical trial and clinical benefit.

Discussion

RCTs have been the gold standard to demonstrate efficacy and safety of new cancer therapies. However, even RCTs have limitations10,11,12. The characteristics of control group therapy have been shown to influence the conclusions of RCTs33. Despite this, little is known about the influence of control group therapy on magnitude of clinical benefit scales. In this article, results show that among trials with substantial overlap between experimental and control therapy (e.g. active treatment plus a matched placebo in the control arm vs. the same active therapy plus an additional drug in the experimental arm) there appeared to be lower odds of substantial clinical benefit with ESMO-MCBS and ASCO-VF, but no difference with NCCN Evidence Blocks and ASCO-CRC. We hypothesized that this discordance was explained by the difference in the methodology of these value frameworks. ESMO-MCBS and ASCO-VF grades are based on efficacy outcomes and adjusted if improvement in toxicity, QoL or tail of the curve effects are observed. It is important to highlight, however, that these differences in QoL typically need to be statistically significant (despite generally low statistical power for such endpoints) and do not explore whether differences meet the minimal clinically important difference for the respective scales. NCCN Evidence Block scores are assigned by NCCN Panel members and assess efficacy, safety, quality and quantity of evidence, consistency of evidence and affordability, whereas ASCO-CRC grades are applicable only in the non-curative setting and only evaluate efficacy (OS and PFS).

To investigate the influence of these methodologic differences in measurement of clinical benefit between scales, our post-hoc sensitivity analysis rescored trials with the ESMO-MCBS and ASCO-VF frameworks excluding data on QoL and high-grade toxicity. Results showed a non-significant and lower magnitude association between active treatment plus a matched placebo and lower clinical benefit. This suggests that QoL and toxicity likely explain at least part of the discordance observed between scales. These findings also have face validity, as an experimental therapy that consists in part of the same treatment as the control group would be expected to have a lower chance of reducing grade 3–4 toxicity, and consequently QoL is less likely to be improved.

Consistent with prior studies17,18,19, our data show that fewer than half of trials meet the threshold for meaningful clinical benefit as assessed using ESMO-MCBS and ASCO-VF, whereas approximately three quarters showed substantial clinical benefit using the NCCN Evidence Blocks and the ASCO-CRC criteria. Of note, the type of active therapy was not associated with statistically significant differences in the magnitude of clinical benefit. However, this analysis was limited by small sample sizes, and it is noteworthy that meaningful effect sizes were observed for control groups comprising chemotherapy (ESMO-MCBS and NCCN Evidence Blocks) and immunotherapy (NCCN Evidence Blocks).

The appropriateness of control group therapy was not associated with a statistically significant difference in the magnitude of clinical benefit. There seemed to be a non-significant association with lower odds of substantial clinical benefit for trials with optimal control groups when using the ESMO-MCBS and ASCO-VF, while a non-significant effect in the opposite direction was observed for NCCN Evidence Blocks. The observation that these associations were in opposite directions again suggests the importance of QoL and high-grade toxicity assessment in the interpretation of the results of these frameworks. Of interest, the ASCO-CRC framework was not sensitive to this effect, and this could be explained by the fact that this framework can only be applied in the palliative setting and its assessment is based exclusively on efficacy outcomes.

Of interest, we also explored predictive factors associated with clinical benefit. Consistent with prior studies17, trials supporting approval of cancer drugs with a companion diagnostic test were more likely to be scored as having a substantial clinical benefit according to the ESMO-MCBS, the ASCO-VF and the NCCN Evidence Blocks. This observation, which has been reported previously17,19, is likely explained by the higher magnitude of benefit seen when targeted therapy is delivered to the groups of patients most likely to benefit from it, avoiding empirical exposure (and thereby unnecessary toxicity) of those who are unlikely to benefit34. In addition, some variables such as immunotherapy trials, breakthrough therapy designation and priority review were associated with substantial clinical benefit. Of note, consistent with prior work17,19, our analysis showed an association between higher framework scores and intermediate endpoints. Regulators require clinical trials to show that surrogate endpoints can be relied upon to predict, or correlate with, clinical benefit35. As not all endpoints which were examined in our analysis met the above definition, we elected to use the broader term of intermediate endpoint, which we believe is the more scientifically robust term. The observation that potentially unvalidated endpoints are associated with higher clinical value scores is an area of concern. Furthermore, the discordant observation of palliative setting and clinical benefit with ESMO-MCBS and ASCO-VF is likely explained by the different ways in which trials are assessed by these frameworks. In the palliative setting (which comprises the majority of included trials), studies may have higher odds of substantial clinical benefit with ASCO-VF due to the ability to apply extra points cumulatively for outcomes such as tail of the curve effects, treatment-free interval, cancer-related symptoms and QoL. In contrast, with ESMO-MCBS a total of 1 extra point can be added for such effects.

Our study has several limitations. First, we evaluated clinical benefit at the time of approval; however, the analysis of clinical benefit can change over time with updated data on efficacy, toxicity or QoL over the course of the post-marketing period17,36. Second, the source for data collection was variable, with QoL and patient-reported outcomes data being extracted from published articles rather than drug labels. Unfortunately, these data are frequently not presented in primary publications of clinical trials. Third, defining control group therapy as optimal or suboptimal might be controversial given that, depending on the tumor type, optimal treatment can evolve rapidly. Fourth, our assessment of whether control group therapy was optimal was based on standards of care in high-income countries. Value frameworks are often utilized in lower resource environments where these definitions of optimal control group therapy may not apply. Finally, in many of the analyses, the sample size was small. This resulted in an inability to adequately fit multivariable models. This adds some uncertainty to the reported results and thereby limits generalizability.

In summary, clinical benefit scales can be sensitive to the type of control group therapy. RCTs with an active treatment plus matched placebo in the control arm were less likely to be scored as providing substantial clinical benefit using the ESMO-MCBS and the ASCO-VF scales. Control group therapy did not influence NCCN Evidence Blocks or ASCO-CRC scores. This is likely explained, at least in part, by differences between the different clinical benefit scales in the inclusion and/or weighting of QoL and toxicity. These results can be used to aid in the design of clinical trials (trials with substantial overlap between experimental and control therapy are likely to have a lower effect size and attenuated impact on QoL assessment), inform drug reimbursement decisions by payers (a lower incremental cost effectiveness ratio would likely be observed with greater overlap between experimental and control group) and provide feedback to the developers of value frameworks. The sensitivity of the ESMO-MCBS and ASCO-VF frameworks to control group therapy should be taken into consideration for future development of these scales. Adjustment of scores of trials with overlapping treatments may be warranted.