Introduction

Despite improvements in outcome, in most cases, metastatic cancer remains an incurable disease. While efficacy of experimental therapy remains of primary interest, a focus on toxicity of cancer drugs which could adversely influence quality of life and even increase non-cancer mortality is warranted in the setting of incurable cancer. Randomized controlled trials (RCTs) are the recognized gold standard to evaluate the efficacy and safety of new treatments1,2. Often, toxicity data from RCTs are insufficient and can be misleading3. Large RCTs are not designed to detect statistically significant differences in toxicity between standard and experimental arms. Furthermore, rare but potentially life-threatening adverse events may not be identified in RCTs4.

It has been demonstrated that newly approved anticancer drugs increase morbidity and treatment-related mortality5; however, there are few data about the toxicity of experimental cancer drugs in unselected trials, including those not resulting in drug registration. Here, we report a meta-analysis of efficacy, safety and tolerability of all phase 3 RCTs in common advanced solid tumors registered on ClinicalTrials.gov. The primary objective of the study was to quantify systematically the trade-off between efficacy and toxicity of experimental cancer therapy relative to control group treatment in unselected phase 3 trials. We hypothesized that a small incremental benefit of experimental cancer drugs would be associated with higher toxicity.

Methods

Search strategy

We searched ClinicalTrials.gov6 to identify phase 3 RCTs evaluating new drugs in adult patients with advanced breast, colorectal, lung, or prostate cancer. We included trials categorized as completed or active with accrual completed between January 1, 2005 and October 31, 2016. Consistent with prior methodology7, studies evaluating supportive care agents, studies with different scheduling and/or dosing of the same agent, single arm studies, trials not evaluating systemic therapy (such as trials exploring radiation, surgery, imaging [including screening] and chemoprevention) or those consisting exclusively of biomarker, pharmacokinetic and pharmacodynamic analyses were excluded. Trials listed as active but not recruiting and without available results were excluded as well.

Data extraction and synthesis

We utilized methods similar to those used previously in analyses for approved drugs5. Briefly, full text articles of eligible studies from the literature were retrieved and the primary data for efficacy, safety and tolerability were extracted independently by two coauthors, DR and HG. Disagreement was resolved by consensus. For efficacy endpoints, we extracted hazard ratios (HRs) and corresponding 95% confidence intervals (CI) for progression-free survival (PFS) and/or overall survival (OS). We extracted data only for the primary endpoint of each study. For safety and tolerability, we first identified the number of patients at risk in both the experimental and control arms, and then collected data on the number of patients with each of the following safety and tolerability outcomes: treatment-related death, treatment discontinuation without disease progression, and 12 commonly reported grade 3/4 adverse events (AEs): anemia, neutropenia, thrombocytopenia, diarrhea, vomiting, stomatitis, hypertension, cardiac events, fatigue/asthenia, skin toxicity, dyspnea and neuropathy. Subsequently, we calculated the odds ratios (ORs) comparing the experimental and control groups for each safety and tolerability outcome measure.
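As an illustration of the last step, the following minimal sketch computes an OR and a Wald-type 95% CI from the extracted counts (patients with the event and patients at risk in each arm). The function name, the example counts and the 0.5 continuity correction for zero cells are assumptions made for this sketch; the source does not state how zero cells were handled at the level of individual studies.

```python
import math

def odds_ratio(events_exp, n_exp, events_ctl, n_ctl, z=1.96):
    """Odds ratio (experimental vs control) with a Wald CI on the log scale.

    events_* = patients with the outcome, n_* = patients at risk in that arm.
    """
    a, b = events_exp, n_exp - events_exp   # experimental: events / non-events
    c, d = events_ctl, n_ctl - events_ctl   # control: events / non-events
    # Assumption for this sketch: add 0.5 to every cell if any cell is zero,
    # so that the logarithm and the standard error stay finite.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Hypothetical counts, not taken from any included trial:
# 30 of 200 patients with grade 3/4 diarrhea vs 15 of 200 in the control arm.
or_, ci_low, ci_high = odds_ratio(30, 200, 15, 200)
print(f"OR {or_:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An OR above 1 indicates higher odds of the outcome in the experimental arm.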

Assessment of study quality or risk of bias was not performed routinely, as all included studies were large randomized trials and those which were open-label were often appropriately unblinded (e.g. substantial differences in toxicity profile between experimental and control drugs making blinding ineffective). In sensitivity analyses, the impact of concealment and the potential for a placebo effect were explored (see below).

Statistical analysis

Data were presented descriptively as absolute numbers, proportions, and ranges, as appropriate. Data were pooled in a meta-analysis using RevMan version 5.3 (Cochrane Collaboration, Copenhagen, Denmark). For efficacy analyses, pooled estimates of HRs were computed and weighted using the generic inverse variance approach8 and random-effects modelling9. Analyses of safety and tolerability were computed using different methods for toxic death, treatment discontinuation and grade 3/4 AEs. For toxic death, where absolute event rates were less than 1%, the Peto one-step odds ratio method was utilized8,10. For treatment discontinuation, where there were low absolute event rates and substantial variability in relative effect sizes, the Mantel–Haenszel odds ratio method was used9. Finally, for grade 3/4 AEs, the DerSimonian and Laird random-effects method was utilized and studies were weighted using the generic inverse variance approach8. Sensitivity analyses excluding open-label studies and those which were not placebo-controlled were performed. We also performed additional subgroup analyses of efficacy and safety outcomes based on cancer site. Associations between efficacy and toxicity were assessed using meta-regression, which comprised a univariable linear regression of the natural logarithm of the HR for efficacy endpoints and the natural logarithm of the OR for toxicity outcomes. Regression was weighted by individual study sample size using the weighted least squares (mixed effect) function11. Statistical analyses were conducted using SPSS statistical software, version 21 (IBM Corp, Armonk, NY). All statistical tests were two-sided, and statistical significance was defined as p < 0.05. No corrections were made for multiple statistical testing.
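The generic inverse variance approach with DerSimonian and Laird random effects can be sketched as follows. This is an illustrative re-implementation of the textbook formulas, not the RevMan code used for the analysis; the function names and the three example trials are hypothetical. The standard error of each ln(HR) is recovered from the reported 95% CI, which assumes the interval is symmetric on the log scale.

```python
import math

def log_effect_and_se(ratio, ci_low, ci_high, z=1.96):
    """Recover ln(ratio) and its standard error from a reported 95% CI."""
    return math.log(ratio), (math.log(ci_high) - math.log(ci_low)) / (2 * z)

def pool_random_effects(effects, ses, z=1.96):
    """DerSimonian-Laird pooling of log-scale effects.

    Returns (pooled ratio, CI low, CI high, tau^2).
    """
    w = [1 / se ** 2 for se in ses]                       # inverse variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero (method-of-moments estimate).
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (se ** 2 + tau2) for se in ses]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            tau2)

# Hypothetical trials: (HR, lower 95% limit, upper 95% limit).
trials = [(0.75, 0.62, 0.91), (0.88, 0.74, 1.05), (0.80, 0.70, 0.91)]
effects, ses = zip(*(log_effect_and_se(*t) for t in trials))
hr, lo, hi, tau2 = pool_random_effects(effects, ses)
print(f"Pooled HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2 = {tau2:.4f}")
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect inverse variance weights.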

Conference presentation

This study was presented in part at the 2018 American Society of Clinical Oncology Annual Meeting (Ribnikar et al. J Clin Oncol 2018; 35(15_Suppl); Abstract 6588).

Results

A total of 377 RCTs were identified initially. After excluding ineligible studies, a total of 143 studies comprising 88,603 patients were included in the analysis (see Fig. 1 for study selection schema and PRISMA flow diagram). The characteristics of included trials are presented in Table 1. The details of the 143 trials that were included in the analysis are presented in Supplementary Table 1. Among the 377 trials identified in ClinicalTrials.gov, published results could not be identified for 42 studies (11%). This likely reflects publication bias.

Figure 1 Study selection.

Table 1 Characteristics of included trials.

Efficacy

PFS was the primary endpoint in 68 trials (48%) and data on PFS were reported in 60 studies (42%). PFS was significantly improved with experimental therapy in 35 (58%) studies. Overall, experimental drugs were associated with a 20% relative improvement in PFS in comparison to control drugs (HR 0.80; 95% CI 0.78–0.82). Sensitivity analysis excluding open-label studies (i.e. only including blinded trials) did not change results substantially (HR 0.75; 95% CI 0.72–0.78). A sensitivity analysis excluding placebo-controlled trials showed that experimental drugs were associated with a 14% relative improvement in PFS in comparison to control drugs (HR 0.86; 95% CI 0.83–0.89).

OS was the primary endpoint in 64 trials (45%) and data on OS were reported in 56 (39%) studies. OS was significantly improved with experimental therapy in 26 (46%) studies. Overall, experimental drugs were associated with a 13% relative improvement in OS compared to control agents (HR 0.87; 95% CI 0.85–0.89). Sensitivity analysis excluding open-label studies showed similar results (HR 0.87; 95% CI 0.85–0.90), as did sensitivity analysis excluding placebo-controlled trials (HR 0.88; 95% CI 0.85–0.92).

A subgroup analysis of efficacy outcomes according to cancer site demonstrated a trend similar to that of the overall analysis, with improved PFS and OS in all cancer sites except prostate cancer, where we observed worse PFS in the experimental group (HR 1.45; 95% CI 1.00–2.11) in comparison to the control group (see Table 2 for details regarding efficacy and safety outcomes according to cancer site).

Table 2 Hazard ratios (HRs) and odds ratios (ORs) and their 95% confidence intervals (CIs) for efficacy and safety outcomes based on the site of cancers.

Toxicity

Data about individual grade 3/4 AEs were reported in all studies; however, 9 (6%) studies did not report data on toxic deaths and 18 (13%) studies did not report data on treatment discontinuation. Overall, compared to control groups in individual studies, experimental drugs were associated with higher odds of toxic death, treatment discontinuation without progression, and most grade 3/4 AEs (see Table 3). A sensitivity analysis exploring the effect of blinding on toxicity data is shown in Table 4. There were no differences between blinded and open-label studies for toxic death; however, blinded studies showed higher odds for treatment discontinuation and the following grade 3/4 adverse events: neutropenia, thrombocytopenia, diarrhea, stomatitis, hypertension, skin toxicity and neuropathy. A sensitivity analysis excluding placebo-controlled trials also showed higher odds for toxic death, treatment discontinuation and all grade 3/4 AEs (see Table 5), with odds especially elevated for thrombocytopenia, diarrhea, stomatitis, hypertension, cardiac events and skin toxicity.

Table 3 Odds ratios (ORs) and 95% confidence intervals (CIs) for safety and tolerability end points of experimental drugs in comparison to control groups.
Table 4 Results of sensitivity analysis according to concealment method.
Table 5 Results of sensitivity analysis based on placebo-control.

An additional subgroup analysis regarding safety outcomes according to the cancer site demonstrated higher odds for almost all toxicity outcomes with experimental drugs in all cancer sites, except for toxic death in breast cancer, neuropathy in prostate, lung and colorectal cancer and neutropenia in lung cancer. Odds were especially higher for thrombocytopenia in breast cancer, skin toxicity and stomatitis in prostate cancer, diarrhea and skin toxicity in lung cancer and thrombocytopenia, hypertension and skin toxicity in colorectal cancer (see Table 2 for details regarding odds for individual toxicity according to cancer site).

Associations between efficacy and toxicity

We did not identify any statistically significant associations between PFS and any of the endpoints of toxicity. However, there was a statistically significant positive association between the HR for OS and the OR for treatment discontinuation without progression and for skin toxicity and a negative association with thrombocytopenia (see Table 6).

Table 6 Associations between efficacy and toxicity. β refers to the linear regression co-efficient.
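The weighted regression behind β can be sketched as below. This is a closed-form weighted least squares fit written for illustration, not the SPSS procedure used in the analysis. It assumes ln(HR) is the dependent variable and ln(OR) the predictor, which the Methods do not state explicitly, and all data values and names are hypothetical.

```python
import math

def weighted_log_regression(hrs, ors, sample_sizes):
    """Weighted least squares fit of ln(HR) = intercept + beta * ln(OR).

    Each study is weighted by its sample size. Returns (beta, intercept).
    """
    x = [math.log(v) for v in ors]   # toxicity: ln(OR)
    y = [math.log(v) for v in hrs]   # efficacy: ln(HR)
    sw = sum(sample_sizes)
    x_bar = sum(w * xi for w, xi in zip(sample_sizes, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(sample_sizes, y)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(sample_sizes, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(sample_sizes, x, y))
    beta = sxy / sxx
    return beta, y_bar - beta * x_bar

# Hypothetical study-level data, not taken from the included trials.
hrs = [0.70, 0.85, 0.90, 1.02]       # HR for OS
ors = [1.10, 1.60, 1.90, 2.80]       # OR for treatment discontinuation
sizes = [450, 620, 1100, 380]        # patients per study
beta, intercept = weighted_log_regression(hrs, ors, sizes)
print(f"beta = {beta:.3f}, intercept = {intercept:.3f}")
```

Under this orientation, a positive β means that studies with higher odds of the toxicity outcome tended to have a higher HR, that is, less efficacy benefit.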

Discussion

The main goal of phase 3 RCTs is to assess the efficacy of experimental therapy; however, in the palliative treatment of patients with advanced cancer, where maintaining a good quality of life is crucial, the toxicity profile and tolerability of drugs are of considerable importance. A modestly effective anticancer agent which adds significant toxicity and attenuates quality of life may not provide a favorable balance between benefits and risks.

In this study, we quantified the efficacy, safety and tolerability of experimental anti-cancer drugs evaluated in phase 3 RCTs in common solid tumors over almost 12 years. Results show that only 57% of phase 3 RCTs resulted in a significant improvement in their primary endpoint. While the estimate of around 50% success rate is consistent with prior published data12, given that phase 3 trials were likely supported by positive phase 1 and 2 data, we consider a 57% success rate disappointing. Pooled data show a 20% relative reduction in the hazard of progression and a 13% relative reduction in the hazard of death with experimental agents compared to controls. In contrast, we demonstrated that experimental drugs are associated with increased odds of toxic death, treatment discontinuation without disease progression and high-grade AEs when compared to the standard treatment received by controls. When we evaluated individual toxicities independently, experimental agents showed increased odds for thrombocytopenia, diarrhea, stomatitis, hypertension, cardiac events, fatigue/asthenia and skin toxicity compared to the treatment in the control arms. Univariable analysis did not identify any association between PFS (the most common primary endpoint of included studies) and any of the toxicity outcome measures; however, there was a statistically significant association between OS and treatment discontinuation without disease progression and with skin toxicity (greater magnitude of effect) and thrombocytopenia (lower magnitude of effect). The reason for these observations is unclear and may reflect a chance finding.

The balance between benefits and risks of anti-cancer drugs extends over a spectrum of efficacy and toxicity. It is difficult to identify scenarios in which trade-offs between benefits and risks are favorable and unfavorable. Furthermore, because phase 3 trials are closely monitored and stopped at signs of futility or increased toxicity in experimental groups, the trade-off of risks and benefits for patients is very likely to be different in the earlier phases of accrual compared to later phases.

Data suggest that industry-sponsored RCTs, which represent the majority of all RCTs7, are more likely to exclude elderly patients as well as those with medical comorbidities and certain concomitant medications13. This means that, compared to patients in real-world practice, participants in RCTs are likely to have better performance status and less comorbidity, and are expected to tolerate treatment better. This has direct implications for routine clinical practice: a favorable balance between benefit and risk observed among RCT participants may not be representative of the less selected real-world population, in which reduced benefit and increased toxicity may be observed, limiting generalizability5.

A previous meta-analysis reported increased toxicity associated with FDA-approved agents5, with a magnitude of effect similar to that observed in the current study exploring unselected drugs. In contrast to our study, Niraula and colleagues did not include immunotherapeutic agents. When targeted agents used as monotherapy were compared to chemotherapeutics in the analysis by Niraula et al., a lower rate of treatment discontinuation without disease progression and less hematologic toxicity were observed. However, targeted agents are more likely to be used for a prolonged time in comparison to conventional chemotherapy, which is typically administered for shorter durations. This can lead to an increased risk of cumulative low-grade toxicity, which may not be captured in RCTs that focus more on higher-grade toxicities.

In our study, no consistent association was observed between efficacy and toxicity endpoints. However, individual reports for some targeted agents support an association between improved clinical outcomes such as PFS, OS or quality of life and certain AEs (e.g. skin toxicity with EGFR inhibitors and hypertension with VEGF inhibitors)14,15. This can occur when inhibition of the same target is responsible for both efficacy and toxicity.

Prior data demonstrate no apparent difference in efficacy between targeted therapy where the target was an oncogene, activated oncogenic signaling pathways, angiogenesis or an immune-modulatory target16. However, greater improvement in PFS with drugs targeting oncogenes or activated pathways and anti-angiogenic agents as compared to immunotherapy and conventional cytotoxic drugs was seen. This finding may reflect that PFS may not be the optimal endpoint for trials evaluating immunotherapy17. Of note, immunotherapy was associated with more favorable safety and toxicity outcomes compared with other forms of targeted therapy or cytotoxic chemotherapy16. These results should be interpreted with caution, as immune-related events may be sub-optimally reported in RCTs18 because classification of AEs has been based on the Common Terminology Criteria for Adverse Events (CTCAE)19, which may underestimate some immune-related AEs20. In contrast, some of the individual severe AEs such as diarrhea, skin toxicity and dyspnea (as surrogates for colitis, dermatitis and pneumonitis) may be captured in the analysis16.

Of note, our sensitivity analysis showed that blinded studies were associated with significantly higher odds for treatment discontinuation and several grade 3/4 AEs. These findings additionally strengthen our general results and raise concern about under-reporting of toxicity in open-label trials. It has been suggested that patient-reported outcomes (PROs) are a key outcome measure of clinical trials regardless of blinding status and thus improved design could help ensure high-quality data which may inform patient-centered care21.

Since the majority of studies included in this analysis investigated efficacy and toxicity of targeted agents such as small molecules (most commonly kinase inhibitors) it is important to highlight key lessons learned from pivotal trials of this group of drugs. It has been suggested that such drugs should undergo testing of more than one dose in phase 2 trials, incorporating biomarker and target inhibition data when the mechanism of action is clear and continuously evaluating dosing and dosing regimens throughout drug development. The observation of treatment-related death in phase 3 trials is highly undesirable22.

Our analysis has limitations. First, it is based on clinical trial reports and not on individual patient data. Second, significant heterogeneity was seen between trials. In some trials the control group was an approved active treatment, whilst in others it was a placebo or best supportive care. This has an important impact on the observed relative benefit and relative toxicity. Third, we included data on only 4 common solid tumors, thus limiting generalizability to other tumor types; however, these 4 groups of tumors represent the largest cancer burden worldwide. Fourth, efficacy endpoints were reported as relative statistics and relative differences do not necessarily translate into large differences in absolute benefits23. Fifth, CTCAE is more likely to capture severe acute toxicities and may not capture less severe but chronic AEs, which limits the use of AEs as a measure of overall quality of life. Sixth, toxicities that occur in fewer than 5% of patients may be inadequately reported in clinical trials and thus may not be fully represented in our analysis, as we could only extract data that were reported24. Finally, our analysis represents completed trials, which are more likely to be positive, so the trade-off between risks and benefits that we observed may differ from what would be seen if trials that were not completed had been included.
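The fourth limitation, relative versus absolute benefit, can be made concrete with a small calculation. The sketch below assumes exponential survival with proportional hazards, under which the median in the experimental arm equals the control median divided by the HR. This is a simplifying assumption introduced here for illustration only, and the 12-month control median is a hypothetical value rather than a result of this analysis.

```python
def absolute_median_gain(control_median_months, hazard_ratio):
    """Gain in median survival implied by an HR under exponential survival.

    Assumption: constant hazards in both arms, so median = ln(2) / hazard and
    the experimental median is the control median divided by the HR.
    """
    experimental_median = control_median_months / hazard_ratio
    return experimental_median - control_median_months

# Pooled OS hazard ratio reported above (0.87), applied to a hypothetical
# control-arm median of 12 months.
gain = absolute_median_gain(12.0, 0.87)
print(f"Implied gain in median OS: {gain:.1f} months")
```

The same HR applied to a shorter control median implies a proportionally smaller absolute gain, which is why relative statistics alone can overstate clinical benefit.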

In conclusion, only 57% of individual phase 3 RCTs in common solid tumors result in improved outcomes, and many experimental drugs have worse safety and tolerability compared to control therapy. Oncologists should be aware of these risks and should disclose them to cancer patients when considering enrollment in phase 3 trials.