Introduction

Diffuse large B-cell lymphoma (DLBCL) is the most common aggressive lymphoma subtype. Immunochemotherapy, mostly with rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP), has become the standard treatment over the past decade [1,2,3,4]. However, 15–40% of patients are refractory to initial immunochemotherapy, or relapse after complete response (CR). Such patients have poor outcomes, mainly depending on the risk group [5]. There is an urgent need to find more effective agents or regimens for high-risk patients in the immunochemotherapy era.

Overall survival (OS) is the gold-standard treatment endpoint in randomized controlled trials (RCTs). However, using OS as the primary endpoint requires a large sample size and long follow-up to observe a survival benefit, which raises clinical development costs and delays the introduction of novel drugs. When used as primary endpoints in clinical trials, early efficacy endpoints such as progression-free survival (PFS) and event-free survival (EFS) may require a smaller sample size and shorter evaluation time than OS, and they have been established in some malignancies [6,7,8]. Trial- and individual-level studies have demonstrated that 24-month PFS and EFS may be considered early efficacy endpoints for OS in DLBCL [9,10,11,12]. However, these studies may not be comprehensive: they included only the 13 available RCTs whose investigators were willing to disclose individual patient data, a subset of all potentially eligible trials [1,2,3,4, 12,13,14,15,16,17,18,19,20,21]. The association of PFS or EFS with OS has not been specifically addressed at the trial or treatment-arm level in RCTs of patients treated with immunotherapy; furthermore, this association and its predictive value have not been externally validated. We investigated PFS and EFS as efficacy endpoints in DLBCL in the rituximab era through literature-based analysis at both the trial and treatment-arm levels. The correlation between PFS and OS was then validated in independent cohort studies to confirm its value in guiding clinical practice.

Methods

Literature search and study selection

Inclusion and exclusion criteria

This study was exempted from review by the institutional review board because it used existing data and enrolled no human subjects. Eligible studies were phase III RCTs, phase II trials, and retrospective studies investigating the long-term survival of DLBCL patients who received first-line rituximab-containing immunochemotherapy. Studies were excluded if they met any of the following conditions: phase I trial; transformed or relapsed/refractory DLBCL; inadequate survival data; patients serology-positive for HIV, hepatitis B/C virus, or Epstein–Barr virus; sample size of <100 patients per arm; or patients with DLBCL constituting <80% of the total sample.

Literature search

Studies published before 31 December 2019 were identified via a systematic literature search of MEDLINE, Embase, and PubMed using the keyword “DLBCL AND rituximab”, restricted to RCTs, phase II trials, and retrospective studies. Formal publications and meeting abstracts were included. Two authors (J.Z. and J.T.) conducted the literature search independently and reviewed the results with a third author (S.N.Q.). When they disagreed about study inclusion, J.Z., J.T., and S.N.Q. carefully re-reviewed the potentially eligible study and resolved the disagreement by consensus.

RCT inclusion and quality control

All potentially eligible RCTs were assessed for risk of bias in seven domains (random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, and other bias) using the Cochrane Collaboration tool. All information available in the assessment was acquired from formal publications, meeting abstracts, trial registry information on ClinicalTrials.gov (www.clinicaltrials.gov), and e-mail contact with trial designers. RCTs with high risk of bias in any domain were excluded.

A total of 109 abstracts were reviewed. After excluding 43 ineligible records, the full texts of 66 records were reviewed. Thirty-nine unqualified records were excluded, and 27 RCTs were included in the quality assessment (Figs. 1a and 2; Supplemental Table 1) [1,2,3,4, 13,14,15,16,17,18,19, 22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]. Seven trials were rated as having unclear risk of selection bias because of incomplete reporting of the randomization process. The LNH03-1B trial was excluded because of a high risk of bias related to its premature closure and a sample size far below statistical requirements (Fig. 2; Supplemental Table 1) [37]. Eventually, 26 qualified RCTs were included in the trial- and treatment arm-level analyses (Table 1) [1,2,3,4, 13,14,15,16,17,18,19, 22,23,24,25,26,27,28,29,30,31,32,33,34,35,36]. According to the purpose of each trial, the 26 RCTs were classified into 5 subgroups: (1) four RCTs (15%) compared R-CHOP (like) with CHOP (like) [1,2,3,4]; (2) ten (38%) compared R-CHOP (like) with rituximab+intensified/de-escalated chemotherapy [13, 15,16,17, 22, 23, 25,26,27, 31]; (3) nine (35%) investigated maintenance or consolidation therapy [3, 14, 18, 23, 24, 28,29,30, 32]; (4) three (12%) focused on R-CHOP+novel targeted therapy [19, 34, 35]; (5) two (8%) investigated the novel use of anti-CD20 monoclonal antibody [33, 36]. Of note, the two-stage randomized trial ECOG4494/CALGB9793 [3] and the 2 × 2 factorial randomized trial DLCL04 [23] were each classified into 2 subgroups according to their respective research questions. These 26 RCTs included a total of 16,340 patients (median sample size, 623), with median follow-up times ranging from 2 to 10 years. The most common primary endpoints in these RCTs were EFS (n = 12, 46%) and PFS (n = 7, 27%), followed by disease-free or failure-free survival (n = 5, 20%), OS (n = 1, 4%), and CR (n = 1, 4%). The majority of RCTs (n = 20, 77%) used 2 or 3 years as the time point of the primary endpoint.

Fig. 1: Flow chart for study inclusion.

PRISMA flow charts for a phase III RCTs and b phase II and retrospective studies. RCTs randomized controlled trials.

Fig. 2: Summary of risk of bias in RCTs.

“+” (green), “?” (yellow), and “−” (red) represent low, unclear, and high risk of bias, respectively. RCTs randomized controlled trials.

Table 1 Summary of phase III randomized controlled trials included in trial- and treatment arm-level analyses.

Phase II trial and retrospective study inclusion and quality control

To validate the RCT findings, we analyzed the relationship between PFS and OS using phase II and retrospective data. For single-arm phase II trials and retrospective cohort studies, quality was assessed, with a maximum 9-star score, using the Newcastle–Ottawa scale (NOS) in terms of selection, comparability, and outcome [38]. Studies with low to moderate risk of bias (≥6 stars) were included in the statistical analysis. For the LNH2007-3B randomized phase II trial [39], the risk of selection bias was assessed using the Cochrane Collaboration tool. A total of 1129 abstracts were reviewed. After excluding 865 unqualified records, the full texts of 264 records were reviewed. We excluded 203 ineligible studies, and included 61 studies in the quality assessment (Supplemental Table 2). After excluding 10 studies with high risk of bias, a total of 47 retrospective studies and 4 phase II trials with 67 rituximab immunochemotherapy treatment arms were included in the external validation (Fig. 1b) [39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89]. The average NOS score was 6.9 stars. A total of 14,936 patients were included, with each arm containing 100–1322 patients (median, 177). The median follow-up time was 1.2–7.2 years (Table 2).

Table 2 Summary of phase II and retrospective studies used for predictive model validation.

Statistical methods

Endpoint definition

In the RCTs [1,2,3,4, 13,14,15,16,17,18,19, 22,23,24,25,26,27,28,29,30,31,32,33,34,35,36, 39], OS was defined as the time from randomization to death from any cause. EFS was defined heterogeneously, but generally as the time from randomization to any treatment failure, including disease progression, death, and treatment discontinuation for any reason (e.g., adverse effects or withdrawal). PFS was generally measured from the time of randomization to disease progression, relapse, or death from any cause (Supplemental Table 3). In the retrospective studies [43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89], OS was generally defined as the time from diagnosis or treatment to death from any cause, and PFS as the time from diagnosis or treatment to disease progression, relapse, or death from any cause (Supplemental Table 4).

Data extraction

In the RCTs, patient characteristics, sample size, follow-up period, primary endpoint, standard and treatment arms, hazard ratio (HR), absolute EFS/PFS rates (year 1, 2, 3, 5), and 5-year OS were extracted (Table 1). For RCTs reported more than once, we included the most recent result with the longest follow-up time. All results of the standard and treatment arms were based on the intention-to-treat population. For the phase II trials and retrospective studies, patient characteristics, sample size, median follow-up time, treatment, absolute PFS rates (year 1, 2, 3, 5), and 5-year OS were extracted (Table 2). As described previously [90], the HRs and survival rates at the different time points were extracted from the full text (labeled “*”) or from the Kaplan–Meier survival curve using Engauge Digitizer software.
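
As an illustration of how a survival rate at a fixed time point can be read from a digitized Kaplan–Meier curve, the following minimal Python sketch treats the digitized points as a step function. It is not the authors' procedure, which relied on Engauge Digitizer and is not described in further detail here; the function name survival_at, the example coordinates, and the convention of returning 1.0 before the first digitized point are all assumptions made for illustration.

```python
from bisect import bisect_right


def survival_at(points, t):
    """Return the survival probability at time t from digitized (time, survival) points.

    A Kaplan-Meier curve is a step function, so the value at t is the value of
    the last digitized point at or before t (assumption: no interpolation).
    """
    pts = sorted(points)
    times = [p[0] for p in pts]
    i = bisect_right(times, t)
    if i == 0:
        return 1.0  # assumption: no events before the first digitized point
    return pts[i - 1][1]


# Hypothetical digitized points (years, survival probability); not study data.
curve = [(0.0, 1.00), (0.5, 0.92), (1.0, 0.85), (2.0, 0.78), (3.5, 0.74), (5.0, 0.70)]
rates = {year: survival_at(curve, year) for year in (1, 2, 3, 5)}
print(rates)
```

In practice, the accuracy of such readings depends on how densely the curve is digitized around each time point of interest.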

Correlation evaluation

The correlation analyses of the RCTs, weighted by sample size, were performed at both the trial level and the rituximab immunochemotherapy arm level; treatment arms using a conventional CHOP (like) regimen were not included in the arm-level analysis. At the trial level, the correlation of log HR (PFS) or log HR (EFS) with log HR (OS) was estimated using the Pearson correlation coefficient r in weighted linear regression, with weights equal to the trial sample size. At the rituximab immunochemotherapy arm level, the linear correlation between the 1-, 2-, 3-, and 5-year PFS or EFS rates and the 5-year OS rate was likewise evaluated by the correlation coefficient r, with weights determined by the sample size of each treatment arm. A value of r close to 1 indicated a strong association. The 95% confidence intervals (CIs) of r were obtained using the bootstrap method with 1000 replications.
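
The weighted correlation and its bootstrap interval can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the percentile form of the bootstrap, the resampling of whole trials, the function names, and the hazard ratios and sample sizes below are all hypothetical.

```python
import math
import random


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with positive observation weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


def bootstrap_ci(x, y, w, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for r, resampling (x, y, w) triples with replacement."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    while len(reps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        ws = [w[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # r is undefined when a resample has no spread
        reps.append(weighted_pearson(xs, ys, ws))
    reps.sort()
    lower = reps[int(n_boot * alpha / 2)]
    upper = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical trial-level data (not taken from the included RCTs).
log_hr_pfs = [math.log(h) for h in (0.60, 0.75, 0.90, 1.00, 1.10)]
log_hr_os = [math.log(h) for h in (0.65, 0.80, 0.95, 0.98, 1.05)]
n_patients = [400, 820, 600, 1200, 350]

r = weighted_pearson(log_hr_pfs, log_hr_os, n_patients)
ci = bootstrap_ci(log_hr_pfs, log_hr_os, n_patients)
print(f"r = {r:.3f}; 95% CI, {ci[0]:.3f} to {ci[1]:.3f}")
```

With only a handful of trials, many bootstrap resamples contain few distinct points, so the resulting interval should be read with caution.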

Sensitivity analysis

Phase III RCTs were classified into five subgroups according to study purpose. To assess the consistency and robustness of the developed predictive model across different settings, sensitivity analyses were performed by leaving out one subgroup of trials at a time. The correlation coefficient r and its 95% CI for the trial-level and treatment arm-level correlations were reported in the same way as in the main analysis.
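
A leave-one-subgroup-out analysis of this kind can be sketched as follows. The data, subgroup labels, and function names are hypothetical, and the rule that a trial assigned to two subgroups is dropped whenever either of them is left out is an assumption; the source does not state how such trials were handled.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with positive observation weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


def leave_one_subgroup_out(rows):
    """Recompute weighted r after dropping every row that belongs to one subgroup.

    Each row is a dict with keys 'subgroups' (tuple of labels), 'x', 'y', 'n'.
    Returns a dict mapping the left-out subgroup label to the resulting r.
    """
    labels = sorted({label for row in rows for label in row["subgroups"]})
    results = {}
    for left_out in labels:
        kept = [row for row in rows if left_out not in row["subgroups"]]
        results[left_out] = weighted_pearson(
            [row["x"] for row in kept],
            [row["y"] for row in kept],
            [row["n"] for row in kept],
        )
    return results


# Hypothetical log HR pairs (x = PFS, y = OS) and sample sizes; not study data.
rows = [
    {"subgroups": ("R-CHOP vs CHOP",), "x": -0.51, "y": -0.43, "n": 400},
    {"subgroups": ("R-CHOP vs CHOP",), "x": -0.36, "y": -0.30, "n": 820},
    {"subgroups": ("intensified",), "x": -0.11, "y": -0.05, "n": 600},
    {"subgroups": ("intensified", "maintenance"), "x": 0.00, "y": -0.02, "n": 1200},
    {"subgroups": ("maintenance",), "x": -0.22, "y": -0.10, "n": 350},
    {"subgroups": ("maintenance",), "x": 0.10, "y": 0.05, "n": 500},
]

for label, r in leave_one_subgroup_out(rows).items():
    print(f"without {label}: r = {r:.3f}")
```

A bootstrap interval could be attached to each leave-out estimate in the same way as for the full data set.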

External validation of RCT prediction model in phase II trials and retrospective studies

We validated our findings by applying the predictive linear regression models to the phase II and retrospective studies with adequate survival data. A linear regression model of the form “5-year OS = α × PFS + β” was first derived from the RCTs for each of the 1-, 2-, 3-, and 5-year PFS rates. The predicted 5-year OS rate for each phase II or retrospective study was then calculated by entering its reported 1-, 2-, 3-, or 5-year PFS rate into the corresponding model. The actual and predicted 5-year OS rates were plotted in scatter plots. Statistical analysis was performed in SPSS (version 21.0, IBM Inc.); data visualization was performed using the ggplot2 package in R software (version 3.3.2, R Foundation for Statistical Computing).
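
The two steps of this validation, fitting a model of the form 5-year OS = α × PFS + β on RCT arms and applying it to independent arms, can be sketched as below. This is an illustrative Python sketch rather than the SPSS analysis used in the study; the sample-size weighting of the fit is carried over from the correlation analysis as an assumption, and all function names and numbers are hypothetical.

```python
def fit_weighted_line(x, y, w):
    """Weighted least-squares fit of y = alpha * x + beta; returns (alpha, beta)."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    alpha = sxy / sxx
    beta = my - alpha * mx
    return alpha, beta


def predict_os(pfs_rates, alpha, beta):
    """Apply the fitted line to a list of PFS rates."""
    return [alpha * p + beta for p in pfs_rates]


# Hypothetical RCT arms: 2-year PFS (%), 5-year OS (%), and arm size.
rct_pfs_2y = [62.0, 70.0, 75.0, 80.0, 84.0]
rct_os_5y = [58.0, 66.0, 72.0, 75.0, 81.0]
rct_n = [300, 410, 250, 600, 380]
alpha, beta = fit_weighted_line(rct_pfs_2y, rct_os_5y, rct_n)

# Hypothetical validation arms from phase II or retrospective studies.
validation_pfs_2y = [68.0, 78.0]
validation_os_5y = [65.0, 74.0]
predicted = predict_os(validation_pfs_2y, alpha, beta)

for actual, pred in zip(validation_os_5y, predicted):
    print(f"actual 5-year OS {actual:.1f}%, predicted {pred:.1f}%")
```

Plotting the predicted against the actual values and comparing the scatter with the line of identity gives the visual calibration check described above.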

Data sharing statement

For original data, please contact yexiong12@163.com.

Results

Trial-level correlation between treatment effects of PFS or EFS on OS in RCTs

Of 26 RCTs (Table 1), 20 (77%), 1 (4%), and 1 (4%) reported one, two, and three pairs of PFS HR and OS HR, respectively. A significant correlation was observed across these 25 pairs of PFS HR and OS HR: log HR (PFS) correlated with log HR (OS) (r = 0.772; 95% CI, 0.471–0.913; Fig. 3a). Sensitivity analyses showed good consistency in most subgroups, except when the R-CHOP (like) vs. CHOP (like) subgroup was left out (r = 0.61; 95% CI, 0.075–0.863; Supplemental Fig. 1a). This result was expected. Among the 26 RCTs we studied, 4 trials [1, 2, 4, 13] showed statistically significant OS benefits, including 3 trials [1, 2, 4] in the subgroup comparing R-CHOP (like) with CHOP (like). Excluding these positive trials all at once naturally leads to a wider confidence interval.

Fig. 3: Trial-level Correlation Between Treatment Effects on PFS or EFS and OS in RCTs.

Trial-level correlations between a HR for PFS and HR for OS, and b HR for EFS and HR for OS. Circle size is proportional to the number of patients in each comparison. The solid blue line indicates the fitted weighted linear regression line; the light green zone represents its 95% CI; r indicates the correlation coefficient. PFS progression-free survival; EFS event-free survival; OS overall survival; RCTs randomized controlled trials; HR hazard ratio; CI confidence interval.

Fourteen RCTs (54%) reported one pair of EFS HR and OS HR each (two treatment arms); three RCTs (12%) reported two pairs of EFS HR and OS HR each (four treatment arms). The analysis of 20 pairs of EFS HR and OS HR demonstrated that log HR (EFS) correlated with log HR (OS) (r = 0.838; 95% CI, 0.625–0.938; Fig. 3b). Sensitivity analyses demonstrated good consistency in most subgroups, except when the R-CHOP (like) vs. CHOP (like) subgroup was left out (r = 0.732; 95% CI, 0.278–0.941), for reasons similar to those given for PFS (Supplemental Fig. 1b). These results confirm that treatment gain in PFS or EFS can predict OS benefit at the trial level with acceptable consistency.

Treatment arm-level correlation between PFS or EFS and OS in RCTs

Forty-four rituximab immunochemotherapy arms from 26 RCTs reported 5-year OS. Thirty-five (80%) rituximab immunochemotherapy arms reported 1-year and 3-year PFS; 37 (84%) arms reported 2-year PFS and 33 (75%) arms reported 5-year PFS. The 1-year (r = 0.813; 95% CI, 0.624–0.913; Fig. 4a), 2-year (r = 0.858; 95% CI, 0.705–0.933; Fig. 4b), 3-year (r = 0.873; 95% CI, 0.716–0.946; Fig. 4c), and 5-year PFS (r = 0.871; 95% CI, 0.711–0.954; Fig. 4d) each correlated linearly with 5-year OS. Overall, sensitivity analyses continued to demonstrate robust consistency in the correlation coefficient r. When the 10 trials in the R-CHOP (like) vs. rituximab+intensified/de-escalated chemotherapy subgroup, which account for nearly half of all treatment arms, were left out (Supplemental Fig. 1c–f), the findings remained consistent, with wider confidence intervals owing to the reduced number of arms.

Fig. 4: Rituximab Immunochemotherapy Arm-level Correlation Between PFS and OS in RCTs.

The rituximab immunochemotherapy arm-level associations between a 1-year PFS and 5-year OS, b 2-year PFS and 5-year OS, c 3-year PFS and 5-year OS, and d 5-year PFS and 5-year OS. Circle size is proportional to the number of patients in each treatment arm. The solid blue line indicates the fitted weighted linear regression line; the light green zone represents its 95% CI; r indicates the correlation coefficient. PFS progression-free survival; OS overall survival; RCTs randomized controlled trials; CI confidence interval.

Twenty-seven rituximab immunochemotherapy arms (61%) reported 1-, 2-, 3-, and 5-year EFS. Linear regression analysis revealed correlations between 1-year (r = 0.853; 95% CI, 0.729–0.920; Fig. 5a), 2-year (r = 0.896; 95% CI, 0.815–0.945; Fig. 5b), 3-year (r = 0.921; 95% CI, 0.851–0.966; Fig. 5c), or 5-year EFS (r = 0.931; 95% CI, 0.855–0.975; Fig. 5d) and 5-year OS. Sensitivity analysis indicated good consistency (Supplemental Fig. 1g–j). These findings indicate that improvements in 1–3-year PFS or EFS are associated with higher 5-year OS.

Fig. 5: Rituximab Immunochemotherapy Arm-level Correlation Between EFS and OS in RCTs.

The rituximab immunochemotherapy arm-level associations between a 1-year EFS and 5-year OS, b 2-year EFS and 5-year OS, c 3-year EFS and 5-year OS, and d 5-year EFS and 5-year OS. Circle size is proportional to the number of patients in each treatment arm. The solid blue line indicates the fitted weighted linear regression line; the light green zone represents its 95% CI; r indicates the correlation coefficient. EFS event-free survival; OS overall survival; RCTs randomized controlled trials; CI confidence interval.

External validation of association of PFS with OS in Phase II and retrospective studies

Sixty-seven treatment arms from the phase II and retrospective studies were used for external validation. As EFS was not available in the retrospective studies, only the PFS prediction models could be evaluated. Using the PFS predictive models from the RCTs (Fig. 4), we calculated the predicted 5-year OS rate for each treatment arm from its actual 1-, 2-, 3-, or 5-year PFS rate (Table 2). The simple regression line between the actual and predicted 5-year OS approached the diagonal line, indicating that the predicted OS closely approximated the actual OS. The predicted 5-year OS rate correlated significantly with the actual 5-year OS rate, with the correlation coefficient r ranging from 0.795 to 0.897 (Fig. 6a–d). This finding supports the premise that PFS is predictive of OS.

Fig. 6: External validation of association of PFS with OS after Rituximab immunochemotherapy.

Using the PFS linear regression models (as shown in Fig. 4), the predicted 5-year OS, calculated from the actual 1-, 2-, 3-, and 5-year PFS in the phase II trials and retrospective data (Table 2), is plotted against the actual 5-year OS. The predicted OS approximates the actual OS, as indicated by the regression line approaching the diagonal line, i.e., the line of identity; r indicates the correlation coefficient. PFS progression-free survival; OS overall survival.

Discussion

This is a large-scale, comprehensive study combining data from high-quality phase III RCTs, phase II trials, and retrospective studies to assess the association of the early efficacy endpoints PFS and EFS with OS in patients with DLBCL primarily treated with immunochemotherapy. Consistent with previous findings [9,10,11,12], analyses of the 26 qualified RCTs showed that improved PFS or EFS correlated with OS benefit at the trial level. There was a linear correlation between 1–5-year PFS or EFS and 5-year OS rates at the treatment-arm level. The comprehensive sensitivity analyses indicated an acceptable overall consistency of the developed predictive model across settings. The external validation showed good calibration between the actual and predicted 5-year OS rates based on the 1–5-year PFS rates in the phase II and retrospective studies. These findings provide new evidence supporting the clinical use of PFS and EFS as early efficacy endpoints for evaluating treatment benefit and accelerating approval of superior treatments.

Previous studies, primarily using 13 RCTs conducted before 2015, concluded that the early efficacy endpoints of EFS or PFS are strongly related to OS at both the individual and trial levels [9,10,11]. The survival of DLBCL patients who remain progression-free or event-free at 24 months is almost equal to that of the age- and sex-matched general population [9,10,11,12]. Therefore, 2-year EFS and PFS are accepted as early efficacy endpoints. Although the use of individual patient data allows better characterization of important covariates that affect survival, it restricts the analysis to a limited number of RCTs, and the analysis is not easily replicated by independent researchers. In the most recently published trials and in clinical practice, multiple effective agents are available not only as initial treatment but also in second-line or salvage settings. Any validation of an early efficacy endpoint is relevant only within the context in which the validation occurred. These factors prompted re-examination and external validation of the correlation of PFS or EFS at the given time points with OS. The present literature-based analysis relied on data from RCTs, phase II trials, and retrospective studies to assess the validity of the early efficacy endpoints, and represents a critical step toward understanding the impact of immunochemotherapy on PFS or EFS and OS in DLBCL. With strict inclusion criteria and quality control, we included large-scale, qualified RCTs for trial-level surrogacy analysis, and phase II trials and retrospective studies for external validation. The correlation of PFS or EFS with OS was well established for DLBCL at both the trial and treatment-arm levels from the RCTs. Furthermore, the correlation between 1–5-year PFS and OS was externally validated by analyzing the phase II and retrospective data. Consistent with previous studies [9,10,11,12], these results highlight the significant role of PFS and EFS as early efficacy endpoints in designing prospective trials.

The association of improved PFS or EFS with prolonged OS in DLBCL in this study is straightforward: as early efficacy endpoints, PFS and EFS incorporate not only survival but also treatment-related events, disease relapse, and progression. Compared with long-term OS, dynamic assessment of PFS or EFS at 1–3 years has a lower likelihood of confounding by subsequent or salvage treatment. Innovative treatment strategies with a large effect on PFS or EFS in high-risk patients with DLBCL may have a large effect on OS in RCTs. Importantly, we found that PFS or EFS as early as 1 year correlated with 5-year OS at the treatment-arm level, mainly because the majority of patients were at high risk of early relapse and poor post-progression survival. Consistent with this finding, other studies have demonstrated that ~70% of disease failures occurred within the first year after treatment, but rarely after 5 years [9, 12]. For patients who remained event-free at 12 and 24 months, the risk of relapse in the next 5 years dropped to 13% and 8%, respectively [9]. If patients experienced progression or relapse within 2 years, the median OS after disease progression was only 7.2 months [11].

The strengths of this study include the quality control design, large sample size, external validation of PFS outcomes, and current standard treatment. First, the data were obtained from high-quality RCTs, phase II trials, and retrospective studies that enrolled large cohorts (>31,000 patients in total) with newly diagnosed DLBCL uniformly treated with rituximab-containing immunochemotherapy. We could therefore be confident of minimizing the selection bias that arises from a limited number of RCTs or from heterogeneity in treatment options. This comprehensive surrogacy study at the trial and treatment-arm levels complements previous evidence and strengthens the clinical use of PFS and EFS as early efficacy endpoints. Second, the positive relationships between the 1–5-year PFS and 5-year OS rates were externally validated using independent data that included patients across different countries with varied eligibility criteria, immunochemotherapy regimens, radiotherapy, and follow-up times. As a variety of immunochemotherapy regimens was investigated in a heterogeneous population, we could examine variability in treatment outcomes, which improved the generalizability of our study. Our generation and validation of prediction models describing the association between the 1–5-year PFS and 5-year OS rates is unique. Validation of the RCT-derived models in independent cohorts improved the reliability of the conclusions.

The study limitations include the lack of individual patient data and of standardized definitions of endpoints and follow-up assessments. First, this is a literature-based systematic review without individual patient data; therefore, patient-level surrogacy could not be assessed. Second, precise modeling requires standardized definitions of endpoints and standardized follow-up assessments or surveillance strategies in DLBCL trials, which could not be achieved in our study. For example, while PFS was calculated from the date of randomization in RCTs, it was generally calculated from diagnosis or initial therapy in retrospective studies. In addition, EFS events typically comprised PFS events as well as unplanned treatment, treatment discontinuation, and toxic events, because EFS was used to evaluate the safety, toxicity, or compliance of a novel therapy. Moreover, EFS events were defined inconsistently across trials, depending on the trial design and purpose. In clinical practice, the exact date of disease progression is difficult to determine, so the reported PFS or EFS event date naturally depends on the frequency of, and interval between, consecutive clinical visits and imaging assessments. Such inherent heterogeneity in the interval and frequency of assessments across studies can be neither removed nor quantified. Third, the predictive model derived in this study was based on findings in patients treated with anthracycline-based immunochemotherapy, and its extrapolation to other treatments would be speculative. The impact of post-progression management was beyond the scope of this study, and such information is not routinely collected in clinical trials. If more effective salvage treatments become available and post-progression survival is significantly prolonged in the future, the predictive model should be modified and optimized accordingly. Fourth, the correlation between EFS and OS was not externally validated in the retrospective populations, because EFS is generally not reported in retrospective studies.

In conclusion, our assessment of a large sample of high-quality data for patients with DLBCL provides high-level evidence that PFS and EFS are valid early efficacy endpoints for OS in the immunochemotherapy era.