Introduction

The overarching goal in improving treatment outcomes in cancer is to reduce symptom burden and prolong the length and quality of life (QoL) while limiting treatment-related toxicity. While measurement of overall survival (OS) remains the gold standard for interpreting the impact of new therapies for multiple myeloma in phase 3 clinical trials, as outcomes have improved, it has become increasingly challenging to wait for this endpoint to keep pace with the rapid progress being achieved in the field. Median OS, when short, can be compared directly between interventions. However, with improved myeloma therapies and increased survival duration, it is increasingly difficult to use OS as the primary endpoint to ensure timely approval of novel therapies for the benefit of patients. This difficulty is also reflective of the use of study designs that incorporate crossover after relapse and successful scientific advances over time, which for myeloma have consistently provided effective approved agents that have impacted outcomes after relapse both in and outside of clinical trials, ultimately resulting in greater reliance on surrogate endpoints of OS to provide reliable and validated indicators of clinical benefit of novel interventions.

Surrogate endpoints are defined by the National Institutes of Health as “biomarkers intended to substitute for a clinical endpoint” [1]. For surrogate endpoints to be used in the context of drug approval, the United States Food and Drug Administration (FDA) mandates that “clinical trials are needed to show that the surrogate endpoint can be relied upon to predict or correlate with clinical benefit in a context of use” [2]. Progression-free survival (PFS), defined as time from start of therapy to disease progression or death, has been used in this way by both the FDA and the European Medicines Agency (EMA) in many drug approvals to date [3, 4]. Second progression-free survival (PFS2), defined as time to second disease progression or death, has been used additionally to account for the impact of a drug/intervention on potential resistance after relapse [3]. Earlier surrogate endpoints predictive of long-term outcomes such as OS, including minimal residual disease (MRD) negativity, are increasingly used as research endpoints [5,6,7] but have previously been more challenging to regulatory agencies as primary endpoints of clinical trials [8]. The Oncologic Drugs Advisory Committee of the FDA has recently advised that current evidence supports the use of MRD as an accelerated approval endpoint in myeloma clinical trials [9]. This paves the way for accelerated approval based on MRD as an endpoint but with a requirement for ongoing follow-up and PFS/OS confirmation.

The aim of this review is to explore the current use of PFS as a surrogate endpoint for OS in myeloma, examine situations when this has not been reliable, and discuss ways to ensure that studies are planned to account for these situations, or at least mitigate their effect. This will improve the development of optimal study designs moving forward and increase acceptance of other surrogate endpoints such as MRD, while improving our ability to confidently establish the efficacy of novel agents/interventions in the treatment of patients with myeloma and rapidly translating this into approvals for patient access [10].

PFS is an accepted surrogate endpoint in myeloma

There are many examples of myeloma trials in which a demonstrable PFS benefit is associated with a similar benefit in OS [11,12,13,14,15]. Based on such data, PFS as a surrogate endpoint has contributed to many approval decisions by the EMA and FDA. Recent examples include the MAIA trial, which compared the combination of daratumumab, lenalidomide, and dexamethasone (DRd) to lenalidomide and dexamethasone (Rd) in patients with newly diagnosed myeloma (NDMM) considered ineligible for autologous stem cell transplant (ASCT). The trial showed a PFS benefit with DRd compared to Rd (hazard ratio [HR] 0.55 [95% CI, 0.45–0.67]; p < 0.0001) and an OS benefit (HR 0.66 [95% CI, 0.53–0.83]; p = 0.0003). The OS findings were consistent across subgroups, with the exception of patients with impaired baseline hepatic function, which favored Rd (HR 1.29 [95% CI, 0.64–2.60]), although patient numbers (DRd, n = 31; Rd, n = 29) were small [11]. Similar findings have been observed with combinations of agents from all anti-myeloma drug classes, in both transplant-eligible and -ineligible populations, as well as newly diagnosed and relapsed patient populations. These include the SWOG 0777 study that compared lenalidomide, bortezomib, and dexamethasone (RVd) to Rd in transplant-ineligible patients with NDMM and demonstrated that PFS benefit was accompanied by OS benefit [12], building on the efficacy shown for this combination in both transplant-eligible and -ineligible NDMM patients [13]. Similarly, in the setting of relapse, there are numerous examples, including the POLLUX (comparing DRd vs Rd) [14] and ASPIRE trials (comparing carfilzomib with Rd vs Rd) [15], where PFS and OS were consistently superior for the triplet vs the doublet. However, there are several recent studies where PFS does not act as a surrogate for OS, and the validity of using it as a surrogate endpoint has been questioned [16]. 
In some cases, adequately powering a study to identify an OS difference may be challenging due to sample size constraints unless a large magnitude of effect is expected, and OS may therefore not be included as a co-primary endpoint. Exploring other reasons for this lack of translation, and how the discrepancy might be addressed, will help advance myeloma therapies at a greater pace by ensuring use of the earlier, more rapidly interpretable endpoints whilst preserving the validity of OS in reflecting improved outcomes overall.

Heterogeneity in populations can affect the ability to translate a PFS benefit to OS benefit

Recent examples of clinical trials in which a significant PFS benefit has not translated into an OS benefit for all patients have occurred in situations where specific subgroups have heterogeneous outcomes with respect to the endpoint. In some of these, subgroup heterogeneity has led to an apparent lack of OS benefit or even a detrimental effect on OS in the study population; however, subgroup analyses have led to the identification of groups that may benefit from the treatment being studied [17,18,19]. Important examples of this include differences based on heterogeneity by molecular subgroups, prior ASCT, and age.

Molecular heterogeneity

The BELLINI trial demonstrated a significant PFS benefit with the addition of the BCL-2 inhibitor venetoclax to bortezomib and dexamethasone (HR 0.63 [95% CI, 0.44–0.90]) but with a significantly worse OS (HR 2.03 [95% CI, 1.04–3.95]) in patients with relapsed/refractory myeloma (RRMM) [20]. This led to a partial clinical hold being imposed by the FDA on all venetoclax trials. However, on further analysis of the trial, there was evidence of significant heterogeneity in OS outcomes between patients with t(11;14) or high BCL2 expression (OS: HR 0.82 [95% CI, 0.40–1.70]) and those with no t(11;14) or low BCL2 expression (OS: HR 1.34 [95% CI, 0.81–2.20]), which likely accounted for the discordance between PFS and OS in the study population [18]. This heterogeneity was also evident in the PFS analysis (Fig. 1A). In those patients without t(11;14) and/or high BCL2 expression, in whom venetoclax lacked benefit, additional toxicity may have led to the worsening in OS. Across the study, adverse events led to 12 deaths in the venetoclax arm (the majority due to infection) and only one in the placebo arm [18].

Fig. 1: Heterogeneity in trial outcomes by molecular status and age.

A Heterogeneity by t(11;14) and BCL2 expression status in the BELLINI trial [18]. Heterogeneity in PFS and OS outcomes was seen between patients with t(11;14) or high BCL2 expression and those with no t(11;14) or low BCL2 expression. B Heterogeneity by age in the OCEAN trial [17]. Heterogeneity by age was observed for both PFS (phet = 0.033) and OS (phet = 0.006). Dex dexamethasone, mel melflufen, pom pomalidomide, pts patients, Vd bortezomib and dexamethasone, VenVd venetoclax, bortezomib, and dexamethasone.

This effect of heterogeneity between molecular subgroups was also identified in pre-clinical [21, 22] and early-phase studies [23]. However, the BELLINI trial was intentionally not restricted to the t(11;14) subtype of myeloma based on the premise that combination with dexamethasone [24] and bortezomib [25] could increase BCL-2 dependency even in patients without t(11;14) or high BCL2 expression, and therefore, all myeloma patients rather than a subset may benefit [26]. In a phase 1b study investigating venetoclax, bortezomib, and dexamethasone in patients with RRMM, there were markedly improved responses in patients with high BCL2 expression compared to those with low BCL2 expression (94% vs 59%) [27], although it required the results of the phase 3 BELLINI study to demonstrate the negative interaction between PFS and OS outcomes in the non-t(11;14), low BCL2 subgroups [20, 28].

Designing the BELLINI study with recruitment of only patients with t(11;14) and/or high BCL2 expression or adequately powering this subgroup within the all-comers study may, in retrospect, have prevented the significant pause in development following the trial results and arguably led to the approval of venetoclax. However, the feasibility of recruiting the patient numbers required to achieve this was considered potentially prohibitive and reflects the reality of conducting clinical trials of selected subgroups within a meaningful time frame. Importantly, investigation of venetoclax, although delayed, was subsequently reinitiated. The phase 3 CANOVA study was designed to evaluate the safety and efficacy of venetoclax plus dexamethasone (Ven/dex) compared with pomalidomide plus dexamethasone (Pom/dex) in patients with RRMM and with t(11;14)-positive disease only, but did not meet its primary endpoint of PFS [29]. However, a post hoc sensitivity analysis of CANOVA that counted the start of a new line of anti-myeloma therapy as a PFS event (rather than being censored) demonstrated a significant PFS benefit with Ven/dex vs Pom/dex (HR 0.651 [95% CI, 0.487–0.870]; p = 0.003). Of note, there was no significant difference in OS in this analysis although a trend in favor of Ven/dex was seen (Ven/dex: 32.4 months vs Pom/dex: 24.5 months; HR 0.697 [95% CI, 0.472–1.029]; p = 0.067) [29].
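The difference between the primary and sensitivity definitions of a PFS event described for CANOVA can be illustrated with a minimal sketch. The function name, its arguments, and the month-based inputs are hypothetical and are not taken from the trial protocol; the sketch only shows how the same patient record yields a censored observation under one rule and an event under the other.

```python
from typing import Optional, Tuple


def pfs_outcome(
    progression_or_death: Optional[float],
    new_therapy: Optional[float],
    last_followup: float,
    new_therapy_is_event: bool,
) -> Tuple[float, bool]:
    """Return (time in months, event flag) for one patient.

    All argument names are illustrative. `new_therapy_is_event=False`
    censors at the start of a new line of therapy; `True` counts that
    start as a PFS event, as in the sensitivity analysis described above.
    """
    # A new line of therapy only changes the outcome if it begins before
    # any documented progression or death.
    if new_therapy is not None and (
        progression_or_death is None or new_therapy < progression_or_death
    ):
        return new_therapy, new_therapy_is_event
    if progression_or_death is not None:
        return progression_or_death, True
    # No progression, death, or new therapy: censor at last follow-up.
    return last_followup, False


# Same hypothetical patient under both rules: new therapy at month 6,
# no documented progression, followed to month 12.
primary = pfs_outcome(None, 6.0, 12.0, new_therapy_is_event=False)
sensitivity = pfs_outcome(None, 6.0, 12.0, new_therapy_is_event=True)
```

Under the first rule the patient contributes 6 months of censored follow-up; under the second, an event at 6 months. If new therapy is started more often in one arm, the choice of rule can therefore shift the estimated HR.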

Importantly, this heterogeneity has not been observed in all studies, as it is usually determined by the degree of molecular targeting of the agent under study. During development of a new agent, it is therefore key to study molecular heterogeneity through allied translational research programmes to identify and validate potential populations that may benefit most. The results of such subgroup analyses would enable the definition of a “target population(s)” for specific therapies prior to the phase 3 trial design, which could benefit timely drug development. This suggested approach would be preferable to all-comers trials followed by subgroup analysis, which nonetheless may have validity, especially if these analyses are planned a priori, but may take longer, require a larger sample size, and may require further validation as part of the drug approval process.

Age-related heterogeneity

Heterogeneity in OS outcomes by age has recently been highlighted by results of the phase 3 OCEAN trial, where there was a significant PFS advantage with the novel peptide-drug conjugate melflufen and dexamethasone combination (melflufen/dex) compared to Pom/dex (HR 0.79 [95% CI, 0.64–0.98; p = 0.032]) in the overall population of patients with RRMM [17]. However, this improvement in PFS did not translate to an OS benefit (HR 1.10 [95% CI, 0.85–1.44; p = 0.47]) [17]. Examination of subgroups revealed evidence of heterogeneity by age for both PFS (phet = 0.033) and OS (phet = 0.006; Fig. 1B). There was also an apparent detriment to OS with melflufen/dex in patients <65 years (HR 1.71 [95% CI, 1.09–2.69]) compared to that with Pom/dex but a significant benefit in patients >75 years (HR 0.46 [95% CI, 0.23–0.92]) [17]. Whether this is related solely to age or to differences in prior treatment is of interest and may be a complex interaction of both, as there was also significant heterogeneity by previous ASCT with the use of high-dose alkylation as conditioning.

Heterogeneity by age was also seen in the Myeloma XI trial comparing lenalidomide maintenance to observation in patients of all ages [30]. The randomization of the lenalidomide and observation arms was conducted in transplant-eligible and -ineligible patients enrolled in the trial, with pre-planned analysis in both. Patients in the transplant-eligible pathway had a median age of 61 years vs 74 years for those in the transplant-ineligible pathway [31, 32]. In contrast to the OCEAN study, there was no heterogeneity by transplant eligibility for PFS. Overall, there was significant PFS benefit (HR 0.46 [95% CI, 0.41–0.53]; p < 0.0001), consistent in both pathways of the trial (transplant-eligible patients: HR 0.48 [95% CI, 0.40–0.58], p < 0.0001; transplant-ineligible patients: HR 0.44 [95% CI, 0.37–0.53], p < 0.0001). However, for OS, there was significant heterogeneity between pathways (transplant-eligible patients: HR 0.69 [95% CI, 0.52–0.93], p = 0.014; transplant-ineligible patients: HR 1.02 [95% CI, 0.80–1.29], phet = 0.0445) demonstrating a translation of PFS benefit to OS for younger and fitter patients, but not for older and/or less fit patients.

Whilst there are key differences between the OCEAN and Myeloma XI trials, including enrollment of RRMM patients in OCEAN compared to NDMM patients in Myeloma XI, taken together, these studies trigger the hypothesis that heterogeneity could be associated with reduced impact on OS for immunomodulatory drug (IMiD) use in older patients, compared to younger patients, perhaps from differences in immune effects and innate immune exhaustion associated with advancing age, and other potential confounders such as vascular health.

To better understand these differences, we examined heterogeneity by age groups seen within and between recently published phase 3 clinical trials in myeloma reporting PFS and OS (Supplementary Methods). To compare the PFS and OS impact from different classes of anti-myeloma therapies, only trials where the effect of specific agents belonging to one of the three major anti-myeloma drug classes (anti-CD38, proteasome inhibitors [PIs], or IMiDs) could be isolated were included. This included “add-on” trials of an agent or where it was compared to an inactive control (observation or placebo) but not a different class of agent. For example, where A is the agent in question, trials of AB vs B, ABC vs BC, and A vs observation/placebo were included while trials of A vs C or AC vs BC were not.
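The inclusion rule in the preceding paragraph can be written as a small check. This is a sketch under stated assumptions: each arm is represented as a set of component labels, an empty control set stands for observation or placebo, and the function name is hypothetical rather than part of the authors' search strategy.

```python
def isolates_agent(experimental, control, agent):
    """Return True if the design isolates the effect of `agent`.

    Arms are collections of component labels; an empty control arm
    represents observation or placebo. Names are illustrative only.
    """
    experimental, control = frozenset(experimental), frozenset(control)
    # The agent must appear only in the experimental arm, and removing it
    # must leave exactly the control arm (an "add-on" design).
    return (
        agent in experimental
        and agent not in control
        and experimental - {agent} == control
    )


designs = {
    "AB vs B": ({"A", "B"}, {"B"}),
    "ABC vs BC": ({"A", "B", "C"}, {"B", "C"}),
    "A vs observation/placebo": ({"A"}, set()),
    "A vs C": ({"A"}, {"C"}),
    "AC vs BC": ({"A", "C"}, {"B", "C"}),
}
included = {name: isolates_agent(exp, ctl, "A") for name, (exp, ctl) in designs.items()}
```

The first three designs are included and the last two are excluded, matching the examples given in the text.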

This search strategy identified 20 trials that were included in the analysis (Supplementary Fig. 1; Supplementary Table 1); of these, five isolated an anti-CD38 antibody effect, eight an IMiD effect, and seven a PI effect. Median age across all trials was 66 years (range, 59–73). In total, 19 of 20 trials reported significant PFS benefit in the overall population (five of five evaluating anti-CD38 antibodies, eight of eight evaluating IMiDs, and six of seven evaluating PIs), but only 10 of 20 trials reported significant OS benefit (four of five evaluating anti-CD38 antibodies, three of eight evaluating IMiDs, and three of seven evaluating PIs).
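As a quick consistency check on the counts above, the per-class figures can be tallied against the reported totals. The numbers are taken directly from the paragraph; the variable and function names are illustrative.

```python
# Counts as reported in the text: (trials, significant PFS, significant OS).
reported = {
    "anti-CD38": (5, 5, 4),
    "IMiD": (8, 8, 3),
    "PI": (7, 6, 3),
}

# Column-wise sums across drug classes.
totals = tuple(sum(column) for column in zip(*reported.values()))


def fraction_with_os_benefit(drug_class):
    """Share of trials in a class that reported a significant OS benefit."""
    n_trials, _, n_os = reported[drug_class]
    return n_os / n_trials


os_fractions = {name: fraction_with_os_benefit(name) for name in reported}
```

The totals reproduce the 20 trials, 19 with significant PFS benefit and 10 with significant OS benefit, and the per-class fractions make the contrast between anti-CD38 antibodies (4 of 5) and IMiDs (3 of 8) or PIs (3 of 7) explicit.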

When adjusting for individual drug class effect, there was no evidence of a relationship between the HR and age for PFS (p = 0.607; Fig. 2A). For OS, however, there was a significant negative relationship between HR and median age of patients included in trials (p = 0.022). Notably, this relationship was observed for IMiDs and PIs, but not anti-CD38 antibodies (p = 0.017; Fig. 2B).

Fig. 2: Random effect model of progression-free survival and overall survival across trials isolating the effect of different anti-myeloma agents.

The PFS (A) and OS (B) data were extracted from the most recent publication or presentation related to each trial including hazard ratio (HR) and 95% CI as well as median age of patients enrolled. mAb monoclonal anti-CD38 antibody, IMiD immunomodulatory agent, PI proteasome inhibitor, REML restricted maximum likelihood.
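To make the type of analysis behind Fig. 2 more concrete, the sketch below shows how a reported HR and 95% CI can be converted to a log HR with a standard error, and how log HR can then be regressed on median age with inverse-variance weights. This is a simplified fixed-effect illustration, not the REML random-effects model used for the figure, and it omits the adjustment for drug class; the function names are hypothetical. The only trial value used is the MAIA OS result quoted earlier.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def log_hr_and_se(hr, ci_low, ci_high):
    """Convert an HR and its 95% CI to (log HR, standard error of log HR)."""
    # Assumes the CI is symmetric on the log scale, as is usual for HRs.
    return math.log(hr), (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)


def weighted_slope(ages, log_hrs, ses):
    """Inverse-variance weighted least-squares fit of log HR on median age.

    Returns (intercept, slope). Fixed-effect simplification: no
    between-trial variance term and no drug-class covariate.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    mean_age = sum(w * a for w, a in zip(weights, ages)) / total
    mean_y = sum(w * y for w, y in zip(weights, log_hrs)) / total
    sxy = sum(
        w * (a - mean_age) * (y - mean_y)
        for w, a, y in zip(weights, ages, log_hrs)
    )
    sxx = sum(w * (a - mean_age) ** 2 for w, a in zip(weights, ages))
    slope = sxy / sxx
    return mean_y - slope * mean_age, slope


# MAIA OS result from the text: HR 0.66 (95% CI, 0.53-0.83).
maia_log_hr, maia_se = log_hr_and_se(0.66, 0.53, 0.83)
```

A negative slope from `weighted_slope` would mean a lower HR with increasing median age and a positive slope a higher HR. A full analysis would add a between-trial variance component and a drug-class term, which dedicated meta-analysis software provides.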

For trials where subgroup analyses of HR for OS by age were reported, these intra-trial outcomes were plotted by age group (Supplementary Fig. 2). Lack of standardization in age groupings reported made the cross-trial analysis challenging. The clearest trend appeared in a prespecified analysis of the ICARIA-MM trial that analyzed OS with isatuximab plus pomalidomide and low-dose dexamethasone vs pomalidomide and low-dose dexamethasone in patients with RRMM, wherein a decrease in the HR was seen with increasing age [33].

On examining the effect of both IMiDs and PIs in this analysis, there appears to be a relationship between median age of participating patients and the HR for OS, but not PFS. It appears to be less likely for a PFS benefit to translate to an OS benefit in both IMiD and PI trials recruiting older patients, suggesting a differential effect of IMiDs and PIs on survival after progression in older patients that is not seen with anti-CD38 antibody therapies. Possible hypotheses for this effect include an ongoing effect of IMiDs and PIs after relapse or lack of dosing optimization of these agents for older patients, leading to toxicity that continues into subsequent lines of therapy, thus impacting outcomes. Alternatively, both IMiDs and PIs may induce alterations in myeloma clonality and/or in the bone marrow stroma, which in turn may impair the efficacy and/or tolerance of subsequent treatments. Within all trials reported, dosing of the key agents was not adjusted based on age or frailty, which could perhaps have led to different outcomes. This approach of age- or frailty-based dosing is being prospectively studied in the ongoing FiTNEss (Myeloma XIV) trial, which is randomizing ASCT-ineligible patients to receive either standard-of-care dosing with reactive dose modification in the event of toxicity, or frailty score-adapted dosing (International Myeloma Working Group frailty score) with dose reductions determined by frailty groups [34].

Limitations and confounders of this analysis (Fig. 2) included an inability to examine agents separately within each class due to the limited number of trials. Specifically, within the PI group, there are many trials with ixazomib; however, there are only very few trials studying the addition of bortezomib or carfilzomib that also have OS data at the current time. In the IMiD groups, it is also difficult to interrogate differences between lenalidomide and pomalidomide, and these data do not include any trials of the next-generation cereblon E3 ligase modulatory drug (CELMoD) agents. It is, therefore, possible that the effects identified may not be a class effect but rather, driven by one or more of the individual agents within that class. The included trials also have different lengths of follow-up, which affects both the likelihood of OS data being available and the maturity of data. Importantly, differences in the way age subgroups were reported or lack of age subgroup outcome reporting limited the breadth of the intra-trial analyses.

These findings provide critical considerations for future trial designs, as well as interpretation of current study results. They highlight the critical importance of reporting subgroup analyses by age to ensure these interactions are understood and explored. For future trials, this analysis also raises the question of whether agents should be added in such a way that their individual outcomes can be studied (as in the studies included in our analysis) or used in more head-to-head studies. The latter clearly provides key information about an agent or combination in direct comparison with another, but when conducting head-to-head comparisons, it is important to understand the impact of subgroup heterogeneity based on prior studies such as those presented here that may otherwise be “hidden”. Recent examples of when this might be important include the DREAMM-3 trial that compared belantamab mafodotin to Pom/dex in patients with RRMM [35], which did not meet its primary endpoint of PFS. The data reported to date suggest that there was a subset of patients sensitive to belantamab mafodotin, with a median duration of response for belantamab mafodotin not yet reached vs 8.5 months with Pom/dex, consistent with the idea that heterogeneity in the population being studied may have prevented the overall PFS being significantly different from that of the control group. Similar considerations should be given to reporting other potential surrogate endpoints, with sub-analyses being especially valuable in identifying likely subgroups in which early endpoints are more or less likely to translate into OS benefit. Ongoing studies of belantamab mafodotin in different populations may shed additional light on its true efficacy and are awaited with great interest.

Of note, in the presence of true heterogeneity of treatment effects, interpretation of the overall treatment effect can be complex. For example, given that IMiDs and PIs may have heterogeneous OS benefit across age groups (e.g., Myeloma XI trial [30]; Fig. 2), careful interpretation of observed HR heterogeneity in trials where these are used as a comparator is needed. If a treatment has an observed homogeneous HR benefit compared with a heterogeneous comparator, it paradoxically means that the experimental arm may in fact be heterogeneous as well. Similarly, observed heterogeneous HRs in comparison with a heterogeneous comparator require caution in terms of interpretation of the treatment benefit of the experimental drug, as seen with ICARIA-MM or OCEAN [17, 33]. Taken together, these analyses further support the importance of evaluating subgroups in determination of clinical benefit.

Other situations where PFS benefit may not translate to OS: the issue of competing risks

The examples discussed above include trials exhibiting heterogeneity in outcomes between subgroups that could be contributing to the lack of translation of PFS benefit to OS benefit. Other trials have shown this lack of translation without distinct subgroup heterogeneity currently identified, key examples being the recent studies of early vs delayed ASCT examined in the EMN02 [36], IFM 2009 [37, 38], and DETERMINATION trials [39]. In the parallel IFM 2009 and DETERMINATION studies, there was a significant PFS benefit associated with early vs delayed ASCT (IFM 2009: HR 0.70 [95% CI, 0.59–0.83; p < 0.001] [38]; DETERMINATION: HR 1.53 [95% CI, 1.23–1.91; p < 0.001]). It should be noted that the HRs are calculated differently, such that they reflect a similar outcome in favor of ASCT in both trials. Despite these impressive PFS differences between early vs delayed ASCT groups in the IFM 2009 trial at 93 months follow-up [38, 40], there was no significant difference in OS (HR 1.03 [95% CI, 0.80–1.32; p = 0.81]). Similarly, in the DETERMINATION trial, at 76 months follow-up, there was no significant difference in OS between RVd plus ASCT vs RVd alone (HR 1.10 [95% CI, 0.73–1.65; p > 0.99]). This could be due to several reasons; of note, neither study was specifically powered to demonstrate an OS benefit, as the primary endpoint was PFS. While it is possible to hypothesize that an OS benefit would have been seen if the studies had been larger, notably, in other similarly sized studies not involving transplant, there has been a correlation between PFS and OS suggesting that transplant may be the confounding effect [11, 12, 41]. An alternative explanation could be that studies in transplant-eligible patients with NDMM now have long median PFS and OS, including the control arms of these studies and therefore, the influence of crossover and other therapeutic advances over time plays a larger role, with salvage therapy in particular proving increasingly effective. 
To this point, more patients in the IFM 2009 trial received a delayed transplant (77%) than in DETERMINATION (28%), suggesting that beyond crossover to ASCT, other factors such as the remarkable efficacy of next-generation novel agents have a favorable impact on long-term outcomes. In recent years the development of immunotherapies for myeloma including bispecific antibodies and chimeric antigen receptor T-cell therapies has moved at a rapid pace [42, 43]. These treatments challenge the old paradigm that patient remission will shorten with each line of treatment with many inducing deep and durable remissions at later lines of therapy. This confounds the follow-up of overall survival for clinical trials as longer-term outcomes may be driven by access to such therapies as opposed to the original trial randomization. In trials performed across multiple international centers, global heterogeneity in access to subsequent anti-myeloma drugs may also exacerbate this.
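The point made above, that the IFM 2009 and DETERMINATION HRs for PFS are calculated in opposite directions, can be checked by taking reciprocals. The sketch assumes, as the text implies, that the DETERMINATION HR of 1.53 uses the early-ASCT arm as the reference; the function name is illustrative.

```python
def invert_hr(hr, ci_low, ci_high):
    """Re-express an HR and its CI with the comparison direction flipped."""
    # Taking reciprocals swaps the lower and upper confidence bounds.
    return 1.0 / hr, 1.0 / ci_high, 1.0 / ci_low


# DETERMINATION PFS as reported: HR 1.53 (95% CI, 1.23-1.91).
flipped_hr, flipped_low, flipped_high = invert_hr(1.53, 1.23, 1.91)
summary = f"{flipped_hr:.2f} ({flipped_low:.2f}-{flipped_high:.2f})"
```

Expressed in the same direction as IFM 2009, the DETERMINATION result is approximately 0.65 (0.52 to 0.81), which sits close to the IFM 2009 HR of 0.70 (0.59 to 0.83).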

Importantly, the longer the median PFS and OS achieved, the greater the potential for competing risks affecting OS outcomes to become apparent. These competing risks may affect subgroups differently and reflect toxicity, side effects, and impact on the disease itself, such as mutational burden and signatures resulting from mutagenic treatments [40] as well as second primary malignancies, specifically, secondary myeloid leukemia and myelodysplastic syndrome [39]. Another example of the issue of competing risks is exemplified by a recent post-hoc analysis of DETERMINATION that explored outcomes with RVd alone vs RVd plus ASCT in African American and White patients which showed that while outcomes with RVd plus ASCT were similar in patients of both races, African American patients with high body mass index and female sex appeared to derive significantly more PFS benefit from RVd alone. This may be related to the Duffy-null genotype, a common variant in people of African and Middle Eastern genetic ancestry, that has a key role in cytokine homeostasis, inflammation and likely, myeloma pathobiology and was present in approximately 60% of African American patients tested to date in the study [44, 45].

Prior studies of transplant have shown consistent PFS with ASCT [46], but OS benefit has been less clear. Interestingly, in the EMN02 study that compared ASCT with bortezomib-melphalan-prednisone in patients with NDMM, ASCT was associated with PFS benefit (HR 0.73 [95% CI, 0.62–0.85]; p = 0.0001) compared with bortezomib-melphalan-prednisone, but there was no significant difference in OS (HR 0.9 [95% CI, 0.71–1.13]; p = 0.35). Conversely, tandem ASCT showed a benefit in both PFS (HR 0.74 [95% CI, 0.56–0.98]; p = 0.036) and OS (HR 0.62 [95% CI 0.41–0.93]; p = 0.022) compared with single ASCT [36]. This suggests that in the context of ASCT, the impact of high-dose melphalan and its effects on both the patient and their myeloma is key in understanding the complexity of these interactions, as well as their combined long-term effect on the outcome for either benefit or detriment in any particular patient through competing risk.

Does lack of translation of PFS benefit to OS benefit matter? Is OS the only important goal?

When examining the circumstances wherein an agent may have substantial PFS benefit without translation to OS, it is important to reflect on the impact of that situation on an individual patient. While prolonging life is of key importance, first remission is often when patients are feeling most well, with the fewest myeloma-related symptoms. Prolonging this time (before relapse) even in the context of no OS benefit may still, therefore, be of benefit to patients. An illustrative example is the transplant-ineligible pathway of the Myeloma XI trial in which the use of lenalidomide maintenance more than doubled median PFS from 11 months to 26 months (HR 0.44 [95% CI, 0.37–0.53]; p < 0.0001) in patients who had not undergone ASCT, but there was no significant OS difference [30]. Prolonging first remission, with only a single anti-myeloma agent rather than combination therapies required at relapse with more hospital attendances, may benefit patients’ QoL. It may also be more cost-effective for patients to continue single-agent maintenance therapy for longer, and thus need less expensive therapies for a shorter duration to achieve the same effect. Although this is an area of active study, recent data suggest that early ASCT may be more expensive than delayed therapy (or being kept in reserve), at least in the jurisdiction of US health care, challenging similar arguments used previously in favor of patients undergoing early vs deferred ASCT [47,48,49,50].

The duration of first remission in the DETERMINATION trial was approximately 2 years longer with early vs delayed ASCT, with a median PFS of 67.5 months vs 46.2 months (HR 1.53 [95% CI, 1.23–1.91; p < 0.001]), but remarkably, no significant difference in OS [39]. However, for many patients, this would mean a prolonged time after ASCT when they may have regained a QoL not dissimilar to their pre-diagnosis state, perhaps returning to work, whilst taking oral maintenance therapy. This comes with the key caveat of significant loss of QoL during transplant, which may last for several months [51]. Conversely, there may clearly be patients for whom a delayed transplant is more appropriate, for example, if lifestyle factors favor early return to work after/during induction therapy, with ASCT delayed until first relapse or used later and only if needed. The risk-benefit balance in these scenarios is patient-specific and should therefore be carefully discussed, with personal and disease-related factors tailored to patient preference [52]. These issues highlight the critical importance of incorporating health-related QoL into clinical trial endpoints as well as parallel healthcare resource utilization and health economic analyses within studies to enhance optimal translation into clinical practice and for reimbursement discussions [53]. Efforts to encourage this approach are included in recommendations such as those in the European Society for Medical Oncology-Magnitude of Clinical Benefit Scale [54].

Similar to these arguments, the acceptance of the use of MRD as an endpoint for accelerated approval does not mean that MRD negativity should be pursued at all costs. Particularly in the older patient population, the balance of efficacy and toxicity is critical, and co-primary endpoints combining both should be considered to ensure the achievement of a deep response that does not come at too high a price for the patient.

Conclusions

Whilst PFS continues to be commonly used in FDA and EMA approvals as a surrogate endpoint for OS in myeloma, recent examples highlighted above demonstrate situations in which clinical benefit of PFS does not translate to OS. Understanding why this has occurred can be aided by subgroup analyses of clinical trials, potentially even when these were not pre-specified and new information has come to light; such analyses may help avoid overlooking a group that derives significant benefit from a particular therapy, or missing others for whom the benefit is less. Furthermore, this may help identify target populations for new agents that may better inform subsequent studies. The use of such analyses may require dialog with approval bodies such as the FDA and EMA to ensure such findings are accepted in drug approval reviews, as well as being patient-focused to optimize availability of novel agents to target populations in need. Whether or not subgroup analyses can contribute to approval processes may depend on whether there is a strong biological rationale for the subgroup effect, whether the analysis was prespecified and appropriately powered, and the magnitude of difference. Post-hoc analyses may provide a high degree of evidence but given the statistical risks associated with multiple testing and adequacy of powering of the study, further validation may be needed as part of the approval process. Importantly, for optimal patient access, this should not necessarily preclude initial approval, especially given the complexity of disease pathobiology and treatment effect as the therapeutic landscape expands, but critically, may require further validation as part of subsequent prospective studies. Additionally, ongoing follow-up of trials for OS (preferably also adequately powered) will ensure that a clear picture emerges over time.
However, powering of endpoints for OS is likely to require additional patients, longer follow-up, and therefore, significant additional resources to implement. Translational analyses performed pre-clinically and during early-phase clinical trials are critical to understand how to better design later-phase studies. Novel trial designs such as adaptive studies in which subgroups are identified at interim analysis and expanded in the second part of the study may also help improve rapid integration of this approach.

Understanding the value of a long PFS, including the duration of progression-toxicity-free survival, and the translation (or otherwise) to OS for patients is best informed by concurrent patient-reported outcomes within key phase 3 trials. If feasible, this would need to be continued at least until the point estimate of PFS2 to capture patient experiences after first relapse, as well as continuing to assess OS, which remains the gold standard for outcome.

How can these findings be applied to implementation of other surrogate endpoints in myeloma, such as MRD? Recent acceptance of the use of MRD as an endpoint for accelerated approval is vital given the ongoing advances in myeloma therapies seen to date, to enable their rapid translation to real-world practice [53]. This means that even PFS is now too long to make this a meaningful endpoint and enable early adoption of novel therapies in the frontline setting. However, if MRD is used as an endpoint, it is equally important to ensure appropriate follow-up for PFS and OS. Subgroup analyses should be encouraged to avoid missing key groups that do, or do not, benefit unless a target population can be identified from prior knowledge of drug action, which requires both confirmation in randomized trials and careful attention to correlatives. In our view, implementing these measures will hopefully ensure timely drug approvals in myeloma and subsequently, further improvements in clinical benefit for all our patients.