If a man will begin with certainties, he shall end in doubts, but if he will be content to begin with doubts, he shall end in certainties —Francis Bacon

The US Food and Drug Administration requires proof of clinical benefit before approving new drugs for treatment of neoplasms such as acute myeloid leukemia (AML). Clinical benefit, in this context, is defined as improvement in length and/or quality of life (QoL) or in validated surrogates for these end points.1 Although several drugs have been approved in the last 3 years for other leukemias, including chronic myeloid leukemia, chronic lymphocytic leukemia and acute lymphoblastic leukemia, new drug approvals for AML have lagged. The overwhelming reason is failure to develop new, effective AML therapies: we cannot identify a drug whose Food and Drug Administration approval would have markedly changed the prognosis of most persons with AML. Nonetheless, some AML experts believe that a greater emphasis on end points other than survival would facilitate drug development and approval in AML. We discuss the pros and cons of other potential end points for trials of new AML drugs in light of distinctive disease characteristics that complicate any straightforward choice of end point. One example is the demanding nature of AML therapy, which raises questions, particularly in older persons, about how large a survival gain, and what proportion of persons benefitting, is needed to justify this burden. Other considerations include the efficacy of rescue (salvage) therapies after failure of initial therapy, especially allogeneic hematopoietic cell transplants, and improvements in the prevention and treatment of fungal and other infections, which are important causes of death in persons with AML.2, 3, 4 These rescue strategies and improvements in supportive care dissociate survival after AML therapy from surrogates such as event-free survival (EFS).
Moreover, efficient access to blood and bone marrow samples allows identification of many cytogenetic and molecular subgroups of AML and provides sensitive, albeit imperfect, ways to quantify residual leukemia cells in persons in histological complete remission. These developments raise the possibility of approving a new drug for only a subset of persons with AML and/or of using measurable residual disease (MRD) testing as a surrogate end point.5, 6 Despite these advances, however, we conclude that no current end point for approving drugs in AML is entirely satisfactory.

Conventional surrogates for survival: complete remission, remission duration and EFS

Using survival as the end point for drug approval in AML requires following subjects long enough for a substantial number of events (deaths) to occur, so that survival can be estimated precisely. In older people with AML on clinical trials, the probability of survival is 25–30% at 2 years; it is considerably higher in younger persons.7, 8, 9 Consequently, it may take 2–3 years or more to estimate survival accurately. This delay has prompted interest in a surrogate end point for survival. If a surrogate can be assessed sooner, or has a faster event rate than death, trials using it can be completed more quickly. However, a valid surrogate must correlate with survival such that improvement in one accurately predicts improvement in the other.10, 11, 12

The most obvious survival surrogate in AML is complete remission, usually defined as <5% bone marrow blasts with more than 1 × 10⁹/l granulocytes and more than 100 × 10⁹/l platelets in the blood. The validity of this surrogate rests on the observation that persons achieving complete remission live longer than those who do not, a prolongation reflecting the interval spent in complete remission.13 In prior studies, survival after relapse was similar to that of persons never achieving remission (a situation that has since changed; see below), suggesting that the longer survival associated with complete remission reflects the state of complete remission itself rather than an association between achieving complete remission and general health or AML biology.13 However, although complete remission is a more tractable end point than survival, several recent studies of new AML drugs report higher complete remission rates than in controls but no increase in survival, implying that the two can dissociate under some conditions.14, 15

If complete remission is an imperfect surrogate for survival, it seems unlikely that less stringent responses such as complete remission without platelet recovery (CRp) or complete remission with incomplete hematologic recovery (CRi) will be more valid survival surrogates. Empiric data are conflicting. Walter et al.16 reported that persons receiving intensive anti-leukemia therapy (conventional- or high-dose cytarabine with or without an anthracycline) lived longer if they attained complete remission rather than CRp, after accounting for covariates such as cytogenetics and for lead-time bias. In contrast, data from older persons receiving less intensive therapy such as azacitidine suggest no survival difference between subjects achieving complete remission and those achieving other responses, including partial remission.17, 18 These contradictory data suggest that the association between complete remission or other response states and survival differs between age cohorts and between drugs.

EFS is sometimes considered a surrogate for survival. (In fact, survival is an EFS end point in which death is the only event; see below.) Its analog, progression-free survival, is the basis for many new drug approvals in other neoplasms.19, 20 Whereas survival counts only death, EFS also counts failure to achieve complete remission and relapse from complete remission. Because most failed remission attempts, and most relapses in persons achieving complete remission, occur within 1 year of starting therapy, EFS can be evaluated sooner than survival.21, 22 Another advantage is that EFS tests a drug's efficacy more precisely than survival does, because persons with AML can live for a considerable interval after events such as relapse, owing to the rescue therapies and supportive care discussed above.2, 3, 4 These post-relapse interventions are often unspecified in the design of phase-2 and -3 studies and may be used in non-random and biased ways, which converts a controlled study into an observational database of uncontrolled interventions.23 Accordingly, if management after therapy failure differs between the conventional and investigational arms of a randomized trial, survival may differ between these arms solely because of differences in post-failure interventions unrelated to the drug being tested. Furthermore, a survival end point makes it difficult to use a crossover design in phase-3 studies. For example, a new drug that is effective after failure in persons initially assigned to placebo might narrow the survival difference between the new drug and placebo arms. Although this might decrease the probability that the new drug is approved, physicians and subjects might be more interested in prolonging survival than in having the new drug approved. We also need to consider the substantial heterogeneity of persons diagnosed with AML.
Some of the variables defining this heterogeneity have been identified, but almost one-half have not.24 Consequently, valuable information may be gained from phase-2 trials with a randomized discontinuation design and from phase-3 trials with a crossover design.
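To make the difference between the two end points concrete, the sketch below derives an EFS time and a survival time from the same subject record. It is a minimal illustration under stated assumptions: the `Subject` record and the `first_event` helper are hypothetical, times are years from study entry, and the time assigned to induction failure is simply whatever is recorded (trials may differ on this convention). Survival falls out as the special case in which death is the only event.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Subject:
    """Hypothetical per-subject record; times are years from study entry.

    A field is None when that event was not observed during follow-up.
    """
    last_follow_up: float
    induction_failure: Optional[float] = None
    relapse: Optional[float] = None
    death: Optional[float] = None


def first_event(subject: Subject, event_fields: Sequence[str]) -> Tuple[float, bool]:
    """Return (time, observed) for the earliest of the named events.

    If none of them occurred, the subject is right-censored at last follow-up.
    """
    times = [getattr(subject, name) for name in event_fields]
    observed = [t for t in times if t is not None]
    if observed:
        return min(observed), True
    return subject.last_follow_up, False


def efs(subject: Subject) -> Tuple[float, bool]:
    # EFS events: failure to achieve complete remission, relapse, or death.
    return first_event(subject, ("induction_failure", "relapse", "death"))


def overall_survival(subject: Subject) -> Tuple[float, bool]:
    # Survival is the special case in which death is the only event.
    return first_event(subject, ("death",))


# A subject who relapses at 0.8 years but, after rescue therapy, lives to 3.1 years.
rescued = Subject(last_follow_up=3.1, relapse=0.8, death=3.1)
print(efs(rescued))               # (0.8, True)
print(overall_survival(rescued))  # (3.1, True)
```

For the rescued subject, EFS ends at the 0.8-year relapse while survival continues to 3.1 years; that post-event interval is what rescue therapy and supportive care can lengthen without any change in EFS.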

Despite these considerations, however, the validity of EFS as a survival surrogate in AML is dubious. Othus et al.25 studied EFS as a potential survival surrogate in 3133 subjects with newly diagnosed AML treated on one of four randomized studies. Two trials primarily entered subjects aged <60 years and two were limited to subjects >60 years. Events occurred in about 80% of subjects and deaths in about 70%, with median censoring times of 3–9 years. Kendall tau was used to evaluate the correlation between EFS and survival by comparing EFS and survival in one subject with EFS and survival in another.26 If both EFS and survival are longer in the first subject than in the second, the pair is concordant; if EFS but not survival is longer in the first subject, the pair is discordant. These pair-wise comparisons are repeated over many subject pairs. A Kendall tau value of 1.0 (effect size=1.0) indicates EFS and survival are perfectly concordant (positively correlated), a value of −1 (effect size=−1) indicates they are completely discordant (negatively correlated), and a value of 0 (effect size=0) indicates they are unrelated. Although much of the data in these studies are censored, this can be accounted for using standard statistical techniques.27 Three studies showed a statistically significant correlation between EFS and survival, but with Kendall tau values of only 0.11–0.66. Others report a Kendall tau value of 0.47 for the EFS/survival correlation.28 These relatively small effect sizes indicate that EFS is not a reliable surrogate for survival in trials of new AML drugs. The failure of EFS as a survival surrogate reflects the efficacy of rescue therapies and the consequent ability to prolong survival after events have occurred.
For example, median survival after relapse was 2.6 years in a recent MRC/NCRI study.2 Contributing factors include the use of allogeneic transplants as rescue therapy and better supportive care, particularly more effective anti-infective drugs.2, 3, 4
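The pair-wise comparison behind Kendall tau can be sketched in a few lines. This computes the simple tau-a statistic for fully observed (uncensored) EFS and survival times only; it does not reproduce the censoring adjustments cited above, and the data are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(efs: Sequence[float], os: Sequence[float]) -> float:
    """Kendall's tau-a between paired EFS and survival times.

    Assumes every time is fully observed (no censoring). A pair of subjects
    is concordant if the subject with longer EFS also has longer survival,
    discordant if the orderings disagree, and counted as neither if either
    variable is tied.
    """
    n = len(efs)
    if n != len(os) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 subjects")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (efs[i] - efs[j]) * (os[i] - os[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data (years). Subject 0 relapses first but survives longest after rescue.
efs_years = [0.5, 0.8, 1.0, 2.0]
os_years = [3.0, 1.0, 1.5, 4.0]
print(kendall_tau_a(efs_years, os_years))  # 4 concordant, 2 discordant: (4 - 2) / 6
```

Two of the three pairs involving subject 0 are discordant, which pulls tau down to 1/3 even though the remaining three subjects are perfectly concordant with one another.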

Another cause of dissociation between EFS and survival is the number of induction courses given. In some studies more than 90% of subjects failing to achieve complete remission after a first course of induction therapy received a second course of the same therapy, as prescribed in the protocol.25 In other studies only about 50% of non-responders received a second course of induction therapy.29 The definition of EFS counts failure to enter complete remission as an event regardless of how many induction courses are given. As complete remission rates can be as high as 40–50% when a second course of the same, previously unsuccessful induction therapy is given, EFS will be longer if initial non-responders receive a second course of the same therapy but shorter if no second course is given.25, 30 However, because rescue therapies are arguably as effective as second courses of initial therapy, survival may not be materially altered by switching to another regimen. Indeed, Kendall tau values examining EFS as a survival surrogate are higher in trials in which a second course of initial therapy is given more often.25

Conceptually, composite end points such as EFS (and survival) are statistically problematic (reviewed in ref. 31). Events such as death from co-morbidities, frailty or therapy-related adverse events compete with events such as death from resistant AML. Such competing risks interfere with, or preclude, a precise estimate of the quantity of primary interest, such as the efficacy of a new anti-leukemia therapy; they also reduce statistical power and lead to inferential error. Although a complete discussion of the limitations of a composite end point such as EFS in AML is beyond the scope of our commentary, this is an important, unresolvable issue.

Newer surrogates: MRD and bridge to transplant

In many neoplasms, follow-up after therapy is indirect, relying predominantly on radiologic assessment such as computed tomography, magnetic resonance imaging or positron emission tomography. In contrast, in AML blood and bone marrow samples can be accessed directly and easily, which has led to sensitive techniques for evaluating MRD, including multi-parameter flow cytometry (MPFC), PCR, analyses of genes and gene expression, and proteomic analyses.32, 33 PCR and especially MPFC have found the widest application. Considerable data indicate that MPFC and PCR testing for MRD allows persons with AML in histological complete remission to be stratified into cohorts with very different risks of relapse, EFS and survival.34, 35, 36, 37, 38

Although counterintuitive, incorporating a requirement for a negative MRD test into the definition of complete remission or of EFS might weaken the correlation between these end points and survival. If relapse were declared on a positive MRD test despite <5% blasts histologically, EFS in such persons would shorten, although by histological criteria they would remain in complete remission. However, if physicians acting on this new criterion chose to intervene (for example, with an allotransplant), survival might increase. This would weaken the correlation between EFS and survival unless physicians intervene with therapies that shorten survival. This is more than a theoretical possibility if MRD replaces histology as the definer of relapse, given the non-trivial rates of false-negative and false-positive MRD tests (discussed below).

Survival data are typically reported as medians. However, an improvement in long-term survival may not be detected as an increase in median survival if fewer than one-half of persons benefit from an intervention. The risk of AML relapse is very low in persons who have been in complete remission for 3 or more years, regardless of prior prognostic cohort.21, 22 These data suggest 3-year leukemia-free survival might be a reasonable surrogate for long-term survival. For example, Sargent et al.39 reported that freedom from progression at 30 months in persons with follicular lymphoma was a valid surrogate for long-term progression-free survival (median more than 7 years). In persons with AML, Walter et al.16 reported a strong correlation between achieving complete remission and survival at 3 years. A stronger correlation with long-term, if not median, survival might result from incorporating MRD test results into the definitions of complete remission or EFS.34, 35, 36, 37, 38 For example, a higher proportion of persons might be alive at 3 years if they achieve complete remission with a negative rather than a positive MRD test. Alternatively, a higher proportion might be alive at this time if they remain in complete remission with a negative MRD test at, for example, 3, 6 or 12 months, which is equivalent to EFS with a negative MRD test at these time points.
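The point about medians can be illustrated with a Kaplan–Meier sketch on two small, hypothetical arms of 10 subjects each. The arms are constructed, not taken from any study: deaths through 1.5 years are identical, but 3 subjects on the new drug are alive and censored at 5 years while every control subject dies by 2.8 years. Exact fractions avoid rounding at the 50% threshold; `survival_at` and `median_survival` are illustrative helpers.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], died: Sequence[bool]) -> List[Tuple[float, Fraction]]:
    """Kaplan-Meier curve as (event time, S(t)) pairs, using exact fractions.

    died[i] is True for a death and False for a right-censored observation.
    Subjects censored at time t are kept in the risk set at t.
    """
    curve = []
    surv = Fraction(1)
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, d in zip(times, died) if u == t and d)
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
            curve.append((t, surv))
    return curve


def survival_at(curve: List[Tuple[float, Fraction]], t: float) -> Fraction:
    """S(t): the last curve value at or before t (1 before the first death)."""
    s = Fraction(1)
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def median_survival(curve: List[Tuple[float, Fraction]]) -> Optional[float]:
    """Smallest event time at which S(t) <= 1/2, or None if not reached."""
    for time, value in curve:
        if value <= Fraction(1, 2):
            return time
    return None


control_t = [0.3, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 2.0, 2.5, 2.8]
control_d = [True] * 10
new_t = [0.3, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 5.0, 5.0, 5.0]
new_d = [True] * 7 + [False] * 3

for name, t, d in (("control", control_t, control_d), ("new drug", new_t, new_d)):
    curve = kaplan_meier(t, d)
    print(name, median_survival(curve), survival_at(curve, 3.0))
```

Both arms have a median survival of 1.0 year, yet 3-year survival is 3/10 with the new drug versus 0 with control: a benefit confined to fewer than one-half of subjects is invisible in the median but obvious at a 3-year landmark.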

The accuracy of MRD testing as a predictor of relapse in persons with AML is controversial. Specifically, its precision in separating cohorts more and less likely to relapse contrasts with its imprecision in predicting relapse in individual subjects.40 For example, in SWOG study S0106, Othus et al. recently reported that MPFC test results at the time of complete remission were a stronger predictor of relapse-free survival (relapse and death as events) than age, pre-treatment cytogenetics or NPM1 and FLT3 mutations.41, 42 However, the univariate c-statistic of MPFC results in complete remission for predicting relapse-free survival was only 0.58 (1.0=perfect prediction; 0.5=no predictive value), and that of a multivariable model incorporating all these features was only about 0.7. That said, this study used MPFC techniques developed 5–10 years ago; more sophisticated MPFC testing is now available but has not been validated as a more sensitive and/or specific correlate of relapse or relapse-free survival.40 Should MRD testing become accurate enough that complete remission with a negative MRD test reproducibly identifies long-term survivors, this outcome, or EFS with a negative MRD test, might be a reasonable end point for new drug approvals.
Elsewhere we discuss substantial theoretical and practical limitations of MRD testing in AML, a situation that differs enormously from MRD testing in other leukemias such as acute lymphoblastic leukemia and chronic lymphocytic leukemia, where the testing target is a lineage marker (an IGH or TCR rearrangement) rather than a neoplasm-specific one.40 Furthermore, even in a disease such as chronic myeloid leukemia, where there is a highly sensitive and specific marker of the leukemia clone (BCR/ABL), PCR tests have a >50% false-negative rate when used to predict cure.43 These data suggest caution in using MRD test results as a survival surrogate, regardless of potential technical advances.
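The c-statistic quoted above can be sketched as Harrell's concordance index for right-censored data. The exact estimator used in the cited study is not stated here, so treat this choice as an assumption. The data are hypothetical, and pairs with tied times are skipped for simplicity.

```python
from itertools import combinations
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool], risk: Sequence[float]) -> float:
    """Harrell's concordance (c) statistic for right-censored outcomes.

    A pair is usable only if the subject with the shorter time had an event
    (otherwise we cannot tell who failed first). The pair is concordant if
    that subject also has the higher risk score; tied scores count 1/2.
    Pairs with tied times are skipped here, a simplification.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        early, late = (i, j) if times[i] < times[j] else (j, i)
        if not events[early]:
            continue
        usable += 1
        if risk[early] > risk[late]:
            concordant += 1.0
        elif risk[early] == risk[late]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical relapse-free survival data (years); event True = relapse or death.
rfs_years = [0.5, 1.0, 2.0, 3.0, 4.0]
rfs_event = [True, True, True, False, False]
mrd_positive = [1, 0, 1, 0, 0]  # binary MRD result used as the risk score
print(round(harrell_c(rfs_years, rfs_event, mrd_positive), 3))  # 0.722
```

With a binary MRD result as the only predictor, several usable pairs have tied scores and each contributes only 1/2, which pulls the statistic toward 0.5 even when the test separates higher- and lower-risk cohorts.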

Acceptance of surrogate end points such as complete remission with a negative MRD test at a pre-specified time point requires standardization of MRD testing. It will also require agreement about what constitutes long-term survival and how strongly the surrogate end point must correlate with it, the former preferably derived empirically as suggested above and the latter perhaps guided by the c-statistic. More focus on identifying surrogates for long-term survival would be consistent with the observation that most people with AML are more interested in whether they will be cured than in whether they will live 3 months longer, even if such an increase were statistically significant. Interleukin-2 therapy in kidney cancer provides a precedent for Food and Drug Administration approval of a new drug that did not prolong median survival but produced long-term remissions in some people.44

Recently there has been interest in whether a new drug might increase the proportion of persons able to proceed to an allotransplant. Fundamental to using this end point, commonly termed bridge to transplant, in new drug approvals is the notion that it will correlate with survival. A strong correlation is unlikely for several reasons. First, the definition of who is able to receive a transplant is subjective, inconsistent and non-reproducible. Second, persons in studies using a bridge-to-transplant end point frequently attain only CRp or CRi before transplant. As discussed, it is unclear whether these response states are associated with a survival advantage.16 CRi and CRp are more frequently associated with a positive MRD test than is conventional complete remission and thus may be associated with a higher risk of relapse post transplant and worse survival.34 Moreover, post-transplant outcomes such as graft-versus-host disease and cytomegalovirus infection complicate analyses of EFS or survival with the bridge-to-transplant approach. For example, what if a therapy given as a bridge to transplant produces a high rate of CRi and CRp responses but increases the likelihood of death from graft-versus-host disease or cytomegalovirus pneumonia? What is needed is a randomized trial that compares survival with conventional therapy and bridge-to-transplant strategies, and that shows moving to a transplant improves survival over a non-transplant strategy in persons with a similar response to the new drug. No such studies have been reported, and none is likely to be done. Subjects are removed from phase-2 studies to receive a transplant because they are thought likely to fare poorly otherwise.
However, censoring these subjects when they are removed is a form of informative censoring, which violates the assumption of non-informative censoring inherent in Kaplan–Meier survival analyses.45 On the basis of these considerations, we consider bridge to transplant an unvalidated end point that should not be used for new drug approvals. It assumes that shifting to a transplant improves survival, and it may also miss the efficacy of a new drug if transplant outcomes are unfavorable.
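Why informative censoring matters can be shown with a deterministic, hypothetical cohort of 10 subjects. The sketch assumes we know when the four poor-prognosis subjects would have died without a transplant, which is never observable in practice, and compares the Kaplan–Meier estimate of 1-year survival when they are followed with the estimate when they are censored at 0.3 years on leaving for transplant. The `km_survival` helper is illustrative only.

```python
from fractions import Fraction
from typing import Sequence


def km_survival(times: Sequence[float], died: Sequence[bool], t: float) -> Fraction:
    """Kaplan-Meier estimate of S(t), using exact fractions.

    died[i] is False for a censored observation; subjects censored at a
    given time remain in the risk set at that time.
    """
    surv = Fraction(1)
    for u in sorted(set(times)):
        if u > t:
            break
        at_risk = sum(1 for x in times if x >= u)
        deaths = sum(1 for x, d in zip(times, died) if x == u and d)
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
    return surv


# Hypothetical cohort. The first four are poor-prognosis subjects whose death
# times without transplant are assumed here (unobservable in practice).
poor = [0.4, 0.5, 0.6, 0.8]
others = [0.9, 1.5, 2.0, 2.5, 3.0, 3.5]

# (a) Everyone stays on study and is followed to death.
full = km_survival(poor + others, [True] * 10, 1.0)

# (b) The poor-prognosis subjects leave for transplant at 0.3 years and are censored.
censored = km_survival([0.3] * 4 + others, [False] * 4 + [True] * 6, 1.0)

print(full, censored)  # 1/2 5/6
```

Censoring the poor-prognosis subjects raises the estimated 1-year survival from 1/2 to 5/6, because Kaplan–Meier assumes censored subjects share the subsequent risk of those still under observation.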

Clinical benefit assessed by QoL instruments

Improvement of QoL is a worthy objective in AML given the morbidity (including frequent transfusions and hospitalizations) and mortality associated with AML therapies, which often yield relatively small gains in survival for most people. Although better QoL is an explicit Food and Drug Administration criterion for new drug approval, the relation between better QoL and complete remission or EFS, each with or without a negative MRD test, is poorly studied.1, 46 Nonetheless, most clinicians believe there is a strong association, because persons achieving and maintaining complete remission are typically happier and fitter than those who do not. It is plausible, and testable, that achieving and remaining in complete remission confers readily quantifiable benefits that improve QoL even if survival is unchanged. Among these benefits are receiving fewer transfusions and drugs and spending less time in medical facilities. These variables, as well as QoL, are influenced by disease state (complete remission or not), continued chemotherapy or a recent transplant. Differences in QoL may depend on a person's response state (complete remission, CRp or CRi) and response duration, and might become apparent only after recovery from the adverse effects of anti-leukemia therapy. These hypotheses need testing. Studies that carefully measure QoL over time are critical to provide the data needed to define criteria for non-survival clinical benefits of AML therapies.

In summary, defining the clinical benefit of therapy in AML trials is complicated by several disease- and treatment-specific considerations and statistical constraints that make survival a challenging and perhaps inappropriate end point for new drug approvals. Although conventional complete remission and EFS end points do not correlate well with survival, EFS may be a better measure of a new drug's efficacy because it is unaffected by subsequent uncontrolled, potentially biased interventions. Complete remission, EFS or both may correlate with better QoL. If MRD assessment becomes more accurate, which it may not, complete remission or EFS with a negative MRD test might eventually be introduced as surrogates for long-term survival, but this is premature. Much work is needed to test these hypotheses. If we discover a knockout drug for AML, many of these considerations may become unnecessary. To paraphrase Supreme Court Justice Potter Stewart on obscenity: 'I cannot define obscenity but I know it when I see it'. We will know a really effective AML drug when we see it!

We hope our commentary stimulates consideration of alternative end points for approval of new drugs for AML, especially for so-called targeted therapies under development. More importantly, we hope this discussion stimulates collection and analysis of data required to support end points other than survival for approval of new drugs in AML. To return to Francis Bacon’s comment: ‘If a man will begin with certainties, he shall end in doubts, but if he will be content to begin with doubts, he shall end in certainties’. We are certain about the former but less so about the latter.