For many are called, but few are chosen.

Matthew 22:14

In this issue of Leukemia, Montalban-Bravo et al.1 report results of an interesting clinical trial of azacitidine alone or with vorinostat in 109 subjects with acute myeloid leukemia (AML) or high-risk myelodysplastic syndrome (⩾intermediate-2 by IPSS). What is exceptional is that the study cohort was composed of subjects usually excluded from clinical trials, whereas subjects usually included in clinical trials were excluded. Eligibility criteria required ⩾1 of the following: (1) an Eastern Cooperative Oncology Group (ECOG) performance score ⩾3; (2) serum creatinine or bilirubin >2 mg/dl; (3) ⩾1 of various co-morbidities; (4) another cancer, active or in remission for <2 years. Response and survival were assessed after entry of each cohort of 3–6 subjects; accrual was to terminate should there be <5% probability that 60-day survival (the primary endpoint) was ⩾20% better than the 50% seen previously in 181 similar subjects. A similar design evaluated the complete remission rate against the historical rate of 28%. A rate of grade ⩾3 non-hematologic toxicity >20% was also a stopping criterion. The 109 subjects (median age 71 years) included 55% eligible because of a prior cancer, 23% because of an abnormal bilirubin or creatinine, 13% because of an ECOG performance score ⩾3 and 17% because of other co-morbidities; six subjects met >1 eligibility criterion. None of the stopping rules was met. The 60-day survival rate was 79% and was not influenced by which specific criterion led to eligibility.
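A stopping rule of this kind can be made concrete with a small Beta-binomial sketch. It rests on assumptions not stated in the trial description above: a uniform Beta(1, 1) prior on 60-day survival, "⩾20% better than 50%" read as an absolute target of 70%, and the historical 50% treated as fixed rather than uncertain. The function names are hypothetical, and this is not the trial's actual monitoring code.

```python
from math import comb


def prob_exceeds(threshold, successes, failures, prior_a=1, prior_b=1):
    """Posterior Pr(theta > threshold) under a Beta(prior_a, prior_b) prior.

    Requires integer prior parameters so the exact identity
    Pr(Beta(a, b) > x) = Pr(Binomial(a + b - 1, x) <= a - 1) applies.
    """
    a = prior_a + successes
    b = prior_b + failures
    n = a + b - 1
    return sum(comb(n, j) * threshold ** j * (1 - threshold) ** (n - j) for j in range(a))


def should_stop(survivors, deaths, target=0.70, cutoff=0.05):
    """Hypothetical futility rule: stop if Pr(60-day survival > target) < cutoff.

    target=0.70 ASSUMES "20% better than 50%" means 50% + 20 percentage points.
    """
    return prob_exceeds(target, survivors, deaths) < cutoff


# Interim looks (illustrative counts, not the trial's data).
for survivors, deaths in [(0, 3), (2, 1), (86, 23)]:
    print(survivors, deaths, round(prob_exceeds(0.70, survivors, deaths), 3),
          should_stop(survivors, deaths))
```

Three deaths in a first cohort of three already push the posterior probability of exceeding 70% to 0.3^4, about 0.008, so a rule like this can halt a trial quickly in a frail population, whereas 86 survivors of 109 (roughly the 79% reported) leave it far from the cutoff. The same function could monitor the complete remission rate against 28%; for toxicity the inequality would be reversed.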

There are several potential criticisms of this study, but they are not the focus of our editorial. Rather, it is the study eligibility criteria that we find most interesting, because they provide a precedent for relaxing eligibility criteria in future clinical trials in AML, especially if accompanied by stringent monitoring of adverse events as occurred here. Relaxing eligibility criteria could expand the number of persons entering clinical trials and help bridge the gap between appearance and reality in AML clinical trials.

Reality in AML therapy is perhaps best represented by data from the Surveillance, Epidemiology, and End Results (SEER) program. Although 5-year survival rates for AML have improved to about 27% in 2006–2012 (the most recent reported interval), the pace of improvement has been slow (http://seer.cancer.gov/statfacts/html/amyl.html). Because 60% of persons in the United States ⩾66 years (the approximate median age at AML diagnosis) receive no therapy within 3 months of diagnosis, the SEER data may underestimate progress among those treated.2 However, there is little doubt that much of the gap between appearance and reality reflects reports of advances in AML therapy outpacing real improvements. Ten years ago one of us (EE) reported that about 70% of 91 abstracts describing 39 new therapies for AML presented at the American Society of Hematology (ASH) annual meetings of 1993–2001 described promising or encouraging data.3, 4 Only 15% of studies were declared negative; the other 15% were inconclusive. About 45 of the 63 positive abstracts eventuated in peer-reviewed publications, 38 of which reported favorable outcomes. However, with a minimum 5-year follow-up, only one drug, gemtuzumab ozogamicin, was approved by the US Food and Drug Administration (FDA) or European Medicines Agency (EMA) for therapy of AML, and that approval was subsequently withdrawn.5 Although factors other than efficacy influence approval and subsequent use, approval per se does not necessarily translate into clinically meaningful (an FDA term) benefit for most persons with AML.6

Our ASH survey is 15 years old. Although our conclusions may not apply to current ASH abstracts or to new drugs such as midostaurin and CPX-351 (each of which currently has limited indications),7, 8 we suspect they still hold. This is because the constituency for success of new therapies far exceeds the constituency for objectivity. Investigators, pharmaceutical companies, medical centers and persons with AML cannot be expected to be disinterested, given the benefits of seemingly promising results. The number of trials whose results cannot be translated into clinical practice has led to a recent emphasis on pragmatic trials, which aim to evaluate new drugs in a real-world setting.9 As such, these trials address recruitment of subjects and investigators, delivery of the intervention within the trial, the nature of follow-up, and the nature, determination and analysis of outcomes. Here we focus on recruitment of subjects into AML clinical trials, emphasizing the selection biases that make persons entering these trials seem the chosen people.

The most obvious cause of selection bias is restrictive study eligibility criteria. The trial of Montalban-Bravo et al.1 is notably distinct from the typical protocol, which requires an ECOG performance score <3, near-normal kidney, liver and heart function, no serious co-morbidities and no uncontrolled infections. Failing any of these requirements is known to confer an unfavorable prognosis, as does belonging to the vague but highly important group of persons, typically older, judged unfit. Excluding such subjects improves outcomes of clinical trials but limits translation of the results to the universe of persons with AML. In other instances, restrictive eligibility criteria may have less impact on outcomes but pose ethical issues. Consider a 60-year-old woman with breast cancer who, after surgery, has a 90% probability of cure without need for further therapy. One year later she develops AML with complex cytogenetics. According to European LeukemiaNet (ELN)10 and National Comprehensive Cancer Network (NCCN)11 guidelines, she should enter a clinical trial. However, her antecedent breast cancer leads to exclusion from trials, even though her AML is clearly not therapy-related, there are no data suggesting the prior cancer affects her AML prognosis, and data from persons with metastatic lung cancer and a history of a prior cancer suggest it would not.12 A similar conclusion is suggested by data from the Montalban-Bravo et al.1 study. Less restrictive eligibility criteria would potentially address this ethical issue, increase the generalizability of results, accelerate accrual and decrease the complexity and costs of conducting and monitoring clinical trials. Van Spall et al.13 reported that 37% of 2709 exclusion criteria in 283 randomized controlled trials were poorly justified, and about 84% of these trials included ⩾1 such criterion.

Enrollment of newly eligible subjects could decrease effect size. This might occur if, for example, AML subtypes less likely to benefit from a new drug and/or more likely to be harmed by it are enrolled. In this circumstance more subjects would be needed to maintain a given power, potentially more than counterbalancing the impact of faster accrual and so increasing study duration. George14 examined this issue, positing various reductions in effect size in the newly eligible subjects and various increases in accrual rate. If the reduction in effect size was relatively small, a 10% increase in accrual rate would provide enough subjects to maintain a power of 0.9, and study length would be unaffected. If the new drug had no effect in the newly eligible subjects, study duration would need to be longer to maintain a power of 0.9, especially if accrual was dominated by newly eligible subjects. Increases in generalizability of conclusions must be balanced against increases in study duration. Interestingly, relatively simple calculations such as those described by George14 have gained no traction in clinical trial design. Obviously, estimating the hazard ratio with the new treatment in currently ineligible subjects is difficult if they remain ineligible. Making such subjects eligible might identify additional subgroups likely to respond, especially if enrollment is accompanied by attempts to discover and verify biomarkers associated with response. There are several examples where a wider spectrum of subjects responded to a drug than originally hypothesized.15 It follows that current studies may exclude subjects without a scientific basis. Kim et al.16 noted that the rationale for relaxing eligibility criteria remains applicable in the era of precision medicine and molecular-based therapy.
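The trade-off George14 describes can be approximated with a short calculation. The sketch below is not George's method: it substitutes Schoenfeld's approximation for the number of events a two-arm (1:1) log-rank comparison needs, assumes the pooled log hazard ratio is a weighted average of the log hazard ratios in originally and newly eligible subjects, and assumes study duration scales with required events divided by accrual rate. Function names and parameter values are hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def required_events(hr, alpha=0.05, power=0.9):
    """Schoenfeld approximation: events needed by a two-sided log-rank test, 1:1 allocation."""
    if hr <= 0 or hr == 1:
        raise ValueError("hazard ratio must be positive and different from 1")
    z = NormalDist().inv_cdf
    return 4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hr) ** 2


def diluted_hr(hr_orig, hr_new, frac_new):
    """ASSUMED pooled hazard ratio: log-HRs averaged by the share of newly eligible subjects."""
    return exp((1 - frac_new) * log(hr_orig) + frac_new * log(hr_new))


def relative_duration(hr_orig, hr_new, accrual_increase, alpha=0.05, power=0.9):
    """Duration with relaxed eligibility relative to the original design (<1 means shorter).

    ASSUMES duration is proportional to required events / accrual rate and that every
    extra subject accrued comes from the newly eligible group.
    """
    frac_new = accrual_increase / (1 + accrual_increase)
    events_orig = required_events(hr_orig, alpha, power)
    events_new = required_events(diluted_hr(hr_orig, hr_new, frac_new), alpha, power)
    return (events_new / events_orig) / (1 + accrual_increase)


for hr_new in (0.70, 0.75, 0.85, 1.00):
    print(f"HR in newly eligible = {hr_new:.2f}: relative duration = "
          f"{relative_duration(0.70, hr_new, 0.10):.2f}")
```

With a hazard ratio of 0.7 in currently eligible subjects and a 10% accrual increase, a modest dilution (0.75 in the newly eligible) still shortens the study under these assumptions, whereas no effect at all in the newly eligible (hazard ratio 1.0) lengthens it by about 10%, the same qualitative pattern described above.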

Although Montalban-Bravo et al.1 concluded their subjects had been treated safely relative to historical controls, loss of a safety margin with less restrictive eligibility criteria for trials of new drugs is plausible. However, medicine is fundamentally concerned with weighing risks against benefits. It is likely many subjects would conclude the benefit-to-risk ratio with conventional therapy is so low they would prefer to be in a clinical trial. This assumption is implicit in ELN10 and NCCN11 recommendations of clinical trials for persons with poor prognoses with conventional therapies. Furthermore, reliance on restrictive eligibility criteria as the sole arbiter of study safety may be misleading, reducing medical practice to an exercise in box-checking independent of clinical judgment. We also note that therapy-related mortality (TRM) rates in AML clinical trials have decreased.17 Although this decrease was observed in eligible subjects, it might also apply to currently ineligible subjects. The eligibility criterion of no prior cancer could be eliminated, as could thresholds for creatinine, bilirubin or other liver function tests when the drug(s) being studied are not metabolized or excreted by the relevant organ.18 Requiring a normal left ventricular ejection fraction contradicts data indicating that anthracycline-induced myocardial toxicity is cumulative and highly unlikely to result from the doses of these drugs typically used in initial therapy of AML.19 There is no reason to require a bone marrow sample if there are many myeloblasts in the blood (save for some research questions), as results of cytogenetic and mutation analyses have been shown to be concordant between blood and bone marrow samples.20, 21 Many people are referred after an initial bone marrow examination has been performed, but the study protocol requires another sample for biologic studies of unproved value. Subjects, physicians and third-party payers frequently object, and the subject is not enrolled in the study.
Worse, this process is very plausibly non-random, serving to exclude unfit subjects. Moreover, many persons >65 years (excluding the authors) can be healthier than younger persons. Substantial data indicate that chronologic age is not the most important determinant of TRM; consideration should be given to replacing age as an eligibility criterion with a composite index incorporating age with other covariates such as performance score, bilirubin and creatinine, rather than regarding each of these in isolation.22, 23 Such indices are widely available.22, 23 In addition, eligibility criteria should be unambiguous, in contrast to vague statements such as "no active infections" or "reasonable life expectancy".
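To illustrate how a composite index differs from checking each covariate against its own cut-off, here is a minimal logistic-score sketch. The coefficients, the 20% TRM ceiling and the function names are hypothetical placeholders chosen only to show the mechanics; they are not taken from the published indices cited above.22, 23

```python
from math import exp

# HYPOTHETICAL weights for illustration only; not taken from any published TRM index.
HYPOTHETICAL_WEIGHTS = {
    "intercept": -4.0,
    "per_decade_over_60": 0.3,
    "ecog_ps": 0.6,
    "bilirubin_mg_dl": 0.3,
    "creatinine_mg_dl": 0.4,
}


def predicted_trm(age, ecog_ps, bilirubin, creatinine, w=HYPOTHETICAL_WEIGHTS):
    """Logistic composite score: covariates are combined, not checked one by one."""
    linear = (
        w["intercept"]
        + w["per_decade_over_60"] * max(age - 60, 0) / 10
        + w["ecog_ps"] * ecog_ps
        + w["bilirubin_mg_dl"] * bilirubin
        + w["creatinine_mg_dl"] * creatinine
    )
    return 1 / (1 + exp(-linear))


def eligible(age, ecog_ps, bilirubin, creatinine, max_trm=0.20):
    """Hypothetical rule: eligible if predicted TRM does not exceed max_trm."""
    return predicted_trm(age, ecog_ps, bilirubin, creatinine) <= max_trm


print(round(predicted_trm(75, 0, 0.6, 0.9), 3), eligible(75, 0, 0.6, 0.9))
print(round(predicted_trm(55, 3, 2.5, 2.2), 3), eligible(55, 3, 2.5, 2.2))
```

Under these made-up weights a fit 75-year-old (performance score 0, bilirubin 0.6 mg/dl, creatinine 0.9 mg/dl) has a predicted TRM of about 5% and is eligible, whereas a 55-year-old with performance score 3, bilirubin 2.5 mg/dl and creatinine 2.2 mg/dl scores about 36% and is not; an age cut-off alone would rank them the other way.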

A more subtle form of selection bias occurs when subjects are excluded from study entry despite meeting eligibility criteria. The magnitude of this selection bias is difficult to quantify. Reasons for excluding these persons are typically non-random and reflect physicians’ subjective (but often accurate) impression that these persons will not do well. Few would find the results of a study in which, for example, 30 subjects are treated out of an unknown number of eligible subjects, including those seen but never referred, more credible than the results of a study in which the number treated approximates the number eligible. Our experience reviewing typescripts suggests examples of the former far outnumber the latter. Consideration should be given to requiring that trialists report not only the number of subjects treated on-study but also the number of eligible persons not treated and the reason(s) why. Outcomes of the latter could be reported as quasi-controls.

Selection bias can also arise after study initiation. For example, drop-outs from a study often occur at the time of complete remission. A German AML Cooperative Study Group trial that randomized subjects at study entry to several subsequent therapy allocations, such as maintenance chemotherapy versus an autotransplant, found that <30–50% of randomized subjects in remission received their assigned therapy.24 Selection bias would arise if the drop-outs differed systematically from those receiving their assigned therapy. This might occur if, as is typical, those who receive induction and post-remission therapy are younger and fitter than those receiving only induction therapy. Without knowing the characteristics (and numbers) of these two groups, one might seriously overestimate the benefit of a post-remission therapy in the starting population. Another example: about 10% of subjects on ECOG trials drop out after each therapy cycle, again inflating the estimated benefit in the starting population and in the totality of persons with AML (J Rowe, personal communication).
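A small expected-value calculation makes the drop-out argument concrete. It assumes two hypothetical strata (fit and unfit), hypothetical survival probabilities and hypothetical probabilities of actually receiving the assigned post-remission therapy, and it gives the therapy no true effect. Comparing subjects who did and did not receive it still shows an apparent benefit, purely from selection.

```python
def as_treated_survival(frac_fit, surv_fit, surv_unfit, p_treat_fit, p_treat_unfit, effect=0.0):
    """Expected survival among subjects who did and did not receive assigned post-remission therapy.

    All inputs are hypothetical; `effect` is the true absolute survival gain (0.0 = none).
    """
    treated_fit = frac_fit * p_treat_fit
    treated_unfit = (1 - frac_fit) * p_treat_unfit
    untreated_fit = frac_fit - treated_fit
    untreated_unfit = (1 - frac_fit) - treated_unfit
    treated = (treated_fit * (surv_fit + effect)
               + treated_unfit * (surv_unfit + effect)) / (treated_fit + treated_unfit)
    untreated = (untreated_fit * surv_fit
                 + untreated_unfit * surv_unfit) / (untreated_fit + untreated_unfit)
    return treated, untreated


def fraction_remaining(dropout_per_cycle, cycles):
    """Share of the starting population still on study after a fixed per-cycle drop-out."""
    return (1 - dropout_per_cycle) ** cycles


treated, untreated = as_treated_survival(0.5, 0.5, 0.2, 0.7, 0.2)
print(f"treated {treated:.3f} vs not treated {untreated:.3f}")
print(f"on study after 4 cycles: {fraction_remaining(0.10, 4):.4f}")
```

With half the population fit, survival of 50% (fit) versus 20% (unfit), and 70% versus 20% of each stratum receiving the assigned therapy, survival is about 43% among those treated versus about 28% among those not treated, although survival in the whole starting population is 35% whatever they receive. Likewise, a 10% drop-out per cycle leaves about 66% of subjects after four cycles.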

Although selection bias usually results in over-estimation of benefit, another type of selection bias potentially predisposes to false-negative results. Here, new drugs are tested exclusively in subjects likely to fail unless the drug is extraordinarily effective, such as those with advanced AML or adverse cytogenetic or mutation profiles. This bias might be decreased by successively testing new drugs in cohorts with diverse prognoses. The number of subjects in each cohort would depend on the results in the previous cohort, with worse outcomes in the previous cohort resulting in fewer subjects in the next. For example, a Bayesian strategy like that used by Montalban-Bravo et al.1 to adjust randomization probability in the expansion phase of their trial might be appropriate.25 This process begins with a prior probability of response for the worst-prognosis cohort. Bayes’ theorem is then used to combine the observed results with the prior probability to yield a posterior probability, which serves as the prior probability for the next-worst-prognosis cohort. This strategy assumes a continuum of prognostic cohorts with increasing probabilities of a favorable outcome, a readily testable assumption.
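The cohort-by-cohort updating can be sketched with a conjugate Beta-binomial model, in which the posterior from one cohort becomes the prior for the next, better-prognosis cohort. The rule mapping the posterior mean response rate to the next cohort size (between 6 and 30 subjects) is a hypothetical placeholder, as are the class and function names; this is not the algorithm of Montalban-Bravo et al.1 or of reference 25.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BetaPrior:
    a: float  # pseudo-count of responders
    b: float  # pseudo-count of non-responders

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def update(self, responders: int, n: int) -> "BetaPrior":
        # Conjugate Beta-binomial update: Beta(a + responders, b + non-responders).
        return BetaPrior(self.a + responders, self.b + (n - responders))


def next_cohort_size(posterior: BetaPrior, min_n: int = 6, max_n: int = 30) -> int:
    # HYPOTHETICAL rule: poorer results in the previous cohort -> fewer subjects in the next.
    return round(min_n + (max_n - min_n) * posterior.mean)


def chain_cohorts(cohorts: List[Tuple[int, int]], prior: BetaPrior) -> List[Tuple[BetaPrior, int]]:
    """cohorts: (responders, n) pairs ordered from worst to best prognosis."""
    results = []
    for responders, n in cohorts:
        prior = prior.update(responders, n)  # this posterior is the next cohort's prior
        results.append((prior, next_cohort_size(prior)))
    return results


for posterior, size in chain_cohorts([(1, 10), (4, 10)], BetaPrior(1, 1)):
    print(posterior, round(posterior.mean, 3), size)
```

Starting from a uniform Beta(1, 1) prior, 1 responder in 10 in the worst-prognosis cohort gives a Beta(2, 10) posterior (mean about 0.17) and a recommended next cohort of 10 subjects. Carrying the posterior forward unchanged is conservative under the continuum assumption, since the next cohort is expected to respond at least as well; one could instead down-weight the carried-forward counts.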

We hope the typescript of Montalban-Bravo et al.1 and this editorial stimulate discussion of the important role of selection biases in determining outcomes of AML therapy. Further improvements in AML therapy may depend on a better understanding of the differences between normal and neoplastic myelopoiesis and the consequent development of truly effective drugs. The pace at which this will occur is, to some extent, beyond our control, difficult as this may be to admit. Consequently, we should focus on controllable factors such as selection biases in clinical trials. Let’s increase the number of chosen people so we might have a better idea of the real impact of new therapies in AML.