Main

Do old paradigms remain relevant in this era of personalised medicine? Oncologists often design early Phase 2 trials as single arm studies, with dichotomous clinical outcomes as primary efficacy endpoints. There are hypothesised population values for the target endpoints of interest; and, comparison of observed outcomes from the trial with these population values are then utilised to justify further clinical testing. In this commentary, we argue that one might improve on the design and analysis of such trials through the use of individualised information.

We begin with a motivating example. The author recently consulted on a clinical study aimed at assessing the efficacy of adjuvant multimodality therapy in patients at high risk for prostate cancer recurrence after radical prostatectomy (Michael Lilly, University of California Irvine Comprehensive Cancer Center, personal communication). A single arm Phase 2 study was conducted, with biochemical recurrence constituting the primary efficacy endpoint. It was hypothesised that 2-year non-recurrence exceeding 90% would warrant further clinical investigation of the new therapy. Twenty-four patients were initially enrolled, and two recurrences were observed within 2 years of prostatectomy. Should the trialists be encouraged by the seemingly positive outcome of this trial?

The 90% target represents a global assessment, and represents the trialists’ prior judgment of a clinically significant outcome (Adjei et al, 2009). Nevertheless, this target outcome can be refined with individualised information from the study patients. For example, such individualised information is available from nomograms, which present tailored individual predictions of clinical outcomes based on patient characteristics known to be predictive of the outcome of interest. Several validated nomograms for disease recurrence after radical prostatectomy for prostate cancer have been developed (Feller, 1968; Kattan et al, 1998, 1999; Berry, 2006). In particular, these nomograms have been shown to predict actual clinical outcomes with high accuracy. We will illustrate how these nomogram assessments can be used as a comparator in our clinical trial, with emphasis on whether observed disease recurrence differs from what might be expected with nomogram prediction. Readers interested in the mathematical details can refer to the appendix; here, we summarise the main finding: if the nomogram probabilities are assumed to be accurate and well calibrated, and if the subjects enrolled in the trial have similar attributes to the training population used for nomogram development, then the probability of observing two or fewer failures by 2 years is less than one in fifty if adjuvant treatment is merely equivalent to standard of care.

We believe the use of individual estimates as comparators in the clinical trial setting is more appropriate than a global target, so long as the individual estimates are well calibrated, that is, that actual outcomes are accurately predicted by the estimated outcome probabilities. Perhaps a less contentious use of nomogram estimates in this setting relates to patient selection: one might hope to improve patient homogeneity, or the possibility of discerning treatment efficacy, by restricting entry to patients at perceived higher risk of progression. These patients would be more appropriate candidates for intensive therapy, such as adjuvant therapy administered after radical prostatectomy, than patients with a low a priori likelihood of disease progression. As a reviewer has commented, this notion of enriching a clinical trial with likely responders is very appealing, and should lead to more efficient trials. See Roach et al (2006) for related discussion.

We chose a validated nomogram for prediction of biochemical recurrence following radical prostatectomy. As a reviewer has commented, there are a plethora of available nomograms, and some discernment is needed when selecting one for comparator purposes. The nomogram we have selected aligns with the inclusion/exclusion criteria of our particular trial; and, importantly, it has been shown to be well calibrated. Hence the individualised predictions arising from the nomogram-derived probabilities should constitute an improvement over a global assumption that recurrence would occur at a fixed rate in the study cohort (as would be assumed in a ‘standard’ Phase 2 trial). Although perfect prediction would be ideal, reasonably high predictive accuracy is a realistic goal.

It has been argued (Shariat et al, 2008) that nomograms are the best available predictive tools for clinical outcomes (in terms of accuracy and discriminating characteristics) in prostate cancer. Nevertheless, alternatives to nomograms as comparators can be devised. The Stephenson nomogram is based on a Cox proportional hazards regression model, and the use of such a regression model would be another option for generating individualised predictions. Or, one could construct a more ‘modern’ nomogram, by incorporating molecular marker information or other potential predictors into the underlying algorithm. The issue then becomes, whether predictive accuracy is enhanced with these modern nomograms, relative to the available standards.

Intrinsic patient heterogeneity in clinical trials impacts both design and analysis. Suppose, for example, we were to design a Phase 2 trial to achieve a specified precision in the estimated outcome probability, based on the assumption that the clinical outcomes will be binomially distributed. If we fail to incorporate the variability in expected responses between patients (overdispersion in the responses relative to binomial variability), our design will be underpowered. Bayesian clinical trials Roach et al (2006) provide a natural framework for accommodating overdispersion in response distributions resulting from patient heterogeneity, and should become increasingly prominent in this era of personalised medicine.