Sir

We compliment Dr Flores and colleagues for the detailed descriptions of methodology and statistical analysis in their report on mifepristone for psychotic major depression (PMD) (Flores et al, 2006). However, similar to our concerns about earlier trials of mifepristone for this disorder (Rubin and Carroll, 2004), we have reservations about the authors' interpretations of their data.

First, we question the analysis of mifepristone's efficacy. Only the number of individuals meeting a 50% reduction in the Brief Psychiatric Rating Scale's Positive Symptom Subscale (BPRS PSS) was reported as significantly different between drug and placebo (reported χ²=3.968, 1 df, p=0.046). The authors used an uncorrected χ² test to obtain this result, even though the frequency is less than five in one of the cells of the 2 × 2 table. When the more appropriate χ² test with Yates' correction is applied, the result is not significant (χ²=2.539, 1 df, p=0.111). An alternative statistical test is Fisher's exact test, which, with a two-tailed test, is also nonsignificant (p=0.109). Thus, there is not even a nominally significant difference between drug and placebo in the proportion of BPRS PSS responders. In addition, by ANOVA, the authors found no main effect of medication on BPRS PSS scores (Flores et al, 2006). This negative finding casts further doubt on the validity of the authors' claim of efficacy using the categorical criterion of 50% reduction of BPRS PSS scores. The conclusion that ‘…mifepristone was found to be significantly more effective than placebo in reducing psychotic symptoms’ therefore is not supported.
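
The three tests named above can be checked with a short sketch. The cell counts are not stated in this letter, so the 2 × 2 table used here (7 of 15 responders on mifepristone, 2 of 15 on placebo) is an assumption, chosen only because it appears to reproduce the reported uncorrected χ² of 3.968. The function names are ours, and the code uses only the Python standard library.

```python
import math

# ASSUMPTION: the letter does not give the cell counts. These values are
# illustrative and were picked because they reproduce the reported statistics.
RESPONDERS_DRUG, N_DRUG = 7, 15
RESPONDERS_PLACEBO, N_PLACEBO = 2, 15


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction shrinks |ad - bc| by n/2 (floored at 0).
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom


def chi2_p_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact p: total probability of every table that is
    no more likely than the observed one, with the margins held fixed."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalised hypergeometric probability, kept as an exact integer.
        return math.comb(row1, k) * math.comb(row2, col1 - k)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(lowest, highest + 1)
                  if weight(k) <= observed)
    return extreme / math.comb(row1 + row2, col1)


a, b = RESPONDERS_DRUG, N_DRUG - RESPONDERS_DRUG
c, d = RESPONDERS_PLACEBO, N_PLACEBO - RESPONDERS_PLACEBO

chi2_raw = chi2_2x2(a, b, c, d)
chi2_yates = chi2_2x2(a, b, c, d, yates=True)
p_raw = chi2_p_1df(chi2_raw)
p_yates = chi2_p_1df(chi2_yates)
p_fisher = fisher_two_tailed(a, b, c, d)

print(f"uncorrected chi2 = {chi2_raw:.3f}, p = {p_raw:.3f}")
print(f"Yates chi2       = {chi2_yates:.3f}, p = {p_yates:.3f}")
print(f"Fisher exact (two-tailed) p = {p_fisher:.3f}")
```

Under the assumed counts, only the uncorrected test falls below 0.05; the Yates-corrected and Fisher results do not. A different table would of course give different numbers.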

The second problem is that three outcome measures (the BPRS PSS, the full BPRS, and the Hamilton Depression Scale) were each analyzed by two different statistical methods: χ² for the number of subjects achieving a certain percentage reduction on each scale, and repeated measures ANOVA on the raw scale scores at Day 0 and Day 8. This makes six statistical tests in all, and none of the other five yielded a significant result. The aforementioned reported significance level of 0.046, even if it were valid, would represent nominal significance for only one test and would become clearly nonsignificant when corrected for the actual number of statistical tests performed. Thus, by conventional statistical standards, the authors failed to demonstrate efficacy of mifepristone in reducing psychotic symptoms.
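
To make the size of this correction concrete, the sketch below applies a Bonferroni adjustment to the reported p-value. The choice of Bonferroni and the family-wise level of 0.05 are assumptions made for illustration; the argument does not depend on that particular method. The count of six tests follows from three outcome measures each analyzed two ways.

```python
# Three outcome measures, each analysed by two methods (from the letter).
N_TESTS = 3 * 2
REPORTED_P = 0.046  # the single nominally significant result
ALPHA = 0.05        # ASSUMPTION: conventional family-wise error level


def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


adjusted_p = bonferroni_adjust(REPORTED_P, N_TESTS)
per_test_threshold = ALPHA / N_TESTS

print(f"adjusted p = {adjusted_p:.3f}")
print(f"per-test threshold = {per_test_threshold:.4f}")
print("significant after correction:", adjusted_p < ALPHA)
```

With six tests the reported value would have needed to fall below roughly 0.008 to survive this correction. Less conservative procedures exist, but none of them would rescue a p-value of 0.046 when the other five tests are nonsignificant.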

Third, there is reason to question the generalizability of these less than robust clinical trial results to clinical practice. The prevailing construct of PMD describes a severe illness (American Psychiatric Association, 2000) with significant suicide risk and functional incapacity that requires close clinical monitoring. Indeed, the company that aims to bring mifepristone to market for treatment of PMD (Corcept Therapeutics Inc.) has stated its intention to target in-patient psychiatric services that offer electroconvulsive therapy (ECT) as the company's primary market (Corcept Therapeutics Inc., 2004). In the present study, however, an unstated number of subjects was recruited by advertisement rather than through clinical pathways; subjects waited up to 4 weeks between screening and study participation; and, when the study treatment was finally scheduled, 16% of the enrolled sample (one in six cases) no longer met diagnostic criteria for major depressive disorder with psychotic features. These characteristics suggest that many patients had relatively mild or transient symptoms, and that they were generally not in clinical crisis. As well, the authors failed to demonstrate baseline overactivity or dysregulation of the hypothalamo-pituitary-adrenal (HPA) axis, a salient feature of PMD that provides the rationale for trials of a glucocorticoid receptor blocker like mifepristone.

Fourth, in their analyses of efficacy, the authors erroneously claimed that they performed an ‘(e)xamination of the clinical utility of mifepristone in the treatment of psychotic symptoms…’ They did not examine clinical utility. Rather, they examined a surrogate psychometric measure of psychotic symptom severity (BPRS PSS). This instrument is an ordinal subscale that lacks proven ratio properties, so the clinical utility of a 50% reduction of symptoms is uncertain. It is clear that the patients still had a notable mean psychotic symptom load at the end of 8 days of treatment with mifepristone. We also call attention to the invalid PSS content used by the authors: one of the four items included in their PSS is termed ‘delusions,’ which is not an item in the full BPRS. Straightforward and easily understood measures of clinical utility would include: How many patients achieved remission of depression with mifepristone compared to placebo? How many in each group achieved full resolution of psychotic symptoms? How many in each group achieved remission of functional incapacity? What were the Global Assessment of Functioning (GAF) Scale scores (American Psychiatric Association, 2000) before and after treatment with mifepristone compared to placebo? In how many cases did addition of mifepristone or placebo render planned change of ineffective ongoing standard drug treatments unnecessary? In how many cases did addition of mifepristone or placebo render planned use of ECT unnecessary? These are the outcomes clinicians need to know. Other pertinent measures of clinical utility not reported are number needed to treat to remission, number needed to harm (four of 15 developed a skin rash), risk–benefit ratio, and cost-effectiveness (Laupacis et al, 1988; Feinstein, 1987).
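
As a brief illustration of how one of these unreported measures is obtained, number needed to treat and number needed to harm are each the reciprocal of an absolute difference in event rates between groups. In the sketch below, only the figure of four rashes among 15 mifepristone patients comes from this letter. The placebo rash count is not stated here, so the value of one in 15 is purely hypothetical, and the function name is ours.

```python
import math


def number_needed(events_treated, n_treated, events_control, n_control):
    """Reciprocal of the absolute difference in event rates between groups.

    Read the result as number needed to harm when the event is adverse,
    or number needed to treat when the event is a benefit such as remission.
    """
    rate_treated = events_treated / n_treated
    rate_control = events_control / n_control
    difference = abs(rate_treated - rate_control)
    if difference == 0:
        return math.inf  # no difference in rates, so no finite number exists
    return 1 / difference


# From the letter: four of 15 patients on mifepristone developed a skin rash.
RASH_DRUG, N_DRUG = 4, 15
# HYPOTHETICAL: the placebo rash count is not given in this letter.
RASH_PLACEBO, N_PLACEBO = 1, 15

nnh = number_needed(RASH_DRUG, N_DRUG, RASH_PLACEBO, N_PLACEBO)
print(f"number needed to harm (hypothetical placebo rate): {nnh:.1f}")
```

The same function would give number needed to treat to remission if remission counts for each group were supplied, which is exactly the information the report does not provide.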

Demonstration of clinical utility also will require comparison of mifepristone with standard treatments for psychotic depression, such as antidepressant–antipsychotic drug combinations and ECT. Recent research has highlighted how unreliable are claims of clinical utility for new drugs when comparison is made only with placebo, rather than with established treatments (Freedman, 2005; Lieberman et al, 2005). To date, there have been no head-to-head comparisons of mifepristone with standard treatments of psychotic depression. It is clear from the full BPRS and Hamilton Depression Scale data (Flores et al, 2006) that mifepristone had no significant advantage over placebo on overall clinical status or depression severity. The mean Hamilton score dropped eight points on mifepristone and seven points on placebo.

Fifth, the authors erroneously concluded from their endocrine data that ‘short-term use of mifepristone…may re-regulate the HPA axis (in psychotic depression).’ The only inference supported by the data is that mifepristone elevated plasma ACTH and cortisol levels, especially during the morning surge in circadian HPA axis activity, which is the expected effect of a glucocorticoid receptor blocker, even in control subjects. As the repeat testing in this study was conducted with the drug still present, the claim of HPA axis ‘re-regulation’ by reference to ‘steepened ascending slopes’ of cortisol and ACTH reflects nothing more than the drug confound. This claim of ‘re-regulation’ is misleading because it implies that the brief period of treatment with mifepristone led to an ongoing, therapeutically beneficial re-regulation of previously dysregulated HPA axis activity. The authors presented no data to support this suggestion, no data indicating pretreatment HPA axis dysregulation, no data indicating persistence of the claimed ‘re-regulation’ beyond the period of exposure to the drug, and no data from control subjects as a necessary comparison. As clinical use of mifepristone is projected for only 7–8 days in treating psychotic depression (Belanoff et al, 2002), the authors may not claim ‘re-regulation’ of the HPA axis in the absence of data beyond the immediate treatment period. A further problem is that the authors' claimed ‘re-regulation of the HPA axis’ by mifepristone, with elevation of plasma ACTH and cortisol above pretreatment levels, does not mitigate the baseline HPA axis overactivity of patients with PMD but rather adds to it.

In summary, the authors have made an invalid claim of efficacy for mifepristone in PMD, based on inadequate statistical analyses. They have made an invalid claim of clinical utility based on an inadequate measure of that construct. Findings from the population studied have dubious generalizability to clinical practice. Finally, the authors have made an unjustified claim of beneficial neuroendocrine change based on weak experimental design and on a fundamental confusion about re-regulation of the HPA axis in PMD.