The Food and Drug Administration (FDA) recently cleared the first transcranial magnetic stimulation (TMS) device to treat depression in patients who have failed one antidepressant (Neuronetics, 2008). That decision seems to be based on an analysis very similar to the one reported by Lisanby et al (2009).

Lisanby characterizes her study as ‘an exploratory statistical approach’, but the more accurate description is the FDA's: ‘a post hoc evaluation’ (Food and Drug Administration, 2007a). Moreover, it is a post hoc subgroup evaluation of a published negative randomized controlled trial (O'Reardon et al, 2007). As stated by FDA clinical trial expert Dr Robert Temple at another FDA advisory committee meeting, an ‘after-the-fact subset analyses in a study that did not win… is different from subset analyses in a study that did win’ (Food and Drug Administration, 2005).

Lisanby, however, states that the published trial showed ‘TMS to be safe and efficacious’ (Lisanby et al, 2009). This is misleading. In the trial (O'Reardon et al, 2007), the difference between treatment arms on the primary outcome (change in Montgomery-Åsberg Depression Rating Scale score at 4 weeks) was neither statistically nor clinically significant (p=0.057; 1.7 points on the 60-point scale). This finding became statistically significant (p=0.038) only after the post hoc exclusion of six patients who had met the a priori inclusion criteria, an obviously inappropriate statistical maneuver.

Even if the trial had been positive, several intrinsic aspects of post hoc analyses make them particularly subject to bias.

First, findings that arise from a post hoc analysis are often used to guide future investigation, but should not themselves be interpreted as conclusive (Rothwell, 2005; Furberg and Furberg, 2007). Such analyses can be motivated by an earlier inspection of the data, which is a potential source of bias (Wang et al, 2007; Hayward, 2002). As Dr Thomas Brott, chairperson of the advisory committee that considered TMS, concluded, ‘trying to stratify in this fashion and draw conclusions when the overall P-value is in the range that it is, is done with great peril’ (Food and Drug Administration, 2007b).

Second, post hoc explorations can conceal unfavorable findings. The company presented data to the FDA advisory committee showing treatment efficacy separately for those with one to four earlier treatment failures. However, Lisanby combines those with two to four such failures, comparing that group with patients with only a single treatment failure. Dichotomizing treatment failure obscures the fact that the small group (n=12) with four earlier treatment failures also seemed to respond to TMS (p=0.022), a finding that undermines Lisanby's conclusion about the effect of treatment resistance on the response to TMS (Food and Drug Administration, 2007a).

Third, Lisanby did not explain why she made no adjustment to the significance level for multiple hypothesis testing even though, according to her methods, at least 10 variables were tested.
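To illustrate why the number of tests matters, the following minimal sketch computes the chance of at least one false-positive finding across 10 tests, together with the per-test threshold a Bonferroni correction would require. The nominal significance level of 0.05 and the independence of the tests are assumptions made for illustration only; neither is taken from Lisanby's methods, and the function names are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


ALPHA = 0.05   # assumed nominal significance level (not stated in the source)
N_TESTS = 10   # "at least 10 variables were tested"

fwer = familywise_error_rate(ALPHA, N_TESTS)
threshold = bonferroni_threshold(ALPHA, N_TESTS)

print(f"Familywise error rate without adjustment: {fwer:.3f}")
print(f"Bonferroni-adjusted per-test threshold:   {threshold:.4f}")
```

Under these assumptions, roughly 4 in 10 such unadjusted analyses would produce at least one nominally significant result by chance alone, and an individual test would need p < 0.005 to remain significant after a Bonferroni correction. Correlated tests would change the exact figures but not the direction of the concern.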

In sum, Lisanby did not clearly identify her analysis as post hoc, mischaracterized the full trial as positive, and reached a conclusion of efficacy based on a post hoc evaluation that obscured important treatment variability and did not account for multiple comparisons. Indeed, when presented with the same data, the FDA advisory committee concluded that the ‘clinical effect was perhaps marginal, borderline, questionable, and perhaps a reasonable person could ask whether there was an effect at all’ and rejected the device (Food and Drug Administration, 2007b). It is concerning that the FDA has cleared this device, particularly if patients are diverted from effective therapies such as antidepressant medications.