INTRODUCTION

Achieving complete remission should be considered the standard of care in antidepressant treatment (Thase, 1999). Succinctly, the goals of antidepressant therapy should be to: (1) reduce and ultimately eliminate all signs and symptoms of depression, (2) restore functioning to the asymptomatic state, and (3) achieve and maintain remission (Thase, 2001). These goals are of the utmost clinical importance for individual patients, but difficult to measure in clinical trials. One crucial question is how to define remission so that a shorter-term definition is relevant to longer-term outcome.

In clinical antidepressant trials, remission is defined using rating scales, such as the Hamilton Rating Scale for Depression (HAMD; Hamilton, 1960) or the Montgomery–Asberg Depression Rating Scale (MADRS; Montgomery and Asberg, 1979). A 50% reduction in score severity is often used as an indicator of response (Frank et al, 1991), but many subjects with a 50% improvement remain highly symptomatic. This standard is not an acceptable characterization of remission as residual or ‘subthreshold’ symptoms are associated with ongoing functional impairment and may increase the risk of developing further depressive episodes (Horwath et al, 1992; Paykel et al, 1995; Judd et al, 1997; Maier et al, 1997; Van Londen et al, 1998; Steffens et al, 2003).

Cutoff scores have also been proposed to define remission. HAMD scores between 7 and 11 are often used as definitions of remission (Nierenberg and Wright, 1999), with a HAMD score of 7 or less viewed as a stringent criterion for complete remission (Ballinger, 1999). This strict cutoff is considered a way to separate patients with true antidepressant response from those exhibiting nonspecific effects or spontaneous transient remission (Ballinger, 1999). It may also predict relapse, as subjects who achieve a HAMD below 8 have lower rates of relapse than do subjects with residual symptoms (Paykel et al, 1995). To establish a similar cutoff for the MADRS, a large-scale study examined a range of MADRS scores in depressed individuals who had achieved a Clinical Global Impression Scale for Severity (CGI-S; Guy, 1976) score of 1 (not ill) (Hawley et al, 2002). The authors concluded that a CGI-S score of 1 best correlated with MADRS scores below 10, thus providing an optimal definition of remission. Although they define a cutoff, the authors do not clearly demonstrate how a shorter-term definition of remission may relate to longer-term risk of relapse.

Using the MADRS as a measure of depression severity, we examined how a range of MADRS scores representing remission at 3 and 6 months predicted relapse over 18 months in a cohort of elderly subjects with depression. Our primary question was, is there a cutoff score on the MADRS at 3 or 6 months above which one could reliably predict relapse? The corollary to this question is that patients achieving scores less than such a cutoff would have less risk of relapse, and so be more able to maintain remission.

To address this question, we examined MADRS scores as one might examine a laboratory result. Taking this approach, one can calculate the sensitivity and specificity of a particular score for predicting relapse. This further allows us to plot sensitivity against the false-positive rate (1−specificity) on a receiver operating characteristic (ROC) curve to determine whether a particular point provides optimal sensitivity and specificity.

METHODS

Sample

Subjects were depressed individuals enrolled in the NIMH-sponsored Mental Health Clinical Research Center (MHCRC) at Duke University Medical Center, a study designed to examine longitudinal outcomes of late-life depression. At baseline enrollment, all subjects met DSM-IV criteria for a current episode of major depression and were at least 60 years of age. Exclusion criteria for the study included: another major psychiatric illness; alcohol or drug abuse or dependence; clinically diagnosed primary neurologic illness, including dementia; and medications, illnesses, or physical disability affecting cognitive function.

The Institutional Review Board approved this study. After subjects received a complete description of the study, written informed consent was obtained.

Study Design

All subjects participating in the larger study were considered for this analysis. As we wished to examine long-term outcomes of subjects with short-term remission, all subjects in the current study had to exhibit a MADRS ≤15 at either the 3- or 6-month assessment, and had to be followed for a total of 18 months or until relapse, defined as a subsequent MADRS >15. Subjects who exhibited a MADRS >15 at either the 3- or 6-month assessment were excluded from this analysis.
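The inclusion and relapse rules above can be sketched in code. This is a minimal illustration, not the study's actual procedure: the function name `classify`, the month-to-score dictionary layout, and the handling of missing assessments are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

REMISSION_MAX = 15  # MADRS <= 15 counts as remitted; MADRS > 15 counts as relapse


def classify(scores: Dict[int, Optional[int]],
             index_month: int,
             end_month: int = 18) -> Tuple[bool, Optional[bool]]:
    """Return (included, relapsed) for one subject (hypothetical helper).

    scores maps assessment month -> MADRS total, or None if missing.
    index_month is the remission assessment being analysed (3 or 6).
    """
    index_score = scores.get(index_month)
    # Not remitted, or not assessed, at the index visit: not in this cohort.
    if index_score is None or index_score > REMISSION_MAX:
        return False, None
    # Walk forward through later visits; the first MADRS > 15 is a relapse.
    for month in sorted(scores):
        if index_month < month <= end_month:
            later_score = scores[month]
            if later_score is not None and later_score > REMISSION_MAX:
                return True, True
    # No relapse observed: require follow-up through the final visit
    # (assumption: a recorded end_month score is what "followed" means).
    if scores.get(end_month) is None:
        return False, None
    return True, False
```

For example, `classify({3: 10, 6: 8, 9: 20}, index_month=3)` treats the subject as included and relapsed at month 9, whereas a subject with no 18-month score and no earlier relapse is dropped.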

Baseline Cognitive Screen

Subjects were excluded if they had dementia or suspected dementia at baseline. Geriatric psychiatrists examined all subjects, reviewed medical records, and conferred with referring physicians on all subjects. Subjects were excluded if they had a Mini Mental State Examination (Folstein et al, 1975) score below 25 after an acute (8-week) treatment phase.

Assessment Procedures

At baseline, a trained interviewer administered the Duke Depression Evaluation Schedule (DDES; Landerman et al, 1989), which assesses depression using the NIMH Diagnostic Interview Schedule (Robins et al, 1981). Depression severity was measured using the MADRS at baseline and every 3 months during the study. There was no minimum baseline MADRS required for study entry.

The DDES also includes measures of functional disability, level of perceived social support, and medical comorbidity. Functional disability was measured in terms of deficits in instrumental activities of daily living (IADLs), using nine items modified from previous studies (Rosow and Breslau, 1966): getting around the neighborhood, shopping for necessities, preparing meals, cleaning house, doing yard work, managing finances, walking one fourth of a mile, walking up and down a flight of stairs, and caring for children. Subjects reply using a standardized three-option answer format (‘yes’, ‘with difficulty’, or ‘no’). Higher scores were indicative of greater impairment, and composite measures were constructed (Steffens et al, 1999). Social support measures included the Subjective Social Support Scale and the Instrumental Social Support Scale from the Duke Social Support Index (George et al, 1989; Landerman et al, 1989). Medical comorbidity was measured using the clinician-administered Cumulative Illness Rating Scale (CIRS) (Linn et al, 1968), modified for geriatric populations (Miller et al, 1992).

Antidepressant Treatment

The MHCRC provides antidepressant therapy based on a rigorous, guideline-based algorithm, the Duke Somatic Treatment Algorithm for Geriatric Depression (STAGED) Approach (Steffens et al, 2002). This approach provides guidelines for antidepressant therapy based on past medication history and current depression severity. All approved antidepressant medications were available for use. For individuals who have failed multiple drug trials, there are possibilities of combination medication trials and electroconvulsive therapy. Subjects were not routinely referred to psychotherapy, although some were already engaged in ongoing psychotherapy at study entry while others were referred for individual and/or group psychotherapy, usually cognitive-behavioral psychotherapy.

Treatment was monitored by MHCRC investigators to ensure that the clinical protocol was being followed. Subjects were evaluated every 3 months, or more frequently as clinically indicated. Each subject was thus offered, to the best of our ability, the most appropriate care.

Analytic Strategy

Summary statistics were derived for demographic and clinical variables. Means and standard deviations were reported for continuous variables and percentages for dichotomous variables. Two-tailed t-tests were used to test for differences in continuous variables and χ2 tests for categorical variables between subjects who did and did not subsequently relapse.

We examined MADRS scores ranging from 0 to 15 at the 3- and 6-month assessment periods. We then examined their ability to predict relapse (defined as MADRS >15) within 18 months from baseline. Sensitivity and specificity were calculated for each MADRS score. Subjects with a MADRS at or above the cutoff at 3 or 6 months would be expected to relapse by 18 months, while those with a MADRS lower than the cutoff at 3 or 6 months would be expected not to relapse. After these calculations, a ROC curve was constructed and the area under the curve (AUC) was calculated.
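The cutoff sweep described above can be illustrated with a short sketch. It assumes a simple trapezoidal rule for the AUC; the software and exact AUC method used in the study are not stated, and the function names here are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def sens_spec(scores: Sequence[int], relapsed: Sequence[bool],
              cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when 'MADRS >= cutoff' predicts relapse.

    Assumes at least one relapser and one non-relapser are present.
    """
    pairs = list(zip(scores, relapsed))
    tp = sum(1 for s, r in pairs if r and s >= cutoff)          # relapsed, flagged
    fn = sum(1 for s, r in pairs if r and s < cutoff)           # relapsed, missed
    tn = sum(1 for s, r in pairs if not r and s < cutoff)       # stable, not flagged
    fp = sum(1 for s, r in pairs if not r and s >= cutoff)      # stable, flagged
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores: Sequence[int], relapsed: Sequence[bool],
            cutoffs: Iterable[int] = range(0, 17)) -> float:
    """Trapezoidal area under the ROC curve built from integer cutoffs.

    Cutoff 0 gives the point (1, 1); cutoff 16 gives (0, 0) because
    every remitted subject has MADRS <= 15.
    """
    points: List[Tuple[float, float]] = []
    for c in cutoffs:
        sens, spec = sens_spec(scores, relapsed, c)
        points.append((1.0 - spec, sens))  # x = false-positive rate, y = sensitivity
    points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Calling `roc_auc(scores, relapsed)` on one cohort's index-visit MADRS scores and relapse flags yields a single summary value; an AUC near 0.5 indicates that no cutoff separates the groups well.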

Sensitivity is the ability to detect those who will relapse, and relates to the number of subjects at or above a given cutoff who relapse. In this case, the greater the number of subjects with MADRS scores at or above the cutoff who ultimately relapse, the higher the sensitivity. Specificity is the ability to detect those who will not relapse, and relates to the number of subjects below a given cutoff who do not relapse. In this case, the greater the number of subjects with MADRS scores below the cutoff who do not later relapse, the higher the specificity.
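Stated as formulas for a candidate cutoff c, the two quantities can be written as follows. The notation (c for the cutoff, # for a count of subjects) is introduced here for illustration and does not appear in the original analysis.

```latex
\[
\text{sensitivity}(c) =
  \frac{\#\{\text{relapsed and MADRS} \ge c\}}{\#\{\text{relapsed}\}},
\qquad
\text{specificity}(c) =
  \frac{\#\{\text{not relapsed and MADRS} < c\}}{\#\{\text{not relapsed}\}}
\]
```

Each point on the ROC curve is then the pair (1 − specificity(c), sensitivity(c)) for one value of c.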

Other factors, such as functional disability, medical comorbidity, and social support may also affect depression, thus limiting the univariate relationship between 3- or 6-month depression severity and later risk of relapse. To consider these potential confounders, we performed a post hoc analysis wherein we reconstructed the 3- and 6-month ROC curves for relapse while controlling for these factors. We calculated the AUC for these new curves.

RESULTS

Sample Characteristics: Cohort with Remission at 3 Months

Out of a cohort of 255 potential subjects, 154 subjects (60.4%) exhibited a MADRS ≤15 at 3 months and so were included in the 3-month analyses. This group had a mean age of 69.1 years (SD=6.96) and 65.5% were female. They had a mean CIRS score of 4.0 (SD=3.1), a mean IADL deficit score of 3.9 (SD=4.9), and a mean social support scale score of 23.6 (SD=3.6). The mean baseline MADRS score was 27.32 (SD=7.87), with a 3-month mean MADRS of 6.95 (SD=4.71).

Over the next 15 months (18 months in the study in total), 101 subjects maintained remission while 53 relapsed. The mean time to relapse was 195 days (SD=117.9). There were no significant differences in age (t-test, 152 df, t=−0.80, p=0.4243), sex (χ2=0.6397, 1 df, p=0.4238), CIRS score (t-test, 152 df, t=−1.11, p=0.2706), or IADL impairment (t-test, 152 df, t=−1.67, p=0.0963) between the groups who did and did not relapse over this time. The group who relapsed exhibited a significantly lower score on the social support scale (mean=22.7 compared with 24.1; t-test, 152 df, t=2.25, p=0.0258) and a significantly higher 3-month mean MADRS score (8.3, SD=4.2 compared with 6.2, SD=4.8; t-test, 152 df, t=−2.66, p=0.0087).
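As a rough check on the 3-month MADRS comparison, the t statistic can be recomputed from the summary statistics alone. The sketch below assumes an equal-variance (pooled) two-sample t-test, which is consistent with the reported 152 df, and assumes the sign convention is non-relapsed minus relapsed; neither is stated explicitly in the text. The helper name `pooled_t` is hypothetical.

```python
import math
from typing import Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Equal-variance two-sample t statistic and df from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


# Group 1: maintained remission (n=101); group 2: relapsed (n=53).
t_stat, dof = pooled_t(6.2, 4.8, 101, 8.3, 4.2, 53)
print(f"t = {t_stat:.2f} on {dof} df")
```

The result is close to, but not identical with, the reported t=−2.66; a small gap is expected because the published means and SDs are rounded to one decimal place.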

Sample Characteristics: Cohort with Remission at 6 Months

Out of a cohort of 240 potential subjects, 118 subjects (49.2%) exhibited a MADRS ≤15 at 6 months and so were included in the 6-month analyses. In total, 26 subjects included in the 3-month evaluation had relapsed by this assessment, so were not included in this analysis. The 118 remitted subjects had a mean age of 68.6 years (SD=6.69) and 62.7% were female. They had a mean CIRS score of 3.8 (SD=3.0), a mean IADL deficit score of 3.5 (SD=4.9), and a mean social support scale score of 23.6 (SD=3.5). The mean baseline MADRS was 27.99 (SD=7.94), with a 6-month mean MADRS of 6.54 (SD=4.63).

Over the next 12 months (18 months in the study in total), 91 subjects maintained remission, while 27 relapsed. The mean time to relapse was 178 days (SD=98.1). There were no significant differences in age (t-test, 116 df, t=0.34, p=0.7329), sex (χ2=0.2342, 1 df, p=0.6284), CIRS score (t-test, 116 df, t=−0.70, p=0.4852), or IADL impairment (t-test, 116 df, t=−0.64, p=0.5242) between the groups who did and did not relapse by 18 months. The group who relapsed exhibited a significantly lower score on the social support scale (mean=22.1 compared with 24.0; t-test, 116 df, t=2.53, p=0.0128) and a significantly higher 6-month mean MADRS score (8.1, SD=4.9 compared with 5.4, SD=4.3; t-test, 116 df, t=−2.81, p=0.0059).

Data Missing from the Samples

To be included in these evaluations, not only did subjects have to have a MADRS <16 at either the 3- or 6-month evaluation, but they also had to have longitudinal assessments (1) either until 18 months from baseline, or (2) until relapse, up to 18 months from baseline. In total, 10 of the 101 subjects who had a MADRS <16 at 3 months and did not relapse by 18 months did not have MADRS data at the 6-month evaluation, so were not included in the 6-month evaluation. Five subjects who had not remitted at the 3-month assessment did not have 6-month data.

ROC Curve Analysis of MADRS Data

There was no specific cutoff on either the 3- or 6-month ROC curve above which one could predict relapse with reasonable sensitivity and specificity (Table 1 and Figure 1), although the 6-month ROC exhibited more of a flattening plateau effect than did the 3-month curve. The 3-month ROC had an AUC of 0.63; the 6-month ROC had an AUC of 0.66. There was an increase in sensitivity and decrease in specificity as the MADRS cutoff dropped. For a low MADRS cutoff such as 2, this means that most subjects who later relapse will have a MADRS at or above the cutoff (high sensitivity of 96.2%); unfortunately, most subjects who do not later relapse will also be at or above the cutoff (low specificity of 14.9%). For a high MADRS cutoff of 13, many subjects who later relapse will have a MADRS score below the cutoff (low sensitivity of 18.9%), while many subjects who do not later relapse also have a MADRS below this cutoff (high specificity of 88.1%).
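The percentages quoted for cutoffs 2 and 13 can be reproduced from whole-subject counts. The counts below are not reported in the text; they are back-calculated on the assumption that these figures refer to the 3-month cohort (53 relapsed, 101 did not), and should be read as inferred rather than published values.

```python
N_RELAPSED = 53        # 3-month cohort: subjects who later relapsed
N_NOT_RELAPSED = 101   # 3-month cohort: subjects who maintained remission

# Inferred counts (assumption): the integers that reproduce the quoted percentages.
inferred = {
    2: {"relapsed_at_or_above": 51, "not_relapsed_below": 15},
    13: {"relapsed_at_or_above": 10, "not_relapsed_below": 89},
}


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as quoted in the text."""
    return round(100.0 * count / total, 1)


for cutoff, counts in inferred.items():
    sensitivity = percent(counts["relapsed_at_or_above"], N_RELAPSED)
    specificity = percent(counts["not_relapsed_below"], N_NOT_RELAPSED)
    print(f"cutoff {cutoff}: sensitivity {sensitivity}%, specificity {specificity}%")
```

Read this way, a cutoff of 2 misses only 2 of 53 relapsers but wrongly flags 86 of the 101 subjects who stayed well, while a cutoff of 13 flags only 12 of those 101 but misses 43 of the 53 relapsers.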

Table 1 Sensitivity and Specificity of Defining Relapse by 3-Month MADRS

Figure 1. ROC curve of long-term relapse with respect to early remission. (a) Relapse with respect to 3-month MADRS. AUC value=0.63. (b) Relapse with respect to 6-month MADRS. AUC value=0.66.

As a post hoc analysis, we recalculated 3- and 6-month ROC curves for relapse while controlling for CIRS score, IADL score, and perceived social support score. These adjustments had minimal effect on either curve; the adjusted 3-month ROC exhibited an AUC of 0.66, the adjusted 6-month ROC exhibited an AUC of 0.67.

DISCUSSION

The primary finding in this large study of depressed elders is that there is no cutoff score on the MADRS that can provide both a sensitive and specific measure of later depression relapse. This finding is robust even when controlling for other factors that may influence relapse, such as functional disability, medical comorbidity, and perceived social support. Concordant with other reports (Paykel et al, 1995), we found that the greater the level of residual depressive symptoms early in treatment, the greater the longer-term risk of relapse.

In this report we use 3- and 6-month MADRS scores as laboratory tests, examining their sensitivity and specificity in predicting outcomes over 18 months. As in any test, one hopes to find a cutoff score above which one could detect disease (in this case, later relapse) with good sensitivity and specificity. We found no such point: at both 3 and 6 months, there was no MADRS score at which both sensitivity and specificity were above 70%. The AUC values were also low (0.63 and 0.66). In general, the closer an AUC value is to 1.0, the better the ‘test’, and a value of 0.5 corresponds to chance; AUC values in the range reported here show that 3- or 6-month MADRS scores predict future relapse only poorly.

A variety of scores for depression scales have been proposed as a cutoff for remission, but these do not appear to be related to the longer-term risk of relapse. This suggests that there is no particular cutoff that is sufficient to consider as ‘low enough’ to protect against future relapse, so the primary conclusion would be to strive for the lowest score possible. Although rating scale remission cutoff scores may be a useful benchmark for clinical research, they are of less clinical significance and do not replace the goal of total symptom remission.

We did not find a strong effect of other factors considered to be associated with risk of relapse, as demonstrated by the minimal change seen in the AUC values for the adjusted ROC curves. Although greater disability and lower perceived social support are generally associated with depression outcomes in elderly populations (Bosworth et al, 2002), these effects may be smaller than the effect of residual depressive symptoms. Even so, the group who relapsed did exhibit a significantly lower score on the measure of perceived social support than did the group who maintained remission. Medical comorbidity has long been considered a risk factor for poor outcomes, although recent research has shown that it may have less of an effect on outcomes than previously thought (Miller et al, 2002; Krishnan, 2003).

This study prompts us to reconsider how we define remission and how we use definitions of remission in clinical trials. Does a defined rating scale score provide any predictive value of how that individual will do over the next year? Although this study examined the MADRS and not the HAMD scale, which is more commonly used in clinical trials, we would hypothesize on the basis of our results that our findings would hold true for other measures. Clearly, a comparable effort needs to be made to examine the HAMD scale. Better definitions of remission are also necessary, particularly definitions that provide more empirically validated ability to predict maintenance of remission. This may require the development of measures more sensitive and specific to long-term outcomes.

The relationship we describe between short-term remission and long-term outcomes is more applicable to clinical research than to clinical practice. Clinical trials draw conclusions based on definitions of response and remission after a treatment period typically lasting 6–12 weeks; our analysis of 3-month MADRS provides an appropriate comparison. In contrast, a medication trial in a clinical patient may vary in length depending on multiple factors, and may not adhere to a rigid 3- or 6-month timetable. Still, our study would support the view that clinically treated subjects with greater residual depressive symptoms are at higher risk of long-term relapse.

This study should be replicated not only using other scales beyond the MADRS but also in different populations. Long-term fluctuations in depression severity in the elderly population are not well studied, although, as in other populations, residual symptoms increase risk of relapse (Steffens et al, 2003). Younger depressed individuals, who may have fewer confounding problems such as disability and medical comorbidity, may exhibit a different pattern than what we found in this study. Further, future studies may examine specific symptom domains that may be important for sustained remission or for risk of relapse.

This study does have limitations. Antidepressant treatment was guided by the STAGED algorithm (Steffens et al, 2002), which allows for all available antidepressant treatments. Although this more accurately mirrors ‘real world’ treatments, it does not allow for direct comparisons with more rigid randomized clinical trials. However, it does allow for flexibility in the treatment regimen and changes in the treatment regimen should be expected for subjects who relapse or are otherwise doing poorly. Additionally, although we consider other factors that may contribute to relapse in depression, such as disability, perceived social support, and medical comorbidity, this is not necessarily an exhaustive list of potential contributors to relapse. Other factors such as specific or global cognitive deficits (Alexopoulos et al, 2000) may also contribute to a differential risk of relapse.

In conclusion, we did not find a cutoff MADRS score at 3 or 6 months that predicted maintenance of remission at 18 months. We did find that lower scores at each time point were associated with greater likelihood of maintaining remission. Thus, the rating scale cutoffs commonly used in clinical research have little utility in the long-term management of older patients with depression, where complete symptom remission should be the goal.