INTRODUCTION

The Brief Psychiatric Rating Scale (BPRS; Overall and Gorham, 1962) and the Clinical Global Impressions Scale (CGI; Guy, 1976) are both instruments frequently used to evaluate the psychopathological state of patients with schizophrenia. While the BPRS is a methodologically sound assessment device (for a review, see Hedlund and Vieweg, 1980), an important advantage of the CGI is that its results can be understood much more intuitively (Nierenberg and DeCecco, 2002). Furthermore, it takes only 1–2 min to complete the CGI. Time can be an important factor, especially in large, industry-independent, clinical trials. However, to the best of our knowledge, no investigations of the relative sensitivity of the CGI and the BPRS in detecting differences between drugs for patients with schizophrenia have yet been published. To fill this gap, we reanalyzed original patient data from four pivotal trials comparing amisulpride with haloperidol in the treatment of schizophrenia. In meta-analyses, amisulpride had shown a moderate, but statistically significant superiority in overall efficacy compared to conventional antipsychotics (Davis et al, 2003; Leucht et al, 2002). The amisulpride data set therefore appeared to provide an ideal basis for our investigation.

PATIENTS AND METHODS

The Database

For a post hoc analysis, we requested and obtained the original patient data of four published randomized controlled trials that compared amisulpride with haloperidol (Colonna et al, 2000; Carrière et al, 2000; Möller et al, 1997; Puech et al, 1998). These four studies represent the manufacturer's complete data set of trials comparing amisulpride with haloperidol, with the exception of one trial (Speller et al, 1997) which did not use the BPRS and could therefore not be included. A number of further old and small amisulpride vs haloperidol comparisons have been published, but original patient data are no longer available because ownership of amisulpride has changed several times (Klein et al, 1985; Pichot and Boyer, 1988; Costa e Silva, 1989; Delcker et al, 1990; Ziegler, 1989). Descriptions of these studies can be found in Leucht et al (2002). Wetzel et al (1998) was excluded a priori because it compared amisulpride with flupenthixol, and mainly because there was a large, statistically significant difference of the BPRS total score between groups at baseline (approximately 7 points), whereas there was no baseline difference between drugs using the CGI. We felt that this discrepancy could bias our analysis of the relative sensitivity of the two scales. Finally, two further available studies were not included, because one of them used the PANSS and not the BPRS, and because both of them compared amisulpride with another atypical antipsychotic (risperidone) so that pooling with the haloperidol studies in the meta-analysis would not have made sense (Peuskens et al, 1999; Sèchter et al, 2002). Important characteristics of the studies included are presented in Table 1. All studies were randomized and all but one (Colonna et al, 2000) were double-blind.
All trials examined patients with schizophrenia or schizophreniform disorder according to DSM-III-R or DSM-IV (American Psychiatric Association, 1987, 1994), and with one exception (Carrière et al, 2000) all required various minimum scores as an inclusion criterion to assure that the patients had positive symptoms. One potentially ineffective 100 mg/day amisulpride dose group (n=61) from the study by Puech et al (1998) was excluded a priori. The mean BPRS total score at baseline of all included patients was 59.9±12.8 and the mean CGI-Severity score was 5.3±0.8 (n=1138, 737 men, 401 women; 754 received amisulpride, 383 received haloperidol; mean age 35.6±10.9 years, weight 70.0±14.3 kg, height 170±9 cm).

Table 1 Characteristics of the Included Studies

Outcome Parameters

All studies used the original BPRS (18-item version, 1–7 rating system, Overall and Gorham, 1962); the items were not derived from the Positive and Negative Syndrome Scale (PANSS, Kay et al, 1987). The single items were rated on a seven-point scale (1=not present, 2=very mild, 3=mild, 4=moderate, 5=moderately severe, 6=severe, and 7=extremely severe). Thus, possible BPRS total scores range from 18 to 126. The CGI-Severity (CGI-S) and the CGI-Global Improvement (CGI-I) scales (Guy, 1976) were also available for all studies. The CGI-S assesses the clinician's impression of the current state of the patient's illness. The rater is asked to ‘consider his total clinical experience with the given population’. As for the BPRS, the time span considered is the week prior to the rating; and the following scores can be given: 1=normal, not at all ill, 2=borderline mentally ill, 3=mildly ill, 4=moderately ill, 5=markedly ill, 6=severely ill, and 7=among the most extremely ill patients. The CGI-I assesses the patient's improvement or worsening since the beginning of the study using the following scores: 1=very much improved, 2=much improved, 3=minimally improved, 4=no change, 5=minimally worse, 6=much worse, 7=very much worse. A third item of the CGI that attempts to relate therapeutic effects and side effects, the efficacy index, was not used for the analysis.

A number of continuous and dichotomous outcomes were used to compare the sensitivity of the CGI and the BPRS in detecting differences in efficacy between amisulpride and haloperidol. The continuous outcomes were the mean BPRS total score at end point, the mean change of the BPRS total score from baseline, the mean CGI-severity score, and the mean CGI-improvement score. In analyzing dichotomous outcomes, we considered the following frequently used cutoffs to define response: a reduction of at least 20% or at least 50% from the BPRS total score at baseline, and a CGI-improvement score of at least minimally improved or at least much improved. Other cutoffs for the BPRS, for example, at least 30 or 40% reduction, can also be found in the literature, but the 20 and 50% cutoffs are probably the most typical ones. Furthermore, we analyzed the CGI-severity score only as a continuous outcome. Of course, it can also be analyzed as a dichotomous measure of response, but we felt that in antipsychotic drug trials it is most commonly used as a continuous measure.
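The response definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the percentage reduction is assumed to be computed from the raw BPRS total score (without subtracting the minimum possible score of 18), which the text does not state explicitly.

```python
def bprs_percent_reduction(baseline: float, endpoint: float) -> float:
    """Percentage reduction of the BPRS total score relative to baseline.

    Assumption: the raw total score is used (no subtraction of the
    scale minimum of 18 before computing the percentage).
    """
    return 100.0 * (baseline - endpoint) / baseline


def is_bprs_responder(baseline: float, endpoint: float, cutoff_percent: float) -> bool:
    """Response = reduction of at least `cutoff_percent` (e.g. 20 or 50)."""
    return bprs_percent_reduction(baseline, endpoint) >= cutoff_percent


def is_cgi_i_responder(cgi_i: int, worst_accepted_score: int) -> bool:
    """CGI-I: 1=very much improved, 2=much improved, 3=minimally improved.

    Lower scores are better, so 'at least minimally improved' means
    cgi_i <= 3 and 'at least much improved' means cgi_i <= 2.
    """
    return cgi_i <= worst_accepted_score


# Made-up patient: BPRS 60 at baseline, 45 at end point, CGI-I of 3.
print(is_bprs_responder(60, 45, 20))   # 25% reduction meets the 20% cutoff
print(is_bprs_responder(60, 45, 50))   # but not the 50% cutoff
print(is_cgi_i_responder(3, 3))        # at least minimally improved
print(is_cgi_i_responder(3, 2))        # not at least much improved
```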

The primary analysis was the difference between amisulpride and haloperidol at study end points in the pooled data set. In a last-observation-carried-forward approach (LOCF), patients' last observation was used in the case of dropouts even if there was no postbaseline evaluation and irrespective of the duration of the study. This was certainly a very conservative LOCF condition, a strict ‘once randomized–analyzed’ strategy that has for example been applied in reviews of the Cochrane Schizophrenia Group (Adams et al, 2005), whereas in many recent randomized antipsychotic drug trials at least one postbaseline rating was required. Since only 20 patients (1.8%) did not have at least one postbaseline rating, it is unlikely that the latter strategy would have changed the results to any significant degree. Furthermore, an observed case analysis was also made at week 1, week 2, and week 4. Observed case analyses were not made at other time points, because not all studies reported evaluations at other weeks. We also analyzed the single studies separately. Only patients for whom data for all three scales (BPRS, CGI-I, CGI-S) were available were used, which explains the slight discrepancies between the figures in Table 1 and those shown in the analyses.
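The two LOCF variants contrasted in this paragraph can be illustrated with a minimal Python sketch. The function name and the data layout (a list of ratings with the baseline first and `None` for missing visits) are assumptions made for the example, not the procedure actually implemented for the analysis.

```python
from typing import List, Optional


def locf_endpoint(ratings: List[Optional[float]],
                  require_postbaseline: bool = False) -> Optional[float]:
    """Return the end point value under last observation carried forward.

    ratings[0] is the baseline rating; later entries are postbaseline
    visits, with None marking a missing rating.

    require_postbaseline=False: strict 'once randomized, analyzed' rule,
        the baseline itself is carried forward if nothing else exists.
    require_postbaseline=True: the patient is left out (None) unless at
        least one postbaseline rating is available.
    """
    start = 1 if require_postbaseline else 0
    for value in reversed(ratings[start:]):
        if value is not None:
            return value
    return None


# Made-up dropouts: one after week 1, one directly after baseline.
print(locf_endpoint([60, 52, None, None]))
print(locf_endpoint([60, None, None]))
print(locf_endpoint([60, None, None], require_postbaseline=True))
```

Under the strict rule the second patient contributes the baseline score as the end point; under the more common rule that patient is not analyzed.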

Statistical Analysis

Effect sizes as statistical measures of the magnitude of the difference between amisulpride and haloperidol were used to compare the relative sensitivity of the CGI and the BPRS in detecting differences between the two drugs. For the continuous variables, effect sizes were expressed as standardized mean differences (SMDs). SMDs are measures of the magnitude of the difference between the effects of two drugs expressed as standard deviations and thus allow the results obtained with different scales (here the CGI and the BPRS) to be compared. Various formulas for the calculation of SMDs are available. In the primary analysis, we used Hedges' g and its standard error se(g) according to the formulas:

where n1 and n2 are the number of patients in the amisulpride and haloperidol groups, respectively, F is the F-value of the treatment contrast between amisulpride and haloperidol, and dfe the number of degrees of freedom of its error term (Rosenthal, 1991, pp 22–23 and 65). Both were taken from an ANCOVA using treatment as a factor, and sex, gender, baseline BPRS and baseline CGI-severity score as covariates. In the pooled database, ‘study’ was used as a further covariate. All analyses were also corroborated using Cohen's d and Rosenthal's r (see for example Rosenthal (1991) for the exact formulas), which yielded virtually identical results. Data are presented on the journal's website (see Supplementary information).
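As an illustration of how an SMD can be obtained from the quantities named above, the following Python sketch uses the standard conversion of a two-group test statistic into Hedges' g (with t taken as the square root of the one-degree-of-freedom F) and one common large-sample approximation of its standard error. Both the function names and the exact form of the standard error (with dfe standing in for the sample-size term) are assumptions and may differ from the formulas used in the analysis.

```python
import math


def hedges_g_from_f(f_value: float, n1: int, n2: int, sign: int = 1) -> float:
    """Hedges' g from the F-value of a two-group treatment contrast.

    For a contrast with one numerator degree of freedom, t = sqrt(F) and
    g = t * sqrt((n1 + n2) / (n1 * n2)). F carries no direction, so the
    sign must be supplied from the direction of the mean difference.
    """
    t_value = math.sqrt(f_value)
    return sign * t_value * math.sqrt((n1 + n2) / (n1 * n2))


def se_hedges_g(g: float, n1: int, n2: int, df_error: int) -> float:
    """Approximate standard error of g (assumed form, see lead-in)."""
    return math.sqrt((n1 + n2) / (n1 * n2) + g ** 2 / (2 * df_error))


# Made-up numbers, not results of the trials analyzed here.
g = hedges_g_from_f(f_value=4.0, n1=50, n2=50)
se = se_hedges_g(g, n1=50, n2=50, df_error=98)
print(round(g, 3), round(se, 3))
```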

For the dichotomous parameters, effect sizes were expressed as odds ratios and the results were corroborated using Rosenthal's r, yielding again virtually identical results (see Supplementary information). The odds ratios and their standard errors were derived from a logistic regression using the same covariates as those used for the continuous outcomes. The significance of the individual effect sizes was calculated as z=effect size/se(effect size). Furthermore, as a rule, if the 95% confidence intervals of the effect sizes in Figures 1 and 2 do not cross the y-axis, there is a statistically significant difference between groups (p at least <0.05).
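The significance test and the confidence interval rule can be sketched as follows. The sketch assumes that, for odds ratios, the z-test and the interval are computed on the log-odds scale (the coefficient and standard error that a logistic regression reports) and then exponentiated; the text does not spell this out, and the function names and numbers are purely illustrative.

```python
import math

Z_CRIT_95 = 1.959964  # two-sided 95% critical value of the standard normal


def z_statistic(effect: float, se: float) -> float:
    """z = effect size / se(effect size)."""
    return effect / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def odds_ratio_with_ci(log_or: float, se_log_or: float):
    """Odds ratio and 95% CI, computed on the log scale and exponentiated."""
    lower = math.exp(log_or - Z_CRIT_95 * se_log_or)
    upper = math.exp(log_or + Z_CRIT_95 * se_log_or)
    return math.exp(log_or), lower, upper


# Made-up logistic regression output: coefficient ln(2.0), se 0.25.
log_or, se = math.log(2.0), 0.25
odds_ratio, lower, upper = odds_ratio_with_ci(log_or, se)
p_value = two_sided_p(z_statistic(log_or, se))
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2), round(p_value, 4))
```

In this made-up case the interval excludes an odds ratio of 1 (no difference between groups), which corresponds to p<0.05.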

Figure 1

Effect sizes (95% CI) of the difference between amisulpride and haloperidol—continuous parameters. SMD=standardized mean difference, n=number of patients, BPRS=Brief Psychiatric Rating Scale, CGI=Clinical Global Impressions Scale, CI=confidence interval, LOCF=last observation carried forward. The lines around the effect sizes indicate their 95% confidence intervals. If these lines do not cross the y-axis, there is a statistically significant difference between amisulpride and haloperidol (p at least <0.05). Exact numbers are presented on the journal's website (see Supplementary information).

Figure 2

Effect sizes (95% CI) of the difference between amisulpride and haloperidol—dichotomous parameters. OR=odds ratio, n=number of patients, BPRS=Brief Psychiatric Rating Scale, CGI=Clinical Global Impressions Scale, CI=confidence interval, LOCF=last observation carried forward. The lines around the odds ratios indicate their 95% confidence intervals. If these lines do not cross the y-axis, there is a statistically significant difference between amisulpride and haloperidol (p at least <0.05). Exact numbers are presented on the journal's website (see Supplementary information).

Finally, in order to compare the pooled effect sizes obtained using the BPRS and the two CGI scales, the scale values were standardized individually for each time point and subjected to a MANCOVA with treatment and studies as group factors, sex, gender, baseline BPRS and baseline CGI-severity score as covariates, and the three scales BPRS, CGI-S, and CGI-I as a measurement replication factor. This procedure was possible only for the SMDs derived from the continuous outcomes, since we are not aware of a methodologically sound statistical test for comparing odds ratios. It was therefore possible to compare the odds ratios only qualitatively, that is, by examining whether the 95% confidence intervals of the different scales overlapped. P-values below 0.05 (two-tailed) were considered to show statistical significance. All analyses were made with SPSS for Windows version 11.0 and Microsoft Excel 2000.
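The standardization step can be illustrated with a short Python sketch. It assumes a simple z-transformation (subtracting the mean and dividing by the sample standard deviation) applied separately to each scale at each time point, which puts the BPRS (range 18 to 126) and the CGI (range 1 to 7) on a common metric; the exact procedure used in SPSS is not described in the text, and the data are made up.

```python
from statistics import mean, stdev
from typing import List


def standardize(values: List[float]) -> List[float]:
    """z-transform the scores of one scale at one time point."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(value - center) / spread for value in values]


# Made-up scores of the same five patients at one time point.
bprs_week_2 = [40.0, 50.0, 60.0, 70.0, 80.0]
cgi_s_week_2 = [3.0, 4.0, 5.0, 6.0, 7.0]

print([round(z, 2) for z in standardize(bprs_week_2)])
print([round(z, 2) for z in standardize(cgi_s_week_2)])
```

After standardization both scales have a mean of 0 and a standard deviation of 1, so differences between treatment groups can be compared across scales.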

RESULTS

Continuous Outcomes

Figure 1 illustrates the results of the continuous outcomes obtained in the single studies and after all studies were pooled. It should be noted that the effect sizes obtained by the mean change of the BPRS from baseline and by the total BPRS score were identical in all cases, because the BPRS total score at baseline was used as a covariate in the ANCOVA. Therefore, the results of both methods of analyzing the BPRS are shown in one column of Figure 1.

The graphical display in terms of SMDs and their 95% confidence intervals shows that the effect sizes obtained from the BPRS and from the CGI were similar. Effect sizes derived from both scales (single studies and pooled results) showed the same direction of the effect in all instances. Although in all analyses the effect size of at least one of the CGI measures was numerically minimally greater than that of the BPRS, the respective 95% confidence intervals usually overlapped broadly; and numerical superiority of course does not mean statistical superiority. In the statistical comparison of the effect sizes obtained from the pooled results, the only significant difference between SMDs was found at week 2: here, the effect size derived from the CGI-I (SMD=0.26) was significantly greater than that of the BPRS (SMD=0.16), p=0.004.

Dichotomous Outcomes

Figure 2 illustrates the results derived from the dichotomous outcomes. Compared to the continuous outcomes, the graphical display suggests an overall greater variability of the results in the single studies using the different outcomes. This variability is not surprising, because some of the information is lost if results of a scale are dichotomized using a cutoff. Indeed, there was not only a higher degree of variability between the BPRS and the CGI results; the 20 and 50% BPRS results also varied between each other, at least when the individual studies were considered. On the whole, the sensitivities of the BPRS and the CGI were again similar. Although in 14 out of 20 cases at least one of the two CGI outcomes showed a slightly greater effect size than both BPRS outcomes, the 95% confidence intervals overlapped broadly, indicating similar sensitivity. The pooled analysis of all four studies showed more consistent results across outcomes because pooling increases precision and narrows confidence intervals.

Amisulpride Vs Haloperidol

In the pooled results, all outcomes (continuous and dichotomous) showed an increasing superiority of amisulpride compared to haloperidol over the first 4 study weeks, as is illustrated by the increasing effect sizes. A statistically significant superiority of amisulpride was found as early as 2 weeks after the initiation of treatment according to all eight outcomes. In the primary LOCF analysis at study end points, the SMDs of continuous outcomes ranged between 0.27 and 0.31, indicating a superiority of amisulpride to the extent of approximately 0.3 standard deviations, which is a small to medium superiority according to Cohen's classification (Cohen, 1988). The odds ratios obtained for the pooled LOCF dichotomous outcomes ranged between 1.58 and 1.87. Thus, the odds of being a responder were approximately 1.58–1.87 times as high in the amisulpride group as in the haloperidol group.

DISCUSSION

The main result was that the sensitivity of the two CGI scales in detecting efficacy differences between amisulpride and haloperidol was as great as that of the BPRS. The effect sizes obtained by the CGI and the BPRS were similar, and both scales showed a statistically significant superiority of amisulpride compared to haloperidol as early as 2 weeks after initiation of treatment.

This result is surprising at first glance. While the BPRS is a methodologically well developed and psychometrically sound assessment device (for a review, see Hedlund and Vieweg, 1980), not much research on the psychometric properties of the CGI has yet been undertaken. In 116 patients with panic disorder and depression, Leon et al (1993) found good concurrent validity and sensitivity to change, while others (Beneke and Rasmus, 1992) criticized the CGI on semantic, logical, and statistical grounds. However, in a manner similar to our findings, Khan et al (2002, 2004) showed in two antidepressant drug trials that the sensitivity of the CGI-S and CGI-I was similar to that of the Montgomery–Asberg depression rating scale (Montgomery and Asberg, 1979) and the Hamilton depression rating scale (Hamilton, 1960). One reason for the similar sensitivity of the CGI and the BPRS could be that physicians take the same factors into account in their CGI rating that are measured by the BPRS. Furthermore, physicians rating the CGI can also give a great deal of weight to a component of psychopathology that has shown much change during the trial. In the BPRS, however, each symptom is scored by a single item, so that the calculation of a BPRS total score may level out effects in specific components of psychopathology.

A strength of our analysis is that a large sample was available, so that the results are rather robust. A further strength is that we analyzed not merely a single study but all pivotal amisulpride vs haloperidol studies, which showed consistent results. A limitation is that the data were derived from industry-sponsored trials on antipsychotic drugs. Although investigators usually receive rater training prior to such trials, scales are often filled in quickly without much attention to detail. Sometimes the raters even change during the course of a study. A further important problem is that the CGI and the BPRS ratings were not made strictly independently, but rather one after the other by the same rater. Some raters may have filled in the BPRS first and then considered this rating when filling in the CGI. We do not believe that raters do this systematically, and some clinicians may have rated the CGI first. The problem may also be more important in regard to the CGI-S scale than in regard to the CGI-I scale. If the latter instrument were strongly influenced by the BPRS, raters would first have to consider the current BPRS rating and then compare this rating with the baseline rating. Such a procedure would require a very good memory. Nevertheless, specific studies would be needed to rule out this potential bias with absolute certainty. Patients could be randomly assigned either to CGI ratings after routine clinical interviews or to standardized BPRS interviews and vice versa. Another design was used for the evaluation of a new version of the CGI for schizophrenia (Haro et al, see below). A total of 114 patients were examined in a clinical interview simulating routine conditions, then the CGI was filled in, and only afterwards was a complete PANSS interview conducted to avoid the PANSS interview influencing the CGI rating. The sensitivity of the CGI and the PANSS in detecting changes from the patients' baseline state was the same.
Until more such studies are available, our results should be considered only to be hypothesis generating. Nevertheless, from a practical point of view, physicians participating in industry-sponsored drug trials will always be pressed for time, which limits the quality of the BPRS ratings. To be sure, CGI ratings also require an interview, but the idea is to save time by extracting the information from routine clinical interviews rather than conducting formal BPRS interviews covering the specific BPRS questions. This advantage of the CGI is especially important in industry-independent, pragmatic trials. Such trials have recently been advocated due to the well-known problems of typical industry-sponsored efficacy studies, such as high dropout rates (Wahlbeck et al, 2001) or inclusion of very select populations. Pragmatic studies attempt to recruit large patient samples in routine settings. Usually only limited funding is available, so that time is a crucial factor. This is a problem not only in antipsychotic drug trials but also in other areas of psychiatry. In depression, patient self-report measures are currently under investigation (Rush et al, 2005), but the use of such scales in schizophrenia is problematic due to the nature of the disorder, especially thought disorder. Another advantage of the CGI is that its results can be understood intuitively (Nierenberg and DeCecco, 2002), while it is unclear what a mean BPRS total score of, for example, 40 or 60 means clinically. A limitation of the CGI is its unclear validity, so that we hasten to emphasize that its use as the only efficacy scale in a trial cannot currently be recommended. In contrast to the BPRS, it does not allow for a fine-grained examination of schizophrenic psychopathology, for example, in terms of positive symptoms or negative symptoms.
It provides only a global judgment of a patient's overall state for which the rater is instructed to ‘consider his total clinical experience with the given population’. The rating may therefore depend both on the rater's experience and on patient characteristics, for example, whether a patient suffers mainly from positive or mainly from negative symptoms. Thus, scales such as the BPRS or the PANSS are of course much better instruments in situations where specific aspects of schizophrenic symptoms need to be assessed. To compensate for this limitation of the CGI, a version specifically designed for patients with schizophrenia has recently been developed (the Clinical Global Impression-Schizophrenia Scale) and its validity and reliability have been verified (Haro et al, 2003). This new scale uses the same scoring system as the original CGI, but the anchors are clearer and there are subscales for the specific evaluation of positive, negative, depressive, and cognitive symptoms. It has already been successfully used as a sole measure in a large prospective observational study in which clear efficacy differences between antipsychotics were found (Dossenbach et al, 2004). Finally, our work is of course not a ‘meta-analysis’ of all studies using the BPRS and the CGI. Individual patient data were needed for our analysis and companies are often hesitant to share such data. We are greatly indebted to Sanofi-Aventis for their willingness to provide individual patient data from their pivotal trials and wish to stress that we did not arbitrarily select among a population of studies. However, obviously, replications using other medications in other samples would be useful.

We conclude that the sensitivity of the CGI in detecting differences between antipsychotic medications in drug trials may be as good as that of the BPRS. It is justified to use the former instrument, which consumes very little time, in addition to more sophisticated scales such as the BPRS in drug trials for schizophrenia. Further development and evaluation of the CGI is warranted.