Introduction

Differences in efficacy between antipsychotic drugs and placebo in clinical trials of acute schizophrenia have decreased over the decades, and we have recently demonstrated that this is to a large degree due to an increasing placebo response [1]. This phenomenon might help to account for some unexpected trial results in which even standard drugs such as haloperidol did not outperform placebo [2]. These results pose a challenge for antipsychotic drug development, as well as for the assessment of antipsychotic efficacy in everyday clinical practice.

We have previously demonstrated that drug-placebo differences can be explained mainly by the increasing placebo response and, to a lesser extent, by industry sponsorship. When we investigated factors explaining placebo-response in a subsequent paper, lower chronicity and larger sample size were associated with higher placebo-response [3], replicating some of the findings of previous publications, which were based on datasets roughly half as large [4,5,6].

However, as effect sizes are defined as a difference between drug effects and placebo effects, the remaining question is which factors moderate drug-response and whether these are different from those moderating placebo-response. Intuitively, one would assume that a factor leading to an increase in placebo response would lead to a parallel increase of drug response (this is sometimes referred to as the “additivity assumption” [6, 7]). In our original paper on drug-placebo differences, we found that the additivity assumption is not always fulfilled. While placebo-response increased over the years, drug-response remained stable, leading to decreasing effect sizes over the decades [1].

Therefore, here we try to identify modifiers of drug-response and explore how they interplay with modifiers of placebo-response in shaping effect sizes. Our results may further clarify why drug-placebo differences have decreased over the years and may inform the design of future placebo-controlled clinical trials of antipsychotic drugs.

Material and methods

We followed the PRISMA guidelines [8] (see checklist in the data supplement S1) and initially published a protocol in PROSPERO (CRD42013003342, see data supplement S2). We used the same database and largely the same text of the method section as in our previous publications for consistency [1, 3].

Inclusion/exclusion criteria

Participants

Adults with acute exacerbations of schizophrenia or related disorders (following the Cochrane Schizophrenia Group) were included. We accepted all diagnostic criteria and also included schizoaffective, schizophreniform, and delusional disorder, because these generally do not require different treatment [9]. We excluded relapse prevention studies in stable patients receiving maintenance medication [10], studies in patients with predominant negative symptoms, and studies in patients with major concomitant somatic or psychiatric illness. There is no indication that effect sizes have decreased in relapse prevention studies in stable patients; differences between drug and placebo remain large [10]. In patients with predominant negative symptoms, the outcome would be different (negative symptoms rather than overall symptoms). Studies in patients with major concomitant somatic or psychiatric illness are extremely rare and would have increased heterogeneity even further.

Interventions

We included all antipsychotics licensed in at least one country except clozapine, which is more efficacious than the other antipsychotics [11], so that pooling it with the other compounds would not have been appropriate (only one clozapine arm with nine patients had to be excluded on this basis [12], making the impact of this decision negligible). We excluded intramuscular formulations, because these are used primarily either for emergency use (short-acting i.m. drugs) or for relapse prevention (long-acting depot drugs). We examined all antipsychotics as a group under the assumption that efficacy differences between them are small [11, 13,14,15]; clozapine was the exception and was excluded for this reason.

Types of studies

Published and unpublished, double-blind, placebo-controlled randomized controlled trials of at least 3 weeks duration [16] were included. Studies with a high risk of bias in sequence generation or allocation concealment were excluded [17]. We a priori excluded Chinese studies due to quality concerns [18, 19].

Search strategy

We searched the Cochrane Schizophrenia Group's Controlled Trials Register (compiled by regular systematic searches of more than 15 databases, clinical trial registers, the FDA website, hand searches, and conference proceedings [20], without language restrictions; available to us up to the August 2009 version) with the term “placebo”, and we searched MEDLINE, EMBASE, PsycINFO, Cochrane CENTRAL, and ClinicalTrials.gov (last search October 2016; search terms are presented in the online supplement S3), supplemented by screening previous reviews [5, 11, 21,22,23,24,25,26,27,28,29].

Outcomes

The outcome was the mean change from baseline to endpoint of the Positive and Negative Syndrome Scale (PANSS [30]) total score. If the PANSS was not available we used the change from baseline to endpoint of the Brief Psychiatric Rating Scale (BPRS [31]) and converted it to the PANSS using a validated method [32].

Study selection and data extraction

At least two reviewers among MH, MT, MS, and SL independently selected potentially relevant publications from the abstracts found by our search and decided on inclusion, and at least two reviewers among CL, MH, BH, MS, MR, SB, MK, PR, TA, NP, and SL (see acknowledgement) extracted data in duplicate into Excel sheets. Risk of bias was independently assessed by at least two of the following reviewers (CL, SL, MH, BH) with the Cochrane Collaboration’s risk-of-bias tool [17]. Disagreements were resolved by discussion. Missing data were requested from authors or the sponsoring pharmaceutical companies for all studies published in the last 30 years. We preferably extracted intention-to-treat data, and we preferred mixed-effects models for repeated measures (MMRM) over last-observation-carried-forward (LOCF) analyses. Intention-to-treat analyses are now standard, and MMRM has been shown to account for attrition in psychopharmacological trials more appropriately than LOCF [33]. Missing standard deviations were estimated from test statistics or by using the mean standard deviation of the remaining studies [34, 35].
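To make the standard-deviation step more concrete, here is a minimal sketch of common conversions for recovering an SD from reported statistics. It uses the standard error of a mean, a 95% confidence interval, or a two-sample t statistic, followed by imputation with the mean SD of the studies that did report one. These are textbook conversions assumed for illustration. The exact procedures in [34, 35] may differ, and the unweighted mean in `impute_missing_sds` is our assumption, since the text does not say whether the mean SD was weighted. All function names are hypothetical.

```python
import math
from statistics import mean
from typing import List, Optional


def sd_from_se(se: float, n: int) -> float:
    """SD of one group from the standard error of its mean: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """SD from a 95% CI around a group mean (normal approximation, large n)."""
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)


def pooled_sd_from_t(mean_diff: float, t: float, n1: int, n2: int) -> float:
    """Pooled SD implied by a two-sample t statistic for a difference in means."""
    se_diff = abs(mean_diff) / abs(t)
    return se_diff / math.sqrt(1 / n1 + 1 / n2)


def impute_missing_sds(sds: List[Optional[float]]) -> List[float]:
    """Fill missing SDs (None) with the unweighted mean SD of the reporting studies."""
    fill = mean(s for s in sds if s is not None)
    return [fill if s is None else s for s in sds]
```

For example, a reported drug-placebo difference of 5 PANSS points with t = 2.5 and 50 patients per arm implies a pooled SD of approximately 10 points. The numbers are illustrative and are not taken from the included trials.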

Statistical analysis

We conducted meta-regressions in a frequentist framework with drug-response as the dependent variable and compared them with our previously published results on placebo-response [3] and drug-placebo differences [1]. All potential predictors that we had used in our previous papers [1, 3] were analyzed as independent variables. Drug- and placebo-response were continuous variables defined as the difference in the PANSS/BPRS total score between baseline and endpoint. Drug-placebo differences were calculated as standardized mean differences (SMDs). The initial choice of predictors was based on previous evidence [4,5,6, 36, 37] suggesting that these factors might be relevant. We categorized the moderators into patient-, study design-, and drug-related factors, although overlaps were expected. We first ran univariable meta-regressions exploring the effect of each potential moderator separately. For the multivariable meta-regression models, we used only factors that were significant in univariable analyses and followed a formal variable selection procedure using a backward-stepwise algorithm with a removal criterion of p = 0.15. We monitored how much of the heterogeneity in drug-response, placebo-response, and drug-placebo differences each predictor explained by comparing the heterogeneity of each meta-regression model with that of the model without any covariates.
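The text does not name the SMD estimator. As an assumption, the sketch below uses Hedges' g, a common choice, together with its sampling variance (Borenstein-style small-sample correction). Responses are entered as improvements (baseline minus endpoint), so a positive g favours the drug. The function name is hypothetical.

```python
import math
from typing import Tuple


def hedges_g(m_drug: float, sd_drug: float, n_drug: int,
             m_pla: float, sd_pla: float, n_pla: int) -> Tuple[float, float]:
    """Hedges' g and its variance for drug vs. placebo symptom improvement.

    Means are improvements (baseline minus endpoint), so positive g favours the drug.
    """
    df = n_drug + n_pla - 2
    # Pooled within-group SD across the two arms
    sd_pooled = math.sqrt(((n_drug - 1) * sd_drug ** 2 + (n_pla - 1) * sd_pla ** 2) / df)
    d = (m_drug - m_pla) / sd_pooled
    # Variance of Cohen's d
    var_d = (n_drug + n_pla) / (n_drug * n_pla) + d ** 2 / (2 * (n_drug + n_pla))
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    return j * d, j ** 2 * var_d
```

With illustrative inputs (17 vs. 7 points of improvement, SD 20 in both arms, 100 patients each), d is 0.5, and g is slightly smaller because of the correction factor.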

Patient-related factors

The patient-related factors were: chronicity, measured by the patients’ mean age, duration of illness, duration of the current episode, and first-episode status [5, 37]; percentage of men [37]; US populations vs. non-US or mixed countries [38]; severity (PANSS total score) at baseline [36]; in- vs. outpatients [5]; operationalized criteria (e.g., ICD-10 or DSM-III to IV-R) vs. unspecific ‘clinical diagnoses’; and the total dropout rate of both groups combined (newly added as a moderator, because the joint process of response and dropout is rarely accounted for appropriately [39, 40]).

Study design-related factors

We analyzed publication year, the impact of risk of bias (appropriate vs. unclear randomization [41] and allocation concealment methods [42], blinding [42], and missing outcome data [17, 43]), study duration [5], duration of wash-out [5], requirement of a scale-derived minimum of symptoms at baseline [36], PANSS vs. BPRS as a scale, sample size [44], number of sites [5], percentage of academic sites [5], number of medications and arms [5], percentage of participants randomized to placebo [4], and drug company sponsorship of at least one study arm (medication donation alone was not considered company sponsorship [45]).

Drug-related factors

We classified the antipsychotics by their mechanisms according to the “Neuroscience-based Nomenclature” [46], expressed antipsychotic doses in chlorpromazine equivalents according to the International Consensus Study of Antipsychotic Dosing [6, 47], and distinguished fixed- from flexible-dose studies [4].
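Dose conversion to chlorpromazine (CPZ) equivalents is a simple proportion. The sketch below assumes the equivalent dose for each drug is looked up in a consensus table; no table values are reproduced here. It also assumes that a study's mean dose is the participant-weighted average over its drug arms, because the text does not specify how multi-arm studies were summarized. The function names and the example doses are hypothetical.

```python
from typing import Iterable, Tuple


def cpz_equivalent(dose_mg: float, equivalent_dose_mg: float,
                   cpz_reference_mg: float = 100.0) -> float:
    """Express a daily dose in chlorpromazine (CPZ) equivalents.

    equivalent_dose_mg is the dose of the drug considered equivalent to
    cpz_reference_mg of chlorpromazine (taken from a dosing consensus table).
    """
    return dose_mg * cpz_reference_mg / equivalent_dose_mg


def study_mean_cpz(arms: Iterable[Tuple[float, float, int]]) -> float:
    """Participant-weighted mean CPZ-equivalent dose over a study's drug arms.

    Each arm is (daily dose in mg, equivalent dose in mg, number randomized).
    """
    arms = list(arms)
    total_n = sum(n for _, _, n in arms)
    return sum(cpz_equivalent(d, e) * n for d, e, n in arms) / total_n
```

For a hypothetical drug with 5 mg equivalent to 100 mg CPZ, arms of 10 mg (n = 50) and 20 mg (n = 150) give a study mean of 350 mg CPZ equivalents.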

We also analyzed whether the degree of placebo response in the studies was associated with the degree of drug response. We performed all analyses using Stata 14.2 and assuming a significance level of 5%.
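The univariable meta-regressions and the heterogeneity comparison described under Statistical analysis can be sketched as a random-effects meta-regression. Here the between-study variance tau² is estimated by the method of moments, and the share of heterogeneity explained is computed as (tau²_null − tau²_model) / tau²_null. The estimator is our choice for illustration. The analyses themselves were run in Stata 14.2, whose estimator (for example REML) may give somewhat different values. Function names are hypothetical.

```python
import numpy as np


def mm_meta_regression(y, v, X=None):
    """Random-effects meta-regression with a method-of-moments tau^2.

    y: study effects (e.g. mean PANSS change); v: their within-study variances;
    X: k x p moderator matrix WITHOUT an intercept column (None = intercept only).
    Returns (coefficients incl. intercept, standard errors, tau2).
    """
    y = np.asarray(y, float)
    v = np.asarray(v, float)
    k = y.size
    ones = np.ones((k, 1))
    Xd = ones if X is None else np.hstack([ones, np.asarray(X, float).reshape(k, -1)])
    p = Xd.shape[1]

    # Step 1: fixed-effect (inverse-variance) fit and residual heterogeneity Q_E
    W = np.diag(1 / v)
    XtWX_inv = np.linalg.inv(Xd.T @ W @ Xd)
    b_fe = XtWX_inv @ Xd.T @ W @ y
    resid = y - Xd @ b_fe
    q_e = float(resid @ W @ resid)

    # Step 2: method-of-moments tau^2, truncated at zero
    P = W - W @ Xd @ XtWX_inv @ Xd.T @ W
    tau2 = max(0.0, (q_e - (k - p)) / np.trace(P))

    # Step 3: refit with random-effects weights 1 / (v + tau^2)
    W_re = np.diag(1 / (v + tau2))
    cov = np.linalg.inv(Xd.T @ W_re @ Xd)
    b_re = cov @ Xd.T @ W_re @ y
    return b_re, np.sqrt(np.diag(cov)), tau2


def heterogeneity_explained(y, v, X):
    """Share of between-study variance (tau^2) explained by the moderators X."""
    _, _, tau2_null = mm_meta_regression(y, v)
    _, _, tau2_model = mm_meta_regression(y, v, X)
    if tau2_null == 0:
        return 0.0
    return max(0.0, (tau2_null - tau2_model) / tau2_null)
```

With no moderators, the tau² formula reduces to the familiar DerSimonian-Laird estimator. Truncating at zero means a moderator can at most explain all of the between-study variance.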

Results

Description of included studies

The PRISMA flow diagram [8] is presented in the online supplement Fig. S1 and a description of the included studies in supplement Table S4. A summary of study characteristics is presented in Table 1. Overall, 167 studies with 28,102 participants met the inclusion criteria, of which 104 studies with 23,567 participants (8023 allocated to the placebo groups and 15,544 to the drug groups), published between 1969 and 2016, provided data to calculate drug-placebo differences. In the studies with such data, the patients’ mean duration of illness was 13.8 (SD 4.0) years, the mean age was 38.6 (SD 4.8) years, and the median duration of studies with usable outcomes was 6 weeks (range 3–26 weeks; for the outcome examined here, all studies except one (26 weeks) lasted ≤ 12 weeks). There were no studies in first-episode or treatment-resistant patients. Risk of bias is presented in the online supplement S5. We only included randomized, double-blind trials, but the reports often did not give full details about sequence generation or allocation concealment. Descriptions of the methods and success of blinding were frequently insufficient as well. The data reflected the high dropout rates in current schizophrenia studies (overall mean 39.3%, SD 17.1). Older studies were poorly reported, often making it impossible to extract data (52 studies (50%) had a high risk of selective reporting). Sixty-five studies (62.5%) were sponsored by the manufacturer of one of the included antipsychotics, 31 (29.8%) were not primarily industry sponsored, and in 8 (7.7%) the sponsor was unclear.

Table 1 Summary of characteristics of included studies

Analysis of potential moderators—univariable analysis

The mean drug response in PANSS units was 17.45 (95% CI 15.89, 19.01; 100 studies (N) with 14,933 participants (n)), the placebo response was 6.25 (95% CI 4.64, 7.85; 99 studies (N) with 7623 participants (n)), and the mean SMD was 0.47 (95% CI 0.42, 0.52; p < 0.001). As expected, drug-response and placebo-response were strongly correlated (Fig. 1).

Fig. 1

Meta-regression placebo-response vs. drug-response

In the following text, in Table 2 and in Fig. 2 we sorted the moderators in the following way:

  1. We first present the moderators that had a significant effect on the drug-placebo difference (effect size, SMD), because it is the drug-placebo difference that ultimately counts.

  2. We then present the moderators that had a significant effect on either drug-response or placebo-response, but without an important effect on the resulting drug-placebo difference.

  3. The moderators that had no important effect on any result are summarized at the end.

Table 2 Univariable meta-regressions of potential predictors of drug-response, placebo-response, and drug-placebo differences (SMDs)
Fig. 2

Moderators of placebo-response: univariable meta-regressions. The panels correspond to the following moderators: a publication year, b number of participants, c number of sites, d mean dose in chlorpromazine equivalents, e minimum scale-derived severity threshold as inclusion criterion, f average participant age in years, g average duration of illness in years, h industry sponsorship, i scale, j operationalized criteria or not, k drug mechanism, l number of medications, m baseline severity (PANSS total score at baseline), n duration of wash-out phases, o study duration, p country, q total dropout rate, r percentage randomized to placebo, s number of arms, t percentage of academic sites, u dosing schedule, v percentage of men, w risk of bias concerning randomization method, x risk of bias concerning allocation concealment, y risk of bias concerning blinding, z risk of bias concerning missing outcomes. M1–M5 are drug mechanisms of action according to the “Neuroscience-based Nomenclature (NbN)” [46]: M1 = receptor antagonists (D2): clopenthixol, fluphenazine, haloperidol, perphenazine, pimozide, pipotiazine, sulpiride, trifluoperazine. M2 = receptor antagonists (D2, 5-HT2): chlorpromazine, iloperidone, loxapine, lurasidone, olanzapine, sertindole, thioridazine, ziprasidone, zotepine. M3 = receptor partial agonists (D2, 5-HT1A): aripiprazole, brexpiprazole, cariprazine. M4 = receptor antagonists (D2, 5-HT2, NE alpha2): asenapine, paliperidone, risperidone. M5 = receptor antagonist (D2, 5-HT2) and reuptake inhibitor (NET): quetiapine. A few old drugs have not yet been classified by NbN. The B-values and their 95% confidence intervals at the bottom of the graphs are the coefficients for placebo response (Bpla), drug response (Bdrug), and drug-placebo differences (BSMD). The numbers in square brackets in the figures indicate the unit of change to which each coefficient refers.
For example, in a (publication year), “[10 years increase], Bpla = 2.74 (1.60, 3.88), Bdrug = 0.26 (–0.96, 1.48), BSMD = –0.10 (–0.13, –0.06)” means that a study conducted 10 years later had on average a 2.74 (95% confidence interval 1.60 to 3.88) PANSS points higher placebo response, a 0.26 (–0.96, 1.48) points higher drug response, and a 0.10 (0.06, 0.13) lower drug-placebo difference. Similarly, in e, as an example of a dichotomous moderator, “[With (vs. not)], Bpla = 7.47 (3.93, 11.01), Bdrug = 0.93 (–2.83, 4.70), BSMD = –0.21 (–0.34, –0.09)” means that a study with a minimum baseline severity as an inclusion criterion had on average a 7.47 (95% confidence interval 3.93 to 11.01) PANSS points higher placebo response, a 0.93 (–2.83, 4.70) points higher drug response, and a 0.21 (0.09, 0.34) lower drug-placebo difference compared with a study without such a criterion. A moderator is statistically significant if the 95% confidence interval of its coefficient does not include 0. ¹Results without one outlier, which was the only study restricted to elderly people with schizophrenia (mean age 70 years, 20 years more than the next oldest study population). ²This meta-regression was also statistically significant when the only outlier study, with a duration of 26 weeks, was excluded.
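Because each coefficient is the slope of a linear meta-regression, the per-10-year value in panel a can be rescaled to other intervals by multiplying the coefficient and both confidence limits by the same factor. The worked example below covers a 20-year interval for placebo response and assumes the linear trend holds across that whole range:

$$
\Delta\hat{y}_{\text{pla}} = B_{\text{pla}} \times \frac{\Delta t}{10\ \text{years}} = 2.74 \times \frac{20}{10} = 5.48\ \text{PANSS points},
\qquad 95\%\ \text{CI: } 1.60 \times 2 = 3.20 \ \text{to}\ 3.88 \times 2 = 7.76
$$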

Moderators with a significant effect on drug-placebo differences (SMDs)

Placebo-response was significantly higher in more recent studies, in studies with a larger number of participants and sites, in studies with a minimum baseline severity as an inclusion criterion, in studies using the PANSS instead of the BPRS, and in studies with operationalized diagnostic criteria, whereas drug-response was not significantly associated with these factors (Fig. 2a–l). The net effect was smaller drug-placebo differences (Fig. 2a, b, c, e, i, j and Table 2).

Studies using higher mean doses in chlorpromazine equivalents had a significantly smaller placebo-response and tended to have a larger drug-response (not significant), resulting in larger drug-placebo differences (Fig. 2d and Table 2).

The chronicity measures (mean age and mean duration of illness) were negatively correlated with both drug-response and placebo-response. However, the effect was more pronounced in the drug-groups, leading to smaller drug-placebo differences in more chronic patients, although this effect was significant only for mean age (Fig. 2f, g).

Industry sponsorship, the number of medications, and drug mechanism according to the “Neuroscience-based Nomenclature (NbN)” [46] had no significant impact on either drug-response or placebo-response, but the slopes differed enough that drug-placebo differences were significantly smaller in industry-sponsored studies, in studies of drugs with a primary mechanism other than D2 antagonism according to NbN, and in studies with only one medication (Fig. 2h, k, l and Table 2).

A post-hoc sensitivity analysis, performed at a reviewer’s request, showed that the pattern of increasing placebo response, relatively stable drug response, and decreasing drug-placebo differences was present both before and after the year 2000. The mean coefficients were virtually the same, although the confidence intervals show that not all results that were statistically significant in the main analysis were significant in the sensitivity analysis. The smaller number of trials in each period is a likely explanation (Supplement Fig. S2).

Moderators with a significant effect on drug-response and/or placebo-response but without a significant impact on drug-placebo differences (SMDs)

Studies with higher baseline severity, shorter duration of the wash-out phases, shorter study duration, and studies conducted in countries outside the US or mixed had higher drug-response than their counterparts (Fig. 2m–p). However, placebo response was affected in the same direction with similar slopes resulting in no significant effect on drug-placebo differences (Fig. 2m–p and Table 2).

Moderators without a significant effect on drug-response, placebo-response, or drug-placebo differences (SMDs)

The total dropout rate, the percentage of patients randomized to placebo, the number of arms, the percentage of academic sites, fixed or flexible dosing, percentage of men, and risk of bias in terms of randomization, allocation concealment, blinding and missing outcome data had no important impact on the results (Fig. 2q–z).

Moderators of drug response, placebo response and drug-placebo differences— multivariable analysis

As some significant predictors are naturally related to each other, we made the following choices for the multivariable models:

  a. We chose mean participant age rather than mean duration of illness as a measure of chronicity, because more studies reported it;

  b. We did not include placebo response in the multivariable model of drug response, and vice versa, because the two are strongly correlated (see Fig. 1).

Significant factors in the model for drug response included: average age and baseline severity; in the model of placebo response: average age and total number of participants; and for drug-placebo differences: degree of placebo response and industry sponsorship (industry-sponsored trials had smaller effect sizes than non-industry-sponsored ones, see Table 3).
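The backward-stepwise selection with a removal criterion of p = 0.15 described under Statistical analysis can be sketched as follows. The model starts with all candidate moderators and repeatedly drops the one with the largest p-value until every remaining moderator has p < 0.15. This is an illustrative simplification, not the Stata routine that was used. The inverse-variance weights are held fixed, for example at 1/(v + tau²) from a previous fit, rather than re-estimating tau² at every step. The p-values use a normal approximation. Function names are hypothetical.

```python
import math

import numpy as np


def wls_pvalues(y, w, X):
    """Two-sided normal-theory p-values for each moderator (intercept excluded)
    from weighted least squares with known weights w."""
    k = len(y)
    Xd = np.hstack([np.ones((k, 1)), X])
    W = np.diag(w)
    cov = np.linalg.inv(Xd.T @ W @ Xd)
    b = cov @ Xd.T @ W @ y
    z = b / np.sqrt(np.diag(cov))
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return [math.erfc(abs(zi) / math.sqrt(2)) for zi in z[1:]]


def backward_stepwise(y, w, X, names, p_remove=0.15):
    """Drop the least significant moderator until all remaining have p < p_remove."""
    y = np.asarray(y, float)
    w = np.asarray(w, float)
    X = np.asarray(X, float)
    keep = list(range(X.shape[1]))
    while keep:
        pvals = wls_pvalues(y, w, X[:, keep])
        worst = int(np.argmax(pvals))
        if pvals[worst] < p_remove:
            break  # every remaining moderator meets the retention criterion
        keep.pop(worst)
    return [names[i] for i in keep]
```

The intercept is always retained. A more faithful version would call a random-effects fit at each step so that tau² is re-estimated as moderators are removed.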

Table 3 Multivariable meta-regression models for drug response, placebo response, and drug-placebo differences (SMD) (backward-stepwise algorithm)

Discussion

In our analysis, we found that drug-response, placebo-response, and drug-placebo differences are not affected by the same factors in the same way. Certain factors influenced only drug-response or only placebo-response, so that they had a net effect on drug-placebo differences. Other factors affected drug-response and placebo-response in the same direction and to a similar degree, so that they had no net effect on the effect size. Finally, some factors had no influence on any of these outcomes.

Factors that had an impact on drug-placebo differences (effect size)

We have previously demonstrated that placebo-response has increased over the years while drug-response remained stable, resulting in decreasing drug-placebo differences (Fig. 2a [1]). Here, we found a similar pattern for the factor “sample size” and the related factor “number of sites” (Fig. 2b, c). Specifically, the more participants and sites, the higher the placebo-response and the smaller the effect sizes, while drug-response was unaffected by sample size and number of sites. This finding is difficult to interpret. We speculate that drug-response may plateau, while large studies recruit more patients who benefit from placebo than smaller trials, in which patients can be selected more carefully.

The effects on drug response and placebo response differed even more with regard to the doses used. In studies with high antipsychotic doses, drug-response increased while placebo-response decreased, and consequently the effect size increased (Fig. 2d). It could be that the patients in studies with high doses are more severely ill, so that they benefit more from drug and less from placebo. An alternative interpretation is that studies with high doses involve more unblinding through side effects, so that raters may guess to which group patients were assigned.

In studies that employed a minimum severity threshold as an inclusion criterion, we found that placebo-response was higher than in studies without such a criterion, while drug-response was not affected (Fig. 2e). One reason could be artificial baseline inflation in such studies: if baseline ratings were inflated to meet an inclusion criterion, the next rating of patients in the placebo group may be automatically lower, while in the drug group we may see a true reduction of symptoms. More chronic patients (moderator: mean age) responded less well to both drug and placebo, but the effect was more pronounced in the drug groups, so it could be helpful to recruit younger patients than is currently the case (mean age around 40 years, Fig. 2f). Had we analyzed only placebo-response, as some previous reports did [3, 4], the conclusion might have been to recruit more chronic patients to avoid placebo-response, which demonstrates the importance of examining drug-response simultaneously. For the related factor “duration of illness”, the influence on the drug-placebo difference was not significant, but fewer studies reported this factor and the pattern was the same (Fig. 2g).

Industry sponsorship tended to reduce drug response and to increase placebo response, which resulted in smaller effect sizes than in non-sponsored studies (Fig. 2h). This finding is counterintuitive, given that the pharmaceutical industry is suspected of designing trials in ways that inflate positive results [45]. However, industry sponsorship is probably a composite of multiple factors.

Studies using PANSS rather than BPRS, studies analyzing other antipsychotics than D2 receptor antagonists according to NbN, and studies applying operationalized diagnostic criteria had higher placebo-response, similar drug-response and smaller drug-placebo differences than their counterparts (Fig. 2i–k). However, this may be confounded by publication year. In early studies operationalized diagnostic criteria and the PANSS were not available yet, and early studies examined D2 receptor antagonists (mainly haloperidol) more frequently than recent studies.

Finally, studies that examined two or more active drugs had larger drug-placebo differences than those with only one drug. This result contrasts with the finding that the number of arms in a study (which could also reflect different dose arms of the same drug) did not affect the effect size (Fig. 2l, s).

Factors that had an impact on either drug-response or placebo-response or both, but not on drug-placebo differences

The baseline severity data show a pattern that could be expected from the findings of a previous individual-patient data meta-analysis [36]: the more ill the patients were at baseline, the larger the difference between drug-response and placebo-response, although the effect on the drug-placebo difference was not significant (Fig. 2m). In contrast, the duration of the wash-out period, study duration, and study origin affected drug-response and placebo-response in the same way, so that they had no impact on the drug-placebo difference (Fig. 2n–p). This finding is important, because from previous analyses examining placebo-response only, one might have concluded that wash-out phases need to be long to keep placebo-response low. However, our data suggest that longer wash-out phases may not be an optimal solution, because drug-response was also lower in studies with long wash-out periods.

Finally, the total dropout rate, the percentage randomized to placebo, the number of arms, the percentage of academic sites, fixed or flexible dosing, and the various risk of bias items had no important effect on drug-response, placebo-response, or drug-placebo differences (Fig. 2q–z). That, in contrast to some of our previous analyses, we did not find an effect of the percentage of trial participants randomized to placebo [4, 5], the percentage of academic sites [5], or the percentage of men in a study [37] could be due to somewhat different definitions, or simply to the much larger number of trials available for the current analysis.

Given that various factors did not affect drug-response, placebo-response, and drug-placebo differences in the same way, it is not surprising that the factors that were significant in the multivariable analyses also differed (Table 3). Multivariable models are subject to multicollinearity and can lead to over-simplified models by erroneously dropping variables that contribute importantly to model fit. Univariable models, in contrast, carry a risk of confounding because moderators are correlated. Both sets of results should therefore be considered. To our knowledge, the publication by Rutherford et al. [6, 47] is the only other one that analyzed the interaction between drug-response and placebo-response. Comparison is difficult, because at that time only approximately half of the placebo-controlled studies were available, and because studies comparing antipsychotics with each other were included as well. This is probably why drug-response increased in their analysis while it remained stable in ours.

Our analysis is limited by the fact that we have examined antipsychotics as a class, because not enough studies would have been available for single drugs. Moreover, meta-regression is based on mean values of trials, which makes the results prone to ecological bias [48], meaning that an average value representing a large group of individuals is rarely accurate in describing a specific individual from that group. Individual patient data meta-analysis is more powerful in this regard, but unfortunately it is limited by insufficient data availability.

Previous analyses had shown that placebo-response in antipsychotic drug trials has increased over the years [3, 5, 6], that drug response has remained stable [1], and that, as a consequence, drug-placebo differences have decreased [1]. The current analysis adds the following information: the predictors of drug response are not the same as those of placebo response and of effect sizes. Therefore, it is not sufficient to understand only placebo response in such trials. We also need to understand drug-response, because its interaction with placebo-response forms the drug-placebo difference, which is what ultimately matters for patients.

Funding and disclosure

The meta-analysis was supported by the German Federal Ministry for Education and Research (Bundesministerium für Bildung und Forschung, BMBF) Grant: FKZ 01KG1115 and by the National Institute for Health Research (NIHR) Oxford Health Biomedical Research Center (grant BRC-1215-20005). The funding bodies had no role in the study design, in the collection, analysis, and interpretation of data, or in the decision to submit for publication. The views expressed are those of the authors and not necessarily those of the UK National Health Service, the NIHR, or the UK Department of Health. The authors had full access to the study data and complete discretion in the analysis of data and writing of this report. In the last 3 years, Stefan Leucht has received honoraria for consulting or lectures from LB Pharma, Otsuka, Lundbeck, Boehringer Ingelheim, LTS Lohmann, Janssen, Johnson&Johnson, TEVA, MSD, Sandoz, Sanofi-Aventis, Angelini, Recordati, Sunovion, and Gedeon Richter. Claudia Leucht is Stefan Leucht’s spouse. Maximilian Huhn received lecture honoraria from Janssen and Lundbeck. Dr. Cipriani is also supported by an NIHR Research Professorship (grant RP-2017-08-ST2-006) and by the NIHR Oxford Cognitive Health Clinical Research Facility. The other authors declare no competing interests.