Antipsychotic-placebo separation on the PANSS-6 subscale as compared to the PANSS-30: a pooled participant-level analysis

Implementing measurement-based care requires brief rating instruments that can be administered quickly yet remain sufficiently informative. Here, we assessed the drug–placebo sensitivity of the six-item subscale (PANSS-6) of the 30-item Positive and Negative Syndrome Scale (PANSS-30) using a large collection of patient-level data (n = 6685) from randomized controlled trials of risperidone and paliperidone. When analyzing the data by study, we found no material difference in mean effect sizes (ES) between the two measures (PANSS-30 ES = 0.45, PANSS-6 ES = 0.44; p = 0.642). Stratifying the pooled population according to several putative effect moderators (e.g., age, formulation, dose, or diagnosis) generally yielded no meaningful ES differences between the two measures. Similarly, early improvement (≥20% improvement at week 1) on the PANSS-6 predicted subsequent response (≥40% improvement at endpoint) as well as the analogous prediction using the PANSS-30. Finally, cross-sectional symptom remission assessed via the PANSS-6 showed very good agreement (sensitivity = 100%, specificity = 98%) with cross-sectional symptom remission defined by the Remission in Schizophrenia Working Group.


INTRODUCTION
The 30-item Positive and Negative Syndrome Scale (PANSS-30) 1 is the most widely used rating instrument in schizophrenia. While widespread in research settings, it is not readily amenable to routine clinical use because it takes 45-60 min to assess all 30 PANSS items 1 . Clinical practice may therefore be better served by using brief rating instruments that can be completed in a short amount of time, e.g., for routine objective tracking of short-term disease progression or improvement, or for assessing sustained response and remission [2][3][4] .
One such brief rating instrument is the unidimensional six-item PANSS subscale (PANSS-6). Following up on prior item-level analyses of the PANSS 5,6 , the PANSS-6 was derived as a unidimensional measure of schizophrenia severity via item response theory analyses of the eight-item PANSS-based definition of symptom remission from the Remission in Schizophrenia Working Group 2,7 . The PANSS-6 subscale includes three items measuring positive symptoms (P1 Delusions, P2 Conceptual disorganization, and P3 Hallucinatory behavior) and three items measuring negative symptoms (N1 Blunted affect, N4 Passive/apathetic social withdrawal, and N6 Lack of spontaneity and flow of conversation). The sensitivity of the PANSS-6, when extracted from PANSS-30 assessments, has previously been found to match that of the PANSS-30 as far as antipsychotic-placebo differences 7 and differences between antipsychotics are concerned 8,9 . The PANSS-6 also has a high rate of agreement with the PANSS-based definition of symptom remission from the Remission in Schizophrenia Working Group 2,8,9 . By using the Simplified Negative and Positive Symptoms Interview (SNAPSI), a PANSS-6 rating can be completed in 15-20 min 10 . Furthermore, PANSS-6 ratings obtained using the SNAPSI have been shown to have good inter-rater reliability 11,12 , as well as good validity when compared against reference PANSS-6 ratings obtained via the SCI-PANSS by independent raters assessing the same patients 10 . Accordingly, the PANSS-6 was recently highlighted as an alternative to longer clinician-rated scales in the practice guideline for the treatment of schizophrenia published by the American Psychiatric Association 13 .
In this study, we compared the sensitivity of PANSS-6 and PANSS-30 to the efficacy of antipsychotics in a large collection of patient-level data (n = 6685) from 18 acute-phase trials of antipsychotics in schizophrenia and schizoaffective disorder, which had used the PANSS-30 as outcome measure. We aimed to assess if there are conditions under which the PANSS-6 might be less sensitive than the PANSS-30, or conversely, if there are situations in which PANSS-6 may provide an advantage. Thus, we first compared the PANSS-6 and PANSS-30 in terms of their sensitivity to drug-placebo differences for all 18 included trials individually. We then pooled all studies and assessed sensitivity across several putative effect moderators (e.g., time under treatment, baseline severity, drug formulation). We also assessed how well cross-sectional symptom remission defined by the PANSS-6 aligned with the cut-off for symptom remission defined by the Remission in Schizophrenia Working Group 2 , without requiring the 6-month time criterion. Finally, since early symptomatic improvement on the PANSS-30 has been shown to be a strong predictor of subsequent response [14][15][16][17] , we also assessed the positive predictive value (PPV) and the negative predictive value (NPV) of early improvement on the PANSS-6 and PANSS-30. We hypothesized that the PANSS-6, which can be assessed in much less time than the PANSS-30, would perform on par with the PANSS-30, thereby presenting a clinically valid and useful measurement-based care instrument for both clinical care and research purposes.

Included studies
In total, 18 placebo-controlled studies with 46 antipsychotic-placebo comparisons were available for inclusion. Of these, nine investigated paliperidone extended release (n = 3232), five investigated paliperidone palmitate (n = 2085), three investigated risperidone (n = 1029), and one investigated risperidone depot (n = 336). One study, R076477-SCH-302, included elderly patients only; two studies, R076477-PSZ-3001 and RIS-SCH-302, only included adolescents; and the remaining studies included adults. Ten out of 12 studies investigating per oral (PO) formulations were of 6 weeks duration; the remaining 2 were of 4 weeks duration. Four out of six studies investigating long-acting injectables (LAIs) were of 13 weeks duration; the remaining two were of 12 and 9 weeks duration, respectively. Details of the included studies are displayed in Table 1.

Study-level comparisons between PANSS-30 and PANSS-6
Table 2 details all comparisons between active treatment and placebo. Out of 46 antipsychotic-placebo comparisons, a statistically significant superiority of treatment was found for 38 when the PANSS-30 was used as the effect parameter. Likewise, superiority of treatment was found for 39 comparisons when the PANSS-6 was the outcome measure. Seven treatment-placebo comparisons showed no statistically significant separation on either outcome measure and 38 showed statistically significant separation on both outcome measures. The one comparison that differed between outcome measures was the high-dose group (paliperidone extended release 6-12 mg) in Study R076477-PSZ-3001, where the PANSS-6 showed a statistically significant superiority of active treatment (ES 0.51; p = 0.013), while the PANSS-30 did not (ES 0.33; p = 0.102). Five drug-placebo comparisons showed an ES difference between the PANSS-6 and PANSS-30 above 0.10 (three favoring PANSS-6). Endpoint item scores for these five comparisons are provided in Supplementary Tables 1-5.
When analyzing all active treatments in each trial as a group, the arithmetic mean effect size across trials was 0.45 for the PANSS-30 and 0.44 for the PANSS-6, a non-significant difference between the two scales of 0.0061 (SEM 0.0130; p = 0.642). Effect sizes were numerically larger for PANSS-30 than for PANSS-6 in 11 out of 18 trials (p = 0.346).
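The two study-level tests can be checked by hand from the summary figures reported above. The following is a minimal sketch, assuming the published (rounded) values as inputs; the variable names are illustrative and the original analyses were run in R on trial-level data, not on these summaries. The chi-squared p value should come out close to the reported 0.346. The t statistic is shown without a p value, because the rounded inputs would not reproduce the reported 0.642 exactly.

```python
import math

# Summary values as reported in the text (rounded as published).
mean_diff = 0.0061        # mean ES difference, PANSS-30 minus PANSS-6
sem_diff = 0.0130         # standard error of that mean difference
n_trials = 18
n_favoring_panss30 = 11   # trials with numerically larger PANSS-30 ES

# Paired-samples t statistic: mean paired difference over its standard error.
t_stat = mean_diff / sem_diff
df = n_trials - 1

# One-sample chi-squared test against an expected 50/50 split across trials.
expected = n_trials / 2
observed = [n_favoring_panss30, n_trials - n_favoring_panss30]
chi2 = sum((o - expected) ** 2 / expected for o in observed)

# With 1 degree of freedom, the chi-squared upper tail equals erfc(sqrt(x / 2)).
p_chi2 = math.erfc(math.sqrt(chi2 / 2))

print(f"t = {t_stat:.3f} on {df} df")
print(f"chi2 = {chi2:.3f}, p = {p_chi2:.3f}")
```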
Pooled comparisons between PANSS-30 and PANSS-6
Table 3 details the results of pooled analyses stratified for putative effect moderators. The overall pooled effect size (0.46 for PANSS-30 and 0.45 for PANSS-6) was similar to the arithmetic mean effect size. In most analyses (13 out of 18), the PANSS-30 yielded a numerically larger effect size. With the exception of the analyses in individuals of old age (≥65), where PANSS-6 had an effect size 0.09 units larger than PANSS-30, effect size differences did not exceed 0.03 in favor of either outcome measure.
Comparison of symptomatic remission between PANSS-6 and PANSS-8
According to the PANSS-8 symptom remission criteria, 21.6% of placebo-treated participants and 33.8% of actively treated participants reached cross-sectional remission (meeting the PANSS-8 symptom remission criteria at the last available visit). Analyses yielded an additional 1.1% remitted patients on placebo and 1.5% on active treatment when only the PANSS-6 criteria were applied. Among these PANSS-6 remitters, 21 individuals scored 4 points on G5 Mannerisms and posturing, but were otherwise in PANSS-8-defined remission; 67 scored 4 points and 2 scored 5 points on G9 Unusual thought content, but were otherwise in PANSS-8-defined remission; and 4 individuals scored 4 points on both G5 Mannerisms and posturing and G9 Unusual thought content. There were thus 94 'false positives' (i.e., remitters on PANSS-6 but not on PANSS-8), yielding a specificity of 98.0%. Patients in PANSS-6 remission but not in PANSS-8 remission (mean PANSS-30: 69.4) had significantly (p < 0.0001) higher PANSS-30 scores than patients in PANSS-8 remission (mean PANSS-30: 56.9) but significantly (p < 0.0001) lower PANSS-30 scores than patients who were not in remission according to either set of criteria (mean PANSS-30: 88.2).
Comparisons of PPV and NPV for PANSS-30 and PANSS-6
Figure 1a-h details the PPV and NPV of early response (≥20% decrease in PANSS-6/PANSS-30 at week 1) as it pertains to ultimate response (≥40% decrease in PANSS-6/PANSS-30 at the last available observation). While the NPV was higher for placebo-treated patients (82.7-84.8%; Fig. 1a-d) than for trial participants receiving active treatment (69.3-71.4%; Fig. 1e-h), the PPV was higher for actively treated participants (54.3-60.6%; Fig. 1a-d) than for placebo-treated patients (42.0-46.2%; Fig. 1e-h). There were no major differences in PPV or NPV either across or within PANSS scales, i.e., early improvement on PANSS-6 predicted subsequent response on both the PANSS-6 and PANSS-30 with comparable accuracy, and early improvement on the PANSS-30 likewise predicted subsequent response on both the PANSS-6 and PANSS-30 with comparable accuracy.
Table 4 contains the results from the post hoc analysis of effect sizes for individual PANSS items. Most effect sizes (26/30) were in the range of 0.20-0.40. The lowest effect size was observed for item G7 Motor retardation (ES 0.11), and the highest effect sizes were seen for the two positive symptoms P2 Conceptual disorganization and P6 Suspiciousness/persecution (ES 0.40 for both items). The impact of adding a specific item to the PANSS-6 largely mirrored effect sizes for the individual items. The largest improvement in subscale effect size (4.6%) was seen for item P7 Hostility, and the largest decline (−4.1%) was seen for item G7 Motor retardation.

DISCUSSION
The main finding of this study is that the PANSS-6 and the PANSS-30 have comparable sensitivity to antipsychotic efficacy across a range of putative effect moderators. This finding was evidenced by a negligible difference in mean effect sizes when all 18 included trials were analyzed individually; likewise, subgroup analyses showed no effect size differences exceeding 0.03, with the exception of the old age (≥65) subgroup where PANSS-6 showed an ES 0.09 units larger than PANSS-30. Similarly, agreement between cross-sectional symptom remission as defined by PANSS-6 and by the eight-item definition suggested by the Remission in Schizophrenia Working Group 2 was very high. Moreover, with regard to prediction of subsequent response via early improvement, the PPV and NPV were comparable between the PANSS-6 and the PANSS-30, both within and across outcome scales. That the PANSS-6 is as sensitive as the PANSS-30 with regard to the efficacy of antipsychotics is in line with results from prior studies on both the efficacy 7 and effectiveness of antipsychotics in the treatment of schizophrenia 9,10 .
While, based on these and previous results, PANSS-6 seems to be an adequately sensitive instrument for tracking core schizophrenia severity, it should be emphasized that PANSS-6 ratings might need to be accompanied by ratings on measures of other constructs that are relevant in relation to the care of individuals with schizophrenia, e.g., depression, anxiety, cognition, agitation/aggression, medication side effects, level of functioning and quality of life [18][19][20][21][22] , and that such considerations will depend on the research questions and areas of individual need being addressed.
With regard to prediction of subsequent response via early improvement, the PPV and NPV of the PANSS-6 were comparable to those of the PANSS-30, both with respect to longitudinal prediction based on the same measure and in comparisons across time and between the PANSS-6 and PANSS-30 (Fig. 1). These results replicate previous findings showing that early improvement on the PANSS-30, as well as on other schizophrenia rating scales [14][15][16][17] , is a strong predictor of subsequent response. The fact that this relationship also holds for the PANSS-6 is of obvious importance if the PANSS-6, which takes less time to complete than the PANSS-30, is to be used to reduce the contact time in clinical trials, which may contribute to an observed inflated placebo response 23 , or to inform personalized treatment in clinical care settings.
The high rate of agreement (100% sensitivity, 98.0% specificity) between cross-sectional symptom remission as defined by the PANSS-8 and PANSS-6 is partly by design, since all patients meeting the PANSS-8 criteria also qualify for remission according to the PANSS-6, thus yielding perfect sensitivity. Patients who had remitted according to PANSS-6, but not PANSS-8, had significantly higher PANSS-30 symptom scores than PANSS-8 remitters (69.4 vs 56.9) but significantly lower symptom scores than patients who were not in remission according to either definition (69.4 vs 88.2). This finding suggests that the small fraction of additional PANSS-6 remitters may differ from the larger group of PANSS-8 remitters.
Most individual PANSS items (26 out of 30) had effect sizes in the range of 0.20-0.40. As expected due to the predominant efficacy of currently available antipsychotics for positive rather than negative symptoms 24,25 , positive symptoms were, on average, those that separated most clearly from placebo (Table 4). This contrasts with similar analyses of patients with major depression, where much larger disparities in individual-item effect sizes have been reported for the Hamilton Depression Rating Scale 26 . Notably, the two PANSS items with the lowest effect sizes were G7 Motor retardation (ES 0.11) and G1 Somatic concern (ES 0.15), which could reflect that specific side effects of antipsychotics worked against a general improvement in the underlying condition 27,28 , as has been suggested for depression, i.e., that some side effects (e.g., sedation, dystonia, arthralgia, nausea) are mistaken for psychopathology as measured by G7 Motor retardation and G1 Somatic concern 29-31 . Another factor that may contribute to the low drug-placebo separation for these items is the comparatively low baseline scores (2.30 for G7 and 2.54 for G1) in combination with the fact that these symptoms also showed some improvement on placebo (endpoint scores on placebo: 2.10 and 2.35, respectively). Taken together, this leaves little room for a true drug effect to be detected.
This study has a number of limitations. First, although we included a large number of trials and participants, only a few trials included certain subgroups, e.g., trials of schizoaffective disorder and trials focusing on adolescents or older adults. The power to detect differences in sensitivity between the PANSS-6 and the PANSS-30 in these subgroups was hence insufficient, and the results should be interpreted with caution. Second, the results stem from data obtained through randomized clinical trials, and it remains to be investigated to what degree these results will generalize to clinical settings. Third, PANSS-6 ratings were derived from the full PANSS-30 ratings; however, PANSS-6 ratings obtained using the SNAPSI may not correspond to those observed when conducting the full SCI-PANSS. Ideally, one should compare data from different raters independently scoring the PANSS-6 and PANSS-30 in the same patients, at the same time. In lieu of such data, analyses extracting PANSS-6 scores from full PANSS-30 ratings are the best alternative. Notably, we recently conducted a study comparing dedicated PANSS-6 assessments via the SNAPSI and PANSS-6 assessments as part of the full PANSS-30 ratings obtained by independent raters and found good agreement across the two methods of obtaining PANSS-6 ratings 11 . Finally, from an implementation perspective, it should be noted that, while the SNAPSI is freely available for non-commercial clinical and research purposes (https://www.medavante-prophase.com/welcome-to-snapsi/), use of the PANSS-30 and its subscales (including the PANSS-6) requires a license agreement with the copyright holder (Multi-Health Systems) and is associated with a fee.
To summarize, in this large-scale patient-level analysis, the sensitivity of the PANSS-6 to antipsychotic efficacy was comparable to that of the PANSS-30 across a range of different tests and putative effect moderators. These findings add to a growing literature indicating that the PANSS-6 can be used to adequately monitor the severity of core schizophrenia symptomatology over time [7][8][9][10][11][12][13] . Given its brevity, the PANSS-6 may facilitate the implementation of objective tracking of core schizophrenia severity in the clinic and contribute to making future clinical trials of treatments for schizophrenia less costly and resource intensive.

Data acquisition
We requested patient-level data for all industry-sponsored, acute-phase, placebo-controlled trials of risperidone and paliperidone via the Yale Open Data Access (YODA) project 32 . Remote access to patient-level data was provided by Johnson & Johnson and YODA for all 19 requested studies. One study (RIS-USA-1/Study 201) used the Brief Psychiatric Rating Scale (BPRS) instead of the PANSS and could hence not be included in the analyses. In order to verify the accuracy of the data, we compared our results to those from study reports provided by YODA and with those available in public reports from the United States Food and Drug Administration 33-39 , the European Medicines Agency 40 , and ClinicalTrials.gov [41][42][43] .

Analyses and statistics
Individual antipsychotics and doses were first analyzed by trial using analysis of covariance (ANCOVA). Analyses were performed on the intention-to-treat population using last observation carried forward methodology up until the last scheduled evaluation for each trial. The efficacy of all included treatment arms was assessed using both the PANSS-30 and PANSS-6. The models included a fixed factor for treatment and baseline score on the corresponding outcome measure as a covariate. Effect size differences between the PANSS-6 and PANSS-30 were assessed using a paired-samples t-test, and the rate at which effect sizes favored either outcome measure over the other was assessed using the one-sample chi-squared test with both outcomes expected to occur with equal frequency. In order to not include placebo-treated participants more than once (i.e., since a trial may have had several active treatment arms), the two latter analyses were conducted with all patients receiving active treatment analyzed together in each trial. We then pooled all available studies and conducted analyses stratified by putative effect moderators. The model specifications for the pooled analyses were analogous to those used for the analyses of individual studies with the addition of a fixed factor for study (with one exception, see below). The assessed effect moderators were earlier endpoints (weeks 2, 4, and 6), drug formulation (LAI or PO), diagnosis (schizophrenia vs. schizoaffective disorder), baseline severity (PANSS-30/PANSS-6 at or below median vs. above median), dose, and age group (adolescents, adults, older adults). For the dose analyses, we included all trials investigating at least two different doses of one active treatment and included the arm with the lowest given dose in the 'low-dose' group and the arm with the highest given dose in the 'high-dose' group. For the age group analyses, we excluded the study factor since some studies included very few patients belonging to a specific age group.
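As an illustration of the per-trial model, the following is a minimal sketch of an ANCOVA with one treatment factor and the baseline score as covariate, written in closed form for two arms. The text does not state how the effect size was standardized, so the definition used here (baseline-adjusted mean difference divided by the model's residual standard deviation) is an assumption, as are the function name and the toy data; the original analyses were run in R.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def ancova_effect_size(placebo, active):
    """Two-arm ANCOVA with a common baseline slope.

    Each argument is a list of (baseline, endpoint) score pairs for one arm.
    Returns (adjusted_difference, effect_size), both positive when the active
    arm has the lower baseline-adjusted endpoint score.
    ASSUMPTION: effect size = adjusted difference / residual SD.
    """
    sxx = sxy = syy = 0.0
    arm_means = []
    for arm in (placebo, active):
        x_bar = _mean([baseline for baseline, _ in arm])
        y_bar = _mean([endpoint for _, endpoint in arm])
        arm_means.append((x_bar, y_bar))
        # Accumulate within-arm sums of squares and cross-products.
        for baseline, endpoint in arm:
            sxx += (baseline - x_bar) ** 2
            sxy += (baseline - x_bar) * (endpoint - y_bar)
            syy += (endpoint - y_bar) ** 2
    slope = sxy / sxx  # pooled within-arm regression slope on baseline
    (x_p, y_p), (x_a, y_a) = arm_means
    adjusted_diff = (y_p - y_a) - slope * (x_p - x_a)
    n_total = len(placebo) + len(active)
    # Residual df: n minus intercept, treatment effect, and slope.
    residual_sd = math.sqrt((syy - slope * sxy) / (n_total - 3))
    return adjusted_diff, adjusted_diff / residual_sd


# Hypothetical toy data, not taken from any of the included trials.
placebo_arm = [(80, 70), (90, 66), (100, 74), (110, 94)]
active_arm = [(85, 64), (95, 60), (105, 68), (115, 88)]
adjusted_diff, effect_size = ancova_effect_size(placebo_arm, active_arm)
print(f"adjusted difference = {adjusted_diff:.2f}, ES = {effect_size:.2f}")
```

The same function could be applied to PANSS-30 totals and to PANSS-6 totals from the same patients to obtain the paired effect sizes compared in the study-level analysis. The pooled models would additionally need a study factor, which this two-arm sketch does not include.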
We then assessed endpoint symptom remission in the pooled population (i.e., a cross-sectional definition of remission not requiring the 6-month time criterion) 2 . We did so by contrasting PANSS-6-defined remission (defined as a score of ≤3 on all PANSS-6 items, range 1-7) with the eight-item definition (defined as a score of ≤3 on all PANSS-6 items as well as on G5 Mannerisms and posturing and G9 Unusual thought content, PANSS-8) suggested by the Remission in Schizophrenia Working Group 2 . Due to the overlap between the criteria, all patients in remission according to the PANSS-8 criteria were by definition also in PANSS-6-defined remission. We thus focused on those additional patients found to be in remission only according to the PANSS-6. We detailed their scores on the two additional items in the PANSS-8 and assessed (via independent samples t-tests) whether their PANSS-30 scores were different from those of patients who were in symptom remission according to the PANSS-8 definition, and from those of patients who were not in remission according to either definition, respectively 2 .
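The two cross-sectional remission definitions and their agreement can be sketched as follows. This is an illustrative sketch only: the item keys, function names, and the four example patients are hypothetical, and the PANSS-8 definition is treated as the reference standard, as in the text.

```python
PANSS6_ITEMS = ("P1", "P2", "P3", "N1", "N4", "N6")
PANSS8_ITEMS = PANSS6_ITEMS + ("G5", "G9")
REMISSION_MAX = 3  # items are rated 1-7; remission requires <= 3 on every item


def in_remission(ratings, items):
    """True if every listed item is rated at or below the remission cut-off."""
    return all(ratings[item] <= REMISSION_MAX for item in items)


def agreement(patients):
    """Sensitivity and specificity of PANSS-6 remission, PANSS-8 as reference."""
    tp = fp = tn = fn = 0
    for ratings in patients:
        by_panss6 = in_remission(ratings, PANSS6_ITEMS)
        by_panss8 = in_remission(ratings, PANSS8_ITEMS)
        if by_panss8 and by_panss6:
            tp += 1
        elif by_panss8:
            fn += 1  # cannot occur: PANSS-8 remission implies PANSS-6 remission
        elif by_panss6:
            fp += 1  # remitted on PANSS-6 only (elevated G5 and/or G9)
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def make_patient(**overrides):
    """Hypothetical patient rated 2 on all eight items unless overridden."""
    ratings = {item: 2 for item in PANSS8_ITEMS}
    ratings.update(overrides)
    return ratings


example_patients = [
    make_patient(),            # remitted on both definitions
    make_patient(G9=4),        # remitted on PANSS-6 only
    make_patient(P1=5),        # not remitted on either
    make_patient(N4=4, G5=4),  # not remitted on either
]
sensitivity, specificity = agreement(example_patients)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Because the PANSS-6 items are a subset of the PANSS-8 items, the false-negative branch is unreachable and sensitivity is 100% by construction; only specificity carries information.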
We then assessed the PPV and NPV of early improvement (defined as a ≥20% reduction in PANSS-6 or PANSS-30 at the week 1 evaluation) on subsequent response (defined as a ≥40% reduction in PANSS-6 or PANSS-30 at the endpoint evaluation). These assessments were performed both within scales (e.g., early improvement in PANSS-6 predicting PANSS-6 response) and across scales (e.g., early improvement in PANSS-30 predicting PANSS-6 response). Analyses were stratified by treatment (placebo or active treatment). For patients to be included in the analyses, they needed to have an evaluation during week 1 and at least one subsequent evaluation. The last available scheduled evaluation was used as the endpoint evaluation. In order for percentage differences to make intuitive sense, PANSS item scores were rescaled from the 1-7 range to the 0-6 range (i.e., so that a patient with no PANSS-30-measured symptoms would score 0 rather than 30) for this analysis.
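The rescaling and the predictive values can be made concrete with a short sketch. The function names and the five example patients are hypothetical; the thresholds (≥20% at week 1, ≥40% at endpoint) and the subtraction of the scale minimum follow the description above.

```python
EARLY_THRESHOLD = 20.0     # percent reduction at week 1
RESPONSE_THRESHOLD = 40.0  # percent reduction at endpoint


def percent_reduction(baseline_total, followup_total, n_items):
    """Percent reduction after shifting each item from 1-7 to 0-6.

    Subtracting n_items from a total score is equivalent to subtracting 1
    from every item, so a symptom-free patient scores 0.
    """
    rescaled_baseline = baseline_total - n_items
    if rescaled_baseline <= 0:
        raise ValueError("baseline must exceed the scale minimum")
    return 100.0 * (baseline_total - followup_total) / rescaled_baseline


def predictive_values(patients, n_items):
    """PPV and NPV of early improvement for response at endpoint.

    Each patient is a (baseline, week1, endpoint) tuple of total scores.
    """
    tp = fp = tn = fn = 0
    for baseline, week1, endpoint in patients:
        early = percent_reduction(baseline, week1, n_items) >= EARLY_THRESHOLD
        responder = (
            percent_reduction(baseline, endpoint, n_items) >= RESPONSE_THRESHOLD
        )
        if early and responder:
            tp += 1
        elif early:
            fp += 1
        elif responder:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical PANSS-6 totals (six items, so the scale minimum is 6).
example_scores = [
    (24, 20, 12),  # early improver, responder
    (24, 19, 20),  # early improver, non-responder
    (30, 29, 28),  # no early improvement, non-responder
    (30, 28, 18),  # no early improvement, responder
    (27, 26, 25),  # no early improvement, non-responder
]
ppv, npv = predictive_values(example_scores, n_items=6)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Without the rescaling, a drop from 24 to 12 on the PANSS-6 would count as a 50% reduction; with it, the same change is about 67%, since 12 of the 18 points above the scale minimum have been removed. Cross-scale prediction would use week 1 scores from one scale and endpoint scores from the other.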
Finally, based on the observation that the pooled effect sizes obtained with the PANSS-6 and PANSS-30 were almost identical, but slightly higher for the PANSS-30, we conducted the following post hoc analyses. First, we calculated effect sizes for all individual PANSS-30 items. Subsequently, we analyzed how the drug-placebo sensitivity of the PANSS-6 would be impacted by including each of the 24 PANSS-30 items not included in the PANSS-6 ("add-one-in" analysis). The models used for these analyses were identical to those described above but with the outcome parameter being the item in question or that item plus PANSS-6, with the baseline score on the respective outcome parameter being included as a covariate.
All analyses were conducted using R version 3.6.1. Two-sided p values <0.05 were considered statistically significant. Due to substantial overlap between outcomes (the PANSS-6 items are nested within the PANSS-30), populations (individual trials are nested within the pooled population), and subgroups (e.g., participants with low scores on the PANSS-6 also tend to have low scores on the PANSS-30), correction for multiple testing was not performed.

Ethics
The data used for this study consist of de-identified patient-level data from previously conducted clinical trials. Secondary analyses of de-identified data do not fall under the purview of ethical committees in the jurisdiction where the research was carried out.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
The data used in this article can be requested from the Yale Open Data Access website 32 .