INTRODUCTION

Diffuse cognitive deficits are present in patients with schizophrenia throughout the course of illness (Goldberg and Gold, 1995; Saykin et al, 1991; Sharma and Harvey, 2001). Cognitive impairment has been well documented in multiple domains, including executive functions such as attention, abstraction, and mental flexibility, as well as learning and memory. The deficits are evident in first-episode neuroleptic naïve (NN) patients, before therapeutic intervention (Saykin et al, 1994), and persist despite symptomatic improvement with conventional antipsychotic treatment (Cassens et al, 1990; Censits et al, 1997). The recognition that performance on neurocognitive measures is related to functional outcome (Bellack et al, 1999; Green, 1996; Harvey et al, 1998) has prompted efforts to assess whether newer antipsychotics may ameliorate cognition in addition to symptoms.

Earlier studies with ‘atypical’ neuroleptics, reviewed by Keefe et al (1999), have noted improvement in some cognitive domains (Buchanan et al, 1994; Goldberg et al, 1993; Green et al, 1997; Hagger et al, 1993; Hoff et al, 1996). However, Harvey and Keefe (2001) suggested that these efforts were methodologically limited. The number of patients was small, baseline pharmacological status was wide-ranging, and most interventions were open-label with varied doses of medications and evaluated changes over only 6–12 weeks. The use of repeated testing requires control over practice effects, which need to be determined by repeated administration to the comparison sample. Furthermore, the clinical significance of observed improvement on the neurocognitive measures should be evaluated. For example, studies reporting greater neurocognitive improvement in patients treated with newer, compared to conventional, antipsychotics have applied relatively high doses of typical agents. Lower doses of typical medication have shown similar clinical efficacy to atypical agents (Geddes et al, 2000). A longitudinal 2-year comparative study between risperidone and haloperidol, administered at a low dose, did not show that risperidone is associated with enhanced cognition (Green et al, 2002). Purdon et al (2000) reported cognitive improvement in olanzapine-treated patients, relative to risperidone and haloperidol. The improvement in the olanzapine group was most notable in memory and visual organization skills.

The rigorous study of treatment effects on neurocognition is challenged by the need for establishing a broad neurocognitive profile in order to identify specific domains that show change worthy of pursuing in a large-scale double-blind study. Traditional neuropsychological batteries are lengthy and require professional administration and scoring. We have developed a computerized neurocognitive ‘scan’ that has been applied to healthy participants (Gur et al, 2001a) and to patients with schizophrenia (Gur et al, 2001b). The computerized testing provides standard unbiased administration, automated scoring, and error-free data entry, and demonstrates a profile similar to the traditional paper and pencil measures (Saykin et al, 1991, 1994). The computerized procedures have not been applied in conjunction with treatment, and their longitudinal association with clinical status is unknown. The main purpose of the present study was to determine whether treatment-associated changes could be detected for specific neurocognitive domains.

An issue that has not been settled in previous studies is the extent to which neurocognitive improvement relates to symptomatic amelioration. As a first step, we selected an open-label design that permits optimal therapeutic response. We followed a prospective sample of olanzapine-treated patients with schizophrenia, using independently obtained clinical assessment and neurocognitive measures. Patients were studied at informative stages of therapeutic intervention: baseline, prior to initiation of treatment; following 6 weeks of treatment, when positive symptoms are likely to be affected; and after 6 months of treatment, to examine long-term effects. The concomitant clinical assessment and monitoring of treatment and course enabled the examination of the relation between symptom amelioration and changes in functioning and neurocognition. The healthy participants established the normative pattern for the neurocognitive measures, controlling for learning and practice effects. We tested the hypothesis that treatment is associated with improved symptoms, cognition, and functioning and that some domains of cognitive improvement are associated with clinical response.

METHODS

Subjects

The initial sample enrolled in the study included 19 outpatients with schizophrenia, 11 men and eight women, and 34 healthy participants, 17 men and 17 women, from the Schizophrenia Research Center at the University of Pennsylvania. Healthy participants were balanced to patients sociodemographically with respect to age (range 18–45) and parental education. The groups did not differ (mean±SD) in age (patients 30.5±9.1; controls 28.5±7.0) or parental education (patients 14.3±3.9; controls 14.8±2.8), but, as expected, patients attained lower education (12.4±2.7) than controls (15.2±2.2), t=4.13, df=51, p<0.001.

All research participants at the Schizophrenia Research Center undergo standardized rigorous intake and assessment procedures. These consist of medical, neurological and psychiatric evaluations, and laboratory tests. The psychiatric evaluation includes clinical assessment, a structured interview (SCID-P, First et al, 1996), and history obtained from family, care providers, and records (Gur et al, 1991). All patients, except one with schizoaffective depressed-type illness, had a DSM-IV diagnosis of schizophrenia established in a consensus conference based on all available information. None had a history of any other disorder or event that might affect brain function, including hypertension, metabolic disorders, any neurological disorder or event, and history of substance abuse. Age of onset of psychotic symptoms of sufficient severity to result in functional decline was established on the basis of converging sources of information and averaged 25.3±8.0, while the duration of illness was 5.2±5.7 (range 0.5–19) years. Most patients in our center are treated with a new generation antipsychotic agent. Consecutive patients for whom treatment with olanzapine was recommended by their primary psychiatrist on a clinical basis were referred for participation in the study. At entry to the study, 11 patients were first-episode NN, three had been treated with a typical agent (haloperidol), two with atypical agents (risperidone and quetiapine), and three with typical and more recently with atypical agents. For previously treated (PT) patients, residual symptoms, intolerance of side effects, or concern for long-term side effects underlay the decision to start olanzapine.

Healthy participants underwent the same evaluation as patients and the SCID-NP and SCID-II were applied (First et al, 1995, 1997). Those with a first-degree relative with a history of schizophrenia or affective illness were excluded.

Procedures

Following the initial intake evaluation, a complete description of the study was provided and written informed consent was obtained before participation. For patients, medications were discontinued in the most clinically appropriate manner, as determined by the treating physician. For PT patients, after a 48-h washout period off antipsychotic medications, baseline assessments were performed and treatment with olanzapine was initiated. Patients were followed as clinically indicated with flexible dose intervention. Olanzapine daily dose averaged 13.0±6.4 mg (range 5–30 mg). Concomitant medications included clonazepam for two patients. Compliance was assessed by pill counts, and patient and family education was provided as part of the activities offered to all participants in the center.

Subjects underwent a baseline evaluation that was repeated at 6 weeks and at 6 months. The evaluation included clinical assessment, functional status, and neurocognitive measures. Of the initial sample of healthy controls, 24 returned for at least one of the follow-up sessions (20 to the 6-week follow-up, and partially overlapping 21 to the 6-month follow-up) and were included in the analysis of change scores. Reasons for missing sessions included scheduling, moving, and lack of interest. Three patients also did not complete neurocognitive measures longitudinally due to change of location or lack of interest. Participants who did not complete follow-up did not differ on baseline measures from those who completed the study.

The clinical examination included assessment of symptoms and outcome, performed by trained reliable (ICC>0.90) investigators (Gur et al, 1991). Symptom ratings included the Brief Psychiatric Rating Scale (BPRS; Overall and Gorham, 1980) and the Scales for Assessment of Negative Symptoms (SANS; Andreasen, 1984a) and Positive Symptoms (SAPS; Andreasen, 1984b). Patients had no extrapyramidal signs (Simpson and Angus, 1970) or tardive dyskinesia (Simpson et al, 1979). The Strauss–Carpenter Outcome Scale (Strauss and Carpenter, 1972) was applied to evaluate outcome functioning in the social, vocational, personal, and general domains. The total score for the Quality of Life Scale (Henrichs et al, 1984) was also evaluated.

The computerized neurocognitive scan was administered following the clinical assessment. Its development, administration, and validation procedures were detailed (Gur et al, 2001a) and its application to schizophrenia was described (Gur et al, 2001b). Briefly, the scan provides measures of accuracy, speed, and efficiency for eight neurocognitive domains: Abstraction and Mental Flexibility (ABF), Attention (ATT), Verbal Memory (VME), Face Memory (FME), Spatial Memory (SME), Language processing (LAN), Spatial processing (SPA), and Sensorimotor (SM).

Data Analysis

The clinical ratings on the BPRS, SANS, SAPS, and the outcome measures were the dependent variables used to assess treatment effects. A one-way Mixed model, with time (baseline, 6 weeks, 6 months) as a within-group (repeated-measures) factor, was used to test the hypothesis that olanzapine treatment is associated with symptomatic relief and functional improvement, and for clinical subscales it was extended to a two-way Mixed model, with subscale as another within-group measure. Significant main effects for time or time × subscale interactions legitimized testing for improvement using paired t-tests without correction for multiple comparisons. Improvement was defined as the average severity on the second and third assessments subtracted from severity on the first assessment, so that higher scores would indicate more improvement. The neurocognitive measures within each domain were transformed to their standard equivalents (z-scores), using means and standard deviations from the baseline session of the healthy participants. The scores were calculated for accuracy, speed, and efficiency (accuracy/logspeed). The z-scores for speed were reversed in sign to make them consistent with higher values reflecting better (faster) performance. The domain scores were the dependent measures in a Mixed model with one grouping factor of diagnosis (schizophrenia, control) and two repeated-measures factors of time (baseline, 6 weeks, 6 months) and domain (eight neurocognitive domains). A significant diagnosis × time × domain interaction legitimized comparing improvement scores between patients and controls for each domain. This was performed with follow-up Mixed model analyses testing for a time × diagnosis interaction within each neurocognitive domain. To examine the relation between clinical and neurocognitive change, we computed the Spearman correlations between clinical and neurocognitive improvement scores. To contain Type I (experiment-wise) error, these correlations were examined only for the efficiency scores and the global SANS and SAPS measures.
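The scoring steps described above can be illustrated with a minimal sketch. It is not the study's actual code: the function names, the use of the sample standard deviation, the natural logarithm, response times in milliseconds, and all numbers are assumptions made for illustration only.

```python
import math
from statistics import mean, stdev


def z_scores(values, reference):
    """Standardize values against the mean and SD of a reference sample
    (here, the healthy participants' baseline session)."""
    ref_mean = mean(reference)
    ref_sd = stdev(reference)  # assumption: sample SD (n - 1)
    return [(v - ref_mean) / ref_sd for v in values]


def speed_z_scores(response_times, reference_times):
    """Reverse the sign so that higher z-scores mean faster (better) performance."""
    return [-z for z in z_scores(response_times, reference_times)]


def efficiency(accuracy, response_time_ms):
    """Efficiency as accuracy / log(speed); log base and units are assumptions."""
    return accuracy / math.log(response_time_ms)


def clinical_improvement(baseline, week6, month6):
    """Baseline severity minus the mean of the two follow-up severities.
    Positive values indicate improvement (lower severity at follow-up)."""
    return baseline - (week6 + month6) / 2.0


# Hypothetical values, not data from the study.
control_baseline_rt = [800.0, 900.0, 1000.0, 1100.0, 1200.0]
patient_rt = [1000.0, 1300.0]

patient_speed_z = speed_z_scores(patient_rt, control_baseline_rt)
print(patient_speed_z)                         # slower than the control mean gives a negative z
print(clinical_improvement(40.0, 30.0, 26.0))  # 12.0
```

Standardizing against the controls' baseline session places every domain on a common scale, so a patient z-score of −1 on any domain means one control standard deviation below the healthy baseline mean.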

RESULTS

Clinical

The clinical ratings are presented in Table 1. Treatment with olanzapine was associated with improvement in symptoms, as reflected in total BPRS, SANS, and SAPS, as well as specific subscales. The Mixed model analysis for total BPRS showed a main effect of time, F(2,19)=6.98, p=0.0053, indicating improvement. A time × subscale analysis for the SANS showed a main effect of time, F(2,167)=3.95, p=0.0212, reflecting overall improvement, and subscale, F(4,167)=10.29, p<0.0001. For SAPS, there were likewise main effects of time, F(2,130)=4.91, p=0.0088, and subscale, F(3,130)=13.07, p<0.0001. With regard to outcome measures, the level of functioning subscale analysis indicated no significant effect of time, F(2,122)=1.10, p=0.3378, but a significant main effect for subscale, F(3,122)=6.51, p=0.0004 and a time × subscale interaction, F(6,122)=3.93, p=0.0017. No significant improvement was documented for the quality of life scale. As can be seen in Figure 1, improvement scores were positive for the symptom scales and for the outcome measures.

Table 1 Clinical Ratings at Baseline, 6 Weeks and 6 Months
Figure 1

Means±SEM of improvement scores in symptom severity (left panel), for the BPRS, SANS, and SAPS, and for outcome subscales (right panel): clinical (CLIN), personal (PERS), social (SOC), and vocational (WORK) domains.

For improvement scores for the specific SANS and SAPS Global measures, significant amelioration was seen for the negative symptoms of Affect and Avolition and the positive symptoms of Hallucinations, Delusions, and Thought Disorder (Figure 2).

Figure 2

Means±SEM of clinical improvement scores on the SANS and the SAPS subscales. AFF=Affect; ALO=Alogia; AVO=Avolition; ANH=Anhedonia; ATT=Attention; HAL=Hallucinations; DEL=Delusions; BIZ=Bizarre behavior; THD=Thought disorder.

Neurocognitive

The Mixed model analysis on the efficiency scores showed significant main effect of diagnosis, F(1,788)=21.35, p<0.0001, controls outperforming patients, domain, F(7,788)=6.22, p<0.0001, indicating variability in performance across neurocognitive domains, and time, F(2,788)=5.06, p=0.0065, indicating overall improved efficiency with repeated measurements. There was a significant diagnosis × domain interaction, F(14,779)=4.28, p<0.0001, indicating differential impairment in patients for specific neurocognitive domains, and a diagnosis × time interaction, F(4,779)=2.66, p=0.0316, indicating that patients and controls differed in their improvement over time. Finally, there was a significant diagnosis × time × domain interaction, F(47,751)=3.27, p<0.0001, indicating different changes in patients and controls depending on the domain. Accuracy scores also showed significant main effects of diagnosis, F(1,836)=21.81, p<0.0001, domain, F(7,836)=5.37, p<0.0001, but not time, F(2,836)=2.11, p=0.1223. Thus, improved performance efficiency with repeated administration, when examined across the sample, was not reflected in better accuracy. As with the efficiency scores, the diagnosis × domain interaction was significant, F(14,827)=4.89, p<0.0001, but the diagnosis × time interaction was not, F(4,827)=1.13, p=0.3410. However, the three-way diagnosis × time × domain interaction was significant, F(47,799)=3.02, p<0.0001. For speed, there were main effects for diagnosis, F(1,835)=13.84, p<0.0002, domain, F(7,835)=6.48, p<0.0001, and time, F(2,835)=14.82, p<0.0001. The diagnosis × domain and diagnosis × time interactions were significant, F(14,826)=6.24, p<0.00001 and F(4,826)=8.81, p<0.0001, as was the three-way diagnosis × time × domain interaction, F(47,798)=3.41, p<0.0001.

Since the three-way interaction was significant for efficiency, accuracy, and speed, follow-up Mixed model analyses were performed on all dependent measures in each domain to test for a diagnosis × time interaction. For the efficiency measure, the diagnosis × time interaction was significant for all domains (ATT, F(5,63)=2.67, p=0.0298; VME, F(5,66)=5.03, p=0.0006; FME, F(5,63)=9.10, p<0.0001; SME, F(5,58)=4.62, p=0.0013; LAN, F(5,59)=4.42, p=0.0018; SPA, F(5,65)=3.79, p=0.0045; SM, F(5,23)=4.79, p=0.0038), except ABF, F(5,66)=1.64, p=0.1618. For accuracy, the interaction was significant for ABF, F(5,66)=2.72, p=0.0268; VME, F(5,66)=5.04, p=0.0006; FME, F(5,63)=6.50, p<0.0001; SME, F(5,58)=4.00, p=0.0035; LAN, F(5,59)=3.33, p=0.0103; SPA, F(5,66)=3.79, p=0.0044, but not for ATT, F(5,63)=2.21, p=0.0644, or SM, F(5,66)=1.73, p=0.1395. For speed, the interaction was significant for all domains (ABF, F(5,66)=5.29, p=0.0004; ATT, F(5,63)=2.95, p=0.0187; VME, F(5,66)=3.29, p=0.0104; FME, F(5,63)=9.51, p<0.0001; LAN, F(5,59)=2.65, p=0.0318; SPA, F(5,65)=5.88, p=0.0002; SM, F(5,66)=4.71, p=0.0010), except for SME, F(5,58)=1.62, p=0.1699.

Improvement measures were examined to clarify the time × diagnosis interaction, contrasting baseline to the first (6 weeks) and second (6 months) follow-up measures (Figure 3). As can be seen, there was little change in accuracy from the first to the second measurement (top left) in either group, with the exception of improved face memory in controls and comparable improvement in spatial memory for patients and controls. Patients also showed improvement in attention accuracy, while the improvement in controls was marginal. For speed, controls showed some improvement for all domains except attention, while patients improved in abstraction, face memory, spatial, and sensorimotor speed (bottom left). For the improvement scores comparing the third (6 months) to the first (baseline) session, a significant improvement in accuracy was seen in patients on both abstraction and flexibility and in spatial memory, which was more marked than that seen for controls (top right). Controls also showed improvement in face memory, not seen in patients. Both groups showed markedly increased variability in attention accuracy change. Improvement in speed was seen in patients, comparable to that of controls, with the exception of attention, verbal memory, and language, where patients showed no speed savings (bottom right).

Figure 3

Means±SEM of change in neurocognitive domain measures from baseline (time 1) to 6 weeks (time 2; left column) and 6 months (time 3; right column) for accuracy (top row) and speed (bottom row) in patients (SCH) and controls (CNT). ABF=abstraction and flexibility; ATT=attention; VME=verbal memory; FME=face memory; SME=spatial memory; LAN=language processing; SPA=spatial processing; SM=sensorimotor.

There were too few patients to examine formally whether the NN differed from the PT subgroup in improvement rates. Nonetheless, because of the potential importance of this factor, we compared the groups on average improvement for efficiency scores. The means were close, with the largest favoring the NN patients on FME (improvement of 0.276±1.331 for NN compared to worsening of −0.758±2.589 for PT patients) and the largest favoring PT patients on ABF (0.254±1.433 for NN and 0.885±0.860 for PT).

Correlation of Clinical and Neurocognitive Improvement

Correlations between neurocognitive and clinical improvement were nil for SAPS subscales, but significant for verbal and spatial memory and the spatial processing domain with SANS (r=0.67, 0.80, and 0.55, all p<0.01). Scatterplots (Figure 4) revealed that all patients who improved neurocognitively also showed clinical improvement, while all patients who did not improve clinically showed a decline in neurocognitive performance. On the other hand, some patients who improved clinically showed no improvement or even decline in the neurocognitive measures. Neither age of onset nor duration of illness correlated significantly with the change scores (correlations ranging from −0.34 to 0.43).

Figure 4

Scatterplots showing the association between clinical change on the SANS and neurocognitive change for verbal memory, spatial memory, and spatial processing. Positive values reflect improvement for both measures.
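For readers unfamiliar with the rank correlation used to relate clinical and neurocognitive improvement, the following is a minimal sketch of a Spearman coefficient computed as the Pearson correlation of average ranks. The function names and the improvement scores are hypothetical and are not taken from the study; a statistical package would normally be used instead.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical improvement scores for five patients (positive = improved).
sans_improvement = [0.5, -0.2, 1.0, 0.1, 0.8]
memory_improvement = [0.3, -0.5, 0.4, 0.6, 0.9]
print(spearman_rho(sans_improvement, memory_improvement))  # 0.5
```

Because the coefficient depends only on rank order, it is less sensitive than a product-moment correlation to the few extreme change scores that can occur in a small sample.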

DISCUSSION

Patients showed clinical improvement associated with olanzapine treatment, which is consistent with previous studies (Bilder et al, 2002; Sanger et al, 1999). While the improvement was more pronounced for positive than for negative symptoms, some amelioration was achieved for two of the negative symptoms, affective flattening and avolition. Other reports have suggested that atypical agents may improve negative symptoms, which are more resistant to typical neuroleptics (Chakos et al, 2001). A favorable response to intervention in first-episode patients has been documented (Emsley, 1999; Sanger et al, 1999) and over half of the current sample consisted of first-episode patients. The low level of baseline severity may explain the lack of improvement in bizarre behavior. The limited severity of these positive symptoms in this sample is representative of patients in our center, who are engaged in extensive procedures that require the ability to consent and fully cooperate.

Our patients at baseline showed cognitive impairment similar in magnitude and profile to that of patients reported in earlier studies with these tests (Gur et al, 2001b). Both patients and healthy participants showed changes associated with repeated testing. However, the pattern of change differed in a way that supports treatment effects for some domains. In the healthy participants, repeated testing produced improvement, predominantly in speed, from baseline to the second administration at 6 weeks. These savings in speed reached a plateau by the third administration at 6 months. Patients, in contrast, showed significant improvement on the third administration in the abstraction and spatial processing domains and for accuracy on spatial memory. Since these effects are apparent against practice effects observed in the healthy controls, they are likely to be treatment related. This finding is consistent with results reported by Purdon et al (2000), indicating specific effects of olanzapine treatment on memory and spatial functions. The improved memory and speed of processing performance in our patients is consistent with Bilder et al (2002), who examined the effects of several antipsychotics including olanzapine. However, our results permit greater specificity within the memory domain by providing measures of face and spatial memory. While spatial and face memory performance improved in patients, verbal memory did not. This is consistent with Bilder et al (2002), who did not find improvement in verbal memory in patients treated with olanzapine.

It is noteworthy that with the exception of the specific domains in which patients showed improvement exceeding that of controls, in other domains they failed to show even the modest practice effects evinced by the controls. This is consistent with evidence for learning deficits in schizophrenia, and poses a methodological issue for research on treatment effects related to neurocognitive measures. Specifically, studies in which patients show even modest cognitive improvement could be considered successful for specific domains because patients with schizophrenia show minimal, if any, benefit from practice alone.

The present design permitted correlation of clinical and neurocognitive changes, and this analysis revealed that improvement in spatial memory was quite strongly associated with clinical improvement in negative symptoms. The specificity of correlations between neurocognitive improvement and amelioration of negative symptoms has also been reported by Bilder et al (2002). In the present study, all patients who improved on spatial memory also improved clinically, while those who did not improve their performance showed minimal amelioration or even worsening of symptoms. The treatment-associated improvement in spatial memory could reflect improvement in spatial working memory, which has been an area of substantial deficit in schizophrenia (Park et al, 2003). However, spatial memory shows impairment even when working memory deficits are considered (Wood et al, 2002). Furthermore, spatial memory is among the cognitive domains related to emotion processing in schizophrenia and associated with symptom severity (Kohler et al, 2000). Somewhat lower correlations with negative symptoms were observed for verbal memory and for the spatial processing domains. The specificity of these correlations to negative symptoms further supports the potential contribution of atypical agents.

While treatment was associated with some improvement in the clinical subscale of the outcome measures, there was no treatment effect on quality of life. This lack of relationship was also reported by Bilder et al (2002) across treatment groups. Possibly, the lack of improvement in these important facets of schizophrenia, which has been related to cognition (Green, 1996), reflects the relatively short follow-up. Improvement in quality of life likely lags behind symptomatic relief. This may be particularly the case in first-episode patients. It is also possible that the scale is less sensitive to change. Establishing the time course of clinical, neurocognitive, and psychosocial changes following treatment intervention would provide useful data for testing models of the causal chain.

The present study has several limitations. Most importantly, it does not include a randomized double-blind design where olanzapine is compared to another agent. Therefore, the attribution of effects to olanzapine is tenuous and will require further investigation. Another potential limitation is the use of the same tests in the repeated measures, rather than alternate forms. This may have particularly influenced the memory measures. However, the use of the same tests does not explain the specific improvement observed in patients because controls had the same benefits of repeated exposure. Using the same tests has also permitted us to gauge the upper limit of practice effects without confounding them with exposure to novel stimuli. These practice effects turned out to be quite small for most measures, and in the healthy participants they reached their plateau by the third administration for most domains. Overall, practice effects were noticeable for speed rather than accuracy and, in the case of attention, at the expense of accuracy by the third session. The failure of controls to improve is unlikely to be a result of ceiling effects because the tests have been designed using psychometric standards for maximizing discriminability (Gur et al, 2001a). The inclusion of both speed and accuracy measures further limits the influence of ceiling effects. Indeed, controls showed improvement in speed even in domains where no improvement was observed in accuracy. Furthermore, on several domains both patients and controls showed improvement on the third administration. Finally, over half of the patient sample consisted of first-episode participants, where improvement is more likely. The generalizability of our results may therefore be limited to populations with more favorable outcome. Unfortunately, the sample was insufficiently powered to examine the difference between subgroups of patients formally. However, an examination of the means did not indicate consistently better improvement for either subgroup. As noted by Carpenter and Gold (2002), caution is required when interpreting treatment-associated improvement in some cognitive measures.

Notwithstanding its limitations, the present study suggests that treatment with olanzapine may result in both clinical and cognitive improvement, which are related. This finding encourages larger scale randomized studies that can be facilitated by the efficiency of computerized testing. The automated and errorless administration and scoring make such testing feasible in field studies across sites. The availability of both accuracy and speed measures can help in the interpretation of effects in efforts to relate neurocognitive changes to processing strategies.