Introduction

Schizophrenia shows a lifetime prevalence of 0.30–0.66% in the general population (McGrath et al, 2008), making it one of the leading contributors to the global disease burden (‘WHO, 2004 Global Burden of Disease—Update,’ n.d.). After more than a century of research into the neurobiology of the disorder, its pathophysiological underpinnings remain unknown. Over the past 15 years, considerable research efforts have elucidated a vast array of functional (Howes et al, 2012; Minzenberg et al, 2009) and structural brain abnormalities (Chan et al, 2009; Fornito et al, 2009; Honea et al, 2005) that may constitute the ‘organic surrogate’ of the illness. Although these results indicate significant group-level differences between healthy controls (HC) and patients in, eg, brain structure, a substantial overlap is usually observed between groups, which precludes the use of these differences for the individualized diagnosis of the disorder. Therefore, alterations in brain structure and function have so far not been successfully integrated into the diagnostic process as disease biomarkers operating on the single-subject level (Borgwardt et al, 2012; Borgwardt and Fusar-Poli, 2012; Kapur et al, 2012). The main reason for this gap between research and its potential diagnostic application lies in traditional univariate statistical approaches, which neglect the heavily interconnected nature of functional and structural brain data (Davatzikos, 2004).

To overcome these methodological drawbacks, an increasing number of studies have applied novel multivariate statistical approaches to the analysis of brain alterations in patients with schizophrenia (eg, Davatzikos et al, 2005; Fu and Costafreda, 2013; McIntosh and Lobaugh, 2004; Zarogianni et al, 2013). These results indicate that patterns of subtle structural and functional changes can be highly distinctive of schizophrenia-related brain alterations, even though each individual component within these patterns might not be. Most importantly, the classification performance of neuroimaging biomarkers based on multivariate statistical methods is typically assessed by using cross-validation strategies that allow the predictive models’ generalizability to unseen test individuals to be estimated. In this regard, the majority of studies using multivariate machine learning algorithms reported good generalization performance, which might open up the possibility of neuroimaging becoming part of the routine diagnostic process in the future. For instance, support-vector machines (Davatzikos et al, 2005), partial least squares analysis (Kawasaki et al, 2007; McIntosh and Lobaugh, 2004), random forests (Anderson et al, 2010; Greenstein et al, 2012) and artificial neural networks (Bose et al, 2008; Josin and Liddle, 2001; Rathi et al, 2010) have been shown to differentiate patients from HC with diagnostic accuracies of 60–100% using neuroimaging data.

However, these studies differ with respect to multiple aspects such as the demographic characteristics of the investigated populations, the clinical symptoms of the patient samples, the imaging modalities employed, the preprocessing of neuroimaging data prior to analysis, the statistical models, as well as the evaluation scheme of the models’ performance. As a result, the sensitivity and specificity of the reported predictive models differ widely, making it difficult to compare the classification performance of neuroimaging-based biomarkers across studies. Furthermore, little is known about which factors contribute to the success of MRI-based predictive modeling as authors may typically test a range of analysis pipelines and finally report only the analysis scheme achieving the highest test performance (Pers et al, 2009). Only a few studies have systematically compared two or more algorithms (Bose et al, 2008; Castellani et al, 2012; Rathi et al, 2010). However, a systematic investigation of different imaging modalities or multivariate methods is still missing. Finally, to the best of our knowledge, no comparative reports exist to date on the relationship between clinical variables of the tested samples and diagnostic accuracies of neuroimaging-based diagnostic models. Age, gender, psychiatric symptoms, or current medication represent potentially confounding variables, which might affect the diagnostic success of such models.

Thus, we conducted a meta-analysis of multivariate pattern recognition studies to evaluate the performance of neuroimaging phenotypes in distinguishing patients with schizophrenia from HC. Within this framework, we also assessed the potentially moderating impact of different clinical variables on these neurodiagnostic signatures.

Materials and Methods

Search and Selection Strategy

The entire electronic PubMed database was searched from 1 January 1950 up to 31 May 2013. Initially, studies were screened by using a comprehensive search term ((‘support vector’ OR ‘SVM’ OR ‘classification’ OR ‘categorization’) AND (‘MRI’ OR ‘fMRI’ OR ‘magnetic resonance’ OR ‘imaging’ OR ‘gray matter’ OR ‘grey matter’ OR ‘white matter’ OR ‘DTI’ OR ‘diffusion tensor imaging’ OR ‘PET’ OR ‘positron emission tomography’ OR ‘SPECT’ OR ‘single photon emission tomography’) AND (‘schizophrenia’ OR ‘psychosis’ OR ‘psychotic’ OR ‘schizophreniform’)). Subsequently, all studies were screened according to the following criteria. To be included in the meta-analysis, a paper needed to report results of a neuroimaging-based multivariate classification model separating patients with schizophrenia from HC. We included all available multivariate approaches such as support-vector machines, random forests, discriminant analysis, logistic regression, neural networks, as well as combinations thereof. Studies were included if the following measures of classification performance were available or if the data allowed them to be calculated: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Where insufficient data were reported, authors were contacted via email to provide additional information regarding their published reports. Furthermore, in multivariate classification it is of utmost importance to apply some form of cross-validation while estimating model parameters to avoid overfitting, which is associated with low generalizability. Thus, only studies that applied cross-validation (eg, leave-one-out, n-fold, and bootstrapping) were included in the analysis. In some cases, multiple studies were published based on the same sample or with large overlap between samples. We verified sample overlap by contacting the corresponding authors. In order to avoid bias, we excluded samples with large overlap (shared n>20%).
The results of the literature search are presented in a flow-chart following the PRISMA guidelines (Moher et al, 2009) (see Supplementary Figure 1).

Data Extraction

The main outcome measure was the diagnostic test performance of the different multivariate approaches for separating patients with schizophrenia from HC as measured by sensitivity (=TP/(TP+FN)) and specificity (=TN/(TN+FP)). The following additional information was extracted from all studies: names of the authors; year of publication; population characteristics of HC and patient groups (group size, age, gender, antipsychotic use, diagnosis, and symptom ratings); type of neuroimaging data (magnetic resonance imaging ‘MRI’, functional MRI ‘fMRI’, resting-state fMRI ‘rsfMRI’, positron emission tomography ‘PET’, single photon emission computed tomography ‘SPECT’, diffusion tensor imaging ‘DTI’, scanner type, and resolution); characteristics of the employed preprocessing methodology; characteristics of the classification method (eg, linear discriminant analysis and support-vector machine); and characteristics of the cross-validation procedure. Data extraction was performed by two authors separately (LKI, JK) to ensure accuracy, and disagreements were discussed in a consensus conference. The QUADAS-2 guidelines were used to assess the study quality of all publications included in the present meta-analysis (see Supplementary Figure 2) (Whiting et al, 2011).
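
The two outcome measures can be illustrated with a short sketch. It is not part of the original analysis; the class name `ConfusionCounts`, the helper `counts_from_rates`, and the example counts are hypothetical and chosen only for illustration. The second helper shows one plausible way in which TP, FN, TN, and FP could be recovered when a report gives only sensitivity, specificity, and group sizes.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """2x2 classification counts for one study (hypothetical container)."""
    tp: int  # patients classified as patients
    fn: int  # patients classified as HC
    tn: int  # HC classified as HC
    fp: int  # HC classified as patients


def sensitivity(c: ConfusionCounts) -> float:
    # Sensitivity = TP / (TP + FN), as defined in the text.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Specificity = TN / (TN + FP), as defined in the text.
    return c.tn / (c.tn + c.fp)


def counts_from_rates(sens: float, spec: float,
                      n_patients: int, n_controls: int) -> ConfusionCounts:
    # Assumption: reported rates refer to the full groups, so counts are
    # recovered by rounding to the nearest whole subject.
    tp = round(sens * n_patients)
    tn = round(spec * n_controls)
    return ConfusionCounts(tp=tp, fn=n_patients - tp, tn=tn, fp=n_controls - tn)


# Invented example: 40 patients and 45 HC.
example = ConfusionCounts(tp=32, fn=8, tn=36, fp=9)
print(sensitivity(example), specificity(example))
```

Rounding to whole subjects is only one possible convention; where rates were computed on cross-validation folds of unequal size, the recovered counts may differ slightly from the true ones.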

Data Analysis

In studies of diagnostic test accuracy, sensitivity and specificity are often negatively correlated, and therefore pooling them separately in the context of a meta-analysis might lead to biased results (Gatsonis and Paliwal, 2006). Instead, a bivariate approach (Reitsma et al, 2005) and a strategy based on a hierarchical summary ROC model (HSROC; Rutter and Gatsonis, 2001) have been suggested to estimate sensitivity and specificity across studies. In most situations both approaches lead to identical results (Harbord et al, 2007). In the present analysis we implemented the strategy introduced by Reitsma et al (2005). In this bivariate approach, logit-transformed sensitivity and specificity are combined in one bivariate regression model while explicitly accounting for their correlation. It is assumed that sensitivity and specificity vary across studies because of differences in study populations, sampling errors, and differences in implicit thresholds applied to the data to separate patients from HC. Thus, a random-effects model is applied in order to account for between-study heterogeneity. As larger samples are associated with smaller sampling error and thus with more precise effect size estimates, the studies included in the meta-analysis are weighted according to their sample size. Meta-analysis results are presented in forest plots separately for sensitivity and specificity. Summary estimates for sensitivity and specificity are provided separately for MRI studies, for rsfMRI studies, and for all studies combined. We considered n=5 to be the minimum number of studies to justify a separate meta-analysis (Ioannidis and Lau, 2001). The robustness of the results as well as the effect of potentially confounding variables (eg, age, gender ratio, and year of publication) was investigated by adding moderator variables to the bivariate regression model.
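
As a rough illustration of the per-study quantities that enter a bivariate model of this kind, the sketch below computes logit-transformed sensitivity and specificity together with their approximate within-study variances. It does not fit the random-effects model itself (the analysis reported here used the R package mada for that). The function name `logit_pair` and the 0.5 continuity correction are assumptions made for this example, not details taken from the text.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit_pair(tp: float, fn: float, tn: float, fp: float,
               correction: float = 0.5) -> dict:
    """Logit sensitivity/specificity and their sampling variances.

    Assumption: `correction` is added to every cell so that studies with
    a zero cell (eg, 100% sensitivity) still yield finite logits.
    """
    tp, fn, tn, fp = (x + correction for x in (tp, fn, tn, fp))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "logit_sens": logit(sens),
        "logit_spec": logit(spec),
        # Large-sample variance of a logit proportion: 1/successes + 1/failures.
        "var_logit_sens": 1.0 / tp + 1.0 / fn,
        "var_logit_spec": 1.0 / tn + 1.0 / fp,
    }


# Invented study: 32 of 40 patients and 36 of 45 HC classified correctly.
stats = logit_pair(32, 8, 36, 9)
print(round(inv_logit(stats["logit_sens"]), 3), round(stats["var_logit_sens"], 3))
```

The variances shrink as the cell counts grow, which is how larger studies come to carry more weight in the pooled estimate.
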
In order to investigate potential publication bias in meta-analyses of diagnostic accuracies, it has been recommended to create funnel plots by plotting log diagnostic odds ratios (logDOR) for all studies against $\sqrt{1/n_1 + 1/n_2}$, with $n_1$ and $n_2$ representing the sample sizes of the patient and the HC group. This measure is proportional to the inverted square root of the effective sample size (ESS): $\mathrm{ESS} = 4 n_1 n_2 / (n_1 + n_2)$. In case of a publication bias the distribution of studies in the funnel plot is asymmetrical. A statistical test for funnel plot asymmetry is provided by a regression of logDOR on $1/\sqrt{\mathrm{ESS}}$ weighted by ESS (Deeks et al, 2005). All computations were performed using the R statistical programming language version 2.10.13 (R Core Team, 2013) with the package mada (Doebler, 2012).
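
The funnel-plot quantities can be sketched as follows. This is an illustrative reimplementation under stated assumptions rather than the code used for the analysis: the function names and the 0.5 continuity correction are hypothetical, and the sketch returns only the regression coefficients, leaving out the significance test on the slope.

```python
import math


def log_dor(tp, fn, tn, fp, correction=0.5):
    # Log diagnostic odds ratio; the continuity correction is an assumption.
    tp, fn, tn, fp = (x + correction for x in (tp, fn, tn, fp))
    return math.log((tp * tn) / (fp * fn))


def effective_sample_size(n1, n2):
    # ESS = 4 * n1 * n2 / (n1 + n2); equals n1 + n2 when groups are equal.
    return 4.0 * n1 * n2 / (n1 + n2)


def deeks_regression(studies, correction=0.5):
    """Weighted least squares of logDOR on 1/sqrt(ESS), weights = ESS.

    `studies` is a list of (tp, fn, tn, fp) tuples.
    Returns (intercept, slope); a slope far from zero suggests asymmetry.
    """
    xs, ys, ws = [], [], []
    for tp, fn, tn, fp in studies:
        ess = effective_sample_size(tp + fn, tn + fp)
        xs.append(1.0 / math.sqrt(ess))
        ys.append(log_dor(tp, fn, tn, fp, correction))
        ws.append(ess)
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Invented studies of increasing size with identical accuracy.
demo = [(16, 4, 16, 4), (32, 8, 32, 8), (64, 16, 64, 16)]
intercept, slope = deeks_regression(demo, correction=0)
print(round(intercept, 3), round(abs(slope), 3))
```

In the invented example all three studies have the same diagnostic odds ratio regardless of size, so the fitted slope is essentially zero, which is the pattern expected in the absence of small-study effects.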

Results

The initial literature search identified 399 studies of interest. After screening all studies and applying the inclusion criteria, 361 studies were excluded. Fan et al (2007) and Davatzikos et al (2005) used overlapping samples. Only Fan et al (2007) was included in the main analysis as it is the most recent report of this sample. For additional moderator analysis we included Davatzikos et al (2005), as additional data were provided. Between Liu et al (2012) and Shen et al (2010) there was an overlap of only 4 out of 32 subjects. This was considered a minor overlap and both samples were included in the analysis. Two studies (Fekete et al, 2013; Hu et al, 2013) were based on the same sample but were included in the meta-analysis as they computed predictive models based on fundamentally different features. In order to exclude the possibility that this affected the results of our meta-analysis, the effect of excluding each of these studies on overall sensitivity and specificity was investigated. The final sample consisted of n=38 studies with a total of n=1602 patients with schizophrenia (SZ) and n=1637 HC. Among the included studies were n=20 studies using structural MRI, n=11 studies using rsfMRI, n=4 studies using fMRI, n=3 studies using PET, and n=1 study using DTI to build predictive models (see Supplementary Table 1 for an overview of the characteristics of the included studies).

Across all studies, neuroimaging-based classifiers separated SZ from HC with a sensitivity of 80.3% (95% CI: 76.7–83.5%, see Figure 1) and a specificity of 80.3% (95% CI: 76.9–83.3%, see Figure 2). A summary ROC-curve of the included studies along with the estimated summary is presented in Figure 3. Visual inspection of funnel plots did not show evidence for a publication bias (see Supplementary Figure 3). Regression with year of publication did not show any effect on sensitivity (p=0.766) or specificity (p=0.801).

Figure 1

Forest plot of sensitivities for studies using MRI, fMRI, rsfMRI, rCBF-PET, F-DOPA-PET, and DTI to diagnose schizophrenia. Summary estimates for sensitivity are computed using the approach described by Reitsma et al (2005).

Figure 2

Forest plot of specificities for studies using MRI, fMRI, rsfMRI, rCBF-PET, F-DOPA-PET, and DTI to diagnose schizophrenia. Summary estimates for specificity are computed using the approach described by Reitsma et al (2005).

Figure 3

SROC curve of the Reitsma model, with the summary sensitivity and false-positive rate indicated in black and the sensitivity and false-positive rate of the individual studies color-coded by imaging modality.

No significant effects of sex, illness duration, PANSS positive scores, PANSS negative scores, or analysis methods (SVM/LDA) on sensitivity or specificity (all p>0.1) were observed (Table 1). We detected a significant effect of patients’ age (p=0.029) indicating higher sensitivity in older subjects (see Figure 4). There was no evidence for an effect of age on specificity (p=0.095) and no age effect in the HC on sensitivity (p=0.168) or specificity (p=0.380). We observed a significant effect of positive-to-negative symptom ratio on specificity (p=0.022), indicating higher specificity in patients with predominantly positive symptoms (see Figure 4). There was no effect of positive-to-negative symptom ratio on sensitivity (p=0.500). Comparing studies investigating first-episode patients (FEP) vs chronic patients (CSZ), we found a significantly higher sensitivity in CSZ (p=0.025, see Figure 4) but no such effect on specificity (p=0.202). A significant effect of antipsychotic medication (converted to chlorpromazine equivalents, CPZ-eq) on specificity (p=0.016) was found, indicating higher specificity in subjects treated with higher medication doses (see Figure 4). However, CPZ-eq did not significantly affect sensitivity (p=0.09).

Table 1 Results from Bivariate Meta-analyses Applying the Approach by Reitsma et al (2005)
Figure 4

Results from the moderator analysis: linear regression models with (a) chlorpromazine equivalents predicting specificity, (b) age of patients predicting sensitivity, and (c) PANSS ratio predicting specificity, and differences in sensitivity and specificity between (d) stages of illness and (e) imaging modalities.

When the structural MRI studies were analyzed separately, the meta-analysis showed a sensitivity of 76.4% (95% CI: 71.9–80.4%) and a specificity of 79.0% (95% CI: 74.6–82.8%). The rsfMRI studies had a sensitivity of 84.46% (95% CI: 79.9–88.2%) and a specificity of 76.9% (95% CI: 71.3–81.6%). After excluding Hu et al (2013) or Fekete et al (2013) there was no significant change in the sensitivity (84.7% with a 95% CI: 79.98–88.46% and 84.28% with a 95% CI: 79.6–88.04%, respectively) or specificity (77.87% with a 95% CI: 71.29–83.3% and 76.5% with a 95% CI: 70.84–81.35%, respectively). ‘Data source’ was added as a moderating variable to the bivariate meta-analysis model to test for differences between data sources (MRI, rsfMRI). There was a significant difference (p=0.010) between the sensitivity of rsfMRI and structural MRI studies, indicating higher sensitivity in rsfMRI studies (see Figure 4). There was no significant difference in specificity (see Figure 4). To investigate the potential effect of different multivariate approaches, the data set was restricted to studies that applied support-vector machines (SVM, n=12) or discriminant analysis (DA, n=13). The bivariate meta-analytic model showed no significant difference between DA and SVM studies regarding sensitivity (p=0.766) and specificity (p=0.801).

Discussion

We present a meta-analysis of n=38 studies with a total of n=1602 SZ patients and n=1637 HC. Our results suggest that neuroimaging phenotypes of schizophrenia separate patients from HC with an overall sensitivity and specificity of 80%. Similar results were obtained when the analysis was restricted to individual imaging modalities (structural MRI or rsfMRI). This finding was robust against the inclusion of potential confounding factors such as year of publication, and there was no evidence for a publication bias.

Effect of Age

Interestingly, older age was significantly associated with higher sensitivity. Although illness duration itself did not have a significant impact on sensitivity and specificity, there was a higher sensitivity in patients in a chronic stage of schizophrenia as compared with first-episode patients. These findings might result from more pronounced brain changes in older subjects with schizophrenia. In addition, this finding may be caused by secondary disease effects, which are not related to the underlying brain pathology, but are rather due to environmental factors associated with a more unfavorable illness course in this patient population. In keeping with this hypothesis, numerous studies reported progressive brain changes to be associated with short-term (Tost et al, 2010) and long-term (Navari and Dazzan, 2009) antipsychotic treatment. Thus, pronounced brain changes and higher sensitivity of neuroimaging-based diagnostic models in older patients might additionally result from long-standing antipsychotic treatment (Fusar-Poli et al, 2013; Ho et al, 2011; Smieskova et al, 2009). The investigation of antipsychotic treatment as a moderator in the present analysis indicated a potential effect of the current antipsychotic dose. However, while older age was associated with higher sensitivity, higher chlorpromazine equivalents were associated with higher specificity. To further disentangle possible effects of antipsychotic medication on diagnostic classification measures from the impact of the disease process itself, future meta-analyses have to cover a critical mass of patient samples having well-documented prospective medication data (Ho et al, 2011).

Effect of Psychotic Symptoms

Another interesting finding of the present analysis is the association between predominant positive symptoms and higher specificity of the neuroimaging-based diagnostic models. It has been reported that brain changes associated with schizophrenia are related to the extent of psychopathology as measured by psychotic symptom scales (Modinos et al, 2013; Palaniyappan et al, 2012). Similarly, there seem to be differences in brain alterations in patients with predominant positive vs predominant negative symptoms (Koutsouleris et al, 2008; Nenadic et al, 2010). The association with positive symptoms might seem counterintuitive as previous studies indicate larger brain structural abnormalities in patients with pronounced negative symptoms (Koutsouleris et al, 2008). However, it might be the case that the pattern of gray matter alterations in patients with mainly positive symptoms, even if it is subtle, is more distinctive than that in patients with negative symptoms and thus facilitates higher classification performance. It may be hypothesized that patients with predominant positive symptoms also received higher dosages of antipsychotic medication, which in turn may affect the brain as discussed above. Therefore, the finding that positive symptoms are associated with specificity might be confounded by previous treatment. Another potential interpretation of this association may relate to the current, purely symptom-based diagnostic system, which forms the ground truth for fully supervised neuroimaging-based disease classification. In this regard, greater agreement between clinical raters can be expected when they diagnose schizophrenia in patients with pronounced positive symptoms as compared to patients with negative symptoms, who are difficult to differentiate from patients with major or psychotic depression.
Thus, predominant negative symptoms might be associated with higher neurobiological variability compared with the phenotype of acute psychosis, creating an area of diagnostic ambiguity not only for clinical raters but also for any downstream supervised classification methods relying on these raters. In fact, this would create an upper bound on the sensitivity and specificity that could be achieved by means of supervised neuroimaging-based predictive models.

Effect of Neuroimaging Modality

Our results point to a significantly higher sensitivity associated with rsfMRI data as compared with structural MRI data, whereas both neuroimaging modalities showed a similar specificity. This suggests that more homogeneous functional resting-state patterns in schizophrenia lead to a tighter clustering of patients in the functional compared with the structural feature space, and hence to an increased capacity of rsfMRI-based classification algorithms to detect the disease condition. This in turn suggests that disease heterogeneity is greater in the neuroanatomical domain, as shown in a recent study (Zhang et al, 2014). Future studies may involve both structural and functional MRI data to generate diagnostic classifiers with superior sensitivity and specificity.

Differences between Multivariate Methods

We observed a substantial methodological heterogeneity concerning the multivariate algorithms used to build the predictive models. The most frequent approaches were discriminant analysis and support-vector machines, which were used by 26 out of 36 studies (72%). It is important to note that support-vector machines typically show higher classification performance when nonlinear relationships are present in the data. Also—unlike linear discriminant analysis—support-vector machines increase generalizability by emphasizing samples that are located close to the decision boundary in the feature space (Hastie et al, 2009). As both approaches showed almost identical sensitivity and specificity in our analysis, these differences seem not to have affected the classification performance. Three studies (Bose et al, 2008; Josin and Liddle, 2001; Rathi et al, 2010) applied an artificial neural network model to structural MRI and PET data with slightly higher sensitivity (86–100%) and a slightly higher specificity (85–100%). Two studies (Anderson et al, 2010; Greenstein et al, 2012) applied a random-forest approach to fMRI and MRI data. These studies report a slightly lower sensitivity (64 and 73%) and slightly lower specificity (83 and 74%) compared with other studies. However, it is noteworthy that the comparison of different classification methods in the context of the present meta-analysis might be confounded by the characteristics of the investigated samples such as age, medication, symptoms, and disease stage. To the best of our knowledge, a systematic investigation of different classification algorithms for the MRI-based diagnosis of schizophrenia in large representative patient populations is missing.

Limitations of the Presented Study

It is of note that 20% of patients were misclassified as HC by the applied multivariate models. This misclassification rate may be due to (1) the existence of a different pattern of brain abnormalities in this subgroup compared with the majority of patients, (2) the absence of a homogeneous discriminative pattern in this patient subgroup compared to the HC group, or (3) rater-based ‘noise’ in the diagnostic labels provided to supervised classification algorithms. The aggregated data analysis performed in our study does not allow us to distinguish between these alternative possibilities. Hence, future studies employing semi- and unsupervised machine learning methods in well-controlled representative study populations are needed to elucidate the neurobiological heterogeneity of the disorder and in turn use this information to generate high-performing neuroimaging-based classifiers of schizophrenia.

In this context, it needs to be noted that most of the published studies on neuroimaging-based diagnostic models largely focus on methodological details of the applied machine learning algorithms. This results from the fact that multivariate prediction of psychiatric diagnosis is a young research topic. Thus, most studies aim at ‘proof of concept’ approaches, showing that multivariate models are principally able to infer distinctive brain patterns at the single-subject level. Another reason might be the availability of numerous competing algorithms. Most studies so far have tried to compare new techniques to previous ones while paying little attention to the systematic investigation of methodological factors within the same sample.

On the other hand, most studies provide only limited information regarding the investigated patient samples and their clinical characteristics. As pointed out by Devillé et al (2002), a detailed description of the patients’ disease status, symptoms, length and course of illness, current medication, or comorbidities is crucial for evaluating the potential of such models to enter clinical practice in the future. The results of our meta-analysis fully agree with this requirement as they showed that clinical factors such as age or symptoms affect sensitivity and specificity, while methodological factors did not. As such, some patient samples might be more suitable for the application of neuroimaging-based predictive models than others. This also has implications for the interpretation of neuroimaging-based predictive models. There are multiple confounding factors that are illness-related, but not causative, that might result in neurobiological differentiation. Thus, to move from a theoretical field of research toward a clinical application of these diagnostic methods, future studies should provide detailed clinical and sociodemographic information about the investigated patient and HC samples. This clinical information is the ‘conditio sine qua non’ for evaluating the applicability of multivariate methods to various patient samples, subsamples or disease states.

It must be noted that the studies included in the present analysis identified schizophrenia-distinctive brain patterns as compared to healthy volunteers. To date, only a few studies have investigated patterns of brain abnormalities that differentiate between different psychiatric disorders. For the differentiation between patients with schizophrenia and patients with bipolar disorder, diagnostic accuracies of 92% for schizophrenia and 79% for bipolar disorder based on fMRI (Costafreda et al, 2011) and an overall classification accuracy of 88% (Schnack et al, 2014) or 100% (Bansal et al, 2012) based on sMRI have been reported. This research direction is critical as there is considerable doubt whether the current nosological constructs are subserved by distinct neurobiological signatures, or alternatively whether there exists a significant pathophysiological overlap between disease entities. A promising strategy to address this issue might be the delineation of more homogeneous patient subgroups within and across disease boundaries (Insel et al, 2010) by means of unsupervised and semisupervised analysis methods (Filipovych et al, 2011, 2012). Also, future studies need to address the question of how well neuroimaging-based biomarkers generalize, eg, across different sites. In the studies included in the present analysis, most of the data were acquired at a single site using the same scanners and scanning sequences. However, a recent study indicates that diagnostic models are not site specific but that similar sensitivity and specificity can be achieved for data acquired from different sites (Nieuwenhuis et al, 2012). Although the present analysis indicates a discriminative pattern of brain alterations associated with schizophrenia, our results underline the importance of an exhaustive assessment of clinical characteristics during the investigation of such biomarkers.

FUNDING AND DISCLOSURE

Eli Lilly has provided medication for a clinical trial led by SL as principal investigator. LKI has been supported by the EU-funded project Personalised Prognostic Tools for Early Psychosis Management ‘PRONIA’ (Grant Agreement Number 602152). The other authors declare no conflicts of interest in relation to this study.