Recognition of Schizophrenia with Regularized Support Vector Machine and Sequential Region of Interest Selection using Structural Magnetic Resonance Imaging

Structural brain abnormalities in schizophrenia have been well characterized with the application of univariate methods to magnetic resonance imaging (MRI) data. However, these traditional techniques lack sensitivity and predictive value at the individual level. Machine-learning approaches have emerged as potential diagnostic and prognostic tools. We used an anatomically and spatially regularized support vector machine (SVM) framework to categorize schizophrenia and healthy individuals based on whole-brain gray matter densities estimated using voxel-based morphometry from structural MRI scans. The regularized SVM model yielded recognition accuracy of 86.6% in the training set of 127 individuals and validation accuracy of 83.5% in an independent set of 85 individuals. A sequential region-of-interest (ROI) selection step was adopted for feature selection, improving recognition accuracy to 92.0% in the training set and 89.4% in the validation set. The combined model achieved 96.6% sensitivity and 74.1% specificity. Seven ROIs were identified as the optimal discriminatory subset: the occipital fusiform gyrus, middle frontal gyrus, pars opercularis of the inferior frontal gyrus, anterior superior temporal gyrus, superior frontal gyrus, left thalamus and left lateral ventricle. These findings demonstrate the utility of spatial and anatomical priors in SVM for neuroimaging analyses in conjunction with sequential ROI selection in the recognition of schizophrenia.


Results
Through application of the anatomically and spatially regularized SVM on the DARTEL-transformed structural MRIs for the full cerebrum, the training accuracy reached 86.6% in the classification of schizophrenia and healthy controls. The full cerebrum accuracy was found to be 83.5% in the validation model.
We found the 1.13 million entries of the coefficient vector ( Fig. 1) approximately followed a Gaussian distribution  − .
. ( 4 61, 8 69 ) 2 (Fig. 2). Each element in the coefficient vector indicates the significance of an individual voxel in MRI classification derived by the underlying SVM model with the spatial and anatomical regularization. The red and yellow colors indicate association with general increases and decreases in gray matter densities respectively in schizophrenia patients compared to controls. For the positive coefficients, a higher intensity of the respective voxels was classified as schizophrenia whereas the opposite was true in the case of negative coefficients. The distributions of the coefficients within different ROIs were found to vary, indicating different levels of significance among all ROIs. Accordingly, we present a list detailing the distribution of weights corresponding to each ROI derived by solving the SVM model (Supplementary Table S3). This describes the proportion of positive and negative weight values as well as the mean (SD) weights with respect to voxels within each ROI.
In this study, there are a total of 64 ROIs under consideration. We selected a small subset of ROIs to predict schizophrenia for achieving the highest training accuracy using a proposed sequential ROI selection algorithm. We generated 64 different selection paths concerning the sequence of selected ROIs of interest as shown in the topmost graph in Fig. 3. Regardless of the choice of ROI at the initialization step in the algorithm, the derived selection paths appeared to have similar sequential patterns after several iterations. Specifically, the paths generated with different initial ROIs only shifted a small part of the corresponding selection paths while retaining the comparability for the rest in the path. In general, the common sequential pattern of all selection paths, which consists of several ROIs of interest, demonstrated the significance of the underlying ROIs in the classification. For example, as shown in Fig. 3, as to the underlying 64 selection paths, the most common ROIs generated by the algorithm from the 2nd to 5th iterations included the middle frontal gyrus, occipital fusiform gyrus, left putamen and superior parietal lobule. In terms of trade-off between the number of voxels and training accuracy, Scientific REPORTS | (2018) 8:13858 | DOI:10.1038/s41598-018-32290-9 using more voxels in training generally increases training accuracy but on the other hand, reduces the efficiency in training phase. Correspondingly, the graphs in Fig. 4 depict the tradeoff between accuracy and voxel inclusion for the 64 selection paths.
The training accuracies of different initialized ROIs varied although after the first iteration all paths tended to converge in accuracy. The top five significant ROIs in terms of accuracy included the middle frontal gyrus (79.5%), precentral gyrus (79.3%), pars opercularis of the inferior frontal gyrus (79.0%), superior division of the lateral occipital cortex (78.2%) and posterior division of the supramarginal gyrus (78.1%). At each iteration, the next significant ROI was added into the model. The dynamic accuracy of each path was found to reach a turning point at the 7th iteration, indicating 7 as the optimal number of ROIs in the subset. Before the 7th iteration, the SVM was found to be under fitted with limited voxels as the input features. After the 7th iteration, the additional ROIs did not increase the training accuracy and there was a decrease in overall accuracy of the model. We provide a comprehensive list of the top 20 ROIs with the highest accuracy reached at the 7 th iteration in Supplementary  Table S2. The 7 ROIs that were found to make up the optimal subset were the occipital fusiform gyrus, middle frontal gyrus, pars opercularis of the inferior frontal gyrus, anterior division of the superior temporal gyrus, left thalamus and left lateral ventricle. The 7 ROIs constituted 12.4% of total brain volume but yielded 92.0% in terms of training accuracy.

Discussion
In the current study, we employed SVM with the spatial and anatomical regularization to create a data-driven model classifying SZ patients and HC based on key neuroanatomical features in structural MRI scans. The spatial proximity is encoded using the image connectivity as a regularization graph. Specifically, in the 3-dimensional image under consideration, the 6-connectivity graph is in terms of 6 face-connections for each voxel 37 . These 6 connected voxels are the 6 nearest neighbouring voxels. The spatial regularization uses the heat kernel on the graph 37 to mute local variations, such as noise from the MRI scanner. The anatomical regularization counteracts the intensity interferences among different ROIs, so that each ROI is taken as a relatively independent input variable for SVM. This method effectively reduces overfitting caused by local variation in brain images and allows for spatial and anatomically coherent discrimination patterns to be derived, thereby increasing translational relevance and interpretation. In our study, we found that the regularized SVM model yielded recognition accuracy of 86.6% in the training set and subsequent validation accuracy of 83.5% in an independent group of subjects. The addition of sequential ROI selection improved the recognition accuracy of schizophrenia to 92% in the training  Table S3 in Supplementary Information. set and 89.4% in the validation set. A subset of 7 ROIs selected was thus found to surpass the full cerebrum model in overall classification performance.
Pattern recognition studies of high dimensional data using machine learning methods have generally utilized two major approaches, i.e., the pre-model selection and post-model selection methods. The pre-model selection step utilizes machine learning models to determine the feature input for SVM such as principal component analysis (PCA) and independent component analysis (ICA) [39][40][41] and the post-model selection adopts the coefficients of SVM as the significance measure to identify key features 33,35,42 . Unlike using ROIs (i.e., the sets of certain voxels of interest) in feature selection, the results derived by PCA or SVM are associated with discrete voxels, which might not necessarily be clinically interpretable. Specifically, we treated ROIs as independent variables in feature selection and analyzed the associated sets of the corresponding weights in this study. We used the sequential ROI selection process and took the training accuracy as the validation measure. The robustness of the sequential ROI selection process also ensures its generalizability in utility to other MRI studies. As shown in Fig. 5, the sequential ROI selection algorithm yielded 64 selection paths with different initial ROIs which approximately converge to a common sequential pattern.
Notably, the 89.4% classification accuracy obtained in the validation set with the additional application of the sequential ROI selection step appeared to be the highest, compared to relevant studies in literature 28,33,35 which focused on voxel-based gray matter deficits in schizophrenia without incorporating both spatial and anatomical priors. In the first classification study which applied SVM to structural MRI data, Davatzikos et al. 33 obtained an overall recognition accuracy of 81.1% using a "leave-one-out" method whereby the classifier was trained on all but one participant and later applied to test the left-out sample for cross-validation. These results provided an approximation of how generalizable the classifier would be to an independent cohort. Subsequently, Fan et al. 28 used a SVM-Recursive Feature Elimination technique and observed an improved accuracy of 91.8% and 90.8% in distinguishing between the two groups of 69 SZ patients and 69 healthy controls. However, to the best of our knowledge, only two studies of schizophrenia validated their recognition models against an independent sample 35,43 . In the study conducted by Kawasaki and colleagues 43 , the classification accuracy of the training sample (60 subjects) was 75%, whereas the validation sample (32 subjects) achieved 80% accuracy. Nieuwenhuis et al. 35 argued that the increase in accuracy derived from the validation sample could be partly due to the small sample size. Moreover, the study utilized a multivariate linear model instead of SVM. In the largest study of pattern recognition in schizophrenia to date, Nieuwenhuis and colleagues 35 tested a SVM classifier developed with a sample of 239 subjects (SZ 53.6%) on a test group of 277 other participants (SZ 56.0%) and achieved accuracy of 71.4% and 70.4% respectively. With the addition of a 10% feature reduction step, the classification accuracy for the training set was improved to 86.8%, although accuracy of the validation set remained at 69.1%. One possible explanation for the reduction in recognition accuracy with larger sample sets could be attributed to greater variability in brain endophenotypes with sample size increase. While the above studies examined different cohorts of participants using different approaches, the method adopted in this study seems to be promising in recognition of schizophrenia by incorporating rich information in the SVM model.
With regard to structural brain abnormalities, the present analyses yielded a 7-ROI optimal model comprising of the occipital fusiform gyrus, middle frontal gyrus, pars opercularis of the inferior frontal gyrus, anterior superior temporal gyrus, superior frontal gyrus, left thalamus and left lateral ventricle. Our findings of implicated brain regions as well as the discriminative patterns derived from our model show consistency with that of extant  voxel-based meta-analyses 12,23 and prior classification studies 28,33,35,[43][44][45] in schizophrenia. For instance, we report decreased gray matter densities in frontal and superior temporal regions that corroborate with a vast body of literature showing volumetric reductions in these corresponding brain areas 12,23,[46][47][48] . These reductions in frontotemporal brain regions are thought to underlie a range of cognitive deficits and clinical features seen in schizophrenia. Specifically, smaller inferior frontal gyrus volume has been found to correlate with greater negative symptoms, decreased motivation, as well as poorer language functioning in schizophrenia 49,50 . Medial prefrontal cortex reductions have been associated with deficits in executive functioning such as decision-making, computation, motor planning and imagery, discrimination, and reasoning 51 , as well as possible disruptions in the prefrontal-cingulate circuitry associated with non-goal directed and stimulus-independent processes 52 . The superior frontal gyrus, frequently found to be reduced in first-episode and neuroleptic naïve schizophrenia patients 16,53 , has been linked to anomalies in self-awareness, social cognition, and emotion 54,55 . Similarly, the superior temporal gyrus (STG) and subcortical regions such as the thalamus have been found to be implicated early in the illness course [56][57][58][59] . Correspondingly, our analyses showed decreased gray matter densities in these brain areas. Reduced STG and thalamic volume have been associated with positive symptoms including auditory hallucinations and thought disorder 60,61 , as well as deficits in working memory and attention 62,63 . Although less extensively examined, we found regional density decrease within the occipital cortex that has also been obtained in previous SVM classification studies 33,44 which reported gray matter volumetric and concentration reductions, demonstrating the sensitivity of the SVM approach in detecting subtle brain regions that are altered in schizophrenia. Lastly, lateral ventricular enlargement presents as one of the earliest neuropathological hallmarks discovered in schizophrenia that has proven to be a robust finding across various cross-sectional and longitudinal studies 14,64,65 .
There are several limitations on this study. First, a pertinent issue is the sample size which can markedly affect diagnostic performance of the model. A limited number of samples can result in model overfitting and poor generalization of the method to independent data sets. As most of the existing studies have been carried out on a limited number of participants, it is difficult to draw definite conclusions and early results may not generalize well to other patient groups. Moreover, to circumvent the issue of small sample size, most studies have employed cross-validation frameworks by partitioning the data. However, it must be highlighted that the use of these techniques must be interpreted with caution as there is a serious risk of biasing classifier's performance, particularly in the instance where data samples in the validation set are also present in the testing set 24 .
Second, although the SVM method has generally been found to provide better classification 32 , it is difficult to directly compare classification outcomes and ML methods across such studies due to the differences in clinical characteristics of the sample sets, image processing and classification pipelines. For example, Zarogianni and colleagues 24 observed that ML methods, when performed in first-episode schizophrenia sample groups, produced poorer diagnostic performance compared to studies that included schizophrenia patients with greater chronicity of their illness 34,66,67 . It was hypothesized that this could be due to less pronounced and discernible neural alterations in the early onset group compared to chronic schizophrenia, thus influencing the accuracy of classifiers. Third, the presence of co-morbid disorders may also affect the sensitivity of the model in discriminating disease-specific patterns. Specifically, in a SVM classification study of early onset schizophrenia patients which included those with co-morbid substance use disorders, Zanetti and colleagues 67 observed classification accuracy of 73.4% when compared against HC.
Future studies may want to consider pooling of subjects across sites 68 to further increase the sample size of training and validation sets. Furthermore, multi-site studies may encompass more diverse, heterogeneous clinical populations, demonstrating a range of clinical manifestations related to a particular disorder 69 . In addition, further efforts are needed to examine the feasibility of using classification approaches to differentiate between psychiatric disorders. To aid classification performance, integrating neuroimaging indices with other biological markers such as genetic information may improve diagnoses, prediction and monitoring of treatment response and illness prognoses.

Conclusion
We first used a SVM approach with customized anatomical and spatial kernels in the classification of patients with schizophrenia from healthy controls based on whole-brain gray matter volumetric data. We then optimized the SVM model by using a sequential ROI selection algorithm which identified an optimal 7-ROI set with a training accuracy of 92.0% and classification accuracy of 89.4% in an independent validation data set. The 7-ROI subset included the occipital fusiform gyrus, middle frontal gyrus, pars opercularis of the inferior frontal gyrus, anterior superior temporal gyrus, superior frontal gyrus, left thalamus and left lateral ventricle. This may potentially augment or supplement clinical approaches in assessing the clinical presentations of different individuals with schizophrenia which may be varied, complex and often recurring and relapsing.

Participants.
A total of 212 participants were recruited, comprising of 141 patients with DSM-IV diagnosis of schizophrenia (SZ) (97 males and 44 females) and 71 healthy controls (HC) (45 males and 26 females). Written, informed consent was obtained from all participants after a thorough explanation of the study procedures. Diagnoses were made by the treating psychiatrist using information obtained from clinical history, existing medical records, interviews with the patient and significant others and the administration of the Structured Clinical Interview for DSM-IV Axis I disorders -Patient Edition (SCID-I/P) 70 . All patients were on a stable dose of antipsychotic medication for at least 2 weeks prior to recruitment and none had medication withdrawn for the purpose of the study. Participants with history of neurological illness or a diagnosis of alcohol or drug misuse in the preceding 3 months based on DSM-IV criteria were excluded. The SCID Non-Patient Edition (SCID-I/NP) was administered to healthy controls (HC) to rule out the presence of any other Axis I psychiatric disorder 71  was approved by the Institutional Review Board of the Institute of Mental Health, Singapore and the National Neuroscience Institute, Singapore. All procedures were carried out in accordance with the relevant guidelines and regulations outlined by the approving institutions.
Data from the 212 participants were divided into a training set consisting of 127 participants (60%) and validation set with 85 participants (40%). As to the partitioning of the data, the training and test sets were randomly selected by matching participants based on variables like age, gender and diagnosis listed in Table 2. The training set was used to train the SVM model to derive key anatomical regions and the model was then tested in the validation set. The permutation test was conducted for cross validation.
Neuroimaging Data Acquisition. Magnetic resonance imaging was performed at the National Neuroscience Institute, Singapore, using a 3-Tesla whole body MRI scanner (Philips Achieva, Philips Medical Systems, Eindhoven, The Netherlands) with a SENSE head coil. Whole-brain scans were then acquired with a high resolution T1-weighted Magnetization Prepared Rapid Gradient Recalled Echo (MP-RAGE) sequence (repetition time (TR) = 8.4 s; echo time (TE) = 3.3 ms; flip angle = 8°). Each T1-weighted volume consisted of 180 axial slices of 0.9 mm thickness with no gap (field of view (FOV) = 230 × 230 mm; acquisition matrix = 256 × 256 pixels). The stability and high signal to noise ratio (SNR) are important factors contributing to enhance the accuracy and sensitivity in the measurement system. A regular quality control procedure was adopted to ensure the stability of a high signal-to-noise ratio.

Neuroimaging Data
Analyses. An optimized VBM protocol 72 was applied using Statistical Parametric Mapping (SPM8) (http://www.fil.ion.ucl.ac.uk/spm/). In summary, we (1) segmented individual T1-weighted images into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF); (2) created a study-specific template consisting of both healthy participants and patients using nonlinear DARTEL registration 73 ; (3) registered each GM probability map to the customized template in Montreal Neurological Institute (MNI) space and performed tissue segmentation; (4) performed modulation by multiplying voxel values by the Jacobian determinants derived from the spatial normalization step; (5) applied smoothing on the normalized GM maps by an 8-mm isotropic Gaussian kernel. The voxel intensities of these blurred segments indicate local presence or concentration of gray matter and are henceforth referred to as gray matter densities.
The Harvard-Oxford cortical and subcortical atlases 74 were combined to identify the anatomical structure for which each MRI voxel belongs to. Since only the cerebrum is the region of interest in schizophrenia brain classification, the brain stem was excluded from the subcortical atlas. In total, 1,135,246 voxels from 64 anatomical structures were included in the complete atlas. SVM with Regularization Classification. Support Vector Machine (SVM) as a supervised learning model has been widely used in pattern detection with high dimensional data. It detects boundary samples (support vectors) and compounds the features of the support vectors to derive key features in classification using L 1 -norm or L 2 -norm. For the problem under consideration, the features of the subjects used in training in the SVM model are individual voxels. The dimensions of the model are equal to the number of voxels in a single brain MRI scan. The value of the coefficient in the model associated with each voxel indicates the relative importance of the respect voxel in classifying subjects into two groups (i.e., SZ or HC).
In this study, we applied the framework of spatial and anatomical regularization of SVM developed by Cuingnet et al. 37 to the classification of neuroimaging data for patients with schizophrenia, which extended its application to the classification of patients with Alzheimer's disease discussed in 37 . By introducing the regularization operator to the SVM, the classification function is constrained to be smooth with respect to the spatial and anatomical priors. Briefly, the underlying regularization is defined based on spatial proximity and anatomical proximity using Laplacian operator. Specifically, the spatial regularization uses the heat kernel and 6-connectivity-model to mute the local variations, such as noise from MRI scanner. The anatomical regularization counteracts the intensity interferences among different ROIs. In previous applications using SVM, when the training set is small, a large radius of isotropic kernel is usually required. Then, the anatomical kernel would play a key role in retaining the key features of each anatomical structure. In general, the combination of both spatial and anatomical kernels could adjust the spatial and anatomical factors appropriately within the brain image. This framework effectively reduces the overfitting caused by local variation in brain images and allows for spatial and anatomically coherent discrimination patterns to be derived, resulting in increasing translational relevance and interpretation. A reader is referred to 37 for a detailed description and explanation on the underlying SVM framework. Let  denote the domain the 3D images and v denote an element of  (i.e., a voxel). Let  χ = L ( ) 2 represent the set of integral functions on  equipped with the canonical inner product denoted by . . .
, Let x s ∈ χ be the data of a given subject s. In this study, x s can be viewed as an element of R m , where m denotes the total number of voxels since the images are discrete. Assume there is a group of N subjects with the corresponding data x s ∈ χ, = … s N 1, , . Each subject is associated with a group y s ∈ {−1, 1} (e.g., diseased or healthy), = … s N 1, , . In combination with these two regularization terms, the underlying SVM optimization problem was presented in Equation (21) of the ref. 37 That is, where λ ∈ R + is the regularization parameter and ⋅  ( ) hinge is the hinge loss function commonly used in SVM defined by Here, L s and L a refer to the associated Laplacians of the graph encoding spatial proximity and anatomical proximity, respectively. β a and β s denote anatomical and spatial hyperparameters which control the spatial and anatomical effect in the regularization. Each element in the coefficient vector w = (w v ) v∈V corresponds to a voxel and indicates the significance of the individual voxel in MRI classification. b is the usual scalar parameter in defining the hyperplane in SVM. It is known from (37, Equation (22)) that the above minimization problem is equivalent to an SVM optimization problem with kernel Mathematically, the underlying kernel function involves two items concerning matrix exponential associated with information pertaining to anatomical and spatial voxel adjacency. This offers an improved classification performance for problems under consideration. Generally, the kernel takes local voxel variance and cross-structural variance into account. It uses the space kernel to reduce the variance of neighboring voxels and the binary anatomical kernel to segregate the cerebrum according to the anatomical structure. In this study, the binary anatomical voxel adjacency matrix (L a ) was derived from the Harvard-Oxford cortical and subcortical atlases associated with each voxel with the relevant structural label. The binary anatomical voxel adjacency matrix is a large sparse matrix with the dimension being equal to the number of pixels in an MRI scan. The binary value indicates whether two voxels are from the same anatomical structure or not. The probabilistic spatial voxel adjacency matrix (L s ) was derived from the 6-connectivity model and the parameters β a , β s and λ were determined by the grid search. The map in Fig. 1 presents a concept visualization to reflect how the effect of the high dimensional kernels would translate into a real cerebrum sample. With the anatomical and spatial regularization, the coefficients are well smoothed within the same anatomical structure and regularized the reciprocal effect among different ROIs.

Sequential ROI Selection.
In the machine learning framework, various learning strategies may be applied for feature selection. In general, the L 1 -norm performs well with the SVM algorithms in feature selection. However, in this study, we did not choose to use the L 1 -norm as it selects features by voxel, which might not have specific anatomical and clinical interpretation. Instead, we propose the use of a sequential ROI selection algorithm to identify the key ROIs.
To perform the sequential ROI selection, at the initialization step, we defined a singleton set of ROI by choosing an arbitrary ROI from 64 ROIs under consideration. This set was updated progressively by adding a new ROI from the remaining ROIs at next iteration. Here the newly added ROI was chosen based on the criteria that the training accuracy in predicting schizophrenia using this ROI together with ROIs chosen in previous iterations could achieve the highest accuracy, compared to training prediction accuracies attained by using any other individual remaining candidate and those ROIs selected previously. This process continues until the total number of ROIs of the sequential ROI selection is reached. Hence, we then derived a selection path of all ROIs (i.e., 64 ROIs) under consideration. The selection process is a greedy-forward algorithm in which an optimization problem is solved for choosing the optimal ROI at each step. In view of the influence of the initial ROI on the subsequent sequence of ROIs, we ran through 64 scenarios with each ROI being chosen as the initial ROI. In general, n selection paths were derived with n different initializations of individual ROIs and the corresponding training accuracy at each step in the selection paths were noted. Eventually, the more significant features, i.e., ROIs, in classification were included in the selected ROI training set. Due to the large number of voxels under consideration, the training predictions involved at each step are large scale problems, subsequently incurring massive computational efforts in implementation. To improve the efficiency of the sequential ROI selection process, it necessitates balancing both computational load and prediction accuracy. Specifically, we were interested to investigate the tradeoff between the voxels used and the training accuracy achieved. Thus, we chose a number of ROIs of interest which as small as possible (i.e., 7 ROIs) from all derived selection paths that resulted in the highest accuracy in prediction. Furthermore, the proposed selection process is in the context of a robust optimization approach which seeks to generate a relatively stable solution of interest, under the uncertainty of randomly selected training and test sets.