Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia

Neuroimaging-based models contribute to our understanding of schizophrenia pathophysiology and can reveal the underlying characteristics of this and other clinical conditions. However, the considerable variability in reported neuroimaging results mirrors the heterogeneity of the disorder. Machine learning methods capable of representing invariant features could circumvent this problem. In this structural MRI study, we trained a deep learning model known as a deep belief network (DBN) to extract features from brain morphometry data and investigated its performance in discriminating between healthy controls (N = 83) and patients with schizophrenia (N = 143). We further analysed performance in classifying patients with first-episode psychosis (N = 32). The DBN highlighted differences between classes, especially in the frontal, temporal, parietal, and insular cortices, and in some subcortical regions, including the corpus callosum, putamen, and cerebellum. The DBN was slightly more accurate as a classifier (accuracy = 73.6%) than the support vector machine (accuracy = 68.1%). Finally, the error rate of the DBN in classifying first-episode patients was 56.3%, indicating that the representations learned from patients with schizophrenia and healthy controls were not suitable for characterizing these patients. Our data suggest that deep learning could improve our understanding of psychiatric disorders such as schizophrenia by refining neuromorphometric analyses.


Supplementary Information: Deep Belief Networks
The deep learning method that we used in this study consisted of a deep neural network pre-trained by a DBN (DBN-DNN). The DBN has gained popularity since the successful implementation of an efficient learning technique that stacks simpler models known as restricted Boltzmann machines (RBMs)6.

Restricted Boltzmann Machine
The RBM can be interpreted as an artificial neural network that extracts latent features of an unknown input probability distribution based only on observed samples19. Given a set of observations, training an RBM means adjusting the model parameters so that the probability distribution the model represents fits the distribution of the training data as well as possible.
The RBM network consists of a bipartite graph that has a visible layer and a hidden layer (Fig. 1). The RBM can be defined as an energy-based model, in which the joint probability distribution of hidden unit values h and visible unit values v is determined using an energy function E:

P(v, h) = \frac{1}{Z} e^{-E(v, h)},   (1)

where the normalizing constant Z is called the partition function by analogy with physical systems. The partition function is obtained by summing over all possible pairs of visible and hidden vectors:

Z = \sum_{v, h} e^{-E(v, h)}.   (2)

For binary visible and hidden units (Bernoulli-Bernoulli RBM), the energy function is

E(v, h) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} v_i W_{ij} h_j,   (3)

and for the Gaussian-Bernoulli RBM (GRBM), which accommodates continuous-valued visible units (assumed standardized to unit variance), the energy function can be defined by

E(v, h) = \sum_i \frac{(v_i - b_i)^2}{2} - \sum_j c_j h_j - \sum_{i,j} v_i W_{ij} h_j,   (4)

where b_i and c_j are the biases of visible unit i and hidden unit j, respectively, and W_{ij} is the weight parameter of the model connections.
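To make the quantities in equations (1)-(3) concrete, here is a minimal NumPy sketch (all variable names are illustrative): it computes the Bernoulli-Bernoulli RBM energy and, for a deliberately tiny model, the partition function Z by brute-force enumeration. The exponential cost of that sum is exactly why Z is intractable at realistic model sizes.

```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    """Bernoulli-Bernoulli RBM energy, equation (3)."""
    return -b @ v - c @ h - v @ W @ h

def partition_function(W, b, c):
    """Brute-force Z, equation (2): sums over all 2^(n_v + n_h) configurations.

    Feasible only for toy models; this exponential blow-up is why the
    likelihood gradient must be approximated in practice.
    """
    n_v, n_h = W.shape
    return sum(
        np.exp(-energy(np.array(v, float), np.array(h, float), W, b, c))
        for v in itertools.product([0, 1], repeat=n_v)
        for h in itertools.product([0, 1], repeat=n_h)
    )

# Joint probability of one configuration, equation (1)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 2))
b, c = np.zeros(3), np.zeros(2)
v, h = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0])
Z = partition_function(W, b, c)
print(np.exp(-energy(v, h, W, b, c)) / Z)  # P(v, h)
```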
The objective of training is to fit the probability distribution model over a set of visible random variables v to the observed data. Thus, training can be carried out by maximum likelihood estimation of the marginal probability P(v) = \sum_h P(v, h). The gradient of the log-likelihood with respect to the RBM parameters (weights and biases) has a closed form; for the weights, \partial \log P(v) / \partial W_{ij} = \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model}. However, the second term involves an intractable expectation over the joint distribution P(v, h).
Usually, an approximation of the gradient is used to deal with this intractable expectation. A truncated version of the Gibbs sampling method called Contrastive Divergence (CD)6 uses the conditional probabilities P(v|h) and P(h|v) in the approximation. The popularity of the RBM stems from the efficiency of the CD algorithm and from the ease of computing the conditional distributions over v and h. For the Bernoulli-Bernoulli RBM, the conditional probabilities can be computed as

P(h_j = 1 \mid v) = \sigma\left(c_j + \sum_i W_{ij} v_i\right),   (5)

P(v_i = 1 \mid h) = \sigma\left(b_i + \sum_j W_{ij} h_j\right).   (6)

Similarly, for a GRBM, the corresponding conditional probability of the visible units becomes

P(v_i \mid h) = \mathcal{N}\left(b_i + \sum_j W_{ij} h_j;\ 1\right),   (7)

where \sigma(x) is the logistic sigmoid function (\sigma(x) = 1/(1 + e^{-x})) and \mathcal{N}(mean; variance) denotes the normal distribution. Further information on the RBM model and its training can be found in refs 6 and 19.
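As an illustration of CD, here is a minimal sketch of one CD-1 update for a Bernoulli-Bernoulli RBM, assuming mini-batch data in the rows of V and a learning rate eta (both names illustrative): the positive phase applies equation (5) to the data, and the negative phase takes a single Gibbs step via equations (6) and (5).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(V, W, b, c, eta=0.01, rng=None):
    """One Contrastive Divergence (CD-1) step on a batch V of binary rows."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Positive phase: P(h = 1 | v) on the data, equation (5)
    ph_data = sigmoid(V @ W + c)
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)
    # Negative phase: one Gibbs step, equations (6) then (5)
    pv_recon = sigmoid(h_sample @ W.T + b)
    ph_recon = sigmoid(pv_recon @ W + c)
    # Gradient approximation: <v h>_data - <v h>_reconstruction
    n = V.shape[0]
    W += eta * (V.T @ ph_data - pv_recon.T @ ph_recon) / n
    b += eta * (V - pv_recon).mean(axis=0)
    c += eta * (ph_data - ph_recon).mean(axis=0)
    return W, b, c
```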

Creating Deep Belief Networks
After training, the hidden unit values of the RBM provide a closed-form representation of the dependencies between the visible units. The idea is that the hidden units extract relevant features from the observations. However, these features are regarded as low-level features; to achieve more complex representations, the model needs to compute higher-level features from the lower-level ones. We therefore create a DBN by stacking RBMs6. The stacking procedure is as follows (see the sketch below): after training a GRBM on the continuous input data, we treat the activation probabilities of its hidden units as the input data for training a Bernoulli-Bernoulli RBM one layer up. Similarly, the hidden units' activation probabilities of the second-layer RBM are used as input for the next RBM, and so on until the desired depth is reached. By stacking RBMs, the DBN can learn a hierarchical structure of the input data.
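The stacking procedure can be summarized in a short sketch; train_rbm is a hypothetical helper (e.g., looping cd1_update from the previous sketch until convergence), and in practice the first layer would be a GRBM for the continuous morphometric inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_dbn(X, layer_sizes, train_rbm):
    """Greedy layer-wise DBN pre-training.

    X: training data (rows are samples).
    layer_sizes: hidden-layer widths, e.g. [100, 50, 10].
    train_rbm(data, n_hidden) -> (W, b, c): hypothetical single-RBM trainer.
    """
    params, layer_input = [], X
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(layer_input, n_hidden)
        params.append((W, b, c))
        # Hidden activation probabilities become the next RBM's training data
        layer_input = sigmoid(layer_input @ W + c)
    return params
```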
This "pre-training" can be followed by a discriminative training that fine-tunes all layers jointly to perform the classification task. This fine-tuning is done by initiating the parameters of a deep neural network with the values of DBN pre-trained parameters. Besides that, final layer (composed of softmax units) is added to implement the desired targets of the training data, the labels SCZ and HC. Finally, the backpropagation algorithm and a gradient-based optimization algorithm can be used to adjust the network parameters, creating a DBN-DNN.