Abstract
Finding effective and objective biomarkers to inform the diagnosis of schizophrenia is of great importance yet remains challenging. Relatively little work has been conducted on multi-biological data for the diagnosis of schizophrenia. In this cross-sectional study, we extracted multiple features from three types of biological data, including gut microbiota data, blood data, and electroencephalogram data. Then, an integrated framework of machine learning consisting of five classifiers, three feature selection algorithms, and four cross validation methods was used to discriminate patients with schizophrenia from healthy controls. Our results show that the support vector machine classifier without feature selection using the input features of multi-biological data achieved the best performance, with an accuracy of 91.7% and an AUC of 96.5% (p < 0.05). These results indicate that multi-biological data showed better discriminative capacity for patients with schizophrenia than single biological data. The top 5% discriminative features selected from the optimal model include the gut microbiota features (Lactobacillus, Haemophilus, and Prevotella), the blood features (superoxide dismutase level, monocyte-lymphocyte ratio, and neutrophil count), and the electroencephalogram features (nodal local efficiency, nodal efficiency, and nodal shortest path length in the temporal and frontal-parietal brain areas). The proposed integrated framework may be helpful for understanding the pathophysiology of schizophrenia and developing biomarkers for schizophrenia using multi-biological data.
Introduction
Finding effective and objective biomarkers to inform the diagnosis of schizophrenia (SZ) is of great importance yet remains challenging1,2. Currently, increasing evidence has shown that the gut microbiome, blood and electroencephalogram (EEG) provide abundant clues for the diagnosis of SZ. Recently, several studies have indicated that patients with SZ show an altered gut microbiome composition3,4,5, which is significantly associated with the severity of symptoms3 and human brain structure and function5. Moreover, a large number of previous studies indicate alterations in both pro- and anti-inflammatory molecules in the central nervous system, which have also been detected in peripheral blood, and may correlate with SZ symptoms6,7,8. Furthermore, several EEG analyses indicate that patients with SZ show significant alterations in the power of various frequency bands, including the increases in delta and theta waves, the decreases in alpha waves and the increases in beta and gamma waves9,10,11,12. However, most of these alterations are observed at the group level with substantial variability among individuals with the same phenotypic diagnosis. Consequently, none of these alterations has proven to have the ability to reliably aid in the differential diagnosis of SZ to date1,13. Therefore, studies analyzing how gut microbiota data, blood data and EEG data behave at an individual level are important; for example, this information could be used to better understand the pathology and identify objective biomarkers for the clinical diagnosis of SZ14.
Recently, pattern recognition based on machine learning has attracted increasing attention, which is well suited for the identification of subtle patterns of information in the data and, consequently, is useful to better predict the diagnosis at an individual level1,15,16,17. Using a variety of biological data, such as gut microbiota data4,18, blood data14, and EEG data12,19,20,21,22, along with machine learning techniques, hundreds of studies have been performed in an attempt to achieve the accurate classification of patients with SZ. For instance, a previous study4 used Boruta variable selection to select the most discriminatory taxa and random forests methods to develop a classifier and predict SZ based on the important microbiota features. A receiver operating characteristic curve analysis revealed that 12 significant microbiota biomarkers were capable of being used as diagnostic factors. A more recent study14 developed a probabilistic multi-domain data integration model consisting of immune and inflammatory biomarkers in peripheral blood and cognitive biomarkers using machine learning to discriminate patients with SZ from healthy controls (HCs). Another study20 applied the 1-norm support vector machine (SVM) method based on EEG signals of 64 channels during a working memory task to classify patients with SZ versus healthy controls and an accuracy of 87% was achieved. Despite these advances, previous discriminative studies of SZ have primarily focused on biomarkers extracted from a single type of biological data, which only capture partial information about the human body and therefore influence the resulting classification performance. Currently, increasing evidence has shown that the combination of multimodal imaging data can further improve the classification performance23,24,25,26.
In this study, we collected multi-biological data, including gut microbiota data, blood data and EEG data, from patients with SZ and HCs. An integrated framework of machine learning consisting of multi-biological data, multi-classifiers, multi-feature selection algorithms and multi-cross validation methods, was used to discriminate patients with SZ from HCs. Numerous previous studies have shown that: (1) combining multi-biological data provides more complementary information for discriminative analysis14,24; (2) multi-classifiers, multi-feature selection algorithms can better adapt to heterogeneous biological data27,28; (3) multi-cross validation methods can test the performance of models more credibly21. In this study, we proposed an integrated framework to improve the classification performance and the understanding of biomarker identification for SZ.
Materials and methods
Participants
The final sample comprised 99 participants, including 49 patients with SZ and 50 HCs. Patients with SZ were recruited from the Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, and met the diagnostic criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR). The psychopathology and symptom severity of the patients were evaluated with the Positive and Negative Syndrome Scale (PANSS). The psychiatric symptoms were stable for > 2 weeks; the rate of change in the PANSS score was ≤ 20% over 2 weeks, and the total PANSS score was ≥ 30. Patients with SZ were excluded if they met any of the following criteria: (1) any other psychiatric axis I disorder meeting the DSM-IV criteria; (2) constipation, diarrhea, diabetes, hypertension, heart disease, thyroid diseases or any somatic diseases; (3) a history of epilepsy, with the exception of febrile convulsions; (4) a history of having received electroconvulsive therapy in the past 6 months; (5) lactating, pregnant, or planning to become pregnant; (6) alcohol dependence; or (7) noncompliance with drug treatment or lack of a legal guardian.
The HCs were solicited from the local community through advertisements and were screened for their family clinical history and a history of mental illness. None of the healthy subjects had a history of brain disease (such as pain, schizophrenia, concussion, brain trauma, etc.), ocular disease, treatment with psychotropic medication or drug abuse. In addition, the subjects were asked not to consume alcohol, tea, coffee or any other food or drugs that might excite the central nervous system within 48 h before the experiment and to get enough sleep the night before the test.
The study protocol was approved by the ethics committees of the Affiliated Brain Hospital of Guangzhou Medical University, and written informed consent was obtained from each subject or their legal guardian prior to the study.
Multi-biological data acquisition and preprocessing
EEG recording and preprocessing
Three minutes of resting EEG data were recorded with the participant's eyes closed from 16 scalp electrodes (i.e., Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, and T6) placed according to the International 10/20 System and referenced to electrode Cz (UEA-B, symptom, China). All electrode impedances were maintained at less than 10 kΩ. Signals were amplified and digitized using a sampling rate of 1000 Hz and a 60-Hz low-pass filter during recording.
EEG preprocessing was conducted using MATLAB software (MathWorks, Natick, MA). Preprocessing was divided into four steps: electrode positioning, filtering, elimination of bad signal segments and signal frequency band decomposition. A bandpass filter of 0.1–45 Hz was used to improve the quality of the signal. Then, the EEG signal was divided into epochs of 2 s, and artifacts, such as eye blinks and movement, were removed by technicians. Finally, the signal was divided into seven frequency subbands by a finite impulse response filter: delta band (1.5–4 Hz), theta band (4–8 Hz), alpha1 band (8–10 Hz), alpha2 band (10–13 Hz), beta1 band (13–20 Hz), beta2 band (20–30 Hz) and gamma band (30–45 Hz).
Fecal sample collection and preprocessing
Fresh fecal samples were collected from all subjects and stored at −80 °C until DNA extraction. Two hundred milligrams of each fecal sample were used for DNA extraction.
The DNA extraction method was consistent with our previously published report3. Sequencing of the V4 region of the 16S rRNA gene was performed on the Illumina MiSeq platform. The raw sequences were processed using QIIME2 (version 2018.6). Forward and reverse reads for each individual sample were demultiplexed, joined and quality filtered. We obtained a total of 4,561,105 joined sequences from these raw paired-end sequences, ranging from 15,449 to 95,651 per sample, and the average length of all joined sequences was approximately 251 bp. Then, the DADA2 algorithm29 was used for sequence quality control and feature table construction. After quality filtering, we obtained 4,148,451 high-quality reads, ranging from 13,581 to 90,203 per sample, with a mean of 41,903.5 reads. Then, all the high-quality reads were clustered, 2031 features were obtained, and the frequency per feature ranged from 2 to 533,200, with an average of 3356.9. We used a pretrained naïve Bayes classifier for taxonomic annotation, and this classifier was trained on the Greengenes database (version 13.8). The raw sequence data reported in this article have been deposited in GenBank in the National Center for Biotechnology Information (NCBI) under accession numbers MT545156–MT547172, which are publicly accessible at https://www.ncbi.nlm.nih.gov.
Blood collection and preprocessing
Three milliliters of blood were collected from control subjects and patients by simple venipuncture between 7:00 and 9:00 a.m., after overnight fasting and tobacco abstinence for more than 12 h. Blood biochemical indicators were detected with an automatic biochemical analyzer.
Multi-biological feature extraction
EEG feature extraction
In this study, we used the phase-locking value (PLV) method to quantify the functional connectivity (FC) between any two channels of EEG signals, as shown in Fig. 1.
The instantaneous phase \(\phi(t)\) was calculated from the signal \(x(t)\) by using the Hilbert transform:
The phase was computed using the following expression:
Phase synchronization is defined as the locking of phases of two oscillators:
The phase-locking value (PLV) is defined as:
where \(i\) denotes the imaginary unit, \(N\) indicates the total number of samples, and \(\Delta t\) is the time interval between successive samples \(j\), from 1 to \(N - 1\).
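As a reference for the steps above, the following minimal sketch computes a PLV between two signals. It assumes the standard definition, \(\mathrm{PLV} = \left| \frac{1}{N} \sum_{j} e^{i(\phi_x(j\Delta t) - \phi_y(j\Delta t))} \right|\), and builds the analytic signal with a naive discrete Fourier transform so that only the Python standard library is needed. The function names `analytic_signal` and `plv` are illustrative and are not taken from the MATLAB pipeline used in the study.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal x(t) + i*H[x](t), via a naive O(n^2) DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def plv(x, y):
    """Phase-locking value between two equally long real signals."""
    phase_x = [cmath.phase(z) for z in analytic_signal(x)]
    phase_y = [cmath.phase(z) for z in analytic_signal(y)]
    total = sum(cmath.exp(1j * (px - py)) for px, py in zip(phase_x, phase_y))
    return abs(total) / len(x)


# Synthetic test signals (not study data): 64 samples, whole cycles only.
n_samples = 64
locked_a = [math.cos(2 * math.pi * 4 * t / n_samples) for t in range(n_samples)]
locked_b = [math.cos(2 * math.pi * 4 * t / n_samples + 0.7) for t in range(n_samples)]
unlocked = [math.cos(2 * math.pi * 7 * t / n_samples) for t in range(n_samples)]

plv_locked = plv(locked_a, locked_b)
plv_unlocked = plv(locked_a, unlocked)
```

For two sinusoids of the same frequency with a constant phase lag, this sketch returns a value close to 1; for sinusoids of different frequencies sampled over whole cycles, it returns a value close to 0.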
In this study, a cost threshold strategy was used to analyze global and nodal attributes of the functional brain network (FBN). The cost threshold should be greater than \(2*{\text{ln}}\left({\text{N}}\right)/N\), where N represented the number of nodes, to ensure that the small-world properties of FBNs were estimable30. Moreover, the resulting brain networks should have sparse properties and distinguishable properties compared to the degree-matched random networks. Thus, we selected the small-world regime as a range of cost thresholds (\( 34\% \le {\text{cost}} \le 73\% ,\;{\text{step}} = 1\% \)). The area under the curve for each attribute was then calculated across the range of cost thresholds and used in a subsequent analysis. Here, all the global and nodal attributes were calculated using the toolbox of BCT31. Global attributes include the global clustering coefficient (aCp), shortest path length (aLp), global efficiency (aEg), local efficiency (aEloc), aGamma, aLambda, and aSigma. Nodal attributes include the clustering coefficient (aNCp), nodal shortest path length (aNLp), nodal efficiency (aNe), nodal local efficiency (aNLe), and degree centrality (aDc). In this study, 56 global attributes of an FBN and 640 nodal attributes of 16 nodes were computed from the whole band and seven frequency subbands. Importantly, any features with missing values for any participant were removed. Finally, 48 global attributes and 526 node attributes were used for the subsequent analysis.
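The cost-threshold procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: the network is binarized by keeping the strongest fraction of edges, nodal degree stands in for the attributes that the study computed with the BCT toolbox, and the area under the curve is taken with the trapezoidal rule. The helper names and the 4-node toy matrix are hypothetical.

```python
import math


def min_cost(n_nodes):
    """Lower bound 2*ln(N)/N on the cost threshold, as given in the text."""
    return 2 * math.log(n_nodes) / n_nodes


def threshold_by_cost(weights, cost):
    """Keep the strongest `cost` fraction of edges and binarize the network."""
    n = len(weights)
    edges = sorted(
        ((weights[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    n_keep = round(cost * len(edges))
    adjacency = [[0] * n for _ in range(n)]
    for _, i, j in edges[:n_keep]:
        adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


def nodal_degree(adjacency):
    """Stand-in nodal attribute: number of edges attached to each node."""
    return [sum(row) for row in adjacency]


def auc_over_costs(weights, costs, attribute):
    """Trapezoidal area under each node's attribute-versus-cost curve."""
    values = [attribute(threshold_by_cost(weights, c)) for c in costs]
    auc = [0.0] * len(weights)
    for k in range(len(costs) - 1):
        width = costs[k + 1] - costs[k]
        for node in range(len(weights)):
            auc[node] += width * (values[k][node] + values[k + 1][node]) / 2
    return auc


# Hypothetical symmetric connectivity matrix for 4 nodes (not study data).
toy = [
    [0.0, 0.9, 0.8, 0.7],
    [0.9, 0.0, 0.3, 0.2],
    [0.8, 0.3, 0.0, 0.1],
    [0.7, 0.2, 0.1, 0.0],
]
toy_auc = auc_over_costs(toy, [0.5, 1.0], nodal_degree)

# Cost range reported in the text: 34% to 73% in steps of 1%.
study_costs = [c / 100 for c in range(34, 74)]
bound_16_nodes = min_cost(16)
```

For 16 nodes the lower bound evaluates to roughly 0.35, which is in line with the lower end of the reported cost range.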
Gut microbiota feature extraction
Through gene sequencing technology, microbiota markers from 171 species were obtained from all subjects. Among them, any microbiota marker that was missing in more than 85% of the participants was removed. Ninety-four microbiota markers were removed, and 77 gut microbiota markers were selected for the final analysis.
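A minimal sketch of this filtering rule is given below. It assumes that each marker is stored as a list with one entry per participant and that a missing observation is encoded as `None`; the function name, the encoding, and the toy table are illustrative assumptions rather than details from the study.

```python
def drop_sparse_markers(table, max_missing=0.85):
    """Keep markers whose fraction of missing values does not exceed max_missing."""
    kept = {}
    for name, values in table.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= max_missing:
            kept[name] = values
    return kept


# Hypothetical table with 20 participants per marker (not study data).
toy_table = {
    "marker_a": [1.0] * 20,              # never missing
    "marker_b": [None] * 18 + [2.0] * 2,  # missing in 90% of participants
    "marker_c": [None] * 17 + [3.0] * 3,  # missing in exactly 85%
}
filtered = drop_sparse_markers(toy_table)
```

Because the rule removes markers missing in more than 85% of participants, a marker missing in exactly 85% is retained in this sketch.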
Blood feature extraction
The white blood cell (WBC) count, neutrophil (NEU) count, lymphocyte (LYM) count, platelet (PLT) count and monocyte (MON) count were recorded from complete blood counts after routine blood tests. Four blood indicators of inflammation and immunity, including the neutrophil–lymphocyte ratio (NLR), platelet–lymphocyte ratio (PLR), monocyte–lymphocyte ratio (MLR) and systemic immune inflammation index (SIII), were calculated based on the cell counts described above. Moreover, the oxidative stress indicators, including superoxide dismutase (SOD), homocysteine and C-reactive protein (CRP) levels, were also detected in the collected serum. In total, we collected 12 blood features for the final analysis.
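The four derived indicators can be computed from the cell counts as sketched below. The three ratios follow directly from their names; for the systemic immune inflammation index the sketch assumes the commonly used definition, platelet count × neutrophil count / lymphocyte count, which the text does not spell out. The function name and the example counts are hypothetical.

```python
def inflammation_indices(neu, lym, plt, mon):
    """Derived inflammation indicators from complete blood count values."""
    return {
        "NLR": neu / lym,          # neutrophil-lymphocyte ratio
        "PLR": plt / lym,          # platelet-lymphocyte ratio
        "MLR": mon / lym,          # monocyte-lymphocyte ratio
        "SIII": plt * neu / lym,   # assumed definition: PLT x NEU / LYM
    }


# Hypothetical counts in 10^9 cells per litre (not study data).
indices = inflammation_indices(neu=4.0, lym=2.0, plt=250.0, mon=0.5)
```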
Statistical analysis
Statistical analyses were conducted using SPSS software version 22 (IBM). The comparison of the sex distribution between the two groups was performed using the χ² test. Comparisons of age and education years between the two groups were performed using a two-tailed two-sample t test. Unless specified otherwise, the significance of all tests was set to p < 0.05, or FDR-corrected p < 0.05.
Machine learning
We developed an integrated framework of machine learning to discriminate patients with SZ from HCs (Fig. 2). Briefly, the framework involved three phases: data preparation, model training, and independent model testing.
Data preparation
Data preparation included feature extraction and subject grouping. We extracted three types of biological features from fecal data, blood data, and EEG data, namely, gut microbiota features, blood features, and EEG features, respectively. Seventy-seven gut microbiota features, 12 blood features, and 574 EEG features were selected for the final analysis. The three types of biological features were used as input features for machine learning, either individually or in combination, to form four input feature sets. At this stage, we randomly split the set of participants into two groups, a training dataset and an independent testing dataset, at a ratio of 3:1. The training dataset was used to train the model parameters, and the independent testing dataset was used to evaluate the performance of the trained model.
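A 3:1 split of this kind can be sketched as follows. The text states only that the split was random; stratifying by diagnostic group, the fixed seed, and the function name `stratified_split` are assumptions made for this illustration.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Randomly split sample indices into train and test sets within each group."""
    rng = random.Random(seed)
    train, test = [], []
    for group in sorted(set(labels)):
        indices = [i for i, label in enumerate(labels) if label == group]
        rng.shuffle(indices)
        n_test = int(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Group sizes taken from the Participants section: 49 SZ and 50 HC.
labels = ["SZ"] * 49 + ["HC"] * 50
train_idx, test_idx = stratified_split(labels)
n_test_sz = sum(labels[i] == "SZ" for i in test_idx)
```

Fixing the seed keeps the held-out participants identical across all models, so that their test performance is compared on the same data.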
Model training
The specific details of the model training phase and independent model testing phase are shown in Fig. 3. The model training procedures included three steps: multi-feature selection algorithms, multi-classifier, and multi-cross validation methods.
Because some features are less effective, irrelevant, or redundant for classification, and too many features may cause “overfitting”, effective feature selection methods can be used to identify the discriminative features and facilitate disease classification and interpretation. Three feature selection algorithms, namely principal component analysis (PCA), recursive feature elimination (RFE) and analysis of variance (ANOVA), were applied with each classifier to observe their effect on classification.
A specific classification model directly based on multi-biological data is difficult to build due to heterogeneity. Therefore, the use of several machine learning methods to construct different classification models is meaningful. In this study, we used five different popular classifiers including support vector machine (SVM), random forest (RF), linear discriminant analysis (LDA), logistic regression (LR) and k-nearest neighbor (KNN), to determine the most suitable model and to evaluate classification performance based on single and combined biological features.
Multi-cross validation methods were used to analyze the training set, including tenfold, fivefold, threefold and leave-one-out methods, and to ensure that the sample size was sufficiently large to train the model and prevent overfitting caused by insufficient training. Several combinations of the aforementioned procedures were investigated for optimized data analysis. PCA and RFE feature selection algorithms were unable to be used due to the small dimension of blood features. As a result, 280 models were obtained based on four input feature sets, five classifiers, three feature selection algorithms and four cross validation methods. Model training in the second phase was performed with their application restricted to the training data set.
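The model count can be checked by enumerating the configurations. The sketch below assumes that "no feature selection" counts as a fourth selection option (the best model is reported as using none) and that the blood feature set skips PCA and RFE as stated above; this is one reading that reproduces the reported total, not a description of the NEURO-LEARN configuration.

```python
from itertools import product

feature_sets = ["gut microbiota", "blood", "EEG", "combined"]
classifiers = ["SVM", "RF", "LDA", "LR", "KNN"]
selectors = ["none", "PCA", "RFE", "ANOVA"]  # "none" is an assumed option
cv_methods = ["10-fold", "5-fold", "3-fold", "leave-one-out"]

configurations = [
    (features, clf, selector, cv)
    for features, clf, selector, cv in product(
        feature_sets, classifiers, selectors, cv_methods
    )
    # PCA and RFE were not applied to the low-dimensional blood features.
    if not (features == "blood" and selector in ("PCA", "RFE"))
]
n_models = len(configurations)
```

Under these assumptions, the full grid has 4 × 5 × 4 × 4 = 320 entries, and removing the 5 × 2 × 4 = 40 blood configurations that would use PCA or RFE leaves 280.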
Independent model testing
In the third phase, we used an independent testing dataset to estimate the generalizability of 280 models arising from the second phase. We utilized the metrics of accuracy, sensitivity and specificity to quantitatively estimate the performance of all the methods mentioned in this study. Moreover, we plotted receiver operating characteristic (ROC) curves and then calculated the area under the curve (AUC) for each classification situation to examine the possibility of correctly discriminating patients with SZ and HCs.
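The evaluation metrics named above can be computed as in the following sketch, which codes patients as 1 and controls as 0 and obtains the AUC as the fraction of patient-control pairs that the classifier scores in the correct order. The function names and the toy labels and scores are hypothetical and are not the study's predictions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity with patients coded as 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a patient outscores a control (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 12 patients and 12 controls with one error in each group.
toy_true = [1] * 12 + [0] * 12
toy_pred = [1] * 11 + [0] + [0] * 11 + [1]
toy_metrics = classification_metrics(toy_true, toy_pred)

# Toy decision scores for three patients followed by three controls.
toy_auc_value = roc_auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
```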
A permutation test was applied to evaluate the statistical significance of the classification results. In our analysis, we randomly permuted the labels of all samples 1000 times, and the p value was computed as the proportion of permutation accuracies that were no less than the accuracy obtained with the original data. The statistical significance was set to p < 0.05. All automatic classification work was performed using NEURO-LEARN (https://github.com/Raniac/NEURO-LEARN)32, which is a solution for collaborative pattern analysis of neuroimaging data.
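The permutation procedure can be sketched as below. The function takes a hypothetical callable `score_fn` that returns an accuracy for a given label vector; in a full analysis this callable would retrain and re-evaluate the classifier on the permuted labels, whereas the toy usage simply scores fixed predictions to keep the example short.

```python
import random


def permutation_p_value(score_fn, labels, observed, n_permutations=1000, seed=0):
    """Proportion of permuted-label scores that are no less than the observed score."""
    rng = random.Random(seed)
    shuffled = list(labels)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            count += 1
    # Some authors use (count + 1) / (n_permutations + 1) to avoid p = 0.
    return count / n_permutations


# Toy usage with fixed, perfectly accurate predictions (not study data).
toy_labels = [1] * 12 + [0] * 12
toy_predictions = list(toy_labels)


def toy_accuracy(labels):
    return sum(p == t for p, t in zip(toy_predictions, labels)) / len(labels)


p_value = permutation_p_value(toy_accuracy, toy_labels, observed=toy_accuracy(toy_labels))
```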
Results
Participants
The resulting data set comprised 99 participants, including 49 patients with SZ (mean [SD] age, 42.06 [12.48] years; 24 [49.0%] males) and 50 HCs (mean [SD] age, 41.70 [13.07] years; 23 [46.0%] males). No significant differences in age (t = 0.141, p = 0.888) or sex (t = 0.294, p = 0.769) were observed between the SZ and HC groups. See Table 1 for a detailed description of other characteristics.
Classification results and analysis
We used an independent testing dataset to estimate the generalizability of the 280 models. The classification performance of the tenfold, fivefold, threefold, and leave-one-out cross validation methods is reported in eTables 1–4 in “Supplementary S1”. No significant differences were observed among the results of the cross validation methods. Table 2 shows the classification performance of the models obtained using different input features with tenfold cross validation. The optimal classification performance was achieved when multi-biological features were combined as input features, with 91.7% accuracy, 91.7% sensitivity, 91.7% specificity, and 96.5% AUC. The performance of the classifier based on multi-biological features was better than that of the classifiers using a single type of biological feature (Fig. 4). Among the single types of biological features, the blood features achieved the best classification performance, with an accuracy of 83.3% and an AUC of 87.5%. When gut microbiota features, blood features, and EEG features were each used alone as the input feature set, the classifiers and feature selection algorithms of the optimal model were inconsistent, potentially due to the heterogeneity of biological data. The SVM, LR and RF classifiers without any feature selection algorithm displayed better classification performance when using combined features, with AUCs greater than 90%.
Discriminative features
In this subsection, the most informative features selected to differentiate the patients with SZ from HCs are reported. We discuss the most discriminative features from the optimal model that were generated when combined features were used. For quantitative analysis, the top 34 (5% of the total number of features) commonly selected features are summarized in Table 3, which shows the top 34 features for classification listed in descending order of their weights, including 14 gut microbiota features, 8 blood features, and 12 EEG features.
Discussion
To the best of our knowledge, this discriminative study of SZ is the first to combine multi-biological data of gut microbiota data, blood data, and EEG data. We developed an integrated framework of machine learning to discriminate patients with SZ from HCs. The main findings of this study are described below. (1) Using a combination of three types of biological features as input features for the classification, the best performance was achieved, with an accuracy of 91.7%, a sensitivity of 91.7%, a specificity of 91.7%, and an AUC of 96.5%. (2) The most discriminative features (top 5%) included gut microbiota features (Lactobacillus, Haemophilus, and Prevotella), blood features (superoxide dismutase level, monocyte-lymphocyte ratio, and neutrophil count), and EEG features (nodal local efficiency, nodal efficiency, and nodal shortest path length in the temporal and frontal-parietal areas).
In this study, we developed an integrated framework of machine learning using a combination of multi-biological data, which is a promising direction for the identification of biomarkers for the diagnosis, prognosis, and treatment of patients with SZ. The comparison of classification performance with existing studies is listed in Table 4. A recent study indicated that the diagnosis of SZ can be predicted with possible clinical utility by a computational machine learning algorithm using the combination of blood and cognitive biomarkers; more importantly, the integration of multi-biological data outperforms a single type of biological data14, consistent with our findings. Interestingly, an early SVM-based prediction of the later development of SZ in a familial high-risk cohort is possible and can be improved by combining schizotypal and neurocognitive features with neuroanatomical variables33. In summary, based on the integrated framework of machine learning, the combination of multi-biological data substantially improves the classification performance for patients with SZ. Our results revealed that the features from multiple biological datasets provided complementary information and can help to develop effective and objective biomarkers for the clinical diagnosis of SZ1.
To date, although numerous discriminative studies of SZ have used either blood-based data6,34,35 or neuroimaging data9,10,26,36,37, few studies have investigated the potential of biomarkers for the diagnosis of SZ using gut microbiota data. Based on accumulating evidence, the gut microbiota bidirectionally communicates with the central nervous system through the microbiome-gut-brain axis (MGBA), thereby influencing brain function and behavior38,39. Recently, a few studies have focused on the role of the MGBA in SZ and revealed several alterations in the gut microbiota in patients with SZ4,40,41,42. These reports of an altered gut microbiota are consistent with the findings of this study, in which the most informative features of the gut microbiota include Lactobacillus, Haemophilus, Collinsella, Clostridium, and Prevotella. Furthermore, Yuan et al.42 have shown that changes in the gut microbiota and its metabolites may cause neuronal damage. Lactobacillus stimulates TNF production; therefore, Lactobacillus may induce changes in inflammatory factors that induce SZ43. On the other hand, short-chain fatty acids (SCFAs), the primary bacterial metabolites produced, can enter the central nervous system through the blood–brain barrier (BBB)44. Clostridium is the main source of propionate in the gut, indicating that Clostridium may influence the BBB and act on the brain by regulating SCFAs. In addition, Collinsella has been shown to produce the proinflammatory cytokine IL-17a and to alter intestinal permeability by promoting the release of neurotransmitters produced by gut microbiota45, thereby acting on the central nervous system. Taken together, these investigations suggest that the gut microbiota may affect the central nervous system by acting on several pathways, providing a physiological basis for validating the use of the gut microbiota as a biomarker in the classification of the two groups.
Among the blood features we extracted, those that contributed the most to the classification included SOD level, MLR, MON count, NEU count, CRP level, WBC count and NLR, consistent with previous studies using conventional univariate statistical analysis. Numerous studies and increasing evidence suggest that oxidative stress contributes to the pathogenesis of SZ, and abnormalities in antioxidant enzymes, including SOD activity, are frequently observed in patients diagnosed with SZ46,47,48. A previous study49 indicated that SOD activity remained lower in patients with SZ and may be an important indirect biomarker of oxidative stress in individuals with SZ. The present findings provide additional evidence of increased oxidative stress in patients with SZ. Blood inflammatory and immune system abnormalities in patients with SZ have been widely reported, which lead to an increase in levels of inflammatory markers. The NEU count was reported to be increased in patients with chronic SZ50. An increased MON count has also been reported in patients with chronic SZ51,52. Furthermore, a moderately increased CRP level in patients with SZ compared to HCs has been observed53,54,55. Subjects with SZ have significantly elevated WBC counts. The MLR and NLR have recently been used as indicators of inflammation and as predictors of cardiovascular disease, the leading cause of death in patients with SZ. A recent meta-analysis revealed a significant increase in the NLR in patients with SZ56. Elevated MLR and NLR have been observed in patients with SZ, suggesting an increased inflammatory response in individuals with SZ50. Our experimental results are consistent with these studies.
Table 3 shows that the EEG features with heavy weight are primarily derived from the delta and alpha2 frequency bands and partly from the beta and gamma frequency bands. Previous investigators observed increases in delta and theta waves, decreases in alpha waves and increases in beta and gamma waves in individuals with SZ9,10,12,14,57,58. Moreover, the most prominent change was in the spectral power of the delta wave, which may support the development of a biological marker for diagnosing patients with SZ9,10,59. In addition, among these EEG features, node attributes including nodal local efficiency (aNLe), nodal efficiency (aNe), nodal clustering coefficient (aNCp), and degree centrality (aDc), contributed most to classifying patients with SZ. EEG studies have shown a disruption in the small-world attributes of patients with SZ in the resting state, with a lower clustering coefficient and a longer shortest path length60. In addition, global and local efficiency are lower in patients with SZ than in healthy people61. The most discriminative EEG features in Table 3 are primarily concentrated in the temporal lobe and partly in the frontal lobe. Abnormalities in temporal and frontal lobe function and structure have been widely reported in patients with SZ62. The frontal and temporal lobes are primarily associated with higher cognitive functions, among which the temporal lobe is associated with hearing and language functions, which have been confirmed by MRI studies63. These results are consistent with previous structural and functional neurological findings.
Limitations
The present study has several limitations. First, since this study employed a cross-sectional design, we cannot infer causality. Some evidence suggests that immune-inflammatory markers are altered from the beginning of SZ, and researchers have broadly accepted that inflammation plays a causal role in SZ. However, from a diagnostic perspective, this finding is irrelevant. A specific marker must only discriminate between two conditions, regardless of whether it is a cause, consequence, or correlate of the pathophysiological process. Second, a significant difference in education years was observed between the two participant groups, although the results remained unchanged when this factor was included as a covariate. Third, the sample size was moderate. A larger independent sample is essential to examine the reproducibility of our findings.
Conclusions
In conclusion, we developed an integrated framework of machine learning and used the combination of multi-biological data to discriminate patients with SZ from HCs, which substantially improved the classification performance. Based on our results, features from multiple biological datasets provide complementary information that aids in providing effective and objective biomarkers to inform the clinical diagnosis of SZ, and our framework is effective at conveying comprehensive and complementary information for the purpose of classification.
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2020YFC2004300, 2020YFC2004301, 2019YFC0118800, 2019YFC0118802, 2019YFC0118804, and 2019YFC0118805), the National Natural Science Foundation of China (31771074, 81802230, and 32000980), the Key Research and Development Program of Guangdong (2018B030335001, 2020B0101130020, and 2020B0404010002), the Guangdong Basic and Applied Basic Research Foundation Outstanding Youth Project (2021B1515020064), the Guangdong Basic and Applied Basic Research Foundation (2019A1515110427), the Key Platform and Scientific Research Project of Guangdong Provincial Education Department (2018KTSCX246), the Science and Technology Program of Guangzhou (201807010064, 201803010100, 201903010032, and 202103000032), the Key Laboratory Program of Guangdong Provincial Education Department (2020KSYS001), and the Scientific Research Project of Traditional Chinese Medicine of Guangdong (20211306). All procedures performed in studies involving human participants were conducted in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.
Author information
Contributions
P.K. wrote the main text of the manuscript. P.K., K.W., F.W., and D.X. designed the study. P.K., K.W., F.W., D.X., Z.P., and J.L. conducted experiments and analyzed the data. J.S., S.L. and X.C. edited the manuscript. Z.P. and J.Z. revised the manuscript. K.W. and F.W. obtained funding. J.Z., G.L., J.C., Y.N. and X.L. provided technical and material support. All authors reviewed and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ke, Pf., Xiong, Ds., Li, Jh. et al. An integrated machine learning framework for a discriminative analysis of schizophrenia using multi-biological data. Sci Rep 11, 14636 (2021). https://doi.org/10.1038/s41598-021-94007-9
DOI: https://doi.org/10.1038/s41598-021-94007-9