One of the overarching goals of the emerging field of precision psychiatry is to incorporate advanced technologies to provide an objective, data-driven, personalized approach to the diagnosis and treatment of mental disorders1,2. However, unlike other medical fields, there is an acknowledged ‘translational gap’ in psychiatry1,3. In parallel, the field of biological psychiatry, which aims to provide a neurobiological basis for current mental disorders, has produced contrasting results, even for pivotal biomarkers4. Hence, the diagnosis and clinical management of major mental disorders are still entirely based on psychopathological knowledge, while the treatment of mental disorders remains predominantly based on ‘trial and error’, albeit within the confines of fitting evidence-based prescription to a clinical profile5.

Over the past two decades, the field has witnessed a remarkable increase in interest in biomarkers for mental disorders6. In particular, the literature on non-genetic peripheral biomarkers has grown exponentially, with the publication of several systematic reviews and meta-analyses7,8,9,10,11,12. The identification and validation of biomarkers for mental disorders are thought to be crucial steps in the development of precision and biological psychiatry, and their ultimate incorporation into the current landscape of psychiatric care is expected to follow1. However, this growth in the literature is not translating into meaningful changes in clinical practice.

Several reasons may contribute to the contrast between the overall volume of this literature and the limited applicability of peripheral biomarkers in current psychiatric practice. For instance, it has been proposed that conventional psychiatric diagnoses based, for example, on the Diagnostic and Statistical Manual of Mental Disorders (DSM) may lack biological validity2,13. In this respect, it has been proposed that, similarly to genetic14 and neuroimaging15,16 biomarkers, alterations in peripheral biomarkers for major mental disorders may be shared across distinct diagnostic categories, and thus may have a trans-diagnostic nature6. However, what constitutes a trans-diagnostic construct in psychiatry remains debated, and no study has properly assessed the trans-diagnostic nature of any biomarker with a methodologically sound approach17.

In addition to the lack of consensus on how to define a trans-diagnostic construct, a core reason for this translational gap, even within a single disorder, may be the presence of several biases, including large heterogeneity, excess significance bias, and selective reporting of statistically significant (i.e., ‘positive’) findings without proper adjustment for multiple confounders. An umbrella review systematically evaluates and collects information from multiple systematic reviews and meta-analyses on all outcomes of a given topic for which these have been performed18. Umbrella reviews are particularly suited to uncover these biases19, as previously demonstrated with respect to peripheral biomarkers for depression20, bipolar disorder (BD)20, and schizophrenia21. However, those previous umbrella reviews addressed only studies that compared participants with a specific mental disorder to healthy controls, and not changes in peripheral biomarkers following treatment for these disorders. Moreover, those umbrella reviews focused on only one mental disorder each.

Thus, the current work provides a comprehensive umbrella review of meta-analyses of peripheral biomarkers for major mental disorders related to high prevalence and burden, namely Alzheimer’s disease (AD), autism spectrum disorder (ASD), BD, major depressive disorder (MDD), and schizophrenia, including the first-episode psychosis (FEP) stage. We aimed to re-assess the presence of bias in this literature and to identify the biomarkers supported by the most convincing evidence. In addition, we aimed to identify shared and unique alterations in biomarkers for those major mental disorders among those supported by either convincing or highly suggestive evidence. In the current analysis, we considered both studies that investigated abnormalities in peripheral biomarkers of mental disorders compared to controls (i.e., between-group meta-analyses) and ones that assessed alterations in the levels of peripheral biomarkers after treatment (i.e., within-group meta-analyses).


Literature search

We conducted an umbrella review, which is a systematic collection of multiple systematic reviews and meta-analyses on a specific research topic22. The PubMed/MEDLINE database was searched from inception to February 17, 2019 for all available meta-analyses of non-genetic peripheral biomarkers for major mental disorders. This search strategy was augmented through (1) handsearching the reference lists of included articles and (2) tracking citations of included articles through the Google Scholar database. The search string used in the current umbrella review was developed by a professional librarian and is available in the Supplementary Online material. The searches, screening, data extraction, and methodological quality appraisal were independently conducted by at least two investigators. Disagreements were resolved through consensus. When a consensus could not be reached, a third investigator (AFC) made the final decision. An a priori defined protocol was followed (available upon reasonable request to the corresponding author of the current manuscript).

Eligibility criteria

We included meta-analyses published in peer-reviewed journals that assessed and synthesized studies on peripheral biomarkers for adults with AD, ASD, BD, MDD, or schizophrenia, including FEP. We included studies in which biomarkers were assayed in participants with a specific mental disorder compared to controls (i.e., between-group meta-analyses), as well as ones which assessed changes in peripheral biomarkers in any of those disorders after treatment (i.e., within-group meta-analyses). Only studies published in English were considered for inclusion. This decision was made because most well-designed systematic reviews and meta-analyses are published in English. We included studies in which diagnoses of mental disorders were established by means of a validated structured interview based on standard diagnostic criteria such as the International Classification of Diseases (ICD) or the Diagnostic and Statistical Manual of Mental Disorders (DSM). We also considered studies in which a probable diagnosis of a major depressive episode was established through a validated screening questionnaire, as well as studies in which a diagnosis of FEP was based on clinical assessment by a mental health care provider. We excluded the following types of studies: (1) systematic reviews without a meta-analytic synthesis of the evidence; (2) animal studies; (3) studies of other types of biomarkers (for example, genetic biomarkers); (4) studies that included participants with two or more diagnoses; (5) studies that included participants with other primary psychiatric diagnoses (e.g. anxiety disorders); (6) studies that investigated biomarkers for other purposes (for example, biomarkers of risk, stage or prognosis)23; and (7) studies conducted in pediatric samples (except for ASD and FEP). In addition, (8) if there was more than one meta-analysis for the same biomarker in the same population, we considered only the largest meta-analysis (i.e., the one with the largest number of included individual studies).

Data extraction

For each eligible reference, we extracted the first author, year of publication, specific diagnoses assessed, as well as the number of included studies. We also extracted the summary effect size (ES) measure of each meta-analysis considering the ES used in each study. When available, the following variables were extracted at a study-level: number of cases, number of controls, sample size, ES, and study design. In each eligible reference, we only included the primary analyses due to the expected large amount of evidence. However, when included references provided details on the mood state of participants (e.g. mania or bipolar depression), we also extracted this information at an individual-study level.

Statistical analysis and methodological quality appraisal

Data were analyzed from March 1, 2019 to October 10, 2019. We estimated ESs and 95% confidence intervals (CIs) using both fixed and random-effects modeling24. Due to the anticipated high heterogeneity in meta-analyses of peripheral biomarkers for major mental disorders, random-effects estimates were considered in this review. When ESs were not provided as standardized mean difference (SMD) metrics (e.g., odds ratio), we converted the primary ESs to SMD25. We also estimated the 95% prediction interval, which accounts for between-study heterogeneity and assesses the uncertainty of the effect that would be expected in a new study addressing the same association26. For the largest study included in each meta-analytic estimate, we calculated the standard error (SE) of the ES. If the SE of the ES is <0.1, then the half-width of the 95% CI (1.96 × SE) will be <0.20 (i.e., less than the magnitude of a small ES). We calculated the I² metric to quantify between-study heterogeneity. Values ≥50% and ≥75% are indicative of large and very large heterogeneity, respectively27. To assess evidence of small-study effects, we used the asymmetry test developed by Egger et al.28. A P-value <0.10 in Egger’s test, together with the ES of the largest study being more conservative than the summary random-effects ES of the meta-analysis, was considered indicative of small-study effects20. We also annotated whether the association reported in each meta-analytic estimate was nominally significant at a P < 0.05 level as well as at a P < 0.005 level. The level of P < 0.005 has been proposed as a more stringent level of significance that could increase the reproducibility of many fields29.
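The pooling steps described above can be illustrated with a minimal Python sketch. It is not the authors' Stata code: it assumes the DerSimonian-Laird estimator for the random-effects model, the logistic approximation SMD = ln(OR) × √3/π for the odds-ratio conversion, and a prediction interval based on a t distribution with k − 2 degrees of freedom. The function names are hypothetical, and the caller supplies the t critical value (`t_crit`) so that the sketch needs only the standard library.

```python
import math
from statistics import NormalDist


def odds_ratio_to_smd(odds_ratio):
    """Convert an odds ratio to an SMD (assumed logistic approximation)."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def random_effects_dl(effects, std_errors):
    """Pool study-level SMDs with the DerSimonian-Laird random-effects model."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are needed")
    w = [1.0 / se ** 2 for se in std_errors]            # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                  # between-study variance
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    z = NormalDist().inv_cdf(0.975)
    return {
        "pooled": pooled,
        "se": se_pooled,
        "ci": (pooled - z * se_pooled, pooled + z * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }


def prediction_interval(pooled, se_pooled, tau2, t_crit):
    """95% prediction interval; t_crit is the 97.5th percentile of t with k - 2 df."""
    half_width = t_crit * math.sqrt(tau2 + se_pooled ** 2)
    return pooled - half_width, pooled + half_width
```

With two studies reporting SMDs of 0.0 and 1.0 (SE = 0.1 each), this sketch yields I² of about 98%, which would fall in the "very large heterogeneity" band defined above.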

We also determined whether the meta-analysis had a statistical power ≥ 80% to detect either a small (i.e., ES ≥ 0.2) or a medium (i.e., ES ≥ 0.5) ES. We used the method described in detail elsewhere30. Finally, we also assessed evidence of excess significance bias with the Ioannidis test31. Briefly, this test estimates whether the number of studies with nominally significant results (i.e., P < 0.05) among those included in a meta-analysis is too large considering their power to detect significant effects at an alpha level of 0.05. First, the power of each study is estimated with a non-central t distribution. The sum of all power estimates provides the expected (E) number of datasets with nominal statistical significance. The actual observed (O) number of statistically significant datasets is then compared to the E number using a χ²-based test31. Since the true ES of a meta-analysis cannot be precisely determined, we considered the ES of the largest dataset as the plausible true ES. This decision was based on simulations indicating that the most appropriate assumption is the ES of the largest dataset included in the meta-analysis32. Excess significance for a single meta-analysis was considered present if P < 0.10 in Ioannidis’s test and O > E20. We graded the credibility of each association according to the following categories: convincing (class I), highly suggestive (class II), suggestive (class III), weak evidence (class IV), and non-significant associations (Table S1).
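The excess significance test described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: study power is approximated with a normal distribution instead of the non-central t distribution used in the paper, the SE of an SMD is approximated as √(1/n₁ + 1/n₂), and all function names are hypothetical. The χ² statistic with one degree of freedom is taken as (O − E)²/E + (O − E)²/(n − E).

```python
import math
from statistics import NormalDist

_NORM = NormalDist()


def study_power(true_es, n_cases, n_controls, alpha=0.05):
    """Approximate two-sided power of one case-control study (normal approximation)."""
    se = math.sqrt(1.0 / n_cases + 1.0 / n_controls)   # approximate SE of an SMD
    z_crit = _NORM.inv_cdf(1 - alpha / 2)
    shift = abs(true_es) / se
    return _NORM.cdf(shift - z_crit) + _NORM.cdf(-shift - z_crit)


def excess_significance(observed, powers):
    """Return (expected, chi-square statistic, P-value) for observed vs expected counts."""
    n = len(powers)
    expected = sum(powers)                             # E: sum of study powers
    diff2 = (observed - expected) ** 2
    stat = diff2 / expected + diff2 / (n - expected)
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return expected, stat, p_value


def has_excess_significance(observed, powers, threshold=0.10):
    """Apply the rule stated in the text: P < 0.10 and O > E."""
    expected, _, p_value = excess_significance(observed, powers)
    return p_value < threshold and observed > expected
```

In practice, `true_es` would be set to the ES of the largest dataset in each meta-analysis, as stated above.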

For associations supported by either class I or class II evidence, we used credibility ceilings, which is a method of sensitivity analysis that accounts for potential methodological limitations of observational studies that might lead to spurious precision of combined effect estimates. In brief, this method assumes that every observational study has a probability c (credibility ceiling) that the true ES is in a different direction from the one suggested by the point estimate33. The pooled ESs were estimated considering a wide range of credibility ceilings. All analyses were conducted in STATA/MP 14.0 (StataCorp, USA) with the metan package.
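One way to read the credibility ceiling idea is sketched below. This is an assumption-laden illustration and not the procedure run in Stata: each study's SE is inflated, where necessary, so that the probability that its effect lies in the observed direction does not exceed 1 − c, and the studies are then re-pooled. For brevity the re-pooling here uses simple inverse-variance weights; a random-effects model could be substituted. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_NORM = NormalDist()


def apply_credibility_ceiling(effects, std_errors, ceiling):
    """Inflate SEs so that P(effect in observed direction) <= 1 - ceiling per study."""
    if ceiling == 0:
        return list(std_errors)                # no ceiling: SEs are unchanged
    if not 0 < ceiling < 0.5:
        raise ValueError("ceiling must lie in [0, 0.5)")
    z_max = _NORM.inv_cdf(1 - ceiling)         # largest |z| a single study may claim
    return [max(se, abs(y) / z_max) for y, se in zip(effects, std_errors)]


def pooled_z(effects, std_errors):
    """Inverse-variance pooled estimate divided by its SE (simplified re-pooling)."""
    w = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return pooled / math.sqrt(1.0 / sum(w))


def survives_ceiling(effects, std_errors, ceiling, alpha=0.05):
    """True if the re-pooled estimate stays nominally significant under the ceiling."""
    capped = apply_credibility_ceiling(effects, std_errors, ceiling)
    return abs(pooled_z(effects, capped)) > _NORM.inv_cdf(1 - alpha / 2)
```

Under this reading, a single precise study can never remain significant on its own at a 10% ceiling, because its z-value is capped at about 1.28; an association survives only when several studies agree.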

The methodological quality of included systematic reviews and meta-analyses was also appraised using the Assessment of Multiple Systematic Reviews (AMSTAR) instrument, which has been validated for this purpose34,35. The AMSTAR tool involves dichotomous scoring (i.e. 0 or 1) of 11 items that assess the methodological rigor of systematic reviews and meta-analyses (e.g., comprehensive search strategy, publication bias assessment). Scores therefore range from 0 to 11, with higher scores indicating greater quality. AMSTAR scores are graded as high (8–11), medium (4–7) and low quality (0–3)34.
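The AMSTAR grading rule stated above maps directly onto a small helper. The sketch below only encodes the cut-offs given in the text (high 8–11, medium 4–7, low 0–3); the function name and the list-of-items input format are illustrative assumptions.

```python
def amstar_grade(item_scores):
    """Sum 11 dichotomous AMSTAR items and return (total, quality grade)."""
    if len(item_scores) != 11 or any(s not in (0, 1) for s in item_scores):
        raise ValueError("expected 11 items scored 0 or 1")
    total = sum(item_scores)
    if total >= 8:
        grade = "high"
    elif total >= 4:
        grade = "medium"
    else:
        grade = "low"
    return total, grade
```

For example, a review meeting 8 of the 11 items would be graded as high quality.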


Our search strategy identified 1161 unique references, of which 991 were excluded after title/abstract screening and 170 underwent full-text review (Fig. 1). Of these, 110 references met inclusion criteria7,8,9,10,11,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139, and 60 references were excluded with reasons (Table S2). In the 110 included references, there were 81 between-group meta-analytic estimates for MDD, 79 for AD, 62 for schizophrenia, 45 for ASD, 37 for BD, and 15 for FEP. In addition, there were 25 within-group meta-analytic estimates for MDD, 13 for schizophrenia, and 2 for BD (mania) (Table S3). In total, there were 247,678 biomarker measurements in cases and 476,340 in controls across between-group meta-analyses, while there were 9298 biomarker measurements across within-group meta-analytic estimates (Table S3). One hundred and ninety meta-analytic estimates were statistically significant at a P-value < 0.05, whilst 109 were significant at a P-value < 0.005 (Table S3).

Fig. 1 Study flowchart.

Power of meta-analyses

Fifteen between-group meta-analytic estimates had an estimated power >0.8 to detect a small ES, and 145 meta-analyses (126 between-group meta-analyses) had an estimated power >0.8 to detect a medium ES (Table S3).

Heterogeneity and prediction intervals

No evidence of large heterogeneity (i.e., I² < 50%) was found in 65 meta-analyses (18.1%), whilst 294 (81.9%) meta-analytic estimates had evidence of large heterogeneity (i.e., I² ≥ 50%). The prediction interval crossed the null value in 341 (94.9%) meta-analytic associations, while prediction intervals of 20 (5.0%) meta-analyses did not cross the null value (Table S3).

Small-study effects and excess significance bias

Evidence of small-study effects, which is an indication of publication bias, was observed in 38 (10.6%) meta-analyses, whilst evidence of excess significance bias was found in 74 (20.6%) meta-analytic estimates (Table S3).

Grading of the evidence

Only 2 (0.5%) meta-analytic estimates exhibited class I evidence87,123. In euthymic BD participants there was an increase in basal cortisol awakening levels (Hedges’ g = 0.25; 95% CI: 0.15–0.35, P < 0.005) compared to controls87. Participants with schizophrenia presented decreased vitamin B6 (pyridoxal) levels relative to controls123. In addition, 42 (11.7%) meta-analytic estimates were supported by class II evidence, of which 3 were derived from within-group meta-analyses (Table 1). Among those estimates, C-reactive protein (CRP) levels were increased in euthymic BD, bipolar mania, and in MDD relative to controls80,102. In addition, soluble interleukin-(IL)-2 receptor (sIL-2R) levels were increased in MDD and in schizophrenia relative to controls7,8. Moreover, levels of antibodies against the N-methyl-d-aspartate receptor (NMDA-R) were elevated in BD and in schizophrenia relative to controls85. Brain-derived neurotrophic factor (BDNF) levels were decreased in AD and in MDD44,110. Furthermore, levels of insulin-like growth factor-1 (IGF-1) were elevated in bipolar mania and in MDD relative to controls84. The remaining findings supported by class II evidence were unique to a single disorder (Table 1).

Table 1 Peripheral biomarkers supported by convincing and highly suggestive evidence across major mental disorders.

Of the 44 biomarkers supported by either class I or class II evidence, 37 (84.1%) survived 10% credibility ceilings (Table 2).

Table 2 Sensitivity analysis using credibility ceilings for the meta-analyses investigating the associations between biomarkers and Alzheimer disease, autism, bipolar disorder, depression, first episode psychosis, schizophrenia.

Qualitative methodological appraisal of eligible meta-analyses

Qualitative methodological appraisal of eligible meta-analyses with the AMSTAR tool revealed that 49 references were classified as high, 58 as medium, and 3 as low methodological quality (Table S4). The overall methodological quality of included references was high according to the AMSTAR (median: 8; IQR = 2 [7–9]) (Table S4).


Our umbrella review provided an updated synthesis of the literature on non-genetic peripheral biomarkers for major mental disorders. We included data from 733,316 biomarker measurements. However, in this vast literature only two associations met a priori defined criteria for convincing evidence, whilst 42 meta-analytic estimates met criteria for highly suggestive evidence. This collaborative effort found compelling evidence that, overall, the literature on non-genetic peripheral biomarkers has a high prevalence of different types of bias. In addition, this umbrella review provides relevant insights for the conduct of further studies to investigate the associations supported by the most convincing evidence. It should also be noted that the overall methodological quality of eligible meta-analyses, as assessed with the AMSTAR tool, was high, which provides further credibility to our quantitative grading of findings.

Associations supported by convincing evidence merit discussion. First, euthymic participants with BD exhibited a high cortisol awakening response relative to controls87. This finding indicates that the hypothalamic–pituitary–adrenal (HPA) axis is disrupted in BD on a trait-like basis. This suggests that the HPA axis could be targeted in BD140 to improve cognitive function, which may be compromised even during euthymic states141,142. In addition, participants with schizophrenia exhibited decreased vitamin B6 (pyridoxal) levels compared to controls123. This suggests that individuals with schizophrenia may present aberrations in the one-carbon cycle, in which pyridoxal is a main metabolic component. An alternative explanation might be the poor nutrition that frequently affects people with schizophrenia98. This finding is consistent with a recent systematic review and meta-analysis which provided preliminary evidence that adjunctive pharmacological interventions targeting the one-carbon cycle may improve negative symptoms in schizophrenia (although the clinical significance of this improvement may remain questionable)143, and aligns with recent evidence showing that adjunctive treatment with B-vitamins may improve symptomatic outcomes in the treatment of psychotic disorders144,145.

Importantly, only five biomarkers were found to be significantly associated with more than one mental disorder. Also, the highest class of evidence for these biomarkers was II. Moreover, no study applied a methodologically sound approach to assess the trans-diagnostic nature of any biomarker17. We found peripheral elevation of the acute phase reactant C-reactive protein (CRP) in BD (both during euthymia and mania) as well as in MDD, providing evidence that these disorders are at least partly associated with peripheral inflammation. In addition, sIL-2R levels were increased in both MDD and schizophrenia relative to controls. It is noteworthy that IL-2 is a key cytokine involved in the development, survival and function of regulatory T cells (TRegs)146,147, and it has been recently proposed that aberrations in “fine tuning” immune-regulatory mechanisms may contribute to the pathophysiology of both MDD and schizophrenia148,149. Antibodies against the NMDA-R were increased in BD and schizophrenia. This finding is consistent with the existence of autoantibodies against the GluN1 subunit of this receptor in patients with psychotic manifestations150,151. Furthermore, lower serum BDNF levels were observed in participants with MDD and AD relative to controls. This finding is consistent with the “neurotrophic hypothesis” of depression152, while parallel lines of evidence suggest that aberrations in BDNF signaling may contribute to neurodegeneration in AD153. Finally, higher levels of IGF-1 were observed in bipolar mania and MDD compared to controls. This finding is consistent with the modulatory role of glucose-related signaling, including the trophic molecule IGF-1, in hippocampal plasticity154. In addition, preclinical evidence suggests that IGF-1 may be involved in the pathophysiology of affective disorders155,156.

There is an emerging body of literature investigating the putative role of non-genetic peripheral biomarkers for the prediction of treatment response in major mental disorders. Surprisingly, no such biomarkers met criteria for convincing evidence, while only three biomarkers met criteria for class II evidence. Adiponectin levels in schizophrenia decreased after treatment with second-generation antipsychotics. This is an interesting finding since hypoadiponectinemia has been associated with a wide range of metabolic diseases, which are common untoward effects of these drugs157,158. In addition, IL-6 levels decreased after treatment with antidepressants. These data are consistent with preclinical findings which show that antidepressants have anti-inflammatory properties and may also inhibit M1 microglia polarization159. Finally, lipid peroxidation markers increased after antidepressant drug treatment for MDD.

It is worth noting that only 15 meta-analytic estimates had a power >0.80 to detect a small ES. In addition, previous umbrella reviews indicate that the vast majority of peripheral biomarker studies are substantially underpowered20. This may undermine the progress and reliability of this particular field, and of neuroscience in general, through the generation of spurious findings160. The “true” ESs of most non-genetic peripheral biomarkers may be expected to be small, similarly to those reported in the genetic literature. Therefore, the design of large, multicenter studies with an open pre-registered protocol, or the creation of consortia, may be a crucial step to assess the role of peripheral biomarkers in the diagnosis and treatment of major mental disorders within the framework of precision psychiatry1, following the model adopted by the ENIGMA neuroimaging group161 or by other large collaborative initiatives162. Likewise, the creation of biomarker scores using a rationale similar to that used to generate polygenic risk scores may ultimately be a next step in this field.

Strengths and limitations

It should also be noted that large statistical heterogeneity was found in most included meta-analytic estimates (81.9%). Although this is considered a relevant indicator of bias in this literature, it may also reflect genuine heterogeneity, which may occur both within and between major diagnostic categories163. In addition, methodological differences between the individual studies included in the assessed meta-analyses may also contribute to heterogeneity. Those include, for example, the time of sample selection as well as measurement properties of the assays (e.g. intra-assay and inter-assay coefficients of variation). Guidelines to standardize the collection and measurement of peripheral biomarkers in psychiatry have been recently proposed164. Furthermore, differences in sample selection across individual studies might have contributed to the observed heterogeneity in some meta-analytic estimates. For example, illness stage and disorders in which mixed presentations are common (e.g., bipolar disorder) might have contributed to heterogeneity across some included meta-analyses. In addition, approaches to subtype major mental disorders according to frameworks such as the NIMH Research Domain Criteria may help to decrease the heterogeneity of this literature in the future through the study of biologically valid and more homogeneous phenotypes13,163,165.


This umbrella review of non-genetic peripheral biomarkers for major mental disorders revealed that this literature is fraught with several biases and is underpowered. Nevertheless, two associations supported by convincing evidence and 42 associations supported by highly suggestive evidence were verified. Most associations supported by either convincing or highly suggestive evidence pertained to a single disorder. Future multi-centric studies with a priori publicly available protocols, with an ad-hoc methodology to assess the trans-diagnostic nature of biomarkers17, as well as the subtyping of these disorders into more biologically valid phenotypes, and enough statistical power may improve the reliability and reproducibility of this field, which is of relevance for the translation of biological and precision psychiatry into practice.