Identification of potential plasma protein biomarkers for bipolar II disorder: a preliminary/exploratory study

Diagnostic peripheral biomarkers are still lacking for bipolar II disorder (BD-II). We used isobaric tags for relative and absolute quantification (iTRAQ) technology to identify five upregulated candidate proteins [matrix metallopeptidase 9 (MMP9), phenylalanyl-tRNA synthetase subunit beta (FARSB), peroxiredoxin 2 (PRDX2), carbonic anhydrase 1 (CA-1), and proprotein convertase subtilisin/kexin type 9 (PCSK9)] for the diagnosis of BD-II. We analysed the differences in the plasma levels of these candidate proteins between BD-II patients and controls (BD-II, n = 185; Controls, n = 186) using ELISA. To establish a diagnostic model for the prediction of BD-II, the participants were divided randomly into a training group (BD-II, n = 149; Controls, n = 150) and a testing group (BD-II, n = 36; Controls, n = 36). In the training group, the levels of all five proteins were significantly higher in BD-II patients than in controls. Logistic regression was used to form a composite probability score of the five proteins in the training group. Receiver-operating characteristic curve analysis revealed the diagnostic validity of the probability score [area under curve (AUC) = 0.89, P < 0.001]. The composite probability score of the testing group also showed good diagnostic validity (AUC = 0.86, P < 0.001). We propose that plasma levels of PRDX2, CA-1, FARSB, MMP9, and PCSK9 may be associated with BD-II as potential biomarkers.


Aim of the study
In the present study, we aimed to identify candidate plasma proteins associated with BD-II using iTRAQ in an initial small group of participants. We then explored the association of the identified candidate proteins with BD-II in a larger sample. In addition, we evaluated whether these proteins could serve as plasma biomarkers or form a diagnostic model to assist in BD-II diagnosis. To examine the validity of the diagnostic model using candidate proteins, the participants (BD-II and controls) were divided into training and testing groups.

Results
We recruited 185 patients with BD-II and 186 controls. Categories of mood states were based on clinical evaluation according to the HAMD and YMRS rating scales without applying duration criteria: euthymic (HAMD-17 and YMRS < 8), depressive (HAMD-17 > 8 and YMRS < 8), hypomanic (YMRS > 7 and HAMD-17 < 8) and mixed state (HAMD-17 > 7 and YMRS > 7) 15 . Of all patients, 23 were in a depressive state, 16 were in a hypomanic state, 138 were in a mixed state and 8 were in a euthymic state. To build a diagnostic model using the identified proteins, all participants were randomly assigned to the training and testing groups using SPSS. We selected 36 participants from each of the BD-II and control groups (about 20% of all recruited participants) as the testing group; the remaining 80% of the participants formed the training group. Table 1 shows the clinical characteristics of all participants, listed as training and testing groups. All the recruited patients were first diagnosed with BD-II with no prior treatment for bipolar disorder. All patients were recruited in the morning, and blood samples were collected between 9 am and noon. We did not restrict inclusion to particular mood states. At inclusion, the mean HAMD score was 12.7 ± 4.2 and the mean YMRS score was 11.9 ± 3.6. The mean age of onset of BD-II was 15.3 ± 5.7 years.
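The random split described above was carried out in SPSS. As a rough illustration only, the same idea (drawing 36 participants per group for testing and leaving the rest for training) can be sketched in Python. The function name `split_group`, the participant identifiers, and the seed are hypothetical and are not taken from the study.

```python
import random


def split_group(ids, n_test, seed):
    """Randomly draw n_test identifiers for testing; the rest form the training set.

    Illustrative analogue of the SPSS random sampling step, not the authors' procedure.
    """
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    test = set(rng.sample(ids, n_test))  # sample without replacement
    train = [i for i in ids if i not in test]
    return train, sorted(test)


# Hypothetical identifiers: 185 BD-II patients and 186 controls, 36 of each for testing.
bd_train, bd_test = split_group([f"BD{i:03d}" for i in range(185)], 36, seed=1)
hc_train, hc_test = split_group([f"HC{i:03d}" for i in range(186)], 36, seed=1)
print(len(bd_train), len(bd_test), len(hc_train), len(hc_test))
```

Sampling within each diagnostic group separately keeps the group sizes fixed, but, as the Discussion notes, it does not by itself balance age or gender between patients and controls.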
The ELISA results showed that all plasma protein levels were significantly higher in patients with BD-II than in controls in both the training and testing groups (Table 1), except that the PRDX2 level did not differ significantly between BD-II and controls in the testing group. In addition, significant differences in PRDX2, CA-1 and MMP9 levels between males and females were found in the BD-II group. In the control group, significant differences in CA-1, FARSB and MMP9 levels were found between genders (Supplement Table 1). Because of these differences, we included age and gender as covariates in the logistic regression testing for predictors of BD-II.
The correlation between plasma levels of proteins and mood severity is shown in Table 2. The level of FARSB negatively correlated with the HAMD (P = 0.005) and YMRS scores (P < 0.001).
Using the training group, we performed logistic regression to generate a composite probability score (a combination of 5 protein levels) to predict the diagnosis of BD-II (Table 3). We found that the levels of CA-1, FARSB, MMP9, and PCSK9 may predict the diagnosis of BD-II using logistic regression. ROC curve analysis showed that the composite probability score combining MMP9, FARSB, PRDX2, CA-1, and PCSK9 could differentiate BD-II from controls with an AUC of 0.89 (P < 0.001, 95% CI = 0.86-0.93) (Fig. 1a). To replicate the diagnostic validity of the probability score in the aforementioned logistic regression, we further computed the probability score of the five proteins of the testing group using the intercept and B values from the logistic regression in Table 3. The ROC analysis of this computed probability score showed an AUC of 0.86 (P < 0.001, 95% CI = 0.77-0.91) (Fig. 1b).
Table 1. Clinical and demographic data of patients with bipolar II disorder (BD-II) and healthy controls. We used t-test for continuous variables and chi-square test for categorical variables. HAMD Hamilton depression rating scale, YMRS Young manic rating scale, N/A not available, SD standard deviation. *P < 0.05; **P < 0.01. a Calculated using transformed data (Log10 transformation of original data for normal distribution).
www.nature.com/scientificreports/
Pathway enrichment analysis was conducted for upregulated and downregulated protein candidates, respectively, as shown in Supplementary Figure 2a. The upregulated protein candidates were significantly enriched in critical biological processes, including hydrogen peroxide, cellular oxidant detoxification, and innate immune response. In addition, the downregulated proteins were significantly involved in platelet degranulation, retina homeostasis, actin filament organisation, and muscle filament sliding (Supplementary Figure 2b).
The study had a power of approximately 0.40 to detect a small effect, and 0.99 to detect medium and large effects for independent t-test for the training group (N = 299) by setting a small effect size = 0.2, medium effect size = 0.5, and large effect size = 0.8 and alpha = 0.05 16 . For the regression model, the training group (N = 299) had a power of approximately 0.68 to detect a small effect, and 0.99 to detect medium and large effects by setting a small effect size = 0.02, medium effect size = 0.15, and large effect size = 0.35 (alpha = 0.05) 16 .
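The t-test power figures above can be roughly checked by hand. The sketch below uses a normal approximation to the two-sample t-test rather than the noncentral t distribution that G*Power uses, so values will differ slightly from the reported ones. The function name `two_sample_power` is hypothetical, and the group sizes 149 and 150 are the training-group sizes reported in this article.

```python
from statistics import NormalDist


def two_sample_power(d, n1, n2, alpha=0.05):
    """Approximate two-sided power of an independent-samples t-test.

    Normal approximation (assumption): the test statistic is treated as
    N(d * sqrt(n1 * n2 / (n1 + n2)), 1) under the alternative hypothesis.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # two-sided critical value
    shift = d * (n1 * n2 / (n1 + n2)) ** 0.5     # standardised mean difference
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Small, medium and large effect sizes as stated in the text.
for d in (0.2, 0.5, 0.8):
    print(d, round(two_sample_power(d, 149, 150), 3))
```

Under this approximation, the small-effect power comes out close to the reported value of about 0.40, and the medium and large effects are close to 1.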

Discussion
In the current study, we found that plasma levels of PRDX2, CA-1, FARSB, MMP9, and PCSK9 may discriminate patients with BD-II from controls. In addition, the combination of these proteins (MMP9, PCSK9, FARSB, CA-1, and PRDX2) may distinguish BD-II from controls. The identified proteins have not been reported to be associated with BD-II in previous studies. Although most of the proteins are related to the inflammatory process, the underlying mechanisms by which these proteins are involved in the pathogenesis of BD-II require further investigation.
PRDX2 is a 2-Cys antioxidant enzyme of the peroxiredoxin family and is abundantly expressed in mammalian cells. It protects cells from oxidative stress because it may scavenge H2O2 and ROS 17 . PRDX2 protein is essential for redox balance and prolongs cell lifespan 18 . PRDX2 overexpression has been reported in gastric cancer 18 and correlated with the progression of colon, cervical, and lung cancers. We are the first to report an association between increased PRDX2 protein levels and BD-II. Moreover, PRDX2 may assist with the prediction of BD-II. Further studies are warranted to clarify the underlying mechanism of PRDX2 involvement in BD-II and the role of oxidative stress in it.
We also found an increase in the levels of CA-1 and FARSB in BD-II patients compared to controls. CA-1 is an enzyme that catalyses the reversible hydration of carbon dioxide 19 . It is found primarily in the red blood cells, the colonic epithelium, and neutrophils 20 . Association of CA-1 with gastrointestinal inflammation has been derived from previous data that showed decreased CA-1 protein in the inflamed mucosa of ulcerative colitis 21 . Significantly increased expression of CA-1 was found in aortic lesions, particularly in calcified regions 22 . CA-1 has also been proposed as a biomarker for the diagnosis of non-small cell lung cancer, as it was found to be highly expressed in serum 23 . Although CA-1 has never been associated with any mental disorder, it was reported to mediate cerebral vascular permeability; its pathomechanism in BD-II still requires further investigation.
Table 2. Correlation between protein levels and mood severity in BD-II (training group). HAMD Hamilton depression rating scale, YMRS Young manic rating scale. r: Pearson's correlation coefficient. *P < 0.05; **P < 0.01. a Calculated using transformed data (Log10 transformation of original data for normal distribution).
Table 3. Logistic regression using protein levels to predict diagnosis of BD-II (training group). Reference group: normal control. Constant: 33.732. Logistic regression was used to form a probability score (combination of 5 protein levels) to predict diagnosis of BD-II. The probability score for the testing group is calculated in the following way: Logit_probability = 33.732 + (−0.005) * PRDX2 + (−3.6) * CA1 + (−4.0) * FARSB + (−2.9) * MMP9 + (−5.2) * PCSK9. Probability score = (2.718281728 ** logit_probability)/(1 + 2.718281728 ** logit_probability). *P < 0.05; **P < 0.01. a Calculated using transformed data (Log10 transformation of original data for normal distribution).
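The probability score formula given in the Table 3 caption can be written as a short function. This is a minimal sketch, not the authors' code: the intercept and B values are copied from the caption, while the function name `probability_score`, the dictionary layout, and the example input values are hypothetical. The caption does not state the units of the inputs (the footnote suggests Log10-transformed levels), how the age and gender covariates enter the score, or which outcome is coded as 1, so those points are left as assumptions. The code uses the exact exponential in place of the constant printed in the caption.

```python
import math

# Intercept and B values as printed in the Table 3 caption.
INTERCEPT = 33.732
COEFFICIENTS = {
    "PRDX2": -0.005,
    "CA1": -3.6,
    "FARSB": -4.0,
    "MMP9": -2.9,
    "PCSK9": -5.2,
}


def probability_score(levels):
    """Logistic transform of the linear predictor from the Table 3 caption.

    `levels` maps protein name to the value on the scale used in the
    regression (assumed here to be Log10-transformed plasma levels).
    """
    logit = INTERCEPT + sum(COEFFICIENTS[name] * levels[name] for name in COEFFICIENTS)
    # Numerically stable form of exp(logit) / (1 + exp(logit)).
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Made-up input values, for illustration only.
example = {"PRDX2": 2.0, "CA1": 2.0, "FARSB": 2.0, "MMP9": 2.0, "PCSK9": 2.0}
print(probability_score(example))
```

Because every coefficient is negative, the score decreases as any protein level rises, so the direction of the outcome coding needs to be confirmed against the original table before the score is interpreted.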
FARSB catalyses the synthesis of phenylalanyl-tRNAPhe (Phe-tRNAPhe). Mutation of the FARSB gene leads to a multisystem disease among the aminoacyl-tRNA synthetase-related diseases, characterised by the following clinical features: brain calcifications, cerebral aneurysms, interstitial lung disease, and cirrhosis 24 . A trait marker is enduring and may reflect the underlying pathophysiology of the disease, while a state marker reflects clinical manifestation and may be changeable. We found that the FARSB level correlated significantly and negatively with HAMD and YMRS scores. Therefore, although FARSB levels may predict BD-II, FARSB may be a state rather than a trait biomarker. As a state marker, FARSB may reflect treatment response in BD-II, which is clinically important as well. However, since no previous studies have reported how FARSB proteins are expressed in mental disorders, the mechanism underlying the correlation between FARSB level and clinical symptoms of BD-II requires further investigation. Because most of the patients recruited in this study were in a mixed episode, it will be of interest to explore whether significantly higher levels of FARSB are found in euthymic BD-II patients compared to controls, which may support its role as a trait marker.
Our finding of increased MMP9 partially agrees with a previous study that reported increased MMP9 protein levels in non-differentiated BD compared to controls; this report suggested that MMP9 is a staging marker 25 . MMP9 may increase the inflow of cytokines into the central nervous system (CNS) by increasing the permeability of the blood-brain barrier (BBB) 26 . The increased inflow may exert neurotoxic and gliotoxic effects, which further contribute to an inflammatory process in the CNS. Since CNS inflammation and neurodegeneration may be involved in the pathogenesis of BD 27 , the current finding of increased plasma MMP9 levels in BD-II may provide further support, from the inflammation aspect, for the association between BD and inflammation.
We found an increase in the plasma levels of PCSK9 in BD-II compared to controls. PCSK9 is a hepatic enzyme that may modulate the metabolism and homeostasis of plasma cholesterol. PCSK9 promotes the degradation of the receptors for low-density lipoprotein (LDL) and very-low-density lipoprotein (VLDL) by binding to these receptors 28 . An increase in the level of PCSK9 may therefore result in the degradation of the LDL and VLDL receptors, thereby elevating serum LDL-cholesterol levels and increasing the risk for cardiovascular disease 28 . Conversely, a decrease in PCSK9 level can lower blood LDL-cholesterol concentrations 28 . In mental disorders, higher CSF levels of PCSK9 were found in patients with Alzheimer's Disease 29 and patients with alcohol use disorder 30 than in controls. In addition, plasma PCSK9 levels were found to correlate positively with PCSK9 levels in the CSF of patients with alcohol use disorder. Recent studies have suggested a direct relationship between PCSK9 levels and inflammation 31 . It has also been proposed that anti-inflammatory agents may have inhibitory effects against the PCSK9 enzyme 32 . PCSK9 inhibitors have been reported to be effective for neuroprotection with no negative impact on cognition 33 . However, the exact mechanism for the relationship between PCSK9 and BD-II requires further investigation.
We further performed gene ontology analysis of the identified proteins to elucidate the functional relevance with biological processes. The identified pathways of upregulated proteins included regulation of lipoprotein, response to oxidative stress, and innate immune response (Supplementary Figure 2a), which were all previously associated with the pathogenesis of affective disorders 34,35 . However, further mechanistic studies are required to elucidate the exact function of identified proteins and how they interact in the pathogenesis of BD-II.
The major finding of the current study is the identification of individual candidate proteins and the combination of these proteins as diagnostic biomarkers for BD-II. The AUC of the ROC curve was 0.89. This proposed diagnostic panel consisting of five plasma proteins may be an applicable and convenient clinical tool to assist in BD-II diagnosis. This model was further validated by the testing group, which also showed a diagnostic validity of AUC = 0.86. However, whether this model may distinguish BD-II from BD-I or major depressive disorder, which are clinically challenging, still warrants future study that encompasses recruitment of more groups with affective disorders.
Our study has the following limitations, and its results should be interpreted with caution. First, it would be ideal to subject the entire sample to iTRAQ analysis for the identification of candidate protein biomarkers. With limited funding, we were unable to afford this expense and only subjected the initial group to iTRAQ analysis. Alternatively, our results would have been much more representative if we had pooled samples from each group and then subjected the pooled samples to iTRAQ. Our results are therefore exploratory and preliminary, and require further confirmation. Second, we only selected upregulated proteins in the current study because we were not sure whether the downregulated candidates were undetectable in BD-II or not. We thought that upregulated protein biomarkers might be easier to detect, whereas downregulated proteins may limit detection ability 36 . Hence, we may have neglected candidate proteins whose expression was downregulated in BD-II. In addition, we sampled plasma proteins instead of central nervous system samples (cerebrospinal fluid); therefore, the applicability of our findings to other types of samples is unknown. Third, we did not control for some frequently occurring metabolic comorbidities, including diabetes and hypertension, and these diseases may confound the correlation between candidate proteins and BD-II. In addition, we did not control for the time of fasting before blood sampling. As some proteins may change according to age, the results from our relatively young population should be interpreted with caution. After all participants were randomly assigned to the training and testing groups using SPSS, there were significant differences in age and gender between BD-II and controls in the training cohort, but not in the testing cohort.
There was also a trend towards differences in age and gender between BD-II and controls in the testing cohort; however, the differences were not significant, probably due to the smaller sample size compared to the training cohort. Future studies with age- and gender-matched case and control groups in both the training and testing cohorts may be warranted. In the current study, to control for differences in age and gender, we included gender and age as covariates in the logistic regression model. However, the statistical difference in age and gender between patients and controls may still confound the validity of our results. Only proteins that are not influenced by clinical state or gender differences between patients and controls are really of use as diagnostic markers; therefore, the proteins we identified in the current study still require further validation. Further study focusing on patients in a euthymic state may also be needed to clarify whether FARSB is a trait marker. Fourth, as we mentioned earlier, patients with other affective disorders that are frequently confused with or misdiagnosed as BD-II, including major depressive disorder and BD-I, were not recruited in the current study as comparative groups. We therefore are not sure whether the current model may help with the common misdiagnosis of BD-II as another mood disorder. Although we analysed the correlation between protein levels and mood severity, we were unable to evaluate the change of each candidate protein in disease progression or after administration of BD-II treatment due to the cross-sectional design of the current study. It would be ideal for a future longitudinal study to observe the changes in candidate proteins along the course of illness to determine whether they are suitable treatment targets.

Conclusion
We have identified candidate protein biomarkers (PRDX2, CA-1, FARSB, MMP9, and PCSK9) associated with BD-II in the current study. We also found that the combination of these five proteins may predict the diagnosis of BD-II with good validity. We believe that these plasma protein biomarkers may contribute to precision psychiatry by assisting in the identification and recognition of BD-II. Prompt and accurate diagnosis may facilitate timely pharmacological and psychological intervention, which may not only shorten the prolonged and difficult course of the disease but also alleviate the socioeconomic burden on society.

Methods
We . The inclusion criterion for recruitment was fitting the diagnosis of BD-II. Exclusion criteria were (a) any other major and minor mental illnesses besides BD-II, such as organic mental disorders and substance use disorder and (b) any significant medical or neurological disorders. Healthy controls were recruited from the community. These participants also received structured interviews using the SADS-L to screen for psychiatric conditions. Inclusion criteria for the controls were: (a) age between 20 and 65 years; (b) no major or minor mental illnesses (such as schizophrenia, mood disorders, anxiety disorder, substance use disorder, and personality disorder) and no family history of psychiatric disorder among their first-degree relatives; (c) no blood transfusions or severe trauma within the past month.
Plasma collection. Twenty millilitres of whole blood was collected from the antecubital vein of each participant. Blood samples were collected in a test tube containing ethylenediaminetetraacetic acid (EDTA) (Greiner Bio-One Vacuette; Santa Cruz Biotechnology, Santa Cruz, CA) and kept on ice for no more than 30 min. To isolate plasma, whole blood was centrifuged at 3000 g for 15 min at 4 °C, and the plasma was then stored at − 80 °C for further evaluation.

Measures of symptomatology.
iTRAQ library preparation and plasma protein screening. In this study, we randomly selected plasma samples from two BD-II patients and two controls as initial groups for the iTRAQ analysis to identify candidate proteins. The four plasma samples were first subjected to high-abundance protein depletion using the Pierce Top 12 Abundant Protein Depletion Spin Columns (85165, Thermo Fisher Scientific). Then, the protein library of the four plasma samples was prepared using the iTRAQ Reagents Multiplex Kit (4352135, Sciex), and the quality of the library was further confirmed. Finally, the four libraries of plasma samples were analysed with LC/Q-Exactive Orbitrap MS (Thermo) for 24 h. The raw data were analysed with Proteome Discoverer v2.4 (Thermo) by referring to the MASCOT 2.5 database (Matrix Science).
The analysis of labeled peptides was performed using an LTQ Orbitrap Velos ETD mass spectrometer (Thermo Scientific, Bremen) connected to an Agilent 1200 nanoLC system. The online reversed-phase chromatography included a trapping column (75 µm × 2 cm, C18 material 5 µm, 100 Å) with a flow rate of 4 µL/min and an analytical column (75 µm × 10 cm, Magic C18 AQ, 3 µm particle size, pore size 100 Å) with a flow rate of 350 nL/min. The peptides were eluted using a linear gradient of 5 to 45% acetonitrile in 75 min. The electrospray source consisted of a 10 ± 2 µm emitter tip (New Objective, Woburn, MA) maintained at 2.4 kV. Full-scan MS spectra were acquired at a mass resolution of 60,000 and MS/MS spectra at a mass resolution of 15,000 using the FT mass analyzer, in a data-dependent manner. For each survey scan in the MS cycle, the twenty most intense precursor ions were selected for MS/MS. HCD fragmentation was performed at 42% normalized collision energy. To avoid repeated selection of ions for MS/MS, the dynamic exclusion window was set to 30 s. The AGC settings for the full FT MS and FT MS/MS scans were 1 million and 100,000 ions, respectively, and the maximum accumulation time was 300 ms. The polydimethylcyclosiloxane ion (m/z 445.1200025) was used as the lock mass for accurate mass measurement. We followed previous research 43 for these detailed experimental steps.
From iTRAQ analysis, we detected 827 proteins in these samples according to two specific parameters: (1) FDR for protein < 0.01 and peptide identification > 2; (2) the number of protein match unique peptides ≥ 1.5 or ≤ 0.75. The iTRAQ data were further analysed using Partek software (Qiagen, Germany), and the expression of candidate proteins between healthy controls and BD-II was compared by t-test. We identified 49 proteins with significantly increased expression and 88 proteins with significantly decreased expression in BD-II patients compared to those in controls (Supplementary Figure 1). The top 38 differentially expressed protein candidates were presented using a heatmap (19 upregulated and 19 downregulated proteins) (Fig. 2). From the top 19 upregulated protein candidates, we selected five candidate proteins for further analysis that met the following criteria: (1) detection in all samples that showed differential expression twofold relative to the control sample and (2) relation to inflammation and oxidative stress, as these may be involved in the pathogenesis of bipolar disorder 44 .
Evaluate concentration of candidate proteins by ELISA. The expression levels of five candidate proteins (MMP9, FARSB, PRDX2, CA-1, and PCSK9) in the plasma of BD-II patients and healthy controls were assessed using ELISA kits. The ELISA kits used in this study were as follows: MMP9 (ARG80129, Arigo, Taiwan), FARSB (EH8444, FineTest, China), PRDX2 (ARG82038, Arigo, Taiwan), CA-1 (ARG82250, Arigo, Taiwan), and PCSK9 (ARG81395, Arigo, Taiwan). An ELISA reader (SpectraMax-M2; Molecular Devices, Sunnyvale, CA), which has a minimum detectable dose of 80 pg/mL, was used.
Pathway enrichment analysis. The identified proteins were subjected to Gene Ontology analysis using DAVID Bioinformatics Resources 6.8 (PMID: 19131956) to identify significantly enriched pathways.

Statistical analysis.
Because the protein data did not follow a normal distribution but were skewed to the right, we applied a Log10 transformation to all protein levels to approximate a normal distribution for further analysis. The t-tests and chi-square tests were used to analyse the differences in clinical variables and plasma protein levels between patients and controls. Pearson's correlation was used to analyse the correlations between the level of plasma proteins and mood severity. Logistic regression was further performed to identify whether each candidate may distinguish BD-II from controls by setting BD-II as the dependent variable and levels of each plasma protein as independent variables in one model, controlling for age and gender. In addition, we used logistic regression to form a composite probability score of the combination of MMP9, FARSB, PRDX2, CA-1, and PCSK9 protein levels in order to predict the diagnosis of BD-II. By analysing the receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) of the composite probability scores generated by the above logistic regression, we sought to determine whether the combination of MMP9, FARSB, PRDX2, CA-1, and PCSK9 could distinguish between BD-II and control groups. The cut-off values of optimal diagnostic points of the ROC curve were set at the largest Youden's index (sensitivity + specificity − 1). Using the random sampling method in SPSS, the participants were divided into training (BD-II, n = 149; control, n = 150) and testing groups (BD-II, n = 36; control, n = 36) in order to conduct the replication study. We further computed the composite probability scores of the testing group using the intercept value and the B values from the logistic regression of the training group. Moreover, we performed ROC analysis of the computed composite probability scores of the testing group to examine whether these five proteins may effectively differentiate BD-II from controls.
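The ROC step described above (AUC of a probability score and a cut-off chosen at the largest Youden's index) can be illustrated with a small, dependency-free sketch. This is not the SPSS procedure used in the study: the function names `roc_auc` and `youden_cutoff` are hypothetical, the scores are made-up numbers, and the convention that higher scores indicate BD-II is an assumption.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random case scores above a random control.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def youden_cutoff(pos_scores, neg_scores):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A participant is classified as a case when score >= cutoff.
    On ties in J, the lowest cutoff is kept.
    """
    best_j, best_cut = -1.0, None
    for cut in sorted(set(pos_scores) | set(neg_scores)):
        sensitivity = sum(s >= cut for s in pos_scores) / len(pos_scores)
        specificity = sum(s < cut for s in neg_scores) / len(neg_scores)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut, best_j


# Made-up probability scores, for illustration only.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]
print(roc_auc(cases, controls), youden_cutoff(cases, controls))
```

Scanning only the observed score values as candidate cut-offs is enough here, because sensitivity and specificity change only at those values.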
We used the statistical software SPSS v25.0 (Armonk, NY: IBM Corp.) to perform all statistical analyses. We performed the power analysis using G-Power 3.1.9.2 16,47 , and the effect-size conventions were determined according to Buchner et al. 16 .
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.