Introduction

Bipolar disorder (BD) is characterized by recurrent episodes of mania and depression,1 and is currently diagnosed solely by assessing symptoms and past history in an interview. Thus, there is an urgent need to develop methods for diagnosing BD using biomarkers.

In addition to classical candidate markers related to monoamines, such as platelet serotonin concentration,2 recent studies have focused on the altered response of lymphocytes to glucose deprivation,3 oxidative stress markers,4 and serum/plasma brain-derived neurotrophic factor.5 Comprehensive gene expression analysis has also been used for the identification of blood biomarkers. These include a phosphodiesterase 4B (PDE4B)-associated mRNA signature in monocytes,6 blood biomarkers related to myelination or growth factor signaling derived from DNA microarray analysis of blood RNA and convergent functional genomics,7 and eight candidate genes identified by DNA microarray analysis of blood RNA.8

In addition to these studies, mounting evidence suggests that lymphoblastoid cell lines (LCLs) derived from patients with BD show various phenotypes, such as altered calcium signaling,9 altered inositol levels10 or diminished endoplasmic reticulum stress response.11, 12 Several studies have suggested that the expression levels of several genes in LCLs could be potential biomarkers of BD.13, 14, 15, 16, 17, 18 With LCLs, potential effects of medication can be minimized by culturing the cells in drug-free culture media for more than 1 month.

In this study, we measured the expression levels of 17 genes that were previously reported as candidate biomarkers from comprehensive gene expression analysis in patients or model animals, were suggested to be relevant to BD by neurobiological studies, or emerged as candidate genes from genome-wide association studies (GWASs). We tested these genes in a first set of patients and controls and selected the genes useful for discrimination by logistic regression analysis; the resulting discriminant function was then tested in an independent sample set. These analyses suggested that ANK3, RASGRP1 and POLG1 are candidate biomarkers.

Materials and Methods

Subjects

The first set of subjects included 13 patients with BD (1 woman and 12 men, 53.2±14.9 years old, all with bipolar I disorder) and 21 controls (5 women and 16 men, 47.6±12.0 years old). They were included in our earlier study of biomarkers.16 The second set of samples included 18 patients with bipolar I disorder (2 women and 16 men, 43.0±11.8 years old) and 37 controls (20 women and 17 men, 47.6±14.6 years old). They were included in our previous study of endoplasmic reticulum stress response.12 Because the first sample set comprised only patients with bipolar I disorder, we mainly focused on bipolar I disorder in this study. Patients were diagnosed according to the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, fourth edition) criteria by the consensus of at least two psychiatrists. A structured interview, the Mini-International Neuropsychiatric Interview,19 was used for recently recruited patients. Controls were selected from students, nurses, office workers and doctors at participating institutes, along with their friends. A senior psychiatrist interviewed them and assessed them as healthy. Written informed consent was obtained from all participants. The Research Ethics Committee of RIKEN approved the study.

Cell cultures and RNA extraction

Lymphocytes were separated from the peripheral blood and transformed by Epstein–Barr virus. LCLs were cultured in RPMI 1640 (Sigma; St Louis, MO, USA) containing 10% fetal bovine serum as described previously.20 All LCLs were used after two rounds of freezing and reculturing.

Total RNA was prepared from LCLs using TRIzol reagent (Invitrogen, San Diego, CA, USA), followed by DNase I (Takara Bio, Shiga, Japan) treatment to remove genomic DNA. The SuperScript II first-strand synthesis system with oligo(dT) (Invitrogen) was used to synthesize complementary DNA according to the manufacturer’s instructions.

Candidate biomarker genes

We examined the expression levels of the following 17 genes as potential biomarkers. The genes were selected on the grounds listed below; genes whose expression was not detected in LCLs in our previous DNA microarray experiments were excluded. The excluded genes included candidates such as MAG and PMP22 (ref. 7) and CACNA1C (ref. 21).

(1) The genes that the authors previously reported as candidate biomarkers in the LCLs of patients with BD are as follows:15, 16, 17

NDUFV2 (NADH-ubiquinone oxidoreductase flavoprotein 2, a subunit of mitochondrial complex I), PDLIM5 (PDZ and LIM domain protein 5, LIM) and DNAJB1 (DnaJ (Hsp40) homolog, subfamily B, member 1, HSPF1).

(2) Causative genes for mitochondrial DNA deletion syndrome. Mitochondrial DNA deletion syndrome can accompany BD and depression,22 and an animal model carrying a mutation in one of these genes (Polg1) showed BD-like behavioral phenotypes.23

POLG1 (polymerase-γ catalytic subunit), WFS1 (Wolfram syndrome 1), POLG2 (polymerase-γ accessory subunit), SLC25A4 (ANT1, adenine nucleotide translocator 1) and Twinkle (C10orf2; mitochondrial DNA helicase).

(3) Commonly altered genes in the frontal cortex of BD model mice and postmortem brains of patients with BD by comprehensive gene expression analysis are as follows:24

PPIF (peptidyl-prolyl cis-trans isomerase F, cyclophilin D) and SFPQ (splicing factor proline/glutamine-rich).

(4) Mitochondria-related genes commonly altered in the hippocampus and frontal cortex in BD model mice are as follows:25

TOP1MT (mitochondrial topoisomerase) and GLUD1 (glutamate dehydrogenase 1).

(5) The gene found to be altered in the postmortem brains of patients with BD by a comprehensive gene expression analysis is as follows:26

AK2 (adenylate kinase 2).

(6) The gene reported as a biomarker in monocytes is as follows:6

PDE4B.

(7) The candidate biomarker suggested by a convergent functional genomics approach is as follows:7

MBP (myelin basic protein).

(8) The genes identified by a genome-wide association analysis are as follows:21

ANK3 (ankyrin G) and RASGRP1 (RAS guanyl-releasing protein 1).

Real-time quantitative reverse transcription PCR

The TaqMan Preamp Master Mix kit (Applied Biosystems, Foster City, CA, USA) was used for quantifying gene expression levels, according to the manufacturer’s protocol. In brief, 10 cycles of preamplification PCR were performed with the Preamp Master Mix kit and a pooled TaqMan Gene Expression assay mix (Applied Biosystems). After dilution of the preamplified product, second-round PCR was performed with TaqMan Gene Expression Master Mix (Applied Biosystems) and the individual TaqMan Gene Expression assay mix. The TaqMan Gene Expression assay mixes used in this study were as follows: Hs00221478_m1 (NDUFV2), Hs00179051_m1 (PDLIM5), Hs00428680_m1 (DNAJB1), Hs01018668_m1 (POLG1), Hs00903605_m1 (WFS1), Hs00200546_m1 (POLG2), Hs00154037_m1 (ANT1), Hs00222440_m1 (C10orf2 (Twinkle)), Hs00194847_m1 (PPIF), Hs00192574_m1 (SFPQ), Hs00369537_m1 (TOP1MT), Hs01632647_g1 (GLUD1), Hs00797700_s1 (AK2), Hs00277080_m1 (PDE4B), Hs00921945_m1 (MBP), Hs00241738_m1 (ANK3) and Hs00183347_m1 (RASGRP1). We measured ΔCt = Ct (each gene) − Ct (ACTB) for each sample in triplicate. The relative expression level was calculated as 2^(−ΔCt).
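
The ΔCt calculation described above can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function name, the choice to average the replicate Ct values before subtraction, and the Ct values in the usage lines are assumptions made for the example.

```python
from statistics import mean


def relative_expression(ct_gene, ct_actb):
    """Return (delta_ct, 2 ** -delta_ct) for one sample.

    ct_gene and ct_actb are the replicate Ct values (for example,
    triplicates) for the target gene and for ACTB. Averaging the
    replicates before subtraction is an assumption of this sketch.
    """
    delta_ct = mean(ct_gene) - mean(ct_actb)
    return delta_ct, 2.0 ** (-delta_ct)


# Hypothetical Ct values, for illustration only.
delta_ct, level = relative_expression([28.0, 28.0, 28.0], [20.0, 20.0, 20.0])
print(delta_ct, level)  # 8.0 0.00390625
```

A target gene that crosses the threshold 8 cycles later than ACTB therefore has a relative expression level of 2^(−8), or about 0.0039.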

Statistical analysis

The Mann–Whitney U test was used to compare the expression levels of each gene between patients and controls because this test is robust to deviations from the normal distribution. P<0.05 was considered significant.

To test whether these genes are useful as diagnostic markers, logistic regression analysis was used. The expression levels of all 17 genes, as well as sex and age, were defined as independent variables, and the diagnosis (bipolar I disorder or control) was defined as the dependent variable. The variables useful for discrimination were selected by a forward stepwise regression method using the data of the first set of samples. Using the variables selected by the logistic regression analysis, a discriminant function was generated, and this was used to predict the diagnosis in the second set of samples. The cutoff point was set at 0.5. Whether this discriminant function was useful was tested by a χ2-test for independence. Fisher’s exact probability test was used when any cell had a count of less than 5.
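
How a fitted logistic model is turned into a predicted diagnosis at a cutoff of 0.5 can be sketched as below. The intercept, coefficients and expression values are hypothetical placeholders, not the fitted values or data from this study, and the function names are illustrative.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, cutoff=0.5):
    """Label a sample as bipolar I disorder when p reaches the cutoff."""
    return "bipolar I disorder" if probability >= cutoff else "control"


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"ANK3": 1.0, "RASGRP1": -1.0, "POLG1": -1.0}
sample = {"ANK3": 2.0, "RASGRP1": 0.5, "POLG1": 0.5}
p = predicted_probability(0.0, coefficients, sample)
print(round(p, 3), classify(p))
```

In this sketch a positive coefficient raises the predicted probability of bipolar I disorder as expression increases, and a negative coefficient lowers it.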

The statistical analyses were performed with SPSS for Windows version 11.0 (SPSS, Tokyo, Japan). For variable selection by a bootstrap method, SPSS version 19 was used. The Mersenne Twister method was used for the generation of random numbers. Resampling was performed 1000 times with stratification by diagnosis. The q-values were calculated with R software.
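
Stratified bootstrap resampling of the kind described above can be sketched as follows. This is an illustration under stated assumptions rather than the SPSS procedure: the function name, the seed and the record layout are hypothetical. Python's `random` module uses the Mersenne Twister as its core generator, which is why it is used here.

```python
import random


def stratified_bootstrap(samples, n_resamples=1000, seed=0):
    """Yield bootstrap resamples that preserve each diagnostic group's size.

    samples is a list of (diagnosis, data) tuples. Within each diagnosis,
    records are drawn with replacement, so every resample contains the
    same number of patients and controls as the original set.
    """
    rng = random.Random(seed)  # Mersenne Twister generator
    groups = {}
    for record in samples:
        groups.setdefault(record[0], []).append(record)
    for _ in range(n_resamples):
        resample = []
        for members in groups.values():
            resample.extend(rng.choices(members, k=len(members)))
        yield resample


# Hypothetical records matching the group sizes of the first sample set.
first_set = [("BD", i) for i in range(13)] + [("control", i) for i in range(21)]
resamples = list(stratified_bootstrap(first_set))
print(len(resamples), len(resamples[0]))  # 1000 34
```

Variable selection would then be rerun on each resample, and the variables retained across resamples compared.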

Results

Among the genes examined, SFPQ, Twinkle, AK2 and RASGRP1 showed nominally significant differences between patients with BD and controls in the first set of samples (P=0.046, 0.026, 0.014 and 0.024, respectively, by Mann–Whitney U test). However, none of them survived correction for multiple testing, and the q-values were not significant (q=0.16, 0.15, 0.15 and 0.15, respectively). Furthermore, none of them was replicated in the second set of bipolar I disorder samples (Figure 1).

Figure 1

Expression levels of candidate genes in bipolar I disorder. Bars represent means. *P<0.05 by Mann–Whitney U test. Error bars indicate standard deviation. Although the first sample set showed significant differences between patients with bipolar I disorder and controls (P-values by Mann–Whitney U test are 0.046, 0.027, 0.014 and 0.024 for SFPQ, Twinkle, AK2 and RASGRP1, respectively), there was no statistically significant difference in the second set of samples (P-values are 0.162, 0.190, 0.554 and 0.123, respectively). Neither POLG1 nor ANK3 differed significantly between bipolar I patients and controls in the first or second sample set (POLG1, P=0.082 and P=0.141; ANK3, P=0.158 and P=0.095, respectively).

A total of 19 independent variables (17 genes, age and sex) were entered into a logistic regression analysis with a stepwise method to identify independent predictors. The analysis identified three statistically significant independent variables (POLG1, ANK3 and RASGRP1). Confounding factors, age and sex, were not selected as parameters having predictive value for the discrimination. The discriminant function was generated as follows.

Using the discriminant function containing these three genes, the patients with bipolar I disorder and controls in the first sample set were discriminated with sensitivity of 76% and specificity of 85% (P=0.00064 by Fisher’s exact probability test; Figure 2).

Figure 2

Results of logistic discriminant analysis. The receiver operating characteristic (ROC) curve of the sensitivity and specificity for the prediction using the three genes, ANK3, RASGRP1 and POLG1. The area under the ROC curve was 0.704±0.080 (mean±s.e.m.), with a 95% confidence interval of 0.546 to 0.862 (P=0.015).

The same discriminant function was then applied to the second sample set. This analysis showed that patients with bipolar I disorder and controls could be discriminated with a sensitivity of 44% and a specificity of 81% (χ2=3.97, P=0.046). The area under the receiver operating characteristic (ROC) curve was 0.704±0.080 (mean±s.e.m.), with a 95% confidence interval of 0.546 to 0.862 (P=0.015).
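
As a consistency check, the sensitivity, specificity and χ2 value reported for the second sample set can be approximately reproduced from a 2×2 table. The cell counts below are not given in the text; they are back-calculated from the reported percentages and the group sizes (18 patients and 37 controls) and should be read as an assumption. The χ2 statistic is computed without continuity correction, and the P-value uses the relation P = erfc(sqrt(χ2/2)) for 1 degree of freedom. The function name is illustrative.

```python
import math


def two_by_two_summary(tp, fn, fp, tn):
    """Return sensitivity, specificity, chi-square (no continuity
    correction) and its P-value (1 degree of freedom) for a 2x2 table."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    chi2 = n * (tp * tn - fn * fp) ** 2 / (
        (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)
    )
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return sensitivity, specificity, chi2, p_value


# Counts inferred from the reported 44% / 81% and group sizes (assumption).
sens, spec, chi2, p = two_by_two_summary(tp=8, fn=10, fp=7, tn=30)
print(f"{sens:.2f} {spec:.2f} {chi2:.2f} {p:.3f}")
```

With these assumed counts the sketch gives a sensitivity of 0.44, a specificity of 0.81, a χ2 of about 3.98 and a P-value of about 0.046, in line with the reported values.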

Among the three genes (POLG1, ANK3 and RASGRP1) used for discrimination, ANK3 showed higher levels, whereas POLG1 and RASGRP1 showed lower levels in the patients (Figure 1). The gene with the largest effect size was ANK3.

Although neither POLG1 nor ANK3 differed significantly between patients with bipolar I disorder and controls in the first or second sample set (Figure 1), all three genes selected for the logistic discriminant analysis showed nominally significant differences between patients with bipolar I disorder and controls in the combined samples (31 patients with bipolar I disorder and 58 controls; POLG1: controls, 0.0057±0.0023 (mean±s.d.), bipolar I disorder, 0.0049±0.0020, P=0.031; ANK3: controls, 0.00023±0.00028, bipolar I disorder, 0.00053±0.00079, P=0.026; RASGRP1: controls, 0.017±0.010, bipolar I disorder, 0.013±0.008, P=0.008).

Discussion

In this study, we used gene expression analysis to test whether the expression levels of candidate genes would be useful as diagnostic biomarkers of BD. By logistic regression analysis, three genes (ANK3, RASGRP1 and POLG1) were selected from among the 17 candidate genes. Using these three genes, patients with bipolar I disorder in an independent sample could be discriminated from controls with statistical significance. Although the sensitivity was not sufficient for diagnostic use in clinical settings, these results suggest that gene expression analysis in LCLs might be a promising strategy for the development of biomarkers of BD.

Two of the three genes (ANK3 and RASGRP1) selected for the discrimination stemmed from a meta-analysis of GWASs.21 An association between ANK3 and BD was first reported by a GWAS using a DNA pooling method.27 After the identification of two single-nucleotide polymorphisms (SNPs) of ANK3 by a meta-analysis of GWASs,21 the association of one of these two SNPs was replicated in an independent sample set.28 Other GWASs also supported the association.29, 30, 31 We also confirmed the association with one of the SNPs of ANK3 in an Asian population.32 ANK3 encodes ankyrin G, an adaptor protein that links various membrane proteins, such as ion channels, with spectrin. Marked alteration of the composition of ankyrin in the erythrocyte membrane has been reported in patients with BD.33 A recent study suggested that allelic imbalance is observed for the mRNA expression of ANK3 in LCLs.34 However, the SNP associated with BD (rs10994336) was not associated with allelic expression of ANK3. Thus, the mechanism of cis-regulation of mRNA expression of ANK3 is not known.

RASGRP1, on the other hand, is a guanine nucleotide exchange factor that activates Ras. RASGRP1 has binding domains for Ca2+ and diacylglycerol. Thus, the genetic association of these genes with BD is compatible with calcium signaling abnormalities in BD.9 Whereas ANK3 was upregulated in the patients, RASGRP1 was downregulated (Figure 1). To our knowledge, no other studies have measured the expression levels of these genes in blood cells of patients with BD. The present finding suggests that these genes are candidate biomarkers of BD.

The other gene selected for the discriminant function was POLG1, which encodes the catalytic subunit of mitochondrial DNA polymerase. POLG1 was downregulated in patients (Figure 1). POLG1 is one of the causative genes for mitochondrial diseases, such as chronic progressive ophthalmoplegia35 and mitochondrial recessive ataxia syndrome,36 both of which are frequently comorbid with mood disorders. We showed that transgenic mice with neuron-specific expression of mutant Polg1 displayed BD-like phenotypes.23 The present finding suggests that reduced expression of POLG1, in addition to mutations of POLG1, is relevant to BD.

Variable selection by stepwise logistic regression is unstable: minor changes in the data can lead to a different list of significant variables and, consequently, a different discriminant function. To construct a more robust discriminant function, we ran the variable selection process many times on bootstrap samples of the training data and evaluated the result on the testing data. With the bootstrap method, in addition to the three genes (ANK3, RASGRP1 and POLG1) selected by the stepwise method, age and nine additional genes (NDUFV2, PDLIM5, DNAJB1, POLG2, SLC25A4, PPIF, TWINKLE, TOP1MT and SFPQ) were selected as significant variables. However, the discrimination of the second sample set did not differ from that obtained with the three genes (sensitivity of 44% and specificity of 81%, χ2=3.97, P=0.046).

Although the present study suggested that gene expression analysis in LCLs may be used for biomarkers of BD, there are still many limitations in the application of gene expression analysis to a diagnostic test. First, the sensitivity and specificity, 44% and 81%, respectively, are too low for clinical use as a biomarker. These values are, however, comparable with those of other established screening tests. For example, the sensitivity and specificity are 34.9% and 63.1%, respectively, for prostate-specific antigen in prostate cancer,37 23.9% and 93.8%, respectively, for fecal occult-blood testing in colorectal cancer38 and 64.3% and 91.4%, respectively, for α-fetoprotein in hepatocellular carcinoma.39 In this study, we selected a representative gene from each study.6, 7 If the full gene lists from those studies were used, the prediction might be more accurate. Furthermore, sensitivity differed between the training and testing data sets; this is expected, because sensitivity and specificity values calculated on the training data set are inherently optimistic. Second, the number of subjects is too small to generalize the current results. Third, patients with other diseases should also be tested to establish the specificity of the findings. Although the discriminant function gave high specificity, this specificity should be confirmed with samples from patients with other psychiatric diseases. It is uncertain whether this procedure distinguishes BD from major depression.

The use of LCLs as a source of biomarkers may be controversial, because Epstein–Barr virus has been implicated in the pathogenesis of human malignancies and genomic instability, and chromosomal aberrations may be involved.40 In clinical settings, however, it is very difficult to obtain blood samples from bipolar patients who are not taking any drugs. Thus, we used LCLs to minimize the direct effect of drugs. By using LCLs, the effect of changes in cellular fractions, such as the increase in granulocytes caused by lithium, can also be avoided. Min et al.41 compared the gene expression profiles between LCLs and B-lymphocytes obtained from six individuals and found that the LCL expression levels were correlated with those in the B-cells (CD19, ρ=0.83 and CD20, ρ=0.79). They also showed that FSCN1, CD70 and TNFSF9 were upregulated, whereas FCRL3, RASGRP2 and TYROBP were downregulated in LCLs compared with peripheral blood leukocytes (PBLs). We also studied the difference in DNA methylation status between LCLs and PBLs. We found that hypermethylation was more predominant than hypomethylation in LCLs compared with PBLs. Thus, LCLs should be used with caution because methylation patterns in LCLs are not the same as those in PBLs. However, methylation status in LCLs was correlated with that in PBLs from the same individuals. Thus, DNA methylation may be partly maintained after Epstein–Barr virus transformation.42 By carefully excluding the genes affected by Epstein–Barr virus transformation, it might be possible to detect a marker translatable to other cells or tissues.43 It should be noted that gene expression patterns of LCLs cannot reflect the molecular status in the brain, whereas they can reflect the genetic and epigenetic status of an individual. Indeed, some SNP-expression relationships are conserved between brain tissues and LCLs.44 As discussed above, gene expression patterns in LCLs do not always represent the gene expression status in PBLs or the brain. However, this does not prevent us from directly relating gene expression levels in LCLs to a disease.

In summary, the present study supports the potential utility of gene expression analysis of candidate genes in LCLs for identifying biomarkers of BD, and suggests that the combination of ANK3, RASGRP1 and POLG1 is a promising candidate biomarker. However, the predictive accuracy does not seem clinically useful, even though the discrimination was statistically significant. Further studies are needed before gene expression analysis can be applied as a clinical test.