Since their introduction in the 1950s, antipsychotic drugs are the medication of choice in the treatment of psychoses. However, despite the subsequent introduction of many new antipsychotics, about one third of patients are relatively unresponsive to treatment [1,2,3]. Inter-individual differences in clinical outcome following antipsychotic medication depends on many factors [4]. However, the basis of the heterogeneous response to treatment remains unclear. The lack of treatment algorithms or biomarker-based guidelines results in a trial-and-error process to identify the adequate treatment at the optimal dose with a minimum of side-effects for each patient. As a result, determining whether a patient will respond to antipsychotics involves the careful evaluation of at least 2 courses of different treatments, which substantially delays the provision of alternative treatments such as clozapine [5].

Over the last decades, considerable research efforts have focused on whether genetic information could be helpful to predict patient’s response as well as adverse effects to a given antipsychotic. Pharmacogenetic studies have mainly focused on specific candidate genes involved in the pharmacodynamics and in the pharmacokinetics of antipsychotic drugs or on the primary targets of antipsychotics [6]. Although many genetic variants have been reported as associated with response to treatment, none of these studies have led to the identification of a biomarker robust enough to be applicable in clinical practice [7,8,9,10,11,12,13]. More recently, genome-wide association studies (GWAS) of antipsychotic treatment response have used clinical scales or neurocognitive tests as outcome measurements, but without consistent findings and no functional validation of associated variants [14,15,16,17,18,19,20,21,22]. Transcriptome analysis provides an alternative method of identifying biomarkers of treatment response, but to date, studies that used this approach had a limited statistical power [23,24,25].

We sought to address these issues by conducting an RNA-seq analysis on 188 patients with first episode psychosis from the OPTiMiSE cohort [26, 27]. We examined gene expression at baseline and after 4 weeks of treatment with a single antipsychotic medication, i.e. amisulpride, using peripheral blood mononuclear cells (PBMCs). Genes that were differentially expressed in good responders only were validated in an independent sub-sample of the OPTiMiSE population and we used basal expression of these genes in combination with clinical data to predict treatment response.

Methods and materials


The cohort analysed in this study was obtained from the OPTiMiSE clinical trial ( [27]. This trial is registered with under the name “Optimization of Treatment and Management of Schizophrenia in Europe (OPTIMISE)”, number NCT01248195. After providing written informed consent, 491 patients with a first episode psychosis were included and 453 started medication. All inclusion sites obtained ethical approval and this research was approved by an Institutional Review Board. One-hundred-and-eighty-eight individuals from this cohort were included in our study (Supplementary Fig. S1). All patients were diagnosed with schizophrenia, schizophreniform disorder or schizoaffective disorder according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) and confirmed on the basis of the Mini International Neuropsychiatric Interview Plus (M.I.N.I. Plus) [28] (Supplementary Table 1). At baseline, patients were medication-naïve or minimally-treated (any antipsychotic medication used for more than 2 weeks in the previous year or for a total of 6 weeks in their lifetime) and were ill for no longer than 2 years. Only subjects involved in the single-treatment arm (phase 1) of the trial were considered for this study [27]. All patients were treated for 4 weeks with amisulpride (200–800 mg/day orally). At the end of the 4-week treatment, the blood concentration of antipsychotic used has been determined. This measurement provided a reliable indication of the treatment compliance of individual patients.

DNA extraction and genotyping

Blood sample for DNA extraction was available for 162 subjects out of the 188 who were included in the RNA-Seq analysis. Genomic DNA was extracted from whole blood in EDTA tubes or buffy coat samples using the Maxwell® 16 Blood DNA Purification Kit (Promega Corp., Madison, USA) according to the manufacturer’s standard protocols. DNA quality control was performed by checking the absence of PCR inhibitors and evaluating the DNA integrity by electrophoresis migration (4200 TapeStation, Agilent Technologies Inc., Santa Clara, CA, U.S.A.). Genotyping of 954,212 single nucleotide polymorphisms (SNPs) was performed using the OmniExpressExome-8v1–4_A1 beadchips (Illumina Inc.), according to the manufacturer’s protocol. In order to determine the genetic ancestry of subjects, the genotypic data of 90,468 relevant SNPs have been merged with those from 395 individuals of the HapMap populations as described elsewhere [29]. We performed a principal component analysis and fixed a threshold on the two first components to distinguish between individuals with European or other ancestry in our cohort.

RNA extraction

Total RNA was extracted from blood samples, collected in PAXgene Blood RNA tubes, with the PAXgene Blood miRNA Kit (QIAGEN GmbH, Hilden, Germany) using a standard protocol on QIAcube robot (QIAGEN). A DNase digestion procedure was added after extraction. Total RNA was purified using the RNA Clean & ConcentratorTM-5 kit following the manufacturer’s instructions (Zymo Research Corp., Irvine, USA). RNA quantification was performed on a NanoDropTM 8000 spectrophotometer, in duplicate. The quality and integrity of the RNA samples were assessed using an Agilent Bioanalyzer 2100 and the RNA6000 Nano Labchip kit (Agilent Technologies). Only samples with an RNA integrity number (RIN) higher than 7 were selected. Three hundred and seventy-six RNA samples fulfilled quality control criteria for RNA-seq analyses, corresponding to two RNA samples (one at inclusion and one after 4 weeks of treatment) for 188 subjects. Forty-eight additional RNAs were available for replication studies, in 24 independent subjects (Supplementary Fig. S1).

RNA sequencing

Library preparation and sequencing

All libraries were prepared using the Illumina TruSeq Stranded Total RNA with Ribo-Zero Globin kit (Illumina Inc., San Diego, CA, U.S.A.), which removes ribosomal RNA and globin mRNA. Libraries were prepared using one microgram of total RNA as initial input and following the standard protocols. After library quality control, 2 × 100 bp paired-end (PE) sequencing was performed on a HiSeq4000 system (Illumina), pooling a maximum of six samples on each lane, in order to reach 40 to 50 millions of PE-reads for each sample. The Illumina pipeline was used to generate raw RNA sequencing data (fastq files) for each sample.

Quality control and reads mapping

Sequence quality controls were performed, using FastQC and in-house bioinformatic pipelines, from a sampling of 2 × 10 million reads in order to assess the levels of read duplicates, adapters, remaining rRNA and the GC content. Reads were trimmed for adapters and low-quality bases (Phred quality score <30) using Trimmomatic software (v.0.32). Reads were then mapped to the Genome Reference Consortium human genome assembly 37 (GRCh37) reference genome (hg19) using Tophat software (v.2.0.13). Read mapping quality was assessed using RNA-SeQC software. Then, gene-level quantification in read counts was performed by HTSeq software (v.0.6.1), using gene annotation from Ensembl v.86. All downstream analyses were conducted using the statistical software R (v.3.2.4) and Bioconductor R packages.

Filtering and normalization

Genes with counts-per million (cpm) below 1 in more than two third of the samples were considered unexpressed and removed from the analysis using edgeR package (v.3.12.1). Raw gene counts matrix has been normalized to account for differences in sequencing depth and RNA composition using the “median ratio” method implemented in the DESeq2 package (v.1.16.1). To avoid batch effects as confounding factors, all subjects had their two time points in a same batch. Considering the whole normalized set of expressed genes, principal component analysis (PCA) has been performed using the FactoMineR package (v.1.39) to evaluate the efficacy of DESeq2 normalization and for identifying outliers.

Validation of expression data by quantitative reverse transcription polymerase chain reaction (qRT-PCR)

Five hundred nanograms of total RNA was reverse-transcribed using the High-Capacity cDNA Reverse Transcription kit following standard protocol (Thermo Fisher Scientific, Waltham, MA, U.S.A.). Mixtures of the cDNAs and the TaqMan Universal PCR Master Mix were further loaded in the 384-well low density TaqMan array microfluidic cards (Thermo Fisher Scientific), containing probes and primers for each selected gene (Supplementary Table 2). Real-time PCR reactions were then carried out in an ABI Prism 7900HT sequence detection system (Thermo Fisher Scientific) using manufacturer’s instructions. Each assay was carried out in duplicate and threshold cycle (Ct) values were automatically calculated by the SDS 2.2 software (Thermo Fisher Scientific), after having manually set the analysis threshold. Two reference genes (18S and GUSB) with various expression levels were included in the analyses to perform a relative RNA quantification, using the most suitable reference gene.

Statistical analyses

All variables normally distributed were expressed as mean ± standard deviation (SD) and comparisons between conditions were examined by t-tests and one-way analyses of variance (ANOVAs). Not normally distributed variables were expressed as median and interquartile range (IQR) and between-group comparisons were analysed by Mann–Whitney and Kruskal–Wallis tests.

Differential expression before and after treatment was performed by a paired-analysis with the DESeq function and the Wald test using two explanatory factors in the experimental design, participants and time (two time points per subject). For gene expression levels before treatment, we corrected putative bias from library preparation batches as well as from the inclusion site. We then compared good and poor responders using the same function and a statistical model adjusted for age, sex, ancestry, DUP, positive PANSS score at inclusion and centre in which individuals had been recruited, these features either having been shown to be significantly different between future good and poor responders (Supplementary Table 1) or being putative confounding factors. Significantly DEGs between the two time points were defined as having a Benjamini–Hochberg (BH) corrected p value lower than 0.1.

For the qRT-PCR analysis, ΔCt values of each gene of interest were calculated before and after treatment and then normalized by subtracting the ΔCt values of the reference gene to generate ΔΔCt values. To select genes with significant expression changes after treatment, the medians of fold changes (2-ΔΔCt) were subsequently compared to a reference (Me = 1) using Wilcoxon Rank Sum Test, and an adjusted p value below 0.1 was considered to be significant.

Receiver operating characteristic (ROC) curves were drawn using the pROC package (v.1.14.0) to assess the performance of the treatment response prediction.

The statistical power was estimated using the PROPER package (v.1.14.1) [30] with 1000 simulations based on lymphoblastoid cell lines expression data from CEU individuals of the HapMap project.


Genes are differentially expressed after amisulpride treatment in good responders only

There is no consensus in defining treatment response in schizophrenia, limiting research and clinical translation [31]. In the OPTiMiSE cohort, we observed that the distribution of the total PANSS score change over 4 weeks of treatment revealed an admixture of two subpopulations with a Gaussian distribution (Fig. 1a). In this model, 56% were good responders, with a mean reduction of total PANSS score of 36.3% (SD = 12.1), and 44% were poor responders, with a mean PANSS total reduction of 13.0% (SD = 18.5). The intersection of the two subpopulations was around 20% reduction in total PANSS score. We thus used this threshold to distinguish between good (N = 113) and poor (N = 75) responders in our study. Note that good responders were slightly older than poor responders with a lower duration of untreated psychosis and more positive symptoms (Supplementary Table 1).

Fig. 1: Differentially expressed genes (DEGs) in patients with first episode psychosis after 4-week treatment with amisulpride.
figure 1

a Bimodal distribution of total PANSS changes after 4-week amisulpride treatment in first episode psychosis patients (n = 188). Theorical distribution for the whole sample is represented in black. Gaussian probability density functions were obtained by the estimated PANSS changes mixture function and showed two subgroups with poor (red) and good (blue) symptoms improvement. b Volcano plots of DEGs in good (n = 113) and poor (n = 75) responders before and after 4-week amisulpride treatment. Genes over- and under-expressed after Benjamini–Hochberg correction (FDR < 0.1) are shown in red and green, respectively. c Venn diagram representing the overlap between the 32 DEGs identified in good responders of the current study and genes that have been previously reported to be differentially expressed in brain or blood from patients with schizophrenia (SZ) when compared with controls, or after antipsychotic (AP) medication. Either the number of genes or their name is reported in each part of the diagram. Genes for which differential expression has been validated in our replication cohort are shown in bold. *indicates genes that were differentially expressed between good and poor responders before amisulpride treatment. FC Fold change.

In order to identify molecular mechanisms associated with treatment outcome, we compared gene expression levels at baseline and after 4 weeks of treatment with amisulpride. We thus conducted an RNA-seq analysis before and after treatment on total RNA from PBMCs of 188 subjects. We generated an average of 47 million 100 bp paired-end reads per sample. After normalization and quality control, 16,204 genes were expressed in PBMCs before and after treatment. Visual inspection of the first two dimensions of the PCA plot performed on the whole set of genes revealed a homogeneous cluster (Supplementary Fig. S2), showing that the expression level of the majority of genes does not vary overtime. However, comparison of gene expression levels before and after treatment revealed that good responders only showed 32 DEGs that survived to BH-correction for multiple testing when using an FDR threshold of 0.1. Nine of these genes were up-regulated and 23 were down regulated (Fig. 1b, Table 1). Note, 17 out of these 32 genes had already been reported to be differentially expressed in brain or blood from patients with schizophrenia or after antipsychotic medication (Fig. 1c). Although the population of poor responders was smaller than the one of good responders, we estimated that we had a statistical power of 92% to detect a fold change of 1.06, which was the minimum difference that we observed in the 32 genes differentially expressed after treatment in good responders. This suggests that the difference observed between good and poor responders resulted from the response status.

Table 1 Genes differentially expressed after 4-week treatment with amisulpride.

To validate these DEGs, we used qRT-PCR in 24 independent subjects from the OPTiMiSE cohort, who were not included in the RNA-seq analyses and who had more than 20% improvement in total PANSS score after treatment (Supplementary Fig. S1). We selected only brain-expressed protein-coding genes that were differentially expressed after treatment in good responders only. In addition, we restricted our selection to genes for which the expression change was correlated with symptom improvement (Table 1). Although they did not show an FDR lower than 0.1 in poor responders, three genes (FAT1, FBXL13 and P2RY12) were excluded because they had a fold change close to those of good responders and a nominal p value lower than 0.05. In addition, two genes (P4HA2 and TRPC6) were removed from replication studies because they were not detectable by qRT-PCR. Eleven genes (ACSL5, ALPL, CA4, DGAT2, DHRS13, GALNT14, HOMER3, KAZN, KIAA0319, SLC4A4 and WLS) were thus tested in the replication sample. Despite a small sample size limiting our statistical power (80% of probability of detecting a fold change higher than 1.13 or lower than 0.88), 6 genes (ALPL, CA4, DHRS13, GALNT14, KAZN and HOMER3) showed a significant difference in expression levels after treatment (Table 1).

Gene expression level at baseline predicts treatment response

To test whether the genes that were differentially expressed in the replication sample could be used as biomarkers to predict treatment response, we compared their expression levels before treatment between patients who will respond to treatment (n = 113) and those who will not (n = 75). In addition to the 6 genes (ALPL, CA4, DHRS13, GALNT14, HOMER3 and KAZN) that were differentially expressed in the replication sample, we included 4 genes (ACSL5, DGAT2, KIAA0319 and WLS) for which the expression fold-changes were similar between the RNA-Seq and the qRT-PCR analyses (Table 1). Six out of the 10 genes (ALPL, CA4, DHRS13, DGAT2, HOMER3 and WLS) were significantly overexpressed in responders (Table 2). We then estimated the expression level-based predictive performance for these 6 genes. Area under the curve (AUC) varied between 0.589 and 0.634 according to the gene (Table 3). Ten-thousand label permutations among the 188 individuals showed this prediction was significantly higher than random prediction for the 6 genes. As we observed clinical differences between future good and poor responders, we adjusted our model in a multivariate analysis including the age at inclusion, the sex, the duration of untreated psychosis (months) and the positive PANSS score at inclusion. Addition of clinical criteria improved our models for the 6 genes (Table 3). Note, the predictive values combining gene expression and clinical data for each gene was higher than those calculated for gene expression alone or clinical data alone. Finally, we computed a multivariate model considering the expression level of the 6 genes with clinical data and estimated that our model was able to discriminate between good and poor responders with 93.8% chance. As observed on Fig. 2, the combination of both gene expression level at inclusion and clinical features at inclusion considerably improve the overall performance of models based only on gene expression (Z = 4.82, p = 7 × 10−7) or on clinical data alone (Z = 6.14, p = 4 × 10−10), showing the importance of combining clinical and biological data to predict treatment response in first episode psychoses.

Table 2 Gene differential expression at inclusion between future good and poor responders to amisulpride treatment.
Table 3 Predictive value of models based on gene expression and clinical data at inclusion.
Fig. 2: Receiver operator characteristic curves from logistic regression models predicting good or poor response to amisulpride treatment.
figure 2

The red curve represents the model combining the gene expression level of ALPL, CA4, DGAT2, DHRS13, HOMER3 and WLS. The green curve represents the model combining the age, the sex, the positive PANSS score (PPANSS) and the duration of untreated psychosis (DUP). The blue curve represents the model combining the ALPL, CA4, DGAT2, DHRS13, HOMER3 and WLS gene expression level as well as the age, the sex, the PPANSS and the DUP at inclusion. Area under the curve (AUC) are indicated for each model.


Our first hypothesis was to test whether amisulpride would change gene expression differently between good and poor responders to treatment. Consistently with previous studies, we have been able to distinguish good from poor responders using a threshold of 20% change in the total PANSS score after 4 weeks of treatment [32,33,34,35]. We then found that only good responders showed treatment-associated DEGs, suggesting that antipsychotic medication, when efficient, can affect gene expression in peripheral mononuclear cells. Although similar results have previously been reported [23,24,25, 36, 37], we are the first to conduct a study on a large homogeneous cohort of patients, all treated with the same antipsychotic medication, and to have replicated our results on an independent sample.

Our second hypothesis was to determine whether genes for which the expression level changes after treatment in good responders might be used to predict treatment outcome. Six out of the ten DEGs that we tested showed an overexpression at inclusion in patients whose symptomatology improved after 4-week treatment. This is consistent with data from previous studies using other antipsychotic medications, reporting that genes overexpressed in subjects with schizophrenia when compared with controls, were down-regulated after antipsychotic medication [23, 24]. As expected, the six genes that we identified in our cohort were down regulated after treatment, suggesting that antipsychotic medication may normalize the altered expression of genes implicated in schizophrenia.

The 4 genes for which we confirmed the differential expression by replication on an independent sample had previously been reported as being differentially expressed in brain of patients with schizophrenia (Fig. 1c) [24, 38,39,40,41,42,43,44,45,46]. In particular, ALPL has been previously reported to be overexpressed in amygdala [38] of individuals with schizophrenia as well as in blood of drug-naïve patients [46]. Consistently with what we observed in our cohort, its expression has been shown to decrease after treatment with different atypical antipsychotics [23], suggesting its expression level might be used for treatment response prediction irrespective of the drugs took by affected individuals. Moreover, we observed similar predictive values for the four replicated genes (ALPL, DHRS13, HOMER3 and CA4) as well as for two additional genes (DGAT2 and WLS) that were differentially expressed at inclusion between the future good and poor responders. For all of them, we were able to increase the accuracy of our models combining clinical data and gene expression at inclusion. Although the proteins encoded by these genes were not known to interact together or to be involved in a single functional pathway, the best model was observed when combining the expression level of the six genes and was improved again when we added clinical features. These results are consistent with the polygenic hypothesis of schizophrenia and demonstrate the importance of combining biological and clinical markers to develop precision medicine in psychiatry.

Few information was available on the function of DHRS13 and CA4, except the first belongs to the short chain deshydrogenase/reductase enzyme family [47] and the latter encodes a carbonic anhydrase expressed at apical surface of several cell types, including brain capillaries [48]. Nevertheless, we noticed that CA4 may contribute to the extracellular buffering after synaptic transmission in hippocampal slices [49]. Similarly, there is few information on WLS and DGAT2. Note that the protein encoded by DGAT2 catalyses the final reaction in the synthesis of triglycerides [50]. An overexpression of this gene may thus partly explain the metabolic syndrome frequently observed in individuals with psychosis [51] even in drug-naïve patients with first episode schizophrenia [52]. In contrast, there is a body of literature supporting the role of HOMER3 and ALPL, in schizophrenia treatment. Homer proteins are enriched at the postsynaptic density of the glutamatergic neurons and directly bind the metabotropic glutamate (mGlu) receptors that are more and more investigated as novel drug targets for this disease [53]. HOMER1 and HOMER3 proteins form a tetrameric structure that interact with SHANK proteins and the mGlu receptors, serving as a structural framework for the postsynaptic density, essential to maintain the dendritic spine structure and synaptic function [54]. Interestingly, overexpression of Homer1a in basal and lateral nucleus of the amygdala impaired auditory feared conditioning and reduce social interaction in rats [55], two core phenotypes for animal models of schizophrenia. By decreasing the expression of HOMER3, amisulpride may thus act on the different groups of mGlu receptors and improve psychotic and cognitive symptoms in individuals with first episode psychosis.

ALPL is the only gene that has been previously reported with a decreased expression after antipsychotic treatment in medication-naïve patients with schizophrenia [23]. ALPL, encodes a member of the tissue-nonspecific alkaline phosphatase family and is ubiquitously expressed. In human, it is expressed at synapse to regulate the dopamine, serotonin, norepinephrine and γ-amino butyric acid (GABA) syntheses in cerebral cortex, by dephosphorylating the active form of vitamin B6, the pyridoxal phosphate (PLP), into pyridoxal (PL) [56]. ALPL alteration is also associated with purinergic signalling dysregulation [57], which may be involved in psychiatric disorders [58]. A down-regulation of ALPL after treatment may thus affect neuron proliferation or decrease the biosynthesis of specific neurotransmitters, whose dysregulation is known to be a core component of schizophrenia symptoms. Its implication in the synthesis of various neurotransmitters may explain the improvement of both negative and positive symptoms observed in good responders treated with amisulpride.

Although there is still a debate on how much the peripheral changes might reflect brain processes [59,60,61], we can speculate that the peripheral down-regulation of ALPL, CA4, DGAT2, DHRS13, HOMER3 and WLS observed in the blood of antipsychotic-treated patients may reflect their differential expression in the brain. Their blood expression level might thus be used as a biomarker to distinguish individuals for whom amisulpride will have the best efficiency from those who may benefit from other antipsychotics. We have recently shown that change from amisulpride to olanzapine did not improve outcome for most of the patients [26]. These biomarkers may thus help in selecting patients to treat earlier with clozapine. Further replications on independent cohorts are now needed to confirm our results and determine if our biomarker-based treatment response prediction might be relevant for other antipsychotic medications.

Altogether, our results identified new mechanisms to explain symptom improvement after amisulpride medication and propose new blood biomarkers that, in combination with selected clinical symptoms, may help in predicting treatment outcome in first episode psychoses.

Funding and disclosure

This work was supported by the European Commission Seventh Framework Programme (HEALTH-F2–2010–242114) and the Investissements d’Avenir programme managed by the Agence Nationale pour la Recherche (ANR) under reference ANR-11-IDEX-0004-02 (Labex BioPsy). This work also received financial support from the Institut National pour la Santé et la Recherche Médicale (Inserm); the Fondation FondaMental; the Commissariat à l’énergie atomique et aux énergies alternatives (CEA) and the Fondation pour la Recherche Médicale (FRM Espoirs de la Recherche 2015, FDM20151234954 to RT). The Jamain’s team is affiliated with the Paris School of Neuroscience (ENP) and the Bio-Psy Laboratory of Excellence. The authors, including the contributing members of the OPTiMiSE Study Group, have declared that they have no competing financial interests in relation to the work described in this study.