Introduction

Schizophrenia, bipolar disorder and severe depression are common and highly disabling brain diseases caused by an interaction of genetic and environmental factors.1, 2 However, despite enormous efforts, the genetic variations that contribute to these diseases and their environmental risk factors remain elusive. Genome-wide association studies have frequently been employed to identify susceptibility genes and single-nucleotide polymorphisms (SNPs) that may be associated with these mental disorders.3, 4, 5 A number of candidate genes for the disorders have been reported. For instance, a web resource for schizophrenia, the Schizophrenia Forum (SZgene) database (http://www.szgene.org/), includes results from 1727 genetic association studies and reports 1008 candidate genes and 8788 polymorphisms in the update on 15 April 2011.6 Despite the numerous candidate genes reported for schizophrenia, the effect size of each variant is small or moderate and most associated SNPs have failed to be replicated. The need for independent and systematic validation to prioritize further examination of possible candidate genes for mental disease is widely acknowledged.

Identification of DNA sequence variants that regulate gene expression levels in a relevant tissue is one of the most promising approaches used to initially scan for candidate genes as well as to prioritize previously identified candidate genes that are associated with complex diseases such as psychiatric disorders.7, 8, 9 The identification of a cis association of a SNP with gene expression levels has been previously used to validate candidate genes for complex traits mapped to the same chromosomal locations.10 Our recent study using an integrative approach that combined results from genome-wide SNP scans for the cytoarchitectural traits and cis expression quantitative trait loci (eQTL) analysis in the brain tissue revealed two novel candidate genes associated with cellular abnormalities in the prefrontal cortex of major psychiatric disorders.11 Limited availability of human post-mortem brain tissues is a major obstacle to obtaining detailed brain eQTL mapping. Utilization of publicly available resources is an effective alternative strategy that may overcome this limitation. The Stanley Neuropathology Consortium Integrative Database (SNCID; http://sncid.stanleyresearch.org) is a publicly available and web-based tool that integrates expression microarray data sets from five brain regions including frontal cortex, temporal cortex, thalamus, cerebellum and hippocampus and genome-wide SNP genotype data sets of subjects in the Stanley Neuropathology Consortium (SNC) and the Array Collection (AC).12 A total of 1749 neuropathology data sets using the SNC are integrated into the database, which thereby enables one to further explore the correlations between gene expression levels and quantitative measures of neuropathological markers in the various brain regions. The specific aims of this study are twofold. 
First, we explore the candidate genes that may be functionally relevant for major psychiatric disorders by identifying cis associations between SNPs and gene expression in various brain tissues. Second, we examine the possible functional role of schizophrenia candidate genes that were previously identified in genetic association studies. Thus, we explored cis eQTLs in the four brain regions, frontal cortex, temporal cortex, thalamus and cerebellum, of SNC subjects and in hippocampus of AC subjects. We also repeated the analysis in frontal cortex data from the AC as a replication study to examine the overall consensus of cis eQTLs between the two frontal data sets. We then examined whether the expression levels of any candidate genes from the SZgene database meta-analysis (http://www.szgene.org/) were regulated by cis expressed SNPs (eSNPs) in brain tissues, in order to determine if there were any functional effects on gene expression of the previously identified schizophrenia susceptibility genes. Finally, we performed a coexpression network analysis between the genes in the frontal cortex that were differentially expressed between schizophrenia and normal controls and the cis eQTL genes in an attempt to identify the potential role of these genes in a disease-specific coexpression module.

Materials and methods

Data used in this study

Gene expression microarray data from frontal cortex,13 cerebellum, thalamus and temporal cortex14 were generated by multiple independent groups using samples from the SNC (N=60), which contains 15 well-matched cases in each of four groups: schizophrenia, bipolar disorder, major depression and unaffected controls.15 Other sets of microarray data from frontal cortex16, 17 and hippocampus were generated using samples from the AC (N=105). The AC is an independent tissue collection containing 35 cases in each of three groups: schizophrenia, bipolar disorder and unaffected controls. The groups from both tissue collections are matched for descriptive variables such as age, gender, race, post-mortem interval, mRNA quality, brain pH and hemisphere. Outlier chip data were excluded from this analysis based on previous quality-control analyses for chip-level parameters such as scaling factor, gene call and average correlation.18 Information for the microarray studies such as tissue collection, brain region and number of outlier chips is listed in the Supplementary Table S1 online. The confounding effects on the Frozen Robust Multiarray Analysis (fRMA)-normalized microarray gene expression data were identified using Surrogate Variable Analysis (SVA).19 To adjust for the disease effect on the gene expression data, we randomly assigned 0 or 1 to the primary variable in the SVA. All covariates from SVA were used in the linear regression to adjust for the confounding effects on the gene expression data. The standardized residuals from the linear regression were used to evaluate the effectiveness of this method in removing confounding variables on two microarray data sets from both the SNC and AC. Transcripts correlated with potential confounding variables were identified using nonparametric analysis. 
The continuous variables, namely age, brain pH, post-mortem interval and lifetime exposure to antipsychotics, were examined by correlation analysis using R (open source program from the Comprehensive R Archive Network (CRAN)). Two categorical variables, microarray batch and sex, were tested using variance analysis. P-values adjusted by the Hochberg method that were <0.05 were considered significant. Although all cases and controls were included in the analysis, only the disorder cases were used in the correlation analysis for the effect of lifetime exposure to antipsychotics. SNP genotyping data using DNA samples from the SNC and the AC were generated by Dr Chun-Yu Liu and colleagues (University of Chicago, IL, USA) using the Human SNP Array 5.0 chips (Affymetrix, Santa Clara, CA, USA).20
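The Hochberg adjustment mentioned above can be illustrated with a short Python sketch. This is not the code used in the study (the analyses were run in R); the function name and the example P-values are hypothetical, and the logic is intended to mirror the standard step-up procedure in which the k-th largest P-value is multiplied by k and a running minimum is taken.

```python
def hochberg_adjust(pvalues):
    """Return Hochberg step-up adjusted P-values in the original order.

    Illustrative sketch only; the function name is an assumption,
    not part of the study's pipeline.
    """
    m = len(pvalues)
    # Visit P-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, idx in enumerate(order, start=1):
        # The k-th largest P-value is multiplied by k, then capped by
        # the smallest value seen so far (and by 1).
        running_min = min(running_min, rank_from_top * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values for three transcripts tested against one covariate.
example = [0.01, 0.04, 0.03]
adjusted = hochberg_adjust(example)
significant = [p < 0.05 for p in adjusted]
```

A transcript would then be flagged as correlated with the covariate when its adjusted value falls below 0.05, as in the text.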

eQTL analysis

Processing of raw image files from the SNP chips, quality-control analysis and identification of ethnic outliers were performed as previously described.11 Briefly, genotypes were called using the BRLMM algorithm (Affymetrix). SNPs with a call rate of <90%, minor allele frequency <5% or extreme deviation from Hardy–Weinberg equilibrium (P<0.05) were excluded from further eQTL analyses. A total of 309 531 SNPs passed this filter. For examination of population stratification, clustering was initially performed using the pairwise identity-by-state (IBS) calculator in PLINK.21 IBS pairwise distances were then plotted and examined by multidimensional scaling analysis and Z statistical analysis. Samples more than 3 s.d. from the group mean were considered outliers. Four ethnic outliers from the SNC and three outliers from the AC were excluded from the eQTL analysis. One additional sample from the AC was excluded because of a final diagnosis of CADASIL (cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy). We used only genotyped SNP data from chips for our association analysis rather than imputing genotypes because SNP imputation can often result in errors in genotyping and cause false-positive associations.22 The standardized residuals from the linear regression were used as traits in PLINK for eQTL analyses. We defined cis eSNPs as those that were localized within 1 Mb of either the 5′ or the 3′ end of the gene. The trans eSNPs were defined as all SNPs that reached genome-wide significance level, except those in a cis position. We employed a conservative Bonferroni method to correct for multiple testing and control false positives.7 Adjusted P-values of <0.05 (unadjusted P-value <1.6E−07 = 0.05/309 531) were considered genome-wide significant for eQTL analyses.
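The significance threshold and the cis/trans definition given above can be written out as a minimal Python sketch. The function names and the example coordinates are hypothetical, and the positional rule covers only the 1 Mb window; in the study a trans eSNP additionally had to reach genome-wide significance.

```python
N_SNPS = 309_531            # SNPs passing quality control (from the text)
ALPHA = 0.05                # family-wise error rate
CIS_WINDOW_BP = 1_000_000   # 1 Mb on either side of the gene


def bonferroni_threshold(alpha, n_tests):
    """Unadjusted P-value cutoff that corresponds to an adjusted P of alpha."""
    return alpha / n_tests


def classify_position(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                      window=CIS_WINDOW_BP):
    """Label a SNP as 'cis' or 'trans' relative to a gene, by position only.

    Hypothetical helper: a SNP is cis if it lies on the same chromosome
    within `window` base pairs of the gene's 5' or 3' end.
    """
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start - window <= snp_pos <= gene_end + window:
        return "cis"
    return "trans"


threshold = bonferroni_threshold(ALPHA, N_SNPS)   # about 1.6E-07

# Made-up coordinates, for illustration only.
label_near = classify_position("17", 2_150_000, "17", 2_200_000, 2_230_000)
label_far = classify_position("17", 9_000_000, "17", 2_200_000, 2_230_000)
```

Under these assumptions, an association would be reported as a cis eQTL when its unadjusted P-value is below `threshold` and the SNP is labeled "cis" for that gene.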

Coexpression network analysis

Unsupervised and supervised coexpression network analyses were performed using the Weighted Correlation Network Analysis (WGCNA) package in R.23 The coexpression network was generated using expression values of all genes in the frontal cortex of schizophrenia and normal controls from the AC (unsupervised WGCNA). A second coexpression network was generated using significant cis eQTL genes and genes that were differentially expressed in the frontal cortex between schizophrenia and normal controls from the AC samples (supervised WGCNA).24 A total of four microarray data sets (at www.stanleygenomics.org; study no. 1, 3, 5 and 7) were generated from prefrontal cortex. Three of these (study no. 1, 3 and 7)16, 17 were generated using the same platform, Affymetrix 133a; to avoid variations between platforms, we pooled the data from these three data sets. The pooled data were then subjected to median normalization with Biometric Research Branch (BRB)-ArrayTools (http://linus.nci.nih.gov/BRB-ArrayTools.html) to remove systematic variations. After median normalization, confounding effects were adjusted using SVA and a linear regression method as described in the previous section. However, the disease effect was not removed. Standardized residuals that were significantly associated with disease (nominal P-value <0.05) and standardized residuals of cis eQTL genes were then used as input for the WGCNA.23 The minimum module size and the minimum height for merging modules were set at 30 and 0.25, respectively. The coexpression module was visualized using VisANT.25
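For readers unfamiliar with weighted coexpression networks, the following Python sketch shows the two quantities on which such an analysis is generally built: a soft-thresholded adjacency and the topological overlap between gene pairs. It is not the WGCNA R package used in the study. The soft-threshold power `beta=6`, the unsigned adjacency and the toy expression profiles are assumptions, since the text does not report these settings.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def adjacency_matrix(profiles, beta=6):
    """Unsigned soft-threshold adjacency |cor|^beta with a zero diagonal.

    beta=6 is an assumed value, not one reported in the text.
    """
    n = len(profiles)
    return [[0.0 if i == j else abs(pearson(profiles[i], profiles[j])) ** beta
             for j in range(n)] for i in range(n)]


def topological_overlap(adj):
    """Topological overlap: shared neighbours plus direct connection,
    normalized by the smaller connectivity of the two genes."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]  # diagonal is zero
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            value = (shared + adj[i][j]) / (
                min(connectivity[i], connectivity[j]) + 1 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


# Toy residual profiles for three hypothetical genes across four samples.
toy_profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 5, 7]]
toy_tom = topological_overlap(adjacency_matrix(toy_profiles))
```

Modules are then obtained by hierarchical clustering on one minus the topological overlap, which is where parameters such as the minimum module size (30) and the merge height (0.25) would apply.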

Functional annotation

The cis eQTL genes and genes that were involved in the coexpression module were functionally annotated using the Database for Annotation, Visualization and Integrated Discovery (DAVID; http://david.abcc.ncifcrf.gov/home.jsp) by the over-representation analysis method.26 The biological processes of the Gene Ontology Consortium (http://www.geneontology.org) were used for functional annotations. P-values of <0.05 were considered significant.
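Over-representation analysis asks whether a gene list contains more members of a biological process than expected by chance. A minimal Python sketch of the underlying calculation, a one-sided hypergeometric tail, is given here. It is a simplification under stated assumptions: DAVID's own scoring differs in detail, and the function name and all counts are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(n_background, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_background: genes on the array (background)
    n_annotated:  background genes annotated to the process
    n_selected:   genes in the list of interest (e.g. cis eQTL genes)
    n_overlap:    genes in the list that carry the annotation
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 3 annotated, 3 selected, all 3 overlap.
p_toy = overrepresentation_pvalue(10, 3, 3, 3)
```

A process would be called over-represented when this P-value falls below the 0.05 cutoff stated in the text.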

Results

eQTL analysis in various human post-mortem brain tissues

Gene expression microarray data derived from post-mortem brain tissue are often confounded by uncontrolled biological, clinical and technical variables.27 Batch effect is particularly problematic and has been shown to significantly affect gene expression levels in microarray data.28, 29 To remove the effect of batch and other confounding variables in our gene expression microarray data, we normalized the data using the newly developed method, fRMA, followed by the SVA.19, 30 We evaluated how effective this method was at removing confounding variables using two microarray data sets from both the SNC and AC (Supplementary Table S1 online). Using the data set from SNC temporal cortex (study 18) we found that microarray batch was the most significant confounding variable in both the RMA and fRMA-normalized data sets, with 947 and 1031 transcripts significantly correlated with batch, respectively (Supplementary Table S2 online). Using the data set from AC frontal cortex (study 1) we found that microarray batch and brain pH were both major confounding variables (Supplementary Table S3 online). The SVA successfully adjusted the effects of the confounding variables on both microarray data sets (Supplementary Table S2 and S3 online).

Using the SVA we obtained the standardized residuals from the linear regression with covariates and conducted a genome-wide eQTL analysis of various brain tissues. We used the standardized residuals as traits. We initially analyzed gene expression microarray data from frontal cortex, temporal cortex, thalamus and cerebellum from the SNC (Supplementary Table S1 online). Expression levels of a total of 53, 11, 84 and 27 genes were correlated with cis SNPs in the frontal cortex, temporal cortex, thalamus and cerebellum at genome-wide significance level, respectively (nominal P<1.6E−07; Figure 1a and Supplementary Table S4 online). Among the cis eQTL genes, expression levels of 16, 0, 20 and 5 genes were also significantly associated with trans SNPs in the frontal cortex, temporal cortex, thalamus and cerebellum, respectively (Supplementary Table S5 online). In addition, correlations between the expression levels of 31, 1, 69 and 15 genes and cis SNPs were unique in the frontal cortex, temporal cortex, thalamus and cerebellum, respectively (Figure 1a). The expression level of only one gene, phosphodiesterase 4D interacting protein (PDE4DIP), was associated with a SNP, rs12124527, in all the brain regions tested here. We then replicated the cis eQTLs of the frontal cortex using the larger AC collection. The replication study revealed associations between cis SNPs and expression levels of 460 genes and replicated 34 cis eQTL genes out of 53 (64%) that were identified in the SNC study (Figure 1b and Supplementary Table S6 online). Moreover, 281 cis eQTL genes were identified in the AC hippocampus data and 147 cis eQTLs were common to both the frontal cortex and hippocampus (Figure 1c and Supplementary Table S7 online). Among the cis eQTL genes, expression levels of 43 and 46 genes were also significantly associated with trans SNPs in the frontal cortex and hippocampus, respectively (Supplementary Table S5 online). 
The association between PDE4DIP expression and the rs12124527 SNP was replicated in the AC frontal cortex and hippocampal data.

Figure 1

Number of cis expression quantitative trait loci (eQTL) genes in various brain regions. Venn diagram shows common and unique cis eQTL genes across multiple brain regions of the Stanley Neuropathology Consortium (SNC) samples (a) and of the Array Collection (AC) samples (c). Overlapped cis eQTL genes in the frontal cortex between the SNC samples and the AC samples are shown (b).

Next, we performed a functional annotation analysis to identify biological processes that were overrepresented in the brain cis eQTL genes. Whereas several processes such as cell adhesion, visual perception and glutamatergic transmission were overrepresented in the genes with cis eSNPs in the SNC frontal cortex (Table 1), metabolic processes such as glutamine metabolic process and protein transport and targeting and antigen processing were overrepresented in the AC frontal cortex (Table 2). Amino acid metabolic process, nucleotide biosynthesis and enzyme-linked receptor protein signaling pathways were significantly overrepresented in cis eQTL genes in the AC hippocampus (Supplementary Table S8 online).

Table 1 Biological processes (Gene ontology) significantly associated with cis eQTL genes in the frontal cortex of the Stanley Neuropathology Consortium samples
Table 2 Biological processes (Gene ontology) significantly associated with cis eQTL genes in the frontal cortex of the Array Collection samples

Comparison between schizophrenia susceptibility candidate genes and brain cis eQTL genes

Genetic association studies have yielded numerous candidate genes that may increase the risk for schizophrenia. However, most candidate genes have been neither replicated nor functionally validated. To examine the possible functional role of schizophrenia candidate genes, we compared the list of the candidate genes in the SZgene database meta-analysis (updated 12/1/2010) to our list of cis eQTL genes. The SZgene meta-analysis identified 45 genetic variants and 42 linked genes. After excluding the non-SNP variants from their data set, we were left with 39 SNPs and 39 linked candidate genes. Because only 6 SNP markers out of the 39 SNPs were included in our Affymetrix SNP 5.0 data set, we conducted a gene-level comparison instead of a SNP-level comparison. We determined whether there were cis associations between the expression levels of the 39 candidate genes and SNPs within 1 Mb of the genes. Among the 39 candidate genes, we found that the expression levels of four genes, HTR2A, PLXNA2, SRR and TCF4, were significantly associated with cis SNPs in at least one brain region tested (Table 3). The expression levels of HTR2A and PLXNA2 were associated with cis SNPs in the frontal cortex, whereas the expression levels of SRR (serine racemase) and TCF4 were associated with cis SNPs in two brain regions. The cis eSNPs of these genes are located at least 25 kb from the SNPs that were significantly associated with schizophrenia in the SZgene meta-analysis. Thus the SZgene case–control genetic association analyses for schizophrenia may not have identified the most functionally relevant genetic variations that contribute to the etiology of psychiatric disorders.

Table 3 Schizophrenia candidate genes (from the SZgene database) with expression levels significantly associated with cis eSNPs in brain tissue

Coexpression network analysis in the frontal cortex

To further examine whether the four schizophrenia candidate genes (HTR2A, PLXNA2, SRR and TCF4) and genes whose expression levels were regulated by cis SNPs may be involved in the etiology of schizophrenia, we performed both unsupervised and supervised gene coexpression network analyses using the AC frontal cortex data. We were unable to construct a coexpression module that was significantly associated with schizophrenia disease status using the unsupervised analysis. One module was associated with disease (P=0.05); however, it was also associated with post-mortem interval (P=0.01). We then constructed a supervised coexpression network using genes differentially expressed between schizophrenia and normal controls (Supplementary Table S9 online) and the cis eQTL genes obtained from the pooled data of three Affymetrix 133a microarray data sets that measured gene expression in the frontal cortex. We constructed one coexpression module that was significantly associated with schizophrenia disease status (P=2E−08; Figure 2a). Age, sex, post-mortem interval, brain pH and lifetime antipsychotic treatment were not significantly associated with this module (all P>0.05). Genes associated with apoptosis, chromatin organization, RNA splicing, cell cycle, regulation of nucleic acid metabolism and endocytosis were overrepresented in this module (Figure 2b and Supplementary Table S10 online). A previous coexpression network analysis that used gene expression microarray data from prefrontal cortex from schizophrenia subjects and controls31 also identified a module (module 16) with similar overrepresentation of biological processes such as chromatin organization, cell cycle, endocytosis and regulation of nucleic acid metabolism. 
Apoptosis and endocytosis have previously been associated with the pathophysiology of the frontal cortex in schizophrenia,32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44 and recent studies have also indicated that aberrant RNA splicing and epigenetic alterations may be involved in the pathophysiology of schizophrenia.35, 36 Several genes associated with GABAergic neurons, including γ-aminobutyric acid (GABA) A receptor, δ (GABRD) and parvalbumin (PVALB), were found in the coexpression module. However, out of the four schizophrenia candidate genes common to both the SZgene meta-analysis and our cis eQTL gene list, only one, SRR, was found in the module. The biological process, response to drug, was overrepresented in the module and was enriched with 11 genes including SRR. A substantial number of cis eQTL genes were involved in the coexpression module that was significantly associated with schizophrenia disease status and were also associated with the biological processes. This result indicates that cis eQTL analysis in brain tissue may more reliably identify susceptibility genes for schizophrenia as compared with the current case–control genetic association studies.

Figure 2

Coexpression network analysis in the frontal cortex. The coexpression module that is significantly associated with schizophrenia in frontal cortex of the Array Collection (AC) (a) and biological processes (Gene ontology) overrepresented in the genes in the coexpression module (b). Network connections with topological overlap above the threshold of 0.02 were visualized using VisANT.25 The cis expression quantitative trait loci (eQTL) genes are pink. The candidate gene, SRR (serine racemase), derived from the meta-analyses of genetic studies in the SZgene database (http://www.szgene.org/) is in blue.

Discussion

Identifying genetic variations that affect gene expression in the brain may be a promising approach for finding molecular pathways that are functionally relevant to the etiology and/or treatment of mental disease. In this study, we conducted an eQTL analysis of 315 440 transcripts in 5 different brain regions from two different tissue collections and identified cis associations between 648 transcripts and 6725 SNPs. The expression of one gene, PDE4DIP, was associated with one SNP, rs12124527, in all brain regions examined. This association was also previously described in the frontal cortex.20 The protein encoded by PDE4DIP serves to anchor phosphodiesterase 4D to the Golgi/centrosome region of the cell. A number of abnormalities in the phosphodiesterase signaling system have been described in the brains of subjects with schizophrenia, bipolar disorder and depression,37, 38, 39, 40, 41, 42 indicating that molecules within this system could be potential targets for therapeutic intervention.38, 43

Approximately 14% of cis eQTL genes were also correlated with trans SNPs in various brain regions, suggesting that the expression levels of a subset of cis eQTL genes may be regulated by multiple variants. However, when we examined whether the expression levels of candidate genes from the SZgene database meta-analysis were significantly associated with cis SNPs, we found only 4 genes that overlapped between the SZgene database and our eQTL gene list. Furthermore, only one candidate gene, SRR, was involved in a coexpression module that was associated with schizophrenia. SRR maps to chromosome 17p13 and encodes an enzyme that synthesizes D-serine from L-serine.44 D-serine is an endogenous co-agonist of the N-methyl-D-aspartate (NMDA) receptor.45 Hypofunction of the NMDA receptor is potentially a major underlying pathophysiology of schizophrenia.46, 47 Our results support this hypothesis and suggest that abnormal NMDA receptor-mediated signaling may be influenced by genetic variations. A SNP, rs16952025, localized in an intron of the neighboring gene, SMG6, was significantly associated with the expression level of SRR. However, there was no significant association between this SNP and the expression level of SMG6. Several post-mortem studies have examined levels of SRR mRNA and serine racemase protein in schizophrenia48 and found abnormalities, although the results have been inconsistent. Although SRR mRNA levels appear to be unchanged in frontal cortex of schizophrenia,41 the protein levels have been reported to be either decreased,49 increased50 or unchanged.51 The inconsistent results are most likely because of different methodologies, different cohorts (often with small numbers) and the different brain areas used. Consequently, further studies with larger cohorts will be required to confirm changes in SRR levels in the brain of subjects with mental illness.

Our comprehensive brain eQTL analysis functionally validated only 4 genes out of 39 candidate genes positively identified in the SZgene meta-analysis. We were unable to identify any significant associations between the expression levels of the remaining genes and cis SNPs in any of the brain regions we tested. In fact, the 39 candidate genes were derived from 1008 candidate genes that were obtained from 1727 original genetic association studies. Such a low functional validation rate raises the possibility that the current case–control genetic association studies may not effectively identify genetic variations that underlie the etiology of schizophrenia. However, there are other reasons that may contribute to a low functional validation rate. For example, the probes on the microarray platforms used to analyze gene expression in this study mainly bind to sequences in the 3′-untranslated regions and do not distinguish between various alternative splicing isoforms. Indeed, tissue-specific alternative RNA-splicing is very predominant in the brain.52 Furthermore, intronic SNPs can be associated with the altered expression of specific alternative splicing isoforms of certain schizophrenia candidate genes, for example, ErbB4 and GRM3.53, 54 Therefore, comprehensive expression profiling that includes various alternative splicing isoforms using deep mRNA-sequencing technology may aid in the identification of novel cis eQTL genes in human post-mortem brain tissues in the future.

The frontal cortex is one of the most thoroughly examined brain regions in post-mortem studies and many neuropathology abnormalities have been identified in this region in schizophrenia.55, 56 Previous gene expression microarray studies in the frontal cortex identified several biological processes that were overrepresented in the genes differentially expressed between schizophrenia and normal controls; for example, decreased presynaptic function, abnormal mitochondrial function and altered expression of apoptosis-related genes are all major findings from microarray studies of frontal cortex in schizophrenia.33, 34, 57 However, glutamatergic transmission, amino acid metabolism, proteolysis and protein targeting were all overrepresented in the eQTL genes in the frontal cortex in our current study. Thus, the abnormalities described in the biological pathways from the eQTL study may be more directly related to genetic variation, whereas the pathways identified by gene expression studies are likely to be influenced by factors in addition to genetic variation, including epigenetics and environmental factors.

Although our study reveals a number of associations between cis SNPs and gene expression in multiple brain regions, the results should be interpreted with caution. First, the SNC, which we used for the initial eQTL analyses, contains a relatively small number of samples (N=56). Small sample size is known to increase the rate of false-positive associations and to reduce detection power in genome-wide association analysis. Thus, the cis eQTL results from frontal cortex, cerebellum, thalamus and temporal cortex using SNC samples should be viewed as exploratory. However, we subsequently performed a second analysis using an independent collection (AC) with a larger sample size (N=101). Our previous power analysis using AC as well as a previous eSNP association study indicated that a relatively small sample size (N=100) has >80% power to detect an association of gene expression traits with moderate effect size (R²=0.35).11, 58 We therefore attempted to replicate the results of cis eQTLs in frontal cortex from SNC using the AC data. A total of 34 (64%) of the cis eQTL genes identified in the SNC frontal cortex data were also found in the AC frontal data. However, we identified 450 additional cis eQTL genes in the AC frontal cortex samples, which were not identified in the SNC frontal samples, suggesting that some significant cis eQTL genes may have been missed in the SNC analysis.
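The power statement above can be checked approximately with a short Python sketch based on Fisher's z transformation of a correlation coefficient. This is not the power analysis reported in the cited work: treating the effect size as a squared correlation between genotype and expression, the use of the genome-wide Bonferroni threshold as the significance level, and the function name are all assumptions made for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist


def power_for_r2(r_squared, n_samples, alpha):
    """Approximate two-sided power to detect a correlation of sqrt(r_squared).

    Uses Fisher's z transform, under which atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). Illustrative approximation only.
    """
    std_normal = NormalDist()
    effect = atanh(sqrt(r_squared)) * sqrt(n_samples - 3)
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # The chance of rejecting in the opposite tail is negligible and ignored.
    return 1 - std_normal.cdf(critical - effect)


GENOME_WIDE_ALPHA = 0.05 / 309_531   # assumed significance level

power_n100 = power_for_r2(0.35, 100, GENOME_WIDE_ALPHA)
power_n56 = power_for_r2(0.35, 56, GENOME_WIDE_ALPHA)
```

Under these assumptions the approximation gives power above 80% at N=100 and substantially lower power at N=56, which is consistent with treating the SNC results as exploratory.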

Second, using whole tissues for gene expression traits may dilute the effect of some genetic variants that may only act on cell type-specific gene expression. Although this phenomenon has not been explored in the brain, there are numerous cell type-specific abnormalities in the brain of subjects with psychiatric disorders.32, 59, 60, 61 Thus, the use of cell type-specific expression traits in future studies may increase the power to identify cis eQTLs in the brain.

In this study, we investigated the associations between SNPs and gene expression in various human brain regions. Although previous brain eQTL studies focused on cortex,8, 62 we have extended the analysis to include the hippocampus, thalamus and cerebellum. These data can be used to identify genetic variations associated with psychiatric disorders as well as genetic variations that affect neuropathological abnormalities and gene expression changes. As we show in this study, the data can be used to functionally validate candidate genes by determining whether they affect gene expression in subjects with neuropsychiatric disorders. In order to facilitate further studies, we have integrated the genome-wide eQTL results from this study into the SNCID, which is a web-based database that also includes 1747 neuropathological markers measured in the same SNC samples. The update will allow users to investigate associations between SNPs and genes of interest in various brain regions and to further explore associations between SNPs and neuropathological markers and gene expression traits that are correlated with neuropathological markers in the various brain regions of subjects with major psychiatric disorders.