Introduction

‘Things fall apart; the center cannot hold’

– WB Yeats, The Second Coming

Schizophrenia is a devastating disorder affecting 1% of the population. While there is clear evidence for roles for both genes and environment, a comprehensive biological understanding of the disorder has been elusive so far. Most notably, there has been until recently a lack of concerted integration across functional and genetic studies, and across human and animal model studies, resulting in missed opportunities to see the whole picture.

As part of a translational convergent functional genomics (CFG) approach, developed by us over the last decade,1, 2, 3, 4, 5 and expanding upon our earlier work on identifying genes for schizophrenia6 and biomarkers for psychosis,7 we set out to comprehensively identify candidate genes, pathways and mechanisms for schizophrenia, integrating the available evidence in the field to date. We have used data from published genome-wide association studies (GWAS) data sets for schizophrenia.8, 9 We integrated those data with gene expression data—human postmortem brain gene expression data, human induced pluripotent stem cell-derived neuronal cells10 and human blood gene expression data7 published by others and us, as well as with relevant animal model brain and blood gene expression data generated by our group6 and others. In addition, we have integrated as part of this comprehensive approach other genetic data—human genetic data (linkage, copy number variant (CNV) or association) for schizophrenia, as well as relevant mouse model genetic evidence (Figure 1, Table 1 and Figure 2). Animal model data provide sensitivity of detection, and human data provide specificity for the illness. Together, they help to identify and prioritize candidate genes for the illness, using a polyevidence CFG score, resulting in essence in a de facto field-wide integration putting together the best available evidence to date. Once that is done, biological pathway analyses can be conducted and mechanistic models can be constructed (Figure 3).

Figure 1

Convergent functional genomics. GWAS, genome-wide association study; ISC, International Schizophrenia Consortium; SNP, single-nucleotide polymorphism.


Table 1 Top candidate genes for schizophrenia—CFG analysis of ISC GWAS data
Figure 2

Top candidate genes for schizophrenia. CFG, convergent functional genomics; GWAS, genome-wide association study; ISC, International Schizophrenia Consortium.


Figure 3

Schizophrenia as a disease of disconnection. (a) Biology of schizophrenia, (b) gene–environment interplay.


An obvious next step is developing a way of applying that knowledge to genetic testing of individuals to determine risk for the disorder. On the basis of our comprehensive identification of top candidate genes described in this paper, we have chosen the nominally significant single-nucleotide polymorphisms (SNPs) inside those genes in the GWAS data set used for discovery (International Schizophrenia Consortium, ISC), and assembled a genetic risk prediction (GRP) panel out of those SNPs. We then developed a genetic risk prediction score (GRPS) for schizophrenia based on the presence or absence of the alleles of the SNPs associated with the illness in ISC, and tested the GRPS in independent cohorts (GAIN European Americans (EA), GAIN African Americans (AA), nonGAIN EA, nonGAIN AA)9 for which we had both genotypic and clinical data available, comparing the schizophrenia subjects to normal controls. Our results show that a panel of SNPs in top genes identified and prioritized by CFG analysis can differentiate between schizophrenia subjects and controls at a population level, although at an individual level the margin is minimal. The latter point suggests that, like for bipolar disorder,11 the contextual cumulative combinatorics of common variants and environment12 plays a major role in risk for illness. Moreover, the genetic risk component identified by us seems to be stronger for classic age at onset schizophrenia than for early onset and late-onset schizophrenia, suggesting that those subtypes may be different, either in having a larger environmental component or having a different genetic component.

We have also looked at genetic heterogeneity, overlap and reproducibility between independent GWAS for schizophrenia. We show that the overlap is minimal at a nominal P-value SNP level, but increases dramatically at a gene level, then at a CFG-prioritized gene level and finally at a pathway level. CFG provides a fit-to-disease prioritization of genes that leads to generalizability in independent cohorts, and counterbalances the fit-to-cohort prioritization inherent in classic SNP level genetic-only approaches, which have been plagued by poor reproducibility across cohorts. Finally, we have looked at overlap with candidate genes for other psychiatric disorders (bipolar disorder, anxiety disorders), as well as with other disorders affecting cognition (autism, Alzheimer disease (AD)), and provide evidence for shared genes.

Overall, this work sheds comprehensive light on the genetic architecture and pathophysiology of schizophrenia, provides mechanistic targets for therapeutic intervention and has implications for genetic testing to assess risk for illness before the illness manifests itself clinically.

Materials and methods

Genome-wide association studies data for schizophrenia

The GWAS data from the ISC were used for the discovery CFG work.8 This cohort consists of EA subjects (3322 schizophrenics and 3587 controls). SNPs with a nominal allelic P-value <0.05 were selected for our analysis. No Bonferroni correction was performed.

Four independent cohorts,9 two EA (GAIN EA 1170 schizophrenics and 1378 controls; nonGAIN EA 1149 schizophrenics and 1347 controls) and two AA (GAIN AA 915 schizophrenics and 949 controls; nonGAIN AA 78 schizophrenics and 20 controls), were used for testing the results of the discovery analyses. The GWAS GAIN and nonGAIN data used for analyses described in this paper were obtained from the database of Genotype and Phenotype (dbGaP) found at www.ncbi.nlm.nih.gov.

The software package PLINK (http://pngu.mgh.harvard.edu/~purcell) was used to extract individual genotype information for each subject from the GAIN GWAS data files. We analyzed EA, and separately, AA, schizophrenia subjects and controls.

Gene identification

To identify the genes that correspond to the selected SNPs, the lists of SNPs from the GWAS were uploaded to NetAFFX (Affymetrix, Santa Clara, CA, USA; http://www.affymetrix.com/analysis/index.affx). We used the NetAFFX na32 Genotyping Annotation build. In cases where a SNP mapped to multiple genes, we selected all the genes. SNPs for which no gene was identified were not included in our subsequent analyses.

Convergent functional genomics analyses

Databases

We have established in our laboratory (Laboratory of Neurophenomics, Indiana University School of Medicine; www.neurophenomics.info) manually curated databases of all the human gene expression (postmortem brain, blood, cell cultures), human genetic (association, CNVs, linkage) and animal model gene expression and genetic studies published to date on psychiatric disorders.12 Only the findings deemed significant in the primary publication, by the study authors, using their particular experimental design and thresholds, are included in our databases. Our databases include only primary literature data, and do not include review papers or other secondary data integration analyses, to avoid redundancy and circularity. These large and constantly updated databases have been used in our CFG cross-validation and prioritization (Figure 1).

Human postmortem brain gene expression evidence

Information about genes was obtained and imported in our databases by searching the primary literature with PubMed (http://ncbi.nlm.nih.gov/PubMed), using various combinations of keywords (for this work: schizophrenia, psychosis, human, brain, postmortem). Convergence was deemed to occur for a gene if there were published human postmortem brain data showing changes in expression of that gene in tissue from patients with schizophrenia.

Human blood and other peripheral tissue gene expression data

For human blood gene expression evidence, we have used previously generated data from our group,7 as well as published data from the literature. We also included recent data generated from induced pluripotent stem cell-derived neurons.10

Human genetic evidence (association, CNVs, linkage)

To designate convergence for a particular gene, the gene had to have independent published evidence of association, CNVs or linkage for schizophrenia. We sought to avoid using any association studies that included subjects that were also included in the ISC or GAIN GWAS. For CNVs, all the known genes on a CNV were taken. For linkage, the location of each gene was obtained through GeneCards (http://www.genecards.org), and the sex-averaged cM location of the start of the gene was then obtained through http://compgen.rutgers.edu/old/map-interpolator/. For linkage convergence, per our previously published criteria,2 the start of the gene had to map within 5 cM of the location of a marker linked to the disorder.

Animal model brain and blood gene expression evidence

For animal model brain and blood gene expression evidence, we have used our own comprehensive pharmacogenomic mouse model (phencyclidine and clozapine) data sets,6 as well as published reports from the literature curated in our databases.

Animal model genetic evidence (transgenic)

To search for mouse genetic evidence (transgenic) for our candidate genes, we utilized PubMed as well as the Mouse Genome Informatics (http://www.informatics.jax.org; Jackson Laboratory, Bar Harbor, ME, USA) database, and used the ‘Genes and Markers’ search form to find transgenics for the categories ‘Schizophrenia’ and ‘abnormal nervous system physiology’ (subcategory ‘abnormal sensorimotor gating’).

Convergent functional genomics analysis scoring

We used a nominal P-value threshold for including genes from the ISC GWAS in the CFG analysis: having a SNP with P<0.05. All six cross-validating lines of evidence (other human data, animal model data) were weighted equally, receiving a maximum of 1 point each (for human genetic evidence: 0.5 points if it is linkage, 0.75 if it is from CNVs, 1 point if it is association). Thus, the maximum possible CFG score for each gene is 6. We have capped each line of evidence at 1 point, regardless of how many different studies support that line of evidence, to avoid potential ‘popularity’ biases, where some genes are more studied than others.

The more lines of evidence, that is, the more times a gene shows up as a positive finding across independent studies, platforms, methodologies and species, the higher its CFG score (Figure 1). This is similar conceptually to the Google PageRank algorithm, in which the more links to a page, the higher it comes up on the search prioritization list.13 Human and animal model data, genetic and gene expression were integrated and tabulated, resulting in a polyevidence CFG score. It has not escaped our attention that other ways of weighing the lines of evidence may give slightly different results in terms of prioritization, if not in terms of the list of genes per se. Nevertheless, we feel this simple scoring system provides a good separation of genes, with sensitivity provided by animal model data and specificity provided by human data.
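The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study. The names of the six lines of evidence are illustrative labels, and taking the highest-scoring kind when a gene has several kinds of human genetic evidence is an assumption, as the text does not state how they are combined.

```python
# Points per kind of human genetic evidence, as stated in the text.
HUMAN_GENETIC_POINTS = {"linkage": 0.5, "cnv": 0.75, "association": 1.0}

# The other five lines of evidence (label names are illustrative assumptions).
OTHER_LINES = (
    "human_brain_expression",
    "human_peripheral_expression",
    "animal_brain_expression",
    "animal_blood_expression",
    "animal_genetic",
)


def cfg_score(evidence):
    """Return a CFG-style score between 0 and 6 for one gene.

    evidence maps each label in OTHER_LINES to a count of supporting
    studies, and "human_genetic" to an iterable of evidence kinds.
    """
    score = 0.0
    for line in OTHER_LINES:
        # Each line is capped at 1 point, however many studies support it.
        if evidence.get(line, 0) > 0:
            score += 1.0
    kinds = evidence.get("human_genetic", ())
    # Assumption: with several kinds present, the highest-scoring one counts.
    score += max((HUMAN_GENETIC_POINTS[k] for k in kinds), default=0.0)
    return score


# Hypothetical gene: three brain expression studies plus linkage and CNV evidence.
example = {"human_brain_expression": 3, "human_genetic": ["linkage", "cnv"]}
print(cfg_score(example))  # 1.75
```

The cap is what guards against the ‘popularity’ bias: three supporting studies in one line of evidence contribute the same single point as one study.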

Pathway analyses

IPA 9.0 (Ingenuity Systems, Redwood City, CA, USA) was used to analyze the biological roles, including top canonical pathways, of the candidate genes resulting from our work (Table 2 and Supplementary Table S5), and to identify genes in our data sets that are the target of existing drugs (Supplementary Table S2).

Table 2 Ingenuity pathway analyses of top candidate genes

Intra-pathway epistasis testing

As an example,11 the ISC GWAS data were used to test for epistatic interactions among the best P-value SNPs in genes from our data set present in a top canonical biological pathway identified by Ingenuity pathway analysis (Supplementary Table S4). SNP × SNP allelic epistasis was tested for each distinct pair of SNPs between genes, using the PLINK software package.

Genetic risk prediction panel and scoring

As we had previously done for bipolar disorder,11 we developed a polygenic GRPS for schizophrenia based on the presence or absence of the alleles of the SNPs associated with illness, and tested the GRPS in independent cohorts for which we had both genotypic and clinical data available, comparing the schizophrenia subjects to normal controls. We tested two panels: a smaller one (GRPS-42) containing the single best P-value SNP in ISC in each of the top CFG prioritized genes (n=42), and a larger one (GRPS-542), containing all the nominally significant SNPs (n=542) in ISC in the top CFG prioritized genes (n=42; Tables 3, 4, Supplementary Table S3, and Figure 4).

Table 3 GRPS-42: non-differentiation between schizophrenics and controls in independent cohorts using a panel composed of the single best SNP from ISC in each of the top candidate genes (42 SNPs, in 42 genes)
Table 4 GRPS-542: differentiation between schizophrenics and controls in four independent cohorts using a panel composed of all the nominally significant SNPs from ISC in the top candidate genes (542 SNPs in 42 genes)
Figure 4

Genetic risk prediction of schizophrenia in four independent cohorts. AA, African American; EA, European American; GRPS, genetic risk prediction score.


Of note, our SNP panels and choice of affected alleles were based solely on analysis of the ISC GWAS, which is our discovery cohort, completely independently from the test cohorts. Each SNP has two alleles (represented by base letters at that position). One of them is associated with the illness (affected), the other not (non-affected), based on the odds ratios from the discovery ISC GWAS. We assigned the affected allele a score of 1 and the non-affected allele a score of 0. A two-dimensional matrix of subjects by GRP panel alleles is generated, with the cells populated by 0 or 1. A SNP in a particular individual subject can have any combination of 1 and 0 (1 and 1, 0 and 1, 0 and 0). By adding these numbers, the minimum score for a SNP in an individual subject is 0, and the maximum score is 2. By adding the scores for all the alleles in the panel, averaging that, and multiplying by 100, we generate for each subject an average score corresponding to a genetic loading for disease, which we call the genetic risk prediction score (GRPS).
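The per-subject calculation can be illustrated with a short Python sketch. It is not the code used in the study. The SNP identifiers and genotypes are hypothetical, and averaging over all allele cells (two per SNP), so that the score runs from 0 to 100, is an assumption about how the averaging step is carried out.

```python
def grps(genotypes, affected_alleles):
    """Genetic risk prediction score for one subject (sketch).

    genotypes: dict mapping SNP id -> (allele1, allele2) for the subject.
    affected_alleles: dict mapping SNP id -> the allele associated with
        illness in the discovery cohort.
    """
    cells = []
    for snp, affected in affected_alleles.items():
        allele1, allele2 = genotypes[snp]
        # Affected allele scores 1, non-affected allele scores 0.
        cells.append(1 if allele1 == affected else 0)
        cells.append(1 if allele2 == affected else 0)
    # Assumption: the average is taken over all allele cells in the panel.
    return 100.0 * sum(cells) / len(cells)


# Hypothetical two-SNP panel and one hypothetical subject.
panel = {"snp_a": "A", "snp_b": "G"}
subject = {"snp_a": ("A", "A"), "snp_b": ("G", "T")}
print(grps(subject, panel))  # 75.0
```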

The software package PLINK (http://pngu.mgh.harvard.edu/~purcell) was used to extract individual genotype information for each subject from the GAIN and nonGAIN GWAS data files. We analyzed separately EA and AA schizophrenia subjects and controls, to examine any potential ethnicity variability (Tables 3 and 4, and Supplementary Table S3). To test for significance, a one-tailed t-test was performed between the schizophrenia subjects and the control subjects, looking at differences in GRPS.
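A one-tailed comparison of GRPS between groups might look like the sketch below. The text specifies only a one-tailed t-test, so the unequal-variance (Welch) form of the statistic and the normal approximation used for the P-value are both assumptions made to keep the example within the Python standard library. The approximation is reasonable only for large groups, and the scores shown are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def one_tailed_test(cases, controls):
    """Test whether cases have a higher mean score than controls.

    Returns (t, p), where t is the Welch t statistic and p is a one-tailed
    P-value from a normal approximation (an assumption; see lead-in).
    """
    standard_error = sqrt(
        variance(cases) / len(cases) + variance(controls) / len(controls)
    )
    t = (mean(cases) - mean(controls)) / standard_error
    p = 1.0 - NormalDist().cdf(t)
    return t, p


# Made-up GRPS values for illustration only.
t, p = one_tailed_test([52, 54, 56, 58], [48, 50, 52, 54])
print(round(t, 3))  # 2.191
```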

Results

Top candidate genes

To minimize false negatives, we initially cast a wide net, using as a filter a minimal requirement for a gene to have both some GWAS evidence and some additional independent evidence. We thus generated an initial list of 3194 unique genes with at least one SNP at P<0.05 in the discovery GWAS analyzed (ISC),8 that also had some additional evidence (human or animal model data) implicating them in schizophrenia (CFG score 1; Table 5). This suggests, using these minimal thresholds and requirements, that the repertoire of genes potentially involved directly or indirectly in cognitive processes and schizophrenia may be quite large, similar to what we have previously seen for bipolar disorder.11

Table 5 Reproducibility between independent GWAS

To minimize false positives, we then used the CFG analysis integrating multiple lines of evidence to further prioritize this list of genes, and focused our subsequent analyses on only the top CFG scoring candidate genes. Overall, 186 genes had a CFG score of 3 and above (50% of maximum possible score of 6), and 42 had a CFG score of 4 and above (Tables 1 and 5, and Figure 2).

Our top findings from ISC (Table 1) were over-represented in two independent schizophrenia GWAS cohorts, GAIN EA and GAIN AA. In total, 37 of the top 42 genes identified by our approach (88.1%) had at least one SNP with a P-value of <0.05 in those independent cohorts, an estimated twofold enrichment over what would be expected by chance alone at a genetic level. As there were 9002 genes at P<0.05 in the GAIN-EA GWAS, and the number of genes in the human genome is estimated at 20 500,14 the enrichment factor provided by our approach is (37/42)/(9002/20 500)≈2. Of note, there was no correlation between CFG prioritization and gene size, thus excluding a gene-size effect for the observed enrichment (Supplementary Figure S1).
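The enrichment arithmetic can be checked directly. The sketch below uses only the figures quoted in the text; the function and parameter names are illustrative.

```python
def enrichment_factor(hits_in_panel, panel_size, hits_genome_wide, genome_size):
    """Ratio of the observed hit rate in a gene panel to the genome-wide rate."""
    observed_rate = hits_in_panel / panel_size        # 37 / 42
    expected_rate = hits_genome_wide / genome_size    # 9002 / 20500
    return observed_rate / expected_rate


factor = enrichment_factor(37, 42, 9002, 20500)
print(round(factor, 2))  # 2.01
```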

Candidate blood biomarkers

Of the top candidate genes from Table 1 (see also Figure 2), 15 out of 42 have prior human blood evidence for change in schizophrenia, implicating them as potential blood biomarkers. The additional evidence provided by GWAS data suggests a genetic rather than purely environmental (medications, stress) basis for their alteration in disease, and their potential utility as trait rather than purely state markers.

Biological pathways

Pathway analyses were carried out on the top genes (Table 2), and on all the candidate genes (Supplementary Table S5). Notably, glutamate receptor signaling, G-protein–coupled receptor signaling and cAMP-mediated signaling were the top canonical pathways over-represented in schizophrenia, which may be informative for new drug discovery efforts by pharmaceutical companies.

Genetic risk prediction

Once the genes involved in a disorder are identified, and prioritized for likelihood of involvement, then an obvious next step is developing a way of applying that knowledge to genetic testing of individuals to determine risk for the disorder. Based on our identification of top candidate genes described above using CFG, we pursued a polygenic panel approach, with digitized binary scoring for presence or absence, similar to the one we have devised and used in the past for biomarkers testing5 and for genetic testing in bipolar disorder.11 Somewhat similar approaches but without CFG prioritization, attempted by other groups, have been either unsuccessful15 or have required very large panels of markers.8, 16

We first chose the single best P-value SNPs in each of our top CFG prioritized genes (n=42) in the ISC GWAS data set used for discovery, and assembled a GRP panel out of those SNPs (Table 3). We then developed a GRPS for schizophrenia based on the presence or absence of the alleles of the SNPs associated with the illness, and tested the GRPS in independent cohorts (GAIN EA and GAIN AA), comparing the schizophrenia subjects to normal controls (Table 3). The results were not significant. We concluded that genetic heterogeneity at a SNP level is a possible explanation for these negative results. We then tested whether a larger panel, containing all the nominally significant SNPs (n=542) in the top CFG prioritized genes in ISC (n=42), would give better separation, on the premise that a larger panel may reduce the heterogeneity effects, as different SNPs might be more strongly associated with illness in different cohorts. We found that our larger panel of SNPs was indeed able to significantly distinguish schizophrenics from controls in both GAIN EA and GAIN AA, two independent cohorts of different ethnicities. To verify this unexpectedly strong result, we further tested our panel in two other independent cohorts, nonGAIN EA and nonGAIN AA, and obtained similarly significant results (Table 4 and Figure 4).

We also looked at whether our GRPS score distinguishes classic age of onset schizophrenia (defined by us as ages 15 to 30 years) from early onset (before 15 years) and late-onset (after 30 years) illness. Our results show that classic age of onset schizophrenia has a significantly higher GRPS than early or late-onset schizophrenia, in three out of the four independent cohorts of two different ethnicities (Figure 5).

Figure 5

Genetic risk score and age at onset of schizophrenia. AA, African American; AAO, age at onset; EA, European American; GRPS, genetic risk prediction score.


Finally, as we had done previously for bipolar disorder,11 we developed a prototype of how the GRPS score could be used in testing individuals to establish their category of risk for schizophrenia (Figure 6). The current iteration of the test, using the panel of 542 SNPs, seems to be able to distinguish in independent cohorts who is at lower risk for classic age of onset schizophrenia in two out of three EA subjects, and who is at higher risk for classic age of onset schizophrenia in three out of four AA subjects.

Figure 6

Prototype of how genetic risk prediction score (GRPS) testing could be used at an individual rather than population level, to aid diagnostic and personalized medicine approaches. We used the average values and standard deviation values for GRPS from the GAIN samples from each ethnicity (European American (EA) and African American (AA)) as thresholds for predictive testing in the independent nonGAIN EA and nonGAIN AA cohorts. The average GRPS score for schizophrenics in the GAIN cohort is used as a cut-off for schizophrenics in the test nonGAIN cohort (that is, being above that threshold), and the average GRPS score for controls in the GAIN cohort is used as a cut-off for controls in the test nonGAIN cohort (that is, being below that threshold). The subjects who are in between these two thresholds are called undetermined. Furthermore, to stratify risk, we categorized subjects into risk categories (in red, increased risk; in blue, decreased risk): category 1 if they fall within one standard deviation above the schizophrenics’ threshold, and category −1 if they fall within one standard deviation below the controls’ threshold. Category 2 and −2 subjects are between one and two standard deviations from the thresholds, category 3 and −3 subjects are between two and three standard deviations, and category 4 and −4 subjects are those who fall beyond three standard deviations of the thresholds. The positive predictive value (PPV) of the tests increases in the higher categories, and the test is somewhat better at distinguishing controls in EA (that is, in a practical application, individuals who are at lower risk of developing the illness), and schizophrenics in AA (that is, in a practical application, individuals who are at higher risk of developing the illness).
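The category assignment described in this legend can be sketched as follows. This is an illustration rather than the study's code. It assumes that distances above the schizophrenics’ threshold are measured in units of the schizophrenia group's standard deviation and distances below the controls’ threshold in units of the control group's standard deviation, and that a score falling exactly on a boundary goes to the higher category; the legend does not specify either point. All numeric values are made up.

```python
def risk_category(score, case_mean, case_sd, control_mean, control_sd):
    """Assign a GRPS risk category from -4 to 4 (0 means undetermined).

    case_mean/case_sd and control_mean/control_sd come from a reference
    cohort and are applied to a subject from an independent test cohort.
    Assumes case_mean >= control_mean.
    """
    if score > case_mean:
        # Number of whole standard deviations above the cases' threshold.
        steps = int((score - case_mean) // case_sd)
        return min(4, steps + 1)
    if score < control_mean:
        # Number of whole standard deviations below the controls' threshold.
        steps = int((control_mean - score) // control_sd)
        return -min(4, steps + 1)
    return 0  # between the two thresholds: undetermined


# Made-up thresholds for illustration only.
print(risk_category(54.5, 51.0, 2.0, 49.0, 2.0))  # 2
print(risk_category(50.0, 51.0, 2.0, 49.0, 2.0))  # 0
```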


Overlap among studies

We examined the overlap at a nominally significant (P<0.05) SNP level between ISC, GAIN EA and GAIN AA, and found that a minority of these SNPs (0.4%) overlap (Table 5 and Figure 7). We then examined the overlap at a gene level, then CFG prioritized genes level and finally biological pathways level, and found increasing evidence of commonality and reproducibility of findings across studies.
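A toy example illustrates why overlap can rise when moving from the SNP level to the gene level: different cohorts may implicate different SNPs that nonetheless map to the same genes. The identifiers below are hypothetical, and measuring overlap as the shared fraction of the union across cohorts is an assumption, as the text does not state which denominator was used.

```python
def overlap_fraction(sets):
    """Fraction of all items (union) that are present in every set."""
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


# Hypothetical nominally significant SNPs in three cohorts.
snps_by_cohort = [{"s1", "s2"}, {"s3", "s4"}, {"s5", "s6"}]
# Hypothetical SNP-to-gene annotation.
snp_to_gene = {"s1": "G1", "s2": "G2", "s3": "G1",
               "s4": "G2", "s5": "G1", "s6": "G2"}

genes_by_cohort = [{snp_to_gene[s] for s in snps} for snps in snps_by_cohort]

print(overlap_fraction(snps_by_cohort))   # 0.0
print(overlap_fraction(genes_by_cohort))  # 1.0
```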

Figure 7

Overlap between independent genome-wide association studies (GWAS). AA, African American; EA, European American; CFG, convergent functional genomics; ISC, International Schizophrenia Consortium; SNP, single-nucleotide polymorphism.


Discussion

Our CFG approach helped prioritize genes, such as DISC1 and MBP, with weaker evidence in the GWAS data but with strong independent evidence in terms of gene expression studies and other prior human or animal genetic work. Conversely, some of the top findings from GWAS, such as ZNF804A, have fewer different independent lines of evidence, and thus received a lower CFG prioritization score in our analysis (Supplementary Information-Table S1), although ZNF804A is clearly involved in schizophrenia-related cognitive processes.17 While we cannot exclude that more recently discovered genes have had less hypothesis-driven work done and thus might score lower on CFG, it is to be noted that the CFG approach integrates predominantly non-hypothesis-driven, discovery-type data sets, such as gene expression, GWAS, CNV, linkage and quantitative trait loci. We also cap each line of evidence from an experimental approach (Figure 1) at a maximum score of 1, to minimize any ‘popularity’ bias, whereby multiple studies of the same kind are conducted on better-established genes. In the end, it is gene-level reproducibility across multiple approaches and platforms that is built into the approach and gets prioritized most by CFG scoring during the discovery process. Our top results subsequently show good reproducibility and predictive ability in independent cohort testing, the litmus test for any such work.

At the very top of our list of candidate genes for schizophrenia, with a CFG score of 5, we have four genes: DISC1, TCF4, MBP and HSPA1B. An additional five genes have a CFG score of 4.5: MOBP, NRCAM, NCAM1, NDUFV2 and RAB18.

DISC1 (Disrupted-in Schizophrenia 1), encodes a scaffold protein that has an impact on neuronal development and function,18, 19, 20 including neuronal connectivity.21 DISC1 has been identified as a susceptibility gene for major mental disorders by multiple studies.22, 23, 24 DISC1 isoforms are upregulated in expression in blood cells in schizophrenia, thus serving as a potential peripheral biomarker as well.25, 26 Developmental stress interacts with DISC1 expression to produce neuropsychiatric phenotypes in mice.27 Notably, its interacting partners PDE4B,28 TNIK,29 FEZ1 (ref. 30) and DIXDC1 (ref. 31) are also present on our list of prioritized candidate genes, with CFG scores of 4, 4, 3.5 and 2.5, respectively (Table 1 and Supplementary Table S1).

TCF4 (transcription factor 4) encodes a basic helix-loop-helix transcription factor, expressed in the immune system as well as in neuronal cells. It is required for the differentiation of subsets of neurons in the developing brain. There are multiple alternatively spliced transcripts that encode different proteins, providing for biological diversity and heterogeneity. Defects in this gene are a cause of Pitt-Hopkins syndrome, characterized by mental retardation with or without associated facial dysmorphisms and intermittent hyperventilation. TCF4 has additional genetic evidence for association with schizophrenia-relevant phenotypes.32, 33, 34, 35 It is changed in expression in postmortem brain,36 induced pluripotent stem cell-derived neurons10 and blood from schizophrenia patients.7 Notably, it is a candidate blood biomarker for level of delusional symptoms (decreased in high delusional states) based on our previous work.7

MBP (myelin basic protein) is a major constituent of the myelin sheath of oligodendrocytes and Schwann cells in the nervous system. MBP-related transcripts are also present in the bone marrow and the immune system. MBP has additional genetic evidence for association with schizophrenia.37 It is decreased in expression in postmortem brain38 and blood39 from schizophrenia patients. MBP is also changed in expression in the brain and blood of a pharmacogenomics mouse model of schizophrenia, based on our previous work.6 It was also decreased in expression in a stress-reactive genetic mouse model of bipolar disorder,40 and treatment with the omega-3 fatty acid docosahexaenoic acid led to an increase in expression. Notably, MBP is a candidate blood biomarker for level of mood symptoms (increased in high mood states in bipolar subjects), based on our previous work.5 Overall, the data indicate that MBP and other myelin-related genes41, 42 may be involved in the effects of stress on psychosis and mood. Demyelinating disorders such as multiple sclerosis tend to be precipitated and exacerbated by stress, and have co-morbid psychiatric symptoms.43 Of note, other myelin-related genes are also present on our list of prioritized candidate genes: MOBP and MOG, with CFG scores of 4.5 and 3, respectively (Table 1 and Supplementary Table S1).

HSPA1B (heat-shock 70-kDa protein 1B), a chaperone involved in stress response, stabilizes existing proteins against aggregation and mediates the folding of newly translated proteins. HSPA1B has additional genetic evidence for association with schizophrenia.44 It is changed in expression in postmortem brain45 and induced pluripotent stem cell-derived neurons10 from schizophrenia patients. HSPA1B is also increased in expression in the brain and blood of a pharmacogenomics mouse model of schizophrenia, based on our previous work.6 It was also co-directionally changed in the brain and blood in a pharmacogenomic mouse model of anxiety disorders that we have recently described,46 as well as in a stress-reactive genetic mouse model.40 Treatment with the omega-3 fatty acid docosahexaenoic acid reversed the increase in expression of HSPA1B in this stress-reactive genetic mouse model.47 Another closely related molecule, HSPA1A (heat-shock 70-kDa protein 1A), is also present on our list of prioritized candidate genes, with a CFG score of 3.5 (Supplementary Table S1). Heat-shock proteins may be involved in the biological and clinical overlap and interdependence between response to stress, anxiety and psychosis.

NRCAM (neuronal cell adhesion molecule) encodes a neuronal cell adhesion molecule. This ankyrin-binding protein is involved in neuron–neuron adhesion and promotes directional signaling during axonal cone growth. NRCAM is also expressed in non-neural tissues and may have a general role in cell–cell communication via signaling from its intracellular domain to the actin cytoskeleton during directional cell migration. It is decreased in expression in postmortem brain48 and peripherally in serum49 from schizophrenia patients. NRCAM is also changed in expression in the brain of a pharmacogenomics mouse model of schizophrenia, based on our previous work.6 It was also increased in the amygdala in a stress-reactive genetic mouse model studied by our group.40 Another closely related molecule, NCAM1 (neural cell adhesion molecule 1), is among our top candidate genes as well. These data support a central role for cell connectivity and cell adhesion in schizophrenia.

Another top candidate gene is CNR1 (cannabinoid receptor 1, brain). CNR1 is a member of the guanine-nucleotide-binding protein (G-protein) coupled receptor family, which inhibits adenylate cyclase activity in a dose-dependent manner. CNR1 has additional genetic evidence for association with schizophrenia.50, 51 It is decreased in expression in postmortem brain from schizophrenics.52 The other main cannabinoid receptor, CNR2 (cannabinoid receptor 2), is among our top candidate genes too (Supplementary Table S1), and is decreased in expression in postmortem brain from schizophrenics as well. These data support a role for the cannabinoid system in schizophrenia, perhaps through a deficiency of the endogenous cannabinoid signaling that leads to vulnerability to psychotogenic stress,53 and is accompanied by increased compensatory exogenous cannabinoid consumption that may have additional deleterious consequences.54

A number of glutamate receptor genes are present among our top candidate genes for schizophrenia (GRIA1, GRIA4, GRIN2B and GRM5), as well as GAD1, an enzyme involved in glutamate metabolism, and SLC1A2, a glutamate transporter (Table 1). Other genes involved in glutamate signaling present in our data, with lower scores, are GRIN2A, SLC1A3, GRIA3, GRIK4, GRM1, GRM4 and GRM7 (Supplementary Table S1). Glutamate receptor signaling is one of the top canonical pathways over-represented in our analyses (Table 2), and that finding is reproduced in independent GWA data sets (Table 2). One has to be circumspect in interpreting such results, as glutamate signaling is quasi-ubiquitous in the brain, and a lot of prior hypothesis-driven work has focused on this area, potentially biasing the available evidence. Nevertheless, our results are striking, and contribute to the growing body of evidence that has emerged over the last few years implicating glutamate signaling as a point of convergence for findings in schizophrenia,55 as well as for autism56 and AD.57 Glutamate signaling is the target of active drug development efforts,58 which may be informed and encouraged by our current findings.

Our analysis also provides evidence for other genes that have long been of interest in schizophrenia, but have previously had variable evidence from genetic-only studies: BDNF, COMT, DRD2, DTNBP1 (dystrobrevin binding protein 1/dysbindin; Table 1). In addition, our analysis provides evidence for genes that had previously not been widely implicated in schizophrenia, but do have relevant biological roles, demonstrating the value of empirical discovery-based approaches such as CFG (Table 1): ANK3,48 ALDH1A1 and ADCYAP1, which is a ligand for schizophrenia candidate gene VIPR2,59, 60 also present in our data set, albeit with a lower CFG score of 2. Other genes of interest in our full data set (Supplementary Table S1) include ADRBK2 (GRK3), first described by us as a candidate gene for psychosis,1 CHRNA7,61 and PDE10A,62 which are targets for drug development efforts.

Pathways and mechanisms

Our pathway analysis results are consistent with the accumulating evidence about the role of synaptic connections and glutamate signaling in schizophrenia, most recently from CNV studies63 (Table 2, Supplementary Table S5, Figure 3). Very importantly, the same top pathways were consistent across independent GWA studies we analyzed (Tables 2, 5, and Supplementary Table S5). We also did a manual curation of the top candidate genes and their grouping into biological roles, examining them one by one using PubMed and GeneCards, to come up with a heuristic model of schizophrenia (Figure 3). Overall, while multiple mechanistic entry points may contribute to schizophrenia pathogenesis (Figure 3a), it is likely at its core a disease of decreased cellular connectivity precipitated by environmental stress during brain development, on a background of genetic vulnerability (Figure 3b).

Genetic risk prediction

Of note, our SNP panels and choice of affected alleles were based solely on analysis of the discovery ISC GWAS, completely independently from the test GAIN EA, GAIN AA, nonGAIN EA and nonGAIN AA GWAS. Our results show that a relatively limited and well-defined panel of SNPs identified based on our CFG analysis could differentiate between schizophrenia subjects and controls in four independent cohorts of two different ethnicities, EA and AA. Moreover, the genetic risk component identified by us seems to be stronger for classic age of onset schizophrenia than for early or late-onset illness, suggesting that the latter two may be more environmentally driven or have a somewhat different genetic architecture. It is likely that such genetic testing will have to be optimized for different cohorts if done at a SNP level. Interestingly, at a gene and pathway level, the differences between studies seem much less pronounced than at a SNP level, if at all present (Table 5), suggesting that gene-level and pathway-level tests may have more universal applicability. In the end, such genetic data, combined with family history and other clinical information (phenomics),64 as well as with blood biomarker testing,5 may provide a comprehensive picture of risk of illness.65, 66
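As an illustration of how a panel-based score of this kind could be computed, here is a minimal sketch. It assumes, for illustration only, that each subject's score is the average fraction of affected alleles carried across the panel SNPs, scaled to 0-100; the exact scoring rule is not stated in this section. The SNP identifiers, alleles and the function name `genetic_risk_score` are hypothetical.

```python
from statistics import mean


def genetic_risk_score(genotype, affected_alleles):
    """Return a 0-100 score: mean fraction of affected alleles over panel SNPs.

    genotype: dict mapping SNP id -> two-letter genotype string, e.g. "AG"
    affected_alleles: dict mapping SNP id -> allele designated as affected
        in the discovery cohort (assumed to be fixed before testing)
    Panel SNPs that were not genotyped in the subject are skipped.
    """
    fractions = []
    for snp, risk_allele in affected_alleles.items():
        alleles = genotype.get(snp)
        if alleles is None:
            continue  # SNP not genotyped for this subject
        # 0, 1 or 2 copies of the affected allele -> 0.0, 0.5 or 1.0
        fractions.append(alleles.count(risk_allele) / 2)
    if not fractions:
        raise ValueError("no panel SNPs genotyped for this subject")
    return 100 * mean(fractions)


# Hypothetical three-SNP panel and one hypothetical subject (not real data).
panel = {"snp_1": "A", "snp_2": "T", "snp_3": "G"}
subject = {"snp_1": "AA", "snp_2": "CT", "snp_3": "CC"}
score = genetic_risk_score(subject, panel)  # (1.0 + 0.5 + 0.0) / 3 * 100
```

Because the affected alleles are defined once, from the discovery cohort, the same `panel` dictionary would be applied unchanged to every test cohort; group scores could then be compared between cases and controls.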

Reproducibility among studies

Our work provides striking evidence for the advantages, reproducibility and consistency of gene-level analyses of data, as opposed to SNP-level analyses, pointing to the fundamental issue of genetic heterogeneity at a SNP level (Table 5 and Figure 7). In fact, it may be that the more biologically important a gene is for higher mental functions, the more heterogeneity it has at a SNP level67 and the more evolutionary divergence,68 for adaptive reasons. On top of that, CFG provides a way to prioritize genes based on disease relevance, not study-specific effects (that is, fit-to-disease as opposed to fit-to-cohort). Reproducibility of findings across different studies, experimental paradigms and technical platforms is deemed more important (and scored as such by CFG) than the strength of finding in an individual study (for example, P-value in a GWAS). The CFG prioritized genes show even more reproducibility among independent GWAS cohorts (ISC, GAIN EA, GAIN AA) than the full list of unprioritized genes with nominally significant SNPs. The increasing overlap and reproducibility between studies of genes with a higher average CFG score points to their biological relevance to disease architecture. Finally, at a pathway level, there is even more consistency across studies. Again, the pathways derived from the top CFG scoring genes show more consistency than the pathways derived from the lower CFG scoring genes. Overall, using our approach, we go from a reproducibility between independent studies of 0.4% at the level of nominally significant SNPs to a reproducibility of 97.1% at the level of pathways derived from top CFG scoring genes.
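The contrast between SNP-level and gene-level reproducibility can be illustrated with a small sketch. It assumes, as a simplification, that reproducibility is the percentage of items from one study that also appear in a second study; the definition used for the figures reported here may differ. All SNP and gene identifiers are hypothetical placeholders, and the helper names are introduced only for this example.

```python
def reproducibility(discovery, replication):
    """Percentage of discovery-study items also present in the replication study."""
    discovery, replication = set(discovery), set(replication)
    if not discovery:
        raise ValueError("discovery set is empty")
    return 100 * len(discovery & replication) / len(discovery)


def snps_to_genes(snps, snp_to_gene):
    """Collapse a set of SNPs to the set of genes they map to."""
    return {snp_to_gene[s] for s in snps if s in snp_to_gene}


# Hypothetical annotation: several SNPs can map to the same gene.
snp_to_gene = {
    "snp_a1": "GENE_A", "snp_a2": "GENE_A",
    "snp_b1": "GENE_B", "snp_b2": "GENE_B",
    "snp_c1": "GENE_C",
}

# Two hypothetical studies hit the same genes through mostly different SNPs.
study_1 = {"snp_a1", "snp_b1", "snp_c1"}
study_2 = {"snp_a2", "snp_b2", "snp_c1"}

snp_level = reproducibility(study_1, study_2)  # only snp_c1 is shared
gene_level = reproducibility(
    snps_to_genes(study_1, snp_to_gene),
    snps_to_genes(study_2, snp_to_gene),
)  # all three genes are shared
```

In this toy case only one of three SNPs replicates, yet every gene does, which is the pattern expected when different cohorts carry different variants in the same genes. The same function could be applied one level up, to sets of pathways derived from each study's genes.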

Overlap with other psychiatric disorders

Despite using lines of evidence for our CFG approach that have to do only with schizophrenia, the list of genes identified has a notable overlap with other psychiatric disorders (Figure 8, Supplementary Table S1). This is a topic of major interest and debate in the field.12, 69 We demonstrate an overlap between top candidate genes for schizophrenia and candidate genes for anxiety and bipolar disorder, previously identified by us through CFG (Figure 8), thus providing a possible molecular basis for the frequently observed clinical co-morbidity and interdependence between schizophrenia and those other major psychiatric disorders, as well as cross-utility of pharmacological agents. In particular, PDE10A is at the overlap of all three major psychiatric domains, and may be of major interest for drug development.62 The overlap between schizophrenia and bipolar disorder may have to do primarily with neurotrophicity and brain infrastructure (underlined by genes such as DISC1, NRG1, BDNF, MBP, NCAM1, NRCAM, PTPRM). The overlap between schizophrenia and anxiety may have to do primarily with reactivity and stress response (underlined by genes such as NR4A2, QKI, RGS4, HSPA1B, SNCA, STMN1, LPL). Notably, the overlap between schizophrenia and anxiety is of the same magnitude as the previously better appreciated overlap between schizophrenia and bipolar disorder,6, 70 supporting the consideration of a nosological domain of schizoanxiety disorder,46 by analogy to schizoaffective disorder. Clinically, while there are some reports of co-morbidity between schizophrenia and anxiety,71 it is an area that has possibly been under-appreciated and understudied. ‘Schizoanxiety disorder’ may have heuristic value and pragmatic clinical utility.

Figure 8. Genetic overlap among psychiatric disorders.

We also looked at the overlap with candidate genes for autism and AD from the literature (Supplementary Table S1), to elucidate whether schizophrenia, autism and AD might be on a spectrum, that is, whether autism might be a form of ‘schizophrenia praecox’, similar to schizophrenia being referred to as ‘dementia praecox’ (Kraepelin). We see significant overlap between the three disorders among the top genes with a CFG score of 4: a third of the genes overlap between schizophrenia and autism, and a quarter between schizophrenia and AD. Additional key genes of interest are lower on the list as well, with a CFG score of 3: CNTNAP2 for autism, MAPT and SNCA for AD (Supplementary Table S1).

Conclusions and future directions

First, in spite of its limitations, our analysis is arguably the most comprehensive integration of genetics and functional genomics to date in the field of schizophrenia, yielding a comprehensive view of genes, blood biomarkers, pathways and mechanisms that may underlie the disorder. From a pragmatic standpoint, we would like to suggest that our work provides new and/or more comprehensive insights on genes and biological pathways to target for new drug development by pharmaceutical companies, as well as potential new uses in schizophrenia for existing drugs, including omega-3 fatty acids (Supplementary Table S2).

Second, our current work and body of work over the years provide proof of how a combined approach, integrating functional and genotypic data, can be used for complex disorders, both psychiatric and non-psychiatric, as has been attempted by others as well.72, 73 What we are seeing across GWAS of complex disorders are not necessarily the same SNPs showing the strongest signal, but rather consistency at the level of genes and biological pathways. The distance from genotype to phenotype may be a bridge too far for genetic-only approaches, given genetic heterogeneity and the intervening complex layers of epigenetics and gene expression regulation.74 Consistency is much higher at a gene expression level (Table 5),75 and then at a biological pathway level. Using GWAS data in conjunction with gene expression data as part of CFG or integrative genomics76 approaches, followed by pathway-level analysis of the prioritized candidate genes, can lead to the unraveling of the genetic code of complex disorders such as schizophrenia.

Third, our work provides additional integrated evidence focusing attention and prioritizing a number of genes as candidate blood biomarkers for schizophrenia, with an inherited genetic basis (Table 1 and Figure 2). While prior evidence existed as to alterations in gene expression levels of those genes in whole-blood samples or lymphoblastoid cell lines from schizophrenia patients, it was unclear prior to our analysis whether those alterations were truly related to the disorder or were instead related only to medication effects and environmental factors.

Fourth, we have put together a panel of SNPs, based on the top candidate genes we identified. We developed a GRPS based on our panel, and demonstrate how in four independent cohorts of two different ethnicities, the GRPS differentiates between subjects with schizophrenia and normal controls. From a personalized medicine standpoint, genetic testing with highly prioritized panels of best SNP markers may have, upon further development (Figure 6) and calibration by ethnicity and gender, a role in informing decisions regarding early intervention and prevention efforts; for example, for classic age of onset schizophrenia before the illness fully manifests itself clinically, in young offspring from high-risk families. After the illness manifests itself, gene expression biomarkers and phenomic testing approaches, including clinical data, may have higher yield than genetic testing. A multi-modal integration of testing modalities would be the best approach to assess and track patients, as individual markers are unlikely to be specific for a single disorder. The continuing re-evaluation of psychiatric nosology66, 77 brought about by recent advances will have to be taken into account as well for final interpretation of any such testing. The complexity, heterogeneity, overlap and interdependence of major psychiatric disorders as currently defined by DSM suggest that the development of tests for dimensional disease manifestations (psychosis, mood and anxiety)66 will ultimately be more useful and precise than developing tests for existing DSM diagnostic categories.

Finally, while we cannot exclude that rare genetic variants with major effects may exist in some individuals and families, we suggest a contextual cumulative combinatorics of common variants genetic model best explains our findings, and accounts for the thin genetic load margin between clinically ill subjects and normal controls, which leaves a major role to be played by gene expression (including epigenetic changes) and the environment. This is similar to our conclusions when studying bipolar disorder,11 and may hold true in general for complex medical disorders, psychiatric and non-psychiatric. Full-blown illness occurs when genetic and environmental factors converge, usually in young adulthood for schizophrenia. When they diverge, a stressful/hostile environment may lead to mild or transient illness even in normal genetic load individuals, whereas a favorable environment may lead to supra-normative functioning in certain life areas (such as creative endeavors) for individuals who carry a higher genetic load. The flexible interplay between genetic load, environment and phenotype may permit evolution to engender diversity, select and conserve alleles, and ultimately shape populations. Our emerging mechanistic understanding of psychosis as disconnectivity, mood as activity11 and anxiety as reactivity46 may guide such testing and understanding of population distribution as being on a multi-dimensional spectrum, from supra-normative to normal to clinical illness.