GWAS results for educational attainment aid in identifying genetic heterogeneity of schizophrenia

Higher educational attainment (EA) is known to have a protective effect on the severity of schizophrenia (SZ). However, recent studies have found a small positive genetic correlation between EA and SZ. We investigated possible causes of this counterintuitive finding using genome-wide association (GWAS) results for EA and SZ (N = 443,581) and a replication cohort (1,169 controls and 1,067 cases) with high-quality SZ phenotypes. We found strong genetic dependence between EA and SZ that cannot be explained by chance, linkage disequilibrium, or assortative mating. Instead, several genes seem to have pleiotropic effects on EA and SZ, but without a clear pattern of sign concordance. Genetic heterogeneity in both phenotypes is the most likely explanation of this finding. This insight can be exploited by using a combination of EA and SZ GWAS results to improve the polygenic prediction of clinical symptoms and disease severity of SZ. In particular, although a polygenic score for SZ is robustly associated with case-control status, it does not predict any of the SZ symptoms or disease severity. In contrast, co-dependent polygenic scores that split the SZ score into two parts based on the sign concordance of SNPs for SZ and EA predict symptoms and disease severity in patients to some extent. Furthermore, using EA as a proxy-phenotype for SZ, we isolate FOXO6 and SLITRK1 as additional statistically plausible candidate genes for SZ.


INTRODUCTION
Schizophrenia (SZ) is the collective term used for a severe, highly heterogeneous and costly psychiatric disorder that is caused by environmental and genetic factors. [1][2][3][4] The latest genome-wide association study (GWAS) by the Psychiatric Genomics Consortium (PGC) identified 108 genomic loci that are associated with SZ. 5 These 108 loci jointly account for ≈3.4% of the variation on the liability scale to SZ, 5 while all single nucleotide polymorphisms (SNPs) that are currently measured by SNP arrays capture ≈64% (s.e. = 8%) in the variation in liability to the disease. 6 This implies that many genetic variants with small effect sizes contribute to the heritability of SZ, but most of them are unidentified as of yet. A polygenic score (PGS) based on all SNPs currently accounts for 4-15% of the variation on the liability scale to SZ. 5 Yet, this PGS does not predict any differences of symptoms or severity of the disease among SZ patients. 4 Partly, this could be due to the fact that the clinical disease classification of SZ spans across several different behavioral and cognitive traits that may not have identical genetic architectures. Therefore, identifying additional genetic variants and understanding through which pathways they are linked with the clinical diagnosis of SZ is an important step in understanding the etiologies of the 'schizophrenias'. 7 However, GWAS analyses of specific SZ symptoms would require very large sample sizes to be statistically well-powered, and the currently available datasets on deeply phenotyped SZ patients are not large enough yet for this purpose.
Here, we use an alternative approach to make progress with data that is readily available -by combining GWAS for SZ and educational attainment (EA). The GWAS sample sizes for EA are the largest for any cognition-related phenotype to date. Furthermore, previous studies suggest a complex relationship between EA and SZ 8,9 that may be used to gain additional insights into the genetic architecture of SZ and its symptoms. In particular, phenotypic data seem to suggest a negative correlation between EA and SZ. 10 For example, SZ patients with lower EA typically show an earlier age of disease onset, higher levels of psychotic symptomatology, and worsened global cognitive function. 10 In fact, EA has been suggested to be a measure of premorbid function and predictor of outcomes in SZ. Moreover, it has been forcefully argued that retarded intellectual development, global cognitive impairment during childhood, and bad school performance should be seen as core features of SZ that precede the development of psychotic symptoms and differentiate SZ from bipolar disorder (BIP). [11][12][13][14][15] Furthermore, credible genetic links between SZ and impaired cognitive performance have been found. 16 In contrast to these findings, recent studies using large-scale GWAS results identified a small, but positive genetic correlation between EA and SZ ( , = 0.08) 8 and higher values of the PGS for SZ have been reported to be associated with creativity and greater educational attainment. 17 Other statistically well-powered studies found that high intelligence (IQ) has protective effects against SZ 18 and reported a negative genetic correlation between IQ and SZ ( , = −0.2), 19 suggesting the possibility that genetic effects which contribute to EA but not via IQ are responsible for the observed positive genetic correlation between SZ and EA. Indeed, the latest GWAS on EA 8 already indicated that the genetic influence on higher schooling is not only mediated by IQ, but also by other factors such as behavioral inhibition and openness, which may be independently related to SZ and some of its symptoms. [20][21][22] We propose that genetic heterogeneity in SZ is a possible reason for these partially contradictory results from previous studies. To explore this hypothesis and to discern it from alternative explanations, we performed a series of statistical genetic analyses using largescale GWAS results for SZ and EA from non-overlapping samples. We find a nonmonotonous genetic dependence between both traits that can be leveraged (i) to identify novel candidate loci for SZ and (ii) to improve the genetic prediction of SZ symptoms and disease severity. We annotate possible biological pathways, tissues, and cell types implied by genetic variants that are associated with both traits. Furthermore, we explore whether the genetic dependence between EA and SZ is due to pleiotropic effects of genes on both traits, linkage disequilibrium (LD) among SNPs, or assortative mating. In addition, we elucidate the relationship of EA and SZ with BIP and IQ. Finally, we test if our results are driven by individuals who were diagnosed with schizoaffective disorder (SD).

All details are described in the Supplementary Information.
For our study, it is important to differentiate between genetic dependence and genetic correlation. In our context, genetic dependence means that the genetic variants associated with EA are more likely to be also associated with SZ than expected by chance. In contrast, genetic correlation is defined by the correlation of the (true) effect sizes of genetic variants on the two traits. Thus, genetic correlation implies a linear genetic relationship between two traits. Note that two traits can be genetically dependent even if they are not genetically correlated and vice versa (Supplementary Section 1). One possible cause of a non-linear genetic dependence is that at least one of the traits is genetically heterogeneous in the sense that it summarizes across endophenotypes (or symptoms) with non-identical genetic architectures.

GWAS
GWAS analyses on EA (n = 363,502) 8 and SZ (34,409 cases and 45,670 controls) 5 were carried out in non-overlapping samples of Europeans. The GRAS (Göttingen Research Association for Schizophrenia) replication sample 23 was not part of either GWAS (Supplementary Section 2-3).

Proxy-phenotype method (PPM)
We use the proxy-phenotype method (PPM) 24 to test if EA-associated loci are more likely to be associated with SZ than expected by chance. If the two traits are genetically dependent, PPM can increase the statistical power to identify loci associated with SZ because it reduces the multiple testing burden by using relevant information from the independent EA sample. 8,9,24 Our PPM analyses followed a pre-registered analysis plan (https://osf.io/dnhfk/). Analyses were carried out using 8,240,280 autosomal SNPs that passed quality controls in both GWAS and additional filters described in the Supplementary Section 4. We selected approximately independent lead-SNPs from the EA GWAS that passed the pre-defined significance threshold of < 10 −5 and looked up their SZ results. To test if EA-associated SNPs are more strongly associated with SZ than expected by chance, we conducted a Mann-Whitney test that compares the PSZ-values of the EA-associated lead SNPs for SZ with the PSZ -values of a set of randomly drawn, LD-independent SNPs with similar minor allele frequencies (Supplementary Section 5-6).

Distinguishing pleiotropy from LD
For each of the SNPs isolated by our PPM analysis, we looked at their neighboring SNPs within a +/-500kb window and estimated their posterior probability of being causal for EA or SZ using PAINTOR. 25 We then selected two sets of SNPs, each of which contains the smallest number of SNPs which yields a cumulative posterior probability of 90% of containing the causal locus for EA and SZ, respectively. For each of these sets, we calculate the posterior probability that it contains the causal locus for the other trait. We classify the probability of a locus being pleiotropic as low (0-15%), medium (15-45%), or high (>45%) (Supplementary Section 7).

Biological annotations
To gain insights into possible biological pathways that are indicated by the PPM results, we applied DEPICT 8,26 using a false discovery rate (FDR) threshold of ≤ 0.05. To identify independent biological groupings, we used the affinity propagation method based on the Pearson distance matrix for clustering 27 (Supplementary Section 8).

LD-aware enrichment of PPM results across different traits
We developed an association enrichment test that corrects for the LD score of each SNP (Supplementary Section 9). We applied this test to the SNPs isolated by the PPM analyses. LD scores were obtained from the HapMap 3 European reference panel. We investigated SZ and 21 additional traits for which GWAS results were available in the public domain. Some of the traits were chosen because they are phenotypically related to SZ (e.g., BIP), while others were less obviously related to SZ (e.g., age at menarche) or served as negative controls (e.g., fasting insulin).

Replication of PPM results
Our replication sample, the GRAS data collection, is described in Supplementary Section 10. Following our pre-registered analysis plan (https://osf.io/dnhfk/), our replication uses a PGS that is based on the 132 independent EA lead-SNPs that are also nominally associated with SZ ( < 10 −5 and < 0.05, Supplementary Section 11).

Polygenic prediction of schizophrenia symptoms in the GRAS sample
Our most direct test for genetic heterogeneity in SZ is based on PGS analyses that we carried out in our replication sample, the GRAS data collection, which contains exceptionally detailed measures of SZ symptoms. 4,7,23 We predicted years of education, age at prodrome, age at disease onset, premorbid IQ (approximated by a multiple-choice vocabulary test), global assessment of functioning (GAF), the clinical global impression of severity (CGI-S), as well as positive and negative symptoms (PANSS positive and negative, respectively) among SZ patients (N ranges from 903 to 1,039, see Supplementary Sections 10 and 12 Furthermore, we also randomly split the SZ_all score 100 times into two equally large halves and tested if that improves prediction.

Distinguishing between schizophrenia and bipolar disorder
We further elucidate the genetic relationship between SZ and related traits (EA, BIP, childhood IQ, and neuroticism) by using Genome-Wide Inferred Statistics (GWIS) 28 to obtain GWAS regression coefficients and standard errors for SZ GWAS that are "purged" of their genetic correlation with BIP and vice versa (yielding "unique" SZ(min BIP) and "unique" BIP(min SZ) results, respectively). We computed genetic correlations of these GWIS results with EA, childhood IQ, and neuroticism using bivariate LD score regression 29 and compared the results to those obtained using ordinary SZ and BIP GWAS results (Supplementary Section 13).

Simulations of assortative mating
We conducted simulations to test if strong assortative mating on EA and SZ can induce a spurious genetic dependence between the two traits (Supplementary Section 14). Figure 1 presents an overview of the proxy-phenotype analyses. The first-stage GWAS on EA (Supplementary Section 2) identified 506 loci that passed our predefined threshold of P EA < 10 −5 (https://osf.io/dnhfk/); 108 of them were genome-wide significant (P EA < 5 × 10 −8 , see Supplementary Table 2). Of the 506 EA lead-SNPs, 132 are associated with SZ at nominal significance (P SZ < 0.05), and 21 of these survive Bonferroni correction (P SZ < 0.05 506 = 9.88 × 10 −5 ) ( Table 1). LD score regression results suggest that the vast majority of the association signal in both the EA 8 and the SZ 5 GWAS are truly genetic signals, rather than spurious signals originating from uncontrolled population stratification. Figure 2a shows a Manhattan plot for the GWAS on EA highlighting SNPs that were also significantly associated with SZ (red crosses for P SZ < 0.05, green crosses for P SZ = 9.88 × 10 −5 ).

Proxy-phenotype analyses
A Q-Q plot of the 506 EA lead SNPs for SZ is shown in Figure 2b. Although the observed sign concordance of 52% is not significantly different from a random pattern ( = 0.40), we find 3.23 times more SNPs in this set of 506 SNPs that are nominally significant for SZ than expected given the distribution of the P values in the SZ GWAS results (raw enrichment = 6.87 × 10 −10 , Supplementary Section 6). The observed enrichment of the 21 EA lead SNPs that pass Bonferroni correction for SZ ( The effect sizes of these 21 SNPs on SZ are small, ranging from Odds = 1.02 (rs4500960) to Odds = 1.11 (rs4378243) after winner's curse correction (Table 1). However, Bayesian calculations with reasonable prior beliefs (e.g., 1% or 5%, Supplementary Section 6) suggest that most of these 21 SNPs are likely or virtually certain to be truly associated with SZ.

Prediction of future genome-wide significant loci for schizophrenia
Of the 21 variants we identified, 12 are in LD with loci previously reported by the PGC 5 and 2 are in the major histocompatibility complex (MHC) region on chromosome (chr) 6 and were therefore not separately reported in that study. Three of the variants we isolated (rs7610856, rs143283559, rs28360516) were independently found in a recent meta-analysis of the PGC results 5 with another large-scale sample. 30 We show in the Supplementary Section 6 that using EA as a proxy-phenotype for SZ helped to predict the novel genomewide significant findings reported in that study, which illustrates the power of the proxyphenotype approach. Furthermore, two of the 21 variants (rs756912, rs7593947) are in LD with loci recently reported in a study that also compared GWAS findings from EA and SZ using smaller samples and a less conservative statistical approach. 31 The remaining 2 SNPs we identified (rs7336518 on chr13 and rs7522116 on chr1) add to the list of empirically plausible candidate loci for SZ.

Detection of shared causal loci
For eight of the 21 loci identified by the PPM analysis the credible set of causal loci for EA and SZ had a medium or high credibility to have direct causal effects on both EA and SZ (including rs7336518). Five of these loci have concordant effects on the two traits (i.e ++ or --) while three have a discordant effects (i.e +-or -+, Supplementary Section 7).

Biological annotations
Biological annotation of the 132 SNPs that are jointly associated with EA and SZ using DEPICT identified 111 significant reconstituted gene sets (Supplementary Table 4

.1).
Pruning these resulted in 19 representative gene sets including dendrites, axon guidance, transmission across chemical synapses, and abnormal cerebral cortex morphology (Supplementary Table 4 Supplementary Figure 3.a). All significantly enriched tissues are related to the nervous system and sense organs (Supplementary Figure 3.b). Table 4.3). DEPICT prioritized genes that are known to be involved in neurogenesis and synapse formation (Supplementary Table 4.4). Some of the genes, including SEMA6D and CSPG5, have been suggested to play a potential role in SZ. 32,33 For the two novel candidate SNPs reported in this study (rs7522116 and rs7336518), DEPICT points to the FOXO6 (Forkhead Box O6) and the SLITRK1 (SLIT and NTRK Like Family Member 1) genes, respectively. FOXO6 is predominantly expressed in the hippocampus and has been suggested to be involved in memory consolidation, emotion and synaptic function. 34,35 Similarly, SLITRK1 is also highly expressed in the brain, 36 particularly localized to excitatory synapses and promoting their development, 37 and it has previously been suggested to be a candidate gene for neuropsychiatric disorders. 38

LD-aware enrichment across different traits
Supplementary Figure 4 and Supplementary Table 5.1 show the LD-aware enrichment of the SNPs that are jointly associated with EA and SZ across 22 traits. We find significant joint LD-aware enrichment for SZ, further confirming that the PPM results are not entirely driven by LD. We also find LD-aware enrichment for BIP, neuroticism, childhood IQ, inflammatory bowel disease, and age at menarche. However, we find no LD-aware enrichment for other brain-traits that are phenotypically related to SZ, such as depressive symptoms, subjective well-being, autism, and attention deficit hyperactivity disorder. We also do not find LDaware enrichment for most traits that are less obviously related to the brain and our negative controls. Furthermore, one of the novel SNPs we isolated shows significant LD-aware enrichment both for SZ and for BIP (rs7522116). The results suggest that the loci identified by the PPM are not simply related to all (brain) traits. Instead, they show some degree of phenotype-specificity.

Replication in the GRAS sample
A PGS based on the 132 loci jointly associated with both EA and SZ (SZ_132) adds ∆ 2 = 7.54% − 7.01% = 0.53% predictive accuracy for the SZ case-control status to a PGS (SZ_all) derived from the GWAS on SZ alone ( = 1.7 × 10 −4 , Supplementary Table  7.2.a, Model 3). The SZ_132 score also significantly adds ( = 3.4 × 10 −4 ) to the predictive accuracy of the SZ case-control status when all other scores we constructed are included as control variables (Supplementary Table 7

Prediction of schizophrenia measures in the GRAS sample of patients
Phenotypic correlations in the GRAS sample show that higher education is associated with later age at prodrome, later onset of disease, and less severe disease symptoms among SZ patients (Supplementary Section 12, Supplementary Table 8.1 and Supplementary  Figure 5). The EA_all PGS is associated with years of education ( = 1.0 × 10 −6 ) and premorbid IQ ( = 2.7 × 10 −4 ) among SZ patients (Supplementary Section 12 and Table  2). Consistent with earlier results, 4 we find that none of the SZ measures can be predicted by the PGS for SZ (SZ_all, Table 2). Yet, splitting the PGS for SZ based on the signconcordance of SNPs with EA (Concordant and Discordant) increases predictive accuracy significantly for severity of disease (GAF (pF = 0.023)) and symptoms (PANSS negative (pF = 0.007)) ( Table 2). This increase in predictive accuracy is evidence for genetic heterogeneity in SZ (Supplementary Section 1). Specifically, our results indicate that those with a high genetic propensity for EA have better assessments of global functioning (GAF) and less severe negative symptoms (PANSS negative). However, if the high genetic predisposition for EA is primarily due to loci that also increase the risk for SZ (i.e. high values on the Concordant score), this protective effect is attenuated. The highest gain in predictive accuracy for SZ symptoms from splitting the SZ_all by sign concordance with EA is currently observed for PANSS negative (ΔR 2 = 0.63%). We repeated these analyses excluding patients who were diagnosed with schizoaffective disorder (SD) and found similar results, implying that our findings are not only due to the presence of patients with SD (Supplementary Section 12, Supplementary Table 8.4.a). Randomly splitting the SZ_all score does not yield any gains in predictive accuracy (Supplementary Section 12 and Supplementary Table 8.5).

Simulations of assortative mating
Our simulation results suggest it is unlikely that assortative mating is a major cause for the genetic dependence we observe between EA and SZ (Supplementary Figure 7).

DISCUSSION
We demonstrate strong genetic dependence between EA and SZ, but without a clear pattern of sign concordance. Our results cannot be explained by chance, linkage disequilibrium, or assortative mating. Using EA as a proxy-phenotype, we isolated 21 genetic loci for SZ and two novel candidate genes, FOXO6 and SLITRK1. Eight of the 21 loci seem to have true pleiotropic effects on EA and SZ. Furthermore, we showed that EA GWAS results help to predict future GWAS findings for SZ, despite the low genetic correlation between both traits. Moreover, splitting the PGS for SZ into two scores based on the sign concordance of SNPs with EA enables the prediction of disease symptoms and severity from genetic data for the first time to some extent. This result is not driven by patients with SD and it cannot be repeated by randomly splitting the SZ score. Finally, we showed that the small positive genetic correlation of EA and SZ seems to be driven by loci that also increase the risk for BIP, whereas the genetic correlation between EA and "unique" SZ(min BIP) is actually negative. This is consistent with Kraeplin's original description of dementia praecox and with more recent accounts which suggest that the main difference between SZ and BIP is that SZ is a neurodevelopmental disorder, whereas BIP is not. [11][12][13][14][15] Our results are most consistent with the idea that EA 8 and SZ are both genetically heterogeneous traits. Specifically, our results raise the possibility that the clinical diagnosis of SZ aggregates over at least two subtypes with non-identical symptoms and genetic architectures: One part resembles BIP and high IQ (possibly associated with Concordant SNPs and personality traits such as openness or behavioral inhibition), while the other part is a cognitive disorder that is independent of BIP (possibly influenced by Discordant SNPs). Furthermore, our results point to possible side-effects of pharmacogenetic interventions that may aim to target pleiotropic genes.    Notes: Educational attainment (EA) and schizophrenia (SZ) GWAS results are based on the analyses reported in ref. 5,8 . All cohorts that were part of the SZ GWAS were excluded from the meta-analysis on EA. The GRAS data collection was not included in either the SZ or the EA meta-analysis. Proxy-phenotype analyses were conducted using 8,240,280 autosomal SNPs that passed quality control. Genetic outliers of non-European decent (N = 13 cases) were excluded from the analysis in the GRAS data collection.  Notes: The heatmap displays the genetic correlations across 7 sets of GWAS or GWIS summary statistics. Genetic correlations were estimated with LD score regression. 29 The color scale represents the genetic correlations ranging from -1 (red) to 1 (blue). Asterisk denotes that the genetic correlation is significant at P value < 0.01.