Introduction

Schizophrenia (SCZ) is a chronic psychiatric disorder, with an estimated lifetime prevalence of ~1%1,2,3. This disorder exerts a substantial impact on the individuals it affects, their families, and the larger societal context4. The associated symptoms, including hallucinations and delusions, can substantially disrupt daily routines, impeding one’s capacity to engage in work, education, and maintain interpersonal relationships, thereby resulting in noteworthy social and economic consequences5. Additionally, SCZ demonstrates a high heritability rate in twin studies, estimated at approximately 80%6,7, emphasizing the significant role of genetic factors in this disorder.

Cognitive dysfunction stands as a central and persistent facet of SCZ8,9,10,11,12,13,14, often encompassing memory deficits, thought disorganization, impaired attention, and difficulties in problem-solving15,16. There is evidence suggesting that cognitive impairments might emerge even before an official diagnosis of the condition17,18,19,20. Cognitive impairment is a prevalent feature among the majority of SCZ individuals, although its severity can differ from one person to another. In contrast to other aspects of SCZ, cognitive deficits often exhibit relative stability within the same patient over time, with their severity typically mirroring changes in the patient’s clinical condition21. Nevertheless, the fundamental pathophysiology of cognitive impairments in schizophrenia remains largely unknown, highlighting the crucial need to develop a deeper understanding of the underlying mechanisms causing these cognitive deficits.

The Psychiatric Genomics Consortium (PGC, http://www.med.unc.edu/pgc) conducted a genome-wide association study (GWAS) involving 76,755 SCZ patients and 243,649 controls, and found 287 risk loci associated with SCZ22. In parallel, GWAS was performed on a sample of over 250,000 individuals and identified 225 significant genome-wide loci associated with cognitive performance (CP)23. Accumulating evidence suggests genetic overlap between SCZ and cognitive functioning24,25,26. Moreover, a previous study utilizing the conditional/conjunctional false discovery rate (cond/conj FDR) method to reveal 21 shared loci between SCZ and cognitive domains17. These findings, though remarkable, originated from GWAS studies with relatively smaller sample sizes and limited discovery of significant shared loci, emphasizing the need for additional investigation and validation. Furthermore, MiXeR27, a recently proposed method, assesses genetic overlap regardless of the direction of effects and offers a comprehensive estimation of shared polygenic architecture. It also has the capability to calculate polygenicity, discoverability, and heritability of complex phenotypes, thus enhancing our comprehension of cross-trait genetic architectures. However, there have been no studies to date that have employed the MiXeR approach to investigate the genetic correlation between SCZ and CP.

In the current study, the primary objective was to examine the genetic overlap of SCZ and CP. By utilizing the recently published GWAS summary data22,23, the MiXeR approach was initially employed to estimate the genetic overlap between SCZ and CP, extending beyond the measurement of their genetic correlation27. Subsequently, the cond/conjFDR method28 was applied to identify the specific shared genetic loci29.

Methods

GWAS Data

All GWAS datasets used in this study were approved by the ethics committees of original studies, and all individuals provided informed consent prior to their participation. The GWAS summary statistics for SCZ were derived from the PGC, including data from 53,386 cases and 77,258 controls of European ancestry22. The GWAS summary statistics for CP were obtained from the Social Science Genetic Association Consortium (SSGAC) and encompassed 257,841 samples23. For details see Supplementary Methods and Supplementary Table 1.

Statistical analysis of genetic overlap between SCZ and CP

To estimate the genetic overlap between SCZ and CP, MiXeR (https://github.com/precimed/mixer) was used, which employs GWAS summary statistics to quantify polygenic overlap27. MiXeR is capable of estimating the polygenicity, heritability, and discoverability of a single phenotype through univariate analysis30, as well as quantifying the genetic overlap of cross-traits. In the cross-trait analysis, the additive genetic effect was modeled as a mixture of four bivariate Gaussian components relating to the two traits. The performance of the model was evaluated using the negative log-likelihood function, and the best model was determined based on the comparison against those with the smallest and largest polygenic overlaps (lower scores indicating better performance). For more details on MiXeR, please refer to the Supplemental Methods. The enrichment of cross-traits was visualized through conditional Q-Q plots, where enrichment is observed when the proportion of SNPs associated with the primary phenotype increases with the strength of the secondary phenotype association29.

Afterwards, the condFDR/conjFDR method28 was employed to identify the shared genetic loci. By utilizing condFDR, we aimed to increase the discovery rate of genetic variants associated with SCZ and CP, enhancing the identification of genetic loci for major phenotypes by utilizing the association of both major and minor phenotypes29. Subsequently, conjFDR analysis was performed to identify genomic loci jointly associated with both traits, with conjFDR value was defined as the maximum of the two mutual condFDR values. In this study, SNPs below the thresholds of condFDR < 0.01 and conjFDR < 0.05 were considered significant. To minimize the inflation effect resulting from linkage disequilibrium (LD) dependency among SNPs, we performed 500 iterations of random pruning in the condFDR/conjFDR analysis. During each iteration, a single random SNP representative was preserved for each LD block, and the results from all iterations were subsequently averaged.

Notably, before performing MiXeR and condFDR/conjFDR analyses, genomic regions exhibiting LD were excluded31. Variants located in these specific genomic areas, such as the extended major histocompatibility complex (MHC) region (hg19 location Chr6: 25119106-33854733), chromosome 8p23.1 (hg19 location Chr8:7200000-12500000), and the gene MAPT (hg19 location Chr17:40000000-47000000), were removed32.

Genomic loci definition

Functional Mapping and Annotation (FUMA, https://fuma.ctglab.nl/) was employed to identify independent genomic loci33, which serves as an online tool for functionally mapping genetic variants. Initially, independent significant SNPs were identified as SNPs that were independent from each other with r2 < 0.6 and condFDR < 0.01 or conjFDR < 0.05. From this set, we further selected lead SNPs by identifying a subset that showed LD with each other at r2 < 0.1.

Functional annotation

All candidate SNPs from the genomic loci showing condFDR or conjFDR value < 0.1 and r2 ≥ 0.6 with one of the independent significant SNPs were utilized for functional annotation33. SNPs were annotated through various methods, incorporating combined annotation-dependent depletion (CADD) scores for predicting the deleterious effects of SNPs on protein structure and function34, RegulomeDB scores to assess the likelihood of regulatory functionality35, and chromatin states to predict transcriptional and regulatory effects within the vicinity of the SNP locus36,37. Finally, we utilized g: Profiler (https://biit.cs.ut.ee/gprofiler/gost) to evaluate gene-set enrichment for the genes located nearest to the identified shared loci, enhancing our understanding of the genetic mechanisms and potential underlying biological processes in these diseases38.

Validation analyses

To evaluate the robustness and consistency of our MiXeR and cond/conjFDR findings, validation analyses were conducted using an additional CP dataset from Savage JE et al.39. Comprising 269,867 individuals from 14 cohorts, this dataset has been widely utilized in analogous multivariate genomic analyses40,41, offering an opportunity to validate our results derived from the SSGAC study23. Prior to initiating the validation analyses, LD was calculated between the lead SNPs of genome-significant loci reported in these original studies23,39. Subsequently, MiXeR and condFDR analyses were performed on the Savage et al. dataset using the same procedures as mentioned above. Finally, the estimates of genetic overlap between SCZ and CP obtained from MiXeR analysis were compared, and LD between the lead SNPs in the loci identified through cond/conjFDR analysis in both datasets was also calculated. In the validation analysis, LD with an r2 > 0.9 was considered indicative of replicability.

Results

Genetic overlap between SCZ and CP

Univariate MiXeR analysis identified lower polygenicity (SCZ = 9572 SNPs, standard error = 255; CP = 10,874 SNPs, standard error = 297), higher heritability (SCZ = 0.39, standard error = 0.005; CP = 0.20, standard error = 0.002), and higher discoverability (SCZ = 6.28e-05, standard error = 1.47e-06; CP = 2.77e-05, standard error = 6.51e-07) for SCZ than CP (Fig. 1A). In bivariate MiXeR analysis, a genetic correlation of −0.25 was calculated between SCZ and CP (Fig. 1B), and a substantial genetic overlap was found, with a Dice coefficient of 92.8%. Specifically, there are about 9.5 K SNPs shared between SCZ and CP, representing 98.9% of the SNPs influencing SCZ and 87.1% of the SNPs influencing CP. Furthermore, there are 0.1 K SNPs (1.1%) uniquely associated with SCZ and 1.4 K SNPs (12.9%) only associated with CP. The substantial difference between the number of shared SNPs and those specific to each trait highlights a significant genetic overlap between SCZ and CP.

Fig. 1: Shared and unique polygenic variants of SCZ and CP.
figure 1

A The MiXeR-estimated heritability, polygenicity, and discoverability for SCZ and CP. B Polygenic overlap between SCZ (blue) and CP (orange). The numbers in the Venn diagram indicate the estimated number of shared and specific genetic variants (in thousands), the numbers in parentheses are standard errors, and rg corresponds to genetic correlation between two traits. SCZ schizophrenia, CP cognitive performance.

The conditional Q-Q plots revealed an enrichment of SCZ and CP, indicating a polygenic overlap between them (Fig. 2A, B). In these plots, the blue line represents the association p-values of all SNPs in the primary trait, without considering their association with the conditional trait. The red line represents the association p-values of SNPs in the main trait that meet the condition of having the conditional trait association p-value less than 0.1. The yellow and purple lines represent the conditional trait association p-values of 0.01 and 0.001, respectively. As the threshold for the p-values of the conditional trait becomes stricter, the curve in the Q-Q plot continuously shifts towards the left, indicating the pleiotropic enrichment of SNP associations between SCZ and CP.

Fig. 2: Polygenic overlapping effects of SCZ and CP.
figure 2

A Conditional Q-Q plots of nominal versus empirical -log10 p-values in SCZ conditional on CP and (B) vice versa, at the level of p < 0.100, p < 0.010 and p < 0.001, respectively. The blue line includes all SNPs and dashed lines indicate the null hypothesis. SCZ schizophrenia, CP cognitive performance.

Shared loci between SCZ and CP

The condFDR analysis was employed to enhance the detection of genetic variants associated with SCZ and CP. At a condFDR < 0.01 threshold, we identified 313 loci associated with SCZ conditional on CP (Supplementary Table 2). Similarly, we found 332 loci associated with CP conditional on SCZ (Supplementary Table 3). Additionally, at a conjFDR < 0.05 threshold, we observed 236 gene loci that exhibited joint association with both SCZ and CP (Supplementary Table 4 and Fig. 3). Among these shared loci, 170 were not identified in the original GWAS for SCZ22, and 177 were not found in the original GWAS for CP23, with 139 newly identified shared loci for both SCZ and CP (Supplementary Table 4). The genomic distributions resulting from the condFDR/conjFDR analyses for SCZ and CP are shown in Fig. 4A. In the comparison of effect directions for lead SNPs at shared loci, we found that 60 (25.4%) lead SNPs exhibit consistent effect directions in SCZ and CP, while 176 (74.6%) lead SNPs show opposite effect directions (Supplementary Table 4).

Fig. 3: Shared genetic variants of SCZ and CP.
figure 3

The Manhattan plot shows the -log10 transformed conjFDR values for each SNP along the Y-axis and chromosomal positions along the X-axis. The lilac horizontal line represents the threshold for significant shared associations (conjFDR < 0.05, i.e., -log10(conjFDR) > 1.3).

Fig. 4: Distribution of loci and functional annotations of all candidate SNPs for SCZ and CP by condFDR/conjFDR analysis.
figure 4

A Distribution of 313 loci for SCZ, 332 loci for CP, and 236 loci shared between the two, spans across the genomic landscape of chromosomes. Distribution of (B) functional consequences, (C) RegulomeDB scores, and (D) minimum chromatin state of all candidate SNPs in loci for SCZ, loci for CP and loci shared between the two. The blue color represents SCZ trait conditioned on CP at condFDR < 0.01. The green color represents CP trait conditioned on SCZ at condFDR < 0.01. The red color represents shared associations between SCZ and CP at conjFDR < 0.05. SCZ schizophrenia, CP cognitive performance.

Functional annotation

Functional annotation of all candidate SNPs identified in the condFDR/conjFDR analysis was presented in Fig. 4B–D. The annotation of the candidate SNPs (n = 27,811) within the loci associated with SCZ conditional on CP revealed that the majority were situated in intronic regions (51.9%) and intergenic regions (29%), with only 1.3% located in exonic regions (Supplementary Table 5). Similarly, the annotation of the candidate SNPs (n = 29,413) within the loci associated with CP conditional on SCZ demonstrated that the majority were located in intronic regions (53.5%) and intergenic regions (30%), with only 1.2% found in exonic regions (Supplementary Table 6). When considering all candidate SNPs (n = 13,736) within the shared loci between SCZ and CP, the functional annotation indicated that the majority of these loci were situated in intergenic regions (23.3%) and intronic regions (60.5%), with 1.6% located in exonic regions (Supplementary Table 7).

Among the 236 top lead SNPs in the shared loci, 47.9% were located in intronic, 31.8% in intergenic, and 2.1% in exonic regions (Supplementary Table 4). Moreover, 17 of the 236 top lead SNPs had CADD scores above 12.37, suggesting a high level of deleteriousness. Five of these 17 SNPs (rs1506536, rs9878063, rs7708343, rs3789023, and rs2916142) were located in intronic regions within MRPL33: BRE, CELSR3, FBXL17, MAPK8IP1, and PITPNC1, respectively. Three of the 236 top lead SNPs (rs10774563, rs4275659, and rs5751191) had RegulomeDB scores of 1f, 1d, and 1f, respectively, indicating their potential influence on transcription factor binding; these SNPs annotated to RP11-728G15.1, ABCB9 and SEPT3 as nearest genes, respectively. The distribution of minimum chromatin state showed that 208 of the top lead SNPs were located in open chromatin states regions (states 1-7), indicating accessibility and openness of specific genome regions in cells. Furthermore, the gene-set enrichment analysis revealed that the shared loci between SCZ and CP were involved in biological processes such as nervous system development, multicellular organism development, and generation of neurons (Supplementary Table 8).

Validation analyses

Validation analyses were conducted to assess the robustness and reliability of our findings. This involved comparing the MiXeR and cond/conjFDR results obtained from the SSGAC dataset23 used in our study with those from the dataset by Savage JE et al.39. The original GWAS findings from the SSGAC dataset indicated the presence of 225 distinct loci, while Savage JE et al. identified 205 loci in their corresponding study23,39. Subsequent LD analysis demonstrated that 80 loci were replicable in these two GWAS results, with LD r2 > 0.9.

In the Savage JE et al. dataset, the univariate MiXeR analysis showed that CP had 10,366 SNPs for polygenicity (standard error = 335), with a heritability of 0.18 (standard error = 0.002) and a discoverability of 2.64e-05 (standard error = 6.82e-07). The bivariate MiXeR analysis uncovered a genetic correlation of −0.24 and a Dice coefficient of 94.8%, with a polygenic overlap of 9.5 K observed between SCZ and CP (Supplementary Fig. 3). The consistency in polygenic overlap estimates (all 9.5 K SNPs for SSGAC and Savage JE et al.) and genetic correlation values (−0.25 for SSGAC, −0.24 for Savage JE et al.) highlights the robustness and reliability of our findings across different datasets, providing strong evidence for a shared genetic architecture between SCZ and CP.

In the validation analysis with condFDR < 0.01, 312 loci were identified for SCZ (Supplementary Table 9), and 314 loci were identified for CP (Supplementary Table 10). Among them, a total of 257 loci were replicable for SCZ (with 220 lead SNPs identical to those previously detected), and 146 loci were replicable for CP (with 79 lead SNPs identical to those previously detected) based on LD analysis (r2 > 0.9). Under conjFDR < 0.05, 238 shared loci were identified between SCZ and CP (Supplementary Table 11), and 128 loci were replicable (with 86 lead SNPs identical to those previously detected), with LD r2 > 0.9. These results demonstrate the moderate reproducibility of our findings.

Discussion

In this study, we employed advanced analytical techniques to analyze large-scale GWAS datasets for SCZ and CP, aiming to comprehensively explore the shared genetic foundation between these traits. Specifically, utilizing MiXeR, we identified a substantial genetic overlap between these two traits. In addition, we uncovered 236 shared loci between SCZ and CP using conjFDR approach, including 139 newly identified loci. Functional annotation revealed their involvement in crucial biological processes, such as nervous system development and neuron generation. These findings shed light on the complex interplay between genetic factors underpinning SCZ and CP, advancing our understanding of the genetic mechanisms that shape these conditions.

The MiXeR method was utilized to identify mixed genetic effects and estimate the total number of causal variants. The results indicated a significant genetic overlap between SCZ and CP, with 9.5 K shared SNPs representing 98.9% of SCZ-influencing SNPs and 87.1% of CP-influencing SNPs. We also utilized the conjFDR approach to investigate the shared loci between SCZ and CP, revealing 236 shared loci. The concurrent use of these approaches allowed for a more precise and comprehensive assessment of genetic overlap among traits. However, the shared loci exhibit mixed effects, where opposing influences counteract each other (Supplementary Table 4), resulting in a modest genetic correlation between the two phenotypes. For the 74.6% of these 236 shared loci, we observed opposite genetic effects between SCZ and CP at these loci, suggesting that these variants may be contributing factors to the cognitive decline observed in individuals with SCZ. This mixed effect direction observed in the shared genetic loci of SCZ with CP further emphasizes the complexity of the genetic factors in influence.

Functional annotation demonstrates that the shared genetic variants between SCZ and CP are predominantly located in intronic and intergenic regions. Gene-set enrichment analysis implicates biological processes crucial for brain development, offering potential targets for therapeutic interventions aimed at preserving or enhancing cognitive function in individuals with schizophrenia. Interestingly, some of the genes identified in this study, such as SP4, STAG1, and FAM120A, had been previously reported to be associated with SCZ in other studies42,43. For example, the protein encoded by the SP4 gene, a transcription factor capable of binding to the GC promoter regions of various genes, including those related to the photoreceptor signal transduction system, is linked to both schizophrenia44 and bipolar affective disorder45. Similarly, a search in the GWAS catalog confirmed that the genetic risk loci reported in this study had also been validated in previous research. For instance, rs4275659, identified as a shared SNP for SCZ and CP in this study, was found to jointly impact both phenotypes, consistent with reports on its effects on CP, education, and SCZ46. These findings emphasize the significance of the identified genetic factors and their potential role in contributing to the complex etiology of SCZ and related traits.

Several limitations should be taken into account when interpreting our findings. First, the availability of GWAS summary statistic data was confined to populations of European ancestry, potentially constraining the applicability of our findings to other populations. Subsequent investigations involving diverse ethnic backgrounds could provide insights into the broader genetic landscape of SCZ and CP. Second, our study primarily focused on genetic overlap, and the clinical implications or causal relationships between the identified genetic factors and observed phenotypes were not extensively explored. Future studies should consider incorporating additional clinical and environmental factors for a more comprehensive understanding. Finally, it is crucial to acknowledge that the observed moderate reproducibility in our validation analyses is a limitation rooted in the moderate replicability of the original CP GWAS datasets. To overcome this limitation, future research should prioritize the inclusion of larger and more diverse cognitive GWAS datasets.

In conclusion, this study highlights a significant genetic overlap between SCZ and CP, with ~9.5 K genetic variants influencing both SCZ and CP. Moreover, 236 shared genetic loci between these two traits were identified. These loci, predominantly located in intergenic and intronic regions, are associated with critical biological processes such as nervous system development and neuron generation. Overall, this study enriches our understanding of the complex relationship between SCZ and CP, offering valuable insights into potential molecular mechanisms, and providing a solid foundation for advancing research on schizophrenia treatment, prevention, and novel therapeutic interventions.