Introduction

Schizophrenia is a severe mental illness and a major risk factor for suicide, especially in the early stages of the disease. Suicide is among the leading causes of death among schizophrenia patients, with approximately 50% attempting suicide and 10% dying by suicide1,2. Although suicide is a consequence of complex psycho-social factors, the predisposition to suicidality is at least partly explained by genetics. Heritability estimates from twin studies range from 30 to 55% for suicidal thoughts and behaviours3. In the context of psychiatric disorders, shared heritability between schizophrenia and suicide attempt/ideation has been demonstrated via polygenic risk scores and genetic correlation analyses, revealing a strong positive association4,5. Two recent large-scale genome-wide association studies (GWAS) in the International Suicide Genetics Consortium (ISGC) and Million Veteran Program (MVP) have demonstrated promising and replicable results. In a psychiatric cohort, the ISGC (total N = 549,743; 29,782 cases) detected two genome-wide significant loci associated with suicide attempts: one in the major histocompatibility complex (chromosome 6, index SNP rs71557378, p = 1.97 × 10⁻⁸) and an intergenic locus on chromosome 7 (index SNP rs62474683, p = 1.91 × 10⁻¹⁰), with the latter locus independently replicated by the MVP study6,7. Building on these two studies, the largest and most recent meta-analysis to date (total N = 958,896; 43,871 suicide attempt/death cases) identified 12 genome-wide significant loci, notably a locus in the major histocompatibility complex (MHC)8.

In addition to the newly reported association with suicide risk, the MHC region has long been implicated in schizophrenia GWAS9,10,11,12. The most recent and largest schizophrenia GWAS (N = 74,776 cases and 101,023 controls) identified 287 genetic regions associated with schizophrenia, with the MHC region showing by far the strongest association12. The MHC region, spanning four million base pairs on chromosome 6, is best known for its role in immunity, with genes encoding human leukocyte antigens (HLA) and many others participating in immune functioning (e.g., complement genes)13. The hundreds of genes, complex linkage disequilibrium, and high variability in the MHC region have made it particularly challenging to find a functional allele that explains the schizophrenia-MHC association. However, this association has been partly explained by a landmark study on complex variations of the complement component 4 (C4) gene, which codes for an activator of the complement system14.

The human C4 gene, located in the MHC class III region (chr 6: 31.98–32.04 Mb), has a sophisticated genetic architecture and is present as two functionally distinct isotypes, C4A and C4B15,16. Each functional C4 isotype (C4A, C4B) can be further divided into long (C4L) or short (C4S) genomic forms17, depending on the presence or absence of a human endogenous retroviral type K (HERV-K) insertion in intron 9 of the C4 gene18, resulting in four possible combinations (C4AL, C4BL, C4AS, C4BS) (Fig. 1). In addition, individuals can carry varying copy numbers of the C4A and C4B genes, giving rise to at least 22 different haplotypes15. By determining the copy number of C4 structural haplotypes, Sekar et al. found that the four common haplotypes (AL-BL, AL-BS, AL-AL, and BS) are associated with varying degrees of schizophrenia risk16,19. Specifically, higher brain C4A expression is a predictor of increased schizophrenia risk, with the AL-AL haplotype conferring the highest risk (OR = 1.3)14.

Fig. 1: Schematic of the MHC region on chromosome 6 and the C4 gene compound structural forms (created with BioRender.com).

In support of Sekar et al.’s findings, other studies have also shown higher C4A expression in post-mortem brains of schizophrenia patients compared to healthy controls20,21,22. This higher C4A expression has been associated with more severe psychopathology, likely due to excessive complement activity in the brains of schizophrenia patients, which disrupts the normal synaptic pruning process20,23. In line with these results, C4B and C4S deficiency has also been reported in schizophrenia patients21,24,25, and there is an inverse relationship between C4A and C4B copy number variations, and similarly between C4L and C4S19. However, it should be noted that results on C4B deficiency have been mixed across different ancestries. These findings raise the question of whether C4B/C4S might have a protective effect against schizophrenia risk, whereas C4A/C4L might have deleterious effects.

Overall, fine-mapping studies suggest that the C4 gene is a critical driver of the association between the MHC region and schizophrenia. Given the recent association of the MHC region with suicide, we sought to explore the potential role of different C4 variants in suicide risk in our sample of schizophrenia patients. In this study, we explored the relationship between C4 genetic variants and predicted C4 brain expression with suicide risk (suicide attempt/ideation) in our Toronto schizophrenia cohort. Moreover, we examined the sex-specific effects of C4 variants on suicidality.

Results

The distribution of C4 structural forms (C4A, C4B, C4L, C4S) and compound structural forms (C4AL, C4AS, C4BL, C4BS) is shown in Fig. 2a, b, and the distribution of these forms categorized by suicide attempt is shown in Fig. 3a, b. From the total N = 434 subjects with C4 genotype data, only subjects with complete phenotypic data were included in the analysis.

Fig. 2: Distribution of C4 structural and compound structural forms in our sample (N = 432).

a Distribution of C4 structural forms a) C4A, b) C4B, c) C4L, d) C4S (N = 432). b Distribution of C4 compound structural forms a) C4AL, b) C4BL, c) C4AS, d) C4BS (N = 432).

Fig. 3: Distribution of C4 structural and compound structural forms categorized by suicide attempt.

a Distribution of C4 structural forms a) C4A, b) C4B, c) C4L, d) C4S categorized by suicide attempt (N = 391). b Distribution of C4 compound structural forms a) C4AL, b) C4BL, c) C4AS, d) C4BS categorized by suicide attempt (N = 391).

Association between C4 and suicide attempt and suicidal ideation

Suicide attempts and suicidal ideation were recorded in 391 and 394 subjects, respectively. Figure 4 shows the odds ratios (OR) and their precision based on our logistic regression model for the entire sample and for males and females separately, while Tables 1, 2.1, and 2.2 present the same results in tabular form with adjusted regression p-values; unadjusted results are reported in Supplementary Table 2 (p-values were not corrected for multiple tests). In general, there is little evidence of an association between increasing C4 copy number and suicide attempt or suicidal ideation, except possibly for C4AS, where we see some evidence of a negative association with suicide attempts (OR for total sample = 0.491, Beta = −0.710, 95% CI: [0.217–0.936], p = 0.056, Table 1) and suicidal ideation (OR for total sample = 0.651, Beta = 0.814, 95% CI: [0.355–1.099], p = 0.074, Table 1), characterized by a drop in the prevalence of suicide attempts and suicidal ideation as the copy number of C4AS increases. Fig. 4 also suggests a general tendency towards negative associations across most of the variants.

Fig. 4: Effects (odds ratio) of C4 copy number on suicide attempt.

The odds ratios and confidence interval (CI) based on our logistic regression model for the entire sample (N = 391), then males (in blue, N = 283) and females (in pink, N = 108) separately.

Table 1 Logistic regression analyses between C4 variants and suicide attempt/ideation in males and females with age, sex, and ancestry as covariates
Table 2.1 Sex-stratified logistic regression analysis between C4 variants and suicide attempt/ideation in males with age and sex as covariates

The presented outcome corresponds to the logistic regression model incorporating age, sex, and ancestry (Europeans vs non-Europeans) as covariates. The results between C4AS copy number and suicide attempt retained their marginal significance even after the inclusion of substance abuse and alcohol abuse as additional covariates in the model (Supplementary Table 1). In addition, the unadjusted analysis before controlling for covariates is also reported in Supplementary Table 2. Figure 5 depicts the probability of suicide attempts with increasing copy numbers of C4AS based on our logistic regression model.

Fig. 5: Probability of suicide attempt with increasing C4AS copy number.

The risk of suicide attempts decreases from 49% (C4AS copy number = 0) to 5% (C4AS = 4). The inferential error bars widen with increasing C4AS copy numbers, indicating a higher uncertainty, due to a decreasing sample of cases with higher C4AS copy numbers. The adjusted probabilities were calculated based on our logistic regression model (N = 391) with age, sex, and ancestry as covariates. Supplementary Table 3 includes detailed measurements.
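The drop described in this caption can be roughly reproduced from the per-copy odds ratio alone. The following is a minimal Python sketch, not the study's analysis code: it assumes a baseline probability of 0.49 at zero copies and the total-sample OR of 0.491 per additional C4AS copy, with all covariates held fixed. The function name `predicted_probability` is illustrative.

```python
def predicted_probability(baseline_prob, odds_ratio, copies):
    """Probability after multiplying the baseline odds by odds_ratio once per copy."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    odds = baseline_odds * odds_ratio ** copies
    return odds / (1.0 + odds)


# Assumed inputs: 49% baseline risk (0 copies) and OR = 0.491 per copy.
for copies in range(5):
    p = predicted_probability(0.49, 0.491, copies)
    print(f"C4AS copies = {copies}: predicted probability = {p:.3f}")
```

Under these assumptions, the predicted probability at four copies comes out near 5%, in line with the caption. The adjusted probabilities in Supplementary Table 3 may differ slightly because they come from the full covariate-adjusted model.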

Sensitivity analysis

A closer look at the C4AS copy number distribution among suicide attempters and non-attempters (Fig. 3b) shows that carrying one or more copies of C4AS is infrequent. To limit the potential influence of outliers on the association between C4AS and suicide attempt/ideation, individuals with one or more copies of C4AS were combined into a pooled group. This pooling makes the estimate less sensitive to the few individuals with high copy numbers. In the sensitivity analysis, the association remained marginally significant for suicide attempts (OR = 0.481, Beta = −0.730, CI: [0.203–1.051], p = 0.077) but not for suicidal ideation (OR = 0.655, Beta = −0.422, CI: [0.315–1.352], p = 0.251). This difference may be attributable to some participants attempting suicide without any prior history of suicidal ideation.
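The pooling step can be illustrated with a short Python sketch. It assumes that pooling means recoding copy number as a binary carrier indicator (0 copies vs. one or more copies). The function name and the example values are made up for illustration and are not study data.

```python
def pool_carriers(copy_numbers):
    """Collapse copy numbers into 0 (no copies) or 1 (one or more copies)."""
    return [1 if cn >= 1 else 0 for cn in copy_numbers]


# Made-up C4AS copy numbers for six hypothetical individuals.
example_copy_numbers = [0, 0, 1, 2, 0, 4]
print(pool_carriers(example_copy_numbers))
```

The recoded indicator would then replace the raw copy number as the predictor in the logistic regression, so a single individual with four copies carries no more weight than one with a single copy.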

Sex-stratified analysis

To assess the sex-specific effect of C4 variants on suicide attempt/ideation, we stratified our sample into males (N = 283) and females (N = 108). The copy number of C4B shows some weak evidence of a negative association with suicide attempts in females but not in males (females: OR = 0.551, Beta = −0.596, 95% CI: [0.276–1.029], p = 0.071, Table 2.2; males: OR = 1.064, Beta = 0.062, 95% CI: [0.736–1.540], p = 0.739, Table 2.1). In general, however, there is little evidence of variation across sexes.
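The Beta and OR values reported here are two views of the same estimate, since in logistic regression the odds ratio is the exponential of the coefficient. A minimal Python sketch of that conversion follows, using the sex-stratified C4B values from the text as inputs. The helper name is illustrative.

```python
import math


def odds_ratio_from_beta(beta):
    """Convert a logistic regression coefficient (log-odds) to an odds ratio."""
    return math.exp(beta)


# (Beta, reported OR) pairs taken from the sex-stratified C4B results above.
reported = {
    "C4B, females": (-0.596, 0.551),
    "C4B, males": (0.062, 1.064),
}
for label, (beta, reported_or) in reported.items():
    computed = odds_ratio_from_beta(beta)
    print(f"{label}: exp({beta}) = {computed:.3f} (reported OR = {reported_or})")
```

A negative Beta therefore always corresponds to an OR below 1, and a positive Beta to an OR above 1.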

Discussion

A few previous studies have analyzed copy number variants in relation to suicidality among schizophrenia patients, mostly based on single-nucleotide polymorphism array data26,27,28. However, this study is novel in examining the relationship between C4 copy number variants and suicidality in schizophrenia using a direct-genotyping approach. We found a possible negative association between C4AS and suicide risk. Moreover, sex-stratified analyses revealed a weak negative association for C4B in females but not in males, although in general there was no strong difference between sexes. However, it should be noted that our study was exploratory in nature and had a modest sample size. Therefore, we relied on an estimation framework to observe general trends in the data, characterized by the effect size and confidence intervals.

Multiple lines of evidence suggest that increased C4 expression is associated with schizophrenia susceptibility, possibly due to excessive synaptic pruning14,26,29. Mouse models with C4 overexpression have demonstrated reduced synaptic density in the medial prefrontal cortex, accompanied by abnormalities in glutamatergic cells20,29. These findings align with investigations in human subjects with a propensity for suicidal behaviour, where structural abnormalities in the prefrontal cortex have been observed alongside dysregulation of the glutamatergic neurotransmission system27,28. In addition, a recent in vivo positron emission tomography (PET) study in patients with schizophrenia has revealed aberrant changes in the frontal and anterior cingulate cortices30. These brain regions are known to be intricately involved in emotion regulation and have been implicated in the manifestation of suicidal ideation and behaviour30,31. Together, these results suggest a potential role for C4 in altering synaptic connectivity in the prefrontal cortex of schizophrenia patients, which could distort decision-making processes and potentially contribute to an increased vulnerability to suicidal behaviour.

Considering the previously established link between C4A expression and an increased risk of schizophrenia14, we expected to observe a higher incidence of suicide among people with higher C4A copy numbers. Surprisingly, our analyses did not reveal any correlation between C4AL/predicted C4A expression and suicide attempts or suicidal ideation. Interestingly, we observed a negative association between C4AS copy number and both suicide attempts and suicidal ideation, indicating that a higher C4AS copy number was associated with fewer suicidal events. These results point to a potential protective effect of increased C4AS copies against suicidal events.

One potential explanation for this result is the effect of the HERV insertion on C4 gene expression20. Previous studies have proposed that the HERV insertion may function as an enhancer of gene expression32,33. Consequently, it is plausible that the absence of a HERV insertion in the C4AS variant leads to reduced C4A expression, potentially mitigating neuro-abnormalities and contributing, at least in part, to the observed decrease in suicidal events. Moreover, in line with this potential explanation, it has been documented that C4L copy number has the opposite effect to C4S copy number19. Therefore, given the reduced incidence of suicide attempts and suicidal ideation observed in individuals with more copies of C4AS, it is conceivable that the presence of C4AS may be associated with fewer copies of C4AL, which is a stronger risk factor for schizophrenia14.

It has previously been shown that C4 alleles can have sex-specific effects on disease pathophysiology, specifically in schizophrenia as well as in autoimmune disorders, including systemic lupus erythematosus and Sjögren’s syndrome. In all three illnesses, C4 alleles influence men more strongly than women. At the protein level, C4 and its effector C3 are present at higher levels in the cerebrospinal fluid and plasma of men compared to women, consistent with the more potent effects of C4 in men, which could suggest a possible reason for men’s greater vulnerability to schizophrenia34. Our sex-stratified analyses revealed that increasing copy number of C4B is weakly associated with fewer suicide attempts and suicidal ideation in females but not males. This result may be explained by the potential protective effect of the C4B variant34, which may play a role in lowering suicidal events in females.

Our experimental approach of directly genotyping the complex C4 gene, rather than relying on computational methods, provides a precise copy number for each compound structural variant. However, there were a number of limitations. Firstly, as the compound structural forms may have different configurations, we were unable to determine C4 haplotypes. Also, due to the stringent DNA quality requirement of our experimental workflow, 21% of our samples failed to produce reliable genotyping data and were excluded from analyses. In addition, we could not determine the exact copy numbers for C4AL, C4BL, C4AS, or C4BS in about 1% of our genotyped samples. Secondly, our sample size was modest, raising concerns about potential type I and type II errors and the possibility that our lack of significance with C4A expression might be attributed to insufficient sample size. In addition, due to the retrospective nature of our study, we were unable to extensively characterize factors that may influence suicide risk (e.g., method of suicide attempt, intent, childhood trauma, and any changes in socioeconomic status). Additionally, the ancestry data used as a covariate in our analysis were self-reported, potentially limiting our ability to account for subtle variations within and between different populations. Therefore, further investigations with larger sample sizes are warranted to replicate and validate these findings. It should also be noted that schizophrenia is a highly polygenic disorder, and no single gene can explain a significant proportion of the disease risk by itself. Consequently, to conduct a more comprehensive analysis of all contributing factors to suicide risk in schizophrenia, it is important to consider additional genes related to suicide risk by using methods such as polygenic risk scores.

Overall, our preliminary findings provide encouraging evidence to warrant further exploration of the relationship between the C4 gene and suicidal outcomes, with a specific focus on brain gene expression. Also, it is worth exploring the effect of other immune-related players in the complement pathway (e.g., CSMD1, C1q, C3) on suicide risk in schizophrenia patients.

Methods

Participants

A total of N = 433 subjects (mean age 38.7 ± 11.5 years; 71% males) with either schizophrenia or schizoaffective disorder were recruited as part of an ongoing schizophrenia genetic study at the Centre for Addiction and Mental Health (CAMH). Prior to study enrollment, all participants provided informed consent, and the research protocol was approved by the CAMH research ethics board. Eligible subjects were adults over the age of 18 who had a clinical diagnosis of either schizophrenia or schizoaffective disorder based on the Structured Clinical Interview for DSM-III-R or DSM-IV Axis (SCID)35. Exclusion criteria included head injury with loss of consciousness, seizure disorder, type II diabetes, and inability to read or understand English. The majority of participants were of self-reported European ancestry (sample characteristics detailed in Table 3). Data on lifetime suicide attempts, suicidal ideation, and suicidal plans were retrospectively extracted from a comprehensive search of the mood disorder module of the SCID, referral notes, life charts, family history, and a summary of medical records. The SCID specifically has a section (criterion A9 of major depressive episode) distinguishing suicidal ideation, suicide plan (a specific plan on how to commit suicide), and suicide attempt. A suicide attempt was defined as any deliberate act of self-harm with intent for death35. This was specified as a suicide attempt in the hospital medical chart or recorded during the SCID interview by a well-trained clinician. Suicidal ideation was determined based on Beck’s scale for suicidal ideation36. In addition, data on risk factors for suicide, such as substance use disorder and alcohol use disorder, were extracted. All participants provided saliva or blood samples for DNA extraction.

Table 3 Characteristics of participants used in the analysis

Genetic data collection

Genomic DNA was extracted from whole blood using the high-salt method37. To determine the precise copy number of each C4 compound structural form (C4AL, C4BL, C4AS, and C4BS), we used a three-step approach similar to that of Sekar et al.14. In step 1, TaqMan-based copy number assays for the four structural elements [C4A (Hs07226349_cn), C4B (Hs07226350_cn), C4L (Hs07226352_cn), and C4S (Hs07226351_cn)] were run on the ViiA 7 real-time PCR system (Thermo Fisher Scientific) in quadruplicate with the RNaseP reference assay following the manufacturer’s protocol, and the copy numbers of the C4 structural elements (C4A, C4B, C4L, C4S) were resolved using the CopyCaller software (Thermo Fisher Scientific). In step 2, in individuals with at least one copy of C4S, standard long-range PCR was performed with primers specific to C4S: forward 5′-TCAGCATGTACAGACAGGAATACA-3′ and reverse 5′-GAGTGCCACAGTCTCATCATTG-3′ (TaKaRa, Clontech)38. In step 3, using a custom-designed TaqMan genotyping assay (Thermo Fisher Scientific), we determined the presence of C4A and/or C4B in the C4S long-range PCR product. From this, we determined the copy numbers of C4AS and C4BS, which allowed us to derive the copy numbers of C4AL and C4BL by subtraction from the total copies of C4A, C4B, C4L, and C4S. To assess genotyping quality, the identity [C4A + C4B = C4L + C4S] was used, and samples with mismatched totals were re-genotyped. Samples that failed the second genotyping were excluded from the analysis. Moreover, we calculated the predicted C4A and C4B brain expression using the formula provided in Sekar et al.’s paper14.

$$\begin{array}{c}C4A\,{expression}=(0.47* C4{AL})+(0.47* C4{AS})+(0.20* C4{BL})\\ C4B\,{expression}=(1.03* C4{BL})+(0.88* C4{BS})\end{array}$$
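The quality check, the subtraction step, and the expression formula above can be sketched together in Python. This is a minimal illustration under stated assumptions, not the study's pipeline: it assumes the subtraction means C4AL = C4A − C4AS and C4BL = C4B − C4BS, and the function names and example copy numbers are hypothetical.

```python
def compound_copy_numbers(c4a, c4b, c4l, c4s, c4as, c4bs):
    """Derive C4AL and C4BL, after applying the QC identity C4A + C4B = C4L + C4S."""
    if c4a + c4b != c4l + c4s:
        raise ValueError("QC failed: C4A + C4B does not equal C4L + C4S")
    # Assumed reading of the subtraction step: long copies are what remains
    # of each isotype after removing its short copies.
    c4al = c4a - c4as
    c4bl = c4b - c4bs
    if c4al < 0 or c4bl < 0:
        raise ValueError("Inconsistent genotype: negative long-form copy number")
    return {"C4AL": c4al, "C4BL": c4bl, "C4AS": c4as, "C4BS": c4bs}


def predicted_expression(forms):
    """Predicted brain expression using the coefficients in the formula above."""
    c4a_expr = 0.47 * forms["C4AL"] + 0.47 * forms["C4AS"] + 0.20 * forms["C4BL"]
    c4b_expr = 1.03 * forms["C4BL"] + 0.88 * forms["C4BS"]
    return c4a_expr, c4b_expr


# Hypothetical individual: 2 C4A, 2 C4B, 3 long, 1 short, and the short copy is C4B.
forms = compound_copy_numbers(c4a=2, c4b=2, c4l=3, c4s=1, c4as=0, c4bs=1)
c4a_expr, c4b_expr = predicted_expression(forms)
print(forms)
print(f"C4A expression = {c4a_expr:.2f}, C4B expression = {c4b_expr:.2f}")
```

In this sketch a failed identity raises an error, which corresponds to flagging the sample for re-genotyping in the workflow described above.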

Statistical analysis

Logistic regression models were used to examine the association between copy numbers of C4 structural forms (C4A, C4B, C4L, C4S), compound structural forms (C4AL, C4BL, C4AS, C4BS), and the predicted brain expression level of C4A and C4B with suicide attempt and suicidal ideation. Factors affecting suicidality such as sex, age, substance abuse, and alcohol abuse were used as covariates. In addition, due to the previously established sex-specific association of C4 with schizophrenia and other autoimmune disorders, sex-stratified analyses were conducted to assess the sex-specific effect of C4 variants on suicidal outcomes. Data analyses and visualization were conducted using RStudio (version 2022.07.0 + 548). A power analysis conducted with G*Power for a logistic regression model, with two-sided tests, a significance level (alpha) of 0.05, and the prevalence of the outcomes encountered in our data, showed that with our sample size we have 80% power to detect effect sizes equivalent to an odds ratio of 0.37, or a prevalence change from 44% to 22%. This is a large effect, meaning that statistical significance testing would be underpowered for smaller effects, particularly under multiple testing adjustments. For this reason, we used an estimation framework for statistical inference, where we focus on reporting effect sizes and their precision (95% confidence intervals) and on interpreting overall patterns of association instead of focusing on p < 0.05. We still report p-values descriptively, as a measure of statistical evidence.
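The equivalence between the detectable odds ratio and the prevalence change quoted above is a matter of converting between probabilities and odds. A minimal Python sketch of that arithmetic follows. It is not a power calculation, and the function name is illustrative.

```python
def prevalence_after_odds_ratio(baseline_prevalence, odds_ratio):
    """Apply an odds ratio to a baseline prevalence and return the new prevalence."""
    baseline_odds = baseline_prevalence / (1.0 - baseline_prevalence)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Values from the power analysis: baseline prevalence 44%, detectable OR 0.37.
new_prevalence = prevalence_after_odds_ratio(0.44, 0.37)
print(f"Prevalence after applying OR = 0.37: {new_prevalence:.3f}")
```

Applying an odds ratio of 0.37 to a 44% baseline gives a prevalence of roughly 22 to 23%, consistent with the rounded figures reported in the power analysis.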