Schizophrenia is a strongly heritable disorder, and identification of potential candidate genes has accelerated in recent years. Genomewide scans have identified multiple large linkage regions across the genome, with fine-mapping studies and other investigations of biologically plausible targets demonstrating several promising candidate genes of modest effect. The recent introduction of technological platforms for whole-genome association (WGA) studies provides an opportunity to rapidly identify novel targets, although no WGA studies have been reported in the psychiatric literature to date. We report results of a case–control WGA study in schizophrenia, examining ∼500 000 markers, which revealed a strong effect (P=3.7 × 10⁻⁷) of a novel locus (rs4129148) near the CSF2RA (colony stimulating factor, receptor 2 alpha) gene in the pseudoautosomal region. Sequencing of CSF2RA and its neighbor, IL3RA (interleukin 3 receptor alpha) in an independent case–control cohort revealed both common intronic haplotypes and several novel, rare missense variants associated with schizophrenia. The presence of cytokine receptor abnormalities in schizophrenia may help explain prior epidemiologic data relating the risk for this illness to altered rates of autoimmune disorders, prenatal infection and familial leukemia.
Schizophrenia (SCZ) is a disease with estimated lifetime morbid risk approaching 1% worldwide.1 Its course is commonly chronic and severely disabling, accounting for nearly 3% of all years lived with disability; amongst individuals aged 15–44, SCZ is the third-leading cause of disability.2 SCZ is associated with an increase of at least 50% in mortality rates compared with the general population,3 including a suicide rate of approximately 5%,4 resulting in a 10-year average lifespan reduction.2 Despite recent advances, pathophysiology remains poorly understood, and current treatments remain unsatisfactory.5
Genetic epidemiologic studies have revealed high heritability estimates (∼80%) for schizophrenia and elevated relative risk (λ ∼10) in first-degree relatives.6 The discovery of susceptibility genes for SCZ has proven challenging, however; research to date has identified genes with, at best, modest individual effects. As with other complex diseases, linkage studies have revealed multiple candidate regions with marginal LOD scores spanning large regions of the genome,7 whereas studies of individual candidate genes are inherently limited in their scope and may miss unexpected loci of strong effect. By contrast, the development of whole-genome association (WGA) technology provides an opportunity to rapidly identify novel susceptibility genes for complex phenotypes, even with relatively small samples, as recently demonstrated in macular degeneration,8, 9 and QT interval prolongation,10 for example. These WGA studies, which examined approximately 100 000 single-nucleotide polymorphisms (SNPs), also demonstrated the importance of subsequent fine mapping and examination of independent cohorts in clarifying and validating initial association signals obtained with WGA. Because WGA studies are unlikely to directly reveal functional variants, our goal in WGA was to identify SNPs in or near genes that could then be independently examined through deep sequencing. Thus, we report the results of a WGA study in a schizophrenia case–control cohort using a ∼500 000 SNP array and data from exonic sequencing in an independent case–control cohort.
Materials and methods
The WGA sample
For the WGA study, patients with schizophrenia-spectrum disorders (n=178, including schizophrenia, n=158; schizoaffective disorder, n=13; or schizophreniform disorder, n=7) were recruited from the inpatient and outpatient clinical services of The Zucker Hillside Hospital, a division of the North Shore–Long Island Jewish Health System. After patients provided written informed consent, the Structured Clinical Interview for DSM-IV Axis I disorders (SCID, version 2.0) was administered by trained raters. Information obtained from the SCID was supplemented by a review of medical records and interviews with family informants when possible; all diagnostic information was compiled into a narrative case summary and presented to a consensus diagnostic committee, consisting of a minimum of three senior faculty. Healthy controls (n=144) were recruited by use of local newspaper advertisements, flyers and community Internet resources, and underwent initial telephone screening to assess eligibility criteria. The nonpatient SCID (SCID-NP) was administered to subjects who met eligibility criteria, to rule out the presence of an Axis I psychiatric disorder; a urine toxicology screen for drug use and an assessment of the subject's family history of psychiatric disorders were also performed. Exclusion criteria included (current or past) Axis I psychiatric disorder, psychotropic drug treatment, substance abuse, a first-degree family member with an Axis I psychiatric disorder, or the inability to provide written informed consent.
Patients (65 female/113 male) and controls (63 F/81 M) did not significantly differ in sex distribution (P>0.05). All self-identified as Caucasian, non-Hispanic. Population structure was tested by examination of 210 ancestry informative markers (AIMs). AIMs included all SNPs on the array that passed initial quality control procedures (described below) and demonstrated a frequency difference of 0.5 in comparisons between Caucasian individuals and Asians or African-Americans in publicly available data from Shriver and colleagues.11 Two tests of structure were performed, both of which indicated no significant stratification. First, analysis with the STRUCTURE program12 confirmed that all subjects were drawn from a single population; second, comparison of cases and controls on allelic frequency across the 210 AIMs revealed no differences beyond those expected by chance.
The sequencing sample
The sample for the sequencing study was drawn from a larger sample of 151 SCZ patients and healthy subjects of various self-reported ethnicities who provided written, informed consent. All 151 subjects were genotyped at 67 AIMs and analyzed using the STRUCTURE program.12 The 71 cases (28 F/43 M) and 31 controls (18 F/13 M) who were at least 90% Caucasian were used in the primary case–control analysis; groups did not significantly differ on sex distribution (P>0.05). All cases were diagnosed with schizophrenia and were drawn from a nationwide population of clozapine-treated patients in the United States as part of a study on clozapine-induced agranulocytosis. As a secondary analysis, to provide comparable sample sizes between groups, 41 additional healthy individuals of Asian, Hispanic or Native American ancestry were also examined.
For the WGA study, genomic DNA extracted from whole blood was hybridized to two chips containing ∼262 000 and ∼238 000 SNPs as per manufacturer's (Affymetrix, Santa Clara, CA, USA) specifications. For all preliminary steps (e.g. polymerase chain reactions), patient and control samples were proportionally represented on each 96-well plate, so as to avoid the potential confound of plate artifacts. Genotype calls were obtained using the Bayesian Robust Linear Model with Mahalanobis distance classifier algorithm threshold at 0.5 applied to batches of 100 samples. Quality control procedures followed several steps. First, samples that obtained mean call rates <90% across both chips (or <85% for a single chip) were rejected. Mean call rate of remaining samples (total n=322) was 97%. Twenty-two of these cases were successfully repeated, and concordance of the two calls (reliability) for each SNP was evaluated. SNPs with >1 discrepancy were excluded from further analyses. Concordance across the remaining 454 699 SNPs exceeded 99%. An additional 1526 SNPs with call rates <0.85 across all valid cases were excluded, as were 13 662 SNPs not in Hardy–Weinberg equilibrium (P<0.001) in the control sample, yielding 439 511 SNPs available for analysis.
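The SNP-level filtering described above can be tallied in a few lines. The following is a minimal sketch, not the authors' pipeline: the counts and thresholds are taken from the text, while the function name `passes_qc` and its parameter names are hypothetical and introduced only for illustration.

```python
# Counts reported in the text for the SNP-level quality-control steps.
snps_after_concordance = 454_699  # SNPs with at most one discrepancy in repeated samples
low_call_rate = 1_526             # excluded: call rate < 0.85 across valid cases
hwe_failures = 13_662             # excluded: Hardy-Weinberg P < 0.001 in controls

snps_for_analysis = snps_after_concordance - low_call_rate - hwe_failures


def passes_qc(discrepancies, call_rate, hwe_p_controls):
    """Hypothetical per-SNP filter applying the three thresholds stated in the text."""
    return (
        discrepancies <= 1            # SNPs with >1 discrepancy were excluded
        and call_rate >= 0.85         # SNPs with call rate <0.85 were excluded
        and hwe_p_controls >= 0.001   # SNPs with HWE P<0.001 in controls were excluded
    )


print(snps_for_analysis)
```

The subtraction agrees with the 439 511 SNPs reported as available for analysis.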
Whole-genome association statistical analysis
The primary analytic modality involved computation of likelihood ratios (df=1) for the best possible genotypic split (e.g. recessive or dominant models) for each SNP, with the constraint that a minimum of 10 subjects populate each split group (thereby excluding monomorphic and very rare SNPs), yielding 362 188 splits plotted in Figure 1. Allelic χ² tests were also performed for each SNP that passed quality control procedures described above. All WGA statistical analyses were conducted using HelixTree 5.0.2 software (Golden Helix, Bozeman, MT, USA).
Because strict Bonferroni correction is considered overly conservative, a Bayesian formula was applied to obtain 0.95 posterior probability of a correct inference of association to a particular gene; this approach was advocated by Freimer and Sabatti,13 and modified to take into account recent estimates that place the total number of genes in the human genome at approximately 20 000, resulting in a P-value threshold of ∼4.2 × 10⁻⁷. This gene-wise approach is particularly applicable to the current study design, in which the WGA analysis is conducted with the assumption that significant marker(s) are not themselves functional variants but rather indicate neighboring genetic loci to be examined in greater detail. It could be argued that this threshold is also overly conservative, in that it assumes a single genetic locus and does not take into account prior information (e.g. cytogenetic and linkage data) that may provide greater likelihood in particular regions of the genome. However, such information is difficult to quantify13 and therefore was not included in deriving the threshold. Finally, it should be noted that empirical testing of all 362 188 P-values from the primary analysis, using the Q-value program,14 yielded a false discovery rate q<0.05 for the threshold applied in the present study.
As described below, WGA data yielded an SNP near the colony stimulating factor 2 receptor alpha (CSF2RA) gene which was examined in the independent sequencing sample (n=102). Notably, no HapMap data are available for CSF2RA to guide SNP selection; all SNPs analyzed in the sequencing sample are derived from sequencing of exons and their immediate flanking regions. Because CSF2RA forms a cytokine receptor gene cluster with its neighboring gene (interleukin 3 receptor alpha; IL3RA), we also examined sequencing data for IL3RA in the same 102 individuals. The following regions were sequenced: CSF2RA, exons 3, 7, 8, 9, 12 and an upstream region; IL3RA, exons 1–3, 5–8, 10 and 12 (reference GenBank mRNA accession numbers NM_006140.3 and NM_002183.2, respectively). Fragments (500bp) were amplified using primers displayed in Table 1; note that intronic regions flanking each exon were thus included in the sequencing analysis. Primers were tailed with M13F for forward primers and with M13R for reverse primers. The sequencing reactions were carried out in both directions using BigDye Terminator Cycle Sequencing, and electrophoresis was run on the ABI Prism 3700 DNA Analyzer, according to standard procedures. Fragments were blasted against the target sequence and polymorphisms were scored for fragments with an average Phred score >30. As a control, Mendelian inheritance of each common polymorphism was assessed in one Caucasian family (four grandparents, two parents, four children) and one African American family (two parents, five children) and population specific Hardy–Weinberg equilibrium was determined for 79 unrelated individuals.
As shown in Figure 1, the WGA study yielded one SNP demonstrating an association beyond the genomewide significance threshold: rs4129148, neighboring CSF2RA. CSF2RA is located in the pseudoautosomal region (PAR1) of the X and Y chromosomes (Xp22.32/Yp11.3). Under a recessive genetic model, homozygosity for the C allele (−strand) was significantly associated with schizophrenia. As shown in Figure 2, 59% of cases (105/178), but only 31% of controls (44/143, one subject not called) were CC homozygotes (odds ratio=3.23; 95% confidence interval=2.04–5.15; population attributable risk=23.5%).
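The recessive-model comparison can be checked from the genotype counts given above. The sketch below computes a likelihood-ratio (G) statistic with 1 degree of freedom for the CC versus non-CC split, together with the odds ratio and an approximate 95% confidence interval. Two points are assumptions rather than details from the text: the interval uses the Woolf (log odds ratio) method, and the function names are illustrative.

```python
import math


def g_test_2x2(a, b, c, d):
    """Likelihood-ratio (G) statistic and df=1 P-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if observed[i][j] > 0:
                g += 2.0 * observed[i][j] * math.log(observed[i][j] / expected)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf-type confidence interval (an assumed method)."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts from the text: CC homozygotes vs all other genotypes.
cases_cc, cases_other = 105, 178 - 105
controls_cc, controls_other = 44, 143 - 44

g_stat, p_value = g_test_2x2(cases_cc, cases_other, controls_cc, controls_other)
or_est, ci_low, ci_high = odds_ratio_woolf(cases_cc, cases_other, controls_cc, controls_other)
print(f"G = {g_stat:.2f}, P = {p_value:.2e}")
print(f"OR = {or_est:.2f}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

Under these assumptions the sketch gives a P-value of roughly 3.7 × 10⁻⁷ and an odds ratio near 3.24 with limits near 2.03 and 5.15, close to the reported figures; small differences would be expected if a different interval method or rounding was used.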
Additional tests were performed to rule out several potential confounds that might artifactually inflate the P-value of rs4129148. First, as noted above, there was no evidence of population stratification; the maximal difference between cases and controls on 210 AIMs was more than four orders of magnitude smaller than that observed for rs4129148. This distinction is important insofar as the HapMap reveals significant allelic frequency differences between populations for rs4129148. Moreover, the genotype and allelic frequencies in the WGA control sample are comparable to those reported in the HapMap CEU sample, and neither cases nor controls in the WGA sample exhibit genotypic or allelic frequencies comparable to the other HapMap ethnic groups.
Second, because rs4129148 is located on the X/Y chromosomes, case–control comparisons were performed separately for males and females. For both sexes, the recessive model described above significantly differentiated cases and controls (P≤0.001), with comparable genotypic and allelic frequencies across sexes. Moreover, Hardy–Weinberg equilibrium was observed in both sexes, and only one subject in the entire sample was not called. Thus, it was unlikely that any sex difference in calling of alleles or observation of heterozygosity could account for the significant results for this pseudoautosomal (not X-linked) SNP.
Third, selecting amongst more than one genotypic model (i.e. dominant and recessive) may have artificially inflated the significance of the resulting P-values. Therefore, we separately performed genomewide allelic χ² tests, which again demonstrated that rs4129148 achieved the strongest P-value, with virtually the same strength (P=4.4 × 10⁻⁷) as the genotypic P-value for the recessive model. Moreover, examination of the distribution of P-values for all allelic tests across the genome demonstrated no excess of significant results. Of the 439 511 SNPs that passed quality control procedures, 44 crossed a significance threshold of P<0.0001, 429 were significant at P<0.001, and 4346 were significant at P<0.01; these counts are close to those expected by chance (approximately 44, 440 and 4395, respectively) and thus demonstrate no inflation of the family-wise error rate.
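The absence of inflation can be illustrated by comparing the observed counts with the number of SNPs expected to cross each threshold if all P-values were uniformly distributed under the null hypothesis. This is a minimal sketch that treats the tests as independent, which is a simplification given linkage disequilibrium between neighboring SNPs; the variable names are illustrative.

```python
n_tests = 439_511  # SNPs passing quality control

# Observed numbers of allelic tests crossing each threshold, as reported in the text.
observed = {1e-4: 44, 1e-3: 429, 1e-2: 4346}

ratios = {}
for alpha, count in observed.items():
    expected = n_tests * alpha  # expected count under a uniform null distribution
    ratios[alpha] = count / expected
    print(f"P<{alpha:g}: observed {count}, expected {expected:.1f}, ratio {ratios[alpha]:.3f}")
```

Ratios close to 1 at every threshold indicate that the allelic tests produced about as many nominally significant results as chance alone would predict.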
Next, we sought to extend the WGA finding by conducting two sets of analyses in the sequencing sample: (1) haplotype analysis of common SNPs (minor allele frequency ≥0.10); and (2) frequency analysis of all rare missense variants. Common SNPs were examined using Haploview (version 3.32)15 for patterns of linkage disequilibrium (LD) and identification of haplotype blocks. Figure 3 demonstrates the patterns of LD for common variants in CSF2RA and IL3RA. Three haplotype blocks were identified with solid spines of D′>0.70.
Our case–control analysis followed a hierarchical approach, to limit multiple testing concerns. First, we performed global χ² tests for each of the three haplotype blocks, controlling overall alpha at 0.05 using the false discovery rate method of Benjamini and Hochberg.16 For each significant haplotype block, we then conducted planned post hoc analyses to determine which haplotypes (and SNPs) within the block were accounting for the signal. Haplotype block 1, consisting of three common SNPs in CSF2RA intron 8, was significantly associated with schizophrenia (global P=0.016, Table 2). As shown in Table 2, the CGC haplotype within this block was significantly under-represented in patients (P=0.0017), and two SNPs in the block were individually associated with SCZ (rs28694718, P=0.0175; rs28414810, P=0.0084). Haplotype block 2, containing three SNPs spanning introns 4, 5 and 6 of IL3RA, was also significantly associated with illness (global P=0.027). A haplotype containing the minor alleles of these SNPs (CGC for rs6422441, rs6603272 and rs17883192) was over-represented in patients (P=0.0086), and each of these SNPs was significantly associated with SCZ (P's<0.05). Of note, when patients who had developed agranulocytosis while undergoing treatment with clozapine (n=26) were removed from the analysis, haplotype comparisons of SCZ patients vs controls remained significant (CGC in CSF2RA, P=1 × 10⁻⁴; CGC in IL3RA, P=0.02).
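The first stage of this hierarchical approach, the Benjamini and Hochberg step-up procedure applied to the three global tests, can be sketched as follows. The global P-values for blocks 1 and 2 are those reported above; the P-value for block 3 is not given in the text, so the value 0.50 used here is a hypothetical placeholder standing in for a non-significant result.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Blocks 1 and 2: reported global P-values. Block 3: hypothetical placeholder.
global_p = [0.016, 0.027, 0.50]
significant_blocks = benjamini_hochberg(global_p, alpha=0.05)
print(significant_blocks)
```

With three tests the step-up thresholds are 0.0167, 0.0333 and 0.05, so both reported P-values fall below their respective thresholds, and only blocks that pass this stage proceed to the post hoc haplotype and SNP comparisons.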
We also identified seven novel exonic mutations within these two genes that resulted in amino acid substitutions (Table 3). A total of 15 substitutions were detected in cases, with only one detected in controls (Fisher's exact P=0.031). Moreover, all but one of the rare missense variants shown in Table 3 were observed in patients without agranulocytosis, again indicating that these results are not a reflection of susceptibility to agranulocytosis. Finally, because it could be argued that the control group sample size was too small for adequate comparison of rare events, we obtained sequence data from an additional 41 healthy individuals of Asian, Hispanic or Native American ancestry, resulting in total sample size (n=72 controls) directly comparable to the patient group. Notably, six of the seven rare missense alleles listed in Table 3 were not detected in any of these healthy individuals. Moreover, the rare allele of the other mutation locus (Ala17Gly) was detected in only three of the additional control subjects. No additional mutations were detected in the additional subjects. Thus, the rate of mutation was four times higher in patients than in the combined control sample, and the results remain statistically significant (P=0.01).
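A Fisher's exact test of this kind can be sketched with the standard library alone. The text does not specify how the contingency tables were constructed, so the layout below is an assumption: each substitution is counted as one carrier, giving 15 of 71 patients versus 1 of 31 controls in the primary comparison and 4 of 72 controls (one plus the three Ala17Gly carriers) in the combined comparison. The resulting P-values therefore need not match the reported values exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x events in row 1 with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Assumed layout: carriers vs non-carriers of any rare missense variant.
p_primary = fisher_exact_two_sided(15, 71 - 15, 1, 31 - 1)
p_combined = fisher_exact_two_sided(15, 71 - 15, 4, 72 - 4)
carrier_rate_ratio = (15 / 71) / (4 / 72)
print(f"primary P = {p_primary:.3f}, combined P = {p_combined:.3f}")
print(f"carrier rate ratio = {carrier_rate_ratio:.1f}")
```

Under this assumed layout both comparisons remain below 0.05, and the carrier rate ratio of about 3.8 is consistent with the roughly fourfold difference described in the text.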
Using similar methods, we also sequenced a small region surrounding rs4129148 in the original 102 Caucasian individuals. There were no significant differences in rs4129148 genotype (P=0.28) or allele (P=0.14) frequency between patients and controls in this sample. Evidence for long-range LD between rs4129148 and the genic SNPs in CSF2RA and IL3RA in the sequenced cohort was present, though limited. In the control sample, there was modest evidence of LD between rs4129148 and SNPs in haplotype block 2 (D′=0.5, r²=0.12), which was not observed in the patient group. In both samples, there was some evidence of strong LD (D′ as high as 1, but with modest r²) between rs4129148 and several SNPs with low minor allele frequency (0.05<MAF<0.10) within the two genes. It should be noted that these relationships could not be tested in the WGA sample, because the Affymetrix 500K array has very limited coverage of the PAR1; there are no SNPs on the array within CSF2RA, and only two SNPs within IL3RA, neither of which passed quality control screens.
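The combination of a high D′ with a modest r² is expected when one of the two SNPs has a low minor allele frequency, as the following sketch illustrates. The haplotype and allele frequencies used here are hypothetical values chosen for illustration and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) from the frequency of haplotype AB and of alleles A and B."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical example: allele A is common (0.50), allele B is uncommon (0.07),
# and every haplotype carrying B also carries A.
d_prime, r_squared = ld_stats(p_ab=0.07, p_a=0.50, p_b=0.07)
print(f"D' = {d_prime:.2f}, r^2 = {r_squared:.3f}")
```

Here D′ reaches 1 because the uncommon allele occurs on only one background, yet r² stays below 0.1 because the two allele frequencies differ so much that one SNP predicts the other poorly.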
This study identified three converging lines of evidence, from two independent samples, suggesting the presence of a novel schizophrenia susceptibility locus in the PAR1. First, an SNP neighboring CSF2RA survived a genomewide test of significance in the WGA sample, with an odds ratio greater than 3 and a population-attributable risk approaching 25%. Second, intronic haplotype blocks within CSF2RA and its immediate neighbor IL3RA were significantly associated with SCZ in the independent sequencing sample. Third, seven novel missense mutations in these two genes were also significantly over-represented in patients with schizophrenia in the sequencing sample.
Whereas there is some prior cytogenetic support for a PAR1 locus,17 linkage studies have reported mixed results18, 19, 20 and have often not included coverage of the PAR1 region. To our knowledge, neither CSF2RA nor IL3RA has been previously examined in candidate gene association studies of schizophrenia, and studies of other cytokine genes in schizophrenia have been equivocal. Intriguingly, a very recent study has identified several schizophrenia-related SNPs and haplotypes both in and near the gene for interleukin 3 (the ligand for IL3RA), which is located within a commonly identified linkage peak on chromosome 5q.21
Results of the present study are also notable in that associations to illness were found for common variants in non-coding regions as well as rare missense mutations. Whereas hereditary patterns and linkage data in schizophrenia are consistent with a polygenic, common disease/common variant model,22 it has also been suggested that heterogeneous rare mutations may play an important role.23 Results of the present study suggest that these mechanisms are not mutually exclusive in the genetics of schizophrenia.
A substantial role for cytokine-related genes such as CSF2RA and IL3RA in mediating schizophrenia risk might help connect several disparate lines of epidemiologic and biological data in SCZ. First, it has long been observed that monozygotic twin concordance is approximately 50%, indicating both strong genetic and environmental components of risk.6 Well-replicated findings of increased SCZ risk for winter-born and urban-born/dwelling individuals, combined with large-scale studies demonstrating increased risk following maternal exposure to a variety of infectious pathogens, have suggested that prenatal infection may represent a significant environmental risk factor.24 The lack of specificity to particular teratogens, including perinatal hypoxic insult or trauma as well as maternal stress, points to a common pathway by which these varying exposures may equally influence schizophrenia liability; one such pathway is the cytokine-mediated inflammatory response.25 Moreover, increased levels of pro-inflammatory cytokines reported in the peripheral blood and cerebrospinal fluid of patients with schizophrenia suggest that activation of this pathway in adulthood may form a component of SCZ pathophysiology.26 Furthermore, family-based epidemiologic studies suggest that relatives of patients with schizophrenia are psychiatrically27 and neuroanatomically28 more sensitive to negative sequelae of teratogenic exposures. The present findings point toward an explanation for such gene–environment interactions: genetic variation in the receptor structure or expression of pro-inflammatory cytokines, such as CSF2RA and IL3RA and their respective ligands, granulocyte-macrophage colony stimulating factor (GM-CSF) and interleukin-3 (IL-3), may serve to heighten sensitivity to the teratogenic effects of prenatal infections and perinatal insults,29 and may contribute to the variable course and episodic nature of the adult disorder.30
Abnormalities of GM-CSF, IL-3 and their receptors have been most commonly associated with lymphoma and leukemia.31, 32 A previously unexplained (though replicated) epidemiologic finding in schizophrenia has been the reduced incidence of these two cancers in relatives of SCZ patients.33, 34 Similarly, an elevated level of GM-CSF in synovial fluid has been implicated in the inflammatory pathophysiology of rheumatoid arthritis (RA),35 and SCZ patients have consistently demonstrated a reduced prevalence of RA.36 Moreover, SCZ patients show increased incidence of other inflammatory autoimmune disorders such as celiac disease and Sjögren's syndrome before SCZ onset, and prevalence of many autoimmune disorders is increased in parents of SCZ patients.36 These patterns of comorbidity, both in patients and first-degree relatives, might also be explained by gene–environment interactions of abnormal cytokine receptor genes and varying levels of environmental exposures.
Finally, whereas GM-CSF is most commonly associated with proliferation and differentiation of granulocytes and macrophages, there is accumulating evidence of its critical role in the central nervous system repair via expression of brain-derived neurotrophic factor,37 a trophic factor that has been associated with psychiatric illness. This is consistent with broader evidence of a role for GM-CSF and IL-3 in the central nervous system, including neuroprotection,38 communication across the blood–brain barrier,39 and neurotransmitter modulation (particularly of acetylcholine and GABA levels).40 Further studies are needed to determine whether the association of this cytokine receptor genetic locus with schizophrenia is mediated by immune response to infectious agents, autoimmune or inflammatory processes, trophic factors or a combination of these mechanisms.
As with any genetic association study, additional independent replications from other laboratories will be critical to confirm these novel findings. The lack of significant association to rs4129148 in the small sequencing cohort provides a note of caution in interpreting results of the WGA study. Nevertheless, the genomewide-significant result of the WGA study appeared robust to several potential forms of artifact, including population stratification and gender bias. Notably, rs4129148 showed zero errors in the reproducibility analysis, and had a high minor allele frequency resulting in apparently robust calling of both heterozygotes and homozygotes. Still, it is important to note that even multiply replicated findings can be subject to subsequent failures to replicate,41, 42, 43, 44 presumably owing to the multiple sources of heterogeneity underlying the pathophysiology of complex disease.
Understanding the role of rs4129148 will require further explication of its relationship to the neighboring genes CSF2RA and IL3RA. Whereas the sequencing sample demonstrated only modest evidence of long-range LD, difficulties in sequencing and genotyping this region (as demonstrated by gaps in coverage in the reference sequence,45 the HapMap,46 and commercially available arrays such as the Affymetrix 500K) limit our current understanding of the LD structure in this region. Other roles of rs4129148 distinct from any potential relationship to CSF2RA and IL3RA, such as regulatory functions or coding of small RNAs, cannot be ruled out at this time. It should also be noted that another cytokine receptor gene, cytokine receptor-like factor 2 (CRLF2), may lie between rs4129148 and CSF2RA, although its RefSeq status is currently listed as provisional.
Despite these limitations, the design of the present study did permit the WGA result to direct attention to PAR1. The sequencing study identified two common haplotypes in CSF2RA and IL3RA that were significantly associated with schizophrenia; we also identified seven previously unreported missense mutations, which were collectively associated with illness. Future biological studies will be needed to explicate potential consequences of these haplotypes and amino acid substitutions on receptor transcription, structure and function.
This work was supported by the Donald and Barbara Zucker Foundation, internal funding from the North Shore – Long Island Jewish Health System, Clinical Data, Inc., a Keyspan Fellowship Award (TL) and grants from the Stanley Foundation (AKM), the National Alliance for Research on Schizophrenia and Depression (AKM), and the NIH (MH065580 to TL; MH074543 to JMK; MH001760 to AKM). We thank Christophe Lambert and Josh Forsythe of Golden Helix, Inc., for their computing expertise and technical support.