Introduction

Schizophrenia (SZ), bipolar disorder (BD) and recurrent major depressive disorder (rMDD) are common forms of serious mental illness, each with a strong and overlapping genetic component.1, 2, 3 Genome-wide linkage, association, cytogenetic, copy number variant and, more recently, sequencing studies establish that the genetic architecture of psychiatric illness is complex and that there is extensive genetic heterogeneity, which is incompletely defined or understood (reviewed in Sullivan et al.4). We previously reported a t(1;11) translocation in a single large Scottish family that showed genome-wide significant linkage for SZ, rMDD and jointly with BD.5 The t(1;11) translocation is balanced and structurally simple, but the outcome is genetically complex, disrupting the protein coding gene Disrupted in schizophrenia 1 (DISC1), the antisense non-coding gene Disrupted in schizophrenia 2 (DISC2) and the non-coding gene DISC1FP1, creating a DISC1/DISC1FP1 fusion transcript.6, 7, 8, 9, 10, 11

Several small independent studies have reported evidence for association of single DISC1 single-nucleotide polymorphisms (SNPs) (coding and non-coding) or haplotypes with SZ, BD, rMDD and other neuropsychiatric traits, including autism spectrum disorder, cognition, normative cognitive ageing, anxiety and structural and functional brain imaging phenotypes.9, 12, 13 Rare amino-acid substitution variants in DISC1 have been reported in cases of SZ,10, 11 BD,14 rMDD,13 autism spectrum disorder15 and agenesis of the corpus callosum,16 as has an increased burden of rare missense variants in exon 11 of DISC1 for schizoaffective disorder,17 and for DISC1 pathway genes in SZ.10 In contrast, a meta-analysis of all known common variants within the DISC locus, from a total of 11 626 cases and 15 237 controls that involved the testing of 1241 SNPs, found no evidence that common variants at the DISC locus are significantly associated with SZ.18 Moreover, the DISC1 locus has not reached genome-wide significance in large-scale meta-analyses of linkage studies of SZ,19 nor have its common variants in large-scale genome-wide association studies of SZ, BD or rMDD.20, 21, 22 A recent exon-based study that sequenced 2.7 kb of DISC1 in 727 cases of SZ and 733 controls found 32 rare alleles (minor allele frequency (MAF)<0.01) in SZ cases and 40 in European controls with no evidence for a significantly increased burden of likely pathogenic variants.23 DISC1, however, continues to feature strongly in attempts to assess genome-wide association results in terms of networks24 or in combination with known biological function.25, 26, 27

The biological functions of DISC1 fit well with current aetiological concepts in SZ-related major mental illness and cognition.28 DISC1 is a scaffold protein that interacts with, and modulates the activity of, multiple proteins with key roles in neurodevelopment, neurogenesis, neuronal migration, integration and signalling29, 30, 31 including the antidepressant and antipsychotic targets GSK3β32 and PDE4.33 Several common and rare amino-acid polymorphisms of DISC1 have predicted deleterious effects on protein function and demonstrable biological effects in experimental settings.31 The 704C allele is associated with reduced activity of ERK1 and Akt kinases, altered binding affinities of DISC1 for NDE1 and NDEL1 and variation in DISC1 oligomeric status.34, 35, 36, 37 The 607F allele results in (a) reduced binding and centrosomal localisation of PCM1, (b) reduced noradrenaline neurotransmitter release in SH-SY5Y cells,38 (c) altered mitochondrial trafficking39 and (d) a partial shift from neuronal to glial expression in the brain.40 Furthermore, Singh et al.41 reported that 607F impacts negatively on neural progenitor proliferation in E16 mouse brain, correlates with aberrant wnt signalling in human lymphoblasts, and is associated with a neurodevelopmental phenotype in morpholino mutant zebrafish. R37W lies within an arginine-rich nuclear localisation motif and a partially overlapping interaction domain for PDE442 and GSK3β.41, 43 The 37W allele shows reduced nuclear DISC1 expression and altered DISC1 regulation of ATF4, a critical modulator of cAMP signalling and mediator of the stress response.44

In summary, the primary genetic evidence from the original Scottish t(1;11) family was significant at the genome-wide level for both SZ and rMDD, and the experimental evidence and biological plausibility of DISC1 remain strong. However, the evidence from subsequent linkage and candidate gene association studies is inconsistent and not supported by genome-wide association studies or meta-analysis.12 To explore these contrasting findings, we aimed here to establish the nature and frequency of DISC1 genomic sequence variants, identify rare variants in putative functional domains, and test for effects of these on cognitive traits and the risk of psychiatric illness. We comprehensively sequenced 528 kb covering the entire DISC1 locus, including TRAX (also known as TSNAX), for which there is evidence for intergenic splicing with DISC1,45 and the intergenic region, which contains regulatory elements immediately 5’ of DISC1.13, 46, 47, 48

Materials and methods

A full summary of the methods can be found in Supplementary Information. Briefly, all study participants gave signed consent for their data and samples to be used in studies that have been approved by the appropriate Research Ethics Committee or the GS access Committee. Genomic DNA from each individual was whole genome amplified in triplicate, the products pooled and amplified with primer pairs tiled across 528 kb of TRAX/DISC1 (hg18 chr1:229723339-230251606; hg19 chr1:231656716-232184983). For each sample, the pooled products were sheared, converted into paired-end Illumina libraries and sequenced on an Illumina GAII or HiSeq 2000 sequencer to >80% coverage and >30-fold depth. Sequences were aligned to the UCSC hg18 reference sequence, variants called using MAQ software49 and the variants in repeats removed. Ten percent of all remaining variants were validated using Sanger sequencing chemistry on an ABI3730 sequencer, and the derived information used to optimise the quality control filters. After quality control screening, all exonic and low frequency (MAF<1%) variants were also validated by Sanger chemistry sequencing as above. The variants were functionally annotated using SNPnexus50 (http://www.snp-nexus.org). Non-coding variants were annotated using the UCSC table browser for the following tracks: ‘RepeatMasker’,51 ‘CpG island’, ‘TFBS conserved’, ‘7x Reg Potential’ (which substantially overlaps with DNAse hypersensitivity sites) and/or ‘28-Way Most conserved—PlacMammal’ (http://genome.ucsc.edu/). Sequence variants classified as coding were mapped to the DISC1 L isoform and potential pathogenicity ascribed using Pmut,52 Panther53 and PolyPhen-2.54 The coding sequence variants were mapped onto a list of known curated DISC1-interactor binding sites31 and with other functional elements (for example, phosphorylation sites55). Case–control association was tested on the combined case samples as well as individually for SZ, BP and rMDD using Fisher’s exact test. 
Permutation was used to derive region-wide P-values and significance thresholds. Quantitative trait association analyses using LBC1936 samples were performed by linear regression of the trait residuals (adjusted for age and sex) on the number of minor alleles at each SNP, with empirical P-values estimated by permutation to avoid issues with the test statistic distribution caused by the combination of rare variants and slight deviations from normality in the phenotypes. All association analyses were performed using PLINK.56 Mark-recapture analysis followed the Lincoln-Petersen and Modified Petersen methods57, 58 with 95% confidence intervals calculated following Chapman.59 Burden analysis was performed in PLINK/SEQ to implement BURDEN and VTTEST with empirical P-values estimated using permutation. Genotyping of the replication and familial samples was performed by the Edinburgh Wellcome Trust Clinical Research Facility Genetics Core using TaqMan SNP genotyping assay C__33950433_10 with concurrent genotyping of known heterozygotes.
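The region-wide permutation procedure described above can be illustrated with a minimal sketch. It assumes a carrier-based 2x2 table per SNP and a minimum-P (max-statistic) permutation of case/control labels; the function names and the toy data are hypothetical, and the analyses reported here were run in PLINK, whose implementation may differ in detail.

```python
import random
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers falling in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_values = [prob(x) for x in range(lo, hi + 1)]
    # Sum all tables at least as unlikely as the observed one.
    return min(1.0, sum(p for p in p_values if p <= p_obs * (1 + 1e-9)))

def min_p(carriers, is_case):
    """Smallest Fisher P across SNPs; carriers[i][j] is 1 if person i carries SNP j."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    best = 1.0
    for j in range(len(carriers[0])):
        a = sum(row[j] for row, case in zip(carriers, is_case) if case)
        c = sum(row[j] for row, case in zip(carriers, is_case) if not case)
        best = min(best, fisher_exact_two_sided(a, n_case - a, c, n_ctrl - c))
    return best

def region_wide_p(carriers, is_case, n_perm=1000, seed=0):
    """Empirical P for the best SNP, corrected for all SNPs in the region."""
    rng = random.Random(seed)
    observed = min_p(carriers, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if min_p(carriers, labels) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): SNP 0 is carried by every case and no control.
toy_carriers = [[1, i % 2] for i in range(10)] + [[0, i % 2] for i in range(10)]
toy_is_case = [True] * 10 + [False] * 10
nominal_p, empirical_p = region_wide_p(toy_carriers, toy_is_case, n_perm=200)
```

Because each permutation keeps the genotype matrix intact and only shuffles labels, the correlation between neighbouring SNPs is preserved, which is the usual reason for preferring this approach to a Bonferroni correction across a single locus.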

Data access

The accession numbers for sequence data are NCBI ss472328925–ss472331023.

Results

Sequence analysis

We sequenced 1542 Caucasians from Scotland comprising 240 cases of SZ, 221 cases of BD, 192 cases of rMDD and, as controls, 889 members of the Lothian Birth Cohort 1936 (LBC1936), who have been extensively phenotyped.60 Each sample was sequenced to >80% coverage at a minimum of 30-fold read depth by long-range PCR and sequencing on either Illumina GAII or HiSeq 2000 sequencers. To ensure a robust data set, all variants within repetitive regions were removed. Final quality score thresholds for the data were derived from capillary sequence validation of 10% of the remaining variants. All variants with an MAF <1% were validated by ABI3730 sequencing. After quality control, there was no evidence for sequencing bias between cases and controls (Supplementary Figure S1). Allele frequencies from our sample showed strong concordance with those from the European subset of the 1000 Genomes Project61 (Supplementary Figure S2). We report 2718 SNPs in the 1542 samples analysed, 708 at ≥1% MAF and 2010 at <1% MAF (Supplementary Table S1). Only 1027 of the 2718 SNPs (38%) were previously reported in the European subset of the 1000 Genomes Project.61

As defined and annotated by the UCSC genome browser (http://genome.ucsc.edu), 489 SNPs mapped to regions of regulatory potential, 177 to non-coding exons (including DISC2) and 36 to coding regions of exons. Of these 36 variants, 12 were synonymous changes, 23 were non-synonymous changes, with one producing a stop codon consistent with the DISC1 Es isoform (Figure 1; Supplementary Figure S3; Supplementary Tables S1 and S2). Supplementary Table S3 summarises the overlap between variants identified in this study and other DISC1 sequencing studies and relevant association studies.10, 11, 13, 14, 16, 17, 23, 46

Figure 1

TRAX/DISC1 (Disrupted in schizophrenia 1) genomic and exon structure: alignment of coding and regulatory variants. (a) Three-period moving average of all single-nucleotide polymorphisms (SNPs) identified per 5 kb across the region in this study. Total SNP number (black), SNPs with a minor allele frequency (MAF) of <1% (blue), SNPs with MAF ≥1% (green), rs16856199 (arrow). For comparison, the number of SNPs identified in the 1000 Genomes Project (red, dashed), the number of bases repeat masked (top black) and 7x regulatory potential (top blue) are also shown. Exon and intron structure of TRAX and DISC1 are drawn to scale. (b) The position and diagnoses of exonic or regulatory SNPs: synonymous SNPs (black), non-synonymous SNPs (red), stop or putative splice SNPs (green). Novel SNPs not previously reported in the European samples of the 1000 Genomes Project (v3.20101123) or the NHLBI GO Exome Sequencing Project (ESP6500) or relevant sequencing and association studies10, 16, 17, 64, 65 are underlined.

Association and segregation of common variants with psychiatric illness and related quantitative traits

Genome-wide association studies of SZ, BP and rMDD are most consistent with a polygenic liability for common variants, but they also imply that there is real ‘missing’ genetic variation, which is most likely due to risk variants having low frequency in the population. To test for evidence of DISC1 association, we applied Fisher’s exact test across all variants and all diagnoses (Figure 2). There was no evidence for SNP association at genome-wide levels of significance for any diagnosis when considered separately or combined, nor was there evidence for locus-wide association of variants with SZ or BP. We did detect a novel association between intronic variant rs16856199 and rMDD (locus-wide empirical P=0.026; OR=3.48, 95% CI=1.95–6.23; unadjusted P=6.3 × 10−5). We reasoned that individual risk alleles might be expected to segregate with disease in families. For four rs16856199 carriers, twelve additional family members were available for genotyping. The rs16856199 risk allele segregated with rMDD in all four families (Supplementary Figure S4).
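For readers less familiar with how an odds ratio and its confidence interval are derived from a 2x2 table, the following is a minimal sketch using the Wald (Woolf logit) interval. This is an assumption for illustration: the interval method used for the reported estimate is not stated here, although the reported interval is roughly symmetric on the log scale, as a Wald interval would be. The counts in the example are hypothetical and are not the study data.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval for the 2x2 table [[a, b], [c, d]].

    a, b: risk-allele and other-allele counts in cases
    c, d: risk-allele and other-allele counts in controls
    All four cells must be non-zero for the logit interval to be defined.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical allele counts, chosen only to show the call.
example_or, example_lower, example_upper = odds_ratio_ci(20, 80, 10, 90)
```

With rare alleles one or more cells can be very small, in which case the Wald approximation is poor and an exact or permutation-based interval is generally preferable.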

Figure 2

Region-wide association analysis for schizophrenia, bipolar and recurrent major depressive disorder. Nominal P-values for Fisher’s exact tests are plotted against genomic location (hg18) across the TRAX/DISC1 (Disrupted in schizophrenia 1) locus. Reference lines represent 1% (dashed) and 5% (solid) region-wide empirical thresholds. Only the association of rs16856199 and recurrent major depressive disorder remains significant at the 5% threshold (arrow).

We next tested for association of rs16856199 with depression in three additional sample sets: a group of individuals referred from primary care to a hospital outpatient clinic (n=467 rMDD patients), and two population-based samples drawn from primary care as part of the Generation Scotland: Scottish Family Health Study consisting of 645 cases with rMDD and 690 cases with single episode MDD. All three groups were compared with 4017 controls drawn exclusively from Generation Scotland: Scottish Family Health Study (Supplementary additional text and Table S4). No significant association was seen with any individual replication set or all three combined (best P=0.088). Analysis of all three rMDD sets, that is, the original set and the two rMDD replication sets, was supportive of association (1112 rMDD, 4017 controls; P=0.0058, OR=1.46, 95%CI=1.12–1.91). Combined analysis of both sets of individuals referred from primary care showed stronger nominal association (P=0.00065, OR=1.76, 95%CI=1.27–2.44). No association was seen in the combined analysis of both the rMDD and MDD population-based replication sets (P=0.41). The risk allele for rs16856199 did not segregate with rMDD in 10 families of carriers identified from the Generation Scotland replication sample (Supplementary Figure S5). Together, these results suggest that the evidence for association of rs16856199 is stronger in the more severely affected individuals.

SNP rs16856199 is on the Affymetrix 6.0 array, but the best tagging SNP on the Illumina 660W-Quad, Human Hap, Human1M-Duo arrays is rs16856189. SNP rs16856189 has an r2 of 0.27 with rs16856199, which may explain in part why this association has not been reported previously in genome-wide association studies.22, 62, 63 SNP rs6678723, which lies 2.1 kb distal to this SNP within intron 11, showed the most significant association of DISC1 in the recent mega-analysis of depression (P=0.0092).20

The LBC1936 has quantitative measures of symptoms of anxiety, depression and the personality trait of neuroticism, plus psychometrically tested measures of cognitive ability (fluid (age sensitive) and crystallised (non-age sensitive)) and cognitive ageing,60, 64 which have been shown to be highly heritable and polygenic.65, 66 Association of these traits with DISC1 was tested by linear regression analyses, co-varied for age at testing and sex (Supplementary Figures S6–S8). There were no region-wide significant findings for any of these quantitative traits.
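The quantitative trait test can be sketched as follows, assuming a simple least-squares slope of the covariate-adjusted trait residuals on minor allele count, with a two-sided empirical P obtained by shuffling the residuals. The function names and example values are hypothetical, and the analyses reported here were run in PLINK rather than with this code.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x (x must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def permutation_p(allele_counts, residuals, n_perm=1000, seed=0):
    """Two-sided empirical P for the slope of residuals on allele count.

    allele_counts: 0, 1 or 2 minor alleles per individual at one SNP
    residuals: trait values already adjusted for age and sex
    """
    rng = random.Random(seed)
    observed = abs(slope(allele_counts, residuals))
    shuffled = list(residuals)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-trait pairing
        if abs(slope(allele_counts, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical example: a perfectly linear toy trait.
toy_slope = slope([0, 1, 2], [1.0, 3.0, 5.0])
toy_p = permutation_p([0, 0, 1, 1, 2, 0], [0.1, -0.2, 0.4, 0.3, 0.9, 0.0], n_perm=200)
```

The permutation P makes no assumption about the distribution of the test statistic, which matters when the minor allele is carried by only a handful of individuals.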

Estimating the net pool of DISC1 variants

To estimate the effective pool of common and rare sequence variants in the European population, we applied a ‘mark-recapture’ approach (see Supplementary methods) to our data and that of the 1000 Genomes Project (v3.20101123)67 after appropriate checks on read depth and Sanger sequence validation (Supplementary Table S5; Supplementary Figure 9). The total number of DISC1 SNPs with ≥1% MAF was estimated at 905 (95%CI=905±5), of which 901 (99.5%) are known (Supplementary Table S6). The number of rare variants (<1% MAF) is less confidently predicted, but is likely to be substantially higher (95%CI=3777±252) (Supplementary Table S6). Thus, despite the 2500 European genomes in which the DISC1 locus has been completely sequenced and the 2305 rare DISC1 variants now known, 40% or more remain to be discovered, and will be essentially ‘private’.
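The mark-recapture logic treats the variants found in one study as the 'marked' set and those found in a second study as the 'recapture' sample; the overlap then indicates how large the total pool is. The sketch below shows the Lincoln-Petersen estimator and the Chapman (modified Petersen) estimator with a normal-approximation interval. It is an illustration under stated assumptions: the counts in the first example are hypothetical, and the exact variance formula used for the published intervals is described in the Supplementary methods rather than here.

```python
from math import sqrt

def lincoln_petersen(n1, n2, m):
    """Total pool size from two samples of size n1 and n2 sharing m items."""
    return n1 * n2 / m

def chapman(n1, n2, m, z=1.96):
    """Chapman's bias-corrected estimate with a normal-approximation interval."""
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    variance = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)
                / ((m + 1) ** 2 * (m + 2)))
    half_width = z * sqrt(variance)
    return n_hat, n_hat - half_width, n_hat + half_width

def undiscovered_fraction(known, estimated_total):
    """Share of the estimated pool that has not yet been observed."""
    return 1 - known / estimated_total

# Hypothetical counts: 200 variants in study A, 150 in study B, 100 shared.
lp_estimate = lincoln_petersen(200, 150, 100)
ch_estimate, ch_lower, ch_upper = chapman(200, 150, 100)

# Figures quoted in the text: 2305 rare variants known, 3777 estimated.
rare_missing = undiscovered_fraction(2305, 3777)
```

Applying the last function to the figures quoted above gives a fraction of roughly 0.39, in line with the statement that about 40% of rare variants remain to be discovered.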

Rare amino-acid substitutions

Of the 17 rare coding SNPs previously reported for DISC1,10, 16, 17, 68, 69 we identified 12 (70.6%) plus an additional 8 non-synonymous variants, of which 5 are also absent from the European samples of both the 1000 Genomes Project (v3.20101123)61 and the Exome Variant Server (NHLBI GO Exome Sequencing Project (ESP), Seattle, WA; http://evs.gs.washington.edu/EVS/; September, 2012), and previous DISC1 sequencing studies.10, 11, 13, 14, 16, 17, 23, 46, 70 (Figure 1; Supplementary Figure S3; Supplementary Tables S2 and S3).

Five variants, R37W, T453M, T603M, L607F and S704C, are predicted to be deleterious by all three applied prediction algorithms (PolyPhen-2, Pmut and Panther; Supplementary Table S2). From this set, the functional effects of the common variants L607F (rs6675281) and S704C (rs821616) have been well documented in the literature, as mentioned earlier. R37W lies within a defined nuclear localisation signal71 and PDE4B binding site72 and is seen in a single case of rMDD (discussed at the end of this section). T453M is present at low frequency in cases and control individuals, both in this study and others (Supplementary Table S3). T603M was only identified in a single control, but Song et al.11 reported a T603I variant in a schizophrenic individual that was absent in their set of control individuals (Supplementary Table S3).

Five variants were observed only in affected individuals from our study (R37W, A83V, W160L, R233K and R418H) and not in any of the 889 control individuals. Four of these non-synonymous case-only singletons are located in the largest coding exon, DISC1 exon 2, and the remaining variant is in DISC1 exon 4 (Figure 1). A83V was seen in a single individual with BD; this variant is predicted to be deleterious by PolyPhen-2 and Pmut and has been shown to affect wnt signalling.41 It was, however, observed at low frequency in controls and in individuals with partial agenesis of the corpus callosum in previous studies (Supplementary Table S3).11, 14, 16 Unlike R37W and A83V, the three remaining non-synonymous variants, W160L, R233K and R418H, are not consistently predicted by the three prediction algorithms to have functional effects (Supplementary Table S2). 160L and 418H were detected in single SZ individuals and have also previously been reported in individuals with SZ, but 160L has also been detected in control individuals.10 The variant 233K has not been previously reported, and was identified in an individual with rMDD. No non-synonymous variants in TRAX were found in cases only. A single stop mutation was identified in a control individual and produces an alternative stop site for the DISC1 Es isoform. These variants can now be tested for potential impact on DISC1 biophysical properties, protein interaction and biological function.29, 30, 31

Evidence for familial segregation was sought for five rare exonic variants where additional family members were available, but none segregated perfectly or unequivocally with diagnosis (Supplementary Figure 9). Of note, however, was the identification of the non-synonymous amino-acid variant R37W (rs137948488), first reported68 in a subject with SZ, and seen here in a single case of rMDD. R37 is strictly conserved among orthologues and recent publications, including our own, have demonstrated biological effects of 37W on DISC1 interactions,32, 73 and shown a dominant-negative effect on the sub-cellular distribution of DISC1.44 Five additional family members of the 37W carrier were available for genotyping; their diagnoses were rMDD, generalised anxiety disorder, bipolar II or no psychiatric diagnosis at the time of assessment. The R37W mutation was present in relatives with rMDD and generalised anxiety disorder, but not in a relative with bipolar II, or any unaffected individual (Figure 3).

Figure 3

Segregation of the R37W polymorphism with psychiatric diagnoses in a small Scottish family. The proband of the family is indicated (arrow). The codon containing the T allele encodes for the amino acid tryptophan (W) and the codon containing the A allele encodes arginine (R). rMDD, recurrent major depressive disorder; BP2, bipolar II.

Burden analysis for putative functional variants

To explore the burden of SNPs of potential functional significance, all variants with MAF <1% were first validated by ABI3730 sequencing. There was no significant overall difference in the number of singleton variants (Supplementary Figure S10) or in the overall number of minor alleles by diagnosis (see Supplementary Methods). SNPs were classified on the basis of bioinformatic annotation into seven functional classes: those in exons including untranslated exons, coding sequence, non-synonymous coding SNPs, conserved regions, regions with regulatory potential, conserved transcription factor binding sites and CpG islands (see Materials and methods and Supplementary Table S7). The empirical P-values for the burden analysis were obtained by permutation, correcting for the multiple thresholds tested but not for the multiple functional subgroups or diagnostic classes (SZ, BP and rMDD and all cases combined); all results are therefore reported as nominal significance values (Supplementary Table S8; Supplementary Methods). Details of nominally significant results are given in Table 1. For rMDD only, there was a nominally significant (P=0.044) excess of minor alleles for SNPs with regulatory potential across all frequencies, and for rare SNPs in conserved regions with MAF 0.18% (P=0.022). Nominally significant association was found in the LBC1936 data between the burden of minor alleles across all frequencies for SNPs in conserved transcription factor binding sites and increased symptoms of depression, a measure of depressed mood at the time of testing (Table 1). In addition, a nominally significant increase in burden of minor alleles for SNPs in CpG islands or coding SNPs was observed with Moray House Test scores, measures of cognitive ability (Supplementary Table S9).
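The idea behind a burden test can be shown with a minimal sketch: minor alleles are summed per individual across all SNPs in one functional class, and the case-control difference in mean burden is assessed by permuting phenotype labels. This is a simplified stand-in and not the PLINK/SEQ BURDEN or VT implementation; in particular it uses a single fixed variant set rather than scanning frequency thresholds, and all names and data are hypothetical.

```python
import random

def burden_statistic(burden, is_case):
    """Mean per-individual burden in cases minus mean burden in controls."""
    cases = [b for b, case in zip(burden, is_case) if case]
    controls = [b for b, case in zip(burden, is_case) if not case]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

def burden_permutation_p(genotypes, in_class, is_case, n_perm=1000, seed=0):
    """One-sided empirical P for an excess of minor alleles in cases.

    genotypes[i][j]: minor allele count (0, 1 or 2) for person i at SNP j
    in_class[j]: True if SNP j belongs to the functional class under test
    """
    # Collapse each individual's genotypes into one burden score.
    burden = [sum(g for g, keep in zip(row, in_class) if keep)
              for row in genotypes]
    observed = burden_statistic(burden, is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling keeps case and control counts fixed
        if burden_statistic(burden, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: four individuals, three SNPs, two in the class.
toy_genotypes = [[1, 0, 1], [2, 0, 0], [0, 1, 0], [0, 0, 0]]
toy_excess, toy_burden_p = burden_permutation_p(
    toy_genotypes, [True, False, True], [True, True, False, False], n_perm=200)
```

A variable-threshold test repeats this calculation over a series of MAF cut-offs and takes the best statistic within each permutation, which is why the empirical P-values above are described as corrected for the multiple thresholds tested.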

Table 1 Summary of nominally significant burden results

Discussion

Diagnoses of SZ, BP and rMDD were all present in the original Scottish family carrying the translocation that disrupts DISC1. All three DSM-IV diagnoses have a strong and overlapping genetic component, but robust statistical analysis of gene-level contributions to risk is complicated by extensive genetic heterogeneity within and between diagnoses.4 We have provided the most comprehensive landscape of genetic variation at the DISC1 locus to date in patients with this spectrum of psychiatric illness and in healthy population controls with quantitative measures of mood and cognition. Comparison between our sequencing study and that of the 1000 Genomes Project confirms that current genome-wide association studies effectively capture the majority of common (but not rare) variants in the European population. Our sample size is large by current sequencing study standards, but we lack power to detect genome-wide significant P-values for either common or rare variants (see Supplementary Information and Supplementary Figure S11 for further details and also Kiezun et al.).74 Indeed, the predicted abundance of independent rare variants at this (and any other given) locus makes it highly improbable that any one will contribute to illness in the population at a frequency that will be statistically significant, given the numbers of patients we can afford to analyse by direct sequencing.74

We observed no evidence for association at the whole-genome level of statistical significance between individual rare or common variants and either psychiatric illness or cognition. This is consistent with recent findings,74 which suggest that much larger samples would be required to detect such associations. Burden analysis of multiple rare and/or deleterious putative functional variants also failed to show association with these traits. We do report both functional and putative regulatory variants that are both individually, and by functional classification, nominally associated with rMDD and/or cognitive ability at the locus-wide level of significance.

Our study identified a novel association between intronic SNP rs16856199 and rMDD in hospital-referral subjects. Segregation with diagnosis in the relatives of these probands corroborated the association, but further studies are required to understand the lack of replication in population-based cohorts with depression. This may be due to inherent differences between patients recruited from hospital-referral compared with those from population-based cohorts. Cohorts from primary care are more likely to have a family history of depression,75 and may have more physical and psychiatric comorbidity in general. Conversely, the population-based sample may have shorter, less severe episodes76 than the hospital-based cohorts.77 However, given the modestly significant P-value for rMDD in the discovery cohort, the number of psychiatric traits examined and the lack of replication, it is possible that the observed association is due to chance.

The nominal associations of the burden of common (threshold MAF=30.7%) and rare potentially regulatory variants (threshold MAF=0.060%) with measures of cognitive ability merit further study. A yet-to-be-defined subset of these is likely to have critical roles in spatial and temporal regulation of transcription and splicing.46, 78 This highlights the need for annotation tools with improved predictive value for non-coding variants.79

More importantly, our study demonstrates that substantial coding and non-coding genetic variation at the DISC1 locus remains undiscovered. Despite sequencing over 1500 subjects, we have probably captured only 40% of the extant DISC1 variants in just the European population. Crowley et al.23 sequenced 2.7 kb of DISC1 exons and 5’ and 3’ regulatory sequence in 1460 samples of European or African origin. We observed 13/38 (34%) of the variants genotyped in the replication phase, supporting the argument for an abundance of rare variants.

The level of sequence variation identified in our study is unlikely to be exceptional, and indeed is consistent with evidence emerging from other genome sequencing studies.74 Consequently, it will be challenging to demonstrate robust (replicated) association by statistical evidence alone in case–control studies, exceptionally so with the numbers of patients that are currently affordable for sequencing. The original t(1;11) family illustrates the added issue of variable penetrance and cross-boundary diagnosis for a given mutation: 70% of carriers had SZ, BP or rMDD, but 30% had no formal psychiatric diagnosis, yet t(1;11) carriers, including both affected and unaffected, had ERP P300 measures in the range typical of individuals with SZ.5 The original identification of 37W in a case of SZ and here in a case of rMDD (and two offspring, one with rMDD, the other generalised anxiety disorder) may suggest variable penetrance of this biologically functional variant.44 Of note, the 37W variant was not observed in 10 000 control individuals,11 the 1000 Genomes project,61 the NHLBI GO Exome Sequencing Project (ESP), nor any of our 889 control samples. These findings on R37W reinforce the probable importance of this domain for DISC1 subcellular distribution and binding of interacting proteins31, 44 and add to the weight of evidence for other functional studies of DISC1 amino-acid substitutions.41 Each observed amino-acid substitution provides a similar opportunity to tease out the relationships between genotype and phenotype and between structure and function.29, 31 Overall, these results demonstrate a high level of sequence variation in DISC1, a subset of which may contribute to psychiatric disorder in some individuals who will be typically rare in the population precluding classical statistical analysis and requiring biological validation. 
This predicts a population-specific contribution of rare causal variants to risk.80 Our results indicate the potential value of sequencing non-coding regions of the genome, which may harbour disease-associated regulatory variants. Our findings of both functional and putative regulatory variants nominally associated with depression and cognitive ability merit replication in independent samples and biological exploration.