708 Common and 2010 rare DISC1 locus variants identified in 1542 subjects: analysis for association with psychiatric disorder and cognitive traits

A balanced t(1;11) translocation that transects the Disrupted in schizophrenia 1 (DISC1) gene shows genome-wide significant linkage for schizophrenia and recurrent major depressive disorder (rMDD) in a single large Scottish family, but genome-wide and exome sequencing-based association studies have not supported a role for DISC1 in psychiatric illness. To explore DISC1 in more detail, we sequenced 528 kb of the DISC1 locus in 653 cases and 889 controls. We report 2718 validated single-nucleotide polymorphisms (SNPs) of which 2010 have a minor allele frequency of <1%. Only 38% of these variants are reported in the 1000 Genomes Project European subset. This suggests that many DISC1 SNPs remain undiscovered and are essentially private. Rare coding variants identified exclusively in patients were found in likely functional protein domains. Significant region-wide association was observed between rs16856199 and rMDD (P=0.026, unadjusted P=6.3 × 10−5, OR=3.48). This was not replicated in additional recurrent major depression samples (replication P=0.11). Combined analysis of both the original and replication set supported the original association (P=0.0058, OR=1.46). Evidence for segregation of this variant with disease in families was limited to those of rMDD individuals referred from primary care. Burden analysis for coding and non-coding variants gave nominal associations with diagnosis and measures of mood and cognition. Together, these observations are likely to generalise to other candidate genes for major mental illness and may thus provide guidelines for the design of future studies.

INTRODUCTION
Schizophrenia (SZ), bipolar disorder (BD) and recurrent major depressive disorder (rMDD) are common forms of serious mental illness, each with a strong and overlapping genetic component. [1][2][3] Genome-wide linkage, association, cytogenetic, copy number variant and, more recently, sequencing studies establish that the genetic architecture of psychiatric illness is complex and that there is extensive genetic heterogeneity, which is incompletely defined or understood (reviewed in Sullivan et al. 4 ). We previously reported a t(1;11) translocation in a single large Scottish family that showed genome-wide significant linkage for SZ, rMDD and jointly with BD. 5 The t(1;11) translocation is balanced and structurally simple, but the outcome is genetically complex, disrupting the protein coding gene Disrupted in schizophrenia 1 (DISC1), the antisense non-coding gene Disrupted in schizophrenia 2 (DISC2) and the non-coding gene DISC1FP1, creating a DISC1/DISC1FP1 fusion transcript. [6][7][8][9][10][11] Several small independent studies have reported evidence for association of single DISC1 single-nucleotide polymorphisms (SNPs) (coding and non-coding) or haplotypes with SZ, BD, rMDD and other neuropsychiatric traits, including autism spectrum disorder, cognition, normative cognitive ageing, anxiety and structural and functional brain imaging phenotypes. 9,12,13 Rare amino-acid substitution variants in DISC1 have been reported in cases of SZ, 10,11 BD, 14 rMDD, 13 autism spectrum disorder 15 and agenesis of the corpus callosum, 16 as has an increased burden of rare missense variants in exon 11 of DISC1 for schizoaffective disorder, 17 and for DISC1 pathway genes in SZ. 10 In contrast, a meta-analysis of all known common variants within the DISC locus, from a total of 11 626 cases and 15 237 controls that involved the testing of 1241 SNPs, found no evidence that common variants at the DISC locus are significantly associated with SZ.
18 Moreover, the DISC1 locus has not reached genome-wide significance in large-scale meta-analyses of linkage studies of SZ, 19 nor have its common variants in large-scale genome-wide association studies of SZ, BD or rMDD. [20][21][22] A recent exon-based study that sequenced 2.7 kb of DISC1 in 727 cases of SZ and 733 controls found 32 rare alleles (minor allele frequency (MAF) <0.01) in SZ cases and 40 in European controls with no evidence for a significantly increased burden of likely pathogenic variants. 23 DISC1, however, continues to feature strongly in attempts to assess genome-wide association results in terms of networks 24 or in combination with known biological function. [25][26][27] The biological functions of DISC1 fit well with current aetiological concepts in SZ-related major mental illness and cognition. 28 DISC1 is a scaffold protein that interacts with, and modulates the activity of, multiple proteins with key roles in neurodevelopment, neurogenesis, neuronal migration, integration and signalling [29][30][31] including the antidepressant and antipsychotic targets GSK3β 32 and PDE4. 33 Several common and rare amino-acid polymorphisms of DISC1 have predicted deleterious effects on protein function and demonstrable biological effects in experimental settings. 31 The 704C allele is associated with reduced activity of ERK1 and Akt kinases, altered binding affinities of DISC1 for NDE1 and NDEL1 and variation in DISC1 oligomeric status. [34][35][36][37] The 607F allele results in (a) reduced binding and centrosomal localisation of PCM1, (b) reduced noradrenaline neurotransmitter release in SH-SY5Y cells, 38 (c) altered mitochondrial trafficking 39 and (d) a partial shift from neuronal to glial expression in the brain. 40 Furthermore, Singh et al. 
41 reported that 607F impacts negatively on neural progenitor proliferation in E16 mouse brain, correlates with aberrant wnt signalling in human lymphoblasts, and is associated with a neurodevelopmental phenotype in morpholino mutant zebrafish. R37W lies within an arginine-rich nuclear localisation motif and a partially overlapping interaction domain for PDE4 42 and GSK3β. 41,43 The 37W allele shows reduced nuclear DISC1 expression and altered DISC1 regulation of ATF4, a critical modulator of cAMP signalling and mediator of the stress response. 44 In summary, whereas the primary genetic evidence from the original Scottish t(1;11) family was significant at the genome-wide level for both SZ and rMDD, and the experimental evidence and biological plausibility of DISC1 remain strong, the evidence from subsequent linkage and candidate gene association studies is inconsistent and not supported by genome-wide association studies or meta-analysis. 12 To explore these contrasting findings, we aimed here to establish the nature and frequency of DISC1 genomic sequence variants, identify rare variants in putative functional domains, and test for effects of these on cognitive traits and the risk of psychiatric illness. We comprehensively sequenced 528 kb covering the entire DISC1 locus, including TRAX (also known as TSNAX), for which there is evidence for intergenic splicing with DISC1, 45 and the intergenic region, which contains regulatory elements immediately 5' of DISC1. 13,[46][47][48]

MATERIALS AND METHODS
A full summary of the methods can be found in Supplementary Information. Briefly, all study participants gave signed consent for their data and samples to be used in studies that have been approved by the appropriate Research Ethics Committee or the GS access Committee. Genomic DNA from each individual was whole genome amplified in triplicate, the products pooled and amplified with primer pairs tiled across 528 kb of TRAX/DISC1 (hg18 chr1:229723339-230251606; hg19 chr1:231656716-232184983). For each sample, the pooled products were sheared, converted into paired-end Illumina libraries and sequenced on an Illumina GAII or HiSeq 2000 sequencer to >80% coverage and >30-fold depth. Sequences were aligned to the UCSC hg18 reference sequence, variants called using MAQ software 49 and the variants in repeats removed. Ten percent of all remaining variants were validated using Sanger sequencing chemistry on an ABI3730 sequencer, and the derived information used to optimise the quality control filters. After quality control screening, all exonic and low frequency (MAF <1%) variants were also validated by Sanger chemistry sequencing as above. The variants were functionally annotated using SNPnexus 50 (http://www.snp-nexus.org).
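As an illustration of the per-sample coverage criterion, the following minimal Python sketch assumes that the criterion means at least 80% of target bases are covered at a read depth of 30 or more. The function names, the inclusive comparisons and the input format (one depth value per target base) are hypothetical and are not taken from the study pipeline.

```python
def fraction_covered(depths, min_depth=30):
    """Fraction of target bases whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("depths must contain at least one position")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def passes_coverage_qc(depths, min_depth=30, min_fraction=0.80):
    """True if the sample meets the assumed coverage criterion."""
    return fraction_covered(depths, min_depth) >= min_fraction


if __name__ == "__main__":
    # Hypothetical per-base depths for a tiny 10-base target.
    sample = [30] * 9 + [5]
    print(fraction_covered(sample), passes_coverage_qc(sample))
```

In practice the per-base depths would come from the alignment of each sample, and the exact thresholds used for filtering are described in the Supplementary Information.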
Non-coding variants were annotated using the UCSC table browser for the following tracks: 'RepeatMasker', 51 'CpG island', 'TFBS conserved', '7x Reg Potential' (which substantially overlaps with DNAse hypersensitivity sites) and/or '28-Way Most conserved-PlacMammal' (http://genome.ucsc.edu/). Sequence variants classified as coding were mapped to the DISC1 L isoform and potential pathogenicity ascribed using Pmut, 52 Panther 53 and PolyPhen-2. 54 The coding sequence variants were mapped onto a list of known curated DISC1-interactor binding sites 31 and with other functional elements (for example, phosphorylation sites 55 ). Case-control association was tested on the combined case samples as well as individually for SZ, BP and rMDD using Fisher's exact test. Permutation was used to derive region-wide P-values and significance thresholds. Quantitative trait association analyses using LBC1936 samples were performed by linear regression of the trait residuals (adjusted for age and sex) on the number of minor alleles at each SNP, with empirical P-values estimated by permutation to avoid issues with the test statistic distribution caused by the combination of rare variants and slight deviations from normality in the phenotypes. All association analyses were performed using PLINK. 56 Mark-recapture analysis followed the Lincoln-Petersen and Modified Petersen methods 57,58 with 95% confidence intervals calculated following Chapman. 59 Burden analysis was performed in PLINK/SEQ to implement BURDEN and VTTEST with empirical P-values estimated using permutation. Genotyping of the replication and familial samples was performed by the Edinburgh Wellcome Trust Clinical Research Facility Genetics Core using TaqMan SNP genotyping assay C__33950433_10 with concurrent genotyping of known heterozygotes.
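The case-control testing described above can be sketched in a few lines of Python. This is a minimal illustration, not the PLINK implementation used in the study: it assumes an allelic 2x2 table per SNP, a two-sided Fisher's exact test, and a region-wide empirical P-value obtained by comparing each observed P-value with the minimum P-value across all SNPs in each label permutation. All function names and the toy genotypes are hypothetical.

```python
import random
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def allelic_p(genotypes, is_case):
    """Fisher P-value on minor/major allele counts in cases versus controls."""
    case_minor = sum(g for g, c in zip(genotypes, is_case) if c)
    ctrl_minor = sum(g for g, c in zip(genotypes, is_case) if not c)
    n_case = sum(1 for c in is_case if c)
    n_ctrl = len(is_case) - n_case
    return fisher_exact_two_sided(case_minor, 2 * n_case - case_minor,
                                  ctrl_minor, 2 * n_ctrl - ctrl_minor)


def region_wide_p(snps, is_case, n_perm=1000, seed=0):
    """Empirical region-wide P-values from the permutation distribution of min P."""
    rng = random.Random(seed)
    observed = [allelic_p(g, is_case) for g in snps]
    labels = list(is_case)
    exceed = [0] * len(snps)
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the genotype-phenotype link
        min_p = min(allelic_p(g, labels) for g in snps)
        for i, p in enumerate(observed):
            if min_p <= p * (1 + 1e-9):
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]


if __name__ == "__main__":
    # Toy data: minor-allele counts (0/1/2) for 8 individuals at 2 SNPs.
    snps = [[1, 1, 1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 1, 0, 0, 0]]
    is_case = [True] * 4 + [False] * 4
    print(region_wide_p(snps, is_case, n_perm=200))
```

Because each permutation keeps the genotypes of all SNPs together, the minimum-P distribution preserves the correlation between SNPs, which is the reason permutation is preferred here over a Bonferroni correction.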

Data access
The accession numbers for sequence data are NCBI ss472328925-ss472331023.

Sequence analysis
We sequenced 1542 Caucasians from Scotland comprising 240 cases of SZ, 221 cases of BD, 192 cases of rMDD and, as controls, 889 members of the Lothian Birth Cohort 1936 (LBC1936), who have been extensively phenotyped. 60 Each sample was sequenced to >80% coverage at a minimum of 30-fold read depth by long-range PCR and sequencing on either Illumina GAII or HiSeq 2000 sequencers. To ensure a robust data set, all variants within repetitive regions were removed. Final quality score thresholds for the data were derived from capillary sequence validation of 10% of the remaining variants. All variants with an MAF <1% were validated by ABI3730 sequencing. After quality control, there was no evidence for sequencing bias between cases and controls (Supplementary Figure S1). Allele frequencies from our sample showed strong concordance to those from the European subset of the 1000 Genomes Project 61 (Supplementary Figure S2). We report 2718 SNPs in the 1542 samples analysed, 708 at ≥1% and 2010 at <1% MAF (Supplementary Table S1). Only 1027 of the 2718 SNPs (38%) were previously reported in the European subset of the 1000 Genomes Project. 61 As defined and annotated by the UCSC genome browser (http://genome.ucsc.edu), 489 SNPs mapped to regions of regulatory potential, 177 to non-coding exons (including DISC2) and 36 to coding regions of exons. Of these 36 variants, 12 were synonymous changes, 23 were non-synonymous changes, with one producing a stop codon consistent with the DISC1 Es isoform. Supplementary Table S3 summarises the overlap between variants identified in this study and other DISC1 sequencing studies and relevant association studies. 
10,11,13,14,16,17,23,46
Association and segregation of common variants with psychiatric illness and related quantitative traits
Genome-wide association studies of SZ, BP and rMDD are most consistent with a polygenic liability for common variants, but they also imply that there is real 'missing' genetic variation, which is most likely due to risk variants having low frequency in the population. To test for evidence of DISC1 association, we applied Fisher's exact test across all variants and all diagnoses (Figure 2). There was no evidence for SNP association at genome-wide levels of significance for any diagnosis when considered separately or combined, nor was there evidence for locus-wide association of variants with SZ or BP. We did detect a novel, locus-wide empirical association P = 0.026 (OR = 3.48, 95% CI = 1.95-6.23, unadjusted P = 6.3 × 10−5) between intronic variant rs16856199 and rMDD. We speculated that individual risk alleles might be predicted to segregate with disease in families. Twelve additional family members were available for genotyping for four rs16856199 carriers. The rs16856199 risk allele segregated with rMDD in all four families (Supplementary Figure S4). We next tested for association of rs16856199 with depression in three additional sample sets: a group of individuals referred from primary care to a hospital outpatient clinic (n = 467 rMDD patients), and two population-based samples drawn from primary care as part of the Generation Scotland: Scottish Family Health Study consisting of 645 cases with rMDD and 690 cases with single episode MDD. All three groups were compared with 4017 controls drawn exclusively from Generation Scotland: Scottish Family Health Study (Supplementary additional text and Table S4). No significant association was seen with any individual replication set or all three combined (best P = 0.088). 
Analysis of all three rMDD sets, both the original set and the two rMDD replication sets, was supportive of association (1112 rMDD, 4017 controls; P = 0.0058, OR = 1.46, 95% CI = 1.12-1.91). Combined analysis of both sets of individuals referred from primary care showed stronger nominal association (P = 0.00065, OR = 1.76, 95% CI = 1.27-2.44). No association was seen in the combined analysis of both the rMDD and MDD population-based replication sets (P = 0.41). The risk allele for rs16856199 did not segregate with rMDD in 10 families of carriers identified from the Generation Scotland replication sample (Supplementary Figure S5). This suggests that there is increased evidence for association of rs16856199 in the more severely affected individuals. SNP rs16856199 is on the Affymetrix 6.0 array, but the best tagging SNP on the Illumina 660W-Quad, Human Hap, Human1M-Duo arrays is rs16856189. SNP rs16856189 has an r² of 0.27 with rs16856199, which may explain in part why this association has not been reported previously in genome-wide association studies. 22,62,63 SNP rs6678723, which lies 2.1 kb distal to this SNP within intron 11, showed the most significant association of DISC1 in the recent mega-analysis of depression (P = 0.0092). 20 The LBC1936 has quantitative measures of symptoms of anxiety, depression and the personality trait of neuroticism, plus psychometrically tested measures of cognitive ability (fluid (age sensitive) and crystallised (non-age sensitive)) and cognitive ageing, 60,64 which have been shown to be highly heritable and polygenic. 65,66 Association of these traits with DISC1 was tested by linear regression analyses, co-varied for age at testing and sex (Supplementary Figures S6-S8). There were no region-wide significant findings for any of these quantitative traits.
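For readers less familiar with how an odds ratio and its confidence interval are derived from a 2x2 allele-count table, the following Python sketch uses the standard log odds ratio (Woolf) approximation. It is an assumption that this matches the interval method used for the values reported above, and the counts in the example are hypothetical, not study data.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for the table [[a, b], [c, d]].

    a, b: minor and major allele counts in cases
    c, d: minor and major allele counts in controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    print(odds_ratio_ci(10, 20, 5, 40))
```

The interval is symmetric on the log scale, so the product of its limits equals the square of the odds ratio; with sparse cells (as is typical for rare variants) an exact interval would usually be preferable.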
Estimating the net pool of DISC1 variants
To estimate the effective pool of common and rare sequence variants in the European population, we applied a 'mark-recapture' approach (see Supplementary methods) to our data and that of the 1000 Genomes Project (v3.20101123) 67 after appropriate checks on read depth and Sanger sequence validation (Supplementary Table S5; Supplementary Figure 9). The total number of DISC1 SNPs ≥1% MAF was estimated at 905 (95% CI = 905 ± 5), of which 901 (99.5%) are known (Supplementary Table S6). The number of rare variants (<1% MAF) is less confidently predicted, but is likely to be substantially higher (95% CI = 3777 ± 252) (Supplementary Table S6). Thus, despite the ~2500 European genomes in which the DISC1 locus has been completely sequenced and the 2305 rare DISC1 variants now known, ~40% or more remain to be discovered, and will be essentially 'private'.  Tables S2 and S3).
Five variants, R37W, T453M, T603M, L607F and S704C, are predicted to be deleterious by all three applied prediction algorithms (PolyPhen-2, Pmut and Panther; Supplementary  Table S2). From this set, the functional effects of the common variants L607F (rs6675281) and S704C (rs821616) have been well documented in the literature, as mentioned earlier. R37W lies within a defined nuclear localisation signal 71 and PDE4B binding site 72 and is seen in a single case of rMDD (discussed at the end of this section). T453M is present at low frequency in cases and control individuals, both in this study and others (Supplementary  Table S3). T603M was only identified in a single control, but Song et al. 11 reported a T603I variant in a schizophrenic individual that was absent in their set of control individuals (Supplementary Table S3).
Five variants were only observed in affected individuals from our study (R37W, A83V, W160L, R233K and R418H), but not in 889 control individuals. Four of these non-synonymous case-only singletons are located in the largest coding exon, DISC1 exon 2, and the remaining variant is in DISC1 exon 4 (Figure 1). A83V was seen in a single individual with BD; this variant is predicted to be deleterious by PolyPhen-2 and Pmut and has been shown to affect wnt signalling. 41 It was however observed at low frequency in controls and in individuals with partial agenesis of the corpus callosum in previous studies (Supplementary Table S3). 11,14,16 Apart from R37W and A83V, none of the three remaining non-synonymous variants, W160L, R233K and R418H, are consistently predicted by the three prediction algorithms to have functional effects (Supplementary Table S2). 160L and 418H were detected in single SCZ individuals and have also previously been reported in individuals with SCZ; but 160L has also been detected in control individuals. 10 The variant 233K has not been previously reported, and was identified in an individual with rMDD. No non-synonymous variants in TRAX were found in cases only. A single stop mutation was identified in a control individual and produces an alternative stop site for the DISC1 Es isoform. These variants can now be tested for potential impact on DISC1 biophysical properties, protein interaction and biological function. [29][30][31]
Figure 2. Region-wide association analysis for schizophrenia, bipolar and recurrent major depressive disorder. Nominal P-values for Fisher's exact tests are plotted against genomic location (hg18) across the TRAX/DISC1 (Disrupted in schizophrenia 1) locus. Reference lines represent 1% (dashed) and 5% (solid) region-wide empirical thresholds. Only the association of rs16856199 and recurrent major depressive disorder remains significant at the 5% threshold (arrow).
Evidence for familial segregation was sought for five rare exonic variants where additional family members were available, but none segregated perfectly or unequivocally with diagnosis (Supplementary Figure 9). Of note however was the identification of the non-synonymous amino-acid variant R37W (rs137948488), first reported 68 in a subject with SZ, and seen here in a single case of rMDD. R37 is strictly conserved among orthologues and recent publications, including our own, have demonstrated biological effects of 37W on DISC1 interactions, 32,73 and shown a dominant-negative effect on the sub-cellular distribution of DISC1. 44 Five additional family members of the 37W carrier were available for genotyping, diagnosed with rMDD, generalised anxiety disorder, bipolar II or no psychiatric diagnosis at the time of assessment. The R37W mutation was present in relatives with rMDD and generalised anxiety disorder, but not in a relative with bipolar II, or any unaffected individual (Figure 3).

Burden analysis for putative functional variants
To explore the burden of SNPs of potential functional significance, all variants with MAF <1% were first validated by ABI3730 sequencing. There was no significant overall difference in the number of singleton variants (Supplementary Figure S10) or in the overall number of minor alleles by diagnosis (see Supplementary Methods). SNPs were classified on the basis of bioinformatic annotation into seven functional classes: those in exons including untranslated exons, coding sequence, non-synonymous coding SNPs, conserved regions, regions with regulatory potential, conserved transcription factor binding sites and CpG islands (see Materials and methods and Supplementary Table S7). The empirical P-values for the burden analysis were obtained by permutation correcting for the multiple thresholds tested, but not for the multiple functional subgroups or diagnostic classes (SZ, BP and rMDD and all cases combined); therefore, all results are reported as nominal significance values (Supplementary Table S8; Supplementary Methods). Details of nominally significant results are given in Table 1. For rMDD only, there was a nominally significant (P = 0.044) excess of minor alleles for SNPs with regulatory potential across all frequencies, and for rare SNPs in conserved regions with MAF ≤0.18% (P = 0.022). Nominally significant association was found in the LBC1936 data between the burden of minor alleles across all frequencies for SNPs in conserved transcription factor binding sites and increased symptoms of depression, a measure of depressed mood at the time of testing (Table 1). In addition, a nominally significant increase in burden of minor alleles for SNPs in CpG islands or coding SNPs was observed with Moray House Test scores, measures of cognitive ability (Supplementary Table S9).
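The idea of a burden test whose permutation P-value corrects for multiple MAF thresholds can be sketched as follows. This is a simplified variable-threshold illustration written for this purpose, not the PLINK/SEQ BURDEN or VTTEST implementation: for each candidate threshold it sums minor alleles per individual over variants at or below that frequency, standardises the case mean against the whole sample, and takes the maximum over thresholds as the test statistic. All names and the toy data are hypothetical.

```python
import random
from statistics import mean, pstdev


def burden_scores(genotypes, mafs, threshold):
    """Per-individual minor-allele count over variants with MAF <= threshold."""
    keep = [j for j, f in enumerate(mafs) if f <= threshold]
    return [sum(row[j] for j in keep) for row in genotypes]


def vt_statistic(score_sets, is_case):
    """Maximum over thresholds of the standardised excess burden in cases."""
    best = float("-inf")
    for scores in score_sets:
        sd = pstdev(scores)
        if sd == 0:
            continue  # threshold selects no informative variants
        case_mean = mean(s for s, c in zip(scores, is_case) if c)
        best = max(best, (case_mean - mean(scores)) / sd)
    return best


def vt_permutation_p(genotypes, mafs, is_case, thresholds, n_perm=1000, seed=0):
    """One-sided empirical P-value, corrected for the thresholds tested."""
    rng = random.Random(seed)
    score_sets = [burden_scores(genotypes, mafs, t) for t in thresholds]
    observed = vt_statistic(score_sets, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case/control status
        if vt_statistic(score_sets, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: rows are individuals, columns are variants (minor-allele counts).
    genotypes = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 1]]
    mafs = [0.005, 0.02, 0.2]
    is_case = [True, True, True, False, False, False]
    print(vt_permutation_p(genotypes, mafs, is_case, [0.01, 0.05, 0.5], n_perm=500))
```

Because the maximum over thresholds is recomputed in every permutation, the resulting P-value accounts for having scanned several thresholds, but, as stated in the text, not for the number of functional classes or diagnostic groups tested.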

DISCUSSION
Diagnoses of SZ, BP and rMDD were all present in the original Scottish family carrying the translocation that disrupts DISC1. All three DSM-IV diagnoses have a strong and overlapping genetic component, but robust statistical analysis of gene-level contributions to risk is complicated by extensive genetic heterogeneity within and between diagnoses. 4 We have provided the most comprehensive landscape of genetic variation at the DISC1 locus to date in patients with this spectrum of psychiatric illness and in healthy population controls with quantitative measures of mood and cognition. Comparison between our sequencing study and that of the 1000 Genomes Project confirms that current genome-wide association studies effectively capture the majority of common (but not rare) variants in the European population. Our sample size is large by current sequencing study standards, but we lack power to detect genome-wide significant P-values for either common or rare variants (see Supplementary Information and Supplementary Figure S11 for further details and also Kiezun et al.). 74 Indeed, the predicted abundance of independent rare variants at this (and any other given) locus makes it highly improbable that any one will contribute to illness in the population at a frequency that will be statistically significant, given the numbers of patients we can afford to analyse by direct sequencing. 74 We observed no evidence for association at the whole-genome level of statistical significance between individual rare or common variants and either psychiatric illness or cognition. This is consistent with recent findings, 74 which suggest that much larger samples would be required to detect such associations. Burden analysis of multiple rare and/or deleterious putative functional variants also failed to show association with these traits. 
We do report both functional and putative regulatory variants that are both individually, and by functional classification, nominally associated with rMDD and/or cognitive ability at the locus-wide level of significance.
Our study identified a novel association between intronic SNP rs16856199 and rMDD in hospital-referral subjects. Segregation with diagnosis in the relatives of these probands corroborated the association, but further studies are required to understand the lack of replication in population-based cohorts with depression. This may be due to inherent differences between patients recruited from hospital-referral compared with those from population-based cohorts. Cohorts from primary care are more likely to have a family history of depression, 75 and may have more physical and psychiatric comorbidity in general. Conversely, the population-based sample may have shorter, less severe episodes 76 than the hospital-based cohorts. 77 However, given the modestly significant P-value for rMDD in the discovery cohort, the number of psychiatric traits examined and the lack of replication, it is possible that the observed association is due to chance.
The nominal associations of the burden of common (threshold MAF = 30.7%) and rare potentially regulatory variants (threshold MAF = 0.060%) to measures of cognitive ability merit further study. A yet-to-be-defined subset of these is likely to have critical roles in spatial and temporal regulation of transcription and splicing. 46,78 This highlights the need for annotation tools with improved predictive value for non-coding variants. 79 More importantly, our study demonstrates that substantial coding and non-coding genetic variation at the DISC1 locus remains undiscovered. Despite sequencing over 1500 subjects, we have probably captured only ~40% of the extant DISC1 variants in just the European population. Crowley et al. 23 sequenced 2.7 kb of DISC1 exons and 5' and 3' regulatory sequence in 1460 samples of European or African origin. We observed 13/38 (34%) of the variants genotyped in the replication phase, supporting the argument for an abundance of rare variants.
Figure 3. Segregation of the R37W polymorphism with psychiatric diagnoses in a small Scottish family. The proband of the family is indicated (arrow). The codon containing the T allele encodes the amino acid tryptophan (W) and the codon containing the A allele encodes arginine (R). rMDD, recurrent major depressive disorder; BP2, bipolar II.
The level of sequence variation identified in our study is unlikely to be exceptional, and indeed is consistent with evidence emerging from other genome sequencing studies. 74 Consequently, it will be challenging to demonstrate robust (replicated) association by statistical evidence alone in case-control studies, exceptionally so with the numbers of patients that are currently affordable for sequencing. The original t(1;11) family illustrates the added issue of variable penetrance and cross-boundary diagnosis for a given mutation: ~70% of carriers had SZ, BP or rMDD, but ~30% had no formal psychiatric diagnosis, yet t(1;11) carriers, including both affected and unaffected, had ERP P300 measures in the range typical of individuals with SZ. 5 The original identification of 37W in a case of SZ and here in a case of rMDD (and two offspring, one with rMDD, the other generalised anxiety disorder) may suggest variable penetrance of this biologically functional variant. 44 Of note, the 37W variant was not observed in 10 000 control individuals, 11 the 1000 Genomes Project, 61 the NHLBI GO Exome Sequencing Project (ESP), nor any of our 889 control samples. These findings on R37W reinforce the probable importance of this domain for DISC1 subcellular distribution and binding of interacting proteins 31,44 and add to the weight of evidence for other functional studies of DISC1 amino-acid substitutions. 41 Each observed amino-acid substitution provides a similar opportunity to tease out the relationships between genotype and phenotype and between structure and function. 29,31 Overall, these results demonstrate a high level of sequence variation in DISC1, a subset of which may contribute to psychiatric disorder in some individuals who will be typically rare in the population, precluding classical statistical analysis and requiring biological validation. This predicts a population-specific contribution of rare causal variants to risk. 
80 Our results indicate the potential value of sequencing non-coding regions of the genome, which may harbour disease-associated regulatory variants. Our findings of both functional and putative regulatory variants nominally associated with depression and cognitive ability merit replication in independent samples and biological exploration.

CONFLICT OF INTEREST
WRM has participated in Illumina sponsored meetings over the past four years and received travel reimbursement and an honorarium for presenting at these events.