Genome-wide Association Study of Pediatric Obsessive-Compulsive Traits: Shared Genetic Risk between Traits and Disorder

This study examined the genetic correlates of obsessive-compulsive (OC) traits and their shared genetic risk with obsessive-compulsive disorder (OCD). We conducted genome-wide association analyses of OC traits in 5018 unrelated Caucasian children and adolescents. Overall OC traits and trait dimensions (e.g., cleaning/contamination) were measured with the Toronto Obsessive-Compulsive Scale (TOCS). One locus, tagged by rs7856850 in an intron of PTPRD (protein tyrosine phosphatase δ), was associated with OC traits at the genome-wide significance level (p=2.48×10⁻⁸). A variant in GRID2 was significantly associated with the symmetry/ordering dimension only (p=3.2×10⁻⁸). We tested the role of central nervous system (CNS) development and glutamate gene-sets using hypothesis-driven methods. A stratified False Discovery Rate (sFDR) analysis found that OC traits were associated with SNPs in three CNS genes: NPAS2 (p=7.8×10⁻⁷), GRID2 (p=1.6×10⁻⁶) and SH3GL2 (p=1.9×10⁻⁷). Neither the CNS development nor the glutamate gene-set was associated with OC traits in the competitive gene-set test implemented in MAGMA. The PTPRD SNP replicated in a meta-analysis of three independent OCD case/control genome-wide datasets (p=0.0069, cases=3384, controls=8363). Polygenic risk derived from OC traits was significantly associated with OCD in a sample of childhood-onset OCD, and vice versa (p's<0.01). OC traits were highly, but not significantly, genetically correlated with OCD (rg=0.83, p=0.07). We report the first replicated genome-wide significant variant for OC traits. Our results indicate that OC traits in the general population share genetic risk with OCD in independent samples. This study demonstrates the feasibility and power of trait-based approaches in community samples for psychiatric genomics.

Obsessive-compulsive disorder (OCD) is a common (1-2% prevalence [1]) psychiatric disorder characterized by intrusive, recurrent thoughts and repeated, ritualized behaviors. OCD symptoms cluster into several distinct dimensions (e.g., cleaning/contamination [2]). Onset occurs in childhood (before age 18) in 30-50% of cases [3], and childhood-onset OCD is more heritable than adult-onset OCD [4]. Two genome-wide association studies (GWAS) in clinical populations with mixed ages of OCD onset, and a meta-analysis of these studies, did not identify genome-wide significant findings [5][6][7]. Top hits from previous GWAS include SNPs within DLGAP1, BTBD3 and GRID2, and close to PTPRD. These genes have been functionally linked to glutamate neurotransmission and neurodevelopment, which are recurrent themes in the OCD genetics and imaging literature [8]. A previous study of obsessive-compulsive (OC) symptoms in a community-based sample of adult twins identified a genome-wide significant SNP in MEF2B (rs8100480) [9]. However, this SNP did not replicate in an independent sample [6]. OCD symptom dimensions have partly shared and partly distinct genetic risks [10,11]; however, no GWAS of OCD dimensions has been reported to date.
We conducted a GWAS of quantitative OC traits, and secondarily of OC trait dimensions, using the Toronto Obsessive-Compulsive Scale (TOCS [12]) in a large pediatric, community-based sample: Spit for Science [13]. TOCS scores are heritable and factor into six commonly observed OCD symptom dimensions [11]. The scale includes negative scores that represent 'strengths' (e.g., never upset when their belongings are rearranged) and positive scores that represent 'weaknesses' (e.g., very upset when their belongings are rearranged). This strength-to-weakness format generates scores with a more normal distribution than existing OCD scales, which generate J-shaped distributions, especially in community-based samples where the prevalence of OC symptoms is low [14,15]. We tested whether the distribution of TOCS scores affects the power of genetic discovery [16] by collapsing the strengths (negative scores) into zeros, thereby reproducing a J-shaped distribution. We analyzed this collapsed TOCS measure in a secondary GWAS, along with another OCD symptom measure, the Child Behavior Checklist-Obsessive-Compulsive Scale (CBCL-OCS [17]). To further understand the biology of OC traits, we used hypothesis-driven genome-wide approaches to test the role of genetic variants annotated to genes implicated in two leading biological hypotheses for OCD: brain development and glutamate function [18]. Genome-wide significant variants were tested for association with brain expression quantitative trait loci (eQTLs). We calculated the SNP-based heritability of total OC trait scores and, secondarily, of trait dimension scores to estimate the contribution of common genetic variation. We also examined the genetic correlation of total OC trait scores with other medical/mental health disorders and traits.
Finally, we tested the hypothesis that OC traits in the community share genetic risk with OCD by examining individual genetic variants, genetic correlations and polygenic risk between OC traits in Spit for Science and three independent OCD case/control samples, and we examined whether the top hits from the previous GWAS of OC symptoms [9] replicated in our study.

Discovery Participants
The Spit for Science sample is described in detail elsewhere [13]. Briefly, the sample included 15 880 participants with complete demographic, questionnaire and family information (mean age=11.

OC Trait Measure
We measured parent- and self-reported OC traits within the last 6 months using the TOCS, a 21-item questionnaire described previously [11,12]. Each item was scored on a 7-point Likert scale ranging from -3 ('far less often than others of the same age') to +3 ('far more often than others of the same age'), with zero indicating an average amount of time compared to same-age peers. The TOCS total score was standardized into a z-score to account for age, sex and questionnaire respondent (parent or self). Details of z-score creation and the OC dimension scores are described in the supplement. We tested the impact of the strength/weakness structure of the TOCS by comparing it to an OCD symptom measure with a J-shaped distribution, the CBCL-OCS [17]. We also re-scored the TOCS by collapsing all strengths (negative scores) into zero. This 'collapsed' TOCS total score had a distribution similar to that of the CBCL-OCS (see supplement and Supplemental Figure S1 for details).
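The scoring steps above (summing -3 to +3 item scores, optionally collapsing strengths to zero, then standardizing) can be sketched as follows. This is a minimal illustration only: the exact z-score procedure is described in the supplement, so grouping respondents into strata such as (age band, sex, respondent) is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def tocs_total(items, collapse=False):
    """Sum TOCS item scores (each in -3..+3).

    With collapse=True, negative 'strength' scores are set to 0 before summing,
    mimicking the 'collapsed' TOCS score with a J-shaped distribution.
    """
    if any(not -3 <= x <= 3 for x in items):
        raise ValueError("TOCS items must lie in [-3, 3]")
    if collapse:
        items = [max(x, 0) for x in items]
    return sum(items)


def standardize_within_groups(records):
    """Convert raw totals to z-scores within each group.

    records: list of dicts with keys 'total' and 'group'. The grouping key
    (e.g. an age band, sex and respondent tuple) is a hypothetical stand-in
    for the paper's adjustment for age, sex and respondent.
    """
    by_group = {}
    for r in records:
        by_group.setdefault(r["group"], []).append(r["total"])
    # Mean and sample SD per group; each group needs at least two records.
    stats = {g: (mean(v), stdev(v)) for g, v in by_group.items()}
    return [(r["total"] - stats[r["group"]][0]) / stats[r["group"]][1] for r in records]
```

For example, `tocs_total([-3, -1, 0, 2])` returns -2, whereas the collapsed version returns 2, because the two strength scores no longer offset the weakness score.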

Genetic Data
DNA was extracted manually from saliva using standard methods (see the supplement for additional details). We excluded any samples with concentrations <60 ng/µl or with insufficient quality on agarose gels. We genotyped 5645 samples on the Illumina HumanCoreExome-12v1.0_B (HumanCore) and 192 samples on the Illumina HumanOmni1-Quad V1.0_B (Omni) bead chip arrays (Illumina, San Diego, CA, USA) at The Centre for Applied Genomics (Hospital for Sick Children, Toronto, CA). There were 538 448 markers on the HumanCore and 1 140 449 markers on the Omni array.
Quality control (QC) was conducted separately for each array using standard methods with PLINK v1.90 [19]. Sample exclusion and selection criteria are described in the supplemental methods and Supplemental Figure S2. Imputation was performed separately for each platform and sample set with Beagle v4.1 [20,21], using phase 3, version 5 of the 1000 Genomes Project as the reference panel (http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/). We excluded non-Caucasian individuals based on principal component (PC) analysis and included only one participant from each family (inferred siblings or half-siblings; see supplement).

Analyses
GWAS were conducted using R (v3.5.1). Our primary analysis tested whether imputed dosage was associated with the standardized TOCS total score using a linear regression model that included the top three PCs and genotyping array as covariates. We included SNPs with a minor allele frequency (MAF) >1% and an allelic R² imputation quality (AR2) >0.3, and used the standard genome-wide significance threshold of p≤5×10⁻⁸. In the supplement, we describe secondary analyses, including GWAS of non-standardized TOCS total scores, gene-based GWAS, and analyses of the CBCL-OCS, the collapsed TOCS total score and the six TOCS OC dimensions.
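The per-SNP model described above regresses the standardized TOCS score on imputed dosage plus covariates. The analysis itself was run in R; the Python/NumPy sketch below only illustrates the same ordinary least squares test for a single SNP. The function names are hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable for samples in the thousands.

```python
import math

import numpy as np

GENOME_WIDE = 5e-8  # standard genome-wide significance threshold


def passes_filters(maf, ar2):
    """SNP inclusion filters from the text: MAF > 1% and Beagle AR2 > 0.3."""
    return maf > 0.01 and ar2 > 0.3


def snp_linear_test(dosage, y, covariates):
    """OLS of y on [intercept, dosage, covariates]; returns (beta, se, p) for dosage.

    covariates: 2-D array (n, k), e.g. three PCs plus a genotyping-array indicator.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), dosage, covariates])
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
    se = math.sqrt(sigma2 * xtx_inv[1, 1])     # SE of the dosage coefficient
    beta = float(coef[1])
    # Two-sided p-value from a normal approximation to the t statistic.
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p
```

In a full GWAS this test would simply be looped over every SNP that passes `passes_filters`, and results with `p <= GENOME_WIDE` flagged as genome-wide significant.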
Using hypothesis-driven methods, we conducted GWAS that prioritized SNPs within two gene-sets, one for brain/central nervous system (CNS) development and one for glutamate receptors and transporters, selected based on previous literature [8] (see supplement and Supplemental Table S1 for gene-set details).
We used a stratified False Discovery Rate (sFDR [22]) to test the significance of individual SNPs. In separate analyses for each gene-set, all SNPs in the gene-set were assigned to a high-priority group and the remaining SNPs to a low-priority group, and the FDR was controlled separately within each group. We then tested the association of each gene-set collectively using the MAGMA competitive gene-set test [23], with a Bonferroni correction to account for testing two hypotheses (α=0.025).
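The core idea of the stratified FDR, controlling the FDR separately within the high- and low-priority SNP groups, can be sketched as below. This is an illustration under stated assumptions: Benjamini-Hochberg adjustment is used as a stand-in within each stratum (the sFDR method in [22] may estimate the FDR differently), the FDR level of 0.05 is assumed rather than taken from the text, and the function names are hypothetical.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def stratified_fdr(pvals, high_priority, alpha=0.05):
    """Apply BH separately to high- and low-priority SNPs.

    Returns the set of indices declared significant at FDR level alpha.
    """
    significant = set()
    for flag in (True, False):
        idx = [i for i, hp in enumerate(high_priority) if hp == flag]
        if not idx:
            continue
        q = bh_qvalues([pvals[i] for i in idx])
        significant.update(i for i, qi in zip(idx, q) if qi <= alpha)
    return significant


# Bonferroni threshold for the two MAGMA gene-set hypotheses, as in the text.
BONFERRONI_ALPHA = 0.05 / 2
```

Because the multiplicity penalty in each stratum depends only on that stratum's size, a small, biologically prioritized gene-set is not diluted by the millions of remaining genome-wide SNPs.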

Participants
For replication analyses, we investigated three independent OCD case/control cohorts: 1) the

Analyses
GWAS summary statistics from each replication sample were meta-analyzed using fixed-effect inverse-variance methods. We tested whether the results from the gene-based GWAS and hypothesis-driven methods for the TOCS total score replicated in the OCD samples by conducting the same genome-wide analyses as described above or in the supplement. Polygenic risk score (PRS) analyses were performed using LDpred v1.06 ([30]; see supplement). First, we derived PRS for TOCS from the Spit for Science sample and tested their association with case/control status in the combined OCD replication cohorts (target sample: CHOP, Michigan/Toronto and a subset of the IOCDF-GC/OCGAS; see supplement). Second, we derived PRS from the combined OCD replication cohorts and tested their association with the standardized TOCS total score in the Spit for Science sample (target sample).
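Fixed-effect inverse-variance meta-analysis weights each cohort's effect estimate by the inverse of its squared standard error. A minimal Python sketch of this standard calculation follows; the function name is hypothetical, and the example inputs are illustrative log odds ratios, not the study's cohort-level results.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-cohort log odds ratios.

    Returns the pooled beta, its SE, z, two-sided p, the odds ratio and a 95% CI.
    """
    weights = [1 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci95 = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return {"beta": beta, "se": se, "z": z, "p": p, "or": math.exp(beta), "ci95": ci95}


# Illustrative inputs for three hypothetical cohorts (log OR, SE).
print(ivw_fixed_effect([0.08, 0.12, 0.10], [0.06, 0.07, 0.05]))
```

Cohorts with smaller standard errors (typically the larger ones) dominate the pooled estimate, which is why a single large cohort can drive a replication signal.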
We examined potential shared genetic risk between Spit for Science and the meta-analyzed replication samples using genetic correlations estimated with LDSC [27]. Finally, we looked up the top variants from the only previous GWAS of OC symptoms (20 loci reported [9]) in the results of the TOCS total score GWAS.

Discovery
We used 5018 participants for GWAS analyses after sample exclusion and selection (see supplement and Supplemental Figures S2/S3). In the primary analysis, rs7856850 in PTPRD was significantly associated with TOCS total scores at the genome-wide level (p=2.48×10⁻⁸, β=0.14, s.e.=0.025, R²=0.618%; Figure 1a; top hits listed in Supplemental Table S2). Several variants in this region that approached genome-wide significance were in linkage disequilibrium (LD) with rs7856850, which was genotyped on both the HumanCore and Omni arrays (Figure 1b). The genomic inflation factor λ was 1.008, and the LD score regression intercept was 1.003, not significantly different from 1 (s.e.=0.007, p=0.66; Figure 1c). eQTL results for this and all other genome-wide significant SNPs are presented in the supplement. In the secondary analyses, the results for rs7856850 were similar for the non-standardized TOCS total score (see supplement). However, in analyses of the collapsed TOCS total score and the CBCL-OCS, rs7856850 was no longer genome-wide significant, although it retained the same direction of effect (p=0.00045 and p=0.025, respectively; see supplement for details). Only one TOCS OC dimension, symmetry/ordering, yielded a genome-wide significant association (rs5860287 in GRID2, p=3.2×10⁻⁸; β=0.118; s.e.=0.0213; R²=0.610%; Supplemental Figure S4).
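The inflation factor λ reported above compares the median observed association statistic with its expected median under the null hypothesis. A minimal Python sketch of this standard calculation, assuming two-sided p-values from 1-df tests (the function name is hypothetical):

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df (about 0.4549).
_EXPECTED_MEDIAN_CHI2 = _STD_NORMAL.inv_cdf(0.25) ** 2


def genomic_inflation(pvals):
    """lambda_GC = median observed 1-df chi-square / median expected under the null."""
    chi2 = []
    for p in pvals:
        if not 0 < p <= 1:
            raise ValueError("p-values must lie in (0, 1]")
        # Two-sided p -> |z| via the lower-tail quantile at p/2 -> chi-square = z^2.
        chi2.append(_STD_NORMAL.inv_cdf(p / 2) ** 2)
    return median(chi2) / _EXPECTED_MEDIAN_CHI2
```

A value close to 1, as observed here, suggests little genome-wide inflation from population stratification or cryptic relatedness; the LD score regression intercept provides a complementary check that separates such confounding from true polygenic signal.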

Results from the sFDR are shown in Supplemental
showing that these loci are independent.
Within the glutamate gene-set, only SNPs in GRID2 (a gene also annotated to the CNS gene-set) approached significance (Supplemental Figure S7, Table S3). When we compared the CNS development and glutamate gene-sets with all other gene-sets in the genome using the MAGMA competitive gene-set test, neither set was significantly associated with TOCS total scores (p's≥0.33).
The SNP-based heritability of the TOCS total score was h²=0.068 (s.e.=0.052, p=0.19) using GCTA and h²=0.073 (s.e.=0.064, p=0.25) using LDSC with the intercept constrained to 1. None of the OC dimensions was significantly heritable (Supplemental Table S4). The TOCS total score was not significantly genetically correlated with any phenotype on LD Hub (see supplement).

Replication
Following standard QC and sample exclusion where applicable (see supplement), our replication sample comprised a total of 3369 cases and 8611 controls (Table 1). We tested whether the genome-wide significant SNP associated with TOCS total scores in Spit for Science replicated in the meta-analyzed OCD cohorts. rs7856850 was associated with increased odds of being an OCD case (p=0.0069, OR=1.104 per A allele, 95% CI 1.03-1.19; Figure 3, Supplemental Figure S8). A gene-based GWAS by MAGMA using FUMA did not identify any genome-wide significant genes; therefore, none of the genes identified in the gene-based GWAS of OC traits replicated (see supplement). We then tested whether the 56 SNPs from the CNS development gene-set identified in the sFDR analysis were associated with OCD case/control status in the replication meta-analysis. None of these SNPs replicated, even without a Bonferroni correction for multiple testing (see supplement). PRS calculated for TOCS total scores were significantly associated with increased odds of being a case in the meta-analyzed OCD replication samples (Nagelkerke's pseudo-R²=0.277%, p=0.0045 at ρ=0.003; Figure 4a). PRS constructed from the OCD replication sample were significantly associated with TOCS total scores in Spit for Science (R²=0.24%, p=0.00057 at ρ=0.1; Figure 4b).
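The case/control PRS result is summarized with Nagelkerke's pseudo-R², which rescales the Cox-Snell R² of a logistic model so that its maximum is 1. The Python/NumPy sketch below illustrates the idea under simplifying assumptions: the SNP weights are taken as given (in the study they came from LDpred), the null model contains only an intercept (a real analysis would typically also include covariates such as PCs), and the function names are hypothetical.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """PRS as a weighted sum of allele dosages (n_individuals x n_snps) @ weights."""
    return dosages @ weights


def fit_logistic(X, y, max_iter=50):
    """Logistic regression by Newton-Raphson; returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    mu = 1 / (1 + np.exp(-X @ beta))
    loglik = float(np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu)))
    return beta, loglik


def nagelkerke_r2(prs, y):
    """Nagelkerke pseudo-R2 of an intercept + PRS model vs an intercept-only model."""
    n = len(y)
    ones = np.ones((n, 1))
    _, ll0 = fit_logistic(ones, y)
    _, ll1 = fit_logistic(np.column_stack([ones, prs]), y)
    cox_snell = 1 - np.exp(2 * (ll0 - ll1) / n)
    max_cox_snell = 1 - np.exp(2 * ll0 / n)
    return float(cox_snell / max_cox_snell)
```

Values well under 1%, as reported above, are typical for PRS derived from modestly sized discovery samples; the p-value rather than the variance explained carries the evidence for shared genetic risk.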
The genetic correlation between standardized TOCS total scores and the OCD meta-analysis was rg=0.825 (s.e.=0.428, p=0.073) when intercepts were constrained to 1.
One of the top-ranked SNPs from a previous GWAS of OC symptoms [9] was nominally associated with TOCS total scores in the Spit for Science sample with the same direction of effect (rs60588302, p=0.025).
This SNP is in the same region as our top hit (9p24.1), but not in LD with it (r²=0.004, D'=0.517). Another 16 of the top hits reported by den Braber et al. [9], including a variant in MEF2BNB (rs8100480) that was genome-wide significant in their sample, had effects in the same direction but were not significantly associated in the current sample (Supplemental Table S5).
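The combination of a very low r² with a moderate D' can look contradictory. Both measures derive from the same disequilibrium coefficient D, but D' scales D by its maximum possible value given the allele frequencies, whereas r² scales by the product of allele-frequency variances, so r² stays small when the two SNPs have very different allele frequencies. A minimal Python sketch of the standard two-locus formulas; the haplotype and allele frequencies in the example are hypothetical, not the study's values.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r^2, D') from haplotype frequency p_ab and allele frequencies p_a, p_b."""
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, r2, d_prime


# Hypothetical frequencies: a common allele (0.5) and a rare allele (0.05).
d, r2, d_prime = ld_stats(0.04, 0.5, 0.05)
print(f"D={d:.3f}, r2={r2:.3f}, D'={d_prime:.2f}")  # D=0.015, r2=0.019, D'=0.60
```

Because r² governs how well one SNP tags another in association testing, a low r² indicates that the two signals in 9p24.1 are unlikely to reflect the same underlying association.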

Discussion
Using a trait-based approach in a community sample, we identified the first replicated genome-wide significant variant related to OCD (rs7856850). The hypothesis-driven approach showed that genetic variants related to CNS development, particularly in NPAS2, SH3GL2 and GRID2, were associated with OC traits. A variant in GRID2 was also significantly associated with the symmetry/ordering dimension, and four genes were significantly associated with OC traits in the gene-based GWAS; none of these findings replicated. Polygenic risk and genetic correlation findings indicated shared genetic risk between OC traits in the community and OCD case/control status in independent samples.
The genome-wide significant variant associated with OC traits (rs7856850) lies in an intron of the consensus transcript of PTPRD, which codes for protein tyrosine phosphatase δ. No eQTLs have yet been reported for rs7856850 (GTEx V8 [24]). This variant replicated in a meta-analysis of three independent OCD cohorts, making it the first variant associated with both OC traits and OCD. The small size of the replication sample likely precluded finding genome-wide significant hits in the meta-analysis. Previous GWAS of OCD symptoms or diagnosis identified variants in the region around PTPRD that approached significance; however, those variants were independent of the locus found in our study [5,9]. These observations support a possible role of the 9p24.1 region in OCD. The 9p region is also the location of one of the strongest linkage peaks in earlier genome-wide linkage studies of pediatric OCD [31][32][33]. Rare CNVs in PTPRD have been identified in cases with OCD [29], ADHD [34] and brain malformations at birth [35]. SNPs in PTPRD were genome-wide significantly associated with ASD [36], restless legs syndrome [37] and self-reported mood instability [38]. Ptprd-deficient mice show learning deficits and an altered magnitude of long-term potentiation in hippocampal synapses [39]. PTPRD is more highly expressed in the brain than in non-brain tissues, especially in myelinating axons and growth cones [40][41][42] in the prenatal cerebellum [43]. The presynaptically located PTPRD is involved in axon outgrowth and guidance [44,45] and interacts with postsynaptic proteins such as Slitrk-2, interleukin-1 receptor and Trk to mediate synapse adhesion and organization in mice [46][47][48] and the development of excitatory and inhibitory synapses [49]. Members of the Slitrk and interleukin protein families have been associated with OC behaviors in humans and mice [50][51][52].
SNPs in GRID2 were significant in our sFDR analysis of CNS development genes and were associated with the symmetry/ordering dimension. GRID2 codes for the glutamate ionotropic receptor delta 2 subunit (GluD2). GRID2 is highly expressed in the testis and in the brain, particularly in brain regions commonly associated with OCD, including the striatum, anterior cingulate cortex and cerebellum (GTEx V8 [53], https://gtexportal.org/home/). In the largest OCD meta-analysis, one of the top associated variants was in GRID2 (rs1030757, p=1.1×10⁻⁶, OR=1.18 [54]). This variant was also associated with TOCS total scores in the Spit for Science sample (β=0.09, s.e.=0.021, p=8.87×10⁻⁶) and was in the same haplotype block as the significant GRID2 SNPs identified in our study (r²=0.66-0.92), suggesting that these variants may tag the same locus. Neither rs1030757 nor the GRID2 SNP with the lowest p-value in the sFDR analysis (rs5860287) was associated with any brain eQTLs. Rare inherited CNVs in GRID2 have been identified in ASD cases [55], and SNPs in GRID2 have been associated with cognitive deficits in schizophrenia [56]. Glutamate has been strongly implicated in OCD [32,57]. Although we did not identify any significant variants from the glutamate gene list and the glutamate gene-set was not associated with OC traits, many of our significant variants from the CNS development list were linked to glutamate, including variants in the glutamate receptor gene GRID2 and in NPAS2.
Within the CNS development gene-set, we identified two additional significant loci. One locus was in NPAS2, which codes for neuronal PAS (per-arnt-sim) domain protein 2. NPAS2 is a core transcription factor in the molecular clock that maintains circadian rhythms and is expressed exclusively in the brain in the first week of life [58]. Although not annotated to the glutamate gene-set, NPAS2 regulates the uptake of glutamate into astrocytes [59], which maintains appropriate levels of extracellular glutamate [60]. The other locus was in SH3GL2, which codes for Endophilin A1. This gene was also significant in the gene-based GWAS of OC traits. Endophilin A1 is involved in synaptic vesicle endocytosis in presynaptic terminals [61] and is required for dendrite development driven by brain-derived neurotrophic factor (BDNF) in mice [62]. Although none of the NPAS2 or SH3GL2 SNPs replicated in the OCD cohorts, their association with brain eQTLs in LocusFocus [24] provides support for these variants.
Our results show that OC traits in the community share genetic risk with OCD. Polygenic risk for OC traits was associated with OCD case/control status and vice versa. OC traits and OCD case/control status were also substantially, but not significantly, genetically correlated, similar to a recent study (rg=0.42, p=0.095 [63]). Limited statistical power is the most likely explanation for the non-significant genetic correlation. Previous studies of other psychiatric disorders report shared genetic risk between traits and diagnoses, with polygenic risk and genetic correlations similar to what we report for OC traits and OCD case/control status [38,64,65]. The shared genetic risk between OC traits and OCD supports the hypothesis that an OCD diagnosis could represent the high extreme of OC traits that are widely distributed in the general population. One implication of this finding is that population-based samples with quantitative trait measures can serve as a powerful complement to case/control studies to accelerate gene discovery in psychiatric genetics.
SNP-based heritability for OC traits in the current sample was not significant. Previous studies similarly report lower SNP-based heritability for self-reported OC symptoms (0.058 [63]) than for clinical OCD (0.28-0.37 [54,66]). A similar trend toward lower SNP heritability for traits than for diagnoses has been observed for schizophrenia [67,68] and ADHD [69,70]. The reason for this disparity is unclear; one possible explanation is differences in the informant, as shown previously for ADHD [70]. Despite the non-significant SNP heritability of OC traits in our sample, we still identified and replicated a genome-wide significant variant.
OCD is a heterogeneous disorder with several accepted symptom dimensions. OCD dimensions have shared, but distinct, genetic variance in twin studies [10,71,72]. When we conducted a GWAS for each dimension separately, only the symmetry/ordering dimension had a genome-wide significant hit. This variant in GRID2 was also significant in the sFDR analysis of CNS development genes for the TOCS total score. Our results highlight the importance of including phenotypes beyond diagnosis or overall OC traits/symptoms.
The type of OC trait measure we used may have increased our power to identify a genome-wide significant variant. The TOCS is similar to existing OC trait/symptom measures in item content, but unlike existing scales it measures OC traits from 'strengths' to 'weaknesses'. As a result, the distribution of the total score is closer to a normal distribution than the J-shaped distributions typically observed with symptom-based scales that rate behaviors from zero to a positive integer [12]. The TOCS total score was associated with a genome-wide significant variant that was not genome-wide significant when we used two measures with J-shaped distributions (the collapsed TOCS score and the CBCL-OCS). Therefore, trait-based scales that capture both strengths and weaknesses and have a less skewed distribution could improve power, especially in population samples where the prevalence of clinically significant OC symptoms is relatively low.

Conclusions
We identified the first replicated genome-wide significant variant for OC traits and demonstrated the sharing of genetic risk between OC traits and OCD. This supports the hypothesis that OCD represents the extreme end of widely distributed OC traits in the population. Trait-based approaches in community samples, using measures that capture the whole distribution of traits, are a powerful and rapid complement to case/control GWAS designs to help drive genetic discovery in psychiatry.