Integrating evolutionary and regulatory information with a multispecies approach implicates genes and pathways in obsessive-compulsive disorder

Noh, Hyun Ji; Tang, Ruqi; Flannick, Jason; O’Dushlaine, Colm; Swofford, Ross; Howrigan, Daniel; Genereux, Diane P.; Johnson, Jeremy; van Grootheest, Gerard; Grünblatt, Edna; Andersson, Erik; Djurfeldt, Diana R.; Patel, Paresh D.; Koltookian, Michele; Hultman, Christina M.; Pato, Michele T.; Pato, Carlos N.; Rasmussen, Steven A.; Jenike, Michael A.; Hanna, Gregory L.; Stewart, S. Evelyn; Knowles, James A.; Ruhrmann, Stephan; Grabe, Hans-Jörgen; Wagner, Michael; Rück, Christian; Mathews, Carol A.; Walitza, Susanne; Cath, Daniëlle C.; Feng, Guoping; Karlsson, Elinor K.; Lindblad-Toh, Kerstin

doi:10.1038/s41467-017-00831-x


Article
Open access
Published: 17 October 2017


Nature Communications volume 8, Article number: 774 (2017)


Abstract

Obsessive-compulsive disorder is a severe psychiatric disorder linked to abnormalities in glutamate signaling and the cortico-striatal circuit. We sequenced coding and regulatory elements for 608 genes potentially involved in obsessive-compulsive disorder in human, dog, and mouse. Using a new method that prioritizes likely functional variants, we compared 592 cases to 560 controls and found four strongly associated genes, validated in a larger cohort. NRXN1 and HTR2A are enriched for coding variants altering postsynaptic protein-binding domains. CTTNBP2 (synapse maintenance) and REEP3 (vesicle trafficking) are enriched for regulatory variants, of which at least six (35%) alter transcription factor-DNA binding in neuroblastoma cells. NRXN1 achieves genome-wide significance (p = 6.37 × 10⁻¹¹) when we include 33,370 population-matched controls. Our findings suggest synaptic adhesion as a key component in compulsive behaviors, and show that targeted sequencing plus functional annotation can identify potentially causative variants, even when genomic data are limited.


Introduction

Obsessive-compulsive disorder (OCD) is a highly heritable (h² = 0.27–0.65)¹, debilitating neuropsychiatric disorder characterized by intrusive thoughts and time-consuming repetitive behaviors. Over 80 million people worldwide are estimated to suffer from OCD, and most do not find relief with available therapeutics¹, underscoring the urgency to better understand the underlying biology. Genome-wide association studies (GWAS) implicate glutamate signaling and synaptic proteins^{2, 3}, but specific genes and variants have not been validated. Isolating and characterizing such genes is important for understanding the biology of this devastating disease and for developing treatments.

In mouse, genetically engineered lines have causally implicated the cortico-striatal neural pathway in compulsive behavior. Mice with a deletion of Sapap3 exhibit self-mutilating compulsive grooming and dysfunctional cortico-striatal synaptic transmission, with abnormally high activity of medium spiny neurons (MSNs) in the striatum. The resulting compulsive grooming is ameliorated by a selective serotonin reuptake inhibitor (SSRI), a first-line medication for OCD⁴. Similarly, chronic optogenetic stimulation of the cortico-striatal pathway in normal mice leads to compulsive grooming accompanied by sustained increases in MSN activity⁵. Thus, excessive striatal activity, likely due to diminished inhibitory drive in MSN microcircuitry, is a key component of compulsive grooming. The brain region disrupted in these mouse models is also implicated by imaging studies in human OCD⁶.

Pet dogs are a natural model for OCD amenable to genome-wide mapping due to their unique population structure⁷. Canine compulsive disorder (canine CD) closely parallels OCD, with equivalent clinical metrics, including compulsive extensions of normal behaviors, typical onset at early social maturity, roughly a 50% rate of response to SSRIs, high heritability, and polygenic architecture⁸. Through GWAS and targeted sequencing in dog breeds with exceptionally high rates of canine CD, we associated genes involved in synaptic functioning and adhesion with CD, including neural cadherin (CDH2), catenin alpha2 (CTNNA2), ataxin-1 (ATXN1), and plasma glutamate carboxypeptidase (PGCP)^{8, 9}.

Human genetic studies of related disorders, such as autism spectrum disorders (ASD), suggest additional genes. Both ASD and OCD are characterized by repetitive behaviors, and high comorbidity suggests a shared genetic basis⁶. Genome-wide studies searching for de novo and inherited risk variants have confidently associated hundreds of genes with ASD; this set may be enriched for genes involved in OCD¹⁰.

Focusing on genes implicated by model organisms and related disorders could find variants underlying OCD risk, even with smaller sample sizes. Researchers, particularly in psychiatric genetics, are wary of “candidate gene” approaches, which have often failed to replicate¹¹. Closer examination of past studies suggests that this approach is powerful and reliable when the set of genes tested is large and the association is driven by rare variation¹¹. A study testing 2,000 candidate genes for association with diabetic retinopathy identified 25 genes, at least 11 of which achieved genome-wide significance in a GWAS of type 2 diabetes, a related disorder^{12, 13}. A targeted-sequencing study of ASD, with 78 genes, identified four genes with recurrent, rare deleterious mutations; these four genes are also implicated by whole-exome sequencing studies¹⁴. Candidate gene studies also replicated associations to rare variants in APP, PSEN1, and PSEN2 for Alzheimer’s disease¹⁵, PCSK9 for low-density lipoprotein–cholesterol level¹⁶, and copy-number variants for autism and schizophrenia¹⁰.

Detecting associations driven by rare variants requires sequencing data, which capture nearly all variants. Although whole-genome sequencing studies of complex diseases are still prohibitively expensive, it is feasible to target a subset of the genome. Sequencing also facilitates identification of causal variants, accelerating discovery of new therapeutic avenues^{17, 18}. For example, finding functional, rare variants in PCSK9 led to new therapies for hypercholesterolemia¹⁹. One approach is to target predominantly coding regions (whole-exome sequencing). Although successful in finding causal variants for rare diseases²⁰, this approach misses the majority of disease-associated variants, which are predicted to be regulatory²¹. A targeted-sequencing approach that captures both the regulatory and coding variation of a large set of candidate genes offers many of the advantages of whole-genome sequencing and is feasible when cohort size and resources are limited.

Here we report a new strategy that overcomes limitations of less comprehensive candidate gene studies and exome-only approaches, and identifies functional variants associated with increased risk of OCD. We start by compiling a large set of 608 genes (~3% of human genes) using studies of compulsive behavior in dogs and mice, and studies of ASD and OCD in humans. By focusing on this subset of genes, targeting both coding and regulatory regions, and applying a new statistical method that incorporates regulatory and evolutionary information, we identify four associated genes, including NRXN1, the first genome-wide-significant association reported for OCD.

Results

Targeted-sequencing design

We compiled a list of 608 genes using three strategies (65 were implicated more than once) (Supplementary Table 1 and Supplementary Methods):

(1) 263 “model-organism genes”, including 56 genes associated in canine CD GWAS and 222 genes implicated in murine compulsive grooming.
(2) 196 “ASD genes” from the SFARI database (https://gene.sfari.org/) as of 2009.
(3) 216 “human candidate genes” from small-scale OCD candidate gene studies (56 genes), family-based linkage studies of OCD (91 genes), and studies of other neuropsychiatric disorders (69 genes).

We targeted coding regions and 82,723 evolutionarily constrained elements in and around these genes, totaling 13.2 Mb (element size 58 bp–16 kb, median 237 bp), of which 34% is noncoding²².

Variant detection

We sequenced 592 European ancestry DSM-IV OCD cases and 560 ancestry-matched controls using pooled sequencing, with 16 samples per bar-coded pool (37 “case” pools; 35 “control” pools). Overall, 95% of target regions were sequenced at >30× read depth per pool (median 112×; ~7× per individual; Supplementary Fig. 1), sufficient to identify variants occurring in just one individual, assuming 0.5–1% per base machine error rate.
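To make the pooling arithmetic concrete, the sketch below computes the per-individual depth and the allele fraction contributed by one heterozygous carrier in a 16-sample pool (1 of 32 chromosomes), then compares binomial probabilities of observing a few non-reference reads from a true singleton versus from machine error alone. It is an illustration under stated simplifications (every error is assumed to produce the same alternative base, errors on true alternative reads are ignored, and the `min_alt_reads` threshold of 3 is hypothetical); it is not the calling model used by Syzygy or SNVer.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


SAMPLES_PER_POOL = 16
CHROMOSOMES_PER_POOL = 2 * SAMPLES_PER_POOL      # diploid samples
MEDIAN_POOL_DEPTH = 112                          # median reads per site per pool

# One heterozygous carrier contributes 1 of the 32 chromosomes in a pool.
singleton_fraction = 1 / CHROMOSOMES_PER_POOL
per_individual_depth = MEDIAN_POOL_DEPTH / SAMPLES_PER_POOL


def singleton_vs_error(depth: int, min_alt_reads: int, error_rate: float):
    """Probability of >= min_alt_reads non-reference reads for a true singleton
    versus a site whose non-reference reads all come from machine error.
    Simplification: every error is assumed to yield the same alternative base."""
    signal = binom_sf(min_alt_reads, depth, singleton_fraction)
    noise = binom_sf(min_alt_reads, depth, error_rate)
    return signal, noise


for depth in (30, 112):
    for error_rate in (0.005, 0.01):
        signal, noise = singleton_vs_error(depth, min_alt_reads=3, error_rate=error_rate)
        print(f"{depth:>3}x, error {error_rate}: singleton {signal:.3f}, error only {noise:.4f}")
```

Comparing the two depths shows how strongly singleton detection in pools depends on read depth relative to the error rate.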

We called 124,541 single nucleotide polymorphisms (SNPs) using Syzygy (84,216)¹⁷ and SNVer (81,829)²³. For primary analyses, we focused on 41,504 “high-confidence” SNPs detected by both callers, whose allele frequency (AF) estimates were highly correlated between the two (Pearson’s ρ = 0.999, p < 2.2 × 10⁻¹⁶; Supplementary Fig. 2). We see no significant difference in variant detection between case and control pools, indicating no detection bias.
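The high-confidence set is the intersection of the two callers' output, and the reported counts agree with this: by inclusion–exclusion, 84,216 + 81,829 − 124,541 = 41,504 SNPs were called by both tools. A minimal sketch, keying each call by an assumed (chromosome, position, alternative allele) tuple; the toy coordinates are hypothetical.

```python
# Each call is keyed by (chromosome, position, alternative allele).
Call = tuple[str, int, str]


def combine_call_sets(calls_a: set[Call], calls_b: set[Call]):
    """Return (union, high_confidence); high-confidence calls are made by both callers."""
    return calls_a | calls_b, calls_a & calls_b


# Toy call sets with hypothetical coordinates.
syzygy_calls = {("chr2", 50_000_100, "T"), ("chr2", 50_000_200, "G")}
snver_calls = {("chr2", 50_000_200, "G"), ("chr6", 47_000_000, "A")}
union, high_confidence = combine_call_sets(syzygy_calls, snver_calls)
print(len(union), len(high_confidence))  # 3 1

# Inclusion-exclusion with the counts reported in the text.
n_syzygy, n_snver, n_union = 84_216, 81_829, 124_541
print(n_syzygy + n_snver - n_union)  # 41504
```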

Variant annotation

We used three annotations shown to be enriched for disease-associated variation to identify likely functional variants in our targeted regions: coding, evolutionarily conserved or divergent, and DNase I hypersensitivity site (DHS)^{21, 24,25,26,27}. We annotated 67% (27,626) of high-confidence variants; of these, 16% are coding (49% of which are non-synonymous), 36% lie in DHSs, and 80% are evolutionarily conserved or divergent (Fig. 1a). We measured evolutionary constraint using mammalian GERP++ scores²⁷; scores >2 were classed as “conserved” and scores <−2 as “divergent”.
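The annotation rule can be written as a small classifier. The GERP++ thresholds (>2 conserved, <−2 divergent) come from the text; the `Variant` record and the example variants are hypothetical, and a variant counts as annotated if it carries at least one of the three labels.

```python
from dataclasses import dataclass
from typing import Optional

CONSERVED_MIN = 2.0    # GERP++ > 2  -> "conserved"
DIVERGENT_MAX = -2.0   # GERP++ < -2 -> "divergent"


@dataclass
class Variant:
    coding: bool
    in_dhs: bool
    gerp: Optional[float]  # mammalian GERP++ score; None if not scored


def evolutionary_status(gerp: Optional[float]) -> Optional[str]:
    if gerp is None:
        return None
    if gerp > CONSERVED_MIN:
        return "conserved"
    if gerp < DIVERGENT_MAX:
        return "divergent"
    return None  # neither conserved nor divergent


def is_annotated(v: Variant) -> bool:
    """True if the variant is coding, in a DHS, or conserved/divergent."""
    return v.coding or v.in_dhs or evolutionary_status(v.gerp) is not None


variants = [
    Variant(coding=True, in_dhs=False, gerp=0.5),
    Variant(coding=False, in_dhs=True, gerp=None),
    Variant(coding=False, in_dhs=False, gerp=3.4),
    Variant(coding=False, in_dhs=False, gerp=1.1),
]
print(sum(is_annotated(v) for v in variants), "of", len(variants))  # 3 of 4
```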

Gene-based burden analysis

To identify genes with a significant load of non-reference alleles in OCD cases, relative to controls, we developed PolyStrat, a one-sided gene-based burden test that controls for gene length (Supplementary Fig. 3a) and incorporates variant annotation. We used four variant categories: (i) all (Overall), (ii) coding (Exon), (iii) regulatory (variants in DHS), and (iv) rare (1000 Genomes Project²⁸ AF < 0.01). Each category is further stratified by evolutionary status: (i) all detected variants; (ii) slow-evolving conserved (Cons); (iii) fast-evolving divergent (Div); and (iv) evolutionary (Evo). “Evo” is the subset of “all” variants annotated as either “conserved” or “divergent”. In total, PolyStrat considers 16 groups stratified by predicted function and evolutionary conservation.
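The 16 PolyStrat groups are the cross product of the four variant categories and the four evolutionary strata. The sketch below assigns variants to every group they fit; the dictionary fields are hypothetical, and treating variants absent from 1000 Genomes as AF = 0 (and therefore rare) is an assumption, not something the text specifies.

```python
from itertools import product

CATEGORIES = ("Overall", "Exon", "DHS", "Rare")
EVO_STRATA = ("All", "Cons", "Div", "Evo")


def in_category(v: dict, category: str) -> bool:
    if category == "Overall":
        return True
    if category == "Exon":
        return v["coding"]
    if category == "DHS":
        return v["in_dhs"]
    if category == "Rare":
        # Assumption: variants absent from 1000 Genomes are stored with af_1kg = 0.0.
        return v["af_1kg"] < 0.01
    raise ValueError(category)


def in_stratum(v: dict, stratum: str) -> bool:
    status = v["evo"]  # "conserved", "divergent" or None
    if stratum == "All":
        return True
    if stratum == "Cons":
        return status == "conserved"
    if stratum == "Div":
        return status == "divergent"
    if stratum == "Evo":
        return status in ("conserved", "divergent")
    raise ValueError(stratum)


def group_variants(variants: list) -> dict:
    """Assign each variant to every category x stratum group (16 in total) it fits."""
    return {
        f"{c}-{s}": [v for v in variants if in_category(v, c) and in_stratum(v, s)]
        for c, s in product(CATEGORIES, EVO_STRATA)
    }


toy = [
    {"coding": True, "in_dhs": False, "af_1kg": 0.002, "evo": "conserved"},
    {"coding": False, "in_dhs": True, "af_1kg": 0.2, "evo": "divergent"},
]
groups = group_variants(toy)
print(len(groups), len(groups["Exon-Cons"]), len(groups["Rare-Div"]))  # 16 1 0
```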

PolyStrat p-values are corrected for multiple testing empirically using a permutation-based method that accurately measures experiment-wide statistical significance across correlated gene-based tests while controlling the type I error rate (Supplementary Methods). For most variant categories, quantile–quantile plots revealed good correspondence between observed values and the empirical null, with a small number of genes exceeding the expected distribution in a subset of the burden tests (Supplementary Figs. 3b and 4).
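The exact PolyStrat permutation scheme is given in the Supplementary Methods. As a generic illustration of how permutation yields experiment-wide p-values across correlated gene tests, the sketch below implements the single-step max-statistic (Westfall–Young maxT) adjustment, permuting case/control labels across pools. The burden statistic (difference in mean counts), the toy counts, and the function names are hypothetical.

```python
import random
from typing import Callable, Sequence


def maxt_adjusted_pvalues(
    labels: Sequence[int],
    score_genes: Callable[[Sequence[int]], list],
    n_perm: int = 1000,
    seed: int = 0,
) -> list:
    """Single-step max-statistic (Westfall-Young maxT) adjusted p-values.

    labels: 1 for a case pool, 0 for a control pool.
    score_genes: returns one statistic per gene (larger = more burden in cases).
    Each permutation shuffles the labels and records the largest statistic across
    all genes, so correlation between genes is preserved in the null distribution.
    """
    rng = random.Random(seed)
    observed = score_genes(labels)
    shuffled = list(labels)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(score_genes(shuffled))
        for g, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[g] += 1
    # Adding 1 to numerator and denominator keeps p-values strictly positive.
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Toy data: hypothetical non-reference allele counts, 4 pools x 2 genes.
counts = [[9, 3], [8, 2], [2, 3], [1, 2]]
labels = [1, 1, 0, 0]


def mean_difference(lab):
    """Per-gene mean count in case pools minus mean count in control pools."""
    stats = []
    for g in range(len(counts[0])):
        case = [row[g] for row, is_case in zip(counts, lab) if is_case == 1]
        ctrl = [row[g] for row, is_case in zip(counts, lab) if is_case == 0]
        stats.append(sum(case) / len(case) - sum(ctrl) / len(ctrl))
    return stats


print(maxt_adjusted_pvalues(labels, mean_difference, n_perm=200))
```

Gene 0 carries a clear case excess and receives a small adjusted p-value; gene 1 has none and stays near 1. With only four pools there are just six distinct labelings, so real analyses need many more pools or samples for fine-grained p-values.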

Five of the 608 sequenced genes (0.82%) show significant burdens of variants in OCD patients (Table 1; Fig. 1b): LIPH, two genes with excess coding variants (NRXN1 and HTR2A), and two with excess regulatory variants (CTTNBP2 and REEP3) (Fig. 2). REEP3 is the only gene with excess divergent (potentially fast-evolving) variants. No genes had a significant burden of rare variants (Supplementary Fig. 4).

Table 1 Five genes with significant variant burden in OCD cases in pooled-sequencing data


We validated the 46 SNPs contributing to significant gene-burden tests (7 in LIPH, 13 in NRXN1, 4 in HTR2A, 15 in CTTNBP2, and 7 in REEP3) by individual genotyping of 571 OCD and 555 control samples (98% of the cohort). Nine variants failed Sequenom assay design or had low genotyping rates. For the remaining 37, the genotyping and pooled-sequencing frequencies are nearly perfectly correlated (Pearson’s ρ = 0.999, p < 2.2 × 10⁻¹⁶; Supplementary Fig. 5; Supplementary Data 1).

We confirmed that our significant gene-burden test findings are not driven by population structure (Supplementary Methods) or linkage disequilibrium (LD), with one notable exception. We measured pairwise r² between SNPs contributing to the burden test in our top five genes and found strong LD (r² > 0.8) between one pair, in LIPH. There was no strong LD in NRXN1, HTR2A, CTTNBP2, or REEP3.

Genes included from model-organism studies (263 genes) and larger ASD studies (196 genes) were significantly more associated than genes from human candidate gene studies (216 genes) (Kruskal–Wallis p = 5.6 × 10⁻¹⁵). This is consistent with previous work showing that genes found through smaller candidate gene studies replicate poorly¹¹. It also suggests that, when a genome-wide study of the disease of interest is not available, targeting genes implicated in a model organism may be as effective as targeting genes implicated in a comorbid, phenotypically similar human disorder.
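A Kruskal–Wallis test compares the three source groups by ranks rather than raw scores. Because three groups give 2 degrees of freedom, the chi-square tail probability has the closed form exp(−H/2), so a standard-library sketch is short. Assumptions: each gene contributes one association score (for example, a −log10 PolyStrat p-value), genes implicated by several sources would first have to be assigned to a single group, and no tie correction is applied. The scores in the example are hypothetical.

```python
from math import exp


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_3groups(g1, g2, g3):
    """Kruskal-Wallis H for three groups (no tie correction) and its p-value.
    With 3 groups, H ~ chi-square(2 df), whose survival function is exp(-H / 2)."""
    groups = [list(g1), list(g2), list(g3)]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h, exp(-h / 2)


# Hypothetical per-gene association scores for the three source groups.
model_org = [3.1, 2.4, 4.0, 1.9]
asd = [2.8, 3.5, 1.7, 2.2]
human_cand = [0.4, 0.9, 1.1, 0.2]
print(kruskal_wallis_3groups(model_org, asd, human_cand))
```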

The five genes most strongly implicated in canine CD and murine-compulsive grooming (CDH2, CTNNA2, ATXN1, PGCP, and Sapap3) have significantly lower p-values than the other 603 sequenced genes (Wilcoxon unpaired, one-sided p = 2.6 × 10⁻⁴). The difference becomes more significant when only rare variants are tested (Wilcoxon unpaired, one-sided p = 3.2 × 10⁻⁵) (Fig. 1c). This is consistent with the hypothesis that severe disease-causing variants, rare in humans due to negative selection, may persist at higher frequencies in model organisms where selection is relaxed.
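The one-sided unpaired Wilcoxon (rank-sum, Mann–Whitney) comparison asks whether the five model-organism genes' p-values tend to be smaller than those of the other genes. A minimal sketch using the normal approximation without tie or continuity correction, so its p-values will differ slightly from an exact test; the input p-values are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_less(x, y):
    """One-sided rank-sum test that values in x tend to be smaller than values in y.
    Returns (U, p), with U = number of pairs (xi, yj) where xi < yj (ties count 1/2).
    Normal approximation, no tie or continuity correction."""
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, 0.5 * erfc(z / sqrt(2))  # upper-tail standard normal probability


# Hypothetical gene-level p-values: five focal genes versus twenty others.
focal = [0.01, 0.03, 0.002, 0.2, 0.04]
others = [i / 20 for i in range(1, 21)]
print(mann_whitney_less(focal, others))
```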

Applying the burden test across multiple genes with shared biological functions, we identified gene sets with a high variant load in OCD patients. We tested all 989 Gene Ontology (GO) sets that are at least weakly enriched (enrichment p < 0.1) in our 608 sequenced genes (Supplementary Data 2) and found two with high variant burdens: “GO:0010942 positive regulation of cell death” (uncorrected p = 3 × 10⁻⁴, corrected p < 0.03) and “GO:0031334 positive regulation of protein complex assembly” (uncorrected p = 7 × 10⁻⁴, corrected p < 0.06). Overlaying the burden test results onto the GO network topology highlights functional themes linking the enriched gene sets: regulation of protein complex assembly and cytoskeleton organization; neuronal migration; action potential; and cytoplasmic vesicle (Supplementary Fig. 6).
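The pre-filter on GO sets (enrichment p < 0.1 among the 608 sequenced genes) can be computed as a hypergeometric upper tail. The sketch assumes a background of about 20,000 genes and uses a hypothetical GO set size and overlap; the text does not state which background or enrichment test was used.

```python
from math import comb


def hypergeom_enrichment_p(k: int, set_size: int, n_selected: int, n_total: int) -> float:
    """P(X >= k): probability that a random draw of n_selected genes out of n_total
    contains at least k of the set_size genes in a GO set (one-sided enrichment)."""
    upper = min(set_size, n_selected)
    hits = sum(
        comb(set_size, i) * comb(n_total - set_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(n_total, n_selected)


# Hypothetical GO set of 50 genes, 5 of which are among the 608 sequenced genes,
# against an assumed background of 20,000 genes (about 1.5 expected by chance).
p = hypergeom_enrichment_p(k=5, set_size=50, n_selected=608, n_total=20_000)
print(f"enrichment p = {p:.3g}; passes p < 0.1 filter: {p < 0.1}")
```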

Validation of candidate variants by genotyping

We genotyped the top 67 candidate functional variants from the five significant genes, including 42 rare SNPs (AF < 0.01), in the pooled-sequencing cohort (Fig. 3a). After QC, this yielded individual genotypes for 63 SNPs in 571 cases and 555 controls (98% of the cohort; genotyping rate >0.94 for all SNPs). We see near-perfect correlation with the pooled sequencing for both case and control allele frequencies (Fig. 3b, c; OCD AF, Pearson’s ρ = 0.999, p = 2.7 × 10⁻⁸⁹; control AF, Pearson’s ρ = 0.999, p = 2.5 × 10⁻⁸⁹) and for the AF differences (Fig. 3d; OCD AF–control AF, Pearson’s ρ = 0.93, p = 4.8 × 10⁻²⁸).

We genotyped these 63 SNPs in an independent cohort of 727 cases and 1105 controls of European ancestry, and found strong correlation with the first genotyping cohort for both AF (Fig. 3e, f; OCD AF, Pearson’s ρ = 0.999, p = 1.0 × 10⁻⁸²; control AF, Pearson’s ρ = 0.999, p = 1.8 × 10⁻⁹⁴) and AF differences (Fig. 3g; OCD AF–control AF, Pearson’s ρ = 0.4, p = 0.001). The risk allele from the first cohort is significantly more common in cases in the second cohort (Wilcoxon paired one-sided test for 63 SNPs, p = 0.005). More specifically, of 54 SNPs that had a higher frequency of the non-reference allele in cases in the first cohort, 61% also had a higher frequency of the non-reference allele in cases in the second cohort. The 33 SNPs that failed to validate in either of the two cohorts had smaller allele-frequency differences in the first cohort (one-sided unpaired t-test p = 0.02).

In summary, the allele-frequency analyses described above support four genes: NRXN1, HTR2A, CTTNBP2, and REEP3. LIPH is excluded because its association is likely somewhat inflated by LD and did not replicate as clearly in the second genotyping cohort. To validate the associations further, we used distinct strategies depending on whether the association was driven by coding (NRXN1 and HTR2A) or regulatory variation (CTTNBP2 and REEP3).

Functional validation of regulatory variants using electrophoretic mobility shift assay

For CTTNBP2 and REEP3, regulatory variants give a far stronger burden signal than either coding variants or all variants (Fig. 1b). Furthermore, the three largest-effect variants in the combined cohort (1298 OCD cases and 1660 controls) alter enhancer elements in these two genes: chr7:117358107 in CTTNBP2 (OR = 5.2) and chr10:65332906 (OR = 3.7) and chr10:65287863 (OR = 3.2) in REEP3 (Supplementary Data 3). Using ENCODE and Roadmap Epigenomics data, we identified 17 candidate SNPs in CTTNBP2 and REEP3 likely to alter transcription factor-binding sites (TFBS) and/or disrupt chromatin structure in brain-related cell types^{26, 29}. All 17 alter enhancers or transcription-associated loci active in the substantia nigra (SN), which relays signals from the striatum to the thalamus, in the dorsolateral prefrontal cortex (DL-PFC), which sends signals from the cortex to the striatum and thalamus, or in both (Fig. 3h, i). Both regions act in the cortico-striato-thalamo-cortical (CSTC) pathway implicated by neurophysiological and genetic studies in OCD (Fig. 3j)³⁰.
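The combined cohort size follows from the two genotyping cohorts (571 + 727 = 1,298 cases; 555 + 1,105 = 1,660 controls). The sketch below computes an allelic odds ratio for the non-reference allele from a 2×2 table of allele counts, with an optional pseudocount for zero cells. The counts are hypothetical, and the text does not say whether the reported ORs were computed this way or from a regression model.

```python
def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, pseudocount=0.0):
    """Odds ratio of the non-reference allele in cases vs. controls from allele counts.
    A pseudocount (e.g. 0.5, Haldane-Anscombe) avoids division by zero for rare alleles."""
    a, b, c, d = (x + pseudocount for x in (case_alt, case_ref, ctrl_alt, ctrl_ref))
    return (a * d) / (b * c)


combined_cases = 571 + 727       # genotyped cases, cohort 1 + cohort 2
combined_controls = 555 + 1105   # genotyped controls, cohort 1 + cohort 2

# Hypothetical allele counts for one rare variant (two alleles per person).
case_alleles, ctrl_alleles = 2 * combined_cases, 2 * combined_controls
case_alt, ctrl_alt = 10, 3
print(allelic_odds_ratio(case_alt, case_alleles - case_alt, ctrl_alt, ctrl_alleles - ctrl_alt))
```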

We functionally tested 17 candidate regulatory SNPs in REEP3 and CTTNBP2 (Table 2; Supplementary Fig. 7b), assessing transcription factor binding to each allele with electrophoretic mobility shift assays (EMSA) using protein extracts from a human neuroblastoma cell line (SK-N-BE(2)). Both DHS SNPs in REEP3, three of seven DHS SNPs in CTTNBP2, and one non-DHS variant in CTTNBP2 clearly alter specific DNA-protein binding (Fig. 4a, b). We see weak evidence of differential binding for one upstream DHS SNP in REEP3, two DHS SNPs in CTTNBP2, and one non-DHS SNP in CTTNBP2 (Supplementary Fig. 8).

Table 2 Candidate regulatory variants


The high rate of functional validation by EMSA demonstrates that screening using both regulatory and evolutionary information is remarkably effective in identifying strong candidate OCD-risk variants. In total, eight of 12 tested DHS SNPs (67%) show evidence of altered protein binding, despite testing a single cell line at a single time point under standard binding conditions (Table 2). This includes two SNPs with high ORs in the full genotyping data sets that strongly disrupt specific DNA-protein binding (chr10:65332906 with OR = 3.7; chr7:117417559 with OR = 2.2). Two of five non-DHS SNPs (40%) also show altered binding, illustrating that DHS status alone is a powerful but imperfect predictor of regulatory function. Both of these SNPs alter highly constrained elements (SiPhy score 8.7), whereas only one of the remaining three non-DHS SNPs is constrained. Although this is a small data set, our results suggest that incorporating both DHS and conservation may identify functional regulatory variants with greater specificity, an observation consistent with previously published research³¹.
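The percentages reported for the EMSA screen can be reproduced from the per-gene counts in the preceding paragraphs. The tally below assumes that "weak evidence" counts toward "evidence of altered protein binding", and that only clear effects count toward the "at least six (35%)" quoted in the Abstract.

```python
# Counts taken from the EMSA results described in the text.
clear = {"DHS": 5, "non-DHS": 1}   # DHS: 2 in REEP3 + 3 in CTTNBP2; non-DHS: 1 in CTTNBP2
weak = {"DHS": 3, "non-DHS": 1}    # DHS: 1 upstream in REEP3 + 2 in CTTNBP2; non-DHS: 1 in CTTNBP2
tested = {"DHS": 12, "non-DHS": 5}


def percent(k: int, n: int) -> int:
    return round(100 * k / n)


for kind in tested:
    with_evidence = clear[kind] + weak[kind]
    print(f"{kind}: {with_evidence}/{tested[kind]} = {percent(with_evidence, tested[kind])}%")

total_clear = sum(clear.values())
total_tested = sum(tested.values())
print(f"clear effects: {total_clear}/{total_tested} = {percent(total_clear, total_tested)}%")
```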

Validation of coding variants using ExAC

In contrast to the regulatory-variant burden found in CTTNBP2 and REEP3, NRXN1 and HTR2A showed significant PolyStrat signals when only coding variants were considered. Of 12 candidate coding SNPs in NRXN1, seven are missense (Table 3). Four of these are private to OCD cases, and the other three are rare (AF in controls 0.0009–0.0036). All seven change amino acids in laminin G or EGF-like domains important for postsynaptic binding, potentially affecting the involvement of NRXN1 in synapse formation and maintenance (Fig. 4c). Of the three candidate coding SNPs in HTR2A, two (one missense and one synonymous) are in the last coding exon, and one (missense) is in the cytoplasmic domain with a PDZ-binding motif, potentially affecting binding affinity or specificity³².

Table 3 Coding variants in NRXN1 and HTR2A


We sought to improve our statistical power by combining our pooled-sequencing data with publicly available ExAC data³³. Using only our data, the associations of CTTNBP2, REEP3, NRXN1, and HTR2A with OCD are experiment-wide significant, but do not reach the genome-wide significance threshold of p < 2.5 × 10⁻⁶ (0.05 divided by ~20,000 human genes), with the strongest association, to NRXN1, at p = 5.1 × 10⁻⁵ (cohorts 1 and 2; Fisher’s combined p). For the two genes with a burden of coding variants (NRXN1 and HTR2A), we used ExAC to assess variant burden in OCD cases compared with 33,370 non-Finnish Europeans. Such a comparison was not possible for CTTNBP2 and REEP3, for which associated variants are predominantly noncoding and thus not assayed in ExAC.
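The genome-wide threshold is a Bonferroni correction of α = 0.05 over ~20,000 genes, and Fisher's method combines independent p-values (here, from the two cohorts) into one. For k p-values the statistic −2 Σ ln pᵢ follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom its tail has a closed form. A minimal sketch; the inputs in the usage line are hypothetical, not the cohort-level p-values behind p = 5.1 × 10⁻⁵.

```python
from math import exp, log

GENES_IN_GENOME = 20_000
GENOME_WIDE_ALPHA = 0.05 / GENES_IN_GENOME  # Bonferroni threshold, 2.5e-06


def fisher_combined_p(p_values):
    """Fisher's method for independent p-values.
    X = -2 * sum(ln p_i) ~ chi-square with 2k df under the null; for even df the
    survival function is exp(-X/2) * sum_{i<k} (X/2)^i / i!."""
    half = -sum(log(p) for p in p_values)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half / i
        total += term
    return exp(-half) * total


# Hypothetical per-cohort gene-level p-values.
combined = fisher_combined_p([0.003, 0.004])
print(combined, combined < GENOME_WIDE_ALPHA)
```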

To assess the significance of the variant enrichment in each gene, we used an isoform-based test built on a within-gene comparison, effectively controlling for inflation due to the extremely large size of the ExAC cohort³⁴ (Supplementary Methods). Of 542 genes with more than one isoform, we saw no significant difference between our control data and ExAC for over 90% (493 genes had corrected p > 0.05). Focusing on the subset of 66 genes with nominally significant PolyStrat scores, NRXN1 had the largest difference between cases and ExAC (χ² = 82.3, df = 16, uncorrected p = 6.37 × 10⁻¹¹; corrected p = 1.27 × 10⁻⁶) and no difference between controls and ExAC (χ² = 10.5, df = 16, uncorrected p = 0.84) (Fig. 4d). No previous findings in OCD genetics have reached this level of significance despite >100 candidate gene studies³⁵, a dozen linkage studies³⁰, and two GWAS^{2, 3}. HTR2A, while enriched for coding variants, had only two SNPs in cases, providing insufficient information for the isoform test.
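The isoform test itself is described in the Supplementary Methods and is not reproduced here. The sketch below is a hypothetical reconstruction of its general shape only: a Pearson chi-square comparing per-isoform variant counts in cases with counts expected from a reference panel's isoform proportions, with df = number of isoforms − 1 (16 for NRXN1's 17 isoforms) and Pearson residuals (observed − expected)/√expected; whether the paper's residuals are defined this way is an assumption. The closed-form chi-square tail for even df reproduces tail probabilities close to the reported 6.37 × 10⁻¹¹ and 0.84 for χ² = 82.3 and 10.5, and the reported corrected p is consistent with multiplying by ~20,000 genes, which is also treated here as an assumption.

```python
from math import exp, sqrt


def chi2_sf_even_df(x: float, df: int) -> float:
    """Chi-square survival function P(X >= x) for even df, in closed form:
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


def isoform_goodness_of_fit(case_counts, reference_counts):
    """Hypothetical isoform test: per-isoform variant counts in cases versus counts
    expected from the reference panel's isoform proportions.
    Returns (chi2, df, Pearson residuals). Reference counts must all be positive."""
    n_case, n_ref = sum(case_counts), sum(reference_counts)
    expected = [n_case * r / n_ref for r in reference_counts]
    residuals = [(o - e) / sqrt(e) for o, e in zip(case_counts, expected)]
    return sum(r * r for r in residuals), len(case_counts) - 1, residuals


chi2, df, residuals = isoform_goodness_of_fit([10, 2, 3], [100, 50, 50])  # toy counts
print(round(chi2, 3), df, [round(r, 2) for r in residuals])

# Tail probabilities for the statistics reported in the text (16 df).
print(chi2_sf_even_df(82.3, 16), chi2_sf_even_df(10.5, 16))

# Assumption: the correction multiplies by ~20,000 genes (consistent with 1.27e-6).
bonferroni = min(1.0, 6.37e-11 * 20_000)
print(bonferroni)
```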

The significant association of NRXN1 reflects an exceptional burden of variants in one of its 17 Ensembl isoforms. NRXN1a-2, which contains all 12 candidate coding SNPs, had the largest deviation between observed and expected variant counts, with a residual at least 1.4× higher than any other isoform (NRXN1a-2 = 22.3, NRXN1-001 = 16.3; median = 5.15). After adjusting for the residuals from the “null” control data and ExAC comparison, the NRXN1a-2 residual is still 1.3× higher (OCD residual/control residual NRXN1a-2 = 5.34, NRXN1-014 = 4.04).

Discussion

By analyzing sequencing data for 608 OCD candidate genes, then prioritizing variants according to functional and conservation annotations, we identified four genes with a reproducible variant burden in OCD cases. Two genes, NRXN1 and HTR2A (Table 3), have a burden of coding variants, and the other two, CTTNBP2 and REEP3 (Table 2), have a burden of conserved regulatory variants. Notably, all four act in neural pathways linked to OCD, including serotonin and glutamate signaling, synaptic connectivity, and the CSTC circuit⁶, offering new insight into the biological basis of compulsive behavior (Fig. 5).

We used three independent approaches to validate our findings: (1) For the top candidate SNPs, allele-frequency differences from sequencing data were confirmed by genotyping of both the original cohort (Fig. 3d) and a larger, independent cohort (Fig. 3g). (2) For the two genes with a burden of coding variants (NRXN1 and HTR2A), comparison of our data to 33,370 population-matched controls from ExAC³³ revealed genome-wide-significant association of NRXN1 with OCD. (3) For the two genes with the burden of regulatory variants (REEP3 and CTTNBP2), more than one-third of candidate SNPs altered protein/DNA binding in a neuroblastoma cell line (Fig. 4a).

Comparison of our approach with existing methods illustrates its unique advantages and offers a deeper understanding of how its two key features, targeted sequencing and the incorporation of functional and conservation metrics, permit identification of significantly associated genes using a cohort even smaller than those that have previously failed to yield significant results.

Targeted sequencing captures both coding and regulatory variants, and both common and rare variants, at a fraction of the cost of whole-genome sequencing (WGS). For the modest-size cohort in this study, WGS would cost ~$2.5M, 40-fold more than our pooled-sequencing approach. Even without pooling, our targeted-sequencing costs fourfold less than WGS. Whole-exome sequencing would cost approximately the same as targeted sequencing, but misses the regulatory variants explaining most polygenic trait heritability²¹. By using existing information on OCD and related diseases to prioritize a large set of genes, then performing targeted sequencing of functional elements, our approach enhances causal-variant detection and thus statistical power, although it misses OCD-associated genes not included as candidates, and potential distant regulatory elements.

The capacity to detect associations to rare variants is especially critical for studying diseases that, like OCD, may reduce fitness, as negative selection limits inheritance of deleterious variants³⁶. Genotype array data sets, and even imputed data sets, miss many rare variants. In our data set, 80% of variants driving significant associations have allele frequencies <0.05; one of the densest genotyping arrays available, the Illumina Infinium Omni5 (4.3M markers), contains only half of these variants (Supplementary Data 1)^{2, 3}. In addition, 60% of our variants have allele frequencies <0.01 and would be missed even through imputation with 1000 Genomes and UK10K³⁷.

Our new analytical method, PolyStrat, analyzes targeted-sequencing data capturing all variants, and leverages public evolutionary and regulatory data to increase power. PolyStrat first filters out variants that are less likely to be functional, then performs gene-burden tests. In contrast to gene-based approaches focusing on ultra-rare, protein-damaging variants, PolyStrat considers variants of diverse frequencies, gaining power to identify genes with excess variants in cases.
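The two-step structure (filter, then burden test) can be illustrated with a deliberately simple allele-collapsing test: keep variants passing an annotation filter, sum non-reference alleles in cases and in controls, and apply a one-sided Fisher's exact test. This is a generic sketch, not PolyStrat: it ignores LD between variants and gene length (which PolyStrat controls for), and the `functional` flag and allele counts are hypothetical. With pooled data, the allele counts would first have to be estimated from pool allele frequencies.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on [[a, b], [c, d]]: P(X >= a), where rows are
    cases/controls and columns are non-reference/reference allele counts."""
    row1, col1, n = a + b, a + c, a + b + c + d
    hits = range(a, min(row1, col1) + 1)
    return sum(comb(row1, x) * comb(n - row1, col1 - x) for x in hits) / comb(n, col1)


def collapsed_burden_p(variants, n_case_alleles, n_ctrl_alleles, keep):
    """Filter variants with `keep`, collapse allele counts, test for a case excess."""
    kept = [v for v in variants if keep(v)]
    case_alt = sum(v["case_alt"] for v in kept)
    ctrl_alt = sum(v["ctrl_alt"] for v in kept)
    case_ref = n_case_alleles * len(kept) - case_alt
    ctrl_ref = n_ctrl_alleles * len(kept) - ctrl_alt
    return fisher_greater(case_alt, case_ref, ctrl_alt, ctrl_ref)


gene = [
    {"case_alt": 6, "ctrl_alt": 1, "functional": True},
    {"case_alt": 3, "ctrl_alt": 3, "functional": False},
]
p_filtered = collapsed_burden_p(gene, 100, 100, keep=lambda v: v["functional"])
p_all = collapsed_burden_p(gene, 100, 100, keep=lambda v: True)
print(f"filtered p = {p_filtered:.3f}, unfiltered p = {p_all:.3f}")
```

In this toy gene, dropping the non-functional variant removes noise and lowers the burden p-value, which is the intuition behind filtering before testing.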

PolyStrat is particularly advantageous when applied to studies with smaller cohorts. By testing for association at the gene level, it requires statistical correction only for the ~20,000 genes in the genome. It increases power further by using targeted-sequencing data to capture nearly all variation, including variants with higher allele frequencies and/or larger effect sizes, in regions that are coding or evolutionarily constrained, and enriching for causal variants by removing ~33% of variants unlikely to be functional³⁸. PolyStrat tests ~82 times more functional variants than PolyPhen2 (http://genetics.bwh.harvard.edu/pph2/), which focuses on protein-damaging variants (27,626 vs. 335 in our data).

Our PolyStrat results are consistent with expectations from simulations, which suggest that 200–700 cases should yield 90% power to detect associated genes with allele frequencies and effect sizes similar to those of our four genes³⁹. Specifically, we would achieve 90% power to detect associations to NRXN1 (combined AF = 0.022, OR = 2.4) with ~600 cases, to HTR2A (combined AF = 0.03, OR = 1.56) with ~700 cases, to REEP3 (combined AF = 0.04, OR = 2.11) with ~200 cases, and to rare (AF < 0.01) variants in CTTNBP2 (combined AF = 0.003, OR = 4.7) with ~500 cases.
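The cited power estimates come from simulations (ref. 39) whose settings are not given here. As a rough, hypothetical illustration of how power scales with sample size, the sketch below uses the standard normal approximation for a one-sided two-proportion test on allele counts, deriving the case allele frequency from a control allele frequency and an odds ratio. It treats the "combined AF" as a control allele frequency, takes α and the number of controls as free parameters, and models a single aggregated allele rather than a gene-based burden test, so its numbers need not match those above.

```python
from math import sqrt
from statistics import NormalDist


def case_af_from_or(control_af: float, odds_ratio: float) -> float:
    """Case allele frequency implied by a control allele frequency and an odds ratio."""
    odds = odds_ratio * control_af / (1 - control_af)
    return odds / (1 + odds)


def allelic_power(n_cases, n_controls, control_af, odds_ratio, alpha):
    """Normal-approximation power of a one-sided two-proportion z-test on alleles."""
    p1, p0 = case_af_from_or(control_af, odds_ratio), control_af
    n1, n0 = 2 * n_cases, 2 * n_controls          # two alleles per diploid person
    p_bar = (n1 * p1 + n0 * p0) / (n1 + n0)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n0))
    se_alt = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf((p1 - p0 - z_alpha * se_null) / se_alt)


# Hypothetical settings: NRXN1-like AF and OR, equal numbers of controls, alpha = 0.001.
for n in (200, 400, 600, 800):
    print(n, round(allelic_power(n, n, control_af=0.022, odds_ratio=2.4, alpha=0.001), 2))
```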

Previous research on the four genes identified by PolyStrat revealed that all are expressed in the striatum, a brain region linked to OCD (http://human.brain-map.org/). All four genes are involved in pathways relevant to brain function, and harbor variants that could alter OCD risk (Table 4).

Table 4 Summary of top genes


NRXN1 encodes the synaptic cell-adhesion protein neurexin 1α, a component of the cortico-striatal neural pathway^{40, 41} implicated in ASD and other psychiatric diseases⁴², and functionally related to genes associated with OCD (CDH9/CDH10)^{3, 8, 9} and canine CD (CDH2)^{8, 9} (Fig. 5). NRXN1 isoforms are implicated in distinct neuropsychiatric disorders. The non-synonymous variants in the NRXN1a-2 isoform (Fig. 4c) may alter synaptic function by disrupting cellular localization or interactions with binding partners, including neurexophilins⁴³. The five synonymous candidate variants, which lie in likely regulatory elements, may affect protein folding by disrupting post-transcriptional regulation, as seen in other neuropsychiatric disorders⁴⁴.

The synaptic plasticity gene REEP3, also implicated in ASD⁴⁵, encodes a protein that shapes tubular endoplasmic reticulum membranes found in highly polarized cells, including neurons⁴⁶. The two EMSA-validated REEP3 variants change regulatory elements active in the cortico-striatal neural pathway (Fig. 3h) and bound by multiple TFs (Table 2) including GATA2, which may be required to actuate inhibitory GABAergic neurons⁴⁷. Thus, variants disrupting GATA2 binding could change the balance between excitatory and inhibitory neurons in the CSTC circuit (Fig. 3j)³⁰.

CTTNBP2 regulates postsynaptic excitatory synapse formation. All four EMSA-confirmed variants in CTTNBP2 alter epigenetic marks active in the key structures of the cortico-striatal neural pathway⁴⁸ (Table 2; Fig. 3h), potentially affecting the expression of this critical gene. CTTNBP2 interacts both with striatin, encoded by STRN, which approached experiment-wide significance in this study (uncorrected p = 0.0016, corrected p < 0.1; Fig. 1b), and with the protein encoded by the canine CD gene CDH2 (Fig. 5).

HTR2A encodes a G-protein-coupled serotonin receptor expressed throughout the central nervous system, including in the prefrontal cortex, and has been implicated in ASD and OCD³⁵. A related serotonin-receptor cluster (HTR3C/HTR3D/HTR3E) is associated with severe canine CD⁴⁹ (Fig. 5). The three coding variants found in HTR2A may alter its binding affinity (Table 3)³², and one of the three, a rare missense variant (rs6308; AF = 0.004 in 1000G CEU population), is perfectly linked (D′ = 1; http://raggr.usc.edu) to a common variant (rs6314) associated with response to SSRIs⁵⁰.

Taken together, our top four associated genes and our pathway analysis implicate three classes of neuronal functions in OCD, as described below.

First, synaptic cell-adhesion molecules help establish and maintain contact between the presynaptic and postsynaptic membrane, and are critical for synapse development and neural plasticity. NRXN1 encodes a cell-adhesion molecule predominantly expressed in the brain, and CTTNBP2 regulates cortactin, another such molecule, echoing earlier findings linking cell-adhesion genes to compulsive disorders in dogs (CDH2 and CTNNA2), mice (Slitrk5), and humans (DLGAP1, PTPRD and CDH9/CDH10)^{2, 3, 8, 51} (Fig. 5). In our pathway analysis, “regulation of protein complex assembly” and “cytoskeleton organization” were enriched for variants in OCD patients.

Second, OCD may result from an imbalance between the differentiation of excitatory glutamatergic and inhibitory GABAergic neurons³⁰ (Fig. 3j). This process involves NRXN1⁵² and REEP3⁵³ (Table 4), as well as PTPRD, a top OCD GWAS candidate³. We also find an overall burden of variants in genes regulating cell death and apoptosis (Supplementary Data 2). A similar burden appears in genes for telencephalic tangential migration, a neuronal migration event that forms connections between the key structures of the CSTC circuit⁵⁴.

Third, SSRIs are the most effective available OCD treatment, which suggests a role for serotonergic pathways in the disease. HTR2A encodes a serotonin receptor, and allelic variation in HTR2A alters response to SSRIs (Table 4)⁵⁰. In addition, REEP3 and CACNA1C, both of which score highly in this study (Fig. 1), are also significantly associated with schizophrenia. Both act in calcium signaling, a pathway downstream of HTR2A⁵⁵,⁵⁶,⁵⁷. A meta-analysis of >100 OCD genetic association studies found strong association with both HTR2A and the serotonin transporter gene SLC6A4³⁵. In dogs, a serotonin-receptor locus is associated with severe CD⁴⁹.

Our findings suggest broad principles that could guide studies of other polygenic diseases. We found that genes associated with compulsive behavior in selectively bred model organisms are more likely to contain rare, highly penetrant variants. The five genes most strongly associated with compulsive behaviors in dog and mouse were CDH2, CTNNA2, ATXN1, PGCP, and Sapap3. In human patients, these genes were significantly more enriched for rare variants than the other 603 targeted genes, although none achieved significance individually (Fig. 1c). We propose that this enrichment reflects natural selection limiting the prevalence of severe disease-causing variants in humans, a force that is weaker in selectively bred animal populations. Because risk variants identified through animal models are expected to be rare in humans, replication will require either family-based studies or cohorts larger than any currently available.
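The text does not say which test compared the five model-organism genes with the other 603 targeted genes. One common way to ask whether a small gene set is more enriched than expected is a resampling test. It compares the set's mean per-gene rare-variant score with the mean of randomly drawn gene sets of the same size.

The sketch below assumes this approach. The score, the function name and the toy data are hypothetical illustrations, not the study's method.

```python
import random
from statistics import mean


def gene_set_resampling_test(scores, gene_set, n_draws=10_000, seed=0):
    """One-sided test: is the mean score of `gene_set` higher than that of
    random gene sets of the same size drawn from all scored genes?

    scores: dict gene -> per-gene rare-variant score (hypothetical metric)
    Returns (observed mean score, empirical p-value).
    """
    rng = random.Random(seed)
    genes = sorted(scores)
    k = len(gene_set)
    observed = mean(scores[g] for g in gene_set)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(genes, k)  # random set of k genes, without replacement
        if mean(scores[g] for g in draw) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero
    return observed, (1 + hits) / (1 + n_draws)


# Toy data: 603 background genes and 5 genes with higher hypothetical scores
toy_scores = {f"GENE_{i}": 0.0 for i in range(603)}
top_genes = [f"TOP_{i}" for i in range(5)]
toy_scores.update({g: 1.0 for g in top_genes})
print(gene_set_resampling_test(toy_scores, top_genes, n_draws=2000))
```

Resampling whole gene sets keeps the background distribution of the targeted genes, so no parametric assumption about the scores is needed.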

We also find that the balance between coding and regulatory variants reflects a gene's developmental importance. Single-gene p-values from PolyStrat tests are generally positively correlated across variant categories, as expected given the overlap between categories (Fig. 1a; Supplementary Fig. 9). This pattern breaks down, however, for our four significantly associated genes:

- NRXN1 and HTR2A carry burdens of coding variants but score poorly on regulatory-variant tests.
- CTTNBP2 and REEP3 carry burdens of regulatory variants but score poorly on coding-variant tests (Fig. 1b).

This agrees with the ExAC study, which showed that genes critical to viability or development do not tolerate major coding changes³³. The ExAC authors infer that CTTNBP2 and REEP3 are intolerant of homozygous loss-of-function variants (pRec = 0.99999015 and pRec = 0.953842585, respectively). HTR2A (pRec = 0.225555783) and, most notably, NRXN1 (pRec = 5.13 × 10⁻⁵) are far more tolerant. The enrichment for regulatory variants in CTTNBP2 and REEP3 suggests that these genes tolerate variants with subtler functional effects, such as expression differences in specific cell types or developmental stages.
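The pRec values and burden classes quoted above can be cross-tabulated to make the pattern explicit. The sketch below uses only those quoted values.

The 0.9 cutoff for calling a gene recessive-intolerant is an assumption. It is borrowed by analogy from the pLI ≥ 0.9 convention used for ExAC loss-of-function intolerance, and it is not taken from this study.

```python
# pRec values as quoted in the text (ExAC) and the burden class reported in Fig. 1b.
GENES = {
    "CTTNBP2": (0.99999015, "regulatory"),
    "REEP3": (0.953842585, "regulatory"),
    "HTR2A": (0.225555783, "coding"),
    "NRXN1": (5.13e-5, "coding"),
}
PREC_CUTOFF = 0.9  # assumed cutoff, by analogy with pLI >= 0.9


def classify(genes, cutoff=PREC_CUTOFF):
    """Return (gene, pRec, tolerance class, burden class) rows sorted by pRec, highest first."""
    rows = []
    for name, (prec, burden) in sorted(genes.items(), key=lambda kv: -kv[1][0]):
        status = "recessive-intolerant" if prec >= cutoff else "tolerant"
        rows.append((name, prec, status, burden))
    return rows


for name, prec, status, burden in classify(GENES):
    print(f"{name:8s} pRec={prec:.3g}  {status:22s} {burden} burden")
```

Under this cutoff, both intolerant genes carry regulatory burdens and both tolerant genes carry coding burdens. Four genes are far too few to establish a correlation, however.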

Technological advances in high-throughput sequencing bring increased focus on identifying causal genetic variants as a first step toward targeted disease therapies⁵⁸. However, existing approaches have notable limitations. WGS is prohibitively expensive in large cohorts, whereas cost-saving whole-exome sequencing does not capture the regulatory variants underlying complex diseases²¹. Leveraging existing genomic resources can increase power to find causal variants through meta-analysis and imputation, but these resources are heavily biased towards a few populations. Without new approaches, advances in precision medicine will predominantly benefit those of European descent.

Here, we describe an approach that combines prior findings, targeted sequencing, and a new analytic method to efficiently identify genes and individual variants associated with complex disease risk. In a modest-size cohort of OCD cases and controls we find associations driven by both coding and regulatory variants, highlighting new potential therapeutic targets. Our method holds promise for elucidating the biological basis of complex disease, and for extending the power of precision medicine to previously excluded populations.

Methods

Study design

We designed and carried out the study in two phases.

In the discovery phase, we performed targeted sequencing of 592 individuals with DSM-IV OCD and 560 controls of European ancestry. We then tested for association with OCD at the single-variant, gene, and pathway levels.

In the validation phase, we used three distinct analyses:

1. To confirm the allele frequencies observed in the discovery phase, we genotyped both the original cohort and an independent cohort of 1834 DNA samples of European ancestry (729 DSM-IV OCD cases and 1105 controls). This gave a total of 2986 individuals (1321 OCD cases and 1665 controls).
2. We compared our sequencing data with 33,370 population-matched controls from ExAC to confirm the gene-based burden of coding variants as well as allele frequencies.
3. We performed EMSA to test whether our candidate variants have regulatory function.

Use of biospecimens in this study was reviewed and approved by either the Broad Institute's Office of Research Subject Protection or the Partners HealthCare Human Research Committee. Informed consent was obtained from all subjects included in our study.
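The cohort sizes above can be checked with simple arithmetic. All numbers in this sketch come from the study design text.

```python
# Sample counts as stated in the study design.
discovery = {"cases": 592, "controls": 560}
replication = {"cases": 729, "controls": 1105}

# The independent validation cohort should total 1834 samples.
assert sum(replication.values()) == 1834

# Genotyping covered both cohorts, so the combined counts are simple sums.
combined = {group: discovery[group] + replication[group] for group in discovery}
print(combined, sum(combined.values()))  # {'cases': 1321, 'controls': 1665} 2986
```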

Targeted regions

We targeted 82,723 evolutionarily constrained regions in and around 608 genes. These included all regions within 1 kb of the start and end of each targeted gene with a SiPhy evolutionary constraint score >7, as well as all exons²². For the intergenic regions upstream and downstream of each gene, we used constraint-score thresholds that became more stringent with distance from the gene.
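The selection rule can be sketched in code, with one important caveat. The text gives the SiPhy cutoff (>7) only for regions within 1 kb of a gene, and it says only that intergenic thresholds became more stringent with distance.

The linear schedule in `required_score` below, and its `per_100kb` parameter, are therefore hypothetical. The same applies to the `Element` representation and the toy coordinates.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A candidate constrained element (hypothetical representation)."""
    start: int
    end: int
    siphy_score: float
    is_exon: bool = False


def distance_to_gene(el: Element, gene_start: int, gene_end: int) -> int:
    """Base pairs between the element and the gene body (0 if they overlap)."""
    if el.end < gene_start:
        return gene_start - el.end
    if el.start > gene_end:
        return el.start - gene_end
    return 0


def required_score(distance_bp: int, base: float = 7.0, per_100kb: float = 1.0) -> float:
    """SiPhy threshold: >7 within 1 kb of the gene. The linear increase
    beyond 1 kb is a hypothetical schedule, not the study's."""
    if distance_bp <= 1_000:
        return base
    return base + per_100kb * distance_bp / 100_000


def select_targets(elements, gene_start, gene_end):
    """Keep every exon, plus constrained elements that pass the distance-aware cutoff."""
    return [
        el for el in elements
        if el.is_exon
        or el.siphy_score > required_score(distance_to_gene(el, gene_start, gene_end))
    ]


# Toy gene spanning 100-150 kb
candidates = [
    Element(100_500, 100_600, 2.0, is_exon=True),  # exon: kept regardless of score
    Element(120_000, 120_050, 8.0),                # intronic, score > 7: kept
    Element(350_000, 350_050, 8.0),                # 200 kb away, needs > 9: dropped
]
print([e.start for e in select_targets(candidates, 100_000, 150_000)])
```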

Pooled sequencing and variant annotation

Groups of 16 individuals were pooled into 37 case pools and 35 control pools, and each pool was bar-coded. Targeted genomic regions were captured using a custom NimbleGen hybrid-capture array and sequenced on an Illumina GAII or Illumina HiSeq 2000. Sequencing reads were aligned and processed with the Picard analysis pipeline (http://broadinstitute.github.io/picard/). Variants and AFs were called using Syzygy¹⁷ and SNVer²³. We used ANNOVAR²⁵ to annotate variants with RefSeq genes (hg19), GERP scores, ENCODE DHS clusters, and 1000 Genomes data.
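In a pool of 16 diploid individuals, each site is represented by 32 chromosomes. The simplest allele-frequency estimate is therefore the fraction of reads carrying the non-reference allele.

The sketch below shows only this naive read-ratio estimate. Syzygy and SNVer use statistical models that also account for sequencing error and uneven representation of individuals, and this code does not reproduce either tool. The function names and read counts are hypothetical.

```python
def pooled_allele_estimate(alt_reads: int, total_reads: int,
                           n_individuals: int = 16, ploidy: int = 2):
    """Naive allele-frequency estimate for one pool.

    Returns (allele frequency, estimated number of non-reference chromosomes
    among the pool's n_individuals * ploidy chromosomes).
    """
    if total_reads <= 0:
        raise ValueError("no coverage at this site")
    af = alt_reads / total_reads
    n_chromosomes = n_individuals * ploidy
    return af, round(af * n_chromosomes)


def cohort_af(pool_afs):
    """Cohort frequency from equal-sized pools: the unweighted mean of pool AFs."""
    return sum(pool_afs) / len(pool_afs)


# 37 case pools and 35 control pools of 16 people reproduce the cohort sizes.
assert 37 * 16 == 592 and 35 * 16 == 560

# Hypothetical site: 25 of 400 reads are non-reference in one pool.
print(pooled_allele_estimate(25, 400))  # (0.0625, 2)
```

Averaging pool AFs without weights is valid here only because every pool has the same number of individuals.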

Genotyping

SNP genotyping was performed using the Sequenom MassARRAY iPLEX platform. The resulting data were analyzed using PLINK1.9 (www.cog-genomics.org/plink2).
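For case/control data, PLINK's `--assoc` reports a 1-df allelic chi-square test computed from allele counts. The sketch below re-implements that kind of test to make the calculation concrete.

It is illustrative only, not PLINK's source code. The counts in the example are toy values, and the text does not specify which PLINK options were used.

```python
import math


def allelic_chisq(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table.

    Returns (chi2, p). For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the allele table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy counts: 30/100 case alleles vs 10/100 control alleles are non-reference.
chi2, p = allelic_chisq(30, 70, 10, 90)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")  # chi2 = 12.50
```

The closed form `erfc(sqrt(x/2))` avoids a dependency on SciPy. It holds because a 1-df chi-square variable is the square of a standard normal variable.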

EMSA

For each allele of the tested variants, pairs of 5′-biotinylated oligonucleotides were obtained from IDT Inc. (Supplementary Data 4). Equal volumes of forward and reverse oligonucleotides (1 pmol/µl each) were mixed, heated at 95 °C for 5 min, and cooled to room temperature. The annealed probes were incubated at room temperature for 30 min with SK-N-BE(2) nuclear extract (Active Motif). The remaining steps followed the LightShift Chemiluminescent EMSA Kit protocol (Thermo Scientific).
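Mixing equal volumes of two 1 pmol/µl strands halves each strand's concentration. The resulting duplex is therefore at most 0.5 pmol/µl.

The sketch below makes that arithmetic explicit. It assumes every molecule of the limiting strand anneals, which is an idealization. The function name and volumes are hypothetical.

```python
def annealed_duplex_conc(fwd_pmol_per_ul: float, rev_pmol_per_ul: float,
                         fwd_ul: float, rev_ul: float) -> float:
    """Duplex concentration (pmol/µl) after mixing two complementary strands,
    assuming complete annealing of the limiting strand."""
    total_ul = fwd_ul + rev_ul
    limiting_pmol = min(fwd_pmol_per_ul * fwd_ul, rev_pmol_per_ul * rev_ul)
    return limiting_pmol / total_ul


# Equal volumes (here 10 µl each) of 1 pmol/µl forward and reverse oligos
print(annealed_duplex_conc(1.0, 1.0, 10.0, 10.0))  # 0.5
```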

Statistical analysis

For gene-level and pathway-level association, the test statistic was the sum, across variants, of the differences in non-reference allele rates between cases and controls. We estimated the probability of observing a statistic at least as extreme by chance from 10,000 permutations. Multiple testing was corrected empirically using the minP procedure. See Supplementary Methods for details.
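The full procedure is in the Supplementary Methods. The sketch below shows one way such a permutation burden test with minP correction can be implemented. It is not the study's code, and it rests on several assumptions:

- Each pool has a per-variant non-reference allele rate.
- Case/control labels are permuted at the pool level.
- The test is one-sided, for an excess in cases.
- Empirical p-values use the (1 + b)/(1 + B) convention.
- The correction follows the single-step Westfall-Young minP procedure.

All function names are hypothetical.

```python
import bisect
import random
from typing import Dict, List, Sequence, Tuple

Pool = Sequence[float]  # per-variant non-reference allele rates in one pool


def burden_statistic(case_pools: List[Pool], control_pools: List[Pool]) -> float:
    """Sum over a gene's variants of (mean case rate - mean control rate)."""
    total = 0.0
    for v in range(len(case_pools[0])):
        case_mean = sum(p[v] for p in case_pools) / len(case_pools)
        ctrl_mean = sum(p[v] for p in control_pools) / len(control_pools)
        total += case_mean - ctrl_mean
    return total


def _stat(pools: List[Pool], case_set: set) -> float:
    cases = [p for i, p in enumerate(pools) if i in case_set]
    controls = [p for i, p in enumerate(pools) if i not in case_set]
    return burden_statistic(cases, controls)


def _upper_p(sorted_null: List[float], x: float) -> float:
    """One-sided empirical p-value: (1 + #{null >= x}) / (1 + B)."""
    n_ge = len(sorted_null) - bisect.bisect_left(sorted_null, x)
    return (1 + n_ge) / (1 + len(sorted_null))


def permutation_minp(genes: Dict[str, List[Pool]], n_case_pools: int,
                     n_perm: int = 10_000, seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Return {gene: (empirical p, minP-adjusted p)}.

    Every gene lists the same pools in the same order; the first
    `n_case_pools` entries are case pools and the rest are control pools.
    """
    rng = random.Random(seed)
    names = sorted(genes)
    n_pools = len(genes[names[0]])
    observed = {g: _stat(genes[g], set(range(n_case_pools))) for g in names}

    # Each permutation relabels the pools once and applies that labelling to
    # every gene, which preserves between-gene correlation (needed for minP).
    null: Dict[str, List[float]] = {g: [] for g in names}
    order = list(range(n_pools))
    for _ in range(n_perm):
        rng.shuffle(order)
        case_set = set(order[:n_case_pools])
        for g in names:
            null[g].append(_stat(genes[g], case_set))

    sorted_null = {g: sorted(null[g]) for g in names}
    raw_p = {g: _upper_p(sorted_null[g], observed[g]) for g in names}

    # Single-step minP: the smallest p-value across all genes in each permutation.
    min_p = sorted(
        min(_upper_p(sorted_null[g], null[g][b]) for g in names) for b in range(n_perm)
    )
    return {
        g: (raw_p[g], (1 + bisect.bisect_right(min_p, raw_p[g])) / (1 + n_perm))
        for g in names
    }


# Toy data: 4 case pools followed by 4 control pools, two variants per gene.
toy = {
    "GENE_A": [[0.3, 0.3]] * 4 + [[0.0, 0.0]] * 4,  # excess in case pools
    "GENE_B": [[0.1, 0.1]] * 8,                     # no case/control difference
}
print(permutation_minp(toy, n_case_pools=4, n_perm=2000, seed=1))
```

Reusing one shuffle for all genes is the key design choice. The minP correction is valid under dependence between genes only if each permutation's gene-level p-values come from the same relabelling.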

Code availability

The analyses used code from the R package Rplinkseq and from PLINK 1.9.

Data availability

All data presented in this study are accessible at: https://data.broadinstitute.org/OCD_NatureCommunications2017/.

References

1. Pauls, D. L. The genetics of obsessive-compulsive disorder: a review. Dialogues Clin. Neurosci. 12, 149–163 (2010).
2. Stewart, S. E. et al. Genome-wide association study of obsessive-compulsive disorder. Mol. Psychiatry 18, 788–798 (2013).
3. Mattheisen, M. et al. Genome-wide association study in obsessive-compulsive disorder: results from the OCGAS. Mol. Psychiatry 20, 337–344 (2014).
4. Welch, J. M. et al. Cortico-striatal synaptic defects and OCD-like behaviours in Sapap3-mutant mice. Nature 448, 894–900 (2007).
5. Ahmari, S. E. et al. Repeated cortico-striatal stimulation generates persistent OCD-like behavior. Science 340, 1234–1239 (2013).
6. Ting, J. T. & Feng, G. Neurobiology of obsessive-compulsive disorder: insights into neural circuitry dysfunction through mouse genetics. Curr. Opin. Neurobiol. 21, 842–848 (2011).
7. Karlsson, E. K. & Lindblad-Toh, K. Leader of the pack: gene mapping in dogs and other model organisms. Nat. Rev. Genet. 9, 713–725 (2008).
8. Tang, R. et al. Candidate genes and functional noncoding variants identified in a canine model of obsessive-compulsive disorder. Genome Biol. 15, R25 (2014).
9. Dodman, N. H. et al. A canine chromosome 7 locus confers compulsive disorder susceptibility. Mol. Psychiatry 15, 8–10 (2010).
10. Sullivan, P. F., Daly, M. J. & O’Donovan, M. Genetic architectures of psychiatric disorders: the emerging picture and its implications. Nat. Rev. Genet. 13, 537–551 (2012).
11. Farrell, M. S. et al. Evaluating historical candidate genes for schizophrenia. Mol. Psychiatry 20, 555–562 (2015).
12. Sobrin, L. et al. Candidate gene association study for diabetic retinopathy in persons with type 2 diabetes: the Candidate gene Association Resource (CARe). Invest. Ophthalmol. Vis. Sci. 52, 7593–7602 (2011).
13. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).
14. D’Gama, A. M. et al. Targeted DNA sequencing from autism spectrum disorder brains implicates multiple genetic mechanisms. Neuron 88, 910–917 (2015).
15. Bertram, L. & Tanzi, R. E. Thirty years of Alzheimer’s disease genetics: the implications of systematic meta-analyses. Nat. Rev. Neurosci. 9, 768–778 (2008).
16. Cohen, J. C., Boerwinkle, E., Mosley, T. H. Jr. & Hobbs, H. H. Sequence variations in PCSK9, low LDL, and protection against coronary heart disease. N. Engl. J. Med. 354, 1264–1272 (2006).
17. Rivas, M. A. et al. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat. Genet. 43, 1066–1073 (2011).
18. Gutierrez-Achury, J. et al. Fine mapping in the MHC region accounts for 18% additional genetic risk for celiac disease. Nat. Genet. 47, 577–578 (2015).
19. Roth, E. M., McKenney, J. M., Hanotin, C., Asset, G. & Stein, E. A. Atorvastatin with or without an antibody to PCSK9 in primary hypercholesterolemia. N. Engl. J. Med. 367, 1891–1900 (2012).
20. Warr, A. et al. Exome sequencing: current and future perspectives. G3 5, 1543–1550 (2015).
21. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
22. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
23. Wei, Z., Wang, W., Hu, P., Lyon, G. J. & Hakonarson, H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res. 39, e132 (2011).
24. Pruitt, K. D. et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42, D756–63 (2014).
25. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
26. Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
27. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
28. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
29. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
30. Pauls, D. L., Abramovitch, A., Rauch, S. L. & Geller, D. A. Obsessive-compulsive disorder: an integrative genetic and neurobiological perspective. Nat. Rev. Neurosci. 15, 410–424 (2014).
31. Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).
32. Becamel, C. et al. The serotonin 5-HT2A and 5-HT2C receptors interact with specific sets of PDZ proteins. J. Biol. Chem. 279, 20257–20266 (2004).
33. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
34. Schneider, J. W. Caveats for using statistical significance tests in research assessments. J. Informetr. 7, 50–62 (2013).
35. Taylor, S. Molecular genetics of obsessive-compulsive disorder: a comprehensive meta-analysis of genetic association studies. Mol. Psychiatry 18, 799–805 (2013).
36. Park, J.-H. et al. Distribution of allele frequencies and effect sizes and their interrelationships for common genetic susceptibility variants. Proc. Natl Acad. Sci. USA 108, 18026–18031 (2011).
37. Huang, J. et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat. Commun. 6, 8111 (2015).
38. Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).
39. Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl Acad. Sci. USA 111, E455–64 (2014).
40. de Wit, J. et al. LRRTM2 interacts with Neurexin1 and regulates excitatory synapse formation. Neuron 64, 799–806 (2009).
41. Surmeier, D. J., Ding, J., Day, M., Wang, Z. & Shen, W. D1 and D2 dopamine-receptor modulation of striatal glutamatergic signaling in striatal medium spiny neurons. Trends Neurosci. 30, 228–235 (2007).
42. Sudhof, T. C. Neuroligins and neurexins link synaptic function to cognitive disease. Nature 455, 903–911 (2008).
43. Rujescu, D. et al. Disruption of the neurexin 1 gene is associated with schizophrenia. Hum. Mol. Genet. 18, 988–996 (2009).
44. Takata, A., Ionita-Laza, I., Gogos, J. A., Xu, B. & Karayiorgou, M. De novo synonymous mutations in regulatory elements contribute to the genetic etiology of autism and schizophrenia. Neuron 89, 940–947 (2016).
45. Castermans, D. et al. Identification and characterization of the TRIP8 and REEP3 genes on chromosome 10q21.3 as novel candidate genes for autism. Eur. J. Hum. Genet. 15, 422–431 (2007).
46. Blackstone, C., O’Kane, C. J. & Reid, E. Hereditary spastic paraplegias: membrane traffic and the motor pathway. Nat. Rev. Neurosci. 12, 31–42 (2011).
47. Kala, K. et al. Gata2 is a tissue-specific post-mitotic selector gene for midbrain GABAergic neurons. Development 136, 253–262 (2009).
48. Chen, Y. K. & Hsueh, Y. P. Cortactin-binding protein 2 modulates the mobility of cortactin and regulates dendritic spine formation and maintenance. J. Neurosci. 32, 1043–1055 (2012).
49. Dodman, N. H. et al. Genomic risk for severe canine compulsive disorder, a dog model of human OCD. Int. J. Appl. Res. Vet. Med. 14, 1–18 (2016).
50. Porcelli, S. et al. Pharmacogenetics of antidepressant response. J. Psychiatry Neurosci. 36, 87–113 (2011).
51. Shmelkov, S. V. et al. Slitrk5 deficiency impairs corticostriatal circuitry and leads to obsessive-compulsive-like behaviors in mice. Nat. Med. 16, 598–602 (2010).
52. Graf, E. R., Zhang, X., Jin, S. X., Linhoff, M. W. & Craig, A. M. Neurexins induce differentiation of GABA and glutamate postsynaptic specializations via neuroligins. Cell 119, 1013–1026 (2004).
53. Doly, S. & Marullo, S. Gatekeepers controlling GPCR export and function. Trends Pharmacol. Sci. 36, 636–644 (2015).
54. Marin, O. & Rubenstein, J. L. A long, remarkable journey: tangential migration in the telencephalon. Nat. Rev. Neurosci. 2, 780–790 (2001).
55. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
56. Schwarz, D. S. & Blower, M. D. The endoplasmic reticulum: structure, function and response to cellular signaling. Cell Mol. Life Sci. 73, 79–94 (2016).
57. The UniProt Consortium. UniProtKB—P28223 (5HT2A_HUMAN). http://www.uniprot.org/uniprot/P28223. Accessed 8 August 2016.
58. Ashley, E. A. Towards precision medicine. Nat. Rev. Genet. 17, 507–522 (2016).
59. Niknafs, N. et al. MuPIT interactive: webserver for mapping variant positions to annotated, interactive 3D structures. Hum. Genet. 132, 1235–1243 (2013).
60. Chen, F., Venugopal, V., Murray, B. & Rudenko, G. The structure of neurexin 1α reveals features promoting a role as synaptic organizer. Structure 19, 779–789 (2011).
61. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 44, D7–D19 (2016).
62. Chen, Y.-K. & Hsueh, Y.-P. Cortactin-binding protein 2 modulates the mobility of cortactin and regulates dendritic spine formation and maintenance. J. Neurosci. 32, 1043–1055 (2012).
63. Lambe, E. K., Fillman, S. G., Webster, M. J. & Shannon Weickert, C. Serotonin receptor expression in human prefrontal cortex: balancing excitation and inhibition across postnatal development. PLoS ONE 6, e22799 (2011).
64. Jenkins, A. K. et al. Neurexin 1 (NRXN1) splice isoform expression during human neocortical development and aging. Mol. Psychiatry 21, 701–706 (2016).
65. El Sayegh, T. Y. et al. Cortactin associates with N-cadherin adhesions and mediates intercellular adhesion strengthening in fibroblasts. J. Cell Sci. 117, 5117–5131 (2004).
66. Chen, Y. K., Chen, C. Y. & Hu, H. T. CTTNBP2, but not CTTNBP2NL, regulates dendritic spinogenesis and synaptic distribution of the striatin–PP2A complex. Mol. Biol. Cell 23, 4383–4392 (2012).

Acknowledgements

We thank the participating individuals for their support; Eric S. Lander, Steven E. Hyman, Jessica Alföldi, and Kaitlin Samocha for valuable input; Leslie Gaffney for help with illustrations; Jeremiah M. Scharf for sample contribution and discussions; and the Broad Genomics Platform for sample processing, sequencing, and genotyping. H.J.N. is supported by the AKC Health Foundation and Swedish Research Council, C.R. by the Swedish Research Council (K2013-61P-22168), K.L.-T. by the Swedish Medical Research Council and European Research Council, and E.K.K. by NIH NIMH (1R21MH109938-01). A Broad Institute SPARC grant supported part of this work.

Author information

Elinor K. Karlsson and Kerstin Lindblad-Toh contributed equally to this work.

Authors and Affiliations

Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA, 02142, USA
Hyun Ji Noh, Ruqi Tang, Jason Flannick, Colm O’Dushlaine, Ross Swofford, Daniel Howrigan, Diane P. Genereux, Jeremy Johnson, Michele Koltookian, Elinor K. Karlsson & Kerstin Lindblad-Toh
Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Massachusetts Institute of Technology, 43 Vassar Street, Cambridge, MA, 02139, USA
Ruqi Tang & Guoping Feng
Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 145 Shandong Middle Road, Huangpu Qu, Shanghai, 200001, China
Ruqi Tang
GGZ inGeest and Department of Psychiatry, VU University Medical Center, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
Gerard van Grootheest
Department of Child & Adolescent Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, University of Zurich, Neumünsterallee 9, Zurich, 8032, Switzerland
Edna Grünblatt & Susanne Walitza
Neuroscience Center Zurich, University of Zurich & ETH Zurich, Winterthurer Strasse 190, Zurich, 8057, Switzerland
Edna Grünblatt & Susanne Walitza
Zurich Center for Integrative Human Physiology, University of Zurich, Winterthurer Strasse 190, Zurich, 8057, Switzerland
Edna Grünblatt & Susanne Walitza
Department of Clinical Neuroscience, Centre for Psychiatry Research, Karolinska Institutet Tomtebodavägen 18A, Stockholm, 17177, Sweden
Erik Andersson, Diana R. Djurfeldt & Christian Rück
Stockholm Health Care Services, Stockholm County Council, Stockholm, 14186, Sweden
Diana R. Djurfeldt & Christian Rück
Department of Psychiatry, University of Michigan, 4250 Plymouth Road, Ann Arbor, MI, 48109, USA
Paresh D. Patel & Gregory L. Hanna
Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, 17177, Sweden
Christina M. Hultman
Department of Psychiatry & Behavioral Sciences, USC, 2250 Alcazar Street, Los Angeles, CA, 90033, USA
Michele T. Pato, Carlos N. Pato & James A. Knowles
Department of Psychiatry & Human Behavior, Brown Medical School, 345 Blackstone Boulevard, Box G-BH, Providence, RI, 02906, USA
Steven A. Rasmussen
Department of Psychiatry, Harvard Medical School, 401 Park Drive, Boston, MA, 02215, USA
Michael A. Jenike
BC Mental Health & Addictions Research Institute, UBC, 2255 Wesbrook Mall, Vancouver, BC, Canada, V6T 2A1
S. Evelyn Stewart
Department of Psychiatry & Psychotherapy, University of Cologne, Kerpener Street 62, Cologne, 50937, Germany
Stephan Ruhrmann
Department of Psychiatry & Psychotherapy, University of Medicine Greifswald, Fleischmannstrasse 8, Greifswald, 17475, Germany
Hans-Jörgen Grabe
Department of Psychiatry & Psychotherapy, University of Bonn, Regina-Pacis-Weg 3, Bonn, 53113, Germany
Michael Wagner
German Center for Neurodegenerative Diseases, Sigmund-Freud-Strasse 27, Bonn, 53127, Germany
Michael Wagner
Department of Psychiatry & Genetics Institute, University of Florida, 1149 Newell Drive, Gainesville, FL, 32610, USA
Carol A. Mathews
Department of Clinical & Health Psychology, Utrecht University, Heidelberglaan 1, 3584 CS, Utrecht, The Netherlands
Daniëlle C. Cath
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA, 02142, USA
Guoping Feng
Program in Bioinformatics & Integrative Biology and Program in Molecular Medicine, University of Massachusetts Medical School, 368 Plantation Street, Sherman Center, Worcester, MA, 01605, USA
Elinor K. Karlsson
Science for Life Laboratory, IMBIM, Uppsala University, Uppsala, 75236, Sweden
Kerstin Lindblad-Toh

Contributions

K.L.-T., E.K.K., G.F., H.J.N., and R.T. conceived and designed the experiments. H.J.N., R.T., E.K.K., J.F., C.O.’D., R.S., D.H., D.P.G., and K.L.-T. analyzed the data. H.J.N., D.P.G., E.K.K., and K.L.-T. wrote the paper. R.T. and H.J.N. performed sequence capture. R.S. performed EMSA. M.W., H.-J.G., S.R., C.A.M., S.E.S., S.A.R., M.A.J., J.A.K., C.R., E.G., G.L.H., D.C.C., E.A., S.W., P.D.P., C.H., M.T.P., and C.N.P. diagnosed patients and collected samples. J.J., M.K., and G.v.G. coordinated sample preparation and data generation.

Corresponding authors

Correspondence to Hyun Ji Noh, Elinor K. Karlsson or Kerstin Lindblad-Toh.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Peer Review File

Supplementary Description

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Noh, H.J., Tang, R., Flannick, J. et al. Integrating evolutionary and regulatory information with a multispecies approach implicates genes and pathways in obsessive-compulsive disorder. Nat Commun 8, 774 (2017). https://doi.org/10.1038/s41467-017-00831-x


Received: 25 January 2017
Accepted: 01 August 2017
Published: 17 October 2017
DOI: https://doi.org/10.1038/s41467-017-00831-x
