Obsessive-compulsive disorder (OCD) is a neuropsychiatric condition that has an estimated prevalence of 2–3%1. Despite the unitary nosological status of OCD (DSM-5), considerable heterogeneity of OCD symptoms exists. Several studies have looked into different symptom dimensions present in OCD; some have reported exploratory or confirmatory factor analysis based on Yale-Brown Obsessive Compulsive Scale Checklist (Y-BOCS-CL) items. Most of those studies have reported four or five main (second-order) factors, which in part depend on the methodology employed2,3, thereby suggesting a multidimensional model for OCD. This multidimensional nature of OCD has been confirmed by meta-analyses and systematic reviews4,5. Along these lines, specific instruments such as the Dimensional Yale-Brown Obsessive Compulsive Scale (DY-BOCS6;) have been developed to assess OCD severity in different symptom dimensions.

A range of clinical characteristics has been associated with OCD symptom dimensions in different ways. In this vein, it has been proposed that the hoarding and symmetry dimensions are characteristic of an early-age OCD group7,8. In terms of comorbidities, a factor comprising aggressive, sexual and religious symptoms has been associated with comorbid major depressive disorder and bipolar disorder (MDD/BD); while patients with symmetry/order symptoms show greater comorbidity with eating and addictive disorders as well as attention-deficit/hyperactivity disorder (ADHD)9. Meanwhile, the contamination/cleaning dimension has been reported as the dimension that is least frequently associated with any other Axis I disorder10. In addition, the hoarding and symmetry/order dimensions have been associated with a poorer response to pharmacological treatment8,10.

Differential endophenotypic profiles have also been reported in relation to symptom dimensions in neuroimaging studies. For instance, OCD probands with symmetry/order symptoms have been reported to present a reduced volume of the right precentral gyrus11 and the hippocampus, which is also associated with aggressive/checking symptoms when compared to healthy controls12. In addition, the aggressive/checking and contamination/cleaning dimensions have been negatively correlated to right cerebellum and right insula volumes, respectively13. In terms of functionality, patients with aggressive/checking and sexual/religious symptoms have been found to present greater amygdala activation when confronted by fear-inducing stimuli14. Also, differences in connectivity have been observed between the aggressive/checking, sexual/religious and hoarding dimensions15. Although we do not know to what extent these observed neurological differences between the OCD dimensions have a genetic basis, some of the structural and functional brain characteristics identified have been directly related to certain genetic variants in OCD patients as well as in those with other psychiatric disorders16,17,18,19,20.

In fact, specific genetic bases have been identified for the different OCD symptom dimensions. For instance, severity of the contamination/cleaning dimension has been associated with both the Met allele of the Vall66Met locus within the brain-derived neurotrophic factor gene (BDNF)21 and the c.256G allele of the 5-hydroxytryptamine receptor 3E (HTR3E) variant rs762761522. The presence of this dimension has also been associated with the variants rs4657411 within the LIM homeobox transcription factor 1 alpha gene (LMX1A), and rs2075507 of the catechol-O-methyltranspherase gene (COMT) in women23,24. In addition, a protective role against these dimensions has been attributed to the ACCCG haplotype of the estrogen receptor 1 gene (ESR1)25. The symmetry/order dimension has been related to the S allele and the SS genotype of the serotonin transporter polymorphic region (SERTPR)26; the presence of the 2R allele within the dopamine receptor D4 gene (DRD4) 48-bp variable number of tandem repeats polymorphism (VNTR)27; and the A allele in the COMT variant rs2075507 in men24. This dimension in combination with aggressive/checking behavior has been associated with specific variants in a promoter region of the glutamate ionotropic receptor NMDA-type subunit 2B gene (GRIN2B) (rs1019385)28 and the SLIT and NTRK-like family member 1 gene (SLITRK1) (rs9593835); the latter, specifically in men23. The SERTPR has also been associated with higher scores in a religious/somatic dimension (l/s and l/l genotypes)29 and in counting and repeating rituals in OCD patients with a comorbid tic disorder (l/l genotype)30. Women who exhibit hoarding symptoms have been reported to present a higher frequency of the Met/Met genotype of the COMT variant rs4680 than those who do not. Hoarding has also been associated with a variant (rs1017412) within the neurotrophic receptor tyrosine kinase 3 gene (NTRK)31 and with both the short variant of the serotonin-transporter-linked polymorphic region (5HTTLPR), and its long variant together with the G allele at rs25531 in males32. A neutralization dimension, as assessed by the Obsessive-Compulsive Inventory-Revised (OCI-R) has been associated with a variant of LMX1A (rs4657411)23. Severity scores in a dimension comprising somatic and sensory phenomena symptoms have shown a trend towards an association with the Val58Met genotype of the COMT gene in interaction with sex, with women presenting lower scores33.

Although a large number of studies have focused on elucidating the genetic basis of OCD, inconsistent results have been reported in most respects34. A possible explanation for this is that the studies do not usually consider different symptom profiles among OCD patients. It has consistently been argued that it is necessary to account for OCD heterogeneity in genetic and neurobiological studies35,36. Therefore, in this study, we analyze the variants, genes and functional pathways that might be differentially involved in the OCD dimensions measured by the DY-BOCS6 through an exploratory genomic method. We hypothesize that different genomic bases will be found for the different OCD dimensions.



Three hundred and ninety-nine Caucasian Spanish patients (n = 399; 210 women; mean age = 35 ± 11) with an OCD diagnosis were recruited from the OCD clinic at Bellvitge Hospital (Barcelona, Spain). Diagnoses were made by three psychiatrists with extensive clinical experience in OCD, following the DSM-IV criteria for OCD diagnosis37 using the Structured Clinical Interview for DSM-IV Axis I Disorders-Clinician Version (SCIDCV)38. All the patients had the disorder for at least one year. Those patients presenting psychoactive substance abuse/dependence (current or in the past six months), psychotic disorders, intellectual disability, severe organic or neurological pathology (except tic disorders), or autism spectrum disorders were excluded from the study. Other affective and anxiety disorders were not criteria for exclusion in cases where OCD was the main diagnosis.

Participants were required to give written consent after being fully informed about the study. The study was approved by the Ethical Committees of Bellvitge Hospital and was performed in accordance with the Helsinki Declaration of the World Medical Association.

Clinical assessment

Medical data and both sociodemographic and clinical characteristics were collected via a structured interview during each patient’s first appointment at the clinic.

Age at onset was defined as the moment when obsessive symptoms reached a clinically significant level. Family psychiatric history was considered dichotomously, but specific information regarding family history of OCD, Tourette syndrome and depression was also collected. Only family members who had received a formal diagnosis were considered to be affected.

Baseline severity of obsessive and compulsive symptoms was also assessed through the clinician-administered version of the Y-BOCS39 during the patient’s first visit to the clinic. A global measure, as well as independent measures for both obsessions and compulsions, were recorded.

OCD Dimension: presence and severity

Dimension-specific presence and severity were evaluated using the DY-BOCS6, which is composed of a self-report part and a clinician-rated instrument. It assesses OCD severity in six different symptom dimensions that gather together thematically similar symptoms. The six are: aggressive obsessions and checking compulsions (aggressive/checking); symmetry obsessions and order compulsions (symmetry/order); contamination obsessions and cleaning/washing compulsions (contamination/cleaning); hoarding obsessions and compulsions (hoarding); sexual or religious obsessions accompanied by different rituals (sexual/religious); and a miscellaneous dimension including other obsessive thoughts and compulsive behavior (miscellaneous). We did not consider this last dimension in our analyses given its lack of specificity.

Genotyping data and quality control

Our sample consisted of 399 OCD patients genotyped using the Infinium PsychArray-24 BeadChip from Illumina. This array was developed in collaboration with the Psychiatric Genomics Consortium (PGC) and includes 50,000 variants associated with common psychiatric disorders. Variant calling was performed using three different algorithms: GenCall, which is Illumina’s default calling algorithm, and Birdseed, both for common variants; and zCall, aimed at rare variant calling. A unique set of genotypes was derived from a consensus merge of the GenCall and Birdseed common variants, also including rare variants called by zCall that passed quality control (QC) from the consensus merge of GenCall and Birdseed.

QC filtering of genotype data was performed using PLINK40. Only non-monomorphic autosomal biallelic variants in Hardy Weinberg equilibrium (p < 0.0001) with a call rate of above 98% were included.

Samples that had a call rate lower than 98% were removed. Identity by descent was calculated using independent SNPs, and omitting those samples with a pi-hat greater than 0.241. Population stratification was tested by principal component analysis, removing those samples that deviated by more than 5 standard deviations (SDs) from the mean in the first two components.

Statistical analysis

Association analyses at the SNP level were performed using the GenABEL library for R software42. Regression analyses were carried out under a log-additive model (in which the genotypes were coded as 0, 1 or 2 depending on the number of minor alleles). These operations were performed for autosomal SNPs (markers showing a minor allele frequency (MAF) > 0.05 in autosomes). Given the non-normality of the DY-BOCS scores and the impossibility of normalizing them, these variables were dichotomized to analyze the presence/absence of the different DY-BOCS dimensions. A logistic regression analysis was performed for each dimension, with the dependent variable being the dimension we were testing. Age, sex, and the four other dimensions were included as covariates in all the models. Linkage-disequilibrium (LD) plots were designed using the LocusZoom software, based on 1000 genome CEU population data (hg19/1000 Genomes Mar 2012 EUR)43. For SNP annotation, we used the Infinium PsychArray Gene Annotation File provided by Illumina (

Power calculations were performed with Genetic Association Study Power Calculator software ( to determine the power of our study to detect associations considering our sample size.

Gene-based association analyses were performed via the Sequence Kernel Association Test (SKAT)44 using the SKATMeta library45 for R software. This type of gene-based analysis has the advantage of including rare variants, which were not considered in SNP-based analyses given the lack of statistical power for detecting associations with these variants at a single-variant level. SKAT combines the effects of common and rare variants in gene-sets, increasing the power to detect small effects.

Only results from genes with at least two genotyped markers were considered. A false discovery rate (FDR) correction was used as a significance criterion.

Finally, enrichment analyses were performed for genes with at least two genotyped markers that had a SKAT p-value lower than 0.01 using web-accessible DAVID (Database for Annotation, Visualization, and Integrated Discovery) Bioinformatic Resources v6.846,47. This software analyzes input genes in the context of a genomic background (in this study, we selected the entire human genome as background) to cluster genes enriched in biological pathways and gene ontologies. We ordered the reported results by the FDR statistic to prioritize for further interpretation.


Subjects and genotyping quality control

Three hundred and seventy-six (n = 376) samples passed quality control procedures. Table 1 summarizes the sociodemographic and clinical data including DY-BOCS scores gathered for the final OCD sample.

Table 1 Sociodemographic and clinical characteristics of the sample of 376 OCD patients.

SNP-level association analyses

Our total dataset consisted of 338,357 autosomal markers, of which 258,937 were SNPs (MAF ≥ 0.05).

No SNP exceeded the statistical threshold for genome-wide significance (p < 5 × 10−8) in any dimension (Fig. 1). The results at a p value ≤ 10−4 for the five dimensions can be seen in Table 2. Suggestive associations (p < 10−5) were found with the aggressive, contamination, order, and hoarding dimensions (Fig. 1a–d).

Fig. 1: Manhattan plots for the genome-wide association analyses of genetic variations and OCD dimensions.
figure 1

a Aggressive dimension. b Contamination dimension. c Order dimension. d Hoarding dimension. e Sexual/religious dimension. A blue line indicates the level of suggestive evidence of association (p < 1 × 10−5).

Table 2 Results from SNP_level regression analyses on OCD dimensions.

In the order dimension, eight variants presented suggestive associations (p < 10−5). Six of these variants were within chromosome 12 and, given the proximity between them (5 of them were variants of the IPO8 gene), we performed LD analysis on this region. We found two clusters of SNPs: (1) rs7316477 (1.50 × 10−6), rs14139 (1.50 × 10−6), rs6487927 (1.58 × 10−6) and rs10771752 (1.83 × 10−6), all exonic variants located in the IPO8 (Importin 8) gene (Fig. 2a); and (2) rs6487928 (2.99 × 10−6) and rs12146709 (3.70 × 10−6), exonic variants located in IPO8 and CAPRIN2 (Caprin Family Member 2), respectively (Fig. 2b). All these markers should be considered as forming a single association peak, since the level of association was not maintained for any of them when adjusting the model for each of the other markers.

Fig. 2: Regional association plots with LD reported for Order dimension-suggestively associated (p < 10-5) single-nucleotide polymorphisms (SNPs) in chromosome 12.
figure 2

a Plot of rs7316477. b Plot of rs6487928.

Gene-based association analyses

Gene-based analysis were performed for 22,017 (aggressive), 21,179 (order), 20,466 (contamination) and 22,472 (hoarding and sexual/religious) genes. Of these, 17,001 (aggressive), 16,467 (order), 15,790 (contamination), and 17,378 (hoarding and sexual/religious) had at least two genotyped markers and were considered in our further analyses.

Our results for genes with at least a suggestive association (p < 10−4) in the five dimensions can be seen in Table 3. One gene reached genome-wide significant association with hoarding (SET Domain Containing 3, Actin Histidine Methyltransferase, SETD3; p = 1.89 × 10−08). This gene codes for a protein involved, among other functions, in actin binding and modification, histone methylation, chromatin organization, and regulation of transcription48. The second most significant gene was CPE (Carboxypeptidase E), which reached genome-wide significant association with the aggressive dimension (p = 4.42 × 10−6). This gene codes for a membrane protein involved in the synthesis of peptide hormones and neurotransmitters as well as different neurotrophic functions49.

Table 3 Best gene-based results from SKAT analyses for the different OCD dimensions.

Functional annotation

Functional annotation was gathered for a final set of 154 (aggressive), 103 (contamination), 111 (order), 196 (hoarding) and 167 (sexual/religious) genes. The threshold used to select the genes included in the enrichment analyses (p < 0.01) included 1% of the genes in all the dimensions except the hoarding dimension, for which 2% of the genes presented a p-value lower than 0.01.

Detailed information on the results of these analyses are given in Tables 4 and 5. In the case of the hoarding dimension, the different pathways obtained are clustered in two groups according to their biological similarity.

Table 4 Results from enrichment analyses on Aggressive, Order and Sexual/religious dimensions.
Table 5 Results from enrichment analyses on Hoarding dimension.


In this study, we examined the genomic bases of each of the most consistently validated symptom dimensions of OCD. Differential findings were obtained for the five dimensions considered. At the SNP-level, no variant reached genome-wide significance. The top SNPs were six markers in the order dimension (forming a single association peak) and one variant in the hoarding dimension (p < 5 × 10−6). Gene analyses showed one gene associated with hoarding (SETD3, p = 1.89 × 10−8) and another with the aggressive dimension (CPE, p = 4.42 × 10−6) at a genome-wide level. Different pathways or biological processes were represented by the aggressive, order, sexual/religious, and hoarding dimensions. For contamination, no pathway remained associated after FDR correction.

Six of the top variants at the SNP-level analysis conformed a genomic region involving IPO8 and CAPRIN2, which presented an association signal with the order dimension. Certain intergenic variants near CAPRIN2 presented association signals (p < 5 × 10−5) with different neuropsychological variables and personality traits (; Considering the LD findings for this region and the fact that the six variants represent a single peak of association, it may be interesting to consider this genomic region in further studies, since it could be a relevant for the order dimension.

Gene-based analyses reported one gene associated with hoarding at the genome-wide significance level (SETD3, p = 1.89 × 10−8). That gene is expressed in brain regions associated with OCD50,51 such as caudate and cerebellum (The Human Protein Atlas, SETD3, gene available from and seems to mediate transcriptome changes in the hypothalamus of mice52. It has also been associated with apoptotic processes, which in turn have been observed to mediate the neuronal loss in certain brain regions of BD patients53. Although two SNPs located near SETD3 show suggestive associations with autoimmune and inflammatory conditions (rs2614463, P = 7.00 × 10−6; rs2664299, P = 9.00 × 10−6), none of them are in LD with any of the SETD3 variants in our analysis—rs12886549 (P = 0.68, MAF = 0.24, OR = 1.09); rs8015827 (P = 0.79; MAF = 0.39; OR = 1.05); rs34322735 (P = 1.72 × 10-08, MAF = 0.01, OR = 101.39). CPE, which was associated with the aggressive dimension, is a gene highly expressed in brain54. Different polymorphisms in this gene have been associated with a loss of neuroprotective function by reducing the effects of the oxidative stress in human cell lines55 and transgenic mice56, leading to memory deficits and depressive behavior56. In addition, CPE-knockout mice have shown neurodegeneration in the hippocampus and the prefrontal cortex57.

In relation to the aggressive dimension, enrichment analyses showed overrepresentation in response to zinc ions (GO:0010043, FDR = 0.002). Zinc ions are highly prevalent in the brain, being especially prominent in forebrain glutamatergic neurons, the hippocampus, and the amygdala58,59, and they plays an important role in neuronal plasticity60. Zinc deficiency has been associated with cognitive decline, Alzheimer’s disease (AD) and different psychiatric disorders in the elderly. In addition, genetic variants within ZNF142, a gene coding for a zinc-finger protein, have been associated with a neurodevelopmental disorder resulting in speech impairment and intellectual disability61. The Peroxisomal lipid metabolism pathway (R-HSA-390918) was also significantly enriched in the aggressive dimension. Lipid metabolism has been extensively associated with different neuropsychiatric disorders, such as BD and depressive disorders, as well as AD62,63,64,65. Similarly, the sphingolipid signaling pathway (hsa04071) appears overrepresented in the order dimension. Sphingolipids are structural elements of cellular membranes and they play a role in cell signaling, differentiation and proliferation, apoptotic processes and inflammation66. Sphingolipid signaling has been observed to be involved in anxiety-like behavior in animal models as well as schizophrenia, depression and BD66,67,68,69.

Sexual/religious dimension genes were found enriched for G alpha (12/13) signaling events (R-HSA-416482). G12/13 subunits are alpha units of heterotrimeric G proteins that regulate different cell processes through the use of guanine nucleotide exchange factors (GEFs). This family of G-protein subunits has been associated with neurodevelopment and is involved in processes of cell proliferation and migration70. In addition, G12 subunits have been observed to influence memory consolidation and contextual retrieval in mice, via increased expression in the hippocampus71.

For the hoarding dimension, the first cluster of biological mechanisms and pathways includes cellular metabolic processes, such as lipid, vitamin and carbohydrate metabolism, all involving glucuronidation processes, as most genes included in these mechanisms or pathways code for UDP-glucuronosyltransferases (Table 5). It has been demonstrated that an alteration of the activity of these enzymes can affect brain function72. As an example, induction of UDP-glucuronosyltransferase 1A1 during the prenatal period can cause neurodevelopmental disorders in mice73.

There is increasing evidence from metabolomic studies of the importance of metabolic processes in psychiatric disorders. Post-traumatic stress disorder (PTSD) has been associated with the alteration of different kinds of metabolites, such as monosaccharides, nucleosides or fatty acids74. Furthermore, there is evidence of dysfunctional metabolism of lipids and vitamins in depressed patients75, which may explain the high prevalence of comorbid metabolic syndrome and cardiovascular disease76.

A lower plasma concentration of certain lipids with a neuroprotective role also has been observed in PTSD patients, compared to healthy controls77. Altered lipid metabolism has been found in other psychiatric conditions, such as MDD, BD or schizophrenia, and associated with symptoms including anxiety, stress and cognitive impairment78. Lipid metabolism in turn influences steroid synthesis, which has been associated with brain electrical activity through the role of lipids in modulating neuronal excitability79. In addition, an alteration of porphyrin and chlorophyll metabolism might affect the formation of heme groups, possibly leading to neurotoxic effects, among others80,81.

The second cluster of biological mechanisms and pathways for the hoarding dimension is related to anion transport. Most of the hoarding dimension genes involved are members of the solute-carrier 26 family A (SLC26A). These transporters have been observed to influence, among other functions, microbiome composition, pH regulation, and anion transport82, which in turn have been related with the pathophysiology of different psychiatric disorders83. More specifically, anion transport and pH regulation in the brain play a role in intra- and inter-signaling and plasticity processes84. In relation to microbiome composition, late-onset autism has been associated with differences in the gastrointestinal microflora when compared to healthy controls85. Moreover, exposure to certain microbial pathogens during fetal development has been associated with the pathogenesis of schizophrenia in humans86, and both anxiety-like behavior and cognitive impairment in rodents87,88. It is interesting to note the increase in the genetic weight observed in the hoarding dimension when the rare variants are included in the analysis, since only two variants at the SNP level reach suggestive association and no SNP reaches genome-wide significance. This contrasts with the findings obtained for this dimension in the gene-based and pathway analyses, which are notably more numerous than for the other dimensions. We think this suggests a role for rare variants in the hoarding dimension. We also believe that the consistently observed higher heritability of this dimension, compared to the others9,10,89,90, could mostly be explained by the influence of rare variants. Further research is needed to reveal the genetic bases of the hoarding dimension.

Although OCD symptom dimensions overcome the unitary clinical diagnosis of OCD, subtyping OCD according to overt clinical manifestations also presents significant limitations. Due to methodological differences, no concrete OCD dimensions classification system has been universally accepted. Some authors argue that other taxometric methods should be used to elucidate the symptom dimensions in OCD, including age of onset, comorbidities, or neuropsychological functioning in combination with clinical manifestations. The lack of significant associations among OCD symptom dimensions and individual SNPs could reflect limited statistical power due to the small sample size. Considering our total sample size, the number of cases and controls for each OCD dimension, a significance threshold of p < 2 × 10−7 (Bonferroni threshold for 258,000 SNPs analyzed) and a risk allele frequency (MAF) of 0.1, our study has 80% power to detect a relative risk (RR) of 3.5 and 5 (equivalent to OR = 4.8 and OR = 9) for order and sexual/religious dimensions, respectively; for the three other OCD dimensions, the detectable RR under these conditions is 4 (equivalent to OR = 6). A representation of the statistical power achieved for different MAF and RR thresholds is shown in Supplementary Fig. 1 (Fig. S1a–e). Furthermore, we would have liked to consider the severity score for each dimension in our analysis, in addition to their presence/absence, which would have been possible with a larger sample. However, our sample was thoroughly characterized phenotypically, and our results highlight important differences in relation to the genetic bases, as well as the genetic load of the different OCD dimensions. In addition, rare variants are considered at gene-based and pathway analyses, since this kind of analysis increases the power of detecting small effects. The inclusion of rare variants is important given the growing appreciation for their importance in neuropsychiatric disorders91,92,93.

OCD is a highly heterogeneous disorder in terms of symptom profile, comorbidity and underlying brain substrate, which represents a challenge for understanding and treating the disorder. This heterogeneity may confound and contribute to mostly negative findings in current genome-wide analysis studies, despite clear evidence for a strong genetic component of the disorder based on twin and family studies, ranging from 40–65%. Therefore, broad consensus has emerged for the need to explore OCD not as a homogeneous diagnosis, but rather considering other phenomenological approaches that investigate more refined phenotypes. In this sense, investigating genetic markers associated with different OCD symptom dimensions could be a useful strategy to begin disentangling the complex genetic vulnerability of the disorder. A clearer identification of susceptibility genes for OCD would translate into a better understanding of the etiology of the disorder and would help to develop potentially targeted and specific treatment approaches to improve the long-term outcome for OCD patients.