Obsessive-compulsive disorder (OCD) is a heterogeneous psychiatric condition characterized by persistent, intrusive thoughts and urges (obsessions) and repetitive, intentional behaviours (compulsions)1. OCD affects 2–3% of the world’s population2,3. OCD is moderately heritable, with approximately 40% of the phenotypic variance explained by genetic factors, and higher heritability has been reported in childhood-onset OCD4,5,6. The genetic architecture of OCD is complex, with multiple genetic variants of small effect size contributing to its etiology. This has hampered the identification and replication of genetic susceptibility factors. A meta-analysis of hypothesis-driven candidate gene association studies has implicated serotonergic and catecholaminergic genes in OCD, while studies focusing on glutamatergic and neurotrophic genes have shown inconsistent results7. Neither of the two independent genome-wide association studies (GWASs) of OCD8,9 nor a subsequent meta-analysis of both (2,688 cases and 7,037 controls, including children, adolescents and adults)10 yielded genome-wide significant findings, likely due to lack of power. However, the meta-analysis demonstrated that the polygenic signal from either sample predicted OCD status in the other sample, indicating the polygenic nature of the disorder10.

The diagnosis of OCD is based solely on clinical symptoms, and no genetic or biological markers are available with sufficient specificity and accuracy to be clinically actionable11. However, factor analyses of OCD symptoms have consistently identified specific OCD symptom clusters or dimensions, with the most reliable including contamination/cleaning, doubt/checking, symmetry/ordering, and unacceptable/taboo thoughts11,12,13,14.

Obsessive-compulsive symptoms (OCS) are also present in the general population15,16,17. Indeed, 21–38% of individuals in the population endorse obsessions and/or compulsions, although only a small minority (2–3%) meet the DSM-5 criteria for a clinical OCD diagnosis18,19. OCS are also heritable, with heritability estimates ranging from 30 to 77%20,21. In addition to contributing to overall OCS, genetic factors contribute to specific OCS dimensions, including contamination/cleaning22,23,24 and checking/ordering24,25. Genetic overlap between clinical OCD and OCS in the population is suggested by the finding that polygenic risk scores (PRS) based on OCD GWAS data significantly predict OCS in two population-based samples of 6,931 and 3,982 individuals, respectively20,26. Moreover, a very recent analysis found that compulsive symptoms in the general population overlap with the genetic liability for clinical OCD27, and incorporating compulsive symptom GWAS data in a meta-analysis with OCD GWAS data yielded new findings in gene-based and gene enrichment analyses. Therefore, genetic studies of OCS in the general population could aid in the identification of susceptibility loci for clinical OCD and provide insight into specific symptom domains affected by individual genetic risk factors.

In a study aiming to identify molecular mechanisms underlying OCD, we previously integrated the top-ranked results from the existing GWASs with genes implicated in OCD through other lines of evidence. This resulted in a ‘molecular landscape’ that suggested the involvement of genes regulating postsynaptic dendritic spine formation and function through central nervous system (CNS) insulin-dependent signaling28. Support for a role of dysregulated insulin signaling in OCD and OCS comes from studies showing increased OCS in men with type 1 diabetes29 and from a study indicating that OCD patients have a higher risk of developing type 2 diabetes (T2D)30. Furthermore, OCS were found to be positively correlated with blood levels of glycosylated hemoglobin (HbA1c), a diagnostic measure of T2D31, and OCD patients had markedly higher levels of fasting glucose (a characteristic of T2D)32.

In this paper, we aimed to assess the presence and extent of genetic overlap between OCD, OCS in the population, and insulin signaling, using the largest available data sets. Specifically, we parsed phenotypic heterogeneity using an exploratory factor analysis of OCS measured in a population cohort of children and adolescents. Subsequently, using PRS-based analyses, we investigated the presence of shared genetic etiologies between OCD and the total and factorized OCS. We then assessed genetic sharing between OCD, OCS, and insulin-related traits. In addition to assessing shared genetic etiologies, we tested for potential overlapping biology using gene-set analyses. Lastly, we extended our findings in an independent population cohort of children and adolescents.


Sample, phenotypic, and genetic data

We studied OCS in the Philadelphia Neurodevelopmental Cohort (PNC)33,34,35,36,37, which includes 8,719 children and adolescents aged 8–21 years with neurobehavioral phenotypes and genome-wide genotyping data. Participants in the PNC provided written consent for genomic studies when they presented to the Children’s Hospital of Philadelphia health care network. OCS were assessed with GO-ASSESS, a computerized version of the Kiddie-Schedule for Affective Disorders and Schizophrenia (K-SADS)38. For the current study, we selected 22 GO-ASSESS questions that corresponded to the diagnostic criteria for OCD (Supplementary Table 1). Participants were included if they answered the questions related to obsessions and/or compulsions. If those questions were all answered “no”, we allowed the questions on the consequences of obsessions and compulsions to be left blank, as no consequences are expected if no symptoms are present. The scores for each of the questions (0 for “no” and 1 for “yes”) were then summed to create a total OCS score (range 0–22). Genome-wide genotyping in the PNC was performed in waves using six different genotyping platforms (details in Supplementary Methods). As a primary aim of our study was to assess the genetic overlap between OCD and OCS in the population, we only used phenotypic and genetic data from those PNC participants who answered positively to at least one of the questions related to the presence of obsessions and/or compulsions. This resulted in a final sample of 650 individuals for the subsequent factor and genome-wide association analyses.
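The scoring and inclusion rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes a hypothetical answer encoding (1 = “yes”, 0 = “no”, `None` = left blank), a hypothetical split of the 22 questions into symptom items and consequence items (the actual mapping is in Supplementary Table 1), and that all symptom items must be answered for a participant to be scorable.

```python
from typing import Optional, Sequence

# Hypothetical encoding: 1 = "yes", 0 = "no", None = question left blank.
Answer = Optional[int]


def total_ocs_score(symptom_items: Sequence[Answer],
                    consequence_items: Sequence[Answer]) -> Optional[int]:
    """Sum the yes/no answers; return None if the participant must be excluded."""
    # Assumption: every obsession/compulsion (symptom) item must be answered.
    if any(a is None for a in symptom_items):
        return None
    has_symptom = any(a == 1 for a in symptom_items)
    if any(a is None for a in consequence_items):
        # Blank consequence items are tolerated only when no symptom was endorsed,
        # because no consequences are expected then; blanks are counted as 0.
        if has_symptom:
            return None
        consequence_items = [0 if a is None else a for a in consequence_items]
    return sum(symptom_items) + sum(consequence_items)


def in_analysis_sample(symptom_items: Sequence[Answer],
                       consequence_items: Sequence[Answer]) -> bool:
    """Keep only scorable participants with at least one endorsed symptom item."""
    score = total_ocs_score(symptom_items, consequence_items)
    return score is not None and any(a == 1 for a in symptom_items)
```

Applied to the PNC, a filter of this kind corresponds to the step that reduced the cohort to the 650 participants used for the factor and genome-wide association analyses.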

Factor analysis

First, using SPSS 23 (SPSS Technologies, Armonk, NY, USA), we determined the internal consistency (Cronbach’s α) of the 22 questions that constitute the total OCS score in the 650 PNC participants. We then conducted a factor analysis of the scores on the 22 questions with Promax rotation to determine the number of factors that, combined, explain the largest portion of the observed variance in the total OCS score. Specifically, we considered scree plots, eigenvalues > 1 and the cumulative variance explained when selecting the number of factors, and assigned each question to the factor on which it had the highest absolute loading.
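Two of the quantities used above, Cronbach’s α and the eigenvalue > 1 (Kaiser) criterion together with the cumulative variance explained, can be computed with NumPy as in the sketch below. It assumes a participants × items matrix of 0/1 scores and uses the eigenvalues of the Pearson item correlation matrix. It does not reproduce SPSS’s factor extraction or the Promax rotation, which would require a dedicated factor-analysis routine.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a participants x items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # sample variance of the summed score
    return float(k / (k - 1) * (1.0 - item_vars.sum() / total_var))


def kaiser_summary(items: np.ndarray):
    """Eigenvalues (descending), count of eigenvalues > 1, cumulative variance share."""
    corr = np.corrcoef(items, rowvar=False)     # item x item correlation matrix
    eigvals = np.linalg.eigvalsh(corr)[::-1]    # eigvalsh returns ascending order
    n_factors = int(np.sum(eigvals > 1.0))
    # Eigenvalues of a correlation matrix sum to the number of items.
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, n_factors, cum_share
```

The eigenvalue count gives only a starting point; as described above, the scree plot and cumulative variance were weighed alongside it.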

Genome-wide association analyses

Quality control filtering was applied to the genetic data to remove non-European individuals and single nucleotide polymorphisms (SNPs) with low minor allele frequency (MAF < 0.05), poor genotype call rate (< 95%), or deviation from Hardy-Weinberg equilibrium (P < 1 × 10−6) (details in the Supplementary Methods). The imputation protocol used MaCH39 for haplotype phasing and minimac40 for imputation. Imputed SNPs with low imputation quality (r2 < 0.8) or low MAF (< 0.05) were removed. If the distribution of the total OCS score or of an OCS factor score was approximately normal (i.e., skewness and kurtosis both between −1 and 1), we used a continuous trait design for the genome-wide association analysis. Otherwise, we used a pseudo case-control design, in which all individuals with a score of 0 for a factor were defined as ‘controls’ and compared against the ‘pseudo cases’, i.e., all individuals with a score of 1 or more for that factor. GWASs were carried out with mach2qtl39 using as phenotypes the total OCS score and the scores for those factors that showed sufficient variation, with age, gender, and the four principal components from multidimensional scaling (MDS) included as covariates. GWASs were performed separately for each genotyping platform and combined in an inverse-variance-weighted meta-analysis using METAL41, accounting for genomic inflation.
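The design choice and the cross-platform meta-analysis described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the mach2qtl or METAL implementation. Skewness and kurtosis are computed as plain moment estimators, with kurtosis expressed as excess kurtosis so that a normal distribution scores 0; statistical packages often apply small-sample corrections instead. The genomic-inflation adjustment is modeled, as an assumption, by inflating each platform’s standard errors by √λ when λ > 1, with the per-platform λ values estimated beforehand.

```python
import math

import numpy as np


def choose_design(scores):
    """Return ('continuous', scores) or ('pseudo_case_control', 0/1 labels)."""
    x = np.asarray(scores, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    if m2 == 0:
        raise ValueError("scores show no variation; phenotype not taken forward")
    skewness = np.mean(d ** 3) / m2 ** 1.5
    excess_kurtosis = np.mean(d ** 4) / m2 ** 2 - 3.0
    if -1 <= skewness <= 1 and -1 <= excess_kurtosis <= 1:
        return "continuous", x
    # Pseudo case-control: score 0 -> control (0), score >= 1 -> pseudo case (1).
    return "pseudo_case_control", (x >= 1).astype(int)


def ivw_meta(betas, ses, lambdas=None):
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP across platforms."""
    b = np.asarray(betas, dtype=float)
    s = np.asarray(ses, dtype=float)
    if lambdas is not None:
        # Assumed genomic-control step: inflate SEs by sqrt(lambda) where lambda > 1.
        s = s * np.sqrt(np.maximum(np.asarray(lambdas, dtype=float), 1.0))
    w = 1.0 / s ** 2
    beta = float(np.sum(w * b) / np.sum(w))
    se = float(math.sqrt(1.0 / np.sum(w)))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p
```

Weighting each platform by 1/SE² means that larger, more precise platform-specific GWASs dominate the pooled estimate.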

Shared genetic etiology analyses


First, we determined the level of shared genetic etiology between diagnosed OCD and OCS in the population. For this, we used the summary statistics from the meta-analysis of the two published GWASs of OCD10 (data provided through the Psychiatric Genomics Consortium (PGC) for 2,688 OCD cases and 7,037 controls) as the ‘base’ sample for polygenic risk score (PRS)-based analyses in PRSice42. The summary statistics from the GWASs of the different OCS in the PNC were used as the ‘target’ samples for the PRS-based analyses. For details see Supplementary Methods.
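To make the base/target logic concrete, the sketch below tests a p-value-thresholded polygenic score using only summary statistics, via the inverse-variance-weighted score approximation (a weighted regression of target effect sizes on base effect sizes). It is an illustration under assumptions, not PRSice’s code. SNPs are assumed to be LD-clumped (approximately independent) and allele-aligned between base and target, base effect sizes serve as weights, a single threshold is shown rather than a scan over thresholds, and R2 is approximated from the test statistic as z²/(z² + n − 2), the relation that holds exactly for the t statistic of a single-predictor least-squares regression. The `Snp` record and its fields are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Snp:
    base_beta: float    # effect in base GWAS (e.g., log odds ratio for OCD)
    base_p: float       # base GWAS p-value
    target_beta: float  # effect in target GWAS, same effect allele as base
    target_se: float    # standard error of target_beta


def summary_prs(snps: Iterable[Snp], p_threshold: float, n_target: int):
    """Score test of a thresholded polygenic score against target summary statistics."""
    sel = [s for s in snps if s.base_p < p_threshold]
    if not sel:
        raise ValueError("no SNPs pass the p-value threshold")
    # Weighted least-squares slope of target effects on base weights (no intercept).
    num = sum(s.base_beta * s.target_beta / s.target_se ** 2 for s in sel)
    den = sum(s.base_beta ** 2 / s.target_se ** 2 for s in sel)
    slope = num / den
    se = 1.0 / math.sqrt(den)
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
    r2 = z ** 2 / (z ** 2 + n_target - 2)    # approximate variance explained
    return slope, se, p, r2, len(sel)
```

A positive slope indicates that alleles raising OCD risk in the base sample tend to raise the OCS score in the target sample.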

OCD, OCS and peripheral insulin signaling-related traits

To determine the level of genetic sharing between five peripheral insulin signaling-related traits and OCD as well as OCS, we conducted PRS-based analyses in PRSice42, as described above. As base samples, we used summary statistics from GWASs of the following peripheral insulin signaling-related traits: T2D and the blood levels of four T2D markers, namely HbA1c, fasting insulin, fasting glucose and glucose 2 h after an oral glucose challenge (2hGlu) (details in Supplementary Methods). As target samples, we used the summary statistics from the OCD GWAS meta-analysis and from the GWASs of the total OCS score and the OCS factor scores in the PNC. Correction for multiple comparisons across all tests performed (i.e., the tests of genetic sharing between OCD and OCS and those between the five insulin-related traits and OCD as well as OCS) was done using the Benjamini-Hochberg false discovery rate (FDR) method. For this, we pooled the p-values from all shared genetic etiology analyses and corrected them jointly43,44, similar to the approaches used in earlier studies working with multiple phenotypes and PRSice45,46.
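The Benjamini-Hochberg adjustment applied to the pooled p-values can be written in a few lines. This is a generic textbook implementation (equivalent in intent to R’s `p.adjust(method = "BH")`), shown only to make the procedure explicit; it is not necessarily the tooling the analyses used.

```python
from typing import List, Sequence


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, taking p_(k) * m / k and enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then declared significant at a 5% FDR if its adjusted p-value is below 0.05.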

Gene-set analyses

We first compiled a set of genes encoding proteins from our molecular landscape of OCD28 (see above). In that paper by van de Vondervoort et al., the OCD landscape was built from proteins that had been implicated in the disease through different types of genetic evidence. First, proteins were included if their corresponding genes had been implicated in OCD through SNPs from the published GWASs that were associated at P < 1.00E-04 and located within the gene and/or within 100 kb of upstream or downstream flanking sequence. In addition, genes/proteins that had been implicated in OCD etiology in other ways were included: after critical evaluation of the literature, only genes/proteins supported by findings from (genetic) animal studies, gene mutations, and/or two or more independent candidate gene association studies (or at least nominal significance in a meta-analysis) and/or mRNA/protein expression studies were included. This selection resulted in a set of 51 autosomal genes for subsequent analyses. Using the GWAS results of the total OCS score and the scores for the OCS factors, competitive gene-set analyses were then performed using the Multimarker Analysis of GenoMic Annotation (MAGMA) software47 (see Supplementary Methods). P-values were considered significant if they fell below a Bonferroni-corrected threshold accounting for the number of phenotypes tested (P < 0.05/7 tests (total OCS score and six OCS factors) = 7.14E-03). For significant gene-set associations, we examined the individual gene-wide P-values and applied a Bonferroni correction (P < 0.05/51 genes in the gene set = 9.80E-04).
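The GWAS-based inclusion rule for the landscape genes (a SNP with P < 1.00E-04 located within the gene or within 100 kb upstream or downstream of it) and the two Bonferroni thresholds can be sketched as below. The gene coordinates, SNP tuples and gene names are hypothetical placeholders; in practice, all positions must share one genome build, and MAGMA performs its own SNP-to-gene annotation for the gene-set test itself.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

WINDOW_BP = 100_000  # flanking sequence allowed on each side of the gene


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # gene start position (bp)
    end: int    # gene end position (bp)


def genes_implicated(snps: Iterable[Tuple[str, int, float]], genes: List[Gene],
                     p_cut: float = 1e-4, window: int = WINDOW_BP) -> Set[str]:
    """Genes with at least one SNP (chrom, pos, p), p < p_cut, inside gene +/- window."""
    hits: Set[str] = set()
    for chrom, pos, p in snps:
        if p >= p_cut:
            continue
        for g in genes:
            if g.chrom == chrom and g.start - window <= pos <= g.end + window:
                hits.add(g.name)
    return hits


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests
```

For example, `bonferroni_threshold(0.05, 7)` gives about 7.14E-03 for the seven phenotypes, and `bonferroni_threshold(0.05, 51)` gives about 9.80E-04 for the gene-wide tests, matching the thresholds stated above.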

Validation analyses in an independent population sample

In order to validate and possibly extend our findings, we performed PRS-based and gene-set analyses using data from GWASs of OCS in an independent population sample: the ‘Spit for Science’ project, which includes 16,718 children and adolescents aged 6–17 years recruited from a local science museum48. OCS were measured using the Toronto Obsessive-Compulsive Scale (TOCS), a validated 21-item parent- or self-report questionnaire15. TOCS items are scored from −3 (far less often than others of the same age) to +3 (far more often than others of the same age). We first assessed which TOCS questions could be grouped into OCS factors similar to those derived from the PNC data. Only two OCS factors (namely ‘symmetry/counting/ordering’ and ‘contamination/cleaning’) were similar and could therefore be used for validation (Supplementary Table 2). Genome-wide genotyping data for 5,047 individuals of Caucasian descent were used in a continuous trait GWAS of each factor. A description of genotyping, quality control and imputation can be found elsewhere49, and GWAS details are given in the Supplementary Methods. Using summary statistics of the GWASs of the two TOCS OCS factors, we examined the shared genetic etiology between OCD and the TOCS OCS factors, and between the five peripheral insulin signaling-related traits and the TOCS OCS factors. As described above for the PNC data, multiple comparisons correction was done using the FDR method for all tests performed in the Spit for Science cohort. Gene-set analyses between the set of 51 genes from the OCD landscape and the two TOCS OCS factors were also performed.


Factor analysis

The internal consistency of the scores on the 22 OCS questions from the PNC was satisfactory (Cronbach’s α = 0.69). Supplementary Fig. 1A shows the distribution of the total score (mean = 6.4, s.d. = 3.35). Factor analysis revealed an eight-factor solution as the best-fitting model, explaining 58.6% of the variance in the total score. We named these eight OCS factors ‘impairment’, ‘symmetry/counting/ordering’, ‘contamination/cleaning’, ‘aggressive taboo thoughts’, ‘repetition’, ‘guilty taboo thoughts’, ‘distress’, and ‘religious taboo thoughts’ (Table 1; factor score distributions in Supplementary Fig. 1B).

Table 1 Item content of, and loadings on, the eight factors that constitute the best-fitting model explaining the variance in the total score of the 22 items from the obsessive-compulsive symptom questionnaire completed by 650 participants from the PNC cohort.

Genome-wide association analyses

Based on the score distributions, we used a continuous trait design for the GWASs of the total OCS score and the factors ‘impairment’, ‘symmetry/counting/ordering’, and ‘distress’, and a pseudo case-control design for the factors ‘contamination/cleaning’, ‘aggressive taboo thoughts’, and ‘guilty taboo thoughts’. The scores on the OCS factors ‘repetition’ and ‘religious taboo thoughts’ showed too little variation to be taken forward (Supplementary Fig. 1B). Because of limited power, with only 650 individuals per GWAS, we do not report individual GWAS results; instead, we used the GWAS results for the PRS-based analyses.

Shared genetic etiology analyses


We found statistically significant evidence for a shared genetic etiology between diagnosed OCD and the population-based OCS factor ‘guilty taboo thoughts’ (R2 = 2.28%; P = 2.52E-03) (Supplementary Fig. 2 and Table 2).

Table 2 PRS-based results for shared genetic etiology between OCD and the total OCS score as well as the scores for six OCS factors.

OCD, OCS, and peripheral insulin signaling-related traits

We found statistically significant evidence for a shared genetic etiology between T2D and ‘aggressive taboo thoughts’ (R2 = 1.86%; P = 5.95E-03) (Supplementary Fig. 3A and Table 3). Fasting insulin levels (R2 = 0.26%; P = 7.67E-05) and 2hGlu levels (R2 = 0.14%; P = 4.75E-03) each showed genetic sharing with OCD, whereas we found no evidence of genetic sharing for HbA1c or fasting glucose levels (Supplementary Fig. 3B–E and Table 3).

Table 3 PRS-based results for shared genetic etiology between five peripheral insulin-signaling-related traits and OCD and OCS.

Gene-set analyses

A MAGMA gene-set analysis of the 51 OCD landscape genes centered on CNS insulin signaling (33,329 SNPs; effective number of SNPs after adjusting for LD structure = 2,189) revealed a significant association with ‘symmetry/counting/ordering’ (P = 4.08E-03) (Supplementary Table 3). Within this gene set, no individual gene reached the Bonferroni-corrected gene-wide significance threshold (Supplementary Table 4). No significant associations were found with the total OCS score or the five other OCS factors.

Validation analyses in an independent population sample

Two OCS factors were similar between the PNC and Spit for Science cohorts, i.e., ‘symmetry/counting/ordering’ and ‘contamination/cleaning’ (Supplementary Table 2 and Supplementary Fig. 4A, B). Using summary statistics of the GWASs of these two factors, we found that diagnosed OCD shows genetic sharing with TOCS ‘symmetry/counting/ordering’ (R2 = 0.49%; FDR-adjusted P = 2.42E-05) and TOCS ‘contamination/cleaning’ (R2 = 0.23%; FDR-adjusted P = 4.07E-03).

We also observed a shared genetic etiology between T2D and TOCS ‘contamination/cleaning’ (R2 = 0.28%; FDR-adjusted P = 1.59E-03) (Supplementary Table 5 and Supplementary Fig. 5A–C). Gene-set analysis for the OCD landscape genes in the two TOCS OCS factors revealed no significant associations.

All results from the PRS-based analyses are summarized in Table 4.

Table 4 Summary of results from PRS-based analyses.


In this study, we extended previous work by assessing shared genetic etiologies between OCD, OCS in the population, and CNS and peripheral insulin signaling. While previous studies20,26 have reported a shared genetic etiology between OCD and the total population-based OCS score, our analyses using phenotypic and genetic data of 650 children and adolescents from the population (PNC cohort) found genetic sharing between OCD and the OCS factor ‘guilty taboo thoughts’. In the larger Spit for Science cohort (n = 5,047), we extended these results by showing genetic sharing between OCD and both ‘symmetry/counting/ordering’ and ‘contamination/cleaning’. Our findings are in keeping with the literature suggesting (at least partial) genetic overlap between OCD and population-based OCS20,22,23,24,27. Since OCD is genetically correlated with other psychiatric disorders (e.g., anorexia nervosa, major depressive disorder and Tourette syndrome50), future studies investigating OCS as (a) shared trait(s) between disorders could help elucidate the biological mechanisms underlying comorbidity.

OCD and OCS have been linked to altered CNS and peripheral insulin signaling. When testing for potential overlapping biology, we found a significant association between a set of 51 autosomal OCD genes centered around CNS insulin-regulated synaptic function and ‘symmetry/counting/ordering’. As for peripheral insulin signaling, we found genetic sharing between T2D and ‘aggressive taboo thoughts’ in the PNC data, and between T2D and ‘contamination/cleaning’ in the Spit for Science cohort. For two of the four T2D blood markers (fasting insulin and 2hGlu), we also identified a shared genetic etiology with OCD. These findings provide support for ‘dysregulated’ peripheral insulin signaling as a biological process contributing to both OCD and population-based OCS. Further evidence for a role of (altered) peripheral insulin signaling in OCD etiology comes from the observation that selective serotonin reuptake inhibitors (SSRIs), the first-line pharmacological treatment for OCD, improve diabetic parameters when used to treat depressive symptoms in T2D (i.e., they decrease HbA1c levels and insulin requirement and increase insulin sensitivity)51. Interestingly, SSRIs are particularly effective for treating harm-related obsessions, which form part of ‘aggressive taboo thoughts’52. This is in line with our finding of genetic sharing between T2D and ‘aggressive taboo thoughts’. In addition, a recent study demonstrated that bilateral deep brain stimulation (DBS), a safe and effective treatment option for pharmaco-resistant OCD, not only reduced OCD symptoms but also decreased fasting insulin levels in the blood of both OCD patients with T2D and non-diabetic OCD patients53. Moreover, insulin in the CNS, whether entering from the periphery by crossing the blood–brain barrier54 or synthesized in the CNS55, has important non-metabolic functions, including modulating synaptic plasticity56 and learning and memory57,58.

Although the relative contributions of dysregulated peripheral and CNS insulin signaling to OCD and OCS are not yet clear, we recently demonstrated that compulsivity observed in Tallyho (TH) mice, a rodent model of T2D, is potentially linked to disturbances in insulin signaling. TH mice displayed both compulsive behaviour and increased glucose levels in the dorsomedial striatum, which could be due to decreased action of peripheral and/or CNS insulin, and these glucose levels correlated with compulsivity59.

The current results should be viewed in light of some strengths and limitations. A strength is that we used quantitative symptom scores collected through questionnaires in the general population, which enabled us to derive OCS phenotypes suitable for GWAS. Using samples selected from the community may also reduce selection bias, which can occur when patient samples are analysed (e.g., individuals suffering from several comorbid disorders are more likely to present for clinical care)60. A limitation of the current study is the small sample size of the GWASs and the resulting limited power to discover new single genetic variant associations. However, this sample size was large enough to provide proof of concept for genetic sharing between OCD, OCS in the population, and insulin signaling. A second limitation is that the questions in the discovery and validation cohorts were not exactly the same, which may partly explain why some findings could not be validated. A third limitation may be that the proportions of variance in the target phenotypes explained by the base phenotypes are quite small. However, these variances explained are similar to or higher than those found in comparable analyses; e.g., the PRS derived from a GWAS of OCD explained (only) 0.20% of the variance in OCS in a population sample20. Moreover, because the variance explained depends on the size of the base sample used to generate the PRS61, the variances explained that we observed with the still relatively small OCD meta-GWAS as base sample may be underestimates.

In conclusion, we identified a shared genetic etiology between OCD, OCS in the population, and both CNS and peripheral insulin signaling. Our results imply that altered insulin signaling is not only relevant for somatic disorders but is also involved in the etiology of psychiatric disorders and related symptoms in the population, especially OCD and OCS. Further studies are needed to disentangle the contributions of peripheral and CNS insulin production and signaling to these disorders and symptoms.