Introduction

Obsessive-compulsive disorder (OCD) is a psychiatric disorder that affects around 2–3% [1, 2] of the general population and can result in severe psychosocial impairment if untreated. The disorder is characterized by excessive, unwanted thoughts (obsessions) and/or repetitive behaviors (compulsions) [3]. Despite OCD’s large burden on affected individuals and the health care system, up to date, no biomarker has been found to classify the disorder in a clinical setting or to aid clinicians to predict response to pharmacological or psychological treatment.

OCD is considered a multifactorial disorder in which the risk to develop the disease is defined by the complex interaction of genetics, epigenetics, and environmental factors. From a genetic perspective, twin studies have estimated that the heritability of OCD is 47–61% [4,5,6,7]. Despite this high heritability, genome-wide association studies in OCD have identified only one genetic locus reaching genome-wide significance [8]. This might be explained by the current lack of statistical power to identify genetic variants of small effects. An alternative explanation is that the missing heritability is due to gene x environment interactions contributing to the etiology of OCD [5, 9, 10]. For example, evidence from retrospective and longitudinal studies has shown that childhood trauma and other environmental risk factors may predispose to OCD, presumably in combination with genetics [11, 12]. Although it is widely accepted that environmental factors play a role in OCD pathophysiology, evidence is not unequivocal, and specific mechanisms are largely unclear. Also, many studies lack methodological quality so far [13, 14]. Interestingly, environmental factors are known to exert their effects on disease susceptibility through epigenetic modifications leading to the modulation of expression and co-expression of several genes [15,16,17,18,19]. In humans, the most studied epigenetic modification is the methylation of DNA (DNAm). The development of high throughput array technology enabled genome-wide assessment of DNAm for many individuals at a moderate cost [20, 21]. Epigenome-wide association studies (EWAS) have shed light on many psychiatric disorders such as depression [10], anorexia nervosa, Alzheimer’s disease [22, 23], or schizophrenia [24], complementing genetic research. Given their dynamic and modifiable nature, DNAm can be acquired or lost over the lifespan depending on environmental influences. Thus, epigenetic modifications may serve as biomarkers for gene x environment interactions, providing further insights into the molecular basis of OCD [25].

Exposure and response prevention (ERP) based cognitive-behavioral therapy (CBT) constitutes a first-line treatment for OCD with effect sizes being among the largest of all psychological treatments for mood, anxiety, and related disorders [26]. Still, a substantial proportion of OCD patients show an insufficient response to CBT [27], stressing the need to identify predictive markers of CBT response.

We recruited 384 participants from two German cities, Berlin and Bonn, to investigate the relationship between blood DNAm and OCD, which makes it the largest study to date. We first search for genomic loci showing differentially methylated sites between cases and healthy controls. We then computed a methylation profile score (MPS) to assess its classification power to differentiate cases from controls (Fig. 1), as well as its association with treatment response and symptom severity (Y-BOCS scale).

Fig. 1: A schematic representation of our analysis.
figure 1

After the first EWAS only on the Berlin data set (discovery stage), probes are filtered based on two different approaches and replicated in the Bonn data set. A all probes with p value > 0.05 in the discovery stage were removed and the remaining CpGs were further explored in the Bonn samples. The probes reaching a  q value ≤ 0.01 (BH correction) in the Bonn samples were considered differentially methylated (DMP). B probes with q values equal or lower to 0.05, 0.01, 10–4, 10–5, 10–10, 10–20, and 10–30 were used to compute different MPS values. Next, each MPS was used as an independent variable to classify the Bonn samples and to select the best treshold based on the AU-ROC metric. The set of CpGs used to build the best MPS was selected. The intersection of both methods (12 CpGs) was then selected as the actual signals.

Materials and methods

Patients and controls

Biological samples were obtained from 185 patients with OCD and 199 healthy individuals who participated in the Endophenotypes of OCD study [28, 29]. The two recruitment centers, the Department of Psychology of Humboldt-University in Berlin and the Department of Psychiatry and Psychotherapy of the University Hospital in Bonn enrolled and evaluated all participants according to the same protocols (Table 1). Healthy individuals from the general population were recruited through public advertisements. All participants came from European ancestry. Before recruitment, written informed consent was given by all participants, and monetary compensation was paid for their time. The study was performed following the revised Declaration of Helsinki and approved by the local ethics committees of Humboldt University and the University Hospital Bonn.

Table 1 Cohort demographics.

Clinical evaluation

All participants were examined by trained psychologists using the Structured Clinical Interview for DSM-IV (SCID-I) [30]. The severity of OCD symptoms was evaluated using the German version of the Yale-Brown Obsessive-Compulsive Scale (Y-BOCS) [31, 32]. Patients with OCD were included if they: (a) were free of any psychotic, bipolar, or substance-related disorder in the past or present (b) had not been treated with any neuroleptic drug during the past 4 weeks, and (c) had not used benzodiazepines 2 weeks before the study examination. Moreover, healthy participants were excluded if they (a) had taken any psychoactive drug in the past 3 months, (b) reported any Axis I disorder, or (c) had a relative with OCD.

Current or previous treatments were assessed in the patients’ group, in which ~50% had received pharmacotherapy, predominantly with SSRI. 79 OCD patients reported treatment with psychotropic medication in the past 4 weeks. A total of 25 patients had their treatments discontinued several weeks before baseline and did not take any specific medications at the time of assessment. Another 98 patients were medication-naive, reporting no priory psychotropic medication. Four patients did not provide a medication status report. The majority of patients had one or more comorbid Axis I disorders, with major depressive disorder being the most common comorbidity (n = 41).

Treatment subsample

A subsample of OCD patients completed individual CBT at a university outpatient unit at the Berlin study site (Hochschulambulanz für Psychotherapie und Psychodiagnostik der Humboldt-University). The CBT sessions were administered by licensed psychotherapists and conformed to the general conditions for psychotherapy in the public German health care system, typically consisting of 25 or more individual 50-min sessions per week. Details about the treatment can be found in Bey et al. [33] and Kathmann et al. [34]. For n = 100 patients (n = 54 female, n = 46 male), Y-BOCS data were available at pre- and post-treatment [35].

Methylation arrays

Blood aliquots were obtained from all participants. Genomic DNA was isolated from whole blood and DNA concentration and purity were determined using the NanoDrop ND1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). All samples were of sufficient quantity and quality. 500 ng genomic DNA was used as input for the bisulfite conversion reaction using the EZ-96 DNAm Methylation-Lightning MagPrep Kit (Zymo Research Europe GmbH, Freiburg, Germany) with an elution volume of 15 µl. Bisulfite-treated DNA was vacuum concentrated and resuspended in 10 µl. A total of 4 µl of the resuspension was used as input for the Infinium Methylation EPIC BeadChip (Illumina Inc, San Diego, CA, USA). All analysis steps were performed following the manufacturer’s instructions. The Illumina iScan was used for imaging the array and data were exported in .idat format.

Data acquisition and quality control

The R (Bioconductor) Meffil [36] package was used throughout our pipeline to analyze the complete data set. All raw idat files were pooled together to run the quality control and normalization steps. Samples were removed if there was a mismatch between the estimated methylation sex and the gender provided by the participant, deviations from the mean value for control probes, or the median intensity for the methylated or unmethylated signal deviated more than three standard deviations (s.d.).

Probes were removed for further analysis if they mapped to a sex-chromosome, had a detection p value below 0.05, beadcount lower than three, or were aligned to multiple locations in the genome according to Nordlund et al. [37]. In addition, we removed the 10% of probes with the lowest variability to reduce the number of probes and multiple tests [38]. In the end, 366 samples (189 controls and 177 cases) and 632,997 probes passed all our quality control filters and were used to normalize the methylation intensities.

Functional normalization [36, 39] was applied to remove technical variation using 15 principal components (PCs) and an assessment center (Berlin/Bonn) as a fixed effect. Blood cell proportion was imputed using functionalities from meffil for each individual and used in the linear models to correct the methylation effect.

Two-step EWAS

To analyze our data set, the two cohorts were initially kept separated (Berlin and Bonn). While the larger cohort from Berlin served as a discovery cohort in the EWAS, the Bonn cohort was used for replication.

Meffil uses the Independent Surrogate Variable Analysis (ISVA) method which allows for estimating confounding factors (CF) in methylation studies [36, 40, 41]. Briefly, the ISVA uses the independent component analysis method to model CFs as statistically independent variables in each probe analysis [41]. Thus, ISVA provides a non-supervised framework for accounting for any CF.

Methylation status was compared between controls and OCD cases using a linear regression model. Adjustments were made for age [42], sex [43], smoking [44], cell composition [45], and surrogate variables calculated by meffil.

The current strategy for selecting CpGs for further analysis aims first to remove the maximum number of probes in the discovery step optimizing the minimum number of false negatives (p value < 0.05). The replication step follows with a more restrictive adjusted p value (q value) threshold to select CpGs that are truly associated with the phenotype (Holm-Bonferroni q value < 0.01). A similar strategy has been applied to genetics [46] and methylation [10, 23] studies.

We estimated the false discovery rate (FDR) for our approach following the method suggested by Jiang et al.[47]:

$$\widehat {FDR}\left( {\alpha _1,\,\alpha _2} \right) = \frac{{\hat P\left( {p_{i2} \le d_2|D_2 = 0,D_1 = 0,p_{i1} \le d_1} \right)}}{{\hat P\left( {p_{i2} \le d_2|p_{i1} \le d_1} \right)}} \cdot FDR_1 \\ + \frac{{\hat P\left( {p_{i2} \le d_2|D_2 = 0,D_1 = 1,p_{i1} \le d_1} \right)}}{{\hat P\left( {p_{i2} \le d_2|p_{i1} \le d_1} \right)}} \cdot \hat \pi _{o2}$$
(1)

Briefly, a probe i with pi1 ≤ c1 will pass to the second stage, where pi1 is the p value in the first stage and c1 is the threshold for the first stage. Following similar arguments for the second stage, pi2 ≤ c2, then we say that this probe has a significant difference in methylation values between cases and controls. At stage j, dj is the smallest p value for the probes that pij > cj, and the Dj is a binary variable that indicates whether there are actual differences between the cases and controls; Dj = 0 for no differences, and Dj = 1 for actual differences. The probability that a probe is significant after our two-stage approach when there are no real differences, \(\hat P\left( {p_{i2} \le d_2|D_2 = 0,\,D_1 = 0,p_{i1} \le d_1} \right)\), was estimated by permutating for 100 times the samples. The proportion of the true null hypothesis (\(\hat \pi _{o2}\)) was estimated following the Storey method [48] and \(\hat P\left( {p_{i2} \le d_2|D_2 = 0,\,D_1 = 1,\,p_{i1} \le d_1} \right)\) equals d2. Finally, \(\hat P\left( {p_{i2} \le d_2|p_{i1} \le d_1} \right)\) is the proportion of significant probes in the second stage. Our calculation for our setup yielded an FDR of 3.26 × 10−5.

Weighted correlation network analysis

Weighted correlation Network Analysis (WCNA) uses the pairwise correlation between variables to define clusters within the variables and to associate these clusters with other phenotypes.

The R package WGCNA was used for this purpose [49, 50]. Once the network was constructed, module detection was achieved by unsupervised clustering. WGCNA uses the dynamic tree-cut method to select the number of clusters given the hierarchical clustering for the adjacency matrix.

Case–control classification based on methylation profile score

The MPS is a numerical value computed for each individual using a set of DMPs. Like polygenic risk scores in genetic studies [51], MPS improves classification capacity by leveraging methylation information on DNAm differences between cases and controls. The MPS for an individual i can be computed as

$$MPS_i = \mathop {\sum}\nolimits_{j = 1}^P {\beta _jm_{ji}}$$
(2)

where P is the number of CpGs, βj is the coefficient for the association of the probe j to the phenotype, and mj,i is the methylation status of probe j. Herein, a set of q values thresholds was used to select the number of probes: 0.05, 0.01, 1 × 10−4, 1 × 10−5, 1 × 10–10, 1 × 10–20, 1 × 10–30, and 1 × 10–40. For each threshold, an MPS was computed using the selected CpGse and then its classification capacity was tested using the Bonn sample as an independent dataset. AU-ROC for each threshold was computed using the R-package p-ROC.

Clinical correlates and treatment analysis

To assess whether the OCD-related methylation profile is associated with symptom severity, we correlated our most reliable MPS (i.e., MPScommon, see Results) with the Y-BOCS scores of all patients. In the treatment subsample, we also examined whether the MPScommon predicts treatment response by performing linear regression analysis with Y-BOCS baseline score, and the Y-BOCS baseline score × MPScommon interaction as independent variables, and pre-to-post change in Y-BOCS score as the dependent variable. Age, gender, and medication were included as covariates.

Dimensionality reduction

A linear transformation algorithm and a non-linear transformation algorithm were used to reduce dimensionality. Principal component analysis (PCA) is the most popular linear transformation for dimensionality reduction. PCA estimates new coordinates that preserve the maximum variance of the dataset and projects the data points into the new orthogonal coordinate system. The base function prcomp in R was used to estimate the new PCs and projections. On the other hand, uniform manifold approximation and projection (UMAP) has become one of the most popular non-linear transformation algorithms. By using a framework that combines geometry and algebraic topology, UMAP can project a data set into two dimensions and reflect distances between points. The function umap in R was used to obtain the new coordinates.

Results

Epigenome-wide association study

To identify potential loci associated with OCD, we conducted a two-step case-control EWAS using samples recruited in Berlin for discovery and samples originating from Bonn for replication. This approach rendered in the Berlin sample a total of 188,488 DMPs with a nominal p value < 0.05. These sites were moved forward to the replication stage using the Bonn samples. We identified 310 DMPs discriminating cases and controls with a corrected p value for multiple testing q < 0.01 (Fig. 2).

Fig. 2: Two-step EWAS results.
figure 2

A Miami Plot for the two-stage analysis. The X-axis is the genome position. Y-axis is the nominal p-value for the discovery EWAS on a logarithmic scale multiplied by the sign of the coefficient in the same analysis. Horizontal red lines define the threshold 0.05 of the discovery analysis. Purple dots are the 310 CpGs that were significant at the replication stage, the dot size is equivalent to the adjusted p-value in the replications stage on a logarithmic scale. B Correlation of the discovery and replication stage. The X-axis shows the discovery coefficient, and Y-axis shows the replication coefficient. The purple dots represent the 310 CpGs that were significant at the end of the two-step approach; the dot size is equal to the adjusted p value in the replication stage. The dashed line shows the trend of the linear model based on the purple dots. C Cluster dendrogram. Branches refer to highly interconnected clusters of CpGs. Modules are represented by the colors in the horizontal bar.

We explored the correlation between the coefficients of the probes analyzed in the discovery and the replication stage (Fig. 2). This analysis showed that while the overall correlation for all 188,488 CpG sites was moderate (r = 0.42, p < 2.2 × 10–16), it was much stronger for the 310 DMPs in the replication stage (r = 0.88, p < 2.2 × 10–16). Only five DMPs showed opposite effect directions between discovery and replication, therefore they were removed from further analysis (Supplementary Table 1, Supplementary Fig. 1).

Of the 305 probes identified by our analysis, 241 were annotated to 233 genes based on the Illumina annotation. Gene Ontology (GO) analysis, using the R package missMethyl [49, 50], for the same probes did not show any term enriched after multiple test corrections. Of note, five terms from the GO analysis showed a nominal p value < 0.05 (Supplementary Table 2).

Network analysis identifies two different submodules

Given the complex nature and many pathways involved in OCD, we sought to search whether common patterns of methylation emerge among the 305 DMPs. Thus, we used WCNA which exploits correlations among probes and groups them into modules using network topology. After fitting several powers (β), we found that a power of ten approximated the best scale-free network for our co-methylation network (Supplementary Fig. 2). The adjacency matrix was then computed by using the optimal β and the methylation values. Based on the TOM dissimilarity measure, the hierarchical clustering yielded two consensus network modules, i.e., gray (n = 169, Supplementary Table 2) and turquoise (n = 136) (Fig. 2, Supplementary Tables 2 and 4).

Then we examined whether each module was associated with other phenotypes. To this end, we looked at the Pearson correlation coefficient and p value of the association of the eigenvector of each module with OCD status, age, sex, city, smoking, and Y-BOCS. While both modules were highly correlated with OCD phenotype (turquoise: r = −0.88, p = 4 × 10–122; gray: r = −0.79, p = 3 × 10–78), only the turquoise module was associated significantly with the Y-BOCS (r = −0.2, p = 2 × 10–4) (Supplementary Fig. 3). Interestingly, the gray module better captured the differences in the origin of the samples (Supplementary Fig. 3, 4, and 5).

The methylation profile score offers predictive performance for sample classification

Considering that both submodules and the full set showed a strong correlation with OCD status, we attempted to derive a MPS by following a similar strategy to developing polygenic risk scores [52]. To this end, we first constructed an MPS using only the 305 DMPs, which were confirmed in the replication stage (MPStwo-step). We also computed an MPS for each module, i.e., turquoise (MPSturquoise) and gray (MPSgray).

The MPStwo-step was indeed statistically different between OCD patients and controls for both, the Berlin (p < 2.2 × 10–16) and the Bonn samples (p < 2.2 × 10–16), whereas the difference of MPStwo-step values between both cities for the control group (p = 0.269) and the OCD patients (p = 0.057) was not significant (Fig. 3).

Fig. 3: Classification power of the resulting CpGs.
figure 3

Projection of the samples into a two-dimensional space using (A) PCA, and (B) UMAP. The 12 CpGs found as result of our analysis were used as input features. Purple data points are OCD patients and green are Controls. C Each facet represents the deviation from the mean for each MPSs. The number of CpGs that were used to calculate the MPS is shown in parentheses. Horizontal brackets display the results of the t-test for the set. D MPS correlation matrix. PC: Principal Component; UMAP: Uniform manifold approximation and projection; ns: not significant; * : p.value < 0.05; ** : p value < 0.01; ***: p value < 0.001; ****: p value < 2 × 10–16.

The lack of an independent third validation cohort to test the MPStwo-step independently prompted us to consider an alternative strategy for constructing the MPS. Herein, we constructed several MPSs using DMPs based on an a priori set of 8 corrected p value (q values) thresholds (PT) obtained from the EWAS performed in the discovery stage (Berlin samples only). Finally, classification accuracy for each calculated MPS was examined in the Bonn data set, which did not contribute to this MPS and could be used to test out-of-sample classification accuracy. The best classification accuracy for the Bonn sample is achieved using probes with q values < 1 × 10–20 (AU-ROCBerlin = 0.991, AU-ROCBonn = 0.968; Table 2). The MPS obtained for this threshold (MPSdiscoverey) contains 36 DMPs (Supplementary Table 5 and Supplentary Fig. 6), from which 12 are shared with the MPStwo-step and the MPSturquoise (Table 2). For this reason, we also constructed an MPS containing only the common CpGs (MPScommon) which also showed a good classification power (Fig. 3 and Table 2).

Table 2 MPS properties.

Association between MPS, clinical variables, and treatment response

As indicated by Pearson correlation, the MPScommon was significantly associated with Y-BOCS scores across all OCD patients (r = 0.17, p = 0.023), indicating that a more severe symptom severity goes along with a higher epigenetic profile score. In the regression model assessing treatment response, we observed effects at the trend level for the Y-BOCS baseline score (β = −3.108, t = −1.96, p = 0.053) and the Y-BOCS baseline by MPScommon interaction (β = −2.78, t = −1.74, p = 0.086). To follow up on this interaction, we ran separate analyses for patients with high and low MPScommon (median split: n = 56 low-scorers, n = 44 high-scorers). In MPScommon high-scorers, we found a significant effect of Y-BOCS baseline (β = −0.44, t = −3.09, p = 0.004) and a trend level association of the MPScommon (β = −0.28, t = -1.86, p = 0.070) with treatment response, indicating that patients with a higher score might show a better treatment response independently of baseline symptom severity. In MPScommon low-scorers, there were significant effects of Y-BOCS baseline (β = -0.32, t = -2.43, p = 0.019) and medication (β = 0.35, t = 2.67, p = 0.010). Notably, we did not observe any significant effects of age or gender in all analyses (p > 0.05). Moreover, there was no significant association between MPScommon and Y-BOCS baseline score in the treatment subsample (r = 0.09, p = 0.37), potentially due to sample size reduction.

Functional Annotation

Since the GO analysis did not reveal clear supporting evidence for functional terms that may be relevant or previously associated with OCD, we conducted a focused literature search on the 12 common CpGs identified in both MPS approaches because they may still represent true signals involved in the disease process operating in OCD. Consequently, we first mapped each CpG to the closest gene and gene position (Table 3).

Table 3 Biological annotation and summary statistics for the 12 common CpGs.

The highest association was found for the CpG cg17232014, which shows a substantial hypomethylation in OCD patients compared to controls. This CpG maps to a transcription start site (TSS) for two genes: Heme Binding Protein 1 (HEBP1) and the 5-Hydroxytryptamine Receptor 7 Pseudogene 1 (HTR7P1), most commonly known as serotonin receptor pseudogene (Supplementary Fig. 7). Although the functional consequence of the decreased methylation at this TSS is not fully understood yet, it likely results in an elevated gene expression of either HEBP1 or HTR7P1 or both.

Next, we observed that some of the associated CpGs were located close to genes linked to glucose metabolism. Thus, the cg01647172 is mapped to the 5′ untranslated region of the gene Pleckstrin Homology Domain Containing A1 (PLEKHA1) and is found hypomethylated in OCD patients. Likewise, we observed that the hypomethylated CpG cg00382572 position is assigned to the KCNQ1 gene coding for the KCNQ1 potassium channel, which is located in the pancreas and has been also associated with diabetes [53,54,55,56,57,58]. Finally, the cg19069918 is located near the gene TRPM8, which has been long studied as a cancer biomarker, particularly in pancreatic cancer [59].

The next set of CpGs was annotated to genes involved in different processes related to resident cells of the brain. Thus, the probe cg06215939 is found hypermethylated at the TSS of the Mitogen-Activated Protein Kinase 3 gene predicting a reduction in gene expression. For cg21812670, the methylation was found to be increased in OCD patients. This position is located at the TSS of the gene coding for the Rab geranylgeranyl transferase which is essential for synaptic vesicle release [60]. Along these lines, the cg13959110 is located in the gene coding for the brain myelin expression factor 2. This gene is a transcriptional repressor of the myelin basic protein gene that has been involved in myelin homeostasis. Another CpG, cg25195309 is located in the Enable Homolog gene. The function of this gene has been linked to actin polymerization in neurons [61]. Herein, neurons lacking these proteins cannot perform neuritogenesis in the developing cortex [61].

Discussion

In the present study, our primary goal was to identify changes in DNAm associated with OCD status. Following a discovery and replication strategy, we identified 305 CpGs that were differentially methylated between cases and controls. Using these 305 DMPs, or a subset of them, allowed us to classify cases and controls accurately. Importantly, similarly, high classification accuracy was reached when we applied a different analytical strategy using the strongest disease-related DMP signals of the Berlin sample to predict caseness in the independent sample from Bonn. Both analytical strategies converged on 12 common CpGs deserving further scrutiny. Finally, we found a significant association of a methylation score based on these common 12 CpGs with OCD symptom severity, as well as a trend level association with treatment response to CBT in OCD patients with high MPS, indicating that patients with larger values show better treatment response. This latter result might allow MPS to be used as a biomarker for predicting treatment response in OCD from a translational perspective.

While EWAS has already led to important advances in other neurological and psychiatric disorders, it is still early days for OCD epigenetics [31, 62]. For example, a study on the Chinese Han population reported 8417 DMPs in the blood of 65 cases and 96 controls [62]. In addition, the comparison of DNA methylation in the saliva of 59 patients with OCD and 54 controls of European origin identified nine genes with methylation changes related to OCD and ADHD which however did not survive multiple testing correction [31]. In 2022, Shiele et al. reported nine genome-wide significant DMPs mapping to several microRNAs and pseudogenes in the saliva of 68 OCD patients and 68 controls of European origin [63]. Importantly, we could not identify any overlapping signal in our datasets.

In this regard, a strength of our study is the two-step approach in which we treated Berlin and Bonn samples as independent cohorts. As a result, we were able to avoid the “winner’s course” in our analysis, i.e., overestimation of small effect sizes in underpowered cohorts. Although our sample size might seem underpowered, we defined the expected number of false positive signals that will arise from our study design following the methodology described by Jiang et al.[47]. Thus, after permuting the samples to estimate the FP rates, on average, 32,711 probes would be significant after the discovery step, which is in agreement with the theoretically expected (0.05 × 632,997 ≈ 31650). In addition, the second step would not report any significant probe under the threshold imposed. Consequently, the overall FDR was 0.003%, which corresponds to approximately 21 false DMPs after the replication step. Therefore, we assume that genuine signals among the 305 CpGs identified in our study are included. Supporting this assumption, our analytical strategy converged on 12 common probes out of the 305 DMPs that may represent true pathophysiological processes involved in OCD.

Pathway search did not lead to the identification of obvious candidate pathways for OCD, but the disgenet [64, 65] tool and literature search revealed that genes near the 12 CpGs have been linked to diseases like diabetes, Parkinson’s disease, ADHD, and multiple sclerosis. Interestingly, the pathogenic processes involving these genes are also linked to OCD, including glucose metabolism, the dopaminergic/serotonin system, and neuronal function. For glucose metabolism, we found that the PLEKHA1 locus has been associated with type 1 and type 2 diabetes mellitus and age-related macular degeneration (AMD) [66, 67]. In AMD, previous research has shown that TAPP1, a PLEKHA1 protein product, works as an activator of lymphocytes, indicating that PLEKHA1 plays a role in inflammation. Interestingly, increasing evidence has shown that inflammatory pathways are common pathogenetic mediators in the natural course of both types of diabetes that involve the activity of PLEKHA1 [68]. For KCNQ1, research has shown that overexpression of the ion channel in mouse-derived pancreatic β-cells leads to an impairment in insulin secretion stimulated by glucose and pyruvate [53]. Lastly, rats with deletion of the TRPM8 gene showed reduced insulin levels in serum due to enhanced insulin clearance in the liver. This was caused by afferent fibers innervating the hepatic portal vein, which is critical for metabolic homeostasis [69]. Importantly, this latter mechanism also seems to be the intersection connecting the nervous system with the metabolism of glucose and insulin. Hence, our data suggest that an underlying dysregulation in insulin/glucose metabolism may drive, at least in part, the symptoms and the disease processes occurring in OCD patients. Unfortunately, we did not have serum samples from patients before and after therapy to analyze whether glucose and insulin homeostasis changed after treatment.

Besides insulin and glucose metabolism, we also identified several genes involved in brain function. For example, both genes near cg17232014 on chromosome 12 HEBP1 and HTR7P1 have been associated with brain phenotypes. Thus, increased expression of HEBP1 in the brain has been linked to neurotoxicity [70] and neuroinflammation [71]. HTR7P1, although this is a pseudogene that does not translate into protein, genetic variants in HTR7P1 have been associated with neurological and growth phenotypes in children [72].

In our study, we identified several signals that support the sweet-compulsive brain hypothesis [73]. This hypothesis states that abnormal dopaminergic transmission in the striatum may perturb insulin signaling sensitivity in OCD patients. Deep brain stimulation in patients with OCD supports the hypothesis that dopamine transmission affects glucose and insulin metabolism in the brain. Interestingly, non-diabetic OCD patients seem to have an increased hepatic and peripheral insulin sensitivity [74], supporting our findings on PLEKHA1, KCNQ1, and TRPM8. Further reinforcing our brain-related genes and their connection with glucose and insulin homeostasis, research on insulin receptor signaling in the central nervous system showed that insulin receptor signaling regulates the maintenance of synapses. In addition, insulin receptor signaling contributes to the processing of sensory information, as well as structural plasticity triggered by external experience [75].

From a molecular perspective, our findings on insulin signaling receive further support from previous genetic studies using gene enrichment tools on published OCD GWASs. Herein, gene enrichment analysis using suggestive genetic signals from these GWASs showed that 40 out of 89 of the GWAS suggestive signals clustered in insulin and insulin-related signaling cascades [76]. Furthermore, using polygenic risk score-based analysis, Bralten et al. reported shared genetic etiology between OCD or Obsessive-compulsive symptoms (OCS) and type 2 Diabetes Mellitus and fasting insulin levels. Noteworthy, a significant association with OCS was found for a gene set containing central nervous system insulin signaling genes. It is interesting to note that a CpG site in our study mapped to KCNQ1, one of the genes contained in this gene set analyzed by Bralten et al. [77]. KCNQ1 is an obesity susceptibility gene that shows differentially methylated CpG sites between obese and lean women [78]. KCNQ1 is also an imprinted gene (a parental-specific epigenetic modification) expressed exclusively from the maternal allele during fetal development [79]. These combined findings emphasize the significance of investigating the intricate interplay between genetics and environmental factors in the etiology of OCD and how epigenetic modifications may serve as a bridge connecting both. Although there is still a long road ahead, exploring the relationship between OCD and the methylation status of KCNQ1 and other insulin-related genes might open new avenues for potential therapeutic or prevention strategies involving non-pharmacological dietary intervention.

In supporting our findings, three of our twelve most significant DMPs were found in a recent study comparing people with generalized anxiety disorder (GAD), or OCD, with healthy controls of Chinese Han origin [80]. These probes map to RABGGTB, MPK8IP3, and ENAH genes. To our knowledge, this is the first time that two different studies on methylation in OCD replicated each other’s results using populations of different ethnic backgrounds. Of note, Guo et al. used a similar approach and methodologies to analyze their data as in our study. Herein, DNAm is highly sensitive to batch effect and other factors that might increase the variability. Therefore, it is crucial to account for confounding factors when analyzing this kind of data set.

The correlation between MPS and OCD symptom severity highlights the potential clinical utility of epigenetic measures. Future studies should examine whether changes in symptom severity also go along with epigenetic modifications. Interestingly, we observed a trend-level association between MPS and treatment response in OCD patients, indicating that patients with the highest MPS showed better treatment response independent of baseline symptom severity. Among MPS-low scorers, there was no association with treatment response. Although we interpret this preliminary finding with caution, it may show that patients with high MPS exhibit features that make them benefit more from CBT than others, e.g., a larger environmental component contributing to their OCD.

Our results should be interpreted considering some important limitations. First, DNA extraction was done in Bonn for all samples including those derived from the Berlin sample. Consequently, Berlin blood samples were transported uncooled before DNA extraction, which may contribute to variation in the methylation analysis. However, our study considered this source of bias including the fact that we initially analyzed both samples independently. To avoid this source of technical bias, future studies should include cool transport of blood samples to the processing center or proceed locally with the DNA extraction before frozen transport to the analyzing center.

In summary, we identified 12 epigenome-wide significant CpGs for OCD using a robust statistical analysis of two German samples. The clinical validity of these CpGs is supported by the significant associations of our methylation profile score with OCD diagnosis, symptoms severity, and—at trend level—treatment response to CBT. Furthermore, genetic annotation contemplates a strong interaction of insulin and the dopaminergic system with OCD. Our findings thus support the role of epigenetic mechanisms in OCD and may help pave the way for biologically-informed individualized treatment options.