
Identification of transdiagnostic psychiatric disorder subtypes using unsupervised learning


Psychiatric disorders show heterogeneous symptoms and trajectories, with current nosology not accurately reflecting their molecular etiology and the variability and symptomatic overlap within and between diagnostic classes. This heterogeneity impedes timely and targeted treatment. Our study aimed to identify psychiatric patient clusters that share clinical and genetic features and may profit from similar therapies. We used high-dimensional data clustering on deep clinical data to identify transdiagnostic groups in a discovery sample (N = 1250) of healthy controls and patients diagnosed with depression, bipolar disorder, schizophrenia, schizoaffective disorder, and other psychiatric disorders. We observed five diagnostically mixed clusters and ordered them based on severity. The least impaired cluster 0, containing most healthy controls, showed general well-being. Clusters 1–3 differed predominantly regarding levels of maltreatment, depression, daily functioning, and parental bonding. Cluster 4 contained most patients diagnosed with psychotic disorders and exhibited the highest severity in many dimensions, including medication load. Depressed patients were present in all clusters, indicating that we captured different disease stages or subtypes. We replicated all but the smallest cluster 1 in an independent sample (N = 622). Next, we analyzed genetic differences between clusters using polygenic scores (PGS) and the psychiatric family history. These genetic variables differed mainly between clusters 0 and 4 (prediction area under the receiver operating characteristic curve (AUC) = 81%; significant PGS: cross-disorder psychiatric risk, schizophrenia, and educational attainment). Our results confirm that psychiatric disorders consist of heterogeneous subtypes sharing molecular factors and symptoms. 
The identification of transdiagnostic clusters advances our understanding of the heterogeneity of psychiatric disorders and may support the development of personalized treatments.


Psychiatric disorders are typically diagnosed based on cross-sectional and longitudinal symptom profiles. However, different symptom patterns can result in the same diagnosis, and the symptom arrays of different diagnoses may overlap, leading to heterogeneous clinical manifestations and trajectories. The risk for psychiatric disorders is multifactorial and influenced by genetic background, early adverse experiences, and personality factors. Accounting for these risk factors may improve diagnostic accuracy. Common genetic variants confer a substantial share of psychiatric disorder risk, which can be quantified using polygenic scores (PGSs) [1]. In proportion to the genetic risk load, a gradient of symptom severity may exist between healthy individuals and clinically diagnosed patients [2,3,4,5].

The wealth of available data and advances in machine learning have intensified efforts to redefine disorder categories using data-driven methods. Previous studies stratified psychiatric disorders mostly by clustering single domains (e.g., psychometry [6,7,8,9,10], neuroimaging [11,12,13,14,15,16], biochemical markers [17], or genetics [18, 19]) or by analyzing patients with a single diagnosis (e.g., major depressive disorder (MDD) [5, 7, 11, 18,19,20,21,22] or schizophrenia (SCZ) [23,24,25,26,27,28]). Previous transdiagnostic clustering studies support the existence of diagnostically mixed subtypes across two [29,30,31] or more disorders [32,33,34,35]. However, these studies were limited by small samples and analyzed few disorders or variables [36, 37]. To our knowledge, Dwyer et al. [32] is the largest published clustering study. It focused on psychosis and did not cover the complete spectrum from healthy controls through affective to psychotic disorders. To assess the continuum between well-being and disease, clustering analyses benefit from including healthy controls, who were largely omitted in previous studies [7, 21, 29, 34].

In the present study, we applied a data-driven clustering approach to a large transdiagnostic patient/control sample. It encompassed healthy controls and patients diagnosed with MDD, bipolar disorder (BD), schizophrenia, schizoaffective disorder (SZA), or other psychiatric disorders (see below). Our study had two main aims. The first was to use high-dimensional data clustering (HDDC) [38] to identify stable transdiagnostic clusters, based on deep phenotypic data including psychopathology measures, personality traits, cognitive functioning, social functioning, attachment style, environmental exposures in childhood and youth, parental factors, and quality of life measures. The second was to characterize differences in clinical and genetic variables between clusters using supervised machine learning. Moreover, we analyzed the information gain of PGS compared with the family history of psychiatric disorders and replicated our clustering solution in an independent sample.

Materials and methods

Sample description

FOR2107 is an ongoing multi-center study recruiting patients via in- and outpatient services in Marburg and Münster, Germany; healthy subjects were recruited via newspaper advertisements [39]. Inclusion criteria for the cohort were broad to ensure the recruitment of patients across different diagnoses, approximately representative of referrals to Western European psychiatric hospitals. The study protocols were approved by the ethics committees of the Medical Schools of the Universities of Marburg and Münster, following the Declaration of Helsinki, and all participants provided written informed consent. All subjects underwent a structured clinical interview for Diagnostic and Statistical Manual (DSM)-IV axis I disorders [40], administered by trained clinical raters.

All individuals recruited in the first phase of the study, i.e., whose data were available when we began the analyses, were eligible for the discovery sample (N = 1623); N = 855 independent individuals recruited subsequently were included in the replication sample. First, we excluded participants who had withdrawn their consent, participants with a missing diagnosis, and relatives. Second, we excluded individuals with missing information in any of the variables used for clustering (Methods S1). Final sample sizes were N = 1250 (discovery) and N = 622 (replication). Age and diagnosis distributions differed between the two samples (p = 0.01 and p = 0.002, respectively), whereas sex did not (p = 0.16). Among diagnostic groups, the proportions of healthy controls (p = 0.005) and MDD patients (p = 0.003) differed significantly (Tables 1 and S1).
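The two-step exclusion amounts to a complete-case filter applied after removing ineligible participants. The Python sketch below illustrates it on per-participant records; the field names (`consent_withdrawn`, `diagnosis`, `is_relative`) and the three stand-in clustering variables are hypothetical, since the actual variable list is given in Table S2 and the exact rules in Methods S1.

```python
from typing import Any, Dict, List

# Hypothetical stand-ins for the 57 clustering variables (see Table S2).
CLUSTERING_VARS = ["hamd", "bdi", "ctq_sum"]


def select_sample(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply the two exclusion steps described in the text (field names are hypothetical)."""
    # Step 1: drop withdrawn consent, missing diagnosis, and relatives.
    eligible = [
        r for r in records
        if not r.get("consent_withdrawn", False)
        and r.get("diagnosis") is not None
        and not r.get("is_relative", False)
    ]
    # Step 2: complete-case filter on the clustering variables.
    return [r for r in eligible if all(r.get(v) is not None for v in CLUSTERING_VARS)]


records = [
    {"diagnosis": "MDD", "hamd": 14, "bdi": 20, "ctq_sum": 45},
    {"diagnosis": None, "hamd": 3, "bdi": 2, "ctq_sum": 30},
    {"diagnosis": "HC", "hamd": None, "bdi": 1, "ctq_sum": 28},
    {"diagnosis": "BD", "is_relative": True, "hamd": 8, "bdi": 9, "ctq_sum": 35},
]
kept = select_sample(records)  # only the first record survives both steps
```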

Table 1 Characterization of the discovery sample and clusters.

Variables used for clustering and cluster description

Fifty-seven baseline variables were used for clustering and the description of clusters (Fig. S1, Table S2). These variables were not directly used for establishing the diagnoses. Following a suggestion by Maj [36], we combined the assessment of symptoms and disease development at the current stage with variables capturing antecedent events, such as parental factors and early environmental factors, and concomitant variables such as cognitive functioning, social functioning (resilience), and personality traits. Several variables that were confounded with diagnostic groups, strongly differentiated psychiatric patients from healthy controls, or may have over-represented specific diagnostic aspects were excluded from clustering and retained for the post hoc characterization of clusters (Fig. 1A–D; for details, see Tables S2–S3). The self-reported family history of any psychiatric disorder, or specifically of MDD, BD, and SZA/SCZ, was assessed for first-degree relatives and used for the genetic cluster characterization. We contrasted known with no/unknown family history.
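The known versus no/unknown contrast yields one binary indicator per family-history item. A minimal sketch, assuming hypothetical answer strings ("yes", "no", "unknown") and assuming that a missing answer is pooled with "no"/"unknown"; neither assumption is stated in the text.

```python
from typing import Optional


def code_family_history(answer: Optional[str]) -> int:
    """Binary family-history indicator: 1 = known history, 0 = no or unknown.

    Answer strings are hypothetical; treating a missing answer as
    no/unknown is an assumption made for this illustration.
    """
    if answer is None:
        return 0
    return 1 if answer.strip().lower() == "yes" else 0


# One indicator per item: any disorder, MDD, BD, SZA/SCZ.
responses = {"any": "yes", "MDD": "yes", "BD": "unknown", "SZA/SCZ": None}
coded = {item: code_family_history(ans) for item, ans in responses.items()}
```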

Fig. 1: Cluster characterization in the discovery sample with clinical and genetic variables not used in the clustering pipeline.

None of the variables shown in Fig. 1 or Table S3 were included in the clustering pipeline. B–H: A horizontal line represents the mean, and the error bars indicate the standard deviation. The dot size is proportional to the number of individuals with the given value. Variables that were significant in the one-vs-all comparisons are marked with an asterisk. E–H show all PGS significant after Bonferroni correction (adjusted p < 0.05), tested using the Westfall and Young procedure (Methods S6), in either one-vs-all or one-vs-one analyses (Tables S12–S13). PGS were standardized by Z score transformation; the y-axis unit is standard deviations. A The distribution of diagnoses within clusters. B The Global Assessment of Functioning (GAF) score, used for sorting clusters. Lower scores imply more severe impairment. C The number of times an individual was hospitalized. D The medication load index [59], reflecting the dose and variety of different medications taken. E Psychiatric cross-disorder PGS, significantly different in two one-vs-all analyses (lower in cluster 0, Bonferroni-corrected p = 0.004; higher in cluster 4, corrected p = 0.01). F MDD PGS, significantly different in two one-vs-all analyses (lower in cluster 0, corrected p = 0.008; higher in cluster 4, corrected p = 0.04). G Schizophrenia PGS, significantly different in two one-vs-all analyses (lower in cluster 0, corrected p = 0.04; higher in cluster 4, corrected p = 0.01). H Educational attainment PGS, significantly different in one one-vs-all analysis (lower in cluster 4, corrected p = 0.004).

Genotyping and calculation of PGSs

Genotyping was conducted using the PsychArray BeadChip, followed by quality control and imputation, as described previously [41, 42] (Methods S2). Imputed genetic data were available for n = 1146 discovery-stage and n = 556 replication-stage individuals (Fig. S2). PGSs were calculated for ten disorders and traits using PRS-CS [43] (Methods S3) with training data from sufficiently powered, published genome-wide association studies: attention-deficit/hyperactivity disorder (ADHD) [44], autism spectrum disorder (ASD) [45], BD [46], psychiatric cross-disorder (CD) [47], educational attainment (EA) [48], extraversion [49], hedonic well-being [50], MDD [51], neuroticism [52], and schizophrenia [53].
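PRS-CS outputs posterior per-variant effect sizes; an individual's score is then commonly obtained as the dosage-weighted sum of those effects, and the Fig. 1 caption notes that PGS were Z-standardized. The sketch below shows both steps in plain Python under those assumptions. The dosages and weights are toy numbers, not study data, and in practice the scoring step is run with dedicated genotype tools rather than Python lists.

```python
import math
from typing import List, Sequence


def raw_pgs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Dosage-weighted sum of per-variant effect sizes (e.g., PRS-CS posterior effects)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(x * w for x, w in zip(dosages, weights))


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize to mean 0 and SD 1 (sample SD with n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Toy example: three individuals, four variants (allele dosages in 0..2).
weights = [0.02, -0.01, 0.03, 0.005]
dosages = [[0, 1, 2, 1], [2, 2, 0, 1], [1, 0, 1, 2]]
scores = zscore([raw_pgs(d, weights) for d in dosages])
```

After standardization, a PGS of +1 means one standard deviation above the sample mean, which is the y-axis unit used in Figs. 1E–H and 3E–H.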

Clustering analysis

The clustering of discovery-stage scaled clinical variables was conducted by HDDC [38] using the R (v3.6.0) package HDclassif [54]. This package implements a subspace clustering algorithm based on the Gaussian mixture model framework, which allowed us to fit 14 different model types, corresponding to different regularizations for the cluster solutions. The clustering pipeline had four steps: finding the best fitting model type, finding the optimal cluster number, getting the final cluster solution, and assessing the solution’s stability (Methods S4 and Fig. S3). For the code used in this study, see
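Choosing among model types and cluster numbers is commonly done with an information criterion. The sketch below assumes the convention BIC = −2 log L + p log n (lower is better); HDclassif and other packages may report BIC with the opposite sign, and the study's exact selection rule is described in Methods S4. The model-type labels, log-likelihoods, and parameter counts are hypothetical, and only the selection step is shown, not the HDDC fit itself.

```python
import math
from typing import Dict, Tuple

# (model_type, n_clusters) -> (maximized log-likelihood, number of free parameters)
Fits = Dict[Tuple[str, int], Tuple[float, int]]


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 log L + p log n; lower values indicate a better trade-off."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def select_model(fits: Fits, n_obs: int) -> Tuple[str, int]:
    """Return the (model_type, n_clusters) key with the lowest BIC."""
    return min(fits, key=lambda key: bic(*fits[key], n_obs))


# Hypothetical fit summaries for N = 1250 individuals.
fits: Fits = {
    ("model_A", 3): (-5000.0, 120),
    ("model_A", 5): (-4900.0, 200),
    ("model_B", 5): (-4950.0, 150),
}
best = select_model(fits, n_obs=1250)
```

In this toy table the higher likelihood of the five-cluster fits does not outweigh their larger parameter penalty, so `best` is `("model_A", 3)`; the numbers say nothing about the study's own solution.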

Characterization of clusters

In primary analyses, we characterized the clusters with the one-vs-all strategy [55]; in secondary analyses, we used one-vs-one pairwise comparisons. Genetic analyses used 24 variables: 10 PGSs, 4 family history variables, 8 ancestry components, age, and gender. After merging with family history data, the genetic sample comprised n = 1137 (discovery) and n = 542 (replication) individuals.
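With five clusters, the one-vs-all strategy gives five binary contrasts (each cluster against the pooled remainder), whereas one-vs-one gives ten pairwise contrasts restricted to members of the two clusters compared. A small Python sketch of how the binary outcomes could be built; the function names are ours and do not come from the study's code.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def one_vs_all(labels: Sequence[int], target: int) -> List[int]:
    """Binary outcome: 1 for members of `target`, 0 for everyone else."""
    return [1 if y == target else 0 for y in labels]


def one_vs_one(labels: Sequence[int], a: int, b: int) -> Tuple[List[int], List[int]]:
    """Keep only members of clusters a and b; return (row indices, outcome with 1 = a)."""
    idx = [i for i, y in enumerate(labels) if y in (a, b)]
    return idx, [1 if labels[i] == a else 0 for i in idx]


def all_pairs(clusters: Sequence[int]) -> List[Tuple[int, int]]:
    """Every unordered cluster pair for the one-vs-one analyses."""
    return list(combinations(sorted(clusters), 2))


labels = [0, 1, 4, 0, 2]
rows, y_0_vs_4 = one_vs_one(labels, 0, 4)
```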

We used supervised high-dimensional discriminant analysis (HDDA) [54, 56] to determine which of the 57 variables used for clustering were most important for characterizing the clusters (Methods S5).

Lasso-regularized regression [57] was used to predict cluster labels from genetic variables (Methods S5). Statistical testing was performed using the Westfall and Young method [58], which controls the family-wise error rate while accounting for the possible dependence structure of the analyzed variables. The obtained p values were subsequently corrected for the number of comparisons using Bonferroni's method. For the resulting adjusted p values, we used a significance threshold of α = 0.05 (Methods S6). We used multinomial regression to compare PGS with family history when predicting clusters (Methods S7).
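A minimal sketch of a single-step Westfall and Young maxT permutation procedure followed by Bonferroni correction. Assumptions: the per-variable statistic is a Welch t statistic comparing one cluster (label 1) with the rest (label 0), and the single-step rather than the step-down variant is used; the study's actual statistic and settings are described in Methods S6. Taking the maximum over all variables within each permutation is what lets the procedure account for dependence between variables.

```python
import math
import random
from typing import List, Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch two-sample t statistic (assumes non-zero variance in each group)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def abs_t_per_variable(data: List[List[float]], labels: Sequence[int]) -> List[float]:
    """|t| for each column of `data`, comparing label 1 with label 0."""
    stats = []
    for j in range(len(data[0])):
        g1 = [row[j] for row, y in zip(data, labels) if y == 1]
        g0 = [row[j] for row, y in zip(data, labels) if y == 0]
        stats.append(abs(welch_t(g1, g0)))
    return stats


def westfall_young_maxT(data: List[List[float]], labels: Sequence[int],
                        n_perm: int = 1000, seed: int = 0) -> List[float]:
    """Single-step maxT adjusted p values controlling the family-wise error rate."""
    rng = random.Random(seed)
    observed = abs_t_per_variable(data, labels)
    exceed = [0] * len(observed)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # permute cluster labels, keeping group sizes fixed
        max_stat = max(abs_t_per_variable(data, perm))
        for j, t_obs in enumerate(observed):
            if max_stat >= t_obs:
                exceed[j] += 1
    # Count the observed labelling as one permutation so p is never 0.
    return [(e + 1) / (n_perm + 1) for e in exceed]


def bonferroni(pvals: Sequence[float], n_comparisons: int) -> List[float]:
    """Multiply by the number of comparisons (e.g., clusters tested), capped at 1."""
    return [min(1.0, p * n_comparisons) for p in pvals]


# Simulated data: variable 0 differs between groups, variable 1 is pure noise.
sim = random.Random(1)
labels = [1] * 20 + [0] * 20
data = [[(3.0 if y == 1 else 0.0) + sim.gauss(0, 1), sim.gauss(0, 1)] for y in labels]
adj = bonferroni(westfall_young_maxT(data, labels, n_perm=200), n_comparisons=5)
```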

Replication analysis

We clustered the replication sample using the discovery-stage model parameters (Methods S8). Discovery-stage one-vs-all HDDA classification models were fit to the replication-stage clusters. Replication clusters were identified using the best discovery-stage model (balanced accuracy >70%).
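The matching step can be sketched as follows, under one plausible reading of the text: for each replication cluster, every discovery-stage one-vs-all model is scored by balanced accuracy (the mean of sensitivity and specificity), and the replication cluster is assigned to the best-scoring discovery cluster only if that balanced accuracy exceeds 70%. The score table in the example is hypothetical.

```python
from typing import Dict, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for a binary task (1 = target cluster)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def match_clusters(
    scores: Dict[int, Dict[int, float]], threshold: float = 0.70
) -> Dict[int, int]:
    """Map each replication cluster to the discovery cluster whose model scores best,
    keeping only matches whose balanced accuracy exceeds `threshold`."""
    matches = {}
    for rep_cluster, by_disc in scores.items():
        disc_cluster, ba = max(by_disc.items(), key=lambda kv: kv[1])
        if ba > threshold:
            matches[rep_cluster] = disc_cluster
    return matches


# Hypothetical balanced accuracies: scores[replication_cluster][discovery_model].
scores = {
    0: {0: 0.91, 1: 0.55},
    1: {0: 0.52, 1: 0.64},
    2: {0: 0.40, 1: 0.30, 4: 0.88},
}
matches = match_clusters(scores)  # replication cluster 1 stays unmatched
```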

After matching discovery and replication clusters, the discovery-stage genetic lasso models were projected to the replication sample.


Model-based clustering analysis

The discovery-stage data set contained N = 1250 individuals with a mean age of 35.1 (SD = 13.0) years. For the distribution of diagnoses, see Table 1. Site-specific differences are reported in Table S4. We performed model-based HDDC using 57 baseline variables (Table S2). Our clustering pipeline (Fig. S4) identified five clusters (Fig. 1A), which were ordered by their average Global Assessment of Functioning (GAF) scores, from lowest severity (cluster 0, highest mean GAF) to highest severity (cluster 4, lowest mean GAF) (Fig. 1B).
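Because a clustering algorithm returns arbitrary cluster indices, a post hoc relabelling is needed to obtain the severity order used throughout. A sketch, assuming clusters are renumbered by descending mean GAF so that cluster 0 has the highest mean GAF (least impaired); the labels and GAF values are toy data.

```python
from typing import Dict, List, Sequence


def order_by_gaf(labels: Sequence[int], gaf: Sequence[float]) -> List[int]:
    """Relabel clusters 0..K-1 by descending mean GAF (0 = least impaired)."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for c, g in zip(labels, gaf):
        sums[c] = sums.get(c, 0.0) + g
        counts[c] = counts.get(c, 0) + 1
    means = {c: sums[c] / counts[c] for c in sums}
    ranking = sorted(means, key=means.get, reverse=True)  # highest mean GAF first
    new_label = {old: new for new, old in enumerate(ranking)}
    return [new_label[c] for c in labels]


# Arbitrary raw labels 7, 3, 5 with mean GAF 45, 85, and 60, respectively.
relabelled = order_by_gaf([7, 7, 3, 3, 5], [40, 50, 90, 80, 60])
```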

Phenotypic characterization of clusters

Cluster 0 contained mostly healthy controls, whereas the other clusters were diagnostically more mixed (Fig. 1A). All clusters showed distinct profiles of diagnoses, symptoms, and environmental risk factors (Tables 1 and S5).

Individuals in cluster 0 (n = 535, 84% healthy controls) showed the overall best health and quality of life and exhibited the lowest severity in most symptom and risk scores (Figs. 1–2 and S5, Tables S3, S6, S7). The smallest cluster 1 (n = 38) included the highest rates of females (62%) and symptomatic controls without a diagnosis (50%), who reported reduced general and mental health and increased anxiety and depression symptoms (Table 1, Fig. 2). Individuals in cluster 2 showed average general health scores but reduced mental health and parental bonding and elevated emotional maltreatment scores (Tables 1 and S7, Figs. 2 and S5). Cluster 3 had the highest rate of affective diagnoses with high depression and anxiety levels (Fig. 1A, Table 1); its members reported substantially reduced general and mental health. The mean childhood maltreatment scores in cluster 3 were lower than in clusters 1, 2, and 4 (Table 1). Cluster 4 (n = 196) featured most patients diagnosed with SZA and schizophrenia (Fig. 1A). Individuals in cluster 4 were characterized by the highest severity in many dimensions used for clustering (Tables 1 and S7, Fig. 2) and in additional variables examined post hoc, such as hospitalization and medication load index [59] (Fig. 1, Table S3).

Fig. 2: Cluster characterization in the discovery sample with variables used in the clustering pipeline.

A horizontal line represents the mean, and the error bars indicate the standard deviation, whereas the dot size is proportional to the number of individuals with the given value. Variables that were significant in the one-vs-all comparisons are marked with an asterisk. A Hamilton Depression Rating Scale (HAMD, 21 items, clinician-administered), range 0–66, scores >7 indicate (mild) depression. B Hamilton Anxiety Rating Scale (HAMA), range 0–56, scores >17 indicate mild to moderate anxiety severity. C Scale for the Assessment of Negative Symptoms (SANS, sum score), range 0–80, a higher score indicates more severe negative symptoms. For subscales, see Table S3. D Scale for the Assessment of Positive Symptoms (SAPS, sum score), range 0–86, a higher score indicates more severe positive symptoms. For subscales, see Table S3. E Beck Depression Inventory (BDI-II, self-reported), range 0–63, scores >9 indicate (mild) depression. F Symptom Checklist–Global Severity Index, an index of overall psychological distress, range 0–4, higher scores reflect higher levels of psychopathological distress as well as a greater severity of self-reported symptoms. G Childhood Trauma Questionnaire sum score, range 25–125, a higher score indicates more experiences of childhood trauma. H SF36–Quality of life measurements–Mental health, range 0–100, higher scores indicate a more favorable health state.

As a secondary analysis, we characterized MDD patients within the five clusters to assess the heterogeneity of this large diagnostic group and identified distinct phenotypic signatures of MDD patients in each cluster (Tables S8–S9).

Genetic characterization: variable selection

We conducted lasso regularized regression to predict cluster assignments using genetic variables, i.e., ten PGS and four self-reported family history assessments. Prediction performances were highest for the two extreme clusters 0 and 4 (cluster 0 vs. 4: area under the receiver operating characteristic curve (AUC) = 81%, sensitivity = 75%, specificity = 75%; cluster 0 vs. all: AUC = 71%, sensitivity = 66%, specificity = 66%; cluster 4 vs. all: AUC = 73%, sensitivity = 67%, specificity = 67%, Table S10). Lasso selected seven variables when comparing cluster 0 against all others and 16 for cluster 4 (Table 2). In both cases, the self-reported family history achieved larger effect sizes than PGSs of psychiatric disorders. For lasso summary statistics, see Table S11.
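The AUC values above can be read as the probability that a randomly chosen member of the target cluster receives a higher predicted score than a randomly chosen non-member. The sketch computes AUC in this way (the Mann-Whitney formulation, ties counted as one half) from predicted scores such as lasso-predicted probabilities; the example scores are made up.

```python
from typing import Sequence


def auc(scores: Sequence[float], y_true: Sequence[int]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up predicted probabilities for two cluster members (1) and two non-members (0).
example_auc = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])  # 3 of 4 pairs ordered correctly
```

An AUC of 50% corresponds to chance-level ranking and 100% to perfect separation, which puts the reported 81% for cluster 0 vs. 4 into perspective.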

Table 2 Genetic characterization of the discovery clusters.

Genetic characterization: statistical significance

We used Westfall and Young’s method to assess the significance of genetic variables. One-vs-all comparisons of clusters 0, 2, and 4 identified the following significant genetic variables (Table S12 and Fig. 1E–H): Cluster 0 was characterized by a lower family history of MDD, BD, and any psychiatric disorder (each adjusted p = 0.004) and lower cross-disorder (p = 0.004), MDD (p = 0.008), and schizophrenia (p = 0.04) PGS. Cluster 2 was characterized by a higher family history of any psychiatric disorder (p = 0.005) and MDD (p = 0.03). Cluster 4 showed a higher family history of any psychiatric disorder (p = 0.004) and higher cross-disorder (p = 0.01), schizophrenia (p = 0.01), and MDD (p = 0.04) PGS, as well as lower PGS for educational attainment (p = 0.004). Pairwise comparisons resulted in significant differences between four cluster pairs (Table S13). Cluster 4 MDD patients showed significantly higher ADHD (p = 0.01) and lower educational attainment PGS (p = 0.005) than MDD patients from the other clusters (Table S8 and Fig. S6A, B). As a sensitivity analysis, we compared PGS between diagnostic labels (Table S14).

Genetic characterization: assessment of the information gain

The inclusion of PGSs and ancestry components (ACs) in a multinomial cluster prediction model yielded an increase in R² of 11.7% over a null model without genetic variables (Table S15). The family history alone improved R² by 10.8% over the null model; a model with both family history and ACs showed a gain of 13.9%. PGSs, ACs, and family history together increased R² by 20.3%. Adding PGSs significantly improved the model containing family history and ACs (likelihood ratio test p = 5 × 10⁻⁵).
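The information-gain comparison can be illustrated with two standard tools for nested maximum-likelihood models. The text does not state which R² was used for the multinomial model; McFadden's pseudo-R² (1 − LL_model/LL_null) is assumed here purely for illustration, so its values need not match the percentages above. The likelihood ratio statistic 2(LL_full − LL_reduced) is compared with a chi-square distribution whose degrees of freedom equal the number of added parameters; assuming one coefficient per PGS and non-reference cluster, adding 10 PGSs to a five-cluster model adds 10 × 4 = 40 parameters. The closed-form tail probability below is valid for even degrees of freedom only, and the log-likelihoods are invented.

```python
import math


def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo-R^2 = 1 - LL_model / LL_null (assumed R^2 definition)."""
    return 1.0 - ll_model / ll_null


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X >= x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(ll_reduced: float, ll_full: float, df: int) -> float:
    """p value for nested models: LR = 2 (LL_full - LL_reduced) ~ chi-square(df)."""
    return chi2_sf_even_df(2.0 * (ll_full - ll_reduced), df)


# Invented log-likelihoods: null, family history + ACs, family history + ACs + PGSs.
ll_null, ll_fh_ac, ll_fh_ac_pgs = -1800.0, -1650.0, -1600.0
gain_pgs = mcfadden_r2(ll_fh_ac_pgs, ll_null) - mcfadden_r2(ll_fh_ac, ll_null)
p_value = likelihood_ratio_test(ll_fh_ac, ll_fh_ac_pgs, df=40)
```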

Replication of the clustering analysis

The replication data set contained N = 622 individuals with a mean age of 36.3 (SD = 12.6) years (Table S1). HDDA models matched all but the smallest cluster 1 between discovery and replication samples (Fig. S7). The matched replication clusters followed the same severity ranking as the discovery-stage clusters, and many variables showed highly similar severity patterns (Fig. 3, Tables S1 and S16–S17).

Fig. 3: Cluster characterization in the replication sample with clinical and genetic variables not used in the clustering pipeline.

B–H: A horizontal line represents the mean, and the error bars indicate the standard deviation, whereas the dot size is proportional to the number of individuals with the given value. E–H show all PGS that were significant after Bonferroni correction (adjusted p < 0.05) in either one-vs-all or one-vs-one analyses using the Westfall and Young procedure (Methods S6) in the discovery-stage analysis. All p values for the full replication sample are shown in Tables S19 and S20. PGS were standardized by Z score transformation; the y-axis unit is standard deviations. A The distribution of diagnoses within clusters. B The Global Assessment of Functioning (GAF) score, used for sorting clusters. Lower scores imply more severe impairment. C The number of times an individual was hospitalized. D Medication load index [59], reflecting the dose and variety of different medications taken. E Psychiatric cross-disorder PGS, replicated for the comparison cluster 0-vs-all (corrected p = 0.03). F Major depressive disorder PGS, replicated for the comparison cluster 4-vs-all (corrected p = 0.01). G Schizophrenia PGS, replicated for the comparison cluster 0-vs-all (corrected p = 0.005). H Educational attainment PGS, replicated for the comparison cluster 4-vs-all (corrected p = 0.005).

The discovery-stage genetic lasso regression models applied to the replication clusters showed AUC = 63%, sensitivity = 60%, and specificity = 60% for cluster 0 vs. all and AUC = 68%, sensitivity = 67%, and specificity = 66% for cluster 4 vs. all, similar to the discovery sample. Projections of five pairwise models yielded AUCs >60% (Table S18). As in the discovery sample, cross-disorder (adjusted p = 0.03) and schizophrenia (p = 0.005) PGS were significantly lower in the replication-stage cluster 0 (Table S19 and Fig. 3E, G). For cluster 4, the MDD PGS was higher (p = 0.01) and the educational attainment PGS lower (p = 0.005), confirming the discovery-stage results (Fig. 3F, H). Schizophrenia and cross-disorder PGS were also higher in cluster 4, as in the discovery stage, but these associations reached only nominal significance and did not survive correction for multiple testing. In pairwise comparisons, the replicated PGS associations included those of schizophrenia, cross-disorder, and educational attainment PGS for the comparison of cluster 0 with cluster 4 (Table S20). As in the discovery stage, MDD patients in cluster 4 had significantly lower EA PGS than MDD patients in other clusters, whereas the association with ADHD PGS for MDD patients in cluster 4 did not replicate (Fig. S6C–D).


The symptoms and disease courses of patients diagnosed with any given major psychiatric disorder are highly heterogeneous, suggesting etiopathological differences between patients sharing the same diagnosis. The classification and treatment of psychiatric disorders rely on a nosological approach that does not necessarily reflect the disorders' molecular etiology.

In the present study, we characterized subgroups in a large transdiagnostic cohort, including healthy controls, after clustering 57 multimodal phenotypic variables. By combining model-based clustering with supervised machine learning for cluster characterization, we obtained robust and replicable results. Furthermore, we described the clusters using genetic variables.

Comparison of clusters to a severity continuum

We identified five diagnostically mixed clusters, which ranked along a continuous severity scale. Cluster 0 contained mostly healthy controls and was distinguished by the lowest severity in many measures, from the lowest maltreatment factors, depression levels, and positive symptoms to the highest quality of life scores. Cluster 4 had the highest share of schizophrenia and SZA patients and showed the highest severity in many variables not used for the clustering, e.g., the medication load index [59] and the number of hospitalizations. Clusters 1–3 ranged between these two extremes and differed mostly in their levels of maltreatment, depression and antidepressant use, daily functioning, and parental bonding.

Using principal component analysis and SigClust [60], we could not find support for the hypothesis that a simple severity component explains our clustering best (Results S1, Table S21). The five identified categorical clusters thus rank along but do not exactly correspond to a severity continuum.

Importantly, all but the smallest of these clusters were replicated in an independent sample. Given that the proportions of diagnoses in the replication sample differed, the replication of these clusters and their characteristics, especially the severity spectrum and genetic variables, is remarkable. It underlines the stability of the cluster solution and suggests that the discovery-stage solution was not driven by overfitting.

Characterization of potential disorder subtypes

Compared with DSM-IV diagnostic categories, our cluster solution cut across diagnostic boundaries mostly for MDD and BD, whereas patients diagnosed with schizophrenia and SZA were primarily grouped in the high-severity cluster 4. This finding is consistent with etiological similarities between the affective disorders MDD and BD that distinguish them from predominantly psychotic disorders [61, 62]. Including more patients with schizophrenia might have allowed better discrimination of schizophrenia subtypes, as identified in previous studies [32, 63].

MDD patients were present in all five clusters, suggesting that different disorder subtypes or stages were captured. Interestingly, 80% of MDD patients in the lowest severity cluster 0 were in remission of either single or recurrent MDD at the assessment time (coded according to the DSM). Hence, their present clinical presentation was similar to healthy individuals. MDD patients in cluster 1 might represent a reactive depression subtype, with similarities to burnout (i.e., a high somatization level and life stress, low energy, and a higher age of disorder onset). MDD cases in cluster 2, with the lowest average age of onset, might suffer from exogenous depression triggered by external stressors (maltreatment and neglect in childhood). Interestingly, this cluster also contained the highest ratio of BD type-II/type-I patients (Table S22). However, these patients also showed a high genetic predisposition for depression, with 48% reporting an MDD family history. In cluster 3, MDD patients showed a low influence of adverse environmental factors and high parental bonding, similar to cluster 0. Nevertheless, their quality of life was impacted negatively by illness—cluster 3 MDD patients showed low energy and experienced limitations in role activities because of physical and emotional health problems.

Consistent with the strong presence of schizophrenia patients, cluster 4 MDD patients exhibited depression with psychotic features, showing higher positive symptoms and more antipsychotic intake. These MDD patients had significantly higher ADHD PGS than MDD patients in other clusters (p = 0.009). Previous studies have identified correlations between ADHD in childhood and the development of other severe psychiatric disorders, especially schizophrenia, in adulthood [64,65,66]. Although not available at present, a retrospective assessment of ADHD symptoms during childhood in cluster 4 MDD cases might shed further light on this correlation. MDD (and BD) patients in cluster 4 showed significantly more psychotic features than MDD/BD cases in other clusters (Table S23).

Characterization of healthy controls

Healthy controls distributed across clusters 1–4 showed isolated symptoms similar to those of the psychiatric patients in these clusters (Table S24). The number of healthy controls decreased with increasing cluster severity. Apparently, the symptoms of these healthy individuals were not severe enough to produce a clinically relevant presentation of any psychiatric disorder in the currently used nosology. These individuals may have experienced only short-term symptoms, for example resulting from a recent adverse life event. Indeed, healthy controls in cluster 4 showed a negative events score of 21, higher than the median of any other disorder group in the clusters showing high impairment. Alternatively, they might develop a disorder later in life; with a mean age of 32 years, the healthy individuals were younger than the assessed patients on average.

Analyses of genetic differences between healthy controls assigned to different clusters identified nominally significant differences for the ADHD PGS, similarly to the MDD subtype analysis (Table S24 and Fig. S8). Follow-up assessments of the longitudinal FOR2107 study may reveal whether a higher share of healthy controls mapping to the more severe clusters will develop a disorder over time.

Moving beyond classical diagnostic groups

Possibly, the current diagnostic criteria do not capture the whole illness spectrum. Our study might thus contribute to improved diagnostic criteria, as envisioned by the Research Domain Criteria (RDoC) project [67]. In agreement with the RDoC concept, we included variables from different domains, including behavioral tests for evaluating cognitive functioning. Although cluster 4 patients showed the lowest cognitive functioning, these differences did not substantially contribute to the clustering, possibly due to the “reliability paradox” of behavioral tests [68, 69]. These tests are particularly sensitive to situational modulators like attention and motivation as well as experience and learning effects.

MDD and BD patients were distributed over all five clusters, with similar shares of individuals mapping to clusters 2–4. Although most healthy controls were assigned to cluster 0 and most patients with schizophrenia to cluster 4, 24% of healthy controls were not in cluster 0, and 30% of patients with schizophrenia were not in cluster 4. Among MDD patients, 22% were assigned to the high-severity cluster 4. The spread of MDD patients across all clusters supports the hypothesis that classical diagnostic groups may be inferior to a symptom-derived grouping of patients.

Characterization of clusters using PGSs

Supervised analyses of genetic variables confirmed that PGS added information to cluster comparisons beyond what could be assessed using the family history of disorders. The slight increase of explained variance conveyed by ancestry information underlined the highly polygenic nature of psychiatric disorders. Interestingly, a recent study highlighted the benefits of adding both the family history and PGS to prediction models [70]. Psychiatric cross-disorder, schizophrenia, and MDD PGS were significantly higher in the most severe cluster 4 compared with cluster 0, whereas educational attainment PGS were lower—corresponding to effect directions reported in previous studies [47, 51, 53, 61, 71]. Although PGS are still far from routine clinical use in psychiatry, they might be used for patient stratification in the future [1, 5, 63, 72].

Interestingly, PGS analyses of diagnostic categories produced different results from analyses of cluster labels. For example, cluster 0 showed higher educational attainment and lower neuroticism PGS, neither of which differed significantly between healthy controls and the other probands. Similarly, cluster 4 showed associations with several PGS, whereas schizophrenia patients showed only increased schizophrenia PGS. These genetic differences corroborate the transdiagnostic nature of the identified clusters.

Comparison to previous clustering studies

To our knowledge, the present study is the first to cluster multidomain profiles of clinical variables across psychiatric disorders while including healthy controls. Nevertheless, the cluster profiles and the identified severity spectrum partially align with previous findings. A transdiagnostic study identified a cluster containing mainly healthy controls and exhibiting the lowest symptom scores in the observed dimensions [34], likely corresponding to our cluster 0. Our highly impaired cluster 4, with its high percentage of patients with schizophrenia, low functioning, and significantly lower EA PGS, may correspond to the severe psychosis subtype from a previous study [32]. Moreover, a single-disorder subtyping study [7] detected five clusters of MDD, with one subgroup showing an absence of many symptoms, similar to our cluster 0. Furthermore, our results highlight the correlation of various measures of childhood trauma, adverse experiences, and lack of support with illness severity, positive symptoms, hospitalizations, and the need for more intensive treatment. Several prior studies support such a correlation [30, 73,74,75,76].

Limitations

Most psychiatric patients in our transdiagnostic study had been diagnosed with MDD; only a smaller share had other diagnoses, particularly psychotic disorders. This distribution approximately mirrors known differences in prevalence between mental health disorders in the general population. Although the large number of MDD patients allowed for a detailed description of depression subtypes, a similarly detailed characterization was not possible for psychotic disorders, which were concentrated in cluster 4. Future transdiagnostic studies applying our clustering approach to samples with more psychotic patients could focus on BD and schizophrenia subtypes, as suggested by previous single-disorder studies [25, 28].

Although we observed no overrepresentation of depression-related variables in our analysis (Results S2), we cannot entirely rule out that the variable selection influenced the obtained clustering solution. Furthermore, the diagnostic groups differed in demographic variables such as age and sex, resulting in corresponding differences between clusters (Table 1, Results S2).

Moreover, although the replication data set comprised independent individuals, these probands were recruited later within the same study as the discovery sample. Accordingly, the proportions of healthy controls and MDD patients differed between the discovery and replication samples, limiting their comparability. We conducted quality control of the phenotypic and genetic data jointly for both data sets, introducing minor dependencies between them. Furthermore, the replication sample was smaller than the discovery sample, which reduced its statistical power.
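
How much a smaller sample costs in power can be made concrete with a back-of-the-envelope calculation. The replication sample (N = 622) was roughly half the size of the discovery sample (N = 1250). The sketch below uses the standard normal approximation for a two-sided, two-sample comparison of means; the standardized effect size (d = 0.3) and the per-group sizes are hypothetical values chosen for illustration, not estimates from this study.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test.

    d is the standardized mean difference (Cohen's d). The negligible
    probability of rejecting in the opposite tail is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z_crit)

# Hypothetical group sizes: a cluster in a larger sample versus the same
# cluster in a sample half that size, at the same effect size.
for n in (200, 100):
    print(n, round(approx_power(0.3, n), 2))
```

Under these assumptions, halving the group size from 200 to 100 lowers power from about 0.85 to about 0.56, which is why effects that are clear in the discovery sample can fail to reach significance in a smaller replication sample.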

Finally, the clustering algorithm we used relied on discrete categorization and a prespecified number of clusters. Assuming that symptoms form a continuum from health to severe mental illness, future studies might consider methods that incorporate the notion of a continuum into the global objective function [77].
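
The cost of discrete categorization can be illustrated with a one-dimensional Gaussian mixture. This is a minimal sketch with hypothetical parameters, not HDDC [38] and not the continuous clustering method of [77]: it assumes a single "severity" score and two equally weighted components with unit variance, computes each observation's posterior membership probabilities (soft assignment), and compares them with the most probable component (hard assignment).

```python
from statistics import NormalDist

def posterior_memberships(x, components):
    """Posterior probability of each mixture component for one observation.

    components is a list of (weight, mean, sd) tuples.
    """
    densities = [w * NormalDist(mu, sd).pdf(x) for w, mu, sd in components]
    total = sum(densities)
    return [dens / total for dens in densities]

def hard_label(x, components):
    """Discrete assignment: index of the most probable component.

    Ties are broken in favour of the lower index.
    """
    probs = posterior_memberships(x, components)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical 1-D severity score with a "mild" and a "severe" component.
mixture = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
for score in (0.0, 2.0, 4.0):
    probs = posterior_memberships(score, mixture)
    print(score, [round(p, 3) for p in probs], hard_label(score, mixture))
```

An individual exactly halfway between the two components has membership probabilities of 0.5 and 0.5, yet the hard label places them in component 0 purely by tie-breaking; a continuum-aware method would retain that intermediate position.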

Conclusions

In conclusion, our study constitutes a data-driven, computational approach to psychiatric disorder stratification that goes beyond existing diagnostic categories and integrates profiles from multiple clinical domains.

Our analyses support the hypothesis that psychiatric disorders consist of heterogeneous subtypes that share etiological factors and symptoms. We have demonstrated the importance of stratifying symptoms and disorder subtypes, which can be ranked according to their severity. Individuals formally diagnosed with the same disorder differ in their specific impairments. Furthermore, their symptoms may partly overlap with those of patients with different diagnoses, highlighting the need for symptom-specific rather than diagnosis-specific treatment. Our transdiagnostic clustering approach may advance the understanding of the heterogeneity within and between psychiatric disorders. If applied to further cohorts, it may help identify patient groups that share clinical features and may thus benefit from similar treatments. Identifying such groups could lead to more appropriate diagnoses, targeted treatment options, and prediction models for the disease course. Future assessments in FOR2107 and other longitudinal studies can reveal whether patients mapping to the different clusters show similar disease courses and treatment responses.

Funding and disclosure

The Forschungsgruppe/Research Unit FOR2107 study was funded by the German Research Foundation (DFG): grants KI 588/14-1, KI 588/14-2 to T.K.; DA 1151/5-1, DA 1151/5-2 to U.D.; NE 2254/1-2 to I.N.; HA 7070/2-2, HA 7070/3, HA 7070/4 to T.H.; MU1315/8-2 to B.M.M.; RI 908/11-1, RI 908/11-2 to M.R.; NO 246/10-1, NO 246/10-2 to M.M.N.; WI 3439/3-1, WI 3439/3-2 to S.W. The study was supported by the German Federal Ministry of Education and Research (BMBF), through the Integrated Network IntegraMent, under the auspices of the e:Med programme (grants 01ZX1314A, 01ZX1614A to M.M.N.; 01ZX1314G, 01ZX1614G to M.R.; 01ZX1614J to B.M.M.), through BMBF grants 01EE1406C to M.R. and 01EE1409C to M.R. and S.H.W., and through ERA-NET NEURON, “SynSchiz - Linking synaptic dysfunction to disease mechanisms in schizophrenia - a multilevel investigation” (01EW1810 to M.R.) and BMBF grants 01EE1409C and 01EE1406C to M.R. and S.H.W. Till Andlauer was supported by the BMBF through the DIFUTURE consortium of the Medical Informatics Initiative Germany (grant 01ZZ1804A) and the European Union’s Horizon 2020 Research and Innovation Programme (grant MultipleMS, EU RIA 733161). The authors have nothing to disclose. Open Access funding enabled and organized by Projekt DEAL.

References

1. Andlauer TFM, Nöthen MM. Polygenic scores for psychiatric disease: from research tool to clinical application. Medizinische Genet. 2020;32:39–45.
2. Seow LSE, Chua BY, Xie H, Wang J, Ong HL, Abdin E, et al. Correct recognition and continuum belief of mental disorders in a nursing student population. BMC Psychiatry. 2017;17:289.
3. Van Os J, Linscott RJ, Delespaul P, Krabbendam L. A systematic review and meta-analysis of the psychosis continuum: evidence for a psychosis proneness – persistence – impairment model of psychotic disorder. Psychol Med. 2009;39:179–95.
4. Johns LC, van Os J. The continuity of psychotic experiences in the general population. Clin Psychol Rev. 2001;21:1125–41.
5. Schultebraucks K, Choi KW, Galatzer-Levy IR, Bonanno GA. Discriminating heterogeneous trajectories of resilience and depression after major life stressors using polygenic scores. JAMA Psychiatry. 2021.
6. Chan CC, Shanahan M, Ospina LH, Larsen EM, Burdick KE. Premorbid adjustment trajectories in schizophrenia and bipolar disorder: a transdiagnostic cluster analysis. Psychiatry Res. 2019;272:655–62.
7. Maglanoc LA, Landrø NI, Jonassen R, Kaufmann T, Córdova-Palomera A, Hilland E, et al. Data-driven clustering reveals a link between symptoms and functional brain connectivity in depression. Biol Psychiatry Cogn Neurosci Neuroimaging. 2019;4:16–26.
8. Fountain C, Winter AS, Bearman PS. Six developmental trajectories characterize children with autism. Pediatrics. 2012;129:e1112–e1120.
9. Bell MD, Corbera S, Johannesen JK, Fiszdon JM, Wexler BE. Social cognitive impairments and negative symptoms in schizophrenia: are there subtypes with distinct functional correlates? Schizophr Bull. 2011;39:186–96.
10. Stein F, Lemmer G, Schmitt S, Brosch K, Meller T, Fischer E, et al. Factor analyses of multidimensional symptoms in a large group of patients with major depressive disorder, bipolar disorder, schizoaffective disorder and schizophrenia. Schizophr Res. 2020;218:38–47.
11. Drysdale AT, Grosenick L, Downar J, Dunlop K, Mansouri F, Meng Y, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nat Med. 2017;23:28–38.
12. Cheng Y, Xu J, Yu H, Nie B, Li N, Luo C, et al. Delineation of early and later adult onset depression by diffusion tensor imaging. PLoS ONE. 2014;9:e112307.
13. Gould IC, Shepherd AM, Laurens KR, Cairns MJ, Carr VJ, Green MJ. Multivariate neuroanatomical classification of cognitive subtypes in schizophrenia: a support vector machine learning approach. NeuroImage Clin. 2014;6:229–36.
14. Kaczkurkin AN, Sotiras A, Baller EB, Barzilay R, Calkins ME, Chand GB, et al. Neurostructural heterogeneity in youths with internalizing symptoms. Biol Psychiatry. 2020;87:473–82.
15. Costa Dias TG, Iyer SP, Carpenter SD, Cary RP, Wilson VB, Mitchell SH, et al. Characterizing heterogeneity in children with and without ADHD based on reward system connectivity. Dev Cogn Neurosci. 2015;11:155–74.
16. Sun H, Lui S, Yao L, Deng W, Xiao Y, Zhang W, et al. Two patterns of white matter abnormalities in medication-naive patients with first-episode schizophrenia revealed by diffusion tensor imaging and cluster analysis. JAMA Psychiatry. 2015;72:678–86.
17. Haroon E, Chen X, Li Z, Patel T, Woolwine BJ, Hu XP, et al. Increased inflammation and brain glutamate define a subtype of depression with decreased regional homogeneity, impaired network integrity, and anhedonia. Transl Psychiatry. 2018;8:189.
18. Yu C, Arcos-Burgos M, Licinio J, Wong M-L. A latent genetic subtype of major depression identified by whole-exome genotyping data in a Mexican-American cohort. Transl Psychiatry. 2017;7:e1134.
19. Howard DM, Folkersen L, Coleman JRI, Adams MJ, Glanville K, Werge T, et al. Genetic stratification of depression in UK Biobank. Transl Psychiatry. 2020;10:163.
20. Van Dam NT, O'Connor D, Marcelle ET, Ho EJ, Craddock RC, Tobe RH, et al. Data-driven phenotypic categorization for neurobiological analyses: beyond DSM-5 labels. Biol Psychiatry. 2017;81:484–94.
21. Tokuda T, Yoshimoto J, Shimizu Y, Okada G, Takamura M, Okamoto Y, et al. Identification of depression subtypes and relevant brain regions using a data-driven approach. Sci Rep. 2018;8:14082.
22. Beijers L, Wardenaar KJ, van Loo HM, Schoevers RA. Data-driven biological subtypes of depression: systematic review of biological approaches to depression subtyping. Mol Psychiatry. 2019;24:888–900.
23. Geisler D, Walton E, Naylor M, Roessner V, Lim KO, Charles Schulz S, et al. Brain structure and function correlates of cognitive subtypes in schizophrenia. Psychiatry Res. 2015;234:74–83.
24. Dwyer DB, Cabral C, Kambeitz-Ilankovic L, Sanfelici R, Kambeitz J, Calhoun V, et al. Brain subtyping enhances the neuroanatomical discrimination of schizophrenia. Schizophr Bull. 2018;44:1060–9.
25. Dickinson D, Pratt DN, Giangrande EJ, Grunnagle M, Orel J, Weinberger DR, et al. Attacking heterogeneity in schizophrenia by deriving clinical subgroups from widely available symptom data. Schizophr Bull. 2017;44:101–13.
26. Helmes E, Landmark J. Subtypes of schizophrenia: a cluster analytic approach. Can J Psychiatry. 2003;48:702–8.
27. Brodersen KH, Deserno L, Schlagenhauf F, Lin Z, Penny WD, Buhmann JM, et al. Dissecting psychiatric spectrum disorders by generative embedding. NeuroImage Clin. 2014;4:98–111.
28. Farmer AE, McGuffin P, Spitznagel EL. Heterogeneity in schizophrenia: a cluster-analytic approach. Psychiatry Res. 1983;8:1–12.
29. Lee J, Rizzo S, Altshuler L, Glahn DC, Miklowitz DJ, Sugar CA, et al. Deconstructing bipolar disorder and schizophrenia: a cross-diagnostic cluster analysis of cognitive phenotypes. J Affect Disord. 2017;209:71–9.
30. Carbone EA, Pugliese V, Bruni A, Aloi M, Calabrò G, Jaén-Moreno MJ, et al. Adverse childhood experiences and clinical severity in bipolar disorder and schizophrenia: a transdiagnostic two-step cluster analysis. J Affect Disord. 2019;259:104–11.
31. Kleinman A, Caetano SC, Brentani H, Rocca CC, de A, dos Santos B, et al. Attention-based classification pattern, a research domain criteria framework, in youths with bipolar disorder and attention-deficit/hyperactivity disorder. Aust N Z J Psychiatry. 2014;49:255–65.
32. Dwyer DB, Kalman JL, Budde M, Kambeitz J, Ruef A, Antonucci LA, et al. An investigation of psychosis subgroups with prognostic validation and exploration of genetic underpinnings: the PsyCourse study. JAMA Psychiatry. 2020;77:523–33.
33. Forbush K, Hagan K, Kite B, Chapa D, Bohrer B, Gould S. Understanding eating disorders within internalizing psychopathology: a novel transdiagnostic, hierarchical-dimensional model. Compr Psychiatry. 2017;79:40–52.
34. Grisanzio KA, Goldstein-Piekarski AN, Wang MY, Rashed Ahmed AP, Samara Z, Williams LM. Transdiagnostic symptom clusters and associations with brain, behavior, and daily function in mood, anxiety, and trauma disorders. JAMA Psychiatry. 2018;75:201–9.
35. Lewandowski KE, Sperry SH, Cohen BM, Ongür D. Cognitive variability in psychotic disorders: a cross-diagnostic cluster analysis. Psychol Med. 2014;44:3239–48.
36. Maj M. Why the clinical utility of diagnostic categories in psychiatry is intrinsically limited and how we can use new approaches to complement them. World Psychiatry. 2018;17:121.
37. Fusar-Poli P, Solmi M, Brondino N, Davies C, Chae C, Politi P, et al. Transdiagnostic psychiatry: a systematic review. World Psychiatry. 2019;18:192–207.
38. Bouveyron C, Girard S, Schmid C. High-dimensional data clustering. Comput Stat Data Anal. 2007;52:502–19.
39. Kircher T, Wöhr M, Nenadic I, Schwarting R, Schratt G, Alferink J, et al. Neurobiology of the major psychoses: a translational perspective on brain structure and function—the FOR2107 consortium. Eur Arch Psychiatry Clin Neurosci. 2019;269:949–62.
40. Wittchen H-U, Wunderlich U, Gruschwitz S, Zaudig M. SKID I. Strukturiertes Klinisches Interview für DSM-IV. Achse I: Psychische Störungen. Interviewheft und Beurteilungsheft. Eine deutschsprachige, erweiterte Bearb. d. amerikanischen Originalversion des SKID I. 1997.
41. Meller T, Schmitt S, Stein F, Brosch K, Mosebach J, Yüksel D, et al. Associations of schizophrenia risk genes ZNF804A and CACNA1C with schizotypy and modulation of attention in healthy subjects. Schizophr Res. 2019;208:67–75.
42. Andlauer TFM, Buck D, Antony G, Bayas A, Bechmann L, Berthele A, et al. Novel multiple sclerosis susceptibility loci implicated in epigenetic regulation. Sci Adv. 2016;2:e1501678.
43. Ge T, Chen C-Y, Ni Y, Feng Y-CA, Smoller JW. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun. 2019;10:1776.
44. Demontis D, Walters RK, Martin J, Mattheisen M, Als TD, Agerbo E, et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat Genet. 2019;51:63–75.
45. Grove J, Ripke S, Als TD, Mattheisen M, Walters RK, Won H, et al. Identification of common genetic risk variants for autism spectrum disorder. Nat Genet. 2019;51:431–44.
46. Stahl EA, Breen G, Forstner AJ, McQuillin A, Ripke S, Trubetskoy V, et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat Genet. 2019;51:793–803.
47. Lee P, Anttila V, Won H, Feng Y-C, Rosenthal J, Zhu Z, et al. Genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders. Cell. 2019;179:1469–82.
48. Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA, et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature. 2016;533:539–42.
49. van den Berg SM, de Moor MHM, Verweij KJH, Krueger RF, Luciano M, Arias Vasquez A, et al. Meta-analysis of genome-wide association studies for extraversion: findings from the genetics of personality consortium. Behav Genet. 2016;46:170–82.
50. Baselmans BML, Bartels M. A genetic perspective on the relationship between eudaimonic and hedonic well-being. Sci Rep. 2018;8:14610.
51. Howard DM, Adams MJ, Clarke T-K, Hafferty JD, Gibson J, Shirali M, et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci. 2019;22:343–52.
52. Luciano M, Hagenaars SP, Davies G, Hill WD, Clarke T-K, Shirali M, et al. Association analysis in over 329,000 individuals identifies 116 independent variants influencing neuroticism. Nat Genet. 2018;50:6–11.
53. Pardiñas AF, Holmans P, Pocklington AJ, Escott-Price V, Ripke S, Carrera N, et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat Genet. 2018;50:381–9.
54. Bergé L, Bouveyron C, Girard S. HDclassif: an R package for model-based clustering and discriminant analysis of high-dimensional data. J Stat Softw. 2012;46:i11.
55. Rifkin R, Klautau A. In defense of one-vs-all classification. J Mach Learn Res. 2004;5:101–41.
56. Bouveyron C, Girard S, Schmid C. High-dimensional discriminant analysis. Commun Stat Theory Methods. 2007;36:2607–23.
57. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1996;58:267–88.
58. Westfall PH, Young SS. Resampling-based multiple testing: examples and methods for p-value adjustment. Wiley; 1993.
59. Redlich R, Almeida JR, Grotegerd D, Opel N, Kugel H, Heindel W, et al. Brain morphometric biomarkers distinguishing unipolar and bipolar depression: a voxel-based morphometry–pattern classification approach. JAMA Psychiatry. 2014;71:1222–30.
60. Huang H, Liu Y, Yuan M, Marron JS. Statistical significance of clustering using soft thresholding. J Comput Graph Stat. 2015;24:975–93.
61. Coleman JRI, Gaspar HA, Bryois J, Bipolar Disorder Working Group of the Psychiatric Genomics Consortium, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Breen G. The genetics of the mood disorder spectrum: genome-wide association analyses of over 185,000 cases and 439,000 controls. Biol Psychiatry. 2020;88:169–84.
62. Levey DF, Stein MB, Wendt FR, Pathak GA, Zhou H, Aslan M, et al. GWAS of depression phenotypes in the million veteran program and meta-analysis in more than 1.2 million participants yields 178 independent risk loci. MedRxiv. 2020.
63. Bansal V, Mitjans M, Burik CAP, Linnér RK, Okbay A, Rietveld CA, et al. Genome-wide association study results for educational attainment aid in identifying genetic heterogeneity of schizophrenia. Nat Commun. 2018;9:3078.
64. Hamshere ML, Stergiakouli E, Langley K, Martin J, Holmans P, Kent L, et al. Shared polygenic contribution between childhood attention-deficit hyperactivity disorder and adult schizophrenia. Br J Psychiatry. 2013;203:107–11.
65. Dalsgaard S, Mortensen PB, Frydenberg M, Maibing CM, Nordentoft M, Thomsen PH. Association between attention-deficit hyperactivity disorder in childhood and schizophrenia later in adulthood. Eur Psychiatry. 2014;29:259–63.
66. Rubino IA, Frank E, Croce Nanni R, Pozzi D, Lanza di Scalea T, Siracusano A. A comparative study of axis I antecedents before age 18 of unipolar depression, bipolar disorder and schizophrenia. Psychopathology. 2009;42:325–32.
67. Cuthbert BN. The RDoC framework: facilitating transition from ICD/DSM to dimensional approaches that integrate neuroscience and psychopathology. World Psychiatry. 2014;13:28–35.
68. Dang J, King KM, Inzlicht M. Why are self-report and behavioral measures weakly correlated? Trends Cogn Sci. 2020;24:267–9.
69. Hedge C, Powell G, Sumner P. The reliability paradox: why robust cognitive tasks do not produce reliable individual differences. Behav Res Methods. 2018;50:1166–86.
70. Hujoel MLA, Loh P-R, Neale B, Price AL. Incorporating family history of disease improves polygenic risk scores in diverse populations. BioRxiv. 2021.
71. Wray NR, Ripke S, Mattheisen M, Trzaskowski M, Byrne EM, Abdellaoui A, et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat Genet. 2018;50:668–81.
72. Murray GK, Lin T, Austin J, McGrath JJ, Hickie IB, Wray NR. Could polygenic risk scores be useful in psychiatry? A review. JAMA Psychiatry. 2021;78:210–9.
73. Varese F, Smeets F, Drukker M, Lieverse R, Lataster T, Viechtbauer W, et al. Childhood adversities increase the risk of psychosis: a meta-analysis of patient-control, prospective- and cross-sectional cohort studies. Schizophr Bull. 2012;38:661–71.
74. Misiak B, Krefft M, Bielawski T, Moustafa AA, Sąsiadek MM, Frydecka D. Toward a unified theory of childhood trauma and psychosis: a comprehensive review of epidemiological, clinical, neuropsychological and biological findings. Neurosci Biobehav Rev. 2017;75:393–406.
75. Li X-B, Li Q-Y, Liu J-T, Zhang L, Tang Y-L, Wang C-Y. Childhood trauma associates with clinical features of schizophrenia in a sample of Chinese inpatients. Psychiatry Res. 2015;228:702–7.
76. Janssen I, Krabbendam L, Bak M, Hanssen M, Vollebergh W, Graaf R, et al. Childhood abuse as a risk factor for psychotic experiences. Acta Psychiatr Scand. 2004;109:38–45.
77. Shah SA, Koltun V. Deep continuous clustering. 2018.


Acknowledgements

This work is part of the German multi-center consortium “Neurobiology of Affective Disorders. A translational perspective on brain structure and function”, funded by the German Research Foundation (Deutsche Forschungsgemeinschaft DFG; Forschungsgruppe/Research Unit FOR2107). Please see the Supplement for full FOR2107 acknowledgments. We would like to thank Karsten Borgwardt, Julien Gagneur, and Janos Kalman for their helpful comments.

Author information




Concept and design: Pelin, Müller-Myhsok, Andlauer. Data analysis: Pelin. Drafting the manuscript: Pelin, Andlauer. Revising the manuscript: Ising, Müller-Myhsok, Kircher, Krug, Rietschel, Stein, Brosch, N. Winter, Opel, Schmitt. Providing data: Kircher, Dannlowski, Nenadić, Rietschel, Nöthen, Hahn, Krug, Forstner, Stein, Meinert, Meller, Brosch, N. Winter, Leenings, Lemke, Heilmann-Heimbach, Opel, Repple, Pfarr, Ringwald, Schmitt, Thiel, Waltemate, A. Winter, Streit, Witt.

Corresponding authors

Correspondence to Helena Pelin or Till F. M. Andlauer.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit


About this article


Cite this article

Pelin, H., Ising, M., Stein, F. et al. Identification of transdiagnostic psychiatric disorder subtypes using unsupervised learning. Neuropsychopharmacol. 46, 1895–1905 (2021).


