Psychiatric disorders show heterogeneous symptoms and trajectories, with current nosology not accurately reflecting their molecular etiology and the variability and symptomatic overlap within and between diagnostic classes. This heterogeneity impedes timely and targeted treatment. Our study aimed to identify psychiatric patient clusters that share clinical and genetic features and may profit from similar therapies. We used high-dimensional data clustering on deep clinical data to identify transdiagnostic groups in a discovery sample (N = 1250) of healthy controls and patients diagnosed with depression, bipolar disorder, schizophrenia, schizoaffective disorder, and other psychiatric disorders. We observed five diagnostically mixed clusters and ordered them based on severity. The least impaired cluster 0, containing most healthy controls, showed general well-being. Clusters 1–3 differed predominantly regarding levels of maltreatment, depression, daily functioning, and parental bonding. Cluster 4 contained most patients diagnosed with psychotic disorders and exhibited the highest severity in many dimensions, including medication load. Depressed patients were present in all clusters, indicating that we captured different disease stages or subtypes. We replicated all but the smallest cluster 1 in an independent sample (N = 622). Next, we analyzed genetic differences between clusters using polygenic scores (PGS) and the psychiatric family history. These genetic variables differed mainly between clusters 0 and 4 (prediction area under the receiver operating characteristic curve (AUC) = 81%; significant PGS: cross-disorder psychiatric risk, schizophrenia, and educational attainment). Our results confirm that psychiatric disorders consist of heterogeneous subtypes sharing molecular factors and symptoms. 
The identification of transdiagnostic clusters advances our understanding of the heterogeneity of psychiatric disorders and may support the development of personalized treatments.
Psychiatric disorders are typically diagnosed based on cross-sectional and longitudinal symptom profiles. However, different symptom patterns can result in the same diagnosis, and symptom arrays of different diagnoses may overlap, leading to heterogeneous clinical manifestations and trajectories. The risk for psychiatric disorders is multifactorial and influenced by the genetic background, early adverse experiences, and personality factors. Accounting for these risk factors may improve diagnostic accuracy. Common genetic variants confer an important share of psychiatric disorder risk, which can be quantified using polygenic scores (PGSs). Proportionally to the genetic risk load, a gradient of symptom severity may exist between healthy individuals and clinically diagnosed patients [2,3,4,5].
The wealth of available data and advances in machine learning intensified efforts to redefine disorder categories using data-driven methods. Previous studies stratified psychiatric disorders mostly by clustering single domains (e.g., psychometry [6,7,8,9,10], neuroimaging [11,12,13,14,15,16], biochemical markers, or genetics [18, 19]) or by analyzing patients from a single diagnosis (e.g., major depressive disorder (MDD) [5, 7, 11, 18,19,20,21,22] or schizophrenia (SCZ) [23,24,25,26,27,28]). Previous transdiagnostic clustering studies support the existence of diagnostically mixed subtypes across two [29,30,31] or more disorders [32,33,34,35]. However, these studies were limited by small samples and analyzed few disorders or variables [36, 37]. To our knowledge, Dwyer et al. constitutes the largest published clustering study. It focused on psychosis, not covering the complete spectrum from healthy controls over affective to psychotic disorders. To assess the continuum between well-being and disease, clustering analyses profit from the inclusion of healthy controls, largely omitted in previous studies [7, 21, 29, 34].
In the present study, we applied a data-driven clustering approach to a large transdiagnostic patient/control sample. It encompassed healthy controls and patients diagnosed with MDD, bipolar disorder (BD), schizophrenia, schizoaffective disorder (SZA), or other psychiatric disorders (see below). Our study had the following two main aims: first, to use high-dimensional data clustering (HDDC) to identify stable transdiagnostic clusters. Here, we used deep phenotypic data including psychopathology measures, personality traits, cognitive functioning, social functioning, attachment style, environmental exposures in childhood and youth, parental factors, and quality of life measures. Second, to characterize differences of clinical and genetic variables between clusters using supervised machine learning. Moreover, we analyzed the information gain of PGS compared with the family history of psychiatric disorders and replicated our clustering solution in an independent sample.
Materials and methods
FOR2107 is an ongoing multi-center study recruiting patients via in- and outpatient services in Marburg and Münster, Germany; healthy subjects were recruited via newspaper advertisements. Inclusion criteria for the cohort were comprehensive to ensure the recruitment of patients across different diagnoses, approximately representative for referrals to Western European psychiatric hospitals. The study protocols were approved by the ethics committees of the Medical Schools of the Universities of Marburg and Münster, following the Declaration of Helsinki, and all participants provided written informed consent. All subjects underwent a structured clinical interview for Diagnostic and Statistical Manual (DSM)-IV axis I disorders, administered by trained clinical raters.
All individuals recruited in the first phase of the study, i.e., whose data were available when we began the analyses, were eligible for the discovery sample (N = 1623); N = 855 independent individuals recruited subsequently were included for the replication. First, participants who had withdrawn their consent, participants with a missing diagnosis, and relatives were excluded. Second, individuals with missing information in any of the variables used for clustering were excluded (Methods S1). Final sample sizes were N = 1250 (discovery) and N = 622 (replication). Age and diagnosis distributions differed between both samples (p = 0.01, p = 0.002, respectively); sex did not (p = 0.16). Among diagnostic groups, the proportions of healthy controls (p = 0.005) and MDD patients (p = 0.003) differed significantly (Tables 1 and S1).
Variables used for clustering and cluster description
Fifty-seven baseline variables were used for clustering and the description of clusters (Fig. S1, Table S2). These variables were not directly used for establishing the diagnoses. Following a suggestion by Maj, we combined the assessment of symptoms and disease development at the current stage with variables capturing antecedent events, such as parental factors and early environmental factors, and concomitant variables such as cognitive functioning, social functioning (resilience), and personality traits. Several variables that were confounded with diagnostic groups, strongly differentiated psychiatric patients from healthy controls, or may have over-represented specific diagnostic aspects were excluded from clustering and retained for the post hoc characterization of clusters (Fig. 1A–D; for details, see Tables S2–S3). The self-reported family history of either any psychiatric disorder or specifically for MDD, BD, and SZA/SCZ was assessed for first-degree relatives and used for the genetic cluster characterization. We contrasted known with no/unknown family history.
Genotyping and calculation of PGSs
Genotyping was conducted using the PsychArray BeadChip, followed by quality control and imputation, as described previously [41, 42] (Methods S2). Imputed genetic data were available for n = 1146 discovery-stage and n = 556 replication-stage individuals (Fig. S2). PGSs were calculated for ten disorders and traits using PRS-CS (Methods S3) with training data from sufficiently powered, published genome-wide association studies: attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder (ASD), BD, psychiatric cross-disorder (CD), educational attainment (EA), extraversion, hedonic well-being, MDD, neuroticism, and schizophrenia.
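For readers unfamiliar with polygenic scores, the following minimal sketch illustrates the general idea of a PGS as a weighted sum of effect-allele dosages, followed by standardization. It is not the PRS-CS procedure itself (see Methods S3); the variant identifiers, weights, and dosages are invented for illustration only.

```python
# Toy polygenic score: weighted sum of effect-allele dosages.
# All variant IDs, weights, and dosages below are hypothetical.

def polygenic_score(dosages, weights):
    """Sum weight * dosage over the variants present in both mappings."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)

def standardize(scores):
    """Z-score a list of raw scores using the population standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    return [(s - mean) / sd for s in scores]

# Hypothetical per-variant weights (in practice, these would be effect
# sizes derived from a genome-wide association study).
weights = {"var1": 0.5, "var2": -0.25, "var3": 0.125}

# Hypothetical dosages (0 to 2 copies of the effect allele) for two people.
person_a = {"var1": 2.0, "var2": 0.0, "var3": 1.0}
person_b = {"var1": 0.0, "var2": 2.0, "var3": 1.0}

raw = [polygenic_score(p, weights) for p in (person_a, person_b)]
z = standardize(raw)
```

In this toy case, the person carrying more risk-increasing alleles receives the higher raw and standardized score.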
The clustering of discovery-stage scaled clinical variables was conducted by HDDC using the R (v3.6.0) package HDclassif. This package implements a subspace clustering algorithm based on the Gaussian mixture model framework, which allowed us to fit 14 different model types, corresponding to different regularizations for the cluster solutions. The clustering pipeline had four steps: finding the best fitting model type, finding the optimal cluster number, getting the final cluster solution, and assessing the solution’s stability (Methods S4 and Fig. S3). For the code used in this study, see https://github.com/hpelin/HDDC_transdiagnostic_clustering/.
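To give a feel for the Gaussian mixture model framework on which HDDC builds, the sketch below fits a two-component mixture to one-dimensional toy data using expectation-maximization. This is a deliberately simplified illustration in Python, not the HDclassif subspace algorithm used in the study; the data, the initialization, and the variance floor are assumptions made for the example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, n_iter=50, var_floor=1e-3):
    """Fit a two-component 1-D Gaussian mixture by EM (toy illustration)."""
    means = [min(data), max(data)]   # simple deterministic initialization
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update mixing weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, data)) / nk
            variances[k] = max(var, var_floor)  # guard against collapse
    # Hard assignment: each point goes to its most probable component.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels

# Hypothetical scaled scores forming two well-separated groups.
data = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
means, variances, weights, labels = fit_gmm_1d(data)
```

HDDC extends this idea to many variables by modeling each cluster in its own low-dimensional subspace, which is what makes the different model types and regularizations mentioned above necessary.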
Characterization of clusters
In primary analyses, we characterized the clusters with the one-vs-all strategy, with one-vs-one pairwise comparisons in secondary analyses. Genetic analyses used 24 variables: ten PGS, four family history variables, eight ancestry components, age, and gender. After merging with the family history data, the genetic sample size was n = 1137 (discovery) and n = 542 (replication).
Lasso-regularized regression was used to predict cluster labels with genetic variables (Methods S5). Statistical testing was performed using the Westfall and Young method, controlling the family-wise error rate while accounting for the possible dependence structure of the analyzed variables. The obtained p values were subsequently corrected for the number of comparisons using Bonferroni’s method. For thus adjusted p values, a significance threshold α = 0.05 was used (Methods S6). We used multinomial regression to compare PGS with family history when predicting clusters (Methods S7).
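The logic of permutation-based family-wise error control followed by a Bonferroni correction can be sketched as follows. This is a simplified single-step maxT variant in the spirit of Westfall and Young, using an absolute mean difference as the test statistic; the exact procedure applied in the study is described in Methods S6 and may differ. The toy data, the number of permutations, and the number of comparisons are assumptions for illustration.

```python
import random

def mean_diff(values, labels):
    """Absolute difference in means between label 1 and label 0."""
    g1 = [v for v, l in zip(values, labels) if l == 1]
    g0 = [v for v, l in zip(values, labels) if l == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

def maxt_adjusted_p(columns, labels, n_perm=999, seed=0):
    """Single-step maxT adjusted p values obtained by permuting labels.

    Permuting labels keeps the correlation among variables intact, so the
    null distribution of the maximum statistic reflects their dependence.
    A real analysis would typically use standardized statistics.
    """
    rng = random.Random(seed)
    observed = [mean_diff(col, labels) for col in columns]
    exceed = [0] * len(columns)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(mean_diff(col, shuffled) for col in columns)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    return [(1 + e) / (1 + n_perm) for e in exceed]

def bonferroni(p_values, n_comparisons):
    """Multiply each p value by the number of comparisons, capped at 1."""
    return [min(1.0, p * n_comparisons) for p in p_values]

# Hypothetical data: 8 individuals in the cluster (1), 8 in the rest (0).
labels = [0] * 8 + [1] * 8
separated = [float(v) for v in range(8)] + [float(v) for v in range(10, 18)]
balanced = [float(v) for v in range(1, 9)] * 2  # identical group means

adjusted = maxt_adjusted_p([separated, balanced], labels)
corrected = bonferroni(adjusted, n_comparisons=5)  # hypothetical count
```

In this toy example, the clearly separated variable keeps a small adjusted p value after both steps, whereas the variable with identical group means receives an adjusted p value of 1.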
We clustered the replication sample using the discovery-stage model parameters (Methods S8). Discovery-stage one-vs-all HDDA classification models were fit to the replication-stage clusters. Replication clusters were identified using the best discovery-stage model (balanced accuracy >70%).
After matching discovery and replication clusters, the discovery-stage genetic lasso models were projected to the replication sample.
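The balanced accuracy criterion used above for matching clusters can be illustrated with a short sketch. Balanced accuracy averages the per-class recall, which makes it more informative than plain accuracy when one cluster is much smaller than the rest, as in a one-vs-all setting. The labels below are invented for illustration.

```python
def balanced_accuracy(true_labels, predicted_labels):
    """Mean of the per-class recall over the classes in true_labels."""
    classes = sorted(set(true_labels))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(true_labels) if t == c]
        hits = sum(1 for i in idx if predicted_labels[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

def accuracy(true_labels, predicted_labels):
    """Plain share of correct predictions."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)

# Hypothetical one-vs-all case: 4 cluster members (1) and 16 others (0).
truth = [1] * 4 + [0] * 16
# The classifier recovers half of the members and all of the others.
pred = [1, 1, 0, 0] + [0] * 16

ba = balanced_accuracy(truth, pred)
acc = accuracy(truth, pred)
passes_threshold = ba > 0.70
```

Here, plain accuracy looks high because the majority class dominates, while balanced accuracy reflects that only half of the cluster members were recovered.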
Model-based clustering analysis
The discovery-stage data set contained N = 1250 individuals with a mean age of 35.1 (SD = 13.0) years. For the distribution of diagnoses, see Table 1. Site-specific differences are reported in Table S4. We performed model-based HDDC using 57 baseline variables (Table S2). Our clustering pipeline (Fig. S4) identified five clusters (Fig. 1A), which were ordered by their average Global Assessment of Functioning (GAF) scores, from lowest (cluster 0) to highest severity (cluster 4) (Fig. 1B).
Phenotypic characterization of clusters
Cluster 0 contained mostly healthy controls, whereas the other clusters were diagnostically more mixed (Fig. 1A). All clusters showed distinct profiles of diagnoses, symptoms, and environmental risk factors (Table 1 and S5).
Individuals in cluster 0 (n = 535, 84% healthy controls) showed the overall best health and quality of life and exhibited the lowest severity in most symptom and risk scores (Figs. 1–2 and S5, Tables S3, S6, S7). The smallest cluster 1 (n = 38) included the highest rates of females (62%) and symptomatic controls without a diagnosis (50%), who reported reduced general and mental health and increased anxiety and depression symptoms (Table 1, Fig. 2). Individuals in cluster 2 showed average general health scores but reduced mental health and parental bonding and elevated emotional maltreatment scores (Tables 1 and S7, Figs. 2 and S5). Cluster 3 had the highest rate of affective diagnoses with high depression and anxiety levels (Fig. 1A, Table 1); its members reported substantially reduced general and mental health. The mean childhood maltreatment scores in cluster 3 were lower than in clusters 1, 2, and 4 (Table 1). Cluster 4 (n = 196) featured most patients diagnosed with SZA and schizophrenia (Fig. 1A). Individuals in cluster 4 were characterized by the highest severity in many dimensions used for clustering (Tables 1 and S7, Fig. 2) and in additional variables examined post hoc, such as hospitalization and medication load index (Fig. 1, Table S3).
As a secondary analysis, we characterized MDD patients within the five clusters to assess the heterogeneity of this large diagnostic group and identified distinct phenotypic signatures of MDD patients in each cluster (Tables S8–S9).
Genetic characterization: variable selection
We conducted lasso regularized regression to predict cluster assignments using genetic variables, i.e., ten PGS and four self-reported family history assessments. Prediction performances were highest for the two extreme clusters 0 and 4 (cluster 0 vs. 4: area under the receiver operating characteristic curve (AUC) = 81%, sensitivity = 75%, specificity = 75%; cluster 0 vs. all: AUC = 71%, sensitivity = 66%, specificity = 66%; cluster 4 vs. all: AUC = 73%, sensitivity = 67%, specificity = 67%, Table S10). Lasso selected seven variables when comparing cluster 0 against all others and 16 for cluster 4 (Table 2). In both cases, the self-reported family history achieved larger effect sizes than PGSs of psychiatric disorders. For lasso summary statistics, see Table S11.
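As a reminder of how the reported performance measures are defined, the sketch below computes the AUC as the probability that a randomly chosen member of the positive class receives a higher predicted score than a randomly chosen member of the negative class, and derives sensitivity and specificity at a fixed threshold. The scores, labels, and threshold are hypothetical and unrelated to the study data.

```python
def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when predicting 1 for score >= threshold."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted probabilities for 4 positives and 4 negatives.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

auc_value = auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
```

Because the AUC summarizes ranking quality over all thresholds, it can exceed the sensitivity and specificity obtained at any single cut-off, as in this toy example.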
Genetic characterization: statistical significance
We used Westfall and Young’s method to assess the significance of genetic variables. One-vs-all comparisons of clusters 0, 2, and 4 identified the following significant genetic variables (Table S12 and Fig. 1E–H): Cluster 0 was characterized by a lower family history of MDD, BD, and any psychiatric disorder (each adjusted p = 0.004) and lower cross-disorder (p = 0.004), MDD (p = 0.008), and schizophrenia (p = 0.04) PGS. Cluster 2 was characterized by a higher family history of any psychiatric disorder (p = 0.005) and MDD (p = 0.03). Cluster 4 showed a higher family history of any psychiatric disorder (p = 0.004) and higher cross-disorder (p = 0.01), schizophrenia (p = 0.01), and MDD (p = 0.04) PGS, as well as lower PGS for educational attainment (p = 0.004). Pairwise comparisons resulted in significant differences between four cluster pairs (Table S13). Cluster 4 MDD patients showed significantly higher ADHD (p = 0.01) and lower educational attainment PGS (p = 0.005) than MDD patients from the other clusters (Table S8 and Fig. S6A, B). As a sensitivity analysis, we compared PGS between diagnostic labels (Table S14).
Genetic characterization: assessment of the information gain
The inclusion of PGSs and ancestry components (ACs) in a multinomial cluster prediction model yielded an increase of R² = 11.7% over a null model without genetic variables (Table S15). The family history alone improved the R² by 10.8% over the null model; a model with both family history and ACs showed a gain of R² = 13.9%. PGSs, ACs, and family history together increased R² by 20.3%. PGSs improved the model containing family history and ACs significantly (likelihood ratio test p = 5 × 10⁻⁵).
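The likelihood ratio test used to compare nested models can be sketched as follows: twice the difference in log-likelihood between the full and the reduced model is compared against a chi-square distribution whose degrees of freedom equal the number of additional parameters. The log-likelihoods and the parameter count below are invented for illustration; the actual model comparison is described in Methods S7.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-half), 1.0              # exact value for df = 2
    else:
        q, a = math.erfc(math.sqrt(half)), 0.5   # exact value for df = 1
    # Recurrence of the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), with z = x / 2.
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return the test statistic and p value for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical log-likelihoods of a reduced model and a full model that
# adds two parameters.
stat, p_value = likelihood_ratio_test(-100.0, -97.0, extra_params=2)
```

A small p value indicates that the additional predictors improve the model fit beyond what would be expected from the extra parameters alone.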
Replication of the clustering analysis
The replication data set contained N = 622 individuals with a mean age of 36.3 (SD = 12.6) years (Table S1). HDDA models matched all but the smallest cluster 1 between discovery and replication samples (Fig. S7). The matched replication clusters followed the same severity ranking as the discovery-stage clusters, and many variables showed highly similar severity patterns (Fig. 3, Tables S1, S16–S17).
The discovery-stage genetic lasso regression models applied to the replication clusters showed an AUC = 63%, sensitivity = 60%, specificity = 60% for cluster 0 vs. all and an AUC = 68%, sensitivity = 67%, specificity = 66% for cluster 4 vs. all, similar to the discovery sample. Further projections of five pairwise models yielded AUCs >60% (Table S18). As observed in the discovery sample, cross-disorder (adjusted p = 0.03) and schizophrenia (p = 0.005) PGS were significantly lower in the replication-stage cluster 0 (Table S19 and Fig. 3E, G). For cluster 4, the MDD PGS (p = 0.01) was higher and the educational attainment PGS lower (p = 0.005), confirming the discovery-stage results (Fig. 3F, H). Schizophrenia and cross-disorder PGS were also higher in cluster 4, as in the discovery stage, but these associations showed only nominal significance and did not pass correction for multiple testing. In pairwise comparisons, replicated PGS associations included the associations of schizophrenia, cross-disorder, and educational attainment PGS when comparing cluster 0 with 4 (Table S20). MDD individuals in cluster 4 had, as in the discovery stage, significantly lower EA PGS than MDD patients in other clusters, whereas the association of ADHD PGS for MDD patients in cluster 4 did not replicate (Fig. S6C–D).
The symptoms and disease courses of patients diagnosed with any given major psychiatric disorder are highly heterogeneous, suggesting etiopathological differences between patients sharing the same diagnosis. The classification and treatment of psychiatric disorders rely on a nosological approach that does not necessarily reflect the disorders’ molecular etiology.
In the present study, we characterized subgroups in a large transdiagnostic cohort, including healthy controls, after clustering 57 multi-modal phenotypic variables. By combining model-based clustering with supervised machine learning for cluster characterization, we generated robust and replicable outcomes. Furthermore, we described clusters using genetic variables.
Comparison of clusters to a severity continuum
We identified five diagnostically mixed clusters, which were ranked along a continuous severity scale. Cluster 0 contained mostly healthy controls and was distinguished by the lowest severity in many measures, from the lowest maltreatment factors, depression level, and positive symptoms to the highest quality of life scores. Cluster 4 had the highest share of schizophrenia and SZA patients and showed the highest severity in many variables not used for the clustering, e.g., the medication load index and the number of hospitalizations. Clusters 1–3 ranged between these two extremes and differed mostly in their levels of maltreatment, depression and antidepressant use, daily functioning, and parental bonding.
Using principal component analysis and SigClust, we could not find support for the hypothesis that a simple severity component explains our clustering best (Results S1, Table S21). The five identified categorical clusters thus rank along but do not exactly correspond to a severity continuum.
Importantly, all but the smallest of these clusters were replicated in an independent sample. Given that the proportions of diagnoses in the replication sample differed, the replication of these clusters and their characteristics, especially the severity spectrum and genetic variables, is remarkable. It underlines the stability of the cluster solution and indicates that our approach did not suffer from overfitting in the discovery sample.
Characterization of potential disorder subtypes
Compared with DSM-IV diagnostic categories, our cluster solution surpassed diagnostic boundaries mostly for MDD and BD, while patients diagnosed with schizophrenia and SZA were primarily grouped in the high-severity cluster 4. This finding confirms etiological similarities between the affective disorders MDD and BD, distinguishing them from predominantly psychotic disorders [61, 62]. Inclusion of more schizophrenia patients may have led to better discrimination of schizophrenia subtypes, as identified in previous studies [32, 63].
MDD patients were present in all five clusters, suggesting that different disorder subtypes or stages were captured. Interestingly, 80% of MDD patients in the lowest severity cluster 0 were in remission of either single or recurrent MDD at the assessment time (coded according to the DSM). Hence, their present clinical presentation was similar to healthy individuals. MDD patients in cluster 1 might represent a reactive depression subtype, with similarities to burnout (i.e., a high somatization level and life stress, low energy, and a higher age of disorder onset). MDD cases in cluster 2, with the lowest average age of onset, might suffer from exogenous depression triggered by external stressors (maltreatment and neglect in childhood). Interestingly, this cluster also contained the highest ratio of BD type-II/type-I patients (Table S22). However, these patients also showed a high genetic predisposition for depression, with 48% reporting an MDD family history. In cluster 3, MDD patients showed a low influence of adverse environmental factors and high parental bonding, similar to cluster 0. Nevertheless, their quality of life was impacted negatively by illness—cluster 3 MDD patients showed low energy and experienced limitations in role activities because of physical and emotional health problems.
Consistent with the strong presence of schizophrenia patients, cluster 4 MDD patients exhibited depression with psychotic features, showing higher positive symptoms and more antipsychotic intake. These MDD patients had significantly higher ADHD PGS than MDD patients in other clusters (p = 0.009). Previous studies have identified correlations between ADHD in childhood and the development of other severe psychiatric disorders, especially schizophrenia, in adulthood [64,65,66]. Although not available at present, a retrospective assessment of ADHD symptoms during childhood in cluster 4 MDD cases might shed further light on this correlation. MDD (and BD) patients in cluster 4 showed significantly more psychotic features than MDD/BD cases in other clusters (Table S23).
Characterization of healthy controls
Healthy controls distributed across clusters 1–4 showed isolated symptoms similar to the psychiatric patients in these clusters (Table S24). The number of healthy controls decreased with cluster severity. Apparently, the symptoms of these healthy individuals were not sufficiently severe to generate a clinically relevant presentation of any psychiatric disorder fitting the currently used nosology. For example, these individuals may have only experienced short-term symptoms, e.g., resulting from a recent adverse life event. Indeed, healthy controls in cluster 4 showed a negative events score of 21, higher than the median of any other disorder group in the clusters showing high impairment. Alternatively, they might develop a disorder later in life; with a mean age of 32, the healthy individuals were younger than the average assessed patients.
Analyses of genetic differences between healthy controls assigned to different clusters identified nominally significant differences for the ADHD PGS, similarly to the MDD subtype analysis (Table S24 and Fig. S8). Follow-up assessments of the longitudinal FOR2107 study may reveal whether a higher share of healthy controls mapping to the more severe clusters will develop a disorder over time.
Moving beyond classical diagnostic groups
Possibly, the current diagnostic criteria do not capture the whole illness spectrum. Our study might thus contribute to improved diagnostic criteria, as envisioned by the Research Domain Criteria (RDoC) project. In agreement with the RDoC concept, we included variables from different domains, including behavioral tests for evaluating cognitive functioning. Although cluster 4 patients showed the lowest cognitive functioning, these differences did not substantially contribute to the clustering, possibly due to the “reliability paradox” of behavioral tests [68, 69]. These tests are particularly sensitive to situational modulators like attention and motivation as well as experience and learning effects.
MDD and BD patients were distributed over all five clusters, with similar shares of individuals mapping to clusters 2–4. Although most healthy controls were assigned to cluster 0 and most schizophrenic patients to cluster 4, 24% of healthy controls were not in cluster 0, and 30% of schizophrenia patients not in cluster 4. Among MDD patients, 22% were assigned to the high-severity cluster 4. The spread of MDD patients across all clusters supports the hypothesis that classical diagnostic groups may be inferior to a symptom-derived grouping of patients.
Characterization of clusters using PGSs
Supervised analyses of genetic variables confirmed that PGS added information to cluster comparisons beyond what could be assessed using the family history of disorders. The slight increase of explained variance conveyed by ancestry information underlined the highly polygenic nature of psychiatric disorders. Interestingly, a recent study highlighted the benefits of adding both the family history and PGS to prediction models. Psychiatric cross-disorder, schizophrenia, and MDD PGS were significantly higher in the most severe cluster 4 compared with cluster 0, whereas educational attainment PGS were lower, corresponding to effect directions reported in previous studies [47, 51, 53, 61, 71]. Although PGS are still far from routine clinical use in psychiatry, they might be used for patient stratification in the future [1, 5, 63, 72].
Interestingly, genetic PGS analyses on diagnostic categories produced different results from analyses of cluster labels. For example, cluster 0 showed higher educational attainment and lower neuroticism PGS, both of which did not differ significantly between healthy controls and the other probands. Similarly, cluster 4 showed an association with several PGS while schizophrenia patients only showed increased schizophrenia PGS. These genetic differences corroborated the transdiagnostic nature of the identified clusters.
Comparison to previous clustering studies
To our knowledge, the present study is the first to cluster multidomain profiles of clinical variables across psychiatric disorders while including healthy controls. Nevertheless, the cluster profiles and identified severity spectrum partially align with previous findings. A transdiagnostic study identified a cluster containing mainly healthy controls and exhibiting the lowest symptom scores in the observed dimensions, likely corresponding to our cluster 0. Our highly impaired cluster 4, with its high percentage of schizophrenic patients, low functioning, and significantly lower EA PGS, may correspond to the severe psychosis subtype from a previous study. Moreover, a single-disorder subtyping study detected five clusters of MDD, with one subgroup showing an absence of many symptoms, similar to our cluster 0. Furthermore, our results highlight the correlation of various measures of childhood trauma, adverse experiences, and lack of support with illness severity, positive symptoms, hospitalizations, and the need for more intensive treatment. Several prior studies support such a correlation [30, 73,74,75,76].
Most psychiatric patients in our transdiagnostic study have been diagnosed with MDD, with only a smaller share of other, especially psychotic diagnoses. Such a distribution approximately resembles known differences in prevalence between mental health disorders in the general population. Although the high number of MDD patients allowed for a detailed description of depression subtypes, a similarly detailed characterization was not possible for psychotic disorders, which concentrated in cluster 4. Future transdiagnostic studies applying our clustering approach with more psychotic patients could focus on BD and schizophrenia subtypes, as suggested by previous single-disorder studies [25, 28].
Although we observed no overrepresentation of depression-related variables in our analysis (Results S2), we cannot entirely exclude that the variable selection influenced the obtained clustering solution. Furthermore, the diagnostic groups differed in demographic variables like age and sex, resulting in corresponding differences between clusters (Table 1, Results S2).
Moreover, although we used independent individuals for the replication data set, these probands were subsequently recruited within the same study as the discovery-stage sample. Accordingly, the proportions of healthy controls and MDD patients differed between the discovery and replication samples, limiting their comparability. We conducted the quality control of the phenotypic and genetic data jointly for both data sets, introducing minor dependencies. Furthermore, the replication sample was smaller than the discovery sample, attenuating its statistical power.
Finally, the clustering algorithm we used relied on discrete categorization and a given number of clusters. Assuming the existence of a symptom continuum from healthy to severe mental illness, future studies might consider applying methods incorporating the notion of a continuum into the global objective function.
In conclusion, our study constitutes a data-driven, computational approach to psychiatric disorder stratification that surpasses existing diagnostic categories and integrates different domain profiles.
Our analyses support the hypothesis that psychiatric disorders consist of heterogeneous subtypes that share etiological factors and symptoms. We have demonstrated the importance of stratifying symptoms and disorder subtypes that can be ranked according to their severity. Individuals formally diagnosed with the same disorder differ in their specific impairment. Furthermore, their symptoms may partly overlap with symptoms exhibited by patients with different diagnoses, highlighting the need for symptom- instead of diagnosis-specific treatment. Our transdiagnostic clustering approach may advance the understanding of the heterogeneity within and between psychiatric disorders. If applied to further cohorts, it may help the identification of patient groups sharing clinical features and thus profiting from similar treatments. The identification of such groups can lead to the development of more appropriate diagnoses, targeted treatment options, and prediction models for the disease course. Future assessments in FOR2107 and other longitudinal studies can reveal whether patients mapping to the different clusters show similar disease courses and treatment responses.
Funding and disclosure
The Forschungsgruppe/Research Unit FOR2107 study was funded by the German Research Foundation (DFG): grants KI 588/14-1, KI 588/14-2 to T.K.; DA 1151/5-1, DA 1151/5-2 to U.D.; NE 2254/1-2 to I.N.; HA 7070/2-2, HA 7070/3, HA 7070/4 to T.H.; MU1315/8-2 to B.M.M.; RI 908/11-1, RI 908/11-2 to M.R.; NO 246/10-1, NO 246/10-2 to M.M.N.; WI 3439/3-1, WI 3439/3-2 to S.W. The study was supported by the German Federal Ministry of Education and Research (BMBF), through the Integrated Network IntegraMent, under the auspices of the e:Med programme (grants 01ZX1314A, 01ZX1614A to M.M.N.; 01ZX1314G, 01ZX1614G to M.R.; 01ZX1614J to B.M.M.), through BMBF grants 01EE1406C to M.R. and 01EE1409C to M.R. and S.H.W., and through ERA-NET NEURON, “SynSchiz - Linking synaptic dysfunction to disease mechanisms in schizophrenia - a multilevel investigation” (01EW1810 to M.R.) and BMBF grants 01EE1409C and 01EE1406C to M.R. and S.H.W. Till Andlauer was supported by the BMBF through the DIFUTURE consortium of the Medical Informatics Initiative Germany (grant 01ZZ1804A) and the European Union’s Horizon 2020 Research and Innovation Programme (grant MultipleMS, EU RIA 733161). The authors have nothing to disclose. Open Access funding enabled and organized by Projekt DEAL.
This work is part of the German multi-center consortium “Neurobiology of Affective Disorders. A translational perspective on brain structure and function”, funded by the German Research Foundation (Deutsche Forschungsgemeinschaft DFG; Forschungsgruppe/Research Unit FOR2107). Please see the Supplement for full FOR2107 acknowledgments. We would like to thank Karsten Borgwardt, Julien Gagneur, and Janos Kalman for their helpful comments.
Cite this article
Pelin, H., Ising, M., Stein, F. et al. Identification of transdiagnostic psychiatric disorder subtypes using unsupervised learning. Neuropsychopharmacol. 46, 1895–1905 (2021). https://doi.org/10.1038/s41386-021-01051-0