Classifications of psychiatric disorders are being transformed by the growing body of knowledge from the Research Domain Criteria (RDoC) approach1,2. Statistical modelling of complex clinical and behavioural datasets can facilitate the redefinition and reconceptualization of diagnoses with unclear boundaries3. Clinical manifestations and illness trajectories overlap across major psychiatric categories such as affective disorders4, anxiety disorders5,6 and psychosis7. In our previous report8, we found that personality disorder (PD) was very common in psychiatric patients, and that comorbidity between PD and other psychiatric disorders follows no fixed pattern and is typically uncertain. For instance, the proportion of anxiety disorders with comorbid PD varied from 2.1% (schizotypal PD) to 13.4% (obsessive–compulsive PD). The inverse is also true: the proportions of PD with other comorbid psychiatric disorders varied widely. Current categorical approaches to psychiatric disorders have been criticized for significant problems such as arbitrary diagnostic thresholds and extensive overlap among diagnostic categories. Internationally, a major shift in psychiatric diagnosis from categorical to dimensional approaches is underway. In addition, there is substantial evidence that psychiatric disorders with remarkably similar personality pathologies are classified into distinct disorders9,10. Therefore, we hypothesize that personality pathologies (PD traits), assessed dimensionally rather than as categorical PD diagnoses, may better reflect different subgroups of psychiatric patients.

Mental illness is associated not only with personality pathologies but also with childhood traumatic experiences (CTE)11. More importantly, CTE and PD are closely related and may jointly affect subsequent psychiatric disorders12,13. However, PD classifications remain remarkably heterogeneous and are often associated with multiple forms of CTE14. Previous studies12,13 have tended to focus on the correlational pattern between PD and CTE, but have not used these correlations to construct subtypes of mental disorders or subgroups of psychiatric patients. The present study applied, for the first time, canonical correlation analysis (CCA) to define subtypes by clustering psychiatric patients with different clinical diagnoses according to the patterns of the relationship (canonical variates) between PD traits and CTE. We further tested whether the CCA-derived subtypes differed with regard to demographics, clinical classifications, and PD categories.


Subjects, settings and procedures

The study was conducted in accordance with the tenets of the Declaration of Helsinki and was approved by the Research Ethics Committee of the Shanghai Mental Health Center (SMHC). All participants gave written informed consent at the recruitment stage of the study. The sample consisted of 3,075 consecutive outpatients who visited psychiatric or psychological health services in 2006 at the SMHC, the largest clinical setting for psychiatric services, which served more than 800,000 outpatients in 2018. The SMHC is not a regional psychiatric hospital but serves patients from the whole country. It has two outpatient settings: a psychiatric unit, which targets patients with severe mental disorders (such as psychosis), and a psychocounseling unit, which targets mental health problems (such as anxiety and depression).

In this survey, every 10th subject in the psychocounseling clinics and every 20th subject in the psychiatric clinics was selected. Inclusion criteria were as follows: (i) age, 18–60 years; (ii) capacity to provide informed consent; and (iii) an educational background of at least junior middle school. Junior middle school is part of the nine-year compulsory education in China, so this is a minimal educational requirement. Patients with severe somatic diseases, acute phase of psychoses, or diagnoses of mental retardation or dementia were excluded. Details of the study procedures, study setting, and measurements and assessments, including the steps taken to ensure rigorous quality control and assurance of the procedures, are reported elsewhere8,15.

Outpatients who met our inclusion criteria were invited to take part in an epidemiological survey and a free personality assessment. PDs were diagnosed in a two-stage process consisting of a questionnaire followed by a face-to-face interview. Individuals whose Personality Diagnostic Questionnaire Fourth Edition Plus (PDQ-4+)16 test results were positive (total score, >28 or specific PD subscale scores, >4 or 5) entered the second stage. These participants were referred to two senior psychiatrists with 5 years of experience, each of whom had received 2 weeks of training to perform the structured clinical interview for PD assessment. The final sample included in this analysis comprised 2090 patients who completed both the self-reported questionnaires and the face-to-face interview.


Demographic data

A self-developed demographic questionnaire was administered to assess the participants’ personal, family, and social background, including their physical and mental health conditions (Table 1).

Table 1 Demographic characteristics, personality disorder traits and childhood traumatic experience, comparison of groups among clinical diagnoses.

PD trait

The PDQ-4+, a self-reported questionnaire, was used to assess PD traits. Our previous study8 and other research have shown that the PDQ-4+ has high sensitivity (0.89) and moderate specificity (0.65) for screening PD patients. The PDQ-4+ was designed to measure the traits of all 10 PDs in DSM-IV, as well as negativistic PD and depressive PD, which are included in the Appendix of DSM-IV. The 10 DSM-IV PD traits are Paranoid PD trait (7 items, cut-off = 4), Schizoid PD trait (7 items, cut-off = 4), Schizotypal PD trait (9 items, cut-off = 5), Histrionic PD trait (8 items, cut-off = 5), Narcissistic PD trait (9 items, cut-off = 5), Borderline PD trait (9 items, cut-off = 5), Antisocial PD trait (8 items, cut-off = 4), Avoidant PD trait (7 items, cut-off = 4), Dependent PD trait (8 items, cut-off = 5), and Obsessive-compulsive PD trait (8 items, cut-off = 4).
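As an illustration only, the first-stage screening rule (a total score above 28, or a subscale score above its cut-off) can be sketched in Python using the cut-offs listed above. The function name `screen_positive`, the dictionary keys, and the strict "greater than" comparison are assumptions made for this sketch; the text does not state whether the subscale comparison is strict, and this is not the study's scoring code.

```python
# Subscale cut-offs as listed in the text for the 10 DSM-IV PD traits.
CUTOFFS = {
    "paranoid": 4, "schizoid": 4, "schizotypal": 5, "histrionic": 5,
    "narcissistic": 5, "borderline": 5, "antisocial": 4, "avoidant": 4,
    "dependent": 5, "obsessive_compulsive": 4,
}
TOTAL_THRESHOLD = 28  # a total score above this value counts as screen-positive


def screen_positive(subscale_scores, total_score):
    """Return (is_positive, flagged_subscales) for one respondent.

    Assumption: a subscale is flagged when its score is strictly greater
    than its cut-off. Subscales without a listed cut-off are ignored.
    """
    flagged = sorted(
        name for name, score in subscale_scores.items()
        if name in CUTOFFS and score > CUTOFFS[name]
    )
    return total_score > TOTAL_THRESHOLD or bool(flagged), flagged
```

For example, `screen_positive({"paranoid": 5, "schizoid": 2}, 20)` flags the paranoid subscale even though the total score is below the threshold.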

PD diagnosis

The Structured Clinical Interview for DSM-IV Axis II (SCID-II)17 was designed to assess all 10 PDs in the DSM-IV, whose criteria were used for PD diagnoses in this study. The SCID-II was translated into Chinese and implemented by our team. Previous studies have demonstrated that the Chinese version of the SCID-II has a relatively high test–retest reliability of 0.70 and a median internal consistency coefficient of 0.70, and is highly consistent with clinical diagnoses (diagnostic agreement rate of 90.7%).

Childhood trauma experience (CTE)

A quantitative index of the severity of childhood adversity was obtained using the Childhood Trauma Questionnaire Short Form (CTQ-SF)14,18. The CTQ-SF consists of 28 self-reported items that yield 5 CTE subscales: emotional abuse, physical abuse, sexual abuse, emotional neglect, and physical neglect. Each subscale score ranges from 5 (low level of CTE) to 25 (high level of CTE).

Statistical analysis


Relationships between PD traits and CTE were investigated using nonparametric correlations and CCA. Spearman rank correlations were used to test for associations between PD traits (12 scales) and CTE (5 subscales), giving 60 combinations, of which 43 were significant (Fig. 1). We used Spearman rank correlations because the clinical variables did not follow a normal, continuous distribution. To select a subset of relevant, non-redundant clinical and cognitive features, CCA was applied to identify a representation of PD trait domains that were associated with weighted combinations of CTE. CCA determines pairs of linear combinations, termed canonical variates, from two sets of variables (PD traits and CTE) (Fig. 2), such that the correlation between the canonical variates in each pair is maximized.
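The pairwise correlation step can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the study's analysis code: the sample size, the simulated scores, the variable names, and the 0.05 significance threshold are all assumptions, since the text does not report the threshold or any correction for multiple comparisons.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_patients = 200  # hypothetical sample size for the sketch

# Synthetic stand-ins: 12 PD trait scores and 5 CTE subscale scores that
# share one latent factor, rounded and clipped to look like ordinal scores.
latent = rng.normal(size=(n_patients, 1))
pd_traits = np.rint(3 + latent + rng.normal(size=(n_patients, 12))).clip(0, 9)
cte = np.rint(10 + 2 * latent + 3 * rng.normal(size=(n_patients, 5))).clip(5, 25)

n_pd, n_cte = pd_traits.shape[1], cte.shape[1]
rho = np.empty((n_pd, n_cte))   # Spearman coefficients, one per trait/CTE pair
pval = np.empty((n_pd, n_cte))  # matching two-sided p-values
for i in range(n_pd):
    for j in range(n_cte):
        rho[i, j], pval[i, j] = spearmanr(pd_traits[:, i], cte[:, j])

alpha = 0.05  # assumed threshold, uncorrected
n_significant = int((pval < alpha).sum())
print(f"{n_significant} of {rho.size} correlations significant at {alpha}")
```

The resulting 12 × 5 matrices cover the 60 trait-by-CTE combinations; `spearmanr` handles tied ranks, which matter for short ordinal scales such as these.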

Figure 1 Spearman correlation between childhood traumatic experience (CTE) and personality disorder (PD) traits. Note: The connections between CTE and PD traits are sized based on the p-values.

Figure 2 Two pairs (A,B) of canonical correlations between childhood traumatic experience (CTE) and personality disorder (PD) traits.


To find clusters in the two-dimensional space defined by the two canonical variates (each point represents an individual case), hierarchical cluster analysis was applied using MATLAB’s pdist, linkage, cluster, and clusterdata functions. For the current analysis, an average-linkage algorithm was used to cluster outpatients, with Euclidean distance as the metric of sample similarity. This method finds clusters among the data points according to the inter-point and inter-cluster distances. The Euclidean distance between every pair of subjects in this two-dimensional feature space was calculated, and then Ward’s minimum variance method was used to select a specific clustering from the dendrogram (Fig. 3), iteratively linking pairs of subjects in closest proximity and forming progressively larger clusters in a hierarchical tree. A three-cluster solution was optimal for summarizing relatively homogeneous subgroups that were maximally dissimilar from each other. Additional potential clustering solutions, indicating four or five clusters, were also evident, nested within these subgroups. Detailed baseline demographics, PD diagnoses, PD traits and CTE for the three clusters can be found in Table 2.
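A rough Python counterpart of this clustering step is sketched below with SciPy, whose `pdist`, `linkage`, and `fcluster` functions play a role similar to the MATLAB functions named above. The data are synthetic stand-ins for the two canonical variates, and the choice of average linkage, the group centres, and the variable names are assumptions for illustration rather than a reproduction of the study's pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)

# Synthetic stand-in for the two canonical variates: three well-separated groups.
centers = np.array([[-5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])
variates = np.vstack([c + 0.5 * rng.normal(size=(100, 2)) for c in centers])

# Condensed vector of pairwise Euclidean distances between subjects.
distances = pdist(variates, metric="euclidean")

# Agglomerative merge tree; each step joins the two closest clusters,
# with closeness measured by the average inter-point distance.
tree = linkage(distances, method="average")

# Cut the tree so that at most three clusters remain (labels start at 1).
labels = fcluster(tree, t=3, criterion="maxclust")
sizes = np.bincount(labels)[1:]
print(dict(enumerate(sizes.tolist(), start=1)))
```

Replacing `method="average"` with `method="ward"` gives Ward's minimum variance criterion on the same distances; comparing cuts at three, four, and five clusters is one way to inspect nested solutions.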

Figure 3 Hierarchical cluster analysis of two canonical variates and a scatterplot for the three clusters.

Table 2 Demographic and clinical characteristics, personality disorder traits and childhood traumatic experience, comparison of groups among three clusters.


We further assessed the utility of the extracted sub-clusters of outpatients in classifying clinical diagnoses. To evaluate whether the subtypes are identical to clinical phenomenological diagnostic definitions, a support vector machine (SVM) model with a linear kernel was trained and validated to illustrate the relationship of the three-cluster solution to clinical diagnostic classifications (Fig. 4).
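The comparison can be illustrated with a short scikit-learn sketch. Everything here is assumed for demonstration: the features are synthetic two-dimensional points standing in for the canonical variates, the diagnosis labels are random, and 5-fold cross-validation is used because the text does not specify the validation scheme. The helper name `cross_validated_report` is hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

rng = np.random.default_rng(3)

# Synthetic features: three separated groups in a two-dimensional space.
centers = np.array([[-5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])
features = np.vstack([c + 0.5 * rng.normal(size=(100, 2)) for c in centers])
cluster_labels = np.repeat([1, 2, 3], 100)

# Hypothetical diagnosis labels drawn at random, so unrelated to the features.
diagnosis_labels = rng.integers(0, 3, size=300)


def cross_validated_report(X, y, folds=5):
    """Fit a linear-kernel SVM with k-fold CV; return accuracy and confusion matrix."""
    model = SVC(kernel="linear")
    predicted = cross_val_predict(model, X, y, cv=folds)
    return accuracy_score(y, predicted), confusion_matrix(y, predicted)


cluster_acc, cluster_cm = cross_validated_report(features, cluster_labels)
diagnosis_acc, diagnosis_cm = cross_validated_report(features, diagnosis_labels)
print(f"cluster labels: {cluster_acc:.3f}, diagnosis labels: {diagnosis_acc:.3f}")
```

Labels derived from the feature space are recovered almost perfectly, whereas labels unrelated to the features are predicted at roughly chance level. High accuracy for cluster labels is therefore expected whenever the classifier sees the same features that defined the clusters.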

Figure 4 Support Vector Machine (SVM) model for three-cluster solution (A) and clinical diagnostic classifications (B).


Sample characteristics

Characteristics of 2090 patients are presented in Table 1, including demographics, clinical diagnosis, PD diagnosis, PDQ-4+ scores, and CTQ scores.


Most CTQ scores were significantly associated with PD traits (Fig. 1), particularly emotional abuse with Cluster A and Cluster B PD traits, physical abuse with Cluster B PD traits, sexual abuse with Cluster B PD traits, and emotional and physical neglect with Cluster A PD traits.

Canonical correlation

The CCA identified a dimensional representation of CTE, quantified by the CTQ, that was associated with the dimensional representation of PD traits, quantified by the PDQ-4+. After performing the CCA, we determined two linear combinations of PD traits (canonical variates) that were correlated with distinct CTE combinations, which we term “emotional abuse-related dissociality PD traits” (Fig. 2A) and “emotional neglect-related sociality PD traits” (Fig. 2B).

The first pair of canonical dimensions extracted by the CCA from CTE and PD traits showed a statistically significant correlation coefficient of 0.31 (p < 0.001). The first CTE canonical variate indicated that primarily emotional abuse was correlated with antisocial and paranoid PD traits (Fig. 2A), which were defined as dissociality PD traits. The scatterplot illustrates the correlation between dimensional CTQ scores and dimensional clinical scores for the PDQ-4+. To the left of the scatterplot, PDQ-4+ score loadings are depicted for those PD traits with the strongest loadings. To the right of the scatterplot, CTQ score loadings are depicted for emotional abuse, which demonstrated the strongest loading.

The second pair of canonical dimensions extracted by the CCA showed a weaker but statistically significant correlation coefficient of 0.12 (p < 0.001). The second CTE canonical variate indicated that primarily emotional neglect was correlated with schizoid, passive-aggressive, depressive, histrionic, and avoidant PD traits (Fig. 2B), which were defined as sociality PD traits.
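To make the notion of paired canonical variates concrete, the following NumPy sketch computes the first two canonical correlations between two synthetic variable sets. It uses a standard QR and SVD formulation chosen for this illustration, not necessarily the routine used in the study. The function name `cca`, the simulated data, and the assumption that both data matrices have full column rank are all hypothetical, and no significance test is included.

```python
import numpy as np


def cca(X, Y, n_components=2):
    """Canonical correlation analysis via QR and SVD.

    Assumes X and Y have full column rank. Returns the canonical
    correlations, the paired variates, and the weight vectors.
    """
    Xc = X - X.mean(axis=0)  # centre each variable
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)  # orthonormal bases for each variable set
    Qy, Ry = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations (descending).
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    k = n_components
    x_weights = np.linalg.solve(Rx, U[:, :k])
    y_weights = np.linalg.solve(Ry, Vt.T[:, :k])
    return s[:k], Xc @ x_weights, Yc @ y_weights, x_weights, y_weights


rng = np.random.default_rng(1)
n = 500
latent = rng.normal(size=(n, 2))  # two shared latent dimensions
pd_traits = latent @ rng.normal(size=(2, 12)) + rng.normal(size=(n, 12))
cte = latent @ rng.normal(size=(2, 5)) + rng.normal(size=(n, 5))

corrs, pd_variates, cte_variates, pd_weights, cte_weights = cca(pd_traits, cte)
print("canonical correlations:", np.round(corrs, 2))
```

Each column of `pd_variates` pairs with the same column of `cte_variates`, and the Pearson correlation within a pair equals the corresponding canonical correlation. The weight vectors indicate which original variables contribute most to each variate.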

Hierarchical cluster analysis

To explore clusters, a hierarchical cluster analysis was applied to assign samples to nested subgroups with similar patterns of relationships between CTE and PD traits. Our analysis revealed three clusters defined by distinct and relatively homogeneous patterns along two dimensions (Fig. 3), comprising 17.5% (cluster 1, n = 365), 34.8% (cluster 2, n = 727), and 47.8% (cluster 3, n = 998) of the 2090 participant sample.


The three-cluster SVM model was trained and validated using the linear kernel. The results of the full analysis (confusion matrix, accuracy, sensitivity, and specificity) are presented in Fig. 4. Figure 4A shows the confusion matrix for the three clusters obtained by the SVM with a linear kernel, which achieved an overall accuracy of 96.8%. In contrast, Fig. 4B shows that the model for the three clinical diagnostic categories (schizophrenia, mood disorder, anxiety disorder) achieved an overall accuracy of only 45.0%.

Distribution of subtypes across clinical diagnoses

Table 2 depicts the demographic, clinical Axis-I diagnosis, PD diagnosis, and CTE profiles of the three clusters defined by the CCA. The features of the three subtypes are summarized as follows: Cluster 3 had the highest proportion of PD, and reported the most severe pathology of PD traits, and the most severe CTE. Cluster 1 was moderately impaired on PD traits and reported moderate CTE. Cluster 2 had the lowest proportion of PD, and reported the mildest pathology of PD traits, and the least CTE. As illustrated in Fig. 5, there was considerable mixing across subtypes of clinical diagnoses. Similarly, there was considerable overlap across the clinical diagnoses on PD traits and CTE scores. There was an equal distribution of clinical diagnoses across subtypes (Table 2), except for anxiety disorders. These subtypes suggest more distinct PD trait correlates of CTE manifestation than were captured by clinical phenomenological diagnostic definitions.

Figure 5 Distribution of personality disorder (PD) traits and childhood traumatic experience (CTE) scores by subtype and clinical diagnosis.


The present study used CCA to determine whether psychiatric patients display different subtypes that can be distinguished by combinations of PD traits and CTE variables. We found three such subtypes independent of specific clinical diagnostic classifications. Each subtype included all psychiatric categories, but in Cluster 3, there were higher and lower numbers of patients with schizophrenia and anxiety disorders, respectively.

An important feature of our CCA approach was that it avoided multicollinearity (highly correlated variables that make it difficult for a model to produce accurate estimates)19. CCA uses information from all the variables in the PD trait and CTE variable sets, maximizes the estimation of the relationship between the two sets20, and may identify subtypes in a more efficient way. Compared with conventional multiple testing, CCA may also help reduce type I error and increase the accuracy of the results. Our analyses produced statistical support for the existence of three subtypes among Chinese psychiatric patients, as well as effective characteristic variables for distinguishing Cluster 3 patients, who showed the most severe personality pathology, from the other two clusters.

The present findings provide the first evidence, to our knowledge, of transdiagnostic subtypes derived from the correlation between PD traits and CTE at the clinical population level. This result is linked to the self-report unit of analysis in RDoC21 and stands in contrast to other RDoC studies that have identified subtypes via biological dimensions22,23,24. Cluster 3 was characterized by extensive and severe dissociality PD traits and more emotional abuse during childhood. In contrast, individuals in Cluster 1 were characterized by severe sociality PD traits and experienced more emotional neglect. Individuals in Cluster 2 showed less severe PD traits and CTE. These differential patterns of PD traits and CTE across subtypes offer a possible explanation for the marked diagnostic disagreement25 in psychiatric disorders that is routinely observed across clinicians26. Our study, which followed the RDoC approach, suggests that subtypes may be better defined by transdiagnostic factors such as PD traits and CTE than by the current overreliance on classifying patients by psychiatric diagnosis.

These data also indicate that there may be multiple pathways to similar clinical psychiatric manifestations. Subtypes identified in the present study were also distributed fairly uniformly across all clinical categories, especially in affective disorders, suggesting that PD traits and CTE27 have broadband transdiagnostic associations across common psychiatric disorders28. This finding extends the results of previous RDoC studies29,30 to the personality pathology-childhood adversity dimensions of a wide range of disorders. Furthermore, by exploring CTE and its association with PD traits, the present study was able to examine the role of different types of CTE in PD traits within a transdiagnostic framework.

Clinical implications

The heterogeneity of psychiatric diagnosis has emerged as a major obstacle to the development of intervention strategies and, in particular, methods of psychotherapy targeting personality31 and trauma32. Similarly, efforts to validate and replicate certain psychotherapies are inefficient, owing in part to the inability to select for intervention the psychiatric patients who are most likely to benefit from personality remodelling and psychological trauma rehabilitation. Further, each psychiatric disorder is treated as a single label covering heterogeneous PD traits and CTE, which is problematic because different subtypes are likely to require different treatment. In the current study, we identified three subtypes of patients that were associated with different patterns of PD traits and distinct CTE, and that were broadly distributed across psychiatric diagnoses. However, this subtype approach is in its initial stages; therefore, it is premature to make definitive claims until more clinical interventions targeting subtypes of patients are conducted.

Strengths and limitations

The strengths of the present study include its novelty in developing CCA-driven subtypes for a clinical population, based on a large-scale, transdiagnostic, psychiatric sample. The use of canonical variates (i.e., combinations of PD traits and CTE) as indicators is novel. However, several limitations of this study must be considered. First, a single centre was used for sample recruitment, which could limit the generalizability of the findings. However, the centre is the largest psychiatric hospital in China and serves over 800,000 outpatients per year, more than half of whom come from across the nation. Second, the design is cross-sectional; it remains unknown whether the three subtypes of patients predict symptom onset and maintenance. Future longitudinal studies involving experimental manipulation of psychotherapy will be important for clarifying the precise role of these subtypes. Third, CTE was measured using retrospective self-reports, which can be biased33 and may be overestimated in patients with negative life outcomes in adulthood34. Fourth, the current dataset included only patients whose PDQ-4+ screening was positive, so it is uncertain whether other subtypes could be identified among patients whose PDQ-4+ test was negative. Finally, while the current dataset contains a relatively large sample, it is limited by the single-site design and the lack of an independent sample for model validation. Thus, the SVM validation of the three-cluster model may suffer from overfitting.


The present study extends previous work on the transdiagnostic approach by applying a CCA-driven method to determine subtypes that capture the distinctive PD trait-CTE patterns in psychiatric disorders. Three subtypes emerged with distinctive features based on two dimensions: emotional abuse-dissociality PD traits and emotional neglect-sociality PD traits. If replicated, these findings would suggest the clinical utility of these subtypes in psychiatric practice. Further research is needed to explore whether psychiatric outcomes can be improved by more individualized (i.e., by subtype) psychosocial interventions.