Introduction

Emotions are temporally limited, qualitative states that are associated with changes in feelings, expression and physical states1. Emotion generation can be described in four steps: (1) an emotionally relevant situation occurs; (2) we direct our attention to this situation; (3) we judge and evaluate the situation, which results in (4) an emotional reaction that can then either be regulated or not2. A common model of emotion regulation (ER) is the one proposed by Gross3. According to this model, emotions that need regulating have to be identified, an ER strategy has to be selected, and this strategy has to be implemented and evaluated to determine whether it is sufficient or needs adaptation. ER thus comprises all processes that enable us to influence when and how we experience and express our emotions.

ER has been proposed as an additional, sixth domain within the Research Domain Criteria (RDoC) initiative of the National Institute of Mental Health (NIMH) that reflects interactions among the other five domains4,5. Recent studies have shown that 40% to 70% of mental disorders are characterized by emotion dysregulation, Attention Deficit/Hyperactivity Disorder (ADHD) being one of them6,7. This line of research points to ER and dysregulation as a biologically founded, dimensional, key transdiagnostic factor cutting across the RDoC domains of the negative (ie, responses to threat/fear) and positive valence systems (ie, responses to reward), the cognitive systems (ie, regulation of capacity-limited systems such as awareness, higher perceptual processes, motor action), the systems for social processes (ie, affiliation/attachment and social communication), and the arousal/regulatory systems (ie, activation of neural systems for various contexts)8. As such, ER deficits distinguish between groups of healthy and mentally ill people and have been observed in experimental research across virtually all mental disorders9, indicating that such deficits are central to the development and maintenance of psychopathology10,11.

In children, adolescents and adults with ADHD, common co-occurring features include altered emotional experience, such as irritability, emotional hyper-responsiveness and rapidly changing moods, and impaired emotion regulation, as seen in reactive aggression, temper outbursts and poorly controlled behavior in emotionally charged situations12,13,14. A comprehensive review on ADHD and ER concluded that emotion regulation deficits (ERD) are evident in about 34–70% of adults with ADHD15. These features are associated with mental health problems and unfavorable psychosocial outcomes in the mid and long term16,17,18. For instance, high levels of emotional lability explain a substantial portion of variance in the morbidity and burden of adult ADHD12, and ERD has been associated with significant functional impairments such as professional and familial instability17,19, as well as with high-risk behaviors such as risky sexual or driving behaviors19,20. Accordingly, ERD has been recognized as an important dimension of ADHD15,21,22, but it is considered neither in the diagnostic criteria for the disorder in the Diagnostic and Statistical Manual of Mental Disorders23,24 nor in the International Classification of Diseases, 10th revision25. ERD has not been included as a third dimension alongside attention deficits and hyperactivity/impulsivity because these symptoms are not specific to ADHD26,27, they vary substantially even in healthy controls28, and, finally, both an explanation for the etiology of these affective symptoms in ADHD and a clear-cut definition of the various emotional symptoms29 are missing. The latter is also acknowledged in the current practitioner review on emotional dysregulation in ADHD29. Although there is no single, standard definition of ERD, it is generally regarded as a multidimensional construct entailing a lack of inhibition together with strong negative and positive emotions, and a failure to engage in self-regulatory actions27.
Shaw and colleagues15 discuss three different models of the association between adult ADHD and emotion regulation, which are also presented in the recent practitioner review26.

The first model conceptualizes ERD as a core symptom of ADHD30,31,32 based on a joint neurocognitive deficit, such that emotion regulation deficits in ADHD are underpinned by broader aspects of self-regulation and executive control33. The second model argues that ERD entails specific, dissociable neurocognitive components beyond executive dysfunction34; that is, ERD and ADHD are seen as correlated but distinct dimensions with overlapping but separable neurocognitive deficits. This is supported by recent observations that emotion regulation independently contributed to distinguishing children with ADHD from typically developing children35,36. The third model, which addresses the observed overlap between ADHD and emotional dysregulation, emphasizes the need to consider the combination of ADHD and ERD as a separate entity37. This view is supported by the observation that emotional dysregulation is more often seen in the combined ADHD presentation, with a higher symptom load and a poorer outcome14,22,38,39, and by studies demonstrating that ADHD and bipolar disorder often co-occur and aggregate in families at higher-than-expected rates40. For example, Surman et al.37 showed that deficient emotional self-regulation was elevated only in siblings of adult patients who had both ADHD and deficient emotional self-regulation. At the very least, it seems that ERD should be viewed as a defining feature of more severe adult ADHD14.

To further illuminate the link between ERD and ADHD specified by these three conceptual positions, we recently used a confirmatory factor analysis approach41. We modeled positive and negative emotion as well as emotion regulation skills along with the classical ADHD core symptoms. We showed that negative affect and the failure to apply adaptive emotion regulation skills were distinct and indicative dimensions in adult ADHD41. Although this result supports the special importance of ERD in ADHD, it neglects the obvious clinical heterogeneity of emotional symptoms observed in adult ADHD samples. Following the work by Nigg and colleagues42 on subtyping executive deficits in ADHD, the present study uses a person-centered approach to model ADHD heterogeneity within the emotional dimensions we have already established.

Results

The k-means cluster analysis algorithm in ALMO 15 can examine how strongly each classification variable influences cluster formation. Table 1 displays the partial η2 for the variables entered into the k-means cluster analysis performed in ALMO 15.

Table 1 Partial η2 for the variables entered into the k-means cluster analysis performed in ALMO 15.

A two-cluster solution with an F value of 87.21 and an η2 of 0.185 was proposed, meaning that 18.5% of the variance can be explained by this partitioning. This solution meets good quality criteria43,44. Table 1 shows that several variables have partial η2 values below the cut-off of 0.06 for a medium effect45 and therefore contribute little to cluster formation. We therefore excluded them from further analyses to obtain a stable cluster solution. These were all neuropsychological variables (ASTM, Qb+ Activity, Qb+ Impulsivity, Qb+ Inattention), CAARS-O inattention, and the SCID-II total number of symptoms for obsessive-compulsive, histrionic, and antisocial personality disorder.
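The variable screening described above can be illustrated with a minimal Python sketch. It assumes that the partial η2 ALMO reports for a single variable corresponds to a one-way ANOVA of that variable on cluster membership (η2 = SS_between / SS_total); ALMO's exact internal computation is not documented here, and the function names and any toy data are hypothetical.

```python
from statistics import fmean


def eta_squared(values, labels):
    """Eta squared of one variable: SS_between / SS_total for the given cluster labels."""
    grand = fmean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = 0.0
    for cluster in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == cluster]
        # each cluster contributes n_k * (cluster mean - grand mean)^2
        ss_between += len(group) * (fmean(group) - grand) ** 2
    return ss_between / ss_total if ss_total > 0 else 0.0


def screen_variables(data, labels, cutoff=0.06):
    """Split variables into kept and dropped using the medium-effect cut-off (0.06)."""
    etas = {name: eta_squared(vals, labels) for name, vals in data.items()}
    kept = [name for name, eta in etas.items() if eta >= cutoff]
    dropped = [name for name, eta in etas.items() if eta < cutoff]
    return etas, kept, dropped
```

With a single grouping factor, partial η2 and η2 coincide, which is why one function serves both readings here.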

We substantiated our analysis in R. Several indicators, such as the average proportion of non-overlap (APN), average distance between means (ADM), figure of merit (FOM), Dunn index and silhouette coefficient, indicated that the k-means algorithm with two clusters was the most appropriate grouping for our data (R package clValid)46. Using the R package NbClust, which provides 30 indices for choosing the number of clusters, we further examined the optimal number of clusters in k-means. The results are displayed in Fig. 1.
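As an illustration of one of the internal validity indices named above, the silhouette coefficient can be sketched in plain Python. This is an independent sketch of the textbook definition s(i) = (b − a) / max(a, b) with Euclidean distances, not the clValid implementation; the function name and example points are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width over all points (Euclidean distance, >= 2 clusters)."""
    n = len(points)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        # collect distances from point i to all other points, grouped by cluster
        dists = {k: [] for k in clusters}
        for j in range(n):
            if j != i:
                dists[labels[j]].append(math.dist(points[i], points[j]))
        own = dists[labels[i]]
        if not own:  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(own) / len(own)  # mean within-cluster distance
        b = min(sum(d) / len(d) for k, d in dists.items() if k != labels[i] and d)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n
```

Values close to 1 indicate compact, well-separated clusters; negative values indicate that points sit closer to another cluster than to their own.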

Figure 1

Optimal number of clusters in k-means determined by R package NbClust.

Fig. 1 shows that 12 indices propose two clusters and six indices propose three clusters as the optimal number. We therefore chose the two-cluster solution. The ratio of the between-cluster sum of squares to the total sum of squares indicated that 25.0% of the variance can be explained by this partitioning. This approximately corresponds to the result in ALMO and also signals a satisfactory solution. Cluster 1 consisted of 181 (47%) and cluster 2 of 204 (53%) patients.
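The two quantities used above, the majority vote across indices and the share of variance explained by the partition, can be sketched as follows. This is a hypothetical illustration: `majority_k` mimics the majority rule NbClust reports, and `between_total_ratio` computes the same ratio as `betweenss / totss` from R's `kmeans` output, assuming metric variables.

```python
from collections import Counter


def majority_k(votes):
    """Number of clusters proposed by the most indices (majority rule)."""
    return Counter(votes).most_common(1)[0][0]


def between_total_ratio(points, labels):
    """Between-cluster sum of squares divided by the total sum of squares."""
    dims, n = len(points[0]), len(points)
    grand = [sum(p[d] for p in points) / n for d in range(dims)]
    total_ss = sum((p[d] - grand[d]) ** 2 for p in points for d in range(dims))
    between_ss = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        # weight each centroid's squared distance to the grand mean by cluster size
        between_ss += len(members) * sum((centroid[d] - grand[d]) ** 2 for d in range(dims))
    return between_ss / total_ss


# Only the vote counts reported in the text (12 indices for k = 2, six for k = 3)
best_k = majority_k([2] * 12 + [3] * 6)
```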

The descriptive statistics of the classification variables in Table 2 reveal that subjects in cluster 2 have a higher symptom load on the CAARS: S & O, higher mean depressive symptoms (BDI-II), a higher total mean on the SCL-90-R, less positive affect, higher negative affect, greater difficulties in emotion regulation, and a higher symptom load on all included personality disorders of the SCID-II questionnaire. According to Cohen's d, most of the differences can be considered large. All differences between the two clusters reach statistical significance (Welch test, p < 0.0001) after adjustment for multiple testing.
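The group comparison described above (Welch test, Cohen's d, Bonferroni adjustment) can be sketched with the standard library only. This sketch returns Welch's t and the Welch-Satterthwaite degrees of freedom rather than a p value; in practice the p value could come from `scipy.stats.ttest_ind(x, y, equal_var=False)`. Function names are illustrative.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Welch's test does not assume equal variances, which is why its degrees of freedom are usually non-integer.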

Table 2 Descriptive statistics of classification variables used in k-means cluster analysis.

To better understand the ERSQ differences between the two clusters, we compared our data with previously published clinical and healthy samples (Table 3). Both cluster 1 and cluster 2 report significantly lower emotion regulation skills than two independent samples of healthy controls. The effect sizes for cluster 1 show that, compared with healthy samples, the deficits in ER are small to medium. For cluster 2, the same comparison reveals large deficits. Compared with clinical samples, cluster 2 shows even larger deficits than patients with major depressive and adjustment disorders, but deficits similar to those of patients with chronic recurrent depressive disorders. In contrast, cluster 1 reports better skills than the other clinical groups.
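Because only summary statistics are available for published comparison samples, an effect size for such comparisons has to be computed from means, standard deviations and sample sizes. A minimal sketch, assuming the pooled-SD form of Cohen's d and the conventions stated in the Methods (0.2 small, 0.5 medium, 0.8 large); the function names and numbers in the example are hypothetical.

```python
import math


def cohens_d_summary(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from group means, SDs and sizes, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


def label_effect(d):
    """Classify |d| by Cohen's conventions (0.2 small, 0.5 medium, 0.8 large)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"
```

For example, `cohens_d_summary(2.0, 1.0, 50, 2.5, 1.0, 50)` gives d = −0.5, a medium effect.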

Table 3 Descriptive Statistics of the ERSQ total mean score for cluster 1 and cluster 2 and previous, large samples using the ERSQ for assessing emotion regulation skills; low values indicate reduced skills.

The variables not used for classification (Table 4) show that the two clusters do not differ much in age and neuropsychological measures. The means on the CAARS DSM scales underline the higher degree of ADHD symptoms in cluster 2. Table 4 also shows that several differences tested with the Welch test reach statistical significance after adjustment for multiple testing, although the respective effect sizes are only small to medium, indicating that these differences are of limited clinical relevance.

Table 4 Descriptive statistics of variables not used for classification in final cluster analysis solution.

In cluster 1 the proportion of males is 67.4% (n = 122), whereas it is 54.4% (n = 111) in cluster 2. There is an association between cluster membership and gender (χ2 = 6.78, df = 1, p = 0.009). Cramér's V of 0.1347 indicates a small effect.
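The χ2 test and Cramér's V used above can be sketched for any r × c table of counts, here illustrated with the gender-by-cluster counts given in the text (males 122 vs. 111; females 181 − 122 = 59 vs. 204 − 111 = 93). The sketch computes the uncorrected Pearson statistic (no Yates continuity correction); the closed-form p value is only valid for df = 1. Function names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square for an r x c table of counts (no continuity correction)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return chi2, df, n


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


def p_value_df1(chi2):
    """Upper-tail p for chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# rows: male, female; columns: cluster 1, cluster 2 (counts from the text)
gender_by_cluster = [[122, 111], [59, 93]]
chi2, df, n = chi_square(gender_by_cluster)
v = cramers_v(chi2, n, 2, 2)
p = p_value_df1(chi2)
```

The same `chi_square` and `cramers_v` functions apply unchanged to the 2 × 3 table of ADHD presentations.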

Table 5 indicates that the proportions of ADHD presentations differ substantially between the two clusters (χ2 = 30.61, df = 2, p < 0.001). Cramér's V of 0.29 indicates a small effect. Cluster 2 is dominated by patients with the combined type (85.0%), whereas cluster 1 also contains a substantial proportion of patients with the predominantly inattentive type (35.6%).

Table 5 Frequencies and proportions of ADHD presentations in the two-cluster solution.

The two clusters differed significantly in the number of comorbid diagnoses (Fisher's exact test, p = 0.001). Cluster 2 has a higher proportion of patients with two comorbid diagnoses, indicating that patients in cluster 2 have a higher symptom load (see Table 6).

Table 6 Frequencies and proportions of comorbid diagnoses in the two-cluster solution.

The single comorbid diagnoses differentiated for the two clusters are listed in Table 7.

Table 7 Frequencies and proportions of single comorbid diagnoses in the two-cluster solution.

After correction for multiple testing, the only significant difference in single comorbid diagnoses between the two clusters was observed for somatoform disorders, with cluster 2 having a significantly higher proportion (24.0%) than cluster 1 (10.5%). Cramér's V indicates a small effect. Clusters 1 and 2 do not differ significantly regarding affective disorders after Bonferroni correction, and the resulting Cramér's V indicates a weak association.

Discussion

Cluster analysis of empirically derived symptom profiles of adult patients with ADHD revealed two clusters. Compared with healthy samples48,49,50, both were less skilled in ER, but patients in cluster 2 reported a more severe lack of skills than those in cluster 1: adult patients with ADHD in cluster 2 had the highest ratings of emotional lability and reported the lowest ER skills, thus representing a subgroup of ADHD with severe ERD. Even compared with other clinical groups48, cluster 2 reported more impaired ER skills than samples of patients with major depressive and adjustment disorders. In contrast, the total ER skill competence in cluster 1 was significantly higher than in samples of patients with depressive, recurrent depressive and adjustment disorders.

Further, the percentage of women was elevated in cluster 2, consistent with previous findings51, and the combined presentation of ADHD according to DSM predominated (about 85%). In cluster 1 we found more heterogeneous presentations of ADHD, with substantial proportions of the inattentive and the combined subtype. Replicating results of international studies22,51,52, patients with severe ERD in cluster 2 had higher levels of emotional symptoms, as indicated by their reduced positive affect and elevated BDI-II and negative affect scores. In line with previous findings22, severe ERD in cluster 2 was associated with greater impairment in most clinical areas, as indicated by elevated SCL-90-R GSI scores and a heightened prevalence of somatoform, substance abuse and affective disorders, although after correction for multiple testing only the difference in somatoform disorders was significant. Furthermore, the severe ERD cluster showed more comorbidities. These results support the third model described by Shaw and colleagues15 (ADHD + ERD as a distinct entity), in which the ADHD + ERD subgroup is associated with a higher symptom load, poorer long-term outcome and more comorbidity. As the self-ratings also indicate higher comorbidity of cluster 2 with personality accentuations (see Table 4 SCID-II ratings), though these are not confirmed clinical disorders (see Table 7), future research should examine the hypothesis that ERD may be linked to specific personality profiles38.

As our external validators are rather limited, we cannot determine whether our clusters differentially represent emotionally ill patients, patients with different levels of ER capacity, or both. Although a pathway from ADHD through ERD to affective and temperamental liabilities and comorbidities seems plausible, there is a gap in the literature that our data cannot fill. We lack longitudinal data showing that severe ERD (as in cluster 2), compared with ADHD without severe ERD (as in cluster 1), attracts further emotional symptoms and adverse outcomes like a magnet.

Surman et al. did not find neuropsychological tests that differentiated adults with ADHD and deficient emotional self-regulation from adults with ADHD without deficient self-regulation52. This corresponds to our finding that neuropsychological variables contributed little to cluster formation.

Most ADHD studies do not routinely assess emotional dysregulation. Therefore, these symptoms may have been misinterpreted as anxiety and/or depressive symptoms, especially in women with ADHD, who have a higher rate of emotional dysregulation51. Recent data suggest that a substantial proportion of patients presenting with non-psychotic long-term mental health issues (e.g., depression, anxiety disorders, substance dependence disorders) fulfill the diagnostic criteria for ADHD53.

Our cross-sectional data represent only a static view of ER and ERD and miss the dynamics central to emotion regulation29. As outlined in the introduction, emotion regulation processes include more than global self-reports of the valence of emotional experience and the intensity of emotions3. Thus, our analysis is only an attempt to organize ADHD subgroups based on self-reports of neurocognitive and emotional dimensions, and it provides only one static piece of the puzzle. Future research needs to address the dynamics, as understanding "emotion regulation in action" seems to be a prerequisite for tailoring personalized treatment options: different problems in real-life emotion regulation (e.g., insensitivity to one's own emotions vs. insensitivity to context) will suggest different therapeutic goals and imply distinct interventions. Dialectical behavior therapy (DBT), an intervention that specifically targets emotional dysregulation, has shown moderate to large effect sizes in the treatment of adult ADHD when modified according to the special needs of patients with ADHD, and, in light of our findings, may be worth considering in further research54,55,56.

Methods

Participants

All participants were recruited from our specialized adult ADHD outpatient clinic (https://www.uni-marburg.de/de/fb04/team-christiansen/downloads/adulteadhs.pdf) based at the Department of Psychology at Philipps University Marburg, Germany. This clinic specializes in the diagnosis of adult ADHD and has a large catchment area. Our sample consisted of 385 medication-naive individuals newly diagnosed with adult ADHD. They were examined by experienced, licensed clinical psychologists on the basis of a detailed clinical history and the structured Diagnostic Interview for ADHD in adults (DIVA 2.0), a DSM-IV-based clinical interview assessing the ADHD core symptoms in childhood and adulthood as well as psychological domains often impaired in adult ADHD57. Further, the Conners Adult ADHD Rating Scales (CAARS-L self- and observer-ratings) and the Qb+©58 were used for diagnostic assessment. Additionally, the Amsterdam Short-Term Memory Test (ASTM) was applied as a symptom validity measure59. The diagnosis was based on the DIVA 2.0 results according to DSM-IV diagnostic criteria.

Our sample consisted of 233 males (60.5%) with a mean age of 32.4 years (SD = 9.8) and 152 females (39.5%) with a mean age of 32.5 years (SD = 10.2). Twelve subjects (3.1%) had no school degree, 63 (16.4%) had basic schooling, 85 (22.1%) had finished secondary school, 199 (51.7%) had a grammar school (Gymnasium) degree, and no information was available for 26 patients (6.8%). Comorbidity was high in our sample, with 59 patients (15.3%) having a comorbidity in ICD-10 chapter V block F1 (psychoactive substance use), 5 patients (1.3%) in block F2 (schizophrenia and delusional disorders), 134 patients (34.8%) in block F3 (affective disorders), 68 patients (17.7%) in block F4 (neurotic, stress-related and somatoform disorders), 12 patients (3.1%) in block F5 (behavioral syndromes associated with physical factors), 7 patients (1.8%) in block F6 (personality disorders), 1 patient (0.3%) in block F7 (mental retardation) and 2 patients (0.5%) in block F8 (psychological developmental disorders). In terms of the number of comorbidities, 199 patients (51.7%) had none, 103 patients (26.8%) had one, 66 patients (17.1%) two, 15 patients (3.9%) three, and 2 patients (0.5%) four comorbidities.

Measures

Conners Adult ADHD Rating Scales (CAARS-L: S & O)

The German version of the CAARS-L: S assesses ADHD symptoms in adults aged 18 years or older. Symptoms are rated on a Likert-type scale (0 = not at all/never to 3 = very much/very frequently). The long version consists of 66 items that yield four factors: inattention/memory problems, hyperactivity/restlessness, impulsivity/emotional lability, and problems with self-concept. Confirmatory factor analyses of the German version in healthy adults and ADHD patients supported this factor solution60,61. The four subscales are significantly influenced by age, gender, and years of education. Symptom severity decreases with age, males score higher than females on hyperactivity and sensation-seeking behavior, and females score higher than males on problems with self-concept. Overall symptom ratings are higher for individuals with less education. Test-retest reliability ranges between 0.85 and 0.92, and sensitivity and specificity are high for all four subscales. The CAARS-L: S is a reliable and cross-culturally valid measure of current ADHD symptoms in adults62. The same holds true for the CAARS-L: O, the observer version, which comprises ratings on the same items by a person who has a close relationship with the subject under examination63. Its hypothesized factor structure was supported, and the observer version also possesses satisfactory psychometric properties.

EMO-Check

The EMO-Check Battery consists of two parts64. The first is a questionnaire for the self-assessment of therapy-relevant emotions, currently in preparation for publication, that measures the extent of basic emotional states within the past week with 50 items on a five-point scale from "not at all" to "absolutely". It is an extension of the Positive and Negative Affect Schedule (PANAS)65. It encompasses the ten PANAS items for positive and the ten for negative affect and adds items covering the emotions of stress, fear, anger, sadness, depression, and shame with 3 items each; guilt and disgust are measured with 1 item each. Furthermore, eleven items address the extent of coping emotions (optimism, courage, pride, etc.). In our study, we only used the two extended subscales for positive and negative affect with 25 items each.

The second part is the Emotion Regulation Skills Questionnaire (ERSQ)50, which assesses self-reports of adaptive responses to challenging feelings; it is based on the adaptive coping with emotions model49,66. The ERSQ is a 27-item self-report instrument employing a five-point Likert-type scale (0 = not at all to 4 = almost always) to assess these adaptive emotion regulation skills in the previous week. There are nine subscales containing three items each: Awareness (eg, "I paid attention to my feelings"), Sensations (eg, "My physical sensations were a good indication of how I was feeling"), Clarity (eg, "I was clear about what emotions I was experiencing"), Understanding (eg, "I was aware of why I felt how I felt"), Acceptance (eg, "I accepted my emotions"), Tolerance (eg, "I could endure my negative feelings"), Readiness to confront distressing situations if needed to attain personally relevant goals (eg, "I pursued goals that were important to me, even if I thought that doing so would trigger or intensify negative feelings"), Self-Support (eg, "I supported myself in emotionally distressing situations"), and Modification (eg, "I was able to influence my negative feelings"). The total sum score was used for the cluster analysis in the present study. When comparing our data with previously published data, we used the average ERSQ item score. As the ERSQ assesses ER skills as positive capacities, higher values indicate better skills and lower scores indicate deficient regulatory capacities. Internal consistencies are around 0.90, and retest reliabilities range between 0.48 and 0.74. In the validation study, the total score correlated 0.41 with the PANAS positive affect subscale and −0.33 with the negative affect subscale50. These data were confirmed in other studies10,67.
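The two ERSQ scores mentioned above (total sum score for clustering, average item score for comparison with published samples) can be computed as follows. A minimal sketch; the function name is hypothetical and the item-to-subscale mapping is deliberately not modeled, since the item order is not given here.

```python
def ersq_scores(responses):
    """Return (total sum score, average item score) for the 27 ERSQ items rated 0-4."""
    if len(responses) != 27:
        raise ValueError("the ERSQ has 27 items")
    if any(not 0 <= r <= 4 for r in responses):
        raise ValueError("ERSQ items are rated from 0 (not at all) to 4 (almost always)")
    total = sum(responses)  # possible range 0-108
    return total, total / len(responses)  # average item score, range 0-4
```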

Beck Depression Inventory (BDI-II)

Depressive symptom severity was assessed with the revised Beck Depression Inventory68,69, a 21-item self-report measure assessing somatic, behavioral, emotional, and cognitive symptoms of depression on a 4-point scale ranging from 0 to 3. The total score ranges from 0 to 63, with scores higher than 14 points indicating clinically relevant levels of depressive symptoms. The German version showed satisfactory internal consistency (α ≥ 0.84) and construct validity70,71.
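Scoring as described above amounts to a sum over 21 items and a comparison against the threshold stated in the text (scores higher than 14). A minimal sketch with a hypothetical function name; the threshold is a parameter so other cut-offs can be substituted.

```python
def bdi_ii_score(responses, threshold=14):
    """Sum of the 21 BDI-II items (0-3 each) and whether it lies above the threshold."""
    if len(responses) != 21 or any(not 0 <= r <= 3 for r in responses):
        raise ValueError("expected 21 item responses in the range 0-3")
    total = sum(responses)  # possible range 0-63
    return total, total > threshold
```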

Symptom Check List revised (SCL-90-R)

The SCL-90-R consists of 90 items (five-point Likert scale ranging from 1 = not at all to 5 = very much) that assess nine primary symptom dimensions (somatization, obsessive–compulsive symptoms, interpersonal sensitivity, depression, anxiety, hostility, phobic anxiety, paranoid ideation, psychoticism). In addition, three global summary scores can be calculated. Reliability is satisfactory for the subscales in clinical samples (α = 0.79 to 0.89) and excellent for the global psychological distress score (α = 0.97), and validity has been confirmed72. As the factorial validity of the subscales is debated, the present study only used the Global Severity Index (GSI), which has proven to be a valid indicator of psychological distress73.
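The GSI is the mean score across answered items. A minimal sketch, assuming that unanswered items are passed as `None` and excluded from the denominator; the function name is hypothetical, and the code is agnostic to whether items are coded 1–5 (as stated above) or 0–4.

```python
def global_severity_index(responses):
    """Mean of the answered SCL-90-R items; unanswered items are passed as None."""
    if len(responses) != 90:
        raise ValueError("the SCL-90-R has 90 items")
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no items answered")
    return sum(answered) / len(answered)
```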

Structured Clinical Interview for DSM-IV (SCID-II)

The SCID-II assesses personality disorders according to DSM-IV in a two-tiered procedure. First, a 119-item self-report screening questionnaire with a Yes/No response format asks about each diagnostic criterion of all personality disorders listed in DSM-IV74. Second, all personality disorders for which respondents endorsed sufficient criteria for a specific diagnosis are carefully evaluated by an interviewer in order to assign a formal diagnosis75. A modified version of the SCID screening questionnaire yielded a correlation of 0.84 between the number of criteria fulfilled in the SCID-II interview and in the questionnaire. After adjustment of the cut-off level for diagnosis, the frequency of personality disorders found by the screening questionnaire and by the interview was almost the same (58% and 54%, respectively), and the overall kappa was 0.78 in that study76. We only used the questionnaire data in the present study, which does not allow formal diagnoses. The self-defeating personality disorder is represented by 7 items, the dependent personality disorder by 8 items, the obsessive-compulsive personality disorder by 9 items, the negativistic personality disorder by 8 items, the depressive personality disorder by 8 items, the paranoid personality disorder by 9 items, the schizotypal personality disorder by 9 items, the schizoid personality disorder by 9 items, the histrionic personality disorder by 7 items, the narcissistic personality disorder by 16 items, the borderline personality disorder by 14 items, and the antisocial personality disorder by 15 items. We selected the variables contributing substantially to separating the data into clusters and compared the resulting groups on the number of "yes" answers for each personality disorder, bearing in mind that this does not permit diagnosing a specific personality disorder at the individual level.
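The per-disorder "yes" counts described above can be sketched as follows. The item counts are taken from the text (they sum to 119), but the layout is a hypothetical assumption: items are treated as contiguous blocks in the listed order, which need not match the real questionnaire's item order.

```python
# Item counts per disorder as listed in the text (sum = 119).
ITEMS_PER_DISORDER = {
    "self-defeating": 7, "dependent": 8, "obsessive-compulsive": 9,
    "negativistic": 8, "depressive": 8, "paranoid": 9, "schizotypal": 9,
    "schizoid": 9, "histrionic": 7, "narcissistic": 16, "borderline": 14,
    "antisocial": 15,
}


def yes_counts(answers, items_per_disorder=ITEMS_PER_DISORDER):
    """Count 'yes' answers (truthy values) per disorder.

    Hypothetical layout: items are assumed to appear in contiguous blocks in the
    order of the dictionary above; the real item order may differ.
    """
    expected = sum(items_per_disorder.values())
    if len(answers) != expected:
        raise ValueError(f"expected {expected} answers, got {len(answers)}")
    counts, start = {}, 0
    for disorder, k in items_per_disorder.items():
        counts[disorder] = sum(bool(a) for a in answers[start:start + k])
        start += k
    return counts
```

These counts are symptom sums, not diagnoses, matching the caveat in the text.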

Amsterdam Short-Term Memory Test (ASTM)

The ASTM measures negative response bias and insufficient motivation in psychological examinations77. It is presented as a test of short-term memory and attention. Five semantically related words are shown for 8 seconds; the subject reads them aloud and memorizes them. Then, a simple arithmetic problem is given. Afterward, another five words are presented, and the subject is required to identify the three words that were previously shown. The score for the 30 tasks totals a maximum of 90 points. The reliability of the test is satisfactory: internal consistency in different samples is around 0.90, and in a sample of mixed neurological patients the test–retest correlation was 0.85 over an interval of 1 to 3 days77. A different study reports a reliability coefficient of ϕ = 0.92 based on the comparison of actual versus diagnosed group membership in an experimental simulation study78. The test also demonstrates good validity. The cut-off value for the ASTM is ≤84 points. Sensitivity for lack of motivation was 91% (in experimental simulants) and specificity was 89% (in neurological patients). Healthy controls from age 9 onward master this test almost perfectly. Patients with neurological disorders, such as concussions, brain tumors, multiple sclerosis, or difficult-to-treat epilepsy, rarely have difficulties with this test, provided they do not have serious cognitive deficits. The test has been shown to identify ADHD patients with severe attention impairments59.
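The scoring rule above can be sketched as follows, assuming (consistent with 30 tasks and a 90-point maximum) that each correctly identified target word earns one point, i.e., 0 to 3 points per task. The function name is hypothetical; the cut-off is the one stated in the text.

```python
def astm_score(correct_per_task, cutoff=84):
    """Total ASTM score (assumed 1 point per correct word, 3 per task, 30 tasks).

    Returns the total and whether it is at or below the cut-off, i.e. flags
    possibly insufficient effort.
    """
    if len(correct_per_task) != 30 or any(not 0 <= c <= 3 for c in correct_per_task):
        raise ValueError("expected 30 tasks with 0-3 correct words each")
    total = sum(correct_per_task)  # maximum 90
    return total, total <= cutoff
```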

Quantified Behavior Test Plus (Qb+)

The Qb+ is a continuous performance test (CPT) measuring sustained attention with a 1-back working memory task (recall of the same object in shape and color) combined with a simultaneous high-resolution motion tracking system. It separately assesses hyperactivity, inattention, and impulsivity with nine parameters and takes 15 to 20 minutes. The presented stimuli are a blue circle, a blue square, a red circle, and a red square. A response key is to be pressed when two identical stimuli are shown in succession. The task requires stimulus information to be maintained in working memory until the next stimulus is presented and a matching process can be performed79. The ratio of target to nontarget stimuli is 25:75. During the CPT, the participant's movements are recorded with an infrared camera tracking a reflective marker attached to a headband worn by the participant. The infrared camera is placed about 1 m away from the participant, who sits in front of a computer screen. Participants are seated on a chair with back support but no armrests, to ensure that they sit comfortably during testing but do not adopt a reclining posture. Participants' activity during the test is recorded by reading the coordinates (X and Y) of the headband marker. The position of the marker is sampled 50 times per second, with a spatial resolution of 1/27 mm per camera unit. Normative data have been gathered from 1,307 individuals between 6 and 60 years of age for both versions of the test (QbTest 6–12 and Qb+) with an even age and gender distribution80. Q scores are derived for hyperactivity, inattention, and impulsivity. They are interpreted similarly to z scores, with a mean of 0 and a standard deviation of 1. A Q score ≥1.5 is regarded as an atypical result.
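The motion-tracking figures above (50 samples per second, 1/27 mm per camera unit) and the Q-score threshold can be illustrated with a short sketch. The actual Qb+ parameters and Q-score norms are not described here, so this only shows the unit conversion and a z-like standardization against hypothetical normative values; all names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 50          # marker position sampled 50 times per second (text)
MM_PER_CAMERA_UNIT = 1 / 27  # spatial resolution stated in the text


def path_length_mm(xy):
    """Total distance travelled by the head marker, converted from camera units to mm."""
    units = sum(math.dist(a, b) for a, b in zip(xy, xy[1:]))
    return units * MM_PER_CAMERA_UNIT


def duration_s(xy):
    """Time spanned by the recorded samples (n samples span n - 1 intervals)."""
    return (len(xy) - 1) / SAMPLE_RATE_HZ


def q_score(value, norm_mean, norm_sd, atypical=1.5):
    """z-like score against (hypothetical) normative mean and SD; >= 1.5 is atypical."""
    q = (value - norm_mean) / norm_sd
    return q, q >= atypical
```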

Statistical analyses

Cluster analysis is an iterative process. We therefore first performed k-means cluster analyses generalized to all scales of measurement, using squared Euclidean distances43. As a person-centered approach, the k-means procedure identifies relatively homogeneous subgroups while maximizing the variability between clusters. Calculations were made with ALMO 15 (http://www.almo-statistik.de), which includes a k-means algorithm able to handle the different scaling of our variables and the large sample size43. This program proposes the optimal number of clusters and provides statistical measures to evaluate the appropriateness of several cluster solutions (F value, partial η2). Partial η2 represents the effect size in a general linear model (GLM): it is an omnibus effect size (η2) when the cluster solution is examined as a whole, and a partial η2 when the contribution of single variables to the cluster solution is examined. An η2 of 0.01 can be regarded as small, 0.06 as medium, and 0.14 as large45,81. Variables were first examined in ALMO regarding their importance for cluster formation because, to the best of our knowledge, this option is not available in any R package. We substantiated our analyses in R (https://cran.r-project.org/), as several R packages offer more detailed options. We applied the R packages clValid82 and NbClust83 to determine the best clustering algorithm and the optimal number of clusters46. Variables not used for classification further characterized the two clusters at a descriptive level. If the standard deviation was close to the mean, Huber's M-estimators were also listed. Differences between the two clusters on metric variables were evaluated with Welch tests, which have been shown to be robust against violations of normality and homogeneity of variance84, and with the effect size Cohen's d, with 0.2 indicating a small, 0.5 a medium, and 0.8 a large effect47. When multiple tests were performed, p values were adjusted by Bonferroni correction85.
Priority should be given to effect sizes, given the critical debate over null hypothesis significance testing (NHST) and its resulting p values86,87.
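Two techniques named above can be sketched in plain Python. The first function is ordinary Lloyd k-means with squared Euclidean distance on metric data; it does not reproduce ALMO's generalization to mixed scales of measurement. The second is a Huber M-estimator of location computed by iteratively reweighted means, with the scale fixed at the normalized median absolute deviation; the tuning constant c = 1.345 is a common default and an assumption here, since the constant used in the study is not stated. Function names are illustrative.

```python
import random
from statistics import median


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd k-means; returns cluster labels and centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [None] * len(points)
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # update step: centroid = mean of its members (kept unchanged if empty)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimator of location (scale fixed at 1.4826 * MAD)."""
    mu = median(x)
    s = 1.4826 * median(abs(v - mu) for v in x)
    if s == 0:
        return mu
    for _ in range(max_iter):
        # full weight inside c*s, down-weighted in proportion beyond it
        w = [1.0 if abs(v - mu) <= c * s else c * s / abs(v - mu) for v in x]
        new = sum(wi * v for wi, v in zip(w, x)) / sum(w)
        if abs(new - mu) < tol:
            return new
        mu = new
    return mu
```

The Huber estimate stays close to the bulk of the data when a few extreme values inflate the mean, which is the situation the text describes (standard deviation close to the mean).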

Categorical variables were analysed with χ2 tests and Fisher's exact tests; effect sizes were expressed as Cramér's V, with a value ≥0.40 signaling a large effect47,79.

Ethics committee

The study was approved by the Ethics Committee of the Department of Psychology at Philipps University Marburg, Germany.