Introduction

When making decisions in everyday life, immediate external feedback is not always available to inform us of the utility of our choices. In the absence of external feedback, we often rely on an internally generated sense of confidence. This confidence informs metacognitive evaluations of our decisions, actions, and abilities. Though confidence and objective accuracy/utility are usually correlated, the ability to self-evaluate is often suboptimal1 and this can impact diverse cognitive functions such as learning, decision-making and error-monitoring2,3,4,5. For example, if we know that we have performed poorly on a given task, we are likely to alter our behaviour to improve future performance6,7. Conversely, if we lack insight, we risk persevering with damaging choices/behaviours. Indeed, deficits in metacognitive insight have been shown to contribute to impaired life quality in various neurological and psychiatric disorders8,9. However, the psychological determinants of metacognitive ability remain poorly understood.

Consistent relationships have been identified between metacognition and clinically relevant psychiatric symptoms, particularly general under-confidence in major depression10,11,12, under-confidence in memory in obsessive-compulsive disorder (OCD)13,14,15, and impaired metacognitive insight in schizophrenia16,17,18,19. However, suboptimal self-evaluation is not restricted to clinical samples1,20,21. Recent studies suggest that metacognitive distortions, such as under- and over-confidence, are associated with specific personality traits22 and belief systems23 in the general population, as well as subclinical psychopathology24,25,26. For instance, two symptom dimensions cutting across traditional diagnostic categories have been found to correlate with metacognitive performance in general population samples: an 'anxious-depression' (AD) dimension and a 'compulsive behaviour and intrusive thought' (CIT) dimension. Those scoring highly for CIT displayed overconfidence in perceptual decisions, reduced sensitivity of confidence judgements to objective evidence and reduced metacognitive insight26,27,28,29, whereas those scoring highly for AD showed low overall confidence but increased metacognitive insight26. These symptom-specific alterations of self-evaluation may represent enduring psychological phenotypes of psychopathology. However, metacognitive performance is governed by both domain-specific and domain-general mechanisms30,31 and the degree to which metacognitive abnormalities are generalisable to cognitive domains outside of perception remains unknown.

Compared to psychopathology, fewer studies have investigated relationships between metacognition and personality traits. Due to the close links between personality traits and symptoms, it is possible that personality may play a key role in relationships between metacognition and psychopathology. Overall confidence has been positively associated with extraversion22,32 and negatively associated with neuroticism33. Extraversion shows both positive and negative relationships with psychopathology34, negatively predicting internalizing symptoms characterised by social/interpersonal dysfunction35 and/or depression and anxiety34,36, but positively predicting externalizing symptoms characterised by exhibitionism and mania34. Neuroticism positively predicts many forms of psychopathology, particularly anxiety and depression37,38,39,40. In the current study, we sought to quantify and dissociate the degree to which dimensions of both psychopathology and personality are predictive of metacognitive performance.

We adopted a computational modelling approach to measure 1st-order decision-making and metacognition across cognitive domains. This allowed for relationships with personality and psychopathology to be grounded in quantitative model-based measures41,42. This is important because confidence is influenced by multiple latent processes including metacognitive sensitivity (the degree to which confidence dissociates between correct and incorrect decisions) and metacognitive bias (the absolute level of confidence experienced regardless of objective accuracy), as well as by 1st-order task performance itself41,43: any (or all) of which may be related to psychological dispositions. In addition to metacognitive abnormalities, some previous studies have found psychiatrically relevant 1st-order decision-making27,28,29,44,45,46 and/or learning47,48,49,50,51 deficits, whilst others have not26,52,53. Elucidation and dissociation of 1st- and 2nd-order (metacognitive) decision-making abnormalities represent key steps towards an accurate mapping of the deficits underlying core symptoms of psychopathology.

Here, across two separate online studies (Ns = 344 and 473, respectively), we investigated relationships between both 1st-order and metacognitive decision-making parameters, self-reported psychopathology (utilising both classic categorical and transdiagnostic approaches) and personality traits. We replicated relationships between psychiatric symptomology and decision parameters from a perceptual task across both studies. In the 2nd study we also employed a knowledge-based task to test whether the relationships are domain-general, and hence likely to have a more pervasive influence in everyday life. Finally, we investigated the degree to which personality traits influenced 1st- and 2nd-order decision-making independently of symptoms of psychopathology.

Methods

Participants

Participants were recruited online using the Prolific (www.prolific.co) and Sona Systems (https://www.sona-systems.com/) recruitment platforms (experiment 1: 393 participants, 16–73 years old (M = 25.32, SD = 10.83); experiment 2: 534 participants, 18–70 years old (M = 25.42, SD = 9.17)). Some participants (N = 374) were paid £7.50 for their time, whilst others received undergraduate course credits (N = 553). No a priori power analysis was performed for experiment 1; the sample size was based on those employed in relevant previous studies26,29. However, to ensure adequate statistical power to replicate the effects observed in experiment 1, we conducted an a priori power analysis (using G*Power 3.1.9.7) to determine the appropriate sample size for experiment 2. We based the power analysis on the smallest significant effect size observed for a single symptom dimension across the symptom dimension-behaviour relationships in experiment 1 (Compulsive Behaviour and Intrusive Thought (CIT)-accuracy (d') relationship: f² = 0.02). The power analysis indicated that 395 participants would be required to achieve 80% statistical power to detect such an effect. Hence, the total experiment 2 sample size (534) allowed for adequate statistical power to be maintained after data exclusion.

Due to predefined exclusion criteria (explained below), 49 participants were excluded from the experiment 1 analysis, leaving a total of 344 participants (253 female/91 male, aged from 18 to 73 years (M = 25.35, SD = 10.5)), and 61 participants were excluded from experiment 2, leaving a total of 473 participants (233 female/240 male, aged from 18 to 65 years (M = 25.75, SD = 9.24)). A post hoc power analysis indicated that with the final sample (473) in experiment 2, we achieved 86% statistical power to detect an effect equal to the smallest significant effect size in experiment 1 (f² = 0.02). The only demographic information collected from participants was age and gender, thereby maintaining data anonymity. Both studies received ethical approval from the University of Dundee Research Ethics Committee and all participants provided informed consent.
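The power figures above can be roughly cross-checked with a short sketch. This is not the G*Power computation: it assumes a single tested regression coefficient, a noncentrality parameter of f² × N, and a large-sample normal approximation in place of the exact noncentral F distribution. The function name `approx_power` is hypothetical.

```python
from statistics import NormalDist


def approx_power(f2: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power to detect one regression coefficient.

    Assumption: the test statistic is treated as normal with a mean
    shift of sqrt(f2 * n); exact tools use the noncentral F instead.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = (f2 * n) ** 0.5             # sqrt of the noncentrality parameter
    # Probability of exceeding the critical value in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Sample sizes taken from the text: planned (395) and final (473).
planned_power = approx_power(0.02, 395)
final_power = approx_power(0.02, 473)
```

Under these assumptions the two values should land close to the reported 80% and 86%; small discrepancies are expected because the degrees of freedom of the full regression model are ignored.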

Perceptual decision task

The perceptual decision task involved 2-alternative forced-choice (2-AFC) numerosity discrimination judgements with confidence ratings and was chosen to replicate Rouault et al. (2018). The task was employed in both experiments 1 and 2. Figure 1a shows a schematic of the trial procedure. On each trial, a black cross appeared at the centre of the screen for 1000 ms. This was followed by two black boxes, one on the left and the other on the right of the screen, which both contained numerous white dots. These were simultaneously presented for 400 ms. Participants were then asked to decide which box contained a larger number of dots by pressing the 'w' key for the box on the left or the 'e' key for the box on the right. One box (the reference box) always contained 272 dots (out of 544 possible dot locations), while the other box contained an increased or reduced number of dots ranging either from −72 to +72 dots (n = 79 in experiment 1) or from −64 to +64 dots (n = 265 in experiment 1 and all participants in experiment 2), in increments of 8 dots relative to the reference box (including an identical condition). The location (left or right) of the reference box varied pseudo-randomly across trials and within each of the difficulty levels. The order of stimulus presentation was randomly generated for each participant. There was no time limit for the response and participants were not given feedback on whether their response was correct. After providing a response, participants were asked to rate how confident they were in their decision on a scale of 1 (not confident/guessing) to 6 (certain). There was no time limit for the confidence rating.
Note that 82 participants in experiment 1 completed 152 trials over 2 blocks (8 trials per 19 conditions, 76 trials per block, including ±72 stimuli), whereas the remaining 267 participants in experiment 1, and all participants in experiment 2, completed 136 trials over 2 blocks (8 trials per 17 conditions, 68 trials per block). Only the conditions that were shared by all participants were included in the analyses (−64 to +64 numerosity difference conditions). Participants could take a self-paced break between blocks. Before starting the task, participants completed ten practice trials in which only the easiest stimuli were presented (64 or 72 dot difference). The practice trials were identical to the experimental trials except that feedback (a green tick or red cross) was provided (indicating whether the response was correct or incorrect). Two further practice trials were used to familiarise participants with the confidence rating scale in which they were instructed how to respond if they were confident or not confident.
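As an illustration of the design arithmetic described above, the following sketch builds one participant's trial list. It is not the Gorilla task code. The function names, the balancing of the reference side within each condition, and the way trials are split into blocks are all assumptions made for the example.

```python
import random

REFERENCE_DOTS = 272  # fixed number of dots in the reference box


def build_trials(max_diff=64, step=8, trials_per_condition=8, rng=None):
    """Return a shuffled list of trials covering -max_diff..+max_diff."""
    rng = rng or random.Random()
    trials = []
    for diff in range(-max_diff, max_diff + 1, step):
        # Assumption: the reference box is on each side equally often
        # within every numerosity-difference condition.
        sides = ["left", "right"] * (trials_per_condition // 2)
        rng.shuffle(sides)
        for side in sides:
            trials.append({
                "diff": diff,
                "reference_side": side,
                "comparison_dots": REFERENCE_DOTS + diff,
            })
    rng.shuffle(trials)  # random presentation order per participant
    return trials


def split_blocks(trials, n_blocks=2):
    """Assumption: blocks are equal-sized slices of the shuffled list."""
    size = len(trials) // n_blocks
    return [trials[i * size:(i + 1) * size] for i in range(n_blocks)]


trials = build_trials(rng=random.Random(1))
blocks = split_blocks(trials)
```

With the default arguments this gives 17 conditions of 8 trials each (136 trials, 68 per block); passing `max_diff=72` gives the 19-condition, 152-trial variant.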

Fig. 1: Perceptual decision-making task and behaviour in experiment 1 (n = 344).

a Perceptual task. On each trial, participants judged which box (left or right) contained the higher number of dots and provided a confidence rating in each decision (scale of 1–6, where 1 represented "not confident (guessing)" and 6 represented "certain"). b As expected, group-averaged d' increased as a function of absolute numerosity difference. c Group-averaged type-1 c' were biased towards 'left more numerous' responses across all evidence levels and were significantly different to 0 for all numerosity differences up to 56 dots (all ps < .014), but not for the easiest 64 dot difference condition (p = .06). This leftward bias may reflect either the pseudoneglect phenomenon, whereby neurotypical individuals tend to judge stimuli presented in the left visual field as more salient than comparable stimuli in the right visual field75,76,77, and/or a motor-response bias. d Group-averaged overall mean confidence ratings increased as a function of evidence strength. All error bars reflect 95% confidence intervals for the mean.

Knowledge decision task

To investigate whether the psychiatric symptom–decision-making relationships generalised to other cognitive domains, in experiment 2 we employed an additional 2-AFC task which tested prior knowledge of generally known quantities: national populations54. Figure 3a shows a schematic of the trial procedure. On each trial, a black cross appeared at the centre of the screen for 1000 ms. This was followed by the names of two countries, and participants were required to indicate which of the two had the larger human population by selecting the corresponding button on the screen. The country names remained on the screen until the response, but if the participant did not respond within 10 s the trial was recorded as 'no response'. After each response, the participant was asked to rate how confident they were in their decision on a scale of 1 (not confident/guessing) to 6 (certain). No feedback about participants' decision-making was provided during the experimental trials. There was no time limit for the confidence rating.

The national populations for creating the stimuli were downloaded from The World Bank (https://data.worldbank.org/indicator/SP.POP.TOTL) in December 2019. Eight different evidence discriminability 'bins' were created by grouping country pairs with similar population log ratios (bins created based on log10(Country A Population/Country B Population)). The log10 ratio bins, ranging from least to most discriminable, were as follows: bin 1 (0–0.225), bin 2 (0.225–0.45), bin 3 (0.45–0.675), bin 4 (0.675–0.9), bin 5 (0.9–1.125), bin 6 (1.125–1.35), bin 7 (1.35–1.575), bin 8 (1.575–1.8). Each bin included 18 different country pairs (full task stimuli available at https://osf.io/s3cth/). The location (left or right) of the most populous country varied pseudo-randomly across trials but was counterbalanced within each of the discriminability bins (i.e., same proportion of 'left' larger and 'right' larger stimuli). The order of stimulus presentation was randomly generated for each participant. Participants completed 144 trials over 2 blocks (9 trials per 16 log ratio conditions, 72 trials per block) and could take a self-paced break between blocks. Before starting the task, 10 practice trials were completed in which only examples of the most discriminable stimuli were presented (bin 8). The practice trials were identical to the experimental trials except that feedback (a green tick or red cross) was provided (indicating whether the response was correct or incorrect).
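The binning rule can be sketched as follows. This is an illustration rather than the authors' stimulus-generation script: the function name is hypothetical, the absolute value (so that country order does not matter) is an assumption, and so is the choice to place a ratio lying exactly on a boundary in the upper bin.

```python
import math

BIN_WIDTH = 0.225  # width of each log10-ratio bin, from the text
N_BINS = 8


def discriminability_bin(pop_a: float, pop_b: float):
    """Return the bin (1 = hardest, 8 = easiest) for a country pair.

    Returns None when the log10 ratio falls outside the 0-1.8 range
    covered by the eight bins.
    """
    log_ratio = abs(math.log10(pop_a / pop_b))  # assumption: order-free
    if log_ratio >= BIN_WIDTH * N_BINS:
        return None
    return int(log_ratio // BIN_WIDTH) + 1


# Hypothetical populations: a 2:1 ratio gives log10(2), about 0.30.
example_bin = discriminability_bin(2_000_000, 1_000_000)
```

A pair with a tenfold population difference has a log10 ratio of 1.0 and therefore falls in bin 5 (0.9–1.125).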

Modelling type-1 and type-2 sensitivity and bias

We modelled 1st-order decisions and confidence ratings on both tasks within an extended signal detection theory (SDT) framework. This model extends the classic SDT approach55 to quantify latent parameters (i.e., sensitivity and bias) contributing to both type-1 and type-2 decisions. Type-1 sensitivity (d') indexes how accurate the participant's 1st-order task decisions are. Meta-d' characterises type-2 (metacognitive) sensitivity as the value of d' that a metacognitively optimal observer, with the same type-1 criterion, would have required to produce the observed type-2 (confidence) data. An individual with optimal metacognitive sensitivity will always be more confident when correct and less confident when incorrect. For a metacognitively ideal observer (a person who is rating confidence using the maximum possible metacognitive sensitivity), meta-d' should equal d'. Importantly, we can therefore define the level of metacognitive insight/efficiency, controlling for 1st-order performance, as the value of meta-d' relative to d' (meta-d'/d'). A meta-d'/d' value of 1 indicates theoretically ideal metacognitive insight. A value below 1 indicates that evidence available for the type-1 decision is lost when making metacognitive judgements (type-2 decision), whereas a value above 1 indicates that more evidence is available for the type-2 decision than for the type-1 decision41. Note that we employed the meta-d'/d' measure of metacognitive efficiency, rather than the alternative meta-d' − d' measure, because it has been shown to better isolate metacognitive sensitivity from 1st-order accuracy43.
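To make the type-1 quantities concrete, the sketch below computes d' and the type-1 criterion from response counts using the standard equal-variance SDT formulas, and forms the meta-d'/d' ratio from a meta-d' value that is assumed to have been fitted elsewhere (for example by the toolbox described below). The function names, the treatment of "left more numerous" as the signal category, and the 0.5 count correction are assumptions for illustration and are not taken from the study's scripts.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def type1_sdt(hits, misses, false_alarms, correct_rejections):
    """Return (d', criterion) under equal-variance SDT.

    Assumption: 0.5 is added to each count so that hit or false-alarm
    rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = _z(hit_rate) - _z(fa_rate)
    criterion = -0.5 * (_z(hit_rate) + _z(fa_rate))
    return d_prime, criterion


def metacognitive_efficiency(meta_d, d_prime):
    """meta-d'/d': 1 is ideal, below 1 means evidence is lost."""
    return meta_d / d_prime


# Hypothetical counts for one participant (68 trials per stimulus class).
d_prime, criterion = type1_sdt(50, 18, 18, 50)
```

Because the hypothetical counts are symmetric, the criterion comes out at zero, i.e. no type-1 response bias.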

The confidence criteria (type-2 c') represent type-2 bias calculated within the meta-d' framework: the tendency to give high or low confidence ratings regardless of evidence strength. We calculated the absolute distance between type-2 c' and type-1 c' (|type-2 c' − type-1 c'|) to isolate confidence bias from perceptual response bias56. Lower confidence criteria (|type-2 c' − type-1 c'|) values indicate an overall bias in favour of higher confidence ratings and higher values indicate a bias in favour of low confidence ratings (i.e., confidence criteria are inversely related to mean absolute confidence ratings). Confidence criteria values were calculated separately for each of the possible type-1 responses (i.e., 'left' or 'right' more numerous/higher population judgements in the perceptual and general knowledge tasks respectively) and for each of the N−1 criteria separating the N available confidence ratings (N = 6 in the current experiments). To streamline the analysis, we averaged over the 5 |type-2 c' − type-1 c'| values for each response ('left' or 'right') separately and then averaged over the resulting 'left' and 'right' mean criteria to obtain a single overall confidence criterion estimate.
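The two-step averaging described above can be written compactly. This is a minimal sketch: the function name and the example criterion values are hypothetical, and the criteria themselves are assumed to come from a prior meta-d' fit.

```python
from statistics import mean


def overall_confidence_criterion(type1_c, type2_c_left, type2_c_right):
    """Average |type-2 c' - type-1 c'| within, then across, responses."""
    left = mean([abs(c2 - type1_c) for c2 in type2_c_left])
    right = mean([abs(c2 - type1_c) for c2 in type2_c_right])
    return mean([left, right])


# Hypothetical fitted values: 5 criteria per response for a 6-point scale.
criterion = overall_confidence_criterion(
    type1_c=0.1,
    type2_c_left=[-0.5, -1.0, -1.5, -2.0, -2.5],
    type2_c_right=[0.5, 1.0, 1.5, 2.0, 2.5],
)
```

Averaging within each response first means that 'left' and 'right' responses contribute equally to the final estimate.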

All measures were calculated using individual participant fits (fit_meta_d_mcmc function) within the "HMeta-d" toolbox57 (https://github.com/metacoglab/HMeta-d) in MATLAB (Mathworks, USA). The input parameters for the model fits were as follows:

mcmc_params.response_conditional = 0;

mcmc_params.estimate_dprime = 0;

mcmc_params.nchains = 3;

mcmc_params.nburnin = 1000;

mcmc_params.nsamples = 10000;

mcmc_params.nthin = 1;

mcmc_params.doparallel = 0;

mcmc_params.dic = 1;

The scripts for running the fits can be found at https://osf.io/s3cth/. It is important to note that the model-based measures were calculated collapsed across all discriminability levels for each participant independently (not within a hierarchical model) for regressions with self-reported psychiatric symptoms and personality traits (Figs. 2, 6 & 7). However, to test the reliability of the symptom-metacognitive efficiency relationships, we also employed alternative hierarchical analysis approaches which incorporated group-level prior densities when estimating metacognitive efficiency57,58 (see 'Statistical analyses' section below and Supplementary Figs. 7 and 8).

Self-report psychometric questionnaires

Each participant in both experiments completed a battery of nine mental health questionnaires which assessed symptomology across a range of disorders. Symptoms of depression were measured using the Zung Self-Rating Depression Scale59. Obsessive-Compulsive symptoms were measured using the Obsessive-Compulsive Inventory-Revised60. Trait anxiety was measured using the State-Trait Anxiety Inventory Form Y-261. Alcohol addiction was measured using the Alcohol Use Disorder Identification Test (AUDIT)62. Apathy was measured using the Apathy Evaluation Scale63. Eating disorder symptomology was measured using the Eating Attitudes Test (EAT-26)64. Impulsivity was measured using the Barratt Impulsivity Scale (BIS-11)65. Schizotypy was measured using the Short Scales for Measuring Schizotypy66. Social anxiety was measured using the Liebowitz Social Anxiety Scale, which contains 24 items67. These questionnaires were chosen to allow us to investigate the three underlying transdiagnostic symptom dimensions identified by Gillan et al.47 and replicated by Rouault et al.26. In addition to the psychiatric symptom questionnaires, participants in experiment 2 also completed the Big Five Inventory68.

Transdiagnostic symptom dimensions

Using the same psychiatric symptom questionnaires, Gillan et al. (2016)47 conducted an exploratory factor analysis (FA) on data collected in a large sample (n = 1413). They found that the items from all 9 mental health questionnaires (n = 209 items) clustered around three latent 'factors' which they termed 'Anxious-Depression', 'Compulsive Behaviour and Intrusive Thought' and 'Social Withdrawal' based on the individual items loading most strongly on each respective factor. The 'Anxious-Depression' factor was most heavily weighted by items from the Generalised Anxiety, Depression, Apathy, and Impulsivity questionnaires (see Gillan et al., 2016). The 'Compulsive Behaviour and Intrusive Thought' factor was most heavily weighted by items from the OCD, Eating Disorders, Alcoholism and Schizotypy questionnaires. Lastly, the 'Social Withdrawal' factor had the highest average loadings from the Social Anxiety questionnaire. These factors have subsequently been replicated in an independent sample26. We replicated the FA performed by Rouault et al. (2018)26 to test whether the previously observed three transdiagnostic symptom dimensions26,47 were replicated in our data (N = 817 participants from both experiments 1 and 2). The analysis was conducted on the 209 individual questionnaire items using the fa() function from the psych package in R, with an oblique (oblimin) rotation and maximum likelihood estimation. For the Liebowitz Social Anxiety Scale (LSAS), the average of the avoidance and fear/anxiety answers of each item was taken. In line with previous studies26,47, a 3-factor latent structure was found to provide the most parsimonious explanation for the item-level responses. Supplementary Fig. 3A plots correlations between the item weights from the FA performed on the current data and those of Gillan et al. (2016)47 for each factor. Supplementary Fig. 3B plots correlations between the individual participant scores calculated using the item weights from our FA and those of Gillan et al. (2016) for each factor.

Due to the larger sample size used to conduct their factor analysis, we applied the weights from Gillan et al. (2016) to derive scores for the three symptom dimensions for the main analyses. First, the raw responses for each item were z-scored across participants. The individual item z-scores within each participant were then multiplied by their corresponding factor weights, and the resulting products were summed across all items for each factor. Finally, the factor sums were z-scored across participants in preparation for statistical analyses. Note that the results were also reproduced using the item weights from the FA performed on the current data. The R script for running the FA can be found at https://osf.io/s3cth/.
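The scoring steps above (z-score items, weight, sum, z-score again) can be sketched as follows. This is an illustration rather than the authors' script: the function names and toy data are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardise a list; assumes the values are not all identical."""
    m, s = mean(values), stdev(values)  # assumption: sample SD (n - 1)
    return [(v - m) / s for v in values]


def factor_scores(responses, weights):
    """Compute z-scored factor scores.

    responses: list of participants, each a list of item responses.
    weights:   list of items, each a list of per-factor weights.
    Returns a participants x factors list of scores.
    """
    n_people, n_items, n_factors = len(responses), len(weights), len(weights[0])
    # Step 1: z-score every item across participants.
    item_z = [zscore([row[i] for row in responses]) for i in range(n_items)]
    columns = []
    for f in range(n_factors):
        # Steps 2-3: weight each item z-score and sum over items.
        sums = [sum(item_z[i][p] * weights[i][f] for i in range(n_items))
                for p in range(n_people)]
        # Step 4: z-score the factor sums across participants.
        columns.append(zscore(sums))
    return [[columns[f][p] for f in range(n_factors)] for p in range(n_people)]


# Toy data: 3 participants, 2 items, 2 factors (weights are made up).
scores = factor_scores([[1, 2], [2, 4], [3, 9]], [[1.0, 0.0], [0.0, 1.0]])
```

In the study itself the weights matrix would have 209 rows (items) and 3 columns (AD, CIT and SW).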

Procedure

Both experiments were conducted online via the Gorilla experiment platform69. The experiments could only be completed on a laptop, tablet, or personal computer (and not on a mobile phone) to ensure an adequate screen size for the visual perception task. After clicking an online link and providing informed consent, participants were first asked to provide demographic information of age and gender assigned at birth. The participants then completed the questionnaires and task(s) in a randomised order. The entire experimental session took between 40 min and 1 h for both experiments.

Exclusion criteria

Several predefined exclusion criteria were applied to the data from both experiments to ensure acceptable data quality. Across both studies, ~12% of participants were excluded based on the criteria, leaving 344 participants for experiment 1 and 473 participants for experiment 2.

Participants who met any one or more of the following criteria in experiment 1 were excluded from all analyses:

  1. Did not provide gender information (n = 5, 1.28%).

  2. Below- or near-chance perceptual decision task performance (overall accuracy < 55%) (n = 9, 2.29%).

  3. Below the age of 18 (n = 11, 2.8%).

  4. Incorrect response to a 'catch' item employed as an attention check (n = 12, 3.05%). The 'catch' item was embedded within the Zung Depression Scale and read as follows: "If you are paying attention, please select 'Good part of the time' for this answer".

  5. Used the same single confidence rating across all trials of the perceptual decision task (n = 1, 0.25%).

  6. A metacognitive efficiency (meta-d'/d') ratio below 0 on the perceptual decision task (n = 13, 3.31%). A negative metacognitive efficiency score can occur when type-1 accuracy is around chance level and/or the participant is not using the confidence scale as expected (i.e. repeating a single confidence rating on the vast majority of trials or randomly selecting confidence ratings70).

Based on these criteria, a total of 49 participants (12.5%) were excluded from experiment 1.

The exclusion criteria for experiment 2 included all of those employed in experiment 1 plus additional criteria based on knowledge task performance. Again, any participants who met any one or more of the following criteria were excluded from all analyses:

  1. Did not provide gender information (n = 0, 0%).

  2. Below- or near-chance perceptual decision task performance (overall accuracy < 55%) (n = 19, 3.56%).

  3. Below the age of 18 (n = 1, 0.19%).

  4. Incorrect response to the 'catch' item employed as an attention check (n = 13, 2.43%).

  5. Used the same single confidence rating across all trials of the perceptual decision task (n = 2, 0.37%).

  6. A metacognitive efficiency (meta-d'/d') ratio below 0 on the perceptual decision task (n = 27, 5.06%).

  7. Below- or near-chance knowledge decision task performance (overall accuracy < 55%) (n = 11, 2.06%).

  8. Used the same single confidence rating across all trials of the knowledge decision task (n = 0, 0%).

  9. A metacognitive efficiency (meta-d'/d') ratio below 0 on the knowledge decision task (n = 7, 1.31%).

  10. Failed to respond on >4 trials (out of 144) on the knowledge task (n = 12, 2.25%).

Based on these criteria, a total of 61 participants (11.42%) were excluded from experiment 2.
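The experiment 2 criteria listed above amount to a per-participant filter, which can be sketched as below. The record layout, field names and function name are hypothetical and are used only for illustration; the thresholds are those stated in the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    """Hypothetical per-participant summary record."""
    age: int
    gender: Optional[str]
    passed_catch: bool
    perceptual_accuracy: float        # proportion correct, 0-1
    perceptual_confidence: List[int]  # one rating per trial
    perceptual_mratio: float          # meta-d'/d'
    knowledge_accuracy: float
    knowledge_confidence: List[int]
    knowledge_mratio: float
    knowledge_missed_trials: int


def exclusion_reasons(p: Participant) -> List[str]:
    """Return every criterion the participant meets (empty = retained)."""
    checks = [
        (p.gender is None, "no gender information"),
        (p.perceptual_accuracy < 0.55, "perceptual accuracy < 55%"),
        (p.age < 18, "below the age of 18"),
        (not p.passed_catch, "failed catch item"),
        (len(set(p.perceptual_confidence)) == 1, "single perceptual confidence rating"),
        (p.perceptual_mratio < 0, "negative perceptual meta-d'/d'"),
        (p.knowledge_accuracy < 0.55, "knowledge accuracy < 55%"),
        (len(set(p.knowledge_confidence)) == 1, "single knowledge confidence rating"),
        (p.knowledge_mratio < 0, "negative knowledge meta-d'/d'"),
        (p.knowledge_missed_trials > 4, "more than 4 missed knowledge trials"),
    ]
    return [reason for failed, reason in checks if failed]
```

Collecting all matching reasons, rather than stopping at the first, reflects the fact that a participant can meet more than one criterion, which is why the per-criterion counts need not sum to the total number excluded.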

Statistical analyses

To examine the relationships between task measures and both self-reported symptoms and personality traits, we conducted a series of multiple linear regressions (always controlling for age and gender). All regressions were conducted using the fitlm function in MATLAB R2021a (Mathworks, USA). All variables were z-scored to ensure comparability of the regression coefficients.

The dependent measures derived from the perceptual decision-making task and the general knowledge task were type-1 accuracy (d'), metacognitive sensitivity (meta-d'), metacognitive efficiency (log(meta-d'/d')) and confidence criteria (|type-2 c' − type-1 c'|). Due to high correlations between some of the different psychiatric symptom questionnaires, we assessed relationships between individual questionnaire scores (log-transformed) and the task measures, and between individual questionnaire scores (log-transformed) and personality dimensions, in separate regression models. In the syntax of the fitlm function, the regressions were as follows:

$${\rm{Dependent}} \, {\rm{variable}} \sim {\log}({\rm{Questionnaire}}\, {\rm{Score}}) + {\rm{age}} + {\rm{gender}}.$$

For the regressions assessing relationships between the psychiatric symptom dimensions and the task measures, and between symptom dimensions and personality dimensions, all symptom dimensions were entered in the same regression model:

$${\rm{Dependent}}\, {\rm{variable}} \sim {\rm{Factor}}1 \, \lq{\rm{anxious}}\hbox{-}{\rm{depression}}\rq + \,{\rm{Factor}}2 \, \lq{\rm{compulsive}}\, {\rm{behaviour}}\, {\rm{and}} \,{\rm{intrusive}}\, {\rm{thought}}\rq + \,{\rm{Factor}}3 \, \lq{\rm{social}}\, {\rm{withdrawal}}\rq + {\rm{age}} + {\rm{gender}}.$$

This was also the case for the regressions assessing relationships between personality dimensions and task measures, whilst controlling for symptom dimensions:

$${\rm{Dependent}}\, {\rm{variable}} \sim {\rm{extraversion}} + {\rm{agreeableness}} + \,{\rm{conscientiousness}} + \,{\rm{openness}}\, + \,{\rm{neuroticism}} + \,{\rm{Factor}}1 \, \lq{\rm{anxious}}\hbox{-}{\rm{depression}}\rq + \,{\rm{Factor}}2 \, \lq{\rm{compulsive}}\, {\rm{behaviour}}\, {\rm{and}}\,{\rm{intrusive}}\, {\rm{thought}}\rq + \,{\rm{Factor}}3\, \lq{\rm{social}}\, {\rm{withdrawal}}\rq + {\rm{age}} + {\rm{gender}}.$$
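For readers without MATLAB, the regression specifications above can be approximated with a short ordinary least squares sketch. This is not the fitlm implementation: it solves the normal equations directly, omits standard errors and p-values, and uses hypothetical function names. As in the text, all variables are z-scored first so that the coefficients are comparable.

```python
from statistics import mean, stdev


def zscore(xs):
    """Standardise a variable (assumption: sample SD, n - 1)."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def ols(y, predictors):
    """Return [intercept, b1, b2, ...] by solving the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        p = A[c][c]
        A[c] = [v / p for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def standardized_betas(dv, predictors):
    """Z-score the outcome and every predictor, then fit by OLS."""
    return ols(zscore(dv), [zscore(p) for p in predictors])
```

For the symptom dimension model, `predictors` would hold the AD, CIT and SW scores plus age and gender, in that order, and the returned list would give the intercept followed by one standardised coefficient per predictor.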

To correct for multiple comparisons, Bonferroni correction was applied over the number of dependent variables tested in each different analysis. For the individual questionnaire-behaviour relationships presented in Figs. 2a and 6a, c and Supplementary Fig. 6A, the corrected alpha level was 0.0014. For the symptom dimension-behaviour relationships presented in Figs. 2b and 6b, d and Supplementary Fig. 6B, the corrected alpha level was 0.0125. For the personality dimension-behaviour relationships presented in Fig. 7a, b, the corrected alpha level was 0.0167. For the individual questionnaire-personality relationships presented in Fig. 8a, the corrected alpha level was 0.0011. For the symptom dimension-personality relationships presented in Fig. 8b, the corrected alpha level was 0.01.
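The corrected alpha levels quoted above are consistent with dividing 0.05 by the number of tests in each family. The family sizes used below are inferred from the reported thresholds and are not stated in the text, so they should be read as assumptions (for example, 36 could correspond to 9 questionnaires by 4 task measures).

```python
def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test alpha after Bonferroni correction."""
    return alpha / n_tests


# Assumed family sizes, chosen to reproduce the reported thresholds.
families = {
    "questionnaire-behaviour": 36,
    "symptom dimension-behaviour": 4,
    "personality-behaviour": 3,
    "questionnaire-personality": 45,
    "symptom dimension-personality": 5,
}
thresholds = {name: round(bonferroni_alpha(n), 4) for name, n in families.items()}
```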

Pearson correlation coefficients were calculated for each of the between-subject correlations of interest. Paired- and independent-samples t-tests were employed to test for differences in decision parameters both within- and between-tasks.

For analysis of the relationships between psychiatric symptom dimensions and metacognitive efficiency, in addition to the linear regression approach outlined above, we also adopted two approaches which have recently been employed to test for group differences and to link external qualities to metacognitive efficiency57,58. These approaches incorporate Bayesian priors to constrain estimates of both group-average and individual participant metacognitive efficiency using hierarchical modelling. Two separate analyses were performed using the hierarchical fitting option in the "HMeta-d" toolbox57. These analyses were conducted to test the reliability of the null relationships between psychiatric symptom dimensions and metacognitive efficiency observed in the multiple linear regression analyses performed using non-hierarchical individual participant meta-d' fits (presented in Figs. 2, 6 and 7). For both hierarchical analyses, we used the perception task data from both experiments combined (N = 817) to maximize statistical power.

In the first hierarchical analysis, we used median splits to create 'high' and 'low' symptom dimension score groups for each of the three dimensions (AD, CIT, and SW). The hierarchical Bayesian estimation implemented in HMeta-d specifies group-level prior densities over each of the participant-level parameters and provides a group-level estimate of metacognitive efficiency (meta-d'/d'). We estimated group-level metacognitive efficiency separately for the high and low symptom groups across all three symptom dimensions. The group-level fits were performed using the fit_meta_d_mcmc_group function57 with the following input parameters:

mcmc_params.response_conditional = 0;

mcmc_params.estimate_dprime = 0;

mcmc_params.nchains = 3;

mcmc_params.nburnin = 1000;

mcmc_params.nsamples = 10000;

mcmc_params.nthin = 1;

mcmc_params.doparallel = 0;

mcmc_params.dic = 1;

Group differences in metacognitive efficiency were assessed by first calculating the distribution of differences in posterior parameter samples between the groups (high > low), and then determining the 95% highest-density interval (HDI) of this distribution. If the HDI of the difference between groups did not include 0, the difference was judged to be statistically significant, whereas if the HDI did include 0, the difference was judged not statistically significant57.
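The HDI decision rule can be illustrated with a small sketch that works on posterior samples exported from any sampler. It is not the HMeta-d code: the function names are hypothetical, the HDI is taken as the narrowest window containing the requested share of sorted samples, and the two groups' samples are assumed to be of equal length and paired by index.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = math.ceil(mass * n)  # number of samples inside the interval
    # Slide a window of k samples and keep the narrowest one.
    width, start = min((s[i + k - 1] - s[i], i) for i in range(n - k + 1))
    return s[start], s[start + k - 1]


def difference_excludes_zero(high, low, mass=0.95):
    """Apply the rule: does the HDI of (high - low) exclude 0?"""
    diffs = [h - l for h, l in zip(high, low)]
    lower, upper = hdi(diffs, mass)
    return not (lower <= 0 <= upper), (lower, upper)
```

The same `hdi` function applies unchanged to the posterior samples of a single regression coefficient.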

In the second hierarchical analysis, we adopted a recently developed approach which allows for relationships between potential covariates and metacognitive efficiency (meta-d'/d') to be estimated within the hierarchical meta-d' model57,58. This approach embeds the estimation of symptom-metacognitive efficiency relationships into the parameter inference routine, such that the group-level estimate of regression coefficients reflects the influence of individual differences in symptom severity on metacognitive efficiency58. The regressors included in the hierarchical model were Age, Gender, AD scores, CIT scores and SW scores, with the outcome variable being metacognitive efficiency (meta-d'/d') scores. The fitting was performed using the fit_meta_d_mcmc_regression function57 with the following input parameters:

mcmc_params.response_conditional = 0;
mcmc_params.estimate_dprime = 0;
mcmc_params.nchains = 3;
mcmc_params.nburnin = 1000;
mcmc_params.nsamples = 10000;
mcmc_params.nthin = 1;
mcmc_params.doparallel = 0;
mcmc_params.dic = 1;

Again, posterior densities were used to test the statistical significance of the regression coefficients. Specifically, if the 95% highest-density interval (HDI) of a regression coefficient did not include 0 then the relationship was judged to be statistically significant, whereas if the HDI did include 0 then the relationship was judged not statistically significant.

Results

In study 1 (N = 344 after data exclusion), participants performed a visual two-alternative forced-choice (2-AFC) numerosity discrimination task (Fig. 1a) and completed a battery of nine self-report psychiatric symptom questionnaires. The task involved deciding which of two simultaneously presented black boxes contained a greater number of white dots on each trial, and then rating confidence in the decision (on a scale from 1, ‘not confident (guessing)’, to 6, ‘certain’). The true numerosity difference between the boxes (and hence task difficulty) was manipulated from trial-to-trial. Figure 1 provides a schematic of the trial procedure and an overview of task performance.

Psychiatric symptom dimensions are associated with dissociable 1st-order and metacognitive decision-making signatures

To quantify latent parameters contributing to both 1st- and 2nd-order decisions, we modelled the task data within an extended signal detection theory (SDT) framework41,42,57,71. This provided measures of both 1st-order accuracy (d') and the degree to which confidence ratings dissociated correct from incorrect decisions (metacognitive sensitivity (meta-d')) (see Methods for full details). Because d' and meta-d' are measured in the same units (signal-to-noise ratio), their ratio can be used to index the level of metacognitive efficiency of the observer43,57. This measure quantifies how much of the information available for 1st-order decisions is retained when rating confidence. The meta-d' model also separates sensitivity measures from measures of both 1st-order (criterion (c')) and 2nd-order (confidence criteria) bias.
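As a rough illustration of the 1st-order quantities and the efficiency ratio, the sketch below uses the textbook equal-variance SDT formulas, d' = z(H) − z(F) and c = −0.5 (z(H) + z(F)). The trial counts and the meta-d' value are hypothetical, and meta-d' itself is taken as given, because estimating it requires fitting the full model to the confidence data rather than a closed-form expression.

```python
from statistics import NormalDist

def type1_sdt(hits, misses, false_alarms, correct_rejections):
    """Equal-variance SDT estimates of sensitivity (d') and criterion (c)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

def metacognitive_efficiency(meta_d, d_prime):
    """Ratio meta-d'/d'; valid because both share signal-to-noise units."""
    return meta_d / d_prime

# Hypothetical counts for one participant (not data from the study)
d_prime, c = type1_sdt(hits=80, misses=20, false_alarms=30, correct_rejections=70)

# Hypothetical fitted meta-d' of 1.0 against a d' of 1.25
m_ratio = metacognitive_efficiency(1.0, 1.25)
print(f"d' = {d_prime:.2f}, c = {c:.2f}, meta-d'/d' = {m_ratio:.2f}")
```

A ratio of 1 would mean that confidence ratings use all of the information available to the 1st-order decision, whereas values below 1 indicate that some of it is lost.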

We investigated relationships between self-reported psychiatric symptoms and the task measures of interest (perceptual accuracy (d'), metacognitive sensitivity (meta-d'), metacognitive efficiency (meta-d'/d'), confidence criteria), whilst controlling for age and gender (see Methods). Note that to calculate the task measures, individual meta-d' fits were applied to the data (collapsed across all levels of absolute numerosity difference) from each participant independently (not within a hierarchical model), thereby providing overall metrics of both perceptual and metacognitive performance for each participant which were independent of the data from other participants. Importantly, this satisfies the regression assumption that observations are independent of each other. Full sample distributions of all measures are shown in Supplementary Fig. 1, and relationships with demographic variables (age and gender) are reported in Supplementary Fig. 2 and Supplementary Results.

Figure 2a plots standardised regression coefficients indexing the strength and direction of the relationships between questionnaire scores and each task measure. Self-reported apathy (β = 0.18, p = .033, corrected) and generalised anxiety (β = 0.19, p = .027, corrected) were positively associated with confidence criteria (indicating negative relationships with absolute confidence). No other relationships survived Bonferroni correction.
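For readers unfamiliar with the correction, a minimal sketch of Bonferroni adjustment follows. The uncorrected p-values and the choice of four tests (one per dependent variable) are assumptions made for illustration only; they are not values from the study.

```python
def bonferroni(p_values, n_tests=None):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]

# Hypothetical uncorrected p-values for one predictor across four
# dependent variables (illustrative numbers only)
uncorrected = [0.008, 0.02, 0.3, 0.6]
corrected = bonferroni(uncorrected)

# A relationship "survives" if its corrected p-value stays below .05
survives = [p < 0.05 for p in corrected]
print(corrected, survives)
```

Here only the first relationship survives: the second is significant when uncorrected (p = .02) but not after multiplication by the number of tests.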

Fig. 2: Associations between 1st- and 2nd-order decision parameters and self-reported psychopathology, additionally controlling for the influence of age and gender, in experiment 1.

a Associations between psychiatric symptom questionnaire scores and meta-d' parameters from separate regression models. Given that all variables were z-scored prior to entry into the regression models, the y-axis indicates the change in each decision parameter (in standard deviations) for each change of 1 standard deviation of questionnaire scores. Accuracy = d', Metacognitive sensitivity = meta-d', Metacognitive efficiency = log(meta-d'/d'). b In line with previous studies26,47, factor analysis on the correlation matrix of all 209 questionnaire items revealed a three-factor solution comprising anxious-depression (AD), compulsive behaviour and intrusive thought (CIT) and social withdrawal (SW). The relationships between these transdiagnostic symptom dimension scores and meta-d' parameters were investigated using multiple regression models. CIT showed negative relationships with both 1st-order accuracy and confidence criteria, whereas AD showed a positive relationship with confidence criteria. All error bars denote 95% Confidence Intervals for the regression coefficients. °P < 0.05 uncorrected; **P < 0.05 Bonferroni corrected for multiple comparisons over the number of dependent variables tested.

As well as relating scores on each questionnaire separately, we performed a transdiagnostic analysis26,47 to relate underlying dimensions of psychopathology to both perceptual and metacognitive performance. The transdiagnostic approach accounts for high comorbidity between diagnostic categories (indicated by strong correlations between individual questionnaire scores (Supplementary Fig. 2A)) and potentially heterogeneous symptom clusters within categories72,73. The questionnaires were chosen to match those of previous studies26,47 that used factor analysis to identify three symptom dimensions underlying the 209 items across all nine questionnaires: an ‘anxious-depression’ (AD) dimension, a ‘compulsive behaviour and intrusive thought’ (CIT) dimension and a ‘social withdrawal’ (SW) dimension. We conducted the same factor analysis in our entire sample (across both studies: N = 817) and replicated the three dimensions (Supplementary Fig. 3).

We tested relationships between the symptom dimensions and task measures, again controlling for age and gender (Fig. 2b). The CIT dimension showed a dissociation between 1st- and 2nd-order effects: despite being associated with lower objective accuracy (β = −0.16, p = .047, corrected), CIT was also associated with reduced confidence criteria (indicating high levels of absolute confidence) (β = −0.17, p = .022, corrected). Conversely, whilst the AD dimension showed no relationship with objective accuracy (β = 0.01, p = .85), it was associated with increased confidence criteria (indicating low levels of absolute confidence) (β = 0.33, p < .001, corrected). The confidence criteria effects replicate Rouault, Seow, et al. (2018)26 who found AD/CIT to be associated with low/high levels of absolute confidence, respectively. It is noteworthy that the CIT-confidence effect was not captured in the standard questionnaire analyses (Fig. 2a), and therefore the transdiagnostic approach revealed relationships masked by classic diagnostic categories.

Overall, the results of study 1 show that dissociable psychiatric symptom dimensions are associated with distinct 1st-order and metacognitive decision-making signatures, with CIT predicting reduced perceptual accuracy but high absolute confidence levels and AD predicting low absolute confidence levels despite intact perceptual accuracy.

Both domain-specific and domain-general factors contributed to performance, and confidence was the most strongly correlated measure across cognitive domains

In a 2nd study (N = 473 after data exclusion), we sought to extend the results in an independent sample by testing (1) whether the relationships generalise across cognitive domains and (2) whether Big-5 personality dimensions explain additional variance in 1st- and/or 2nd-order decision measures, over and above that explained by symptom dimensions. Participants performed the same perceptual task but also performed an additional 2-AFC task which tested prior knowledge of generally known quantities: national populations (Fig. 3a)54,74. The knowledge task was chosen to maintain a similar trial and response structure to the perceptual task while indexing performance in a different cognitive domain. The task involved judging which of two countries had the higher human population, and then rating decision confidence on the same 6-point scale (from 1, ‘not confident (guessing)’, to 6, ‘certain’). The true population difference between the two countries (and hence task difficulty) was manipulated from trial-to-trial (Methods). Figure 3 provides a schematic of the trial procedure and an overview of performance on both tasks. In addition to the nine psychiatric symptom questionnaires, participants also completed the Big Five Inventory (BFI)68 to assess personality dimensions of ‘extraversion’, ‘agreeableness’, ‘conscientiousness’, ‘openness to experience’ and ‘neuroticism’. Full sample distributions of outcome measures for study 2 are shown in Supplementary Fig. 4 and relationships with age and gender are reported in Supplementary Fig. 5 and Supplementary Results.

Fig. 3: Knowledge decision-making task and behaviour in study 2 (n = 473).

a In addition to the perception task, participants also completed a task which tested knowledge of national populations. On each trial, participants judged which of two countries had the higher human population and provided a confidence rating (scale of 1–6, where 1 represented “not confident (guessing)” and 6 represented “certain”). Eight evidence discriminability bins were created by grouping pairs of countries with similar population log ratios. The log10 ratio bins were as follows, ranging from least to most discriminable: bin 1 (0–0.225), bin 2 (0.225–0.45), bin 3 (0.45–0.675), bin 4 (0.675–0.9), bin 5 (0.9–1.125), bin 6 (1.125–1.35), bin 7 (1.35–1.575), bin 8 (1.575–1.8). b In both tasks, group-averaged d' increased as a function of evidence strength. c The systematic type-1 leftward biases (here indexed by the mean type-1 c') decreased as a function of evidence level for both tasks but were systematically stronger for the perceptual task. d Group-averaged overall mean confidence ratings increased as a function of evidence strength. All error bars reflect 95% confidence intervals for the mean.
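The binning scheme in the caption can be sketched as follows. This assumes that each bin includes its lower edge and excludes its upper edge, and that pairs with a log10 ratio of 1.8 or more fall outside the scheme; the caption does not state how boundaries were handled, and the function name and example populations are hypothetical.

```python
import math

BIN_WIDTH = 0.225  # width of each bin in log10 population ratio
N_BINS = 8         # bins span log10 ratios from 0 up to 1.8

def discriminability_bin(pop_a, pop_b):
    """Return the bin (1 = least, 8 = most discriminable) for a country pair,
    or None if the log10 ratio lies beyond the last bin (assumed handling)."""
    log_ratio = abs(math.log10(pop_a / pop_b))  # order of the pair is irrelevant
    if log_ratio >= BIN_WIDTH * N_BINS:
        return None
    return int(log_ratio // BIN_WIDTH) + 1

# Hypothetical population pairs (illustrative numbers, not the task stimuli)
print(discriminability_bin(10e6, 5e6))   # a two-fold difference
print(discriminability_bin(60e6, 1e6))   # a sixty-fold difference
print(discriminability_bin(100e6, 1e6))  # a hundred-fold difference
```

Because the bins are defined on a log scale, a pair is placed by the ratio of the two populations rather than by their absolute difference.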

Comparing performance between the tasks (Fig. 4), participants performed better on the perceptual (mean d' = 1.70; SD = 0.56) compared to the knowledge (mean d' = 1.32; SD = 0.47) task (t(472) = 13.25, p < .001). However, meta-d' did not significantly differ (mean perceptual meta-d' = 1.32; SD = 0.63, mean knowledge meta-d' = 1.37; SD = 0.66: t(472) = −1.32, p = .186). Meta-d' values were more closely aligned with d' values in the knowledge task, as can be seen by comparing Fig. 4a, b. Accordingly, overall metacognitive efficiency was higher for knowledge (mean meta-d'/d' = 1.07; SD = 0.44) relative to perception (mean meta-d'/d' = 0.8; SD = 0.36) (t(472) = 10.9, p < .001) (Fig. 4c). Leftward group-level response biases (indexed by c') were significantly stronger for perception (mean perceptual c' = 0.12; SD = 0.34, mean knowledge c' = 0.04; SD = 0.13: t(472) = 5.01, p < .001) (Fig. 4d). The fact that the leftward bias was present for both tasks, but stronger for perception, suggests that both motor and perceptual biases likely contributed75,76,77. Finally, despite the knowledge task being objectively more difficult than the perceptual task (as reflected by the d' differences), knowledge confidence criteria were lower (indicating higher mean confidence ratings) (mean perceptual confidence c' = 0.73; SD = 0.3, mean knowledge confidence c' = 0.65; SD = 0.24: t(472) = −5.93, p < .001) (Fig. 4e).

Fig. 4: Between-task comparisons of overall performance.

The data are shown for (a) type-1 accuracy (d'), (b) metacognitive sensitivity (meta-d'), (c) metacognitive efficiency (meta-d'/d'), (d) criterion (type-1 c') and (e) type-2 criterion (confidence c'). On each box, the central line is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend ±2.7 standard deviations from the median. *P < 0.05, **P < 0.01.

To estimate the contribution of domain-general mechanisms, we tested the correlation of each measure (collapsed across evidence levels) between tasks (Fig. 5). We reasoned that a significant between-task correlation for a given measure suggests that a shared latent mechanism contributes across cognitive domains78,79. The only non-significant correlation was for c' (r(471) = 0.06, p = .18). All other correlations indicated an influence of domain-general mechanisms on performance, though with marked differences in correlation strength across measures. Both d' (r(471) = 0.26, p < .001) and meta-d' (r(471) = 0.16, p < .001) showed moderate correlations across tasks, whilst meta-d'/d' (r(471) = 0.09, p = .043) showed the weakest correlation of the significant measures. In line with previous studies78,79, the most strongly correlated measure across tasks was confidence c' (r(471) = 0.52, p < .001), suggesting that overall confidence calibration represents a stable, ‘trait-like’ measure which strongly influences metacognitive judgements across cognitive domains. It is important to note that estimates of confidence c' may be inherently less noisy than estimates of meta-d' and meta-d'/d', and that this may contribute to the differences in correlation strength of these measures across tasks. Further work is needed to ascertain whether absolute confidence levels are indeed an inherently more stable trait across cognitive domains than metacognitive sensitivity/efficiency.
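The reported statistics follow the usual convention for a Pearson correlation, with degrees of freedom equal to the sample size minus 2 (473 − 2 = 471). The sketch below shows how r and the associated t statistic, t = r √(df / (1 − r²)), could be computed. The helper names are hypothetical, and the p-value step is omitted because it needs a t-distribution CDF that the Python standard library does not provide.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic and degrees of freedom for a correlation from n pairs."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r ** 2)), df

# Tiny hypothetical example: per-participant scores on two tasks
perception = [1.0, 2.0, 3.0, 4.0]
knowledge = [1.5, 1.9, 3.2, 3.8]
r_example = pearson_r(perception, knowledge)

# t statistic implied by the reported confidence c' correlation (r = 0.52, N = 473)
t_conf, df_conf = t_statistic(0.52, 473)
print(f"example r = {r_example:.2f}; t({df_conf}) = {t_conf:.2f}")
```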

Fig. 5: Between-participant Pearson correlations across the two tasks.

Data are plotted for overall (a) type-1 accuracy (d'), (b) metacognitive sensitivity (meta-d'), (c) metacognitive efficiency (meta-d'/d'), (d) criterion (type-1 c') and (e) type-2 criterion (confidence c'). *P < 0.05, **P < 0.01.

Psychiatrically relevant 1st- and 2nd-order decision-making signatures are domain-general

Next, we investigated whether the relationships between psychiatric symptoms and task measures are themselves domain-specific or domain-general. For perception, similar relationships between task measures and both individual questionnaires and symptom dimensions were observed to those in experiment 1, though in experiment 2 additional significant relationships were found between CIT and metacognitive sensitivity (β = −0.15, p = .013, corrected) and between SW and 1st-order accuracy (β = 0.13, p = .048, corrected) (see Fig. 6a, b compared to Fig. 2a, b). The perception-symptom relationships across both studies combined (N = 817) are presented in Supplementary Fig. 6. To test whether the relationships generalised across cognitive domains, we turned to the knowledge task (Fig. 6c, d). In line with perception, knowledge confidence criteria were positively associated with apathy (β = 0.18, p < .001, corrected) and generalised anxiety (β = 0.14, p = .047, corrected).

The knowledge task-symptom dimension results closely replicated those of the perceptual task (Fig. 6d). CIT was associated with reduced 1st-order accuracy (β = −0.19, p < .001, corrected) and metacognitive sensitivity (β = −0.13, p = .039, corrected) as well as reduced confidence criteria (β = −0.18, p < .001, corrected), whereas AD was positively associated with confidence criteria (β = 0.24, p < .001, corrected). However, SW and 1st-order accuracy were not correlated for the knowledge task (β = 0.05, p = .351). As in experiment 1, no significant relationships with metacognitive efficiency were found for any of the symptom dimensions in either task. Importantly, these null results held when we further tested them (on the combined perceptual data from both experiments) using alternative hierarchical analysis approaches which incorporated group-level prior densities when estimating metacognitive efficiency57,58 (see Supplementary Results and Supplementary Figs. 7 and 8). Hence, we found no evidence for any relationship between symptom dimensions and metacognitive efficiency. Note that the negative relationships between CIT and metacognitive sensitivity in both tasks may be accounted for by the relationships between CIT and 1st-order accuracy, given that meta-d' positively correlates with d'. Indeed, the lack of any relationship between CIT and metacognitive efficiency (meta-d'/d') indicates that CIT is primarily associated with 1st-order accuracy rather than metacognitive sensitivity. Overall, overlap in relationships with psychiatric symptoms between the perceptual and knowledge tasks suggests that domain-general mechanisms largely underlie the associations between distinct dimensions of psychopathology and 1st-order and metacognitive decision signatures.

Fig. 6: Associations between 1st- and 2nd-order decision parameters and self-reported psychopathology, additionally controlling for age and gender, in experiment 2.

a Associations between psychiatric symptom questionnaire scores and perceptual meta-d' parameters. Given that all variables were z-scored prior to entry into the regression models, the y-axis indicates the change in each decision parameter (in standard deviations) for each change of 1 standard deviation of questionnaire scores. Accuracy = d', Metacognitive sensitivity = meta-d', Metacognitive efficiency = log(meta-d'/d'). b Associations between transdiagnostic symptom dimension scores and perceptual meta-d' parameters. c Associations between psychiatric symptom questionnaire scores and knowledge meta-d' parameters. d Associations between transdiagnostic symptom dimension scores and knowledge meta-d' parameters. All error bars denote 95% Confidence Intervals for the regression coefficients. °P < 0.05 uncorrected; **P < 0.05 corrected for multiple comparisons over the number of dependent variables tested.

Personality explains additional variance in 1st-order decisions, but not confidence

Having established domain-general associations with dimensions of psychopathology, we next investigated whether Big-5 personality traits account for additional variance in 1st- and/or 2nd-order performance across both tasks.

We entered Big-5 factor scores into regression models as predictors along with the symptom dimensions (and age and gender) (Fig. 7). Note that variance inflation factors (VIFs) were ≤ 2.83 for all predictors, indicating a negligible influence of multicollinearity on the estimated coefficients80. The analysis was only performed for d', meta-d' and confidence c' as no relationships were found with metacognitive efficiency (meta-d'/d') for any of the symptom (Fig. 6b, d) or personality (Supplementary Fig. 9) dimensions when tested independently. For the personality dimensions, extraversion was negatively correlated with 1st-order accuracy on the knowledge task (β = −0.18, p = .014, corrected) and a similar but weaker negative relationship was observed on the perceptual task (β = −0.15, p = .023, uncorrected). Additionally, openness to experience was positively correlated with 1st-order accuracy on the perception task (β = 0.12, p = .034, corrected). With personality dimensions included in the regression models, CIT scores remained significant independent predictors of 1st-order accuracy for both tasks (perception: β = −0.17, p = .006, corrected; knowledge: β = −0.15, p = .019, corrected).
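For context on the multicollinearity check, the VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others. The following is a minimal sketch with NumPy under that standard definition; the function name and the simulated predictors are hypothetical and do not reproduce the study's design matrix.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (rows = participants, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]                              # predictor treated as outcome
        others = np.delete(X, j, axis=1)         # all remaining predictors
        design = np.column_stack([np.ones(len(y)), others])  # add intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r_squared = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs

# Simulated, partly correlated predictors (illustrative only)
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = a + rng.normal(size=500)   # shares variance with a
c = rng.normal(size=500)       # independent of both
X = np.column_stack([a, b, c])
vifs = variance_inflation_factors(X)
print([round(v, 2) for v in vifs])
```

A VIF of 1 means a predictor is uncorrelated with the others; commonly cited rules of thumb treat values above roughly 5 or 10 as a sign of problematic collinearity.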

Fig. 7: Associations between 1st- and 2nd-order decision parameters and both self-reported personality traits and symptom dimensions, controlling for age and gender, in experiment 2.

Data are plotted separately for the (a) perception and (b) knowledge tasks. Note that these analyses were only performed for d', meta-d' and confidence c' as no relationships were found with metacognitive efficiency (meta-d'/d') for any of the symptom or personality dimensions when tested alone. All error bars denote 95% Confidence Intervals for the regression coefficients. °P < 0.05 uncorrected; **P < 0.05 corrected for multiple comparisons over the number of dependent variables tested.

No personality dimensions significantly predicted metacognitive performance (meta-d' or confidence c') in either task after multiple comparison correction, whereas confidence criteria were positively related to AD (perception: β = 0.19, p = .026, corrected; knowledge: β = 0.23, p = .003, corrected), and negatively related to CIT (perception: β = −0.14, p = .036, corrected; knowledge: β = −0.17, p = .003, corrected) for both tasks. Hence, whilst both personality and psychiatric symptom dimensions were independently associated with 1st-order accuracy, symptom dimensions were the only significant predictors of domain-general confidence.

Transdiagnostic symptom dimensions elucidate relationships between personality traits and psychopathology

Finally, we investigated relationships between Big-5 personality dimensions and symptoms of psychopathology (Fig. 8). Controlling for age and gender, extraversion was negatively associated with apathy (β = −0.39, p < .001, corrected), social anxiety (β = −0.53, p < .001, corrected), generalised anxiety (β = −0.47, p < .001, corrected), depression (β = −0.36, p < .001, corrected) and schizotypy (β = −0.28, p < .001, corrected), but positively associated with alcoholism (β = 0.16, p = .017, corrected). Agreeableness was significantly negatively associated with scores on 6 out of 9 questionnaires (all βs ≤ −0.1, all ps ≤ .001, corrected). Conscientiousness was significantly negatively associated with scores on 7 questionnaires (all βs ≤ −0.11, all ps ≤ .001, corrected). Openness to experience was negatively associated with apathy (β = −0.36, p < .001, corrected). Neuroticism was significantly positively associated with scores on 8 of the questionnaires (all βs ≥ 0.09, all ps ≤ .001, corrected).

Fig. 8: Widespread associations between self-reported personality traits and psychopathology, controlling for the influence of age and gender.

a Associations between psychiatric symptom questionnaire scores and personality dimension scores from separate regression models. The y-axis indicates the change in each personality dimension score for each change of 1 standard deviation of questionnaire scores. b Associations between transdiagnostic symptom dimension scores and personality dimension scores. All error bars denote 95% Confidence Intervals for the regression coefficients. °P < 0.05 uncorrected; **P < 0.05 corrected for multiple comparisons over the number of dependent variables tested.

For symptom dimensions (Fig. 8b), extraversion was negatively associated with both AD (β = −0.24, p < .001, corrected) and SW (β = −0.62, p < .001, corrected), but positively associated with CIT (β = 0.28, p < .001, corrected). Only AD showed a significant negative association with agreeableness (β = −0.31, p < .001, corrected), suggesting that this transdiagnostic dimension may account for the ubiquitous negative relationships observed across the individual questionnaires (Fig. 8a). For conscientiousness, AD was negatively associated (β = −0.66, p < .001, corrected) whilst SW was positively associated (β = 0.17, p < .001, corrected). This suggests that AD may also account for the negative relationships between multiple questionnaires and conscientiousness (Fig. 8a). Openness was negatively correlated with AD (β = −0.15, p = .016, corrected), but positively correlated with CIT (β = 0.17, p = .002, corrected). It is notable that no positive relationships were observed between either conscientiousness or openness and any of the individual questionnaire scores (Fig. 8a), whereas the transdiagnostic analysis revealed positive relationships with SW (conscientiousness) and CIT (openness), respectively (Fig. 8b). Hence, the transdiagnostic approach revealed relationships which were masked by classic diagnostic categories. Finally, neuroticism was positively associated with all three symptom dimensions (all βs ≥ 0.24, all ps < .001, corrected). The results confirm strong relationships between dimensions of personality and psychopathology and highlight that the transdiagnostic approach provides information about the nature of these relationships which is not apparent using classical diagnostic categories.

Discussion

Distortions of both 1st-order perceptual decision-making and metacognitive evaluation have been suggested to characterise various forms of psychopathology. To date it has remained unclear exactly which latent processes are involved and whether the distortions generalise across cognitive domains. Here, employing a battery of self-report psychiatric symptom questionnaires and computational modelling of psychophysical performance across two studies, we found a symptom dimension characterised by ‘compulsive behaviour and intrusive thought’ (CIT) to be associated with reduced 1st-order objective accuracy but, paradoxically, increased confidence. Conversely, an ‘anxious-depression’ (AD) dimension was associated with systematically low absolute confidence in the absence of any relationship with 1st-order accuracy. These relationships replicated across perception and general knowledge tasks and occurred independently of age and gender. Alongside dimensions of psychopathology, we also investigated whether Big-5 personality traits explained additional variance in 1st-order and/or metacognitive decision-making. Whilst dimensions of both personality (extraversion, openness) and symptoms (CIT) were independently associated with 1st-order accuracy, only symptom dimensions (AD, CIT) predicted metacognitive performance. Overall, the results reveal robust, domain-general signatures of decision-making and metacognition related to distinct psychological dispositions and psychopathology in the general population, and further elucidate the nature of relationships between personality and psychopathology.

The CIT dimension most prominently links features of impulsivity, OCD, schizotypy, addiction and eating disorders. Our results suggest domain-general alterations across multiple levels of the decision hierarchy in CIT, in line with previous studies which have found compulsivity to be associated with alterations in 1st-order perceptual decision-making27,28,81, goal-directed control9,47,51,82 and confidence judgements26,28,29. The CIT dimension was associated with a positive confidence bias (across both experiments and tasks) and reduced metacognitive sensitivity (across both tasks but only in study 2) but showed no relationship with metacognitive efficiency. Previous studies have found a reduction in metacognitive efficiency associated with compulsivity26,27 but we did not find evidence for this here. The lack of an association with metacognitive efficiency suggests that the relationship between CIT and metacognitive sensitivity (meta-d') may have been driven by the negative relationship between CIT and first order accuracy (d'). Our results suggest that confidence ratings still dissociate between correct and incorrect trials to the degree expected given the 1st-order performance in CIT, but overall confidence calibration is high. The apparent contradiction of reduced objective performance but inflated confidence is in line with an altered connection between confidence and behaviour29.

The 1st-order decision deficits associated with CIT, and related disorders, have been attributed to alterations in decision formation processes such as evidence accumulation27,44,83. Here we show that the deficits extend beyond decisions about external sensory stimuli to include semantic memory/knowledge decisions based on internal evidence. Hence, they cannot be explained by low level sensory dysfunction. Higher order deficits in the internal modelling of task structures have also been shown to characterise compulsivity9,82,84. However, as optimal performance on our tasks did not require participants to learn underlying state transition probabilities, but rather depended in a straightforward manner on their decision accuracy on each individual trial, it seems unlikely that impaired internal task models can explain the 1st-order effects observed here. The effects may be explained by a recently proposed ‘decision acuity’ (d) trait found to underlie decision-making performance, independently of IQ, across a large range of decision tasks46. Interestingly, both d and IQ scores were found to be negatively related to a psychiatric dimension characterised by compulsivity/obsessionality/schizotypy (labelled ‘aberrant thinking’)46.

The AD dimension, which most prominently linked features of apathy, anxiety, and depression, was associated with low confidence across cognitive domains in the absence of any relationships with objective performance. These findings confirm negative confidence bias as a feature of anxious-depressive symptomology, even in sub-clinical samples25,26,53,85, and have implications for prominent theories of the role of metacognition in depression. Whereas the negativity hypothesis86 posits that depressed individuals evaluate themselves in an overly negative way, the depressive realism hypothesis87 posits that depressed individuals are more accurate in their evaluations of themselves and that it is non-depressed individuals whose evaluations are distorted by a positivity bias. Under these theories, we would expect depressive symptoms to be associated with either an increase in confidence criteria (negativity hypothesis) or an increase in metacognitive sensitivity/insight (depressive realism). Our results were more in line with the former, as we found no evidence for a relationship between AD symptoms and metacognitive efficiency. Hence, while individuals reporting high levels of AD were more negative in their confidence ratings overall (in line with the negativity hypothesis), this was not associated with a reliable alteration in their ability to dissociate correct from incorrect responses. Indeed, other recent studies have also found no relationship between metacognitive efficiency and anxious-depressive symptoms28,53.

The computations underlying metacognitive sensitivity and bias have been suggested to arise from dissociable neural networks. For instance, in the prefrontal cortex (PFC), metacognitive sensitivity is associated primarily with anterior (aPFC) structure and activity20,88,89, whereas absolute confidence is associated with ventromedial (vmPFC), posterior medial (mPFC) and dorsolateral (dlPFC) regions30,90,91. Our results suggest that anxious-depressive symptoms may be associated with changes in networks subserving absolute confidence, but not metacognitive sensitivity. Intriguingly, recent evidence suggests that interactions between confidence and reward valuation/motivation are reflected in activity in the vmPFC and dorsal anterior cingulate cortex (dACC)25. These regions have also been associated with symptoms of apathy92, anxiety93 and depression94 and hence represent promising candidates for the neural locus of the AD effects.

The functional consequences of confidence biases in both AD and CIT should be investigated further. Negative confidence bias may have a pernicious long-term influence on motivation95,96, learning97,98, information seeking6 and self-esteem53,99, which in turn may cause and/or exacerbate anxious-depressive symptoms. Conversely, inflated confidence may result in rigid beliefs and cognitive inflexibility, symptoms often observed in OCD100, addiction101 and schizophrenia102,103. Changes in confidence calibration may be linked to maladaptive beliefs about self-efficacy and the level of control one has over one's thoughts and/or behaviours. It would be of interest to assess whether successfully challenging these maladaptive beliefs, through techniques such as cognitive behavioural86 or metacognitive104 therapies, would result in corresponding changes in confidence criteria. As well as providing a useful neuro-computational outcome measure for clinical research105, this would help to address a key open question concerning the causal direction of the relationship between symptoms and metacognitive bias: do the biases arise prior to, and potentially confer risk for, the onset of symptomology, or are they rather concomitant symptoms themselves? Incorporating quantitative measurement of metacognitive bias into studies employing longitudinal and/or interventional designs could shed light on this question.

We found no evidence that personality traits play a role in the relationships between metacognition and psychopathology. Metacognitive bias related to dimensions of psychopathology directly rather than through a shared link with general psychological dispositions. Indeed, Big-5 dimensions did not predict confidence in either cognitive domain. Interestingly, 1st-order accuracy was negatively associated with extraversion for both tasks. These relationships occurred independently of the accuracy-CIT relationships and, though they were not hypothesized, are in line with previous studies32,106,107. Hence, both personality and symptom dimensions were related to 1st-order performance. To elucidate the source of these relationships, future studies may investigate whether factors known to influence decision-making performance, such as choice history bias108,109, attention deficits110, confirmation bias111, and/or alteration in reward/loss sensitivity81,112, contribute to the observed 1st-order CIT and/or personality effects. We did not measure IQ here and so it is possible that variation in general intelligence may contribute to the effects, though evidence for relationships between IQ and both extraversion113,114 and compulsivity26,46 is mixed. Future studies may also investigate whether IQ and/or the recently proposed d factor46 play a role in the observed 1st-order effects.

Although they were not significantly related to metacognition, personality dimensions were strongly correlated with psychopathology. Numerous relationships with classic diagnostic categories were observed for each Big-5 dimension34,35,36,37,38,39,40. However, we also found relationships between personality and transdiagnostic symptom dimensions that were masked by the classical categories: positive relationships between SW and conscientiousness, and between CIT and openness. These findings suggest links between personality traits and symptoms which do not neatly fit established diagnostic boundaries, thereby further validating interest in the identification of transdiagnostic symptom predictors72,73. Given that the Big-5 represent one level within a hierarchy of traits115,116, it would be interesting to investigate exactly which subordinate facets of each dimension are most strongly linked to transdiagnostic symptoms.

Our results have implications for current models of metacognition. A normative model posits that confidence computations reflect the probability of being correct in a statistically optimal manner117,118,119. However, the relationships between symptoms and confidence ratings, and the dissociations between d' and meta-d' observed across both tasks, show that the normative model alone cannot fully account for subjective confidence. Rather, our results align with models positing that confidence judgements arise from processes which are dissociable from the decision itself74,120.

Both domain-specific and domain-general factors influenced metacognitive performance. At the group level, objective accuracy was lower for knowledge than perception, but overall metacognitive efficiency and absolute confidence levels were higher. The differences in metacognitive efficiency and confidence criteria between the tasks support an influence of domain-specific factors30,121. It is difficult to identify exactly which factors, however, because these measures are influenced not only by differences in metacognitive mechanisms between cognitive domains, but also by differences in task characteristics, such as 1st-order difficulty41 and variability in difficulty across stimulus levels122, which were not equalised between tasks. One possible explanation for increased metacognitive efficiency in the knowledge task is that, whereas self-evaluation of perceptual task performance required assessment of evidence presented very briefly and then fading in iconic memory, the internally generated knowledge evidence was presumably available to the same degree throughout the trial, including during confidence judgements. Alternatively, given that confidence levels were also higher for the knowledge task here, the increased metacognitive efficiency scores may be explained by a recently discovered positive correlation between efficiency and confidence1,123.

In support of domain-general processes also influencing performance, we found significant between-task correlations. In line with previous studies, type-1 accuracy46, metacognitive sensitivity124 and metacognitive efficiency125,126 were all somewhat correlated across tasks. However, overall confidence bias was the most strongly correlated measure78,79 and most strongly linked to symptoms. This suggests that a trait-like, global metacognitive process9,28 links to psychopathology, as opposed to more ‘local’, domain-specific processes such as uncertainty about sensory evidence or model-based task representations. Global metacognitive evaluations may be intimately linked to beliefs about overall self-efficacy and are likely to have a more pervasive influence on everyday functioning9,28. One important consideration is that the task measures of interest here may be affected by different levels of noise121,127 and this may have influenced both estimates of their reliability across tasks and the strength of their relationships with other variables (such as symptom scores). For instance, it is possible that estimates of confidence bias may be inherently less noisy than estimates of metacognitive sensitivity and efficiency. Although meta-d' measures of metacognitive performance are widely adopted and currently represent the state-of-the-art in the field41,42,57, alternative approaches to modelling/quantifying metacognitive abilities128,129,130 are emerging which may be applied in future research to further characterise relationships between metacognition and psychopathology.

Testing symptom variation in the general population affords the advantage of efficient collection of large samples and overcomes the arbitrary boundaries between psychopathology and normality imposed by diagnostic manuals including the DSM131 and ICD132. However, it remains to be seen whether these results can be extended to clinical samples with the highest levels of symptom severity. The transdiagnostic approach revealed relationships between psychopathology and both metacognition and personality traits which were not apparent in analyses using classic diagnostic categories (see also Rouault et al., 201826), and this may be due to relationships being masked by overlap of symptom dimensions within single categorical disorders, such as overlap of AD and CIT within OCD9,24,25,47. This creates challenges both in terms of relating results to previous research and for translation to clinical practice72. Future research should investigate whether diagnostic categories (such as OCD) or transdiagnostic dimensions (such as compulsivity) are stronger predictors of cognitive and/or metacognitive deficits in clinical samples. Along these lines, Gillan et al. (2020)133 showed that the CIT dimension was a significant predictor of deficits in goal-directed planning whereas having a diagnosis of OCD was not. Furthermore, identification and quantification of relationships between symptoms and cognition at the level of the individual, rather than at the population level134, could remain agnostic to over-arching diagnostic labels and provide direct targets for therapeutic intervention, in line with a move towards precision psychiatry135,136.

We employed the same battery of questionnaires as previous studies26,47 and were able to replicate three previously reported symptom dimensions (AD, CIT, and SW). However, the questionnaire items contributing to these dimensions do not exhaustively cover all forms of psychopathology and other transdiagnostic symptom structures have been proposed36,51,137,138 which may capture a broader range of cognitive/metacognitive alterations. It is also important to note that the age ranges of both samples here were heavily skewed towards young adults (Supplementary Figs. S1 and S4), likely due to the online recruitment strategy. Future studies should investigate decision-making and metacognition over extended symptom and age ranges and across different transdiagnostic structures. Additionally, it will be important to ascertain whether relationships between psychopathology and both 1st and 2nd-order decision-making are relatively invariant, or whether they depend on time and context139. For instance, the relationships may fluctuate as a function of disorder trajectory or symptom provocation. Understanding temporal dynamics and contextual triggers will help to refine models of the neurocomputational signatures associated with psychopathology and potentially facilitate the identification of novel treatment techniques.