Introduction

The schizophrenia spectrum describes a range of psychotic disorders characterised by continuous or episodic positive and negative symptoms, as well as cognitive impairment1,2. Although checklists such as the DSM-5 aid diagnosis and categorisation of schizophrenia spectrum disorders (SSD), they fail to give healthcare professionals a nuanced understanding of the ‘clinical core’ of these disorders3.

The concept of self-disorders (SD) was derived from a subgroup of basic symptoms (BS), and BS may be seen as providing historical or conceptual influence on the later SD4. BSs are subtle, subclinical, and subjectively experienced disturbances of mental processes5,6,7. They include thought interference, thought block and a disruption to abstract thinking5. It may be somewhat intuitive to consider BSs as precursors to first-rank symptoms such as thought insertion, although from a historical-conceptual perspective, first-rank symptoms were developed prior to the BS5,6. A detailed description of BSs can be found in Schultze-Lutter et al.7. Though the aetiology is poorly understood, the Early Heidelberg School’s perceptual anomalies model for self-disturbances (the preferred term over ‘self-disorders’) proposes these abnormalities arise in sensory processing at an unconscious level5,8. The Early Heidelberg School of Psychiatry first developed and systematically described “self-disturbances” (Ichstörungen in German) in schizophrenia. Kurt Schneider later incorporated many self-disturbances in his first-rank symptoms. This approach has been further developed by several pioneering authors and is actively used today8,9,10,11. By applying this concept alongside in-depth interviews of patients with schizophrenia, the Bonn Scale for the Assessment of Basic Symptoms (BSABS)12, and later Schizophrenia Proneness Instrument (SPI), were developed13. A subgroup of BSABS items explore BSs relating to anomalies in one’s subjective self-experience.

The ipseity disturbance model developed the concept of SDs, or anomalous self-experiences (ASEs)7. The ipseity disturbance model defines SDs as psychiatric phenomena characterised by a trait-like, persistent disruption in the tacit, pre-reflective level of selfhood, known as the minimal self14. The minimal self, sometimes used interchangeably with ipseity, is the basic level of selfhood where a subject’s emotions, experiences and actions are given first-person ownership, agency, and awareness15. This distinguishes SDs in the schizophrenia spectrum from self-disorder-like phenomena seen in other conditions such as personality disorders16,17,18. Proponents of the ipseity disturbance model combined qualitative findings from patients with SSDs with a subgroup of BSABS items to create the Examination of Anomalous Self-experience (EASE)7,19.

Studies utilising the BSABS and EASE have shown the aggregation of certain SDs within SSDs20,21,22. A recent study by Koren et al.23 provides longitudinal evidence to corroborate this cross-sectional association between SDs and SSDs. In this study, a greater total SD score was associated with a greater risk of conversion to non-affective psychosis (NAP) and schizotypal disorder, when compared with other psychiatric disorders. Other studies have found no statistically significant differences in SD scores between schizophrenia and schizotypal disorders, supporting the theory that SDs are a core feature of the broad schizophrenia spectrum and represent a clinical vulnerability phenotype24,25.

Systematic reviews have begun to emerge within this field, notably a recent meta-analysis by Raballo et al.26 and a systematic review by Henriksen, Raballo, and Nordgaard27. The meta-analysis by Raballo et al. reported the selective aggregation of SDs within SSDs, when compared to other mental illnesses (OMI) and healthy controls (HC)26. It also reported evidence suggesting that SDs may be a marker of vulnerability for conversion to full-blown psychosis within the schizophrenia spectrum. Like the present study, Raballo et al.’s meta-analysis explored the extent to which SDs express a specificity for SSDs. However, unlike Raballo et al.’s study, this systematic review also poses questions of how likely SDs are to occur within the schizophrenia population and how clinically useful current SD assessment tools are. These questions highlight gaps in current literature, including recent meta-analyses. Raballo et al.26 report a methodology in line with the PRISMA guidelines and utilise data such as mean differences in SD score, mirroring this study. However, the two studies differ in eligibility criteria, with Raballo et al.26 including studies with clinically high-risk (CHR) and child/adolescent groups. Although follow-up studies have demonstrated that SDs are temporally stable traits in help-seeking adolescents which also predict later transition to a diagnosis of SSDs23, these groups have been excluded in the current analysis given the potential of added heterogeneity and uncertainties in diagnoses, at least cross-sectionally, in adolescent samples. Thus, whilst the pooled effect sizes the present meta-analysis generates may have reduced power, they should be more precise and enable the generation of more accurate conclusions. In contrast to Raballo et al.’s meta-analysis26, the present meta-analysis separately explores BSABS and EASE studies, which should facilitate the comparison of these assessment tools and improve precision.

This systematic review and meta-analysis proposes that self-disturbances or self-disorders provide a promising avenue for gaining a better subjective understanding of the core phenomena of SSDs3. This study aims to answer the following questions. Firstly, what is the likelihood of presenting self-disorders or self-disturbances among the schizophrenia spectrum population? Secondly, what is the difference in self-disorder scores between schizophrenia spectrum groups and other non-schizophrenia spectrum groups? Thirdly, what is the clinical utility of current assessment tools in identifying self-disorders among schizophrenia spectrum groups?

Results

Study selection

Following application of the eligibility criteria, two open-label cohort/follow-up and 13 open-label cross-sectional observational studies were included in the systematic review and meta-analysis (PRISMA Flowchart, Fig. 1).

Figure 1

PRISMA flowchart for study selection.

Study characteristics

Table 1 shows the characteristics of all included studies. Most included studies were performed in either inpatient (five studies) or outpatient units (seven studies). Three studies were set in a combined inpatient and outpatient unit. The geographical setting for included studies varied. However, Denmark was the setting of the largest proportion of studies (eight studies). The other studies were set in Norway (two studies), Australia (Melbourne; three studies), Portugal (one study), and Italy (one study). Of the included studies, six used the BSABS and nine used the EASE for assessment of SDs. Studies using the EASE varied in how the SD score outcome was measured. Six of the nine EASE studies reported dichotomous scores, two reported continuous scores, and one reported both scores.

Table 1 Summary table of included studies reporting the following: main author(s), geographical setting, criterion/instrument for diagnosis, SD assessment tool, type of sample, sample size, mean total SD score, odds ratio of SD, descriptive psychopathology, risk of bias rating, quality of evidence rating, and key findings.

Regarding the target population, studies varied in terms of which participants with SSDs were recruited. Eight out of the 15 studies recruited participants with SSDs exclusively, one study used participants with schizotypal personality disorder (SPD), and one study recruited participants with non-affective psychosis (NAP), which included schizophrenia. Five studies recruited participants with an SSD or SPD and one study recruited participants with either SPD or NAP, which included schizophrenia. Only one study recruited participants based upon symptoms rather than diagnosis, recruiting participants with first rank symptoms (FRS) instead.

When the samples of all included studies were combined, a population of 810 participants on the schizophrenia spectrum was included. This consisted of 56 participants with an unspecified SSD, 150 participants with schizophrenia, and 262 participants with an NAP that included schizophrenia. It also contained 262 participants with SPD, 50 CHR participants (with SPD), and 30 participants with FRS.

There was significant variation in the comparison groups for each included study. A mixed composition of OMI was the most common comparison group (five studies), followed by HC (three studies). Other comparison groups included participants with no FRS (one study), autism spectrum disorder (ASD) (one study), no SSD (two studies), non-schizophrenic NAP (one study), bipolar disorder (BD) (one study) and obsessive–compulsive disorder (OCD) (one study).

A comparison population of 781 participants without an SSD was included. This included 302 participants with a variety of OMIs, 195 HCs, and 86 participants with no SSD. Smaller numbers of other comparison groups were also included: no FRS (68), BD (67), OCD (28), ASD (22), and non-schizophrenic NAP (13).

Whilst all studies reported either total SD score or the odds ratio of SDs as a primary outcome, secondary outcomes varied between included studies. Most studies reported clinical secondary outcomes, notably the OPCRIT (six studies), PANSS (seven studies) and GAF (six studies). Neurocognitive outcomes (two studies), aberrant salience outcomes (two studies) and EEG neurophysiology outcomes (one study) were other notable secondary outcomes reported in included studies.

Risk of bias and quality of evidence assessment

The risk of bias and quality of evidence rating for included studies can be found in Table 1, with a detailed breakdown of each rating in Supplementary Table S2. Concerning the quality of evidence, most studies (11) achieved a moderate quality of evidence score. Four of the 15 studies were determined to have a low quality of evidence. None of the included studies were determined to have a high quality of evidence.

Regarding risk of bias, three of the 15 included studies were determined to have a low risk of bias. Two studies were judged to have a low to moderate risk of bias. Nearly half of the studies (seven) were determined to have a moderate risk of bias. Although three studies were judged to have a moderate to high risk of bias, no studies clearly had a high risk of bias.

Differences in mean self-disorder score between SSD and control groups in studies using the BSABS

Panel (a) of Fig. 2 portrays the standardised mean effect sizes and 95% confidence intervals for studies using the BSABS. Three of the four studies (the exception being ref. 36) expressed statistically significant effect sizes suggestive of greater SD aggregation in SSD groups compared to control groups. The pooled effect size for BSABS studies was Hedges’ g = 0.774, 95% CI 0.529–1.019. The Z-statistic for the pooled effect size was Z = 6.191. The pooled effect size was statistically significant (p < 0.01). Heterogeneity was moderate (I² = 49%).
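The Z value quoted for a pooled effect is the estimate divided by its standard error, and that standard error can be recovered from a symmetric 95% confidence interval. The following is a minimal sketch under a normal (Wald) approximation; the function name is hypothetical and the calculation is not taken from the authors' software output.

```python
from statistics import NormalDist

def z_from_ci(estimate, lower, upper, level=0.95):
    """Recover the standard error and Z-statistic from a symmetric Wald CI."""
    # Critical value of the standard normal distribution (about 1.96 for 95%)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (upper - lower) / (2 * crit)
    return se, estimate / se

# Pooled BSABS values quoted in the text: g = 0.774, 95% CI 0.529-1.019
se, z = z_from_ci(0.774, 0.529, 1.019)
print(f"SE = {se:.3f}, Z = {z:.2f}")
```

With the rounded figures from the text, the recovered Z is about 6.19, in agreement with the reported value of 6.191.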

Figure 2

Meta-analysis of the aggregation and likelihood of developing self-disorders in schizophrenia spectrum and control groups, as measured with the Bonn scale for the assessment of basic symptoms (BSABS). Studies are grouped by outcome measure. (a) Describes the standard difference in means for total self-disorder score in schizophrenia spectrum and control groups. (b) Describes the odds ratios of developing self-disorders in schizophrenia spectrum and control groups. Data shown as mean ± SEM of self-disorder scores for (a) and log odds ratio ± SE for (b). Both are representative of two independent samples; values are significant if p < 0.05. BD = bipolar disorder, HC = healthy control, MD = mood disorder, NAP = non-affective psychosis, OCD = obsessive compulsive disorder, OMI = other mental illness, SD = self-disorder, SPD = schizotypal personality disorder, SZ = schizophrenia.

The likelihood of expressing self-disorders in SSD versus control groups in studies using the BSABS

Panel (b) of Fig. 2 displays the effect sizes for odds ratios (OR) and the 95% confidence intervals (CI) for studies using the BSABS. The effect sizes of four of the five studies (the exception being ref. 36) showed a significantly greater likelihood of SDs in SSD groups compared to controls. The pooled effect size for BSABS studies was OR = 5.435, 95% CI 2.499–11.823. Heterogeneity was judged to be high (I² = 66%).
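Odds ratios are pooled on the logarithmic scale, which is why the confidence interval above is asymmetric around 5.435 on the raw scale but symmetric once log-transformed. The sketch below illustrates this with the reported figures; the function name is hypothetical and a normal approximation on the log scale is assumed.

```python
import math

def log_or_summary(odds_ratio, lower, upper, crit=1.959964):
    """Return the log odds ratio, its standard error, and the CI midpoint
    back-transformed from the log scale."""
    log_or = math.log(odds_ratio)
    # On the log scale the Wald CI is symmetric, so its half-width gives the SE
    se = (math.log(upper) - math.log(lower)) / (2 * crit)
    # The midpoint of the log-scale CI equals the geometric mean of the limits
    midpoint = math.exp((math.log(lower) + math.log(upper)) / 2)
    return log_or, se, midpoint

# Pooled BSABS values quoted in the text: OR = 5.435, 95% CI 2.499-11.823
log_or, se_log_or, midpoint = log_or_summary(5.435, 2.499, 11.823)
print(f"log OR = {log_or:.3f}, SE = {se_log_or:.3f}, midpoint OR = {midpoint:.3f}")
```

The back-transformed midpoint of the interval lands on the reported pooled odds ratio to within rounding, as expected if the interval was constructed on the log scale.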

A sensitivity analysis was performed given the high heterogeneity and large variance (> two standard deviations) in three of the studies20,34,35. Panels (a) to (d) of Supplementary Fig. S1 describe the odds ratio effect sizes when each potential outlier is removed. Panel (e) of Supplementary Fig. S1 describes the odds ratio effect sizes when all potential outliers are removed. A detailed description of the results from the sensitivity analysis can be found in Supplementary Material 1.

Differences in mean self-disorder score between SSD and control groups in studies using the EASE with dichotomous scores

Panel (a) of Fig. 3 displays the standardised mean effect sizes and 95% confidence intervals for studies using the EASE with dichotomous scores. The effect sizes for all seven studies showed greater SD scores within SSD groups when compared to control groups. The pooled effect size for EASE studies using dichotomous scores was Hedges’ g = 1.604, 95% CI 1.176–2.032. The Z-statistic for the pooled effect size was Z = 7.343. The pooled effect size was statistically significant (p < 0.01). Despite a statistically significant pooled effect size, heterogeneity bordered on very high (I² = 76%).

Figure 3

Meta-analysis of the aggregation of self-disorders in schizophrenia spectrum and control groups, as measured with the examination of anomalous self-experiences (EASE). Studies are grouped by the type of self-disorder score reported. (a) Describes the standard difference in means for dichotomous total self-disorder scores in schizophrenia spectrum and control groups. (b) Describes the standard difference in means for continuous total self-disorder scores in schizophrenia spectrum and control groups. Diamonds indicate pooled effect sizes and squares indicate individual study effect sizes. Data shown as mean ± SEM of self-disorder scores and are representative of two independent samples; values are significant if p < 0.05. ASD = autism spectrum disorder, BD = bipolar disorder, FEP = first episode psychosis, FRS = first rank symptoms, HC = healthy control, NAP = non-affective psychosis, OMI = other mental illness, SD = self-disorder, SPD = schizotypal personality disorder, SSD = schizophrenia spectrum disorder, SZ = schizophrenia, UHR = ultra-high risk for psychosis.

It must be noted that in Nordgaard et al.’s 2020 study on FRS33, some patients with schizophrenia but no FRS were included as controls; as such, it would be inaccurate to assume that there were no patients with schizophrenia in the control group. To deal with this issue, we performed an additional sensitivity analysis removing this study (also see “Methods” section for three-level analysis for dependent effect sizes). The pooled effect size for EASE studies using dichotomous scores after removing the FRS study was Hedges’ g = 1.707, 95% CI 1.266–2.148. The Z-statistic for the pooled effect size was Z = 7.591. The pooled effect size was statistically significant (p < 0.01) and heterogeneity still bordered on very high (I² = 72%).

Differences in mean self-disorder score between SSD and control groups in studies using the EASE with continuous scores

Panel (b) of Fig. 3 presents the standardised mean effect size and 95% confidence intervals for studies using the EASE with continuous scores. All three studies demonstrated effect sizes suggestive of greater SD scores in SSD groups compared to control groups. The pooled effect size for these studies was Hedges’ g = 2.584, 95% CI 1.476–3.693. The Z-statistic for the pooled effect size was Z = 4.568. The pooled effect size was statistically significant (p < 0.01). However, one study16 had an effect size which was not significant at the 1% level, although it was significant at the 5% level. Heterogeneity was very high (I² = 92%).

Three-level analysis of combined odds ratio and Hedges’ g effect sizes

In a three-level random-effects meta-analysis combining all the effect sizes, in which odds ratios were log-transformed to approximate a normal distribution similar to Hedges’ g, the Q-statistic for testing the homogeneity of effect sizes was 94.514 (p < 0.001). The estimated heterogeneity (Tau-squared) at level 2 and at level 3 was 0.5908 and 0.8039, respectively. The level 2 and level 3 I-squared values were 0.4028 and 0.5481, respectively. SSD status (level 2) and cluster of studies (level 3) thus explain about 40% and 55% of the total variation, respectively. The estimated average population effect (with its 95% Wald CI) was 2.1429 (1.0915–3.1942).
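The level-specific I-squared values can be related to the Tau-squared estimates with the commonly used decomposition in which each level's share is its Tau-squared divided by the sum of both Tau-squared values and a "typical" within-study sampling variance. The sketch below assumes that definition; the text does not state which definition or which typical variance was used, so the sampling variance here is back-calculated from the reported figures and is an assumption, and the function name is hypothetical.

```python
def level_i2(tau2_level2, tau2_level3, typical_variance):
    """Share of total variation attributable to level 2 and level 3."""
    total = tau2_level2 + tau2_level3 + typical_variance
    return tau2_level2 / total, tau2_level3 / total

# Values reported in the text
tau2_l2, tau2_l3 = 0.5908, 0.8039
i2_l2_reported = 0.4028

# Assumption: back out the typical sampling variance implied by the
# reported level 2 I-squared (it is not given in the text)
implied_total = tau2_l2 / i2_l2_reported
typical_v = implied_total - tau2_l2 - tau2_l3

i2_l2, i2_l3 = level_i2(tau2_l2, tau2_l3, typical_v)
print(f"implied typical variance = {typical_v:.3f}")
print(f"I2 level 2 = {i2_l2:.4f}, I2 level 3 = {i2_l3:.4f}")
```

Under this assumed decomposition, the level 3 share recomputed from the back-calculated variance agrees with the reported 0.5481 to within rounding, so the four reported figures are internally consistent.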

Discussion

This meta-analysis is among the first to explore the merit of theories which posit that SDs show a specificity within the schizophrenia spectrum, and its findings are consistent with those of two very recent reviews26,27. Our meta-analysis appears to indicate a significant magnitude of effect suggestive of a greater expression of SDs within the schizophrenia spectrum population, when compared with HCs and OMIs. This magnitude of effect was observed in studies using both the BSABS and the EASE. Thus, we have found good evidence to support the over-expression of these SD phenomena within the schizophrenia spectrum, whether they are interpreted as a subgroup of basic symptoms or a more pervasive distortion in the minimal self7,14,39.

This overexpression of SDs within the schizophrenia spectrum is further supported by our meta-analysis of odds ratios for the likelihood of SDs occurring. The 95% confidence interval of the pooled odds ratio indicated that SDs were roughly 2.5 to 12 times more likely to occur within the schizophrenia spectrum population than within non-schizophrenia spectrum populations. Even following the removal of outliers, SDs remained between just over one and 4.5 times more likely to occur within schizophrenia spectrum populations than within non-schizophrenia spectrum populations.

Despite good evidence suggesting that SDs are a core clinical feature of the schizophrenia spectrum, there are some limitations to the evidence. The variation in pooled effect sizes suggests that SDs are not experienced by everyone within the schizophrenia spectrum. Given that our meta-analysis did not subgroup for different comparator groups, it is difficult to establish the boundaries of SDs in SSDs. This is perhaps reflected in the significant heterogeneity observed across all pooled effect sizes. Alongside methodological differences and variability in target population, the range of different comparison groups likely contributed to this generally high heterogeneity. With high heterogeneity, this study has less confidence in its pooled effect sizes. Also, these results should be interpreted with caution given the results of the three-level meta-analysis. The three-level meta-analysis found effect sizes generated by the meta-analysis to be highly dependent. Another methodological consideration that must be borne in mind is the inclusion of the same patients as separate samples in different types of analyses. Whilst we do not consider it at all likely that this approach is an intrinsic deficiency or a source of significant bias with regard to our findings, we must interpret these results with some caution given the high degrees of variability and inconsistency in the included studies’ original methodologies, which were what necessitated our analytical approach in the first place.

We anticipated that there would likely be some differences in the patterns of results from the EASE and the BSABS from the outset, given their conceptual and methodological differences. In particular, we expected results from BSABS studies to demonstrate less variance than those from EASE studies. Hence, we chose to analyse the BSABS and EASE separately, which has enabled this systematic review to empirically compare the two assessment tools. There are, of course, important caveats with regard to making this comparison. Studies utilising the BSABS have often not used the full scale, and some components of the BSABS (e.g. perceptual disorders) cannot be rated by the EASE and vice versa. However, there are still significant overlaps as to what these two scales measure, which focus on SDs albeit from different schools of thought. The BSABS was designed to facilitate the prediction of imminent risk of psychosis, hence its empirical grounding7. This is reflected in the results of our meta-analysis. For studies using the BSABS, we observed a medium to large effect size suggesting greater SD aggregation in SSDs, less variance compared to the EASE, and moderate heterogeneity. The BSABS was developed from the unpublished Heidelberg checklist using in-depth interviews to identify basic symptoms, which were then grouped by clinical reasoning7,40,41. The smaller range of items, refined incrementally with a good empirical basis is a likely explanation for these results.

In contrast, the EASE was developed with a focus on exploring the nature and experience of SDs as a core phenotype within the broad schizophrenia spectrum based on self-descriptions obtained from patients suffering from SSDs, thus it has a more theoretical grounding and is informed by the Husserlian approach to phenomenology7,19. Studies using the EASE showed a very large effect size suggestive of greater SD aggregation within SSDs, greater variance, and very high heterogeneity. The EASE was developed from a subgroup of BSABS items which were hybridised with philosophical concepts and qualitative explorations of the abnormal self-experiences of those with SSDs19,41. Therefore, it is logical that studies using the EASE, with items assessing a greater range of SDs but with less empirical grounding, would express a greater effect size with more variance. It is important to recognise these conceptual differences as they may explain some of the differences observed in the BSABS and EASE study results.

This study gives validation to the concept of SDs as a core clinical feature of the broad schizophrenia spectrum3. Therefore, this review hopes to encourage clinicians' interest in phenomenology, since there are vital and clinically relevant findings to be drawn from it. From the perspective of the ipseity disturbance model, the magnitude of effect observed within this analysis gives credit to the concept of SDs as a core clinical vulnerability phenotype of the schizophrenia spectrum, thus informing the construct validity of SSDs. From the perspective of the perceptual anomalies model, this study’s lack of focus on CHR groups prevents commenting on the BSABS’s use in predicting conversion to psychosis. However, the model is supported by the observed effect sizes for SD score and significant odds of SDs being present within SSDs. Regardless of the favoured model, it is apparent that SDs are a core phenomenon within SSDs. These findings could improve clinicians’ understanding of the lived experience of individuals on the schizophrenia spectrum, enabling them to improve patients’ quality of life. From a research perspective, exploration of SDs via the BSABS and the EASE provides one of the most promising avenues for advancing current understanding of psychosis development.

In addition, with these findings on the validity of SD assessment tools in identifying SDs within SSDs, this study hopes to encourage the adoption of assessment tools for SDs within clinical practice. However, we do not recommend the implementation of current assessment tools. Despite the high interrater reliability of the EASE and BSABS12,42, they are lengthy and resource intensive assessments7,43. This creates a difficult conundrum in clinical practice, where the volume of first-hand personal data gathered must be balanced with the limited time clinicians have to perform assessments. This review proposes the development of a shorter assessment than the current BSABS and EASE, aimed at capturing key SD manifestations. What classifies as a key SD phenomenon is beyond the scope of this review but should be investigated. However, this review would like to emphasise that tick-box checklists should not necessarily be pursued for clinical use when assessing SD in patients; their utility is perhaps better suited for the purpose of screening large, potentially healthy, populations for SD. Self-report assessments, such as the FCQ and IPASE, have been shown to be unreliable when used as an SD assessment tool despite their potential as screening measures for SD43,44,45,46. Both have poor agreement with interviewer assessments, frequently overestimating the presence of SDs. Although there are many valid and reliable tick-box checklists, in the context of SDs, it is a logical extrapolation that tick-box checklists would have the same unreliability in this area. Perhaps future research into SD assessment tools should consider a mixed-methods approach, not unlike the EASE. However, a greater emphasis should be placed on empirically based items, as with the BSABS.

Finally, this study recommends that future research into SDs adopt a more robust methodological framework with more consistent reporting. This recommendation is based on the generally high heterogeneity and inconsistent quality of included studies. At least one study made inappropriate use of statistics, for example using Fisher’s exact test to calculate odds ratios in small sample sizes. Often a lower quality was due to bias introduced by a lack of random selection, failure to blind participants and assessors, and samples unrepresentative of the target population. It is worth noting that it would be difficult to reduce this bias in some of these studies. Current SD assessments require intense interviewing, making it difficult to introduce blinding. This also makes it difficult to recruit participants with poor cognitive functioning or aggression, an underrepresented subpopulation in included studies. However, if future research can form a standardised protocol for the exploration of SD phenomena and transparently report methodology, then the reliability of both individual and pooled study results will be improved.

The pooled results of our meta-analysis provide more powerful evidence for the association of SDs and the schizophrenia spectrum than existing individual studies. The findings of this meta-analysis echo the findings of the recent meta-analysis by Raballo et al.26. Their meta-analysis demonstrated greater effect sizes than the present one; however, this can likely be explained by the greater number of included studies, which in turn increases their analysis’ power. The present meta-analysis included fewer studies as our eligibility criteria excluded studies involving adolescent and CHR groups. By contrast, the current analysis included longitudinal studies and performed a three-level meta-analysis with nested effect sizes. This three-level analysis showed a high level of dependency and so reduces the confidence with which conclusions can be drawn. However, by performing it, the robustness of this meta-analysis’ methodology and the reliability of its results have been increased.

Whilst not the first published meta-analysis within this field26,27, this meta-analysis is the first to opt to analyse the BSABS and EASE separately. Although this reduces the power of pooled effect sizes, it improves the precision of the results and allows for more accurate conclusions to be drawn. It also facilitates comparison of each assessment tool.

However, there are several limitations to this study. Firstly, the meta-analysis did not perform subgroup analysis on different comparison groups. HCs and OMIs as comparisons provide their own respective strengths and weaknesses. This study chose not to perform the subgroup analysis to maintain sample sizes and power. However, this likely accounted for a proportion of observed heterogeneity.

This meta-analysis originally intended to calculate the prevalence of SDs in SSDs. However, the lack of a cut-off score for the presence/absence of an SD prevented this from being done. Although this was compensated for by pooling odds ratios for the likelihood of SDs, it still diminishes the accuracy and generalisability of our results.

Finally, this meta-analysis chose not to explore populations with a CHR for SSDs. This was done given the sometimes fleeting and non-specific nature of SDs within CHR populations, which would not permit accurate measurement of SDs. However, the use of these assessment tools in the prediction of SSD conversion risk is a major area of interest for their clinical utility. Therefore, by not analysing this population, this study cannot make a complete assessment of the clinical utility of these SD assessment tools.

In summary, evidence from this meta-analysis suggests that SDs show a greater aggregation and likelihood of occurring within the broad schizophrenia spectrum, when compared to HCs and OMIs. This aids in validating SDs as a core clinical feature of SSDs, which carries implications for aetiological research into SSDs. Assessment tools for SDs have potential for clinical application; however, this might be unlikely in their current iterations.

Methods

Search strategy and data collection

This systematic review was conducted following the guidelines provided in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement checklist47. The electronic literature search was conducted by one researcher (S.B.) using the databases Medline, Embase, PsycINFO, PubMed and the Web of Science. To ensure all relevant literature was captured, grey literature was searched for on Google Scholar, OpenGrey, ProQuest, and PsycEXTRA. The references and citations of included studies were also explored to gather literature missed in the initial search. Where relevant, the researchers contacted the authors of identified studies with inaccessible, incomplete, or ongoing trials to gather extra data.

In line with other systematic reviews, a condition, context, and population (CoCoPop) process was developed for this systematic review and was as follows: the condition as adults with an SD; the context as any setting; and the population as adults with a clinically diagnosed SSD or first episode non-affective psychosis (NAP).

A detailed description of the methodology for data collection can be found in Supplementary Materials. This includes rationales for the search terms, the screening process, the review process, and the process for handling disputes. Supplementary Material 3 presents an example search strategy. The eligibility criteria for included studies are shown in Supplementary Table S1. To summarise, the following inclusion criteria were applied: participants with a diagnosis of SSD or NAP only, inclusion of an SD assessment tool, inclusion of an observer rated SD score, adult only participants (mean age > 18 years old), English language only, participants with a clinical diagnosis only, and inclusion of a comparison group. The following exclusion criteria were applied: publication pre-1967, inclusion of a self-reported SD score, inclusion of children or adolescents, non-English language, participants with research diagnoses only, no comparison group, single case studies, and qualitative studies.

Data extraction

The researcher (S.B.) responsible for data collection carried out data extraction in line with the PRISMA statement guidelines47. The research supervisor (C.H.) provided oversight to the process for quality assurance. The full texts of included articles were re-read, with key characteristics extracted and placed within a summary of findings table (Table 1). Key characteristics reported within the table included geographical setting, instrument for clinical diagnosis, SD assessment tool, sample type, sample size, SD assessment results (mean SD score), SD odds ratio, key findings, descriptive psychopathology, and demographic features of the sample. The summary of findings table included a rating for “quality of evidence” and “risk of bias”. To maintain transparency, a detailed summary of how each quality of evidence and risk of bias rating was determined can be found in Supplementary Table S2.

A detailed description of the methodology for quality of evidence and risk of bias assessments is given in Supplementary Materials. The quality of included studies was determined through assessment with the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) handbook48 (Supplementary Materials). Study quality was graded from “high” to “very low”. The risk of bias for included studies was determined using a risk-of-bias tool designed specifically for systematic reviews of prevalence studies (Supplementary Materials)49. Study bias was rated from “low” to “high”.

Data analysis (systematic review and meta-analysis)

A narrative synthesis in line with Cochrane guidelines was performed on the 15 studies that met the aims and eligibility criteria of this systematic review50,51. This involved an initial synthesis of data relating to the utility of SD assessment tools, followed by extraction of the relevant findings described above (Table 1). Findings relevant to the research aims were systematically discussed, and both the quality of and the potential bias in the included studies were critically appraised.

A meta-analysis was performed on all included studies with an SSD group and at least one other comparison group. Given the varied comparison groups across the included studies, we anticipated considerable heterogeneity. Meta-analysis was performed using the Comprehensive Meta-Analysis Professional statistical software package version 3.0.

For descriptive analysis, all results were reported via a random-effects model52. The random-effects model was favoured because of the anticipated heterogeneity in the methodologies of the included studies in this field of research. The random-effects model’s aim of facilitating inferences about population-level effects also aligned with this systematic review’s research aims52,53. Effect sizes were calculated for standardised differences in mean total SD scores (Hedges’ g with 95% confidence intervals) and for the odds ratios of having an SD.
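As an illustration of the effect-size step, the following minimal Python sketch computes Hedges’ g and an approximate 95% confidence interval from group means, standard deviations, and sample sizes. It is not the Comprehensive Meta-Analysis implementation; the function name `hedges_g`, the commonly used approximation for the small-sample correction factor, and the large-sample variance formula are assumptions made for illustration.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Return (g, se, ci_low, ci_high) for two independent groups.

    Illustrative helper (hypothetical name), e.g. group 1 = SSD,
    group 2 = comparison group, using mean total SD scores.
    """
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    # Cohen's d: standardised mean difference before correction
    d = (mean1 - mean2) / sd_pooled
    # Small-sample correction factor (common approximation, assumed here)
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample variance of d, scaled by j^2 to give the variance of g
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    se = math.sqrt(j**2 * var_d)
    # Normal-approximation 95% confidence interval
    return g, se, g - 1.96 * se, g + 1.96 * se
```

A call such as `hedges_g(10, 2, 20, 8, 2, 20)` uses invented numbers purely to show the interface; the values do not come from any included study.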

Using Comprehensive Meta-Analysis, heterogeneity was quantified and assessed with Cochran’s Q and I² statistics54. For I², heterogeneity was graded as follows: low heterogeneity (0–25%), moderate heterogeneity (25–50%), high heterogeneity (50–75%), and very high heterogeneity (75–100%)55,56. Sensitivity analyses were performed on some models with very high heterogeneity. Potential outliers were identified as studies with point estimates over 2 standard deviations and p < 0.05. Each potential outlier was removed individually, and the model recalculated to determine its individual impact on the effect size and heterogeneity. Finally, all outliers were removed, and the model recalculated.
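The heterogeneity statistics and the recalculation step of the sensitivity analysis can be sketched as follows. This is a hedged illustration rather than the software's own routine: the DerSimonian-Laird estimator for the between-study variance is an assumption, the function names (`random_effects`, `grade_i2`, `leave_one_out`) are hypothetical, and assigning boundary values such as exactly 25% to the lower band is also an assumption, since the stated bands share their end points.

```python
import math


def random_effects(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model (assumed)."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    if k > 1 and q > 0:
        c = sum(w) - sum(wi**2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)       # between-study variance
        i2 = max(0.0, (q - df) / q) * 100   # I² as a percentage
    else:
        tau2, i2 = 0.0, 0.0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"pooled": pooled, "se": se, "q": q, "df": df, "tau2": tau2, "i2": i2}


def grade_i2(i2):
    """Map I² (%) to the bands used in the text; boundary handling is assumed."""
    if i2 <= 25:
        return "low"
    if i2 <= 50:
        return "moderate"
    if i2 <= 75:
        return "high"
    return "very high"


def leave_one_out(effects, variances):
    """Re-pool after dropping each study in turn (one result per dropped study)."""
    results = []
    for i in range(len(effects)):
        kept_e = effects[:i] + effects[i + 1:]
        kept_v = variances[:i] + variances[i + 1:]
        results.append(random_effects(kept_e, kept_v))
    return results
```

Comparing each entry of `leave_one_out(...)` with the full-model result shows how much a single study shifts the pooled estimate and I². In the procedure described above, only the studies flagged as potential outliers would be dropped, so looping over every study is a simplification.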

Due to the significant differences between assessment with the EASE and BSABS, subgroup analysis was performed on the type of SD assessment tool used. Assessment with the EASE does not enable the generation of odds ratios because there is no quantitative cut-off score for the presence/absence of SDs. Therefore, odds ratios for the presence/absence of SDs were calculated only for studies using the BSABS. A further subgroup analysis was performed on EASE studies based on the type of scoring reported (dichotomous or continuous scores).

Some included studies used the same sample of participants as other included studies or were follow-ups of other included studies. For this reason, a three-level random-effects meta-analysis with nested (dependent) effect sizes was performed in R using the metaSEM package57. For the purpose of this analysis, relevant studies using the BSABS and EASE were pooled together. This was done given the highly dependent nature of the effect sizes. Odds ratios, such as those in the Parnas et al. studies5,35, were log-transformed to approximate a normal distribution, like Hedges’ g. In addition, only one set of effect sizes (usually SSD/HC) was selected for each comparison.
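The log transformation of odds ratios mentioned above can be illustrated with a short Python sketch. It is offered under stated assumptions only: the function name `log_odds_ratio` is hypothetical, the 2×2 cell counts are assumed to be available from each study, and the 0.5 continuity correction for zero cells is a common convention that the source does not specify. The three-level model itself was fitted in R with metaSEM and is not reproduced here.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Return (log_or, se) from a 2x2 table.

    a, b: SD present / absent in group 1 (e.g. SSD)
    c, d: SD present / absent in group 2 (e.g. comparison group)
    """
    # Continuity correction when any cell is zero (assumed convention)
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    # The log odds ratio is approximately normally distributed
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio (Woolf's formula)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se
```

Working on the log scale makes the sampling distribution roughly symmetric, so confidence limits can be formed as `log_or ± 1.96 * se` and exponentiated back to the odds-ratio scale if needed.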

Finally, to assess potential publication bias, a funnel plot was produced encompassing all studies included in the current meta-analysis (see Supplementary Materials)58.