Introduction

Schizophrenia is among the top medical disorders that produce disability worldwide1. Among the core symptoms of the illness, auditory verbal hallucinations (AVH), which are the experience of hearing the voice of a person or of an entity that is not present physically, are the most commonly reported form of hallucinations with a lifetime prevalence of 70% in this population2. AVH can have a devastating effect on a patient’s life due to high levels of distress3, feelings of depression4, impaired social functioning5, increased suicide risk6, and delayed recovery7. The felt distress is mainly due to the negative/derogative content of voices (i.e., threatening, frightening, hostile voices)8,9. Unfortunately, not all patients respond to antipsychotic medication. It has been estimated that 20–50% of patients will have treatment-resistant schizophrenia (TRS)10,11,12,13. Moreover, only 30–60% of these patients will respond to clozapine14,15,16. TRS is associated with some of the highest levels of impaired functioning17, rates of hospitalization18, and costs to society19. Thus, schizophrenia, especially TRS, is a complex, severe, and disabling psychiatric disorder that poses significant therapeutic challenges20.

With the recognition of the limitations, side effects and health risks associated with antipsychotic medication21,22, psychosocial interventions have become extensively endorsed in clinical practice guidelines as part of the treatment of those with psychotic experiences23,24. The most widely studied evidence-based and first-line psychological treatment recommended by guidelines for psychotic symptoms is cognitive–behavioral therapy for psychosis (CBTp)25,26,27, which comprises an umbrella of interventions. The main instrument of change in cognitive–behavioral approaches involves discussing the origins of hallucinations, reframing appraisals, and modifying behavior related to psychotic symptoms, increasing the use of better coping strategies (e.g., mindfulness), reducing distress, and improving well-being28,29,30. The effect sizes across meta-analyses have generally varied depending on the (1) specific population chosen (e.g., poor treatment responders31), (2) type of therapy (e.g., case formulation-based32), (3) intensity of therapy (e.g., low-intensity33), (4) assessed time-points (e.g., post-therapy34, follow-up improvements35), or (5) comparison groups (e.g., active control36). Globally, most studies have found CBTp to be at best moderately effective in ameliorating psychotic symptoms and improving domains of well-being; effects, however, appear weaker when compared with other psychotherapies and at follow-ups31,32,33,35,36,37. A further shortcoming of CBTp is that up to 50% of patients do not respond to this approach38.

Given these modest treatment effects, and the limited impact of CBTp on AVH, researchers have suggested that the development of interventions should be guided by research on processes specific to the experience of voice-hearing39. There is consequently an increasing tendency for CBTp to focus less on changing faulty thinking and to begin to employ supplementary therapeutic methods to highlight ways of relating to the self, emotion regulation, and interpersonal relationships40. As highlighted in a systematic review by Lincoln and Peters41, these approaches have been shown to yield better effects compared to generic CBTp. Among these individually tailored interventions is a new wave of relational approaches building on the perspective that AVH are experienced as coming from entities that have personal identities, and with whom the voice-hearer establishes a personal relationship42,43,44,45,46,47. Dialogical therapies (e.g., refs. 45,46,47,48,49,50) aim to ameliorate the voice-hearer’s relationship with their voices by encouraging assertive interactions with voices, by negotiating new ways of relating and by ameliorating self-views51,52,53,54. Different techniques (e.g., role-play with the therapist, empty-chair work) have been used to allow patients to engage with their voices. With advances in technology, AVATAR Therapy (AT) uses a visual depiction of the AVH that enables the therapist to role-play the voice to help the voice-hearer practice different responses to their experience in a more direct manner47,55,56. We have independently extended the therapy using immersive virtual reality (VR) with a head-mounted display to deliver the therapy (VR-assisted therapy (VRT))46. The exposure to an avatar of patients’ personified voice is likely to be a unique and robust device to reduce fear and distress associated with persecutory voices, which is to a certain degree similar to exposure-based therapies41,57,58.
Importantly, this novel intervention enables voice-hearers to converse with their voice with the aim of improving coping and diminishing felt distress by addressing power and control within these relationships as well as by modifying negative self-perceptions and ways of relating46,55,56.

The results of the two pilot trials comparing AT/VRT to treatment-as-usual46,47 as well as a larger RCT comparing AT to supportive counseling45 showed large effects of VR therapies on AVH in short-term follow-ups and up to a 24-week follow-up. The therapy has also shown improvements for overall symptoms of schizophrenia, depressive symptoms, voice malevolence as well as omnipotence, and quality of life. Improvements appeared to be larger than those of conventional treatments. However, to date, no randomized trial has compared this relational VR therapy to a recommended active treatment with a long-term 12-month follow-up. The aim of this pilot comparative trial was therefore to evaluate the efficacy of VRT relative to our CBT for AVH adapted for patients with TRS in the short-term and to examine whether effects are maintained over time. The trial additionally aimed to assess the acceptability and feasibility of both interventions and to estimate the magnitude (i.e., effect size) of the potential difference in efficacy between both interventions for future larger trials.

Results

Sample characteristics

During the study period, 138 referrals were provided by clinical teams and the community. Of these, 35 individuals refused to take part in the project, leaving 103 participants who were assessed, of whom 74 were eligible. Reasons for participant exclusion were: not having a primary diagnosis of schizophrenia or schizoaffective disorder (n = 1), not hearing distressing voices, reporting voices not speaking in the therapists’ primary language or denying voices (n = 15), having substance use problems (n = 4), not being stabilized with treatment (n = 6), receiving another psychological treatment at the time of the intervention (n = 1), and being under curatorship (n = 2). Eligible participants were randomized to either VRT (n = 37) or CBT for AVH (n = 37). For a flowchart of the study participants, please see Fig. 1. Overall, there was a greater proportion of men (76%), most were Caucasian (82%), the mean age was 42.5 years (SD = 12.7), ~80% were single and the mean duration of schooling was 12.2 years (SD = 3.6) (see Table 1 for more details). Most patients held a diagnosis of schizophrenia (77%) with a mean duration of illness of 16 years (SD = 10.4) and were treated with atypical antipsychotics (96%). Over half of participants were also prescribed clozapine but continued to experience persistent AVH, thereby meeting the criteria for ultra-resistance. At baseline, there were no significant differences between the two groups (VRT and CBT) in terms of psychiatric symptoms and sociodemographic data (p > 0.05).
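
The participant flow reported above can be checked with simple arithmetic. The following is a minimal sketch using only the counts given in the text; the variable names and exclusion labels are illustrative and do not come from the trial's own materials.

```python
# Counts as reported in the text; names and labels are illustrative only.
referrals = 138
refused = 35
assessed = referrals - refused  # participants who underwent assessment

exclusions = {
    "no primary schizophrenia/schizoaffective diagnosis": 1,
    "no distressing voices, other language, or denial of voices": 15,
    "substance use problems": 4,
    "not stabilized with treatment": 6,
    "receiving another psychological treatment": 1,
    "under curatorship": 2,
}
excluded = sum(exclusions.values())
eligible = assessed - excluded

# Reported allocation of eligible participants to the two arms.
randomized = {"VRT": 37, "CBT": 37}

print(assessed, excluded, eligible)  # 103 29 74
```

The listed exclusion reasons sum to the difference between the assessed and eligible counts, and the two arms together account for all eligible participants.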

Fig. 1: Trial profile of patients who received virtual reality (VR)-assisted therapy or cognitive–behavioral therapy.

There were 138 referrals provided by clinical teams and the community, of whom 74 were eligible and randomized to either one of the therapies.

Table 1 Baseline sociodemographic and clinical characteristics.

Short-term treatment efficacy

As observed in Table 2, several statistically significant within-group improvements were found for VRT and CBT from baseline to 3-month follow-up.

Table 2 Outcomes at baseline, post-treatment, and 3-month follow-up for short-term efficacy comparison.

Both treatment groups showed significant reductions on the prespecified primary outcome consisting of AVH symptoms assessed with the total PSYRATS-AH score (p < 0.001 for VRT and p = 0.001 for CBT). Significant reductions were found most prominently for the distress related to AVH and voice frequency subscales of the PSYRATS. Based on Cohen’s d, the effects of VRT on AVH were large (PSYRATS-AH-Total score d = 1.080; PSYRATS-AH-Distress d = 0.998; PSYRATS-AH-Frequency d = 0.701) and small to moderate for CBT (PSYRATS-AH-Total score d = 0.555; PSYRATS-AH-Distress d = 0.434; PSYRATS-AH-Frequency d = 0.339).
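
For readers less familiar with the metric, a within-group Cohen's d can be sketched as the mean change divided by a pooled standard deviation. The text does not state which variant of d was computed, so the formula below (pooling the baseline and follow-up standard deviations) is an assumption, and the scores are hypothetical values chosen only to make the example run.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d_within(pre, post):
    """Standardized mean change from pre to post.

    Assumption: the denominator pools the two sample SDs,
    sqrt((sd_pre^2 + sd_post^2) / 2). Other variants exist
    (e.g., dividing by the baseline SD only).
    """
    pooled_sd = sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    return (mean(pre) - mean(post)) / pooled_sd


# Hypothetical PSYRATS-AH total scores, not trial data.
baseline = [30, 32, 28, 34, 26]
follow_up = [24, 27, 22, 29, 23]

d = cohens_d_within(baseline, follow_up)
print(round(d, 3))
```

A positive value indicates a reduction in scores from baseline to follow-up, which is the direction of improvement for symptom scales such as the PSYRATS-AH.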

Concerning beliefs about voices measured with the BAVQ-R, VRT showed significant improvements from baseline to 3-month follow-up on persecutory beliefs (p = 0.039). Although not statistically significant, CBT showed a trend toward significance on the persecutory beliefs subscale (p = 0.076) and total beliefs about voices (p = 0.056). Both therapies showed moderate effects on persecutory beliefs about voices (d = 0.438 for VRT and d = 0.382 for CBT). Depressive symptoms as secondary outcomes measured with the BDI-II also diminished in both treatment groups with effects being of moderate magnitude (d = 0.577 for VRT and d = 0.498 for CBT). Overall general symptoms as measured with the PANSS diminished significantly with VRT (p = 0.008), but not with CBT. Most effects were observed on the excited/hostility subscale (p = 0.005) and anxio-depressive subscale (p < 0.001). The effect of VRT was of moderate range (d = 0.651) for overall symptomatology and was found to be larger for affective symptoms (d = 0.724 for excited/hostility symptoms and d = 0.786 for anxio-depressive symptoms). In addition, VRT significantly ameliorated quality of life (p = 0.001) with an effect of moderate magnitude (d = 0.637).

There was one statistically significant between-group Time × Treatment effect for the anxio-depressive subscale of the PANSS, indicating a superiority of VRT over CBT (p = 0.025) (see Table 2).

Long-term maintenance

As shown in Table 3, results for VRT were maintained in the long-term up to the 1-year follow-up with no statistically significant differences from 3-month follow-up for most outcomes. The only exception was the engagement subscale of the BAVQ-R for VRT, which diminished significantly (p = 0.002) from 3- to 12-month follow-up and returned to baseline. CBT showed no statistically significant differences in any of the outcomes.

Table 3 Outcomes at 3-, 6-, and 12-month follow-up for long-term maintenance comparison.

Acceptability and feasibility of interventions

Of the 74 participants, nine withdrew at some stage from VRT and three from CBT. Reasons for withdrawal included lack of motivation, not wanting to reduce their voices and moving away. In terms of adverse events, no patients were re-hospitalized at any point during the trial. Attrition rate at post-treatment was 16.2% (Fig. 1). Reasons for discontinuation at follow-ups varied and included the maximum follow-up period initially being set at 3 months for some patients, patients being unreachable after several attempts and patients not wishing to participate further in the project. Among the 15 patients who participated in semi-structured interviews on their perspectives concerning treatments, most found their corresponding intervention (VRT or CBT) to be adequate in content, sequence, dose, tailoring, timing, mode of delivery, and equipment use. One-third of participants did find the dose of interventions (VRT and CBT) to be too short and would have preferred supplementary sessions. Particularly related to VRT, 37.5% described the intervention as being stressful at first, which is precisely within the scope of VRT. Once they had overcome the initial anxiety of exposure, they enjoyed their experience and found it to be interesting. In addition, 42.9% of participants in CBT found the homework to be either uninteresting as sessions progressed, not enough to gain awareness or lacking visual aid.
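
The reported attrition rate is consistent with the withdrawal counts given above. The sketch below assumes that attrition was computed as total withdrawals divided by the number of randomized participants; the per-arm rates are derived here from the reported counts (37 participants per arm) and are not stated in the text.

```python
# Counts as reported in the text; the computation itself is an assumption.
randomized_per_arm = {"VRT": 37, "CBT": 37}
withdrawn = {"VRT": 9, "CBT": 3}

total_randomized = sum(randomized_per_arm.values())
attrition_pct = 100 * sum(withdrawn.values()) / total_randomized
print(f"Overall attrition: {attrition_pct:.1f}%")  # Overall attrition: 16.2%

# Per-arm rates, derived for illustration only.
for arm, n in randomized_per_arm.items():
    print(f"{arm}: {100 * withdrawn[arm] / n:.1f}%")
```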

Discussion

With the rise of VR in psychotherapy to enhance conventional approaches, this randomized comparative trial aimed to compare the efficacy of an innovative treatment using VR (VRT) to our adapted CBT for AVH in patients with TRS. Both nine-week therapies were found to be feasible to implement and acceptable to patients, with no adverse events being attributed to either intervention. Drop-out rates, while slightly larger for VRT, were in a similar range to those of other psychosocial interventions59. In addition, both interventions were found to be efficacious and yielded notable improvements in symptomatology for patients with persistent symptoms who have not responded to prior treatments. This is a breakthrough as approximately half of patients were considered ultra-resistant and prescribed clozapine.

Concerning key outcomes, our findings showed that both VRT and CBT reduced overall AVH in the short-term, including associated distress and frequency. Although our findings did not show a statistically significant superiority of VRT over CBT, VRT did achieve larger effects particularly on overall AVH (d = 1.080), voice distress (d = 0.998), and frequency (d = 0.701). These effects are in the same range as those observed in the prior trials on VR therapies for AVH45,46,47, which were of large magnitude as well. Notably, effects were maintained up to our 1-year follow-up. These findings, while not significant, suggest that VR therapies for voices may potentially achieve greater efficacy on AVH in comparison to the small-to-moderate effects that have been observed in literature on generic CBTp, which is corroborated by the moderate effect of our CBT for AVH (d = 0.555). Moreover, unlike AT and VRT, CBT is generally not meant to reduce the frequency of voices, but rather to change the beliefs patients have towards their voices60. In this trial, CBT only showed a trend towards significance on overall beliefs about voices and persecutory beliefs, with effects both of moderate range (d = 0.488 and d = 0.382). Effects may have reached significance in this trial if we had had a larger sample size. The effect on persecutory beliefs, which combined malevolence and omnipotence, was significant for VRT and similarly attained a moderate effect (d = 0.438). This was in accordance with our initial pilot trial that found significant results on both malevolence and omnipotence46. Interestingly, we observed a between-group Time × Treatment effect trend toward significance for engagement with voices. This suggests that patients engaged more with their voices following VRT.
Since VRT is an experiential therapy that allows patients to engage with a personified version of their distressing voice by emphasizing the alteration of their emotional experience, patients may more easily increase engagement. This may also extend outside of the therapy sessions. However, the effect then returned to baseline value at the 1-year follow-up, which suggests that booster sessions may be necessary to maintain effects over time.

Of interest, VRT specifically reduced clinician-rated overall symptoms of schizophrenia in comparison to CBT. Our results suggested a superiority of VRT over CBT on more affective symptoms (i.e., anxio-depressive symptoms) with effects reaching large magnitude. This finding is not surprising given VRT’s emphasis on enabling patients to experience strong emotions (e.g., anxiety, fear, and anger) during the dialog with their voices and to learn to regulate them. The therapy may therefore help reduce cognitive avoidance of fear-relevant information (i.e., the voice and its content) and reduce anxiety as a result of exposure61,62. A typical therapy experience for patients who engage with the approach generally involves some early anxiety followed by a reported sense of relief, achievement, power, and liberation56,63,64. It has been speculated that distressing AVH with negative content may directly impact mood, and low mood may in turn make a patient more vulnerable to further AVH65. Therefore, focusing on emotional regulation and reductions in distress may influence affective symptoms as observed in both our pilot project and this trial. Through our immersive VRT, affective symptoms may be tackled by enabling patients to learn to better manage their intense emotions and to improve their self-image. Beyond VRT sessions, patients appeared to continue to consolidate their learnings into their daily lives, which may likewise explain the significant improvement observed on subjective quality of life. This finding is important as patients with TRS often have poorer quality of life66, which is an indicator of their sense of well-being and satisfaction with their life circumstances67. Not only has the treatment of patients with schizophrenia been traditionally focused on symptoms, but many psychotherapies have not targeted quality of life. As observed in our trial, CBT has not clearly been shown to improve quality of life68.
In recent years, subjective quality of life has become a particularly crucial target that should be improved with treatment since enhanced quality of life may bring about recovery in patients69.

The trial has implications for the treatment of patients with TRS since it showed that both treatment modalities demonstrated significant improvements, which paralleled their corresponding therapeutic targets as well as delivery modality. Due to the heterogeneity in patients with schizophrenia, voice-hearers may comprise distinct subtypes that require different forms of treatment70. In this sense, CBT may be best for patients who are not ready to be immersed into the emotion-inducing experience of VRT and desire to learn more about their AVH. In addition, CBT is ideal for patients who hear voices with no communicative content as it may appear at the least pointless and potentially harmful to attempt to enter a dialog with an absent agent71. Nevertheless, this type of manual-based approach may not be adequate for patients with cognitive deficits, may not be sufficiently individually tailored and may not allow relevant factors of their hallucinatory experience (i.e., interpersonal aspects of their experience) to be targeted, as it instead emphasizes beliefs about voices. Hence, instead of trying to challenge beliefs about voices and learn to resist voices, VRT, in accordance with “Third-wave therapies,” primarily focuses on how patients relate with their voices by working on improving self-esteem, self-acceptance, and emotion regulation. Within this approach, the patient’s relationship with their voice is fundamentally viewed in the context of their current and previous significant relationships4,72. VRT may therefore address a range of therapeutic targets that are relevant to the voice-hearing experience and allows patients to live their experience in a secure therapeutic environment, thereby enabling learnings to be more readily transferred to the real world. Nevertheless, VRT may be too anxiogenic and confrontational at first for some patients, which may explain the slightly higher drop-out rate in comparison to CBT.
It also remains to be clarified whether VRT is superior or equivalent to traditional relational approaches (e.g., ref. 50) and whether it truly necessitates the use of immersive VR in comparison to using a computerized system as in AT45,47.

There are noteworthy limitations to this trial that should nonetheless be acknowledged. The most important limits include the evaluators being non-blinded to treatment allocation during the clinical assessments, small sample size, and single therapist per therapy. First, while most outcomes measured used “self-reported” type assessment, the PANSS used clinical judgment. Thus, the evaluation of clinical changes may have been under- or over-estimated. However, this is less likely to be the case since evaluators were trained on a series of videos provided by an external provider to ensure interrater reliability. Second, we observed several trends toward significance (p < 0.1), which may have become significant had a larger sample size been obtained. Nevertheless, our results were in a similar range to those of a fully powered, well-conducted randomized trial on AT45. Third, the therapy has been offered only by a skilled therapist with substantial expertise in the psychological treatment of schizophrenia. In contrast to CBT for AVH, it is unknown whether the efficacy of VRT would hold across therapists, as this type of therapy poses delivery challenges (e.g., shifting in real time between communicating as therapist and avatar), ethical considerations (e.g., the therapist must recreate critical and hostile interactions), and training dilemmas (e.g., the therapist should be experienced with this specific population). A further single-blind randomized controlled trial comparing VRT to CBTp, which will be sufficiently powered, is underway to address these limitations (ClinicalTrials.gov Identifier: NCT04054778).

In summary, our pilot comparative trial is the first to compare VRT to evidence-based CBT for the treatment of refractory voices in patients with schizophrenia. We showed that both low-intensity treatments are beneficial interventions with effects lasting over time. Nevertheless, VRT yielded larger effects on AVH and showed additional effects on affective symptoms and quality of life. Keeping in mind that there is no sole effective intervention that is likely to benefit all patients, VRT highlights the future of patient-tailored approaches that integrate several processes (i.e., self-experience, emotion regulation) relevant to potentially improving the effectiveness of generic CBT for voices. Since schizophrenia, especially TRS, is an extremely complex disorder associated with significant impairments in social and occupational functioning, VRT may have implications for patients’ health and quality of life that are potentially immense. Although this study was not conducted with the aim of further understanding the therapeutic elements of the interventions, future research should aim to understand the components of psychotherapies that lead to efficacy rather than simply conducting efficacy trials. There is indeed emerging work into understanding the therapeutic components that lead to AT’s and VRT’s large efficacy on AVH and other facets of the illness63,64. More research is necessary to establish which components of AT/VRT make it efficacious and to determine which patients may respond better to the intervention. Studies are currently underway to better understand the differences between treatment “responders” versus “non-responders”.

Methods

Participants

A total of 138 patients were referred to us from the Institut Universitaire en Santé Mentale de Montréal and the community. Briefly, patients were eligible if they were 18 years of age or older, had a diagnosis of either schizophrenia or schizoaffective disorder with persistent AVH and had failed to respond to two or more antipsychotic trials. Participants were excluded if they presented a neurological disorder, an unstable and serious physical illness, or a substance use disorder in the past year, or if they had received CBTp in the past year. The trial was conducted in accordance with the Declaration of Helsinki and was approved by the institutional ethical committee (CER IPPM 16-17-06). We obtained written informed consent from all patients.

Design

This is a pilot randomized parallel comparative trial comparing two interventions, each consisting of nine weekly sessions of 1 h: VRT and CBT for AVH. All patients continued to receive standard psychiatric care (treatment as usual) and agreed to refrain from changing existing medication over the duration of the therapy sessions. Patients fulfilling inclusion criteria were randomly assigned (based on a 1:1 ratio) to either VRT or CBT for AVH. Randomization was determined following the completion of baseline assessments by an external research coordinator. Therapy discontinuation from either group was defined as nonattendance at consecutive sessions or discontinuation decided by patients or recommended by the treating therapist (e.g., participant using substances that interfered with the psychotherapy). This clinical trial has been registered on ClinicalTrials.gov (identifier number: NCT03585127).
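
As an illustration of 1:1 allocation, the sketch below shuffles a balanced list of arm labels across participant identifiers. The text does not describe the randomization procedure actually used (e.g., simple, blocked, or stratified), so this scheme, the function name `allocate_one_to_one`, the participant identifiers, and the seed are all assumptions made for the example.

```python
import random


def allocate_one_to_one(participant_ids, seed=None):
    """Assign participants to two arms in equal numbers.

    Hypothetical helper: builds a balanced list of arm labels and
    shuffles it, which guarantees an exact 1:1 split for an even
    number of participants.
    """
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("An exact 1:1 split needs an even number of participants.")
    half = len(ids) // 2
    arms = ["VRT"] * half + ["CBT"] * half
    random.Random(seed).shuffle(arms)  # seeded for reproducibility
    return dict(zip(ids, arms))


# 74 eligible participants, as reported; IDs and seed are illustrative.
allocation = allocate_one_to_one(range(1, 75), seed=2024)
counts = {arm: list(allocation.values()).count(arm) for arm in ("VRT", "CBT")}
print(counts)
```

In practice, the allocation list would be generated and held by someone independent of recruitment and assessment, in line with the external research coordinator described above.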

Virtual reality (VR)-assisted therapy

Patients generally underwent nine weekly sessions consisting of one avatar creation session and eight therapeutic sessions where patients were immersed into the VR setting. Of all patients, most (67%) received nine sessions, whereas the rest received seven sessions. The change in the number of sessions from seven in the pilot trial46 to nine was based upon consensus between the treating psychiatrist (AD) and the research team that patients needed additional consolidation sessions to achieve better treatment effects. The therapy was delivered by an experienced clinician (AD) who has around 7 years of experience as a psychiatrist. In his clinical practice, he has evaluated and treated over one thousand patients with major psychiatric disorders including schizophrenia73,74,75,76,77,78,79,80,81,82. The therapy was manualized and assessment of the external validity of the delivery of the intervention was performed by a doctoral student (LD). To do so, a random selection of patient sessions was rated based on a scale developed by the team to assess adherence to the manualized approach.

In the first session, patients underwent a comprehensive assessment of AVH and were requested to create and personalize the face and voice of an avatar best resembling the person or entity believed to be the source of their most distressing voice. This was done with the aid of a doctoral student (LD) and the treating psychiatrist (AD). Patients who heard several voices were invited to select the most distressing voice or the most dominant one for the creation of the avatar. Patients were immersed in VR through the Samsung Gear VR head-mounted display or, more recently using more advanced technologies, through an Oculus Rift head-mounted display. The platform that was used allowed highly realistic synthetic characters to be created quickly and simply. Unity 3D game engine with custom-made assets and Morph3D Character System were used to create idiosyncratic avatars. The voice of the avatar was simulated in real time with a voice transformer (Roland AIRA VT-3). Lip synchronization was performed via SALSA with RandomEyes Unity 3D extension. Patients sat in an adjacent separate room from the therapist, who would converse with patients either through the voice of the avatar or as themselves. The immersive virtual environment consisted of an avatar seen from a first-person perspective standing in a dark room. An inventory of facial expressions was integrated into the platform to use at the therapist’s discretion to enable the avatar to express emotions that patients would easily recognize such as joy, sadness, anger, and fear based on the Facial Action Coding System83.

The immersive therapeutic sessions consisted of (1) pre-immersion where the therapist would discuss the preceding week and determine the objective of the therapy session with the patients; (2) immersion where the patient would be immersed in the VR environment and be encouraged to enter into a dialog with their avatar animated in real time by the therapist; and (3) post-immersion where the therapist debriefed the patient and evaluated their feelings about their immersive experience. Sessions 2–4 aimed to confront patients with their hallucinatory experience. The therapist induced a dialog between patients and their avatar with the help of sentences they provided, which were generally abusive, critical, and hostile remarks. Patients were encouraged to enter into a dialog with the avatar to enhance emotional regulation and assertiveness. Session 5 targeted self-esteem, which was supported by enabling the patients to express themselves and to consider their personal qualities. To facilitate this process, a list of the patient’s qualities, provided by people in their personal surroundings, was introduced in the dialog of the avatar. The interaction of the avatar with the patient became less abusive and more supportive as sessions of VRT progressed. The patient generally became more empowered in the interaction they held with their avatar as they developed more assertiveness. In the final consolidation sessions, patients had the opportunity to apply what they had previously learned in the experiential sessions and to follow up on their initial objectives.

Cognitive–behavioral therapy for auditory verbal hallucinations

The active control condition consisted of nine individual and weekly sessions of 1 h. These sessions were administered in an individual format by a licensed psychologist trained in CBT by Dr. O’Connor, who had trained 35 psychologists throughout his career84,85,86,87,88,89,90,91. The CBT program was derived and adapted from current evidence-based treatments for AVH89. The therapy was manualized, and a doctoral student (LD) performed the assessment of the external validity of the delivery of the intervention, based on a treatment fidelity grid developed by the research team, on a randomly selected sample. Dr. O’Connor likewise ensured the fidelity to the manual by conducting weekly meetings with the treating psychologist.

The intervention involved a succession of learning modules and suggested task assignments. The first contact with the patient consisted of a history of their voices for goal setting and an introduction to the therapy. Sessions 2 and 3 focused on assessing and learning about hallucinations. With the cognitive model of hallucinations (session 3), the voices were understood as triggers rather than beliefs. Patients completed voice journals (assignments), which allowed them to understand and reflect on their positive symptoms and associated triggers. The following sessions focused on metacognition. In the 4th session, patients learned about diverse attributional mechanisms, and the session included another voice journal to detect the beliefs that were the cause of their ill-being. In the 5th and 6th sessions, patients were aided to interpret situations in a better manner with the use of vignettes. In sessions 7 and 8, patients practiced mindfulness exercises, were encouraged to ask for feedback and learned to observe. Session 8, including a last voice journal, allowed patients to put forward alternative explanations to their most common beliefs about their hallucinations. Session 9 led to the end of the intervention and aimed to prevent relapse.

Clinical assessments

Clinical assessments were administered before and after each intervention and at follow-up periods (3, 6, and 12 months) by trained psychiatric nurses. The evaluators had several meetings throughout the trial to ensure that all evaluations were conducted in a thorough and consistent manner.

The predetermined primary outcome consisted of the overall severity of AVH up to our 3-month follow-up, which was re-evaluated at 6 and 12 months to ensure maintenance of effects. AVH were evaluated with the total auditory hallucination subscale score of the Psychotic Symptoms Rating Scale (PSYRATS-AH)92 that comprises 11 items evaluated by interview (0–44). Since AVH are multidimensional, we further chose to examine the subscales of the PSYRATS-AH consisting of distress, frequency, attribution, and loudness. The psychometric properties of the PSYRATS-AH have shown excellent interrater reliability and good validity92.

Secondary outcomes included beliefs about voices, overall psychiatric symptoms, and quality of life. Patients’ beliefs about their voices as well as the manner in which they cope with them were measured with the Beliefs About Voices Questionnaire-Revised (BAVQ-R)93. The Cronbach’s α for the total scale has been found to be high (mean α = 0.86)93. Factor analysis has shown that the BAVQ-R supports four subscales94: two subscales relating to beliefs (persecutory beliefs combining omnipotence as well as malevolence components, and benevolence) in addition to two further subscales that measure responses to the voices (resistance and engagement). Depressive symptoms were assessed with the Beck Depression Inventory-II (BDI-II)95, which consists of a 21-item self-report inventory. The BDI-II was separated into cognitive and somatic-affective components. This instrument has shown high levels of internal consistency (α = 0.90) and test-retest reliability ranging from 0.73 to 0.9696. Symptoms of schizophrenia were evaluated with the Positive and Negative Syndrome Scale (PANSS)97. Evaluators were trained to administer the latter clinical scale by using a series of gold-standard videotapes and by conducting consensus ratings ensuring interrater reliability. This scale has shown good interrater reliability, appropriate test-retest reliability, and high internal reliability97,98. This scale was separated into five symptom clusters99: positive symptoms (including hallucinations, delusions, and disorganized thoughts, speech, and behavior), negative symptoms (including lack of motivation and social withdrawal), cognitive symptoms (including memory, language, and attention), hostility and excitement symptoms (including impulse control and violence), and anxio-depressive symptoms (including anxiety and depressive symptoms). Life satisfaction was evaluated with the Quality of Life Enjoyment and Satisfaction Questionnaire-Short Form (Q-LES-Q-SF)100,101, which consists of a self-report scale of 14 items. This scale has shown high internal consistency and test-retest reliability102.

In addition to drop-out rate and attrition, the perspectives of a sub-sample of patients from each treatment arm were examined to assess the acceptance and feasibility of both interventions. Semi-structured interviews were therefore held with patients based on a set of questions from Feeley and Cossette103. Questions were aimed at gaining information regarding patients’ views on several factors of the therapies including content, sequence, dose, setting, mode of delivery, and equipment/material used. Interviews were held until data saturation was achieved (eight VRT and seven CBT). These interviews were recorded and then transcribed.

Analyses

Statistical analyses were performed with SPSS Statistics for Windows (Version 25, IBM). Descriptive statistics were computed on baseline data. Potential group differences in clinical variables (psychosocial, sociodemographic, and pharmacological) at baseline were tested with chi-square tests for dichotomous data and independent t-tests for continuous data. Changes in reported outcomes for short-term efficacy, before and after treatment and at 3-month follow-up, were assessed using a linear mixed-effects model with maximum-likelihood estimation to handle missing data. The same methodology was used to test for maintenance of effects over long-term follow-up from 3 to 12 months. Both between-group and within-group comparisons were examined. The Time × Treatment group interaction indicated whether change over time differed significantly between VRT and CBT. The statistical threshold for significance was set at p < 0.05. Effect sizes were categorized as small (0.2), medium (0.5), and large (>0.8)104.
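The effect size categorization can be made concrete with a short sketch. The text does not state which effect size statistic was used, so the example below assumes Cohen's d with a pooled standard deviation. It also assumes that the benchmarks act as lower bounds of each category and that values below 0.2 are labeled "negligible". Neither convention comes from the article, and the analyses themselves were run in SPSS rather than Python. The data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


def effect_size_label(d: float) -> str:
    """Map an effect size onto the benchmarks 0.2, 0.5, and 0.8 (assumed lower bounds)."""
    magnitude = abs(d)  # the sign only encodes the direction of the difference
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"  # assumed label; the text defines no category below 0.2


# Hypothetical change scores for two groups (not study data).
d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
label = effect_size_label(d)
```

Taking the absolute value before labeling keeps the category independent of which group is entered first.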

As for acceptability and feasibility of the therapies, patients’ verbatim statements from the transcripts were classified into main themes comprising content, sequence, dose, tailoring, timing, mode of delivery, and equipment/material used. The statements within each theme were then categorized as satisfactory for patients, unsatisfactory, or missing. Reasons for reduced satisfaction were considered. Frequencies of these categorizations were then calculated for each theme.
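The frequency step of this qualitative analysis could be sketched as a simple tally of coded excerpts per theme and category. The theme and category labels follow the text. The function name, the data layout (one theme and category pair per coded excerpt), and the example codes are hypothetical and do not reflect the study's transcripts.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

THEMES = (
    "content", "sequence", "dose", "tailoring",
    "timing", "mode of delivery", "equipment/material",
)
CATEGORIES = ("satisfactory", "unsatisfactory", "missing")


def tally_categorizations(
    coded_excerpts: Iterable[Tuple[str, str]],
) -> Dict[str, Counter]:
    """Count how often each category was assigned within each theme."""
    counts = {theme: Counter() for theme in THEMES}
    for theme, category in coded_excerpts:
        if theme not in counts:
            raise ValueError(f"unknown theme: {theme}")
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        counts[theme][category] += 1
    return counts


# Hypothetical coded excerpts (not study data).
example_codes = [
    ("content", "satisfactory"),
    ("content", "satisfactory"),
    ("content", "unsatisfactory"),
    ("dose", "unsatisfactory"),
    ("timing", "missing"),
]
frequencies = tally_categorizations(example_codes)
```

Initializing every theme up front means themes with no coded excerpts still appear in the result with zero counts.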

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.