An experimental study of a virtual reality counselling paradigm using embodied self-dialogue

When faced with a personal problem people typically give better advice to others than to themselves. A previous study showed how it is possible to enact internal dialogue in virtual reality (VR) through participants alternately occupying two different virtual bodies – one representing themselves and the other Sigmund Freud. They could maintain a self-conversation by explaining their problem to the virtual Freud and then from the embodied perspective of Freud see and hear the explanation by their virtual doppelganger, and then give some advice. Alternating between the two bodies they could maintain a self-dialogue, as if between two different people. Here we show that the process of alternating between their own and the Freud body is important for successful psychological outcomes. An experiment was carried out with 58 people, 29 in the body swapping Self-Conversation condition and 29 in a condition where they only spoke to a Scripted Freud character. The results showed that the Self-Conversation method results in a greater perception of change and help compared to the Scripted. We compare this method with the distancing paradigm where participants imagine resolving a problem from a first or third person perspective. We consider the method as a possible strategy for self-counselling.


Supplementary
2. When I talk in public I feel nervous, overwhelmed, I think that it will not go well, I start to play with my fingers and my hands sweat and I would like to control my emotions.

Work or study anxiety
1. When I find it hard to learn from my mistakes I feel ridiculous and disappointed with myself, I think I cannot change and I think about how bad I do without learning and that I am not able to remedy it and I would like to be consistent about the things I think.
2. When I consider looking for a job, I feel insecure I think I am not prepared to face a job that I do not know and I react by disconnecting from the subject and I would like to be more daring to be able to look for a job.

Family Issues
1. When my mother shows her concerns about my aspirations I feel censored and I think that such aspirations are not important or are fanciful, so I put myself in her place and I just act as she wants and I would like to show her that I am capable of fulfilling my dreams.
2. When my family is not well and there are screams at home I feel sad and discouraged I think about providing solutions and I react by trying to calm those screams and I would like to earn enough money to be able to improve the situation.

Selection of others
1. When I am in contact with a spider I feel disgust and panic, I think that it will chase me and react by running or screaming and I wish I could not worry because the spider is near me.
2. When I think I'm sick I feel nervous and afraid, I think I'm going to die and I react by looking for physical symptoms or going to the doctor and I would like to be able to react calmly.
3. When the subway (metro) is very crowded I feel insecure and anxious, I think someone can steal from or hurt me, I start to sweat and my heart races and I wish I could be normal in a subway full of people.
4. When I'm alone with a possible aggressor I feel helpless, scared and weak, I think I want to leave because I can be assaulted. I make it look like he is not there and I wish I could stop my fear.

Supplementary Text S2 -The scripted condition
In the scripted condition the virtual Freud interacted with the participant using the following statements. Each time the participant finished speaking and pressed the wand button, the virtual Freud would say the next sentence in the given order. After virtual Freud had finished the sentence, control would switch back to the participant who could then respond.

Supplementary Text S3 -Participant responses
The variables in Tables A-C describe various aspects of the responses of participants to the virtual environment. We consider these descriptively since there are no hypotheses to test and we are solely concerned with how this particular sample of participants experienced the simulation, rather than making inferences to a wider population. In order to check for possible differences between the Scripted and Self-Conversation groups we compute the effect size based on the Mann-Whitney rank sum test, which is the probability that an observation in the Scripted group is greater than one in the Self-Conversation group. (Note that this is not a significance test but an effect size, and is a descriptive estimate based on counts in the data.) These effect sizes are shown in Table D.
We require presence, body ownership and agency to be high overall for the experiment to make sense for this sample.
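The effect size described above can be illustrated with a minimal sketch that counts pairs directly. The function name `prob_greater`, the example scores, and the convention that ties count one half are assumptions made for illustration; the source does not state how ties were handled, and the scores below are not the study data.

```python
from typing import Sequence


def prob_greater(x: Sequence[float], y: Sequence[float]) -> float:
    """Estimate P(X > Y) by pair counting; ties count one half (an assumption)."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                wins += 1.0
            elif xi == yj:
                wins += 0.5
    return wins / (len(x) * len(y))


# Hypothetical questionnaire scores (illustrative only, not the study data).
scripted = [1, 2, 2, 3, 0]
self_conversation = [2, 2, 3, 1, 1]
effect = prob_greater(scripted, self_conversation)
print(f"P(Scripted > Self-Conversation) = {effect:.2f}")
```

A value near 0.5 indicates that neither group tends to score higher than the other, while values near 0 or 1 indicate that one group's scores dominate.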

Experience of the conversation
Several aspects of the experience of the VR session were explored with the questions shown in Table A. These questions were given during the AfterVR assessment point.
Table A item (fragment): The therapist made me nervous.
Figure A shows that there was very little incidence of simulator sickness or disturbance by outside sounds. Both the Scripted and Self-Conversation groups had a strong sense that the virtual Freud was talking with them, and they were comfortable overall. There appears to be some difference between the two groups with respect to nervousness, with the Self-Conversation group being more nervous, but in any case the median of the Self-Conversation group is 0.
Table item (fragment): My thoughts in relation to the conversation were the same as in a real situation.

Presence -Place Illusion and Plausibility
Figure B shows that almost all the scores are above the 0 mark, and that there is overall similarity between the Self-Conversation and Scripted groups. The effect sizes shown in Table D range between 0.39 and 0.47.

*Only applied to the Self-Conversation group.
Figure C shows that overall the levels of self-recognition, body ownership and agency were high. For body ownership and agency with respect to both their own and the Freud body the lower quartiles are all at least 1, and most of the medians 2, albeit with several outliers. The only caveat is that self-recognition may, overall, have been slightly lower for the Self-Conversation group. This cannot be due to any systematic reason because the same methodology was used throughout. In the Scripted group 20/29 gave a score of 1 on this variable and in the Self-Conversation group 20/29 gave a score of 0. However, ownmirror gives almost identical scores for both groups. Table F shows that the probabilities of Scripted being greater than Self-Conversation range between 0.51 and 0.53 for owndown, ownmirror and ownagency, and is 0.77 for selfrecognition. Note that as in (Table 3. The variables freuddown, freudmirror and freudagency were only applicable to the experimental group.

Table A shows the instruments used to assess general aspects of the participants' psychological condition, apart from the actual personal problem that they discussed in the experiment.

Risk. The latter set of items are of particular interest to detect people who are at risk of doing harm to themselves or to others, and that was a reason for inclusion in this study as well. CORE-SFB contains 18 items covering all domains and the total score was the variable used here, higher values indicating more psychological distress. Since CORE instruments ask clients to value their status in each item according to the last week, they are administered always at the beginning of a session. Therefore, in this study it was not included in the third assessment, at the end of the VR experience.

The factorial structure of available data clearly supports its use as separate variables as it has been done in this study, with higher values indicating higher levels of depression, anxiety or stress. Besides its utility to assess changes at the end of the study, it was also used at the first assessment point to exclude participants with high levels of distress.

Figure A shows the changes in outcomes over time, for the variables in Table A. This shows a small decline in CORE over time which is relatively steeper for the Scripted compared to the Self-Conversation group, and no particular changes with respect to ATQ. The changes and differences in CORE are small in absolute terms. Since CORE was initially greater in the Scripted compared to the Self-Conversation group, its decline in the Scripted group is also greater. There is an apparent small reduction in ATQ After1Week compared to PriorVR. Figure B depicts the scores on depression, anxiety and stress at the initial assessment and those of the week after the VR experience. There is a slight decrease in the means except for stress in the Scripted group. The Scripted group scores were initially higher than the Self-Conversation group, and this difference is maintained. (Table A).

Automatic Thoughts
From Table B of Supplementary Text S6, CORE4 is strongly positively related to CORE1 with posterior probability 1.000. Moreover, the bulk of the posterior distribution of the coefficient of CORE1 is less than 1, with posterior probability $P(\beta < 1) = 0.943$ for that coefficient $\beta$, indicating that, other things being equal, the overall intervention is associated with a decrease in the CORE score. This decrease is independent of the experimental condition. Further, the Self-Conversation condition may be associated with an increase in CORE4 compared to the Scripted condition. This has probability 0.901, but the effect is small, with the expected value of the coefficient being 0.11, i.e., CORE4 might be greater by the amount 0.11 in the Self-Conversation condition compared to the Scripted condition. This should be compared with the range of values of CORE4, which is 0.17 to 2.28, mean ± SD 0.90 ± 0.52.
ATQ4 is strongly positively related to ATQ1 (probability = 1.000). It is highly likely that the overall intervention is associated with a reduction of ATQ independent of the condition, since the 95% credible interval for the coefficient of ATQ1 is well below 1 (0.59 to 0.89). There is some evidence that the Self-Conversation condition is associated with a reduction in ATQ4. This has probability 1 - 0.099 = 0.901. The coefficient is -0.89 and the range of values of ATQ4 is 8 to 30, mean ± SD 14.1 ± 4.70. In other words, the expected difference between the two conditions is that the Self-Conversation condition is approximately one point less than the Scripted condition. Hence the effect is small.

44% of variance was explained by the first factor (Ydisc). 79% of variance was explained by the first two factors (Ydisc and Yhelp).
The factor loadings are shown in Table A, suggesting that factor 1 is dominated by importance and discomfort and factor 2 by help and significant. Varimax rotation was applied. The factor variables have correlations with the original variables as shown in Table B.
Notation (table fragment):
Ydisc: Variables produced from the first factor in a factor analysis over the variables in Table 2 (main paper).
Yhelp: Variables produced from the second factor in a factor analysis over the variables in Table 2 (main paper), correlating positively with help and significant.
changes: a binary variable (Table 2, main paper).

General psychological variables (Supplementary Text S4):
The dependent variables (except for changes) are typically considered as continuous variables in analysis (e.g., using ANOVA). Here we follow this convention, except that we conservatively use Student t for the distribution of these variables, since this can have much fatter tails than the Normal, allowing departure from the assumption of normality and also allowing for outliers. The degrees of freedom of the distributions are treated as parameters and their posterior distributions are obtained. Hence the distributions used adapt to the data. We use the notation $y \sim t(v, \mu, s)$ to indicate that y has a Student distribution with degrees of freedom v, median µ and scale parameter s. The same distributions are used for the factor analysis variables derived from Tables 2 and 3.
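For reference, the density of a Student distribution with degrees of freedom v, median µ and scale s can be written out as below. This assumes the standard location-scale parametrization (the one used by Stan's `student_t`); the source does not spell out the density, so this is a sketch of what the notation is taken to mean here.

```latex
p(y \mid v, \mu, s) =
  \frac{\Gamma\!\left(\frac{v+1}{2}\right)}
       {\Gamma\!\left(\frac{v}{2}\right)\sqrt{v\pi}\; s}
  \left(1 + \frac{1}{v}\left(\frac{y-\mu}{s}\right)^{2}\right)^{-\frac{v+1}{2}}
```

For small v the tails decay polynomially, which is what accommodates outliers; as v grows the density approaches a Normal with mean µ and standard deviation s.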
The variable changes in Table A is binary (0 = No, no change; 1 = Yes, change). This is treated using the Bayesian equivalent of logistic regression. This may be used when $y$ is a binary response variable with possible values 0 or 1, and $x_1, x_2, \ldots, x_k$ are predictor variables. The linear predictor is defined as $\eta = \sum_{i=1}^{k} \beta_i x_i$. The probability of observing a '1' (change) is given by

$$P(y = 1) = \frac{1}{1 + e^{-\eta}}$$

which is derived from the logistic distribution. The Bayesian method will produce posterior distributions of the parameters $\beta_i$. Note the log-odds ratio:

$$\log \frac{P(y = 1)}{1 - P(y = 1)} = \eta$$

Hence, the $\beta_i$ give the change in the log-odds of the response being 'Yes' compared to 'No', as a result of a unit increase in the corresponding $x_i$, other things being equal.
In order to denote this distribution we write:

$$y \sim \mathrm{bernoulli\_logit}(\eta)$$
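The logistic model described above can be illustrated with a minimal sketch: a linear predictor is passed through the logistic function to give the probability of a '1', and a unit increase in one predictor shifts the log-odds by the corresponding coefficient. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math
from typing import Sequence


def linear_predictor(beta: Sequence[float], x: Sequence[float]) -> float:
    """Sum of coefficient times predictor over all predictors."""
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    return sum(b * xi for b, xi in zip(beta, x))


def inv_logit(eta: float) -> float:
    """Logistic function, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def log_odds(p: float) -> float:
    """Log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients: a constant term and a condition indicator
# (0 = Scripted, 1 = Self-Conversation). Not values from the study.
beta = [-0.1, 1.5]
p_scripted = inv_logit(linear_predictor(beta, [1.0, 0.0]))
p_self_conv = inv_logit(linear_predictor(beta, [1.0, 1.0]))

# The difference in log-odds recovers the coefficient of the indicator.
log_odds_shift = log_odds(p_self_conv) - log_odds(p_scripted)
print(round(log_odds_shift, 6))
```

The last line shows why the coefficients are read as changes in log-odds: switching the indicator from 0 to 1 changes the log-odds by exactly its coefficient, whatever the value of the constant term.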
As explained above in the initial session a psychologist with clinical experience talked with the participants to elicit their problem, to select a problem which was feasible to work with in this format. The psychologist gave participants the opportunity to express feelings, thoughts, and desired outcomes about the problem. It is likely that this conversation in itself would have resulted in positive changes with respect to their problem. Therefore, the outcomes assessed at PriorVR are used as covariates for the final outcomes as response variables obtained at After1Week. The assessments at PriorVR were made after the discussion with the clinician, but before the VR experience. Hence here we are interested in whether the VR results in an improvement with respect to the problem over and above that which might have been caused by discussion with the clinician.
For CORE, ATQ and the DASS variables (Table A) we use the InitialMeeting scores as the covariates for the response variables at assessment point After1Week. We did not expect the virtual conversation to impact these variables, but we included them in order to assess whether there were wider effects beyond a response to the specific problem.
Our final response variables are those at After1Week, since even if there might be an improvement immediately after the VR (AfterVR assessment point) if this does not survive at least one week then it is not of interest.
We use the notation shown in Table A for the terms in the Bayesian model.
For all but the first equation, interest focuses on the $\beta_{*,2}$ parameters. Positive values of these indicate that Self-Conversation is positively associated with the corresponding response variable compared to Scripted. It will also be important to check that the $\beta_{*,1}$ parameters are positive, since they reflect the expected positive association between the values at assessment points 1 or 2 and those at assessment point 4. For the first likelihood equation the interest focuses on the parameter with second subscript 1.
Prior distributions: All of the β parameters have prior normal distributions with mean 0 and standard deviation 10. Hence, approximately 95% of the prior mass of each β lies within ±19.6, and approximately 99% within ±25.8. The cut-points have prior normal distributions with mean 0 and standard deviation 5, but constrained to be ordered.
All of the s and v parameters have Cauchy prior distributions with scale parameter 10, but restricted to the non-negative domain. Approximately 95% of the distribution is between 0.4 and 259, and 99% between about 0.08 and 1314 (equal tails). The Cauchy distribution is used here in preference to a flat prior, since it is a proper distribution, and although extremely large values of the scale parameters are unlikely they are possible with this distribution, without the use of improper priors.
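The ranges implied by these priors can be checked with a short sketch using closed-form quantiles. The function names are hypothetical. For a Cauchy distribution with location 0 restricted to non-negative values, the quantile function is assumed here to be scale × tan(πp/2). The closed form gives roughly 0.39 to 255 for the central 95% and 0.08 to 1273 for the central 99%, which is of the same order as the figures quoted above though not identical to them; the source does not say how its figures were computed.

```python
import math
from statistics import NormalDist
from typing import Tuple


def normal_central_halfwidth(sd: float, mass: float) -> float:
    """Half-width h such that Normal(0, sd) puts `mass` inside [-h, h]."""
    return NormalDist(0.0, sd).inv_cdf(0.5 + mass / 2.0)


def half_cauchy_quantile(scale: float, p: float) -> float:
    """Quantile of a Cauchy(0, scale) restricted to non-negative values."""
    return scale * math.tan(math.pi * p / 2.0)


def half_cauchy_equal_tail(scale: float, mass: float) -> Tuple[float, float]:
    """Equal-tailed interval containing `mass` of the half-Cauchy."""
    tail = (1.0 - mass) / 2.0
    return (half_cauchy_quantile(scale, tail),
            half_cauchy_quantile(scale, 1.0 - tail))


normal_95 = normal_central_halfwidth(10.0, 0.95)
normal_99 = normal_central_halfwidth(10.0, 0.99)
cauchy_95 = half_cauchy_equal_tail(10.0, 0.95)
cauchy_99 = half_cauchy_equal_tail(10.0, 0.99)
print(normal_95, normal_99, cauchy_95, cauchy_99)
```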
Missing values were handled using the MATLAB function knnimpute. This replaces the missing value by a weighted mean of the k nearest neighbours of the variable in the same class, using Euclidean distances between the columns. The k used was the number of columns available for that particular class of variables. For example, for importance the class is importance1,…,importance4, so that k would be 3. For help there is help2, help3 and help4, so that k would be 2.
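The idea behind this imputation can be sketched as follows. This is a simplified illustration, not a reimplementation of MATLAB's knnimpute: it assumes rows are participants and columns are the variables of one class, computes column distances only over rows observed in both columns, and weights neighbours by inverse distance. The function names and the example matrix are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def column_distance(data: Matrix, a: int, b: int) -> float:
    """Euclidean distance between columns a and b over rows observed in both."""
    squares = [
        (row[a] - row[b]) ** 2
        for row in data
        if not math.isnan(row[a]) and not math.isnan(row[b])
    ]
    return math.sqrt(sum(squares)) if squares else math.inf


def knn_impute_columns(data: Matrix, k: int) -> Matrix:
    """Fill NaNs with an inverse-distance weighted mean of the k nearest columns."""
    n_cols = len(data[0])
    filled = [row[:] for row in data]
    for r, row in enumerate(data):
        for c in range(n_cols):
            if not math.isnan(row[c]):
                continue
            # Neighbours: other columns that have a value for this participant.
            neighbours = sorted(
                (column_distance(data, c, j), j)
                for j in range(n_cols)
                if j != c and not math.isnan(row[j])
            )[:k]
            if not neighbours:
                continue  # nothing to borrow from, so the gap is left as is
            # Small constant avoids division by zero for identical columns.
            weights = [1.0 / (d + 1e-12) for d, _ in neighbours]
            values = [row[j] for _, j in neighbours]
            total = sum(weights)
            if total == 0.0:
                filled[r][c] = sum(values) / len(values)
            else:
                filled[r][c] = sum(w * v for w, v in zip(weights, values)) / total
    return filled


# Hypothetical scores for one class of four variables (not the study data).
scores = [
    [5.0, 4.0, 4.0, 3.0],
    [6.0, 5.0, math.nan, 4.0],
    [4.0, 4.0, 3.0, 3.0],
]
imputed = knn_impute_columns(scores, k=3)
print(round(imputed[1][2], 3))
```

With four columns in the class, k = 3 means that every other column contributes to the replacement value, with columns that behave more like the incomplete one receiving more weight.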
For the Bayesian analysis we used the Stan system 23 (http://mc-stan.org). The code representing the model specification above is given in Supplementary Text S6. In particular we used the R interface to Stan (https://mc-stan.org/users/interfaces/rstan). The factor analyses and descriptive graphs were produced using Stata 15 (https://www.stata.com). The Monte Carlo simulation was run with 4000 iterations, using 4 chains.
Convergence of the simulation was successful, with all Rhat values being 1. The graphs and Pearson correlations between the observed and fitted values are shown in Supplementary Text S7.
The raw data is available as Supplementary Data S1.
(Table 1). Pearson correlation r = 0.77.

Supplementary Text S8 -Analysis of the Interview Data
Participants were interviewed at the end of the entire experience (i.e., at the end of the After1Week session) and their responses were recorded, using a semi-structured interview following Elliott et al. 1. The recordings have initially been used to generate a frequency analysis of common responses to the experiences of the participants. The qualitative data analysis software NVIVO was used (https://www.qsrinternational.com/nvivo/home). Categories for the frequency analysis were defined depending on the repetitions found in the spontaneous responses of the participants during the interviews. In order to define a category, a word or idea (or its synonyms) had to be repeated by at least two different participants (between conditions or within one condition). Only repetitions between different participants were counted; a participant repeating the same word or idea several times was counted once. This resulted in tables of the categories and their frequencies by the S and SC group.
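The counting rule described above can be sketched in a few lines: each participant is counted at most once per category, and a category is kept only if at least two different participants mention it. The coding itself was done in NVIVO; the function name, the tuple layout and the example mentions below are hypothetical and serve only to illustrate the rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (participant id, group "S" or "SC", category label)
Mention = Tuple[str, str, str]


def category_frequencies(
    mentions: Iterable[Mention], min_participants: int = 2
) -> Dict[str, Dict[str, int]]:
    """Count distinct participants per category, split by group."""
    people_by_category: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for participant, group, category in mentions:
        # A set ignores repeated mentions by the same participant.
        people_by_category[category].add((participant, group))

    table: Dict[str, Dict[str, int]] = {}
    for category, people in people_by_category.items():
        if len(people) < min_participants:
            continue  # mentioned by too few participants to form a category
        counts = {"S": 0, "SC": 0}
        for _, group in people:
            counts[group] = counts.get(group, 0) + 1
        table[category] = counts
    return table


# Hypothetical coded mentions (not the interview data).
example = [
    ("p01", "SC", "new perspective"),
    ("p01", "SC", "new perspective"),  # repeat by the same participant
    ("p02", "SC", "new perspective"),
    ("p03", "S", "think more about it"),
    ("p04", "S", "think more about it"),
    ("p05", "S", "talking to myself"),  # only one participant
]
frequencies = category_frequencies(example)
print(frequencies)
```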
The changes questionnaire (Table 2) allowed a determination as to whether or not participants had experienced changes in the week following the VR exposure. All the questions in the interview were oriented to explaining what had changed during the week since the VR exposure. Of those in the SC group 25/29 (86%) reported a change after one week whereas 14/29 (48%) of the S group reported a change (Figure 4D). Examining only those who indicated a change, we found through the interviews that 88% of the participants in the SC group reported that this change was due to the VR session, whereas amongst those in the S condition 29% attributed their change to the VR session.
Participants in the S group were likely to report that the change they experienced was due to being exposed repeatedly to the problem during the three visits of the experimental procedure and therefore thinking a lot about it. No one in the SC group said this. This is shown in Figure A, where the greatest contribution to change amongst those in the S group was that the method required them to 'think more about it' (their problem). Figure A also shows that the pattern of responses to the interview is quite different between the two groups. Note that although some in the S group had answered positively to the changes questionnaire, they revealed during the interview that in fact they had experienced no changes. For the SC group the reasons for their changes focused on issues such as seeing themselves from the outside, as another person, with a new perspective, talking to themselves, and with their own answers and solutions. For those in the S group these reasons appeared much less frequently.
This categorization is the first step towards a deeper qualitative analysis of the data obtained in this study. Further work will attempt to determine relations between categories that have been identified here, and their relationship to the degree of the outcome.