An Embodied Perspective as a Victim of Sexual Harassment in Virtual Reality Reduces Action Conformity in a Later Milgram Obedience Scenario

Group pressure can often result in people carrying out harmful actions towards others that they would not normally carry out by themselves. However, few studies have manipulated factors that might overcome this. Here male participants (n = 60) were in a virtual reality (VR) scenario of sexual harassment (SH) of a lone woman by a group of males in a bar. Participants were either only embodied as one of the males (Group, n = 20), or also as the woman (Woman, n = 20). A control group (n = 20) only experienced the empty bar, not the SH. One week later they were the Teacher in a VR version of Milgram’s Obedience experiment where they were encouraged to give shocks to a female Learner by a group of 3 virtual males. Those who had been in the Woman condition gave about half the number of shocks of those in the Group condition, with the controls between these two. We explain the results through embodiment promoting identification with the woman or the group, and delegitimization of the group for those in the Woman condition. The experiment raised important ethical issues, showing that a VR study with positive ethical intentions can sometimes produce unexpected and non-beneficent results.

Instructions given by the virtual experimenters prior to the commencement of the shocks
Experimenter 1: Welcome and thank you for being part of this study about human memory and learning processes. The woman you see in front of you has memorized a series of pairs of associated words. Your task is to check if she remembers the correct associations, and give the appropriate comment according to her performance.
Experimenter 2: On the wall in front of you, some projected words will appear that you will have to read aloud. The correct answer will light up in green and the woman will have to give you the answer verbally.
Experimenter 1: In this experiment we want to study the effects of positive and negative evaluation in learning processes. In this case, the negative valuation corresponds to a penalty for each error. So, every time she gives you the correct answer you will have to say "Correct" and then move on to the next series of words with the side button of the controller [in your hand]. Instead, if she gives you the wrong answer, you will have to say "Wrong" and administer an electric shock using the machine in front of you.
Experimenter 2: It is very important that each time she makes a mistake or does not answer, first increase the voltage of the machine with the central button of the control, so that you administer the appropriate discharge by means of the trigger. It is imperative that you follow this instructional protocol so that the woman can receive an adequate assessment of her performance. Please be sure to maintain a sustained pace throughout the process.
The following were recorded phrases for the virtual experimenters to say to the participant when necessary, triggered by the real experimenter:
Experimenter 1:
"What you're doing is to help better understand how memory works, please continue."
"You have accepted the conditions, now you must complete your task."
"Please concentrate on the task you have to do."
"You have to tell her that her answer is incorrect, raise the voltage with the central button and give her the corresponding discharge with the trigger."
Experimenter 2:
"I thought we understood each other. Follow the protocol, please."
"Do not worry, and think that what you are doing is something positive for science, please follow."
"Please, do not get distracted."
"Remember that now you have to press the side button to move on to the next question."

Solène Neyret, Xavi Navarro, Alejandro Beacco, Ramon Oliva, Pierre Bourdin, Jose Valenzuela, Itxaso Barberia, Mel Slater

Weibull Distribution fits to the Number of Shocks
The Distribution Fitter of MATLAB R2017a was used to check the assumption that a Weibull distribution would be appropriate for the nshocks variable. The number of shocks variable (nshocks) was partitioned into two parts, according to whether the number of shocks was less than 12 (n = 29) or greater than 12 (n = 31). A Weibull distribution was fitted to each part. Figure A shows the result for the low shocks group and Figure B for the high shocks group. Both seem reasonable fits, and were the best fits amongst the distributions explored.
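The probability density function of the Weibull distribution is:
$$f(x \mid \alpha, \sigma) = \frac{\alpha}{\sigma}\left(\frac{x}{\sigma}\right)^{\alpha - 1} \exp\left(-\left(\frac{x}{\sigma}\right)^{\alpha}\right), \quad x \ge 0$$
The expected value of the distribution is
$$\mu = \sigma\,\Gamma\left(1 + \frac{1}{\alpha}\right)$$
The model used was a mixture of two Weibull distributions with means $\mu_1$ and $\mu_2$ with $\mu_1 < \mu_2$.
In support of this, we used the Stan 'ordered' specification for the prior distributions of $\sigma_1$ corresponding to $\mu_1$ and $\sigma_2$ corresponding to $\mu_2$, such that
$$\sigma_1 < \sigma_2$$
The method then allows the posterior $\alpha_j$ ($j = 1, 2$) to adapt to the data.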
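As a rough illustration of the mixture described above, the following sketch evaluates a two-component Weibull density and the component means. It assumes the standard shape/scale form of the Weibull (shape `alpha`, scale `sigma`); the mixing weight `theta` and all function names are hypothetical and are not taken from the fitted Stan model.

```python
import math


def weibull_pdf(x, alpha, sigma):
    """Weibull density with shape alpha and scale sigma (assumed parameterisation)."""
    if x <= 0:
        # The density is evaluated for x > 0; the boundary is treated as 0 for simplicity.
        return 0.0
    return (alpha / sigma) * (x / sigma) ** (alpha - 1) * math.exp(-((x / sigma) ** alpha))


def weibull_mean(alpha, sigma):
    """Expected value sigma * Gamma(1 + 1/alpha)."""
    return sigma * math.gamma(1.0 + 1.0 / alpha)


def mixture_pdf(x, theta, alpha1, sigma1, alpha2, sigma2):
    """Two-component mixture; theta is a hypothetical weight on the first component."""
    return (theta * weibull_pdf(x, alpha1, sigma1)
            + (1.0 - theta) * weibull_pdf(x, alpha2, sigma2))
```

With equal shapes, ordering the scales also orders the means, since the mean is the scale multiplied by a factor that depends only on the shape. When the shapes differ, the ordering of the means has to be checked from the posterior.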

Goodness of Fit of the Weibull Model
Using the built-in Stan pseudo-random number generators, 16000 observations were generated from the model (Eqns 1-3). Hence for each individual i = 1,…,60, a posterior predicted distribution was generated for nshocks, NN50 and HR. The means of the distributions were used as point estimates and their correlations with the observed values were computed, as shown in Table A. We treat these correlations as effect sizes following Cohen 1, where r = 0.1 is considered a 'small' effect, r = 0.3 'medium' and r = 0.5 'large'. All the correlations are large. From the simulated results we have the predicted distributions for each $nshocks_i$, $i = 1, \ldots, n$. Amalgamating these we can observe the predicted probability distribution of the number of shocks over all participants, and compare it with the histogram of the observed values.
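The check described above can be sketched as follows: take the mean of each individual's posterior predictive draws, correlate those means with the observed values, and label the correlation using Cohen's thresholds. This is a minimal sketch with hypothetical function names; the 'negligible' label for |r| < 0.1 is an assumption added for completeness.

```python
import math


def predictive_means(draws_per_individual):
    """Mean of the posterior predictive draws for each individual."""
    return [sum(draws) / len(draws) for draws in draws_per_individual]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cohen_label(r):
    """Cohen's conventional labels for a correlation treated as an effect size."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label, not from the source
```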

Alternative models
A number of alternative models were considered for the distribution of nshocks.

Normal distribution
Typically researchers assume and employ a Normal distribution (in this case it would be a mixture of Normal distributions). However, it proved impossible to fit Normal distributions to these data. The Stan program used to fit the Bayesian model did not converge even with 100,000 iterations. Figure 4A (main manuscript) shows that the Normal distribution is not a good candidate, especially for the higher number of shocks.

Negative Binomial distribution
The negative binomial distribution occurs as the number of Bernoulli trials needed to obtain a given number k of 'successes'. This model would imply that the (likely non-conscious) decision making strategy of participants would be to wait until a certain number of negative events had occurred, and then withdraw. It is a type of waiting time distribution.
The mixture distribution of two negative binomial distributions was fitted in Stan with 4000 iterations, and no divergences. It gives similar qualitative results to the Weibull with respect to the estimation of the means of the numbers of shocks.
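The waiting-time interpretation of the negative binomial can be illustrated with a short sketch that gives the probability that the k-th 'success' occurs on trial n. Reading a 'success' as one negative event and n as the number of shocks is a hypothetical mapping for illustration only, and the function names are not from the fitted model.

```python
import math


def trials_until_k_successes_pmf(n, k, p):
    """P(N = n): probability that the k-th success occurs exactly on trial n."""
    if n < k:
        return 0.0
    # The last trial is a success; the other k - 1 successes fall in the first n - 1 trials.
    return math.comb(n - 1, k - 1) * p ** k * (1 - p) ** (n - k)


def expected_trials(k, p):
    """Expected number of trials needed to obtain k successes."""
    return k / p
```

For example, `trials_until_k_successes_pmf(n, 3, 0.5)` summed over n weighted by n approaches `expected_trials(3, 0.5)`.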

Gumbel distribution
The Gumbel distribution models the maxima of a sample of Normally distributed data. This model would imply that participants non-consciously continue until some maximum of a latent variable (probably related to stress) occurs, and then they withdraw. The Gumbel distribution was fitted with 32000 iterations (though with some divergences). It gives similar qualitative results to the Weibull with respect to the estimation of the means of the numbers of shocks.

Fréchet distribution
Whereas the Weibull distribution is typically used to model the minimum of a sample, the Fréchet distribution is its inverse, and a distribution to model maxima. This was fitted in Stan with 4000 iterations, and no divergences. It gives similar qualitative results to the Weibull with respect to the estimation of the means of the numbers of shocks. Table B shows that no matter which distribution is used to model nshocks, the results are the same at the basic level.
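One reading of the 'inverse' relationship is that the reciprocal of a Weibull variable follows a Fréchet distribution with the same shape and reciprocal scale. The sketch below checks this numerically through the cumulative distribution functions. It assumes the standard shape/scale forms of both distributions, and the function names are hypothetical.

```python
import math


def weibull_cdf(x, alpha, sigma):
    """Weibull CDF with shape alpha and scale sigma."""
    return 1.0 - math.exp(-((x / sigma) ** alpha)) if x > 0 else 0.0


def frechet_cdf(x, alpha, s):
    """Frechet CDF with shape alpha and scale s."""
    return math.exp(-((x / s) ** (-alpha))) if x > 0 else 0.0


def reciprocal_weibull_cdf(x, alpha, sigma):
    """P(1/X <= x) for X ~ Weibull(alpha, sigma), which equals P(X >= 1/x)."""
    return 1.0 - weibull_cdf(1.0 / x, alpha, sigma) if x > 0 else 0.0
```

Under these assumptions `reciprocal_weibull_cdf(x, alpha, sigma)` agrees with `frechet_cdf(x, alpha, 1 / sigma)` for any positive x.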

Comparisons of alternative models using the leave-one-out criterion
Here we use the 'leave-one-out' method 2, which assesses the predictive ability of the model by repeatedly fitting the model with one individual left out and then predicting the results for that individual from the fit to the remaining data. This results in a statistic called elpd_loo; the greater this value, the better the predictive fit. The method also acts as a further check on divergences of the fit. Using this method the best fitting distribution is the Gumbel, the Weibull is next, then the Fréchet, with the Negative Binomial the worst.
The "loo" library in R produces the results shown in Table C. The distribution with the highest elpd is the Gumbel, and the differences from that are shown, with Weibull being the second, and Negative Binomial the worst. The idea is that if the reduction in elpd, taking into account the standard error, is large then the model with the greater elpd is the better predictive fit.
However, the Gumbel fit reports thousands of divergences and, as we have seen above, does not reproduce the original data well, so we use the Weibull distribution in the main manuscript. Also, the fact that the Gumbel requires 32,000 iterations to apparently converge, compared to only 8000 for the Weibull, does not recommend it as a good model.
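The comparison of two models by elpd can be sketched from their pointwise (per-individual) elpd contributions: the difference is the sum of the pointwise differences, and a commonly used standard error is the square root of n times the sample standard deviation of those differences. This is a minimal sketch with hypothetical function names, not the implementation in the "loo" library.

```python
import math
import statistics


def elpd_difference(pointwise_a, pointwise_b):
    """Return (elpd_a - elpd_b, standard error of the difference)."""
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    se_diff = math.sqrt(n) * statistics.stdev(diffs)
    return elpd_diff, se_diff
```

A difference that is large relative to its standard error favours the model with the greater elpd. A difference of about the same size as its standard error does not clearly separate the two models.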

Relationships Between Physiological Responses
Heart rate (HR) and NN50 were recorded throughout the Shocks scenario. Two segments of 120 s each were retained: one in the baseline period prior to the start of the shocks scenario, and the final 120 s prior to the subject giving the last shock. The Autonomic Perception Questionnaire (APQ) was administered prior to and after the shocks experience. The APQ measures subjective awareness of bodily sensations such as sweating and stomach palpitations in 24 questions. Each is scored on a 10 point scale with 1 = 'Not at all' and 10 = 'A great amount'. The score is the sum over all 24. The final score is the difference between this sum of scores in the experimental period and the baseline (dAPQ). The greater the difference, the more that negative bodily sensations were experienced. There is an internal consistency since the change in APQ is positively correlated with the change in HR. This is shown in Figure D (r = 0.28, n = 58).
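The APQ scoring described above can be sketched as follows: each administration is the sum of 24 items scored from 1 to 10, and dAPQ is the experimental-period sum minus the baseline sum. The function names are hypothetical.

```python
def apq_total(responses):
    """Sum of the 24 APQ items, each scored on a 1 to 10 scale."""
    if len(responses) != 24:
        raise ValueError("the APQ has 24 items")
    if any(not 1 <= r <= 10 for r in responses):
        raise ValueError("each item is scored from 1 to 10")
    return sum(responses)


def d_apq(baseline_responses, experimental_responses):
    """Change in APQ: experimental-period total minus baseline total."""
    return apq_total(experimental_responses) - apq_total(baseline_responses)
```

Each total therefore lies between 24 and 240, and a positive dAPQ indicates that more negative bodily sensations were reported after the shocks experience than at baseline.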

Explaining Low and High Shocks
Here we present a model that accounts for the low and high shocks. Figure D shows the relationship between the change in HR (dHR) and the change in APQ (dAPQ). In addition there is a relationship between Plausibility (Psi) and dAPQ (Figure E).
In order to have a single measure for Psi we carried out a principal components factor analysis on the scores from the Psi questions in Table 1 (main paper). This resulted in an overall score that we refer to as Psi. This is highly correlated with the original questionnaire scores: the correlations between Psi and the questionnaire scores range between 0.72 and 0.88 (n = 58). Figure E shows the relationship between the change in APQ and Psi (r = 0.32, n = 58). Moreover, as shown in Figure 9 (main paper), there is a strong relationship between Plausibility and low or high shocks (nshocks > 15). Figure F shows this also with the new factor analysis variable Psi, illustrating how high plausibility is associated with lower shocks.
Figure G shows the relationship between Condition and the change in heart rate (dHR) and the change in APQ (dAPQ). As shown in the main result, the HR change is steeper for those in the Group condition than in the Woman condition, and both are steeper than in the Control condition. However, with respect to dAPQ there are no differences between the conditions.
We postulate a possible causal chain involving the change in heart rate (dHR), the change in APQ (dAPQ) reflecting interoception of participants' physiological state, plausibility, and whether or not the number of shocks is high. This is shown in Figure H. This model states that an increase in HR is associated with an increase in interoception regarding subjective physiological state (dAPQ), which in turn is associated with higher plausibility, which is associated with a reduction in the number of shocks. Conversely, following the same chain, a decrease in HR compared with baseline results in an increase in the number of shocks. Condition (Control, Woman, Group) is an exogenous factor representing the experimental conditions. This is formalised as the following model (with dHR and dAPQ standardized to mean 0 and SD 1).
Here bernoulli_logit (Stan library) is a standard binary logistic model suitable for modelling binary outcomes.
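To make the role of the logistic link concrete, the sketch below computes the log-probability of a binary outcome given a linear predictor on the logit scale, which is what a Bernoulli model with a logit link evaluates. The linear predictor with `intercept` and `slope` on Psi is a hypothetical illustration of the final link in the chain, not the fitted model.

```python
import math


def inv_logit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def bernoulli_logit_lpmf(y, eta):
    """log P(Y = y) where P(Y = 1) = inv_logit(eta) and y is 0 or 1."""
    signed = eta if y == 1 else -eta
    # log(inv_logit(s)) = -log(1 + exp(-s)), computed in a numerically stable way.
    return -(max(-signed, 0.0) + math.log1p(math.exp(-abs(signed))))


def high_shocks_probability(psi, intercept, slope):
    """Hypothetical final link: probability of HighShocks given Psi."""
    return inv_logit(intercept + slope * psi)
```

With a negative `slope`, a higher Psi gives a lower probability of high shocks, which matches the direction of the association described in the text.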

Figure H - Postulated causal chain: Condition, dHR, dAPQ, Psi, HighShocks.
The prior distributions (weakly informative) are normal(0,10) for all the parameters, and Cauchy(0,5) on the positive half-line for the standard deviation parameters.
The model was fitted with 4000 iterations and 4 chains. There were no divergences, and all Rhat values were equal to 1.
The results are summarised in Table A. It can be seen that there are strong linear dependencies following the model in Figure H.   Predicted posterior distributions on each of the three response variables were computed, and Table B gives the correlations between the means of these distributions and the observed values.

Following the same strategy as Supplementary Text S2 we use the 'leave-one-out' method for assessing the model. The results are shown in Table C. What is important is that the standard errors are small compared with the estimates, and that the effective number of parameters is in line with the actual number of parameters of the model. With the two end-points of the chain (Condition, HighShocks) fixed, there are 6 possible orderings of the three intermediate variables. We analysed each of the 6 possible chains following the same model structure as above, and computed the leave-one-out statistic (elpd) and the correlations between the observed values and the means of the predicted posterior distributions for HighShocks. The results are shown in Table D. The model with the highest elpd is that shown in Figure H.
There is a small reduction in elpd for the model where dAPQ and dHR are swapped, but the SE of the reduction is small compared to its size. However, the correlations between the observed and predicted values are the same. All other models have a much lower elpd and small correlations. Figure G shows that there is a clear relationship between condition and dHR but no such relationship between condition and dAPQ. Therefore it seems reasonable to conclude that the preferred model is that shown in Figure H.
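The six candidate chains compared above are the orderings of the three intermediate variables between the fixed end-points. A minimal sketch of that enumeration follows; the function name is hypothetical.

```python
from itertools import permutations


def candidate_chains(mediators=("dHR", "dAPQ", "Psi")):
    """All chains from Condition to HighShocks through every ordering of the mediators."""
    return [("Condition",) + order + ("HighShocks",) for order in permutations(mediators)]
```

Three intermediate variables give 3! = 6 chains, the first of which is the ordering Condition, dHR, dAPQ, Psi, HighShocks.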
Regarding dHR, the Control condition corresponds to the parameter $\beta_{h,0}$, the Woman condition to $\beta_{h,0} + \beta_{h,1}$ and the Group condition to $\beta_{h,2}$. The posterior distributions are shown in Figure I. It can be seen that Group is associated with a decrease in dHR, Control with an increase, and no change in the case of the Woman condition.
The word "acosado" (harassed) was used only in the Woman condition, showing that the effect we wanted to create was achieved. Five participants out of 20 in the Woman condition (25%) spontaneously used this word to describe their experience inside the virtual scene. The word "incómodo" (uncomfortable) was used in both conditions but more frequently in the Woman condition (10% in the Group condition and 25% in the Woman condition). The qualification "Imbécil(es)" (idiot(s)), describing the avatar(s) harassing the woman, was used only in the Woman condition, showing that participants in the Group condition did not get this negative perception of the male avatars. Participants in the Group condition reported more diverse sensations, and it was difficult to find words repeated more than 2 or 3 times between participants. The word repeated with greatest frequency was "extraño" (strange). For the participants in the Group condition there was a small effect of feeling "outside" the scene (observing, spectator, movie), and only 2 participants reported that they were feeling "pity" for the victim. More surprisingly, 2 participants reported feelings of "calmness" during the harassment scene.

Body Ownership
An important response variable in the Bar scenario was the extent to which participants had the perceptual illusion that the virtual body that they embodied was their body. For this purpose we administered the following questions immediately after each virtual exposure:
mirror: I had the feeling that the virtual body I saw when I looked towards the mirror was my body.
down: I had the feeling that the virtual body I saw when I looked down was my body.
Each of these was scored on a -3 to +3 scale, where -3 signifies complete disagreement and +3 complete agreement. These questions were given after Phase 1 and Phase 2, where mirror1 is the score after Phase 1, mirror2 the score after Phase 2, and similarly for down.

Figure C - Box plots of body ownership by phase (1 and 2) and condition.
Figure C shows the box plots of the two questionnaire scores by phase and condition. It is clear that overall the body ownership scores were high (for example, all the interquartile ranges are above the 0 score, and all the medians are 1 or 2 out of the maximum score of 3). Most important, and in line with other findings, the level of subjective body ownership does not differ according to whether the participants are embodied in the male or female body, and also does not vary between the two phases.

Differences Between Low and High Shock Groups
Here we consider the Low shock group to be those who gave 14 or fewer shocks, the remainder being the High shock group. (It makes no perceptible difference if 12 is used as the cut-off rather than 15, but we follow the findings in Figure 8A, main text).
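The split described above can be written as a simple rule, with the threshold left as a parameter so that an alternative cut-off can be tried. This is a minimal sketch; the function names and the example counts in the usage note are hypothetical.

```python
def shock_group(nshocks, max_low=14):
    """Return 'Low' for max_low or fewer shocks, otherwise 'High'."""
    return "Low" if nshocks <= max_low else "High"


def split_counts(all_nshocks, max_low=14):
    """Number of participants in the Low and High groups."""
    low = sum(1 for n in all_nshocks if shock_group(n, max_low) == "Low")
    return {"Low": low, "High": len(all_nshocks) - low}
```

For instance, `split_counts([3, 14, 15, 20])` places two illustrative participants in each group.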

Place Illusion
Place Illusion was measured by the following questions. The questions are based on those in Steed, et al. 1 and many previous papers.
Please rate your feeling of being in the training room situation with the following scale from -3 to +3 (in which +3 represents the feeling you usually have when you're in a place).
real: To what extent did you feel at certain times during the experience that the training room was the reality for you? (-3 not at all, 3 all the time)
beenthere: When you think about your experience, do you remember the situation in the training room as if it were some images that you have seen or as if it were a place where you have been? (-3 images, 3 place been)
intrainingroom: During the experience, what has been stronger, the feeling of being in the training room or the feeling of being in the real world of the virtual reality laboratory? (-3 laboratory, 3 virtual training room)