Experiments as Code and its application to VR studies in human-building interaction

Experiments as Code (ExaC) is a concept for reproducible, auditable, debuggable, reusable, & scalable experiments. Experiments are a crucial tool to understand Human-Building Interaction (HBI) and build a coherent theory around it. However, a common concern for experiments is their auditability and reproducibility. Experiments are usually designed, provisioned, managed, and analyzed by diverse teams of specialists (e.g., researchers, technicians, engineers) and may require many resources (e.g., cloud infrastructure, specialized equipment). Although researchers strive to document experiments accurately, this documentation is often lacking. Consequently, it is difficult to reproduce these experiments. Moreover, when it is necessary to create a similar experiment, "the wheel is very often reinvented": it appears easier to start from scratch than to reuse existing work. Thus, valuable embedded best practices and previous experiences are lost. In behavioral studies, such as in HBI, this has contributed to the reproducibility crisis. To tackle these challenges, we propose the ExaC paradigm, which not only documents the whole experiment, but additionally provides the automation code to provision, deploy, manage, and analyze the experiment. To this end, we define the ExaC concept, provide a taxonomy for the components of a practical implementation, and provide a proof of concept with an HBI desktop VR experiment that demonstrates the benefits of its "as code" representation, that is, reproducibility, auditability, debuggability, reusability, & scalability.


S1 Frameworks for experimentation in VR through the lens of ExaC
In this section, we provide a more detailed discussion of the frameworks mentioned in Table 3 of the main text, highlighting their individual contributions to the overall development of VR experiments. This review of frameworks is an integral part of our iterative process to examine how various elements of an experiment are currently documented, automated, and interfaced within existing experimentation tools. By doing so, we were able to assess whether the proposed six-pillar taxonomy adequately covers the existing automated aspects of experiments. Throughout this review, we keep a focus on the six pillars of ExaC (refer to Figure 2 in the main text) to connect each framework to its potential for automation.
The Experiments in Virtual Environments (EVE) framework 1 introduced a focus on automating data collection, assembly, and analysis. An experiment protocol is implemented using prefabricated (prefab) templates or by creating new scenes with tasks and annotations specifying which data to store. The design-implementation-link problem is addressed for novices with prefabs, but creating new solutions still requires experts. At runtime, the framework manages the participant sessions and data assembly based on configuration files. The framework also offers questionnaires and data analysis in an R package to bring together all data associated with an experiment in a single place and to simplify difficult steps in data assembly and data analysis. This setup allows for rapid prototyping and piloting, including the analysis of the experiment, which is crucial for preregistration. The framework is open source and has been regularly extended with new templates based on studies conducted over the years [2][3][4][5], as well as extended to run on all major operating systems.
The Virtual Reality Experiments (VREX) framework 6 was developed at the same time as EVE. Plug-and-play design of experiments is fully achieved by providing stand-alone software, based on Unity, to edit the experiments. This limits experiments to predefined tasks, but in a fully configurable indoor environment. Other interesting features are configurations for spatial audio and different locomotion systems. The data collection is rudimentary data logging of fixed variables to files. The authors also provide the underlying Unity project, which would allow researchers to create new tasks and expand the stand-alone version.
Expanding on the idea of automating difficult steps in experimental design in VR, the Automatic Generation of Experimental Protocol Run Time (AGENT) 7 uses a Domain-Specific Language (DSL) to facilitate the description of data collection. This approach is meant to further reduce the researcher's workload by generalizing typical experimental protocols. It works well for organizing the experimental protocol and generating the code to execute it. However, it requires the researcher to link the generated code to objects in their scenes (the design-implementation-link problem). Ultimately, the DSL cannot overcome the necessity of in-depth technical knowledge of the game engine. AGENT also presumes that the researcher has a high-quality protocol from which to automatically generate the experiment code. As an external library, AGENT could be integrated with any game engine but only provides a Unity API.
The framework VRate 8 focuses on providing questionnaires in VR. Questionnaires were implemented to work in immersive VR and are created from JavaScript Object Notation (JSON) and stored in comma-separated value (CSV) files. There is no support for the implementation of experiments beyond questionnaires. Previous frameworks have focused on tailored environments to study general phenomena. The VREVAL framework 9, in contrast, focuses on automating experiment environments by integrating building information models (BIM). Here, BIMs are combined with questionnaires, navigation tasks, and annotation tasks to conduct research that can be evaluated with external BIM software. It is one of the few frameworks, to our knowledge, that uses the Unreal game engine.
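A questionnaire pipeline in the spirit of VRate can be sketched as follows: a JSON definition drives the questionnaire, and flattened results end up in CSV. This is a minimal illustrative sketch; the field names and schema are invented for this example and are not VRate's actual format.

```python
import csv
import io
import json

# Hypothetical questionnaire definition (illustrative schema, not VRate's).
QUESTIONNAIRE_JSON = """
{
  "id": "presence_q1",
  "items": [
    {"key": "realism", "text": "How real did the environment feel?", "scale": [1, 7]},
    {"key": "presence", "text": "How present did you feel?", "scale": [1, 7]}
  ]
}
"""

def collect_responses(questionnaire: dict, answers: dict) -> list[dict]:
    """Validate answers against each item's scale and flatten them to rows."""
    rows = []
    for item in questionnaire["items"]:
        value = answers[item["key"]]
        lo, hi = item["scale"]
        if not lo <= value <= hi:
            raise ValueError(f"{item['key']}: {value} outside scale {lo}-{hi}")
        rows.append({"questionnaire": questionnaire["id"],
                     "item": item["key"], "response": value})
    return rows

def write_csv(rows: list[dict]) -> str:
    """Serialize the flattened rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["questionnaire", "item", "response"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

questionnaire = json.loads(QUESTIONNAIRE_JSON)
rows = collect_responses(questionnaire, {"realism": 5, "presence": 6})
print(write_csv(rows))
```

The appeal of this design is that the questionnaire content lives entirely in a declarative file, so a non-programmer can add or reorder items without touching the experiment code.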
Another interesting angle of research enabled by running VR studies is understanding how people react in crowds under both normal and emergency conditions. The Networked Virtual Reality for the Decision Science Laboratory (NVR-DeSciL) 10 focuses on studying crowd dynamics in VR and is a networked solution that enables multiple participants in a single virtual environment at the same time. An authoritative server architecture is used to provide the same experience to all participants: all user inputs are sent to the server, and an update is produced for all participants at once. The framework is tailored to the Decision Science Laboratory (DeSciL) at ETH Zürich but offers an interesting approach to automating VR for multi-participant experiments.
Another framework that focuses on the execution of the experiment protocol is the Unity Experiment Framework (UXF) 11. Instead of using a DSL like AGENT, it programmatically separates the what from the how to address the design-implementation-link problem. The framework provides a generalization of the experiment flow in the form of sessions and trials, and researchers register with these sessions and trials what should happen. In a second step, the researcher specifies how by writing C# code snippets that run a trial within a session. UXF provides many prebuilt code snippets for designing classic experiments. While the setup does not quite follow a classical "as Code" approach in the form of a configuration file, it can be considered one, as the C# code is essentially reading the configuration and executing it.
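The what/how separation described above can be illustrated in a few lines. UXF itself is a C# Unity package; the sketch below uses Python with invented class and method names purely to show the pattern: the session declares the structure (what), and a single registered callback supplies the per-trial behavior (how).

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Trial:
    number: int
    settings: dict
    results: dict = field(default_factory=dict)

@dataclass
class Session:
    # "What": the structure of the experiment, declared up front.
    trials: list = field(default_factory=list)
    # "How": a callback the researcher registers to run one trial.
    on_trial: Callable = lambda t: None

    def add_block(self, n_trials: int, settings: dict) -> None:
        """Append a block of identically configured trials."""
        start = len(self.trials)
        for i in range(n_trials):
            self.trials.append(Trial(number=start + i + 1, settings=dict(settings)))

    def run(self) -> None:
        """The framework owns the loop; researcher code runs per trial."""
        for trial in self.trials:
            self.on_trial(trial)

# Researcher code: only the per-trial behavior is written by hand.
session = Session()
session.add_block(3, {"stimulus": "hallway_A"})
session.on_trial = lambda t: t.results.update(
    {"response_time_s": 0.5 + 0.1 * t.number})  # stand-in for a real measurement
session.run()
print([t.results for t in session.trials])
```

Because the structure is data (trials with settings) rather than control flow, the same declaration can be logged, audited, or regenerated, which is what makes the approach resemble an "as Code" representation.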
The Unified Suite for Experiments (USE) 12 focuses on neuroscience applications by providing hardware and software to automate synchronizing data with high temporal precision in local labs. The frame rate as well as transmission delays produce serious issues for brain scanning, where the exact stimulus must be determined. USE introduces the SyncBox, a hardware setup that measures the current frame on the screen and aligns it ex post with the high-frequency data from brain scanners and other hardware. The framework also provides a nested hierarchy and user-defined experimental flows to create experiment tasks, offering yet another attempt to solve the design-implementation-link problem. Additionally, the framework is able to automate interactions for artificial and non-human users.
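The core of such ex-post alignment is matching each measured frame timestamp to the nearest sample of a faster recording. The sketch below shows this nearest-neighbor matching in Python with invented timestamps; the real SyncBox is a hardware device and its processing pipeline is not reproduced here.

```python
import bisect

def align(frame_times: list, sample_times: list) -> list:
    """For each frame timestamp, return the index of the nearest sample
    in a sorted high-frequency stream (nearest-neighbor matching)."""
    indices = []
    for t in frame_times:
        i = bisect.bisect_left(sample_times, t)
        # The nearest sample is either just before or just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sample_times)]
        indices.append(min(candidates, key=lambda j: abs(sample_times[j] - t)))
    return indices

# Invented example: ~90 Hz display frames vs. a 1 kHz recording.
frames = [0.0111, 0.0222, 0.0333]
samples = [k / 1000 for k in range(50)]  # 0.000 s, 0.001 s, ..., 0.049 s
print(align(frames, samples))  # → [11, 22, 33]
```

In practice a constant transmission delay would also be estimated and subtracted before matching; the point here is only that accurate alignment requires a per-frame ground-truth timestamp, which is what the SyncBox supplies.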
Another attempt at the design-implementation-link problem has been made by the Biomotion Lab Toolkit for Unity Experiments (bmlTUX) 13. It is a small framework that focuses on organizing participants into a factorial design with dependent and independent variables, providing a nested design for experiments, but it ultimately requires coding the trials. Compared to UXF or AGENT, it provides a user interface for the factorial design instead of coding it in C# or a DSL.
OpenMaze 14 focuses on a particular type of navigation experiment and tries to provide automation with maximal flexibility and low complexity. Configuration files are used to define the relations between goals, landmarks, and enclosures. Here, the design-implementation-link problem is solved by restricting the input space and providing pre-configured implementations.
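The idea of solving the design-implementation-link problem by restricting the input space can be made concrete with a small sketch: a trial is a declarative configuration, and a validator rejects anything the pre-configured implementations cannot build. The keys and values below are invented for illustration and are not OpenMaze's actual schema.

```python
# Hypothetical configuration for one navigation trial (illustrative schema).
TRIAL_CONFIG = {
    "enclosure": {"shape": "circle", "radius_m": 10.0},
    "landmarks": [{"id": "tree", "angle_deg": 90}],
    "goals": [{"id": "target", "angle_deg": 270, "radius_frac": 0.5}],
}

# Restricting the input space is the point: only these shapes have
# pre-configured implementations.
ALLOWED_SHAPES = {"circle", "square"}

def validate(config: dict) -> list:
    """Return a list of problems; an empty list means the trial can be built."""
    problems = []
    if config["enclosure"]["shape"] not in ALLOWED_SHAPES:
        problems.append("unsupported enclosure shape")
    for goal in config["goals"]:
        if not 0.0 <= goal["radius_frac"] <= 1.0:
            problems.append(f"goal {goal['id']}: radius_frac outside [0, 1]")
    seen = set()
    for obj in config["landmarks"] + config["goals"]:
        if obj["id"] in seen:
            problems.append(f"duplicate object id {obj['id']}")
        seen.add(obj["id"])
    return problems

print(validate(TRIAL_CONFIG))  # → [] (the config maps onto a prebuilt scene)
```

Because every valid configuration maps onto a known implementation, the researcher never writes engine code, at the cost of only being able to express what the schema allows.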
Data assembly is rarely the focus of a framework. Virtual Observations 15 focuses on the reconstruction of user actions. In particular, immersive VR user input requires special care, as handheld controllers spatially located in the virtual scene produce a large amount of data that critically defines the participants' actions in the scene. The framework advances replay and monitoring functionality but does not provide features for experimental design beyond recording the necessary data. It is mostly orthogonal to other frameworks and could be used alongside them, but without specific integration it would duplicate data-recording features. Nonetheless, its fine-grained approach to user input in immersive VR is a valuable addition to automation practices.
The Landmarks framework 16 focuses on data collection, assembly, and analysis like EVE but sets different priorities. It covers different levels of immersion (HMD, desktop, etc.), the timeline of an experiment (a nested hierarchy) including tasks, the environment, and data collection with log files. A Unity package is provided, easing entry into developing an experiment.
VR-Rides 17 is not a classical experiment framework but focuses on exercise games (exergames) where users physically move, with either a pedaling device or a treadmill, through Google Street View (GSV). It provides automation for the hardware and GSV. Interestingly, the framework provides a user-study module that allows researchers to assign participants to conditions, as well as a database for implementing experiments.
The Toggle Toolkit 18 provides an interesting approach to resolving the design-implementation-link problem. It introduces triggers as a design concept to transfer Unity features to experimental design paradigms (autonomous trigger, key trigger, timer trigger, collider trigger, collider-key trigger, distance trigger). The triggers can be chained, producing something similar to an experimental flow but embedded in the scene. While expressing complex task designs may be tedious, it can be done without writing code, providing an effective solution to the design-implementation-link problem.
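The chaining of triggers into an experimental flow can be sketched as follows. The real Toggle Toolkit configures its triggers inside Unity scenes without code; the Python names and the per-frame state dictionary below are invented to illustrate the mechanism only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Trigger:
    name: str
    condition: Callable          # e.g. a distance, timer, or key check
    next: Optional["Trigger"] = None  # chaining builds the experimental flow
    fired: bool = False

    def update(self, state: dict) -> Optional["Trigger"]:
        """Fire once when the condition holds, then hand control to the
        next trigger in the chain (None means the chain is finished)."""
        if not self.fired and self.condition(state):
            self.fired = True
            return self.next
        return self

# A distance trigger chained to a timer trigger: first reach the goal
# zone, then dwell until five seconds have elapsed.
enter_zone = Trigger("enter_zone", lambda s: s["distance_to_goal_m"] < 1.0)
wait_done = Trigger("wait_done", lambda s: s["elapsed_s"] >= 5.0)
enter_zone.next = wait_done

active = enter_zone
for state in ({"distance_to_goal_m": 3.0, "elapsed_s": 1.0},
              {"distance_to_goal_m": 0.5, "elapsed_s": 2.0},
              {"distance_to_goal_m": 0.5, "elapsed_s": 6.0}):
    active = active.update(state)  # one call per simulated frame
print(active)  # → None: the chain has finished
```

Each trigger is a self-contained scene element, so the flow lives in the scene rather than in an external script, which is exactly what makes the approach accessible to non-programmers and, conversely, tedious for complex designs.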
The framework Delayed Feedback based Immersive Navigation Environment (DeFINE) for Studying Goal-Directed Human Navigation is an extension of UXF with a focus on the stimulus-response-feedback architecture for navigation experiments. In particular, feedback is an explicit part of the framework, whereas other frameworks require the researcher to develop these features themselves. The feedback is presented in the form of a leaderboard to encourage participants to perform better. The framework also provides researchers with a set of locomotion methods, including teleport, arm swing, head-bob, and physical walking. Lastly, it is one of the few frameworks to provide questionnaires inside VR to maintain participants' immersion.

S2 VR-Check for general VR experiment
VR-Check 19 was developed to underpin the development of VR software for medical use, with the main purpose of retraining humans after severe physical or neuronal trauma. While the original checklist is thus very domain specific, we generated a more general version to help improve the development of an experimental protocol, experimental documentation, and, most of all, experimental design. We rephrase the checklist items as questions to make them more accessible to the researchers using them.