Similarity of gaze patterns across physical and virtual versions of an installation artwork

An experiment was conducted using mobile eye trackers to compare museum visitors' gaze patterns whilst they engaged with a physical and a virtual reality (VR) installation of Piet Mondrian's Neoplasticist room design. Visitors' eye movements, comprising approximately 25,000 fixations, were analysed using linear mixed-effects models. Absolute and area-normalized dwell time analyses yielded mostly non-significant main effects of the environment, indicating similar visual exploration patterns in the physical and VR settings. One major difference was a decrease in average fixation duration in VR, where visitors tended to switch focus more rapidly, with shorter bursts of attentional focus. The experiment demonstrated that gaze data can be compared between physical and virtual environments as a proxy for measuring the similarity of aesthetic experience. The similarity of viewing patterns, along with the questionnaire results, suggested that virtual galleries can be treated as ecologically valid environments parallel to physical art galleries.


Scientific Reports (2021) 11:18913 | https://doi.org/10.1038/s41598-021-91904-x | www.nature.com/scientificreports/

Until recently, the only feasible way of implementing eye tracking in a study was to restrict it to laboratory environments, where generally the only stimulus option was reproductions of artworks rather than originals. With the development of mobile eye trackers, a paradigm shift for empirical aesthetic research has been to step outside well-controlled laboratory environments into the real world, where observers engage with works of art in their original forms 28 . In a previous pilot experiment, for example, we investigated the eye movements of gallery visitors whilst they were engaging with a room-scale installation 29 . This installation was later recreated virtually with a set of variations based on its topological properties and the observed gaze patterns, and these variations were used in an online 2D-view eye-tracking experiment. To further illustrate this research potential, researchers have investigated (1) the interaction between gaze patterns and abstract paintings in a gallery 30 and the potential implementation of scan path analysis using support-vector machine algorithms to classify paintings based on fixation sequences 31 , (2) the use of mobile eye-tracking analysis on abstract and representational paintings in a museum 32 , (3) the effects of bottom-up factors (as indexed by saliency maps derived from paintings) and top-down factors (as manipulated by the information about the paintings provided to participants, who were allowed to view the same paintings again) in children and adults viewing Van Gogh paintings 33 , (4) the interaction between speaking and fixation patterns and various gaze metrics 34 , (5) differences in exploration strategies between wheelchair users and non-wheelchair users in museums 35 , (6) the amount of attentional shift between museum content itself and a supplementary tablet containing information on that content 36 , and (7) whether fixation duration can predict aesthetic choice 37 , among others. One commonality across these diverse studies is their emphasis on the necessity of fieldwork in empirical aesthetics, aiming to measure aesthetic experience and judgments in genuine settings.
Presenting art online, and more recently in VR, has been a major step forward for the accessibility of cultural heritage. Although VR has been used previously in pioneering works 38 and research 39 , there is currently a growing interest in both consumer-grade and research-grade VR solutions. One reason for this interest seems to be the experimental potential to study freely moving participants in virtual environments [40][41][42] . In addition, accessible modelling software and game engines provide easy-to-use tools for creating novel immersive environments [43][44][45] . Furthermore, following the development of eye-movement analysis in 3D space using data collected from digitally simulated environments 46,47 , the recent emergence of VR headsets capable of eye tracking offers a new opportunity to step beyond the conventional lab and into the in-situ context. Although VR remains a relatively unexplored area, with emerging experimental design guidelines as well as some ethical concerns 48,49 , it nevertheless holds exciting promise for empirical aesthetic research. Previous research has compared museum and laboratory settings as well as original and reproduced artworks 50 , investigated preferences for different types of substitute representations of artworks 51 , and targeted emotional experience using mobile EEG to develop a classifier based on recordings from a real and a virtual museum 52 . In line with recent experimental results underlining contextual differences (particularly between lab-based and real-world conditions), it has been proposed that aesthetics research be conducted in laboratories that resemble the genuine contexts of aesthetic experience as closely as possible 53 . However, direct comparative research between art galleries and arguably their closest proxy, immersive environments (particularly for 3D art and based on eye tracking), is still missing.
As the VR environment develops into a valid and comparable setting to physical galleries and museums, a direct comparison between VR and in-situ environments seems to be crucial to enable future use of VR. If such research suggests similarity between the two settings, then immersive environments can be treated as both highly controllable and ecologically valid research settings.
Here we focus on the work of Piet Mondrian, whose abstract paintings are prominent examples of the De Stijl art movement. In most of his late works, following the principles of Neoplasticism, Mondrian radically restricted the compositional features of his artworks to horizontal and vertical lines and the three primary colours red, blue, and yellow, along with black, white, and grey. This abstraction, epitomising purity and sparseness of lines and colours, lends itself to straightforward mathematical descriptions that aid quantitative approaches. Accordingly, reproductions of Mondrian paintings and quasi-Mondrians (manipulated versions of the originals) have been used as stimuli in empirical aesthetics, arguably because of the artist's historical significance as well as the low-level compositional features, which offer clear and easily modifiable geometric structures as experimental stimuli. For example, researchers investigated whether computer-generated synthetic Mondrians were preferred over originals 54,55 , whether original or rotated, oblique orientations of Mondrian's paintings were preferred and whether eye-movement patterns were similar across orientation conditions 56 , and whether aesthetic preference for Mondrian paintings was correlated with participants' measured pupil size 57 . Incorporating other variables in relation to Mondrian's work has also been a prominent research theme, such as asking whether liking of original Mondrians was mediated by personality factors like openness to experience 58 . Recently, a distinct example of Mondrian's work, a room design proposal commissioned by Ida Bienert in the early twentieth century but never realised 59 , has drawn attention from researchers as well as art-historical curiosity.
Using physical and digital models at various scales, it was argued that the room-scale artwork proposal conflicted with strict Neoplasticist ideals because of perspective distortions in retinal projections, which are exacerbated by changes of viewpoint in the room 60 . Following on from that, our test case mainly aims to measure observers' visual exploration, as indexed by eye trackers, inside 1:1 scale physical and virtual versions of this particular design proposal. In terms of art research, our approach can be seen both as a behavioural experiment in a physical gallery and as a comparative study investigating whether a virtual installation would be a suitable proxy for a physical installation.
The main aim of the present study was to compare observers' gaze patterns (as constituents of the aesthetic experience) within an art installation in physical and virtual instantiations. The physical installation, created by the artist Heimo Zobernig, and the VR reconstruction, developed by our team, were temporarily exhibited in the Albertinum Museum in Dresden, Germany. Having a full day of access to a flagship exhibition on historical milestones of abstract art in an internationally renowned museum 61 62 . On the other hand, questionnaire responses as part of psychological testing were considered to reflect decision making and other (conscious) cognitive processes 63 . As a result, to the best of our knowledge, we present the first direct comparison of quantitative measures capturing core aspects of the visual aesthetic experience of an installation artwork in both physical and virtual embodiment in its museum context. Given the strong topological similarity between the physical and VR installations, we expected very similar visual exploration behaviours in the two contexts. Accordingly, the non-directional alternative hypothesis states that visual exploration patterns, as indexed by absolute and area-normalized fixation duration on sets of areas of interest (AOIs) during the viewing of a static abstract installation, differ between the in-situ and VR conditions, whereas the null hypothesis, as the default state, states that there is no difference between the two contexts.

Methods
Participants. Museum visitors were approached at the exhibition entrance during regular visiting hours and invited to take part using opportunity sampling. They took part in the study voluntarily. All participants provided written informed consent prior to the experiment. Thirty-one museum visitors (21 females, 9 males, M Age = 49.23 years, SD Age = 18.25 years, R Age = 20-79 years) participated in the study. All participants reported having normal or corrected-to-normal vision, in the sense that they viewed both stimuli under the same conditions as they viewed the other artworks of the exhibition. Participants could wear their contact lenses in both settings, or wear their glasses inside the VR headset; a corrective lens was added to the wearable eye tracker whenever needed. Although no explicit vision measure such as a visual acuity or contrast sensitivity test was implemented, a screening questionnaire comprising eleven items was provided, aiming to link any unusual eye-tracking data (such as calibration failure or frequent lack of fixation detection) to the vision condition (such as recent laser surgery), and potentially to exclude the participant data (see Supplementary Fig. S1 Fig. 1a-b and Supplementary Fig. S2a), or as a reconstructive decision for the VR implementation to match it to the physical layout of the original room, the Damenzimmer in Ida Bienert's villa in Dresden, for which it was designed (see Fig. 1c-d and Supplementary Fig. S2b). Briefly, Zobernig produced the artwork as an interpretation that deliberately did not try to exactly reproduce Mondrian's commissioned watercolour painting of the design, which furthermore did not match the actual physical dimensions of the room in the Bienert Villa.
Our team's VR reconstruction was based on Mondrian's composition combined with physical room measurements taken by us and, in line with Neoplasticist rules, slightly adjusted the design to fit the actual room layout, including the positions of walls, windows, and doors. In this sense, we did not aim to create identical architectural constructs, but to compare the aesthetic experience in two very similar environments inspired by the same design idea. Adjusting the VR design to the actual room dimensions led to VR dimensions of 499 by 494 cm with a height of 360 cm, whereas the physical installation, which followed Mondrian's design sketch, measured 483 by 510 cm with a height of 385 cm. Both the physical and VR installations incorporated monochromatic coloured patches on the room surfaces and two main natural-white light sources, an ambient light and a surface light from the ceiling inside the room; no further controls for the similarity of colour saturation and luminance were applied. The environment surrounding the installation was a static grey scene in VR, without any additional digital audio or digital avatars, whereas the physical installation was situated in the large museum space, right next to our VR installation (compare Fig. 1a-b and c-d). In both settings, some background noise from other visitors was inevitably present in the gallery space.

In the physical installation, one of the artistic decisions of Heimo Zobernig was to extrude the interior patterns onto the exterior surfaces of the room, whereas our VR reconstruction had a homogeneous grey texture for the exterior surfaces. Also note that since counterbalancing of the conditions was implemented to minimize temporal order effects in the repeated-measures design, the participants were randomly assigned to view either (a-b) the physical installation first, or (c-d) the VR reconstruction first.
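To make the size difference between the two rooms concrete, the reported dimensions can be converted into floor area, total interior surface area, and volume. The Python sketch below assumes both rooms are ideal cuboids and ignores door and window openings, furniture, and exterior surfaces; the function names are illustrative only.

```python
# Interior dimensions (length, width, height) in cm, as reported in the text.
ROOMS = {"VR": (499, 494, 360), "physical": (483, 510, 385)}

def floor_area_m2(length, width, height):
    """Floor area of an ideal cuboid room (height is unused)."""
    return length * width / 1e4            # cm^2 -> m^2

def surface_area_m2(length, width, height):
    """Area of all six inner faces (floor, ceiling, four walls) of a cuboid."""
    return 2 * (length * width + length * height + width * height) / 1e4

def volume_m3(length, width, height):
    return length * width * height / 1e6   # cm^3 -> m^3

for name, dims in ROOMS.items():
    print(f"{name}: floor {floor_area_m2(*dims):.2f} m2, "
          f"surfaces {surface_area_m2(*dims):.2f} m2, volume {volume_m3(*dims):.2f} m3")
```

Under these assumptions the floor areas are almost identical (24.65 vs 24.63 m²), while the taller physical room has about 4% more interior surface (125.73 vs 120.80 m²), which is one source of the minor area differences between versions.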
Recording hardware and software comprised a wearable eye tracker (Tobii Pro Glasses 2), operated via the Tobii Pro Glasses Controller, and a VR headset (HTC Vive with an integrated Tobii eye tracker), running an executable built in Unity with Tobii Pro VR Analytics, a software package enabling data collection. In addition, participants completed an exit questionnaire comprising four items on basic demographic information, eighteen five-point Likert-scale rating items on interest in and opinions about art (including both positively and negatively scored items), and four open-ended feedback questions (see Supplementary Fig. S1 for the items of the rating-scale questionnaire).
Design. The study was designed as a within-subjects experiment, since each participant was expected to take part in both the physical and VR conditions. To counteract carryover effects, the order of visiting the physical and virtual rooms was counterbalanced: participants were pseudo-randomly assigned to either the physical-first or the VR-first condition, such that half of the participants viewed the physical installation first and the other half viewed the VR version first. The main conceptual justification for this counterbalanced exposure to the two versions of the room design was the assumption that (top-down) effects due to the temporal order of viewing tend to cancel each other out, albeit potentially introducing some noise to the data. The main dependent variables, both as absolute and area-normalized values, were dwell time (defined as cumulative fixation duration), number of fixations, and average fixation duration on particular regions of the room, all measured with the eye trackers. The main independent variables were artwork media type, a binary variable labelled either physical or virtual, and sets of AOIs indexed by the corresponding 3D geometry of the rooms, such as the six surfaces of the cuboid room, three pieces of furniture, and six colours of the room panels (see Fig. 2 for an overview of the AOI mapping). In line with the null hypothesis, no main difference in absolute or area-normalized dwell time was expected depending on the artwork media type and sets of AOIs. Note that, although it was not explicitly recorded, we expected noise in the data from a simple re-exposure effect or other order effects such as fatigue, boredom, or practice, but aimed to minimize them using counterbalancing.
Additionally, we assumed that most participants engaged with the artworks for the first time during the experiment, or at least for the first time on the day of data collection. This assumption was supported by the fact that none of the participants mentioned a previous viewing of the artwork, and a majority of them were not familiar with the artist, as indicated by their questionnaire responses.
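The text describes the order assignment only as pseudo-random with equal halves, and recruitment was opportunistic, so the exact procedure is not specified. The Python sketch below shows one simple way to produce such a balanced physical-first/VR-first split; the function name, seed, and participant IDs are hypothetical.

```python
import random
from collections import Counter

def assign_viewing_order(participant_ids, seed=None):
    """Shuffle participants and split them into physical-first and VR-first halves.

    Hypothetical helper: with an odd number of participants, the VR-first
    group receives the extra person.
    """
    rng = random.Random(seed)          # seeded generator for a reproducible assignment
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    order = {pid: "physical-first" for pid in ids[:half]}
    order.update({pid: "VR-first" for pid in ids[half:]})
    return order

assignment = assign_viewing_order(range(1, 32), seed=2021)
print(Counter(assignment.values()))
```

With 31 participants this yields 15 physical-first and 16 VR-first assignments, regardless of the seed; only which participant lands in which group depends on the shuffle.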
Procedure. After receiving written and oral instructions, participants completed consent forms and vision screening forms at a reception desk and then went through the experiment, which consisted of three steps. Participants in one group were first equipped with the mobile eye-tracking glasses and explored the physical artwork by themselves for as long as they wanted. Researchers did not accompany participants, but since the study was conducted in a public museum space where other visitors had the right to view the artwork, another visitor was occasionally present inside the installation for brief intervals: there were five instances in which another visitor was present simultaneously with a participant. For those five participants, the co-presence lasted approximately 85 s in total, corresponding to roughly 3% of the total eye-tracking recording duration inside the physical installation. Secondly, participants were equipped with the VR headset and explored the digital artwork, again without any time limit. In this phase, a researcher always accompanied the participant, simply to handle the cables between the headset and the computer. For the other group of participants, the order of viewing was reversed. In the physical installation, participants always entered and exited the room through the same door opening on the South Wall. Due to the limited physically walkable area in the VR version, participants started the VR experience at the centre of the VR room, and all were instructed to face forward towards the same wall at the start. Lastly, at the reception desk, they completed a brief exit questionnaire in either German or English, depending on their preferred language (see Supplementary Fig. S3a for an overview of the procedure, and Supplementary Fig. S3b for a view from the gallery space).
Note that eye-tracking calibration was performed before data collection, separately for the VR headset and the mobile eye tracker for each participant, to ensure the reliability of the gaze data (in terms of precision and accuracy). The default built-in calibrations provided by the Tobii software (i.e., five-point for the VR headset and single-point for the mobile eye tracker) were expected to provide a similar level of quality (albeit noisier than data obtained in a lab setting from a research-grade screen-based eye tracker). No further interim recalibration, an occasionally used practice for drift correction in longer experimental designs, was carried out, because the installation was treated as a single stimulus and the recording duration was short.
Data analysis. Three sets of data were formed: questionnaire, mobile eye-tracking, and VR eye-tracking data. All participants completed the physical part of the experiment; however, due to a data-saving error on the mobile eye tracker during the study, seven recordings could not be recovered, resulting in twenty-four valid recordings from the eye-tracking glasses instead of thirty-one. One participant did not take part in the VR part of the experiment due to discomfort caused by the screen brightness, resulting in thirty valid recordings from the VR headset. All participants completed the exit questionnaire, resulting in thirty-one respondents. Analysis of the questionnaire included descriptive statistics indicating the frequency distribution of responses on the Likert scale. The workflow from the raw recording data to the statistical analysis is explained in detail in Supplementary Fig. S4. The main difference in workflow between the mobile and VR data was the interim manual coding of fixation locations, carried out by the lead author (also see Supplementary Fig. S5 for a reliability comparison). Following on from that, the AOI mapping (as illustrated in Fig. 2) was the basis for five comparisons of eye-tracking data, related to the sparse spatial layout and compositional style of Mondrian's design, generating subsets of the data that were split into separate levels for the data analysis: (1) Room elements consisted of three levels: interior cube surfaces, furniture, and regions representing the outside vista through door and window openings. (2) Colour types were split into two levels, luminance- and chrominance-type, depending on whether the colour carried luminance information only (white, grey, black) or also chroma information (red, blue, yellow).
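The grouping step, where individual panel AOIs are pooled into higher-level levels such as a colour or a colour type, can be sketched in Python as follows. It assumes an AOI naming convention of the form `<Surface>-<panel number><colour code>`, inferred from labels such as `North-9r` in the Fig. 2 caption; the colour codes other than `r`, the non-red AOI labels, and all fixation durations are hypothetical.

```python
from collections import defaultdict

# Assumed AOI naming convention "<Surface>-<panel number><colour code>", inferred
# from labels such as "North-9r" in Fig. 2; codes other than "r" are hypothetical.
COLOUR_CODES = {"r": "red", "b": "blue", "y": "yellow",
                "w": "white", "g": "grey", "k": "black"}
CHROMATIC = {"red", "blue", "yellow"}  # colours carrying chroma information

def colour_of(aoi):
    """Map an AOI label to its colour via the trailing colour code."""
    return COLOUR_CODES[aoi[-1]]

def colour_type(aoi):
    """Classify an AOI as chrominance-type or luminance-type (comparison 2)."""
    return "chrominance" if colour_of(aoi) in CHROMATIC else "luminance"

def dwell_by_group(fixations, key):
    """Sum fixation durations (s) per group; fixations are (aoi, duration) pairs."""
    totals = defaultdict(float)
    for aoi, duration in fixations:
        totals[key(aoi)] += duration
    return dict(totals)

# Hypothetical fixations of one participant: (AOI label, duration in seconds).
fixations = [("North-9r", 0.31), ("East-2r", 0.22), ("West-5r", 0.18),
             ("Bookcase-13r", 0.40), ("North-3w", 0.25), ("Ceiling-1g", 0.12)]
by_colour = dwell_by_group(fixations, colour_of)
by_type = dwell_by_group(fixations, colour_type)
```

Passing a different `key` function is all that is needed to regroup the same fixations by surface, furniture item, or colour type, which mirrors how the panel-level AOIs were combined into sets for each comparison.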
Five main types of eye-tracking variables were analysed, which were either reported as main results or summarized as supplementary results: (1) Absolute dwell time was defined as the cumulative fixation duration per AOI. (2) Area-normalized dwell time aimed to measure a type of fixation density or attentional density, after accounting for sampling at chance, to correct for the relative size of areas. It was calculated as cumulative fixation duration divided by the given AOI area, with any given AOI expressed as a fraction of the total area of all the AOIs. (3) Fixation count was the number of individual fixations on a given AOI. Unless there is a significant difference in average fixation duration (see below), fixation count tends to be highly correlated with absolute dwell time. (4) Area-normalized fixation count was normalized by the area of an AOI as a fraction, as above. (5) Average fixation duration was the mean duration of fixations on any given AOI; it is a derived metric, calculated as absolute dwell time divided by the number of fixations. This derived metric allowed comparison of how often the gaze is relocated between different image … Supplementary Fig. S7 for an overall table of means and standard errors of all the above-mentioned measures, and also see Supplementary Fig. S8 for time to first fixation comparing the physical and VR settings for the five comparisons, visualized as boxplots). Note that our explicitly presented results were mainly based on absolute and normalized dwell time to avoid analytical redundancy, since we expected very similar results for potential analyses based on absolute and normalized fixation counts (see Supplementary Fig. S9 for the correlation table indicating the strength of the relation between dwell time and fixation count). Lastly, because free viewing introduced a difference in viewing time between conditions and across participants, the viewing percentage can be calculated (see Supplementary Fig. S10). Eye-tracking data for absolute and area-normalized dwell time were analysed using linear mixed-effects models (LME), an alternative to repeated-measures analysis of variance (RM ANOVA) for repeated-measures data. The reason for not running RM ANOVAs was that missing data from a single condition entail deletion of all data from that participant, whereas in LME each data point is treated as a single observation, without participant exclusion. The main software packages used for data analysis were MATLAB, R, jamovi, and Mathematica (mathworks.com, r-project.org, jamovi.org, wolfram.com); software used for data visualization comprised Unity, SketchUp, Lumion, and Adobe CC (unity.com, sketchup.com, lumion.com, adobe.com).

Figure 2. An overview of 3D AOI mapping for the VR condition. (a) A diagrammatic view of the cuboid room, in which (b) six surfaces of the room and (c) three pieces of furniture were present. Each individual coloured 2D panel was coded as an individual AOI during the development of the VR environment, and (d) had a unique colour value out of six possible colours. For example, the dwell time on red colour patches was calculated as the cumulative sum of fixation duration on four AOIs, namely North-9r, East-2r, West-5r, and Bookcase-13r. Along the same lines, the 3D AOI mapping was also formed for the physical condition, regarding the same sets of AOIs.
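The five eye-tracking variables can be sketched for a single AOI as below. This assumes that area normalization means dividing by the AOI's share of the total AOI area, which is what a density measure requires and matches the s/% units reported in the Results; the function name and the example numbers are hypothetical.

```python
import math

def aoi_metrics(fixation_durations, aoi_area, total_area):
    """Five per-AOI gaze measures for one participant (hypothetical helper).

    fixation_durations: durations (s) of the fixations that landed on the AOI.
    aoi_area, total_area: area of this AOI and of all AOIs combined (same units).
    """
    share = aoi_area / total_area          # AOI size as a fraction of all AOIs
    dwell = sum(fixation_durations)        # (1) absolute dwell time
    count = len(fixation_durations)        # (3) fixation count
    return {
        "dwell_time": dwell,
        "area_normalized_dwell_time": dwell / share,      # (2) attentional density
        "fixation_count": count,
        "area_normalized_fixation_count": count / share,  # (4)
        # (5) derived metric; undefined when the AOI received no fixations
        "average_fixation_duration": dwell / count if count else math.nan,
    }

m = aoi_metrics([0.2, 0.3, 0.25], aoi_area=5.0, total_area=100.0)
```

Here an AOI covering 5% of the total area with 0.75 s of dwell time gets a normalized value of about 15, which illustrates why small elements such as furniture can rank highest once area is taken into account.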
Prior to evaluating our results, it is important to appreciate that we aimed to analyse a unique case study, mainly to compare the similarity of eye-movement patterns between a physical and a virtual version of an art installation; our results and conclusions are therefore limited to the confines of the real-world experimental conditions, rather than general and definitive. Consequently, the results cannot be readily generalized to the wider population or to other forms of artworks; this limitation holds true for almost all research in empirical aesthetics. Additionally, as a general disclaimer, due to the small sample size in the traditional sense, this research potentially has low statistical power, which in turn increases the probability of incorrectly failing to reject the null hypothesis (a Type II error) and reduces the likelihood that the presented results will replicate.

Results
Questionnaire. The eighteen Likert-scale rating questions can be clustered into four categories, and the mode (the most frequent response among the five scale points) is often regarded as the most informative value (see Supplementary Fig. S11 for a summary of questionnaire responses). Overall, the questionnaire results indicated that the participants were highly educated, came from diverse age groups, were mostly regular museum visitors, were split on particular judgments comparing the two settings, and were open to VR experiences, but only for shorter viewing periods. Note that the questionnaire was purely aimed at gaining more insight into participants' views, such as their overall attitudes towards visual arts and the experiment, or their familiarity with the artwork. The main research question concerned eye movements during the aesthetic experience, not qualitative differences in the experience itself; therefore, specific self-report assessment tools (e.g., for art expertise 64 , aesthetic emotions 65 or quality of VR use 66 ) were not administered. On the conceptual level, we did not base our main hypothesis on potential confounding variables, for example, whether the amount of previous knowledge about the artist or an art period acts as a mediator of the relationship between dwell time and sets of AOIs (see Supplementary Fig. S12 for an initial exploration of these relationships).
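Because a single mode can hide an item on which respondents were split, a mode summary for a Likert item can return every tied value. A minimal Python sketch, with a hypothetical helper name and made-up responses:

```python
from collections import Counter

def likert_modes(responses):
    """Return every most-frequent response on the 1-5 scale (ties yield several modes)."""
    counts = Counter(responses)
    top = max(counts.values())
    return sorted(value for value, n in counts.items() if n == top)

# Hypothetical responses to two items
unimodal = likert_modes([4, 5, 4, 3, 4, 2, 5])
split = likert_modes([1, 1, 2, 4, 5, 5])
```

The first item has a single mode of 4, while the second returns both 1 and 5, the kind of divided judgment described for some of the setting comparisons.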
Initial visualizations: dwell time as heatmaps. Since both the physical and VR versions were visited by the same participant group, an initial one-to-one qualitative comparison was possible, using heatmaps to visualise the amount of dwell time on any given point in the physical and virtual environments. Note that both eye trackers collected gaze data at a sampling rate of approximately 100 Hz and used the same algorithms to detect fixations. An exemplar pair of dwell time heatmaps from a single participant, for both conditions and from two diagonal viewpoints of the room, can be seen in Fig. 3. An initial qualitative evaluation indicated some similarities between overall viewing patterns and some specific differences, such as the response to furniture, between the physical and VR environments. Additionally, individual differences between participants were apparent: for example, some participants spent relatively more time in the artworks, resulting in longer total dwell times and higher fixation counts. Some variability was inevitably present in individual patterns of preference, for example, for particular colours or walls (see the open-data directory to view individual heatmaps of participants). Overall, in both conditions, hotspots of attention, as indicated by densely fixated regions, seemed to be located on the red, blue and yellow patches as well as on the furniture (see Supplementary Fig. S13 for gaze data validity measures). Fig. 4a. The viewing duration was comparatively longer than in previous studies, where the average viewing duration for 2D artworks such as paintings tends to be around 30 s in a museum context 67,68 , but rather similar to other research where the viewing durations for two distinct 3D installations were around two and four minutes 28 .
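The heatmaps in Fig. 3 were rendered on the 3D room geometry; the underlying idea, accumulating fixation durations as smoothed blobs, can be sketched on a flattened 2D surface such as an unfolded wall. This Python sketch is a simplification under stated assumptions: the Gaussian kernel, its width `sigma`, the grid size, and the fixation data are hypothetical, not the study's rendering pipeline.

```python
import math

def dwell_heatmap(fixations, width, height, sigma=1.5):
    """Duration-weighted Gaussian heatmap on a width x height grid.

    fixations: (x, y, duration) tuples in grid-cell coordinates, duration in s.
    Each fixation adds a Gaussian blob scaled by its duration, so cells with
    more accumulated dwell time become "hotter".
    """
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy, duration in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += duration * math.exp(-d2 / (2 * sigma ** 2))
    return grid

# Hypothetical fixations on an unfolded wall, 10 x 6 cells
heat = dwell_heatmap([(2, 2, 0.5), (2, 3, 0.3), (7, 3, 0.2)], width=10, height=6)
hottest = max(((x, y) for y in range(6) for x in range(10)),
              key=lambda c: heat[c[1]][c[0]])
```

Weighting each blob by fixation duration, rather than counting fixations, makes the map show dwell time, which is the quantity the heatmaps in the text visualise.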
In line with this difference in total viewing duration, dwell time was shorter in the physical environment than in the virtual environment (M Physical = 76.02 ± 11.59 s, M VR = 97.68 ± 11.54 s), but the difference did not reach statistical significance (F(1, 52) = 1.71, p = 0.197, ηp² = 0.032; Fig. 4b). Similarly, the total fixation count was smaller in the physical environment than in the virtual environment (M Physical = 322.08 ± 42.32, M VR = 579.13 ± 69.53), and this difference was statistically significant (F(1, 52) = 8.82, p = 0.005, ηp² = 0.145; Fig. 4c). Lastly, average fixation duration was substantially longer in the physical environment than in the virtual environment (…; Fig. 4d). Taken together, these initial analyses showed that participants seemed to be slightly more engaged with the virtual installation than with the physical installation, which might be attributed to several differences between the two conditions, including a possible novelty effect of VR, as suggested by the questionnaire data showing that most participants had not used VR previously, and the presence of other visitors in the physical installation, among other possible distractions. The substantial difference between average fixation durations suggested a shift in general viewing strategy: visitors seemed to rapidly scan the VR environment, with shorter intervals of attentional focus as reflected by shorter fixations, compared to the physical environment.
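The reported effect sizes can be cross-checked from the F statistics and degrees of freedom using the standard identity ηp² = F·df1 / (F·df1 + df2). The same reported means also give a rough average fixation duration per environment as mean dwell time divided by mean fixation count. This Python sketch uses only numbers stated above; note that a ratio of group means is only an approximation of the per-participant averages analysed in Fig. 4d.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F test: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)

# Environment effects reported in the text
dwell_eta = partial_eta_squared(1.71, 1, 52)   # total dwell time
count_eta = partial_eta_squared(8.82, 1, 52)   # total fixation count

# Rough average fixation duration (s) from group means: dwell / count.
# This is a ratio of means, not the per-participant mean shown in Fig. 4d.
avg_fix_physical = 76.02 / 322.08
avg_fix_vr = 97.68 / 579.13
```

Both recovered values round to the reported 0.032 and 0.145, and the rough averages (about 0.24 s physical vs 0.17 s VR) point in the same direction as the shorter fixations reported for VR.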
Spatial distribution of area-normalized dwell time. Each comparison of area-normalized dwell time (described as attentional density, accounting for sampling at chance to correct for the relative size of areas, and calculated as cumulative fixation duration divided by the given AOI area as a fraction of the total AOI area) was analysed using a separate linear mixed-effects model. For comparison 1 on room elements, all AOIs were used in the analysis, including AOIs belonging to outside areas visible through window and door openings. … only AOIs on three pieces of furniture were used in the analysis. Note that the cupboard, as one of these furniture elements, had four additional AOIs forming its frame or profile in the VR condition compared to the physical installation, since the cupboard was constructed as a 3D object in VR but rendered only as a flat 2D surface in the physical installation. Also note that each individual rectangular panel of the room was defined as a single AOI, and these were then combined into sets for a given analysis: for example, all four blue panels, as four distinct AOIs in the room, constituted the blue condition for comparisons 2 and 3 on colour types and individual colours, all panels on the ceiling constituted the ceiling condition for comparison 4 on cube surfaces, etc. Post hoc comparisons using t-tests were Bonferroni corrected; the significance level, denoted by α, was set to 0.05; and Bonferroni-corrected p-values (the observed, unadjusted p-values multiplied by the number of comparisons made) were reported for determining significance (p corrected < α) for all results. Area-normalized dwell times comparing the physical and virtual environments are shown as boxplots in Fig. 5a-e. Lastly, note that the spatial distribution of absolute dwell time is shown as a supplementary analysis in Supplementary Fig. S14.
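The Bonferroni procedure described above, multiplying each unadjusted p-value by the number of comparisons and testing the result against α = 0.05, can be sketched in Python as follows. Capping adjusted values at 1 is a common convention that the text does not mention, and the raw p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: multiply each raw p-value by the number of tests.

    Adjusted values are capped at 1 (a common convention); returns the adjusted
    p-values and, for each, whether it is significant at the given alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]

# Hypothetical raw p-values from three post hoc t-tests
adjusted, significant = bonferroni([0.004, 0.02, 0.3])
```

The second test illustrates the cost of the correction: a raw p = 0.02 becomes 0.06 after adjustment and no longer passes α = 0.05.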
Comparison 1 on room elements: A significant difference between room elements was found, and differences were also observed between environments and in terms of an interaction: F (… on furniture (M Furniture = 185.70 ± 17.69 s/%) was higher than that for both surfaces (M Surfaces = 71.60 ± 7.46 s/%) and outside (M Outside = 115.00 ± 13.09 s/%): t(28.7) = 7.79, p < 0.001, and t(28.6) = 5.11, p < 0.001, respectively. In terms of environments, the area-normalized dwell time in VR (M VR = 148.64 ± 13.04 s/%) was longer than in the physical installation (M Physical = 93.50 ± 9.07 s/%). After normalizing for area size, the overall density of visual attention was higher in VR than in the physical environment, in line with the similar but non-significant trend observed for absolute dwell time. Here, the difference between the VR and physical environments reached statistical significance, mainly due to the increased weighting of furniture and outside in the analysis, and also due to the minor area size differences between the physical and VR versions mentioned previously. Similarly, since the surface areas of furniture and outside were relatively small compared with those of the room surfaces, area normalization changed the trend between room elements such that visitors attended significantly more densely to the furniture of the installation than to surfaces or outside, irrespective of environment. When the interaction was broken down by environment to check whether environmental differences existed for any level of room elements, normalized dwell time for furniture and outside was significantly higher in VR than in the physical environment (p = 0.041 and p = 0.011, respectively), whereas no such difference was present for surfaces (p > 0.05).
When the interaction was broken down by room element, to check whether the effect of room element differed between the VR and physical environments, some trend changes were also visible: for example, the normalized dwell time difference between surfaces VR and outside VR was significant (p < 0.001), whereas the difference between surfaces VR and outside Physical was not (p > 0.05), suggesting that for some pairs the size of the dwell time difference depended on the environment. Note that since the surface areas of both furniture and outside were relatively larger in the VR condition, area normalization enhanced the dwell time differences between environments and in terms of an interaction, whereas no significant difference was observed for absolute dwell time (compare Fig. 5a and Supplementary Fig. S14a). Comparison 2 on colour types: A significant difference between colour types was found, but no difference was observed between environments or in terms of an interaction: […] The trend stayed the same compared to absolute dwell time, since the walls were roughly the same size within the cuboid rooms. When the interaction was broken down by environment, to check whether environmental differences existed at any level of cube surfaces, all six post hoc comparisons yielded non-significant results (all p > 0.05).
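Breaking down an interaction in the two ways described above amounts to running a family of pairwise post hoc tests and Bonferroni-correcting the p-values within that family. The sketch below only enumerates the two kinds of pairs and applies the correction. It assumes the raw p-values come from elsewhere (for example, t-tests on model estimates) and that each breakdown is treated as its own family; that grouping is an assumption, and the function names and p-values are hypothetical.

```python
from itertools import combinations


def simple_effect_pairs(levels, envs=("VR", "Physical")):
    """Two ways of breaking down a level x environment interaction."""
    # (a) Environment effect within each level, e.g. furniture VR vs furniture Physical.
    within_level = [((lvl, envs[0]), (lvl, envs[1])) for lvl in levels]
    # (b) Every pair of cells, e.g. surfaces VR vs outside Physical.
    cells = [(lvl, env) for lvl in levels for env in envs]
    all_cell_pairs = list(combinations(cells, 2))
    return within_level, all_cell_pairs


def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: raw p times the number of tests, capped at 1.

    p_values: {comparison_label: raw_p}; the dict is treated as one family.
    Returns {comparison_label: (corrected_p, significant)}.
    """
    m = len(p_values)
    corrected = {k: min(1.0, p * m) for k, p in p_values.items()}
    return {k: (p, p < alpha) for k, p in corrected.items()}
```

With three room elements, the within-level breakdown gives three comparisons and the all-cells breakdown gives fifteen. With six cube surfaces, the within-level breakdown gives the six comparisons mentioned above, so each raw p-value in that family would be multiplied by six.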
The interaction became apparent only when it was broken down by cube surface, to check whether the effect of cube surface differed between the VR and physical environments. In this approach, some trend changes were visible: for example, the normalized dwell time difference between ceiling VR and south-wall VR was significant (p < 0.001), whereas the difference between ceiling VR and south-wall Physical was not (p > 0.05), suggesting that for some pairs of cube surfaces the size of the dwell time difference depended on the environment.
Comparison 5 on furniture: A significant difference between types of furniture was found, but no difference was observed between environments or in terms of an interaction: F (2, 31.09) = 6.28, p = 0.005; F (1, 30 […]. Note that since the surface areas of the furniture differed slightly between the VR and physical conditions (in VR, for example, the cupboard had four additional AOIs and therefore more surface area), area normalization diminished the dwell time difference between environments (p = 0.058), whereas a significant difference (p = 0.009) had been observed for absolute dwell time (compare Fig. 5e and Supplementary Fig. S14e).

Discussion
The main research objective was to develop a methodology to assess active exploration patterns in visual arts experience and, more specifically, to take a first step towards exploring how an artwork's presentation medium, physical or virtual, affects this experience. We focused on one example artwork and tested a limited number of participants; indisputably, future work should draw on more targeted and possibly larger samples and a wider spectrum of artworks. Nevertheless, we are dealing with a large data set of approximately 25,000 fixations, each of which represents a single, albeit small and relatively unconscious, decision about the artwork. Our case study also aimed to demonstrate that empirical approaches can contribute meaningfully to the understanding of art appreciation and its delivery through different media. To the best of our knowledge, this is the first direct, quantitative experiment to compare a real-world aesthetic experience side-by-side with its VR counterpart. A major empirical justification for this research is the communication of historic and contemporary visual arts to remote audiences 69-72, especially given the novel trends in presenting art to remote audiences in the wake of the COVID-19 pandemic. Our main conclusion from the overall results was that, when engaging with a spatial art installation derived from Mondrian's design, participants showed predominantly similar viewing patterns on average in the physical and virtual environments, as indexed by eye-tracker gaze data.
Our assessment of similarity was based on the absolute and area-normalized dwell time analyses, which showed mostly non-significant main effects of environment and no significant pairwise differences between the physical and VR versions for any of the significant interactions, except for absolute dwell time on some furniture elements. (Note, however, that the furniture occupied only about ten percent of the surface area of the whole installation, and that the most prominent design difference between the physical and virtual installations also concerned the furniture: in particular, the cupboard was a 3D piece in the virtual installation but a 2D projection in the physical one.) In line with our expectations, our findings favour the null hypothesis, since no major difference was observed in visual exploration patterns between the in-situ and VR conditions. It is important to restate that null results do not necessarily indicate the absence of an effect or a difference, and they might be prone to over-interpretation; the findings should therefore be regarded as preliminary evidence in need of further research and converging results.
Potential drivers of some particular gaze trends should be considered: (1) Irrespective of the viewing context, chroma-containing colours attracted denser visual attention than luminance-only colours, as indexed by normalized dwell time. If the abstract nature of the installation and the minimal semantic information available to the observer indicate that participants' visual attention was mostly driven by bottom-up factors, then we can argue that even an elementary saliency map based on colour or contrast should strongly influence which locations are attended (see Supplementary Fig. S15 for exemplar saliency maps generated using the Itti algorithm 73 and using histogram contrast). (2) The most prominent semantic information available was the type of furniture, but only if a participant was able to attribute objecthood to the rather atypical furniture elements in the room. In this sense, object-based attention, a higher-level cognitive process often studied together with scene perception and the semantically driven saliency maps associated with it, can help to explain some observed behaviours: for example, among the furniture AOIs, the cupboard in the physical condition received the least dwell time. Although it consisted mainly of yellow and black patches, the cupboard was a flat 2D surface in the physical setting but not in VR, which might have reduced participants' ability to recognize it as a piece of furniture and therefore diminished object-based attentional guidance. (3) In terms of the six surfaces, although the ceiling attracted the least attentional density as indexed by normalized dwell time, no statistically significant difference was observed between the four cardinal walls (N-E-S-W).
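The histogram-contrast idea mentioned above can be sketched in a few lines: a colour's saliency is its frequency-weighted distance to every colour in the image, so rare colours that differ from the dominant ones score highest. This simplified sketch assumes the image is already quantized to a handful of colours and measures distance in plain RGB. The widely cited formulation by Cheng et al. works in the CIELAB colour space and adds quantization and smoothing steps, so this is an approximation and not the procedure behind Supplementary Fig. S15. The function name is hypothetical.

```python
from collections import Counter
from math import dist


def histogram_contrast_saliency(pixels):
    """Per-pixel saliency from global colour-histogram contrast (simplified).

    pixels: 2D list of RGB tuples, assumed already quantized to few colours.
    S(c) = sum_j f_j * ||c - c_j||, where f_j is the relative frequency of colour j.
    Returns a 2D list of saliency values scaled to [0, 1].
    """
    flat = [px for row in pixels for px in row]
    freq = Counter(flat)
    n = len(flat)
    # Saliency is computed once per distinct colour, not once per pixel.
    colour_sal = {
        c: sum(cnt / n * dist(c, cj) for cj, cnt in freq.items())
        for c in freq
    }
    hi = max(colour_sal.values()) or 1.0  # uniform image: avoid division by zero
    return [[colour_sal[px] / hi for px in row] for row in pixels]
```

On a mostly white, Mondrian-like grid, the dominant white receives the lowest saliency and the rare coloured or black patches receive the highest. Which of the rare colours ranks first depends on the colour space used for the distance, which is one reason the choice of colour space matters for this kind of bottom-up map.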
This non-significant effect of cardinal direction was also present in a similar previous pilot study 29, in which we used a mobile eye-tracker within another abstract installation consisting of coloured parallelograms covering all four walls of a gallery room. A related observation from the present experiment, in both the physical and VR conditions, was that whilst moving within the installation, participants tended not to rotate continuously and did not complete any full rotations in either the clockwise or anti-clockwise direction. Put differently, the cumulative sum of a participant's rotation in the axial plane parallel to the floor was almost always ± 180° in the physical installation, since participants entered and exited through the same door, and very often within the range of ± 180° in VR. We speculate that this self-restriction on rotation might have equalised the distribution of visual scans across cardinal directions, and therefore the normalized dwell time for those directions. Although raw data recorded from the simple gyroscopes in eye-trackers without precise motion tracking are not suitable for comparison, participants' general movement, such as gait dynamics, may change depending on exposure to the environment 74. Here, as an anecdotal observation, participants naive to VR tended to move more carefully or slowly in VR than in the physical world; a major factor might be the lack of visual bodily cues in VR (see also Supplementary Fig. S16 for exemplar motion trajectories in VR).
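The cumulative rotation described above can be computed from a sequence of yaw (heading) samples by summing successive heading changes, each wrapped into [-180°, 180°), so that crossing the 0°/360° boundary is not mistaken for a near-full turn. This is a hypothetical sketch: as noted, the gyroscope data from the eye-trackers were not suitable for such comparisons, so it assumes clean, densely sampled yaw angles, such as head orientation reported by a VR headset. The function names are illustrative.

```python
def wrap_deg(angle):
    """Map an angle difference (degrees) into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def cumulative_rotation(yaw_deg):
    """Signed cumulative rotation (degrees) in the floor plane.

    yaw_deg: successive heading samples in degrees.
    Assumes the head never turns by 180 deg or more between two samples;
    otherwise the direction of that step is ambiguous.
    """
    return sum(wrap_deg(b - a) for a, b in zip(yaw_deg, yaw_deg[1:]))


def full_turns(yaw_deg):
    """Number of complete 360-degree circles, regardless of direction."""
    return int(abs(cumulative_rotation(yaw_deg)) // 360)
```

Under this definition, a visitor who enters, turns around, and leaves by the same door accumulates roughly ±180°, whereas every complete circle adds ±360°.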
In terms of experimental validity, most (if not all) empirical research in vision science has to make an inevitable trade-off between internal and external validity. Internal validity roughly refers to the strength of the link between research findings and study design, and it can be increased, for example, by minimizing confounding variables and presenting well-controlled stimuli. External validity, on the other hand, relates to the generalisability of the findings beyond the selected artificial stimulus, testing environment, or group of participants. As a related concept, ecological validity often refers to the generalisability of the findings to real-world settings 75. Here, we favoured ecological validity: although collecting gaze data with mobile and VR eye-trackers inside 1:1 scale physical and virtual versions of the artwork, in counterbalanced order and from the same group of museum visitors, aimed to preserve internal validity to some extent, our testing environment was far removed from artificial laboratory conditions, where, for example, strict control of participants' viewing distance to a well-calibrated monitor, combined with a desktop-grade eye-tracker with a higher sampling rate, is often regarded as a procedural norm. Art galleries and museums, in contrast, can be described as ecologically valid settings in which visitors' behaviour can be measured 53, and these conditions have so far not been well tested in VR.
Given the overwhelmingly similar patterns of eye movements in the two environments, our results suggest that VR can serve as a suitable proxy for the aesthetic experience in gallery and museum settings. Treating eye movements as an indicator, and as one of the few directly measurable components, of aesthetic experience during artwork viewing is a common assumption behind many previous studies: researchers often use eye-tracking as a tool to compare conditions or participant groups, asking, for example, (1) whether figure paintings and landscape paintings induce dissociable gaze patterns 76, (2) whether expert and non-expert participants in visual arts differ in oculomotor measures 77, or (3) whether the overlap of museum visitors' viewing pathways on two paintings can be indexed and compared 78. Additional measures can also be incorporated; researchers can ask, for example, whether motion capture alongside eye-tracking, while museum visitors who are trained dancers or non-dancers view a figurative sculpture, can be a feasible metric for aesthetic and kinesthetic experience 79. Here, we used fixation maps and derived metrics such as dwell time per AOI as one potential way of comparing physical and VR museum contexts; our main justification is that the concept of fixation maps 80 allows us to quantify the similarity of eye movement traces 81. We fully acknowledge that aesthetic experience, as a highly complex process, cannot be reduced to eye movements, but nevertheless maintain that eye-movement metrics can be an essential measure for comparing viewers' interaction with an artwork. However, the assumption of the ecological validity of VR still needs more rigorous test cases before it can become a generalizable argument.
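To illustrate how fixation maps make gaze traces comparable, the sketch below builds a duration-weighted fixation map on a small grid and compares two maps with Pearson's correlation coefficient (CC), one common map-similarity measure. This is an assumption-laden illustration rather than the analysis reported here, which relied on dwell time per AOI: the grid stands in for a single unfolded surface, the Gaussian width `sigma` is an arbitrary choice, and the function names are hypothetical.

```python
import math


def fixation_map(fixations, width, height, sigma=2.0):
    """Duration-weighted fixation map on a width x height grid.

    fixations: list of (x, y, duration_s) in grid coordinates.
    Each fixation adds an isotropic Gaussian blob scaled by its duration.
    """
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy, dur in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += dur * math.exp(-d2 / (2 * sigma ** 2))
    return grid


def map_correlation(a, b):
    """Pearson correlation coefficient (CC) between two equally shaped maps."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Because CC is insensitive to overall scale, two viewers who fixate the same places in the same proportions score close to 1 even if one of them looked twice as long, which matches the idea of comparing the spatial pattern of attention rather than the total viewing time.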
We compared some basic measures that may relate to aesthetic experience in terms of attentional engagement. Apart from mostly comparable results on absolute and area-normalized dwell times, visitors spent relatively more time in the virtual environment than in the physical environment. More specifically, the main eye-tracking results showed that in both conditions participants (1) visually explored in all directions, covering all surfaces of the installation except for the ceiling, (2) preferred coloured parts of the installation over non-coloured parts, as indexed by area-normalized dwell time, and (3) often revisited the same locations, as indexed by fixation counts on a given AOI. Results from the exit questionnaire indicated overall positive feedback from participants and provided a comparison between the physical and virtual artworks, with participants generally split roughly equally between favouring the physical or the virtual version across various evaluations. Since the perception and judgment of art are highly subjective, individual differences in both gaze patterns and questionnaire responses were inevitably present. Overall, our findings suggest that, in the test case presented here, the virtual presentation of the artwork did not radically change the observers' visual exploration.
Recently, a study compared physical and virtual settings of an art gallery, focusing on using EEG and ECG to classify emotions and the type of environment 52. Relevant to our results, participants' self-assessment ratings of arousal and valence were part of that study, and almost no difference was found between the physical and virtual contexts for eight art pieces, except for the valence rating of a single piece. In another study comparing a VR museum with a 2D computer-monitor setting, no difference was found in the artworks' perceived quality and artistic quality, although the aesthetic experience of the paintings was described as more intense in VR 82. Similarly, virtual environments can enhance memorability to some degree: for example, one study investigated active and passive viewing of spherical, 360° movie clips (i.e., whether or not the viewpoint of the footage depended on the participant's head orientation) involving Rubens and Nicolas paintings displayed via a head-mounted display (HMD), and found that viewers described their impressions of the paintings as more powerful and realistic in the active viewing condition 83. Another study compared memory recall and recognition between 360° pictures displayed on an HMD and on a tablet, and its results favoured the VR display over the tablet display 84. Apart from some enhancing effects of VR, the presentation medium of the artwork seemed to induce minimal change in observers' experience.
Looking further afield than virtual art galleries, researchers have compared various VR environments with their corresponding contexts to validate the feasibility of using VR as an empirical research tool. For example, comparing user experience in physical and virtual buildings for architectural research showed that user ratings mostly did not differ between the two conditions, although some differences were present in atmospheric ratings such as boredom, attractiveness, and invitingness 85. In another study, the perceived spaciousness of a room in VR was investigated, replicating the main findings of its counterpart experiment in a physical room 86. Similarly, comparing participants' evaluations, such as perceived pleasantness, interest, excitement, complexity, and satisfaction, between physical and virtual interiors in terms of architectural and lighting design yielded no significant differences 87. In a rather different research area, comparing perceived presence, attitude towards a video game, and memory recall and recognition of brand placements across 2D, 3D, and VR versions of the game revealed higher levels of presence in VR, whereas attitude towards the game and brand recognition did not change 88. Overall, many researchers seem to echo the view that VR is a valid context for behavioural research.
Eye-tracking and oculomotor data, albeit useful tools for aesthetic research, must be used with caution 89. On the one hand, correlations between preference and gaze data such as total dwell time and first fixation imply that eye-tracking metrics can feasibly serve as indicators of observers' aesthetic judgment 90; on the other hand, observers' rather impressive ability to grasp the gist of a painting in less than a second 91 might suggest that gaze data are redundant, and the potential of fixation parameters to predict aesthetic value has also been challenged 37. In our study, we treated observers' eye-tracking data both as a measure of visual interest and as a similarity measure of aesthetic experience, assuming that similar visual input to the observer leads to a similar aesthetic experience. Linking oculomotor responses to aesthetic judgment more directly might require additional sources of data, such as continuous aesthetic ratings 92 or eye movement recordings synchronized with event-related potentials 93.
Total viewing time is inherently linked to fixation count and dwell time (i.e., total fixation duration), but not necessarily to average fixation duration: whilst an increase in viewing time can reasonably be expected to translate roughly linearly into increases in fixation count and dwell time, no radical change is generally expected in average fixation duration. Although there might be various contributing factors, including the novelty of the VR experience, our finding of a significant shortening of average fixation duration in VR (M VR = 171.31 […] aesthetic appreciation, since the intention to positively appreciate a set of paintings results in a greater number of fixations and a lower average fixation duration than the intention to evaluate them negatively 94. Alternatively, this difference between the two conditions might be interpreted as an effect of authenticity: although in our case both conditions were reconstructions of the original artwork presented in two different media, potential effects of originality (such as whether an artwork is an original, a copy, or a fake) on observer ratings and gaze behaviour have been noted previously 95, so visitors may have presupposed the VR condition to be a less authentic version of the artwork. Similarly, a potential arousal effect induced by the novelty that participants remarked upon might account for the observed difference, since outside aesthetics research, changes in arousal state are often linked to changes in gaze metrics such as average fixation duration 96, pupil size 97, or saccadic velocity 98. Additionally, compared with mobile eye-tracking, VR eye-tracking is, in theory, more robust to challenging conditions such as rapid head movements and changes in environmental illumination: these conditions might affect fixation detection algorithms and partially account for the differences in average fixation duration.
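Average fixation duration is total dwell time divided by fixation count, and both quantities depend on how the gaze stream is segmented into fixations. As a hedged illustration, the sketch below uses a basic velocity-threshold (I-VT) classifier. The mobile and VR eye-trackers in this study used their own detection algorithms, which are not described here; the 30 °/s threshold and 60 ms minimum duration are assumed values, and the speed calculation ignores head motion, which real head-mounted data would need to account for. Function names are hypothetical.

```python
import math


def ivt_fixations(samples, velocity_threshold=30.0, min_duration=0.06):
    """Velocity-threshold (I-VT) fixation detection.

    samples: list of (t_s, azimuth_deg, elevation_deg), sorted by time.
    velocity_threshold: deg/s below which a sample pair counts as fixation.
    Returns a list of (start_s, end_s) fixation intervals.
    """
    fixations, start, end = [], None, None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        # Small-angle approximation of angular speed between consecutive samples.
        speed = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        if speed < velocity_threshold:
            if start is None:
                start = t0
            end = t1
        elif start is not None:
            # A saccade ends the current fixation; keep it if long enough.
            if end - start >= min_duration:
                fixations.append((start, end))
            start = None
    if start is not None and end - start >= min_duration:
        fixations.append((start, end))
    return fixations


def fixation_summary(fixations):
    """Return (fixation count, dwell time in s, average fixation duration in s)."""
    durations = [e - s for s, e in fixations]
    dwell = sum(durations)
    count = len(durations)
    return count, dwell, (dwell / count if count else 0.0)
```

Feeding in a synthetic 100 Hz trace that alternates between two gaze positions every 100 ms, and then the same rhythm for twice as long, doubles the fixation count and the dwell time but leaves the average fixation duration unchanged. Under this toy model, a longer viewing time alone does not shorten average fixation duration; only a change in the underlying rhythm or in the segmentation does.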
Previous research also indicated that Mondrian's abstract painting entailed a high amount of visual search, as indexed, for example, by the number of saccades compared with other paintings 99. If we were to treat dwell time on AOI sets as an indicator of visual search, our results would suggest that the physical and VR conditions also resulted in mostly similar visual search strategies during visual exploration. Speculatively, particular differences between the two settings in general viewing, such as average fixation duration (or total dwell time, albeit non-significant), might be linked to the current state of VR. VR was perceived as a novel experience by the participants, and this novelty might be linked to, for example, spending more time in the virtual installation. Over time, the resemblance between physical and virtual galleries can only be expected to increase, and with diminished novelty effects, more comparable general results might be expected in future studies.
Although promising results and valuable insights were obtained, comparing physical and virtual art spaces is still in its early stages, and our research did not aim to provide fully comprehensive answers and explanations. Conducting a comparative experiment using two parallel, equally valid reconstruction models in a museum setting can be seen as a unique opportunity, but our findings on the similarity of gaze patterns for a single, very specific example of an abstract art installation, with a particular population sample, do not justify bolder conclusions and generalizations about the validity of the VR context, especially without further behavioural measurements. First and foremost, most of our participants were regular art gallery and museum visitors, but many were not familiar with VR. Therefore, the extent of some visitors' surprise, especially during the VR condition, their awareness of wearing the mobile eye-tracker or the VR headset, and the possible influence of these aspects on exploration patterns remain unclear. A training phase for both wearable devices in future experiments might reduce novelty effects and the remaining discrepancy between conditions to some extent. There is clearly enormous potential for more comprehensive work, both in the variety of methods and in the scope of the arts presented. For example, to increase inter-stimulus consistency, a rigorous digitization workflow combining a 3D laser scanner with colorimeter measurements could form the basis of the virtual counterpart of any given static installation, preferably followed by colour calibration of the VR HMD, which would also require additional psychophysical testing.
The methodological workflow might also include comparing gaze patterns with body motion, as indexed by gyroscopes, during the experiment 100; alternatively, a change of experimental design might allow precise control of motion and viewing duration, at the cost of reduced freedom (see Supplementary Fig. S17). The concept of peripersonal space 101 might also help to develop a more comprehensive theoretical perspective. In future research, it would be useful to compare a complete exhibition between physical and virtual environments, instead of just a single artwork. For the physical condition, a complete exhibition, as a set of selected artworks in a dedicated gallery space, might be provided. For the virtual condition, a well-controlled, exact digital replica of the physical exhibition might be created; ideally, an untethered HMD with inside-out position tracking would allow visitors to walk within the virtual exhibition without constraints and without relying on alternative forms of VR locomotion such as teleportation. Additionally, an augmented reality (AR) version of the same exhibition would allow a three-way comparison between physical, VR, and AR conditions. Interacting with artworks as stimuli might allow more fine-tuned research questions, related to memorability 102,103 or to the effects of haptic feedback and visual depth cues 104. 3D saliency maps, as extensions of 360° saliency maps 105, might be investigated to describe the extent of bottom-up influence of the environment on gaze behaviour. In terms of further data analysis, investigating temporal dynamics 106-108 might provide more in-depth results, using tools such as temporal scan path analysis or methods adapted from graph theory and related fields (see Supplementary Fig. S18 for an initial exploration of such directions).
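As a hypothetical illustration of the scan path and graph-theoretic directions mentioned above, a fixation sequence labelled by AOI can be collapsed into visits, turned into the weighted edges of a directed AOI transition graph, and compared between two viewings with a string-edit (Levenshtein) distance, a classic scan path comparison measure. The AOI labels and function names are illustrative, and this is not a reproduction of any analysis in this study.

```python
from collections import Counter
from itertools import groupby


def collapse(aoi_sequence):
    """Merge consecutive fixations on the same AOI into a single visit."""
    return [aoi for aoi, _ in groupby(aoi_sequence)]


def transition_counts(aoi_sequence):
    """Edge weights of a directed AOI transition graph (no self-loops)."""
    visits = collapse(aoi_sequence)
    return Counter(zip(visits, visits[1:]))


def levenshtein(a, b):
    """String-edit distance between two AOI visit sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            # Cost of deletion, insertion, or substitution (0 if labels match).
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]
```

The transition counts capture which surfaces visitors tend to move between, independent of timing, while the edit distance gives a single number for how differently two viewings were ordered; comparing such measures between the physical and VR conditions would be one way to extend the dwell time analysis to temporal dynamics.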
As we step inside the world of virtual museums and gallery spaces, the directions VR may take in terms of artistic expression, digital heritage, and empirical research remain wide open. Despite the need for more comprehensive future studies, our research can be seen as an important and promising starting point for comparing aesthetic experience between virtual and physical environments.

Data availability
All anonymised data are accessible via Open Science Framework for anyone who would like to re-analyse the data or run any form of additional analyses: www.osf.io/bgtpy.