Introduction

A novel genre of videos, labelled “naked-eye virtual reality” (VR), has gained viral status on Bilibili, one of China’s most popular video streaming platforms. The creators of these videos contend that spectators can perceive a stereoscopic effect on a two-dimensional (2D) screen without requiring a VR headset. Three distinct types of naked-eye VR videos have attracted Bilibili audiences. The first is a roller coaster ride that transports spectators from ocean depths to the sky. The second simulates a spacecraft journey that propels spectators into outer space through wormholes, sometimes even taking them into the maw of a black hole. The third introduces spectators to canonical works of art history by allowing them to step into the canvas or scroll. In contrast to the first two types, which transport spectators to spatial realms characterised by volumetric depth, the third type creates the illusion of a three-dimensional (3D) world on a 2D surface. Rather than enabling spectators to ascend to the sky or dive deep into the ocean, it generates the illusion of penetrating the painting’s flat surface. Therefore, the aesthetics of naked-eye VR can be summarised as 2D moving images that fool the naked eye using 3D depth cues.

Our perception of the 3D world in which we live is the fusion of two images captured by the naked eyes. The depth cue that the brain extracts is the difference between the two images (Urey et al. 2011, p.541); this depth cue is referred to as “binocular disparity”, the disparity between the images of an object perceived by the two eyes. Wheatstone (1962) conceptualised binocular disparity, that is, the differences in distance between the observed images due to the interocular distance, as “horizontal parallax”. These differences result in an apparent horizontal shift in the position of the same object (Hattler and Cheung 2023, p.18). Horizontal parallax and binocular disparity were first observed by Euclid, and Wheatstone took advantage of them to invent the first stereoscopic device in 1852 (Wade 2002, p.913). They also allow the generation of stereoscopic vision in 3D films (Hattler and Cheung 2023, p.18), as well as 3D layouts in VR games (Aizenman et al. 2023, p.2).
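
The geometry behind this depth cue can be illustrated with a simple pinhole model. What follows is a minimal sketch rather than a model drawn from the cited sources: the function name, the interocular distance of 0.065 m, and the unit focal length are assumptions chosen for the example, and the point is assumed to lie straight ahead of the viewer.

```python
def horizontal_disparity(depth_m, interocular_m=0.065, focal=1.0):
    """Horizontal disparity of one point under an assumed pinhole model.

    Assumed setup: the eyes sit at x = -e/2 and x = +e/2 and look straight
    ahead; the point lies straight ahead at distance depth_m.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    # Horizontal image position of the point in each eye's image.
    x_left = focal * (interocular_m / 2) / depth_m
    x_right = focal * (-interocular_m / 2) / depth_m
    # The difference simplifies to focal * interocular_m / depth_m.
    return x_left - x_right


near = horizontal_disparity(0.5)  # a point half a metre away
far = horizontal_disparity(5.0)   # a point five metres away
```

In this sketch the nearer point produces the larger disparity, which is why the difference between the two images can serve as a cue for depth.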

Situated in an intermediate stage between the stereoscope, a device that predates the birth of cinema, and VR with a headset, naked-eye VR occupies a parallax intersection in media history. Parallax history refers to the rediscovery of obsolescent technology’s aesthetics in new media, and the aesthetics of new media in obsolescent technology. It was introduced into the field of film history by Thomas Elsaesser (2019), who suggested that the visual effects achieved by post-cinematic media could be identified retrospectively in pre-cinematic optical toys and that these obsolescent technologies anticipated the arrival of new media technologies (p.79). Naked-eye VR can be best explored through a parallax history as it sutures the conjuncture between pre-cinematic stereoscope technology and the post-cinematic technology of VR with a head-mounted display (HMD). The binocular disparity that generates a 3D illusion in naked-eye VR can be traced back to the stereoscope. However, naked-eye VR video is a post-cinematic technology that can be described as a “deficient” form of standard VR with an HMD, as it fails to provide an all-immersive environment with the illusion of a frameless screen. Nevertheless, naked-eye VR has liberated spectators from the confinement of the HMD, enabling them to choose their spectatorial position or even produce a naked-eye VR video. Therefore, the position of naked-eye VR in media history falls between retrospection and anticipation, pre-cinematic technology and post-cinematic art, and pre-VR media and post-VR culture.

What connects the stereoscope, naked-eye VR, and VR with a headset is the technical basis of “horizontal parallax”. Whissel (2016), however, proposes a concept of the “parallax effect” that is not only technological but also epistemic and affective (p.233–235). Whissel accordingly recalibrates “positive parallax” and “negative parallax”, two sub-categories of the parallax effect (p.233–235). In its original technological context, positive parallax is a situation wherein binocular focus converges behind the screen, creating the illusion that an object is situated at a volumetric depth behind the screen (Petkov 2012, p.50). Conversely, negative parallax results in an illusory protrusion effect, causing an object to appear to pop out from the screen as the binocular focus converges in front of the screen’s plane (Petkov 2012, p.50). Whissel associates positive parallax with an epistemic desire characterised by “a curious look that sees in order to know, such that the mind feels its way into the very depth of the picture”, whereas negative parallax provokes “somatic and emotional responses in spectators via the emergent 3D images” (Whissel 2016, p.235). Similarly, positive parallax in naked-eye VR promises all-seeing spectators who can look into the image without being looked at, and all-knowing subjects who can explore everything about the image without being included in it. However, in negative parallax, the image breaks the screen and the “fourth wall”, reminding spectators of the unknown space behind the screen and returning the gaze to the spectators by including them in the virtual space of the image.
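
In its technological sense, the sign of the parallax follows from similar triangles between the two eyes, the screen, and the perceived point. The sketch below is an illustration under stated assumptions, not a formula taken from Petkov or Whissel: the function names, the default interocular distance, and the simplification of a point lying straight ahead of the viewer are all hypothetical choices made for the example.

```python
def screen_parallax(object_distance_m, screen_distance_m, interocular_m=0.065):
    """On-screen separation between the right-eye and left-eye images
    of a point, derived by similar triangles.

    Both distances are measured from the viewer. The result is positive
    when the point is perceived behind the screen, zero when it lies on
    the screen plane, and negative when it is perceived in front of it.
    """
    if object_distance_m <= 0 or screen_distance_m <= 0:
        raise ValueError("distances must be positive")
    return interocular_m * (object_distance_m - screen_distance_m) / object_distance_m


def classify(parallax, tolerance=1e-12):
    """Name the parallax type from the sign of the separation."""
    if parallax > tolerance:
        return "positive"
    if parallax < -tolerance:
        return "negative"
    return "zero"


behind = classify(screen_parallax(4.0, 2.0))    # point beyond the screen
on_plane = classify(screen_parallax(2.0, 2.0))  # point on the screen plane
in_front = classify(screen_parallax(1.0, 2.0))  # point between viewer and screen
```

Under these assumptions, a point placed beyond the screen yields positive parallax and a point placed between the viewer and the screen yields negative parallax, which corresponds to the depth and protrusion effects described above.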

Naked-eye VR can be categorised as a parallax media knotting together a parallax view between two incommensurable perspectives. The standard parallax view definition concerns “a change in observational position that provides a new line of sight”, usually concerning the shifting position of an object against its background (Žižek 2009, p.17). Žižek (2009) specifies the parallax view as situated in an irreducible gap between an “objective scientific account” in the third-person perspective and a “subjective phenomenon experience” in the first-person perspective (p.10–17). In naked-eye VR, the parallax view knots together the “objective” scientific account of two flat images in slightly different positions and the “subjective” experience of spectators seeing a volumetric depth in these flat images. The spectator’s position shifts from an “objective” point of view to a “subjective” one when viewing the image in parallax, transitioning from observing a two-dimensional space in third-person perspective to experiencing a three-dimensional world in first-person perspective. Naked-eye VR can, therefore, be defined as a parallax media that mediates the parallax view between the subjective and objective perspectives, first-person and third-person perspectives, and flat surface and volumetric depth.

As a parallax media, naked-eye VR can be categorised as a door that oscillates between its function as a window into the 3D depth of a fictional world and a wall that shuts at the 2D surface of the screen. Before computers became ubiquitous, media theory was dominated by three metaphors: frames, windows, and mirrors. Eisenstein likened the moving image on the screen to a framed painting on an opaque canvas (Andrew 1984, p.12); Bazin theorised the screen as a transparent window open to the world, a perspective that informs realist film theory (Sobchack 1992, p.16); and psychoanalytic film theory conceptualised cinema as a reflective mirror that allows spectators to identify with their mirror image: the character on the screen (Metz 1981, p.51). As the world entered the age of information, Manovich observed that computer screens were fundamentally different from cinematic screens because they oscillated between depth and surface, functioning as both a window into an illusionistic space and a flat control panel (Manovich 2002, p.41).

Mitchell (2015) drew a similar spectrum of a screen as a wall or window and suggested that screens can operate as walls that project the image on themselves and also function as windows transmitting visual information through themselves (p.233–235). Mitchell (2015) specified that walls symbolise the screen’s surface on which the images are projected, while windows signify an open doorway behind the screen to the concrete object in the image and the social context surrounding the image (p.233–235). Sandifer (2011), Hattler and Cheung (2023), and Gao and Jin (2021) proposed that “window frames”, which separate the extra-diegetic space in which spectators are situated from the diegetic space inside the screen, disappear in stereoscopic technology. Zhou (2023) further argued that the HMD of VR has shifted the screen’s operational logic from the “frame” that demarcates the real and the virtual space to the “case” that contains the spectator’s vision in a frameless space (p.19). Rogers (2019) also highlights the VR screen’s capacity to surround, envelop, or enclose spectators in a “container” (p.139). However, Rogers (2023) believes that the frame separating the space on-screen and off-screen remains, despite the disappearance of the rectangular frame of the screen (p.269).

In naked-eye VR, however, the frame reappears at the beginning of the video as the limit between the left and right eyes, and even without the frame separating the vision of the two eyes, the rectangular frame demarcating the screen’s boundary remains. Only when spectators adapt their vision to the horizontal parallax does the frame momentarily disappear, allowing spectators to see a frameless stereoscopic space. The frame’s appearance and disappearance can be best categorised as a “door”, theorised by Siegert (2012) as the symbolic threshold between inside and outside and an epistemic divider separating two worlds (p.10). Siegert (2012) also drew an analogy between a door, gate, and bridge, each suggestive of a pathway to a hypothetical space beyond (p.10). As in the case of naked-eye VR, the opened door operates as a window opening to a space beyond the screen and a stereoscopic space beyond the 2D images demarcated by the frame, while the closed door is akin to a wall that represents the limit of spectators’ scopic and epistemic desires and a frame that frames spectators’ vision within the 2D image. Therefore, naked-eye VR is a threshold between a door opened to the volumetric depth from a subjective perspective and shutting at the 2D surface in the objective account.

In what follows, I elaborate on the operational logic of naked-eye VR as a door between the parallax view of the surface and volumetric depth, looking and being looked at, the known and the unknown. While traditional VR promises a frameless space wherein spectators are all-seeing and all-knowing, naked-eye VR acknowledges the screen’s frame as the threshold of the parallax view, highlighting the stereoscopic illusion generated by flat images perceived by the two eyes. The paintings in naked-eye VR further exemplify the unexpected encounter between the stereoscopic technology of VR, which highlights the illusion of volumetric depth, and the surface appearance of a canvas or scroll. The volumetric depth of the scene promises all-seeing and all-knowing omniscience, whereas the surface marks the end of this scopic and epistemic exploration, limiting visibility and knowability. Therefore, the screen in naked-eye VR functions as a door: when closed, it stops visual and cognitive exploration at its surface; when opened, it reveals the image’s volumetric depth and unlocks speculative imagination beyond the screen.

Horizontal parallax: between the left and the right eye

The technological basis of the stereoscopic effect lies in horizontal parallax, which utilises binocular disparity. The Holmes Stereoscope, the most popular 19th-century stereoscope, places two slightly different images of the same object photographed from different angles side-by-side on a stereo card (Fig. 1). These images are then fused using a stereo lens to produce a 3D effect. Horizontal parallax relies on the separation of a single vision of an object into two different points of view, which are then merged by the right and left eyes. Autostereoscopic displays, such as the Parallax Barrier invented by Berthier in 1896 and Lenticular Lenses invented by Hess in 1915, also take advantage of horizontal parallax (Chen et al. 2022, p. 429). Another commonly used stereoscopic technology is depth-fused 3D, which renders the same image on two overlapping screens at different depths and then fuses the two images using binocular disparity as a depth cue (Lee et al. 2007, p.192). Depth cues generated by binocular disparity are also key to VR HMDs. Images of an object are presented to the left and right eyes at slightly different angles, but spectators are not aware of binocular disparity when their vision is enclosed by an immersive HMD screen in an illusory 3D space (Luo et al. 2018, p.1545).

Fig. 1: The stereo card.

A woman looks through a stereoscope. Source: Underwood & Underwood (1901) The stereograph as an educator. [Image] Library of Congress Prints and Photographs Division, Washington, DC 20540.

While HMD devices hide the visual cues associated with horizontal parallax behind the screen, naked-eye 3D technology prominently displays these cues on the screen. For instance, an outdoor LED screen on Taiguli in Chengdu features a spacecraft that appears to fly out of the screen, facilitated by the binocular parallax of the two sides of the L-shaped screen (Fig. 2). The L-shaped screen also forms a false frame on the right, which is completely black and indistinguishable from the adjacent building’s black wall. When the spacecraft image moves to the screen’s right border, it appears as though it has flown out of the screen. Similarly, some video creators display a parallax barrier and feature objects moving in and out of it with horizontal parallax. For instance, in naked-eye 3D adaptations of a video game war scene, a bullet appears to fly through onscreen barriers, creating the illusion that it pops out towards the spectator (Fig. 3). By displaying the barriers and frames used to create the parallax, naked-eye 3D technology generates the illusion of breaking the fourth wall, acknowledging the spectator’s role in the protrusion effect.

Fig. 2: Spacecraft on an L-shaped screen.

A spacecraft that seems to fly out of the window. Source: Chengnanxiaoyue (2020) Spacecraft flying out of the naked-eye 3D screen at Taiguli, Chengdu. Available at: https://www.bilibili.com/video/av585271361.

Fig. 3: Bullet scene.

A bullet that appears to break the screen. Source: Mixiaoguoya (2021) 4K naked-eye 3D. Available at: https://www.bilibili.com/video/BV1Uv411T7Ng.

Naked-eye 3D videos actualise the potential for spectator participation in the parallax effect. These videos train spectators to become parallax machines, enabling them to perceive stereoscopic images through corporeal engagement without using an HMD. Naked-eye 3D videos present two slightly different images of the same object side by side. Spectators are guided to perceive a single stereoscopic image from the two flat images using either the “parallel eye” method or the “cross-over eye” method. The “parallel eye” method requires spectators to place a partition, such as a finger, book, or piece of cardboard, between the eyes. This ensures that the right eye views the image on the right-hand side and the left eye views the image on the left-hand side (Fig. 4). Spectators are instructed to adjust their gaze until the two images converge, at which point the partition is removed. The “cross-over eye” method requires spectators to look at the image on the left-hand side with the right eye and vice versa, and then to relax their gaze until the images converge (Fig. 5). Occasionally, videos provide a “special aid”, such as two dots on each image, and instruct spectators to focus on a single point until the two dots converge (Fig. 5). Both methods train spectators to transform their bodies into a parallax machine, which first separates the binocular vision and then converges the stereo pairs observed by the two eyes into a single 3D image, generating a stereoscopic image with unaided eyes. This reveals the operational logic of a VR HMD and simultaneously enlists the spectator in constructing one.
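
The difference between the two viewing methods can also be expressed as a difference in how the stereo pair is laid out on the frame. The following is a minimal sketch, not a description of how any of the Bilibili videos were produced: the function name and the representation of an image as a list of pixel rows are assumptions made for illustration.

```python
def side_by_side(left_view, right_view, method="parallel"):
    """Join two equally sized views (lists of pixel rows) into one frame.

    "parallel": the left-eye view sits on the left and the right-eye view
                on the right, so each eye looks at its own side.
    "cross":    the halves are swapped, so each eye looks across to the
                opposite side of the frame.
    """
    if len(left_view) != len(right_view):
        raise ValueError("views must have the same number of rows")
    if method == "parallel":
        first, second = left_view, right_view
    elif method == "cross":
        first, second = right_view, left_view
    else:
        raise ValueError("method must be 'parallel' or 'cross'")
    # Concatenate each pair of rows so that the two views share one frame.
    return [row_a + row_b for row_a, row_b in zip(first, second)]


left = [["L1", "L2"], ["L3", "L4"]]    # stand-in for the left-eye view
right = [["R1", "R2"], ["R3", "R4"]]   # stand-in for the right-eye view
parallel_frame = side_by_side(left, right, "parallel")
cross_frame = side_by_side(left, right, "cross")
```

In this sketch the only difference between the two layouts is the order of the halves; the rest of the work, separating and then fusing the two views, is left to the spectator’s eyes.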

Fig. 4: Parallel eye instruction.

A step-by-step instruction for the parallel eye. The image is slightly modified to improve the clarity. Source: Qiongwanke (2021) Instruction for Parallel eyes. Available at: https://www.bilibili.com/read/cv11435825.

Fig. 5: Crossover eye instruction.

Source: Feelingjun (2019) Instruction for naked-eye 3D. In: bilibili. Available at: https://www.bilibili.com/video/BV164411q71n.

In contrast to VR viewed with an HMD, which conceals the screen’s boundary to create an illusion of limitless space, naked-eye VR acknowledges the limits of vision and anticipates volumetric depth beyond those limits. As Gao and Jin (2021) argue, the key factor distinguishing VR from cinema is the screen frame. In cinema, the real, off-screen world is separated from the fictional on-screen world by a frame that defines the screen’s limits on the x-y axis (p.169). Conversely, a VR HMD replaces the screen’s frame with a 360-degree panorama projected by the headset, promising limitless space along the z-axis. Paradoxically, naked-eye 3D accentuates the “frame” between the images seen by the left and right eyes, which remains invisible to spectators in everyday lived experience, to generate a depth cue that extends along the z-axis. This concept is exemplified by a naked-eye 3D remix of a series of Van Gogh paintings, including The Starry Night, The Yellow House, and Vincent’s Bedroom in Arles. The video begins with The Starry Night hanging on a wall, quickly zooms into the houses in the painting’s background, lands on the street, and enters the Yellow House at the end of the street and Vincent’s bedroom before finally jumping outside the window to gaze upon The Starry Night once again.

The video presents two nearly identical images side by side and invites spectators to view the screen using the “parallel eye” method (Fig. 6). The video has been uploaded to the Chinese video streaming website Bilibili, where spectators’ comments, known as “bullet subtitles”, are displayed onscreen. As discussed in the bullet subtitles, some spectators can see three images: the two original images displayed side by side and a stereoscopic image emerging from the screen that is only visible with parallel vision. This demonstrates that spectators are fully aware of the frame between the left and right eyes, as evidenced by the boundary between the two images upon which the horizontal parallax is based. Some spectators can even perceive the overlapping images seen by the left and right eyes when experimenting with the alternating focus of their parallel eyes. As the video nears its conclusion and the windows of Van Gogh’s bedroom are about to open to reveal the starry night outside, one spectator comments, “When I look at the window, the scenes within it appear doubled, and when I focus on the scenes in the window, the window itself appears doubled”. With the traces of horizontal parallax appearing as the “frame” between the visions perceived by the two eyes, spectators are conscious of the 3D images’ constructed nature as the emergent effects of the 2D pairs.

Fig. 6: Naked-eye 3D for Starry Night.

A screenshot of a stereo pair. Source: Stereoscope Vision (2022) Naked-eye 3D version of Van Gogh’s Starry Night. Available at: https://www.bilibili.com/video/BV1SZ4y1U78T.

Naked-eye 3D is positioned between pre-VR stereoscopic technology and post-VR participatory culture. While the VR HMD can be seen as a successor to the lenses of the Holmes stereoscope, naked-eye 3D repurposes stereo cards, liberating spectators from the fixed observational position defined by the lenses and HMDs. Every spectator can participate in constructing a 3D image with the naked eye by juxtaposing two nearly identical moving images. In participating, spectators are acutely aware that they are not unconsciously immersing themselves in virtual space, but rather observing the parallax between 2D images and 3D vision. As one bullet subtitle highlights, what strikes spectators most is the ability to “see a 3D space out of the 2D images”. Similar comments accompany the naked-eye VR version of Van Gogh’s paintings. The terms “naked-eye VR” and “naked-eye 3D” are often used interchangeably but denote slightly distinct concepts: most naked-eye 3D videos display stereo pairs that require the parallel or cross-over eye method, whereas naked-eye VR showcases a single image, enabling spectators to perceive the 3D space using normal visual focus. Additionally, in naked-eye VR, spectators can freely alter the angle from which they observe the 720-degree spherical space by sliding the screen.

Although the parallax barrier is not visible in the naked-eye VR Van Gogh video, spectators can still perceive and acknowledge the parallax between the 3D space and the 2D painting. Contrary to those who claim to see a 3D space in Van Gogh’s 2D painting, naked-eye VR spectators assert that they have entered the 2D space of the painting as 3D beings. Many bullet subtitles describe the video as a result of the “dimensionality reduction strike”, a concept Liu Cixin coined in his science fiction trilogy The Three-Body Problem to describe the process of downgrading a space to a lower dimension, with the solar system ultimately destroyed after being reduced to a 2D world at the story’s end. Through this reference, some spectators of the naked-eye VR version of Van Gogh’s paintings seek to describe the process of transforming themselves from 3D beings observing a 2D image into 2D figures navigating a 3D image. However, other spectators dispute this description in their subtitles, arguing that “it is not us who have been dimensionally downgraded but the image that has been upgraded from two to three dimensions”. Regardless of their stance, spectators experience a parallax effect as they shift between visualising 3D and 2D space.

Whether upgrading the painting to 3D or downgrading spectators to 2D, the naked-eye VR rendition of Van Gogh’s paintings can be described as a parallax media that combines the painting’s surface space with the stereoscopic space of VR, which is characterised by volumetric depth. In the video, the z-axis extends beyond the surface of a single painting through an open door and window, leading spectators from The Starry Night to The Yellow House, from The Yellow House to Vincent’s Bedroom in Arles, and from Vincent’s Bedroom in Arles back to The Starry Night. The naked-eye VR video narrative progresses in a single direction along the z-axis, trapping spectators in a first-person perspective that follows a chronological timeline; however, the spectators are free to rotate their spectatorial position at each point on the timeline in a 720-degree range. As one spectator pointed out, the spectatorial position offered by this video is akin to the first-person perspective in video games. Specifically, it resembles “rail shooter” games, in which the player’s avatar moves along a rail in a single direction, yet the player can freely rotate their perspective to shoot enemies or monsters. When viewers of the naked-eye VR experimented with the rotation, one commenter rotated the image from ground level to a low angle and suggested that the sky was particularly scary (Fig. 7). Another commenter explained that the sky is uncanny because it represents the 2D space in Van Gogh’s original painting, which is at odds with the 3D street from a perspectival standpoint. The spiral in the sky brings together the incommensurable perspectives of the 2D and 3D worlds. It also marks the limits of spectators’ vision and understanding while anticipating another world beyond spectators’ perception and comprehension.

Fig. 7: Sky in the naked-eye VR for Starry Night.

The distorted clouds in the sky. Source: 360 panorama video technology (2021) Naked-eye VR of Van Gogh’s Paintings. Available at: https://www.bilibili.com/video/BV1Zv411W7Go.

Parallax history: from the pictorial scroll to naked-eye VR

Similar to naked-eye VR—situated on the verge of the known and unknown, between what is seen and what is looked at—Chinese scroll paintings intertwine the omniscient third-person perspective with a limited first-person perspective. Chinese scroll paintings, which originated during the Han Dynasty, unfold on a continuous roll of paper or silk (Delbanco 2008). These can be divided into two categories: hanging scrolls and handscrolls. Hanging scrolls display the entire image, accommodating an omniscient spectator who views an immobile image (Wang 2022, p.4). By contrast, handscrolls present consecutive subframes that divide the image into sections (Wu 1996, p.63). Spectators must simultaneously unroll the left side with their left hand and roll the right side with their right hand (Wang 2022, p.4-5). Spectators can only view one section at a time and never see the entire painting at once (Wu 1996, p.63).

Chinese scroll paintings encapsulate spectators’ conflicting desires: to view the whole image as an omniscient spectator and to immerse oneself in it as a character with a limited, first-person perspective. The omniscient spectator yearns to assimilate all information on the image’s surface through a panoramic view. However, the handscroll conceals the sections that remain rolled up, hinting at the hidden and unseen and enticing spectators to reveal the image’s secrets (Wu 1996, p.65). This desire to delve into the image is exemplified by The Night Entertainment of Han Xizai, a handscroll illustrating a banquet hosted by Han Xizai, a minister during the Southern Tang Dynasty. The artist, Gu Hongzhong, whom the king commanded to spy on Han’s banquet, rendered five sequential events that occurred at the banquet on five successive sections of the handscroll. Each segment is demarcated by a screen and linked by Han’s presence in every scene. Viewing the five scenes sequentially, spectators embark on a journey into Han Xizai’s vast mansion, penetrating layers of screens by unrolling the handscroll (Wu 1996, p.65). Wu (1996) posited that spectators are transformed into cinematic voyeurs “wanting to see what happens, to see things unrolling” (p.65). This desire for epistemic depth heralds a stereoscopic vision that challenges the generic convention of the handscroll, which typically emphasises surface detail.

The Night Entertainment has become an intermedia screen that integrates the aesthetics of both surface and depth and the seemingly incommensurable cinematographic elements of montage and long take. The screens separating the different sections of the handscroll are reminiscent of cuts that separate different shots in a montage, whereas the uninterrupted process of viewing an unfolding image echoes the long take, which captures a changing scene in a single shot. The cinematic aesthetics of The Night Entertainment make it ideal for adaptation to moving images. This is evident in a banquet sequence in the television drama Palace of Devotion (2022) that directly refers to The Night Entertainment. The five scenes in the sequence were filmed in a single long take, with the camera moving from right to left, the direction in which a Chinese painting scroll is read. There are no cuts, dissolves, or interruptions between scenes, and the divisions between different sections of the painting are spatialised as doors, pillars, and screens, simulating a voyeuristic gaze that scans the surface of a scroll. Nonetheless, the sequence does not simply adopt the flat composition of the pictorial scroll but also frequently dollies in to capture the depth of each scene and employs a circular track to emphasise the 3D space. Therefore, the sequence highlights the extent of the surface across the x-y axis with a long take and depth along the z-axis with the dolly and track.

The paradoxical integration of surface and depth is evident in the panoramic compositions of horizontal scrolls, a prime example of which is Along the River at Qingming Festival, painted by Zhang Zeduan in the twelfth century. This scroll painting offers a panoramic depiction of Bianliang. Unlike Renaissance paintings, the Qingming scroll does not have a single vanishing point but is characterised by a panoramic scatter perspective, suggesting that it may have been painted from a high point that offered a view over a vast expanse (Yan 2012). The image composition corresponds to the view obtained by a spectator who scans a panoramic scene to capture all of it simultaneously. However, the scroll is not devoid of depth; trees and houses are arranged using a linear perspective, with multiple focal points evenly distributed across the scroll (Wang 2022, p.9). This mirrors the camera movement of a long take that pans from right to left across the scene, while zooming into a specific region of the image can cause it to appear 3D with a depth of field. In addition, Zhang Zeduan adeptly mitigated parallax distortion at the intersection of 3D and 2D perspectives by rendering fluid contours of rivers and streets.

The dimensional parallax of the Qingming scroll was designed to accommodate two viewing perspectives: one from the flat surface along the x-y axis and another from the depth along the z-axis. Dimensional parallax can be defined as the incommensurable conjunction of 3D and 2D perspectives. In contrast to The Night Entertainment, primarily exhibited as a handscroll, the Qingming scroll can be displayed as either a handscroll or a hanging scroll. The image can be hung on a wall as a panorama to give spectators a comprehensive view of the scene by stepping back, albeit at the cost of missing fine details. Alternatively, the image can be viewed as an unfolding handscroll for the voyeuristic spectator keen to delve into its intricacies. However, such a spectator can only examine one portion of the image at a time and cannot capture the entire composition while handling the scroll. Accordingly, the hanging scroll represents an omniscient perspective for spectators to see the image as a panorama, whereas the handscroll represents a limited perspective for spectators to see consecutive scenes individually.

The VR adaptation of a scroll painting accommodates the desire for an omniscient perspective by initially presenting the image as a hanging panorama and then zooming in to guide spectators into the image to see it from a first-person perspective in stereoscopic space. This is demonstrated in the VR adaptation of Zhang Daqian’s Cloud Sea of Mount Hua (1936), which depicts Mount Hua with sporadic houses, clouds on mountain peaks, two poets standing on the mountain, and poems describing the mountains. The VR video begins with the scroll presented as a panorama and then zooms into the clouds such that the peaks on both sides recede rapidly, creating the illusion that the spectator is flying into the scroll. However, the poems appear as though they were written on the scroll’s surface, and the figures of the two poets remain 2D, in contrast to the 3D space. This dimensional parallax is masked by floating clouds that seamlessly bridge the 2D space of the original scroll and the 3D space of the VR, thereby avoiding parallax distortion.

Similarly, the VR adaptation of the Qingming scroll initially offers a panoramic view of a hanging scroll in a virtual gallery before swiftly transitioning to a constrained first-person perspective with a street-level view. This shift in perspective is accompanied by a 90-degree inversion of the long take over the scroll’s horizontal axis, which morphs into a long take that penetrates the surface of the image. While the original painting reveals only one side of the houses along the river, preventing spectators from peering into courtyards and rooms, the VR adaptation dismantles the barrier between the 2D surface and the 3D space. It guides spectators towards fluidly shaped objects, such as rivers and streets, which bridge the dimensional parallax. The spectator initially lands on a riverboat and then transitions to the street, delving into the depths of a scene previously hidden behind the image’s surface.

Initially designed as a panoramic stereo, the VR adaptation of the Qingming scroll fuses two spherical images with binocular disparity, rendering the image a 3D panorama along the x-, y-, and z-axes (Fig. 8). As the video caters to spectators who use VR HMDs and peer through fisheye lenses, binocular parallax is discernible to those viewing the image with the naked eye. Notably, it is this visibility of binocular distortion that transforms the experience into a form of naked-eye 3D video, provided that the spectator watches it with “parallel eye” or “crossover eye”. In one comment, a spectator instructed others to form circles with their hands, combine them, and peer through the gaps between their fists. By doing so, the spectators’ eyes cross, and the brain merges the spheres on the right and left sides, using binocular disparity as a depth cue to create a 3D structure. The binocular parallax remains visible to the naked eye, generating a sense of dimensional parallax. Another spectator suggested that it feels as if spectators are transitioning from a 2D world to a 3D world or from a 3D world to a four-dimensional one. Hence, the binocular parallax in naked-eye 3D functions as a parallax machine, transporting spectators between higher- and lower-dimensional worlds.

Fig. 8: VR adaptation of Qingming scroll.

The double spherical images in the Qingming scroll. Source: Yulemaogaoxiaogou (2016) VR version of the Qingming scroll. Available at: https://www.bilibili.com/video/BV1bs411x7zt.

Fig. 9: Naked-eye VR of Qingming Scroll.

The naked eye without the double sphere. Source: 360 panorama video technology (2020) Naked-eye VR version of the Qingming scroll. Available at: https://www.bilibili.com/video/BV13y4y1D7BZ.

In 2020, the streaming platform Bilibili introduced 360-degree panoramic video, enabling spectators to manipulate an image for a stereoscopic viewing experience without parallax distortion (Fig. 9). Additionally, it rolled out an interactive surface that allows spectators to explore the 3D space. Whether by panning across the sky, sweeping down to the ground, or tracing the streets at a distance, spectators can navigate the VR environment simply by dragging the image on the screen or tilting their phones. Video uploaders can also format their content as panoramic stereo by stitching several spherically structured images together, allowing spectators to experience the space within the painting at 360°. If the panoramic mode is enabled, Bilibili eliminates spherical distortion and presents the image in scalar proportions, focusing on a single vanishing point (Fig. 9). However, if spectators rotate the image backward, they notice a faint line dividing the image’s left and right halves. This is the stitch line between the two spherical stereoscopic images combined to produce a panoramic image (Fig. 10). The faint line reveals the trace of the binocular parallax and the depth cue that the naked-eye VR device intends to conceal.

Fig. 10: The stitch line.

Rotating the glasses-free VR version of the Qingming scroll to view its reverse side. Source: 360 panorama video technology (2020) Naked-eye VR version of the Qingming scroll. Available at: https://www.bilibili.com/video/BV13y4y1D7BZ.

Fig. 11: The ink dots.

The sky in the naked-eye VR of Qingming scroll. Source: 360 panorama video technology (2020) Naked-eye VR version of the Qingming scroll. Available at: https://www.bilibili.com/video/BV13y4y1D7BZ.

Even without perceptible binocular disparity, naked-eye VR videos can induce a sense of dimensional parallax. Although it is technically known as panoramic stereo, both uploaders and spectators often use the term “naked-eye VR” because of the medium’s capacity to simulate the sensation of stepping into a 3D space without a VR headset. Nevertheless, the image is still present on the surface of the 2D screen, an effect accentuated by the flat composition characteristics of Chinese scroll painting. If the spectators rotate the image upward, they may notice the ink dots in the sky (Fig. 11), and when scrolling downward, they can see the fabric’s texture that forms the ground on which the scroll is painted (Fig. 12). Both aspects highlight the materiality of the surfaces of traditional Chinese scroll paintings. Paradoxically, naked-eye VR also realises the “scattered depth” implied by the scattered perspective used in Chinese scroll paintings, which are not structured around a single vanishing point. Instead, every object is structured in scalar proportions, and a scroll painting has multiple vanishing points. In the video’s bullet subtitles, spectators noted their experience of dimensional parallax, stating that they felt as though they had been reduced to 2D characters in the painting. This dimensional parallax reveals the tension between the perspective of 3D VR, which enables spectators to “walk into” the space, and the painting’s flat composition, which reduces all the characters to two dimensions.

Fig. 12: The fabric texture.

The texture of the ground in the naked-eye VR version of the Qingming scroll. Source: 360 panorama video technology (2020) Naked-eye VR version of the Qingming scroll. Available at: https://www.bilibili.com/video/BV13y4y1D7BZ.

The parallax door: the window, the wall, and the frame

The painting’s flat surface and the volumetric depth of the image correspond to the two operational logics of the screen: as a wall on which the image’s shadow is displayed and as a window open to the information that is supposed to lie in the image and beyond the screen. The screen’s operational logic as a window and a wall corresponds to the decorative and practical use of the “screen” in ancient China. The etymological root of the Chinese term Ping (屏 Screen) can be traced back to the Zhou Dynasty (771–256 BCE), when it referred to a free-standing panel made of wood and silk and used to shield a private space in the room (Handler 2007). In its practical use as furniture, the screen stands as a wall that prevents the unwelcome voyeur from looking into the private space. In its decorative use, the screen is also a surface for painting, and it operates as a window that invites spectators to look at the image painted on the screen. As Wu (1996) suggested, it secretly entices spectators to think beyond the image and imagine what lies on the other side of the screen (p.69). As such, Chinese screen paintings can be theorised as both a window opened to volumetric depth in the image and beyond the screen and a wall that prevents the voyeuristic gaze from actually penetrating the 2D screen.

Similar to Chinese screen paintings, the screen in naked-eye VR is an opaque surface that can operate as a window or wall. However, it is not a mirror, as no avatar or character functions as a self-image for spectators to identify with. Cinematic apparatuses traditionally employ a point-of-view shot that connects spectators with their semblance through a two-step identification process. The first shot highlights the object of the gaze, aligning spectators with the camera. The second shot portrays the character as the subject of the gaze, melding the identity of the offscreen spectator with that of the onscreen character. Conversely, in naked-eye VR, the entire video is revealed in a single long take, and the narrative unfolds from the spectator’s first-person perspective. The second shot, which typically features an avatar representing the spectator, is omitted. The naked-eye VR adaptation of the Qingming scroll, like many other naked-eye VR videos, begins with a bird’s-eye view, a spectatorial position that no character in the painting can assume. Subsequently, the camera descends to the street level, alights on the boat, and finally reaches the shore. From there, it trails a porter navigating a single-plank street but soon overtakes him as they take different paths. The lack of an avatar makes spectators of naked-eye VR a subject of the look without a mirror image to be looked at, and the video an opaque moving image without a reflexive surface.

Rather than a reflexive mirror, the screen in naked-eye VR is more closely aligned with an opaque door, demarcating the threshold between the surface and the depth, the wall and the window. When the door is closed, spectators can only read the surface of the scroll hanging on the wall. When the door is opened, spectators can explore the stereoscopic depth in an illusory 3D expanse. In the naked-eye VR of the Qingming scroll, the closing and opening of the door are symbolised by the image frame’s appearance and disappearance. At the beginning of the video, the Qingming scroll appears as one of many framed paintings hanging in a gallery, and the audience can only read the image in a manner akin to a cinematic long shot swiping across the x-axis. However, as the camera quickly zooms in on the painting, it turns 90° to penetrate the image along the z-axis and directs spectators to look into the image. Upon initiating the journey into the image, the frame suddenly disappears. Some spectators noted this disappearance in their comments and claimed that they had become one of the characters in the painting, as they could no longer find the frame of the screen. Therefore, the appearance and disappearance of the frame demarcate the threshold between the surface of the 2D painting enclosed by the x-y axis of the frame and the frameless space that the spectators explore along the z-axis.

The frame appears not only as the frame of the painting but also as the frame in the painting, such as the window and door frames portrayed literally in the video. At the beginning of the video, the spectator is walking on the street and heading towards the closed door of a gate tower that marks the end of the painting and the spectator’s journey. However, as the spectator approaches, the door gradually opens, leading the spectator to pass through the door frame and ascend from the street level, flying towards a closed window on the teahouse’s second floor. One viewer amusingly commented that they were afraid of hitting the window with their forehead. Nevertheless, the window automatically swings open as the spectators are about to hit it and invites them into the room, a space within a space. Once the spectator reaches the far end of the teahouse, another window opens, granting a view of the space beyond and enabling the spectator to fly outside to explore. Accordingly, the closed door symbolises the limit of the available information displayed on the screen, whereas the open window and open door signify the image depth along the z-axis, measuring spectators’ epistemic desire to delve into the image.

The frame serves as a door frame that not only demarcates a threshold in space but also marks a threshold of time. Before spectators enter the teahouse, a cloudy morning sky is prevalent. However, a noticeable shift occurs as the spectators enter through the window. Rain begins to fall, marked by the sounds of raindrops and thunder. As spectators navigate deeper into the teahouse, the sky, seen through windows on either side, darkens rapidly into dusk. Finally, night falls as the spectators exit through the window and step into the street, lit by lanterns. The time that elapses between spectators’ entrance into and exit from the teahouse suggests that the window frame demarcates not only the liminal space but also the liminal time. Thus, the frame serves as a common boundary between two phases that approach but never converge, and the act of crossing it signals a rite of passage, a state of transition. Consequently, the window frame as a limit is analogous to the boundary in binocular parallax between left and right and in dimensional parallax between surface and depth. It invokes phases, perspectives, and dimensions without facilitating convergence.

As an epistemic divider separating the two worlds, the frames of the gate, window, and painting can all be categorised as doors, demarcating the threshold between the inside and outside, the surface of the screen and the hypothetical space beyond the screen. For spectators, crossing the image frame symbolises a journey into the 2D world of the scroll painting. However, the disappearance of the painting signals immersion into the 3D space beyond the 2D surface of the screen. Therefore, the naked-eye VR frame serves as a doorway to another dimension, providing access to a secondary realm in another world. Spectators who pass through a gate, window, or door find themselves being transported from one fictional space to another. This movement was acknowledged by a spectator of the naked-eye VR video of Van Gogh’s paintings, who commented on the door’s role as a gateway to the “second world in this uncanny painting”. The “second world” lying beyond the door is a painting within the painting, leading spectators from Vincent’s Bedroom in Arles to The Starry Night. Similarly, the window in the video of the Qingming scroll serves as a doorway, drawing spectators from a visible street scene in the painting to an unseen indoor scene that is not in the original composition.

With dimensional parallax, a naked-eye VR video operates as a door between the known and the unknown, the looked-at and the to-be-looked-at. In the naked-eye VR video of the Qingming scroll, the road extends to the open door, and the open door along the z-axis exemplifies Whissel’s positive parallax, guiding spectators to see and know more. The image approaching the spectator can be categorised as Whissel’s negative parallax, which breaks the fourth wall, evoking somatic shock by introducing the unknown to the spectator. In the video of Van Gogh’s paintings, a spectator mentioned in one of the subtitles that the spiral in the sky seems to strike the spectator’s face, and the building’s closed door made them feel that they were being observed. The incommensurability of vision can be attributed to the paradox of VR: although it promises spectators all-seeing power in a 720-degree space, spectators are still limited by the screen’s frame and cannot look beyond what is available. This is best exemplified by tragedy scenes in naked-eye VR, wherein a spectator trapped on a broken roller coaster, crashed plane, or stalled elevator is observed and engulfed by a monster or ghost that unexpectedly enters the frame. Regardless of how the spectator rotates the screen, there is always a space beyond the limit of visibility that returns the gaze to the spectator.

A closed door in the naked-eye VR video represents the invisible space beyond the screen. Although naked-eye VR promises a frameless space within the illusion of an unbounded screen, the de facto frame of the screen still limits what spectators can see and know. Spectators can rotate the screen and zoom in and out to see everything in the video, but they cannot see or know more than what the video provides. Therefore, the screen serves as a door of intelligibility, with its frame limiting spectators’ vision along the x-y axis and its surface operating at the end of the z-axis. Towards the end of the naked-eye VR video of the Qingming scroll, spectators’ exploration of the z-axis is stopped by the last door in the video, which gradually closes upon the spectators’ arrival. The closed door operates as a wall that arrests spectators’ scopic and epistemic exploration and as a screen that displays what is beyond the image. Nevertheless, it also invites spectators to look behind the closed door and speculate on the scene that the screen has screened. After the door closes, the spectatorial position gradually rises from the ground level to the sky, providing spectators with a bird’s-eye view of what lies behind the courtyard’s closed door, a position that could not be occupied by any of the original figures in the painting. Therefore, the door operates as a limit of facticity that demarcates the reign of visibility while signalling the existence of a space beyond, one that can be speculated about and imagined.

The rising of the spectatorial position can also be theorised as a move from “what is” and “what has been” to “what if”. The closed door limits spectators’ epistemic exploration of “what is” available in the image, highlighted by the visual cues referring to “what has been” in media history: the ink dots in the sky and the fabric’s texture imply the flat composition of the pictorial scroll, and the stitch line evokes the two flat images that combine to generate a volumetric depth in the stereoscope. However, the closed door also inspires a “what if” imagination that speculates about the courtyard beyond the closed door, screen, and surface of the painting. In the original Qingming scroll, there is a rise in the spectatorial position from the ground level at the start to the riverbank in the middle, culminating in a bird’s-eye view upon reaching the city streets at the end of the scroll. Nevertheless, the angle in the original painting is not sufficiently high for spectators to look into the courtyard, and this unrealised potential is visualised in the naked-eye VR adaptation of the painting. The point of view in the naked-eye VR rises further to the sky, gazing at the flying lanterns, objects that never appear in the original scroll painting yet can be speculated about in a “what if” imagination. The naked-eye VR has, therefore, crystallised a parallax history, as it rediscovers the post-cinematic aesthetic in pre-cinematic technology by revealing that the parallax view between the surface and depth in stereoscopic technology is anticipated by the scroll painting.

Conclusion

Naked-eye VR can be positioned to represent a juncture in media history that can only be explored through a parallax historiography. At the macro level, naked-eye VR embodies the coexistence of three- and two-dimensionality, simultaneously ushering in a post-screen culture by generating an illusion of depth beyond the screen. At the micro level, naked-eye VR is rooted in pre-cinematic technology, including stereoscopy and panorama; however, it seeks to create visual effects characteristic of VR and is disseminated in the form of video remakes—hallmarks of post-cinematic visual culture. Naked-eye VR results in unexpected parallels between pictorial and screen art and between pre-cinematic and post-cinematic technologies while maintaining incommensurability between different media conventions.

In contrast to headset-based VR, naked-eye VR simulates the disappearance of the frame to create an immersive environment while simultaneously acknowledging the existence of the frame and the depth cues that are usually concealed behind the screen. In the naked-eye VR videos of the Qingming scroll and Van Gogh’s paintings, the painting’s frame disappears at the start of the video, creating the illusion of walking into an image that is characteristic of VR technology. However, the screen’s frame remains, signalling the boundaries of spectators’ vision on the x-y axis, and the screen’s surface continues to limit spectators’ exploration on the z-axis. In naked-eye 3D videos, the frames surrounding what the left and right eyes see are highlighted to provide depth cues, and the boundary where the two frames meet persists as evidence of binocular parallax; it marks the limit of binocular vision on the x-y axis and generates depth on the z-axis beyond this limit. It also operates as a door to dimensional parallax, establishing the boundary of two-dimensionality when closed and revealing the anticipation of a 3D space on and beyond the screen when opened.