The cinema as a cultural institution has been studied by academic researchers in the arts and humanities. At present, cultural media studies are the home to the aesthetics and critical analysis of film, film history and other branches of film scholarship. Probably less known to most is that research psychologists working in social and life science labs have also contributed to the study of the medium. They have examined the particular experience that motion pictures provide to the film audience, the mechanisms that explain the perception and comprehension of film, and how movies move viewers and to what effect. This article reviews achievements in psychological research of the film since its earliest beginnings in the 1910s. A leading issue in the research has been whether understanding films is a bottom-up process or a top-down one. A bottom-up explanation likens film-viewing to highly automated detection of stimulus features physically given in the supply of images; a top-down one to the construction of scenes from very incomplete information using mental schemata. Early film psychologists tried to pinpoint critical features of simple visual stimuli responsible for the perception of smooth movement. The riddle of apparent motion has not yet been solved. Gestalt psychologists were the first to point at the role of mental structures in seeing smooth movement, using simple visual forms and displays. Bottom-up and top-down approaches to the comprehension of film fought for priority from the 60s onwards and became integrated at the end of the century. Gibson’s concept of direct perception led to the identification of low-level film-stylistic cues that are used in mainstream film production, and support film viewers in highly automated seamless perception of film scenes. Hochberg’s argument for the indispensability of mental schemata, too, accounted for the smooth cognitive construction of portrayed action and scenes.
Since the 90s, cognitive analyses of narration in film by film scholars from the humanities have revolutionised accounts of the comprehension of movies. They informed computational content analyses that link low-level film features with meaningful units of film-story-telling. After a century of research, some perceptual and cognitive mechanisms that support our interaction with events in the real world have been uncovered. Today, the film experience at large has reappeared on the agenda. An integration of top-down and bottom-up mechanisms is sought in explaining the remarkable intensity of the film experience. Advances are now being made in grasping what it is like to enjoy movies, by describing the absorbing and moving qualities of the experience. As an example, a current account of film viewers' emotional experience is presented. Further advances in our understanding of the film experience and its underlying mechanisms can be expected if film psychologists team up with cognitive film studies, computer vision and the neurosciences. This collaboration is also expected to allow for research into mainstream and other genres as forms of art.
An agenda for the psychology of the film
At the time of the first kinetoscope and cinema exhibitions in 1894–1895, thanks to devices such as the Phenakistoscope, Zoetrope and Praxinoscope, moving images had been popular for decades. Just before that time, academic psychology turned to the identification of the mechanisms underlying the functioning of the mind. Perception psychologists began to study apparent movement of experimental visual stimuli under controlled conditions because they found moving stimuli interesting cases in human perception, or as part of the study of psychological aesthetics founded by Gustav Fechner and Wilhelm Wundt. The publication of The Photoplay: A Psychological Study marked the beginning of the psychology of the film. Hugo Münsterberg was trained by Wundt and recruited by William James to lead the experimental psychology lab at Harvard. Importantly, Münsterberg was also an avid cinemagoer as his analyses of theatrical films of his time may tell, and a professing cinephile at that. Münsterberg set two tasks for the study of the film: one was to describe the functioning of psychological mechanisms in the reception of film; the other to give an account of film as an artistic medium.
Münsterberg shared his contemporaries’ and even today’s spectators’ fascination for the wonder of moving images and their apparent reality. He described the film experience as a 'unique inner experience' that due to the simultaneous character of reality and pictorial representation “brings our mind into a peculiar complex state” (p. 24).
In the first part of The Photoplay, Münsterberg explores how film characteristically addresses the mechanisms of the basic psychological functions investigated by experimental psychology, namely perception, attention, memory and emotion.Footnote 1 In The Photoplay the imagination is the psychological faculty that theatrical movies ultimately play upon; attention, perception, memory and emotion are also directed by the film, but contribute to the film experience as building blocks for the imagination in the first place. One of the ways that films entertain the imagination is by mimicking the psychological functions. Film scenes may represent as-if perceptions, as-if thoughts, as-if streams of associations, and as-if emotions or, more generally, display subjectivity.Footnote 2 Second, the film creates an imagined world that deviates from real world scenes as we perceive these in real life. Liberation from real life perceptual constraints involves the spectator’s self in 'shaping reality by the demands of our soul' (p. 41). Third, Münsterberg has a nuanced view of the automaticity of responses to film. On the one hand, it is the spectator’s choice, based on their interest, which ideas from memory and the imagination to fit to images presented on screen; they are felt as 'our subjective supplements' (p. 46). On the other, the film’s suggestions function to control associated ideas, '… not felt as our creation but as something to which we have to submit' (p. 46). And yet in Münsterberg’s view the film does not dictate psychological responses in any way.Footnote 3
Finally, The Photoplay provides abundant and compelling introspective reports of the film experience and so probes into the phenomenology of film, that is, what it is like to watch a movie. I think it is fair to say that for Münsterberg the film experience is the ultimate explanandum for a psychology of the film. In order to account for that phenomenology by mechanisms of the mind, proper descriptions of the film experience are needed, and introspective reports are an indispensable starting point for these.
The other task Münsterberg set himself was to propose an account of the film as a form of art. Part two of The Photoplay proposes that the film experience includes an awareness of the unreality of perceived scenes. This awareness is taken as fundamental for psychological aesthetics; all forms of art are perceived to go beyond the mere imitation of nature.Footnote 4 Münsterberg showed himself a formalist in that he theorised that aesthetic satisfaction depends not on recognition of similarity with the real world or practical needs, but on the sense of an 'inner agreement and harmony [of the film’s parts]' (p. 73).Footnote 5 But in order to qualify as art, according to Münsterberg, film was not to deviate too much from the realistic representation that distinguishes theatrical movies from non-mainstream forms.
Münsterberg’s agenda is in retrospect quite complete. The detailed investigation of psychological mechanisms and aesthetics of film is followed by a last chapter on the social functions of the photoplay. The thoughts forwarded in it are more global than those on perception and aesthetics. The immediate effect of theatrical films on their audience is enjoyment due to their freeing the imagination, and their easy accessibility to consciousness 'which no other art can furnish us' (p. 95). Enjoyment comes with additional gratifications such as a feeling of vitality, experiencing emotions, learning and above all aesthetic emotion.
In a final section behavioral effects of successful films are discussed. Here the film psychologist vents concerns on what we now refer to as undesirable attitude changes and social learning, especially in young audiences. The agenda of today's social science research on mass media effects (e.g. Dill, 2013) is not all that different from Münsterberg's in the last chapter of The Photoplay.
The two tasks that Münsterberg worked on set the agenda for the psychology of film in the century after The Photoplay. It is clearly recognisable in the psychology of the film as we know it today.Footnote 6 But the promising debut made in 1916 was not followed up until the 1970s, or so it seems. James Gibson lamented in his last book on visual perception that whereas the technology of the cinema had reached peak levels of applied science, its psychology had so far not developed at all (1979, p. 292). The cognitive revolution in psychology of the 60s paved the way for its upsurge in the early 80s. But some qualifications need to be made on the seeming moratorium. First, Rudolf Arnheim had been developing a psychology of artistic film form since the 1920s. Second, although not visible as a coherent psychology of the film, laboratory research on issues in visual perception of the moving image, in particular studies of apparent movement, continued.
Gestalt psychology and film form
Rudolf Arnheim’s essays published first in 1932 added analytic force to Münsterberg’s conviction that film is not an imitation of life. Film and Reality (1957) highlights shortcomings of film in representing scenes as we know them from natural perception.Footnote 7 In the same essay, it is pointed out that comparing a filmic representation of a scene with its natural perception is what analytic philosophers would call a category error. In The Making of Film (1957) Arnheim presented an inventory of formative means for artistic manipulations of visual scenes, including delimitation and point of view, distance to objects and mobility of framing. It is argued that chosen manipulations often go against the most realistic options. For example, ideal viewpoints and canonical distances are often dismissed in favour of more revealing options.Footnote 8 Arnheim’s aesthetics of film gravitates towards acknowledged artistic productions more than to the 'naturalistic narrative film' (e.g., 1957, pp. 116–117), the more moderate art form that Münsterberg tended to prefer.
Arnheim was informed by such founders of Gestalt psychology as Wertheimer, Köhler and Koffka. This school held that natural perception results from the mind’s activity. It organises sensory inputs into patterns according to formal principles such as simplicity, regularity, order and symmetry. Arnheim developed into the leading Gestalt theorist of aesthetics of the 20th century. In his 1974 book he analysed a great number of pictorial, sculptural, architectural, musical and poetic works of art while only rarely referring to film.Footnote 9 The cornerstone aesthetic property of art works including film is expression, defined by Arnheim as 'modes of organic or inorganic behaviour displayed in the dynamic appearance of perceptual objects or events' (1974, p. 445). Expression’s dynamic appearance is a structural creation of the mind imposing itself on sound, touch, muscular sensations and vision. Expressive qualities are, in turn, the building blocks of symbolic meaning that art works including film add to the representation of objects and events as we know them in the outer world. Thus, Arnheim’s theory of expression and meaning in the arts seems to echo Münsterberg’s formalist position on the perception of 'inner harmony' as the determinant of film spectators’ aesthetic satisfaction.
Münsterberg shared the amazement that moving images awakened in early film audiences. He considered the experience of movement a central issue for the psychology of the film. The experience of movement in response to a series of changing still pictures has been studied in psychology and physiology under the rubric of apparent motion.Footnote 10 In Münsterberg’s days, international psychology labs were probing the perception of movement in response to experimental stimuli that were perceived as moving images. Well-known examples include apparent motion induced by the subsequent views of single stationary lines in different positions that result in phi movement, the perception of one moving shape or line. Researchers in this area have continued to study the perception of movement in film as only one of many interesting visual stimuli, such as shapes painted on rotating disks, or dynamic computer-generated lights, shapes and objects of many kinds. Why and how we see motion has been as basic to the study of visual perception as questions of perception of colour, depth, and shape. Helmholtz proposed that what we need to explain is how retinal images that correspond one-to-one, i.e., optically, with a scene in the world are transformed into mental images, or percepts that we experience. In the case of apparent motion, we also need to understand how a succession of retinal images is perceived as one or more objects in motion.Footnote 11 Apparent motion in film viewing needs to be smooth,Footnote 12 and depends on frame rates and masking effects. (The latter effects refer to dampening of the visual impact of one frame by a subsequently presented black frame.)
Münsterberg’s conviction that the perception of movement needs a cognitive contribution from the viewer clashes with alternative explanations that rely on prewired visual mechanisms that automatically and immediately pick up the right stimulus features causing an immediate perception of motion, without the mind adding anything substantial. The inventors of nineteenth century moving image devices explained the illusion of movement by the slowness of the eye, possibly following P.M. Roget’s report on apparent motion to the Royal Society in 1824. In the early years of cinema, the persistence of vision account was meant to add precision to this explanation. It proposed that the retina, the optic nerve or the brain could not keep up with a rapid succession of projected frames, and that afterimages would bridge the intervals between subsequent frames. Anderson and Fisher (1978) and Anderson and Anderson (1993) have explained why the notion is false and misleading. It suggests that film viewers’ perceptual system sluggishly piles up retinal images on top of one another. However, this would lead them to blur, which obviously is not the case. The Andersons refer to the explanation as a myth because it is based on a mistaken conception of film viewing as a passive process. Even with the very small changes between subsequent frames that are characteristic of motion picture projection, the visual system performs an active integrative role in distinguishing what has changed from one image to another. This integrating mechanism in film viewing is exactly the same as in perceiving motion in real world scenes.
Mechanistic explanations have since been founded on growing insights in the neuroscience of vision, such as single cell activity recordings in response to precisely localised stimulus features.Footnote 13 'Preprocessing' of visual input before it arrives in the cortex takes place in the retina and the lateral geniculate nucleus, which have specialised cells or trajectories for apprehending various aspects of motion. There are major interactions between perceptual modules.Footnote 14 Physiological and anatomical findings in the primate visual system, as well as clinical evidence, support the distinction of separate channels for the perception of movement on the one hand, and form, colour and depth on the other (Livingstone and Hubel, 1987). Research on how exactly the cortical integration systems for vision are organised has not yet come to a close. A variety of anatomical subsystems have been identifiedFootnote 15, and there is room for task variables in the explanation of motion perception.Footnote 16 The operation of task variables in presumably automated processes (e.g., attentional set, induced by specific task instructions) complicates accounts of apparent motion and the perception of movement based on lowest processing levels.
Non-trivial and clear-cut contributions of the mind to smooth apparent motion have been proposed by Gestalt psychologists. Arnheim (1974) considered the perception of movement as subsidiary to that of change. The mind uses Gestalt principles such as good continuation and object consistency to perceive patterns in ongoing stimuli. Movement is the perception of developing sequences and events.Footnote 17 Gestalt psychologists have attempted to identify stimulus features that are perceived as a spatiotemporal pattern of 'good' motion, and various types of apparent motion have been distinguished as a function of stimulus features. In an overview volume, Kolers (1972) presented phi and beta motion as the major variants. Phi, the most famous, was first documented by Wertheimer in 1912. An image of an object is presented twice in succession in different positions.Footnote 18 Pure or beta motion, that is, objectless motion, was the novel and amazing observation; the perception seemed to be a sum or integration by the mind beyond the stimulus parts, and asked for an explanation. It is also experienced when the objects in the subsequent presentations are different.
Wertheimer and those after him looked for mechanisms of the mind that could complement the features of the stimulus responsible for apparent motion in its various forms.Footnote 19 Other studies of apparent motion, too, indicated that simple models of stimulus features alone could not explain apparent motion.Footnote 20 One of the best examples of what the cognitive system adds to stimulus features is induced motion (Duncker, 1929). When we see a small target being displaced relative to a framework surrounding it, we invariably see the target moving irrespective of whether it is the target or the frame that is displaced. Ubiquitous film examples are shots of moving vehicles, with mobile or static framing.
In this summary and incomplete overview of the field, we could not make a strict distinction between mechanist and cognitive explanations for the perception of movement in film. The current state of research does not allow for it.Footnote 21 The conclusions on the state of the field with which Kolers closed his 1972 volume on motion perception still seem valid. He inferred from then extant research that there must be separate mechanisms for extracting information from the visual stimulus and for selecting and supplementing the information into a visual experience of smooth object motion or motion brief. He concluded that 'The impletions of apparent motion make it clear that although the visual apparatus may select from an array [of] features to which it responds, the features themselves do not create the visual experience. Rather, that experience is generated from within, by means of supplementative mechanisms whose rules are accomodative and rationalizing rather than analytical' (p. 198). But even if some perceptual (Cutting, 1986) or brain mechanisms (Zacks, 2015) have been identified since Kolers’s analysis, today we still do not know enough about the self-supplied supplementations.Footnote 22
Perception and cognition of scenes
Mental representation and event comprehension
Contributions of the mind can go considerably beyond apparent motion, i.e., the perception of smooth motion from one frame to another. The cognitive revolution in academic psychology that took off in the 1960s broadened the conceptualisation of contributions of the mind to the film experience beyond the narrower stimulus-response paradigms that had dominated psychological science until then. The cognitive revolution went beyond Gestalt notions of patterns applied by the mind on stimulus information. It introduced the concept of mental representation as a key to understanding the relation between sensory impressions from the environment on the one hand, and people’s responses to it on the other. Moreover, these cognitive structures were seen as functional in mental operations such as retrieval and accommodation of schemas from memory, inference and attribution. These were quite complex in comparison to perceptual and psychophysical responses. In the past 30 years, they have come to encompass event, action, person, cultural, narrative and formal-stylistic schemas. The cognitive turn in film psychology has stimulated a growing exchange with humanist film scholarship, resulting in advances in the elaboration of cognitive structural notions. Early applications of the cognitive perspective in the psychology of the film can be found in the 1940s and 50s in work by Albert Michotte (1946) and Heider and Simmel.Footnote 23
Against mental representation: direct perception of film events
The psychology of the film as a subdiscipline of academic psychology really took off in the late 1970s. Münsterberg’s broad agenda that had been scattered across isolated studies of mainly movement perception regained general acclaim. This was due first to the booming supply and consumption of moving images through media television and computer-generated imagery since the 60s. Second, the cognitive turn in experimental psychology renewed an interest in perception and cognition as it occurs in natural ecologies. This is the backdrop against which James Gibson (1979) noted the virtual absence of a psychology of the moving image, motivating his chapter on the film experience. The chapter was important in that it applied his highly influential ecological principles of perception of real world scenes to perception in the cinema. Gibson’s general theory of visual perception (e.g., Gibson, 1979) hinges on the notion that the visual system has evolved to extract relevant information from the world in a direct fashion. A scene presents itself to the observer as an ambient optical array that immediately and physically reflects the structure of the real world. Changes and transitions in the flow of the optical array are due to natural causes such as alternations of lighting intensity of the scene, e.g., due to clouds, or movement of objects in the scene or of the observer. These variations in the optical flow enable the automatic pick-up of invariants. Example invariants are the change in size of portions in the array, and the density of texture in that portion when the observer gets closer to, or farther away from the object.Footnote 24 The changes in these parameters are linked with depth-information in a way that is constant across different scenes, observer speeds, lighting conditions, etc. Invariants enable the direct perception of the real world in the service of adaptive action. Disturbances of the optic flow can automatically be perceived as events. 
The events are categorised on the basis of the nature of the disturbances, e.g., as terrestrial, animate, or chemical events. Furthermore, the direct tuning of the perceptual senses to the structures of the environment enables an immediate perception of affordances; for example, the slope of a hill causes the direct perception of 'climbability'.
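As a rough illustration of what a distance-independent invariant might look like, the following Python sketch assumes a simple pinhole projection with unit focal length. The function names, the projection model and the example sizes are assumptions introduced here for illustration only; they are not taken from Gibson's own formalism. The sketch shows that the projected size of an object shrinks as the observer moves away, whereas the ratio between the object's projected size and that of a texture element on the same surface stays the same.

```python
import math


def projected_extent(size, distance, focal_length=1.0):
    """Image-plane extent of a frontal surface patch under pinhole projection (assumed model)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return focal_length * size / distance


def visual_angle_deg(size, distance):
    """Visual angle in degrees subtended at the eye by the same frontal patch."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


def object_to_texture_ratio(object_size, texture_size, distance):
    """Number of texture elements the object spans in the projection."""
    return projected_extent(object_size, distance) / projected_extent(texture_size, distance)


# Hypothetical scene: a 2 m wide object resting on a surface with 0.5 m texture elements.
for d in (2.0, 4.0, 8.0):
    angle = round(visual_angle_deg(2.0, d), 1)
    ratio = object_to_texture_ratio(2.0, 0.5, d)
    print(f"distance={d} m  visual angle={angle} deg  object/texture ratio={ratio}")
```

In this toy case the visual angle decreases with every step away from the object, while the object-to-texture ratio remains constant, which is one simplified way to read the claim that some relations in the optic array hold across observer positions.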
According to Gibson, the experience of motion pictures involves a dynamic optical flow exactly like the one an observer would have when being present at the filmed scene.Footnote 25 Film represents the world to the senses that are calibrated to that world. The field of view of the camera becomes the optic array to the viewer (Gibson, 1979, p. 298). Perception of objects, movement, events and affordances is direct and realist, based as it is on the same invariants and affordances that the scene in the real world would offer. Deviations from these, as emphasised by cognitivist film psychologists from Münsterberg through Arnheim to Hochberg, as we will shortly see, are largely taken as non-representative exceptions.
A major affordance offered by conventional movies is empathy with characters. Empathy presupposes that we understand what happens to characters. Scenes present their actions, reactions and feelings. However, most scenes are not continuous. How do we understand scenes presented in pieces, and what are the limits to our understanding? Gibson’s reply to the question of how continuity is perceived in scenes, that is, smooth movement and unitary events across cuts, would be that the perceptual system extracts the same invariants from the two shots on either side of the cut. The elegant explanation again rests upon a presumed correspondence between perception of real world scenes and film scenes.
Gibson inspired important theorising on the film experience, notably by Anderson and Cutting, to whom we will turn shortly. Here we emphasise that his direct perception account of the film experience stands in diametrical opposition to the key innovation that the cognitive turn introduced in experimental psychology. Gibson denied the necessity of mental representations in the perception of objects and events, be it in real scenes or in film.
Cognitive schemas and the canonical set-up of the cinema
The role of mental representations, be they cognitive principles or schemas or other mental structures, was argued over a lifetime of work in the psychology of film by Julian Hochberg. A perception psychologist with an interest in pictorial representations and their aesthetics, he devoted a large part of his work to identifying what is given in film stimuli and how perception goes beyond that, in often ingenious demonstrations and experiments. (The demonstrations are, in fact, introspective observations of film perception under exactly specified, reproducible stimulus conditions.) A comprehensive overview can be found in Hochberg (1986).Footnote 26 His legacy should be referred to as the Hochberg and Brooks oeuvre, because his wife Virginia Brooks, a psychologist and filmmaker, contributed such a great deal to it. Hochberg found that cognitive schemata are necessary in the perception of film for two reasons. The most profound one is that completely stimulus-driven (or 'bottom-up') accounts of the perception of movement, events, and scene continuity do not really explain the experience. For example, Hochberg and Brooks point out that neurophysiological motion detectors do not explain motion perception, that is, they 'amend but do not demolish' an account based on a mental representation of motion (Hochberg and Brooks, 1996b, p. 226). The same would go for any other direct perception account, including Gibson’s optics plus invariant extraction model. The more practical argument is that the direct perception account fails to pose limits to the scope of its application, leaving thresholds and ceiling conditions for the mechanisms out of consideration. The canonical set-up of cinematic devices for recording and displaying motion pictures has evolved to produce good impressions of depth, smooth and informative motion, emphasis on relevant objects and continuity of action, often violating the course of direct perception in comparable real world scenes.
Figure 1 presents a demonstration of active disregard that viewers of mainstream movies typically display. (See also Cutting & Vishton (1995) on contextual use of depth-information).
The most immediate demonstration of apparent motion is Duncker’s induced motion referred to above, a cinematic effect because it is dependent on canonical projection within a frame. The best analytic examples are about the perception of events in filmed dance.Footnote 27 For Hochberg and Brooks an ecological approach to perception in the cinema needs to take the ecology of the cinema into account.
The necessity of cognitive schemas in film perception was pointed out most forcefully in Hochberg’s dealing with the comprehension of shot transitions or cuts. It was argued that known sensory integration and Gibson’s extraction of invariants fail to account for viewers’ comprehension of frequent and simple cinematic events like elision of space and time. Overlap in contents between successive shots can be hard to identify or be lacking altogether. Hochberg and Brooks proposed a principled alternative: films play in the mind’s eye. Viewers construct an off-screen mental space from separate views, and they can link two successive views by the relation of each of these to this space. In constructing a mental space, overlap may even be overruled by other cues that have nothing to do with any invariance. The construction must involve event schemas and cognitive principles removed from anything immediately given in the film. Schemas may indeed outperform (mathematical) invariants picked up from the optical array offered by the screen. Hochberg and Brooks (1996b) show, for example, how gaze direction of film characters or personae in subsequent shots may be more effective in the construction of a continuous mental scene than overlapping spatial or visual symbolic contents.Footnote 28 Mental schemas seem to be indispensable in the comprehension of sequences of completely non-overlapping cuts. A famous demonstration by Hochberg and Brooks is reproduced in Fig. 2. The succession of shots is readily understood when it is preceded by the presentation of a cross, which provides the integrating schema. Viewers’ schema-based continuous perception of scenes is supported by the ways that traditional cinema tells its stories. The presentation of an overall view in so-called 'establishing shots' followed by a 'break-down' of its object into subsequently presented part views is a cornerstone procedure in classical continuity film style (Bordwell and Thompson, 1997/1979).
A smooth understanding of non-overlapping cuts may require dedicated knowledge of discursive story units and rules for their ordering that only literary analysis types of study can reveal (Hochberg, 1986, pp. 22–50). Hochberg and Brooks (1996a, p. 382) pointed out that theoretical or empirical proposals as to the nature of such representations were lacking. They found Gestalt principles unsatisfactory (Hochberg, 1998). Current film psychologists have taken up this challenge as we shall see briefly.
As a final contribution of Hochberg and Brooks to the psychology of the film, we would like to highlight their view of film spectators as partners motivated to deliver their share in a communicative effort. Film viewers contribute to the canonical setup of the cinema in that they are astutely aware of the filmmaker’s communicative intentions: '… the viewer expects that the film maker has undertaken to present something in an intelligible fashion and will not provide indecipherable strings of shots' (Hochberg, 1986, p. 22–53). Viewers must be assumed to have an associated motivation to explore the views presented to them. In a series of inventive experiments, Hochberg and Brooks gathered evidence for an impetus to gather visual information. Looking preference increased with cutting rate and with complexity of shot contents. Visual momentum, or viewer interest, as they termed it (Brooks and Hochberg, 1976; Hochberg and Brooks, 1978), is the absorbing experience typical of cinema viewing. These studies help us to understand how current cutting strategies meet the viewers’ typical motivation for cognitive enquiry. The reward of comprehension is carefully dosed by varying the time allowed to the viewer to inspect objects and scenes, dependent on their novelty and complexity.
Hochberg’s demonstrations of the involvement of mental structures in understanding portrayed events were in large part based on introspective evidence. They have been criticised by Gibson and others for relying too heavily on top-down control of perception by overly intricate mental structures.Footnote 29 Current research in the cognitive structure tradition uses more sophisticated experimental set-ups. Inspiration has been drawn from theories of discourse processing in cognitive science. In this research, the relationship of 'top-down' use of schemas in scene comprehension with 'bottom-up' processing of stimulus features has become an important question.Footnote 30 Zacks has extensively investigated how film viewers segment the ongoing stream of images and extract meaningful events and actions from it. Viewer segmentation depends on automatically detected changes in a situation (Zacks, 2004). Detection of the changes requires only minimal use of schemas, and triggers automated perceptual-motor simulations of events and subevents such as actions.Footnote 31 Segmentation follows the logic of events in the real world. Most importantly, multiple events can be organised in a hierarchical or linear fashion, as scenes, sets of events and subevents or actions (Zacks, 2013).
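The idea that segmentation follows detected changes in a situation, and that events can be grouped at coarser and finer levels, can be made concrete with a deliberately simplified Python sketch. It is not Zacks's model: the situation dimensions ("place", "actor", "goal"), the function name segment_events and the min_changes threshold are hypothetical choices made here purely for illustration. A boundary is marked wherever enough dimensions of the described situation change from one clip to the next; a higher threshold yields coarser boundaries.

```python
def segment_events(situations, min_changes=1):
    """Return (index, changed_dimensions) for every clip at which at least
    `min_changes` situation dimensions differ from the preceding clip."""
    boundaries = []
    for i in range(1, len(situations)):
        previous, current = situations[i - 1], situations[i]
        changed = [key for key in current if previous.get(key) != current[key]]
        if len(changed) >= min_changes:
            boundaries.append((i, changed))
    return boundaries


# Hypothetical hand-coded descriptions of four successive clips.
clips = [
    {"place": "kitchen", "actor": "Ann", "goal": "make tea"},
    {"place": "kitchen", "actor": "Ann", "goal": "make tea"},
    {"place": "kitchen", "actor": "Ann", "goal": "answer phone"},
    {"place": "street", "actor": "Bob", "goal": "catch bus"},
]

fine = segment_events(clips, min_changes=1)    # any change marks a fine-grained boundary
coarse = segment_events(clips, min_changes=2)  # several simultaneous changes mark a coarse boundary

print("fine:", fine)
print("coarse:", coarse)
```

In this toy example every coarse boundary is also a fine one, which mirrors, in a very reduced form, the hierarchical organisation of scenes, events and subevents described above.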
Theory of mind and layered meaning of events
Extracting events in understanding film scenes needs more than retrieving schemas of real world events. The fact that film events are presented with an idea in mind is reflected in viewers’ understanding of them. Understanding film scenes and especially characters, their actions, plans and goals has been argued to require a so-called Theory of Mind (Levin et al., 2013). TOM is a system of cognitive representations of what beliefs, needs, desires, intentions and feelings people have in their interaction with others and the world. It is acquired in early childhood, when children understand that others, too, have an internal life, similar to but also different from one’s own beliefs and feelings. Levin et al. explain how use of TOM, also referred to as mentalising, is necessary for an elementary understanding of film character actions and feelings. For example, character gaze following, which underlies our perception of what characters feel or want to do with respect to an object that they look at, requires TOM. TOM underlies grasping spatial (and action-) relations in scene comprehension across cuts using gaze following. Understanding relations between more complex events requires schema-controlled theorizing on what people believe, do, think, and feel. Finally, Levin et al. demonstrate through film analyses how film viewers construct multi-layered representations of a film’s action from the point of view of different characters, the viewer and even the narrator or filmmaker. For example, viewer and character perspectives may clash as in dramatic irony, or the narrator may create false beliefs on story events in viewers.
Continuity of events and viewer attention
Hochberg’s question of what the mental schemas look like that enable us to perceive smooth progress of events across film cuts has recently been addressed by the next generation of film psychologists. They have sought answers in profound analyses of the canonical setup delivered by the founders of cognitive film theory in the humanities, such as Bordwell (1985, 2008), and Anderson (1996). Bordwell’s extensive analyses of classical film narrative and his account of the viewer’s mental activity in the comprehension of the film’s story-world suggest a film-psychological hypothesis on the experience of continuity: Classical Hollywood film style serves smooth progress of the narrative. Continuity editing ensures fluency across shot transitions. Shot A cues cognitive schema-based or narrative expectations that are subsequently matched in shot B. Expectations can be perceptual or cognitive, i.e., requiring inferences supported by event schemas. Anderson added a Gibsonian perspective, arguing that the perception of film scenes mimics the perception of real world scenes. Continuity shooting and editing closely follow the constraints of the human perceptual systems that have evolved to 'extract' continuity from changing views of scenes in the real world. Recent research into the experience of smooth development of events and scenes across shot transitions draws on these principles of continuity narration.Footnote 32 Framing, editing and sound finetune the viewer’s top-down search to focus on candidate target stimuli. A quite complete and accurate explanation was offered by Tim Smith. His Attentional Theory of Cinematic Continuity (2012) explains the viewer’s sense of smooth progress by the continuity editing principles that mainstream filmmakers tend to adhere to. AToCC breaks away from Hochberg’s analyses to the degree that it holds that viewers do not need intricate spatial or semantic schemas to construct continuous events from separate shots. 
Rather, it is built on the Gibsonian principle that perceiving continuity in film scenes derives from the continuity that we experience in perceiving scenes in the natural world. The ecology of the cinema renders it sufficient to follow a number of simple spatiotemporal guidelines. Continuity editing film style guides viewers’ attention in seamlessly following action across cuts. Attention, that is, the viewer’s focused selection of objects in a shot (what and where the viewer directs their gaze), is led by the filmmaker. The viewers’ gaze in shot A is directed to the part of the screen where the target of interest in shot B, that is after the cut, will be. The shift of attention from one portion to another of the screen in shot A is shortly followed by the cut, and because the gaze 'lands' in the right place in shot B, the cut has become invisible.Footnote 33 The theory of continuity perception adds precise levels of analysis to the construction of mental scene spaces that Hochberg proposed. It distinguishes higher level and lower level control of attention. Higher-level ones include 'perceptual inquiries' as Hochberg and Brooks (1978a) called them. The expectations or questions that guide the gaze may be minimally articulated, e.g., 'what or whom are these characters looking at' as in gaze following, but the operation of higher level cognitive schemas is not excluded. The best demonstration to date of the control of focus of attention by the narrative is given in research on suspense and its effects on film viewer gazes by Bezdek et al. (2015) and Bezdek and Gerrig (2017). Footnote 34 Their results can be taken to imply that suspense, a state of high absorption, is associated with focal attention to story-world details supervised by expectations created by the narrative (see also Doicaru, 2016).
The study of film viewers' attention has delivered a firm account of the role of the ubiquitous Hollywood continuity film style in the typical experience of smoothly flowing film scenes and stories that audiences all over the world have (for a review, see Smith, Levin & Cutting, 2012).
A lead role in perception for cinematic low-level features?
Experimental psychology has always aspired to basic explanations of perceptual responses, preferably through transparent mechanistic associations with physically observable stimulus conditions. The role of high-level narrative schema-based attention in smooth film experiences discussed in the previous section is subject to debates in which experimental data support arguments pro and con. To begin with, AToCC emphasises the role of leading expectations in following cuts, but, being more akin to the Gibsonian approach to visual perception than to Hochberg’s schema position, it tends to stress lower level features that direct attention bottom-up just as much, or even more so. One lower level is given by film-stylistic devices, for instance the use of sound that can orient viewers to direct their gazes to the next shot’s portion of the screen where the sound’s origin will be shown. Another is given by lower level stimulus features in a narrower and technical sense, such as bright lights and movements with sudden onset that automatically attract attention due to the make-up of the senses and the brain. Movement especially was shown by Smith to be an extraordinary low level attentional cue. The power of low level feature control of attentional shifts has inspired Loschky et al. (2015) to speak of the 'tyranny of film'. They start from research findings suggesting that the use of low-level stylistic features can result in attentional synchrony across film audiences, that is, individual viewers of a scene gaze at exactly the same portions of the screen at exactly the same time.Footnote 35 Remarkable degrees of inter-viewer synchronization of visual attention have also been established in studies of localisations of brain activity in film viewers (e.g., Hasson et al., 2003).
However, Stephen Hinde’s research has recently shown that the distraction effect of inserted low-level attention triggers is quite limited (Hinde et al., 2017). In line with this notion of top-down attention control overriding bottom-up attention triggers, Magliano and Zacks (2011) demonstrated that the perception of cuts is suppressed by higher order processes related to the construction of complex events.
Gibson’s idea of invariants in optical arrays can now be made concrete, enabling the prediction of bottom-up controlled attention and perception from objectively identified features. Developments in computer vision, image and sound analysis have paved the way for automated extraction of features and patterns in visual and auditory stimuli in terms of multiple dimensions. For example, machine extraction of saliency as a feature predictive of bottom-up attention has been developed and applied in numerous computer vision applications. A much-cited article by Itti and Koch (2001) illustrates the idea for static images. Specialised neural network algorithms detect features such as colour, intensity, orientations, etc. in parallel over the entire visual field. Each feature is represented in a feature map, in which neurons compete for saliency. Feature maps are combined into a saliency map. A last network sequentially scans the saliency map, moving from the most salient location to the next less salient one and so on. Footnote 36 An excellent explanation of how to obtain saliency maps is given at a Matlab page.Footnote 37
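The saliency-map idea described above can be made concrete with a minimal sketch. It is not the Itti and Koch model itself: the neural feature detectors are replaced here by a simple centre-surround difference of local means, only an intensity feature is used, and all function names (`box_blur`, `center_surround`, `saliency_map`, `scan_path`) are hypothetical choices made for illustration. What the sketch retains is the overall logic: feature maps are computed in parallel over the whole image, combined into one saliency map, and then scanned from the most salient location to the next, with already visited locations suppressed.

```python
from typing import List, Tuple

Grid = List[List[float]]  # rows of pixel values, e.g. intensity in [0, 1]


def box_blur(img: Grid, radius: int) -> Grid:
    """Local mean over a (2r+1) x (2r+1) window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def center_surround(feature: Grid, center_r: int = 0, surround_r: int = 2) -> Grid:
    """Feature map: how much each location differs from its surroundings."""
    c = box_blur(feature, center_r)
    s = box_blur(feature, surround_r)
    return [[abs(a - b) for a, b in zip(rc, rs)] for rc, rs in zip(c, s)]


def normalise(m: Grid) -> Grid:
    """Rescale a map to [0, 1]; a completely flat map becomes all zeros."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0] * len(m[0]) for _ in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def saliency_map(features: List[Grid]) -> Grid:
    """Combine per-feature maps (assumed equal weights) into one saliency map."""
    maps = [normalise(center_surround(f)) for f in features]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
            for y in range(h)]


def scan_path(sal: Grid, n_fixations: int, inhibit_r: int = 1) -> List[Tuple[int, int]]:
    """Visit locations in order of decreasing saliency (inhibition of return)."""
    s = [row[:] for row in sal]  # work on a copy
    h, w = len(s), len(s[0])
    path = []
    for _ in range(n_fixations):
        y, x = max(((j, i) for j in range(h) for i in range(w)),
                   key=lambda p: s[p[0]][p[1]])
        path.append((y, x))
        # Suppress the neighbourhood just visited so the scan moves on.
        for j in range(max(0, y - inhibit_r), min(h, y + inhibit_r + 1)):
            for i in range(max(0, x - inhibit_r), min(w, x + inhibit_r + 1)):
                s[j][i] = float("-inf")
    return path


# Toy frame: dark background, one bright spot and one dimmer spot.
intensity = [[0.0] * 7 for _ in range(7)]
intensity[1][1] = 1.0
intensity[5][5] = 0.5
sal = saliency_map([intensity])
print(scan_path(sal, n_fixations=2))  # [(1, 1), (5, 5)]
```

In this toy frame the predicted scan visits the bright spot first and the dimmer spot second. Further feature maps (colour, orientation, or motion between frames) could be passed to `saliency_map` in the same list, under the same equal-weighting assumption.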
Psychologists of film in their attempts to explain the extraordinary smooth and intense perceptual experience that mainstream film typically provides, currently seek to join forces with computer vision scientists. In a next step, they may seek collaboration with vision labs in the world that attempt to link their low-level film image feature analyses with film narrative structures and viewer responses.Footnote 38
The work of perception researcher James Cutting has carried the psychology of the film into the next stage of the Gibsonian ecological approach, while also linking it with insights in the structure of film narrative from humanities scholarship.Footnote 39 In an interesting essay on the perception of scenes in the real world and in film Cutting (2005) summarised the ecological perspective on perception stating that understanding how we perceive the real world helps to grasp how we perceive film and vice versa.Footnote 40 In the last decade Cutting developed powerful computational content analysis methods that reveal the patterning of low-level features in relation to dimensions of film style and technology, in representative samples of Hollywood films of well over a hundred titles. The theoretical starting point of the approach is that movies exhibit reality. The psychologist Cutting subscribes to the analytical distinctions made in literary and film theories between plot, form and style of a narrative on the one hand, and the represented story-world on the other. The Gibsonian proposal is that analyses of the fabula or story-world (i.e., the action, events, characters and so on) should lead to identification of syuzhet features (i.e., formal and stylistic features that are physically given in the film stimulus or can be perceived without substantial instruction) functional in the perception and understanding of that story-world; vice versa, variations in form and style reflect variations in the portrayed story-world. Cutting’s definition of low-level film features used in the analyses was informed by analyses of narrative, style and technology by David Bordwell, and methods for statistical style analysis by Barry Salt (2009).
Low-level features analysed by Cutting and co-workers are physically and quantitatively determinable elements or aspects occurring in moving images, regardless of the narrative. They include shot duration, temporal shot structure, colour, contrast and movement. The value of each feature can be expressed as an index for an entire film, or for some segment targeted in an analysis.Footnote 41 Inspection by an analyst complements machine vision analyses, but I would qualify the indexing approach as computational (objective) film analysis, because of intensive tallying and numerical operations developed by specialists in psychological data-processing. The features do not constitute events or scenes, but they accentuate these. A recording of their measurements for an entire film would constitute an abstract backbone to be filled with scenes and events. One possible comparison is with the rhythmic score of a song without melodies and words. In the hands of capable film-makers they are indispensable for conveying the narrative, due to their direct, predictable and automated effects on the visual system.
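How a feature such as shot duration can be expressed as an index for a whole film or for a segment may be illustrated with a small sketch. It assumes that the times of shot transitions (in seconds) have already been obtained, by hand or by machine detection. The function names and the particular indices (mean and median shot duration, cuts per minute, cuts per time window) are illustrative assumptions and are not taken from Cutting's analyses.

```python
import math
from statistics import mean, median
from typing import Dict, List


def shot_durations(cut_times: List[float], film_length: float) -> List[float]:
    """Durations (s) of the shots delimited by cuts at the given times."""
    bounds = [0.0] + sorted(cut_times) + [film_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def shot_indices(cut_times: List[float], film_length: float) -> Dict[str, float]:
    """Whole-film (or whole-segment) indices derived from shot durations."""
    d = shot_durations(cut_times, film_length)
    return {
        "n_shots": len(d),
        "mean_shot_duration": mean(d),
        "median_shot_duration": median(d),
        "cuts_per_minute": 60 * len(cut_times) / film_length,
    }


def cuts_per_window(cut_times: List[float], film_length: float,
                    window: float) -> List[int]:
    """Number of shot transitions in successive windows of presentation time."""
    n = math.ceil(film_length / window)
    counts = [0] * n
    for t in cut_times:
        counts[min(int(t // window), n - 1)] += 1  # clamp a cut at the very end
    return counts


# Hypothetical 20-second clip with four cuts.
cuts = [2.0, 4.0, 10.0, 12.0]
indices = shot_indices(cuts, film_length=20.0)
windows = cuts_per_window(cuts, film_length=20.0, window=10.0)
print(indices)
print(windows)
```

Indices of this kind carry no narrative content of their own. A series such as `windows`, computed over a full film, only traces how cutting pace varies over presentation time, which is the sense in which such measurements form an abstract backbone.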
The primary use of the approach is in film analysis. The multi-feature configurations of indices can be used to reliably 'fingerprint' films or sections, reliably because the indices are derived from large numbers of measurements. Computational film analysis uses a historical corpus of films and has been deployed over the past decade to corroborate and enrich historical analyses of film style.Footnote 42 The climax so far of efforts to integrate computational content analysis with film theory and analysis is Cutting’s (2016) report on narrative theory and the dynamics of popular movies. The corpus consisted of 160 English language films released between 1935 and 2010, ten for each sampled release year at five-year intervals. Figure 3 illustrates a typical course of the number of shot transitions over film presentation time, which can be interpreted as marking the acts and the pace of narration. An important outcome of the analyses is that clear physical support was obtained for the four-act structure proposed by film historian Thompson (1999) across the entire period. It should be noted that Thompson’s act structure was identified largely on the basis of higher level narrative segmentation.Footnote 43 Shot scale was unrelated to the act structure. Cutting added analyses of higher order level film features that can be interpreted to co-vary with narration.Footnote 44 Cutting then ventured upon a multi-feature analysis of the entire corpus. Associations among all indices across all titles could be reduced to four dimensions: motion, framing, editing and sound. They correlated in a meaningful way. For example, shot scale was inversely related to shot duration; in classical narration close-ups tend towards briefer durations than wide shots. Each dimension represented polar opposites between features, e.g., music vs. conversation for sound and close-ups vs. long shots for framing.
Computational content analysis can explore the dynamics of the dimensional representations over subsequent acts of movies. Figure 4 reproduces Cutting’s findings for prolog, setup, complication, development, climax, and epilog.Footnote 45 It would seem that the analysis winds up in a level of cinematic content representation that is grounded in directly given stimulus features, integrated with film-analytical features that can be readily indexed and seem relevant as production tools in regular filmmaking.
What does computational content analysis mean psychologically, that is how do indices and dimensions function in the viewer’s perceiving and comprehending events? Patterns of features trigger changes in viewers’ physiological, attention, perception and emotion systems, according to Cutting (2016, p. 27). Typical low-level configurations may correlate with possible effects on the viewer’s perception and experience of events. For example, shot duration may support interpretations of pace, mood and tension, think of drama’s long takes; temporal shot structure is functional for sustaining attention or suspense (e.g., when a sequence of brief shots abruptly merges into long duration shots), e.g., in thrillers; movement (of camera and objects on screen) serves arousal in the viewer, as in action movies; low luminance signals possible threat as in horror movies, while high luminance may lend 'a sense of other worldliness' (Brunick et al., 2013, p. 141). All low-level features can help viewers in categorising films as to genre, and changes in these will support segmentation of events and scenes, which is at the basis of smooth narrative understanding. Combinations of indices enable more interesting interpretations of possible experience effects.Footnote 46 However, because the studies that the overarching computational content analysis was based on do not involve response measurement, a direct connection between cinematic form (especially narrative procedures) and cinematic meaning that Cutting argues for is open to further elaboration. Even in the face of the richness of directly given information that has been extracted using computers, Cutting sees room for the use of cognitive schemas. The very narrative acts that are underlined by immediately given information may be schematic in nature, but he finds it more likely that their functioning is less dependent on memory-processes than the very high-level cognitive structures implied in cognitive scripts and TOM reasoning.
To conclude the sections on the cognition of film scenes, we seem to have made important progress in understanding how movies construct events in film viewers' minds and brains, as a state-of-the-art review has put it. Movies in part "dictate" events, actions and scenes to viewers' brains using an "alphabet" of visual and auditive features; viewers in turn contribute to the construction of story-worlds by developing and matching higher-order structural anticipations using embodied cognitive event, character and narrative schemas. Since 1916, the film units that have been analysed have grown from paired single stimuli (as in apparent motion experiments) to whole film acts (as in computational film analysis). Analyses of narrative structure from film theory have become for the psychology of film what harmonics and counterpoint analysis signify to the psychology of music or the theories of syntax and semantics to psycholinguistics. They inform psychological notions of film structure and organization.
The awareness of narrative film
The third part of The Photoplay deals with issues other than the psychological mechanisms or the psychology of film form, namely the awareness offered by the photoplay. It was only natural to Münsterberg as a child of his time to designate the special awareness that film creates as the explanandum in psychological research, the mechanisms of film stimuli impinging on attention, perception and memory being the explanans. His characterisations of this conscious awareness, of what it is like to watch theatrical films, or in other words of the phenomenology of the film experience, remain in my view as yet unparalleled. Apart from the sense of freedom that we have already discussed, they include attentional and affective experiences.
Münsterberg described enjoyment as the immediate effect of theatrical film, explaining it from the exceptional freedom of the imagination: "The massive outer world has lost its weight, it has been freed from space, time, and causality, and it has been clothed in the form of our consciousness. The mind has triumphed over matter and the pictures roll on with the ease of musical tones. It is a superb enjoyment which no other art can furnish us" (Münsterberg, 1916, p. 95). Light has been thrown on the remarkable fluency of the film experience noted by Münsterberg by current research in narrative procedures, and the mechanisms of continuity perception discussed in the previous section. Münsterberg also stressed that the enjoyment of photoplays depends on our experience of the film’s story as an emotionally meaningful world separate from reality: 'The photoplay shows us a significant conflict of human actions … adjusted to the free play of our mental experiences and which reach complete isolation from the practical world …' (p. 82). And finally, he singled out the role of focused attention in enjoyment. 'It is as if that outer world were woven into our mind and we were shaped not through its own laws but by the acts of our attention, …' (Münsterberg, 1916, p. 39).
Twentieth century academic psychology did not develop much of a body of theory and research on human consciousness. Hence it is not surprising that alongside research into perception and comprehension one doesn’t find much work on the conscious experience of film. Measurements of perceptual, attentional, cognitive and affective responses in experimental psychology are extremely limited with regard to the contents of consciousness that they tap. Lab tasks enabling measurement must be simple, e.g., identification, comparison or categorisation of visual stimuli, rather than free description or recall. Self-reports associated with such tasks must be quantifiable and take the shape of choice responses, simple intensity ratings or readily codifiable reports. Behavioural measures are farther removed from any contents of experience because these need to be inferred. Here, too, simple objective coding is a must. Descriptive and interpretative reports of the qualia and meaning of experiences afforded by film have been largely left to hermeneutic film criticism and phenomenologically oriented film philosophy in the humanities. Scholarship in these fields follows in the footsteps of Münsterberg. The present overview of the psychology of the film cannot go into it further; I refer to Sobchack’s (1992) volume on the phenomenology of the film experience. It opens with the proposition that film directly expresses perceptions, a proposition coming close to the observation in The Photoplay that the contents of the audience’s experience are perceptions, attention, thinking and emotion that are projected before them on the screen.
Absorption in film
Meanwhile, progress can be reported in understanding one aspect of the rich and complex film experience, namely its intensity. Münsterberg observed that the film audience’s enjoyment is due to prolonged states of attention strongly focused on a fictional story-world, so strong in fact that the here and now escapes consciousness and it seems instead as if an 'outer world were woven into our mind'. Elsewhere we have proposed to refer to the experience of intense attention as absorption in a story-world (Tan et al., 2017), following Nell's (1988) groundbreaking description of "being lost in a book". Media psychologists specialised in research on media entertainment (Vorderer et al., 2004; Bilandzic & Busselle, 2011) have developed a variety of measures capturing enjoyable absorption-like states afforded by narrative, television drama and video-gaming. We discuss four of these.
a. Narrative engagement (Busselle and Bilandzic, 2008, 2009) is a pleasant state of being engrossed or entranced by the narrative as a whole as it is presented in a book or film, including the activity of reading or viewing it.Footnote 47 (Tele-)Presence (Schubert et al., 2001; Wirth et al., 2007; and others) refers to the embodied awareness of being in a virtual world: being there with your body, in other words absorption in a story-world.Footnote 48 The concept has its origin in research into the experience of virtual realities.Footnote 49 Attempts have been made to ground mechanisms of film-induced emotion on presence, that is, the audience’s basic and embodied awareness of being in the middle of the story-world as a witness to events befalling characters (Anderson, 1996; Tan, 1994, 1996).
b. Green and Brock’s (2000) definition of transportation is the most frequently used conceptualisation of absorption in media-psychological research. It is considered a major gratification offered to readers of narrative and film viewers alike. It overlaps with presence in that it features a sense of being in the story-world, as well as a realistic and attentive imagery of details. The difference may be that as a metaphor transportation evokes associations with transition to or travel into the film’s story-world.Footnote 50 More than presence, the operationalisations of transportation entail personal relevance and participatory sympathetic feeling, amplifying the emotional quality of the experience.
c. Empathy is the common denominator for concepts referring to absorption in the inner life of fictional characters. Like transportation, it is seen as a major gratification in reading stories and watching drama and movies. Viewer empathy has been defined as perceiving, understanding and emotionally responding to character feeling in the seminal work on the subject by Zillmann (Zillmann, 1991, 1996). Perceived similarity and sympathy for the character (grounded in moral attitudes) have been suggested and tested as determinants of spectator empathy in drama (e.g., Zillmann, 1996; 2000; 2003; 2006).Footnote 51 There is still a need to sort out possible forms of empathy specific to the canonical conditions of the cinema which may be quite different from situations in real life where we observe other persons.Footnote 52 Moreover, empathy with film characters can be less or more cognitively demanding.Footnote 53 Identification (e.g., Cohen, 2001) seems to stand for complete absorption of the viewer’s self by a represented character.Footnote 54 It can be argued that empathy is the rule in film viewing while identification is the exception (e.g., Zillmann, 1995; Tan, 1996, 2013a, b), as most mainstream film narratives are mainly geared towards provoking the former rather than the latter. According to Smith (1995) they use 'alignment' techniques that promote perspective taking and allegiance strategies that foster viewer sympathy for the character while the distinction between self and character is unaffected.
d. Finally, flow (Csikszentmihalyi, 1997) is the odd one out in the series of absorption-like experience concepts reviewed here, because it applies not only to absorption in movies, narratives or games, but also to any activity that stands out for a certain intensity and intrinsic reward. The rather simple idea supporting the concept is that a pleasurable state is experienced when the challenges inherent in an activity just match the person’s capacities. In the canonical setup of mainstream film (and mainstream audiences) this balance is generally realised due to filmmakers’ skilful presentation of interesting story-events, and its overlap with attentional, perceptual and cognitive routines that film viewers have acquired in the real world. Mainstream movie continuity film style facilitates flow a great deal as it tends to minimize challenges posed by transitions from one view or perspective to another. Smith's (2012) studies were discussed above as relevant to smooth continuity of visual attention, and I would also mention the research on comprehension of events by Schwan (2013; Garsoffky & Schwan, 2009).
Obviously, these and other varieties of absorption are not mutually exclusive. Elsewhere we have presented qualitative empirical support for a dynamic interplay among the varieties of absorption (Bálint and Tan, 2015).Footnote 55
From the overview we may conclude that Münsterberg’s introspective psychology of the film experience is in large part echoed in the empirical observations gathered one century later. Viewers feel absorbed in another, exceptionally vivid reality, 'clothed in the [embodied] forms of our consciousness' (presence and transportation). Empathy is mentioned by Münsterberg as a prominent experience, and his notion of an unhampered stream of the imagination may correspond with the experience of flow. Focused attention is already in The Photoplay a major component of the film experience, one that would later be investigated in the research on bottom-up vs. top-down attention discussed above. Absorption, empathy and intensely focused attention can easily substantiate the enjoyability of watching films, as Münsterberg already had it. However, compared to Münsterberg’s conceptualisation of the typical film awareness, insights into how acts of imagination on the part of the spectator contribute to it have not advanced that much in the psychology of film.Footnote 56
A narrative simulation account of emotion in film viewing
Absorption is an affective state characteristic of the film experience. However, a description of the typical experience of narrative films is incomplete if more specific affective states are not considered. Watching movies has been identified with emotions. We go to the cinema to experience mirth, compassion, sadness, bittersweet emotions, thrill, horror, and so on in response to what we see and hear happening to characters and ourselves. Emotions of movie audiences have not received much attention since Münsterberg’s Photoplay. Twenty-first century film psychology has taken up where he left off, and a major step forward has been to regard the narrative structure of films as a fundamental starting point for explaining film viewer emotions. The narrative simulation account is, I think, dominant in today’s psychological approaches to the issue of why the cinema offers the intense and remarkable emotional experience that Münsterberg’s photoplays induced a century ago. Important work on emotion in media users has been done in media psychology, most of it on empathy with characters, but narrative induced emotion has not received much attention, as can be seen from a complete overview by Konijn (2013). Cognitive scholars in the humanities have highlighted different aspects of film narratives that induce perceptions of fictional events associated with intense emotional experiences (e.g., genre-typical film style: Grodal, 1997, 2009, 2017; Visch and Tan, 2009; narrative procedures, e.g., Smith, 1995; Plantinga, 2009; Berliner, 2017). I hope the reader will allow me to use my own work on the subject as an illustration. It is closely related to the cognitive-theoretical analyses just referred to.
I have found a cognitive approach to emotion in general psychology fruitful for narrative modelling of emotion in film viewing.Footnote 57 Investigations of film-induced emotion have raised the issue of apparent realism: how can a clearly fictional world be taken for real to the effect of intensely moving viewers? Oatley introduced a cognitive theory of narrative fiction as simulation (1999, 2012, 2013) that applies to film as a stimulus for possibly complex emotions. Narrative runs simulations on the embodied mind just as programs run simulations on computers.Footnote 58 I would add that film viewers take part in a playful simulation in which the film leads them to imagine they are present in a fictional world, where they witness fictional events that film characters are involved in (Tan, 1995, 1996, 2008). Being a witness involves embodied perceptions of what happens in a fictional world, as well as constructing and participating in events in the imagination, without acting on these. In the process, events are taken for real for the sake of playful entertainment. This position is related to Walton’s (1990) well-known account of fiction as make-believe.
Frijda’s cognitive theory of the emotions (Frijda, 1986, 2007) is the starting point for further explanation of emotional experiences in response to film. The theory posits that the emotion system has evolved for adaptive action in the first place. For example, the sight of a monster will spawn a strong urge to flee due to a basic concern for safety being jeopardised. Of course, film audiences do not run out of the auditorium. According to the cognitive theory of emotion, action responses are not fixed responses to emotional stimuli, but the result of appraisals of what they mean for a person’s concerns in light of the situational context. Playful simulation provides the contextual frame for the complex appraisal of apparent realism of film events. The appraisal has three stages: perceptual, imagination based and self-involved.Footnote 59
1. Many popular film stimuli provoke immediate and automated appraisals of concern relevance and ensuing emotional responses, due for instance to their nature of unconditioned stimuli in the real world. A snake popping out from the bush would be an example. Emotional appraisals in the cinema can be and often are empathetic. That is they include perspectives on events taken by film characters. Film technology in mainstream movies is used to emphasise emotional triggers; editing could strengthen the suddenness of the snake’s appearance, and photography could render fear releasers such as the typical movements of the snake more salient.Footnote 60 But popular films also present us with emotional stimuli that are immediately perceived as fake, for example a rubber prop snake. Due to the playful simulation frame further cognitive processing of perceptions takes place. In the first case, film viewers realise that just perceived events are not real but must be held true for the sake of a playful simulation. In the second, they realise that the fake stimulus is only a prompt, and comply with its invitation to hold the stimulus true and allow it to appeal to their concerns, also for the sake of playful simulation.
2. Once imagination takes over from perception, the reality status of stimuli is traded for believability. As part of the imagination, fictional events are matched with higher order genre-specific narrative schemas, and then dealt with as possibilities in a particular world. As Frijda (1989) argued when he discussed the apparent reality of fiction: 'Seeing a fake snake approach a real person is not scary. But watching an imaginary snake approach an imaginary Jane is. The first is seen as unreal in a real world, and the second as real in an imaginary world. And this is how we appraise events in fiction. The fun of art is in the play with the duality' (p. 1546). Play with the possibility of events in the imagined world and entertaining as-if emotions can suffice for genuine emotion to arise. As I argued elsewhere (Tan, 1996), the appraisal of the possibility of events in a particular fictional world can and usually does lead to genuine emotion, because humans have been equipped with a capacity to have emotions in response to mental representations of counterfactual and imaginary events.Footnote 61
3. The genuine emotion can, but does not need to, open up considerations of the believability of fictional events in the real world. Moreover, it can lead to imaginations in which the viewer’s self is involved in the events or their ramifications. The appraisal of fictional film events is treated in more detail in Tan and Visch (2018).
The search for film style and technology features that are conducive to particular emotional appraisals has been slow to take off. Cutting's computational content analyses were already mentioned. There are scattered empirical studies, e.g., of camera angle and editing pace by Kraft (1987) and Lang et al. (1995), respectively. Film technique manuals and critical analyses provide abundant intuitively convincing examples of how to produce emotionally appealing sequences. It is to be expected that computational film analysis will soon enable large scale studies of the use of style and technology in emoting scenes.
Back to emotion and action. As film viewers perceive film scenes to be projections on screen of a fictional world, they understand they cannot act, and their action tendencies are suppressed.Footnote 62 As importantly, one’s inability to act upon a fictional world is a strong trigger for emotional responses involving the imagination of action. Driven by sympathy, viewers desire that protagonists escape from a horrific situation. In their imagination they anticipate and hope that the protagonist is saved by someone or something and if need be by a fictional miracle.Footnote 63 Thus, they experience or exhibit a virtual form of action readiness (Frijda, 1986).Footnote 64 This readiness for action can be directly observed in film viewers from their "participatory responses" (Bezdek, Foy & Gerrig, 2013) - such as overt expressions of sympathy for a character (see also Tan, 2013b). However, there is one thing that film-viewers as witnesses invariably do when properly emoted: eagerly watch the events on screen.
Following cognitive film theory further, I consider the emotional experience of film to be the sum total of the experience of appraisal, of internal and external bodily expressions and of changes in action readiness, integrated in consciousness and accompanying the sensory intake of units of film.
Film, interest and enjoyment
An account of film-audience emotion is incomplete if it does not go into the question of why we actually take the trouble of watching movies. Münsterberg already wondered how mature people can become so emotionally absorbed in fantasy worlds. Narrative films can be argued to address two basic emotional concerns in particular, curiosity and sympathy (Tan, 1996). All sorts of narrative fiction, including film, provoke interest by presenting events with uncertain consequences. Thus, they address a basic curiosity, that is, a need for novelty, knowing and exploration. Interest is the emotion that responds to appeals involving this concern. Interest in film viewing does have a real action readiness to it, the one referred to above: to watch eagerly. Because the response in interest includes devoting and focussing attention on specific story-world events, its experience goes hand in hand with absorption. Mainstream film’s narrative is perfectly designed to support a characteristic systematic unfolding of interest as an emotion. Movies continuously present cognitive challenges that viewers know they can meet.Footnote 65 Silvia (2006) has shown in a large number of studies that this is the condition for optimal interest. I have referred to the core appraisal of narrative interest as the promise of rewarding outcomes, in terms either of desirability for a protagonist or mankind in general, or of coherence, completeness or elegance of a narrative’s structure, or both (Tan, 1996). In addition, the prospect of sought emotions, such as excitement, enjoyment and appreciation, is also part of the promise that ongoing film narratives constantly offer.Footnote 66 Interest is closely linked with enjoyment, the primary gratification that movies offer their audience. In the cinema interest is pleasant because it is fun to entertain anticipations of as yet uncertain story-outcomes.
Moreover, every outcome, even if it is unanticipated or unfavourable, is greeted with enjoyment because it answers one's curiosity. (In the case of sad, horrific or otherwise hedonically negative or mixed outcomes, "enjoyment" is not the proper label for the rewarding emotion. We return to the fun of unpleasant emotion in a later section.) On a final note, interest in film viewing is a case of narrative interest as a broader category of emotions, but the sensory qualities of the medium are relevant for how interest feels. Curiosity to know is in part a desire for the closure of a propositional narrative structure, but in the cinema we do not only want to know but also to see and hear. The enjoyment of a couple's kiss or of a heroine's return after an odyssey of some sort is, in the cinema, incomplete when the event is not shown. In the cinematic appraisal of interest, an anticipation of embodied completion of our narrative-led imagination is a major ingredient of the promise of reward.
Emotional responses to fiction film worlds
The second concern that movies touch upon is sympathy. That this concern is active throughout the reception of all traditional movies answers the question why film viewers care about damsels, hobbits or gorillas in distress. There is a fundamental human need for bonding with others, and recognising any fictional character as someone 'like us' supposedly suffices for sympathy to arise.Footnote 67 Mainstream films activate the concern to the full as their sympathetic protagonists meet with ups and downs on the way to their goals. Sympathy-based emotions like disappointment, regret, awe, mirth, suspense, hopes and fears, compassion and sadness occur in response to obstacles or their removal on the way to protagonists realising their projects.Footnote 68 Because these emotions arise in response to events (appraised as desirable or undesirable) in a fictional world, we refer to these emotions as responding emotions.Footnote 69 Some frequently experienced sympathetic responding emotions, such as fear, sadness, compassion and being moved, can be empathetic, that is, they require mentalising a character’s inner life. Put more precisely, empathetic emotion requires that the viewer’s appraisal of any fictional events reflects the perspective of a character; the event is understood from a character’s imagined point of view and with her concerns and feelings. In its most intense forms, sympathy can look and feel like self-indulgent sentiment. However, there is no point in condemning tears of sadness or joy as silly. The term sentiment is not necessarily pejorative. The appraisal of a character’s suffering or good doing can involve an acknowledgment of its superior measure, notably in relation to the self’s suffering or good doing. In my compassion with or admiration for a beloved character I can feel that her fate is really woeful compared to mine, or that her altruistic achievements make mine totally insignificant.
Being moved, awe and having goose bumps are emotional responses accompanying such appraisals (Tan and Frijda, 1997; Tan, 2009; Wassiliwizky et al., 2017; Schubert et al., 2018).Footnote 70
However, not every responding emotion requires empathy or sympathy.Footnote 71 The sympathy concern does not only drive our siding with characters and responding emotionally to the ups and downs in their projects. As I proposed (Tan, 1996), it can make us invest affectively 'film-long' in characters, on top of going along with their hopes and fears, successes and failures. We are also witnesses of characters’ slower and more profound development into personae we would want them to be. The share of action or plot development relative to that of character differs from one genre to another.Footnote 72 Generally, action movies and especially comedies tend to allow for only minimal character development, whereas the drama genres may indulge in it. In these genres, viewer interest may depend in larger part on characterisation and character development.
Another class of emotions responding to the fictional world is 'spectacular', that is, spectacle-based. The spectacle of landscapes, buildings, natural objects and artifices, human or animal figures in motion, can surprise us, touch on a sense of beauty and invoke appraisals of harmony, elegance, or serenity. In some genres the spectacle of explosions, injury, cruelty, disfiguration, etc. may incite emotions such as disgust and fear. Spectacle-based emotions do not rely on empathy of any depth, their stimulus being the mere view or sound of a fictional scene; nor are they dependent on sympathy. In more traditional terms, image and sound combinations of objects, events, and figures in the fictional world can be emotionally appraised as spectacular, beautiful, sublime, horrific, bizarre, absurd and so on. Amazement, enjoyment, awe (the wow-feeling), entrainment, being moved and aesthetic appreciation are apt labels for ensuing emotions. Like all emotional responses to fiction worlds, spectacle-based emotions can also arise when we read narratives, but in the cinema, they compete conspicuously with plot and character-driven interest and sympathy-based affective response. It seems as if the viewer’s witness role is temporarily swapped for a spectator role.Footnote 73 The viewer can identify even further with patterns of motion or sequences of image and sound that lack reference to the film’s story-world. Viewers may contemplate lyrical associations of visuals, sounds, music and symbolic concepts in embodied consciousness, as Grodal (1997) proposed. Whereas story action imaginations give rise to emotions, lyrical associations are responded to with moods, e.g., nostalgic, tense or relaxed ones. The seemingly immediate representations on screen of emotions through camera movements and associative editing that Münsterberg described would be examples.
Emotion structure of narrative film
As a way to profile the dynamics of emotion across an entire film I proposed to represent these in a succinct model, the affect structure of a film (Tan, 1996). The model represents the course of interest and of responding emotions in time as predicted by the events as they are successively presented by the film.Footnote 74 Generalising across titles, a most general hypothesis is that the level of interest during mainstream movies tends to rise globally. This is because on the way to protagonists’ goals, stakes tend to go up with every novel complication. This will lead to an increasing promise of reward roughly between the prologue and climax acts. Locally though, interest peaks and dips alternate over subsequent scenes, depending on genre and particular film. Figure 5 displays an example course of interest measured in viewers of the film In for treatment. In this study of emotions induced by a tragic drama on a terminally ill hospital patient, we found that an initial appraisal of the protagonist as increasingly suffering under the yoke of an oppressive hospital regime was associated with a responding emotion of compassion. After the complication act, the protagonist’s acts of resistance against the hospital’s regime gave way to admiration due to an appraisal of the protagonist’s sense of self-determination. Both measures determined the level of interest, which was measured continuously using a seven-point slider device (Tan and van den Boom, 1992). Affect structures can be more or less generic. That is, responding emotions are, just like the plots, characters, and events that prompt them, characteristic of a certain genre. The study of genre-based emotion has been concentrated in research on undesirable effects of watching violence, sensation or horror in entertainment fare; see, e.g., a volume edited by Bryant and Vorderer (2006). Psychological research into the role of viewer genre knowledge is under way (e.g. Tan & Visch, 2009).
The appeal of unpleasant emotions
A brief glance at the success rates of films featuring sad, violent or horrific content illustrates the appeal that unpleasant emotions can have to audiences at large. Münsterberg already objected to vicious effects of violent and repulsive imagery in 1910s photoplays, contents that he observed to be worryingly attractive. The psychology of the film holds various explanations in stock, but none has as yet been chosen. The best documented proposal is Menninghaus et al.’s distancing-embracing model, which stipulates two complementary mechanisms. One rids painful, disgusting or otherwise unpleasant aesthetic stimuli of an impact that would prevent any enjoyment or appreciation of the stimulus. The other allows for experiences that are 'intense, more interesting, more emotionally moving, more profound, and occasionally even more beautiful' (Menninghaus et al., 2017, p. 1). The model is meant to explain the prevalence of negative emotion in all art forms, and harbours a great many classical approaches to the issue. Media psychologists have proposed what I think are regulation accounts of the pleasures of negative emotion. An emotion such as horror results from an appraisal of monsters etc. as threatening and repulsive, but the emotion itself, too, can be subject to appraisal. Likewise, your crying in the cinema may induce embarrassment upon your realising that it is only a film you are watching.Footnote 75 Serious drama, the contents of which can be appraised as poignant or thought-provoking (Oliver and Hartmann, 2010), and more in particular independent arthouse titles that tend to provoke appreciation and elevation rather than enjoyment, seem to compensate for the most painful experiences they offer by a high instruction or (self-) reflection potential (Oliver & Bartsch, 2013). They offer continuous promises of broadening insights or revising one’s views of the world and the self, possibly only materialising to the full long after the show.
In my own work I have pointed at the modulating effects of genre schemas (Tan & Visch, 2017) and narrative interest on negative emotions.Footnote 76
In closing the sections on film-induced emotion we need to note that the account of the cognitive appraisal of emoting events given here is simplified. Even straightforward film narratives can have complexities in terms, e.g., of plot lines, or character and narrator perspective that affect the intricacies of emotional events. I refer readers to Oatley’s (2012; 2013) discussion of appraisals of fictional events that are more sophisticated in this sense. More generally, film psychological research is needed into the use of more complex theory-of-mind (ToM) heuristics in the comprehension of film narrative, and in emotional appraisals of film events.
The conclusion on the psychology of film awareness must be, I think, that the gripping nature of the film experience is as astonishing today as it was to early film audiences. Media psychologists have started to measure it, and cognitive film scholars have forwarded theoretical frameworks for an account of film viewer affect and emotion. But the phenomenology of film has not been expanded by film psychologists beyond the descriptions of what it is like to watch a movie provided in The Photoplay.
The psychology of film as art
Whether or not the awareness of film entails appreciations of artistry can only be a rhetorical question, but the psychology of the film has not explicitly addressed the subject. After Münsterberg and Arnheim hardly any psychologist considered film as an art form at all. Neither has general psychological aesthetics taken film into consideration. The psychology of narrative film as it has developed since the 1990s has addressed the aesthetics of movies, but rather implicitly. We have discussed psychologists’ efforts to explain the natural fluency in the perception of story-events that Münsterberg already found characteristic of the film experience. They pointed at the conventional use of continuity film style. Mainstream cinema’s narration has been demonstrated by cognitive film theorists to be at best marginally self-conscious (Bordwell, 1985, 2006). That is, formal features of a film’s composition, style and use of technology are non-salient and subservient to the viewer’s reconstruction of and absorption in a fabula. The viewer’s construction of a story-world is only discreetly cued by the narration, and the formal or stylistic patterns that do the job tend to escape consciousness to a more than considerable degree (see Tan et al., 2017). We could say, I believe, that the psychological aesthetics of popular film is, as it stands, first and foremost about absorption, the intense and fluent imagination of being in a fictional world. And it should be added that a psychological aesthetics of forms other than popular narrative fiction film is missing. Available knowledge suffices to propose a psychology of the thriller, the romance drama or the coming-of-age film, but not a psychology of the documentary, the expressionist, the surrealist or the postmodern film, let alone of experimental, avant-garde and other museum film art forms.
After all then, at present we are not far removed from Münsterberg’s speculation on the aesthetic experience of theatrical film as intense absorption due to the inner harmony of a film’s parts and conditional on only modest deviations from realistic photo-representations of the worlds that it plays.
However, as we write, everything seems set to embark on research in the film audience’s aesthetic appraisals of movies. We can rest assured that at present 'the inner parts' of mainstream film in terms of contents, style and technology have been well-described by film theorists such as those referred to above. They can help psychologists teaming up with computer vision and hearing specialists to develop computational analyses of 'the inner harmony between the parts'. As a favourable sign of the times we also note a growing interest in the implicit knowledge that the regular film audience has of patterned uses of film style and technology in various forms and genres (see, e.g., Visch and Tan, 2009). Moreover, the first attempts have been made to identify the psychological dimensions that underlie film audience aesthetic tastes.Footnote 77 Dimensions of what I called the Artefact emotions, that is the affective evaluations of films as aesthetic products will soon be identifiable from reviews by critics and the film audience at large that are already available in large data repositories.Footnote 78 Large scale highly data-intensive research can be accompanied by smaller scale laboratory studies of whether and how viewers attend to aesthetically relevant patterns of formal and stylistic features.Footnote 79
The agenda that Hugo Münsterberg set for the psychology of the film, explaining the film experience through revealing the psychological mechanisms underlying it, and accounting for its aesthetic functions, is after a century still leading. I believe that psychologists of film have over the century not added new questions, while the ones he posed have been shown to be complex or even resilient. Nonetheless the field has gradually expanded. After the 1970s growth accelerated and today we face what in modesty may be called a surge. Two film-psychological books, Art Shimamura’s Psychocinematics (2013) and Jeffrey Zacks’ Flicker: Your brain on movies (2014), have recently filled the void left after The Photoplay.
The review of psychological studies into the film experience presented in this contribution is highly selective. It was not meant at all to cover the entire field, if only because we selected achievements from the vast research area of moving images and their perception. This is why the essay is titled 'A psychology of the film' rather than 'The etc.'. Granted its basic limitations, an overview of a century of film psychology could conclude with a comparison with the research agenda that was set in Münsterberg’s Photoplay. The typical gripping experience that mainstream movies offer the audience has now come to be characterised as a sense of being absorbed by and quasi-physically present in a film scene that feels as if it is going on as smoothly and continuously as a scene in real life. Considerable progress has been made in understanding how the basic psychological functions of attention, perception and memory contribute to viewers’ comprehension of film. An understanding has developed of how attentional, perceptual and cognitive mechanisms dovetail with the solutions and norms of traditional cinemascopy. In the conventional 35 mm theatre set-up, with its dark environment, high-density projections extend over the limits of the foveal acuity field, screens are big enough to allow for sufficient stimulation of the peripheral motion-sensitive visual field, and the spinning projector shutter makes for smooth stroboscopic movement. Moreover, the visual system is quite resistant to perspective transformations due to less optimal viewing points, probably through extracting invariants under transformation (Cutting, 1986). Mainstream narrative continuity film-style ensures a fluent perception and comprehension of a film’s story-world, action, characters and their inner lives. Emotional responses can be explained from the development of the story and the progress of protagonists’ projects.
And yet, a lot less effort has been spent in theoretically elaborating further on what the film experience is. There is a general disbelief that it would involve a mere recognition of events, situations, persons etc. as we know them in the real world. But what exactly the spectator’s imagination contributes to the typical awareness of the film is still mysterious. And how filmic events, and the ways they have been staged, acted, framed, photographed and edited exactly influence and prompt acts of imagination on the part of audiences, has only in part been understood.
Meanwhile, the supply of "photoplays" has immensely multiplied and diversified since 1916, but the mainstream narrative film has by far remained the most popular form. Today’s ubiquitous access to moving images through a multiplicity of screens has made it more urgent than ever for psychologists to understand the experiences associated with extremely different cinematic devices. They range from handheld phones to giant 3-D multiplex screens and surround installations in museums. Canonical set-ups of the cinema also tend to diverge because of networked interaction technologies seeking application in the production, distribution and exhibition of motion pictures. Psychologists of the film can use their current understanding of how audiences experience mainstream cinema as a basis for differentiating what film semiologists call 'dispositives': clusters of production, exhibition and reception practices characterised by specific expectations, attitudes and competences of their end users.Footnote 80
The psychology of film is rapidly developing into an interdisciplinary field. Münsterberg’s psychological study already reflected inspiration from fields far removed from experimental psychology such as the then conventional practice of the photoplay as well as from Aristotelian poetics of the theatre play. In the same vein, current psychologists of film as we have seen, improve their understanding of the perception and cognition of film in a collaboration with experts in the analysis of narration in the fiction film. Advances in current models of film viewer attention featuring narrative cuing are profoundly informed by (historical) film analyses.Footnote 81 Scholars in cognitive film studies, such as those collaborating within the Society for the Cognitive Study of the Moving Image are steadily producing in-depth analyses of film at work conjointly with the viewer’s mind.Footnote 82 The same goes for the (more modest) advances made in psychological models of film-produced emotion. Further collaborations with specialists in machine-analysis of image and sound can be expected to add to an objective identification of formal and stylistic film structures, also beyond the domain of traditional mainstream film, 'in the wild' of cyberspace, and in experimental art cinemas.
The technology of measuring psychological responses to film structures (perception, attention, memory and affect) has also developed tremendously since Münsterberg founded the perception lab at Harvard. Gaze tracking, fMRI and TMS have been added to the psychophysical and cognitive response registrations. Integration of large scale image analysis data with behavioural measures obtained in the lab or as 'big data' is the next step in the development of film psychology. The study of integral responses to units of film extending beyond a few seconds, entailing entire actions, events, scenes and acts, or even films as a whole, requires new response recording devices and data models. Perhaps it will be feasible within a decade or so to append large emotional response datasets obtained from social media and film database metadata to the computational content analyses described above. We will then be able to categorise films into meaningful clusters, e.g., genres and subgenres based on relations between themes, plots, film style and emotion profiles. Small scale lab experiments can tell us more about what exactly the mind adds to the image on screen and the sound from cinema loudspeakers. Let me single out as the leading issue the question of how bottom-up and top-down mechanisms interact in producing the film experience.Footnote 83 Diversification of the set-up of in-depth studies is also necessary, following the multitude of conventional set-ups of film viewing on various screens and in on-line or 'live'(?) exhibitions.
And just as in 1916, a select but growing minority of researchers in academic, empirical psychology want to understand why and how it is we perceive and what it is like to enjoy movies. They want an understanding because first they are movie-loving psychologists and second they find film a challenging testing ground for fundamental models of attention, perception, memory, imagination, emotion and aesthetics.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A more detailed discussion of the functions in photoplay viewing can be summarised thus: As regards the perception of film scenes, Münsterberg argued that in the cinema depth is seen without spectator’s taking it for real, that movement is perceived not without the spectator’s mind adding the quality of smooth motion to merely seeing a succession of positions. For example, apparent movement of in fact stationary lines is '… superadded by the action of the mind, to motionless pictures' (1916, p. 29). Attention in the cinema concentrates the mind on details that acquire an unusual vividness and become the focus of our impulses and feelings. Close-ups objectify this weaving 'of the outer world into our minds' (p. 39). Attention is characterised by a series of subsequent shifts in its object. Shifts are provided by scene or action details made salient by spatial mise-en-scène, notably actor expression (movement and gestures), and mobile framing. Memory is used at any moment to remember events presented earlier in the film. Just as attention and perception are an instrument of the imagination, memory enables the fusing of events in our consciousness that are physically apart. Münsterberg’s view of the emotions showed similarities with James’ theory on the subject, as it stressed their embodied character; emotions cannot do without behavioural and physiological expressions. Münsterberg proposed that emotions that film audiences experience are portrayed on screen. The viewer’s imagination transforms what they see into their own felt emotion: The 'horror, pain and the joy' that spectators go through are 'really projected to the screen' (p. 53). In addition, he introduced a distinction between what we would refer to today as emotions based on empathy with characters on the one hand, and on the other emotions responding to the scenes they are in.
Münsterberg’s observation of how film expresses the basic psychological functions has been compellingly argued by Baranowski and Hecht (2017) in their excellent review of Münsterberg’s Photoplay.
Even if what we call today automated responses do have a place in the psychological functions, perception, attention, and memory are according to Münsterberg in the end acts of the mind, and imagination is even more so.
The aesthetic experience is grounded in a Kantian conception emphasising the completeness of the work of art in itself, and an explicit denial of the contemplant’s desires or practical needs in it.
This in turn requires that we 'enter with our own impulses into the will of every element, into the meaning of every line and colour and tone. Only if everything is full of such inner movement can we really enjoy the harmonious cooperation of the parts' (p. 73).
This probably not in the least due to the stability of the experimental and social have been on the agenda of the psychology of film ever since. The functions and mechanisms of the mind that experimental research focuses on have globally remained the same, and the interest in aesthetics has not waned.
Constancies in visual perception are disrupted due to the optical and mechanic qualities of film. Examples in point include reduced depth, absence of colour, object shape and volume distortions due to insufficient information on object size or camera’s distance.
A famous example is the ballet sequence in René Clair’s Entr’Act (1924). Filmed through a glass plate on which the dancers move, they are seen from a most unusual angle, at least compared to the canonical views that theatre audiences have, i.e., from below, and from an as unusual distance, i.e., from nearby. So close indeed that their robes fill the entire frame, and the spectator is struck by their expanding contours in the 2D plane of the screen.
To be sure, his treatment of the perception of movement, dynamics and expression in works of all arts seems to be modelled after the organisational principles the mind uses in shaping the film experience.
Cutting has often convincingly argued that stroboscopic motion is a better label than apparent motion. His definition is 'a series of discrete static images can sometimes render the impression of motion' (Cutting, 2002, p. 1179).
Why and how we see motion has been as basic to the study of visual perception as questions of perception of colour, depth, and shape. Helmholtz proposed that what we need to explain is how retinal images that correspond one-to-one, i.e., optically, with a scene in the world are transformed into mental images, or percepts that we experience. In the case of apparent motion, we need to understand in addition how a succession of retinal images is perceived as one or more objects in motion.
By smooth is meant that no transitions or flicker are seen, and no blurring of superposed images occurs. The problem of apparent motion in film has been formulated in this way by the Dutch perception psychologist and filmmaker Emile van Moerkerken in an unpublished chapter written in 1978. The issue of why and when flickering rather than smoothly projected images are seen has been resolved technically, through trial and error. Cinematic projectors need to present at least 24 frames per second if flicker is to be avoided, and higher frequencies, for instance 72 fps, are even better (e.g., Anderson, 1996, pp. 54–59). These frequencies are above the human perception system’s critical fusion frequency, at least for the conventional luminance ranges in cinematic projection.
In the late nineteen fifties the organisation of the cortical cell complexes for visual perception in layered columns was identified by neurophysiologists Hubel and Wiesel (1959). Cells in Brodmann areas 17 and 18 were found to be sensitive to different aspects of motion (e.g., orientation and spatial vs. temporal resolution), while integration into forerunners of motion perception is assumed to take place in areas V4 and MT.
Luminance and colour identification have been shown to interact with the more motion-dedicated complexes in delivering impressions of motion, while the phenomenon of perceiving depth from movement has been very well documented.
For example, form-invariant apparent motion, which seems to require somewhat less elementary integration, has been shown to be attributable to specialised MT cells for slower and faster motion (O’Keefe and Movshon, 1998). As another example, Anstis (1980) discovered a system based on comparison of subsequent locations for apparent horizontal motion of a single dot, and another one for the perception of wave-form motion of an array of dots.
For example, it has been reported that test participants accurately perceive velocity of motion of a grating pattern only when they pay attention to its details (Cavanagh, 1992).
As another example, tension in a static work of art is perceived due to the brain’s synthesis of forces from implied movements, such as outward-directed tensions perceived in symmetrical geometric shapes. These can be observed in 'gamma movement' (Arnheim, 1974, p. 438).
The presentation times are short (flashes), say two hundred milliseconds. The objects differ between the two presentations only in spatial position; we refer to these as A1, for object A in position 1, and A2. Depending on the interval between presentations, apparent motion can be seen. With a briefest interval simultaneity of objects A1 and A2 is seen; less brief (appr. 100 ms) makes us see 'pure motion', that is, 'objectless movement'; with still briefer intervals (appr. 60 ms) we see 'optimal movement' of the object A1 to A2; and with briefest interval partial movement.
Wertheimer believed that perceived motion patterns reflected a short-circuiting between cells in the brain that were successively stimulated.
For example, among Korte’s laws, proposed in 1915, was a rule stating that the ratio of spatial distance between shapes and the interval between successive presentations was constant for the perception of 'good motion', clearly a Gestalt-like pattern. This coupling of the two features, obtained in controlled studies, remains surprising to this day because purely mechanistic intuition would have it that increases in spatial distance would need 'compensation' by briefer inter-stimulus intervals to preserve smooth apparent motion. A related discovery, reported by Kolers (1972, p. 39), also militates against light-hearted use of an analogy with mechanics: decreasing the spatial distance between successively presented shapes does not necessarily result in better movement.
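As a hedged illustration of the constant-ratio rule described in this note, the relation can be written out as follows. The symbols $s$ (spatial distance), $\Delta t$ (interval between presentations) and $k$ (the constant ratio) are notation introduced here for convenience, not Korte's own.

```latex
\frac{s}{\Delta t} = k
\quad\Longrightarrow\quad
\Delta t' = \frac{s'}{k} = \frac{2s}{k} = 2\,\Delta t
\qquad \text{for } s' = 2s .
```

On this reading, doubling the distance calls for a doubled rather than a shortened interval. The 'mechanistic' intuition mentioned in the note would instead expect a trade-off, of which $s \cdot \Delta t = \text{const}$ is one simple (assumed) form.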
First, the physiological account resting on 'prewired' neurocircuitry cannot do without integrative operations at a higher level of mental processing involving integration across separate cortical modules. Even if such operations are prewired, they represent contributions of the mind. Second, and as importantly, the impact that visual stimulus features have on the perception of movement, especially of more complex forms, has been shown to be sensitive to control by the will within certain bounds. Third, figural processes in apparent motion appear to be extremely plastic, defying explanations by stimulus factors, as the example of induced motion illustrates.
As an illustration, even a somewhat forgotten proposal by Van der Waals and Roelofs (1930), according to Kolers, seems to go. They proposed that in apparent motion, the intervening motion is constructively interspersed in retrospect, that is, only after the second presentation of the object. And after Kolers' volume on apparent motion, several proposals have been put forward on possible mechanisms. For example, Kubovy and Gepshtein (2007) demonstrated in two experiments that spatial and temporal distances act either in trade-off or coupled to one another to provide for smooth apparent motion; the one at low speeds and the other at high speeds. None of the proposals has been accepted as the final solution, partly because different definitions of the factors or the criterion for motion have been used.
Michotte (1946) attempted with some success to capture configurations of moving objects that would be perceived as instances of causation, a mentally represented concept. For example, block A is seen to 'push' block B forward if A approaches B (which is standing still) with an appropriate speed and contact time. Alternatively, B will be perceived to 'depart' if some time in contact has elapsed before B moves away from A. In fact, Michotte’s experimental phenomenology was influenced by Brentano, who was a major inspiration to the early Gestalt psychologists as well. Another great contribution by Michotte to the psychology of the film was that he was one of the first to analyse the problem of the apparent reality of cinematic scenes that Münsterberg and Arnheim had signalled. His diagnosis was that we see non-real objects, that is, shapes projected on the screen. However, we do perceive, physiologically, real movement of these, and this is a condition presumed to be decisive for perceiving reality. Heider and Simmel are known for their demonstration of the inevitability of the event-, person- and story-schema-based inferences that viewers of simple animated geometric figures tend to make (Heider and Simmel, 1944).
Note that objects are not part of an optic array, as the latter refers to the metrical organisation of patterns of light.
There are certainly limits to the likeness of the dynamical optical flow offered by film images to real world ones. First, the flow is interrupted by cuts, and second, the projected image in the cinema constrains the optic flow in a variety of ways. (Thanks to one of the anonymous reviewers.)
The discussion of Hochberg and Brooks’ psychology of the film is based on an earlier essay (Tan, 2007).
Hochberg and Brooks (1996a) provided wonderful examples of the intricate aesthetics of camera movement when filming a human figure in motion, examples that require frequent analyses of filmed dance, or filming dance oneself, as Brooks has indeed done. Movement may be seen where there is actually none, and apparent reversals of direction or apparent stasis may all occur, even in parallel. Hochberg and Brooks (1996b) demonstrated that complex movements need to be ‘parsed’ by viewers into components depending on factors such as fixation point and even viewer intentions. A direct realist explanation of film awareness would soon stumble on degrees of stimulus complexity too high to capture in optical array invariants; input from other cognitive structure-based mechanisms capable of selecting candidates for 'pick-up' would be necessary.
Hochberg (1986) stated that in some cases only the most complex cognitive efforts could explain an understanding of shot transitions, an understanding that could only be conveyed through literary analysis. Here he was probably referring to cases in artistically high-end productions.
For example, Hayhoe (2007) criticised Hochberg’s stipulation that mental schemas used in understanding shot transitions cannot be spatially precise or complete. She quoted evidence of the use of self-produced body movements following a mental map with extreme precision.
According to one such theory (the so-called Event Indexing Model, Magliano, Miller & Zwaan, 2001), viewers of film, like readers of stories, generate embodied cognitive models of (story-) situations. These mental models represent sequences of events, people and their goals, plans and actions, in spatiotemporal settings. The situation model is continuously updated while the film proceeds. Updates follow upon the identification of changes in story-entities (e.g., movement of characters or objects), time, causality and intentionality.
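The updating rule described in this note can be sketched, very roughly, as a comparison of successive events on the four dimensions it names. This is an illustrative toy, not the published model: the class `EventIndex`, its field names and the example labels are all assumptions made for this sketch.

```python
from dataclasses import dataclass

# Dimensions named in the note; the identifiers are illustrative assumptions.
DIMENSIONS = ("entities", "time", "causality", "intentionality")


@dataclass(frozen=True)
class EventIndex:
    entities: frozenset   # characters or objects involved in the event
    time: str             # label of the story-time frame
    causality: str        # label of the causal chain the event belongs to
    intentionality: str   # label of the goal currently pursued


def changed_dimensions(previous, current):
    """Return the dimensions on which two successive events differ."""
    return [d for d in DIMENSIONS if getattr(previous, d) != getattr(current, d)]


def update_counts(events):
    """For each event after the first, count the changed dimensions.

    A count of 0 means the (hypothetical) situation model needs no update.
    """
    return [len(changed_dimensions(a, b)) for a, b in zip(events, events[1:])]


# Hypothetical three-event sequence: nothing changes, then a new character
# enters and story time moves on.
events = [
    EventIndex(frozenset({"hero"}), "day 1", "chase", "escape"),
    EventIndex(frozenset({"hero"}), "day 1", "chase", "escape"),
    EventIndex(frozenset({"hero", "rival"}), "day 2", "chase", "escape"),
]
print(update_counts(events))  # prints [0, 2]
```

Counting changed dimensions is only one possible way of operationalising an 'update'; the note itself does not specify how changes are weighted.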
This synthetic response by the viewer can be taken as the actual recognition and categorisation of an event or action. Neuroscience research has identified areas of the brain involved in recognising and 'simulating' actions such as grasping an object, or exhibiting a facial expression, e.g., Hasson et al. (2004).
As an example study, Garsoffky et al. (2009) demonstrated that the recognition of events by film viewers improved when the framing of objects or events across shots adhered to viewpoints that are common in real-world perception. Other studies tested the notion that movies adhering to this style present viewers with simplified event views that they can readily integrate into an available event schema (e.g., Schwan, 2013).
The cueing of attentional shifts to the target portion of screen B can assume distinct forms, such as match on action, establishing and shot/reverse shots, and point shot. The attentional shift has carried the conscious experience across the discontinuity in views. The theory is documented by numerous analyses of scene perception, in which analysed shot contents are overlaid with dynamic gaze maps. The model can explain how violations of continuity principles result in less efficient gaze behaviours. Artistically motivated violations are taken seriously, but dealt with as atypical for the canonical set-up.
Bezdek et al. (2015) report a study in which participants were shown a film scene at the centre of fixation while checkerboard patterns were flashed in the periphery of vision. The results of fMRI analyses showed that activity of peripheral visual processing areas in the brain was diminished with increasing narrative suspense of the scenes, whereas activity in areas associated with central vision, attention and dynamic visual processing increased.
In one experiment, viewers were presented with a sequence from Moonraker in which James Bond jumps out of a plane and can be expected to fall 'safely' onto a circus tent. This high-level event schema-based cognitive expectation was enhanced in one condition but not in another, through providing a written context before the sequence was shown. It turned out that providing context knowledge led to the critical inference and to less surprise, pointing at the functionality of high-level attention cues. However, gaze behaviour hardly differed between the high-level cued and non-cued viewers. Moreover, effects predicted from a tyranny of film analysis of the sequence (that is, where viewers looked and at what) were much stronger than the subtle effects of high-level cognitive processes.
The computation of visual salience can easily be extended to the case of film by replacing the input image by a series of frames and the output by an array of saliency maps. Furthermore, low-level features such as colour and orientation need to be integrated over successive images into dynamic ones, e.g., changes in orientation, and into motion features.
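The extension described in this note, from one image and one map to a series of frames and a series of maps, can be sketched as below. This is a deliberately minimal toy under stated assumptions, not an established saliency model: frames are small grids of grey levels between 0 and 1, 'static contrast' is taken to be each pixel's deviation from the frame mean, and the only dynamic feature is the pixel-wise change from the previous frame. All function names are hypothetical.

```python
def static_contrast(frame):
    """Absolute deviation of each pixel from the frame's mean grey level."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [[abs(p - mean) for p in row] for row in frame]


def motion_feature(previous, current):
    """Absolute change of each pixel relative to the previous frame."""
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(previous, current)]


def saliency_maps(frames):
    """Return one saliency map per frame: static contrast plus motion."""
    maps = []
    for i, frame in enumerate(frames):
        static = static_contrast(frame)
        if i > 0:
            motion = motion_feature(frames[i - 1], frame)
        else:
            # The first frame has no predecessor, so its motion feature is zero.
            motion = [[0.0] * len(row) for row in frame]
        maps.append([[s + m for s, m in zip(srow, mrow)]
                     for srow, mrow in zip(static, motion)])
    return maps


def most_salient(saliency_map):
    """(row, column) of the highest value in a saliency map."""
    cells = [(r, c) for r, row in enumerate(saliency_map) for c in range(len(row))]
    return max(cells, key=lambda rc: saliency_map[rc[0]][rc[1]])


# A bright dot moving one pixel to the right on a dark 3x3 background.
frames = [
    [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
]
maps = saliency_maps(frames)
print([most_salient(m) for m in maps])  # prints [(1, 0), (1, 1)]
```

Summing the static and dynamic features with equal weight is an arbitrary choice made for the sketch; actual models combine several feature channels at multiple scales.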
For example, an international group from the universities of Brescia and Teesside has recently proved able to predict movie affect curves, that is, dynamic patterns of emotional responses, from low-level features such as colour, motion and sound, while taking into account the influence of film grammar (e.g., sequences of varying shot-types) and narrative elements (e.g., script or dialogue analysis classifications). The analysis of the grammatical and narrative features can be supported by the computer but is not entirely algorithmic and machine-executable. The emotional responses were measured using physiological and self-report measures (Canini et al., 2010).
In his earlier widely acclaimed work in general visual perception, Cutting continued the Gibsonian ecological approach to the perception of real world scenes, attempting to find formal extraction and coding principles sustaining the direct pick-up of behaviourally relevant information. See, e.g., Cutting (1981), in which ecological tenets regarding the perception of events based on invariant structures in the information offered by the visual stimulus are set out. This line of research also included cinematic perception. An example is his study on the perception of rigid shapes when viewers are seated at extreme angles vis-à-vis the centre of projection, e.g., front row side aisle (Cutting, 1987).
In the essay Cutting lists the cues in the optical array that sustain the perception of distance in the real world, and then elaborates on how filmmakers manipulate depth cues in order for the audience to perceive scenes exactly the way the narrative requires them to.
Following the convenient overview in Brunick et al. (2013), they are: for duration, the average shot duration in seconds; for temporal shot structure, the distribution of shot durations; for movement, the degree of difference between pixels in adjacent frames (zero, when frames are identical, means no movement); for luminance, the degree of black vs white of images; and for colour, the distribution of hues and degrees of saturation of frames.
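Three of the measures listed in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the cited studies: frames are small grids of grey levels between 0 (black) and 1 (white), movement is approximated by the mean absolute pixel difference between adjacent frames, and the function names are hypothetical. Colour is left out.

```python
from statistics import mean


def shot_durations(cut_times, film_length):
    """Durations (s) of the shots delimited by cut times (s from film start)."""
    boundaries = [0.0] + sorted(cut_times) + [film_length]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


def average_shot_duration(cut_times, film_length):
    """Mean shot duration in seconds."""
    return mean(shot_durations(cut_times, film_length))


def movement(frame_a, frame_b):
    """Mean absolute pixel difference between adjacent frames; 0 when identical."""
    return mean(abs(a - b)
                for row_a, row_b in zip(frame_a, frame_b)
                for a, b in zip(row_a, row_b))


def luminance(frame):
    """Mean grey level of a frame: 0 is black, 1 is white."""
    return mean(p for row in frame for p in row)


# Hypothetical 12-second clip with cuts at 4 s and 10 s.
durations = shot_durations([4.0, 10.0], 12.0)
print(durations)                                  # prints [4.0, 6.0, 2.0]
print(average_shot_duration([4.0, 10.0], 12.0))   # prints 4.0

dark = [[0, 0], [0, 0]]
half_lit = [[1, 1], [0, 0]]
print(movement(dark, dark))      # prints 0
print(movement(dark, half_lit))  # prints 0.5
print(luminance(half_lit))       # prints 0.5
```

The list returned by `shot_durations` is also the raw material for the 'temporal shot structure' measure, since that measure concerns the distribution of these durations.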
For example, in the analyses just mentioned, Cutting et al. established in their Hollywood sample an increase of movement between 1905 and 1935 and could relate this finding to film-analytic accounts of stylistic changes supporting the growing emotional impact of movies. As another example, consider the well-documented finding that shot duration tends to decrease across the history of popular film. Salt (2009) reported a linear decrease of average shot length. Cutting and Candan (2015) could use his data and added nuances to the general linear decrease that they replicated. One was that different slopes were obtained for different shot classes, especially in post-1940s Hollywood films; another was that shot scale, in particular the increasing use of wide-angle shots, contributed considerably to the decrease in shot duration.
The climax works towards the minimum as the narrative tends to progress here presenting focused events without disruption, while its scope is wider and shifting in the set-up and epilogue acts. Consistently, during the climax movement is more frequent while shots also tend to be darker compared to the remaining acts. The set-up and epilogue contrast most conspicuously with the climax, while complication and development exhibit steady in-between values for the low-level feature parameters.
They do not manifest physically, but their indexing is perceptually straightforward. One is time shifts, a structural feature. These decreased over the course of a film, in line with the film-narratological notion that a film’s action thickens towards a deadline. Three other higher-level features were more semantic in nature. Character appearances dropped after the set-up. Action shots were most numerous at the end of the set-up and the beginning of the climax, while conversations levelled down during the climax.
Cutting’s (2016) interpretative qualifications illuminated the stylistic distinctions among the acts. They are most informative and any summarisation would be detrimental to the value of the analyses. To give just one example: 'The development also has several characteristics in contrast to the complication: its shot durations are a bit longer (Study 1), it has more noncut transitions (Study 2), and it is dimmer (Study 4) so that by its end the luminance falls to the psychological and literal “darkest moment” for the protagonist' (Cutting, 2016, p. 24). I encourage the reader interested in the stylistic comparison of the acts to read the original article.
An example is an analysis by Cutting et al. (2011) in which 150 historical films were indexed as to movement and shot duration. They observed a decrease of movement with decreasing shot durations, and reasoned that a basic perceptual mechanism could be at the basis of this correlation: people can only follow so much movement in a duration-limited view. The researchers then analysed newer films that far exceeded the maximum movement-to-shot-duration ratio, and it was found from the public discourse around the titles that viewers could not cope with the stimulation overload.
Dimensions captured in the instrument include comprehension of the narrative, a sense of being in the story-world, emotional responses to story-world events and characters, and attentional focus on story-world details. The remaining experience concepts refer to experiences of entertainment or story-worlds excluding awareness of a narrative or any other constructions underlying these.
Hinde (2017) has recently presented evidence showing that self-reported presence is positively related to response latencies in a dual attention task in which participants were required to respond to a distractor signal while watching a movie. This result supports the notion of absorption and loss of awareness of the real world.
Variants of presence stress embodied apparent reality of the portrayed world, and the loss of awareness of mediation. Loss of awareness and apparent reality point to the illusion of being absorbed by the story-world. Presence seems the most immediate experiential outcome of natural or real-world scene perception and event comprehension mechanisms. It was implied in Gibson’s summary of the awareness of film: 'We are onlookers in the situation, …, we are in it and we can adopt point of observation within its space'.
In this respect, the concept of transportation builds on Gerrig’s (1993) seminal work on the experience of narrative worlds. Transportation requires a 'deictic shift' from the real to the story-world (Segal, 1995, in Busselle and Bilandzic, 2009). When the narrative ends, the spell is broken and the audience returns to the previously inaccessible real world.
In line with general psychological research on empathy, a distinction has been made between embodied simulation of film character feeling and cognitively more demanding forms of empathy with characters (e.g., Tan, 2013a, b). Complex forms of empathy that require TOM cognition presuppose that there is an awareness of the distinction between self and other. The highest degrees of absorption by characters (measured by items such as 'I became the character') seem characterised by a complete fusion of the viewers’ self with the character and are properly referred to as identification (e.g., Cohen, 2001). In this case, viewer emotion is identical with character emotion.
For example, cinematic techniques of selective or emphatic framing of character expression can lead to stronger mimicry or embodied simulation on the part of the viewer than observation of a person in the real world would allow (e.g., Coplan, 2006; Raz et al., 2013).
The less demanding forms are based on automated embodied simulation or mirroring, for instance mimicry. Complex forms involve mentalising, or reasoning supported by general Theory of Mind schemas and inferencing. The most demanding occur when the film’s narration withholds information about a character’s inner life in relation to story-events, as in some arthouse films (Tan, 2013a, b). Mentalising, like cognitively less demanding forms of empathy, has been shown to be affected by film style. Rooney and Bálint (2018) recently demonstrated that close-ups of the face stimulate the use of TOM in the perception of characters.
Identification has been empirically observed and isolated from other forms of absorption by Cohen (2001); Tal-Or and Cohen (2010); Bálint and Tan (in press).
In an attempt to qualify what it is like to be absorbed in a film, Bálint and Tan (2015) synthesised a summarising dynamic image schema from a study of film viewers’ reports on their own experience of absorption while watching a film. Image schemas are culturally shared embodied cognitive structures that have been identified by cognitive linguists; they are hypothesised to underlie cognition and experience, and are used more specifically in metaphorical thinking and language use. The schema entails the viewer’s self travelling into the centre of the story-world. The self exerts forces to remain inside the story-world, and is in some cases taken there, notably by the author. In Bálint and Tan’s study, readers of novels turned out to use the same image schemas to describe their experience as film viewers.
It is noteworthy that Münsterberg considers the activity of the basic functional mechanisms perception, attention and memory as consisting of 'acts', rather than responses as would become common in mainstream experimental psychology, see, e.g., p. 57. 'Imagination' refers to acts resulting in 'products of the active mind' (p. 75), in particular memories, associations and emotions added to perceptions as 'subjective supplements' (p. 46).
For an overview of current cognitive emotion theories see Oatley and Laird (2013).
Through procedures such as suggestion and juxtaposition of fictional elements and perspectives, and due to strong coherence of elements, simulations are so engaging as to allow for recipients’ explorations of social situations, involving the self. This results in emotions ranging from the more basic to the social and culturally sophisticated type.
The stages correspond to Oatley’s (2013) direct, imaginative and self-related modes of appraisal in film-induced emotion.
There is some literature on the affective potential of mainstream film techniques. See, for example, experiments on the effects of camera angle and image composition on emotional appraisal of objects and characters, such as weakness, tenseness, dominance or strength, reported in Kraft (1991), and an overview of formal and presentation features of media messages in relation to their emotional effects by Detenber and Lang (2011).
This capacity has the obvious adaptive advantage of learning proper responses to critical situations before they are met in the actual world. The same point has been made by Currie (1995); see also Currie and Ravenscroft (2002). See also Tan (2008) on pretense play as exercising emotions and adaptive responses in film viewing. My position on the issue of the authenticity of emotion in response to fictional narrative is opposed to that of Walton (1990), who proposed that make-believe worlds can only induce 'as-if emotions'.
Neuropsychological accounts of film viewer emotions, such as those by Grodal (2009) and Zacks (2014), emphasise suppression of actions such as fight or flight by prefrontal circuits following appraisals, e.g., of threats or provocations. In my related application of the cognitive theory to film viewing, viewers can experience a tendency to flee as an initial tendency, due to automated mimicry or simulation.
In the end, virtual action responses in the cinema should be understood as an example of the situatedness of emotion in general (see Griffiths & Scarantino, 2009). The conventional set-up of the cinema positions spectators as witnesses to fictional events, and appraisals, experiences, expressions and action readiness take shape according to the cinematic situation.
In the end, viewers know, on the basis of their narrative and genre schemas, that the film will provide answers to the questions they have along the way.
Needs of mood management and the occurrence of emotions that help to improve moods have been shown to explain preference for entertainment products such as movies (Zillmann, 2003).
Sympathy for mainstream protagonists is probably rather immediately induced by our felt similarity and familiarity with them, and more especially in terms of moral values (Zillmann, 2000).
The nature of the events and their outcomes corresponding to ups and downs in the life of a protagonist vary from one genre to another. For example, the action heroine meets with assaults on her life and deals blows to her stalker; the romance protagonist with separation and reunion. See also Zillmann’s theory of the enjoyment of drama.
I have introduced these emotions earlier (Tan, 1996) under the heading of Fictional World emotions or F emotions, because they are responses to events in a fictional world. F emotions include empathetic and non-empathetic emotions. Non-empathetic emotions can either be based on sympathy, for example, when we fear that a bomb will explode to the harm of a protagonist, or not based on sympathy. Awe induced by the sight of a sublime landscape would be an example. F emotions are defined in opposition to A emotions. The latter category consists of responses to the film as a human-made artefact instead of a fictional world produced in the viewer’s imagination.
The point has been made in Tan and Frijda (1997) and Tan (2009), and more recently underscored in psychophysiological research using film by Wassiliwizky et al. (2017). Schubert et al. (2018) refer to the emotion as kama muta, a socio-relational emotion of feeling closeness when an intensification of communal sharing relations is appraised. In the study just referred to, such moments had been analysed in film fragments.
An example is the fear we have when we watch a horror monster in a view not aligned with any character’s, or without a character being in the neighbourhood of the monster.
See, for example, the study of interest during character- vs. action-development-oriented films in Doicaru (2016, Ch. 2).
For empirical comparative analyses of absorbed modes of witnessing drama and detached modes of spectatorship in watching nature documentaries, see Tan (2013b).
Films are segmented (from larger to smaller units) into acts, scenes and events. All subsequent events induce interest. Every scene offers answers or matches to anticipations induced earlier, leading to enjoyment. Enjoyment tends to reinforce interest, as it stimulates intake and rewards past efforts. Every scene, too, induces novel questions and affective anticipations, keeping interest at least alive.
Some researchers of media entertainment refer to such regulatory reappraisals as meta-emotions (Bartsch et al., 2008). Positive gratifications may be derived from such reappraisals and associated emotions. Viewers of sad drama may appreciate their own moral stance that transpires through their experience of a character’s losses and suffering from injustice. Horror lovers may like the emotion because they explicitly seek it, and younger male audiences of extremely violent films have been shown to test, and pride themselves on, their coping abilities (Hill, 1997). A related act of emotion regulation is male viewers’ display of protective attitudes towards their female company during horror shows (Zillmann and Weaver, 1997).
In Tan (1996) I proposed that, in contrast to aversive situations witnessed in real life, popular fiction scenes of separation, isolation, violence, terror, horror and so on, due to their being part of a story, always signal that we are in medias res; the narrative is to be continued, we are curious to know where it is heading, and it is virtually impossible to completely abort expectancies and imaginations of a turn to the positive. Entertaining these is in itself not unpleasant, especially when viewers are open to the possibility that they can learn from the unpleasant events.
Doicaru (2016) reviewed general models of aesthetic appreciation as to their suitability for explaining aesthetic appreciation of film. She reported a validation study of a measurement instrument in which five general factors were identified that may be used to describe dimensions of aesthetic appraisal in film viewing. They were Cognitive stimulation, Negative emotionality, Self-reference and Understanding. A corpus of films from different genres and aesthetic categories (e.g., mainstream, arthouse and experimental) was used, and corresponding audiences were involved.
Movies can move us not only in our role of witnesses of events in a fictional world, but also as artefacts made by filmmakers with some formal intention in mind; appreciation of visual beauty, etc., is an example. Such emotions have the construction of the artefact as their object, and need to be distinguished, as artefact emotions, from emotional responses to witnessed events in fictional worlds. They are aesthetic emotions because they involve appraisals of artefact features, such as form, style, use of technology and implied meaning. Untrained audiences can recount their artefact emotions; professional critics can add elaborations of the appraisals they made while viewing. They have as their object the complex of film form, use of style and technology and intended or unintended meanings. We can further our understanding of appraisals in artefact emotions using such intuitions available in critical film analyses. They are massively represented in internet user groups like YouTube and Metacritic. Machine learning algorithms are now being developed to extract and categorise emotions from film forums, and differentiate both films and target audiences, see, e.g., Buitinck et al. (2015).
A few example studies of foregrounding procedures in narrative film and their effects on cognitive strategies and aesthetic appreciation can be found in Hakemulder (2007) and Bálint et al. (2016).
The concept has developed over the past three decades, see Casetti (2015). I have freely summarised meanings to fit the purpose of sketching a research agenda for psychologists.
The large project started by Bordwell et al. (1985) on the historical poetics of American mainstream cinema has already provided psychological research into the mechanisms underlying the film experience with major concepts and reference norms for conventional structuring of film narratives and their stylistic parameters. Among these are continuity, spatiotemporal segmentation and stylistic emphasis.
Interested readers should regularly consult the society’s scholarly journal Projections.
In the role of top-down influences I emphatically include the Münsterbergian acts of imagination on the part of the spectator.
The research for this article has been supported in part by the Netherlands Organisation for Scientific Research (NWO), grant number 360-30-200 for the project 'Varieties of Absorption'.
Data sharing not applicable to this paper as no datasets were generated or analysed.