An agenda for the psychology of the film

At the time of the first kinetoscope and cinema exhibitions in 1894–1895, thanks to devices such as the Phenakistoscope, Zoetrope and Praxinoscope, moving images had been popular for decades. Just before that time, academic psychology turned to the identification of the mechanisms underlying the functioning of the mind. Perception psychologists began to study apparent movement of experimental visual stimuli under controlled conditions, either because they found moving stimuli interesting cases in human perception or as part of the study of psychological aesthetics founded by Gustav Fechner and Wilhelm Wundt. The publication of Hugo Münsterberg’s The Photoplay: A Psychological Study in 1916 marked the beginning of the psychology of the film. Münsterberg was trained by Wundt and recruited by William James to lead the experimental psychology lab at Harvard. Importantly, Münsterberg was also an avid cinemagoer, as his analyses of theatrical films of his time show, and a professing cinephile at that. Münsterberg set two tasks for the study of the film: one was to describe the functioning of psychological mechanisms in the reception of film; the other was to give an account of film as an artistic medium.

Münsterberg shared his contemporaries’ and even today’s spectators’ fascination with the wonder of moving images and their apparent reality. He described the film experience as a 'unique inner experience' that, due to the simultaneous character of reality and pictorial representation, 'brings our mind into a peculiar complex state' (p. 24).

In the first part of The Photoplay, Münsterberg explores how film characteristically addresses the mechanisms of the basic psychological functions investigated by experimental psychology, namely perception, attention, memory and emotion.Footnote 1 In The Photoplay the imagination is the psychological faculty that theatrical movies ultimately play upon; attention, perception, memory and emotion are also directed by the film, but contribute to the film experience primarily as building blocks for the imagination. One of the ways that films entertain the imagination is by mimicking the psychological functions. Film scenes may represent as-if perceptions, as-if thoughts, as-if streams of associations, and as-if emotions or, more generally, display subjectivity.Footnote 2 Second, the film creates an imagined world that deviates from real world scenes as we perceive these in real life. Liberation from real life perceptual constraints involves the spectator’s self in 'shaping reality by the demands of our soul' (p. 41). Third, Münsterberg has a nuanced view of the automaticity of responses to film. On the one hand, it is the spectator’s choice, based on their interest, which ideas from memory and the imagination to fit to images presented on screen; they are felt as 'our subjective supplements' (p. 46). On the other, the film’s suggestions function to control associated ideas, '… not felt as our creation but as something to which we have to submit' (p. 46). And yet in Münsterberg’s view the film does not dictate psychological responses in any way.Footnote 3

Finally, The Photoplay provides abundant and compelling introspective reports of the film experience and so probes into the phenomenology of film, that is, what it is like to watch a movie. I think it is fair to say that for Münsterberg the film experience is the ultimate explanandum for a psychology of the film. In order to account for that phenomenology by mechanisms of the mind, proper descriptions of the film experience are needed, and introspective reports are an indispensable starting point for these.

The other task Münsterberg set himself was to propose an account of the film as a form of art. Part two of The Photoplay proposes that the film experience includes an awareness of the unreality of perceived scenes. This awareness is taken as fundamental for psychological aesthetics; all forms of art are perceived to go beyond the mere imitation of nature.Footnote 4 Münsterberg showed himself a formalist in that he theorised that aesthetic satisfaction depends not on recognition of similarity with the real world or practical needs, but on the sense of an 'inner agreement and harmony [of the film’s parts]' (p. 73).Footnote 5 But in order to qualify as art, according to Münsterberg, film was not to deviate too much from the realistic representation that distinguishes theatrical movies from non-mainstream forms.

Münsterberg’s agenda is in retrospect quite complete. The detailed investigation of psychological mechanisms and aesthetics of film is followed by a last chapter on the social functions of the photoplay. The thoughts forwarded in it are more global than those on perception and aesthetics. The immediate effect of theatrical films on their audience is enjoyment due to their freeing the imagination, and their easy accessibility to consciousness 'which no other art can furnish us' (p. 95). Enjoyment comes with additional gratifications such as a feeling of vitality, experiencing emotions, learning and above all aesthetic emotion.

In a final section behavioral effects of successful films are discussed. Here the film psychologist vents concerns on what we now refer to as undesirable attitude changes and social learning, especially in young audiences. The agenda of today's social science research on mass media effects (e.g. Dill, 2013) is not all that different from Münsterberg's in the last chapter of The Photoplay.

The two tasks that Münsterberg worked on set the agenda for the psychology of film in the century after The Photoplay. That agenda is clearly recognisable in the psychology of the film as we know it today.Footnote 6 But the promising debut made in 1916 was not followed up until the nineteen seventies, or so it seems. James Gibson lamented in his last book on visual perception that whereas the technology of the cinema had reached peak levels of applied science, its psychology had so far not developed at all (1979, p. 292). The cognitive revolution in psychology of the 60s paved the way for its upsurge in the early 80s. But some qualifications need to be made on the seeming moratorium. First, Rudolf Arnheim had been developing a psychology of artistic film form since the 1920s. Second, although not visible as a coherent psychology of the film, laboratory research on issues in visual perception of the moving image, in particular studies of apparent movement, continued.

Gestalt psychology and film form

Rudolf Arnheim’s essays, first published in 1932, added analytic force to Münsterberg’s conviction that film is not an imitation of life. Film and Reality (1957) highlights shortcomings of film in representing scenes as we know them from natural perception.Footnote 7 In the same essay, it is pointed out that comparing a filmic representation of a scene with its natural perception is what analytic philosophers would call a category error. In The Making of Film (1957) Arnheim presented an inventory of formative means for artistic manipulations of visual scenes, including delimitation and point of view, distance to objects and mobility of framing. It is argued that chosen manipulations often go against the most realistic options. For example, ideal viewpoints and canonical distances are often dismissed in favour of more revealing options.Footnote 8 Arnheim’s aesthetics of film gravitates towards acknowledged artistic productions more than towards the 'naturalistic narrative film' (e.g., 1957, pp. 116–117), the more moderate art form that Münsterberg tended to prefer.

Arnheim was informed by such founders of Gestalt psychology as Wertheimer, Köhler and Koffka. This school held that natural perception results from the mind’s activity. It organises sensory inputs into patterns according to formal principles such as simplicity, regularity, order and symmetry. Arnheim developed into the leading Gestalt theorist of aesthetics of the 20th century. In his 1974 book he analysed a great number of pictorial, sculptural, architectural, musical and poetic works of art while only rarely referring to film.Footnote 9 The cornerstone aesthetic property of art works including film is expression, defined by Arnheim as 'modes of organic or inorganic behaviour displayed in the dynamic appearance of perceptual objects or events' (1974, p. 445). Expression’s dynamic appearance is a structural creation of the mind imposing itself on sound, touch, muscular sensations and vision. Expressive qualities are, in turn, the building blocks of symbolic meaning that art works including film add to the representation of objects and events as we know them in the outer world. Thus, Arnheim’s theory of expression and meaning in the arts seems to echo Münsterberg’s formalist position on the perception of 'inner harmony' as the determinant of film spectators’ aesthetic satisfaction.

Apparent motion

Münsterberg shared the amazement that moving images awakened in early film audiences. He considered the experience of movement a central issue for the psychology of the film. The experience of movement in response to a series of changing still pictures has been studied in psychology and physiology under the rubric of apparent motion.Footnote 10 In Münsterberg’s days, international psychology labs were probing the perception of movement in response to experimental stimuli that were perceived as moving images. Well-known examples include apparent motion induced by the subsequent views of single stationary lines in different positions that result in phi movement, the perception of one moving shape or line. Researchers in this area have continued to study the perception of movement in film as only one of many interesting visual stimuli, such as shapes painted on rotating disks, or dynamic computer-generated lights, shapes and objects of many kinds. Why and how we see motion has been as basic to the study of visual perception as questions of perception of colour, depth, and shape. Helmholtz proposed that what we need to explain is how retinal images that correspond one-to-one, i.e., optically, with a scene in the world are transformed into mental images, or percepts, that we experience. In the case of apparent motion, we also need to understand how a succession of retinal images is perceived as one or more objects in motion.Footnote 11 Apparent motion in film viewing needs to be smooth,Footnote 12 and depends on frame rates and masking effects. (The latter effects refer to dampening of the visual impact of one frame by a subsequently presented black frame).

Münsterberg’s conviction that the perception of movement needs a cognitive contribution from the viewer clashes with alternative explanations that rely on prewired visual mechanisms that automatically and immediately pick up the right stimulus features causing an immediate perception of motion, without the mind adding anything substantial. The inventors of nineteenth century moving image devices explained the illusion of movement by the slowness of the eye, possibly following P.M. Roget’s report on apparent motion to the Royal Society in 1824. In the early years of cinema, the persistence of vision account was meant to add precision to this explanation. It proposed that the retina, the optic nerve or the brain could not keep up with a rapid succession of projected frames, and that afterimages would bridge the intervals between subsequent frames. Anderson and Fisher (1978) and Anderson and Anderson (1993) have explained why the notion is false and misleading. It suggests that the film viewer’s perceptual system sluggishly piles up retinal images on top of one another. However, this would cause the images to blur, which obviously is not the case. The Andersons refer to the explanation as a myth because it is based on a mistaken conception of film viewing as a passive process. Even with the very small changes between subsequent frames that are characteristic of motion picture projection, the visual system performs an active integrative role in distinguishing what has changed from one image to another. This integrating mechanism in film viewing is exactly the same as in perceiving motion in real world scenes.
Mechanistic explanations have since been founded on growing insights into the neuroscience of vision, such as single cell activity recordings in response to precisely localised stimulus features.Footnote 13 'Preprocessing' of visual input before it arrives in the cortex takes place in the retina and the lateral geniculate nucleus, which have specialised cells or trajectories for apprehending various aspects of motion. There are major interactions between perceptual modules.Footnote 14 Physiological and anatomical findings in the primate visual system, as well as clinical evidence, support the distinction of separate channels for the perception of movement on the one hand, and form, colour and depth on the other (Livingstone and Hubel, 1987). Research on how exactly the cortical integration systems for vision are organised has not yet come to a close. A variety of anatomical subsystems have been identified,Footnote 15 and there is room for task variables in the explanation of motion perception.Footnote 16 The operation of task variables in presumably automated processes (e.g., attentional set, induced by specific task instructions) complicates accounts of apparent motion and the perception of movement based on the lowest processing levels.

Non-trivial and clear-cut contributions of the mind to smooth apparent motion have been proposed by Gestalt psychologists. Arnheim (1974) considered the perception of movement as subsidiary to that of change. The mind uses Gestalt principles such as good continuation and object consistency to perceive patterns in ongoing stimuli. Movement is the perception of developing sequences and events.Footnote 17 Gestalt psychologists have attempted to identify stimulus features that are perceived as a spatiotemporal pattern of 'good' motion, and various types of apparent motion have been distinguished as a function of stimulus features. In an overview volume, Kolers (1972) presented phi and beta motion as the major variants. Phi, the most famous, was first documented by Wertheimer in 1912. An image of an object is presented twice in succession in different positions.Footnote 18 Pure or beta motion, that is, objectless motion, was the novel and amazing observation; the perception seemed to be a sum or integration by the mind beyond the stimulus parts, and asked for an explanation. It is also experienced when the objects in the subsequent presentations are different.

Wertheimer and those after him looked for mechanisms of the mind that could complement the features of the stimulus responsible for apparent motion in its various forms.Footnote 19 Other studies of apparent motion, too, indicated that simple models of stimulus features alone could not explain apparent motion.Footnote 20 One of the best examples of what the cognitive system adds to stimulus features is induced motion (Duncker, 1929). When we see a small target being displaced relative to a framework surrounding it, we invariably see the target moving irrespective of whether it is the target or the frame that is displaced. Ubiquitous film examples are shots of moving vehicles, with mobile or static framing.

In this summary and incomplete overview of the field, we could not make a strict distinction between mechanist and cognitive explanations for the perception of movement in film. The current state of research does not allow for it.Footnote 21 Kolers’s conclusions on the state of the field closing his 1972 volume on motion perception still seem valid. He inferred from then extant research that there must be separate mechanisms for extracting information from the visual stimulus and for selecting and supplementing the information into a visual experience of smooth object motion or motion brief. He concluded that 'The impletions of apparent motion make it clear that although the visual apparatus may select from an array [of] features to which it responds, the features themselves do not create the visual experience. Rather, that experience is generated from within, by means of supplementative mechanisms whose rules are accomodative and rationalizing rather than analytical' (p. 198). But even if, after Kolers’s analysis, some perceptual (Cutting, 1986) or brain mechanisms (Zacks, 2015) have been identified, today we still do not know enough about the self-supplied supplementations.Footnote 22

Perception and cognition of scenes

Mental representation and event comprehension

Contributions of the mind can go considerably beyond apparent motion, i.e., the perception of smooth motion from one frame to another. The cognitive revolution in academic psychology that took off in the 1960s broadened the conceptualisation of contributions of the mind to the film experience beyond the narrower stimulus-response paradigms that had dominated psychological science until then. The cognitive revolution went beyond Gestalt notions of patterns applied by the mind to stimulus information. It introduced the concept of mental representation as a key to understanding the relation between sensory impressions from the environment on the one hand, and people’s responses to it on the other. Moreover, these cognitive structures were seen as functional in mental operations such as retrieval and accommodation of schemas from memory, inference and attribution. These were quite complex in comparison to perceptual and psychophysical responses. In the past 30 years, they have come to encompass event, action, person, cultural, narrative and formal-stylistic schemas. The cognitive turn in film psychology has stimulated a growing exchange with humanist film scholarship, resulting in advances in the elaboration of cognitive structural notions. Early applications of the cognitive perspective in the psychology of the film can be found in the 1940s and 50s in work by Albert Michotte (1946) and Heider and Simmel.Footnote 23

Against mental representation: direct perception of film events

The psychology of the film as a subdiscipline of academic psychology really took off in the late 1970s. Münsterberg’s broad agenda that had been scattered across isolated studies of mainly movement perception regained general acclaim. This was due first to the booming supply and consumption of moving images through media such as television and computer-generated imagery since the 60s. Second, the cognitive turn in experimental psychology renewed an interest in perception and cognition as it occurs in natural ecologies. This is the backdrop against which James Gibson (1979) noted the virtual absence of a psychology of the moving image, motivating his chapter on the film experience. The chapter was important in that it applied his highly influential ecological principles of perception of real world scenes to perception in the cinema. Gibson’s general theory of visual perception (e.g., Gibson, 1979) hinges on the notion that the visual system has evolved to extract relevant information from the world in a direct fashion. A scene presents itself to the observer as an ambient optical array that immediately and physically reflects the structure of the real world. Changes and transitions in the flow of the optical array are due to natural causes such as alterations of lighting intensity of the scene, e.g., due to clouds, or movement of objects in the scene or of the observer. These variations in the optical flow enable the automatic pick-up of invariants. Example invariants are the change in size of portions in the array, and the density of texture in that portion when the observer gets closer to, or farther away from the object.Footnote 24 The changes in these parameters are linked with depth-information in a way that is constant across different scenes, observer speeds, lighting conditions, etc. Invariants enable the direct perception of the real world in the service of adaptive action. Disturbances of the optic flow can automatically be perceived as events.
The events are categorised on the basis of the nature of the disturbances, e.g., as terrestrial, animate, or chemical events. Furthermore, the direct tuning of the perceptual senses to the structures of the environment enables an immediate perception of affordances; for example, the slope of a hill causes the direct perception of 'climbability'.

The experience of motion pictures according to Gibson involves a dynamic optical flow exactly like the one an observer would have when being present at the filmed scene.Footnote 25 Film represents the world to the senses that are calibrated to that world. The field of view of the camera becomes the optic array to the viewer (Gibson, 1979, p. 298). Perception of objects, movement, events and affordances is direct and realist, based as it is on the same invariants and affordances that the scene in the real world would offer. Deviations from these, as emphasised by cognitivist film psychologists from Münsterberg through Arnheim to Hochberg (as we will shortly see), are largely taken as non-representative exceptions.

A major affordance offered by conventional movies is empathy with characters. Empathy presupposes that we understand what happens to characters. Scenes present their actions, reactions and feelings. However, most scenes are not continuous. How do we understand scenes presented in pieces, and what are the limits to our understanding? Gibson’s reply to the question of how continuity is perceived in scenes (that is, smooth movement and unitary events across cuts) would be that the perceptual system extracts the same invariants from the two shots on either side of the cut. This elegant explanation again rests upon a presumed correspondence between perception of real world scenes and film scenes.

Gibson inspired important theorising on the film experience, notably by Anderson and Cutting, to whom we will turn shortly. Here we emphasise that his direct perception account of the film experience stands in diametrical opposition to the key innovation that the cognitive turn introduced in experimental psychology. Gibson denied the necessity of mental representations in the perception of objects and events, be it in real scenes or in film.

Cognitive schemas and the canonical set-up of the cinema

The role of mental representations, be they cognitive principles, schemas or other mental structures, was argued over a lifetime of work in the psychology of film by Julian Hochberg. A perception psychologist with an interest in pictorial representations and their aesthetics, he devoted a large part of his work to identifying what is given in film stimuli and how perception goes beyond that, in often ingenious demonstrations and experiments. (The demonstrations are, in fact, introspective observations of film perception under exactly specified, reproducible stimulus conditions). A comprehensive overview can be found in Hochberg (1986).Footnote 26 His legacy should be referred to as the Hochberg and Brooks oeuvre, because his wife Virginia Brooks, a psychologist and filmmaker, contributed such a great deal to it. Hochberg found that cognitive schemata are necessary in the perception of film for two reasons. The most profound one is that completely stimulus-driven (or 'bottom-up') accounts of the perception of movement, events, and scene continuity do not really explain the experience. For example, Hochberg and Brooks point out that neurophysiological motion detectors do not explain motion perception, that is, they 'amend but do not demolish' an account based on a mental representation of motion (Hochberg and Brooks, 1996b, p. 226). The same would go for any other direct perception account, including Gibson’s optics plus invariant extraction model. The more practical argument is that the direct perception account fails to pose limits to the scope of its application, leaving thresholds and ceiling conditions for the mechanisms out of consideration. The canonical set-up of cinematic devices for recording and displaying motion pictures has evolved to produce good impressions of depth, smooth and informative motion, emphasis on relevant objects and continuity of action, often violating the course of direct perception in comparable real world scenes.
Figure 1 presents a demonstration of active disregard that viewers of mainstream movies typically display. (See also Cutting & Vishton (1995) on contextual use of depth-information).

Fig. 1

Example of perceptual disregard in the cinema. Hochberg (2007) discusses the view of objects moving in front of a landscape. In normal film viewing, flatness of studio-backgrounds and quasi-camera movement is disregarded. Traditional films can use a painted or projected landscape at the backdrop of the set, and panning camera movements instead of a really mobile camera to create a convincing impression in the viewer of following a moving object in the scene’s space. A cycling woman is followed in a pan shot moving from left to right; frames A and B constitute the beginning and the end of the panning shot. In normal perception in the real world, objects on the horizon seem to move in the direction of the moving subject, whereas nearby objects move in the opposite direction. Panning involves a stationary viewpoint, causing the image to lack this 'motion parallax'. For example, the scarecrow in the middle ground of frame B should be further to the left from the ridge on the horizon than in frame A (DA < DB), but the distance between the objects has remained identical (DA = DB). However, the lack of parallax and resulting apparent flatness can be and is disregarded, and viewers experience smooth self-motion parallel to the moving object. Disregard such as this is part and parcel of normal film viewing or the 'ecology of the cinema'.
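The geometry behind the missing parallax in this caption can be sketched numerically. What follows is only an illustration under stated assumptions: the flat ground-plan layout, the distances, the pan angle and the function names are hypothetical and are not taken from Hochberg's figure. It compares the visual angle between a near object and a far object when the camera pans from a fixed spot with the same angle when the camera actually travels sideways.

```python
import math

def relative_bearing(camera_x, heading_deg, point):
    """Direction of a point relative to where the camera is aimed, in degrees.

    The camera sits at (camera_x, 0); positive values lie to the right.
    """
    x, z = point
    world_bearing = math.degrees(math.atan2(x - camera_x, z))
    return world_bearing - heading_deg

def separation(camera_x, heading_deg, near, far):
    """Visual angle by which the near object lies to the right of the far one."""
    return (relative_bearing(camera_x, heading_deg, near)
            - relative_bearing(camera_x, heading_deg, far))

# Hypothetical layout in metres as (sideways position, depth):
# a scarecrow in the middle ground and a ridge on the horizon.
scarecrow = (5.0, 20.0)
ridge = (0.0, 2000.0)

# Panning: the camera turns 30 degrees to the right but stays where it is.
pan_start = separation(0.0, 0.0, scarecrow, ridge)
pan_end = separation(0.0, 30.0, scarecrow, ridge)

# Tracking: the camera keeps its heading and travels 10 m to the right.
track_start = separation(0.0, 0.0, scarecrow, ridge)
track_end = separation(10.0, 0.0, scarecrow, ridge)

print(f"pan:   {pan_start:.2f} -> {pan_end:.2f} degrees")
print(f"track: {track_start:.2f} -> {track_end:.2f} degrees")
```

In this sketch the pan leaves the angle between scarecrow and ridge unchanged, whereas the tracking move shifts the scarecrow leftwards relative to the ridge, which is the parallax that a panned studio shot lacks and that viewers nevertheless disregard.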

The most immediate demonstration of apparent motion is Duncker’s induced motion referred to above, a cinematic effect because it is dependent on canonical projection within a frame. The best analytic examples are about the perception of events in filmed dance.Footnote 27 For Hochberg and Brooks an ecological approach to perception in the cinema needs to take the ecology of the cinema into account.

The necessity of cognitive schemas in film perception was pointed out most pointedly in Hochberg’s treatment of the comprehension of shot transitions or cuts. It was argued that known sensory integration and Gibson’s extraction of invariants fail to account for viewers’ comprehension of frequent and simple cinematic events like elision of space and time. Overlap in contents between successive shots can be hard to identify or be lacking altogether. Hochberg and Brooks proposed a principled alternative: films play in the mind’s eye. Viewers construct an off-screen mental space from separate views, and they can link two successive views by the relation of each of these to this space. In constructing a mental space, overlap may even be overruled by other cues that have nothing to do with any invariance. The construction must involve event schemas and cognitive principles removed from anything immediately given in the film. Schemas may indeed outperform (mathematical) invariants picked up from the optical array offered by the screen. Hochberg and Brooks (1996b) show, for example, how gaze direction of film characters or personae in subsequent shots may be more effective in the construction of a continuous mental scene than overlapping spatial or visual symbolic contents.Footnote 28 Mental schemas seem to be indispensable in the comprehension of sequences of completely non-overlapping cuts. A famous demonstration by Hochberg and Brooks is reproduced in Fig. 2. The succession of shots is readily understood when it is preceded by the presentation of a cross, which provides the integrating schema. Viewers’ schema-based continuous perception of scenes is supported by the ways that traditional cinema tells its stories. The presentation of an overall view in so-called 'establishing shots' followed by a 'break-down' of its object into subsequently presented part views is a cornerstone procedure in classical continuity film style (Bordwell and Thompson, 1997/1979).

Fig. 2

Role of mental schemas in the comprehension of continuous space across shots as discussed in Hochberg and Brooks (1996, 2007). (a) The sequence of eight static shots does not seem to make sense. (b) A static preview of the entire object (as in A) would activate a mental schema of a cross. Subsequent shots are then recognised as consecutive camera relocations, counter-clockwise rotations offering subsequent views of corners. From Hochberg and Brooks (2007). Adding a shot of the cross moving diagonally to the lower left corner of the frame would further smoothen the transition between the entire object view and the view of its top right corner and facilitate the perception of the subsequent parts. Hochberg and Brooks (1996) reported that replacing one of the shots by a blank frame does not lead to confusion. For example, if shot 7 were replaced by a blank frame, the view of the lower left angle of the cross would seem to have been skipped, and shot 8 would be recognised as presenting a view of the lower left corner. That is, the trajectory of the views would remain intact in keeping with the overall view of the object. This illustrates all the more the leading role of the schema of a cross in the perception of its parts.

A smooth understanding of non-overlapping cuts may require dedicated knowledge of discursive story units and rules for their ordering that only literary analysis types of study can reveal (Hochberg, 1986, pp. 22–50). Hochberg and Brooks (1996a, p. 382) pointed out that theoretical or empirical proposals as to the nature of such representations were lacking. They found Gestalt principles unsatisfactory (Hochberg, 1998). Current film psychologists have taken up this challenge, as we shall see shortly.

As a final contribution of Hochberg and Brooks to the psychology of the film, we would like to highlight their view of film spectators as partners motivated to deliver their share in a communicative effort. Film viewers contribute to the canonical setup of the cinema in that they are astutely aware of the filmmaker’s communicative intentions: '… the viewer expects that the film maker has undertaken to present something in an intelligible fashion and will not provide indecipherable strings of shots' (Hochberg, 1986, p. 22–53). Viewers must be assumed to have an associated motivation to explore the views presented to them. In a series of inventive experiments, Hochberg and Brooks gathered evidence for an impetus to gather visual information. Looking preference increased with cutting rate and with complexity of shot contents. Visual momentum, or viewer interest, as they termed it (Brooks and Hochberg, 1976; Hochberg and Brooks, 1978), is the absorbing experience typical of cinema viewing. These studies help us to understand how current cutting strategies meet the viewers’ typical motivation for cognitive enquiry. The reward of comprehension is carefully dosed by varying the time allowed to the viewer to inspect objects and scenes, dependent on their novelty and complexity.

Hochberg’s demonstrations of the involvement of mental structures in understanding portrayed events were in large part based on introspective evidence. They have been criticised by Gibson and others for relying too heavily on top-down control of perception by overly intricate mental structures.Footnote 29 Current research in the cognitive structure tradition uses more sophisticated experimental set-ups. Inspiration has been drawn from theories of discourse processing in cognitive science. In this research, the relationship of 'top-down' use of schemas in scene comprehension with 'bottom-up' processing of stimulus features has become an important question.Footnote 30 Zacks has extensively investigated how film viewers segment the ongoing stream of images and extract meaningful events and actions from it. Viewer segmentation depends on automatically detected changes in a situation (Zacks, 2004). Detection of the changes requires only minimal use of schemas, and triggers automated perceptual-motor simulations of events and subevents such as actions.Footnote 31 Segmentation follows the logic of events in the real world. Most importantly, multiple events can be organised in a hierarchical or linear fashion, as scenes, sets of events and subevents or actions (Zacks, 2013).

Theory of mind and layered meaning of events

Extracting events from film scenes requires more than retrieving schemas of real-world events. The fact that scenes are presented with an idea in mind is reflected in how viewers understand them. Understanding film scenes, and especially characters, their actions, plans and goals, has been argued to require a so-called Theory of Mind (Levin et al., 2013). TOM is a system of cognitive representations of what beliefs, needs, desires, intentions and feelings people have in their interaction with others and the world. It is acquired in early childhood, when children come to understand that others, too, have an internal life, similar to but also different from their own beliefs and feelings. Levin et al. explain how the use of TOM, also referred to as mentalising, is necessary for an elementary understanding of film character actions and feelings. For example, character gaze following, which underlies our perception of what characters feel or want to do with respect to an object that they look at, requires TOM. TOM underlies grasping spatial (and action-) relations in scene comprehension across cuts using gaze following. Understanding relations between more complex events requires schema-controlled theorizing on what people believe, do, think, and feel. Finally, Levin et al. demonstrate through film analyses how film viewers construct multi-layered representations of a film’s action from the point of view of different characters, the viewer and even the narrator or filmmaker. For example, viewer and character perspectives may clash as in dramatic irony, or the narrator may create false beliefs about story events in viewers.

Continuity of events and viewer attention

Hochberg’s question of what the mental schemas look like that enable us to perceive smooth progress of events across film cuts has recently been addressed by the next generation of film psychologists. They have sought answers in profound analyses of the canonical setup delivered by the founders of cognitive film theory in the humanities, such as Bordwell (1985, 2008), and Anderson (1996). Bordwell’s extensive analyses of classical film narrative and his account of the viewer’s mental activity in the comprehension of the film’s story-world suggest a film-psychological hypothesis on the experience of continuity: Classical Hollywood film style serves smooth progress of the narrative. Continuity editing ensures fluency across shot transitions. Shot A cues cognitive schema-based or narrative expectations that are subsequently matched in shot B. Expectations can be perceptual or cognitive, i.e., requiring inferences supported by event schemas. Anderson added a Gibsonian perspective, arguing that the perception of film scenes mimics the perception of real world scenes. Continuity shooting and editing closely follow the constraints of the human perceptual systems that have evolved to 'extract' continuity from changing views of scenes in the real world. Recent research into the experience of smooth development of events and scenes across shot transitions draws on these principles of continuity narration.Footnote 32 Framing, editing and sound finetune the viewer’s top-down search to focus on candidate target stimuli. A quite complete and accurate explanation was offered by Tim Smith. His Attentional Theory of Cinematic Continuity (2012) explains the viewer’s sense of smooth progress by the continuity editing principles that mainstream filmmakers tend to adhere to. AToCC breaks away from Hochberg’s analyses to the degree that it holds that viewers do not need intricate spatial or semantic schemas to construct continuous events from separate shots. 
Rather, it is built on the Gibsonian principle that perceiving continuity in film scenes derives from the continuity that we experience in perceiving scenes in the natural world. The ecology of the cinema renders it sufficient to follow a number of simple spatiotemporal guidelines. Continuity editing film style guides viewers’ attention in seamlessly following action across cuts. Attention, that is, the viewer’s focused selection of objects in a shot (what they look at and where), is led by the filmmaker. The viewers’ gaze in shot A is directed to the part of the screen where the target of interest in shot B, that is after the cut, will be. The shift of attention from one portion of the screen to another in shot A is shortly followed by the cut, and because the gaze 'lands' in the right place in shot B, the cut has become invisible.Footnote 33 The theory of continuity perception adds precise levels of analysis to the construction of mental scene spaces that Hochberg proposed. It distinguishes higher level and lower level control of attention. Higher level controls include 'perceptual inquiries' as Hochberg and Brooks (1978a) called them. The expectations or questions that guide the gaze may be minimally articulated, e.g., 'what or whom are these characters looking at' as in gaze following, but the operation of higher level cognitive schemas is not excluded. The best demonstration to date of the control of focus of attention by the narrative is given in research on suspense and its effects on film viewer gazes by Bezdek et al. (2015) and Bezdek and Gerrig (2017).Footnote 34 Their results can be taken to imply that suspense, a state of high absorption, is associated with focal attention to story-world details supervised by expectations created by the narrative (see also Doicaru, 2016).

The study of film viewers' attention has delivered a firm account of the role of the ubiquitous Hollywood continuity film style in the typical experience of smoothly flowing film scenes and stories that audiences all over the world have (for a review, see Smith, Levin & Cutting, 2012).

A lead role in perception for cinematic low-level features?

Experimental psychology has always aspired to basic explanations of perceptual responses, preferably through transparent mechanistic associations with physically observable stimulus conditions. The role of high-level narrative schema-based attention in smooth film experiences, discussed in the previous section, is subject to debates in which experimental data support arguments pro and con. To begin with, AToCC emphasises the role of leading expectations in following cuts, but, being more akin to the Gibsonian approach to visual perception than to Hochberg’s schema position, it tends to stress at least as strongly that lower level features direct attention bottom-up. One lower level is given by film-stylistic devices, for instance the use of sound that can orient viewers to direct their gazes to the portion of the screen where the sound’s origin will be shown in the next shot. Another consists of lower level stimulus features in a narrower and technical sense, such as bright lights and movements with sudden onset that automatically attract attention due to the make-up of the senses and the brain. Movement in particular was shown by Smith to be an extraordinary low level attentional cue. The power of low level feature control of attentional shifts has inspired Loschky et al. (2015) to speak of the 'tyranny of film'. They start from research findings suggesting that the use of low-level stylistic features can result in attentional synchrony across film audiences, that is, individual viewers of a scene gaze at exactly the same portions of the screen at exactly the same time.Footnote 35 Remarkable degrees of inter-viewer synchronization of visual attention have also been established in studies of localisations of brain activity in film viewers (e.g., Hasson et al., 2003).
However, Stephen Hinde’s research has recently shown that the distraction effect of inserted low-level attention triggers is quite limited (Hinde et al., 2017). In line with this notion of top-down attention control overriding bottom-up attention triggers, Magliano and Zacks (2011) demonstrated that the perception of cuts is suppressed by higher order processes related to the construction of complex events.
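Attentional synchrony, as described above, can be given a simple numerical form. The following is a minimal sketch, not the measure used in any of the cited studies: it assumes gaze samples are available as screen coordinates per viewer and per frame, and it takes the mean distance of viewers' gaze points to their common centroid as a dispersion index (lower dispersion meaning higher synchrony). The function name `gaze_dispersion` and the toy data are hypothetical.

```python
import numpy as np

def gaze_dispersion(gaze):
    """Mean distance of viewers' gaze points to their centroid, per frame.

    gaze: array of shape (n_viewers, n_frames, 2) holding x, y screen
    coordinates. Lower values mean tighter clustering, i.e. more synchrony.
    """
    gaze = np.asarray(gaze, dtype=float)
    centroid = gaze.mean(axis=0, keepdims=True)          # (1, n_frames, 2)
    distances = np.linalg.norm(gaze - centroid, axis=2)  # (n_viewers, n_frames)
    return distances.mean(axis=0)                        # (n_frames,)

# Hypothetical toy data: 3 viewers, 2 frames (assumption, not real gaze data).
# In frame 0 all viewers look at the same point; in frame 1 they diverge.
toy_gaze = np.array([
    [[100.0, 100.0], [0.0, 0.0]],
    [[100.0, 100.0], [300.0, 0.0]],
    [[100.0, 100.0], [0.0, 400.0]],
])
dispersion = gaze_dispersion(toy_gaze)
```

In this toy case the first frame has zero dispersion (perfect synchrony) and the second a clearly positive value. Published studies typically use more refined estimators, so the index should be read as an illustration of the idea only.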

Gibson’s idea of invariants in optical arrays can now be made concrete, enabling the prediction of bottom-up controlled attention and perception from objectively identified features. Developments in computer vision, image and sound analysis have paved the way for automated extraction of features and patterns in visual and auditory stimuli in terms of multiple dimensions. For example, machine extraction of saliency as a feature predictive of bottom-up attention has been developed and applied in numerous computer vision applications. A much-cited article by Itti and Koch (2001) illustrates the idea for static images. Specialised neural network algorithms detect features such as colour, intensity, orientations, etc. in parallel over the entire visual field. Each feature is represented in a feature map, in which neurons compete for saliency. Feature maps are combined into a saliency map. A last network sequentially scans the saliency map, moving from the most salient location to the next less salient one and so on. Footnote 36 An excellent explanation of how to obtain saliency maps is given at a Matlab page.Footnote 37
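The pipeline just described (parallel feature maps, combination into a saliency map, sequential scanning from the most salient location onwards) can be sketched in a few lines. This is a heavily simplified illustration and not the Itti and Koch model itself: it assumes a single spatial scale, replaces Gaussian pyramids with box blurs, omits orientation maps, and replaces the neural winner-take-all network with a plain maximum search plus suppression of visited regions. All function names and parameter values are assumptions made for the example.

```python
import numpy as np

def box_blur(img, radius):
    """Blur a 2-D array with a (2*radius+1)-wide square mean filter."""
    if radius == 0:
        return img.copy()
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / size**2

def normalise(fmap):
    """Rescale a feature map to the range [0, 1] (flat maps become zeros)."""
    span = fmap.max() - fmap.min()
    return (fmap - fmap.min()) / span if span > 0 else np.zeros_like(fmap)

def saliency_map(rgb, center=1, surround=4):
    """Combine centre-surround contrast maps for intensity and two colour channels."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    features = [(r + g + b) / 3.0, r - g, b - (r + g) / 2.0]
    maps = [normalise(np.abs(box_blur(f, center) - box_blur(f, surround)))
            for f in features]
    return normalise(sum(maps))

def scan_order(sal, n_fixations=3, inhibit_radius=3):
    """Visit the most salient location, suppress its neighbourhood, repeat."""
    sal = sal.copy()
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(y), int(x)))
        sal[max(0, y - inhibit_radius):y + inhibit_radius + 1,
            max(0, x - inhibit_radius):x + inhibit_radius + 1] = -np.inf
    return fixations

# Hypothetical test image: uniform grey with one small red patch.
image = np.full((20, 20, 3), 0.5)
image[8:11, 12:15] = [1.0, 0.0, 0.0]
sal = saliency_map(image)
first_fixation = scan_order(sal, n_fixations=1)[0]
```

For this toy image the first simulated fixation falls on the centre of the red patch, which is the only region that differs from its surroundings. Applying the same function frame by frame to a film would ignore motion and flicker, which dynamic saliency models treat as features in their own right.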

Psychologists of film, in their attempts to explain the extraordinarily smooth and intense perceptual experience that mainstream film typically provides, currently seek to join forces with computer vision scientists. In a next step, they may seek collaboration with vision labs in the world that attempt to link their low-level film image feature analyses with film narrative structures and viewer responses.Footnote 38

Fig. 3

Examples of computational film analyses. Number of shot transitions as a function of acts. Cutting (2016), Fig. 2. Under Creative Commons License (http://creativecommons.org/licenses/by/4.0/). Note that ordinates are inverted; lower positions of titles mean larger number of shots and decreased shot durations. Normalised time bins refer to units of duration standardised in view of variable film length of separate titles. Left panel displays distribution of cuts over time and acts, right panel of non-cut transitions such as dissolves, fades and wipes.

The work of perception researcher James Cutting has carried the psychology of the film into the next stage of the Gibsonian ecological approach, while also linking it with insights in the structure of film narrative from humanities scholarship.Footnote 39 In an interesting essay on the perception of scenes in the real world and in film Cutting (2005) summarised the ecological perspective on perception stating that understanding how we perceive the real world helps to grasp how we perceive film and vice versa.Footnote 40 In the last decade Cutting developed powerful computational content analysis methods that reveal the patterning of low-level features in relation to dimensions of film style and technology, in representative samples of Hollywood films of well over a hundred titles. The theoretical starting point of the approach is that movies exhibit reality. The psychologist Cutting subscribes to the analytical distinctions made in literary and film theories between plot, form and style of a narrative on the one hand, and the represented story-world on the other. The Gibsonian proposal is that analyses of the fabula or story-world (i.e., the action, events, characters and so on) should lead to identification of syuzhet features (i.e., formal and stylistic features that are physically given in the film stimulus or can be perceived without substantial instruction) functional in the perception and understanding of that story-world; vice versa, variations in form and style reflect variations in the portrayed story-world. Cutting’s definition of low-level film features used in the analyses was informed by analyses of narrative, style and technology by David Bordwell, and methods for statistical style analysis by Barry Salt (2009).

Low-level features analysed by Cutting and co-workers are physically and quantitatively determinable elements or aspects occurring in moving images, regardless of the narrative. They include shot duration, temporal shot structure, colour, contrast and movement. The value of each feature can be expressed as an index for an entire film, or for some segment targeted in an analysis.Footnote 41 Inspection by an analyst complements machine vision analyses, but I would qualify the indexing approach as computational (objective) film analysis, because of intensive tallying and numerical operations developed by specialists in psychological data-processing. The features do not constitute events or scenes, but they accentuate these. A recording of their measurements for an entire film would constitute an abstract backbone to be filled with scenes and events. One possible comparison is with the rhythmic score of a song without melodies and words. In the hands of capable film-makers they are indispensable for conveying the narrative, due to their direct, predictable and automated effects on the visual system.
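To make the notion of a feature index concrete, the sketch below computes three such indices for a film or segment: shot durations (and their mean), mean luminance per frame, and a crude motion proxy. It is an illustration under stated assumptions, not a reconstruction of the measures used by Cutting and co-workers: cut times are assumed to be given in seconds, frames are assumed to be greyscale arrays, and motion is approximated by the mean absolute change between consecutive frames. All names and toy values are hypothetical.

```python
import numpy as np

def shot_durations(cut_times, film_length):
    """Durations (s) of shots given cut timestamps (s) and total film length (s)."""
    boundaries = np.concatenate(([0.0], np.sort(cut_times), [film_length]))
    return np.diff(boundaries)

def mean_luminance(frames):
    """Mean luminance per frame for greyscale frames of shape (n, h, w)."""
    return np.asarray(frames, dtype=float).mean(axis=(1, 2))

def motion_index(frames):
    """Mean absolute pixel change between consecutive frames (a crude motion proxy)."""
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

# Hypothetical segment: cuts at 4 s, 6 s and 12 s in a 20 s clip.
durations = shot_durations([4.0, 6.0, 12.0], film_length=20.0)
average_shot_length = durations.mean()

# Hypothetical frames: black, white, black (2 x 2 pixels each).
toy_frames = np.zeros((3, 2, 2))
toy_frames[1] = 1.0
luminance = mean_luminance(toy_frames)
motion = motion_index(toy_frames)
```

Each index yields one number per film, segment or frame, which is what allows the values to be tallied over a corpus and compared across titles or acts.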

The primary use of the approach is in film analysis. The multi-feature configurations of indices can be used to reliably 'fingerprint' films or sections, reliably because the indices are derived from large numbers of measurements. Computational film analysis uses a historical corpus of films and has been deployed over the past decade to corroborate and enrich historical analyses of film style.Footnote 42 The climax so far of efforts to integrate computational content analysis with film theory and analysis is Cutting’s (2016) report on narrative theory and the dynamics of popular movies. The corpus consisted of 160 English language films released between 1935 and 2010, ten for each sampled release year. Figure 3 illustrates a typical course of the number of shot transitions over film presentation time, which can be interpreted as marking the acts and the pace of narration. An important outcome of the analyses is that clear physical support was obtained for the four-act structure proposed by film historian Thompson (1999) across the entire period. It should be noted that Thompson’s act structure was identified largely on the basis of higher level narrative segmentation.Footnote 43 Shot scale was unrelated to the act structure. Cutting added analyses of higher order level film features that can be interpreted to co-vary with narration.Footnote 44 Cutting then ventured upon a multi-feature analysis of the entire corpus. Associations among all indices across all titles could be reduced to four dimensions: motion, framing, editing and sound. They correlated in a meaningful way. For example, shot scale was inversely related to shot duration; in classical narration close-ups tend towards briefer durations than wide shots. Each dimension represented polar opposites between features, e.g., music vs. conversation for sound and close-ups vs. long shots for framing.
Computational content analysis can explore the dynamics of the dimensional representations over subsequent acts of movies. Figure 4 reproduces Cutting’s findings for prolog, setup, complication, development, climax, and epilog.Footnote 45 It would seem that the analysis winds up in a level of cinematic content representation that is grounded in directly given stimulus features, integrated with film-analytical features that can be readily indexed and seem relevant as production tools in regular filmmaking.
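The reduction of many feature indices to a few dimensions can be illustrated with principal component analysis, the technique named in the caption of Figure 4. The sketch below is a generic PCA on synthetic data and makes no claim to reproduce Cutting's analysis: the data matrix, the number of features and the two underlying factors are all invented for the example, and only the corpus size of 160 titles is taken from the text.

```python
import numpy as np

def pca(features, n_components):
    """Principal component analysis via SVD on z-scored columns.

    features: array of shape (n_films, n_features).
    Returns (scores, loadings, explained_ratio) for the first n_components.
    """
    x = np.asarray(features, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)        # standardise each index
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)                  # variance share per component
    scores = u[:, :n_components] * s[:n_components]  # film positions on the dimensions
    return scores, vt[:n_components], explained[:n_components]

# Synthetic data (assumption): 160 films, 6 feature indices driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(160, 2))
mixing = rng.normal(size=(2, 6))
data = latent @ mixing + 0.1 * rng.normal(size=(160, 6))

scores, loadings, explained = pca(data, n_components=2)
```

Here `loadings` shows how strongly each index contributes to a dimension, and opposite signs within one row correspond to the polar opposites mentioned above (for instance two features pulling a dimension in opposite directions). Running the same function on the films' values within a single act would give act-specific configurations of the kind the text describes.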

Fig. 4

Five movie dimensions in narrational space. Reproduced from Cutting (2016), Fig. 9, under Creative Commons License (http://creativecommons.org/licenses/by/4.0/). The displayed representation is obtained from dimensional reduction of the numerous associations between film titles in terms of their feature profiles. The results of the first stage of the analysis are not displayed here, see Fig. 8 in Cutting (2016). In that stage, the number of associations between all titles regarding all features was reduced to four dimensions (see main text) using principal component analysis. In the next stage the analysis was applied to the features and films for each separate act, to result in the configurations shown here. Arrows vary in length, corresponding to differences in the range of values on the dimensions. Black dots indicate median values of the acts on the dimensions. Considering for example the sound dimension, it can be seen that the set-up tends to have more conversation and the climax has more music. The red bars indicate the dispersion of values on the dimension and the degree to which it is skewed towards one or the other end.

What does computational content analysis mean psychologically, that is how do indices and dimensions function in the viewer’s perceiving and comprehending events? Patterns of features trigger changes in viewers’ physiological, attention, perception and emotion systems, according to Cutting (2016, p. 27). Typical low-level configurations may correlate with possible effects on the viewer’s perception and experience of events. For example, shot duration may support interpretations of pace, mood and tension, think of drama’s long takes; temporal shot structure is functional for sustaining attention or suspense (e.g., when a sequence of brief shots abruptly merges into long duration shots), e.g., in thrillers; movement (of camera and objects on screen) serves arousal in the viewer, as in action movies; low luminance signals possible threat as in horror movies, while high luminance may lend 'a sense of other worldliness' (Brunick et al., 2013, p. 141). All low-level features can help viewers in categorising films as to genre, and changes in these will support segmentation of events and scenes, which is at the basis of smooth narrative understanding. Combinations of indices enable more interesting interpretations of possible experience effects.Footnote 46 However, because the studies that the overarching computational content analysis was based on do not involve response measurement, a direct connection between cinematic form (especially narrative procedures) and cinematic meaning that Cutting argues for is open to further elaboration. Even in the face of the richness of directly given information that has been extracted using computers, Cutting sees room for the use of cognitive schemas. The very narrative acts that are underlined by immediately given information may be schematic in nature, but he finds it more likely that their functioning is less dependent on memory-processes than the very high-level cognitive structures implied in cognitive scripts and TOM reasoning.

To conclude the sections on the cognition of film scenes, we seem to have made important progress in understanding how movies construct events in film viewers' minds and brains, as put it in his state of the art review. Movies in part "dictate" events, actions and scenes to viewers' brains using an "alphabet" of visual and auditive features; viewers in turn contribute to the construction of story-worlds by developing and matching higher-order structural anticipations using embodied cognitive event, character and narrative schemas. Since 1916, the film units that have been analysed increased from paired single stimuli (as in apparent motion experiments) to whole film acts (as in computational film analysis). Analyses of narrative structure from film theory have become for the psychology of film what harmonics and counterpoint analysis signify to the psychology of music or the theories of syntax and semantics to psycholinguistics. They inform psychological notions of film structure and organization.

The awareness of narrative film

The third part of The Photoplay deals with issues other than the psychological mechanisms or the psychology of film form, namely the awareness offered by the photoplay. It was only natural to Münsterberg as a child of his time to designate the special awareness that film creates as the explanandum in psychological research, the mechanisms of film stimuli impinging on attention, perception and memory being the explanans. His characterisations of this conscious awareness, of what it is like to watch theatrical films, or in other words of the phenomenology of the film experience, remain in my view as yet unparalleled. Apart from the sense of freedom that we have already discussed, they include attentional and affective experiences.

Münsterberg described enjoyment as the immediate effect of theatrical film, explaining it from the exceptional freedom of the imagination: "The massive outer world has lost its weight, it has been freed from space, time, and causality, and it has been clothed in the form of our consciousness. The mind has triumphed over matter and the pictures roll on with the ease of musical tones. It is a superb enjoyment which no other art can furnish us" (Münsterberg, 1916, p. 95). Light has been thrown on the remarkable fluency of the film experience noted by Münsterberg by current research in narrative procedures, and the mechanisms of continuity perception discussed in the previous section. Münsterberg also stressed that the enjoyment of photoplays depends on our experience of the film’s story as an emotionally meaningful world separate from reality: 'The photoplay shows us a significant conflict of human actions … adjusted to the free play of our mental experiences and which reach complete isolation from the practical world …' (p. 82). And finally, he singled out the role of focused attention in enjoyment. 'It is as if that outer world were woven into our mind and we were shaped not through its own laws but by the acts of our attention, …' (Münsterberg, 1916, p. 39).

Twentieth century academic psychology did not develop much of a body of theory and research on human consciousness. Hence it is not surprising that alongside research into perception and comprehension one doesn’t find much work on the conscious experience of film. Measurements of perceptual, attentional, cognitive and affective responses in experimental psychology are extremely limited with regard to the contents of consciousness that they tap. Lab tasks enabling measurement must be simple, e.g., identification, comparison or categorisation of visual stimuli, rather than free description or recall. Self-reports associated with such tasks must be quantifiable and take the shape of choice responses, simple intensity ratings or readily codifiable reports. Behavioural measures are farther removed from any contents of experience because these need to be inferred. Here, too, simple objective coding is a must. Descriptive and interpretative reports of the qualia and meaning of experiences afforded by film have been largely left to hermeneutic film criticism and phenomenologically oriented film philosophy in the humanities. Scholarship in these fields follows in the footsteps of Münsterberg. The present overview of the psychology of the film cannot go into it further; I refer to Sobchack’s (1992) volume on the phenomenology of the film experience. It opens with the proposition that film directly expresses perceptions, a proposition coming close to the observation in The Photoplay that the contents of the audience’s experience are perceptions, attention, thinking and emotion that are projected before them on the screen.

Absorption in film

Meanwhile, progress can be reported in understanding one aspect of the rich and complex film experience, namely its intensity. Münsterberg observed that the film audience’s enjoyment is due to prolonged states of attention strongly focused on a fictional story-world, so strong in fact that the here and now escapes consciousness and it seems instead as if an 'outer world were woven into our mind'. Elsewhere we have proposed to refer to the experience of intense attention as absorption in a story-world (Tan et al., 2017), following Nell's (1988) groundbreaking description of "being lost in a book". Media psychologists specialised in research on media entertainment (Vorderer et al., 2004; Bilandzic & Busselle, 2011) have developed a variety of measures capturing enjoyable absorption-like states afforded by narrative, television drama and video-gaming. We discuss four of these.

a. Narrative engagement (Busselle and Bilandzic, 2008, 2009) is a pleasant state of being engrossed or entranced by the narrative as a whole as it is presented in a book or film, including the activity of reading or viewing it.Footnote 47 (Tele-)Presence (Schubert et al., 2001; Wirth et al., 2007; and others) refers to the embodied awareness of being in a virtual world: being there with your body, in other words absorption in a story-world.Footnote 48 The concept has its origin in research into the experience of virtual realities.Footnote 49 Attempts have been made to ground mechanisms of film-induced emotion on presence, that is, the audience’s basic and embodied awareness of being in the middle of the story-world as a witness to events befalling characters (Anderson, 1996; Tan, 1994, 1996).

b. Green and Brock’s (2000) definition of transportation is the most frequently used conceptualisation of absorption in media-psychological research. It is considered a major gratification offered to readers of narrative and film viewers alike. It overlaps with presence in that it features a sense of being in the story-world, as well as a realistic and attentive imagery of details. The difference may be that as a metaphor transportation evokes associations with transition to or travel into the film’s story-world.Footnote 50 More than presence, the operationalisations of transportation entail personal relevance and participatory sympathetic feeling, amplifying the emotional quality of the experience.

c. Empathy is the common denominator for concepts referring to absorption in the inner life of fictional characters. Like transportation, it is seen as a major gratification in reading stories and watching drama and movies. Viewer empathy has been defined as perceiving, understanding and emotionally responding to character feeling in the seminal work on the subject by Zillmann (Zillmann, 1991, 1996). Perceived similarity and sympathy for the character (grounded in moral attitudes) have been suggested and tested as determinants of spectator empathy in drama (e.g., Zillmann, 1996; 2000; 2003; 2006).Footnote 51 There is still a need to sort out possible forms of empathy specific to the canonical conditions of the cinema which may be quite different from situations in real life where we observe other persons.Footnote 52 Moreover, empathy with film characters can be less or more cognitively demanding.Footnote 53 Identification (e.g., Cohen, 2001) seems to stand for complete absorption of the viewer’s self by a represented character.Footnote 54 It can be argued that empathy is the rule in film viewing while identification is the exception (e.g., Zillmann, 1995; Tan, 1996, 2013a, b), as most mainstream film narratives are mainly geared towards provoking the former rather than the latter. According to Smith (1995) they use 'alignment' techniques that promote perspective taking and allegiance strategies that foster viewer sympathy for the character while the distinction between self and character is unaffected.

d. Finally, flow (Csikszentmihalyi, 1997) is the odd person out in the series of absorption-like experience concepts reviewed here, because it applies not only to absorption in movies, narratives or games, but to any activities that stand out for a certain intensity and intrinsic reward as well. The rather simple idea supporting the concept is that a pleasurable state is experienced when the challenges inherent in an activity just match the person’s capacities. In the canonical setup of mainstream film (and mainstream audiences) this balance is generally realised due to filmmakers’ skilful presentation of interesting story-events, and its overlap with attentional, perceptual and cognitive routines that film viewers have acquired in the real world. Mainstream movie continuity film style facilitates flow a great deal as it tends to minimize challenges posed by transitions from one view or perspective to another. Smith's (2012) studies were discussed above as relevant to smooth continuity of visual attention, and I would also mention the research on comprehension of events by Schwan (2013; Garsoffky & Schwan, 2009).

Obviously, these and other varieties of absorption are not mutually exclusive. Elsewhere we have presented qualitative empirical support for a dynamic interplay among the varieties of absorption (Bálint and Tan, 2015).Footnote 55

From the overview we may conclude that Münsterberg’s introspective psychology of the film experience is in large part echoed in the empirical observations gathered one century later. Viewers feel absorbed in another, exceptionally vivid reality, 'clothed in the [embodied] forms of our consciousness' (presence and transportation). Empathy is mentioned by Münsterberg as a prominent experience, and his notion of an unhampered stream of the imagination may correspond with the experience of flow. Focused attention is already in The Photoplay a major component of the film experience, one that would later be investigated in the research on bottom-up vs. top-down attention discussed above. Absorption, empathy and intensely focused attention can easily substantiate the enjoyability of watching films, as Münsterberg already would have it. However, compared to Münsterberg’s conceptualisation of the typical film awareness, insights into how acts of imagination on the part of the spectator contribute to it have not advanced that much in the psychology of film.Footnote 56

A narrative simulation account of emotion in film viewing

Absorption is an affective state characteristic of the film experience. However, a description of the typical experience of narrative films is incomplete if more specific affective states are not considered. Watching movies has been identified with emotions. We go to the cinema to experience mirth, compassion, sadness, bittersweet emotions, thrill, horror, and so on in response to what we see and hear happening to characters and ourselves. Emotions of movie audiences have not received much attention since Münsterberg’s Photoplay. Twenty-first century film psychology has taken up where he left off, and a major step forward has been to regard the narrative structure of films as a fundamental starting point for explaining film viewer emotions. The narrative simulation account is, I think, dominant in today’s psychological approaches to the issue of why the cinema offers the intense and remarkable emotional experience that Münsterberg’s photoplays induced a century ago. Important work on emotion in media users has been done in media psychology, most of it on empathy with characters, but narrative induced emotion has not received much attention, as can be seen from a complete overview by Konijn (2013). Cognitive scholars in the humanities have highlighted different aspects of film narratives that induce perceptions of fictional events associated with intense emotional experiences (e.g., genre-typical film style: Grodal, 1997, 2009, 2017; Visch and Tan, 2009; narrative procedures, e.g., Smith, 1995; Plantinga, 2009; Berliner, 2017). I hope the reader will allow me to use my own work on the subject as an illustration. It is closely related to the cognitive-theoretical analyses just referred to.
I have found a cognitive approach to emotion in general psychology fruitful for narrative modelling of emotion in film viewing.Footnote 57 Investigations of film-induced emotion have raised the issue of apparent realism: how can a clearly fictional world be taken for real to the effect of intensely moving viewers? Oatley introduced a cognitive theory of narrative fiction as simulation (1999, 2012, 2013) that applies to film as a stimulus for possibly complex emotions. Narrative runs simulations on the embodied mind just as programs run simulations on computers.Footnote 58 I would add that film viewers take part in a playful simulation in which the film leads them to imagine they are present in a fictional world, where they witness fictional events that film characters are involved in (Tan, 1995, 1996, 2008). Being a witness involves embodied perceptions of what happens in a fictional world, as well as constructing and participating in events in the imagination, without acting on these. In the process, events are taken for real for the sake of playful entertainment. This position is related to Walton’s (1990) well-known account of fiction as make-believe.

Frijda’s cognitive theory of the emotions (Frijda, 1986, 2007) is the starting point for further explanation of emotional experiences in response to film. The theory posits that the emotion system has evolved for adaptive action in the first place. For example, the sight of a monster will spawn a strong urge to flee because a basic concern for safety is jeopardised. Of course, film audiences do not run out of the auditorium. According to the cognitive theory of emotion, action responses are not fixed responses to emotional stimuli, but the result of appraisals of what they mean for a person’s concerns in light of the situational context. Playful simulation provides the contextual frame for the complex appraisal of apparent realism of film events. The appraisal has three stages: perceptual, imagination-based and self-involved.Footnote 59

1. Many popular film stimuli provoke immediate and automated appraisals of concern relevance and ensuing emotional responses, due for instance to their nature as unconditioned stimuli in the real world. A snake popping out from the bush would be an example. Emotional appraisals in the cinema can be and often are empathetic. That is, they include perspectives on events taken by film characters. Film technology in mainstream movies is used to emphasise emotional triggers; editing could strengthen the suddenness of the snake’s appearance, and photography could render fear releasers such as the typical movements of the snake more salient.Footnote 60 But popular films also present us with emotional stimuli that are immediately perceived as fake, for example a rubber prop snake. Due to the playful simulation frame, further cognitive processing of perceptions takes place. In the first case, film viewers realise that the events they have just perceived are not real but must be held true for the sake of a playful simulation. In the second, they realise that the fake stimulus is only a prompt, and comply with its invitation to hold the stimulus true and allow it to appeal to their concerns, also for the sake of playful simulation.

2. Once imagination takes over from perception, the reality status of stimuli is traded for believability. As part of the imagination, fictional events are matched with higher-order genre-specific narrative schemas, and then dealt with as possibilities in a particular world. As Frijda (1989) argued when he discussed the apparent reality of fiction: 'Seeing a fake snake approach a real person is not scary. But watching an imaginary snake approach an imaginary Jane is. The first is seen as unreal in a real world, and the second as real in an imaginary world. And this is how we appraise events in fiction. The fun of art is in the play with the duality' (p. 1546). Play with the possibility of events in the imagined world and entertaining as-if emotions can suffice for genuine emotion to arise. As I argued elsewhere (Tan, 1996), the appraisal of the possibility of events in a particular fictional world can and usually does lead to genuine emotion, because humans have been equipped with a capacity to have emotions in response to mental representations of counterfactual and imaginary events.Footnote 61

3. The genuine emotion can, but does not need to, open up considerations of the believability of fictional events in the real world. Moreover, it can lead to imaginations in which the viewer’s self is involved in the events or their ramifications. The appraisal of fictional film events is treated in more detail in Tan and Visch (2018). The search for film style and technology features that are conducive to particular emotional appraisals has been slow to take off. Cutting's computational content analyses were already mentioned. There are scattered empirical studies, e.g., of camera angle and editing pace by Kraft (1987) and Lang et al. (1995), respectively. Film technique manuals and critical analyses provide abundant intuitively convincing examples of how to produce emotionally appealing sequences. It is to be expected that computational film analysis will soon enable large-scale studies of the use of style and technology in emoting scenes.

Back to emotion and action. As film viewers perceive film scenes to be projections on screen of a fictional world, they understand they cannot act, and their action tendencies are suppressed.Footnote 62 Just as importantly, one’s inability to act upon a fictional world is a strong trigger for emotional responses involving the imagination of action. Driven by sympathy, viewers desire that protagonists escape from a horrific situation. In their imagination they anticipate and hope that the protagonist is saved by someone or something and, if need be, by a fictional miracle.Footnote 63 Thus, they experience or exhibit a virtual form of action readiness (Frijda, 1986).Footnote 64 This readiness for action can be directly observed in film viewers from their "participatory responses" (Bezdek, Foy & Gerrig, 2013), such as overt expressions of sympathy for a character (see also Tan, 2013b). However, there is one thing that film viewers as witnesses invariably do when properly emoted: eagerly watch the events on screen.

Following cognitive film theory further, I consider the emotional experience of film as the sum total of the experience of the appraisal, internal and external bodily expressions, and changes in action readiness, integrated in consciousness and accompanying the sensory intake of units of film.

Film, interest and enjoyment

An account of film-audience emotion is incomplete if it does not go into the question why we actually take the trouble of watching movies. Münsterberg already wondered how mature people can become so emotionally absorbed in fantasy worlds. Narrative films can be argued to address two basic emotional concerns in particular, curiosity and sympathy (Tan, 1996). All sorts of narrative fiction, including film, provoke interest by presenting events with uncertain consequences. Thus, they address a basic curiosity, that is, a need for novelty, knowing and exploration. Interest is the emotion that responds to appeals involving this concern. Interest in film viewing does have a real action readiness to it, the one referred to above: watch eagerly. Because the response in interest includes spending attention on and focusing it on specific story-world events, its experience goes hand in hand with absorption. Mainstream film’s narrative is perfectly designed to support a characteristic systematic unfolding of interest as an emotion. Movies continuously present cognitive challenges that viewers know they can meet.Footnote 65 Silvia (2006) has shown in a number of studies that this is the condition for optimal interest. I have referred to the core appraisal of narrative interest as promise of rewarding outcomes, in terms either of desirability for a protagonist or mankind in general, or of coherence, completeness or elegance of a narrative’s structure, or both (Tan, 1996). In addition, the prospect of sought emotions, such as excitement, enjoyment and appreciation, is also part of the promise that ongoing film narratives constantly offer.Footnote 66 Interest is closely linked with enjoyment, the primary gratification that movies offer their audience. In the cinema interest is pleasant because it is fun to entertain anticipations of as yet uncertain story outcomes.
Moreover, every outcome, even if it is unanticipated or unfavorable, is greeted with enjoyment because it answers one's curiosity. (In the case of sad, horrific or otherwise hedonically negative or mixed outcomes, "enjoyment" is not the proper label for the rewarding emotion. We return to the fun of unpleasant emotion in a later section.) On a final note, interest in film viewing is a case of narrative interest as a broader category of emotions, but the sensory qualities of the medium are relevant for how interest feels. Curiosity to know is in part a desire for the closure of a propositional narrative structure, but in the cinema we do not only want to know but also to see and hear. In the cinema, the enjoyment of a couple kissing or a heroine returning after an odyssey of some sort is incomplete when the event is not shown. In the cinematic appraisal of interest, an anticipation of embodied completion of our narrative-led imagination is a major ingredient of the promise of reward.

Emotional responses to fiction film worlds

The second concern that movies touch upon is sympathy. That this concern is active throughout the reception of all traditional movies answers the question why film viewers care about damsels, hobbits or gorillas in distress. There is a fundamental human need for bonding with others, and recognising any fictional character as someone 'like us' supposedly suffices for sympathy to arise.Footnote 67 Mainstream films activate the concern to the full as their sympathetic protagonists meet with ups and downs on the way to their goals. Sympathy-based emotions like disappointment, regret, awe, mirth, suspense, hopes and fears, compassion and sadness occur in response to obstacles or their removal on the way to protagonists realising their projects.Footnote 68 Because these emotions arise in response to events (appraised as desirable or undesirable) in a fictional world, we refer to these emotions as responding emotions.Footnote 69 Some frequently experienced sympathetic responding emotions, such as fear, sadness, compassion and being moved, can be empathetic, that is, they require mentalising a character’s inner life. More precisely, empathetic emotion requires that the viewer’s appraisal of any fictional events reflects the perspective of a character; the event is understood from a character’s imagined point of view and with her concerns and feelings. In its most intense forms, sympathy can look and feel like self-indulgent sentiment. However, there is no point in condemning tears of sadness or joy as silly. The term sentiment is not necessarily pejorative. The appraisal of a character’s suffering or good doing can involve an acknowledgment of its superior measure, notably in relation to the self’s suffering or good doing. In my compassion with or admiration for a beloved character I can feel that her fate is really woeful compared to mine, or that her altruistic achievements make mine totally insignificant.
Being moved, awe and having goose bumps are emotional responses accompanying such appraisals (Tan and Frijda, 1997; Tan, 2009; Wassiliwizky et al., 2017; Schubert et al., 2018).Footnote 70

However, not every responding emotion requires empathy or sympathy.Footnote 71 The sympathy concern does not only drive our siding with characters and responding emotionally to the ups and downs in their projects. As I proposed (Tan, 1996), it can make us invest affectively 'film-long' in characters, on top of going along in their hopes and fears, successes and failures. We are also witnesses of characters’ slower and more profound development into personae we would want them to be. The share of action or plot development relative to that of character differs from one genre to another.Footnote 72 Generally, action movies and especially comedies tend to allow for only minimal character development, whereas the drama genres may indulge in it. In these genres, viewer interest may depend in larger part on characterisation and character development.

Another class of emotions responding to the fictional world is 'spectacular', that is, spectacle-based. The spectacle of landscapes, buildings, natural objects and artifices, human or animal figures in motion, can surprise us and touch on a sense of beauty and invoke appraisals of harmony, elegance, or serenity. In some genres the spectacle of explosions, injury, cruelty, disfiguration, etc. may incite emotions such as disgust and fear. Spectacle-based emotions do not rely on empathy of any depth, their stimulus being the mere view or sound of a fictional scene; nor are they dependent on sympathy. In more traditional terms, image and sound combinations of objects, events, and figures in the fictional world can be emotionally appraised as spectacular, beautiful, sublime, horrific, bizarre, absurd and so on. Amazement, enjoyment, awe (the wow-feeling), entrainment, being moved and aesthetic appreciation are apt labels for ensuing emotions. Like all emotional responses to fiction worlds, spectacle-based emotions can also arise when we read narratives, but in the cinema, they compete conspicuously with plot and character-driven interest and sympathy-based affective response. It seems as if the viewer’s witness role is temporarily swapped for a spectator role.Footnote 73 The viewer can identify even further with patterns of motion or sequences of image and sound that lack reference to the film’s story-world. Viewers may contemplate lyrical associations of visuals, sounds, music and symbolic concepts in embodied consciousness, as Grodal (1997) proposed. If story action imaginations give rise to emotions, lyrical associations are responded to with moods, e.g., nostalgic, tense or relaxed ones. The seemingly immediate representations on screen of emotions through camera movements and associative editing that Münsterberg described would be examples.

Emotion structure of narrative film

As a way to profile the dynamics of emotion across an entire film I proposed to represent these in a succinct model, the affect structure of a film (Tan, 1996). The model represents the course of interest and of responding emotions in time as predicted by the events as they are subsequently presented by the film.Footnote 74 Generalising across titles, a most general hypothesis is that the level of interest during mainstream movies tends to rise globally. This is because on the way to protagonists’ goals, stakes tend to go up with every novel complication. This will lead to increasing promise of reward roughly between the prologue and climax acts. Locally though, interest peaks and dips alternate over subsequent scenes, depending on genre and particular film. Figure 5 displays an example course of interest measured in viewers of the film In for treatment. In this study of emotions induced by a tragic drama on a terminally ill hospital patient, we found that an initial appraisal of the protagonist as increasingly suffering under the yoke of an oppressive hospital regime was associated with a responding emotion of compassion. After the complication act, the protagonist’s acts of resistance against the hospital’s regime gave way to admiration due to an appraisal of the protagonist’s sense of self-determination. Both measures determined the level of interest, which was measured continuously using a seven-point slider device (Tan and van den Boom, 1992). Affect structures can be more or less generic. That is, responding emotions are, just like the plots, characters, and events that prompt these, characteristic for a certain genre. The study of genre-based emotion has been concentrated in research on undesirable effects of watching violence, sensation or horror in entertainment fare; see, e.g., a volume edited by Bryant and Vorderer (2006). Psychological research into the role of viewer genre knowledge is under way (e.g., Tan & Visch, 2009).

Fig. 5

Continuous interest over the course of In for treatment; N = 21; from Tan and Van den Boom (1992). Interest was registered every second using a slider rating device. Measurement was validated by self-report interest ratings. Numbers under the abscissa represent subsequent scenes. 1–6: prolog; 7–18: complication; 19–20: development; 24: climax, followed by epilog.
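
As an illustration only, the kind of aggregation behind a continuous interest curve can be sketched in a few lines of Python. This is a minimal sketch, not the procedure of the original study: the function names, the toy ratings and the scene boundaries below are all hypothetical, and the only assumptions taken from the text are that each viewer produces one slider rating per second on a seven-point scale and that scenes are consecutive stretches of time.

```python
from statistics import mean


def mean_interest_per_second(ratings_by_viewer):
    """Average the per-second slider ratings across viewers.

    ratings_by_viewer: one list of ratings per viewer, all of equal length.
    """
    return [mean(second) for second in zip(*ratings_by_viewer)]


def mean_interest_per_scene(per_second, scene_bounds):
    """Average the group curve within each scene.

    scene_bounds: scene number -> (start, end) in seconds, end exclusive.
    """
    return {
        scene: mean(per_second[start:end])
        for scene, (start, end) in scene_bounds.items()
    }


# Hypothetical toy data: 3 viewers, 6 seconds, ratings on a 1-7 scale.
ratings = [
    [2, 3, 3, 5, 6, 6],
    [1, 2, 4, 4, 5, 7],
    [3, 4, 2, 6, 7, 5],
]
# Hypothetical scene boundaries: scene 1 covers seconds 0-2, scene 2 seconds 3-5.
scenes = {1: (0, 3), 2: (3, 6)}

per_second = mean_interest_per_second(ratings)
per_scene = mean_interest_per_scene(per_second, scenes)
```

With these toy numbers the scene-level mean rises from scene 1 to scene 2, which is the pattern the global hypothesis of rising interest would lead one to look for in real data.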

The appeal of unpleasant emotions

A brief glance at the success rates of films featuring sad, violent or horrific content illustrates the appeal that unpleasant emotions can have to audiences at large. Münsterberg already objected to vicious effects of violent and repulsive imagery in 1910s photoplays, contents that he observed to be worryingly attractive. The psychology of the film holds various explanations in stock, but none has as yet been settled on. The best documented proposal is Menninghaus et al.’s distancing-embracing model, which stipulates two complementary mechanisms. One rids painful, disgusting or otherwise unpleasant aesthetic stimuli of an impact that would prevent any enjoyment or appreciation of the stimulus. The other allows for experiences that are 'intense, more interesting, more emotionally moving, more profound, and occasionally even more beautiful' (Menninghaus et al., 2017, p. 1). The model is meant to explain the prevalence of negative emotion in all art forms, and harbours a great many classical approaches to the issue. Media psychologists have proposed what I think are regulation accounts of the pleasures of negative emotion. An emotion such as horror results from appraisal of monsters etc. as threatening and repulsive, but the emotion itself, too, can be subject to appraisal. Likewise, your crying in the cinema may induce embarrassment upon your realising that it is only a film you are watching.Footnote 75 Serious drama, the contents of which can be appraised as poignant or thought-provoking (Oliver and Hartmann, 2010), and more particularly independent arthouse titles that tend to provoke appreciation and elevation rather than enjoyment, seem to compensate for the most painful experiences they offer by a high instruction or (self-)reflection potential (Oliver & Bartsch, 2013). They offer continuous promises of broadening insights or revising one’s views of the world and the self, possibly only materialising to the full long after the show.
In my own work I have pointed to the modulating effects of genre schemas (Tan & Visch, 2017) and narrative interest on negative emotions.Footnote 76

In closing the sections on film-induced emotion we need to note that the account of the cognitive appraisal of emoting events given here is simplified. Even straightforward film narratives can have complexities in terms, e.g., of plot lines, or character and narrator perspective that affect the intricacies of emotional events. I refer readers to Oatley’s (2012; 2013) discussion of appraisals of fictional events that are more sophisticated in this sense. More generally, film psychological research is needed into the use of more complex TOM heuristics in the comprehension of film narrative, and in emotional appraisals of film events.

The conclusion on the psychology of film awareness must be, I think, that the gripping nature of the film experience is as astonishing today as it was to early film audiences. Media psychologists have started to measure it, and cognitive film scholars have forwarded theoretical frameworks for an account of film viewer affect and emotion. But the phenomenology of film has not been expanded by film psychologists beyond the descriptions of what it is like to watch a movie provided in The Photoplay.

The psychology of film as art

Whether or not the awareness of film entails appreciations of artistry can only be a rhetorical question, but the psychology of the film has not explicitly addressed the subject. After Münsterberg and Arnheim hardly any psychologist considered film as an art form at all. Nor has general psychological aesthetics taken film into consideration. The psychology of narrative film as it has developed since the 1990s has addressed the aesthetics of movies, but rather implicitly. We have discussed psychologists’ efforts to explain the natural fluency in the perception of story-events that Münsterberg already found characteristic for the film experience. They pointed to the conventional use of continuity film style. Mainstream cinema’s narration has been demonstrated by cognitive film theorists to be at best marginally self-conscious (Bordwell, 1985, 2006). That is, formal features of a film’s composition, style and use of technology are non-salient and subservient to the viewer’s reconstruction of and absorption in a fabula. The viewer’s construction of a story-world is only discreetly cued by the narration, and formal or stylistic patterns that do the job tend to escape consciousness to a more than considerable degree (see Tan et al., 2017). We could say, I believe, that the psychological aesthetics of popular film is, as it stands, first and foremost about absorption, the intense and fluent imagination of being in a fictional world. And it should be added that a psychological aesthetics of forms other than popular narrative fiction film is missing. Available knowledge suffices to propose a psychology of the thriller, the romance drama or the coming-of-age film, but not for a psychology of the documentary, the expressionist, the surrealist or the postmodern film, let alone of experimental, avant-garde and other museum film art forms.
After all then, at present we are not far removed from Münsterberg’s speculation on the aesthetic experience of theatrical film as intense absorption due to the inner harmony of a film’s parts and conditional on only modest deviations from realistic photo-representations of the worlds that it plays.

However, as we write, everything seems set to embark on research into the film audience’s aesthetic appraisals of movies. We can rest assured that at present 'the inner parts' of mainstream film in terms of contents, style and technology have been well described by film theorists such as those referred to above. They can help psychologists teaming up with computer vision and hearing specialists to develop computational analyses of 'the inner harmony between the parts'. As a favourable sign of the times we also note a growing interest in the implicit knowledge that the regular film audience has of patterned uses of film style and technology in various forms and genres (see, e.g., Visch and Tan, 2009). Moreover, the first attempts have been made to identify the psychological dimensions that underlie film audience aesthetic tastes.Footnote 77 Dimensions of what I called the Artefact emotions, that is, the affective evaluations of films as aesthetic products, will soon be identifiable from reviews by critics and the film audience at large that are already available in large data repositories.Footnote 78 Large-scale, highly data-intensive research can be accompanied by smaller-scale laboratory studies of whether and how viewers attend to aesthetically relevant patterns of formal and stylistic features.Footnote 79

Concluding remarks

The agenda that Hugo Münsterberg set for the psychology of the film, explaining the film experience through revealing psychological mechanisms underlying it, and accounting for its aesthetic functions, is after a century still leading. I believe that psychologists of film have over the century not added new questions, while the ones he posed have been shown to be complex or even resilient. Nonetheless the field has gradually expanded. After the 1970s growth accelerated, and today we face what in modesty may be called a surge. Two film-psychological books, Art Shimamura’s Psychocinematics (2013) and Jeffrey Zacks’ Flicker: Your brain on movies (2014), have recently filled the void left after The Photoplay.

The review of psychological studies into the film experience presented in this contribution is highly selective. It was not meant at all to cover the entire field, if only because we selected achievements from the vast research area of moving images and their perception. This is why the essay is titled 'A psychology of the film' rather than 'The etc.'. Granted its basic limitations, an overview of a century of film psychology could conclude with a comparison with the research agenda that was set in Münsterberg’s Photoplay. The typical gripping experience that mainstream movies offer the audience has now come to be characterised as a sense of being absorbed by and quasi-physically present in a film scene that feels like going on as smoothly and continuously as a scene in real life. Considerable progress has been made in understanding how the basic psychological functions of attention, perception and memory contribute to viewers’ comprehension of film. An understanding has developed of how attentional, perceptual and cognitive mechanisms dovetail with the solutions and norms of traditional cinemascopy. In the conventional 35 mm theatre set-up, a dark environment where high-density projections extend over the limits of the foveal acuity field, screens are big enough to allow for sufficient stimulation of the peripheral motion-sensitive visual field, and the spinning projector shutter makes for smooth stroboscopic movement. Moreover, the visual system is quite resistant to perspective transformations due to less optimal viewing points, probably through extracting invariants under transformation (Cutting, 1986). Mainstream narrative continuity film style ensures a fluent perception and comprehension of a film’s story-world, action, characters and their inner lives. Emotional responses can be explained from the development of the story and the progress of protagonists’ projects.

And yet, a lot less effort has been spent in theoretically elaborating further on what the film experience is. There is a general disbelief that it would involve a mere recognition of events, situations, persons etc. as we know them in the real world. But what exactly the spectator’s imagination contributes to the typical awareness of the film is still mysterious. And how filmic events, and the ways they have been staged, acted, framed, photographed and edited exactly influence and prompt acts of imagination on the part of audiences, has only in part been understood.

Meanwhile, the supply of "photoplays" has immensely multiplied and diversified since 1916, but the mainstream narrative film has by far remained the most popular form. Today’s ubiquitous access to moving images through a multiplicity of screens has made it more urgent than ever for psychologists to understand the experiences associated with extremely different cinematic devices. They range from handheld phones to giant 3-D multiplex screens and surround installations in museums. Canonical set-ups of the cinema also tend to diverge because of networked interaction technologies seeking application in the production, distribution and exhibition of motion pictures. Psychologists of the film can use their current understanding of how audiences experience mainstream cinema as a basis for differentiating what film semiologists call 'dispositives': clusters of production, exhibition and reception practices characterised by specific expectations, attitudes and competences of their end users.Footnote 80

The psychology of film is rapidly developing into an interdisciplinary field. Münsterberg’s psychological study already reflected inspiration from fields far removed from experimental psychology, such as the then conventional practice of the photoplay as well as Aristotelian poetics of the theatre play. In the same vein, current psychologists of film, as we have seen, improve their understanding of the perception and cognition of film in collaboration with experts in the analysis of narration in the fiction film. Advances in current models of film viewer attention featuring narrative cuing are profoundly informed by (historical) film analyses.Footnote 81 Scholars in cognitive film studies, such as those collaborating within the Society for the Cognitive Study of the Moving Image, are steadily producing in-depth analyses of film at work conjointly with the viewer’s mind.Footnote 82 The same goes for the (more modest) advances made in psychological models of film-produced emotion. Further collaborations with specialists in machine analysis of image and sound can be expected to add to an objective identification of formal and stylistic film structures, also beyond the domain of traditional mainstream film, 'in the wild' of cyberspace, and in experimental art cinemas.

The technology of measuring psychological responses to film structures (perception, attention, memory and affect) has also developed tremendously since Münsterberg founded the perception lab at Harvard. Gaze tracking, fMRI and TMS have been added to the psychophysical and cognitive response registrations. Integration of large-scale image analysis data with behavioural measures obtained in the lab or as 'big data' is the next step in the development of film psychology. The study of integral responses to units of film extending beyond a few seconds, entailing entire actions, events, scenes and acts, or even films as a whole, requires new response recording devices and data models. Perhaps it will be feasible within a decade or so to append large emotional response datasets obtained from social media and film database metadata to the computational content analyses described above. We will then be able to categorise films into meaningful clusters, e.g., genres and subgenres based on relations between themes, plots, film style and emotion profiles. Small-scale lab experiments can tell us more about what exactly the mind adds to the image on screen and the sound from cinema loudspeakers. Let me single out as the leading issue the question how bottom-up and top-down mechanisms interact in producing the film experience.Footnote 83 Diversification of the set-up of in-depth studies is also necessary following the multitude of conventional set-ups of film viewing on various screens and in on-line or 'live'(?) exhibitions.

And just as in 1916, a select but growing minority of researchers in academic, empirical psychology want to understand why and how it is we perceive and what it is like to enjoy movies. They want an understanding because first they are movie-loving psychologists and second they find film a challenging testing ground for fundamental models of attention, perception, memory, imagination, emotion and aesthetics.