Introduction

From shortly after birth, humans are drawn to animate and biological elements of the environment1. Indeed, across the lifespan, human attention tends to prioritize animate beings, such as humans and other animals, over inanimate items. This is reflected in dissociations between the representation and processing of animate and inanimate items in the brain2,3 and a behavioural bias toward animate items4,5, both of which may confer a number of evolutionary advantages6.

One specific instance of this preferential bias for biologically relevant stimuli can be found in the human tendency to select and follow the eye gaze of other conspecifics7,8,9, which has been linked extensively to theory of mind (ToM)10,11,12,13. ToM describes the cognitive capacities that underlie our understanding of other people. These capacities are often measured by asking people to judge what others know, or why they behave the way they do, and there is considerable scientific interest in understanding how ToM develops and how it relates to behaviours such as perspective taking and empathy14,15. This has frequently been studied by measuring the impact that another person's gaze direction has on an observer's attention. Recent work, however, suggests that gaze following may neither require nor measure ToM16,17 (see Cole and Millett18 for a review). The present study therefore takes an alternative approach and turns the traditional gaze-following paradigm on its head by measuring whether ToM affects the interpretation of another person's gaze direction.

We achieved this goal by presenting observers with prototypical data from a mobile eye-tracking study and asking them to indicate whether the fixation cursor, which represents the gaze direction of another person, was directed toward different items in a visual scene. This is a task that researchers may have to complete when coding such data, but the possible impact of ToM and of one's goals on those decisions has not been investigated. Although some studies have shown that participants can make judgements about another person's intention by looking at their eye movements as represented by a fixation cursor13, it is also the case that we are surprisingly unaware of our own fixations19,20,21. In the present study we test whether knowledge of what people are likely to look at (the animacy bias) can be applied to a fixation cursor. Across three experiments, observers were told that the position of the fixation cursor was generated by a human (i.e., an agent with a mind and goals, Experiment 1), randomly by a computer (i.e., an agent with neither a mind nor goals, Experiment 2), or by a computer vision system (i.e., an agent without a mind but with explicit goals, Experiment 3). As humans, unlike computers, are preferentially biased toward animate items in the environment, we predicted that observers would be biased to report that a fixation cursor was directed to an animate item versus an inanimate object only when the cursor was understood to be generated by a human.

Experiment 1

Method

All experiments were approved by the Ethics committees of the University of British Columbia or the University of Essex, and all research was performed in accordance with institutional guidelines. Informed consent was obtained from all participants. Experiments were pre-registered.

Participants

426 (321 female) volunteers were recruited online and via posters at the University of Essex and the University of British Columbia.

Stimuli

Drawing from staged scenes taken on a university campus, we selected 10 animate scenes, each containing a different person, and 10 inanimate scenes, each containing a different object. Each image measured 930 × 671 pixels. Onto each scene we placed a red cursor that varied in shape and size (a large or small circle or cross). These cursor types were selected to explore whether different shapes or sizes of cursor, which are commonly used with eye-tracking data, affected decisions regarding eye movement behaviour. Each cursor could occupy one of five distances from the target, with the nearest cursor at the edge of the target and the distances increasing horizontally (left or right) in steps of 15 pixels (Fig. 1), with the vertical position fixed. In images of people, the faces were in profile with the cursor always placed in front of the face. Collectively, 20 scenes (10 animate, 10 inanimate) × 4 cursor types × 5 distances yielded a set of 400 images for this study.

Figure 1

Experimental stimuli. The left panels provide an example of a small circle cursor whose centre is displaced 15 pixels (Distance 2) from the nearest edge of a person [(A) animate scene] or object [(C) inanimate scene]. The right panels provide an example of a large cross cursor displaced at a maximum distance of 60 pixels (Distance 5) for a person [(B) animate scene] or object [(D) inanimate scene].

Design

Participants were randomly assigned to one of the cursor shapes (between-subjects). The within-subject factors were target type (person or object) and cursor distance (5 distances). Each participant saw 20 of the 100 possible images for their cursor condition, randomly selected with the constraint that each original image (10 animate and 10 inanimate) was presented only once and that all 5 distances were represented for both target types.
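To make the selection constraint concrete, the following R sketch illustrates one way such a trial list could be built. The scene names and the specific scheme of pairing each distance with exactly two scenes per target type are our assumptions for illustration; the experiment only required that every base scene appeared once and that all five distances were represented.

```r
# Illustrative sketch (not the authors' code): build one participant's
# 20-image trial list under the stated constraints.
set.seed(1)

animate_scenes   <- paste0("animate_", 1:10)    # 10 scenes containing a person
inanimate_scenes <- paste0("inanimate_", 1:10)  # 10 scenes containing an object

assign_distances <- function(scenes) {
  # Assumed scheme: each of the 5 distances appears twice per target type,
  # shuffled across the 10 scenes, so every scene is used exactly once
  data.frame(scene = scenes, distance = sample(rep(1:5, 2)))
}

trial_list <- rbind(
  cbind(assign_distances(animate_scenes),   target = "face"),
  cbind(assign_distances(inanimate_scenes), target = "object")
)

# Randomise the presentation order of the 20 trials
trial_list <- trial_list[sample(nrow(trial_list)), ]
```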

Procedure

Participants judged cursor location via an online survey (Qualtrics). After reading the instructions, participants were provided with an explanation of eye-tracking and shown an example video clip of a cursor representing eye gaze moving around a scene. Participants were instructed that researchers have to decide whether the person was looking at an object of interest (a "hit") or not, and that where the person was looking was depicted by the cursor. Participants were made aware of the subjectivity of gaze cursor coding decisions, given some inaccuracies that could be seen in the video clip, and were told that they would help code whether the cursor was on target (a 'hit') or not. More specifically, participant instructions were: 'For the purposes of this research, pretend you are a researcher analysing eyetracking footage. In a moment, you will be shown 20 still images from a live video recording. You will then need to decide if the Focus Point is a 'hit' (on target) or not.'. Following this, participants were asked 'Is this a 'hit'?' and given the name of the potential target ('ball', etc.) for each of the 20 images. The images were presented in a randomized order, and participants selected 'Yes' or 'No' before the next image appeared.

Results and discussion

We analysed the relative frequency of "hit" judgements for objects and faces, split by the five levels of Distance (1–5) and by Cursor Shape and Size. We used a generalised linear mixed model (GLMM) approach, with 4 predictor variables (Distance, Target Type, Cursor Size, and Cursor Shape) predicting the binary response, that is, whether participants classified the cursor as a hit. Each of the 426 participants responded to 20 images, giving 8520 data points. We used the lme4 package in R with a binomial family, assessing the contribution of each factor through likelihood-ratio comparisons of models fitted by maximum likelihood. Participant and scene were included as random effects. Where possible we also included random slopes by participant and item; these were dropped when models failed to converge.
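As a rough illustration of this model-building approach, the sketch below shows how such models could be specified with lme4; the data frame and variable names are ours, not the authors' code.

```r
# Hedged sketch of the GLMM model-building approach (illustrative names)
library(lme4)

# Intercept-only baseline with random intercepts for participant and scene
m0 <- glmer(hit ~ 1 + (1 | participant) + (1 | scene),
            data = exp1, family = binomial)

# Add predictors one at a time and compare fits by likelihood ratio
m1 <- update(m0, . ~ . + distance)       # Distance
m2 <- update(m1, . ~ . + target_type)    # Target Type (face vs object)
m3 <- update(m2, . ~ . + cursor_size)    # Cursor Size
m4 <- update(m3, . ~ . + cursor_shape)   # Cursor Shape

anova(m0, m1, m2, m3, m4)

# Best-fitting model reported in Table 1: by-participant random slopes
# for Distance and Target Type in addition to the random intercepts
m_best <- glmer(hit ~ distance + target_type +
                  (1 + distance + target_type | participant) + (1 | scene),
                data = exp1, family = binomial)
summary(m_best)
```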

Figure 2 shows the empirical data and the best-fitting statistical model. The continuous variable of Distance was a significant predictor (compared to the intercept-only model: χ2(3) = 1027.8, p < 0.001). As expected, the probability of a cursor being coded as hitting the target decreased as distance from the target increased (β = −1.96, SE = 0.08, p < 0.001). Adding Target Type (object or face) further improved the model (χ2(4) = 206.12, p < 0.001). There was an increased probability of reporting a hit when the cursor was near a face, compared to when it was near an object (β = −1.36, SE = 0.10, p < 0.001). In additional models, we added Cursor Size and Shape, but these did not improve the model fit (p = 0.43 and p = 0.19, respectively). Thus, the size and shape of the cursor did not make a difference to whether a participant would code the cursor as a hit. The interaction between Distance and Target Type also failed to improve the model fit (p = 0.93). Table 1 gives full details of the best-fitting model, which includes random effects of participant and image and random slopes of Distance and Target Type by participant.
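In equation form (our notation, reconstructed from the model description and Table 1), the best-fitting model is a mixed-effects logistic regression of the form

$$\operatorname{logit}\big[P(\mathrm{hit}_{ij})\big] = \beta_0 + \beta_1\,\mathrm{Distance}_{ij} + \beta_2\,\mathrm{Object}_{ij} + u_{0i} + u_{1i}\,\mathrm{Distance}_{ij} + u_{2i}\,\mathrm{Object}_{ij} + v_{0j},$$

where $i$ indexes participants and $j$ indexes images, $\mathrm{Object}_{ij}$ is an indicator coded 1 for object targets (face is the reference level), $\beta_1 \approx -1.96$ and $\beta_2 \approx -1.36$ are the fixed-effect estimates reported above, the $u$ terms are the by-participant random intercept and slopes, and $v_{0j}$ is the by-image random intercept.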

Figure 2

The likelihood that a participant will code a cursor as a hit in Experiment 1 for a face or inanimate object. Lines show the average marginal probabilities estimated by GLMM. Data points show observed probabilities for each particular scene.

Table 1 The best fitting GLMM for predicting the binary decision of cursor location in Experiment 1. The reference level for Target Type was the face condition.

As illustrated in Fig. 2, as distance away from the target increases by 1 step (15 pixels), the hit rate drops by roughly 20%. However, participants treated selection of faces and objects differently. If the target was a face, the predicted chance of a hit was 10–15% higher than when the target was an inanimate object. This difference was fairly consistent across the 5 distances measured.

Collectively, these results show a clear difference in the way that the location of a gaze cursor relative to a target is evaluated depending on whether the target is a face or an inanimate object. When the target is a face, participants are more likely to judge that the cursor indicates that a human observer is looking at the target than when the target is an inanimate object. This outcome supports our hypothesis that observers are preferentially biased to report that a fixation cursor is directed to an animate item versus an inanimate object when the cursor is understood to be generated by a human. Interestingly, for both types of target, there is a graded response, indicating that participants did not consider the cursor to be selecting the target only when it fell on or very near the target. Even when gaze (i.e., the cursor) was some distance away, participants were willing to interpret it as reflecting attention to the target, especially when the target was a face.

It is tempting to attribute these effects to the judges' theory of mind. By this account, the cursor is more readily judged to be targeting a face because the judge attributes a mind to the looker and knows that such an observer is biased towards animate objects. However, an alternative possibility is that, because the judges themselves are humans with minds, it is their own attention that is being pulled toward the animate items in the scenes (consistent with work by Pratt et al.5). This would explain the marked tendency to report that the cursor is directed toward the target when it is a face rather than an inanimate object.

To distinguish between these two explanations, we conducted a second experiment, the key change being that participants were told that the cursor was randomly generated by a computer. This should remove any preconceived beliefs about the attributes of the agent that generated the cursor. If Experiment 1's results reflect attributions of mind to the looker, who is represented by the cursor, then in Experiment 2 the preferential bias to report the cursor as directed towards faces (rather than inanimate objects) should be eliminated. However, if the results of Experiment 1 reflect the judge's (i.e., the participant's) own attention being drawn toward the faces, we should find the same results as before, regardless of the instructions. Of course, these two explanations are not mutually exclusive; the results of Experiment 1 may reflect both the participant's attribution of mind to the looker and their own attentional bias, in which case one would expect the preferential bias to report that the cursor is directed toward faces to be reduced, but not eliminated, in Experiment 2.

Experiment 2

As cursor size and shape did not matter in Experiment 1, we ran only one cursor condition in Experiment 2.

Materials and methods

Participants

An additional 100 (39 female) volunteers were recruited online via prolific.ac.uk. This sample size approximately matches the number of participants in each of the cursor conditions in Experiment 1. We also ran a power simulation (using the 'devtools' package22) to confirm that a sample of this size would give us excellent power (> 95%) to detect the difference between face and object targets.
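As an illustration of how a simulation-based power check of this kind can be run, the sketch below uses the simr package; the choice of simr, the pilot model, and the number of simulations are our assumptions for illustration, not a description of the authors' procedure.

```r
# Hedged sketch of a simulation-based power check (assumptions throughout)
library(lme4)
library(simr)

# Pilot GLMM fitted to the Experiment 1 data from the small-circle condition
pilot <- glmer(hit ~ distance + target_type +
                 (1 | participant) + (1 | scene),
               data = exp1_small_circle, family = binomial)

# Estimate power to detect the Target Type (face vs object) effect
# by repeatedly simulating new responses from the fitted model
powerSim(pilot, test = fixed("target_type"), nsim = 200)
```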

Stimuli and design

The same 20 images from Experiment 1 were used, with 5 levels of distance, resulting in 100 images with the same cursor type (a small circle). The factors of target type and distance were manipulated fully within-subjects as in Experiment 1.

Procedure

Participants completed the same task as in Experiment 1, with the only difference being the instructions given beforehand. Rather than being given information and instructions about mobile eye-tracking, participants were told that the position of the cursor was "randomly generated by a computer". It was explained to participants that they would be asked to help code whether a cursor is on the target. Participant instructions stated: 'We want to know whether the cursor has randomly landed on an object/person. If it has, we call this a 'hit'. Please respond to each image with 'Yes' or 'No' if the cursor is a ‘hit’'. They were then asked, precisely as before, to code the images by indicating whether the cursor reflected a ‘hit’ (in other words whether it was ‘on’ the target in the scene) or not.

Results and discussion

Figure 3 shows the overall percentage of ‘Yes’ responses (‘hits’) for each condition. For statistical analysis we again used a GLMM with random effects of participant and scene to model the binary response, fitted to 2000 data points (100 participants each responding to 20 images). We began by also including random slopes, but this often led to models that could not be estimated. We therefore omitted random slopes, although in cases where such a model did converge the outcomes were the same.

Figure 3

The likelihood that a participant will code the cursor as a hit in Experiment 2 for a face or inanimate object. The lines show the average marginal probabilities estimated by GLMM and the scattered points indicate the observed probability for each image.

First, we added the continuous variable Distance as a predictor to our intercept-only model. This improved the model (χ2(1) = 990.56, p < 0.001), and the probability of a cursor being coded as hitting the target decreased as distance from the target increased (β = −2.16, SE = 0.12, p < 0.001).

However, including Target Type did not result in a significantly improved model (χ2(1) = 3.69, p = 0.055). Therefore, in Experiment 2, whether the target was an object or a face did not affect cursor location judgements. This finding disconfirms the hypothesis that the results in Experiment 1 reflect the observers' own attentional biases, and supports the hypothesis that the preferential bias in Experiment 1 to report that a cursor was directed toward faces reflects attributions of mind to the looker.

Comparing Figs. 2 and 3, it is clear that responses in this experiment were quite different. Here, there was a large decrease in hit responses from Distance 2 onwards. As distance away from the target increased from step 1 to 2 (15 pixels), the hit rate dropped by roughly 80%. After this, the rate of positive responses remained low and fairly constant. This indicates that, while participants tolerate some distance between the cursor and the target when it comes from a human, they do not when it is generated by a computer. There were minimal differences between objects and faces, although there was a slight tendency for more ‘Yes’ responses to faces at larger distances.

The key finding from Experiment 2 is that when the fixation cursor is described as being randomly generated by a computer, participants judge the location of the cursor in the same way whether it is positioned near an animate or an inanimate item in the visual scene. In particular, there was no difference between the classification of face and object trials at the nearest distance, and when the cursor was further away it was rarely endorsed as landing on the target. This does not mean that the judges’ own attention was never biased towards the animate items, and this may account for the slightly more frequent ‘Yes’ responses in face trials at further distances, but such instances were rare.

Although it is clear that the change in instructions affected judgements, the attribution of a mind to the source of the cursor may not be the only explanation for this difference. In Experiment 1, the observers were told that the images reflected gaze data recorded from a human, and we argued that judges were reasoning about the mind of that human (for example, intuiting an animacy bias). In Experiment 2, we suggest that these ToM processes were not applied when the cursor was generated by a computer. However, this experiment also removed any sense of a goal, with cursors explained as ‘randomly generated’. It is possible that observers avoided classifying hits because there was no particular reason for the “computer” to select anything. In Experiment 3, we refined the instructions, explaining that the cursor was generated by a computer vision algorithm designed to seek and detect targets, and making the instructions as similar as possible in all other respects to those in Experiment 1. If behaviour in this case is the same as in Experiment 1, it would suggest that having a goal, rather than representing a mind, is the critical factor.

Experiment 3

Materials and methods

Participants

A further 100 (32 female) volunteers were recruited online via prolific.ac.uk.

Stimuli and design

The same 20 images from Experiments 1 and 2 were used, with 5 levels of distance, resulting in 100 images with the same cursor type (a small circle). The factors of target type and distance were manipulated fully within-subjects as in the other experiments.

Procedure

Participants judged cursor location via an online survey (Qualtrics). After reading the instructions, participants were provided with an explanation of a computer vision system designed to seek and detect targets within a scene (the targets being faces and other objects). Participants were shown the same example video clip that we showed in Experiment 1, but this time they were told it reflected the selections of the computer system.

Participants were given instructions reflecting those of Experiment 1 and asked to help code the images, given some inaccuracies that can be seen in the video clip. Instructions were: ‘For the purposes of this research, please help us to determine whether the computer system has successfully located the target. In a moment, you will be shown 20 still images from a live video recording. You will then need to decide if the computer cursor is a 'hit' (on target) or not.’. Following this, just as in the prior two experiments, participants were asked ‘Is this a ‘hit’?’ along with the relevant target label, for all 20 images. The images were presented in a randomized order, and participants selected ‘Yes’ or ‘No’ before the next image appeared.

Results and discussion

Figure 4 shows the overall percentage of ‘Yes’ responses (‘hits’) for each condition. We used the same statistical GLMM analysis, again fitting 2000 data points (a further 100 participants, each responding to 20 images).

Figure 4

The likelihood that a participant will code the cursor as a hit in Experiment 3 for a face or inanimate object. The lines show the average marginal probabilities estimated by GLMM and the scattered points indicate the observed probability for each image.

We followed the same analysis steps as in the prior two experiments. Optimal models included random effects of participant and item and the random slope of Distance by participant. First, we added the continuous variable Distance to our intercept-only model, which, as before, significantly improved the model (χ2(3) = 1561.7, p < 0.001). Again, the probability of a cursor being coded as hitting the target decreased as distance from the target increased (β = −10.42, SE = 1.80, p < 0.001).

Second, we added Target Type, which resulted in a significant improvement of the model (χ2(1) = 15.74, p < 0.001). There was an increased probability of reporting a ‘hit’ when the cursor was near a face, compared to when it was near an object (β = −1.90, SE = 0.43, p < 0.001). Comparing Figs. 2 and 4, the current experiment produced a difference between faces and objects at some distances, but this was less pronounced than in Experiment 1. We also observed an improvement when we included the interaction of Target Type and Distance, but only when random slopes were omitted (χ2(1) = 5.38, p = 0.02). Differences between faces and objects were only noticeable at Distance levels 2 and 3, a trend similar to that observed in Experiment 2.

Between experiment analysis

To examine the effect of changing the participant instructions in more detail, we performed a comparison between experiments. We combined the data into the same model, comparing Experiment 1 (where judges were told the cursor was human gaze), Experiment 2 (where the cursor position was “randomly computer generated”), and Experiment 3 (where the cursor represented a computer vision algorithm).

To confirm that there were no effects of sample size differences, we ran this analysis using only the participants who saw a small circle in Experiment 1 (one quarter of the total sample), matching Experiments 2 and 3. The between-experiment analysis with this adjusted sample produced the same significant results as an analysis with the full sample. Results from the adjusted sample are reported below (for a version of Fig. 2 based on this restricted sample, see our data repository: https://osf.io/nem6b/).

We combined the data from the three experiments into one model. Our baseline model with Participant (308) and Image (20) as random effects gave us 6,160 data points.

We then added the three predictors Distance, Target Type, and Experiment (1, 2, or 3) in successive models, each building on the last. All stages provided significant improvements on the previous model and all factors were significant. In addition, we observed interactions between Target Type, Distance, and Experiment, demonstrating that differences in responding to faces and objects varied across the experiments. To examine this in more detail, we ran separate comparison analyses using the same model-building approach.
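The combined analysis can be sketched as follows; as before, the data frame and variable names are illustrative rather than the authors' code, with `experiment` treated as a three-level factor.

```r
# Sketch of the combined between-experiment GLMM (illustrative names)
library(lme4)

m_base <- glmer(hit ~ 1 + (1 | participant) + (1 | scene),
                data = all_exps, family = binomial)
m_dist <- update(m_base, . ~ . + distance)
m_targ <- update(m_dist, . ~ . + target_type)
m_exp  <- update(m_targ, . ~ . + experiment)

# Full model including the Target Type x Distance x Experiment interaction
m_int <- glmer(hit ~ distance * target_type * experiment +
                 (1 | participant) + (1 | scene),
               data = all_exps, family = binomial)

anova(m_base, m_dist, m_targ, m_exp, m_int)
```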

First, we compared Experiments 1 and 2. We again added the significant predictor variables of Distance and Target Type. Adding Experiment to the model also significantly improved it (χ2(1) = 41.15, p < 0.001). Then, we added the interaction of Target Type and Experiment. Model comparisons demonstrated a significant improvement (χ2(1) = 21.48, p < 0.001) and a reliable interaction (β = 0.88, SE = 0.19, p < 0.001). This confirms that the effect of Target Type was different, and negligible, when participants rated the cursor believing it to represent random selections by a computer rather than human gaze. In a final model we also added the interactions with Distance and the three-way interaction (χ2(3) = 66.93, p < 0.001). This model indicated that the effect of Distance was also different in Experiment 2. The three-way interaction with Distance may indicate that the ToM-related bias seen in Experiment 1 is more apparent at some distances (when compared to Experiment 2).

Comparing Experiment 1 with Experiment 3 led to similar results. Adding the three predictor variables significantly improved the fit of the model. In particular, adding Experiment as a predictor variable resulted in a significant improvement in the model (χ2(1) = 40.05, p < 0.001). Adding the interaction of Target Type and Experiment demonstrated a further improvement (χ2(1) = 8.00, p = 0.0047). Including interactions with Distance was beneficial for model fit (χ2(3) = 201.1, p < 0.001), but in this case the three-way interaction was not reliable.

Using the same analysis to compare Experiments 2 and 3, adding Distance and Target Type significantly improved the models. However, adding Experiment to the model did not result in a significant improvement (χ2(1) = 0.60, p = 0.44) and there was no significant effect of Experiment (β = −0.21, SE = 0.28, p = 0.45), demonstrating no significant difference in cursor coding between the two experiments.

To confirm the differences between experiments, we ran an additional analysis to quantify the bias toward faces. For each participant, we calculated an average ‘bias score’ by subtracting the average frequency of positive responses to objects from the average frequency of positive responses to faces, pooled across all distances (see Fig. 5). Between-subjects t-tests indicated that the face bias was significantly higher in Experiment 1 than in Experiment 2 (t(206) = 3.43, p = 0.001) or Experiment 3 (t(204) = 2.97, p = 0.003). Experiments 2 and 3 did not differ significantly (p = 0.54). Together with the GLMM results, this analysis provides strong evidence that Experiment 1 elicited different judgements.
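A sketch of this bias-score analysis is given below; the data frame and column names are illustrative, and the default t.test settings are an assumption rather than a record of the exact test options used.

```r
# Hedged sketch of the face-bias score and between-subjects t-tests
library(dplyr)
library(tidyr)

bias_scores <- all_exps %>%
  group_by(experiment, participant, target_type) %>%
  summarise(p_yes = mean(hit), .groups = "drop") %>%      # proportion of 'Yes' responses
  pivot_wider(names_from = target_type, values_from = p_yes) %>%
  mutate(bias = face - object)                            # positive = bias towards faces

# Between-subjects comparisons of the bias score across experiments
with(bias_scores, t.test(bias[experiment == 1], bias[experiment == 2]))
with(bias_scores, t.test(bias[experiment == 1], bias[experiment == 3]))
with(bias_scores, t.test(bias[experiment == 2], bias[experiment == 3]))
```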

Figure 5

Boxplot showing the average face bias for each experiment. A score of zero (dotted line) indicates that participants judged objects and faces equally. Positive scores indicate a bias towards faces. Boxes show the median and quartiles, with outliers represented as dots beyond.

Collectively, these results confirm that changing the believed source of the cursor changes the way it is judged with respect to animate and inanimate targets. We suggest that, in Experiment 1, participants are applying what they implicitly know about other minds (ToM) to the gaze location of another human, which affects the interpretation of an otherwise non-social stimulus. When participants believe the cursor is randomly computer generated (with no preconceived ideas of agency, goal, or ToM), there is no distinction between animate and inanimate targets. When participants believe the cursor is generated by a computer that is seeking out targets, they show a slight bias toward endorsing hits near a face (compared to other objects), though otherwise they behave similarly to when judging a “random” computer. There remains a significant difference between the pattern observed in Experiment 1 and that in Experiments 2 and 3. In contrast, and as can be seen in Figs. 3, 4 and 5, the differences between Experiments 2 and 3 are minor and non-significant.

General discussion

Previous research has indicated that humans are biased to attend to animate objects. It has also been argued that we follow gaze in an automatic way that reflects ToM, an interpretation that has recently been criticised16. The present study took an altogether different approach, asking people to make an explicit judgement about the location of a cursor which, in Experiment 1, reflected the gaze of a third party. We reasoned that if observers attributed mental states to the source of the cursor, perhaps adopting the perspective of, or reasoning about the mind of, the observer whose attention is represented, they would be more likely to interpret the cursor as selecting an animate object than an inanimate object. If this behaviour results from attributing mental states to the “looker”, then it should not be exhibited when the cursor is controlled by a computer. This was tested in Experiments 2 and 3.

The results reveal effects of animacy and agency on an apparently simple judgement: deciding whether a cursor is selecting an object in a scene. In Experiment 1, participants were more likely to code the cursor as ‘on target’ when it was near a face as opposed to an inanimate object. In Experiment 2, when participants believed that the cursor represented selections made randomly by a computer, this pronounced difference between faces and objects was eliminated. Comparisons between the two experiments demonstrate that there is an underlying predisposition to believe people are looking at animate objects, and show how judgement decisions are affected by knowledge of others’ intentions. In Experiment 3, when participants believed that the computer's selections of items in the scene were goal-directed, a bias towards judging the cursor as directed towards faces was detected. However, this bias was markedly smaller than in Experiment 1 and did not yield a significant effect when compared against Experiment 2. The increase in judgements to faces may reflect both a bias of the coder’s own attention towards faces (present in all experiments) and an effect of attaching a mind to the cursor (present only in Experiment 1).

The task in the present studies, to decide whether a cursor is on a particular target, appears to be simple. However, our results indicate that the same distances are treated differently if the target is a face, and that the same task yields very different judgements when the cursor is believed to represent human rather than computer behaviour. We believe this could be a powerful paradigm for measuring ToM and perspective taking. Given its simplicity, versions of this task could be useful for these measurements in children and those with developmental disorders. For example, although individuals with autism typically show impairments in social attention, they can make judgements about the gaze of others, at least in some conditions23. The current paradigm could provide a way to probe the perspective-taking component of such judgements. Our experiments also mimic the judgement that is made when researchers manually code eye-tracking data (as is frequently the case with mobile eye-trackers). The implication is that this manual coding may be biased towards social targets in a way which has not been previously investigated.

In Experiment 3, we used instructions that were closely matched to those in Experiment 1 by describing a computer vision system that had a goal to select faces and objects. On the one hand, this experiment produced some of the same animacy bias we saw in Experiment 1. This indicates that, even when the cursor is generated by an artificial agent, if that agent has a goal, participants may reason about this goal (a ToM-like process). This reasoning could introduce biases that participants expect to be present, such as assuming that, like themselves, the computer system will have a bias toward animate objects, given that detecting people is a common aim of human-designed computer vision systems (e.g., face-recognition systems).

On the other hand, the general pattern of responses in Experiment 3 was more similar to Experiment 2. In both of these experiments, there was strong agreement that the cursor was on the target when it was only very close to the face/object. This was quite different from the more graded response seen in Experiment 1. In that experiment, even as the cursor moved away from the target, participants showed a marked tendency to identify it as landing on the target, especially if the target was a face (e.g., 20–25% of the time judging a cursor as landing on a face at the two most extreme distances). It is also possible that the mere presence of a face in the scene increases responses to anything, a general social facilitation which could be tested by using images with both a face and a non-face target presented in parallel. However, this cannot explain the way that the same cursor is treated differently when it is associated with human gaze.

Overall, this paper has examined the extent to which participants impose their own biases and adopt others’ perspectives during judgements of locations. When making these judgements, participants are preferentially biased towards judging that people are looking at animate targets such as faces rather than inanimate objects. Critically, this bias is weak or eliminated when the exact same cursor is associated with a nonhuman source that has only a goal or no goal at all, respectively. The strong implication is that participants engage in a theory-of-mind type attribution for the human cursor (i.e., applying an animacy bias) that is not made for the computer cursor. Thus, the present study demonstrates that observers attach minds to representations of other people’s gaze: not only to images of their eyes, but also to stimuli that represent where those eyes are directed, and this attribution of mind influences what observers believe an individual is looking at.