This is for you: Social modulations of proximal vs. distal space in collaborative interaction

Rocca, Roberta; Wallentin, Mikkel; Vesper, Cordula; Tylén, Kristian

doi:10.1038/s41598-019-51134-8

Article
Open access
Published: 18 October 2019

Roberta Rocca ORCID: orcid.org/0000-0001-9017-8088^1,2,
Mikkel Wallentin^1,2,3,
Cordula Vesper^1,2 &
…
Kristian Tylén^1,2

Scientific Reports volume 9, Article number: 14967 (2019) Cite this article

Abstract

Human spatial representations are shaped by affordances for action offered by the environment. A prototypical example is the organization of space into peripersonal (within reach) and extrapersonal (outside reach) regions, mirrored by proximal (this/here) and distal (that/there) linguistic expressions. The peri-/extrapersonal distinction has been widely investigated in individual contexts, but little is known about how spatial representations are modulated by interaction with other people. Is near/far coding of space dynamically adapted to the position of a partner when space, objects, and action goals are shared? Over two preregistered experiments based on a novel interactive paradigm, we show that, in individual and social contexts involving no direct collaboration, linguistic coding of locations as proximal or distal depends on their distance from the speaker’s hand. In contrast, in the context of collaborative interactions involving turn-taking and role reversal, proximal space is shifted towards the partner, and linguistic coding of near space (‘this’ / ‘here’) is remapped onto the partner’s action space.

Introduction

Representing Space for Action and Interaction

A significant part of human spatial cognition is influenced by affordances for action and interaction offered by the environment^1,2. The tendency to encode functional features of objects and locations finds expression in the organization of space into an immediate peripersonal and a distal extrapersonal region. This distinction maps onto a contrast between objects within and outside manual reach, optimizing sensorimotor representations for manual action and defensive behavior^3,4.

The link between manual affordances and spatial cognition has been explored extensively in the literature^5,6. However, the majority of studies on the topic have focused on solitary individuals manipulating an object, while a considerable portion of our everyday behaviors unfolds in the context of face-to-face social exchanges where multiple individuals engage in a dynamic interaction with each other and with the environment^7,8.

In a range of situations, from passing the salt at the dinner table to building a new house, we perform actions together with other people and coordinate them via language. In these cases, space is often shared between interlocutors, and objects in this space lend themselves to collaborative joint attention⁹ and action^10,11,12. Within such joint attentional scenes, objects’ affordances for manipulation emerge as a result of complex interactions between their distance to other objects and agents, and their contribution to specific action goals. Representations of space are thus likely to be dynamically modulated by functional properties of objects with regard to both individual and social action possibilities.

These observations lead to a number of specific predictions. First, given the action-oriented nature of spatial representations, peripersonal space is hypothesized to be biased towards hand-centered coordinates, a format which facilitates fast movement execution¹³.

Second, in activities involving other agents, functional representations of space may undergo structural changes to encode object affordances for joint action^14,15. Such dynamical social adaptations of near/far spatial coding are expected on the basis of a large number of studies on joint action showing that interlocutors spontaneously adapt to each other across a number of linguistic and non-linguistic behavioral measures, from the alignment of lexicon^16,17 and situation models¹⁸, to subtle bodily sway¹⁹, and even heart rate²⁰. Furthermore, studies on recipient design in language have shown that individuals flexibly adjust their utterances to the specific interlocutor and the overarching goal of the interaction^21,22.

We tested these two hypotheses using a novel experimental paradigm that makes it possible to investigate dynamic modulations of sensorimotor representations: (a) in a multi-person interactive context; (b) maintaining fundamental features of social task-oriented behavior; (c) in a controlled and low-dimensional space. Using spatial demonstratives (words like this and that) as an online linguistic index of representations of proximal space, we aimed to investigate systematically how egocentric action-oriented biases and context-specific social modulations interact in shaping representations of space in everyday social interactions.

Demonstratives as linguistic indices of near/far spatial encoding

Spatial demonstratives are highly frequent lexical forms found across all natural languages²³. Used in conjunction with pointing gestures and gaze cues, they constitute powerful interpersonal coordination devices^24,25 that allow interlocutors to jointly attend to relevant locations and to align on shared spatial representations. Whereas some languages have highly elaborate systems, most languages show a simple dyadic distinction between a so-called proximal (“this”, in English) and a distal (“that”) demonstrative²⁶.

Experimental evidence shows that the contrast between these two forms is a reliable proxy of object reachability in individual contexts. Over a series of studies, Coventry and colleagues have shown that participants systematically prefer proximal demonstratives for objects within reach, and distal demonstratives for objects outside reach^27,28,29. Moreover, the choice of demonstrative forms is sensitive to the same dynamic manipulations affecting the boundaries of peripersonal space³⁰. For instance, participants spontaneously extend the space for which they use a proximal demonstrative if they point to referents using a stick^31,32, and are similarly affected by perceptual and psychological parameters of the referent, such as its visibility, ownership and familiarity, when choosing a demonstrative form²⁷.

The present study

Over two experiments, we investigated how coding of objects and locations as near vs. far is influenced by affordances for manual action, and modulated across contexts of social interaction. Using a novel experimental paradigm, we presented participants with targets appearing on a two-dimensional horizontal plane and asked them to point and refer to them using demonstratives. In Experiment 1, participants performed the task alone or with a confederate, who was engaged either in a complementary naming task or in a collaborative communicative task. We hypothesized that targets located closer to the speaker on the sagittal plane and rightwards on the lateral plane (all participants were right-handed) would be more likely to be identified as proximal, which would support the hypothesis that proximal/distal encoding is tied to manual affordances of the referent. Crucially, we hypothesized that, in the context of collaborative interaction, targets located closer to the partner (that is, to the participant’s left) would be more likely to be labelled as proximal than in individual or non-collaborative social contexts, indicating that in this condition manual biases are modulated by social affordances. Results from Experiment 1 have been previously reported in conference proceedings³³. In the present paper, these results are discussed more extensively and complemented with a second experiment, building on the interactive components of the social manipulation. In Experiment 2, we used a similar setup, but enhanced the interactive aspects of the task by introducing turn-taking and role reversal. Both experiments were pre-registered on the Open Science Framework. Pre-registrations, data and code are publicly available at osf.io/qjxg9/.

Experiment 1

Methods

Participants

Eighty right-handed participants (female = 43, age range = 19–48, median = 26, sd = 7.6) with Danish as first language took part in the experiment in return for monetary compensation. All participants gave written informed consent. The study received ethical approval from the Human Subjects Committee of the Cognition and Behaviour Lab at Aarhus University, and it was carried out in accordance with local ethical guidelines and procedures.

Design and procedure

Upon arrival at the lab, participants were told that they would be presented with a spatial working memory test, and that they were assigned to the “linguistic condition”. However, unbeknownst to the participants, there was no between-participant manipulation: all participants did the same version of the experiment (in line with^27,28,29).

Participants stood by the inferior edge of a 40” screen, placed horizontally on a table. At each trial, a grid of circles appeared on the screen. After 500 ms, the grid disappeared and two target shapes (circles, triangles, squares, hexagons, stars) appeared on the screen for a random interval between 200 ms and 800 ms. The position of targets was randomized across trials. The grid then reappeared and the participant was prompted to designate the target positions. Participants were instructed to remember the locations of targets, then point to them while referring to the locations with the Danish demonstratives den her or den der, equivalent to the English this and that. Participants were explicitly instructed to use both demonstrative forms at each trial, one for each target, and were reminded to do so whenever they used only one of the expressions. No explicit instructions were given on the order of pointing or on the order of deictic forms. There were 132 trials per condition per participant.

Participants performed the task across three conditions. In the baseline condition, participants performed the task alone. In the complementary condition, a confederate stood to the left of the participant and named the target shapes (e.g. star, circle) after the participant had finished pointing. In this condition, the two tasks were mutually independent: neither the participant nor the confederate relied on information provided by the other person to perform their own task. In contrast, in the collaborative condition, the confederate’s task depended on the information conveyed by the participant’s pointing. In this condition, the confederate closed his or her eyes during target exposure and only opened them after hearing a click sound signalling the re-appearance of the grid. The participant then pointed at the location of both targets and referred to them using demonstratives. This allowed the confederate, who did not have perceptual access to the targets, to report their positions on a touch screen device placed next to the screen.

The authors and two student assistants took turns in the role of experimenter (live-coding the participants’ responses) and confederate. The baseline was always performed first. The order of the complementary and collaborative conditions was counterbalanced across participants. The setup of the experiment is displayed in Fig. 1.

Responses were coded as “invalid” when participants reported not having seen where the targets had appeared, when participants failed to use both demonstrative forms, or when the experimenter could not hear the participant’s response. The experiment was not videotaped, and no offline coding of responses was performed.

There was a total of 286 invalid trials out of 31680 trials. Invalid trials were distributed as follows: 76 in the baseline condition (0.95 per participant on average), 77 in the complementary condition (0.96 per participant on average), and 133 in the collaborative condition (1.66 per participant on average). Invalid trials were excluded from the analysis.
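The trial counts above can be re-derived from the design figures given in the text (80 participants, three conditions, 132 trials per condition). The following is only a bookkeeping sketch; the variable names are illustrative and not taken from the authors' code.

```python
# Design figures as stated in the Methods section.
participants = 80
conditions = 3
trials_per_condition = 132

total_trials = participants * conditions * trials_per_condition

# Invalid trials per condition, as reported in the text.
invalid = {"baseline": 76, "complementary": 77, "collaborative": 133}
total_invalid = sum(invalid.values())

# Average number of invalid trials per participant in each condition.
invalid_per_participant = {
    name: count / participants for name, count in invalid.items()
}

# Overall share of trials excluded from the analysis.
invalid_share = total_invalid / total_trials

print(total_trials, total_invalid)
for name, mean_count in invalid_per_participant.items():
    print(f"{name}: {mean_count:.4f} invalid trials per participant")
print(f"share excluded: {invalid_share:.4%}")
```

Under these figures, fewer than 1% of all trials are excluded.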

At the end of the experiment, participants were asked to report what they thought the experiment was about, and were then debriefed. None of the participants reported having realized that their use of demonstratives was the behavior of interest, and they reported no awareness of how they had distributed the demonstrative forms spatially.

Analysis

The relative distances between the x coordinates and the y coordinates of the two targets were used as predictors for a mixed effects logistic regression, fitted using the glmer function from the lme4 package in RStudio³⁴. For each trial, one of the two targets (henceforth: T1) was randomly selected and logged as target of interest. Relative distances on each of the axes were computed by subtracting the x and y coordinates of the competitor target (henceforth: T2) from those of T1. The relative distance on the x axis (RelativeX) took positive values if T1 was further to the right than T2, whereas the relative distance on the y axis (RelativeY) took positive values if T1 was further away from the speaker than T2 on the sagittal axis. Figure 2 exemplifies how relative distances were computed in a sample trial.
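The computation of the two predictors can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a coordinate system in which x grows towards the speaker's right and y grows away from the speaker, and the names `Target`, `relative_distances` and `pick_target_of_interest` are hypothetical.

```python
import random
from typing import NamedTuple, Tuple


class Target(NamedTuple):
    x: float  # lateral position; assumed to grow towards the speaker's right
    y: float  # sagittal position; assumed to grow away from the speaker


def pick_target_of_interest(
    a: Target, b: Target, rng: random.Random
) -> Tuple[Target, Target]:
    """Randomly label one target as T1 (target of interest), the other as T2."""
    return (a, b) if rng.random() < 0.5 else (b, a)


def relative_distances(t1: Target, t2: Target) -> Tuple[float, float]:
    """Return (RelativeX, RelativeY): T2's coordinates subtracted from T1's.

    RelativeX > 0 means T1 is further to the right than T2.
    RelativeY > 0 means T1 is further away from the speaker than T2.
    """
    return t1.x - t2.x, t1.y - t2.y


# Example: T1 is to the right of T2 and closer to the speaker.
rel_x, rel_y = relative_distances(Target(x=3.0, y=1.0), Target(x=1.0, y=4.0))
print(rel_x, rel_y)  # 2.0 -3.0
```

Swapping the roles of T1 and T2 flips the sign of both predictors, which is why the random choice of T1 does not bias the estimates in either direction.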

Notice that relative distances combine information on distance between T1 and T2, and distance between T1 and the speaker. To provide a concrete example, large positive values of relative distance on the y axis not only indicate larger distance between the targets, but they also indicate that T1 is further away from the speaker, while the opposite holds for negative values. This metric makes it possible to investigate how the likelihood of coding a referent as proximal or distal is influenced by its position relative to the speaker and to competing referents, both relevant factors in linguistic organization of space into proximal vs. distal locations^28,35.

The fixed effects structure of the model included the relative distance between the two targets on the y axis (henceforth: RelativeY), the relative distance between the two targets on the x axis (henceforth: RelativeX), and a categorical predictor for condition (henceforth: Condition), as well as all interactions. The demonstrative word chosen to refer to T1 was used as outcome variable in the model. The distal demonstrative (that) was set as reference level, while the proximal form (this) was coded as success outcome.

The random effect structure included random intercepts for each participant as well as random slopes for RelativeY. The decision to only include slopes for RelativeY was made during the preregistration phase. The rationale was to keep the random effects structure simple enough to avoid potential convergence issues and therefore post-hoc adaptations of the model, while still accounting for between-participant variability in sensitivity to the expectedly most prominent effect.
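The fixed and random effects structure described above can be written out as a single equation. The notation is ours and is meant only as a reading aid: $i$ indexes trials, $j$ participants, $X$ and $Y$ stand for RelativeX and RelativeY, and $C_1$, $C_2$ are the two contrast-coded terms for the three-level Condition predictor. Whether the correlation between random intercepts and slopes was estimated is not stated in this section, so the covariance matrix $\Sigma$ is left unspecified.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(\text{proximal}_{ij})
  ={}& (\beta_0 + u_{0j}) + (\beta_Y + u_{1j})\,Y_{ij} + \beta_X X_{ij}
      + \beta_{XY}\,X_{ij} Y_{ij} \\
     & + \sum_{k=1}^{2} \Bigl( \beta_{C_k} + \beta_{XC_k} X_{ij}
      + \beta_{YC_k} Y_{ij} + \beta_{XYC_k} X_{ij} Y_{ij} \Bigr) C_{k,ij}, \\
(u_{0j},\, u_{1j}) \sim{}& \mathcal{N}(\mathbf{0}, \Sigma).
\end{aligned}
```

Here $u_{0j}$ is the by-participant random intercept and $u_{1j}$ the by-participant random slope for RelativeY; no random slope is included for RelativeX or Condition, in line with the preregistered structure.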

Parameters were estimated using maximum likelihood estimation with Laplace approximation. The power simulation for the model is reported in the preregistration, yielding 70–100% power for fixed effects and two-way interactions. Planned contrasts for the categorical predictor Condition compared the participant’s behavior in the baseline condition with cumulative behavior in the social conditions, as well as the complementary condition against the collaborative condition.
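The two planned contrasts can be illustrated with a Helmert-style coding of the three conditions. The exact weights and signs used in the analysis are not reported in this section, so the values below are an assumption chosen only to show the logic: the first contrast compares the baseline with the mean of the two social conditions, and the second compares the complementary with the collaborative condition.

```python
from fractions import Fraction as F

conditions = ["baseline", "complementary", "collaborative"]

# Hypothetical contrast weights (one weight per condition, in the order above).
contrasts = {
    "baseline_vs_social": [F(-2, 3), F(1, 3), F(1, 3)],
    "complementary_vs_collaborative": [F(0), F(-1, 2), F(1, 2)],
}


def dot(a, b):
    """Dot product of two equally long weight vectors."""
    return sum(x * y for x, y in zip(a, b))


# Each contrast sums to zero, and the two contrasts are orthogonal.
weight_sums = {name: sum(w) for name, w in contrasts.items()}
orthogonality = dot(
    contrasts["baseline_vs_social"],
    contrasts["complementary_vs_collaborative"],
)

print(weight_sums, orthogonality)
```

With this kind of coding, each interaction term involving Condition tests one specific comparison rather than an omnibus difference among the three conditions.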

Expected effects

On the basis of the hypotheses outlined above, we expected a negative main effect of RelativeY. More specifically, we hypothesized that the further away from the speaker T1 was on the sagittal plane, the less likely it would be to be labelled as “proximal”.

Additionally, we expected a positive main effect of RelativeX. This effect would indicate that the more to the speaker’s right T1 was relative to T2, the more likely T1 would be labelled as “proximal”.

These effects would support the hypothesis that proximal/distal encoding is linked to manual affordances of the referent. Additionally, we expected RelativeX to interact with Condition. This interaction would indicate that the right-ward bias expressed by the effect of RelativeX is significantly less pronounced in the collaborative condition. In this condition, we expected targets located closer to the partner, that is, more to the participant’s left, to be relatively more likely to be labelled as proximal than in individual or non-collaborative social contexts. Given the binary nature of the outcome (proximal vs. distal), this prediction can equivalently be expressed in terms of right-ward targets being more likely to be labelled as distal in the collaborative condition, compared to individual or non-collaborative social contexts.

Detecting this effect would support the hypothesis that, in contexts of collaborative interaction, manual biases are modulated by social affordances.

Results

Main effects of distance on the sagittal and lateral axes

Figure 3 displays the proportion of proximal demonstratives as a function of RelativeY and RelativeX across conditions.

The proportion of proximal demonstratives decreases as RelativeY increases, i.e. as T1 moves further away from the speaker on the sagittal axis relative to T2. Additionally, Fig. 3 displays an increase in proportion of proximal demonstratives as a function of an increase in the value of RelativeX, i.e. as T1 moves further to the participant’s right.

The mixed effects logistic regression model with RelativeX, RelativeY and Condition as predictors, and including all interactions, confirms the statistical reliability of these patterns. The model displays a significant effect of RelativeY, β = −2.59, se = 0.27, z = −9.69, p < 0.001 and of RelativeX, β = 0.32, se = 0.02, z = 16.78, p < 0.001.
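Because the model is a logistic regression, the reported coefficients are on the log-odds scale. The sketch below converts the two main effects into odds ratios and approximate 95% Wald intervals. It uses only the estimates and standard errors quoted above; the scale of RelativeX and RelativeY is not stated in this section, so "per unit" means per unit on whatever scale the predictors were entered with, and the helper name `wald_interval` is illustrative.

```python
import math


def wald_interval(beta: float, se: float, z: float = 1.96):
    """Approximate 95% Wald interval on the log-odds scale."""
    return beta - z * se, beta + z * se


# Estimates and standard errors as reported in the text.
effects = {
    "RelativeY": (-2.59, 0.27),
    "RelativeX": (0.32, 0.02),
}

odds_ratios = {}
for name, (beta, se) in effects.items():
    low, high = wald_interval(beta, se)
    # Exponentiating a log-odds coefficient gives the multiplicative change
    # in the odds of a proximal response per unit increase in the predictor.
    odds_ratios[name] = (math.exp(beta), math.exp(low), math.exp(high))

for name, (ratio, low, high) in odds_ratios.items():
    print(f"{name}: OR = {ratio:.3f} (95% Wald CI {low:.3f} to {high:.3f})")
```

An odds ratio below 1 for RelativeY and above 1 for RelativeX matches the direction of the effects described in the text. Since the model includes interactions, these main effects are conditional on how the other predictors are coded.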

Taken together, the effects of relative distance of targets on the sagittal and lateral plane are in line with the idea that proximal space is intrinsically linked to manual affordances. As distance from the pointing hand decreases, objects become more likely to be identified as proximal.

Effect of social manipulations

Planned contrasts reveal a significant interaction between RelativeX and Condition when comparing the complementary and collaborative condition, β = 0.05, se = 0.02, z = 2.17, p = 0.03. No such effect is observed when cumulatively comparing the baseline to the social conditions, β = −0.001, se = 0.01, z = −0.08, p = 0.936.

The interaction between RelativeY and Condition reaches statistical significance both in the contrast between the baseline and the two social conditions, β = −0.07, se = 0.03, z = −2.51, p = 0.012, and between the complementary and collaborative condition, β = −0.11, se = 0.05, z = −2.45, p = 0.014. None of the three-way interactions reached statistical significance.

Figure 4 displays the observed proportion of proximal outcomes as a function of both RelativeX and RelativeY across conditions, providing an overview of the interplay between proximal/distal coding of space and the three experimental variables included in the statistical analysis.

The heatmaps clearly display the effect of RelativeY across all conditions. The right-lateralized bias on the x axis detected in the statistical analysis is particularly salient in the baseline and complementary condition, whereas it is slightly attenuated in the collaborative condition. Maps are smoothed by averaging across 8 nearest neighbors in x-y 2D space.
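One possible reading of the smoothing step is a neighborhood average over a regular grid of RelativeX and RelativeY bins. The sketch below follows that reading and rests on several assumptions not confirmed by the text: that the data are binned on a regular grid, that the 8 neighbors are the cells surrounding each cell, that cells at the border simply have fewer neighbors, and that the cell's own value may or may not be included (controlled by the hypothetical `include_center` flag).

```python
from typing import List


def smooth_neighbors(
    grid: List[List[float]], include_center: bool = False
) -> List[List[float]]:
    """Replace each cell with the mean of its (up to 8) surrounding cells."""
    rows, cols = len(grid), len(grid[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if include_center or (rr, cc) != (r, c)
            ]
            # A 1x1 grid has no neighbors; keep the original value then.
            smoothed[r][c] = sum(values) / len(values) if values else grid[r][c]
    return smoothed


# Toy grid of proportions of proximal responses (made-up values).
toy = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]
result = smooth_neighbors(toy)
print(result[1][1], result[0][0])  # 0.0 3.0
```

Smoothing of this kind affects only the visualization; the statistical model is fitted on the trial-level responses.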

Figure 5 zooms in on the differences between the collaborative and the complementary condition, displaying the difference in proportion of proximal demonstratives for all combinations of values of RelativeX and RelativeY.

A full overview of the estimates from the statistical model is provided in Supplementary Table S1.

Interim discussion

In Experiment 1, we detected a functional bias in favor of the pointing hand when participants performed the task individually. The hand-oriented bias persisted when two participants were performing independent tasks. However, in contexts of collaborative interaction, that is, where participants depended on each other to perform their tasks, proximal space was shifted towards the partner. Interestingly, we observed social modulations of proximal/distal coding of space both in terms of the expected left-ward modulations in proximal space and in terms of an unpredicted interaction between condition and relative distance on the sagittal axis (RelativeY). These modulations on the sagittal axis indicate that, while sagittal distance played a major role in speakers’ proximal/distal coding of space when participants performed the task individually or with a confederate performing an independent task, the effect of this factor was significantly less pronounced in contexts of collaborative interaction. This might in turn indicate that, in collaborative contexts, speakers drift away from coding locations as proximal vs. distal based solely on distance from their own body, towards a coding strategy that also takes the partner’s body as reference. Involvement in a collaborative interaction might not simply influence coding of proximal vs. distal space in the form of modulations of individual lateralized biases, but rather trigger a deeper change in the very reference frame against which proximity is evaluated.

In Experiment 2, we aimed to replicate both the expected and the unexpected findings from Experiment 1 while adding one more naturalistic component to the social tasks: participants took turns in performing the speaker and addressee tasks introduced in Experiment 1. This role alternation mirrors the turn-taking structure of natural linguistic interaction, and enhances the collaborative nature and ecological validity of the task.