Language as shaped by the environment: linguistic construal in a collaborative spatial task

What causes cultural groups to favour specific conventions over others? Recently, it has been suggested that cross-linguistic variation can be motivated by factors of the wider non-linguistic environment. Large-scale cross-sectional studies have found statistical differences among languages that pattern with environmental variables such as topography or population size. However, these studies are correlational in nature, revealing little about the possible mechanisms driving these cultural evolutionary processes. The present study sets out to experimentally investigate how environmental factors come to shape the emergence of linguistic conventions. To this end, we adapt the classical Maze Game task to test the hypothesis that participants routinise different linguistic strategies to communicate positions in the maze contingent on particular environmental affordances (i.e. structure of the mazes). Our results confirm that subtle environmental motivations drive the emergence of different communicative conventions in an otherwise identical task, suggesting that linguistic adaptations are highly sensitive to factors of the shared task environment. We speculate that these kinds of mechanisms, found at a local interactional level, contribute through processes of cultural evolution to the systematic global variation observed among different languages.


Introduction
Different languages carve up the world in quite different ways. Notable examples include the way languages divide the same continuous colour space into different numbers of basic colour terms (Berlin and Kay, 1969; Gibson et al., 2017) or the way languages conceptualise the same spatial relation between two objects in cardinal (object 1 is south of object 2), intrinsic (object 1 is in front of object 2) or relative terms (object 1 is left of object 2) (Majid et al., 2004; Haun et al., 2011). What is the source of this cross-linguistic variability? Influential approaches have suggested an innate biological basis of concepts (Haidt and Joseph, 2007; Caramazza and Mahon, 2006; Hauser et al., 2002; Caramazza and Shelton, 1998; Pinker, 1994; Fodor, 1983). However, such nativist approaches are generally associated with universalist predictions and thus have difficulties accounting for observations of wide cross-linguistic variability in conceptual construal (Evans and Levinson, 2009; Everett, 2013a). An alternative approach sees concepts as socio-cultural conventions stabilised through processes of cultural evolution (Kirby, 2017). In this relativist view, linguistic structure comprises learned social conventions, and structural diversity is regarded as a constitutive property of language directly reflected in the large variation in grammar, semantic and conceptual categories found among the world's languages (see, e.g., Everett, 2013a; Hammarström, 2016; Lupyan and Dale, 2016 for an overview). However, the origin of this variation remains an open question: Is cross-linguistic variation fully stochastic, that is, an expression of continuous random selection among multiple equally available alternatives leading to gradual change and conventionalisation over time? Or is culture-specific linguistic structure motivated by non-random identifiable factors?
Two classes of factors have often been suggested as key candidates to address this question: biological, innate (non-linguistic) cognitive biases on the one hand (Kirby et al., 2007), and cultural evolutionary dynamics on the other (Evans and Levinson, 2009). Recent work using computer simulations (Christiansen and Chater, 2008), agent-based models (Puglisi et al., 2008; Steels, 2011; Kirby, 2017) and experiments with human subjects (Tamariz, 2017) suggests a combination of implicit learning, processing biases (Christiansen and Chater, 2008, 2016a), and interactional dynamics (Garrod and Doherty, 1994; Kirby et al., 2015) to account for diachronic changes in language structure.
However, there has also been an increasing amount of correlational evidence that the diversity of the world's languages might be motivated by adaptation to local social, physical or technological environments (see Lupyan and Dale, 2016 for a review). Large-scale cross-sectional data suggests that languages, as they are learned and used, adapt to their specific ecological niche.
For instance, the morphological complexity of languages seems to be predicted by social variables such as number of L2 learners and population size (Lupyan and Dale, 2010; Bentz and Winter, 2013; Cuskley et al., 2018). Variability in certain aspects of phonetics is suggested to be associated with bite configurations adapted to long-term changes in diet (Blasi et al., 2019). Furthermore, several studies have suggested that environmental factors can motivate subtle differences that become gradually entrenched over time through mechanisms of cultural transmission. Examples include relationships between aspects of the physical environment and lexical (Brown and Lindsey, 2004; Regier et al., 2016) or linguistic sound inventories (Everett, 2013b, 2017; Maddieson and Coupé, 2015).
Another example is spatial referencing: There is high variation among the world's languages in how people express spatial deixis (Levinson et al., 2018) or relations between objects (Levinson and Wilkins, 2006). Interestingly, while industrialised, urban speech communities seem to prefer egocentric frames of reference reflected in expressions like left and right, more rural speech communities often rely on expressions reflecting prominent properties of the local environment to express spatial relations (Levinson, 2003; Palmer, 2015). These expressions (e.g., uphill, downriver or oceanward) are viewpoint-independent and thus rely on different geocentric conceptualisations, which also manifest when speakers of these languages are tested on non-linguistic tasks (Majid et al., 2004; Haun et al., 2011). These observations suggest that the choice of reference frame could be motivated by non-linguistic variables, such as local topography, population structure or L2-contact (Li and Gleitman, 2002; Bohnemeyer et al., 2015). For instance, it was found that even phylogenetically distant languages spoken on atolls (ring-shaped collections of islands), such as Dhivehi and Marshallese, converge in utilising reference frames relating to the topography of the atoll ("oceanward" vs. "lagoonward"), while, for instance, Marshallese speakers in Springdale, Arkansas (US) prefer an egocentric reference frame (Palmer et al., 2017).
However, due to the cross-sectional nature of these studies, the actual causal dynamics are often inaccessible and can only be hypothetically inferred. Observations are often based on small samples and patterns are varied and probabilistic rather than deterministic (e.g., Majid et al., 2004). It is thus very hard to disentangle the influence of environmental factors as these often conflate a number of sociocultural factors pertaining to subsistence (e.g., Palmer et al., 2017 find the geocentric reference frame used more on fishing islands than non-fishing islands), education, or contact with other languages (Bohnemeyer et al., 2015). For instance, the finding that ambient humidity predicts whether a language exhibits tone as a phonological feature (Everett et al., 2015) was recently found to be confounded by other historical factors (Roberts, 2018).
In order to advance our understanding of the underlying mechanisms shaping linguistic conventions and variation in underlying conceptual strategies, we devise an experimental approach to test conceptually grounded predictions about causal relationships between variables in a controlled way (Galantucci et al., 2012; Roberts and Winters, 2013). Specifically, we test the hypothesis that linguistic conventions are contingent on environmental affordances, that is, that conceptual construal expressed in language is motivated by structure inherent in the environment in which communication and coordination take place.¹ 'Affordances' thus refers to features of the environment that make certain actions possible or desirable given the constraints of the bodily capabilities and intentions of an organism. In this sense a cup affords grasping if you are human with an opposable thumb (Gibson, 1979).
Languages are essentially sets of conventions constantly reshaped through learning and use in contexts of social interaction (Lewis, 1969; Beckner et al., 2009; Tylén et al., 2010). Previous studies have shown that conventions emerge spontaneously in task-related dialogue when pairs of participants are facing collaborative problems. Examples include the "Maze Game" (Garrod and Anderson, 1987; Garrod and Doherty, 1994): The Maze Game provides participants with a coordination problem as they need to exchange information about the location of switches and gates to collaboratively solve the task. This requires them to establish a shared vocabulary to coordinate their positions in the mazes. Previous studies have found that participants spontaneously develop and align description schemes for positions in the maze, reflecting their particular mental construal (conceptualisation) of the spatial scene. For instance, some participants would denote a position in a maze by reference to salient figurative details of the maze, while others would conceive of the maze as consisting of horizontal lines and navigate accordingly. Some description strategies were generally found to be more effective and favoured over others. For instance, many participants would initially use the FIGURAL strategy, but would, through repeated trials, discover that a more efficient strategy was to create an abstract coordinate system with numbered rows and columns that could be applied reliably across maze trials (Garrod and Doherty, 1994).
The current study adapts the original Maze Game design by adding an environmental dimension in the form of three maze topologies: These experimental conditions profile different affordances for referential strategies and allow testing whether participants would spontaneously adapt their mental construal and corresponding linguistic descriptions to form distinct conventions contingent on these environmental affordances. Each environment features different salient properties that act as attractors motivating the stabilisation of different conceptualisations. Irregular mazes were designed to profile figurative aspects, stratified mazes were meant to evoke descriptions based on horizontal lines, and regular mazes were designed neutrally to highlight the possibility of construing positions as points on a coordinate system (see the section "Methods").
We thus hypothesise that linguistic variation will emerge between the three environments as participants establish and over time converge on proto-conventions² relying on these conceptualisations. More specifically, we test the following two hypotheses:
H1: The topological layout of the mazes will motivate different linguistic strategies across the three conditions. This corresponds to the following four predictions:
H1 P1 : Participants solving the Maze Game in an irregular environment will tend to use a FIGURAL strategy more predominantly, reflecting a mental construal relying on salient shapes in the maze layouts.
H1 P2 : Participants solving the Maze Game in a stratified environment will use a LINE strategy more predominantly, conceptually construing the mazes as consisting of parallel horizontal lines.
H1 P3 : Participants solving the Maze Game in a regular environment will use a MATRIX strategy more predominantly, construing the mazes as an abstract coordinate system consisting of rows and columns.
H1 P4 : Another prevalent construal that has been reported in previous Maze Game experiments is the PATH strategy, which describes a path from a reference starting point to a goal location. We therefore predict that, in competition with the MATRIX strategy, participants solving the Maze Game in a regular environment (not providing salient landmarks) could also use the PATH strategy more predominantly.
All predictions from hypothesis 1 are tested in two ways. First, we test a simple model assuming variation in conceptualisations as a function of the environments. Second, we test a model predicting temporal effects. That is, from an initial situation of strong competition among construals of the mazes, each participant pair should over time converge on a preferred description scheme. We hypothesise that this choice will, to some extent, be motivated by the affordances of the environments making up the conditions, giving rise to interactions between environment and time.
Further, we were interested in how contextualised social interaction can give rise to the gradual stabilisation of linguistic conventions. To investigate the extent to which linguistic behaviours in the experiment evolve characteristics of proto-conventions, we included a final trial testing participants from all three conditions on the same maze, neutral to the three types of topological affordances. If linguistic strategies established through repeated interaction over the experimental trials are conventionalising, we expect participants to stick with their linguistic strategy even when presented with a new environment potentially equally affording a different strategy. In other words, environmental affordances are expected to be particularly influential as new linguistic strategies are establishing. Once conventionalised, a linguistic construal is stabilised by socio-cultural entrenchment to facilitate effective communication (it is costly to continuously change/adapt conventions as it can lead to misunderstandings). This leads to the second hypothesis:
H2: Participants will keep using their preferred strategy (contingent on environmental conditions) even when presented with a new environment potentially affording a different construal, indicating aspects of conventionalisation. This leads to the prediction that when finally tested on the same maze, participants will display systematic differences in their linguistic behaviours depending on their assignment to one of the three conditions. More specifically, on a final, neutral trial that is the same across conditions, we expect participants to keep using the strategy they have previously routinised.
H2 P1 : Participants in the irregular condition will keep using FIGURAL descriptions more than other description strategies.
H2 P2 : Participants in the stratified condition will keep using LINE descriptions more than other description strategies.
H2 P3-P4 : Participants in the regular condition will keep using MATRIX or PATH descriptions more than other description strategies.

Methods
Participants. Thirty-three participant pairs (n = 66, 24m/42f, M Age = 23, SD = 3) were recruited among students at Aarhus University. All participants signed informed consent in concordance with regulations of the local research ethics committee. Participants were randomly assigned to pairs and did not know each other in advance. Three additional pairs were excluded, due to non-compliance with instructions or severe difficulties with solving the task described below.
Materials. The task was based on Garrod and Anderson's (1987) Maze Game. The original setup involved participants located in separate rooms having to collaboratively coordinate in real-time via headsets to solve a series of mazes (see Fig. 1). The task is for each participant to move from a start position to a goal position in the maze. While seeing the same maze, start and goal positions differ for members of a dyad. Furthermore, the path from start to goal will initially be blocked by one or more 'gates'. The 'switches' to open the gates of one participant can only be operated by the other dyad member. However, participants cannot see the position of their partner's switches (only seeing their own, which are out of reach) and so, to solve the mazes, they depend on information from each other, particularly the positions of switches. An experimental trial ends when both participants reach their goal destinations.
Our variant of the Maze Game differed along two dimensions. We ran the Maze Game through a (written) chat system³, and also introduced three novel experimental conditions manipulating the shape of mazes in an irregular, stratified, and regular condition. The mazes were produced by systematically varying a 7 × 7 grid (Fig. 1). Irregular mazes were designed to involve geometric or abstract shapes like a cross or a square, protruding "extremities" sticking out from a main part and overall shapes that could be interpreted in various figurative ways (see Fig. 1a). Irregular mazes were not designed to afford specific interpretations, but to provide many affordances for perceiving figures or shapes. Stratified mazes, by contrast, involved prominent horizontal displacements, which could be easily identified as "lines" or "rows" (Fig. 1b). Lastly, regular mazes featured a high density of boxes in grid-like structures with no particular salient local features. Hence, mazes across conditions varied systematically along several dimensions: By comparison, regular mazes had more rooms (M n = 28) and connections between rooms than irregular mazes (M n = 21.4), since the figural shapes required empty space to become salient. The average room number for stratified mazes (M n = 26.7) was comparable to the regular condition, but stratified mazes differed from regular and irregular mazes in that their connection ratio was skewed in favour of horizontal connections to create salient "lines" (Fig. 1e). The final maze resembled the regular condition in being relatively dense, while providing participants with figural affordances (e.g., it could be segmented into a "snake" with a "head" and "tail" or "narrow corridors" and a "square"). In addition, it resembled stratified mazes in that it provided clear horizontal lines.
Visually overlaying all mazes per condition shows that irregular mazes were more unstructured, while regular mazes cluster around a dense square, and stratified mazes show clear horizontal patterns (Fig. 1f). We kept the number of switches (1-2 per maze) and gates (~2 per maze) the same across conditions to balance the level of difficulty.
Procedure. An experimental session could include up to four pairs (eight participants) tested simultaneously. Participants were seated in separate booths in front of client computers, unable to see their neighbours' screens and unaware of the identity and position of their interlocutor to whom they were connected over a network. One experimenter supervised the participants, while a second monitored the ongoing games and chats on the server computer in a separate control room.
Participants were randomly allocated to one of the conditions (regular, irregular or stratified). We tested 10-12 pairs per condition (due to the exclusion, see section "Participants"). Each experimental session consisted of 12 trials: 11 condition-specific mazes (the order of which was randomised within conditions), and a final 12th maze that remained constant across all conditions (see Fig. 1d). Participants communicated in Danish through a written chat client and all conversations and game performances were logged to the server.
Data analysis. The full corpus contained 4841 turns (M Length = 6 words) from 33 pairs. On average pairs produced 12 turns per maze (decreasing from M = 22 turns on the first maze to M = 7 on the final maze). These were manually coded at the turn level for spatial description types by coders blind to the conditions. 1260 descriptions were identified. 27% of the corpus (three pairs per condition) were coded independently by two coders with substantial inter-rater-reliability (Cohen's κ = 0.7).
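The inter-rater agreement statistic mentioned above can be illustrated with a minimal sketch. The function name `cohens_kappa` and the toy labels are hypothetical and are not taken from the study's corpus or scripts; the sketch only shows how observed agreement is corrected for chance agreement.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_chance = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical toy codings of eight descriptions by two coders (not study data).
coder_a = ["FIGURAL", "LINE", "LINE", "PATH", "MATRIX", "PATH", "FIGURAL", "LINE"]
coder_b = ["FIGURAL", "LINE", "PATH", "PATH", "MATRIX", "PATH", "LINE", "LINE"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

In this toy case the coders agree on six of eight items, but part of that agreement is expected by chance given how often each coder uses each label, so kappa is lower than the raw agreement rate.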
Coders relied on the same coding scheme used by Garrod and Doherty (1994), assigning each linguistic description of a location in the maze to one of the following four categories: (1) FIGURAL descriptions, where positions are identified in relation to salient figural shapes recognised in the maze, (2) LINE descriptions, where the maze is conceptually construed as consisting of horizontal or vertical arrays of boxes, i.e. parallel lines, and then positions are identified by reference to these lines, (3) MATRIX descriptions, where the maze is construed as a grid and positions are referred to as intersections of x and y coordinates, and (4) PATH descriptions, where locations are identified by describing a path from a start to an end point. We further included a fifth category, UNDEFINED, for descriptions too vague to be classified as one of the above categories. UNDEFINED descriptions accounted for 3% of all descriptions and were excluded from further analysis (see section "Results" for examples of each strategy, and Garrod and Doherty (1994), for more details on the coding scheme).
The distribution of description strategies was quite heterogeneous across conditions following our predictions (Fig. 2). To test H1, that the environments systematically motivate different construals, we built a multilevel Bayesian multinomial regression with a logit link. The description type coded for each turn was the categorical outcome (FIGURAL, LINE, MATRIX, and PATH), while condition was the categorical predictor (irregular, stratified and regular). Further, we modelled varying effects by pair and interlocutor to regularise for individual and pair variability, as well as by maze to regularise for individual maze variability. We used regularising priors, that is, priors that discount extreme values: a normal distribution centred at chance level (25%, log-odds: −1) for the occurrence rate of each of the description strategies, a positive half-normal distribution centred at 0 with a standard deviation of 0.1 log-odds for individual, pair and maze variability, and an LKJ distribution with η = 5 for the correlations within varying effects. The quality of the model was assessed by performing prior predictive checks and posterior predictive checks, as well as by inspecting Rhat (<1.01) and effective samples for both bulk and tails of the posterior (>200). The hypotheses were assessed using the evidence ratio, that is, the amount of evidence for the hypothesis (posterior samples in a range of values compatible with the hypothesis) compared to the evidence against the hypothesis. The evidence ratio is a continuous measure, but it has been argued that values above three present an anchor reference for moderate to substantial evidence for the hypothesis (Morey et al., 2016). When the hypothesis was supported by less than moderate evidence, we also estimated the evidence ratio for the null hypothesis.
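The evidence ratio for a directional hypothesis, as described above, can be sketched as a simple ratio of posterior draws. This is an illustrative sketch only: the function `evidence_ratio`, the simulated draws and the example contrast are assumptions for demonstration, not the study's model output or analysis code.

```python
import random

def evidence_ratio(draws, supports_hypothesis):
    """Ratio of posterior draws compatible with a hypothesis to draws against it."""
    n_for = sum(1 for d in draws if supports_hypothesis(d))
    n_against = len(draws) - n_for
    if n_against == 0:
        return float("inf")  # every draw is compatible with the hypothesis
    return n_for / n_against

# Hypothetical posterior draws for a contrast in log-odds, e.g. a difference
# between two conditions in the use of one strategy (simulated, not real data).
rng = random.Random(1)
draws = [rng.gauss(0.8, 0.5) for _ in range(4000)]

# Directional hypothesis: the contrast is larger than zero.
er = evidence_ratio(draws, lambda beta: beta > 0)
print(f"evidence ratio = {er:.1f}")
```

Under this reading, an evidence ratio of three means that three quarters of the posterior draws are compatible with the hypothesis.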
To test whether pairs converged on condition-related conventions over time, we built two additional multinomial models including time (which maze in the sequence pairs are solving), the first with time modelled as linear, the second with time modelled as monotonic (changes happen in the same direction at each time step, but the size of change is variable), and tested whether these models had better estimated out-of-sample performance (using stacking weights based on Pareto-smoothed importance sampling leave-one-out cross-validation, see Vehtari et al., 2017; Yao et al., 2018).
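The contrast between the two time models can be written out as follows. This is a sketch under an assumption: it uses the simplex parameterisation commonly used for monotonic predictors in brms, which the text does not spell out. The symbols $\eta$ (linear predictor on the log-odds scale for one description strategy), $\alpha$, $b$, $\zeta_i$ and $D$ (number of time steps) are introduced here for illustration only.

```latex
% Linear time: the same step b at every maze t = 0, ..., D
\eta_{\text{linear}}(t) = \alpha + b\,t

% Monotonic time: the total change b D is split into same-signed steps of unequal size
\eta_{\text{monotonic}}(t) = \alpha + b\,D \sum_{i=1}^{t} \zeta_i,
\qquad \zeta_i \ge 0, \quad \sum_{i=1}^{D} \zeta_i = 1
```

When all $\zeta_i$ equal $1/D$, the monotonic form reduces to the linear one, so the linear model can be seen as a special case with equal step sizes.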
To test H2, that conceptualisations conventionalise to an extent where they are generalised to other environments, we implemented a multilevel multinomial regression as above, only including data from the last maze.
Finally, to gain a better understanding of the mutual attraction and transitions between description types and account for the competition of multiple strategies within condition, we built and visualised discrete time Markov chains. A Markov chain is characterised by a matrix of transition probabilities between pre-defined possible states. In other words, it indicates for each possible state (in our case the four strategies: FIGURAL, LINE, MATRIX, and PATH) the probability that any given state will follow in the next trial (see Fig. 2). The transition probability $p_{ij}$ of moving from one possible state $s_i$ to $s_j$ is defined as $p_{ij} = \Pr(X_1 = s_j \mid X_0 = s_i)$. Representative Markov chains for each condition, estimated as a bootstrapped (n = 100) average of the dyad-level Markov chains, are presented in Fig. 3d-f. For each possible state of a Markov chain, we define its attraction strength as the tendency of transitioning into or staying in that state, where $A_j$ is the attraction strength of a given state $j$, and $p_{ij}$ is the probability of ending in state $j$ from a given state $i$. All analyses were run relying on R 3.6.1 (R Core Team, 2019), RStudio 1.2.1568 (RStudio Team, 2015), tidyverse 1.2.1 (Wickham, 2017), BRMS 2.9.0 (Bürkner, 2018), Stan 2.19 (Gelman et al., 2015) and MarkovChain 0.6.9.16 (Spedicato, 2017).
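The transition probabilities defined above can be estimated from a sequence of coded strategies by counting consecutive pairs and normalising each row. The sketch below is a minimal illustration for a single hypothetical sequence; the function `transition_matrix` and the example sequence are assumptions for demonstration, and the bootstrapped averaging over dyads described in the text is not reproduced here.

```python
STATES = ["FIGURAL", "LINE", "MATRIX", "PATH"]

def transition_matrix(sequence, states=STATES):
    """Row-normalised first-order transition probabilities p[i][j]."""
    counts = {i: {j: 0 for j in states} for i in states}
    # Count each observed move from one state to the immediately following state.
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    matrix = {}
    for i in states:
        row_total = sum(counts[i].values())
        # States that never occur as a starting point keep an all-zero row.
        matrix[i] = {
            j: (counts[i][j] / row_total if row_total else 0.0) for j in states
        }
    return matrix

# Hypothetical sequence of strategies used by one pair (not study data).
sequence = ["FIGURAL", "FIGURAL", "PATH", "PATH", "PATH", "LINE", "PATH", "PATH"]
p = transition_matrix(sequence)
print(p["PATH"])
```

The diagonal entries, such as `p["PATH"]["PATH"]`, correspond to the probability of staying with a strategy, which is the quantity discussed when a strategy is described as stable.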

Results
We found abundant evidence in favour of H1 P1 , H1 P2 , and H1 P4 : irregular mazes selected for FIGURAL descriptions, stratified ones for LINE descriptions and regular ones for PATH descriptions, both in terms of relative frequency across conditions and of their increase over time. We only found partial evidence in favour of H1 P3 : participants solving regular mazes did not have a higher propensity to use the MATRIX strategy than those in any of the other conditions. However, the use of FIGURAL and LINE descriptions (but not PATH, see H1 P4 ) actively decreased and more so than MATRIX descriptions. See Fig. 2a, b for estimates by condition on a probability scale, and Tables 1 and 2 for full details.
We found abundant evidence in favour of H2 P2 and H2 P4 , partial evidence in favour of H2 P1 , and evidence against H2 P3 . Participants from the stratified and regular conditions had a stronger propensity to use LINE and PATH descriptions respectively, even in the final neutral maze. Participants from the irregular condition used FIGURAL descriptions more than those from the regular condition but were not more likely to use FIGURAL descriptions than those from the stratified condition. MATRIX descriptions were not selected for by any specific condition more than by any other. See Fig. 2 for estimates by condition on a probability scale, and Table 3 for full details.
PALGRAVE COMMUNICATIONS | (2020) 6:27 | https://doi.org/10.1057/s41599-020-0404-9 | www.nature.com/palcomms
Average Markov chains by condition (for the first 11 mazes) are reported in Fig. 3d-f. The Markov chains support the patterns observed in the previous analysis: FIGURAL, LINE and PATH are stable attractors, respectively, for the irregular, stratified and regular condition. However, the Markov chains also highlight the presence of multiple attractors in each environment: in other words, participants might end up conceptualising their environment differently depending on how they start describing the mazes. Notably, while MATRIX is not frequent in general, participants 'discovering' this strategy will tend to very consistently stick to it across conditions. Analogously, in the regular condition, pairs using LINE, PATH or MATRIX descriptions will tend to stick to those without shifting strategy. These observations are further supported by looking at the evolution of description strategies in the individual pairs, indicating that different pairs might follow different trajectories.

Discussion
The present experiment addressed whether dyads playing different versions of the Maze Game would adapt their conceptualisations and corresponding linguistic construal contingent on the particular environmental layouts differing in their affordances for referential strategies. As in previous Maze Game studies, we observe large variability in how different pairs conceptually construe the mazes in terms of their figural properties, coordinates or the specific paths to take to reach their goals, and how their conceptualisations develop and shift over time. However, importantly, we observe that part of this variation seems to be systematically motivated by environmental affordances, supporting hypothesis 1. When presented with irregular mazes, participants were relatively more inclined to designate positions with reference to salient local figurative details of the mazes and the use of a FIGURAL strategy increased over the course of trials. Consider the following example (all examples are translated from the Danish chat logs by the authors): "There are two gates, one to the right in the small indent and one to the left in the left side of the branch". Here the positions of gates are explained by reference to local shape features such as "the small indent" or "the branch". When presented with mazes in the stratified condition, participants were relatively more inclined to conceptualise the mazes as consisting of rows and would use these as the main reference when navigating the task environment, as in the example: "Gates are blocking between the upper 2nd and 3rd row. And in the 3rd row from the bottom and in the bottom row, which blocks between box number 2 and 3 from the left in both rows". Again, the use of a LINE strategy increased over the course of trials in the stratified condition. Lastly, when presented with mazes of the regular condition, participants were more likely to designate a position by the PATH one would need to take from a reference point: "then you should go to the bottom left, 1 up, 1 to the right, 1 up", and this strategy increased over trials.
Table 1 The table reports the statistical testing of the H1 predictions: that specific environments select for specific conceptualisations, and therefore these conceptualisations will be more frequent in that kind of environment than in others. "Hypothesis" identifies the relevant prediction being tested. "Contrasts" indicates which conditions are being contrasted for the use of which strategy. The third column reports the mean expected difference, Beta, in log odds and 95% compatibility intervals, and the fourth column, the Evidence Ratio (ER) for the hypothesis and, when relevant, the Evidence Ratio for the null hypothesis (ER01).
Table 2. The table reports the statistical testing of the temporal aspects of the H1 predictions: that specific environments select for specific conceptualisations and therefore that these types will increase in their use over time, more so than others within the same condition. "Hypothesis" identifies the relevant predictions being tested. "Contrasts" indicates which conditions are being contrasted for the use of which strategy. The third column reports the mean expected estimate, Beta, in log odds and 95% compatibility intervals, and the fourth column, the Evidence Ratio (ER) for the hypothesis and, when relevant, the Evidence Ratio for the null hypothesis (ER01). Note that we are using linear models of time, since this model was credibly better than the others (stacking weight = 1, compared to 0 for monotonic time and 0 for condition only).
Table 3. The table reports the statistical testing of the H2 predictions: that specific environments select and stabilise specific conceptualisations, that these will be preserved even when encountering the final neutral maze. "Hypothesis" identifies the relevant predictions being tested. "Contrasts" indicates which conditions are being contrasted for the use of which strategy. The third column reports the mean expected difference, Beta, in log odds, and 95% compatibility intervals, while the fourth column reports the Evidence Ratio (ER) for the hypothesis and, when relevant, the Evidence Ratio for the null hypothesis (ER01).
Contrary to our predictions, the construal of the mazes as a grid-like MATRIX of x and y coordinates ("Cannot make it to 5,1 yet, but will be at 6,5 in a short while") was less frequent in this version of the Maze Game. However, inspecting Fig. 2d-f depicting the Markov chain transition probabilities, we observe that the MATRIX strategy is in fact the more 'stable' strategy in the regular condition in the sense that participants using this strategy have a 0.97 probability to stay with the strategy rather than changing to a different one (in comparison, participant pairs only have a 0.05 probability of staying with the FIGURAL strategy in this condition). In other words, while few participants are "discovering" this strategy, it presents itself as optimal when discovered, which is illustrated by the fact that MATRIX is among the most stable strategies across conditions in the sense that participants will stick to it once it has been discovered. This strong attraction of MATRIX across environments replicates observations from earlier maze-game experiments: In their study, Garrod and Doherty (1994) contrasted a condition of "isolated pairs" (similar to our experiment) with a "community condition", where participants changed partners repeatedly within a 'community'. This had the effect that strategies could spread across pairs, and the finding was that community pairs largely converged on using the MATRIX strategy. Despite the relative attraction and stability of the MATRIX strategy in our study, it is interesting that in the stratified condition there was a 16% tendency to transition to LINE from MATRIX, suggesting that among the abstract strategies, LINE was the more favourable one in this particular environment. These observations make us speculate that in the context of a community condition, participants would tend to converge on the LINE strategy rather than the MATRIX strategy if the environment was stratified.
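The notion of 'stability' used above (the probability of staying with a strategy from one description to the next) can be illustrated with a simple first-order transition table. The following is a minimal sketch, assuming that each pair's successive descriptions have been coded with one of the four strategy labels. The function name `transition_probabilities` and the toy sequences are hypothetical illustrations; they are neither the study's data nor the estimation procedure actually used for the reported transition probabilities, which may differ.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities from coded sequences.

    `sequences` is a list of lists of strategy labels, one list per pair,
    ordered in time. Returns {current: {next: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent (current, next) pair of labels.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        # Normalise each row so outgoing probabilities sum to 1.
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs


# Hypothetical toy data (NOT from the experiment): three pairs,
# five coded descriptions each.
toy_sequences = [
    ["PATH", "PATH", "MATRIX", "MATRIX", "MATRIX"],
    ["FIGURAL", "PATH", "PATH", "LINE", "LINE"],
    ["MATRIX", "MATRIX", "MATRIX", "LINE", "LINE"],
]

probs = transition_probabilities(toy_sequences)
# The self-transition probability is the 'stability' of a strategy.
print("P(MATRIX -> MATRIX) =", probs["MATRIX"]["MATRIX"])
print("P(MATRIX -> LINE)   =", probs["MATRIX"]["LINE"])
```

In this toy data, four of the five transitions out of MATRIX stay with MATRIX, so its self-transition probability is 0.8. A high value on the diagonal of such a table corresponds to what the text calls a 'stable' strategy, while off-diagonal values (such as MATRIX to LINE) capture tendencies to switch.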
Once a linguistic construal is introduced, it will often gain precedence through mechanisms of linguistic alignment and conceptual pacts (Brennan and Clark, 1996;Pickering and Garrod, 2004;Fusaroli and Tylén, 2012), which again can lead to conventionalisation (Garrod and Doherty, 1994) and long-term language change (Brown and Aaron, 2017). With this process, the motivation of the particular linguistic construal shifts from being contingent on the environment to depending on the local history of successful interactions (Deacon, 1997;Garrod et al., 2007). That is, as speakers in interaction gradually entrench a conceptual construal (see note 4), the bond to the environment might weaken while the construal comes to constitute a social convention. By implication, the same linguistic strategy is generalised and maintained even when the environment and thus affordances change. This is, at least partially, what we observe with regard to hypothesis 2. When, at the end of the experiment, participants are subjected to a maze with a shape neutral to the three conditions, they tend to stay with the conceptual construal they used through previous trials despite the fact that the maze they are facing potentially could afford a different strategy. The coordination advantage of staying with the convention tends to override the local affordance of the environment at this stage of the interaction. This is prevalent for dyads in the stratified and the regular condition. Participants in the irregular condition, who predominantly relied on FIGURAL descriptions, however, did not show the same tendency to generalise their system to the last, slightly differently shaped maze. It is important to notice that the FIGURAL strategy is more concrete than other strategies, since it is more dependent on the particular layout of the individual token maze.
As argued by Healey (2008) such strategies reflect simple instance-specific forms of representation based on ad hoc associations, such as referring to easily recognisable shapes. An implication of this is that the more concrete FIGURAL strategy is a perfectly viable strategy in a stable and constrained context where participants communicate repeatedly about the same environment. However, it is less flexible and thus usually dispreferred when participants have to navigate multiple or changing environments (see also the stability of FIGURAL across conditions in Fig. 3d-f). In previous Maze Game experiments, the FIGURAL strategy has been observed to be prevalent only in early trials, whereas participants would often abandon it for more abstract strategies once they discover these to more effectively transfer to new mazes (Garrod and Anderson, 1987;Garrod and Doherty, 1994). In this context, it is interesting that participants in the irregular condition were less likely to switch to an abstract description scheme such as LINE or MATRIX despite their apparent advantages. This suggests that the salient landmarks constituted by the shape of mazes in the irregular condition provided strong affordances for ad hoc associations. Consider two excerpts from different pairs communicating about the same maze from the irregular condition (shown in Fig. 4):
1. P31: "the outermost left"
   P31: "in the arm"
   P32: "the bottom right of the middle arm"
2. P19: "at the bottom of the branch"
   P19: "the trunk if it's an elephant"
   P20: "the field to the right of the front leg"
These examples show how the salient irregular shapes of the maze afford different figural construals. The pair in (1) refers to three protruding areas as arms, while the pair in (2) conceptualises the entire maze as an elephant, which is segmented into different body parts that are used to locate the switches.
Contrary to earlier maze-game observations (Garrod and Anderson, 1987;Garrod and Doherty, 1994), in the irregular condition, pairs tended to rely on such descriptions even after they had discovered more abstract strategies.
Fig. 4 Example trial. Maze from the irregular condition that could be segmented in different figural ways to describe the location of switches (shaded rooms).
In summary, when confronted with the novel task of communicating locations in the experimental mazes, participants were sensitive to the affordances offered by the particular environment (Tylén et al., 2013), and adopted linguistic strategies reflecting different conceptual construals of the mazes depending on the condition (see also Castillo et al., 2019). Our observations resonate with a number of large-scale cross-sectional studies suggesting that linguistic variability, such as the linguistic reference frames discussed in the section "Introduction", correlates with environmental conditions (Lupyan and Dale, 2016). While cross-sectional approaches can descriptively map tendencies in large amounts of real-world cross-linguistic data, they are, due to their correlational character, unsuitable to inform discussions about the underlying causal mechanisms (Roberts, 2018). In this study, we have taken an experimental approach, which allows us to systematically address possible mechanisms giving rise to linguistic variation in a controlled test environment. We have shown that experimentally manipulating the spatial layout of the environment as a variable affects participants' linguistic behaviour in predicted directions. This happens as interacting individuals face concrete coordination problems that require the negotiation of common ground and novel linguistic routines (Clark, 1996;Pickering and Garrod, 2004). By subjecting participants to multiple trials of collaborative problem solving, we aimed to experimentally simulate aspects of linguistic structure dynamically emerging locally and changing over time in response to contextual affordances. Other studies have shown how such local changes can affect more global patterns and timescales (Beckner et al., 2009;Fay and Ellison, 2013;Brown and Aaron, 2017;Tamariz, 2017). While experimental studies allow the researcher to isolate variables of interest and assure experimental control, they also have obvious limitations related to their abstract nature and less 'ecological' settings. Optimally, we should thus seek to combine experimental approaches, computational models and descriptive fieldwork in order to generate robust accounts of the factors and mechanisms contributing to linguistic variation (Roberts, 2018). Importantly, we do not claim deterministic relations between environmental factors and linguistic conceptual structure. Linguistic conventions are likely to be continuously shaped by a meshwork of multiple cultural, historical and cognitive factors working on multiple time scales (Raczaszek-Leonardi, 2009;Beckner et al., 2009;Tylén et al., 2013;Christiansen and Chater, 2016b;Palmer et al., 2017). Endorsing perspectives from dynamical systems theory, these can be considered competing attractors influencing language in probabilistic and context-dependent ways (Elman, 1995;Spivey, 2007;Beckner et al., 2009;Fusaroli and Tylén, 2012). While, at the outset, different conceptualisations (e.g., frames of reference or spatial description strategies) might all present themselves as equally viable solutions to a communicative coordination problem, the surrounding environment might profile and subtly prime one solution over the other and thus skew the relative attraction in favour of a specific solution (Winters et al., 2015;Christensen et al., 2016;Nölle et al., 2018), which can then spread and conventionalise in a community (Garrod and Doherty, 1994).

Data availability
The datasets generated and analysed during the current study are available in the Open Science Framework repository: https://osf.io/sxtaq.

Notes
1. Note that while the types of linguistic structure reviewed above include both 'compositional structure' (how linguistic signals are combined grammatically) and 'categorical structure' (how encoded meanings carve up the world conceptually), the present study focuses on the latter, conceptual aspect (see Carr et al., 2017).
2. By 'proto-conventions', we mean spontaneously emerging descriptions that bear the potential to become fully conventionalised within a community through cultural diffusion. Adopting Garrod and Doherty's (1994) typology, we divide these into four description schemes, FIGURAL, PATH, MATRIX and LINE, which correspond to different conceptualisations of the mazes.
3. We used the Dialogue Experimental Toolkit (https://dialoguetoolkit.github.io/chattool/), which has previously replicated the findings from Garrod and colleagues' original spoken experiments while making it easier to collect and analyse the data (see, e.g., Healey and Mills, 2006;Mills, 2014).
4. The underlying processes of entrenchment and conventionalisation are discussed in greater detail in usage-based accounts, e.g., by Schmid (2016) or Divjak (2019).