Evidence for compositionality in baboons (Papio papio) through the test case of negation

Dautriche, Isabelle; Buccola, Brian; Berthet, Melissa; Fagot, Joel; Chemla, Emmanuel

doi:10.1038/s41598-022-21143-1

Article
Open access
Published: 10 November 2022

Scientific Reports volume 12, Article number: 19181 (2022)

Abstract

Can non-human animals combine abstract representations much like humans do with language? In particular, can they entertain a compositional representation such as ‘not blue’? Across two experiments, we demonstrate that baboons (Papio papio) show a capacity for compositionality. Experiment 1 showed that baboons can entertain negative, compositional, representations: they can learn to associate a cue with iconically related referents (e.g., a blue patch referring to all blue objects), but also to the complement set associated with it (e.g., a blue patch referring to all non-blue objects). Strikingly, Experiment 2 showed that baboons not only learn to associate a cue with iconically related referents, but can learn to associate complex cues (composed of the same cue and an additional visual element) with the complement object set. Thus, they can learn an operation, instantiated by this additional visual element, that can be compositionally combined with previously learned cues. These results significantly reduce any claim that would make the manipulation and combination of abstract representations a solely human privilege.

Introduction

A basic mental operation that underlies the structural complexity of language is compositionality¹, which states that the meaning of an expression is determined by the meaning of its constituents and the rules used to combine them. Compositionality is hypothesized to be a defining feature of human language and unique to human thought^{2,3,4}, but its origin in evolution is controversial.

A rich body of field and lab research shows that animals can combine signals into sequences^{5,6,7,8,9,10,11,12,13}. For instance, putty-nosed monkeys combine two different referential calls signaling predators to trigger a group movement¹⁴, enculturated apes are able to spontaneously produce several referential signals sequentially⁵, and chimpanzees may even combine facial expressions and gestures in a productive way¹³. Yet, in all cases and more generally across this literature, there is no clear evidence that the structure of these combinations adjusts the meaning of what is being communicated by the individual signals in a systematic manner and, thus, that they qualify as cases of semantic compositionality^{9,13,15,16,17}.

Here we investigate abstract compositional representation in non-human primates through the study of negation, one of the most basic yet fundamental functional operations that exemplify compositionality. Negation is typically evidenced in language, where it can take various forms cross-linguistically: French has the words non and pas, Japanese the suffix -nai, English the prefix un-. Such negations may be seen as assuming a large range of semantic roles. For example, the negative particle no in English can be used to express falsity, refusal (e.g., in response to an order), displeasure, disbelief, exasperation, and more¹⁸. One salient technical property is that negation is always the negation of something, and as such is a typical case of functional application (the application of a function to an argument), the basic form of compositionality. In doing so, negation entails a relation where a mental representation is obtained by contrast to another source mental representation: truth is defined in contrast to falsity and vice-versa, pleasure to displeasure and vice-versa, a set (e.g., triangles) to its complement (non-triangles). We will focus on the latter case of set complementation, as it contains all the relevant pieces that make negation interesting and powerful: a compositional manipulation of abstract representations, without it being restricted to a narrow domain (truth/falsity).

Various forms of negation have been successfully taught to non-human animals: rejection, refusal, and other concepts that have negation-like functions¹⁹. As for set complementation, two types of studies are relevant, but not decisive.

First, animals have been shown to comply with ‘mutual exclusivity’, i.e., inferring that a new label is not associated with objects for which they already know a name^{20,21} or inferring that a food reward is located in a location B when the only other possible location A is empty^{22,23}. However, these types of studies are not designed to show that a label is assigned to all other possible objects, nor do they show that animals’ behavior is based on the logical relation between the representation ‘absent’ and ‘present’ rather than holding separate representations for these two concepts. Critically, it has been argued that these studies concern proto-negation¹⁹, the operation that maps one concept to a complementary concept but only within a pre-established set of contrary pairs (e.g., absent/present), as opposed to a more general form of negation that may not require such previously established pairs (i.e., if a description applies to an arbitrary set of situations, its negation will apply to all the other situations; e.g., blue/not-blue). Such a form of negation has been argued to require a human-language-like system of representation²⁴.

Second, animals succeed in “non-matching-to-sample tasks”: when presented with a cue, they can learn to select, in a target pair, the object that is not the cue^{25,26}. However, these paradigms do not necessarily come with a control condition for learning concepts that are of similar complexity but not based on negation. Nor do they address the more “linguistic” side of the equation, i.e., trying to trigger such negative behavior from the stimuli only, rather than at the level of the task (see Experiment 2).

We present two experiments testing whether baboons (Papio papio) display a capacity for compositionality, focusing on the critical case of negation. Compositionality requires that the constituent units of an expression have independent meanings. We define meaning here as the association between a cue and a referent. Experiment 1 tested whether baboons can learn negative, compositional, meanings (such as ‘not-blue’), and Experiment 2 tested whether they can associate a negative meaning with a visual cue.

Experiment 1

We explored whether baboons are better at learning negation-like associations between a cue and a set of stimuli (e.g., a blue patch of color being the cue for non-blue objects) than arbitrary associations. Baboons ($n=24$) were tested in a matching-to-sample task using a testing procedure in which they had free access to computer-controlled operant conditioning setups with touch screens²⁷.

In each trial, baboons were first presented with one of two possible cues, either a color cue (a circle filled with a target color) or a shape cue (a target shape filled with white). After they touched the cue, two colored shapes were presented side by side on the screen, and the task was then to touch the one of these two objects that was associated with the cue to obtain a food reward (see Fig. 1). In the positive condition, the object-cue association was iconic: the cues (e.g., the yellow circle) shared one visual dimension with the objects triggering a reward (e.g., the yellow shuriken or the yellow letter A). In the negative condition, the responses were systematically the opposite, so that the cues (e.g., the yellow circle) were effectively associated with the complement set of iconically related objects (e.g., the purple shuriken and the purple letter A). In the arbitrary condition, the cues (e.g., the yellow circle) did not share any consistent visual dimension with their target responses (e.g., the yellow shuriken and the purple shuriken), yet as in the other conditions, the set of target objects was coherent in some dimension (e.g., shape). For each condition, there were two target responses per cue, which, to avoid ambiguities, were never presented together as the two proposed options.

If baboons readily assume that a cue is more likely to be associated with the object that it resembles the most, i.e., if they show an iconicity bias, they should learn faster in the positive condition (although non-human animals and human infants do not always display an iconicity bias^{28,29,30,31,32}). Our crucial prediction is that if baboons can represent negative concepts, and therefore show a capacity for compositionality, they will learn faster in the negative condition than in the arbitrary condition: even though all expected responses in the negative condition go against an iconicity bias, participants will capitalize on the systematicity of the (negative) association.

Method

Data availability

The data and the script for the analysis in this paper are available online: https://doi.org/10.17605/OSF.IO/VZ8UR.

Ethical standards

This research conformed to the Standard of the American Psychological Association’s Ethical Principles of Psychologists and Code of Conduct and received ethical approval from the ethics committee of the French Ministry of Education (approval APAFIS 2717-2015111708173794 v3). The study is reported in accordance with ARRIVE guidelines.

Participants and apparatus

The data provided by 24 Guinea baboons (Papio papio, 17 females; age range: 2–24 years) from the CNRS primate facility (Rousset-sur-Arc, France) were analyzed. The participants were tested using ten automatic computerized learning devices for monkeys²⁷ that were freely accessible from the baboons’ living enclosures. Each test system comprised a touch screen, a food dispenser, and automated radio-frequency identification (RFID) of the participants via their tags. This allowed us to test the individuals without capturing them, which improves animal welfare in experimental research³³.

The experiment was made available to all 25 animals in our facility, who participated in the study on a voluntary basis. Of the 25 participants, 1 failed to complete the first condition they were assigned to (see ‘Procedure and learning criteria’ below). The experiment was stopped when at least 4 participants per group completed the 3 conditions (see ‘Task and conditions’ below).

Materials

There were three sets of stimuli (see Fig. 1). Each set of stimuli was made of four objects varying systematically along two binary dimensions: color and shape. The three stimuli sets differed in the pairs of shapes (shuriken/letter A, fish/rocket, tree/butterfly) and the pairs of colors (yellow/purple, orange/pink, blue/gray) they featured. All shapes were composed of the same set of basic elements (tangrams) and thus had the same total area.

For each stimulus set, we constructed a color cue and a shape cue. The color cue was a circle (a shape not used in any of the stimulus sets) filled with the color shared by two target objects (the other two objects were of a different color). The shape cue had the shape shared by two target objects and was colored white, a color not used in any of the stimulus sets. Each image was created as a bitmap file of $250 \times 250$ pixels and presented on the screen as a square of 6 cm, corresponding to a visual angle of $11.4^{\circ}$ at a distance of 30 cm.
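The reported visual angle can be checked from the stimulus size and the viewing distance. The following is a minimal sketch using the standard formula $\theta = 2\arctan(s / 2d)$; the function name is ours and is not taken from the authors' materials.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by an object of a given size
    viewed head-on from a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# A 6 cm square viewed from 30 cm, as described in the text.
angle = visual_angle_deg(6.0, 30.0)
print(round(angle, 1))  # 11.4
```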

Task and conditions

We used a matching-to-sample task: in each trial, participants were presented with a cue (a sample), after which they had to choose, between two objects (comparison stimuli), the one associated with the cue. The goal of the task was to learn the associations between each cue and its corresponding objects in each stimulus set. Each cue was always associated with two target objects (out of the 4 possible for a given stimulus set).

There were three conditions (see Fig. 1C). In the positive condition, the object-cue associations were iconic: the associated object shared one visual dimension with the cue (e.g., the yellow circle cue was associated with the two yellow objects of the image set). In the negative condition, the cues were associated with the complement set of iconically related objects (e.g., the yellow circle cue was associated with the two non-yellow, purple, objects of the image set). In the arbitrary condition, the form of the cue could not be linked to the set of objects it was associated with, but the coherence of the associated objects was preserved (e.g., the yellow circle cue was associated with the two shuriken shapes: the yellow shuriken and the purple shuriken).

The three conditions were presented in a different order to three groups of participants, such that each condition was presented first, second, or third across the three groups. The stimulus sets were presented in the same order to all participants (so that they would be matched with different conditions across groups).

Trials could be of 3 different types depending on how the target object was iconically related to the cue. Some trials were positive, in the sense that the cue shared a dimension with the rewarded response and not with the distractor (like all trials in the positive condition, by construction of the cue; and 1/4 of the trials in the arbitrary condition); some trials were negative, in the sense that the cue shared a dimension with the distractor and not with the rewarded response (like all trials in the negative condition; and 1/4 of the trials in the arbitrary condition); and some trials were neutral, in the sense that the cue shared a dimension with both or neither of the two proposed objects (the remaining half of the trials in the arbitrary condition).
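The three conditions and the three trial types can be made concrete with a small enumeration. This is only an illustrative sketch for the color cue of one stimulus set: the object encoding and all function names are hypothetical, and the arbitrary condition is modeled with the example given in the text (yellow cue associated with the two shurikens).

```python
from itertools import product

COLORS = ("yellow", "purple")
SHAPES = ("shuriken", "letterA")
OBJECTS = list(product(COLORS, SHAPES))  # the 4 objects of one stimulus set

def targets(condition, cue_color="yellow"):
    """Objects rewarded for a color cue under each condition."""
    if condition == "positive":   # iconic: same color as the cue
        return [o for o in OBJECTS if o[0] == cue_color]
    if condition == "negative":   # complement set: any other color
        return [o for o in OBJECTS if o[0] != cue_color]
    if condition == "arbitrary":  # coherent in shape, unrelated to the cue's color
        return [o for o in OBJECTS if o[1] == "shuriken"]
    raise ValueError(condition)

def trial_type(cue_color, target, distractor):
    """Classify a trial by which of the two options shares the cue's color."""
    t_match = target[0] == cue_color
    d_match = distractor[0] == cue_color
    if t_match and not d_match:
        return "positive"
    if d_match and not t_match:
        return "negative"
    return "neutral"  # both options match the cue, or neither does

def trial_type_counts(condition, cue_color="yellow"):
    """Count trial types over all target-distractor pairs for one cue."""
    tgt = targets(condition, cue_color)
    dis = [o for o in OBJECTS if o not in tgt]
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for t, d in product(tgt, dis):
        counts[trial_type(cue_color, t, d)] += 1
    return counts

for cond in ("positive", "negative", "arbitrary"):
    print(cond, trial_type_counts(cond))
```

Under these assumptions, the four target-distractor pairs are all positive in the positive condition, all negative in the negative condition, and split 1/1/2 (positive/negative/neutral) in the arbitrary condition, matching the proportions stated above.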

Procedure and learning criteria

Once a participant entered a test booth, they were immediately recognized by the test system through their RFID tag. As shown in Fig. 1A, a trial started with the presentation of a cue centered in the middle of the screen. Once participants touched the cue picture, two comparison stimuli appeared, one on each side of the screen. Baboons learned the associations between the cue and one of the comparison stimuli through positive reinforcement: touching the correct stimulus (as defined by the condition) cleared the screen and delivered a food reward. Touching the incorrect stimulus triggered a 3-s timeout indicated by a green screen. Participants were allowed a maximum of 5 s to respond. Aborted trials were not recorded and were presented again on the next trial. The inter-trial interval was set to 3 s.

Within each condition, items were presented as a succession of blocks of 16 trials, randomized within blocks. These 16 trials cover all possible combinations of target-distractor pairs ($2 \times 2$; see Fig. 1C) for each cue (2 in total), with the target appearing once on the right and once on the left (2), i.e., $2 \times 2 \times 2 \times 2 = 16$ trials.
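The block structure can be sketched as a full crossing of cues, target-distractor pairs, and target positions. The example below is a hedged illustration, not the authors' experiment code: the stimulus names and the `ASSOCIATIONS` table are hypothetical and encode one stimulus set in the positive condition.

```python
import random
from itertools import product

# Hypothetical encoding: each cue maps to (targets, distractors).
ASSOCIATIONS = {
    "yellow_circle": (["yellow_shuriken", "yellow_A"],
                      ["purple_shuriken", "purple_A"]),
    "white_shuriken": (["yellow_shuriken", "purple_shuriken"],
                       ["yellow_A", "purple_A"]),
}

def make_block(associations, rng):
    """Cross every cue with its target-distractor pairs and both target
    positions, then shuffle the resulting trials within the block."""
    trials = [
        {"cue": cue, "target": t, "distractor": d, "target_side": side}
        for cue, (tgts, dists) in associations.items()
        for t, d, side in product(tgts, dists, ("left", "right"))
    ]
    rng.shuffle(trials)
    return trials

block = make_block(ASSOCIATIONS, random.Random(0))
print(len(block))  # 2 cues x (2 x 2) pairs x 2 sides = 16
```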

A condition was considered learned when participants made no more than three errors per block for two consecutive blocks (a general accuracy criterion) and no more than 1 error per triplet of cue and comparison stimuli (to ensure that participants’ errors were not systematic). A condition also ended when participants reached a maximum of 4000 trials ($n = 3$ across all participants, all in the arbitrary condition). Once either criterion was reached, the experiment progressed to the next condition.
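One possible reading of this stopping rule is sketched below. Two points are assumptions on our part rather than details given in the text: the per-triplet error limit is checked within each of the two criterion blocks, and trials are stored as dictionaries with hypothetical keys `cue`, `target`, `distractor`, and `correct`.

```python
from collections import Counter

MAX_TRIALS = 4000
BLOCK_SIZE = 16

def block_ok(block, max_errors=3, max_errors_per_triplet=1):
    """A block passes if it has few errors overall and no triplet
    (cue, target, distractor) accounts for more than one of them."""
    errors = [t for t in block if not t["correct"]]
    if len(errors) > max_errors:
        return False
    per_triplet = Counter((t["cue"], t["target"], t["distractor"]) for t in errors)
    return all(n <= max_errors_per_triplet for n in per_triplet.values())

def condition_finished(blocks):
    """True once the last two blocks both pass, or the trial cap is reached."""
    if len(blocks) * BLOCK_SIZE >= MAX_TRIALS:
        return True
    return len(blocks) >= 2 and block_ok(blocks[-1]) and block_ok(blocks[-2])

def demo_block(wrong):
    """Build a 16-trial block; `wrong` holds the indices of incorrect trials.
    Trials 2k and 2k+1 belong to the same triplet (left/right versions)."""
    return [
        {"cue": i // 8, "target": (i // 4) % 2, "distractor": (i // 2) % 2,
         "correct": i not in wrong}
        for i in range(BLOCK_SIZE)
    ]

# Three errors spread over three triplets, then a perfect block: criterion met.
print(condition_finished([demo_block({0, 5, 9}), demo_block(set())]))
# Two errors on the same triplet look systematic: criterion not met.
print(condition_finished([demo_block({0, 1}), demo_block(set())]))
```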

Inclusion criteria

Because participants have different levels of expertise with the matching-to-sample task, which could produce significant outliers in learning time, we decided a priori to remove outliers using a repeated Grubbs test³⁴ applied in each condition at an alpha level of 0.05. In total, 4 condition blocks were removed: the negative and positive condition blocks for one participant, and the positive condition block for two more participants. All other condition blocks for which the learning criteria were reached were included in the analysis. Including the excluded condition blocks in the analysis did not change the pattern of results. Of the 24 participants that completed at least one condition, 17 participants completed the 3 conditions (7 in the order arbitrary-positive-negative, 6 in the order negative-arbitrary-positive, and 4 in the order positive-negative-arbitrary), 3 completed 2 conditions, and 4 completed a single condition. This left us with 18 participants in the positive condition, 21 in the negative condition, and 22 in the arbitrary condition. Participants took on average 873 trials to reach the learning criterion across conditions ($\text {SE} = 127$; $\min = 64$; $\max = 4000$).

Analyses

Preliminary analysis (early iconicity bias)

To measure whether participants were more likely than chance to choose an iconic response, we calculated an iconicity index for each participant and each condition separately. We defined the iconicity index as the proportion of iconic responses in positive or in negative trials (i.e., in trials where an iconic response is possible) minus the chance level of choosing an iconic response. In the arbitrary condition, the chance level was defined as the participant’s accuracy in neutral trials, to account for individual learning performance. In the positive and the negative conditions, however, the chance level was 0.5, since there were no other trial types in these conditions that could be used to better estimate the chance level taking into account participants’ performance at the task. To give a concrete example, suppose that in the arbitrary condition the proportion of iconic responses is 0.6 in positive trials and 0.5 in negative trials, and that the participant’s performance in neutral trials is 0.55. The iconicity index is thus $0.6 - 0.55 = 0.05$ for positive trials and $0.5 - 0.55 = -0.05$ for negative trials. That is, the participant was more likely than chance to choose an iconic response in positive trials (iconicity index greater than 0) but less likely than chance to do so in negative trials (iconicity index less than 0). We modeled this iconicity index using a mixed model specified as:
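The definition of the index and the worked example can be reproduced in a few lines. This is a minimal sketch with hypothetical function names; it is not the authors' analysis script, which is available at the OSF link given under Data availability.

```python
def proportion(responses):
    """Proportion of True values in a list of booleans."""
    return sum(responses) / len(responses)

def iconicity_index(p_iconic: float, chance: float = 0.5) -> float:
    """Proportion of iconic responses minus the chance level.

    chance defaults to 0.5 (positive and negative conditions); in the
    arbitrary condition, pass the participant's accuracy on neutral trials.
    """
    return p_iconic - chance

# Worked example from the text (arbitrary condition).
neutral_accuracy = 0.55
idx_positive = iconicity_index(0.6, chance=neutral_accuracy)
idx_negative = iconicity_index(0.5, chance=neutral_accuracy)
print(round(idx_positive, 2), round(idx_negative, 2))  # 0.05 -0.05

# The same index computed from raw (made-up) choices: True = iconic response.
choices = [True, True, True, False, False]
print(round(iconicity_index(proportion(choices), chance=neutral_accuracy), 2))
```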

$$\texttt{iconicityIndex} \sim \texttt{trialType + (1|participant)}$$

using the lme4 package³⁵ in R. Because the model was singular, we built on recent advances in Bayesian statistics (using the brms package in R³⁶) to model our by-participant estimates³⁷. We tested whether the intercept for each level of trialType (positive; negative) was significant. For each parameter, we report estimates (B), estimated error (EE), and the 95% credible interval (CI). If zero lies outside the credible interval, then we conclude there is sufficient evidence to suggest the estimate is different from zero.

Main analysis (compositionality effect)

To quantify the ease with which different types of associations (positive, negative, arbitrary) are learned, we modeled the number of blocks (16 trials) needed to reach the learning criterion using a mixed model specified as:

$$\texttt{Nblock} \sim \texttt{condition + (1|participant)}$$

As can be observed in Fig. 2B, learning times (in number of blocks) were not normally distributed but followed a log-normal distribution. Because such a distribution is not supported by standard frequentist mixed model analysis methods, and because the number of observations per participant was not sufficient to produce robust participant-level estimates in the frequentist approach, we again used Bayesian statistics to fit a log-normal distribution to the data, using individual variation across participants to inform our by-participant estimates.

Results

Preliminary result: early iconicity bias

In the positive condition, all trials are iconic (the cue and target share a visual dimension). Our first analysis asked whether participants have an initial bias towards such iconic responses, irrespective of the global condition, i.e., of the actual cue-object associations being reinforced. For each participant and condition, we focused on the first 10% of the trials that the participant needed to successfully complete that condition ($\text {mean} = 94 \text { trials}$, $\text {SE} = 12$, $\min = 16$, $\max = 400$). As illustrated in Fig. 2A, we found that participants initially chose the iconic response at rates greater than chance both in positive trials, where the iconic response is the rewarded response ($B = 0.11$, $EE = 0.04$, $CI = [0.05, 0.17]$), and in negative trials ($B = 0.15$, $EE = 0.03$, $CI = [0.08, 0.20]$), where the iconic response is not the rewarded response, thus resulting in below-chance accuracy in the negative condition during the initial blocks of trials.

Main result: compositionality effect

Critical to our question is the ease with which the different conditions (positive, negative, arbitrary) are learned. As shown in Fig. 2B, the positive condition ($M = 17$ blocks, $SE = 2$) was learned faster than the negative condition ($M = 53$ blocks; $SE = 10, B = 0.83, EE = 0.30, CI = [0.34, 1.32]$) and the arbitrary condition ($M = 87, SE = 18, B = 1.34, EE = 0.30, CI = [0.83, 1.84]$). Crucially, participants were faster to learn the negative condition than the arbitrary condition ($B = 0.51, EE = 0.29, CI = [0.04, 0.99]$), despite the initial disadvantage for the former due to the early iconicity bias.
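To give a feel for the size of these effects, the sketch below assumes that the reported estimates $B$ are differences on the log scale of the log-normal model, in which case $\exp(B)$ can be read as a ratio of typical (median) learning times. That reading is our assumption, not a statement from the analysis script. The code also checks that the three contrasts are mutually consistent ($0.83 + 0.51 = 1.34$).

```python
import math

# Reported estimates, keyed as (slower condition, faster condition).
B = {
    ("negative", "positive"): 0.83,
    ("arbitrary", "positive"): 1.34,
    ("arbitrary", "negative"): 0.51,
}

# Assumption: B is on the log scale, so exp(B) is a multiplicative ratio.
ratios = {pair: math.exp(b) for pair, b in B.items()}
for (slower, faster), r in ratios.items():
    print(f"{slower} vs {faster}: about {r:.2f} times as many blocks")

# Consistency check: the two smaller contrasts add up to the largest one.
additive = math.isclose(
    B[("negative", "positive")] + B[("arbitrary", "negative")],
    B[("arbitrary", "positive")],
)
print(additive)
```

Under this reading, the arbitrary condition would take roughly 1.7 times as many blocks as the negative condition, and the negative condition roughly 2.3 times as many as the positive condition.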

Overall then, baboons have an early bias towards positive (iconic) responses, and the positive condition is eventually learned the fastest. This bias leads participants to initially perform below chance in negative trials, where the iconic response is not the target response. But crucially this is quickly overcome in the condition where all trials are negative, eventually leading participants to perform better in the negative condition than in the arbitrary condition.

Discussion

Baboons were able to learn negative associations: they were able to quickly match a ‘yellow circle’ cue with non-yellow objects. These results are not due to “inhibition” or “reversal learning”, as in studies in which animals typically learn an association, and then later learn to not choose the matching category³⁸. Unlike these studies, the negative condition was not preceded by a corresponding positive condition using the same cue-object association that would serve as the basis of reversal learning. To be sure, we observed no effect of order, to the point that the effects are visible even when we restrict the analysis to the first condition a participant would see (baboons learned faster in the negative condition than in the arbitrary condition, $B = 1.29, EE = 0.46, CI = [0.51, 2.05]$, see details in the online analysis script). Finally, the comparison with the arbitrary condition, in which the same set of responses were taught but with different cues, shows most directly that it is the (negative) association between cue and referents that helped learning.

There are, however, three alternative interpretations of our results. First, it could be that baboons were faster to learn in the negative condition than in the arbitrary condition because they focused on the cue feature that is not iconic, i.e., not represented in the comparison stimuli. For instance, when presented with a yellow circle cue, they would ignore the color feature and concentrate on the shape feature, therefore conferring an advantage to the negative condition (where circle would be an arbitrary cue associated with non-yellow objects) compared to the arbitrary condition (where circle, a shape, would be in competition with the target object set, i.e., shuriken shapes). This would predict no advantage for the positive condition (where circle would also be an arbitrary cue, associated with yellow objects) over the negative condition. This is, however, not what we find: the positive condition is learned the fastest, suggesting that baboons tend to focus on the cue feature that matches rather than the one that discriminates.

The second alternative interpretation of our results is that baboons learned faster in the negative condition compared to the arbitrary condition not as a result of our experimental manipulation but as a result of the extra penalties they receive in the negative condition. Indeed, because baboons start with an iconicity bias, they initially receive more negative feedback in the negative condition compared to the arbitrary condition. There are, however, two important points that make that interpretation unlikely. First, it is not always the case that negative feedback is more efficient than positive feedback for learning in baboons³⁹, and second, even assuming it is, there is also more ground to cover in terms of learning in the negative condition compared to the arbitrary condition because of that initial iconicity bias.

The third alternative interpretation of our results is that baboons, instead of learning a negative representation (‘not yellow’), rather learned to associate a cue with a simple property. In our experiment there were only two possible colors occurring in a given condition, such that the negative concept ‘not yellow’ corresponded to the simple and “positive” category of purple objects. Hence participants in the negative condition could learn to associate the yellow cue with purple objects rather than with non-yellow objects. Importantly, however, the cue-object associations were similarly simple in the arbitrary condition: the yellow cue was also associated with a single property (all shuriken objects), and this condition was harder to learn than the negative condition. This suggests that it is the negative correspondence between cue and category that helped learning. Yet, it remains possible that the yellow cue made the color dimension salient, that is, prompted participants to pay more attention to colors, such that participants were faster to learn in the negative condition (where the target objects share the same color) compared to the arbitrary condition (where the target objects share the same shape). Experiment 2 addresses this issue by asking whether baboons’ response to a complex stimulus reflects their responses to its component parts, rather than a holistic response to the combination as a single, distinct meaningful unit; that is, whether they demonstrate a capacity for compositionality.

Experiment 2

We explored whether baboons can associate a negation meaning with a cue, thereby showing an ability to accommodate a functional meaning for a visual cue and to integrate this logical meaning in a compositional manner. We used a task similar to the one described above.

Baboons ($n = 6$) went through the following stages (see also Fig. 3). At stage A1, they learned to associate atomic cues (a U shape and a wind turbine) with an object that is the same as the cue (similar to the positive condition of Experiment 1). At stage A2, the baboons were presented with the same atomic cues, and also with the atomic cues augmented with a “negation morpheme” (complex cues, e.g., a U shape surrounded by 4 crosses), for which the reward was reversed (i.e., effectively making the crosses a negation marker). At stage A3, baboons were exposed to a block similar to A2, with atomic and complex cues, but implemented with a pair of new atomic and complex cues.

During a second set of stages, baboons went through a similar phase, but objects varied not by shape but by color. They went through stages B1 (cue/responses variation) and B2 (negation) entirely analogous to A’s. Crucially, at stage B3, when a new type of cue was introduced (i.e., a new color), the meanings of atomic cues and complex cues were reversed. If, across all preceding stages, participants learned the meaning of atomic cues and how that meaning is systematically modified by the negation morpheme (i.e., if they show a capacity for compositionality), they should find the last stage B3, in which the negation regularity is violated, harder than its mirror image A3.
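The logic of this design can be illustrated by treating the negation marker as a function applied to the meaning of an atomic cue. The sketch below is our own schematic rendering, not the authors' implementation: meanings are modeled as sets of objects, the two-object domains are an assumption, and the color names are placeholders.

```python
from typing import FrozenSet

Meaning = FrozenSet[str]

def atomic(cue: str) -> Meaning:
    """Iconic meaning: an atomic cue refers to the object that looks like it."""
    return frozenset({cue})

def neg(meaning: Meaning, domain: Meaning) -> Meaning:
    """Negation marker (the crosses): maps a meaning to its complement set."""
    return domain - meaning

def interpret(cue: str, marked: bool, domain: Meaning) -> Meaning:
    """Compositional interpretation: apply neg to the atomic meaning if marked."""
    base = atomic(cue)
    return neg(base, domain) if marked else base

def rewarded(cue: str, marked: bool, domain: Meaning,
             reversed_stage: bool = False) -> Meaning:
    """Rewarded set at a stage; in a reversed stage (like B3), plain cues are
    rewarded as if marked and marked cues as if plain."""
    return interpret(cue, marked != reversed_stage, domain)

shapes = frozenset({"U_shape", "wind_turbine"})
print(sorted(interpret("U_shape", marked=False, domain=shapes)))  # ['U_shape']
print(sorted(interpret("U_shape", marked=True, domain=shapes)))   # ['wind_turbine']

# Placeholder color names for the new cue pair introduced at B3.
colors = frozenset({"color_1", "color_2"})
predicted = interpret("color_1", marked=True, domain=colors)
actual = rewarded("color_1", marked=True, domain=colors, reversed_stage=True)
print(predicted == actual)  # False: the learned rule mispredicts at B3
```

A learner who has acquired `neg` as a reusable operation predicts the wrong response at B3, which is why that stage is expected to be harder than A3.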