Macaque monkeys learn by observation in the ghost display condition in the object-in-place task with differential reward to the observer

Observational learning has been investigated in monkeys mainly using conspecifics or humans as models. Some studies have attempted to clarify the role of the social agent and to test whether non-human primates can learn from observing a non-social agent, a setting usually referred to as a “ghost display” condition, but they have reported conflicting results. To address this question, we trained three rhesus monkeys in an object-in-place task consisting of five successive problems, each composed of two objects (one rewarded and one unrewarded) and presented six times (runs). Three learning conditions were tested. In the individual learning condition, the monkeys performed the first run themselves, learned from it, and improved their performance in the following runs. In the social and non-social learning conditions, they observed a human model or a computer, respectively, performing the first run and learned by observing the agent's successes and errors. In all three conditions, the monkeys themselves received the reward, and only after correct choices. One-trial learning occurred in all three conditions. The monkeys performed above chance in the second run in all conditions, providing evidence of non-social observational learning with differential reward in macaque monkeys using a “ghost display” condition in a cognitive task.

Performance in the first and second runs for each monkey (related to Figs. 2 and 3). The 1st Run column shows the number of correct and error trials in each condition, out of a total of 300 trials (5 trials × 60 sessions). The 2nd Run columns show the proportions of correct and error trials, divided into two groups: Corrects in the 1st Run and Errors in the 1st Run. The numbers in the 2nd Run - Corrects columns are reported as percentages of correct trials in the scatterplot of Fig. 3. Abbreviations: MA, Monkey Alone; HI, Human Interaction; CI, Computer Interaction.

Figure S1. Time course of task performance in the first run in the three experimental conditions over the sixty sessions (related to Fig. 2). In the MA condition, the first run was performed by the monkey; in the HI condition, by the human agent; and in the CI condition, automatically by the computer. Bar graphs represent the distribution of the percentage of correct responses in the first run in the three conditions. For all three monkeys, the distribution of the percentage of correct trials did not differ significantly between the MA and HI conditions or between the MA and CI conditions (Wilcoxon rank-sum test, p > 0.05). Abbreviations: MA, Monkey Alone; HI, Human Interaction; CI, Computer Interaction.

Figure S2. Time course of the monkey's performance in the second run in the three experimental conditions over the sixty sessions (related to Fig. 2). In 8 out of 9 cases, there was no significant correlation between the monkey's performance and the session number (see Supplementary

Supplementary Method
We computed a Bayesian binomial test to assess, for each monkey in each experimental condition, whether the proportion of correct trials θ in the first and in the second run differed from chance, with H0: θ = 0.5 and H1: θ ≠ 0.5 (Supplementary Table S3). The Bayesian approach allows one to compare the likelihood of the data under both the null and the alternative hypotheses, bypassing some of the major issues in interpreting p values that arise with frequentist approaches (Wagenmakers 2007). The likelihood of the observed data is weighted by a prior distribution, that is, the probability distribution of the parameter θ before the data are observed. We used a uniform, non-informative prior (a Beta distribution with α = β = 1). The Bayes factor is reported as the ratio between the likelihood of the data under H1 and the likelihood of the data under H0 (BF10); BF10 > 1 indicates that the data are more likely under H1 than under H0. We also report 95% credible intervals: given the data and the prior, there is a 95% probability that the true proportion of correct trials lies between the lower and upper bounds.

We tested the difference in the proportion of correct trials in the second run using a Bayesian Contingency Tables test (Supplementary Table S4). For each monkey and condition, we divided the trials performed in the second run into two groups, according to whether the first-run response to the same problem was correct (Group 1) or incorrect (Group 2). Since the grand total of each contingency table was fixed (300 trials) in every tested condition, we applied a joint multinomial sampling scheme and computed the joint multinomial BF10 comparing the null hypothesis (Group 1 = Group 2) with the alternative hypothesis (Group 1 ≠ Group 2). Furthermore, we computed the log odds ratio (Log OR) and its 95% credible interval to assess the direction and strength of the effect. Log OR > 0 indicates that a correct answer in the second run is more likely in Group 1, and vice versa.
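As an illustration of the binomial analysis, the following Python sketch computes BF10 for k correct trials out of n under the same uniform Beta(1, 1) prior. With this prior the marginal likelihood under H1 has a closed form, so only the standard library is needed. The function names are hypothetical, and the 95% credible interval is approximated here by Monte Carlo sampling from the Beta(k + 1, n - k + 1) posterior, so its bounds may differ slightly from those reported by JASP. The counts in the last two lines are hypothetical and are not data from the study.

```python
import math
import random


def binomial_bf10(k: int, n: int, theta0: float = 0.5) -> float:
    """BF10 for k correct trials out of n.

    H0: theta = theta0 (point null, here chance = 0.5).
    H1: theta ~ Beta(1, 1), i.e. uniform on [0, 1].
    The binomial coefficient is common to both marginal likelihoods and cancels.
    """
    # log p(data | H1) up to the binomial coefficient: B(k + 1, n - k + 1) / B(1, 1), with B(1, 1) = 1
    log_m1 = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
    # log p(data | H0) up to the same coefficient
    log_m0 = k * math.log(theta0) + (n - k) * math.log(1 - theta0)
    return math.exp(log_m1 - log_m0)


def beta_credible_interval(k: int, n: int, level: float = 0.95,
                           draws: int = 20_000, seed: int = 1) -> tuple[float, float]:
    """Equal-tailed credible interval for theta, approximated by Monte Carlo.

    With a Beta(1, 1) prior the posterior is Beta(k + 1, n - k + 1).
    """
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(k + 1, n - k + 1) for _ in range(draws))
    lower = samples[round((1 - level) / 2 * draws)]
    upper = samples[round((1 + level) / 2 * draws) - 1]
    return lower, upper


# Hypothetical counts for illustration only (not data from the study).
print(binomial_bf10(180, 300))
print(beta_credible_interval(180, 300))
```

With a point null at 0.5, a result exactly at chance yields BF10 < 1 (the data favour H0), whereas a proportion well above chance yields a large BF10.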
These analyses were performed using JASP (JASP Team, 2018).
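The contingency-table analysis can be sketched in the same way. The Python code below is a hedged illustration, not JASP's implementation. It computes a joint multinomial BF10 for dependence versus independence following the Gunel and Dickey (1974) formulation, with a Dirichlet prior concentration a = 1 (assumed here to match the default). It also approximates the posterior log odds ratio and its 95% credible interval by Monte Carlo sampling from the Dirichlet posterior of the cell probabilities. Rows are Group 1 and Group 2, and columns are correct and error responses in the second run. All function names are hypothetical.

```python
import math
import random


def log_multivariate_beta(alpha):
    """log D(alpha), where D(alpha) = prod(Gamma(a_i)) / Gamma(sum(a_i))."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def joint_multinomial_bf10(table, a=1.0):
    """Gunel-Dickey style BF10 (dependence vs independence), joint multinomial sampling.

    H1: cell probabilities ~ Dirichlet(a, ..., a).
    H0: cell probability = row prob * column prob, with Dirichlet priors on rows
        (parameters J*a - (J - 1)) and columns (parameters I*a - (I - 1)).
    Multinomial coefficients are common to both hypotheses and cancel.
    """
    n_rows, n_cols = len(table), len(table[0])
    cells = [x for row in table for x in row]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    prior_cells = [a] * (n_rows * n_cols)
    prior_rows = [n_cols * a - (n_cols - 1)] * n_rows
    prior_cols = [n_rows * a - (n_rows - 1)] * n_cols

    # log marginal likelihood under H1 (dependence)
    log_m1 = (log_multivariate_beta([x + a for x in cells])
              - log_multivariate_beta(prior_cells))
    # log marginal likelihood under H0 (independence): row part times column part
    log_m0 = (log_multivariate_beta([r + p for r, p in zip(row_totals, prior_rows)])
              - log_multivariate_beta(prior_rows)
              + log_multivariate_beta([c + p for c, p in zip(col_totals, prior_cols)])
              - log_multivariate_beta(prior_cols))
    return math.exp(log_m1 - log_m0)


def posterior_log_odds_ratio(table, a=1.0, level=0.95, draws=20_000, seed=1):
    """Posterior median and equal-tailed credible interval of the log odds ratio.

    table = [[g1_correct, g1_error], [g2_correct, g2_error]] (run-2 outcomes).
    The posterior of the cell probabilities is Dirichlet(counts + a), sampled as
    Gamma draws; the Dirichlet normalisation cancels in the odds ratio.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        g11, g12, g21, g22 = (rng.gammavariate(x + a, 1.0) for row in table for x in row)
        samples.append(math.log(g11 * g22 / (g12 * g21)))
    samples.sort()
    lower = samples[round((1 - level) / 2 * draws)]
    upper = samples[round((1 + level) / 2 * draws) - 1]
    return samples[draws // 2], lower, upper
```

A 2 × 2 table of observed counts, ordered as in the docstring, can be passed directly to both functions. A positive log odds ratio whose credible interval excludes zero corresponds to the "Log OR > 0" case described above.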

Supplementary Discussion
The aim of our work was to test whether macaques could learn from what we called a "non-social" agent. We contrast our results with the negative finding of Subiaul and colleagues (2004), who failed to find any evidence of learning in a ghost display condition in macaque monkeys. We hypothesized that an important difference between the social observational learning condition and the ghost display condition in their study might be that, in the ghost display condition, the monkeys did not associate the perceived action with the reward. Indeed, in their social observational learning condition the other monkey obtained the reward, whereas in their ghost display condition there was no reward at all. For this reason, we designed two comparable conditions in which the reward was delivered to the monkey in both cases. However, we are aware of the risk that our observational learning condition could amount to a Pavlovian-to-instrumental transfer (PIT), with a Pavlovian first trial and an instrumental second trial. We see three main arguments against this interpretation.
First, the Pavlovian component of PIT would require classical conditioning. Some previous work has shown that a Pavlovian association can be formed within a few trials (Rescorla 1988), but most paradigms used in the literature include several sessions of classical conditioning before testing the transfer to instrumental conditioning. Moreover, PIT is usually measured as an increase in instrumental performance, not as the capacity to choose between two objects.
The second argument concerns the task requirement of making a choice. In our versions of the object-in-place task, the human partner or the computer made correct and incorrect choices in equal proportions in the first run. Consequently, in 50% of the cases the monkey had to apply a lose-shift strategy, shifting from the incorrect object chosen by the other agent in the first run to the alternative object in the second run. Because in these cases the animal was exposed to neither the positive feedback nor the reward, a simple Pavlovian conditioning effect can be ruled out. Moreover, the negative feedback was presented over the chosen object, not over the correct one. In the case of incorrect choices, the correct (unchosen) object was therefore not associated with any feedback, precluding any Pavlovian conditioning effect. In addition, it was precisely the errors made by the human or the computer that promoted the stronger learning effect in the observation conditions, although the monkey received the reward after observing a correct choice but not after observing an incorrect one. This indicates that our monkeys were able to link, by observation, the action made by the human or the computer with a positive (reward) or negative (absence of reward) outcome.
Finally, the third argument concerns the particular nature of one-trial learning. This kind of learning has been shown to rely on the activity of high-level prefrontal areas such as the frontal pole (FP). Indeed, in monkeys, lesions of the FP cause a specific impairment in the learning rate in the object-in-place (OIP) task (Boschin 2015), with loss of the one-trial learning effect. Moreover, this fast learning seems to depend on prefrontal processes that integrate all the information about the different features of a scene. Such processes require high-level integration and differ from Pavlovian conditioning.

Bibliography