Wild primates copy higher-ranked individuals in a social transmission experiment

Little is known about how multiple social learning strategies interact and how organisms integrate both individual and social information. Here we combine, in a wild primate, an open diffusion experiment with a modeling approach: Network-Based Diffusion Analysis using a dynamic observation network. The vervet monkeys we study were not provided with a trained model; instead they had access to eight foraging boxes that could be opened in either of two ways. We report that individuals socially learn the techniques they observe in others. After having learnt one option, individuals are 31x more likely to subsequently asocially learn the other option than individuals naïve to both options. We discover evidence of a rank transmission bias favoring learning from higher-ranked individuals, with no evidence for age, sex or kin bias. This fine-grained analysis highlights a rank transmission bias in a field experiment mimicking the diffusion of a behavioral innovation.


Individual option preference
In a more detailed examination of the data, we note that, in Kubu, two of the 12 individuals learnt the pull option first (Table 1).

Model description
Network-based diffusion analysis (NBDA) infers social transmission of novel behaviour if the pattern of its spread follows a social network, which is taken to represent opportunities to learn from others [S3]. We used the order of acquisition (OADA) variant of NBDA [S4], which takes as data only the order in which individuals acquire the target behaviour and not the times of acquisition. NBDA has been expanded in a number of ways, which we utilise here.
First, one can include a dynamic network that changes over time [S5]. One type of dynamic network that can be used is a record of who has observed whom prior to each acquisition event. This offers the most direct measure of opportunities for learning in cases where the target behaviour is only performed at one or more specific locations that can be monitored closely [S5, S6]. In practice, a proxy of observation is used by ascertaining, for each performance of the target behaviour, who had the opportunity to observe. In this case we identified observers as individuals with their head or body oriented in an unobstructed line towards the subject manipulating the box. The use of a proxy for observation is not an inherent problem, since Hoppitt [S6] has shown that error in identifying observers does not increase the risk of a spurious social transmission effect but acts to make estimates of the strength of social learning conservative.
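The idea of a dynamic observation network can be illustrated with a minimal Python sketch. The record format, the individual labels and the function name `observation_counts_before` are hypothetical and chosen only for illustration; the sketch simply counts, for each observer and demonstrator pair, the observations recorded before a given acquisition event.

```python
from collections import defaultdict

def observation_counts_before(events, cutoff_time):
    """Count observations per (observer, demonstrator) pair made strictly
    before cutoff_time, e.g. the time of an acquisition event."""
    counts = defaultdict(int)
    for time, demonstrator, observers in events:
        if time < cutoff_time:
            for observer in observers:
                counts[(observer, demonstrator)] += 1
    return dict(counts)

# Hypothetical records: (time, individual manipulating the box,
# individuals oriented in an unobstructed line towards it).
events = [
    (1, "A", ["B", "C"]),
    (2, "A", ["B"]),
    (3, "B", ["C"]),
]

# Network as it stood just before an acquisition event at time 3.
network_at_3 = observation_counts_before(events, cutoff_time=3)
print(network_at_3)
```

Recomputing the counts with a later cutoff gives the network that applies to the next acquisition event, which is what makes the network dynamic.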
A second extension of NBDA is to include multiple options for solving a task [S4]. In a normal OADA, the parameter values are optimised to maximise the power of the model to predict which individual will be the next to learn the target behaviour at each acquisition event. If we extend the OADA to multiple options, the parameter values are optimised to predict the combination of which individual will be next to learn and which option they will learn to use (assuming the options are included in the same stratum, see below). Thus, a multi-option NBDA differs from statistical approaches that aim to infer the presence or absence of social learning based on the option used alone [S7]; rather, this information is added to the pattern of spread across the network when quantifying the strength of social transmission. It is also important to note that an NBDA extended to multiple options is not intended to model the role of social learning in the development of a preference for specific options over time, once both have been acquired to the repertoire (cf. experience weighted attraction models, S8). Rather, it is intended to model the acquisition of different behavioural variants to the behavioural repertoire, and to address whether or not their acquisition to the repertoire is independent (see below).
A third extension of NBDA is to include multiple networks representing different pathways of learning [S9]. This enables researchers to test whether the rate of social transmission differs between different pathways. Here we included two different pathways of learning: option-specific (OS) and cross-option (CO) social learning [...] vice versa for pull. In a standard NBDA the s parameter estimates the rate of social transmission per unit connection relative to asocial learning. In our model each network has an associated s parameter, denoted $s_{OS}$ and $s_{CO}$. We fit models with a) a different rate for each pathway ($s_{OS} \neq s_{CO}$); b) option-general social transmission ($s_{OS} = s_{CO}$); c) OS social learning only ($s_{CO} = 0$); d) CO social learning only ($s_{OS} = 0$); and e) asocial learning ($s_{OS} = s_{CO} = 0$), allowing us to test the strength of evidence for each hypothesis (see below).
The order of acquisition (OADA) variant of NBDA [S10] takes as data only the order in which individuals acquire the target behaviour and not the times of acquisition. This has the advantage that it does not make any assumptions about the baseline rate function unlike the time of acquisition variant (TADA) [S10]. Whilst TADA can be easily modified to include increasing or decreasing baseline rates of learning, it seems likely that the rate of learning will have fluctuated with external conditions, e.g. differences in the local conditions at which the task was positioned each day. Consequently, the different groups Kubu and Noha were treated as separate diffusions in different strata in the analysis (with different unspecified baseline functions) since they were subject to different daily local conditions [S4]. However, we included diffusion of the lift and pull options in the same stratum since they are likely to have been affected in the same way by such factors. This also gives us greater power to detect OS social transmission since the model is sensitive to the order across options, e.g. if one option diffuses through a group first, followed by the other option, it adds to the evidence that individuals were learning by OS social learning. We controlled for the possibility that such a pattern might be caused by vervet monkeys learning one option more easily by asocial learning by including option as a factor influencing the rate of asocial learning.
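To make the logic of an order-of-acquisition model with both options in one stratum concrete, here is a minimal Python sketch. It is a simplification under stated assumptions, not the fitted model: it omits all individual-level variables, fixes the asocial baseline at 1, and uses hypothetical observation counts and function names. Each acquisition event contributes the probability that the observed (individual, option) pair is the next to be acquired among all pairs not yet acquired.

```python
import math

def relative_rate(obs, option, s_os, s_co):
    """Relative rate of acquiring `option`; the asocial baseline is fixed at 1.
    obs holds one individual's observation counts, e.g. {"lift": 3, "pull": 0}."""
    other = "pull" if option == "lift" else "lift"
    return 1.0 + s_os * obs.get(option, 0) + s_co * obs.get(other, 0)

def event_log_likelihood(learner, learned_option, candidates, s_os, s_co):
    """Log probability that (learner, learned_option) is the next acquisition.
    candidates maps each (individual, option) pair not yet acquired to that
    individual's observation counts."""
    rates = {key: relative_rate(obs, key[1], s_os, s_co)
             for key, obs in candidates.items()}
    return math.log(rates[(learner, learned_option)] / sum(rates.values()))

# Hypothetical state: A has observed three lifts, B has observed nothing.
obs_a = {"lift": 3, "pull": 0}
obs_b = {"lift": 0, "pull": 0}
candidates = {
    ("A", "lift"): obs_a, ("A", "pull"): obs_a,
    ("B", "lift"): obs_b, ("B", "pull"): obs_b,
}

# OS-only learning (s_CO = 0) versus purely asocial learning (both zero).
p_os_only = math.exp(event_log_likelihood("A", "lift", candidates, 1.0, 0.0))
p_asocial = math.exp(event_log_likelihood("A", "lift", candidates, 0.0, 0.0))
print(p_os_only, p_asocial)
```

Because lift and pull compete within the same set of candidates, the order in which options are acquired carries information about $s_{OS}$ and $s_{CO}$. Summing such terms over events, separately within each group, would give the log-likelihood to be maximised.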
We also wanted to allow for the possibility that vervet monkeys might generalize their learning, i.e. learning to solve the task using one option might increase the rate at which they learn to solve it using the other option (see Table 2). If a generalization effect were operating in concert with OS social transmission, we would expect a different pattern of diffusion to option-general social transmission ($s_{OS} = s_{CO}$). For example, if we had option-general social transmission, individuals that had observed many lifts would be likely to be the next to solve, using either the lift or the pull option. If we had OS social transmission paired with a generalization of learning, individuals that had observed many lifts would be likely to be the next to solve specifically using the lift option and, once they had done so, would then be likely to be the next to solve using the pull option. Thus different orders of events support different combinations of learning processes.
There were a number of other individual-level variables (ILVs) that we included as potentially having an effect on the rate of asocial and/or social learning: sex, age category (adult versus non-adult) and rank (quantified using the I&SI method [S11]). We used the "unconstrained" model to include the effects of ILVs, which independently estimates the effects each ILV has on asocial and social learning [S4]. All variables were standardized so they were centred on zero, with a range of 1. Since s parameters are estimated relative to the baseline asocial learning rate (when all ILVs = 0), this transformation means they are estimated relative to an individual who is central with respect to these three ILVs. Thus, we had a total of 5 ILVs potentially having an effect on asocial learning and 3 ILVs potentially having an effect on social learning (Supplementary Table 2).
The full model can be expressed as follows: [...]
We used a multi-model inference approach using Akaike's Information Criterion corrected for sample size (AICc) [S12] to obtain support for each hypothesis: a) a different rate for each pathway ($s_{OS} \neq s_{CO}$); b) option-general social transmission ($s_{OS} = s_{CO}$); c) OS social learning only ($s_{CO} = 0$); d) CO social learning only ($s_{OS} = 0$); and e) asocial learning ($s_{OS} = s_{CO} = 0$). For each of a-d we fit models with every combination of 5 ILVs affecting asocial learning and 3 ILVs affecting social learning, resulting in 256 models for each set. For e) asocial learning, the parameters for ILV effects on social learning have no effect, so they were excluded, resulting in only 32 models. We calculated the total Akaike weight as a measure of support for each hypothesis a-d [S13]. Due to the lower number of models in the asocial set (e), we do not use the total Akaike weight as a measure of support for asocial learning; instead we use the 95% confidence intervals for the s parameters to this end (see below).
We calculated model-averaged estimates, unconditional standard errors, and the total Akaike weight for the effect of each ILV on asocial learning and on social learning. In some models, standard errors could not be derived. When calculating the unconditional standard error, the standard errors for these models were replaced with an Akaike-weighted mean across models with a standard error, allowing an approximate unconditional standard error to be calculated.
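The model counts and the total Akaike weight per hypothesis can be illustrated with a short Python sketch. The AICc values and hypothesis labels below are hypothetical and serve only to show the arithmetic; the weight formula is the standard one, in which each model's weight is proportional to exp(-0.5 x its AICc difference from the best model).

```python
import math

def akaike_weights(aicc_values):
    """Standard Akaike weights: exp(-0.5 * delta AICc), normalised to sum to 1."""
    best = min(aicc_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]

# Every combination of 5 asocial and 3 social ILVs, each either in or out.
models_per_social_set = 2 ** (5 + 3)
models_asocial_set = 2 ** 5

# Hypothetical fitted models: (hypothesis label, AICc).
models = [
    ("OS only", 100.0),
    ("OS only", 101.5),
    ("different rates", 102.0),
    ("option-general", 106.0),
]
weights = akaike_weights([aicc for _, aicc in models])

# Total Akaike weight for a hypothesis = sum of weights of its models.
total_support = {}
for (hypothesis, _), w in zip(models, weights):
    total_support[hypothesis] = total_support.get(hypothesis, 0.0) + w
print(models_per_social_set, models_asocial_set, total_support)
```

Summing weights within a set is only comparable across sets that contain the same number of models, which is why the smaller asocial set is assessed separately in the text.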
Standard errors are often a misleading measure of precision for s parameters, since these often have much higher precision for a plausible lower limit than for a plausible upper limit. Consequently, we obtained 95% confidence intervals (CIs) for s parameters using the profile likelihood method. For $s_{OS}$ we used the model with the lowest AICc. Since this model did not contain $s_{CO}$, we added the parameter to the best model in order to get 95% CIs for $s_{CO}$ and for $s_{OS} - s_{CO}$, the latter quantifying the extent to which social learning is option-specific. Since s parameters are difficult to interpret directly, we also obtained an estimate of the number of learning events that are predicted to have occurred by each pathway, corresponding to the estimate for each s parameter and its 95% CI [S6, S14].
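A likelihood-based interval of this kind can be sketched in Python for a deliberately simplified case. The sketch assumes a single s parameter, no ILVs, an asocial baseline fixed at 1, and hypothetical connection values; with only one parameter there are no nuisance parameters to re-maximise over, so the profile likelihood reduces to the likelihood itself. The interval contains every grid value of s whose deviance from the maximum is within the 95% quantile of a chi-square distribution with 1 degree of freedom (about 3.841).

```python
import math

CHI2_95_1DF = 3.841  # 95% quantile, chi-square with 1 degree of freedom

def log_likelihood(s, events):
    """Order-of-acquisition log-likelihood for one s parameter.
    Each event is (learner's connection, connections of all naive individuals,
    learner included)."""
    total = 0.0
    for learner_conn, all_conns in events:
        numerator = 1.0 + s * learner_conn
        denominator = sum(1.0 + s * c for c in all_conns)
        total += math.log(numerator / denominator)
    return total

def likelihood_interval(events, grid):
    """Grid-search estimate of s and the values within the 95% deviance cutoff."""
    lls = [log_likelihood(s, events) for s in grid]
    best_ll = max(lls)
    mle = grid[lls.index(best_ll)]
    inside = [s for s, ll in zip(grid, lls)
              if 2.0 * (best_ll - ll) <= CHI2_95_1DF]
    return mle, min(inside), max(inside)

# Hypothetical diffusion of four acquisition events.
events = [
    (3, [3, 1, 0, 0]),
    (2, [2, 0, 1]),
    (0, [0, 4]),
    (4, [4]),
]
grid = [i * 0.05 for i in range(0, 2001)]  # s from 0 to 100
mle, lower, upper = likelihood_interval(events, grid)
print(mle, lower, upper)
```

In a model with several parameters, each grid value of the focal parameter would instead require re-maximising the likelihood over all the others before applying the same cutoff.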

Validation of method for detecting option-specific social transmission
Hoppitt, Boogert and Laland [S9] showed in a general case that inclusion of potentially confounding variables in an NBDA can statistically control for their effects and prevent spurious social transmission effects being detected. Thus, in principle, if one option is easier to learn asocially this should not result in a spurious OS social transmission effect in the NBDA, since we include a variable accounting for a potential difference in the rate of asocial learning between the two options. However, to be certain that the statistical control is effective in this specific case, we ran simulations in which the observed bias towards lift was assumed to be entirely due to an asocial bias. We simulated the option choice based only on this bias. We retained the same pattern of observations but, in our simulations, these had no effect on option choice. We then fitted the same model that was favoured in the real data (one in which social transmission was option-specific only), and the equivalent model of option-general learning. We also fitted another two models in which an asocial bias for one or other option was added, to statistically control for its effects as we did in the real analysis. We then recorded the AICc for the best option-general model minus the AICc for the best option-specific model as a measure of evidence for an option-specific effect in the simulated dataset.
We then repeated this process 1000 times and calculated the proportion of simulations in which the evidence for option-specific social transmission exceeded that observed in the real data. We found it to be 0.0475, confirming that our finding of option-specific social transmission is unlikely to be a result of a preference for one option.
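The final step of this validation, comparing the simulated evidence with the observed evidence, amounts to a simple proportion. A minimal Python sketch follows; the AICc differences are hypothetical placeholders, and the function name is introduced only for illustration.

```python
def proportion_exceeding(simulated_deltas, observed_delta):
    """Proportion of simulated AICc differences (best option-general model
    minus best option-specific model) strictly greater than the observed one."""
    exceeding = sum(1 for delta in simulated_deltas if delta > observed_delta)
    return exceeding / len(simulated_deltas)

# Hypothetical AICc differences from five simulated datasets.
simulated_deltas = [-1.2, 0.4, 3.1, -0.7, 5.6]
observed_delta = 3.0  # hypothetical value for the real data

p_value = proportion_exceeding(simulated_deltas, observed_delta)
print(p_value)
```

A small proportion indicates that an asocial option bias alone rarely produces evidence for option-specific transmission as strong as that observed.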

Additional results
Supplementary Figure 1. Support for different combinations of option-specific (OS) and cross-option (CO) social learning. OS social learning received the most support, followed by models with separate s parameters for OS and CO social learning: in these cases, $s_{OS}$ was estimated to be greater than $s_{CO}$. Overall these results indicate evidence that there was an OS social learning effect.
[...] detecting option-specific social transmission' above.
$ The unconditional standard errors (UCSE) for age and sex effects on social learning are unrealistically high and are being skewed by a model or models with low weight. Consequently, in the main text we use 95% confidence intervals derived using the profile likelihood method to provide a plausible range for s parameters and ILV effects with support >50% (see Table 2, Main Text).

Dynamic observation networks and task exposure
It has been noted that there is a concern with using dynamic observation networks where the target behaviour is performed in a specific location or locations, as here, where the behaviour is directed to the foraging tasks provided [S5, S6]. A recorded observation for an individual i might simply indicate that i was in the area appropriate for performing or learning the behaviour, and consequently was more likely to learn the behaviour in the near future. This might result in a spurious social transmission effect in an NBDA [S6]. A previous study [S5] addressed this by including a variable giving each individual's exposure to the task.
However, in our case this is not necessary. Since the alternative options, 'lift' and 'pull', are performed in the same location, being recorded as an observer of 'lift' indicates that the observer was in the appropriate area for performing both 'lift' and 'pull', and likewise for observers of 'pull'. Therefore, we could only expect a spurious option-general effect. The fact that we find evidence of an option-specific effect rules out the possibility that it is a spurious effect of this kind.

Testing whether the learning generalization effect operated on asocial or social learning
The model described in Supplementary Note 1 assumes that the effect of solving one option on the rate of solving using a second option operates through asocial learning. The model was constructed in this way to reflect the hypothesis being tested, i.e. is the learned behaviour insulated against asocial modification? Strong support is found for this effect, indicating that individuals would rapidly solve using the second option after solving using the first. However, it is premature to conclude that this effect operates by increasing the speed at which the second option is learned by asocial learning, since we have not considered the possibility that the rate of social learning is increased instead of, or as well as, that of asocial learning. Consequently, we re-fitted the best model, replacing the generalization effect on asocial learning with an equivalent effect on social learning. We found that AICc was increased by 6.63 units, indicating that the observed effect is much better explained by an increase in asocial learning rate than by an increase in social learning rate. We also considered a model with effects on both social and asocial learning (AICc increased by 1.79) and a model with the same effect on both asocial and social learning (AICc increased by 2.17), indicating that an effect on asocial learning alone is sufficient to explain the observed statistical pattern.
Overall these results indicate that the rate at which a second option, B, was learned, once an individual had learned a first option, A, was increased regardless of the individual's observational experience of option B. This suggests that learning a first option tended to be rapidly followed by asocial learning of a second option.

Model description
We extended the model described in Supplementary Note 1 above to test for biases in the transmission pathways (Supplementary Table 2). Since there was strong support for OS social learning only (Supplementary Fig. 1), we simplified the model by dropping the CO effect. The biases we tested for were as follows: a) Rank biases. Does the transmission rate from higher to lower ranked vervet monkeys differ from that from lower to higher ranks? b) Sex biases. Does the rate of transmission differ between male and female transmitters? c) Age biases. Does the rate of transmission differ from adults to adults, adults to non-adults, non-adults to adults and non-adults to non-adults? d) Kin biases. Does the rate of transmission differ between kin and non-kin, and between different classes of kin (mother to offspring, offspring to mother, between siblings)?
In a) and b) the transmission pathways are divided into only two mutually exclusive classes. We first describe the procedure to test for rank biases; the same procedure generalizes to b).
We obtained a binary network $h_{ij}$, taking the value 1 when j is a higher rank than i and 0 otherwise, thus representing the transmission pathway from higher to lower ranked vervet monkeys. The complementary network, $1 - h_{ij}$, therefore represented the pathway from lower to higher ranked vervet monkeys. This allowed us to extend the model by replacing the term: [...]
When testing for a kin bias we broke the observation network down into four mutually exclusive pathways: mother to offspring; offspring to mother; sibling to sibling; and non-kin, with the associated parameters $s_{MO}$, $s_{OM}$, $s_{S}$ and $s_{N}$. We then considered three different hypotheses: all pathways different ($s_{MO} \neq s_{OM} \neq s_{S} \neq s_{N}$), kin different to non-kin ($s_{MO} = s_{OM} = s_{S} \neq s_{N}$) and no kin bias ($s_{MO} = s_{OM} = s_{S} = s_{N}$).
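The splitting of an observation network into the two rank pathways described above can be sketched in Python. This is an illustration under assumptions: ranks are coded so that 1 is the highest rank, observation counts are keyed by (observer i, demonstrator j), and all names and values are hypothetical.

```python
def higher_rank_network(rank):
    """h[i][j] = 1 when j is a higher rank than i, else 0.
    Assumes rank 1 is the highest rank (an illustrative coding choice)."""
    individuals = sorted(rank)
    return {i: {j: int(rank[j] < rank[i]) for j in individuals}
            for i in individuals}

def split_by_rank(observations, h):
    """Split observation counts into 'from higher-ranked' and
    'from lower-ranked' pathways using h and its complement 1 - h."""
    from_higher, from_lower = {}, {}
    for (i, j), count in observations.items():
        from_higher[(i, j)] = count * h[i][j]
        from_lower[(i, j)] = count * (1 - h[i][j])
    return from_higher, from_lower

# Hypothetical ranks and observation counts (observer, demonstrator).
rank = {"A": 1, "B": 2, "C": 3}
observations = {("B", "A"): 2, ("A", "C"): 1, ("C", "B"): 3}

h = higher_rank_network(rank)
from_higher, from_lower = split_by_rank(observations, h)
print(from_higher, from_lower)
```

The complement 1 - h also equals 1 on the diagonal, but this has no effect here because individuals are never recorded as observing themselves. Each of the two resulting networks would then receive its own s parameter.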

Additional results
Supplementary [...]
Kin different to non-kin: 21.9
All pathways different (mother to offspring, offspring to mother, sibling to sibling, non-kin): 1.9

Age biases
No bias: 52.8
Older versus (younger or same age): 21.0
Transmission only from older: 16.7
Older versus same age versus younger: 6.7
All pathways different (adult to non-adult, non-adult to adult, adult to adult, non-adult to non-adult): 2.9

The following results are those of the same analyses as above, but run after pooling 'return and lift' (srtli) successes with 'lift' and 'return and pull' (srtpu) successes with 'pull', instead of using 'lift' (sli) and 'pull' (spu) successes alone.
Supplementary Figure 3. Support for different combinations of option-specific (OS) and cross-option (CO) social learning. OS social learning received the most support, followed by models with separate s parameters for OS and CO social learning: in these cases, $s_{OS}$ was estimated to be greater than $s_{CO}$. Overall these results indicate evidence that there was an OS social learning effect.

Supplementary [...]
Kin different to non-kin: 27.9
All pathways different (mother to offspring, offspring to mother, sibling to sibling, non-kin): 5.0

Age biases
No bias: 47.9
Older versus (younger or same age): 31.1
Transmission only from older: 7.5
Older versus same age versus younger: 10.4
All pathways different (adult to non-adult, non-adult to adult, adult to adult, non-adult to non-adult): 3.1

Rank biases
Higher to lower ranks only