Trade-offs between vocal accommodation and individual recognisability in common marmoset vocalizations

Recent studies find increasing evidence for vocal accommodation in nonhuman primates, indicating that this form of vocal learning is more prevalent than previously thought. Convergent vocal accommodation (i.e. becoming more similar to partners) indicates social closeness. At the same time, however, becoming too similar may compromise individual recognisability. This is especially problematic if individual recognisability is an important part of the call function, like in long-distance contact calls. In contrast, in calls with a different function, the trade-off between signalling social closeness and individual recognisability might be less severe. We therefore hypothesized that the extent and consequences of accommodation depend on the function of a given call, and expected (1) more accommodation in calls for which individual identity is less crucial and (2) that individual identity is less compromised in calls that serve mainly to transmit identity compared to calls where individual recognisability is less important. We quantified vocal accommodation in three call types over the process of pair formation in common marmoset monkeys (Callithrix jacchus, n = 20). These three call types have different functions and vary with the degree to which they refer to individual identity of the caller. In accordance with our predictions, we found that animals converged most in close contact calls (trill calls), but less in calls where individual identity is more essential (phee- and food calls). In two out of three call types, the amount of accommodation was predicted by the initial vocal distance. Moreover, accommodation led to a drop in statistical individual recognisability in trill calls, but not in phee calls and food calls. Overall, our study shows that patterns of vocal accommodation vary between call types with different functions, suggestive of trade-offs between signalling social closeness and individual recognisability in marmoset vocalizations.

Figure 1. Schematic representation of the predicted trade-off between accommodation and individual recognisability. Depending on the initial vocal distance and call type, patterns of accommodation are expected to vary in different pairs, as exemplified by pairs 1-3. Orange arrows indicate the amount and direction of accommodation in a pair for a call where individual recognition is crucial, green arrows for calls where individual recognition is less important. If animals are very similar (pair 3) or very different (pair 1) prior to pair formation, convergence or divergence might be found in any call type, but to a different degree. If a pair shows intermediate vocal distance (pair 2), the pattern of accommodation might vary depending on the call function. Coloured triangles at the bottom represent the amount of social function (green) or individual recognisability (orange) that can be represented in a call at any given vocal distance (black arrow) between two individuals. If individual recognition in a call is important, the optimal distance should be where individual recognition is still strong (orange arrow and orange shaded area). If individual recognition is less important, the optimal vocal distance is expected where the social function is making up a larger part (green arrow and green shaded area).

www.nature.com/scientificreports/

callers are typically within visual contact, there should be more flexibility to engage in accommodation, which can be achieved at the expense of individual recognisability. A similar argument was previously made in a study on individual baboon distress and contact calls, in which the latter were found to contain a stronger individual signature than the former 18 . Moreover, the authors of this study suggested that a call type with fewer functional constraints might be structurally more free to vary to convey individual identity, which might be true for vocal accommodation as well.
To investigate potential trade-offs between accommodation and individuality, we measured the vocal output of a total of 20 common marmosets in ten newly formed breeding pairs over the process of pair formation for seven to twelve weeks. Several species of marmosets are known to show a high degree of vocal flexibility and a certain degree of vocal learning, including babbling in infants [19][20][21] , the importance of social input for vocal ontogeny, including feedback by parents [22][23][24] , and acoustic differences between colonies or populations 14,[25][26][27] . We analysed three different call types with different social functions (phee calls, trill calls and food calls) that are regularly produced in a within-group (i.e. breeding pair) context. We first investigated the extent of accommodation within each dyad for each call type by quantifying both vocal convergence and divergence. Next, we tested whether the amount of accommodation was correlated with vocal distance prior to pair formation for each call type. Finally, we statistically tested how accommodation impacted the potential recognisability of individuals.
Phee calls are predominantly used as long distance isolation calls when individuals are separated from their mates or social group 28,29 . Phee calls are known to encode individual identity as well as group identity and sex [30][31][32] . They typically elicit answering phee calls from other group members or mates, and animals often engage in turn taking, i.e. calling back and forth over several turns 33 . Common marmosets also use phee calls as vocal territory advertisement 34 , although the social situation seems not necessarily to be encoded in the call 32 . Overall, these functions suggest that individual recognisability is essential in phee calls.
Trill calls are close distance social calls that are often exchanged between social partners in a very relaxed social situation. Wild common marmosets often produce trill calls in situations such as foraging or resting 28 . A study by Liao et al. showed that captive marmosets are closer to their social partner and have a lower heart rate, i.e. are in a more relaxed state, when producing trill calls than when producing phee calls 29 . Since trill calls are given from a close distance, typically even within visual range, we assume that individual recognisability is less essential than in phee calls.
The third call type we looked at was the food call (sometimes also referred to as chirp call) 28 . Food calls are usually produced upon the detection of high value food and often indicate the willingness of the caller to share the food with other group members 35 . Food calls are usually produced in bouts, and are given from variable distances. They seem to be more variable than trill calls and phee calls, and might have some elements that are referential with regard to food type 36 . Food calls typically inform infants and juveniles about the presence of food items, which are subsequently offered to them by the caller 37 , but are also used by pair members (often the male) towards their mate 38 . Food calls and food sharing between adults might facilitate the development of a pair bond 39 , which is why they might be of specific interest in newly bonding animals. As marmosets can be rather dispersed during feeding, recipients may not be in the immediate vicinity of the group member producing food calls, and therefore, a clear signal of individual identity could help receivers to move towards the caller to receive the food. Table 1 provides an overview of all the predictions for the specific call types.

Methods
Subjects. We recorded the vocal behaviour of 20 captive common marmosets over the process of pair formation of newly formed breeding pairs. All animals lived with at least one family member or a former partner until shortly before they were introduced to their new breeding partner. After the animals were introduced to their new partner, they were no longer in acoustic contact with their former family or mate, but could hear other marmoset groups that were housed in the same room. Animals ranged in age from 2 to 9 years, and all individuals were unfamiliar with their new partner before the start of the study. The enclosure of each pair measured 2.4 m in height × 1.5 m in depth × 0.8 m in width and was structured with branches, ropes, tubes and other enrichment material. All animals were fed twice a day (vitamin enriched mush in the morning and a mix of fruits and vegetables around midday) and in addition received different kinds of animal or insect protein and/or gum once or twice a day. Water was always available ad libitum. The animals had regular access to spacious outdoor enclosures as well as to an additional testing room.
Recording procedure. The animals were recorded both before and during pair formation in a variety of situations to elicit a broad range of calls covering a large part of the naturally occurring call spectrum of the marmoset (presentation of food to elicit food calls, recordings with partner to elicit trill calls, recordings when separated from the partner to elicit phee calls). Before pair formation, individuals were recorded on several days over two to three weeks in their home enclosure either with a family member present or after being separated from their family group, as closely in time to pair formation as possible. After pair formation, we recorded the animals on one to three days a week up to 13 weeks after pair formation. We recorded them both in their home enclosure and in an additional, familiar experimental room which was connected to the home enclosure by a system of tubes through which the animals could walk. When recorded in their home enclosure, both animals of the pair were present. When recorded in the additional testing room, animals were either both present or they were separated from each other (either with the other animal still in the room with acoustic contact, or with the other animal back in the home enclosure) for up to five minutes. Both in the home enclosure and the test room, animals were recorded with or without highly preferred food (a mixture of mealworms, cashew seeds and nutcookies). Recording sessions lasted between 20 and 30 min. During the recording, the experimenter was present in the room and pointed the hand held microphone in the direction of the focal animal, which changed every five minutes. The identity of the caller was directly annotated to the recording by the experimenter in real time using the labelling function provided by the AviSoft Recorder software 40 .
Even though we tried to elicit calls from the animals, data recording remained largely opportunistic. Therefore, we do not have all call types of all the individuals over the whole time period. Pairs with fewer than five calls per call type and per point in time were therefore excluded from further analysis, which led to a final sample of 8-9 pairs, depending on the call type.
The study and all the proceedings were reviewed and approved by the Kantonales Veterinärsamt Zürich, licence number ZH223/16 and followed both the ARRIVE guidelines as well as all other important guidelines and regulations.
Recording processing. The recordings were visually inspected in AviSoft Pro 40 and each call was saved as a separate file. We inspected and measured each call with the software Praat 41 and extracted 15 (phee, food call) or 17 (trill) parameters per call using a script by E. F. Briefer & A. G. McElligott 42 . We measured the fundamental frequency (F0) and extracted the frequency at both the beginning and the end of the call, as well as the mean, minimal and maximal F0, the percentage of the call duration for which F0 was at the maximum, the absolute slope of F0, the mean variation of F0 per second, the frequency values at the first, second and third quartiles of energy, the highest frequency of the whole spectrum, the percentage of time this highest frequency is reached, and jitter. For trill calls, we additionally extracted the frequency modulation rate and the frequency modulation extent (see Ref. 14 for a detailed description of the parameters). Calls were excluded from the final sample if there was background noise, if they overlapped with any other call, or if we could not measure the whole call correctly in Praat.

Statistics. Patterns of accommodation.
To quantify convergence and divergence, we calculated the vocal distance between partners before the start of pair formation (bpf) and after pair formation (apf) for each call type (see Table 2 for the specific time after pair formation the apf-calls were recorded per pair and call type). We first performed principal component analyses for each call type and each pair based on the z-transformed values of the measured call parameters and extracted all components with an eigenvalue greater than the 95% quantile value obtained from 10,000 datasets that were randomly generated and equal in sample size and dimensionality to our empirical data (parallel analysis). This led to 3-5 extracted factors depending on pair and call type. For all further analyses, we used the PC-factors extracted by this method.
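The component-retention rule described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names, the toy data, the Jacobi eigenvalue routine and the reduced number of simulated datasets (200 instead of 10,000) are assumptions made for illustration. The sketch also assumes the common variant of parallel analysis in which the k-th observed eigenvalue is compared with the 95% quantile of the k-th eigenvalues obtained from random normal data of the same size.

```python
import math
import random


def correlation_matrix(rows):
    """Correlation matrix of the columns, computed from z-transformed values."""
    n, p = len(rows), len(rows[0])
    cols = []
    for j in range(p):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        cols.append([(x - mean) / sd for x in col])  # z-transform
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
             for j in range(p)] for i in range(p)]


def eigenvalues_symmetric(m, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix (cyclic Jacobi rotations), descending."""
    a = [row[:] for row in m]
    p = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(i + 1, p))
        if off < tol:
            break
        for i in range(p):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki + s * akj, -s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik + s * ajk, -s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def parallel_analysis(rows, n_sim=200, quantile=0.95, seed=0):
    """Count leading components whose eigenvalue exceeds the random-data quantile."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    observed = eigenvalues_symmetric(correlation_matrix(rows))
    sims = []
    for _ in range(n_sim):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        sims.append(eigenvalues_symmetric(correlation_matrix(noise)))
    thresholds = []
    for k in range(p):
        kth = sorted(s[k] for s in sims)
        thresholds.append(kth[min(n_sim - 1, int(quantile * n_sim))])
    n_keep = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        n_keep += 1
    return observed, thresholds, n_keep


# Hypothetical data: 60 calls x 5 acoustic parameters, three of which
# share one latent source of variation and two of which are pure noise.
rng = random.Random(42)
toy_calls = []
for _ in range(60):
    latent = rng.gauss(0, 1)
    toy_calls.append([latent + 0.3 * rng.gauss(0, 1) for _ in range(3)]
                     + [rng.gauss(0, 1) for _ in range(2)])

observed, thresholds, n_keep = parallel_analysis(toy_calls)
print("observed eigenvalues:", [round(v, 2) for v in observed])
print("components retained:", n_keep)
```

In practice the number of simulated datasets would be raised to the 10,000 used in the study, and a numerical library would replace the hand-written eigenvalue routine.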
We calculated the Euclidean distance between each call of the male and each call of the female within a pair based on the extracted PC-factors. It is important to note that, because each call served as a reference for multiple distance measurements (each call was compared to each call of the partner), these distance measurements between partners are not independent, and this non-independence has to be taken into account in the analysis. To estimate whether the vocal distance increased or decreased over time in the different pairs, we compared the distance matrix bpf with the distance matrix apf with a bootstrapped Welch t-test (taking into account the dependencies in the data) and calculated non-parametric 95-99.9% confidence intervals around the effect size to assess whether there was a significant change in the vocal distance. An increase in distance would indicate vocal divergence, a decrease in distance vocal convergence. We used the average of the Euclidean distances as a proxy for average vocal distance between partners for either point in time. The amount of accommodation was calculated as the change in vocal distance from bpf to apf by subtracting the average vocal distance apf from the average vocal distance bpf. We calculated Pearson's correlation coefficients to test whether the initial distance between pair mates and the amount of accommodation were correlated, separately for each call type.
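A minimal Python sketch of the distance measure follows. The function names and the toy PC scores are hypothetical. The bootstrap shown here resamples whole calls (rather than individual distances) so that the non-independence of distances sharing a call is respected; this is one plausible way to handle the dependency and is an assumption, not necessarily the bootstrapped Welch t-test used in the study.

```python
import math
import random


def mean_cross_distance(male_calls, female_calls):
    """Average Euclidean distance between every male call and every female call."""
    dists = [math.dist(m, f) for m in male_calls for f in female_calls]
    return sum(dists) / len(dists)


def accommodation(male_bpf, female_bpf, male_apf, female_apf):
    """Mean distance bpf minus mean distance apf (positive = convergence)."""
    return (mean_cross_distance(male_bpf, female_bpf)
            - mean_cross_distance(male_apf, female_apf))


def bootstrap_ci(male_bpf, female_bpf, male_apf, female_apf,
                 n_boot=1000, level=0.95, seed=1):
    """Percentile confidence interval, resampling calls with replacement."""
    rng = random.Random(seed)

    def resample(calls):
        return [rng.choice(calls) for _ in calls]

    stats = sorted(
        accommodation(resample(male_bpf), resample(female_bpf),
                      resample(male_apf), resample(female_apf))
        for _ in range(n_boot)
    )
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical PC scores (two components per call) for one pair.
male_bpf = [(0.0, 0.0), (0.0, 1.0)]
female_bpf = [(4.0, 0.0), (4.0, 1.0)]
male_apf = [(1.0, 0.0), (1.0, 1.0)]
female_apf = [(3.0, 0.0), (3.0, 1.0)]

distance_bpf = mean_cross_distance(male_bpf, female_bpf)
distance_apf = mean_cross_distance(male_apf, female_apf)
amount = accommodation(male_bpf, female_bpf, male_apf, female_apf)
ci_lower, ci_upper = bootstrap_ci(male_bpf, female_bpf, male_apf, female_apf)
print(distance_bpf, distance_apf, amount, (ci_lower, ci_upper))
```

A confidence interval that excludes zero would indicate a significant change in vocal distance; with this sign convention, positive values correspond to convergence and negative values to divergence.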
Impact on statistical individual recognisability. We investigated whether animals could statistically be distinguished by their calls, and whether this changed with accommodation. We first again performed a PCA as described above, this time including the calls of all the individuals in one analysis. We then performed a Discriminant Function Analysis (DFA) both before and after pair formation to quantify to what extent calls could statistically be correctly assigned to the individual producing them, using the total of the correctly assigned calls as a measure of individual distinctness within calls. To test whether the number of correctly assigned calls changed from before to after pair formation, we performed a binomial GLMM, including "condition × call type" as fixed effects and "individual nested in pair" as well as "call type" as random effects. Lastly, we compared the mean of correctly assigned calls between bpf and apf split by call type using post-hoc comparisons (function "emmeans", package "emmeans"). All analyses were performed in R 3.5.3.

Results
Patterns of accommodation across call types.
To disentangle how the calls changed over time, we quantified the amount of accommodation (both convergence and divergence) for each pair and each call type. We found that for phee calls, 5 out of 8 pairs showed a significant amount of accommodation, of which 1 pair diverged and 4 pairs converged. In trill calls, 5 out of 9 pairs showed a significant amount of accommodation, all of which converged. In food calls, all 9 pairs showed a significant amount of accommodation: 3 pairs converged, while 6 pairs diverged (see Table 2, Fig. 2). Convergence was thus most prevalent in trill calls (55.56% of all pairs), followed by phee calls (50%) and food calls (33.33%). Next, we tested whether the amount of accommodation was correlated with the initial vocal distance of the individuals before pair formation. In phee calls, the correlation between initial vocal distance and accommodation was not significant, although the effect size was medium to large (N = 8, Pearson's correlation coefficient = 0.381, p = 0.352). In contrast, both trill calls (N = 9, Pearson's correlation coefficient = 0.744, p = 0.022) and food calls (N = 9, Pearson's correlation coefficient = 0.782, p = 0.013) showed a significant positive correlation between the initial vocal distance and the amount of vocal accommodation (see Fig. 2).
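The arithmetic behind these figures can be retraced with a short Python sketch. The pair counts, sample sizes and correlation coefficients are taken from the text above; the function names and the toy vectors are hypothetical. The conversion of r into a t statistic with n − 2 degrees of freedom is the standard test for a Pearson correlation and is assumed here; the text does not state how the p-values were computed.

```python
import math

# (pairs that converged, pairs analysed) per call type, as reported above.
counts = {"phee": (4, 8), "trill": (5, 9), "food": (3, 9)}
convergence_pct = {call: round(100 * conv / total, 2)
                   for call, (conv, total) in counts.items()}
print(convergence_pct)


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Reported coefficients and sample sizes.
t_phee = t_statistic(0.381, 8)
t_trill = t_statistic(0.744, 9)
t_food = t_statistic(0.782, 9)
print(round(t_phee, 2), round(t_trill, 2), round(t_food, 2))

# Toy check of pearson_r on perfectly correlated hypothetical values.
r_toy = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Under this assumption, the two-sided 5% critical values are about 2.45 for 6 degrees of freedom and about 2.36 for 7 degrees of freedom, so the trill and food call statistics exceed the threshold while the phee call statistic does not, in line with the reported p-values.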

Impact of accommodation on statistical individual recognisability.
To quantify the impact of the observed patterns of accommodation on statistical individual recognisability, we compared the number of calls correctly classified to individuals before and after pair formation. The expected rate of correct classification by chance was around 6% for each call type, and calls were always correctly classified at rates higher than expected by chance, i.e. statistical individual recognisability was high in each call type (Fig. 3). When performing a discriminant function analysis, statistical individual recognisability remained at comparable levels before (45.7%) and after (46.7%) pair formation for phee calls. In trill calls, statistical individual recognisability significantly dropped from 45% (bpf) to 33.5% (apf), and in food calls, it slightly increased apf (41.7% bpf vs. 45% apf) (Fig. 3). The GLMM shows a significant difference between the call types and the situation (bpf vs apf) (Table 3). Post hoc tests revealed that the changes in statistical individual recognisability were significant in both trill calls and food calls (Table 4).
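As a rough plausibility check of the chance level, the following Python sketch assumes equal prior probabilities for all individuals, so that chance equals one divided by the number of individuals. With the 8-9 pairs (16-18 individuals) retained per call type, this gives values close to the reported 6%. The binomial tail test and the number of calls used below are hypothetical illustrations; the text does not specify how the comparison with chance was carried out.

```python
import math


def chance_level(n_individuals):
    """Expected proportion of correct assignments under equal priors."""
    return 1.0 / n_individuals


def binomial_upper_tail(n_calls, n_correct, p):
    """Probability of at least n_correct correct assignments by chance."""
    return sum(math.comb(n_calls, k) * p ** k * (1 - p) ** (n_calls - k)
               for k in range(n_correct, n_calls + 1))


chance_8_pairs = chance_level(16)  # 8 pairs = 16 individuals
chance_9_pairs = chance_level(18)  # 9 pairs = 18 individuals
print(round(100 * chance_8_pairs, 2), round(100 * chance_9_pairs, 2))

# Hypothetical example: 200 calls, 67 of them (33.5%) correctly assigned.
p_value = binomial_upper_tail(200, 67, chance_9_pairs)
print(p_value)
```

Even the lowest reported classification rate (33.5% for trill calls after pair formation) would lie far above chance under these assumptions.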

Discussion
Increasing evidence for vocal accommodation in nonhuman primates has received a lot of attention in recent research because it suggests more vocal learning than previously assumed. When vocally accommodating, animals modify their vocalizations due to a social template, satisfying the definition of vocal learning by Janik and Slater 2 . Vocal accommodation often seems to serve a social function, reflecting social distance or the strength of a social bond. Nevertheless, an excess in vocal convergence can have disadvantages, when increasing vocal similarity leads to a loss in individual recognisability 43 . In this study, we explored potential trade-offs between the social benefits of convergence vs the necessity to maintain individuality in call structure in common marmosets.
To do so, we tested newly formed pairs and compared their vocalizations before and after pair formation. This situation has elicited vocal accommodation in pygmy marmosets previously 17 , but so far it was unclear whether and how marmosets would deal with the different requirements of converging to a partner while keeping their identity encoded in the calls. In this study, we therefore investigated how common marmosets accommodate to their partners in three different call types that critically differ in their function: phee calls, which are long distance contact calls mainly produced when animals are separated from social partners; trill calls, which are close distance calls usually produced in close proximity; and food calls, which are emitted when animals find preferred food, often indicating willingness to share. In a second step, we examined to what extent their pattern of accommodation impacted how well calls could be attributed to specific animals statistically (individuality of calls), and whether this was related to the different call functions.

Table 2. Amount of accommodation (convergence and divergence) for each pair and call type. Week refers to the week after pair formation when the recordings for the "after"-comparison were made (for phee, trill and food calls, respectively). α-level gives the level at which the vocal distance was significantly different before and after pair formation (ns indicates that the change in distance was not significant). r indicates the effect size, while the + or − indicates the direction of the effect. Positive r values indicate convergence, i.e. that the pair became more similar; negative r values indicate divergence.
Pair | Week last recording (phee/trill/food)

Figure 3. Statistical individual recognisability before and after pair formation. Percentage of correct assignments was obtained from a discriminant function analysis. Light grey bars indicate values of correct assignment before pair formation, dark grey bars after. Calls can be attributed to the correct individual by discriminant function analysis significantly better than expected by chance (red, green or blue line respectively, indicated by red asterisk) in all conditions. However, the amount of correct assignment significantly decreased in trill calls after pair formation and increased in food calls (GLMM, indicated by black asterisk). We did not observe a significant change in the level of correct assignment in phee calls.
Patterns of accommodation across call types. In our first set of predictions, we expected that the amount of convergence should differ between call types with different functions if there is a trade-off between the social function of accommodation and individual identity. We found vocal accommodation in all three call types, but to a different degree. As predicted, most convergence was observed in the close-distance trill calls, and less in long distance phee and food calls. These results are in line with studies in other marmoset species, which found that animals show vocal accommodation in their trill calls in different situations 12,13,15,17 . In trill calls we only found convergence, whereas in phee calls and food calls we found both convergence and divergence. Further, we found that in trill calls and food calls, but not in phee calls, the amount of accommodation was correlated with the initial vocal distance between pair mates. From our data, we cannot conclude whether such a correlation is really absent in phee calls or whether the failure to reach significance is, as suggested by the rather large effect size, an artefact of the small sample size. Overall, these results fit the hypothesis that a trade-off between social accommodation and preserving individual identity leads to different patterns of accommodation depending on the call function (i.e. how important it is that individual identity is encoded in a specific call type), as well as the idea of an optimal vocal distance between partners where the benefits of accommodation are reached but the negative impacts are minimized. To further test this idea, in a next step we investigated whether these differences in accommodation pattern indeed affected the individual recognisability of calls depending on the call type.
Consequence of accommodation for individual recognisability. Next, we investigated how well calls can be individually distinguished with statistical methods. In trill calls, where individuality is less important and which showed the highest level of convergence, we found a significant decrease in the individuality of the calls (calls could be assigned to the correct individual less reliably). In phee calls, where individuality is crucial, the statistical individual recognisability did not change even though convergence occurred in some pairs. In food calls, where individuality is also important and where divergence was most prevalent, calls could be better assigned to the correct individual after pair formation. It therefore seems that convergence did indeed reduce individual distinctness only in the call type (trills) where it is less important because the animals can see each other directly when emitting such calls. In our study, we unfortunately could not look into how changes in statistical recognisability impacted caller recognition by the animals themselves. Playback experiments would therefore be an important next step to investigate whether our findings also impact the ability of the receivers to distinguish between callers. Additionally, presenting playbacks that simulate potential partners with more or less similar calls could answer the question if and how common marmosets use potential information encoded in different call types. Our results nevertheless show that vocal accommodation seems to be regulated differently for individual call types and is probably a more complex process than hitherto expected. How convergence is differently regulated in phee and food calls compared to trill calls remains to be established. Based on our predictions (Table 1), we would have expected similar results in both phee and food calls. Whereas convergence occurred in all three call types, divergence occurred in food calls in particular. So what differentiates this call type from the others, especially from phee calls? In contrast to food calls, phee calls are also produced in inter-group encounters, and are known to be group-specific to a certain degree. This might limit their potential to diverge between partners in addition to the constraints already discussed. Further, the food calls of the future pairs were potentially rather similar already before pair formation, which arguably led to this high level of divergence. It thus appears that individual recognisability is indeed important for food calls, and future studies using playbacks will help disentangle why this is the case.

Table 4. Post hoc tests reveal that while trill calls are significantly less likely to be assigned to the correct individual after pair formation than before, the probability of assigning food calls to the correct individual is higher after pair formation than before. There was no difference in correct assignment in phee calls. p-values are Tukey HSD corrected to take multiple testing into account.
What we did not consider in this study is the fact that food calls are normally produced in call bouts that contain several individual food call elements. In our analysis, we only analysed the single elements but not the information that is potentially encoded in the call bout. An intriguing possibility is that marmosets also accommodate to their partner with regard to bout structure (e.g. duration, number of elements), similar to the occurrence of accommodation in humans at multiple levels, from acoustic structure to word choice and syntax 3 . Moreover, some elements of marmoset food calls appear to be functionally referential 36 . Taken together, the food calls thus appear more heterogeneous than the other two call types analysed here, and additional studies will be necessary to fully understand how they change together with changes in social context.
Our main research focus of this study was to establish how the different needs for accommodation and individuality can be accounted for. It therefore provides an important background for other studies on vocal accommodation to come. Whether or how vocal similarity or dissimilarity is used as a social signal in common marmosets is still an open question, both in breeding pairs as we studied them, as well as in the larger family groups marmosets usually live in. Based on studies in other animals, it is well possible that accommodation, or another means of vocal flexibility, is used by common marmosets to signal the strength or even maintain their pair bond 3 . We can only speculate though whether vocal similarity indeed strengthens social bonds between all individuals in a group, as it would be equally important between breeding and non-breeding group members 44 . We would consider it likely though, especially if groups contain non-related helpers, where kin selection is not sufficient to ensure cooperation. Our results suggest that trill calls are particularly likely candidate vocalisations for such a function, as they are more prone to accommodation and appear less constrained by the need to maintain individual recognisability. Moreover, they are often produced by animals which are in close contact and have a strong social bond 29 .
Vocal learning was for a long time considered rare in nonhuman primates 1 . In this study, we could confirm that common marmosets engage in vocal accommodation, a form of vocal learning, quite regularly, but also that they most likely face trade-offs between similarity and individuality. Together, this corroborates that common marmosets have a high level of vocal flexibility, and that they use vocal accommodation as a very flexible tool which appears regulated differently depending on call types and call type functions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.