Contextual and combinatorial structure in sperm whale vocalisations

Sperm whales (Physeter macrocephalus) are highly social mammals that communicate using sequences of clicks called codas. While a subset of codas have been shown to encode information about caller identity, almost everything else about the sperm whale communication system, including its structure and information-carrying capacity, remains unknown. We show that codas exhibit contextual and combinatorial structure. First, we report previously undescribed features of codas that are sensitive to the conversational context in which they occur, and systematically controlled and imitated across whales. We call these rubato and ornamentation. Second, we show that codas form a combinatorial coding system in which rubato and ornamentation combine with two context-independent features we call rhythm and tempo to produce a large inventory of distinguishable codas. Sperm whale vocalisations are more expressive and structured than previously believed, and built from a repertoire comprising nearly an order of magnitude more distinguishable codas. These results show context-sensitive and combinatorial vocalisation can appear in organisms with divergent evolutionary lineage and vocal apparatus.

75% of the codas used by sperm whales of the Eastern Caribbean clan are 5-click codas. Fig. 1 shows the frequency distribution of click count.
Supplementary Fig. 2 Response time: In the EC1 clan, consecutive codas are produced such that they either overlap with one another (response time ≈ 0) or follow after a time period of four seconds. (Left) The frequency distribution of the response time between all consecutive codas. (Center) The frequency distribution of the response time when the consecutive codas are made by the same whale, showing the four-second isochronous periodicity of the coda production of an individual whale. (Right) The frequency distribution of the response time when consecutive codas are produced by different whales. The peak at t = 0 corresponds to overlapping codas, and the peak at t ≈ 4 corresponds to codas produced during turn-taking.
Codas produced by a single whale are fairly isochronous, with an interval of approximately four seconds between consecutive codas. During exchanges between whales or in chorus, the inter-coda interval is bimodal, with some codas being overlapped by their interlocutor and others being produced in sequence at an interval of approximately four seconds (Fig. 2).

Rhythm
A coda's rhythm is determined by its characteristic sequence of standardized ICIs. We use the rhythm clusters shown in Fig. 3, reported by [1]. While [1] has provided intrinsic (within-coda) evidence that rhythm types are clustered, this clustering is also reflected at the exchange level, between adjacent codas. We present two pieces of temporal evidence indicating that the overall duration of a coda must be treated independently of its underlying rhythm.

Transition dynamics of consecutive calls
The plot in Fig. 4 shows the probability of transitioning between codas of various rhythm types by a single whale. Transitions between adjacent codas of the same rhythm type are significantly more likely than cross-type transitions, making up 82% of all transitions. This is also clear from the exchange plots in Fig. 1 of the main paper, in which a given whale can be seen to maintain its rhythm over the course of consecutive calls. A high probability of transitioning into the same rhythm cluster in consecutive turns indicates that the discovered clusters are also coherent in time.
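A transition table of this kind could be tabulated along the following lines. This is a minimal sketch, assuming each whale's codas are available as an ordered list of rhythm-type labels. The helper names `transition_counts` and `same_type_fraction` and the example labels are hypothetical and are not taken from the authors' code or data.

```python
from collections import Counter

def transition_counts(rhythm_sequence):
    """Count transitions between consecutive rhythm-type labels of one whale."""
    return Counter(zip(rhythm_sequence, rhythm_sequence[1:]))

def same_type_fraction(counts):
    """Fraction of all transitions that stay within the same rhythm type."""
    total = sum(counts.values())
    same = sum(n for (src, dst), n in counts.items() if src == dst)
    return same / total if total else float("nan")

# Hypothetical label sequence for a single whale (illustrative only).
sequence = ["1+1+3", "1+1+3", "1+1+3", "5R1", "5R1", "1+1+3"]
counts = transition_counts(sequence)
print(counts)
print(same_type_fraction(counts))
```

Normalising each row of the count table by its row total would give transition probabilities; a diagonally dominant table corresponds to a same-type fraction close to one.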

Overlapping codas with distinct rhythm types still match overall durations
Sometimes, overlapping codas exhibit different rhythm types (as shown in Fig. 5) and even different numbers of clicks. For example, 14% of overlapped codas involve one 4-click coda and one 5-click coda. Even when rhythm type is not matched, duration often is matched over the course of an exchange: the difference between overlapping coda durations is on average 0.104 seconds. Under a null hypothesis that this duration matching is fully explained by the overlapping codas' discrete types, we would expect a difference in durations of 0.134 seconds, which is significantly larger (test: permutation test (one-sided), p = 0.0001, n = 248, performed by randomly resampling durations within each discrete coda type without replacement). Thus, rhythm and tempo features can be independently combined: tempo can be imitated even while holding different rhythm types constant.
Supplementary Fig. 5 Frequencies of transitions between different tempo types. The matrix is diagonally dominant.

Supplementary Fig. 6 A set of exchange plots showing choruses between pairs of sperm whales. Here, whales exchange overlapping codas of different rhythm types while matching duration. This demonstrates independent control of rhythm and duration during coda production.

Tempo
A coda's tempo is a discretised version of its overall duration.

Supplementary Fig. 7 All DSWP codas visualized in increasing order of overall duration. The empirical distribution of durations clusters around a small set of modes, as seen in Fig. 2(B) of the main manuscript.

Clustering
We obtained the empirical distribution of coda durations from the DSWP dataset, then estimated the probability density function of the duration using a kernel density estimator (KDE) with a Gaussian kernel function. The bandwidth of the estimator (h = 0.035) was chosen such that it minimizes the mean integrated squared error. The empirical distribution of coda durations clusters around a small set of modes, as seen in Fig. 2(B) of the main manuscript.
Supplementary Fig. 9 A reduction in the number of tempo clusters to four results in a multi-modal distribution of rubato.
Merging tempo clusters 1 and 2 causes the distribution of rubato to be bimodal only for the (new) tempo cluster 1, as shown in Fig. 9. Thus, any system with a smaller number of tempo clusters introduces either (1) a bimodal tempo distribution within the cluster or (2) a bimodal rubato distribution only within that cluster. Each of these alternatives produces a system with a single idiosyncratic cluster, in contrast to the five-cluster system.
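The clustering step can be illustrated with a small standard-library sketch: a Gaussian KDE is evaluated on a grid, local maxima are taken as tempo modes, and local minima between them as cluster end points. The function names, the grid step, and the synthetic durations are assumptions made for illustration. The bandwidth is simply fixed at the reported h = 0.035 rather than selected by minimizing the mean integrated squared error.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def find_modes_and_boundaries(samples, bandwidth, lo, hi, step=0.005):
    """Scan a grid: local maxima are modes, local minima are cluster end points."""
    density = gaussian_kde(samples, bandwidth)
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    vals = [density(x) for x in grid]
    modes, boundaries = [], []
    for i in range(1, n - 1):
        if vals[i] > vals[i - 1] and vals[i] >= vals[i + 1]:
            modes.append(round(grid[i], 3))
        elif vals[i] < vals[i - 1] and vals[i] <= vals[i + 1]:
            boundaries.append(round(grid[i], 3))
    return modes, boundaries

# Synthetic durations in seconds (illustrative only, not DSWP data).
durations = [0.30, 0.32, 0.33, 0.34, 0.36, 0.78, 0.80, 0.81, 0.82, 0.84]
modes, boundaries = find_modes_and_boundaries(durations, 0.035, 0.1, 1.0)
print(modes, boundaries)
```

A coda's tempo type would then be the index of the interval between consecutive end points into which its duration falls.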

Rubato
The rubato at any given point in the exchange is characterized by the difference in the durations of adjacent codas of the same rhythm and tempo type made by the same whale. The experiments below provide evidence that (1) whales gradually vary the total duration of their codas in consecutive calls, (2) gradual drifts over a period cause long-term changes in overall duration, and (3) this drift in duration is perceived and imitated by whales in the exchange.

Adjacent coda pairs have small duration differences
The durations of adjacent codas (of the same rhythm type, made consecutively by a given whale) are similar, differing by an average of 0.05 s. Under a null hypothesis that duration depends only on discrete coda type, and not on adjacent codas, we would expect a significantly larger value of 0.08 s (test: permutation test (one-sided), p = 0.0001, n = 2953, performed by randomly resampling durations within each discrete coda type without replacement).
Results are shown in the main paper Fig. 2(C).
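The permutation test used here and in the other duration-matching analyses can be sketched as follows. Durations are shuffled within each discrete coda type (resampling without replacement), and the mean absolute difference between adjacent codas is recomputed for each shuffle. The function names, the number of permutations, the add-one p-value correction, and the synthetic data are assumptions for illustration, not details taken from the paper.

```python
import random
from collections import defaultdict

def mean_adjacent_difference(durations, pairs):
    """Mean absolute duration difference over index pairs of adjacent codas."""
    return sum(abs(durations[i] - durations[j]) for i, j in pairs) / len(pairs)

def permutation_test(durations, types, pairs, n_permutations=2000, seed=0):
    """One-sided test: is the observed mean difference smaller than expected
    when durations are shuffled within each discrete coda type?"""
    rng = random.Random(seed)
    observed = mean_adjacent_difference(durations, pairs)
    by_type = defaultdict(list)
    for idx, coda_type in enumerate(types):
        by_type[coda_type].append(idx)
    hits, null_sum = 0, 0.0
    for _ in range(n_permutations):
        shuffled = list(durations)
        for indices in by_type.values():
            values = [durations[i] for i in indices]
            rng.shuffle(values)  # resample without replacement within a type
            for i, value in zip(indices, values):
                shuffled[i] = value
        stat = mean_adjacent_difference(shuffled, pairs)
        null_sum += stat
        hits += stat <= observed
    p_value = (hits + 1) / (n_permutations + 1)  # add-one correction
    return observed, null_sum / n_permutations, p_value

# Synthetic example: one coda type whose duration drifts slowly (illustrative).
durations = [0.80 + 0.01 * i for i in range(20)]
types = ["A"] * 20
pairs = [(i, i + 1) for i in range(19)]
observed, null_mean, p_value = permutation_test(durations, types, pairs)
print(observed, null_mean, p_value)
```

For the overlapping-coda analyses, `pairs` would instead index the two codas of each overlapping pair, with the same within-type shuffling.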

Rubato accumulates over time, causing larger changes in duration
Changes in coda duration are positively correlated across adjacent coda triples.

Durations of overlapping codas from different whales are imitated
For overlapping codas, the overall duration of the coda of the initiating whale is similar to the duration of the coda produced by the interlocutor: see Fig. 10. The mean observed difference in durations is 0.099 s. Under a null hypothesis that chorusing whales only match discrete coda type, we would expect a significantly larger difference of 0.129 s (test: permutation test (one-sided), p = 0.0001, n = 908, performed by randomly resampling durations within each discrete coda type without replacement).
Supplementary Fig. 10 Overall durations of overlapping codas in the dataset. Overlapping codas are imitated with varying degrees of precision. The cumulative percentage of overlapping codas in each category and its corresponding precision range are: green: 25% within 0.05 seconds; yellow: 45% within 0.1 seconds; purple: 61% within 0.15 seconds; red: 71% within 0.2 seconds; blue: remaining set of overlapping codas (100%).
When overlapping codas are made, the coda duration is determined by the leading whale, and its interlocutor has to determine the timing of its clicks before the complete coda of the leading whale can be observed. Therefore, the interlocutor has to decide when to produce the nth click (where n > 1) of its coda after the leading whale has produced only the first m clicks of its coda (where m > 1). To do this, the interlocutor is required to estimate the duration of the leading whale's coda on the basis of just the first m prefix clicks of the coda.

Precision of imitation
In 92% of overlapped codas, the first click of the following whale is made between the first and second click of the leading whale. In 97% of overlapped codas, the second click of the following whale is made between the second and third click of the leading whale. Estimating the duration of the leading whale's coda thus requires the second whale both to measure its first ICI precisely and to infer the rhythm type of the leading whale's incompletely produced coda. Since the consecutive codas of a given whale are likely to be of the same rhythm type (see Section 2), information on the rhythm type of the leading whale is present in previously produced adjacent calls. A close matching of the overall duration by the interlocutor despite an incomplete observation of the leading whale's coda demonstrates the interlocutor's awareness of the leading whale's rhythm type.
Supplementary Fig. 11 (Left) The difference in durations of overlapping codas is distributed tightly around zero. This indicates that the durations of codas are predominantly closely matched. (Right) The cumulative distribution function of the difference in durations of overlapping codas. The CDF indicates that about 70% of the overlapping codas are matched with a precision of about one fifth of a second.

Rubato is not correlated with body motion
The same tags used to record vocalization are also instrumented with accelerometers and gyroscopes, making it possible to test for correlation between vocalization and movement. Figure 13 depicts the correlation between rubato and the orientation and acceleration of the whale recorded by the sensors. We find no significant correlation. The correlation was checked for orientation and acceleration measurements at lags ranging from zero seconds to 10 seconds preceding the rubato, in steps of 0.1 seconds. The lack of significant correlation rules out synchronized motion as the cause of coordinated rubato change.
Supplementary Fig. 12 Overlapping codas made by a pair of whales in a chorus. In this exchange, the first click of the second whale in each pair of overlapping codas is made between the first two clicks of the coda of the leading whale.
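A lagged correlation check of this kind might look roughly like the sketch below. It assumes a single motion channel sampled at 10 Hz from t = 0 (so that one sample corresponds to the 0.1-second lag step) and a list of (time, rubato) events. The sampling rate, the function names, and the synthetic signals are all hypothetical; the actual tag data format is not described in this section.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def lagged_correlations(rubato_events, motion, sample_rate_hz=10, max_lag_s=10.0):
    """Correlate rubato with the motion reading taken `lag` seconds earlier.

    rubato_events: list of (time_s, rubato_s) pairs.
    motion: motion readings sampled at sample_rate_hz, starting at t = 0.
    Returns {lag_s: r} for lags 0, 0.1, ..., max_lag_s (at 10 Hz).
    """
    results = {}
    n_lags = int(round(max_lag_s * sample_rate_hz))
    for k in range(n_lags + 1):
        xs, ys = [], []
        for t, rubato in rubato_events:
            idx = int(round(t * sample_rate_hz)) - k
            if 0 <= idx < len(motion):
                xs.append(rubato)
                ys.append(motion[idx])
        r = pearson(xs, ys) if len(xs) >= 3 else float("nan")
        results[k / sample_rate_hz] = r
    return results

# Synthetic check: rubato is constructed to copy the motion signal 2 s earlier,
# so the sketch should recover a lag of 2.0 s (illustrative data only).
motion = [math.sin(0.05 * i) for i in range(600)]
rubato_events = [(t, motion[t * 10 - 20]) for t in range(15, 51)]
results = lagged_correlations(rubato_events, motion)
best_lag = max(results, key=lambda lag: results[lag])
print(best_lag, results[best_lag])
```

In the analysis described above, the finding is the opposite of this synthetic case: no lag in the 0 to 10 second range shows a significant correlation.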

Ornamentation
Ornamented codas differ from the surrounding codas in their number of clicks, their duration, and their rhythm. In this section, we provide extra information about experiments that highlight additional differences between ornamented and non-ornamented codas: (1) Discarding the ornament of a coda causes the coda constituted by the remaining clicks to match the rhythm type of its neighbouring codas. (2) The rubato of ornamented codas has a distribution different from that of non-ornamented codas. (3) ICIs of ornamented codas have anomalous statistics. Finally, (4) ornaments are not imitated by whales in a chorus.

Finding ornaments
We define an ornament as the last click of a coda that contains one more click than the immediately neighbouring codas made by the same whale. In the dataset, we find that 4% of the codas contain an ornament.
Since ornament labels are assigned purely based on the number of clicks, the effects on rhythm and duration described below are not necessary consequences of this procedure but constitute distinct sources of evidence for the independence of ornaments.
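The labelling rule can be written down in a few lines. The sketch below flags a coda when it has exactly one more click than each of its immediate neighbours from the same whale. The function name is hypothetical, and the treatment of the first and last codas of a sequence (which have only one neighbour) is an assumption, since the definition above does not spell it out.

```python
def find_ornamented(click_counts):
    """Return indices of codas with exactly one more click than each neighbour.

    click_counts: click counts of consecutive codas made by one whale.
    Assumption: a coda at the start or end of the sequence is compared
    against its single available neighbour.
    """
    flagged = []
    for i, n_clicks in enumerate(click_counts):
        neighbours = []
        if i > 0:
            neighbours.append(click_counts[i - 1])
        if i + 1 < len(click_counts):
            neighbours.append(click_counts[i + 1])
        if neighbours and all(n_clicks == m + 1 for m in neighbours):
            flagged.append(i)
    return flagged

# Hypothetical click counts for one whale's call sequence.
print(find_ornamented([5, 5, 6, 5, 5, 6]))
```

Under this rule the ornament itself is the final click of each flagged coda, so removing it leaves a coda with the same click count as its neighbours.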

The first (n − 1) clicks of ornamented codas are more similar to neighbouring codas' rhythm type than any n-click rhythm type
Codas are grouped into rhythm types based on the number of clicks in the coda and the relative spacing of the clicks in their standardised form. However, in the case of ornamented codas, the ornament appears to be independent of the

Supplementary Fig. 15 Distribution of squared distances of (orange) standardised ornamented codas from the standardised cluster centers of their rhythm types, and (blue) standardised ornamented codas with the ornament removed from the cluster centers of the neighbouring codas. Rhythm types of the first (n − 1) clicks of ornamented codas are closer to those of the neighbouring codas than the ornamented coda's rhythm type is to its cluster.

Ornamented codas have a higher variance in the difference of durations between adjacent codas by a given whale
Generally, in the course of an exchange, a whale fixes its rhythm, picks a tempo type to start with, and only gradually varies the duration of the coda. However, ornamented and non-ornamented codas exhibit significant differences in the distribution of their duration differences with neighboring codas (test: Kolmogorov-Smirnov test (two-sample), test statistic = 0.25, p = 5e−11, 95% CI = [0.15, 0.35], n = 206).
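For reference, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical CDFs of the two samples. The sketch below computes only that statistic; the p-value and confidence interval are not reproduced. The function name and the sample values are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Hypothetical duration differences (seconds) with neighbouring codas.
non_ornamented = [0.01, 0.02, 0.03, 0.04]
ornamented = [0.03, 0.10, 0.20, 0.30]
print(ks_statistic(non_ornamented, ornamented))
```

A statistic near zero means the two distributions are nearly indistinguishable; values closer to one indicate well-separated distributions.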

Ornaments are not imitated
In overlapping codas where at least one of the codas is ornamented, the number of clicks in the ornamented coda is matched only 5% of the time, compared to 71% of the time for non-ornamented overlapping codas. The average difference in duration for overlapping codas of which at least one is ornamented (0.210 s) is also significantly larger than the difference in duration of overlapping non-ornamented codas (0.099 s) (test: permutation test (one-sided), p = 0.0001, n = (62, 848), performed by randomly resampling labels [ornamented vs. unornamented] without replacement).

Ornaments are non-uniformly distributed across call sequences
If ornamentation were an uncontrolled feature in sperm whale calls, it would be equally likely to occur anywhere in the sequence and would therefore be uniformly distributed as a function of time in the call sequence. However, ornaments are more likely to be present at the beginnings and ends of call sequences. For this analysis, we consider call sequences with more than two codas. More than half of ornaments occur at the beginning or end of a call sequence: 29.08% in the first coda and 26.24% in the last coda. Coda type counts by location are given in Table 1.
From this table, it can be seen that ornamented codas are more likely than non-ornamented codas to occur at the start of a whale's calls (test: Fisher's exact test (two-sided), odds ratio: 2.00, p = 0.0006). Ornamented codas are also more likely than non-ornamented codas to occur at the end of a whale's calls (test: Fisher's exact test (two-sided), odds ratio: 1.71, p = 0.008).
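The positional tests above are Fisher's exact tests on 2×2 contingency tables (ornamented vs. non-ornamented, at a given position vs. elsewhere). A minimal standard-library sketch is given below. The counts of Table 1 are not reproduced in this section, so the example table is hypothetical, and the sample odds ratio ad/bc is used here, which may differ from the estimator used in the paper.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)

# Hypothetical table: rows = ornamented / non-ornamented,
# columns = at sequence start / elsewhere.
odds_ratio, p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(odds_ratio, p_value)
```

An odds ratio above one, as reported for both the start and the end of call sequences, means ornamented codas are over-represented at that position relative to non-ornamented codas.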

Ornamented codas are followed by a change in responding whale vocalizations
We define a 'change in chorusing behavior' as one of three events: a following whale begins chorusing with a leading whale, pauses chorusing, or ceases vocalizing for the remainder of the exchange. Counts by location relative to chorusing change events are also shown in Table 1. Compared to unornamented codas, ornamented codas from the leading whale are disproportionately succeeded by a change in chorusing behavior from the following whale (test: Fisher's exact test (two-sided), odds ratio: 1.56, p = 0.009).

Supplementary Table 3 The linguistic hierarchy in humans and instances of the features in English.
Sentences: Lily ate some fruit.

Combinatorial coda inventory
The proposed system hypothesizes that codas are constructed by sampling from 18 rhythm types, the presence or absence of an ornament (2 possible values), an increasing, decreasing, or unchanging rubato (3 possible values), and one of 5 tempo types. The set of codas that can be generated from such a system contains a total of 18 × 5 × 2 × 3 = 540 symbols, capable of communicating ⌈log₂ 540⌉ = ⌈9.08⌉ = 10 bits per coda. Therefore, the set of new features and a combinatorial coding system allows encoding a significantly greater number of messages. However, not all possible codas are frequently realized in practice. Of the rhythm and tempo type combinations, 20 occur with both rubato and ornamentation, 6 with only rubato, 1 with only ornamentation, and 16 without ornamentation or rubato. Here we ignore infrequent occurrences (≤ 1 time in the DSWP dataset) of features across the different types. The resulting set of frequently generated feature combinations comprises 156 different codas, capable of communicating ⌈log₂ 156⌉ = ⌈7.28⌉ = 8 bits per coda.
A computation based on the size of the message space alone is an upper bound on the true information rate. This is because, in practice, calls made at consecutive time steps are not independent. Consecutive calls made by a given whale are more likely to be of the same rhythm type (Section 2), and overlapping calls are likely to have the same overall duration (Section 4). Therefore, computing the information transmission capacity of the communication system requires accounting for dependencies between calls over longer time horizons, an important avenue for future work.
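The inventory sizes quoted in this section can be checked with a few lines of arithmetic. The breakdown of the 156 frequently realised codas below is one reading of the counts given in the text: each rhythm and tempo combination is multiplied by the number of rubato and ornamentation values it occurs with. It reproduces the stated total but is an assumption about how that total was obtained.

```python
import math

# Full combinatorial space: rhythm x tempo x ornamentation x rubato.
full_inventory = 18 * 5 * 2 * 3
full_bits = math.ceil(math.log2(full_inventory))

# Frequently realised rhythm/tempo combinations, grouped by which
# context-sensitive features they occur with (counts from the text).
RUBATO_VALUES = 3    # increasing, decreasing, unchanging
ORNAMENT_VALUES = 2  # present, absent
realised_inventory = (
    20 * RUBATO_VALUES * ORNAMENT_VALUES  # both rubato and ornamentation
    + 6 * RUBATO_VALUES                   # rubato only
    + 1 * ORNAMENT_VALUES                 # ornamentation only
    + 16                                  # neither feature
)
realised_bits = math.ceil(math.log2(realised_inventory))

print(full_inventory, full_bits)          # 540 10
print(realised_inventory, realised_bits)  # 156 8
```

Both bit counts treat successive codas as independent and equally likely, which is why they serve only as upper bounds.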

Parallels to Human Language
While an animal's communication system does not have to be human-like to exhibit non-trivial structure, human language is considered the epitome of structured communication systems due to a combination of regularity and expressivity. A comparison between the human language and sperm whale

Supplementary Fig. 3 (Top) The set of coda rhythm types denoted by the mean standardized coda in each of the clusters. (Bottom) The corresponding set of codas of rhythm types 3 to 7.

Supplementary Fig. 4 Frequencies of transitions between different rhythm types.

Any further reduction in the bandwidth of the kernel density estimator (h = 0.035) causes the estimator to produce a larger set of modes in regions with insufficient data points, such as regions with duration < 0.1 seconds and duration > 1.5 seconds. The estimated distribution has 5 peaks, at [0.33, 0.51, 0.8, 1.02, 1.26], with end points at [0.45, 0.61, 0.93, 1.08]. Several factors justify the choice of five discrete tempo clusters rather than a smaller number. Merging tempo clusters 4 and 5 would result in a bimodal distribution of durations within the final tempo cluster.
Supplementary Fig. 8 Distribution of rubato between adjacent codas for each of the 5 tempo types. The rubato for each of the tempo types is unimodally distributed around a mean value of zero.

Fig. 11 (Left) shows the distribution of the difference in duration of overlapping codas, and Fig. 11 (Right) shows the cumulative distribution function of the absolute difference in the duration of overlapping codas. The durations of 45% of the overlapping codas are imitated to within 0.1 seconds.

Table 1 Frequency of ornamented and non-ornamented codas at the start and end of coda sequences (top and middle, respectively), and frequency before changes in vocalization style (bottom).