Multilevel animal societies can emerge from cultural transmission

Multilevel societies, containing hierarchically nested social levels, are remarkable social structures whose origins are unclear. The social relationships of sperm whales are organized in a multilevel society with an upper level composed of clans of individuals communicating using similar patterns of clicks (codas). Using agent-based models informed by an 18-year empirical study, we show that clans are unlikely products of stochastic processes (genetic or cultural drift) but likely originate from cultural transmission via biased social learning of codas. Distinct clusters of individuals with similar acoustic repertoires, mirroring the empirical clans, emerge when whales learn preferentially the most common codas (conformism) from behaviourally similar individuals (homophily). Cultural transmission seems key in the partitioning of sperm whales into sympatric clans. These findings suggest that processes similar to those that generate complex human cultures could not only be at play in non-human societies but also create multilevel social structures in the wild.

2 different social levels (see Fig. 3A, main text); shapes represent the social level at which the transmission operated (circle: population, square: social unit, circle: predefined geographical clan); whiskers represent the 95% confidence intervals (CI) estimated by an appropriate theoretical model, based on 1000 iterations and 10 replicates for each model. Significant Q-values, those falling outside of the 95% CI of the benchmark distribution (P < 0.001), indicate a reliable partition in the acoustic networks, i.e. a modular topology with subsets of social units (nodes) more strongly connected with each other than with the rest of the network (see Fig. 4, main text). The top plot illustrates our main results (Fig. 3C, main text), followed by their initial parameters values. In each of the 16 new scenarios below (a-h), a single parameter was changed at a time for all 20 models with values representing the extremes of a biologically meaningful range (the parameter value is indicated in the top left of the plot). Note that each small plot resembles very much the main pattern shown in the top plot, indicating that the clan emergence presented in the main text is robust to variation in the parameters and initial conditions.  Tested parameters were common to all of the 20 agent-based models. Three conditions for each parameter were tested: a lower extreme estimate, the chosen condition, and an upper extreme estimate.

Supplementary Notes
Supplementary Note 1. Collectively, the sensitivity analysis evaluated the robustness of the clan partition across various demographic, learning, and coda-specific parameterizations (see Sensitivity analysis of parameterization and initial conditions in the agent-based models, Supplementary Methods).
The different effects considered were the proportion of potential tutors (adult agents) as source of cultural traits (coda types) 3 , their dispersal rates 4,5 as seeding of copying errors and innovations 6,7 between social units on the transmission of coda types among learners (calves), the diversity of cultural traits in the population and hence the emergence distinct cultural groups 8 , here the clans of sperm whales. The results were very robust to changes in the initial conditions and parameterization ( Supplementary Fig. 1). This is clear in two key ways. First, in all of the 16 cases with different initial conditions, there was no reliable clan partition in the models in which they originally have not emerged (ABMs 1-11, [13][14][18][19][20]. In these cases, modularity were zero or non-significant and 2 to 3 orders of magnitude smaller than in the cases where clans emerged ( Supplementary Fig. 1). Second, clans emerged only in the very same three models with biased social learning of coda types (ABMs 15-17) in which we originally observed partition of the acoustic networks into modules (Fig. 4, main text).
Although the overall pattern is robust to the initial conditions, there were a few deviant situations (10%: 5 situations/(3 models*16 cases)) in which these three models that originally yielded clans did not produce them when the parameters were set to extreme estimates ( Supplementary Fig. 1 2). In all the 20 agent-based models, the pattern was strikingly similar: modularity values were high, stable (with small 95% confidence intervals) and consistent from 5 to 100% of the sampled links (despite an overall decrease around 30%). This confers robustness to our chosen metric for clan partition across the range of possible weights for a link (coda repertoire similarity) between nodes (social units) in the simulated networks. 9

Supplementary Methods
The balabm R package. Agent-based models are available in the R package balabm v. 1 c. Open the R scripts for the agent-based models in R (or RStudio) to run the models.
When using these models or data, please cite the R package along with the original paper:

citation("balabm")
Empirical support for the agent-based models. We built agent-based models based on the following empirical socioecological findings: a. Social structure parameters i. Female and immature sperm whales are found in nearly-permanent social units that include related, as well as unrelated, animals 9 . Mature males lead quasi-solitary lives and spend a great part of their lives near the poles, associate only briefly with female social units for mating 9 , and rarely perform coda vocalizations 10 . Therefore, males would not play any important role in the behavioural learning mechanisms we addressed here, and thus the simulated social units contained only female agents.
ii. Calves experience high natal group philopatry, i.e. they remain in the natal social unit since they are highly dependent on their mothers 9,11 . Thus, agents with 0, 1 and 2 years old do not change social unit membership, i.e., the probability of remaining with the mothers in their natal social unit (i.e. retaining the social unit membership label) was rem = 1.
iii. Changes in social unit membership are very rare. Social units are long-term entities that are composed of females and immatures that live and move together for many years and likely their entire lives 9,12 .
Thus, adult females have very low probability of migrating to other social units 13 . There is a rough estimate of 6.3% probability of individuals to be involved in social unit merging, splitting or transferring of individuals within a given year 13 . Thus, in our models there was a very low probability, c = 0.05, of adult agents migrating to another social unit (i.e., of randomly changing their social unit label) through their lives. In the abovementioned study 13 , the individuals who changed social units were relatively young females. Thus, in our models the migration probability was age-dependent, inversely proportional to the agents' age, with youngster agents more prone to change their social unit.
iv. Therefore, with i, ii, and iii, the social units emerge in the simulations as nearly-permanent and nearlymatrilineal, as in the empirical data. Although four other short-term aggregative levels have been described for sperm whales, none of them are considered a social level per se 9 either because they are temporary (small temporal scale: clusters, groups) or because they do not involve social interaction among individuals (large spatial scale: aggregations, concentrations). Clusters are ephemeral sets of individuals swimming closely at the water surface for periods of minutes, and groups are sets of units moving together in a coordinate manner for periods of few hours to few days 9 . Concentrations are patches of whales spanning a few hundred kilometers, within which aggregations of individuals within 10-20 kilometers may occur 9 . Therefore, these short-term aggregative levels were not represented in the models.

b. Population parameters
v. Empirical social units were delineated using individuals re-sighted at least three times, with at least two other individuals during at least two 12h-identification periods spaced out by at least 30 days 13 . Off the Galápagos Islands, there were 174 photo-identified individual sperm whales that meet the above criteria and were assigned to 18 known social units 13 . To account for undersampling, the simulated initial population size was more than 5 times larger (N 0 = 1000 individuals) than the empirical number of photoidentified individuals, but still a reasonable number of sperm whales (excluding mature males) to be using the waters off the Galápagos Islands 14 .
vi. The mean social unit size off the Galápagos Islands is about 12 members (including sometimes juvenile males) 14,15 . The maximum initial number of simulated social units was a function of the initial population size: T VU = N 0 / SU where T VU is the total number of social units at time step t=1, N 0 is the initial population size, and SU is a random integer from the uniform distribution ϵ [8,14]. viii. Although precise ages are hard to estimate empirically, female sperm whales are thought to become sexually mature after at about the age of 9, stop reproducing after about 40 years old, and live 70 years on average 9 . In the simulations, agents followed these characteristics. The simulation were started with agents' age being randomly assigned from an exponential distribution ( , age ≥ 0, λ = 0.08), so the initial population was mostly young, with ages typically varying from 0 to 70 years old.
Agents reproduced from 9 to 40 years old (i.e. newborn calf agents are added to their simulated social units), and demographic rates were chosen such that most whales lived 70 years (although some could die earlier and a few live longer) ( Supplementary Fig. 3B).
ix. The empirical female sperm whale reproduction rate is age-specific and estimated 16 to be b emp_i = 0.257 -(0.0038 × age i ), where b emp_i is the empirical birth rate and age i is the age in years of the individual i.
Because we were interested only in female agents, we simulated female offspring only, so the simulated birth rate was half of the empirical one (b i = b emp_i / 2), assuming a 1:1 sex ratio in birth events. At each time step t, a birth probability b i_t was calculated for all agents of the population who reached sexual maturity, which have the same probability of giving birth. In the social unit of the selected reproductive agents there will be an addition of one agent with age 0 and an empty coda repertoire vector.
x. Population was modeled as density-dependent. We assumed that all social unit members compete for the same foraged resources, and thus the population fluctuated around the carrying capacity (i.e., the initial population size, N 0 = 1000 individuals) over time (see Supplementary Fig. 3C).

xi. Mortality rates (m t ) were defined as m t = [(B t + N t ) -N 0 ] / (B t + N t ),
where B t is the number of births in year t, N t is the population size in year t, and N 0 is the initial population size. Calves usually have higher mortality than adults, and classic and recent empirical estimates corroborate this pattern 9,10 . In our models, to account for age-dependence in mortality we calculated the number of agents that would die in each time step, m t , and removed agents from the simulation according to empirical estimates of mortality rates 9 (calf = 0.093 year -1 ; adult and juveniles = 0.055 year -1 ). Therefore, all agents in the population at the time step t had a probability of dying according to their age class. The death of selected agents was represented by the removal of their coda repertoires, social unit membership, and age labels from the model.
xii. The simulations lasted for t = 700 years, a period 10 times longer than the empirical life expectancy 9 and long enough to achieve a steady state in the simulations.

c. Learning parameters
xiii. While sperm whales mainly produce echolocation sounds (continuous series of clicks), they also produce codas, stereotypical patterns of 3-40 broadband clicks lasting less than 3 seconds in total 17 . Codas are performed in a social context 9,18,19 almost exclusively by females 10  Along the simulation, they could change their repertoires and add frequencies of up to 62 coda types, which is twice as long as the empirical coda repertoires to correct for undersampling of empirical data.
xv. Individuals gradually develop their codas in early ages 22,23 and rarely change coda repertoire as adults.
The precise age for learning the acoustic repertoire is genuinely difficult to estimate given the impossibilities of experimental intervention with sperm whales in the wild. While the empirical evidence is inherently scarce, it suggests that individuals mainly compose their vocal repertoires at early ages, rather than over the course of their lives. Based on the best available empirical data, we defined a calf as an agent up to 3 years old-since after this age sperm whales are usually larger and begin to wean and perform deeper dives 12 . Learning was modelled such that calf agents change their coda repertoires between the ages of 0 and 3 years old (see Fig. 2, main text).
xvi. To account for copying errors and innovations 6,24 , in all models calf agents were subjected to a low probability (0.02) of replacing the frequency of one randomly chosen coda type (62 codas * 0.02 ≈ 1) by a value drawn from the uniform distribution ϵ[0,100] each year (at ages 0, 1 and 2 year old). This effect represented individual learning, i.e. the chance of deliberately creating new coda types (innovation), or randomly creating a new type by copying an existent one with low fidelity (copying error). Therefore, with xii and xiii, changes in the coda repertoire occurred three times for each calf agent and after the age of 3, their repertoires were fixed.
14 xvii. Individuals within social units have very similar (but not identical) coda repertoires 21,23,25,26 ; and social units of the same clans have similar (but not identical) coda repertoires 21,23 . Therefore, just as with the empirical data 21 , we evaluated possible clan segregation based on the coda repertoire of the social units.
In our simulations, we used the averaged coda repertoires of the social units to examine whether they cluster in the simulated acoustic networks (i.e., by testing for modularity). predefined geographical clans and population. In doing so, it is reasonable to assume that the mean distance between an individual and other members of its unit is less than that between the individual and other members of its clan, which in turn is less than the distance between two members of the whole population. We represented the case in which social units are fairly closed structures by allowing the calf agents to learn only within their own social units. We represented learning between social units that are spatially segregated by starting the simulations with predefined clans mimicking the geographicallysegregated clans found in the Atlantic 28 , and allowing calf agents to copy from agents of other units, but only from those within their clans. Finally, we represented the case of a completely mixed population by allowing the calf agents learn from any agent in the population.
Differences between empirical and simulated coda types. Codas are stereotypical patterns of 3-40 broadband clicks lasting less than 3 seconds in total 17 . The empirical codas types are classified according to inter-click intervals, i.e., the proportion of time between the clicks 19,21 . For instance, a coda type labelled as 5R contains 5 regularly-spaced clicks (so 4 equal inter-click intervals), while a 4+1 coda also contains 5 clicks but with an extended period between the 4 th and 5 th clicks (so 3 equal and one extended inter-click interval). Sperm whales can produce several coda types 19,21 . Here, we simulated the absolute frequency of the different coda types each whale can make, not inter-click intervals that define coda types. In our agent-based models, each agent has a coda repertoire represented by a vector with maximum of 62 elements denoting absolute continuous frequencies of different coda types from 0 (absent) to 100 (always performed coda type). For instance, consider a whale A that always performs 5 codas types 3R, Differences between empirical and simulated coda transmission. The principal mechanism for the acquisition of coda repertoire is hypothesized to be social learning, in which individuals reproduce the coda types they are exposed to 19,21,23,29,30 . The main goal of our agent-based models is to test which transmission mechanisms, if any, can reproduce the empirical patterns (distinct vocal clans of sperm whales with highly similar coda repertoires), presumably generated by social learning of coda types among whales. Therefore, we modeled changes in the agents' coda repertoires (i.e. changes in the frequency of codas used) occurring due to varieties of social learning and other alternative transmission 16 processes, to observe under which combination of transmission processes vocal clans (clusters of social units with similar coda repertoire) emerged. Briefly, the transmission processes modelled were: genetic inheritance or vertical transmission (agents copy the exact mother's repertoire), individual learning (agents are assigned to random coda types and frequencies), and oblique social learning (see main text and Fig. 2).
Oblique social learning was simulated as agents copying different proportions of other older agents' repertoires (i.e. coda types and their frequencies of usage). For an example of a typical social learning simulated event, consider the two abovementioned fictional whales A and B. When the agent A "learns" from B, it randomly selects a coda type i from B's repertoire (62-element coda vector) and copies their frequency of use so the ith element in A's repertoire will contain the same value as the ith element in B's repertoire. The proportion of coda types copied is predefined, as well as from whom the codas were copied (only from individuals of the same social unit; or from individuals of different social units of the same vocal clan; or from any individuals of the population) (see main text and Fig. 2A, iii). This "pure" social learning could also be biased by homophily, conformism, and/or symbolic marking. In the models with homophily, agents copied coda types from agents of social units with high coda repertoire similarity to the social unit they belong to, but not their own ( Fig. 2A, iv). In the models with conformism, agents disproportionately copied the most common coda types (within their social unit, within their clan, or within the entire population) ( Fig. 2A, v). Finally, in the models with symbolic marking, agent calves copied a specific subset (~10%) of the coda repertoire of their social unit to mark the social unit they belong to. This subset is represented by 6 coda types that individuals from a given social unit will always perform (i.e., all agents will have 6 specific elements with 100% frequencies in the coda vector) ( Fig. 2A, vi). In our 20 sub-models, these transmission processes could be combined and occur within the social unit (Fig. 2B, vii), the pre-defined vocal clan (Fig. 2B, viii), or entire population (Fig. 2B, xi) (see also the main text and Fig. 3A for full description).

Differences between empirical and simulated coda repertoire comparison. Empirical coda repertoires
were assigned to the social units whose members were photo-identified within 2h of the recording at the field and had at least 25  networks, which we then used to analyze the clan partition. The second method categorized codas into nearly discrete types based on the number of clicks using a k-means clustering algorithm and a variance ratio criterion to choose the number of clusters 19,21 . Each coda type was given a representative label (e.g. 3R for codas with three regularly spaced clicks; 4+1 for 5-click codas with an extended interval before the last click). Two categorical coda types were considered similar if they were assigned to the same type (categorical similarity = 1) and dissimilar if assigned to different types (categorical similarity = 0) 19,21 .
In our agent-based models, we simulated frequencies of usage of coda types (categories), and not the inter-click intervals of each coda, therefore the coda repertoire comparison resemble the second method. We chose the asymmetric weighted Bray-Curtis index as a feasible metric to compare the simulated data-the averaged frequency of usage of codas within members of a social unit-to calculate the similarity between the repertoires of the social units and further evaluate the emergence of clans. For each two coda repertoires, the similarity of the frequencies of use of coda types was given by: ( ) where C AB is the sum of the lowest frequency of the coda types that are common to both repertoires of the social unit A and B; S A is the total frequency of the codas in the repertoire of the social unit A, while S B is the total frequency of codas in the repertoire of the social unit B. The original Bray-Curtis index measures dissimilarity, so here 1-BC AB gives the similarity between repertoires: from 0 (completely different) to 1 (exactly equal repertoires). Note that this metric did not consider coda types that were simultaneously absent (double zeroes) in both repertoire vectors. Finally, the Bray-Curtis similarities between simulated social units were used to build the simulated acoustic networks from which we analyzed the potential partition into clans. To evaluate the clan partition, we ran the models changing a single parameter value at a time and calculated the modularity Q-metric for weighted one-mode networks, our measure of clan partition.

Sensitivity analysis of parameterization and initial conditions in the agent-based models.
Significantly high Q-values indicated a reliable partition in the acoustic network into modules, i.e., the emergent clans. The significance of Q-values was checked using an appropriate theoretical model reshuffling the link weights among social units, using 1000 iterations. Modularity was considered high and significant when the observed Q-value was higher than the expected by chance, falling outside of the 95% confidence intervals (CI) of the benchmark distribution generated by the theoretical model (for more details, see details in the main methods section).
Our models are computationally demanding; therefore, for each of the 8 parameters and initial conditions we tested two values (the extremes of a biologically-meaningful range) and replicated it 10 times for each model, amassing 3200 Q-values from independent runs of the agent-based models.
Supplementary Table 2 displays all parameters and their tested ranges. We compared our modularity findings (Figs. 3C and 4, main text) with 16 other situations, changing a single parameter at a time. This allowed us to parse out the effect of each parameter on our modularity results, rather than having confounding influences of multiple parameter changes as a time. While, ideally, we would examine interactions of parameters as well, due to computational limitations with agent-based models, it is common to examine changing a single parameter at a time 32,33 . The results are available in the Supplementary Figure 1 and Supplementary Note 1.

Sensitivity analysis of clan partition.
We defined a clan, both in the empirical and simulated datasets, using a weighted modularity metric to evaluate the partition of the acoustic networks (i.e., nodes representing social units connected by weighted links representing similarity among their coda repertoires). Although the agent-based models simulated coda repertoires for each agent, the empirical data is not resolved at the individual level since it is logistically very difficult to assign coda production to individual whales in the wild (this has been achieved in just one case 26 ). To make the simulated data comparable to the empirical data, we averaged the coda repertoires of all agents of the social units to build the acoustic networks (see Fig. 4, main text). However, the simulation of coda repertoires with agent-based models represents a "complete sampling", in the sense that we are able to record all codas of all agents of all social units. Clearly, this is not the case for the empirical data, for the same logistical challenges.
We have used the weighted counterpart of the modularity metric to account for fine-scale variation in