The African penguin is a nesting seabird endemic to southern Africa. In penguins of the genus Spheniscus vocalisations are important for social recognition. However, it is not clear which acoustic features of calls can encode individual identity information. We recorded contact calls and ecstatic display songs of 12 adult birds from a captive colony. For each vocalisation, we measured 31 spectral and temporal acoustic parameters related to both source and filter components of calls. For each parameter, we calculated the Potential of Individual Coding (PIC). The acoustic parameters showing PIC ≥ 1.1 were used to perform a stepwise cross-validated discriminant function analysis (DFA). The DFA correctly classified 66.1% of the contact calls and 62.5% of display songs to the correct individual. The DFA also resulted in the further selection of 10 acoustic features for contact calls and 9 for display songs that were important for vocal individuality. Our results suggest that studying the anatomical constraints that influence nesting penguin vocalisations from a source-filter perspective, can lead to a much better understanding of the acoustic cues of individuality contained in their calls. This approach could be further extended to study and understand vocal communication in other bird species.
Animal vocalisations have the potential to provide a variety of information about age, body size, sex, social status, and behavioural state of the emitter1,2. Vocalisations are also a prominent channel for signalling individual identity to conspecifics3,4,5,6,7,8. Discriminating among individuals is important for almost all social behaviours9 and the evolution of vocal individuality has been shown to be related to the size of social groups10. Accordingly, species living in larger groups have more signature information in their calls compared to species that live in smaller (even if more complex), social units10. In birds, evidence for individual acoustic variation in vocal signals exists in a wide range of species11 and acoustic features of vocalisations are shaped by genetic, developmental, and environmental factors12. Accordingly, mechanisms to encode the acoustic cues of individuality, range from amplitude13,14 and frequency modulations15,16 to the sequence of vocal units in songbirds17,18.
The source-filter theory of vocal production19 is a robust framework for studying individuality cues in the vocalisations of many non-human mammals20, where acoustic variation can originate from individual distinctiveness in the morphology or size of the vocal apparatus. More recently, the source-filter theory has also emerged as the dominant theory for explaining the acoustic output of many bird vocalisations21,22,23,24,25. According to this theory, bird calls are produced by the syrinx (the source), a two-part vocal organ located at the base of the trachea26,27, through vibrations of membranes (determining the fundamental frequency, “f0”). Subsequently, the sound passes through the suprasyringeal vocal tract (the filter), formed by the tracheal tube, larynx, glottis (corresponding to the opening of the larynx), oropharyngeal cavity and beak. When the signal produced at the syrinx passes through the vocal tract, its frequency spectrum is shaped by the frequency selectivity of the suprasyringeal cavities28. The prominent peaks, corresponding to frequency bands whose energy has been left unchanged or slightly increased, are called “formants”. Individual variation in length and shape of the vocal apparatus can result in individual distinctiveness in the f0 and, above all, formants6,28,29. There is growing evidence that several bird species are perceptually sensitive to formant variation in vocalisations30,31.
Penguin vocalisations are highly exposed to ecological sources of selection: because these birds live in extreme climate conditions, vocal signals have to propagate through high levels of background noise. Moreover, mechanisms for vocal recognition vary among species and according to their breeding ecology32,33. In particular, penguins of the genus Aptenodytes, which breed in extreme weather conditions on the Antarctic or sub-Antarctic islands, incubate their eggs on their feet to prevent them from freezing. Both the Emperor penguin (Aptenodytes forsteri), which does not have a nesting site or a meeting point for partners to reunite after foraging at sea, and the King penguin (Aptenodytes patagonicus), which has to identify particular areas of the colony where the partner is incubating, evolved finely tuned vocal recognition systems. They can localise and recognise a parent or a mate in huge crowded colonies of thousands of individuals, where vocalisations have a very low signal-to-noise ratio34,35,36,37. In particular, since the syrinx of birds is a two-part organ, where two independent sounds can be simultaneously generated by different sets of muscles and membranes at the right and left sides, non-nesting penguins have evolved the ability to use the beats generated by these different signals (the two-voice system), together with temporal characteristics of calls (variations in frequency or amplitude with time), to recognise each other. By contrast, the identification of the caller in penguins that build a nest (e.g. Adélie penguin, Pygoscelis adeliae, Gentoo penguin, Pygoscelis papua, or Macaroni penguin, Eudyptes chrysolophus) mostly occurs with relatively simpler mechanisms. In particular, they first locate their partner or chicks at the nest and then use vocalisations as a confirmation of the location cue. Mechanisms for vocal individuality in these species include frequency modulation of the pitch or the relative energy content of the harmonics32,38.
Penguins are a monophyletic family (Spheniscidae) of seabirds that is divided into six different genera and 18 extant species39. All the species live exclusively in the Southern Hemisphere, mostly concentrated in cooler waters. However, their distribution extends to southern Africa and South America, because of the Benguela and Humboldt currents, respectively40. Banded penguins form the genus Spheniscus. This genus comprises four extant species41 that inhabit temperate and equatorial areas of Southern Africa (African penguin, S. demersus), South America (Magellanic penguin, S. magellanicus and Humboldt penguin, S. humboldti) and the Galápagos Archipelago (Galápagos penguin, S. mendiculus). These species have similar body sizes33 and share common morphological traits. Sexual dimorphism is not evident but males are slightly larger than females. Moreover, banded penguins share similar behavioural ecology and nesting behaviours42. In particular, all species build nests in underground burrows that they excavate themselves43.
Nesting Spheniscus penguins can potentially recognise each other from vocalisations. Jouventin44 already reported that to the human ear, intra-individual similarities and inter-individual differences were apparent in the ecstatic display song of the African penguin. The song is composed of three acoustically distinctive syllables (Fig. 1), uttered both during exhalation (Types 1 and 2) and inhalation (Type 3) of air, which are combined in a phrase (for more detail see Favaro et al.45). Thumser and Ficken46 showed the presence of individual variation in number of syllables, duration of the longest syllable, and main frequency among different captive African penguins. In the same study, they also found individual variation in inter-syllable intervals and maximum frequency of the longest syllable in the Magellanic penguin. Finally, they suggested the presence of individual distinctiveness in duration, minimum and maximum fundamental frequency in the contact calls of the Humboldt penguin. However, individual variation was not tested for contact calls in the other species due to the limited number of acoustic recordings available. More recently, Clark et al.47 demonstrated, using playback experiments, that Magellanic penguins actually show partner recognition based on their vocalisations. In particular, females respond more strongly to ecstatic display songs of mates versus neighbours and strangers. Moreover, Clark et al.47 observed a significantly stronger response to penguins’ own mutual display songs than to those of stranger pairs. Finally, they reported that chicks are more responsive to the mutual display song of their parents compared to strangers. However, the study did not determine the acoustic features of calls that encode the individual identity information.
Recently, Favaro et al.45 provided a detailed description of the vocal repertoire of the African penguin. Favaro et al.45 showed that adult birds produce four basic vocalisations; namely, a contact call emitted by isolated birds, an agonistic call used in aggressive interactions, a mutual display song vocalised by pairs, at their nests, and an ecstatic display song uttered by single birds (almost exclusively males during the breeding season).
The goal of this study was to test whether African penguin vocalisations have the potential to provide information about the individual identity of callers, and to determine which parameters of contact calls and the ecstatic display song are responsible for individual identity. Lastly, we determined if the application of source-filter theory of vocal production could lead to a better understanding of the acoustic cues of individuality contained in vocalisations of this species. If shown to be reliable, the source-filter approach could be adopted to study and understand vocal communication in other bird species.
Descriptive statistics for all the acoustic parameters measured on contact calls are provided in Table 1. The Potential of Individual Coding (PIC) value was ≥1.1 for 18 acoustic parameters, across the 24 that were measured. Using these parameters as independent variables, the discriminant function analysis (DFA) correctly classified 76.4% of the calls to the six individuals. The accuracy of the DFA decreased to 66.1% when the more conservative leave-one-out cross-validation was applied. The statistical significance of this classification for each individual and across individuals is presented in Supplementary Table S1. In addition, the stepwise analysis was performed in 10 steps and resulted in the further selection of 10 acoustic parameters important for vocal distinctiveness. These included five source-related (f0Mean, f0Max, f0AbsSlope, Jitter, Shimmer) and three filter-related (F1Mean, F2Mean, VTLest) measures, the harmonic to noise ratio (Sonority) and the duration of the call (Dur).
Ecstatic display songs
Visual examination of the spectrograms showed that the ecstatic display song of the African penguin has considerable intra-individual stereotypy (Fig. 2). Descriptive statistics of acoustic parameters are provided in Table 2. The PIC value was ≥1.1 for 14 acoustic parameters across the 31 measured. Using these parameters as independent variables, the discriminant function analysis (DFA) correctly classified 71.9% of the ecstatic display songs for the seven individual penguins. When applying a leave-one-out cross-validated DFA this value dropped to 62.5%. The statistical significance of the DFA classification for each single bird and across individuals is presented in Supplementary Table S1. Moreover, the stepwise analysis was performed in 11 steps, and resulted in the further selection of nine acoustic parameters important for vocal distinctiveness. These included four source-related measures (f0Start, f0Mean, f0Min, FMExtent), one filter-related measure (F1Mean), and four parameters related to number and duration of the different syllable types (DurType2, Type2, ∑Type2, ∑Type3).
Individual recognition is considered to be essential for animal sociality9,10. This explains why individually distinctive vocal features have been found in many social birds and mammals3,7,8,9,10,11,33,48,49. However, mechanisms used by animals to encode the vocal identity information are usually species-specific and are shaped by different genetic, developmental and environmental pressures4,33,35,50,51. We investigated the potential indicators of individuality in the contact calls and ecstatic display songs in a territorial and colonial flightless seabird, the African penguin. We found evidence that 18 acoustic parameters for the contact calls and 14 for ecstatic display songs have low within-individual variation and high between-individual variation. In penguins, the ability to identify conspecifics using vocal cues is required for almost all social behaviours. For example, locating other birds using contact calls is needed to maintain cohesion when individuals are visually separated from the group or from particular social partners when foraging at sea44. In addition, vocal individuality is used for mate choice52 and parent-offspring recognition34,35,47. The study of vocal individuality in seabirds can contribute to our understanding of the evolution of their complex vocal communication systems, and how these have been shaped under different ecological pressures53. Furthermore, these data can be used to determine when and why the mechanisms used to encode information in vocalisations have diverged across species32,37,38,53,54.
Our results demonstrate how the source-filter theory of vocal production19 can be used to gain a better understanding of the biologically meaningful information contained in calls of nesting penguins. In particular, we showed that vocalisations in the African penguin can be studied by considering independent contributions from three different parts of the respiratory apparatus: lungs (temporal patterns), vocal production organ (source) and vocal tract (filter). We suggest that each of these three main motor systems contributes to encoding the individual identity in vocalisations. The chest muscles and lungs regulate exhalation, which determines the duration of contact calls and the temporal patterns in the ecstatic display song. The most prominent temporal features include the number and duration of inhalation (syllable type 3) and exhalation (syllable types 1 and 2) phases as well as the inter-syllable intervals (Fig. 1). The vocal organ (syrinx) transforms the airflow from the lungs into acoustic energy. In particular, the vibration of the syringeal membranes controls the pitch of the call. Finally, the vocal tract has resonant cavities that change in volume and shape across individuals and generate amplified frequency bands, namely formants19,20. Formants alter the spectral structure of the sound and the distribution of the energy across the spectrum, which can, therefore, vary according to individual morphological distinctiveness.
The combination of the source and filter systems in birds can shape vocalisations in strikingly different ways55. Some species (e.g. doves) show a static filter that is used to amplify the fundamental frequency of a voice source not modulated beyond the formant bandwidth. Other species have a dynamic filter tracking fundamental frequency modulations during phonation (e.g. many songbirds). The last case is the dynamic filter with many independent bands of modulations, similar to what we observed for the African penguin. Moreover, we observed stable and flat formants in both contact calls and ecstatic display songs. A stable vocal tract configuration during phonation usually results in stable formants, which have been recently suggested to make vocalisations particularly suitable for individual recognition56. In particular, the very stereotyped calling posture45,57 and the formant patterns we observed suggest that African penguins do not remarkably change the length or shape of their vocal tract during vocal production. Overall, our findings add important information to a growing body of literature on the importance of source- and filter-related acoustic cues in animal vocalisations. In particular, they support the emerging role of the source-filter theory in explaining the acoustic output of avian vocalisations21,22,24,25,55.
Our results provide evidence that contact calls can be used to advertise identity in penguins. Contact calls are vocalisations mostly used by birds and mammals58, which likely have evolved as social signals to maintain cohesion in stable groups59. However, there is growing evidence that contact calls can also be used for individual recognition60, which is particularly important in fission-fusion societies4. When we examined the role of the different vocal features to individual discrimination, DFA showed an accuracy of 66.1% for contact calls in the cross-validated procedure. Source-related (five parameters) and filter-related (three) parameters contributed most to the individuality, with some additional contributions by the duration of calls and the harmonic-to-noise ratio. These findings confirm that pitch and energy distribution across the spectrum of calls are both useful pathways to convey individual identity of nesting penguins33,37. Previous studies showed that the African penguin uses the contact call to maintain group cohesion when visually isolated from conspecifics45 or partners46 and especially when foraging at sea44. In captive settings, juveniles swimming alone in ponds may also emit contact calls46. Further research, investigating whether contact calls might allow also sex, mate, and kin recognition in this group of seabirds would be especially valuable.
Our results confirm that the ecstatic display song of Spheniscus penguins is composed of three acoustically distinct types of syllables arranged to form a sequence45,46 (Fig. 1). Our findings also showed that this vocalisation encodes individual identity information. However, the DFA performed to classify the ecstatic display songs showed an accuracy of 62.5%. This accuracy is lower than the 100% obtained by Robisson et al.61 using 58 display songs from seven adult male Emperor penguins (Aptenodytes forsteri). In addition, the PIC values measured on the acoustic parameters of Emperor penguin vocalisations were higher than those we obtained for the ecstatic display songs of African penguins. We therefore find support for the hypothesis that the individual identity information encoded in the vocalisations of non-nesting penguins is stronger than in species that build a nest33,37,62.
For the classification of the ecstatic display songs according to the emitter, the number and mean duration of the syllables type 2 and the relative contribution of each syllable type to the total duration of the song both contributed to the correct assignment of vocalisations. Moreover, in the stepwise procedure, the DFA used four source-related acoustic parameters and the mean value of the first formant to distinguish among individuals. The results of the DFA confirm that the ecstatic display song of the African penguin contains identity information in both temporal and spectral domains. However, similarly to what has been observed by Searby et al.38 for the Macaroni penguin (Eudyptes chrysolophus), our findings suggest that the signature system of the African penguin is not determined by a limited number of highly discriminant acoustic variables. By contrast, individual identity information in display songs is spread among several less discriminant vocal features.
In conclusion, we determined which acoustic features of contact calls and display songs have the potential to encode the individual identity information in the African penguin. Moreover, we showed that the source-filter theory of vocal production can lead to a far better understanding of the biologically meaningful information encoded in penguin calls. This approach could be further extended to study vocal communication in other bird species.
The study complies with all applicable Italian laws and was conducted in accordance with the Guidelines for the Treatment of Animals in Behavioural Research and Teaching63. Penguins were recorded without performing any manipulations and without the use of playback stimuli. Since all recording procedures were non-invasive and did not cause any disturbance to the animals during their normal daily activity, this study does not fall into any of the categories for which approval of an ethics committee is required by Italian law.
Subjects and housing
The study was performed using 12 adult African penguins belonging to a captive colony of 59 individuals at the “Bolder Beach” enclosure of the biopark Zoom Torino (44.933356 N, 7.419773 E), Italy. The original colony was established in 2009 by combining 37 adult African penguins hatched in four different zoological facilities in Europe (Artis Royal Zoo, Amsterdam, NL; Bird Park Avifauna, Alphen aan den Rijn, NL; Wilhelma Zoo, Stuttgart, DE; South Lake Wild Animal Park, Manchester, UK). The colony increased to its current population because several pairs reproduced. The colony was housed in an outdoor exhibit (1,500 m2, including a pond of 120 m2, water depth maximum 3 m), which reproduces the habitat of “Boulders Beach,” a natural nesting site in South Africa. All penguins were habituated to the presence of visitors and to close observations for recording vocalisations and behavioural studies45,64. Additionally, all birds had a microchip transponder and a flipper band for individual identification.
Acoustic recordings and selection of vocalisations
Contact calls (Supplementary Audio S1) and ecstatic display songs (Supplementary Video S1) (Fig. 1) were collected using the focal animal sampling method65 over 10 non-consecutive days during May 2014, and 40 non-consecutive days from September to November 2014 (corresponding to the peak of the breeding season for the captive colony). Vocalisations were collected at a distance of between 2 and 5 m from the caller with a RØDE NTG2 condenser transducer microphone (frequency response 20 Hz to 20 kHz, max SPL 131 dB). In order to reduce recorded noise, the microphone was mounted on a RØDE PG2 Pistol Grip and protected with a windscreen. The microphone was connected to a TASCAM DR-680 digital recorder (44.1 kHz sampling rate) and acoustic data were saved to an internal SD memory card in WAV format (16-bit amplitude resolution). All the files were then transferred to a Macintosh computer for later acoustic analyses.
We analysed 100 hours of audio recordings. For each audio file, we used narrow-band spectrograms to visually inspect the overall spectral structure of vocalisations. In particular, the waveform and the FFT (Fast Fourier Transform) spectrogram were generated with the Praat v. 5.4.01 sound editor window, using a customised spectrogram setting [view range = 0 to 8000 Hz, window length = 0.02 s, dynamic range = 50 dB]. A total of 221 vocalisations were excluded because they had excessive background noise or because calls were overlapping between different penguins vocalising at the same time. Overall, the spectrographic selection left us with a total of 118 contact calls (contributed by 6 individuals) and 64 ecstatic display songs (contributed by 7 individuals) to be used for the acoustic analysis. Table 3 shows the contribution of each African penguin recorded.
For each vocalisation, we measured a series of spectral and temporal acoustic parameters (Table 4), which were potentially important to discriminate between individuals. These included both temporal measures, such as call duration (Dur), and intensity measures, related to lung capacity66, source-related vocal features (f0), and filter-related acoustic vocal features (formants)19,20,55. However, before measuring the filter-related acoustic parameters, we estimated the approximate vocal tract length (VTL) for African penguins, to set a plausible number of formants in a given frequency range. We built computational models of the penguin vocal tract deriving information from silicon casts (for details please refer to Gamba and Giacoma67; Gamba et al.68) of two cadavers kept frozen at −20 °C. These individuals (one male and one female) died from natural causes in 2011 and 2012, respectively. The penguins were observed emitting the calls with an open beak but we did not know how the air resonated in the suprasyringeal tubes. In this species, the trachea is divided by a septum for all its length69 and shows a single tube only in the upper portion of the vocal tract (corresponding to the larynx). Because of the particular anatomy of the African penguin vocal tract and the lack of information about the actual phonation process, we modelled both the resonance in a single tube and in two tubes70 using a MATLAB-based computer program for vocal tract acoustic response calculation (VTAR, Vocal Tract Acoustic Response71). The effect of the air resonating in one or both the tracheal tubes accounted for an 8–10% variation in formant position and a 3–5% variation in formant dispersion72 in the contact calls and in the ecstatic display songs. The acoustic response of the vocal tract models and the visual inspection of the spectrograms indicated 5 formants below 3500 Hz for the contact calls and 5 formants below 4000 Hz for the ecstatic display songs.
Finally, for each ecstatic display song, in order to describe the variation of this multi-syllable vocalisation among individuals, we firstly identified the three types of syllables described by Favaro et al.45. Further, we measured the number of syllables type 1, type 2 and type 3, the sum of the intervals type 1 (∑Type1), type 2 (∑Type2) and type 3 (∑Type3), and the total duration of the song. However, we limited the spectral analysis performed on display songs to the syllable type 2, because f0 and formants parameters were impossible to measure in the other syllable types (for more details on this methodology see also Favaro et al.45; Thumser & Ficken46). If the song contained more than one syllable type 2, we calculated average values for all the spectral parameters.
The acoustic measurements were carried out using a series of custom built scripts6,29,70 in Praat v.5.4.0173. We extracted the f0 contour of each vocalisation using a cross-correlation method [Sound: To Pitch (cc) command]. We used a time step of 0.01 s, a pitch floor of 150 Hz, and a pitch ceiling of 350 Hz. From each extracted f0 contour, we obtained the frequency value of f0 at the start (f0Start) and at the end (f0End) of the call; the mean (f0Mean), minimum (f0Min) and maximum (f0Max) f0 frequency values across the call. We measured the percentage of duration from the beginning of the signal to the time at which the minimum frequency (Time f0min) and the maximum frequency (Time f0max) occur. In addition, we obtained the f0 mean absolute slope (f0AbsSlope), which is a measure of the average local variability in f0, by computing the average slope between adjacent points on the pitch curve. Furthermore, we calculated the number of complete cycles of fundamental frequency modulation per second (FMRate) and the ratio between the total FM variation and FM rate (FMExtent). We also calculated Jitter [the mean absolute difference between frequencies of consecutive f0 periods divided by the mean frequency of f0 (Jitter (local) command)] and Shimmer [the mean absolute difference between the amplitudes of consecutive f0 periods divided by the mean amplitude of f0 (Shimmer (local) command)] values. Jitter and Shimmer are measures of the cycle-to-cycle variations of fundamental frequency and amplitude, respectively. For a detailed description of the algorithms used by Praat to calculate Jitter and Shimmer, please refer to Boersma74. We quantified the mean harmonics-to-noise ratio value (Sonority), the number of complete cycles of amplitude modulation per second (AMRate), the mean peak-to-peak variation of each amplitude modulation per second (AMExtent), and the mean variation per second (AmpVar).
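The contour summaries described above can be illustrated with a minimal Python sketch. This is not the authors' Praat script: the function name `summarise_f0`, the example contour, and the choice of Hz/s as the slope unit are assumptions made for illustration only.

```python
from statistics import mean


def summarise_f0(times, f0):
    """Summarise an f0 contour from sample times (s) and f0 values (Hz).

    Hypothetical helper: mirrors the parameter names used in the text,
    but is not the implementation used in the study.
    """
    if len(times) != len(f0) or len(f0) < 2:
        raise ValueError("need at least two matched (time, f0) samples")
    duration = times[-1] - times[0]
    i_min = f0.index(min(f0))
    i_max = f0.index(max(f0))
    # Mean absolute slope between adjacent contour points (assumed unit: Hz/s).
    slopes = [
        abs((f0[i + 1] - f0[i]) / (times[i + 1] - times[i]))
        for i in range(len(f0) - 1)
    ]
    return {
        "f0Start": f0[0],
        "f0End": f0[-1],
        "f0Mean": mean(f0),
        "f0Min": f0[i_min],
        "f0Max": f0[i_max],
        # Position of the extremes as a percentage of the contour duration.
        "Time f0min": 100 * (times[i_min] - times[0]) / duration,
        "Time f0max": 100 * (times[i_max] - times[0]) / duration,
        "f0AbsSlope": mean(slopes),
    }


# Hypothetical contour sampled with a 0.01 s time step (illustrative values).
times = [0.00, 0.01, 0.02, 0.03, 0.04]
f0 = [250.0, 260.0, 270.0, 260.0, 240.0]
summary = summarise_f0(times, f0)
```

Jitter, Shimmer and the modulation measures are left out here because they depend on Praat's period detection, which this sketch does not attempt to reproduce.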
Moreover, we extracted the contour of the first four formants (F1–F4) of each call using a Linear Predictive Coding analysis (LPC; Sound: To Formant (burg) command; for contact call: time step = 0.045 s, maximum number of formants = 5, maximum formant = 3500 Hz; ecstatic display song: time step = 0.045 s, maximum number of formants = 5, maximum formant = 4000 Hz) and we calculated the average frequency values. To check if the Praat software accurately tracked the formants, the outputs of the LPC analysis were visually inspected in real time together with the spectrogram and we corrected for octave jumps when necessary. In addition, we calculated the formant dispersion (ΔF) using the methods described by Reby & McComb29. From the ΔF, for each vocalisation we estimated the vocal tract length of the caller using the following equation: $\mathrm{VTL} = c / (2\,\Delta F)$, where c is the approximate speed of sound in the mammalian vocal tract (350 m/s) and the vocal tract is modelled as a uniform tube open at one end and closed at the other.
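As a rough illustration of the relation between formants, ΔF and VTL, the sketch below assumes that ΔF is estimated as the least-squares slope, through the origin, of the formant frequencies against (2i − 1)/2, which is one common reading of the regression approach attributed to Reby & McComb. The function names and the formant values are hypothetical and are not measurements from this study.

```python
def formant_dispersion(formants_hz):
    """Estimate formant dispersion (Hz) for a uniform tube closed at one end.

    Assumption: F_i = dF * (2i - 1) / 2, so dF is the slope of a regression
    through the origin of F_i on (2i - 1) / 2.
    """
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def vocal_tract_length(delta_f_hz, c=350.0):
    """Vocal tract length (m) from formant dispersion: VTL = c / (2 * dF)."""
    return c / (2 * delta_f_hz)


# Hypothetical formants of an ideal 0.25 m tube: F_i = (2i - 1) * c / (4 * L).
formants = [350.0, 1050.0, 1750.0, 2450.0]
delta_f = formant_dispersion(formants)
vtl = vocal_tract_length(delta_f)
```

With measured formants the points do not fall exactly on a line, so the fitted slope gives an averaged spacing, and VTL should be read as an apparent length under the uniform-tube assumption.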
For each acoustic parameter we calculated the Potential of Individual Coding (PIC). The PIC assesses the ratio between between-individual variation (CVb) and within-individual variation (CVw) of an acoustic parameter using the formula: $\mathrm{PIC} = \mathrm{CV_b} / \overline{\mathrm{CV_w}}$, where the mean CVw ($\overline{\mathrm{CV_w}}$) is the mean value of the CVw for all individuals60. We calculated the CVw using the correction for small samples (for an example, see Charrier et al.15): $\mathrm{CV_w} = 100 \left(1 + \frac{1}{4n}\right) \frac{SD}{X_{mean}}$, where SD is the standard deviation, Xmean is the mean of the sample, and n is the sample size for one individual. We calculated the CVb according to the formula: $\mathrm{CV_b} = 100 \, \frac{SD}{X_{mean}}$, where the standard deviation and Xmean are calculated for the total sample75. According to several studies60,61,76,77, acoustic parameters showing PIC > 1 have the potential to encode the individual identity information, since their intra-individual variability is smaller than their inter-individual variability.
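The PIC calculation can be sketched in a few lines of Python. The example below assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and uses invented f0Mean values for three hypothetical birds. The function names are illustrative only.

```python
from statistics import mean, stdev


def cv_within(values):
    """Within-individual CV (%) with the small-sample correction (1 + 1/(4n))."""
    n = len(values)
    return 100 * (1 + 1 / (4 * n)) * stdev(values) / mean(values)


def cv_between(all_values):
    """Between-individual CV (%) computed on the pooled sample."""
    return 100 * stdev(all_values) / mean(all_values)


def pic(calls_by_individual):
    """PIC = CVb divided by the mean CVw across individuals."""
    mean_cvw = mean([cv_within(v) for v in calls_by_individual.values()])
    pooled = [x for v in calls_by_individual.values() for x in v]
    return cv_between(pooled) / mean_cvw


# Invented f0Mean values (Hz) for three hypothetical birds.
f0_mean = {
    "bird_A": [250.0, 252.0, 248.0],
    "bird_B": [270.0, 272.0, 268.0],
    "bird_C": [290.0, 288.0, 292.0],
}
pic_value = pic(f0_mean)
keeps_parameter = pic_value >= 1.1  # threshold used in the text for the DFA
```

In this toy data each bird is tightly clustered around its own mean, so the pooled variation is much larger than the within-bird variation and the parameter would pass the PIC ≥ 1.1 threshold.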
Acoustic parameters with PIC ≥ 1.1 (Table 5) were used to perform a discriminant function analysis (DFA) using a stepwise procedure. The F-value threshold for acceptance or rejection of independent variables was set at F = 3.84. Moreover, for external validation, we used a leave-one-out cross-validation procedure. We performed two separate classifications for the contact calls and the ecstatic display songs, respectively. In both cases, the identity of the caller was used as the group identifier and the acoustic variables as discriminant variables. The percentage of classification expected by chance was calculated according to the group sizes, since the different individuals did not contribute equally to the samples. Finally, we estimated the overall significance of the DFA classification using a Yates corrected χ2 test. Alpha values were set at 0.001. The analyses were performed in SPSS v.22 (IBM Corp. Released 2013. IBM SPSS Statistics for Macintosh, Version 22.0. Armonk, NY: IBM Corp.).
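The comparison of the DFA accuracy against chance can be sketched as follows. The text does not give the exact computation, so this example assumes a proportional chance criterion (the sum of squared group proportions) and a one-degree-of-freedom Yates-corrected χ2 on correct versus incorrect counts. The group sizes are hypothetical placeholders, since the per-individual contributions (Table 3) are not reproduced here. The 78 correct calls out of 118 correspond to the reported 66.1% cross-validated accuracy.

```python
from math import erfc, sqrt


def chance_rate(group_sizes):
    """Expected proportion of correct assignments by chance (assumed criterion:
    sum of squared group proportions)."""
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


def yates_chi2(observed_correct, total, expected_rate):
    """Yates-corrected chi-square (1 df) for correct vs incorrect counts."""
    expected = [total * expected_rate, total * (1 - expected_rate)]
    observed = [observed_correct, total - observed_correct]
    chi2 = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical group sizes for six callers, summing to the 118 contact calls.
group_sizes = [20, 20, 20, 20, 19, 19]
rate = chance_rate(group_sizes)
chi2, p_value = yates_chi2(observed_correct=78, total=118, expected_rate=rate)
significant = p_value < 0.001  # alpha level stated in the text
```

With unequal group sizes the chance rate rises above 1/k, which is why a size-aware baseline is a more conservative reference than a uniform one.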
How to cite this article: Favaro, L. et al. Vocal individuality cues in the African penguin (Spheniscus demersus): a source-filter theory approach. Sci. Rep. 5, 17255; doi: 10.1038/srep17255 (2015).
We thank Zoom Torino for providing free access to their penguins and in particular Daniel Sanchez and Valentina Isaja. We are grateful to Marta Risoli for her help with video recordings.