Vocal individuality cues in the African penguin (Spheniscus demersus): a source-filter theory approach

The African penguin is a nesting seabird endemic to southern Africa. In penguins of the genus Spheniscus, vocalisations are important for social recognition. However, it is not clear which acoustic features of calls can encode individual identity information. We recorded contact calls and ecstatic display songs of 12 adult birds from a captive colony. For each vocalisation, we measured 31 spectral and temporal acoustic parameters related to both source and filter components of calls. For each parameter, we calculated the Potential of Individual Coding (PIC). The acoustic parameters showing PIC ≥ 1.1 were used to perform a stepwise cross-validated discriminant function analysis (DFA). The DFA assigned 66.1% of the contact calls and 62.5% of the display songs to the correct individual. The DFA also resulted in the further selection of 10 acoustic features for contact calls and 9 for display songs that were important for vocal individuality. Our results suggest that studying the anatomical constraints that influence nesting penguin vocalisations from a source-filter perspective can lead to a much better understanding of the acoustic cues of individuality contained in their calls. This approach could be further extended to study and understand vocal communication in other bird species.


Results
Contact calls. Descriptive statistics for all the acoustic parameters measured on contact calls are provided in Table 1. The Potential of Individual Coding (PIC) value was ≥1.1 for 18 of the 24 acoustic parameters measured. Using these parameters as independent variables, the discriminant function analysis (DFA) correctly classified 76.4% of the calls from the six individuals. The accuracy of the DFA decreased to 66.1% when the more conservative leave-one-out cross-validation was applied. The statistical significance of this classification for each individual and across individuals is presented in Supplementary Table S1. In addition, the stepwise analysis was performed in 10 steps and resulted in the further selection of 10 acoustic parameters important for vocal distinctiveness. These included five source-related (f 0 Mean, f 0 Max, f 0 AbsSlope, Jitter, Shimmer) and three filter-related (F 1 Mean, F 2 Mean, VTLest) measures, the harmonic-to-noise ratio (Sonority) and the duration of the call (Dur).
Ecstatic display songs. Visual examination of the spectrograms showed that the ecstatic display song of the African penguin has considerable intra-individual stereotypy (Fig. 2). Descriptive statistics of acoustic parameters are provided in Table 2. The PIC value was ≥1.1 for 14 of the 31 acoustic parameters measured. Using these parameters as independent variables, the discriminant function analysis (DFA) correctly classified 71.9% of the ecstatic display songs from the seven individual penguins. When a leave-one-out cross-validated DFA was applied, this value dropped to 62.5%. The statistical significance of the DFA classification for each single bird and across individuals is presented in Supplementary Table S1. Moreover, the stepwise analysis was performed in 11 steps, and resulted in the further selection of nine acoustic parameters important for vocal distinctiveness. These included four source-related measures (f 0 Start, f 0 Mean, f 0 Min, FMExtent), one filter-related measure (F 1 Mean), and four parameters related to the number and duration of the different syllable types (DurType2, Type2, ∑Type2, ∑Type3).
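To put the cross-validated accuracy for display songs in context, it can be compared with the rate expected by chance when birds contribute unequal numbers of songs. The sketch below uses the proportional chance criterion, which is one common choice and an assumption here: the exact formula behind Supplementary Table S1 is not given in this text. The function name is hypothetical, and the group sizes are the per-bird song counts listed in the header of Table 2.

```python
from typing import Sequence


def proportional_chance(group_sizes: Sequence[int]) -> float:
    """Expected accuracy of a random classifier under the proportional
    chance criterion (an assumed criterion, not confirmed by the paper).

    Each item is assigned to a group with probability equal to that
    group's share of the sample, so the expected proportion of correct
    assignments is the sum of the squared group proportions.
    """
    total = sum(group_sizes)
    if total <= 0:
        raise ValueError("group_sizes must contain at least one item")
    return sum((n / total) ** 2 for n in group_sizes)


# Display songs per bird, as listed in the Table 2 header
# (Renato, Picchio, Rico, Joker, Sky, Kusubiro, Soldato): 64 songs in total.
DISPLAY_SONG_COUNTS = [10, 7, 8, 9, 9, 10, 11]
```

Under this assumed criterion, `proportional_chance(DISPLAY_SONG_COUNTS)` gives 596/4096, or about 14.6%, which is the baseline against which the 62.5% cross-validated accuracy would be read.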

Discussion
Individual recognition is considered to be essential for animal sociality 9,10 . This explains why individually distinctive vocal features have been found in many social birds and mammals 3,7–11,33,48,49 . However, the mechanisms used by animals to encode vocal identity information are usually species-specific and are shaped by different genetic, developmental and environmental pressures 4,33,35,50,51 . We investigated the potential indicators of individuality in the contact calls and ecstatic display songs of a territorial and colonial flightless seabird, the African penguin. We found evidence that 18 acoustic parameters for the contact calls and 14 for ecstatic display songs have low within-individual variation and high between-individual variation. In penguins, the ability to identify conspecifics using vocal cues is required for almost all social behaviours. For example, locating other birds using contact calls is needed to maintain cohesion when individuals are visually separated from the group or from particular social partners when foraging at sea 44 . In addition, vocal individuality is used for mate choice 52 and parent-offspring recognition.
Table 1. Individual values of the acoustic parameters (mean ± SD) for contact calls.
Our results demonstrate how the source-filter theory of vocal production 19 can be used to gain a better understanding of the biologically meaningful information contained in calls of nesting penguins. In particular, we showed that vocalisations in the African penguin can be studied by considering independent contributions from three different parts of the respiratory apparatus: lungs (temporal patterns), vocal production organ (source) and vocal tract (filter). We suggest that each of these three main motor systems contributes to encoding individual identity in vocalisations. The chest muscles and lungs regulate exhalation, which determines the duration of contact calls and the temporal patterns in the ecstatic display song.
The most prominent temporal features include the number and duration of inhalation (syllable type 3) and exhalation (syllable types 1 and 2) phases as well as the inter-syllable intervals (Fig. 1). The vocal organ (syrinx) transforms the airflow from the lungs into acoustic energy. In particular, the vibration of the syringeal membranes controls the pitch of the call. Finally, the vocal tract has resonant cavities that differ in volume and shape across individuals and generate amplified frequency bands, namely formants 19,20 . Formants alter the spectral structure of the sound and the distribution of energy across the spectrum, which can therefore vary according to individual morphological distinctiveness.
The combination of the source and filter systems in birds can shape vocalisations in strikingly different ways 55 . Some species (e.g. doves) show a static filter that is used to amplify the fundamental frequency of a voice source not modulated beyond the formant bandwidth. Other species (e.g. many songbirds) have a dynamic filter that tracks fundamental frequency modulations during phonation. The last case is a dynamic filter with many independent bands of modulation, similar to what we showed for the African penguin. Moreover, we observed stable and flat formants in both contact calls and ecstatic display songs. A stable vocal tract configuration during phonation usually results in stable formants, which have been recently suggested to make vocalisations particularly suitable for individual recognition 56 . In particular, the very stereotyped calling posture 45,57 and the formant patterns we observed suggest that African penguins do not markedly change the length or shape of their vocal tract during vocal production. Overall, our findings add important information to a growing body of literature on the importance of source- and filter-related acoustic cues in animal vocalisations. In particular, they support the emerging role of source-filter theory in explaining the acoustic output of avian vocalisations 21,22,24,25,55 .
Table 2. Individual values of the acoustic parameters (mean ± SD) for ecstatic display songs. Columns: Acoustic parameter; Renato (n = 10); Picchio (n = 7); Rico (n = 8); Joker (n = 9); Sky (n = 9); Kusubiro (n = 10); Soldato (n = 11).
Scientific Reports | 5:17255 | DOI: 10.1038/srep17255
Our results provide evidence that contact calls can be used to advertise identity in penguins. Contact calls are vocalisations mostly used by birds and mammals 58 , which likely have evolved as social signals to maintain cohesion in stable groups 59 . However, there is growing evidence that contact calls can also be used for individual recognition 60 , which is particularly important in fission-fusion societies 4 . When we examined the contribution of the different vocal features to individual discrimination, the DFA showed an accuracy of 66.1% for contact calls in the cross-validated procedure. Source-related (five) and filter-related (three) parameters contributed most to individuality, with some additional contributions from the duration of calls and the harmonic-to-noise ratio. These findings confirm that pitch and energy distribution across the spectrum of calls are both useful pathways to convey individual identity in nesting penguins 33,37 . Previous studies showed that the African penguin uses the contact call to maintain group cohesion when visually isolated from conspecifics 45 or partners 46 and especially when foraging at sea 44 . In captive settings, juveniles swimming alone in ponds may also emit contact calls 46 . Further research investigating whether contact calls might also allow sex, mate, and kin recognition in this group of seabirds would be especially valuable.
Our results confirm that the ecstatic display song of Spheniscus penguins is composed of three acoustically distinct types of syllables arranged to form a sequence 45,46 (Fig. 1). Our findings also showed that this vocalisation encodes individual identity information. However, the DFA performed to classify the ecstatic display songs showed an accuracy of 62.5%. This accuracy is lower than that obtained (100%) by Robisson et al. 61 using 58 display songs from seven adult male Emperor penguins (Aptenodytes forsteri). In addition, the PIC values measured on the acoustic parameters of Emperor penguin vocalisations were higher than those we obtained for the ecstatic display songs of African penguins. We therefore find support for the hypothesis that the individual identity information encoded in the vocalisations of non-nesting penguins is stronger than in species that build a nest 33,37,62 .
For the classification of the ecstatic display songs according to the emitter, the number and mean duration of the syllables type 2 and the relative contribution of each syllable type to the total duration of the song both contributed to the correct assignment of vocalisations. Moreover, in the stepwise procedure, the DFA used four source-related acoustic parameters and the mean value of the first formant to distinguish among individuals. The results of the DFA confirm that the ecstatic display song of the African penguin contains identity information in both temporal and spectral domains. However, similarly to what has been observed by Searby et al. 38 for the Macaroni penguin (Eudyptes chrysolophus), our findings suggest that the signature system of the African penguin is not determined by a limited number of highly discriminant acoustic variables. By contrast, individual identity information in display songs is spread among several less discriminant vocal features.
In conclusion, we determined which acoustic features of contact calls and display songs have the potential to encode individual identity information in the African penguin. Moreover, we showed that the source-filter theory of vocal production can lead to a far better understanding of the biologically meaningful information encoded in penguin calls. This approach could be further extended to study vocal communication in other bird species.

Methods
Ethics statement. The study complies with all applicable Italian laws and was conducted in accordance with the Guidelines for the Treatment of Animals in Behavioural Research and Teaching 63 . Penguins were recorded without performing any manipulations and without the use of playback stimuli. Since all recording procedures were non-invasive and did not cause any disturbance to the animals during their normal daily activity, this study does not fall into any of the categories for which approval of an ethics committee is required by Italian law.
Subjects and housing. The study was performed using 12 adult African penguins belonging to a captive colony of 59 individuals at the "Bolder Beach" enclosure of the biopark Zoom Torino (44.
Acoustic recordings and selection of vocalisations. Contact calls (Supplementary Audio S1) and ecstatic display songs (Supplementary Video S1) (Fig. 1) were collected using the focal animal sampling method 65 over 10 non-consecutive days during May 2014, and 40 non-consecutive days from September to November 2014 (corresponding to the peak of the breeding season for the captive colony). Vocalisations were collected at a distance of between 2 and 5 m from the caller with a RØDE NTG2 condenser transducer microphone (frequency response 20 Hz to 20 kHz, max SPL 131 dB). In order to reduce recorded noise, the microphone was mounted on a RØDE PG2 Pistol Grip and protected with a windscreen. The microphone was connected to a TASCAM DR-680 digital recorder (44.1 kHz sampling rate) and acoustic data were saved to an internal SD memory card in WAV format (16-bit amplitude resolution). All the files were then transferred to a Macintosh computer for later acoustic analyses.
We analysed 100 hours of audio recordings. For each audio file, we used narrow-band spectrograms to visually inspect the overall spectral structure of vocalisations. In particular, the waveform and the FFT (Fast Fourier Transform) spectrogram were generated with the Praat v. 5.4.01 sound editor window, using a customised spectrogram setting [view range = 0 to 8000 Hz, window length = 0.02 s, dynamic range = 50 dB]. A total of 221 vocalisations were excluded because they had excessive background noise or because calls from different penguins vocalising at the same time overlapped. Overall, the spectrographic selection left us with a total of 118 contact calls (contributed by 6 individuals) and 64 ecstatic display songs (contributed by 7 individuals) to be used for the acoustic analysis. Table 3 shows the contribution of each African penguin recorded.
Acoustic analysis. For each vocalisation, we measured a series of spectral and temporal acoustic parameters (Table 4), which were potentially important to discriminate between individuals. These included temporal measures, such as call duration (Dur), and intensity measures, which are related to lung capacity 66 , as well as source-related vocal features (f 0 ) and filter-related vocal features (formants) 19,20,55 . However, before measuring the filter-related acoustic parameters, we estimated the approximate vocal tract length (VTL) for African penguins, to set a plausible number of formants in a given frequency range. We built computational models of the penguin vocal tract deriving information from silicone casts (for details please refer to Gamba and Giacoma 67 ; Gamba et al. 68 ) of two cadavers kept frozen at −20 °C. These individuals (one male and one female) died from natural causes in 2011 and 2012, respectively. The penguins were observed emitting the calls with an open beak but we did not know how the air resonated in the suprasyringeal tubes.
In this species, the trachea is divided by a septum for all its length 69 and shows a single tube only in the upper portion of the vocal tract (corresponding to the larynx). Because of the particular anatomy of the African penguin vocal tract and the lack of information about the actual phonation process, we modelled both the resonance in a single tube and in two tubes 70 using a MATLAB-based computer program for vocal tract acoustic response calculation (VTAR, Vocal Tract Acoustic Response 71 ). The effect of the air resonating in one or both the tracheal tubes accounted for an 8-10% variation in formant position and a 3-5% variation in formant dispersion 72 in the contact calls and in the ecstatic display songs. The acoustic response of the vocal tract models and the visual inspection of the spectrograms indicated 5 formants below 3500 Hz for the contact calls and 5 formants below 4000 Hz for the ecstatic display songs. Finally, for each ecstatic display song, in order to describe the variation of this multi-syllable vocalisation among individuals, we first identified the three types of syllables described by Favaro et al. 45 . We then measured the number of syllables of type 1, type 2 and type 3, the sum of the intervals of type 1 (∑Type1), type 2 (∑Type2) and type 3 (∑Type3), and the total duration of the song. However, we limited the spectral analysis performed on display songs to syllable type 2, because f 0 and formant parameters were impossible to measure in the other syllable types (for more details on this methodology see also Favaro et al. 45 ; Thumser & Ficken 46 ). If the song contained more than one syllable of type 2, we calculated average values for all the spectral parameters.
Table 3. Name, sex, date of birth, origin, and number of vocalisations recorded for each African penguin. Contact calls (n = 118) were recorded from six penguins, while ecstatic display songs (n = 64) were recorded from seven penguins. One individual (Renato) contributed both vocalisation types.
The acoustic measurements were carried out using a series of custom-built scripts 6,29,70 in Praat v.5.4.01 73 . We extracted the f 0 contour of each vocalisation using a cross-correlation method [Sound: To Pitch (cc) command]. We used a time step of 0.01 s, a pitch floor of 150 Hz, and a pitch ceiling of 350 Hz.
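With the pitch settings above, each vocalisation yields an f 0 contour sampled every 0.01 s. As a minimal sketch of how summary measures of such a contour (start, end, mean, minimum, maximum, relative timing of the extremes, and mean absolute slope) might be derived, the Python function below works on a plain list of f 0 values. It is not the authors' Praat script: the function name, the dictionary keys, and the handling of time positions (percentage of the span between the first and last frame) are all assumptions, and the contour is assumed to be fully voiced.

```python
from statistics import mean
from typing import Dict, Sequence


def summarise_f0(contour_hz: Sequence[float],
                 time_step_s: float = 0.01) -> Dict[str, float]:
    """Summarise an evenly sampled, fully voiced f0 contour.

    Assumptions (hypothetical, for illustration only): unvoiced frames
    were removed beforehand, and the timing of the minimum and maximum is
    expressed as a percentage of the span from first to last frame.
    """
    n = len(contour_hz)
    if n < 2:
        raise ValueError("need at least two voiced frames")
    if time_step_s <= 0:
        raise ValueError("time_step_s must be positive")

    # Index of the first frame holding the minimum / maximum f0 value.
    i_min = min(range(n), key=contour_hz.__getitem__)
    i_max = max(range(n), key=contour_hz.__getitem__)

    # Absolute slope (Hz/s) between each pair of adjacent frames.
    slopes = [abs(b - a) / time_step_s
              for a, b in zip(contour_hz, contour_hz[1:])]

    return {
        "f0_start": contour_hz[0],
        "f0_end": contour_hz[-1],
        "f0_mean": mean(contour_hz),
        "f0_min": contour_hz[i_min],
        "f0_max": contour_hz[i_max],
        "time_f0_min_pct": 100.0 * i_min / (n - 1),
        "time_f0_max_pct": 100.0 * i_max / (n - 1),
        "f0_abs_slope_hz_per_s": mean(slopes),
    }
```

For a song with several type 2 syllables, the same function could be applied to each syllable and the resulting values averaged, in line with the averaging described above.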
From each extracted f 0 contour, we obtained the frequency value of f 0 at the start (f 0 Start) and at the end (f 0 End) of the call, and the mean (f 0 Mean), minimum (f 0 Min) and maximum (f 0 Max) f 0 values across the call. We measured the percentage of the duration from the beginning of the signal to the time at which the minimum frequency (Time f 0 min) and the maximum frequency (Time f 0 max) occur. In addition, we obtained the f 0 mean absolute slope (f 0 AbsSlope), which is a measure of the average local variability in f 0 , by computing the average slope between adjacent points on the pitch curve. Furthermore, we calculated the number of complete cycles of fundamental frequency modulation per second (FMRate) and the ratio between the total FM variation and FM rate (FMExtent). We also calculated Jitter [the mean absolute difference between frequencies of consecutive f 0 periods divided by the mean frequency of f 0 ].
Statistical analysis. For each acoustic parameter we calculated the Potential of Individual Coding (PIC). The PIC is the ratio of the between-individual variation ($CV_b$) to the mean within-individual variation ($CV_w$) of an acoustic parameter:
$$\mathrm{PIC} = \frac{CV_b}{\mathrm{mean}\ CV_w}$$
where the mean $CV_w$ is the mean value of the $CV_w$ for all individuals 60 . We calculated the $CV_w$ using the correction for small samples (for an example, see Charrier et al. 15 ):
$$CV_w = 100 \times \frac{SD}{X_{mean}} \times \left(1 + \frac{1}{4n}\right)$$
where $SD$ is the standard deviation and $X_{mean}$ is the mean of the sample, and $n$ is the sample size for one individual. We calculated the $CV_b$ according to the formula:
$$CV_b = 100 \times \frac{SD}{X_{mean}}$$
where the standard deviation and $X_{mean}$ are calculated for the total sample 75 . According to several studies 60,61,76,77 , acoustic parameters showing PIC > 1 have the potential to encode individual identity information, since their intra-individual variability is smaller than their inter-individual variability. Acoustic parameters with PIC ≥ 1.1 (Table 5) were used to perform a discriminant function analysis (DFA) using a stepwise procedure.
The F-value threshold for acceptance or rejection of independent variables was set at F = 3.84. Moreover, for external validation, we used a leave-one-out cross-validation procedure. We performed two separate classifications, one for the contact calls and one for the ecstatic display songs. In both cases, the identity of the caller was used as the group identifier and the acoustic variables as discriminant variables. The percentage of correct classification expected by chance was calculated according to the group sizes, since the different individuals did not contribute equally to the samples.
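The PIC calculation described in this section can be sketched in a few lines of Python. The sketch assumes the commonly used forms CV_w = 100 × (SD / mean) × (1 + 1/(4n)) for each individual and CV_b = 100 × (SD / mean) for the pooled sample, with PIC = CV_b / mean CV_w. It also assumes the sample standard deviation (n − 1 denominator), which this text does not specify. The function names and the dictionary layout (one list of measurements per bird) are hypothetical.

```python
from statistics import mean, stdev
from typing import Mapping, Sequence


def cv_within(values: Sequence[float]) -> float:
    """Within-individual CV (%) with the small-sample correction.

    Uses the sample standard deviation, which is an assumption.
    """
    n = len(values)
    return 100.0 * (stdev(values) / mean(values)) * (1.0 + 1.0 / (4.0 * n))


def cv_between(values_by_bird: Mapping[str, Sequence[float]]) -> float:
    """Between-individual CV (%) computed on the pooled sample."""
    pooled = [v for vals in values_by_bird.values() for v in vals]
    return 100.0 * stdev(pooled) / mean(pooled)


def pic(values_by_bird: Mapping[str, Sequence[float]]) -> float:
    """Potential of Individual Coding: CV_b divided by the mean CV_w.

    Each bird needs at least two measurements; a parameter with zero
    within-individual variation in every bird raises ZeroDivisionError.
    """
    mean_cv_w = mean([cv_within(vals) for vals in values_by_bird.values()])
    return cv_between(values_by_bird) / mean_cv_w
```

A parameter would then be retained for the DFA when `pic(...)` returns a value of at least 1.1, the threshold used in this study.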