Longitudinal recordings of the vocalizations of immature Gombe chimpanzees for developmental studies

Many researchers are interested in chimpanzee vocal communication, both as an important aspect of chimpanzee social behavior and as a source of insights into the evolution of human language. Nonetheless, very little is known about how chimpanzee vocal communication develops from infancy to adulthood. The largest dataset of audio recordings from free-living immature chimpanzees was collected by the late Hetty van de Rijt-Plooij and Frans X. Plooij at Gombe National Park, Tanzania (1971–1973). These recordings have not yet been analysed, so the most extensive effort to study the development of chimpanzee vocalizations remains unfinished. The audiospecimens total over 10 h on 28 tapes, including 20 tapes focusing on 17 specific immature individuals with a total of 1,136 recordings. To make this dataset available to more researchers, the analogue sound recordings were digitized and stored in the Macaulay Library and the Dryad Repository. In addition, the original notes on the contexts of the calls were transcribed and translated from Dutch into English.


Methods
A map of Gombe National Park, Tanzania, East Africa, is shown in Figure 1. Most recordings were made in the feeding area in the Kakombe valley (see Figure 1b), the open area where Jane Goodall fed the chimpanzees 1. The recording setup there was as follows. From a distance of 5 to 15 m, the recordist pointed a Sennheiser MKH 815T directional microphone (covered with a windscreen) at the chimpanzees (shaded figures in the foreground of Figure 2) while carrying a running Nagra sound recorder (full-track mono, 19.05 cm/s or 7.5 inch/s). In addition to the chimpanzee vocalizations, the recordist's verbal commentary was recorded before or after the vocalizations. This commentary covered the names of the chimpanzees and of the vocalizations they produced, together with a description of the behaviour surrounding the vocalizations. Definitions of the chimpanzee behavior categories, illustrated with drawings by David Bygott, were published by Plooij 27 in Appendix A of his book. After the sound recordings were made, analogue audio specimens were selected from the tape and coupled with metadata consisting of the transcriptions of the verbal commentary in Dutch and a number of other information items that are described under Data Records. The analogue audio specimens were created by listening to the original recordings and cutting out the stretches of tape containing chimpanzee vocalizations. These stretches of tape were glued together and stored on 28 reels totalling 10 h of chimpanzee vocalizations, of which 20 reels concerned 17 specific young individuals (one or more separate reels for each individual) and 8 reels concerned adult individuals. In 2010 the analogue audio specimens were digitized and provided, together with the audiotapes, to the Macaulay Library. The transcriptions of the verbal commentary were translated from Dutch into English.
These transcriptions and associated metadata (see Data Records) were entered into spreadsheets (one per individual) and then into the Macaulay Library database (see Data Citation 1).

Data Records
The 1,248 audiospecimens at the Macaulay Library can be accessed directly via Data Citation 1 or by searching for 'chimpanzee' and 'Pan troglodytes' recordings with 'Van de Rijt-Plooij, H.' as the recordist (see Figure 3). Each record, which can be played back online, includes the following metadata: the catalog number, the species name, the recording date, the recording geography with map, the latitude/longitude, the media and equipment used, the name of the recordist, the recording length (duration), the recording quality (rated on a five-star scale) and notes. 'Recording Quality' indicates the signal-to-noise ratio, with 5 stars meaning a clear vocalization and very low noise in the recording. Table 2 summarizes, for the recordings of each focal individual, the number of calls of each vocalization type, whether given by the focal individual or by other individuals vocalizing during those recordings. The table thus gives an indication of the frequency of the various call types. In this count, no distinction was made between the calls of the focal individual and those of the other individuals, although the identity of each caller can be determined from the metadata spreadsheet (see below). As a result, the table also includes calls of older individuals: the top part corresponds to the list of infant vocalizations, and the bottom part contains 7 call types that are typically given by older individuals. It is striking that the total frequencies of the call types 'Grunt', 'Ho', 'Hoo', 'Hoocall', 'oo' and 'Tonalgrunt' are quite high, which is promising for a study comparing the grunts of human and chimpanzee infants.
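The pooled counting behind Table 2 can be sketched as follows. This is a hypothetical illustration, not the authors' procedure: it assumes each call has already been extracted from the metadata as a (focal individual, caller, call type) triple, and the function name `tally_call_types` and the individuals `InfantY` and `MotherX` are invented for the example. The default reproduces the pooling described above; `focal_only=True` keeps only the focal individual's own calls, which the caller identities in the metadata make possible.

```python
from collections import Counter, defaultdict


def tally_call_types(triples, focal_only=False):
    """Count call types per focal individual.

    triples: iterable of (focal_individual, caller, call_type), one per call.
    focal_only=False pools all callers, as in Table 2; True keeps only
    calls made by the focal individual itself.
    """
    table = defaultdict(Counter)
    for focal, caller, call in triples:
        if focal_only and caller != focal:
            continue
        table[focal][call] += 1
    return table


# Hypothetical triples from one focal individual's recordings.
triples = [
    ("InfantY", "InfantY", "Grunt"),
    ("InfantY", "MotherX", "Hoo"),  # mother's call, heard during InfantY's recording
    ("InfantY", "InfantY", "Grunt"),
]
pooled = tally_call_types(triples)
print(pooled["InfantY"])  # Counter({'Grunt': 2, 'Hoo': 1})
```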
Metadata for all the immature individuals, cross-referenced by Macaulay catalog number, have been submitted to Dryad (Data Citation 2) so that users can search for specific recordings beyond the capabilities currently provided by the Macaulay Library web interface. The first metadata file is a spreadsheet (All Infants_3July2014final.xlsx) that includes the name(s) of the vocalizing individual(s), the vocalization, the behaviour and other details. Its first column contains the Macaulay catalog number, which links each row to the Macaulay database. The spreadsheet contains essentially the same information as the Macaulay database, but with the columns arranged slightly differently. From left to right, the columns are: 'Macaulay catalog number'; 'Recording Device'; 'Focal individual'; 'Focal Birthdate'; 'Recordist record number'; 'Level of Recording' (as selected on the Nagra sound recorder); 'Quality outstanding', where an x marks a recording that is outstanding for one of various reasons (for example, a very clear, good-quality recording; a vocalization without other, simultaneous vocalizations; a good example of the development of infant vocalizations; a nice mother-infant interaction; or an illustration of how the infant 'follows' the vocalizations of the mother); the 'Month', 'Day' and 'Year' of the recording; 'Other individuals Vocalizing' in the recording of the focal individual; the 'Date' of the recording; 'Focal age' (the age in years of the focal individual on the recording date); 'Individual(s) with sound/call type'; 'Observation of context' (the behaviors surrounding the vocalizations); 'Macaulay Library Public Notes'; 'Microphone'; 'Recorder'; and 'Tape Speed'.
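To select recordings from the spreadsheet programmatically, one option is to export it to CSV and filter the rows with Python's standard `csv` module. The sketch below assumes that the exported header row uses the column names listed above exactly ('Macaulay catalog number', 'Focal individual', 'Quality outstanding'); the function name, the CSV export step and the catalog numbers in the sample are hypothetical, so check the actual headers in the file before relying on it.

```python
import csv
import io


def outstanding_recordings(csv_file, focal):
    """Catalog numbers of recordings of `focal` marked with an x under 'Quality outstanding'.

    csv_file: an open text file (or file-like object) holding the spreadsheet
    exported as CSV with its original header row.
    """
    selected = []
    for row in csv.DictReader(csv_file):
        # Empty cells arrive as "" and missing trailing cells as None.
        name = (row.get("Focal individual") or "").strip()
        flag = (row.get("Quality outstanding") or "").strip().lower()
        if name == focal and flag == "x":
            selected.append(row["Macaulay catalog number"].strip())
    return selected


# Tiny hypothetical sample standing in for the real export.
sample = io.StringIO(
    "Macaulay catalog number,Focal individual,Quality outstanding\n"
    "100001,InfantY,x\n"
    "100002,InfantY,\n"
    "100003,InfantZ,x\n"
)
print(outstanding_recordings(sample, "InfantY"))  # ['100001']
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`.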
As described in the Usage Notes, the grammar of the 'Individual(s) with sound/call type' column preserves the order in which individuals vocalized. When several individuals called, this shows who initiated the calling. This is important because it shows that the vocalizations of others often triggered infants to vocalize. In the column 'Observation of the context and behaviors surrounding the vocalizations', the presence of nearby individuals was also noted, even if they did not vocalize.
Furthermore, the Dryad data package includes the unparsed digital copies of the chimpanzee tapes (the source analog reel-to-reel media, which the Macaulay Library converted to 96 kHz/24-bit files) and two additional data files. The first is the Gombe biography (Gombe_biography-for_1971-3.xls) for the chimpanzees present during the period in which the recordings were made. It contains 13 columns, of which the most useful for analyzing the vocalizations are the name of the individual (column B), the birth date (column C), the name of the mother (column H) and the sex of the individual (column I). These columns are self-explanatory; the other columns are explained in Strier et al. 45. The second file is a self-explanatory list of the names of infant vocalizations (List of names of infant vocalizations.doc) as used in the spreadsheet (All Infants_3July2014final.xlsx) and the Macaulay database.
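The four biography columns named above can be collected into a lookup table keyed by individual. This sketch assumes the sheet's data rows have already been read as lists of cell values in spreadsheet column order (for example after exporting the .xls file to CSV), with the header row removed; `col_index` and `biography_lookup` are illustrative helpers, not part of the data package, and the sample row is invented.

```python
def col_index(letters):
    """Spreadsheet column letters -> zero-based index ('A' -> 0, 'I' -> 8, 'AA' -> 26)."""
    idx = 0
    for ch in letters.upper():
        idx = idx * 26 + (ord(ch) - ord("A") + 1)
    return idx - 1


# Columns B (name), C (birth date), H (mother) and I (sex) of the biography.
NAME, BIRTH, MOTHER, SEX = (col_index(c) for c in "BCHI")


def biography_lookup(rows):
    """Map each individual's name to its birth date, mother and sex.

    rows: data rows of the biography sheet (header removed), each a list of
    cell values in spreadsheet column order. Values are returned unconverted.
    """
    lookup = {}
    for row in rows:
        if len(row) <= SEX or not row[NAME]:
            continue  # skip short or unnamed rows
        lookup[row[NAME]] = {
            "birth_date": row[BIRTH],
            "mother": row[MOTHER],
            "sex": row[SEX],
        }
    return lookup


# Invented 13-column row: A, B=name, C=birth date, D-G, H=mother, I=sex, J-M.
row = ["1", "InfantY", "1971-05-01", "", "", "", "", "MotherX", "F", "", "", "", ""]
print(biography_lookup([row])["InfantY"]["mother"])  # MotherX
```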

Technical Validation
The 'Quality' rating of the sound recordings in the Macaulay Library is an informal and rough indication of the ratio of signal power to noise power (SNR). Five stars means that the recording has an SNR of 50:1 (3.2% of the 1,248 recordings received this rating); four stars means an SNR of roughly 40:1 (14.4%); three stars, roughly 30:1 (29.0%); two stars, roughly 20:1 (34.6%); and one star, less than 10:1 (18.8%). Figure 4 shows the frequency distribution of the absolute number of recordings (y-axis) over the SNR expressed in number of stars (x-axis).
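Because the stars encode a power ratio, the nominal SNR of each rating can be expressed in decibels as 10 log10(ratio). The sketch below tabulates the values and shares given above. Treating 10:1 as the one-star value is an assumption, since the text gives it only as an upper bound, and the per-rating counts are estimates back-calculated from the rounded percentages, not the catalogue's actual counts.

```python
import math

# Nominal SNR (signal power : noise power) per star rating, from the text.
# One star is "less than 10:1", so 10 is used here only as an upper bound.
STAR_SNR = {5: 50, 4: 40, 3: 30, 2: 20, 1: 10}
# Percentage of the 1,248 recordings given each rating.
STAR_SHARE = {5: 3.2, 4: 14.4, 3: 29.0, 2: 34.6, 1: 18.8}
N_RECORDINGS = 1248


def snr_db(power_ratio):
    """Convert a signal-to-noise power ratio to decibels."""
    return 10 * math.log10(power_ratio)


for stars in sorted(STAR_SNR, reverse=True):
    approx_n = round(N_RECORDINGS * STAR_SHARE[stars] / 100)  # estimate only
    print(f"{stars} stars: {snr_db(STAR_SNR[stars]):5.1f} dB, ~{approx_n} recordings")
```

On this reading the nominal ratings span only about 7 dB, from roughly 17 dB (five stars) down to 10 dB (one star).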
Nearly all the sound recordings were collected by Hetty van de Rijt-Plooij.
It was impossible to test inter-observer reliability in the field, because it was difficult for two observers both to occupy the best observation spot. However, intra-observer reliability can be tested through repeated observations and transcriptions of the same recordings. For a short period during our study, a portable video recorder and a monitor were available in Gombe. One of the infants was videotaped for 13 min during an episode that was as 'difficult' as possible: a number of other individuals were present and the focal infant was playing with another infant. We would expect this episode to represent a worst case for intra-observer reliability.
For the intra-observer reliability test, a verbal commentary on the videotaped episode was spoken onto an audiotape on two successive evenings, in terms of a list of behaviour categories. The two audiotapes were transcribed one week apart. The analysis was done in two steps. First, an overall measure of reliability was calculated: the total number of behaviour category combinations was counted. (If two or more behaviour categories occur simultaneously, they form a combination.) This number depends not only on the behaviour sequence of the infant but also on the behaviour changes of the other individuals, which makes it quite a sensitive measure. By this measure, intra-observer reliability is high: the first transcription had 215 combinations and the second 217, giving a reliability of 0.99.
Table 2. Counts of call types by focal individuals. In the counting process, no discrimination was made between the calls of the focal individual and the calls of the other individuals who were also vocalizing during the recordings of the focal individual. Total row: 24, 265, 698, 81, 208, 135, 228, 5, 139, 98, 296, 65, 505, 177, 302, 49, 114 (one value per focal individual); overall total 3,389.
Second, the infant's behaviour sequence from the first test was compared with that from the second test. For every 15-second period, the frequency of newly started behaviour categories was established. The behaviour categories of the two corresponding 15-second periods were then compared, and the number of behaviour categories that were the same was counted. These numbers were summed over all 15-second periods to give S. The total number of behaviour categories, T (S plus the categories that were dissimilar), was taken as the sum, over all 15-second periods, of the higher of the two tests' frequencies. The intra-observer reliability R was calculated as R = 100 S/T. The result, 76%, is deemed acceptable 46,47, given that the test had been made as difficult as possible and that the mistakes were minor or subtle. The dissimilarities had roughly three causes. First, in many cases the sequence was the same, except that a behaviour category combination was transcribed just before a 15-second time marker in one test and just after the same marker in the other. This happened because the verbal commentary lagged slightly behind when a lot happened at once, and the size of this lag varied from test to test.
Second, in some cases the categories recorded in test 1 differed from, but were related to, those recorded in test 2, for instance 'looks at' versus 'glimpse'. Here a judgement of duration is crucial, and mistakes in marginal cases are understandable.
Third, some behaviour categories were recorded in one test and missed in the other. Most of these were short-lasting categories such as glimpse, yawning, self-scratching and shaking. Given the nature of these mistakes, we are confident that the intra-observer reliability is high enough to use the data for further analysis.
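Both reliability measures can be written out in a few lines. This is a sketch, not the authors' code. The text does not state the formula behind the overall figure, so `count_agreement` assumes the smaller count divided by the larger, which reproduces 215/217 ≈ 0.99. `sequence_reliability` follows the description of S and T: for each 15-second period it counts the newly started categories present in both tests (S) and the larger of the two tests' counts (T), then returns R = 100 S/T. The behaviour lists in the example are invented.

```python
from collections import Counter


def count_agreement(n1, n2):
    """Overall agreement between two totals: smaller / larger (assumed formula)."""
    if n1 == n2:
        return 1.0
    return min(n1, n2) / max(n1, n2)


def sequence_reliability(test1, test2):
    """R = 100 * S / T over aligned 15-second periods.

    test1, test2: one list per 15-second period with the behaviour categories
    newly started in that period, as transcribed in each test.
    """
    if len(test1) != len(test2):
        raise ValueError("both tests must cover the same 15-second periods")
    s = t = 0
    for p1, p2 in zip(test1, test2):
        s += sum((Counter(p1) & Counter(p2)).values())  # categories in both tests
        t += max(len(p1), len(p2))                      # larger of the two counts
    return 100 * s / t if t else 100.0


print(round(count_agreement(215, 217), 2))  # 0.99

# Invented example: 'looks at' vs 'glimpse' disagree; 'play' and 'yawning' agree.
test1 = [["looks at", "play"], ["yawning"]]
test2 = [["glimpse", "play"], ["yawning"]]
print(round(sequence_reliability(test1, test2), 1))  # 66.7
```

Under this scoring, a category transcribed just before a 15-second marker in one test and just after it in the other counts as a mismatch in both periods: `sequence_reliability([["yawning"], []], [[], ["yawning"]])` returns 0.0, which shows how the commentary lag lowers R.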

Usage Notes
The age of the focal individuals at the moment of a recording was calculated by subtracting the date of birth (taken from the Gombe biography file) from the date of recording.
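The age calculation can be sketched with Python's `datetime` module. Expressing the difference in years by dividing the number of days by 365.25 is an assumption made here for illustration; the text does not say how fractional years were computed. The dates in the example are hypothetical.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed convention for fractional years


def age_in_years(birth, recorded):
    """Age of an individual on a recording date, in fractional years."""
    if recorded < birth:
        raise ValueError("recording date precedes birth date")
    return (recorded - birth).days / DAYS_PER_YEAR


# Hypothetical infant born 1 January 1971, recorded 2 July 1972 (548 days later).
print(round(age_in_years(date(1971, 1, 1), date(1972, 7, 2)), 2))  # 1.5
```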
'Individual(s) with sound/call type' (Column N of the metadata spreadsheet 'All Infants_3July2014final.xlsx') gives the names of all vocalizing individuals together with the vocalization(s) they produce. The note 'uncertain' after a name means that the recordist is not quite sure the vocalization came from that individual; 'UN' means 'unknown individual(s)'; 'GEN' means 'General' or 'the whole group'; 'ALL' means all individuals present; 'HUM' means 'human'; 'BAB' means 'baboon'. Names and vocalizations are separated by ',' (comma). This column makes separate cross-references unnecessary. The grammar of column N is as follows: (a) A comma followed by a single space separates vocalizations that follow each other immediately, and also separates the last vocalization of one individual from the name of the next individual in the sequence. All the vocalizations between two names belong to the individual named first. (b) '…' indicates that some time passes between one vocalization and the next. (c) Parenthetical comments after the name of a vocalization, such as '(huu)' (a Dutch diphthong), '(hoo)' or '(soft)', describe how the vocalization sounds or give a qualification or a general remark concerning the sound or the recording process. Where the notes say 'recording needle trembling', the literal translation of the original would be 'recording knob shaking'; because we do not understand how such a knob could shake, the phrase has been translated as 'needle trembling'. (d) 'General' means the whole group.
In Column O ('Observation of Context') of the metadata spreadsheet 'All Infants_3July2014final.xlsx', a more general behavioural context is given for the vocalizations in the recording. Where numbers are used, they refer to the distance categories defined on page 24 of Plooij 27.
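A parser for column N can follow grammar rules (a) to (c) directly. The sketch below is a hypothetical helper, not part of the data package. It assumes that caller names can be recognized by looking them up in a set of known names (for example the names in the Gombe biography file) plus the codes UN, GEN, ALL, HUM and BAB; that 'uncertain' appears either as a parenthetical or as a trailing word after a name; and that parenthetical comments contain no commas. Real entries may deviate from these assumptions, so it is worth checking the parsed call labels against the list of names of infant vocalizations. The example entry and individuals are invented.

```python
import re
from dataclasses import dataclass, field

PAUSE = "…"  # rule (b): time passes before the next vocalization
SPECIAL_CALLERS = {"UN", "GEN", "ALL", "HUM", "BAB"}  # codes listed above
PAREN = re.compile(r"\(([^)]*)\)")


@dataclass
class CallEvent:
    caller: str
    call: str
    caller_uncertain: bool = False
    after_pause: bool = False                  # rule (b)
    notes: list = field(default_factory=list)  # rule (c): parenthetical comments


def parse_column_n(entry, known_names):
    """Split one column-N entry into CallEvents, preserving the calling order."""
    names = set(known_names) | SPECIAL_CALLERS
    # Accept ASCII '...' as well, and make every pause a separate token.
    text = entry.replace("...", PAUSE).replace(PAUSE, f", {PAUSE}, ")
    tokens = [t.strip() for t in text.split(",") if t.strip()]
    events, caller, uncertain, pause = [], None, False, False
    for token in tokens:
        if token == PAUSE:
            pause = True
            continue
        notes = PAREN.findall(token)
        label = PAREN.sub("", token).strip()
        base = label[: -len("uncertain")].strip() if label.endswith(" uncertain") else label
        if base in names:  # rule (a): a name starts a new caller's run of calls
            caller = base
            uncertain = base != label or "uncertain" in notes
            continue
        if caller is None:
            raise ValueError(f"vocalization {label!r} appears before any caller name")
        events.append(CallEvent(caller, label, uncertain, pause, notes))
        pause = False
    return events


entry = "MotherX, Hoo, Hoo (soft), InfantY (uncertain), Grunt … Tonalgrunt"
for ev in parse_column_n(entry, {"MotherX", "InfantY"}):
    print(ev.caller, ev.call, ev.notes, ev.caller_uncertain, ev.after_pause)
```

Keeping each call as a separate event preserves the order of calling, which is what questions about who initiated calling depend on.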