Background & Summary

Chimpanzees resemble humans in many respects1, but differ greatly from us in terms of vocal communication. One of the most striking ways in which human language differs from the vocal communication of our primate relatives is the extent to which vocal signals are learned2, rather than innate. While songbirds3–5, some whales6–11, and even elephant seals12 show evidence for vocal learning, most mammals, including nonhuman primates, have shown little evidence for learning13 (for exceptions see refs 14,15), raising questions about how and why this trait arose in the human lineage5. While captive chimpanzees can learn to use some elements of human language, they cannot effectively imitate speech, and their natural vocal communication shows little evidence of language-like features16.

Recent studies, however, have found intriguing evidence of aspects of chimpanzee vocal communication that resemble features of human language, such as referential signaling17,18 and regional dialects19–21. If dialects cannot be explained simply by regional differences in ecology22, their existence is particularly intriguing, as it suggests that chimpanzees are capable of at least some learned modification of the acoustic structure of their calls. Unfortunately, no study has yet examined how the acoustic structure of these calls changes over the life of an individual, a crucial question for determining whether differences result from vocal learning. Documenting such changes is particularly difficult in chimpanzees, which live long and mature slowly. To date, very little research has been done on the development of chimpanzee vocal communication.

The most extensive effort to study the development of chimpanzee vocalizations remains unfinished. During their dissertation research on the development of behavior, the late Hetty H. C. van de Rijt-Plooij and her husband Frans Plooij recorded vocalizations of chimpanzees in Gombe National Park, Tanzania, from 1971 to 1973, together with notes on their direct observations of chimpanzee behavior and other contextual information surrounding the vocalizations. The couple focused on recording younger individuals but also recorded individuals of all ages. However, having collected a wealth of other data in the field as well, these authors published on the behavioral development of chimpanzee infants and the development of the mother-infant relationship23–36 and did not analyze the vocalizations.

The centrepiece of the data package described in this paper is a collection of sound recordings of vocalizations of infant, juvenile and adolescent chimpanzees now available from the Macaulay Library (Data Citation 1) with extensive metadata (see section ‘Data Records’) for each recording. Supplementary data files are available from Dryad (Data Citation 2). All individuals were recorded longitudinally for nearly 2 years. Table 1 presents the names, birth dates, age class, sex, span of longitudinal recordings in months, and the number of recordings for each individual. The total number of recordings is 1,136. The remaining recordings concern an orphaned chimpanzee infant named Kobi, who was born in what is now the Democratic Republic of Congo and who was temporarily raised by humans in the Gombe National Park camp; he was not part of the Gombe chimpanzee community. For more information see ref. 37.

Table 1 The names, birth dates, age class, sex, span of longitudinal recordings in months, and the number of recordings for each chimpanzee individual recorded in Gombe National Park in the period 1971–1973.

In addition to the recordings collected by van de Rijt and Plooij (1971–1973), recordings of the same population were made by Peter Marler in 1967 (refs 38–40), Charlotte Uhlenbroek in 1991–1993 (ref. 41), and Lisa O’Bryan in 2009–2010 (Ph.D. candidate, University of Minnesota). However, these recordings mainly concern adult individuals. Some of these adults were also recorded as infants or juveniles in the period 1971–1973, and comparing their adult recordings with their infant or juvenile recordings might be an especially effective way of studying vocal development.

Our recordings and behavioral observation notes will be useful to researchers interested in comparative studies and a variety of questions relating to the development, contexts, and bioacoustics of chimpanzee vocal communication. For example, McCune and coworkers42–44 suggest that grunt communication is a developmental phase in human infants with evolutionary significance. Our data set could be used in a study comparing the grunts of human and chimpanzee infants.

Methods

A map of the Gombe National Park, Tanzania, East Africa is shown in Figure 1. Most recordings were made in the feeding area located in the Kakombe valley (see Figure 1b). This was the open place where Jane Goodall fed the chimpanzees1. The recording situation there was as follows. At a distance of 5–15 m the recordist pointed a Sennheiser MKH 815T directional microphone (covered with a windscreen) at the chimpanzees (shaded figures in the foreground of Figure 2) while carrying a running Nagra sound recorder (full track mono, 19.05 cm/s or 7.5 inch/s). Apart from the chimpanzee vocalizations, the verbal commentary of the recordist was recorded, before or after the vocalizations. The verbal commentary covered the names of the chimpanzees and the names of the vocalizations they produced, together with a description of the behaviour surrounding the vocalizations. Definitions of the chimpanzee behaviour categories, illustrated with drawings by David Bygott, were published by Plooij27 in Appendix A of his book.

Figure 1

Situation and map of the Gombe National Park (from Plooij27, with permission of the publisher). (a) Map of East Africa indicating the location of the Gombe National Park in Tanzania. (b) The general structure of the Gombe National Park. (c) West-east profile of the park (along line A-B in Figure 1b).

Figure 2

The recording situation of the chimpanzee vocalizations in the Kakombe valley in the feeding area (see Figure 1b). This was the open place where Jane Goodall fed the chimpanzees. At a distance of 5–15 m the recordist (red sweater) was carrying a Nagra sound recorder (full track mono, 19.05 cm/s) and pointed a Sennheiser MKH 815T directional microphone (covered with a windscreen) at the chimpanzees (shaded figures in the foreground). Photo copyright Frans X. Plooij.

After the sound recordings were made, analogue audio specimens were selected from the tape and coupled with metadata that consisted of the transcriptions of the verbal commentary in Dutch and a number of other information items that are described under Data Records. The analogue audio specimens were created by listening to the original recordings and cutting out the stretches of tape containing chimpanzee vocalizations. The stretches of tape were glued together and stored on 28 reels totalling 10 h of chimpanzee vocalizations: 20 reels concerned 17 specific young individuals (one or more separate reels for each individual) and 8 reels concerned adult individuals.

In 2010 the analogue audio specimens were digitized and provided together with the audiotapes to the Macaulay Library. The transcriptions of the verbal commentary were translated from Dutch to English. These transcriptions and associated metadata (see Data Records) were entered into spreadsheets (one per individual) and then into the Macaulay Library database (see Data Citation 1).

Data Records

The 1,248 audio specimens at the Macaulay Library can be accessed directly via Data Citation 1 or by searching for ‘chimpanzee’ and ‘Pan troglodytes’ recordings with ‘Van de Rijt-Plooij, H.’ as the recordist (see Figure 3). Each record, which can be played back online, includes the following metadata: the catalog number, the species name, the recording date, the recording geography with map, the latitude/longitude, the media and equipment used, the name of the recordist, the recording length (duration), the recording quality (rated according to a five-star system) and notes. ‘Recording Quality’ indicates the signal-to-noise ratio, with 5 stars meaning a clear vocalization and very low noise in the recording. For a further specification of the measurement behind the five-star system, see the Technical Validation section. Notes include the names of the vocalizing individual(s) together with the vocalization(s) of each individual and the behaviour and situation surrounding the vocalizations. Many recordings contain multiple calls by multiple animals, so the overall sample of calls is considerably larger than the number of recordings.

Figure 3

A screenshot of a Macaulay Library website search result. Clicking on the Macaulay Library catalog number (i.e., 161,800) opens a page that plays the audio automatically and shows the recording’s full set of metadata. Clicking on the red triangle plays the audio; clicking on the waveform icon brings up the audio file in RavenViewer.

Table 2 summarizes the number of each type of vocalization given by each focal individual or by the other individuals vocalizing during the recordings of that focal individual. This table gives an indication of the frequency of the various call types. In the counting process for this summary, no distinction was made between the calls of the focal individual and the calls of the other individuals who were also vocalizing during the recordings of the focal individual, but the identities of the callers can be distinguished in the metadata table (see below). Therefore, the top part of the table corresponds with the list of infant vocalizations and the bottom part contains 7 call types that are typically given by older individuals. It is striking that the total frequencies of the call types ‘Grunt’, ‘Ho’, ‘Hoo’, ‘Hoocall’, ‘oo’, and ‘Tonalgrunt’ are quite high. This is promising for a study comparing the grunts of human and chimpanzee infants.

Table 2 Counts of call types by focal individuals.

Metadata for all the immature individuals, cross-referenced by Macaulay catalog number, have been submitted to Dryad (Data Citation 2) in order to allow users to search for specific recordings beyond the capabilities currently provided by the Macaulay Library web interface. The first file of these metadata is a spreadsheet (All Infants_3July2014final.xlsx) and includes the name(s) of the vocalizing individual(s), the vocalization, the behaviour, and other details. The first column of the spreadsheet contains the Macaulay catalog number and that is the link to the Macaulay database. The spreadsheet is basically the same as the Macaulay database except that the columns are organized in a slightly different way. From left to right the following columns can be found: ‘Macaulay catalog number’, ‘Recording Device’, ‘Focal individual’, ‘Focal Birthdate’, ‘Recordist record number’, the ‘Level of Recording’ as selected on the Nagra sound recorder, the ‘Quality outstanding’ column where an x indicates a recording that is outstanding for various reasons (such as a very clear, good-quality recording, a recording where the vocalization is without other, simultaneous vocalizations, a recording that is a good example of the development of infant vocalizations, a recording showing nice mother-infant interaction, a recording that illustrates how the infant ‘follows’ the vocalizations of the mother), the ‘Month’, ‘Day’ and ‘Year’ of the recording, the ‘Other individuals Vocalizing’ in the recording of the Focal Individual, the ‘Date’ of the recording, the ‘Focal age’ (the age in years of the focal individual at the date of recording), the ‘Individual(s) with sound/call type’, the ‘Observation of context’ and behaviors surrounding the vocalizations, the ‘Macaulay Library Public Notes’ field, the ‘Microphone’, the ‘Recorder’, and the ‘Tape Speed’. 
As is described in the Usage Notes section, the grammar of the column containing individual(s) with sound/call type is such that the sequence of vocalizing is preserved. This gives information on who initiated calling, if several individuals called. This is important because it shows that vocalizations of others often triggered infants to vocalize. In the column ‘Observation of the context and behaviors surrounding the vocalizations’ the presence of nearby individuals was also noted, even if they did not vocalize.
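As a rough illustration of the kind of search the spreadsheet allows, the sketch below filters rows by focal individual and by the ‘Quality outstanding’ flag and returns the matching Macaulay catalog numbers. It assumes the spreadsheet has been exported to CSV with the column headings quoted above; the rows, catalog numbers and individual names in the sample are invented placeholders, not real records.

```python
import csv
import io

# Hypothetical rows standing in for a CSV export of
# 'All Infants_3July2014final.xlsx'. The column headings follow the text
# above; every value below is invented for illustration only.
SAMPLE_CSV = """Macaulay catalog number,Focal individual,Quality outstanding,Focal age,Individual(s) with sound/call type
900001,INFANT_A,x,0.8,INFANT_A Grunt
900002,INFANT_A,,1.1,"MOTHER_A Hoo, INFANT_A Hoo"
900003,INFANT_B,x,2.4,INFANT_B Ho
"""


def select_recordings(rows, focal, outstanding_only=False):
    """Return catalog numbers of recordings of one focal individual."""
    selected = []
    for row in rows:
        if row["Focal individual"] != focal:
            continue
        # An 'x' marks a recording flagged as outstanding.
        if outstanding_only and row["Quality outstanding"].strip() != "x":
            continue
        selected.append(row["Macaulay catalog number"])
    return selected


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(select_recordings(rows, "INFANT_A"))
print(select_recordings(rows, "INFANT_A", outstanding_only=True))
```

The returned catalog numbers can then be used to look up the corresponding audio in the Macaulay Library.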

Furthermore, the Dryad data package includes the unparsed digital copies of the chimpanzee tapes (the source analog reel-to-reel media that the Macaulay Library converted to 96 kHz/24-bit files) and two additional data files. One file is the Gombe biography (Gombe_biography-for_1971-3.xls) for the chimpanzee individuals present during the span of time that the recordings were made. The Gombe biography contains 13 columns, but the ones most useful for analyzing the vocalizations are the name of the individual (column B), the birth date (column C), the name of the mother (column H) and the sex of the individual (column I). These columns are self-explanatory. The other columns are explained in Strier et al.45 for those who are interested in them. The second file is a self-explanatory list of names of infant vocalizations (List of names of infant vocalizations.doc) as used in the spreadsheet (All Infants_3July2014final.xlsx) and the Macaulay database.

Technical Validation

The ‘Quality’ of the sound recordings in the Macaulay Library is an informal and rough indication of the ratio of signal power to noise power (SNR). Five stars means that the recording has an SNR of 50:1 (3.2% of the 1,248 recordings were given this rating); four stars means an SNR of roughly 40:1 (14.4%); three stars conveys an SNR of roughly 30:1 (29.0%); two stars points to an SNR of roughly 20:1 (34.6%); and one star indicates an SNR of less than 10:1 (18.8%). The frequency distribution of the absolute number of recordings (y-axis) over the SNR expressed in number of stars (x-axis) is given in Figure 4.

Figure 4

The frequency distribution of the absolute number of recordings (y-axis) over the ratio of signal power to noise power (SNR) expressed in number of stars (x-axis). The average rating over the 1,248 recordings is 2.49 stars.
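The average of 2.49 can be checked against the percentages reported for each star rating. The following is a minimal sketch, assuming that the average is a mean of the star ratings weighted by those percentages; the variable and function names are illustrative.

```python
# Share of the 1,248 recordings (in percent) per star rating,
# as reported in the Technical Validation text.
STAR_SHARES = {5: 3.2, 4: 14.4, 3: 29.0, 2: 34.6, 1: 18.8}


def mean_star_rating(shares):
    """Mean star rating, weighting each rating by its share of recordings."""
    total_share = sum(shares.values())
    weighted_sum = sum(stars * share for stars, share in shares.items())
    return weighted_sum / total_share


mean_rating = mean_star_rating(STAR_SHARES)
print(round(mean_rating, 2))  # 2.49
```

Because the percentages are rounded to one decimal place, the result agrees with the reported value only to about two decimal places.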

Nearly all the sound recordings were collected by Hetty van de Rijt-Plooij.

It was impossible to test inter-observer reliability in the field, because it was difficult for two observers both to be in the best observing spot. However, intra-observer reliability can be tested over repeated observations and transcriptions of the same recordings. For a short period during our study a portable video recorder was available in Gombe, together with a monitor. One of the infants was videotaped for 13 min during an episode that was as ‘difficult’ as possible: a number of other individuals were present and the focal infant was playing with another infant. We would expect this episode to represent a worst case for intra-observer reliability.

The intra-observer reliability test was carried out on two successive evenings by speaking a verbal commentary, in the form of a list, onto an audiotape. The audiotapes were transcribed with an interval of one week between successive transcriptions. The analysis was done as follows:

First, an overall measure of reliability was calculated. The total number of behaviour category combinations was counted. (If two or more behaviour categories occur simultaneously, these form a combination.) This number depends not only on the behaviour-sequence of the infant but also on the behaviour changes in the other individuals. Therefore, it is quite a sensitive measure. According to this measure, intra-observer reliability is high: the first transcription had 215 combinations versus 217 for the second. This gives a reliability of 0.99.
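The text does not spell out how the value of 0.99 was derived from the two counts. One reading that reproduces it, shown here purely as an assumption, is the ratio of the smaller count to the larger one:

```python
def count_agreement(count_first, count_second):
    """Ratio of the smaller to the larger count (assumed measure, see lead-in)."""
    larger = max(count_first, count_second)
    if larger == 0:
        raise ValueError("at least one count must be positive")
    return min(count_first, count_second) / larger


# Counts of behaviour category combinations in the two transcriptions.
overall_reliability = count_agreement(215, 217)
print(round(overall_reliability, 2))  # 0.99
```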

Second, the infant behaviour sequence of the first test was compared with that of the second test. For every 15-second period the frequency of newly started behaviour categories was established. Thereafter, the behaviour categories of the two corresponding 15-second periods were compared and the number of behaviour categories that were the same was counted. The numbers for all the 15-second periods were added together and the total was called S. The total number of behaviour categories (S plus the categories that were dissimilar) was taken as the sum, over all 15-second periods, of the frequency from whichever test had the higher number of units in that period (T). The intra-observer reliability (R) was calculated according to the formula R = 100 × S/T. The result of 76% is deemed acceptable46,47, as the test had been made as difficult as possible and the mistakes made were minor or subtle. The causes of the dissimilarities were roughly threefold.

First, in many cases the sequence was the same, except that a behaviour category combination was transcribed just before a 15-second time marker in one test and just after the same marker in the other. The reason for this was that the verbal commentary lagged behind a little when a lot happened at once and, furthermore, this lag varied from test to test.

Second, there are some examples where the categories in test 1 were dissimilar but related to the categories recorded in test 2. For instance, ‘looks at’ versus ‘glimpse’. Here a judgement of the duration is crucial and it is understandable that mistakes are made in marginal cases.

Third, behaviour categories were recorded in one test and missed in the other. Mostly, these were short-lasting categories such as glimpse, yawning, self-scratching, shaking. Taking these kinds of mistakes into consideration we feel confident that the intra-observer reliability is high enough to use the data for further analysis.
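The sequence measure R = 100 × S/T described above can be sketched as follows. This is an illustration under stated assumptions rather than the original procedure: each test is represented as a list of 15-second periods, each period as a list of newly started behaviour categories, and repeated categories within a period are matched one-for-one. The example data are invented; they reproduce the first kind of dissimilarity, with ‘glimpse’ falling just before a time marker in one test and just after it in the other.

```python
from collections import Counter


def sequence_reliability(test1_periods, test2_periods):
    """Return R = 100 * S / T for two transcriptions of the same episode."""
    if len(test1_periods) != len(test2_periods):
        raise ValueError("both tests must cover the same number of periods")
    same = 0   # S: categories recorded in both tests within a period
    total = 0  # T: per period, the larger of the two category counts
    for first, second in zip(test1_periods, test2_periods):
        shared = Counter(first) & Counter(second)  # multiset intersection
        same += sum(shared.values())
        total += max(len(first), len(second))
    if total == 0:
        raise ValueError("no behaviour categories were recorded")
    return 100.0 * same / total


# Invented example: three 15-second periods per test.
test1 = [["looks at", "glimpse"], ["yawning"], ["self-scratching", "shaking"]]
test2 = [["looks at"], ["glimpse", "yawning"], ["self-scratching", "shaking"]]

reliability = sequence_reliability(test1, test2)
print(f"R = {reliability:.1f}%")  # S = 4, T = 6
```

In this example a single category shifted across a time marker costs two units of T without adding to S, which shows how small timing differences in the commentary lower R even when the recorded sequence is the same.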

Usage Notes

The age of the focal individuals at the moment of a recording was calculated by subtracting the date of birth (taken from the Gombe biography file) from the date of recording.
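A minimal sketch of this calculation is given below. The divisor of 365.25 days per year is an assumption, since the text does not state how the difference in dates was converted to years, and the dates in the example are invented.

```python
from datetime import date

# Assumption: average year length used to convert days to years.
DAYS_PER_YEAR = 365.25


def focal_age_years(birth_date, recording_date):
    """Age of the focal individual, in years, on the date of recording."""
    if recording_date < birth_date:
        raise ValueError("recording date precedes birth date")
    return (recording_date - birth_date).days / DAYS_PER_YEAR


# Invented dates, for illustration only.
example_age = focal_age_years(date(1970, 1, 1), date(1972, 1, 1))
print(f"{example_age:.2f}")  # 2.00
```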

‘Individual(s) with sound/call type’ (Column N of the metadata spreadsheet ‘All Infants_3July2014final.xlsx’) gives the names of all vocalizing individuals together with the vocalization(s) they produce. A note of ‘uncertain’ behind a name means the recordist is not quite sure the vocalization came from that individual; ‘UN’ means ‘unknown individual(s)’; ‘GEN’ means ‘General’ or ‘the whole group’; ‘ALL’ means all individuals present; ‘HUM’ means ‘human’; ‘BAB’ means ‘baboon’. The names plus vocalizations are separated by ‘,’ (comma). This column makes ‘cross-references’ superfluous. The grammar of column N is as follows:

  (a) A comma followed by a single space separates vocalizations that follow each other immediately, and also separates the last vocalization of one individual from the name of the next individual in the sequence. All the vocalizations between two names belong to the individual of the first name.

  (b) ‘…’ indicates that some time passes between one vocalization and the next.

  (c) Parenthetical comments, such as ‘(huu)’ (a Dutch diphthong), ‘(hoo)’, ‘(soft)’ or other remarks after the name of the vocalization, describe how the vocalization sounds or give a qualification or a general remark concerning the sound or the recording process. Whenever it says ‘recording needle trembling’, the literal translation of the original note would be ‘recording knob shaking’. However, because we do not understand how such a knob can shake, it is translated instead as ‘needle trembling’.

  (d) ‘General’ means: the whole group.
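The grammar of column N can be turned into a simple parser that preserves the calling sequence. The sketch below rests on several assumptions that are not stated in the text: that each name is followed by a space and then the first vocalization, that a list of known individual names is supplied by the user, that parenthetical comments contain no commas, and that the ‘uncertain’ note is not handled. The example cell and the names MOTHER and INFANT are invented placeholders; the call types are taken from Table 2.

```python
import re

# Special codes listed in the Usage Notes.
SPECIAL_CODES = {"UN", "GEN", "ALL", "HUM", "BAB"}


def parse_column_n(cell, known_names):
    """Parse one column-N cell into a list of call events, in calling order."""
    names = set(known_names) | SPECIAL_CODES
    events = []
    current = None       # individual to whom the following calls belong
    after_pause = False  # True when '…' precedes the next call (rule b)
    # Split on rule (a) separators and rule (b) pause markers, keeping both.
    for token in re.split(r"(…|, )", cell):
        token = token.strip()
        if not token or token == ",":
            continue
        if token == "…":
            after_pause = True
            continue
        first_word, _, rest = token.partition(" ")
        if first_word in names:  # a new individual starts calling
            current, token = first_word, rest.strip()
            if not token:
                continue
        # Rule (c): an optional trailing comment in parentheses.
        match = re.fullmatch(r"(.*?)\s*(?:\((.*)\))?", token)
        events.append({
            "individual": current,
            "call": match.group(1),
            "comment": match.group(2),
            "after_pause": after_pause,
        })
        after_pause = False
    return events


# Invented example cell, for illustration only.
cell = "MOTHER Grunt, INFANT Hoo (soft), Ho … Tonalgrunt, UN Grunt"
events = parse_column_n(cell, known_names={"MOTHER", "INFANT"})
for event in events:
    print(event)
```

Because the order of events is preserved, the first event identifies the individual that initiated calling, which is the information needed to examine whether infants vocalized in response to others.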

In Column O (‘Observation of Context’) of the metadata spreadsheet ‘All Infants_3July2014final.xlsx’ a more general behavioural context is given of the vocalizations involved in the recording. Whenever numbers are used, these refer to the distance categories as defined on page 24 of Plooij27.

Additional information

How to cite this article: Plooij, F. X. et al. Longitudinal recordings of the vocalizations of immature Gombe chimpanzees for developmental studies. Sci. Data 1:140025 doi: 10.1038/sdata.2014.25 (2014).