Rhythmic synchronization tapping to an audio–visual metronome in budgerigars

Hasegawa, Ai; Okanoya, Kazuo; Hasegawa, Toshikazu; Seki, Yoshimasa

doi:10.1038/srep00120

Article
Open access
Published: 17 October 2011

Scientific Reports volume 1, Article number: 120 (2011)

Abstract

In all ages and countries, music and dance have constituted a central part of human culture and communication. Recently, vocal-learning animals such as parrots and elephants have been found to share rhythmic ability with humans. Thus, as a first step in understanding the evolution of musical entrainment, we investigated rhythmic synchronization in budgerigars, a vocal-mimicking parrot species, under controlled conditions using a systematically designed experimental paradigm. We trained eight budgerigars to perform isochronous tapping tasks in which they pecked a key to the rhythm of audio–visual metronome-like stimuli. The budgerigars showed evidence of entrainment to external stimuli over a wide range of tempos. They seemed to be inherently inclined to tap at fast tempos, which have a similar time scale to the rhythm of budgerigars’ natural vocalizations. We suggest that vocal learning might have contributed to their performance, which resembled that of humans.

Introduction

Dancing to the rhythm of music is such a natural and spontaneous human behaviour that we are typically unaware that the ability to align movement to external rhythmic stimuli is not ubiquitous in nature. For example, rhesus monkeys (Macaca mulatta) had great difficulty synchronizing their finger taps with both external visual and auditory stimuli in isochronous tapping tasks¹, whereas human children with no musical training could perform similar tasks quite easily². To date, there have been no other scientific attempts to train animals to move in synchrony with external rhythmic stimuli. However, spontaneous motor entrainment to music has been demonstrated in several parrot species and one elephant species^3,4, supporting the vocal learning and rhythmic synchronization hypothesis, which posits that vocal learning provides a neurobiological foundation for auditory–motor entrainment⁵. These reports suggest that humans are not the only spontaneous dancers in the natural world, but they consist of case studies of one individual per species and analyses of YouTube videos recorded under unknown conditions. To examine this topic, we systematically investigated the generation of rhythmic motor patterns synchronized with external stimuli in budgerigars (Melopsittacus undulatus; Fig. 1), a vocal-learning parrot species, using audio–visual metronome-like stimuli instead of complex music. This study is an important first step towards understanding the timing control mechanism in vocal learners.

We trained budgerigars (four males, four females) to perform isochronous tapping tasks. Isochronous tapping is a task originally designed to investigate cognitive timing control in humans, in which a subject taps his or her finger in synchronization with external rhythmic stimuli⁶. In a previous study with rhesus monkeys¹, three members of this species were trained to perform isochronous tapping by an operant conditioning method. The monkeys and human subjects were required to produce four taps to a visual or an auditory metronome followed by three continuation taps in the absence of stimulus presentation. After more than a year of extensive training, the monkeys finally learned the task, but their performance differed from that of humans in that they did not synchronize their movements to the stimulus onset as human subjects did, instead tapping around 300 ms after stimulus onsets. In addition, there was no auditory dominance in their tapping behaviour, while humans performed noticeably better under the auditory condition than under the visual condition. Using similar tasks, we attempted to determine whether the budgerigars’ performance resembles that of humans or if it parallels that of non-vocal-learning rhesus monkeys.

Through an operant conditioning procedure, the budgerigars learned to peck an LED key during the 300-ms stimulus presentation or during an acceptable period before and after it; such responses were counted as ‘hits’. The hit period before stimulus onset was 20% of the inter-stimulus onset interval (IOI) (under the 450-ms- and random-IOI conditions, it was 50 ms and 180 ms, respectively), whereas the hit period after onset was 20% of the IOI plus the 300-ms stimulus presentation (Fig. 2; see Methods). We tested the birds under seven IOI conditions: 450 ms, 600 ms, 900 ms, 1,200 ms, 1,500 ms, 1,800 ms and random. The random-IOI condition, in which three possible IOIs (900 ms, 1,200 ms and 1,500 ms) appeared in random order, served to obtain the estimated reaction time (ERT) to the stimuli. Under each condition, the birds were required to produce 50 key-pecking sequences, each consisting of six successive hits, and we analysed the birds’ tapping behaviour during these sequences. All birds learned to perform the tapping task (see Supplementary Videos S1 and S2 online for examples). Additionally, we recorded the warble songs of our subjects and compared the intervals between warble-song elements with the IOIs at which the birds produced precise intervals in the tapping tasks.
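The hit-window arithmetic above can be made concrete. The sketch below is ours, not the authors' control software; it assumes the window runs from the acceptable period before onset to an equally long period after the 300-ms stimulus ends, and it reports each window as a share of the stimulus cycle. The names `hit_window_ms` and `acceptable_period_ms` are hypothetical.

```python
STIMULUS_MS = 300  # duration of each audio-visual stimulus

# Acceptable periods that depart from the 20% rule, as stated in the text.
# The key "random" stands for the random-IOI condition.
SPECIAL_ACCEPTABLE_MS = {450: 50, "random": 180}


def acceptable_period_ms(ioi):
    """Acceptable period before onset (and after offset) for an IOI condition."""
    if ioi in SPECIAL_ACCEPTABLE_MS:
        return SPECIAL_ACCEPTABLE_MS[ioi]
    return ioi * 20 / 100  # 20% of the IOI


def hit_window_ms(ioi):
    """Hit window (start, end) in ms, relative to a stimulus onset at 0 ms."""
    margin = acceptable_period_ms(ioi)
    return (-margin, STIMULUS_MS + margin)


for ioi in (450, 600, 900, 1200, 1500, 1800):
    start, end = hit_window_ms(ioi)
    share = (end - start) / ioi
    print(f"IOI {ioi} ms: {start:.0f} to {end:.0f} ms ({share:.0%} of the cycle)")
```

Under this reading, the window covers about 89% of the cycle at the 450-ms IOI but only about 57% at 1,800 ms.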

The present article describes two datasets obtained in the tapping experiments: (1) the time difference between tap onset and stimulus onset (asynchrony), which we used to analyse key-pecking timing with circular statistics; and (2) the length of inter-response intervals (IRIs). We first offer evidence for entrainment in budgerigars by comparing the real birds’ performance with that of various simulated birds. We then clarify the behavioural difference between tapping under fast-tempo (450 ms and 600 ms) and slow-tempo (900 ms, 1,200 ms, 1,500 ms and 1,800 ms) conditions and discuss this difference from an evolutionary perspective by analysing the budgerigars’ warble songs.

Results

We investigated whether budgerigars are capable of motor ‘entrainment’ to an audio–visual and, ultimately, to an auditory metronome. The essential element of entrainment, a phenomenon also referred to as beat perception and synchronization (BPS)⁵, is that a subject extracts periodicities from rhythmic stimuli and moves in a consistent phase relationship with them based on temporal anticipation. The temporal relationship between movement and stimuli should be steady when the rhythm is constant.

Using the tap-asynchrony data from the second to the sixth pecks in 50 six-hit sequences (data obtained from female D under the short IOI conditions were excluded; see Data analysis), we confirmed that the movements of real birds in all 46 tapping experiments (6 constant-IOI conditions × 8 individuals − 2 excluded) maintained a consistent phase relationship with the stimuli (Rayleigh test with unspecified mean direction⁷, P < 0.0001), with a peak in asynchrony distribution (see Fig. 3 for examples of two individuals).
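The Rayleigh test above can be sketched as follows, assuming each asynchrony is converted to a phase angle of 2π × asynchrony / IOI on the stimulus cycle. The p-value uses a standard large-sample approximation; the function names and example data are ours and hypothetical, not taken from the study.

```python
import math


def asynchrony_to_angle(asynchrony_ms, ioi_ms):
    """Map a tap-minus-onset asynchrony onto the stimulus cycle, in radians."""
    return 2.0 * math.pi * asynchrony_ms / ioi_ms


def rayleigh_test(angles):
    """Rayleigh test of uniformity against an unspecified mean direction.

    Returns (mean resultant length, Rayleigh Z, approximate p-value).
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    r = math.hypot(c, s)  # resultant length R
    r_bar = r / n         # mean resultant length: 0 (uniform) to 1 (identical angles)
    z = r * r / n         # Rayleigh's Z = n * r_bar**2
    # Large-sample approximation of P(Z >= z) under a uniform distribution.
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - r * r)) - (1 + 2 * n))
    return r_bar, z, min(p, 1.0)


# Hypothetical asynchronies (ms) clustered just before onset at a 900-ms IOI.
asynchronies = [-30, -10, 0, 10, -20] * 50
angles = [asynchrony_to_angle(a, 900) for a in asynchronies]
r_bar, z, p = rayleigh_test(angles)
print(f"R = {r_bar:.3f}, Z = {z:.1f}, p = {p:.2g}")
```

Because the test ignores where the peak lies, it only establishes a consistent phase relationship; whether that phase is stimulus onset requires a test with a specified direction.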

Phase-matched tapping with stimulus onset

A supplemental but nonetheless important feature of entrainment is phase matching (i.e., synchronization) with the beat (in this article, we regard phase matching with the stimulus onset as phase matching with the beat). It is not a necessary condition for entrainment, but it is typically seen in human tapping⁷ and is considered to be the outcome of temporal anticipation and thus likely to represent entrainment.

In our budgerigars, phase-matched tapping with stimulus onset was significant in one of seven subjects under the 450-ms-IOI condition, in two of seven under the 600-ms-IOI condition, in five of eight under the 900-ms-IOI condition, in seven of eight under the 1,200-ms-IOI condition and all eight subjects showed significant synchronized tapping under the 1,500-ms- and 1,800-ms-IOI conditions (Rayleigh test with a specified mean direction of zero⁸, P < 0.01; Table 1, see Fig. 4 for examples under the 1,200-ms-IOI condition).
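The Rayleigh test with a specified mean direction is usually called the V test. A minimal sketch with an expected direction of zero (stimulus onset), assuming the normal approximation for the u statistic, which is reasonable for samples of the size analysed here; the helper names and data are hypothetical.

```python
import math
from statistics import NormalDist


def to_angle(asynchrony_ms, ioi_ms):
    """Phase angle (radians) of an asynchrony on the stimulus cycle."""
    return 2.0 * math.pi * asynchrony_ms / ioi_ms


def v_test(angles, mu0=0.0):
    """V test: are the angles clustered around the expected direction mu0?

    Returns (u, one-sided p) using a normal approximation for u.
    """
    n = len(angles)
    v = sum(math.cos(a - mu0) for a in angles)  # V = R * cos(mean angle - mu0)
    u = v * math.sqrt(2.0 / n)
    p = 1.0 - NormalDist().cdf(u)
    return u, p


# Hypothetical taps near onset (phase-matched) vs. taps half a cycle late.
near_onset = [to_angle(a, 1200) for a in [-30, -10, 0, 10, -20] * 10]
half_cycle = [to_angle(a + 600, 1200) for a in [-30, -10, 0, 10, -20] * 10]
print(v_test(near_onset))  # large positive u, p close to 0
print(v_test(half_cycle))  # large negative u, p close to 1
```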

Table 1 Comparison between real and simulated subjects.

In 6 of 46 tapping experiments, the birds’ movements were phase-matched with stimulus onset but not with the ERT (Supplementary Table S1; see Fig. 4 for examples). The ERT was set at 150 ms on the basis of the average response latency (150.4 ms) and the modal class of the latency frequency distribution histogram under the random-IOI condition (see Methods). In addition, in 32 of 46 experiments, the centre of the range of directions toward which a significant number of pecks were directed (P < 0.01) lay before the ERT (Supplementary Table S1; also see Fig. 4). These results suggest that the birds’ pecks were directed toward stimulus onset rather than toward the ERT.

Negative mean asynchrony

In isochronous tapping tasks performed by humans, taps tend to precede the actual beat by a few tens of milliseconds⁷, a phenomenon known as negative mean asynchrony (NMA). NMA suggests temporal anticipation even more strongly than phase matching with stimulus onset does. We calculated the mean asynchrony of the second through sixth pecks in each of the 50 six-hit sequences under each constant-IOI condition for each individual. We observed NMA under all constant-IOI conditions except the 450-ms-IOI condition (Table 1).

Possibilities other than entrainment

However, mechanisms other than entrainment might also explain these results. Using computer simulations involving 10,000 simulated birds under the six constant-IOI conditions, we examined four possibilities that could have made the birds appear entrained.

First, the birds may have been pecking randomly. To test this possibility, we conducted a Monte Carlo (MC) simulation. Second, the birds may have continued to produce intervals of arbitrary length regardless of the actual IOI. This possibility was tested by the constant-interval (CI) simulation. Third, the birds may have stored the actual IOI length in their short-term memory and reproduced it in a serial fashion without attending to the stimuli. We tested this possibility using the memorised IOI (MI) simulation. Fourth, the birds may have been simply reacting to the stimuli. This possibility was tested by the simple reaction (SR) simulation, in which the simulated birds’ response times were around the ERT of the real subjects (see Methods for details of each simulation).

We analysed the data from each simulation in the same way as the data from the real subjects. We then compared the performance of simulated and real birds in three respects: the rate of phase matching with stimulus onset, the rate of NMA and the number of failures to complete a six-hit sequence (i.e., sequences broken off after one to five successive hits) before 50 six-hit sequences were produced.

Monte Carlo simulation

The probability of observing periodicity in the MC-simulated birds was under 1% (Rayleigh test with unspecified mean direction at the P < 0.01 level). Additionally, the Gini coefficients⁹ of the asynchrony distribution, a measure of inequality in a distribution, were smaller for the simulated subjects than for the real birds under all conditions (Kruskal–Wallis test, P < 0.0001; Dunn’s multiple comparison test, P < 0.001), suggesting that no consistent phase relationship with the stimuli existed (Fig. 3, Supplementary Table S2). Thus, the MC model lacks the crucial prerequisite for entrainment.
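The Gini comparison can be illustrated as follows. The text does not say how the asynchrony distribution was binned, so the 36 equal phase bins (10° each) below are an assumption, as are the function names. The coefficient is 0 when every bin holds the same count and approaches 1 when all pecks fall in one bin.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (mean absolute difference / 2*mean)."""
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        return 0.0
    total_diff = sum(abs(a - b) for a in counts for b in counts)
    return total_diff / (2 * n * n * mean)


def phase_histogram(asynchronies_ms, ioi_ms, n_bins=36):
    """Count asynchronies in n_bins equal bins of the stimulus cycle."""
    counts = [0] * n_bins
    for a in asynchronies_ms:
        phase = (a % ioi_ms) / ioi_ms  # fraction of the cycle, 0 <= phase < 1
        counts[min(int(phase * n_bins), n_bins - 1)] += 1
    return counts


peaked = phase_histogram([-20] * 100, 900)                   # every peck in one bin
flat = phase_histogram([25 * k + 12.5 for k in range(36)], 900)  # one peck per bin
print(gini(peaked), gini(flat))
```

A randomly pecking model spreads its pecks across bins and so yields a lower coefficient than a bird whose pecks pile up at one phase.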

Additionally, the simulated subjects were much less likely to show phase matching with the stimulus onset and NMA (Fisher’s exact test, P < 0.0025 except for the 450-ms-IOI condition under which no NMA was observed in real subjects; Table 1), producing many more failures than the real birds (Wilcoxon signed rank test, P < 0.0125; Table 1). These results indicate that the budgerigars’ periodicity and phase-matched tapping did not occur by chance and that they may have been behaving according to a certain set of rules.

Constant interval simulation

In the CI simulation, the phase-matching rate did not differ from that of the real subjects under four conditions (P > 0.0125; Table 1). However, the Gini coefficients of the asynchrony distribution were smaller for the simulated birds than for the real birds under all conditions (Kruskal–Wallis test, P < 0.0001; Dunn’s multiple comparison test, P < 0.001), suggesting that the phase relationship with the stimuli fluctuated over the course of the trials (Fig. 3, Supplementary Table S2). Additionally, with the exception of the 450-ms-IOI condition, the simulated subjects produced NMA less frequently (P < 0.00025; Table 1) and failed more frequently (P < 0.0125). Thus, the CI model also failed to explain the real birds’ performance.

Memorised IOI simulation

In the MI simulation, the simulated birds’ pecks showed ‘accurate’ periodicity. Additionally, their phase-matching rate did not differ from that of real subjects under four conditions (P > 0.0125; Table 1). However, the smaller Gini coefficients of the asynchrony distribution of the simulated birds under all conditions (Kruskal–Wallis test, P < 0.0001; Dunn’s multiple comparison test, P < 0.001) indicated that the phase relationship with the stimuli did not remain constant over the trials (Fig. 3, Table S2).

Additionally, with the exception of the performance under the 450-ms-IOI condition, the rate at which NMA appeared was lower than that in trials with the real subjects (P < 0.0025). Furthermore, the simulated birds had fewer failures than did the real ones under four conditions (P < 0.0125). Thus, the MI model could not serve as the appropriate model for the real birds’ behaviour.

Simple reaction simulation

It is difficult to distinguish the SR strategy from true entrainment because the SR simulated birds maintained a consistent phase relationship with the stimuli over the course of the trials (Fig. 3).

However, the phase-matching rates of the real and simulated birds differed under the 450–1,200-ms-IOI conditions, although the simulated birds’ pecks could phase-match with stimulus onset under the 1,500-ms- and 1,800-ms-IOI conditions (Table 1). As the IOI lengthened, the hit period became longer, and stimulus onset and the ERT fell within a relatively narrow portion of the stimulus cycle. The circular distribution therefore necessarily concentrated on a period that included both the ERT and stimulus onset. Nevertheless, the responses of one real bird remained phase-matched with stimulus onset but not with the ERT under the 1,500-ms-IOI condition, showing that real and simulated birds differed in the rate at which they phase-matched with the ERT.
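The narrowing can be checked with simple arithmetic: a pure reaction at the 150-ms ERT lands at a phase angle of 360° × 150 / IOI after onset, which shrinks from 120° at 450 ms to 30° at 1,800 ms. At slow tempos, reaction-driven pecks therefore become hard to tell apart from onset-matched ones on the circle. The function name is ours.

```python
ERT_MS = 150  # estimated reaction time from the random-IOI condition


def ert_phase_deg(ioi_ms, ert_ms=ERT_MS):
    """Phase angle (degrees after onset) at which a pure reaction would land."""
    return 360.0 * ert_ms / ioi_ms


for ioi in (450, 600, 900, 1200, 1500, 1800):
    print(f"IOI {ioi} ms: ERT at {ert_phase_deg(ioi):.0f} degrees after onset")
```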

Moreover, NMA was extremely rare in the simulated birds and the simulated subjects had far fewer failures than did the real birds (P < 0.0125).

Taken together, none of the four possible models provided a sufficient explanation for the real birds’ behaviour. The budgerigars’ pecks maintained a consistent phase relationship with the stimuli based on anticipation and this pecking pattern was not the outcome of simple reactions. Consequently, the original idea, that the budgerigars anticipated the stimuli and entrained to the metronome, survived.

Tapping to auditory stimuli

In the main part of our study, we presented the budgerigars with auditory and visual cues simultaneously to facilitate the association between the LED key and the reward as per the operant conditioning protocol^10,11,12. However, it is important to test entrainment to auditory stimuli alone because this is the modality of interest in the vocal learning and rhythmic synchronization hypothesis⁵.

To this end, isochronous tapping to auditory stimuli alone was tested in three female budgerigars. Tapping was measured as they were alternately exposed to IOIs of 600 ms and 1,500 ms. The experiment was repeated twice (see Methods section).

The birds quickly learned to perform the task without special training. Their pecks were phase matched with the stimulus onset in five of the six tapping experiments (Rayleigh test, P < 0.01). This finding indicated that the auditory cue alone was sufficient for entrainment.

Behavioural difference between tempos

As shown, the budgerigars entrained to the metronome at a wide range of tempos and under different modalities. However, their behaviour differed under the fast- and slow-tempo conditions.

First, we calculated the mean tap asynchronies under each IOI condition (see Table S3 for the means and the error analysis). The mean asynchronies under the slow-tempo conditions were smaller than the reaction latencies under the random-IOI condition (N = 35 under the fast-tempo conditions, N = 40 under the slow-tempo and random-IOI conditions; Dunn’s multiple comparison test, P < 0.05), but those under the fast-tempo conditions were not different from those under the random-IOI condition (P > 0.05; Fig. 5, Table S4).

Second, as an indicator of the birds’ tapping variability, we calculated the SDs of the last four IRIs (i.e., the intervals between the last five pecks) of the six-hit sequences under each IOI condition. The SDs were much smaller under the fast-tempo conditions than under the slow-tempo conditions and decreased slightly from the 450-ms to the 600-ms IOI (N = 1,400 for the fast-tempo conditions, N = 1,600 for the slow-tempo conditions; SDs in ms, 450 ms: 87.0; 600 ms: 77.0; 900 ms: 132.8; 1,200 ms: 156.0; 1,500 ms: 188.6; 1,800 ms: 221.2). By contrast, under the slow-tempo conditions, the SDs increased as the IOI increased (see Fig. 6 for the frequency distribution of IRIs).
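A minimal sketch of this variability measure, assuming each six-hit sequence is stored as a list of six peck times (ms) and that the SD is the sample SD over all pooled IRIs of one IOI condition. The function names and example sequences are hypothetical.

```python
import statistics


def last_four_iris(peck_times_ms):
    """IRIs between the last five pecks (pecks 2-6) of a six-hit sequence."""
    if len(peck_times_ms) != 6:
        raise ValueError("expected the six peck times of one six-hit sequence")
    last_five = peck_times_ms[1:]
    return [b - a for a, b in zip(last_five, last_five[1:])]


def pooled_iri_sd(sequences):
    """Sample SD over the pooled last-four IRIs of all sequences."""
    iris = [iri for seq in sequences for iri in last_four_iris(seq)]
    return statistics.stdev(iris)


seqs = [[0, 100, 700, 1300, 1900, 2500],  # IRIs 600, 600, 600, 600
        [0, 100, 600, 1100, 1600, 2100]]  # IRIs 500, 500, 500, 500
print(pooled_iri_sd(seqs))
```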

Third, we examined whether the IRIs of the budgerigars’ tapping deviated from the actual IOI. For this analysis, we divided the four IRIs between the last five pecks into the first two and the last two and assessed the accuracy of each pair. Under the fast-tempo conditions, the deviation from the actual IOI did not differ from zero for either the first two or the last two IRIs (one-sample t-test, t = 0.720, d.f. = 1,399, P = 0.472 for the first two IRIs; t = 1.233, d.f. = 1,399, P = 0.218 for the last two IRIs). Under the slow-tempo conditions, by contrast, the deviation was larger than zero for the first two IRIs, which were shorter than the actual IOI (t = 6.464, d.f. = 3,199, P < 0.0001), but did not differ from zero for the last two IRIs (t = 1.467, d.f. = 3,199, P = 0.142; Fig. 7). Thus, the IRIs maintained a constant precision under the fast-tempo conditions, whereas under the slow-tempo conditions the budgerigars first produced a rhythm faster than the actual IOI and then adjusted it toward the actual IOI as pecking proceeded.
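The one-sample t-test can be sketched as follows. We assume deviations are signed as IOI minus IRI, so that IRIs shorter than the IOI are positive; the two-sided p-value uses a normal approximation, which is adequate at the thousands of degrees of freedom reported above but not for small samples. Names and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def iri_deviations(iris_ms, ioi_ms):
    """Deviation of each IRI from the IOI, signed so that short IRIs are positive."""
    return [ioi_ms - iri for iri in iris_ms]


def one_sample_t(values, mu=0.0):
    """One-sample t statistic, d.f., and a normal-approximation two-sided p."""
    n = len(values)
    t = (fmean(values) - mu) / (stdev(values) / math.sqrt(n))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))  # use a t distribution for small n
    return t, n - 1, p


# Hypothetical first-two IRIs under a 1,200-ms IOI, slightly too fast.
devs = iri_deviations([1150, 1180, 1210, 1120, 1190, 1160], 1200)
print(one_sample_t(devs))
```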

To examine these results from an evolutionary perspective, we analysed the budgerigars’ warble songs, which consist of various elements with a complex temporal structure¹³. Both males and females incorporate learned sounds into their warble songs¹⁴. We recorded 111 warble songs produced by three males (14–62 songs per male). Relative frequency distributions of the intervals between song elements were calculated for each individual and averaged across the three males. We found that 83.9% of all intervals between elements were under 600 ms, with a peak at around 100–200 ms (N = 18,173). Only 10.6% were longer than 900 ms. This tendency is consistent with a previous report¹⁴.
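The per-individual averaging can be sketched as below, assuming each male's onset-to-onset intervals (ms) are available as a list. Each bird is weighted equally regardless of how many intervals it contributed; the 100-ms bins, the overflow bin and the function names are our choices.

```python
def relative_histogram(intervals_ms, bin_ms=100, n_bins=20):
    """Relative frequencies in bin_ms-wide bins; the last bin collects overflow."""
    counts = [0] * n_bins
    for x in intervals_ms:
        counts[min(int(x // bin_ms), n_bins - 1)] += 1
    return [c / len(intervals_ms) for c in counts]


def averaged_histogram(per_bird_intervals, bin_ms=100, n_bins=20):
    """Average the per-bird relative histograms, weighting each bird equally."""
    hists = [relative_histogram(iv, bin_ms, n_bins) for iv in per_bird_intervals]
    return [sum(column) / len(hists) for column in zip(*hists)]


def averaged_share(per_bird_intervals, predicate):
    """Per-bird share of intervals satisfying predicate, averaged over birds."""
    shares = [sum(1 for x in iv if predicate(x)) / len(iv) for iv in per_bird_intervals]
    return sum(shares) / len(shares)


birds = [[150, 150, 700, 1000], [120, 180]]       # hypothetical intervals (ms)
print(averaged_share(birds, lambda x: x < 600))   # 0.75
print(averaged_histogram(birds)[1])               # share in 100-199 ms: 0.75
```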

We also observed spontaneous tapping at the previous tempo at the start of training with a new IOI: just after the birds started training with a new IOI, they kept tapping at almost the same tempo as under the previous or penultimate IOI condition. We observed this behaviour eight times in the 40 between-condition transitions of constant IOIs (5 transitions × 8 individuals); it occurred in five of our subjects and across various IOI combinations (Table S5, Supplementary Videos S3 and S4). For example, male B continued tapping at the previous 600-ms IOI under the 1,800-ms-IOI condition (in which two successive hits were required for reinforcement). In this case, more IRIs fell around 600 ms (450–750 ms) than in the longer range (750–1,050 ms) during the first 20 reinforced sequences, but this tendency disappeared in the last 30 reinforced sequences (chi-square test, χ² = 6.352, d.f. = 1, P = 0.0117). In the other cases, however, such behaviour disappeared very rapidly and the birds quickly began to tap at the new IOI. We compared the occurrence rate of such IOI-retaining behaviour between the 600-ms-IOI condition (4 cases/23 subsequent IOI combinations) and the other four IOI conditions (4 cases/92 subsequent IOI combinations; the 450-ms-IOI condition was excluded because it was the last IOI condition for all subjects). The 600-ms IOI tended to be retained more frequently than the other IOIs (Fisher’s exact test, P = 0.0497).
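The 4/23 versus 4/92 comparison can be recomputed with a one-sided Fisher's exact test taken directly from the hypergeometric distribution. With these counts the one-sided version gives P ≈ 0.0497, matching the reported value, which suggests (our inference) that the reported test was one-sided. The function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left count under the
    hypergeometric null with all row and column totals fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    tail = sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(total, col1)


# Rows: 600-ms IOI vs. the other four IOIs; columns: retained vs. not retained.
p = fisher_exact_greater(4, 23 - 4, 4, 92 - 4)
print(f"one-sided P = {p:.4f}")
```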

Discussion

Overall, our results demonstrated that the budgerigars can entrain to external stimuli. Phase-matched tapping with the stimulus onset and the frequent observation of NMA by the budgerigars strongly suggested that they were truly entrained to the metronome, even though we did not require them to move rhythmically.

Through computer simulations, we ruled out four alternative models by which the birds might have appeared to be entrained. The MC simulation of randomly pecking birds ruled out the possibility that the budgerigars’ performance occurred by chance; the lack of periodicity in the MC subjects implied that, if the subjects had been behaving randomly, selecting only six-hit sequences would not have been sufficient to produce a false positive for entrainment. The CI and MI models were more accurate than the MC model, but they did not meet the requirements for entrainment. The SR model was the most competitive but also failed to explain the real birds’ behaviour. Had the MI or SR strategy been operating, the budgerigars could have achieved the task more efficiently; apparently, however, they could not help but entrain to the rhythm. Thus, we conclude that the budgerigars truly entrained to the metronome.

Interestingly, we found several behavioural differences between the fast-tempo and slow-tempo conditions. The greater asynchrony under the fast-tempo conditions could be explained by synchronization with the perceptual centre of the stimuli rather than their physical onset, as in humans¹⁵; however, this notion needs to be tested using shorter tones. Under the fast-tempo conditions, the perceptual centre of the stimuli was likely to lag behind stimulus onset because the ratio of stimulus duration (300 ms) to IOI was relatively high. Additionally, under the fast-tempo conditions, the IRIs produced by the budgerigars were the most accurate and least variable. The SDs of IRIs increased as a function of IOI under the slow-tempo conditions, as observed in humans and monkeys¹, but decreased slightly under the fast-tempo conditions, which cannot be explained by the linear prolongation of the acceptable period (see Methods).

The notion that budgerigars might be inherently inclined to tap at fast tempos is corroborated by our finding that the motor pattern at the 600-ms IOI tended to be retained most frequently, while spontaneous tapping at slower tempos disappeared quite rapidly. Thus, the IOI around 600 ms seems to be the preferred tempo of budgerigars.

We also found that, under the slow-tempo conditions, the birds first produced IRIs shorter than the actual IOIs and lengthened them as pecking proceeded (this may be comparable to a phenomenon known as negative lag-one in humans¹⁶). A tendency to drift toward a preferred motor tempo during musical entrainment has also been reported in a cockatoo¹⁷. That the birds eventually shifted from even their preferred tempo to the correct tempo supports entrainment, because it indicates that the budgerigars monitored the stimuli online and corrected their errors in real time to match the stimulus tempo within a few stimulus presentations.

The fact that the preferred tapping tempo and the intervals in warble songs fall within a similar temporal range (under 600 ms) suggests that both are based on a common internal time scale. This assumption is also supported by the similar rhythm of other natural behaviours of budgerigars, such as nudging (on average, about one nudge every 480 ms) and pumping (on average, about one pump every 250 ms) during courtship¹⁸.

Admittedly, the budgerigars’ behaviour differed from that of humans in several respects, such as the only occasional NMA, the reliance on food reinforcement to elicit tapping and the short runs of taps imposed by the experimental design. However, the budgerigars showed more behavioural similarities with humans than with rhesus monkeys¹. First, the budgerigars performed phase-matched tapping to stimulus onset at a wide range of tempos. In contrast, rhesus monkeys always tapped after stimulus onset and never synchronized with it, even when the stimulus presentation was as short as 33 ms and the perceptual centre of the stimuli was therefore located near stimulus onset¹. Second, budgerigars could entrain to auditory stimuli alone with little training. Studies of the modalities involved in synchronization in humans agree on auditory dominance^19,20,21,22, but monkeys show a clear preference for visual stimuli and require considerable training to learn to tap to auditory stimuli¹. Although our findings do not prove auditory dominance in budgerigars, they are pertinent to the vocal learning and rhythmic synchronization hypothesis, which concerns auditory–motor synchronization.

Our results support the hypothesis that vocal learning is the cause of the budgerigars’ more human-like performance in isochronous tapping compared with rhesus monkeys. The phenomenon of interest for the vocal learning and rhythmic synchronization hypothesis is motor entrainment to music⁵. The entrainment to an auditory metronome demonstrated in the budgerigars represents an important milestone in support of this hypothesis. Our findings suggest that vocal-learning animals are better not only at musical entrainment but also, more generally, at simpler rhythm production and perception tasks when the rhythm is conveyed in the auditory domain. For example, vocal-learning starlings (Sturnus vulgaris) could discriminate tone sequences differing in tempo²³ as well as rhythmic from arrhythmic patterns²⁴, and vocal-learning dolphins (Tursiops truncatus) could discriminate different 4-s rhythms²⁵. On the other hand, non-vocal-learning pigeons (Columba livia) were capable only of tempo discrimination and could discriminate neither rhythm nor meter despite task simplification²⁶. Thus, the budgerigars’ more human-like performance compared with monkeys in isochronous tapping might be explained by their vocal learning ability.

Because this study was designed as a controlled experiment, we excluded all social factors. However, given that a study on humans reported the social facilitation of rhythmic entrainment among children²⁷ and in view of the social nature of budgerigars²⁸, a more social experimental environment may improve the birds’ performance.

This study is the first report on synchronization tapping in a vocal-mimicking species. Our study may provide a key to the unsolved riddle of why some individual parrots express musical entrainment behaviour and others of the same species do not. In our experiments, we showed that budgerigars generally have the ability to entrain to external stimuli when they are encouraged to do so. This entrainment is concerned not only with dancing, but also with music making and drumming that are synchronised with an external beat. We can infer from our finding that such rhythmic sensorimotor synchronization ability is likely present in all individual parrots, but latent in many of the conspecifics. In conclusion, our findings provide additional support for the hypothesis that rhythmic synchronization evolved as a by-product of selection for vocal learning.

Methods

Animals

Adult budgerigars (four males and four females, 1 year old) participated in the tapping experiments, and warble songs were recorded from three of the males. The birds’ access to food was restricted, and they were maintained at an average of 84% of their free-feeding body weight for the operant procedure. All experimental procedures involving the animals complied with a protocol approved by the RIKEN Animal Experiments Committee and the RIKEN BSI guidelines.

Apparatus

We used a sound-attenuated chamber (inside dimensions: D 40.9 × W 58.9 × H 37.0 cm) for the tapping experiments and song recordings. We mounted an operant conditioning cage (15.5 × 30 × 22 cm high) in this chamber, which accommodated a custom-built response panel consisting of a green light-emitting diode (LED) key with a piezoelectric sensor mounted behind it to detect pecking vibrations. Pecks were registered with millisecond accuracy. A 100-ms refractory period followed each registered response to avoid registering multiple time points for one peck. A loudspeaker was placed above the roof of the cage to play back the auditory stimuli: 3-kHz pure tones (a frequency close to that of budgerigars’ natural contact calls²⁹) at a sound pressure level of 70 dB. Immediately following a correct response, the birds received one or two seed grains from a food tray under the key over a 2-s interval. The system was controlled by a personal computer running custom software developed with Visual Studio 2005 (Microsoft, USA).
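A sketch of the 100-ms refractory rule, assuming it is applied to raw sensor trigger times and measured from the last registered response. Whether a trigger exactly 100 ms later is registered is our assumption, as is the function name.

```python
REFRACTORY_MS = 100


def register_responses(raw_times_ms, refractory_ms=REFRACTORY_MS):
    """Keep sensor triggers occurring at least refractory_ms after the last
    registered response; earlier triggers are treated as part of the same peck."""
    registered = []
    for t in sorted(raw_times_ms):
        if not registered or t - registered[-1] >= refractory_ms:
            registered.append(t)
    return registered


print(register_responses([0, 30, 99, 100, 150, 260]))  # [0, 100, 260]
```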

Procedures

We trained the birds to peck the LED key using operant conditioning with positive reinforcement. First, we trained the birds to peck the lit key; as soon as they pecked it, the light went off and they obtained a reward. Then, we trained the birds to peck the blinking key during stimulus presentation, gradually shortening the stimulus duration from 1,000 ms to 300 ms. The IOI was 3, 4 or 5 s, chosen at random at the beginning of each new sequence (i.e., after the bird obtained a reward). This phase ended when, at a stimulus duration of 300 ms, the ratio of pecks during stimulus presentation (i.e., reinforcements; 50 per trial) to the total number of stimulus presentations exceeded 75% on three occasions.

The tapping training phase, in which auditory and visual stimuli were presented simultaneously for 300 ms, followed. We used acceptable periods of the same length before stimulus onset and after stimulus offset. The first response to a stimulus during stimulus presentation or its acceptable periods was counted as a ‘hit’. Any subsequent response during that period, and any response outside it, was counted as an ‘error’. If no response occurred between the beginning of the acceptable period before one stimulus onset and the beginning of the acceptable period before the next stimulus onset, a ‘miss’ was recorded. We used six constant-IOI conditions (450, 600, 900, 1,200, 1,500 and 1,800 ms) as well as a random-IOI condition, in which three IOIs (900, 1,200 or 1,500 ms) appeared in random order. The training order of the conditions was randomised among individual birds, but every bird received the random- and 450-ms-IOI conditions in the penultimate and final positions, respectively. The acceptable period was 20% of the IOI under all conditions except the 450-ms- and random-IOI conditions, in which it was 50 ms and 180 ms, respectively. Under each condition, the number of successive hits required for reinforcement was gradually increased from one to six. If a bird produced an error or a miss after several hits, the accumulated hit count was reset to zero. After producing 50 six-hit sequences under an IOI condition, the bird proceeded to the next condition. These six-hit sequences were used for the analyses.
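The hit/error/miss rules above can be sketched as a classifier over stimulus onsets and peck times. This is our reading of the rules, not the authors' software: each stimulus "owns" the span up to the next stimulus's acceptable period, the last span is treated as open-ended, and pecks before the first span are ignored. Function names are hypothetical.

```python
STIMULUS_MS = 300


def classify_pecks(onsets_ms, pecks_ms, margin_ms, stimulus_ms=STIMULUS_MS):
    """Label pecks 'hit' or 'error', and peck-free stimulus spans 'miss'.

    Each stimulus owns the span from the start of its acceptable period to
    the start of the next stimulus's acceptable period (open-ended for the
    last stimulus). Within that span, the first peck inside the hit window
    is a hit and every other peck is an error; a span with no pecks is a miss.
    """
    pecks = sorted(pecks_ms)
    outcomes = []
    for i, onset in enumerate(onsets_ms):
        span_start = onset - margin_ms
        window_end = onset + stimulus_ms + margin_ms
        if i + 1 < len(onsets_ms):
            span_end = onsets_ms[i + 1] - margin_ms
        else:
            span_end = float("inf")
        in_span = [p for p in pecks if span_start <= p < span_end]
        if not in_span:
            outcomes.append((onset, "miss"))
            continue
        hit_taken = False
        for p in in_span:
            if p <= window_end and not hit_taken:
                outcomes.append((p, "hit"))
                hit_taken = True
            else:
                outcomes.append((p, "error"))
    return outcomes


def current_hit_run(outcomes):
    """Successive hits at the end of the record; an error or miss resets the count."""
    run = 0
    for _, label in outcomes:
        run = run + 1 if label == "hit" else 0
    return run


# 600-ms IOI with a 120-ms acceptable period (20% of the IOI).
result = classify_pecks([0, 600, 1200], [-20, 610, 700, 1500], margin_ms=120)
print([label for _, label in result])  # ['hit', 'hit', 'error', 'hit']
print(current_hit_run(result))         # 1
```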

Three birds (females A, B and C) were tested with auditory stimuli alone. To this end, the luminance of the LED was gradually reduced over two or three successive sessions, after which the electric current to the LED was cut off. The other experimental conditions were the same as in the audio–visual experiment. Two IOI conditions (600 ms and 1,500 ms, representing fast and slow tempos) were used, and the birds were exposed to each IOI twice, alternately.

Recording of warble songs

We placed two plastic cages (15.5 × 30 × 22 cm high) facing a microphone in the sound-attenuated chamber. Birds were placed in the chamber in random pairs, each bird in a separate cage (for almost 3 days per bird). We recorded songs using Avisoft-RECORDER software (Avisoft Bioacoustics, Germany) at a sampling frequency of 22.05 kHz. Songs were high-pass filtered at 1 kHz to eliminate low-frequency noise produced by a ventilation fan. Each data point consisted of a continuous recording longer than 20 s. Warble elements, each defined as the span from the first to the last point in the spectrogram exceeding a fixed threshold, were separated automatically with the ‘automatic measurement’ function of Avisoft SASLab Pro, and the intervals between the onsets of adjacent elements were measured. The threshold was adjusted manually so that the automatic segmentation corresponded almost perfectly with that of human observers.
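The threshold-based segmentation idea can be illustrated with a generic sketch on a hypothetical amplitude envelope sampled at a fixed frame step. This is not Avisoft SASLab Pro's algorithm; the envelope, frame size and function names are assumptions for illustration only.

```python
def segment_elements(envelope, threshold, frame_ms):
    """Split an amplitude envelope into elements: runs of frames above threshold.

    Returns (first_ms, last_ms) for each element, i.e., the times of the first
    and last frames exceeding the threshold.
    """
    elements = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            elements.append((start * frame_ms, (i - 1) * frame_ms))
            start = None
    if start is not None:  # element still open at the end of the recording
        elements.append((start * frame_ms, (len(envelope) - 1) * frame_ms))
    return elements


def onset_intervals(elements):
    """Intervals between the onsets of adjacent elements."""
    onsets = [first for first, _ in elements]
    return [b - a for a, b in zip(onsets, onsets[1:])]


env = [0, 5, 5, 0, 0, 6, 0, 7, 7, 7]
elements = segment_elements(env, threshold=1, frame_ms=10)
print(elements, onset_intervals(elements))
```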

Simulations

We simulated four models: Monte Carlo (MC), constant interval (CI), memorised IOI (MI) and simple reaction (SR). In each attempt, six simulated pecks were generated according to the rule of the given simulation (see below) against a stimulus-presentation sequence consisting of five intervals. Intervals between pecks were at least 100 ms, corresponding to the 100-ms refractory period in the experiments with real birds. Each peck was then judged in order, starting from the onset of the first stimulus presentation, as a ‘hit’, ‘error’ or ‘miss’ in the same manner as in the real experiment. When an error or a miss was detected, the pecking series was regarded as a failure at that point, and the next six pecks were generated. The second to the sixth pecks were analysed as in the real experiment. We generated data for 10,000 simulated subjects under six constant-IOI conditions, and each simulation continued until the subject produced 50 six-hit sequences.
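The attempt-and-judge loop described above can be sketched as follows. The per-model peck rules are not reproduced here, so `jittered_generator` is a hypothetical stand-in (first peck jittered around onset, then IOI-length intervals with Gaussian jitter via Python's `random.Random.gauss`), not one of the four models. Clamping intervals to at least 100 ms is also our assumption about how the minimum was enforced. Because hit windows do not overlap at these IOIs, a six-hit sequence occurs exactly when each peck falls in its own stimulus's window.

```python
import random

STIMULUS_MS = 300
MIN_INTERVAL_MS = 100  # mirrors the 100-ms refractory period


def is_six_hit(pecks_ms, ioi_ms, margin_ms, stimulus_ms=STIMULUS_MS):
    """True if peck k falls in the hit window of stimulus k (onset k * ioi_ms)."""
    return len(pecks_ms) == 6 and all(
        k * ioi_ms - margin_ms <= p <= k * ioi_ms + stimulus_ms + margin_ms
        for k, p in enumerate(pecks_ms))


def simulate_subject(generate_pecks, ioi_ms, margin_ms, n_sequences=50,
                     rng=None, max_attempts=1_000_000):
    """Generate six-peck attempts until n_sequences six-hit sequences exist.

    Returns (six-hit sequences, number of failed attempts).
    """
    if rng is None:
        rng = random.Random()
    sequences, failures = [], 0
    for _ in range(max_attempts):
        pecks = generate_pecks(ioi_ms, rng)
        if is_six_hit(pecks, ioi_ms, margin_ms):
            sequences.append(pecks)
            if len(sequences) == n_sequences:
                return sequences, failures
        else:
            failures += 1
    raise RuntimeError("too many failures; the model rarely completes a sequence")


def jittered_generator(ioi_ms, rng, weber=0.05):
    """Hypothetical peck rule: first peck near onset, then jittered IOI intervals."""
    t = rng.gauss(0.0, weber * ioi_ms)
    pecks = [t]
    for _ in range(5):
        t += max(MIN_INTERVAL_MS, rng.gauss(ioi_ms, weber * ioi_ms))
        pecks.append(t)
    return pecks


def asynchronies(pecks_ms, ioi_ms):
    """Asynchronies of pecks 2-6 (the analysed pecks) relative to their onsets."""
    return [p - k * ioi_ms for k, p in enumerate(pecks_ms)][1:]


seqs, fails = simulate_subject(jittered_generator, 1200, 240, rng=random.Random(0))
print(len(seqs), fails)
```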

In the CI and MI simulations, we added fluctuations to the periodic response intervals to make the models more natural. To do this, we drew on scalar timing theory³⁰ and on studies of the discrimination of short temporal intervals by humans and birds^31,32. Briefly, for each peck we randomly sampled a response interval from a normal distribution using the Box–Muller method³³. The mean of this distribution was the original interval length given by the rule for each simulation (see below), and its SD was estimated from a fixed Weber fraction (0.05) and the original interval³¹.
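A minimal sketch of this interval jitter, assuming the SD is the product of the Weber fraction and the original interval: a basic Box–Muller transform turns two uniform deviates into a standard-normal deviate, which is scaled by SD = 0.05 × interval and added to the original interval. The function names are ours.

```python
import math
import random

WEBER_FRACTION = 0.05  # fixed Weber fraction from the text


def box_muller(rng):
    """One standard-normal deviate from two uniforms (basic Box-Muller form)."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def jittered_interval(original_ms, rng, weber=WEBER_FRACTION):
    """Sample a response interval ~ Normal(original_ms, (weber * original_ms) ** 2)."""
    return original_ms + weber * original_ms * box_muller(rng)


rng = random.Random(1)
samples = [jittered_interval(1000.0, rng) for _ in range(20000)]
print(sum(samples) / len(samples))  # close to 1000
```

Scaling the SD with the interval is what makes the jitter "scalar": a 1,800-ms interval fluctuates four times as much as a 450-ms one.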