Main

Seven subjects with normal vision and hearing were presented through headphones with a burst of white noise (90 decibels sound-pressure level, 10-ms duration, with 4-ms rise and fall times), the spectrum of which had been processed (by using head-related transfer functions) to simulate an external sound from a frontal direction. Brief light flashes (10 ms) were produced by an array of five green light-emitting diodes (LEDs) at different distances from the subjects (1–50 m; Fig. 1). The intensity of the light flash was 14.5 candelas per square metre at a viewing distance of 1 m, and was increased in proportion to the square of the viewing distance for the other distances in order to produce consistent intensity at the eye. The difference in onset times between the sound and light stimuli was varied randomly from −125 ms to 175 ms in steps of 25 ms.
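As a concrete illustration of these stimulus parameters, the following is a minimal sketch in Python (the variable names are ours and this is not the original presentation software): the flash luminance is scaled with the square of the viewing distance, and the sound delays span −125 ms to 175 ms in 25-ms steps.

```python
# Illustrative stimulus parameters (names are ours; not the original code).
BASE_LUMINANCE_CD_M2 = 14.5                  # flash luminance at 1 m
VIEWING_DISTANCES_M = [1, 5, 10, 20, 30, 40, 50]

# Luminance is increased in proportion to the square of the viewing distance,
# as described above, to keep the intensity at the eye consistent.
flash_luminance = {d: BASE_LUMINANCE_CD_M2 * d ** 2 for d in VIEWING_DISTANCES_M}

# Sound delay relative to the flash (stimulus-onset asynchrony), in ms.
soas_ms = list(range(-125, 176, 25))

print(flash_luminance[10])   # 1450.0 cd m^-2 required at 10 m
print(soas_ms)               # [-125, -100, ..., 150, 175]
```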

Figure 1: Synchrony in audiovisual perception.

a, Representative results from one observer. The percentage of light-first responses at each viewing distance is plotted against sound delay (stimulus-onset asynchrony). Different colours represent different viewing distances (red, pink, yellow, green, blue, brown and black correspond to 1, 5, 10, 20, 30, 40 and 50 m, respectively). The dashed line marks the 50% point, which corresponds to subjective simultaneity. b, Points of subjective equality (filled circles) plotted against viewing distance. Hollow circles show the 25% (bottom curve) and 75% (top curve) points of light-first responses, which indicate the thresholds for detecting asynchrony. The dashed line represents the real sound-arrival time.

Subjects were instructed to look at the centre of the LED array and to imagine that the LEDs were the source of both the light and the sound, as though they were hearing the sound directly from that source. To eliminate possible response bias, we measured subjective simultaneity with a two-alternative forced-choice task in which observers judged whether the light was presented before or after the sound. Twenty responses were obtained for each condition. To determine the stimulus-onset asynchrony corresponding to subjective simultaneity, we estimated the 50% point (the point of subjective equality) by fitting a cumulative normal-distribution function to each individual's data with a maximum-likelihood curve-fitting technique.
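For concreteness, here is one way such a maximum-likelihood fit could be implemented (a sketch in Python using SciPy; the response counts below are invented for illustration and the variable names are ours, not taken from the study):

```python
# Sketch: fit a cumulative normal to two-alternative forced-choice data by
# maximum likelihood and read off the 50% point (point of subjective equality).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

soa_ms = np.arange(-125, 176, 25)            # sound delay (ms), 13 levels
n_trials = np.full(soa_ms.shape, 20)         # 20 responses per condition
# Hypothetical counts of 'light first' responses at each sound delay:
n_light_first = np.array([1, 2, 3, 5, 8, 10, 13, 16, 17, 18, 19, 20, 20])

def neg_log_likelihood(params):
    mu, sigma = params                       # mu = 50% point, sigma = spread
    if sigma <= 0:
        return np.inf
    p = np.clip(norm.cdf(soa_ms, loc=mu, scale=sigma), 1e-6, 1 - 1e-6)
    return -np.sum(n_light_first * np.log(p)
                   + (n_trials - n_light_first) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=[0.0, 50.0], method="Nelder-Mead")
pse_ms, sigma_ms = fit.x
print(f"point of subjective equality: {pse_ms:.1f} ms")
# Under this model, the 25% and 75% points used as asynchrony-detection
# thresholds in Fig. 1b correspond to pse_ms -/+ 0.674 * sigma_ms.
```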

When the LED array was 1 m away, the point of subjective equality occurred at a sound delay of about 5 ms; this delay increased with viewing distance (P < 0.001; Fig. 1a, b). The increase was roughly consistent with the velocity of sound (about 3 ms per metre at sea level and room temperature): the point of subjective equality grew by about 3 ms for each 1-m increase in distance. This relationship held at least up to a distance of 10 m.
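As a rough arithmetic check on the 3-ms-per-metre figure, the physical travel time of sound over each viewing distance can be computed directly (a sketch assuming a speed of sound of roughly 343 m per second):

```python
# Physical sound-travel delay per viewing distance (speed of sound assumed
# to be ~343 m/s at sea level and room temperature).
SPEED_OF_SOUND_M_PER_S = 343.0

for distance_m in [1, 5, 10, 20, 30, 40, 50]:
    delay_ms = distance_m / SPEED_OF_SOUND_M_PER_S * 1000
    print(f"{distance_m:>2} m -> {delay_ms:5.1f} ms")   # ~2.9 ms per metre
```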

Our results show that the brain probably takes sound velocity into account when judging simultaneity. However, it takes about 120 ms for sound to travel 40 m, and we found that the threshold for detecting the sound delay at a viewing distance of 40 m was 106 ms, so active compensation is likely to operate only at distances shorter than this.
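The same arithmetic at 40 m makes this limit explicit: the physical travel delay already exceeds the measured 106-ms detection threshold (a minimal sketch, using the speed of sound assumed above):

```python
# At 40 m the physical travel delay (~117 ms) exceeds the measured
# asynchrony-detection threshold (106 ms), consistent with compensation
# operating only at shorter distances.
travel_delay_40m_ms = 40 / 343.0 * 1000      # ~= 116.6 ms
detection_threshold_ms = 106                 # threshold measured at 40 m
print(travel_delay_40m_ms > detection_threshold_ms)   # True
```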

We have shown that the brain takes sound velocity into account when integrating audiovisual information. It can therefore integrate auditory and visual signals over a wide range of temporal gaps and correctly match them to their sources.