Speech is the most complex auditory signal and requires the most processing¹. The human brain devotes large cortical areas²,³ to deciphering the information it contains and to parsing speech sounds produced simultaneously by several speakers⁴. The brain can also invoke corrective mechanisms to restore distorted speech⁵. For example, if a brief speech sound is replaced by an interfering sound that masks it, such as a cough, the listener perceives the missing speech sound as if the brain had interpolated across the absent segment. We have studied the intelligibility of speech and find that it is resistant to time reversal of local segments of a spoken sentence, a manipulation that has been described as “the most drastic form of time scale distortion”⁶.
We subdivided a digitized sentence into segments of fixed duration (for example, 50 ms). Each segment was then time-reversed, without smoothing the transitions at the borders between segments. The spoken sentence was therefore globally contiguous but locally time-reversed at every point (A+B in Fig. 1). Listeners report perfect intelligibility of the sentence for segment durations up to 50 ms and partial intelligibility for segment durations exceeding 100 ms (Fig. 1, bottom), with 50% intelligibility at about 130 ms. By psychoacoustic standards, such segment distortions are very long: many defining features of speech sounds are rapid temporal transitions with durations well within the reversal window, so these features are themselves played backwards.
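The segmentation-and-reversal procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `locally_time_reverse`, the use of plain Python lists for the waveform, and the handling of a trailing partial segment (reversed on its own) are assumptions. As in the description, neighbouring segments are simply abutted, with no smoothing at their borders.

```python
from typing import List, Sequence


def locally_time_reverse(samples: Sequence[float], sample_rate: int,
                         window_ms: float) -> List[float]:
    """Time-reverse every consecutive fixed-duration segment of a waveform.

    Segments are concatenated directly, with no cross-fade at the borders,
    so the result is globally in order but locally reversed at every point.
    Assumption: a trailing partial segment is reversed on its own.
    """
    # Convert the window duration into a whole number of samples.
    seg_len = int(round(sample_rate * window_ms / 1000.0))
    if seg_len < 1:
        raise ValueError("window is shorter than one sample")

    out: List[float] = []
    for start in range(0, len(samples), seg_len):
        # Take one segment and append it back-to-front.
        segment = list(samples[start:start + seg_len])
        out.extend(reversed(segment))
    return out


if __name__ == "__main__":
    # Toy waveform: with sample_rate=1000 and window_ms=3, each segment is 3 samples.
    print(locally_time_reverse([0, 1, 2, 3, 4, 5, 6], sample_rate=1000, window_ms=3))
    # prints [2, 1, 0, 5, 4, 3, 6]
```

At a hypothetical sampling rate of 16 kHz, a 50-ms window corresponds to 800 samples. Because reversal is its own inverse, applying the function twice with the same window returns the original waveform, which is a convenient sanity check.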
The perception of locally time-reversed speech remains robust even if alternate segments are shifted in time (A+delayed B). Speech also remains intelligible if the odd-numbered segments are displaced forwards in time (delayed) by two or three times the window duration. For example, with 100-ms segments, delaying the odd-numbered segments by 200 ms reduces the intelligibility rating by only 15%. With 50-ms segments, a displacement of 100 or 200 ms does not significantly affect intelligibility, although the speech does sound more echoic. Nor are the results changed if half of the segments (A in Fig. 1) are presented to one ear and the other half (B in Fig. 1) to the other ear.
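The stream manipulations described in the previous paragraph can be sketched as operations on an already locally reversed waveform. This is a hypothetical illustration: the helper names are invented here, stream A is assumed to hold the even-indexed (0-based) segments and stream B the odd-indexed ones, each stream is zero-filled where the other stream's segments were, and a delay is modelled as prepending silence. Summing A with a delayed copy of B approximates the A+delayed B condition; pairing them as left and right channels approximates the two-ear presentation.

```python
from typing import List, Sequence, Tuple


def split_alternate_segments(signal: Sequence[float], seg_len: int
                             ) -> Tuple[List[float], List[float]]:
    """Split a (locally reversed) waveform into two zero-filled streams.

    Assumption: stream A keeps the even-indexed segments (0, 2, 4, ...),
    stream B the odd-indexed ones; each is silent where the other plays.
    """
    a = [0.0] * len(signal)
    b = [0.0] * len(signal)
    for i, value in enumerate(signal):
        if (i // seg_len) % 2 == 0:
            a[i] = value
        else:
            b[i] = value
    return a, b


def delay(stream: Sequence[float], n_samples: int) -> List[float]:
    """Shift a stream later in time by prepending n_samples of silence."""
    return [0.0] * n_samples + list(stream)


def _pad(x: Sequence[float], n: int) -> List[float]:
    """Zero-pad a stream at the end to length n."""
    return list(x) + [0.0] * (n - len(x))


def mix(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Single-channel condition: sum the two streams sample by sample."""
    n = max(len(a), len(b))
    return [x + y for x, y in zip(_pad(a, n), _pad(b, n))]


def dichotic(a: Sequence[float], b: Sequence[float]) -> List[Tuple[float, float]]:
    """Two-ear condition: stream A to the left channel, stream B to the right."""
    n = max(len(a), len(b))
    return list(zip(_pad(a, n), _pad(b, n)))
```

For example, with 100-ms segments at a hypothetical 16-kHz rate (`seg_len=1600`), `mix(a, delay(b, 3200))` would approximate the 200-ms displacement condition, and `mix(a, b)` with no delay reconstructs the locally reversed input exactly.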
When subjects listen repeatedly to locally time-reversed sentences with moderately long windows (100 ms), they report that previously unintelligible words become clear. This type of ‘learning’ is not simply an improvement in identification: subjects say that they can now hear actual words, which suggests some form of cognitive recalibration. The experience is similar to becoming familiar with a newly heard accent.
These findings lend support to recent theories⁷,⁸ of speech encoding which hold, contrary to conventional thinking, that a detailed auditory analysis of the short-term acoustic spectrum is not essential to the speech code. Rather, ultralow-frequency modulation envelopes, on the order of 3 to 8 Hz, are critical cues to intelligibility. Although time reversal leaves the amplitude spectrum of a waveform unaffected, it severely distorts the temporal envelope within each reversed segment, as well as the fine structure of the running spectrum. A 3–8-Hz modulation, however, has a period of roughly 125 to 333 ms, so reversal windows much shorter than this should leave the slow envelope largely intact. A robust speech-encoding system that uses higher-order corrective measures and ultralow-frequency cues has an obvious advantage in noisy environments, where the listener must perceptually extract and identify a stream of speech cues that compete with extraneous noise, as in the ‘cocktail party effect’⁹.
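The spectral and envelope arguments in the previous paragraph can be checked numerically on a synthetic signal. The sketch below is illustrative only and rests on assumptions not taken from the study: the envelope is estimated by full-wave rectification followed by a centred moving average (a crude low-pass filter chosen for simplicity), the test signal is a 200-Hz tone amplitude-modulated at 4 Hz rather than speech, and the 2-kHz sampling rate is hypothetical. It shows that reversing a waveform leaves its DFT magnitudes unchanged, and compares how well the slow envelope survives local reversal with short (20 ms) and long (200 ms) windows.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitudes(x: Sequence[float]) -> List[float]:
    """Magnitude spectrum via a direct O(N^2) DFT (adequate for short inputs)."""
    n = len(x)
    return [abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)))
            for k in range(n)]


def locally_time_reverse(x: Sequence[float], seg_len: int) -> List[float]:
    """Reverse each consecutive block of seg_len samples (no border smoothing)."""
    out: List[float] = []
    for start in range(0, len(x), seg_len):
        out.extend(reversed(list(x[start:start + seg_len])))
    return out


def envelope(x: Sequence[float], half_width: int) -> List[float]:
    """Crude amplitude envelope: full-wave rectification, then a centred moving
    average over 2*half_width+1 samples (truncated at the signal edges)."""
    rect = [abs(v) for v in x]
    out: List[float] = []
    for i in range(len(rect)):
        lo, hi = max(0, i - half_width), min(len(rect), i + half_width + 1)
        out.append(sum(rect[lo:hi]) / (hi - lo))
    return out


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def envelope_correlation(x: Sequence[float], seg_len: int, half_width: int) -> float:
    """Correlate the envelope of x with that of its locally reversed version."""
    return pearson(envelope(x, half_width),
                   envelope(locally_time_reverse(x, seg_len), half_width))


if __name__ == "__main__":
    fs = 2000  # hypothetical sampling rate (Hz)
    # 1 s of a 200-Hz carrier, amplitude-modulated at 4 Hz (inside the 3-8 Hz range).
    sig = [(1 + 0.8 * math.sin(2 * math.pi * 4 * n / fs))
           * math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
    for window_ms in (20, 200):
        seg_len = fs * window_ms // 1000
        r = envelope_correlation(sig, seg_len, half_width=50)
        print(f"{window_ms:>3} ms windows: envelope correlation = {r:.2f}")

    # Reversal leaves the DFT magnitudes unchanged (up to rounding error).
    seg = sig[:40]
    same = all(abs(p - q) < 1e-9
               for p, q in zip(dft_magnitudes(seg), dft_magnitudes(seg[::-1])))
    print("magnitude spectrum unchanged by reversal:", same)
```

With these settings the 20-ms windows should give an envelope correlation close to 1 and the 200-ms windows a much lower one, because a 200-ms window spans most of one 250-ms modulation cycle and so mirrors the slow envelope itself.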
1. Moore, B. C. J. An Introduction to the Psychology of Hearing 4th edn (Academic, New York, 1997).
2. Lassen, N. A., Ingvar, D. H. & Skinhoj, E. Sci. Am. 239, 50–59 (1978).
3. Nishizawa, Y., Olsen, T. S., Larsen, B. & Lassen, N. A. J. Neurophysiol. 48, 458–466 (1982).
4. Cherry, E. C. J. Acoust. Soc. Am. 25, 975–979 (1953).
5. Warren, R. M., Bashford, J. A., Healy, E. W. & Brubaker, B. S. Percept. Psychophys. 55, 313–322 (1994).
6. Licklider, J. C. R. & Miller, G. A. The perception of speech. In Handbook of Experimental Psychology (ed. Stevens, S. S.) 1040–1074 (Wiley, New York, 1960).
7. Greenberg, S. & Arai, T. J. Acoust. Soc. Am. 103, 3057 (1998).
8. Greenberg, S. I. Behav. Brain Sci. 21, 267 (1998).
9. Yost, W. A. Percept. Psychophys. 58, 1026–1036 (1996).
About this article
Saberi, K., Perrott, D. Cognitive restoration of reversed speech. Nature 398, 760 (1999). https://doi.org/10.1038/19652