Cognitive restoration of reversed speech

Saberi, Kourosh; Perrott, David R.

doi:10.1038/19652

Scientific Correspondence
Published: 29 April 1999

Kourosh Saberi¹ &
David R. Perrott²

Nature volume 398, page 760 (1999)

Abstract

Speech is the most complex auditory signal and requires the most processing¹. The human brain devotes large cortical areas²,³ to deciphering the information it contains, as well as parsing speech sounds produced simultaneously by several speakers⁴. The brain can also invoke corrective measures to restore distortions in speech⁵; for example, if a brief speech sound is replaced by an interfering sound that masks it, such as a cough, the listener perceives the missing speech as if the brain interpolates through the absent segment. We have studied the intelligibility of speech, and find it is resistant to time reversal of local segments of a spoken sentence, which has been described as “the most drastic form of time scale distortion”⁶.

Main

We subdivided a digitized sentence into segments of fixed duration (say, 50 ms). Every segment was then time-reversed without smoothing the transition borders between the segments. The entire spoken sentence was therefore globally contiguous, but locally time-reversed, at every point (A+B in Fig. 1). Listeners report perfect intelligibility of the sentence for segment durations up to 50 ms, and partial intelligibility for segment durations exceeding 100 ms (Fig. 1, bottom), with 50% intelligibility occurring at about 130 ms; by psychoacoustic standards, such segment distortions are very long. Many defining features of speech sounds are rapid temporal transitions with durations well within the reversal window.
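The segmentation and reversal step described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function name `locally_time_reverse`, the toy sample values and the 1 kHz sample rate are assumptions, as is the handling of a final segment shorter than the window (here it is simply reversed as it stands).

```python
from typing import List, Sequence


def locally_time_reverse(samples: Sequence[float], sample_rate: int,
                         segment_ms: float) -> List[float]:
    """Reverse every fixed-duration segment while keeping the segment order."""
    seg_len = int(round(sample_rate * segment_ms / 1000.0))
    if seg_len < 1:
        raise ValueError("segment must span at least one sample")
    out: List[float] = []
    for start in range(0, len(samples), seg_len):
        segment = samples[start:start + seg_len]
        # Reverse within the segment only; borders are left unsmoothed.
        out.extend(reversed(segment))
    return out


# Toy signal: 10 samples at a hypothetical 1 kHz rate, 4 ms (4-sample) segments.
toy = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
reversed_toy = locally_time_reverse(toy, sample_rate=1000, segment_ms=4)
print(reversed_toy)  # [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
```

Applying the same function a second time with the same window restores the original sample order, which is a convenient sanity check for any implementation of this manipulation.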

**Figure 1: Segments of speech showing the effects of time reversal.**

Perception of speech against local time reversal is robust even if alternating segments are shifted in time (A+delayed B). Speech also remains intelligible if odd-numbered segments are displaced forwards in time by two or three times the duration of the window. For example, for segments of 100 ms, shifting the odd-numbered segment forward in time by 200 ms reduces the intelligibility rating by only 15%. For segments of 50 ms, intelligibility is not significantly affected by a displacement of 100 or 200 ms, but the speech does sound more echoic. Furthermore, the results are not changed if half the segments (A in Fig. 1) are presented to one ear and the other half (B in Fig. 1) to the other ear.
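One way to picture the A+delayed B condition is to split the (already locally time-reversed) signal into two interleaved streams and delay one of them before mixing. The sketch below is a guess at such a construction, not the procedure used in the experiments: the helper names `split_alternating` and `mix_with_delay`, the assignment of every second segment to stream B, and the toy values are all assumptions.

```python
from typing import List, Sequence, Tuple


def split_alternating(samples: Sequence[float],
                      seg_len: int) -> Tuple[List[float], List[float]]:
    """Return streams A and B, each full length, silent where the other is active."""
    a = [0.0] * len(samples)
    b = [0.0] * len(samples)
    for idx, start in enumerate(range(0, len(samples), seg_len)):
        target = a if idx % 2 == 0 else b  # assumed mapping of segments to A/B
        stop = min(start + seg_len, len(samples))
        target[start:stop] = samples[start:stop]
    return a, b


def mix_with_delay(a: Sequence[float], b: Sequence[float],
                   delay: int) -> List[float]:
    """Sum A with B delayed by `delay` samples; the output grows by `delay`."""
    out = list(a) + [0.0] * delay
    for i, value in enumerate(b):
        out[i + delay] += value
    return out


# Toy example: 2-sample segments, stream B delayed by one segment duration.
signal = [1, 2, 3, 4, 5, 6, 7, 8]
stream_a, stream_b = split_alternating(signal, seg_len=2)
mixed = mix_with_delay(stream_a, stream_b, delay=2)
print(mixed)
```

With a delay of zero the mix reproduces the input signal. Routing `stream_a` and `stream_b` to separate channels instead of summing them would correspond to the two-ear presentation mentioned above.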

When subjects listen repeatedly to locally time-reversed sentences with moderately long windows (100 ms), they report that previously unintelligible words become clear. This type of ‘learning’ is not simply due to an improvement in identification, as subjects say they can now hear actual words, indicating some form of cognitive recalibration. The experience is similar to familiarization with a newly heard accent.

These findings lend support to recent theories⁷,⁸ of speech encoding that state, contrary to conventional thinking, that a detailed auditory analysis of the short-term acoustic spectrum is not essential to the speech code. Rather, the ultralow-frequency modulation envelopes in the order of 3 to 8 Hz are critical cues to intelligibility. Although the amplitude spectrum of a waveform is unaffected by time reversal, the temporal envelopes, as well as the fine structure of the running spectrum, are highly distorted for such sounds. The advantage of a robust speech-encoding system that uses higher-order corrective measures and ultralow-frequency cues is obvious in noisy environments where the listener needs to extract perceptually and identify a stream of speech cues that compete with extraneous noise, as in the ‘cocktail party effect’⁹.
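The statement that time reversal leaves the amplitude spectrum unchanged can be checked numerically. The following sketch uses a naive discrete Fourier transform from the standard library on a made-up test waveform (a decaying sinusoid, chosen only so that its envelope clearly differs when played backwards); it illustrates the property for a single reversed waveform and is not an analysis of the speech material.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Magnitude of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(samples)
    mags = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


# Hypothetical test waveform: a sinusoid whose amplitude decays over time.
segment = [math.exp(-t / 8.0) * math.sin(2 * math.pi * 2 * t / 32)
           for t in range(32)]
forward = magnitude_spectrum(segment)
backward = magnitude_spectrum(segment[::-1])

# Largest difference between the two magnitude spectra (rounding error only).
max_diff = max(abs(f - b) for f, b in zip(forward, backward))
print(max_diff < 1e-9)
```

The reversed waveform has a rising rather than a decaying envelope, so its temporal structure differs even though the two magnitude spectra agree to within rounding error; for a real signal, reversal only conjugates the transform and multiplies it by a phase factor.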

References

1. Moore, B. C. J. An Introduction to the Psychology of Hearing 4th edn (Academic, New York, 1997).
2. Lassen, N. A., Ingvar, D. H. & Skinhoj, E. Sci. Am. 239, 50–59 (1978).
3. Nishizawa, Y., Olsen, T. S., Larsen, B. & Lassen, N. J. Neurophysiol. 48, 458–466 (1982).
4. Cherry, E. C. J. Acoust. Soc. Am. 25, 975–979 (1953).
5. Warren, R. M., Bashford, J. A., Healy, E. W. & Brubaker, B. S. Percept. Psychophys. 55, 313–322 (1994).
6. Licklider, J. C. R. & Miller, G. A. The Perception of Speech. Handbook of Experimental Psychology (ed. Stevens, S. S.) 1040–1074 (Wiley, New York, 1960).
7. Greenberg, S. & Arai, T. J. Acoust. Soc. Am. 103, 3057 (1998).
8. Greenberg, S. I. Behav. Brain Sci. 21, 267 (1998).
9. Yost, W. A. Percept. Psychophys. 58, 1026–1036 (1996).

Author information

Authors and Affiliations

Division of Biology, 216-76, Caltech, Pasadena, 91125, California, USA
Kourosh Saberi
Department of Psychology, California State University, Los Angeles, 90032, California, USA
David R. Perrott

About this article

Cite this article

Saberi, K., Perrott, D. Cognitive restoration of reversed speech. Nature 398, 760 (1999). https://doi.org/10.1038/19652

Issue Date: 29 April 1999
DOI: https://doi.org/10.1038/19652
