Box 2 | Why is speech categorization difficult?

From the following article:

Early language acquisition: cracking the speech code

Patricia K. Kuhl

Nature Reviews Neuroscience 5, 831-843 (November 2004)


Phonemic categories are composed of finite sets of phonetic units. Phonetic units are difficult to define physically because every utterance, even of the same phonetic unit, is acoustically distinct. Different talkers, rates of speech and contexts all contribute to the variability observed in speech.


Talker variability

When different talkers produce the same phonetic unit, such as a simple vowel, the acoustic results (FORMANT FREQUENCIES) vary widely. This is because vocal tracts vary in size and shape, and the differences are especially large when men, women and children produce the same phonetic unit. In the drawing, each ellipse represents an English vowel, and each symbol within the ellipse represents one person's production (Ref. 35).

Rate variability

Slow speech has different acoustic properties from fast speech, which makes phonetic units difficult to describe physically (Ref. 22).

Context variability

The acoustic values of a phonetic unit change depending on the preceding and following phonemes (Ref. 23).

These variations make it difficult to rely on absolute acoustic values to determine the phonetic category of a particular speech sound. Despite all of these sources of variability, infants perceive phonetic similarity across talkers, rates and contexts (Refs 19-23). By contrast, current computer speech-recognition systems cannot recognize phonetic similarity when the talker, rate and context change (Ref. 24).

Figure reproduced, with permission, from Ref. 35 © (1995) Acoustical Society of America.