Abstract
Speech perception requires the rapid and effortless extraction of meaningful phonetic information from a highly variable acoustic signal. A powerful example of this phenomenon is categorical speech perception, in which a continuum of acoustically varying sounds is transformed into perceptually distinct phoneme categories. We found that the neural representation of speech sounds is categorically organized in the human posterior superior temporal gyrus. Using intracranial high-density cortical surface arrays, we found that listening to synthesized speech stimuli varying in small and acoustically equal steps evoked distinct and invariant cortical population response patterns that were organized by their sensitivities to critical acoustic features. Phonetic category boundaries were similar between neurometric and psychometric functions. Although speech-sound responses were distributed, spatially discrete cortical loci were found to underlie specific phonetic discrimination. Our results provide direct evidence for acoustic-to–higher order phonetic level encoding of speech sounds in human language receptive cortex.
Acknowledgements
We are grateful to the four individuals who participated in this experiment and to A. Flinker for help with data acquisition. This research was supported by US National Institutes of Health grants NS21135 (R.T.K.), PO4813 (R.T.K.), F32NS061552 (E.F.C.), K99NS065120 (E.F.C.), FKZ-MK48-2009/003 (J.W.R.) and RI1511/1-3 (J.W.R.).
Author information
Contributions
E.F.C. designed the experiments, collected the data and wrote the manuscript. E.F.C. and J.W.R. analyzed the data, evaluated results and edited the manuscript. J.W.R., N.M.B. and M.S.B. helped with data collection. K.J. and R.T.K. reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–4, Supplementary Table 1 and Supplementary Results (PDF 221 kb)
About this article
Cite this article
Chang, E., Rieger, J., Johnson, K. et al. Categorical speech representation in human superior temporal gyrus. Nat Neurosci 13, 1428–1432 (2010). https://doi.org/10.1038/nn.2641
DOI: https://doi.org/10.1038/nn.2641