Abstract
Humans possess a remarkable ability to attend to a single speaker’s voice in a multi-talker background (refs 1, 2, 3). How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented (refs 4, 5). Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener’s intended goal.
References
Cherry, E. C. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 25, 975–979 (1953)
Shinn-Cunningham, B. G. Object-based auditory and visual attention. Trends Cogn. Sci. 12, 182–186 (2008)
Bregman, A. S. Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, 1994)
Kerlin, J., Shahin, A. & Miller, L. Attentional gain control of ongoing cortical speech representations in a “cocktail party”. J. Neurosci. 30, 620–628 (2010)
Besle, J. et al. Tuning of the human neocortex to the temporal dynamics of attended events. J. Neurosci. 31, 3176–3185 (2011)
Bee, M. & Micheyl, C. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J. Comp. Psychol. 122, 235–252 (2008)
Shinn-Cunningham, B. G. & Best, V. Selective attention in normal and impaired hearing. Trends Amplif. 12, 283–299 (2008)
Scott, S. K., Rosen, S., Beaman, C. P., Davis, J. P. & Wise, R. J. S. The neural processing of masked speech: evidence for different mechanisms in the left and right temporal lobes. J. Acoust. Soc. Am. 125, 1737–1743 (2009)
Elhilali, M., Xiang, J., Shamma, S. A. & Simon, J. Z. Interaction between attention and bottom-up saliency mediates the representation of foreground and background in an auditory scene. PLoS Biol. 7, e1000129 (2009)
Chang, E. F. et al. Categorical speech representation in human superior temporal gyrus. Nature Neurosci. 13, 1428–1432 (2010)
Crone, N. E., Boatman, D., Gordon, B. & Hao, L. Induced electrocorticographic gamma activity during auditory perception. Clin. Neurophysiol. 112, 565–582 (2001)
Steinschneider, M., Fishman, Y. I. & Arezzo, J. C. Spectrotemporal analysis of evoked and induced electroencephalographic responses in primary auditory cortex (A1) of the awake monkey. Cereb. Cortex 18, 610–625 (2008)
Scott, S. K. & Johnsrude, I. S. The neuroanatomical and functional organization of speech perception. Trends Neurosci. 26, 100–107 (2003)
Hackett, T. A. Information flow in the auditory cortical network. Hear. Res. 271, 133–146 (2011)
Bolia, R. S., Nelson, W. T., Ericson, M. A. & Simpson, B. D. A speech corpus for multitalker communications research. J. Acoust. Soc. Am. 107, 1065–1066 (2000)
Brungart, D. S. Informational and energetic masking effects in the perception of two simultaneous talkers. J. Acoust. Soc. Am. 109, 1101–1109 (2001)
Mesgarani, N., David, S. V., Fritz, J. B. & Shamma, S. A. Influence of context and behavior on stimulus reconstruction from neural activity in primary auditory cortex. J. Neurophysiol. 102, 3329–3339 (2009)
Bialek, W., Rieke, F., de Ruyter van Steveninck, R. R. & Warland, D. Reading a neural code. Science 252, 1854–1857 (1991)
Pasley, B. N. et al. Reconstructing speech from human auditory cortex. PLoS Biol. 10, e1001251 (2012)
Garofolo, J. S. et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus (Linguistic Data Consortium, 1993)
Rifkin, R., Yeo, G. & Poggio, T. Regularized least-squares classification. NATO Science Series Sub Series III: Computer and Systems Sciences 190, 131–154 (2003)
Formisano, E., De Martino, F., Bonte, M. & Goebel, R. “Who” is saying “what”? Brain-based decoding of human voice and speech. Science 322, 970–973 (2008)
Staeren, N., Renvall, H., De Martino, F., Goebel, R. & Formisano, E. Sound categories are represented as distributed patterns in the human auditory cortex. Curr. Biol. 19, 498–502 (2009)
Shamma, S. A., Elhilali, M. & Micheyl, C. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 34, 114–123 (2010)
Darwin, C. J. Auditory grouping. Trends Cogn. Sci. 1, 327–333 (1997)
Warren, R. M. Perceptual restoration of missing speech sounds. Science 167, 392–393 (1970)
Kidd, G., Jr, Arbogast, T. L., Mason, C. R. & Gallun, F. J. The advantage of knowing where to listen. J. Acoust. Soc. Am. 118, 3804–3815 (2005)
Shen, W., Olive, J. & Jones, D. Two protocols comparing human and machine phonetic discrimination performance in conversational speech. Proc. INTERSPEECH 1630–1633 (2008)
Cooke, M., Hershey, J. R. & Rennie, S. J. Monaural speech separation and recognition challenge. Comput. Speech Lang. 24, 1–15 (2010)
Acknowledgements
The authors would like to thank A. Ren for technical help, and C. Micheyl, S. Shamma and C. Schreiner for critical discussion and reading of the manuscript. E.F.C. was funded by National Institutes of Health grants R00-NS065120, DP2-OD00862, R01-DC012379, and the Ester A. and Joseph Klingenstein Foundation.
Author information
Contributions
N.M. and E.F.C. designed the experiment, collected the data, evaluated results and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Figures
This file contains Supplementary Figures 1-3. (PDF 1436 kb)
Cite this article
Mesgarani, N., Chang, E. Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485, 233–236 (2012). https://doi.org/10.1038/nature11020