Abstract
Speech is the most important signal in our auditory environment, and the processing of speech is highly dependent on context. However, it is unknown how contextual demands influence the neural encoding of speech. Here, we examine the context dependence of auditory cortical mechanisms for speech encoding at the level of the representation of fundamental acoustic features (spectrotemporal modulations) using model-based functional magnetic resonance imaging. We found that the performance of different tasks on identical speech sounds leads to neural enhancement of the acoustic features in the stimuli that are critically relevant to task performance. These task effects were observed at the earliest stages of auditory cortical processing, in line with interactive accounts of speech processing. Our work provides important insights into the mechanisms that underlie the processing of contextually relevant acoustic information within our rich and dynamic auditory environment.
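The model-based encoding approach summarised above — predicting voxel responses from the spectrotemporal modulation content of the stimuli — can be sketched in a few lines. This is a minimal illustration on simulated data with hypothetical dimensions, not the authors' actual pipeline; ridge regression is one common choice for fitting such linear encoding models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 100 speech sounds, 50 spectrotemporal
# modulation features, 200 auditory-cortex voxels.
n_sounds, n_features, n_voxels = 100, 50, 200

# S: stimulus feature matrix (sounds x modulation features), e.g. the
# energy of each sound in a bank of spectrotemporal modulation filters.
S = rng.standard_normal((n_sounds, n_features))

# Y: fMRI response matrix (sounds x voxels), simulated here as a noisy
# linear mapping from the features.
W_true = rng.standard_normal((n_features, n_voxels))
Y = S @ W_true + 0.1 * rng.standard_normal((n_sounds, n_voxels))

def fit_ridge(S, Y, lam=1.0):
    """Closed-form ridge regression: W = (S'S + lam*I)^-1 S'Y."""
    k = S.shape[1]
    return np.linalg.solve(S.T @ S + lam * np.eye(k), S.T @ Y)

W_hat = fit_ridge(S, Y)   # voxel-wise modulation tuning profiles
Y_pred = S @ W_hat

# Voxel-wise prediction accuracy (computed on training data here for
# brevity; a real analysis would cross-validate).
r = np.array([np.corrcoef(Y[:, v], Y_pred[:, v])[0, 1]
              for v in range(n_voxels)])
```

In this framework, a task effect such as the one reported in the paper would appear as a change in the fitted tuning profiles `W_hat` — enhanced weight on task-relevant modulation features — when the same sounds are presented under different task instructions.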
Code availability
The code that supports the findings of this study is available from the corresponding author upon reasonable request.
Change history
27 August 2019
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
The original and corrected figures are shown in the accompanying Publisher Correction.
Acknowledgements
We thank the staff at the Center for Biomedical Imaging EPFL, Vaud, Switzerland for access to the imaging platform, and W. van der Zwaag for facilitating data collection; J. Gonzalez for helping with auditory recording; F. Zay for reading the stimuli; C. Türk for assisting during data collection; L. Ermacora for the phonetic segmentation of the stimuli; F. De Martino for providing code for analysing the data; V. de Angelis and N. Disbergen for helping with data analysis; G. Valente for helping with the statistical analysis; and D. Gallichan for motion correction of the anatomical images. This work was supported by the Swiss National Science Foundation (grant numbers PP00P3_133701, PP00P3_163756 and 100014_182381 awarded to N.G.) and the University of Geneva Language and Communication Research Network. E.F. was supported by The Netherlands Organisation for Scientific Research (VICI grant number 453-12-002) and the Dutch Province of Limburg. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
All authors contributed to the conception and design of the experiment. N.G. and E.F. supervised the study. S.R. created the behavioural task and stimuli, programmed the fMRI experiment, collected, analysed (including writing code) and interpreted the data, and wrote the manuscript. R.S. helped to program the fMRI experiment and to analyse the data (including writing code for it). A.H.-A. helped to create the stimuli and to implement the behavioural task. E.F. supervised the data analysis (including writing code for and implementing it), guided data interpretation and helped write the manuscript. N.G. helped to create the stimuli, to guide the data analysis and interpretation and to write the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Primary Handling Editor: Mary Elizabeth Sutherland.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–16, Supplementary Tables 1 and 2, Supplementary Results 1 and 2, Supplementary Methods 1–7.
Supplementary Audio Files
Audio files of the stimuli used in the paper (for a complete description of each file, see Supplementary Information guide).
Supplementary Data 1
Feature matrix S that was obtained from the stimuli (for more information, see Supplementary Information guide).
Supplementary Data 2
Beta-weights that represent the fMRI responses to individual speech sounds for an example ROI (for more information, see Supplementary Information guide).
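To illustrate how the two supplementary data files could be combined, the following is a hedged sketch of linear encoding and decoding: a feature matrix S (sounds × modulation features) and a beta-weight matrix (sounds × voxels) support fitting an encoding model, which can then be inverted to reconstruct the modulation profile of a held-out sound from its ROI response pattern. All variable names, shapes and the ridge/least-squares choices here are illustrative assumptions, not the paper's exact analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shapes: 200 training sounds, 50 modulation features,
# 120 voxels in an example ROI, plus one held-out test sound.
n_train, n_feat, n_vox = 200, 50, 120
S_train = rng.standard_normal((n_train, n_feat))     # feature matrix S
W = rng.standard_normal((n_feat, n_vox))             # unknown encoding weights
Y_train = S_train @ W + 0.05 * rng.standard_normal((n_train, n_vox))

s_test = rng.standard_normal(n_feat)   # true modulation profile of test sound
y_test = s_test @ W                    # observed ROI response pattern (betas)

# 1) Fit the encoding model on the training sounds (closed-form ridge).
W_hat = np.linalg.solve(S_train.T @ S_train + 1.0 * np.eye(n_feat),
                        S_train.T @ Y_train)

# 2) Reconstruct the held-out sound's modulation profile by least
#    squares: find s minimising ||s @ W_hat - y_test||.
s_hat, *_ = np.linalg.lstsq(W_hat.T, y_test, rcond=None)

# Reconstruction accuracy: correlation between true and recovered features.
r = np.corrcoef(s_test, s_hat)[0, 1]
```

Reconstructions obtained separately for each task condition could then be compared feature-by-feature to quantify which spectrotemporal modulations are enhanced by task demands.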
About this article
Cite this article
Rutten, S., Santoro, R., Hervais-Adelman, A. et al. Cortical encoding of speech enhances task-relevant acoustic information. Nat Hum Behav 3, 974–987 (2019). https://doi.org/10.1038/s41562-019-0648-9
This article is cited by
- Spectrotemporal cues and attention jointly modulate fMRI network topology for sentence and melody perception. Scientific Reports (2024)
- Auditory cortical micro-networks show differential connectivity during voice and speech processing in humans. Communications Biology (2021)
- Speech signal analysis of Alzheimer's disease in Farsi using auditory model system. Cognitive Neurodynamics (2021)
- TASH: Toolbox for the Automated Segmentation of Heschl's gyrus. Scientific Reports (2020)